How to create and edit Expectations with instant feedback from a sample Batch of data
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- A working installation of Great Expectations
- Configured a Data Context
Steps
1. Use the CLI to begin the interactive process of creating Expectations
The --interactive mode means that you are interacting with your data: you have access to a Datasource and can specify a Batch of data against which to create Expectations.
The --manual mode (see How to create and edit Expectations based on domain knowledge, without inspecting data directly) still lets you create Expectations, for example when you already know enough about your data, such as the columns in a database table. However, you will not be able to Validate data until you specify a Batch of data, which you can do at a later point. You can switch back and forth between the interactive and manual modes, and all your Expectations will remain intact.
Run this command in the root directory of your project (where the init command created the great_expectations subdirectory):
great_expectations suite new --interactive
This command prompts you to select a Datasource, a Data Connector, and a Data Asset. Together, these identify a sample Batch of data that the Expectation Suite will eventually describe. If only one option exists for any of these (e.g., only one Data Connector in your Datasource configuration), Great Expectations selects it for you automatically to speed up the process.
Finally, unless you specify the name of the Expectation Suite on the command line (using the --expectation-suite option), the command asks you to name your new Expectation Suite. It offers a default name that you can accept or replace with your own.
An empty suite is then created and added to your project.
Great Expectations then creates a Jupyter Notebook in which you can start building your new suite. The command concludes by opening that notebook.
2. (Optional) If you wish to skip the automatic opening of the Jupyter Notebook, add the --no-jupyter flag:
great_expectations suite new --interactive --no-jupyter
3. (Optional) Use the --profile CLI flag to help create your Expectations in the interactive mode.
One of the easiest ways to get started in the interactive mode is to take advantage of the --profile flag (see How to create and edit Expectations with a Profiler).
In the interactive mode, the initialization cell of your Jupyter Notebook contains the batch_request dictionary. You can convert it to JSON and save it to a file for future use. The contents of this file would look like this:
{
"datasource_name": "my_datasource",
"data_connector_name": "my_data_connector",
"data_asset_name": "my_asset"
}
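Below is a minimal sketch of the "convert it to JSON and save it" step, run from a notebook cell. It assumes batch_request is a plain Python dict whose values are all JSON-serializable, like the three placeholder keys shown above. The helper name save_batch_request and the file name are hypothetical.

```python
import json
from pathlib import Path

# Assumption: this mirrors the batch_request dict from the notebook's
# initialization cell; the values are the placeholders shown above.
batch_request = {
    "datasource_name": "my_datasource",
    "data_connector_name": "my_data_connector",
    "data_asset_name": "my_asset",
}


def save_batch_request(request: dict, path: str) -> Path:
    """Hypothetical helper: write a batch_request dict to a JSON file."""
    target = Path(path)
    # indent=2 keeps the file readable if you refine it by hand later
    target.write_text(json.dumps(request, indent=2))
    return target


saved = save_batch_request(batch_request, "my_saved_batch_request_file.json")
print(f"Saved batch request to {saved}")
```

If your batch_request contains values that json cannot serialize, json.dumps raises a TypeError. In that case you would need to simplify the dict before saving it.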
You can then use this saved batch_request (including any refinements you may have made to it in your notebook) and skip the steps of selecting its components:
great_expectations suite new --interactive --batch-request my_saved_batch_request_file.json
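Before passing a hand-edited file to --batch-request, you might want a quick local sanity check. This is a sketch under stated assumptions. Treating the three keys from the example file as required is this sketch's own choice, not Great Expectations' validation logic. The helper load_batch_request and the file names are hypothetical.

```python
import json
from pathlib import Path

# Assumption: the keys from the example batch_request file above.
# This is a local pre-flight check only, not Great Expectations' own validation.
REQUIRED_KEYS = ("datasource_name", "data_connector_name", "data_asset_name")


def load_batch_request(path: str) -> dict:
    """Hypothetical helper: load a saved batch_request and check its keys."""
    request = json.loads(Path(path).read_text())
    if not isinstance(request, dict):
        raise ValueError(f"{path} must contain a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in request]
    if missing:
        raise ValueError(f"{path} is missing keys: {', '.join(missing)}")
    return request


# Usage: write a sample file, then load and check it.
Path("my_saved_batch_request_file.json").write_text(json.dumps({
    "datasource_name": "my_datasource",
    "data_connector_name": "my_data_connector",
    "data_asset_name": "my_asset",
}))
request = load_batch_request("my_saved_batch_request_file.json")
print(request["data_asset_name"])
```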
Unless you specify the name of the Expectation Suite on the command line (using the --expectation-suite MY_SUITE syntax), the command will ask you to name your new Expectation Suite. It offers a default name that you can accept or replace with your own.
You can extend the previous example to specify the name of the Expectation Suite on the command line as follows:
great_expectations suite new --expectation-suite my_suite --interactive --batch-request my_saved_batch_request_file.json
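If you script suite creation, you could assemble the command line from options, as in this sketch. It uses only the flags described in this guide: --expectation-suite, --interactive, --batch-request, --profile, and --no-jupyter. The helper build_suite_new_command is hypothetical, and flag availability may differ between Great Expectations versions, so confirm it with great_expectations suite new --help. The sketch only builds and prints the command. Running it is left as a comment because that requires Great Expectations to be installed.

```python
import shlex
from typing import List, Optional


def build_suite_new_command(
    suite_name: Optional[str] = None,
    batch_request_file: Optional[str] = None,
    profile: bool = False,
    open_jupyter: bool = True,
) -> List[str]:
    """Hypothetical helper: assemble `great_expectations suite new` arguments."""
    command = ["great_expectations", "suite", "new"]
    if suite_name is not None:
        command += ["--expectation-suite", suite_name]
    command.append("--interactive")
    if batch_request_file is not None:
        command += ["--batch-request", batch_request_file]
    if profile:
        command.append("--profile")
    if not open_jupyter:
        command.append("--no-jupyter")
    return command


command = build_suite_new_command(
    suite_name="my_suite",
    batch_request_file="my_saved_batch_request_file.json",
)
print(shlex.join(command))

# To actually run it from the project root (requires Great Expectations):
# import subprocess
# subprocess.run(command, check=True)
```

Returning a list rather than a single string avoids shell-quoting problems if a suite name or path contains spaces.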
To check the syntax, you can always run the following command in the root directory of your project (where the init command created the great_expectations subdirectory):
great_expectations suite new --help