How to create Expectations that span multiple Batches using Evaluation Parameters
This guide will help you create Expectations (verifiable assertions about data) that span multiple Batches (selections of records from a Data Asset) of data using Evaluation Parameters (dynamic values used during Validation of an Expectation, populated by evaluating simple expressions or by referencing previously generated Metrics); see also Evaluation Parameter Stores, which store and retrieve information about the parameters used during Validation. This pattern is useful for things like verifying that row counts between tables stay consistent.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- A working installation of Great Expectations
- Configured a Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components)
- Configured a Datasource (a standard API for accessing and interacting with data from a wide variety of source systems), or several Datasources, with at least two Data Assets (collections of records within a Datasource, usually named based on the underlying data system and sliced to correspond to a desired specification), and understand the basics of Batch Requests (provided to a Datasource in order to create a Batch)
- Created Expectation Suites (collections of verifiable assertions about data) for those Data Assets
- A working Evaluation Parameter Store (the default in-memory Store from great_expectations init can work for this)
- A working Checkpoint (the primary means for validating data in a production deployment of Great Expectations)
Steps
In a notebook:
1. Import great_expectations and instantiate your Data Context
import great_expectations as ge
context = ge.DataContext()
2. Instantiate two Validators, one for each Data Asset
We'll call one of these Validators (used to run an Expectation Suite against data) the upstream Validator and the other the downstream Validator. Evaluation Parameters will allow us to use Validation Results (generated when data is Validated against an Expectation or Expectation Suite) from the upstream Validator as parameters passed into Expectations on the downstream Validator.
It's common (but not required) for both Batch Requests to use the same Datasource and Data Connector (which provides the configuration details, based on the source data system, that a Datasource needs to define Data Assets).
from great_expectations.core.batch import BatchRequest

# Batch Request and Validator for the upstream Data Asset
batch_request_1 = BatchRequest(
    datasource_name="my_datasource",
    data_connector_name="my_data_connector",
    data_asset_name="my_data_asset_1"
)
upstream_validator = context.get_validator(batch_request=batch_request_1, expectation_suite_name="my_expectation_suite_1")

# Batch Request and Validator for the downstream Data Asset
batch_request_2 = BatchRequest(
    datasource_name="my_datasource",
    data_connector_name="my_data_connector",
    data_asset_name="my_data_asset_2"
)
downstream_validator = context.get_validator(batch_request=batch_request_2, expectation_suite_name="my_expectation_suite_2")
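The URN used in step 4 below refers to the observed_value of an expect_table_row_count_to_be_between Expectation in my_expectation_suite_1, so the upstream Suite needs to contain that Expectation. If it doesn't already, a minimal sketch for adding and saving it looks like this (the bounds are placeholders; choose whatever range makes sense for your data):

# Make sure the upstream Suite produces the row-count metric the downstream Suite will reference.
# min_value/max_value are placeholder bounds, not values prescribed by this guide.
upstream_validator.expect_table_row_count_to_be_between(min_value=1, max_value=None)

# Persist the upstream Suite so a later Validation run can store its observed_value.
upstream_validator.save_expectation_suite(discard_failed_expectations=False)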
3. Disable interactive evaluation for the downstream Validator
downstream_validator.interactive_evaluation = False
Disabling interactive evaluation allows you to declare an Expectation even when it cannot be evaluated immediately.
4. Define an Expectation using an Evaluation Parameter on the downstream Validator
eval_param_urn = 'urn:great_expectations:validations:my_expectation_suite_1:expect_table_row_count_to_be_between.result.observed_value'
downstream_validator.expect_table_row_count_to_equal(
    value={
        '$PARAMETER': eval_param_urn, # this is the actual parameter we're going to use in the validation
    }
)
The core of this is a $PARAMETER : URN pair. When Great Expectations encounters a $PARAMETER flag during Validation (the act of applying an Expectation Suite to a Batch), it replaces the URN with a value retrieved from an Evaluation Parameter Store or Metrics Store (a connector to store and retrieve information about computed attributes of data, such as the mean of a column); see also How to configure a MetricsStore.
The URN above points to the observed_value of the expect_table_row_count_to_be_between Expectation in my_expectation_suite_1. That value only exists after the upstream Suite has been validated and its results stored, so the parameter cannot be resolved immediately in the notebook; it will be filled in once the Expectation Suite is stored and the Validation is run through a Validation Operator or Checkpoint. This is why interactive evaluation was disabled in the previous step.
When executed in the notebook, this Expectation will generate a Validation Result. Most values will be missing, since interactive evaluation was disabled.
{
  "result": {},
  "success": null,
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
Your URN must be exactly correct in order to work in production. Unfortunately, successful execution at this stage does not guarantee that the URN is specified correctly or that the intended parameters will be available when the Expectation is evaluated later.
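One way to reduce the risk of a mis-specified URN is to inspect which upstream metrics the downstream Suite now declares as dependencies. The sketch below assumes the get_evaluation_parameter_dependencies method on ExpectationSuite (available in recent Great Expectations releases); it simply prints the dependency mapping so you can confirm that the Suite name and metric in the URN match the upstream Expectation:

# Retrieve the in-memory Suite, keeping the not-yet-evaluable Expectation.
suite = downstream_validator.get_expectation_suite(discard_failed_expectations=False)

# Prints a mapping of upstream Expectation Suite names to the metrics this Suite expects
# them to provide; "my_expectation_suite_1" and the row-count metric should appear here.
print(suite.get_evaluation_parameter_dependencies())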
5. Save your Expectation Suite
downstream_validator.save_expectation_suite(discard_failed_expectations=False)
This step is necessary because your $PARAMETER will only function properly when invoked within a Validation operation involving multiple Validators. The simplest way to execute such an operation is through a Checkpoint (or, in older deployments, a Validation Operator), and these are configured to load Expectation Suites from Expectation Stores (connectors to store and retrieve collections of verifiable assertions about data), not from memory.
6. Execute an existing Checkpoint
You can do this within your notebook by running context.run_checkpoint:
results = context.run_checkpoint(
checkpoint_name="my_checkpoint"
)
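If a Checkpoint named my_checkpoint that validates both Data Assets does not already exist in your project, one way to create it programmatically is sketched below; run it before the run_checkpoint call above. This is a minimal sketch, not the only valid configuration: it reuses the placeholder Datasource, Data Connector, Data Asset, and Suite names from step 2 and assumes the context.add_checkpoint API. The upstream validation is listed first so its observed_value is stored before the downstream Suite is evaluated.

# Register a Checkpoint that validates the upstream Data Asset first, then the downstream one,
# so the Evaluation Parameter can be resolved during the same run.
context.add_checkpoint(
    name="my_checkpoint",
    config_version=1.0,
    class_name="SimpleCheckpoint",
    validations=[
        {
            "batch_request": {
                "datasource_name": "my_datasource",
                "data_connector_name": "my_data_connector",
                "data_asset_name": "my_data_asset_1",
            },
            "expectation_suite_name": "my_expectation_suite_1",
        },
        {
            "batch_request": {
                "datasource_name": "my_datasource",
                "data_connector_name": "my_data_connector",
                "data_asset_name": "my_data_asset_2",
            },
            "expectation_suite_name": "my_expectation_suite_2",
        },
    ],
)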
7. Rebuild Data Docs and review results in docs
You can do this within your notebook by running:
context.build_data_docs()
You can also execute from the command line with:
great_expectations docs build
Once your Data Docs (human-readable documentation generated from Great Expectations metadata, detailing Expectations, Validation Results, etc.) rebuild, open them in a browser and navigate to the page for the new Validation Result.
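You can also check the outcome and open the rebuilt docs directly from the notebook. The sketch below assumes the results object returned by run_checkpoint in step 6 and uses the Data Context's open_data_docs method:

# True only if every validation in the Checkpoint run succeeded.
print(results.success)

# Open the freshly built Data Docs in your default browser.
context.open_data_docs()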
If your Evaluation Parameter was resolved successfully, the Validation Result for the downstream Expectation will show the value retrieved from the upstream Validation substituted into the Expectation. If it encountered an error, the Expectation will appear as failed; the most common problem is a mis-specified URN.