How to dynamically load evaluation parameters from a database
This guide will help you create an ExpectationA verifiable assertion about data. that loads part of its Expectation configuration from a database at runtime. Using a dynamic Evaluation ParameterA dynamic value used during Validation of an Expectation which is populated by evaluating simple expressions or by referencing previously generated metrics. makes it possible to maintain part of an Expectation SuiteA collection of verifiable assertions about data. in a shared database.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A working deployment of Great Expectations.
- Credentials for a database to query for dynamic values.
- A SQL query to return values for your expectation configuration.
Steps
1. Add a new SqlAlchemy Query Store to your Data Context
A SqlAlchemy Query StoreA connector to store and retrieve information about metadata in Great Expectations. acts as a bridge that can query a SqlAlchemy-connected database and return the result of the query to be available for an evaluation parameter.
Find the stores
section in your
great_expectations.yml
file, and add the
following configuration for a new store called
"my_query_store". You can add and reference
multiple Query Stores with different names.
By default, query results will be returned as a list. If instead you need a scalar for your expectation, you can specify the return_type
my_query_store:
class_name: SqlAlchemyQueryStore
credentials: ${my_query_store_creds}
queries:
# The query name (e.g. unique_passenger_counts, max_passenger_counts) can be replaced with your desired name
unique_passenger_counts: SELECT DISTINCT passenger_count FROM postgres_taxi_data ORDER BY passenger_count ASC;
max_passenger_counts:
query: SELECT max(passenger_count) FROM postgres_taxi_data;
return_type: scalar # return_type can be either "scalar" or "list" or omitted
Ensure you have added valid credentials to the
config-variables.yml
file (replacing the
values with your database credentials):
my_query_store_creds:
drivername: postgresql
host: localhost
port: 5432
username: postgres
password:
database: test_ci
2. In a notebook, get a test Batch of data to use for Validation
import great_expectations as gx
context = gx.get_context()
pg_datasource = context.sources.add_sql(
name="version-0.16.16 pg_datasource", connection_string=PG_CONNECTION_STRING
)
table_asset = pg_datasource.add_table_asset(
name="postgres_taxi_data", table_name="version-0.16.16 postgres_taxi_data"
)
batch_request = table_asset.build_batch_request()
validator = context.get_validator(
batch_request=batch_request, create_expectation_suite_with_name="version-0.16.16 my_suite_name"
)
3. Define an Expectation that relies on a dynamic query
Great Expectations recognizes several types of
Evaluation Parameters that can use advanced features
provided by the
Data ContextThe primary entry point for a Great Expectations
deployment, with configurations and methods for
all supporting components.. To dynamically load data, we will be using a
store-style URN, which starts with
urn:great_expectations:stores
. The next
component of the URN is the name of the store we
configured above (my_query_store
), and
the final component is the name of the query we
defined above (unique_passenger_counts
):
validator_results = validator.expect_column_values_to_be_in_set(
column="passenger_count",
value_set={
"$PARAMETER": "urn:great_expectations:stores:my_query_store:unique_passenger_counts"
},
)
The SqlAlchemyQueryStore that you configured above
will execute the defined query and return the results
as the value of the value_set
parameter
to evaluate your Expectation:
expected_validator_results = """
{
"expectation_config": {
"expectation_type": "expect_column_values_to_be_in_set",
"meta": {},
"kwargs": {
"column": "passenger_count",
"value_set": [
1,
2,
3,
4,
5,
6
],
"batch_id": "pg_datasource-postgres_taxi_data"
}
},
"meta": {},
"result": {
"element_count": 10000,
"unexpected_count": 0,
"unexpected_percent": 0.0,
"partial_unexpected_list": [],
"missing_count": 0,
"missing_percent": 0.0,
"unexpected_percent_total": 0.0,
"unexpected_percent_nonmissing": 0.0
},
"success": true,
"exception_info": {
"raised_exception": false,
"exception_traceback": null,
"exception_message": null
}
}
"""