How to configure and use a MetricStore
Metric storage is an experimental feature.
A MetricStore is a Store (a connector to store and retrieve information about metadata in Great Expectations) that stores Metrics computed during Validation. A MetricStore tracks the run_id of the Validation and the name of the Expectation Suite (a collection of verifiable assertions about data), in addition to the Metric name and Metric kwargs.
Saving Metrics (computed attributes of data, such as the mean of a column) during Validation (the act of applying an Expectation Suite to a Batch) lets you construct a new data series based on observed dataset characteristics computed by Great Expectations. A data series can serve as the source for a dashboard or for overall data quality metrics.
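To make the idea of a data series concrete, the following minimal sketch groups metric records by Expectation Suite, Metric name, and Metric kwargs, then orders each group by run. The MetricRecord class, its field names, the build_series helper, and the sample values are all illustrative assumptions. They are not part of the Great Expectations API or its storage schema; they only mirror the pieces of information a MetricStore tracks.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricRecord:
    """Hypothetical record mirroring what a MetricStore tracks (not a GX class)."""

    run_id: str
    expectation_suite_name: str
    metric_name: str
    value: float
    metric_kwargs: tuple = ()  # e.g. (("column", "passenger_count"),)


def build_series(records):
    """Group records into one series per (suite, metric, kwargs), sorted by run_id."""
    series = defaultdict(list)
    for rec in records:
        key = (rec.expectation_suite_name, rec.metric_name, rec.metric_kwargs)
        series[key].append((rec.run_id, rec.value))
    return {key: sorted(points) for key, points in series.items()}


# Made-up sample values, for illustration only.
records = [
    MetricRecord("2023-06-02", "public.taxi_data.warning", "statistics.success_percent", 95.0),
    MetricRecord("2023-06-01", "public.taxi_data.warning", "statistics.success_percent", 100.0),
]
series = build_series(records)
```

Each resulting series is a list of (run_id, value) points that a dashboard could plot over time.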
Prerequisites
- A Great Expectations instance
- Completion of the Quickstart
- A configured Data Context
1. Add a MetricStore
To define a MetricStore, add a Metric Store (a connector to store and retrieve information about computed attributes of data, such as the mean of a column) configuration to the stores section of your great_expectations.yml. The configuration must include the following keys:
- class_name: Enter MetricStore. This key determines which class is instantiated to create the StoreBackend. Other fields are passed through to the StoreBackend class on instantiation. The only backend Store under test for use with a MetricStore is the DatabaseStoreBackend with Postgres.
- store_backend: Defines how your metrics are persisted. To use an SQL database such as Postgres, add the following fields and values:
  - class_name: Enter DatabaseStoreBackend.
  - credentials: Point to the credentials defined in your config_variables.yml, or define them inline.
The following is an example of how the MetricStore configuration appears in great_expectations.yml:
stores:
  # ...
  metric_store:  # You can choose any name as the key for your metric store
    class_name: MetricStore
    store_backend:
      class_name: DatabaseStoreBackend
      credentials: ${my_store_credentials}
      # alternatively, define credentials inline:
      # credentials:
      #   username: my_username
      #   password: my_password
      #   port: 1234
      #   host: xxxx
      #   database: my_database
      #   driver: postgresql
The next time your Data Context is loaded, it will connect to the database and initialize a table to store metrics if one has not already been created.
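Before loading the Data Context, it can help to sanity-check the store entry against the keys listed above. The sketch below mirrors the YAML example as a Python dictionary and checks it with a hypothetical helper, check_metric_store_config. The helper is an assumption made for illustration and is not a Great Expectations function; it only encodes the requirements stated in this section.

```python
# Python equivalent of the YAML `stores` entry, with credentials referenced
# through a config variable as in the example configuration.
stores = {
    "metric_store": {
        "class_name": "MetricStore",
        "store_backend": {
            "class_name": "DatabaseStoreBackend",
            "credentials": "${my_store_credentials}",
        },
    }
}


def check_metric_store_config(config):
    """Return a list of problems found in one store entry (hypothetical helper)."""
    problems = []
    if config.get("class_name") != "MetricStore":
        problems.append("class_name must be MetricStore")
    backend = config.get("store_backend")
    if not isinstance(backend, dict):
        problems.append("store_backend is missing")
        return problems
    if backend.get("class_name") != "DatabaseStoreBackend":
        # The section names DatabaseStoreBackend as the only tested backend.
        problems.append("store_backend.class_name is not the tested DatabaseStoreBackend")
    if "credentials" not in backend:
        problems.append("store_backend.credentials is missing")
    return problems


problems = check_metric_store_config(stores["metric_store"])
```

An empty problems list means the entry has the keys this section requires. It does not confirm that the database is reachable.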
2. Configure a Validation Action
When a MetricStore is available, add a StoreMetricsAction validation Action (a Python class with a run method that takes a Validation Result and does something with it) to your Checkpoint (the primary means for validating data in a production deployment of Great Expectations) to save Metrics during Validation. The validation Action must include the following fields:
- class_name: Enter StoreMetricsAction. Determines which class is instantiated to execute the Action.
- target_store_name: Enter the key for the MetricStore you added in your great_expectations.yml. In the previous example, the key is metric_store. This field defines which Store to use when persisting the metrics.
- requested_metrics: Identify the Expectation Suites and Metrics you want to store.
Add the following requested_metrics entry to great_expectations.yml to generate Validation Result (generated when data is Validated against an Expectation or Expectation Suite) statistics:
expectation_suite_name:
  - statistics.<statistic name>
Add the following requested_metrics entry to great_expectations.yml to generate values from a specific Expectation (a verifiable assertion about data) result field:
expectation_suite_name:
  - column:
      <column name>:
        - <expectation name>.result.<value name>
To indicate that any Expectation Suite can be used to generate values, use the wildcard "*".
If you use an Expectation Suite name as a key, Metrics are only added to the MetricStore when that Expectation Suite runs. When you use the wildcard "*", Metrics are added to the MetricStore for each Expectation Suite that runs in the Checkpoint.
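The matching rule described above can be sketched in a few lines of Python. The metrics_for_suite function and the some_other_suite name are hypothetical and exist only to illustrate the rule; this is not the library's implementation.

```python
def metrics_for_suite(requested_metrics, suite_name):
    """Collect the metric entries that apply to a suite that just ran.

    An entry applies when its key is the wildcard "*" or exactly matches
    the name of the Expectation Suite that ran.
    """
    selected = []
    for key, metrics in requested_metrics.items():
        if key == "*" or key == suite_name:
            selected.extend(metrics)
    return selected


requested_metrics = {
    "public.taxi_data.warning": ["statistics.successful_expectations"],
    "*": ["statistics.success_percent"],
}

# The named suite receives its own entries plus the wildcard entries.
warning = metrics_for_suite(requested_metrics, "public.taxi_data.warning")
# Any other suite (hypothetical name) receives only the wildcard entries.
other = metrics_for_suite(requested_metrics, "some_other_suite")
```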
The following example YAML configuration adds StoreMetricsAction to the taxi_data dataset:
action_list:
  # ...
  - name: store_metrics
    action:
      class_name: StoreMetricsAction
      target_store_name: metric_store  # This should match the name of the store configured above
      requested_metrics:
        public.taxi_data.warning:  # match a particular expectation suite
          - column:
              passenger_count:
                - expect_column_values_to_not_be_null.result.element_count
                - expect_column_values_to_not_be_null.result.partial_unexpected_list
          - statistics.successful_expectations
        "*":  # wildcard to match any expectation suite
          - statistics.evaluated_expectations
          - statistics.success_percent
          - statistics.unsuccessful_expectations
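To see how the nested entries in this example relate to the Metric name and Metric kwargs that a MetricStore tracks, the following sketch expands one Expectation Suite's entries into (metric name, metric kwargs) pairs. The flatten_metrics helper is a hypothetical illustration written for this page. It assumes that plain strings are Metric names and that nested keys such as column become Metric kwargs; it is not the code Great Expectations runs.

```python
def flatten_metrics(entries, kwargs=None):
    """Expand nested entries into (metric_name, metric_kwargs) pairs."""
    kwargs = dict(kwargs or {})
    pairs = []
    for entry in entries:
        if isinstance(entry, str):
            # A plain string is a metric name; attach the kwargs collected so far.
            pairs.append((entry, dict(kwargs)))
        else:
            # A mapping such as {"column": {"passenger_count": [...]}} adds
            # one kwarg (column=passenger_count) to everything nested below it.
            for kwarg_name, values in entry.items():
                for kwarg_value, nested in values.items():
                    nested_kwargs = {**kwargs, kwarg_name: kwarg_value}
                    pairs.extend(flatten_metrics(nested, nested_kwargs))
    return pairs


# Entries for the public.taxi_data.warning suite from the example configuration.
entries = [
    {
        "column": {
            "passenger_count": [
                "expect_column_values_to_not_be_null.result.element_count",
                "expect_column_values_to_not_be_null.result.partial_unexpected_list",
            ]
        }
    },
    "statistics.successful_expectations",
]
pairs = flatten_metrics(entries)
```

Here the two column-level Metrics carry the kwargs {"column": "passenger_count"}, while the statistics Metric carries none.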
3. Test your MetricStore and StoreMetricsAction
Run the following code to run your Checkpoint and test StoreMetricsAction:
import great_expectations as gx
context = gx.get_context()
checkpoint_name = "your checkpoint name here"
context.run_checkpoint(checkpoint_name=checkpoint_name)