How to configure and use a MetricStore
Saving Metrics (a computed attribute of data, such as the mean of a column) during Validation (the act of applying an Expectation Suite to a Batch) makes it easy to construct a new data series based on observed dataset characteristics computed by Great Expectations. That data series can serve as the source for a dashboard or overall data quality metrics, for example.
Storing metrics is still an experimental feature of Great Expectations, and we expect configuration and capability to evolve rapidly.
Steps
1. Adding a MetricStore
A MetricStore is a special Store (a connector to store and retrieve information about metadata in Great Expectations) that can store Metrics computed during Validation. A MetricStore tracks the run_id of the Validation and the Expectation Suite (a collection of verifiable assertions about data) name, in addition to the Metric name and Metric kwargs.
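For orientation, you can picture each stored record as keyed on those fields. The sketch below is purely illustrative: the flat dict shape and field names are assumptions for explanation, not the actual persisted schema, which the configured StoreBackend manages.

# Conceptual shape of one stored metric record (illustrative only;
# the real schema is created and managed by the configured StoreBackend):
metric_record = {
    "run_id": "20230101T000000.000000Z",           # run_id of the Validation
    "expectation_suite_name": "public.taxi_data.warning",
    "metric_name": "statistics.success_percent",   # Metric name
    "metric_kwargs": {},                           # Metric kwargs
    "value": 100.0,                                # observed value
}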
To define a MetricStore, add a MetricStore config to the stores section of your great_expectations.yml. This config requires two keys:
- The class_name field determines which class will be instantiated to create this store, and must be MetricStore.
- The store_backend field configures the particulars of how your metrics will be persisted.
Within store_backend, the class_name field determines which class will be instantiated to create this StoreBackend; other fields are passed through to the StoreBackend class on instantiation. In theory, any valid StoreBackend can be used; however, at the time of writing, the only StoreBackend under test for use with a MetricStore is the DatabaseStoreBackend with Postgres.
To use an SQL database like Postgres, provide two fields: class_name, with the value DatabaseStoreBackend, and credentials. Credentials can point to credentials defined in your config_variables.yml, or can alternatively be defined inline:
stores:
# ...
metric_store: # You can choose any name as the key for your metric store
class_name: MetricStore
store_backend:
class_name: DatabaseStoreBackend
credentials: ${my_store_credentials}
# alternatively, define credentials inline:
# credentials:
# username: my_username
# password: my_password
# port: 1234
# host: xxxx
# database: my_database
# driver: postgresql
The next time your DataContext is loaded, it will connect to the database and initialize a table to store metrics if one has not already been created. See the metrics_reference for more information on additional configuration options.
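To sanity-check the configuration, you can load a DataContext and confirm the new store is registered. This is a minimal sketch, assuming the stores property on the DataContext in your version exposes configured stores by name:

import great_expectations as gx

context = gx.get_context()

# "metric_store" is the key chosen in the stores section above.
metric_store = context.stores["metric_store"]
print(type(metric_store).__name__)  # expect: MetricStore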
2. Configuring a Validation Action
Once a MetricStore is available, a StoreMetricsAction validation Action (a Python class with a run method that takes a Validation Result and does something with it) can be added to your Checkpoint (the primary means for validating data in a production deployment of Great Expectations) in order to save Metrics during Validation. This validation Action has three required fields:
- The class_name field determines which class will be instantiated to execute this action, and must be StoreMetricsAction.
- The target_store_name field defines which Store backend to use when persisting the metrics. This should match the key of the MetricStore you added in your great_expectations.yml, which in our example above is metric_store.
- The requested_metrics field identifies which Expectation Suites and Metrics to store. Please note that this API is likely to change in a future release.
Validation Result statistics are available using the following format:

expectation_suite_name:
  - statistics.<statistic name>
Values from inside the result field of a particular Expectation (a verifiable assertion about data) are available using the following format:
expectation_suite_name:
  - column:
      <column name>:
        - <expectation name>.result.<value name>
In place of the Expectation Suite name, you may use "*" to denote that any Expectation Suite should match. If an Expectation Suite name is used as a key, those Metrics will only be added to the MetricStore when that Suite is run. When the wildcard "*" is used, those metrics will be added to the MetricStore for each Suite which runs in the Checkpoint.
Here is an example yaml config for adding a StoreMetricsAction to the taxi_data dataset:
action_list:
# ...
- name: store_metrics
action:
class_name: StoreMetricsAction
target_store_name: metric_store # This should match the name of the store configured above
requested_metrics:
public.taxi_data.warning: # match a particular expectation suite
- column:
passenger_count:
- expect_column_values_to_not_be_null.result.element_count
- expect_column_values_to_not_be_null.result.partial_unexpected_list
- statistics.successful_expectations
"*": # wildcard to match any expectation suite
- statistics.evaluated_expectations
- statistics.success_percent
- statistics.unsuccessful_expectations
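If you prefer to configure this from Python instead of yaml, the equivalent action can be supplied when creating a Checkpoint. This is a sketch only: the checkpoint and suite names are placeholders, and it assumes your version's DataContext.add_checkpoint accepts Checkpoint config fields as keyword arguments:

import great_expectations as gx

context = gx.get_context()

# Placeholder names; substitute your own checkpoint and suite identifiers.
context.add_checkpoint(
    name="taxi_data_checkpoint",
    config_version=1,
    class_name="Checkpoint",
    expectation_suite_name="public.taxi_data.warning",
    action_list=[
        {
            "name": "store_metrics",
            "action": {
                "class_name": "StoreMetricsAction",
                "target_store_name": "metric_store",  # key from great_expectations.yml
                "requested_metrics": {
                    "*": [
                        "statistics.evaluated_expectations",
                        "statistics.success_percent",
                    ],
                },
            },
        },
    ],
)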
3. Test your MetricStore and StoreMetricsAction
To test your StoreMetricsAction, run your Checkpoint from your code or the CLI (Command Line Interface):
import great_expectations as gx
context = gx.get_context()
checkpoint_name = "version-0.15.50 your checkpoint name here"
context.run_checkpoint(checkpoint_name=checkpoint_name)
$ great_expectations checkpoint run <your checkpoint name>
Summary
The StoreMetricsAction processes an ExpectationValidationResult and stores Metrics to a configured Store.
Now, after your Checkpoint is run, the requested metrics will be available in your database!
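As a quick check, you can query the backing table directly. Here is a minimal sketch using SQLAlchemy, built from the inline-credentials example above; note that "ge_metrics" is an assumed table name for illustration (the actual table name is determined by your store backend configuration, so check your database if it differs):

import sqlalchemy as sa

# Connection URL assembled from the placeholder credentials shown earlier.
engine = sa.create_engine("postgresql://my_username:my_password@xxxx:1234/my_database")

with engine.connect() as conn:
    # "ge_metrics" is an assumed table name; verify against your database.
    for row in conn.execute(sa.text("SELECT * FROM ge_metrics LIMIT 5")):
        print(row)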