Data Context
A Data Context represents a Great Expectations project. It organizes storage and access for Expectation Suites, Datasources, notification settings, and data fixtures.
The Data Context is configured via a YAML file or directly in code. Both the configuration and the Expectation Suites it manages should be stored in version control.
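For example, an existing project's Data Context can be loaded in code. This is a minimal sketch, assuming a great_expectations/ project directory (for instance one created by the great_expectations CLI) already exists on disk:

import great_expectations as ge

# Discover and load the project's great_expectations.yml from disk
my_context = ge.data_context.DataContext()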
Data Contexts manage connections to your data and compute resources, and support integration with execution frameworks (such as Airflow, NiFi, dbt, or Dagster) to describe and produce batches of data ready for analysis. These features enable fetching, validating, profiling, and documenting your data in a way that is meaningful within your existing infrastructure and work environment.
Data Contexts also manage Expectation Suites. Expectation Suites combine multiple Expectation Configurations into an overall description of a dataset. Expectation Suites should have names corresponding to the kind of data they define, like “NPI” for National Provider Identifier data or “company.users” for a users table.
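As a sketch of that naming convention (the suite name here is illustrative, and my_context is assumed to be an existing Data Context):

# Create and persist an empty suite named for the data it will describe
suite = my_context.create_expectation_suite(
    expectation_suite_name="company.users"
)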
The Data Context also provides other services, such as storing and substituting evaluation parameters during validation. See Evaluation Parameter Stores below for more information.
Interactively testing configurations
Especially at the beginning of a Great Expectations project, it is often useful to rapidly iterate over configurations of key Data Context components. The test_yaml_config feature makes that easy.
test_yaml_config is a convenience method for configuring the moving parts of a Great Expectations deployment. It allows you to quickly test out configs for Datasources, Checkpoints, and each type of Store (ExpectationStores, ValidationResultStores, and MetricsStores). For many deployments of Great Expectations, these components (plus Expectations) are the only ones you'll need.
Here's a typical example:
config = """
class_name: Datasource
execution_engine:
  class_name: PandasExecutionEngine
data_connectors:
  my_data_connector:
    class_name: InferredAssetFilesystemDataConnector
    base_directory: {data_base_directory}
    glob_directive: "*/*.csv"
    default_regex:
      pattern: (.+)/(.+)\\.csv
      group_names:
        - data_asset_name
        - partition
"""
my_context.test_yaml_config(
    yaml_config=config
)
Running test_yaml_config will print feedback on the configuration, including any results from the "self check" of the artifact produced using that configuration.
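Once the configuration passes its self check, one common next step (a sketch, not the only workflow) is to persist it as a named Datasource on the context. This assumes the {data_base_directory} placeholder has already been substituted with a real path, and the name "my_datasource" is illustrative:

import yaml

# Parse the tested YAML and register it as a named Datasource
my_context.add_datasource(
    name="my_datasource",
    **yaml.safe_load(config)
)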
Evaluation Parameter Stores
An Evaluation Parameter Store is a kind of Metric Store that makes it possible to build Expectation Suites that depend on values from other batches of data, such as ensuring that the number of rows in a downstream dataset equals the number of unique values from an upstream one. A Data Context can manage a store to facilitate that validation scenario.
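As a hedged illustration of that scenario, a downstream Expectation can reference an upstream metric through a $PARAMETER URN. Here downstream_batch is assumed to be a batch (or Validator) obtained from the context, and the suite name, column, and URN details are assumptions for the example:

# Expect the downstream row count to equal the count of unique user_id
# values observed when the hypothetical "upstream.users" suite last ran
downstream_batch.expect_table_row_count_to_equal(
    value={
        "$PARAMETER": (
            "urn:great_expectations:validations:upstream.users"
            ":expect_column_unique_value_count_to_be_between"
            ".result.observed_value:column=user_id"
        )
    }
)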