Expectation Suite
Overview
Definition
An Expectation Suite is a collection of verifiable assertions about data.
Features and promises
Expectation Suites combine multiple Expectations into an overall description of data. For example, a team can group all the Expectations about a given table in a given database into an Expectation Suite and call it my_database.my_table. Note that these names are completely flexible: the only constraint on the name of a suite is that it must be unique within a given project.
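To picture what "a named collection of Expectations" amounts to, here is a minimal sketch in plain Python. It does not use the Great Expectations library: `ExpectationConfig`, `Suite`, and `Project` are hypothetical stand-ins, and the registry only models the single naming rule stated above (unique within a project).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ExpectationConfig:
    """Hypothetical stand-in for one Expectation: a type plus its kwargs."""
    expectation_type: str
    kwargs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Suite:
    """Hypothetical stand-in for an Expectation Suite: a named list of Expectations."""
    name: str
    expectations: List[ExpectationConfig] = field(default_factory=list)


class Project:
    """Hypothetical project registry that enforces unique suite names."""

    def __init__(self) -> None:
        self._suites: Dict[str, Suite] = {}

    def add_suite(self, suite: Suite) -> None:
        # The only naming constraint modeled here: a name may appear once per project.
        if suite.name in self._suites:
            raise ValueError(f"Suite name already in use: {suite.name}")
        self._suites[suite.name] = suite

    def get_suite(self, name: str) -> Suite:
        return self._suites[name]


# Group every Expectation about one table into a single suite.
table_suite = Suite(
    name="my_database.my_table",
    expectations=[
        ExpectationConfig("expect_column_values_to_not_be_null", {"column": "id"}),
        ExpectationConfig("expect_column_values_to_be_unique", {"column": "id"}),
    ],
)

project = Project()
project.add_suite(table_suite)
```

Registering a second suite under the name my_database.my_table raises a ValueError in this sketch, while any other name is accepted.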
Relationship to other objects
Expectation Suites are stored in an Expectation Store. They are generated interactively using a Validator or automatically using Profilers, and are used by Checkpoints to Validate data.
Use cases
Create Expectations
The lifecycle of an Expectation Suite starts with creating it. Then it goes through an iterative loop of Review and Edit as the team's understanding of the data described by the suite evolves.
Expectation Suites are largely managed automatically in the workflows for creating Expectations. When the Expectations are created, an Expectation Suite is created to contain them. In the Profiling workflow, this Expectation Suite will contain all the Expectations generated by the Profiler. In the interactive workflow, an Expectation Suite will be configured to include Expectations as they are defined, but it will not be saved to an Expectation Store until you explicitly save it.
For more information on these processes, please see:
- Our overview on the process of Creating Expectations
- Our guide on how to create and edit Expectations with a Profiler
- Our guide on how to create and edit Expectations with instant feedback from a sample Batch of data
Validate Data
Expectation Suites are used during the Validation of data. In this step, you will need to provide one or more Expectation Suites to a Checkpoint. You can do this either by configuring the Checkpoint to use a preset list of one or more Expectation Suites, or by configuring the Checkpoint to accept a list of one or more Expectation Suites at runtime.
Features
CRUD operations
A Great Expectations Expectation Suite enables you to perform Create, Read, Update, and Delete (CRUD) operations on the Suite's Expectations without needing to re-run them.
Reusability
Expectation Suites are primarily used by Checkpoints, which can accept a list of one or more Expectation Suite and Batch Request pairs. Because they are stored independently of the Checkpoints that use them, the same Expectation Suite can be included in the list for multiple Checkpoints, provided the Expectation Suite contains a list of Expectations that describe the data that Checkpoint will Validate. You can even use the same Expectation Suite multiple times within the same Checkpoint by pairing it with different Batch Requests.
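The pairing of suites with Batch Requests can be pictured as a list of validations. The sketch below is plain Python data, modeled loosely on the validations section of a Checkpoint configuration (keys batch_request and expectation_suite_name). The datasource, data connector, asset, and Checkpoint names are hypothetical placeholders, and the helper functions are illustrative only. It shows one suite reused against two different Batch Requests inside the same Checkpoint.

```python
from typing import Any, Dict, List


def batch_request(asset: str) -> Dict[str, str]:
    """Build a Batch Request dict for one data asset (all names hypothetical)."""
    return {
        "datasource_name": "my_datasource",
        "data_connector_name": "my_data_connector",
        "data_asset_name": asset,
    }


def pair_suite_with_batches(suite_name: str, assets: List[str]) -> List[Dict[str, Any]]:
    """Pair one Expectation Suite with a separate Batch Request for each asset."""
    return [
        {"batch_request": batch_request(asset), "expectation_suite_name": suite_name}
        for asset in assets
    ]


# The same suite validates two monthly extracts of the same table.
validations = pair_suite_with_batches(
    "my_database.my_table",
    ["my_table_2023_01", "my_table_2023_02"],
)

checkpoint_config = {
    "name": "my_checkpoint",  # hypothetical Checkpoint name
    "validations": validations,
}
```

Because the suite is referenced by name rather than embedded, the same name could also appear in the validations list of any other Checkpoint whose data it describes.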
API basics
CRUD operations
Each of the Expectation Suite methods that support a Create, Read, Update, or Delete (CRUD) operation relies on two main parameters, expectation_configuration and match_type:
- expectation_configuration: an ExpectationConfiguration object that is used to determine whether and where this Expectation already exists within the Suite. It can be a complete or a partial ExpectationConfiguration.
- match_type: a string with the value of domain, success, or runtime, which determines the criteria used for matching:
  - domain checks whether two Expectation Configurations apply to the same data. It results in the loosest match, and can use the least complete ExpectationConfiguration object. For example, for a column map Expectation, a domain match_type will check that the expectation_type matches, and that the column and any row_conditions that affect which rows are evaluated by the Expectation match.
  - success criteria are more exacting: in addition to the domain kwargs, these include the kwargs used when evaluating the success of an Expectation, like mostly, max, or value_set.
  - runtime criteria are the most specific: in addition to domain_kwargs and success_kwargs, these include kwargs used for runtime configuration. Currently, these include result_format, include_config, and catch_exceptions.
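To make the three match levels concrete, here is a minimal model in plain Python. It does not use the Great Expectations library: configurations are plain dicts, and the key sets are a hypothetical, simplified version of what a column map Expectation such as expect_column_values_to_be_in_set might declare. The helpers find_indexes, upsert, and remove are illustrative, not library methods (for instance, this remove deletes every match). They only touch configuration and never read data, which is why editing a suite does not require re-running its Expectations.

```python
from typing import Any, Dict, List, Tuple

Config = Dict[str, Any]

# Hypothetical, simplified key sets; each level includes the keys of the looser levels.
DOMAIN_KEYS: Tuple[str, ...] = ("column", "row_condition")
SUCCESS_KEYS: Tuple[str, ...] = DOMAIN_KEYS + ("value_set", "mostly")
RUNTIME_KEYS: Tuple[str, ...] = SUCCESS_KEYS + ("result_format", "include_config", "catch_exceptions")
KEYS_BY_MATCH_TYPE = {"domain": DOMAIN_KEYS, "success": SUCCESS_KEYS, "runtime": RUNTIME_KEYS}


def matches(existing: Config, probe: Config, match_type: str) -> bool:
    """True if both configs share a type and agree on every key for this match level."""
    if existing["expectation_type"] != probe["expectation_type"]:
        return False
    keys = KEYS_BY_MATCH_TYPE[match_type]
    # A key missing from either side counts as None, so partial probes work at loose levels.
    return all(existing["kwargs"].get(k) == probe["kwargs"].get(k) for k in keys)


def find_indexes(suite: List[Config], probe: Config, match_type: str = "domain") -> List[int]:
    """Read: positions of every Expectation in the suite that matches the probe."""
    return [i for i, cfg in enumerate(suite) if matches(cfg, probe, match_type)]


def upsert(suite: List[Config], new: Config, match_type: str = "domain") -> None:
    """Create or Update: replace a single match in place, or append if nothing matches."""
    hits = find_indexes(suite, new, match_type)
    if len(hits) > 1:
        raise ValueError("More than one Expectation matches; refine the probe or match_type")
    if hits:
        suite[hits[0]] = new
    else:
        suite.append(new)


def remove(suite: List[Config], probe: Config, match_type: str = "domain") -> None:
    """Delete: drop every matching Expectation from the suite."""
    suite[:] = [cfg for cfg in suite if not matches(cfg, probe, match_type)]


suite: List[Config] = [
    {"expectation_type": "expect_column_values_to_be_in_set",
     "kwargs": {"column": "status", "value_set": ["open", "closed"], "mostly": 0.95}},
]

# Update: a domain match finds the existing Expectation even though mostly differs.
upsert(suite, {"expectation_type": "expect_column_values_to_be_in_set",
               "kwargs": {"column": "status", "value_set": ["open", "closed"], "mostly": 0.99}})

# The old configuration still matches at the domain level, but not at the success level.
old_probe = {"expectation_type": "expect_column_values_to_be_in_set",
             "kwargs": {"column": "status", "value_set": ["open", "closed"], "mostly": 0.95}}
```

After the update the suite still holds a single Expectation: the looser domain match located it for replacement, whereas a success match would have treated the new mostly value as a different Expectation and appended it.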
How to access
You will rarely need to directly access an Expectation Suite. If you do need to edit one, the simplest way is through the CLI. To do so, run the command:
great_expectations suite edit NAME_OF_YOUR_SUITE_HERE
This will open a Jupyter Notebook where each Expectation in the Expectation Suite is loaded as an individual cell. You can edit, remove, and add Expectations in this list. Running the cells will create the Expectations in a new Expectation Suite, which you can then save over the old Expectation Suite or save under a new name. However, neither the Expectation Suite nor any changes you made will be stored until you explicitly save it.
In almost all other circumstances you will simply pass the name of any relevant Expectation Suites to an object such as a Checkpoint that will manage accessing and using it for you.
Saving Expectation Suites
Each Expectation Suite is saved in an Expectation Store; by default, this is a JSON file in the great_expectations/expectations subdirectory of the Data Context. Best practice is for users to check these files into version control each time they are updated, in the same way they treat their source files. This discipline allows data quality to be an integral part of versioned pipeline releases.
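As a rough illustration of what lands in that directory, the sketch below serializes a suite to JSON and computes where a filesystem store could place it. It assumes that dots in a suite name become nested directories, so my_database.my_table is written to expectations/my_database/my_table.json; treat that layout, the JSON fields shown, and the helper names suite_path and save_suite as assumptions rather than a specification of the store's format. Sorting keys and indenting the output is a choice made in this sketch so that version-control diffs stay small and readable.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def suite_path(root: str, suite_name: str) -> Path:
    """Map a suite name to a JSON path, one directory level per dot (assumed layout)."""
    *dirs, leaf = suite_name.split(".")
    return Path(root, "expectations", *dirs, leaf + ".json")


def save_suite(root: str, suite: Dict[str, Any]) -> Path:
    """Write the suite as indented, key-sorted JSON and return the file path."""
    path = suite_path(root, suite["expectation_suite_name"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(suite, indent=2, sort_keys=True) + "\n")
    return path


suite = {
    "expectation_suite_name": "my_database.my_table",
    "expectations": [
        {"expectation_type": "expect_column_values_to_not_be_null",
         "kwargs": {"column": "id"}, "meta": {}},
    ],
    "meta": {},
}

# Write into a throwaway directory standing in for a project's great_expectations folder.
with tempfile.TemporaryDirectory() as tmp:
    written = save_suite(str(Path(tmp) / "great_expectations"), suite)
    reloaded = json.loads(written.read_text())
    relative = written.relative_to(tmp).as_posix()
```

Reloading the file yields the same dict that was saved, which is what makes these JSON files practical to review and diff in version control.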
You can save an Expectation Suite by using a Validator's save_expectation_suite() method. A call to this method is included in the last cell of any Jupyter Notebook launched from the CLI for the purpose of creating or editing Expectations.