How to create a new Checkpoint
This guide will help you create a new Checkpoint, which allows you to couple an Expectation Suite with a data set to Validate. A Checkpoint is the primary means for validating data in a production deployment of Great Expectations, an Expectation Suite is a collection of verifiable assertions about data, and to Validate is to apply an Expectation Suite to a Batch.
Note: As of Great Expectations version 0.13.7, we have updated and improved the Checkpoints feature. You can continue to use your existing legacy Checkpoint workflows if you’re working with concepts from the Batch Kwargs (v2) API. If you’re using concepts from the BatchRequest (v3) API, please refer to the new Checkpoints guides.
Steps for Checkpoints (>=0.13.12)
This how-to guide assumes you have already:
1. Use the CLI to open a Jupyter Notebook for creating a new Checkpoint
To assist you with creating Checkpoints, our CLI (Command Line Interface) has a convenience method that will open a Jupyter Notebook with all the scaffolding you need to configure and save your Checkpoint. Run the following CLI command from your Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components):
great_expectations checkpoint new my_checkpoint
2. Configure and save your Checkpoint in the provided Jupyter Notebook
The Jupyter Notebook that was opened in the previous step will guide you through the steps of creating a Checkpoint. It also includes a default configuration that you can edit to suit your use case. The following sections of this document walk you through an example of how to do this.
3. Configuring a SimpleCheckpoint (Example)
3a. Edit the configuration
For this example, we’ll demonstrate using a basic Checkpoint configuration with the SimpleCheckpoint class, which takes care of some defaults. Replace all names such as my_datasource with the respective Datasource, Data Connector, Data Asset, and Expectation Suite names you have configured in your great_expectations.yml. A Datasource provides a standard API for accessing and interacting with data from a wide variety of source systems. A Data Connector provides the configuration details, based on the source data system, that a Datasource needs to define Data Assets. A Data Asset is a collection of records within a Datasource, usually named based on the underlying data system and sliced to correspond to a desired specification.
config = """
name: my_checkpoint
config_version: 1
class_name: SimpleCheckpoint
validations:
  - batch_request:
      datasource_name: my_datasource
      data_connector_name: my_data_connector
      data_asset_name: MyDataAsset
      data_connector_query:
        index: -1
    expectation_suite_name: my_suite
"""
This is the minimum required to configure a Checkpoint that will run the Expectation Suite my_suite against the Data Asset MyDataAsset. See How to configure a new Checkpoint using test_yaml_config for advanced configuration options.
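For orientation, the nesting in the example configuration can be sketched as the plain Python structure below. The dictionary mirrors the example above. The helper missing_batch_request_keys is hypothetical and not part of Great Expectations, and the keys it checks are simply the ones used in this guide's example, not an authoritative schema.

```python
# The example configuration, written as the nested structure the YAML describes.
checkpoint_config = {
    "name": "my_checkpoint",
    "config_version": 1,
    "class_name": "SimpleCheckpoint",
    "validations": [
        {
            "batch_request": {
                "datasource_name": "my_datasource",
                "data_connector_name": "my_data_connector",
                "data_asset_name": "MyDataAsset",
                "data_connector_query": {"index": -1},
            },
            # Sibling of batch_request inside the same validations entry.
            "expectation_suite_name": "my_suite",
        }
    ],
}

# Keys used by the batch_request in this guide's example. This tuple is an
# assumption for illustration, not a schema taken from the library.
EXAMPLE_BATCH_REQUEST_KEYS = (
    "datasource_name",
    "data_connector_name",
    "data_asset_name",
)


def missing_batch_request_keys(config):
    """Return (validation_index, key) pairs for example keys absent from each batch_request."""
    missing = []
    for index, validation in enumerate(config.get("validations", [])):
        batch_request = validation.get("batch_request", {})
        for key in EXAMPLE_BATCH_REQUEST_KEYS:
            if key not in batch_request:
                missing.append((index, key))
    return missing


print(missing_batch_request_keys(checkpoint_config))  # prints []
```

A quick check like this can catch a misspelled or mis-indented key before the configuration is handed to the Data Context. It does not replace the validation that Great Expectations itself performs.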
3b. Test your config using context.test_yaml_config
context.test_yaml_config(yaml_config=config)
When executed, test_yaml_config will instantiate the component and run through a self_check procedure to verify that the component works as expected.
In the case of a Checkpoint, this means:
- validating the YAML configuration,
- verifying that the Checkpoint class with the given configuration, if valid, can be instantiated, and
- printing warnings when parts of the configuration, while valid, are incomplete and need to be better specified for the Checkpoint to run successfully.
The output will look something like this:
Attempting to instantiate class from config...
Instantiating as a SimpleCheckpoint, since class_name is SimpleCheckpoint
Successfully instantiated SimpleCheckpoint
Checkpoint class name: SimpleCheckpoint
If something about your configuration wasn’t set up correctly, test_yaml_config will raise an error.
3c. Store your Checkpoint config
After you are satisfied with your configuration, save it by running the appropriate cells in the Jupyter Notebook.
3d. (Optional) Check your stored Checkpoint config
If the Store backend of your Checkpoint Store is on the local filesystem, you can navigate to the checkpoints store directory that is configured in great_expectations.yml and find the configuration files corresponding to the Checkpoints you created. A Store is a connector to store and retrieve information about metadata in Great Expectations, and a Checkpoint Store is the Store that does this for Checkpoints.
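If you prefer to check from Python rather than a file browser, a sketch along these lines lists what is in the store directory. It assumes a filesystem-backed Checkpoint Store that writes one .yml file per Checkpoint. The helper name list_stored_checkpoints is hypothetical, and the example path in the comment is only an assumed default, so use the directory configured in your great_expectations.yml.

```python
from pathlib import Path


def list_stored_checkpoints(store_dir):
    """Return the sorted file stems of the .yml files found directly in store_dir."""
    directory = Path(store_dir)
    if not directory.is_dir():
        # Nothing to list if the directory does not exist.
        return []
    return sorted(path.stem for path in directory.glob("*.yml"))


# Example call; the path is an assumption, not necessarily your configured location:
# print(list_stored_checkpoints("great_expectations/checkpoints"))
```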
3e. (Optional) Test run the new Checkpoint and open Data Docs
Now that you have stored your Checkpoint configuration to the Store backend configured for the Checkpoint Configuration store of your Data Context, you can also test context.run_checkpoint right within your Jupyter Notebook by running the appropriate cells.
Before running a Checkpoint, make sure that all classes and Expectation Suites referred to in the configuration exist.
When run_checkpoint returns, the checkpoint_run_result can then be checked for the value of the success field (all validations passed) and other information associated with running the specified Actions. An Action is a Python class with a run method that takes a Validation Result and does something with it.
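Checking the success field can be sketched as follows. This assumes context is your Data Context, that the Checkpoint was stored under the name passed in, and that the returned result supports key access such as result["success"]. The helper name checkpoint_passed is hypothetical.

```python
def checkpoint_passed(context, checkpoint_name):
    """Run the named Checkpoint and return True only if its success field is truthy."""
    checkpoint_run_result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    # The success field reflects whether all validations passed.
    passed = bool(checkpoint_run_result["success"])
    status = "passed" if passed else "failed"
    print(f"Checkpoint {checkpoint_name!r} {status}")
    return passed


# Typical use inside the notebook (context comes from an earlier cell):
# if not checkpoint_passed(context, "my_checkpoint"):
#     print("Review the validation results in Data Docs.")
```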
For more advanced configurations of Checkpoints, please see How to configure a new Checkpoint using test_yaml_config.