How to validate data by running a Checkpoint

Version: 0.14.13

This guide will help you Validate (apply an Expectation Suite to a Batch) your data by running a Checkpoint (the primary means for validating data in a production deployment of Great Expectations).

As stated in the Getting Started Tutorial Step 4: Validate data, the best way to Validate data in production with Great Expectations is using a Checkpoint. The advantage of using a Checkpoint is ease of use: its principal capability is combining existing configuration in order to set up and perform the Validation. It brings together:

Expectation Suites: A collection of verifiable assertions about data.
Data Connectors: Provide the configuration details, based on the source data system, which are needed by a Datasource to define Data Assets.
Batch Requests: Provided to a Datasource in order to create a Batch.
Actions: A Python class with a run method that takes a Validation Result and does something with it.

Otherwise, each of these validation parameters would have to be configured through the API. A Checkpoint encapsulates this "boilerplate" and ensures that all components work together. Finally, running a configured Checkpoint is a one-liner, as described below.
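To make the "combining existing configuration" idea concrete, here is a minimal sketch of how a Checkpoint configuration might reference the four components listed above, written as a plain Python dict. The key names follow the layout commonly seen in Checkpoint configurations for this version, but treat them as assumptions and check them against your own Checkpoint file. Every name (my_datasource, my_data_connector, my_data_asset, my_suite) is hypothetical, and summarize_components is an illustrative helper, not part of Great Expectations.

```python
# Hypothetical Checkpoint configuration, shown as a dict for illustration only.
# Key names are assumptions based on typical Checkpoint files; verify locally.
checkpoint_config = {
    "name": "my_checkpoint",
    "config_version": 1.0,
    "class_name": "Checkpoint",
    "validations": [
        {
            # Batch Request: tells a Datasource which Batch to create.
            "batch_request": {
                "datasource_name": "my_datasource",
                "data_connector_name": "my_data_connector",
                "data_asset_name": "my_data_asset",
            },
            # Expectation Suite to apply to that Batch.
            "expectation_suite_name": "my_suite",
        }
    ],
    # Actions to run on each Validation Result.
    "action_list": [
        {
            "name": "store_validation_result",
            "action": {"class_name": "StoreValidationResultAction"},
        },
        {
            "name": "update_data_docs",
            "action": {"class_name": "UpdateDataDocsAction"},
        },
    ],
}


def summarize_components(config):
    """Illustrative helper: list the components a Checkpoint config references."""
    batch_requests = [v["batch_request"] for v in config["validations"]]
    return {
        "expectation_suites": [
            v["expectation_suite_name"] for v in config["validations"]
        ],
        "data_connectors": [b["data_connector_name"] for b in batch_requests],
        "data_assets": [b["data_asset_name"] for b in batch_requests],
        "actions": [a["action"]["class_name"] for a in config["action_list"]],
    }
```

Calling summarize_components(checkpoint_config) returns the suite, connector, asset, and Action names that this one configuration ties together. Without a Checkpoint, each of them would have to be wired up separately.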

Prerequisites: This how-to guide assumes you have:

Completed the Getting Started Tutorial.
A working installation of Great Expectations.
Configured a Data Context.
Configured an Expectation Suite.
Configured a Checkpoint.

You can run the Checkpoint from the CLI (Command Line Interface) in a Terminal shell or using Python.

Terminal

Steps

1. Run your Checkpoint

Checkpoints can be run like applications from the command line by running:

great_expectations checkpoint run my_checkpoint
Validation failed!

2. Observe the output

The output of your validation will tell you if all validations passed or if any failed.

Additional notes

This command will return POSIX exit status codes and print messages as follows:

+--------------------------------+-----------------+-----------------------+
| **Situation**                  | **Return code** | **Message**           |
+--------------------------------+-----------------+-----------------------+
| all validations passed         | 0               | Validation succeeded! |
+--------------------------------+-----------------+-----------------------+
| one or more validations failed | 1               | Validation failed!    |
+--------------------------------+-----------------+-----------------------+
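Because the command reports its outcome through the return code, a calling program can branch on it. The sketch below wraps the CLI command from step 1 with the standard library's subprocess module. The helper names build_command and checkpoint_passed are hypothetical, and treating every nonzero code as a failure is an assumption that goes slightly beyond the two cases in the table.

```python
import subprocess
from typing import Callable, List


def build_command(checkpoint_name: str) -> List[str]:
    """Return the CLI invocation from step 1 as an argument list."""
    return ["great_expectations", "checkpoint", "run", checkpoint_name]


def checkpoint_passed(checkpoint_name: str, run: Callable = subprocess.run) -> bool:
    """Run the Checkpoint through the CLI and report whether it passed.

    `run` is injectable so the logic can be exercised without the CLI installed.
    """
    completed = run(build_command(checkpoint_name))
    # 0 means all validations passed; any other code is treated as a failure.
    return completed.returncode == 0
```

For example, a wrapper script could call checkpoint_passed("my_checkpoint") and skip a downstream load step when it returns False.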

Python

Steps

1. Generate the Python script

From your console, run the CLI command:

great_expectations checkpoint script my_checkpoint

After the command runs, you will see a message, similar to the one below, about where the Python script was created:

A Python script was created that runs the checkpoint named: `my_checkpoint`
  - The script is located in `great_expectations/uncommitted/run_my_checkpoint.py`
  - The script can be run with `python great_expectations/uncommitted/run_my_checkpoint.py`

2. Open the script

The script that was produced should look like this:

"""
This is a basic generated Great Expectations script that runs a Checkpoint.

Checkpoints are the primary method for validating batches of data in production and triggering any followup actions.

A Checkpoint facilitates running a validation as well as configurable Actions such as updating Data Docs, sending a
notification to team members about Validation Results, or storing a result in a shared cloud storage.

Checkpoints can be run directly without this script using the `great_expectations checkpoint run` command.  This script
is provided for those who wish to run Checkpoints in Python.

Usage:
- Run this file: `python great_expectations/uncommitted/run_my_checkpoint.py`.
- This can be run manually or via a scheduler, such as cron.
- If your pipeline runner supports Python snippets, then you can paste this into your pipeline.
"""
import sys

from great_expectations.checkpoint.types.checkpoint_result import CheckpointResult
from great_expectations.data_context import DataContext

data_context: DataContext = DataContext(
    context_root_dir="/path/to/great_expectations"
)

result: CheckpointResult = data_context.run_checkpoint(
    checkpoint_name="my_checkpoint",
    batch_request=None,
    run_name=None,
)

if not result["success"]:
    print("Validation failed!")
    sys.exit(1)

print("Validation succeeded!")
sys.exit(0)


3. Run the script

This Python script can then be invoked directly using Python:

python great_expectations/uncommitted/run_my_checkpoint.py

Alternatively, the above Python code can be embedded in your pipeline.
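When embedding the code in a pipeline, calling sys.exit() may not be desirable, because it would terminate the whole pipeline process. A minimal sketch of one alternative is to wrap the same run_checkpoint() call in a function that raises an exception on failure. Both run_checkpoint_or_raise and ValidationFailedError are hypothetical names introduced here, and the function assumes you pass in a Data Context created as in the generated script.

```python
class ValidationFailedError(RuntimeError):
    """Hypothetical exception raised when a Checkpoint run does not succeed."""


def run_checkpoint_or_raise(data_context, checkpoint_name: str):
    """Run a Checkpoint and raise instead of exiting the process on failure.

    `data_context` is expected to be a Data Context like the one built in the
    generated script; the call below mirrors that script's arguments.
    """
    result = data_context.run_checkpoint(
        checkpoint_name=checkpoint_name,
        batch_request=None,
        run_name=None,
    )
    if not result["success"]:
        raise ValidationFailedError(
            f"Validation failed for checkpoint: {checkpoint_name}"
        )
    return result
```

A pipeline task can then call run_checkpoint_or_raise(data_context, "my_checkpoint") and let the orchestrator's normal error handling mark the task as failed.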

Additional Notes

Other arguments to the DataContext.run_checkpoint() method may be required, depending on how much of the Checkpoint's configuration was previously saved in the configuration file of the Checkpoint with the corresponding name.
Any Checkpoint configuration specified dynamically, as runtime arguments to DataContext.run_checkpoint(), must complement the settings in the Checkpoint configuration file so that, together, they comprise a complete and valid configuration for the Checkpoint with the given name.
Please see How to configure a new Checkpoint using test_yaml_config for more Checkpoint configuration examples (including the convenient templating mechanism) and DataContext.run_checkpoint() invocation options.
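As an illustration of complementing a saved configuration at runtime, the sketch below supplies a batch_request when calling run_checkpoint(), for the case where the saved Checkpoint names an Expectation Suite but not the data to validate. Passing the batch request as a plain dict, and the three keys used in it, are assumptions to verify against your installed version. The datasource, connector, and asset names are hypothetical, as is the helper run_with_batch_request.

```python
# Hypothetical runtime batch request; replace the names with ones defined
# in your own Data Context. Key names are assumptions to verify locally.
runtime_batch_request = {
    "datasource_name": "my_datasource",
    "data_connector_name": "my_data_connector",
    "data_asset_name": "my_data_asset",
}


def run_with_batch_request(data_context, checkpoint_name: str, batch_request: dict):
    """Run a saved Checkpoint, filling in its batch request at runtime."""
    return data_context.run_checkpoint(
        checkpoint_name=checkpoint_name,
        batch_request=batch_request,  # complements the saved configuration
        run_name=None,
    )
```

If the saved configuration already contains a complete batch request, the runtime argument can stay None, as in the generated script.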
