Version: 0.16.16

How to validate data by running a Checkpoint

This guide will help you Validate your data by running a Checkpoint.

The best way to Validate data with Great Expectations is with a Checkpoint. A Checkpoint identifies which Expectation Suites to run against which Data Asset and Batch (described by a Batch Request), and which Actions to take based on the results of those tests.

Succinctly: Checkpoints are used to test your data and take action based on the results.
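To make the relationship between these pieces concrete, here is a toy model in plain Python. It is a minimal sketch, not the Great Expectations API: `ToyCheckpoint`, `Expectation`, and `Action` are hypothetical names. The suite is a dict of named checks applied to a Batch (a list of rows), and each Action is a callable that receives the combined result.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-ins for the concepts above; this is NOT the GX API.
Row = dict
Batch = list[Row]
Expectation = Callable[[Batch], bool]  # checks one property of a Batch
Action = Callable[[dict], None]        # reacts to a validation result


@dataclass
class ToyCheckpoint:
    name: str
    suite: dict[str, Expectation]  # "Expectation Suite": name -> check
    actions: list[Action] = field(default_factory=list)

    def run(self, batch: Batch) -> dict:
        # Validate: apply every Expectation in the suite to the Batch.
        results = {exp_name: check(batch) for exp_name, check in self.suite.items()}
        result = {"success": all(results.values()), "results": results}
        # Act: hand the combined result to each Action, in order.
        for action in self.actions:
            action(result)
        return result


log = []
checkpoint = ToyCheckpoint(
    name="my_checkpoint",
    suite={
        "pickup_datetime_not_null": lambda rows: all(
            row.get("pickup_datetime") is not None for row in rows
        ),
    },
    actions=[lambda res: log.append("passed" if res["success"] else "failed")],
)

batch = [{"pickup_datetime": "2019-01-01 00:46:40"}, {"pickup_datetime": None}]
outcome = checkpoint.run(batch)  # one row has a null pickup_datetime, so it fails
```

The real Checkpoint adds configuration, storage in the Data Context, and richer results. The core loop is still "validate, then act".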

Prerequisites

You can run the Checkpoint from the CLI in a Terminal shell or using Python.

If you have already created and saved a Checkpoint, the following code snippet retrieves it from your Data Context and runs it:

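import sys

import great_expectations as gx

context = gx.get_context()

# Run the saved Checkpoint against the yellow_tripdata asset of taxi_source.
result = context.run_checkpoint(
    checkpoint_name="my_checkpoint",
    batch_request={
        "datasource_name": "taxi_source",
        "data_asset_name": "yellow_tripdata",
    },
    run_name=None,
)

# Exit with a non-zero status so a CI job or scheduler registers the failure.
if not result["success"]:
    print("Validation failed!")
    sys.exit(1)

print("Validation succeeded!")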
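Before exiting on failure, you may want to report which Expectations failed. One option is to walk the serialized validation results. This sketch rests on several assumptions. First, each suite validation result in the Checkpoint result can be turned into a plain dict, for example with `to_json_dict()` on the objects returned by `result.list_validation_results()`, if your version provides that method. Second, that dict has the `results` / `expectation_config` / `kwargs` shape shown. `failed_expectations` is a hypothetical helper, and `sample` is hand-written, not real GX output.

```python
from typing import Any


def failed_expectations(validation_result: dict[str, Any]) -> list[str]:
    """Hypothetical helper: list 'expectation_type(column)' for each failed result.

    Assumes the shape {"results": [{"success": bool, "expectation_config":
    {"expectation_type": str, "kwargs": {...}}}, ...]}.
    """
    failures = []
    for res in validation_result.get("results", []):
        if not res.get("success", False):
            config = res.get("expectation_config", {})
            column = config.get("kwargs", {}).get("column", "?")
            failures.append(f"{config.get('expectation_type')}({column})")
    return failures


# Hand-written sample in the assumed shape; not produced by GX.
sample = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": "pickup_datetime"},
            },
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_be_between",
                "kwargs": {"column": "passenger_count", "min_value": 1, "max_value": 6},
            },
        },
    ],
}

report = failed_expectations(sample)
# report == ["expect_column_values_to_be_between(passenger_count)"]
```

Printing `report` just before `sys.exit(1)` gives the CI log a short reason for the failure.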

If you do not have a Checkpoint yet, the prerequisite guides mentioned above walk you through the necessary steps. Alternatively, the concise example below shows how to connect to data, create an Expectation Suite using a Validator, and create a Checkpoint, saving everything to the Data Context along the way.

# setup
import sys
import great_expectations as gx

context = gx.get_context()

# starting from scratch, we add a datasource and asset
# data_directory is a placeholder (hypothetical path): point it at the folder
# that holds your yellow_tripdata_sample_YYYY-MM.csv files
data_directory = "./data"
datasource = context.sources.add_pandas_filesystem(
    name="taxi_source", base_directory=data_directory
)

asset = datasource.add_csv_asset(
    "yellow_tripdata",
    batching_regex=r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv",
    order_by=["-year", "month"],
)

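# create an empty expectation suite, then use a validator to add expectations to it
context.add_expectation_suite("yellow_tripdata_suite")
validator = context.get_validator(
    datasource_name="taxi_source",
    data_asset_name="yellow_tripdata",
    expectation_suite_name="yellow_tripdata_suite",
)
validator.expect_column_values_to_not_be_null("pickup_datetime")
# persist the validator's expectations to the suite stored in the data context
validator.save_expectation_suite(discard_failed_expectations=False)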

# create a checkpoint
checkpoint = gx.checkpoint.SimpleCheckpoint(
    name="my_checkpoint",
    data_context=context,
    expectation_suite_name="yellow_tripdata_suite",
)

# add (save) the checkpoint to the data context
context.add_checkpoint(checkpoint=checkpoint)
cp = context.get_checkpoint(name="my_checkpoint")
assert cp.name == "my_checkpoint"
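To check which files the `batching_regex` and `order_by` settings in the example above select, you can try the pattern on file names with plain Python. This sketch only emulates the idea under several assumptions. Each matching file is treated as one Batch keyed by its `year` and `month` groups. `"-year"` is read as descending and `"month"` as ascending. A name is required to match the whole pattern. The file names are made up, and GX's own matching and sorting code is not shown. The dot before `csv` is escaped so that it matches only a literal `.`.

```python
import re

# Batching pattern for the yellow_tripdata asset, with the "." before csv escaped.
BATCHING_REGEX = re.compile(
    r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
)

# Made-up directory listing for illustration.
files = [
    "yellow_tripdata_sample_2019-01.csv",
    "yellow_tripdata_sample_2020-02.csv",
    "yellow_tripdata_sample_2020-01.csv",
    "notes.txt",  # does not match, so it would not belong to the asset
]

# Each matching file becomes one candidate Batch, identified by its named groups.
batches = [m.groupdict() for f in files if (m := BATCHING_REGEX.fullmatch(f))]

# Emulate order_by=["-year", "month"]: year descending, then month ascending.
batches.sort(key=lambda b: (-int(b["year"]), int(b["month"])))

for b in batches:
    print(b["year"], b["month"])
```

The sort key negates the year so that one ascending sort yields "newest year first, months in calendar order within a year".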