Version: 0.17.23

Validate data with Expectations and Checkpoints

This guide shows you how to pass an in-memory DataFrame to a Checkpoint that is defined at runtime. A Checkpoint is the primary means for validating data in a production deployment of Great Expectations. This approach is especially useful when your data is already in memory because of an existing process, such as a pipeline runner.

The full script used in the following code examples is available on GitHub: how_to_pass_an_in_memory_dataframe_to_a_checkpoint.py.

Set up Great Expectations

Run the following code to import the required libraries and load your Data Context:

import pandas

import great_expectations as gx

context = gx.get_context()

Read a DataFrame and create a Checkpoint

The following example uses a read_* method on the PandasDatasource (here, read_dataframe) to directly return a Validator, which is used to run an Expectation Suite against data. To use Validators to interactively build an Expectation Suite, see How to create Expectations interactively in Python. The Validator can then be passed directly to a Checkpoint:

df = pandas.read_csv("./data/yellow_tripdata_sample_2019-01.csv")

validator = context.sources.add_pandas("taxi_datasource").read_dataframe(
    df, asset_name="taxi_frame", batch_metadata={"year": "2019", "month": "01"}
)
# Saving the suite allows the Checkpoint to reference it by name.
validator.save_expectation_suite()

checkpoint = context.add_or_update_checkpoint(
    name="my_taxi_validator_checkpoint", validator=validator
)

checkpoint_result = checkpoint.run()

Alternatively, you can use add_* methods to add the asset and then retrieve a Batch Request, which is provided to a Datasource in order to create a Batch. This approach is consistent with how other Data Assets work, and it lets you integrate in-memory data with other Batch Request workflows and configurations.

dataframe_asset = context.sources.add_pandas(
    "my_taxi_validator_checkpoint"
).add_dataframe_asset(
    name="taxi_frame", dataframe=df, batch_metadata={"year": "2019", "month": "01"}
)
context.add_or_update_expectation_suite("my_expectation_suite")

batch_request = dataframe_asset.build_batch_request()

checkpoint = context.add_or_update_checkpoint(
    name="my_taxi_dataframe_checkpoint",
    batch_request=batch_request,
    expectation_suite_name="my_expectation_suite",
)

checkpoint_result = checkpoint.run()
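Each example ends by calling checkpoint.run(), which returns a result object. If the Checkpoint runs inside a pipeline runner, you will probably want to stop downstream steps when validation fails. The following is a minimal sketch of that idea and is not part of the original script: require_success and ValidationFailed are hypothetical names, and the sketch assumes only that the result exposes a boolean success attribute. Stand-in objects are used so that it runs without a Data Context.

```python
from types import SimpleNamespace


class ValidationFailed(RuntimeError):
    """Raised when a Checkpoint run reports success == False."""


def require_success(result, label="checkpoint"):
    """Return the result unchanged, or raise if validation did not pass.

    Assumption: `result` has a boolean `success` attribute.
    """
    if not result.success:
        raise ValidationFailed(f"{label} did not pass validation")
    return result


# Stand-in objects for illustration only. In the examples above you would
# pass checkpoint_result instead.
passing = SimpleNamespace(success=True)
failing = SimpleNamespace(success=False)

require_success(passing, label="my_taxi_dataframe_checkpoint")

try:
    require_success(failing, label="my_taxi_dataframe_checkpoint")
except ValidationFailed as err:
    print(err)
```

Raising a dedicated exception, rather than returning False, lets most pipeline runners mark the step as failed without extra wiring.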

In both examples, batch_metadata is an optional parameter that associates metadata with the Batch or DataFrame. When you work with several DataFrames, this metadata can help you tell their Validation Results apart.
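When many DataFrames come from similarly named files, you might derive batch_metadata from the file name instead of typing it by hand. The sketch below is one way to do that under a stated assumption: the file name ends in a YYYY-MM suffix, as in the sample file used above. The helper batch_metadata_from_path is hypothetical and is not part of Great Expectations.

```python
import re
from pathlib import Path

# Assumed naming convention: the file stem ends in YYYY-MM,
# as in "yellow_tripdata_sample_2019-01.csv".
_YEAR_MONTH = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})$")


def batch_metadata_from_path(path):
    """Build a batch_metadata dict from a file name ending in YYYY-MM."""
    match = _YEAR_MONTH.search(Path(path).stem)
    if match is None:
        raise ValueError(f"no YYYY-MM suffix in file name: {path}")
    return {"year": match.group("year"), "month": match.group("month")}


metadata = batch_metadata_from_path("./data/yellow_tripdata_sample_2019-01.csv")
print(metadata)  # {'year': '2019', 'month': '01'}
```

The resulting dictionary has the same shape as the batch_metadata argument in the examples above, so it can be passed in its place.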