Version: 0.16.16

How to pass an in-memory DataFrame to a Checkpoint

This guide will help you pass an in-memory DataFrame to an existing Checkpoint (the primary means for validating data in a production deployment of Great Expectations). This is especially useful if you already have your data in memory due to an existing process such as a pipeline runner.

Steps

1. Set up Great Expectations

Import the required libraries and load your Data Context:

import pandas
import great_expectations as gx
from great_expectations.checkpoint import SimpleCheckpoint

context = gx.get_context()

2. Read a DataFrame and create a Checkpoint

The following example uses a read_* method on the PandasDatasource to directly return a Validator (used to run an Expectation Suite against data). To use Validators to interactively build an Expectation Suite, see How to create Expectations interactively in Python. The Validator can be passed directly to a SimpleCheckpoint:

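# Load the sample data into an in-memory pandas DataFrame.
df = pandas.read_csv("./data/yellow_tripdata_sample_2019-01.csv")

# Register a pandas Datasource and read the DataFrame, which returns a Validator.
validator = context.sources.add_pandas("taxi_datasource").read_dataframe(
    df, asset_name="taxi_frame", batch_metadata={"year": "2019", "month": "01"}
)

# Build a Checkpoint directly from the Validator.
checkpoint = SimpleCheckpoint(
    name="my_taxi_validator_checkpoint",
    data_context=context,
    validator=validator,
)

# Run the Checkpoint to validate the DataFrame.
checkpoint_result = checkpoint.run()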

Alternatively, you can use add_* methods to add the asset and then retrieve a Batch Request (provided to a Datasource in order to create a Batch). This method is consistent with how other Data Assets work, and can integrate in-memory data with other Batch Request workflows and configurations.

# Register a pandas Datasource and add the in-memory DataFrame as a Data Asset.
dataframe_asset = context.sources.add_pandas(
    "my_taxi_validator_checkpoint"  # name of the Datasource
).add_dataframe_asset(
    name="taxi_frame", dataframe=df, batch_metadata={"year": "2019", "month": "01"}
)
# Create the Expectation Suite that the Checkpoint will run.
context.add_or_update_expectation_suite("my_expectation_suite")

# Build a Batch Request that refers to the DataFrame asset.
batch_request = dataframe_asset.build_batch_request()

checkpoint = SimpleCheckpoint(
    name="my_taxi_dataframe_checkpoint",
    data_context=context,
    batch_request=batch_request,
    # Must match the name of the Expectation Suite created above.
    expectation_suite_name="my_expectation_suite",
)

checkpoint_result = checkpoint.run()
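
If the Checkpoint runs inside a pipeline runner, you will usually want the pipeline step to fail when validation fails. The following is a minimal sketch of one way to do that. It assumes the object returned by checkpoint.run() exposes a boolean success attribute; require_success is a hypothetical helper name, not part of Great Expectations, and the SimpleNamespace object only stands in for a real result so the sketch runs without a Data Context.

```python
from types import SimpleNamespace


def require_success(checkpoint_result) -> None:
    """Raise RuntimeError if the Checkpoint run reported a failed validation.

    Assumption: `checkpoint_result` has a boolean `success` attribute.
    """
    if not checkpoint_result.success:
        raise RuntimeError("Great Expectations validation failed")


# Stand-in for a real result object, used only to demonstrate the helper.
require_success(SimpleNamespace(success=True))  # returns without raising
```

In a real pipeline you would call require_success(checkpoint_result) immediately after checkpoint.run(), so that downstream steps are skipped when the data does not meet its Expectations.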

In both examples, batch_metadata is an optional parameter that associates metadata with the Batch or DataFrame. When you work with several DataFrames, this can help you distinguish their Validation Results.
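
The batch_metadata values do not have to be typed by hand. As an illustration, the sketch below derives the year and month from a file name like the one used in this guide. batch_metadata_from_path is a hypothetical helper, not part of Great Expectations, and it assumes the file name contains a YYYY-MM token.

```python
import re
from pathlib import Path

# Matches a YYYY-MM token such as "2019-01" (assumed naming convention).
_YEAR_MONTH = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")


def batch_metadata_from_path(path: str) -> dict:
    """Return {"year": ..., "month": ...} parsed from the file name.

    Returns an empty dict when the name has no YYYY-MM token.
    """
    match = _YEAR_MONTH.search(Path(path).stem)
    return match.groupdict() if match else {}


metadata = batch_metadata_from_path("./data/yellow_tripdata_sample_2019-01.csv")
print(metadata)  # {'year': '2019', 'month': '01'}
```

The resulting dictionary can be passed as the batch_metadata argument in either example above.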

Additional Notes

To view the full script used in this page, see it on GitHub: