How to pass an in-memory DataFrame to a Checkpoint
This guide shows you how to pass an in-memory DataFrame to an existing Checkpoint, the primary means for validating data in a production deployment of Great Expectations. This is especially useful if your data is already in memory because of an existing process, such as a pipeline runner.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A configured Data Context.
Steps
1. Set up Great Expectations
Import the required libraries and load your Data Context:
import pandas
import great_expectations as gx
from great_expectations.checkpoint import SimpleCheckpoint
context = gx.get_context()
2. Read a DataFrame and create a Checkpoint
The following example uses the read_* method on the PandasDatasource to directly return a Validator, which is used to run an Expectation Suite against data. To use Validators to interactively build an Expectation Suite, see How to create Expectations interactively in Python. The Validator can be passed directly to a SimpleCheckpoint:
# Load the sample data into an in-memory DataFrame.
df = pandas.read_csv("./data/yellow_tripdata_sample_2019-01.csv")
# Register a pandas Datasource and read the DataFrame into a Validator.
validator = context.sources.add_pandas("taxi_datasource").read_dataframe(
    df, asset_name="taxi_frame", batch_metadata={"year": "2019", "month": "01"}
)
# Pass the Validator directly to the Checkpoint and run it.
checkpoint = SimpleCheckpoint(
    name="my_taxi_validator_checkpoint",
    data_context=context,
    validator=validator,
)
checkpoint_result = checkpoint.run()
Alternatively, you can use add_* methods to add the asset and then retrieve a Batch Request, which is provided to a Datasource in order to create a Batch. This method is consistent with how other Data Assets work, and can integrate in-memory data with other Batch Request workflows and configurations.
# Add the DataFrame (df, loaded above) as a named asset on a pandas Datasource.
dataframe_asset = context.sources.add_pandas(
    "my_taxi_validator_checkpoint"
).add_dataframe_asset(
    name="taxi_frame", dataframe=df, batch_metadata={"year": "2019", "month": "01"}
)
# Create the Expectation Suite that the Checkpoint will run.
context.add_or_update_expectation_suite("my_expectation_suite")
batch_request = dataframe_asset.build_batch_request()
# The suite name here must match the suite created above.
checkpoint = SimpleCheckpoint(
    name="my_taxi_dataframe_checkpoint",
    data_context=context,
    batch_request=batch_request,
    expectation_suite_name="my_expectation_suite",
)
checkpoint_result = checkpoint.run()
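Whichever approach you use, a pipeline runner usually needs to act on the outcome of the run. The following is a minimal sketch, assuming only that the object returned by checkpoint.run() exposes a boolean success attribute. The names raise_if_failed and ValidationFailedError are hypothetical and are not part of Great Expectations.

```python
class ValidationFailedError(RuntimeError):
    """Hypothetical error type for a failed Checkpoint run (not a GX class)."""


def raise_if_failed(checkpoint_result, checkpoint_name="checkpoint"):
    """Stop the pipeline when a Checkpoint run did not succeed.

    Assumes `checkpoint_result` has a boolean `success` attribute.
    Returns the result unchanged when the run succeeded.
    """
    if not checkpoint_result.success:
        raise ValidationFailedError(
            f"Validation failed for {checkpoint_name!r}"
        )
    return checkpoint_result
```

In a pipeline step, this could be called as raise_if_failed(checkpoint.run(), "my_taxi_dataframe_checkpoint") so that downstream tasks run only on validated data.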
In both examples, batch_metadata is an optional parameter that can associate metadata with the batch or DataFrame. When you work with DataFrames, this can help you distinguish Validation results.
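When many DataFrames come from files that follow a naming scheme, the metadata can be derived rather than typed by hand. The following is a minimal sketch that uses only the standard library. The helper batch_metadata_from_path and its regular expression are hypothetical, and it assumes file names end in a year and month such as 2019-01, as the sample file above does.

```python
import re
from pathlib import Path

# Hypothetical pattern: a trailing "YYYY-MM" at the end of the file stem.
_YEAR_MONTH = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})$")


def batch_metadata_from_path(path):
    """Return {"year": ..., "month": ...} parsed from the file name.

    Returns an empty dict when the name does not end in "YYYY-MM".
    """
    match = _YEAR_MONTH.search(Path(path).stem)
    return match.groupdict() if match else {}
```

The returned dictionary can then be supplied as the batch_metadata argument in either example above.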
Additional Notes
To view the full script used in this page, see it on GitHub: