Validate multiple Batches from a Batch Request with a single Checkpoint
By default, a Checkpoint only validates the last Batch included in a Batch Request. This guide shows how to use a Python loop and the Checkpoint validations parameter to validate every Batch identified by a single Batch Request.
Prerequisites
Create a Batch Request with multiple Batches
The following Python code creates a Batch Request that includes every available Batch in a Data Asset named asset:
batch_request = asset.build_batch_request()
A Batch Request can only retrieve multiple Batches from a Data Asset that has been configured to include more than the default single Batch.
For a Filesystem Data Source, the batching_regex argument determines which files are included in a Data Asset as Batches: each file that matches the batching_regex becomes a single Batch.
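To make the one-file-one-Batch rule concrete, the following sketch applies a regular expression with Python's re module to a hypothetical directory listing. The file names and the pattern are assumptions modeled on taxi-trip sample data, and the matching logic only illustrates the rule; it is not the Great Expectations implementation. In particular, this sketch requires each file name to match the whole pattern (re.fullmatch).

```python
import re

# Hypothetical pattern in the style of a batching_regex; the named groups
# (year, month) identify each Batch.
BATCHING_REGEX = re.compile(
    r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
)

# Hypothetical directory listing.
files = [
    "yellow_tripdata_sample_2019-01.csv",
    "yellow_tripdata_sample_2019-02.csv",
    "yellow_tripdata_sample_2019-03.csv",
    "notes.txt",
]


def batches_from_files(filenames, pattern):
    """Return one dict of Batch identifiers per file name that matches pattern."""
    batches = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:  # files that do not match are not part of the Data Asset
            batches.append({"path": name, **match.groupdict()})
    return batches


batches = batches_from_files(files, BATCHING_REGEX)
for batch in batches:
    print(batch)
```

Three of the four files match, so this sketch yields three Batches; notes.txt is ignored.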
SQL Data Source Data Assets include a single Batch by default. You can use splitters to divide that single Batch into multiple Batches.
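Conceptually, a splitter partitions the rows of one table by a key, such as the year and month of a date column, so that each distinct key value becomes one Batch. The sketch below models that idea in plain Python on hypothetical rows. It does not call the Great Expectations splitter API or reproduce how splitting is executed against a database.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows from a single SQL table; only the date column matters here.
rows = [
    {"trip_id": 1, "pickup_date": date(2019, 1, 5)},
    {"trip_id": 2, "pickup_date": date(2019, 1, 20)},
    {"trip_id": 3, "pickup_date": date(2019, 2, 3)},
    {"trip_id": 4, "pickup_date": date(2019, 3, 14)},
]


def split_by_year_and_month(table_rows, column):
    """Group rows into one list per (year, month) pair of the given date column."""
    groups = defaultdict(list)
    for row in table_rows:
        value = row[column]
        groups[(value.year, value.month)].append(row)
    return dict(groups)


batches = split_by_year_and_month(rows, "pickup_date")
for key in sorted(batches):
    # Each (year, month) key stands for one Batch.
    print(key, [row["trip_id"] for row in batches[key]])
```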
For more information on partitioning a Data Asset into Batches, see Manage Data Assets.
Get a list of Batches from the Batch Request
Use the same Data Asset that your Batch Request was built from to retrieve a list of Batches with the following code:
batch_list = asset.get_batch_list_from_batch_request(batch_request)
Convert the list of Batches into a list of Batch Requests
A Checkpoint validates Batch Requests, but it only validates the last Batch found in each Batch Request. To validate every Batch, convert the list of Batches into a list of Batch Requests, each of which returns only its corresponding Batch.
batch_request_list = [batch.batch_request for batch in batch_list]
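The following toy model illustrates why this conversion matters. ToyBatch, ToyBatchRequest, and toy_checkpoint_validate are hypothetical stand-ins that encode only the behavior described above (a Checkpoint validates the last Batch of each Batch Request); they are not Great Expectations classes.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-ins; they model the documented behavior, not GX internals.
@dataclass(frozen=True)
class ToyBatchRequest:
    batch_ids: tuple  # identifiers of every Batch this request would return


@dataclass(frozen=True)
class ToyBatch:
    batch_id: str

    @property
    def batch_request(self) -> ToyBatchRequest:
        # A request that returns only this Batch.
        return ToyBatchRequest(batch_ids=(self.batch_id,))


def toy_checkpoint_validate(requests: List[ToyBatchRequest]) -> List[str]:
    """Return the IDs of the Batches that get validated: the last one per request."""
    return [request.batch_ids[-1] for request in requests]


batch_list = [ToyBatch("2019-01"), ToyBatch("2019-02"), ToyBatch("2019-03")]
multi_batch_request = ToyBatchRequest(
    batch_ids=tuple(batch.batch_id for batch in batch_list)
)

# One request covering all Batches: only the last Batch is validated.
print(toy_checkpoint_validate([multi_batch_request]))  # ['2019-03']

# One request per Batch: every Batch is validated.
batch_request_list = [batch.batch_request for batch in batch_list]
print(toy_checkpoint_validate(batch_request_list))  # ['2019-01', '2019-02', '2019-03']
```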
Build a validations list
The Checkpoint class's validations parameter consists of a list of dictionaries. Each dictionary pairs one Batch Request with the Expectation Suite it should be validated against. The following code creates a valid validations list that associates each Batch Request with an Expectation Suite named example_suite.
validations = [
    {"batch_request": batch_request, "expectation_suite_name": "example_suite"}
    for batch_request in batch_request_list
]
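Because each dictionary carries its own expectation_suite_name, the pairing does not have to use the same suite for every Batch. The sketch below shows the resulting list structure in plain Python, using hypothetical stub objects in place of real Batches and Batch Requests, a hypothetical batch_id field, and a hypothetical january_suite override. How you identify a real Batch depends on your Data Asset.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a Batch; only the attributes used below are modeled.
@dataclass(frozen=True)
class StubBatch:
    batch_id: str
    batch_request: str  # placeholder for a real Batch Request object


batch_list = [
    StubBatch("2019-01", "request-2019-01"),
    StubBatch("2019-02", "request-2019-02"),
]

# Hypothetical per-Batch overrides; Batches not listed here use example_suite.
suite_overrides = {"2019-01": "january_suite"}

validations = [
    {
        "batch_request": batch.batch_request,
        "expectation_suite_name": suite_overrides.get(batch.batch_id, "example_suite"),
    }
    for batch in batch_list
]

for validation in validations:
    print(validation)
```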
Run Checkpoint
The validations list, which pairs Batch Requests with Expectation Suites, can now be passed to a single Checkpoint instance, which validates each Batch Request against its corresponding Expectation Suite. This effectively validates every Batch included in the original multiple-Batch Batch Request. In the following code, context is your Data Context.
checkpoint = context.add_or_update_checkpoint(
name="my_taxi_validator_checkpoint", validations=validations
)
checkpoint_result = checkpoint.run()
Review the Validation Results
After the validations run, use the following code to build and view the Validation Results as Data Docs.
context.build_data_docs()
context.open_data_docs()