Version: 0.17.23

Validate multiple Batches from a Batch Request with a single Checkpoint

By default, a Checkpoint validates only the last Batch included in a Batch Request. This guide shows how to use a Python loop and the Checkpoint validations parameter to validate every Batch identified by a single Batch Request.

Prerequisites

Create a Batch Request with multiple Batches

The following Python code creates a Batch Request that includes every available Batch in a Data Asset named asset:

batch_request = asset.build_batch_request()

Tip

A Batch Request can only retrieve multiple Batches from a Data Asset that has been configured to include more than the default single Batch.

For a Filesystem Data Source, the batching_regex argument determines which Batches a Data Asset includes: each file that matches batching_regex becomes a single Batch.
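To make the one-file-one-Batch rule concrete, here is a minimal standard-library sketch of how a regular expression with named groups partitions a directory listing. The file names and the pattern are hypothetical, and the code only models the matching rule with `re.fullmatch`; it does not call Great Expectations, whose exact matching semantics may differ.

```python
import re

# Hypothetical file names in a Filesystem Data Source directory.
filenames = [
    "yellow_tripdata_sample_2019-01.csv",
    "yellow_tripdata_sample_2019-02.csv",
    "yellow_tripdata_sample_2019-03.csv",
    "README.md",
]

# Hypothetical batching_regex: the named groups identify each Batch.
batching_regex = re.compile(
    r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
)

# Each file that matches the pattern becomes one Batch, identified here
# by the values captured by the named groups. Non-matching files are skipped.
batches = {}
for name in filenames:
    match = batching_regex.fullmatch(name)
    if match:
        batches[name] = match.groupdict()

for name, identifiers in batches.items():
    print(name, identifiers)
```

Three of the four files match, so this listing would yield three Batches; README.md is ignored.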

SQL Data Source Data Assets include a single Batch by default. You can use splitters to divide that single Batch into multiple Batches.
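Conceptually, splitting by year and month groups the rows of one table by the year and month of a date column, and each group becomes its own Batch. The standard-library sketch below models only that idea; the rows and the column name pickup_datetime are hypothetical, and it does not reproduce the queries Great Expectations issues against a database.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows from a single SQL table (one Batch by default).
rows = [
    {"pickup_datetime": datetime(2019, 1, 3, 8, 15), "fare": 12.5},
    {"pickup_datetime": datetime(2019, 1, 17, 22, 40), "fare": 7.0},
    {"pickup_datetime": datetime(2019, 2, 1, 9, 5), "fare": 15.25},
    {"pickup_datetime": datetime(2019, 3, 11, 13, 30), "fare": 9.75},
]

# Group rows by (year, month): each key corresponds to one Batch.
batches = defaultdict(list)
for row in rows:
    ts = row["pickup_datetime"]
    batches[(ts.year, ts.month)].append(row)

for (year, month), batch_rows in sorted(batches.items()):
    print(f"Batch year={year} month={month}: {len(batch_rows)} rows")
```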

For more information on partitioning a Data Asset into Batches, see Manage Data Assets.

Get a list of Batches from the Batch Request

Use the same Data Asset that your Batch Request was built from to retrieve a list of Batches with the following code:

batch_list = asset.get_batch_list_from_batch_request(batch_request)

Convert the list of Batches into a list of Batch Requests

A Checkpoint validates Batch Requests, but it validates only the last Batch found in each Batch Request. To validate every Batch, convert the list of Batches into a list of Batch Requests, each of which returns exactly one of the original Batches.

batch_request_list = [batch.batch_request for batch in batch_list]
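Why this conversion helps can be modelled with a few stand-in classes. In the hypothetical sketch below, StandInBatch, StandInBatchRequest, resolve, and run_default_checkpoint are illustrative names, not Great Expectations APIs. A multi-Batch request resolves to several Batches, but the default behavior validates only the last one, while each per-Batch request resolves to exactly one Batch.

```python
from dataclasses import dataclass, field


@dataclass
class StandInBatch:
    """Hypothetical stand-in for a Batch, identified by its month."""
    month: str

    @property
    def batch_request(self) -> "StandInBatchRequest":
        # A request whose options select exactly this Batch.
        return StandInBatchRequest(options={"month": self.month})


@dataclass
class StandInBatchRequest:
    """Hypothetical stand-in for a Batch Request; empty options match all."""
    options: dict = field(default_factory=dict)


# Hypothetical Batches available in a Data Asset.
AVAILABLE = [StandInBatch("01"), StandInBatch("02"), StandInBatch("03")]


def resolve(request: StandInBatchRequest) -> list:
    """Return every available Batch whose identifiers match the options."""
    return [
        b for b in AVAILABLE
        if all(getattr(b, k) == v for k, v in request.options.items())
    ]


def run_default_checkpoint(request: StandInBatchRequest) -> StandInBatch:
    """Model the default behavior: only the last resolved Batch is validated."""
    return resolve(request)[-1]


# One multi-Batch request: only month "03" would be validated.
print(run_default_checkpoint(StandInBatchRequest()).month)

# Converted per-Batch requests: each one resolves to exactly one Batch.
batch_list = resolve(StandInBatchRequest())
batch_request_list = [batch.batch_request for batch in batch_list]
print([run_default_checkpoint(r).month for r in batch_request_list])
```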

Build a validations list

The Checkpoint class's validations parameter takes a list of dictionaries. Each dictionary pairs one Batch Request with the Expectation Suite it should be validated against. The following code builds a validations list that associates each Batch Request with an Expectation Suite named example_suite.

validations = [
    # One entry per Batch: its single-Batch request plus the suite to apply.
    {"batch_request": batch.batch_request, "expectation_suite_name": "example_suite"}
    for batch in batch_list
]

Run Checkpoint

You can now pass the validations list, which pairs each Batch Request with an Expectation Suite, to a single Checkpoint instance. The Checkpoint validates each Batch Request against its corresponding Expectation Suite, which effectively validates every Batch included in the original multiple-Batch Batch Request.

checkpoint = context.add_or_update_checkpoint(
    name="my_taxi_validator_checkpoint",
    validations=validations,
)

checkpoint_result = checkpoint.run()
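Because one run now covers several Batches, it can help to summarize which Batches passed. The sketch below models that bookkeeping with plain dictionaries. The result shape (Batch identifiers plus a success flag) is a hypothetical simplification, not the structure of the object returned by checkpoint.run(). The sketch treats the run as successful only when every per-Batch validation succeeds.

```python
# Hypothetical per-Batch outcomes, keyed by the identifiers of each Batch.
per_batch_results = [
    {"batch": {"year": "2019", "month": "01"}, "success": True},
    {"batch": {"year": "2019", "month": "02"}, "success": False},
    {"batch": {"year": "2019", "month": "03"}, "success": True},
]

# The run as a whole counts as successful only if every Batch passed.
overall_success = all(r["success"] for r in per_batch_results)

# Collect the identifiers of the Batches that need attention.
failed_batches = [r["batch"] for r in per_batch_results if not r["success"]]

print("overall success:", overall_success)
for batch in failed_batches:
    print("failed:", batch)
```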

Review the Validation Results

After the validations run, use the following code to build and view the Validation Results as Data Docs.

context.build_data_docs()
context.open_data_docs()