Batch Request
A Batch Request specifies a Batch of data. It can be created by using the build_batch_request method found on a Data Asset.
A Batch Request contains all the details necessary to query the appropriate underlying data. The relationship between a Batch Request and the data returned as a Batch is guaranteed. If a Batch Request identifies multiple Batches that fit the criteria of the user-provided options argument to a Data Asset's build_batch_request method, all of the matching Batches are returned.
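To make the "multiple matching Batches" behavior concrete, here is a minimal pure-Python sketch. It does not use Great Expectations: the AVAILABLE_BATCHES list and the matching_batches function are hypothetical stand-ins, and the rule that keys left out of the options are unconstrained is an assumption made for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for the Batches a Data Asset can produce.
# Each Batch is identified by the option values that describe it.
AVAILABLE_BATCHES: List[Dict[str, str]] = [
    {"year": "2019", "month": "01"},
    {"year": "2019", "month": "02"},
    {"year": "2020", "month": "01"},
]


def matching_batches(options: Dict[str, str]) -> List[Dict[str, str]]:
    """Return every Batch whose identifiers agree with all supplied options.

    Keys left out of `options` are treated as unconstrained (an assumption
    for this sketch), so a partial specification can match several Batches.
    """
    return [
        batch
        for batch in AVAILABLE_BATCHES
        if all(batch.get(key) == value for key, value in options.items())
    ]


print(matching_batches({"year": "2019", "month": "02"}))  # a single Batch
print(matching_batches({"year": "2019"}))  # both 2019 Batches
```

Fully specifying the options narrows the request to one Batch, while a partial specification returns every Batch consistent with it.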
If you are using an interactive session, you can inspect the allowed keys for a Data Asset's options argument by printing the asset's batch_request_options attribute.
Relationship to other objects
A Batch Request is always used to build a Batch. For example, when you run a Checkpoint or use a Validator, you'll use a Batch Request to create the Batch.
Use cases
When you create Expectations, you'll need to provide a Batch of data to test your Expectations. To get the Batch of data, you'll use a Batch Request.
When validating data with a Checkpoint, you need to provide one or more Batch Requests and one or more Expectation Suites. You can do this at runtime, or by defining Batch Request and Expectation Suite pairs in advance in the Checkpoint's configuration.
For more information on setting up Batch Request/Expectation Suite pairs in a Checkpoint configuration, see How to add validations data or suites to a Checkpoint.
Guaranteed relationships
The relationship between a Batch and the Batch Request that generated it is guaranteed. A Batch Request includes all the information necessary to identify a specific Batch or Batches.
Batches are always built using a Batch Request. When a Batch is built, metadata is attached to the Batch object and is available through the Batch's metadata attribute. This metadata contains all the option values necessary to recreate the Batch Request that corresponds to the Batch.
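The guarantee described above can be modeled as a round trip: the option values that selected a Batch are stored on it, and they are enough to rebuild an equivalent request. The sketch below is a simplified illustration, not the Great Expectations implementation; SketchBatchRequest, SketchBatch, build_batch, and recreate_request are hypothetical names, and real Batch metadata may carry additional keys.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchBatchRequest:
    """Hypothetical, simplified Batch Request (not the GX class)."""
    data_asset_name: str
    options: Dict[str, str] = field(default_factory=dict)


@dataclass
class SketchBatch:
    """Hypothetical Batch that records how it was selected."""
    data: List[dict]
    metadata: Dict[str, str]


def build_batch(request: SketchBatchRequest, data: List[dict]) -> SketchBatch:
    # Attach a copy of the option values that selected this Batch.
    return SketchBatch(data=data, metadata=dict(request.options))


def recreate_request(asset_name: str, batch: SketchBatch) -> SketchBatchRequest:
    # The metadata alone is enough to rebuild an equivalent request.
    return SketchBatchRequest(data_asset_name=asset_name, options=dict(batch.metadata))


request = SketchBatchRequest("csv_asset", {"year": "2019", "month": "02"})
batch = build_batch(request, data=[{"fare": 12.5}])
print(recreate_request("csv_asset", batch) == request)  # True
```

Copying the options into the metadata keeps the Batch's record stable even if the original request object is later modified.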
Access
You will rarely need to access an existing Batch Request. Instead, you will usually build a Batch Request from a Data Asset. A Batch Request can also be saved to a configuration file when you save an object that requires a Batch Request for setup, such as a Checkpoint. Once you receive a Batch back, it is unlikely you will need to reference the Batch Request that generated it. In fact, if the Batch Request was part of a configuration, Great Expectations simply initializes a new copy rather than loading an existing one when the Batch Request is needed.
Create
You can create a Batch Request from a Data Asset by calling build_batch_request. Here is an example of configuring a Pandas Filesystem Asset and creating a Batch Request:
import great_expectations as gx
context = gx.get_context()
# data_directory is the full path to a directory containing csv files
# (placeholder value; replace it with the path to your own data)
data_directory = "/path/to/data"
datasource = context.sources.add_pandas_filesystem(
name="my_pandas_datasource", base_directory=data_directory
)
# The batching_regex should match file names in the data_directory
asset = datasource.add_csv_asset(
    name="csv_asset",
    batching_regex=r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv",
order_by=["year", "month"],
)
batch_request = asset.build_batch_request(options={"year": "2019", "month": "02"})
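It can help to see how a batching_regex turns file names into option values. The following minimal sketch uses only Python's standard re module and assumes that each named group in the regex becomes an option key, as the year and month groups in the example suggest. It is not Great Expectations' internal code; options_for_file is a hypothetical helper, and anchoring with fullmatch is a choice made for this sketch.

```python
import re
from typing import Dict, Optional

# Same pattern as the example, with the "." before "csv" escaped so it
# matches only a literal dot.
BATCHING_REGEX = re.compile(
    r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"
)


def options_for_file(file_name: str) -> Optional[Dict[str, str]]:
    """Return the named-group values for a matching file name, else None."""
    match = BATCHING_REGEX.fullmatch(file_name)
    return match.groupdict() if match else None


files = [
    "yellow_tripdata_sample_2019-01.csv",
    "yellow_tripdata_sample_2019-02.csv",
    "notes.txt",
]
for name in files:
    print(name, "->", options_for_file(name))

# Selecting files the way options={"year": "2019", "month": "02"} would.
requested = {"year": "2019", "month": "02"}
selected = [name for name in files if options_for_file(name) == requested]
print(selected)
```

Files that do not match the pattern, such as notes.txt, produce no option values and therefore never become Batches in this model.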
The options you pass in to specify a Batch vary depending on how the specific Data Asset was configured. To see the keys for the options dictionary, you can do the following:
options = asset.batch_request_options
print(options)
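For a regex-based file asset like the one above, the option keys are expected to include the regex's named groups. The real batch_request_options attribute may report additional keys, so treat this standard-library sketch as an approximation. option_keys_from_regex and unknown_option_keys are hypothetical helpers that show how to derive the keys and catch a misspelled option before building a request.

```python
import re
from typing import Dict, List, Tuple

PATTERN = r"yellow_tripdata_sample_(?P<year>\d{4})-(?P<month>\d{2})\.csv"


def option_keys_from_regex(pattern: str) -> Tuple[str, ...]:
    """Return named capture groups in the order they appear in the pattern."""
    group_index = re.compile(pattern).groupindex  # maps name -> group number
    return tuple(sorted(group_index, key=group_index.__getitem__))


def unknown_option_keys(options: Dict[str, str], pattern: str) -> List[str]:
    """List option keys that the pattern does not define."""
    allowed = set(option_keys_from_regex(pattern))
    return sorted(set(options) - allowed)


print(option_keys_from_regex(PATTERN))  # ('year', 'month')
print(unknown_option_keys({"year": "2019", "day": "01"}, PATTERN))  # ['day']
```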