How to request data from a Data Asset
This guide demonstrates how you can request data from a Datasource that has been defined with the context.sources.add_* method.
Prerequisites
- An installation of GX
- A Datasource with a configured Data Asset
Steps
1. Import GX and instantiate a Data Context
The code to import Great Expectations and instantiate a Data Context is:
import great_expectations as gx
context = gx.get_context()
2. Retrieve your Data Asset
If you already have an instance of your Data Asset stored in a Python variable, you do not need to retrieve it again. If you do not, you can instantiate a previously defined Datasource with your Data Context's get_datasource(...) method. Likewise, a Datasource's get_asset(...) method will instantiate a previously defined Data Asset.
In this example we will use a previously defined Datasource named my_datasource and a previously defined Data Asset named my_asset.
my_asset = context.get_datasource("my_datasource").get_asset("my_asset")
3. (Optional) Build an options dictionary for your Batch Request
An options dictionary can be used to limit the Batches returned by a Batch Request. Omitting the options dictionary will result in all available Batches being returned.
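As a rough mental model, an options dictionary behaves like a filter: a Batch is returned only if it matches every key/value pair, and an empty or omitted dictionary matches everything. The sketch below models that idea in plain Python. It is not GX's implementation; the function name filter_batches and the year/month batch identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical batch identifiers, e.g. one per monthly data file.
BatchId = Dict[str, str]


def filter_batches(
    batches: List[BatchId], options: Optional[Dict[str, str]] = None
) -> List[BatchId]:
    """Return the batches whose identifiers match every key/value in options.

    Conceptual model only: omitting options (None or {}) returns every batch.
    """
    if not options:
        return list(batches)
    return [
        batch
        for batch in batches
        if all(batch.get(key) == value for key, value in options.items())
    ]


available = [
    {"year": "2019", "month": "01"},
    {"year": "2019", "month": "02"},
    {"year": "2020", "month": "01"},
]

print(len(filter_batches(available)))  # 3: no options, so every batch matches
print(filter_batches(available, {"year": "2019"}))  # both 2019 batches
print(filter_batches(available, {"year": "2020", "month": "01"}))  # a single batch
```

Adding more keys to the dictionary can only narrow the result, which is why a fully specified options dictionary typically identifies a single Batch.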
The structure of the options dictionary depends on the type of Data Asset being used. You can find the valid keys for the options dictionary by checking the Data Asset's batch_request_options property.
print(my_asset.batch_request_options)
The batch_request_options property is a tuple that contains all the valid keys that can be used to limit the Batches returned in a Batch Request. Create a dictionary whose keys are taken from the batch_request_options tuple and whose values specify the Batch or Batches your Batch Request should return. Then pass this dictionary in as the options parameter when you build your Batch Request.
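Because only the keys listed in batch_request_options are meaningful for a given Data Asset, it can help to check your dictionary before passing it on. The following is a minimal sketch of such a check. The helper name build_options and the example key tuple ("year", "month") are assumptions for illustration; with a real asset you would pass my_asset.batch_request_options as valid_keys.

```python
from typing import Any, Dict, Iterable


def build_options(valid_keys: Iterable[str], **values: Any) -> Dict[str, Any]:
    """Build an options dictionary, rejecting keys the asset does not accept.

    valid_keys would normally be my_asset.batch_request_options (a tuple).
    """
    allowed = set(valid_keys)
    unknown = sorted(set(values) - allowed)
    if unknown:
        raise ValueError(
            f"Invalid option key(s) {unknown}; valid keys are {sorted(allowed)}"
        )
    return dict(values)


# Hypothetical keys; a real Data Asset reports its own via batch_request_options.
example_keys = ("year", "month")
options = build_options(example_keys, year="2019", month="01")
print(options)  # {'year': '2019', 'month': '01'}

# With a real Data Asset, the dictionary would then be passed on, e.g.:
# my_batch_request = my_asset.build_batch_request(options=options)
```

Failing early with a message that lists the valid keys makes a typo in a key name easy to spot, instead of leaving you to wonder why the Batch Request returned unexpected Batches.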
4. Build your Batch Request
We will use the build_batch_request(...) method of our Data Asset to generate a Batch Request.
my_batch_request = my_asset.build_batch_request()
For dataframe Data Assets, the dataframe is specified in exactly one API method, build_batch_request(...), where it is passed as the dataframe argument:
my_batch_request = my_asset.build_batch_request(dataframe=dataframe)
5. Verify that the correct Batches were returned
The get_batch_list_from_batch_request(...) method returns a list of the Batches that a given Batch Request refers to.
batches = my_asset.get_batch_list_from_batch_request(my_batch_request)
Because Batch definitions are quite verbose, the easiest way to determine what data the Batch Request will return is to print just the batch_spec of each Batch.
for batch in batches:
    print(batch.batch_spec)
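If you would rather have the check fail loudly than rely on reading printed output, you could wrap it in a small function that also confirms how many Batches came back. This is a sketch: the function name summarize_batches, the expected count, and the file paths are hypothetical, and the only assumption about a Batch is that it exposes a batch_spec attribute, as used above. The SimpleNamespace stand-ins replace real Batches so the snippet runs on its own.

```python
from types import SimpleNamespace
from typing import Any, List, Optional, Sequence


def summarize_batches(
    batches: Sequence[Any], expected_count: Optional[int] = None
) -> List[Any]:
    """Print and return each Batch's batch_spec, optionally checking the count."""
    if expected_count is not None and len(batches) != expected_count:
        raise ValueError(f"Expected {expected_count} Batch(es), got {len(batches)}")
    specs = [batch.batch_spec for batch in batches]
    for spec in specs:
        print(spec)
    return specs


# Stand-ins for real Batches; with GX you would pass the list returned by
# my_asset.get_batch_list_from_batch_request(my_batch_request).
fake_batches = [
    SimpleNamespace(batch_spec={"path": "data/2019-01.csv"}),
    SimpleNamespace(batch_spec={"path": "data/2019-02.csv"}),
]
specs = summarize_batches(fake_batches, expected_count=2)
```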
Next steps
Now that you have retrieved data from a Data Asset, you may be interested in creating Expectations about your data.