How to connect to in-memory data using Pandas
In this guide we will demonstrate how to connect to an in-memory Pandas DataFrame. Pandas can read many types of data into its DataFrame class, but in our example we will use data originating in a parquet file.
Prerequisites
- A Great Expectations instance. See Install Great Expectations locally.
- A Data Context.
- Access to data that can be read into a Pandas DataFrame.
Steps
1. Import the Great Expectations module and instantiate a Data Context
The code to import Great Expectations and instantiate a Data Context is:
import great_expectations as gx
context = gx.get_context()
2. Create a Datasource
To access our in-memory data, we will create a Pandas Datasource:
datasource = context.sources.add_pandas(name="my_pandas_datasource")
3. Read your source data into a Pandas DataFrame
For this example, we will read a Parquet file into a Pandas DataFrame and use that DataFrame in the rest of this guide.
The code to create the Pandas DataFrame is:
import pandas as pd
dataframe = pd.read_parquet(
"https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-11.parquet"
)
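If the remote Parquet file is not reachable, any other DataFrame can stand in for it in the remaining steps. The following is a minimal sketch that builds a small DataFrame directly in memory and inspects it before use. The column names and values are hypothetical and do not reflect the actual taxi data schema.

```python
import pandas as pd

# Hypothetical sample rows; the column names are illustrative only
# and are not taken from the yellow taxi trip data.
dataframe = pd.DataFrame(
    {
        "passenger_count": [1, 2, 1],
        "trip_distance": [0.9, 3.4, 12.1],
    }
)

# Quick sanity checks before handing the DataFrame to Great Expectations.
print(dataframe.shape)   # number of (rows, columns)
print(dataframe.dtypes)  # inferred type of each column
```

Because the variable is still called dataframe, the later steps in this guide can be followed unchanged.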
4. Add a Data Asset to the Datasource
A Pandas DataFrame Data Asset can be defined with two elements:
- name: The name by which the Data Asset will be referenced in the future
- dataframe: A Pandas DataFrame containing the data
We will use the dataframe from the previous step as the corresponding parameter's value. For the name parameter, we will define a name in advance by storing it in a Python variable:
name = "taxi_dataframe"
Now that we have the name and dataframe for our Data Asset, we can create the Data Asset with the code:
data_asset = datasource.add_dataframe_asset(name=name)
For dataframe Data Assets, the dataframe is not passed when the Data Asset is created. It is always specified as the argument of exactly one API method, build_batch_request:
my_batch_request = data_asset.build_batch_request(dataframe=dataframe)
Next steps
Now that you have connected to your data, you may want to look into:
- How to request Data from a Data Asset
- How to create Expectations while interactively evaluating a set of data
- How to use the Onboarding Data Assistant to evaluate data and create Expectations
Additional information
External APIs
For more information on Pandas read methods, see the official Pandas Input/Output documentation.
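As an illustration of a read method other than read_parquet, the sketch below loads CSV text from an in-memory buffer with pd.read_csv. The CSV content and the csv_dataframe variable name are made up for this example. The result is an ordinary DataFrame, so it could be supplied wherever this guide uses dataframe.

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory; the columns are illustrative only.
csv_text = "vendor_id,fare_amount\n1,12.5\n2,7.0\n"

# read_csv accepts any file-like object, so a StringIO buffer works
# the same way a path or URL would.
csv_dataframe = pd.read_csv(io.StringIO(csv_text))

print(csv_dataframe.head())
```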