Connect to in-memory source data
Use the information provided here to connect to an in-memory pandas or Spark DataFrame. Great Expectations (GX) uses the term source data when referring to data in its original format, and the term source data system when referring to the storage location for source data.
- pandas
- Spark
pandas
pandas can read many types of data into its DataFrame class, but the following examples use data originating in a parquet file.
Prerequisites
- A Great Expectations instance. See Install Great Expectations with source data system dependencies.
- A Data Context.
- Access to data that can be read into a Pandas DataFrame.
Import the Great Expectations module and instantiate a Data Context
Run the following Python code to import GX and instantiate a Data Context:
import great_expectations as gx
context = gx.get_context()
Create a Data Source
Run the following Python code to create a Pandas Data Source:
datasource = context.sources.add_pandas(name="my_pandas_datasource")
Read your source data into a Pandas DataFrame
In the following example, a parquet file is read into a Pandas DataFrame that will be used in subsequent code examples.
Run the following Python code to create the Pandas DataFrame:
import pandas as pd
dataframe = pd.read_parquet(
"https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2022-11.parquet"
)
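If the taxi file isn't reachable from your environment, any Pandas DataFrame works just as well. As a minimal stand-in, you can build one in memory; the column names below are made up for illustration and are not part of the taxi schema:

```python
import pandas as pd

# A small stand-in DataFrame (hypothetical columns) that can be used in
# place of the downloaded taxi data when experimenting offline.
dataframe = pd.DataFrame(
    {
        "vendor_id": [1, 2, 1],
        "passenger_count": [1, 3, 2],
        "fare_amount": [7.5, 12.0, 9.25],
    }
)
print(dataframe.shape)  # (3, 3)
```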
Add a Data Asset to the Data Source
The following information is required when you create a Pandas DataFrame Data Asset:

- `name`: The Data Asset name.
- `dataframe`: The Pandas DataFrame containing the source data. The DataFrame you created previously is the value you'll enter for the `dataframe` parameter.
- Run the following Python code to define the `name` parameter and store it as a Python variable:

  name = "taxi_dataframe"

- Run the following Python code to create the Data Asset:

  data_asset = datasource.add_dataframe_asset(name=name)

For `dataframe` Data Assets, the `dataframe` is always specified as the argument of one API method. For example:

my_batch_request = data_asset.build_batch_request(dataframe=dataframe)
Next steps
- How to request Data from a Data Asset
- How to create Expectations while interactively evaluating a set of data
- How to use the Onboarding Data Assistant to evaluate data and create Expectations
Related documentation
For more information on Pandas read methods, see the Pandas Input/Output documentation.
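As a quick sketch of another read method, `read_csv` accepts any file-like object, so an in-memory text buffer can stand in for a file on disk; the sample data here is invented for the example:

```python
import io

import pandas as pd

# read_csv parses CSV text from an in-memory buffer (example data,
# not from the taxi dataset). Any pandas read method produces a
# DataFrame that can back a DataFrame Data Asset.
csv_text = "trip_id,distance\n1,2.5\n2,4.1\n"
dataframe = pd.read_csv(io.StringIO(csv_text))
print(list(dataframe.columns))  # ['trip_id', 'distance']
```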
Spark
Connect to in-memory source data using Spark.
Prerequisites
- A Great Expectations instance. See Install Great Expectations with source data system dependencies.
- A Data Context.
- Access to data that can be read into a Spark DataFrame.
- An active Spark Context
Import the Great Expectations module and instantiate a Data Context
Run the following Python code to import GX and instantiate a Data Context:
import great_expectations as gx
context = gx.get_context()
Create a Data Source
Run the following Python code to create a Spark Data Source:
datasource = context.sources.add_spark("my_spark_datasource")
Read your source data into a Spark DataFrame
In the following example, you'll create a simple Spark DataFrame that will be used in subsequent code examples.
Run the following Python code to create the Spark DataFrame:
import pandas as pd
from pyspark.sql import SparkSession

# Assumes an active Spark session (see the prerequisites); get or create one.
spark = SparkSession.builder.getOrCreate()

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": [100, 200, 300, 400, 500, 600],
        "c": ["one", "two", "three", "four", "five", "six"],
    },
    index=[10, 20, 30, 40, 50, 60],
)
dataframe = spark.createDataFrame(data=df)
Add a Data Asset to the Data Source
The following information is required when you create a Spark DataFrame Data Asset:

- `name`: The Data Asset name.
- `dataframe`: The Spark DataFrame containing the source data. The DataFrame you created previously is the value you'll enter for the `dataframe` parameter.
- Run the following Python code to define the `name` parameter and store it as a Python variable:

  name = "my_df_asset"

- Run the following Python code to create the Data Asset:

  data_asset = datasource.add_dataframe_asset(name=name)

For `dataframe` Data Assets, the `dataframe` is always specified as the argument of one API method. For example:

my_batch_request = data_asset.build_batch_request(dataframe=dataframe)
Next steps
- How to request Data from a Data Asset
- How to create Expectations while interactively evaluating a set of data
- How to use the Onboarding Data Assistant to evaluate data and create Expectations
Related documentation
For more information on Spark read methods, see the Spark Input/Output documentation.