Instantiate a Data Context
A Data Context is the primary entry point for a Great Expectations (GX) deployment, with configurations and methods for all supporting components. It contains the configurations for Expectations, Metadata Stores, Data Docs, Checkpoints, and everything else related to working with GX. Use the information provided here to instantiate a Data Context so that you can continue working with previously defined GX configurations.
- Existing Filesystem
- Filesystem with Python
- Specific Filesystem
- Ephemeral
Existing Filesystem
Instantiate an existing Filesystem Data Context so that you can continue working with previously defined GX configurations.
Prerequisites
- A Great Expectations instance. See Install Great Expectations with source data system dependencies.
Import GX
Run the following command to import the GX module:
import great_expectations as gx
Run the get_context(...) method
To quickly acquire a Data Context, use the get_context(...) method without any defined parameters:
context = gx.get_context()
This functions as a convenience method for initializing, instantiating, and returning a Data Context. In the absence of parameters defining its behavior, calling get_context() returns a Cloud Data Context, a Filesystem Data Context, or an Ephemeral Data Context, depending on what type of Data Context has previously been initialized with your GX install.
If you have GX Cloud configured on your system, get_context() instantiates and returns a Cloud Data Context. Otherwise, get_context() instantiates and returns the last accessed Filesystem Data Context. If a previously initialized Filesystem Data Context cannot be found, get_context() initializes, instantiates, and returns a temporary in-memory Ephemeral Data Context.
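The fallback order described above can be summarized as a small decision function. This is only an illustrative sketch: ContextKind and expected_context_kind are hypothetical names that are not part of the GX API, and the two boolean inputs stand in for checks that GX performs internally.

```python
from enum import Enum


class ContextKind(Enum):
    """The three kinds of Data Context a parameterless get_context() can return."""

    CLOUD = "Cloud Data Context"
    FILESYSTEM = "Filesystem Data Context"
    EPHEMERAL = "Ephemeral Data Context"


def expected_context_kind(cloud_configured: bool, filesystem_context_found: bool) -> ContextKind:
    """Mirror the documented fallback order (hypothetical helper, not a GX function)."""
    if cloud_configured:
        # GX Cloud configuration takes priority over everything else.
        return ContextKind.CLOUD
    if filesystem_context_found:
        # Otherwise a previously initialized Filesystem Data Context is used.
        return ContextKind.FILESYSTEM
    # With neither available, a temporary in-memory context is created.
    return ContextKind.EPHEMERAL


print(expected_context_kind(False, False).value)  # prints "Ephemeral Data Context"
```

The function only restates the order in which the possibilities are considered. It does not inspect your system or call GX.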
An Ephemeral Data Context is an in-memory Data Context that is not intended to persist beyond the current Python session. However, if you decide that you would like to save its contents for future use you can do so by converting it to a Filesystem Data Context:
context = context.convert_to_file_context()
This method initializes a Filesystem Data Context in the current working directory of the Python process that contains the Ephemeral Data Context. For a more detailed explanation of this method, see our guide on how to convert an Ephemeral Data Context to a Filesystem Data Context.
Verify Data Context content
We can ensure that the Data Context was instantiated correctly by printing its contents.
print(context)
This will output the full configuration of the Data Context in the format of a Python dictionary.
Filesystem with Python
A Data Context is required in almost all Python scripts utilizing GX. Use Python code to initialize, instantiate, and verify the contents of a Filesystem Data Context.
Prerequisites
- A Great Expectations instance. See Install Great Expectations with source data system dependencies.
Import GX
Run the following command to import the GX module:
import great_expectations as gx
Determine the folder to initialize the Data Context in
Run the following command to define the path to the empty folder in which your Filesystem Data Context will be initialized:
path_to_empty_folder = "/my_gx_project/"
Create a context
Provide the path for your empty folder to the GX library's FileDataContext.create(...) method as the project_root_dir parameter. Because you are providing a path to an empty folder, FileDataContext.create(...) initializes a Filesystem Data Context in that location.
For convenience, the FileDataContext.create(...) method instantiates and returns the newly initialized Data Context, which you can keep in a Python variable.
from great_expectations.data_context import FileDataContext
context = FileDataContext.create(project_root_dir=path_to_empty_folder)
If the project_root_dir provided to the FileDataContext.create(...) method points to a folder that does not already have a Data Context present, the FileDataContext.create(...) method initializes a Filesystem Data Context in that location even if other files and folders are present. This allows you to initialize a Filesystem Data Context in a folder that contains your source data or other project-related contents.
If a Data Context already exists in project_root_dir, the FileDataContext.create(...) method will not re-initialize it. Instead, FileDataContext.create(...) instantiates and returns the existing Data Context.
Verify the Data Context content
We can ensure that the Data Context was instantiated correctly by printing its contents.
print(context)
This will output the full configuration of the Data Context in the format of a Python dictionary.
Specific Filesystem
If you're using GX for multiple projects, you might want to use a different Data Context for each project. Instantiate a specific Filesystem Data Context so that you can switch between sets of previously defined GX configurations.
Prerequisites
- A Great Expectations instance. See Install Great Expectations with source data system dependencies.
- A previously initialized Filesystem Data Context.
Import GX
Run the following command to import the GX module:
import great_expectations as gx
Specify a folder containing a previously initialized Filesystem Data Context
Each Filesystem Data Context has a root folder in which it was initialized. This root folder identifies the specific Filesystem Data Context to instantiate.
path_to_project_root = "./my_project/"
Run the get_context(...) method
Provide the path to the folder containing your previously initialized Filesystem Data Context to the GX library's get_context(...) method as the project_root_dir parameter. Because you are providing a path to a folder that already contains a Data Context, the get_context(...) method instantiates and returns the Data Context at that location.
context = gx.get_context(project_root_dir=path_to_project_root)
Note that there is a subtle distinction between the project_root_dir and context_root_dir arguments accepted by get_context(...).
Your context root is the directory that contains all your GX config while your project root refers to your actual working directory (and therefore contains the context root).
# The overall directory is your project root
data/
gx/  # The GX folder with your config is your context root
    great_expectations.yml
    ...
...
Both are functionally equivalent for purposes of working with a file-backed project.
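The relationship between the two directories can be expressed with a few lines of pathlib. This is a minimal sketch that assumes the context root folder is named gx, as in the layout above. CONTEXT_FOLDER_NAME, context_root_for, and project_root_for are hypothetical names introduced for illustration and are not part of GX.

```python
from pathlib import Path

# Assumption: the context root folder is named "gx", as in the layout above.
CONTEXT_FOLDER_NAME = "gx"


def context_root_for(project_root):
    """Return the context root that sits inside the given project root."""
    return Path(project_root) / CONTEXT_FOLDER_NAME


def project_root_for(context_root):
    """Return the project root that contains the given context root."""
    return Path(context_root).parent


project_root = Path("my_project")
context_root = context_root_for(project_root)

# Going from the context root back up returns the original project root.
assert project_root_for(context_root) == project_root
```

Under this assumption, passing project_root as project_root_dir or context_root as context_root_dir to get_context(...) would refer to the same Data Context.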
If the root directory provided to the get_context(...) method points to a folder that does not already have a Data Context, the get_context(...) method initializes a new Filesystem Data Context in that location.
The get_context(...) method instantiates and returns the newly initialized Data Context.
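Because get_context(...) initializes a new Data Context when none is found, a mistyped path can silently create an empty project instead of loading the one you intended. The following sketch shows one way to guard against that before calling GX. It assumes the layout shown earlier, with the configuration at gx/great_expectations.yml inside the project root. The function require_existing_context is a hypothetical helper and not part of GX.

```python
from pathlib import Path


def require_existing_context(project_root, context_folder="gx"):
    """Return the project root as a Path, or raise if no Data Context config is found.

    Assumes the configuration lives at <project_root>/<context_folder>/great_expectations.yml.
    """
    root = Path(project_root)
    config_file = root / context_folder / "great_expectations.yml"
    if not config_file.is_file():
        raise FileNotFoundError(f"No Data Context configuration found at {config_file}")
    return root


# Hypothetical usage, assuming GX is installed and imported as gx:
# checked_root = require_existing_context("./my_project/")
# context = gx.get_context(project_root_dir=str(checked_root))
```

If your project uses a different folder name for its context root, pass it as context_folder.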
Verify the Data Context content
We can ensure that the Data Context was instantiated correctly by printing its contents.
print(context)
This will output the full configuration of the Data Context in the format of a Python dictionary.
Ephemeral
An Ephemeral Data Context is a temporary, in-memory Data Context. It is ideal for data exploration and initial analysis when you do not want to save anything to an existing project, or when you need to work in a hosted environment such as an EMR Spark cluster.
An Ephemeral Data Context does not persist beyond the current Python session. To keep the contents of your Ephemeral Data Context for future use, see How to convert an Ephemeral Data Context to a Filesystem Data Context.
Prerequisites
- A Great Expectations instance. See Setup: Overview.
Import classes
To create your Data Context, you'll create a configuration that uses in-memory Metadata Stores.
- Run the following command to import the DataContextConfig and InMemoryStoreBackendDefaults classes:
from great_expectations.data_context.types.base import (
    DataContextConfig,
    InMemoryStoreBackendDefaults,
)
- Run the following command to import the EphemeralDataContext class:
from great_expectations.data_context import EphemeralDataContext
Create the Data Context configuration
Run the following command to create a Data Context configuration that uses in-memory Metadata Stores. It passes an instance of the InMemoryStoreBackendDefaults class as the store_backend_defaults parameter when initializing an instance of the DataContextConfig class:
project_config = DataContextConfig(
    store_backend_defaults=InMemoryStoreBackendDefaults()
)
Instantiate an Ephemeral Data Context
Run the following command to instantiate the EphemeralDataContext class, passing in the DataContextConfig instance you created as the value of the project_config parameter:
context = EphemeralDataContext(project_config=project_config)
An Ephemeral Data Context is an in-memory Data Context that is not intended to persist beyond the current Python session. However, if you decide that you would like to save its contents for future use you can do so by converting it to a Filesystem Data Context:
context = context.convert_to_file_context()
This method initializes a Filesystem Data Context in the current working directory of the Python process that contains the Ephemeral Data Context. For a more detailed explanation of this method, see our guide on how to convert an Ephemeral Data Context to a Filesystem Data Context.
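Since the conversion writes to the current working directory, you may want to control where that is before converting. The sketch below uses only the Python standard library to switch directories temporarily. The working_directory helper is a hypothetical name and not part of GX, and the commented usage assumes that the target folder already exists.

```python
import os
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch the process working directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always return to the original directory, even if the body raises.
        os.chdir(previous)


# Hypothetical usage, assuming `context` is an Ephemeral Data Context and
# "my_gx_project" is an existing folder:
# with working_directory("my_gx_project"):
#     context = context.convert_to_file_context()
```

On Python 3.11 and later, contextlib.chdir from the standard library offers the same behavior.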
Connect GX to source data systems
Now that you have an Ephemeral Data Context you can connect GX to your data. See the following topics:
Next steps
To customize a Data Context configuration for Metadata Stores and Data Docs, see:
- Configure Expectation Stores
- Configure Validation Result Stores
- How to configure and use a Metric Store
- How to host and share Data Docs on a filesystem
To connect GX to source data: