How to initialize a Filesystem Data Context in Python
A Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components) is required in almost all Python scripts that use GX, and when using the CLI (Command Line Interface).
This guide shows how to use Python code to initialize, instantiate, and verify the contents of a Filesystem Data Context.
Prerequisites
- A Great Expectations instance. See Install Great Expectations locally.
Steps
1. Import Great Expectations
We will import the Great Expectations module with the command:
import great_expectations as gx
2. Determine the folder to initialize the Data Context in
For purposes of this example, we will assume that we have an empty folder to initialize our Filesystem Data Context in:
path_to_empty_folder = "/my_gx_project/"
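If the folder may not exist yet, or you are unsure whether it is empty, a small standard-library check can help before moving on. The following is only a sketch: prepare_project_folder is a hypothetical helper written for this guide, not part of GX.

```python
from pathlib import Path


def prepare_project_folder(folder):
    """Create the folder if it is missing and report whether it is empty.

    Hypothetical helper for illustration; it is not part of Great Expectations.
    """
    path = Path(folder).expanduser().resolve()
    # Create the folder (and any missing parents); do nothing if it exists.
    path.mkdir(parents=True, exist_ok=True)
    # The folder is empty when iterating over it yields no entries.
    is_empty = not any(path.iterdir())
    return path, is_empty
```

The returned path can then be converted to a string and used as path_to_empty_folder in the next step.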
3. Create a GX context
We will provide our empty folder's path to the GX library's FileDataContext.create(...) method as the project_root_dir parameter. Because we are providing a path to an empty folder, FileDataContext.create(...) will initialize a Filesystem Data Context at that location.
For convenience, the FileDataContext.create(...) method will then instantiate and return the newly initialized Data Context, which we can keep in a Python variable.
from great_expectations.data_context import FileDataContext
context = FileDataContext.create(project_root_dir=path_to_empty_folder)
If the project_root_dir provided to the FileDataContext.create(...) method points to a folder that does not already have a Data Context present, the FileDataContext.create(...) method will initialize a Filesystem Data Context at that location even if other files and folders are present. This allows you to easily initialize a Filesystem Data Context in a folder that contains your source data or other project-related contents.
If a Data Context already exists at the provided project_root_dir, the FileDataContext.create(...) method will not re-initialize it. Instead, FileDataContext.create(...) will simply instantiate and return the existing Data Context as is.
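To know in advance which of these two cases applies, you can inspect the folder yourself before calling FileDataContext.create(...). The sketch below rests on an assumption not stated in this guide: that a Filesystem Data Context keeps its configuration in a file named great_expectations.yml, inside a subfolder named gx (newer releases) or great_expectations (older releases). The helper name find_existing_context_config is hypothetical; check the folder layout of your installed version before relying on it.

```python
from pathlib import Path

# Assumed names, not confirmed by this guide: adjust to your GX version.
CONFIG_FILENAME = "great_expectations.yml"
CANDIDATE_SUBFOLDERS = ("gx", "great_expectations")


def find_existing_context_config(project_root_dir):
    """Return the path of an existing Data Context config file, or None.

    Hypothetical helper: it only looks at the filesystem and never calls GX.
    """
    root = Path(project_root_dir)
    for subfolder in CANDIDATE_SUBFOLDERS:
        candidate = root / subfolder / CONFIG_FILENAME
        if candidate.is_file():
            return candidate
    return None
```

A return value of None suggests that FileDataContext.create(...) would initialize a new Data Context in that folder; any other value suggests it would return the existing one. The check is only a convenience, and FileDataContext.create(...) remains the authority on what it finds.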
4. Verify the content of the returned Data Context
We can confirm that the Data Context was instantiated correctly by printing its contents.
print(context)
This will output the full configuration of the Data Context in the format of a Python dictionary.
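The steps above can be collected into a single function, which is convenient when the same initialization runs from several scripts. This is a minimal sketch that uses only the calls shown in this guide; the function name initialize_and_print_context is hypothetical, and the import is placed inside the function purely so the sketch can be loaded on a machine where GX is not installed.

```python
def initialize_and_print_context(project_root_dir):
    """Initialize (or load) a Filesystem Data Context, print it, and return it.

    Hypothetical wrapper around the steps in this guide.
    """
    # Imported lazily so that defining this function does not require GX.
    from great_expectations.data_context import FileDataContext

    # Initializes a new Data Context, or returns the existing one if present.
    context = FileDataContext.create(project_root_dir=project_root_dir)
    # Print the Data Context's configuration for a quick visual check.
    print(context)
    return context
```

Calling initialize_and_print_context(path_to_empty_folder) repeats steps 3 and 4 for the folder chosen in step 2.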
Next steps
For guidance on further customizing your Data Context's configurations for Metadata Stores (connectors that store and retrieve information about metadata in Great Expectations) and Data Docs (human-readable documentation generated from Great Expectations metadata detailing Expectations, Validation Results, etc.), please see:
- How to configure an Expectation Store on a filesystem
- How to configure a Validation Result Store on a filesystem
- How to configure and use a Metric Store
- How to host and share Data Docs on a filesystem
If you are content with the default configuration of your Data Context, you can move on to connecting GX to your source data:
- How to configure a Pandas Datasource
- How to configure a Spark Datasource
- How to configure a SQL Datasource
Additional information
Related guides
To initialize and instantiate a temporary Data Context, see: How to instantiate an Ephemeral Data Context.