Create and edit Expectations based on domain knowledge, without inspecting data directly
This guide shows how to create an Expectation Suite (a collection of verifiable assertions about data) without a sample Batch (a selection of records from a Data Asset).
The following are the reasons why you might want to do this:
- You don't have a sample.
- You don't currently have access to the data to make a sample.
- You know exactly how you want your Expectations (verifiable assertions about data) to be configured.
- You want to create Expectations parametrically (you can also do this in interactive mode).
- You don't want to spend the time to validate against a sample.
If you have a use case we have not considered, please contact us on Slack.
Note that the interactive method used to create and edit Expectations does not edit or alter the Batch data.
Prerequisites
- Great Expectations installed in a Python environment
- A Filesystem Data Context for your Expectations
- A Data Source from which to request a Batch of data for introspection
Import the Great Expectations module and instantiate a Data Context
For this guide we will be working with Python code in a Jupyter Notebook. Jupyter is included with GX and lets us easily edit code and immediately see the results of our changes.
Run the following code to import Great Expectations and instantiate a Data Context:
import great_expectations as gx
context = gx.data_context.FileDataContext.create(full_path_to_project_directory)
If you're using an Ephemeral Data Context, your configurations will not persist beyond the current Python session. However, if you're using a Filesystem or Cloud Data Context, they do persist. The get_context() method returns the first Cloud or Filesystem Data Context it can find. If a Cloud or Filesystem Data Context has not been configured or cannot be found, it provides an Ephemeral Data Context. For more information about the get_context() method, see Instantiate a Data Context.
Create an ExpectationSuite
We will use the add_expectation_suite() method to create an empty ExpectationSuite.
suite = context.add_expectation_suite(expectation_suite_name="my_suite")
Create Expectation Configurations
You are adding Expectation configurations to the suite. Since there is no sample Batch of data, no Validation (the act of applying an Expectation Suite to a Batch) happens during this process. To illustrate how to do this, consider a hypothetical example. Suppose that you have a table with the columns account_id, user_id, transaction_id, transaction_type, and transaction_amt_usd. Then the following code snippet adds an Expectation that the columns of the actual table will appear in the order specified above:
from great_expectations.core.expectation_configuration import ExpectationConfiguration

# Create an Expectation
expectation_configuration_1 = ExpectationConfiguration(
    # Name of expectation type being added
    expectation_type="expect_table_columns_to_match_ordered_list",
    # These are the arguments of the expectation
    # The keys allowed in the dictionary are Parameters and
    # Keyword Arguments of this Expectation Type
    kwargs={
        "column_list": [
            "account_id",
            "user_id",
            "transaction_id",
            "transaction_type",
            "transaction_amt_usd",
        ]
    },
    # This is how you can optionally add a comment about this expectation.
    # It will be rendered in Data Docs.
    # See this guide for details:
    # `How to add comments to Expectations and display them in Data Docs`.
    meta={
        "notes": {
            "format": "markdown",
            "content": "Some clever comment about this expectation. **Markdown** `Supported`",
        }
    },
)
# Add the Expectation to the suite
suite.add_expectation(expectation_configuration=expectation_configuration_1)
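Because nothing is validated at this stage, it can help to spell out what the Expectation above asserts. The following is a minimal plain-Python sketch of the idea, not the library's implementation; the helper name `columns_match_ordered` and the sample column lists are hypothetical.

```python
from typing import Sequence


def columns_match_ordered(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """Return True only when both column lists have the same names in the same order."""
    return list(actual) == list(expected)


expected_columns = [
    "account_id",
    "user_id",
    "transaction_id",
    "transaction_type",
    "transaction_amt_usd",
]

# Same names, same order: the assertion holds.
print(columns_match_ordered(list(expected_columns), expected_columns))

# Same names, but the first two columns are swapped: the assertion fails.
swapped = [
    "user_id",
    "account_id",
    "transaction_id",
    "transaction_type",
    "transaction_amt_usd",
]
print(columns_match_ordered(swapped, expected_columns))
```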
Here are a few more example expectations for this dataset:
expectation_configuration_2 = ExpectationConfiguration(
    expectation_type="expect_column_values_to_be_in_set",
    kwargs={
        "column": "transaction_type",
        "value_set": ["purchase", "refund", "upgrade"],
    },
    # Note optional comments omitted
)
suite.add_expectation(expectation_configuration=expectation_configuration_2)

expectation_configuration_3 = ExpectationConfiguration(
    expectation_type="expect_column_values_to_not_be_null",
    kwargs={
        "column": "account_id",
        "mostly": 1.0,
    },
    meta={
        "notes": {
            "format": "markdown",
            "content": "Some clever comment about this expectation. **Markdown** `Supported`",
        }
    },
)
suite.add_expectation(expectation_configuration=expectation_configuration_3)

expectation_configuration_4 = ExpectationConfiguration(
    expectation_type="expect_column_values_to_not_be_null",
    kwargs={
        "column": "user_id",
        "mostly": 0.75,
    },
    meta={
        "notes": {
            "format": "markdown",
            "content": "Some clever comment about this expectation. **Markdown** `Supported`",
        }
    },
)
suite.add_expectation(expectation_configuration=expectation_configuration_4)
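The `mostly` values above (1.0 for account_id, 0.75 for user_id) set the fraction of values that must satisfy the Expectation. The sketch below illustrates that idea in plain Python under simplifying assumptions: nulls are represented by `None`, an empty column is treated as passing, and the helper name `passes_not_null` is hypothetical. It is meant to show the arithmetic, not how Great Expectations computes the result internally.

```python
from typing import Optional, Sequence


def passes_not_null(values: Sequence[Optional[object]], mostly: float = 1.0) -> bool:
    """Return True when the fraction of non-null values is at least `mostly`."""
    if not values:
        return True  # assumption: an empty column passes vacuously
    non_null = sum(value is not None for value in values)
    return non_null / len(values) >= mostly


# Hypothetical user_id column: 3 of 4 values are non-null (a fraction of 0.75).
user_ids = ["u1", None, "u3", "u4"]

print(passes_not_null(user_ids, mostly=0.75))  # True: 0.75 >= 0.75
print(passes_not_null(user_ids, mostly=1.0))   # False: 0.75 < 1.0
```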
You can see all the available Expectations in the Expectation Gallery.
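Since each configuration is just an Expectation type plus keyword arguments, similar Expectations can be generated parametrically instead of being written out one by one. The following sketch builds plain dictionaries using only the standard library; the helper name `not_null_configs` is hypothetical, and the final commented step assumes Great Expectations is installed and that `suite` and `ExpectationConfiguration` are available as shown earlier in this guide.

```python
from typing import Any, Dict, List


def not_null_configs(mostly_by_column: Dict[str, float]) -> List[Dict[str, Any]]:
    """Build one not-null Expectation description per column."""
    return [
        {
            "expectation_type": "expect_column_values_to_not_be_null",
            "kwargs": {"column": column, "mostly": mostly},
        }
        for column, mostly in mostly_by_column.items()
    ]


# Thresholds taken from the examples in this guide.
configs = not_null_configs({"account_id": 1.0, "user_id": 0.75})
for config in configs:
    print(config["kwargs"])

# With Great Expectations installed and `suite` created as in this guide,
# each dictionary could then be turned into a configuration, for example:
#   suite.add_expectation(
#       expectation_configuration=ExpectationConfiguration(**config)
#   )
```

Keeping the thresholds in one dictionary makes it easy to review or adjust them in a single place.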
Save your Expectations for future use
To keep your Expectations for future use, you save them to your Data Context. A Filesystem or Cloud Data Context persists outside the current Python session, so saving the Expectation Suite in your Data Context's Expectations Store ensures you can access it in the future:
context.save_expectation_suite(expectation_suite=suite)
Ephemeral Data Contexts don't persist beyond the current Python session. If you're working with an Ephemeral Data Context, you'll need to convert it to a Filesystem Data Context using the Data Context's convert_to_file_context() method. Otherwise, your saved configurations won't be available in future Python sessions, because the Data Context itself will no longer be available.
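For orientation, a saved suite is essentially a named list of Expectation configurations. The sketch below builds an approximate picture of that structure with the standard library only. The exact fields and file layout written by the Expectations Store depend on your Great Expectations version and may include additional keys, so treat this as an assumption-laden illustration rather than the actual file format.

```python
import json

# Approximate shape of a suite holding one of the Expectations from this guide.
# Assumption: the real stored file may contain extra fields not shown here.
suite_as_dict = {
    "expectation_suite_name": "my_suite",
    "expectations": [
        {
            "expectation_type": "expect_column_values_to_be_in_set",
            "kwargs": {
                "column": "transaction_type",
                "value_set": ["purchase", "refund", "upgrade"],
            },
            "meta": {},
        }
    ],
    "meta": {},
}

suite_json = json.dumps(suite_as_dict, indent=2)
print(suite_json)
```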
Next steps
Now that you have created and saved an Expectation Suite, you can Validate your data.