Configure Expectation Stores
An Expectation Store is a connector to store and retrieve information about collections of verifiable assertions about data.
By default, new Expectations (verifiable assertions about data) are stored as Expectation Suites (collections of verifiable assertions about data) in JSON format in the expectations/ subdirectory of your gx/ folder. Use the information provided here to configure a store for your Expectations.
- Amazon S3
- Microsoft Azure Blob Storage
- Google Cloud Storage (GCS)
- Filesystem
- PostgreSQL
Amazon S3
Use the information provided here to configure a new storage location for Expectations in Amazon S3.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- Permissions to install boto3 in your local environment.
- An S3 bucket and prefix to store Expectations.
Install boto3 with pip
Python interacts with AWS through the boto3 library. Great Expectations uses this library in the background when working with AWS. Although you won't use boto3 directly, you need to install it in your virtual environment.
Run one of the following pip commands to install boto3 in your virtual environment:
python -m pip install boto3
or
python3 -m pip install boto3
To set up boto3 with AWS and use it within Python, see the Boto3 documentation.
Verify your AWS credentials
Run the following command in the AWS CLI to verify that your AWS credentials are properly configured:
aws sts get-caller-identity
When your credentials are properly configured, your UserId, Account, and Arn are returned. If your credentials are not configured correctly, an error message appears. If you received an error message, or you couldn't verify your credentials, see Configuring the AWS CLI.
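If you want to script this check, the command's output can be inspected from Python. The following is a minimal sketch, assuming the AWS CLI prints JSON (its default output format). The helper name missing_identity_fields is hypothetical and is not part of Great Expectations or the AWS CLI.

```python
import json

# The three fields the text above says are returned for valid credentials.
REQUIRED_FIELDS = ("UserId", "Account", "Arn")


def missing_identity_fields(cli_output: str) -> list:
    """Return the identity fields that are absent from the CLI output.

    `cli_output` is the text printed by `aws sts get-caller-identity`.
    An empty list means all three fields were returned.
    """
    try:
        payload = json.loads(cli_output)
    except json.JSONDecodeError:
        # An error message is not JSON, so treat every field as missing.
        return list(REQUIRED_FIELDS)
    if not isinstance(payload, dict):
        return list(REQUIRED_FIELDS)
    return [field for field in REQUIRED_FIELDS if field not in payload]
```

One way to obtain the text is subprocess.run(["aws", "sts", "get-caller-identity"], capture_output=True, text=True).stdout, which requires the AWS CLI to be installed and on your PATH.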
Identify your Data Context Expectations Store
Your Expectation Store configuration is in your Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components).
The following section in your Data Context great_expectations.yml file tells Great Expectations to look for Expectations in a Store named expectations_store:
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
expectations_store_name: expectations_store
The default base_directory for expectations_store is expectations/.
Update your configuration file to include a new Store for Expectations
To manually add an Expectations Store to your configuration, add the following configuration to the stores section of your great_expectations.yml file:
stores:
  expectations_S3_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your>'
      prefix: '<your>' # Bucket and prefix in combination must be unique across all stores
expectations_store_name: expectations_S3_store
Change the default store_backend settings to make the Store work with S3. The class_name is set to TupleS3StoreBackend, bucket is the name of your S3 bucket, and prefix is the folder in your S3 bucket where Expectations are located.
The following example shows the additional options that are available to customize TupleS3StoreBackend:
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>' # Bucket and prefix in combination must be unique across all stores
  boto3_options:
    endpoint_url: ${S3_ENDPOINT} # Uses the S3_ENDPOINT environment variable to determine which endpoint to use.
    region_name: '<your_aws_region_name>'
In the previous example, the Store name is expectations_S3_store. If you use a personalized Store name, you must also update the value of the expectations_store_name key to match the Store name. For example:
expectations_store_name: expectations_S3_store
When you update the expectations_store_name key value, Great Expectations uses the new Store for Expectations.
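The requirement that expectations_store_name match a key under stores can be checked mechanically. The following is a minimal sketch, assuming the YAML file has already been loaded into a dictionary by a YAML parser of your choice. The function name check_expectations_store is hypothetical and is not part of Great Expectations.

```python
def check_expectations_store(config: dict) -> str:
    """Return the backend class_name that expectations_store_name selects.

    Raises ValueError when the name does not match a key under `stores`.
    """
    store_name = config.get("expectations_store_name")
    stores = config.get("stores", {})
    if store_name not in stores:
        raise ValueError(
            f"expectations_store_name {store_name!r} does not match any store: "
            f"{sorted(stores)}"
        )
    return stores[store_name]["store_backend"]["class_name"]


# Example mirroring the S3 configuration shown above (placeholder values).
example_config = {
    "stores": {
        "expectations_S3_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "<your_s3_bucket_name>",
                "prefix": "<your_s3_bucket_folder_name>",
            },
        }
    },
    "expectations_store_name": "expectations_S3_store",
}
print(check_expectations_store(example_config))  # prints TupleS3StoreBackend
```

This only inspects the structure of the configuration. It does not confirm that the bucket exists or that your credentials can reach it.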
Add the following code to great_expectations.yml to configure the IAM user:
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    aws_access_key_id: ${AWS_ACCESS_KEY_ID} # Uses the AWS_ACCESS_KEY_ID environment variable to get aws_access_key_id.
    aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY} # Uses the AWS_SECRET_ACCESS_KEY environment variable to get aws_secret_access_key.
    aws_session_token: ${AWS_SESSION_TOKEN} # Uses the AWS_SESSION_TOKEN environment variable to get aws_session_token.
Add the following code to great_expectations.yml to configure the IAM Assume Role:
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>' # Bucket and prefix in combination must be unique across all stores
  boto3_options:
    assume_role_arn: '<your_role_to_assume>'
    region_name: '<your_aws_region_name>'
    assume_role_duration: session_duration_in_seconds
If you're storing Validations in S3 or Data Docs in S3, make sure that the prefix values are disjoint and one is not a substring of the other.
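The substring rule can be checked before you deploy a configuration. The following is a minimal sketch that compares the prefixes of stores assumed to share one bucket. The function name overlapping_prefixes and the store names in the usage note are hypothetical.

```python
from itertools import combinations
from typing import List, Mapping, Tuple


def overlapping_prefixes(prefixes: Mapping[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of store names whose prefixes are equal or contain one another.

    `prefixes` maps a store name to the prefix configured for that store.
    """
    clashes = []
    for (name_a, prefix_a), (name_b, prefix_b) in combinations(sorted(prefixes.items()), 2):
        # The rule from the text: neither prefix may be a substring of the other.
        if prefix_a in prefix_b or prefix_b in prefix_a:
            clashes.append((name_a, name_b))
    return clashes
```

For example, overlapping_prefixes({"expectations": "gx", "validations": "gx/validations"}) reports the pair as a clash because gx is a substring of gx/validations, while gx/expectations and gx/validations pass.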
Copy existing Expectation JSON files to the S3 bucket (Optional)
If you are converting an existing local Great Expectations deployment to one that works in AWS, you might have Expectations saved that you want to transfer to your S3 bucket.
Run the following aws s3 sync command to copy Expectations into Amazon S3:
aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'
The base_directory is set to expectations/ by default.
In the following example, the Expectations exp1 and exp2 are copied to Amazon S3 and a confirmation message is returned:
upload: ./exp1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp1.json
upload: ./exp2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp2.json
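Before running the sync, it can help to preview which object each local file is expected to map to. The following is a minimal sketch that only builds the destination URIs from local paths and does not contact AWS. The function name planned_uploads is hypothetical, and the mapping approximates the destination naming shown in the output above for JSON files only.

```python
from pathlib import Path


def planned_uploads(base_directory: str, bucket: str, prefix: str) -> dict:
    """Map each local JSON file to the S3 URI it is expected to be copied to."""
    base = Path(base_directory)
    uploads = {}
    for path in sorted(base.rglob("*.json")):
        relative_key = path.relative_to(base).as_posix()
        # Join prefix and relative path, skipping an empty prefix.
        key = "/".join(part for part in (prefix.strip("/"), relative_key) if part)
        uploads[str(path)] = f"s3://{bucket}/{key}"
    return uploads
```

Calling planned_uploads("expectations/", "<your_s3_bucket_name>", "<your_s3_bucket_folder_name>") returns one entry per suite file, which you can compare against the upload lines printed by the sync command.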
Confirm Expectation Suite availability
If you copied your existing Expectation Suites to the S3 bucket, run the following Python code to confirm that Great Expectations can find them:
import great_expectations as gx
context = gx.get_context()
context.list_expectation_suite_names()
The Expectation Suites you copied to S3 are returned as a list. Expectation Suites that weren't copied to the new Store aren't listed.
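To spot suites that were left behind, you can compare the returned list with the files in your local folder. The following is a minimal sketch, assuming each suite is saved as a file named after the suite with a .json extension directly inside the folder, as in the exp1.json example. The function name suites_missing_from_store is hypothetical.

```python
from pathlib import Path


def suites_missing_from_store(local_directory: str, listed_names: list) -> list:
    """Return local suite names that do not appear in `listed_names`."""
    # Assumption: the suite name equals the file name without ".json".
    local_names = {path.stem for path in Path(local_directory).glob("*.json")}
    return sorted(local_names - set(listed_names))
```

You would pass the result of context.list_expectation_suite_names() as listed_names. An empty result means every local suite is visible through the new Store.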
Microsoft Azure Blob Storage
Use the information provided here to configure a new storage location for Expectations in Microsoft Azure Blob Storage.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- An Azure Storage account.
- An Azure Blob container. If you need to host and share Data Docs on Azure Blob Storage, you can set this up first and then use the existing $web container to store your Expectations.
- A prefix (folder) in which to store Expectations. You don't need to create the folder; the prefix is just part of the Azure Blob name.
Configure the config_variables.yml file with your Azure Storage credentials
GX recommends that you store Azure Storage credentials in the config_variables.yml file, which is located in the uncommitted/ folder by default, and is not part of source control. The following code adds Azure Storage credentials below the AZURE_STORAGE_CONNECTION_STRING key:
AZURE_STORAGE_CONNECTION_STRING: "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
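A malformed connection string is a common cause of authentication failures, so it can be worth checking its parts before use. The following is a minimal sketch that splits the string into settings. The function names are hypothetical, and treating AccountName and AccountKey as required is an assumption based on the example above (other authentication forms use different settings).

```python
# Assumption: an account-key connection string, as in the example above.
REQUIRED_KEYS = ("AccountName", "AccountKey")


def parse_connection_string(connection_string: str) -> dict:
    """Split a semicolon-delimited connection string into a dict of settings."""
    settings = {}
    for part in connection_string.split(";"):
        if not part:
            continue
        # Split on the first "=" only: account keys are base64 and may end in "=".
        key, _, value = part.partition("=")
        settings[key] = value
    return settings


def missing_settings(connection_string: str) -> list:
    """Return the required settings that are absent or empty."""
    settings = parse_connection_string(connection_string)
    return [key for key in REQUIRED_KEYS if not settings.get(key)]
```

Splitting on the first equals sign only matters here because the account key in the example ends with ==, which a naive split would truncate.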
To learn more about the additional options for configuring the config_variables.yml file, or additional environment variables, see How to configure credentials.
Identify your Data Context Expectations Store
Your Expectations Store configuration is provided in your Data Context. Open great_expectations.yml and find the following entry:
expectations_store_name: expectations_store
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
This configuration tells Great Expectations to look for Expectations in a Store named expectations_store. The default base_directory for expectations_store is expectations/.
Update your configuration file to include a new Store for Expectations
In the following example, expectations_store_name is set to expectations_AZ_store, but it can be personalized. You also need to change the store_backend settings. The class_name is TupleAzureBlobStoreBackend, container is the name of your blob container where Expectations are stored, prefix is the folder in the container where Expectations are located, and connection_string is ${AZURE_STORAGE_CONNECTION_STRING} to reference the corresponding key in the config_variables.yml file.
expectations_store_name: expectations_AZ_store
stores:
  expectations_AZ_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleAzureBlobStoreBackend
      container: <blob-container>
      prefix: expectations
      connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
If the container for hosting and sharing Data Docs on Azure Blob Storage is named $web, use container: \$web to allow access to the $web container.
Additional authentication and configuration options are available. See Hosting and sharing Data Docs on Azure Blob Storage.
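The ${AZURE_STORAGE_CONNECTION_STRING} reference and the \$web escape both relate to the dollar sign acting as a substitution marker. The following is a simplified illustration of that idea and is not the Great Expectations implementation. The function name substitute and its rules (only ${NAME} references and the \$ escape are handled) are assumptions made for this sketch.

```python
import re
from typing import Mapping

# Matches either an escaped dollar sign (\$) or a ${NAME} reference.
_PLACEHOLDER = re.compile(r"\\\$|\$\{(\w+)\}")


def substitute(value: str, variables: Mapping[str, str]) -> str:
    """Replace ${NAME} with its value and turn \\$ into a literal dollar sign."""

    def _replace(match):
        if match.group(0) == "\\$":
            # Escaped marker: keep a literal "$" and do not substitute.
            return "$"
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"No value configured for {name}")
        return variables[name]

    return _PLACEHOLDER.sub(_replace, value)
```

In this sketch, a value of \$web comes out as the literal container name $web, while ${AZURE_STORAGE_CONNECTION_STRING} is replaced by whatever value is supplied for that key.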
Copy existing Expectation JSON files to the Azure blob (Optional)
You can use the az storage blob upload command to copy Expectations into Azure Blob Storage. The following command copies the Expectation exp1 from a local folder to Azure Blob Storage:
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
az storage blob upload -f <local/path/to/expectation.json> -c <GREAT-EXPECTATION-DEDICATED-AZURE-BLOB-CONTAINER-NAME> -n <PREFIX>/<expectation.json>
Example:
az storage blob upload -f gx/expectations/exp1.json -c <blob-container> -n expectations/exp1.json
Finished[#############################################################] 100.0000%
{
"etag": "\"0x8D8E08E5DA47F84\"",
"lastModified": "2021-03-06T10:55:33+00:00"
}
To learn more about other methods that are available to copy Expectation JSON files into Azure Blob Storage, see Introduction to Azure Blob Storage.
Confirm that the new Expectation Suites have been added
If you copied your existing Expectation Suites to Azure Blob Storage, run the following Python command to confirm that Great Expectations can find them:
import great_expectations as gx
context = gx.get_context()
context.list_expectation_suite_names()
A list of Expectation Suites you copied to Azure Blob Storage is returned. Expectation Suites that weren't copied to the new folder are not listed.
Confirm that Expectations can be accessed from Azure Blob Storage
Run the following command to confirm your Expectations have been copied to Azure Blob Storage:
great_expectations suite list
If your Expectations have not been copied to Azure Blob Storage, the message "No Expectations were found" is returned.
GCS
Use the information provided here to configure a new storage location for Expectations in GCS.
To view all the code used in this topic, see how_to_configure_an_expectation_store_in_gcs.py.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- A GCP service account with credentials that allow access to GCP resources such as Storage Objects.
- A GCP project, GCS bucket, and prefix to store Expectations.
Configure your GCP credentials
Confirm that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Expectations will be stored. This includes the following:
- A GCP service account.
- Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable.
- Verifying authentication by running a Google Cloud Storage client library script.
For more information about validating your GCP authentication credentials, see Authenticate to Cloud services using client libraries.
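A quick local check can catch a missing or mistyped GOOGLE_APPLICATION_CREDENTIALS value before you attempt to connect. The following is a minimal sketch, assuming the variable points to a service account key file in JSON format, as in the prerequisites. The function name credentials_problem is hypothetical, and the check does not authenticate against Google Cloud.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional


def credentials_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a description of the first problem found, or None if none is found."""
    location = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not location:
        return "GOOGLE_APPLICATION_CREDENTIALS is not set"
    path = Path(location)
    if not path.is_file():
        return f"{location} does not exist or is not a file"
    try:
        key_data = json.loads(path.read_text())
    except json.JSONDecodeError:
        return f"{location} is not valid JSON"
    # Assumption: a service account key file, which declares its type.
    if not isinstance(key_data, dict) or key_data.get("type") != "service_account":
        return f"{location} is not a service account key file"
    return None
```

A return value of None only means the file looks plausible. Running a client library script, as described above, is still the way to confirm that authentication works.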
Identify your Data Context Expectations Store
The configuration for your Expectations Store is available in your Data Context. Open great_expectations.yml and find the following entry:
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
expectations_store_name: expectations_store
This configuration tells Great Expectations to look for Expectations in the expectations_store Store. The default base_directory for expectations_store is expectations/.
Update your configuration file to include a new store for Expectations
In the following example, expectations_store_name is set to expectations_GCS_store, but it can be personalized. You also need to change the store_backend settings. The class_name is TupleGCSStoreBackend, project is your GCP project, bucket is the name of your GCS bucket, and prefix is the folder on GCS where Expectations are stored.
stores:
  expectations_GCS_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <your>
      bucket: <your>
      prefix: <your>
expectations_store_name: expectations_GCS_store
If you are also storing Validations in GCS or Data Docs in GCS, make sure that the prefix values are disjoint and one is not a substring of the other.
Copy existing Expectation JSON files to the GCS bucket (Optional)
Use the gsutil cp command to copy Expectations into GCS. For example, the following command copies the Expectation Suite my_expectation_suite from a local folder into a GCS bucket:
gsutil cp expectations/my_expectation_suite.json gs://<your>/<your>/my_expectation_suite.json
The following confirmation message is returned:
Operation completed over 1 objects
Additional methods for copying Expectations into GCS are available. See Upload objects from a filesystem.
Confirm that the new Expectation Suites have been added
If you copied your existing Expectation Suites to GCS, run the following Python command to confirm that Great Expectations can find them:
import great_expectations as gx
context = gx.get_context()
context.list_expectation_suite_names()
A list of Expectation Suites you copied to GCS is returned. Expectation Suites that weren't copied to the new Store aren't listed.
Confirm that Expectations can be accessed from GCS
Run the following command to confirm your Expectations were copied to GCS:
great_expectations suite list
If your Expectations were not copied to GCS, a message indicating no Expectations were found is returned.
Filesystem
Use the information provided here to configure a new storage location for Expectations on your Filesystem.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- A storage location for Expectations. This can be a local path, or a path to a network filesystem.
Create a new folder for Expectations
Run the following command to create a new folder for your Expectations and move your existing Expectations to the new folder:
# in the gx/ folder
mkdir shared_expectations
mv expectations/npi_expectations.json shared_expectations/
In this example, the name of the Expectation Suite is npi_expectations and the path to the new storage location is shared_expectations/, relative to the gx/ folder.
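If you prefer to perform the same step from Python, for example on a system without these shell commands, the standard library covers it. The following is a minimal sketch of the equivalent operation. The function name move_suite is hypothetical, and it assumes the suite currently lives in the expectations/ subfolder of the gx/ folder.

```python
import shutil
from pathlib import Path


def move_suite(gx_directory: str, suite_file: str, new_folder: str = "shared_expectations") -> Path:
    """Create `new_folder` inside the gx/ folder and move one suite file into it."""
    gx_root = Path(gx_directory)
    destination = gx_root / new_folder
    # Equivalent of mkdir; does nothing if the folder already exists.
    destination.mkdir(parents=True, exist_ok=True)
    # Equivalent of mv; returns the path of the moved file.
    moved_to = shutil.move(
        str(gx_root / "expectations" / suite_file),
        str(destination / suite_file),
    )
    return Path(moved_to)
```

Calling move_suite("gx", "npi_expectations.json") from the parent of the gx/ folder mirrors the two shell commands above.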
Identify your Data Context Expectations Store
The configuration for your Expectations Store is available in your Data Context. Open great_expectations.yml and find the following entry:
expectations_store_name: expectations_store
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
This configuration tells Great Expectations to look for Expectations in the expectations_store Store. The default base_directory for expectations_store is expectations/.
Update your configuration file to include a new Store for Expectations
In the following example, expectations_store_name is set to shared_expectations_filesystem_store, but it can be personalized. Also, base_directory is set to shared_expectations/, but you can set it to another path that is accessible by Great Expectations.
expectations_store_name: shared_expectations_filesystem_store
stores:
  shared_expectations_filesystem_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: shared_expectations/
Confirm that the new Expectation Suites have been added
If you copied your existing Expectation Suites to your filesystem, run the following Python command to confirm that Great Expectations can find them:
import great_expectations as gx
context = gx.get_context()
context.list_expectation_suite_names()
A list of Expectation Suites you copied to your filesystem is returned. Expectation Suites that weren't copied to the new Store aren't listed.
Version control systems
GX recommends that you store Expectations in a version control system such as Git. The JSON format of Expectations allows for informative diff-statements and modification tracking. In the following example, the expect_table_column_count_to_equal value changes from 330 to 333, and then to 331 (git log lists the most recent commit first):
git log -p npi_expectations.json
commit cbc127fb27095364c3c1fcbf6e7f078369b07455
changed expect_table_column_count_to_equal to 331
diff --git a/gx/expectations/npi_expectations.json b/gx/expectations/npi_expectations.json
--- a/gx/expectations/npi_expectations.json
+++ b/gx/expectations/npi_expectations.json
@@ -17,7 +17,7 @@
{
"expectation_type": "expect_table_column_count_to_equal",
"kwargs": {
- "value": 333
+ "value": 331
}
commit 05b3c8c1ed35d183bac1717d4877fe13bc574963
changed expect_table_column_count_to_equal to 333
diff --git a/gx/expectations/npi_expectations.json b/gx/expectations/npi_expectations.json
--- a/gx/expectations/npi_expectations.json
+++ b/gx/expectations/npi_expectations.json
{
"expectation_type": "expect_table_column_count_to_equal",
"kwargs": {
- "value": 330
+ "value": 333
}
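The same kind of line-by-line comparison can be produced outside Git, for example to review a change before committing it. The following is a minimal sketch using Python's difflib. The function name suite_diff is hypothetical, and the dictionaries in the usage note are simplified stand-ins for a suite file rather than the full file format.

```python
import difflib
import json


def suite_diff(old_suite: dict, new_suite: dict, file_name: str) -> str:
    """Return a unified diff between two versions of a suite, serialized as JSON."""
    # Sorting keys and fixing the indent keeps the diff limited to real changes.
    old_lines = json.dumps(old_suite, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new_suite, indent=2, sort_keys=True).splitlines()
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=f"a/{file_name}",
        tofile=f"b/{file_name}",
        lineterm="",
    )
    return "\n".join(diff)
```

Comparing {"expectation_type": "expect_table_column_count_to_equal", "kwargs": {"value": 333}} with a copy whose value is 331 yields one removed line and one added line for the value, and comparing a suite with itself yields an empty string.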
PostgreSQL
Use the information provided here to configure an Expectations store in a PostgreSQL database.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- A PostgreSQL database with appropriate credentials.
Configure the config_variables.yml file with your database credentials
GX recommends storing database credentials in the config_variables.yml file, which is located in the uncommitted/ folder by default, and is not part of source control.
To add database credentials, open config_variables.yml and add the following entry below the db_creds key:
db_creds:
  drivername: postgresql
  host: '<your_host_name>'
  port: '<your_port>'
  username: '<your_username>'
  password: '<your_password>'
  database: '<your_database_name>'
To configure the config_variables.yml file, or additional environment variables, see How to configure credentials.
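It can help to see how the db_creds fields relate to a standard PostgreSQL connection URL, for example when testing the same credentials with another client. The following is a minimal sketch that assembles such a URL. The function name connection_url is hypothetical, it does not connect to the database, and it is not a description of how Great Expectations builds its own connection.

```python
from urllib.parse import quote_plus


def connection_url(creds: dict) -> str:
    """Assemble a PostgreSQL connection URL from db_creds-style fields."""
    # Percent-encode the user name and password so characters such as
    # "@" or "/" do not break the URL structure.
    user = quote_plus(str(creds["username"]))
    password = quote_plus(str(creds["password"]))
    return (
        f"{creds['drivername']}://{user}:{password}"
        f"@{creds['host']}:{creds['port']}/{creds['database']}"
    )
```

Avoid printing or logging the result, because it contains the password in a recoverable form.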
Identify your Data Context Expectations Store
Open great_expectations.yml and find the following entry:
expectations_store_name: expectations_store
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
This configuration tells Great Expectations to look for Expectations in the expectations_store Store. The default base_directory for expectations_store is expectations/.
Update your configuration file to include a new Store for Expectations
In the following example, expectations_store_name is set to expectations_postgres_store, but it can be personalized. You also need to make some changes to the store_backend settings. The class_name is DatabaseStoreBackend, and credentials is ${db_creds} to reference the corresponding key in the config_variables.yml file.
expectations_store_name: expectations_postgres_store
stores:
  expectations_postgres_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: DatabaseStoreBackend
      credentials: ${db_creds}