How to configure an Expectation Store to use GCS
The Great Expectations CLI is no longer the preferred method for implementing and configuring Great Expectations. This topic will be updated soon to reflect this change. For more information, see A fond farewell to the CLI.
By default, newly Profiled Expectations are stored as Expectation Suites in JSON format in the expectations/ subdirectory of your great_expectations/ folder. Use the information provided here to configure a new storage location for Expectations in Google Cloud Storage (GCS).
To view all the code used in this topic, see how_to_configure_an_expectation_store_in_gcs.py.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- A GCP service account with credentials that allow access to GCP resources such as Storage Objects.
- A GCP project, GCS bucket, and prefix to store Expectations.
1. Configure your GCP credentials
Confirm that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Expectations will be stored. This includes the following:
- A GCP service account.
- Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable.
- Verifying authentication by running a Google Cloud Storage client library script.
For more information about validating your GCP authentication credentials, see Authenticate to Cloud services using client libraries.
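Before running a client library script, a quick local check can catch the most common setup mistakes. The sketch below assumes that you authenticate with a service account JSON key file. The helper name check_gcp_credentials_file is hypothetical, and the check only inspects the local file: it does not contact GCP or prove that the account can reach your bucket.

```python
import json
import os


def check_gcp_credentials_file(env=None):
    """Return (ok, message) for the key file named by GOOGLE_APPLICATION_CREDENTIALS.

    Hypothetical helper: it only inspects the local file and never calls GCP.
    """
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(path):
        return False, f"credentials file not found: {path}"
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        return False, f"credentials file is not readable JSON: {exc}"
    # Assumption: a service account key file declares its type explicitly.
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return False, "credentials file is not a service account key"
    return True, f"service account key for project {key.get('project_id')}"


if __name__ == "__main__":
    ok, message = check_gcp_credentials_file()
    print("OK" if ok else "PROBLEM", "-", message)
```

A passing result only means the variable and file look plausible; the client library script remains the real authentication test.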
2. Identify your Data Context Expectations Store
The configuration for your Expectations Store is available in your Data Context. Open great_expectations.yml and find the following entry:
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

expectations_store_name: expectations_store
This configuration tells Great Expectations to look
for Expectations in the
expectations_store Store. The default
base_directory for
expectations_store is
expectations/.
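To make the link between expectations_store_name and the stores entry concrete, the following sketch resolves the active Expectations Store from a configuration that has already been parsed into a Python dictionary. The function name active_expectations_backend is hypothetical and is not part of the Great Expectations API; the dictionary simply mirrors the default entry.

```python
def active_expectations_backend(config):
    """Return the store_backend settings of the store named by expectations_store_name."""
    store_name = config["expectations_store_name"]
    try:
        store = config["stores"][store_name]
    except KeyError:
        raise KeyError(
            f"expectations_store_name refers to an undefined store: {store_name!r}"
        ) from None
    return store["store_backend"]


# The default entry from great_expectations.yml, written as a Python dictionary.
default_config = {
    "stores": {
        "expectations_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "expectations/",
            },
        }
    },
    "expectations_store_name": "expectations_store",
}

backend = active_expectations_backend(default_config)
print(backend["class_name"], backend["base_directory"])
```

The lookup fails loudly when expectations_store_name points at a store that is not defined under stores, which is the mismatch to watch for when you rename the store.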
3. Update your configuration file to include a new store for Expectations
In the following example,
expectations_store_name is set to
expectations_GCS_store, but it can be
personalized. You also need to change the
store_backend settings. The
class_name is
TupleGCSStoreBackend,
project is your GCP project,
bucket is the address of your GCS bucket,
and prefix is the folder on GCS where
Expectations are stored.
stores:
  expectations_GCS_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <your>
      bucket: <your>
      prefix: <your>

expectations_store_name: expectations_GCS_store
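If you generate configuration from code, the same settings can be assembled as a Python dictionary. This is a minimal sketch: the function name gcs_expectations_store and the values my-gcp-project, my-gcs-bucket, and expectations are illustrative placeholders, not names from Great Expectations or from your environment.

```python
import json


def gcs_expectations_store(project, bucket, prefix):
    """Build the stores entry for an Expectations Store backed by GCS."""
    return {
        "class_name": "ExpectationsStore",
        "store_backend": {
            "class_name": "TupleGCSStoreBackend",
            "project": project,
            "bucket": bucket,
            "prefix": prefix,
        },
    }


# Placeholder values; replace them with your own project, bucket, and prefix.
store_name = "expectations_GCS_store"
config_update = {
    "stores": {
        store_name: gcs_expectations_store(
            "my-gcp-project", "my-gcs-bucket", "expectations"
        )
    },
    "expectations_store_name": store_name,
}
print(json.dumps(config_update, indent=2))
```

Using one variable for both the stores key and expectations_store_name keeps the two values from drifting apart.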
If you are also storing Validations in GCS or Data Docs in GCS, make sure that the prefix values are disjoint and that one is not a substring of the other.
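The prefix rule is easy to check mechanically. The sketch below compares every pair of prefixes and reports the pairs where one prefix equals or is contained in the other. The function name conflicting_prefixes and the example prefixes are hypothetical.

```python
from itertools import combinations


def conflicting_prefixes(prefixes):
    """Return pairs of store names whose prefixes are equal or contained in one another.

    `prefixes` maps a store name to the GCS prefix configured for it.
    """
    conflicts = []
    for (name_a, prefix_a), (name_b, prefix_b) in combinations(
        sorted(prefixes.items()), 2
    ):
        # The substring test also catches identical and empty prefixes.
        if prefix_a in prefix_b or prefix_b in prefix_a:
            conflicts.append((name_a, name_b))
    return conflicts


# Example prefixes: the Data Docs prefix contains the Expectations prefix.
example = {
    "expectations": "gx/expectations",
    "validations": "gx/validations",
    "data_docs": "gx/expectations/docs",
}
print(conflicting_prefixes(example))
```

An empty list means that the prefixes satisfy the rule.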
4. Copy existing Expectation JSON files to the GCS bucket (Optional)
Use the gsutil cp command to copy Expectations into GCS. For example, the following command copies the Expectation Suite my_expectation_suite from a local folder into a GCS bucket:
gsutil cp expectations/my_expectation_suite.json gs://<your>/<your>/my_expectation_suite.json
The following confirmation message is returned:
Operation completed over 1 objects
Additional methods for copying Expectations into GCS are available. See Upload objects from a filesystem.
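When you have more than one Expectation Suite, it can help to generate the gsutil cp commands rather than type them. The following sketch walks a local expectations/ folder and builds one command per JSON file, keeping each file's relative path under the prefix. The function name gsutil_copy_commands and the bucket and prefix values are hypothetical, and the sketch only builds the commands; it does not run them.

```python
from pathlib import Path


def gsutil_copy_commands(local_dir, bucket, prefix):
    """Return one `gsutil cp` argument list per Expectation Suite JSON file.

    The path of each file relative to local_dir is kept under the GCS prefix.
    """
    root = Path(local_dir)
    commands = []
    for path in sorted(root.rglob("*.json")):
        relative = path.relative_to(root).as_posix()
        # Skip an empty prefix so the destination never contains "//".
        parts = [part for part in (prefix.strip("/"), relative) if part]
        destination = f"gs://{bucket}/" + "/".join(parts)
        commands.append(["gsutil", "cp", str(path), destination])
    return commands


if __name__ == "__main__":
    # Placeholder bucket and prefix; replace them with your own values.
    for command in gsutil_copy_commands("expectations", "my-gcs-bucket", "expectations"):
        print(" ".join(command))
```

Each argument list can be reviewed first and then passed to subprocess.run(command, check=True) on a machine where gsutil is installed and authenticated.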
5. Confirm that the new Expectation Suites have been added
If you copied your existing Expectation Suites to GCS, run the following Python command to confirm that Great Expectations can find them:
import great_expectations as gx
context = gx.get_context()
context.list_expectation_suite_names()
A list of Expectation Suites you copied to GCS is returned. Expectation Suites that weren't copied to the new Store aren't listed.
6. Confirm that Expectations can be accessed from GCS
Run the following command to confirm your Expectations were copied to GCS:
great_expectations suite list
If your Expectations were not copied to GCS, a message indicating no Expectations were found is returned.