How to configure a Validation Result store in GCS
By default, Validation Results, which are generated when data is
validated against an Expectation or Expectation Suite, are stored
in JSON format in the uncommitted/validations/ subdirectory of
your great_expectations/ folder.
Validation Results can include sensitive or regulated
data that should not be committed to a source control
system. Use the information provided here to configure
a new storage location for Validation Results in
Google Cloud Storage (GCS).
To view all the code used in this topic, see how_to_configure_a_validation_result_store_in_gcs.py.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- A Checkpoint.
- A GCP service account with credentials that allow access to GCP resources such as Storage Objects.
- A GCP project, GCS bucket, and prefix to store Validation Results.
1. Configure your GCP credentials
Confirm that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Validation Results will be stored. This includes the following:
- A GCP service account.
- Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable.
- Verifying authentication by running a Google Cloud Storage client library script.
For more information about validating your GCP authentication credentials, see Authenticate to Cloud services using client libraries.
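Before running a client library script, a local pre-flight check can catch the most common setup mistakes. The following is a minimal sketch, not part of Great Expectations or the Google client libraries: the function name check_gcp_credentials_file and the list of required fields are illustrative assumptions, and the check assumes the variable points to a service account key file (other credential types use a different "type" value). It does not contact GCP, so it cannot confirm that the account has access to the bucket.

```python
import json
import os
from pathlib import Path

# Fields present in a service account key file downloaded from GCP.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_gcp_credentials_file(env=None):
    """Return a list of problems with the configured key file (empty if none found)."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]

    key_file = Path(path)
    if not key_file.is_file():
        return [f"{path} does not exist or is not a file"]

    try:
        key = json.loads(key_file.read_text())
    except json.JSONDecodeError:
        return [f"{path} is not valid JSON"]
    if not isinstance(key, dict):
        return [f"{path} does not contain a JSON object"]

    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if name not in key]
    # Assumption: this guide uses a service account, so other credential types are flagged.
    if "type" in key and key["type"] != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

Calling check_gcp_credentials_file() with no arguments inspects the current environment. An empty list means the key file looks structurally sound, and the client library script remains the real test of access.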
2. Identify your Data Context Validation Results Store
The configuration for your Validation Results Store (a connector
that stores and retrieves information about objects generated when
data is validated against an Expectation Suite) is available in
your Data Context, the primary entry point for a Great Expectations
deployment, with configurations and methods for all supporting
components. Open great_expectations.yml and find the following entry:
stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
validations_store_name: validations_store
This configuration tells Great Expectations to look
for Validation Results in the
validations_store Store. The default
base_directory for
validations_store is
uncommitted/validations/.
3. Update your configuration file to include a new Store for Validation Results
In the following example, validations_store_name is set to
validations_GCS_store, but you can use any name. You also need
to change the store_backend settings: class_name is
TupleGCSStoreBackend, project is your GCP project, bucket is
the name of your GCS bucket, and prefix is the folder in GCS
where Validation Result files are stored.
stores:
  validations_GCS_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <your>
      bucket: <your>
      prefix: <your>
validations_store_name: validations_GCS_store
If you are also storing Expectations in GCS or Data Docs in GCS,
make sure that the prefix values are disjoint and that no prefix
is a substring of another.
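The substring rule is easy to violate by accident, for example when one prefix extends another. The following is a small sketch of how the rule could be checked before editing great_expectations.yml. The function name find_overlapping_prefixes and the prefix values are hypothetical and are not part of Great Expectations.

```python
from itertools import combinations


def find_overlapping_prefixes(prefixes):
    """Return pairs of store names whose prefixes are equal or contain one another."""
    overlaps = []
    for (name_a, prefix_a), (name_b, prefix_b) in combinations(prefixes.items(), 2):
        # The guidance above forbids any prefix being a substring of another.
        if prefix_a in prefix_b or prefix_b in prefix_a:
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical prefixes for three stores that share one bucket.
stores = {
    "expectations": "ge/expectations",
    "validations": "ge/validations",
    "data_docs": "ge/validations_docs",
}
print(find_overlapping_prefixes(stores))
# prints [('validations', 'data_docs')]
```

Here "ge/validations" is a substring of "ge/validations_docs", so the pair is reported. Renaming the Data Docs prefix to something like "ge/docs" would clear the overlap.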
4. Copy existing Validation Results to the GCS bucket (Optional)
Use the gsutil cp command to copy Validation Results into GCS.
For example, the following commands copy the Validation Results
validation_1 and validation_2 into a GCS bucket:
gsutil cp uncommitted/validations/my_expectation_suite/validation_1.json gs://<your>/<your>/validation_1.json
gsutil cp uncommitted/validations/my_expectation_suite/validation_2.json gs://<your>/<your>/validation_2.json
The following confirmation message is returned:
Operation completed over 2 objects
Additional methods for copying Validation Results into GCS are available. See Upload objects from a filesystem.
5. Reference the new configuration
To make Great Expectations look for Validation Results
in the GCS store, set the validations_store_name variable
in great_expectations.yml to the name of your GCS
Validations Store. In the previous example, this was
validations_GCS_store.
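A common mistake at this step is a validations_store_name that does not match any key under stores. The following sketch checks that relationship on a configuration that has already been loaded into a Python dictionary, for example with a YAML parser. The function name resolve_validations_store and the project, bucket, and prefix values are hypothetical, and the check is independent of the Great Expectations API.

```python
def resolve_validations_store(config):
    """Return the store_backend settings of the active Validation Results Store."""
    name = config.get("validations_store_name")
    if name is None:
        raise KeyError("validations_store_name is not set")

    stores = config.get("stores", {})
    if name not in stores:
        raise KeyError(f"{name} is not defined under stores")
    return stores[name]["store_backend"]


# Hypothetical configuration mirroring the structure of great_expectations.yml.
config = {
    "stores": {
        "validations_GCS_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleGCSStoreBackend",
                "project": "my-project",            # placeholder value
                "bucket": "my-bucket",              # placeholder value
                "prefix": "validation-results",     # placeholder value
            },
        }
    },
    "validations_store_name": "validations_GCS_store",
}

backend = resolve_validations_store(config)
print(backend["class_name"])
# prints TupleGCSStoreBackend
```

If the printed class is still TupleFilesystemStoreBackend, validations_store_name is pointing at the default filesystem Store instead of the GCS Store.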
6. Confirm that the Validation Results Store has been correctly configured
Run a Checkpoint to store results in the new Validation Results Store on GCS, and then visualize the results by re-building Data Docs.