Version: 0.16.16

How to configure a Validation Result store in GCS

By default, Validation Results (generated when data is Validated against an Expectation or Expectation Suite) are stored in JSON format in the uncommitted/validations/ subdirectory of your great_expectations/ folder. Validation Results can include sensitive or regulated data that should not be committed to a source control system. Use the information provided here to configure a new storage location for Validation Results in Google Cloud Storage (GCS).

To view all the code used in this topic, see how_to_configure_a_validation_result_store_in_gcs.py.

Prerequisites

1. Configure your GCP credentials

Confirm that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Validation Results will be stored. This includes the following:

  • A GCP service account.
  • Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable.
  • Verifying authentication by running a Google Cloud Storage client library script.

For more information about validating your GCP authentication credentials, see Authenticate to Cloud services using client libraries.
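As a quick local pre-check before running a client library script, the following minimal sketch inspects the GOOGLE_APPLICATION_CREDENTIALS environment variable and the key file it points to. The function name check_gcp_credentials_env is hypothetical and not part of Great Expectations or any Google library. The check only confirms that a readable JSON key file exists; it does not prove that the service account can access your bucket.

```python
import json
import os
from pathlib import Path


def check_gcp_credentials_env(environ=None):
    """Return (ok, message) describing the GOOGLE_APPLICATION_CREDENTIALS setting.

    Hypothetical helper: it only inspects the local key file and makes no
    network calls, so it cannot confirm access to a GCS bucket.
    """
    env = os.environ if environ is None else environ
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"

    key_file = Path(path)
    if not key_file.is_file():
        return False, f"key file not found: {path}"

    try:
        key = json.loads(key_file.read_text())
    except (OSError, ValueError):
        return False, f"key file is not readable JSON: {path}"

    # Service account key files carry a "type" field.
    if not isinstance(key, dict) or "type" not in key:
        return False, "key file has no 'type' field"
    return True, f"found credentials of type {key['type']!r}"


if __name__ == "__main__":
    ok, message = check_gcp_credentials_env()
    print("OK" if ok else "PROBLEM", "-", message)
```

If this reports a problem, fix the environment variable or key file first, and then verify access to GCS with a client library script as described above.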

2. Identify your Data Context Validation Results Store

The configuration for your Validation Results Store (a connector to store and retrieve information about objects generated when data is Validated against an Expectation Suite) is available in your Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components). Open great_expectations.yml and find the following entry:

stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

validations_store_name: validations_store

This configuration tells Great Expectations to look for Validation Results in the validations_store Store. The default base_directory for validations_store is uncommitted/validations/.
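To illustrate how the two entries relate, here is a minimal sketch that resolves the active Store from a configuration written as a plain Python dict. The helper active_validations_store is hypothetical and is not a Great Expectations API; the dict mirrors the default great_expectations.yml entry and stands in for the parsed file.

```python
def active_validations_store(config):
    """Return (store_name, store_config) for the active Validation Results Store.

    Hypothetical helper: validations_store_name must name an entry under
    "stores", otherwise a KeyError is raised.
    """
    name = config["validations_store_name"]
    stores = config.get("stores", {})
    if name not in stores:
        raise KeyError(f"validations_store_name {name!r} is not defined under 'stores'")
    return name, stores[name]


# The default configuration from great_expectations.yml, written as a dict.
default_config = {
    "stores": {
        "validations_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "uncommitted/validations/",
            },
        },
    },
    "validations_store_name": "validations_store",
}

name, store = active_validations_store(default_config)
print(name, "->", store["store_backend"]["class_name"])
# prints: validations_store -> TupleFilesystemStoreBackend
```

The lookup makes the dependency explicit: validations_store_name only selects a Store, and the Store's store_backend decides where the files live.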

3. Update your configuration file to include a new Store for Validation Results

In the following example, validations_store_name is set to validations_GCS_store, but you can choose any name. You also need to change the store_backend settings: set class_name to TupleGCSStoreBackend, project to your GCP project, bucket to the address of your GCS bucket, and prefix to the folder on GCS where Validation Result files are stored.

stores:
  validations_GCS_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <your>
      bucket: <your>
      prefix: <your>

validations_store_name: validations_GCS_store

Danger: If you are also storing Expectations or Data Docs in GCS, make sure that the prefix values are disjoint and that one is not a substring of the other.
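The prefix rule can be checked mechanically. The following minimal sketch flags any pair of prefixes where one is equal to, or a substring of, the other. The function overlapping_prefixes and the example prefix values are hypothetical and are used only for illustration.

```python
from itertools import combinations


def overlapping_prefixes(prefixes):
    """Return pairs of store names whose prefixes are equal or contain one another.

    `prefixes` maps a store name to its GCS prefix. Leading and trailing
    slashes are ignored before comparing.
    """
    conflicts = []
    for (name_a, a), (name_b, b) in combinations(sorted(prefixes.items()), 2):
        a_norm, b_norm = a.strip("/"), b.strip("/")
        # Substring test in both directions; equal prefixes also match.
        if a_norm in b_norm or b_norm in a_norm:
            conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical prefixes: the Data Docs prefix contains the validations prefix.
example = {
    "expectations": "ge/expectations",
    "validations": "ge/validations",
    "data_docs": "ge/validations/docs",
}
print(overlapping_prefixes(example))
# prints: [('data_docs', 'validations')]
```

An empty result means that no prefix contains another. Note that an empty prefix is a substring of every other prefix, so this check reports it as a conflict with all other Stores.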

4. Copy existing Validation Results to the GCS bucket (Optional)

Use the gsutil cp command to copy Validation Results into GCS. For example, the following commands copy the Validation Results validation_1 and validation_2 into a GCS bucket:

gsutil cp uncommitted/validations/my_expectation_suite/validation_1.json gs://<your>/<your>/validation_1.json
gsutil cp uncommitted/validations/my_expectation_suite/validation_2.json gs://<your>/<your>/validation_2.json

The following confirmation message is returned:

Operation completed over 2 objects

Additional methods for copying Validation Results into GCS are available. See Upload objects from a filesystem.
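When there are many Validation Results, it can help to generate the gsutil cp commands rather than type them by hand. The following sketch walks a local validations directory and pairs each .json file with a gs:// destination. The function plan_uploads and the bucket and prefix values are hypothetical placeholders. The sketch assumes that you want to keep the local subdirectory layout under the prefix, which differs from the flat destinations in the example above; adjust the destination if you need a different layout. It only prints commands and uploads nothing.

```python
from pathlib import Path


def plan_uploads(local_root, bucket, prefix):
    """Map each local .json Validation Result to a gs:// destination URI.

    Hypothetical helper: keeps each file's path relative to `local_root`
    under `prefix`. Returns a sorted list of (source, destination) pairs.
    """
    root = Path(local_root)
    plan = []
    for path in sorted(root.rglob("*.json")):
        relative = path.relative_to(root).as_posix()
        # Skip an empty prefix so the URI never contains a double slash.
        parts = [part for part in (prefix.strip("/"), relative) if part]
        plan.append((str(path), f"gs://{bucket}/" + "/".join(parts)))
    return plan


if __name__ == "__main__":
    # "my-bucket" and "validations" are placeholder values, not real names.
    for source, destination in plan_uploads(
        "uncommitted/validations", "my-bucket", "validations"
    ):
        print("gsutil cp", source, destination)
```

Review the printed commands before running them, and confirm that the destination layout matches what your Store's prefix expects.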

5. Reference the new configuration

To make Great Expectations look for Validation Results on the GCS store, set the validations_store_name variable to the name of your GCS Validations Store. In the previous example this was validations_GCS_store.
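The change amounts to two edits to the configuration: define the GCS-backed Store, and point validations_store_name at it. The sketch below expresses this on a plain Python dict that stands in for the parsed great_expectations.yml. The helper use_gcs_validations_store and the project, bucket, and prefix values are hypothetical; only the class names and key names come from the configuration shown in this topic.

```python
import copy


def use_gcs_validations_store(config, store_name, project, bucket, prefix):
    """Return a copy of `config` with a GCS-backed Validation Results Store selected.

    Hypothetical helper: the input dict is left unmodified.
    """
    updated = copy.deepcopy(config)
    updated.setdefault("stores", {})[store_name] = {
        "class_name": "ValidationsStore",
        "store_backend": {
            "class_name": "TupleGCSStoreBackend",
            "project": project,
            "bucket": bucket,
            "prefix": prefix,
        },
    }
    # Selecting the Store is a separate step from defining it.
    updated["validations_store_name"] = store_name
    return updated


original = {"stores": {}, "validations_store_name": "validations_store"}
# Placeholder project, bucket, and prefix values for illustration only.
updated = use_gcs_validations_store(
    original, "validations_GCS_store", "my-project", "my-bucket", "validations"
)
print(updated["validations_store_name"])
# prints: validations_GCS_store
```

Defining the Store without updating validations_store_name leaves Great Expectations reading from the previous Store, so both edits are needed.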

6. Confirm that the Validation Results Store has been correctly configured

Run a Checkpoint to store results in the new Validation Results Store on GCS, and then visualize the results by re-building Data Docs.