How to configure a Validation Result store in GCS
By default, Validation Results (generated when data is Validated against an Expectation or Expectation Suite) are stored in JSON format in the uncommitted/validations/ subdirectory of your great_expectations/ folder. Since Validation Results may include examples of data (which could be sensitive or regulated), they should not be committed to a source control system. This guide will help you configure a new storage location for Validation Results in a Google Cloud Storage (GCS) bucket.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial.
- A working installation of Great Expectations.
- Configured a Data Context.
- Configured an Expectation Suite.
- Configured a Checkpoint.
- Configured a Google Cloud Platform (GCP) service account with credentials that can access the appropriate GCP resources, which include Storage Objects.
- Identified the GCP project, GCS bucket, and prefix where Validation Results will be stored.
Steps
1. Configure your GCP credentials
Check that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Validation Results will be stored.
The Google Cloud Platform documentation describes how to verify your authentication for the Google Cloud API, which includes:
- Creating a Google Cloud Platform (GCP) service account,
- Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable,
- Verifying authentication by running a simple Google Cloud Storage client library script.
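Before running a client library script, it can help to check locally that GOOGLE_APPLICATION_CREDENTIALS points at a readable service account key file. The following is a minimal sketch that uses only the Python standard library and makes no network call, so it does not replace the verification described in the Google Cloud documentation. The function name check_credentials_file and the list of required fields are illustrative assumptions, not part of Great Expectations or the Google Cloud SDK.

```python
import json
import os

# Fields a service account key file normally contains (an assumed subset for this check).
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def check_credentials_file(env_var="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return a list of problems found with the key file; an empty list means none were found."""
    path = os.environ.get(env_var)
    if not path:
        return [f"{env_var} is not set"]
    if not os.path.isfile(path):
        return [f"{env_var} points to a missing file: {path}"]
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"could not read {path} as JSON: {exc}"]
    if not isinstance(key, dict):
        return [f"{path} does not contain a JSON object"]
    # Report every missing field rather than stopping at the first one.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in key]
    if key.get("type") != "service_account":
        problems.append("field 'type' is not 'service_account'")
    return problems
```

An empty list only means the file looks structurally plausible. Whether the account can actually reach the bucket still has to be confirmed against GCS itself.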
2. Identify your Data Context Validation Results Store
As with other Stores (connectors to store and retrieve information about metadata in Great Expectations), you can find your Validation Results Store through your Data Context, the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components. The Validation Results Store is a connector to store and retrieve information about objects generated when data is Validated against an Expectation Suite. In your great_expectations.yml, look for the following lines. The configuration tells Great Expectations to look for Validation Results in a Store called validations_store. The base_directory for validations_store is set to uncommitted/validations/ by default.
stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

validations_store_name: validations_store
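The lookup this configuration describes can be sketched in a few lines of Python: validations_store_name selects one entry under stores. The sketch assumes great_expectations.yml has already been loaded into a nested dictionary by a YAML parser of your choice. The function name active_validations_store is hypothetical and is not a Great Expectations API.

```python
def active_validations_store(config):
    """Return (store_name, store_config) for the Store named by validations_store_name."""
    name = config["validations_store_name"]
    stores = config.get("stores", {})
    if name not in stores:
        raise KeyError(f"validations_store_name refers to an undefined Store: {name!r}")
    return name, stores[name]


# The default configuration, written as the dictionary a YAML parser would produce.
default_config = {
    "stores": {
        "validations_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "uncommitted/validations/",
            },
        }
    },
    "validations_store_name": "validations_store",
}

name, store = active_validations_store(default_config)
print(name, store["store_backend"]["base_directory"])
```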
3. Update your configuration file to include a new Store for Validation Results on GCS
In our case, the name of the new Store is set to validations_GCS_store, but it can be any name you like. We also need to make some changes to the store_backend settings. The class_name will be set to TupleGCSStoreBackend, project will be set to your GCP project, bucket will be set to the name of your GCS bucket, and prefix will be set to the folder on GCS where Validation Result files will be located.
stores:
  validations_GCS_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <your>
      bucket: <your>
      prefix: <your>

validations_store_name: validations_GCS_store
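The same change can be expressed as a small transformation on the parsed configuration, which makes the two parts of the edit explicit: a new entry under stores, and validations_store_name pointing at it. This is a sketch that assumes the YAML file has been loaded into a nested dictionary. Writing the result back to great_expectations.yml is not shown. The function name use_gcs_validations_store and the example values in the usage lines are hypothetical.

```python
import copy


def use_gcs_validations_store(config, project, bucket, prefix,
                              store_name="validations_GCS_store"):
    """Return a copy of config with a GCS-backed Validation Results Store made active."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    updated.setdefault("stores", {})[store_name] = {
        "class_name": "ValidationsStore",
        "store_backend": {
            "class_name": "TupleGCSStoreBackend",
            "project": project,
            "bucket": bucket,
            "prefix": prefix,
        },
    }
    # Point Great Expectations at the new Store.
    updated["validations_store_name"] = store_name
    return updated


# Hypothetical values; substitute your own project, bucket, and prefix.
original = {"stores": {}, "validations_store_name": "validations_store"}
updated = use_gcs_validations_store(original, "my-gcp-project", "my-bucket", "validations")
print(updated["validations_store_name"])
```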
If you are also storing Expectations in GCS or Data Docs in GCS, please ensure that the prefix values are disjoint and one is not a substring of the other.
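The substring rule is easy to check mechanically. The sketch below compares every pair of prefixes and reports the pairs that are equal or where one contains the other. The function name conflicting_prefixes and the prefix values in the usage lines are hypothetical.

```python
from itertools import combinations


def conflicting_prefixes(prefixes):
    """Return pairs of Store names whose prefixes are equal or contain one another.

    prefixes maps a Store name to the prefix configured for it.
    """
    conflicts = []
    for (name_a, prefix_a), (name_b, prefix_b) in combinations(sorted(prefixes.items()), 2):
        if prefix_a in prefix_b or prefix_b in prefix_a:
            conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical prefixes: "ge" is a substring of "ge/validations", so that pair is reported.
print(conflicting_prefixes({
    "expectations": "ge/expectations",
    "validations": "ge/validations",
    "data_docs": "ge",
}))
```

Note that an empty prefix is reported as conflicting with every other prefix, since the empty string is a substring of any string.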
4. Copy existing Validation Results to the GCS bucket (This step is optional)
One way to copy Validation Results into GCS is by using the gsutil cp command, which is part of the Google Cloud SDK. In the example below, two Validation Results, validation_1 and validation_2, are copied to the GCS bucket. Information on other ways to copy Validation Results, like the Cloud Storage browser in the Google Cloud Console, can be found in the documentation for Google Cloud.
gsutil cp uncommitted/validations/my_expectation_suite/validation_1.json gs://<your>/<your>/validation_1.json
gsutil cp uncommitted/validations/my_expectation_suite/validation_2.json gs://<your>/<your>/validation_2.json
Operation completed over 2 objects
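With more than a handful of files, the gsutil cp commands can be generated rather than typed. The sketch below builds one command per .json file under a local directory and mirrors the flat destination layout of the commands above, placing each file directly under the prefix. That layout is an assumption carried over from the example, and files with the same name in different subdirectories would overwrite one another. The function name gsutil_copy_commands is hypothetical, and the commands are only built, not executed.

```python
from pathlib import Path


def gsutil_copy_commands(local_dir, bucket, prefix):
    """Return one `gsutil cp` argument list per .json file found under local_dir."""
    commands = []
    for path in sorted(Path(local_dir).rglob("*.json")):
        # Flat layout: each file lands directly under the prefix, as in the example above.
        destination = f"gs://{bucket}/{prefix.strip('/')}/{path.name}"
        commands.append(["gsutil", "cp", str(path), destination])
    return commands
```

Each returned list can be passed to subprocess.run(command, check=True) once you have reviewed the destinations.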
5. Confirm that the new Validation Results Store has been added by running great_expectations store list
Only the active Stores will be listed. Great Expectations will look for Validation Results in GCS as long as the validations_store_name variable is set to validations_GCS_store. The configuration for validations_store can then be removed if you would like.
- name: validations_GCS_store
  class_name: ValidationsStore
  store_backend:
    class_name: TupleGCSStoreBackend
    project: <your>
    bucket: <your>
    prefix: <your>
6. Confirm that the Validation Results Store has been correctly configured
Run a Checkpoint to store results in the new Validation Results Store on GCS, then visualize the results by rebuilding Data Docs.
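From Python, this step might look like the sketch below. It assumes a Data Context object that exposes run_checkpoint(checkpoint_name=...) and build_data_docs(), and a result object with a success attribute. Method names differ between Great Expectations releases, so check them against your installed version. The helper name run_checkpoint_and_build_docs and the Checkpoint name my_checkpoint are hypothetical.

```python
def run_checkpoint_and_build_docs(context, checkpoint_name):
    """Run a Checkpoint, rebuild Data Docs, and return whether validation succeeded."""
    # Results are written to whichever Store validations_store_name points to.
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    # Rebuild Data Docs so the new Validation Results become visible.
    context.build_data_docs()
    return bool(result.success)


# Usage, assuming a configured Data Context (commented out because it needs a real project):
# import great_expectations as ge
# context = ge.get_context()
# run_checkpoint_and_build_docs(context, "my_checkpoint")
```

After the run, new objects should appear under the configured prefix in the GCS bucket.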