How to configure an Expectation Store to use GCS
By default, newly Profiled Expectations are stored as Expectation Suites in JSON format in the expectations/ subdirectory of your great_expectations/ folder. This guide will help you configure Great Expectations to store them in a Google Cloud Storage (GCS) bucket.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- A working installation of Great Expectations
- Configured a Data Context
- Configured an Expectation Suite
- Configured a Google Cloud Platform (GCP) service account with credentials that can access the appropriate GCP resources, which include Storage Objects.
- Identified the GCP project, GCS bucket, and prefix where Expectations will be stored.
Steps
1. Configure your GCP credentials
Check that your environment is configured with the appropriate authentication credentials needed to connect to the GCS bucket where Expectations will be stored.
The Google Cloud Platform documentation describes how to verify your authentication for the Google Cloud API, which includes:
- Creating a Google Cloud Platform (GCP) service account,
- Setting the GOOGLE_APPLICATION_CREDENTIALS environment variable,
- Verifying authentication by running a simple Google Cloud Storage client library script.
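As a rough illustration of the last two items, a verification script might look like the sketch below. It assumes the google-cloud-storage Python package is installed for the bucket listing. The helper names credentials_path and list_bucket_names are hypothetical and are not part of Great Expectations or the Google documentation.

```python
import os


def credentials_path(env=None):
    """Return the service-account key path from the environment, or None."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "").strip()
    return path or None


def list_bucket_names():
    """List buckets visible to the active credentials (needs network access)."""
    # Imported lazily so the environment check above works without the package.
    from google.cloud import storage

    client = storage.Client()
    return [bucket.name for bucket in client.list_buckets()]
```

If credentials_path() returns None, the environment variable is not set. If list_bucket_names() returns without raising and the target bucket appears in the result, the credentials can at least reach Cloud Storage.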
2. Identify your Data Context Expectations Store
In your great_expectations.yml, look for the following lines. The configuration tells Great Expectations to look for Expectations in a Store called expectations_store. The base_directory for expectations_store is set to expectations/ by default.
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/
expectations_store_name: expectations_store
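To make the relationship between expectations_store_name and the stores section concrete, here is a minimal sketch that resolves the active Expectations Store from an already-parsed configuration. The function name active_expectations_backend is hypothetical, and the dict is a hand-written equivalent of the default YAML rather than output from Great Expectations.

```python
def active_expectations_backend(config):
    """Return (store name, store_backend dict) for the active Expectations Store."""
    name = config["expectations_store_name"]
    store = config["stores"][name]
    return name, store["store_backend"]


# Dict equivalent of the default great_expectations.yml fragment.
default_config = {
    "stores": {
        "expectations_store": {
            "class_name": "ExpectationsStore",
            "store_backend": {
                "class_name": "TupleFilesystemStoreBackend",
                "base_directory": "expectations/",
            },
        }
    },
    "expectations_store_name": "expectations_store",
}

name, backend = active_expectations_backend(default_config)
print(name, backend["class_name"], backend["base_directory"])
# prints: expectations_store TupleFilesystemStoreBackend expectations/
```

The lookup shows why a store can be defined under stores without being used: only the entry named by expectations_store_name is consulted.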
3. Update your configuration file to include a new store for Expectations on GCS
In our case, the name is set to expectations_GCS_store, but it can be any name you like. We also need to make some changes to the store_backend settings. The class_name will be set to TupleGCSStoreBackend, project will be set to your GCP project, bucket will be set to the name of your GCS bucket, and prefix will be set to the folder on GCS where Expectation files will be located.
If you are also storing Validations in GCS or DataDocs in GCS, please ensure that the prefix values are disjoint and one is not a substring of the other.
stores:
  expectations_GCS_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <YOUR GCP PROJECT NAME>
      bucket: <YOUR GCS BUCKET NAME>
      prefix: <YOUR GCS PREFIX NAME>
expectations_store_name: expectations_GCS_store
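The requirement that prefix values be disjoint and not substrings of one another can be checked before editing the configuration. The sketch below is one possible check. The function name conflicting_prefixes and the example prefixes are made up for illustration, and Great Expectations does not ship this helper.

```python
from itertools import combinations


def conflicting_prefixes(prefixes):
    """Return pairs of store names whose prefixes are equal or substrings of each other."""
    # Ignore leading/trailing slashes so "ge/expectations/" and "ge/expectations" match.
    cleaned = {name: p.strip("/") for name, p in prefixes.items()}
    conflicts = []
    for (name_a, a), (name_b, b) in combinations(cleaned.items(), 2):
        if a in b or b in a:
            conflicts.append((name_a, name_b))
    return conflicts


example = {
    "expectations": "ge/expectations",
    "validations": "ge/validations",
    "data_docs": "ge/expectations_docs",  # contains "ge/expectations"
}
print(conflicting_prefixes(example))
# prints: [('expectations', 'data_docs')]
```

An empty prefix is reported as conflicting with every other prefix, since the empty string is a substring of any string.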
4. Copy existing Expectation JSON files to the GCS bucket (This step is optional)
One way to copy Expectations into GCS is by using the gsutil cp command, which is part of the Google Cloud SDK. The following example will copy one Expectation Suite, my_expectation_suite, from a local folder to the GCS bucket. Information on other ways to copy Expectation JSON files, like the Cloud Storage browser in the Google Cloud Console, can be found in the Documentation for Google Cloud.
gsutil cp expectations/my_expectation_suite.json gs://<YOUR GCS BUCKET NAME>/<YOUR GCS PREFIX NAME>/my_expectation_suite.json
Operation completed over 1 objects
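If there are several suites to copy, the same operation can be scripted. The following is a sketch only, assuming the google-cloud-storage Python package is installed and that suite files sit directly in the local expectations/ folder (nested folders are not handled). The function names gcs_object_name and upload_suites are hypothetical.

```python
from pathlib import Path


def gcs_object_name(prefix, local_file):
    """Build the object name <prefix>/<file name> for a local suite file."""
    return f"{prefix.strip('/')}/{Path(local_file).name}"


def upload_suites(bucket_name, prefix, local_dir="expectations"):
    """Upload every *.json file in local_dir; requires google-cloud-storage."""
    from google.cloud import storage  # lazy import: only needed for the upload

    bucket = storage.Client().bucket(bucket_name)
    uploaded = []
    for path in sorted(Path(local_dir).glob("*.json")):
        name = gcs_object_name(prefix, path)
        bucket.blob(name).upload_from_filename(str(path))
        uploaded.append(name)
    return uploaded
```

The object names produced here follow the same <prefix>/<suite>.json layout as the gsutil destination above.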
5. Confirm that the new Expectations store has been added
Run the following:
great_expectations store list
Only the active Stores will be listed. Great Expectations will look for Expectations in GCS as long as the expectations_store_name variable is set to expectations_GCS_store, and the config for expectations_store can be removed if you would like.
- name: expectations_GCS_store
  class_name: ExpectationsStore
  store_backend:
    class_name: TupleGCSStoreBackend
    project: <YOUR GCP PROJECT NAME>
    bucket: <YOUR GCS BUCKET NAME>
    prefix: <YOUR GCS PREFIX NAME>
6. Confirm that Expectations can be accessed from GCS
To do this, run the following:
great_expectations suite list
If you followed Step 4, the output should include the Expectation Suite we copied to GCS: my_expectation_suite. If you did not copy Expectations to the new Store, you will see a message saying no Expectations were found.
1 Expectation Suite found:
- my_expectation_suite
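If the suite list does not match what you expect, it can help to compare it with the objects actually present under the prefix. The sketch below assumes the <prefix>/<suite>.json layout used by the gsutil copy in Step 4. The function name suite_names_from_objects is hypothetical, and the object names in the example are invented.

```python
def suite_names_from_objects(object_names, prefix):
    """Map object names like <prefix>/<suite>.json to suite names."""
    root = prefix.strip("/") + "/"
    names = []
    for obj in object_names:
        # Skip objects outside the prefix and anything that is not a JSON file.
        if obj.startswith(root) and obj.endswith(".json"):
            names.append(obj[len(root):-len(".json")])
    return sorted(names)


objects = [
    "ge/expectations/my_expectation_suite.json",
    "ge/validations/some_result.json",
]
print(suite_names_from_objects(objects, "ge/expectations"))
# prints: ['my_expectation_suite']
```

The object names themselves could be gathered with the Cloud Storage browser, or with the list_blobs method of the google-cloud-storage client if that package is available.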
Additional Notes
To view the full script used in this page, see it on GitHub: