How to configure a Validation Result Store in Amazon S3
By default, Validation Results (generated when data is Validated against an Expectation or Expectation Suite) are stored in JSON format in the uncommitted/validations/ subdirectory of your great_expectations/ folder. Use the information provided here to configure a new storage location for Validation Results in Amazon S3.
Validation Results can include sensitive or regulated data that should not be committed to a source control system.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- A Checkpoint.
- Permissions to install boto3 in your local environment.
- An S3 bucket and prefix for the Validation Results.
1. Install boto3 in your local environment
Python interacts with AWS through the boto3 library. Great Expectations makes use of this library in the background when working with AWS. Although you won't use boto3 directly, you'll need to install it in your virtual environment.
Run one of the following pip commands to install boto3 in your virtual environment:
python -m pip install boto3
or
python3 -m pip install boto3
To set up boto3 with AWS and use it from within Python, see the Boto3 documentation.
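After installing, a quick check from Python can confirm that boto3 is importable in the active environment. The following is a minimal sketch that uses only the standard library; the helper name is_installed is hypothetical and is not part of Great Expectations or boto3.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found in the current environment."""
    return importlib.util.find_spec(package_name) is not None


# Report whether boto3 is available without actually importing it.
print("boto3 installed:", is_installed("boto3"))
```

If this prints False, the pip command was probably run against a different interpreter or virtual environment than the one running the check.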
2. Verify your AWS credentials are properly configured
Run the following command in the AWS CLI to verify that your AWS credentials are properly configured:
aws sts get-caller-identity
When your credentials are properly configured, your UserId, Account, and Arn are returned. If your credentials are not configured correctly, an error message appears.
If an error message appears, or if you couldn't use the AWS CLI to verify your credentials configuration, see Configuring the AWS CLI.
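The same check can be made from Python, which is useful when the AWS CLI is not installed. The sketch below assumes a boto3 STS client is passed in; the helper name caller_identity is hypothetical. It relies on the get_caller_identity call of the STS client, which returns the same three fields as the CLI command.

```python
def caller_identity(sts_client) -> dict:
    """Return the UserId, Account, and Arn reported by AWS STS.

    sts_client is expected to be a boto3 STS client, for example
    boto3.client("sts"). An exception from the call usually means the
    credentials are missing or invalid.
    """
    response = sts_client.get_caller_identity()
    return {key: response[key] for key in ("UserId", "Account", "Arn")}


# Usage (requires boto3 and configured credentials):
#   import boto3
#   print(caller_identity(boto3.client("sts")))
```

Passing the client in as an argument keeps the helper independent of how the session and credentials are created.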
3. Identify your Data Context Validation Results Store
Your Validation Results Store configuration is in your Data Context. A Validation Results Store is a connector that stores and retrieves information about objects generated when data is Validated against an Expectation Suite. The Data Context is the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components.
The following section in your Data Context great_expectations.yml file tells Great Expectations to look for Validation Results in a Store named validations_store. It also creates a ValidationsStore named validations_store that is backed by a Filesystem and stores Validation Results under the base_directory uncommitted/validations (the default).
stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
validations_store_name: validations_store
4. Update your configuration file to include a new Store for Validation Results
To manually add a Validation Results Store, add the following configuration to the stores section of your great_expectations.yml file:
stores:
  validations_S3_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>'
As shown in the previous example, you need to change the default store_backend settings to make the Store work with S3. The class_name is set to TupleS3StoreBackend, bucket is the name of your S3 bucket, and prefix is the folder in your S3 bucket where Validation Results are stored.
The following example shows the additional options that are available to customize TupleS3StoreBackend:
class_name: ValidationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    endpoint_url: ${S3_ENDPOINT} # Uses the S3_ENDPOINT environment variable to determine which endpoint to use.
    region_name: '<your_aws_region_name>'
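The ${S3_ENDPOINT} placeholder is filled in from an environment variable when the configuration is loaded. The sketch below illustrates that kind of substitution with the standard library's string.Template. It is not Great Expectations' own substitution code, and the helper name resolve_placeholders and the example endpoint URL are hypothetical.

```python
import os
from string import Template


def resolve_placeholders(value: str, env=None) -> str:
    """Replace ${NAME} placeholders with values from env (default: os.environ).

    Raises KeyError when a referenced variable is not set, so that a
    missing endpoint is caught early rather than silently left blank.
    """
    env = os.environ if env is None else env
    return Template(value).substitute(env)


# Example with an explicit mapping instead of the real environment.
example_env = {"S3_ENDPOINT": "http://localhost:9000"}  # hypothetical endpoint
print(resolve_placeholders("${S3_ENDPOINT}", example_env))
```

Running a check like this before starting a Checkpoint can show whether the variable is set in the shell that launches Great Expectations.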
In the previous example, the Store name is validations_S3_store. If you use a personalized Store name, you must also update the value of the validations_store_name key to match the Store name. For example:
validations_store_name: validations_S3_store
When you update the validations_store_name key value, Great Expectations uses the new Store for Validation Results.
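A mismatch between validations_store_name and the keys under stores is an easy mistake to make when renaming a Store. The following sketch checks for it on a plain dictionary that mirrors the relevant parts of great_expectations.yml. The helper name check_validations_store_name is hypothetical, and loading the YAML file into a dictionary is left out.

```python
def check_validations_store_name(config: dict) -> str:
    """Return the active Store name, or raise if it is not defined under stores."""
    name = config["validations_store_name"]
    if name not in config.get("stores", {}):
        raise ValueError(
            f"validations_store_name {name!r} is not defined under stores"
        )
    return name


# A dictionary mirroring the relevant parts of great_expectations.yml.
config = {
    "stores": {
        "validations_S3_store": {
            "class_name": "ValidationsStore",
            "store_backend": {
                "class_name": "TupleS3StoreBackend",
                "bucket": "<your_s3_bucket_name>",
                "prefix": "<your_s3_bucket_folder_name>",
            },
        }
    },
    "validations_store_name": "validations_S3_store",
}
print(check_validations_store_name(config))
```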
Add the following code to great_expectations.yml to configure the IAM user:
class_name: ValidationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    aws_access_key_id: ${AWS_ACCESS_KEY_ID} # Uses the AWS_ACCESS_KEY_ID environment variable to get aws_access_key_id.
    aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY} # Uses the AWS_SECRET_ACCESS_KEY environment variable.
    aws_session_token: ${AWS_SESSION_TOKEN} # Uses the AWS_SESSION_TOKEN environment variable.
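Because the IAM user configuration reads its credentials from environment variables, it can help to confirm that they are set before running a Checkpoint. The sketch below assumes the standard AWS variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN; the helper name missing_variables is hypothetical.

```python
import os

# Standard AWS environment variable names (assumed here). The session token
# is only needed when temporary credentials are used.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def missing_variables(env=None, required=REQUIRED_VARS) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Report which variables are missing from the current environment.
print("Missing:", missing_variables())
```

An empty list means every placeholder in boto3_options has a value to resolve to.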
Add the following code to great_expectations.yml to configure the IAM Assume Role:
class_name: ValidationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    assume_role_arn: '<your_role_to_assume>'
    region_name: '<your_aws_region_name>'
    assume_role_duration: session_duration_in_seconds
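To test that a role can be assumed before relying on it in the Store configuration, the underlying AWS STS call can be made directly. How Great Expectations handles assume_role_arn internally is not shown in this guide, so the sketch below only illustrates the STS assume_role call itself. The helper name, the session name, and the default duration are assumptions.

```python
def assume_role_credentials(sts_client, role_arn: str, duration_seconds: int = 3600) -> dict:
    """Assume an IAM role and return temporary credentials.

    sts_client is expected to be a boto3 STS client, for example
    boto3.client("sts"). The returned keys match the credential options
    used under boto3_options.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName="validation-results-store",  # hypothetical session name
        DurationSeconds=duration_seconds,
    )
    credentials = response["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }


# Usage (requires boto3 and permission to assume the role):
#   import boto3
#   creds = assume_role_credentials(boto3.client("sts"), "<your_role_to_assume>")
```

If the call fails with an access error, the role's trust policy or the caller's permissions are the usual places to look.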
If you are also storing Expectations (verifiable assertions about data) in S3 (see How to configure an Expectation store to use Amazon S3) or Data Docs in S3 (see How to host and share Data Docs on Amazon S3), make sure the prefix values are disjoint and that one is not a substring of the other.
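The prefix rule can be checked mechanically. The following sketch treats two prefixes as conflicting when they are equal or when one appears inside the other. The helper name prefixes_conflict and the example prefix values are hypothetical.

```python
def prefixes_conflict(first: str, second: str) -> bool:
    """Return True when the prefixes are equal or one is a substring of the other."""
    a, b = first.strip("/"), second.strip("/")
    return a in b or b in a


# Hypothetical prefixes for the Validation Results and Expectations Stores.
print(prefixes_conflict("validations", "expectations"))         # distinct prefixes
print(prefixes_conflict("validations", "validations_archive"))  # one contains the other
```

An empty prefix conflicts with every other prefix under this check, because it covers the whole bucket.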
5. Copy existing Validation Results to the S3 bucket (Optional)
If you are converting an existing local Great Expectations deployment to one that works in AWS, you might have Validation Results saved that you want to transfer to your S3 bucket.
To copy Validation Results into Amazon S3, use the aws s3 sync command as shown in the following example:
aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'
The base_directory is set to uncommitted/validations/ by default.
In the following example, the Validation Results Validation1 and Validation2 are copied to Amazon S3 and a confirmation message is returned:
upload: uncommitted/validations/val1/val1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/val1/val1.json
upload: uncommitted/validations/val2/val2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/val2/val2.json
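Before syncing, it can be useful to preview which local files would be copied and where they would land. The sketch below assumes that object keys mirror each file's path relative to the base directory, which is how aws s3 sync lays out uploaded files. It only considers .json files, whereas aws s3 sync copies every file in the directory. The helper name plan_uploads is hypothetical, and nothing is uploaded.

```python
from pathlib import Path


def plan_uploads(base_directory: str, prefix: str) -> list:
    """Return (local_path, s3_key) pairs for every JSON file under base_directory."""
    base = Path(base_directory)
    plan = []
    for path in sorted(base.rglob("*.json")):
        # Keep the path relative to the base directory as part of the key.
        relative = path.relative_to(base).as_posix()
        plan.append((str(path), f"{prefix.strip('/')}/{relative}"))
    return plan


# Preview the default local Store; prints nothing if the directory is empty or absent.
for local_path, key in plan_uploads("uncommitted/validations", "<your_s3_bucket_folder_name>"):
    print(local_path, "->", key)
```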
6. Confirm the Validation Results Store configuration
Run a Checkpoint to store results in the new Validation Results Store on S3, then visualize the results by rebuilding Data Docs.
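After the Checkpoint runs, you can also confirm directly that objects were written under the configured prefix. The sketch below assumes a boto3 S3 client is passed in and uses its list_objects_v2 call; the helper name list_validation_result_keys is hypothetical. It reads a single page of results (at most 1,000 keys), which is usually enough for a quick confirmation.

```python
def list_validation_result_keys(s3_client, bucket: str, prefix: str) -> list:
    """Return the keys of JSON objects stored under prefix in bucket.

    s3_client is expected to be a boto3 S3 client, for example
    boto3.client("s3"). Only the first page of results is read.
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # "Contents" is absent from the response when no objects match.
    return [
        obj["Key"]
        for obj in response.get("Contents", [])
        if obj["Key"].endswith(".json")
    ]


# Usage (requires boto3 and read access to the bucket):
#   import boto3
#   keys = list_validation_result_keys(
#       boto3.client("s3"), "<your_s3_bucket_name>", "<your_s3_bucket_folder_name>"
#   )
```

An empty list after a successful Checkpoint run suggests that validations_store_name still points at the local Store.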