Version: 0.15.50

How to configure a Validation Result Store in Amazon S3

By default, Validation Results (generated when data is Validated against an Expectation or Expectation Suite) are stored in JSON format in the uncommitted/validations/ subdirectory of your great_expectations/ folder. Since Validation Results may include examples of data (which could be sensitive or regulated), they should not be committed to a source control system. The following steps will help you configure a new storage location for Validation Results in Amazon S3.

Prerequisites: This how-to guide assumes you have:
caution

Since Validation Results may include examples of data (which could be sensitive or regulated) they should not be committed to a source control system.

Steps

1. Install boto3 to your local environment

Python interacts with AWS through the boto3 library. Great Expectations makes use of this library in the background when working with AWS. Therefore, although you will not need to use boto3 directly, you will need to have it installed into your virtual environment.

You can do this with the pip command:

Terminal command
python -m pip install boto3

or

Terminal command
python3 -m pip install boto3

For more detailed instructions on how to set up boto3 with AWS, and information on how you can use boto3 from within Python, please reference boto3's documentation site.
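As a quick sanity check after installation, a short sketch like the one below can confirm that boto3 is importable from the interpreter you plan to use with Great Expectations. The helper name is_installed is hypothetical and relies only on the standard library.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the current interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("boto3"):
        print("boto3 is available in this environment.")
    else:
        print("boto3 was not found; run: python -m pip install boto3")
```

Run it with the same python or python3 executable that you used for the pip command, so that the check applies to the same virtual environment.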

2. Verify that your AWS credentials are properly configured

If you have installed the AWS CLI, you can verify that your AWS credentials are properly configured by running the command:

Terminal command
aws sts get-caller-identity

If your credentials are properly configured, this will output your UserId, Account and Arn. If your credentials are not configured correctly, this will throw an error.
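If the AWS CLI is not installed, roughly the same check can be sketched in Python through boto3's STS client, whose get_caller_identity call returns the same UserId, Account and Arn fields. The function names describe_caller and check_credentials are hypothetical; the sketch assumes boto3 is installed and that credentials are discoverable through boto3's usual lookup chain.

```python
def describe_caller(sts_client) -> str:
    """Summarize the identity that the given STS client is authenticated as."""
    identity = sts_client.get_caller_identity()
    return (
        f"UserId={identity['UserId']} "
        f"Account={identity['Account']} "
        f"Arn={identity['Arn']}"
    )


def check_credentials() -> None:
    """Print the caller identity, or the error raised while resolving credentials."""
    # Imported lazily so that describe_caller has no hard dependency on boto3.
    import boto3
    from botocore.exceptions import BotoCoreError, ClientError

    try:
        print(describe_caller(boto3.client("sts")))
    except (BotoCoreError, ClientError) as error:
        print(f"AWS credentials are not configured correctly: {error}")
```

Calling check_credentials() from a Python session should print your identity when the credentials are valid, and an error message otherwise.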

If an error is thrown, or if you were unable to use the AWS CLI to verify your credentials configuration, you can find additional guidance on configuring your AWS credentials by referencing Amazon's documentation on configuring the AWS CLI.

3. Identify your Data Context Validation Results Store

You can find your Validation Results Store configuration within your Data Context. A Validation Results Store is a connector to store and retrieve information about objects generated when data is Validated against an Expectation Suite. The Data Context is the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components.

Look for the following section in your Data Context's great_expectations.yml file:

File contents: great_expectations.yml
validations_store_name: validations_store

stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

This configuration tells Great Expectations to look for Validation Results in a Store called validations_store. It also creates a ValidationsStore called validations_store that is backed by a Filesystem and will store Validation Results under the base_directory uncommitted/validations (the default).

4. Update your configuration file to include a new Store for Validation Results on S3

You can manually add a Validation Results Store by adding the configuration below to the stores section of your great_expectations.yml file:

File contents: great_expectations.yml
stores:
  validations_S3_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>'

To make the Store work with S3, you need to change the default store_backend settings, as in the example above. Set class_name to TupleS3StoreBackend, bucket to the name of your S3 bucket, and prefix to the folder in your S3 bucket where Validation Results will be located.

For the example above, note that the new Store's name is set to validations_S3_store. This can be any name you like, as long as you also update the value of the validations_store_name key to match the new Store's name.

File contents: great_expectations.yml
validations_store_name: validations_S3_store

This update to the value of the validations_store_name key will tell Great Expectations to use the new Store for Validation Results.
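A mismatch between validations_store_name and the keys under stores is easy to introduce when editing the file by hand. The sketch below checks for it on an already-parsed configuration. The function names are hypothetical, the default path is an assumption about where your project lives, and load_config assumes the third-party PyYAML package is installed.

```python
def active_validations_backend(config: dict) -> dict:
    """Return the store_backend settings of the active Validation Results Store."""
    store_name = config["validations_store_name"]
    stores = config.get("stores", {})
    if store_name not in stores:
        raise KeyError(
            f"validations_store_name is '{store_name}', "
            "but no Store with that name is defined under 'stores'."
        )
    return stores[store_name]["store_backend"]


def load_config(path: str = "great_expectations/great_expectations.yml") -> dict:
    """Parse the project configuration file (assumes PyYAML is installed)."""
    import yaml  # third-party: python -m pip install pyyaml

    with open(path, encoding="utf-8") as config_file:
        return yaml.safe_load(config_file)
```

For the configuration shown above, active_validations_backend(load_config()) would return a dictionary whose class_name is TupleS3StoreBackend. A KeyError indicates that the two names do not match.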

caution

If you are also storing Expectations (verifiable assertions about data) in S3 (How to configure an Expectation store to use Amazon S3), or Data Docs in S3 (How to host and share Data Docs on Amazon S3), then please ensure that the prefix values are disjoint and one is not a substring of the other.
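The substring rule can be checked mechanically. The following is a minimal sketch, assuming you collect the prefix of each S3-backed Store into a dictionary yourself; the function name and the example prefixes are hypothetical.

```python
from itertools import combinations


def overlapping_prefixes(prefixes: dict) -> list:
    """Return pairs of Store names whose prefixes are equal or contain one another."""
    conflicts = []
    for (name_a, prefix_a), (name_b, prefix_b) in combinations(prefixes.items(), 2):
        # A substring match in either direction counts as a conflict.
        if prefix_a in prefix_b or prefix_b in prefix_a:
            conflicts.append((name_a, name_b))
    return conflicts


if __name__ == "__main__":
    example = {
        "expectations": "expectations",
        "validations": "validations",
        "data_docs": "validations_docs",  # contains "validations", so it conflicts
    }
    print(overlapping_prefixes(example))
```

An empty list means the prefixes are safe to use together. Note that an empty prefix is a substring of every other prefix, so it is always reported as a conflict.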

5. Confirm that the new Validation Results Store has been properly added

You can verify your active Stores are configured correctly by running the terminal command:

Terminal input
great_expectations store list

This will list the currently configured Stores that Great Expectations has access to. If you added a new S3 Validation Results Store, the output should include the following ValidationsStore entry:

Terminal output
- name: validations_S3_store
  class_name: ValidationsStore
  store_backend:
    class_name: TupleS3StoreBackend
    bucket: '<your_s3_bucket_name>'
    prefix: '<your_s3_bucket_folder_name>'

Please note that the great_expectations store list command will specifically list your active Stores, which are the ones specified by expectations_store_name, validations_store_name, evaluation_parameter_store_name, and checkpoint_store_name in the file great_expectations.yml. These are the Stores that your Data Context will use by default.

To make Great Expectations look for Validation Results on the S3 bucket, you must set the validations_store_name variable to the name of your S3 Validation Results Store, which in our example is validations_S3_store.

Additional options are available for a more fine-grained customization of the TupleS3StoreBackend.

File contents: great_expectations.yml
class_name: ValidationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    endpoint_url: ${S3_ENDPOINT} # Uses the S3_ENDPOINT environment variable to determine which endpoint to use.
    region_name: '<your_aws_region_name>'

6. Copy existing Validation Results to the S3 bucket (This step is optional)

If you are converting an existing local Great Expectations deployment to one that works in AWS, you may already have saved Validation Results that you wish to keep and transfer to your S3 bucket.

You can copy Validation Results into Amazon S3 by using the aws s3 sync command. As mentioned earlier, the base_directory is set to uncommitted/validations/ by default.

Terminal input
aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'

In the example below, two Validation Results, val1.json and val2.json, are copied to Amazon S3. This results in the following output:

Terminal output
upload: uncommitted/validations/val1/val1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/val1.json
upload: uncommitted/validations/val2/val2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/val2.json

If you have Validation Results to copy into S3, your output should look similar.
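If you would rather script the copy in Python, the sketch below first maps each local JSON file to the S3 key it would receive and then uploads the files with boto3's upload_file. The function names are hypothetical. The sketch assumes that each file keeps its path relative to base_directory inside the key, that only .json files need copying, and that boto3 and valid AWS credentials are available.

```python
from pathlib import Path


def plan_uploads(base_directory: str, prefix: str) -> list:
    """Map each local JSON file to the S3 key it would be copied to."""
    base = Path(base_directory)
    clean_prefix = prefix.strip("/")
    plan = []
    for local_path in sorted(base.rglob("*.json")):
        # Keep the path relative to base_directory, using forward slashes.
        relative = local_path.relative_to(base).as_posix()
        key = f"{clean_prefix}/{relative}" if clean_prefix else relative
        plan.append((str(local_path), key))
    return plan


def upload_all(base_directory: str, bucket: str, prefix: str) -> None:
    """Upload every planned file (assumes boto3 and valid AWS credentials)."""
    import boto3  # imported lazily so plan_uploads can be used without it

    s3_client = boto3.client("s3")
    for local_path, key in plan_uploads(base_directory, prefix):
        s3_client.upload_file(local_path, bucket, key)
```

Printing the result of plan_uploads before calling upload_all gives you a dry run of what would be written to the bucket.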

7. Confirm that the Validation Results Store has been correctly configured

Run a Checkpoint to store results in the new Validation Results Store on S3, then visualize the results by rebuilding Data Docs.
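In Python, this final check might look like the sketch below. It assumes a Data Context object that exposes run_checkpoint and build_data_docs, and that a Checkpoint already exists in your project. The helper names are hypothetical, and "my_checkpoint" is a placeholder for one of your own Checkpoint names.

```python
def run_checkpoint_and_build_docs(context, checkpoint_name: str):
    """Run one Checkpoint, then rebuild Data Docs so the new results are rendered."""
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)
    context.build_data_docs()
    return result


def main() -> None:
    """Run against the project in the current working directory."""
    import great_expectations as gx  # imported lazily; requires Great Expectations

    context = gx.get_context()
    # "my_checkpoint" is a placeholder; replace it with an existing Checkpoint name.
    result = run_checkpoint_and_build_docs(context, "my_checkpoint")
    print(f"Checkpoint succeeded: {result['success']}")
```

After main() runs, new JSON objects should appear under the configured prefix in your S3 bucket, which you can confirm in the AWS console or with the AWS CLI.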

🚀🚀 Congratulations! 🚀🚀

You have configured your Validation Results Store to exist in your S3 bucket!