How to configure an Expectation Store to use Amazon S3
By default, new Profiled Expectations are stored as Expectation Suites in JSON format in the expectations/ subdirectory of your great_expectations/ folder. Use the information provided here to configure a new storage location for Expectations in an Amazon S3 bucket.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- Permissions to install boto3 in your local environment.
- An S3 bucket and prefix to store Expectations.
1. Install boto3 with pip
Python interacts with AWS through the
boto3 library. Great Expectations makes
use of this library in the background when working
with AWS. Although you won't use
boto3 directly, you'll need to
install it in your virtual environment.
Run one of the following pip commands to install
boto3 in your virtual environment:
python -m pip install boto3
or
python3 -m pip install boto3
To set up
boto3
with AWS, and use boto3 from within
Python, see the
Boto3 documentation.
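As a quick sanity check after installation, the following minimal sketch confirms that boto3 can be imported from the active environment. The helper name is_installed is illustrative and is not part of boto3 or Great Expectations.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Prints True when the pip command above installed boto3 into this environment.
print("boto3 installed:", is_installed("boto3"))
```

If this prints False, the pip command probably ran against a different interpreter than the one in your virtual environment.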
2. Verify your AWS credentials are properly configured
Run the following command in the AWS CLI to verify that your AWS credentials are properly configured:
aws sts get-caller-identity
When your credentials are properly configured, your
UserId, Account and
Arn are returned. If your credentials are
not configured correctly, an error message appears.
If an error message appears, or if you couldn't use the AWS CLI to verify your credentials configuration, see Configuring the AWS CLI.
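If the AWS CLI is not available, the same check can be sketched in Python through boto3's STS client. The function name caller_identity is an illustrative assumption. It accepts any object that exposes get_caller_identity(), which is the call boto3's STS client provides.

```python
def caller_identity(sts_client) -> dict:
    """Return the UserId, Account and Arn reported by AWS STS."""
    response = sts_client.get_caller_identity()
    # Drop the response metadata and keep only the three identifying fields.
    return {key: response[key] for key in ("UserId", "Account", "Arn")}


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   print(caller_identity(boto3.client("sts")))
```

As with the CLI command, an exception here indicates that credentials are missing or misconfigured.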
3. Identify your Data Context Expectations Store
Your Expectation Store configuration is in your Data Context.
The following section in your Data Context's great_expectations.yml file tells Great Expectations to look for Expectations in a Store named expectations_store:
stores:
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

expectations_store_name: expectations_store
The default base_directory for
expectations_store is
expectations/.
4. Update your configuration file to include a new Store for Expectations
To manually add an Expectations Store to your configuration, add the following configuration to the stores section of your great_expectations.yml file:
stores:
  expectations_S3_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: '<your_s3_bucket_name>'
      prefix: '<your_s3_bucket_folder_name>'

expectations_store_name: expectations_S3_store
As shown in the previous example, you need to change the default store_backend settings to make the Store work with S3. The class_name is set to TupleS3StoreBackend, bucket is the name of your S3 bucket, and prefix is the folder in your S3 bucket where Expectations are located.
The following example shows the additional options
that are available to customize
TupleS3StoreBackend:
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    endpoint_url: ${S3_ENDPOINT} # Uses the S3_ENDPOINT environment variable to determine which endpoint to use.
    region_name: '<your_aws_region_name>'
The Store in this configuration is named expectations_S3_store. If you use a different Store name, you must also update the value of the expectations_store_name key to match it. For example:
expectations_store_name: expectations_S3_store
When you update the expectations_store_name key value, Great Expectations uses the new Store for Expectations.
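A mismatch between expectations_store_name and the keys under stores is easy to miss, so a small consistency check may help. The following is a minimal sketch that operates on the configuration after it has been parsed into a dictionary. The function name check_expectations_store is hypothetical, and the check only treats bucket as required, which is an assumption about what the backend needs.

```python
def check_expectations_store(config: dict) -> list:
    """Return a list of problems found in a parsed great_expectations.yml."""
    problems = []
    name = config.get("expectations_store_name")
    stores = config.get("stores", {})
    if name not in stores:
        problems.append(f"expectations_store_name {name!r} is not defined under stores")
        return problems
    backend = stores[name].get("store_backend", {})
    if backend.get("class_name") != "TupleS3StoreBackend":
        problems.append(f"store {name!r} does not use TupleS3StoreBackend")
    if not backend.get("bucket"):
        problems.append(f"store {name!r} has no bucket configured")
    return problems


# Usage (assumes PyYAML is installed; any YAML parser would do):
#   import yaml
#   with open("great_expectations/great_expectations.yml") as handle:
#       print(check_expectations_store(yaml.safe_load(handle)))
```

An empty list means the active Expectations Store points at an S3 backend with a bucket set.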
Add the following code to
great_expectations.yml to configure the
IAM user:
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    aws_access_key_id: ${AWS_ACCESS_KEY_ID} # Uses the AWS_ACCESS_KEY_ID environment variable to get aws_access_key_id.
    aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY} # Uses the AWS_SECRET_ACCESS_KEY environment variable.
    aws_session_token: ${AWS_SESSION_TOKEN} # Uses the AWS_SESSION_TOKEN environment variable.
Add the following code to
great_expectations.yml to configure the
IAM Assume Role:
class_name: ExpectationsStore
store_backend:
  class_name: TupleS3StoreBackend
  bucket: '<your_s3_bucket_name>'
  prefix: '<your_s3_bucket_folder_name>'
  boto3_options:
    assume_role_arn: '<your_role_to_assume>'
    region_name: '<your_aws_region_name>'
    assume_role_duration: session_duration_in_seconds
If you are also storing Validations in S3 or Data Docs in S3, make sure that the prefix values are disjoint and that neither is a substring of the other.
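The substring rule can be checked mechanically. The sketch below assumes all Stores share one bucket and takes a mapping of Store names to prefixes; the function name overlapping_prefixes and the example prefixes are illustrative.

```python
from itertools import combinations


def overlapping_prefixes(prefixes: dict) -> list:
    """Return pairs of Store names whose S3 prefixes overlap."""
    conflicts = []
    for (name_a, prefix_a), (name_b, prefix_b) in combinations(prefixes.items(), 2):
        # Two prefixes conflict when either one is a substring of the other.
        if prefix_a in prefix_b or prefix_b in prefix_a:
            conflicts.append((name_a, name_b))
    return conflicts


print(overlapping_prefixes({
    "expectations": "gx/expectations",
    "validations": "gx/validations",
    "data_docs": "gx",
}))
```

Here data_docs conflicts with both other Stores because gx is a substring of their prefixes; renaming it to something like gx/data_docs would resolve the overlap.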
5. Copy existing Expectation JSON files to the S3 bucket (Optional)
If you are converting an existing local Great Expectations deployment to one that works in AWS, you might have Expectations saved that you want to transfer to your S3 bucket.
To copy Expectations into Amazon S3, use the
aws s3 sync command as shown in the
following example:
aws s3 sync '<base_directory>' s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'
The base_directory is set to
expectations/ by default.
In the following example, the Expectations
exp1 and exp2 are copied to
Amazon S3 and a confirmation message is returned:
upload: ./exp1.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp1.json
upload: ./exp2.json to s3://'<your_s3_bucket_name>'/'<your_s3_bucket_folder_name>'/exp2.json
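If you prefer to script the copy in Python, the following sketch shows how local files could map to S3 keys. It only builds the upload plan; the actual upload, shown in the trailing comment, uses boto3's upload_file call. The function name plan_uploads is hypothetical, and unlike aws s3 sync this sketch does not skip files that are already up to date.

```python
from pathlib import Path


def plan_uploads(base_directory: str, prefix: str) -> list:
    """Pair each local file with the S3 key it would be copied to."""
    base = Path(base_directory)
    plan = []
    for path in sorted(base.rglob("*")):
        if path.is_file():
            relative = path.relative_to(base).as_posix()
            # Join prefix and relative path without doubling or leading slashes.
            key = "/".join(part for part in (prefix.strip("/"), relative) if part)
            plan.append((str(path), key))
    return plan


# Usage (requires boto3 and configured AWS credentials):
#   import boto3
#   s3 = boto3.client("s3")
#   for filename, key in plan_uploads("expectations/", "<your_s3_bucket_folder_name>"):
#       s3.upload_file(filename, "<your_s3_bucket_name>", key)
```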
6. Confirm Expectation Suite availability
If you copied your existing Expectation Suites to the S3 bucket, run the following Python code to confirm that Great Expectations can find them:
import great_expectations as gx
context = gx.get_context()
print(context.list_expectation_suite_names())
The names of the Expectation Suites you copied to S3 are returned as a list. Expectation Suites that weren't copied to the new Store aren't listed.
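To spot suites that were left behind, you can compare the local JSON files against the names the Data Context reports. This is a sketch under one assumption: that the local filesystem Store uses its default layout, where a suite's name is its file path relative to base_directory without the .json suffix, with subdirectories joined by dots. Both function names are illustrative.

```python
from pathlib import Path


def local_suite_names(base_directory: str) -> list:
    """Derive suite names from the JSON files under base_directory."""
    base = Path(base_directory)
    return sorted(
        ".".join(path.relative_to(base).with_suffix("").parts)
        for path in base.rglob("*.json")
    )


def missing_from_store(local_names, store_names) -> list:
    """Return local suite names that the new Store does not list."""
    return sorted(set(local_names) - set(store_names))


# Usage (requires Great Expectations and the S3 Store configured above):
#   import great_expectations as gx
#   context = gx.get_context()
#   local = local_suite_names("great_expectations/expectations")
#   print(missing_from_store(local, context.list_expectation_suite_names()))
```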