Version: 0.16.16

How to set up Great Expectations to work with data on Amazon Web Services S3

This guide will walk you through best practices for creating your GX Python environment and demonstrate how to locally install Great Expectations along with the necessary dependencies for working with data stored in Amazon Web Services S3 storage.

Prerequisites

Steps

1. Ensure your AWS CLI version is the most recent

You can verify that the AWS CLI has been installed by running the command:

Terminal command
aws --version

If this command does not print the version information of the AWS CLI, you may need to install the AWS CLI or otherwise troubleshoot your current installation. For detailed guidance on how to do this, please refer to Amazon's documentation on how to install the AWS CLI.
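If you would rather script this check, the following is a minimal sketch using only the Python standard library. The helper name `cli_version` is hypothetical and is not part of GX or the AWS CLI; it simply looks for an executable on your PATH and returns whatever `--version` prints.

```python
import shutil
import subprocess


def cli_version(executable="aws"):
    """Return the text printed by `<executable> --version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some tools print version information to stderr, so fall back to it
    # when stdout is empty.
    output = result.stdout.strip() or result.stderr.strip()
    return output or None


if __name__ == "__main__":
    version = cli_version()
    if version is None:
        print("AWS CLI not found on PATH; see Amazon's installation documentation.")
    else:
        print(version)
```

A return value of `None` corresponds to the case described above, where the command does not report any version information.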

2. Ensure your AWS credentials are correctly configured

You can verify that your AWS credentials are properly configured by running the command:

Terminal command
aws sts get-caller-identity

If your credentials are properly configured, this command outputs your UserId, Account, and Arn. If they are not, the command returns an error, and you may need to configure or troubleshoot your credentials. For detailed guidance on how to do this, please refer to Amazon's documentation on configuring the AWS CLI.
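A credentials check can also be scripted. The sketch below assumes the AWS CLI's `sts get-caller-identity` command, which reports the `UserId`, `Account`, and `Arn` of the identity behind your configured credentials. The function names `parse_caller_identity` and `current_identity` are hypothetical helpers written for this illustration.

```python
import json
import shutil
import subprocess


def parse_caller_identity(raw_json):
    """Extract the three identity fields from the CLI's JSON output."""
    data = json.loads(raw_json)
    return {key: data[key] for key in ("UserId", "Account", "Arn")}


def current_identity():
    """Return the caller identity as a dict, or None if the check fails."""
    aws = shutil.which("aws")
    if aws is None:
        # The AWS CLI is not installed or not on PATH.
        return None
    result = subprocess.run(
        [aws, "sts", "get-caller-identity", "--output", "json"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # A non-zero exit code usually means credentials are missing or invalid.
        return None
    return parse_caller_identity(result.stdout)
```

Calling `current_identity()` returns a dictionary when the credentials are usable and `None` otherwise, so it can serve as a simple guard before the later steps.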

3. Check your Python version

You can check your version of Python by running:

Terminal command
python --version

GX currently supports Python versions 3.7 to 3.10.
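The supported range can also be checked from within Python itself. This is a small sketch, assuming the 3.7 to 3.10 range stated in this guide; the constants and the `is_supported` helper are illustrative names, not part of GX.

```python
import sys

# Supported range as stated in this guide: Python 3.7 through 3.10 (inclusive).
MIN_SUPPORTED = (3, 7)
MAX_SUPPORTED = (3, 10)


def is_supported(version_info=None):
    """Return True if the (major, minor) version falls within the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {current} supported: {is_supported()}")
```

Only the major and minor components are compared, so any patch release within a supported minor version is treated as supported.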

Executing Python commands with python or python3

Depending on your installation and configuration of Python 3, you may find that executing Python commands from the terminal by calling python doesn't work as desired. If a command using python does not work, try using python3.

Instead of:

Terminal command
python --version

Try:

Terminal command
python3 --version

If this produces the desired result, simply replace python with python3 in our example terminal commands.

If this does not work, you may need to look into your Python 3 installation or configuration.

4. Create a Python virtual environment

As a best practice, we recommend using a virtual environment to partition your GX installation from any other Python projects that may exist on the same system. This ensures that there will not be dependency conflicts between the GX installation and other Python projects.

Once we have confirmed that Python 3 is installed locally, we can create a virtual environment with venv.

Why use venv?

We have chosen to use venv for virtual environments in this guide because it is included with Python 3. You are not limited to using venv, and can just as easily install Great Expectations into virtual environments with tools such as virtualenv, pyenv, etc.

We will create our virtual environment by running:

Terminal command
python -m venv my_venv

This command will create a new directory called my_venv. Our virtual environment will be located in this directory.
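For reference, the same environment can be created programmatically with the standard library's `venv` module, which is what `python -m venv` runs. The sketch below is illustrative only; the wrapper name `create_virtual_environment` is hypothetical.

```python
import venv
from pathlib import Path


def create_virtual_environment(env_dir="my_venv", with_pip=True):
    """Create a virtual environment in env_dir and return its path."""
    env_path = Path(env_dir)
    # `python -m venv` installs pip by default, but venv.create() does not,
    # so with_pip=True is passed explicitly to match the terminal command.
    venv.create(str(env_path), with_pip=with_pip)
    return env_path
```

Calling `create_virtual_environment()` with no arguments creates the `my_venv` directory in the current working directory, mirroring the terminal command above.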

In order to activate the virtual environment we will run:

Terminal command
source my_venv/bin/activate

Why name it my_venv?

You can name your virtual environment anything you like. Simply replace my_venv in the examples above with the name that you would like to use.
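If you are unsure whether the virtual environment is active, you can ask the interpreter directly. The sketch below relies on `sys.prefix` and `sys.base_prefix`, which differ when Python is running from an environment created by venv; the helper name `in_virtual_environment` is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Virtual environment active: {in_virtual_environment()}")
    print(f"Interpreter prefix: {sys.prefix}")
```

When the environment is active, the printed prefix should end with the name you chose for it, such as my_venv.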

5. Install GX with optional dependencies for S3

To install Great Expectations with the optional dependencies needed to work with AWS S3, execute the following pip command from the terminal:

Terminal input
python -m pip install 'great_expectations[s3]'

This will install Great Expectations and the boto3 package. GX uses boto3 to access S3.

6. Verify that GX has been installed correctly

You can verify that GX installed successfully with the CLI command:

Terminal input
great_expectations --version

If GX was installed successfully, the output will look like the following, with the version number reflecting the release you installed:

Terminal output
great_expectations, version 0.16.15
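You can also confirm from Python that both Great Expectations and boto3 are present in the active environment. This is a minimal sketch that assumes Python 3.8 or later, where `importlib.metadata` is part of the standard library (on Python 3.7 the `importlib_metadata` backport would be needed instead). The helper name `installed_version` is hypothetical.

```python
from importlib import metadata  # assumption: Python 3.8+


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("great_expectations", "boto3"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

If either package is reported as not installed, check that the virtual environment is active and rerun the pip command from the installation step.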

Next steps

Now that you have installed GX with the necessary dependencies for working with S3, you are ready to initialize your Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components). The Data Context will contain your configurations for GX components, as well as provide you with access to GX's Python API.

To quickly create a Data Context and dive into working with GX, please see:

To initialize a Data Context on your filesystem, please reference:

To work with a temporary, in-memory Data Context, see: