How to set up Great Expectations to work with data in Azure Blob Storage
This guide will walk you through best practices for creating your GX Python environment and demonstrate how to locally install Great Expectations along with the necessary dependencies for working with data stored in Azure Blob Storage.
Prerequisites
- A supported version of Python (3.8 to 3.11). To download and install Python, see Python downloads.
- The ability to install Python modules with pip
- An Azure Storage account. A connection string is required to complete the setup.
Steps
1. Check your Python version
You can check your version of Python by running:
python --version
Confirm that the reported version falls within the supported range listed under Prerequisites.
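If you would rather perform this check from inside Python, for example in a setup script, the following minimal sketch compares `sys.version_info` against a supported range. The bounds (3.8 to 3.11) are taken from the Prerequisites above and are an assumption you should update for your GX release. `is_supported_python` is a hypothetical helper, not part of GX.

```python
import sys

# Assumed supported range, copied from the Prerequisites; adjust for your GX release.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def is_supported_python(version_info=sys.version_info,
                        minimum=MIN_VERSION,
                        maximum=MAX_VERSION) -> bool:
    """Return True if the major.minor version lies within [minimum, maximum]."""
    major_minor = tuple(version_info[:2])
    return minimum <= major_minor <= maximum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {current} is within the supported range.")
    else:
        print(f"Python {current} is outside the supported range.")
```

Only the major and minor components are compared, so any patch release of a supported minor version passes.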
python or python3?
Depending on your installation and configuration of Python 3, you may find that executing Python commands from the terminal by calling python doesn't work as desired. If a command using python does not work, try using python3 instead.
Instead of:
python --version
Try:
python3 --version
If this produces the desired result, replace python with python3 in our example terminal commands.
If this does not work, you may need to look into your Python 3 installation or configuration.
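The same python-then-python3 fallback can be scripted with the standard library: `shutil.which` looks a command up on `PATH`, and running it with `--version` confirms that it really is Python 3 rather than, say, a Python 2 interpreter. This is a minimal sketch; `find_python3_command` is a hypothetical helper.

```python
import shutil
import subprocess
import sys
from typing import Optional, Sequence


def find_python3_command(candidates: Sequence[str] = ("python", "python3")) -> Optional[str]:
    """Return the first candidate on PATH that reports a Python 3 version, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is None:
            continue  # not on PATH; try the next candidate
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        # Python 2 and early 3.x releases print the version to stderr, newer
        # releases print it to stdout, so inspect both streams.
        output = (result.stdout + result.stderr).strip()
        if output.startswith("Python 3"):
            return name
    return None


if __name__ == "__main__":
    command = find_python3_command()
    if command is None:
        sys.exit("No Python 3 interpreter found; check your installation or PATH.")
    print(f"Use '{command}' in the example terminal commands.")
```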
2. Create a Python virtual environment
As a best practice, we recommend using a virtual environment to partition your GX installation from any other Python projects that may exist on the same system. This ensures that there will not be dependency conflicts between the GX installation and other Python projects.
Once we have confirmed that Python 3 is installed locally, we can create a virtual environment with venv.
Why venv?
We have chosen to use venv for virtual environments in this guide because it is included with Python 3. You are not limited to using venv, and can just as easily install Great Expectations into virtual environments with tools such as virtualenv, pyenv, etc.
We will create our virtual environment by running:
python -m venv my_venv
This command creates a new directory called my_venv, which contains our virtual environment.
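The `venv` module can also be driven from Python through `venv.create`, which builds on the same `EnvBuilder` machinery as `python -m venv`. The sketch below creates an environment and returns the path of its interpreter. `create_env` is a hypothetical helper, and it defaults to `with_pip=False` only to keep the example fast; the command-line form installs pip by default, which you need for step 3.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = False) -> Path:
    """Create a virtual environment at env_dir and return its Python executable."""
    venv.create(env_dir, with_pip=with_pip)
    # POSIX environments keep executables in bin/, Windows uses Scripts/.
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    # Build a throwaway environment in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        python_path = create_env(Path(tmp) / "my_venv")
        print(f"Interpreter at {python_path}, exists: {python_path.exists()}")
```

Pass `with_pip=True` when you intend to install packages into the environment.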
To activate the virtual environment, we run:
source my_venv/bin/activate
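After activation, python should resolve to the environment's interpreter. One way to confirm this from Python is to compare `sys.prefix` with `sys.base_prefix`: inside an environment created by venv they differ. This is a minimal sketch; `in_virtualenv` is a hypothetical helper, and the check targets venv-style environments (older virtualenv releases set `sys.real_prefix` instead).

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """Return True when the prefixes indicate a venv-style environment."""
    # venv sets sys.prefix to the environment directory, while sys.base_prefix
    # still points at the Python installation the environment was created from.
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Active virtual environment: {sys.prefix}")
    else:
        print("No virtual environment is active; run 'source my_venv/bin/activate' first.")
```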
my_venv?
You can name your virtual environment anything you like. Simply replace my_venv in the examples above with the name that you would like to use.
3. Install GX with optional dependencies for Azure Blob Storage
To install Great Expectations with the optional dependencies needed to work with Azure Blob Storage, we run the following pip command from the terminal:
python -m pip install 'great_expectations[azure]'
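If you script the installation, invoking pip through the running interpreter (`sys.executable -m pip`) ensures the package lands in the active environment rather than in whichever pip happens to be first on `PATH`. The sketch below only builds the command; `pip_install_command` is a hypothetical helper, and the actual install is left as a comment.

```python
import sys
from typing import List


def pip_install_command(package: str = "great_expectations", extra: str = "azure") -> List[str]:
    """Build the pip command that installs `package` with an optional extra."""
    requirement = f"{package}[{extra}]" if extra else package
    # No shell is involved, so the brackets need no quoting here, unlike in a terminal.
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    command = pip_install_command()
    print("Would run:", " ".join(command))
    # To perform the install in the current environment:
    # import subprocess
    # subprocess.run(command, check=True)
```

The quotes in the terminal command above matter because some shells, such as zsh, treat square brackets as glob patterns.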
4. Verify that GX has been installed correctly
You can verify that GX installed successfully with the CLI command:
great_expectations --version
If GX was installed successfully, the output will look similar to the following (the version number depends on the release you installed):
great_expectations, version 0.16.15
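You can also check the installed version from Python with the standard library's `importlib.metadata` (Python 3.8+), which reads the installed distribution's metadata without importing the package. `installed_version` is a hypothetical helper; the version it reports depends on what pip installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "great_expectations") -> Optional[str]:
    """Return the installed version string of `distribution`, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("great_expectations is not installed in this environment.")
    else:
        print(f"great_expectations, version {version}")
```

Run this with the virtual environment's interpreter; otherwise it inspects whichever environment the script happens to run in.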
5. Configure the config_variables.yml file with your Azure Storage credentials
We recommend that Azure Storage credentials be stored in the config_variables.yml file, which is located in the uncommitted/ folder by default and is not part of source control. The following line adds Azure Storage credentials under the key AZURE_STORAGE_CONNECTION_STRING.
Additional options for configuring the config_variables.yml file or additional environment variables can be found here.
AZURE_STORAGE_CONNECTION_STRING: "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
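Before pasting your value into config_variables.yml, it can help to sanity-check the connection string. It is a `;`-separated list of `key=value` pairs, and the account key is base64-encoded, so it may itself end in `=`; the sketch below therefore splits each pair on the first `=` only. `parse_connection_string` and `to_config_line` are hypothetical helpers, and the required keys assume account-key authentication as in the example above. The sample account name and key are made up.

```python
import json
from typing import Dict

# Assumed required keys for an account-key connection string like the example above.
REQUIRED_KEYS = ("AccountName", "AccountKey")


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split an Azure Storage connection string into a dict of its key=value parts."""
    parts: Dict[str, str] = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing ';'
        key, sep, value = segment.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"Malformed segment without '=': {segment!r}")
        parts[key] = value
    missing = [key for key in REQUIRED_KEYS if key not in parts]
    if missing:
        raise ValueError(f"Connection string is missing: {', '.join(missing)}")
    return parts


def to_config_line(conn_str: str, key: str = "AZURE_STORAGE_CONNECTION_STRING") -> str:
    """Render a config_variables.yml entry; a JSON double-quoted string is valid YAML."""
    parse_connection_string(conn_str)  # validate before rendering
    return f"{key}: {json.dumps(conn_str)}"


if __name__ == "__main__":
    example = (
        "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;"
        "AccountName=mystorageaccount;AccountKey=abc123=="
    )
    print(parse_connection_string(example)["AccountKey"])  # abc123==
    print(to_config_line(example))
```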
Next steps
To continue configuring your Data Context to use Azure Blob Storage, please see: