How to configure a Validation Result Store in Azure Blob Storage
By default, Validation Results (generated when data is Validated against an Expectation or Expectation Suite) are stored in JSON format in the uncommitted/validations/ subdirectory of your great_expectations/ folder.
Validation Results might include sensitive or
regulated data that should not be committed to a
source control system. Use the information provided
here to configure a new storage location for
Validation Results in Azure Blob Storage.
Prerequisites
- Completion of the Quickstart guide.
- A working installation of Great Expectations.
- A Data Context.
- An Expectation Suite.
- A Checkpoint.
- An Azure Storage account and its connection string.
- An Azure Blob container. If you want to host and share Data Docs on Azure Blob Storage, you can set this up first and then use the existing $web container to store your Expectations (verifiable assertions about data).
- A prefix (folder) to store Validation Results. You don't need to create the folder; the prefix is just part of the blob name.
1. Configure the config_variables.yml file with your Azure Storage credentials
GX recommends that you store Azure Storage credentials in the config_variables.yml file, which is located in the uncommitted/ folder by default and is not part of source control. The following code adds Azure Storage credentials under the key AZURE_STORAGE_CONNECTION_STRING:
AZURE_STORAGE_CONNECTION_STRING: "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
To learn more about the additional options for configuring the config_variables.yml file, or additional environment variables, see How to configure credentials.
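A malformed connection string is an easy mistake to make when pasting credentials. The following is a minimal sketch of a sanity check, assuming the account-key style of connection string shown in this guide. The function names and the `REQUIRED_KEYS` set are illustrative and not part of Great Expectations or the Azure SDK.

```python
# Illustrative helper: the names here are hypothetical, not a GX or Azure API.
REQUIRED_KEYS = {"AccountName", "AccountKey"}


def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' segments into a dict."""
    pairs = {}
    for segment in conn_str.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 and often end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Segment without '=': {segment!r}")
        pairs[key] = value
    return pairs


def missing_keys(conn_str: str) -> set:
    """Return the required keys that are absent or empty."""
    pairs = parse_connection_string(conn_str)
    return {key for key in REQUIRED_KEYS if not pairs.get(key)}


# Placeholder values only; substitute your own account name and key.
example = (
    "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;"
    "AccountName=myaccount;AccountKey=abc123=="
)
print(missing_keys(example))  # an empty set means both fields are present
```

The check only confirms that the string has the expected shape. It does not verify that the credentials are valid for your Storage account.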
2. Identify your Validation Results Store
The configuration for your Validation Results Store (a connector that stores and retrieves information about objects generated when data is Validated against an Expectation Suite) is provided in your Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components). Open great_expectations.yml and find the following entry:
validations_store_name: validations_store

stores:
  validations_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/
This configuration tells Great Expectations to look for Validation Results in a Store named validations_store. The default base_directory for validations_store is uncommitted/validations/.
3. Update your configuration file to include a new Store for Validation Results in your Azure Storage account
In the following example, validations_store_name is set to validations_AZ_store, but you can use any name you like. You also need to change the store_backend settings: class_name is TupleAzureBlobStoreBackend, container is the name of the blob container where Validation Results are stored, prefix is the folder in the container where Validation Result files are located, and connection_string is ${AZURE_STORAGE_CONNECTION_STRING}, which references the corresponding key in the config_variables.yml file.
validations_store_name: validations_AZ_store

stores:
  validations_AZ_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleAzureBlobStoreBackend
      container: <blob-container>
      prefix: validations
      connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
If the container for hosting and sharing Data Docs on Azure Blob Storage is named $web, use container: \$web to allow access to the $web container.
Additional authentication and configuration options are available. See Hosting and sharing Data Docs on Azure Blob Storage.
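To make the roles of ${AZURE_STORAGE_CONNECTION_STRING} and \$web more concrete, the following sketch shows one way such placeholder substitution could behave. It is a simplified illustration written for this guide, not the Great Expectations implementation, and the real substitution rules may differ in detail. The `substitute` function and its `variables` argument are hypothetical.

```python
import re

# Matches either an escaped dollar sign (\$) or a ${NAME} placeholder.
_TOKEN = re.compile(r"\\\$|\$\{(\w+)\}")


def substitute(value: str, variables: dict) -> str:
    """Resolve ${NAME} placeholders and unescape \\$ in a config value."""

    def replace(match):
        if match.group(0) == "\\$":
            return "$"  # escaped: keep a literal dollar sign
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"No value configured for {name}")
        return variables[name]

    return _TOKEN.sub(replace, value)


# Stand-in for the contents of config_variables.yml (placeholder values).
variables = {
    "AZURE_STORAGE_CONNECTION_STRING": "AccountName=myaccount;AccountKey=abc123=="
}

print(substitute("${AZURE_STORAGE_CONNECTION_STRING}", variables))
print(substitute("\\$web", variables))  # the Python literal "\\$web" is the text \$web
```

Under these assumptions, the placeholder resolves to the stored connection string, while the escaped form yields the literal container name $web instead of being read as a variable reference.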
4. Copy existing Validation Results JSON files to the Azure blob (Optional)
You can use the az storage blob upload command to copy Validation Results into Azure Blob Storage. The following command copies one Validation Result from a local folder to the Azure blob:
export AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
az storage blob upload -f <local/path/to/validation.json> -c <GREAT-EXPECTATION-DEDICATED-AZURE-BLOB-CONTAINER-NAME> -n <PREFIX>/<validation.json>
The following example uploads a Validation Result for the exp1 Expectation Suite:
az storage blob upload -f great_expectations/uncommitted/validations/exp1/20210306T104406.877327Z/20210306T104406.877327Z/8313fb37ca59375eb843adf388d4f882.json -c <blob-container> -n validations/exp1/20210306T104406.877327Z/20210306T104406.877327Z/8313fb37ca59375eb843adf388d4f882.json
Finished[#############################################################] 100.0000%
{
"etag": "\"0x8D8E09F894650C7\"",
"lastModified": "2021-03-06T12:58:28+00:00"
}
To learn more about other methods that are available to copy Validation Result JSON files into Azure Blob Storage, see Quickstart: Upload, download, and list blobs with the Azure portal.
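When there are many Validation Result files, the blob name for each one follows the same pattern as the example above: the local base directory is replaced by the prefix. The sketch below derives the blob name and assembles the matching az storage blob upload arguments. The helper names and the container name `my-container` are hypothetical, and the sketch only builds the command; it does not run it.

```python
import shlex
from pathlib import PurePosixPath


def blob_name_for(local_path: str, local_base: str, prefix: str) -> str:
    """Replace the local base directory with the blob prefix."""
    relative = PurePosixPath(local_path).relative_to(local_base)
    return str(PurePosixPath(prefix) / relative)


def upload_command(local_path: str, local_base: str, container: str, prefix: str) -> list:
    """Build the argument list for one 'az storage blob upload' call."""
    name = blob_name_for(local_path, local_base, prefix)
    return ["az", "storage", "blob", "upload", "-f", local_path, "-c", container, "-n", name]


local_base = "great_expectations/uncommitted/validations"
local_path = (
    local_base
    + "/exp1/20210306T104406.877327Z/20210306T104406.877327Z/"
    + "8313fb37ca59375eb843adf388d4f882.json"
)

command = upload_command(local_path, local_base, "my-container", "validations")
print(shlex.join(command))  # review the command before running it yourself
```

Keeping the relative path intact preserves the directory layout under the prefix, which is the layout the example upload above uses.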
5. Reference the new configuration
To make Great Expectations look for Validation Results on the Azure store, set the validations_store_name variable to the name of your Azure Validations Store. In the previous example this was validations_AZ_store.
6. Confirm that the Validation Results Store has been correctly configured
Run a Checkpoint to store results in the new Validation Results Store on Azure Blob Storage, and then view the results by rebuilding Data Docs.
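As a rough sketch, this confirmation step could be scripted as follows. It assumes a Great Expectations release whose Data Context provides get_context, run_checkpoint, and build_data_docs (the YAML-configured style used throughout this guide); newer releases may expose a different API. The function name and the checkpoint name `my_checkpoint` are hypothetical.

```python
def run_and_rebuild(checkpoint_name: str) -> bool:
    """Run a Checkpoint, rebuild Data Docs, and report whether validation passed."""
    # Imported here so the module loads even where Great Expectations is not installed.
    import great_expectations as gx

    # Assumption: the project is discoverable from the current working directory.
    context = gx.get_context()

    # Results are written to whichever Store validations_store_name points at.
    result = context.run_checkpoint(checkpoint_name=checkpoint_name)

    # Rebuild Data Docs so the new Validation Results appear in the rendered site.
    context.build_data_docs()
    return bool(result.success)


# Example call (replace the hypothetical name with one of your Checkpoints):
# run_and_rebuild("my_checkpoint")
```

After the run, a new JSON blob should appear in the container under the configured prefix.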