How to host and share Data Docs on Azure Blob Storage
This guide explains how to host and share Data Docs (human-readable documentation generated from Great Expectations metadata, detailing Expectations, Validation Results, etc.) on Azure Blob Storage. Data Docs will be served using an Azure Blob Storage static website with restricted access.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- A working installation of Great Expectations
- Set up a working deployment of Great Expectations
- Permission to create and configure an Azure Storage account
1. Create an Azure Blob Storage static website
- Create a storage account.
- In settings select Static website to display the configuration page for static websites.
- Select Enabled to enable static website hosting for the storage account.
- Write "index.html" in Index document.
Note the Primary endpoint URL. Your team will be able to view your Data Docs at this URL once you have finished this tutorial. You can also map a custom domain to this endpoint. A container called $web should have been created in your storage account.
2. Configure the config_variables.yml file with your Azure Storage credentials
Get the Connection string of the storage account you have just created.
We recommend storing Azure Storage credentials in the config_variables.yml file, which is located in the uncommitted/ folder by default and is not part of source control. The following line adds Azure Storage credentials under the key AZURE_STORAGE_CONNECTION_STRING. Additional options for configuring the config_variables.yml file or additional environment variables are described in the Great Expectations documentation.
AZURE_STORAGE_CONNECTION_STRING: "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=<YOUR-STORAGE-ACCOUNT-NAME>;AccountKey=<YOUR-STORAGE-ACCOUNT-KEY==>"
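Before saving the value, it can help to sanity-check that the connection string is well formed. The following is a minimal sketch using only the Python standard library. The helper name parse_connection_string, the REQUIRED_KEYS set, and the example account name and key are illustrative assumptions; none of them are part of Great Expectations or the Azure SDK.

```python
# Hypothetical helper, not part of Great Expectations or the Azure SDK.
# Assumption: account-key authentication, so both keys below must be present.
REQUIRED_KEYS = {"AccountName", "AccountKey"}


def parse_connection_string(connection_string: str) -> dict:
    """Split a 'Key=Value;Key=Value' connection string into a dict."""
    parts = {}
    for segment in connection_string.strip().split(";"):
        if not segment:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: account keys are base64 and may end in '='.
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key] = value
    missing = REQUIRED_KEYS - parts.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return parts


# Made-up example values for illustration only.
example = (
    "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;"
    "AccountName=mystorageaccount;AccountKey=ZmFrZS1rZXk=="
)
parsed = parse_connection_string(example)
print(parsed["AccountName"])
```

Splitting on the first "=" only is the important detail here, since the trailing "==" padding of the account key would otherwise be lost.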
3. Add a new Azure site to the data_docs_sites section of your great_expectations.yml
data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
  az_site:  # this is a user-selected name - you may select your own
    class_name: SiteBuilder
    store_backend:
      class_name: TupleAzureBlobStoreBackend
      container: \$web
      connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
You may also replace the default local_site if you would like to maintain only a single Azure Data Docs site.
Since the container is called $web, simply setting container: $web in great_expectations.yml would cause Great Expectations to look, unsuccessfully, for a variable called web in config_variables.yml. We use the escape character \ before the $ so that the substitute_config_variable method lets us reach the $web container.
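To make the role of the escape character concrete, here is a simplified sketch of this kind of variable substitution. It is an illustration only and is not the actual substitute_config_variable implementation from Great Expectations; the function name substitute and the exact matching rules are assumptions.

```python
import re

# One pass over the string: an escaped dollar sign, ${NAME}, or $NAME.
_PATTERN = re.compile(r"\\\$|\$\{(\w+)\}|\$(\w+)")


def substitute(value: str, variables: dict) -> str:
    """Replace ${NAME} / $NAME with values; turn an escaped \\$ into a literal $."""

    def _replace(match):
        if match.group(0) == "\\$":
            return "$"  # escaped: keep a literal dollar sign, no lookup
        name = match.group(1) or match.group(2)
        if name not in variables:
            raise KeyError(f"Missing config variable: {name}")
        return str(variables[name])

    return _PATTERN.sub(_replace, value)


config_variables = {"AZURE_STORAGE_CONNECTION_STRING": "<connection-string>"}

# Escaped: the container name survives as the literal string "$web".
container = substitute(r"\$web", config_variables)

# Unescaped: the variable is looked up in config_variables.
connection = substitute("${AZURE_STORAGE_CONNECTION_STRING}", config_variables)
```

In this sketch, passing the unescaped string "$web" raises a KeyError because no variable named web is defined, which mirrors the failed lookup described above.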
You may also configure Great Expectations to store your Expectations (verifiable assertions about data) and Validation Results (generated when data is validated against an Expectation or Expectation Suite) in this Azure Storage account. You can follow the guides for Expectations and Validation Results, but be sure to set container: \$web in place of the other container name.
The following options are available for this backend:
- container: The name of the Azure Blob container to store your data in.
- connection_string: The Azure Storage connection string. This can also be supplied by setting the AZURE_STORAGE_CONNECTION_STRING environment variable.
- prefix: All paths on blob storage will be prefixed with this string.
- account_url: The URL to the blob storage account. Any other entities included in the URL path (e.g. container or blob) will be discarded. This URL can optionally be authenticated with a SAS token. This can only be used if you don't configure the connection_string. You can also configure this by setting the AZURE_STORAGE_ACCOUNT_URL environment variable.
The most common authentication methods are supported:
- SAS token authentication: append the SAS token to account_url or make sure it is set in the connection_string.
- Account key authentication: include the account key in the connection_string.
- When none of the above authentication methods are specified, DefaultAzureCredential is used, which supports most common authentication methods. You still need to provide the account URL, either through the config file or the environment variable.
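The rules above can be summarized as a small decision function. This is a sketch of the described behavior only, not the backend's actual code: the function name describe_auth and its return labels are hypothetical, and rejecting a configuration that sets both options is an assumption based on the statement that account_url can only be used without connection_string.

```python
from urllib.parse import parse_qs, urlparse


def describe_auth(connection_string=None, account_url=None):
    """Return a label for the authentication method a configuration implies."""
    if connection_string and account_url:
        # Assumption: treat this as a configuration error.
        raise ValueError("Set either connection_string or account_url, not both")
    if connection_string:
        keys = {seg.partition("=")[0] for seg in connection_string.split(";") if seg}
        if "SharedAccessSignature" in keys:
            return "sas_token"
        if "AccountKey" in keys:
            return "account_key"
        raise ValueError("connection_string has neither a SAS token nor an account key")
    if account_url:
        # A SAS token appended to the URL carries a 'sig' query parameter.
        query = parse_qs(urlparse(account_url).query)
        return "sas_token" if "sig" in query else "default_azure_credential"
    raise ValueError("Provide either connection_string or account_url")


# Made-up example values for illustration only.
key_auth = describe_auth(connection_string="AccountName=acct;AccountKey=ZmFrZQ==")
sas_auth = describe_auth(account_url="https://acct.blob.core.windows.net?sv=2020&sig=abc")
fallback = describe_auth(account_url="https://acct.blob.core.windows.net")
```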
4. Build the Azure Blob Data Docs site
You can create or modify an Expectation Suite (a collection of verifiable assertions about data) and this will build the Data Docs website. Or you can use the following CLI (Command Line Interface) command: great_expectations docs build --site-name az_site.
> great_expectations docs build --site-name az_site
The following Data Docs sites will be built:
- az_site: https://<your-storage-account>.blob.core.windows.net/$web/index.html
Would you like to proceed? [Y/n]: y
Building Data Docs...
Done building Data Docs
If successful, the CLI will provide the object URL of the index page. You can restrict access to your website using an IP filtering mechanism.
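The same build can also be triggered from Python. The sketch below assumes that `context` is a Great Expectations Data Context whose build_data_docs method accepts a site_names list and returns a mapping of site names to index page locations. The wrapper name build_site is hypothetical, and the commented usage assumes great_expectations.get_context() is available in your installed version.

```python
def build_site(context, site_name="az_site"):
    """Build a single Data Docs site and return the location of its index page."""
    # Restrict the build to the named site rather than every configured site.
    index_urls = context.build_data_docs(site_names=[site_name])
    # Returns None if the site name is absent from the result mapping.
    return index_urls.get(site_name)


# Usage (assumes a configured project containing the az_site entry shown earlier):
# import great_expectations as gx
# context = gx.get_context()
# print(build_site(context))
```

Passing the context in as an argument keeps the helper independent of how the context is created, so it works with whichever constructor your version provides.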
5. Limit access to your company
- On your Azure Storage Account Settings click on Networking
- Allow access from Selected networks
- You can add access to Virtual Network
- You can add IP ranges to the firewall
More details are available in the Azure Storage documentation on firewalls and virtual networks.
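When planning which IP ranges to add to the firewall, it can be useful to check locally whether a given client address would fall inside them. The following is a minimal sketch using Python's standard ipaddress module. The function name is_ip_allowed and the example ranges (taken from the reserved documentation address blocks) are illustrative, and the check does not query Azure or reproduce its rule evaluation.

```python
import ipaddress


def is_ip_allowed(client_ip, allowed_ranges):
    """Return True if client_ip falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    return any(
        # strict=False accepts ranges written with host bits set.
        address in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_ranges
    )


# Example ranges you might plan to add to the storage account firewall.
planned_ranges = ["203.0.113.0/24", "198.51.100.7"]

office_ok = is_ip_allowed("203.0.113.42", planned_ranges)
outsider_ok = is_ip_allowed("192.0.2.1", planned_ranges)
```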