How to host and share Data Docs on GCS
This guide explains how to host and share Data Docs (human-readable documentation generated from Great Expectations metadata detailing Expectations, Validation Results, etc.) on Google Cloud Storage. We recommend using IP-based access, which is achieved by deploying a simple Google App Engine app. Data Docs can also be served directly from Google Cloud Storage if the contents of the bucket are set to be publicly readable, but this is strongly discouraged.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- A working installation of Great Expectations
- Set up a Google Cloud project
- Installed and initialized the Google Cloud SDK (in order to use the gcloud CLI)
- Set up the gsutil command line tool
- Permissions to list and create buckets, deploy Google App Engine apps, and add app firewall rules
 
Steps
1. Create a Google Cloud Storage bucket using gsutil
Make sure you modify the project name, bucket name, and region for your situation.
gsutil mb -p <your> -l US-EAST1 -b on gs://<your>/
                            
Creating gs://<your>/...
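If you script this step, it can help to build the command from named values rather than editing the string by hand. The following is a minimal sketch, not part of the original guide: the helper name `gsutil_make_bucket_command` and the example project and bucket names are hypothetical. It only assembles the argument list and does not run gsutil.

```python
import shlex


def gsutil_make_bucket_command(project, bucket, location="US-EAST1"):
    """Build the `gsutil mb` argument list from step 1 (does not execute it)."""
    if not bucket or "/" in bucket:
        raise ValueError("bucket must be a bare bucket name, not a gs:// URL")
    return [
        "gsutil", "mb",
        "-p", project,        # project that will own the bucket
        "-l", location,       # bucket location
        "-b", "on",           # enable uniform bucket-level access
        f"gs://{bucket}/",
    ]


# Hypothetical names, for illustration only.
cmd = gsutil_make_bucket_command("my-gcp-project", "my-data-docs-bucket")
print(shlex.join(cmd))
# gsutil mb -p my-gcp-project -l US-EAST1 -b on gs://my-data-docs-bucket/
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where gsutil is installed and authenticated.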
                            
2. Create a directory for your Google App Engine app and add the following files
We recommend placing it in your project directory, for example great_expectations/team_gcs_app.
                        
app.yaml:
runtime: python37
env_variables:
  CLOUD_STORAGE_BUCKET: <your>
                            
requirements.txt:
flask>=1.1.0
google-cloud-storage
                            
main.py:
import logging
import os

from flask import Flask
from google.cloud import storage

app = Flask(__name__)

# Configure this environment variable via app.yaml
CLOUD_STORAGE_BUCKET = os.environ['CLOUD_STORAGE_BUCKET']


@app.route('/', defaults={'path': 'index.html'})
@app.route('/<path:path>')
def index(path):
    # Serve the requested object from the bucket; '/' maps to index.html
    gcs = storage.Client()
    bucket = gcs.get_bucket(CLOUD_STORAGE_BUCKET)
    try:
        blob = bucket.get_blob(path)
        content = blob.download_as_string()
        # Decode only when the blob declares an encoding; otherwise return raw bytes
        if blob.content_encoding:
            resource = content.decode(blob.content_encoding)
        else:
            resource = content
    except Exception as e:
        # get_blob returns None for a missing object, so unknown paths also end up here
        logging.exception("couldn't get blob")
        resource = "<p></p>"
    return resource


@app.errorhandler(500)
def server_error(e):
    logging.exception('An error occurred during a request.')
    return '''
    An internal error occurred: <pre>{}</pre>
    See logs for full stacktrace.
    '''.format(e), 500
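The handler in main.py answers every request with status 200, even when the object does not exist. One possible refinement, sketched here under stated assumptions rather than taken from the original guide, is to return a 404 for missing objects. The function `load_page` and the `FakeBlob` and `FakeBucket` stand-ins are hypothetical names; the stand-ins exist only so the sketch runs without GCS access.

```python
from typing import Optional, Tuple, Union


def load_page(bucket, path: str) -> Tuple[Union[str, bytes], int]:
    """Return (body, HTTP status) for `path`, given a bucket-like object.

    `bucket` only needs a get_blob(name) method that returns None for a
    missing object, which is how google-cloud-storage's Bucket.get_blob behaves.
    """
    blob = bucket.get_blob(path or "index.html")
    if blob is None:
        return "<p>Not found</p>", 404
    content = blob.download_as_string()
    if blob.content_encoding:
        return content.decode(blob.content_encoding), 200
    return content, 200


# Hypothetical in-memory stand-ins for the GCS client objects.
class FakeBlob:
    def __init__(self, data: bytes, content_encoding: Optional[str] = None):
        self._data = data
        self.content_encoding = content_encoding

    def download_as_string(self) -> bytes:
        return self._data


class FakeBucket:
    def __init__(self, blobs):
        self._blobs = blobs

    def get_blob(self, name):
        return self._blobs.get(name)


bucket = FakeBucket({"index.html": FakeBlob(b"<h1>Data Docs</h1>", "utf-8")})
print(load_page(bucket, "index.html"))    # ('<h1>Data Docs</h1>', 200)
print(load_page(bucket, "missing.html"))  # ('<p>Not found</p>', 404)
```

Flask accepts a `(body, status)` tuple as a view return value, so a handler could return the result of `load_page` directly.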
                            
3. If you haven't done so already, authenticate the gcloud CLI and set the project
gcloud auth login && gcloud config set project <your>
                            
4. Deploy your Google App Engine app
Issue the following CLI (command line interface) command from within the app directory created above:
gcloud app deploy
                            
5. Set up the Google App Engine firewall for your app to control access
Visit the following page for instructions on creating firewall rules: Creating firewall rules
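To reason about which clients a set of rules will admit, it can help to model the evaluation locally. The sketch below is a simplified model, not the App Engine implementation: it assumes rules are checked in ascending priority order, the first rule whose source range contains the client IP decides the outcome, and a default action applies when nothing matches. The function name, rule tuples, and IP ranges are hypothetical.

```python
import ipaddress
from typing import List, Tuple

# (priority, action, source_range); lower priority numbers are checked first.
Rule = Tuple[int, str, str]


def firewall_action(rules: List[Rule], client_ip: str, default_action: str = "deny") -> str:
    """Return the action of the first matching rule, or the default action."""
    ip = ipaddress.ip_address(client_ip)
    for _priority, action, source_range in sorted(rules):
        if ip in ipaddress.ip_network(source_range):
            return action
    return default_action


# Hypothetical rules: block one host, allow the rest of an office range.
rules = [
    (200, "allow", "203.0.113.0/24"),
    (100, "deny", "203.0.113.7/32"),
]
print(firewall_action(rules, "203.0.113.7"))   # deny
print(firewall_action(rules, "203.0.113.8"))   # allow
print(firewall_action(rules, "198.51.100.1"))  # deny
```

Because the more specific deny rule has the lower priority number, it is evaluated before the broader allow rule.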
6. Add a new GCS site to the data_docs_sites section of your great_expectations.yml
You may also replace the default local_site if you would only like to maintain a single GCS Data Docs site.
                        
data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
  gs_site:  # this is a user-selected name - you may select your own
    class_name: SiteBuilder
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <your>
      bucket: <your>
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
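A quick sanity check on this section can catch a missing key before you build the site. The following sketch assumes the data_docs_sites section has already been parsed into a Python dict (for example with a YAML parser); the helper name `gcs_sites` and the example values are hypothetical, and it checks only for the two keys this guide sets.

```python
def gcs_sites(data_docs_sites: dict) -> dict:
    """Return {site_name: bucket} for sites backed by TupleGCSStoreBackend."""
    found = {}
    for name, site in data_docs_sites.items():
        backend = site.get("store_backend", {})
        if backend.get("class_name") != "TupleGCSStoreBackend":
            continue  # e.g. local_site, which uses TupleFilesystemStoreBackend
        missing = [key for key in ("project", "bucket") if not backend.get(key)]
        if missing:
            raise ValueError(f"site {name!r} is missing: {', '.join(missing)}")
        found[name] = backend["bucket"]
    return found


# Dict mirroring the YAML above, with hypothetical project and bucket names.
config = {
    "local_site": {
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": "uncommitted/data_docs/local_site/",
        },
    },
    "gs_site": {
        "class_name": "SiteBuilder",
        "store_backend": {
            "class_name": "TupleGCSStoreBackend",
            "project": "my-gcp-project",
            "bucket": "my-data-docs-bucket",
        },
    },
}
print(gcs_sites(config))  # {'gs_site': 'my-data-docs-bucket'}
```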
                            
7. Build the GCS Data Docs site
Use the following CLI command:
great_expectations docs build --site-name gs_site
                            
If successful, the CLI will print the object URL of the index page. Because the bucket is not public, this URL will be inaccessible; instead, you will access the Data Docs site through the App Engine app configured above.
The following Data Docs sites will be built:
 - gs_site: https://storage.googleapis.com/<your>/index.html
Would you like to proceed? [Y/n]: Y
Building Data Docs...
Done building Data Docs
                            
8. Test that everything was configured properly by launching your App Engine app
Issue the following CLI command: gcloud app browse. If successful, the gcloud CLI will provide the URL to your app and launch it in a new browser window. The page displayed should be the index page of your Data Docs site.
                        
Additional notes
- If you wish to host a Data Docs site through a private DNS, you can configure a base_public_path for the Data Docs Store (a connector to store and retrieve information pertaining to Data Docs). The following example configures a GCS site with the base_public_path set to www.mydns.com. Data Docs will still be written to the configured location on GCS (for example https://storage.cloud.google.com/my_org_data_docs/index.html), but you will be able to access the pages from your DNS (http://www.mydns.com/index.html in our example).
data_docs_sites:
  gs_site:  # this is a user-selected name - you may select your own
    class_name: SiteBuilder
    store_backend:
      class_name: TupleGCSStoreBackend
      project: <YOUR GCP PROJECT NAME>
      bucket: <YOUR GCS BUCKET NAME>
      base_public_path: http://www.mydns.com
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
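The effect of base_public_path on the reported index URL can be illustrated with a small function. This is a sketch of the behavior described in this guide, not the Great Expectations implementation; the function name `index_page_url` is hypothetical, and the default URL pattern is the one shown in the docs build output earlier in this guide.

```python
from typing import Optional


def index_page_url(bucket: str, base_public_path: Optional[str] = None) -> str:
    """Return the URL where the Data Docs index page is expected to be reachable."""
    if base_public_path:
        # Pages are served from your own DNS name instead of the GCS object URL.
        return base_public_path.rstrip("/") + "/index.html"
    return f"https://storage.googleapis.com/{bucket}/index.html"


print(index_page_url("my_org_data_docs"))
# https://storage.googleapis.com/my_org_data_docs/index.html
print(index_page_url("my_org_data_docs", "http://www.mydns.com"))
# http://www.mydns.com/index.html
```

In both cases the files themselves are still written to the bucket; only the address used to reach them changes.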
Additional resources
- Google App Engine
- Controlling App Access with Firewalls
- Data Docs
- To view the full script used in this page, see it on GitHub: how_to_host_and_share_data_docs_on_gcs.py