Deploying Great Expectations in a hosted environment without file system or CLI
If you follow the steps of the Getting Started tutorial, you create a standard deployment of Great Expectations. By default, this relies on two components:
- The Great Expectations CLI to initialize a Data Context, create Expectation Suites, add Datasources, etc.
- The great_expectations.yml file to configure your Data Context, e.g. to point at different Stores for Validation Results, etc.
However, you might not have these components available in hosted environments, such as Databricks, AWS EMR, Google Cloud Composer, and others. This workflow guide outlines the main steps required to use Great Expectations successfully in a hosted environment.
Step 1: Configure your Data Context
Instead of using the Great Expectations CLI, you can create a Data Context directly in code. Your Data Context also manages the following components described in this guide:
- Datasources to connect to data
- Stores to save Expectations and Validation Results
- Data Docs hosting
The following guide gives an overview of creating an in-code Data Context, including defaults that help you set one up quickly for common configurations:
The following guides contain examples for each environment we have tested:
- How to instantiate a Data Context on an EMR Spark cluster
- How to use Great Expectations in Databricks
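To give a rough idea of what an in-code configuration contains, the sketch below builds, as a plain Python dictionary, the same structure that great_expectations.yml would hold. Every Store and a local Data Docs site are backed by directories under one root. The helper name `build_project_config` and any root path you pass in (for example a DBFS mount on Databricks) are hypothetical. The key and class names follow the V3 (Batch Request) API layout of Great Expectations 0.13 to 0.15, so check them against the version you run.

```python
def build_project_config(root_dir: str) -> dict:
    """Return a Data Context config dict with all Stores under root_dir.

    Hypothetical helper: mirrors the layout of great_expectations.yml.
    """

    def fs_backend(subdir: str) -> dict:
        # Each Store persists its objects as files under root_dir/subdir.
        return {
            "class_name": "TupleFilesystemStoreBackend",
            "base_directory": f"{root_dir}/{subdir}",
        }

    return {
        "config_version": 3.0,
        "datasources": {},  # Datasources are added later, e.g. in Step 2.
        "stores": {
            "expectations_store": {
                "class_name": "ExpectationsStore",
                "store_backend": fs_backend("expectations"),
            },
            "validations_store": {
                "class_name": "ValidationsStore",
                "store_backend": fs_backend("uncommitted/validations"),
            },
            "evaluation_parameter_store": {
                # No store_backend given: parameters are kept in memory.
                "class_name": "EvaluationParameterStore",
            },
            "checkpoint_store": {
                "class_name": "CheckpointStore",
                "store_backend": fs_backend("checkpoints"),
            },
        },
        "expectations_store_name": "expectations_store",
        "validations_store_name": "validations_store",
        "evaluation_parameter_store_name": "evaluation_parameter_store",
        "checkpoint_store_name": "checkpoint_store",
        "data_docs_sites": {
            "local_site": {
                "class_name": "SiteBuilder",
                "store_backend": fs_backend("uncommitted/data_docs/local_site"),
                "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
            },
        },
    }
```

In Great Expectations releases that provide them, a dictionary like this can be wrapped as `DataContextConfig(**config)` and passed to `BaseDataContext(project_config=...)`. Newer releases change that entry point, so follow the in-code Data Context guide for your version.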
Step 2: Create Expectation Suites and add Expectations
To create an Expectation Suite in your environment without the CLI, follow this guide from step 5 onward to add a Datasource and an Expectation Suite: How to connect to a PostgreSQL database
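For orientation, a PostgreSQL Datasource in the V3 API is itself just a configuration dictionary. The sketch below assembles one; the helper name `postgres_datasource_config` and all connection values are hypothetical placeholders, and the connector layout mirrors the usual runtime plus inferred-table setup rather than anything this guide prescribes. The password is URL-encoded so that characters such as `@` or `/` do not break the connection string.

```python
from urllib.parse import quote_plus


def postgres_datasource_config(name, host, port, user, password, database):
    """Return a V3-style Datasource config for a PostgreSQL database.

    Hypothetical helper; all credentials are placeholders you supply.
    """
    # URL-encode the password so special characters don't corrupt the URL.
    connection_string = (
        f"postgresql+psycopg2://{user}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
    return {
        "name": name,
        "class_name": "Datasource",
        "execution_engine": {
            "class_name": "SqlAlchemyExecutionEngine",
            "connection_string": connection_string,
        },
        "data_connectors": {
            # Accepts a query or table supplied at runtime.
            "default_runtime_data_connector_name": {
                "class_name": "RuntimeDataConnector",
                "batch_identifiers": ["default_identifier_name"],
            },
            # Exposes existing tables as Data Assets.
            "default_inferred_data_connector_name": {
                "class_name": "InferredAssetSqlDataConnector",
                "include_schema_name": True,
            },
        },
    }
```

With an in-code Data Context, a config like this would typically be registered with `context.add_datasource(**config)`. In a hosted environment, read the credentials from a secret manager rather than hard-coding them.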
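You can then add Expectations to your Suite one at a time, as in the following example, where validator is the Validator you obtained while creating the Suite. Passing discard_failed_expectations=False keeps Expectations in the Suite even if they fail on the data currently loaded: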
validator.expect_column_values_to_not_be_null("my_column")
validator.save_expectation_suite(discard_failed_expectations=False)
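When a Suite has many Expectations, it can help to list them as data and apply them in a loop. The sketch below does this by looking up each Expectation method on the Validator by name. The helper `add_expectations` and the `(name, kwargs)` spec format are hypothetical conventions for illustration; the only Validator calls used are the Expectation methods themselves and `save_expectation_suite`.

```python
def add_expectations(validator, specs):
    """Call each Expectation on the validator, then save the Suite.

    specs: list of (expectation_name, kwargs) pairs, e.g.
    ("expect_column_values_to_not_be_null", {"column": "my_column"}).
    Hypothetical helper; `validator` is the Validator from the step above.
    """
    results = []
    for name, kwargs in specs:
        # Look up the Expectation method by name, e.g. validator.expect_...
        expectation = getattr(validator, name)
        results.append(expectation(**kwargs))
    # Keep every Expectation in the Suite, even ones that failed on this data.
    validator.save_expectation_suite(discard_failed_expectations=False)
    return results
```

For example, `add_expectations(validator, [("expect_column_values_to_not_be_null", {"column": "my_column"}), ("expect_column_values_to_be_between", {"column": "age", "min_value": 0, "max_value": 120})])`, where the column names are placeholders.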
To load the Suite later, make sure you have an Expectation Store configured:
- How to configure an Expectation store to use Amazon S3
- How to configure an Expectation store to use Azure Blob Storage
- How to configure an Expectation store to use GCS
- How to configure an Expectation store to use a filesystem
- How to configure an Expectation store to use PostgreSQL
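The guides above differ mainly in the store backend block of the Expectations Store config. The sketch below captures that difference; the helper name `expectations_store_config` is hypothetical, the backend class names are the ones used in Great Expectations 0.13 to 0.15, and the options you pass (bucket, prefix, and so on) are forwarded unchanged. PostgreSQL is left out because its database-backed store also needs credentials; see its guide for that layout.

```python
def expectations_store_config(backend: str, **options) -> dict:
    """Return an ExpectationsStore config for the given backend.

    Hypothetical helper. backend is one of "filesystem", "s3", "gcs",
    "azure"; options are passed straight through to the store backend.
    """
    backend_classes = {
        "filesystem": "TupleFilesystemStoreBackend",  # base_directory
        "s3": "TupleS3StoreBackend",  # bucket, prefix
        "gcs": "TupleGCSStoreBackend",  # project, bucket, prefix
        "azure": "TupleAzureBlobStoreBackend",  # container, prefix, connection_string
    }
    if backend not in backend_classes:
        raise ValueError(f"unsupported backend: {backend!r}")
    return {
        "class_name": "ExpectationsStore",
        "store_backend": {"class_name": backend_classes[backend], **options},
    }
```

The returned dict would replace the expectations_store entry under stores in your Data Context config, for example `expectations_store_config("s3", bucket="my-ge-bucket", prefix="expectations")` with a placeholder bucket name.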
Step 3: Run validation
To validate data against an Expectation Suite you've created, follow this guide: How to validate data without a Checkpoint
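In a scheduled job on a hosted cluster, you usually want a compact pass/fail summary rather than the full result object. The sketch below assumes the result has been converted to a dict with `to_json_dict()` and has the usual `success` and `results` keys, each entry carrying `success` and `expectation_config`. That shape is an assumption to verify against your version; the helper `summarize_validation` is hypothetical.

```python
def summarize_validation(result: dict) -> dict:
    """Summarize a validation result dict (assumed to_json_dict() shape).

    Returns overall success plus the type and kwargs of each failed
    Expectation. Hypothetical helper.
    """
    failures = [
        {
            "expectation_type": r["expectation_config"]["expectation_type"],
            "kwargs": r["expectation_config"]["kwargs"],
        }
        for r in result.get("results", [])
        if not r.get("success", False)
    ]
    return {"success": bool(result.get("success")), "failed": failures}
```

A job could then call `summarize_validation(validator.validate().to_json_dict())` and raise an error when `success` is False, so that the hosting scheduler marks the run as failed.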
Step 4: Use Data Docs
Finally, if you would like to build and view Data Docs in your environment, please follow the guides for configuring Data Docs: Options for hosting Data Docs
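Where the cluster has no durable local file system, one common option is to render Data Docs into an S3 bucket. The sketch below builds such a site entry; the helper name `s3_data_docs_site` and the bucket name are hypothetical, and the bucket must already exist with access settings that suit your viewers.

```python
def s3_data_docs_site(bucket: str, prefix: str = "") -> dict:
    """Return a Data Docs site config that renders pages into an S3 bucket.

    Hypothetical helper; bucket creation and access policy are up to you.
    """
    backend = {"class_name": "TupleS3StoreBackend", "bucket": bucket}
    if prefix:
        backend["prefix"] = prefix  # Write pages under this key prefix.
    return {
        "class_name": "SiteBuilder",
        "store_backend": backend,
        "site_index_builder": {"class_name": "DefaultSiteIndexBuilder"},
    }
```

After adding the returned dict under data_docs_sites in your Data Context config, `context.build_data_docs()` renders the site and `context.get_docs_sites_urls()` lists where it was written.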
Additional notes
If you have successfully deployed Great Expectations in a hosted environment other than the ones listed above, we would love to hear from you. Please reach out to us on Slack.