Customize your deployment
Customizing your deployment by upgrading specific components is straightforward. Data Contexts make this modular, so you can add or swap out one component at a time. Most of these changes are quick, incremental steps, so you can upgrade from a basic demo deployment to a full production deployment at your own pace, confident that your Data Context will keep working at every step along the way.
This reference guide is designed to present you with clear options for upgrading your deployment. For specific implementation steps, please check out the linked How-to guides.
Components
Here’s an overview of the components of a typical Great Expectations deployment:
- Great Expectations configs and metadata
- Integrations to related systems
Options for storing Great Expectations configuration
The simplest way to manage your Great Expectations configuration is usually to commit great_expectations/great_expectations.yml to Git. However, committing credentials to source control is rarely a good idea. In some situations, you might also need to deploy without access to source control (or even without a file system).
Here’s how to handle each of those cases:
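A common way to keep credentials out of source control is to commit great_expectations.yml with `${VARIABLE}` placeholders and supply the real values at runtime, from environment variables or from an uncommitted file such as `uncommitted/config_variables.yml`. The sketch below mimics that substitution with Python's `string.Template`. It is not Great Expectations' own implementation; the config excerpt, the `MY_DB_URL` variable, and the rule that environment variables take precedence over file values are assumptions for illustration.

```python
import os
from string import Template

# Hypothetical excerpt of a committed great_expectations.yml. Only the
# placeholder is in Git; the real connection string is not.
COMMITTED_CONFIG = """\
datasources:
  my_postgres_db:
    connection_string: ${MY_DB_URL}
"""


def substitute_config_variables(text, config_variables, environ=None):
    """Replace ${VAR} placeholders in a config string.

    Values come from `config_variables` (e.g. parsed from an uncommitted
    file) and from the environment; environment variables win here,
    which is an assumption of this sketch.
    """
    environ = os.environ if environ is None else environ
    merged = {**config_variables, **environ}
    # substitute() raises KeyError for any placeholder without a value,
    # so a missing credential fails loudly instead of silently.
    return Template(text).substitute(merged)


if __name__ == "__main__":
    secrets = {"MY_DB_URL": "postgresql://user:secret@localhost:5432/analytics"}
    print(substitute_config_variables(COMMITTED_CONFIG, secrets, environ={}))
```

Failing on a missing value, rather than leaving the placeholder in place, makes a misconfigured deployment visible at startup instead of at the first database call.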
Options for storing Expectations
Many teams find it convenient to store Expectations in Git. Essentially, this approach treats Expectations like test fixtures: they live adjacent to code and are stored within version control. Git acts as a collaboration tool and source of record.
Alternatively, you can treat Expectations like configs, and store them in a blob store. Finally, you can store them in a database.
- How to configure an Expectation store in Amazon S3
- How to configure an Expectation store in GCS
- How to configure an Expectation store in Azure Blob Storage
- How to configure an Expectation store in PostgreSQL
- How to configure an Expectation store on a filesystem
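Storing Expectations in Git works best when the files serialize deterministically, so that diffs show only real changes. The sketch below writes a simplified Expectation Suite as sorted, indented JSON, one file per suite. The dictionary layout (`expectation_suite_name`, `expectations`, `expectation_type`, `kwargs`) loosely follows Great Expectations' JSON format but is simplified here, and `save_suite` and `load_suite` are hypothetical helpers, not the library's Expectation Store API.

```python
import json
import tempfile
from pathlib import Path


def save_suite(base_dir, suite):
    """Write a suite to <base_dir>/<suite name>.json and return the path."""
    path = Path(base_dir) / f"{suite['expectation_suite_name']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys and a fixed indent make the output deterministic, so
    # re-saving an unchanged suite produces no Git diff.
    path.write_text(json.dumps(suite, indent=2, sort_keys=True) + "\n")
    return path


def load_suite(base_dir, name):
    """Read a suite previously written by save_suite."""
    return json.loads((Path(base_dir) / f"{name}.json").read_text())


if __name__ == "__main__":
    suite = {
        "expectation_suite_name": "orders.warning",
        "expectations": [
            {
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": "order_id"},
            }
        ],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = save_suite(tmp, suite)
        print(path.read_text())
        assert load_suite(tmp, "orders.warning") == suite
```

The same deterministic serialization also helps with blob stores and databases, since identical content produces identical bytes.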
Options for storing Validation Results
By default, Validation Results are stored locally in an uncommitted directory. This works well for individual work but is poorly suited to collaboration. The most common pattern is to use a cloud-based blob store such as Amazon S3, GCS, or Azure Blob Storage. You can also store Validation Results in a database.
- How to configure a Validation Result store on a filesystem
- How to configure a Validation Result store in Amazon S3
- How to configure a Validation Result store in GCS
- How to configure a Validation Result store in Azure Blob Storage
- How to configure a Validation Result store in PostgreSQL
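Sharing Validation Results mostly comes down to writing them to one location that every teammate and job can read, under keys that never collide. The sketch below uses an in-memory dictionary as a stand-in for a blob store bucket and builds keys of the form `<prefix>/<suite>/<run name>/<run time>/<batch id>.json`. That layout resembles how Great Expectations organizes results on disk, but treat it, and the hypothetical `InMemoryBlobStore` class, as assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def validation_result_key(prefix, suite_name, run_name, run_time, batch_id):
    """Build an object key; including run_time keeps repeated runs from overwriting each other."""
    stamp = run_time.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S.%fZ")
    return f"{prefix}/{suite_name}/{run_name}/{stamp}/{batch_id}.json"


class InMemoryBlobStore:
    """Hypothetical stand-in for an S3/GCS/Azure bucket: key -> bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix):
        # Blob stores list objects by key prefix, so one suite's
        # history can be retrieved with a single prefix query.
        return sorted(k for k in self._objects if k.startswith(prefix))


if __name__ == "__main__":
    store = InMemoryBlobStore()
    run_time = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
    key = validation_result_key("validations", "orders.warning", "nightly", run_time, "batch-001")
    store.put(key, json.dumps({"success": True}).encode())
    print(store.list_keys("validations/orders.warning/"))
```

Normalizing timestamps to UTC keeps keys from different machines sortable and comparable.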
Reference Architectures
- How to instantiate a Data Context on an EMR Spark cluster
- How to use Great Expectations in Databricks
Connecting to Data
Great Expectations allows you to connect to data in a wide variety of sources, and the list is constantly getting longer. If you have an idea for a source not listed here, please speak up in the public discussion forum.
- How to connect to an Athena database
- How to connect to a BigQuery database
- How to connect to an MSSQL database
- How to connect to a MySQL database
- How to connect to a Postgres database
- How to connect to a Redshift database
- How to connect to a Snowflake database
- How to connect to a SQLite database
- How to connect to data on a filesystem using Spark
- How to connect to data on S3 using Spark
- How to connect to data on GCS using Spark
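Most of the SQL sources above are reached through a SQLAlchemy connection URL, and a frequent stumbling block is a password containing characters such as `@` or `/`, which must be URL-encoded. The helper below is a hypothetical sketch that assembles such a URL with the standard library. The driver names shown (`postgresql+psycopg2`, `mysql+pymysql`) are common SQLAlchemy dialect and driver pairs, but check the linked guide for the driver your source actually needs.

```python
from urllib.parse import quote_plus


def sqlalchemy_url(dialect, user, password, host, port, database):
    """Assemble a URL of the form dialect+driver://user:password@host:port/database.

    The user and password are percent-encoded so that characters such as
    '@', ':' or '/' cannot be mistaken for URL delimiters.
    """
    return (
        f"{dialect}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Placeholder values; real credentials belong outside source control.
    print(sqlalchemy_url("postgresql+psycopg2", "analyst", "p@ss/word", "localhost", 5432, "analytics"))
    print(sqlalchemy_url("mysql+pymysql", "analyst", "secret", "db.internal", 3306, "sales"))
```

File-based sources such as SQLite use a path form instead (for example `sqlite:///path/to/file.db`), so this helper does not apply to them.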
Options for hosting Data Docs
By default, Data Docs are stored locally in an uncommitted directory. This works well for individual work but is poorly suited to collaboration. A better pattern is usually to deploy them to a cloud-based blob store (Amazon S3, GCS, or Azure Blob Storage) configured to serve a static website.
- How to host and share Data Docs on a filesystem
- How to host and share Data Docs on Azure Blob Storage
- How to host and share Data Docs on GCS
- How to host and share Data Docs on Amazon S3
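Because Data Docs are a static HTML site, hosting them on a blob store essentially means copying a directory tree into a bucket. One detail that is easy to miss is that each object needs the correct `Content-Type`, or browsers may download pages instead of rendering them. This sketch plans such an upload with the standard library. The local directory and the `data_docs/` key prefix are hypothetical, and the actual upload call (for example through your cloud provider's SDK) is left out.

```python
import mimetypes
import tempfile
from pathlib import Path


def plan_static_site_upload(site_dir, key_prefix=""):
    """Map every file under site_dir to (object key, content type)."""
    site_dir = Path(site_dir)
    plan = []
    for path in sorted(site_dir.rglob("*")):
        if not path.is_file():
            continue
        # Object keys always use '/', regardless of the local OS separator.
        key = key_prefix + path.relative_to(site_dir).as_posix()
        content_type, _ = mimetypes.guess_type(path.name)
        plan.append((key, content_type or "application/octet-stream"))
    return plan


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "index.html").write_text("<html></html>")
        (Path(tmp) / "static").mkdir()
        (Path(tmp) / "static" / "style.css").write_text("body {}")
        for key, content_type in plan_static_site_upload(tmp, "data_docs/"):
            print(key, content_type)
```

Falling back to `application/octet-stream` for unknown extensions mirrors what many object stores assume when no type is given.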
Additional Checkpoints and Actions
Most teams will want to configure various Checkpoints and Validation Actions as part of their deployment. There are two primary patterns for deploying Checkpoints. In the first, Checkpoints run during data processing (e.g., as a task within Airflow), where they can control program flow, for example by stopping a pipeline when validation fails. In the second, Checkpoints run against data that has already been materialized. Great Expectations supports both patterns. In some rare cases, you may also want to validate data without using a Checkpoint at all.
- How to trigger Slack notifications as a Validation Action
- How to trigger Opsgenie notifications as a Validation Action
- How to trigger email notifications as a Validation Action
- How to deploy a scheduled Checkpoint with cron
- How to get Data Docs URLs for custom Validation Actions
- How to validate data without a Checkpoint
- How to run a Checkpoint in Airflow
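The two deployment patterns differ mainly in what happens when validation fails. The sketch below contrasts them in plain Python. `validate_rows` is a hypothetical stand-in for running a Checkpoint (the expectation names it reports are only labels). `gate` shows the in-pipeline pattern, where raising an exception lets an orchestrator such as Airflow fail the task and stop downstream work. `audit` shows the pattern for already-materialized data, where the outcome is recorded or acted upon without halting anything.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationOutcome:
    success: bool
    failed_expectations: list = field(default_factory=list)


class ValidationFailed(Exception):
    """Raised by gate() so the calling task fails."""


def validate_rows(rows):
    """Hypothetical stand-in for running a Checkpoint against a batch."""
    failed = []
    if not rows:
        failed.append("expect_table_row_count_to_be_between")
    if any(row.get("id") is None for row in rows):
        failed.append("expect_column_values_to_not_be_null(id)")
    return ValidationOutcome(success=not failed, failed_expectations=failed)


def gate(rows):
    """In-pipeline pattern: stop the flow when validation fails."""
    outcome = validate_rows(rows)
    if not outcome.success:
        # In Airflow, an uncaught exception fails the task, and with the
        # default trigger rule, downstream tasks do not run.
        raise ValidationFailed(", ".join(outcome.failed_expectations))
    return rows


def audit(rows):
    """Materialized-data pattern: report the outcome, never interrupt."""
    return validate_rows(rows)


if __name__ == "__main__":
    clean = gate([{"id": 1}, {"id": 2}])
    print("passed gate:", len(clean), "rows")
    print("audit of bad data:", audit([{"id": None}]))
```

In the audit pattern, the returned outcome is typically what Validation Actions such as Slack or email notifications would act on.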
Not interested in managing your own configuration or infrastructure?
Learn more about Great Expectations Cloud, our fully managed SaaS offering. Sign up for our weekly cloud workshop to see our newest features and apply for our private Alpha program!