Review and next steps
- Completed Step 4: Validate Data of this tutorial.
Review
In this tutorial, we've walked you through the four steps you need to perform in order to use Great Expectations.
Let's review each of these steps and take a look at the important concepts and features we used.
Step 1: Setup
You installed Great Expectations and initialized your Data Context.
- Data Context: The folder structure that contains the entirety of your Great Expectations project. It is also the entry point for accessing all the primary methods for creating elements of your project, configuring those elements, and working with the metadata for your project.
- CLI: The Command Line Interface for Great Expectations. The CLI provides helpful utilities for deploying and configuring Data Contexts, as well as a few other convenience methods.
Step 2: Connect to Data
You created and configured your Datasource.
- Datasource: An object that brings together a way of interacting with data (an Execution Engine) and a way of accessing that data (a Data Connector). Datasources are used to obtain Batches for Validators, Expectation Suites, and Profilers.
- Jupyter Notebooks: These notebooks are launched by some processes in the CLI. They provide useful boilerplate code for everything from configuring a new Datasource to building an Expectation Suite to running a Checkpoint.
Step 3: Create Expectations
You used the automatic Profiler to build an Expectation Suite.
- Expectation Suite: A collection of Expectations.
- Expectations: Verifiable assertions about data. Great Expectations is a framework for defining Expectations and running them against your data. In the tutorial's example, we asserted that NYC taxi rides should have a minimum of one passenger. When we ran that Expectation against our second set of data, Great Expectations reported that some records in the new data indicated a ride with zero passengers, which failed to meet this Expectation.
- Profiler: A tool that automatically generates Expectations from a Batch of data (a selection of records from a Data Asset).
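To make the passenger-count example concrete, here is a minimal plain-Python sketch of what an Expectation checks. It deliberately does not use the Great Expectations API: the function `expect_column_min_at_least`, the sample rows, and the keys of the result dictionary are all hypothetical and only loosely mirror what the library reports.

```python
from typing import Any, Dict, List


def expect_column_min_at_least(
    rows: List[Dict[str, Any]], column: str, min_value: float
) -> Dict[str, Any]:
    """Hypothetical stand-in for an Expectation.

    Asserts that every value in `column` is at least `min_value` and
    reports which values, if any, break that assertion.
    """
    unexpected = [row[column] for row in rows if row[column] < min_value]
    return {
        "success": not unexpected,
        "element_count": len(rows),
        "unexpected_count": len(unexpected),
        "unexpected_values": unexpected,
    }


# Made-up sample data standing in for two batches of taxi rides.
first_batch = [{"passenger_count": 1}, {"passenger_count": 3}]
second_batch = [{"passenger_count": 2}, {"passenger_count": 0}]

first_result = expect_column_min_at_least(first_batch, "passenger_count", 1)
second_result = expect_column_min_at_least(second_batch, "passenger_count", 1)
```

In this sketch the first batch meets the assertion, while the second batch fails because one ride has zero passengers, which is the same kind of outcome the tutorial observed.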
Step 4: Validate Data
You created a Checkpoint, which you used to validate new data. You then viewed the Validation Results in Data Docs.
- Checkpoint: An object that uses a Validator to run an Expectation Suite against a Batch of data. Running a Checkpoint produces Validation Results for the data it was run on.
- Validation Results: A report generated from an Expectation Suite being run against a Batch of data. The Validation Result itself is in JSON and is rendered as Data Docs.
- Data Docs: Human-readable documentation that describes Expectations for data and its Validation Results. Data Docs can be generated both from Expectation Suites (describing our Expectations for the data) and from Validation Results (describing whether the data meets those Expectations).
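The relationship between a Checkpoint, an Expectation Suite, and a JSON Validation Result can be sketched in plain Python. This is an illustration of the idea only, not the Great Expectations implementation: `run_checkpoint`, the suite layout, the `fare_amount` column, and the result keys are all assumptions made for this example.

```python
import json
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
# In this sketch an "Expectation" is a name plus a predicate applied to each row.
Expectation = Dict[str, Any]


def run_checkpoint(suite: List[Expectation], batch: List[Row]) -> Dict[str, Any]:
    """Hypothetical Checkpoint: run every Expectation in the suite against the batch."""
    results = []
    for expectation in suite:
        check: Callable[[Row], bool] = expectation["check"]
        unexpected_count = sum(1 for row in batch if not check(row))
        results.append(
            {
                "expectation": expectation["name"],
                "success": unexpected_count == 0,
                "unexpected_count": unexpected_count,
            }
        )
    # The overall result succeeds only if every individual Expectation does.
    return {
        "success": all(r["success"] for r in results),
        "evaluated_expectations": len(results),
        "results": results,
    }


# Made-up suite and batch for illustration.
suite = [
    {"name": "passenger_count_at_least_one", "check": lambda row: row["passenger_count"] >= 1},
    {"name": "fare_is_not_negative", "check": lambda row: row["fare_amount"] >= 0},
]
new_batch = [
    {"passenger_count": 2, "fare_amount": 12.5},
    {"passenger_count": 0, "fare_amount": 7.0},
]

validation_result = run_checkpoint(suite, new_batch)
# The result is plain data, so it serializes to JSON that a renderer could turn into a report.
validation_json = json.dumps(validation_result, indent=2)
```

Keeping the result as plain, serializable data is what allows a separate rendering step to produce human-readable documentation from it, which is the role Data Docs play in the tutorial.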
Going forward
Your specific use case will no doubt differ from that of our tutorial. However, the four steps you'll need to perform in order to get Great Expectations working for you will be the same: set up, connect to data, create Expectations, and validate data. That's all there is to it! As long as you can perform these four steps, you can have Great Expectations working to validate data for you.
For those who only need to know the basics in order to make Great Expectations work, our documentation includes an Overview reference for each step.
For those who prefer working from examples, we have "How to" guides that show working examples of how to configure objects from Great Expectations for specific use cases. You can find these in the table of contents, under the category that corresponds to the step in which you would need them. If you want a broad overview of the options for customizing your deployment, we also provide a reference document on ways to customize your deployment.