How to create a Custom Table Expectation
TableExpectations are one of the most common types of Expectation. They are evaluated for an entire table, and answer a semantic question about the table itself. For example, expect_table_column_count_to_equal and expect_table_row_count_to_equal answer how many columns and rows are in your table.
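Once such an Expectation is available on a Validator, calling it looks like any other Expectation. A quick sketch, assuming you already have a validator from the Getting Started Tutorial:

validator.expect_table_column_count_to_equal(value=3)
validator.expect_table_row_count_to_equal(value=100)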
                        
This guide will walk you through the process of creating your own custom TableExpectation.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- Set up your dev environment
- Read the overview for creating Custom Expectations.
Steps
1. Choose a name for your Expectation
First, decide on a name for your own Expectation. By convention, TableExpectations always start with expect_table_. For more on Expectation naming conventions, see the Expectations section of the Code Style Guide.

Your Expectation will have two versions of the same name: a CamelCaseName and a snake_case_name. For example, this tutorial will use:

- ExpectTableColumnsToBeUnique
- expect_table_columns_to_be_unique
2. Copy and rename the template file
By convention, each Expectation is kept in its own python file, named with the snake_case version of the Expectation's name.
You can find the template file for a custom TableExpectation here. Download the file, place it in the appropriate directory, and rename it to the appropriate name.
cp table_expectation_template.py /SOME_DIRECTORY/expect_table_columns_to_be_unique.py
Where should I put my Expectation file?
During development, you don't actually need to put the file anywhere in particular. It's self-contained, and can be executed anywhere as long as great_expectations is installed.
But to use your new Expectation alongside the other components of Great Expectations, you'll need to make sure the file is in the right place. The right place depends on what you intend to use it for.
- If you're building a Custom Expectation for personal use, you'll need to put it in the great_expectations/plugins/expectations folder of your Great Expectations deployment, and import your Custom Expectation from that directory whenever it will be used (see the illustrative import sketch below). When you instantiate the corresponding DataContext, it will automatically make all plugins in the directory available for use.
- If you're building a Custom Expectation to contribute to the open source project, you'll need to put it in the repo for the Great Expectations library itself. Most likely, this will be within a package within contrib/: great_expectations/contrib/SOME_PACKAGE/SOME_PACKAGE/expectations/. To use these Expectations, you'll need to install the package.
See our guide on how to use a Custom Expectation for more!
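For the personal-use case, importing the Custom Expectation before validation might look something like this. The module path is illustrative and depends on how your plugins directory is laid out:

# Hypothetical import; assumes great_expectations/plugins/expectations/ is importable
# (e.g. it is on your PYTHONPATH or you run from within the plugins directory).
from expect_table_columns_to_be_unique import ExpectTableColumnsToBeUnique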
3. Generate a diagnostic checklist for your Expectation
Once you've copied and renamed the template file, you can execute it as follows.
python expect_table_columns_to_be_unique.py
The template file is set up so that this will run the Expectation's print_diagnostic_checklist() method. This will run a diagnostic script on your new Expectation, and return a checklist of steps to get it to full production readiness. This guide will walk you through the first four steps, the minimum for a functioning Custom Expectation and all that is required for contribution back to open source at an Experimental level.
Completeness checklist for ExpectTableToMeetSomeCriteria:
  ✔ Has a library_metadata object
    Has a docstring, including a one-line short description
    Has at least one positive and negative example case, and all test cases pass
    Has core logic and passes tests on at least one Execution Engine
...
When in doubt, the next step to implement is the first one that doesn't have a ✔ next to it. This guide covers the first four steps on the checklist.
4. Change the Expectation class name and add a docstring
By convention, your Metric class is defined first in a Custom Expectation. For now, we're going to skip to the Expectation class and begin laying the groundwork for the functionality of your Custom Expectation.
Let's start by updating your Expectation's name and docstring.
Replace the Expectation class name
class ExpectTableToMeetSomeCriteria(TableExpectation):
with your real Expectation class name, in upper camel case:
class ExpectTableColumnsToBeUnique(TableExpectation):
You can also go ahead and write a new one-line docstring, replacing
"""TODO: add a docstring here"""
with something like:
"""Expect table to contain columns with unique contents."""
You'll also need to change the class name at the bottom of the file, by replacing this line:
ExpectTableToMeetSomeCriteria().print_diagnostic_checklist()
with this one:
ExpectTableColumnsToBeUnique().print_diagnostic_checklist()
Later, you can go back and write a more thorough docstring.
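For instance, a fuller docstring might eventually look something like this (illustrative wording, written against the strict parameter this tutorial adds later):

"""Expect table to contain columns with unique contents.

Args:
    strict (bool): If True, expect every column's contents to be unique.
        If False, expect at least one column's contents to be unique.
"""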
At this point you can re-run your diagnostic checklist. You should see something like this:
$ python expect_table_columns_to_be_unique.py
Completeness checklist for ExpectTableColumnsToBeUnique:
  ✔ Has a library_metadata object
  ✔ Has a docstring, including a one-line short description
    Has at least one positive and negative example case, and all test cases pass
    Has core logic and passes tests on at least one Execution Engine
...
Congratulations! You're one step closer to implementing a Custom Expectation.
5. Add example cases
Next, we're going to search for examples = [] in your file, and replace it with at least two test examples. These examples serve a dual purpose:
- They provide test fixtures that Great Expectations can execute automatically via pytest.
- They help users understand the logic of your Expectation by providing tidy examples of paired input and output. If you contribute your Expectation to open source, these examples will appear in the Gallery.
Your examples will look something like this:
examples = [
        {
            "data": {
                "col1": [1, 2, 3, 4, 5],
                "col2": [2, 3, 4, 5, 6],
                "col3": [3, 4, 5, 6, 7],
            },
            "tests": [
                {
                    "title": "strict_positive_test",
                    "exact_match_out": False,
                    "include_in_gallery": True,
                    "in": {"strict": True},
                    "out": {"success": True},
                }
            ],
        },
        {
            "data": {
                "col1": [1, 2, 3, 4, 5],
                "col2": [1, 2, 3, 4, 5],
                "col3": [3, 4, 5, 6, 7],
            },
            "tests": [
                {
                    "title": "loose_positive_test",
                    "exact_match_out": False,
                    "include_in_gallery": True,
                    "in": {"strict": False},
                    "out": {"success": True},
                },
                {
                    "title": "strict_negative_test",
                    "exact_match_out": False,
                    "include_in_gallery": True,
                    "in": {"strict": True},
                    "out": {"success": False},
                },
            ],
        },
    ]
Here's a quick overview of how to create test cases to populate examples. The overall structure is a list of dictionaries. Each dictionary has two keys:

- data: defines the input data of the example as a table/data frame. In this example the table has three columns named col1, col2, and col3, each with five rows. (Note: if you define multiple columns, make sure that they have the same number of rows.)
- tests: a list of test cases to validate against the data frame defined in the corresponding data.
  - title should be a descriptive name for the test case. Make sure to have no spaces.
  - include_in_gallery: This must be set to True if you want this test case to be visible in the Gallery as an example.
  - in contains exactly the parameters that you want to pass in to the Expectation. "in": {"strict": True} in the example above is equivalent to expect_table_columns_to_be_unique(strict=True).
  - out is based on the Validation Result returned when executing the Expectation.
  - exact_match_out: if you set exact_match_out=False, then you don't need to include all the elements of the Validation Result object - only the ones that are important to test.
test_backends?
test_backends is an optional key you can pass to offer more granular control over which backends and SQL dialects your tests are run against.
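For example, restricting a set of examples to pandas and SQLite might look roughly like the following, with the key sitting alongside "data" and "tests" in an examples dictionary. Treat the exact field names as an assumption and confirm them against the current template or documentation:

"test_backends": [
    {"backend": "pandas", "dialects": None},
    {"backend": "sqlalchemy", "dialects": ["sqlite"]},
],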
If you run your Expectation file again, you won't see any new checkmarks, as the logic for your Custom Expectation hasn't been implemented yet. However, you should see that the tests you've written are now being caught and reported in your checklist:
$ python expect_table_columns_to_be_unique.py
Completeness checklist for ExpectTableColumnsToBeUnique:
  ✔ Has a library_metadata object
  ✔ Has a docstring, including a one-line short description
...
    Has core logic that passes tests for all applicable Execution Engines and SQL dialects
          Only 0 / 3 tests for pandas are passing
          Failing: strict_positive_test, loose_positive_test, strict_negative_test
...
For more information on tests and example cases, see our guide on creating example cases for a Custom Expectation.
6. Implement your Metric and connect it to your Expectation
This is the stage where you implement the actual business logic for your Expectation. To do so, you'll need to implement a function within a Metric class, and link it to your Expectation. By the time your Expectation is complete, your Metric will have functions for all three Execution Engines (Pandas, Spark, and SQLAlchemy) supported by Great Expectations. For now, we're only going to define one.
Metrics answer questions about your data posed by your Expectation, and allow your Expectation to judge whether your data meets your expectations.

Your Metric function will have the @metric_value decorator, with the appropriate engine. Metric functions can be as complex as you like, but they're often very short. For example, here's the definition for a Metric function to find the unique columns of a table with the PandasExecutionEngine.
                        
@metric_value(engine=PandasExecutionEngine)
    def _pandas(
        cls,
        execution_engine,
        metric_domain_kwargs,
        metric_value_kwargs,
        metrics,
        runtime_configuration,
    ):
        df, _, _ = execution_engine.get_compute_domain(
            metric_domain_kwargs, domain_type=MetricDomainTypes.TABLE
        )
        unique_columns = set(df.T.drop_duplicates().T.columns)
        return unique_columns
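To see what the transpose-and-drop-duplicates trick is doing, here is a small standalone pandas sketch (not part of the template file):

import pandas as pd

df = pd.DataFrame(
    {
        "col1": [1, 2, 3],
        "col2": [1, 2, 3],  # identical contents to col1
        "col3": [7, 8, 9],
    }
)

# Transposing turns each column into a row, so drop_duplicates() removes any column
# whose contents exactly duplicate an earlier column (the first copy is kept).
unique_columns = set(df.T.drop_duplicates().T.columns)
print(unique_columns)  # {'col1', 'col3'} -- col2 was dropped as a duplicate of col1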
The @metric_value decorator allows us to explicitly structure queries and directly access our compute domain. While this can result in extra roundtrips to your database in some situations, it allows for advanced functionality and customization of your Custom Expectations.
This is all that you need to define for now. In the next step, we will implement the method to validate the result of this Metric.
Other parameters
Expectation Success Keys - A tuple consisting of values that must / could be provided by the user and defines how the Expectation evaluates success.
Expectation Default Kwarg Values (Optional) - Default values for success keys and the defined domain, among other values.
Metric Condition Value Keys (Optional) - Contains any additional arguments passed as parameters to compute the Metric.
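For the Expectation in this tutorial, the first two of those might look something like the following sketch on the Expectation class (the default shown for strict is an assumption; pick whatever default makes sense for your logic):

# Sketch: class attributes on ExpectTableColumnsToBeUnique
success_keys = ("strict",)
default_kwarg_values = {"strict": True}  # assumed default; adjust as needed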
Next, choose a Metric Identifier for your Metric. By convention, Metric Identifiers for Table Expectations start with table.. The remainder of the Metric Identifier simply describes what the Metric computes, in snake case. For this example, we'll use table.columns.unique.
You'll need to substitute this metric into two places in the code. First, in the Metric class, replace
metric_name = "METRIC NAME GOES HERE"
with
metric_name = "table.columns.unique"
Second, in the Expectation class, replace
metric_dependencies = ("METRIC NAME GOES HERE",)
with
metric_dependencies = ("table.columns.unique", "table.columns")
It's essential to make sure to use matching Metric Identifier strings across your Metric class and Expectation class. This is how the Expectation knows which Metric to use for its internal logic.
Finally, rename the Metric class name itself, using the camel case version of the Metric Identifier, minus any periods.
For example, replace:
class TableMeetsSomeCriteria(TableMetricProvider):
with
class TableColumnsUnique(TableMetricProvider):
7. Validate
In this step, we simply need to validate that the results of our Metrics meet our Expectation.
The validate method is implemented as _validate(...):
def _validate(
        self,
        configuration: ExpectationConfiguration,
        metrics: Dict,
        runtime_configuration: dict = None,
        execution_engine: ExecutionEngine = None,
    ):
This method takes a dictionary named metrics, which contains all Metrics requested by your Metric dependencies, and performs a simple validation against your success keys (i.e. important thresholds) in order to return a dictionary indicating whether the Expectation has evaluated successfully or not.

To do so, we'll be accessing our success keys, as well as the result of our previously-calculated Metrics. For example, here is the definition of a _validate(...) method to validate the results of our table.columns.unique Metric against our success keys:
                        
def _validate(
        self,
        configuration: ExpectationConfiguration,
        metrics: Dict,
        runtime_configuration: dict = None,
        execution_engine: ExecutionEngine = None,
    ):
        unique_columns = metrics.get("table.columns.unique")
        table_columns = metrics.get("table.columns")
        strict = configuration.kwargs.get("strict")
        duplicate_columns = unique_columns.symmetric_difference(table_columns)
        if strict is True:
            success = len(duplicate_columns) == 0
        else:
            success = len(duplicate_columns) < len(table_columns)
        return {
            "success": success,
            "result": {"observed_value": {"duplicate_columns": duplicate_columns}},
        }
Running your diagnostic checklist at this point should return something like this:
$ python expect_table_columns_to_be_unique.py
Completeness checklist for ExpectTableColumnsToBeUnique:
  ✔ Has a library_metadata object
  ✔ Has a docstring, including a one-line short description
  ✔ Has at least one positive and negative example case, and all test cases pass
  ✔ Has core logic and passes tests on at least one Execution Engine
...
Congratulations! 🎉 You've just built your first Custom Expectation! 🎉
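You can now exercise the new Expectation like any built-in one. Here's a hedged sketch, assuming you already have a validator and the file is importable (see the guide on using a Custom Expectation for the full workflow):

# Importing the class registers the Expectation so it becomes available on Validators.
from expect_table_columns_to_be_unique import ExpectTableColumnsToBeUnique

result = validator.expect_table_columns_to_be_unique(strict=True)
print(result.success)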
                          
8. Contribution (Optional)
This guide will leave you with a Custom Expectation sufficient for contribution back to Great Expectations at an Experimental level.
If you plan to contribute your Expectation to the public open source project, you should update the library_metadata object before submitting your Pull Request. For example:
library_metadata = {
        "tags": [],  # Tags for this Expectation in the Gallery
        "contributors": [  # Github handles for all contributors to this Expectation.
            "@your_name_here",  # Don't forget to add your github handle here!
        ],
    }
would become
library_metadata = {
        "tags": ["uniqueness"],
        "contributors": ["@joegargery"],
    }
This is particularly important because we want to make sure that you get credit for all your hard work!
For more information on our code standards and contribution, see our guide on Levels of Maturity for Expectations.
To view the full script used in this page, see it on GitHub: