How to create example cases for a Custom Expectation
This guide will help you add example cases to document and test the behavior of your Expectation (a verifiable assertion about data).
Prerequisites: This how-to guide assumes you have begun implementing your own Custom Expectation (for example, by following one of our guides on creating Custom Expectations).
Example cases in Great Expectations serve a dual purpose:
- First, they help the users of the Expectation understand its logic by providing examples of input data that the Expectation will evaluate.
- Second, they provide test cases that the Great Expectations testing framework can execute automatically.
If you decide to contribute your Expectation, its entry in the Expectations Gallery will render these examples.
We will explain the structure of these tests using the Custom Expectation (an extension of the `Expectation` class, developed outside of the Great Expectations library) implemented in our guide on how to create Custom Column Aggregate Expectations.
Steps
1. Decide which tests you want to implement
Expectations can have a wide variety of possible applications. We want to create tests that demonstrate (and verify) the capabilities and limitations of our Custom Expectation.
What kind of tests can I create?
{"success": False}
, or as
granular as:
```python
{
    "success": True,
    "expectation_config": {
        "expectation_type": "expect_column_value_z_scores_to_be_less_than",
        "kwargs": {
            "column": "a",
            "mostly": 0.9,
            "threshold": 4,
            "double_sided": True,
        },
        "meta": {},
    },
    "result": {
        "element_count": 6,
        "unexpected_count": 0,
        "unexpected_percent": 0.0,
        "partial_unexpected_list": [],
        "missing_count": 0,
        "missing_percent": 0.0,
        "unexpected_percent_total": 0.0,
        "unexpected_percent_nonmissing": 0.0,
    },
    "exception_info": {
        "raised_exception": False,
        "exception_traceback": None,
        "exception_message": None,
    },
}
```
At a minimum, we want to create tests that show what our Custom Expectation will and will not do.
These basic positive and negative example cases are the minimum amount of test coverage required for a Custom Expectation to be accepted into the Great Expectations codebase at an Experimental level.
To begin with, let's implement those two basic tests: one positive example case, and one negative example case.
2. Defining our data
Search for `examples = []` in the template file you are modifying for your new Custom Expectation. We're going to populate `examples` with a list of example cases.
What is an example case?
An example case contains the following keys (a minimal sketch follows the list):

- `data`: defines the input data of the example as a table/dataframe.
- `tests`: a list of test cases that use the data defined above as input to validate against.
- `title`: a descriptive name for the test case. Make sure it contains no spaces.
- `include_in_gallery`: set this to `True` if you want this test case to be visible in the Gallery as an example (true for most test cases).
- `in`: contains exactly the parameters that you want to pass in to the Expectation. `"in": {"column": "x", "min_value": 4}` would be equivalent to `expect_column_max_to_be_between_custom(column="x", min_value=4)`.
- `out`: indicates the results from the `ValidationResult` that the test requires in order to pass.
- `exact_match_out`: if you set `exact_match_out=False`, you don't need to include all the elements of the result object; only the ones that are important to test, such as `{"success": True}`.
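To make these keys concrete, here is a minimal sketch of a single example case; the parameters passed via `in` are illustrative, and the full `examples` list used in this guide appears in Step 3:

```python
# A minimal example case sketch; parameter values are illustrative.
{
    "data": {"x": [1, 2, 3, 4, 5]},
    "tests": [
        {
            "title": "basic_positive_test",  # no spaces in the title
            "include_in_gallery": True,      # show this case in the Gallery
            "exact_match_out": False,        # only compare the keys given in "out"
            "in": {"column": "x", "min_value": 4},
            "out": {"success": True},
        }
    ],
}
```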
In our example, `data` will have two columns, `"x"` and `"y"`, each with five rows. If you define multiple columns, make sure that they have the same number of rows. When possible, include test data and tests that include null values (`None` in the Python test definition):

```python
"data": {"x": [1, 2, 3, 4, 5], "y": [0, -1, -2, 4, None]},
```
When you define data in your examples, Great Expectations will mostly guess the type of the columns. Sometimes you need to specify the precise type of the columns for each backend; to do so, use the `schemas` attribute (on the same level as `data` and `tests` in the dictionary):

```python
"schemas": {
    "spark": {
        "x": "IntegerType",
    },
    "sqlite": {
        "x": "INTEGER",
    },
},
```
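For context, a sketch of where `schemas` sits inside an example case (the data and types here are illustrative):

```python
# "schemas" sits alongside "data" and "tests"; backends not listed
# (e.g. pandas) fall back to type inference.
examples = [
    {
        "data": {"x": [1, 2, 3, 4, 5]},
        "schemas": {
            "spark": {"x": "IntegerType"},
            "sqlite": {"x": "INTEGER"},
        },
        "tests": [
            # ... test cases as described above ...
        ],
    }
]
```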
While Pandas is fairly flexible in typing, Spark and many SQL dialects are much more strict. You may find you wish to use data that is incompatible with a given backend, or to write different individual tests for different backends. To do this, you can use the `only_for` attribute, which accepts a list containing `pandas`, `spark`, `sqlite`, a SQL dialect, or a combination of any of the above:

```python
"only_for": ["spark", "pandas"]
```
Passing this attribute on the same level as `data`, `tests`, and `schemas` will tell Great Expectations to instantiate the data specified in that example only for the given backends, ensuring you don't encounter any backend-related data errors before your Custom Expectation can even be tested.

Passing this attribute within a test (at the same level as `title`, `in`, `out`, etc.) will execute that individual test only for the specified backends. Both placements are sketched below.
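For illustration, a hypothetical sketch of `only_for` at both levels (all values here are illustrative):

```python
# "only_for" at the example level limits which backends build the data;
# "only_for" inside a test limits which backends run that test.
examples = [
    {
        "data": {"x": [1, 2, 3, 4, 5]},
        "only_for": ["spark", "pandas"],  # example-level
        "tests": [
            {
                "title": "pandas_only_test",
                "exact_match_out": False,
                "in": {"column": "x", "min_value": 4},
                "out": {"success": True},
                "only_for": ["pandas"],  # test-level
            }
        ],
    }
]
```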
3. Defining our tests
In our example, `tests` will be a list containing dictionaries defining each test.

You will need to:

- Title your tests (`title`)
- Define the input for your tests (`in`)
- Decide how precisely you want to test the output of your tests (`exact_match_out`)
- Define the expected output for your tests (`out`)

If you are interested in contributing your Custom Expectation back to Great Expectations, you will also need to decide whether you want these tests publicly displayed to demonstrate the functionality of your Custom Expectation (`include_in_gallery`).
```python
examples = [
    {
        "data": {"x": [1, 2, 3, 4, 5], "y": [0, -1, -2, 4, None]},
        "tests": [
            {
                "title": "basic_positive_test",
                "exact_match_out": False,
                "include_in_gallery": True,
                "in": {
                    "column": "x",
                    "min_value": 4,
                    "strict_min": True,
                    "max_value": 5,
                    "strict_max": False,
                },
                "out": {"success": True},
            },
            {
                "title": "basic_negative_test",
                "exact_match_out": False,
                "include_in_gallery": True,
                "in": {
                    "column": "y",
                    "min_value": -2,
                    "strict_min": False,
                    "max_value": 3,
                    "strict_max": True,
                },
                "out": {"success": False},
            },
        ],
        "test_backends": [
            {
                "backend": "pandas",
                "dialects": None,
            },
            {
                "backend": "sqlalchemy",
                "dialects": ["sqlite", "postgresql"],
            },
            {
                "backend": "spark",
                "dialects": None,
            },
        ],
    }
]
```
You may have noticed that specifying `test_backends` isn't required to successfully test your Custom Expectation. If it is not specified, Great Expectations will attempt to determine the implemented backends automatically, but will only run SQLAlchemy tests against sqlite.
Can I test for errors?
Yes! To verify that your Custom Expectation raises the error you expect, leave the `out` key empty and include an `error` key defining a `traceback_substring`. For example:

```python
"out": {},
"error": {
    "traceback_substring": "TypeError: Column values, min_value, and max_value must either be None or of the same type."
},
```
4. Verifying our tests
If you now run your file, `print_diagnostic_checklist()` will attempt to execute these example cases.
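If your file doesn't already end with a diagnostics call, here is a minimal sketch of the pattern used by the Custom Expectation templates (the class name comes from the Custom Column Aggregate Expectation guide and is illustrative):

```python
# Run the diagnostic checklist, including the example cases, when the
# file is executed directly. The class name is illustrative.
if __name__ == "__main__":
    ExpectColumnMaxToBeBetweenCustom().print_diagnostic_checklist()
```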
If the tests are correctly defined, and the rest of the logic in your Custom Expectation is already complete, you will see the following in your Diagnostic Checklist:
✔ Has at least one positive and negative example case, and all test cases pass
Congratulations! 🎉 You've successfully created example cases & tests for a Custom Expectation! 🎉
5. Contribution (Optional)
This guide will leave you with test coverage sufficient for contribution back to Great Expectations at an Experimental level.
If you're interested in having your contribution accepted at a Beta level, these tests will need to pass for all supported backends (Pandas, Spark, & SQLAlchemy).
For full acceptance into the Great Expectations codebase at a Production level, we require a more robust test suite. If you believe your Custom Expectation is otherwise ready for contribution at a Production level, please submit a Pull Request, and we will work with you to ensure adequate testing.
For more information on our code standards and contribution, see our guide on Levels of Maturity for Expectations.
To view the full script used in this page, see it on GitHub: