How to add input validation and type checking for a Custom Expectation
Prerequisites: This how-to guide assumes you have:
Expectations will typically be configured using input parameters. These parameters are required to provide your Custom Expectation with the context it needs to Validate your data. Ensuring that these requirements are fulfilled is the purpose of type checking and validating your input parameters.
For example, we might expect the fraction of null values to be mostly=.05, in which case any value above 1 would be an impossible fraction (it would represent more than the whole) and should throw an error. Another example would be if we want to indicate that the mean of a row adheres to a minimum value bound, such as min_value=5. In this case, attempting to pass in a non-numerical value should clearly throw an error!
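The two checks described above can be sketched in plain Python. This is a minimal illustration rather than Great Expectations code: the helper names check_mostly and check_min_value are hypothetical and exist only for this example.

```python
from numbers import Real


def _is_real_number(value):
    # bool is a subclass of int, so it is excluded explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def check_mostly(mostly):
    """Return `mostly` if it is a fraction in [0, 1]; raise otherwise."""
    if not _is_real_number(mostly):
        raise TypeError(f"mostly must be a number, got {type(mostly).__name__}")
    if not 0 <= mostly <= 1:
        raise ValueError(f"mostly must be between 0 and 1, got {mostly}")
    return mostly


def check_min_value(min_value):
    """Return `min_value` if it is numeric; raise otherwise."""
    if not _is_real_number(min_value):
        raise TypeError(
            f"min_value must be a number, got {type(min_value).__name__}"
        )
    return min_value


check_mostly(0.05)   # accepted
check_min_value(5)   # accepted
```

Calling check_mostly(1.5) raises a ValueError, and check_min_value("five") raises a TypeError.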
This guide will walk you through the process of adding validation and Type Checking to the input parameters of the Custom Expectation built in the guide for how to create a Custom Column Aggregate Expectation. When you have completed this guide, you will have implemented a method to validate that the input parameters provided to this Custom Expectation satisfy the requirements necessary for them to be used as intended by the Custom Expectation's code.
Steps
1. Deciding what to validate
As a general rule, we want to validate any of our input parameters and success keys that are explicitly used by our Expectation class. In the case of our example Expectation expect_column_max_to_be_between_custom, we've defined four parameters to validate:
- min_value: An integer or float defining the lowest acceptable bound for our column max
- max_value: An integer or float defining the highest acceptable bound for our column max
- strict_min: A boolean value defining whether our column max is (strict_min=False) or is not (strict_min=True) allowed to equal the min_value
- strict_max: A boolean value defining whether our column max is (strict_max=False) or is not (strict_max=True) allowed to equal the max_value
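To make the meaning of the four parameters concrete, here is a minimal sketch of how they could combine when judging a column max. The function column_max_is_within_bounds is hypothetical. This guide does not show the Expectation's own evaluation logic, so treat this only as an illustration of the parameter semantics listed above.

```python
def column_max_is_within_bounds(
    column_max, min_value=None, max_value=None, strict_min=False, strict_max=False
):
    """Return True if column_max satisfies the configured bounds."""
    if min_value is not None:
        # strict_min=True means column_max may not equal min_value.
        above_min = column_max > min_value if strict_min else column_max >= min_value
        if not above_min:
            return False
    if max_value is not None:
        # strict_max=True means column_max may not equal max_value.
        below_max = column_max < max_value if strict_max else column_max <= max_value
        if not below_max:
            return False
    return True


print(column_max_is_within_bounds(5, min_value=5))                   # True
print(column_max_is_within_bounds(5, min_value=5, strict_min=True))  # False
```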
What don't we need to validate?
We don't need to validate that the column parameter has been set. Great Expectations implicitly handles the validation of certain parameters universal to each class of Expectation, so you don't have to!
2. Defining the validation method
We define the validate_configuration(...) method of our Custom Expectation class to ensure that the input parameters constitute a valid configuration and do not contain illogical or incorrect values.
For example, if min_value is greater than max_value, max_value=True, or strict_min=Joe, we want to throw an exception. To do this, we're going to write a series of assert statements to catch invalid values for our parameters.
To begin with, we want to create our validate_configuration(...) method and ensure that a configuration is set:
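# Assumes these imports at the top of the module (paths may differ
# between Great Expectations versions):
#   from typing import Optional
#   from great_expectations.core.expectation_configuration import ExpectationConfiguration
#   from great_expectations.exceptions import InvalidExpectationConfigurationError
def validate_configuration(
    self, configuration: Optional[ExpectationConfiguration] = None
) -> None:
    """
    Validates that a configuration has been set, and sets a configuration if it has yet to be set. Ensures that
    necessary configuration arguments have been provided for the validation of the expectation.

    Args:
        configuration (Optional[ExpectationConfiguration]): \
            An optional Expectation Configuration entry that will be used to configure the expectation
    Returns:
        None. Raises InvalidExpectationConfigurationError if the config is not validated successfully
    """
    # Setting up a configuration: run the base-class checks, then fall
    # back to the instance's own configuration if none was passed in.
    super().validate_configuration(configuration)
    configuration = configuration or self.configuration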
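The fallback in the last line of the method (use the configuration passed in, otherwise the one stored on the instance) can be tried in isolation. The sketch below uses hypothetical stand-ins, SketchConfiguration and SketchExpectation, so that it runs without Great Expectations installed. It does not reproduce what the real base class does in super().validate_configuration(...).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchConfiguration:
    """Hypothetical stand-in for ExpectationConfiguration."""

    kwargs: dict = field(default_factory=dict)


class SketchExpectation:
    """Hypothetical stand-in for a Custom Expectation class."""

    def __init__(self, configuration: Optional[SketchConfiguration] = None):
        self.configuration = configuration

    def resolve_configuration(
        self, configuration: Optional[SketchConfiguration] = None
    ) -> SketchConfiguration:
        # Prefer the argument; fall back to the stored configuration.
        configuration = configuration or self.configuration
        if configuration is None:
            raise ValueError("no configuration was passed in or stored")
        return configuration


stored = SketchConfiguration({"min_value": 1})
expectation = SketchExpectation(stored)
print(expectation.resolve_configuration() is stored)  # True
```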
Next, we're going to implement the logic for validating the four parameters we identified above.
3. Accessing parameters and writing assertions
First we need to access the parameters to be evaluated:
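    # .get() returns None for a parameter that was not provided,
    # instead of raising a KeyError.
    min_value = configuration.kwargs.get("min_value")
    max_value = configuration.kwargs.get("max_value")
    strict_min = configuration.kwargs.get("strict_min")
    strict_max = configuration.kwargs.get("strict_max")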
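How a missing parameter behaves depends on how it is read. Subscript access raises a KeyError for an absent key, while .get() returns None, which is what the "is None" checks later in this guide rely on. The sketch below uses a plain dict as a stand-in for configuration.kwargs. Whether the real kwargs always contain every key (for example, filled in from default values) is an assumption you should check for your own Expectation.

```python
# A plain dict standing in for configuration.kwargs;
# max_value and strict_max were not provided.
kwargs = {"min_value": 0, "strict_min": True}

min_value = kwargs.get("min_value")    # 0
max_value = kwargs.get("max_value")    # None, because the key is absent
strict_max = kwargs.get("strict_max")  # None, because the key is absent

try:
    kwargs["max_value"]  # subscript access on a missing key
except KeyError as err:
    missing_key_error = err
    print(f"KeyError for missing key: {err}")
```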
Now we can begin writing the assertions to validate these parameters.
We're going to ensure that at least one of min_value or max_value is set:
    try:
        assert (
            min_value is not None or max_value is not None
        ), "min_value and max_value cannot both be None"
Check that min_value and max_value are of the correct type:
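        # bool is a subclass of int, so it is excluded explicitly;
        # otherwise max_value=True would pass the numeric check.
        assert min_value is None or (
            isinstance(min_value, (float, int)) and not isinstance(min_value, bool)
        ), "Provided min threshold must be a number"
        assert max_value is None or (
            isinstance(max_value, (float, int)) and not isinstance(max_value, bool)
        ), "Provided max threshold must be a number"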
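One subtlety with numeric type checks: in Python, bool is a subclass of int, so a plain isinstance(value, (float, int)) test accepts True and False. Since max_value=True is meant to be rejected, booleans need to be excluded explicitly. A minimal sketch, where is_number is a hypothetical helper name and not part of Great Expectations:

```python
def is_number(value):
    """Return True for int or float values, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


print(isinstance(True, (float, int)))  # True: bool passes the plain check
print(is_number(True))                 # False
print(is_number(5), is_number(2.5))    # True True
print(is_number("5"))                  # False
```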
Verify that, if both min_value and max_value are set, min_value does not exceed max_value:
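        # Compare against None rather than relying on truthiness,
        # so that a bound of 0 is still checked.
        if min_value is not None and max_value is not None:
            assert (
                min_value <= max_value
            ), "Provided min threshold must be less than or equal to max threshold"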
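The ordering check should run whenever both bounds are set, including when one of them is 0. A truthiness test such as "if min_value and max_value" treats 0 and 0.0 as unset and silently skips the comparison. The sketch below contrasts the two approaches; both function names are hypothetical.

```python
def bounds_ordered_truthy(min_value, max_value):
    # Truthiness check: 0 and 0.0 are falsy, so the comparison is skipped.
    if min_value and max_value:
        return min_value <= max_value
    return True


def bounds_ordered(min_value, max_value):
    # Explicit None check: the comparison runs whenever both bounds are set.
    if min_value is not None and max_value is not None:
        return min_value <= max_value
    return True


print(bounds_ordered_truthy(0, -5))  # True: the invalid pair goes unnoticed
print(bounds_ordered(0, -5))         # False: the invalid pair is caught
```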
And assert that strict_min and strict_max, if provided, are of the correct type:
        assert strict_min is None or isinstance(
            strict_min, bool
        ), "strict_min must be a boolean value"
        assert strict_max is None or isinstance(
            strict_max, bool
        ), "strict_max must be a boolean value"
If any of these fail, we raise an exception:
    except AssertionError as e:
        raise InvalidExpectationConfigurationError(str(e))
Putting this all together, our validate_configuration(...) method should verify that all necessary inputs have been provided, that all inputs are of the correct types, and that they have a correct relationship to each other. If any of these conditions aren't met, it should raise an exception.
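For experimenting outside of Great Expectations, the same sequence of checks can be assembled into a standalone function. This is a sketch under stated assumptions, not the library's implementation. SketchConfigurationError is a hypothetical stand-in for InvalidExpectationConfigurationError, and kwargs is a plain dict standing in for configuration.kwargs.

```python
class SketchConfigurationError(Exception):
    """Hypothetical stand-in for InvalidExpectationConfigurationError."""


def _is_number(value):
    # bool is a subclass of int, so it is excluded explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_kwargs(kwargs):
    """Check the four parameters; raise SketchConfigurationError on failure."""
    min_value = kwargs.get("min_value")
    max_value = kwargs.get("max_value")
    strict_min = kwargs.get("strict_min")
    strict_max = kwargs.get("strict_max")

    try:
        assert (
            min_value is not None or max_value is not None
        ), "min_value and max_value cannot both be None"
        assert min_value is None or _is_number(
            min_value
        ), "Provided min threshold must be a number"
        assert max_value is None or _is_number(
            max_value
        ), "Provided max threshold must be a number"
        if min_value is not None and max_value is not None:
            assert (
                min_value <= max_value
            ), "Provided min threshold must be less than or equal to max threshold"
        assert strict_min is None or isinstance(
            strict_min, bool
        ), "strict_min must be a boolean value"
        assert strict_max is None or isinstance(
            strict_max, bool
        ), "strict_max must be a boolean value"
    except AssertionError as e:
        # Convert the failed assertion into a configuration error.
        raise SketchConfigurationError(str(e)) from e


validate_kwargs({"min_value": 0, "max_value": 10, "strict_min": True})  # passes

try:
    validate_kwargs({"min_value": 5, "max_value": 1})
except SketchConfigurationError as err:
    print(f"Rejected: {err}")
```

Note that Python skips assert statements when run with the -O flag. If the checks must hold in optimized mode as well, explicit if/raise statements are an alternative.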
4. Verifying our method
If you now run your file, print_diagnostic_checklist() will attempt to execute the validate_configuration(...) method using the input provided in your Example Cases.
If your input is successfully validated, and the rest of the logic in your Custom Expectation is already complete, you will see the following in your Diagnostic Checklist:
✔ Has basic input validation and type checking
✔ Custom 'assert' statements in validate_configuration
Congratulations!
🎉 You've successfully added input validation & type checking to a Custom Expectation! 🎉
5. Contribution (Optional)
The method implemented in this guide is an optional feature for Experimental Expectations, and a requirement for contribution back to Great Expectations at Beta and Production levels.
If you would like to contribute your Custom Expectation to the Great Expectations codebase, please submit a Pull Request.
For more information on our code standards and contribution, see our guide on Levels of Maturity for Expectations.
To view the full script used in this page, see it on GitHub: