Add input validation and type checking for a Custom Expectation
Prerequisites
Expectations will typically be configured using input parameters. These parameters are required to provide your Custom Expectation with the context it needs to Validate your data. Ensuring that these requirements are fulfilled is the purpose of type checking and validating your input parameters.
For example, we might set mostly=.05 to express the
expected fraction of null values. Because mostly is a
fraction of a single whole, any value above 1 is
impossible and should throw an error. Another example
would be if we want to indicate that the mean of a row
adheres to a minimum value bound, such as min_value=5.
In this case, attempting to pass in a non-numerical
value should clearly throw an error!
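As a rough illustration of the first case, the check could look like the sketch below. The helper name `check_mostly` and its error messages are hypothetical and not part of Great Expectations; this only shows the kind of range and type check being described.

```python
def check_mostly(mostly):
    """Hypothetical helper: reject values that cannot be a fraction of a whole."""
    # bool is excluded explicitly because True/False are also ints in Python
    if isinstance(mostly, bool) or not isinstance(mostly, (int, float)):
        raise TypeError("mostly must be a number")
    if not 0 <= mostly <= 1:
        raise ValueError("mostly must be between 0 and 1")
    return mostly


# Usage: a valid fraction passes through, an impossible one raises
valid = check_mostly(0.05)
try:
    check_mostly(1.5)
    message = None
except ValueError as e:
    message = str(e)
```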
This guide will walk you through the process of adding validation and type checking to the input parameters of the Custom Expectation built in the guide for how to create a Custom Column Aggregate Expectation. When you have completed this guide, you will have implemented a method that validates whether the input parameters provided to this Custom Expectation satisfy the requirements for them to be used as intended by the Custom Expectation's code.
Decide what to validate
As a general rule, we want to validate any of our
input parameters and success keys that are explicitly
used by our Expectation class. In the case of our
example Expectation
expect_column_max_to_be_between_custom,
we've defined four parameters to validate:
- min_value: An integer or float defining the lowest acceptable bound for our column max
- max_value: An integer or float defining the highest acceptable bound for our column max
- strict_min: A boolean value defining whether our column max is (strict_min=False) or is not (strict_min=True) allowed to equal the min_value
- strict_max: A boolean value defining whether our column max is (strict_max=False) or is not (strict_max=True) allowed to equal the max_value
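To make the meaning of the strict flags concrete, here is a minimal sketch of how the four parameters could combine into a single bounds check. The function `max_is_within_bounds` is a hypothetical helper written for this illustration; it is not the Expectation's actual implementation.

```python
from typing import Optional, Union

Number = Union[int, float]


def max_is_within_bounds(
    column_max: Number,
    min_value: Optional[Number] = None,
    max_value: Optional[Number] = None,
    strict_min: bool = False,
    strict_max: bool = False,
) -> bool:
    """Hypothetical helper: True if column_max satisfies the configured bounds."""
    if min_value is not None:
        # strict_min=True means column_max may not equal min_value
        if column_max < min_value or (strict_min and column_max == min_value):
            return False
    if max_value is not None:
        # strict_max=True means column_max may not equal max_value
        if column_max > max_value or (strict_max and column_max == max_value):
            return False
    return True
```

With `min_value=5`, a column max of 5 is accepted when `strict_min=False` and rejected when `strict_min=True`.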
What don't we need to validate?
We don't need to validate that the column parameter
has been set. Great Expectations implicitly handles
the validation of certain parameters universal to each
class of Expectation, so you don't have to!
Define the Validation method
We define the
validate_configuration(...) method of our
Custom Expectation class to ensure that the input
parameters constitute a valid configuration and
do not contain illogical or incorrect values.
For example, if min_value is greater than
max_value, max_value=True,
or strict_min=Joe, we want to throw an
exception. To do this, we're going to write a
series of assert statements to catch
invalid values for our parameters.
To begin with, we want to create our
validate_configuration(...) method and
ensure that a configuration is set:
    def validate_configuration(
        self, configuration: Optional[ExpectationConfiguration] = None
    ) -> None:
        """
        Validates that a configuration has been set, and sets a configuration if it has yet to be set. Ensures that
        necessary configuration arguments have been provided for the validation of the expectation.
        Args:
            configuration (OPTIONAL[ExpectationConfiguration]): \
                An optional Expectation Configuration entry that will be used to configure the expectation
        Returns:
            None. Raises InvalidExpectationConfigurationError if the config is not validated successfully
        """
        # Setting up a configuration
        super().validate_configuration(configuration)
        configuration = configuration or self.configuration
Next, we're going to implement the logic for validating the four parameters we identified above.
Access parameters and write assertions
First we need to access the parameters to be evaluated:
        min_value = configuration.kwargs["min_value"]
        max_value = configuration.kwargs["max_value"]
        strict_min = configuration.kwargs["strict_min"]
        strict_max = configuration.kwargs["strict_max"]
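Bracket lookups raise a KeyError when a key is absent. Whether an omitted parameter still appears in configuration.kwargs may depend on how your Expectation defines its defaults, so the following is only a sketch under the assumption that kwargs behaves like a plain dict and that some keys can be missing. The dict below is an illustrative stand-in, not a real configuration.

```python
# A plain dict standing in for configuration.kwargs (illustrative values)
kwargs = {"column": "x", "min_value": 1}

# dict.get returns None for an absent key instead of raising KeyError
min_value = kwargs.get("min_value")
max_value = kwargs.get("max_value")
strict_min = kwargs.get("strict_min")
strict_max = kwargs.get("strict_max")

# The bracket form fails for the same missing key
try:
    kwargs["max_value"]
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
```

Using `.get(...)` pairs naturally with assertions of the form `value is None or ...`, since a missing parameter simply becomes None.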
Now we can begin writing the assertions to validate these parameters.
We're going to ensure that at least one of
min_value or max_value is
set:
        try:
            assert (
                min_value is not None or max_value is not None
            ), "min_value and max_value cannot both be none"
Check that min_value and
max_value are of the correct type:
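            # bool is a subclass of int, so exclude it explicitly
            assert min_value is None or (
                isinstance(min_value, (float, int)) and not isinstance(min_value, bool)
            ), "Provided min threshold must be a number"
            assert max_value is None or (
                isinstance(max_value, (float, int)) and not isinstance(max_value, bool)
            ), "Provided max threshold must be a number"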
Verify that, if both min_value and
max_value are set,
min_value does not exceed
max_value:
            # Compare against None so that a bound of 0 is still checked
            if min_value is not None and max_value is not None:
                assert (
                    min_value <= max_value
                ), "Provided min threshold must be less than or equal to max threshold"
And assert that strict_min and
strict_max, if provided, are of the
correct type:
            assert strict_min is None or isinstance(
                strict_min, bool
            ), "strict_min must be a boolean value"
            assert strict_max is None or isinstance(
                strict_max, bool
            ), "strict_max must be a boolean value"
If any of these fail, we raise an exception:
        except AssertionError as e:
            raise InvalidExpectationConfigurationError(str(e))
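Two Python behaviors are worth keeping in mind when writing assertions like these: bool is a subclass of int, and 0 is falsy. The short sketch below demonstrates both. The helper `is_number` is hypothetical and only illustrates one way to treat booleans as non-numeric.

```python
# bool is a subclass of int, so a plain isinstance check accepts True
accepts_bool = isinstance(True, (float, int))


def is_number(value):
    """Hypothetical helper: numeric, but not a boolean."""
    return isinstance(value, (float, int)) and not isinstance(value, bool)


# 0 is falsy, so a truthiness test skips the comparison when a bound is zero
min_value, max_value = 0, -5
checked_by_truthiness = bool(min_value and max_value)
checked_by_none_test = min_value is not None and max_value is not None
```

Here the truthiness test would never compare 0 with -5, while the explicit None test would.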
Putting this all together, our
validate_configuration(...) method verifies
that all necessary inputs have been provided, that
all inputs are of the correct types, and that they
have a correct relationship to each other. If any of
these conditions aren't met, it raises an exception.
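The same logic can be tried out without Great Expectations installed. The sketch below is a standalone approximation, not the guide's method: `ConfigurationError` is a stand-in for InvalidExpectationConfigurationError, `validate_kwargs` and `error_for` are hypothetical names, a plain dict stands in for configuration.kwargs, and it assumes that booleans should be rejected as numeric bounds and that omitted keys are treated as None.

```python
from typing import Any, Dict, Optional


class ConfigurationError(Exception):
    """Stand-in for InvalidExpectationConfigurationError (hypothetical)."""


def _is_number(value: Any) -> bool:
    # Assumption: booleans are not acceptable as numeric bounds
    return isinstance(value, (float, int)) and not isinstance(value, bool)


def validate_kwargs(kwargs: Dict[str, Any]) -> None:
    """Raise ConfigurationError if the four parameters are not a valid combination."""
    min_value = kwargs.get("min_value")
    max_value = kwargs.get("max_value")
    strict_min = kwargs.get("strict_min")
    strict_max = kwargs.get("strict_max")
    try:
        assert (
            min_value is not None or max_value is not None
        ), "min_value and max_value cannot both be None"
        assert min_value is None or _is_number(
            min_value
        ), "Provided min threshold must be a number"
        assert max_value is None or _is_number(
            max_value
        ), "Provided max threshold must be a number"
        if min_value is not None and max_value is not None:
            assert (
                min_value <= max_value
            ), "Provided min threshold must be less than or equal to max threshold"
        assert strict_min is None or isinstance(
            strict_min, bool
        ), "strict_min must be a boolean value"
        assert strict_max is None or isinstance(
            strict_max, bool
        ), "strict_max must be a boolean value"
    except AssertionError as e:
        raise ConfigurationError(str(e))


def error_for(kwargs: Dict[str, Any]) -> Optional[str]:
    """Usage helper: return the error message, or None if kwargs are valid."""
    try:
        validate_kwargs(kwargs)
    except ConfigurationError as e:
        return str(e)
    return None
```

For instance, `error_for({"min_value": 5, "max_value": 1})` returns the message about the min threshold exceeding the max threshold. Note that assert statements are skipped when Python runs with the -O flag, so this pattern relies on assertions being enabled.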
Verify your method
If you now run your file,
print_diagnostic_checklist() will attempt
to execute the
validate_configuration(...) method using the
input provided in your
Example Cases.
If your input is successfully validated, and the rest of the logic in your Custom Expectation is already complete, you will see the following in your Diagnostic Checklist:
✔ Has basic input validation and type checking
✔ Custom 'assert' statements in validate_configuration
Congratulations!
🎉 You've successfully
added input validation & type checking to a
Custom Expectation! 🎉
Contribution (Optional)
The method implemented in this guide is an optional feature for Experimental Expectations, and a requirement for contribution to Great Expectations at Beta and Production levels.
If you would like to contribute your Custom Expectation to the Great Expectations codebase, please submit a Pull Request.
For more information on our code standards and contribution, see our guide on Levels of Maturity for Expectations.
To view the full script used in this page, see it on GitHub: