Create a Custom Query Expectation
QueryExpectations are a type of Expectation (a verifiable assertion about data), enabled for SQL and Spark, that support a higher-complexity workflow than core Expectation classes such as ColumnAggregate, ColumnMap, and Table.
QueryExpectations allow you to set Expectations against the results of your own custom queries, and to make intermediate queries to your database. While this approach can result in extra roundtrips to your database, it can also unlock advanced functionality for your Custom Expectations (extensions of the `Expectation` class, developed outside of the Great Expectations library).
They are evaluated against the results of a query, and answer a semantic question about the data returned by that query. For example, expect_queried_table_row_count_to_be answers how many rows are returned from your table by your query.
This guide will walk you through the process of creating your own custom QueryExpectation.
Prerequisites
Choose a name for your Expectation
First, decide on a name for your own Expectation. By convention, QueryExpectations always start with expect_queried_.
All QueryExpectations support parameterization of your Active Batch (a selection of records from a Data Asset); some QueryExpectations also support parameterization of a Column. This tutorial will detail both approaches.
- Batch Parameterization
- Batch & Column Parameterization
Your Expectation will have two versions of the same name: a CamelCaseName and a snake_case_name. For example, this tutorial will use:
- ExpectQueriedTableRowCountToBe
- expect_queried_table_row_count_to_be
For more on Expectation naming conventions, see the Expectations section of the Code Style Guide.
Copy and rename the template file
By convention, each Expectation is kept in its own python file, named with the snake_case version of the Expectation's name.
You can find the template file for a custom QueryExpectation here. Download the file, place it in the appropriate directory, and rename it appropriately.
cp query_expectation_template.py /SOME_DIRECTORY/expect_queried_table_row_count_to_be.py
Storing Expectation files
During development, you don't need to store Expectation files in a specific location. Expectation files are self-contained and can be executed anywhere as long as GX is installed. However, to use your new Expectation with other GX components, you'll need to make sure the file is stored in one of the following locations:
- If you're building a Custom Expectation for personal use, you'll need to put it in the great_expectations/plugins/expectations folder of your GX deployment, and import your Custom Expectation from that directory whenever it will be used (see the import sketch after this note). When you instantiate the corresponding Data Context, it will automatically make all Plugins (extensions of Great Expectations' components and/or functionality) in the directory available for use.
- If you're building a Custom Expectation to contribute to the open source project, you'll need to put it in the repo for the Great Expectations library itself. Most likely, this will be within a package within contrib/: great_expectations/contrib/SOME_PACKAGE/SOME_PACKAGE/expectations/. To use these Expectations, you'll need to install the package.
For more information about Custom Expectations, see Use a Custom Expectation.
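For example, if you go the plugins route, a minimal sketch of what that import might look like before you validate with the Expectation (the file and class names are the ones used in this tutorial; how you obtain a Validator afterward depends on your own setup):
import great_expectations as gx

# Importing the module is what registers the Custom Expectation with GX.
# This assumes expect_queried_table_row_count_to_be.py lives in
# great_expectations/plugins/expectations and is importable from your script.
from expect_queried_table_row_count_to_be import ExpectQueriedTableRowCountToBe

# Instantiating the Data Context also makes Plugins in that directory available.
context = gx.get_context()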
Generate a diagnostic checklist for your Expectation
Once you've copied and renamed the template file, you can execute it as follows.
python expect_queried_table_row_count_to_be.py
The template file is set up so that this will run the Expectation's print_diagnostic_checklist() method. This will run a diagnostic script on your new Expectation and return a checklist of steps to get it to full production readiness.
Completeness checklist for ExpectQueryToMatchSomeCriteria:
✔ Has a valid library_metadata object
Has a docstring, including a one-line short description
Has at least one positive and negative example case, and all test cases pass
Has core logic and passes tests on at least one Execution Engine
Passes all linting checks
Has basic input validation and type checking
Has both Statement Renderers: prescriptive and diagnostic
Has core logic that passes tests for all applicable Execution Engines and SQL dialects
Has a robust suite of tests, as determined by a code owner
Has passed a manual review by a code owner for code standards and style guides
When in doubt, the next step to implement is the first one that doesn't have a ✔ next to it. This guide will walk you through the first five steps, the minimum for a functioning Custom Expectation and all that is required for contribution back to open source at an Experimental level.
Change the Expectation class name and add a docstring
Now we're going to begin laying the groundwork for the functionality of your Custom Expectation.
Let's start by updating your Expectation's name and docstring.
Replace the Expectation class name
class ExpectQueryToMatchSomeCriteria(QueryExpectation):
with your real Expectation class name, in upper camel case:
class ExpectQueriedTableRowCountToBe(QueryExpectation):
You can also go ahead and write a new one-line docstring, replacing
"""TODO: Add a docstring here"""
with something like:
"""Expect the expect the number of rows returned from a queried table to equal a specified value."""
Make sure your one-line docstring begins with "Expect " and ends with a period. You'll also need to change the class name at the bottom of the file, by replacing this line:
ExpectQueryToMatchSomeCriteria().print_diagnostic_checklist()
with this one:
ExpectQueriedTableRowCountToBe().print_diagnostic_checklist()
Later, you can go back and write a more thorough docstring. See Expectation Docstring Formatting.
At this point you can re-run your diagnostic checklist. You should see something like this:
$ python expect_queried_table_row_count_to_be.py
Completeness checklist for ExpectQueriedTableRowCountToBe:
✔ Has a valid library_metadata object
✔ Has a docstring, including a one-line short description
Has at least one positive and negative example case, and all test cases pass
Has core logic and passes tests on at least one Execution Engine
Passes all linting checks
...
Metric classes
If you've built a Custom Expectation before, you may have noticed that the template doesn't contain a Metric (a computed attribute of data, such as the mean of a column) class.
While you are still able to create a Custom Metric for your Custom Expectation if needed, the nature of QueryExpectations allows us to provide a small number of generic query.* Metrics that are capable of supporting many use cases.
Add example cases
You're going to search for examples = [] in your file and replace it with at least two test examples.
These examples serve the following purposes:
- They provide test fixtures that Great Expectations can execute automatically with pytest.
- They help users understand the logic of your Expectation by providing tidy examples of paired input and output. If you contribute your Expectation to open source, these examples will appear in the Gallery.
Your examples will look similar to this example:
examples = [
{
"data": [
{
"data": {
"col1": [1, 2, 2, 3, 4],
"col2": ["a", "a", "b", "b", "a"],
},
},
],
"tests": [
{
"title": "basic_positive_test",
"exact_match_out": False,
"include_in_gallery": True,
"in": {
"value": 5,
},
"out": {"success": True},
"only_for": ["sqlite", "spark"],
},
{
"title": "basic_negative_test",
"exact_match_out": False,
"include_in_gallery": True,
"in": {
"value": 2,
},
"out": {"success": False},
"only_for": ["sqlite", "spark"],
},
{
"title": "positive_test_static_data_asset",
"exact_match_out": False,
"include_in_gallery": True,
"in": {
"value": 5,
"query": """
SELECT COUNT(*)
FROM test
""",
},
"out": {"success": True},
"only_for": ["sqlite"],
},
{
"title": "positive_test_row_condition",
"exact_match_out": False,
"include_in_gallery": True,
"in": {
"value": 2,
"row_condition": 'col("col1")==2',
"condition_parser": "great_expectations__experimental__",
},
"out": {"success": True},
"only_for": ["sqlite", "spark"],
},
],
},
]
Here's a quick overview of how to create test cases to populate examples.
The overall structure is a list of dictionaries. Each dictionary has two keys:
- data: defines the input data of the example as a table/data frame. In this example the table test has one column named col1 and a second column named col2. Both columns have 5 rows. (Note: if you define multiple columns, make sure that they have the same number of rows.)
- tests: a list of test cases to Validate (apply an Expectation Suite to a Batch) against the data frame defined in the corresponding data.
  - title should be a descriptive name for the test case. Make sure to have no spaces.
  - include_in_gallery: This must be set to True if you want this test case to be visible in the Gallery as an example.
  - in contains exactly the parameters that you want to pass in to the Expectation. "in": {"value": 5} in the example above is equivalent to expect_queried_table_row_count_to_be(value=5).
  - out is based on the Validation Result (generated when data is Validated against an Expectation or Expectation Suite) returned when executing the Expectation.
  - exact_match_out: if you set exact_match_out=False, then you don't need to include all the elements of the Validation Result object, only the ones that are important to test.
The only_for key
only_for is an optional key you can pass to offer more granular control over which backends and SQL dialects your tests are run against.
If you run your Expectation file again, you won't see any new checkmarks, as the logic for your Custom Expectation hasn't been implemented yet. However, you should see that the tests you've written are now being caught and reported in your checklist:
$ python expect_queried_table_row_count_to_be.py
Completeness checklist for ExpectQueriedTableRowCountToBe:
✔ Has a valid library_metadata object
✔ Has a docstring, including a one-line short description
...
Has core logic that passes tests for all applicable Execution Engines and SQL dialects
Only 0 / 2 tests for sqlite are passing
Failing: basic_positive_test, basic_negative_test
...
For more information on tests and example cases, see how to create example cases for a Custom Expectation.
Implement a Query & Connect a Metric to your Expectation
The query is the core of a QueryExpectation; this query is what defines the scope of your expectations for your data.
To implement your query, replace the query attribute of your Custom Expectation.
This:
query = """
SQL QUERY GOES HERE
"""
Becomes something like this:
query = """
SELECT COUNT(*)
FROM {active_batch}
"""
As noted above, QueryExpectations support parameterization of your Active Batch. We strongly recommend making use of that parameterization as above, by querying against {active_batch}. Not doing so could result in your Custom Expectation unintentionally being run against the wrong data!
Metrics for QueryExpectations are a thin wrapper, allowing you to execute that parameterized SQL query with Great Expectations. The results of that query are then validated to judge whether your data meets your expectations.
Great Expectations provides a small number of simple, ready-to-use query.* Metrics that can plug into your Custom Expectation, or serve as a basis for your own custom Metrics.
Query Metric functions have the @metric_value decorator, with the appropriate engine. The @metric_value decorator allows us to explicitly structure queries and directly access our compute domain. While this can result in extra roundtrips to your database in some situations, it allows for advanced functionality and customization of your Custom Expectations.
See an example of a query.table metric here.
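For orientation, here is a heavily simplified sketch of the general shape of such a Metric. The module paths, class name, and signature shown are assumptions based on this pattern, not the shipped implementation; refer to the linked query.table source for the real thing:
from great_expectations.execution_engine import SqlAlchemyExecutionEngine
from great_expectations.expectations.metrics.metric_provider import metric_value
from great_expectations.expectations.metrics.query_metric_provider import (
    QueryMetricProvider,
)


class QueryTableSketch(QueryMetricProvider):
    # Illustrative only: the real Metric is named "query.table" and ships with GX.
    metric_name = "query.table.sketch"
    value_keys = ("query",)

    @metric_value(engine=SqlAlchemyExecutionEngine)
    def _sqlalchemy(
        cls,
        execution_engine,
        metric_domain_kwargs,
        metric_value_kwargs,
        metrics,
        runtime_configuration,
    ):
        # Resolve {active_batch} in the parameterized query against the Batch's
        # selectable, execute it, and return the rows for _validate to inspect.
        ...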
To connect this Metric to our Custom Expectation, we'll need to include the metric_name for this Metric in our metric_dependencies.
This tuple:
metric_dependencies = ("METRIC NAME GOES HERE",)
Becomes:
metric_dependencies = ("query.table",)
Other parameters
Expectation Success Keys - A tuple consisting of values that must / could be provided by the user and defines how the Expectation evaluates success.
Expectation Default Kwarg Values (Optional) - Default values for success keys and the defined domain, among other values.
Metric Condition Value Keys (Optional) - Contains any additional arguments passed as parameters to compute the Metric.
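For ExpectQueriedTableRowCountToBe, these class attributes might look something like the following. This is a hedged sketch that sits inside the Expectation class body; the exact defaults you choose are up to you:
# Parameters the user can pass in; these drive success evaluation.
success_keys = ("value", "query")

# Keys that define the domain the Expectation is evaluated against.
domain_keys = ("batch_id", "row_condition", "condition_parser")

# Sensible defaults, including the parameterized query attribute defined above.
default_kwarg_values = {
    "result_format": "BASIC",
    "catch_exceptions": False,
    "meta": None,
    "value": None,
    "query": query,
}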
Validate
In this step, we simply need to validate that the results of our Metrics meet our Expectation.
The validate method is implemented as _validate(...):
def _validate(
self,
configuration: ExpectationConfiguration,
metrics: dict,
runtime_configuration: dict = None,
execution_engine: ExecutionEngine = None,
) -> Union[ExpectationValidationResult, dict]:
This method takes a dictionary named metrics, which contains all Metrics requested by your Metric dependencies, and performs a simple validation against your success keys (i.e., important thresholds) in order to return a dictionary indicating whether the Expectation has evaluated successfully or not.
To do so, we'll be accessing our success keys, as well as the results of our previously calculated Metrics. For example, here is the definition of a _validate(...) method to validate the results of our query.table Metric against our success keys:
def _validate(
self,
configuration: ExpectationConfiguration,
metrics: dict,
runtime_configuration: dict = None,
execution_engine: ExecutionEngine = None,
) -> Union[ExpectationValidationResult, dict]:
metrics = convert_to_json_serializable(data=metrics)
query_result = list(metrics.get("query.table")[0].values())[0]
value = configuration["kwargs"].get("value")
success = query_result == value
return {
"success": success,
"result": {"observed_value": query_result},
}
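The nested lookup on the query.table result is easier to follow with a concrete shape in mind. The Metric returns a list of row dictionaries, so the scalar count is the first value of the first row; the key name shown here is an assumption and depends on your database driver:
# Assumed shape of metrics["query.table"] for SELECT COUNT(*) FROM {active_batch}
metrics = {"query.table": [{"COUNT(*)": 5}]}

query_result = list(metrics.get("query.table")[0].values())[0]
print(query_result)  # 5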
Running your diagnostic checklist at this point should return something like this:
$ python expect_queried_table_row_count_to_be.py
Completeness checklist for ExpectQueriedTableRowCountToBe:
✔ Has a valid library_metadata object
✔ Has a docstring, including a one-line short description
✔ Has at least one positive and negative example case, and all test cases pass
✔ Has core logic and passes tests on at least one Execution Engine
Passes all linting checks
...
Linting
Finally, we need to lint our now-functioning Custom Expectation. Our CI system will test your code using black and ruff.
If you've set up your dev environment, these libraries will already be available to you, and can be invoked from your command line to automatically lint your code:
black <PATH/TO/YOUR/EXPECTATION.py>
ruff <PATH/TO/YOUR/EXPECTATION.py> --fix
If desired, you can automate this to happen at commit time. See our guidance on linting for more on this process.
Once this is done, running your diagnostic checklist should now reflect your Custom Expectation as meeting our linting requirements:
$ python expect_queried_table_row_count_to_be.py
Completeness checklist for ExpectQueriedTableRowCountToBe:
✔ Has a valid library_metadata object
✔ Has a docstring, including a one-line short description
✔ Has at least one positive and negative example case, and all test cases pass
✔ Has core logic and passes tests on at least one Execution Engine
✔ Passes all linting checks
...
Your Expectation will have two versions of the same name: a CamelCaseName and a snake_case_name. For example, this tutorial will use:
- ExpectQueriedColumnValueFrequencyToMeetThreshold
- expect_queried_column_value_frequency_to_meet_threshold
For more on Expectation naming conventions, see the Expectations section of the Code Style Guide.
Copy and rename the template file
By convention, each Expectation is kept in its own python file, named with the snake_case version of the Expectation's name.
You can find the template file for a custom QueryExpectation here. Download the file, place it in the appropriate directory, and rename it appropriately.
cp query_expectation_template.py /SOME_DIRECTORY/expect_queried_column_value_frequency_to_meet_threshold.py
Storing Expectation files
During development, you don't need to store Expectation files in a specific location. Expectation files are self-contained and can be executed anywhere as long as GX is installed. However, to use your new Expectation with other GX components, you'll need to make sure the file is stored in one of the following locations:
- If you're building a Custom Expectation for personal use, you'll need to put it in the great_expectations/plugins/expectations folder of your GX deployment, and import your Custom Expectation from that directory whenever it will be used. When you instantiate the corresponding Data Context, it will automatically make all Plugins in the directory available for use.
- If you're building a Custom Expectation to contribute to the open source project, you'll need to put it in the repo for the Great Expectations library itself. Most likely, this will be within a package within contrib/: great_expectations/contrib/SOME_PACKAGE/SOME_PACKAGE/expectations/. To use these Expectations, you'll need to install the package.
For more information about Custom Expectations, see Use a Custom Expectation.
Generate a diagnostic checklist for your Expectation
Once you've copied and renamed the template file, you can execute it as follows.
python expect_queried_column_value_frequency_to_meet_threshold.py
The template file is set up so that this will run the Expectation's print_diagnostic_checklist() method. This will run a diagnostic script on your new Expectation and return a checklist of steps to get it to full production readiness.
Completeness checklist for ExpectQueriedColumnValueFrequencyToMeetThreshold:
✔ Has a valid library_metadata object
Has a docstring, including a one-line short description
Has at least one positive and negative example case, and all test cases pass
Has core logic and passes tests on at least one Execution Engine
Passes all linting checks
Has basic input validation and type checking
Has both Statement Renderers: prescriptive and diagnostic
Has core logic that passes tests for all applicable Execution Engines and SQL dialects
Has a robust suite of tests, as determined by a code owner
Has passed a manual review by a code owner for code standards and style guides
When in doubt, the next step to implement is the first one that doesn't have a ✔ next to it. This guide will walk you through the first five steps, the minimum for a functioning Custom Expectation and all that is required for contribution back to open source at an Experimental level.
Change the Expectation class name and add a docstring
Now we're going to begin laying the groundwork for the functionality of your Custom Expectation.
Let's start by updating your Expectation's name and docstring.
Replace the Expectation class name
class ExpectQueryToMatchSomeCriteria(QueryExpectation):
with your real Expectation class name, in upper camel case:
class ExpectQueriedColumnValueFrequencyToMeetThreshold(QueryExpectation):
You can also go ahead and write a new one-line docstring, replacing
"""TODO: Add a docstring here"""
with something like:
"""Expect the frequency of occurrences of a specified value in a queried column to be at least <threshold> percent of values in that column."""
You'll also need to change the class name at the bottom of the file, by replacing this line:
ExpectQueryToMatchSomeCriteria().print_diagnostic_checklist()
with this one:
ExpectQueriedColumnValueFrequencyToMeetThreshold().print_diagnostic_checklist()
Later, you can go back and write a more thorough docstring.
At this point you can re-run your diagnostic checklist. You should see something like this:
$ python expect_queried_column_value_frequency_to_meet_threshold.py
Completeness checklist for ExpectQueriedColumnValueFrequencyToMeetThreshold:
✔ Has a valid library_metadata object
✔ Has a docstring, including a one-line short description
Has at least one positive and negative example case, and all test cases pass
Has core logic and passes tests on at least one Execution Engine
Passes all linting checks
...
Metric classes
If you've built a Custom Expectation before, you may have noticed that the template doesn't contain a Metric class.
While you are still able to create a Custom Metric for your Custom Expectation if needed, the nature of QueryExpectations allows us to provide a small number of generic query.* Metrics that are capable of supporting many use cases.
Add example cases
Next, we're going to search for examples = [] in your file and replace it with at least two test examples.
These examples serve a dual purpose:
- They provide test fixtures that Great Expectations can execute automatically via pytest.
- They help users understand the logic of your Expectation by providing tidy examples of paired input and output. If you contribute your Expectation to open source, these examples will appear in the Gallery.
Your examples will look something like this:
examples = [
{
"data": [
{
"data": {
"col1": [1, 2, 2, 3, 4],
"col2": ["a", "a", "b", "b", "a"],
},
},
],
"tests": [
{
"title": "basic_positive_test",
"exact_match_out": False,
"include_in_gallery": True,
"in": {
"column": "col2",
"value": "a",
"threshold": 0.6,
},
"out": {"success": True},
"only_for": ["sqlite", "spark"],
},
{
"title": "basic_negative_test",
"exact_match_out": False,
"include_in_gallery": True,
"in": {
"column": "col1",
"value": 2,
"threshold": 1,
},
"out": {"success": False},
"only_for": ["sqlite", "spark"],
},
{
"title": "multi_value_positive_test",
"exact_match_out": False,
"include_in_gallery": True,
"in": {
"column": "col2",
"value": ["a", "b"],
"threshold": [0.6, 0.4],
},
"out": {"success": True},
"only_for": ["sqlite", "spark"],
},
{
"title": "multi_value_positive_test_static_data_asset",
"exact_match_out": False,
"include_in_gallery": True,
"in": {
"column": "col2",
"value": ["a", "b"],
"threshold": [0.6, 0.4],
"query": """
SELECT {col},
CAST(COUNT({col}) AS float) / (SELECT COUNT({col}) FROM test)
FROM test
GROUP BY {col}
""",
},
"out": {"success": True},
"only_for": ["sqlite"],
},
{
"title": "multi_value_positive_test_row_condition",
"exact_match_out": False,
"include_in_gallery": True,
"in": {
"column": "col2",
"value": ["a", "b"],
"threshold": [0.6, 0.4],
"row_condition": 'col("col1")==2',
"condition_parser": "great_expectations__experimental__",
},
"out": {"success": False},
"only_for": ["sqlite", "spark"],
},
],
},
]
Here's a quick overview of how to create test cases to populate examples.
The overall structure is a list of dictionaries. Each dictionary has two keys:
- data: defines the input data of the example as a table/data frame. In this example the table test has one column named col1 and a second column named col2. Both columns have 5 rows. (Note: if you define multiple columns, make sure that they have the same number of rows.)
- tests: a list of test cases to Validate against the data frame defined in the corresponding data.
  - title should be a descriptive name for the test case. Make sure to have no spaces.
  - include_in_gallery: This must be set to True if you want this test case to be visible in the Gallery as an example.
  - in contains exactly the parameters that you want to pass in to the Expectation. "in": {"column": "col2", "value": "a", "threshold": 0.6} in the example above is equivalent to expect_queried_column_value_frequency_to_meet_threshold(column="col2", value="a", threshold=0.6).
  - out is based on the Validation Result returned when executing the Expectation.
  - exact_match_out: if you set exact_match_out=False, then you don't need to include all the elements of the Validation Result object, only the ones that are important to test.
The only_for key
only_for is an optional key you can pass to offer more granular control over which backends and SQL dialects your tests are run against.
If you run your Expectation file again, you won't see any new checkmarks, as the logic for your Custom Expectation hasn't been implemented yet. However, you should see that the tests you've written are now being caught and reported in your checklist:
$ python expect_queried_column_value_frequency_to_meet_threshold.py
Completeness checklist for ExpectQueriedColumnValueFrequencyToMeetThreshold:
✔ Has a valid library_metadata object
✔ Has a docstring, including a one-line short description
...
Has core logic that passes tests for all applicable Execution Engines and SQL dialects
Only 0 / 2 tests for sqlite are passing
Failing: basic_positive_test, basic_negative_test
...
For more information on tests and example cases, see our guide on how to create example cases for a Custom Expectation.
Implement a Query & Connect a Metric to your Expectation
The query is the core of a QueryExpectation; this query is what defines the scope of your expectations for your data.
To implement your query, replace the query attribute of your Custom Expectation.
This:
query = """
SQL QUERY GOES HERE
"""
Becomes something like this:
query = """
SELECT {col},
CAST(COUNT({col}) AS float) / (SELECT COUNT({col}) FROM {active_batch})
FROM {active_batch}
GROUP BY {col}
"""
As noted above, QueryExpectations support parameterization of your Active Batch, and can support parameterization of a column name. While parameterizing a column name with {col} is optional and supports flexibility in your Custom Expectations, we strongly recommend making use of batch parameterization by querying against {active_batch}. Not doing so could result in your Custom Expectation unintentionally being run against the wrong data!
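Purely as an illustration of the templating, this is roughly what the query resolves to once Great Expectations substitutes the column name and the Active Batch; the substituted values below are made up for the example:
query = """
SELECT {col},
CAST(COUNT({col}) AS float) / (SELECT COUNT({col}) FROM {active_batch})
FROM {active_batch}
GROUP BY {col}
"""

# Hypothetical substitution: at runtime, GX fills in the column name and the
# selectable that represents the Active Batch for you.
print(query.format(col="col2", active_batch="(SELECT * FROM test) AS active_batch"))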
Metrics for QueryExpectations are a thin wrapper, allowing you to execute that parameterized SQL query with Great Expectations. The results of that query are then validated to judge whether your data meets your expectations.
Great Expectations provides a small number of simple, ready-to-use query.* Metrics that can plug into your Custom Expectation, or serve as a basis for your own custom Metrics.
Query Metric functions have the @metric_value decorator, with the appropriate engine. The @metric_value decorator allows us to explicitly structure queries and directly access our compute domain. While this can result in extra roundtrips to your database in some situations, it allows for advanced functionality and customization of your Custom Expectations.
See an example of a query.column metric here.
To connect this Metric to our Custom Expectation, we'll need to include the metric_name for this Metric in our metric_dependencies.
In this case, we'll be using the query.column Metric, allowing us to parameterize both our Active Batch and a column name.
This tuple:
metric_dependencies = ("METRIC NAME GOES HERE",)
Becomes:
metric_dependencies = ("query.column",)
Other parameters
Expectation Success Keys - A tuple consisting of values that must / could be provided by the user and defines how the Expectation evaluates success.
Expectation Default Kwarg Values (Optional) - Default values for success keys and the defined domain, among other values.
Metric Condition Value Keys (Optional) - Contains any additional arguments passed as parameters to compute the Metric.
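For ExpectQueriedColumnValueFrequencyToMeetThreshold, which parameterizes a column as well as the Batch, these class attributes might look something like the following. This is a hedged sketch placed inside the Expectation class body; choose defaults that fit your needs:
# The parameterized column is passed in alongside the other success keys.
success_keys = ("column", "value", "threshold", "query")

domain_keys = ("batch_id", "row_condition", "condition_parser")

default_kwarg_values = {
    "result_format": "BASIC",
    "catch_exceptions": False,
    "meta": None,
    "column": None,
    "value": None,
    "threshold": 1,
    "query": query,
}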
Validate
In this step, we simply need to validate that the results of our Metrics meet our Expectation.
The validate method is implemented as _validate(...):
def _validate(
self,
configuration: ExpectationConfiguration,
metrics: dict,
runtime_configuration: dict = None,
execution_engine: ExecutionEngine = None,
) -> Union[ExpectationValidationResult, dict]:
This method takes a dictionary named metrics, which contains all Metrics requested by your Metric dependencies, and performs a simple validation against your success keys (i.e., important thresholds) in order to return a dictionary indicating whether the Expectation has evaluated successfully or not.
To do so, we'll be accessing our success keys, as well as the results of our previously calculated Metrics. For example, here is the definition of a _validate(...) method to validate the results of our query.column Metric against our success keys:
def _validate(
self,
configuration: ExpectationConfiguration,
metrics: dict,
runtime_configuration: dict = None,
execution_engine: ExecutionEngine = None,
) -> Union[ExpectationValidationResult, dict]:
metrics = convert_to_json_serializable(data=metrics)
query_result = metrics.get("query.column")
query_result = dict([element.values() for element in query_result])
value = configuration["kwargs"].get("value")
threshold = configuration["kwargs"].get("threshold")
if isinstance(value, list):
success = all(
query_result[value[i]] >= threshold[i] for i in range(len(value))
)
return {
"success": success,
"result": {
"observed_value": [
query_result[value[i]] for i in range(len(value))
]
},
}
success = query_result[value] >= threshold
return {
"success": success,
"result": {"observed_value": query_result[value]},
}
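The reshaping at the top of this _validate is easier to follow with a concrete result in mind. query.column returns one row per group, each a two-item dictionary of column value and frequency, which dict() collapses into a single lookup table; the exact shape and key names below are assumed for illustration:
# Assumed shape of metrics["query.column"] for the GROUP BY query above.
query_result = [
    {"col2": "a", "frequency": 0.6},
    {"col2": "b", "frequency": 0.4},
]

# Each row's two values become a key/value pair: {"a": 0.6, "b": 0.4}
frequencies = dict([element.values() for element in query_result])

print(frequencies["a"] >= 0.6)  # True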
Running your diagnostic checklist at this point should return something like this:
$ python expect_queried_column_value_frequency_to_meet_threshold.py
Completeness checklist for ExpectQueriedColumnValueFrequencyToMeetThreshold:
✔ Has a valid library_metadata object
✔ Has a docstring, including a one-line short description
✔ Has at least one positive and negative example case, and all test cases pass
✔ Has core logic and passes tests on at least one Execution Engine
Passes all linting checks
...
Linting
Finally, we need to lint our now-functioning Custom Expectation. Our CI system will test your code using black and ruff.
If you've set up your dev environment as recommended in the Prerequisites, these libraries will already be available to you, and can be invoked from your command line to automatically lint your code:
black <PATH/TO/YOUR/EXPECTATION.py>
ruff <PATH/TO/YOUR/EXPECTATION.py> --fix
If desired, you can automate this to happen at commit time. See our guidance on linting for more on this process.
Once this is done, running your diagnostic checklist should now reflect your Custom Expectation as meeting our linting requirements:
$ python expect_queried_column_value_frequency_to_meet_threshold.py
Completeness checklist for ExpectQueriedColumnValueFrequencyToMeetThreshold:
✔ Has a valid library_metadata object
✔ Has a docstring, including a one-line short description
✔ Has at least one positive and negative example case, and all test cases pass
✔ Has core logic and passes tests on at least one Execution Engine
✔ Passes all linting checks
...
Contribute (Optional)
This guide will leave you with a Custom Expectation sufficient for contribution to Great Expectations at an Experimental level.
If you plan to contribute your Expectation to the public open source project, you should update the library_metadata object before submitting your Pull Request. For example:
library_metadata = {
"tags": [], # Tags for this Expectation in the Gallery
"contributors": [ # Github handles for all contributors to this Expectation.
"@your_name_here", # Don't forget to add your github handle here!
],
}
would become
# This dictionary contains metadata for display in the public gallery
library_metadata = {
"tags": ["query-based"],
"contributors": ["@joegargery"],
}
This is particularly important because we want to make sure that you get credit for all your hard work!
For more information on our code standards and contribution, see our guide on Levels of Maturity for Expectations.
To view the full scripts used in this page, see them on GitHub: