Add SQLAlchemy support for Custom Expectations
This guide will help you implement native SQLAlchemy support for your Custom Expectation (an extension of the `Expectation` class, developed outside of the Great Expectations library).
Prerequisites
Great Expectations supports a number of Execution Engines (systems capable of processing data to compute Metrics), including a SQLAlchemy Execution Engine. These Execution Engines provide the computing resources used to calculate the Metrics (computed attributes of data, such as the mean of a column) defined in the Metric class of your Custom Expectation.
If you decide to contribute your Expectation (a verifiable assertion about data), its entry in the Expectations Gallery will reflect the Execution Engines that it supports.
We will add SQLAlchemy support for the Custom Expectations implemented in our guides on how to create Custom Column Aggregate Expectations and how to create Custom Column Map Expectations.
Specify your backends and dialects
While SQLAlchemy is able to provide a common interface to a variety of SQL dialects, some functions may not work in a particular dialect, or in some cases they may return different values. To avoid surprises, it can be helpful to determine beforehand what backends and dialects you plan to support, and test them along the way.
Within the `examples` defined inside your Expectation class, the optional `only_for` and `suppress_test_for` keys specify which backends to use for testing. If a backend is not specified, Great Expectations attempts testing on all supported backends. Add entries corresponding to the functionality you want to support:
examples = [
    {
        "data": {"x": [1, 2, 3, 4, 5], "y": [0, -1, -2, 4, None]},
        "only_for": ["pandas", "spark", "sqlite", "postgresql"],
        "tests": [
            {
                "title": "basic_positive_test",
                "exact_match_out": False,
                "include_in_gallery": True,
                "in": {
                    "column": "x",
                    "min_value": 4,
                    "strict_min": True,
                    "max_value": 5,
                    "strict_max": False,
                },
                "out": {"success": True},
            },
            {
                "title": "basic_negative_test",
                "exact_match_out": False,
                "include_in_gallery": True,
                "in": {
                    "column": "y",
                    "min_value": -2,
                    "strict_min": False,
                    "max_value": 3,
                    "strict_max": True,
                },
                "out": {"success": False},
            },
        ],
    }
]
The optional `only_for` and `suppress_test_for` keys can be specified at the top level (next to `data` and `tests`) or within specific tests (next to `title`, and so on).
Allowed backends include: `bigquery`, `mssql`, `mysql`, `pandas`, `postgresql`, `redshift`, `snowflake`, `spark`, `sqlite`, `trino`
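As a quick illustration of the per-test placement, here is a minimal, hypothetical example case (the data and test names are invented for this sketch) where `only_for` is set inside a single test rather than at the top level:

```python
# Hypothetical example case: "only_for" placed inside one specific test,
# so only that test is restricted to the SQLite backend.
examples = [
    {
        "data": {"x": [1, 2, 3, 4, 5]},
        "tests": [
            {
                "title": "sqlite_only_test",
                "exact_match_out": False,
                "in": {"column": "x", "min_value": 4, "max_value": 5},
                "out": {"success": True},
                # This one test runs only against SQLite; other tests in the
                # example would still run on all supported backends.
                "only_for": ["sqlite"],
            },
        ],
    }
]
```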
Implement the SQLAlchemy logic for your Custom Expectation
Great Expectations provides a variety of ways to implement an Expectation in SQLAlchemy. Two of the most common include:
- Defining a partial function that takes a SQLAlchemy column as input
- Directly executing queries using SQLAlchemy objects to determine the value of your Expectation's metric
Partial Function
Great Expectations allows much of the SQLAlchemy logic for executing queries to be abstracted away by specifying metric behavior as a partial function. To do this, we use one of the `@column_*_partial` decorators:

- `@column_aggregate_partial` for Column Aggregate Expectations
- `@column_condition_partial` for Column Map Expectations
- `@column_pair_condition_partial` for Column Pair Map Expectations
- `@multicolumn_condition_partial` for Multicolumn Map Expectations
These decorators expect an appropriate `engine` argument. In this case, we'll pass our `SqlAlchemyExecutionEngine`. The decorated method takes in a SQLAlchemy `Column` object and will return either a `sqlalchemy.sql.functions.Function` or a `sqlalchemy.sql.expression.ColumnOperator` that Great Expectations will use to generate the appropriate SQL queries.
For our Custom Column Map Expectation `ExpectColumnValuesToEqualThree`, we're going to leverage SQLAlchemy's `in_` ColumnOperator and the `@column_condition_partial` decorator.
@column_condition_partial(engine=SqlAlchemyExecutionEngine)
def _sqlalchemy(cls, column, **kwargs):
    return column.in_([3])
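For intuition about what Great Expectations does with this condition: the `in_` ColumnOperator compiles to a SQL `IN` clause, and the framework uses it to count the values that fail the condition. A rough stdlib-only sketch of the equivalent SQL against SQLite (the `demo` table and `x` column are illustrative, not part of the Great Expectations API):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (x INTEGER)")
conn.executemany("INSERT INTO demo VALUES (?)", [(1,), (3,), (3,), (5,)])

# column.in_([3]) compiles to a condition like "x IN (3)"; the framework
# uses the negated condition to count unexpected (failing) values.
unexpected = conn.execute(
    "SELECT COUNT(*) FROM demo WHERE x NOT IN (3)"
).fetchone()[0]
print(unexpected)  # 2 values (1 and 5) do not equal three
```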
Getting `func`-y?
SQLAlchemy's `func` special object instance allows us to pass common generic functions which SQLAlchemy will compile appropriately for the targeted dialect, giving us the flexibility to not have to write that targeted code ourselves! Here's an example from `ExpectColumnSumToBeBetween`:
@column_aggregate_partial(engine=SqlAlchemyExecutionEngine)
def _sqlalchemy(cls, column, **kwargs):
    return sa.func.sum(column)
For more on `func` and the `func`-tionality it provides, see SQLAlchemy's Functions documentation.
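To see what the generic function compiles down to without a full SQLAlchemy setup, here is a minimal sketch using the stdlib `sqlite3` driver (the `demo` table and `col` column are illustrative): `sa.func.sum(column)` renders as the standard `SUM(col)` aggregate on SQLite, and SQLAlchemy would render the dialect-appropriate equivalent elsewhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col REAL)")
conn.executemany("INSERT INTO demo VALUES (?)", [(1.5,), (2.5,), (6.0,)])

# sa.func.sum(column) compiles to the generic SQL aggregate SUM(col);
# here we issue that SQL directly to show the value the partial computes.
total = conn.execute("SELECT SUM(col) FROM demo").fetchone()[0]
print(total)  # 10.0
```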
Query Execution
The most direct way of implementing a metric is by computing its value, constructing or directly executing queries using objects provided by the `@metric_*` decorators:
- `@metric_value` for Column Aggregate Expectations
  - Expects an appropriate `engine`, `metric_fn_type`, and `domain_type`
- `@metric_partial` for all Map Expectations
  - Expects an appropriate `engine`, `partial_fn_type`, and `domain_type`
Our `engine` will reflect the backend we're implementing (`SqlAlchemyExecutionEngine`), while our `fn_type` and `domain_type` are unique to the type of Expectation we're implementing.
These decorators enable a higher-complexity workflow, allowing you to explicitly structure your queries and make intermediate queries to your database. While this approach can result in extra roundtrips to your database, it can also unlock advanced functionality for your Custom Expectations.
For our Custom Column Aggregate Expectation `ExpectColumnMaxToBeBetweenCustom`, we're going to implement the `@metric_value` decorator, specifying the type of value we're computing (`AGGREGATE_VALUE`) and the domain over which we're computing it (`COLUMN`):
@metric_value(engine=SqlAlchemyExecutionEngine)
def _sqlalchemy(
    cls,
    execution_engine: SqlAlchemyExecutionEngine,
    metric_domain_kwargs,
    metric_value_kwargs,
    metrics,
    runtime_configuration,
):
The decorated method takes in a valid Execution Engine and relevant keyword arguments, and will return a computed value.
To do this, we need to access our Compute Domain directly:
(
    selectable,
    compute_domain_kwargs,
    accessor_domain_kwargs,
) = execution_engine.get_compute_domain(
    metric_domain_kwargs, MetricDomainTypes.COLUMN
)
column_name = accessor_domain_kwargs["column"]
column = sa.column(column_name)
This allows us to build a query and use our Execution Engine to execute that query against our data to return the actual value we're looking for, instead of returning a query to find that value:
query = sa.select(sa.func.max(column)).select_from(selectable)
result = execution_engine.execute_query(query).fetchone()
return result[0]
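Conceptually, the query-execution flow reduces to: build a `SELECT MAX(col)` statement, execute it, and compare the returned scalar against the Expectation's bounds. A standalone, stdlib-only sketch of that flow using `sqlite3` (table, column, and bound values are illustrative; a real implementation goes through `SqlAlchemyExecutionEngine` as shown above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col INTEGER)")
conn.executemany("INSERT INTO demo VALUES (?)", [(2,), (7,), (4,)])

# Equivalent of sa.select(sa.func.max(column)).select_from(selectable),
# executed and reduced to a scalar with fetchone()[0]:
max_value = conn.execute("SELECT MAX(col) FROM demo").fetchone()[0]

# An ExpectColumnMaxToBeBetweenCustom-style check of the computed value
# (illustrative bounds, non-strict comparisons):
min_bound, max_bound = 5, 10
success = min_bound <= max_value <= max_bound
print(max_value, success)  # 7 True
```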
Verify your implementation
If you now run your file, `print_diagnostic_checklist()` will attempt to execute your example cases using this new backend.
If your implementation is correctly defined, and the rest of the core logic in your Custom Expectation is already complete, you will see the following in your Diagnostic Checklist:
✔ Has at least one positive and negative example case, and all test cases pass
If you've already implemented the Pandas backend covered in our How-To guides for creating Custom Expectations and the Spark backend covered in our guide on how to add Spark support for Custom Expectations, you should see the following in your Diagnostic Checklist:
✔ Has core logic that passes tests for all applicable Execution Engines and SQL dialects
Congratulations! 🎉 You've successfully implemented SQLAlchemy support for a Custom Expectation! 🎉
Contribution (Optional)
This guide will leave you with core functionality sufficient for contribution to Great Expectations at an Experimental level.
If you're interested in having your contribution accepted at a Beta level, your Custom Expectation will need to support SQLAlchemy, Spark, and Pandas.
For full acceptance into the Great Expectations codebase at a Production level, we require that your Custom Expectation meets our standards for code, testing, and style. If you believe your Custom Expectation is otherwise ready for contribution at a Production level, please submit a Pull Request, and we will work with you to ensure your Custom Expectation meets these standards.
For more information on our code standards and contribution, see our guide on Levels of Maturity for Expectations.
To view the full scripts used in this page, see them on GitHub: