# Execution Engine
:::note
The following was taken from 20201007_execution_engine.md in github.com/superconductive/design. The same documentation also includes information on:

- Validation
- Expectations
- Expectation Bundles
:::
An Execution Engine provides the computing resources that will be used to actually perform validation. Great Expectations can take advantage of many different Execution Engines, such as Pandas, Spark, or SqlAlchemy, and can even translate the same Expectations to validate data using different engines.
Data is always viewed through the lens of an Execution Engine in Great Expectations. When we obtain a Batch of data, that Batch contains metadata that wraps the native Data Object of the Execution Engine -- for example, a DataFrame in Pandas or Spark, or a table or query result in SQL.
## Execution Engine init arguments

- name
- caching
- batch_spec_defaults (is this needed?)
- batch_data_dict
- validator
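
A minimal construction sketch, assuming the 0.13-era `PandasExecutionEngine` and using the argument names above; defaults and accepted types may differ between releases, and the batch id and column names are hypothetical:

```python
import pandas as pd
from great_expectations.execution_engine import PandasExecutionEngine

# Sketch only: argument names mirror the list above; the batch id
# "batch_2020_10" and the "price" column are hypothetical examples.
engine = PandasExecutionEngine(
    name="my_pandas_engine",
    caching=True,  # cache resolved metric values on the engine
    batch_spec_defaults=None,  # engine-level defaults merged into BatchSpecs
    batch_data_dict={
        # pre-load native data objects, keyed by batch_id
        "batch_2020_10": pd.DataFrame({"price": [1.0, 2.5, 3.0]}),
    },
)
```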
## Execution Engine Properties

- loaded_batch_data (all "loaded" batches)
- active_batch_data_id
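
Continuing the sketch above, these properties expose what has been loaded. The names here follow this design doc; some released versions expose `loaded_batch_data_dict` instead of `loaded_batch_data`:

```python
# The most recently loaded batch becomes the active one.
print(engine.active_batch_data_id)  # -> "batch_2020_10"

# All loaded batch data, keyed by batch_id (property name per this design
# doc; some releases call this loaded_batch_data_dict instead).
print(engine.loaded_batch_data)
```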
## Execution Engine Methods

- load_batch_data(batch_id, batch_data)
- resolve_metrics: computes metric values
- get_compute_domain: gets the compute domain for a particular type of intermediate metric.
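
A sketch of these methods together, again assuming a 0.13-era API. The `MetricConfiguration` import path has moved between releases, and some releases require evaluation dependencies (such as `table.columns`) to be resolved first and passed via the `metrics` argument:

```python
import pandas as pd
from great_expectations.execution_engine import PandasExecutionEngine

# Import path varies by release; 0.13-era code kept MetricConfiguration here:
from great_expectations.validator.validation_graph import MetricConfiguration

engine = PandasExecutionEngine()

# load_batch_data registers a native data object under a batch_id.
engine.load_batch_data(
    batch_id="batch_2020_10",
    batch_data=pd.DataFrame({"price": [1.0, 2.5, 3.0]}),
)

# resolve_metrics computes metric values; results are keyed by metric id.
desired_metric = MetricConfiguration(
    metric_name="column.max",  # a registered metric name
    metric_domain_kwargs={"column": "price"},
    metric_value_kwargs=None,
)
results = engine.resolve_metrics(metrics_to_resolve=(desired_metric,))
print(results[desired_metric.id])  # -> 3.0

# get_compute_domain returns the data scoped to a domain, plus the kwargs
# describing the compute domain and any remaining accessor kwargs.
data, compute_kwargs, accessor_kwargs = engine.get_compute_domain(
    domain_kwargs={"column": "price"}, domain_type="column"
)
```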
SqlAlchemyExecutionEngine and SparkDFExecutionEngine provide an additional feature that allows deferred resolution of metrics, making it possible to bundle the request for several metrics into a single trip to the backend. Additional Execution Engines may also support this feature in the future.
- resolve_metric_bundle: computes the values of a bundle of metrics; this function is used internally by resolve_metrics on Execution Engines that support bundled metrics
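
To make the bundling idea concrete, here is a purely illustrative sketch (not Great Expectations' actual implementation) of the single-trip behavior for a SQL backend, written against the SQLAlchemy 1.3-era API:

```python
import sqlalchemy as sa

# Illustration only: two aggregate metrics over the same domain collapse
# into one SELECT, i.e. one trip to the backend instead of two.
engine = sa.create_engine("sqlite://")
engine.execute("CREATE TABLE t (price FLOAT)")
engine.execute("INSERT INTO t VALUES (1.0), (2.5), (3.0)")

t = sa.table("t", sa.column("price"))
bundle = {
    "column.max": sa.func.max(t.c.price),
    "column.min": sa.func.min(t.c.price),
}
# Emits: SELECT max(price), min(price) FROM t
row = engine.execute(sa.select(list(bundle.values())).select_from(t)).fetchone()
print(dict(zip(bundle.keys(), row)))  # {'column.max': 3.0, 'column.min': 1.0}
```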
## Validation Flow

- Validator.graph_validate(expectation_suite)
- for each Expectation: get_validation_dependencies returns named metric dependencies:

      {
          "user_useful_name": MetricConfiguration,
          ...
      }

- _populate_dependencies
  - for each dependent metric: get_evaluation_dependencies
- a validation_graph object is now ready: nodes are MetricConfigurations, edges are dependencies
- _parse_validation_graph
- for each set of ready_metrics: Execution Engine resolve_metrics (see the sketch after this list)
  - for each metric: bundleable?
    a. yes -> add to bundle
    b. no -> resolve_metric
       i. call metric_fn to get the value of the metric
  - resolve_metric_bundle
    a. for each metric in bundle:
       i. call metric_fn to get tuple(engine_function, domain_kwargs)
       ii. add engine_function to the resolve call for that domain
    b. for each domain, dispatch the call to the engine and add the resulting metrics to the metrics dictionary
- Expectation.validate(metrics) (now that metrics are populated)
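
The loop over ready_metrics can be sketched abstractly. The following is a conceptual rendering of the flow above, not Great Expectations' code; `graph.nodes` and `node.dependencies` are hypothetical stand-ins for ValidationGraph and MetricConfiguration internals:

```python
def resolve_validation_graph(graph, execution_engine):
    """Conceptual sketch of _parse_validation_graph plus resolve_metrics.

    graph.nodes and node.dependencies are hypothetical stand-ins for the
    MetricConfiguration nodes and dependency edges described above.
    """
    metrics = {}  # metric_id -> computed value
    unresolved = set(graph.nodes)
    while unresolved:
        # A metric is "ready" once all of its dependency edges point at
        # metrics that have already been computed.
        ready_metrics = {
            node for node in unresolved
            if all(dep.id in metrics for dep in node.dependencies)
        }
        if not ready_metrics:
            raise ValueError("cycle or unsatisfiable dependency in graph")
        # One resolve_metrics call per pass; inside it, bundleable metrics
        # can be collapsed into a single backend trip via resolve_metric_bundle.
        metrics.update(
            execution_engine.resolve_metrics(
                metrics_to_resolve=ready_metrics, metrics=metrics
            )
        )
        unresolved -= ready_metrics
    return metrics
```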