# Execution Engine
:::note
The following was taken from 20201007_execution_engine.md in github.com/superconductive/design. The same documentation also includes information on:

- Validation
- Expectations
- Expectation Bundles
:::
An Execution Engine provides the computing resources that will be used to actually perform validation. Great Expectations can take advantage of many different Execution Engines, such as Pandas, Spark, or SqlAlchemy, and can even translate the same Expectations to validate data using different engines.
Data is always viewed through the lens of an Execution Engine in Great Expectations. When we obtain a Batch of data, that Batch contains metadata that wraps the native Data Object of the Execution Engine -- for example, a DataFrame in Pandas or Spark, or a table or query result in SQL.
## Execution Engine init arguments

- name
- caching
- batch_spec_defaults (is this needed?)
- batch_data_dict
- validator
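
A minimal construction sketch, assuming the 0.13-era `PandasExecutionEngine` and using the argument names above; defaults and accepted types may differ between releases, and the batch id and column names are hypothetical:

```python
import pandas as pd
from great_expectations.execution_engine import PandasExecutionEngine

# Sketch only: argument names mirror the list above; the batch id
# "batch_2020_10" and the "price" column are hypothetical examples.
engine = PandasExecutionEngine(
    name="my_pandas_engine",
    caching=True,  # cache resolved metric values on the engine
    batch_spec_defaults=None,  # engine-level defaults merged into BatchSpecs
    batch_data_dict={
        # pre-load native data objects, keyed by batch_id
        "batch_2020_10": pd.DataFrame({"price": [1.0, 2.5, 3.0]}),
    },
)
```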
## Execution Engine Properties

- loaded_batch_data (all "loaded" batches)
- active_batch_data_id
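
Continuing the sketch above, these properties expose what has been loaded. The names here follow this design doc; some released versions expose `loaded_batch_data_dict` instead of `loaded_batch_data`:

```python
# The most recently loaded batch becomes the active one.
print(engine.active_batch_data_id)  # -> "batch_2020_10"

# All loaded batch data, keyed by batch_id (property name per this design
# doc; some releases call this loaded_batch_data_dict instead).
print(engine.loaded_batch_data)
```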
## Execution Engine Methods

- load_batch_data(batch_id, batch_data)
- resolve_metrics: computes metric values
- get_compute_domain: gets the compute domain for a particular type of intermediate metric.
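
A sketch of these methods together, again assuming a 0.13-era API. The `MetricConfiguration` import path has moved between releases, and some releases require evaluation dependencies (such as `table.columns`) to be resolved first and passed via the `metrics` argument:

```python
import pandas as pd
from great_expectations.execution_engine import PandasExecutionEngine

# Import path varies by release; 0.13-era code kept MetricConfiguration here:
from great_expectations.validator.validation_graph import MetricConfiguration

engine = PandasExecutionEngine()

# load_batch_data registers a native data object under a batch_id.
engine.load_batch_data(
    batch_id="batch_2020_10",
    batch_data=pd.DataFrame({"price": [1.0, 2.5, 3.0]}),
)

# resolve_metrics computes metric values; results are keyed by metric id.
desired_metric = MetricConfiguration(
    metric_name="column.max",  # a registered metric name
    metric_domain_kwargs={"column": "price"},
    metric_value_kwargs=None,
)
results = engine.resolve_metrics(metrics_to_resolve=(desired_metric,))
print(results[desired_metric.id])  # -> 3.0

# get_compute_domain returns the data scoped to a domain, plus the kwargs
# describing the compute domain and any remaining accessor kwargs.
data, compute_kwargs, accessor_kwargs = engine.get_compute_domain(
    domain_kwargs={"column": "price"}, domain_type="column"
)
```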
SqlAlchemyExecutionEngine and SparkDFExecutionEngine provide an additional feature that allows deferred resolution of metrics, making it possible to bundle the request for several metrics into a single trip to the backend. Additional Execution Engines may also support this feature in the future.
- resolve_metric_bundle: computes the values of a bundle of metrics; this function is used internally by resolve_metrics on Execution Engines that support bundled metrics
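
To make the bundling idea concrete, here is a purely illustrative sketch (not Great Expectations' actual implementation) of the single-trip behavior for a SQL backend, written against the SQLAlchemy 1.3-era API:

```python
import sqlalchemy as sa

# Illustration only: two aggregate metrics over the same domain collapse
# into one SELECT, i.e. one trip to the backend instead of two.
engine = sa.create_engine("sqlite://")
engine.execute("CREATE TABLE t (price FLOAT)")
engine.execute("INSERT INTO t VALUES (1.0), (2.5), (3.0)")

t = sa.table("t", sa.column("price"))
bundle = {
    "column.max": sa.func.max(t.c.price),
    "column.min": sa.func.min(t.c.price),
}
# Emits: SELECT max(price), min(price) FROM t
row = engine.execute(sa.select(list(bundle.values())).select_from(t)).fetchone()
print(dict(zip(bundle.keys(), row)))  # {'column.max': 3.0, 'column.min': 1.0}
```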
## Validation Flow

- Validator.graph_validate(expectation_suite)
- for each Expectation: get_validation_dependencies returns named metric dependencies:

      {
          "user_useful_name": MetricConfiguration,
          ...
      }

- _populate_dependencies
  - for each dependent metric: get_evaluation_dependencies
- a validation_graph object is now ready: nodes are MetricConfigurations, edges are dependencies
- _parse_validation_graph
- for each set of ready_metrics: Execution Engine resolve_metrics (see the sketch after this list)
  - for each metric: bundleable?
    a. yes -> add to bundle
    b. no -> resolve_metric
       i. call metric_fn to get the value of the metric
  - resolve_metric_bundle
    a. for each metric in bundle:
       i. call metric_fn to get tuple(engine_function, domain_kwargs)
       ii. add engine_function to the resolve call for that domain
    b. for each domain, dispatch the call to the engine and add the resulting metrics to the metrics dictionary
- Expectation.validate(metrics) (now that metrics are populated)
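
The loop over ready_metrics can be sketched abstractly. The following is a conceptual rendering of the flow above, not Great Expectations' code; `graph.nodes` and `node.dependencies` are hypothetical stand-ins for ValidationGraph and MetricConfiguration internals:

```python
def resolve_validation_graph(graph, execution_engine):
    """Conceptual sketch of _parse_validation_graph plus resolve_metrics.

    graph.nodes and node.dependencies are hypothetical stand-ins for the
    MetricConfiguration nodes and dependency edges described above.
    """
    metrics = {}  # metric_id -> computed value
    unresolved = set(graph.nodes)
    while unresolved:
        # A metric is "ready" once all of its dependency edges point at
        # metrics that have already been computed.
        ready_metrics = {
            node for node in unresolved
            if all(dep.id in metrics for dep in node.dependencies)
        }
        if not ready_metrics:
            raise ValueError("cycle or unsatisfiable dependency in graph")
        # One resolve_metrics call per pass; inside it, bundleable metrics
        # can be collapsed into a single backend trip via resolve_metric_bundle.
        metrics.update(
            execution_engine.resolve_metrics(
                metrics_to_resolve=ready_metrics, metrics=metrics
            )
        )
        unresolved -= ready_metrics
    return metrics
```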