Version: 0.17.23

Result format

You can use the result_format parameter to define the level of detail for Validation Results in your Data Docs. For example, you can return a success or failure message, a summary of observed values, a list of failing values, or you can add a query or a filter function that returns all failing rows. Typical use cases for this parameter include cleaning data and excluding Validation Result data in published Data Docs.

The result_format parameter can be a string or a dictionary which specifies the fields to return in result.

The string values "BOOLEAN_ONLY", "BASIC", "SUMMARY", and "COMPLETE" are supported. The default is "SUMMARY". The behavior of each setting is described in examples.

When using a dictionary, result_format can include the following keys:

  • result_format: Sets the fields to return in results.

  • unexpected_index_column_names: Defines the columns that can be used to identify unexpected results. For example, primary key (PK) column(s) or other columns with unique identifiers. Supports multiple column names as a list.

  • return_unexpected_index_query: When running validations, a query (or a set of indices) is returned that allows you to retrieve the full set of unexpected results including any columns identified in unexpected_index_column_names. Setting this value to False suppresses the output (default is True).

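  • partial_unexpected_count: Sets the maximum number of results to include in the partial_unexpected_* fields (partial_unexpected_list, partial_unexpected_index_list, and partial_unexpected_counts), if applicable. Set the value to zero to suppress the unexpected counts.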

  • exclude_unexpected_values: When running validations, a set of unexpected results' indices and values is returned. Setting this value to True suppresses values from the output to only have indices (default is False).

  • include_unexpected_rows: When running validations, this returns the entire row for each unexpected value in dictionary form. When using include_unexpected_rows, you must explicitly specify result_format and result_format must be more verbose than BOOLEAN_ONLY.

    Note: include_unexpected_rows returns the entire row for every unexpected value. In large tables, this could result in an unmanageable amount of data.
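The keys above can be combined in one dictionary. The following is a minimal sketch in plain Python and is not part of the GX API: the helper name `check_result_format` and the `VERBOSITY` list are assumptions made for illustration. It only encodes the rule stated above that include_unexpected_rows needs a result_format more verbose than BOOLEAN_ONLY.

```python
# Hypothetical helper, not part of Great Expectations. It mirrors only the
# constraints described in the key list above.
VERBOSITY = ["BOOLEAN_ONLY", "BASIC", "SUMMARY", "COMPLETE"]  # least to most verbose


def check_result_format(config: dict) -> dict:
    """Return config unchanged if it follows the documented constraints."""
    level = config.get("result_format")
    if level not in VERBOSITY:
        raise ValueError(f"unsupported result_format: {level!r}")
    # include_unexpected_rows needs an explicit level above BOOLEAN_ONLY.
    if config.get("include_unexpected_rows") and level == "BOOLEAN_ONLY":
        raise ValueError(
            "include_unexpected_rows requires BASIC, SUMMARY, or COMPLETE"
        )
    return config


result_format = check_result_format(
    {
        "result_format": "SUMMARY",
        "unexpected_index_column_names": ["pk_column"],
        "return_unexpected_index_query": False,  # suppress the query output
        "partial_unexpected_count": 10,  # limit partial results to 10 entries
        "exclude_unexpected_values": False,  # keep values alongside indices
        "include_unexpected_rows": True,
    }
)
```

The resulting dictionary is the kind of value that can be passed as result_format to an Expectation or placed under runtime_configuration in a Checkpoint.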

Configure result format

You can specify result_format for a single Expectation, or an entire Checkpoint. When configured at the Expectation-level, the configuration is not persisted, and you'll receive a UserWarning. GX recommends that you use an Expectation-level configuration for exploratory analysis, and add the final configuration at the Checkpoint-level.

Expectation-level configuration

To apply result_format to an Expectation, you pass it into the Expectation configuration on your Validator:

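```python
# my_validator is assumed to be a Validator for the DataFrame defined in the Examples section.
validation_result = my_validator.expect_column_values_to_be_in_set(
    column="my_var",
    value_set=["A", "B"],
    result_format={
        "result_format": "COMPLETE",
        "unexpected_index_column_names": ["pk_column"],
        "return_unexpected_index_query": True,
    },
)
```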

Checkpoint-level configuration

Run the following code to apply result_format to every Expectation in a Suite:

from great_expectations.checkpoint import Checkpoint

# context (a Data Context) and my_batch_request are assumed to be defined already.
checkpoint: Checkpoint = Checkpoint(
    name="my_checkpoint",
    run_name_template="%Y%m%d-%H%M%S-my-run-name-template",
    data_context=context,
    batch_request=my_batch_request,
    expectation_suite_name="test_suite",
    action_list=[
        {
            "name": "store_validation_result",
            "action": {"class_name": "StoreValidationResultAction"},
        },
        {
            "name": "store_evaluation_params",
            "action": {"class_name": "StoreEvaluationParametersAction"},
        },
        {"name": "update_data_docs", "action": {"class_name": "UpdateDataDocsAction"}},
    ],
    # The result_format set here applies to every Expectation in the Suite.
    runtime_configuration={
        "result_format": {
            "result_format": "COMPLETE",
            "unexpected_index_column_names": ["pk_column"],
            "return_unexpected_index_query": True,
        },
    },
)

The result_format configuration for the Checkpoint is defined under the runtime_configuration key.

The results are stored in the Validation Result after running the Checkpoint.

note

The unexpected_index_list, as represented by primary key (PK) columns, is rendered in Data Docs when COMPLETE is selected.

The unexpected_index_query is also rendered by default when COMPLETE is selected. For SQL and Spark, it is a query that retrieves the full set of unexpected values from the dataset. For Pandas, it returns the full set of unexpected indices, which can also be used to retrieve the full set of unexpected values. It is returned whether or not unexpected_index_column_names is defined.

To suppress this output, set the return_unexpected_index_query parameter to False.

Regardless of how Result Format is configured, unexpected_list is never rendered in Data Docs.

Column Map Expectations result format values and fields

| Fields within result | BOOLEAN_ONLY | BASIC | SUMMARY | COMPLETE |
|---|---|---|---|---|
| element_count | no | yes | yes | yes |
| missing_count | no | yes | yes | yes |
| missing_percent | no | yes | yes | yes |
| unexpected_count | no | yes | yes | yes |
| unexpected_percent | no | yes | yes | yes |
| unexpected_percent_nonmissing | no | yes | yes | yes |
| partial_unexpected_list | no | yes | yes | yes |
| partial_unexpected_index_list | no | no | yes | yes |
| partial_unexpected_counts | no | no | yes | yes |
| unexpected_index_list | no | no | no | yes |
| unexpected_index_query | no | no | no | yes |
| unexpected_list | no | no | no | yes |
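As a rough illustration, the table above can be encoded as a lookup so that code consuming a Validation Result knows which fields to expect. This is a sketch in plain Python, not GX code: the names `LEVELS`, `FIRST_LEVEL_RETURNING`, and `fields_returned` are hypothetical, and the mapping covers only the fields listed in the table.

```python
# Field availability for Column Map Expectations, transcribed from the table
# above. Each field maps to the least verbose result_format that returns it.
LEVELS = ["BOOLEAN_ONLY", "BASIC", "SUMMARY", "COMPLETE"]

FIRST_LEVEL_RETURNING = {
    "element_count": "BASIC",
    "missing_count": "BASIC",
    "missing_percent": "BASIC",
    "unexpected_count": "BASIC",
    "unexpected_percent": "BASIC",
    "unexpected_percent_nonmissing": "BASIC",
    "partial_unexpected_list": "BASIC",
    "partial_unexpected_index_list": "SUMMARY",
    "partial_unexpected_counts": "SUMMARY",
    "unexpected_index_list": "COMPLETE",
    "unexpected_index_query": "COMPLETE",
    "unexpected_list": "COMPLETE",
}


def fields_returned(level: str) -> list:
    """List the result fields the table marks as "yes" for this level."""
    rank = LEVELS.index(level)
    return [
        field
        for field, first in FIRST_LEVEL_RETURNING.items()
        if LEVELS.index(first) <= rank
    ]


print(fields_returned("SUMMARY"))
```

Because each level in the table is a superset of the previous one, storing only the first level that returns a field is enough to reproduce every column.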

Column Aggregate Expectations result format values and fields

| Fields within result | BOOLEAN_ONLY | BASIC | SUMMARY | COMPLETE |
|---|---|---|---|---|
| observed_value | no | yes | yes | yes |

Example use cases for different result_format values

| result_format setting | Example use case |
|---|---|
| BOOLEAN_ONLY | Automatic validation. No result is returned. |
| BASIC | Exploratory analysis in a notebook. |
| SUMMARY | Detailed exploratory work with follow-on investigation. |
| COMPLETE | Debugging pipelines or developing detailed regression tests. |

Examples

The examples in this section use the following Pandas DataFrame:

import pandas as pd

dataframe = pd.DataFrame(
    {
        "pk_column": ["zero", "one", "two", "three", "four", "five", "six", "seven"],
        "my_var": ["A", "B", "B", "C", "C", "C", "D", "D"],
        "my_numbers": [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0],
    }
)

Behavior for BOOLEAN_ONLY

When the result_format is BOOLEAN_ONLY, no result is returned. The result of evaluating the Expectation is exclusively returned via the value of the success parameter.

For example:

validation_result = my_validator.expect_column_values_to_be_in_set(
    column="my_var",
    value_set=["A", "B"],
    result_format={"result_format": "BOOLEAN_ONLY"},
)

Returns the following output:

assert validation_result.success == False
assert validation_result.result == {}

Behavior for BASIC

For BASIC format, a result is generated with a basic justification for why an Expectation failed or succeeded. The format is intended for quick feedback and it works well in Jupyter Notebooks.

GX has standard behavior for describing the results of ColumnMapExpectation and ColumnAggregateExpectation Expectations.

A ColumnMapExpectation applies a boolean test function to each element within a column, and so returns a list of unexpected values to justify the Expectation result.

The basic result includes:

{
    "success": Boolean,
    "result": {
        "partial_unexpected_list": [A list of up to 20 values that violate the Expectation],
        "unexpected_count": The total count of unexpected values in the column,
        "unexpected_percent": The overall percent of unexpected values,
        "unexpected_percent_nonmissing": The percent of unexpected values, excluding missing values from the denominator
    }
}

Note: When unexpected values are duplicated, partial_unexpected_list contains multiple copies of the value. The same applies to unexpected_list in COMPLETE.

For example:

validation_result = my_validator.expect_column_values_to_be_in_set(
    column="my_var", value_set=["A", "B"], result_format={"result_format": "BASIC"}
)

Returns the following output:

assert validation_result.success == False
assert validation_result.result == {
    "element_count": 8,
    "unexpected_count": 5,
    "unexpected_percent": 62.5,
    "partial_unexpected_list": ["C", "C", "C", "D", "D"],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 62.5,
    "unexpected_percent_nonmissing": 62.5,
}
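To see where these numbers come from, the counts and percentages can be recomputed by hand from the sample data. This is a plain Python sketch of the arithmetic only, under the assumption that the column has no missing values. It does not show how GX computes the result internally.

```python
# Recompute the BASIC fields for my_var by hand (illustrative only).
my_var = ["A", "B", "B", "C", "C", "C", "D", "D"]
value_set = {"A", "B"}

# The sample column has no missing values, so every element is evaluated.
element_count = len(my_var)
unexpected = [value for value in my_var if value not in value_set]
unexpected_count = len(unexpected)
unexpected_percent = 100.0 * unexpected_count / element_count

# The BASIC description above keeps at most 20 values in the partial list.
partial_unexpected_list = unexpected[:20]

print(element_count, unexpected_count, unexpected_percent)
print(partial_unexpected_list)
```

Five of the eight values fall outside the set, which gives 5 / 8 = 62.5 percent. Duplicated values appear once per occurrence in the partial list.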

ColumnAggregateExpectation computes a single aggregate value for the column, and so returns a single observed_value to justify the Expectation result.

The basic result includes:

{
    "success": Boolean,
    "result": {
        "observed_value": The aggregate statistic computed for the column
    }
}

For example:

validation_result = my_validator.expect_column_mean_to_be_between(
    column="my_numbers", min_value=0.0, max_value=10.0, result_format="BASIC"
)

Returns the following output:

assert validation_result.success == True
assert validation_result.result == {"observed_value": 2.75}
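The observed_value here is the column mean. As a quick check of the arithmetic, the following plain Python sketch recomputes it from the sample data and compares it with the bounds used above. It is illustrative only and does not call GX.

```python
from statistics import mean

my_numbers = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0]
min_value, max_value = 0.0, 10.0

# The values sum to 22.0 over 8 elements.
observed_value = mean(my_numbers)

# The Expectation succeeds when the mean lies within the inclusive bounds.
success = min_value <= observed_value <= max_value

print(observed_value, success)
```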

Behavior for SUMMARY

A result is generated with a summary justification for why an Expectation was successful or unsuccessful. The format is intended for more detailed exploratory work and includes additional information beyond what is included by BASIC. For example, it can support generating dashboard results of whether a set of Expectations is being met.

GX has standard behavior for describing the results of ColumnMapExpectation and ColumnAggregateExpectation Expectations.

A ColumnMapExpectation applies a boolean test function to each element within a column, and so returns a list of unexpected values to justify the Expectation result.

The summary result includes:

{
    "success": Boolean,
    "result": {
        "element_count": The total number of values in the column,
        "unexpected_count": The total count of unexpected values in the column (also in `BASIC`),
        "unexpected_percent": The overall percent of unexpected values (also in `BASIC`),
        "unexpected_percent_nonmissing": The percent of unexpected values, excluding missing values from the denominator (also in `BASIC`),
        "partial_unexpected_list": [A list of up to 20 values that violate the Expectation] (also in `BASIC`),
        "missing_count": The number of missing values in the column,
        "missing_percent": The total percent of missing values in the column,
        "partial_unexpected_counts": [A list of objects with value and count, showing the number of times each of the unexpected values occurs],
        "partial_unexpected_index_list": [A list of up to 20 of the indices of the unexpected values in the column, as defined by the columns in `unexpected_index_column_names`]
    }
}

For example:

validation_result = my_validator.expect_column_values_to_be_in_set(
    column="my_var",
    value_set=["A", "B"],
    result_format={
        "result_format": "SUMMARY",
        "unexpected_index_column_names": ["pk_column"],
        "return_unexpected_index_query": True,
    },
)

Returns the following output:

assert validation_result.success == False
assert validation_result.result == {
    "element_count": 8,
    "unexpected_count": 5,
    "unexpected_percent": 62.5,
    "partial_unexpected_list": ["C", "C", "C", "D", "D"],
    "unexpected_index_column_names": ["pk_column"],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 62.5,
    "unexpected_percent_nonmissing": 62.5,
    "partial_unexpected_index_list": [
        {"my_var": "C", "pk_column": "three"},
        {"my_var": "C", "pk_column": "four"},
        {"my_var": "C", "pk_column": "five"},
        {"my_var": "D", "pk_column": "six"},
        {"my_var": "D", "pk_column": "seven"},
    ],
    "partial_unexpected_counts": [
        {"value": "C", "count": 3},
        {"value": "D", "count": 2},
    ],
}
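The two fields that SUMMARY adds can be understood as a tally of the unexpected values and a list pairing each unexpected value with its identifier columns. The following plain Python sketch rebuilds both from the sample data. It illustrates the shape of the output under the assumption that pk_column is the only index column, and it is not the GX implementation.

```python
from collections import Counter

pk_column = ["zero", "one", "two", "three", "four", "five", "six", "seven"]
my_var = ["A", "B", "B", "C", "C", "C", "D", "D"]
value_set = {"A", "B"}

# Pair each unexpected value with its primary key, in row order.
unexpected_pairs = [
    (pk, value) for pk, value in zip(pk_column, my_var) if value not in value_set
]

# One entry per unexpected row, limited to 20 entries as described above.
partial_unexpected_index_list = [
    {"my_var": value, "pk_column": pk} for pk, value in unexpected_pairs[:20]
]

# Tally how often each unexpected value occurs, most frequent first.
tally = Counter(value for _, value in unexpected_pairs)
partial_unexpected_counts = [
    {"value": value, "count": count} for value, count in tally.most_common(20)
]

print(partial_unexpected_counts)
```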

ColumnAggregateExpectation computes a single aggregate value for the column, and so returns an observed_value to justify the Expectation result. It also includes additional information regarding observed values and counts, depending on the specific Expectation.

The summary result includes:

{
    "success": Boolean,
    "result": {
        "observed_value": The aggregate statistic computed for the column (also in `BASIC`)
    }
}

For example:

validation_result = my_validator.expect_column_mean_to_be_between(
    column="my_numbers", min_value=0.0, max_value=10.0, result_format="SUMMARY"
)

Returns the following output:

assert validation_result.success == True
assert validation_result.result == {"observed_value": 2.75}

Behavior for COMPLETE

A result is generated with all available justification for why an Expectation was successful or unsuccessful. The format is intended for debugging pipelines or developing detailed regression tests.

GX has standard behavior for describing the results of ColumnMapExpectation and ColumnAggregateExpectation Expectations.

A ColumnMapExpectation applies a boolean test function to each element within a column, and so returns a list of unexpected values to justify the Expectation result.

The complete result includes:

{
    "success": Boolean,
    "result": {
        "unexpected_list": [A list of all values that violate the Expectation],
        "unexpected_index_list": [A list of the indices of the unexpected values in the column, as defined by the columns in `unexpected_index_column_names`],
        "unexpected_index_query": A query that can be used to retrieve all unexpected values (SQL and Spark), or the full list of unexpected indices (Pandas),
        "element_count": The total number of values in the column (also in `SUMMARY`),
        "unexpected_count": The total count of unexpected values in the column (also in `SUMMARY`),
        "unexpected_percent": The overall percent of unexpected values (also in `SUMMARY`),
        "unexpected_percent_nonmissing": The percent of unexpected values, excluding missing values from the denominator (also in `SUMMARY`),
        "missing_count": The number of missing values in the column (also in `SUMMARY`),
        "missing_percent": The total percent of missing values in the column (also in `SUMMARY`)
    }
}

For example:

validation_result = my_validator.expect_column_values_to_be_in_set(
    column="my_var",
    value_set=["A", "B"],
    result_format={
        "result_format": "COMPLETE",
        "unexpected_index_column_names": ["pk_column"],
        "return_unexpected_index_query": True,
    },
)

Returns the following output:

assert validation_result.success == False
assert validation_result.result == {
    "element_count": 8,
    "unexpected_count": 5,
    "unexpected_percent": 62.5,
    "partial_unexpected_list": ["C", "C", "C", "D", "D"],
    "unexpected_index_column_names": ["pk_column"],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 62.5,
    "unexpected_percent_nonmissing": 62.5,
    "partial_unexpected_index_list": [
        {"my_var": "C", "pk_column": "three"},
        {"my_var": "C", "pk_column": "four"},
        {"my_var": "C", "pk_column": "five"},
        {"my_var": "D", "pk_column": "six"},
        {"my_var": "D", "pk_column": "seven"},
    ],
    "partial_unexpected_counts": [
        {"value": "C", "count": 3},
        {"value": "D", "count": 2},
    ],
    "unexpected_list": ["C", "C", "C", "D", "D"],
    "unexpected_index_list": [
        {"my_var": "C", "pk_column": "three"},
        {"my_var": "C", "pk_column": "four"},
        {"my_var": "C", "pk_column": "five"},
        {"my_var": "D", "pk_column": "six"},
        {"my_var": "D", "pk_column": "seven"},
    ],
    "unexpected_index_query": "df.filter(items=[3, 4, 5, 6, 7], axis=0)",
}
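One way to use the COMPLETE output for data cleaning is to match unexpected_index_list back to the source records by primary key. The sketch below does this in plain Python, with the sample data held as a list of dictionaries rather than a DataFrame. The variable names are hypothetical, and the index list is copied from the output above instead of being read from a live Validation Result.

```python
# Sample data as plain records (same values as the example DataFrame).
rows = [
    {"pk_column": pk, "my_var": var, "my_numbers": num}
    for pk, var, num in zip(
        ["zero", "one", "two", "three", "four", "five", "six", "seven"],
        ["A", "B", "B", "C", "C", "C", "D", "D"],
        [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0],
    )
]

# Copied from the COMPLETE output shown above.
unexpected_index_list = [
    {"my_var": "C", "pk_column": "three"},
    {"my_var": "C", "pk_column": "four"},
    {"my_var": "C", "pk_column": "five"},
    {"my_var": "D", "pk_column": "six"},
    {"my_var": "D", "pk_column": "seven"},
]

# Split the records into failing and passing rows by primary key.
failing_keys = {entry["pk_column"] for entry in unexpected_index_list}
failing_rows = [row for row in rows if row["pk_column"] in failing_keys]
clean_rows = [row for row in rows if row["pk_column"] not in failing_keys]

print(len(failing_rows), len(clean_rows))
```

Matching on the primary key rather than on positional indices keeps the lookup valid even if the source data is re-sorted between validation and cleanup.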

ColumnAggregateExpectation computes a single aggregate value for the column, and so returns an observed_value to justify the Expectation result. It also includes additional information regarding observed values and counts, depending on the specific Expectation.

The complete result includes:

{
    "success": Boolean,
    "result": {
        "observed_value": The aggregate statistic computed for the column (also in `SUMMARY`),
        "details": {<Expectation-specific result justification fields, which may be more detailed than in `SUMMARY`>}
    }
}

For example:

validation_result = my_validator.expect_column_mean_to_be_between(
    column="my_numbers", min_value=0.0, max_value=10.0, result_format="COMPLETE"
)

Returns the following output:

assert validation_result.success == True
assert validation_result.result == {"observed_value": 2.75}