How to create and edit Expectations in bulk
The JsonSchemaProfiler helps you quickly create Expectation Suites from jsonschema files.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- A working installation of Great Expectations
- A valid jsonschema file whose top-level schema is of type object
Note that this implementation does not traverse any levels of nesting: only the top-level properties of the schema are used.
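Before running the steps, you may want to confirm that a schema meets these prerequisites. The sketch below uses a hypothetical helper, `check_schema_prerequisites` (not part of Great Expectations), that checks the top-level type and lists top-level properties holding nested data. It assumes that both `object` and `array` properties count as nesting, so their inner structure will not be traversed.

```python
from typing import Any, Dict, List


def check_schema_prerequisites(schema: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: require a top-level object schema and return the
    names of top-level properties whose values are nested (object or array)."""
    if schema.get("type") != "object":
        raise ValueError('Top-level schema must have "type": "object"')
    nested = []
    for name, prop in schema.get("properties", {}).items():
        # Boolean schemas (true/false) have no "type" to inspect; skip them.
        if not isinstance(prop, dict):
            continue
        prop_type = prop.get("type")
        # "type" may be a single string or a list of allowed types.
        types = prop_type if isinstance(prop_type, list) else [prop_type]
        if "object" in types or "array" in types:
            nested.append(name)
    return nested


example_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "email": {"type": "string"},
        "address": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}
print(check_schema_prerequisites(example_schema))  # ['address']
```

Here the `city` field inside `address` would not be profiled, because it sits one level below the top.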
Steps
1. Set a filename and a suite name
jsonschema_file = "YOUR_JSON_SCHEMA_FILE.json"
suite_name = "YOUR_SUITE_NAME"
2. Load a DataContext
import great_expectations as ge
context = ge.data_context.DataContext()
3. Load the jsonschema file
import json
with open(jsonschema_file, "r") as f:
    schema = json.load(f)
4. Instantiate the profiler
from great_expectations.profile.json_schema_profiler import JsonSchemaProfiler
profiler = JsonSchemaProfiler()
5. Create the suite
suite = profiler.profile(schema, suite_name)
6. Save the suite
context.save_expectation_suite(suite)
7. Optionally, generate Data Docs and review the results there.
Data Docs provide a concise and useful way to review the Expectation Suite that has been created.
In Python, this is done by calling the build_data_docs() method of your Data Context:
context.build_data_docs()
You can also review and update the Expectations created by the Profiler, refining them into the Expectation Suite you want, using:
great_expectations suite edit YOUR_SUITE_NAME
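To try the steps above before pointing them at a real schema, you could first write a small schema to the placeholder filename from step 1. The sketch below does that with the standard `json` module. The property names and constraints are hypothetical examples, and this sketch does not show which Expectations the profiler derives from each keyword.

```python
import json

jsonschema_file = "YOUR_JSON_SCHEMA_FILE.json"

# Hypothetical example schema: a flat object (no nesting) whose top-level
# properties each describe one field of a record.
example_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "status": {"type": "string", "enum": ["active", "inactive"]},
        "score": {"type": "number", "maximum": 100},
    },
    "required": ["id"],
}

# Write the schema so that step 3 can load it.
with open(jsonschema_file, "w") as f:
    json.dump(example_schema, f, indent=2)
```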
Additional notes
Note that JsonSchemaProfiler generates Expectation Suites using column map Expectations, which assume a tabular data structure, because Great Expectations does not currently support nested data structures.
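To make the "column map" idea concrete: with tabular data, each top-level schema property corresponds to a column and each record to a row, and a column map Expectation evaluates every value in a column independently. The plain-Python sketch below illustrates only that idea. `column_map_check` is a hypothetical helper, not part of the Great Expectations API, and its result keys are made up for illustration.

```python
from typing import Any, Callable, Dict, List


def column_map_check(
    rows: List[Dict[str, Any]], column: str, predicate: Callable[[Any], bool]
) -> Dict[str, Any]:
    """Illustrative only: apply a per-value predicate to one column of
    tabular records and collect the values that fail it."""
    values = [row.get(column) for row in rows]
    unexpected = [v for v in values if not predicate(v)]
    return {"success": not unexpected, "checked": len(values), "unexpected": unexpected}


# Each record is a row; each top-level schema property is a column.
rows = [
    {"id": 1, "status": "active"},
    {"id": 2, "status": "paused"},
    {"id": 3, "status": "inactive"},
]
result = column_map_check(rows, "status", lambda v: v in {"active", "inactive"})
print(result)  # {'success': False, 'checked': 3, 'unexpected': ['paused']}
```

Because the predicate sees one value at a time, a nested value such as a dictionary inside a column has no natural per-column check, which is why the profiler stays at the top level.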
The full example script is here:
import json
import great_expectations as ge
from great_expectations.profile.json_schema_profiler import JsonSchemaProfiler
jsonschema_file = "YOUR_JSON_SCHEMA_FILE.json"
suite_name = "YOUR_SUITE_NAME"
context = ge.data_context.DataContext()
with open(jsonschema_file, "r") as f:
    raw_json = f.read()
schema = json.loads(raw_json)
print("Generating suite...")
profiler = JsonSchemaProfiler()
suite = profiler.profile(schema, suite_name)
context.save_expectation_suite(suite)