How to create a new Expectation Suite by profiling from a jsonschema file
The JsonSchemaProfiler helps you quickly create Expectation Suites (collections of verifiable assertions about data) from jsonschema files.
Prerequisites: This how-to guide assumes you have:
- Completed the Getting Started Tutorial
- A working installation of Great Expectations
- Set up a working deployment of Great Expectations
- A valid jsonschema file whose top-level object is of type object
This implementation does not traverse any levels of nesting.
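To make the last prerequisite concrete, here is a minimal sketch that uses only the Python standard library. The example schema and the helper name describe_top_level are hypothetical, made up for illustration, and are not part of Great Expectations. The helper confirms that the top-level type is object and reports which top-level properties are themselves objects, since their contents sit one level down and would not be traversed.

```python
import json

# Hypothetical example schema; the field names are illustrative only.
EXAMPLE_SCHEMA = json.loads("""
{
  "type": "object",
  "properties": {
    "id": {"type": "integer", "minimum": 1},
    "name": {"type": "string", "maxLength": 50},
    "address": {
      "type": "object",
      "properties": {"city": {"type": "string"}}
    }
  }
}
""")


def describe_top_level(schema):
    """Return (flat, nested) lists of top-level property names.

    Raises ValueError if the top-level object is not of type "object".
    """
    if not isinstance(schema, dict) or schema.get("type") != "object":
        raise ValueError('top-level schema must have "type": "object"')
    flat, nested = [], []
    for name, details in schema.get("properties", {}).items():
        # A property that is itself an object has contents one level down.
        if details.get("type") == "object":
            nested.append(name)
        else:
            flat.append(name)
    return flat, nested


flat, nested = describe_top_level(EXAMPLE_SCHEMA)
print("top-level properties:", flat)
print("nested (contents not traversed):", nested)
```

Running a check like this before profiling gives an early view of which fields to expect as columns in the resulting suite.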
Steps
1. Set a filename and a suite name
jsonschema_file = "YOUR_JSON_SCHEMA_FILE.json"
suite_name = "YOUR_SUITE_NAME"
2. Load a DataContext
context = gx.get_context()
3. Load the jsonschema file
with open(jsonschema_file, "r") as f:
    schema = json.load(f)
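If the file might be missing or malformed, a slightly more defensive loader can give clearer errors. The following is only a sketch built on the standard library; the helper name load_jsonschema is hypothetical and is not part of Great Expectations.

```python
import json
from pathlib import Path


def load_jsonschema(path):
    """Load a jsonschema file and check that its top-level type is "object"."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        schema = json.loads(text)
    except json.JSONDecodeError as err:
        # Re-raise with the file name so the failing input is easy to find.
        raise ValueError(f"{path} is not valid JSON: {err}") from err
    if not isinstance(schema, dict) or schema.get("type") != "object":
        raise ValueError(f'{path} must describe a top-level "object"')
    return schema
```

With such a helper, the load step could be written as schema = load_jsonschema(jsonschema_file).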
4. Instantiate the profiler
profiler = JsonSchemaProfiler()
5. Create the suite
suite = profiler.profile(schema, suite_name)
6. Save the suite
context.add_expectation_suite(expectation_suite=suite)
7. (Optional) Generate Data Docs and review the results
Data Docs (human-readable documentation generated from Great Expectations metadata detailing Expectations, Validation Results, etc.) provide a concise and useful way to review the Expectation Suite that has been created.
context.build_data_docs()
You can also review and update the Expectations (verifiable assertions about data) created by the Profiler (which generates Metrics and candidate Expectations from data) to get to the Expectation Suite you want using great_expectations suite edit.
Additional notes
Note that JsonSchemaProfiler generates Expectation Suites using column map Expectations, which assume a tabular data structure, because Great Expectations does not currently support nested data structures.
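To illustrate what treating a schema as tabular data can look like, the sketch below maps each top-level property to a column and derives a few expectation configurations as plain dictionaries. This is a simplified, hypothetical illustration and not the profiler's actual implementation: the helper name sketch_column_expectations is made up, and the suite that JsonSchemaProfiler really produces may differ. The expectation type names and kwargs used are existing Great Expectations ones.

```python
def sketch_column_expectations(schema):
    """Derive illustrative column-level expectation configs from a flat schema."""
    configs = []
    for column, details in schema.get("properties", {}).items():
        # Each top-level property is treated as one column.
        configs.append({
            "expectation_type": "expect_column_to_exist",
            "kwargs": {"column": column},
        })
        if "enum" in details:
            configs.append({
                "expectation_type": "expect_column_values_to_be_in_set",
                "kwargs": {"column": column, "value_set": details["enum"]},
            })
        if "minimum" in details or "maximum" in details:
            configs.append({
                "expectation_type": "expect_column_values_to_be_between",
                "kwargs": {
                    "column": column,
                    "min_value": details.get("minimum"),
                    "max_value": details.get("maximum"),
                },
            })
        if "minLength" in details or "maxLength" in details:
            configs.append({
                "expectation_type": "expect_column_value_lengths_to_be_between",
                "kwargs": {
                    "column": column,
                    "min_value": details.get("minLength"),
                    "max_value": details.get("maxLength"),
                },
            })
    return configs


# Hypothetical schema used only to exercise the sketch.
example = {
    "type": "object",
    "properties": {
        "id": {"type": "integer", "minimum": 1},
        "status": {"type": "string", "enum": ["open", "closed"]},
    },
}
for config in sketch_column_expectations(example):
    print(config["expectation_type"], config["kwargs"])
```

Because every expectation here is keyed by a single column name, a property nested inside another object has no natural place in this layout, which is consistent with the limitation described above.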
The full example script is here:
import json
import great_expectations as gx
from great_expectations.profile.json_schema_profiler import JsonSchemaProfiler
jsonschema_file = "YOUR_JSON_SCHEMA_FILE.json"
suite_name = "YOUR_SUITE_NAME"
context = gx.get_context()
with open(jsonschema_file, "r") as f:
    raw_json = f.read()
    schema = json.loads(raw_json)
print("Generating suite...")
profiler = JsonSchemaProfiler()
suite = profiler.profile(schema, suite_name)
context.add_expectation_suite(expectation_suite=suite)