Version: 0.15.50

Datasource

Workflow overview: Setup → Connect to Data → Create Expectations → Validate Data

Definition

A Datasource provides a standard API for accessing and interacting with data from a wide variety of source systems.

Features and promises

Datasources provide a unified API across multiple backends: the Datasource API remains the same for PostgreSQL, CSV Filesystems, and all other supported data backends.

Important:

Datasources do not modify your data.

Relationship to other objects

Datasources function by bringing together a way of interacting with data (an Execution Engine, a system capable of processing data to compute Metrics) with a way of accessing that data (a Data Connector, which provides the configuration details, based on the source data system, that a Datasource needs to define Data Assets). Batch Requests, which are provided to a Datasource in order to create a Batch, utilize Datasources in order to return a Batch of data (a selection of records from a Data Asset).
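As a minimal sketch of how these pieces fit together in version 0.15.x, the snippet below builds a Batch Request and asks the Data Context to resolve it into Batches. The Datasource, Data Connector, and Data Asset names ("my_datasource", "my_data_connector", "my_table") are placeholder assumptions for illustration.

import great_expectations as gx
from great_expectations.core.batch import BatchRequest

context = gx.get_context()

# A Batch Request names the Datasource, Data Connector, and Data Asset to read from.
batch_request = BatchRequest(
    datasource_name="my_datasource",
    data_connector_name="my_data_connector",
    data_asset_name="my_table",
)

# The Datasource resolves the request into one or more Batches of data.
batches = context.get_batch_list(batch_request=batch_request)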

Use Cases


Connect to Data

When connecting to data, the Datasource is your primary tool. At this stage, you will create Datasources to define how Great Expectations can find and access your Data Assets (collections of records within a Datasource, usually named after the underlying data system and sliced to correspond to a desired specification). Under the hood, each Datasource must have an Execution Engine and one or more Data Connectors configured. Once a Datasource is configured, you will be able to work with the Datasource's API rather than needing a different API for each data backend you may be working with.
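As a rough sketch of what this configuration can look like in Python for version 0.15.x, the example below pairs a PandasExecutionEngine with an InferredAssetFilesystemDataConnector. The Datasource name, base_directory, and regex pattern are illustrative assumptions rather than required values.

import great_expectations as gx

context = gx.get_context()

# Illustrative configuration: one Execution Engine plus one Data Connector.
datasource_config = {
    "name": "my_filesystem_datasource",
    "class_name": "Datasource",
    "execution_engine": {"class_name": "PandasExecutionEngine"},
    "data_connectors": {
        "my_inferred_data_connector": {
            "class_name": "InferredAssetFilesystemDataConnector",
            "base_directory": "./data",  # assumed location of the CSV files
            "default_regex": {
                "group_names": ["data_asset_name"],
                "pattern": "(.*)\\.csv",
            },
        }
    },
}

context.add_datasource(**datasource_config)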


Create Expectations

When creating Expectations (verifiable assertions about data), you will use your Datasources to obtain Batches (selections of records from a Data Asset) for Profilers (which generate Metrics and candidate Expectations from data) to analyze. Datasources also provide Batches for Expectation Suites (collections of verifiable assertions about data), such as when you use the interactive workflow to create new Expectations.
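A minimal sketch of the interactive workflow in version 0.15.x is shown below; the Datasource, Data Connector, Data Asset, and suite names are placeholder assumptions.

import great_expectations as gx
from great_expectations.core.batch import BatchRequest

context = gx.get_context()
context.create_expectation_suite("my_expectation_suite", overwrite_existing=True)

# The Datasource supplies a Batch to the Validator via a Batch Request.
validator = context.get_validator(
    batch_request=BatchRequest(
        datasource_name="my_datasource",
        data_connector_name="my_data_connector",
        data_asset_name="my_table",
    ),
    expectation_suite_name="my_expectation_suite",
)

# Expectations created interactively are evaluated against the Batch immediately.
validator.expect_column_values_to_not_be_null(column="id")
validator.save_expectation_suite(discard_failed_expectations=False)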


Validate Data

Datasources are also used to obtain the Batches that Validators (which run an Expectation Suite against data) run against when you are validating data.
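For instance, a Checkpoint pairs Batch Requests (resolved by a Datasource into Batches) with Expectation Suites for a Validator to run. The sketch below uses SimpleCheckpoint with placeholder names for the Datasource, Data Asset, suite, and Checkpoint, which are assumptions for illustration.

import great_expectations as gx

context = gx.get_context()

# Illustrative Checkpoint: each validation entry pairs a Batch Request with a suite.
context.add_checkpoint(
    name="my_checkpoint",
    class_name="SimpleCheckpoint",
    validations=[
        {
            "batch_request": {
                "datasource_name": "my_datasource",
                "data_connector_name": "my_data_connector",
                "data_asset_name": "my_table",
            },
            "expectation_suite_name": "my_expectation_suite",
        }
    ],
)

# The Datasource resolves each batch_request into a Batch for the Validator to check.
result = context.run_checkpoint(checkpoint_name="my_checkpoint")
print(result.success)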

Features

Unified API

Datasources support connecting to a variety of different data backends. No matter which source data system you employ, the Datasource's API will remain the same.

No Unexpected Modifications

Datasources do not modify your data during profiling or validation, but they may create temporary artifacts to optimize computing Metrics and Validation. This behaviour can be configured at the Data Connector level.

API Basics

How to access

You will typically only access your Datasource directly through Python code, which can be executed from a script, a Python console, or a Jupyter Notebook. To access a Datasource, all you need is a Data Context (the primary entry point for a Great Expectations deployment, with configurations and methods for all supporting components) and the name of the Datasource you want to access, as shown below:

Python console:
import great_expectations as gx

context = gx.get_context()
datasource = context.get_datasource("my_datasource_name")

How to create and configure

Creating a Datasource is quick and easy, and can be done from the CLI (Command Line Interface) or through Python code. Configuring the Datasource may differ between backends, according to the given backend's requirements, but the process of creating one remains the same.

To create a new Datasource through the CLI, run great_expectations datasource new.

To create a new Datasource through Python code, obtain a data context and call its add_datasource method.

Advanced users may also create a Datasource directly through a YAML config file.
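An equivalent configuration can be expressed in YAML. The sketch below applies such a YAML block from Python, using test_yaml_config to check the configuration before persisting it; the Datasource name, base_directory, and regex pattern are again illustrative assumptions.

import great_expectations as gx
import yaml  # PyYAML, used here only to parse the inline config string

context = gx.get_context()

# Illustrative YAML configuration mirroring the dict-based example above.
datasource_yaml = """
name: my_filesystem_datasource
class_name: Datasource
execution_engine:
  class_name: PandasExecutionEngine
data_connectors:
  my_inferred_data_connector:
    class_name: InferredAssetFilesystemDataConnector
    base_directory: ./data
    default_regex:
      group_names:
        - data_asset_name
      pattern: (.*)\\.csv
"""

# Check the configuration without saving it, then persist the Datasource.
context.test_yaml_config(yaml_config=datasource_yaml)
context.add_datasource(**yaml.safe_load(datasource_yaml))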

For detailed instructions on how to create Datasources that are configured for various backends, see our documentation on Connecting to Data.