How to connect to SQL data using a query
In this guide we will demonstrate how to connect Great Expectations to the data returned by a query in a generic SQL database. GX uses SQLAlchemy to connect to SQL data, and therefore supports most SQL dialects that SQLAlchemy does. For more information on the SQL dialects supported by SQLAlchemy, see Dialects.
To connect to the contents of a SQL table instead of the results of a SQL query, see our guide on how to connect to a SQL table, instead.
Prerequisites
- An installation of GX set up to work with SQL. See How to set up GX to work with SQL databases.
- Source data stored in a SQL database.
If you still need to connect a Datasource to a SQL database
Please reference the appropriate one of these guides:
General SQL Datasources
Specific SQL dialects
Details
If you're using a Datasource that was created with the advanced block-config method, see How to configure a SQL Datasource with the block-config method, instead.
Steps
1. Import GX and instantiate a Data Context
The code to import Great Expectations and instantiate a Data Context is:
import great_expectations as gx
context = gx.get_context()
2. Retrieve a SQL Datasource
For this guide, we will use a previously defined
Datasource named
"my_datasource"
. For purposes
of our demonstration, this Datasource was configured
to connect to a SQL database.
To retrieve this Datasource, we will supply the
get_datasource(...)
method of our Data
Context with the name of the Datasource we wish to
retrieve:
datasource = context.get_datasource("my_datasource")
3. Add a query to the Datasource as a Data Asset
To indicate the query that provides the data to
connect to we will define a Data Asset. This is done
by providing the
add_query_asset(...)
method a
name
by which we will reference the Data
Asset in the future and a query
which
will provide the data we wish the Data Asset to
connect to.
table_asset = datasource.add_query_asset(
name="version-0.16.16 my_asset",
query="SELECT passenger_count, total_amount FROM yellow_tripdata_sample_2019_01",
)
4. (Optional) Repeat step 3 as needed to add additional queries
If you wish to connect to the contents of additional queries in the same SQL Database, simply repeat the steps above to add them as query Data Assets.
Next steps
Now that you have connected to the data returned by a SQL query, you may want to look into: