How to connect to SQLite data
In this guide we will demonstrate how to connect Great Expectations to data in a SQLite database. We will demonstrate how to create a SQLite Datasource. With our SQLite Datasource we will then show the methods for connecting to data in a SQLite table and connecting to data from a SQLite query.
Prerequisites
- An installation of GX set up to work with SQL
- Source data stored in a SQLite database
Steps
1. Import GX and instantiate a Data Context
The code to import Great Expectations and instantiate a Data Context is:
import great_expectations as gx
context = gx.get_context()
2. Determine your connection string
For this example we will use a connection string to connect to our PostgreSQL database. In PostgreSQL, connection strings are formatted like:
my_connection_string = "sqlite:///<path_to_db_file>"
3. Create a SQLite Datasource
Creating a PostgreSQL Datasource is as simple as
providing the add_sqlite(...)
method a
name
by which to reference it in the
future and the connection_string
with
which to access it.
datasource_name = "version-0.16.16 my_datasource"
With these two values, we can create our Datasource:
datasource = context.sources.add_sqlite(
name=datasource_name, connection_string=my_connection_string
)
add_sql(...)
vs
add_sqlite(...)
to create a
Datasource
The basic SQL Datasource created with
add_sql
can connect to data in a
SQLite database, but it won't work as well as
a SQLite Datasource from
add_sqlite(...)
.
SQLite stores datetime values as strings. Because of this, a general SQL Datasource will see datetime columns as string columns, instead. The SQLite Datasource has additional handling in place for these fields, and also has additional error reporting for SQLite specific issues.
If you are working with SQLite data, you should
always use add_sqlite(...)
to create
your Datasource! The
add_sql(...)
method may connect to
your SQLite database, but it won't handle
datetime columns properly or report errors as
clearly.
4. (Optional) Connect to the data in a table
We will indicate a table to connect to with a Table
Data Asset. This is done by providing the
add_table_asset(...)
method a
name
by which we will reference the Data
Asset in the future and a table_name
to
specify the table we wish the Data Asset to connect
to.
asset_name = "version-0.16.16 my_asset"
asset_table_name = my_table_name
With these two values, we can create our Data Asset:
table_asset = datasource.add_table_asset(name=asset_name, table_name=asset_table_name)
5. (Optional) Connect to the data in a query
To indicate the query that provides data to connect to
we will define a Query Data Asset. This done by
providing the add_query_asset(...)
method
a name
by which we will reference the
Data Asset in the future and a
query
which will provide the data we wish
the Data Asset to connect to.
asset_name = "version-0.16.16 my_query_asset"
query = "SELECT * from yellow_tripdata_sample_2019_01"
Once we have these two values, we can create our Data Asset with:
query_asset = datasource.add_query_asset(name=asset_name, query=query)
6. (Optional) Repeat steps 4 or 5 as needed to add additional tables or queries
If you wish to connect to additional tables or queries in the same PostgreSQL Database, simply repeat the steps above to add them as table Data Assets.
Next steps
Now that you have connected to a SQLite Database and created a Data Asset, you may want to look into: