How to connect to a PostgreSQL database
In this guide we will demonstrate how to connect Great Expectations to data in a PostgreSQL database. We will demonstrate how to create a PostgreSQL Datasource. With our PostgreSQL Datasource we will then show the methods for connecting to data in a PostgreSQL table and connecting to data from a PostgreSQL query.
Prerequisites
- An installation of GX set up to work with PostgreSQL
- Source data stored in a PostgreSQL database
Steps
1. Import GX and instantiate a Data Context
The code to import Great Expectations and instantiate a Data Context is:
import great_expectations as gx
context = gx.get_context()
2. Determine your connection string
For this example we will use a connection string to connect to our PostgreSQL database. In PostgreSQL, connection strings are formatted like:
We recommend that database credentials be stored
in the config_variables.yml
file,
which is located in the
uncommitted/
folder by default, and
is not part of source control. The following lines
add database credentials under the key
db_creds
.
db_creds:
drivername: postgres
host: '<your_host_name>'
port: '<your_port>'
username: '<your_username>'
password: '<your_password>'
database: '<your_database_name>'
For additional options on configuring the
config_variables.yml
file or
additional environment variables, please see our
guide on
how to configure credentials.
3. Create a PostgreSQL Datasource
Creating a PostgreSQL Datasource is as simple as
providing the add_postgres(...)
method a
name
by which to reference it in the
future and the connection_string
with
which to access it.
datasource_name = "version-0.16.16 my_datasource"
my_connection_string = (
"postgresql+psycopg2://<username>:<password>@<host>:<port>/<database>"
)
With these two values, we can create our Datasource:
datasource = context.sources.add_postgres(
name=datasource_name, connection_string=my_connection_string
)
4. Connect to a specific set of data with a Data Asset
Now that our Datasource has been created, we will use it to connect to a specific set of data in the database it is configured for. This is done by defining a Data Asset in the Datasource. A Datasource may contain multiple Data Assets, each of which will serve as the interface between GX and the specific set of data it has been configured for.
With SQL databases, there are two types of Data Assets that can be used. The first is a Table Data Asset, which connects GX to the data contained in a single table in the source database. The other is a Query Data Asset, which connects GX to the data returned by a SQL query. We will demonstrate how to create both of these in the following steps.
Although there is no set maximum number of Data Assets you can define for a datasource, there is a functional minimum. In order for GX to retrieve data from your Datasource you will need to create at least one Data Asset.
5. (Optional) Connect a Data Asset to the data in a table
We will indicate a table to connect to with a Table
Data Asset. This is done by providing the
add_table_asset(...)
method a
name
by which we will reference the Data
Asset in the future and a table_name
to
specify the table we wish the Data Asset to connect
to.
asset_name = "version-0.16.16 my_table_asset"
asset_table_name = "version-0.16.16 postgres_taxi_data"
With these two values, we can create our Data Asset:
table_asset = datasource.add_table_asset(name=asset_name, table_name=asset_table_name)
6. (Optional) Connect a Data Asset to the data returned by a query
To indicate the query that provides data to connect to
we will define a Query Data Asset. This done by
providing the add_query_asset(...)
method
a name
by which we will reference the
Data Asset in the future and a
query
which will provide the data we wish
the Data Asset to connect to.
asset_name = "version-0.16.16 my_query_asset"
asset_query = "SELECT * from postgres_taxi_data"
Once we have these two values, we can create our Data Asset with:
query_asset = datasource.add_query_asset(name=asset_name, query=asset_query)
7. (Optional) Repeat steps 5 and 6 as needed to connect to additional tables or queries
If you wish to connect to additional tables or queries in the same PostgreSQL Database, simply repeat the step above to add them as additional Data Assets.
Next steps
Now that you have connected to a PostgreSQL database and created a Data Asset, you may want to look into: