Skip to main content
Version: 0.17.23

Connect to SQL database source data

Use the information provided here to connect to source data stored in SQL databases. Great Expectations (GX) uses SQLAlchemy to connect to SQL source data, and most of the SQL dialects supported by SQLAlchemy are also supported by GX. For more information about the SQL dialects supported by SQLAlchemy, see Dialects.

SQL

Connect GX to a SQL database to access source data.

Prerequisites

Import GX and instantiate a Data Context

Run the following Python code to import GX and instantiate a Data Context:

import great_expectations as gx

context = gx.get_context()

Determine your connection string

GX supports numerous SQL source data systems. However, most SQL dialects have their own specifications for defining a connection string. See the dialect documentation to determine the connection string for your SQL database.

Some examples of different connection strings:

The following are some of the connection strings that are available for different SQL dialects:

  • AWS Athena: awsathena+rest://@athena.<REGION>.amazonaws.com/<DATABASE>?s3_staging_dir=<S3_PATH>
  • BigQuery: bigquery://<GCP_PROJECT>/<BIGQUERY_DATASET>?credentials_path=/path/to/your/credentials.json
  • MSSQL: mssql+pyodbc://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?driver=<DRIVER>&charset=utf&autocommit=true
  • MySQL: mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>
  • PostgreSQL: postgresql+psycopg2://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>
  • Redshift: postgresql+psycopg2://<USER_NAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>?sslmode=<SSLMODE>
  • Snowflake: snowflake://<USER_NAME>:<PASSWORD>@<ACCOUNT_NAME>/<DATABASE_NAME>/<SCHEMA_NAME>?warehouse=<WAREHOUSE_NAME>&role=<ROLE_NAME>&application=great_expectations_oss
  • SQLite: sqlite:///<PATH_TO_DB_FILE>
  • Trino: trino://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<CATALOG>/<SCHEMA>

Run one of the connection strings in your preferred SQL dialect to store the connection string in the connection_string variable with plain text credentials. The following code is an example of the PostgreSQL connection string format:

connection_string = "postgresql+psycopg2://username:my_password@localhost/test"
Is there a more secure way to include my credentials?

You can use environment variables or a key in config_variables.yml to store connection string passwords. After you define your password, you reference it in your connection string similar to this example:

connection_string = (
"postgresql+psycopg2://<username>:${MY_PASSWORD}@<host>:<port>/<database>"
)

In the previous example MY_PASSWORD is the name of the environment variable, or the key to the value in config_variables.yml that corresponds to your password.

If you include a password as plain text in your connection string when you define your Data Source, GX automatically removes it, adds it to config_variables.yml, and substitutes it in the Data Source saved configuration with a variable.

Create a SQL Data Source

Run the following Python code to create a SQL Data Source:

datasource = context.sources.add_sql(
name="my_datasource", connection_string=connection_string
)