How to quickly connect to a single file using Pandas
This guide demonstrates how to use Pandas to connect to data stored in files on a filesystem. The example connects specifically to data in .csv format, but Great Expectations (GX) supports most of the read methods available through Pandas.
Prerequisites
- A Great Expectations instance. See Install Great Expectations locally.
- A Data Context.
- Access to source data stored in a filesystem.
Steps
1. Import the Great Expectations module and instantiate a Data Context
The code to import Great Expectations and instantiate a Data Context is:
import great_expectations as gx
context = gx.get_context()
2. Specify a file to read into a Data Asset
Great Expectations supports reading the data in individual files directly into a Validator using Pandas. To do this, we will run the code:
validator = context.sources.pandas_default.read_csv(
"https://raw.githubusercontent.com/great-expectations/gx_tutorials/main/data/yellow_tripdata_sample_2019-01.csv"
)
In this example, we are connecting to a .csv file. However, Great Expectations supports connecting to most types of files that Pandas has read_* methods for.
Because you will be using Pandas to connect to these files, the specific add_*_asset methods available to you are determined by your currently installed version of Pandas. For more information on which Pandas read_* methods are available to you as add_*_asset methods, please reference the official Pandas Input/Output documentation for the version of Pandas that you have installed.
In the GX Python API, add_*_asset methods require the same parameters as the corresponding Pandas read_* method, with one caveat: in Great Expectations, you are also required to provide a value for an asset_name parameter.
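As a rough way to see which read_* methods your installed Pandas version exposes, the sketch below lists them and derives the add_*_asset name each would map to under the naming pattern described above. It uses only Pandas itself. The variable names read_methods and candidate_asset_methods are illustrative, and the derived names are candidates only: since GX supports most, not necessarily all, Pandas read methods, a listed name is not a guarantee that the corresponding method exists in your GX installation.

```python
import pandas as pd

# Collect the public read_* functions exposed by the installed pandas version.
read_methods = sorted(
    name
    for name in dir(pd)
    if name.startswith("read_") and callable(getattr(pd, name))
)

# Derive the add_*_asset name that would correspond to each read_* function.
# This only applies the naming pattern; it does not check that Great
# Expectations actually provides each method (assumption, not verified here).
candidate_asset_methods = {
    name: f"add_{name[len('read_'):]}_asset" for name in read_methods
}

print(f"pandas {pd.__version__}")
for read_name, asset_method in candidate_asset_methods.items():
    print(f"pd.{read_name} -> {asset_method}")
```

Comparing this list against the Pandas Input/Output documentation for your installed version is a quick check before you rely on a particular file format.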
Next steps
Now that you have a Validator, you can immediately move on to creating Expectations. For more information, please see: