Batch
- class great_expectations.core.batch.Batch(data: Optional[Union[great_expectations.core.batch.BatchData, pandas.core.frame.DataFrame, pyspark.sql.dataframe.DataFrame]] = None, batch_request: Optional[Union[great_expectations.core.batch.BatchRequestBase, dict]] = None, batch_definition: Optional[great_expectations.core.batch.BatchDefinition] = None, batch_spec: Optional[great_expectations.core.id_dict.BatchSpec] = None, batch_markers: Optional[great_expectations.core.batch.BatchMarkers] = None, data_context=None, datasource_name=None, batch_parameters=None, batch_kwargs=None)
-
A Batch is a selection of records from a Data Asset.
A Datasource produces Batch objects to interact directly with data. Creating a Batch does NOT require moving data; the Batch facilitates access to the data and maintains metadata.
- Parameters
-
-
data – A BatchDataType object which interacts directly with the ExecutionEngine.
-
batch_request – BatchRequest that was used to obtain the data.
-
batch_definition – Complete BatchDefinition that describes the data.
-
batch_spec – Complete BatchSpec that describes the data.
-
batch_markers – Additional metadata that may be useful for understanding the Batch.
-
data_context –
DataContext connected to the
Deprecated since version 0.14.0.
-
datasource_name –
Name of the Datasource used to obtain the Batch.
Deprecated since version 0.14.0.
-
batch_parameters –
Keyword arguments describing the Batch data.
Deprecated since version 0.14.0.
-
batch_kwargs –
Keyword arguments used to request a Batch from a Datasource.
Deprecated since version 0.14.0.
-
- Returns
-
Batch instance created.
- head(n_rows=5, fetch_all=False)
-
Return the first n rows from the Batch.
It is useful for quickly checking that the Batch contains the data you expect.
The data is always fetched from the Datasource and returned as a Pandas DataFrame that is available locally.
- Parameters
-
-
n_rows – the number of rows to return
-
fetch_all – whether to fetch all rows; overrides n_rows if set to True
-
- Returns
-
A Pandas DataFrame
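The interaction between n_rows and fetch_all can be sketched with plain pandas. This is not the Great Expectations implementation: head_sketch is a hypothetical stand-in that only mirrors the documented behavior (return the first n_rows rows, unless fetch_all is True, in which case n_rows is ignored), and the sample DataFrame is invented for illustration.

```python
import pandas as pd


def head_sketch(df: pd.DataFrame, n_rows: int = 5, fetch_all: bool = False) -> pd.DataFrame:
    """Hypothetical helper mirroring the documented n_rows / fetch_all semantics."""
    if fetch_all:
        # fetch_all=True overrides n_rows: every row is returned.
        return df.copy()
    # Otherwise return only the first n_rows rows.
    return df.head(n_rows)


# Invented sample data standing in for the records behind a Batch.
records = pd.DataFrame({"id": range(10), "value": [x * 1.5 for x in range(10)]})

default_rows = head_sketch(records)                         # default n_rows=5
three_rows = head_sketch(records, n_rows=3)                 # explicit n_rows
all_rows = head_sketch(records, n_rows=3, fetch_all=True)   # n_rows is ignored

print(len(default_rows), len(three_rows), len(all_rows))
```

With a real Batch, the equivalent calls would be batch.head(), batch.head(n_rows=3), and batch.head(fetch_all=True), assuming batch was produced by a Datasource.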
- to_json_dict() → Dict[str, Optional[Union[Dict[str, JSONValues], List[JSONValues], str, int, float, bool]]]
-
Returns a JSON-serializable dict representation of this Batch.
- Returns
-
A JSON-serializable dict representation of this Batch.
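Because the return value is JSON-serializable, it can be passed straight to the standard json module. The sketch below uses a hand-written dictionary as a stand-in for the result of to_json_dict(); its keys and values are assumptions chosen for illustration and are not taken from this reference.

```python
import json

# Hypothetical stand-in for the dict returned by Batch.to_json_dict().
# Keys and values are illustrative assumptions, not documented output.
batch_json_dict = {
    "data": "<class 'pandas.core.frame.DataFrame'>",
    "batch_request": {
        "datasource_name": "my_datasource",
        "data_asset_name": "my_asset",
    },
    "batch_markers": {"note": "example marker"},
    "batch_spec": None,  # Optional values serialize to JSON null
}

# Serialize to a JSON string, e.g. for logging or writing to disk.
serialized = json.dumps(batch_json_dict, indent=2, sort_keys=True)

# Round-trip to confirm that nothing is lost in serialization.
restored = json.loads(serialized)
print(restored == batch_json_dict)
```

With a real Batch, batch_json_dict would instead be obtained by calling batch.to_json_dict().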