ConfiguredAssetGCSDataConnector
class great_expectations.datasource.data_connector.ConfiguredAssetGCSDataConnector(name: str, datasource_name: str, bucket_or_name: str, assets: dict, execution_engine: Optional[great_expectations.execution_engine.execution_engine.ExecutionEngine] = None, default_regex: Optional[dict] = None, sorters: Optional[list] = None, prefix: Optional[str] = None, delimiter: Optional[str] = None, max_results: Optional[int] = None, gcs_options: Optional[dict] = None, batch_spec_passthrough: Optional[dict] = None, id: Optional[str] = None)
Extension of ConfiguredAssetFilePathDataConnector used to connect to GCS.
A ConfiguredAssetGCSDataConnector requires an explicit specification of each DataAsset you want to connect to. This allows finer control but also requires more setup. To stay consistent with Google’s official SDK, the connector uses parameter names such as “bucket_or_name” and “max_results”. These keys are converted from YAML to Python and passed directly to the GCS connection object, so they must match the SDK’s names for the connector to work properly.
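As a rough illustration of what an explicit per-asset specification can look like, the sketch below builds a connector configuration as a plain Python dict whose keys mirror the constructor arguments in the signature above. The bucket, datasource, and asset names are hypothetical, and the `pattern` / `group_names` keys inside `default_regex` are an assumption about the regex configuration format, not something this page specifies.

```python
# Hypothetical configuration; key names follow the constructor signature,
# all values are made up for illustration.
connector_config = {
    "name": "my_gcs_connector",            # hypothetical connector name
    "datasource_name": "my_datasource",    # hypothetical datasource name
    "bucket_or_name": "my-example-bucket", # hypothetical bucket
    "prefix": "data/taxi/",
    "max_results": 1000,
    "default_regex": {
        # Assumed layout of the regex configuration.
        "pattern": r"data/taxi/(.+)_(\d{4})\.csv",
        "group_names": ["name", "year"],
    },
    # Every DataAsset is listed explicitly; empty dicts are assumed to
    # fall back on default_regex.
    "assets": {
        "yellow_tripdata": {},
        "green_tripdata": {},
    },
}


def missing_required_keys(config):
    """Return the constructor arguments without defaults that are absent."""
    required = ("name", "datasource_name", "bucket_or_name", "assets")
    return [key for key in required if key not in config]


print(missing_required_keys(connector_config))  # prints []
```

With Great Expectations, the Google Cloud SDK, and valid credentials available, a dict like this could in principle be unpacked into the constructor; the check above runs without any of them.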
This DataConnector supports the following methods of authentication:
- Standard gcloud auth / GOOGLE_APPLICATION_CREDENTIALS environment variable workflow
- Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_file
- Manual creation of credentials from google.oauth2.service_account.Credentials.from_service_account_info
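The three authentication workflows can be told apart by what is placed in `gcs_options`. The sketch below is a stand-alone helper, not the connector's own code. It assumes that a `filename` key selects `from_service_account_file`, that an `info` key selects `from_service_account_info`, and that anything else falls back to the default gcloud / GOOGLE_APPLICATION_CREDENTIALS lookup. The key names and the `pick_auth_method` helper are assumptions made for illustration.

```python
from typing import Optional


def pick_auth_method(gcs_options: Optional[dict] = None) -> str:
    """Name the credential workflow implied by a gcs_options dict.

    Assumption: "filename" -> from_service_account_file,
    "info" -> from_service_account_info, otherwise the default
    gcloud / GOOGLE_APPLICATION_CREDENTIALS workflow.
    """
    options = gcs_options or {}
    if "filename" in options:
        return "from_service_account_file"
    if "info" in options:
        return "from_service_account_info"
    return "application_default"


# Hypothetical inputs, one per workflow.
print(pick_auth_method())
print(pick_auth_method({"filename": "/path/to/service_account.json"}))
print(pick_auth_method({"info": {"type": "service_account"}}))
```

The helper only reports which branch would be taken; it never contacts Google Cloud, so it can be run without the SDK installed.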
Parameters
- name (str) – required name for DataConnector
- datasource_name (str) – required name for datasource
- bucket_or_name (str) – bucket name for Google Cloud Storage
- assets (dict) – dict of asset configuration (required for ConfiguredAssetDataConnector)
- execution_engine (ExecutionEngine) – optional reference to ExecutionEngine
- default_regex (dict) – optional regex configuration for filtering data_references
- sorters (list) – optional list of sorters for sorting data_references
- prefix (str) – GCS prefix
- delimiter (str) – GCS delimiter
- max_results (int) – max blob filepaths to return
- gcs_options (dict) – wrapper object for optional GCS **kwargs
- batch_spec_passthrough (dict) – dictionary with keys that will be added directly to batch_spec
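To make the roles of `default_regex` and `sorters` more concrete, the following sketch filters a list of blob paths with a regex and then orders the matches, using only the standard library. It is a simplified stand-in for the idea, not the connector's implementation. The `pattern` / `group_names` keys, the `match_references` helper, and the file names are all assumptions, and the sort is a plain `sorted()` call rather than a real sorter object.

```python
import re

# Assumed shape of a regex configuration.
default_regex = {
    "pattern": r"(.+)_(\d{4})-(\d{2})\.csv",
    "group_names": ["name", "year", "month"],
}

# Hypothetical blob paths as they might be listed from a bucket.
blob_paths = [
    "yellow_tripdata_2019-01.csv",
    "yellow_tripdata_2019-02.csv",
    "README.md",
]


def match_references(paths, regex_config):
    """Keep paths that fully match the pattern and label their groups."""
    compiled = re.compile(regex_config["pattern"])
    matched = []
    for path in paths:
        m = compiled.fullmatch(path)
        if m is None:
            continue  # non-matching paths are filtered out
        matched.append(dict(zip(regex_config["group_names"], m.groups())))
    return matched


references = match_references(blob_paths, default_regex)

# Stand-in for a sorter: newest (year, month) first.
newest_first = sorted(
    references, key=lambda ref: (ref["year"], ref["month"]), reverse=True
)
print(newest_first[0])
```

Here `README.md` is dropped because it does not match the pattern, and the February file sorts ahead of the January one.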
get_available_data_asset_names() → List[str]
Return the list of asset names known by this DataConnector.
Returns
A list of available names
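A minimal sketch of how this method might behave for a configured-asset connector, assuming the known asset names are simply the keys of the `assets` dict passed to the constructor. `FakeConfiguredConnector` is a hypothetical stand-in class written for this example; it is not part of Great Expectations and makes no GCS calls.

```python
from typing import List


class FakeConfiguredConnector:
    """Hypothetical stand-in, not the real Great Expectations class."""

    def __init__(self, assets: dict) -> None:
        self.assets = assets

    def get_available_data_asset_names(self) -> List[str]:
        # Assumption: available names are the keys of the assets dict.
        return list(self.assets.keys())


connector = FakeConfiguredConnector(
    assets={"yellow_tripdata": {}, "green_tripdata": {}}  # hypothetical assets
)
print(connector.get_available_data_asset_names())
```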