Create a Google Drive Connector
A Google Drive connector enables you to ingest data from your Google Drive (including Shared Drives and My Drive) into the Zeta Alpha platform. This guide shows you how to create and configure a Google Drive connector for your data ingestion workflows.
Info: This guide presents an example configuration for a Google Drive connector. For a complete set of configuration options, see the Google Drive Connector Configuration Reference.
Prerequisites
Before you begin, ensure you have:
- Access to the Zeta Alpha Platform UI
- A tenant created
- An index created
- A Google service account key in JSON format (see the Connecting Google Drive to Zeta Alpha.pdf tutorial for detailed instructions)
Step 1: Create the Google Drive Basic Configuration
To create a Google Drive connector, define a JSON configuration file with these basic fields:
- is_document_owner: (boolean) Indicates whether this connector "owns" the documents. When set to true, other connectors cannot crawl the same documents.
- content_source_name: (string) A human-friendly name for the content source, shown in the UI.
- service_account: (object) The service account credentials exported from Google Cloud. This should contain the complete JSON structure, including type, project_id, private_key_id, private_key, client_email, client_id, auth_uri, token_uri, auth_provider_x509_cert_url, and client_x509_cert_url.
- path_include_regex_patterns: (array of strings, optional) Regular expressions. Only files whose paths match are included.
- path_exclude_regex_patterns: (array of strings, optional) Regular expressions. Files whose paths match are excluded.
- drive_ids: (array of strings, optional) IDs of the Shared Drives to crawl. If provided:
  - Non-empty list: only those Shared Drives are crawled.
  - Empty list: no Shared Drives are crawled.

  To find a drive's ID, open the Shared drives section of the Google Drive web UI, select a drive, and copy the string after /folders/ in your browser's address bar (e.g., https://drive.google.com/drive/folders/DRIVE_ID).
- crawl_my_drive: (boolean, optional, default: false) When true, the connector also crawls the authenticated user's personal My Drive.
- logo_url: (string, optional) URL of the logo to display for the connector.
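Copying the drive ID out of the address bar, as described for drive_ids, can also be scripted. The Python sketch below assumes the ID is the path segment that directly follows /folders/ in a Drive URL. The helper name extract_drive_id is hypothetical and not part of the platform.

```python
from urllib.parse import urlparse


def extract_drive_id(url: str) -> str:
    """Return the path segment that follows /folders/ in a Google Drive URL.

    Assumption: the URL looks like https://drive.google.com/drive/folders/DRIVE_ID,
    possibly with extra segments (e.g. /u/0/) or a query string such as ?usp=sharing.
    """
    # urlparse drops the query string; split the remaining path into segments.
    parts = urlparse(url).path.strip("/").split("/")
    try:
        idx = parts.index("folders")
    except ValueError:
        raise ValueError(f"no /folders/ segment in {url!r}") from None
    if idx + 1 >= len(parts) or not parts[idx + 1]:
        raise ValueError(f"no ID after /folders/ in {url!r}")
    return parts[idx + 1]


print(extract_drive_id("https://drive.google.com/drive/folders/0AAbC12345XYZ"))
# prints 0AAbC12345XYZ
```

The returned string is the value to put in the drive_ids array.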
Example Configuration
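Here is a minimal example. The /* your exported JSON */ value is a placeholder. Replace it with the full contents of your service account key file. JSON does not allow comments, so the configuration will not parse until the placeholder is replaced.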
```json
{
  "name": "My Google Drive Connector",
  "description": "Daily crawl of our Shared Drives",
  "is_indexable": true,
  "connector": "google_drive",
  "connector_configuration": {
    "is_document_owner": true,
    "content_source_name": "Corporate Drive",
    "service_account": { /* your exported JSON */ },
    "drive_ids": [ "0AAbC12345XYZ" ],
    "crawl_my_drive": false,
    "logo_url": "https://example.com/logo.png"
  }
}
```
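The service_account placeholder in the example has to be filled with the real key file contents before submission. The Python sketch below does this and checks that the keys listed under service_account are present. The function names build_connector_config and load_service_account and the file name service-account.json are hypothetical. The other values simply mirror the example configuration.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

# Keys the service_account object should contain, per the field list in Step 1.
REQUIRED_SA_KEYS = {
    "type", "project_id", "private_key_id", "private_key", "client_email",
    "client_id", "auth_uri", "token_uri",
    "auth_provider_x509_cert_url", "client_x509_cert_url",
}


def load_service_account(path: str) -> Dict[str, Any]:
    """Parse the key file downloaded from Google Cloud."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def build_connector_config(sa: Dict[str, Any], drive_ids: List[str]) -> Dict[str, Any]:
    """Return the example configuration with real credentials embedded."""
    missing = REQUIRED_SA_KEYS.difference(sa)
    if missing:
        raise ValueError(f"service account JSON is missing keys: {sorted(missing)}")
    return {
        "name": "My Google Drive Connector",
        "description": "Daily crawl of our Shared Drives",
        "is_indexable": True,
        "connector": "google_drive",
        "connector_configuration": {
            "is_document_owner": True,
            "content_source_name": "Corporate Drive",
            "service_account": sa,
            "drive_ids": drive_ids,
            "crawl_my_drive": False,
            "logo_url": "https://example.com/logo.png",
        },
    }

# Usage (the file name is hypothetical):
#   sa = load_service_account("service-account.json")
#   print(json.dumps(build_connector_config(sa, ["0AAbC12345XYZ"]), indent=2))
```

The sketch parses the key file and lets json.dumps re-serialize it. This keeps the \n escapes inside private_key intact, which is easy to break when pasting the key by hand.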
Step 2: Add Field Mapping Configuration
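When crawling Google Drive, the connector extracts document metadata and content for each file (see the Google Drive Connector Configuration Reference for the full list of available fields). You can map these Google Drive fields to your index fields using the field_mappings configuration. Each mapping pairs a content_source_field_name (the Google Drive field) with an index_field_name (the target field in your index). In the example below, … stands for the rest of your connector_configuration.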
Example Field Mappings
```json
{
  "connector_configuration": {
    …,
    "field_mappings": [
      { "content_source_field_name": "name", "index_field_name": "DCMI.title" },
      { "content_source_field_name": "created_time", "index_field_name": "DCMI.created" },
      { "content_source_field_name": "modified_time", "index_field_name": "DCMI.modified" },
      { "content_source_field_name": "owners.display_name", "index_field_name": "DCMI.creator" },
      { "content_source_field_name": "uri", "index_field_name": "uri" },
      { "content_source_field_name": "uri_hash", "index_field_name": "uri_hash" },
      { "content_source_field_name": "document_content_type", "index_field_name": "document_content_type" },
      { "content_source_field_name": "base64_content", "index_field_name": "document_content_path.base64_content" }
    ]
  }
}
```
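Writing each mapping object by hand is repetitive. The Python sketch below keeps the same mappings in a dict keyed by Google Drive field name. A hypothetical helper, add_field_mappings, expands that dict into the field_mappings list shape used in the example.

```python
import json
from typing import Any, Dict

# Google Drive field -> index field, mirroring the example mappings.
MAPPINGS = {
    "name": "DCMI.title",
    "created_time": "DCMI.created",
    "modified_time": "DCMI.modified",
    "owners.display_name": "DCMI.creator",
    "uri": "uri",
    "uri_hash": "uri_hash",
    "document_content_type": "document_content_type",
    "base64_content": "document_content_path.base64_content",
}


def add_field_mappings(config: Dict[str, Any], mappings: Dict[str, str]) -> Dict[str, Any]:
    """Set connector_configuration.field_mappings from a source->index dict."""
    conn = config.setdefault("connector_configuration", {})
    conn["field_mappings"] = [
        {"content_source_field_name": src, "index_field_name": dst}
        for src, dst in mappings.items()
    ]
    return config


config = add_field_mappings({"connector_configuration": {}}, MAPPINGS)
print(json.dumps(config["connector_configuration"]["field_mappings"][0]))
# prints {"content_source_field_name": "name", "index_field_name": "DCMI.title"}
```

Dict keys are unique, so this form cannot map one Google Drive field to two index fields. If you need that, build the list of mapping objects directly.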
Step 3: Specify What to Crawl
You can fine-tune which files get ingested using the following options:
- drive_ids: Crawl only the Shared Drives with these IDs.
- crawl_my_drive: Add My Drive (root) to the crawl.
- path_include_regex_patterns: Include only files whose paths match.
- path_exclude_regex_patterns: Exclude files whose paths match.
Exclude patterns take precedence: a file whose path matches both an include pattern and an exclude pattern is excluded. If neither pattern list is set, all files in the selected drives are crawled.
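The include/exclude rules can be expressed as a small filter, which is handy for trying out patterns before a crawl. This is a Python sketch under stated assumptions, not the connector's actual code:

- Patterns are applied with re.search (an unanchored match anywhere in the path).
- Paths look like Reports/q1.pdf.
- An empty list behaves like an unset one.

The connector's real matching function and path format may differ. should_crawl is a hypothetical name.

```python
import re
from typing import Optional, Sequence


def should_crawl(
    path: str,
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> bool:
    """Decide whether one file path passes the include/exclude filters.

    Assumptions (not confirmed by the connector docs): patterns are applied
    with re.search, and an empty list behaves like an unset one.
    """
    # Exclude patterns take precedence over include patterns.
    if exclude and any(re.search(p, path) for p in exclude):
        return False
    # No include patterns: every file that was not excluded is crawled.
    if not include:
        return True
    # Otherwise the path must match at least one include pattern.
    return any(re.search(p, path) for p in include)


for p in ["Reports/q1.pdf", "Reports/draft.tmp", "Notes/todo.txt"]:
    print(p, should_crawl(p, include=[r"^Reports/"], exclude=[r"\.tmp$"]))
# Reports/q1.pdf True
# Reports/draft.tmp False
# Notes/todo.txt False
```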
Step 4: Create the Google Drive Connector
To create your Google Drive connector in the Zeta Alpha Platform UI:
- Navigate to your tenant's indexes and click View under Content Sources for the desired index
- Click Create Content Source
- Paste your JSON configuration into the editor
- Click Submit
Once created, the connector will run according to its schedule (or on demand) and ingest files matching your configuration.
Crawling Behavior
The connector currently performs a full crawl of all selected drives on every run. Incremental crawling (fetching only changed files) is not yet supported.