
Create a Google Drive Connector

A Google Drive connector enables you to ingest data from your Google Drive (including Shared Drives and My Drive) into the Zeta Alpha platform. This guide shows you how to create and configure a Google Drive connector for your data ingestion workflows.

Info: This guide presents an example configuration for a Google Drive connector. For a complete set of configuration options, see the Google Drive Connector Configuration Reference.

Prerequisites

Before you begin, ensure you have:

  1. Access to the Zeta Alpha Platform UI
  2. A tenant created
  3. An index created
  4. A Google service account JSON (refer to the Connecting Google Drive to Zeta Alpha.pdf tutorial for detailed instructions)

Step 1: Create the Google Drive Basic Configuration

To create a Google Drive connector, define a JSON configuration file with these basic fields:

  • is_document_owner: (boolean) Indicates whether this connector "owns" the documents. When set to true, other connectors cannot crawl the same documents.

  • content_source_name: (string) A human-friendly name for the content source in the UI.

  • service_account: (object) The service account credentials exported from Google Cloud. This should contain the complete JSON structure including type, project_id, private_key_id, private_key, client_email, client_id, auth_uri, token_uri, auth_provider_x509_cert_url, and client_x509_cert_url.

  • path_include_regex_patterns: (array of strings, optional) Regular expressions to include only matching file paths.

  • path_exclude_regex_patterns: (array of strings, optional) Regular expressions to exclude matching file paths.

  • drive_ids: (array of strings, optional) IDs of the Shared Drives to crawl. If provided:

    • Non-empty list: only those Shared Drives are crawled
    • Empty list: no Shared Drives are crawled

    To find a drive's ID, open the Shared drives section of the Google Drive web UI, select a drive, and copy the string after /folders/ in your browser's address bar (e.g., https://drive.google.com/drive/folders/DRIVE_ID).

  • crawl_my_drive: (boolean, optional, default: false) When true, also crawls the authenticated user's personal My Drive.

  • logo_url: (string, optional) URL of the logo to display for the connector.
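
Copying the drive ID by hand is error-prone when there are many Shared Drives. The following is a minimal sketch, not part of the Zeta Alpha platform, that pulls the segment after /folders/ out of a Drive URL. The function name extract_drive_id is hypothetical, and the sketch assumes the URL has the shape described above.

```python
from urllib.parse import urlparse


def extract_drive_id(url):
    """Return the path segment that follows /folders/ in a Google Drive URL.

    Hypothetical helper: assumes a URL shaped like
    https://drive.google.com/drive/folders/DRIVE_ID
    """
    # Split the URL path into non-empty segments; the query string is ignored.
    segments = [part for part in urlparse(url).path.split("/") if part]
    if "folders" not in segments:
        raise ValueError("no /folders/ segment in URL: " + url)
    position = segments.index("folders")
    if position + 1 >= len(segments):
        raise ValueError("nothing follows /folders/ in URL: " + url)
    return segments[position + 1]
```

The returned string can be placed directly in the drive_ids array.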

Example Configuration

Here is a minimal example:

{
  "name": "My Google Drive Connector",
  "description": "Daily crawl of our Team Drives",
  "is_indexable": true,
  "connector": "google_drive",
  "connector_configuration": {
    "is_document_owner": true,
    "content_source_name": "Corporate Drive",
    "service_account": { /* your exported JSON */ },
    "drive_ids": [ "0AAbC12345XYZ" ],
    "crawl_my_drive": false,
    "logo_url": "https://example.com/logo.png"
  }
}
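
Because the service_account value is the full exported credentials object, it can be convenient to assemble the configuration programmatically rather than pasting the key by hand. The sketch below is one possible approach under stated assumptions: the helper names load_service_account and build_connector_config are hypothetical, and the key check is only a local sanity check against the field list given in Step 1, not a description of any validation the platform performs.

```python
import json

# Keys that Step 1 lists for the exported service account JSON.
SERVICE_ACCOUNT_KEYS = (
    "type", "project_id", "private_key_id", "private_key",
    "client_email", "client_id", "auth_uri", "token_uri",
    "auth_provider_x509_cert_url", "client_x509_cert_url",
)


def load_service_account(path):
    """Read the exported service account JSON from disk (path is up to you)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def build_connector_config(service_account, drive_ids, crawl_my_drive=False):
    """Assemble the example configuration with real credentials embedded."""
    # Local sanity check only: fail early if an expected key is absent.
    missing = [key for key in SERVICE_ACCOUNT_KEYS if key not in service_account]
    if missing:
        raise ValueError("service account JSON is missing: " + ", ".join(missing))
    return {
        "name": "My Google Drive Connector",
        "description": "Daily crawl of our Team Drives",
        "is_indexable": True,
        "connector": "google_drive",
        "connector_configuration": {
            "is_document_owner": True,
            "content_source_name": "Corporate Drive",
            "service_account": service_account,
            "drive_ids": list(drive_ids),
            "crawl_my_drive": crawl_my_drive,
            "logo_url": "https://example.com/logo.png",
        },
    }
```

Passing the result to json.dumps(config, indent=2) yields text that can be pasted into the editor in Step 4. Treat the output as a secret, since it contains the private key.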

Step 2: Add Field Mapping Configuration

When crawling Google Drive, the connector extracts document metadata and content for each file (see the Google Drive Connector Configuration Reference for the full list of available fields). You can map these Google Drive fields to your index fields using the field_mappings configuration.

Example Field Mappings

{
  "connector_configuration": {
    ...,
    "field_mappings": [
      { "content_source_field_name": "name", "index_field_name": "DCMI.title" },
      { "content_source_field_name": "created_time", "index_field_name": "DCMI.created" },
      { "content_source_field_name": "modified_time", "index_field_name": "DCMI.modified" },
      { "content_source_field_name": "owners.display_name", "index_field_name": "DCMI.creator" },
      { "content_source_field_name": "uri", "index_field_name": "uri" },
      { "content_source_field_name": "uri_hash", "index_field_name": "uri_hash" },
      { "content_source_field_name": "document_content_type", "index_field_name": "document_content_type" },
      { "content_source_field_name": "base64_content", "index_field_name": "document_content_path.base64_content" }
    ]
  }
}
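
Each mapping entry pairs one Google Drive field with one index field, so the list can be generated from a plain dictionary. The following sketch illustrates this; the helper name with_field_mappings is hypothetical, and the field names used are the ones from the example above.

```python
import copy


def with_field_mappings(config, mappings):
    """Return a copy of config whose connector_configuration has field_mappings.

    mappings is a dict of {content_source_field_name: index_field_name}.
    The input config is left unmodified.
    """
    merged = copy.deepcopy(config)
    connector_configuration = merged.setdefault("connector_configuration", {})
    connector_configuration["field_mappings"] = [
        {"content_source_field_name": source, "index_field_name": target}
        for source, target in mappings.items()
    ]
    return merged


# Example using two of the mappings shown in this step.
example = with_field_mappings(
    {"connector_configuration": {"content_source_name": "Corporate Drive"}},
    {"name": "DCMI.title", "created_time": "DCMI.created"},
)
```

Existing keys in connector_configuration, such as content_source_name, are preserved in the returned copy.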

Step 3: Specify What to Crawl

You can fine-tune which files get ingested using the following options:

  • drive_ids: Specify Shared Drives by ID to crawl only those drives
  • crawl_my_drive: Add My Drive (root) to the crawl
  • path_include_regex_patterns: Include only files with matching paths
  • path_exclude_regex_patterns: Exclude files with matching paths

Exclude patterns take precedence: a file whose path matches both an include pattern and an exclude pattern is excluded. If neither pattern list is set, all files in the selected drives are crawled.
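
To check a set of patterns before running a crawl, the include and exclude rule can be simulated locally. This is a rough sketch and not the connector's actual implementation: it assumes patterns are applied with Python's re.search (a match anywhere in the path), that an empty include list behaves like an unset one, and that paths look like the hypothetical ones in the tests. The function name should_crawl is also hypothetical.

```python
import re


def should_crawl(path, include_patterns=None, exclude_patterns=None):
    """Local approximation of the include/exclude rule described in this step.

    Assumption: patterns are matched with re.search, which may differ from
    how the connector itself applies them.
    """
    # Exclusion wins, even when an include pattern also matches.
    if exclude_patterns and any(re.search(p, path) for p in exclude_patterns):
        return False
    # With include patterns set, only matching paths are kept.
    if include_patterns:
        return any(re.search(p, path) for p in include_patterns)
    # No include patterns: everything not excluded is crawled.
    return True
```

Running sample paths through this function is a quick way to catch patterns that are too broad or too narrow, but the result should still be confirmed against an actual crawl.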

Step 4: Create the Google Drive Connector

To create your Google Drive connector in the Zeta Alpha Platform UI:

  1. Navigate to your tenant's indexes and click View under Content Sources for the desired index
  2. Click Create Content Source
  3. Paste your JSON configuration into the editor
  4. Click Submit

Once created, the connector will run according to its schedule (or on demand) and ingest files matching your configuration.

Crawling Behavior

The connector currently performs a full crawl of all selected drives on every run. Incremental crawling (fetching only changed files) is not yet supported.