Create a Google Drive Connector

A Google Drive connector enables you to ingest data from your Google Drive (including Shared Drives and My Drive) into the Zeta Alpha platform. This guide shows you how to create and configure a Google Drive connector for your data ingestion workflows.

Info: This guide presents an example configuration for a Google Drive connector. For a complete set of configuration options, see the Google Drive Connector Configuration Reference.

Prerequisites

Before you begin, ensure you have:

  1. Access to the Zeta Alpha Platform UI
  2. A tenant created
  3. An index created
  4. Google Drive credentials (refer to the tutorial Configure Google Drive App Access for detailed instructions)

Step 1: Create the Google Drive Basic Configuration

To create a Google Drive connector, define a JSON configuration whose connector_configuration object contains these basic fields:

  • is_document_owner: (boolean) Indicates whether this connector "owns" the documents. When set to true, other connectors cannot crawl the same documents.

  • content_source_name: (string) A human-friendly name for the content source in the UI.

  • service_account: (object) The service account credentials exported from Google Cloud. This should contain the complete JSON structure including type, project_id, private_key_id, private_key, client_email, client_id, auth_uri, token_uri, auth_provider_x509_cert_url, and client_x509_cert_url.

  • path_include_regex_patterns: (array of strings, optional) Regular expressions to include only matching file paths.

  • path_exclude_regex_patterns: (array of strings, optional) Regular expressions to exclude matching file paths.

  • drive_ids: (array of strings, optional) IDs of the Shared Drives to crawl. If provided:

    • Non-empty list: only those Shared Drives are crawled
    • Empty list: no Shared Drives are crawled

    To find a drive's ID, navigate to the Google Drive web UI's Shared drives section, select a drive, and copy the string after /folders/ in your browser's address bar (e.g., https://drive.google.com/drive/folders/DRIVE_ID).

  • crawl_my_drive: (boolean, optional, default: false) When true, also crawls the authenticated user's personal My Drive.

  • folder_ids: (array of strings, optional) IDs of specific folders to crawl within the selected drives. When provided, only files within these folders (and their subfolders if recursive_folders is true) will be crawled. This provides server-side filtering for more targeted crawling.

  • recursive_folders: (boolean, optional, default: true) When true and folder_ids is provided, automatically includes all subfolders recursively. When false, only crawls direct children of the specified folders.

  • logo_url: (string, optional) URL of the logo to display for the connector.

Example Configuration

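Here is an example configuration. The service_account value is shown as a placeholder comment; replace it with the JSON key exported from Google Cloud, since JSON itself does not allow comments: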

{
  "name": "My Google Drive Connector",
  "description": "Daily crawl of our Team Drives",
  "is_indexable": true,
  "connector": "google_drive",
  "connector_configuration": {
    "is_document_owner": true,
    "content_source_name": "Corporate Drive",
    "service_account": { /* your exported JSON */ },
    "drive_ids": [ "0AAbC12345XYZ" ],
    "crawl_my_drive": false,
    "folder_ids": [ "1aBcD2eFgH3iJkL" ],
    "recursive_folders": true,
    "logo_url": "https://example.com/logo.png"
  }
}
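
Because the service_account object is long, it can be easier to assemble the configuration programmatically than to paste the key by hand. The following is a minimal Python sketch, assuming the key exported from Google Cloud is saved as a local file. The helper names build_connector_config and load_service_account are hypothetical and not part of the Zeta Alpha platform.

```python
import json
from pathlib import Path


def load_service_account(path: str) -> dict:
    """Read the service account key file exported from Google Cloud."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def build_connector_config(service_account: dict, drive_ids: list[str]) -> dict:
    """Assemble a payload shaped like the example configuration (hypothetical helper)."""
    return {
        "name": "My Google Drive Connector",
        "description": "Daily crawl of our Team Drives",
        "is_indexable": True,
        "connector": "google_drive",
        "connector_configuration": {
            "is_document_owner": True,
            "content_source_name": "Corporate Drive",
            # The full key object goes here, not a path to the file.
            "service_account": service_account,
            "drive_ids": drive_ids,
            "crawl_my_drive": False,
        },
    }
```

Calling json.dumps(build_connector_config(load_service_account("service_account.json"), ["0AAbC12345XYZ"]), indent=2) would then produce JSON ready to paste into the editor; the file name service_account.json is only an example.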

Step 2: Add Field Mapping Configuration

When crawling Google Drive, the connector extracts document metadata and content for each file (see the Google Drive Connector Configuration Reference for the full list of available fields). You can map these Google Drive fields to your index fields using the field_mappings configuration.

Example Field Mappings

{
  "connector_configuration": {
    ...,
    "field_mappings": [
      { "content_source_field_name": "name", "index_field_name": "DCMI.title" },
      { "content_source_field_name": "created_time", "index_field_name": "DCMI.created" },
      { "content_source_field_name": "modified_time", "index_field_name": "DCMI.modified" },
      { "content_source_field_name": "owners.display_name", "index_field_name": "DCMI.creator" },
      { "content_source_field_name": "uri", "index_field_name": "uri" },
      { "content_source_field_name": "uri_hash", "index_field_name": "uri_hash" },
      { "content_source_field_name": "document_content_type", "index_field_name": "document_content_type" },
      { "content_source_field_name": "base64_content", "index_field_name": "document_content_path.base64_content" }
    ]
  }
}
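
To make the effect of field_mappings concrete, the sketch below renames fields of a single extracted record according to a mapping list. It is an illustration only: it assumes the extracted metadata is a flat dictionary keyed by the source field names, and the function apply_field_mappings and the sample record are hypothetical. The connector's actual handling, for example of nested fields, may differ.

```python
def apply_field_mappings(source_doc: dict, field_mappings: list[dict]) -> dict:
    """Copy each mapped source field to its index field name (flat keys assumed)."""
    indexed = {}
    for mapping in field_mappings:
        source_name = mapping["content_source_field_name"]
        # Fields missing from this record are skipped rather than set to None.
        if source_name in source_doc:
            indexed[mapping["index_field_name"]] = source_doc[source_name]
    return indexed


# Hypothetical record and a subset of the mappings from the example.
mappings = [
    {"content_source_field_name": "name", "index_field_name": "DCMI.title"},
    {"content_source_field_name": "created_time", "index_field_name": "DCMI.created"},
]
record = {"name": "Q1 report.pdf", "mime_type": "application/pdf"}
result = apply_field_mappings(record, mappings)
```

In this sketch, source fields without a mapping (here mime_type) do not reach the index, and mappings whose source field is absent from the record are ignored.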

Step 3: Specify What to Crawl

You can fine-tune which files get ingested using the following options:

  • drive_ids: Specify Shared Drives by ID to crawl only those drives
  • crawl_my_drive: Add My Drive (root) to the crawl
  • folder_ids: Restrict crawling to specific folder IDs within the selected drives (server-side filtering)
  • recursive_folders: Control whether subfolders are included when using folder_ids (default: true)
  • path_include_regex_patterns: Include only files with matching paths
  • path_exclude_regex_patterns: Exclude files with matching paths

Files whose paths match both an include and an exclude pattern are excluded. If neither pattern list is set, all files in the selected drives and folders are crawled.
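
The precedence rule can be sketched in Python as follows. This is not the connector's implementation: it assumes unanchored matching with re.search and treats an empty include list the same as an unset one, neither of which is specified here. The function name should_crawl and the sample paths are hypothetical.

```python
import re


def should_crawl(path: str, include_patterns=None, exclude_patterns=None) -> bool:
    """Return True if a file path passes the include/exclude rules."""
    # Exclusion wins: a path matching any exclude pattern is dropped.
    if exclude_patterns and any(re.search(p, path) for p in exclude_patterns):
        return False
    # With include patterns set, the path must match at least one of them.
    if include_patterns:
        return any(re.search(p, path) for p in include_patterns)
    # No include patterns: everything not excluded is crawled.
    return True
```

For example, with include pattern ^/Reports/ and exclude pattern \.tmp$, the path /Reports/2024/q1.pdf is crawled, while /Reports/draft.tmp is skipped because the exclude pattern takes precedence.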

Example: Crawling Specific Folders

To crawl only specific folders within a Shared Drive:

{
  "connector_configuration": {
    ...,
    "drive_ids": [ "0AAbC12345XYZ" ],
    "folder_ids": [ "1aBcD2eFgH3iJkL", "2cDeFgHi4jKlMn" ],
    "recursive_folders": true
  }
}

To find a folder's ID, navigate to the folder in Google Drive and copy the string after /folders/ in your browser's address bar.
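
When collecting many IDs, copying them by hand is error-prone. The sketch below pulls the segment after /folders/ out of a browser URL with a regular expression; the function name extract_folder_id is hypothetical, and the pattern assumes the ID ends at the next slash, query string, or fragment.

```python
import re
from typing import Optional


def extract_folder_id(url: str) -> Optional[str]:
    """Return the string after /folders/ in a Google Drive URL, or None."""
    match = re.search(r"/folders/([^/?#]+)", url)
    return match.group(1) if match else None
```

The same helper works for Shared Drive URLs, since the drive ID appears in the same position after /folders/.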

Step 4: Create the Google Drive Connector

To create your Google Drive connector in the Zeta Alpha Platform UI:

  1. Navigate to your tenant's indexes and click View under Content Sources for the desired index
  2. Click Create Content Source
  3. Paste your JSON configuration into the editor
  4. Click Submit

Once created, the connector will run according to its schedule (or on demand) and ingest files matching your configuration.

Crawling Behavior

The connector supports both full and incremental crawling:

  • Full Crawl: When first configured or when no start_page_token is present, the connector performs a full crawl of all selected drives and folders
  • Incremental Crawl: After the initial crawl, the connector automatically tracks changes using Google Drive's Changes API and only processes new, modified, or deleted files

The start_page_token is automatically managed by the connector and persisted in the content source configuration after each successful run.
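
The choice between the two modes can be pictured with a small Python sketch. It assumes the token is stored as a start_page_token key directly inside connector_configuration; the exact location within the content source configuration is not specified here, and the function name choose_crawl_mode is hypothetical.

```python
def choose_crawl_mode(connector_configuration: dict) -> str:
    """Return "full" when no start_page_token is stored, else "incremental"."""
    # Assumption: a missing or empty token means no successful run has completed yet.
    token = connector_configuration.get("start_page_token")
    return "incremental" if token else "full"
```

Under this reading, removing the stored token from the configuration would cause the next run to perform a full crawl again.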