Skip to main content

Create a Box Connector

A Box connector enables you to ingest data from your Box instance into the Zeta Alpha platform. This guide shows you how to create and configure a Box connector for your data ingestion workflows, including options that impact crawling performance.

Info: This guide presents an example configuration for a Box connector. For a complete set of configuration options, see the Box Connector Configuration Reference.

Prerequisites

Before you begin, ensure you have:

  1. Access to the Zeta Alpha Platform UI
  2. A tenant created
  3. An index created
  4. Box credentials (refer to the PDF tutorial "Connecting Box to Zeta Alpha.pdf" for detailed instructions)

Step 1: Create the Box Basic Configuration

To create a Box connector, define a configuration file with the following basic fields:

  • is_document_owner: (boolean) Indicates whether this connector "owns" the crawled documents. When set to true, other connectors cannot crawl the same documents.
  • content_source_name: (string) The name that identifies the content source in the UI.
  • access_credentials: (object) The credentials required to access Box:
    • client_id: The client ID of your Box application
    • client_secret: The client secret of your Box application
    • enterprise_id: The enterprise ID of your Box account
  • logo_url: (string, optional) The URL of a logo to display on document cards

Example Configuration

{
"name": "My Box Connector",
"description": "My Box connector",
"is_indexable": true,
"connector": "box",
"connector_configuration": {
"is_document_owner": true,
"content_source_name": "Box Content Source",
"access_credentials": {
"client_id": "your-client-id",
"client_secret": "your-client-secret",
"enterprise_id": "your-enterprise-id"
},
"logo_url": "https://example.com/logo.png"
...
}
}

Step 2: Add Field Mapping Configuration

When crawling Box, the connector extracts document metadata and content as described in the Box Connector Configuration Reference. You can map these Box fields to your index fields using the field_mappings configuration.

Example Field Mappings

The following example shows field mappings for the default index fields:

{
...
"connector_configuration": {
...
"field_mappings": [
{
"content_source_field_name": "name",
"index_field_name": "DCMI.title"
},
{
"content_source_field_name": "description",
"index_field_name": "DCMI.abstract"
},
{
"content_source_field_name": "owner",
"index_field_name": "DCMI.creator",
"inner_field_mappings": [
{
"content_source_field_name": "display_name",
"index_field_name": "full_name"
}
]
},
{
"content_source_field_name": "content_source_name",
"index_field_name": "DCMI.source"
},
{
"content_source_field_name": "created_at",
"index_field_name": "DCMI.created"
},
{
"content_source_field_name": "file_id",
"index_field_name": "DCMI.identifier"
},
{
"content_source_field_name": "uri",
"index_field_name": "uri"
},
{
"content_source_field_name": "uri_hash",
"index_field_name": "uri_hash"
},
{
"content_source_field_name": "document_content_type",
"index_field_name": "document_content_type"
},
{
"content_source_field_name": "document_content",
"index_field_name": "document_content"
},
{
"content_source_field_name": "document_content_path",
"index_field_name": "document_content_path"
}
],
...
}
}

Step 3: Specify What to Crawl

You can configure the Box connector to crawl specific content using the following options:

  • path_inclusion_regex_patterns: (array of strings, optional) Regular expressions to match file paths for inclusion. Only files whose paths match these patterns will be crawled. If a file matches both inclusion and exclusion patterns, the exclusion takes precedence.
  • path_exclusion_regex_patterns: (array of strings, optional) Regular expressions to match file paths for exclusion. Files whose paths match these patterns will not be crawled.
  • since_crawl_date: (string, optional) Only crawl files updated after this date and time (ISO 8601 format).
  • crawl_limit: (integer, optional) The maximum number of files to crawl in a single run.
  • full_crawl: (boolean, optional, default: false) Whether to perform a full crawl of all files in the Box account.
  • folder_ids: (array of strings, optional) The IDs of specific folders to crawl. If not specified, the connector crawls all accessible files. To find a folder ID, view the folder in the Box web app and extract the numeric identifier from the URL.

Folder ID

Example Configuration

{
...
"connector_configuration": {
...
"path_inclusion_regex_patterns": [
".*\\.pdf$",
".*\\.docx$"
],
"path_exclusion_regex_patterns": [
".*\\.txt$"
],
"folder_ids": [
"1234567890",
"0987654321"
],
"since_crawl_date": "2022-01-01T00:00:00Z",
"crawl_limit": 1000,
"full_crawl": false,
...
}
}

Step 4: Create the Box Content Source

To create your Box connector in the Zeta Alpha Platform UI:

  1. Navigate to your tenant and click View next to your target index
  2. Click View under Content Sources for the index
  3. Click Create Content Source
  4. Paste your JSON configuration
  5. Click Submit

Create Box Connector