
Create a Confluence Connector

A Confluence connector enables you to ingest pages from your Confluence instance into the Zeta Alpha platform. This guide shows you how to create and configure a Confluence connector for your data ingestion workflows.

Info: This guide presents an example configuration for a Confluence connector. For a complete set of configuration options, see the Confluence Connector Configuration Reference.

Prerequisites

Before you begin, ensure you have:

  1. Access to the Zeta Alpha Platform UI
  2. A tenant created
  3. An index created
  4. Confluence credentials (refer to the PDF tutorial "Connecting Confluence to Zeta Alpha.pdf" for detailed instructions)

Step 1: Create the Confluence Basic Configuration

To create a Confluence connector, define a configuration file with the following basic fields:

  • is_document_owner: (boolean) Indicates whether this connector "owns" the crawled documents. When set to true, other connectors cannot crawl the same documents.
  • content_source_name: (string) The name that identifies the content source in the UI.
  • access_credentials: (object) The credentials required to access Confluence:
    • instance_url: Your Confluence instance URL (e.g., "https://example.confluence.com")
    • username: The user account for crawling
    • password: The API token created for this user
  • include_archived_spaces: (boolean) Whether to include archived spaces in the crawl.
  • include_personal_spaces: (boolean) Whether to include personal spaces in the crawl.
  • logo_url: (string, optional) The URL of a logo to display on document cards.

Example Configuration

{
  "name": "My Confluence Connector",
  "description": "My Confluence connector",
  "is_indexable": true,
  "connector": "confluence",
  "connector_configuration": {
    "is_document_owner": true,
    "content_source_name": "Confluence",
    "access_credentials": {
      "instance_url": "https://example.confluence.com",
      "username": "your-username",
      "password": "your-api-token"
    },
    "include_archived_spaces": false,
    "include_personal_spaces": true,
    "logo_url": "https://example.com/logo.png"
  }
}
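
As a minimal sketch, the same configuration can be generated from a script so that the API token is not stored in the file itself. The helper name build_basic_config and the environment variable CONFLUENCE_API_TOKEN are hypothetical names chosen for illustration; they are not part of the Zeta Alpha platform.

```python
import json
import os


def build_basic_config(instance_url: str, username: str, api_token: str) -> dict:
    """Return the Step 1 example configuration as a Python dict."""
    return {
        "name": "My Confluence Connector",
        "description": "My Confluence connector",
        "is_indexable": True,
        "connector": "confluence",
        "connector_configuration": {
            "is_document_owner": True,
            "content_source_name": "Confluence",
            "access_credentials": {
                "instance_url": instance_url,
                "username": username,
                # The "password" field carries the API token created for the user.
                "password": api_token,
            },
            "include_archived_spaces": False,
            "include_personal_spaces": True,
            "logo_url": "https://example.com/logo.png",
        },
    }


# CONFLUENCE_API_TOKEN is an assumed variable name; the placeholder is used when it is unset.
token = os.environ.get("CONFLUENCE_API_TOKEN", "your-api-token")
config = build_basic_config("https://example.confluence.com", "your-username", token)
print(json.dumps(config, indent=2))
```

The printed JSON has the same shape as the example above and can be pasted into the UI in Step 4.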

Step 2: Add Field Mapping Configuration

When crawling Confluence, the connector extracts document metadata and content as described in the Confluence Connector Configuration Reference. You can map these Confluence fields to your index fields using the field_mappings configuration.

Example Field Mappings

The following example shows field mappings for the default index fields:

{
  ...
  "connector_configuration": {
    ...
    "field_mappings": [
      {
        "content_source_field_name": "title",
        "index_field_name": "DCMI.title"
      },
      {
        "content_source_field_name": "created_date",
        "index_field_name": "DCMI.created"
      },
      {
        "content_source_field_name": "last_updated",
        "index_field_name": "DCMI.modified"
      },
      {
        "content_source_field_name": "contributors.display_name",
        "index_field_name": "DCMI.creator"
      },
      {
        "content_source_field_name": "content_source_name",
        "index_field_name": "DCMI.source"
      },
      {
        "content_source_field_name": "space_name",
        "index_field_name": "DCMI.coverage"
      },
      {
        "content_source_field_name": "labels",
        "index_field_name": "DCMI.subject"
      }
    ],
    ...
  }
}
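
To illustrate what a mapping does, the sketch below applies a few of the mappings above to a single page record. It is not the connector's implementation. The sample page, the helper names resolve and apply_field_mappings, and the way a dotted name such as contributors.display_name is resolved (one value per contributor) are all assumptions made for this example; the Configuration Reference remains the authority on the extracted fields.

```python
from typing import Any

FIELD_MAPPINGS = [
    {"content_source_field_name": "title", "index_field_name": "DCMI.title"},
    {"content_source_field_name": "contributors.display_name", "index_field_name": "DCMI.creator"},
    {"content_source_field_name": "labels", "index_field_name": "DCMI.subject"},
]


def resolve(record: Any, path: str) -> Any:
    """Follow a dotted path through dicts; fan out over lists (assumed behavior)."""
    for key in path.split("."):
        if isinstance(record, list):
            record = [item.get(key) for item in record if isinstance(item, dict)]
        elif isinstance(record, dict):
            record = record.get(key)
        else:
            return None
    return record


def apply_field_mappings(page: dict, mappings: list[dict]) -> dict:
    """Copy each mapped source field into its index field, skipping missing values."""
    indexed = {}
    for mapping in mappings:
        value = resolve(page, mapping["content_source_field_name"])
        if value is not None:
            indexed[mapping["index_field_name"]] = value
    return indexed


# Hypothetical page record, shaped only for this illustration.
sample_page = {
    "title": "Onboarding Guide",
    "contributors": [{"display_name": "Ada"}, {"display_name": "Lin"}],
    "labels": ["howto"],
}
print(apply_field_mappings(sample_page, FIELD_MAPPINGS))
```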

Step 3: Specify What to Crawl

You can configure the Confluence connector to crawl specific content using the following options:

  • space_include_keys: (array of strings, optional) Spaces whose keys are in this list are crawled. If a space key is in both the include and exclude lists, the space is not crawled. If omitted, all spaces are crawled.
  • space_exclude_keys: (array of strings, optional) Spaces whose keys are in this list are not crawled. If a space key is in both the include and exclude lists, the space is not crawled.
  • page_include_regex_patterns: (array of strings, optional) Pages whose titles match any regular expression in this list are crawled. If a page title matches both an include and an exclude pattern, the exclude pattern takes precedence and the page is not crawled. If omitted, all pages are crawled.
  • page_exclude_regex_patterns: (array of strings, optional) Pages whose titles match any regular expression in this list are not crawled. If a page title matches both an include and an exclude pattern, the exclude pattern takes precedence and the page is not crawled.

Example Configuration

{
  ...
  "connector_configuration": {
    ...
    "space_include_keys": [
      "ENG",
      "PROD"
    ],
    "page_include_regex_patterns": [
      ".*Documentation.*",
      ".*Guide.*"
    ],
    "page_exclude_regex_patterns": [
      ".*Draft.*",
      ".*Archived.*"
    ],
    ...
  }
}
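
The include and exclude rules described in this step can be sketched as two small predicates. This is an illustration of the documented precedence (exclude wins), not the connector's code. The function names are hypothetical, and two details are assumptions: an option counts as "omitted" only when it is None, and patterns are tested with re.search. For patterns wrapped in .* as in the example, the choice of matching function makes no difference.

```python
import re


def should_crawl_space(key, include_keys=None, exclude_keys=None):
    """Apply the space rules: exclusion wins, a missing include list allows all."""
    if exclude_keys and key in exclude_keys:
        return False
    if include_keys is None:  # option omitted: all spaces are crawled
        return True
    return key in include_keys


def should_crawl_page(title, include_patterns=None, exclude_patterns=None):
    """Apply the page title rules: an exclude match always takes precedence."""
    if exclude_patterns and any(re.search(p, title) for p in exclude_patterns):
        return False
    if include_patterns is None:  # option omitted: all pages are crawled
        return True
    return any(re.search(p, title) for p in include_patterns)


include = [".*Documentation.*", ".*Guide.*"]
exclude = [".*Draft.*", ".*Archived.*"]
for title in ["API Guide", "Draft Guide", "Meeting notes"]:
    print(title, should_crawl_page(title, include, exclude))
```

With the example configuration, "Draft Guide" matches both an include and an exclude pattern and is therefore skipped, while "Meeting notes" is skipped because it matches no include pattern.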

Step 4: Create the Confluence Content Source

To create your Confluence connector in the Zeta Alpha Platform UI:

  1. Navigate to your tenant and click View next to your target index
  2. Click View under Content Sources for the index
  3. Click Create Content Source
  4. Paste your JSON configuration
  5. Click Submit
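
Before pasting, it can help to check that the assembled configuration is valid JSON. The example fragments in this guide use ... as a placeholder, which must be replaced with real content. The sketch below is a local pre-check only and does not call any Zeta Alpha API. The function name find_config_problems is hypothetical, and treating every basic field from Step 1 except logo_url as required is an assumption based on that field being the only one marked optional.

```python
import json

# Assumed required: the Step 1 basic fields that are not marked optional.
REQUIRED_FIELDS = (
    "is_document_owner",
    "content_source_name",
    "access_credentials",
    "include_archived_spaces",
    "include_personal_spaces",
)
REQUIRED_CREDENTIALS = ("instance_url", "username", "password")


def find_config_problems(text: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} (line {exc.lineno})"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]
    settings = config.get("connector_configuration")
    if not isinstance(settings, dict):
        return ["missing connector_configuration object"]
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if name not in settings]
    credentials = settings.get("access_credentials")
    if isinstance(credentials, dict):
        problems += [
            f"missing access_credentials.{name}"
            for name in REQUIRED_CREDENTIALS
            if name not in credentials
        ]
    return problems


# A fragment that still contains the "..." placeholder is rejected as invalid JSON.
print(find_config_problems('{ ... }'))
```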

Crawling Behavior

The connector crawls pages based on your configuration, extracting:

  • Page title and content
  • Creation and modification dates
  • Contributors and authors
  • Space information
  • Labels and metadata
  • Page status (active or archived)
  • Parent-child page relationships

Each crawled page includes access rights based on Confluence permissions, ensuring that only authorized users can access the documents in Zeta Alpha.