Create a Confluence Connector

A Confluence connector enables you to ingest pages and blog posts from your Confluence instance into the Zeta Alpha platform. This guide shows you how to create and configure a Confluence connector for your data ingestion workflows.

Info: This guide presents an example configuration for a Confluence connector. For a complete set of configuration options, see the Confluence Connector Configuration Reference.

Prerequisites

Before you begin, ensure you have:

  1. Access to the Zeta Alpha Platform UI
  2. A tenant created
  3. An index created
  4. Confluence credentials (refer to the tutorial Configure Confluence App Access for detailed instructions)

Step 1: Create the Confluence Basic Configuration

To create a Confluence connector, define a configuration file with the following basic fields:

  • is_document_owner: (boolean) Indicates whether this connector "owns" the crawled documents. When set to true, other connectors cannot crawl the same documents.
  • content_source_name: (string) The name that identifies the content source in the UI.
  • access_credentials: (object) The credentials required to access Confluence:
    • instance_url: Your Confluence instance URL (e.g., "https://example.confluence.com")
    • username: The user account for crawling
    • password: The API token created for this user
  • include_archived_spaces: (boolean) Whether to include archived spaces in the crawl.
  • include_personal_spaces: (boolean) Whether to include personal spaces in the crawl.
  • logo_url: (string, optional) The URL of a logo to display on document cards.
  • custom_metadata: (object, optional) Static key-value pairs added to every ingested document. See Content Source Custom Metadata.

Example Configuration

{
  "name": "My Confluence Connector",
  "description": "My Confluence connector",
  "is_indexable": true,
  "connector": "confluence",
  "connector_configuration": {
    "confluence": {
      "is_document_owner": true,
      "content_source_name": "Confluence",
      "access_credentials": {
        "instance_url": "https://example.confluence.com",
        "username": "your-username",
        "password": "your-api-token"
      },
      "include_archived_spaces": false,
      "include_personal_spaces": true,
      "logo_url": "https://example.com/logo.png"
    }
  }
}
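Before submitting a configuration, it can help to sanity-check it locally. The following Python sketch is illustrative only (the validation rules are not the platform's own); it checks for the required keys shown in the example above and verifies that instance_url uses HTTPS:

```python
from urllib.parse import urlparse

# Required keys, taken from the example configuration above.
REQUIRED_TOP_LEVEL = {"name", "connector", "connector_configuration"}
REQUIRED_CONFLUENCE = {"is_document_owner", "content_source_name", "access_credentials"}
REQUIRED_CREDENTIALS = {"instance_url", "username", "password"}

def check_config(config: dict) -> list[str]:
    """Return a list of problems found in a Confluence connector config."""
    problems = []
    problems += [f"missing top-level key: {k}" for k in REQUIRED_TOP_LEVEL - config.keys()]
    confluence = config.get("connector_configuration", {}).get("confluence", {})
    problems += [f"missing confluence key: {k}" for k in REQUIRED_CONFLUENCE - confluence.keys()]
    creds = confluence.get("access_credentials", {})
    problems += [f"missing credential: {k}" for k in REQUIRED_CREDENTIALS - creds.keys()]
    url = creds.get("instance_url", "")
    if url and urlparse(url).scheme != "https":
        problems.append("instance_url should use https")
    return problems
```

An empty list means the configuration has all the keys the example uses; it does not guarantee the platform will accept it.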

Step 2: Add Field Mapping Configuration

When crawling Confluence, the connector extracts document metadata and content as described in the Confluence Connector Configuration Reference. You can map these Confluence fields to your index fields using the field_mappings configuration.

Example Field Mappings

The following example shows field mappings for the default index fields:

{
  ...
  "connector_configuration": {
    "confluence": {
      ...
      "field_mappings": [
        {
          "content_source_field_name": "title",
          "index_field_name": "DCMI.title"
        },
        {
          "content_source_field_name": "created_date",
          "index_field_name": "DCMI.created"
        },
        {
          "content_source_field_name": "last_updated",
          "index_field_name": "DCMI.modified"
        },
        {
          "content_source_field_name": "contributors.display_name",
          "index_field_name": "DCMI.creator"
        },
        {
          "content_source_field_name": "content_source_name",
          "index_field_name": "DCMI.source"
        },
        {
          "content_source_field_name": "space_name",
          "index_field_name": "DCMI.coverage"
        },
        {
          "content_source_field_name": "labels",
          "index_field_name": "DCMI.subject"
        }
      ],
      ...
    }
  }
}
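Conceptually, each mapping copies a content-source field into an index field, where a dotted name such as contributors.display_name reaches into nested objects (and, for lists such as contributors, collects the value from each element). The sketch below illustrates that idea in Python; it is not the platform's implementation, and the record shape is assumed from the field names above:

```python
def get_path(record: dict, dotted: str):
    """Resolve a dotted path like 'contributors.display_name'.
    When an intermediate value is a list, the rest of the path is
    resolved for each element of the list."""
    value = record
    for part in dotted.split("."):
        if isinstance(value, list):
            value = [v.get(part) for v in value if isinstance(v, dict)]
        elif isinstance(value, dict):
            value = value.get(part)
        else:
            return None
    return value

def apply_mappings(record: dict, mappings: list[dict]) -> dict:
    """Build the index document from a crawled record and field_mappings."""
    return {
        m["index_field_name"]: get_path(record, m["content_source_field_name"])
        for m in mappings
        if get_path(record, m["content_source_field_name"]) is not None
    }
```

For example, a record with two contributors would map to a DCMI.creator list with both display names.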

Step 3: Specify What to Crawl

You can configure the Confluence connector to crawl specific content using the following options:

  • content_types: (array of strings, optional) The Confluence content types to crawl. Supported values: "page", "blogpost". Defaults to pages only if not specified.
  • space_include_keys: (array of strings, optional) Only spaces whose keys appear in the list will be crawled. If a key appears in both the include and exclude lists, the space is not crawled. If not passed, all spaces are crawled.
  • space_exclude_keys: (array of strings, optional) Spaces whose keys appear in the list will not be crawled. If a key appears in both the include and exclude lists, the space is not crawled.
  • page_include_regex_patterns: (array of strings, optional) Content items whose titles match any of the regular expressions in the list will be crawled. If a title matches both an include and exclude pattern, the exclude pattern takes precedence and the item is not crawled. If not passed, all items are crawled.
  • page_exclude_regex_patterns: (array of strings, optional) Content items whose titles match any of the regular expressions in the list will not be crawled. If a title matches both an include and exclude pattern, the exclude pattern takes precedence and the item is not crawled.
  • label_include_regex_patterns: (array of strings, optional) Only content items that have at least one label matching any of the regular expressions will be crawled. If not passed, labels are not used for filtering.
  • label_exclude_regex_patterns: (array of strings, optional) Content items that have any label matching any of the regular expressions will not be crawled.
  • path_include_regex_patterns: (array of strings, optional) Only content items whose hierarchy path (ancestor titles joined by /, e.g. "Parent/Child/Page Title") matches any of the regular expressions will be crawled.
  • path_exclude_regex_patterns: (array of strings, optional) Content items whose hierarchy path matches any of the regular expressions will not be crawled.

Example Configuration

{
  ...
  "connector_configuration": {
    "confluence": {
      ...
      "content_types": [
        "page",
        "blogpost"
      ],
      "space_include_keys": [
        "ENG",
        "PROD"
      ],
      "page_include_regex_patterns": [
        ".*Documentation.*",
        ".*Guide.*"
      ],
      "page_exclude_regex_patterns": [
        ".*Draft.*",
        ".*Archived.*"
      ],
      "label_include_regex_patterns": [
        "^published$",
        "^approved-.*"
      ],
      "path_include_regex_patterns": [
        "^Engineering/.*"
      ],
      ...
    }
  }
}
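The filtering rules above can be summarized as: exclude patterns always take precedence over include patterns, and an absent include list means "include everything". The following Python sketch models that decision for a single content item; it is a simplified illustration of the documented semantics, not the connector's actual code:

```python
import re

def should_crawl(title: str, labels: list[str], path: str, cfg: dict) -> bool:
    """Apply the title, label, and path filters to one content item.
    Exclude patterns take precedence over include patterns."""
    def matches(patterns, text):
        return any(re.search(p, text) for p in patterns)

    # Title filters: exclude wins, then an include list (if any) must match.
    if matches(cfg.get("page_exclude_regex_patterns", []), title):
        return False
    inc = cfg.get("page_include_regex_patterns")
    if inc is not None and not matches(inc, title):
        return False

    # Label filters: any excluded label rejects; an include list requires
    # at least one matching label.
    if any(matches(cfg.get("label_exclude_regex_patterns", []), lbl) for lbl in labels):
        return False
    label_inc = cfg.get("label_include_regex_patterns")
    if label_inc is not None and not any(matches(label_inc, lbl) for lbl in labels):
        return False

    # Path filters on the ancestor hierarchy, e.g. "Parent/Child/Page Title".
    if matches(cfg.get("path_exclude_regex_patterns", []), path):
        return False
    path_inc = cfg.get("path_include_regex_patterns")
    if path_inc is not None and not matches(path_inc, path):
        return False
    return True
```

With the example configuration above, a page titled "API Documentation" labeled "published" under "Engineering/..." would be crawled, while "Draft Documentation" would be rejected even though it matches an include pattern, because the exclude pattern takes precedence.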

Step 4: Create the Confluence Content Source

To create your Confluence connector in the Zeta Alpha Platform UI:

  1. Navigate to your tenant and click View next to your target index
  2. Click View under Content Sources for the index
  3. Click Create Content Source
  4. Paste your JSON configuration
  5. Click Submit

Crawling Behavior

The connector crawls content based on your configuration, extracting:

  • Title and content (HTML)
  • Creation and modification dates
  • Contributors and authors
  • Space information
  • Labels and metadata
  • Status (active or archived)
  • Parent-child hierarchy (pages) or flat structure (blog posts)
  • Content type (page or blogpost)

Each crawled item includes access rights based on Confluence permissions, ensuring that only authorized users can access the documents in Zeta Alpha.
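To make the extracted data concrete, the sketch below shows what a single crawled page might look like, using the content-source field names from the Step 2 mappings. The exact record structure is defined by the Confluence Connector Configuration Reference; the values and some field names here (status, content_type, ancestors) are illustrative assumptions:

```python
# Illustrative shape of one crawled Confluence page (not an exact schema).
crawled_page = {
    "title": "Deployment Guide",
    "content": "<h1>Deployment Guide</h1><p>Steps to deploy.</p>",  # HTML content
    "created_date": "2024-01-15T09:30:00Z",
    "last_updated": "2024-03-02T14:05:00Z",
    "contributors": [{"display_name": "Ada Lovelace"}],
    "content_source_name": "Confluence",
    "space_name": "Engineering",
    "labels": ["published", "approved-ops"],
    "status": "active",                           # active or archived
    "content_type": "page",                       # "page" or "blogpost"
    "ancestors": ["Engineering Home", "Operations"],  # parent-child hierarchy
}
```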