Create a Confluence Connector
A Confluence connector enables you to ingest pages from your Confluence instance into the Zeta Alpha platform. This guide shows you how to create and configure a Confluence connector for your data ingestion workflows.
Info: This guide presents an example configuration for a Confluence connector. For a complete set of configuration options, see the Confluence Connector Configuration Reference.
Prerequisites
Before you begin, ensure you have:
- Access to the Zeta Alpha Platform UI
- A tenant created
- An index created
- Confluence credentials (refer to the PDF tutorial "Connecting Confluence to Zeta Alpha.pdf" for detailed instructions)
Step 1: Create the Confluence Basic Configuration
To create a Confluence connector, define a configuration file with the following basic fields:
- is_document_owner: (boolean) Indicates whether this connector "owns" the crawled documents. When set to true, other connectors cannot crawl the same documents.
- content_source_name: (string) The name that identifies the content source in the UI.
- access_credentials: (object) The credentials required to access Confluence:
  - instance_url: Your Confluence instance URL (e.g., "https://example.confluence.com")
  - username: The user account for crawling
  - password: The API token created for this user
- include_archived_spaces: (boolean) Whether to include archived spaces in the crawl.
- include_personal_spaces: (boolean) Whether to include personal spaces in the crawl.
- logo_url: (string, optional) The URL of a logo to display on document cards.
Example Configuration
{
  "name": "My Confluence Connector",
  "description": "My Confluence connector",
  "is_indexable": true,
  "connector": "confluence",
  "connector_configuration": {
    "is_document_owner": true,
    "content_source_name": "Confluence",
    "access_credentials": {
      "instance_url": "https://example.confluence.com",
      "username": "your-username",
      "password": "your-api-token"
    },
    "include_archived_spaces": false,
    "include_personal_spaces": true,
    "logo_url": "https://example.com/logo.png"
  }
}
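If you prefer not to keep the API token in a file, the configuration above can be generated by a small script. The following is a minimal sketch, not part of the Zeta Alpha tooling: the function name build_basic_config and the environment variable CONFLUENCE_API_TOKEN are hypothetical, and the field values are copied from the example configuration.

```python
import json
import os


def build_basic_config(instance_url: str, username: str, api_token: str) -> dict:
    """Assemble the basic connector configuration from the example above."""
    return {
        "name": "My Confluence Connector",
        "description": "My Confluence connector",
        "is_indexable": True,
        "connector": "confluence",
        "connector_configuration": {
            "is_document_owner": True,
            "content_source_name": "Confluence",
            "access_credentials": {
                "instance_url": instance_url,
                "username": username,
                # The "password" field holds the API token created for the user.
                "password": api_token,
            },
            "include_archived_spaces": False,
            "include_personal_spaces": True,
            "logo_url": "https://example.com/logo.png",
        },
    }


if __name__ == "__main__":
    # CONFLUENCE_API_TOKEN is a hypothetical variable name chosen for this sketch.
    token = os.environ.get("CONFLUENCE_API_TOKEN", "your-api-token")
    config = build_basic_config("https://example.confluence.com", "your-username", token)
    # Print JSON that can be pasted into the Platform UI.
    print(json.dumps(config, indent=2))
```

Running the script prints the JSON document, which you can then paste into the UI as described in Step 4.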
Step 2: Add Field Mapping Configuration
When crawling Confluence, the connector extracts document metadata and content as described in the Confluence Connector Configuration Reference. You can map these Confluence fields to your index fields using the field_mappings configuration.
Example Field Mappings
The following example shows field mappings for the default index fields:
{
  ...
  "connector_configuration": {
    ...
    "field_mappings": [
      {
        "content_source_field_name": "title",
        "index_field_name": "DCMI.title"
      },
      {
        "content_source_field_name": "created_date",
        "index_field_name": "DCMI.created"
      },
      {
        "content_source_field_name": "last_updated",
        "index_field_name": "DCMI.modified"
      },
      {
        "content_source_field_name": "contributors.display_name",
        "index_field_name": "DCMI.creator"
      },
      {
        "content_source_field_name": "content_source_name",
        "index_field_name": "DCMI.source"
      },
      {
        "content_source_field_name": "space_name",
        "index_field_name": "DCMI.coverage"
      },
      {
        "content_source_field_name": "labels",
        "index_field_name": "DCMI.subject"
      }
    ],
    ...
  }
}
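Because every mapping entry has the same two keys, the list can be generated from a plain source-to-index table, which is easier to review than the expanded JSON. The sketch below assumes nothing beyond the field names shown in the example; the names DEFAULT_MAPPINGS and to_field_mappings are hypothetical helpers, not part of the connector.

```python
import json
from typing import Dict, List

# Source field -> index field pairs, copied from the example above.
DEFAULT_MAPPINGS: Dict[str, str] = {
    "title": "DCMI.title",
    "created_date": "DCMI.created",
    "last_updated": "DCMI.modified",
    "contributors.display_name": "DCMI.creator",
    "content_source_name": "DCMI.source",
    "space_name": "DCMI.coverage",
    "labels": "DCMI.subject",
}


def to_field_mappings(pairs: Dict[str, str]) -> List[Dict[str, str]]:
    """Expand a source->index table into the field_mappings list format."""
    return [
        {"content_source_field_name": source, "index_field_name": target}
        for source, target in pairs.items()
    ]


if __name__ == "__main__":
    print(json.dumps(to_field_mappings(DEFAULT_MAPPINGS), indent=2))
```

The resulting list is the value to place under "field_mappings" inside "connector_configuration".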
Step 3: Specify What to Crawl
You can configure the Confluence connector to crawl specific content using the following options:
- space_include_keys: (array of strings, optional) Space keys in the list will be crawled. If a space key is in both the include and exclude lists, the space will not be crawled. If not passed, all spaces are crawled.
- space_exclude_keys: (array of strings, optional) Space keys in the list will not be crawled. If a space key is in both the include and exclude lists, the space will not be crawled.
- page_include_regex_patterns: (array of strings, optional) Pages whose titles match any of the regular expressions in the list will be crawled. If a page title matches both an include and exclude pattern, the exclude pattern takes precedence and the page is not crawled. If not passed, all pages are crawled.
- page_exclude_regex_patterns: (array of strings, optional) Pages whose titles match any of the regular expressions in the list will not be crawled. If a page title matches both an include and exclude pattern, the exclude pattern takes precedence and the page is not crawled.
Example Configuration
{
  ...
  "connector_configuration": {
    ...
    "space_include_keys": [
      "ENG",
      "PROD"
    ],
    "page_include_regex_patterns": [
      ".*Documentation.*",
      ".*Guide.*"
    ],
    "page_exclude_regex_patterns": [
      ".*Draft.*",
      ".*Archived.*"
    ],
    ...
  }
}
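The precedence rules described above (exclude wins over include, and an omitted include list means "everything") can be checked locally before you submit a configuration. The following is an illustrative sketch of those rules only, not the connector's actual implementation. It assumes case-sensitive matching with Python's re.search; the connector's regex dialect and anchoring are not specified in this guide, although patterns of the form ".*text.*" behave the same under search and full-match semantics. The function names are hypothetical.

```python
import re
from typing import Optional, Sequence


def space_is_crawled(
    key: str,
    include_keys: Optional[Sequence[str]] = None,
    exclude_keys: Optional[Sequence[str]] = None,
) -> bool:
    """Apply the space rules: exclude takes precedence over include."""
    if exclude_keys and key in exclude_keys:
        return False
    # Only an omitted option (None) is treated as "crawl all spaces";
    # how the connector handles an empty list is not covered by this guide.
    if include_keys is None:
        return True
    return key in include_keys


def page_is_crawled(
    title: str,
    include_patterns: Optional[Sequence[str]] = None,
    exclude_patterns: Optional[Sequence[str]] = None,
) -> bool:
    """Apply the page-title rules: exclude takes precedence over include."""
    if exclude_patterns and any(re.search(p, title) for p in exclude_patterns):
        return False
    if include_patterns is None:
        return True
    return any(re.search(p, title) for p in include_patterns)


if __name__ == "__main__":
    include = [".*Documentation.*", ".*Guide.*"]
    exclude = [".*Draft.*", ".*Archived.*"]
    for title in ["User Guide", "Draft Guide", "Meeting notes"]:
        print(title, page_is_crawled(title, include, exclude))
```

With the patterns from the example configuration, "User Guide" is crawled, "Draft Guide" is skipped because the exclude pattern takes precedence, and "Meeting notes" is skipped because it matches no include pattern.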
Step 4: Create the Confluence Content Source
To create your Confluence connector in the Zeta Alpha Platform UI:
- Navigate to your tenant and click View next to your target index
- Click View under Content Sources for the index
- Click Create Content Source
- Paste your JSON configuration
- Click Submit
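Before pasting, it can help to confirm that the text is valid JSON and still contains the fields described in Step 1. Note that the "..." placeholders used in the partial examples in this guide are not valid JSON and must be replaced by the real fields. The sketch below is a hypothetical local check, not a Zeta Alpha tool: it treats the fields that Step 1 does not mark as optional as expected, which is an assumption; the Confluence Connector Configuration Reference is the authority on which fields are actually required.

```python
import json
from typing import List

# Assumption: fields shown in Step 1 without an "optional" marker are expected.
TOP_LEVEL_FIELDS = ("name", "connector", "connector_configuration")
CONNECTOR_FIELDS = ("is_document_owner", "content_source_name", "access_credentials")
CREDENTIAL_FIELDS = ("instance_url", "username", "password")


def find_problems(raw: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        config = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = [f"missing top-level field: {k}" for k in TOP_LEVEL_FIELDS if k not in config]

    connector_cfg = config.get("connector_configuration")
    if not isinstance(connector_cfg, dict):
        connector_cfg = {}
    problems += [f"missing connector field: {k}" for k in CONNECTOR_FIELDS if k not in connector_cfg]

    credentials = connector_cfg.get("access_credentials")
    if not isinstance(credentials, dict):
        credentials = {}
    problems += [f"missing credential field: {k}" for k in CREDENTIAL_FIELDS if k not in credentials]
    return problems


if __name__ == "__main__":
    # A leftover "..." placeholder makes the document unparseable.
    print(find_problems('{ ... }'))
```

An empty result only means the checks in this sketch passed; the platform may apply further validation when you click Submit.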
Crawling Behavior
The connector crawls pages based on your configuration, extracting:
- Page title and content
- Creation and modification dates
- Contributors and authors
- Space information
- Labels and metadata
- Page status (active or archived)
- Parent-child page relationships
Each crawled page includes access rights based on Confluence permissions, ensuring that only authorized users can access the documents in Zeta Alpha.