Create a Federated Search Connector

A Federated Search connector lets you ingest data from an existing federated search engine (such as Google or Google Scholar) through Zeta Alpha's Search API. This guide shows you how to create and configure a Federated Search connector for your data ingestion workflows.

Info: This guide presents an example configuration for a Federated Search connector. For a complete set of configuration options, see the Federated Search Connector Configuration Reference.

Prerequisites

Before you begin, ensure you have:

Access to the Zeta Alpha Platform UI
A tenant created
The name of a supported federated search engine ("google_scholar", "google", or "bing")
A destination index (see the Create Custom Index guide)
A user ID with access to the destination index

Step 1: Create the Federated Search Basic Configuration

To create a Federated Search connector, define a JSON configuration file with these basic fields:

is_document_owner: (boolean, optional, default: true) Whether this connector "owns" the documents.
content_source_name: (string, optional) A human-friendly name for the content source in the UI.
queries: (array of objects, required) List of query specifications. Each object must contain a query_string field and may include additional parameters such as filters, year, date, or sources. For more information, see the Zeta Alpha Search API documentation.
source_index_id: (string, required) The name of the search engine to use. Supported values are "google_scholar", "google", and "bing".
search_api_url: (string, optional) The URL of the search API endpoint. Defaults to the tenant's search endpoint.
sort_by_relevance: (boolean, optional, default: true) Whether to sort results by relevance score (true) or by date (false).
max_number_of_pages: (integer, optional, default: 10) Maximum number of result pages to fetch.
page_size: (integer, optional, default: 10) Number of documents to fetch per page.
authorization: (string, optional) Authorization header value to include in Search API requests.
stop_early: (boolean, optional, default: false) Stop fetching further pages when encountering a result that was already ingested. Works best with sort_by_relevance set to false.
request_headers: (object, optional) Additional HTTP headers to send to the Zeta Alpha Search API.
fetch_abstracts: (boolean, optional, default: false) Whether to fetch document abstracts directly from the document source. If false, uses the abstract returned by the federated search engine.
logo_url: (string, optional) URL of the logo to display for the connector.
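
Before submitting a configuration, it can help to check it locally against the field list above. The following Python sketch is a hypothetical helper, not part of the Zeta Alpha platform. The name check_connector_configuration is invented for illustration, and the rules that queries must be non-empty and that the page settings must be positive integers are assumptions rather than documented constraints.

```python
from typing import Any

# Engine names listed as supported values for source_index_id.
SUPPORTED_ENGINES = {"google_scholar", "google", "bing"}

# Defaults as stated in the field descriptions above.
DEFAULTS: dict[str, Any] = {
    "is_document_owner": True,
    "sort_by_relevance": True,
    "max_number_of_pages": 10,
    "page_size": 10,
    "stop_early": False,
    "fetch_abstracts": False,
}


def check_connector_configuration(cfg: dict[str, Any]) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: list[str] = []

    queries = cfg.get("queries")
    # Assumption: an empty queries array is treated as a mistake.
    if not isinstance(queries, list) or not queries:
        problems.append("queries must be a non-empty array of objects")
    else:
        for i, query in enumerate(queries):
            if not isinstance(query, dict) or not isinstance(query.get("query_string"), str):
                problems.append(f"queries[{i}] needs a string query_string")

    if cfg.get("source_index_id") not in SUPPORTED_ENGINES:
        problems.append("source_index_id must be one of " + ", ".join(sorted(SUPPORTED_ENGINES)))

    for name in ("max_number_of_pages", "page_size"):
        value = cfg.get(name, DEFAULTS[name])
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"{name} must be a positive integer")

    for name in ("is_document_owner", "sort_by_relevance", "stop_early", "fetch_abstracts"):
        if not isinstance(cfg.get(name, DEFAULTS[name]), bool):
            problems.append(f"{name} must be a boolean")

    return problems
```

The function takes the object you intend to place under connector_configuration. It only mirrors the types and defaults listed in this guide; the platform may apply further validation.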

Example Configuration

An example configuration:

{
  "name": "My Federated Search Connector",
  "description": "Ingest top research results from Google Scholar",
  "is_indexable": true,
  "connector": "federated_search",
  "connector_configuration": {
    "queries": [
      {
        "query_string": "machine learning",
        "filters": {
          "year": { "lower_bound": 2025 }
        }
      }
    ],
    "source_index_id": "google_scholar",
    "page_size": 20,
    "max_number_of_pages": 5,
    "sort_by_relevance": true,
    "authorization": "Bearer eyJ...",
    "request_headers": {
      "X-Custom-Header": "value"
    }
  }
}

Step 2: Add Field Mapping Configuration

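The Federated Search connector emits the document fields returned by the Search API. Use field_mappings to map them into your index: each entry pairs a content_source_field_name (the field in the search result) with an index_field_name (the field in the destination index).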

Example Field Mappings

{
  "connector_configuration": {
    "field_mappings": [
      { "content_source_field_name": "uri", "index_field_name": "uri" },
      { "content_source_field_name": "title", "index_field_name": "DCMI.title" },
      { "content_source_field_name": "abstract", "index_field_name": "DCMI.abstract" },
      { "content_source_field_name": "created_at", "index_field_name": "DCMI.date" },
      { "content_source_field_name": "authors.full_name", "index_field_name": "DCMI.creator" },
      { "content_source_field_name": "document_content_url", "index_field_name": "document_content_path.url_content" }
    ]
  }
}
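
To make the effect of the mappings above concrete, here is a minimal Python sketch of how such a list could be applied to one search result. This guide does not specify how the connector resolves dotted names such as authors.full_name, so the traversal below (descending into nested objects and collecting values across lists) is an assumption. The helper names resolve_path and apply_field_mappings and the sample document are hypothetical.

```python
from typing import Any


def resolve_path(document: dict[str, Any], dotted_path: str) -> Any:
    """Follow a dotted path through nested dicts; lists are traversed per element."""
    current: Any = document
    for key in dotted_path.split("."):
        if isinstance(current, list):
            current = [item.get(key) for item in current if isinstance(item, dict)]
        elif isinstance(current, dict):
            current = current.get(key)
        else:
            return None
    return current


def apply_field_mappings(
    document: dict[str, Any], field_mappings: list[dict[str, str]]
) -> dict[str, Any]:
    """Build a flat dict keyed by index_field_name; missing source fields are skipped."""
    mapped: dict[str, Any] = {}
    for mapping in field_mappings:
        value = resolve_path(document, mapping["content_source_field_name"])
        if value is not None:
            mapped[mapping["index_field_name"]] = value
    return mapped


# Hypothetical search result, for illustration only.
search_result = {
    "uri": "https://example.org/paper-1",
    "title": "An Example Paper",
    "authors": [{"full_name": "Ada Lovelace"}, {"full_name": "Alan Turing"}],
}
field_mappings = [
    {"content_source_field_name": "uri", "index_field_name": "uri"},
    {"content_source_field_name": "title", "index_field_name": "DCMI.title"},
    {"content_source_field_name": "abstract", "index_field_name": "DCMI.abstract"},
    {"content_source_field_name": "authors.full_name", "index_field_name": "DCMI.creator"},
]
mapped = apply_field_mappings(search_result, field_mappings)
```

In this sketch the sample result has no abstract, so DCMI.abstract is simply left out of the mapped document, and DCMI.creator receives the list of author names.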

Step 3: Configure Access Rights (Optional)

You can restrict document access based on user rights:

allow_access_rights: (array of objects, optional) Users with any of these access rights will be allowed to retrieve the ingested documents.
deny_access_rights: (array of objects, optional) Users with these access rights will be prevented from accessing the documents.

Step 4: Create the Federated Search Connector

To create your Federated Search connector in the Zeta Alpha Platform UI:

1. Navigate to your tenant's indexes and click View under Content Sources for the destination index.
2. Click Create Content Source.
3. Paste your JSON configuration into the editor.
4. Click Submit.

Once created, the connector will run according to its schedule (or on demand) and ingest search results into your destination index.
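
The field mapping example in Step 2 is shown as a separate fragment, while the editor takes one JSON configuration. One way to combine them, assuming the fragment's keys belong under the same connector_configuration object as the Step 1 fields, is sketched below in Python. The helper name merge_connector_fragments is hypothetical, and the sample values are shortened from the examples in this guide.

```python
import json
from typing import Any


def merge_connector_fragments(
    base: dict[str, Any], *fragments: dict[str, Any]
) -> dict[str, Any]:
    """Return a copy of base with each fragment's connector_configuration keys merged in."""
    merged = json.loads(json.dumps(base))  # deep copy via a JSON round-trip
    target = merged.setdefault("connector_configuration", {})
    for fragment in fragments:
        extra = fragment.get("connector_configuration", {})
        target.update(json.loads(json.dumps(extra)))  # copy so inputs stay untouched
    return merged


base = {
    "name": "My Federated Search Connector",
    "connector": "federated_search",
    "connector_configuration": {
        "queries": [{"query_string": "machine learning"}],
        "source_index_id": "google_scholar",
    },
}
mappings_fragment = {
    "connector_configuration": {
        "field_mappings": [
            {"content_source_field_name": "uri", "index_field_name": "uri"},
            {"content_source_field_name": "title", "index_field_name": "DCMI.title"},
        ]
    }
}

merged = merge_connector_fragments(base, mappings_fragment)
print(json.dumps(merged, indent=2))  # text to paste into the editor
```

Later fragments overwrite earlier keys of the same name, so keep each setting in one place.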

Crawling Behavior

Each run executes the specified queries against the search engine using the search API and ingests the returned documents. The connector will fetch at most max_number_of_pages * page_size documents per run.
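
The per-run limit and the effect of stop_early can be illustrated with a short Python simulation. This is a sketch of the behavior described in this guide, not the connector's actual implementation: fetch_page stands in for a Search API call, page numbering from 1 is an assumption, and all function names are hypothetical.

```python
from typing import Callable


def max_documents_per_run(max_number_of_pages: int = 10, page_size: int = 10) -> int:
    """Upper bound on documents fetched in one run, per the formula above."""
    return max_number_of_pages * page_size


def crawl(
    fetch_page: Callable[[int, int], list[str]],
    already_ingested: set[str],
    max_number_of_pages: int = 10,
    page_size: int = 10,
    stop_early: bool = False,
) -> list[str]:
    """Collect result URIs page by page, honoring the page limit and stop_early."""
    collected: list[str] = []
    for page in range(1, max_number_of_pages + 1):
        results = fetch_page(page, page_size)
        if not results:
            break  # the engine has no more results
        for uri in results:
            if stop_early and uri in already_ingested:
                return collected  # reached a previously ingested result
            collected.append(uri)
    return collected


# Fake search engine with 50 results, used only for this illustration.
corpus = [f"doc-{i}" for i in range(50)]


def fake_fetch_page(page: int, page_size: int) -> list[str]:
    start = (page - 1) * page_size
    return corpus[start:start + page_size]


limited = crawl(fake_fetch_page, set(), max_number_of_pages=2, page_size=20)
early = crawl(fake_fetch_page, {"doc-25"}, max_number_of_pages=5, page_size=20, stop_early=True)
```

With the example configuration from Step 1 (page_size 20, max_number_of_pages 5), the bound is 100 documents. In the simulation, the stop_early run ends as soon as it meets doc-25, which is why that option pairs well with date-sorted results, where already ingested documents tend to appear after the new ones.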
