Join Data from Multiple Sources

A join enhancement enriches documents from one content source with data from another content source, matching documents on field values. This is useful for combining related information from different systems.

This guide shows you how to configure a join enhancement for your data ingestion workflows.

Prerequisites

Before you begin, ensure you have:

  1. Access to the Zeta Alpha Platform UI
  2. A tenant created
  3. An index created
  4. At least two content sources configured:
    • Target content source: Contains the documents to be enhanced
    • Enhancer content source: Contains the enhancing data

Step 1: Understand the Join Concept

The join enhancement works by:

  1. Matching documents from two content sources based on a common field (e.g., document ID, URI)
  2. Extracting data from the enhancer documents
  3. Adding that data to the target documents as an enhancement
  4. Optionally aggregating data when multiple enhancer documents match a single target document
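
The matching step can be pictured with a minimal Python sketch. It is only an illustration of the concept, not the platform's implementation: the join_documents helper and the sample documents are hypothetical, and the field names are borrowed from the example scenario.

```python
from collections import defaultdict

def join_documents(targets, enhancers, target_field, enhancer_field):
    """Group enhancer documents under the target document they match."""
    # Index enhancer documents by their join value for fast lookup.
    by_key = defaultdict(list)
    for doc in enhancers:
        if enhancer_field in doc:
            by_key[doc[enhancer_field]].append(doc)
    # Pair every target with its (possibly empty) list of matches.
    return {t[target_field]: by_key.get(t[target_field], []) for t in targets}

# Hypothetical sample data.
products = [
    {"document_id": "p1", "name": "Lamp"},
    {"document_id": "p2", "name": "Desk"},
]
reviews = [
    {"product_document_id": "p1", "rating": 4},
    {"product_document_id": "p1", "rating": 5},
]

matches = join_documents(products, reviews, "document_id", "product_document_id")
print(matches)
```

Here one target (p1) matches two enhancer documents and the other (p2) matches none, which is the one-to-many case that aggregation is meant to handle.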

Example Scenario

You have:

  • Content Source A: Product documents with basic information (document_id, name, description)
  • Content Source B: Customer reviews with ratings (product_document_id, rating, review_text)

You want to enhance product documents with aggregated review data (max rating, review count).

Step 2: Create the Join Enhancement Configuration

To create a join enhancement connector, define a configuration file with the following fields. For complete configuration details, see the Join Enhancement API Reference.

  • connector: (string) The connector type. Set it to join_enhancement.
  • name: (string) The name of the enhancement.
  • description: (string) A description of the enhancement.
  • is_indexable: (boolean) Whether the enhancement is indexable.
  • connector_configuration: (object) The configuration of the enhancement:
    • enhancement_id: (string) A unique identifier for this join enhancement.
    • target_content_source_id: (string) The ID of the content source containing documents to be enhanced.
    • enhancer_content_source_id: (string) The ID of the content source containing enhancing documents.
    • join_fields: (object) Specifies which fields to use for joining:
      • target_field_name: (string) The field name in target documents to match on (typically "document_id" or "uri").
      • enhancer_field_name: (string) The field name in enhancer documents to match on.
    • custom_metadata: (object, optional) Static custom metadata to add to all enhanced documents.
    • custom_metadata_aggregates: (array, optional) Defines how to aggregate data from enhancer documents.
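
Before submitting a configuration, it can help to check locally that nothing is missing. The sketch below assumes that every field not marked "optional" in the list above is required; that is an assumption, not something the platform documents here, and missing_fields is a hypothetical helper rather than part of any Zeta Alpha tooling.

```python
import json

# Assumption: fields not marked "optional" in the list above are required.
REQUIRED_TOP = ("connector", "name", "description", "is_indexable",
                "connector_configuration")
REQUIRED_CONFIG = ("enhancement_id", "target_content_source_id",
                   "enhancer_content_source_id", "join_fields")
REQUIRED_JOIN = ("target_field_name", "enhancer_field_name")

def missing_fields(config):
    """Return dotted paths of expected fields that are absent from config."""
    missing = [k for k in REQUIRED_TOP if k not in config]
    inner = config.get("connector_configuration", {})
    missing += [f"connector_configuration.{k}"
                for k in REQUIRED_CONFIG if k not in inner]
    join = inner.get("join_fields", {})
    missing += [f"connector_configuration.join_fields.{k}"
                for k in REQUIRED_JOIN if k not in join]
    return missing

config = {
    "connector": "join_enhancement",
    "name": "Product Reviews Join",
    "description": "Enhance products with review data",
    "is_indexable": True,
    "connector_configuration": {
        "enhancement_id": "product_reviews",
        "target_content_source_id": "products-source-id",
        "enhancer_content_source_id": "reviews-source-id",
        # enhancer_field_name is deliberately left out to show the check.
        "join_fields": {"target_field_name": "document_id"},
    },
}

problems = missing_fields(config)
print(problems)
print(json.dumps(config, indent=2))  # JSON text to paste once it is complete
```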

Example Configuration

{
  "name": "Product Reviews Join",
  "description": "Enhance products with review data",
  "is_indexable": true,
  "connector": "join_enhancement",
  "connector_configuration": {
    "enhancement_id": "product_reviews",
    "target_content_source_id": "products-source-id",
    "enhancer_content_source_id": "reviews-source-id",
    "join_fields": {
      "target_field_name": "document_id",
      "enhancer_field_name": "product_document_id"
    },
    "custom_metadata": {
      "has_reviews": true
    },
    "custom_metadata_aggregates": [
      {
        "index_field_name": "max_rating",
        "enhancer_field_name": "rating",
        "operator": "max"
      },
      {
        "index_field_name": "review_count",
        "operator": "count"
      },
      {
        "index_field_name": "reviews",
        "operator": "push",
        "array_field_mapping": [
          {
            "enhancer_field_name": "review_text",
            "index_field_name": "text"
          },
          {
            "enhancer_field_name": "rating",
            "index_field_name": "rating"
          },
          {
            "enhancer_field_name": "reviewer_name",
            "index_field_name": "author"
          }
        ]
      }
    ]
  }
}

Step 3: Understand the Aggregation Operators

The custom_metadata_aggregates field supports three types of aggregation:

1. Aggregate on Field (sum, max, first)

Performs calculations on a specific field from enhancer documents:

{
  "index_field_name": "total_sales",
  "enhancer_field_name": "sale_amount",
  "operator": "sum"
}

Supported operators:

  • sum: Sum all values
  • max: Take the maximum value
  • first: Take the first value
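
As a rough mental model, the three field operators behave like the Python sketch below. The aggregate_field helper is hypothetical, and three details are assumptions rather than documented behavior: "first" follows the order in which the documents are supplied, documents lacking the field are skipped, and an empty match set yields None.

```python
def aggregate_field(docs, field, operator):
    """Apply a sum, max, or first aggregation to one field of the matched docs."""
    # Assumption: documents that lack the field are ignored.
    values = [d[field] for d in docs if field in d]
    if not values:
        return None  # Assumption: nothing to aggregate yields no value.
    if operator == "sum":
        return sum(values)
    if operator == "max":
        return max(values)
    if operator == "first":
        return values[0]  # Assumption: "first" means first in supplied order.
    raise ValueError(f"unsupported operator: {operator}")

# Hypothetical enhancer documents matched to a single target.
sales = [{"sale_amount": 10.0}, {"sale_amount": 25.5}, {"sale_amount": 4.5}]

total_sales = aggregate_field(sales, "sale_amount", "sum")
largest_sale = aggregate_field(sales, "sale_amount", "max")
first_sale = aggregate_field(sales, "sale_amount", "first")
print(total_sales, largest_sale, first_sale)
```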

2. Count Aggregation

Counts the number of matching enhancer documents:

{
  "index_field_name": "review_count",
  "operator": "count"
}
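
Because count looks only at how many enhancer documents match, it needs no enhancer_field_name. A minimal Python sketch of the idea, using hypothetical review documents, follows. Reporting 0 for a target with no matches is a property of this sketch; the platform's handling of unmatched targets is not stated here.

```python
from collections import Counter

# Hypothetical enhancer documents; only the join field matters for counting.
reviews = [
    {"product_document_id": "p1", "rating": 4},
    {"product_document_id": "p1", "rating": 5},
    {"product_document_id": "p2", "rating": 3},
]

# Count matching enhancer documents per join value.
review_counts = Counter(r["product_document_id"] for r in reviews)
print(review_counts["p1"], review_counts["p2"], review_counts["p3"])
```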

3. Array Aggregation (push)

Collects data from all matching enhancer documents into an array:

{
  "index_field_name": "all_reviews",
  "operator": "push",
  "array_field_mapping": [
    {
      "enhancer_field_name": "review_text",
      "index_field_name": "text"
    },
    {
      "enhancer_field_name": "rating",
      "index_field_name": "score"
    }
  ]
}
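
The effect of array_field_mapping can be sketched in Python as renaming selected enhancer fields and collecting one entry per matching document. The push_aggregate helper and the sample reviews are hypothetical, and skipping fields that a document lacks is an assumption of this sketch.

```python
def push_aggregate(docs, array_field_mapping):
    """Collect one renamed entry per matched enhancer document."""
    result = []
    for doc in docs:
        entry = {
            m["index_field_name"]: doc[m["enhancer_field_name"]]
            for m in array_field_mapping
            if m["enhancer_field_name"] in doc  # Assumption: skip absent fields.
        }
        result.append(entry)
    return result

# Hypothetical enhancer documents matched to a single target.
reviews = [
    {"review_text": "Great lamp", "rating": 5, "reviewer_name": "Ana"},
    {"review_text": "Too dim", "rating": 2},
]
mapping = [
    {"enhancer_field_name": "review_text", "index_field_name": "text"},
    {"enhancer_field_name": "rating", "index_field_name": "score"},
]

all_reviews = push_aggregate(reviews, mapping)
print(all_reviews)
```

Fields that are not listed in the mapping (reviewer_name in this sample) are left out of the resulting array entries.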

Step 4: Create the Join Enhancement Content Source

To create the join enhancement content source in the Zeta Alpha Platform UI:

  1. Navigate to your tenant and click View next to your target index
  2. Click View under Content Sources for the index
  3. Click Create Content Source
  4. Paste your JSON configuration
  5. Click Submit
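
A stray comma or missing brace is easy to introduce when editing the configuration by hand, so it can be worth parsing the JSON locally before pasting it. The sketch below uses only Python's standard json module; check_json is a hypothetical helper and performs no platform-specific validation.

```python
import json

def check_json(text):
    """Return (True, parsed) for valid JSON, else (False, error description)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"

good = '{"connector": "join_enhancement", "is_indexable": true}'
bad = '{"connector": "join_enhancement",}'  # trailing comma is not valid JSON

ok_good, parsed = check_json(good)
ok_bad, message = check_json(bad)
print(ok_good, parsed)
print(ok_bad, message)
```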

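Common Use Cases

Each snippet below shows only the connector_configuration object. To form a complete configuration, wrap it in the top-level fields described in Step 2 (connector, name, description, is_indexable).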

Product Reviews Enhancement

Enhance product documents with aggregated review data:

{
  "enhancement_id": "reviews",
  "target_content_source_id": "products",
  "enhancer_content_source_id": "reviews",
  "join_fields": {
    "target_field_name": "document_id",
    "enhancer_field_name": "product_document_id"
  },
  "custom_metadata_aggregates": [
    {
      "index_field_name": "max_rating",
      "enhancer_field_name": "rating",
      "operator": "max"
    },
    {
      "index_field_name": "review_count",
      "operator": "count"
    }
  ]
}

Document Citations Enhancement

Enhance research papers with citation information:

{
  "enhancement_id": "citations",
  "target_content_source_id": "papers",
  "enhancer_content_source_id": "citations",
  "join_fields": {
    "target_field_name": "uri",
    "enhancer_field_name": "cited_paper_uri"
  },
  "custom_metadata_aggregates": [
    {
      "index_field_name": "citation_count",
      "operator": "count"
    },
    {
      "index_field_name": "latest_citation_date",
      "enhancer_field_name": "citation_date",
      "operator": "max"
    },
    {
      "index_field_name": "citing_papers",
      "operator": "push",
      "array_field_mapping": [
        {
          "enhancer_field_name": "citing_paper_id",
          "index_field_name": "paper_id"
        },
        {
          "enhancer_field_name": "citation_date",
          "index_field_name": "date"
        }
      ]
    }
  ]
}

User Activity Enhancement

Enhance documents with user engagement metrics:

{
  "enhancement_id": "user_activity",
  "target_content_source_id": "documents",
  "enhancer_content_source_id": "user_events",
  "join_fields": {
    "target_field_name": "document_id",
    "enhancer_field_name": "document_id"
  },
  "custom_metadata_aggregates": [
    {
      "index_field_name": "view_count",
      "operator": "count"
    },
    {
      "index_field_name": "last_viewed",
      "enhancer_field_name": "timestamp",
      "operator": "max"
    },
    {
      "index_field_name": "first_viewed",
      "enhancer_field_name": "timestamp",
      "operator": "first"
    }
  ]
}

Price History Enhancement

Enhance products with historical pricing data:

{
  "enhancement_id": "price_history",
  "target_content_source_id": "products",
  "enhancer_content_source_id": "price_updates",
  "join_fields": {
    "target_field_name": "document_id",
    "enhancer_field_name": "product_document_id"
  },
  "custom_metadata_aggregates": [
    {
      "index_field_name": "current_price",
      "enhancer_field_name": "price",
      "operator": "first"
    },
    {
      "index_field_name": "max_price",
      "enhancer_field_name": "price",
      "operator": "max"
    },
    {
      "index_field_name": "price_changes",
      "operator": "push",
      "array_field_mapping": [
        {
          "enhancer_field_name": "price",
          "index_field_name": "amount"
        },
        {
          "enhancer_field_name": "change_date",
          "index_field_name": "date"
        }
      ]
    }
  ]
}

Join Behavior

The join enhancement:

  1. Runs automatically: The join is processed whenever target or enhancer documents are ingested or updated
  2. Handles one-to-many: A single target document can be enhanced with data from multiple enhancer documents
  3. Updates dynamically: When enhancer documents change, target documents are automatically re-enhanced
  4. Preserves original data: Original document data remains unchanged; enhancements are stored separately
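
The last two behaviors can be pictured with a small Python sketch: enhancements live in a separate store keyed by document ID, and they are recomputed when the enhancer data changes. This is a conceptual model only. The enhance helper, the in-memory enhancements dictionary, and the sample documents are all hypothetical and say nothing about how the platform actually stores or schedules enhancements.

```python
import copy

def enhance(target, enhancers, target_field, enhancer_field):
    """Compute enhancement data for one target without modifying it."""
    matches = [e for e in enhancers
               if e.get(enhancer_field) == target[target_field]]
    return {"review_count": len(matches)}

product = {"document_id": "p1", "name": "Lamp"}
original = copy.deepcopy(product)  # kept to show the target stays untouched
reviews = [{"product_document_id": "p1", "rating": 4}]

# Hypothetical separate store for enhancement data, keyed by document ID.
enhancements = {}
enhancements["p1"] = enhance(product, reviews, "document_id", "product_document_id")
count_before = enhancements["p1"]["review_count"]

# A new enhancer document arrives, so the target is re-enhanced.
reviews.append({"product_document_id": "p1", "rating": 5})
enhancements["p1"] = enhance(product, reviews, "document_id", "product_document_id")
count_after = enhancements["p1"]["review_count"]

print(count_before, count_after, product == original)
```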