Process Documents using AI Agents
You can process documents using AI agents configured in your tenant. This enhancement allows you to perform various tasks such as summarization, classification, or custom processing based on the agent's configuration. The processed output is stored as an enhancement and can be mapped to specific fields of the document in the index.
This guide shows you how to configure an agent processor enhancement for your data ingestion workflows.
Prerequisites
Before you begin, ensure you have:
- Access to the Zeta Alpha Platform UI
- A tenant created
- An index created
- An AI agent configured in the tenant configuration
Step 1: Configure the AI Agent in the Tenant
First, add the AI agent to your tenant configuration. Define the agent in the chat_bot_setups field of the tenant configuration:
{
  "tenant": "my-tenant",
  ...
  "chat_bot_setups": [
    {
      "bot_identifier": "document_processor",
      "llm_configuration_name": "my_llm_config",
      "llm_tracing_configuration_name": "my_tracing_config",
      "agent_name": "agent_processor",
      "serving_url": "https://agent-service.example.com",
      "bot_configuration": {
        "agent_config": {
          "task_description": "Summarize the document content",
          "output_field": "summary",
          "model_parameters": {
            "temperature": 0.7,
            "max_tokens": 500
          }
        }
      }
    }
  ]
}
Agent Configuration Fields
- bot_identifier: (string) The unique identifier for the agent
- llm_configuration_name: (string) Reference to the LLM configuration in the tenant
- llm_tracing_configuration_name: (string, optional) Reference to the LLM tracing configuration
- agent_name: (string) The name of the agent. Set it to agent_processor for document processing
- serving_url: (string) The URL of the agent serving endpoint
- bot_configuration: (object) The configuration of the agent containing task details and parameters
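The field list above can be turned into a quick pre-flight check before you save the tenant configuration. The following is a minimal sketch, not part of the platform: check_agent_setup is a hypothetical helper, and the set of required keys is an assumption read off the field descriptions, where only llm_tracing_configuration_name is marked optional.

```python
# Hypothetical helper, not a Zeta Alpha API: checks one chat_bot_setups entry
# against the fields described in this guide.
REQUIRED_KEYS = (
    "bot_identifier",
    "llm_configuration_name",
    "agent_name",
    "serving_url",
    "bot_configuration",
)


def check_agent_setup(setup: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in setup]
    # The guide sets agent_name to agent_processor for document processing.
    if "agent_name" in setup and setup["agent_name"] != "agent_processor":
        problems.append("agent_name should be 'agent_processor'")
    # bot_configuration is described as an object.
    if "bot_configuration" in setup and not isinstance(setup["bot_configuration"], dict):
        problems.append("bot_configuration should be an object")
    return problems


example_setup = {
    "bot_identifier": "document_processor",
    "llm_configuration_name": "my_llm_config",
    "agent_name": "agent_processor",
    "serving_url": "https://agent-service.example.com",
    "bot_configuration": {
        "agent_config": {"task_description": "Summarize the document content"}
    },
}

for problem in check_agent_setup(example_setup):
    print(problem)
```

The check only looks at the shape of the entry. It does not verify that the referenced LLM configuration exists or that the serving URL is reachable.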
Step 2: Create the Agent Processor Enhancement Content Source
To create the agent processor enhancement content source, define a configuration file with the following fields:
- connector: (string) The connector type. Set it to agent_processor_enhancement.
- name: (string) The name of the enhancement.
- description: (string) A description of the enhancement.
- is_indexable: (boolean) Whether the enhancement is indexable.
- connector_configuration: (object) The configuration of the enhancement:
  - enhancement_id: (string) The ID of the enhancement. Must be "agent_output".
  - field_mappings: (array) Maps the agent output fields to index fields.
Example Configuration
Here's an example agent processor enhancement content source configuration:
{
  "name": "Document Summarization Enhancement",
  "description": "AI-generated summaries for documents",
  "is_indexable": true,
  "connector": "agent_processor_enhancement",
  "workflow_name_overrides": {
    "ingest": "enhance-agent",
    "reingest": "enhance-agent"
  },
  "connector_configuration": {
    "enhancement_id": "agent_output",
    "field_mappings": [
      {
        "content_source_field_name": "summary",
        "index_field_name": "ai_generated_summary"
      },
      {
        "content_source_field_name": "key_topics",
        "index_field_name": "extracted_topics"
      }
    ]
  }
}
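If you maintain several enhancements, generating this configuration from a small mapping can reduce copy-paste mistakes. The sketch below is illustrative only: build_agent_enhancement_source is a hypothetical helper, and it simply reproduces the structure of the example above, including the enhance-agent workflow overrides.

```python
import json


def build_agent_enhancement_source(name: str, description: str, field_map: dict) -> dict:
    """Build a content source configuration shaped like the example in this guide.

    field_map maps agent output field names to index field names.
    """
    return {
        "name": name,
        "description": description,
        "is_indexable": True,
        "connector": "agent_processor_enhancement",
        "workflow_name_overrides": {
            "ingest": "enhance-agent",
            "reingest": "enhance-agent",
        },
        "connector_configuration": {
            # The guide requires this exact enhancement ID.
            "enhancement_id": "agent_output",
            "field_mappings": [
                {"content_source_field_name": source, "index_field_name": target}
                for source, target in field_map.items()
            ],
        },
    }


config = build_agent_enhancement_source(
    "Document Summarization Enhancement",
    "AI-generated summaries for documents",
    {"summary": "ai_generated_summary", "key_topics": "extracted_topics"},
)
# Print JSON that can be pasted into the Create Content Source form.
print(json.dumps(config, indent=2))
```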
Step 3: Create an Agent Enhancement Workflow
The agent processor in the main document workflow triggers an enhancement workflow. Because the document is already in the pipeline, the main document workflow eventually indexes the agent-processed output, so the enhancement workflow does not need to process the document again.
To avoid reprocessing the document with the enhancement workflow, ensure the following workflow exists:
{
  "name": "enhance-agent",
  "steps": [
    {
      "next_services": [
        "pipeline_source"
      ],
      "service": "start"
    }
  ],
  "tasks": [
    {
      "name": "pipeline-source",
      "processor_settings": {
        "always_run": true,
        "skip_deleted": false
      }
    }
  ]
}
Set this workflow as the workflow override for the agent processor enhancement content source, as the workflow_name_overrides field in the Step 2 example does, before you create the content source in the next step.
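A typo in the override name, such as enhance_agent instead of enhance-agent, is easy to miss. The sketch below shows one way to compare the overrides with the workflow names you have defined. unresolved_overrides is a hypothetical helper, and how you obtain the set of existing workflow names is outside this sketch.

```python
def unresolved_overrides(content_source: dict, workflow_names: set) -> dict:
    """Return the overrides whose target workflow is not in workflow_names."""
    overrides = content_source.get("workflow_name_overrides", {})
    return {
        event: workflow
        for event, workflow in overrides.items()
        if workflow not in workflow_names
    }


source = {
    "workflow_name_overrides": {
        "ingest": "enhance-agent",
        "reingest": "enhance-agent",
    }
}

# Workflow names are assumed to be collected by hand or from your own tooling.
print(unresolved_overrides(source, {"enhance-agent"}))
print(unresolved_overrides(source, {"enhance_agent"}))
```

An empty result means every override points at a known workflow. Any entry that is returned names the event and the workflow that could not be matched.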
Step 4: Create the Content Source
To create the agent processor enhancement content source in the Zeta Alpha Platform UI:
- Navigate to your tenant and click View next to your target index
- Click View under Content Sources for the index
- Click Create Content Source
- Paste your JSON configuration
- Click Submit
Step 5: Add the Agent Processor to the Documents Workflow
To process documents using the agent processor, add it to the documents workflow. This processor invokes the AI agent to process document content according to the agent's configuration.
Configuration Steps
- Identify the content source that ingests the documents you want to process
- Create or modify the workflow that content source uses by adding the agent_processor task
- Ensure the processor is added after the text representation is created and before the index_updater processor
Example Workflow Task
Add the following task to the list of workflow tasks:
{
  "name": "agent_processor",
  "local_settings": {
    "agent_identifier": "document_processor",
    "should_rerun_agent": false,
    "content_source_name": "Document Summarization Enhancement",
    "agent_input_representations": [
      "document_id",
      "text",
      "metadata"
    ],
    "agent_output_to_representation_mapping": {
      "summary": "agent_output.summary",
      "key_topics": "agent_output.key_topics"
    }
  }
}
Configuration Parameters
- agent_identifier: (string) The identifier of the agent configured in the tenant (matches bot_identifier from Step 1)
- should_rerun_agent: (boolean) When set to true, the agent always processes documents, even when they are reprocessed. Set to false to process documents only on first ingestion.
- content_source_name: (string) The name of the agent processor enhancement content source defined in Step 2
- agent_input_representations: (array of strings) Specifies which document representations to send to the agent. Common values include "document_id", "text", "pdf", and "metadata"
- agent_output_to_representation_mapping: (object) Maps the agent's output fields to document representation paths where the processed data will be stored
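Several values in this task must agree with configuration from earlier steps. The following sketch cross-checks them. cross_check is a hypothetical helper, and the rule that representation paths have the form enhancement_id.field_name is an assumption inferred from the examples in this guide (for example agent_output.summary), not a documented guarantee.

```python
def cross_check(tenant_config: dict, content_source: dict, task: dict) -> list:
    """Return a list of inconsistencies between the three configurations."""
    problems = []
    settings = task.get("local_settings", {})

    # agent_identifier should match a bot_identifier from Step 1.
    bot_ids = {bot.get("bot_identifier") for bot in tenant_config.get("chat_bot_setups", [])}
    if settings.get("agent_identifier") not in bot_ids:
        problems.append("agent_identifier does not match any bot_identifier")

    # content_source_name should match the content source name from Step 2.
    if settings.get("content_source_name") != content_source.get("name"):
        problems.append("content_source_name does not match the content source name")

    connector_cfg = content_source.get("connector_configuration", {})
    enhancement_id = connector_cfg.get("enhancement_id")
    targets = set(settings.get("agent_output_to_representation_mapping", {}).values())
    for mapping in connector_cfg.get("field_mappings", []):
        # Assumption: paths look like "<enhancement_id>.<content_source_field_name>".
        expected = f"{enhancement_id}.{mapping['content_source_field_name']}"
        if expected not in targets:
            problems.append(f"no agent output is mapped to {expected}")
    return problems


tenant = {"chat_bot_setups": [{"bot_identifier": "document_processor"}]}
source = {
    "name": "Document Summarization Enhancement",
    "connector_configuration": {
        "enhancement_id": "agent_output",
        "field_mappings": [
            {"content_source_field_name": "summary", "index_field_name": "ai_generated_summary"},
            {"content_source_field_name": "key_topics", "index_field_name": "extracted_topics"},
        ],
    },
}
task = {
    "name": "agent_processor",
    "local_settings": {
        "agent_identifier": "document_processor",
        "content_source_name": "Document Summarization Enhancement",
        "agent_output_to_representation_mapping": {
            "summary": "agent_output.summary",
            "key_topics": "agent_output.key_topics",
        },
    },
}

for problem in cross_check(tenant, source, task):
    print(problem)
```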
Viewing Enhanced Documents
After processing, enhanced documents will contain:
- The original document fields
- Additional fields with AI-generated content
- An enhancements section in the ingested document details showing the agent output
You can view the enhancements by:
- Navigating to your tenant and clicking View next to your target index
- Clicking View under Content Sources for the index
- Clicking View under Ingested Documents for a content source
- Selecting a document to see its details and enhancements