Custom RAG Agent with the Completion Client

note

This article covers custom agent development: writing a Python agent class when the built-in configurable agent is not enough. For standard search and retrieval-augmented generation, configure the built-in agent with the CLI instead; see Configuring Tools.

Use this pattern when you need direct control over retrieval, prompt construction, or post-processing that does not fit the built-in provider configuration. The example below uses ZAVRetriever to search Zeta Alpha and ZAVChatCompletionClient to generate the final answer.

info

You can use low-level vendor clients directly, but most custom agents should prefer injectable dependencies so configuration and credentials stay outside the agent class. See The Dependency Injection System.
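The general idea of constructor injection can be sketched in plain Python, independent of the SDK. FakeRetriever and QuestionAnswerer are hypothetical names used only for illustration. The fake mirrors just the search(query_string=...) call used in this article and assumes the result behaves like a dict with a "hits" key. This is not the SDK's injection mechanism, only a sketch of why passing dependencies in from outside keeps a class easy to test.

```python
import asyncio
from typing import Any, Dict, List


class FakeRetriever:
    """Hypothetical stand-in that exposes the same search(query_string=...) call."""

    def __init__(self, hits: List[Dict[str, Any]]) -> None:
        self.hits = hits
        self.queries: List[str] = []

    async def search(self, query_string: str) -> Dict[str, Any]:
        # Record the query so a test can inspect what was searched for.
        self.queries.append(query_string)
        return {"hits": self.hits}


class QuestionAnswerer:
    """Hypothetical class that receives its retriever through the constructor."""

    def __init__(self, retriever: Any) -> None:
        # No credentials or endpoints are configured here; the caller supplies them.
        self.retriever = retriever

    async def count_hits(self, question: str) -> int:
        result = await self.retriever.search(query_string=question)
        return len(result.get("hits", []))


fake = FakeRetriever(hits=[{"title": "Attention Is All You Need"}])
answerer = QuestionAnswerer(retriever=fake)
hit_count = asyncio.run(answerer.count_hits("What is a transformer?"))
```

Because the class only depends on the shape of the injected object, the same code can run against a real retriever in production and a fake one in tests.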

Prerequisites

Step 1: Create a Custom Agent

Change to your agents project directory and scaffold a custom agent:

za agents add --custom "my_rag_agent"

This creates my_rag_agent.py in your agents project:

<agents project>/
├── .gitignore
├── __init__.py
├── my_rag_agent.py
├── agent_setups.json
└── env/
    └── agent_setups.json

Step 2: Implement Retrieval and Answer Generation

Replace the contents of my_rag_agent.py with:

import json
from typing import List, Optional

from zav.agents_sdk import ChatAgent, ChatAgentClassRegistry, ChatMessage
from zav.agents_sdk.adapters import ZAVChatCompletionClient, ZAVRetriever


@ChatAgentClassRegistry.register()
class MyRAGAgent(ChatAgent):
    agent_name = "my_rag_agent"

    def __init__(
        self,
        retriever: ZAVRetriever,
        client: ZAVChatCompletionClient,
    ):
        self.retriever = retriever
        self.client = client

    async def execute(self, conversation: List[ChatMessage]) -> Optional[ChatMessage]:
        # Use the latest message as the search query.
        query = conversation[-1].content
        search_result = await self.retriever.search(query_string=query)

        # Serialize the hits and rewrite the latest message in place as the RAG prompt.
        documents = json.dumps(search_result.get("hits", []), indent=2, default=str)
        conversation[-1].content = f"""Answer the following question based on the provided documents.

# Question
{query}

# Documents
{documents}
"""

        response = await self.client.complete(
            messages=conversation,
            max_tokens=2048,
        )
        return ChatMessage.from_orm(response.chat_completion)

The agent reads the latest user message, searches for relevant documents, builds a RAG prompt, and asks the completion client to produce the answer.
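If the retrieved hits are large, the prompt-building step is a natural place to take more control. The following is a minimal sketch of a standalone helper, assuming each hit is a JSON-serializable dict. build_rag_prompt, max_documents, and max_chars_per_document are hypothetical names, not part of the SDK, and the character-based truncation is a crude stand-in for real token counting.

```python
import json
from typing import Any, Dict, List


def build_rag_prompt(
    query: str,
    hits: List[Dict[str, Any]],
    max_documents: int = 5,
    max_chars_per_document: int = 1000,
) -> str:
    """Build a RAG prompt from the question and a trimmed list of hits."""
    documents = []
    for hit in hits[:max_documents]:
        # Serialize each hit separately so one long hit cannot crowd out the others.
        text = json.dumps(hit, default=str)
        documents.append(text[:max_chars_per_document])

    # Number the documents so the model can refer to them individually.
    joined = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer the following question based on the provided documents.\n\n"
        f"# Question\n{query}\n\n"
        f"# Documents\n{joined}\n"
    )


example_hits = [{"title": f"Doc {n}", "text": "x" * 50} for n in range(10)]
prompt = build_rag_prompt("What is a transformer?", example_hits, max_documents=2)
```

Inside execute, such a helper could replace the inline f-string, for example by passing search_result.get("hits", []) as the hits argument. Keeping it as a pure function makes the prompt format testable without a retriever or completion client.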

Step 3: Run the Agent Locally

Run the local development UI:

za agents dev --reload

You can also serve the agent as an API:

za agents serve --reload

Then call the chat endpoint:

curl -X POST "http://localhost:8000/chats/responses?tenant=zetaalpha" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_identifier": "my_rag_agent",
    "conversation": [
      {"sender": "user", "content": "What is a transformer?"}
    ]
  }'

The response should contain an answer grounded in the retrieved documents.
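The same request can be sent from Python. The sketch below mirrors the curl command above using only the standard library. build_chat_request and ask are hypothetical helper names, and the response is returned as raw text because its schema is not described here.

```python
import json
import urllib.parse
import urllib.request


def build_chat_request(
    question: str,
    agent_identifier: str = "my_rag_agent",
    base_url: str = "http://localhost:8000",
    tenant: str = "zetaalpha",
) -> urllib.request.Request:
    """Build the same POST request as the curl command."""
    url = f"{base_url}/chats/responses?" + urllib.parse.urlencode({"tenant": tenant})
    payload = {
        "agent_identifier": agent_identifier,
        "conversation": [{"sender": "user", "content": question}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> str:
    """Send the request and return the raw response body as text."""
    request = build_chat_request(question)
    # Retrieval plus generation can be slow, so allow a generous timeout.
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")
```

With za agents serve running, calling print(ask("What is a transformer?")) should print the response body. Separating request construction from sending keeps the payload easy to inspect without a running server.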