Custom RAG Agent with the Completion Client
This article covers custom agent development: writing a Python agent class when the built-in configurable agent is not enough. For standard search and retrieval-augmented generation, configure the built-in agent with the CLI instead; see Configuring Tools.
Use this pattern when you need direct control over retrieval, prompt construction, or post-processing that does not fit the built-in provider configuration. The example below uses ZAVRetriever to search Zeta Alpha and ZAVChatCompletionClient to generate the final answer.
You can use low-level vendor clients directly, but most custom agents should prefer injectable dependencies so configuration and credentials stay outside the agent class. See The Dependency Injection System.
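To illustrate why injected dependencies keep configuration outside the agent, here is a minimal sketch that does not use the SDK at all. The Retriever protocol, FakeRetriever, and TitleLister are hypothetical names invented for this illustration; the only detail borrowed from this article is an async search(query_string=...) call that returns a dictionary with a "hits" list.

```python
import asyncio
from typing import Any, Dict, List, Protocol


class Retriever(Protocol):
    """Hypothetical interface: anything with an async search method."""

    async def search(self, query_string: str) -> Dict[str, Any]: ...


class FakeRetriever:
    """Stand-in that returns canned hits instead of calling a search API."""

    async def search(self, query_string: str) -> Dict[str, Any]:
        return {"hits": [{"title": "Example document", "query": query_string}]}


class TitleLister:
    """Toy agent-like class: it receives its retriever and never constructs it."""

    def __init__(self, retriever: Retriever):
        self.retriever = retriever

    async def titles(self, query: str) -> List[str]:
        result = await self.retriever.search(query_string=query)
        return [hit["title"] for hit in result.get("hits", [])]


def demo() -> List[str]:
    # No credentials or network access are needed because the dependency is a fake.
    return asyncio.run(TitleLister(FakeRetriever()).titles("What is a transformer?"))


if __name__ == "__main__":
    print(demo())
```

Because the class only depends on the shape of the retriever, the same code can run against a real client in production and a fake in tests, with no credentials stored in the class itself.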
Prerequisites
- Completed Getting Started with the Agents SDK
- Familiar with Custom Agent Development
Step 1: Create a Custom Agent
Change to your agents project directory and scaffold a custom agent:
za agents add --custom "my_rag_agent"
This creates my_rag_agent.py in your agents project:
<agents project>/
├── .gitignore
├── __init__.py
├── my_rag_agent.py
├── agent_setups.json
└── env/
    └── agent_setups.json
Step 2: Implement Retrieval and Answer Generation
Replace the contents of my_rag_agent.py with:
import json
from typing import List, Optional

from zav.agents_sdk import ChatAgent, ChatAgentClassRegistry, ChatMessage
from zav.agents_sdk.adapters import ZAVChatCompletionClient, ZAVRetriever


@ChatAgentClassRegistry.register()
class MyRAGAgent(ChatAgent):
    agent_name = "my_rag_agent"

    def __init__(
        self,
        retriever: ZAVRetriever,
        client: ZAVChatCompletionClient,
    ):
        self.retriever = retriever
        self.client = client

    async def execute(self, conversation: List[ChatMessage]) -> Optional[ChatMessage]:
        # Use the latest message as the search query.
        query = conversation[-1].content
        search_result = await self.retriever.search(query_string=query)
        # Serialize the hits; default=str covers values json cannot encode natively.
        documents = json.dumps(search_result.get("hits", []), indent=2, default=str)
        # Replace the latest message with the RAG prompt.
        conversation[-1].content = f"""Answer the following question based on the provided documents.
# Question
{query}
# Documents
{documents}
"""
        response = await self.client.complete(
            messages=conversation,
            max_tokens=2048,
        )
        return ChatMessage.from_orm(response.chat_completion)
The agent reads the latest user message, searches for relevant documents, builds a RAG prompt, and asks the completion client to produce the answer.
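The prompt-building step can be pulled out into a pure function, which makes it easy to inspect or unit test without a retriever or completion client. The sketch below is one possible refactoring, not part of the SDK: build_rag_prompt is a hypothetical helper name, and the sample hit is placeholder data.

```python
import json
from typing import Any, Dict, List


def build_rag_prompt(query: str, hits: List[Dict[str, Any]]) -> str:
    """Return the RAG prompt for a question and its retrieved hits."""
    # default=str keeps values such as dates or UUIDs from raising TypeError.
    documents = json.dumps(hits, indent=2, default=str)
    return (
        "Answer the following question based on the provided documents.\n"
        "# Question\n"
        f"{query}\n"
        "# Documents\n"
        f"{documents}\n"
    )


if __name__ == "__main__":
    sample_hits = [{"title": "Example document", "text": "Placeholder passage."}]
    print(build_rag_prompt("What is a transformer?", sample_hits))
```

Inside execute, the agent could call this helper with the query and the "hits" list and assign the result to the latest message, which keeps the prompt template in one place.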
Step 3: Run the Agent Locally
Run the local development UI:
za agents dev --reload
You can also serve the agent as an API:
za agents serve --reload
Then call the chat endpoint:
curl -X POST "http://localhost:8000/chats/responses?tenant=zetaalpha" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_identifier": "my_rag_agent",
    "conversation": [
      {"sender": "user", "content": "What is a transformer?"}
    ]
  }'
The response should contain an answer grounded in the retrieved documents.
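The same request can be sent from Python. The sketch below uses only the standard library and mirrors the URL, tenant, headers, and payload of the curl command; the helper names build_chat_request and ask are hypothetical, and nothing is assumed about the response schema beyond it being JSON.

```python
import json
import urllib.request
from typing import Any

# Values copied from the curl example; adjust host, port, and tenant for your setup.
BASE_URL = "http://localhost:8000"
TENANT = "zetaalpha"


def build_chat_request(
    question: str, agent_identifier: str = "my_rag_agent"
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the chat endpoint."""
    payload = {
        "agent_identifier": agent_identifier,
        "conversation": [{"sender": "user", "content": question}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chats/responses?tenant={TENANT}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> Any:
    """Send the request and decode the body, assuming the server returns JSON."""
    with urllib.request.urlopen(build_chat_request(question)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With za agents serve running, calling ask("What is a transformer?") should return the decoded response body. Building the request separately from sending it lets you inspect the URL and payload without a running server.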