Custom RAG Agent with the Completion Client

note

This article covers custom agent development: writing your own Python agent class when the built-in configurable agent is not enough. For standard search and retrieval-augmented generation, configure the built-in agent with the CLI instead; see Configuring Tools.

Use this pattern when you need direct control over retrieval, prompt construction, or post-processing that does not fit the built-in provider configuration. The example below uses ZAVRetriever to search Zeta Alpha and ZAVChatCompletionClient to generate the final answer.

info

You can use low-level vendor clients directly, but most custom agents should prefer injectable dependencies so configuration and credentials stay outside the agent class. See The Dependency Injection System.
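To illustrate why injected dependencies are convenient, here is a minimal sketch of constructor injection in plain Python. FakeRetriever, FakeClient, and SketchAgent are hypothetical stand-ins written for this illustration, not SDK classes, and their method shapes are assumptions that only loosely mirror the example later in this article. The point is that an agent which receives its collaborators through the constructor can be exercised without credentials or network access.

```python
import asyncio
from typing import Any, Dict, List


class FakeRetriever:
    """Hypothetical stand-in for an injected retriever (not an SDK class)."""

    async def search(self, query_string: str) -> Dict[str, Any]:
        # Return a canned result instead of calling a search backend.
        return {"hits": [{"title": "Example document", "query": query_string}]}


class FakeClient:
    """Hypothetical stand-in for an injected completion client (not an SDK class)."""

    async def complete(self, messages: List[Dict[str, str]], max_tokens: int) -> str:
        # Return a canned answer instead of calling a language model.
        return f"answered using {len(messages)} message(s)"


class SketchAgent:
    """Receives its collaborators from outside, so it holds no credentials itself."""

    def __init__(self, retriever: FakeRetriever, client: FakeClient):
        self.retriever = retriever
        self.client = client

    async def execute(self, question: str) -> str:
        result = await self.retriever.search(query_string=question)
        prompt = f"{question}\n\n{result['hits']}"
        return await self.client.complete(
            messages=[{"sender": "user", "content": prompt}], max_tokens=64
        )


def run_sketch(question: str) -> str:
    # Wire the agent with the stand-ins and run it to completion.
    agent = SketchAgent(FakeRetriever(), FakeClient())
    return asyncio.run(agent.execute(question))
```

Swapping the stand-ins for real, configured dependencies would leave the agent class itself unchanged, which is the property the injection approach is meant to preserve.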

Prerequisites

Completed Getting Started with the Agents SDK
Familiar with Custom Agent Development

Step 1: Create a Custom Agent

Change to your agents project directory and scaffold a custom agent:

za agents add --custom "my_rag_agent"

This creates my_rag_agent.py in your agents project:

<agents project>/
├── .gitignore
├── __init__.py
├── my_rag_agent.py
├── agent_setups.json
└── env/
    └── agent_setups.json

Step 2: Implement Retrieval and Answer Generation

Replace the contents of my_rag_agent.py with:

import json
from typing import List, Optional

from zav.agents_sdk import ChatAgent, ChatAgentClassRegistry, ChatMessage
from zav.agents_sdk.adapters import ZAVChatCompletionClient, ZAVRetriever


@ChatAgentClassRegistry.register()
class MyRAGAgent(ChatAgent):
    agent_name = "my_rag_agent"

    def __init__(
        self,
        retriever: ZAVRetriever,
        client: ZAVChatCompletionClient,
    ):
        self.retriever = retriever
        self.client = client

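    async def execute(self, conversation: List[ChatMessage]) -> Optional[ChatMessage]:
        # Use the latest message in the conversation as the search query.
        query = conversation[-1].content
        search_result = await self.retriever.search(query_string=query)

        # Serialize the hits as JSON; default=str covers values that are not JSON-native.
        documents = json.dumps(search_result.get("hits", []), indent=2, default=str)
        # Replace the latest message in place with the RAG prompt.
        conversation[-1].content = f"""Answer the following question based on the provided documents.

        # Question
        {query}

        # Documents
        {documents}
        """

        # Generate the answer from the updated conversation.
        response = await self.client.complete(
            messages=conversation,
            max_tokens=2048,
        )
        # Convert the completion into a ChatMessage for the caller.
        return ChatMessage.from_orm(response.chat_completion)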

The agent reads the latest user message, searches for relevant documents, builds a RAG prompt, and asks the completion client to produce the answer.
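The prompt-building step can also be pulled out into a small pure function, which makes it easy to test without the SDK. The sketch below is one possible way to do that, not part of the SDK: build_rag_prompt and its max_hits parameter are hypothetical names introduced here. It joins the prompt from separate lines, so no source-code indentation ends up in the prompt text, and it caps the number of serialized hits to keep the prompt size bounded.

```python
import json
from typing import Any, Dict, List


def build_rag_prompt(query: str, hits: List[Dict[str, Any]], max_hits: int = 5) -> str:
    """Build a RAG prompt from a question and the first `max_hits` search hits.

    Hypothetical helper for illustration; `max_hits` is an assumed parameter.
    """
    # default=str keeps serialization from failing on values such as datetimes.
    documents = json.dumps(hits[:max_hits], indent=2, default=str)
    return "\n".join(
        [
            "Answer the following question based on the provided documents.",
            "",
            "# Question",
            query,
            "",
            "# Documents",
            documents,
        ]
    )
```

Inside execute, the result could be assigned to conversation[-1].content in place of the inline f-string, assuming the hits are a list of JSON-serializable dictionaries.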

Step 3: Run the Agent Locally

Run the local development UI:

za agents dev --reload

You can also serve the agent as an API:

za agents serve --reload

Then call the chat endpoint:

curl -X POST "http://localhost:8000/chats/responses?tenant=zetaalpha" \
-H "Content-Type: application/json" \
-d '{
  "agent_identifier": "my_rag_agent",
  "conversation": [
    {"sender": "user", "content": "What is a transformer?"}
  ]
}'

The response should contain an answer grounded in the retrieved documents.
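The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above: the URL, tenant, and payload come from that example, while build_chat_request and ask are hypothetical helper names. The response schema is not described here, so ask simply returns the parsed JSON without assuming any particular fields.

```python
import json
import urllib.parse
import urllib.request


def build_chat_request(
    question: str,
    base_url: str = "http://localhost:8000",
    tenant: str = "zetaalpha",
) -> urllib.request.Request:
    """Build the POST request for the chat endpoint without sending it."""
    payload = {
        "agent_identifier": "my_rag_agent",
        "conversation": [{"sender": "user", "content": question}],
    }
    query = urllib.parse.urlencode({"tenant": tenant})
    return urllib.request.Request(
        url=f"{base_url}/chats/responses?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str) -> dict:
    """Send the request; assumes `za agents serve` is running locally."""
    with urllib.request.urlopen(build_chat_request(question)) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or test before the server is running.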
