Building and Running Your First RAG Agent

In this tutorial, we will build and run our first Retrieval-Augmented Generation (RAG) agent using the built-in completion client. This is the simplest LLM-powered agent and is a good starting point for understanding how to create more complex agents. The built-in completion client is a small wrapper around the low-level vendor clients (e.g., OpenAI, Anthropic, Ollama, etc.) that understands the native input arguments to the agent's main method.
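To make the idea of a "small wrapper around low-level vendor clients" concrete, here is an illustrative sketch of that pattern. This is not the SDK's actual implementation; every name in it (`CompletionClient`, `Message`, `echo_backend`) is hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    sender: str
    content: str


class CompletionClient:
    """Thin wrapper that routes a uniform `complete` call to a vendor client."""

    def __init__(self, vendor_clients: Dict[str, Callable[[List[Message], int], str]]):
        self.vendor_clients = vendor_clients

    def complete(self, vendor: str, messages: List[Message], max_tokens: int = 256) -> str:
        # Dispatch to the selected vendor backend with normalized arguments.
        return self.vendor_clients[vendor](messages, max_tokens)


# A fake "vendor" backend standing in for OpenAI/Anthropic/Ollama clients.
def echo_backend(messages: List[Message], max_tokens: int) -> str:
    return f"echo: {messages[-1].content}"[:max_tokens]


client = CompletionClient({"echo": echo_backend})
reply = client.complete("echo", [Message("user", "hello")])
```

The point of the pattern is that agents depend on one uniform interface while the vendor-specific details stay behind the wrapper.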

info

It's also possible to use the low-level vendor clients directly. Refer to the How to Create Injectable Dependencies tutorial for more information.

Prerequisites

Before you begin, make sure you have completed the Getting Started with the Agents SDK tutorial.

Step 1: Create a new Agent

Change to the <agents project> directory from the previous tutorial and run the following command to create a new agent:

rag_agents new "my_rag_agent"

This command will create a new agent file my_rag_agent.py in the <agents project> directory. The directory structure should now look like this:

<agents project>/
├── .gitignore
├── __init__.py
├── my_rag_agent.py
├── agent_setups.json
└── env/
    └── agent_setups.json

Step 2: Writing Your RAG Agent

Open the my_rag_agent.py file in your project directory and replace its content with the following code:

import json
from typing import List, Optional

from zav.agents_sdk import ChatAgent, ChatAgentFactory, ChatMessage
from zav.agents_sdk.adapters import ZAVChatCompletionClient, ZAVRetriever


@ChatAgentFactory.register()
class MyRAGAgent(ChatAgent):
    agent_name = "my_rag_agent"

    def __init__(
        self,
        retriever: ZAVRetriever,
        client: ZAVChatCompletionClient,
    ):
        self.retriever = retriever
        self.client = client

    async def execute(self, conversation: List[ChatMessage]) -> Optional[ChatMessage]:
        # Retrieve documents relevant to the user's latest message
        query = conversation[-1].content
        search_result = await self.retriever.search(query_string=query)

        # Create the RAG prompt by embedding the retrieved documents
        documents = json.dumps(search_result.get("hits", []), indent=2, default=str)
        conversation[-1].content = f"""Answer the following question based on the provided documents.

# Question
{query}

# Documents
{documents}
"""

        # Generate a response grounded in the retrieved documents
        response = await self.client.complete(
            messages=conversation,
            max_tokens=2048,
        )
        return ChatMessage.from_orm(response.chat_completion)

This code defines a simple RAG agent that takes the last message in the conversation, searches for relevant documents, creates a RAG prompt, and finally generates a response based on the retrieved documents.
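Stripped of the SDK, the retrieve → prompt → generate flow above can be sketched with stubbed components. The `search` and `complete` functions here are hypothetical stand-ins for the retriever and completion client, not the SDK's API:

```python
import asyncio
import json
from typing import Any, Dict


async def search(query_string: str) -> Dict[str, Any]:
    # A real retriever would query an index; here we return canned hits.
    return {"hits": [{"title": "Attention Is All You Need", "score": 0.9}]}


async def complete(prompt: str) -> str:
    # A real client would call an LLM; here we echo the prompt's first line.
    return "Answer based on: " + prompt.splitlines()[0]


async def rag_turn(question: str) -> str:
    # 1. Retrieve documents relevant to the question.
    result = await search(query_string=question)
    documents = json.dumps(result.get("hits", []), indent=2, default=str)
    # 2. Build the RAG prompt embedding the question and documents.
    prompt = (
        "Answer the following question based on the provided documents.\n\n"
        f"# Question\n{question}\n\n# Documents\n{documents}\n"
    )
    # 3. Generate a response grounded in the retrieved documents.
    return await complete(prompt)


answer = asyncio.run(rag_turn("What is a transformer?"))
```

Swapping the stubs for the real `ZAVRetriever` and `ZAVChatCompletionClient` gives exactly the `execute` method shown above.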

Step 3: Running Your Agent Locally

Running in the UI

You can run and test your agent in the UI by running the following command:

rag_agents dev --reload

Here you can chat with your agent and test its functionality.

Running as an API

Alternatively, you can serve your agent as an API by running the following command:

rag_agents serve --reload

To test your agent, you can use the Swagger UI available at http://localhost:8000/docs or send a POST request to the /v1/chats/responses endpoint. Here's an example using curl:

curl -X POST "http://localhost:8000/v1/chats/responses?tenant=zetaalpha" \
-H "Content-Type: application/json" \
-d '{
"agent_identifier": "my_rag_agent",
"conversation": [
{"sender": "user", "content": "What is a transformer?"}
]
}'
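The same request can be built from Python. This sketch mirrors the curl call above but only constructs the request object without sending it; once `rag_agents serve` is running, `urllib.request.urlopen(req)` would dispatch it:

```python
import json
import urllib.request

url = "http://localhost:8000/v1/chats/responses?tenant=zetaalpha"
payload = {
    "agent_identifier": "my_rag_agent",
    "conversation": [
        {"sender": "user", "content": "What is a transformer?"}
    ],
}

# Build the POST request with a JSON body, matching the curl example.
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```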

You should receive a response from your agent with the answer to your query.

If you have any questions or run into issues, feel free to reach out to our support team.