Building and Running Your First RAG Agent
In this tutorial, we will build and run our first Retrieval-Augmented Generation (RAG) agent using the built-in completion client. This is the simplest LLM-powered agent and is a good starting point for understanding how to create more complex agents. The built-in completion client is a small wrapper around the low-level vendor clients (e.g., OpenAI, Anthropic, Ollama, etc.) that understands the native input arguments to the agent's main method.
It's also possible to use the low-level vendor clients directly. Refer to the How to Create Injectable Dependencies tutorial for more information.
Prerequisites
Before you begin, make sure you have completed the Getting Started with the Agents SDK tutorial.
Step 1: Creating a New Agent
Change to the <agents project> directory from the previous tutorial and run the following command to create a new agent:
rag_agents new "my_rag_agent"
This command will create a new agent file my_rag_agent.py in the <agents project> directory. The directory structure should now look like this:
<agents project>/
├── .gitignore
├── __init__.py
├── my_rag_agent.py
├── agent_setups.json
└── env/
└── agent_setups.json
Step 2: Writing Your RAG Agent
Open the my_rag_agent.py file in your project directory and replace its content with the following code:
import json
from typing import List, Optional

from zav.agents_sdk import ChatAgent, ChatAgentFactory, ChatMessage
from zav.agents_sdk.adapters import ZAVChatCompletionClient, ZAVRetriever


@ChatAgentFactory.register()
class MyRAGAgent(ChatAgent):
    agent_name = "my_rag_agent"

    def __init__(
        self,
        retriever: ZAVRetriever,
        client: ZAVChatCompletionClient,
    ):
        self.retriever = retriever
        self.client = client

    async def execute(self, conversation: List[ChatMessage]) -> Optional[ChatMessage]:
        # Retrieve relevant documents
        query = conversation[-1].content
        search_result = await self.retriever.search(query_string=query)

        # Create RAG prompt
        documents = json.dumps(search_result.get("hits", []), indent=2, default=str)
        conversation[-1].content = f"""Answer the following question based on the provided documents.

# Question
{query}

# Documents
{documents}
"""

        # Generate a response using the retrieved documents
        response = await self.client.complete(
            messages=conversation,
            max_tokens=2048,
        )
        return ChatMessage.from_orm(response.chat_completion)
This code defines a simple RAG agent that takes the last message in the conversation, searches for relevant documents, creates a RAG prompt, and finally generates a response based on the retrieved documents.
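The prompt-construction step can be examined in isolation. The sketch below mirrors what `execute` does with the retriever's hits; the hit fields shown here are illustrative examples, not the SDK's actual result schema:

```python
import json
from typing import List


def build_rag_prompt(query: str, hits: List[dict]) -> str:
    # Serialize the retrieved hits so they can be embedded in the prompt;
    # default=str guards against non-JSON-serializable field values.
    documents = json.dumps(hits, indent=2, default=str)
    return (
        "Answer the following question based on the provided documents.\n\n"
        f"# Question\n{query}\n\n"
        f"# Documents\n{documents}\n"
    )


prompt = build_rag_prompt(
    "What is a transformer?",
    [{"title": "Attention Is All You Need", "snippet": "..."}],  # illustrative hit
)
print(prompt)
```

Keeping the prompt template in a small helper like this also makes it easy to unit-test the agent's prompt formatting without calling the retriever or the LLM.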
Step 3: Running Your Agent Locally
Running in the UI
You can run and test your agent in the UI by running the following command:
rag_agents dev --reload
Here you can chat with your agent and test its functionality.
Running as an API
Alternatively, you can serve your agent as an API by running the following command:
rag_agents serve --reload
To test your agent, you can use the Swagger UI available at http://localhost:8000/docs or send a POST request to the /chats/responses endpoint. Here's an example using curl:
curl -X POST "http://localhost:8000/v1/chats/responses?tenant=zetaalpha" \
-H "Content-Type: application/json" \
-d '{
"agent_identifier": "my_rag_agent",
"conversation": [
{"sender": "user", "content": "What is a transformer?"}
]
}'
You should receive a response from your agent with the answer to your query.
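If you prefer Python over curl, the same request can be sketched with the standard library. This assumes the server started by `rag_agents serve` is listening on localhost:8000; the actual send is commented out so the snippet runs even without a server:

```python
import json
import urllib.request

# Same payload as the curl example above
payload = {
    "agent_identifier": "my_rag_agent",
    "conversation": [
        {"sender": "user", "content": "What is a transformer?"}
    ],
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chats/responses?tenant=zetaalpha",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))

print(req.get_method(), req.full_url)
```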
If you have any questions or run into issues, feel free to reach out to our support team.