Getting Started with the Chat API
Prerequisites
Before you begin, make sure you have:
- An active Zeta Alpha account with API access
- An API key with the necessary permissions
- The base URL for the Chat REST API:
https://api.zeta-alpha.com/v0/service/chat/response
- The base URL for the Chat Streaming API:
https://api.zeta-alpha.com/v0/service/chat/stream
Overview
The Chat API exposes a set of agents. Each agent is configured with a specific behaviour and has access to the Search API in order to follow the RAG pattern when needed.
Agents are configured per tenant. While the built-in agents below can be easily enabled for your tenant, we recommend contacting the Zeta Alpha support team to select the most relevant agents for your use case, or to build custom agents if needed, following the Getting Started with RAG Agents tutorial.
Built-in agents
chat_with_pdf
: Chat with a single PDF document, by passing its document ID in the context field.

chat_with_multiple_docs
: Chat with a list of documents, by passing their document IDs in the context field.

chat_with_dynamic_retrieval
: Chat without the need to pass a static context. The agent dynamically retrieves the relevant context and provides an answer along with the documents it retrieved. If a context is passed, it is used for the first answer, but for follow-up questions the agent dynamically retrieves more context if needed.

quizbot
: Create a quiz about a given document, by passing its document ID in the context field.
The Chat API documentation can be found here.
Quickstart: Using the streaming endpoint
The Chat API supports streaming via Server-Sent Events (SSE) through the streaming endpoint.
Assume the following packages are installed:
pip install requests==2.28.1
pip install sseclient-py==1.8.0
Assume the ZETA_ALPHA_API_KEY environment variable is set:
export ZETA_ALPHA_API_KEY=my-api-key
Each streamed event extends the content of the agent's message with the next tokens, so that each event contains the whole message generated so far, along with the evidences, if any.
import json
import os

import requests
import sseclient

TENANT = "zetaalpha"
CHAT_STREAMING_ENDPOINT = (
    f"https://api.zeta-alpha.com/v0/service/chat/stream?tenant={TENANT}"
)

headers = {
    "accept": "text/event-stream",
    "Content-Type": "application/json",
    "x-auth": os.getenv("ZETA_ALPHA_API_KEY"),
}

response = requests.post(
    CHAT_STREAMING_ENDPOINT,
    headers=headers,
    json={
        "conversation": [
            {
                "sender": "user",
                "content": "What is BERT?",
            },
        ],
        "agent_identifier": "chat_with_dynamic_retrieval",
    },
    stream=True,
)
response.raise_for_status()

# Each SSE event contains the whole message generated so far.
client = sseclient.SSEClient(response)
streamed_data = None
for event in client.events():
    try:
        streamed_data = json.loads(event.data)
        print(f"Data stream: {streamed_data}")
    except Exception:
        print(f"Data stream error: {event.data}")
        streamed_data = None

# After the stream ends, the last event holds the complete message.
if streamed_data:
    print("\n---------------- COMPLETE MESSAGE ----------------")
    print(f"Message:\n{streamed_data['content']}\n")
    print(f"Evidences:\n{streamed_data['evidences']}\n")
    print(f"Function Call:\n{streamed_data['function_call_request']}\n")
    print("--------------------------------------------------")
Sample final output:
...
---------------- COMPLETE MESSAGE ----------------
Message:
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a machine learning model developed by Google in 2018 for natural language processing (NLP) tasks. It uses a transformer architecture that processes text bidirectionally, meaning it considers the context from both the left and right sides of a word to understand its meaning more accurately. BERT has significantly improved performance on various NLP tasks, such as sentiment analysis, named entity recognition, and question answering <sup>2cc7f7f6</sup><sup>2c43820c</sup><sup>5495b04c</sup>.
Evidences:
[{'document_hit_url': '/documents/chunk/list?tenant=zetaalpha&property_name=id&property_values=ff8e21d5226f02ed0b22d7551cb630769d4d6495_0', 'text_extract': "Text Classification with BERT in PyTorch\n['Ruben Winastwan']\nBack in 2018, Google developed a powerful Transformer-based machine learning model for NLP applications that outperforms previous language models in different benchmark datasets. <b>And this model is called BERT.</b> In this post, we’re going to use a pre-trained BERT model from Hugging Face for a text classification task.", 'anchor_text': '<sup>2cc7f7f6</sup>'}, {'document_hit_url': '/documents/chunk/list?tenant=zetaalpha&property_name=id&property_values=1c53fc0caddd3739e8ae5d8537c688c6f2eb3c64_17', 'text_extract': "BERT 101 - State Of The Art NLP Model Explained\n['Britney Muller']\nBERT 101 - State Of The Art NLP Model Explained BERT 101 🤗 State Of The Art NLP Model Explained Published\n\t\t\t\tMarch 2, 2022 Update on GitHub britneymuller Britney Muller What is BERT? <b>BERT, short for Bidirectional Encoder Representations from Transformers, is a Machine Learning (ML) model for natural language processing.</b> It was developed in 2018 by researchers at Google AI Language and serves as a swiss army knife solution to 11+ of the most common language tasks, such as sentiment analysis and named entity recognition.", 'anchor_text': '<sup>2c43820c</sup>'}, {'document_hit_url': '/documents/chunk/list?tenant=zetaalpha&property_name=id&property_values=2d1d934c47d6642a2c7ca4b4c1d5e5360922226d_4', 'text_extract': "['Pooja Aggarwal']\nWhat is BERT? <b>BERT stands for Bidirectional Encoder Representations from Transformers.</b> Each word here has a meaning to it and we will understand by the end of this article BERT is a general purpose framework to…", 'anchor_text': '<sup>5495b04c</sup>'}]
Function Call:
None
--------------------------------------------------
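Since each event carries the whole message generated so far, a client that only wants to render the newly generated text can keep track of the previously seen content and print the difference. Below is a minimal sketch of this pattern; the helper name is ours, and it assumes response is the streaming requests.Response from the example above:

import json

import sseclient


def print_message_deltas(response) -> str:
    """Print only the newly generated text from each streamed event.

    A sketch, assuming `response` is a streaming `requests.Response`
    obtained from the chat streaming endpoint as in the quickstart above.
    """
    client = sseclient.SSEClient(response)
    previous_content = ""
    for event in client.events():
        streamed_data = json.loads(event.data)
        content = streamed_data.get("content") or ""
        # Each event carries the full message so far, so the new tokens
        # are the suffix beyond what was already printed.
        print(content[len(previous_content):], end="", flush=True)
        previous_content = content
    print()
    return previous_content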
Quickstart: Using the REST endpoint
The Chat API also supports synchronous requests through the non-streaming REST endpoint.
Assume the following packages are installed:
pip install requests==2.28.1
Assume the ZETA_ALPHA_API_KEY environment variable is set:
export ZETA_ALPHA_API_KEY=my-api-key
The non-streaming endpoint has the same API schema as the streaming endpoint, but instead of returning a stream of events, it returns a single standalone response after the agent has finished generating the full answer.
import os

import requests

TENANT = "zetaalpha"
CHAT_REST_ENDPOINT = (
    f"https://api.zeta-alpha.com/v0/service/chat/response?tenant={TENANT}"
)

headers = {
    "accept": "application/json",
    "Content-Type": "application/json",
    "x-auth": os.getenv("ZETA_ALPHA_API_KEY"),
}

response = requests.post(
    CHAT_REST_ENDPOINT,
    headers=headers,
    json={
        "conversation": [
            {
                "sender": "user",
                "content": "What is BERT?",
            },
        ],
        "agent_identifier": "chat_with_dynamic_retrieval",
    },
)
response.raise_for_status()
data = response.json()

# The bot's answer is the last message appended to the conversation.
bot_answer = data["conversation"][-1] if data.get("conversation") else None
if not bot_answer:
    raise ValueError("No data returned from the endpoint")

print("\n---------------- COMPLETE MESSAGE ----------------")
print(f"Message:\n{bot_answer.get('content')}\n")
print(f"Evidences:\n{bot_answer.get('evidences')}\n")
print(f"Function Call:\n{bot_answer.get('function_call_request')}\n")
print("--------------------------------------------------")
The response JSON has the same structure as the final event returned by the streaming endpoint.
Parsing the evidences
The QA agents provide evidences pointing to the actual documents/passages that support the answer. Each evidence object in the evidences list contains the following fields:

document_hit_url
: The URL of the document that contains the evidence.

text_extract
: The text extract of the evidence from the document.

anchor_text
: The anchor text of the citation in the answer. This field can be used to link the evidence to the exact part of the answer that cites it, so you can display the citation as you prefer by replacing the anchor_text with a format of your choice.
A common pattern is to make citations clickable, which you can achieve by replacing the anchor_text with a markdown link. Additionally, when displaying the citation in a UI component, you can show a tooltip with a document preview in place of the anchor_text.
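For example, here is a minimal helper (the function name is ours) that replaces each anchor with a numbered markdown link; the link target simply reuses the evidence's document_hit_url, but any URL scheme your UI understands would work:

def render_citations_as_links(content: str, evidences: list) -> str:
    """Replace each citation anchor in the answer with a markdown link.

    A sketch, assuming `content` and `evidences` come from a bot message
    as returned by the Chat API (see the sample outputs above).
    """
    for number, evidence in enumerate(evidences, start=1):
        link = f" [[{number}]]({evidence['document_hit_url']})"
        # The anchor text (e.g. "<sup>2cc7f7f6</sup>") appears verbatim
        # in the message content, so a plain string replace suffices.
        content = content.replace(evidence["anchor_text"], link)
    return content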
Inspecting the dynamic context added by the agent
An agent such as the one used in the previous examples (chat_with_dynamic_retrieval) may perform dynamic retrieval in the middle of the conversation in order to search for more context for responding to the user's question. Such agents can communicate back to the API client what context was added.
A typical use of this functionality is to display the searching state and the dynamically retrieved context to the user, so that they are aware that new context was found.
For this reason, the content_parts field of the ChatMessage should be used instead of the content field. The content_parts field contains a list of message parts, each of which is either text or context: text parts carry the bot message sent to the user, and context parts carry the new ConversationContext that was added during the conversation.
import json
import os

import requests
import sseclient

TENANT = "zetaalpha"
CHAT_STREAMING_ENDPOINT = (
    f"https://api.zeta-alpha.com/v0/service/chat/stream?tenant={TENANT}"
)

headers = {
    "accept": "text/event-stream",
    "Content-Type": "application/json",
    "x-auth": os.getenv("ZETA_ALPHA_API_KEY"),
}

response = requests.post(
    CHAT_STREAMING_ENDPOINT,
    headers=headers,
    json={
        "conversation_context": {},
        "conversation": [
            {
                "sender": "user",
                "content": "What is RAGElo?",
            },
        ],
        "agent_identifier": "chat_with_dynamic_retrieval",
    },
    stream=True,
)
response.raise_for_status()

client = sseclient.SSEClient(response)
streamed_data = None
content_parts = []
new_context = None
for event in client.events():
    try:
        streamed_data = json.loads(event.data)
        content_parts = streamed_data.get("content_parts", [])
        # A "context" part signals that the agent is retrieving (or has
        # retrieved) new context during the conversation.
        context_part = next(
            (part for part in content_parts if part.get("type") == "context"), None
        )
        if context_part:
            if not context_part.get("context"):
                print("Searching for new context...\n")
            elif not new_context:
                new_context = context_part["context"]
                print(f"Found new context: {new_context}\n")
    except Exception:
        print(f"Data stream error: {event.data}")
        streamed_data = None

if streamed_data:
    # Concatenate the "text" parts to reconstruct the bot message.
    text = " ".join(part["text"] for part in content_parts if part["text"])
    print("\n---------------- COMPLETE MESSAGE ----------------")
    print(f"Full response:\n{streamed_data}\n")
    print(f"Message:\n{text}\n")
    print(f"Evidences:\n{streamed_data['evidences']}\n")
    print("--------------------------------------------------")
Sample output:
Searching for new context...
Found new context: {'document_context': {'document_ids': ['93b951a0b39ad32a8702a034716b5fbf1fddb24a_0', '419e39909b4026aa549f1dfbafb9f3958e464b6d_7', 'e7ef3919880d15363d70f854f3f1c61a22174414_4', '419e39909b4026aa549f1dfbafb9f3958e464b6d_10', '93b951a0b39ad32a8702a034716b5fbf1fddb24a_6', '704708f14b5e5167bc2819551fc0a61eaa2a6bcf_5', '93b951a0b39ad32a8702a034716b5fbf1fddb24a_2', 'a65c0ac8d43793998c5c4622cf61877ac76e1fed_0', '93b951a0b39ad32a8702a034716b5fbf1fddb24a_5', '9d1a992abbec6dfef57d2281b4bd7c734fd2dc09_10'], 'retrieval_unit': 'chunk'}, 'custom_context': None}
---------------- COMPLETE MESSAGE ----------------
Full response:
{'sender': 'bot', 'content': 'RAGElo is a toolkit designed to evaluate Retrieval Augmented Generation (RAG)-powered Large Language Models (LLMs) using the Elo rating system. It facilitates the comparison of different RAG pipelines and prompts by ranking their outputs through a tournament-style evaluation. This method helps in identifying the most effective configurations for LLM-based question-answering agents by comparing their performance across multiple questions and scenarios <sup>e4ac3bd9</sup><sup>a83188b8</sup>.', 'content_parts': [{'type': 'context', 'context': {'document_context': {'document_ids': ['93b951a0b39ad32a8702a034716b5fbf1fddb24a_6', '704708f14b5e5167bc2819551fc0a61eaa2a6bcf_5', '93b951a0b39ad32a8702a034716b5fbf1fddb24a_0', 'bf925bdeb58fdfffb124fd5c266f890423d1f890_7', '93b951a0b39ad32a8702a034716b5fbf1fddb24a_5', 'e0929619dd96c529a3b52a615d6445e5049b4930_45', 'e7ef3919880d15363d70f854f3f1c61a22174414_14', '72df85aa7e308aa322cc218c09dc1d7f2a1bcdff_2', 'e7ef3919880d15363d70f854f3f1c61a22174414_4', '48ac0c6a2174f8c2942cbefee9d4da1a2277c438_2'], 'retrieval_unit': 'chunk'}, 'custom_context': None}, 'text': None}, {'type': 'text', 'context': None, 'text': 'RAGElo is a toolkit designed to evaluate Retrieval Augmented Generation (RAG)-powered Large Language Models (LLMs) using the Elo rating system. It facilitates the comparison of different RAG pipelines and prompts by ranking their outputs through a tournament-style evaluation. This method helps in identifying the most effective configurations for LLM-based question-answering agents by comparing their performance across multiple questions and scenarios [e4ac3bd9][a83188b8].'}], 'image_uri': None, 'function_call_request': None, 'evidences': [{'document_hit_url': '/documents/chunk/list?tenant=zetaalpha&property_name=id&property_values=93b951a0b39ad32a8702a034716b5fbf1fddb24a_6', 'text_extract': " <b>RAGElo\n['zetaalphavector']\nRAGElo RAGElo is a set of tools that helps you selecting the best RAG-based LLM agents by using an Elo rank</b>", 'anchor_text': '<sup>e4ac3bd9</sup>'}, {'document_hit_url': '/documents/chunk/list?tenant=zetaalpha&property_name=id&property_values=93b951a0b39ad32a8702a034716b5fbf1fddb24a_0', 'text_extract': " <b>RAGElo\n['zetaalphavector']\nElo-based RAG Agent evaluator \n\nRAGElo[^1] is a streamlined toolkit for evaluating Retrieval Augmented Generation (RAG)-powered Large Language Models (LLMs) question answering agents using the Elo rating system.</b> While it has become easier to prototype and incorporate generative LLMs in production, evaluation is still the most challenging part of the solution.", 'anchor_text': '<sup>a83188b8</sup>'}], 'function_specs': None}
Message:
RAGElo is a toolkit designed to evaluate Retrieval Augmented Generation (RAG)-powered Large Language Models (LLMs) using the Elo rating system. It facilitates the comparison of different RAG pipelines and prompts by ranking their outputs in a tournament-style format. This helps in identifying the most effective configurations for question-answering agents without frequent expert intervention [a83188b8][d698dfd4].
Evidences:
[{'document_hit_url': '/documents/chunk/list?tenant=zetaalpha&property_name=id&property_values=93b951a0b39ad32a8702a034716b5fbf1fddb24a_0', 'text_extract': " <b>RAGElo\n['zetaalphavector']\nElo-based RAG Agent evaluator \n\nRAGElo[^1] is a streamlined toolkit for evaluating Retrieval Augmented Generation (RAG)-powered Large Language Models (LLMs) question answering agents using the Elo rating system.</b> While it has become easier to prototype and incorporate generative LLMs in production, evaluation is still the most challenging part of the solution.", 'anchor_text': '<sup>a83188b8</sup>'}, {'document_hit_url': '/documents/chunk/list?tenant=zetaalpha&property_name=id&property_values=e7ef3919880d15363d70f854f3f1c61a22174414_4', 'text_extract': 'Table 1: Sample of questions submitted by users to the Infi-\nneon RAG-Fusion system\nUser-submitted queries\nWhat is the country of origin of IM72D128, and how does geopolitical\nexposure affect the market and my SAM for the microphone? <b>What is the IP rating of mounted IM72D128?</b>', 'anchor_text': '<sup>d698dfd4</sup>'}]
--------------------------------------------------
Chatting with static context
In some use cases you may want to chat with a static context, for example a specific document or a predefined set of documents. This static context can be passed to the Chat API using the conversation_context field.
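For reference, the conversation_context payload used in the examples below has the following shape (the document IDs here are placeholders):

conversation_context = {
    "document_context": {
        # IDs of the documents to use as static context.
        "document_ids": ["<document-id-1>", "<document-id-2>"],
        # The unit the IDs refer to; the examples below use "document".
        "retrieval_unit": "document",
    }
}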
Chatting with a single document
The chat_with_pdf agent can be used to chat with a single document. Pass the document_id of your document in the document_context.document_ids list to base the agent's answers on the content of this specific document.
The quality of the agent's answers may be affected by the length of the document and the configuration of the agent in the tenant's settings. For further help, or if the quality of the answers is not consistently good, please contact our support team to configure the agent's behaviour according to your needs.
import json
import os

import requests
import sseclient

TENANT = "zetaalpha"
CHAT_STREAMING_ENDPOINT = (
    f"https://api.zeta-alpha.com/v0/service/chat/stream?tenant={TENANT}"
)

headers = {
    "accept": "text/event-stream",
    "Content-Type": "application/json",
    "x-auth": os.getenv("ZETA_ALPHA_API_KEY"),
}

response = requests.post(
    CHAT_STREAMING_ENDPOINT,
    headers=headers,
    json={
        "conversation_context": {
            "document_context": {
                "document_ids": ["df40f22694ea7515ef8cd321d877e54c30d336ca_0"],
                "retrieval_unit": "document",
            }
        },
        "conversation": [
            {
                "sender": "user",
                "content": "What is BERT?",
            },
        ],
        "agent_identifier": "chat_with_pdf",
    },
    stream=True,
)
response.raise_for_status()

client = sseclient.SSEClient(response)
streamed_data = None
for event in client.events():
    try:
        streamed_data = json.loads(event.data)
        print(f"Data stream: {streamed_data}")
    except Exception:
        print(f"Data stream error: {event.data}")
        streamed_data = None

if streamed_data:
    print("\n---------------- COMPLETE MESSAGE ----------------")
    print(f"Message:\n{streamed_data['content']}\n")
    print(f"Evidences:\n{streamed_data['evidences']}\n")
    print("--------------------------------------------------")
Sample output:
...
---------------- COMPLETE MESSAGE ----------------
Message:
BERT stands for Bidirectional Encoder Representations from Transformers. It is a transformer-based NLP pretraining model developed by Google in 2018. BERT is unique because it is deeply bidirectional and unsupervised, allowing it to process words in relation to all other words in a sentence, unlike previous models. This capability enables BERT to consider the full context of a word by looking at the words that come before and after it simultaneously <sup>doc0_chunk1</sup>.
Evidences:
[{'document_hit_url': '/documents/chunk/list?tenant=zetaalpha&property_name=id&property_values=df40f22694ea7515ef8cd321d877e54c30d336ca_0', 'text_extract': 'BERT is a deeply bidirectional, unsupervised language representation model which, unlike previous models, is able to process words in relation to all the other words in a sentence. <b>It can consider the full context of a word by looking at the words that come before and after it simultaneously.</b>', 'anchor_text': '<sup>doc0_chunk1</sup>'}]
--------------------------------------------------
Chatting with a fixed set of documents
The chat_with_multiple_docs agent can be used to chat with a fixed set of documents. Pass the document_ids of your documents in the document_context.document_ids list to base the agent's answers on the content of these specific documents.
The quality of the agent's answers may be affected by the number of documents in the context and the configuration of the agent in the tenant's settings. For further help, or if the quality of the answers is not consistently good, please contact our support team to configure the agent's behaviour according to your needs.
import json
import os

import requests
import sseclient

TENANT = "zetaalpha"
CHAT_STREAMING_ENDPOINT = (
    f"https://api.zeta-alpha.com/v0/service/chat/stream?tenant={TENANT}"
)

headers = {
    "accept": "text/event-stream",
    "Content-Type": "application/json",
    "x-auth": os.getenv("ZETA_ALPHA_API_KEY"),
}

response = requests.post(
    CHAT_STREAMING_ENDPOINT,
    headers=headers,
    json={
        "conversation_context": {
            "document_context": {
                "document_ids": [
                    "2d1d934c47d6642a2c7ca4b4c1d5e5360922226d_0",
                    "6c42c17b131d886f0ccf4897d055e42580574240_0",
                    "df40f22694ea7515ef8cd321d877e54c30d336ca_0",
                ],
                "retrieval_unit": "document",
            }
        },
        "conversation": [
            {
                "sender": "user",
                "content": "What is BERT?",
            },
        ],
        "agent_identifier": "chat_with_multiple_docs",
    },
    stream=True,
)
response.raise_for_status()

client = sseclient.SSEClient(response)
streamed_data = None
for event in client.events():
    try:
        streamed_data = json.loads(event.data)
        print(f"Data stream: {streamed_data}")
    except Exception:
        print(f"Data stream error: {event.data}")
        streamed_data = None

if streamed_data:
    print("\n---------------- COMPLETE MESSAGE ----------------")
    print(f"Message:\n{streamed_data['content']}\n")
    print(f"Evidences:\n{streamed_data['evidences']}\n")
    print("--------------------------------------------------")
Sample output:
...
---------------- COMPLETE MESSAGE ----------------
Message:
BERT stands for Bidirectional Encoder Representations from Transformers. It is a transformer-based NLP pretraining model developed by Google in 2018. BERT is a deeply bidirectional, unsupervised language representation model that processes words in relation to all other words in a sentence, considering the full context by looking at the words before and after simultaneously <sup>doc1</sup>, <sup>doc3</sup>.
Evidences:
[{'document_hit_url': '/documents/document/list?tenant=zetaalpha&property_name=id&property_values=2d1d934c47d6642a2c7ca4b4c1d5e5360922226d_0', 'text_extract': 'What is BERT? <b>BERT stands for Bidirectional Encoder Representations from Transformers.</b> Each word here has a meaning to it and we will understand by the end of this article BERT is a general purpose framework to…\nPooja Aggarwal\nWhat is BERT?', 'anchor_text': '<sup>doc1</sup>'}, {'document_hit_url': '/documents/document/list?tenant=zetaalpha&property_name=id&property_values=df40f22694ea7515ef8cd321d877e54c30d336ca_0', 'text_extract': 'What is BERT? <b>In 2018 Google developed a transformer-based NLP pretraining model called BERT or Bidirectional Encoder Representations from Transformers.</b> It is nothing but a Transformer language model with multiple encoder layers and self-attention heads.', 'anchor_text': '<sup>doc3</sup>'}]
--------------------------------------------------