Agents SDK Architecture
This page is the deep reference for how the built-in agent works internally. It covers the dependency injection system, the provider-source composition pattern, the agent execution loop, and the cross-cutting concerns that tie everything together. For the conceptual overview, see How the Agent Harness Works. For the design rationale, see Design Principles.
The Dependency Injection System
The SDK uses a custom DI container built around three components: AgentDependencyFactory, AgentDependencyRegistry, and ChatAgentFactory.
AgentDependencyFactory
Every injectable component has a factory with a create() classmethod. The return type annotation tells the registry what type this factory produces:
from zav.agents_sdk import AgentDependencyFactory, AgentDependencyRegistry
class MyServiceFactory(AgentDependencyFactory):
@classmethod
def create(cls, some_config: SomeConfig) -> MyService:
return MyService(some_config.value)
AgentDependencyRegistry.register(MyServiceFactory)
The create() method's parameters are resolved recursively by the same DI system — they can be configuration models (looked up by name in agent_configuration), other registered dependencies (looked up by type), or framework-provided objects like LLMClientConfiguration or ConversationContext.
AgentDependencyRegistry
The registry maps types to their factories:
class AgentDependencyRegistry:
registry: Dict[type, AgentDependencyFactory] = {}
@classmethod
def register(cls, factory):
# Uses the return annotation of factory.create() as the key
created_cls = inspect.signature(factory.create).return_annotation
cls.registry[created_cls] = factory
@classmethod
def get(cls, t: type) -> Optional[AgentDependencyFactory]:
return cls.registry.get(t)
@classmethod
def get_subclasses_of(cls, base: type) -> List[AgentDependencyFactory]:
return [f for return_type, f in cls.registry.items()
if issubclass(return_type, base)]
The get_subclasses_of() method is what powers auto-collection — see DependencyGroup below.
DependencyGroup — Auto-Collection
A DependencyGroup subclass declares what base type it collects:
class ToolsSourceGroup(DependencyGroup[ToolsSource]):
__collects__ = ToolsSource
When ChatAgentFactory encounters a constructor parameter typed as ToolsSourceGroup, it:
- Reads
__collects__→ToolsSource - Calls
registry.get_subclasses_of(ToolsSource)→ all factories that produceToolsSourcesubclasses - Resolves each factory via normal DI (recursively)
- Wraps the results:
ToolsSourceGroup(items=[index_tools, doc_tools, web_tools, ...])
This is how registration is all you need. Implement a source, register a factory, and it's automatically collected into the right group — no central list to maintain.
ChatAgentFactory — The Resolution Engine
ChatAgentFactory.create() is the entry point. When a request arrives:
- Fetches
AgentSetupviaAgentSetupRetriever.get(agent_identifier) - Looks up the agent class via
ChatAgentClassRegistry.get(agent_setup.agent_name) - Inspects the agent class constructor signature
- For each parameter, resolves the value using these rules (in order):
AgentCreator→ creates a closure for spawning sub-agents- Registered factory → resolves via DI (recursively parsing
create()params) DependencyGroup→ auto-collects all matching subclass factoriesConversationContext→ injects the current request contextChatAgentsubclass → creates a sub-agent recursivelyLLMClientConfiguration→ injects fromAgentSetupSpan→ injects the tracing span- Config model → looks up
param_nameinagent_configuration, parses as Pydantic model
A resolution_cache ensures dependencies marked with __singleton__ = True are created once per request, not once per injection site.
The Provider-Source Pattern
The DI system above is a general-purpose container. What gives the SDK its additive composition property is the provider-source pattern built on top of it:
Source → DependencyGroup → Provider → Agent
A source is a single implementation of a capability (e.g., IndexToolsSource provides search tools). A group auto-collects all registered sources of a given type via DependencyGroup. A provider applies configuration — filtering, ordering, formatting — and exposes a uniform interface to the Agent.
Each source implementation registers a factory with AgentDependencyRegistry. When ChatAgentFactory builds the agent, DependencyGroup auto-collects all factories whose return type matches the source base class (e.g., ToolsSource), resolves each one, and hands the collection to the provider. The provider then applies source filtering (include_sources / exclude_sources from configuration) and exposes a uniform interface to the agent.
This pattern repeats for every capability: tools, skills, context, memory, instructions, delegation, MCP, and message processing — each with its own source base class, group, and provider.
Why This Pattern Matters
When a new capability is added — say, image search tools — the developer implements ToolsSource, registers a factory, and it automatically appears in ToolsSourceGroup. The ToolsProvider, the Agent class, and ChatAgentFactory don't change. The deployer enables it by adding the source name to include_sources.
This is the core architectural invariant: the Agent class does not change when capabilities are added or removed. Configuration controls composition.
For the complete table of providers, sources, groups, and what lifecycle phases each participates in, see the Configuring Capabilities overview.
The Agent Execution Loop
When agent.execute_streaming(conversation) is called, the agent goes through a fixed sequence of phases. Each phase invokes specific providers.
1. Dispatch Check
dispatch = await dispatch_provider.try_dispatch(conversation, conversation_context)
All DispatchRule implementations are checked. If any rule matches the ConversationContext, it short-circuits the entire flow — the rule handles the request directly, and the LLM is never called. This is used for special routing like scheduled task handling.
2. Collect Tools
Tools are aggregated from all providers that expose them:
tools_registry.extend(await mcp_provider.get_tools())
tools_registry.extend(await skills_provider.get_tools())
tools_registry.extend(await agent_delegation_provider.get_tools())
tools_registry.extend(await tools_provider.get_tools())
tools_registry.extend(await memory_provider.get_tools())
tools_registry.extend(await context_provider.get_tools())
3. Build System Prompt
Each provider appends its section to the system prompt in order:
base_system_prompt (from agent_configuration)
+ skills_provider.to_prompt() → skill catalog with names and descriptions
+ agent_delegation_provider.to_prompt() → available agents and delegation instructions
+ tools_provider.to_prompt() → tool-specific guidance (per-source)
+ instructions_provider.to_prompt() → cross-cutting instructions (date, citations, etc.)
+ memory_provider.to_prompt() → memory instructions + recalled memories
+ context_provider.to_prompt() → resolved context (when injection_mode=prompt)
Each source within a provider can contribute its own prompt section via a to_prompt() method. For example, IndexToolsSource adds guidance about how search works; DocumentToolsSource adds guidance about reading documents. The provider aggregates these.
4. Process Conversation
completions = await context_provider.process_conversation(conversation_context, conversation)
ContextProvider can interleave developer messages into the conversation history. This is how document context, tag context, and custom context get injected as inline developer messages when injection_mode=conversation.
5. LLM Loop
The ZAVChatCompletionClient.complete() method handles the recursive tool-use loop:
Call LLM with (system prompt, conversation, tools)
└─ If LLM requests tool calls:
Execute tools (parallel when possible)
Append tool results to conversation
Call LLM again (recursive, up to 20 iterations)
└─ If no tool calls:
Stream the response text back
The loop is capped at max_nesting_level=20. If the LLM keeps requesting tools beyond that limit, the SDK performs one final tool-less LLM call instructing the model to produce its best answer with the information already gathered, and returns that response. If the budget is exhausted before any LLM call is made, the SDK returns a static fallback bot message instead. In both cases the client emits {"max_nesting_level_reached": True} via log_fn for observability.
6. Post-Process
async for response, message in message_processing_provider.process_stream(raw_stream):
yield message
The agent wraps the raw LLM output stream in a chain of MessageProcessor implementations. Each processor receives the full async generator of (ChatResponse, ChatMessage) pairs and yields transformed pairs — controlling its own buffering strategy. Processors compose via generator chaining: each wraps the previous one's output.
The primary processor is CitationProcessor, which passes through intermediate streaming chunks unchanged and resolves citation markers only on the final message.
7. Cleanup
After execution completes, the Agent iterates all instance attributes and calls aclose(), cleanup(), or async_cleanup() on anything that implements those methods — then deletes all attributes and triggers garbage collection. This ensures MCP connections, file handles, and other resources are released per-request.
Cross-Cutting Concerns — The agent_state Module
Some data needs to flow between components that don't have direct references to each other. The tool execution phase produces data that the post-processing phase consumes. These are handled by singleton request-scoped objects in the agent_state module.
CitationStore
The bridge between search tools and citation post-processing:
- During tool execution: whenever
IndexToolsSourceformats hits for the LLM — whether from search, browse, document listing, metadata retrieval, or document reading — it callscitation_store.register_hit()for each result, storing the full document metadata, context snippet, short ID, and index type. - During post-processing:
CitationProcessorreads from the sameCitationStoreto resolve inline citation markers in the LLM's final response into structured evidence objects with source URLs, page ranges, and relevance scores.
The CitationStoreFactory is marked __singleton__ = True, so the same instance is shared across all components within a single request. Both the tool source and the message processor receive the same store through DI.
The citation strategy — how the LLM formats its references — is configured via CitationConfiguration:
| Strategy | What the LLM outputs | What the processor does |
|---|---|---|
inline_url | Full URLs in markdown links | Resolves to evidence with document metadata |
short_id | Short hash IDs like [abc123] | Maps short ID → source URL → full evidence |
deferred | Nothing — processor handles it | Matches response text to search results post-hoc |
sup_numeric | Superscript numbers ¹²³ | Maps numbers to search result order |
ConversationImageStore
Handles images that appear in conversations:
- During context resolution: when context sources encounter images, they call
image_store.register_image(base64_data)and get back an 8-character SHA256 hash. - During tool execution: tools that need the original image data call
image_store.get_image_data(short_id).
Like CitationStore, this is a singleton per request — the same instance bridges context resolution and tool execution.
The Pattern
Both stores follow the same pattern:
- Registered via
AgentDependencyFactorywith__singleton__ = True - Auto-discovered through DI — any component that declares the store as a constructor parameter gets the same instance
- No direct references between producer and consumer — they communicate through the shared store
This keeps the tool sources and processors decoupled. A ToolsSource doesn't need to know that a MessageProcessor exists; they share data through the store.
Source Filtering — The Valve Chain
Every provider uses the same filtering logic to decide which sources are active. The filter evaluates three inputs in order:
exclude_sources— hard veto. If a source name is in the exclude set, it's OFF regardless of anything else.include_sources— allowlist. If defined and the source is in it, the source is ON (overrides the source's ownenabledflag). If defined and the source is not in it, the source is OFF.source_enabled— when no include list is defined, the source's ownenabledflag decides.
def is_source_active(name, source_enabled, include, exclude) -> bool:
if exclude is not None and name in exclude:
return False # exclude always wins
if include is not None:
return name in include # allowlist overrides enabled flag
return source_enabled # fall back to source default
This creates a clean precedence chain. A source can ship as enabled=False by default (opt-in) or enabled=True (opt-out). The deployer's include_sources / exclude_sources in configuration always has the final say.
A second filter, passes_name_filter, applies the same logic at the individual tool level — a source might be active, but specific tools within it can be included or excluded.
Server Application and Request Lifecycle
The HTTP controller receives requests and routes them through the message bus to handlers. Handlers retrieve agent configuration (from tenant settings in managed deployments, from agent_setups.json locally), assemble the agent via ChatAgentFactory, and stream the response back.
The Complete Request Lifecycle
1. Request arrives at HTTP controller
2. Controller dispatches Command via message bus
3. Handler calls ChatAgentFactory.create(agent_identifier, bot_params)
→ Fetches AgentSetup from AgentSetupRetriever
→ Looks up agent class from ChatAgentClassRegistry
→ DI resolution: all constructor parameters resolved recursively
- Providers via AgentDependencyFactory
- Sources via DependencyGroup auto-collection
- Singletons (CitationStore, ConversationImageStore) via resolution_cache
- Config models from agent_configuration
4. Handler calls agent.execute_streaming(conversation)
a. DispatchProvider.try_dispatch() — short-circuit if a rule matches
b. Collect tools from all providers
c. Assemble system prompt from all providers
d. ContextProvider.process_conversation() — interleave context into messages
e. ZAVChatCompletionClient.complete() — streaming with recursive tool execution loop
f. MessageProcessingProvider.process() — post-process each response chunk
g. Yield processed ChatMessage to caller
5. Cleanup: all provider/source attributes closed, GC triggered
Agent Base Classes
Three base classes support different agent patterns:
ChatAgent— core base withexecute(), tracing instrumentation,ToolsRegistry, and cleanup lifecycleStreamableChatAgent— extendsChatAgentwithexecute_streaming()for streaming responses. The built-inAgentextends this.ProcessorAgent— non-chat agent for document processing pipelines (not covered here)
Register agent classes with the @ChatAgentClassRegistry.register() decorator. The agent_name class attribute maps to the agent_name field in AgentSetup.