Skip to main content

Agents SDK Architecture

This page is the deep reference for how the built-in agent works internally. It covers the dependency injection system, the provider-source composition pattern, the agent execution loop, and the cross-cutting concerns that tie everything together. For the conceptual overview, see How the Agent Harness Works. For the design rationale, see Design Principles.

The Dependency Injection System

The SDK uses a custom DI container built around three components: AgentDependencyFactory, AgentDependencyRegistry, and ChatAgentFactory.

AgentDependencyFactory

Every injectable component has a factory with a create() classmethod. The return type annotation tells the registry what type this factory produces:

from zav.agents_sdk import AgentDependencyFactory, AgentDependencyRegistry

class MyServiceFactory(AgentDependencyFactory):
@classmethod
def create(cls, some_config: SomeConfig) -> MyService:
return MyService(some_config.value)

AgentDependencyRegistry.register(MyServiceFactory)

The create() method's parameters are resolved recursively by the same DI system — they can be configuration models (looked up by name in agent_configuration), other registered dependencies (looked up by type), or framework-provided objects like LLMClientConfiguration or ConversationContext.

AgentDependencyRegistry

The registry maps types to their factories:

class AgentDependencyRegistry:
registry: Dict[type, AgentDependencyFactory] = {}

@classmethod
def register(cls, factory):
# Uses the return annotation of factory.create() as the key
created_cls = inspect.signature(factory.create).return_annotation
cls.registry[created_cls] = factory

@classmethod
def get(cls, t: type) -> Optional[AgentDependencyFactory]:
return cls.registry.get(t)

@classmethod
def get_subclasses_of(cls, base: type) -> List[AgentDependencyFactory]:
return [f for return_type, f in cls.registry.items()
if issubclass(return_type, base)]

The get_subclasses_of() method is what powers auto-collection — see DependencyGroup below.

DependencyGroup — Auto-Collection

A DependencyGroup subclass declares what base type it collects:

class ToolsSourceGroup(DependencyGroup[ToolsSource]):
__collects__ = ToolsSource

When ChatAgentFactory encounters a constructor parameter typed as ToolsSourceGroup, it:

  1. Reads __collects__ToolsSource
  2. Calls registry.get_subclasses_of(ToolsSource) → all factories that produce ToolsSource subclasses
  3. Resolves each factory via normal DI (recursively)
  4. Wraps the results: ToolsSourceGroup(items=[index_tools, doc_tools, web_tools, ...])

This is how registration is all you need. Implement a source, register a factory, and it's automatically collected into the right group — no central list to maintain.

ChatAgentFactory — The Resolution Engine

ChatAgentFactory.create() is the entry point. When a request arrives:

  1. Fetches AgentSetup via AgentSetupRetriever.get(agent_identifier)
  2. Looks up the agent class via ChatAgentClassRegistry.get(agent_setup.agent_name)
  3. Inspects the agent class constructor signature
  4. For each parameter, resolves the value using these rules (in order):
    • AgentCreator → creates a closure for spawning sub-agents
    • Registered factory → resolves via DI (recursively parsing create() params)
    • DependencyGroup → auto-collects all matching subclass factories
    • ConversationContext → injects the current request context
    • ChatAgent subclass → creates a sub-agent recursively
    • LLMClientConfiguration → injects from AgentSetup
    • Span → injects the tracing span
    • Config model → looks up param_name in agent_configuration, parses as Pydantic model

A resolution_cache ensures dependencies marked with __singleton__ = True are created once per request, not once per injection site.

The Provider-Source Pattern

The DI system above is a general-purpose container. What gives the SDK its additive composition property is the provider-source pattern built on top of it:

Source → DependencyGroup → Provider → Agent

A source is a single implementation of a capability (e.g., IndexToolsSource provides search tools). A group auto-collects all registered sources of a given type via DependencyGroup. A provider applies configuration — filtering, ordering, formatting — and exposes a uniform interface to the Agent.

Each source implementation registers a factory with AgentDependencyRegistry. When ChatAgentFactory builds the agent, DependencyGroup auto-collects all factories whose return type matches the source base class (e.g., ToolsSource), resolves each one, and hands the collection to the provider. The provider then applies source filtering (include_sources / exclude_sources from configuration) and exposes a uniform interface to the agent.

This pattern repeats for every capability: tools, skills, context, memory, instructions, delegation, MCP, and message processing — each with its own source base class, group, and provider.

Why This Pattern Matters

When a new capability is added — say, image search tools — the developer implements ToolsSource, registers a factory, and it automatically appears in ToolsSourceGroup. The ToolsProvider, the Agent class, and ChatAgentFactory don't change. The deployer enables it by adding the source name to include_sources.

This is the core architectural invariant: the Agent class does not change when capabilities are added or removed. Configuration controls composition.

For the complete table of providers, sources, groups, and what lifecycle phases each participates in, see the Configuring Capabilities overview.

The Agent Execution Loop

When agent.execute_streaming(conversation) is called, the agent goes through a fixed sequence of phases. Each phase invokes specific providers.

1. Dispatch Check

dispatch = await dispatch_provider.try_dispatch(conversation, conversation_context)

All DispatchRule implementations are checked. If any rule matches the ConversationContext, it short-circuits the entire flow — the rule handles the request directly, and the LLM is never called. This is used for special routing like scheduled task handling.

2. Collect Tools

Tools are aggregated from all providers that expose them:

tools_registry.extend(await mcp_provider.get_tools())
tools_registry.extend(await skills_provider.get_tools())
tools_registry.extend(await agent_delegation_provider.get_tools())
tools_registry.extend(await tools_provider.get_tools())
tools_registry.extend(await memory_provider.get_tools())
tools_registry.extend(await context_provider.get_tools())

3. Build System Prompt

Each provider appends its section to the system prompt in order:

base_system_prompt (from agent_configuration)
+ skills_provider.to_prompt() → skill catalog with names and descriptions
+ agent_delegation_provider.to_prompt() → available agents and delegation instructions
+ tools_provider.to_prompt() → tool-specific guidance (per-source)
+ instructions_provider.to_prompt() → cross-cutting instructions (date, citations, etc.)
+ memory_provider.to_prompt() → memory instructions + recalled memories
+ context_provider.to_prompt() → resolved context (when injection_mode=prompt)

Each source within a provider can contribute its own prompt section via a to_prompt() method. For example, IndexToolsSource adds guidance about how search works; DocumentToolsSource adds guidance about reading documents. The provider aggregates these.

4. Process Conversation

completions = await context_provider.process_conversation(conversation_context, conversation)

ContextProvider can interleave developer messages into the conversation history. This is how document context, tag context, and custom context get injected as inline developer messages when injection_mode=conversation.

5. LLM Loop

The ZAVChatCompletionClient.complete() method handles the recursive tool-use loop:

Call LLM with (system prompt, conversation, tools)
└─ If LLM requests tool calls:
Execute tools (parallel when possible)
Append tool results to conversation
Call LLM again (recursive, up to 20 iterations)
└─ If no tool calls:
Stream the response text back

The loop is capped at max_nesting_level=20. If the LLM keeps requesting tools beyond that limit, the SDK performs one final tool-less LLM call instructing the model to produce its best answer with the information already gathered, and returns that response. If the budget is exhausted before any LLM call is made, the SDK returns a static fallback bot message instead. In both cases the client emits {"max_nesting_level_reached": True} via log_fn for observability.

6. Post-Process

async for response, message in message_processing_provider.process_stream(raw_stream):
yield message

The agent wraps the raw LLM output stream in a chain of MessageProcessor implementations. Each processor receives the full async generator of (ChatResponse, ChatMessage) pairs and yields transformed pairs — controlling its own buffering strategy. Processors compose via generator chaining: each wraps the previous one's output.

The primary processor is CitationProcessor, which passes through intermediate streaming chunks unchanged and resolves citation markers only on the final message.

7. Cleanup

After execution completes, the Agent iterates all instance attributes and calls aclose(), cleanup(), or async_cleanup() on anything that implements those methods — then deletes all attributes and triggers garbage collection. This ensures MCP connections, file handles, and other resources are released per-request.

Cross-Cutting Concerns — The agent_state Module

Some data needs to flow between components that don't have direct references to each other. The tool execution phase produces data that the post-processing phase consumes. These are handled by singleton request-scoped objects in the agent_state module.

CitationStore

The bridge between search tools and citation post-processing:

  • During tool execution: whenever IndexToolsSource formats hits for the LLM — whether from search, browse, document listing, metadata retrieval, or document reading — it calls citation_store.register_hit() for each result, storing the full document metadata, context snippet, short ID, and index type.
  • During post-processing: CitationProcessor reads from the same CitationStore to resolve inline citation markers in the LLM's final response into structured evidence objects with source URLs, page ranges, and relevance scores.

The CitationStoreFactory is marked __singleton__ = True, so the same instance is shared across all components within a single request. Both the tool source and the message processor receive the same store through DI.

The citation strategy — how the LLM formats its references — is configured via CitationConfiguration:

StrategyWhat the LLM outputsWhat the processor does
inline_urlFull URLs in markdown linksResolves to evidence with document metadata
short_idShort hash IDs like [abc123]Maps short ID → source URL → full evidence
deferredNothing — processor handles itMatches response text to search results post-hoc
sup_numericSuperscript numbers ¹²³Maps numbers to search result order

ConversationImageStore

Handles images that appear in conversations:

  • During context resolution: when context sources encounter images, they call image_store.register_image(base64_data) and get back an 8-character SHA256 hash.
  • During tool execution: tools that need the original image data call image_store.get_image_data(short_id).

Like CitationStore, this is a singleton per request — the same instance bridges context resolution and tool execution.

The Pattern

Both stores follow the same pattern:

  1. Registered via AgentDependencyFactory with __singleton__ = True
  2. Auto-discovered through DI — any component that declares the store as a constructor parameter gets the same instance
  3. No direct references between producer and consumer — they communicate through the shared store

This keeps the tool sources and processors decoupled. A ToolsSource doesn't need to know that a MessageProcessor exists; they share data through the store.

Source Filtering — The Valve Chain

Every provider uses the same filtering logic to decide which sources are active. The filter evaluates three inputs in order:

  1. exclude_sources — hard veto. If a source name is in the exclude set, it's OFF regardless of anything else.
  2. include_sources — allowlist. If defined and the source is in it, the source is ON (overrides the source's own enabled flag). If defined and the source is not in it, the source is OFF.
  3. source_enabled — when no include list is defined, the source's own enabled flag decides.
def is_source_active(name, source_enabled, include, exclude) -> bool:
if exclude is not None and name in exclude:
return False # exclude always wins
if include is not None:
return name in include # allowlist overrides enabled flag
return source_enabled # fall back to source default

This creates a clean precedence chain. A source can ship as enabled=False by default (opt-in) or enabled=True (opt-out). The deployer's include_sources / exclude_sources in configuration always has the final say.

A second filter, passes_name_filter, applies the same logic at the individual tool level — a source might be active, but specific tools within it can be included or excluded.

Server Application and Request Lifecycle

The HTTP controller receives requests and routes them through the message bus to handlers. Handlers retrieve agent configuration (from tenant settings in managed deployments, from agent_setups.json locally), assemble the agent via ChatAgentFactory, and stream the response back.

The Complete Request Lifecycle

1. Request arrives at HTTP controller
2. Controller dispatches Command via message bus
3. Handler calls ChatAgentFactory.create(agent_identifier, bot_params)
→ Fetches AgentSetup from AgentSetupRetriever
→ Looks up agent class from ChatAgentClassRegistry
→ DI resolution: all constructor parameters resolved recursively
- Providers via AgentDependencyFactory
- Sources via DependencyGroup auto-collection
- Singletons (CitationStore, ConversationImageStore) via resolution_cache
- Config models from agent_configuration
4. Handler calls agent.execute_streaming(conversation)
a. DispatchProvider.try_dispatch() — short-circuit if a rule matches
b. Collect tools from all providers
c. Assemble system prompt from all providers
d. ContextProvider.process_conversation() — interleave context into messages
e. ZAVChatCompletionClient.complete() — streaming with recursive tool execution loop
f. MessageProcessingProvider.process() — post-process each response chunk
g. Yield processed ChatMessage to caller
5. Cleanup: all provider/source attributes closed, GC triggered

Agent Base Classes

Three base classes support different agent patterns:

  • ChatAgent — core base with execute(), tracing instrumentation, ToolsRegistry, and cleanup lifecycle
  • StreamableChatAgent — extends ChatAgent with execute_streaming() for streaming responses. The built-in Agent extends this.
  • ProcessorAgent — non-chat agent for document processing pipelines (not covered here)

Register agent classes with the @ChatAgentClassRegistry.register() decorator. The agent_name class attribute maps to the agent_name field in AgentSetup.