Inference
Inference
Model interface, provider adapters, streaming, reasoning configuration, and resilience wrappers.
Inference is the boundary between the agent runner and LLM providers. Every adapter implements the Model interface (infer + stream) and translates the SDK's normalized InferenceRequest / InferenceResponse shapes to provider wire formats.
Topics
- Model and adapters —
Model,OpenAICompatibleModel,AnthropicModel,OpenRouterModel,VercelGatewayModel,StaticModel - Streaming and reasoning —
StreamChunk, reasoning channels, multimodal content - Retry and fallback —
RetryingModel,FallbackModel
Request flow
flowchart LR
Runner["Agent runner"] --> Req["InferenceRequest"]
Req --> Adapter["Model adapter"]
Adapter --> API["Provider API"]
API --> Res["InferenceResponse / StreamChunk"]
Res --> RunnerThe runner builds InferenceRequest from conversation messages, tool definitions, optional reasoning, and max_tokens. Adapters handle provider-specific serialization; the runner never speaks raw HTTP.