Inference

Model interface, provider adapters, streaming, reasoning configuration, and resilience wrappers.

Inference is the boundary between the agent runner and LLM providers. Every adapter implements the Model interface (infer + stream) and translates the SDK's normalized InferenceRequest / InferenceResponse shapes to provider wire formats.

Topics

Model and adapters — Model, OpenAICompatibleModel, AnthropicModel, OpenRouterModel, VercelGatewayModel, StaticModel
Streaming and reasoning — StreamChunk, reasoning channels, multimodal content
Retry and fallback — RetryingModel, FallbackModel

Request flow

flowchart LR
  Runner["Agent runner"] --> Req["InferenceRequest"]
  Req --> Adapter["Model adapter"]
  Adapter --> API["Provider API"]
  API --> Res["InferenceResponse / StreamChunk"]
  Res --> Runner

The runner builds InferenceRequest from conversation messages, tool definitions, optional reasoning, and max_tokens. Adapters handle provider-specific serialization; the runner never speaks raw HTTP.

Inference

Topics

Request flow

On this page