Streaming and Reasoning
StreamChunk kinds, agent-level streaming, reasoning configuration, and multimodal Message.content.
Streaming from the runner
runAgentStream and Model.stream both yield progressive output. At the runner level, stream items are trace events (kind: "token", kind: "tool", etc.) terminated by the final AgentResult:
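
    import { runAgentStream } from "@maniac-ai/agents";

    for await (const item of runAgentStream(spec, "Explain quantum tunneling.")) {
      // Guard once so that both `in` checks below are safe.
      if (item == null || typeof item !== "object") continue;

      // Trace events carry `kind`; write text token deltas as they arrive.
      if ("kind" in item && item.kind === "token" && item.chunk_kind === "text") {
        process.stdout.write(item.delta);
      }

      // The terminating AgentResult is recognized by its `final` property.
      if ("final" in item) {
        console.log("\nDone:", item.usage);
      }
    }

Maniac.chatStream wraps the same events in StreamEnvelope: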

    // `app` is assumed to be an existing Maniac instance.
    for await (const env of app.chatStream("support", "Hello", { threadId: "t1" })) {
      if (env.type === "event" && env.event.kind === "token") {
        // render token
      }
    }

StreamChunk kinds
At the model layer, stream() yields StreamChunk objects:
| kind | Description |
|---|---|
| token | Text delta (chunk_kind: "text") |
| reasoning | Extended-thinking / chain-of-thought delta |
| tool_call | Partial or complete tool call |
| usage | Incremental token usage |
mergeStreamChunks folds a chunk stream into InferenceResponse, accumulating reasoning text separately in response.reasoning.
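To make the folding behaviour concrete, here is a minimal sketch of a reducer over a chunk stream. It is not the implementation of mergeStreamChunks: `SketchChunk`, `SketchResponse`, and `foldChunks` are hypothetical stand-ins, every field other than `kind` is an assumption, and summing `usage` chunks is only one plausible reading of "incremental".

```typescript
// Hypothetical, simplified shapes; the library's real StreamChunk and
// InferenceResponse types may carry different or additional fields.
type SketchChunk =
  | { kind: "token"; delta: string }
  | { kind: "reasoning"; delta: string }
  | { kind: "tool_call"; id: string; name: string; argsDelta: string }
  | { kind: "usage"; inputTokens: number; outputTokens: number };

interface SketchToolCall {
  id: string;
  name: string;
  args: string;
}

interface SketchResponse {
  text: string;
  reasoning: string;
  toolCalls: SketchToolCall[];
  usage: { inputTokens: number; outputTokens: number };
}

function foldChunks(chunks: Iterable<SketchChunk>): SketchResponse {
  const out: SketchResponse = {
    text: "",
    reasoning: "",
    toolCalls: [],
    usage: { inputTokens: 0, outputTokens: 0 },
  };
  for (const chunk of chunks) {
    switch (chunk.kind) {
      case "token":
        out.text += chunk.delta;
        break;
      case "reasoning":
        // Reasoning is kept apart from the visible answer text.
        out.reasoning += chunk.delta;
        break;
      case "tool_call": {
        // Partial tool calls sharing an id are stitched into one entry.
        const id = chunk.id;
        let call = out.toolCalls.find((c) => c.id === id);
        if (call === undefined) {
          call = { id, name: chunk.name, args: "" };
          out.toolCalls.push(call);
        }
        call.args += chunk.argsDelta;
        break;
      }
      case "usage":
        // Assumption: each usage chunk is a delta, so the counts are summed.
        out.usage.inputTokens += chunk.inputTokens;
        out.usage.outputTokens += chunk.outputTokens;
        break;
    }
  }
  return out;
}
```

The point of the sketch is the separation: reasoning deltas never leak into `text`, so a UI can render or hide them independently.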
Correlation IDs
Token events optionally carry turn_id, message_id, block_id, and thread_id for UI routing. See Streaming and tracing.
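As an illustration of UI routing, the sketch below groups token deltas into per-block buffers. The `SketchTokenEvent` type and `routeByBlock` function are hypothetical, and the fallback order (block_id, then message_id, then a shared "default" bucket) is an assumption rather than documented behaviour.

```typescript
// Hypothetical event shape: only `kind`, `delta`, and the four optional
// correlation IDs mentioned above are modelled.
interface SketchTokenEvent {
  kind: "token";
  delta: string;
  turn_id?: string;
  message_id?: string;
  block_id?: string;
  thread_id?: string;
}

// Append each delta to the buffer of the UI block it belongs to.
function routeByBlock(events: SketchTokenEvent[]): Map<string, string> {
  const buffers = new Map<string, string>();
  for (const ev of events) {
    // Assumed fallback order; events without any ID share one bucket.
    const key = ev.block_id ?? ev.message_id ?? "default";
    buffers.set(key, (buffers.get(key) ?? "") + ev.delta);
  }
  return buffers;
}
```

Keying on the most specific ID available lets interleaved streams, for example two content blocks in the same message, render into separate elements.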
Reasoning configuration
Reasoning-capable families accept a normalized ReasoningConfig on both Agent.reasoning and InferenceRequest.reasoning:

    import { OpenAICompatibleModel, AnthropicModel, type Agent } from "@maniac-ai/agents";

    const spec: Agent = {
      id: "researcher",
      instructions: "Be thorough.",
      model: new OpenAICompatibleModel({ slug: "gpt-5" }),
      reasoning: { effort: "high" },
    };

| Field | OpenAI-compatible / OpenRouter | Anthropic |
|---|---|---|
| effort: "minimal" \| "low" \| "medium" \| "high" | reasoning_effort | mapped to thinking.budget_tokens (1024 / 4096 / 10000 / 24000) |
| max_tokens: number | ignored (chat completions) | thinking.budget_tokens (exact) |
| summary: "auto" \| "concise" \| "detailed" | ignored (Responses API only) | no equivalent |
A prepare_step hook can override the reasoning configuration for a single turn by setting request.reasoning. The override is used as-is; the runner does not merge the spec default on top of it.
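The mapping and override rules above can be sketched as plain functions. Everything here is illustrative: `SketchReasoningConfig`, `toAnthropicThinking`, `toOpenAICompatible`, and `resolveReasoning` are hypothetical names, the four budgets are assumed to pair with the effort levels in the order listed, and letting `max_tokens` win over `effort` is an assumption drawn from the "(exact)" note.

```typescript
type Effort = "minimal" | "low" | "medium" | "high";

// Hypothetical local type with only the three fields from the table.
interface SketchReasoningConfig {
  effort?: Effort;
  max_tokens?: number;
  summary?: "auto" | "concise" | "detailed";
}

// Budgets from the table, assumed to match the effort levels in order.
const ANTHROPIC_BUDGETS: Record<Effort, number> = {
  minimal: 1024,
  low: 4096,
  medium: 10000,
  high: 24000,
};

// Anthropic side: an exact max_tokens wins (assumption), else map effort.
function toAnthropicThinking(
  cfg: SketchReasoningConfig,
): { type: "enabled"; budget_tokens: number } | undefined {
  if (cfg.max_tokens !== undefined) {
    return { type: "enabled", budget_tokens: cfg.max_tokens };
  }
  if (cfg.effort !== undefined) {
    return { type: "enabled", budget_tokens: ANTHROPIC_BUDGETS[cfg.effort] };
  }
  return undefined;
}

// OpenAI-compatible chat completions: only effort is forwarded.
function toOpenAICompatible(cfg: SketchReasoningConfig): { reasoning_effort?: Effort } {
  return cfg.effort !== undefined ? { reasoning_effort: cfg.effort } : {};
}

// A per-turn override replaces the spec default; fields are not merged.
function resolveReasoning(
  specDefault?: SketchReasoningConfig,
  override?: SketchReasoningConfig,
): SketchReasoningConfig | undefined {
  return override ?? specDefault;
}
```

Under these assumptions, a spec default of `{ effort: "high", max_tokens: 5000 }` overridden with `{ effort: "low" }` resolves to `{ effort: "low" }` only, so the `max_tokens` value from the spec does not carry over.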
Response-side reasoning
OpenAICompatibleModel surfaces provider reasoning from:
- message.reasoning / delta.reasoning: OpenRouter-normalized form
- message.reasoning_content / delta.reasoning_content: DeepSeek-R1, vLLM, Qwen native form
Both populate InferenceResponse.reasoning on infer() and emit StreamChunk(kind: "reasoning") on stream().
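A minimal sketch of reading either field from a provider message might look like the following. `SketchProviderMessage` and `extractReasoning` are hypothetical names, and preferring `reasoning` over `reasoning_content` when both are present is an assumption, not documented precedence.

```typescript
// Hypothetical subset of a provider message (or streaming delta).
interface SketchProviderMessage {
  content?: string | null;
  reasoning?: string | null; // OpenRouter-normalized form
  reasoning_content?: string | null; // DeepSeek-R1, vLLM, Qwen native form
}

// Return whichever reasoning field the provider populated, if any.
function extractReasoning(msg: SketchProviderMessage): string | undefined {
  // Assumed precedence: the normalized field first, then the native one.
  return msg.reasoning ?? msg.reasoning_content ?? undefined;
}
```

The same lookup applies to a full message on infer() and to each delta on stream(); only the accumulation differs.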
Multimodal content
Message.content accepts string (back-compat) or ContentPart[]:

    type ContentPart =
      | { type: "text"; text: string; cache_control?: { type: "ephemeral" } }
      | { type: "image"; source: ImageSource }
      | { type: "file"; ... };

Anthropic forwards cache_control for prompt caching. OpenAI-compatible adapters strip it with a warning.
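The strip-with-a-warning behaviour and the string back-compat path can be sketched as below. `SketchPart`, `normalizeContent`, and `stripCacheControl` are hypothetical; the image `source` is typed as `unknown` and the file variant is left out because their shapes are not given here, and the warning text is invented for illustration.

```typescript
// Hypothetical, reduced version of ContentPart for this sketch.
type SketchPart =
  | { type: "text"; text: string; cache_control?: { type: "ephemeral" } }
  | { type: "image"; source: unknown };

// Back-compat: a bare string becomes a single text part.
function normalizeContent(content: string | SketchPart[]): SketchPart[] {
  return typeof content === "string" ? [{ type: "text", text: content }] : content;
}

// Drop cache_control from text parts, reporting each removal via `warn`.
function stripCacheControl(
  parts: SketchPart[],
  warn: (message: string) => void = console.warn,
): SketchPart[] {
  return parts.map((part): SketchPart => {
    if (part.type === "text" && part.cache_control !== undefined) {
      warn("cache_control is not supported by this adapter and was removed");
      // Rebuild the part without the unsupported field; input is not mutated.
      return { type: "text", text: part.text };
    }
    return part;
  });
}
```

Returning new objects instead of deleting the property keeps the caller's message array intact, so the same message can still be sent to an adapter that honours cache_control.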
Cancellation
Pass signal: AbortSignal through RunOptions or ModelCallOptions. Aborting raises RunCancelledError with the partial transcript on error.partial.
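The cancellation flow can be sketched without the library by using a stand-in stream. `SketchRunCancelledError`, `fakeStream`, and `consumeWithCancel` are hypothetical; only the idea of an error carrying `partial` comes from the text above, and the real `partial` is a transcript rather than the plain string array used here.

```typescript
// Stand-in for RunCancelledError; `partial` is simplified to a string array.
class SketchRunCancelledError extends Error {
  readonly partial: string[];
  constructor(partial: string[]) {
    super("run cancelled");
    this.name = "SketchRunCancelledError";
    this.partial = partial;
    // Keeps `instanceof` working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Stand-in stream: checks the signal before producing each token.
async function* fakeStream(tokens: string[], signal: AbortSignal): AsyncGenerator<string> {
  const partial: string[] = [];
  for (const token of tokens) {
    if (signal.aborted) throw new SketchRunCancelledError(partial);
    partial.push(token);
    yield token;
  }
}

// Consume the stream, abort after `cancelAfter` tokens, recover the partial.
async function consumeWithCancel(
  tokens: string[],
  cancelAfter: number,
): Promise<{ seen: string[]; partial: string[] | null }> {
  const controller = new AbortController();
  const seen: string[] = [];
  try {
    for await (const token of fakeStream(tokens, controller.signal)) {
      seen.push(token);
      if (seen.length === cancelAfter) controller.abort();
    }
    return { seen, partial: null }; // finished without cancellation
  } catch (err) {
    if (err instanceof SketchRunCancelledError) {
      return { seen, partial: err.partial };
    }
    throw err; // unrelated failures are not swallowed
  }
}
```

In real code the AbortController would typically be owned by the UI (a "stop" button, a request timeout), with its `signal` passed through RunOptions or ModelCallOptions as described above.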