The agent SDK built for the models you actually run
Most agent frameworks assume a frontier model that never makes a mistake. @maniac-ai/agents assumes the opposite — and engineers for it. It's built for small, open, and local models: the 7Bs and 14Bs on Mach, vLLM, Ollama, llama.cpp and MLX.
$ npm install @maniac-ai/agents
import { OpenAICompatibleModel, runAgent } from "@maniac-ai/agents";
const model = new OpenAICompatibleModel({
  slug: "qwen2.5-7b-instruct",
  baseUrl: "http://localhost:8000/v1",
  requireApiKey: false,
});
const result = await runAgent(
  { id: "assistant", instructions: "Be concise.", model },
  "Summarize the release checks."
);
localhost
Point at any OpenAI-compatible server — vLLM, Ollama, llama.cpp, SGLang, MLX.
top-10
Only the most relevant tools enter the prompt each turn, not your whole catalog.
depth-bounded
Recursive subagents keep messy multi-step work in isolated context.
auto-repair
Malformed tool arguments and JSON are caught and retried, never silently dropped.
Point it at localhost. That's the whole setup.
Any server that speaks the OpenAI Chat Completions API is one constructor away — vLLM, Ollama, llama.cpp, MLX. Just change the base URL. Or implement the one-method Model interface for any other runtime.
import { OpenAICompatibleModel, runAgent } from "@maniac-ai/agents";
const model = new OpenAICompatibleModel({
  slug: "qwen2.5-7b-instruct",
  baseUrl: "http://localhost:8000/v1", // vLLM · Ollama · llama.cpp
  requireApiKey: false, // local servers don't check
});
await runAgent({ id: "assistant", instructions: "Be concise.", model }, prompt);
Hundreds of tools, without drowning the context window
Every unused tool schema is context a small model pays for and gets distracted by. Register large tool sets as toolsets, and each turn the SDK surfaces only the handful most relevant to the query — a catalog of 200 for the cost of ~10.
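To make the idea concrete, here is a minimal sketch of per-turn tool selection. It is not the SDK's implementation: the ToolSpec shape and the selectTools helper are hypothetical, and plain keyword overlap stands in for whatever relevance scoring the SDK actually uses.

```typescript
// Hypothetical shape for illustration; not the SDK's actual tool type.
interface ToolSpec {
  name: string;
  description: string;
}

// Lowercase a string and split it into alphanumeric word tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Count how many distinct query words appear in the tool's name or description.
function overlapScore(queryTokens: Set<string>, tool: ToolSpec): number {
  const toolTokens = new Set(tokenize(tool.name + " " + tool.description));
  let score = 0;
  queryTokens.forEach((token) => {
    if (toolTokens.has(token)) score += 1;
  });
  return score;
}

// Return up to k tools with a non-zero score, highest score first.
function selectTools(query: string, catalog: ToolSpec[], k: number): ToolSpec[] {
  const queryTokens = new Set(tokenize(query));
  return catalog
    .map((tool) => ({ tool, score: overlapScore(queryTokens, tool) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((entry) => entry.tool);
}

const catalog: ToolSpec[] = [
  { name: "create_invoice", description: "Create a new invoice for a customer" },
  { name: "send_email", description: "Send an email message to a recipient" },
  { name: "list_invoices", description: "List invoices for a customer account" },
  { name: "get_weather", description: "Fetch a weather forecast by city" },
];

const selected = selectTools("list the invoices for this customer", catalog, 2);
console.log(selected.map((t) => t.name).join(", ")); // list_invoices, create_invoice
```

Only the selected schemas would be serialized into the prompt for that turn; the rest of the catalog stays registered but costs no context.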
Recursive subagents that keep context where it belongs
A whole toolset can collapse into one { prompt } tool that spins up a nested subagent — its own fresh context, the messy work done in isolation, just the answer handed back. Recursive and bounded by max_depth, so the main transcript stays clean.
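A rough sketch of the depth bound, under stated assumptions: the Step and Policy types and the runBounded function are hypothetical, a synchronous policy function stands in for the model call, and maxDepth is assumed to count nesting levels below the root agent. The SDK's real max_depth semantics may differ.

```typescript
// Hypothetical types for illustration; not the SDK's actual API.
type Step =
  | { kind: "answer"; text: string }
  | { kind: "delegate"; prompt: string };

// Stand-in for a model call: given this agent's own transcript, choose a step.
type Policy = (transcript: string[]) => Step;

interface RunResult {
  answer: string;
  transcriptLength: number; // messages in this agent's own context
}

function runBounded(prompt: string, policy: Policy, maxDepth: number, depth = 0): RunResult {
  // Each (sub)agent starts from a fresh transcript holding only its prompt.
  const transcript: string[] = ["user: " + prompt];
  const maxSteps = 8; // guards against a policy that never answers
  for (let i = 0; i < maxSteps; i++) {
    const step = policy(transcript);
    if (step.kind === "answer") {
      return { answer: step.text, transcriptLength: transcript.length };
    }
    if (depth >= maxDepth) {
      // At the depth limit, refuse to nest further and tell the model why.
      transcript.push("tool: error: max depth reached, answer directly");
      continue;
    }
    // The child works in isolation; only its final answer comes back.
    const child = runBounded(step.prompt, policy, maxDepth, depth + 1);
    transcript.push("tool: " + child.answer);
  }
  return { answer: "no answer within step limit", transcriptLength: transcript.length };
}

// Scripted policy: delegate once, then wrap whatever the subagent returned.
const demoPolicy: Policy = (transcript) => {
  const last = transcript[transcript.length - 1] || "";
  if (last.startsWith("tool: error")) return { kind: "answer", text: "leaf" };
  if (last.startsWith("tool: ")) return { kind: "answer", text: "wrapped(" + last.slice(6) + ")" };
  return { kind: "delegate", prompt: "sub-task" };
};

console.log(runBounded("plan the release", demoPolicy, 2).answer); // wrapped(wrapped(leaf))
```

Note that the root transcript ends with two entries no matter how deep the recursion went: the prompt and one tool result.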
Small models break JSON. We expect it.
Malformed tool arguments surface as a structured error instead of running with empty args. Structured output is schema-validated with an automatic repair loop, and parsing tolerates quirks like markdown-fenced JSON.
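The loop can be sketched in a few lines. This is an illustration, not the SDK's code: stripFence, parseOutput, askWithRepair, and the Ask and Validator types are hypothetical names, a hand-written type guard stands in for real schema validation, and the default of two repair attempts is an assumption.

```typescript
// Hypothetical helper names; the SDK's real API may differ.
type Validator<T> = (value: unknown) => value is T;
type Ask = (prompt: string) => Promise<string>;
type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

const FENCE = "`".repeat(3);
const FENCED = new RegExp("^" + FENCE + "[a-zA-Z]*\\s*([\\s\\S]*?)" + FENCE + "$");

// Remove a surrounding markdown fence, if present.
function stripFence(text: string): string {
  const match = text.trim().match(FENCED);
  return match ? match[1].trim() : text.trim();
}

// Parse and validate one reply, returning a structured error instead of throwing.
function parseOutput<T>(raw: string, isValid: Validator<T>): ParseResult<T> {
  let value: unknown;
  try {
    value = JSON.parse(stripFence(raw));
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
  if (isValid(value)) return { ok: true, value };
  return { ok: false, error: "output does not match the expected schema" };
}

// Ask once, then feed each rejection back to the model until attempts run out.
async function askWithRepair<T>(
  ask: Ask,
  prompt: string,
  isValid: Validator<T>,
  repairAttempts = 2
): Promise<T> {
  let reply = await ask(prompt);
  for (let attempt = 0; ; attempt++) {
    const result = parseOutput(reply, isValid);
    if (result.ok) return result.value;
    if (attempt >= repairAttempts) throw new Error("output still invalid: " + result.error);
    reply = await ask(
      prompt + "\n\nYour previous reply was rejected: " + result.error + "\nResubmit valid JSON only."
    );
  }
}

interface Total {
  total: number;
}
const isTotal = (v: unknown): v is Total =>
  typeof v === "object" && v !== null && typeof (v as { total?: unknown }).total === "number";

// Scripted stand-in for a small model: replies in order, repeating the last one.
function scripted(replies: string[]): Ask {
  let i = 0;
  return async () => replies[Math.min(i++, replies.length - 1)];
}

const demoAsk = scripted([FENCE + 'json\n{ "total": 42, }\n' + FENCE, '{ "total": 42 }']);
askWithRepair(demoAsk, "Return the total as JSON.", isTotal).then((out) => console.log(out.total)); // 42
```

The key design choice is that a bad reply never reaches the caller as empty or partial data: it either validates, gets repaired, or fails loudly.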
// model returns near-JSON, wrapped in a fence
```json
{ "total": 42, } // trailing comma
```
// → fence stripped, validated against output_model
rejected: "Unexpected token } in JSON"
// → model gets its own error, asked to resubmit
output_repair_attempts: 2 remaining
{ "total": 42 } // ✓ valid on retry
Start small, escalate only when you must
Small models are fast and cheap until they're stuck. RetryingModel absorbs flaky servers, FallbackModel cascades 7B → 14B → hosted, and a hook can route one hard turn to a bigger model — then drop back down.
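The cascade pattern itself is simple. The sketch below is not the SDK's RetryingModel or FallbackModel: withRetry, withFallback, and fakeTier are hypothetical helpers over a bare prompt-to-text function, and the retry has no backoff, which a real wrapper would likely add.

```typescript
// Hypothetical: a model reduced to a single prompt-to-text function.
type Complete = (prompt: string) => Promise<string>;

// Retry the same tier up to maxAttempts times before giving up on it.
function withRetry(complete: Complete, maxAttempts: number): Complete {
  return async (prompt) => {
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return await complete(prompt);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  };
}

// Try each tier in order and return the first successful reply.
function withFallback(tiers: Complete[]): Complete {
  return async (prompt) => {
    const errors: string[] = [];
    for (const tier of tiers) {
      try {
        return await tier(prompt);
      } catch (err) {
        errors.push(String(err));
      }
    }
    throw new Error("all tiers failed: " + errors.join("; "));
  };
}

// Test double: fails the first N calls, then echoes the prompt with its label.
function fakeTier(label: string, failuresBeforeSuccess: number) {
  const state = { calls: 0 };
  const complete: Complete = async (prompt) => {
    state.calls += 1;
    if (state.calls <= failuresBeforeSuccess) throw new Error(label + " unavailable");
    return label + ": " + prompt;
  };
  return { complete, state };
}

const small7b = fakeTier("7B", Infinity); // always down
const mid14b = fakeTier("14B", 1); // flaky: one failure, then fine
const hostedTier = fakeTier("hosted", 0); // healthy, ideally never reached

const cascade = withFallback([
  withRetry(small7b.complete, 2),
  withRetry(mid14b.complete, 2),
  hostedTier.complete,
]);
cascade("hi").then((reply) => console.log(reply)); // 14B: hi
```

Ordering the tiers cheapest first means the hosted model is only paid for when both local tiers have actually failed.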
Make a small context window go far
A compactor folds older turns into a summary, a relevance filter drops what's irrelevant, and working memory keeps facts out of context — so the model carries less each turn. All of it fails open.
Before · long transcript
After · fits the window
System prompt and recent turns kept verbatim; the middle becomes a summary. Fails open — an error continues the run untouched.
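As a sketch of that behavior, assuming a hypothetical Message shape and compact helper (neither is the SDK's API), with a synchronous summarize callback standing in for the model call that would normally write the summary:

```typescript
// Hypothetical message shape; not the SDK's transcript type.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}
type Summarize = (middle: Message[]) => string;

// Keep system messages and the last keepRecent turns verbatim; fold the rest.
function compact(history: Message[], keepRecent: number, summarize: Summarize): Message[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  if (rest.length <= keepRecent) return history; // nothing old enough to fold
  const middle = rest.slice(0, rest.length - keepRecent);
  const recent = rest.slice(rest.length - keepRecent);
  try {
    const summary: Message = {
      role: "system",
      content: "Summary of earlier turns: " + summarize(middle),
    };
    return [...system, summary, ...recent];
  } catch {
    // Fail open: if summarizing breaks, continue with the untouched transcript.
    return history;
  }
}

const history: Message[] = [
  { role: "system", content: "Be concise." },
  { role: "user", content: "Check the build." },
  { role: "assistant", content: "Build is green." },
  { role: "user", content: "Check the tests." },
  { role: "assistant", content: "Two tests are flaky." },
  { role: "user", content: "Which ones?" },
  { role: "assistant", content: "auth_spec and upload_spec." },
];

const compacted = compact(history, 2, (middle) => middle.length + " messages");
console.log(compacted.map((m) => m.content).join(" | "));
```

Here the seven-message transcript shrinks to four: the system prompt, one summary covering four older messages, and the last two turns.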
Hard limits and a black box, for models that surprise you
Loose sampling, runaway loops, and flaky servers come with the territory. The runtime caps them with budgets and guardrails and records every step — so a misbehaving model is bounded and debuggable, not a mystery.
Budgets
Token, cost, wall-clock, and iteration caps enforced every turn.
Guardrails
Allow, rewrite, block, or require approval on LM and tool calls.
Resume
Interrupted runs persist their transcript and seal pending tool calls.
Tracing
Every step, tool, and token mapped to OpenTelemetry spans.
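A per-turn budget check might look roughly like this. The Budget class, Limits shape, and runWithBudget loop are hypothetical, not the SDK's API; the cost cap is left out for brevity, and a real loop would also stop as soon as the model produces a final answer.

```typescript
// Hypothetical limit names; the SDK's real budget options may differ.
interface Limits {
  maxIterations: number;
  maxTokens: number;
  maxMillis: number;
}
type StopReason = "iterations" | "tokens" | "wall-clock";

class Budget {
  private iterations = 0;
  private tokens = 0;
  private readonly limits: Limits;
  private readonly now: () => number;
  private readonly startedAt: number;

  // The clock is injectable so the wall-clock cap can be tested deterministically.
  constructor(limits: Limits, now: () => number = Date.now) {
    this.limits = limits;
    this.now = now;
    this.startedAt = now();
  }

  // Record one completed turn and the tokens it consumed.
  record(tokensUsed: number): void {
    this.iterations += 1;
    this.tokens += tokensUsed;
  }

  // Name the first exhausted cap, or return null if the run may continue.
  exceeded(): StopReason | null {
    if (this.iterations >= this.limits.maxIterations) return "iterations";
    if (this.tokens >= this.limits.maxTokens) return "tokens";
    if (this.now() - this.startedAt >= this.limits.maxMillis) return "wall-clock";
    return null;
  }
}

// Run turns until a cap trips; each turn reports how many tokens it used.
function runWithBudget(budget: Budget, turn: () => number): { turns: number; stoppedBy: StopReason } {
  let turns = 0;
  let stop = budget.exceeded();
  while (stop === null) {
    budget.record(turn());
    turns += 1;
    stop = budget.exceeded();
  }
  return { turns, stoppedBy: stop };
}

let fakeClock = 0;
const budget = new Budget({ maxIterations: 5, maxTokens: 1000, maxMillis: 60000 }, () => fakeClock);
const outcome = runWithBudget(budget, () => {
  fakeClock += 100; // each simulated turn takes 100 ms
  return 400; // and uses 400 tokens
});
console.log(outcome.turns, outcome.stoppedBy); // 3 tokens
```

Checking every cap after every turn is what keeps a looping model bounded: whichever limit trips first ends the run, and the reason is recorded.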
A different design center, not a longer feature list
The Vercel AI SDK and Mastra are excellent — for the models they assume. @maniac-ai/agents starts from a different premise: the model is small, local, and fallible. Here's where that changes the engineering.
Build agents on the models you own
@maniac-ai/agents is the runtime inside the Maniac app — or install it standalone and point it at your own local server.