Observational & working memory
Background compression tiers, the remember tool, per-turn memory gates, and query-conditioned relevance filtering.
Beyond raw conversation history, Maniac supports two summarized memory tiers that splice into the chat prefix as system messages. Both run background LM jobs so foreground turns stay fast.
| Tier | Header | Updated by |
|---|---|---|
| Observational | [Active observations], [Reflections] | Observer + Reflector (background) |
| Working | [Working memory] | Post-turn updater + remember tool |
Conversation turns in ConversationStore are always preserved. Only memory-tier prefix blocks are candidates for the relevance filter.
Configure on Maniac
import { Maniac } from "@maniac-ai/agents";
import { SqliteMemory } from "@maniac-ai/agents/memory/sqlite";
const memory = new SqliteMemory({ filename: "./agent.db" });
await memory.setup();
const app = new Maniac({
memory,
agents: [myAgent],
observationalMemory: {
model: cheapModel,
scope: "thread" // or "resource" for cross-thread pooling
},
workingMemory: {
model: cheapModel,
scope: "resource", // default — pool profile across threads for same user
template: "# User profile\n\n## Goals\n\n## Notes\n",
update_after_every_turn: true
},
memoryRelevanceFilter: {
model: cheapModel,
min_chars_to_filter: 2000
}
});
Pass resourceId on chat / chatStream when using scope: "resource":
await app.chat("agent", "thread-1", "Hello", { resourceId: "user-42" });
Observational memory
Observational memory compresses unobserved message tails into dense observation chunks stored in ObservationStore.
Per-turn flow
- loadThread: builds the prefix from active observations, reflections, the current task, and then the recent raw-message tail
- Foreground run: the agent sees the compressed prefix instead of the full history
- afterTurn: schedules the background Observer to compress new slices; the Reflector folds observations into higher-level reflections
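The prefix ordering described for loadThread can be sketched as below. This is an illustration only, not the SDK's implementation: the `ThreadState` shape, the `buildPrefix` name, and the `[Current task]` header are hypothetical, while the `[Active observations]` and `[Reflections]` headers come from the tier table above.

```typescript
// Hypothetical message and state shapes, invented for this sketch.
interface PrefixMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ThreadState {
  activeObservations: string[];
  reflections: string[];
  currentTask?: string;
  recentTail: PrefixMessage[]; // raw messages not yet compressed
}

// Assemble the prefix in the documented order:
// active observations, reflections, current task, then the raw tail.
function buildPrefix(state: ThreadState): PrefixMessage[] {
  const prefix: PrefixMessage[] = [];
  if (state.activeObservations.length > 0) {
    prefix.push({
      role: "system",
      content: `[Active observations]\n${state.activeObservations.join("\n")}`,
    });
  }
  if (state.reflections.length > 0) {
    prefix.push({
      role: "system",
      content: `[Reflections]\n${state.reflections.join("\n")}`,
    });
  }
  if (state.currentTask) {
    // "[Current task]" is an assumed header; the source does not name one.
    prefix.push({ role: "system", content: `[Current task]\n${state.currentTask}` });
  }
  return [...prefix, ...state.recentTail];
}
```

Empty tiers are skipped here so the foreground run never sees a bare header; whether the SDK does the same is not stated in the source.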
Activation triggers
Buffered observations become active (included in the prefix) when:
- Unobserved tail exceeds message_chars
- Idle timeout elapses
- Provider changes mid-thread
- Sync fallback (block_after) fires
Statuses: buffered → active.
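As a rough illustration of the triggers listed above, the check could look like the following. Everything here is an assumption made for the sketch: `idle_timeout_ms` is a hypothetical option name, `block_after` is treated as a second, larger character limit (the source does not give its unit), and `firedTriggers` / `nextStatus` are invented helpers rather than SDK functions.

```typescript
type Trigger = "message_chars" | "idle_timeout" | "provider_change" | "block_after";
type Status = "buffered" | "active";

interface Thresholds {
  message_chars: number;   // unobserved-tail size limit (named in the docs)
  idle_timeout_ms: number; // hypothetical name for the idle timeout
  block_after: number;     // assumed here to be a larger character limit
}

interface TailSnapshot {
  unobservedChars: number; // size of the tail not yet observed
  idleMs: number;          // time since the last message
  lastProvider: string;
  currentProvider: string;
}

// Return every trigger that currently holds for the thread.
function firedTriggers(tail: TailSnapshot, t: Thresholds): Trigger[] {
  const fired: Trigger[] = [];
  if (tail.unobservedChars > t.message_chars) fired.push("message_chars");
  if (tail.idleMs >= t.idle_timeout_ms) fired.push("idle_timeout");
  if (tail.lastProvider !== tail.currentProvider) fired.push("provider_change");
  if (tail.unobservedChars > t.block_after) fired.push("block_after");
  return fired;
}

// Buffered observations become active once any trigger fires.
function nextStatus(current: Status, fired: Trigger[]): Status {
  return current === "buffered" && fired.length > 0 ? "active" : current;
}
```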
Per-turn opt-out
Gate observational and working memory per turn via the memory option on chat:
await app.chat("agent", "thread", message, {
memory: {
observational: "read", // "off" | "read" | "write"
working: "off"
}
});
REPL introspection
The runtime exposes an ObservationsProxy for debugging: current(), buffered(), reflections(), state(), recent().
Working memory
Working memory is a single markdown document the agent maintains across turns — preferences, goals, standing instructions.
Each turn:
- WorkingMemoryStore.load() → splice as [Working memory]\n<doc>
- After the turn, a background LM updater rewrites the doc from prior content + new messages + optional template
The built-in remember(note) tool appends bullets under ## Notes (configurable via note_section).
Default scope is "resource" so a user profile pools across threads when you pass the same resourceId.
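To make the remember(note) behavior concrete, here is a minimal sketch of appending a bullet under a section of the working-memory document. The `appendNote` helper is hypothetical and is not the SDK's implementation; it assumes the target section is a level-2 heading and that a missing section should be created at the end of the document.

```typescript
// Append "- <note>" at the end of the given section (default "## Notes").
// Hypothetical helper; the real tool's edge-case handling is not documented here.
function appendNote(doc: string, note: string, noteSection = "## Notes"): string {
  const bullet = `- ${note.trim()}`;
  const lines = doc.split("\n");
  const start = lines.findIndex((line) => line.trim() === noteSection);

  if (start === -1) {
    // Assumption: create the section when the document does not have it yet.
    const base = doc.replace(/\s+$/, "");
    return `${base}${base ? "\n\n" : ""}${noteSection}\n${bullet}\n`;
  }

  // The section ends at the next level-1 or level-2 heading, or at end of file.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (/^#{1,2}\s/.test(lines[i] ?? "")) {
      end = i;
      break;
    }
  }

  // Insert before any trailing blank lines so bullets stay contiguous.
  let insertAt = end;
  while (insertAt > start + 1 && (lines[insertAt - 1] ?? "").trim() === "") {
    insertAt--;
  }
  lines.splice(insertAt, 0, bullet);
  return lines.join("\n");
}
```

Called on the template from the configuration example, this places the first note directly under `## Notes` and later notes after the existing bullets.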
Relevance filter
MemoryRelevanceFilter runs a cheap background LM before each foreground turn to score which observation/reflection blocks and working-memory ## sections are relevant to the new user query. It reduces noise for multi-domain agents with large memory prefixes.
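One way to picture "working-memory ## sections" as filter candidates is the sketch below. It is illustrative only: `splitWorkingMemory` and `shouldRunScorer` are hypothetical helpers, and treating any text before the first `##` heading as its own section is an assumption about behavior the source does not specify. The `wm.N` id format and the `min_chars_to_filter` threshold are taken from this page.

```typescript
interface Section {
  id: string;   // "wm.0", "wm.1", ...
  text: string; // heading plus body
}

// Split a markdown document into sections at each "## " heading.
function splitWorkingMemory(doc: string): Section[] {
  const chunks: string[][] = [];
  let current: string[] = [];
  for (const line of doc.split("\n")) {
    if (line.startsWith("## ") && current.length > 0) {
      chunks.push(current);
      current = [];
    }
    current.push(line);
  }
  if (current.length > 0) chunks.push(current);

  return chunks
    .map((lines) => lines.join("\n").trim())
    .filter((text) => text.length > 0) // drop empty chunks before numbering
    .map((text, i) => ({ id: `wm.${i}`, text }));
}

// Skip the scorer LM when the candidate body is smaller than the threshold.
function shouldRunScorer(sections: Section[], minCharsToFilter: number): boolean {
  const total = sections.reduce((sum, s) => sum + s.text.length, 0);
  return total >= minCharsToFilter;
}
```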
Configure on Maniac:
const app = new Maniac({
memory,
memoryRelevanceFilter: {
model: cheapModel,
min_chars_to_filter: 2000, // skip LM when candidate body is smaller
filter_observations: true, // obs + reflection blocks
filter_working_memory: true, // WM sections by ## headers
keep_when_empty: true, // fail-open: keep unfiltered if LM drops all
instructions: "Optional override for the scorer system prompt"
}
});
Behavior
- parseMemoryPrefix() decomposes the prefix into block ids (wm.0, obs.2, ref.1, …)
- The scorer LM returns {"keep": ["wm.0", "obs.2", ...]}
- Fail-open: LM or parse errors return the unfiltered prefix and emit a memory_filter_error trace event
- If the LM drops every block and keep_when_empty is true, the unfiltered prefix is kept
Conversation history is never filtered — only [Active observations], [Reflections], and [Working memory] blocks.
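The fail-open rules above can be expressed in a few lines. The following is a sketch under stated assumptions, not the SDK's code: `Block`, `applyKeepList`, and the `onTrace` callback are hypothetical names, and only the `{"keep": [...]}` response shape, the `keep_when_empty` semantics, and the `memory_filter_error` event name come from this page.

```typescript
interface Block {
  id: string;   // e.g. "wm.0", "obs.2", "ref.1"
  text: string;
}

// Apply the scorer's keep-list to the parsed prefix blocks.
function applyKeepList(
  blocks: Block[],
  scorerOutput: string,
  keepWhenEmpty = true,
  onTrace?: (event: string) => void, // hypothetical trace hook
): Block[] {
  let keep: unknown;
  try {
    keep = (JSON.parse(scorerOutput) as { keep?: unknown }).keep;
  } catch {
    // Fail-open: unparseable output leaves the prefix unfiltered.
    onTrace?.("memory_filter_error");
    return blocks;
  }
  if (!Array.isArray(keep)) {
    // Assumption: a response without a "keep" array also counts as a parse error.
    onTrace?.("memory_filter_error");
    return blocks;
  }

  const ids = new Set(keep.filter((k): k is string => typeof k === "string"));
  const kept = blocks.filter((b) => ids.has(b.id));

  // If the scorer dropped everything, keep the unfiltered prefix when configured to.
  if (kept.length === 0 && keepWhenEmpty) return blocks;
  return kept;
}
```

Returning the original array on every failure path keeps the foreground turn no worse off than it would be without the filter.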
Trace events
Filter runs emit tracer events (kind memory) you can surface in UIs or export via OTelTracer. An optional onComplete callback receives a MemoryFilterSummary.
Layering diagram
flowchart TB
CS[ConversationStore] --> OB[ObservationBuffer]
OS[ObservationStore] --> OB
WM[WorkingMemoryStore] --> WMR[WorkingMemoryRunner]
OB --> prefix[prefixMessages]
WMR --> prefix
prefix --> MRF[MemoryRelevanceFilter]
MRF --> run[runAgent foreground]
CS --> run
Honcho interaction
Do not enable both HonchoMemory and observationalMemory — Honcho already compacts session context and the SDK logs a warning when both are configured.