Open protocol · Apache‑2.0 · Multilingual
Deterministic memory
for agents.
Permanent, auditable state across sessions. +100pp cross‑session recall, engine‑agnostic identity, 100% drift self‑detection.
Measured on real tasks · not LoCoMo trivia
Numbers from real downstream tasks.
Most agent-memory products score themselves on conversational recall benchmarks (LoCoMo) — which empirically don't translate to faster coding completion. Every number below comes from a real downstream task, reproducible from the public repo with your own API keys.
The moat
Deterministic · Auditable · Engine‑agnostic · Time‑travel queryable
Append-only causal-DAG with view-time reconciliation. No LLM-extraction in the hot path — every decision is traceable, replayable, and survives audit. Mem0 / Letta / Zep all rely on probabilistic LLM extraction; we don't.
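A minimal sketch of what view-time reconciliation over an append-only log can look like. The event shape, `appendEvent`, and `viewAt` are hypothetical names chosen for illustration, and a linear log stands in for the full causal DAG. The point is only that state is never rewritten: it is folded from the log at read time, so the same log always yields the same view, and replaying a prefix gives a time-travel query.

```typescript
// Hypothetical event shape: not the GemClaw schema, just enough to show the idea.
interface StateEvent {
  seq: number;          // append order, standing in for a time-sortable id
  key: string;          // which piece of state the event touches
  value: string | null; // null marks a retraction
}

// The log is only ever appended to; nothing is updated in place.
const stateLog: StateEvent[] = [];

function appendEvent(key: string, value: string | null): StateEvent {
  const ev: StateEvent = { seq: stateLog.length + 1, key, value };
  stateLog.push(ev);
  return ev;
}

// View-time reconciliation: fold the log into a view when someone reads it.
// Passing `asOf` replays only a prefix of the log, which is a time-travel query.
function viewAt(asOf: number = Infinity): Map<string, string> {
  const view = new Map<string, string>();
  for (const ev of stateLog) {
    if (ev.seq > asOf) break; // the log is ordered by seq, so we can stop here
    if (ev.value === null) view.delete(ev.key);
    else view.set(ev.key, ev.value);
  }
  return view;
}

appendEvent("auth.strategy", "session cookies");
appendEvent("db.engine", "sqlite");
appendEvent("auth.strategy", "jwt with refresh tokens");

const currentView = viewAt();  // latest state
const earlierView = viewAt(2); // state as it stood after the second event
```

No model call sits between the log and the view, which is what makes the read path deterministic in this sketch.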
Every number comes from a reproducible benchmark in our public repo. N=1 at this scale — results will vary by model, prompt, and workload. Click any tile for the raw data, or read how we run our benchmarks →
The problem
Agents hit the
same wall.
Every agent tool — Claude Code, Cursor, Aider, LangChain — fails in the same way after long sessions. The ceiling isn't the model. It's how state is stored.
Context fills.
Even 1M/2M tokens fill. Attention dilutes. Lost-in-the-middle is measured, not theoretical — and every tool call pays the per-request premium again.
turn 1 ~12k tokens
turn 8 ~94k tokens
turn 20 compaction triggers
...specifics lost
Compaction is lossy.
Summarizers flatten transcripts. Pinned state, in-flight plans, specific decisions — all collapse into a paragraph. Two sessions later the agent forgets the detail that mattered.
[compact] 94k → 6k
"...worked on auth flow, fixed a bug in token refresh..."
# real details: gone
Memory is fragmented.
Every tool rolls its own. CLAUDE.md, .cursorrules, moltbook, LangChain memory. None share across agents. None survive a provider switch. The same rules get copy-pasted five times.
~/proj/
├── CLAUDE.md
├── .cursorrules
├── .windsurf/
├── JULES.md
└── .github/copilot-*
A substrate, not another feature.
The agent is a thin process; the context window is its working set. Identity, commitments, and self-knowledge live in a typed, verifiable substrate on disk — outside the inference budget. The model becomes interchangeable. The mind doesn't reset.
Architecture
Five layers,
one substrate.
Each layer is independently useful. Combined, they turn your context window into a working set with the full history one getAction() away.
ActionRecord
Every edit, read, search, or decision becomes a typed, verifiable event. UUIDv7 ids, structured verification slots, append-only.
type: 'edit'
verification: {
  ast: { ok: true }
}
DecisionGraph
parent_id edges link every record to its cause. Time-travel queries, causality chains, who-changed-what-and-why — all queryable.
traceCausalChain(id) → edit → verify → read → injector
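To make the first two layers concrete, here is a minimal sketch of typed records linked by `parent_id` and a causal-chain walk. Only the names `ActionRecord`, `parent_id`, `verification`, and `traceCausalChain` come from the text above. The field layout, the `appendRecord` helper, and the short ids (used in place of UUIDv7 values for readability) are assumptions, not the actual schema.

```typescript
// Hypothetical shapes for illustration; the real ActionRecord schema may differ.
type ActionType = "edit" | "read" | "search" | "decision";

interface ActionRecord {
  id: string;               // a UUIDv7 in the real system; short ids here
  parent_id: string | null; // the record that caused this one, null for a root
  type: ActionType;
  verification: { [slot: string]: { ok: boolean } };
}

// Append-only store: records are added, never modified or removed.
const records = new Map<string, ActionRecord>();

function appendRecord(rec: ActionRecord): void {
  if (records.has(rec.id)) throw new Error(`duplicate id: ${rec.id}`);
  if (rec.parent_id !== null && !records.has(rec.parent_id)) {
    throw new Error(`unknown parent: ${rec.parent_id}`);
  }
  records.set(rec.id, Object.freeze(rec));
}

// Walk parent_id edges from a record back to its root cause.
function traceCausalChain(id: string): ActionRecord[] {
  const chain: ActionRecord[] = [];
  let cursor: string | null = id;
  while (cursor !== null) {
    const rec: ActionRecord | undefined = records.get(cursor);
    if (rec === undefined) throw new Error(`unknown id: ${cursor}`);
    chain.push(rec);
    cursor = rec.parent_id;
  }
  return chain;
}

appendRecord({ id: "a1", parent_id: null, type: "read", verification: {} });
appendRecord({ id: "a2", parent_id: "a1", type: "decision", verification: {} });
appendRecord({ id: "a3", parent_id: "a2", type: "edit", verification: { ast: { ok: true } } });

// Effect first, root cause last.
const chainTypes = traceCausalChain("a3").map((r) => r.type);
```

Because a record can only point at a parent that already exists, the structure cannot contain a cycle, so the walk always terminates.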
Counterfactual Replay
Every intent records the alternatives the agent considered. Replay any path. Diff cost + verification deltas. Find out what the cheaper option would have done — without re-running the whole task.
gemclaw replay --intent X --use 0 --materialize
→ cheaper · no regressions
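One way to picture the cost and verification diff, assuming each alternative was stored with its own cost and verification results when the intent was logged. `IntentRecord`, `Alternative`, `ReplayDiff`, and `diffAlternative` are hypothetical names; the sketch only compares recorded data and does not model what `--materialize` does.

```typescript
// Hypothetical shapes; field names are assumptions, not the GemClaw schema.
interface Alternative {
  label: string;
  costCents: number;      // recorded or estimated cost of taking this path
  checksPassed: string[]; // verification slots that passed on this path
}

interface IntentRecord {
  id: string;
  chosen: number;         // index of the path the agent actually took
  alternatives: Alternative[];
}

interface ReplayDiff {
  costDelta: number;      // negative means the alternative was cheaper
  regressions: string[];  // checks the chosen path passed that the alternative did not
}

function diffAlternative(intent: IntentRecord, use: number): ReplayDiff {
  const chosen = intent.alternatives[intent.chosen];
  const other = intent.alternatives[use];
  if (chosen === undefined || other === undefined) {
    throw new Error("alternative index out of range");
  }
  const passed = new Set(other.checksPassed);
  return {
    costDelta: other.costCents - chosen.costCents,
    regressions: chosen.checksPassed.filter((c) => !passed.has(c)),
  };
}

const intent: IntentRecord = {
  id: "X",
  chosen: 1,
  alternatives: [
    { label: "patch the failing function", costCents: 12, checksPassed: ["ast", "tests"] },
    { label: "rewrite the module", costCents: 30, checksPassed: ["ast", "tests"] },
  ],
};

// Alternative 0 against the chosen path: costDelta is -18, regressions is empty.
const replayDiff = diffAlternative(intent, 0);
```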
KnowledgeAtlas
Portable NDJSON export of your accumulated substrate. Ship it between machines, teams, or models. Topologically-ordered so imports don't break.
gemclaw atlas export
→ atlas.ndjson
version 0.1 · 620 rows
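A sketch of why topological ordering keeps imports from breaking, assuming each exported row carries a single `parent_id`. `AtlasRow`, `exportAtlas`, and `importAtlas` are hypothetical names, and the real atlas format may hold more fields. The exporter writes every parent before its children, so the importer can validate each line against what it has already seen in one pass.

```typescript
// Hypothetical row shape for illustration.
interface AtlasRow {
  id: string;
  parent_id: string | null;
  type: string;
}

// Emit rows as NDJSON with every parent ahead of its children.
function exportAtlas(rows: AtlasRow[]): string {
  const ids = new Set(rows.map((r) => r.id));
  const children = new Map<string, AtlasRow[]>();
  const roots: AtlasRow[] = [];
  for (const row of rows) {
    if (row.parent_id === null) {
      roots.push(row);
    } else if (!ids.has(row.parent_id)) {
      throw new Error(`row ${row.id} points at missing parent ${row.parent_id}`);
    } else {
      const siblings = children.get(row.parent_id) ?? [];
      siblings.push(row);
      children.set(row.parent_id, siblings);
    }
  }
  // Breadth-first from the roots: a row is queued only after its parent is emitted.
  const ordered: AtlasRow[] = [];
  const queue: AtlasRow[] = [...roots];
  while (queue.length > 0) {
    const row = queue.shift() as AtlasRow;
    ordered.push(row);
    queue.push(...(children.get(row.id) ?? []));
  }
  if (ordered.length !== rows.length) throw new Error("cycle detected in parent_id edges");
  return ordered.map((r) => JSON.stringify(r)).join("\n") + "\n";
}

// Single-pass import: fails fast if a row arrives before its parent.
function importAtlas(ndjson: string): AtlasRow[] {
  const seen = new Set<string>();
  const out: AtlasRow[] = [];
  for (const line of ndjson.split("\n")) {
    if (line.trim() === "") continue;
    const row = JSON.parse(line) as AtlasRow;
    if (row.parent_id !== null && !seen.has(row.parent_id)) {
      throw new Error(`row ${row.id} arrived before its parent`);
    }
    seen.add(row.id);
    out.push(row);
  }
  return out;
}

// Rows deliberately supplied out of order.
const atlasText = exportAtlas([
  { id: "c", parent_id: "b", type: "edit" },
  { id: "a", parent_id: null, type: "read" },
  { id: "b", parent_id: "a", type: "decision" },
]);
const importedIds = importAtlas(atlasText).map((r) => r.id);
```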
Virtual Runtime
Compaction becomes a page-swap, not a lossy summary. Pinned pages survive; evicted pages remain fetchable via fetch_action(id).
onBeforeCompact()
→ kept 3, dropped 21
type: 'compact' logged
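The page-swap idea can be sketched as follows. `fetch_action` is named in the text above, but its body here is an assumption, as are `ContextPage`, `pageStore`, `compactWindow`, and the eviction policy (keep pinned pages, then the newest unpinned pages that fit the budget). An in-memory map stands in for the on-disk substrate.

```typescript
// Hypothetical page shape for illustration.
interface ContextPage {
  id: string;
  pinned: boolean;
  tokens: number;
  content: string;
}

// Durable store standing in for the substrate: every page ever seen stays here.
const pageStore = new Map<string, ContextPage>();

// Evict from the window only; nothing is deleted from the store.
function compactWindow(
  windowPages: ContextPage[],
  budget: number
): { kept: ContextPage[]; dropped: string[] } {
  for (const p of windowPages) pageStore.set(p.id, p);

  const keepIds = new Set<string>();
  let used = 0;
  // Pinned pages always survive, even if they alone exceed the budget.
  for (const p of windowPages) {
    if (p.pinned) {
      keepIds.add(p.id);
      used += p.tokens;
    }
  }
  // Walk newest to oldest so recent pages win the remaining budget.
  for (const p of [...windowPages].reverse()) {
    if (p.pinned) continue;
    if (used + p.tokens <= budget) {
      keepIds.add(p.id);
      used += p.tokens;
    }
  }
  return {
    kept: windowPages.filter((p) => keepIds.has(p.id)),
    dropped: windowPages.filter((p) => !keepIds.has(p.id)).map((p) => p.id),
  };
}

// Evicted pages remain fetchable by id.
function fetch_action(id: string): ContextPage | undefined {
  return pageStore.get(id);
}

const compacted = compactWindow(
  [
    { id: "p1", pinned: true, tokens: 100, content: "plan: migrate auth to refresh tokens" },
    { id: "p2", pinned: false, tokens: 500, content: "full diff of token_refresh.ts" },
    { id: "p3", pinned: false, tokens: 300, content: "test run output" },
    { id: "p4", pinned: false, tokens: 200, content: "latest user message" },
  ],
  650
);
// kept: p1, p3, p4; dropped: p2, which fetch_action("p2") still returns in full.
```

Unlike a summarizer, this step loses no content: the dropped page leaves the working set but its exact text stays addressable.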
Compared
Different layer.
Different problem solved.
Mem0, Letta, Zep, and Cognee solve memory retrieval. We solve identity continuity, drift self-detection, and engine-agnostic commitment. Different categories — accurate side-by-side below. Submit corrections via GitHub issue; we update within 48 hours.
| | Mem0 | Letta | Zep | Cognee | GemClaw |
|---|---|---|---|---|---|
| Memory primitive | vector similarity | context paging | bi-temporal graph | ontological KG | append-only event log |
| Engine-agnostic | yes (API) | partial | yes | yes | verified · 2 engines |
| Self-detects drift | — | — | — | — | 100% TP / 0% FP |
| Anti-sycophancy floor | — | — | — | — | +90pp |
| Infra requirement | API | API | Neo4j / FalkorDB | pipeline | single SQLite file |
| Open protocol | proprietary | proprietary | proprietary | proprietary | GMP v0.2 · 24 tests |
Broader ecosystem we respect but differentiate from: OMEGA (local-first, 95.4% LongMemEval claim), Supermemory (#1 MemoryBench self-claim), Hindsight (Vectorize, 91.4% LongMemEval), Mastra OM, MemMachine, Memobase. Architecture details verified from each vendor's own docs as of May 2026. How we run our own benchmarks →
Backends
Any model, any provider.
Your key, your choice.
The substrate lives on your machine. The model runs wherever you want it to. Bring your own key for Anthropic, Qwen3-Max, DeepSeek, MiniMax — or point at a local OpenAI-compatible server for zero per-request cost.
Cloud models · BYOK
Anthropic · Alibaba · DeepInfra · OpenRouter
Plug in any API key. The proxy translates between Anthropic message format and whichever backend you point it at. Fallback chain handles rate limits and outages.
# Point Claude Code at the proxy
ANTHROPIC_BASE_URL=http://localhost:8000 \
  claude
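A fallback chain of the kind described above can be sketched like this. `Backend` and `completeWithFallback` are hypothetical names, and the proxy's real retry policy is not documented here. A real backend call would be asynchronous; the sketch stays synchronous to keep the control flow visible.

```typescript
// Hypothetical backend shape: `complete` throws on a rate limit or outage.
interface Backend {
  label: string;
  complete: (prompt: string) => string;
}

interface FallbackResult {
  servedBy: string;   // label of the backend that answered
  text: string;
  failures: string[]; // one entry per backend that failed before it
}

// Try each backend in order and return the first successful answer.
function completeWithFallback(chain: Backend[], prompt: string): FallbackResult {
  const failures: string[] = [];
  for (const backend of chain) {
    try {
      const text = backend.complete(prompt);
      return { servedBy: backend.label, text, failures };
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${backend.label}: ${reason}`);
    }
  }
  throw new Error(`all backends failed: ${failures.join("; ")}`);
}

const cloudBackend: Backend = {
  label: "cloud",
  complete: () => {
    throw new Error("429 rate limited");
  },
};
const localBackend: Backend = {
  label: "local",
  complete: (prompt) => `echo: ${prompt}`,
};

// The cloud backend fails, so the local one serves the request.
const served = completeWithFallback([cloudBackend, localBackend], "hello");
```

Recording the failures alongside the answer keeps the switch auditable: the caller can see which backend served the request and why the earlier ones were skipped.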
Local hardware
Zero per-request cost · your data never leaves
Run Qwen, Llama, Gemma, or any open-weight model on your own machine. Same substrate, same hooks, same protocol. The agent never knows the difference.
# Point the proxy at your local server
# in ~/.gemclaw/config.json:
{
  "providers": {
    "local": {
      "enabled": true,
      "host": "127.0.0.1",
      "port": 1234
    }
  }
}
Get started
Three paths.
Same substrate.
Public access opens after the YC S26 review window. Until then, request early access for one of the three paths below.
The commands below are the install paths once the repo is public.
The primary path. One command, hooks load on next launch.
Pricing
Open core.
Cloud when you're ready.
The substrate is yours, free, forever. The hosted layer is opening with a small early-access cohort while we tune defaults from real usage.
Independent developers
Open Source
Free, forever
Apache-2.0
- Append-only causal-DAG substrate
- Engine-agnostic — swap models, keep your mind
- Full causality + audit trail
- Self-hosted, your data never leaves
Fast-moving startups
Cloud
Early access
Per-org pricing with substrate event caps — not per-agent, not per-seat
- Hosted substrate, zero ops
- Cross-machine sync
- Team-shared mind state
- Dashboard + observability
Mission-critical teams
Enterprise
Founding partner
Custom — work directly with us; reduced pricing in exchange for early feedback
- BYOC deployment (your AWS/GCP)
- SOC2 Type II + GDPR posture (roadmap — see /teams)
- Role-based access + audit log
- Dedicated support channel
The Cloud cohort opens in waves. The form takes a minute; its four fields help us prioritize who to onboard first. Pricing will be finalized after the first 20 production deployments. We commit to no surprise per-agent or per-seat charges, ever.