GEMCLAW

Open protocol · Apache‑2.0 · Multilingual

Deterministic memory
for agents.

Permanent, auditable state across sessions. +100pp cross‑session recall, engine‑agnostic identity, 100% drift self‑detection.

For teams

Measured on real tasks · not LoCoMo trivia

Numbers from real downstream tasks.

Most agent-memory products score themselves on conversational-recall benchmarks such as LoCoMo, whose scores empirically don't translate into faster completion of coding tasks. Every number below comes from a real downstream task, reproducible from the public repo with your own API keys.

The moat

Deterministic · Auditable · Engine‑agnostic · Time‑travel queryable

Append-only causal-DAG with view-time reconciliation. No LLM-extraction in the hot path — every decision is traceable, replayable, and survives audit. Mem0 / Letta / Zep all rely on probabilistic LLM extraction; we don't.
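
To make "append-only with view-time reconciliation" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not GemClaw's actual implementation: the `Event` shape, the `EventLog` class, and the last-writer-wins rule inside `view_at` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Event:
    """One immutable log entry (hypothetical shape)."""
    seq: int                # position in the log, assigned on append
    parent: Optional[int]   # seq of the event that caused this one, if any
    key: str
    value: Any


class EventLog:
    """Append-only: events are added, never edited or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, key: str, value: Any, parent: Optional[int] = None) -> Event:
        if parent is not None and not 0 <= parent < len(self._events):
            raise ValueError("parent must reference an existing event")
        event = Event(seq=len(self._events), parent=parent, key=key, value=value)
        self._events.append(event)
        return event

    def view_at(self, upto: Optional[int] = None) -> Dict[str, Any]:
        """Reconcile state at read time by folding events up to `upto`.

        Last writer wins per key (an assumed rule). No model call is
        involved, so the same log always yields the same view.
        """
        end = len(self._events) if upto is None else upto + 1
        state: Dict[str, Any] = {}
        for event in self._events[:end]:
            state[event.key] = event.value
        return state


log = EventLog()
first = log.append("auth.token_ttl", 3600)
log.append("auth.token_ttl", 900, parent=first.seq)
log.append("db.pool_size", 10)

current_view = log.view_at()            # state as of the latest event
original_view = log.view_at(first.seq)  # time-travel back to the first event
```

Because the view is a pure function of the log, replaying the same events always rebuilds the same state, which is the property an audit relies on.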

Every number comes from a reproducible benchmark in our public repo. Sample size is N=1 at this scale, so results will vary by model, prompt, and workload. Click any tile for the raw data, or read how we run our benchmarks →

The problem

Agents hit the
same wall.

Every agent tool — Claude Code, Cursor, Aider, LangChain — fails in the same way after long sessions. The ceiling isn't the model. It's how state is stored.

Context fills.

Even 1M/2M tokens fill. Attention dilutes. Lost-in-the-middle is measured, not theoretical — and every tool call pays the per-request premium again.

turn 1   ~12k tokens
turn 8   ~94k tokens
turn 20  compaction triggers
          ...specifics lost

Compaction is lossy.

Summarizers flatten transcripts. Pinned state, in-flight plans, specific decisions — all collapse into a paragraph. Two sessions later the agent forgets the detail that mattered.

[compact] 94k → 6k
  "...worked on auth flow,
   fixed a bug in token
   refresh..."
  # real details: gone

Memory is fragmented.

Every tool rolls its own. CLAUDE.md, .cursorrules, moltbook, LangChain memory. None share across agents. None survive a provider switch. The same rules get copy-pasted five times.

~/proj/
├── CLAUDE.md
├── .cursorrules
├── .windsurf/
├── JULES.md
└── .github/copilot-*

A substrate, not another feature.

The agent is a thin process; the context window is its working set. Identity, commitments, and self-knowledge live in a typed, verifiable substrate on disk — outside the inference budget. The model becomes interchangeable. The mind doesn't reset.

Architecture

Five layers,
one substrate.

Each layer is independently useful. Combined, they turn your context window into a working set with the full history one getAction() away.

01

ActionRecord

Every edit, read, search, or decision becomes a typed, verifiable event. UUIDv7 ids, structured verification slots, append-only.

type: 'edit'
verification: {
  ast: { ok: true }
}
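
As a rough illustration of what a typed, append-only record with a UUIDv7 id could look like, here is a Python sketch. The field names beyond `type`, `verification`, and `parent_id` are assumptions, and the `uuid7` helper is hand-rolled for the example because the standard library only gained one in recent Python versions.

```python
import os
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


def uuid7() -> uuid.UUID:
    """Build a UUIDv7: 48-bit Unix-millisecond timestamp, version, variant, random bits."""
    ts_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF           # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (ts_ms & ((1 << 48) - 1)) << 80   # timestamp in the top 48 bits
    value |= 0x7 << 76                        # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                       # RFC variant
    value |= rand_b
    return uuid.UUID(int=value)


@dataclass(frozen=True)
class ActionRecord:
    """Hypothetical record shape; frozen so a stored record cannot be reassigned."""
    type: str                                  # e.g. 'edit', 'read', 'search'
    verification: Mapping[str, Any]            # structured verification slots
    parent_id: Optional[uuid.UUID] = None      # the record that caused this one
    id: uuid.UUID = field(default_factory=uuid7)


record = ActionRecord(type="edit", verification={"ast": {"ok": True}})
```

Since the timestamp occupies the most significant bits, ids created in different milliseconds sort in creation order, which suits an append-only log.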

02

DecisionGraph

parent_id edges link every record to its cause. Time-travel queries, causality chains, who-changed-what-and-why — all queryable.

traceCausalChain(id)
→ edit → verify
  → read → injector
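
A causal-chain query over `parent_id` edges can be sketched in a few lines of Python. This is an assumption-laden illustration: `trace_causal_chain` is a snake_case stand-in for the call named above, and the in-memory `records` dictionary and its sample rows are invented for the example.

```python
from typing import Dict, List, Optional, TypedDict


class Record(TypedDict):
    id: str
    type: str
    parent_id: Optional[str]


def trace_causal_chain(records: Dict[str, Record], record_id: str) -> List[str]:
    """Walk parent_id edges from a record back to its root cause."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = record_id
    while current is not None:
        if current in seen:
            # A well-formed DAG never revisits a node; fail loudly if one does.
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        record = records[current]
        chain.append(record["type"])
        current = record["parent_id"]
    return chain


# Illustrative data only: injector caused read, read caused edit, edit caused verify.
records: Dict[str, Record] = {
    "r1": {"id": "r1", "type": "injector", "parent_id": None},
    "r2": {"id": "r2", "type": "read", "parent_id": "r1"},
    "r3": {"id": "r3", "type": "edit", "parent_id": "r2"},
    "r4": {"id": "r4", "type": "verify", "parent_id": "r3"},
}

chain = trace_causal_chain(records, "r4")
```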

03

Counterfactual Replay

Every intent records the alternatives the agent considered. Replay any path. Diff cost + verification deltas. Find out what the cheaper option would have done — without re-running the whole task.

gemclaw replay --intent X \
  --use 0 --materialize
→ cheaper · no regressions
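
The "diff cost + verification deltas" step might look roughly like the Python sketch below. Everything here is hypothetical: the `Intent` and `Alternative` shapes, the token-based cost, and the sample numbers are made up for illustration, and the function only compares stored outcomes rather than re-executing anything.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alternative:
    label: str
    cost_tokens: int
    checks: Dict[str, bool]   # verification name -> passed


@dataclass(frozen=True)
class Intent:
    chosen: int                      # index of the path the agent actually took
    alternatives: List[Alternative]  # every option it considered


def diff_against_chosen(intent: Intent, use: int) -> Dict[str, object]:
    """Compare alternative `use` with the path that was actually taken."""
    taken = intent.alternatives[intent.chosen]
    other = intent.alternatives[use]
    # A regression is a check the taken path passed that the alternative does not.
    regressions = sorted(
        name for name, passed in taken.checks.items()
        if passed and not other.checks.get(name, False)
    )
    return {
        "cost_delta": other.cost_tokens - taken.cost_tokens,
        "regressions": regressions,
    }


# Made-up example values, not measured results.
intent = Intent(
    chosen=1,
    alternatives=[
        Alternative("patch in place", 1800, {"ast": True, "tests": True}),
        Alternative("rewrite module", 5200, {"ast": True, "tests": True}),
    ],
)

delta = diff_against_chosen(intent, use=0)
```

A negative `cost_delta` with an empty `regressions` list corresponds to the "cheaper, no regressions" outcome in the snippet above.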

04

KnowledgeAtlas

Portable NDJSON export of your accumulated substrate. Ship it between machines, teams, or models. Topologically-ordered so imports don't break.

gemclaw atlas export
→ atlas.ndjson
  version 0.1 · 620 rows
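
Why topological order matters for import: if every parent row appears before its children, an importer can resolve `parent_id` references in a single pass. The Python sketch below shows one way to produce such an export; the row shape and the `export_ndjson` name are assumptions, not the actual atlas format.

```python
import json
from graphlib import TopologicalSorter
from typing import Dict, List, Optional


def export_ndjson(records: List[Dict[str, Optional[str]]]) -> str:
    """Serialize records as NDJSON with parents always before children."""
    by_id = {r["id"]: r for r in records}
    sorter: TopologicalSorter = TopologicalSorter()
    for r in records:
        if r["parent_id"] is None:
            sorter.add(r["id"])
        else:
            # Declares parent_id as a predecessor of this record.
            sorter.add(r["id"], r["parent_id"])
    ordered = sorter.static_order()  # predecessors are yielded first
    return "\n".join(json.dumps(by_id[i], sort_keys=True) for i in ordered)


# Illustrative rows, deliberately out of order.
rows = [
    {"id": "c", "parent_id": "b", "type": "verify"},
    {"id": "a", "parent_id": None, "type": "read"},
    {"id": "b", "parent_id": "a", "type": "edit"},
]

ndjson = export_ndjson(rows)
```

`graphlib` ships with Python 3.9 and later, so the sketch needs no third-party packages.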

05

Virtual Runtime

Compaction becomes a page-swap, not a lossy summary. Pinned pages survive; evicted pages remain fetchable via fetch_action(id).

onBeforeCompact()
→ kept 3, dropped 21
  type: 'compact' logged
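
The page-swap idea can be sketched as a working set backed by a durable store: compaction changes what is resident, not what exists. This Python example is a simplified assumption of that behavior; `WorkingSet` is hypothetical, and `on_before_compact` and `fetch_action` are stand-ins for the hooks named above.

```python
from typing import Dict, List, Set


class WorkingSet:
    """Resident pages live in the context window; every page stays in the store."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}   # durable store, never pruned here
        self._resident: List[str] = []     # page ids currently in context
        self._pinned: Set[str] = set()

    def add(self, page_id: str, content: str, pinned: bool = False) -> None:
        self._store[page_id] = content
        self._resident.append(page_id)
        if pinned:
            self._pinned.add(page_id)

    def on_before_compact(self) -> Dict[str, int]:
        """Evict unpinned pages from the working set; nothing is deleted."""
        kept = [p for p in self._resident if p in self._pinned]
        dropped = len(self._resident) - len(kept)
        self._resident = kept
        return {"kept": len(kept), "dropped": dropped}

    def fetch_action(self, page_id: str) -> str:
        """Evicted pages remain fetchable from the durable store."""
        return self._store[page_id]

    def resident(self) -> List[str]:
        return list(self._resident)


ws = WorkingSet()
ws.add("plan", "migrate auth to refresh tokens", pinned=True)
ws.add("grep-1", "matches in src/auth.ts")
ws.add("grep-2", "matches in src/session.ts")

stats = ws.on_before_compact()
recovered = ws.fetch_action("grep-1")
```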

Compared

Different layer.
Different problem solved.

Mem0, Letta, Zep, and Cognee solve memory retrieval. We solve identity continuity, drift self-detection, and engine-agnostic commitment. These are different categories, compared side by side below. Submit corrections via GitHub issue; we update within 48 hours.

                      | Mem0              | Letta          | Zep               | Cognee         | GemClaw (this is us)
Memory primitive      | vector similarity | context paging | bi-temporal graph | ontological KG | append-only event log
Engine-agnostic       | yes (API)         | partial        | yes               | yes            | verified · 2 engines
Self-detects drift    |                   |                |                   |                | 100% TP / 0% FP
Anti-sycophancy floor |                   |                |                   |                | +90pp
Infra requirement     | API               | API            | Neo4j / FalkorDB  | pipeline       | single SQLite file
Open protocol         | proprietary       | proprietary    | proprietary       | proprietary    | GMP v0.2 · 24 tests

Broader ecosystem we respect but differentiate from: OMEGA (local-first, 95.4% LongMemEval claim), Supermemory (#1 MemoryBench self-claim), Hindsight (Vectorize, 91.4% LongMemEval), Mastra OM, MemMachine, Memobase. Architecture details verified from each vendor's own docs as of May 2026. How we run our own benchmarks →

Backends

Any model, any provider.
Your key, your choice.

The substrate lives on your machine. The model runs wherever you want it to. Bring your own key for Anthropic, Qwen3-Max, DeepSeek, MiniMax — or point at a local OpenAI-compatible server for zero per-request cost.

Cloud models · BYOK

Anthropic · Alibaba · DeepInfra · OpenRouter

Plug in any API key. The proxy translates between Anthropic message format and whichever backend you point it at. Fallback chain handles rate limits and outages.

  • Anthropic Claude (BYOK or Max subscription)
  • Alibaba Qwen3-Max via Coding Plan
  • DeepInfra DeepSeek V3.2
  • OpenRouter MiniMax M2.5 (free tier)
  • Anything OpenAI-compatible

# Point Claude Code at the proxy
ANTHROPIC_BASE_URL=http://localhost:8000 \
  claude
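
A fallback chain of the kind described above can be sketched as "try each backend in order, move on when one fails". The Python below is a minimal illustration, not the proxy's code: `ProviderError`, `complete_with_fallback`, and the two toy backends are hypothetical names, and real backends would make network calls.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Raised by a backend on a rate limit or outage (hypothetical)."""


# A backend is a display name plus a function from prompt to reply.
Backend = Tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, chain: Sequence[Backend]) -> Tuple[str, str]:
    """Return (provider_name, reply) from the first backend that succeeds."""
    errors: List[str] = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why, then try the next one
    raise ProviderError("all providers failed: " + "; ".join(errors))


def rate_limited(prompt: str) -> str:
    """Toy backend that always simulates a rate limit."""
    raise ProviderError("429 rate limited")


def local_echo(prompt: str) -> str:
    """Toy backend standing in for a local server."""
    return f"echo: {prompt}"


provider, reply = complete_with_fallback(
    "hello", [("cloud", rate_limited), ("local", local_echo)]
)
```

Only `ProviderError` triggers a fallback in this sketch, so programming errors in a backend still surface instead of being silently skipped.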

Local hardware

Zero per-request cost · your data never leaves

Run Qwen, Llama, Gemma, or any open-weight model on your own machine. Same substrate, same hooks, same protocol. The agent never knows the difference.

  • LM Studio
  • Ollama
  • llama.cpp server
  • vLLM
  • Any OpenAI-compatible endpoint

# Point the proxy at your local server
# in ~/.gemclaw/config.json:
{
  "providers": {
    "local": { "enabled": true, "host": "127.0.0.1", "port": 1234 }
  }
}
Also works with Claude Code · Cursor · Aider · LangChain · LlamaIndex · Python · any MCP / GMP client →

Get started

Three paths.
Same substrate.

Public access opens after the YC S26 review window. Until then, request early access for one of the three paths below.

The commands below are the install paths once the repo is public.

The primary path. One command, hooks load on next launch.

Terminal · Claude Code plugin
Request early access → · Apache-2.0 · Node 20+, Python 3.9+

Pricing

Open core.
Cloud when you're ready.

The substrate is yours, free, forever. The hosted layer is opening with a small early-access cohort while we tune defaults from real usage.

Independent developers

Open Source

Free, forever

Apache-2.0

  • Append-only causal-DAG substrate
  • Engine-agnostic — swap models, keep your mind
  • Full causality + audit trail
  • Self-hosted, your data never leaves

Notify me on launch →

Most popular

Fast-moving startups

Cloud

Early access

Per-org pricing with substrate event caps — not per-agent, not per-seat

  • Hosted substrate, zero ops
  • Cross-machine sync
  • Team-shared mind state
  • Dashboard + observability

Mission-critical teams

Enterprise

Founding partner

Custom — work directly with us; reduced pricing in exchange for early feedback

  • BYOC deployment (your AWS/GCP)
  • SOC 2 Type II + GDPR posture (roadmap; see /teams)
  • Role-based access + audit log
  • Dedicated support channel

Cloud cohort opens in waves. Form takes a minute — the four fields help us prioritize who to onboard first. Pricing finalized after the first 20 production deployments. We commit to no surprise per-agent or per-seat charges, ever.