GEMCLAW

Open protocol · Apache‑2.0 · Multilingual

Deterministic memory
for agents.

Permanent, auditable state across sessions. +100pp cross‑session recall, engine‑agnostic identity, 100% drift self‑detection.


Measured on real tasks · not LoCoMo trivia

Numbers from real downstream tasks.

Most agent-memory products score themselves on conversational-recall benchmarks such as LoCoMo, which empirically don't translate into faster completion of coding tasks. Every number below comes from a real downstream task and is reproducible from the public repo with your own API keys.

+100pp · cross-session recall
Vanilla agent: 0/5; it admits it doesn't know. GemClaw: 5/5 exact UUID matches on prior-session intents. Real Sonnet 4.5 calls.

+90pp · anti-sycophancy
Naive LLM: holds the line on contradictory inputs 0% of the time. GemClaw with the disagreement rule: 90%. 10-prompt audit, 3 arms.

97% · tool-output compression
7-day production corpus: 12 MB of raw tool output collapsed to 367 KB stored, a measured cost saving on every long-running session that the user never sees.

100% · drift self-detection
20-prompt audit: 100% true-positive rate at a 0% false-positive rate. The substrate catches the agent's own contradictions before the user has to.
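The compression figure follows directly from the two corpus sizes. A quick check, assuming decimal units (1 MB = 1,000 KB); binary units (1 MB = 1,024 KB) give 0.970, the same rounded result:

```latex
1 - \frac{367\ \text{KB}}{12\ \text{MB}}
  = 1 - \frac{367}{12\,000}
  \approx 1 - 0.0306
  \approx 0.969
  \approx 97\%
```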

The moat

Deterministic · Auditable · Engine‑agnostic · Time‑travel queryable

Append-only causal DAG with view-time reconciliation. There is no LLM extraction in the hot path, so every decision is traceable, replayable, and auditable. Mem0, Letta, and Zep all rely on probabilistic LLM extraction; we don't.
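To make "view-time reconciliation" concrete, here is a minimal sketch, not GemClaw's implementation: events are only ever appended, and the current state (or the state as of any earlier point) is recomputed at read time by folding the log in order. The event shape, the `set`/`retract` kinds, and the `AppendOnlyLog`/`viewAt` names are hypothetical. It uses a single linear sequence number; the causal-DAG part (parent edges) is covered by the DecisionGraph layer.

```typescript
// Hypothetical event shape: each event sets or retracts one key.
type FactInput =
  | { kind: "set"; key: string; value: string }
  | { kind: "retract"; key: string };
type FactEvent = FactInput & { seq: number };

class AppendOnlyLog {
  private readonly events: FactEvent[] = [];

  // The only write path: append. There is no update or delete method.
  append(input: FactInput): FactEvent {
    const event: FactEvent = { ...input, seq: this.events.length };
    this.events.push(event);
    return event;
  }

  // View-time reconciliation: fold events with seq <= asOf, in order.
  // Later events win; a retract removes the key from the view.
  viewAt(asOf: number = Infinity): Map<string, string> {
    const view = new Map<string, string>();
    for (const e of this.events) {
      if (e.seq > asOf) break; // events are stored in seq order
      if (e.kind === "set") view.set(e.key, e.value);
      else view.delete(e.key);
    }
    return view;
  }
}

const log = new AppendOnlyLog();
log.append({ kind: "set", key: "db", value: "postgres" }); // seq 0
log.append({ kind: "set", key: "db", value: "sqlite" });   // seq 1
log.append({ kind: "retract", key: "db" });                // seq 2

console.log(log.viewAt(0).get("db")); // postgres
console.log(log.viewAt(1).get("db")); // sqlite
console.log(log.viewAt().has("db"));  // false
```

Because the view is a pure function of the log, the same log always yields the same state, and a time-travel query is just a fold that stops early.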

Every number comes from a reproducible benchmark in our public repo. These are N=1 results at this scale, so they will vary by model, prompt, and workload. Each tile links to its raw data, and we document how we run our benchmarks.

The problem

Agents hit the
same wall.

Every agent tool — Claude Code, Cursor, Aider, LangChain — fails in the same way after long sessions. The ceiling isn't the model. It's how state is stored.

Context fills.

Even 1M- and 2M-token context windows fill up. Attention dilutes. Lost-in-the-middle is measured, not theoretical, and every tool call pays the per-request premium again because the full context is re-sent.

turn 1   ~12k tokens
turn 8   ~94k tokens
turn 20  compaction triggers
          ...specifics lost
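Why the per-request premium compounds: each request to a stateless chat API carries the whole conversation, so the input tokens billed over a session are the sum of the context sizes at each turn, not the final size. A rough sketch, assuming context grows linearly between the two figures shown above (about 12k tokens at turn 1 and about 94k at turn 8); the linear growth model is an assumption, not a measurement.

```typescript
// Assumption: context grows linearly from ~12k tokens at turn 1 to ~94k at turn 8.
const startTokens = 12_000;
const perTurnGrowth = (94_000 - 12_000) / 7; // about 11,714 tokens added per turn

function contextAt(turn: number): number {
  return startTokens + perTurnGrowth * (turn - 1);
}

// Every request re-sends the full context,
// so billed input tokens are the sum of the context size at each turn.
function cumulativeInputTokens(turns: number): number {
  let total = 0;
  for (let t = 1; t <= turns; t++) total += contextAt(t);
  return total;
}

console.log(Math.round(contextAt(8)));             // 94000
console.log(Math.round(cumulativeInputTokens(8))); // 424000
```

Under this assumption, eight turns bill roughly 424k input tokens, about 4.5 times the 94k context in view at turn 8. Several tool calls per turn multiply that further, since each call is its own request.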

Compaction is lossy.

Summarizers flatten transcripts. Pinned state, in-flight plans, specific decisions — all collapse into a paragraph. Two sessions later the agent forgets the detail that mattered.

[compact] 94k → 6k
  "...worked on auth flow,
   fixed a bug in token
   refresh..."
  # real details: gone

Memory is fragmented.

Every tool rolls its own: CLAUDE.md, .cursorrules, moltbook, LangChain memory. None are shared across agents. None survive a provider switch. The same rules get copy-pasted five times.

~/proj/
├── CLAUDE.md
├── .cursorrules
├── .windsurf/
├── JULES.md
└── .github/copilot-*

A substrate, not another feature.

The agent is a thin process; the context window is its working set. Identity, commitments, and self-knowledge live in a typed, verifiable substrate on disk — outside the inference budget. The model becomes interchangeable. The mind doesn't reset.

Architecture

Five layers,
one substrate.

Each layer is independently useful. Combined, they turn your context window into a working set with the full history one getAction() away.

ActionRecord

Every edit, read, search, or decision becomes a typed, verifiable event. UUIDv7 ids, structured verification slots, append-only.

{
  type: 'edit',
  verification: {
    ast: { ok: true },
  },
}
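The card above shows only two fields. A fuller sketch of a typed, append-only record with a UUIDv7 id might look like the following; the exact field set, the action-type union, and the `appendRecord` helper are assumptions for illustration, not GemClaw's schema. The id follows the UUIDv7 layout from RFC 9562 (48-bit millisecond timestamp, then version and variant bits, then random bits), so ids sort by creation time at millisecond granularity.

```typescript
type ActionType = "edit" | "read" | "search" | "decision";

interface ActionRecord {
  readonly id: string;               // UUIDv7
  readonly type: ActionType;
  readonly parent_id: string | null; // causal parent, if any
  readonly verification: Readonly<Record<string, { ok: boolean }>>;
}

// UUIDv7 (RFC 9562): 48-bit Unix ms timestamp, version 7, variant 10, random rest.
function uuidv7(nowMs: number = Date.now()): string {
  const b = new Uint8Array(16);
  crypto.getRandomValues(b); // Web Crypto, global in Node 20+ and browsers
  let ts = nowMs;
  for (let i = 5; i >= 0; i--) {
    b[i] = ts % 256; // big-endian timestamp in bytes 0..5
    ts = Math.floor(ts / 256);
  }
  b[6] = (b[6] & 0x0f) | 0x70; // version nibble = 7
  b[8] = (b[8] & 0x3f) | 0x80; // variant bits = 10
  const hex = Array.from(b, (x) => x.toString(16).padStart(2, "0")).join("");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}

const log: ActionRecord[] = [];

// Append-only: each record is (shallowly) frozen on write and never mutated.
function appendRecord(rec: Omit<ActionRecord, "id">): ActionRecord {
  const full: ActionRecord = Object.freeze({ id: uuidv7(), ...rec });
  log.push(full);
  return full;
}

const edit = appendRecord({ type: "edit", parent_id: null, verification: { ast: { ok: true } } });
console.log(edit.id);
```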

DecisionGraph

parent_id edges link every record to its cause. Time-travel queries, causality chains, who-changed-what-and-why — all queryable.

traceCausalChain(id)
→ edit → verify
  → read → injector
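A causal chain like the one above can be recovered by walking `parent_id` edges from a record back to its root. A minimal sketch over an in-memory map, assuming each record has at most one parent (records with several parents would need a full graph traversal); this `traceCausalChain` is an illustrative stand-in with the same name, not GemClaw's implementation, and the record ids are made up.

```typescript
interface CausalRecord {
  id: string;
  type: string;
  parent_id: string | null;
}

// Walk parent_id edges from `id` back to the root cause.
// Returns records ordered effect-first, cause-last.
function traceCausalChain(id: string, byId: Map<string, CausalRecord>): CausalRecord[] {
  const chain: CausalRecord[] = [];
  const seen = new Set<string>();
  let cur = byId.get(id);
  while (cur !== undefined) {
    // Defensive: cannot happen if parents are always appended before children.
    if (seen.has(cur.id)) throw new Error(`cycle at ${cur.id}`);
    seen.add(cur.id);
    chain.push(cur);
    cur = cur.parent_id === null ? undefined : byId.get(cur.parent_id);
  }
  return chain;
}

const records: CausalRecord[] = [
  { id: "r1", type: "injector", parent_id: null },
  { id: "r2", type: "read", parent_id: "r1" },
  { id: "r3", type: "edit", parent_id: "r2" },
  { id: "r4", type: "verify", parent_id: "r3" },
];
const byId = new Map(records.map((r) => [r.id, r] as const));
console.log(traceCausalChain("r4", byId).map((r) => r.type).join(" ← "));
// verify ← edit ← read ← injector
```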

Counterfactual Replay

Every intent records the alternatives the agent considered. Replay any path and diff the cost and verification deltas to find out what the cheaper option would have done, without re-running the whole task.

gemclaw replay --intent X \
  --use 0 --materialize
→ cheaper · no regressions
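Conceptually, an intent stores the alternatives the agent weighed, and a replay compares the chosen path against one of them. A minimal sketch of just the comparison step, assuming each alternative already carries a recorded cost and a per-check verification outcome; the `Intent`/`Alternative` shapes and `diffAlternatives` are hypothetical and do not describe what `gemclaw replay` does internally. The costs below are toy values.

```typescript
// Hypothetical shapes: what an intent might record about the options it weighed.
interface Alternative {
  label: string;
  costUsd: number;                 // recorded cost of taking this path
  checks: Record<string, boolean>; // verification outcome per check
}

interface Intent {
  id: string;
  chosen: number; // index of the alternative the agent actually took
  alternatives: Alternative[];
}

interface ReplayDiff {
  costDelta: number;     // alternative minus chosen; negative means cheaper
  regressions: string[]; // checks that pass on the chosen path but not on the alternative
}

function diffAlternatives(intent: Intent, use: number): ReplayDiff {
  const base = intent.alternatives[intent.chosen];
  const alt = intent.alternatives[use];
  if (base === undefined || alt === undefined) throw new RangeError("no such alternative");
  // A check missing from the alternative counts as a failure.
  const regressions = Object.keys(base.checks).filter(
    (check) => base.checks[check] && !alt.checks[check],
  );
  return { costDelta: alt.costUsd - base.costUsd, regressions };
}

// Toy values for illustration only.
const intent: Intent = {
  id: "X",
  chosen: 1,
  alternatives: [
    { label: "patch in place", costUsd: 0.04, checks: { ast: true, tests: true } },
    { label: "rewrite module", costUsd: 0.11, checks: { ast: true, tests: true } },
  ],
};
const d = diffAlternatives(intent, 0);
const verdict = d.costDelta < 0 ? "cheaper" : "not cheaper";
console.log(d.regressions.length === 0 ? `${verdict} · no regressions` : `${verdict} · regressions: ${d.regressions.join(", ")}`);
// cheaper · no regressions
```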

KnowledgeAtlas

Portable NDJSON export of your accumulated substrate. Ship it between machines, teams, or models. Rows are topologically ordered, so imports don't break.

gemclaw atlas export
→ atlas.ndjson
  version 0.1 · 620 rows
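NDJSON is one JSON object per line. "Topologically ordered" means every row appears after the row it points to, so a streaming importer never meets a `parent_id` it hasn't seen yet. A minimal sketch of that ordering, using Kahn's algorithm specialized to rows with at most one parent; the row shape and the `topoOrder`/`toNdjson` names are illustrative, not the atlas v0.1 format.

```typescript
interface AtlasRow {
  id: string;
  parent_id: string | null;
  type: string;
}

// Kahn's algorithm with in-degree <= 1: a row becomes ready once its parent is emitted.
function topoOrder(rows: AtlasRow[]): AtlasRow[] {
  const ids = new Set(rows.map((r) => r.id));
  const children = new Map<string, AtlasRow[]>();
  const queue: AtlasRow[] = [];
  for (const r of rows) {
    if (r.parent_id === null || !ids.has(r.parent_id)) {
      queue.push(r); // roots, or rows whose parent lives outside this export
    } else {
      const siblings = children.get(r.parent_id) ?? [];
      siblings.push(r);
      children.set(r.parent_id, siblings);
    }
  }
  // Each row enters the queue only after its parent has been dequeued.
  for (let i = 0; i < queue.length; i++) {
    for (const child of children.get(queue[i].id) ?? []) queue.push(child);
  }
  if (queue.length !== rows.length) throw new Error("cycle detected; rows cannot be ordered");
  return queue;
}

function toNdjson(rows: AtlasRow[]): string {
  return topoOrder(rows).map((r) => JSON.stringify(r)).join("\n") + "\n";
}

const ndjson = toNdjson([
  { id: "c", parent_id: "b", type: "edit" },
  { id: "a", parent_id: null, type: "read" },
  { id: "b", parent_id: "a", type: "decision" },
]);
console.log(ndjson);
```

This prints rows `a`, `b`, `c` in that order, even though `c` was listed first.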

Virtual Runtime

Compaction becomes a page-swap, not a lossy summary. Pinned pages survive; evicted pages remain fetchable via fetch_action(id).

onBeforeCompact()
→ kept 3, dropped 21
  type: 'compact' logged
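The page-swap idea fits in a few lines: before compaction, pinned records stay in the working set, everything else is evicted from the context but kept in the store, and any evicted record can be fetched back by id. A minimal sketch, assuming an in-memory `Map` stands in for the on-disk substrate; the `WorkingSet` class and its method names are hypothetical, with `fetchAction` standing in for the `fetch_action(id)` tool named above.

```typescript
interface Page {
  id: string;
  pinned: boolean;
  text: string;
}

class WorkingSet {
  private resident: Page[] = [];
  private readonly store = new Map<string, Page>(); // stands in for the on-disk substrate
  readonly log: string[] = [];

  add(page: Page): void {
    this.store.set(page.id, page); // every page is persisted before it enters the context
    this.resident.push(page);
  }

  // Page-swap instead of summarizing: keep pinned pages, evict the rest.
  onBeforeCompact(): { kept: number; dropped: number } {
    const kept = this.resident.filter((p) => p.pinned);
    const dropped = this.resident.length - kept.length;
    this.resident = kept;
    this.log.push(`compact: kept ${kept.length}, dropped ${dropped}`); // the compaction itself is recorded
    return { kept: kept.length, dropped };
  }

  // Evicted pages remain fetchable by id; nothing was summarized away.
  fetchAction(id: string): Page | undefined {
    return this.store.get(id);
  }

  residentIds(): string[] {
    return this.resident.map((p) => p.id);
  }
}

const ws = new WorkingSet();
for (let i = 0; i < 24; i++) {
  ws.add({ id: `a${i}`, pinned: i < 3, text: `action ${i}` });
}
console.log(ws.onBeforeCompact());        // { kept: 3, dropped: 21 }
console.log(ws.fetchAction("a20")?.text); // action 20
```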

Compared

Different layer.
Different problem solved.

Mem0, Letta, Zep, and Cognee solve memory retrieval. We solve identity continuity, drift self-detection, and engine-agnostic commitment. These are different categories, and the side-by-side below aims to be accurate. Submit corrections via GitHub issue; we update within 48 hours.

	Mem0	Letta	Zep	Cognee	GemClaw
Memory primitive	vector similarity	context paging	bi-temporal graph	ontological KG	append-only event log
Engine-agnostic	yes (API)	partial	yes	yes	verified · 2 engines
Self-detects drift	—	—	—	—	100% TP / 0% FP
Anti-sycophancy floor	—	—	—	—	+90pp
Infra requirement	API	API	Neo4j / FalkorDB	pipeline	single SQLite file
Open protocol	proprietary	proprietary	proprietary	proprietary	GMP v0.2 · 24 tests


Broader ecosystem we respect but differentiate from: OMEGA (local-first; claims 95.4% on LongMemEval), Supermemory (self-reported #1 on MemoryBench), Hindsight (Vectorize; 91.4% on LongMemEval), Mastra OM, MemMachine, and Memobase. Architecture details were verified against each vendor's own docs as of May 2026.

Backends

Any model, any provider.
Your key, your choice.

The substrate lives on your machine. The model runs wherever you want it to. Bring your own key for Anthropic, Qwen3-Max, DeepSeek, MiniMax — or point at a local OpenAI-compatible server for zero per-request cost.

Cloud models · BYOK

Anthropic · Alibaba · DeepInfra · OpenRouter

Plug in any API key. The proxy translates between the Anthropic message format and whichever backend you point it at, and a fallback chain handles rate limits and outages.

Anthropic Claude (BYOK or Max subscription)

Alibaba Qwen3-Max via Coding Plan

DeepInfra DeepSeek V3.2

OpenRouter MiniMax M2.5 (free tier)

Anything OpenAI-compatible

# Point Claude Code at the proxy
ANTHROPIC_BASE_URL=http://localhost:8000 \
  claude
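The fallback chain mentioned above boils down to trying backends in order and moving on when one is rate-limited or down. A minimal sketch, assuming a generic `Backend` function type and treating HTTP 429 and 5xx statuses as retryable; the names, the error class, and the retry policy are illustrative assumptions, not the proxy's actual behavior.

```typescript
// Error carrying an HTTP-like status code from a backend call.
class BackendError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

type Backend = { name: string; call: (prompt: string) => Promise<string> };

// Assumed policy: 429 (rate limited) and 5xx (outage) fall through to the next backend.
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === "number" && (status === 429 || status >= 500);
}

async function withFallback(
  chain: Backend[],
  prompt: string,
): Promise<{ backend: string; text: string }> {
  let lastError: unknown = new Error("empty fallback chain");
  for (const backend of chain) {
    try {
      return { backend: backend.name, text: await backend.call(prompt) };
    } catch (err) {
      if (!isRetryable(err)) throw err; // e.g. a 400 is a request bug, not an outage
      lastError = err;
    }
  }
  throw lastError; // every backend failed with a retryable error
}

// Usage with two stub backends: the first is always rate-limited.
const chain: Backend[] = [
  { name: "primary", call: async () => { throw new BackendError(429, "rate limited"); } },
  { name: "secondary", call: async (p) => `echo: ${p}` },
];
withFallback(chain, "hello").then((r) => console.log(r.backend, r.text)); // secondary echo: hello
```

Non-retryable errors are surfaced immediately rather than masked by a backend that happens to accept a malformed request.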

Local hardware

Zero per-request cost · your data never leaves

Run Qwen, Llama, Gemma, or any open-weight model on your own machine. Same substrate, same hooks, same protocol. The agent never knows the difference.

LM Studio

Ollama

llama.cpp server

vLLM

Any OpenAI-compatible endpoint

# Point the proxy at your local server
# in ~/.gemclaw/config.json:
{ "providers": {
    "local": { "enabled": true,
               "host": "127.0.0.1",
               "port": 1234 } } }

Also works with Claude Code · Cursor · Aider · LangChain · LlamaIndex · Python · any MCP / GMP client
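The `local` provider block in `~/.gemclaw/config.json` shown above reduces to a host and a port. A minimal sketch that parses that JSON and builds a base URL, assuming the local server exposes OpenAI-compatible routes under `/v1` (the convention LM Studio, Ollama, llama.cpp server, and vLLM follow); how the GemClaw proxy actually consumes this file is not specified here, and `localBaseUrl` is a hypothetical helper.

```typescript
interface LocalProvider {
  enabled: boolean;
  host: string;
  port: number;
}

interface GemclawConfig {
  providers?: { local?: LocalProvider };
}

// Returns the OpenAI-compatible base URL, or null when the local provider is off.
// The "/v1" suffix is an assumption based on common local servers.
function localBaseUrl(json: string): string | null {
  const config = JSON.parse(json) as GemclawConfig;
  const local = config.providers?.local;
  if (!local || !local.enabled) return null;
  if (!Number.isInteger(local.port) || local.port < 1 || local.port > 65535) {
    throw new RangeError(`invalid port: ${local.port}`);
  }
  return `http://${local.host}:${local.port}/v1`;
}

const example = `{ "providers": {
    "local": { "enabled": true,
               "host": "127.0.0.1",
               "port": 1234 } } }`;
console.log(localBaseUrl(example)); // http://127.0.0.1:1234/v1
```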

Get started

Three paths.
Same substrate.

Public access opens after the YC S26 review window. Until then, request early access for one of the three paths below.

The commands below are the install paths once the repo is public.

The primary path: one command, and the hooks load on the next launch.

Terminal · Claude Code plugin

Request early access · Apache-2.0 · Node 20+, Python 3.9+

Pricing

Open core.
Cloud when you're ready.

The substrate is yours, free, forever. The hosted layer is opening with a small early-access cohort while we tune defaults from real usage.

Independent developers

Open Source

Free, forever

Apache-2.0

Append-only causal-DAG substrate
Engine-agnostic — swap models, keep your mind
Full causality + audit trail
Self-hosted, your data never leaves


Cloud

Early access

Per-org pricing with substrate event caps — not per-agent, not per-seat

Hosted substrate, zero ops
Cross-machine sync
Team-shared mind state
Dashboard + observability

Mission-critical teams

Enterprise

Founding partner

Custom — work directly with us; reduced pricing in exchange for early feedback

BYOC deployment (your AWS/GCP)
SOC 2 Type II + GDPR posture (on the roadmap; see /teams)
Role-based access + audit log
Dedicated support channel

The cloud cohort opens in waves. The form takes a minute, and its four fields help us prioritize whom to onboard first. Pricing will be finalized after the first 20 production deployments. We commit to no surprise per-agent or per-seat charges, ever.
