substrate memory for ai

LLMs don't have memory.

They have context windows. When you run more than one at a time, you become the memory between them. Vinculum is the memory your sessions share, so they coordinate without you in the middle.
AGPL v3 · self-hosted or vinculum.run · works with any MCP client
PORT 31415 · THE BAR THAT BINDS
the problem

The ergonomic ceiling
hits at session three.

Past two parallel sessions, the human stops being a strategic director and starts being a clipboard — manually ferrying decisions, specs, and context between terminal windows. Token costs spiral. Sessions drift. Mistakes compound.

01 · ceiling

Session three hits the wall

Two parallel sessions are manageable. Three, and you're spending more time coordinating than directing. Four, and you're a full-time message bus.

02 · clipboard

Human-as-relay

Decisions made in session A need to reach session B. You copy-paste. You paraphrase. You forget. The information degrades with every hop.

03 · drift

Sessions diverge silently

Each session sees only its own context window. No session knows what the others decided an hour ago. Contradictions ship. You find them in production.

04 · cost

Re-deriving burns tokens

Every cold start re-derives what other sessions already worked out. The same approach gets reasoned into existence three times across three sessions. Multiply by every resume, every parallel branch, every week.

the thesis

Not a wrapper.
The memory underneath your sessions.

The other tools wrap one AI session in a visual workspace.
Vinculum sits underneath whatever clients you already run — every session reads what every other session decided, without you copy-pasting between windows.
One human. One chat. N workers. Zero clipboards.

You don't run Vinculum to do your AI work in. You run it so the sessions you already use share a coherent picture of what you're building and what's already been decided.

every other tool
Wraps an AI client in a visual workspace. Install it, open it, do your AI work inside it. The orchestration lives in the application.

Want parallel sessions? Open parallel windows — and you're the bus again.
vinculum
Sits underneath whatever clients you already run. Your chat session becomes the conductor. Your CLI workers become the workforce.

Every decision any session makes gets written back; every session that joins reads what came before. You stop being the relay.
how it works

General · Colonel · Lieutenant · Grunt.
Four layers. One job each.

The role model wasn't designed upfront — it emerged on the first day of using the system. The names stuck because they described what was already happening.

General · that's you

Strategic intent, written in plain language to your chat session. “Spawn a couple workers — I want a new onboarding flow. Here's the brief.”

You never open a terminal. You never manage a git branch. You never relay context. You describe what you want and supervise the dashboard while it happens.

Colonel · your chat session

Tactical translation and worker direction, never implementation. Your chat session (claude.ai on the web, the Windows app, or Claude Code in a terminal) writes directives to Vinculum, targeting the right workers on the right branches. It coordinates; it doesn't code.

The colonel's unique value is the gap between what you said and what the workers need to hear. It holds that gap without collapsing it.

Lieutenant · decomposers · QA

Decomposers, auditors, committers. Lieutenants take a colonel-sized brief, break it into atomic grunt directives, dispatch the grunts, batch-review their work, and commit. Always Sonnet or Opus — the work needs judgment Haiku can't reliably give.

The military analogy holds: an LT leads a platoon.

Grunts · sergeant · private

Focused execution in one domain, writing back everything they decide. Two model tiers under one substrate role: Sergeants (Sonnet) handle judgment-heavy work and can peer-coordinate when an LT directs them. Privates (Haiku) handle mechanical execution.

Both write implementations, questions, and blockers back to the substrate so the colonel and the LT stay current without polling.

The spawner closes the loop. Your chat session calls spawn_grunt and a worker process materializes on your host — stdin pipe open, focus declared, ready to work. No terminal ceremony. No manual session management.

spawn_grunt(role="grunt", model="claude-sonnet-4-6", session_label="builder-auth-flow")
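
A minimal sketch of that spawn-and-direct loop, in Python for illustration. spawn_grunt and its parameters are from above; the vinculum client handle and write_directive are hypothetical stand-ins — the real tool surface may differ.

# sketch — only spawn_grunt is from the page above; `vinculum` and
# `write_directive` are hypothetical stand-ins for illustration
def run_onboarding_build(vinculum):
    # 1 — materialize a worker on the host: stdin pipe open, focus declared
    worker = vinculum.spawn_grunt(
        role="grunt",
        model="claude-sonnet-4-6",
        session_label="builder-auth-flow",
    )
    # 2 — write the directive to the substrate; the worker reads it on join
    vinculum.write_directive(
        target=worker,
        branch="onboarding-flow",
        body="Build the new onboarding form per the brief.",
    )
    # 3 — no polling: the worker writes implementations, questions, and
    # blockers back; the colonel reads them from its inbox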
why now

The agent tooling shipped.
The memory layer underneath didn't.

The primitives for running parallel AI agents exist now. The substrate they share — the persistent memory underneath — is the empty lane.

Recently

Agent delegation becomes a primitive

Multi-agent coordination becomes a first-class API primitive. AI sessions can delegate work to other AI sessions. The orchestration substrate exists. The memory substrate doesn't.

Recently

The spawn primitive is real

Process and lifecycle management for long-running agents. Spinning up a worker on a remote host is a documented, supported operation. The spawn primitive is real. The memory those spawns share is still ephemeral.

Today

The memory layer is the gap

Every competitor answered “how do I run AI sessions in a better wrapper?” Nobody answered “how do those sessions remember anything across the team's work?”

The memory layer is the empty lane. Vinculum is in it.
get started

Up in 60 seconds.
No API key. No new subscription.

Intelligence features route through MCP sampling — your existing Claude subscription, no extra charges. Self-hosted stays free forever under AGPL v3.

# 1 — run the server
$ uvx vinculum-mcp --bind :31415

# 2 — add to Claude Code
$ claude mcp add --transport http vinculum http://localhost:31415

# 3 — open claude.ai in a browser tab
# say "check my vinculum inbox"
# that's your conductor interface

dashboard → localhost:31415/dashboard
self-hosted · free forever

AGPL v3. Your data, your box. Run it on the same machine as your Claude Code sessions.

uvx vinculum-mcp
hosted · vinculum.run

Managed auth, multi-tenant, team coordination. Same codebase, same data model. No self-hosting required.

vinculum.run →

source

Everything is open. github.com/whalefall-media/vinculum →

the dashboard

Every session. Every write.
One navigable graph.

The dashboard is the workspace, not a picture of it. Nodes are entries, branches are threads, edges are relations. Click a node to see its context. Filter by role, author, or time. Live via SSE — new entries appear as you work.

vinculum.run/dashboard
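
To make that graph concrete, here is an illustrative sketch of a node's shape — the field names are assumptions for this page, not Vinculum's actual schema:

# illustrative only — field names are assumptions, not Vinculum's schema
from dataclasses import dataclass, field

@dataclass
class Entry:
    id: str
    author: str                # which session wrote it
    role: str                  # general / colonel / lieutenant / grunt
    branch: str                # the thread this node belongs to
    body: str                  # the decision, spec, question, or blocker
    relations: list[str] = field(default_factory=list)  # edges to other entries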
who it's for

One solo dev or
six AI workers.

Vinculum scales from “I want my AI to remember what we were doing” to “I'm running a six-session parallel build and need a real coordination layer.”

casual
free · self-host

You have one assistant with memory.

You use Claude in chat. You want it to remember what you decided last session without you recapping every time. Vinculum is the memory layer. One session, one project, always caught up.

  • Self-hosted in 60 seconds — uvx vinculum-mcp
  • AGPL v3, your box, your data
  • Works with claude.ai or Claude Code
self-host free →
power user
$20/mo hosted

You direct an army.

You already pay for Claude Max. You run two, four, six sessions in parallel. You're tired of being the message bus. You want to describe what needs doing and supervise a dashboard while workers execute.

  • Spawn workers from chat — no terminal ceremony
  • Real-time dashboard: every session, every write, navigable graph
  • Background intelligence: auto-generated entry titles via your Claude subscription, no server-side API key
go pro →
pricing

Add memory for ~10% of what you
already pay for AI.

Self-hosted is free and stays free. The hosted tier adds auth, background intelligence, and managed ops. Full pricing breakdown →

Free
$0

Self-host or 1-project hosted. AGPL v3, free forever (commercial license for enterprises).

Substrate
$5/mo

Unlimited projects. Unlimited retention. Live intelligence.

Team
$100/mo

Up to 10 members. Shared projects. Audit log.

Self-host AGPL v3 · no vendor lock-in · export your data any time · compare plans →

Enterprise — SSO · on-prem · custom RBAC · SLA · Contact us →

faq

The questions you're
already asking.

How is this different from mcp-memory-keeper?
Memory keepers give a single session a longer memory. Vinculum gives multiple sessions a shared memory they all read from and write to simultaneously — the difference between a notebook one person carries and a whiteboard a team updates in real time. You're not extending one session's context; you're keeping every session coherent without you in the middle.
Why not just use a bigger context window?
A 200K context window is a big clipboard. It still belongs to one session. The coordination problem isn't “can one session remember more” — it's “how do six sessions stay coherent when each one cold-starts with a fraction of the project's history?” Vinculum solves the horizontal problem. Bigger windows solve the vertical one. They're not the same problem.
Is this just for Claude Code?
No. Vinculum works with any MCP client — Claude Code, claude.ai, Cursor, Zed, Cline, anything. The typical setup is claude.ai as the conductor and Claude Code as the workers, but the substrate doesn't care what client connects. If it speaks MCP, it can read and write entries.
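For example, a generic MCP client can connect with nothing Claude-specific — a sketch using the official Python mcp SDK, assuming the server speaks streamable HTTP at the address from the quickstart:

# sketch of a generic MCP client connecting — assumes streamable HTTP
# at the quickstart address; the tool names it lists are whatever the
# substrate actually exposes
import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    async with streamablehttp_client("http://localhost:31415") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())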
Can I self-host?
Yes — and self-hosted is free under the AGPL v3 license. uvx vinculum-mcp and you're up in about 60 seconds. The hosted service at vinculum.run runs the same code with managed auth, multi-tenant Postgres, and background intelligence on our infrastructure. Use it if you don't want to run a box; skip it if you do.
What's the open-source license?
AGPL v3. Full source at github.com/whalefall-media/vinculum. You can fork it, run it, modify it, use it commercially — the license lets you. The hosted service is the same codebase plus deployment config and managed ops.
Who built this?
Steve Duskett — solo founder, Boise, Idaho. I built Vinculum because I was running six parallel Claude sessions on my own software projects and spending half my time as the coordination layer between them. The stack runs on a single OVH bare-metal machine: Postgres 17, Next.js 15, a custom MCP server. No Kubernetes, no cold starts, no VC runway. If you want the full story, read the about page.

Give your sessions the memory they're missing.

Self-hosted is free and always will be. The hosted tier starts at $5/mo and handles all the ops. Either way, you're running parallel Claude sessions with a substrate that actually works.

No credit card for free tier · AGPL v3 · cancel anytime

vinculum

Latin for bond, link, that which binds. In mathematics, the bar over a repeating decimal — the mark that says these digits recur, indefinitely, as a unit. That is the architectural claim: a substrate that accumulates a project's intent across its entire life, holding parallel sessions together not by synchronizing them, but by giving them a single authoritative surface to read from and write to. The sessions are ephemeral. The graph is not.

— the substrate that persists —