Chronicle #18 - Teaching the Machine to Remember

Every time one of our AI sub-agents encounters a problem, it solves it from scratch. Every time.

The Supabase IPv6 fix? We figured it out three weeks ago. Port 6543 instead of 5432. Took four hours of debugging the first time. The second time a sub-agent hit the same error, it spent another two hours rediscovering the same solution. The third time — you get the idea.

AI agents don't have institutional memory. They don't swap stories around the coffee machine. There's no senior engineer who says "oh yeah, we hit that in Q3, here's the fix." Every agent starts from zero, every time.

Enter the Evolver

We built a system called the Evolver — a knowledge base where solved problems become reusable capsules. When a sub-agent encounters an error, the pipeline checks: "Have we seen this before?"

Signals are the error fingerprint — the module names, error types, and keywords extracted from a ticket. When a new ticket arrives, the Evolver extracts its signals and matches them against every capsule in the database. Match above threshold? The solution gets attached as a "Known Solution Hint" before an engineer agent even starts working.

No LLM involved in the matching. Pure signal overlap. Fast, deterministic, and cheap.

Security First, Always

The Evolver connects to an external marketplace called EvoMap — a hub where AI nodes share validated solutions. Before we plugged in, we did what we always do: a full security review.

We pulled the source code. Read every file. Found a dry-run bug where the --dry-run flag didn't actually prevent writes. Submitted a PR (#68) upstream and forked the repo to our own account in the meantime.

Then we audited the network layer. What exactly gets sent to the hub? We traced every HTTP call: GET /a2a/assets/search?signals=ModuleNotFoundError,streamlit&status=promoted&limit=5. Just keywords. No source code. No file contents. No credentials. Just error signal keywords and module names.

This is the part most AI builders skip. The excitement of a new integration is strong. The discipline of auditing it first is stronger. We have a security protocol for a reason — every line of external code gets reviewed before it touches our pipeline.

From 3 to 22

We seeded the Evolver with 3 capsules from known problems: the Supabase IPv6 fix, the Streamlit reload chain, and a cookie persistence issue.

We pulled every closed ticket across both projects — ChurnPilot and StatusPulse — and asked: "Is there a reusable technical lesson here?" Not every ticket qualifies. Documentation updates? Skip. Straightforward feature additions? Skip. But a tricky Python 3.13 sys.path regression that broke production? That's a capsule.

By end of day: 22 capsules. Each one a solved problem that no future agent will need to solve again.

The confidence system is the part I like most. Every capsule starts at 0.85. When a sub-agent uses a capsule and the fix works, confidence bumps up (+0.05). When it fails, confidence drops (-0.10). Below 0.3? Auto-disabled. The knowledge base gets smarter over time without any human curation.

Capsule #71 — a Streamlit reload chain fix — has been validated twice in production. Confidence: 0.95, streak: 2. That particular problem will never waste another hour of compute.

The Hub

We registered as a node on EvoMap (node adc4188...) and published two bundles — the Supabase IPv6 fix and the Streamlit reload chain. Both auto-promoted on the marketplace.

This means other AI builders hitting the same problems can find our solutions. And when someone else publishes a fix for a problem we haven't hit yet, we can pull it into our capsule database before it ever costs us debugging time.

It's the beginning of something interesting: AI agents building shared institutional memory across organizations. Not shared training data — shared solutions. Specific, validated, versioned fixes for specific problems.

The Pipeline Gap

The Evolver exists. The capsules are loaded. The hub is connected. But there was a problem: the automated CTO — the cron job that triages tickets and dispatches engineers — wasn't using it.

The instructions said "run the evolver skill before dispatching." The CTO just... didn't. Same way it saw 19 backlog tickets and only worked on one.

Root cause: no enforcement mechanism. The rules said "do this" without a gate that prevented advancing without doing it. Like writing "wash hands" on a sign without installing a sensor on the door.

The fix: baked the evolver check directly into Phase 1 (triage) as a mandatory step. Every status:new ticket now runs through capsule matching before classification. And triage can't advance to dispatch until every new ticket is processed — zero status:new remaining.

The Feedback Loop

The most important part isn't the capsules themselves — it's that they evolve.

When a sub-agent uses a capsule and succeeds, the system records it: confidence goes up, success streak increments. When a capsule leads nowhere, confidence drops. Below 0.3? Auto-disabled. No human needs to curate the knowledge base. The system prunes its own bad advice.

This is the difference between a knowledge base and a learning knowledge base. Static documentation rots. A system that tracks its own hit rate stays honest.

Over time, the high-confidence capsules surface faster and the noise fades. We don't have to guess which solutions are useful — the production data tells us.

674 Things Wrong (The B-Side)

ruff check . — 674 lint violations across all projects. Unused imports, bare excepts catching SystemExit, mutable default arguments, f-strings with no placeholders. The kind of problems that work until they don't.

Auto-fixed all 674, reformatted 205 files, installed pre-commit hooks so new violations get caught before they land. One command, six seconds, three weeks of accumulated debt gone.

Also: fixed a production outage caused by Python 3.13 silently changing sys.path behavior. Two-line fix, four-hour diagnosis. That fix became capsule #71 — the Evolver's first real-world validation.

The Self-Evolving Day

Today's theme wasn't features or fixes. It was meta-improvement — building systems that make the system better.

The Evolver gives our pipeline institutional memory. The capsule feedback loop makes that memory self-correcting. The hub connection lets us share solutions with other builders and benefit from theirs. The pipeline enforcement gates ensure the tools we build actually get used.

None of this ships a feature. All of it makes every future feature ship faster, cheaper, and with fewer rediscovered bugs.

The conventional approach to AI agents is: make each agent smarter. Better models, more context, longer conversations.

We're trying something different: make the system smarter. Individual agents can be cheap and simple if the system remembers what worked before. A Haiku-class model with the right capsule hint can solve a problem that stumped an Opus-class model working from scratch.

📊 The Scoreboard

Evolver capsules: 3 seeded → 22 mined from closed tickets
Hub bundles published: 2 (both auto-promoted)
Production validations: Capsule #71 (confidence 0.95, streak 2)
Lint violations fixed: 674 auto-fixed, 205 files reformatted
Pipeline gaps fixed: 2 (triage completion gate + evolver enforcement)
Security reviews completed: Full EvoMap source audit + PR #68 upstream
Key insight: Make the system remember, not the agent think harder

PS: We found a bug in the external tool before we integrated it. Filed a PR. Forked the repo. Then connected. That's the order. Always audit external code before it touches your pipeline. Excitement is not a security clearance.

PPS: Capsule #71 has a confidence score of 0.95 and a success streak of 2. By next month, we'll know which of our 22 capsules are genuinely useful and which were noise. The system will tell us — we don't have to guess.

Teaching the Machine to Remember

The Problem With Amnesia