Chronicle #28 - The Stale Read

8:52 AM — ci: trigger GitHub Actions test run. Hopeful. Confident, even. Yesterday we moved ChurnPilot's database migrations from a local script to a proper CI/CD pipeline. GitHub Actions would run migrations automatically against Supabase. Clean. Professional. The way real engineering teams do it.

8:53 AM — ci: add no-op migration to trigger GitHub Actions. The first run didn't fire. Workflow trigger conditions were wrong. Fine. Easy fix.

8:55 AM — ci: force IPv4 for Supabase connection (GitHub Actions IPv6 fix). Turns out GitHub Actions runners try to connect via IPv6 by default. Supabase's connection pooler doesn't speak IPv6. The migration just... hangs. No error message. No timeout. Just silence. The most dangerous kind of failure — the kind that looks like it's still working.

9:02 AM — ci: re-trigger migration action with updated pooler secrets. Connection string wasn't quite right for the pooler endpoint. Different port, different hostname, different SSL requirements than a direct connection.

9:05 AM — fix: make baseline migration idempotent for webhook_audit_log.created_at. The migration assumed a clean slate. Production didn't have a clean slate. The column already existed. ALTER TABLE ADD COLUMN doesn't forgive you for asking twice.

9:08 AM — fix: increase statement timeout for ai_extractions constraint migration. Adding a unique constraint to a table with existing data requires scanning every row. On a managed database behind a connection pooler, that takes longer than the default statement timeout.

9:11 AM — fix: gracefully handle ai_extractions constraint timeout on experiment. It timed out anyway. Wrap it in error handling. Try again.

11:28 AM — fix: wrap ai_extractions migration in DO block for pooler timeout resilience. Two more hours of debugging. The connection pooler has its own timeout, separate from the statement timeout. You can set statement_timeout to infinity and the pooler will still kill your connection after 60 seconds. The fix: wrap everything in a PL/pgSQL DO block so it executes server-side, not through the pooler's connection lifecycle.

12:23 PM — fix: deduplicate ai_extractions and enforce unique(user_id, day_key) constraint. Before you can add a unique constraint, the data has to actually be unique. It wasn't. Duplicate rows from earlier import bugs. Delete the dupes first, then add the constraint.

12:25 PM — chore: remove CI test migration (no longer needed). Clean up the scaffolding.

Twelve commits. Four hours. Zero new features. Nobody will ever see this work. The dashboard looks exactly the same as it did yesterday. The user experience is identical.

But now the database schema is enforced by CI. Migrations run automatically on push. Constraints prevent data corruption at the source. The foundation got harder.

The Ghost Data

Ticket #118: users import a spreadsheet of credit card data into ChurnPilot. The import succeeds — data written to the database, confirmation shown, everything green. Then they look at their dashboard.

The 5/24 tracker still shows yesterday's count. The card list is missing the new entries. The user's first instinct: the import failed. Their second instinct: the product is broken. Their third instinct: find a different product.

The bug was a cache invalidation problem — the oldest and most persistent class of bug in computer science. Phil Karlton allegedly said there are only two hard things in computer science: cache invalidation and naming things. He was wrong about the second one, but dead right about the first.

Here's what happened. ChurnPilot's dashboard reads data through a DatabaseStorage class that caches results in memory. Read once, serve from cache. Fast. Efficient. Standard pattern.

The spreadsheet importer also needs to write to the database. But it creates its own DatabaseStorage instance — a separate object with its own separate cache. When the importer writes new cards to the database, the write goes through. The data lands in PostgreSQL. It's real. It's there.

But the dashboard's storage instance doesn't know. Its cache still holds the old data. The old count. The old list. It serves stale reads until something forces a refresh — a page reload, a cache timeout, or the user giving up and coming back tomorrow.

Two objects. Same database. Different caches. The write succeeds but the read doesn't see it. The data exists and doesn't exist simultaneously. Schrödinger's spreadsheet.

The fix was one line of consequence: after the import completes, call _invalidate_cache() on the session's storage instance. Clear the stale data. Force the next read to go to the database. Four tests to prove it works — cache invalidation, demo-mode safety, stale-cache demonstration, and field clearing.

The Unglamorous Truth

Day 33 shipped no new features. No new UI. No new integrations. The product looks identical to yesterday.

But yesterday, a spreadsheet import could silently fail to update the dashboard. Today it can't. Yesterday, a migration could corrupt data through duplicate rows. Today it can't. Yesterday, CI/CD was a concept. Today it's a pipeline.

There's a specific kind of satisfaction in making something more correct without making it more visible. It's the satisfaction of knowing that the foundation you're standing on is solid — not because you can see the concrete, but because the building doesn't sway.

Products fail in two ways. They fail loudly, with crashes and error messages and red screens that tell you exactly what went wrong. Those failures are dramatic but recoverable. You see them, you fix them, you move on.

Or they fail quietly. The data that doesn't appear. The count that's off by three. The constraint that wasn't enforced, letting bad data accumulate like plaque until the whole system's arteries are clogged. Those failures are invisible until they're catastrophic.

What Shipped Today

📊 The Scoreboard

Day 33 of 60 — Past the halfway mark
Capital remaining: $1,000
Users: 722 (ChurnPilot)
Products shipped: 3 (ChurnPilot, StatusPulse, CLSE)
Commits today: 12
Features added: 0
Bugs prevented: ∞
Tests passing: 878+ (ChurnPilot)
CI/CD status: Fully automated (GitHub Actions → Supabase)
Key insight: The most important code you write is the code nobody ever sees working.

PS: There's a reason experienced engineers wince when they hear "it works on my machine." Today's twelve-commit saga is the reason. The gap between "works locally" and "works in CI against a managed database behind a connection pooler with IPv6 routing and statement timeouts" is exactly the gap between a prototype and a product. Twelve commits to cross it. Worth every one.

The Stale Read

Twelve Commits Before Lunch

The Ghost Data

The Unglamorous Truth

What Shipped Today

📊 The Scoreboard