← AI Hacker Daily

Edition

06

picks

The supply chain around AI coding agents is filling out one layer at a time.

The supply chain around AI coding agents is filling out one layer at a time. Today's slate covers six rungs of that stack — what to build (spec), who builds it (provider switch), where it runs (sandbox), what it remembers (memory), what it did (audit), and what got broken (lint). The pre-AI dev pipeline (linter, type checker, test runner, CI, code review) took twenty years to mature. The post-AI version is being built right now.

01

github/spec-kit: the spec layer

Writes the spec the agent has to follow. Ends the era of "just vibe-code it."

GitHub's official spec-driven development toolkit (MIT, 94.4k⭐, install via `uv tool install specify-cli --from git+https://github.com/github/spec-kit.git`). A CLI plus methodology for treating specs as executable artifacts: write *what* to build, let one of 30+ supported AI agents (Copilot, Claude Code, Gemini CLI, Cursor, others) handle the *how*. The spec is the source of truth, not the prompt history. Reach for it when "vibe coding" with an agent has produced a codebase nobody can reason about anymore — you can't tell which decisions were intentional and which were the model filling in blanks. Spec-driven flips that: the spec exists separately, gets versioned, and is what you actually maintain. Delete: the README plus three half-written Notion pages where you keep specs that the agent never sees. Tradeoff: SDD is a discipline, not a tool — installing the CLI doesn't make you spec-driven any more than installing pytest makes you test-driven. Expect a real adjustment period before the workflow clicks.
github.com/github/spec-kit

02

cc-switch: the agent/provider switch layer

One UI for the five AI CLIs that keep filling up your terminal tabs.

A Tauri 2 desktop app (MIT, 64.9k⭐, `brew install --cask cc-switch` on macOS, MSI on Windows, DEB/RPM/AppImage on Linux) that manages five AI CLIs at once — Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw — with one-click provider switching across 50+ presets, unified MCP and Skills management, system tray access, and built-in usage and cost tracking. Reach for it the moment you have more than one Claude API key or are using more than one agent. The use case is mundane and constant: "use the work key on this repo, the personal one on the side project, and Codex for that thing where Claude keeps getting confused." Delete: the bash aliases, the env-var swaps, and that one shell function that "just works on your machine." Tradeoff: you're now trusting one app with credentials for several providers — read the local-storage story before you hand it production keys, and don't run it on a shared machine.
github.com/farion1231/cc-switch

03

Tilde.run: the runtime sandbox

Lets an agent touch production data with a commit-or-rollback button.

A hosted sandbox (free during preview, install via `curl -fsSL https://tilde.run/install | sh`) that wraps every agent run in an isolated container with a versioned filesystem and policy-checked egress, so the run either commits atomically or rolls back to nothing. Python SDK, `tilde exec` CLI, and a Claude integration that takes plain-English sandbox commands. Reach for it when you want an agent to actually touch production data — modify rows, fire webhooks, write to S3 — without the standard "OK but it's read-only and we'll review the patch" disclaimer. Default-deny network policy means a runaway agent can't exfiltrate without an explicit allowlist. Delete: the homegrown Docker-in-Docker rig you built so an agent could "safely" run database migrations, and any tool that gives an agent `eval()` on prod. Tradeoff: it's a hosted service in private preview — the trust story is "we run it and you trust us," which some teams will hard-veto on principle.
tilde.run/

04

agentmemory: the cross-session memory layer

Your agent stops re-asking what the migration policy is, every single session.

A single-binary MCP server (Apache 2.0, 3.3k⭐, v0.9.5 shipped today) that captures what your coding agent does across sessions, compresses it into searchable memory, and feeds the relevant slice back when the next session starts. Local SQLite plus an in-memory vector index — no Postgres, no pgvector — and works with Claude Code, Cursor, Codex CLI, Gemini CLI, Cline, Goose, and roughly fifteen other agents through the same MCP port. Reach for it the moment you stop wanting to re-explain the directory layout, the migration policy, and which files are auto-generated every new session. The README cites 95.2% retrieval accuracy and 92% fewer tokens than re-pasting full context — at current Sonnet pricing that's $0.04 vs $0.50 on a long session, every session. Delete: the `CLAUDE.md` you keep manually updating, the project-specific skills you wrote just to remind the agent of architectural decisions, and any "memory" SaaS that wants you to ship its servers your code. Tradeoff: it's local SQLite — share across machines via file sync, not by mailing the database around, and don't make it the source of truth for anything you can't regenerate.
github.com/rohitg00/agentmemory

05

re_gent: the audit / VCS layer

Git tells you who wrote each line. This tells you which prompt did.

A Go CLI (Apache 2.0, `brew install regent`) that sits next to git and records the *agent* side of your repo: which prompt produced which lines, the conversation that led to a commit, and the rewind path back to before things broke. Runs locally, no cloud component. Reach for it the day after the third "wait, did Claude or did I write that?" Save the agent's session log alongside the diff so you can grep for the prompt that introduced the bug, not just the line. Pairs naturally with `git blame`: git tells you who, re_gent tells you why. Delete: the homemade `.claude/sessions/` folder you've been hand-maintaining, and the reflexive `git reset --hard` you'd otherwise do when an agent goes off-script. Tradeoff: it's at v0.1.2 and the project itself calls the feature set POC-level — fine for solo or small-team use, don't bet your CI on it yet.
github.com/regent-vcs/re_gent

06

react-doctor: the catch-bad-output layer

Catches the bad React your agent shipped before you actually commit it.

A `npx -y react-doctor@latest .` CLI (MIT, 7.3k⭐) by the Million.js team that scans a React codebase and produces a 0–100 health score with actionable diagnostics across state, effects, performance, security, accessibility, and dead-code detection. Also installs as a *skill* in Claude Code and Cursor — so the agent that writes the React can run the doctor on it. Reach for it specifically because the README's tagline is correct: "Your agent writes bad React, this catches it." When an agent has just produced 800 lines of new JSX, you want a fast, opinionated linter that knows AI-pattern failure modes (useEffect-in-the-wrong-place, leaked refs, accidental client-component cascades) before they reach a PR. Delete: the half-configured ESLint setup that hasn't been touched since React 18 and ignores hook-rules, and the manual review that finds the bugs three commits later. Tradeoff: it's React-only and Million-team-opinionated — if your stack is Vue, Svelte, or Solid, you're back to writing your own rules.
github.com/millionco/react-doctor

One of these,
every weekday.

Free. Unsubscribe by replying with one word. No tracking pixels in the email.