You stopped writing the code; now the job is supervising whatever did.

HN's top post all day was "Running local models is good now" (1,359 points): last week's local-coding wave arriving as a fact of life, with more code generated faster and cheaper by something that never re-reads its own diffs. Two of today's Show HNs are, independently, "code review that runs the code." The slate underneath is the supervision stack that gap demands: understand what's there, review it before it lands, run it to prove it works, watch the agents once they're live, and gate what they're allowed to touch. Dropped: the model launches (Microsoft's Fara-7B, GLM-5.2, Qwen-Robot), the $60B SpaceX-buys-Cursor headline, one more Claude Code fork (openclaude, 29k stars and still a fork), and Agent-Reach. Giving an agent free run of the whole internet is the opposite of today's instinct, however many stars it's pulling.

Understand-Anything

Understand-Anything is a plugin/skill for Claude Code, Cursor, Copilot, Codex, and Gemini CLI that turns a codebase into an interactive knowledge graph: every file, function, and class becomes a node, built locally by a multi-agent pipeline on top of tree-sitter. Install it with `/plugin install understand-anything` or a one-line script, then `/understand` to scan, `/understand-chat` to ask "which parts handle auth?", and `/understand-diff` to see what a change actually touches before it lands. It's MIT, TypeScript, and has pulled 62k stars since March. Reach for it when you (or your agent) have inherited a repo nobody holds in their head and need the blast radius of a change, not a vibe. The `/understand-diff` command is the one that matters here: impact analysis is the first step of supervising work you didn't write. Delete the grep-archaeology and the "ask the one dev who remembers" ritual. Tradeoff: 62k stars in three months is a velocity hard to separate from hype, and a generated map is only as trustworthy as the model that drew it. A confidently wrong graph is worse than no graph.
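
For a feel of what "blast radius" means mechanically, here is a minimal Python sketch. It is not how Understand-Anything works internally; the `DEPS` map, the file names, and the `blast_radius` helper are all made up for illustration. The idea is to invert a "who imports whom" graph and walk outward from the changed file.

```python
from collections import deque


def blast_radius(deps, changed):
    """Return every node that directly or transitively depends on `changed`.

    `deps` maps each node to the nodes it imports or calls. This format is
    an assumption for the sketch, not the tool's real graph schema.
    """
    # Invert the edges: for each node, record who depends on it.
    dependents = {}
    for node, targets in deps.items():
        for target in targets:
            dependents.setdefault(target, set()).add(node)

    # Breadth-first walk outward from the changed nodes.
    affected = set()
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected and parent not in changed:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical toy repo: each file lists what it depends on.
DEPS = {
    "api/login.py": ["auth/session.py"],
    "api/profile.py": ["auth/session.py", "db/users.py"],
    "auth/session.py": ["auth/tokens.py"],
    "auth/tokens.py": [],
    "db/users.py": [],
}
```

On the toy graph, `blast_radius(DEPS, {"auth/tokens.py"})` returns the session module plus both API handlers and leaves `db/users.py` out. A real graph adds function-level and class-level nodes, but the traversal is the same shape.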

→ github.com/Egonex-AI/Understand-Anything

git-lrc

git-lrc is a free Go CLI that installs as a git hook and runs an AI review on your staged diff at commit time — locally, BYOK (Gemini by default; OpenAI, Claude, DeepSeek, and others), checking ~100 failure patterns across 10 risk categories. Sixty-second setup, Linux/macOS/Windows, 1.4k stars, on a Sustainable Use License that allows self-hosting and business use but not reselling. The author's pitch is the whole theme in one line: his team started generating tons of code and spending less time looking at it, and regressions slipped through. git-lrc moves the catch left to the moment the context is freshest and the fix is cheapest — before anything reaches a PR. Delete the "I'll review it properly later" intention you never act on. Tradeoff: it's a commit-time gate, not a replacement for human PR review; BYOK means every commit spends tokens, and a checker that knows 100 patterns will cry wolf on a few of them.
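
To make "review the staged diff at commit time" concrete, here is a rough Python sketch of the general pattern. It is not git-lrc's code and involves no model call: the three regex checks, `scan_diff`, and `main` are illustrative assumptions. The only real interface used is `git diff --cached`, which prints what is staged.

```python
import re
import subprocess

# Illustrative patterns only; these are NOT git-lrc's real checks.
PATTERNS = {
    "hardcoded-secret": re.compile(
        r"(api_key|password|secret)\s*=\s*['\"][^'\"]+['\"]", re.I
    ),
    "debug-leftover": re.compile(r"\b(pdb\.set_trace|breakpoint)\("),
    "bare-except": re.compile(r"^\s*except\s*:"),
}


def staged_diff():
    """Return the staged changes as unified diff text."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def scan_diff(diff_text):
    """Flag added lines that match a pattern; removed lines are ignored."""
    findings = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header for the new side of the diff, e.g. "+++ b/app.py".
            path = line[4:]
            current_file = path[2:] if path.startswith("b/") else path
            continue
        if not line.startswith("+"):
            continue
        added = line[1:]
        for name, pattern in PATTERNS.items():
            if pattern.search(added):
                findings.append((current_file, name, added.strip()))
    return findings


def main():
    """Print findings and return an exit code suitable for a git hook."""
    findings = scan_diff(staged_diff())
    for path, name, text in findings:
        print(f"{path}: {name}: {text}")
    return 1 if findings else 0
```

Called from `.git/hooks/pre-commit` with `sys.exit(main())`, a non-zero return aborts the commit. The cry-wolf tradeoff shows up immediately: a regex cannot tell a test fixture from a real secret, which is the judgment an AI reviewer is supposed to add.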

→ github.com/HexmosTech/git-lrc

Ito

Ito is a GitHub app that reviews a PR by actually running it: it deploys your app in an isolated sandbox, generates test cases without you writing a line of Playwright or Cypress, exercises the impacted user flows, and posts video, screenshots, and run logs of what broke straight to the PR timeline. Pre-merge behavioral QA, free to sign up, no card. This is the freshest idea in today's pool (review that executes instead of just reading), and it's the right response to agents shipping UI changes faster than anyone clicks through them, where "the diff looks fine" keeps merging regressions. Delete the brittle end-to-end suite you keep meaning to fix and the manual click-through before every release. Tradeoff: it's also among the least proven picks here (closed, hosted, thin adoption signal), and an AI writing the tests that then pass is a circularity worth watching closely before you trust the green check.

→ www.ito.ai

Spanly

Spanly is observability for MCP servers: a drop-in CLI and SDK (open source, Apache 2.0, at github.com/spanlyhq/spanly) that captures every JSON-RPC tool call and response, then surfaces error rates, p50/p95/p99 latency, session traces, and client analytics. The CLI and SDK are open; the cloud dashboard is a paid service with a free tier and plans from $49, with US or EU data residency, meant to sit alongside the Datadog/Sentry/New Relic you already run. Reach for it once you've exposed an MCP server to agents and realize you have no idea which tools they're calling, how often, or what's quietly failing. Watching the agents at runtime is the supervision step that comes after the code ships. Delete the print-statement debugging and the "why is this agent slow" guesswork. Tradeoff: it instruments the protocol layer, not the model's reasoning (you see the calls, not the why), and the retention and analytics worth having live behind the cloud tier.
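
For readers who want to see what "error rates and p50/p95/p99 per tool" boils down to, here is a small Python sketch. It is not Spanly's SDK; `summarize`, `percentile`, and the `(timestamp_ms, message)` event format are assumptions made for the example. It does rely on two things MCP defines: tool invocations are JSON-RPC requests with method `tools/call` and the tool name in `params.name`, and a failed tool can report `isError` inside an otherwise successful result.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(events):
    """Pair requests with responses by id and aggregate per tool.

    `events` is a list of (timestamp_ms, message) tuples in arrival order;
    this capture format is hypothetical.
    """
    pending = {}                   # request id -> (tool name, start time)
    latencies = defaultdict(list)  # tool name -> latencies in ms
    errors = defaultdict(int)      # tool name -> failed call count

    for ts, msg in events:
        if msg.get("method") == "tools/call":
            pending[msg["id"]] = (msg["params"]["name"], ts)
        elif msg.get("id") in pending:
            tool, started = pending.pop(msg["id"])
            latencies[tool].append(ts - started)
            result = msg.get("result")
            tool_failed = isinstance(result, dict) and result.get("isError")
            if "error" in msg or tool_failed:
                errors[tool] += 1

    return {
        tool: {
            "calls": len(values),
            "error_rate": errors[tool] / len(values),
            "p50": percentile(values, 50),
            "p95": percentile(values, 95),
            "p99": percentile(values, 99),
        }
        for tool, values in latencies.items()
    }
```

This is the whole "protocol layer" view: names, counts, timings, failures. Nothing in these messages says why the agent chose the call, which is the limit the tradeoff above describes.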

→ www.spanly.com/

SolonGate

SolonGate is a gateway that sits between the LLM and your APIs, databases, and internal systems, and evaluates each tool call against a policy before it executes. It reads the real call payload, returns a structured ALLOW/DENY with the exact rule that fired in milliseconds, and logs every decision — so an agent's reach into production is governed, not hoped for. This is the last step of the arc: once you understand, review, run, and watch, you still want a bouncer at the door for the actions you can't take back. Reach for it when agents can touch production and "please don't drop the table" in the system prompt is not a control. Delete the hope-based security of trusting the model to behave. Tradeoff: it's the least-proven pick on this list — design-partner stage, closed, no public pricing — and the "AI judge" half just relocates the trust problem; it's the deterministic policy engine that's worth caring about.
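
The deterministic half is easy to picture in a few lines of Python. This is a sketch of the general pattern, not SolonGate's engine or policy language: the `Rule` and `Decision` types, the rule ids, the `sql.query` tool name, and the payload shape are all hypothetical. Rules are checked in order, the first match wins, anything unmatched is denied, and every decision names the rule that fired.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    effect: str                      # "ALLOW" or "DENY"
    matches: Callable[[dict], bool]  # predicate over the call payload


@dataclass(frozen=True)
class Decision:
    effect: str
    rule_id: str


def _query(call):
    """Upper-cased SQL text from a hypothetical sql.query payload."""
    return call["args"].get("query", "").strip().upper()


# Hypothetical policy. Substring checks are crude; a real gate would parse SQL.
RULES = [
    Rule(
        "deny-destructive-sql", "DENY",
        lambda c: c["tool"] == "sql.query"
        and any(k in _query(c) for k in ("DROP ", "TRUNCATE ", "DELETE ")),
    ),
    Rule(
        "allow-read-sql", "ALLOW",
        lambda c: c["tool"] == "sql.query" and _query(c).startswith("SELECT"),
    ),
]


def evaluate(call, rules=RULES, audit_log=None):
    """Return the first matching rule's decision; deny when nothing matches."""
    decision = Decision("DENY", "default-deny")
    for rule in rules:
        if rule.matches(call):
            decision = Decision(rule.effect, rule.rule_id)
            break
    if audit_log is not None:
        audit_log.append((call["tool"], decision))
    return decision
```

Putting the DENY rule first matters: a payload like `SELECT 1; DROP TABLE users` is refused before the read rule gets a look. The same inputs always produce the same answer and the same rule id, which is what makes this half auditable.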

→ solongate.com/

One of these,
every weekday.

Free. Unsubscribe by replying with one word. No tracking pixels in the email.