← AI Hacker Daily

Edition

00

picks

# AI Hacker Daily — 2026-05-24 **Theme: the diff stops being the unit of work.** A year ago the question was whether an agent could write the function.

# AI Hacker Daily — 2026-05-24 **Theme: the diff stops being the unit of work.** A year ago the question was whether an agent could write the function. This week it's that you have several agents running at once and can't read everything they produce — the HN front page literally surfaced a post titled `--dangerously-skip-reading-code`. So the useful tooling has moved off the editor and onto the fleet around it: where the agents run, how requests route between them, how you review intent instead of lines, and how context follows them across sessions. And the pattern is spreading past code entirely — Anthropic shipped a plugin marketplace that points the same agents at sales, finance, legal, and marketing. We dropped earendil-works/pi (a fine open coding-agent stack, but it's one more agent, not a way to run many), set aside Firecrawl's pdf-inspector as an off-theme data-prep utility, and note that BloopAI's vibe-kanban — the first-gen browser kanban for orchestrating agents — is sunsetting, so the category is already consolidating. ## Anthropic ships a knowledge-work plugin marketplace, not a coding one The first-party, Apache-2.0 repo bundles 11 plugins — sales, marketing, legal, finance, data, customer-support, product-management, enterprise-search, bio-research, productivity, and a plugin-management plugin for building your own — for Claude Code and the new Claude Cowork surface. Each plugin is the same shape: skills as markdown (auto-activated domain knowledge), explicit slash commands like `/sales:call-prep` and `/data:write-query`, and MCP connectors into the business stack (Slack, HubSpot, Notion, Jira, Figma, Snowflake). You add it by name: `claude plugin marketplace add anthropics/knowledge-work-plugins`, then `claude plugin install sales@knowledge-work-plugins`. We covered the Small Business pack on 2026-05-14 and the legal pack on 2026-05-15 as one-off vertical drops. This is the same template generalized into a single marketplace and pointed at every business function, not just developers. The structure flagged a week ago — skills as markdown, agents as slash commands, MCP as the integration layer — is now declared stable and shipped as a marketplace you subscribe to by name. Read it as Anthropic redefining "agent tooling" to mean "all knowledge work," and setting the plugin format others will have to be compatible with. **Delete:** the assumption that "agent tooling" is a synonym for "coding tooling." **Tradeoff:** it's scaffolding — markdown and JSON. Out of the box it's a well-organized empty shell; the value is entirely in the MCP connectors you wire and the company context you add. [GitHub](https://github.com/anthropics/knowledge-work-plugins) ## cmux is a terminal built for running many coding agents at once A native macOS terminal built on libghostty (GPU-rendered, reads your existing Ghostty config) for people running several agent sessions in parallel. The features all target that one problem: per-agent notifications via OSC 9/99/777 escape sequences so a tab lights up when its agent needs you, sidebar tabs showing each session's git branch, PR status, working directory, and listening ports, a scriptable browser pane agents can drive against a dev server, and SSH remote workspaces. `brew tap manaflow-ai/cmux && brew install --cask cmux`. Reach for it the moment you have more agent sessions than you can track by squinting at tab titles. It speaks to roughly 13 agent CLIs — Claude Code, Codex, Grok, OpenCode, Pi, Amp, Cursor CLI, Gemini, Copilot, and more — so it's a harness, not a lock-in. The timing is the tell: in the same window BloopAI announced vibe-kanban, the popular browser kanban for orchestrating agents, is sunsetting. The terminal-native take on the same problem is what's still shipping. **Delete:** the generic terminal where six agents produce six indistinguishable tabs and nothing tells you which one stalled an hour ago. **Tradeoff:** macOS-only and GPL-3, and it's a Ghostty extension — you inherit Ghostty's model whether you wanted it or not. [GitHub](https://github.com/manaflow-ai/cmux) ## Plano is a proxy that routes across a fleet of agents An AI-native proxy and data plane (Rust) that sits between your app and your agents. You declare agents in YAML as plain OpenAI-compatible HTTP endpoints; Plano routes between them with a purpose-built 4B orchestrator model — by model name, by semantic alias, or by automatic preference — captures end-to-end OpenTelemetry traces with no instrumentation code, and runs guardrail and jailbreak filter chains in front of them. Apache-2.0, with a hosted free dev tier or self-host. Reach for it once "the agent" is actually five agents and you're hand-writing the routing, retry, and tracing glue between them. Plano pushes that into infrastructure the way an API gateway did for microservices: the agents stay dumb HTTP servers, the proxy owns routing, observability, and safety. It's the orchestration layer for when you stop thinking about a single agent and start thinking about traffic between many. **Delete:** the bespoke if/else router and the OpenTelemetry wiring you bolted onto your agent app by hand. **Tradeoff:** it's young (v0.4.x), and you're trusting a 4B model's routing decisions plus an extra network hop — fine for dev, but audit it before it fronts production traffic. [GitHub](https://github.com/katanemo/plano) ## Mainline makes you review the intent before the diff A git-native "intent memory" CLI for coding agents. Before an agent writes code, you capture the goal, the decisions, the rejected alternatives, and the hard constraints; Mainline stores them in Git refs and notes that travel with the repo (`mainline preflight` / `start` / `append` / `seal`), and a reviewer reads the sealed intent before opening the diff. It hooks into Codex, Claude Code, and Cursor at session start. Install via `curl … install.sh | bash` or `go install github.com/mainline-org/mainline@latest`. Reach for it when agents produce diffs too large and too fast for a human to reverse-engineer intent from the code. The README's worked example is the load-bearing one: an agent deletes a "removable" legacy `/reports/invoices.csv` route that an enterprise contract still depends on, because the constraint lived in someone's head, not in the diff. Mainline's own controlled eval reports zero constraint violations in intent-first mode versus nine code-first across eight scenarios — treat the number as the vendor's, but the failure mode is one every agent user has hit. **Delete:** the review habit of inferring "what was this agent even trying to do" from the diff alone. **Tradeoff:** layered license — the CLI and hooks are open, the hosted Hub is proprietary — and it only helps if someone actually writes the intent down, which is exactly the step people skip under deadline. [GitHub](https://github.com/mainline-org/mainline) ## CoreMem carries your context across every agent A hosted context store. You build "mems" — collections of files, docs, and notes — then load them into any tool through direct integrations, a public or scoped share URL, or an MCP server endpoint, instead of re-explaining your project to each new session. Agents can propose updates to a mem, but nothing is written until you approve it. Free signup, with the MCP endpoint at `api.coremem.app`. Reach for it when you run several agents and tools and each one starts cold — and your `CLAUDE.md`, Cursor rules, and pasted system prompts have quietly forked into five slightly different versions of the truth. CoreMem is the single source the whole fleet reads from. The approve-before-write gate is the right default: agent-written memory that updates itself silently is how context rots without anyone noticing. **Delete:** the copy-pasted context blob you paste into every new chat, and the drifting per-tool rules files. **Tradeoff:** it's a hosted, closed service holding your context — the opposite of local-first — and pricing past the free tier isn't spelled out, so weigh the lock-in before you pour your whole knowledge base in. [CoreMem](https://coremem.app)

One of these,
every weekday.

Free. Unsubscribe by replying with one word. No tracking pixels in the email.