# AI Hacker Daily, 2026-05-15

**Theme: agents went ambient this week, so the useful work is the discipline layer.**

Two platform moves bookend the week. OpenAI put Codex in the ChatGPT mobile app: you can now dispatch a coding agent from your phone, with no diff in front of you and no laptop in the loop. Anthropic shipped `claude-for-legal`, its second open vertical agent pack in two days (Small Business was yesterday). The direction is clear: agents are getting more ambient and more vertical, and the volume of code that no human reads before it merges keeps going up. The genuinely useful indie work this week is the opposite motion: three tools that put reviewability and discipline back around AI-written code.

We dropped the local-LLM tooling (whichllm, the GGUF explainer), the humanoid-robot PR (Gatsby), and Halgorithem (an unvalidated alpha whose "no AI" claim doesn't survive the spaCy dependency).

## OpenAI puts Codex in the ChatGPT mobile app

The headline is the whole story: you can now kick off Codex coding tasks from the ChatGPT mobile app, away from a terminal. This is the same motion as Anthropic's Claude Code GitHub Action and background agents: moving code generation off the developer's screen and into a fire-and-forget surface. The capability isn't the interesting part; the review surface is. A task dispatched from a phone is a task whose diff you are less likely to read line by line before it lands.

For builders this is a forcing function, not a feature. Every "agent from anywhere" announcement raises the floor on how much unreviewed machine-written code is in motion, which is exactly why the next four picks exist. Treat ambient dispatch as a reason to harden your merge gate, not as permission to relax it.

**Delete:** the mental model where AI code generation happens somewhere you're watching it.

**Tradeoff:** the convenience of phone-dispatched agents trades directly against the quality of the review they get.
[Announcement](https://openai.com/index/work-with-codex-from-anywhere/)

## anthropics/claude-for-legal is the second open vertical agent pack from Anthropic in two days

Worth covering not as a legal product but as a structural signal. It's an Apache-2.0 reference implementation: 10+ practice-area plugins, 60+ named agents (each with a job title, such as "DSAR Responder" or "Claim Chart Builder"), 20+ MCP connectors (iManage, Everlaw, CourtListener), and managed-agent cookbooks for headless deployment. There is no build step: everything is markdown and JSON, customized through a per-plugin `CLAUDE.md` practice profile written by a cold-start interview.

The thing to notice is the standardizing shape. After SMB yesterday and legal today, Anthropic is establishing a repeatable vertical-pack template: skills as markdown, agents as named slash commands, MCP connectors as the integration layer, and attorney/human-review gates baked in as explicit guardrails. If you build agent tooling, this is now the de facto structure to be compatible with, the same way the `package.json` shape became non-optional. Read it as a spec, not a law firm.

**Delete:** the assumption that vertical agent packs will each invent their own structure.

**Tradeoff:** every output is "a draft for attorney review", so the guardrails that make it safe also cap how much it actually saves.

[GitHub](https://github.com/anthropics/claude-for-legal)

## Gox is a strict Go static analyzer built specifically for LLM-written code

Eleven fail-closed rules, every one an error by default with a required justification to opt out. The flagship is `namedargs`: when adjacent arguments share a type, it forces an inline `/* paramName */` comment at the call site. That catches the bug LLMs produce constantly and tests never see: `transfer(orderID, userID)` vs `transfer(userID, orderID)`. Both compile, both pass, one silently corrupts.
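
To make the swapped-argument bug concrete, here is a minimal Go sketch. The `Ledger` type and its `Transfer` method are hypothetical, invented for illustration. The placement of the `/* paramName */` comments (directly before each argument) is also an assumption; check the Gox README for the exact form `namedargs` accepts.

```go
package main

import "fmt"

// Ledger is a hypothetical in-memory store, used only for this illustration.
type Ledger struct {
	owner map[string]string // orderID -> userID
}

// Transfer assigns an order to a user. Both parameters are plain strings,
// so the compiler cannot tell a swapped call from a correct one.
func (l *Ledger) Transfer(orderID, userID string) {
	l.owner[orderID] = userID
}

func main() {
	l := &Ledger{owner: map[string]string{}}
	orderID, userID := "order-42", "user-7"

	// Swapped arguments: compiles, runs, and records the wrong mapping.
	l.Transfer(userID, orderID)
	fmt.Println(l.owner["order-42"] == "user-7") // false

	// Annotated call site (comment placement is an assumption): each
	// argument is labeled with the parameter it is meant to fill, so a
	// swap becomes visible to a reviewer or a linter.
	l.Transfer( /* orderID */ orderID, /* userID */ userID)
	fmt.Println(l.owner["order-42"] == "user-7") // true
}
```
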
The rest (including `errcheck`, `forcetypeassert`, `contextcheck`, `bodyclose`, `noglobals`) target the silent-failure class, not style. The detail that makes it real: `gox install claude` registers a Claude Code Stop hook that blocks the agent's turn until issues are annotated or fixed. That's the right integration point. You want the gate inside the loop, not in CI after the agent has moved on. Zero external deps, ~2.6s warm on a monorepo, BSD-3. Explicitly not a golangci-lint replacement; it runs alongside.

**Delete:** the assumption that "tests pass" means an LLM didn't swap two same-typed arguments.

**Tradeoff:** fail-closed-by-default means real annotation friction on legitimate code; that friction is the product.

[GitHub](https://github.com/mentasystems/gox)

## CodeBoarding generates the architecture diagram so you can see an AI change's blast radius

LSP-based static analysis across Python, TS/JS, Java, Go, PHP, and Rust, plus an LLM pass to interpret it, output as Mermaid diagrams and markdown under `.codeboarding/`. Runs as a CLI (`pipx install codeboarding`), a VS Code extension, or a GitHub Action that updates diagrams in CI. Incremental mode re-analyzes only what changed.

The on-theme use is the one the README names directly: reviewing AI-generated changes with system context before they turn into hidden debt. The failure mode of ambient agents isn't usually a wrong line. It's a structurally wrong change that looks locally fine: a new dependency edge or a bypassed boundary you don't notice in a diff. A regenerated component diagram in the PR makes that visible. MIT.

**Delete:** the practice of reviewing agent PRs purely as line diffs with no structural view.

**Tradeoff:** the LLM interpretation pass means the diagram itself is AI-mediated: useful, but not ground truth.
[GitHub](https://github.com/CodeBoarding/CodeBoarding)

## JDS ports the obra/superpowers discipline model to Copilot CLI

Nine skills that force a coding agent through think → plan → execute → verify → finish instead of letting it free-associate. The rigid ones have teeth: `jds-tdd` deletes any code written before its test, `jds-verify` rejects "I believe this works" and requires actual command output as proof, and `jds-plan` refuses vague directives and demands executable code in the plan. Workers run as isolated subagents without session history, which is the deliberate move that stops assumption drift across a long task.

It's an open adaptation of obra/superpowers, restructured for Copilot's plugin system with SQL task tracking and a live dependency-graph server. The honest framing: the ideas aren't new, the Copilot-CLI packaging is. If you're on Copilot CLI and have been eyeing the superpowers approach, this is the install. If you're on Claude Code, use it as a reference for the enforcement patterns. MIT.

**Delete:** the hope that an agent will self-impose TDD discipline without a harness that enforces it.

**Tradeoff:** Copilot-CLI-only, and sequential gating slows the agent down, which is the entire point.

[GitHub](https://github.com/josipmusa/jds)
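
If you want to borrow the "command output as proof" pattern for your own harness, the core check is small. The Go sketch below is an assumption about the shape of such a gate, not JDS's implementation: the `Evidence` struct and the `Verify` function are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Evidence is a hypothetical record of a command the agent actually ran.
type Evidence struct {
	Command  string
	ExitCode int
	Output   string
}

// Verify accepts a completion claim only when it is backed by a command,
// captured output, and a zero exit code. A bare assertion is rejected.
func Verify(claim string, ev *Evidence) error {
	if ev == nil || strings.TrimSpace(ev.Command) == "" {
		return fmt.Errorf("rejected %q: no command attached", claim)
	}
	if strings.TrimSpace(ev.Output) == "" {
		return fmt.Errorf("rejected %q: no captured output from %q", claim, ev.Command)
	}
	if ev.ExitCode != 0 {
		return fmt.Errorf("rejected %q: %q exited with code %d", claim, ev.Command, ev.ExitCode)
	}
	return nil
}

func main() {
	// A belief with no evidence fails the gate.
	fmt.Println(Verify("I believe this works", nil))

	// A claim backed by a successful command and its output passes.
	ok := &Evidence{Command: "go test ./...", ExitCode: 0, Output: "ok"}
	fmt.Println(Verify("tests pass", ok))
}
```

The design choice worth copying is that the gate never inspects the claim's wording. It only checks whether evidence exists and whether the command succeeded, so a confident-sounding claim earns nothing on its own.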
