AI Hacker Daily

Edition 06 · 6 picks

The platforms start shipping their canonical answers to the agent stack.

Yesterday's slate was community-built layers: VCS, sandbox, memory, audit, cache. Today: Anthropic releases its official Skills repo, Microsoft drops an eval CLI, Google expands Gemini's RAG to multimodal, and Nous puts out a self-improving agent runtime. Plus two community kitchen sinks that show what "production agent stack" is starting to mean in practice.

01

anthropics/skills: the skill-format canon

If you're already writing skills, this is what Anthropic actually meant the format to look like.

The official Anthropic skills repository (Apache 2.0, 132k⭐) — the reference implementation of Claude's Skills feature, including source-available DOCX, PDF, PPTX, and XLSX document skills plus example skills across design, dev, and enterprise categories. Install via `/plugin marketplace add anthropics/skills` in Claude Code, then pick the specific bundles you actually want. Reach for it when you've been collecting skill definitions from a dozen different community sources and need a clean baseline of what Anthropic actually meant the format to look like. The document skills are the quiet bonus — real PDF and PPTX manipulation code, not "instructions for Claude to do PDF manipulation." Delete: any skill-format guesswork you were doing from blog posts. Tradeoff: it's the *reference*, not a comprehensive plugin pack — for breadth, layer addyosmani/agent-skills or everything-claude-code on top once you understand the primitive.
github.com/anthropics/skills
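The primitive the repo canonizes is small: a directory holding a `SKILL.md` whose YAML frontmatter names the skill and tells Claude when to load it, with any supporting scripts alongside. A minimal sketch — the skill name, description, and script path below are illustrative, not taken from the repo:

```markdown
---
name: pdf-page-count
description: Count pages in a PDF. Use when the user asks how long a PDF is.
---

# pdf-page-count

Run the bundled script against the file the user provided:

    python scripts/count_pages.py <path-to-pdf>
```

The frontmatter `description` is what Claude matches against to decide the skill is relevant; the body is the procedure it follows once loaded.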

02

NousResearch/hermes-agent: a self-improving agent runtime

An agent that lives in your Telegram instead of dying when you close the terminal.

Nous Research's open agent runtime (MIT, 142k⭐), installable via a one-line `curl` or PowerShell `irm` script that bundles Python 3.11, Node, uv, and dependencies. Built-in learning loop that creates skills from experience, searches its own past conversations, persists a model of the user across sessions, and switches between 200+ models (OpenRouter, NVIDIA NIM, OpenAI, Anthropic) without code changes. Runs on a $5 VPS, Modal, or Docker; hibernates when idle. Reach for it when you want an agent that *lives somewhere* — Telegram, Discord, Slack, WhatsApp, Signal — rather than a Claude Code session that ends when you close the terminal. The persistent cross-session memory is the real distinction; it builds context on you instead of relearning every time. Delete: the cron-job-plus-prompt-template rig that's been pretending to be your "always-on agent." Tradeoff: "self-improving" claims always invite skepticism — read the learning-loop implementation before betting your workflow on it, and run it in a sandbox for the first week.
github.com/NousResearch/hermes-agent
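The cross-session memory idea is simple enough to sketch: the agent persists what it learns about you to disk and reloads it at startup instead of starting cold. This is a hypothetical illustration of the shape of the idea, not hermes-agent's actual implementation; every name here is invented:

```python
import json
from pathlib import Path

# Illustrative file name; a real runtime would use a proper store.
MEMORY_FILE = Path("user_model.json")

def load_user_model() -> dict:
    """Restore what the agent learned in earlier sessions."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {"preferences": {}, "facts": []}

def remember(model: dict, key: str, value: str) -> None:
    """Record a new observation and persist it immediately."""
    model["preferences"][key] = value
    MEMORY_FILE.write_text(json.dumps(model, indent=2))

model = load_user_model()
remember(model, "editor", "neovim")
# A later session reloads the same file instead of relearning.
print(load_user_model()["preferences"])
```

The point of the pattern is the write-on-observe discipline: the model of the user survives the process, so context accumulates across sessions.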

03

everything-claude-code: the community kitchen-sink

178k stars worth of someone else's CLAUDE.md, ready to install in one command.

A hackathon-winning Claude Code performance pack (MIT, 178k⭐, by Affaan Mustafa) — 48 specialized agents, 182 workflow skills, 68 legacy command shims, 14 MCP server configs, a dashboard GUI, hooks, installers. Install via `npm install -g ecc-universal` or `/plugin install` in Claude Code's marketplace. Works equally well as a runnable pack and as a reference implementation for "what does a comprehensive agent harness look like." Reach for it when "official skills + a couple of community bundles" isn't enough and you want every reasonable agent, skill, and hook configuration pre-stitched. It's the "I want the kitchen sink, I'll trim later" option — useful for spinning up an opinionated environment fast and then deleting the parts that don't fit. Delete: the cargo-culted scraps you copied from five different gists trying to build the same thing piecemeal. Tradeoff: 48 agents and 182 skills mean there's overlap, contradiction, and dead code — you'll spend the first week disabling things you'll never use, not adopting them all.
github.com/affaan-m/everything-claude-code

04

microsoft/waza: eval scaffolding for agent skills

You shipped a skill. This tells you whether it actually made things better, instead of "seems better."

A Go CLI from Microsoft (MIT, 574⭐, `curl -fsSL https://raw.githubusercontent.com/microsoft/waza/main/install.sh | bash`) that scaffolds eval suites, runs benchmarks across models, and compares results. Supports behavior validation, LLM-as-judge grading, and workflow verification. CI/CD integration via GitHub Actions; built-in token and cost tracking. Also available as an Azure Developer CLI extension. Reach for it when you're shipping skills — Anthropic's, your team's, everything-claude-code's — and need an actual measurement layer rather than vibes-based "this seems better." It's the missing step between "I wrote a skill" and "I know whether this skill made things better." Delete: the spreadsheet of "I tried this prompt and it worked, mostly," and the manual A/B you do by re-running the same task and squinting at the diff. Tradeoff: it's Microsoft / Azure-flavored — it works standalone, but the deepest integration is with the Azure Developer CLI, which may not match your stack.
github.com/microsoft/waza
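The measurement loop a tool like this automates is worth seeing in miniature: run the same tasks with and without the skill, grade each output, compare aggregates. This is a hypothetical sketch of the pattern, not waza's API; the grader is a stub standing in for behavior checks or an LLM-as-judge call:

```python
from statistics import mean

TASKS = ["extract the table", "summarize section 2", "fix the typo"]

def run_agent(task: str, skill_enabled: bool) -> str:
    # Stand-in for invoking the agent; a real harness calls a CLI or API.
    return f"{task} ({'with' if skill_enabled else 'without'} skill)"

def grade(output: str) -> float:
    # Stub judge: a real one would validate behavior or ask an LLM to score.
    return 1.0 if "with skill" in output else 0.5

baseline = mean(grade(run_agent(t, False)) for t in TASKS)
with_skill = mean(grade(run_agent(t, True)) for t in TASKS)
print(f"baseline={baseline:.2f} with_skill={with_skill:.2f}")
```

The value is in the fixed task set and the explicit comparison — the two things the "re-run and squint" workflow never gives you.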

05

Gemini API File Search: multimodal and citation-grade RAG

Your RAG corpus has images. Your retrieval layer can finally see them.

Google expanded Gemini's File Search tool with three additions: native multimodal handling (images and text together, via Gemini Embedding 2), key-value metadata filtering on indexed files, and page-level citations that point to the exact source location for any retrieved chunk. The additions ship through the existing File Search endpoint and are non-breaking for current implementations. Reach for it when your RAG corpus is genuinely mixed-media — design systems with screenshots, scientific papers with figures, product docs with diagrams — and the text-only chunk-and-embed path has been giving you confidently wrong answers about the visuals. The page-level citations are the quieter win: an agent stops bluffing about "the document says…" because you can cheaply verify the claim. Delete: the homemade "extract images, caption them, embed the captions, hope it works" pipeline. Tradeoff: it's Gemini-only and adds Google as a dependency on your retrieval layer — fine for prototypes, harder to justify on a stack that already standardized on OpenAI or Anthropic embeddings.
blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/
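Metadata filtering plus page-level citations amount to a retrieval contract: every chunk carries key-value tags you can filter on and a page you can point at. A hypothetical plain-Python sketch of that retrieval-side behavior — this is not the Gemini SDK, and the field names are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    page: int                      # page-level citation target
    metadata: dict = field(default_factory=dict)

# Stand-in for an indexed corpus.
index = [
    Chunk("Install with npm.", page=3, metadata={"doctype": "guide"}),
    Chunk("Figure 2: latency.", page=7, metadata={"doctype": "paper"}),
]

def search(query: str, filters: dict) -> list[Chunk]:
    """Return chunks that match the query AND every metadata filter."""
    return [
        c for c in index
        if query.lower() in c.text.lower()
        and all(c.metadata.get(k) == v for k, v in filters.items())
    ]

hits = search("install", {"doctype": "guide"})
for c in hits:
    print(f"p.{c.page}: {c.text}")   # the citation names an exact page
```

The page number on every hit is what lets an agent's "the document says…" be spot-checked instead of trusted.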

06

chenhg5/cc-connect: agents-to-where-people-are bridge

Drive Claude Code from Slack, Telegram, or Discord. No public IP needed.

A multi-agent-to-messaging bridge (MIT, 8.4k⭐, install via `npm` / Homebrew / binary) that connects local AI coding agents — Claude Code, Cursor, Codex, Gemini CLI, Kimi CLI, plus any ACP-compatible agent — to messaging platforms: Slack, Telegram, Discord, WeChat Work, Feishu, DingTalk, LINE. No public IP required for most platforms. Includes scheduled tasks, persistent memory, and a web admin dashboard. Reach for it when the people consuming the agent's output aren't the same people running it — your PM wants to ask the codebase a question from Slack, you want to drive a long-running refactor from your phone, an on-call alert needs an agent to triage from a Telegram message. Distinct from yesterday's cc-switch (which handled accounts and providers); this is the people-side connectivity layer. Delete: the bespoke webhook server you built so a `/claude` Slack command could shell out to the right agent. Tradeoff: the more platforms you connect, the more credentials and tokens live in this one app — review the auth and storage story before you hand it your prod Slack token.
github.com/chenhg5/cc-connect
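The core of any such bridge is a dispatch step: map an inbound chat message to a local agent CLI invocation. A hypothetical minimal sketch — the slash commands and CLI flags below are assumptions for illustration, and none of this is cc-connect's code; a real bridge adds auth, sessions, and streaming:

```python
import shlex

# Illustrative command-to-CLI map; flags are assumed, not verified.
AGENTS = {
    "/claude": ["claude", "-p"],
    "/gemini": ["gemini", "-p"],
}

def dispatch(message: str) -> list[str]:
    """Turn '/claude fix the tests' into a runnable argv list."""
    command, _, prompt = message.partition(" ")
    base = AGENTS.get(command)
    if base is None:
        raise ValueError(f"unknown agent command: {command}")
    return base + [prompt]

argv = dispatch("/claude fix the failing tests")
print(shlex.join(argv))   # what the bridge would hand to subprocess
```

Everything else a production bridge does — token storage, per-platform webhooks, scheduled tasks — hangs off this one mapping, which is also why the credential-concentration tradeoff above matters.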

One of these,
every weekday.

Free. Unsubscribe by replying with one word. No tracking pixels in the email.