AI Hacker Daily

Edition 06 · 6 picks

The platforms start shipping their canonical answers to the agent stack.

Yesterday's slate was community-built layers: VCS, sandbox, memory, audit, cache. Today: Anthropic releases its official Skills repo, Microsoft drops an eval CLI, Google expands Gemini's RAG to multimodal, and Nous puts out a self-improving agent runtime. Plus two community kitchen sinks that show what "production agent stack" is starting to mean in practice.

01

anthropics/skills: the skill-format canon

If you're already writing skills, this is what Anthropic actually meant the format to look like.

The official Anthropic skills repository (Apache 2.0, 132k⭐) — the reference implementation of Claude's Skills feature, including source-available DOCX, PDF, PPTX, and XLSX document skills plus example skills across design, dev, and enterprise categories. Install via `/plugin marketplace add anthropics/skills` in Claude Code, then pick the specific bundles you actually want. Reach for it when you've been collecting skill definitions from a dozen different community sources and need a clean baseline of what Anthropic actually meant the format to look like. The document skills are the quiet bonus — real PDF and PPTX manipulation code, not "instructions for Claude to do PDF manipulation." Delete: any skill-format guesswork you were doing from blog posts. Tradeoff: it's the *reference*, not a comprehensive plugin pack — for breadth, layer addyosmani/agent-skills or everything-claude-code on top once you understand the primitive.
github.com/anthropics/skills
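The primitive the repo canonizes is small: a directory holding a `SKILL.md` whose YAML frontmatter names the skill and tells Claude when to load it, with any supporting scripts alongside. A minimal sketch — the skill name, description, and script path below are illustrative, not taken from the repo:

```markdown
---
name: pdf-page-count
description: Count pages in a PDF. Use when the user asks how long a PDF is.
---

# pdf-page-count

Run the bundled script against the file the user provided:

    python scripts/count_pages.py <path-to-pdf>
```

The frontmatter `description` is what Claude matches against to decide the skill is relevant; the body is the procedure it follows once loaded.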

02

NousResearch/hermes-agent: a self-improving agent runtime

An agent that lives in your Telegram instead of dying when you close the terminal.

Nous Research's open agent runtime (MIT, 142k⭐), installable via a one-line `curl` or PowerShell `irm` script that bundles Python 3.11, Node, uv, and dependencies. Built-in learning loop that creates skills from experience, searches its own past conversations, persists a model of the user across sessions, and switches between 200+ models (OpenRouter, NVIDIA NIM, OpenAI, Anthropic) without code changes. Runs on a $5 VPS, Modal, or Docker; hibernates when idle. Reach for it when you want an agent that *lives somewhere* — Telegram, Discord, Slack, WhatsApp, Signal — rather than a Claude Code session that ends when you close the terminal. The persistent cross-session memory is the real distinction; it builds context on you instead of relearning every time. Delete: the cron-job-plus-prompt-template rig that's been pretending to be your "always-on agent." Tradeoff: "self-improving" claims always invite skepticism — read the learning-loop implementation before betting your workflow on it, and run it in a sandbox for the first week.
github.com/NousResearch/hermes-agent
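The cross-session memory idea is simple enough to sketch: the agent persists what it learns about you to disk and reloads it at startup instead of starting cold. This is a hypothetical illustration of the shape of the idea, not hermes-agent's actual implementation; every name here is invented:

```python
import json
from pathlib import Path

# Illustrative file name; a real runtime would use a proper store.
MEMORY_FILE = Path("user_model.json")

def load_user_model() -> dict:
    """Restore what the agent learned in earlier sessions."""
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return {"preferences": {}, "facts": []}

def remember(model: dict, key: str, value: str) -> None:
    """Record a new observation and persist it immediately."""
    model["preferences"][key] = value
    MEMORY_FILE.write_text(json.dumps(model, indent=2))

model = load_user_model()
remember(model, "editor", "neovim")
# A later session reloads the same file instead of relearning.
print(load_user_model()["preferences"])
```

The point of the pattern is the write-on-observe discipline: the model of the user survives the process, so context accumulates across sessions.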

03

everything-claude-code: the community kitchen-sink

178k stars worth of someone else's CLAUDE.md, ready to install in one command.

A hackathon-winning Claude Code performance pack (MIT, 178k⭐, by Affaan Mustafa) — 48 specialized agents, 182 workflow skills, 68 legacy command shims, 14 MCP server configs, a dashboard GUI, hooks, installers. Install via `npm install -g ecc-universal` or `/plugin install` in Claude Code's marketplace. Works equally well as a runnable pack and as a reference implementation for "what does a comprehensive agent harness look like." Reach for it when "official skills + a couple of community bundles" isn't enough and you want every reasonable agent, skill, and hook configuration pre-stitched. It's the "I want the kitchen sink, I'll trim later" option — useful for spinning up an opinionated environment fast and then deleting the parts that don't fit. Delete: the cargo-culted scraps you copied from five different gists trying to build the same thing piecemeal. Tradeoff: 48 agents and 182 skills mean there's overlap, contradiction, and dead code — you'll spend the first week disabling things you'll never use, not adopting them all.
github.com/affaan-m/everything-claude-code

04

microsoft/waza: eval scaffolding for agent skills

You shipped a skill. This tells you whether it actually made things better, instead of "seems better."

A Go CLI from Microsoft (MIT, 574⭐, `curl -fsSL https://raw.githubusercontent.com/microsoft/waza/main/install.sh | bash`) that scaffolds eval suites, runs benchmarks across models, and compares results. Supports behavior validation, LLM-as-judge grading, and workflow verification. CI/CD integration via GitHub Actions; built-in token and cost tracking. Also available as an Azure Developer CLI extension. Reach for it when you're shipping skills — Anthropic's, your team's, everything-claude-code's — and need an actual measurement layer rather than vibes-based "this seems better." It's the missing step between "I wrote a skill" and "I know whether this skill made things better." Delete: the spreadsheet of "I tried this prompt and it worked, mostly," and the manual A/B you do by re-running the same task and squinting at the diff. Tradeoff: it's Microsoft / Azure-flavored — it works standalone, but the deepest integration is with the Azure Developer CLI, which may not match your stack.
github.com/microsoft/waza
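The measurement loop a tool like this automates is worth seeing in miniature: run the same tasks with and without the skill, grade each output, compare aggregates. This is a hypothetical sketch of the pattern, not waza's API; the grader is a stub standing in for behavior checks or an LLM-as-judge call:

```python
from statistics import mean

TASKS = ["extract the table", "summarize section 2", "fix the typo"]

def run_agent(task: str, skill_enabled: bool) -> str:
    # Stand-in for invoking the agent; a real harness calls a CLI or API.
    return f"{task} ({'with' if skill_enabled else 'without'} skill)"

def grade(output: str) -> float:
    # Stub judge: a real one would validate behavior or ask an LLM to score.
    return 1.0 if "with skill" in output else 0.5

baseline = mean(grade(run_agent(t, False)) for t in TASKS)
with_skill = mean(grade(run_agent(t, True)) for t in TASKS)
print(f"baseline={baseline:.2f} with_skill={with_skill:.2f}")
```

The value is in the fixed task set and the explicit comparison — the two things the "re-run and squint" workflow never gives you.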

05

Gemini API File Search: multimodal and citation-grade RAG

Your RAG corpus has images. Your retrieval layer can finally see them.

Google expanded Gemini's File Search tool with three additions: native multimodal handling (images and text together, via Gemini Embedding 2), key-value metadata filtering on indexed files, and page-level citations that point to the exact source location for any retrieved chunk. The additions ship through the existing File Search endpoint and are non-breaking for current implementations. Reach for it when your RAG corpus is genuinely mixed-media — design systems with screenshots, scientific papers with figures, product docs with diagrams — and the text-only chunk-and-embed path has been giving you confidently wrong answers about the visuals. The page-level citations are the quieter win: an agent stops bluffing about "the document says…" because you can cheaply verify the claim. Delete: the homemade "extract images, caption them, embed the captions, hope it works" pipeline. Tradeoff: it's Gemini-only and adds Google as a dependency on your retrieval layer — fine for prototypes, harder to justify on a stack that already standardized on OpenAI or Anthropic embeddings.
blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/
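Metadata filtering plus page-level citations amount to a retrieval contract: every chunk carries key-value tags you can filter on and a page you can point at. A hypothetical plain-Python sketch of that retrieval-side behavior — this is not the Gemini SDK, and the field names are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    page: int                      # page-level citation target
    metadata: dict = field(default_factory=dict)

# Stand-in for an indexed corpus.
index = [
    Chunk("Install with npm.", page=3, metadata={"doctype": "guide"}),
    Chunk("Figure 2: latency.", page=7, metadata={"doctype": "paper"}),
]

def search(query: str, filters: dict) -> list[Chunk]:
    """Return chunks that match the query AND every metadata filter."""
    return [
        c for c in index
        if query.lower() in c.text.lower()
        and all(c.metadata.get(k) == v for k, v in filters.items())
    ]

hits = search("install", {"doctype": "guide"})
for c in hits:
    print(f"p.{c.page}: {c.text}")   # the citation names an exact page
```

The page number on every hit is what lets an agent's "the document says…" be spot-checked instead of trusted.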

06

chenhg5/cc-connect: agents-to-where-people-are bridge

Drive Claude Code from Slack, Telegram, or Discord. No public IP needed.

A multi-agent-to-messaging bridge (MIT, 8.4k⭐, install via `npm` / Homebrew / binary) that connects local AI coding agents — Claude Code, Cursor, Codex, Gemini CLI, Kimi CLI, plus any ACP-compatible agent — to messaging platforms: Slack, Telegram, Discord, WeChat Work, Feishu, DingTalk, LINE. No public IP required for most platforms. Includes scheduled tasks, persistent memory, and a web admin dashboard. Reach for it when the people consuming the agent's output aren't the same people running it — your PM wants to ask the codebase a question from Slack, you want to drive a long-running refactor from your phone, an on-call alert needs an agent to triage from a Telegram message. Distinct from yesterday's cc-switch (which handled accounts and providers); this is the people-side connectivity layer. Delete: the bespoke webhook server you built so a `/claude` Slack command could shell out to the right agent. Tradeoff: the more platforms you connect, the more credentials and tokens live in this one app — review the auth and storage story before you hand it your prod Slack token.
github.com/chenhg5/cc-connect
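The core of any such bridge is a dispatch step: map an inbound chat message to a local agent CLI invocation. A hypothetical minimal sketch — the slash commands and CLI flags below are assumptions for illustration, and none of this is cc-connect's code; a real bridge adds auth, sessions, and streaming:

```python
import shlex

# Illustrative command-to-CLI map; flags are assumed, not verified.
AGENTS = {
    "/claude": ["claude", "-p"],
    "/gemini": ["gemini", "-p"],
}

def dispatch(message: str) -> list[str]:
    """Turn '/claude fix the tests' into a runnable argv list."""
    command, _, prompt = message.partition(" ")
    base = AGENTS.get(command)
    if base is None:
        raise ValueError(f"unknown agent command: {command}")
    return base + [prompt]

argv = dispatch("/claude fix the failing tests")
print(shlex.join(argv))   # what the bridge would hand to subprocess
```

Everything else a production bridge does — token storage, per-platform webhooks, scheduled tasks — hangs off this one mapping, which is also why the credential-concentration tradeoff above matters.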

One of these,
every weekday.

Free. Unsubscribe by replying with one word. No tracking pixels in the email.