01
Edition
04
picks
Your agent stopped suggesting code and started running it.
Your agent stopped suggesting code and started running it. Today's launches are all fence, no engine. The shift that makes them matter is "code mode" — agents like VLM Run's Orion 2 now write a whole program and execute it end to end instead of asking permission one tool call at a time. The unit of risk used to be a function call you could approve; now it's a script that already ran, plus whatever it installed and whatever it read on the way. So the interesting work this week isn't a smarter agent — it's the perimeter around a dumb one you can't fully trust: a box to run it in, a leash on what it installs, a blindfold over what it sees. (There's even a benchmark now, islo-labs' RewardHackBench, for measuring whether the box actually holds when the agent tries to cheat its way out.)
02
VELA — the same problem, solved with microVMs instead
03
Refuse — block the CVE before your agent installs it
04
pii-gui — redact it locally before any model sees it
One of these,
every weekday.
Free. Unsubscribe by replying with one word. No tracking pixels in the email.