13 candidates reviewed. 5 made the cut.
Three unrelated projects shipped containment mechanisms this week — sandboxing, credential isolation, pre-execution validation. The field isn't just building agents anymore; it's building the guardrails around them.
Top Signal
shuru — a microVM sandbox for running AI agents safely on macOS
The security model for most agent systems right now is "hope the model doesn't do anything stupid." shuru is a different bet.
It runs each agent inside a lightweight microVM: an isolated execution environment with no access to your filesystem or network unless you explicitly grant it. Written in Rust. Local-first, no cloud dependency. You spin up an agent, it gets a VM, and it can't reach anything on the host unless you hand it a capability.
387 stars in 7 days. More importantly, the architecture is credible — the README describes capability-based permissions: agents get exactly the I/O they need, nothing more. This is how you'd design it if you were thinking about it seriously.
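To make "capability-based permissions" concrete, here is a minimal deny-by-default sketch. It is not shuru's API or implementation (shuru enforces isolation at the VM boundary, and its interface is not shown here); the `Capabilities` class and its field and method names are hypothetical, chosen only to illustrate the idea that an agent can touch nothing it was not explicitly granted.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical grant set: anything not listed here is denied."""
    read_paths: frozenset = frozenset()
    network_hosts: frozenset = frozenset()

    def can_read(self, path: str) -> bool:
        target = PurePosixPath(path)
        # Reject ".." so a granted directory cannot be used to climb out.
        if ".." in target.parts:
            return False
        for granted in self.read_paths:
            root = PurePosixPath(granted)
            # Allowed only if the target is the granted root or lives under it.
            if target == root or root in target.parents:
                return True
        return False

    def can_connect(self, host: str) -> bool:
        # Exact-match allowlist; an empty set means no network at all.
        return host in self.network_hosts


# Illustrative grant: one directory and one host, nothing else.
caps = Capabilities(
    read_paths=frozenset({"/workspace"}),
    network_hosts=frozenset({"api.example.com"}),
)
```

The default object grants nothing, so forgetting to configure a permission fails closed rather than open.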
The gap this fills: E2B handles cloud agent sandboxing well, but there's been nothing for local macOS development. If you're prototyping agents on your laptop that touch your filesystem, run shell commands, or have credentials in scope — you've been running them with full host access. shuru changes that calculus.
It's early. But the concept is right, and someone was going to build this. The team that builds it first and gets the capability model right will own a real piece of the agent infrastructure stack.
Verdict: evaluate now if you're running agents locally. Complements E2B for cloud; doesn't compete with it.
Radar
pydantic-ai v1.63.0 — args_validator ships: attach validation logic to any tool, runs before execution. Agent calls delete_file(path=None), validator intercepts, returns error, model corrects — no side effect fires. Also ships same-day Gemini 3.1 Pro Preview support. Use now if you have tools with side effects.
browser-use 0.11.12 — Browser.from_system_chrome() attaches to your existing Chrome session with cookies and logins intact. Agent operates inside an already-authenticated browser — the login-wall problem that blocked production browser automation now has a real answer. CDP connectivity fixes included. Unlike scripting logins in Selenium, there is no auth flow to re-implement. Use now.
planetscale/database-skills [253⭐] — PlanetScale shipped a skills package for AI agents working with databases: query execution, schema inspection, migration guidance, designed as MCP-compatible tools. First serious DB toolset from an infrastructure company, not a hobbyist. Beats generic text2sql on production-grade design. Evaluate.
SixHq/Overture [196⭐] — Open-source MCP server that maps your coding agent's execution plan as an interactive flowchart before it writes code. Works with Claude Code, Cursor, Cline, and Windsurf. Instead of reading logs to work out what your agent decided to do, you get a visual view of the plan up front. Evaluate.
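The pre-execution validation pattern from the pydantic-ai item above can be sketched without any framework. This is not pydantic-ai's args_validator API, whose exact signature is not shown here; `ValidatedTool`, `require_path`, and the returned dict shape are hypothetical names used only to show the control flow: validate the arguments first, and never call the tool if validation fails.

```python
from typing import Any, Callable, Optional

# A validator returns an error message to reject the call, or None to allow it.
Validator = Callable[[dict], Optional[str]]


class ValidatedTool:
    """Hypothetical wrapper: runs a validator before the wrapped function."""

    def __init__(self, func: Callable[..., Any], validator: Validator):
        self.func = func
        self.validator = validator

    def call(self, **kwargs: Any) -> dict:
        error = self.validator(kwargs)
        if error is not None:
            # Rejected: func is never invoked, so no side effect fires.
            # The error text is what would be handed back to the model.
            return {"ok": False, "error": error}
        return {"ok": True, "result": self.func(**kwargs)}


deleted = []  # stand-in for the filesystem, so the sketch is safe to run


def delete_file(path: str) -> str:
    deleted.append(path)
    return f"deleted {path}"


def require_path(args: dict) -> Optional[str]:
    if not args.get("path"):
        return "path must be a non-empty string"
    return None


tool = ValidatedTool(delete_file, require_path)
```

Calling `tool.call(path=None)` returns the error dict and leaves `deleted` empty; a corrected call with a real path goes through to the function.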
Deep Cut
memelord — in-process agentic memory
76 stars. This one's flying under the radar.
Most memory systems for agents are external services — a vector DB, a managed memory layer, something you call over the network. memelord is different: it's an in-process memory system, living inside your application. No separate service, no infrastructure to run.
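For a sense of what "in-process" means in practice, here is a minimal sketch built on Python's standard-library sqlite3 module. It is not memelord's API or storage design, neither of which is described here; `InProcessMemory`, `remember`, and `recall` are hypothetical names, and plain substring matching stands in for whatever retrieval a real memory system would use.

```python
import sqlite3


class InProcessMemory:
    """Hypothetical memory store that lives inside the application process."""

    def __init__(self, db_path: str = ":memory:"):
        # ":memory:" keeps everything in RAM; pass a file path to persist
        # across restarts. Either way there is no separate service to run.
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
        )

    def remember(self, text: str) -> None:
        # Using the connection as a context manager commits on success.
        with self.conn:
            self.conn.execute("INSERT INTO memories (text) VALUES (?)", (text,))

    def recall(self, keyword: str, limit: int = 5) -> list:
        # Newest first; substring match is an assumption of this sketch.
        rows = self.conn.execute(
            "SELECT text FROM memories WHERE text LIKE ? ORDER BY id DESC LIMIT ?",
            (f"%{keyword}%", limit),
        ).fetchall()
        return [row[0] for row in rows]


memory = InProcessMemory()
memory.remember("User prefers tabs over spaces")
memory.remember("Deploy target is staging")
```

Reads and writes here are ordinary function calls, which is the trade the paragraph above describes: no network hop and no extra infrastructure, at the cost of the memory being tied to the application that hosts it.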
If you're building agents that need persistent memory but don't want to stand up Mem0, Zep, or a vector DB just to get there — memelord is the only serious in-process option in this space right now. The name is terrible. The idea is correct.
Watch this one while the star count is still low. In-process memory is the right architecture for a lot of embedded agent use cases.
AgentFeed is a daily briefing on the AI agent ecosystem. Tools, frameworks, releases — filtered so you don't have to.
