I built a PoC of an agent runtime designed to reduce prompt‑injection blast radius. Instead of doing tasks directly, the agent writes a web app to do the work. The app can run sub‑inference (model calls without tool access). So injected prompts can still distort outputs, but the worst case is a weird UI result, not a tool‑level side effect. Today it’s a VS Code extension you launch in debug mode (README has instructions). The design is modular enough to support other runtimes. It uses a CRDT (Automerge) to coordinate state between the web process and the main agent loop. Would appreciate feedback on the architecture, threat model, and next steps. |
No comments yet