I’ve been testing a simple idea inspired by Apple/Cloudflare/Anthropic: LLMs are far better at writing a small program than at coordinating multiple tool calls.

So instead of giving the model 10+ tools, I exposed one: a TypeScript sandbox with access to the same interfaces. The model writes a script → it runs once → done.

What changed:

- 68% reduction in token use
- No multi-step drift or retries
- Local models (Llama 3.1 8B / Phi-3) became much more reliable

Repo: https://github.com/universal-tool-calling-protocol/code-mode

Curious whether others have seen the same thing: is code execution a better abstraction for agents than tool calls?
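
To make the "one script instead of many tool calls" idea concrete, here is a minimal sketch. It is an illustration under assumptions, not the repo's actual API: the `Tools` interface, `listOrders`, `getCustomer`, and `runScript` are hypothetical names, the tools are synchronous in-memory stubs (real ones would normally be async), and `new Function` only stands in for a real sandbox since it provides no isolation.

```typescript
// Hypothetical tool interfaces: these names are made up for illustration
// and are NOT the API of the linked repo.
interface Tools {
  listOrders(customerId: string): { id: string; total: number }[];
  getCustomer(customerId: string): { id: string; name: string };
}

// In-memory stubs standing in for real (normally async) backends.
const tools: Tools = {
  listOrders: (customerId) =>
    customerId === "c1"
      ? [
          { id: "o1", total: 40 },
          { id: "o2", total: 60 },
        ]
      : [],
  getCustomer: (customerId) => ({ id: customerId, name: "Ada" }),
};

// The single tool exposed to the model: run one script with `tools` in scope.
// NOTE: `new Function` is NOT a sandbox. A real setup needs proper isolation
// (separate process, VM isolate, etc.); this is only a placeholder for one.
function runScript(source: string, api: Tools): unknown {
  const fn = new Function("tools", source);
  return fn(api);
}

// What the model might write: one script that chains the lookups and does
// the aggregation itself, instead of a sequence of separate tool calls.
const modelScript = `
  const customer = tools.getCustomer("c1");
  const orders = tools.listOrders(customer.id);
  const total = orders.reduce((sum, o) => sum + o.total, 0);
  return { name: customer.name, orderCount: orders.length, total };
`;

const result = runScript(modelScript, tools) as {
  name: string;
  orderCount: number;
  total: number;
};
console.log(result);
```

With the stub data above, the script returns the customer name `"Ada"`, an order count of 2, and a total of 100. The intermediate values (the customer record, the order list) never pass back through the model's context; only the final object does, which is one plausible source of the token savings.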