Plan mode in Claude Code / Codex works, but only for one session. In the next session, your agent re-reads the source and re-derives the decisions you already made.

TNL (Typed Natural Language) is the same review-before-code discipline, made persistent: a short English contract with a fixed schema (paths, behaviors tagged MUST/SHOULD/MAY/[semantic], non-goals). The agent proposes it, you approve it, the agent implements against it, and it is saved on disk and read by every future session.

It's not a new agent or tool; it slots into whatever you already use. npx typed-nl init adds a workflow stanza to your CLAUDE.md / AGENTS.md / GEMINI.md, scaffolds a tnl/ directory, and optionally wires a PreToolUse hook and an MCP server. The minimum product is a stanza + a folder. Hooks, MCP, and tnl verify (a CI gate for path and test-binding integrity) are optional layers.

We ran a controlled A/B test on an existing 16KLOC Python codebase (event-driven triggers, a 35-scenario behavioural matrix, a deliberately ambiguous prompt). Both the baseline and TNL conditions got the same coding discipline in their instruction file. Same agent, same model, same base commit. Results:
No overlap: TNL's lowest paired cell is 86%, baseline's highest is 83%.

Other signals:

Follow-up work: on round-2 tasks in the same worktrees, TNL agents edited the existing contract (4/4 samples); baseline agents re-read the source.

Caveats: small n, LLM sessions are noisy, and we built the tool. Every script, prompt, raw JSON, and session transcript is committed.

We dogfooded it: every feature of the tool itself has its own TNL in tnl/.

Install: npx typed-nl init
Repo: https://github.com/janaraj/tnl
npm: https://www.npmjs.com/package/typed-nl

Happy to answer questions, especially from people who've tried plan-mode workflows and want to know where this differs.