Show HN: Tilth v0.5.0 -> ~40% cheaper AI code navigation (160 runs, 3 models)

4 points by jahala 116 days ago | 2 comments

Smart code reading for humans and AI agents. Tilth is what happens when you give ripgrep, tree-sitter, and cat a shared brain.

—

v0.5.0 was about figuring out why models weren’t using tilth tools consistently — even when they were available.

Results vs baseline (built-in tools only):

Sonnet 4.6: -44% $/correct (84% → 94% accuracy, 31% fewer turns)

Opus 4.6: -39% $/correct (91% → 92% accuracy, 37% fewer turns)

Haiku 4.5: -38% $/correct (54% → 73% accuracy, 7% fewer turns)
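To see how much of each $/correct drop comes from cheaper runs rather than higher accuracy, here is a rough sketch. It assumes that $/correct is defined as average cost per run divided by accuracy; the post does not spell out the definition, so treat this as an illustration and check the benchmark README for the actual method. The function name is made up for this example, and the inputs are the rounded percentages quoted above.

```python
# Assumption (not confirmed by the post): $/correct = cost_per_run / accuracy.
# Under that definition, the change in cost per run can be backed out from
# the reported change in $/correct and the before/after accuracy.

# model -> (relative change in $/correct, baseline accuracy, tilth accuracy)
RESULTS = {
    "Sonnet 4.6": (-0.44, 0.84, 0.94),
    "Opus 4.6": (-0.39, 0.91, 0.92),
    "Haiku 4.5": (-0.38, 0.54, 0.73),
}


def implied_cost_per_run_change(delta_cost_per_correct, acc_before, acc_after):
    """Return the relative change in cost per run implied by the inputs.

    Hypothetical helper: rearranges
        (cost_after / acc_after) / (cost_before / acc_before) = 1 + delta
    to solve for cost_after / cost_before - 1.
    """
    ratio = (1.0 + delta_cost_per_correct) * (acc_after / acc_before)
    return ratio - 1.0


if __name__ == "__main__":
    for model, (delta, before, after) in RESULTS.items():
        change = implied_cost_per_run_change(delta, before, after)
        print(f"{model}: {change:+.1%} cost per run (implied)")
```

If that definition holds, the Sonnet and Opus savings would come mostly from cheaper runs (roughly 37-38% less per run), while much of the Haiku gain would come from the accuracy jump. The inputs are rounded, so the outputs are approximate.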

—

https://github.com/jahala/tilth/

Full results: https://github.com/jahala/tilth/blob/main/benchmark/README.m...

PS: I don't have the budget to run the benchmark a lot (especially with Opus), so if any token whales have capacity to run some benchmarks, please feel free to PR results.

joknoll 115 days ago

I love the idea of improving models not only by giving them more "cognitive" power, but also by improving the harness, where improvements seem to be very low-hanging fruit compared to advancing frontier models. This could also make older/smaller models viable for coding agents.

jahala 114 days ago

Hey @joknoll - in the benchmarks, I'm seeing very positive results with Haiku, getting quicker and more correct answers. So I think you're absolutely right that harness improvements will be a natural part of "sharpening" most models - especially the smaller ones with less reasoning capability.