DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost

DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost(esengine.github.io)

147 points by Alifatisk 4 hours ago | 85 comments

I'm not sure you need a "DeepSeek native coding agent" to take advantage of DeepSeeks cache, yesterday as the Codex quota usage issue still wasn't solved for me, I wrote a tiny little bridge so I could use DeepSeek V4 Pro via Codex, and seems most of everything I did was basically cached as far as I can tell: https://i.imgur.com/7eKn6wN.png (2026-05-23 Input (Cache hit): 39,123,200 tokens, Input (Cache miss) 1,692,286), and the bridge is doing not special, just massage the DeepSeek API shape into what Codex expects, nothing particular about caching at all.

Besides being even better at the caching, I'm not sure what benefits you'd get compared to just firing up OpenCode with the DeepSeek API yourself, it'll similarly do caching for sure and also "talks directly to api.deepseek.com" if that matters, and you'll get a much more mature harness.

3uler 2 hours ago | |

Opencode has really bad cache stability issues that they seem uninterested in fixing at the moment.

dathery 33 minutes ago | | |

The OpenCode devs talk about this on Twitter a lot, e.g. https://xcancel.com/thdxr/status/2048268697790300343

> tool call pruning breaks cache and people will tell you this is horrible and expensive

> except i looked at some anthropic data and real user behavior ends up with better cache hits and 30% less spend

> even this is needs to be analyzed further, it's just not simple

> for openai data it's inverted! cache hit ratio is actually better [sic: I think he meant worse based on the screenshot] with tool call pruning turned on

> but the net $ saved is only 5%

> kimi is a funny one - it has better cache hits with pruning on...but is also more expensive!

There was also another thread recently where he discussed that pruning improves user experience (models are smarter with less context) but I can't find it.

This can also be disabled in the config: https://opencode.ai/docs/config/#compaction

huqedato 52 minutes ago | | |

I can't confirm this. Having utilized Opencode for a large project over the past 10 months, with multiple models and agents, we've never run into such 'cache stability issues'."

embedding-shape 1 hour ago | | |

That'd be really easy to spot and also fix, most likely. Any open issue you could point us to, must surely been reported already?

bwfan123 2 hours ago | |

> I wrote a tiny little bridge so I could use DeepSeek V4 Pro via Codex

Can you share the bridge. DeepSeek v4 is awesome paired with claude-code or opencode. I found that claude code costs me less than opencode and I am presuming this is due to a better engineered harness.

embedding-shape 2 hours ago | | |

Sure, keep in mind it's a steaming pile of hacked together hacks, probably won't work in every case, doesn't support every feature that should be supported (like parallel tool calling, both Codex + DeepSeek API support it), and it might make your computer catch on fire: https://gist.github.com/embedding-shapes/eab3e63e5a95d3d78a2...

I only used it for a few hours to play around with stuff before the quota issue was fixed and I could resume using GPT models, and the bridge was coded by DeepSeek-V4-Flash-IQ2XXS + DwarfStar4 locally, I take no responsibility for what might happen with your computer or you, during usage or just reading the code.

Edit: heh, like don't look at line 117 for example where seemingly it likes to handle misspellings in the .env file which totally wasn't my fault for typo'ing the API key in that file... I'm sure there are tons of sharp edges and dumb stuff in there.

bayesianbot 10 minutes ago | | |

LiteLLM can serve OpenAI API endpoint IIRC and proxy that to other providers like DeepSeek, should work with Codex

Den_VR 1 hour ago | | |

I’m feeling more a novice every day, but how isn’t this just handing over your code to team deepseek for whatever they might want

himata4113 2 hours ago | |

this appears to be native to the terminal, as in, there's no special application that runs or wraps an agent inside a tui. So basically instead of commands you type plain english?

embedding-shape 2 hours ago | | |

> this appears to be native to the terminal, as in, there's no special application that runs or wraps an agent inside a tui

Same with codex? codex-rs at least, is a TUI as well, it does run a "app-server" in the background, that the TUI actually interacts with, but that's just an implementation detail. Also makes it easy to hook in your own programs to fire of codex "headless" sessions even without the TUI.

skeledrew 2 hours ago |

Not a fan of that page. The animated typing and resulting continuous resize of the example keeps moving the content beneath it down and up. Such bad UX.

embedding-shape 2 hours ago | |

Agents or no agents, people still need to test their websites on different resolutions or at least window width, but seems this is becoming a lost art.

mirekrusin 1 hour ago | | |

Yeah, doesn’t look designed for people who want to read it beyond animated typing animation.

m4rkuskk 35 minutes ago | |

Claude design AI slob.

danborn26 33 minutes ago |

High caching rates for coding agents can drastically reduce latency and API costs. I am curious to see how the caching strategy handles context invalidation across multiple files.

imagetic 48 minutes ago |

https://shittycodingagent.ai

chabes 47 minutes ago | |

Aka pi.dev

andai 21 minutes ago |

But Claude made the website?

Alifatisk 10 minutes ago | |

What conclusion are you drawing from that?

unshavedyak 1 hour ago |

It's pretty funny, i'm a $200/m Claude subscriber and i've had little need to use anything else. However the more Claude has been restricting my workflow (notably around the recent IDE/-p usage change) the more i've been wanting to go elsehwere.

I'm concerned since i really want SOTA reasoning, but DeepSeek still has me interested.

0xbadcafebee 6 minutes ago | |

[delayed]

Alifatisk 25 minutes ago | |

> I'm concerned since i really want SOTA reasoning

I think you should give other models a try and see how much they differ from SOTA models. I did this and realized, even Qwen-2.5-Max was enough. I am sure even Claude Sonnet 3.5 is enough for things I play around with. I am not really striving for fields medal in Mathematics.

logicchains 36 minutes ago | |

If you want SOTA reasoning you should be using GPT 5.5 Pro.

auggierose 2 minutes ago | | |

Codex has only GPT 5.5

hebetude 1 hour ago |

Wow the UI looks exactly what I vibe coded yesterday. What a coincidence

huqedato 51 minutes ago | |

It's obvious why...

schaefer 2 hours ago |

Okay, I'm curious.

From the FAQ, I see:

>Can I point it at a self-hosted / private DeepSeek endpoint?

>Yes. Since 0.30 we accept non-standard key prefixes for self-hosted DeepSeek endpoints. Just point `baseUrl` at your internal address — the loop, cache strategy, and tool protocol are unchanged.

But my question is: If I use Reasonix to talk to a deepseek endpoint through openrouter, am I still getting the cache-hit benifits of this agent harness?

csunoser 1 hour ago | |

Yes*. At least from my limited usage of deepseek-flash for a few billion tokens on openrouter, the cache-hit rate is >95%. And I simply used the claude code harness pointed at the openrouter anthropic compatible endpoint with no fluff.

schaefer 1 hour ago | | |

thank you!

declan_roberts 2 hours ago |

I love the focus on cache hit efficiency. Hats off to the deekseek team for creating a great product that maximizes cost efficiency for the user.

bwfan123 2 hours ago | |

> Hats off to the deekseek team for creating a great product

I have been using it for a while, and I wholeheartedly agree. imo, it is as good as codex or claude which I also use. It is a winner in the cost-sensitive tier, and if some startup could put it together with data-retention in mind, it could be a great product sold to the enterprise, as data-retention and privacy are the main issues for the coding-assistant usecase.

chillfox 1 hour ago | | |

Deepseek v4 pro is definitely my preferred cheap model, it's very good, and I use it all the time for my personal projects (opencode go plan), but I also use Claude Opus all the time at work and Deepseek is not as good as that, but it does compete with Sonnet for capability, and beats it on price.

nicce 1 hour ago | |

Just in case, note that this project is someone's side project

> Independent open-source project · not affiliated with DeepSeek

Bombthecat 1 hour ago | |

Adding already cheap API cost and you probably could let it run for days and the same task..

stavros 2 hours ago | |

How can you have cache hit efficiency? Isn't it just a matter of not changing the previous context? I don't understand what knobs there are to tweak on this.

everforward 1 hour ago | | |

> Isn't it just a matter of not changing the previous context?

Yes, but a lot of harnesses change previous context. E.g. the system prompt injects the current time/date, working directory, files in the working directory, etc. Compaction also changes the whole previous context. I _think_ changing the list of tools also invalidates cache, so invoking a subagent with different tools would invalidate the cache.

My vague impression is that it's in a similar vein to functional programming languages. It generally disallows doing things that lead to bugs (cache misses in this case), and presumably allows you to do those things in a way that makes it much clearer that this is likely to cause cache misses. I would guess that in this paradigm, you don't mutate your existing session, you derive a new session by mutating the prior context into a new context.

mmaunder 1 hour ago |

Unusable thanks to the top animation pushing the rest of the site down repeatedly as you’re trying to read.

Hfuffzehn 27 minutes ago |

This is really tickling the conspiracy theorist part of my brain.

"Independent open-source project · not affiliated with DeepSeek" "Reasonix only targets DeepSeek because..." "Why DeepSeek only? Can I swap to Claude / GPT? It's a design choice, not a limitation"

The lady doth protest too much, methinks?

Nicely timed shortly after the making the rebate permanent anouncement.

Could just be Chinese devs trying to help western devs with some software and a western facing marketing campaign to raise awareness. Could be DeepSeek astroturfing. Could be "someone" in China trying to get more access to western data.

Who knows?

fouric 43 minutes ago |

I don't think it's particularly effective to create a new coding agent when there's existing open-source agents (especially extremely extensible ones like Pi) that already optimize for cache hits, have far larger communities, and work for providers other than Deepseek.

I specifically use multiple different models and providers, so this wouldn't be useful for me.

And it contributes to the problem of each person vibe-coding their own, incompatible, half-baked tool in a space, instead of contributing to a small set of tools and expanding them.

It'd be better to just extend an existing tool.

ricardobeat 46 minutes ago |

> The loop is append-only, engineered around DeepSeek's byte-stable prefix cache — long sessions hold 90%+ cache hit and input-token cost collapses to ~1/5. Terminal-first, leave it running.

AI marketing slop. This is how all models and coding harnesses work, isn't it?

The author claims (in another AI-written post):

> LangChain — along with every generic agent framework I checked — rebuilds the prompt every turn. Timestamps get injected. History gets reordered. Tool schemas re-serialize with different whitespace.

I haven't touched LangChain in a long, long time, but don't think any of the current harnesses, Claude Code, Pi, Crush, OpenCode etc do that except if you change configuration? Keeping the context stable for caching is a very basic principle and not a wild innovation.

This posing as DeepSeek-specific is also a mystery.

yalogin 1 hour ago |

Can someone give me a eli5 version of what this is? It really sounds useful to Claude subscribers.

Is this improving the cache hit and hence overall efficiency of coding workflows?

Does it also let me host a local llm (deepseek)? What are model min requirements for this?

timcobb 53 minutes ago | |

You can also ask Claude and get an immediate answer, the power is yours

pkulak 54 minutes ago |

Doesn't Pi Agent do exactly this? Assuming "append only" means they do some kind of compaction as well.

hirako2000 2 hours ago |

Good timing given the cost spike across other frontier models.

notjes 2 hours ago | |

Good thing DS just made their discount permanent. https://x.com/deepseek_ai/status/2057854261699195173

theanonymousone 2 hours ago |

Isn't caching a server-side thing? How does the agent affect it, significantly at least?

embedding-shape 2 hours ago | |

Say you put the current time down to the second in the system prompt, which is the message that goes in front of the entire conversation, then basically nothing will be cached, every agent turn needs to ingest the entire session over and over. Contrast to not doing that, and the backend can leverage caching all the way up to the latest message, as nothing until then changed.

esperent 2 hours ago | | |

Surely other agent CLIs are not dumb enough to invalidate cache on every turn over something so obvious?

theanonymousone 57 minutes ago | | |

Yes, of course you can destroy it. But how far can you "improve", beyond decent "common sense" behaviour.

singiamtel 1 hour ago |

I would've liked benchmarks against other harnesses showing the caching performance

Alifatisk 23 minutes ago | |

Is there benchmarks and measurements that offers comparisons between different harnesses?

hmokiguess 44 minutes ago |

Click on the download page, it's hilarious. It has a lot of information about the "smart probe" on the download and it's a realtime probe you can rerun.

That's the pinnacle of AI slop over engineered garbage in my opinion. All of that information is noise.

am17an 46 minutes ago |

This Claude front end skill is now soon to be slop.

ricardobeat 43 minutes ago | |

Already is. Every new website looks exactly the same.

quotemstr 1 hour ago |

> no reordering, no marker-based compaction

Is this really the behavior you want? Yes, doing tool-result clearing and such will blow your cache, but if you do it only occasionally, it's still likely a win. Yes, cache hits are good, but not so good that it's okay to be profligate with context to preserve those precious, precious KVs.

sergiotapia 2 hours ago |

What AI model did you use for the website design? This is the second one I see with the exact same font and color scheme. Just curious because Claude models lean towards purples for example. Thank you!

pcwelder 1 hour ago | |

Opus 4.7 selects such palette and motifs by default. Might even be first iteration of claude design.

franga2000 2 hours ago | |

This design still screams Claude to me, but a newer version than what you're thinking of. At some point they added a markdown file that tells it to use obviously AI designs like lots of blue/purple and gradients. Since then, this is its new style.

sheepscreek 2 hours ago | |

DeepSeek v4 perhaps?

FergusArgyll 1 hour ago | |

Frontend design skill by Anthropic specifically says not to use purple. I'd be surprised if it still uses purple. Have you seen that recently?

canadiantim 2 hours ago |

So what's best low cost coding agent these days? Kimi 2.6? Qwen's latest closed model? Composer 2.5? DeepSeek?