DeepSeek V4: The Open-Source Model Frontier Labs Feared

DeepSeek V4: The Open-Source Model Frontier Labs Feared (helloai.com)

84 points by HelloAi 3 days ago | 31 comments

ndgold 2 days ago

I didn't read the article, but I will say that the value/performance of DeepSeek V4 Flash is so good that it's a lifesaver, and I'm thrilled about it.

cultofmetatron 2 days ago

DeepSeek V4 Flash is basically the Anthropic killer. I've been able to offload the vast majority of my workflow to it using OpenCode Go. Between that and the occasional use of Pro and Kimi K2.6, I don't understand what the big deal is about Claude Code.

bfivyvysj 1 day ago

How much does that cost you to run monthly?

sschueller 1 day ago

The pricing of DeepSeek V4 Flash is incredible. I have been hammering it with Kilo Code and end up spending only cents per day.

superasn 2 days ago

I think that is sale pricing: a 75% discount that only runs until the end of May.

chid 2 days ago

Still a lot cheaper. It seems wild that it's basically the price of Gemma 4.

Alifatisk 2 days ago

That's insane pricing: https://api-docs.deepseek.com/quick_start/pricing

Lucasoato 2 days ago

Do you know what kind of machine I need to run the original DeepSeek V4 Pro model with good tok/s throughput?

killingtime74 2 days ago

You don't need a machine. You need a rack of them: 1.34 TB of VRAM. https://wavespeed.ai/blog/posts/deepseek-v4-gpu-vram-require...

fgonzag 2 days ago

Nobody is serving models in BF16 precision, not even commercial providers, especially with newer quant methods (like NV4).

The article states you can fit Q4 in 4 x 4090 and it works reasonably well.

I'd personally go for DeepSeek V4 Flash at Q8, though hardware prices need to come down. Once an NV4 version gets released, it'll be easier to run on commodity hardware.

sterlind 2 days ago

Less if you quantize. Apparently Q8 and Q4 do pretty well.
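For a rough sense of what quantizing buys you, here is a back-of-the-envelope sketch. It takes the 1.34 TB figure quoted above at face value, assumes it refers to BF16 weights only (16 bits per weight), and simply scales by bits per weight. It ignores KV cache, activations, and per-block quantization metadata, so real footprints will be somewhat higher. The names are made up for illustration, and it says nothing about Flash.

```python
# Back-of-the-envelope weight-memory estimate under different quantizations.
# ASSUMPTION: the 1.34 TB figure from the thread is BF16 weights only.
# Ignores KV cache, activations, and quantization scale/zero-point overhead.

BF16_FOOTPRINT_TB = 1.34  # figure quoted in the thread, taken at face value
BF16_BITS = 16            # BF16 stores each weight in 16 bits


def scaled_footprint_tb(bits_per_weight: float,
                        bf16_tb: float = BF16_FOOTPRINT_TB) -> float:
    """Scale the BF16 footprint linearly by the number of bits per weight."""
    return bf16_tb * bits_per_weight / BF16_BITS


for label, bits in [("BF16", 16), ("Q8", 8), ("Q4", 4)]:
    print(f"{label}: ~{scaled_footprint_tb(bits):.3f} TB of weights")
```

Under those assumptions, Q8 is about half the BF16 footprint and Q4 about a quarter, which is still well beyond a single consumer GPU.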

zamalek 2 days ago

It's not really plausible to host at home unless you have deep pockets. What we gain here is a model that doesn't suddenly get worse, as the proprietary ones have been doing, plus the ability to choose a provider from a competitive market.

karmakaze 2 days ago

DeepSeek V4 Pro is still rather large. DeepSeek-V4-Flash[0] becomes relatively more reasonable with smaller quantizations, and eventually it will be able to effectively offload 'facts' to system RAM. See DwarfStar 4[1] for the current sweet spots.

[0] https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash

[1] https://news.ycombinator.com/item?id=48142108

ruxiz 2 days ago

Am I able to play with it at home?

LizaBabella 3 days ago

The cost angle is what most coverage misses. We're using Claude Haiku in production for a small consumer app and the per-call cost is genuinely fine, but the second you have any kind of multilingual fan-out the bill grows non-linearly because the same query gets re-issued in N localized contexts.

Open-weight models with strong multilingual support change the math because you can self-host at marginal cost once you have GPU capacity. DeepSeek's earlier versions already punched above their weight on non-English benchmarks (especially CJK and some Indic languages where the gap to GPT-4 was much narrower than English-only benchmarks suggested).

Two questions for anyone who's actually deployed V4 in production yet:

1. How does it handle Turkish / Slavic morphology compared to V3? In our tests V3 was solid for Russian and respectable for Turkish, but handled compound morphology in agglutinative languages a bit awkwardly.

2. Is the long-context window actually usable end-to-end or does quality degrade past ~64k like with most open models?

shivang2607 3 days ago

In my personal experience, no model comes close to Claude when it comes to coding performance. It does not matter what any of the benchmarks say.

Having said that, I really hope this DeepSeek model performs on par with the Claude Sonnet model.