The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin

simonw 12 hours ago |

First model I've tried that gave me back HTML with a "Change Pelican Color" button: https://static.simonwillison.net/static/2026/hy3-preview-pel...

(Transcript: https://gist.github.com/simonw/c2a0d8ecd3056a2681319eae8fc3f...)

cwmoore 5 hours ago | |

But…and I’m sure I’m not alone here…that is a snowman, and what it is on is not a bicycle.

What do we think we are doing with this life?

fragmede 12 hours ago | |

Haha does it get bonus points for the extra button, or does it fail because html != SVG?

dodslaser 12 hours ago | | |

Any bonus points for the color sre immediately subtracted because the "animate wheels" button leaves the wheels stationary and makes the sun rotate.

MostlyStable 11 hours ago | | |

I wonder if it is actually animating the wheels as well, but just managed to match up the spin rate to the gap size.

Garlef 9 hours ago | | |

Judging from the dotted trajectory lines, it even "thought" about giving the bike a wobble.

(But maybe that's just my interpretation based on something else going wrong in the animation)

cicko 10 hours ago | | |

That depends on the perspective. If you're on the Sun, the wheels rotate around you.

fragmede 11 hours ago | | |

Hy3 is a Scandinavian model, and is leaking that out via Norse mythology about Sol being a wheel!

postepowanieadm 10 hours ago | | |

ROTFL

preek 7 hours ago | | |

It actually rendered an SVG inline in the HTML page. I just tested the SVG and it renders itself just fine, including colors. So, tbh, I'd say the task has been properly achieved.

embedding-shape 3 hours ago | | |

Maybe I'm just extremely nitpicky, but I'd consider that a failure, as the prompt is asking for SVG, not HTML.

Bit like asking for CSS and then getting a HTML file back with the CSS embedded, that was not what I was asking for!

Aurornis 14 hours ago |

> Two new models are now beating LLM darling Claude in terms of token usage and by more than 50%?

Time for a reminder that OpenRouter leaderboards only show tokens sent through OpenRouter, which most Anthropic API users don’t use.

svantana 6 hours ago | |

I would think that's true for all the models on OR. The data is skewed for sure, but it's interesting none the less.

killingtime74 4 hours ago | |

Are you next going to say YouTube rankings don't take into account videos that aren't on YouTube and Spotify rankings don't take into account songs that aren't on Spotify?

bandrami 12 hours ago |

For the life of me I will never understand the thought process that leads you to say "we don't really know who developed this LLM but I'm going to feed all of my business's data to it"

WithinReason 8 hours ago | |

It's from Tencent, says it in the article:

https://hy.tencent.com/research/hy3

bandrami 8 hours ago | | |

Right but Tencent is a massive half-state-controlled holding company so that's not really helpful.

throawayonthe 4 hours ago | | |

but we know who they are? how is this relevant

minraws 7 hours ago | | |

OpenAI & Anthropic are deeply in bed with US govt, and they need US govt approval before model releases, and all US Companies under various acts need to share data with the govt.

I mean sure there are investors and a little more open-ness, but with the example of Mythos we don't even know if public will get access to the "good" stuff because it's too dangerous.

If your only opinion on trusting these companies more than one based in China is, they are Chinese then good luck, all the best.

estearum 6 hours ago | | |

The difference is "the various acts" in the US are things that are largely very hard to do, extremely limited in scope, and companies who dispute the government's propriety can (and do) go to court to fight it.

Sure "China bad, US good" is naive, but certainly not more naive than suggesting that companies and individuals have similar rights and protections as each other.

> and they need US govt approval before model releases

This is just not true and it would be a gigantic legal battle to make it true against the model companies' wishes, which is indicative of your entire misunderstanding here.

adrian_b 4 hours ago | | |

There was recently some announcement from the US govt itself (after the Mythos announcement) that they were pondering about allowing model releases from now on only after approving them.

So it may not be strictly true for the moment, but it is certainly something that the current US govt can mandate at any time.

bandrami 7 hours ago | | |

Well, I mean, just as a legal question I'm not allowed to use Chinese software at work, so yeah that's kind of definitive for me

nl 6 hours ago | | |

> and they need US govt approval before model releases

This isn't the case (yet).

irthomasthomas 6 hours ago | | |

It is for models trained with 10^26 flops. Anthropic confirmed Mythos was less than this. You could estimate the upper bound on model size from this.

est 11 hours ago | |

> I'm going to feed all of my business's data to it

Your business data is probably worthless, even considered harmful for the pretrain corpus.

Your interactions and decision making process are most valuable parts of the whole business.

bandrami 11 hours ago | | |

I assure you my business's data is not remotely worthless which is why there are pretty strict laws and regulations about what we can do with it

TZubiri 10 hours ago | | |

>Your business data is probably worthless

please tell me you are not in charge of the data of any business I'm a client of

elpocko 8 hours ago | | |

Could be! Let's check. I just need your name and address, your SSN, a list of businesses you are a client of, and a DNA sample.

est 9 hours ago | | |

to clarify, probably worthless to AI vendors, but might be useful for third-parties.

TZubiri 9 hours ago | | |

Third parties that can be clients of the AI vendor...

selcuka 7 hours ago | | |

If it's worthless to AI vendors, they won't include it in the training corpus, so third parties won't have access to it.

kirtivr 9 hours ago | |

You don't need to know who developed the LLM - whether it was Google or OpenAI.

What you need to know is who is the provider for the LLM, and whether their endpoints are zero data retention enabled and opted out of training. OpenRouter gives you an easy way to control this.

lmf4lol 9 hours ago | | |

This is not entirely true and ignoring a couple of potential attack vectors like Data Poisoning: https://arxiv.org/abs/2408.12798

Its of course highly dependant on the use case and the environment, but simply saying that the only important part is to know where the data goes is too simple.

koiueo 9 hours ago | | |

How can openrouter control what LLM provider does with your data on their side?

kirtivr 9 hours ago | | |

OpenRouter and the provider sign a contract clearly specifying how input data is to be handled.

It's the same way we trust OpenAI to not train on our data if we've opted out although there is no control on whether they can retain the data indefinitely.

lmf4lol 9 hours ago | | |

I really dont want to be cynic but those guys gave a flying f””” about copyright while scraping the whole internet. How can I ever trust them to respect the oot-out setting. I cant. Thieves be thieves.

And even if they dont train on the data. Who guarantees us, they dont let another AI model analyse all the data, exfiltrating all kinds of intelligence and using it? I only can imagine what OpenAI and Anthropic know….

astrange 9 hours ago | | |

Scraping the internet isn't a copyright violation. Using it for LLM training is much more transformative than Google and Internet Archive, which are legal.

koiueo 8 hours ago | | |

Contracts means shit if they are not enforceable.

Ask yourself

1. How would you know the provider has violated the contract?

2. How could you prove it?

3. Why would OpenRouter take your side in this (unlike your example with OpenAI, you're not a signing party)?

4. How would OpenRouter enforce the contract after all three above are somehow resolved in your favor?

IANAL, but IMO it's all a legal theater.

EDIT: formatting

ddalex 12 hours ago | |

what can it do ? it's just a big set of numbers, if you trust the host that's good enough

what266262 12 hours ago | | |

If you are ok with everything being fed into it being stored forever I guess it’s no problem. I don’t see how you trust them if you don’t know them.

Dylan16807 11 hours ago | | |

Who is "them" here? The developers and the hosts are not the same.

bandrami 10 hours ago | | |

(And either one is a threat vector)

ddalex 5 hours ago | | |

where would it be stored ? it's just a big set of numbers.

Mashimo 11 hours ago | |

If you Code open source projects anyway, might give it a spin.

st3fan 5 hours ago | |

How do you “feed data into a model” ? Use the correct terminology and concepts please. It is important.

zone411 11 hours ago |

I’ve tested this model on four of my benchmarks:

https://github.com/lechmazur/buyout_game 10th out 36.

https://github.com/lechmazur/pact/ 14th out 25.

https://github.com/lechmazur/nyt-connections/ 60th out 81.

https://github.com/lechmazur/debate 16th out of 29.

simonw 12 hours ago |

OpenRouter rankings frustrate me, because they show the total number of tokens but they provide no indication of how many unique users a model has.

Which means if a surprise model tops the leaderboard one week we can never be sure if it was because a single whale user pushing billions of tokens a day switched to it, or if it represents a genuine community trend towards that model.

svantana 7 hours ago | |

Also, while we're pitching new features to openrouter, I'd like to see a "$ spent" chart, which would remove all these huge freebie spikes. It looks like it would be pretty much dominated by claude.

senordevnyc 9 hours ago | |

Agreed. My little solo dev SaaS app’s production pipelines push almost two billion tokens a day.

senordevnyc 2 hours ago | | |

Haha, I never tire of the AI haters downvoting stuff like this.

Down with reality!!

andai 15 hours ago |

So basically, Hy3 is the cheapest decent model on OpenRouter, unless you use DeepSeek as the provider for DeepSeek V4 Flash, in which case DeepSeek's insane caching wins out. (And Hy3 is close-ish on the benchmarks.)

0xbadcafebee 14 hours ago | |

You need to use DeepSeek API directly to gain the extra caching benefits. The DeepSeek provider on OpenRouter is only the 5th-cheapest for V4 Flash, so you have to specify DeepSeek provider when calling OpenRouter. But DeepSeek's API discounts on its models only applies if you call DeepSeek directly. So anyone using OpenRouter to call DeepSeek models is actually losing quite a bit of money.

NitpickLawyer 11 hours ago | | |

> The DeepSeek provider on OpenRouter is only the 5th-cheapest for V4 Flash

You might have the default settings on your account, which limit Deepseek as a provider. If you disable that feature you see them on openrouter as well (and they serve it at the same cost as their own API).

0xbadcafebee 10 hours ago | | |

I just checked my settings and I have everything enabled. https://openrouter.ai/deepseek/deepseek-v4-flash?sort=price (per-1M price) shows DeepSeek provider as #5. https://openrouter.ai/deepseek/deepseek-v4-flash/pricing?sor... (effective price) shows them as #3. The effective price will change your total cost since each provider has a different price for input vs output vs cache, so what's #1 and #5 for one person could be #5 and #1 for somebody else, depending on their workload.

However, I just double checked, and OpenRouter's pricing page for Flash v4 with DeepSeek provider shows a cache hit rate of $0.0028, which is the same as on DeepSeek's official API pricing page ($0.0028), so they do seem to be the same price, (assuming DeepSeek is able to pin your specific OpenRouter requests to the same DeepSeek server). OpenRouter adds 5% to that cost, but still it might be cheaper than the other providers.

Also just found out OpenRouter has a new feature "Response Caching" where they can cache identical requests and return them immediately with no billing. The entire request must be identical, though, not just a prefix, and you have to enable this feature. I don't know who would need to send multiple identical requests, but it's better than nothing?

NitpickLawyer 10 hours ago | | |

Interesting, it seems we have some providers offering dsv4-flash cheaper than ds themselves. For the full model it's the other way around, all 3rd party providers are 2x+ more expensive.

beacon294 13 hours ago | | |

ZDR is also on by default and deepseek is not ZDR.

0xbadcafebee 14 hours ago |

> it makes sense that a cheaper model would prevail, but only if it offered similar quality

You're trying to think logically, which has no place in an AI discussion. :) People just jump to whatever the latest model is. Plenty of people also prefer price to "quality" (which is very subjective). It's new, it's cheap, so people use it. It's likely people will stop using it when something else is cheaper and/or newer.

olmo23 7 hours ago | |

Since my employer pays for it, I just select the latest and greatest.

cicko 10 hours ago |

How is it a "mysterious" model? It's Tencent's Hy3?

theanonymousone 10 hours ago | |

My question as well. Isn't Tencent a very well-known company? Maybe the mystery is in the model itself?

alecco 10 hours ago |

PSA: Don't use OpenRouter for DeepSeek V4 as it messes up you caching. Use DeepSeek API directly and you'll get 2x to 3x more cached tokens.

vessenes 13 hours ago |

Since there’s only one inference provider it could be a recycling/ad experiment. The similar usage between trial and paid periods would be explained by this as well.

lithiumii 10 hours ago |

What's so mysterious? Isn't it from Tencent?

gmerc 7 hours ago |

Very mysterious: https://huggingface.co/tencent/Hy3-preview

thot_experiment 11 hours ago |

Tried this extensively in OpenCode, never used it once since Gemma 4 came out, got into thought loops and did stupid edits I didn't ask for more often than the local 31b model. One of the worst "frontier" models I've ever tried.

segmondy 4 hours ago |

High token usage cuz it's free doesn't count

freakynit 12 hours ago |

This was originally a 400+B param model which was later reduced to 295B considering it as the "optimal zone".

https://www.mdshare.online/s/uend0pj3og_A_rgcxzINf