Openrouter Fusion API

78 points by tdchaitanya 5 hours ago | 26 comments

dsl 1 hour ago |

Heh. I built "Fusion" a few months ago as an MCP using OpenRouter. The idea was to give Claude a "panel of experts" to go talk to when it got stuck.

After extensive testing and benchmarking I discovered that when you ask one model to judge another's response you don't actually get a better answer. You are just asking it "how closely does this resemble the answer you would have given me." Additional rounds and all the "obvious" solutions that pop into your mind reading the proceeding sentence are essentially just cranking up the temperature.

I did find a solution, but it is insanely expensive. Maybe if this gains traction I'll release mine.

jiaosdjf 48 minutes ago | |

But.. but I told the LLM that it is an _expert_, is that worth nothing??

arizen 1 hour ago |

Some anecdata on Fusion: I run same query I used for Fable on OR Fusion and results were worse.

It felt, like Fable was able to kinda grasp very deep knowledge/intelligence layers and outline solution not only in agreeable way, but rather it proposed to prioritize solution items, with discarding some of the items, which made a lot of sense to me.

While Fusion felt more like a bit diversified answer of the same class of pre-Fable SOTA models, without touching the depth of knowledge/intelligence layers, which Fable was able to get, in my very limited tests I did, while Fable was accessible.

ljlolel 19 minutes ago |

Similar feature launched open-source and end-to-end encrypted on my TrustedRouter https://trustedrouter.com/

michaelbuckbee 2 hours ago |

I ran a quick eval to see what this looks like qualitatively vs just calling Opus 4.7 or GPT 5.5 directly.

As expected, Fusion was 7x slower and 4x the cost.

This isn't a knock against it, just that it I think this places Fusion into a "use it only when you need it" category.

https://3fpi5avcqq.evvl.io/

IanCal 1 hour ago | |

Which models were you using under this? If you used the quality default as exists in the interface, it makes sense that it was ~4x the cost as it'd be 3 frontier models judged by one of those.

The idea would be to use fusion with simpler, cheaper models.

nielsole 1 hour ago | |

Sounds like fusion would be a really good distillation target?

galsapir 1 hour ago | |

yeah its really counterintuitive i think; i.e, getting the right framework and structure for this to work probably isn't trivial, models really hate playing well together. i wonder how their version would fair in real world use.

bsenftner 1 hour ago |

I'm sure many have made something like this, I've done a few. I've found simply submitting one's prompt to multiple models to be kind of pointless. You're just going to get statistical noise from the variances in their training methods, as they are all training on pretty much the same data.

I get significantly better results by pre-prompting each LLM (they can be the same LLM too, just another instance), I pre-prompt them to approach from a different perspective. Basically, I create expert personas that each believe they are someone of a different career, different intellectual perspectives, and then that generates a real debate between experts.

Oras 56 minutes ago | |

Agree, and I see opus and Gemini pro as “quality” on openrouter fusion, this would be super pricy if the prompts are dynamic and not optimised for caching.

I would love to hear why they have created it, what was the business case, what this is going to serve? As you said, this is pretty easy to replicate

_pdp_ 1 hour ago |

You could easily distribute the same task to 5 subagents that are specifically programmed to do as best as they can based on their scope and merge the results into a single coherent response.

That is more or less the same thing.

I am not sure who is the intended user of this fusion api as with all things prompt + model matter.

vidarh 1 hour ago | |

People who don't want the hassle. A lot of Openrouters selling point is removing hassle, and providing things like this can move them up the value chain for people who aren't very cost sensitive and are happy to pay to get better outcomes without having to do the work themselves.

andai 2 hours ago |

Context:

Surpassing Frontier Performance with Fusion

https://news.ycombinator.com/item?id=48525392

And a slightly better UI here: https://openrouter.ai/fusion

On OpenRouter's fusion API your request is routed to several models simultaneously and a judge model combines their answers into a final response. This significantly boosts performance, at the cost of time (at least on the one benchmark they tested, a deep research benchmark).

They have a Budget preset consisting of 3 cheaper models (which roughly matches Fable on that benchmark, costing half as much), and a Quality preset of 3 expensive ones (which beats Fable, but costs twice as much as Fable).

Pareto graph: https://openrouter.ai/blog/images/blog/fusion-benchmark-cost...

Curiously, fusing a model with itself also boosted performance (2xOpus4.8 roughly matching Fable on the benchmark, but costing twice as much as Fable). There's a further, smaller gain from mixing different models. The main gain seems to be from additional test time compute.

Would love to see more research on this, especially focusing on the cheap models that came out recently (e.g. Fusing DSV4 with itself, or with Mimo), and to see what the tradeoffs look like between running a fusion (parallel test time compute) vs increased reasoning or turns.

rektlessness 1 hour ago |

I tried OpenRouter Fusion with the budget model option but swapped out DeepSeek v3.2 for DeepSeek V4 Pro. The results weren't that bad. An interesting take on quorums for sure. However I did notice a tool call to Claude Opus 4.8 for 1168 - 237 tokens, and $0.0118 cost, which I cannot account for because Opus was not in my selection and only revealed in logs. Strange.

maccam912 1 hour ago | |

Same for me! I bet they use opus to synthesize the final answer somehow? Regardless, it was unexpected.

eknkc 1 hour ago |

I opened the page and prompted it `Which 3d printer is the best`. I mean this is a stupid question but I was looking at some 3d printers so it popped into my mind.

Seeing this log is interesting: https://link.ekin.dev/6RzYGGX7

It came up with a decent response but I guess Opus or GPT 5.5 would do fine anyway. Gotta try it on different stuff. But this feels like it would work great on some situations.

bushido 1 hour ago |

Interestingly I've had a similar experience with agent teams/swarms, albeit they can get much more expensive depending on the workflow.

I found that Fable didn't have as much of an impact when put in a team.

But it was/is a very pleasant model to work with 1:1. And was the first time I didn't use my primary team based workhorse in months, across 10s of sessions last week.

Havoc 2 hours ago |

Interesting. Will definitely use this.

One scenario I can see it working is writing markdown specs before the coding starts and analysing it for gaps. That’s so few tokens that throwing as much LLM against it as possible is worthwhile regardless of cost per million tks

egeres 2 hours ago |

I wonder if these fusion techniques could help to run better local AI by streaming tokens from multiple machines and combining them

galsapir 1 hour ago |

really interesting that its basically almost 80% claude opus..