The core insight: when Claude answers a question alone, it's confident. When Claude has to answer after reading Gemini's counterargument, it's more careful and often corrects itself. The adversarial pressure is the feature.
A few things that surprised me building this:
1. Model diversity matters more than model quality. Three different models debating beats three instances of the best model.
2. Sequential > parallel. Models that read prior arguments produce substantively different (better) responses than models answering blind.
3. The moderator role is critical. Without synthesis, you just get three opinions. The moderator finds where they agree, where they disagree, and
why.
Would love feedback on the MCP integration experience — that's been the trickiest part to get right (resumable SSE streams, OAuth flow, structured tool outputs).
If Interested, hit me up and I share a 50% discount code with you.