I wrote about why single-model AI has a structural quality ceiling and why ensemble/hybrid approaches consistently outperform it: https://philippdubach.com/posts/the-impossible-backhand/
I agree that models are better at different tasks: gemini-cli is superficial, codex is stubborn as a mule but dependable, claude-cli just wants to get something working and done. qwen-cli, and Qwen in general, tends to swing back and forth too much.
I've also cut my own team down to two: codex and claude.
I need a tool to put them in a loop together to get this done more efficiently… I guess I’ll plug this in as a prompt and go from there!