Claude vs. OpenAI GPT-4 generated content, side-by-side comparison

- OpenAI GPT-4 https://gist.github.com/adaboese/12e3c3d28783bc831c202ad1e55d932b
- Claude 3 (Opus) https://gist.github.com/adaboese/d0b7397381726a7d394920e6a82ee39c

Both of these are outputs of the AIMD app. They were not made with a single prompt, but with RAG and over a dozen instructions. This makes it possible to test a fairly broad range of expectations, such as adherence to instructions, error rate, and speed. Since the two model APIs are mostly compatible, I decided to compare them side by side.

A few interesting observations:

- Claude followed instructions much more closely than OpenAI. The outline provided in the initial instructions is pretty close to the final article structure, despite multiple revisions.
- Claude's output scored better on its use of a broader set of data formats (tables, lists, quotes).
- Contrary to many tweets, Claude's output is not excessively verbose. Worth mentioning that part of the RAG instructions is to rewrite content for brevity.
- Claude took 5 minutes to execute 52 prompts. OpenAI took 7 minutes.
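
For a rough sense of the speed gap, the timings above can be turned into per-prompt averages. This is a back-of-the-envelope sketch, not a benchmark: it assumes both runs executed the same 52 prompts (the post only states the count for Claude), and the function name is made up for illustration.

```python
def seconds_per_prompt(total_minutes: float, prompt_count: int) -> float:
    """Average wall-clock seconds spent per prompt."""
    return total_minutes * 60 / prompt_count


PROMPTS = 52  # stated for Claude; assumed to be the same for the OpenAI run

claude_s = seconds_per_prompt(5, PROMPTS)
gpt4_s = seconds_per_prompt(7, PROMPTS)
time_saved = 1 - claude_s / gpt4_s  # fraction of GPT-4's time that Claude saved

print(f"Claude: {claude_s:.1f} s/prompt")       # Claude: 5.8 s/prompt
print(f"GPT-4:  {gpt4_s:.1f} s/prompt")         # GPT-4:  8.1 s/prompt
print(f"Claude used {time_saved:.0%} less time")  # Claude used 29% less time
```

These are averages over the whole run, so they say nothing about how individual prompts varied.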