GPT-5 Usage Guide (platform.openai.com)
Interesting. This implies that they still have not fully solved the tool-call issues that "reasoning" often introduces. Neither Google, Anthropic, nor OpenAI seems to have a reliable solution for that yet. Also, GPT-5 has only 40% of GPT-4.1's context window, so it is not a full drop-in replacement for all use cases. That said, I have never been able to reliably test any model beyond 250k tokens [0], so this potentially won't be an issue for most.
[0] Not because of any issue with Gemini 2.5 Pro or GPT-4.1, but because I personally have a hard time coming up with realistic test scenarios of such massive size. Just dumping a large code base can easily fill even a 1M context window, but testing whether everything gets properly parsed is challenging. The exception is Llama 4, which has issues even at 75k, despite the 10M context window it technically supports.