Show HN: We made GPT-4.1-mini beat 4.1 at Tic-Tac-Toe using dynamic context

5 points by farouqaldori 319 days ago | 1 comment

We wanted to test whether a smaller model like GPT-4.1-mini could beat its bigger brother, GPT-4.1, at Tic-Tac-Toe using only context engineering.

We put them in a 100-game tournament. Right before each of the smaller model's moves, we gave it a few examples of winning moves from past games.
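A rough sketch of what "examples right before the move" could look like in code. This is not the code from the repo: the `PastMove` record, the 9-character board encoding, the similarity-based selection, and the prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PastMove:
    board: str  # 9 characters, row by row: "X", "O", or "." for empty (assumed encoding)
    move: int   # index 0-8 of the cell that was played
    won: bool   # True if the side that played this move went on to win


def similarity(a: str, b: str) -> int:
    """Count the cells that are identical in two boards."""
    return sum(x == y for x, y in zip(a, b))


def select_examples(history: list[PastMove], board: str, k: int = 3) -> list[PastMove]:
    """Pick the k winning past moves whose boards most resemble the current one.

    Ranking by similarity is an assumption; the post only says the examples
    were winning moves from past games.
    """
    winners = [m for m in history if m.won]
    return sorted(winners, key=lambda m: similarity(m.board, board), reverse=True)[:k]


def build_prompt(board: str, examples: list[PastMove]) -> str:
    """Assemble the per-move prompt: instructions, examples, then the current board."""
    lines = ["You are playing Tic-Tac-Toe as X. Cells are numbered 0-8, row by row."]
    for ex in examples:
        lines.append(f"Example board: {ex.board} -> winning move: {ex.move}")
    lines.append(f"Current board: {board}")
    lines.append("Reply with the number of the cell you play.")
    return "\n".join(lines)
```

The string returned by `build_prompt` would be rebuilt before every move and sent to the smaller model, so the examples change as the board changes. The call to the model itself is left out on purpose.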

The results were clear. Without the examples, the smaller model struggled against GPT-4.1. With the examples, its effectiveness increased by nearly 200%, and it consistently won.

It's a simple demonstration, but it shows that a smaller, faster model with good, timely examples can outperform a more capable base model.

The full write-up and code are in the repo.

totisjosema 319 days ago

Other author here. This started as an experiment to see how much model performance improves when you give them examples: basically, how big of a difference do examples actually make? We also wanted to explore whether there's an ideal number of examples that gives the best results. It was quite fun, and the setup scales to battling any LLMs you want…

We have a short video walkthrough of the setup here: https://www.youtube.com/watch?v=z1MhXgmHbwk