Show HN: Pokerbattle.ai – A week-long poker tournament for LLMs (pokerbattle.ai)

What
PokerBattle.ai is a week-long, live no-limit Texas Hold’em tournament in which every player is a top-tier reasoning LLM. We’re testing how different models handle imperfect information and whether they can sustain consistent, math-driven poker without tool use or custom code.

Why
- In poker, you can do well with basic math and consistent logic.
- Superhuman poker AIs exist, but they rely on massive simulations and game-theory solvers and are effectively black boxes.
- We want a rough, apples-to-apples comparison of LLM reasoning on poker decisions. We also want to collect public reasoning summaries that might be useful for teaching humans poker concepts with LLM-based systems.

How it works (rules / format)
- Cash-game format, fixed blinds, no ante.
- Multiple tables run in parallel to increase hand volume.
- All players start with the same bankroll. If a player’s stack drops below 5bb at any table, it is automatically topped back up to 100bb from that player’s bankroll.
- When a player’s bankroll hits 0, they bust. The largest bankroll at the end of the event wins.
- Every model gets the same prompt. No extra tools and no code execution: decisions are made in language alone.
- Models can keep simple notes on opponents across hands.
- Viewers see public summaries of model reasoning in real time (not raw hidden prompts or tokens).

Research goals
- Compare how consistently different LLMs make decisions and how well they adapt over long horizons.
- Produce a dataset of reasoning summaries, actions, and outcomes suited to exploring instructional use (human learning and teaching), not solver training.

When / where
- Dates: Oct 27 to Nov 3
- Live at pokerbattle.ai (free to watch).

Looking for
- Feedback on design and metrics.
- Suggestions for participants.
- Community ideas on fair prompts, leak prevention, and evaluation.
- Sponsors interested in supporting an open, public experiment (logos on stream, section sponsorships, mentions).
Happy to answer technical questions (prompting, seat randomization, bankroll accounting, leak-proofing, latency/timeout handling, etc.). If there’s interest, we’ll publish a post-mortem and release the summarized traces + hand histories after the event.
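For readers curious about the bankroll accounting, here is a minimal Python sketch of how the top-up and bust rules above could be tracked. It is not the event's actual code, and it rests on assumptions the post doesn't state: the bankroll is an off-table reserve separate from chips in play; a top-up the reserve can't fully cover moves whatever is left; and the end-of-event ranking counts chips still on tables. What happens to a busted player's remaining table chips isn't specified, so this sketch simply leaves them in place. `Player`, `settle_stack`, `total_bb`, and `leader` are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field

# Values taken from the stated rules; everything else here is an assumption.
REFILL_BELOW_BB = 5.0   # a stack under 5bb triggers a top-up
TOP_UP_TO_BB = 100.0    # the stack is restored to 100bb


@dataclass
class Player:
    """Hypothetical per-model ledger (not the event's real accounting code)."""
    name: str
    bankroll_bb: float  # assumption: off-table reserve, excluding chips in play
    stacks_bb: dict[str, float] = field(default_factory=dict)  # table id -> stack
    busted: bool = False

    def settle_stack(self, table: str, new_stack_bb: float) -> float:
        """Record a stack after a hand; top it up from the bankroll if below 5bb.

        Returns the amount moved from the bankroll onto the table."""
        self.stacks_bb[table] = new_stack_bb
        if new_stack_bb >= REFILL_BELOW_BB:
            return 0.0
        # Assumption: if the reserve can't cover a full top-up, move what is left.
        top_up = min(TOP_UP_TO_BB - new_stack_bb, self.bankroll_bb)
        self.bankroll_bb -= top_up
        self.stacks_bb[table] += top_up
        if self.bankroll_bb <= 0:
            self.busted = True  # bankroll hit 0: the player busts
        return top_up

    def total_bb(self) -> float:
        # Assumption: the final ranking counts chips still on the tables.
        return self.bankroll_bb + sum(self.stacks_bb.values())


def leader(players: list[Player]) -> str:
    """Name of the player with the largest total bankroll."""
    return max(players, key=lambda p: p.total_bb()).name


if __name__ == "__main__":
    a = Player("model_a", bankroll_bb=1000.0)
    a.settle_stack("t1", 3.5)   # below 5bb: 96.5bb moved, stack back to 100bb
    a.settle_stack("t1", 40.0)  # above threshold: no top-up
    b = Player("model_b", bankroll_bb=50.0)
    b.settle_stack("t1", 2.0)   # reserve covers only 50bb, so model_b busts
    print(a.total_bb(), b.total_bb(), b.busted, leader([a, b]))
```

Keeping the reserve and the per-table stacks as separate fields makes every top-up an explicit transfer, so the ledger can be audited hand by hand. A real system would also need to decide what "largest bankroll" means when chips are still in play at the end of the event.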