I used RL fine-tuning to make an LLM generate ugly and unpythonic FizzBuzz code | Dark Hacker News