Bitwise Consistent On-Policy Reinforcement Learning with vLLM and TorchTitan | Dark Hacker News