Training a small model to write better OCaml with RLVR and GRPO | Dark Hacker News