3B Parameters, One GPU? No Problem: Fit More and Train Faster with ZeRO | Dark Hacker News