Beyond Gradient Averaging in Parallel Optimization | Dark Hacker News