Decoupled DiLoCo: Resilient, Distributed AI Training at Scale (deepmind.google)
This paper proposes a work-partitioning scheme that removes a constraint that makes parallelizing AI training inefficient. Work partitioning as an idea isn't novel, but this particular scheme is.
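For anyone unfamiliar with the DiLoCo family, here is a toy sketch of the basic (non-decoupled) pattern. It is not the scheme from this paper. Workers start from shared parameters, train independently for many inner steps with no communication, then synchronize once by averaging an "outer gradient" (shared minus local parameters) that an outer optimizer applies. The scalar quadratic losses, the plain-SGD inner optimizer (DiLoCo itself uses AdamW), and all function names and hyperparameter values are assumptions chosen for illustration.

```python
"""Toy sketch of a DiLoCo-style inner/outer training loop (hypothetical, not the paper's code)."""
from typing import List


def local_sgd(theta: float, target: float, lr: float, steps: int) -> float:
    # Inner loop: plain SGD on this worker's own loss 0.5 * (theta - target)**2.
    # Assumption: DiLoCo uses AdamW here; plain SGD keeps the sketch small.
    for _ in range(steps):
        grad = theta - target
        theta -= lr * grad
    return theta


def diloco(targets: List[float], rounds: int = 100, inner_steps: int = 10,
           inner_lr: float = 0.1, outer_lr: float = 0.7, momentum: float = 0.9,
           theta: float = 0.0) -> float:
    velocity = 0.0
    for _ in range(rounds):
        # Every worker starts from the shared parameters and trains independently;
        # nothing is communicated during these inner steps.
        local = [local_sgd(theta, t, inner_lr, inner_steps) for t in targets]
        # The only synchronization per round: average the outer gradient
        # (shared parameters minus each worker's locally trained parameters).
        outer_grad = sum(theta - th for th in local) / len(local)
        # Outer optimizer: SGD with Nesterov momentum (PyTorch-style update).
        velocity = momentum * velocity + outer_grad
        theta -= outer_lr * (outer_grad + momentum * velocity)
    return theta


if __name__ == "__main__":
    shards = [1.0, 2.0, 6.0]  # hypothetical per-worker optima
    print(diloco(shards))  # converges toward the mean of the shard optima, 3.0
```

The point of the structure is that workers communicate once every `inner_steps` steps instead of after every step. That lockstep, per-step synchronization is what makes naive data-parallel training sensitive to slow links and slow workers.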