Parallel Scaling Law for Language Models (arxiv.org)
Compared with scaling parameters alone, their technique can reach the same performance gain with up to 22x less added memory and up to 6x less added latency.
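To make the "22x less increase" wording concrete: the ratio compares the added cost over a baseline, not the total cost. Here is a rough sketch of that arithmetic. The function name `overhead_ratio` and all the GB figures are made up for illustration and are not taken from the paper.

```python
def overhead_ratio(baseline: float, param_scaled: float, parallel_scaled: float) -> float:
    """Ratio of the cost increase from parameter scaling to the cost
    increase from parallel scaling, both measured against the same baseline."""
    param_increase = param_scaled - baseline
    parallel_increase = parallel_scaled - baseline
    if parallel_increase <= 0:
        raise ValueError("parallel scaling must add some cost over the baseline")
    return param_increase / parallel_increase


# Hypothetical memory figures in GB, chosen only to illustrate the arithmetic.
baseline_gb = 10.0
param_scaled_gb = 32.0     # parameter scaling adds 22 GB
parallel_scaled_gb = 11.0  # parallel scaling adds 1 GB

ratio = overhead_ratio(baseline_gb, param_scaled_gb, parallel_scaled_gb)
print(f"{ratio:.0f}x less memory increase")
```

With these made-up numbers the total footprints differ by less than 3x (32 GB vs 11 GB), even though the increase differs by 22x, so the headline figure should be read as a statement about overhead rather than overall memory use.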