Annotated Implementation of DeepNet: Scaling Transformers to 1k Layers (nn.labml.ai)
3 points by vpj 4 years ago | 0 comments