Annotated Implementation of DeepNet: Scaling Transformers to 1k Layers | Dark Hacker News