Foundation Transformers

[Paper Review: Concepts] DeepNet, Foundation Transformers

Paper link: https://arxiv.org/abs/2203.00555
DeepNet: Scaling Transformers to 1,000 Layers
"In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify the residual connection in Transformer, accompanying with theoretically derived initialization …" (arxiv.org)

Paper link: https://arxiv.org/abs/2210.06423
Foundation Transformers …

2024. 3. 27.
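The excerpt only names DeepNorm; as a reading aid, here is a minimal PyTorch sketch of the modified residual connection it describes, x_{l+1} = LayerNorm(α·x_l + G(x_l)). The `alpha` value and module layout below are illustrative assumptions, not the paper's derived depth-dependent constants.

```python
import torch
import torch.nn as nn

class DeepNormResidual(nn.Module):
    """Minimal sketch of a DeepNorm-style residual block (assumption-labeled).

    The residual connection is modified as
        x_{l+1} = LayerNorm(alpha * x_l + sublayer(x_l)),
    where alpha > 1 up-weights the identity branch. The actual alpha in the
    paper is derived from the number of encoder/decoder layers; the value
    passed in here is just a placeholder.
    """

    def __init__(self, sublayer: nn.Module, dim: int, alpha: float):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(dim)
        self.alpha = alpha

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale the residual stream by alpha, add the sublayer output,
        # then apply post-LayerNorm (Post-LN placement).
        return self.norm(self.alpha * x + self.sublayer(x))


# Usage example with a hypothetical feed-forward sublayer and alpha value.
block = DeepNormResidual(nn.Linear(512, 512), dim=512, alpha=2.0)
out = block(torch.randn(8, 16, 512))  # (batch, seq_len, dim)
```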