
Paper Review: Why Transformers need Adam : A Hessian Perspective

by Sapiens_Nam 2025. 4. 16.

Attachment: Presentation.pdf (1.47 MB)
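The body of this review is the attached slide deck. For context, the paper argues from a Hessian perspective that Transformers benefit from Adam's coordinate-wise adaptive step sizes over SGD. Below is a minimal NumPy sketch of the standard Adam update rule the paper analyzes; the function and variable names here are illustrative, not taken from the paper or its code.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias correction (illustrative sketch).

    theta: parameters, grad: gradient at theta,
    m / v: running first / second moment estimates, t: 1-based step count.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # coordinate-wise adaptive step: each parameter is scaled by its own
    # root-second-moment estimate, unlike SGD's single global step size
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

On the first step the bias-corrected update reduces to roughly `-lr * sign(grad)` per coordinate, which is the adaptive, per-coordinate behavior the Hessian-based analysis contrasts with SGD.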


Other posts in the 'Paper Review' category

Shampoo Optimizer Review  (0) 2025.03.29
Adam can converge without any modification on Update rules  (0) 2023.07.04
Sharpness-Aware Minimization  (3) 2023.06.26
