Why gradient clipping accelerates training: A theoretical justification for adaptivity

Abstract

We provide a theoretical explanation for the effectiveness of gradient clipping in training deep neural networks. The key ingredient is a new smoothness condition derived from practical neural network training examples. We observe that gradient smoothness, a quantity central to the analysis of first-order optimization algorithms yet often assumed to be constant, varies significantly along the training trajectory of deep neural networks. Moreover, this smoothness correlates positively with the gradient norm and, contrary to standard assumptions in the literature, can grow with it. Such behavior limits the applicability of existing analyses that rely on a fixed bound on smoothness, and it motivates a novel relaxation of gradient smoothness that is weaker than the commonly used Lipschitz smoothness assumption. Under the new condition, we prove that two popular methods, namely gradient clipping and normalized gradient, converge arbitrarily faster than gradient descent with a fixed step size. We further explain why such adaptively scaled gradient methods accelerate empirical convergence and verify our results in popular neural network training settings.
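To make the abstract's two ingredients concrete, the display below sketches one standard formalization; the symbols $L_0$, $L_1$, $\eta$, $\gamma$, and $\beta$ are illustrative and are not fixed by the text above. The relaxed condition lets the local smoothness grow affinely with the gradient norm (setting $L_1 = 0$ recovers ordinary Lipschitz smoothness), while clipping and normalization shrink the effective step size exactly where the gradient, and hence the local smoothness, is large.

% A hedged sketch, not the paper's exact statement: constants are illustrative.
\begin{align*}
  &\text{Relaxed smoothness:} &
    \|\nabla^2 f(x)\| &\le L_0 + L_1\,\|\nabla f(x)\|, \\
  &\text{Clipped GD:} &
    x_{k+1} &= x_k - h_k\,\nabla f(x_k),
    \qquad h_k = \min\Bigl\{\eta,\; \frac{\gamma}{\|\nabla f(x_k)\|}\Bigr\}, \\
  &\text{Normalized GD:} &
    x_{k+1} &= x_k - \frac{\eta}{\|\nabla f(x_k)\| + \beta}\,\nabla f(x_k).
\end{align*}

Informally, a fixed-step method must choose a step size small enough for the largest smoothness encountered along the trajectory, whereas the adaptive steps above automatically take smaller steps in high-gradient (and hence high-smoothness) regions; this is the mechanism behind the speedup claimed in the abstract.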

\iclrfinalcopy
\definecolor{darkblue}{rgb}{0.0,0.0,0.55}
\usetkzobjall
\newcolumntype{S}{>{\arraybackslash}m{.10\linewidth}}
\newcolumntype{T}{>{\arraybackslash}m{.30\linewidth}}

\subfile{sections/introduction}
\subfile{sections/smoothness}
\subfile{sections/preliminaries}
\subfile{sections/main-results}
\subfile{sections/experiment}
\subfile{sections/discussion}

Acknowledgements

SS acknowledges support from an NSF CAREER Award (Number 1846088) and an Amazon Research Award. AJ acknowledges support from an MIT-IBM exploratory project on adaptive, robust, and collaborative optimization.


\subfile{sections/appendix}
