13–17 Dec 2021
Africa/Johannesburg timezone

Renormalizing the optimal hyperparameters of a neural network

14 Dec 2021, 18:30
30m

Speaker

Greg Yang (Microsoft Research)

Description

Hyperparameter tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters that often can only be trained once. We show that, in the recently discovered Maximal Update Parametrization (µP), many optimal hyperparameters remain stable even as model size changes. Using this insight, for example, we are able to re-tune the 6.7-billion-parameter model of GPT-3 and obtain performance comparable to the 13-billion-parameter model of GPT-3, effectively doubling the model size.
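To make the transfer idea concrete, here is a minimal sketch of one µP scaling rule. The function below is illustrative only (it is not the speaker's implementation): it assumes the commonly stated µP prescription that, for Adam, the learning rate of a hidden weight matrix scales inversely with its fan-in, so a learning rate tuned on a narrow proxy model can be carried over to a wider one.

```python
def mup_hidden_lr(base_lr: float, base_width: int, width: int) -> float:
    """Rescale a hidden-layer Adam learning rate from a tuned proxy width
    to a target width, under the (assumed) muP 1/fan-in scaling rule."""
    if base_width <= 0 or width <= 0:
        raise ValueError("widths must be positive")
    return base_lr * base_width / width

# Example: a learning rate tuned at width 256 transfers to width 4096
# by shrinking proportionally to the width ratio.
lr_small = 1e-3
lr_large = mup_hidden_lr(lr_small, base_width=256, width=4096)
```

In this toy picture, tuning happens once on the cheap small model, and the scaling rule, rather than a fresh search, supplies the hyperparameters for the expensive large model.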
In this context, there is a rich analogy we can make to Wilsonian effective field theory. For example, if "coupling constants" in physics correspond to "optimal hyperparameters" in deep learning and "cutoff scale" corresponds to "model size", then we can say "µP is a renormalizable theory of neural networks." We explore this analogy further in the talk and leave open the question of whether methods from effective field theory itself can advance hyperparameter tuning.

Primary author

Greg Yang (Microsoft Research)

Presentation materials