The Science of Deep Learning

For the MIT reading group, click here!

Note: the content on this page, the homepage, is pretty outdated and incomplete! I'll update it eventually though!

Artificial neural networks have in the last decade been responsible for some really impressive advances in AI capabilities, particularly on perceptual and control tasks. But despite this empirical success, we currently lack good explanatory theories for a variety of observed properties of deep neural networks, such as why they generalize well and why they scale as they do. Doing deep learning is like trying to build steam engines without having a good theory of thermodynamics: progress comes more from trial and error, guided by loose heuristics, than from first principles.
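
One example of the kind of empirical regularity such a theory would need to explain: across many settings, test loss falls roughly as a power law in model size, L(N) ≈ (N_c/N)^α. Below is a minimal sketch (in Python) of how such a scaling curve can be fit; the measurements and constants are made up purely for illustration.

    # Minimal sketch: fit a power-law scaling curve L(N) = (N_c / N)**alpha
    # to (model size, test loss) pairs by linear regression in log-log space.
    # The measurements below are synthetic, purely for illustration.
    import numpy as np

    # Hypothetical (parameter count, test loss) pairs from trained models.
    N = np.array([1e6, 1e7, 1e8, 1e9])
    loss = np.array([4.0, 2.8, 2.0, 1.4])

    # log L = alpha * log N_c - alpha * log N, i.e. a line in log-log space.
    slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
    alpha = -slope
    N_c = np.exp(intercept / alpha)
    print(f"fitted alpha = {alpha:.3f}, N_c = {N_c:.3g}")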

What is needed is a "Science of Deep Learning" -- good, predictive, unifying explanations for when and why deep learning works and what its weaknesses are. Ideally, such a theory should be able to convince someone from 1980 that deep learning is a good idea.

I imagine that a mature Science of Deep Learning will pull ideas from traditional ML theory, information theory, and statistical physics, and will probably involve some entirely new ideas too. This page compiles papers that I think will be most relevant to a mature understanding of deep learning. If you have suggestions, email me at ericjm@mit.edu. I may eventually replace this site with a publicly editable wiki or Roam graph.

Papers

Information Theory

Scaling Laws

Generalization

Interpretability and "Circuits"

Other

Informal Articles/Blog Posts

Courses