The Science of Deep Learning

Over the last decade, artificial neural networks have been responsible for a wide range of significant advances in computer vision, natural language processing, and reinforcement learning. Despite their success, we lack strong, high-level theories that predict most of the empirical results in the field. For instance, we cannot make precise quantitative predictions about how one architecture will perform versus another without actually training them.
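
As a concrete illustration of the kind of prediction we would like to make, empirical scaling laws let one fit the losses of small training runs and extrapolate to larger models. The sketch below is a minimal example of this idea, not a result from any particular paper: the (model size, loss) pairs are made up, and the power-law form L(N) = a * N^(-alpha) is simply the most commonly assumed fit.

    import numpy as np

    # Hypothetical (parameter count, final test loss) pairs from small runs.
    N = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
    L = np.array([3.10, 2.71, 2.35, 2.06, 1.80])

    # A power law L(N) = a * N^(-alpha) is linear in log-log space:
    # log L = log a - alpha * log N, so fit a line to the logs.
    slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
    alpha, a = -slope, np.exp(intercept)

    # Extrapolate to a model ten times larger than any actually trained.
    N_big = 1e9
    print(f"alpha = {alpha:.3f}")
    print(f"predicted loss at N = {N_big:.0e}: {a * N_big ** -alpha:.2f}")

Whether such extrapolations hold, and why the exponents take the values they do, is exactly the kind of question a science of deep learning would aim to answer.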

An analogy can be drawn to the First Industrial Revolution, when the laws of thermodynamics lacked a strong theoretical foundation, although they were understood well enough as a matter of practice to build steam engines. It wasn't until perhaps the late 1800s that they were satisfactorily derived from physical first principles, giving rise to the field of statistical physics.

The development of a "Science of Deep Learning" is now an active, interdisciplinary area of research combining insights from information theory, statistical physics, mathematical biology, and other fields. This page organizes key results from the field and is maintained by Eric Michaud. If you have suggestions, email me at ericjmichaud@berkeley.edu.

Information Theory

Scaling Laws and "Foresight"

Feature Visualization

Intuitions from Evolution; Competition vs. Cooperation