Off the convex path

Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Networks

Jul 15, 2022 (Noam Razin). The ability of large neural networks to generalize is commonly believed to stem from an implicit regularization — a tendency... Continue

Predicting Generalization using GANs

Jun 6, 2022 (Sanjeev Arora and Yi Zhang). A central problem of generalization theory is the following: Given a training dataset and a deep net trained with that... Continue

Does Gradient Flow Over Neural Networks Really Represent Gradient Descent?

Jan 6, 2022 (Nadav Cohen). TL;DR A lot was said in this blog (cf. post by Sanjeev) about the importance of studying trajectories of gradient... Continue

Implicit Regularization in Tensor Factorization: Can Tensor Rank Shed Light on Generalization in Deep Learning?

Jul 8, 2021 (Noam Razin, Asaf Maman, Nadav Cohen). In an effort to understand implicit regularization in deep learning, a lot of theoretical focus is being directed at matrix factorization,... Continue

Rip van Winkle's Razor, a Simple New Estimate for Adaptive Data Analysis

Apr 7, 2021 (Sanjeev Arora, Yi Zhang). Can you trust a model whose designer had access to the test/holdout set? This implicit question in Dwork et al... Continue

When are Neural Networks more powerful than Neural Tangent Kernels?

Mar 25, 2021 (Yu Bai, Minshuo Chen, Jason D. Lee). The empirical success of deep learning has posed significant challenges to machine learning theory: Why can we efficiently train neural... Continue

Beyond log-concave sampling (Part 3)

Mar 12, 2021 (Andrej Risteski). In the first post of this series, we introduced the challenges of sampling distributions beyond log-concavity. In Part 2 we... Continue

Beyond log-concave sampling (Part 2)

Mar 1, 2021 (Holden Lee, Andrej Risteski). In our previous blog post, we introduced the challenges of sampling distributions beyond log-concavity. We first introduced the problem of... Continue

Can implicit regularization in deep learning be explained by norms?

Nov 27, 2020 (Nadav Cohen). This post is based on my recent paper with Noam Razin (to appear at NeurIPS 2020), studying the question of... Continue

How to allow deep learning on your data without revealing the data

Nov 11, 2020 (Sanjeev Arora). Today’s online world and the emerging internet of things are built around a Faustian bargain: consumers (and their internet of... Continue

Mismatches between Traditional Optimization Analyses and Modern Deep Learning

Oct 21, 2020 (Zhiyuan Li and Sanjeev Arora). You may remember our previous blog post showing that it is possible to do state-of-the-art deep learning with learning rate... Continue

Beyond log-concave sampling

Sep 19, 2020 (Holden Lee, Andrej Risteski). As the growing number of posts on this blog would suggest, recent years have seen a lot of progress in... Continue

Training GANs - From Theory to Practice

Jul 6, 2020 (Oren Mangoubi, Sushant Sachdeva, Nisheeth Vishnoi). GANs, originally discovered in the context of unsupervised learning, have had far-reaching implications for science, engineering, and society. However,... Continue

An equilibrium in nonconvex-nonconcave min-max optimization

Jun 24, 2020 (Oren Mangoubi and Nisheeth Vishnoi). While there has been incredible progress in convex and nonconvex minimization, a multitude of problems in ML today are in... Continue

Exponential Learning Rate Schedules for Deep Learning (Part 1)

Apr 24, 2020 (Zhiyuan Li and Sanjeev Arora). This blog post concerns our ICLR20 paper on a surprising discovery about learning rate (LR), the most basic hyperparameter in... Continue

Ultra-Wide Deep Nets and Neural Tangent Kernel (NTK)

Oct 3, 2019 (Wei Hu and Simon Du). (Crossposted at CMU ML.) Traditional wisdom in machine learning holds that there is a careful trade-off between training error and... Continue

Understanding implicit regularization in deep learning by analyzing trajectories of gradient descent

Jul 10, 2019 (Nadav Cohen and Wei Hu). Sanjeev’s recent blog post suggested that the conventional view of optimization is insufficient for understanding deep learning, as the value... Continue

Landscape Connectivity of Low Cost Solutions for Multilayer Nets

Jun 16, 2019 (Rong Ge). A big mystery about deep learning is how, in a highly nonconvex loss landscape, gradient descent often finds near-optimal solutions... Continue

Is Optimization a Sufficient Language for Understanding Deep Learning?

Jun 3, 2019 (Sanjeev Arora). In this Deep Learning era, machine learning usually boils down to defining a suitable objective/cost function for the learning task... Continue

Contrastive Unsupervised Learning of Semantic Representations: A Theoretical Framework

Mar 19, 2019 (Sanjeev Arora, Hrishikesh Khandeparkar, Orestis Plevrakis, Nikunj Saunshi). Semantic representations (aka semantic embeddings) of complicated data types (e.g. images, text, video) have become central in machine learning, and... Continue

The search for biologically plausible neural computation: A similarity-based approach

Dec 3, 2018 (Cengiz Pehlevan and Dmitri "Mitya" Chklovskii). This is the second post in a series reviewing recent progress in designing artificial neural networks (NNs) that resemble natural... Continue

Understanding optimization in deep learning by analyzing trajectories of gradient descent

Nov 7, 2018 (Nadav Cohen). Neural network optimization is fundamentally non-convex, and yet simple gradient-based algorithms seem to consistently solve such problems. This phenomenon is... Continue

Simple and efficient semantic embeddings for rare words, n-grams, and language features

Sep 18, 2018 (Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi). Distributional methods for capturing meaning, such as word embeddings, often require observing many examples of words in context. But most... Continue

When Recurrent Models Don't Need to be Recurrent

Jul 27, 2018 (John Miller). In the last few years, deep learning practitioners have proposed a litany of different sequence models. Although recurrent neural networks... Continue

Deep-learning-free Text and Sentence Embedding, Part 2

Jun 25, 2018 (Sanjeev Arora, Mikhail Khodak, Nikunj Saunshi). This post continues Sanjeev’s post and describes further attempts to construct elementary and interpretable text embeddings. The previous post described... Continue

Deep-learning-free Text and Sentence Embedding, Part 1

Jun 17, 2018 (Sanjeev Arora). Word embeddings (see my old post1 and post2) capture the idea that one can express “meaning” of words using a... Continue

Limitations of Encoder-Decoder GAN architectures

Mar 12, 2018 (Sanjeev Arora and Andrej Risteski). This is yet another post about Generative Adversarial Nets (GANs), and is based upon our new ICLR’18 paper with Yi Zhang.... Continue

Can increasing depth serve to accelerate optimization?

Mar 2, 2018 (Nadav Cohen). “How does depth help?” is a fundamental question in the theory of deep learning. Conventional wisdom, backed by theoretical studies... Continue

Proving generalization of deep nets via compression

Feb 17, 2018 (Sanjeev Arora). This post is about my new paper with Rong Ge, Behnam Neyshabur, and Yi Zhang which offers some new perspective... Continue

Generalization Theory and Deep Nets, An introduction

Dec 8, 2017 (Sanjeev Arora). Deep learning holds many mysteries for theory, as we have discussed on this blog. Lately many ML theorists have become... Continue

How to Escape Saddle Points Efficiently

Jul 19, 2017 (Chi Jin and Michael Jordan). A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient... Continue

Do GANs actually do distribution learning?

Jul 6, 2017 (Sanjeev Arora and Yi Zhang). This post is about our new paper, which presents empirical evidence that current GANs (Generative Adversarial Nets) are quite far... Continue

Unsupervised learning, one notion or many?

Jun 26, 2017 (Sanjeev Arora and Andrej Risteski). Unsupervised learning, as the name suggests, is the science of learning from unlabeled data. A look at the wikipedia page... Continue

Generalization and Equilibrium in Generative Adversarial Networks (GANs)

Mar 30, 2017 (Sanjeev Arora). The previous post described Generative Adversarial Networks (GANs), a technique for training generative models for image distributions (and other complicated... Continue

Generative Adversarial Networks (GANs), Some Open Questions

Mar 15, 2017 (Sanjeev Arora). Since the ability to generate “realistic-looking” data may be a step towards understanding its structure and exploiting it, generative models are... Continue

Back-propagation, an introduction

Dec 20, 2016 (Sanjeev Arora and Tengyu Ma). Given the sheer number of backpropagation tutorials on the internet, is there really a need for another? One of us (Sanjeev)... Continue

The search for biologically plausible neural computation: The conventional approach

Nov 3, 2016 (Dmitri "Mitya" Chklovskii). Inventors of the original artificial neural networks (NNs) derived their inspiration from biology. However, as artificial NNs progressed, their design... Continue

Gradient Descent Learns Linear Dynamical Systems

Oct 13, 2016 (Moritz Hardt and Tengyu Ma). From text translation to video captioning, learning to map one sequence to another is an increasingly active research area in... Continue

Linear algebraic structure of word meanings

Jul 10, 2016 (Sanjeev Arora). Word embeddings capture the meaning of a word using a low-dimensional vector and are ubiquitous in natural language processing (NLP).... Continue

A Framework for analysing Non-Convex Optimization

May 8, 2016 (Sanjeev Arora, Tengyu Ma). Previously, Rong’s post and Ben’s post showed that (noisy) gradient descent can converge to a local minimum of a non-convex function,... Continue

Markov Chains Through the Lens of Dynamical Systems: The Case of Evolution

Apr 4, 2016 (Nisheeth Vishnoi). In this post, we will see the main technical ideas in the analysis of the mixing time of evolutionary Markov... Continue

Saddles Again

Mar 24, 2016 (Benjamin Recht). Thanks to Rong for the very nice blog post describing critical points of nonconvex functions and how to avoid them.... Continue

Escaping from Saddle Points

Mar 22, 2016 (Rong Ge). Convex functions are simple — they usually have only one local minimum. Non-convex functions can be much more complicated. In... Continue

Stability as a foundation of machine learning

Mar 14, 2016 (Moritz Hardt). Central to machine learning is our ability to relate how a learning algorithm fares on a sample to its performance... Continue

Evolution, Dynamical Systems and Markov Chains

Mar 7, 2016 (Nisheeth Vishnoi). In this post we present a high level introduction to evolution and to how we can use mathematical tools such... Continue

Word Embeddings: Explaining their properties

Feb 14, 2016 (Sanjeev Arora). This is a followup to an earlier post about word embeddings, which capture the meaning of a word using a... Continue

NIPS 2015 workshop on non-convex optimization

Jan 25, 2016 (Anima Anandkumar). While convex analysis has received much attention from the machine learning community, theoretical analysis of non-convex optimization is still nascent.... Continue

Nature, Dynamical Systems and Optimization

Dec 21, 2015 (Nisheeth Vishnoi). The language of dynamical systems is the preferred choice of scientists to model a wide variety of phenomena in nature.... Continue

Tensor Methods in Machine Learning

Dec 17, 2015 (Rong Ge). Tensors are high dimensional generalizations of matrices. In recent years tensor decompositions were used to design learning algorithms for estimating... Continue

Semantic Word Embeddings

Dec 12, 2015 (Sanjeev Arora). This post can be seen as an introduction to how nonconvex problems arise naturally in practice, and also the relative... Continue

Why go off the convex path?

Dec 11, 2015 (blogmasters). The notion of convexity underlies a lot of beautiful mathematics. When combined with computation, it gives rise to the area... Continue