Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Networks
The ability of large neural networks to generalize is commonly believed to stem from an implicit regularization — a tendency...
A central problem of generalization theory is the following: Given a training dataset and a deep net trained with that...
TL;DR A lot has been said on this blog (cf. post by Sanjeev) about the importance of studying trajectories of gradient...
In an effort to understand implicit regularization in deep learning, a lot of theoretical focus is being directed at matrix factorization,...
Can you trust a model whose designer had access to the test/holdout set? This implicit question in Dwork et al...
The empirical success of deep learning has posed significant challenges to machine learning theory: Why can we efficiently train neural...
In the first post of this series, we introduced the challenges of sampling distributions beyond log-concavity. In Part 2 we...
In our previous blog post, we introduced the challenges of sampling distributions beyond log-concavity. We first introduced the problem of...
This post is based on my recent paper with Noam Razin (to appear at NeurIPS 2020), studying the question of...
Today’s online world and the emerging internet of things are built around a Faustian bargain: consumers (and their internet of...
You may remember our previous blog post showing that it is possible to do state-of-the-art deep learning with learning rate...
As the growing number of posts on this blog would suggest, recent years have seen a lot of progress in...
GANs, originally discovered in the context of unsupervised learning, have had far-reaching implications for science, engineering, and society. However,...
While there has been incredible progress in convex and nonconvex minimization, a multitude of problems in ML today are in...
This blog post concerns our ICLR20 paper on a surprising discovery about learning rate (LR), the most basic hyperparameter in...
(Crossposted at CMU ML.) Traditional wisdom in machine learning holds that there is a careful trade-off between training error and...
Sanjeev’s recent blog post suggested that the conventional view of optimization is insufficient for understanding deep learning, as the value...
A big mystery about deep learning is how, in a highly nonconvex loss landscape, gradient descent often finds near-optimal solutions...
In this Deep Learning era, machine learning usually boils down to defining a suitable objective/cost function for the learning task...
Semantic representations (aka semantic embeddings) of complicated data types (e.g. images, text, video) have become central in machine learning, and...
This is the second post in a series reviewing recent progress in designing artificial neural networks (NNs) that resemble natural...
Neural network optimization is fundamentally non-convex, and yet simple gradient-based algorithms seem to consistently solve such problems. This phenomenon is...
Distributional methods for capturing meaning, such as word embeddings, often require observing many examples of words in context. But most...
In the last few years, deep learning practitioners have proposed a litany of different sequence models. Although recurrent neural networks...
This post continues Sanjeev’s post and describes further attempts to construct elementary and interpretable text embeddings. The previous post described...
Word embeddings (see my old post1 and post2) capture the idea that one can express “meaning” of words using a...
This is yet another post about Generative Adversarial Nets (GANs), and based upon our new ICLR’18 paper with Yi Zhang....
“How does depth help?” is a fundamental question in the theory of deep learning. Conventional wisdom, backed by theoretical studies...
This post is about my new paper with Rong Ge, Behnam Neyshabur, and Yi Zhang which offers some new perspective...
Deep learning holds many mysteries for theory, as we have discussed on this blog. Lately many ML theorists have become...
A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient...
This post is about our new paper, which presents empirical evidence that current GANs (Generative Adversarial Nets) are quite far...
Unsupervised learning, as the name suggests, is the science of learning from unlabeled data. A look at the wikipedia page...
The previous post described Generative Adversarial Networks (GANs), a technique for training generative models for image distributions (and other complicated...
Since the ability to generate “realistic-looking” data may be a step towards understanding its structure and exploiting it, generative models are...
Given the sheer number of backpropagation tutorials on the internet, is there really a need for another? One of us (Sanjeev)...
Inventors of the original artificial neural networks (NNs) derived their inspiration from biology. However, as artificial NNs progressed, their design...
From text translation to video captioning, learning to map one sequence to another is an increasingly active research area in...
Word embeddings capture the meaning of a word using a low-dimensional vector and are ubiquitous in natural language processing (NLP)....
Previously, Rong’s post and Ben’s post showed that (noisy) gradient descent can converge to a local minimum of a non-convex function,...
In this post, we will see the main technical ideas in the analysis of the mixing time of evolutionary Markov...
Thanks to Rong for the very nice blog post describing critical points of nonconvex functions and how to avoid them....
Convex functions are simple — they usually have only one local minimum. Non-convex functions can be much more complicated. In...
Central to machine learning is our ability to relate how a learning algorithm fares on a sample to its performance...
In this post we present a high level introduction to evolution and to how we can use mathematical tools such...
This is a followup to an earlier post about word embeddings, which capture the meaning of a word using a...
While convex analysis has received much attention by the machine learning community, theoretical analysis of non-convex optimization is still nascent....
The language of dynamical systems is the preferred choice of scientists to model a wide variety of phenomena in nature....
Tensors are high dimensional generalizations of matrices. In recent years tensor decompositions were used to design learning algorithms for estimating...
This post can be seen as an introduction to how nonconvex problems arise naturally in practice, and also the relative...
The notion of convexity underlies a lot of beautiful mathematics. When combined with computation, it gives rise to the area...