### Understanding implicit regularization in deep learning by analyzing trajectories of gradient descent

Sanjeev’s recent blog post suggested that the conventional view of optimization is insufficient for understanding deep learning, as the value...

### Landscape Connectivity of Low Cost Solutions for Multilayer Nets

A big mystery about deep learning is how, in a highly nonconvex loss landscape, gradient descent often finds near-optimal solutions...

### Is Optimization a Sufficient Language for Understanding Deep Learning?

In this Deep Learning era, machine learning usually boils down to defining a suitable objective/cost function for the learning task...

### Contrastive Unsupervised Learning of Semantic Representations: A Theoretical Framework

Semantic representations (aka semantic embeddings) of complicated data types (e.g. images, text, video) have become central in machine learning, and...

### The search for biologically plausible neural computation: A similarity-based approach

This is the second post in a series reviewing recent progress in designing artificial neural networks (NNs) that resemble natural...

### Understanding optimization in deep learning by analyzing trajectories of gradient descent

Neural network optimization is fundamentally non-convex, and yet simple gradient-based algorithms seem to consistently solve such problems. This phenomenon is...

### Simple and efficient semantic embeddings for rare words, n-grams, and language features

Distributional methods for capturing meaning, such as word embeddings, often require observing many examples of words in context. But most...

### When Recurrent Models Don't Need to be Recurrent

In the last few years, deep learning practitioners have proposed a litany of different sequence models. Although recurrent neural networks...

### Deep-learning-free Text and Sentence Embedding, Part 2

This post continues Sanjeev’s post and describes further attempts to construct elementary and interpretable text embeddings. The previous post described...

### Deep-learning-free Text and Sentence Embedding, Part 1

Word embeddings (see my old post1 and post2) capture the idea that one can express “meaning” of words using a...

### Limitations of Encoder-Decoder GAN architectures

This is yet another post about Generative Adversarial Nets (GANs), and is based on our new ICLR’18 paper with Yi Zhang....

### Can increasing depth serve to accelerate optimization?

“How does depth help?” is a fundamental question in the theory of deep learning. Conventional wisdom, backed by theoretical studies...

### Proving generalization of deep nets via compression

This post is about my new paper with Rong Ge, Behnam Neyshabur, and Yi Zhang which offers some new perspective...

### Generalization Theory and Deep Nets, An introduction

Deep learning holds many mysteries for theory, as we have discussed on this blog. Lately many ML theorists have become...

### How to Escape Saddle Points Efficiently

A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient...

### Do GANs actually do distribution learning?

This post is about our new paper, which presents empirical evidence that current GANs (Generative Adversarial Nets) are quite far...

### Unsupervised learning, one notion or many?

Unsupervised learning, as the name suggests, is the science of learning from unlabeled data. A look at the wikipedia page...

### Generalization and Equilibrium in Generative Adversarial Networks (GANs)

The previous post described Generative Adversarial Networks (GANs), a technique for training generative models for image distributions (and other complicated...

### Generative Adversarial Networks (GANs), Some Open Questions

Since the ability to generate “realistic-looking” data may be a step towards understanding its structure and exploiting it, generative models are...

### Back-propagation, an introduction

Given the sheer number of backpropagation tutorials on the internet, is there really a need for another? One of us (Sanjeev)...

### The search for biologically plausible neural computation: The conventional approach

Inventors of the original artificial neural networks (NNs) derived their inspiration from biology. However, as artificial NNs progressed, their design...

### Gradient Descent Learns Linear Dynamical Systems

From text translation to video captioning, learning to map one sequence to another is an increasingly active research area in...

### Linear algebraic structure of word meanings

Word embeddings capture the meaning of a word using a low-dimensional vector and are ubiquitous in natural language processing (NLP)....

### A Framework for analysing Non-Convex Optimization

Previously, Rong’s post and Ben’s post showed that (noisy) gradient descent can converge to a local minimum of a non-convex function,...

### Markov Chains Through the Lens of Dynamical Systems: The Case of Evolution

In this post, we will see the main technical ideas in the analysis of the mixing time of evolutionary Markov...

### Saddles Again

Thanks to Rong for the very nice blog post describing critical points of nonconvex functions and how to avoid them....

### Escaping from Saddle Points

Convex functions are simple — they usually have only one local minimum. Non-convex functions can be much more complicated. In...

### Stability as a foundation of machine learning

Central to machine learning is our ability to relate how a learning algorithm fares on a sample to its performance...

### Evolution, Dynamical Systems and Markov Chains

In this post we present a high level introduction to evolution and to how we can use mathematical tools such...

### Word Embeddings: Explaining their properties

This is a followup to an earlier post about word embeddings, which capture the meaning of a word using a...

### NIPS 2015 workshop on non-convex optimization

While convex analysis has received much attention from the machine learning community, theoretical analysis of non-convex optimization is still nascent....

### Nature, Dynamical Systems and Optimization

The language of dynamical systems is the preferred choice of scientists to model a wide variety of phenomena in nature....

### Tensor Methods in Machine Learning

Tensors are high-dimensional generalizations of matrices. In recent years, tensor decompositions were used to design learning algorithms for estimating...

### Semantic Word Embeddings

This post can be seen as an introduction to how nonconvex problems arise naturally in practice, and also the relative...

### Why go off the convex path?

The notion of convexity underlies a lot of beautiful mathematics. When combined with computation, it gives rise to the area...