### Understanding optimization in deep learning by analyzing trajectories of gradient descent

Neural network optimization is fundamentally non-convex, and yet simple gradient-based algorithms seem to consistently solve such problems. This phenomenon is...

### Simple and efficient semantic embeddings for rare words, n-grams, and language features

Distributional methods for capturing meaning, such as word embeddings, often require observing many examples of words in context. But most...

### When Recurrent Models Don't Need to be Recurrent

In the last few years, deep learning practitioners have proposed a litany of different sequence models. Although recurrent neural networks...

### Deep-learning-free Text and Sentence Embedding, Part 2

This post continues Sanjeev’s post and describes further attempts to construct elementary and interpretable text embeddings. The previous post described...

### Deep-learning-free Text and Sentence Embedding, Part 1

Word embeddings (see my old post1 and post2) capture the idea that one can express “meaning” of words using a...

### Limitations of Encoder-Decoder GAN architectures

This is yet another post about Generative Adversarial Nets (GANs), and is based upon our new ICLR’18 paper with Yi Zhang....

### Can increasing depth serve to accelerate optimization?

“How does depth help?” is a fundamental question in the theory of deep learning. Conventional wisdom, backed by theoretical studies...

### Proving generalization of deep nets via compression

This post is about my new paper with Rong Ge, Behnam Neyshabur, and Yi Zhang which offers some new perspective...

### Generalization Theory and Deep Nets, An introduction

Deep learning holds many mysteries for theory, as we have discussed on this blog. Lately many ML theorists have become...

### How to Escape Saddle Points Efficiently

A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient...

### Do GANs actually do distribution learning?

This post is about our new paper, which presents empirical evidence that current GANs (Generative Adversarial Nets) are quite far...

### Unsupervised learning, one notion or many?

Unsupervised learning, as the name suggests, is the science of learning from unlabeled data. A look at the Wikipedia page...

### Generalization and Equilibrium in Generative Adversarial Networks (GANs)

The previous post described Generative Adversarial Networks (GANs), a technique for training generative models for image distributions (and other complicated...

### Generative Adversarial Networks (GANs), Some Open Questions

Since the ability to generate “realistic-looking” data may be a step towards understanding its structure and exploiting it, generative models are...

### Back-propagation, an introduction

Given the sheer number of backpropagation tutorials on the internet, is there really a need for another? One of us (Sanjeev)...

### The search for biologically plausible neural computation: The conventional approach

Inventors of the original artificial neural networks (NNs) derived their inspiration from biology. However, as artificial NNs progressed, their design...

### Gradient Descent Learns Linear Dynamical Systems

From text translation to video captioning, learning to map one sequence to another is an increasingly active research area in...

### Linear algebraic structure of word meanings

Word embeddings capture the meaning of a word using a low-dimensional vector and are ubiquitous in natural language processing (NLP)....

### A Framework for analysing Non-Convex Optimization

Previously, Rong’s post and Ben’s post showed that (noisy) gradient descent can converge to a local minimum of a non-convex function,...

### Markov Chains Through the Lens of Dynamical Systems: The Case of Evolution

In this post, we will see the main technical ideas in the analysis of the mixing time of evolutionary Markov...

### Saddles Again

Thanks to Rong for the very nice blog post describing critical points of nonconvex functions and how to avoid them....

### Escaping from Saddle Points

Convex functions are simple — they usually have only one local minimum. Non-convex functions can be much more complicated. In...

### Stability as a foundation of machine learning

Central to machine learning is our ability to relate how a learning algorithm fares on a sample to its performance...

### Evolution, Dynamical Systems and Markov Chains

In this post we present a high-level introduction to evolution and to how we can use mathematical tools such...

### Word Embeddings: Explaining their properties

This is a follow-up to an earlier post about word embeddings, which capture the meaning of a word using a...

### NIPS 2015 workshop on non-convex optimization

While convex analysis has received much attention from the machine learning community, theoretical analysis of non-convex optimization is still nascent....

### Nature, Dynamical Systems and Optimization

The language of dynamical systems is the preferred choice of scientists to model a wide variety of phenomena in nature....

### Tensor Methods in Machine Learning

Tensors are high-dimensional generalizations of matrices. In recent years, tensor decompositions have been used to design learning algorithms for estimating...

### Semantic Word Embeddings

This post can be seen as an introduction to how nonconvex problems arise naturally in practice, and also the relative...

### Why go off the convex path?

The notion of convexity underlies a lot of beautiful mathematics. When combined with computation, it gives rise to the area...