### When Recurrent Models Don't Need to be Recurrent

In the last few years, deep learning practitioners have proposed a litany of different sequence models. Although recurrent neural networks were once the tool of choice, now models like the autoregressive Wavenet or the Transformer...### Deep-learning-free Text and Sentence Embedding, Part 2

This post continues Sanjeev’s post and describes further attempts to construct elementary and interpretable text embeddings. The previous post described the the SIF embedding, which uses a simple weighted combination of word embeddings combined with...### Deep-learning-free Text and Sentence Embedding, Part 1

Word embeddings (see my old post1 and post2) capture the idea that one can express “meaning” of words using a vector, so that the cosine of the angle between the vectors captures semantic similarity. (“Cosine...### Limitations of Encoder-Decoder GAN architectures

This is yet another post about Generative Adversarial Nets (GANs), and based upon our new ICLR’18 paper with Yi Zhang. A quick recap of the story so far. GANs are an unsupervised method in deep...### Can increasing depth serve to accelerate optimization?

“How does depth help?” is a fundamental question in the theory of deep learning. Conventional wisdom, backed by theoretical studies (e.g. Eldan & Shamir 2016; Raghu et al. 2017; Lee et al. 2017; Cohen et...### Proving generalization of deep nets via compression

This post is about my new paper with Rong Ge, Behnam Neyshabur, and Yi Zhang which offers some new perspective into the generalization mystery for deep nets discussed in my earlier post. The new paper...### Generalization Theory and Deep Nets, An introduction

Deep learning holds many mysteries for theory, as we have discussed on this blog. Lately many ML theorists have become interested in the generalization mystery: why do trained deep nets perform well on previously unseen...### How to Escape Saddle Points Efficiently

A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient descent (GD) generically escapes saddle points asymptotically (see Rong Ge’s and Ben Recht’s blog posts),...### Do GANs actually do distribution learning?

This post is about our new paper, which presents empirical evidence that current GANs (Generative Adversarial Nets) are quite far from learning the target distribution. Previous posts had introduced GANs and described new theoretical analysis...
Newer