VAE variants
Hao Dong
Peking University
1
• Convolutional VAE
• Conditional VAE
• β-VAE
• IWAE
• Ladder VAE
• Progressive + Fade-in VAE
• VAE in speech
• Temporal Difference VAE (TD-VAE)
2
VAE variants
Hierarchical representation learning
Temporal representation learning
Representation learning
• Convolutional VAE
• Conditional VAE
• IWAE
• β-VAE
• Ladder VAE
• Progressive + Fade-in VAE
• VAE in speech
• Temporal Difference VAE (TD-VAE)
3
VAE variants
Hierarchical representation learning
Temporal representation learning
Representation learning
4
Convolutional Variational Autoencoder
• Limitations of the vanilla VAE
• The number of weights in a fully connected layer equals input size × output size
• A VAE built only from fully connected layers therefore suffers from the curse of dimensionality when the input dimension is large (e.g., an image)
• Solution: replace the fully connected layers with convolutional layers
Image is modified from: Deep Clustering with Convolutional Autoencoder. NIPS 2017.
Fully connected layers here (independent variables)
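The parameter-count argument above can be checked with a quick sketch (the 64×64 RGB input and the layer sizes are hypothetical examples, not from the slides):

```python
# Weight count of a fully connected layer is input_size * output_size
# (biases ignored); a conv layer needs only k * k * in_ch * out_ch weights,
# independent of the spatial size of the image.
def fc_params(in_size, out_size):
    return in_size * out_size

def conv_params(k, in_ch, out_ch):
    return k * k * in_ch * out_ch

# Hypothetical example: 64x64 RGB image flattened into a 512-unit hidden layer.
fc = fc_params(64 * 64 * 3, 512)   # 6,291,456 weights
conv = conv_params(3, 3, 64)       # 1,728 weights for a 3x3 conv, 3 -> 64 channels
print(fc, conv)
```

The convolutional layer's weight count does not grow with image resolution, which is why convolutional VAEs scale to images.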
• Convolutional VAE• Conditional VAE• β-VAE• IWAE• Ladder VAE• Progressive + Fade-in VAE• VAE in speech• Temporal Difference VAE (TD-VAE)
5
VAE variants
Hierarchical representation learning
Temporal representation learning
Representation learning
6
Conditional Variational Autoencoder
• Training and inference with labelled data.
Learning structured output representation using deep conditional generative models. NeurIPS 2015.
7
Recap: Variational Autoencoder
Auto-Encoding Variational Bayes. Diederik P. Kingma, Max Welling. ICLR 2013
• Recap: Setting up the objective
• Maximise P(X)
• Set Q(z) to be an arbitrary distribution
8
Recap: Variational Autoencoder
Auto-Encoding Variational Bayes. Diederik P. Kingma, Max Welling. ICLR 2013
• Recap: Setting up the objective
log P(X) − D[Q(z|X) ‖ P(z|X)] = E_{z∼Q}[log P(X|z)] − D[Q(z|X) ‖ P(z)]
(Q(z|X): encoder; P(z|X): ideal posterior; first right-hand term: reconstruction; second: KLD)
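The KLD term has a closed form when Q(z|X) is a diagonal Gaussian and the prior P(z) is a standard normal; a minimal sketch for one sample:

```python
import math

def kl_diag_gaussian(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ): the KLD term of the VAE objective,
    summed over latent dimensions."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

# The KL is zero exactly when the approximate posterior matches the prior.
print(kl_diag_gaussian([0.0, 0.0], [1.0, 1.0]))  # 0.0
```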
9
Conditional Variational Autoencoder
Learning structured output representation using deep conditional generative models. NeurIPS 2015.
• Setting up the objective with labels
• Maximise P(Y|X)
• Set Q(z) to be an arbitrary distribution
10
Conditional Variational Autoencoder
Learning structured output representation using deep conditional generative models. NeurIPS 2015.
• Setting up the objective
log P(Y|X) − D[Q(z|X,Y) ‖ P(z|X,Y)] = E_{z∼Q}[log P(Y|X,z)] − D[Q(z|X,Y) ‖ P(z|X)]
(Q(z|X,Y): encoder; P(z|X,Y): ideal posterior; first right-hand term: reconstruction; second: KLD)
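One common way to implement the conditioning (an assumption for illustration; the paper studies several architectures) is to concatenate a one-hot label to the encoder and decoder inputs:

```python
# Hypothetical sketch: condition a CVAE by appending a one-hot class label
# to the flattened input before it enters the encoder (and to z before the
# decoder, done the same way).
def one_hot(label, num_classes):
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

x = [0.2, 0.7, 0.1]          # hypothetical flattened input
cond_x = x + one_hot(2, 4)   # encoder sees [x, y]
print(cond_x)                # [0.2, 0.7, 0.1, 0.0, 0.0, 1.0, 0.0]
```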
11
Conditional Variational Autoencoder
• Training and inference without labelled data, i.e., the vanilla VAE
Learning structured output representation using deep conditional generative models. NeurIPS 2015.
12
Conditional Variational Autoencoder
• Training and inference with labelled data.
Learning structured output representation using deep conditional generative models. NeurIPS 2015.
13
Conditional Variational Autoencoder
• Training and inference with labelled data.
Learning structured output representation using deep conditional generative models. NeurIPS 2015.
14
Conditional Variational Autoencoder
• Training and inference with labelled data.
Learning structured output representation using deep conditional generative models. NeurIPS 2015.
• Convolutional VAE
• Conditional VAE
• β-VAE
• IWAE
• Ladder VAE
• Progressive + Fade-in VAE
• VAE in speech
• Temporal Difference VAE (TD-VAE)
15
VAE variants
Hierarchical representation learning
Temporal representation learning
Representation learning
16
Before we start
• Disentangled / factorised representation
• Each variable in the inferred latent representation is sensitive to only a single generative factor and relatively invariant to the other factors
• Good interpretability and easy generalisation to a variety of tasks
β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.
17
Before we start
• Unsupervised hierarchical representation learning
Rethinking Style and Content Disentanglement in Variational Autoencoders. ICLR 2018 Workshops.
18
β-VAE
• Unsupervised representation learning
• Augment the original VAE framework with a single hyper-parameter β that modulates the learning constraints
• Impose a limit on the capacity of the latent information channel
β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.
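The β-VAE objective keeps the VAE's reconstruction term but scales the KL term by β; a minimal sketch (the example loss values are made up):

```python
def beta_vae_loss(recon_loss, kl, beta=4.0):
    """beta-VAE objective: reconstruction loss plus beta-weighted KL.
    beta > 1 pushes the posterior harder toward the prior, limiting latent
    capacity and encouraging disentanglement; beta = 1 recovers the vanilla VAE."""
    return recon_loss + beta * kl

print(beta_vae_loss(10.0, 2.0, beta=1.0))  # 12.0 (vanilla VAE)
print(beta_vae_loss(10.0, 2.0, beta=4.0))  # 18.0
```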
19
β-VAE
β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.
20
β-VAE
β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.
21
β-VAE
β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.
22
β-VAE
β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.
23
β-VAE
β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.
24
β-VAE
β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.
25
β-VAE
• Discussion: is it really unsupervised?
• It is unsupervised/self-supervised learning, because it does not need any labelled data
• It is not fully unsupervised learning: it works because of the inductive bias of the neural network model, and the hierarchical design introduces prior knowledge about the data
β-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. ICLR 2017.
• Convolutional VAE
• Conditional VAE
• β-VAE
• IWAE
• Ladder VAE
• Progressive + Fade-in VAE
• VAE in speech
• Temporal Difference VAE (TD-VAE)
26
VAE variants
Hierarchical representation learning
Temporal representation learning
Representation learning
27
IWAE (Importance Weighted Autoencoder)
Importance Weighted Autoencoders. ICLR 2016.
• Optimise a tighter lower bound than the VAE
• The VAE only optimises a lower bound (the ELBO) of log P(X)
log P(X) − D[Q(z|X) ‖ P(z|X)] = E_{z∼Q}[log P(X|z)] − D[Q(z|X) ‖ P(z)] = ELBO
(Q(z|X): encoder; P(z|X): ideal posterior; reconstruction term; KLD term)
The VAE maximises the ELBO, i.e., minimises −ELBO.
28
IWAE
Importance Weighted Autoencoders. ICLR 2016.
• Optimise a tighter lower bound than VAE
29
IWAE
Importance Weighted Autoencoders. ICLR 2016.
30
IWAE
Importance Weighted Autoencoders. ICLR 2016.
• Why “importance weighted”?
The bound is the log of an average of k importance weights, where
w_i = P(X, z_i) / Q(z_i|X), with z_i ∼ Q(z|X)
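The k-sample bound can be sketched directly from the log-weights; the log-sum-exp trick below is a standard numerical-stability device, not something specific to the paper:

```python
import math

def iwae_bound(log_ws):
    """IWAE bound: log( (1/k) * sum_i w_i ), computed stably from the
    log-weights log w_i = log P(X, z_i) - log Q(z_i | X)."""
    k = len(log_ws)
    m = max(log_ws)  # log-sum-exp trick to avoid overflow/underflow
    return m + math.log(sum(math.exp(lw - m) for lw in log_ws)) - math.log(k)

# With k = 1 the bound reduces to the standard single-sample ELBO estimate.
print(iwae_bound([-3.0]))  # -3.0
```

Averaging the weights inside the log (rather than averaging log-weights, as the VAE effectively does) is what makes the bound tighter as k grows.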
• Convolutional VAE
• Conditional VAE
• β-VAE
• IWAE
• Ladder VAE
• Progressive + Fade-in VAE
• VAE in speech
• Temporal Difference VAE (TD-VAE)
31
VAE variants
Hierarchical representation learning
Temporal representation learning
Representation learning
32
Ladder VAE
• Goal: learn a hierarchical latent representation
• Deep models with several layers of dependent stochastic variables are difficult to train
• This limits the improvements obtainable from these highly expressive models
LVAE: Ladder Variational Autoencoder. Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. NIPS 2016.
33
Ladder VAE
LVAE: Ladder Variational Autoencoder. Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. NIPS 2016.
encode decode encode decode
Hierarchical VAE Ladder VAE
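In the Ladder VAE, the encoder's bottom-up Gaussian estimate of each latent layer is combined with the top-down (prior) estimate by precision weighting; a sketch for a single scalar latent (variable names are my own):

```python
def precision_weighted(mu_q, var_q, mu_p, var_p):
    """Combine two Gaussian estimates of the same latent by precision
    weighting, as the Ladder VAE does for each stochastic layer:
    the more certain estimate (higher precision = 1/variance) dominates."""
    prec_q, prec_p = 1.0 / var_q, 1.0 / var_p
    var = 1.0 / (prec_q + prec_p)
    mu = var * (mu_q * prec_q + mu_p * prec_p)
    return mu, var

# Equal precisions -> the combined mean is the simple average of the means.
print(precision_weighted(0.0, 1.0, 2.0, 1.0))  # (1.0, 0.5)
```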
34
Ladder VAE
LVAE: Ladder Variational Autoencoder. Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. NIPS 2016.
35
Ladder VAE
LVAE: Ladder Variational Autoencoder. Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. NIPS 2016.
36
Ladder VAE
LVAE: Ladder Variational Autoencoder. Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. NIPS 2016.
37
Ladder VAE
LVAE: Ladder Variational Autoencoder. Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. NIPS 2016.
38
Ladder VAE
LVAE: Ladder Variational Autoencoder. Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. NIPS 2016.
39
Ladder VAE
LVAE: Ladder Variational Autoencoder. Casper Kaae Sønderby, Tapani Raiko, Lars Maaløe, Søren Kaae Sønderby, and Ole Winther. NIPS 2016.
• Convolutional VAE
• Conditional VAE
• β-VAE
• IWAE
• Ladder VAE
• Progressive + Fade-in VAE
• VAE in speech
• Temporal Difference VAE (TD-VAE)
40
VAE variants
Hierarchical representation learning
Temporal representation learning
Representation learning
41
Progressive + Fade-in VAE
• Discussion
Progressive Learning and Disentanglement of Hierarchical Representations. ICLR 2020.
encoders decoders
high-level
middle-level
low-level
Can we directly train a hierarchical VAE with a ladder structure like this?
42
Progressive + Fade-in VAE
• Discussion
Progressive Learning and Disentanglement of Hierarchical Representations. ICLR 2020.
encoders decoders
high-level
middle-level
low-level
Information shortcut problem: all information goes through the low-level path and the other paths are ignored; the model is lazy.
43
Progressive + Fade-in VAE
• Progressive + Fade-in
Progressive Learning and Disentanglement of Hierarchical Representations. ICLR 2020.
Stage 1: enable the high-level path
Stage 2: enable the high-level and middle-level paths
Stage 3: enable all paths …
α increases gradually from 0 to 1
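The fade-in itself is a simple convex blend between the output without the new path and the output with it, with α ramping from 0 to 1 during training; a scalar sketch (the blending form follows the usual progressive-growing recipe):

```python
def fade_in(old_path, new_path, alpha):
    """Blend a newly enabled path into the network output.
    alpha ramps from 0 to 1 over training, so the new path is introduced
    gradually instead of disrupting the already-trained paths."""
    return (1.0 - alpha) * old_path + alpha * new_path

print(fade_in(1.0, 5.0, 0.0))  # 1.0: new path ignored at the start
print(fade_in(1.0, 5.0, 1.0))  # 5.0: new path fully enabled
print(fade_in(1.0, 5.0, 0.5))  # 3.0: halfway through the fade-in
```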
44
Progressive + Fade-in VAE
• Results
Progressive Learning and Disentanglement of Hierarchical Representations. ICLR 2020.
High-level: background and foreground colors
Middle-level: shape
Low-level: …
45
Progressive + Fade-in VAE
• Results
Progressive Learning and Disentanglement of Hierarchical Representations. ICLR 2020.
High-level Middle-level Low-level
• Convolutional VAE
• Conditional VAE
• β-VAE
• IWAE
• Ladder VAE
• Progressive + Fade-in VAE
• VAE in speech
• Temporal Difference VAE (TD-VAE)
46
VAE variants
Hierarchical representation learning
Temporal representation learning
Representation learning
47
VAE in speech
• Learning latent representations for style control and transfer in end-to-end speech synthesis
• RNN as encoder
Learning latent representations for style control and transfer in end-to-end speech synthesis. ICASSP 2019.
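An RNN encoder reduces a variable-length sequence to a fixed-size summary from which the latent distribution is then predicted; a toy scalar sketch (a plain tanh RNN with made-up weights, whereas real speech systems use GRU/LSTM cells):

```python
import math

def rnn_encode(xs, w_x=0.5, w_h=0.8, h0=0.0):
    """Toy recurrent encoder: fold a sequence into one final hidden state.
    Each step mixes the current input with the previous state, so the
    final state summarises the whole utterance regardless of its length."""
    h = h0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)  # one recurrent step
    return h

print(rnn_encode([1.0, 0.0, 1.0]))
```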
• Convolutional VAE
• Conditional VAE
• β-VAE
• IWAE
• Ladder VAE
• Progressive + Fade-in VAE
• VAE in speech
• Temporal Difference VAE (TD-VAE)
48
VAE variants
Hierarchical representation learning
Temporal representation learning
Representation learning
50
TD-VAE
• The state-space model treats the latent states as a Markov chain
Temporal Difference VAE. ICLR 2019.
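A state-space model with Markov latents can be sampled as follows: each latent depends only on the previous latent, and each observation only on its latent. The toy linear-Gaussian dynamics below are hypothetical, chosen only to make the structure concrete:

```python
import random

def rollout(T, transition, emission, z0):
    """Sample from a state-space model: latents form a Markov chain
    z_t ~ p(z_t | z_{t-1}); each observation depends only on its latent,
    x_t ~ p(x_t | z_t)."""
    zs, xs = [z0], []
    for _ in range(T):
        zs.append(transition(zs[-1]))  # Markov transition in latent space
        xs.append(emission(zs[-1]))    # observation from the current latent
    return zs, xs

random.seed(0)
zs, xs = rollout(3,
                 transition=lambda z: 0.9 * z + random.gauss(0, 0.1),
                 emission=lambda z: z + random.gauss(0, 0.05),
                 z0=1.0)
print(len(zs), len(xs))  # 4 3
```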
51
TD-VAE
Temporal Difference VAE. ICLR 2019.
52
TD-VAE
Temporal Difference VAE. ICLR 2019.
53
TD-VAE
Temporal Difference VAE. ICLR 2019.
54
TD-VAE
Temporal Difference VAE. ICLR 2019.
55
TD-VAE
Temporal Difference VAE. ICLR 2019.
• Convolutional VAE
• Conditional VAE
• β-VAE
• IWAE
• Ladder VAE
• Progressive + Fade-in VAE
• VAE in speech
• Temporal Difference VAE (TD-VAE)
56
Summary
Hierarchical representation learning
Temporal representation learning
Representation learning
Thanks
57