Deep Learning for Computer Vision, Fall 2020
http://vllab.ee.ntu.edu.tw/dlcv.html (Public website)
https://cool.ntu.edu.tw/courses/3368 (NTU COOL; for grade, etc.)
Yu-Chiang Frank Wang (王鈺強), Professor
Dept. Electrical Engineering, National Taiwan University
2020/11/10
Week Date Topic Remarks
1 9/15 Course Logistics
2 9/22 Machine Learning 101
3 9/29 Intro to Neural Networks; Convolutional Neural Network (I) HW #1 out
4 10/6 Convolutional Neural Network (II): Visualization & Extensions of CNN
5 10/13 Tutorials on Python, Github, etc. (by TAs) HW #1 due
6 10/20 Visualization of CNN (II); Object Detection & Segmentation
HW #2 out
7 10/27 Image Segmentation; Generative Models
8 11/3 Generative Adversarial Network (GAN)
9 11/10 Transfer Learning for Visual Classification & Synthesis; Representation Disentanglement
HW #2 due; HW #3 out
10 11/17 TBD (CVPR Week)
11 11/24 Recurrent Neural Networks & Transformer
12 12/1 Meta-Learning; Few-Shot and Zero-Shot Classification (I) HW #3 due
13 12/8 Meta-Learning; Few-Shot and Zero-Shot Classification (II) HW #4 out
14 12/15 From Domain Adaptation to Domain Generalization Team-up for Final Projects
15 12/22 Beyond 2D vision (3D and Depth)
16 12/29 Image Inpainting and Outpainting; Guest Lecture HW #4 due
17 1/5 Guest Lectures
1/18-22 Presentation for Final Projects TBD
What to Cover Today…
• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification
    • Domain Adaptation & Adversarial Learning
  • Visual Synthesis
    • Style Transfer
  • Representation Disentanglement
    • Supervised vs. unsupervised feature disentanglement
Many slides from Richard Turner, Fei-Fei Li, Yaser Sheikh, Simon Lucey, Kaiming He, and J.-B. Huang
Revisit of CNN for Visual Classification
LeCun & Ranzato, Deep Learning Tutorial, ICML 2013
(Traditional) Machine Learning vs. Transfer Learning
• Machine Learning
  • Collecting/annotating data is typically expensive.
Image Credit: A. Karpathy
(Traditional) Machine Learning vs. Transfer Learning (cont'd)
• Transfer Learning
  • Improved learning & understanding in the (target) domain of interest by leveraging knowledge from a different source domain.
• A More Practical Example
https://techcrunch.com/2017/02/08/udacity-open-sources-its-self-driving-car-simulator-for-anyone-to-use/
https://googleblog.blogspot.tw/2014/04/the-latest-chapter-for-self-driving-car.html
Transfer Learning: What, When, and Why?
Transfer Learning
Transfer Learning: a taxonomy
• Inductive Transfer Learning — labeled data are available in the target domain
  • Case 1: no labeled data in the source domain → Self-taught Learning
  • Case 2: labeled data are available in the source domain; source and target tasks are learnt simultaneously → Multi-task Learning
• Transductive Transfer Learning — labeled data are available only in the source domain
  • Different domains but a single task → Domain Adaptation
  • Assumption: single domain and single task → Sample Selection Bias / Covariate Shift
• Unsupervised Transfer Learning — no labeled data in either the source or the target domain
S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE TKDE, 2010.
Domain Adaptation in Transfer Learning
• What's DA?
  • Leveraging info from source to target domains, so that the same learning task can be addressed across domains.
  • Typically all the source-domain data are labeled, while the target-domain data are partially labeled or fully unlabeled.
• Settings
  • Semi-supervised/unsupervised DA: few/no target-domain data are labeled.
  • Imbalanced DA: fewer classes of interest in the target domain.
  • Open-set/closed-set/universal DA: label spaces overlap or not.
  • Homogeneous vs. heterogeneous DA: same/distinct feature types across domains.
Deep Features Are Sufficiently Promising
• DeCAF
  • Leverages an auxiliary large dataset to train a CNN.
  • The resulting features exhibit sufficient representation ability.
  • Supporting results on Office+Caltech datasets, etc.
Donahue et al., DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition, ICML 2014
Recent Deep Learning Methods for DA
• Deep Domain Confusion (DDC)
• Domain-Adversarial Training of Neural Networks (DANN)
• Adversarial Discriminative Domain Adaptation (ADDA)
• Domain Separation Network (DSN)
• Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks (PixelDA)
• No More Discrimination: Cross City Adaptation of Road Scene Segmenters
Method  | Shared weights   | Adaptation loss | Generative model
DDC     | shared           | MMD             | no
DANN    | shared           | adversarial     | no
ADDA    | unshared         | adversarial     | no
DSN     | partially shared | MMD/adversarial | no
PixelDA | unshared         | adversarial     | yes
Deep Domain Confusion (DDC)
• Deep Domain Confusion: Maximizing for Domain Invariance
  • Tzeng et al., arXiv: 1412.3474, 2014
• Two CNN streams with shared weights process source and target data.
• Minimize the classification loss on labeled source data, together with an MMD-based domain confusion loss between source and target features:
  $\mathcal{L} = \mathcal{L}_C(X_S, y_S) + \lambda\,\mathrm{MMD}^2(X_S, X_T)$
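The MMD term can be sketched in its simplest (linear-kernel) form. This is a toy pure-Python illustration operating on batches of feature lists, not the paper's multi-kernel implementation:

```python
def linear_mmd2(source_feats, target_feats):
    """Squared MMD with a linear kernel: the squared Euclidean distance
    between the mean feature vectors of the source and target batches.
    DDC adds a term like lambda * MMD^2 to the source classification loss."""
    dim = len(source_feats[0])
    mu_s = [sum(f[i] for f in source_feats) / len(source_feats) for i in range(dim)]
    mu_t = [sum(f[i] for f in target_feats) / len(target_feats) for i in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))
```

When the two domains have identical feature means, the term vanishes; shifted means are penalized quadratically.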
Domain Confusion by Domain-Adversarial Training
• Domain-Adversarial Training of Neural Networks (DANN)
  • Y. Ganin et al., ICML 2015
  • Maximize domain confusion = maximize the domain classification loss
  • Minimize the source-domain classification loss
  • The derived feature $f$ can be viewed as a disentangled, domain-invariant feature.
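DANN is commonly implemented with a gradient reversal layer between the feature extractor and the domain classifier. The minimal class below sketches the idea with hand-rolled forward/backward methods (an illustrative assumption for gradients stored as plain lists, not an autograd integration):

```python
class GradientReversal:
    """Gradient reversal layer used in DANN: identity on the forward pass,
    multiply incoming gradients by -lambda on the backward pass, so the
    feature extractor is updated to *maximize* the domain classifier's loss."""
    def __init__(self, lambd=1.0):
        self.lambd = lambd

    def forward(self, features):
        return features  # pass features through unchanged

    def backward(self, grad_output):
        return [-self.lambd * g for g in grad_output]  # flip (and scale) the gradient
```

In a framework implementation the same effect is obtained by overriding the backward pass of an identity op.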
Beyond Domain Confusion
• Domain Separation Network (DSN)
  • Bousmalis et al., NIPS 2016
  • Separate encoders for domain-invariant and domain-specific features
  • Private/common features are disentangled from each other.
Beyond Domain Confusion
• Domain Separation Network, NIPS 2016
• Example results (source-domain image $x_S$, target-domain image $x_T$):
  • Reconstruction from private + shared features: $D(E_c(x_S) + E_p(x_S))$
  • Reconstruction from shared features only: $D(E_c(x_S))$
  • Reconstruction from private features only: $D(E_p(x_S))$
What to Cover Today…
• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification
    • Domain Adaptation & Adversarial Learning
  • Visual Synthesis
    • Style Transfer
  • Representation Disentanglement
    • Supervised vs. unsupervised feature disentanglement
Transfer Learning for Manipulating Data?
• TL not only addresses cross-domain classification tasks.
• Let's see how we can synthesize and manipulate data across domains.
• Since this is a computer vision course, let's focus on visual data in this lecture…
(Source Domain → Target Domain)
Transfer Learning for Image Synthesis
• Cross-Domain Image Translation
  • Pix2pix (CVPR '17): pairwise cross-domain training data
  • CycleGAN/DualGAN/DiscoGAN: unpaired cross-domain training data
  • UNIT (NIPS '17): learning cross-domain image representation (with unpaired training data)
  • DTN (ICLR '17): learning cross-domain image representation (with unpaired training data)
  • Beyond image translation
A Super Brief Review of Generative Adversarial Networks (GAN)
• Architecture of GAN: a generator G maps noise $x$ to a sample $G(x)$; a discriminator D distinguishes it from real data $y$.
• Loss:
  $\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x)))] + \mathbb{E}[\log D(y)]$
Goodfellow et al., Generative Adversarial Nets, NIPS, 2014
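As a toy numeric illustration of this minimax value (a pure-Python sketch over scalar discriminator outputs, not a trainable model):

```python
import math

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of the GAN objective
    L_GAN(G, D) = E[log D(y)] + E[log(1 - D(G(x)))],
    given toy discriminator outputs (probabilities of 'real')."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A completely fooled discriminator (D = 0.5 everywhere) yields 2*log(0.5) ~ -1.386,
# the equilibrium value; a sharper D increases the value, which G then pushes back down.
value_at_equilibrium = gan_value([0.5, 0.5], [0.5, 0.5])
```

D tries to maximize this value while G tries to minimize it, which is exactly the arg min-max objective above.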
Pix2pix
• Image-to-image translation with conditional adversarial networks (CVPR '17)
• Can be viewed as image style transfer (e.g., sketch → photo)
Isola et al., "Image-to-image translation with conditional adversarial networks." CVPR 2017.
Pix2pix
• Goal / Problem Setting
  • Image translation across two distinct domains (e.g., sketch vs. photo)
  • Pairwise training data
• Method: Conditional GAN
• Example: Sketch → Photo
  • Generator input: sketch; output: photo
  • Discriminator input: concatenation of the input (sketch) and the synthesized/real (photo) image; output: real or fake
• Training phase: D sees the input concatenated with either the generated or the real image.
• Testing phase: G takes an input sketch and generates a photo.
Isola et al., "Image-to-image translation with conditional adversarial networks." CVPR 2017.
Pix2pix
• Learning the model
  • Conditional GAN loss (fake/generated vs. real pairs, each concatenated with the input):
    $\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_x[\log(1 - D(x, G(x)))] + \mathbb{E}_{x,y}[\log D(x, y)]$
  • Reconstruction loss:
    $\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}[\| y - G(x) \|_1]$
  • Overall objective function:
    $G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \mathcal{L}_{L1}(G)$
Isola et al., "Image-to-image translation with conditional adversarial networks." CVPR 2017.
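The generator side of this objective can be sketched numerically. Below is a pure-Python toy over flattened images; the weighting lam=100.0 follows the value reported in the paper's experiments, and the scalar-list image representation is an illustrative assumption:

```python
import math

def l1_loss(fake, real):
    # Per-pixel L1 reconstruction term ||y - G(x)||_1, averaged over pixels.
    return sum(abs(f - r) for f, r in zip(fake, real)) / len(fake)

def pix2pix_generator_loss(d_on_fake, fake_img, real_img, lam=100.0):
    """Generator-side pix2pix objective: the cGAN term the generator minimizes,
    E[log(1 - D(x, G(x)))], plus the weighted L1 reconstruction loss."""
    adv = sum(math.log(1.0 - p) for p in d_on_fake) / len(d_on_fake)
    return adv + lam * l1_loss(fake_img, real_img)
```

The L1 term keeps the output close to the ground-truth photo, while the adversarial term pushes it to look realistic to D.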
Pix2pix
• Experiment results
Demo page: https://affinelayer.com/pixsrv/
Isola et al., "Image-to-image translation with conditional adversarial networks." CVPR 2017.
CycleGAN/DiscoGAN/DualGAN
• CycleGAN (ICCV '17)
  • Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
• Unpaired training data (no 1-to-1 correspondence) vs. paired data (1-to-1 correspondence):
  • Easier to collect training data
  • More practical
Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
CycleGAN
• Goal / Problem Setting
  • Image translation across two distinct domains
  • Unpaired training data (e.g., photos and paintings)
• Idea
  • Autoencoding-like image translation
  • Cycle consistency between the two domains: Photo → Painting → Photo, and Painting → Photo → Painting
Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
CycleGAN
• Method (example: Photo & Painting)
  • Based on 2 GANs
    • First GAN (G1, D1): Photo → Painting; D1 judges generated paintings against real paintings
    • Second GAN (G2, D2): Painting → Photo; D2 judges generated photos against real photos
  • Cycle consistency
    • Photo consistency
    • Painting consistency
Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
CycleGAN
• Method (example: Photo & Painting)
  • Cycle consistency
    • Photo consistency: Photo → G1 → Painting → G2 → Photo
    • Painting consistency: Painting → G2 → Photo → G1 → Painting
Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
CycleGAN
• Learning
  • Adversarial losses ($x$: photo, $y$: painting)
    • First GAN (G1, D1): $\mathcal{L}_{GAN}(G_1, D_1) = \mathbb{E}[\log(1 - D_1(G_1(x)))] + \mathbb{E}[\log D_1(y)]$
    • Second GAN (G2, D2): $\mathcal{L}_{GAN}(G_2, D_2) = \mathbb{E}[\log(1 - D_2(G_2(y)))] + \mathbb{E}[\log D_2(x)]$
  • Overall objective function:
    $G_1^*, G_2^* = \arg\min_{G_1, G_2} \max_{D_1, D_2} \mathcal{L}_{GAN}(G_1, D_1) + \mathcal{L}_{GAN}(G_2, D_2) + \mathcal{L}_{cyc}(G_1, G_2)$
Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
CycleGAN
• Learning
  • Cycle-consistency loss (photo consistency: $x \to G_1(x) \to G_2(G_1(x))$; painting consistency: $y \to G_2(y) \to G_1(G_2(y))$):
    $\mathcal{L}_{cyc}(G_1, G_2) = \mathbb{E}[\| G_2(G_1(x)) - x \|_1] + \mathbb{E}[\| G_1(G_2(y)) - y \|_1]$
  • Overall objective function:
    $G_1^*, G_2^* = \arg\min_{G_1, G_2} \max_{D_1, D_2} \mathcal{L}_{GAN}(G_1, D_1) + \mathcal{L}_{GAN}(G_2, D_2) + \mathcal{L}_{cyc}(G_1, G_2)$
Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
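The cycle-consistency term can be sketched directly. Below is a pure-Python toy where images are flattened lists of floats and the generators are plain callables (both are illustrative assumptions):

```python
def cycle_consistency_loss(x_batch, y_batch, g1, g2):
    """L_cyc(G1, G2) = E[ ||G2(G1(x)) - x||_1 ] + E[ ||G1(G2(y)) - y||_1 ]."""
    def l1(a, b):
        return sum(abs(u - v) for u, v in zip(a, b)) / len(a)
    forward = sum(l1(g2(g1(x)), x) for x in x_batch) / len(x_batch)
    backward = sum(l1(g1(g2(y)), y) for y in y_batch) / len(y_batch)
    return forward + backward

# Toy generators that are exact inverses incur zero cycle loss.
g1 = lambda img: [v + 1.0 for v in img]  # "photo -> painting"
g2 = lambda img: [v - 1.0 for v in img]  # "painting -> photo"
loss = cycle_consistency_loss([[0.0, 2.0]], [[5.0, 7.0]], g1, g2)  # 0.0
```

The loss is zero exactly when the two translators invert each other on the training data, which is what rules out degenerate mappings under unpaired supervision.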
CycleGAN
• Example results
Project page: https://junyanz.github.io/CycleGAN/
Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
Image Translation Using Unpaired Training Data
• CycleGAN (ICCV '17), DiscoGAN (ICML '17), and DualGAN (ICCV '17)
Zhu et al., "Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks." ICCV 2017.
Kim et al., "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks." ICML 2017.
Yi et al., "DualGAN: Unsupervised dual learning for image-to-image translation." ICCV 2017.
UNIT
• Unsupervised Image-to-Image Translation Networks (NIPS '17)
  • Image translation via learning a cross-domain joint representation
  • Stage 1: encode images $x_1$, $x_2$ into the joint latent space $z$
  • Stage 2: generate cross-domain images from the joint latent space (e.g., Day ↔ Night)
Liu et al., "Unsupervised image-to-image translation networks." NIPS 2017.
UNIT
• Goal / Problem Setting
  • Image translation across two distinct domains
  • Unpaired training image data
• Idea
  • Based on two parallel VAE-GAN models
  • Learning of a joint representation across image domains
  • Generating cross-domain images from the joint representation
Liu et al., "Unsupervised image-to-image translation networks." NIPS 2017.
UNIT
• Learning (two VAE-GAN branches $E_1 \to G_1 \to D_1$ and $E_2 \to G_2 \to D_2$ sharing the joint latent space $z$)
  • Variational autoencoder loss:
    $\mathcal{L}_{VAE}(E_1, G_1, E_2, G_2) = \mathbb{E}[\| G_1(E_1(x_1)) - x_1 \|^2] + \mathrm{KL}(q_1(z) \,\|\, p(z)) + \mathbb{E}[\| G_2(E_2(x_2)) - x_2 \|^2] + \mathrm{KL}(q_2(z) \,\|\, p(z))$
  • Adversarial loss:
    $\mathcal{L}_{GAN}(G_1, D_1, G_2, D_2) = \mathbb{E}[\log(1 - D_1(G_1(z)))] + \mathbb{E}[\log D_1(y_1)] + \mathbb{E}[\log(1 - D_2(G_2(z)))] + \mathbb{E}[\log D_2(y_2)]$
  • Overall objective function:
    $G^* = \arg\min_G \max_D \mathcal{L}_{VAE}(E_1, G_1, E_2, G_2) + \mathcal{L}_{GAN}(G_1, D_1, G_2, D_2)$
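The KL regularizer that each UNIT encoder pays for mapping its domain into the shared latent space has a closed form for diagonal Gaussians against a standard normal prior. A minimal sketch (mean/log-variance vectors as plain lists, an illustrative assumption):

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form:
    0.5 * sum( exp(logvar) + mu^2 - 1 - logvar ).
    This is the VAE regularizer on the posterior q(z|x) of each encoder."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))
```

It is zero exactly when the posterior matches the prior (zero mean, unit variance), and grows as the encoder drifts away from the shared prior.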
UNIT
• Example results
  • Sunny ↔ Rainy
  • Real street-view ↔ Synthetic street-view
GitHub page: https://github.com/mingyuliutw/UNIT
Liu et al., "Unsupervised image-to-image translation networks." NIPS 2017.
Domain Transfer Networks (DTN)
• Unsupervised Cross-Domain Image Generation (ICLR '17)
• Goal / Problem Setting
  • Image translation across two domains
  • One-way translation only
  • Unpaired training data
• Idea
  • Apply a unified model to learn a joint representation across domains.
  • Enforce consistency in both the image and the feature space.
Taigman et al., "Unsupervised cross-domain image generation." ICLR 2017.
Domain Transfer Networks
• Learning (with $G = \{f, g\}$: feature extractor $f$ followed by generator $g$)
  • Unified model to translate across domains
  • Image consistency (identity on target-domain samples $y$):
    $\mathcal{L}_{tid}(G) = \mathbb{E}[\| g(f(y)) - y \|^2]$
  • Feature consistency (source-domain samples $x$):
    $\mathcal{L}_{const}(G) = \mathbb{E}[\| f(g(f(x))) - f(x) \|^2]$
  • Adversarial loss:
    $\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x)))] + \mathbb{E}[\log(1 - D(G(y)))] + \mathbb{E}[\log D(y)]$
  • Overall objective function:
    $G^* = \arg\min_G \max_D \mathcal{L}_{tid}(G) + \mathcal{L}_{const}(G) + \mathcal{L}_{GAN}(G, D)$
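DTN's two consistency terms can be sketched together. This pure-Python toy treats images and features as flat lists and the networks f, g as plain callables (illustrative assumptions, not the paper's implementation):

```python
def dtn_consistency_losses(x_batch, y_batch, f, g):
    """DTN's consistency terms, with G = g(f(.)):
    L_const = E[ ||f(g(f(x))) - f(x)||^2 ]  (feature consistency on source x)
    L_tid   = E[ ||g(f(y)) - y||^2 ]        (image identity on target y)."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    l_const = sum(sq_dist(f(g(f(x))), f(x)) for x in x_batch) / len(x_batch)
    l_tid = sum(sq_dist(g(f(y)), y) for y in y_batch) / len(y_batch)
    return l_const, l_tid
```

Intuitively, translated source images must keep the same features as their inputs, while target images must pass through G unchanged.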
DTN
• Example results: SVHN → MNIST; Photo → Emoji
Taigman et al., "Unsupervised cross-domain image generation." ICLR 2017.
What to Cover Today…
• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification
    • Domain Adaptation & Adversarial Learning
  • Visual Synthesis
    • Style Transfer
  • Representation Disentanglement
    • Supervised vs. unsupervised feature disentanglement
Beyond Image Style Transfer: Learning Interpretable Deep Representations
• FaceApp: putting a smile on your face!
• Deep learning for representation disentanglement
  • Interpretable deep feature representation
(Input image: Mr. Takeshi Kaneshiro)
Recall: Generative Adversarial Networks (GAN)
• Architecture of GAN
• Loss: $\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x)))] + \mathbb{E}[\log D(y)]$
Goodfellow et al., Generative Adversarial Nets, NIPS, 2014
Representation Disentanglement
• Goal
  • Interpretable deep feature representation
  • Disentangle the attribute of interest $c$ from the derived latent representation $z$
  • Possible solutions: VAE, GAN, or a mix of them…
(G takes an uninterpretable latent feature $z$ together with an interpretable factor $c$, e.g., season)
Representation Disentanglement
• Goal
  • Interpretable deep feature representation
  • Disentangle the attribute of interest $c$ from the derived latent representation $z$
• Supervised setting: from VAE to conditional VAE
Representation Disentanglement
• Conditional VAE
  • Given training data $x$ and the attribute of interest $c$, we model the conditional distribution $p_\theta(x \mid c)$.
https://zhuanlan.zhihu.com/p/25518643
Representation Disentanglement
• Conditional VAE
  • Example results
Representation Disentanglement
• Conditional GAN
  • Interpretable latent factor $c$
  • Latent representation $z$
https://arxiv.org/abs/1411.1784
Representation Disentanglement
• Goal
  • Interpretable deep feature representation
  • Disentangle the attribute of interest $c$ from the derived latent representation $z$
• Unsupervised: InfoGAN (Chen et al., NIPS '16)
• Supervised: AC-GAN (Odena et al., ICML '17)
Chen et al., InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. NIPS 2016.
Odena et al., Conditional image synthesis with auxiliary classifier GANs. ICML 2017.
AC-GAN
• Supervised Disentanglement
• Learning
  • Overall objective function:
    $G^* = \arg\min_G \max_D \mathcal{L}_{GAN}(G, D) + \mathcal{L}_{cls}(G, D)$
  • Adversarial loss:
    $\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(z, c)))] + \mathbb{E}[\log D(y)]$
  • Disentanglement (classification) loss:
    $\mathcal{L}_{cls}(G, D) = \mathbb{E}[-\log D_{cls}(c' \mid y)] + \mathbb{E}[-\log D_{cls}(c \mid G(z, c))]$
    (real data classified w.r.t. its domain label $c'$; generated data classified w.r.t. the assigned label $c$)
Odena et al., Conditional image synthesis with auxiliary classifier GANs. ICML 2017.
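The supervised disentanglement term can be sketched as two negative log-likelihoods over the auxiliary classifier's outputs. A pure-Python toy where classifier outputs are per-class probability rows (an illustrative assumption):

```python
import math

def acgan_cls_loss(d_probs_real, labels_real, d_probs_fake, labels_fake):
    """AC-GAN's supervised disentanglement term:
    L_cls = E[-log Dcls(c'|y)] + E[-log Dcls(c|G(z, c))].
    The auxiliary classifier must recover the true label of real data
    and the assigned label of generated data."""
    def nll(prob_rows, labels):
        return -sum(math.log(row[c]) for row, c in zip(prob_rows, labels)) / len(labels)
    return nll(d_probs_real, labels_real) + nll(d_probs_fake, labels_fake)
```

Both D (on real data) and G (through the generated-data term) are rewarded for making the class label recoverable, which ties the factor $c$ to a visible attribute.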
AC-GAN
• Supervised Disentanglement
  • Varying the value of $c$ changes the corresponding attribute of the generated output $G(z, c)$.
Odena et al., Conditional image synthesis with auxiliary classifier GANs. ICML 2017.
InfoGAN
• Unsupervised Disentanglement
• Learning
  • Overall objective function:
    $G^* = \arg\min_G \max_D \mathcal{L}_{GAN}(G, D) + \mathcal{L}_{cls}(G, D)$
  • Adversarial loss:
    $\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(z, c)))] + \mathbb{E}[\log D(y)]$
  • Disentanglement loss (on generated data only, since no labels are available):
    $\mathcal{L}_{cls}(G, D) = \mathbb{E}[-\log D_{cls}(c \mid G(z, c))]$
Chen et al., InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. NIPS 2016.
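The disentanglement term above is InfoGAN's variational lower bound on the mutual information between the code and the generated image. A pure-Python toy (probability rows as lists, an illustrative assumption):

```python
import math

def infogan_mi_loss(q_probs_on_fake, sampled_codes):
    """Variational bound used by InfoGAN: maximizing I(c; G(z, c)) is
    approximated by minimizing E[-log Q(c | G(z, c))] over generated samples.
    Unlike AC-GAN, there is no real-data label term: the codes c are
    sampled during training, never annotated."""
    return -sum(math.log(row[c])
                for row, c in zip(q_probs_on_fake, sampled_codes)) / len(sampled_codes)
```

When the auxiliary head Q cannot recover the sampled code at all (uniform output), the loss equals the code's entropy; it drops toward zero as the code becomes recoverable from the image.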
InfoGAN
• Unsupervised Disentanglement
  • No guarantee of disentangling particular semantics: different values of $c$ may end up controlling, e.g., rotation angle or width.
(Figure: generated samples for different $c$ over the training process, with the loss over time.)
Chen et al., InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. NIPS 2016.
What to Cover Today…
• Transfer Learning for Visual Classification & Synthesis
  • Visual Classification
    • Domain Adaptation & Adversarial Learning
  • Visual Synthesis
    • Style Transfer
  • Representation Disentanglement
    • Supervised vs. unsupervised feature disentanglement
    • Joint style transfer & feature disentanglement
StarGAN
• Goal
  • Unified GAN for multi-domain image-to-image translation
  • Traditional cross-domain models need a separate generator for every domain pair ($G_{12}$, $G_{13}$, $G_{23}$, $G_{14}$, $G_{24}$, $G_{34}$, …), whereas StarGAN handles all domains with a single unified G.
Choi et al., "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." CVPR 2018.
StarGAN
• Goal / Problem Setting
  • Single image translation model across multiple domains
  • Unpaired training data
• Idea
  • Concatenate the image and the target domain label as input to the generator
  • Auxiliary domain classifier on the discriminator
  • Cycle consistency across domains
StarGAN
• Learning
  • Overall objective function:
    $G^* = \arg\min_G \max_D \mathcal{L}_{GAN}(G, D) + \mathcal{L}_{cls}(G, D) + \mathcal{L}_{cyc}(G)$
  • Adversarial loss (real image $y$ vs. translated image $G(x, c)$):
    $\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x, c)))] + \mathbb{E}[\log D(y)]$
StarGAN
• Learning
  • Domain classification loss (disentanglement):
    $\mathcal{L}_{cls}(G, D) = \mathbb{E}[-\log D_{cls}(c' \mid y)] + \mathbb{E}[-\log D_{cls}(c \mid G(x, c))]$
    • Real data $y$ is classified w.r.t. its own domain label $c'$.
    • Generated data $G(x, c)$ is classified w.r.t. the assigned target label $c$.
  • Overall objective function:
    $G^* = \arg\min_G \max_D \mathcal{L}_{GAN}(G, D) + \mathcal{L}_{cls}(G, D) + \mathcal{L}_{cyc}(G)$
StarGAN
• Learning
  • Cycle consistency loss: translate $x$ to the target domain $c$, then translate back with the original domain label $c_x$:
    $\mathcal{L}_{cyc}(G) = \mathbb{E}[\| G(G(x, c), c_x) - x \|_1]$
  • Overall objective function:
    $G^* = \arg\min_G \max_D \mathcal{L}_{GAN}(G, D) + \mathcal{L}_{cls}(G, D) + \mathcal{L}_{cyc}(G)$
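StarGAN's label-conditioned cycle term can be sketched in pure Python. Images are flat lists, the generator is a callable taking an image and a (numeric) domain label, and the list-of-triples batch format is an illustrative assumption:

```python
def stargan_cycle_loss(samples, g):
    """StarGAN's cycle term L_cyc = E[ ||G(G(x, c), c_x) - x||_1 ]:
    translate x into target domain c, then translate back using the
    original domain label c_x. `samples` holds (x, c_x, c) triples."""
    def l1(a, b):
        return sum(abs(u - v) for u, v in zip(a, b)) / len(a)
    return sum(l1(g(g(x, c), cx), x) for x, cx, c in samples) / len(samples)

# Toy generator: shifts every pixel by the (numeric) domain label, so
# translating with c and back with c_x = -c exactly recovers the input.
g = lambda img, c: [v + c for v in img]
loss = stargan_cycle_loss([([0.0, 1.0], -2.0, 2.0)], g)  # 0.0
```

A single conditioned generator replaces CycleGAN's pair (G1, G2): the same G performs both the forward and the backward translation, selected by the label it receives.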
StarGAN
• Learning: overall objective function
  $G^* = \arg\min_G \max_D \mathcal{L}_{GAN}(G, D) + \mathcal{L}_{cls}(G, D) + \mathcal{L}_{cyc}(G)$
  • Adversarial loss: $\mathcal{L}_{GAN}(G, D) = \mathbb{E}[\log(1 - D(G(x, c)))] + \mathbb{E}[\log D(y)]$
  • Domain classification loss: $\mathcal{L}_{cls}(G, D) = \mathbb{E}[-\log D_{cls}(c' \mid y)] + \mathbb{E}[-\log D_{cls}(c \mid G(x, c))]$
  • Cycle consistency loss: $\mathcal{L}_{cyc}(G) = \mathbb{E}[\| G(G(x, c), c_x) - x \|_1]$
StarGAN
• Example results (translation across multiple domains)
• StarGAN can to some extent be viewed as a representation disentanglement model rather than merely an image translation one.
GitHub page: https://github.com/yunjey/StarGAN
Choi et al., "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation." CVPR 2018.
A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation
• Learning interpretable representations
Example Results
• Face image translation
• Multi-attribute image translation
Next Week
Guest Lectures:
1. "The Paradigm Shift in AI"
   - 2:20pm-3:10pm
   - Dr. Trista Chen, Chief Scientist of Machine Learning, Inventec Corp.
2. AI startup sharing (talk in Chinese)
   - 3:30pm-4:20pm
   - David Chou, Founder & CEO of Deep01