[DL Reading Group] Learning What and Where to Draw (NIPS’16)

Paper reading: Learning What and Where to Draw (NIPS’16)

2017/1/20

Bibliographic information
• Learning What and Where to Draw
• Scott Reed (Google), Zeynep Akata (MPI), Santosh Mohan (umich), Samuel Tenka (umich), Bernt Schiele (MPI), Honglak Lee (umich)
• NIPS’16 (Conference Event Type: Poster)
• https://papers.nips.cc/paper/6111-learning-what-and-where-to-draw

c.f. Generative Adversarial Text to Image Synthesis

• ICML’16
• http://www.slideshare.net/mmisono/generative-adversarial-text-to-image-synthesis

Generative Adversarial What-Where Network (GAWWN)
• A GAN that lets you specify "what" to draw and "where" to draw it

Text, bounding box / keypoint

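Bounding-box-conditional text-to-image model
1. Convert the text embedding into an M x M x T feature map
2. Normalize it to fit the bounding box; fill the surrounding area with zeros

[Figure: M x M x T text feature map inside the box, masked with 0 outside]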
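The two steps above can be sketched roughly as follows. This is a minimal illustration only, assuming the bounding box is given as normalized `(x, y, w, h)` and that a simple nearest-cell fill is good enough; the actual model warps the feature map with a spatial transformer, which this sketch does not reproduce. The function name `bbox_text_map` and its parameters are hypothetical.

```python
def bbox_text_map(text_emb, bbox, M=16):
    """Build an M x M x T nested list: text_emb inside bbox, zeros outside.

    Assumption: bbox = (x, y, w, h) in normalized [0, 1] image coordinates.
    """
    T = len(text_emb)
    x, y, w, h = bbox
    # Convert the normalized box into grid-cell index ranges, clipped to the grid.
    c0 = max(int(round(x * M)), 0)
    r0 = max(int(round(y * M)), 0)
    c1 = min(int(round((x + w) * M)), M)
    r1 = min(int(round((y + h) * M)), M)
    grid = []
    for r in range(M):
        row = []
        for c in range(M):
            inside = r0 <= r < r1 and c0 <= c < c1
            # Replicate the embedding inside the box; mask with zeros elsewhere.
            row.append(list(text_emb) if inside else [0.0] * T)
        grid.append(row)
    return grid


if __name__ == "__main__":
    fmap = bbox_text_map([1.0, 2.0], (0.25, 0.25, 0.5, 0.5), M=4)
    for row in fmap:
        print(["T" if cell[0] else "0" for cell in row])
```

The resulting map carries the "what" (the text embedding) only at the "where" (the box), which is the point of the conditioning.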

Keypoint-conditional text-to-image model
• Keypoints are specified in grid coordinates
• Each keypoint corresponds to a part such as head, left foot, etc.
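One way to picture the grid encoding is an M x M x K map with one channel per part and a 1 at the cell where that part sits. The sketch below assumes normalized `(x, y, v)` triples and a hard one-hot placement; the name `keypoint_map` and the choice of M are illustrative, not taken from the slides.

```python
def keypoint_map(keypoints, M=16):
    """Encode K keypoints as an M x M x K grid with one channel per part.

    Assumption: each keypoint is (x, y, v) with x, y normalized to [0, 1]
    and v a visibility flag; invisible parts leave their channel all zero.
    """
    K = len(keypoints)
    grid = [[[0.0] * K for _ in range(M)] for _ in range(M)]
    for k, (x, y, v) in enumerate(keypoints):
        if not v:
            continue
        # Map normalized coordinates to a grid cell, clamping x = 1 or y = 1.
        col = min(int(x * M), M - 1)
        row = min(int(y * M), M - 1)
        grid[row][col][k] = 1.0
    return grid


if __name__ == "__main__":
    # Hypothetical three-part example; the third part is not visible.
    kmap = keypoint_map([(0.1, 0.2, 1), (0.9, 0.5, 1), (0.0, 0.0, 0)], M=4)
    print(kmap[0][0], kmap[2][3])
```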

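Conditional keypoint generation model

• Specifying every keypoint by hand is tedious
  • In these experiments, each bird has 15 keypoints

• Here, keypoints are generated with a conditional GAN

• Keypoints: k_i = (x_i, y_i, v_i)
  • x, y: coordinates; v: visible flag
  • if v = 0, then x = y = 0

• Generator: G_k(z, t, k, s) := s ⊙ k + (1 − s) ⊙ f(z, t, k)

• D is trained to output 1 for real keypoints and 0 for synthesized ones

• s: 1 at the entries corresponding to keypoints specified by the user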
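The role of s can be illustrated with a small sketch. In the paper the keypoint generator is a gated combination, s ⊙ k + (1 − s) ⊙ f(z, t, k): user-specified keypoints pass through unchanged and the rest come from the network f. Below, `toy_f` is a hypothetical random stand-in for f (the real f is a learned MLP conditioned on noise and text), so only the gating logic is meaningful.

```python
import random


def gate_keypoints(k_user, s, f_out):
    """Apply s * k + (1 - s) * f per part to each (x, y, v) triple."""
    gated = []
    for k_i, s_i, f_i in zip(k_user, s, f_out):
        gated.append(tuple(s_i * a + (1 - s_i) * b for a, b in zip(k_i, f_i)))
    return gated


def toy_f(num_parts, seed=0):
    """Hypothetical stand-in for the generator network f(z, t, k).

    Returns random visible keypoints; a trained model would be used instead.
    """
    rng = random.Random(seed)
    return [(rng.random(), rng.random(), 1.0) for _ in range(num_parts)]


if __name__ == "__main__":
    # The user fixes only the first part; s marks it with 1.
    k_user = [(0.2, 0.3, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    s = [1, 0, 0]
    for kp in gate_keypoints(k_user, s, toy_f(3)):
        print(kp)
```

With this gating, the output always agrees with the user on the specified parts, so the discriminator only has to judge whether the completed set looks like a plausible bird.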

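Experiments: Dataset

• CUB Birds dataset
  • 200 bird species, 11,788 images
  • Each image has 10 captions, 1 bounding box, and 15 keypoints

• MHP
  • 25k images, 410 kinds of activities
  • 3 captions per image
  • 19k images remain after removing those that show multiple people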

Experiments: Misc

• Text encoder: char-CNN-GRU
  • Probably the same as in Generative Adversarial Text to Image Synthesis

• Solver: Adam
  • Batch size 16
  • Learning rate 0.0002

• Implementation: torch
  • Spatial transform: https://github.com/qassemoquab/stnbhwd
  • Loosely based on dcgan.torch

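Conditional bird location via bounding boxes

・Text and noise are the same for all three images
・Backgrounds are similar across the three images but not identical
・The bird's orientation stays the same even when the bounding box changes
・z may be responsible for information that cannot be controlled directly, such as background and orientation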

Conditional individual part locations via keypoints

・Keypoints are fixed to the ground truth (not synthesized)
・Noise differs for each example
・Keypoints are invariant to the noise
・Background and similar details change with the noise

Using keypoints condition

・Beak and tail are specified
・All birds face left (c.f. condition on bounding box)

Generating both bird keypoints and images from text alone

・Keypoints are generated from text alone, then the image is generated
・Quality drops when all keypoints are generated

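Comparison with prior work
・Prior work captures the text almost accurately, but parts such as the beak are sometimes missing (64x64)
・The proposed method generates almost accurate images at 128x128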

Generating Human

・Quality is lower than for birds
・Few examples have similar text, and complex poses are difficult (something like yoga comes out reasonably well)

Summary
• GAWWN: conditions where to draw using bounding boxes and keypoints

• On the CUB dataset, high-quality images can be generated at 128x128

• Future work
  • Learn object locations in an unsupervised or weakly supervised way
  • Better text-to-human generation

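Impressions
• What is new is how the "where" information is encoded
  • bounding box
  • keypoints

• Text alone leaves too much freedom; supplying location information makes the image easier to generate

• The detailed network architecture is given without explaining why it was designed that way, so the reasoning is unclear
  • Some more theoretical justification would be welcome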
