CVPR 2016 Reading Group: Learning Sparse High Dimensional Filters

Computer Vision Study Group @ Kanto: CVPR 2016 Reading Group (Part 2)

Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks

Jin Yamanaka 7/22/2016

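What I want to convey today

• Nice to meet you all, and I look forward to working with you
• Word embeddings are interesting
• How to use CNNs for feature detection in sparse, multi-dimensional spaces
• BNN (Bilateral Neural Network) ??

About me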

http://jtpa.org




How do you find your paper?

OMG, 643 papers? How am I supposed to choose?!

Feel! Don’t think…

…

643 enemies appeared!! Magic has no effect!

using word embedding (word2vec)

htmlunit

PDF RDB(MySQL)

corpus (text file)

pdfbox java

word (paper) embedding

vector

word2vec
scikit-learn

Understanding paper trends
Paper search

word2vec
Based on TensorFlow's word2vec_basic, with various additions and modifications
https://github.com/tensorflow/tensorflow/blob/r0.9/tensorflow/examples/tutorials/word2vec/word2vec_basic.py

Unsupervised learning


word2vec (result)

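・The corpus extracted from the papers is 20 MB
・30,000 to 100,000 steps until it converges sufficiently
・About 10 minutes on a MacBook Pro
・Vocabulary of 8,000 words, mapped to 96 dimensions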

multiple: different, two, all, three, several, textureless, has, various

approach: method, algorithm, framework, methods, model, also, algorithms, proposed_method

deep_learning: existing, proposed, neural_network, immediately, approaches, ultimate, compactness

table: figure, fig, comparisons, average, quantitative, auc, plot

detection: classification, localization, object_detection, prediction, recognition, tracking, segmentation

image: photos, input_image, video, scene, patch, channel, render, input

depth: disparity, rgb, input, reaction, saliency, raw, from, optical_flow

It seems to learn words used in similar ways (near-synonyms) rather than meaning?
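
Neighbor lists like the ones above are what a cosine-similarity lookup over the learned embeddings typically produces. A minimal sketch in plain Python, assuming a tiny hand-made table: `toy_embeddings` and its 3-dimensional vectors are invented for illustration and are not values from the trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_words(query, embeddings, k=3):
    """Return the k words most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scored = [(cosine(q, vec), word)
              for word, vec in embeddings.items() if word != query]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]

# Hypothetical toy vectors (assumption), not taken from the trained model.
toy_embeddings = {
    "approach": [0.9, 0.1, 0.0],
    "method": [0.8, 0.2, 0.1],
    "algorithm": [0.7, 0.3, 0.0],
    "depth": [0.0, 0.2, 0.9],
    "disparity": [0.1, 0.1, 0.8],
}

print(nearest_words("approach", toy_embeddings, k=2))  # ['method', 'algorithm']
```

Cosine similarity ignores vector length, so frequent and rare words can still end up as neighbors if they appear in similar contexts.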

Paper embedding vectors (result)

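・Assume paper vector = mean of the word vectors in the abstract + title
・Automatic grouping with t-SNE
・Deep Learning vs. 3D?

・Papers in overlapping regions probably cover both topics?
・The separation is not clean, but similar papers do cluster reasonably well
・Papers from minor areas cannot be separated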
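
The averaging assumption can be sketched as follows. This is an illustration only: `paper_vector`, the whitespace tokenizer and the 2-dimensional `toy_embeddings` are assumptions, and out-of-vocabulary words are simply skipped.

```python
def paper_vector(text, embeddings):
    """Average the vectors of the in-vocabulary words in `text`."""
    words = [w for w in text.lower().split() if w in embeddings]
    if not words:
        return None  # no known words, so no vector can be formed
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for w in words:
        for i, x in enumerate(embeddings[w]):
            total[i] += x
    return [x / len(words) for x in total]

# Hypothetical toy vectors (assumption), not taken from the trained model.
toy_embeddings = {
    "deep": [1.0, 0.0],
    "network": [0.8, 0.2],
    "stereo": [0.0, 1.0],
    "depth": [0.2, 0.8],
}

# "for" is out of vocabulary and is ignored.
vec = paper_vector("Deep network for stereo depth", toy_embeddings)
print(vec)
```

A paper that mixes vocabulary from two topics lands between them, which fits the observation that papers in overlapping regions probably cover both topics. The resulting vectors could then be passed to a t-SNE implementation such as scikit-learn's `sklearn.manifold.TSNE` to get a 2-D map.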

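Trying a visualization

Mapping the papers the other presenters chose: quite well balanced
Is this the invisible hand of God? If so, the next divine move is...

Pikachu! I choose you!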

Learning Sparse High Dimensional Filters…

Paper auto clustering tools!

Finds similar papers
Given a paper number, it shows the papers whose paper vectors are closest to that paper
question: 575 “Sparse Coding for Third-Order Super-Symmetric Tensor Descriptors With …”

answer: 630 “TenSR: Multi-Dimensional Tensor Sparse Representation”

Finds papers by keyword
Given keywords, it shows the most "average" paper (the one closest to the centroid) among the papers that contain them
question: “3D”, “unsupervised”

answer: 164 “Dense Human Body Correspondences Using Convolutional Networks”
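
The keyword lookup can be sketched as below. The slide does not say whether a paper must contain all keywords or any of them, nor which distance is used; this sketch assumes "all keywords" and Euclidean distance, and `most_central_paper` and the toy `papers` table are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def most_central_paper(keywords, papers):
    """papers maps paper id -> (text, vector).

    Among the papers whose text contains every keyword, return the id of the
    one whose vector is closest to the centroid of that group.
    """
    hits = {pid: vec for pid, (text, vec) in papers.items()
            if all(k.lower() in text.lower() for k in keywords)}
    if not hits:
        return None
    c = centroid(list(hits.values()))
    return min(hits, key=lambda pid: math.dist(hits[pid], c))

# Hypothetical toy data (assumption): ids, texts and 2-D paper vectors.
papers = {
    1: ("3D unsupervised shape learning", [0.0, 0.0]),
    2: ("unsupervised 3D correspondence", [1.0, 0.0]),
    3: ("3D unsupervised pose", [4.0, 0.0]),
    4: ("supervised 2D detection", [1.5, 0.0]),
}

print(most_central_paper(["3D", "unsupervised"], papers))  # 2
```

The same `math.dist` ranking against a single paper's vector, instead of a centroid, gives the "similar papers" lookup.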

Built as a bonus

Paper auto clustering tools!

Popularity of each keyword (n=643)

Looked into as a bonus

Deep Learning: 354
CNN, DCNN: 187
RNN, LSTM: 48

HOG, SIFT, Saliency Map: 39
MRF, CRF: 36
Optical Flow: 21
SVM, Logistic Regression: 14
Light Field: 9
Sparse Coding: 8
Random Forest: 4
Infra Red: 4
HDR: 3

DL is hugely popular! RNNs are on the rise too?

The rivals are MRF / CRF

Data sets: 212
Framework: 139

Frameworks and datasets are also drawing attention

3D, Stereo, RGB-D, Depth: 151
Video, Movie: 123
Text, Text Detection: 28
Real Time: 26
Multi View, Multi Angle: 24

Applications to video and 3D are under way

Image Classification: 104
Segmentation, Contour Detection: 81
Object Detection / Recognition: 68
Pose Estimation, Action Recognition / Prediction: 48
Scene Representation, Image Annotation, Question Answering: 12

In order of difficulty?

Supervised: 39
Unsupervised: 31
Semi-Supervised, Zero-Shot Learning: 25
Weakly Supervised: 19
Fine Tuning: 7
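
Counts like these can be produced with a simple per-paper keyword match. How the figures above were actually computed is not stated; the sketch below assumes each paper is counted once per keyword group if its text mentions any keyword in the group, using case-insensitive substring matching. `keyword_popularity` and the toy data are hypothetical.

```python
def keyword_popularity(papers, groups):
    """Count how many papers mention at least one keyword from each group."""
    counts = {}
    for name, keywords in groups.items():
        counts[name] = sum(
            1 for text in papers
            if any(k.lower() in text.lower() for k in keywords)
        )
    return counts

# Hypothetical toy corpus (assumption), standing in for the 643 papers.
papers = [
    "Deep CNN for segmentation",
    "LSTM video captioning",
    "CRF inference with CNN features",
]
groups = {
    "CNN, DCNN": ["cnn", "dcnn"],
    "RNN, LSTM": ["rnn", "lstm"],
    "MRF, CRF": ["mrf", "crf"],
}

print(keyword_popularity(papers, groups))
```

Substring matching is crude (a short keyword can match inside a longer word), so a real tool would more likely match on tokens.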



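Keywords: CRF, CNN, Bilateral Filter, Permutohedral Lattice

Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks

・Proposes a learnable bilateral filter (a little bit deep)
  → Convolutional filters in high-dimensional space can be trained with backpropagation
・Good accuracy and speed in sparse, high-dimensional feature spaces
  → Accuracy is good because the filter is learned
  → Speed is good because it uses a polytope-shaped lattice model (the permutohedral lattice)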

Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks

1. Deep
2. Works cross-modally, e.g. R, G, B + depth
3. The permutohedral lattice looks interesting

My Points

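The very handy bilateral filter
Today we extend it

1. Support high-dimensional features such as (r, g, b)
2. Stay efficient even in sparse spaces
3. Wouldn't using a NN instead of a Gaussian improve performance?
4. Verify performance on a variety of applications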

1. Introduction

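Previous work has mostly targeted speed-ups; few attempts have been made to learn the filter

From the slides "コンピューテイショナルフォトグラフィ" (Computational Photography): http://www.slideshare.net/FukushimaNorishige/ss-11861123

From the slides "非技術者でもわかる（？）コンピュータビジョン紹介資料" (An introduction to computer vision that even non-engineers can follow (?)): http://www.slideshare.net/takmin/20140710-cv

Obtain the weights by learning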


Build a 5-dimensional feature and assign weights from distances in that space
Besides r, g, b, you can also add things like depth
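
A brute-force version of this idea, with the usual Gaussian weights, can be sketched as follows. It assumes each pixel has a feature vector (x, y, r, g, b) and a scalar value to be filtered; the function name, the two sigma parameters and the toy data are assumptions for illustration.

```python
import math

def bilateral_filter(features, values, sigma_spatial=2.0, sigma_color=0.2):
    """Brute-force bilateral filter over points with features (x, y, r, g, b).

    Each output is a normalized, Gaussian-weighted average of all input
    values, with weights computed from distances in the 5-D feature space.
    """
    scales = [sigma_spatial] * 2 + [sigma_color] * 3
    out = []
    for fi in features:
        num = 0.0
        den = 0.0
        for fj, vj in zip(features, values):
            d2 = sum(((a - b) / s) ** 2 for a, b, s in zip(fi, fj, scales))
            w = math.exp(-0.5 * d2)
            num += w * vj
            den += w
        out.append(num / den)
    return out

# Toy one-row grayscale image with an edge: noisy dark half, noisy bright half.
intensities = [0.0, 0.1, 0.0, 1.0, 0.9, 1.0]
features = [(float(x), 0.0, v, v, v) for x, v in enumerate(intensities)]

smoothed = bilateral_filter(features, intensities)
print([round(v, 3) for v in smoothed])
```

Pixels on opposite sides of the edge are far apart in color, so their mutual weights are essentially zero: the noise within each half is smoothed while the edge stays sharp. Adding depth would just mean a sixth feature component with its own scale.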

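Consider a multi-dimensional space

This differs from the dense case, where every cell of the image always holds data (how do we convolve over points scattered through the space?)

Studied since around 2010? The above is a blur example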

permutohedral lattice

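The amount of search is no different from an ordinary bilateral filter

VS

Well suited to weighted convolution over nearby data in spaces of 5 or more dimensions
Works quite well for sparse spaces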
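
To show why sparse storage helps, here is a deliberately simplified splat, blur, slice pipeline on a hashed regular grid. This is NOT the permutohedral lattice: that lattice splats each point onto the d+1 vertices of its enclosing simplex with barycentric weights, whereas this sketch snaps each point to one grid cell. The function name, `cell` size and toy data are assumptions.

```python
from collections import defaultdict

def sparse_grid_filter(features, values, cell=1.0):
    """Splat -> blur -> slice on a hashed regular grid (simplified stand-in)."""
    # Splat: accumulate (value sum, weight) only in cells that contain data.
    grid = defaultdict(lambda: [0.0, 0.0])
    keys = [tuple(int(f // cell) for f in feat) for feat in features]
    for key, v in zip(keys, values):
        grid[key][0] += v
        grid[key][1] += 1.0
    # Blur: [1, 2, 1] kernel along each axis, visiting occupied cells only.
    for axis in range(len(keys[0])):
        blurred = {}
        for key, (s, w) in grid.items():
            acc_s, acc_w = 2.0 * s, 2.0 * w
            for step in (-1, 1):
                nb = key[:axis] + (key[axis] + step,) + key[axis + 1:]
                if nb in grid:
                    acc_s += grid[nb][0]
                    acc_w += grid[nb][1]
            blurred[key] = [acc_s, acc_w]
        grid = blurred
    # Slice: read back the normalized value at each point's cell.
    return [grid[key][0] / grid[key][1] for key in keys]

# Toy 5-D points: three close together, one far away.
features = [
    (0.2, 0.1, 0.0, 0.0, 0.0),
    (0.7, 0.3, 0.0, 0.0, 0.0),
    (1.4, 0.2, 0.0, 0.0, 0.0),
    (10.5, 0.0, 0.0, 0.0, 0.0),
]
values = [0.0, 1.0, 2.0, 100.0]

print(sparse_grid_filter(features, values))
```

The work is proportional to the number of occupied cells rather than to the full 5-D volume, which is the property that makes lattice-based filtering practical in high-dimensional, sparse feature spaces.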

Unify and generalize these

2. Related Work

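Image Adaptive Filtering: previous approaches focused on speeding up the convolution and reducing the Gaussian approximation error (Bilateral Grid, Gaussian KD-trees, Permutohedral Lattice)

Neural Networks: with CNNs, filters can be obtained by learning
Dense CRF: CNN + CRF gives the best performance to date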

3. Learning Sparse High Dimensional Filters

Once the gradients can be computed, we can backpropagate!
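
Why gradients are enough can be seen from the generic linear filter form. A sketch, assuming the filter is written as a weighted sum and a scalar loss $L$ is computed from the filtered outputs; the symbols $v$, $v'$, $w_{i,j}$ and $L$ are notation chosen here for illustration, not copied from the slide:

```latex
v'_i = \sum_j w_{i,j}\, v_j
\quad\Longrightarrow\quad
\frac{\partial L}{\partial w_{i,j}} = \frac{\partial L}{\partial v'_i}\, v_j ,
\qquad
\frac{\partial L}{\partial v_j} = \sum_i w_{i,j}\, \frac{\partial L}{\partial v'_i}
```

The first gradient updates the filter weights; the second passes the error on to earlier layers. When one kernel entry is shared by many $(i, j)$ pairs, as in a convolution, its gradient is the sum of the per-pair terms.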

loss

No training labels needed!

4. Single Bilateral Filter Applications

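Let's examine the joint bilateral filter problem through upsampling

1. Generate a high-resolution color image from a high-resolution gray image and a low-resolution color image
2. Generate a high-resolution depth image from a high-resolution color image and a low-resolution depth image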
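
The second task can be sketched in 1-D with fixed Gaussian kernels, that is, classic joint bilateral upsampling rather than the learned filter. The function name, parameters and toy signals are assumptions, and a scalar guide stands in for the color image.

```python
import math

def joint_bilateral_upsample(low, guide, factor,
                             sigma_s=1.0, sigma_r=0.1, radius=2):
    """1-D joint bilateral upsampling of `low`, guided by high-res `guide`."""
    out = []
    for p, gp in enumerate(guide):
        center = p / factor  # position of p in low-resolution coordinates
        first = max(0, round(center) - radius)
        last = min(len(low), round(center) + radius + 1)
        num = 0.0
        den = 0.0
        for q in range(first, last):
            # Guide sample at the high-res location of low-res sample q.
            gq = guide[min(q * factor, len(guide) - 1)]
            w_spatial = math.exp(-0.5 * ((center - q) / sigma_s) ** 2)
            w_range = math.exp(-0.5 * ((gp - gq) / sigma_r) ** 2)
            num += w_spatial * w_range * low[q]
            den += w_spatial * w_range
        out.append(num / den)
    return out

# Toy data: a high-res guide with one edge and a 4x coarser depth signal.
guide = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
low_depth = [10.0, 50.0]

high_depth = joint_bilateral_upsample(low_depth, guide, factor=4)
print([round(d, 3) for d in high_depth])
```

The range term takes its edges from the guide, so the upsampled depth changes exactly where the guide changes instead of being blurred across the boundary. The learned variant replaces the two fixed Gaussians with trained filter weights.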

4.1. Joint Bilateral Upsampling

Learned filters are great!

Pascal VOC2012 segmentation [19] using train, val and test splits, and 200 higher resolution (2MP) images from Google image search [1] with 100 train, 50 validation and 50 test

grey image / color image

5. Learning Pairwise Potentials in Dense CRFs

Take both the bilateral filter result and the CNN result into account
Use this as a [generalized representation]

Unary probability: can be CNN output

Normalization coefficient
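
For orientation, the annotations above most likely label terms of the standard fully connected CRF formulation. A sketch in the usual dense-CRF notation, not copied from the slide: $\psi_u$ is the unary term, $\psi_p$ the pairwise term, $Z(I)$ the normalization constant, $\mu$ a label compatibility function and $k$ a kernel on pixel features $\mathbf{f}$ (all symbol names are assumptions).

```latex
P(\mathbf{x} \mid I) = \frac{1}{Z(I)} \exp\bigl(-E(\mathbf{x} \mid I)\bigr),
\qquad
E(\mathbf{x} \mid I) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j),
\qquad
\psi_p(x_i, x_j) = \mu(x_i, x_j)\, k(\mathbf{f}_i, \mathbf{f}_j)
```

The unary term can come from a CNN's per-pixel output. The kernel $k$ is conventionally a Gaussian on position and color, which is the bilateral part that a learned filter can replace.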

5.2. Learning Pairwise Potentials

In addition to the CNN result, also use the result of the bilateral term

Segmentation results (using already learned data as initial)



6. Bilateral Neural Networks

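Let's build models for concrete situations and examine them
1. Build a new model
2. Add a bilateral term to an existing deep learning model?

We have been freed from the curse of the Gaussian! And we have gained the help of the Force (NN) as well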

6.1 Segmenting Tiles

20 x 20

64 x 64

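Add Gaussian noise and have the model identify the regions
The key point is that this happens in color space (each color is random, so the decision has to come from color differences between neighbors)

10k training, 1k validation and 1k test

→ Today's opponent: CNN layers

Window size: 9, 13, 17, 21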

6.1 Segmenting Tiles (result)

→ BCL: Bilateral Convolution Layers
BNN = CNN layers that include one or more BCLs
CNNs look at shape, so they struggle here?
BNN (BCL) has a bilateral component ...

Wait, doesn't CNN-21x21 perform better?

Don't worry, it's fast.
CNN 21x21 -> 282k params vs BNN (3x3) -> 40k params

6.2. Character Recognition

Next we storm CNN's home turf: character recognition!

→ Today's opponents: LeNet-7, DeepCNet

Facing them are the proposed BNN-LeNet and BNN-DeepCNet
Each one replaces the first layers of the corresponding CNN with BCLs

6.2. Character Recognition (result)

Average CPU/GPU runtime (in ms)

7. Conclusion

Can be used CNN-style even in high-dimensional feature spaces

Bilateral filters have long been used in many fields. This method is practical.

Replacing the CNNs used so far with BNNs may bring further leaps forward



Thank you!
