
Page 1: Global optimality in neural network training

2017/8/19 CV @CVPR2017

Page 2: Global optimality in neural network training
Page 3: Global optimality in neural network training


Page 4: Global optimality in neural network training
Page 5: Global optimality in neural network training

[Figure: "Shallow or Deep?" Two loss landscapes comparing good and bad critical points: local minima (bad) vs. global minima (good)]

Page 6: Global optimality in neural network training


Page 7: Global optimality in neural network training

Page 8: Global optimality in neural network training

• $L$: loss function, $\phi$: network map, $\theta$: regularizer

$$f(w) = \sum_i L\bigl(y_i,\ \phi(x_i; w)\bigr) + \theta(w)$$

In matrix form:

$$f(\mathbf{W}) = L\bigl(\mathbf{Y},\ \phi(\mathbf{W})\bigr) + \theta(\mathbf{W})$$
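As a concrete toy instance of this objective, here is a minimal NumPy sketch, assuming a two-layer ReLU network, squared loss, and squared Frobenius regularization (all names are illustrative, not from the slides):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def phi(x, W1, W2):
    # Toy two-layer network: phi(x; W) = W2 @ relu(W1 @ x)
    return W2 @ relu(W1 @ x)

def f(X, Y, W1, W2, lam=0.1):
    # f(W) = sum_i L(y_i, phi(x_i; W)) + lam * theta(W),
    # with squared loss L and theta(W) = ||W1||_F^2 + ||W2||_F^2 (assumed choices).
    loss = sum(np.sum((y - phi(x, W1, W2)) ** 2) for x, y in zip(X, Y))
    theta = np.sum(W1 ** 2) + np.sum(W2 ** 2)
    return loss + lam * theta
```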

Page 9: Global optimality in neural network training

$$f(\alpha\mathbf{W}) = L\bigl(\mathbf{Y},\ \alpha^{p}\phi(\mathbf{W})\bigr) + \alpha^{p}\theta(\mathbf{W}), \quad (\alpha > 0)$$

Page 10: Global optimality in neural network training

• Positive homogeneity of degree 1: $h(\alpha\mathbf{W}) = \alpha\, h(\mathbf{W})$ for all $\alpha > 0$
• Positive homogeneity of degree $p$: $h(\alpha\mathbf{W}) = \alpha^{p} h(\mathbf{W})$ for all $\alpha > 0$

[Figure: a 1-D objective $f$ over $W$ with local minima marked]
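A quick numerical check of this property (a sketch; the degree-2 example assumes a two-layer ReLU network):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.standard_normal((5, 4)), rng.standard_normal((3, 5))
x, alpha = rng.standard_normal(4), 2.7  # any alpha > 0

h = lambda W1, W2: W2 @ np.maximum(W1 @ x, 0.0)

# Two weight layers => degree-2 positive homogeneity: h(aW) = a^2 h(W)
assert np.allclose(h(alpha * W1, alpha * W2), alpha ** 2 * h(W1, W2))
```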

Page 11: Global optimality in neural network training

• A linear layer $\boldsymbol{y} = \mathbf{W}\boldsymbol{x}$, $\mathbf{W} \in \mathbb{R}^{d_1 \times d_2}$: scaling $\mathbf{W}$ by $\alpha$ scales the output by $\alpha$
• Stacking four linear layers $\mathbf{W_1}, \mathbf{W_2}, \mathbf{W_3}, \mathbf{W_4}$:

$$\boldsymbol{x} \;\to\; \mathbf{W_1}\boldsymbol{x} \;\to\; \mathbf{W_{2\circ 1}}\boldsymbol{x} \;\to\; \mathbf{W_{3\circ 2\circ 1}}\boldsymbol{x} \;\to\; \mathbf{W_{4\circ 3\circ 2\circ 1}}\boldsymbol{x}$$

Scaling every layer by $\alpha$:

$$\boldsymbol{x} \;\to\; \alpha\mathbf{W_1}\boldsymbol{x} \;\to\; \alpha^{2}\mathbf{W_{2\circ 1}}\boldsymbol{x} \;\to\; \alpha^{3}\mathbf{W_{3\circ 2\circ 1}}\boldsymbol{x} \;\to\; \alpha^{4}\mathbf{W_{4\circ 3\circ 2\circ 1}}\boldsymbol{x}$$
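The same check in code (a sketch with random square layers):

```python
import numpy as np

rng = np.random.default_rng(1)
Ws = [rng.standard_normal((4, 4)) for _ in range(4)]  # W1..W4
x, alpha = rng.standard_normal(4), 1.9

def net(Ws, x):
    # Composition W4 @ W3 @ W2 @ W1 @ x
    for W in Ws:
        x = W @ x
    return x

# Scaling every layer by alpha scales the output by alpha^4.
assert np.allclose(net([alpha * W for W in Ws], x), alpha ** 4 * net(Ws, x))
```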

Page 12: Global optimality in neural network training

ReLU: $\max(\alpha z, 0) = \alpha \max(z, 0)$
Max pooling: $\max(\alpha z_1, \alpha z_2, \alpha z_3, \alpha z_4) = \alpha \max(z_1, z_2, z_3, z_4)$

$\alpha$ passes straight through (there is no addition or subtraction that would break positive homogeneity).
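Both identities are easy to verify numerically (a sketch):

```python
import numpy as np

z, alpha = np.array([-1.0, 0.5, 2.0, -0.3]), 3.1  # alpha > 0

# ReLU: max(alpha*z, 0) == alpha * max(z, 0)
assert np.allclose(np.maximum(alpha * z, 0.0), alpha * np.maximum(z, 0.0))
# Max pooling: max over alpha*z == alpha * max over z
assert np.isclose(np.max(alpha * z), alpha * np.max(z))
```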

Page 13: Global optimality in neural network training


Page 14: Global optimality in neural network training

In → Conv($\mathbf{W_1}$) → ReLU → Conv($\mathbf{W_2}$) → ReLU → MaxPool → Linear($\mathbf{W_3}$) → Out

Scaling all three weight layers by $\alpha$ (with $\varphi$ the ReLU, $MP$ max pooling, $*$ convolution):

$$\phi(\alpha\mathbf{W}) = \alpha\mathbf{W_3}\, MP\bigl(\varphi(\alpha\mathbf{W_2} * \varphi(\alpha\mathbf{W_1} * \boldsymbol{x}))\bigr) = \alpha^{3}\,\mathbf{W_3}\, MP\bigl(\varphi(\mathbf{W_2} * \varphi(\mathbf{W_1} * \boldsymbol{x}))\bigr) = \alpha^{3}\phi(\mathbf{W})$$
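A 1-D sketch of this architecture, confirming the degree-3 homogeneity (filter sizes and pooling width are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(16)
W1, W2 = rng.standard_normal(3), rng.standard_normal(3)  # 1-D conv filters
W3 = rng.standard_normal(8)                              # linear readout
relu = lambda z: np.maximum(z, 0.0)
maxpool = lambda z: z.reshape(-1, 2).max(axis=1)         # width-2 max pooling

def phi(W1, W2, W3, x):
    h = relu(np.convolve(x, W1, mode="same"))
    h = relu(np.convolve(h, W2, mode="same"))
    return W3 @ maxpool(h)

alpha = 2.0
# Three weight layers (two conv + one linear) => degree-3 homogeneity.
assert np.isclose(phi(alpha * W1, alpha * W2, alpha * W3, x),
                  alpha ** 3 * phi(W1, W2, W3, x))
```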

Page 15: Global optimality in neural network training

Network: four weight layers, so

$$\phi(\alpha\mathbf{W_1}, \alpha\mathbf{W_2}, \alpha\mathbf{W_3}, \alpha\mathbf{W_4}) = \alpha^{4}\phi(\mathbf{W_1}, \mathbf{W_2}, \mathbf{W_3}, \mathbf{W_4})$$

Regularizer (weight decay):

$$\theta(\mathbf{W_1}, \mathbf{W_2}, \mathbf{W_3}, \mathbf{W_4}) = \|\mathbf{W_1}\|_F^2 + \|\mathbf{W_2}\|_F^2 + \|\mathbf{W_3}\|_F^2 + \|\mathbf{W_4}\|_F^2$$

$$\theta(\alpha\mathbf{W_1}, \alpha\mathbf{W_2}, \alpha\mathbf{W_3}, \alpha\mathbf{W_4}) = \alpha^{2}\theta(\mathbf{W_1}, \mathbf{W_2}, \mathbf{W_3}, \mathbf{W_4})$$

Because the degrees do not match (4 vs. 2), positive homogeneity fails for the objective as a whole, and local minima exist.
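The mismatch is easy to see numerically: weight decay scales as $\alpha^2$, while the 4-layer network (checked earlier) scales as $\alpha^4$. A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
Ws = [rng.standard_normal((4, 4)) for _ in range(4)]
alpha = 2.0

theta = lambda Ws: sum(np.sum(W ** 2) for W in Ws)  # sum of ||W_k||_F^2

# Weight decay is only degree 2, not degree 4 like the network:
assert np.isclose(theta([alpha * W for W in Ws]), alpha ** 2 * theta(Ws))
```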

Page 16: Global optimality in neural network training

Instead, choose a regularizer whose degree matches the network's:

$$\theta(\mathbf{W_1}, \mathbf{W_2}, \mathbf{W_3}, \mathbf{W_4}) = \|\mathbf{W_1}\|\,\|\mathbf{W_2}\|\,\|\mathbf{W_3}\|\,\|\mathbf{W_4}\|$$

or

$$\theta(\mathbf{W_1}, \mathbf{W_2}, \mathbf{W_3}, \mathbf{W_4}) = \|\mathbf{W_1}\|_F^4 + \|\mathbf{W_2}\|_F^4 + \|\mathbf{W_3}\|_F^4 + \|\mathbf{W_4}\|_F^4$$

Then $\theta(\alpha\mathbf{W}) = \alpha^{p}\theta(\mathbf{W})$ and $\phi(\alpha\mathbf{W}) = \alpha^{p}\phi(\mathbf{W})$ share the same degree $p$.
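Both candidates are indeed degree 4, matching the 4-layer network (a sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
Ws = [rng.standard_normal((4, 4)) for _ in range(4)]
alpha = 1.7

prod_theta = lambda Ws: np.prod([np.linalg.norm(W) for W in Ws])
pow_theta = lambda Ws: sum(np.linalg.norm(W) ** 4 for W in Ws)

# Product of four norms and sum of fourth powers are both degree-4:
assert np.isclose(prod_theta([alpha * W for W in Ws]), alpha ** 4 * prod_theta(Ws))
assert np.isclose(pow_theta([alpha * W for W in Ws]), alpha ** 4 * pow_theta(Ws))
```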

Page 17: Global optimality in neural network training


Page 18: Global optimality in neural network training

• As $\epsilon$ becomes small, the left-hand side becomes negligible.
• Assume the network's degree is larger than the regularizer's degree: $p > p_\theta$.
• The right-hand side is the regularizer, so it is strictly positive whenever $\mathbf{W} \neq 0$.

Page 19: Global optimality in neural network training

Any small deviation from $\mathbf{W} = 0$ increases the value of $f$ (the change is $> 0$).

When the degree of the network term exceeds the degree of the regularizer, $\mathbf{W} = 0$ is a local minimum.
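Spelling the argument out (my reconstruction from the two preceding slides, assuming a loss $L$ that is differentiable at $\mathbf{0}$):

$$f(\alpha\mathbf{W}) - f(\mathbf{0}) = \underbrace{L\bigl(\mathbf{Y},\ \alpha^{p}\phi(\mathbf{W})\bigr) - L(\mathbf{Y}, \mathbf{0})}_{O(\alpha^{p})} + \underbrace{\alpha^{p_\theta}\,\theta(\mathbf{W})}_{>\,0 \ \text{for}\ \mathbf{W} \neq \mathbf{0}}$$

Since $p > p_\theta$, for small $\alpha$ the positive $\alpha^{p_\theta}\theta(\mathbf{W})$ term dominates the $O(\alpha^{p})$ change in the loss, so $f(\alpha\mathbf{W}) > f(\mathbf{0})$ in every direction $\mathbf{W}$: the origin is a local minimum.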

Page 20: Global optimality in neural network training

Page 21: Global optimality in neural network training

• Connect $r$ networks in parallel.
• One of the local minima of the network containing the subnetworks becomes the global optimum of the network with a subnetwork removed (details later).

Page 22: Global optimality in neural network training

Making the network term redundant: sum $r$ parallel subnetworks at the output,

$$\Phi(\mathbf{W}^1, \ldots, \mathbf{W}^K) = \sum_{i=1}^{r} \phi(\mathbf{W}_i^1, \ldots, \mathbf{W}_i^K)$$

Positive homogeneity is inherited:

$$\Phi(\alpha\mathbf{W}^1, \ldots, \alpha\mathbf{W}^K) = \sum_{i=1}^{r} \alpha^{p}\,\phi(\mathbf{W}_i^1, \ldots, \mathbf{W}_i^K) = \alpha^{p}\,\Phi(\mathbf{W}^1, \ldots, \mathbf{W}^K)$$

and likewise for the regularizer: $\Theta(\alpha\mathbf{W}^1, \ldots, \alpha\mathbf{W}^K) = \alpha^{p}\,\Theta(\mathbf{W}^1, \ldots, \mathbf{W}^K)$.
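A sketch of this inheritance with $r = 3$ parallel two-layer ReLU subnetworks:

```python
import numpy as np

rng = np.random.default_rng(5)
relu = lambda z: np.maximum(z, 0.0)
x, alpha, r = rng.standard_normal(4), 2.2, 3

# r parallel copies of a two-layer subnetwork, summed at the output.
subnets = [(rng.standard_normal((5, 4)), rng.standard_normal((3, 5)))
           for _ in range(r)]

def Phi(subnets, x):
    return sum(W2 @ relu(W1 @ x) for W1, W2 in subnets)

scaled = [(alpha * W1, alpha * W2) for W1, W2 in subnets]
# A sum of degree-2 subnetworks is itself degree-2 homogeneous.
assert np.allclose(Phi(scaled, x), alpha ** 2 * Phi(subnets, x))
```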

Page 23: Global optimality in neural network training

• Given a target output $\mathbf{X}$, consider all weights with $\Phi(\mathbf{W}) = \mathbf{X}$, and define the induced regularizer as the smallest total regularization cost among them:

$$\Omega_{\phi,\theta}(\mathbf{X}) \equiv \inf_{r \in \mathbb{N}_+}\ \inf_{\mathbf{W}^1, \ldots, \mathbf{W}^K}\ \sum_{i=1}^{r} \theta(\mathbf{W}_i^1, \ldots, \mathbf{W}_i^K) \quad \text{s.t.} \quad \Phi(\mathbf{W}^1, \ldots, \mathbf{W}^K) = \mathbf{X}$$

Page 24: Global optimality in neural network training

[Figure: level-set picture in $\mathbf{W}$-space. The green line is the set of candidate $\mathbf{W}$ satisfying $\Phi(\mathbf{W}) = \mathbf{X}$; $\|\mathbf{W}\|$ shrinks toward the center, with $\mathbf{W} = \mathbf{0}$ at the red line. $\Omega_{\phi,\theta}(\mathbf{X})$ selects the candidate with the lowest regularization loss.]

Page 25: Global optimality in neural network training

• Recast training as a problem over the network output $\mathbf{X}$:

$$\min_{\mathbf{X}} F(\mathbf{X}) \equiv L(\mathbf{Y}, \mathbf{X}) + \lambda\,\Omega_{\phi,\theta}(\mathbf{X})$$

• However, $\Omega$ cannot be evaluated directly because of its inf terms, so the problem cannot be solved in this form.

Page 26: Global optimality in neural network training

Rewriting:

$$\min_{\mathbf{X}} F(\mathbf{X}) \equiv L(\mathbf{Y}, \mathbf{X}) + \lambda\,\Omega_{\phi,\theta}(\mathbf{X}) \qquad \text{(a convex problem in } \mathbf{X}\text{)}$$

versus the weight-space objective

$$f(\mathbf{W}) = L\bigl(\mathbf{Y}, \Phi(\mathbf{W})\bigr) + \lambda \sum_{i=1}^{r} \theta(\mathbf{W}_i^1, \ldots, \mathbf{W}_i^K)$$

Page 27: Global optimality in neural network training


Page 28: Global optimality in neural network training

[Figure: objective plotted over a single weight $W_0$]

Page 29: Global optimality in neural network training

The objective has two local minima.

[Figure: objective surface over $(W_0, W_1)$]

Page 30: Global optimality in neural network training

Suppose a local minimum happens to lie on the plane where the subnetwork's weights are zero ($W_1 = 0$).

[Figure: objective surface over $(W_0, W_1)$]

Page 31: Global optimality in neural network training

Naturally, that point remains a local minimum even after removing the subnetwork and keeping only $W_0$.

Page 32: Global optimality in neural network training
Page 33: Global optimality in neural network training


Page 34: Global optimality in neural network training


Page 35: Global optimality in neural network training


Page 36: Global optimality in neural network training


Page 37: Global optimality in neural network training


Page 38: Global optimality in neural network training

1. Find some local minimum $\mathbf{W}$.
2. Compute $\beta$ satisfying $\sum_{i=1}^{r} \beta_i\,\phi(\mathbf{W}_i) = 0$, with $\min_i \beta_i = -1$. (With a sufficient number of subnetworks $r$, such a $\beta$ is computable.)
3. Set $R_i(\gamma) = (1 + \gamma\beta_i)^{1/p}\,\mathbf{W}_i$ and move $\gamma$ from 0 to 1. (At $\gamma = 0$ this is the original local minimum $\mathbf{W}$.)
4. The resulting $\mathbf{W} = R(1)$ is also a local minimum, and one of the $\mathbf{W}_i$ has become zero! (See the sketch below.)
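A sketch of steps 3 and 4 (illustrative only; `beta` and `p` are assumed given, with $\beta$ as computed in step 2):

```python
import numpy as np

def scaling_path(Ws, beta, gamma, p):
    """R_i(gamma) = (1 + gamma * beta_i)^(1/p) * W_i for each parallel subnet.

    Ws:    list of per-subnetwork weight arrays W_i
    beta:  coefficients with sum_i beta_i * phi(W_i) = 0 and min(beta) = -1
    p:     degree of positive homogeneity of each subnetwork
    """
    return [(1.0 + gamma * b) ** (1.0 / p) * W for W, b in zip(Ws, beta)]

# gamma = 0 recovers the original local minimum. At gamma = 1 the subnetwork
# with beta_i = -1 is scaled exactly to zero, while by homogeneity
# sum_i phi(R_i(gamma)) = sum_i (1 + gamma*beta_i) * phi(W_i) = Phi(W),
# so the network output is unchanged along the whole path.
```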

Page 39: Global optimality in neural network training


Page 40: Global optimality in neural network training
