Backpropagation Hung-yi Lee

Source: NTU Speech Processing Laboratory, speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN backprop.pdf



Background

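• Cost Function $C(\theta)$

• Given training examples: $(x^1, \hat{y}^1), \cdots, (x^r, \hat{y}^r), \cdots, (x^R, \hat{y}^R)$

• Find a set of parameters $\theta^*$ minimizing $C(\theta)$

• $C(\theta) = \frac{1}{R} \sum_r C^r(\theta)$, $C^r(\theta) = \lVert f(x^r; \theta) - \hat{y}^r \rVert$

• Gradient Descent

• $\nabla C(\theta) = \frac{1}{R} \sum_r \nabla C^r(\theta)$

• Given $w_{ij}^l$ and $b_i^l$, we have to compute $\partial C^r / \partial w_{ij}^l$ and $\partial C^r / \partial b_i^l$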

• There is an efficient way to compute the gradients of the network parameters – backpropagation.
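The averaging of per-example gradients can be illustrated with a minimal sketch. It assumes a toy scalar model $f(x; \theta) = \theta x$ and a squared-error per-example cost; neither comes from the slides, and the function names are hypothetical.

```python
# Toy setting (assumed for illustration only):
#   model f(x; theta) = theta * x
#   per-example cost C^r(theta) = (theta * x^r - yhat^r)^2

def per_example_grad(theta, x, y_hat):
    """Gradient of C^r with respect to theta for one training example."""
    return 2.0 * (theta * x - y_hat) * x


def total_grad(theta, examples):
    """Gradient of C(theta) = (1/R) * sum_r C^r(theta): the mean of the per-example gradients."""
    return sum(per_example_grad(theta, x, y) for x, y in examples) / len(examples)


def gradient_descent(theta, examples, lr=0.1, steps=200):
    """Repeatedly step against the averaged gradient."""
    for _ in range(steps):
        theta -= lr * total_grad(theta, examples)
    return theta


# Training examples generated by yhat = 2 * x, so the minimizer is theta = 2.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
theta_star = gradient_descent(0.0, examples)
```

In a real network the per-example gradient is not available in closed form like this, which is where backpropagation comes in.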


Chain Rule

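Case 1: $y = g(x)$, $z = h(y)$

$\Delta x \to \Delta y \to \Delta z$

$\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}$

Case 2: $x = g(s)$, $y = h(s)$, $z = k(x, y)$

$\Delta s \to \Delta x, \Delta y \to \Delta z$

$\frac{dz}{ds} = \frac{\partial z}{\partial x} \frac{dx}{ds} + \frac{\partial z}{\partial y} \frac{dy}{ds}$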
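Case 2 of the chain rule can be checked numerically. The sketch below assumes hypothetical choices $g(s) = s^2$, $h(s) = \sin s$, and $k(x, y) = xy$, picked only for illustration, and compares the chain-rule value with a central finite difference.

```python
import math

# Hypothetical functions, chosen only to illustrate Case 2.
def g(s):
    return s * s          # x = g(s)


def h(s):
    return math.sin(s)    # y = h(s)


def k(x, y):
    return x * y          # z = k(x, y)


def dz_ds_chain(s):
    """dz/ds = (dz/dx)(dx/ds) + (dz/dy)(dy/ds)."""
    x, y = g(s), h(s)
    dz_dx, dz_dy = y, x                 # partial derivatives of k(x, y) = x * y
    dx_ds, dy_ds = 2.0 * s, math.cos(s)
    return dz_dx * dx_ds + dz_dy * dy_ds


def dz_ds_numeric(s, eps=1e-6):
    """Central finite difference of z(s) = k(g(s), h(s))."""
    z = lambda t: k(g(t), h(t))
    return (z(s + eps) - z(s - eps)) / (2.0 * eps)


s0 = 0.7
chain_value = dz_ds_chain(s0)
numeric_value = dz_ds_numeric(s0)
```

The two values should agree up to the finite-difference error.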


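$\partial C^r / \partial w_{ij}^l$

• $\frac{\partial C^r}{\partial w_{ij}^l}$ is the multiplication of two terms

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$

$\Delta w_{ij}^l \to \Delta z_i^l \to \cdots \to \Delta C^r$

[Figure: neuron $j$ in Layer $l-1$ connects through weight $w_{ij}^l$ to neuron $i$ in Layer $l$, which has input $z_i^l$ and output $a_i^l$.]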



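$\partial C^r / \partial w_{ij}^l$ - First Term

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$

If $l > 1$:

$z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l$

$\frac{\partial z_i^l}{\partial w_{ij}^l} = a_j^{l-1}$

[Figure: neuron $j$ in Layer $l-1$ connects through weight $w_{ij}^l$ to neuron $i$ in Layer $l$ with input $z_i^l$.]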


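$\partial C^r / \partial w_{ij}^l$ - First Term

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$

If $l = 1$:

$z_i^1 = \sum_j w_{ij}^1 x_j^r + b_i^1$

$\frac{\partial z_i^1}{\partial w_{ij}^1} = x_j^r$

[Figure: inputs $x_1^r, x_2^r, \ldots, x_j^r$ connect through weight $w_{ij}^1$ to neuron $i$ in Layer 1 with input $z_i^1$.]

If $l > 1$:

$z_i^l = \sum_j w_{ij}^l a_j^{l-1} + b_i^l$

$\frac{\partial z_i^l}{\partial w_{ij}^l} = a_j^{l-1}$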
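The first term can be verified with a small finite-difference experiment. This is a sketch with made-up weights, activations, and biases (all hypothetical): perturbing one weight $w_{ij}^l$ changes $z_i^l$ at a rate equal to the incoming activation $a_j^{l-1}$.

```python
def layer_input(W, a_prev, b):
    """z_i = sum_j W[i][j] * a_prev[j] + b[i] for every neuron i of the layer."""
    return [sum(w * a for w, a in zip(row, a_prev)) + b_i
            for row, b_i in zip(W, b)]


# Hypothetical values for a layer with 2 neurons fed by 3 activations.
W = [[0.5, -1.0, 2.0],
     [1.5, 0.3, -0.7]]
a_prev = [0.2, 0.9, -0.4]
b = [0.1, -0.2]

i, j, eps = 1, 2, 1e-6

# Perturb only w_ij and measure the change in z_i.
W_plus = [row[:] for row in W]
W_plus[i][j] += eps
numeric = (layer_input(W_plus, a_prev, b)[i] - layer_input(W, a_prev, b)[i]) / eps

# The first term predicted by the derivation: the activation a_j of the previous layer.
analytic = a_prev[j]
```

For the first layer the same check applies with `a_prev` replaced by the input $x^r$.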


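$\partial C^r / \partial w_{ij}^l$ - Second Term

• $\frac{\partial C^r}{\partial w_{ij}^l}$ is the multiplication of two terms

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$

$\Delta w_{ij}^l \to \Delta z_i^l \to \cdots \to \Delta C^r$

$\delta_i^l = \frac{\partial C^r}{\partial z_i^l}$

[Figure: neuron $j$ in Layer $l-1$ connects through weight $w_{ij}^l$ to neuron $i$ in Layer $l$, which has input $z_i^l$ and output $a_i^l$.]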


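$\partial C^r / \partial w_{ij}^l$ - Second Term

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$, $\delta_i^l = \frac{\partial C^r}{\partial z_i^l}$

1. How to compute $\delta^L$

2. The relation of $\delta^l$ and $\delta^{l+1}$

[Figure: network with Layer $l-1$ (neurons $1, 2, \ldots, j$), Layer $l$ (neurons $1, 2, \ldots, i$), Layer $l+1$ (neurons $1, 2, \ldots, k$), and Layer $L$, the output layer (neurons $1, 2, \ldots, n$). The vectors $\delta^l = [\delta_1^l, \delta_2^l, \ldots]^T$, $\delta^{l+1}$, $\delta^{L-1}$, and $\delta^L$ collect the $\delta$ values of each layer.]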


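$\partial C^r / \partial w_{ij}^l$ - Second Term

1. How to compute $\delta^L$

2. The relation of $\delta^l$ and $\delta^{l+1}$

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$, $\delta_i^l = \frac{\partial C^r}{\partial z_i^l}$

$\Delta z_n^L \to \Delta a_n^L = \Delta y_n^r \to \Delta C^r$

$\delta_n^L = \frac{\partial C^r}{\partial z_n^L} = \frac{\partial y_n^r}{\partial z_n^L} \frac{\partial C^r}{\partial y_n^r}$

$\frac{\partial y_n^r}{\partial z_n^L} = \sigma'(z_n^L)$

$\frac{\partial C^r}{\partial y_n^r}$: depending on the definition of the cost function

[Figure: Layer $L$ (output layer) with neurons $1, 2, \ldots, n$, inputs $z_n^L$, and values $\delta_1^L, \delta_2^L, \ldots$]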


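$\partial C^r / \partial w_{ij}^l$ - Second Term

1. How to compute $\delta^L$

2. The relation of $\delta^l$ and $\delta^{l+1}$

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$, $\delta_i^l = \frac{\partial C^r}{\partial z_i^l}$

$\delta_n^L = \frac{\partial C^r}{\partial z_n^L} = \frac{\partial y_n^r}{\partial z_n^L} \frac{\partial C^r}{\partial y_n^r} = \sigma'(z_n^L) \frac{\partial C^r}{\partial y_n^r}$

$\delta^L = ?$

$\sigma'(z^L) = \begin{bmatrix} \sigma'(z_1^L) \\ \sigma'(z_2^L) \\ \vdots \\ \sigma'(z_n^L) \end{bmatrix}$, $\nabla C^r(y^r) = \begin{bmatrix} \partial C^r / \partial y_1^r \\ \partial C^r / \partial y_2^r \\ \vdots \\ \partial C^r / \partial y_n^r \end{bmatrix}$

$\delta^L = \sigma'(z^L) \odot \nabla C^r(y^r)$ ($\odot$: element-wise multiplication)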
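As a sketch of the output-layer formula $\delta^L = \sigma'(z^L) \odot \nabla C^r(y^r)$, the code below assumes a sigmoid activation and a squared-error cost $C^r = \sum_n (y_n^r - \hat{y}_n^r)^2$. Both are assumptions for illustration; the slides leave the cost function open.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def output_delta(z_L, y_hat):
    """delta^L = sigma'(z^L) (element-wise times) grad C^r(y^r).

    Assumes C^r = sum_n (y_n - yhat_n)^2, so dC^r/dy_n = 2 * (y_n - yhat_n).
    """
    y = [sigmoid(z) for z in z_L]                       # y^r = a^L
    grad_C = [2.0 * (y_n - t_n) for y_n, t_n in zip(y, y_hat)]
    return [sigmoid_prime(z) * g for z, g in zip(z_L, grad_C)]


# Hypothetical output-layer inputs and targets.
delta_L = output_delta([0.0, 2.0], [1.0, 0.0])
```

With a different cost function only the `grad_C` line changes; the element-wise product with $\sigma'(z^L)$ stays the same.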


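$\partial C^r / \partial w_{ij}^l$ - Second Term

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$, $\delta_i^l = \frac{\partial C^r}{\partial z_i^l}$

1. How to compute $\delta^L$

2. The relation of $\delta^l$ and $\delta^{l+1}$

$\Delta z_i^l \to \Delta a_i^l \to \Delta z_1^{l+1}, \Delta z_2^{l+1}, \ldots, \Delta z_k^{l+1}, \ldots \to \Delta C^r$

$\delta_i^l = \frac{\partial C^r}{\partial z_i^l} = \frac{\partial a_i^l}{\partial z_i^l} \sum_k \frac{\partial z_k^{l+1}}{\partial a_i^l} \frac{\partial C^r}{\partial z_k^{l+1}}$, where $\frac{\partial C^r}{\partial z_k^{l+1}} = \delta_k^{l+1}$

[Figure: Layer $l$ (neurons $1, 2, \ldots, i$) with values $\delta_1^l, \delta_2^l, \ldots$ forming $\delta^l$, and Layer $l+1$ (neurons $1, 2, \ldots, k$) with values $\delta_1^{l+1}, \delta_2^{l+1}, \ldots$ forming $\delta^{l+1}$.]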


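$\partial C^r / \partial w_{ij}^l$ - Second Term

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$, $\delta_i^l = \frac{\partial C^r}{\partial z_i^l}$

$\Delta z_i^l \to \Delta a_i^l \to \Delta z_1^{l+1}, \Delta z_2^{l+1}, \ldots, \Delta z_k^{l+1}, \ldots \to \Delta C^r$

$\delta_i^l = \frac{\partial a_i^l}{\partial z_i^l} \sum_k \frac{\partial z_k^{l+1}}{\partial a_i^l} \delta_k^{l+1}$

$a_i^l = \sigma(z_i^l)$, so $\frac{\partial a_i^l}{\partial z_i^l} = \sigma'(z_i^l)$

$z_k^{l+1} = \sum_i w_{ki}^{l+1} a_i^l + b_k^{l+1}$, so $\frac{\partial z_k^{l+1}}{\partial a_i^l} = w_{ki}^{l+1}$

$\delta_i^l = \sigma'(z_i^l) \sum_k w_{ki}^{l+1} \delta_k^{l+1}$

[Figure: Layer $l$ (neurons $1, 2, \ldots, i$) with $\delta^l$ and Layer $l+1$ (neurons $1, 2, \ldots, k$) with $\delta^{l+1}$.]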
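The recursion $\delta_i^l = \sigma'(z_i^l) \sum_k w_{ki}^{l+1} \delta_k^{l+1}$ can be sanity-checked against finite differences. The sketch below assumes a sigmoid activation and a hypothetical downstream cost $C = \sum_k (z_k^{l+1})^2$, chosen only so that $\delta_k^{l+1} = 2 z_k^{l+1}$ is available in closed form; all numeric values are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def next_layer_input(z_l, W_next, b_next):
    """z_k^{l+1} = sum_i w_ki^{l+1} * sigma(z_i^l) + b_k^{l+1}."""
    a_l = [sigmoid(z) for z in z_l]
    return [sum(w * a for w, a in zip(row, a_l)) + b_k
            for row, b_k in zip(W_next, b_next)]


def toy_cost(z_l, W_next, b_next):
    """Hypothetical downstream cost: C = sum_k (z_k^{l+1})^2."""
    return sum(z ** 2 for z in next_layer_input(z_l, W_next, b_next))


def delta_l(z_l, W_next, b_next):
    """delta_i^l = sigma'(z_i^l) * sum_k w_ki^{l+1} * delta_k^{l+1}."""
    # For the toy cost, delta_k^{l+1} = dC/dz_k^{l+1} = 2 * z_k^{l+1}.
    delta_next = [2.0 * z for z in next_layer_input(z_l, W_next, b_next)]
    result = []
    for i, z_i in enumerate(z_l):
        s = sigmoid(z_i)
        weighted = sum(W_next[k][i] * delta_next[k] for k in range(len(W_next)))
        result.append(s * (1.0 - s) * weighted)
    return result


# Layer l has 2 neurons, layer l+1 has 3 neurons.
z_l = [0.3, -0.8]
W_next = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.9]]
b_next = [0.1, -0.2, 0.0]
analytic = delta_l(z_l, W_next, b_next)

# Central finite differences of the cost with respect to each z_i^l.
eps = 1e-6
numeric = []
for i in range(len(z_l)):
    z_plus = z_l[:]
    z_minus = z_l[:]
    z_plus[i] += eps
    z_minus[i] -= eps
    numeric.append((toy_cost(z_plus, W_next, b_next)
                    - toy_cost(z_minus, W_next, b_next)) / (2.0 * eps))
```

`analytic` and `numeric` should match entry by entry up to the finite-difference error.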


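$\partial C^r / \partial w_{ij}^l$ - Second Term

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$, $\delta_i^l = \frac{\partial C^r}{\partial z_i^l}$

$\delta_i^l = \sigma'(z_i^l) \sum_k w_{ki}^{l+1} \delta_k^{l+1}$

[Figure: a new type of neuron. Its inputs are $\delta_1^{l+1}, \delta_2^{l+1}, \ldots, \delta_k^{l+1}$, weighted by $w_{1i}^{l+1}, w_{2i}^{l+1}, \ldots, w_{ki}^{l+1}$. The weighted sum is multiplied by the constant $\sigma'(z_i^l)$ to give the output $\delta_i^l$.]

[Figure: Layer $l$ (neurons $1, 2, \ldots, i$) with $\delta^l$ and Layer $l+1$ (neurons $1, 2, \ldots, k$) with $\delta^{l+1}$.]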


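$\partial C^r / \partial w_{ij}^l$ - Second Term

$\delta_i^l = \sigma'(z_i^l) \sum_k w_{ki}^{l+1} \delta_k^{l+1}$

$\sigma'(z^l) = \begin{bmatrix} \sigma'(z_1^l) \\ \sigma'(z_2^l) \\ \vdots \\ \sigma'(z_i^l) \\ \vdots \end{bmatrix}$

$\delta^l = \sigma'(z^l) \odot (W^{l+1})^T \delta^{l+1}$

[Figure: backward network from Layer $l+1$ (neurons $1, 2, \ldots, k$, inputs $\delta_1^{l+1}, \delta_2^{l+1}, \ldots, \delta_k^{l+1}$) to Layer $l$ (neurons $1, 2, \ldots, i$), where neuron $i$ multiplies by the constant $\sigma'(z_i^l)$ and outputs $\delta_i^l$.]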


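$\partial C^r / \partial w_{ij}^l$ - Second Term

Backward: $\delta^l = \sigma'(z^l) \odot (W^{l+1})^T \delta^{l+1}$

Compare

Forward: $a^{l+1} = \sigma(W^{l+1} a^l + b^{l+1})$

[Figure: the backward network from Layer $l+1$ to Layer $l$ (inputs $\delta^{l+1}$, constants $\sigma'(z_1^l), \sigma'(z_2^l), \ldots, \sigma'(z_i^l)$, outputs $\delta^l$) shown next to the forward network from Layer $l$ to Layer $l+1$ (inputs $a_1^l, a_2^l, \ldots, a_i^l$, outputs $a_1^{l+1}, a_2^{l+1}, \ldots, a_k^{l+1}$).]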
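The symmetry between the two directions can be sketched in code: the forward step multiplies by $W^{l+1}$ and applies $\sigma$, while the backward step multiplies by $(W^{l+1})^T$ and scales by $\sigma'(z^l)$. The sigmoid activation, the helper names, and the example values are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def forward_step(W_next, b_next, a):
    """a^{l+1} = sigma(W^{l+1} a^l + b^{l+1})."""
    return [sigmoid(sum(w * a_i for w, a_i in zip(row, a)) + b_k)
            for row, b_k in zip(W_next, b_next)]


def backward_step(W_next, z, delta_next):
    """delta^l = sigma'(z^l) (element-wise times) (W^{l+1})^T delta^{l+1}."""
    return [sigmoid_prime(z_i)
            * sum(W_next[k][i] * delta_next[k] for k in range(len(W_next)))
            for i, z_i in enumerate(z)]


# Hypothetical example: layer l has 2 neurons, layer l+1 has 3 neurons.
W_next = [[1.0, 2.0],
          [3.0, 4.0],
          [5.0, 6.0]]
b_next = [0.0, 0.0, 0.0]

a_next = forward_step(W_next, b_next, [0.0, 0.0])               # three values of 0.5
delta = backward_step(W_next, [0.0, 0.0], [1.0, 0.0, -1.0])     # [-1.0, -1.0]
```

The same weight matrix is used in both directions; only the orientation (rows versus columns) and the role of the activation function differ.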


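$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$, $\delta_i^l = \frac{\partial C^r}{\partial z_i^l}$

1. How to compute $\delta^L$

$\delta^L = \sigma'(z^L) \odot \nabla C^r(y^r)$

2. The relation of $\delta^l$ and $\delta^{l+1}$

$\delta^l = \sigma'(z^l) \odot (W^{l+1})^T \delta^{l+1}$

[Figure: the full backward network. The inputs $\frac{\partial C^r}{\partial y_1^r}, \frac{\partial C^r}{\partial y_2^r}, \ldots, \frac{\partial C^r}{\partial y_n^r}$, which form $\nabla C^r(y^r)$, enter Layer $L$ (neurons $1, 2, \ldots, n$ with constants $\sigma'(z_1^L), \sigma'(z_2^L), \ldots, \sigma'(z_n^L)$) and give $\delta^L$. This passes through $(W^L)^T$ to Layer $L-1$ (neurons $1, 2, \ldots, m$ with constants $\sigma'(z_1^{L-1}), \sigma'(z_2^{L-1}), \ldots, \sigma'(z_m^{L-1})$) to give $\delta^{L-1}$, and so on through $(W^{l+1})^T$ from Layer $l+1$ (neurons $1, 2, \ldots, k$) to Layer $l$ (neurons $1, 2, \ldots, i$), which gives $\delta^{l+1}$ and $\delta^l$.]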


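Concluding Remarks

$\frac{\partial C^r}{\partial w_{ij}^l} = \frac{\partial z_i^l}{\partial w_{ij}^l} \frac{\partial C^r}{\partial z_i^l}$

First term: $\frac{\partial z_i^l}{\partial w_{ij}^l} = a_j^{l-1}$ if $l > 1$, $x_j^r$ if $l = 1$

Second term: $\frac{\partial C^r}{\partial z_i^l} = \delta_i^l$

Forward Pass

$z^1 = W^1 x^r + b^1$

$a^1 = \sigma(z^1)$

$\cdots$

$z^{l-1} = W^{l-1} a^{l-2} + b^{l-1}$

$a^{l-1} = \sigma(z^{l-1})$

Backward Pass

$\delta^L = \sigma'(z^L) \odot \nabla C^r(y^r)$

$\delta^{L-1} = \sigma'(z^{L-1}) \odot (W^L)^T \delta^L$

$\cdots$

$\delta^l = \sigma'(z^l) \odot (W^{l+1})^T \delta^{l+1}$

[Figure: neuron $j$ in Layer $l-1$ connects through weight $w_{ij}^l$ to neuron $i$ in Layer $l$.]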
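Putting the forward and backward passes together, a minimal sketch of the whole procedure might look as follows. It assumes sigmoid activations in every layer and a squared-error cost $C^r = \sum_n (y_n^r - \hat{y}_n^r)^2$; the function names and the small 2-3-1 network are hypothetical. The bias gradient $\partial C^r / \partial b_i^l = \delta_i^l$ is not derived on the slides; it follows from $\partial z_i^l / \partial b_i^l = 1$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def forward(weights, biases, x):
    """Forward pass: return the lists of z^l and a^l for every layer."""
    zs, activations, a = [], [], x
    for W, b in zip(weights, biases):
        z = [sum(w * a_j for w, a_j in zip(row, a)) + b_i
             for row, b_i in zip(W, b)]
        a = [sigmoid(z_i) for z_i in z]
        zs.append(z)
        activations.append(a)
    return zs, activations


def cost(weights, biases, x, y_hat):
    """Assumed cost: C^r = sum_n (y_n - yhat_n)^2."""
    y = forward(weights, biases, x)[1][-1]
    return sum((y_n - t) ** 2 for y_n, t in zip(y, y_hat))


def backprop(weights, biases, x, y_hat):
    """Return dC^r/dW^l and dC^r/db^l for every layer (lists indexed from 0)."""
    zs, activations = forward(weights, biases, x)
    # Output layer: delta^L = sigma'(z^L) (element-wise times) grad C^r(y^r).
    delta = [sigmoid_prime(z) * 2.0 * (y - t)
             for z, y, t in zip(zs[-1], activations[-1], y_hat)]
    grads_W = [None] * len(weights)
    grads_b = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        # First term: previous activation, or the input x for the first layer.
        a_prev = x if l == 0 else activations[l - 1]
        grads_W[l] = [[d * a for a in a_prev] for d in delta]
        grads_b[l] = delta[:]
        if l > 0:
            # Propagate: delta^{l-1} = sigma'(z^{l-1}) (element-wise times) (W^l)^T delta^l.
            W = weights[l]
            delta = [sigmoid_prime(zs[l - 1][i])
                     * sum(W[k][i] * delta[k] for k in range(len(W)))
                     for i in range(len(zs[l - 1]))]
    return grads_W, grads_b


# Hypothetical network: 2 inputs, 3 hidden neurons, 1 output.
weights = [[[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],
           [[0.3, -0.1, 0.2]]]
biases = [[0.0, 0.1, -0.1], [0.05]]
x, y_hat = [1.0, -1.0], [1.0]

grads_W, grads_b = backprop(weights, biases, x, y_hat)
```

Each entry `grads_W[l][i][j]` is the product of the two terms from the derivation, and can be compared against a finite difference of `cost` as a correctness check.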


Acknowledgement

• Thanks to Ryan Sun for writing in to point out errors in the slides.