8/13/2019 ANa-deriv
$$= \eta (y - \hat{y}) z_h$$
The $\frac{1}{2}$ factor is canceled when we take the partial derivative, and we include a multiplicative factor of $-1$ for the derivative with respect to $\hat{y}$ inside $(y - \hat{y})$. From Equation 1, we know that $\frac{\partial \hat{y}}{\partial v_h}$ is $z_h$.
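As a rough numerical sketch of this first update, the snippet below applies $\Delta v_h = \eta (y - \hat{y}) z_h$ once and checks that the error shrinks. It assumes the squared error $E = \frac{1}{2}(y - \hat{y})^2$ and a linear output $\hat{y} = \sum_h v_h z_h$ as the form of Equation 1, which is not reproduced here. The function names and all numeric values are hypothetical, chosen only for illustration.

```python
def predict(v, z):
    """Linear output y_hat = sum_h v_h * z_h (assumed form of Equation 1)."""
    return sum(v_h * z_h for v_h, z_h in zip(v, z))


def squared_error(v, z, y):
    """E = 1/2 * (y - y_hat)^2 (assumed error function)."""
    return 0.5 * (y - predict(v, z)) ** 2


def update_output_weights(v, z, y, eta):
    """Return v after one step: v_h + eta * (y - y_hat) * z_h."""
    y_hat = predict(v, z)
    return [v_h + eta * (y - y_hat) * z_h for v_h, z_h in zip(v, z)]


v = [0.5, -0.3]    # illustrative output weights v_h
z = [0.6, 0.9]     # illustrative hidden activations z_h
y, eta = 1.0, 0.1  # illustrative target and learning factor

v_new = update_output_weights(v, z, y, eta)
print("error before:", squared_error(v, z, y))
print("error after: ", squared_error(v_new, z, y))
```

With a small learning factor, the error after the step should be lower than before, which is what the sign of the update is meant to guarantee.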
Step 2: Update the weights W
To figure out how to update each of the weights in $W$, we compute the partial derivative of $E$ with respect to $w_{hj}$ (the weight that connects hidden node $h$ with input $j$) and multiply it by the learning factor $\eta$. Let's break down the partial derivative:
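$$
\begin{aligned}
\Delta w_{hj} &= -\eta \frac{\partial E}{\partial w_{hj}} \\
&= -\eta \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial z_h} \frac{\partial z_h}{\partial w_{hj}}
\end{aligned}
$$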
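We know that $\frac{\partial E}{\partial \hat{y}}$ is $-(y - \hat{y})$ from the previous step. The partial derivative $\frac{\partial \hat{y}}{\partial z_h}$ is also simple; from Equation 1 we see it is just $v_h$.

The interesting part is finding $\frac{\partial z_h}{\partial w_{hj}}$. This can be rewritten as $\frac{\partial z_h}{\partial (w_h^T x)} \frac{\partial (w_h^T x)}{\partial w_{hj}}$. The first part, $\frac{\partial z_h}{\partial (w_h^T x)}$, is where the sigmoid comes in. The sigmoid equation, in general, is

$$\text{sigmoid}(a) = \frac{1}{1 + e^{-a}}.$$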
I am using $a$ here to represent any argument given to the sigmoid function. For $z_h$, $a = w_h^T x$. Now the partial derivative of the sigmoid with respect to its argument, $a$, is:
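$$
\begin{aligned}
\frac{\partial\, \text{sigmoid}(a)}{\partial a} &= \frac{\partial}{\partial a} \left( \frac{1}{1 + e^{-a}} \right) \\
&= -\frac{1}{(1 + e^{-a})^2} \, e^{-a} \, (-1) \\
&= \frac{1}{(1 + e^{-a})^2} \, e^{-a} \\
&= \left( \frac{1}{1 + e^{-a}} \right) \left( \frac{e^{-a}}{1 + e^{-a}} \right) \\
&= \left( \frac{1}{1 + e^{-a}} \right) \left( \frac{(1 - 1) + e^{-a}}{1 + e^{-a}} \right) \\
&= \left( \frac{1}{1 + e^{-a}} \right) \left( \frac{(1 + e^{-a}) - 1}{1 + e^{-a}} \right) \\
&= \left( \frac{1}{1 + e^{-a}} \right) \left( 1 - \frac{1}{1 + e^{-a}} \right) \\
&= \text{sigmoid}(a) \, (1 - \text{sigmoid}(a))
\end{aligned}
$$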
So for $z_h$, we have $\frac{\partial z_h}{\partial (w_h^T x)} = z_h (1 - z_h)$. Isn't that neat?

But don't forget the second half, $\frac{\partial (w_h^T x)}{\partial w_{hj}}$. Luckily, this is straightforward: it is just $x_j$.
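The sigmoid identity derived above can be sanity-checked numerically. The sketch below compares the closed form $\text{sigmoid}(a)(1 - \text{sigmoid}(a))$ against a central finite difference at a few arbitrary points. The helper names and the step size `eps` are illustrative choices, not part of the derivation.

```python
import math


def sigmoid(a):
    """sigmoid(a) = 1 / (1 + e^{-a})."""
    return 1.0 / (1.0 + math.exp(-a))


def sigmoid_derivative(a):
    """Closed form from the derivation: sigmoid(a) * (1 - sigmoid(a))."""
    s = sigmoid(a)
    return s * (1.0 - s)


def central_difference(f, a, eps=1e-6):
    """Numerical estimate of df/da at a (eps is an illustrative step size)."""
    return (f(a + eps) - f(a - eps)) / (2.0 * eps)


for a in (-2.0, 0.0, 0.5, 3.0):
    analytic = sigmoid_derivative(a)
    numeric = central_difference(sigmoid, a)
    print(f"a={a:+.1f}  analytic={analytic:.8f}  numeric={numeric:.8f}")
```

The two columns should agree to many decimal places; at $a = 0$ the sigmoid is $\frac{1}{2}$, so the derivative is $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$.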
Thus, overall we have
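$$
\begin{aligned}
\Delta w_{hj} &= -\eta \frac{\partial E}{\partial w_{hj}} \\
&= -\eta \frac{\partial E}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial z_h} \frac{\partial z_h}{\partial w_{hj}} \\
&= -\eta \left( -(y - \hat{y}) \right) v_h \left( z_h (1 - z_h) x_j \right) \\
&= \eta (y - \hat{y}) \, v_h \, z_h (1 - z_h) \, x_j
\end{aligned}
$$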
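One way to gain confidence in the final expression is a gradient check: compute $\Delta w_{hj}$ from the closed form and compare it with $-\eta$ times a finite-difference estimate of $\frac{\partial E}{\partial w_{hj}}$. The sketch below does this for a tiny network with two hidden nodes and three inputs. It assumes $E = \frac{1}{2}(y - \hat{y})^2$ and a linear output $\hat{y} = \sum_h v_h z_h$ with $z_h = \text{sigmoid}(w_h^T x)$; all function names and numeric values are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(W, v, x):
    """Return hidden activations z and output y_hat (linear output assumed)."""
    z = [sigmoid(sum(w_hj * x_j for w_hj, x_j in zip(w_h, x))) for w_h in W]
    y_hat = sum(v_h * z_h for v_h, z_h in zip(v, z))
    return z, y_hat


def error(W, v, x, y):
    """E = 1/2 * (y - y_hat)^2 (assumed error function)."""
    _, y_hat = forward(W, v, x)
    return 0.5 * (y - y_hat) ** 2


def delta_W(W, v, x, y, eta):
    """Closed form: eta * (y - y_hat) * v_h * z_h * (1 - z_h) * x_j."""
    z, y_hat = forward(W, v, x)
    return [[eta * (y - y_hat) * v_h * z_h * (1.0 - z_h) * x_j for x_j in x]
            for v_h, z_h in zip(v, z)]


def numeric_delta(W, v, x, y, eta, h, j, eps=1e-6):
    """Estimate -eta * dE/dw_hj with a central difference."""
    W_plus = [row[:] for row in W]
    W_minus = [row[:] for row in W]
    W_plus[h][j] += eps
    W_minus[h][j] -= eps
    grad = (error(W_plus, v, x, y) - error(W_minus, v, x, y)) / (2.0 * eps)
    return -eta * grad


W = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]  # illustrative w_hj (row h, column j)
v = [0.5, -0.3]                            # illustrative v_h
x = [1.0, 0.5, -1.5]                       # illustrative input
y, eta = 1.0, 0.1                          # illustrative target, learning factor

dW = delta_W(W, v, x, y, eta)
for h in range(len(W)):
    for j in range(len(x)):
        print(h, j, dW[h][j], numeric_delta(W, v, x, y, eta, h, j))
```

Each printed pair should match closely. A mismatch in sign or magnitude would point to a dropped factor such as $v_h$ or $z_h(1 - z_h)$.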
That's it! Let me know if you have any questions.