ANa-deriv

  • Upload
    roro132

  • View
    223

  • Download
    0

Embed Size (px)

Citation preview

  • 8/13/2019 ANa-deriv

    1/3

  • 8/13/2019 ANa-deriv

    2/3

    = (y y)zh

    The 12 factor is canceled when we take the partial derivative, and we include a multiplicative factorof -1 for the derivative with respect to y inside ( y y). From Equation 1, we know that yv h is zh .

    Step 2: Update the weights W

    To gure out how to update each of the weights in W , we compute the partial derivative of E withrespect to whj (for the weight that connects hidden node h with input j ) and multiply it by the learningfactor . Lets break down the partial derivative:

    whj = E whj

    = E y

    yzh

    zhwhj

    We know that E

    y is (y

    y

    ) from the previous step. The partial derivative y

    z h is also simple; fromEquation 1 we see it is just vh .The interesting part is nding z hw hj . This can be re-written as

    z hw T h x

    w T h xw hj . The rst part,

    z hw T h x

    , iswhere the sigmoid comes in. The sigmoid equation, in general, is

    sigmoid( a ) = 1

    1 + e a.

    I am using a here to represent any argument given to the sigmoid function. For zh , a = wT h x . Now thepartial derivative of the sigmoid with respect to its argument, a , is:

    sigmoid( a )a

    = 11+ e a

    a

    = 1

    (1 + e a )2 ea ( 1)

    = 1

    (1 + e a )2 e a

    = 11 + e a )

    e a

    1 + e a

    = 11 + e a )

    (1 1) + e a

    1 + e a

    = 11 + e a )

    (1 + e a ) 11 + e a

    = 11 + e a )

    1 1

    1 + e a

    = sigmoid(a)(1 sigmoid(a))

    So for zh , we have z hw T h x = zh (1 zh ). Isnt that neat?

    But dont forget the second half, wT h x

    w hj . Luckily, this is straightforward: it is just x j .

    2

  • 8/13/2019 ANa-deriv

    3/3

    Thus, overall we have

    whj = E whj

    = E y

    yzh

    zhwhj

    = ( (y y)) vh (zh (1 zh )x j )= (y y) vh zh (1 zh )x j

    Thats it! Let me know if you have any questions.

    3