Backpropagation: Understanding How to Update ANNs Weights Step-by-Step


Ahmed Fawzy Gad

ahmed.fawzy@ci.menofia.edu.eg

MENOUFIA UNIVERSITY, FACULTY OF COMPUTERS AND INFORMATION

INFORMATION TECHNOLOGY


Train then Update

• The backpropagation algorithm is used to update the NN weights when they are not able to make the correct predictions. Hence, we should train the NN before applying backpropagation.

[Diagram: Initial Weights → Training → Prediction → Backpropagation → Update]

Neural Network Training Example

Training Data: X1 = 0.1, X2 = 0.3, Output = 0.03

Initial Weights: W1 = 0.5, W2 = 0.2, b = 1.83

[Diagram: a single neuron with In/Out stages, inputs X1 = 0.1 and X2 = 0.3, weights W1 = 0.5 and W2 = 0.2, and a bias input +1 with b = 1.83]

Network Training

• Steps to train our network:
1. Prepare the activation function input (the sum of products between inputs and weights).
2. Calculate the activation function output.

[Diagram: the same network with W1 = 0.5, W2 = 0.2, b = 1.83]

Network Training: Sum of Products

• After calculating the sum of products (sop) between inputs and weights, next is to use this sop as the input to the activation function.

$s = X_1 W_1 + X_2 W_2 + b$

$s = 0.1 \cdot 0.5 + 0.3 \cdot 0.2 + 1.83$

$s = 1.94$

Network Training: Activation Function

• In this example, the sigmoid activation function is used.
• Based on the sop calculated previously, the output is as follows:

$f(s) = \frac{1}{1 + e^{-s}}$

$f(s) = \frac{1}{1 + e^{-1.94}} = \frac{1}{1 + 0.144} = \frac{1}{1.144}$

$f(s) = 0.874$
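
The forward pass above is small enough to check in a few lines of code. A minimal Python sketch (the function and variable names are ours, not from the slides) that reproduces s = 1.94 and f(s) ≈ 0.874:

```python
import math

def sigmoid(s):
    # f(s) = 1 / (1 + e^(-s))
    return 1 / (1 + math.exp(-s))

X1, X2 = 0.1, 0.3           # training inputs
W1, W2, b = 0.5, 0.2, 1.83  # initial weights and bias

s = X1 * W1 + X2 * W2 + b   # sum of products (sop)
print(s)                    # 1.94
print(sigmoid(s))           # ~0.874
```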

Network Training: Prediction Error

• After getting the predicted output, next is to measure the prediction error of the network.
• We can use the squared error function defined as follows:

$E = \frac{1}{2}(desired - predicted)^2$

• Based on the predicted output, the prediction error is:

$E = \frac{1}{2}(0.03 - 0.874)^2 = \frac{1}{2}(-0.844)^2 = \frac{1}{2}(0.713) = 0.357$
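
Continuing the sketch, the squared error is a one-liner; `predicted` is the sigmoid output computed above:

```python
desired = 0.03
predicted = 0.874           # sigmoid output from the forward pass

E = 0.5 * (desired - predicted) ** 2
print(E)                    # ~0.356 (0.357 with the slide's rounding)
```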

How to Minimize Prediction Error?

• There is a prediction error, and it should be minimized until reaching an acceptable value.

What should we do in order to minimize the error?
• There must be something to change in order to minimize the error. In our example, the only parameters to change are the weights.

How do we update the weights?
• We can use the weights update equation:

$W_{new} = W_{old} + \eta (d - Y) X$

Weights Update Equation

$W_{new} = W_{old} + \eta (d - Y) X$

$W_{new}$: new updated weights.
$W_{old}$: current weights. [1.83, 0.5, 0.2]
$\eta$: network learning rate. 0.01
$d$: desired output. 0.03
$Y$: predicted output. 0.874
$X$: current inputs at which the network made a false prediction. [+1, 0.1, 0.3]

Weights Update Equation

$W_{new} = W_{old} + \eta (d - Y) X$
$= [1.83, 0.5, 0.2] + 0.01 (0.03 - 0.874) [+1, 0.1, 0.3]$
$= [1.83, 0.5, 0.2] + (-0.0084) [+1, 0.1, 0.3]$
$= [1.83, 0.5, 0.2] + [-0.0084, -0.00084, -0.0025]$
$= [1.822, 0.499, 0.198]$
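
A sketch of the same vector update in Python, treating the bias as the weight of a constant +1 input exactly as the slide does (full precision gives 0.19747 for the last entry; the slide rounds intermediate values and reports 0.198):

```python
eta = 0.01                  # learning rate
desired, Y = 0.03, 0.874    # desired and predicted outputs

W = [1.83, 0.5, 0.2]        # [b, W1, W2]
X = [1.0, 0.1, 0.3]         # [+1, X1, X2]

W_new = [w + eta * (desired - Y) * x for w, x in zip(W, X)]
print(W_new)                # ~[1.8216, 0.4992, 0.1975]
```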

Weights Update Equation

• The new weights are: $W_{1,new} = 0.499$, $W_{2,new} = 0.198$, $b_{new} = 1.822$
• Based on the new weights, the network will be re-trained.

[Diagram: the network, still showing the old weights W1 = 0.5, W2 = 0.2, b = 1.83]

Weights Update Equation

• Continue these operations until the prediction error reaches an acceptable value:
1. Updating the weights.
2. Retraining the network.
3. Calculating the prediction error.

[Diagram: the network updated with W1 = 0.499, W2 = 0.198, b = 1.822]

Why is the Backpropagation Algorithm Important?

• The update equation above jumps from the old weights to the new weights without telling us how each weight contributes to the error. The backpropagation algorithm is used to answer these questions and understand the effect of each weight on the prediction error.

[Diagram: Old Weights → New Weights!]

Forward Vs. Backward Passes

• When training a neural network, there are two passes: forward and backward.
• The goal of the backward pass is to know how each weight affects the total error. In other words, how does changing the weights change the prediction error?

[Diagram: forward pass left-to-right through the network, backward pass right-to-left]

Backward Pass

• Let us work with a simpler example:

$Y = X^2 Z + H$

• How to answer this question: what is the effect on the output Y given a change in the variable X?
• This question is answered using derivatives. The derivative of Y with respect to X ($\frac{\partial Y}{\partial X}$) will tell us the effect of changing the variable X on the output Y.

Calculating Derivatives

$Y = X^2 Z + H$

• The derivative $\frac{\partial Y}{\partial X}$ can be calculated as follows:

$\frac{\partial Y}{\partial X} = \frac{\partial}{\partial X}(X^2 Z + H)$

• Based on these two derivative rules:

Square: $\frac{\partial}{\partial X} X^2 = 2X$
Constant: $\frac{\partial}{\partial X} C = 0$

• The result will be:

$\frac{\partial Y}{\partial X} = 2XZ + 0 = 2XZ$
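
Derivatives like this can be checked mechanically with a computer algebra system. A quick verification using sympy (our tooling choice, not the slides'):

```python
import sympy as sp

X, Z, H = sp.symbols('X Z H')
Y = X**2 * Z + H

print(sp.diff(Y, X))  # 2*X*Z, matching the result above
```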

Prediction Error – Weight Derivative

• What is the relationship between the prediction error E and each weight W?

$E = \frac{1}{2}(desired - predicted)^2$

• Just as $\frac{\partial Y}{\partial X}$ gives the change in Y wrt X, $\frac{\partial E}{\partial W}$ gives the change in E wrt W.

Prediction Error – Weight Derivative

$E = \frac{1}{2}(desired - predicted)^2$

where:

$desired = 0.03$ (constant)

$Predicted = f(s) = \frac{1}{1 + e^{-s}}$

$s = X_1 W_1 + X_2 W_2 + b$

Substituting the predicted output into the error function:

$E = \frac{1}{2}\left(desired - \frac{1}{1 + e^{-s}}\right)^2$

$E = \frac{1}{2}\left(desired - \frac{1}{1 + e^{-(X_1 W_1 + X_2 W_2 + b)}}\right)^2$

Multivariate Chain Rule

Prediction Error ← Predicted Output ← sop ← Weights

$E = \frac{1}{2}(desired - predicted)^2$,  $f(s) = \frac{1}{1 + e^{-s}}$,  $s = X_1 W_1 + X_2 W_2 + b$,  $W_1, W_2$

$E = \frac{1}{2}\left(desired - \frac{1}{1 + e^{-(X_1 W_1 + X_2 W_2 + b)}}\right)^2$

$\frac{\partial E}{\partial W} = \frac{\partial}{\partial W}\left(\frac{1}{2}\left(desired - \frac{1}{1 + e^{-(X_1 W_1 + X_2 W_2 + b)}}\right)^2\right)$

This derivative is calculated with the chain rule.

Multivariate Chain Rule

Prediction Error ← Predicted Output ← sop ← Weights

$E = \frac{1}{2}(desired - predicted)^2$,  $f(s) = \frac{1}{1 + e^{-s}}$,  $s = X_1 W_1 + X_2 W_2 + b$,  $W_1, W_2$

The individual partial derivatives along the chain are:

$\frac{\partial E}{\partial Predicted}$, $\frac{\partial Predicted}{\partial s}$, $\frac{\partial s}{\partial W_1}$, $\frac{\partial s}{\partial W_2}$

Multiplying along the chain:

$\frac{\partial E}{\partial W_1} = \frac{\partial E}{\partial Predicted} \cdot \frac{\partial Predicted}{\partial s} \cdot \frac{\partial s}{\partial W_1}$

$\frac{\partial E}{\partial W_2} = \frac{\partial E}{\partial Predicted} \cdot \frac{\partial Predicted}{\partial s} \cdot \frac{\partial s}{\partial W_2}$

Let's calculate these individual partial derivatives.
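
Before grinding through the factors by hand, it is worth confirming symbolically that the chain-rule product equals the direct derivative of the nested expression. A sympy sketch (an assumption of this write-up, not a tool from the slides):

```python
import sympy as sp

W1, W2, b = sp.symbols('W1 W2 b')
X1, X2, d = sp.Rational(1, 10), sp.Rational(3, 10), sp.Rational(3, 100)

s = X1 * W1 + X2 * W2 + b
predicted = 1 / (1 + sp.exp(-s))
E = sp.Rational(1, 2) * (d - predicted) ** 2

direct = sp.diff(E, W1)
# Chain-rule product: (predicted - d) * predicted * (1 - predicted) * X1
chain = (predicted - d) * predicted * (1 - predicted) * X1
print(sp.simplify(direct - chain))  # 0 -> the two forms agree
```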

Error-Predicted ($\frac{\partial E}{\partial Predicted}$) Partial Derivative

$E = \frac{1}{2}(desired - predicted)^2$

$\frac{\partial E}{\partial Predicted} = \frac{\partial}{\partial Predicted}\left(\frac{1}{2}(desired - predicted)^2\right)$
$= 2 \cdot \frac{1}{2}(desired - predicted)^{2-1} \cdot (0 - 1)$
$= (desired - predicted) \cdot (-1)$
$= predicted - desired$

Substitution:

$\frac{\partial E}{\partial Predicted} = predicted - desired = 0.874 - 0.03 = 0.844$

Predicted-sop ($\frac{\partial Predicted}{\partial s}$) Partial Derivative

$Predicted = \frac{1}{1 + e^{-s}}$

$\frac{\partial Predicted}{\partial s} = \frac{\partial}{\partial s}\left(\frac{1}{1 + e^{-s}}\right) = \frac{1}{1 + e^{-s}}\left(1 - \frac{1}{1 + e^{-s}}\right)$

Substitution:

$\frac{\partial Predicted}{\partial s} = \frac{1}{1 + e^{-1.94}}\left(1 - \frac{1}{1 + e^{-1.94}}\right) = \frac{1}{1.144}\left(1 - \frac{1}{1.144}\right) = 0.874(1 - 0.874) = 0.874 \cdot 0.126$

$\frac{\partial Predicted}{\partial s} = 0.11$
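
The identity used here, f'(s) = f(s)(1 - f(s)), can be sanity-checked numerically against a finite difference:

```python
import math

def sigmoid(s):
    return 1 / (1 + math.exp(-s))

s = 1.94
f = sigmoid(s)
print(f * (1 - f))  # ~0.1098, the 0.11 on the slide

eps = 1e-6          # finite-difference check of the same derivative
print((sigmoid(s + eps) - sigmoid(s - eps)) / (2 * eps))  # ~0.1098
```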

Sop-W1 ($\frac{\partial s}{\partial W_1}$) Partial Derivative

$s = X_1 W_1 + X_2 W_2 + b$

$\frac{\partial s}{\partial W_1} = \frac{\partial}{\partial W_1}(X_1 W_1 + X_2 W_2 + b)$
$= 1 \cdot X_1 \cdot W_1^{1-1} + 0 + 0 = X_1 \cdot W_1^0 = X_1(1) = X_1$

Substitution:

$\frac{\partial s}{\partial W_1} = X_1 = 0.1$

Sop-W2 ($\frac{\partial s}{\partial W_2}$) Partial Derivative

$s = X_1 W_1 + X_2 W_2 + b$

$\frac{\partial s}{\partial W_2} = \frac{\partial}{\partial W_2}(X_1 W_1 + X_2 W_2 + b)$
$= 0 + 1 \cdot X_2 \cdot W_2^{1-1} + 0 = X_2 \cdot W_2^0 = X_2(1) = X_2$

Substitution:

$\frac{\partial s}{\partial W_2} = X_2 = 0.3$

Error-W1 ($\frac{\partial E}{\partial W_1}$) Partial Derivative

• After calculating each individual derivative, we can multiply all of them to get the desired relationship between the prediction error and each weight.

Calculated derivatives: $\frac{\partial E}{\partial Predicted} = 0.844$, $\frac{\partial Predicted}{\partial s} = 0.11$, $\frac{\partial s}{\partial W_1} = 0.1$

$\frac{\partial E}{\partial W_1} = \frac{\partial E}{\partial Predicted} \cdot \frac{\partial Predicted}{\partial s} \cdot \frac{\partial s}{\partial W_1} = 0.844 \cdot 0.11 \cdot 0.1$

$\frac{\partial E}{\partial W_1} = 0.0093$

Error-W2 ($\frac{\partial E}{\partial W_2}$) Partial Derivative

Calculated derivatives: $\frac{\partial E}{\partial Predicted} = 0.844$, $\frac{\partial Predicted}{\partial s} = 0.11$, $\frac{\partial s}{\partial W_2} = 0.3$

$\frac{\partial E}{\partial W_2} = \frac{\partial E}{\partial Predicted} \cdot \frac{\partial Predicted}{\partial s} \cdot \frac{\partial s}{\partial W_2} = 0.844 \cdot 0.11 \cdot 0.3$

$\frac{\partial E}{\partial W_2} = 0.028$
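
With all three factors known, both gradients are simple products. A sketch with the numbers from the slides:

```python
dE_dpred = 0.844           # predicted - desired
dpred_ds = 0.11            # sigmoid derivative at s = 1.94
ds_dW1, ds_dW2 = 0.1, 0.3  # X1 and X2

print(dE_dpred * dpred_ds * ds_dW1)  # ~0.0093
print(dE_dpred * dpred_ds * ds_dW2)  # ~0.028
```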

Interpreting Derivatives

• There are two useful pieces of information in the derivatives calculated previously: the sign and the magnitude (MAG).

Derivative Sign:
• Positive: increasing/decreasing the weight increases/decreases the error.
• Negative: increasing/decreasing the weight decreases/increases the error.

Derivative Magnitude:
• Positive sign: increasing/decreasing the weight by P increases/decreases the error by MAG * P.
• Negative sign: increasing/decreasing the weight by P decreases/increases the error by MAG * P.

In our example, because both $\frac{\partial E}{\partial W_1} = 0.0093$ and $\frac{\partial E}{\partial W_2} = 0.028$ are positive, we would like to decrease the weights in order to decrease the prediction error.

Updating Weights

• Each weight will be updated based on its derivative according to this equation:

$W_{i,new} = W_{i,old} - \eta \cdot \frac{\partial E}{\partial W_i}$

Updating W1:
$W_{1,new} = W_1 - \eta \cdot \frac{\partial E}{\partial W_1} = 0.5 - 0.01 \cdot 0.0093 = 0.49991$

Updating W2:
$W_{2,new} = W_2 - \eta \cdot \frac{\partial E}{\partial W_2} = 0.2 - 0.01 \cdot 0.028 = 0.1997$

Continue updating weights according to derivatives and re-train the network until reaching an acceptable error.
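
The whole single-neuron procedure (forward pass, chain-rule gradient, weight update) fits in a short loop. A minimal sketch, assuming the same data, initial weights, and learning rate as the slides, with the bias trained as the weight of a constant +1 input:

```python
import math

def sigmoid(s):
    return 1 / (1 + math.exp(-s))

X1, X2, desired = 0.1, 0.3, 0.03
W1, W2, b = 0.5, 0.2, 1.83
eta = 0.01

for _ in range(10000):
    # Forward pass
    s = X1 * W1 + X2 * W2 + b
    predicted = sigmoid(s)

    # Backward pass: dE/ds = (predicted - desired) * predicted * (1 - predicted)
    delta = (predicted - desired) * predicted * (1 - predicted)
    W1 -= eta * delta * X1
    W2 -= eta * delta * X2
    b  -= eta * delta * 1.0

print(predicted)  # moves toward 0.03 as the error shrinks
```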

Second Example: Backpropagation for an NN with a Hidden Layer

ANN with Hidden Layer

Initial Weights: W1 = 0.5, W2 = 0.1, W3 = 0.62, W4 = 0.2, W5 = -0.2, W6 = 0.3, b1 = 0.4, b2 = -0.1, b3 = 1.83

Training Data: X1 = 0.1, X2 = 0.3, Output = 0.03

[Diagram: Initial Weights → Training → Prediction → Backpropagation → Update]

Forward Pass – Hidden Layer Neuron h1

$h1_{in} = X_1 W_1 + X_2 W_2 + b_1 = 0.1 \cdot 0.5 + 0.3 \cdot 0.1 + 0.4$
$h1_{in} = 0.48$

$h1_{out} = \frac{1}{1 + e^{-h1_{in}}} = \frac{1}{1 + e^{-0.48}}$
$h1_{out} = 0.618$

Forward Pass – Hidden Layer Neuron h2

$h2_{in} = X_1 W_3 + X_2 W_4 + b_2 = 0.1 \cdot 0.62 + 0.3 \cdot 0.2 - 0.1$
$h2_{in} = 0.022$

$h2_{out} = \frac{1}{1 + e^{-h2_{in}}} = \frac{1}{1 + e^{-0.022}}$
$h2_{out} = 0.506$

Forward Pass – Output Layer Neuron

$out_{in} = h1_{out} W_5 + h2_{out} W_6 + b_3 = 0.618 \cdot (-0.2) + 0.506 \cdot 0.3 + 1.83$
$out_{in} = 1.858$

$out_{out} = \frac{1}{1 + e^{-out_{in}}} = \frac{1}{1 + e^{-1.858}}$
$out_{out} = 0.865$

Forward Pass – Prediction Error

$desired = 0.03$, $Predicted = out_{out} = 0.865$

$E = \frac{1}{2}(desired - out_{out})^2 = \frac{1}{2}(0.03 - 0.865)^2$
$E = 0.349$
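
A sketch of this forward pass in Python, using the initial weights from the table:

```python
import math

def sigmoid(v):
    return 1 / (1 + math.exp(-v))

X1, X2, desired = 0.1, 0.3, 0.03
W1, W2, W3, W4, W5, W6 = 0.5, 0.1, 0.62, 0.2, -0.2, 0.3
b1, b2, b3 = 0.4, -0.1, 1.83

h1_out = sigmoid(X1 * W1 + X2 * W2 + b1)           # ~0.618
h2_out = sigmoid(X1 * W3 + X2 * W4 + b2)           # ~0.506
out_out = sigmoid(h1_out * W5 + h2_out * W6 + b3)  # ~0.865

E = 0.5 * (desired - out_out) ** 2
print(E)                                           # ~0.349
```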

Partial Derivatives Calculation

The derivatives to calculate are:

$\frac{\partial E}{\partial W_1}, \frac{\partial E}{\partial W_2}, \frac{\partial E}{\partial W_3}, \frac{\partial E}{\partial W_4}, \frac{\partial E}{\partial W_5}, \frac{\partial E}{\partial W_6}$

E-W5 ($\frac{\partial E}{\partial W_5}$) Partial Derivative

$\frac{\partial E}{\partial W_5} = \frac{\partial E}{\partial out_{out}} \cdot \frac{\partial out_{out}}{\partial out_{in}} \cdot \frac{\partial out_{in}}{\partial W_5}$

First factor:

$\frac{\partial E}{\partial out_{out}} = \frac{\partial}{\partial out_{out}}\left(\frac{1}{2}(desired - out_{out})^2\right) = 2 \cdot \frac{1}{2}(desired - out_{out})^{2-1} \cdot (0 - 1) = out_{out} - desired$

$\frac{\partial E}{\partial out_{out}} = 0.865 - 0.03 = 0.835$

Second factor:

$\frac{\partial out_{out}}{\partial out_{in}} = \frac{\partial}{\partial out_{in}}\left(\frac{1}{1 + e^{-out_{in}}}\right) = \frac{1}{1 + e^{-out_{in}}}\left(1 - \frac{1}{1 + e^{-out_{in}}}\right)$

$= \frac{1}{1 + e^{-1.858}}\left(1 - \frac{1}{1 + e^{-1.858}}\right) = \frac{1}{1.156}\left(1 - \frac{1}{1.156}\right) = 0.865(1 - 0.865) = 0.865 \cdot 0.135$

$\frac{\partial out_{out}}{\partial out_{in}} = 0.117$

Third factor:

$\frac{\partial out_{in}}{\partial W_5} = \frac{\partial}{\partial W_5}(h1_{out} W_5 + h2_{out} W_6 + b_3) = 1 \cdot h1_{out} \cdot W_5^{1-1} + 0 + 0 = h1_{out}$

$\frac{\partial out_{in}}{\partial W_5} = h1_{out} = 0.618$

Multiplying:

$\frac{\partial E}{\partial W_5} = 0.835 \cdot 0.117 \cdot 0.618$

$\frac{\partial E}{\partial W_5} = 0.06$

E-W6 ($\frac{\partial E}{\partial W_6}$) Partial Derivative

$\frac{\partial E}{\partial W_6} = \frac{\partial E}{\partial out_{out}} \cdot \frac{\partial out_{out}}{\partial out_{in}} \cdot \frac{\partial out_{in}}{\partial W_6}$

The first two factors were already calculated: $\frac{\partial E}{\partial out_{out}} = 0.835$ and $\frac{\partial out_{out}}{\partial out_{in}} = 0.117$.

Third factor:

$\frac{\partial out_{in}}{\partial W_6} = \frac{\partial}{\partial W_6}(h1_{out} W_5 + h2_{out} W_6 + b_3) = 0 + 1 \cdot h2_{out} \cdot W_6^{1-1} + 0 = h2_{out}$

$\frac{\partial out_{in}}{\partial W_6} = h2_{out} = 0.506$

Multiplying:

$\frac{\partial E}{\partial W_6} = 0.835 \cdot 0.117 \cdot 0.506$

$\frac{\partial E}{\partial W_6} = 0.049$

E-W1 ($\frac{\partial E}{\partial W_1}$) Partial Derivative

$\frac{\partial E}{\partial W_1} = \frac{\partial E}{\partial out_{out}} \cdot \frac{\partial out_{out}}{\partial out_{in}} \cdot \frac{\partial out_{in}}{\partial h1_{out}} \cdot \frac{\partial h1_{out}}{\partial h1_{in}} \cdot \frac{\partial h1_{in}}{\partial W_1}$

The first two factors were already calculated: $\frac{\partial E}{\partial out_{out}} = 0.835$ and $\frac{\partial out_{out}}{\partial out_{in}} = 0.117$.

Third factor:

$\frac{\partial out_{in}}{\partial h1_{out}} = \frac{\partial}{\partial h1_{out}}(h1_{out} W_5 + h2_{out} W_6 + b_3) = 1 \cdot W_5 + 0 + 0 = W_5$

$\frac{\partial out_{in}}{\partial h1_{out}} = W_5 = -0.2$

Fourth factor:

$\frac{\partial h1_{out}}{\partial h1_{in}} = \frac{\partial}{\partial h1_{in}}\left(\frac{1}{1 + e^{-h1_{in}}}\right) = \frac{1}{1 + e^{-h1_{in}}}\left(1 - \frac{1}{1 + e^{-h1_{in}}}\right) = \frac{1}{1 + e^{-0.48}}\left(1 - \frac{1}{1 + e^{-0.48}}\right) = 0.618(1 - 0.618)$

$\frac{\partial h1_{out}}{\partial h1_{in}} = 0.236$

Fifth factor:

$\frac{\partial h1_{in}}{\partial W_1} = \frac{\partial}{\partial W_1}(X_1 W_1 + X_2 W_2 + b_1) = X_1 + 0 + 0$

$\frac{\partial h1_{in}}{\partial W_1} = X_1 = 0.1$

Multiplying:

$\frac{\partial E}{\partial W_1} = 0.835 \cdot 0.117 \cdot (-0.2) \cdot 0.236 \cdot 0.1$

$\frac{\partial E}{\partial W_1} = -0.0005$

E-W2 ($\frac{\partial E}{\partial W_2}$) Partial Derivative

$\frac{\partial E}{\partial W_2} = \frac{\partial E}{\partial out_{out}} \cdot \frac{\partial out_{out}}{\partial out_{in}} \cdot \frac{\partial out_{in}}{\partial h1_{out}} \cdot \frac{\partial h1_{out}}{\partial h1_{in}} \cdot \frac{\partial h1_{in}}{\partial W_2}$

The first four factors were already calculated: $0.835$, $0.117$, $-0.2$, and $0.236$.

Fifth factor:

$\frac{\partial h1_{in}}{\partial W_2} = \frac{\partial}{\partial W_2}(X_1 W_1 + X_2 W_2 + b_1) = 0 + X_2 + 0$

$\frac{\partial h1_{in}}{\partial W_2} = X_2 = 0.3$

Multiplying:

$\frac{\partial E}{\partial W_2} = 0.835 \cdot 0.117 \cdot (-0.2) \cdot 0.236 \cdot 0.3$

$\frac{\partial E}{\partial W_2} = -0.0014$

E-W3 ($\frac{\partial E}{\partial W_3}$) Partial Derivative

$\frac{\partial E}{\partial W_3} = \frac{\partial E}{\partial out_{out}} \cdot \frac{\partial out_{out}}{\partial out_{in}} \cdot \frac{\partial out_{in}}{\partial h2_{out}} \cdot \frac{\partial h2_{out}}{\partial h2_{in}} \cdot \frac{\partial h2_{in}}{\partial W_3}$

The first two factors were already calculated: $\frac{\partial E}{\partial out_{out}} = 0.835$ and $\frac{\partial out_{out}}{\partial out_{in}} = 0.117$.

Third factor:

$\frac{\partial out_{in}}{\partial h2_{out}} = \frac{\partial}{\partial h2_{out}}(h1_{out} W_5 + h2_{out} W_6 + b_3) = 0 + 1 \cdot W_6 + 0 = W_6$

$\frac{\partial out_{in}}{\partial h2_{out}} = W_6 = 0.3$

Fourth factor:

$\frac{\partial h2_{out}}{\partial h2_{in}} = \frac{\partial}{\partial h2_{in}}\left(\frac{1}{1 + e^{-h2_{in}}}\right) = \frac{1}{1 + e^{-h2_{in}}}\left(1 - \frac{1}{1 + e^{-h2_{in}}}\right) = \frac{1}{1 + e^{-0.022}}\left(1 - \frac{1}{1 + e^{-0.022}}\right) = 0.506(1 - 0.506)$

$\frac{\partial h2_{out}}{\partial h2_{in}} = 0.25$

Fifth factor:

$\frac{\partial h2_{in}}{\partial W_3} = \frac{\partial}{\partial W_3}(X_1 W_3 + X_2 W_4 + b_2) = X_1 + 0 + 0$

$\frac{\partial h2_{in}}{\partial W_3} = X_1 = 0.1$

Multiplying:

$\frac{\partial E}{\partial W_3} = 0.835 \cdot 0.117 \cdot 0.3 \cdot 0.25 \cdot 0.1$

$\frac{\partial E}{\partial W_3} = 0.0007$

E-W4 ($\frac{\partial E}{\partial W_4}$) Partial Derivative

$\frac{\partial E}{\partial W_4} = \frac{\partial E}{\partial out_{out}} \cdot \frac{\partial out_{out}}{\partial out_{in}} \cdot \frac{\partial out_{in}}{\partial h2_{out}} \cdot \frac{\partial h2_{out}}{\partial h2_{in}} \cdot \frac{\partial h2_{in}}{\partial W_4}$

The first four factors were already calculated: $0.835$, $0.117$, $0.3$, and $0.25$.

Fifth factor:

$\frac{\partial h2_{in}}{\partial W_4} = \frac{\partial}{\partial W_4}(X_1 W_3 + X_2 W_4 + b_2) = 0 + X_2 + 0$

$\frac{\partial h2_{in}}{\partial W_4} = X_2 = 0.3$

Multiplying:

$\frac{\partial E}{\partial W_4} = 0.835 \cdot 0.117 \cdot 0.3 \cdot 0.25 \cdot 0.3$

$\frac{\partial E}{\partial W_4} = 0.0022$

All Error-Weights Partial Derivatives

$\frac{\partial E}{\partial W_1} = -0.0005$
$\frac{\partial E}{\partial W_2} = -0.0014$
$\frac{\partial E}{\partial W_3} = 0.0007$
$\frac{\partial E}{\partial W_4} = 0.0022$
$\frac{\partial E}{\partial W_5} = 0.06$
$\frac{\partial E}{\partial W_6} = 0.049$

Updated Weights

$W_{1,new} = W_1 - \eta \cdot \frac{\partial E}{\partial W_1} = 0.5 - 0.01 \cdot (-0.0005) = 0.500005$

$W_{2,new} = W_2 - \eta \cdot \frac{\partial E}{\partial W_2} = 0.1 - 0.01 \cdot (-0.0014) = 0.100014$

$W_{3,new} = W_3 - \eta \cdot \frac{\partial E}{\partial W_3} = 0.62 - 0.01 \cdot 0.0007 = 0.619993$

$W_{4,new} = W_4 - \eta \cdot \frac{\partial E}{\partial W_4} = 0.2 - 0.01 \cdot 0.0022 = 0.199978$

$W_{5,new} = W_5 - \eta \cdot \frac{\partial E}{\partial W_5} = -0.2 - 0.01 \cdot 0.06 = -0.2006$

$W_{6,new} = W_6 - \eta \cdot \frac{\partial E}{\partial W_6} = 0.3 - 0.01 \cdot 0.049 = 0.29951$

Continue updating weights according to derivatives and re-train the network until reaching an acceptable error.
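
All six derivatives share the upstream factors, so the backward pass can be written compactly. A sketch (our own variable names) that continues the forward-pass code above and reproduces the derivatives and updates on these slides:

```python
# Shared upstream term: dE/dout_in = (out_out - desired) * out_out * (1 - out_out)
d_out = (out_out - desired) * out_out * (1 - out_out)  # ~0.098

dW5 = d_out * h1_out                                   # ~0.06
dW6 = d_out * h2_out                                   # ~0.049

# Push the error signal back through each hidden neuron
d_h1 = d_out * W5 * h1_out * (1 - h1_out)
d_h2 = d_out * W6 * h2_out * (1 - h2_out)

dW1, dW2 = d_h1 * X1, d_h1 * X2                        # ~-0.0005, ~-0.0014
dW3, dW4 = d_h2 * X1, d_h2 * X2                        # ~0.0007, ~0.0022

eta = 0.01
W1, W2 = W1 - eta * dW1, W2 - eta * dW2
W3, W4 = W3 - eta * dW3, W4 - eta * dW4
W5, W6 = W5 - eta * dW5, W6 - eta * dW6
# The slides stop at the six weights; the biases would update the same way.
```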
