
Deep Learning Seminar

Huang Lei (黄雷)

2016-12-27

Session 4: Recurrent Neural Networks

Recap of the previous session

• Modeling of CNN

– Module-wise (modular) architecture

• Convolutional layer (module)

– Convolution in general

– Filters

– Convolution module

• Pooling layer (module)

Module-wise architecture: the Torch platform

[Figure: a small MLP in the module view. Input: $x_1, x_2, x_3$ with bias $x_0 = 1$; a hidden layer with bias $h_0^{(1)} = 1$; linear modules feeding the hidden layer and the output.]

Forward through a linear + sigmoid module:

$a^{(1)} = W^{(1)} x$, $h^{(1)} = \sigma(a^{(1)})$

Backward: each module $y = f(x)$ maps the incoming gradient to $\frac{dL}{dx} = g\!\left(\frac{dL}{dy}\right)$. For this module:

$\frac{dL}{da^{(1)}} = \frac{dL}{dh^{(1)}} \, \sigma(a^{(1)})\,(1 - \sigma(a^{(1)}))$

$\frac{dL}{dx} = \frac{dL}{da^{(1)}} \, W^{(1)}$

$\frac{dL}{dW^{(1)}} = \frac{dL}{da^{(1)}} \, x$
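As a concrete illustration of this module-wise view, here is a minimal NumPy sketch of one linear + sigmoid module with its forward and backward passes; the layer sizes and variable names are illustrative, not taken from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(W, x):
    # a^(1) = W^(1) x,  h^(1) = sigma(a^(1))
    a = W @ x
    h = sigmoid(a)
    return a, h

def backward(W, x, a, dL_dh):
    # dL/da = dL/dh * sigma(a) (1 - sigma(a))
    s = sigmoid(a)
    dL_da = dL_dh * s * (1.0 - s)
    # dL/dx: gradient passed on to the previous module
    dL_dx = W.T @ dL_da
    # dL/dW: gradient used to update this module's weights
    dL_dW = np.outer(dL_da, x)
    return dL_dx, dL_dW

# Toy usage: 3 inputs, 4 hidden units
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
x = np.array([0.5, -1.0, 2.0])
a, h = forward(W, x)
dL_dx, dL_dW = backward(W, x, a, dL_dh=np.ones(4))
```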

• Training per iteration: forward pass, back-propagation, weight update.

Convolutional Neural Network

• LeNet-5

[Figure: the LeNet-5 architecture: convolution, pooling, and fully connected layers.]

Convolution

• Convolution in continuous space:

$y(t) = x(t) * h(t) = \int_{-\infty}^{+\infty} x(s)\, h(t-s)\, ds$

• Convolution in discrete space:

$y(n) = x(n) * w(n) = \sum_{i=-\infty}^{+\infty} x(i)\, w(n-i)$

• Image convolution is a two-dimensional discrete convolution:

$g(i, j) = \sum_{k,l} f(k, l)\, w(i-k, j-l)$
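To make the 2D discrete convolution concrete, here is a minimal NumPy sketch that applies a filter to an image with plain loops and zero padding at the borders; the image and filter values are illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """2D discrete convolution g(i,j) = sum_{k,l} f(k,l) w(i-k, j-l), with zero padding."""
    H, W = image.shape
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)))
    out = np.zeros_like(image, dtype=float)
    # Flip the kernel so this is true convolution (not cross-correlation).
    flipped = kernel[::-1, ::-1]
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * flipped)
    return out

# Example: a simple edge-detect (Laplacian-like) filter
img = np.arange(25, dtype=float).reshape(5, 5)
edge = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float)
print(conv2d(img, edge))
```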

Practice with linear filters

[Figure: an original image, an edge-detection filter, and the resulting output image.]

Convolution Layer

Convolution layer: input $X \in \mathbb{R}^{d_{in} \times h \times w}$, weights $W \in \mathbb{R}^{d_{out} \times d_{in} \times F_h \times F_w}$, output $Y \in \mathbb{R}^{d_{out} \times h \times w}$.

Fully connected analogue: input $x \in \mathbb{R}^{d_{in}}$, weights $W \in \mathbb{R}^{d_{out} \times d_{in}}$, output $y \in \mathbb{R}^{d_{out}}$.

Forward pass

[Figure: input feature maps of size 32 × 32 mapped to output feature maps of the same spatial size.]

Input: $X \in \mathbb{R}^{d_{in} \times h \times w}$, weights: $W \in \mathbb{R}^{d_{out} \times d_{in} \times F_h \times F_w}$, output: $Y \in \mathbb{R}^{d_{out} \times h \times w}$

$y_{f',i',j'} = \sum_{f=1}^{d_{in}} \sum_{i=-F_h/2}^{F_h/2} \sum_{j=-F_w/2}^{F_w/2} x_{f,\, i'+i-1,\, j'+j-1}\; w_{f',f,i,j} + b_{f'}$
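A direct, loop-based NumPy sketch of this forward rule with "same" zero padding; shapes and names are illustrative, and the indexing follows the formula above up to the choice of padding convention.

```python
import numpy as np

def conv_forward(X, W, b):
    """X: (d_in, h, w), W: (d_out, d_in, Fh, Fw), b: (d_out,) -> Y: (d_out, h, w)."""
    d_in, h, w = X.shape
    d_out, _, Fh, Fw = W.shape
    ph, pw = Fh // 2, Fw // 2
    Xp = np.pad(X, ((0, 0), (ph, ph), (pw, pw)))   # zero padding keeps the spatial size
    Y = np.zeros((d_out, h, w))
    for f_out in range(d_out):
        for i_out in range(h):
            for j_out in range(w):
                # y_{f',i',j'} = sum_f sum_i sum_j x_{f, i'+i, j'+j} w_{f',f,i,j} + b_{f'}
                Y[f_out, i_out, j_out] = np.sum(
                    Xp[:, i_out:i_out + Fh, j_out:j_out + Fw] * W[f_out]
                ) + b[f_out]
    return Y

# Toy usage: 3 input channels, 5 output channels, 3x3 filters on an 8x8 input
rng = np.random.default_rng(0)
Y = conv_forward(rng.normal(size=(3, 8, 8)),
                 rng.normal(size=(5, 3, 3, 3)),
                 np.zeros(5))
print(Y.shape)  # (5, 8, 8)
```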

Back-propagation

[Figure: input and output feature maps of size 32 × 32, as in the forward pass.]

Input: $X \in \mathbb{R}^{d_{in} \times h \times w}$, weights: $W \in \mathbb{R}^{d_{out} \times d_{in} \times F_h \times F_w}$, output: $Y \in \mathbb{R}^{d_{out} \times h \times w}$

Forward: $y_{f',i',j'} = \sum_{f=1}^{d_{in}} \sum_{i=-F_h/2}^{F_h/2} \sum_{j=-F_w/2}^{F_w/2} x_{f,\, i'+i-1,\, j'+j-1}\; w_{f',f,i,j} + b_{f'}$

Gradient with respect to the weights:

$\frac{dL}{dw_{f',f,i,j}} = \sum_{i'=1}^{h} \sum_{j'=1}^{w} \frac{dL}{dy_{f',i',j'}} \frac{dy_{f',i',j'}}{dw_{f',f,i,j}} = \sum_{i'=1}^{h} \sum_{j'=1}^{w} \frac{dL}{dy_{f',i',j'}}\; x_{f,\, i'+i-1,\, j'+j-1}$

Gradient with respect to the input:

$\frac{dL}{dx_{f,i,j}} = \sum_{f'=1}^{d_{out}} \sum_{i'=1}^{h} \sum_{j'=1}^{w} \frac{dL}{dy_{f',i',j'}} \frac{dy_{f',i',j'}}{dx_{f,i,j}} = \sum_{f'=1}^{d_{out}} \sum_{i'=1}^{h} \sum_{j'=1}^{w} \frac{dL}{dy_{f',i',j'}}\; w_{f',f,\, i-i'+1,\, j-j'+1}$
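A loop-based sketch of these two gradients, matching the forward sketch above (same assumed shapes and zero padding); it is not optimized, just a restatement of the formulas in code.

```python
import numpy as np

def conv_backward(X, W, dL_dY):
    """X: (d_in, h, w), W: (d_out, d_in, Fh, Fw), dL_dY: (d_out, h, w)."""
    d_in, h, w = X.shape
    d_out, _, Fh, Fw = W.shape
    ph, pw = Fh // 2, Fw // 2
    Xp = np.pad(X, ((0, 0), (ph, ph), (pw, pw)))
    dL_dW = np.zeros_like(W)
    dL_dXp = np.zeros_like(Xp)
    for f_out in range(d_out):
        for i_out in range(h):
            for j_out in range(w):
                g = dL_dY[f_out, i_out, j_out]
                # dL/dw_{f',f,i,j} += dL/dy_{f',i',j'} * x_{f, i'+i, j'+j}
                dL_dW[f_out] += g * Xp[:, i_out:i_out + Fh, j_out:j_out + Fw]
                # dL/dx_{f, i'+i, j'+j} += dL/dy_{f',i',j'} * w_{f',f,i,j}
                dL_dXp[:, i_out:i_out + Fh, j_out:j_out + Fw] += g * W[f_out]
    dL_dX = dL_dXp[:, ph:ph + h, pw:pw + w]  # strip the padding
    return dL_dW, dL_dX
```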


Pooling Layer

• In ConvNet architectures, Conv layers are often followed by Pooling layers.

• Pooling makes the representations smaller and more manageable without losing too much information.

• It makes the representation invariant to small shifts within a local region.

[Figure: downsampling a 32 × 32 feature map to 16 × 16.]

Source: Stanford CS231n, Andrej Karpathy & Fei-Fei Li
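A minimal NumPy sketch of 2 × 2 max pooling with stride 2, a common choice in the CS231n material; the function name and shapes are illustrative.

```python
import numpy as np

def max_pool_2x2(X):
    """X: (d, h, w) with even h, w -> (d, h//2, w//2), taking the max over each 2x2 block."""
    d, h, w = X.shape
    # Reshape so each 2x2 block gets its own pair of axes, then reduce over them.
    blocks = X.reshape(d, h // 2, 2, w // 2, 2)
    return blocks.max(axis=(2, 4))

X = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
print(max_pool_2x2(X).shape)  # (2, 2, 2)
```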

Outline

• Recurrent Neural Network

– Modeling

– Training

• Long Short-Term Memory (LSTM)

– Motivation

– Modeling

• Application

– Generating articles

Classification: MLP and CNN

[Figure: an MLP (inputs $x_1, x_2, x_3$ with bias $x_0 = 1$, a hidden layer with bias $h_0^{(1)} = 1$, and an output such as $(1, 0, 0)^T$) next to a CNN applying convolution to the input $(x_1, x_2, x_3)$.]

Example sentence: "I go to the cinema, and I book a ticket." Later words depend on the earlier ones, which a fixed-size-input MLP or CNN does not model.

One example: modeling motivation

• Task: character-level language model

• Vocabulary: [h, e, l, o]

• Training sequence: "hello"

• Data representation:

– X: {h, e, l, l}

– Y: {e, l, l, o}

[Figure: input x, hidden layer, output layer y.]
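A small sketch of how this training pair can be represented with one-hot vectors (the one-of-K encoding mentioned later in the slides); the vocabulary order is the one given above.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_idx = {c: i for i, c in enumerate(vocab)}

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[char_to_idx[ch]] = 1.0
    return v

# Training sequence "hello": inputs are the first four characters,
# targets are the next character at each step.
X = np.stack([one_hot(c) for c in "hell"])   # shape (4, 4)
Y = np.stack([one_hot(c) for c in "ello"])   # shape (4, 4)
print(X)
```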

Modeling

• MLP vs. RNN

[Figure: an MLP block x → h → y next to an RNN block x → h → y with a recurrent connection on h.]

MLP (no recurrence):

$h = \sigma(W_{hx} x)$, $y = \sigma(W_{yh} h)$, or with tanh: $h = \tanh(W_{hx} x)$, $y = W_{yh} h$

where $\sigma(a) = \frac{1}{1 + e^{-a}}$ and $\tanh(a) = \frac{e^{a} - e^{-a}}{e^{a} + e^{-a}}$

RNN (the hidden state also depends on the previous hidden state):

$h_t = \tanh(W_{hx} x_t + W_{hh} h_{t-1})$

$y_t = W_{yh} h_t$
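A minimal NumPy sketch contrasting the two update rules; the matrix shapes and function names are illustrative.

```python
import numpy as np

def mlp_step(x, W_hx, W_yh):
    # h = tanh(W_hx x), y = W_yh h  (no dependence on previous inputs)
    h = np.tanh(W_hx @ x)
    return W_yh @ h

def rnn_step(x_t, h_prev, W_hx, W_hh, W_yh):
    # h_t = tanh(W_hx x_t + W_hh h_{t-1}), y_t = W_yh h_t
    h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev)
    y_t = W_yh @ h_t
    return h_t, y_t
```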


Modeling

• RNN unrolling

$h_t = \tanh(W_{hx} x_t + W_{hh} h_{t-1})$

$y_t = W_{yh} h_t$

[Figure: the recurrent cell x → h → y unrolled over three time steps, $x_1 \to h_1 \to y_1$, $x_2 \to h_2 \to y_2$, $x_3 \to h_3 \to y_3$, with $W_{hx}$ applied to each input, $W_{hh}$ connecting consecutive hidden states, and $W_{yh}$ producing each output.]
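Building on the step function above, here is a sketch of the unrolled forward pass over a short sequence; it reuses the same weights at every time step, as in the diagram, and all sizes are illustrative.

```python
import numpy as np

def rnn_forward(xs, h0, W_hx, W_hh, W_yh):
    """xs: list of input vectors x_1..x_T; returns all hidden states and outputs."""
    h = h0
    hs, ys = [], []
    for x_t in xs:
        h = np.tanh(W_hx @ x_t + W_hh @ h)   # h_t = tanh(W_hx x_t + W_hh h_{t-1})
        hs.append(h)
        ys.append(W_yh @ h)                  # y_t = W_yh h_t
    return hs, ys

# Toy usage: 4-dimensional one-hot inputs, 3 hidden units
rng = np.random.default_rng(0)
W_hx, W_hh, W_yh = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), rng.normal(size=(4, 3))
xs = [np.eye(4)[i] for i in [0, 1, 2, 2]]    # "h e l l" as one-hot vectors
hs, ys = rnn_forward(xs, np.zeros(3), W_hx, W_hh, W_yh)
print(len(ys), ys[0].shape)  # 4 outputs (one per step), each of size 4
```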

Example

• Character-level language model

– Training sequence: "hello"

– Representation (one-hot, aka one-of-K, encoding):

X: {h, e, l, l}

Y: {e, l, l, o}

[Figure: the unrolled RNN fed the one-hot inputs, predicting the next character at each step.]

[Figures: the same "hello" example repeated over several build-up slides.]

Source: The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy

Outline

• Recurrent Neural Network

– Modeling

– Training

• Long Short-Term Memory (LSTM)

– Motivation

– Modeling

• Application

– Generating articles

Neural Network

• Training algorithm

0. Initialize the weights $W^{(0)}$.

1. Forward pass: 1.1 given the input $x$, compute the output $\hat{y}$; 1.2 compute the loss $L(y, \hat{y})$.

2. Back-propagation: compute $\frac{dL}{d\hat{y}}$, then back-propagate until $\frac{dL}{dx}$ is computed.

3. Compute the gradient $\frac{dL}{dW}$.

4. Update the weights: $W^{(t+1)} = W^{(t)} - \eta \frac{dL}{dW^{(t)}}$

[Figure: the MLP from earlier (inputs $x_1, x_2, x_3$ with bias, hidden layer, output) with target $(1, 0, 0)^T$.]
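A generic NumPy sketch of one training iteration following the four steps above, for a single linear + sigmoid layer; the squared-error loss and the layer choice are illustrative assumptions, not specified in the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_step(W, x, target, lr=0.1):
    # 1. Forward pass: compute the output and the loss
    a = W @ x
    y_hat = sigmoid(a)
    loss = 0.5 * np.sum((y_hat - target) ** 2)
    # 2. Back-propagation: dL/dy_hat, then back through the sigmoid
    dL_dy = y_hat - target
    dL_da = dL_dy * y_hat * (1.0 - y_hat)
    # 3. Gradient with respect to the weights
    dL_dW = np.outer(dL_da, x)
    # 4. Update: W <- W - eta * dL/dW
    return W - lr * dL_dW, loss

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))
x, target = np.array([1.0, 0.5, -0.5]), np.array([1.0, 0.0, 0.0])
for step in range(5):
    W, loss = train_step(W, x, target)
    print(step, loss)
```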

Training

• Learning with sequence length 3

[Figure: the RNN unrolled for three time steps, with weights $W_{hx}$, $W_{hh}$, $W_{yh}$ and targets for $y_1, y_2, y_3$.]

• Back-propagation computes $\left\{ \frac{dL}{dy_i}, \frac{dL}{dh_i}, \frac{dL}{dx_i} \right\}$ at every time step.

Training

• Back-propagation: last time step

[Figure: only the last step $x_3 \to h_3 \to y_3$, with weights $W_{hx}$, $W_{yh}$ and the target for $y_3$.]

$\frac{dL}{dh_3} = \frac{dL}{dy_3} \frac{dy_3}{dh_3}$

Training

• Back-propagation: second time step

[Figure: the last two steps, with weights $W_{hx}$, $W_{hh}$, $W_{yh}$ and targets for $y_2, y_3$.]

$\frac{dL}{dh_3} = \frac{dL}{dy_3} \frac{dy_3}{dh_3}$

$\frac{dL}{dh_2} = \frac{dL}{dy_2} \frac{dy_2}{dh_2} + \frac{dL}{dy_3} \frac{dy_3}{dh_3} \frac{dh_3}{dh_2}$

Training

• Back-propagation: first time step

[Figure: all three steps, with weights $W_{hx}$, $W_{hh}$, $W_{yh}$ and targets for $y_1, y_2, y_3$.]

$\frac{dL}{dh_3} = \frac{dL}{dy_3} \frac{dy_3}{dh_3}$

$\frac{dL}{dh_2} = \frac{dL}{dy_2} \frac{dy_2}{dh_2} + \frac{dL}{dy_3} \frac{dy_3}{dh_3} \frac{dh_3}{dh_2}$

$\frac{dL}{dh_1} = \frac{dL}{dy_1} \frac{dy_1}{dh_1} + \frac{dL}{dy_2} \frac{dy_2}{dh_2} \frac{dh_2}{dh_1} + \frac{dL}{dy_3} \frac{dy_3}{dh_3} \frac{dh_3}{dh_2} \frac{dh_2}{dh_1}$

In general: $\frac{dL}{dh_t} = \sum_{s=t}^{T=3} \frac{dL}{dy_s} \frac{dy_s}{dh_s} \frac{dh_s}{dh_t}$

Training

• Gradient with respect to the weights

[Figure: the unrolled RNN with weights $W_{hx}$, $W_{hh}$, $W_{yh}$ and targets for $y_1, y_2, y_3$.]

$\frac{dL}{dh_t} = \sum_{s=t}^{T} \frac{dL}{dy_s} \frac{dy_s}{dh_s} \frac{dh_s}{dh_t}$

Compute the per-step gradients $\frac{dL}{dW_{hh}^{(i)}}$, $\frac{dL}{dW_{hx}^{(i)}}$, $\frac{dL}{dW_{yh}^{(i)}}$ of the shared weights, then update:

$W^{(t+1)} = W^{(t)} - \eta \sum_{s=1}^{T} \frac{dL}{dW^{(s)}}$
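A compact sketch of back-propagation through time for this RNN, accumulating the shared-weight gradients over time steps before a single update; the per-step squared-error loss and the tanh backward are illustrative choices consistent with the forward model above.

```python
import numpy as np

def bptt(xs, targets, h0, W_hx, W_hh, W_yh, lr=0.01):
    # Forward pass, keeping every hidden state
    hs, ys = [h0], []
    for x_t in xs:
        hs.append(np.tanh(W_hx @ x_t + W_hh @ hs[-1]))
        ys.append(W_yh @ hs[-1])
    # Backward pass: accumulate gradients of the shared weights over all steps
    dW_hx, dW_hh, dW_yh = np.zeros_like(W_hx), np.zeros_like(W_hh), np.zeros_like(W_yh)
    dL_dh_next = np.zeros_like(h0)           # gradient flowing back from step t+1
    for t in reversed(range(len(xs))):
        dL_dy = ys[t] - targets[t]           # squared-error loss per step
        dW_yh += np.outer(dL_dy, hs[t + 1])
        dL_dh = W_yh.T @ dL_dy + dL_dh_next  # dL/dh_t sums the local and future terms
        dL_da = dL_dh * (1.0 - hs[t + 1] ** 2)   # back through tanh
        dW_hx += np.outer(dL_da, xs[t])
        dW_hh += np.outer(dL_da, hs[t])
        dL_dh_next = W_hh.T @ dL_da          # carries dh_{t+1}/dh_t backwards
    # Update each weight with the summed gradient
    return W_hx - lr * dW_hx, W_hh - lr * dW_hh, W_yh - lr * dW_yh
```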

RNN

• Longer: the same unrolling extends to sequences of arbitrary length $T$ ($x_T \to h_T \to y_T$), still sharing $W_{hx}$, $W_{hh}$, $W_{yh}$.

• Deeper: hidden layers can be stacked.

[Figure: the three-step unrolled RNN extended to $T$ steps, with targets at every step.]

Relation: MLP, CNN and RNN

[Figure comparing the three:

– MLP: $x \in \mathbb{R}^{d_{in}}$, $W_{xh} \in \mathbb{R}^{d_{out} \times d_{in}}$, $y \in \mathbb{R}^{d_{out}}$

– CNN: $X \in \mathbb{R}^{d_{in} \times h \times w}$, $W \in \mathbb{R}^{d_{out} \times d_{in} \times F_h \times F_w}$, $Y \in \mathbb{R}^{d_{out} \times h \times w}$

– RNN: the same weights $W \in \mathbb{R}^{d_{out} \times d_{in}}$ applied at every time step $t_1, t_2, \ldots$, plus the recurrent weights $W_{hh}$]

Outline

• Recurrent Neural Network

– Modeling

– Training

• Long Short-Term Memory (LSTM)

– Motivation

– Modeling

• Application

– Generating articles

Long Short-Term Memory (LSTM)

• Motivation

"the clouds are in the sky": the relevant context ("clouds") appears only a few words before the word to predict, a short-range dependency.

Source: Understanding LSTM Networks, Christopher Olah

Long Short-Term Memory (LSTM)

• Motivation

"I grew up in France… I speak fluent French.": the relevant context ("France") appears much earlier in the text, a long-range dependency that plain RNNs struggle to carry.

Source: Understanding LSTM Networks, Christopher Olah

LSTM

• Motivation

– Gradient explosion

– Gradient vanishing

[Figure: the three-step unrolled RNN with weights $W_{hx}$, $W_{hh}$, $W_{yh}$ and targets.]

$\frac{dL}{dh_t} = \sum_{s=t}^{T} \frac{dL}{dy_s} \frac{dy_s}{dh_s} \frac{dh_s}{dh_t}$

The factor $\frac{dh_s}{dh_t}$ is a product of $s - t$ Jacobians; over long sequences this repeated multiplication drives the gradient towards zero (vanishing) or towards very large values (explosion).
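A tiny numeric illustration of this effect: multiplying the same recurrent Jacobian many times shrinks or blows up a gradient vector depending on its scale. The matrices here are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.normal(size=3)                      # some gradient dL/dh_T at the last step

for scale in (0.5, 1.5):                    # contractive vs. expansive recurrent Jacobian
    J = scale * np.eye(3)                   # stand-in for dh_{s+1}/dh_s
    back = g.copy()
    for step in range(20):                  # propagate back through 20 time steps
        back = J.T @ back
    print(scale, np.linalg.norm(back))      # roughly 1e-6 * ||g|| vs. 3e3 * ||g||
```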

Outline

• Recurrent Neural Network

– Modeling

– Training

• Long Short-Term Memory (LSTM)

– Motivation

– Modeling

• Application

– Generating articles

LSTM

• Modeling

RNN: $h_t = \tanh(W_{hx} x_t + W_{hh} h_{t-1})$

LSTM: [Figure: the LSTM cell, with a cell state running along the top and gates controlling it.]

Source: Understanding LSTM Networks, Christopher Olah

LSTM

• Modeling

– Input gate

– Input information

Source: Understanding LSTM Networks, Christopher Olah

LSTM

• Modeling

– Forget gate

– Previous information

– Input gate

– Input information

Source: Understanding LSTM Networks, Christopher Olah

LSTM

• Modeling

– Output gate

– Output information

Source: Understanding LSTM Networks, Christopher Olah

LSTM

• Modeling (the full cell)

Source: Understanding LSTM Networks, Christopher Olah
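The gate equations themselves appear only as figures in the slides; below is a sketch of the standard LSTM cell as described in Olah's post (forget gate, input gate, candidate, output gate), written in NumPy with illustrative weight names and sizes.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W: dict of weight matrices over [h_{t-1}, x_t], b: dict of biases."""
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W['f'] @ z + b['f'])          # forget gate: how much of c_{t-1} to keep
    i = sigmoid(W['i'] @ z + b['i'])          # input gate: how much new info to write
    c_tilde = np.tanh(W['c'] @ z + b['c'])    # candidate (input information)
    c_t = f * c_prev + i * c_tilde            # new cell state
    o = sigmoid(W['o'] @ z + b['o'])          # output gate
    h_t = o * np.tanh(c_t)                    # new hidden state (output information)
    return h_t, c_t

# Toy usage: input size 4, hidden size 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(3, 7)) for k in 'fico'}
b = {k: np.zeros(3) for k in 'fico'}
h, c = lstm_step(np.eye(4)[0], np.zeros(3), np.zeros(3), W, b)
```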

Outline

• Recurrent Neural Network

– Modeling

– Training

• Long Short-Term Memory (LSTM)

– Motivation

– Modeling

• Application

– Generating articles

RNN application

• Generating articles

[Figure: the RNN block x → h → y used as a generator.]
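A sketch of how such a character-level RNN generates text: feed a seed character, sample the next character from the output distribution, and feed it back in. It reuses the rnn_step sketch above plus a softmax; the weights here are random, so the output is gibberish, and the point is only the sampling loop.

```python
import numpy as np

def softmax(y):
    e = np.exp(y - y.max())
    return e / e.sum()

def rnn_step(x_t, h_prev, W_hx, W_hh, W_yh):
    h_t = np.tanh(W_hx @ x_t + W_hh @ h_prev)
    return h_t, W_yh @ h_t

def generate(seed_idx, n_chars, vocab, W_hx, W_hh, W_yh, rng):
    h = np.zeros(W_hh.shape[0])
    idx, out = seed_idx, [vocab[seed_idx]]
    for _ in range(n_chars):
        x = np.eye(len(vocab))[idx]                 # one-hot input for the current character
        h, y = rnn_step(x, h, W_hx, W_hh, W_yh)
        idx = rng.choice(len(vocab), p=softmax(y))  # sample the next character
        out.append(vocab[idx])
    return ''.join(out)

rng = np.random.default_rng(0)
vocab = ['h', 'e', 'l', 'o']
W_hx, W_hh, W_yh = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), rng.normal(size=(4, 3))
print(generate(0, 10, vocab, W_hx, W_hh, W_yh, rng))
```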

RNN application

• Generated article samples

[Figures: several generated-text samples, shown over multiple slides.]

Source: The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy

RNN application

• Generated C code

[Figures: generated C code samples, shown over multiple slides.]

Source: The Unreasonable Effectiveness of Recurrent Neural Networks, Andrej Karpathy
