Introduction to Chainer (LL Ring Recursive)

Introduction to Chainer

Preferred Networks

[email protected]

2015/9/5 LL Ring Recursive @ 1stRing

(@delta2323_)

2012.3 PFI 2014.10 PFN

Chainer

http://delta2323.github.io

NIPS 2014, ICML 2015

1!

2

git clone https://github.com/pfnet/chainer.git

Chainer: http://chainer.org

PFN / PFI

2015/6/9

1.3.0 (2015/9/2)

1.3.1 (9/16), 1.4.0 (9/30)

MIT (Expat)

HP: http://chainer.org

https://github.com/pfnet/chainer

Twitter: @ChainerOfficial

Google Group: Chainer User Group

Contribution Guide: http://docs.chainer.org/en/stable/contribution.html

Powerful: CUDA (GPU)

Flexible

Intuitive: Python

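[Figure: neural network with inputs x1 ... xN, hidden units h1 ... hH and k1 ... kM, and outputs y1 ... yM; Forward and Backward directions indicated]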

50%

AI

+

+

QSAR

()

e

Deep Q Network*

* Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
** Caffe Deep Q-Network http://d.hatena.ne.jp/muupan/20141021/1413850461
*** PFI 2014 http://www.ustream.tv/recorded/53153399

Kingma, Diederik P., et al. "Semi-supervised learning with deep generative models." Advances in Neural Information Processing Systems. 2014.

Eye Scream Project http://soumith.ch/eyescream/

A Neural Algorithm of Artistic Style [Gatys+'15]

https://research.preferred.jp/2015/06/distributed-deep-reinforcement-learning/

http://rll.berkeley.edu/deeplearningrobotics/

PubChem

55

MPI

MPI

TSUBAME 824GPU(K40) MPI

Neural Network

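[Figure: neural network with inputs x1 ... xN, hidden units h1 ... hH and k1 ... kM, outputs y1 ... yM, and targets t1 ... tM; functions f1, f2, f3 with parameters W1/b1 and W2/b2; Forward and Backward directions indicated]

W1, b1: layer 1; W2, b2: layer 2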

Forward:

$h = f_1(x) = \mathrm{Sigmoid}(W_1 x + b_1)$

$k = f_2(h) = \mathrm{Sigmoid}(W_2 h + b_2)$

$y = f_3(k) = \mathrm{SoftMax}(k)$

$f_{3,i}(k) = \exp(k_i) / \sum_{j} \exp(k_j)$
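The forward computation above can be written out in plain NumPy as a rough sketch. The layer sizes (N = 4, H = 3, M = 2), the random initialization, and the helper names are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np


def sigmoid(a):
    # Elementwise logistic function.
    return 1.0 / (1.0 + np.exp(-a))


def softmax(k):
    # f3_i(k) = exp(k_i) / sum_j exp(k_j); subtracting the max avoids overflow.
    e = np.exp(k - k.max())
    return e / e.sum()


def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1.dot(x) + b1)   # h = f1(x)
    k = sigmoid(W2.dot(h) + b2)   # k = f2(h)
    return softmax(k)             # y = f3(k)


# Hypothetical sizes: N inputs, H hidden units, M outputs.
N, H, M = 4, 3, 2
rng = np.random.RandomState(0)
W1, b1 = rng.randn(H, N), np.zeros(H)
W2, b2 = rng.randn(M, H), np.zeros(M)
y = forward(rng.randn(N), W1, b1, W2, b2)
```

The output y is a vector of M positive values that sum to 1, so it can be read as class probabilities.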

Deep Learning

Caffe     Chainer
Blob      Variable
Layer     Function
Net       (FunctionSet)
Solver    Optimizer

(DAG)

Forward Propagation

Forward(Loss)

Loss

Forward

Chain Rule

Forward Propagation: y = f(x; θ)

θ: Layer parameters

[Figure: a Layer with input x, output y, and loss L]

Backward Propagation

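As a small illustration of how the chain rule drives backward propagation, the sketch below differentiates a single sigmoid layer with a squared-error loss by hand and compares the result with a finite-difference estimate. The scalar layer, the loss, and all names are assumptions chosen to keep the example short; they are not taken from the slides.

```python
import math


def forward(x, w, b, t):
    a = w * x + b
    y = 1.0 / (1.0 + math.exp(-a))   # y = f(x; w, b)
    loss = (y - t) ** 2
    return y, loss


def backward(x, w, b, t):
    # Chain rule: dL/dw = dL/dy * dy/da * da/dw, and likewise for b and x.
    y, _ = forward(x, w, b, t)
    dL_dy = 2.0 * (y - t)
    dy_da = y * (1.0 - y)
    dL_da = dL_dy * dy_da
    return {"w": dL_da * x, "b": dL_da, "x": dL_da * w}


x, w, b, t = 0.5, -1.2, 0.3, 1.0
grads = backward(x, w, b, t)

# Finite-difference check of dL/dw.
eps = 1e-6
numeric = (forward(x, w + eps, b, t)[1] - forward(x, w - eps, b, t)[1]) / (2 * eps)
```

The gradient with respect to x is what would be passed on to the preceding layer in a deeper network.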

SGD / Momentum / AdaGrad / ADADELTA / RMSprop / Adam etc

http://imgur.com/a/Hqolp
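To make the difference between two of the listed update rules concrete, here is a minimal sketch of plain SGD and Momentum SGD on a one-dimensional toy loss. The loss, learning rate, momentum coefficient, and step counts are assumptions for illustration and do not reflect the defaults of any particular library.

```python
def grad(w):
    # Gradient of the toy loss L(w) = (w - 3)^2.
    return 2.0 * (w - 3.0)


def sgd(w, lr=0.1, steps=100):
    # w <- w - lr * dL/dw
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def momentum_sgd(w, lr=0.1, momentum=0.9, steps=300):
    # v <- momentum * v - lr * dL/dw;  w <- w + v
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(w)
        w += v
    return w


w_sgd = sgd(0.0)
w_mom = momentum_sgd(0.0)
```

Both runs approach the minimizer w = 3; the momentum variant overshoots and oscillates before it settles, which is why it is given more steps here.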

OS: Linux (Ubuntu 14.04)

Mac OS, Windows

Python (CPython) 2.7+/3.4+

NumPy 1.9+, Six 1.9+

CUDA: CUDA 6.5+

pip install chainer

GitHub Stars (2015/5)

Theano

PyLearn2

https://twitter.com/fchollet/status/635891305084796929

GitHub Stars (2015/8)

https://twitter.com/fchollet/status/635891305084796929

PyLearn2

Theano

2

Chainer

Python

CuPy: NumPy for GPU

NumPy

CPU GPU

BLAS / CUDA Toolkit / cuDNN

NumPy CuPy

Chainer

Python

NumPyPythonPythonNumPy

GoogLeNet, NTM, Recursive Net, LSTM

Chainer: 167, Caffe: 2058

GoogLeNet

(2012) AlexNet*, 7 layers

(2014) GoogLeNet**, 22 layers

* ImageNet Classification with Deep Convolutional Neural Networks http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf
** Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).

Chainer: Define-by-Run

Define-and-Run

prototxt, yaml, Lua etc.

Caffe/Torch/Theano

1

f g

x f g

Define-by-Run

for

x yf

x = chainer.Variable(...)
y = f(x)
z = g(x)

zg

=

Chainer

Forward

x = chainer.Variable(np.array(1))

y = chainer.Variable(np.array(1))

z = x**2 + 2*x*y + y

z.backward()

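[Figure: computational graph for the code above. x and y pass through Split nodes into _ ** 2, 2 * _, _ * _, and two _ + _ nodes, producing z. Legend: chainer.Variable, chainer.Function]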
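The idea of recording the graph while the forward code runs can be imitated in a few lines of plain Python. The sketch below is a toy scalar version written for this note: the Var class and its methods are hypothetical and much simpler than chainer.Variable and chainer.Function, but it evaluates the same expression, z = x**2 + 2*x*y + y.

```python
class Var(object):
    """Scalar value that records how it was computed (hypothetical class)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each parent is a pair (input Var, local derivative d(self)/d(input)).
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__

    def __pow__(self, n):
        return Var(self.value ** n, ((self, n * self.value ** (n - 1)),))

    def backward(self, upstream=1.0):
        # Sum the contribution of every path from the output to this node.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


x = Var(1.0)
y = Var(1.0)
z = x ** 2 + 2 * x * y + y   # the graph is recorded while this line runs
z.backward()
```

After the call, x.grad holds dz/dx = 2x + 2y and y.grad holds dz/dy = 2x + 1, evaluated at x = y = 1. No separate graph definition step was needed: the operators built the graph as a side effect of computing z.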

MNIST

# Imports assumed for Chainer v1.x (not shown on the slide)
from chainer import FunctionSet, Variable, optimizers
from chainer.cuda import to_gpu
import chainer.functions as F

# (1) Model definition
model = FunctionSet(
    l1=F.Linear(784, 100),
    l2=F.Linear(100, 100),
    l3=F.Linear(100, 10)).to_gpu()
opt = optimizers.SGD()
opt.setup(model)

# (2) Forward computation
def forward(x, t):
    h1 = F.relu(model.l1(x))
    h2 = F.relu(model.l2(h1))
    y = model.l3(h2)
    return F.softmax_cross_entropy(y, t)

# (3) Training loop (xrange is Python 2; use range on Python 3)
for epoch in xrange(n_epoch):
    for i in xrange(0, N, batchsize):
        x = Variable(to_gpu(...))
        t = Variable(to_gpu(...))
        opt.zero_grads()
        loss = forward(x, t)
        loss.backward()
        opt.update()

[Figure: network with layer sizes 784 -> 100 -> 100 -> 10; example output probabilities 0: 2%, 1: 5%, 2: 90%, ..., 9: 1%]

FunctionSet

Function, FunctionSet

Optimizer

loss = forward(x, t)
loss.backward()
opt.update()

Python control flow (if / for / while etc.)

for loops: RNN

def forward(x, t, train=True):
    h = F.relu(model.l1(x))
    y = model.l2(h)
    if train:
        loss = F.softmax_cross_entropy(y, t)
        return loss
    else:
        prob = F.softmax(y)
        acc = F.accuracy(y, t)
        return acc

[Figure: when train is True, y -> sce (softmax_cross_entropy) -> loss; otherwise y -> sm (softmax) -> prob and y -> acc]

truncated BPTT

[Figure: graph x -> f -> y -> g -> z; after y.unchain_backward(), only y -> g -> z remains]

x = Variable()

y = f(x)

z = g(y)

y.unchain_backward()

BPTT: Back Propagation Through Time, used to train RNNs. Truncated BPTT: BPTT in which gradients are propagated back only a limited number of time steps.
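The effect of cutting the graph can be shown with a toy model in which every value only remembers the value that produced it. This is a conceptual sketch, not Chainer's implementation: the Node class, its creator attribute, and backward_path are hypothetical names introduced for illustration.

```python
class Node(object):
    """Value in a recorded graph; `creator` points at the previous node."""

    def __init__(self, name, creator=None):
        self.name = name
        self.creator = creator

    def unchain_backward(self):
        # Forget everything that produced this node.
        self.creator = None


def backward_path(node):
    # Names visited when walking from `node` back toward the inputs.
    path = []
    while node is not None:
        path.append(node.name)
        node = node.creator
    return path


x = Node("x")
y = Node("y", creator=x)   # y = f(x)
z = Node("z", creator=y)   # z = g(y)

full = backward_path(z)        # walk before cutting
y.unchain_backward()
truncated = backward_path(z)   # walk after cutting at y
```

Before the cut, a backward walk from z reaches x; afterwards it stops at y. In an RNN, cutting at regular intervals in this way bounds how far back in time gradients flow and lets the older part of the graph be freed.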

Caffe Reference Model

BVLC Reference Models from the Caffe Model Zoo can be used as a Chainer function

func = CaffeFunction('path/to/bvlc_reference_caffenet.caffemodel')

x = Variable()

y, = func(inputs={'data': x}, outputs=['fc8'])

Caffe (C++); Model Zoo: Caffe Wiki

CuPy: NumPy for GPU

cupy.ndarray

numpy.ndarray

etc

Elementwise, Reduction

CPU / GPU

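# Import assumed for Chainer v1.3+ (not shown on the slide)
from chainer.cuda import get_array_module

def softmax(x):
    xp = get_array_module(x)  # xp = numpy/cupy
    y = x - x.max(axis=1, keepdims=True)
    y = xp.exp(y)
    return y / y.sum(axis=1, keepdims=True)

# OK with either numpy or cupy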
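For readers without a GPU, the same pattern can be tried with NumPy alone. In the sketch below, array_module is a hypothetical stand-in for get_array_module, and CuPy is treated as an optional dependency; the function body is the softmax from the slide with the subtraction restored.

```python
import numpy

try:
    import cupy  # optional; only present on machines with CUDA
except ImportError:
    cupy = None


def array_module(x):
    # Hypothetical stand-in for get_array_module: pick cupy for GPU arrays.
    if cupy is not None and isinstance(x, cupy.ndarray):
        return cupy
    return numpy


def softmax(x):
    xp = array_module(x)
    y = x - x.max(axis=1, keepdims=True)   # shift each row for stability
    y = xp.exp(y)
    return y / y.sum(axis=1, keepdims=True)


probs = softmax(numpy.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
```

Because the array methods and the xp functions share the same interface, the body of softmax contains no device-specific branches.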

Chainer: Define-by-Run, Python

Python

CuPy: CPU / GPU

Your contributions are welcome!!

Mocha (Julia)

Chiyuan Zhang (MIT)

v0.0.9 (2015/7/21)

MIT Expat License

train LeNet with MNIST

https://github.com/pluskid/Mocha.jl#hello-world

Caffe

Caffe

Caffe

Pure Julia / C++ / GPU
