Introduction to Chainer, Preferred Networks, Inc., Kenta Oono, [email protected], 2015/9/5, LL Ring Recursive @ Shin-Kiba 1stRING


  • Introduction to Chainer

    Preferred Networks

    [email protected]

    2015/9/5 LL Ring Recursive @ 1stRING

  • About me (@delta2323_)

    2012.3 PFI → 2014.10 PFN

    Chainer

    http://delta2323.github.io

    NIPS2014, ICML2015

    1!


  • git clone https://github.com/pfnet/chainer.git

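  • Chainer (http://chainer.org)

    Developed by PFN and PFI

    First released: 2015/6/9

    Latest version: 1.3.0 (2015/9/2)

    Upcoming releases: 1.3.1 (9/16), 1.4.0 (9/30)

    License: MIT (Expat)

    Homepage: http://chainer.org

    Repository: https://github.com/pfnet/chainer

    Twitter: @ChainerOfficial

    Google Group: Chainer User Group

    Contribution Guide: http://docs.chainer.org/en/stable/contribution.html

    Powerful: supports CUDA computation and multiple GPUs

    Flexible: supports almost arbitrary network architectures

    Intuitive: forward computation is written as plain Python code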

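  • [Figure: multilayer neural network with inputs x1, ..., xN, hidden layers h1, ..., hH and k1, ..., kM, and outputs y1, ..., yM; arrows mark the Forward and Backward passes]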

  • AI

    QSAR

  • Deep Q Network*

    * Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.

    ** Implementing Deep Q-Network with Caffe: http://d.hatena.ne.jp/muupan/20141021/1413850461

    *** PFI 2014: http://www.ustream.tv/recorded/53153399

  • Kingma, Diederik P., et al. "Semi-supervised learning with deep generative models." Advances in Neural Information Processing Systems. 2014.

    Eye Scream Project: http://soumith.ch/eyescream/

    A Neural Algorithm of Artistic Style [Gatys+'15]

  • https://research.preferred.jp/2015/06/distributed-deep-reinforcement-learning/

    http://rll.berkeley.edu/deeplearningrobotics/

  • PubChem

    55

    MPI

    MPI

    TSUBAME 824 GPUs (K40), MPI

  • Neural Network

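    [Figure: inputs x1, ..., xN pass through f1 (parameters W1/b1) to hidden units h1, ..., hH, through f2 (parameters W2/b2) to k1, ..., kM, and through f3 to outputs y1, ..., yM, which are compared with targets t1, ..., tM; arrows mark the Forward and Backward passes]

    W1: weight of layer 1, b1: bias of layer 1, W2: weight of layer 2, b2: bias of layer 2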

    Forward:

    $$h = f_1(x) = \mathrm{Sigmoid}(W_1 x + b_1)$$

    $$k = f_2(h) = \mathrm{Sigmoid}(W_2 h + b_2)$$

    $$y = f_3(k) = \mathrm{SoftMax}(k)$$

    $$[f_3(k)]_i = \frac{\exp(k_i)}{\sum_j \exp(k_j)}$$
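    The forward pass above can be traced by hand with a minimal pure-Python sketch. The sizes (N=3, H=2, M=2) and all parameter values are hypothetical, chosen only for illustration; Chainer itself is not used here.

```python
import math


def sigmoid_layer(W, b, x):
    # Computes Sigmoid(W x + b); W is a list of rows, b and x are lists
    return [1.0 / (1.0 + math.exp(-(sum(w * xj for w, xj in zip(row, x)) + bi)))
            for row, bi in zip(W, b)]


def softmax(k):
    # [f3(k)]_i = exp(k_i) / sum_j exp(k_j); subtracting max(k) first
    # gives the same result but avoids overflow in exp
    m = max(k)
    e = [math.exp(ki - m) for ki in k]
    s = sum(e)
    return [ei / s for ei in e]


def forward(x, W1, b1, W2, b2):
    h = sigmoid_layer(W1, b1, x)  # h = f1(x)
    k = sigmoid_layer(W2, b2, h)  # k = f2(h)
    return softmax(k)             # y = f3(k)


# Hypothetical sizes N=3, H=2, M=2 with made-up parameter values
W1 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [0.5, 0.5]]
b2 = [0.0, 0.0]
y = forward([1.0, 2.0, 3.0], W1, b1, W2, b2)
print(y)
```

    Because y comes out of SoftMax, its entries are positive and sum to 1, so they can be read as class probabilities.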

  • Deep learning framework concepts: Caffe vs. Chainer

    Caffe | Chainer
    Blob | Variable
    Layer | Function
    Net | (FunctionSet)
    Solver | Optimizer

    The network forms a computational graph (DAG).

  • Forward Propagation

    Propagate the input forward through the network and compute the Loss at the end.

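  • Chain Rule

    Forward Propagation: y = f(x; θ)

    θ: parameters of the Layer

    L: the Loss

    Given ∂L/∂y, the chain rule gives the gradients with respect to θ and x:

    $$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial \theta}, \qquad \frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial x}$$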
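    As a sanity check of the chain rule, here is a small pure-Python sketch for a hypothetical one-parameter layer y = tanh(θx) with squared-error loss L = (y - t)²/2. Both gradients are computed analytically through ∂L/∂y and then compared against central finite differences; the layer, loss, and numbers are illustrative assumptions, not taken from the slides.

```python
import math


def f(x, theta):
    # Hypothetical one-parameter layer: y = tanh(theta * x)
    return math.tanh(theta * x)


def loss(y, t):
    # Squared error L = (y - t)^2 / 2
    return 0.5 * (y - t) ** 2


x, theta, t = 0.7, 1.3, 0.2

# Forward
y = f(x, theta)
L = loss(y, t)

# Backward via the chain rule: start from dL/dy and multiply local derivatives
dL_dy = y - t
dy_du = 1.0 - y ** 2            # tanh'(u) = 1 - tanh(u)^2, with u = theta * x
dL_dtheta = dL_dy * dy_du * x   # du/dtheta = x
dL_dx = dL_dy * dy_du * theta   # du/dx = theta

# Central finite differences for comparison
eps = 1e-6
num_dtheta = (loss(f(x, theta + eps), t) - loss(f(x, theta - eps), t)) / (2 * eps)
num_dx = (loss(f(x + eps, theta), t) - loss(f(x - eps, theta), t)) / (2 * eps)
print(dL_dtheta, num_dtheta)
print(dL_dx, num_dx)
```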

  • Backward Propagation

    Starting from the Loss, propagate gradients backward through the network, applying the chain rule at each Layer.

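  • Parameter Update

    Update the parameters using the gradients computed in Backward.

    Optimization methods: SGD / Momentum / AdaGrad / ADADELTA / RMSprop / Adam etc.

    Visualization: http://imgur.com/a/Hqolp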
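    The two simplest update rules from that list, plain SGD and Momentum, can be sketched in pure Python on a hypothetical one-dimensional loss L(θ) = (θ - 3)². These are the textbook update formulas, written here for illustration; they are not Chainer's optimizer implementations, and the learning rate, momentum, and step counts are assumed values.

```python
def grad(theta):
    # Gradient of the hypothetical loss L(theta) = (theta - 3)^2
    return 2.0 * (theta - 3.0)


def sgd(theta, lr=0.1, steps=100):
    # Plain SGD: theta <- theta - lr * dL/dtheta
    for _ in range(steps):
        theta -= lr * grad(theta)
    return theta


def momentum_sgd(theta, lr=0.1, momentum=0.9, steps=300):
    # Momentum SGD: v <- momentum * v - lr * dL/dtheta; theta <- theta + v
    v = 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(theta)
        theta += v
    return theta


print(sgd(0.0), momentum_sgd(0.0))
```

    Both start from θ = 0 and approach the minimum at θ = 3; Momentum overshoots and oscillates on the way, which is one of the behaviours the linked visualization compares.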

  • OS: Linux (Ubuntu 14.04)

    Mac OS, Windows

    Python (CPython) 2.7+/3.4+

    Dependencies: NumPy 1.9+, Six 1.9+

    For CUDA: CUDA 6.5+

    Install: pip install chainer

  • GitHub Stars (2015/5)

    Theano

    PyLearn2

    https://twitter.com/fchollet/status/635891305084796929

  • GitHub Stars (2015/8)

    https://twitter.com/fchollet/status/635891305084796929

    PyLearn2

    Theano

  • Architecture

    Chainer (written in Python)

    CuPy: a NumPy-compatible GPU array library

    CPU: NumPy (BLAS); GPU: CuPy (CUDA Toolkit, cuDNN)

  • GoogLeNet, NTM, Recursive Net, LSTM

    Defining GoogLeNet: Chainer 167 lines vs. Caffe 2058 lines

    AlexNet* (2012): 7 layers

    GoogLeNet** (2014): 22 layers

    * ImageNet Classification with Deep Convolutional Neural Networks http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf

    ** Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).

    Chainer adopts Define-by-Run.

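  • Define-and-Run

    The network structure is defined first (in prototxt, yaml, Lua, etc.), and data is fed through it afterwards.

    Used by Caffe/Torch/Theano-based frameworks

    [Figure: the graph x → f → g is fixed before any data flows through it]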
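    A toy pure-Python sketch of the Define-and-Run pattern: the network is first declared as static data (a hypothetical list of op names standing in for a prototxt/yaml file), compiled into a callable, and only then run on input. The op table and names are illustrative assumptions, not any framework's API.

```python
import math

# Step 1 (define): the network is declared up front as data, the way a
# config file would list layers. OPS and the names "f", "g" are hypothetical.
OPS = {"f": math.tanh, "g": lambda v: 2.0 * v}
graph_definition = ["f", "g"]


def compile_graph(definition):
    # Turn the static definition into a callable; the structure is now fixed
    funcs = [OPS[name] for name in definition]

    def run(x):
        for fn in funcs:
            x = fn(x)
        return x

    return run


# Step 2 (run): data is fed through the already-built graph
net = compile_graph(graph_definition)
print(net(0.5))  # computes g(f(0.5)) = 2 * tanh(0.5)
```

    The graph cannot change between inputs here: any branching would have to be expressed inside the definition language itself.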

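  • Define-by-Run

    The computational graph is built while the forward computation runs, so it can follow ordinary Python control flow such as for loops.

    Computing y from x with f records x → f → y; computing z from y with g then extends the graph with g → z:

    x = chainer.Variable(...)
    y = f(x)
    z = g(y)

    Running the forward computation = building the graph

    Chainer takes this approach.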
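    To make the contrast concrete, here is a minimal pure-Python sketch of Define-by-Run: each operation records its creator as it executes, so the recorded graph depends on the Python loop and if-statement taken for that particular input. The Var class and the ops f and g are hypothetical stand-ins, not Chainer's Variable or Function classes.

```python
class Var:
    # Hypothetical minimal variable that remembers which op created it
    def __init__(self, value, creator=None, parent=None):
        self.value = value
        self.creator = creator  # name of the op that produced this Var
        self.parent = parent    # input Var of that op (None for leaves)

    def history(self):
        # Walk back through the recorded graph; return ops input-to-output
        ops, node = [], self
        while node.creator is not None:
            ops.append(node.creator)
            node = node.parent
        return list(reversed(ops))


def f(v):
    return Var(v.value * 2.0, creator="f", parent=v)


def g(v):
    return Var(v.value + 1.0, creator="g", parent=v)


def forward(x, n_steps):
    # The graph's shape depends on an ordinary Python loop and if-statement
    h = x
    for _ in range(n_steps):
        h = f(h)
    if h.value > 10.0:
        h = g(h)
    return h


x = Var(3.0)
print(forward(x, 1).history())  # ['f']            (3 * 2 = 6, not > 10)
print(forward(x, 2).history())  # ['f', 'f', 'g']  (3 * 4 = 12 > 10)
```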

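  • Forward

    The computational graph is recorded during the forward computation:

    import numpy as np
    import chainer

    x = chainer.Variable(np.array([1], dtype=np.float32))
    y = chainer.Variable(np.array([1], dtype=np.float32))
    z = x**2 + 2*x*y + y  # the graph is recorded as this line runs
    z.backward()          # x.grad = 2x + 2y = 4, y.grad = 2x + 1 = 3

    [Figure: computational graph of z. The chainer.Variable nodes x and y pass through Split and the chainer.Function nodes _ ** 2, 2 * _, _ * _ and _ + _ to produce z]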
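    The same example can be reproduced with a toy reverse-mode autodiff in pure Python. This is a sketch of the general technique under simplifying assumptions (scalars only, +, * and ** by a constant), not Chainer's implementation. Each result stores (parent, local gradient) pairs during the forward pass; backward() visits nodes in reverse topological order and accumulates with +=, which handles x and y being used more than once (the Split node in the figure appears to play that role in Chainer's graph).

```python
class Var:
    # Hypothetical scalar variable recording (parent, local_gradient) pairs
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))
    __rmul__ = __mul__

    def __pow__(self, n):
        # n is a plain number: d(v**n)/dv = n * v**(n-1)
        return Var(self.value ** n, ((self, n * self.value ** (n - 1)),))

    def backward(self):
        # Reverse DFS post-order: every node comes after all nodes that use it
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local


x = Var(1.0)
y = Var(1.0)
z = x**2 + 2*x*y + y
z.backward()
print(z.value, x.grad, y.grad)  # 4.0 4.0 3.0
```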


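  • MNIST example

    import chainer.functions as F
    from chainer import FunctionSet, Variable, optimizers
    from chainer.cuda import to_gpu

    # (1) Model definition
    model = FunctionSet(
        l1=F.Linear(784, 100),
        l2=F.Linear(100, 100),
        l3=F.Linear(100, 10)).to_gpu()
    opt = optimizers.SGD()
    opt.setup(model)

    # (2) Forward computation
    def forward(x, t):
        h1 = F.relu(model.l1(x))
        h2 = F.relu(model.l2(h1))
        y = model.l3(h2)
        return F.softmax_cross_entropy(y, t)

    # (3) Training loop
    # n_epoch, N, batchsize and the data passed to to_gpu(...) are elided on the slide
    for epoch in range(n_epoch):
        for i in range(0, N, batchsize):
            x = Variable(to_gpu(...))
            t = Variable(to_gpu(...))
            opt.zero_grads()
            loss = forward(x, t)
            loss.backward()
            opt.update()

    [Figure: a 784-100-100-10 network whose output gives class probabilities such as 0: 2%, 1: 5%, 2: 90%, ..., 9: 1%]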

  • FunctionSet

    In the MNIST example, the Functions that hold parameters (the three F.Linear layers) are grouped into a FunctionSet, which is moved to the GPU with to_gpu().

  • Optimizer

    In the MNIST example, an Optimizer (optimizers.SGD) is attached to the model with opt.setup(model). In the training loop, opt.zero_grads() clears the gradients, loss.backward() computes new ones, and opt.update() updates the parameters.


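  • Python control flow (if / for / while, etc.) can be used as-is

    For example, an RNN can be unrolled with a for loop.

    def forward(x, t, train=True):
        h = F.relu(model.l1(x))
        y = model.l2(h)
        if train:
            loss = F.softmax_cross_entropy(y, t)
            return loss
        else:
            prob = F.softmax(y)
            acc = F.accuracy(y, t)
            return acc

    [Figure: with train=True the graph is y → softmax_cross_entropy → loss; with train=False it is y → softmax → prob, together with acc]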

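  • Cutting the graph at y

    Calling y.unchain_backward() cuts the graph before y, so backpropagation stops at y. This is used for truncated BPTT.

    Before: x → f → y → g → z

    After: y → g → z

    x = Variable(...)
    y = f(x)
    z = g(y)
    y.unchain_backward()

    BPTT (Back Propagation Through Time): backpropagation for RNNs. Truncated BPTT: BPTT that stops propagating after a fixed number of time steps.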
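    The effect of truncation can be seen on a hypothetical scalar RNN h_t = w·h_(t-1) + x_t with loss L = h_T. This pure-Python sketch computes ∂L/∂w by backpropagating through at most `window` steps; a finite window is analogous to cutting the graph with unchain_backward, so earlier steps contribute no gradient. The RNN, inputs, and window size are illustrative assumptions.

```python
def run_rnn(w, xs):
    # Hypothetical scalar RNN: h_t = w * h_{t-1} + x_t, starting from h_0 = 0
    hs = [0.0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    return hs  # hs[t] is h_t


def grad_w(w, xs, window=None):
    # dL/dw for L = h_T, backpropagating through at most `window` steps.
    # window=None means full BPTT.
    hs = run_rnn(w, xs)
    T = len(xs)
    steps = T if window is None else min(window, T)
    g = 1.0   # dL/dh_T
    dw = 0.0
    for t in range(T, T - steps, -1):
        dw += g * hs[t - 1]  # direct dependence of h_t on w
        g *= w               # dL/dh_{t-1} = dL/dh_t * w
    return dw


w, xs = 0.5, [1.0, 2.0, 3.0, 4.0]
eps = 1e-6
numeric = (run_rnn(w + eps, xs)[-1] - run_rnn(w - eps, xs)[-1]) / (2 * eps)
print(grad_w(w, xs), numeric)   # full BPTT agrees with finite differences
print(grad_w(w, xs, window=2))  # truncated: only the last 2 steps count
```

    Here h_4 = w³ + 2w² + 3w + 4, so the full gradient is 3w² + 4w + 3 = 5.75 at w = 0.5, while the 2-step truncation gives 5.5: a biased but cheaper estimate.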

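  • Caffe Reference Model

    Models from the Caffe Model Zoo, such as the BVLC Reference Model, can be loaded and used as a Chainer function.

    from chainer import Variable
    from chainer.functions.caffe import CaffeFunction

    func = CaffeFunction('path/to/bvlc_reference_caffenet.caffemodel')
    x = Variable(...)
    y, = func(inputs={'data': x}, outputs=['fc8'])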

    Caffe itself (C++) does not need to be installed. See the Caffe Wiki for the Model Zoo.

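  • CuPy: a NumPy-compatible GPU array library

    cupy.ndarray corresponds to numpy.ndarray, etc.

    User-defined Elementwise and Reduction kernels

    The same code runs on both CPU and GPU:

    from chainer import cuda

    def softmax(x):
        xp = cuda.get_array_module(x)  # xp = numpy or cupy, depending on x
        y = x - x.max(axis=1, keepdims=True)
        y = xp.exp(y)
        return y / y.sum(axis=1, keepdims=True)

    Passing either a numpy.ndarray or a cupy.ndarray as x is OK.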
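    The x - x.max(...) line in that softmax matters numerically: exp of a large score overflows, while shifting by the maximum leaves the result mathematically unchanged. This pure-Python sketch (using math instead of NumPy/CuPy, with hypothetical scores) shows the difference; with Python floats math.exp raises OverflowError, whereas NumPy would instead produce inf and then nan.

```python
import math


def softmax_naive(k):
    # Direct formula exp(k_i) / sum_j exp(k_j); overflows for large scores
    e = [math.exp(v) for v in k]
    s = sum(e)
    return [v / s for v in e]


def softmax_stable(k):
    # Same result mathematically: exp(k_i - m) / sum_j exp(k_j - m), m = max(k)
    m = max(k)
    e = [math.exp(v - m) for v in k]
    s = sum(e)
    return [v / s for v in e]


scores = [1000.0, 1001.0, 1002.0]
try:
    softmax_naive(scores)
except OverflowError:
    print("naive softmax overflowed")
print(softmax_stable(scores))
```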

  • Chainer: a Define-by-Run deep learning framework written in Python

    Networks are written in plain Python

    CuPy lets the same code run on CPU and GPU

    Homepage: http://chainer.org

    Repository: https://github.com/pfnet/chainer

    Twitter: @ChainerOfficial

    Google Group: Chainer User Group

    Contribution Guide: http://docs.chainer.org/en/stable/contribution.html

    git clone https://github.com/pfnet/chainer.git

    Your Contributions Are Welcome!!

  • Mocha (Julia)

    Chiyuan Zhang (MIT)

    v0.0.9 (2015/7/21)

    MIT Expat License

    train LeNet with MNIST: https://github.com/pluskid/Mocha.jl#hello-world

    Caffe

    Caffe

    Caffe

    Backends: Pure Julia / C++ / GPU