Introduction to New Features in Hivemall v0.3
Makoto Yui (油井 誠, @myui), Research Engineer, Treasure Data Inc.
2015/04/30, Machine Learning Casual Talk #3
http://myui.github.io/
Copyright ©201 Treasure Data. All Rights Reserved.

Hivemall LT @ Machine Learning Casual Talks #3

  • 2015/04 1ML as a Service (MLaaS)(?)

    2015/03

    2009/03 NAIST XML

    H141

  • [Figure: chart with a monthly x-axis from Aug-12 to Oct-14 and a y-axis from 0 to 12,000; annotations: Series A Funding, Gartner Cool Vendor in Big Data]

  • 100+

    15

    4,000

    500,0001

  • Hivemall and Apache Hadoop

    Hadoop HDFS

    MapReduce (MRv1)

    Hive/PIG

    Hivemall

    Apache YARN

    Apache Tez DAG, MR v2

    github.com/myui/hivemall

  • SQL

    Hivemall

    Mahout

    CREATE TABLE lr_model AS
    SELECT
      feature, -- reducers perform model averaging in parallel
      avg(weight) as weight
    FROM (
      SELECT logress(features,label,..) as (feature,weight)
      FROM train
    ) t -- map-only task
    GROUP BY feature; -- shuffled to reducers

    API: HiveQL (API stable; Spark unstable)

    Hadoop
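The query above trains in map-only tasks and then averages each feature's weight across the emitted models in the reducers. A minimal Python sketch of that averaging step follows, assuming each map task yields a feature-to-weight dictionary. The function name `average_models` and the sample weights are illustrative assumptions, not part of Hivemall.

```python
from collections import defaultdict


def average_models(partial_models):
    """Average per-feature weights across independently trained models.

    partial_models: iterable of dicts mapping feature -> weight, one per
    (hypothetical) map task. This mirrors GROUP BY feature + avg(weight):
    a feature is averaged only over the models that actually emitted it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for model in partial_models:
        for feature, weight in model.items():
            sums[feature] += weight
            counts[feature] += 1
    return {feature: sums[feature] / counts[feature] for feature in sums}


# Two hypothetical map-task outputs (made-up weights).
mapper_outputs = [
    {"age": 0.5, "height": -1.0},
    {"age": 1.5, "bias": 0.25},
]
averaged = average_models(mapper_outputs)
```

In this toy input, `age` appears in both models and is averaged over two values, while `height` and `bias` each appear once and pass through unchanged.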

  • Hivemall v0.3

    Classification: Perceptron, Passive Aggressive (PA), Confidence Weighted (CW), Adaptive Regularization of Weight Vectors (AROW), Soft Confidence Weighted (SCW), AdaGrad+RDA

    Regression: PA Regression, AROW Regression, AdaGrad, AdaDELTA

    k-NN and recommendation: Minhash, b-Bit Minhash (LSH variant), Matrix Factorization

    Feature engineering: Feature hashing, Feature scaling (normalization, z-score), TF-IDF vectorizer

    v0.35

  • Matrix Factorization

    Rank k; factor matrices P, Q

  • Matrix Factorization

    Biased MF, SGD, AdaGrad
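As a rough illustration of biased matrix factorization trained with SGD, the sketch below fits r_ui ≈ mu + b_u + b_i + P[u]·Q[i] on a toy rating set. Everything here is an assumption made for illustration: the function names, the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the ratings are made up, and plain SGD is used rather than AdaGrad. It is not Hivemall's implementation.

```python
import math
import random


def train_biased_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
                    epochs=200, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + P[u] . Q[i] with plain SGD."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    bu = [0.0] * n_users  # user biases
    bi = [0.0] * n_items  # item biases
    for _ in range(epochs):
        for u, i, r in ratings:
            dot = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - (mu + bu[u] + bi[i] + dot)
            # Gradient steps with L2 regularization on biases and factors.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return {"mu": mu, "bu": bu, "bi": bi, "P": P, "Q": Q}


def predict(model, u, i):
    dot = sum(p * q for p, q in zip(model["P"][u], model["Q"][i]))
    return model["mu"] + model["bu"][u] + model["bi"][i] + dot


def rmse(model, ratings):
    se = sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


# Toy (user, item, rating) triples; values are made up.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
untrained = train_biased_mf(ratings, 3, 3, epochs=0)
trained = train_biased_mf(ratings, 3, 3, epochs=200)
rmse_before = rmse(untrained, ratings)
rmse_after = rmse(trained, ratings)
```

Comparing `rmse_before` with `rmse_after` shows whether the bias terms and the rank-k factors have reduced the training error on the observed ratings.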

  • MIX Server

    create table kdd10a_pa1_model1 as
    select
      feature,
      cast(voted_avg(weight) as float) as weight
    from (
      select train_pa1(addBias(features),label,"-mix host01,host02,host03") as (feature,weight)
      from kdd10a_train_x3
    ) t
    group by feature;

  • MIX Server

    [Diagram: classifiers exchanging model updates with Mix servers]

    Model updates: async add

    AVG/Argmin KLD accumulator

    hash(feature) % N

    Non-blocking channel (single shared TCP connection w/ TCP keepalive)

    Computation/training is not being blocked
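The routing rule `hash(feature) % N` and the averaging accumulator can be sketched in Python as below. This is a simplified, synchronous, in-process illustration under stated assumptions: the names `shard_for`, `AverageAccumulator`, and `mix` are hypothetical, `zlib.crc32` stands in for whatever hash the MIX protocol actually uses, and the asynchronous TCP channel and the Argmin KLD accumulator are not modeled.

```python
import zlib


def shard_for(feature, n_servers):
    """Pick the server responsible for a feature: hash(feature) % N.

    crc32 is used because it is stable across processes, so every worker
    routes the same feature to the same server (Python's built-in hash()
    is randomized per process for strings).
    """
    return zlib.crc32(feature.encode("utf-8")) % n_servers


class AverageAccumulator:
    """Running per-feature average held by one (hypothetical) mix server."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def add(self, feature, weight):
        """Fold one model update in and return the current average."""
        self.sums[feature] = self.sums.get(feature, 0.0) + weight
        self.counts[feature] = self.counts.get(feature, 0) + 1
        return self.sums[feature] / self.counts[feature]


# Three in-process stand-ins for mix servers.
servers = [AverageAccumulator() for _ in range(3)]


def mix(feature, weight):
    """Route an update to its shard and return the mixed weight."""
    server = servers[shard_for(feature, len(servers))]
    return server.add(feature, weight)


# Two workers report weights for the same feature (made-up values).
first = mix("word:hadoop", 0.25)
second = mix("word:hadoop", 0.75)
```

Because both updates hash to the same shard, the second call returns the average of the two reported weights. In the design on the slide the exchange is asynchronous, so training is not blocked while updates are in flight.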

  • Feature requirements in Treasure Data

  • Treasure Data / Kaggle Master / Data Scientists

    [email protected] @myui