Copyright 201 Treasure Data. All Rights Reserved.
Treasure Data Inc., Research Engineer, @myui
2015/04/30 Machine Learning Casual Talk #3
Hivemall v0.3
http://myui.github.io/
2015/04 1ML as a Service (MLaaS)(?)
2015/03
2009/03 NAIST XML
H141
[Figure: monthly time-series chart, Aug-12 through Oct-14, vertical axis 0 to 12,000; annotated milestones include Series A Funding and Gartner Cool Vendor in Big Data]
100+
15
4,000
500,0001
Hivemall and Apache Hadoop
Hadoop HDFS
MapReduce (MRv1)
Hive/Pig
Hivemall
Apache YARN
Apache Tez (DAG), MR v2
github.com/myui/hivemall
SQL
Hivemall
Mahout
CREATE TABLE lr_model AS
SELECT
  feature, -- reducers perform model averaging in parallel
  avg(weight) as weight
FROM (
  SELECT logress(features,label,..) as (feature,weight)
  FROM train
) t -- map-only task
GROUP BY feature; -- shuffled to reducers
API: the HiveQL API is stable (the Spark API is unstable)
Hadoop
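The data flow of the lr_model query can be sketched outside Hive. Below is a minimal Python sketch, assuming each mapper trains on its own split and emits (feature, weight) pairs, and the reducers average the weights per feature. The helper names train_logress and average_models and the learning rate eta are hypothetical and are not Hivemall APIs.

```python
import math
from collections import defaultdict

def train_logress(rows, eta=0.1):
    """Map-only step: SGD logistic regression on one split (illustrative only)."""
    weights = defaultdict(float)
    for features, label in rows:  # features: {name: value}, label: 0 or 1
        z = sum(weights[f] * v for f, v in features.items())
        pred = 1.0 / (1.0 + math.exp(-z))
        for f, v in features.items():
            weights[f] += eta * (label - pred) * v
    return list(weights.items())  # emitted as (feature, weight) pairs

def average_models(per_mapper_outputs):
    """Reduce step: the equivalent of GROUP BY feature with avg(weight)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for output in per_mapper_outputs:
        for feature, weight in output:
            sums[feature] += weight
            counts[feature] += 1
    return {f: sums[f] / counts[f] for f in sums}

# Two splits trained independently, then averaged per feature.
split_a = [({"x": 1.0, "bias": 1.0}, 1)]
split_b = [({"x": 1.0, "bias": 1.0}, 0)]
model = average_models([train_logress(split_a), train_logress(split_b)])
```

Because the mappers never exchange state in this sketch, the only cross-node traffic is the shuffle of (feature, weight) pairs to the reducers.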
Hivemall v0.3
Classification: Perceptron, Passive Aggressive (PA), Confidence Weighted (CW), Adaptive Regularization of Weight Vectors (AROW), Soft Confidence Weighted (SCW), AdaGrad+RDA
Regression: PA Regression, AROW Regression, AdaGrad, AdaDELTA
k-NN & recommendation: Minhash, b-Bit Minhash (LSH variant), Matrix Factorization
Feature engineering: Feature hashing, Feature scaling (normalization, z-score), TF-IDF vectorizer
v0.35
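The feature engineering items can be illustrated with a small Python sketch of feature hashing and feature scaling. It assumes zlib.crc32 as a stand-in hash function and a default of 2**20 buckets; both are choices made for this illustration and are not taken from Hivemall. The function names are hypothetical.

```python
import zlib

def hash_feature(name, num_buckets=2**20):
    """Feature hashing: map a feature name to a fixed-size integer index."""
    # crc32 is only a stand-in; it is deterministic across runs.
    return zlib.crc32(name.encode("utf-8")) % num_buckets

def rescale(value, lo, hi):
    """Min-max normalization into [0, 1]; returns 0.0 for a constant column."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0

def zscore(value, mean, stddev):
    """Standardization: distance from the mean in units of standard deviation."""
    return (value - mean) / stddev if stddev > 0 else 0.0

index = hash_feature("price")      # some index in [0, 2**20)
scaled = rescale(5.0, 0.0, 10.0)   # 0.5
standard = zscore(12.0, 10.0, 2.0) # 1.0
```

Hashing bounds the size of the weight vector regardless of how many distinct feature names appear, at the cost of occasional collisions.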
Matrix Factorization
Approximate the rating matrix by the product of rank-k factor matrices P and Q
Matrix Factorization
Biased MF, SGD, AdaGrad
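As a rough illustration of biased matrix factorization, the sketch below predicts a rating as the global mean plus a user bias, an item bias, and the dot product of k-dimensional user and item factors, and fits them with plain SGD. AdaGrad is not implemented here. The function name train_biased_mf and the hyperparameters (k, eta, lam, epochs) are assumptions for this example, not Hivemall's defaults.

```python
import random

def train_biased_mf(ratings, k=2, eta=0.05, lam=0.02, epochs=200, seed=0):
    """Fit mu + b_u + b_i + p_u . q_i by SGD; returns a predict(u, i) function."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    P, Q, bu, bi = {}, {}, {}, {}
    for u, i, _ in ratings:
        P.setdefault(u, [rng.gauss(0, 0.1) for _ in range(k)])
        Q.setdefault(i, [rng.gauss(0, 0.1) for _ in range(k)])
        bu.setdefault(u, 0.0)
        bi.setdefault(i, 0.0)

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # L2-regularized gradient steps on biases and factors
            bu[u] += eta * (err - lam * bu[u])
            bi[i] += eta * (err - lam * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += eta * (err * qi - lam * pu)
                Q[i][f] += eta * (err * pu - lam * qi)
    return predict

ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0),
           ("u2", "i1", 4.0), ("u2", "i2", 1.0)]
predict = train_biased_mf(ratings)
```

The bias terms absorb per-user and per-item offsets, so the factors P and Q only need to model the remaining interaction.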
Matrix Factorization
Matrix Factorization/
1
2
N
create table kdd10a_pa1_model1 as
select
  feature,
  cast(voted_avg(weight) as float) as weight
from (
  select train_pa1(addBias(features),label,"-mix host01,host02,host03") as (feature,weight)
  from kdd10a_train_x3
) t
group by feature;
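For readers unfamiliar with the learner named in the query, here is a minimal Python sketch of the textbook Passive Aggressive (PA-I) update for binary labels in {-1, +1}. It is not Hivemall's implementation; the function name pa1_update and the aggressiveness parameter c are assumptions for this illustration, and neither the MIX protocol nor voted_avg is modeled.

```python
def pa1_update(weights, features, label, c=1.0):
    """One PA-I step: update only when the hinge loss is positive."""
    margin = label * sum(weights.get(f, 0.0) * v for f, v in features.items())
    loss = max(0.0, 1.0 - margin)
    sq_norm = sum(v * v for v in features.values())
    if loss > 0.0 and sq_norm > 0.0:
        tau = min(c, loss / sq_norm)  # step size, capped by c
        for f, v in features.items():
            weights[f] = weights.get(f, 0.0) + tau * label * v
    return weights

w = {}
pa1_update(w, {"a": 1.0, "bias": 1.0}, +1)  # loss 1, tau 0.5
pa1_update(w, {"a": 1.0, "bias": 1.0}, +1)  # margin is now 1, no update
```

The update is "passive" when the example is already classified with margin at least 1 and "aggressive" otherwise, moving the weights just far enough to fix the violation, up to the cap c.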
MIX Server
Mix server
Model updates
Async add
AVG/Argmin KLD accumulator
hash(feature) % N
Non-blocking channel (single shared TCP connection w/ TCP keepalive)
classifiers
Mix serv. / Mix serv.
Computation/training is not blocked
MIX Server
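The routing and accumulation idea on this slide can be sketched in a few lines of Python. The sketch assumes each feature is routed to server hash(feature) % N and that the server keeps a running average per feature. zlib.crc32 stands in for the hash, and the names MixShard and route are hypothetical. The real system is asynchronous and network-based, and the Argmin KLD accumulator is not shown.

```python
import zlib

class MixShard:
    """One hypothetical mix server: a running average of weights per feature."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def add(self, feature, weight):
        """Accumulate a model update and return the current averaged weight."""
        self.sums[feature] = self.sums.get(feature, 0.0) + weight
        self.counts[feature] = self.counts.get(feature, 0) + 1
        return self.sums[feature] / self.counts[feature]

def route(feature, num_servers):
    """Pick the server responsible for a feature: hash(feature) % N."""
    return zlib.crc32(feature.encode("utf-8")) % num_servers

# Three servers, as in the "-mix host01,host02,host03" option.
servers = [MixShard() for _ in range(3)]
shard = servers[route("feature_42", len(servers))]
shard.add("feature_42", 1.0)
mixed = shard.add("feature_42", 3.0)  # average of the two updates: 2.0
```

Routing by hash means every update for a given feature lands on the same server, so no coordination between servers is needed.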
Feature requirements in Treasure Data
Treasure Data/KaggleMaster/Data Scientists
[email protected] @myui