Parallel Multicategory Support Vector Machines (PMC-SVM) for Classifying Microarray Data

Graduate student: 許景復

Affiliation: Graduate Institute of Optoelectronics and Communications (光電與通訊研究所)

Outline

Introduction
SMO-SVM
Parallel Multicategory SVM
Parallel Implementation and Environment
Parallel Evaluation and Analysis
Classifying Microarray Data
Conclusions

Introduction

Biologists want to separate the data into multiple categories using a reliable cancer diagnostic model.

Based on a comprehensive evaluation of several multicategory classification methods, support vector machines (SVM) have been found to be the most effective classifiers for accurate cancer diagnosis from gene expression data.

In this paper, we develop new parallel multicategory support vector machines (PMC-SVM) based on the sequential minimal optimization-type decomposition methods for support vector machines (SMO-SVM) as used in LibSVM, which need less memory.

SMO-SVM

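The basic idea behind SVM is to separate the two classes of points in a training set,

$$\{(x_i, y_i),\ i = 1, \ldots, l,\ x_i \in R^n,\ y_i \in \{-1, 1\}\},$$

by using a decision function

$$F: R^n \to \{-1, 1\}$$

obtained by solving a convex quadratic programming optimization problem of the form

$$\min_{\alpha} f(\alpha) = \frac{1}{2}\alpha^T Q \alpha - e^T \alpha \qquad (2)$$

subject to

$$0 \le \alpha_i \le C,\ i = 1, \ldots, l, \qquad y^T \alpha = 0.$$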

SMO-SVM

where $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_l]^T$, $y = [y_1, y_2, \ldots, y_l]^T$, and $C$ is a constant. $e$ is a vector of all ones. $Q$ is a symmetric positive semidefinite matrix whose entries $Q_{i,j}$ are defined as

$$Q_{i,j} = y_i y_j K(x_i, x_j), \quad i, j = 1, 2, \ldots, l,$$

where $K$ denotes a kernel function, such as a polynomial kernel or a Gaussian kernel.
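The definitions on this slide can be made concrete with a small sketch. It builds the matrix Q from labels and a Gaussian kernel, then evaluates the objective f(alpha) = 1/2 alpha^T Q alpha - e^T alpha. The function names, the toy points, and the kernel width `gamma` are illustrative assumptions, not values from the slides.

```python
import math

def gaussian_kernel(u, v, gamma=0.5):
    """K(u, v) = exp(-gamma * ||u - v||^2); gamma is an assumed, illustrative width."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def build_q(xs, ys, kernel=gaussian_kernel):
    """Return Q with entries Q[i][j] = y_i * y_j * K(x_i, x_j)."""
    n = len(xs)
    return [[ys[i] * ys[j] * kernel(xs[i], xs[j]) for j in range(n)]
            for i in range(n)]

def dual_objective(q, alpha):
    """Evaluate f(alpha) = 1/2 * alpha^T Q alpha - e^T alpha."""
    n = len(alpha)
    quad = sum(alpha[i] * q[i][j] * alpha[j]
               for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)

# Toy training set (hypothetical): three points in R^2 with labels in {-1, 1}.
xs = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
ys = [1, -1, 1]
Q = build_q(xs, ys)
print(dual_objective(Q, [0.5, 1.0, 0.5]))
```

Q is symmetric by construction, and with a Gaussian kernel its diagonal entries are all 1 because K(x, x) = 1 and y_i^2 = 1.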

SMO-SVM

Decomposition methods update only a subset of the variables $\alpha$ in each iteration. This subset, denoted as $B$, is called the working set.

If B is restricted to have only two elements, this special type of decomposition method is the Sequential Minimal Optimization (SMO).

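There are four steps to implement SMO:

Step 1: Find $\alpha^1$ as the initial feasible solution. Set $k = 1$.

Step 2: If $\alpha^k$ is a stationary point of (2), stop. Otherwise, find a two-element working set $B = \{i, j\} \subset \{1, \ldots, l\}$. Define $N \equiv \{1, \ldots, l\} \setminus B$, and $\alpha_B^k$ and $\alpha_N^k$ as the sub-vectors of $\alpha^k$ corresponding to $B$ and $N$, respectively.

Step 3: If $a_{ij} \equiv K_{ii} + K_{jj} - 2K_{ij} > 0$, solve the following sub-problem with the variable $\alpha_B$:

$$\min_{\alpha_i, \alpha_j}\ \frac{1}{2} \begin{bmatrix} \alpha_i & \alpha_j \end{bmatrix} \begin{bmatrix} Q_{ii} & Q_{ij} \\ Q_{ij} & Q_{jj} \end{bmatrix} \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} + (-e_B + Q_{BN}\alpha_N^k)^T \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix}$$

subject to

$$0 \le \alpha_i, \alpha_j \le C, \qquad y_i\alpha_i + y_j\alpha_j = -y_N^T \alpha_N^k. \qquad (4)$$

Otherwise, solve

$$\min_{\alpha_i, \alpha_j}\ \frac{1}{2} \begin{bmatrix} \alpha_i & \alpha_j \end{bmatrix} \begin{bmatrix} Q_{ii} & Q_{ij} \\ Q_{ij} & Q_{jj} \end{bmatrix} \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} + (-e_B + Q_{BN}\alpha_N^k)^T \begin{bmatrix} \alpha_i \\ \alpha_j \end{bmatrix} + \frac{\tau - a_{ij}}{4}\left((\alpha_i - \alpha_i^k)^2 + (\alpha_j - \alpha_j^k)^2\right)$$

subject to the constraints of (4), where $\tau$ is a small positive constant.

Step 4: Set $\alpha_B^{k+1}$ to be the optimal solution of (4) and $\alpha_N^{k+1} \equiv \alpha_N^k$. Set $k \leftarrow k + 1$ and go to Step 2.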
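The four steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the LibSVM or PMC-SVM code: the working set is chosen with a maximal-violating-pair rule (the slides do not say which selection rule is used), the two-variable sub-problem is solved in closed form along the direction that keeps y^T alpha fixed, and a small constant `tau` replaces K_ii + K_jj - 2K_ij when that quantity is not positive. The names `smo_train` and `linear_kernel_matrix` and the toy data are hypothetical.

```python
def linear_kernel_matrix(xs):
    """K[i][j] = <x_i, x_j> (a linear kernel, chosen only for the toy example)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in xs] for u in xs]

def smo_train(K, y, C=1.0, eps=1e-3, tau=1e-12, max_iter=10000):
    """Minimise 1/2 a^T Q a - e^T a  s.t. 0 <= a_i <= C, y^T a = 0 (sketch)."""
    l = len(y)
    alpha = [0.0] * l      # Step 1: alpha = 0 is a feasible starting point
    grad = [-1.0] * l      # gradient Q alpha - e at alpha = 0
    for _ in range(max_iter):
        # Step 2: pick the maximal violating pair (assumed selection rule).
        up = [t for t in range(l)
              if (y[t] == 1 and alpha[t] < C) or (y[t] == -1 and alpha[t] > 0)]
        low = [t for t in range(l)
               if (y[t] == -1 and alpha[t] < C) or (y[t] == 1 and alpha[t] > 0)]
        if not up or not low:
            break
        i = max(up, key=lambda t: -y[t] * grad[t])
        j = min(low, key=lambda t: -y[t] * grad[t])
        gap = -y[i] * grad[i] + y[j] * grad[j]
        if gap < eps:      # approximately stationary: stop
            break
        # Step 3: closed-form solution of the two-variable sub-problem.
        a_ij = K[i][i] + K[j][j] - 2.0 * K[i][j]
        if a_ij <= 0.0:
            a_ij = tau     # non-positive curvature: fall back to tau
        room_i = C - alpha[i] if y[i] == 1 else alpha[i]
        room_j = alpha[j] if y[j] == 1 else C - alpha[j]
        step = min(gap / a_ij, room_i, room_j)   # clip to the box [0, C]
        d_i, d_j = y[i] * step, -y[j] * step     # keeps y^T alpha unchanged
        # Step 4: update alpha_B, leave alpha_N as it is, refresh the gradient.
        alpha[i] = min(max(alpha[i] + d_i, 0.0), C)
        alpha[j] = min(max(alpha[j] + d_j, 0.0), C)
        for s in range(l):
            grad[s] += y[s] * (y[i] * K[s][i] * d_i + y[j] * K[s][j] * d_j)
    return alpha

# Toy 1-D problem (hypothetical data): two points per class.
xs = [(-2.0,), (-1.0,), (1.0,), (2.0,)]
y = [-1, -1, 1, 1]
alpha = smo_train(linear_kernel_matrix(xs), y)
print(alpha)
```

On this toy problem the multipliers should concentrate on the two points nearest the class boundary (x = -1 and x = 1), which act as the support vectors.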

Parallel Multicategory SVM (PMC-SVM)

In multicategory classification with support vector machines, the algorithm generates $k(k-1)/2$ sub-models for $k$ categories.

Generating the models is the most time-consuming task in this algorithm, so it is desirable to distribute the sub-models onto multiple processors, with each processor performing a subtask, to improve performance.

Example:

With 4 processors and $k = 16$ categories, we have to generate $k(k-1)/2 = 120$ models in total.

[Task-assignment formula $T_i$ for processor $i$: garbled in the source. It is stated in terms of the total number of processors and the number of categories, $k$.]
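The arithmetic in the example can be checked with a short sketch that enumerates the pairwise sub-models and spreads them over the processors. The round-robin rule used here is an assumption chosen for illustration and is not necessarily the assignment formula on the slide. The function names are hypothetical.

```python
from itertools import combinations

def pairwise_tasks(k):
    """All k(k-1)/2 category pairs; each pair corresponds to one sub-model."""
    return list(combinations(range(k), 2))

def assign_round_robin(tasks, p):
    """Give task number t to processor t mod p (an assumed, illustrative rule)."""
    return [tasks[rank::p] for rank in range(p)]

tasks = pairwise_tasks(16)
assignment = assign_round_robin(tasks, 4)
for rank, chunk in enumerate(assignment):
    print(f"processor {rank}: {len(chunk)} sub-models")
```

With 16 categories and 4 processors, this rule gives each processor 30 of the 120 sub-models. Equal counts do not guarantee equal run times, since sub-models trained on larger category pairs take longer.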

Parallel Implementation and Environment

Two platforms are used. One is the shared-memory SGI Origin 2800 supercomputer (sweetgum), equipped with 128 CPUs, 64 gigabytes of memory, and 1.6 terabytes of Fibre Channel disk.

The other is a distributed-memory Linux cluster (mimosa) with 192 nodes.

Parallel Evaluation and Analysis

PMC-SVM is tested on both the sweetgum and mimosa platforms using the following two datasets.

Dataset 1: Letter_scale

classes: 26

training size: 16,000

features: 16

Dataset 2: Mnist_scale

classes: 10

training size: 21,000

features: 780

Figure 2. The speedup of PMC-SVM on sweetgum with Dataset 1 (Letter_scale)

Figure 3. The speedup of PMC-SVM on mimosa with Dataset 1 (Letter_scale)

Figure 4. The speedup of PMC-SVM on sweetgum with Dataset 2 (Mnist_scale)

Figure 5. The speedup of PMC-SVM on mimosa with Dataset 2 (Mnist_scale)

Classifying Microarray Data

In this work, two microarray datasets were used to demonstrate the performance of PMC-SVM, as listed below:

Dataset 3: 14_Tumors (40Mb)

Human tumor types: 14

Normal tissue types: 12

Dataset 4: 11_Tumors (18Mb)

Human tumor types: 11

Table 6: Performance on sweetgum (Dataset 3)

# of PEs    Time (s)    Speedup
1           774.2       -
2           434.7       1.78
4           240.1       3.22
8           150.7       5.14
16          90.5        8.55
24          74.1        10.45

Table 7: Performance on sweetgum (Dataset 4)

# of PEs    Time (s)    Speedup
1           257.7       -
2           140.9       1.82
4           82.2        3.13
8           57.2        4.50
16          39.9        6.62
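The speedup column is the single-processor time divided by the time on p processing elements (PEs). The sketch below recomputes it from the Table 6 timings and also reports parallel efficiency (speedup divided by p). Efficiency is not reported in the slides and is added here only as an illustration, and the variable names are hypothetical.

```python
# Wall-clock times in seconds from Table 6 (sweetgum, Dataset 3), keyed by # of PEs.
times = {1: 774.2, 2: 434.7, 4: 240.1, 8: 150.7, 16: 90.5, 24: 74.1}

baseline = times[1]
speedup = {p: baseline / t for p, t in times.items() if p > 1}
efficiency = {p: s / p for p, s in speedup.items()}

for p in sorted(speedup):
    print(f"{p:>2} PEs: speedup {speedup[p]:.2f}, efficiency {efficiency[p]:.2f}")
```

Efficiency falls as processors are added, which is consistent with a fixed pool of sub-models being divided among more workers.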

Conclusions

PMC-SVM has been developed for classifying large datasets based on an SMO-type decomposition method.

The experimental results show that high-performance computing techniques and a parallel implementation can achieve a significant speedup, reaching 10.45 on 24 processors for Dataset 3.

Thanks for your attendance!
