
Parallel Software for SemiDefinite Programming with Sparse Schur Complement Matrix


Page 1: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

1

Parallel Software for SemiDefinite Programming with Sparse Schur Complement Matrix

Makoto Yamashita @ Tokyo-TechKatsuki Fujisawa @ Chuo UniversityMituhiro Fukuda @ Tokyo-TechYoshiaki Futakata @ University of VirginiaKazuhiro Kobayashi @ National Maritime Research InstituteMasakazu Kojima @ Tokyo-TechKazuhide Nakata @ Tokyo-TechMaho Nakata @ RIKEN

ISMP 2009 @ Chicago [2009/08/26]

Page 2: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

2

Extremely Large SDPs arising from various fields:
Quantum Chemistry, Sensor Network Problems, Polynomial Optimization Problems

Most computation time is related to the Schur complement matrix (SCM)

[SDPARA] Parallel computation for the SCM, in particular a sparse SCM

Page 3: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

3

Outline

1. SemiDefinite Programming and Schur complement matrix

2. Parallel Implementation
3. Parallel Computation for Sparse Schur complement
4. Numerical Results
5. Future works

Page 4: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

4

Standard form of SDP
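For reference, one standard primal-dual SDP pair of the kind SDPA/SDPARA solves (the notation below is assumed here, not taken from the slide: symmetric data matrices A_i, cost matrix C, right-hand side b) is

\[
\begin{aligned}
\text{(P)}\quad & \min\ C \bullet X \quad \text{s.t.}\ A_i \bullet X = b_i \ (i = 1,\dots,m),\ X \succeq 0,\\
\text{(D)}\quad & \max\ \sum_{i=1}^{m} b_i y_i \quad \text{s.t.}\ \sum_{i=1}^{m} y_i A_i + Z = C,\ Z \succeq 0,
\end{aligned}
\]

where U \bullet V = \mathrm{Tr}(UV) for symmetric U, V and X \succeq 0 means X is symmetric positive semidefinite.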

Page 5: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

5

Primal-Dual Interior-Point Methods
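As a generic summary of the framework (not necessarily the slide's exact formulation), a primal-dual interior-point method follows the central path defined by the perturbed optimality conditions

\[
A_i \bullet X = b_i \ (i = 1,\dots,m), \qquad \sum_{i=1}^{m} y_i A_i + Z = C, \qquad XZ = \mu I, \qquad X \succeq 0,\ Z \succeq 0,
\]

driving \mu \downarrow 0; each iteration linearizes these conditions to obtain a search direction (dX, dy, dZ) and takes a damped Newton step.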

Page 6: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

6

Computation for Search Direction

Schur complement matrix ⇒ Cholesky factorization

Exploitation of sparsity in:
1. ELEMENTS (evaluation of the SCM)
2. CHOLESKY (its Cholesky factorization)
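Eliminating dX and dZ from the linearized system leaves an m x m system in dy, the Schur complement equation. For the HKM search direction, one standard expression of its entries (consistent with the primal-dual form sketched above) is

\[
B\,dy = r, \qquad B_{ij} = \mathrm{Tr}\bigl(A_i X A_j Z^{-1}\bigr), \qquad i,j = 1,\dots,m,
\]

and B is symmetric positive definite whenever X, Z are positive definite and the A_i are linearly independent, so it can be factorized by Cholesky. ELEMENTS denotes the evaluation of all B_{ij}; CHOLESKY denotes the factorization of B.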

Page 7: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

7

Bottlenecks on Single Processor

Apply Parallel Computation to the Bottlenecks

Time in seconds on a single Opteron 246 (2.0GHz):

              LiOH            HF
m             10592           15018
ELEMENTS      6150 ( 43%)     16719 ( 35%)
CHOLESKY      7744 ( 54%)     20995 ( 44%)
TOTAL         14250 (100%)    47483 (100%)

Page 8: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

8

SDPARA: the parallel version of SDPA (a generic SDP solver), built on MPI & ScaLAPACK

Row-wise distribution for ELEMENTS; parallel Cholesky factorization for CHOLESKY

http://sdpa.indsys.chuo-u.ac.jp/sdpa/

Page 9: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

9

Row-wise distribution for evaluation of the Schur complement matrix

Example with 4 CPUs available: each CPU computes only its assigned rows.

No communication between CPUs; efficient memory management.
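A minimal sketch in C++/MPI of this idea (not SDPARA's actual code; the dimension m, the cyclic assignment rule, and computeElement below are illustrative placeholders): rows of the SCM are assigned to processes cyclically and each process evaluates only its own rows, so ELEMENTS needs no communication.

// Cyclic row-wise distribution for ELEMENTS (illustrative sketch only).
#include <mpi.h>
#include <vector>

static double computeElement(int i, int j) {
  // placeholder for evaluating B_ij = Tr(A_i X A_j Z^{-1})
  return static_cast<double>(i + j);
}

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, nprocs = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  const int m = 1000;                              // number of constraints = rows of the SCM
  std::vector<std::vector<double>> myRows;         // this process stores only its own rows
  for (int i = rank; i < m; i += nprocs) {         // cyclic assignment: row i -> process i % nprocs
    std::vector<double> row(m, 0.0);
    for (int j = i; j < m; ++j) row[j] = computeElement(i, j);   // upper triangle only: B is symmetric
    myRows.push_back(row);
  }
  // No communication happens here; data is exchanged only later,
  // when the matrix is redistributed for the parallel Cholesky factorization.
  MPI_Finalize();
  return 0;
}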

Page 10: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

10

Parallel Cholesky factorization

We adopt ScaLAPACK for the Cholesky factorization of the Schur complement matrix.
We redistribute the matrix from the row-wise distribution to a two-dimensional block-cyclic distribution.

Redistribution
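For reference, a small sketch of the ownership rule of a two-dimensional block-cyclic layout, the distribution ScaLAPACK expects (the block size nb and the Pr x Pc process grid are assumed illustrative values; this is not SDPARA's redistribution code):

// Ownership rule of a 2D block-cyclic layout on a Pr x Pc process grid.
#include <cstdio>

int ownerProcRow(int i, int nb, int Pr) { return (i / nb) % Pr; }   // process-grid row owning matrix row i
int ownerProcCol(int j, int nb, int Pc) { return (j / nb) % Pc; }   // process-grid column owning matrix column j

int main() {
  const int nb = 64, Pr = 2, Pc = 2;   // assumed block size and process grid
  const int i = 200, j = 500;          // a sample entry of the Schur complement matrix
  std::printf("B(%d,%d) lives on process (%d,%d)\n",
              i, j, ownerProcRow(i, nb, Pr), ownerProcCol(j, nb, Pc));
  return 0;
}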

Page 11: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

11

Computation time on SDP from Quantum Chemistry [LiOH]

#processors    1       4       16      64
TOTAL          14250   3514    969     414
ELEMENTS       6150    1654    308     84
CHOLESKY       7744    1186    357     141

(time in seconds; log-scale chart in the original slide)

AIST super cluster, Opteron 246 (2.0GHz), 6GB memory/node

Page 12: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

12

Scalability on SDP from Quantum Chemistry [NF]

[Log-scale chart: scalability vs. #processors (1, 2, 4, 8, 16, 32, 64) for TOTAL, ELEMENTS, and CHOLESKY.]

Speed-up on 64 processors: TOTAL 29x, ELEMENTS 63x, CHOLESKY 39x

ELEMENTS is very effective

Page 13: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

13

Sparse Schur complement matrix

Schur complement matrix becomes very sparse for some applications.

⇒ The simple row-wise distribution loses its efficiency.
Examples: SCM from Control Theory (density 100%) vs. from Sensor Network (density 2.12%)

Page 14: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

14

Sparseness of Schur complement matrix

Many applications have a diagonal block structure.

Page 15: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

15

Exploitation of Sparsity in SDPA

We select the evaluation formula row by row:

F1

F2

F3
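Roughly speaking (a generic description based on the element formula above, not the slide's exact definitions of F1, F2, F3), F1 forms the product X A_j Z^{-1} densely, while the sparser variants accumulate only over nonzero entries of the data matrices, e.g.

\[
B_{ij} = \sum_{(\alpha,\beta)\,:\,(A_i)_{\alpha\beta} \neq 0} (A_i)_{\alpha\beta}
\sum_{(\gamma,\delta)\,:\,(A_j)_{\gamma\delta} \neq 0} X_{\beta\gamma}\,(A_j)_{\gamma\delta}\,(Z^{-1})_{\delta\alpha},
\]

so the cost of each formula depends on the numbers of nonzeros in A_i and A_j, and the cheapest formula is chosen for each row.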

Page 16: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

16

ELEMENTS for Sparse Schur complement

[Figure: a sparse SCM with diagonal block structure; its rows are annotated with computational costs 150, 40, 30, 20, 135, 20, 70, 10, 50, 5, 30, 3 and distributed over 3 CPUs.]

Load on each CPU: CPU1: 190, CPU2: 185, CPU3: 188
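The slides do not spell out the assignment rule; one simple way to obtain such a balanced load is a greedy heuristic that hands each row, in decreasing order of estimated cost, to the currently least-loaded CPU. A minimal sketch under that assumption (not necessarily SDPARA's rule):

// Greedy load balancing of SCM rows over CPUs (an assumed heuristic,
// shown only to illustrate how loads like 190/185/188 can arise).
#include <algorithm>
#include <cstdio>
#include <vector>

int main() {
  std::vector<int> cost = {150, 40, 30, 20, 135, 20, 70, 10, 50, 5, 30, 3};  // per-row costs from the slide
  const int ncpu = 3;
  std::vector<long> load(ncpu, 0);

  std::vector<int> order(cost.size());
  for (int k = 0; k < static_cast<int>(order.size()); ++k) order[k] = k;
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return cost[a] > cost[b]; });                // heaviest rows first

  for (int r : order) {
    int cpu = static_cast<int>(std::min_element(load.begin(), load.end()) - load.begin());
    load[cpu] += cost[r];                                                    // assign to the least-loaded CPU
  }
  for (int c = 0; c < ncpu; ++c) std::printf("CPU%d: %ld\n", c + 1, load[c]);
  return 0;
}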

Page 17: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

17

CHOLESKY for Sparse Schur complement

Parallel sparse Cholesky factorization implemented in MUMPS. MUMPS adopts the multifrontal method.

[Figure: the same sparse SCM with row costs 150, 40, 30, 20, 135, 20, 70, 10, 50, 5, 30, 3.]

Memory storage on each processor should be consecutive. The distribution for ELEMENTS matches this method.

Page 18: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

18

Computation time for SDPs from Polynomial Optimization Problem

#processors    1      2      4      8      16     32
TOTAL          1126   645    486    479    270    251
ELEMENTS       411    207    105    55     29     16
CHOLESKY       664    391    243    336    179    188

(time in seconds; log-scale chart in the original slide)

tsubasa, Xeon E5440 (2.83GHz), 8GB memory/node

Parallel sparse Cholesky achieves mild scalability. ELEMENTS attains a 24x speed-up on 32 CPUs.

Page 19: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

19

ELEMENTS Load-balance on 32 CPUs

Only the first processor has slightly heavier computation.

[Bar chart: per-processor ELEMENTS time in seconds (up to about 0.4s) and number of distributed elements (up to about 1,400,000) for each of the 32 processors.]

Page 20: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

20

Automatic selection of sparse / dense SCM

Dense parallel Cholesky achieves higher scalability than sparse parallel Cholesky, so dense becomes better on many processors.

We estimate both computation times using the computation cost and the scalability.

[Log-scale chart: seconds vs. #processors (1, 2, 4, 8, 16, 32) for auto, dense, and sparse.]
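A minimal sketch of such a selection rule, under an assumed cost model (the constants and the estimator SDPARA actually uses are not given on the slides):

// Choose dense or sparse parallel Cholesky from estimated times
// (assumed cost model and constants; SDPARA's actual estimator may differ).
#include <cstdio>

double estimateDense(double m, int p) {
  const double rate = 4.0e9;   // assumed flop/s per processor
  const double eff  = 0.8;     // assumed parallel efficiency of dense ScaLAPACK Cholesky
  return (m * m * m / 3.0) / (rate * eff * p);
}

double estimateSparse(double factorFlops, int p) {
  const double rate = 4.0e9;
  const double eff  = 0.4;     // sparse Cholesky is assumed to scale less well
  return factorFlops / (rate * eff * p);
}

int main() {
  const double m = 16450;             // SCM dimension (e.g., the 1,000-sensor problem)
  const double sparseFlops = 2.0e10;  // flop count of the sparse factor, e.g. from a symbolic analysis
  for (int p = 1; p <= 32; p *= 2) {
    bool useSparse = estimateSparse(sparseFlops, p) < estimateDense(m, p);
    std::printf("p = %2d -> %s CHOLESKY\n", p, useSparse ? "sparse" : "dense");
  }
  return 0;
}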

Page 21: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

21

Sparse/Dense CHOLESKY for a small SDP from POP

#processors    1      2     4     8     16    32
auto           70     52    44    24    14    14
dense          136    63    35    23    13    13
sparse         71     52    44    36    30    30

(time in seconds; log-scale chart in the original slide)

tsubasa, Xeon E5440 (2.83GHz), 8GB memory/node

Only on 4 CPUs did the auto selection fail (since the scalability of sparse Cholesky is unstable on 4 CPUs).

Page 22: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

22

Numerical Results

Comparison with PCSDP: Sensor Network Problems generated by SFSDP
Multi-Threading: Quantum Chemistry

Page 23: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

23

SDPs from Sensor Network

#sensors 1,000 (m = 16,450: density 1.23%)
#CPU       1       2       4       8       16
SDPARA     28.2    22.1    16.7    13.8    27.3
PCSDP      M.O.    1527    887     591     368

#sensors 35,000 (m = 527,096: density )
#CPU       1       2       4       8       16
SDPARA     1080    845     614     540     506
PCSDP      Memory Over. if #sensors >= 4,000

(time unit: second; M.O. = Memory Over)

Page 24: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

24

MPI + Multi Threading for Quantum Chemistry

N.4P.DZ.pqgt11t2p(m=7230)

[Log-scale chart: computation time in seconds vs. #nodes (1, 2, 4, 8, 16) for PCSDP and for SDPARA with 1, 2, 4, and 8 threads per node.]

#nodes        1       2       4       8      16
PCSDP         53763   27854   14273   7895   4650
SDPARA(1)     36206   18134   9190    4729   2479
SDPARA(8)     5803    2992    1630    931    565

(time in seconds)

64x speed-up on [16 nodes x 8 threads]
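A minimal sketch of the hybrid scheme for ELEMENTS (MPI across nodes, threads inside each process); OpenMP is used here for illustration, and computeElement and the cyclic row assignment are placeholders rather than SDPARA's internals:

// Hybrid ELEMENTS sketch: MPI distributes rows across processes,
// OpenMP threads share the owned rows inside each process (illustrative only).
#include <mpi.h>
#include <omp.h>
#include <vector>

static double computeElement(int i, int j) { return static_cast<double>(i + j); }  // placeholder

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, nprocs = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  const int m = 7230;                                     // e.g., N.4P.DZ.pqgt11t2p
  std::vector<int> myRows;
  for (int i = rank; i < m; i += nprocs) myRows.push_back(i);

  std::vector<double> rowSum(myRows.size(), 0.0);         // one value per owned row, as a stand-in
  #pragma omp parallel for schedule(dynamic)              // threads split this process's rows
  for (int k = 0; k < static_cast<int>(myRows.size()); ++k) {
    double s = 0.0;
    for (int j = myRows[k]; j < m; ++j) s += computeElement(myRows[k], j);
    rowSum[k] = s;
  }
  MPI_Finalize();
  return 0;
}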

Page 25: Parallel Software  for SemiDefinite Programming with Sparse Schur Complement Matrix

25

Concluding Remarks & Future works

1. New parallel schemes for sparse Schur complement matrix

2. Reasonable scalability
3. Extremely large-scale SDPs with sparse Schur complement matrix

Improvement of multi-threading for the sparse Schur complement matrix