INFORMS 2011 @ Charlotte
Parallel Computation for SDPs Focusing on the Sparsity of Schur Complements Matrices
Makoto Yamashita @ Tokyo Tech, Katsuki Fujisawa @ Chuo Univ, Mituhiro Fukuda @ Tokyo Tech, Kazuhide Nakata @ Tokyo Tech, Maho Nakata @ RIKEN
INFORMS Annual Meeting @ Charlotte, 2011/11/15 (2011/11/13-2011/11/16)
Key phrase

SDPARA: the fastest solver for large SDPs
available at http://sdpa.sf.net/
SemiDefinite Programming Algorithm paRAllel version
SDPA Online Solver
1. Log in to the online solver
2. Upload your problem
3. Push the 'Execute' button
4. Receive the result via Web/Mail

http://sdpa.sf.net/ ⇒ Online Solver
Outline
1. SDP applications
2. Standard form and Primal-Dual Interior-Point Methods
3. Inside of SDPARA
4. Numerical Results
5. Conclusion
SDP Applications 1. Control Theory
Against swinging, we want to keep stability.
Stability Condition ⇒ Lyapunov Condition ⇒ SDP
SDP Applications 2. Quantum Chemistry

Ground state energy: locate electrons.
Schrödinger Equation ⇒ Reduced Density Matrix ⇒ SDP
SDP Applications 3. Sensor Network Localization

Distance Information ⇒ Sensor Locations
Protein Structure
Standard form

(P)  min  C • X   s.t.  A_k • X = b_k (k = 1, …, m),  X ⪰ O
(D)  max  Σ_{k=1}^m b_k z_k   s.t.  Σ_{k=1}^m z_k A_k + Y = C,  Y ⪰ O

The variables are X, Y ∈ S^n and z ∈ R^m.
The inner product is X • Y = Σ_{i=1}^n Σ_{j=1}^n X_ij Y_ij.
The size is roughly determined by:
- n: the size of X and Y
- m: the number of equality constraints in (P)
Our target: m ≥ 30,000.
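As a concrete illustration of the standard form above, a minimal NumPy sketch with a hypothetical tiny instance (C, A_1, b_1 made up here; n = 2, m = 1) checks primal feasibility and evaluates the objective:

```python
import numpy as np

# hypothetical tiny instance: min C • X  s.t.  A1 • X = b1,  X ⪰ O
C = np.array([[2.0, 1.0],
              [1.0, 2.0]])
A1 = np.eye(2)               # A1 • X = trace(X)
b1 = 1.0

def inner(U, V):
    # the inner product X • Y = sum_{i,j} X_ij Y_ij
    return float(np.sum(U * V))

# a candidate point: trace 1 and positive semidefinite
X = np.array([[0.5, -0.5],
              [-0.5, 0.5]])
assert abs(inner(A1, X) - b1) < 1e-12          # equality constraint holds
assert np.linalg.eigvalsh(X).min() >= -1e-12   # X ⪰ O
print(inner(C, X))                             # objective value: 1.0
```

For this particular instance the optimum of min C • X over {trace(X) = 1, X ⪰ O} equals the smallest eigenvalue of C (here 1), so this X happens to be optimal.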
Primal-Dual Interior-Point Methods

[Figure: the feasible region for (X, Y, z) ∈ S^n × S^n × R^m with its Central Path. Starting from (X^0, Y^0, z^0), each iteration computes a search direction (dX, dY, dz) toward a target point on the path, producing iterates (X^1, Y^1, z^1), (X^2, Y^2, z^2), … that converge to the optimal solution (X*, Y*, z*).]
Schur Complement Matrix

B dz = r   (Schur Complement Equation)
dY = D + Σ_{k=1}^m dz_k A_k
dX = (dX̂ + dX̂^T)/2,  dX̂ = X^{-1}(R − dY Y)
where B_ij = A_i • (X^{-1} A_j Y)   (B: Schur Complement Matrix)

1. ELEMENTS (evaluation of SCM)
2. CHOLESKY (Cholesky factorization of SCM)
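The two bottleneck steps can be sketched densely in NumPy (random hypothetical data; variable names are made up, not SDPARA's):

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 5, 4

# random symmetric data matrices A_1..A_m and positive definite X, Y
As = [(lambda M: M + M.T)(rng.standard_normal((n, n))) for _ in range(m)]
M = rng.standard_normal((n, n)); X = M @ M.T + n * np.eye(n)
M = rng.standard_normal((n, n)); Y = M @ M.T + n * np.eye(n)
Xinv = np.linalg.inv(X)

# ELEMENTS: B_ij = A_i • (X^{-1} A_j Y)
B = np.empty((m, m))
for j in range(m):
    T = Xinv @ As[j] @ Y          # one product shared by the whole column j
    for i in range(m):
        B[i, j] = np.sum(As[i] * T)

# CHOLESKY: solve the Schur complement equation B dz = r
r = rng.standard_normal(m)
dz = np.linalg.solve(B, r)
```

With symmetric A_k and positive definite X, Y, the matrix B is symmetric positive definite, which is what makes a Cholesky factorization applicable.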
Computation time on a single processor

            Control     POP
ELEMENTS      22228     668
CHOLESKY       1593    1992
Total         23986    2713

Time unit is second; SDPA 7, Xeon 5460 (3.16GHz).
ELEMENTS and CHOLESKY account for more than 95% of the total time.
SDPARA replaces these bottlenecks by parallel computation.
Dense & Sparse SCM

B_ij = A_i • (X^{-1} A_j Y)

Fully dense SCM (100%): Quantum Chemistry
Sparse SCM (9.26%): POP

SDPARA can select Dense or Sparse automatically.
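A minimal sketch of such an automatic selection, assuming a simple density threshold (the threshold value and function names are made up here, not SDPARA's actual rule): dense SCMs go to a dense Cholesky (ScaLAPACK's role), sparse ones to a sparse direct factorization (MUMPS's role):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

def solve_scm(B, r, threshold=0.3):
    # pick the factorization from the density of the SCM
    density = np.count_nonzero(B) / B.size
    if density > threshold:
        return cho_solve(cho_factor(B), r)   # dense Cholesky route
    return splu(csc_matrix(B)).solve(r)      # sparse direct route

# usage on a small symmetric positive definite matrix
B = np.diag([4.0, 5.0, 6.0]); B[0, 1] = B[1, 0] = 1.0
r = np.array([1.0, 2.0, 3.0])
dz = solve_scm(B, r)
```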
Different Approaches

             Dense                         Sparse
ELEMENTS     Row-wise distribution         Formula-cost-based distribution
CHOLESKY     Parallel dense Cholesky       Parallel sparse Cholesky
             (ScaLAPACK)                   (MUMPS)
Three formulas for ELEMENTS

B_ij = A_i • (X^{-1} A_j Y)

F1 (A_i, A_j both dense): form U = X^{-1} A_i Y in full, then B_ij = U • A_j.
F2 (A_i dense, A_j sparse): form V = X^{-1} A_i, then accumulate B_ij = (V Y) • A_j only over the nonzeros of A_j.
F3 (A_i, A_j both sparse): accumulate B_ij elementwise over the nonzeros of both A_i and A_j.

The data matrices are ordered from dense (A_1) to sparse (A_m), so each row of B picks its formula from the sparsity of A_i. All rows are independent.
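A NumPy sketch (hypothetical random data) verifying that the full-product formula F1 and a nonzero-only accumulation in the spirit of F3 agree on one element of B:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n)); X = M @ M.T + n * np.eye(n)
M = rng.standard_normal((n, n)); Y = M @ M.T + n * np.eye(n)
Xinv = np.linalg.inv(X)

A_i = rng.standard_normal((n, n)); A_i = A_i + A_i.T   # treated as dense
A_j = np.zeros((n, n))                                 # sparse: 3 nonzeros
A_j[1, 4] = A_j[4, 1] = 2.0
A_j[2, 2] = -1.0

# F1: form U = X^{-1} A_i Y in full, then take the inner product with A_j
U = Xinv @ A_i @ Y
b_f1 = np.sum(U * A_j)

# F3-style: accumulate only over nonzeros, with no full matrix product
b_f3 = 0.0
for a, b in zip(*np.nonzero(A_j)):
    for g, d in zip(*np.nonzero(A_i)):
        b_f3 += A_j[a, b] * Xinv[a, g] * A_i[g, d] * Y[d, b]
```

The nonzero-only sum expands exactly the same quadruple sum as F1, so the two values coincide; its cost scales with the nonzero counts of A_i and A_j rather than with n³.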
Row-wise distribution

Assign rows to servers in a cyclic manner (Server1, Server2, Server3, Server4, Server1, Server2, …).

Simple idea ⇒ Very EFFICIENT
High scalability
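The cyclic assignment fits in a few lines (function names are hypothetical):

```python
def row_owner(i, n_servers):
    # round-robin: row i of the SCM belongs to server i mod n_servers
    return i % n_servers

def rows_of(server, m, n_servers):
    # the set of SCM rows a given server evaluates
    return [i for i in range(m) if row_owner(i, n_servers) == server]

# with m = 8 rows on 4 servers, every server owns exactly 2 rows
```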
Numerical Results on Dense SCM
Quantum Chemistry (m=7230, SCM=100%), middle size
SDPARA 7.3.1, Xeon X5460, 3.16GHz x2, 48GB memory

Servers        1      4     16
ELEMENTS   28678   7192   1826
CHOLESKY     548    131     47
Total      29700   7764   2294
(time in seconds)

ELEMENTS 15x speedup, Total 13x speedup. Very fast!!
Drawback of Row-wise to Sparse SCM

With formula F3 (A_i, A_j both sparse), the work per element varies widely with the nonzero patterns, so simple row-wise distribution is ineffective for the sparse SCM.

We estimate the cost of each element:
cost(B_ij) = 2 · #(A_i) · #(A_j)
where #(A) denotes the number of nonzeros of A.
Formula-cost-based distribution

Example of estimated element costs:
150 40 30 20 / 135 20 / 70 10 / 50 5 / 30 / 3

Distributing the elements by these costs gives
Server1 190, Server2 185, Server3 188
Good load-balance
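One way to realize such a distribution is a greedy largest-cost-first assignment (a sketch, not necessarily SDPARA's exact rule); on the example costs above it reproduces the 190/185/188 balance:

```python
import heapq

def distribute(costs, n_servers):
    # assign each cost (largest first) to the currently least-loaded server
    heap = [(0, s) for s in range(n_servers)]   # (load, server id) min-heap
    loads = [0] * n_servers
    for c in sorted(costs, reverse=True):
        load, s = heapq.heappop(heap)
        loads[s] = load + c
        heapq.heappush(heap, (loads[s], s))
    return loads

# the estimated element costs from the example above
costs = [150, 40, 30, 20, 135, 20, 70, 10, 50, 5, 30, 3]
print(sorted(distribute(costs, 3)))   # → [185, 188, 190]
```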
Numerical Results on Sparse SCM
Control Theory (m=109,246, SCM=4.39%), middle size
SDPARA 7.3.1, Xeon X5460, 3.16GHz x2, 48GB memory

Servers        1      4     16
ELEMENTS    1137    296     85
CHOLESKY    4053   1386    950
Total       5284   1744   1074
(time in seconds)

ELEMENTS 13x speedup, CHOLESKY 4.7x speedup, Total 5x speedup.
Comparison with PCSDP on an SDP with Dense SCM
(PCSDP is developed by Ivanov & de Klerk.)

Servers       1      2      4      8     16
PCSDP     53768  27854  14273   7995   4050
SDPARA     5983   2002   1680    901    565

Time unit is second. SDP: B.2P Quantum Chemistry (m = 7230, SCM = 100%); Xeon X5460, 3.16GHz x2, 48GB memory.
SDPARA is 8x faster by MPI & Multi-Threading.
Comparison with PCSDP on SDPs with Sparse SCM

SDPARA handles the SCM as sparse; only SDPARA can solve the larger size.

#sensors 1,000 (m=16,450; density=1.23%)
#Servers      1     2     4     8    16
PCSDP      O.M. 1527   887   591   368
SDPARA     28.2  22.1  16.7  13.8  27.3

#sensors 35,000 (m=527,096; density=6.53 × 10^-3 %)
#Servers      1     2     4     8    16
PCSDP      Out of Memory
SDPARA     1080   845   614   540   506

(time in seconds; O.M. = out of memory)
Extremely Large-Scale SDPs

16 Servers [Xeon X5670 (2.93GHz), 128GB Memory]

                       m    SCM    time
Esc32_b (QAP)    198,432   100%    129,186 seconds (1.5 days)

Other solvers can handle only m ≤ 40,000.
The LARGEST solved SDP in the world.
Conclusion

- Row-wise & formula-cost-based distribution
- Parallel Cholesky factorization
- SDPARA: the fastest solver for large SDPs
- http://sdpa.sf.net/ & Online Solver

Thank you very much for your attention.