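Latest Chips and Application Development Status in the Era of In Silico Drug Discovery
Shigeki Gunji, Solution Architect (shigegunjiintelcom)
BioGrid Research Meeting 2015 (バイオグリッド研究会2015)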
Product Portfolio Supporting the Advancement of Life Sciences
COMPUTE: Intel® Xeon®, Intel® Xeon Phi™
VISUALIZATION: Intel® iGFX, Intel® Xeon®, Intel® Xeon Phi™, Iris Pro™ Graphics, Embree Ray-Tracing
STORAGE: Intel® Lustre, Intel® SSD/NVMe, RAID Controller, Intel® Xeon®
NETWORKING: Intel® 10/40GbE, Intel® Switch Si
FABRIC: Intel® True Scale, Intel® Omni-Path, Intel® 10/40GbE
Intel® Software Developer Tools
Intel® Cluster Ready, Intel® Data Center Manager
Boards/Systems
IA Programming Model & Code Base: The Broadest Technical Computing Ecosystem
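Intel® Xeon® Processor E5 Family
The foundation of Intel's HPC performance; ideal for nearly the full range of workloads
- Industry-leading performance and performance per watt
- A general-purpose processor for serial and parallel workloads, with a standard range of core counts and a focus on fast serial performance as well
www.intel.com/xeon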
Intel® Xeon Phi™ Coprocessor 7120P
Deep learning is a piece of cake, too
- 61 cores, 244 threads, 1.238 GHz
- 1.21 TFLOPS (peak double-precision floating-point performance)
- 512-bit SIMD instructions
- 32 vector registers per thread
- 16 GB GDDR5 memory, 352 GB/s
- 300 W (passive cooling)
- PCIe x16 (requires an IA host processor)
- 22 nm with the world's first 3-D Tri-Gate transistors
- Linux operating system, IP addressable
- Common x86/IA programming models and software tools
www.intel.com/xeonphi
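The peak double-precision figure on this slide follows from the usual product of core count, clock frequency and floating-point operations per cycle. A minimal sketch of that arithmetic is below; the function name `peak_tflops` is illustrative, and the value of 16 double-precision FLOPs per clock per core is the one quoted in the deck's roadmap footnote rather than something measured here.

```python
def peak_tflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in TFLOPS = cores x clock (GHz) x FLOPs per cycle / 1000."""
    return cores * clock_ghz * flops_per_cycle / 1000.0


# Xeon Phi 7120P figures from the slide: 61 cores at 1.238 GHz.
# 16 DP FLOPs/clock/core is taken from the deck's footnote (an input, not a measurement).
xeon_phi_7120p = peak_tflops(cores=61, clock_ghz=1.238, flops_per_cycle=16)
print(f"{xeon_phi_7120p:.3f} TFLOPS")  # prints "1.208 TFLOPS"
```

This is a theoretical ceiling only; sustained application performance depends on vectorization, memory bandwidth and how work is divided between host and coprocessor.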
Is Xeon Phi† performance compelling vs. Xeon† E5 v2?
"2-socket Xeon E5 v2 system" vs. "2-socket Xeon E5 v2 system + Xeon Phi 7120"
http://www.intel.com/performance
† Xeon = Intel® Xeon® processor; Xeon Phi = Intel® Xeon Phi™ coprocessor
Xeon Phi delivers up to 165 higher performance (with 1 card) versus 2-socket Xeon E5v2
Intel® Xeon Phi™ Product Family
Knights Corner
- 1 TFLOPS (1)
Knights Landing
- 3+ TFLOPS (2)
- Processor (bootable)
- High-bandwidth memory on package
- Integrated interconnect
- 2H'15: first commercial systems
- >50 systems providers expected (3), plus many more card-based systems
- >100 PFLOPS customer system compute commits to date (3)
Knights Hill: 3rd-generation Intel® Xeon Phi™ product family
- 2nd-generation Intel Omni-Path Architecture
- 10 nm process technology
Intel® Xeon Phi™ Coprocessor – Applications and Solutions Catalog
(1) Claim based on calculated theoretical peak double-precision performance capability for a single coprocessor: 16 DP FLOPS/clock/core × 61 cores × 1.23 GHz = 1.208 TeraFLOPS.
(2) Over 3 teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency and floating-point operations per cycle. FLOPS = cores × clock frequency × floating-point operations per second per cycle.
(3) Intel internal estimate.
Intel® Omni-Path Architecture: Coming 2H'15
- 100 Gbps line speed
- 48-port switch chip architecture (vs. 36 in InfiniBand)
- 56% lower latency (4) (lower is better)
High port density: 1.3x
- Maximize single-switch investment: 48 ports support up to 12 additional nodes by only adding cables (1)
Reduction in the number of switches (2): up to ½
High system scalability: 2.3x
- Over 27k nodes in a 2-tier, 5-hop fabric (3)
High application performance scalability
Scalable: small clusters, mainstream clusters, supercomputers
www.intel.com/omnipath
(1) As compared to a shipping 36-port edge InfiniBand switch.
(2) Reduction in up to ½ fewer switches claim based on a 1,024-node full bisectional bandwidth (FBB) fat-tree configuration, using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters.
(3) 2.3X based on 27,648 nodes in a cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes.
(4) Latency reductions based on Mellanox CS7500 Director Switch and Mellanox SB7700/SB7790 Edge switches compared to preliminary Intel simulations for Intel® Omni-Path switches, based on a 1,024-node full bisectional bandwidth (FBB) fat-tree configuration (2-tier, 5 total switch hops), using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
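The node counts in footnote 3 coincide with the textbook capacity of a full-bisection fat tree with three switch levels (a five-switch-hop path: leaf, spine, core, spine, leaf) built from k-port switches, which is k³/4 end nodes. The sketch below only checks that arithmetic. It assumes the standard fat-tree formula applies, it is not Intel's sizing method, and the function name is made up for this example.

```python
def fat_tree_max_nodes(ports: int, levels: int) -> int:
    """Maximum end nodes in a full-bisection fat tree with `levels` switch levels.

    Standard result for k-port switches: 2 * (k / 2) ** levels,
    i.e. k**2 / 2 for two levels and k**3 / 4 for three levels.
    """
    if ports % 2 != 0 or levels < 1:
        raise ValueError("expects an even port count and at least one level")
    return 2 * (ports // 2) ** levels


opa_nodes = fat_tree_max_nodes(48, 3)  # 48-port switch ASIC
ib_nodes = fat_tree_max_nodes(36, 3)   # 36-port switch ASIC
print(opa_nodes, ib_nodes, round(opa_nodes / ib_nodes, 2))  # 27648 11664 2.37
```

With two switch levels the same formula gives 1,152 nodes for 48-port switches and 648 for 36-port switches, which is in line with the footnotes' point that a 1,024-node full-bisection configuration needs fewer switches at 48 ports.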
Intel® EE for Lustre: Connectivity with Hadoop
Breakdown: open source; Intel proprietary extensions; connectors to Hadoop distributions
Lustre that can connect to Hadoop
ANL Selects Intel for World's Biggest Supercomputer
2-system CORAL award extends IA leadership in extreme-scale HPC
>$200M
- Aurora, Argonne National Laboratory: >180 PF (April '15)
- Theta, Argonne National Laboratory: >8.5 PF
- Trinity, NNSA†: >40 PF (July '14)
- Cori, NERSC‡: >30 PF (April '14)
‡ Cray XC Series at National Energy Research Scientific Computing Center (NERSC)
† Cray XC Series at National Nuclear Security Administration (NNSA)
The Most Advanced Supercomputer Ever Built
An Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
- >180 PFLOPS (option to increase up to 450 PF)
- >50,000 nodes
- 13 MW
- 2018 delivery
- 18X higher performance†
- >6X more energy efficient†
Prime contractor: Intel. Subcontractor: Cray.
Source: Argonne National Laboratory and Intel. † Comparison of theoretical peak double-precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PF and 4.8 MW).
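The two daggered claims can be checked against the footnote's own inputs. The sketch below is only that arithmetic; it assumes Mira's power figure reads as 4.8 MW (the extracted text shows "48MW") and uses the lower bounds quoted on the slide for Aurora. The function name is illustrative.

```python
def perf_and_efficiency_ratio(new_pf, new_mw, old_pf, old_mw):
    """Return (performance ratio, performance-per-watt ratio) of new vs. old system."""
    perf_ratio = new_pf / old_pf
    efficiency_ratio = (new_pf / new_mw) / (old_pf / old_mw)
    return perf_ratio, efficiency_ratio


# Slide figures: Aurora >180 PF at 13 MW; Mira 10 PF at 4.8 MW (assumed reading).
perf, eff = perf_and_efficiency_ratio(180, 13, 10, 4.8)
print(f"{perf:.0f}X performance, {eff:.2f}X performance per watt")
# prints "18X performance, 6.65X performance per watt"
```

Both results agree with the slide's "18X" and ">6X" statements under those assumptions.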
Application Support Status: Life Sciences
"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being"
Wang Bingqiang, Head of High Performance Computing, BGI
AMBER 14 Particle Mesh Ewald (PME): Tobacco Virus, 1 Million Atoms (1 node)
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: see Section 18.7 of the manual.
Usage model: Baseline is the Intel® Xeon® processor E5-2697 v2, compared with offload processing on both the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A, using the released double-precision code across the platforms; 50% of the workload runs on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: The optimized Intel Xeon processor E5-2697 v3 with Intel Xeon Phi coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel Xeon processor E5-2697 v2. The optimized offload process demonstrated 1.07X the performance of the NVIDIA K40.
Comparative performance:
- Intel® Xeon® processor E5-2697 v2 (baseline): 1
- Intel® Xeon® processor E5-2697 v2 (optimized): 1.52X
- Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A: 2X
- Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP: 2.26X
- Intel® Xeon® processor E5-2697 v3: 1.93X
- Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A: 2.41X
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
Source: Intel measured results as of September 2014. Approved for public presentation.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
Other names and brands may be claimed as the property of others.
AMBER 14 Particle Mesh Ewald (PME): Cellulose NPT (408K atoms), 3-node cluster benchmark
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: see Section 18.7 of the manual.
Usage model: Baseline is the Intel® Xeon® processor E5-2697 v2, host only (also measured at http://ambermd.org/gpus/benchmarks.htm#Benchmarks), and speedup is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized, will be delivered to the AMBER community (license holders), and will be available as an update patch during code configuration.
Results: The optimized offload process demonstrated compelling cluster performance improvement, up to 2.6X over the baseline Intel® Xeon® processor E5-2697 v2.
Chart: comparative performance at 2 nodes and 3 nodes. Series: Intel® Xeon® processor E5-2697 v2 (baseline); Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 + NVIDIA K40 DPFP. Bar labels: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X.
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2
Source: Intel measured results as of September 2014.
GROMACS 512K H2O with RF (1 node)
Application: GROMACS 5.0-RC1. Workload: 512K H2O with RF method.
Description: GROMACS is a versatile package to perform molecular dynamics, i.e., simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: version 5.0-rc1. Recipe: available.
Highlights:
- Highly optimized for Intel® Xeon® processors (AVX intrinsics)
- Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model
- Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
Chart: GROMACS 512K H2O with RF speed-up. Series: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120P/X; 2 Intel® Xeon Phi™ coprocessors 7120P/X; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessors 7120P/X. Bar labels: 1, 1.03X, 1.72X, 1.56X, 1.79X.
Source: Intel measured results as of April 2014.
NWChem CCSD(T) Method (32-node cluster benchmark)
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: available from the SVN repository. Recipe: available.
Usage model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application for the NWChem community, including at cluster scale.
Results: Compared to the baseline of NWChem 6.3rev2 on the Intel® Xeon® processor E5-2697 v2:
1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2.
2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A.
Comparative performance (NWChem 6.3rev2 and 6.5, CCSD(T) method, 32-node speed-up):
- NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2: 1
- NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2: 1.24X
- NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessors 7120A: 1.52X
Source: Intel measured results as of July 2014.
NAMD 2.10 Pre-Release: STMV (32-node cluster benchmark)
Application: NAMD 2.10 pre-release, STMV
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available.
Usage model: Single rank on host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
NAMD 2.10 (pre-release) cluster performance increase, STMV (~1M atoms), at 1 node / 8 nodes / 32 nodes:
- Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN): 1 / 6.8X / 20X
- Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN): 1.2X / 7.9X / 24.2X
- Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T): 2X / 12.2X / 27.2X
- Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T): 2.1X / 13.1X / 32X
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
Source: Intel measured results as of September 2014.
NAMD 2.10 Pre-Release: ApoA1 (2-node cluster benchmark)
Application: NAMD 2.10 pre-release, ApoA1
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available.
Usage model: Single rank on host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
Chart: NAMD 2.10 (pre-release) cluster performance increase, ApoA1 (~92K atoms), 55 PPN, at 1 node and 2 nodes. Series: Intel® Xeon® processor E5-2697 v3 (baseline: 1 node, 55 PPN); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T). Bar labels: 1, 1.52X, 1.94X, 2.61X.
Source: Intel measured results as of September 2014.
LAMMPS Stillinger-Weber Water Benchmark (32-node cluster benchmark) (NEW)
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available.
Usage model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculations with the CPU.
Highlights:
- Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor 7120A
- Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculations of neighbor-list, non-bond, bond and long-range terms
- The same routines in the LAMMPS Intel Package also run faster on the CPU
Results: The simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup at higher node counts.
Chart: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision), at 1 node and 32 nodes. Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package). Bar labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X; one bar is marked "No testing on Tesla".
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Source: Intel measured results as of March 2015.
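The usage model on this slide describes a load balancer that divides force calculations between host and coprocessor so that both work concurrently. The deck does not show how LAMMPS implements this, so the following is a generic illustration of one way such a split could be tuned: shift work toward whichever side finished its share faster in the previous step. All names are hypothetical, and the linear-scaling assumption is stated in the code.

```python
def rebalance(offload_fraction: float, host_seconds: float, coproc_seconds: float) -> float:
    """Return the offload fraction that would equalize host and coprocessor time.

    Assumes each side's time scales linearly with the share of work it was
    given during the last step (an assumption of this sketch).
    """
    if not 0.0 < offload_fraction < 1.0:
        raise ValueError("offload_fraction must be strictly between 0 and 1")
    host_rate = (1.0 - offload_fraction) / host_seconds  # work units per second
    coproc_rate = offload_fraction / coproc_seconds
    return coproc_rate / (host_rate + coproc_rate)


# Example: a 50/50 split where the coprocessor finished in 2 s and the host in 3 s.
new_fraction = rebalance(0.5, host_seconds=3.0, coproc_seconds=2.0)
print(round(new_fraction, 2))  # 0.6
```

In this example, moving 60% of the work to the coprocessor would bring both sides to about 2.4 s per step, so neither waits for the other.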
LAMMPS Rhodopsin Benchmark, 512K Atoms (32-node cluster benchmark)
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available.
Usage model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculations with the CPU.
Highlights:
- Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel Xeon Phi coprocessor 7120A
- Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculations of neighbor-list, non-bond, bond and long-range terms
- The same routines in the LAMMPS Intel Package also run faster on the CPU
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. The gain holds at 1.47X when scaling up to 32 nodes.
LAMMPS Rhodopsin Benchmark Performance (Mixed Precision), at 1 node / 32 nodes:
- 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline): 1 / 1
- 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package): 1.27X / 1.07X
- 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, turbo off (LAMMPS IA Package): 1.68X / 1.47X
Source: Intel measured results as of August 2014.
Johns Hopkins Bowtie 2: Multiple Workloads (1 node) (NEW)
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: nvBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: available. Recipe: not available; check for future availability.
Usage model (workloads): ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 and ERR024139_1
Results: Bowtie2 running on the Intel® Xeon® processor E5-2697 v3 with the Intel® AVX2 port was faster than nvBowtie running on the Intel® Xeon® processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published data for the K40 compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
Chart: Johns Hopkins Bowtie 2 TGen workload speed-up (comparative increase). Workloads shown: ERR161544, SRR034966_1, ERR000589, SRR002273_1. Series: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3. Bar labels as extracted: 1, 1.87X, 1.59X, 1.08X, 88X.
Source: Intel measured results as of January 2015.
Burrows-Wheeler Aligner (BWA-ALN): Human Genome (1 node)
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. The workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: available. Recipe: available.
Usage model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The symmetric process on the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
BWA-ALN speed-up:
- 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN): 1
- 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN): 1.24X
- 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A: 1.86X
Source: Intel measured results as of January 2014.
BLAST: BLASTn v30 (1 node) (NEW)
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searching for alignments of nucleotide query sequences against a known nucleotide db volume set. National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available. Recipe: available.
Usage model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor to find the sweet spot for maximum speedup. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline, the simulation rate speed-up with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
Chart: BLASTn v30 speed-up. Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized. Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X.
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Source: Intel measured results as of March 2015.
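The BLASTn usage model describes distributing a concatenated set of 100 queries across the host processor and coprocessor(s) in fixed proportions, read here as 80/20 and 59/23/18. The deck does not show how the split was implemented, so the following is an illustrative sketch of the partitioning step only. `split_queries` is a hypothetical helper, the query names are placeholders, and which device receives which share is an assumption.

```python
from typing import List, Sequence


def split_queries(queries: Sequence[str], counts: Sequence[int]) -> List[List[str]]:
    """Partition `queries` into consecutive chunks with the given sizes."""
    if sum(counts) != len(queries):
        raise ValueError("chunk sizes must add up to the number of queries")
    chunks, start = [], 0
    for size in counts:
        chunks.append(list(queries[start:start + size]))
        start += size
    return chunks


# 100 placeholder query names, split 59/23/18 (host plus two devices, assumed).
queries = [f"query_{i:03d}" for i in range(100)]
host, device_a, device_b = split_queries(queries, [59, 23, 18])
print(len(host), len(device_a), len(device_b))  # 59 23 18
```

Finding the sweet spot then amounts to timing each candidate split and keeping the one with the shortest overall wall-clock time, which is what the repeated, randomized runs on the slide appear to do.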
BLAST: BLASTp v30 (1 node) (NEW)
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searching for alignments of protein query sequences against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available. Recipe: available.
Usage model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor to find the sweet spot for maximum speedup. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the simulation rate speed-up with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
Chart: BLASTp v30 speed-up. Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized. Bar labels: 1, 1.41X, 1.39X, 1.21X, 1.3X, 1.15X.
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Source: Intel measured results as of March 2015.
法務情報
本資料に記載されているすべての製品コンピューターシステム日付および数値は現在の予想に基づくものであり予告なく変更されることがあります インテルプロセッサーナンバーはパフォーマンスの指標ではありませんプロセッサーナンバーは 同一プロセッサーファミリー内の製品の機能を区別します異なるプロセッサーファミリー 間の機能の区別には用いません 詳細についてはhttpwwwintelcojpjpproductsprocessor_number を参照してください インテルreg プロセッサーチップセットおよびデスクトップボードにはエラッタと呼ばれる設計上の不具合が含まれている可能性があり公表されている仕様とは異なる動作をする場合があります現在確認済みのエラッタについてはインテルまでお問い合わせください インテルreg バーチャライゼーションテクノロジーを利用するには同テクノロジーに対応したインテルreg プロセッサーBIOSおよび仮想マシンモニター (VMM) を搭載したコンピューターシステムが必要です機能性性能もしくはその他のバーチャライゼーションテクノロジーの特長はご使用のハードウェアやソフトウェアの構成によって異なりますご利用になる OS によってはソフトウェアアプリケーションとの互換性がない場合があります各 PC メーカーにお問い合わせください 詳細についてはhttpwwwintelcojpcontentwwwjpjavirtualizationvirtualization-technologyhardware-assist-virtualization-technologyhtml を参照してください すべての条件下で絶対的なセキュリティーを提供できるコンピューターシステムはありませんインテルreg トラステッドエグゼキューションテクノロジー (インテルreg TXT) を利用するにはインテルreg バーチャライゼーションテクノロジーインテルreg TXT に対応したプロセッサーチップセットBIOSAuthenticated Code モジュールインテルreg TXT に対応した Measured Launched Environment (MLE) を搭載するコンピューターシステムが必要ですさらにインテルreg TXTを利用するにはシステムが TPM v1s を搭載している必要があります 詳細についてはhttpwwwintelcojpcontentwwwjpjadata-securitysecurity-overview-general-technologyhtml を参照してください インテルreg ターボブーストテクノロジーに対応したシステムが必要ですインテルreg ターボブーストテクノロジーおよびインテルreg ターボブーストテクノロジー 20 は一部のインテルreg プロセッサーでのみ利用可能です各 PC メーカーにお問い合わせください実際の性能はハードウェアソフトウェアシステム構成によって異なります詳細についてはhttpwwwintelcojpjptechnologyturboboost を参照してください インテルreg AES New Instructions (インテルreg AES-NI) を利用するにはインテルreg AES-NI に対応したプロセッサーを搭載したコンピューターシステムおよび命令を正しい手順で実行する他社製ソフトウェアが必要ですインテルreg AES-NI は一部のインテルreg プロセッサーで利用できます提供状況については各 PC メーカーなどにお問い合わせください詳細についてはhttpsoftwareintelcomen-usarticlesintel-advanced-encryption-standard-instructions-aes-ni (英語) を参照してください IntelインテルIntel ロゴIntel Inside ロゴXeonXeon InsideIntel Xeon Phi はアメリカ合衆国および またはその他の国における Intel Corporation の商標です copy 2012 Intel Corporation 無断での引用転載を禁じます
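Legal Disclaimers: Performance

Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in the design or configuration of system hardware or software may cause actual performance to differ from the published tests and ratings. When considering the purchase of systems or components, consult other sources of information to evaluate performance as a whole. For more information on the performance of Intel products, see http://www.intel.com/performance

Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages readers to visit the referenced Web sites, or others where similar performance benchmarks are reported, and to confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.

Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing the actual benchmark result of each platform by the actual benchmark result of the baseline platform, and assigning each platform a relative performance number proportional to the reported performance improvement.

SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English).

The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. Intel assumes no liability for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Results vary with these factors. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance as a whole.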
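The relative-performance rule in the disclaimer (baseline = 1.0, every other platform divided by the baseline result) can be sketched in a few lines of Python. The platform names and throughput values below are hypothetical placeholders, not measurements from this deck, and the sketch assumes a higher-is-better metric.

```python
def relative_performance(results, baseline):
    """Normalize raw benchmark results so the baseline platform scores 1.0."""
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    return {platform: value / base for platform, value in results.items()}


# Hypothetical throughput values (higher is better), for illustration only.
raw = {"baseline": 2.0, "optimized": 3.0, "optimized + coprocessor": 5.0}
relative = relative_performance(raw, "baseline")

for platform, score in relative.items():
    print(f"{platform}: {score:.2f}X")
```

Every "nX" bar label in the benchmark slides is a number of this kind, which is why the baseline bar is always labeled 1.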
A Product Portfolio Supporting the Advancement of Life Sciences
COMPUTE: Intel® Xeon®, Intel® Xeon Phi™
VISUALIZATION: Intel® iGFX, Intel® Xeon®, Intel® Xeon Phi™, Iris Pro™ Graphics, Embree Ray-Tracing
STORAGE: Intel® Lustre, Intel® SSD/NVMe, RAID Controller, Intel® Xeon®
NETWORKING: Intel® 10/40GbE, Intel® Switch Si
FABRIC: Intel® True Scale, Intel® Omni-Path, Intel® 10/40GbE
Intel® Software Developer Tools, Intel® Cluster Ready, Intel® Data Center Manager, Boards/Systems
IA Programming Model & Code Base: The Broadest Technical Computing Ecosystem
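Intel® Xeon® Processor E5 Family
The foundation of Intel HPC performance: ideal for nearly the full range of workloads
- Industry-leading performance and performance per watt
- A general-purpose processor for serial and parallel workloads, with a standard range of core counts and a focus on fast serial performance
www.intel.com/xeon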
Deep learning is a piece of cake, too
Intel® Xeon Phi™ Coprocessor 7120P
- 61 cores, 244 threads, 1.238 GHz
- 1.21 TFLOPS (peak double-precision floating-point performance), 512-bit SIMD instructions
- 32 vector registers per thread, 16 GB GDDR5 memory, 352 GB/s
- 300 W (passive cooling), PCIe x16 (requires an IA host processor)
- 22 nm with the world's first 3-D Tri-Gate transistors
- Linux operating system, IP addressable
- Common x86/IA programming models and SW tools
www.intel.com/xeonphi
Is Xeon Phi† performance compelling vs. Xeon† E5v2?
“2-socket Xeon E5v2 system” vs. “2-socket Xeon E5v2 system + Xeon Phi 7120”
Xeon Phi delivers up to 165 higher performance (with 1 card) versus 2-socket Xeon E5v2
http://www.intel.com/performance
†Xeon = Intel® Xeon® processor; †Xeon Phi = Intel® Xeon Phi™ coprocessor

Intel® Xeon Phi™ Product Family
Knights Corner
- 1 TFLOPS¹
Knights Landing
- 3+ TFLOPS²
- Processor (bootable)
- On-package high-bandwidth memory
- Integrated interconnect
- 2H'15: first commercial systems
- >50 systems providers expected³, plus many more card-based systems
- >100 PFLOPS customer system compute commits to date³
Knights Hill
- 3rd-generation Intel® Xeon Phi™ Product Family
- 2nd-generation Intel® Omni-Path Architecture
- 10 nm process technology
Intel® Xeon Phi™ Coprocessor - Applications and Solutions Catalog

1 Claim based on calculated theoretical peak double-precision performance capability for a single coprocessor: 16 DP FLOPS/clock/core × 61 cores × 1.238 GHz = 1.208 TeraFLOPS.
2 Over 3 teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency, and floating-point operations per cycle. FLOPS = cores × clock frequency × floating-point operations per cycle.
3 Intel internal estimate.
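As a quick check of footnote 1, the peak-FLOPS formula from footnote 2 (cores × clock frequency × floating-point operations per cycle) can be evaluated directly. This is a minimal sketch: the function name is illustrative, and the clock is read as 1.238 GHz, the value that reproduces the 1.208 TFLOPS in the footnote.

```python
def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak: cores x clock frequency x FLOPs per cycle per core."""
    return cores * clock_hz * flops_per_cycle


# Xeon Phi 7120P figures from the slide: 61 cores, 1.238 GHz,
# 16 double-precision FLOPs per clock per core.
knc_peak = peak_flops(cores=61, clock_hz=1.238e9, flops_per_cycle=16)
print(f"{knc_peak / 1e12:.3f} TFLOPS")  # prints "1.208 TFLOPS"
```

This is a theoretical ceiling only; the measured application speedups later in the deck are far below the raw FLOPS ratio between coprocessor and host.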
Intel® Omni-Path Architecture: Coming 2H'15
www.intel.com/omnipath
- 100 Gbps line speed
- 48-port switch chip architecture (vs. 36 in InfiniBand)
- 56% lower latency⁴ than InfiniBand (lower is better)
- Higher system scalability
- Higher application performance scalability
High port density, by cluster size:
- Small clusters: 1.3x. Maximize single-switch investment: 48 ports support up to 12 additional nodes by only adding cables¹
- Mainstream clusters: fewer switches, up to ½ fewer²
- Supercomputers: 2.3x. Scalable to over 27k nodes in a 2-tier, 5-hop fabric³

1 As compared to a shipping 36-port edge InfiniBand switch.
2 Reduction of up to ½ fewer switches: claim based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration, using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters.
3 2.3X based on 27,648 nodes in a cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes.
4 Latency reductions based on Mellanox CS7500 Director Switch and Mellanox SB7700/SB7790 Edge switches, compared to preliminary Intel simulations for Intel® Omni-Path switches, based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration (2-tier, 5 total switch hops), using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided to you for informational purposes. Any differences in your system hardware, software, or configuration may affect your actual performance.
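The node counts in footnote 3 match the textbook capacity of a three-level fat tree built from k-port switches, k³/4 end nodes, which corresponds to five switch hops end to end. The sketch below assumes that this is how the figures were derived; the footnote does not state the formula, and the function name is illustrative.

```python
def max_nodes_three_level_fat_tree(ports):
    """Maximum end nodes of a full-bisection three-level fat tree
    built from switches with `ports` ports each: ports**3 / 4."""
    return ports ** 3 // 4


omni_path_nodes = max_nodes_three_level_fat_tree(48)   # 48-port switch ASIC
infiniband_nodes = max_nodes_three_level_fat_tree(36)  # 36-port switch ASIC

print(omni_path_nodes, infiniband_nodes)
print(f"scalability ratio: {omni_path_nodes / infiniband_nodes:.2f}x")
print(f"port ratio: {48 / 36:.2f}x")
```

Under this assumption the formula yields 27,648 and 11,664 nodes, the same counts the footnote quotes, and the port ratio of 48 to 36 matches the 1.3x figure on the slide.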
Intel® EE for Lustre: Connectivity with Hadoop
Breakdown: open source; Intel proprietary extensions; connectors to Hadoop distributions
Lustre that can connect to Hadoop
ANL Selects Intel for World's Biggest Supercomputer
2-system CORAL award extends IA leadership in extreme-scale HPC
- Aurora, Argonne National Laboratory: >180 PF (April '15)
- Theta, Argonne National Laboratory: >8.5 PF
- Trinity, NNSA†: >40 PF (July '14)
- Cori, NERSC‡: >30 PF (April '14)
CORAL award: 2 systems, >$200M
‡ Cray XC Series at National Energy Research Scientific Computing Center (NERSC)
† Cray XC Series at National Nuclear Security Administration (NNSA)
The Most Advanced Supercomputer Ever Built
An Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
- >180 PFLOPS (option to increase up to 450 PF)
- >50,000 nodes
- 13 MW
- 2018 delivery
- 18X higher performance†
- >6X more energy efficient†
Prime Contractor / Subcontractor
Source: Argonne National Laboratory and Intel. †Comparison of theoretical peak double-precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PFs and 4.8 MW).
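The two dagger claims follow from the figures on the slide. The sketch below reproduces them, reading the MIRA power figure as 4.8 MW (the reading under which the ">6X" claim holds) and using the lower-bound Aurora numbers, so the results are floors rather than exact values.

```python
# Figures quoted on the slide (Aurora values are lower bounds).
aurora_pflops, aurora_mw = 180.0, 13.0
mira_pflops, mira_mw = 10.0, 4.8  # assumption: "48MW" read as 4.8 MW

# Peak performance ratio: 180 PF vs 10 PF.
performance_ratio = aurora_pflops / mira_pflops

# Energy efficiency ratio: PFLOPS per MW for each system.
efficiency_ratio = (aurora_pflops / aurora_mw) / (mira_pflops / mira_mw)

print(f"performance: {performance_ratio:.0f}X")
print(f"energy efficiency: {efficiency_ratio:.2f}X")
```

The performance ratio comes out at 18X and the efficiency ratio at roughly 6.6X, consistent with the "18X" and ">6X" labels.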
Application Support Status: Life Sciences
“Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being”
Wang Bingqiang, Head of High Performance Computing, BGI
AMBER 14 Particle Mesh Ewald (PME): Tobacco Virus, 1 Million Atoms (1 node)
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 187 of the manual).
Usage Model: Baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 plus the Intel® Xeon Phi™ coprocessor 7120A. Offload processing runs on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: The optimized Intel Xeon processor E5-2697 v3 with Intel Xeon Phi coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel Xeon processor E5-2697 v2. The optimized offload process demonstrated 1.07X the performance of the NVIDIA K40.
Chart: comparative performance, in legend order
- Intel® Xeon® processor E5-2697 v2 (baseline): 1
- Intel® Xeon® processor E5-2697 v2 (optimized): 1.52X
- Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A: 2X
- Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP: 2.26X
- Intel® Xeon® processor E5-2697 v3: 1.93X
- Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A: 2.41X
“Xeon E5-2697 v2/v3” = Intel® Xeon® processor E5-2697 v2/v3
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
Other names and brands may be claimed as the property of others.
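Because every bar on the AMBER chart is normalized to the same baseline, two bars can be compared with each other by dividing their speedups. The sketch below assumes the extracted labels "241X" and "226X" read as 2.41X (Xeon E5-2697 v3 + Xeon Phi 7120A) and 2.26X (Xeon E5-2697 v2 + NVIDIA K40); under that reading the quotient reproduces the 1.07X in the Results line.

```python
# Speedups over the same baseline (Xeon E5-2697 v2); decimal placement and
# bar-to-series mapping are assumptions based on the legend order.
speedup_v3_plus_phi = 2.41
speedup_v2_plus_k40 = 2.26

# Ratio of two baseline-relative speedups = direct relative performance.
phi_vs_k40 = speedup_v3_plus_phi / speedup_v2_plus_k40
print(f"{phi_vs_k40:.2f}X")  # prints "1.07X"
```

Note that the comparison mixes host generations (v3 host with the coprocessor, v2 host with the GPU), so the ratio reflects the whole node rather than the accelerators alone.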
AMBER 14 Particle Mesh Ewald (PME): Cellulose NPT (3 nodes, cluster benchmark)
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 187 of the manual).
Usage Model: Baseline is the Intel® Xeon® processor E5-2697 v2, host only (also measured in http://ambermd.org/gpus/benchmarks.htm#Benchmarks). Speedup is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized and will be delivered to the AMBER community (license holders), available as an update patch during code configuration.
Results: The optimized offload process demonstrated compelling cluster performance improvement, up to 26X over the baseline Intel® Xeon® processor E5-2697 v2.
Chart: AMBER PME Cellulose NPT (408K atoms), comparative performance at 2 nodes and 3 nodes
Series: Intel® Xeon® processor E5-2697 v2 (baseline); Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 + NVIDIA K40 DPFP
Bar labels: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X
“Xeon E5-2697 v2” = Intel® Xeon® processor E5-2697 v2
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
GROMACS: 512K H2O with RF (1 node)
Application: GROMACS 5.0-RC1. Workload: 512K H2O with RF method.
Description: GROMACS is a versatile package to perform molecular dynamics, i.e., to simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: version 5.0-rc1 available here and here. Recipe: available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
Chart: GROMACS 512K H2O with RF speedup, comparative performance
Series: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120PX; 2 Intel® Xeon Phi™ coprocessor 7120PX; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120PX; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120PX
Bar labels: 1, 1.03X, 1.72X, 1.56X, 1.79X
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF APRIL 2014
NWChem CCSD(T) Method (32 nodes, cluster benchmark)
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: available here and from the SVN repository. Recipe: available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling cluster application for the NWChem community.
Results: Compared to the baseline of NWChem 6.3rev2 on the Intel® Xeon® processor E5-2697 v2:
1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2.
2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A.
Chart: NWChem 6.3rev2 and 6.5 CCSD(T) method, 32-node speedup, comparative performance
- NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2: 1
- NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2: 1.24X
- NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A: 1.52X
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF JULY 2014
NAMD 2.10 Pre-Release: STMV (32 nodes, cluster benchmark)
Application: NAMD 2.10 pre-release, STMV
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available here.
Usage Model: Single rank on host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 with the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
Chart: NAMD 2.10 (pre-release) cluster performance increase, STMV (~1M atoms), comparative performance at 1 node, 8 nodes, and 32 nodes
Series: Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T)
Bar labels: 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X
“Xeon E5-2697 v2/v3” = Intel® Xeon® processor E5-2697 v2/v3
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
NAMD 2.10 Pre-Release: ApoA1 (2 nodes, cluster benchmark)
Application: NAMD 2.10 pre-release, ApoA1
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available here.
Usage Model: Single rank on host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
Chart: NAMD 2.10 (pre-release) cluster performance increase, ApoA1 (~92K atoms), 55 PPN, comparative performance at 1 node and 2 nodes
Series: Intel® Xeon® processor E5-2697 v3 (baseline: 1 node, 55 PPN); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T)
Bar labels: 1, 1.52X, 1.94X, 2.61X
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
LAMMPS Stillinger-Weber Water Benchmark (32 nodes, cluster benchmark)
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: The load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent:
- data transfer between host and coprocessor
- calculations of neighbor-list, non-bond, bond, and long-range terms
The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: The simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup and higher node counts.
Chart: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision), comparative performance at 1 node and 32 nodes
Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package)
Bar labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X; no testing on Tesla
“Xeon E5-2697 v3” = Intel® Xeon® processor E5-2697 v3
“Xeon Phi 7120A” = Intel® Xeon Phi™ coprocessor 7120A
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
LAMMPS Rhodopsin Benchmark: 512K Atoms (32 nodes, cluster benchmark)
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: The load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows for concurrent:
- data transfer between host and coprocessor
- calculations of neighbor-list, non-bond, bond, and long-range terms
The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. Performance gains hold at 1.47X when scaling up to 32 nodes.
Chart: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision), comparative performance
1 node:
- 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline): 1
- 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package): 1.27X
- 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, turbo off (LAMMPS IA Package): 1.68X
32 nodes:
- 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline): 1
- 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package): 1.07X
- 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, turbo off (LAMMPS IA Package): 1.47X
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF AUGUST 2014
Johns Hopkins Bowtie 2: Multiple Workloads (1 node)
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: NVBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: available here. Recipe: not available. Check for future availability here.
Usage Model: Workloads ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1.
Highlights: See more here.
Results: Bowtie2 with the Intel® AVX2 port, running on the Intel® Xeon® processor E5-2697 v3, is faster than NVBowtie running on the Intel® Xeon® processor E5-2697 v2 with the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published K40 data compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
Chart: Johns Hopkins Bowtie 2 TGen workload speedup, comparative increase for workloads ERR161544, SRR034966_1, ERR000589, and SRR002273_1
Series: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3
Bar labels: 1, 1.87X, 1.59X, 1.08X, 88X
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2015
Burrows-Wheeler Aligner (BWA-ALN): Human Genome (1 node)
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: available here. Recipe: available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The symmetric process on the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
Chart: BWA-ALN speedup, comparative performance
- 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN): 1
- 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN): 1.24X
- 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A: 1.86X
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2014
BLAST: BLASTn v30 (1 node)
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searching for alignments of nucleotide query sequences against a known nucleotide db volume set. National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor to find the sweet spot for maximum speedup. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
Chart: BLASTn v30 speedup, comparative performance
Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X
“Xeon E5-2697 v2/v3” = Intel® Xeon® processor E5-2697 v2/v3
“Xeon Phi 7120A” = Intel® Xeon Phi™ coprocessor 7120A
For configuration details go here
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
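The load-sharing usage model for BLASTn partitions a fixed set of queries between the host processor and the coprocessor(s). The following is a rough sketch of such a partition, assuming the garbled split figures read as query counts of 80/20 and 59/23/18 out of 100. The function name and query IDs are hypothetical, and the slide does not say how the real harness assigns queries to devices.

```python
from itertools import accumulate


def split_by_counts(queries, counts):
    """Partition `queries` into consecutive chunks of the given sizes."""
    if sum(counts) != len(queries):
        raise ValueError("counts must add up to the number of queries")
    bounds = [0, *accumulate(counts)]
    return [queries[start:end] for start, end in zip(bounds, bounds[1:])]


# Hypothetical query identifiers standing in for the 100 NCBI queries.
queries = [f"query_{i:03d}" for i in range(100)]

# Two-way split: 80 queries for the host, 20 for one coprocessor.
host_part, coprocessor_part = split_by_counts(queries, (80, 20))

# Three-way split, read here as 59/23/18.
three_way = split_by_counts(queries, (59, 23, 18))

print(len(host_part), len(coprocessor_part))
print([len(part) for part in three_way])
```

In practice the split is a tuning parameter: the slide reports that throughput has only a small sweet spot, so the counts would be found by repeated runs rather than fixed in advance.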
0
1
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
24
BLAST: BLASTp v30
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searches for alignments of protein query sequences against a known protein database volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor at the split that gives the maximum speedup (the sweet spot). The experiment was repeated 20 times with the pick of queries randomized, giving sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline rate, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
Source: Intel measured results as of March 2015.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
BLASTp v30 Speed Up (1 node)
[Chart: y-axis is comparative performance; bar labels as extracted: 1.41X, 1, 1.39X, 1.21X, 1.3X, 1.15X]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for currently characterized errata.
Intel® Virtualization Technology requires a computer system with an Intel® processor, BIOS, and virtual machine monitor (VMM) that support the technology. Functionality, performance, and other benefits vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). Intel® TXT also requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html
Intel® Turbo Boost Technology requires a system that supports it. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English).
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries.
© 2012 Intel Corporation. All rights reserved.
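Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, see http://www.intel.com/performance
Intel does not control or audit the design or implementation of third-party benchmarks or websites referenced in this document. Intel encourages readers to visit the referenced websites, or others where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.
Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, then dividing each platform's benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number that correlates with the reported performance improvement.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English).
TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English).
SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English).
The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. Intel assumes no liability whatsoever for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Results vary with these factors. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance as a whole.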
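The relative-performance rule in the disclaimer (the baseline gets 1.0, and every other result is divided by the baseline result) can be sketched in a few lines of Python. This is a minimal illustration, not Intel's tooling; the function name and the scores are hypothetical and assume a higher-is-better metric.

```python
def relative_performance(results, baseline):
    """Normalize benchmark scores so the baseline platform equals 1.0.

    `results` maps a platform name to a higher-is-better score;
    `baseline` is the key of the platform used as the reference.
    """
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline score must be positive")
    return {name: score / base for name, score in results.items()}


# Hypothetical scores in arbitrary units (not taken from the deck).
scores = {"baseline": 50.0, "platform_a": 76.0, "platform_b": 120.5}
rel = relative_performance(scores, "baseline")
for name, value in rel.items():
    print(f"{name}: {value:.2f}X")
```

For a lower-is-better metric such as run time, the ratio would be inverted (baseline time divided by platform time).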
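Intel® Xeon® Processor E5 Family
The foundation of Intel's HPC performance; ideal for nearly the full range of workloads
Industry-leading performance and performance per watt
A general-purpose processor for serial and parallel workloads, with a standard range of core counts and a focus on fast serial performance
www.intel.com/xeon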
Deep learning is a piece of cake, too
Intel® Xeon Phi™ Coprocessor 7120P
61 cores, 244 threads, 1.238 GHz
1.21 TFLOPS (peak double-precision floating-point performance); 512-bit SIMD instructions
32 vector registers per thread; 16 GB GDDR5 memory, 352 GB/s
300 W (passive cooling); PCIe x16 (requires an IA host processor)
22 nm with the world's first 3-D Tri-Gate transistors
Linux operating system, IP addressable
Common x86/IA programming models and SW tools
www.intel.com/xeonphi
Is Xeon Phi† performance compelling vs. Xeon† E5 v2? "2-socket Xeon E5 v2 system" vs. "2-socket Xeon E5 v2 system + Xeon Phi 7120"
Xeon Phi delivers up to 165 higher performance (with 1 card) versus 2-socket Xeon E5 v2
http://www.intel.com/performance
† Xeon = Intel® Xeon® processor; † Xeon Phi = Intel® Xeon Phi™ coprocessor
Intel® Xeon Phi™ Product Family
Knights Corner: Intel® Xeon Phi™ product family
Knights Landing: 3+ TFLOPS²; first commercial systems in 2H'15; >50 systems providers expected³, plus many more card-based systems
Knights Hill: 3rd-generation Intel® Xeon Phi™ product family; 2nd-generation Intel Omni-Path Architecture; 10 nm process technology
Intel® Xeon Phi™ Coprocessor – Applications and Solutions Catalog
1. Claim based on calculated theoretical peak double-precision performance capability for a single coprocessor: 16 DP FLOPS/clock/core × 61 cores × 1.238 GHz = 1.208 TeraFLOPS.
2. Over 3 teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency, and floating-point operations per cycle. FLOPS = cores × clock frequency × floating-point operations per cycle.
3. Intel internal estimate.
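The peak-FLOPS formula in these footnotes is a plain product, so it is easy to check. The sketch below uses the 7120P figures from the spec slide (61 cores, 1.238 GHz, 16 double-precision FLOPS per clock per core); the function name is illustrative only.

```python
def peak_dp_gflops(cores, clock_ghz, dp_flops_per_clock_per_core):
    """Theoretical peak = cores x clock (GHz) x DP FLOPS per clock per core, in GFLOPS."""
    return cores * clock_ghz * dp_flops_per_clock_per_core


# Knights Corner 7120-class coprocessor, figures as quoted in footnote 1.
knc_gflops = peak_dp_gflops(cores=61, clock_ghz=1.238, dp_flops_per_clock_per_core=16)
print(f"{knc_gflops / 1000:.3f} TFLOPS")
```

This is a theoretical ceiling; the measured application speedups elsewhere in the deck are much smaller than the raw FLOPS ratio would suggest.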
Knights Landing: processor (bootable); on-package high-bandwidth memory; integrated interconnect
Knights Corner: 1 TFLOPS¹
>100 PFLOPS of customer system compute commitments to date³
Intel® Omni-Path Architecture: Coming 2H'15
56% lower latency⁴ than InfiniBand (lower is better)
100 Gbps line speed
48-port switch chip architecture, vs. 36 in InfiniBand
Higher system scalability; higher application performance scalability
Small clusters: 1.3x higher port density. Maximize single-switch investment: 48 ports support up to 12 additional nodes by only adding cables¹
Mainstream clusters: up to ½ fewer switches²
Supercomputers: 2.3x more scalable. Over 27k nodes in a 2-tier, 5-hop fabric³
www.intel.com/omnipath
1. As compared to a shipping 36-port edge InfiniBand switch.
2. The claim of up to ½ fewer switches is based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters.
3. 2.3X is based on 27,648 nodes in a cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes.
4. Latency reductions are based on the Mellanox CS7500 Director Switch and Mellanox SB7700/SB7790 Edge switches compared to preliminary Intel simulations for Intel® Omni-Path switches, based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration (2-tier, 5 total switch hops) using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided for informational purposes. Any differences in your system hardware, software, or configuration may affect your actual performance.
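The node counts in footnote 3 match the textbook capacity of a full-bisection fat tree built from k-port switch ASICs, 2 × (k/2)^L end nodes for L switch levels, with L = 3. Whether Intel derived the figures this way is an assumption; the sketch below only shows that the arithmetic is consistent.

```python
def fat_tree_max_nodes(ports: int, levels: int) -> int:
    """End nodes supported by a full-bisection fat tree with `levels` switch levels.

    Every switch below the top level uses half its ports downward and half
    upward; top-level switches use all ports downward.
    """
    if ports % 2 != 0 or levels < 1:
        raise ValueError("expects an even port count and at least one level")
    return 2 * (ports // 2) ** levels


opa_nodes = fat_tree_max_nodes(ports=48, levels=3)  # 48-port switch ASIC
ib_nodes = fat_tree_max_nodes(ports=36, levels=3)   # 36-port switch ASIC
print(opa_nodes, ib_nodes, round(opa_nodes / ib_nodes, 2))
```

With two levels the same formula gives 1,152 nodes for 48 ports and 648 for 36 ports, which suggests why a 1024-node full-bisection cluster may need an extra switch level, and therefore more switches, with 36-port ASICs.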
Intel® EE for Lustre: Connectivity with Hadoop
Breakdown: open source; Intel proprietary extensions; connectors to Hadoop distributions
Lustre that can connect to Hadoop
ANL Selects Intel for World's Biggest Supercomputer
2-system CORAL award (>$200M) extends IA leadership in extreme-scale HPC
Aurora, Argonne National Laboratory: >180 PF (April '15)
Theta, Argonne National Laboratory: >8.5 PF
Trinity, NNSA†: >40 PF (July '14)
Cori, NERSC‡: >30 PF (April '14)
† Cray XC Series at National Nuclear Security Administration (NNSA). ‡ Cray XC Series at National Energy Research Scientific Computing Center (NERSC).
The Most Advanced Supercomputer Ever Built
An Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
>180 PFLOPS (option to increase up to 450 PF)
>50,000 nodes
13 MW
2018 delivery
18X higher performance†
>6X more energy efficient†
Prime contractor: Intel. Subcontractor: Cray.
Source: Argonne National Laboratory and Intel. † Comparison of theoretical peak double-precision FLOPS and power consumption to ANL's largest current system, Mira (10 PF and 4.8 MW).
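The two headline ratios follow directly from the quoted figures. The sketch below takes Aurora as 180 PF at 13 MW and Mira as 10 PF at 4.8 MW; reading Mira's power as 4.8 MW is an assumption about the intended decimal point, and the function name is illustrative.

```python
def compare_systems(new_pflops, new_mw, old_pflops, old_mw):
    """Return (performance ratio, energy-efficiency ratio) of new vs. old system."""
    perf_ratio = new_pflops / old_pflops
    # Energy efficiency is peak PFLOPS per megawatt.
    efficiency_ratio = (new_pflops / new_mw) / (old_pflops / old_mw)
    return perf_ratio, efficiency_ratio


perf_ratio, efficiency_ratio = compare_systems(
    new_pflops=180.0, new_mw=13.0,  # Aurora, as quoted on the slide
    old_pflops=10.0, old_mw=4.8,    # Mira, as read from the footnote
)
print(f"{perf_ratio:.0f}X performance, {efficiency_ratio:.1f}X energy efficiency")
```

Under these inputs the efficiency ratio comes out between 6 and 7, in line with the slide's ">6X" wording.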
Application Support Status: Life Sciences
"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being"
Wang Bingqiang, Head of High Performance Computing, BGI
AMBER 14 Particle Mesh Ewald (PME): Tobacco Virus, 1 Million Atoms (1 node)
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 187 of the manual).
Usage Model: Baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 plus the Intel® Xeon Phi™ coprocessor 7120A. Offload processing runs on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: The optimized Intel Xeon processor E5-2697 v3 with Intel Xeon Phi coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel Xeon processor E5-2697 v2. The optimized offload process demonstrated 1.07X the performance of the NVIDIA K40.
[Chart: AMBER 14 PME Tobacco Virus; y-axis is comparative performance; bar labels as extracted: 1, 1.52X, 2X, 2.26X, 1.93X, 2.41X]
Legend: Intel® Xeon® processor E5-2697 v2 (baseline); Intel® Xeon® processor E5-2697 v2 (optimized); Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP; Intel® Xeon® processor E5-2697 v3; Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
Source: Intel measured results as of September 2014.
AMBER 14 Particle Mesh Ewald (PME): Cellulose NPT, 408K Atoms (3-node cluster benchmark)
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 187 of the manual).
Usage Model: Baseline is the Intel® Xeon® processor E5-2697 v2, host only (also measured at http://ambermd.org/gpus/benchmarks.htm#Benchmarks). The speedup is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized and will be delivered to the AMBER community (license holders) as an update patch available during code configuration.
Results: The optimized offload process demonstrated a cluster performance improvement of up to 2.6X over the baseline Intel® Xeon® processor E5-2697 v2.
[Chart: AMBER PME Cellulose NPT (408K Atoms); y-axis is comparative performance; x-axis groups as extracted: 2 nodes, 3 nodes; bar labels as extracted: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X]
Legend: Intel® Xeon® processor E5-2697 v2 (baseline); Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 + NVIDIA K40 DPFP
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2
Source: Intel measured results as of September 2014.
GROMACS: 512K H2O with RF (1 node)
Application: GROMACS 5.0-RC1. Workload: 512K H2O with the RF method.
Description: GROMACS is a versatile package for molecular dynamics, i.e., simulating the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: version 5.0-rc1 available here and here. Recipe: available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
[Chart: GROMACS 512K H2O with RF Speed Up; y-axis is comparative performance; bar labels as extracted: 1, 1.03X, 1.72X, 1.79X, 1.56X]
Legend: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120PX; 2 Intel® Xeon Phi™ coprocessor 7120PX; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120PX; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120PX
Source: Intel measured results as of April 2014.
NWChem 6.3rev2 and 6.5: CCSD(T) Method, 32-Node Speed Up (cluster benchmark)
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: available here and from the SVN repository. Recipe: available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application for the NWChem community, including at cluster scale.
Results: Compared to the baseline of NWChem 6.3rev2 on the Intel® Xeon® processor E5-2697 v2: 1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2; 2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A.
[Chart: NWChem CCSD(T) Method, 32 nodes; y-axis is comparative performance; bar labels as extracted: 1, 1.24X, 1.52X]
Legend: NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A 2
Source: Intel measured results as of July 2014.
NAMD 2.10 Pre-Release: STMV (32-node cluster benchmark)
Application: NAMD 2.10 pre-release, STMV
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available here.
Usage Model: Single rank on the host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 with the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms); y-axis is comparative performance, 0 to 30; x-axis groups: 1 node, 8 nodes, 32 nodes; bar labels as extracted: 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X]
Legend: Intel® Xeon® processor E5-2697 v2 (baseline, 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T)
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
Source: Intel measured results as of September 2014.
NAMD 2.10 Pre-Release: ApoA1 (2-node cluster benchmark)
Application: NAMD 2.10 pre-release, ApoA1
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available here.
Usage Model: Single rank on the host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN; y-axis is comparative performance; x-axis groups: 1 node, 2 nodes; bar labels as extracted: 1, 1.52X, 1.94X, 2.61X]
Legend: Intel® Xeon® processor E5-2697 v3 (baseline, 1 node, 55 PPN); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T)
Source: Intel measured results as of September 2014.
LAMMPS: Stillinger-Weber Water Benchmark (32-node cluster benchmark)
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows data transfer between host and coprocessor to run concurrently with calculations of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: The simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup at higher node counts.
[Chart: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision); y-axis is comparative performance; x-axis groups: 1 node, 32 nodes; bar labels as extracted: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X; no testing on Tesla for one group]
Legend: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package)
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Source: Intel measured results as of March 2015.
LAMMPS: Rhodopsin Benchmark, 512K Atoms (32-node cluster benchmark)
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows data transfer between host and coprocessor to run concurrently with calculations of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. Performance gains hold at 1.47X when scaling up to 32 nodes.
[Chart: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision); y-axis is comparative performance; x-axis groups: 1 node, 32 nodes; bar labels as extracted: 1, 1.27X, 1.68X, 1, 1.07X, 1.47X]
Legend: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, turbo off (LAMMPS IA Package)
Source: Intel measured results as of August 2014.
Johns Hopkins Bowtie 2: Multiple Workloads (1 node)
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: nvBowtie (version 0.9.9.3) is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: available here. Recipe: not available; check for future availability here.
Usage Model: ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1
Highlights: See more here.
Results: Bowtie2 with the Intel® AVX2 port running on the Intel® Xeon® processor E5-2697 v3 is faster than nvBowtie running on the Intel® Xeon® processor E5-2697 v2 with the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published data for the K40 compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
[Chart: Johns Hopkins Bowtie 2 TGen Workload Speed Up; y-axis is comparative increase; x-axis workloads: ERR161544, SRR034966_1, ERR000589, SRR002273_1; bar labels as extracted: 1, 1.87X, 1.59X, 1.08X, 0.88X]
Legend: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3
Source: Intel measured results as of January 2015.
Burrows-Wheeler Aligner (BWA-ALN): Human Genome (1 node)
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: available here. Recipe: available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The symmetric process on the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
[Chart: BWA-ALN Speed Up; y-axis is comparative performance; bar labels as extracted: 1, 1.24X, 1.86X]
Legend: 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A
Source: Intel measured results as of January 2014.
BLAST: BLASTn v30
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searches for alignments of nucleotide query sequences against a known nucleotide database volume set from the National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor at the split that gives the maximum speedup (the sweet spot). The experiment was repeated 20 times with the pick of queries randomized, giving sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline rate, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
Legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
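The load-sharing usage model above amounts to shuffling a query set and cutting it into shares by a fixed ratio. The sketch below is one hypothetical way to do that in Python, reading the split figures as 80/20 and 59/23/18 (each sums to the 100 queries). The function, the seed, and the query names are illustrative and are not part of the BLAST recipe.

```python
import random


def split_by_ratio(items, weights):
    """Cut `items` into consecutive shares proportional to integer `weights`."""
    total = sum(weights)
    n = len(items)
    bounds, acc = [0], 0
    for w in weights:
        acc += w
        # Cumulative boundaries avoid rounding drift across shares.
        bounds.append(round(n * acc / total))
    return [items[bounds[i]:bounds[i + 1]] for i in range(len(weights))]


queries = [f"query_{i:03d}" for i in range(100)]  # placeholder query IDs
random.Random(0).shuffle(queries)                 # randomized pick, fixed seed

two_way = split_by_ratio(queries, (80, 20))
three_way = split_by_ratio(queries, (59, 23, 18))
print([len(s) for s in two_way], [len(s) for s in three_way])
```

In practice the best ratio would be found empirically, as the slide's repeated randomized runs suggest, since it depends on the query mix and on how fast each device processes its share.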
Other names and brands may be claimed as the property of others SOURCE INTEL MEASURED RESULTS AS OF MARCH 2015
BLASTn v30 Speed Up
1 NODE
For configuration details go here
Co
mp
ara
tiv
e P
erf
orm
an
ce
1
152X
APPROVED FOR PUBLIC PRESENTATION
149X
122X
141X
126X
NEW
Chart legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
BLAST: BLASTp v30
Other names and brands may be claimed as the property of others. Source: Intel measured results as of March 2015.
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searching for alignment in protein query sequences against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: Available here. Recipe: Available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor for the maximum-speedup sweet spot. The experiment was repeated 20 times, with the pick of queries randomized, for a sweet spot split 337 and 2857.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline simulation rate, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
[Chart: BLASTp v30 Speed Up, 1 node. Y-axis: comparative performance. Bar labels shown: 1.41X, 1, 1.39X, 1.21X, 1.3X, 1.15X. For configuration details go here.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for currently characterized errata.
Intel® Virtualization Technology requires a computer system with an Intel® processor, BIOS, and virtual machine monitor (VMM) that support the technology. Functionality, performance, or other benefits of virtualization technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). In addition, Intel® TXT requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html
A system that supports Intel® Turbo Boost Technology is required. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English).
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. All rights reserved.
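Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, see http://www.intel.com/performance
Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages readers to visit the referenced Web sites, or others where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.
Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing the actual benchmark result for each platform by the actual benchmark result for the baseline platform, and assigning a relative performance number that correlates with the performance improvements reported.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English).
The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. Intel assumes no liability for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.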
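The relative-performance rule in this disclaimer (the baseline platform is assigned 1.0 and every other result is divided by the baseline result) can be sketched as follows. The function name and the throughput figures are hypothetical placeholders, not measured data, and the sketch assumes a higher-is-better metric.

```python
def relative_performance(results, baseline):
    """Divide each platform's result by the baseline platform's result."""
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    return {name: value / base for name, value in results.items()}

# Hypothetical throughput numbers (higher is better), for illustration only.
raw = {"baseline": 40.0, "platform_a": 50.0, "platform_b": 60.0}

rel = relative_performance(raw, "baseline")
for name, score in rel.items():
    print(f"{name}: {score:.2f}X")
```

For a lower-is-better metric such as run time, the ratio would be inverted (baseline time divided by platform time) so that larger numbers still mean faster.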
1 Claim based on calculated theoretical peak double-precision performance capability for a single coprocessor: 16 DP FLOPS/clock/core × 61 cores × 1.238 GHz = 1.208 TeraFLOPS.
2 Over 3 teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency, and floating-point operations per cycle. FLOPS = cores × clock frequency × floating-point operations per cycle.
3 Intel internal estimate.
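The peak-FLOPS formula in these footnotes (cores × clock frequency × floating-point operations per cycle) can be checked with a few lines of arithmetic. The sketch below assumes the 1.238 GHz clock listed on the Xeon Phi 7120P specification slide; the function name is illustrative.

```python
def peak_dp_tflops(cores, clock_ghz, dp_flops_per_clock_per_core):
    """Theoretical peak double-precision TFLOPS for one device."""
    gflops = cores * clock_ghz * dp_flops_per_clock_per_core
    return gflops / 1000.0  # GFLOPS -> TFLOPS

# Xeon Phi 7120-class figures from the footnote: 61 cores, 16 DP FLOPS/clock/core.
knc = peak_dp_tflops(cores=61, clock_ghz=1.238, dp_flops_per_clock_per_core=16)
print(f"{knc:.3f} TFLOPS")
```

This is a theoretical ceiling; measured application throughput depends on vectorization, memory bandwidth, and how well all cores are kept busy.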
- Processor (bootable) - High-bandwidth memory on package - Integrated interconnect
1 TFLOPS¹
>100 PFLOPS customer system compute commits to date³
Intel® Omni-Path Architecture: coming 2H'15
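Intel® Omni-Path Architecture compared with InfiniBand:
- Lower latency: 56% lower latency⁴ (lower is better)
- 100 Gbps line speed
- 48-port switch chip architecture (vs. 36 in InfiniBand)
- Higher system scalability and higher application performance scalability
- Higher port density: 1.3x. Maximize single-switch investment: 48 ports support up to 12 additional nodes by only adding cables¹
- Fewer switches²: up to ½
- Scalable: 2.3x. Over 27k nodes in a 2-tier, 5-hop fabric³
- Small clusters, mainstream clusters, supercomputers
www.intel.com/omnipath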
1 As compared to a shipping 36-port edge InfiniBand switch.
2 Reduction of up to ½ fewer switches: claim based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration, using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters.
3 2.3X based on 27,648 nodes in a cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes.
4 Latency reductions based on Mellanox CS7500 Director Switch and Mellanox SB7700/SB7790 Edge switches compared to preliminary Intel simulations for Intel® Omni-Path switches, based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration (2-tier, 5 total switch hops), using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided to you for informational purposes. Any differences in your system hardware, software, or configuration may affect your actual performance.
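The node counts in footnote 3 are consistent with the textbook capacity of a full-bisection fat tree built from three levels of radix-k switches, k³/4 end nodes, whose longest path crosses five switches. The sketch below applies that formula; treating it as the derivation behind the slide's figures is an assumption, since the slide does not state how the numbers were obtained, and the function names are illustrative.

```python
def max_nodes_three_level_fat_tree(ports):
    """End nodes supported by a full-bisection fat tree with three switch levels."""
    return ports ** 3 // 4

def max_switch_hops(levels):
    """Switches crossed on the longest path: up to the top level and back down."""
    return 2 * levels - 1

opa_nodes = max_nodes_three_level_fat_tree(48)   # 48-port switch ASIC
ib_nodes = max_nodes_three_level_fat_tree(36)    # 36-port switch ASIC
scale_ratio = opa_nodes / ib_nodes
port_ratio = 48 / 36

print(opa_nodes, ib_nodes, round(scale_ratio, 2), round(port_ratio, 2))
print(max_switch_hops(3))
```

The cubic dependence on switch radix is why a 1.33x increase in port count translates into roughly 2.4x more supported nodes.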
Intel® EE for Lustre: Hadoop connectivity
Legend: open source; Intel proprietary extensions; connectors to Hadoop distributions
Lustre that can connect to Hadoop
ANL Selects Intel for World's Biggest Supercomputer
2-system CORAL award extends IA leadership in extreme-scale HPC (2 systems, >$200M)
- Aurora, Argonne National Laboratory: >180 PF (April '15)
- Theta, Argonne National Laboratory: >8.5 PF
- Trinity, NNSA†: >40 PF (July '14)
- Cori, NERSC‡: >30 PF (April '14)
‡ Cray XC Series at National Energy Research Scientific Computing Center (NERSC). † Cray XC Series at National Nuclear Security Administration (NNSA).
The Most Advanced Supercomputer Ever Built
An Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
- >180 PFLOPS (option to increase up to 450 PF)
- >50,000 nodes
- 13 MW
- 2018 delivery
- 18X higher performance†
- >6X more energy efficient†
[Logos: Prime Contractor; Subcontractor]
Source: Argonne National Laboratory and Intel. †Comparison of theoretical peak double-precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PFs and 4.8 MW).
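The two dagger claims (18X higher performance, more than 6X better energy efficiency) follow from the peak FLOPS and power figures on this slide. The sketch below reproduces them, reading the MIRA power figure as 4.8 MW; that reading is an assumption about a decimal point lost in extraction, and the variable names are illustrative.

```python
# Figures from the slide: Aurora >180 PF at 13 MW, MIRA 10 PF at 4.8 MW (assumed).
aurora_pf, aurora_mw = 180.0, 13.0
mira_pf, mira_mw = 10.0, 4.8

perf_ratio = aurora_pf / mira_pf                                   # peak FLOPS ratio
efficiency_ratio = (aurora_pf / aurora_mw) / (mira_pf / mira_mw)   # PF per MW ratio

print(f"performance: {perf_ratio:.0f}X")
print(f"energy efficiency: {efficiency_ratio:.1f}X")
```

Both ratios compare theoretical peaks, so they bound rather than predict the gain an application would see.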
Application support status: Life Sciences
"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being"
Wang Bingqiang, Head of High Performance Computing, BGI
Chart legend: Intel® Xeon® processor E5-2697 v2 (baseline); Intel® Xeon® processor E5-2697 v2 (optimized); Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP; Intel® Xeon® processor E5-2697 v3; Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A
Source: Intel measured results as of September 2014.
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: Available as a patch. Recipe: Available here (Section 18.7 of the manual).
Usage Model: Baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Offload processing runs on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: The optimized Intel Xeon processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel Xeon processor E5-2697 v2. The optimized offload process demonstrated 1.07X increased performance compared to NVIDIA K40 performance.
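The 1.07X figure can be cross-checked against the chart's bar labels. The sketch below assumes the labels read 2.41X for the Xeon E5-2697 v3 plus Xeon Phi configuration and 2.26X for the NVIDIA K40 configuration; both the restored decimal points and the assignment of 2.26X to the K40 bar are inferences from the chart's legend order, not statements on the slide.

```python
# Speedups over the Xeon E5-2697 v2 baseline, as read from the chart (assumed mapping).
speedup_xeon_v3_plus_phi = 2.41
speedup_xeon_v2_plus_k40 = 2.26

phi_vs_k40 = speedup_xeon_v3_plus_phi / speedup_xeon_v2_plus_k40
print(f"{phi_vs_k40:.2f}X")
```

Because both speedups share the same baseline, their quotient is a direct comparison between the two accelerated configurations.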
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
[Chart: AMBER 14 Particle Mesh Ewald (PME), Tobacco Virus (1 million atoms), 1 node. Y-axis: comparative performance. Bar labels shown: 1, 1.52X, 2X, 2.26X, 1.93X, 2.41X. For configuration details go here.]
Other names and brands may be claimed as the property of others.
Chart x-axis: 2 nodes, 3 nodes
Chart legend: Intel® Xeon® processor E5-2697 v2 (baseline); Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 + NVIDIA K40 DPFP
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2
Source: Intel measured results as of September 2014.
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: Available as a patch. Recipe: Available here (Section 18.7 of the manual).
Usage Model: Baseline is the Intel® Xeon® processor E5-2697 v2, host only (also measured in http://ambermd.org/gpus/benchmarks.htm#Benchmarks). Speedup is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized, will be delivered to the AMBER community (license holders), and will be available as an update patch during code configuration.
Results: The optimized offload process demonstrated compelling cluster performance improvement, up to 26X over the baseline Intel® Xeon® processor E5-2697 v2.
[Chart: AMBER 14 Particle Mesh Ewald (PME), Cellulose NPT (408K atoms), 3-node cluster benchmark. Y-axis: comparative performance. Bar labels shown: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X. For configuration details go here.]
Other names and brands may be claimed as the property of others.
Chart legend: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120P/X; 2 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120P/X
Bar label shown: 1.56X
GROMACS 512K H2O with RF
Application: GROMACS 5.0-RC1. Workload: 512K H2O with RF method.
Description: GROMACS is a versatile package to perform molecular dynamics, i.e., to simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: Version 5.0-rc1 available here and here. Recipe: Available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run a full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
Source: Intel measured results as of April 2014.
[Chart: GROMACS 512K H2O with RF Speed Up, 1 node. Y-axis: comparative performance. Bar labels shown: 1.79X, 1, 1.03X, 1.72X. For configuration details go here.]
Other names and brands may be claimed as the property of others.
Chart legend: NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A 2
NWChem 6.3 rev2 and 6.5 CCSD(T) Method, 32-Node Speed Up
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: Available here and from the SVN repository. Recipe: Available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application for the NWChem community, including on clusters.
Results: Compared to the NWChem 6.3 rev2 and Intel® Xeon® processor E5-2697 v2 baseline: 1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2; 2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A.
Source: Intel measured results as of July 2014.
[Chart: NWChem CCSD(T) Method, 32-node cluster benchmark. Y-axis: comparative performance. Bar labels shown: 1, 1.24X, 1.52X. For configuration details go here.]
Other names and brands may be claimed as the property of others.
Chart x-axis: 1 node, 8 nodes, 32 nodes (y-axis ticks from 0 to 30)
Chart legend: Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T)
NAMD 2.10 Pre-Release: STMV
Application: NAMD 2.10 pre-release, STMV
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: Available here.
Usage Model: Single rank on host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
Source: Intel measured results as of September 2014.
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms), 32-node cluster benchmark. Y-axis: comparative performance. Bar labels shown: 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X. For configuration details go here.]
Other names and brands may be claimed as the property of others.
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
Chart x-axis: 1 node, 2 nodes
Chart legend: Intel® Xeon® processor E5-2697 v3 (baseline: 1 node); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T)
NAMD 2.10 Pre-Release: ApoA1
Application: NAMD 2.10 pre-release, ApoA1
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: Available here.
Usage Model: Single rank on host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
Source: Intel measured results as of September 2014.
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN, 2-node cluster benchmark. Baseline: 1 node, 55 PPN. Y-axis: comparative performance. Bar labels shown: 1, 1.52X, 1.94X, 2.61X. For configuration details go here.]
Other names and brands may be claimed as the property of others.
Chart x-axis: 1 node, 32 nodes
Chart legend: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package)
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculations with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculations of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: The simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup at higher node counts.
Source: Intel measured results as of March 2015.
[Chart: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision), 32-node cluster benchmark. Y-axis: comparative performance. Bar labels shown: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X. Chart note: no testing on Tesla. For configuration details go here.]
Other names and brands may be claimed as the property of others.
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
LAMMPS Rhodopsin Benchmark, 512K Atoms
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculations with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculations of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement utilizing Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization on a single node, compared to the baseline configuration. Performance gains continue to hold at 1.47X when scaling up to 32 nodes.
Source: Intel measured results as of August 2014.
[Chart: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision), 32-node cluster benchmark. X-axis: 1 node, 32 nodes. Y-axis: comparative performance. Legend: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, turbo off (LAMMPS IA Package). Bar labels shown: 1, 1.27X, 1.68X, 1, 1.07X, 1.47X. For configuration details go here.]
Other names and brands may be claimed as the property of others.
Chart x-axis (workloads): ERR161544, SRR034966_1, ERR000589, SRR002273_1
Chart legend: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3
Johns Hopkins Bowtie 2: Multiple Workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: NVBowtie version 0993. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: Available here. Recipe: Not available. Check for future availability here.
Usage Model: ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1
Highlights: See more here.
Results: Bowtie2 running on the Intel® Xeon® processor E5-2697 v3 with the Intel® AVX2 port is faster than NVBowtie running on the Intel® Xeon® processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published data of the K40 compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
Source: Intel measured results as of January 2015.
[Chart: Johns Hopkins Bowtie 2 TGen Workload Speed Up, 1 node. Y-axis: comparative increase. Bar labels shown: 1, 1.87X, 1.59X, 1.08X, 0.88X. For configuration details go here.]
Other names and brands may be claimed as the property of others.
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference data base).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: Available here. Recipe: Available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor symmetric process demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
Source: Intel measured results as of January 2014.
[Chart: Burrows-Wheeler Aligner (BWA-ALN), Human Genome: BWA-ALN Speed Up, 1 node. Y-axis: comparative performance. Legend: 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A. Bar labels shown: 1, 1.24X, 1.86X. For configuration details go here.]
Other names and brands may be claimed as the property of others.
BLAST: BLASTn v30
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searching for alignments of nucleotide query sequences against a known nucleotide db volume set. National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor for the maximum-speedup sweet spot. The experiment was repeated 20 times, with the pick of queries randomized, for a sweet spot split 8020 and 592318.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline simulation rate, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
[Chart (NEW): BLASTn v30 Speed Up, 1 node. Y-axis: Comparative Performance. Legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized. Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
BLAST: BLASTp v30
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searching for alignments of protein query sequences against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor for the maximum-speedup sweet spot. The experiment was repeated 20 times, with the pick of queries randomized, for a sweet spot split 337 and 2857.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline simulation rate, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
[Chart (NEW): BLASTp v30 Speed Up, 1 node. Y-axis: Comparative Performance. Legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized. Bar labels: 1, 1.41X, 1.39X, 1.21X, 1.3X, 1.15X]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number. Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Currently characterized errata are available from Intel on request. Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance, and other benefits of virtualization technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html. No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). Intel® TXT also requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html. A system with Intel® Turbo Boost Technology capability is required. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost. Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English). Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. All rights reserved.
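Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may cause actual performance to differ from the published tests and ratings. Buyers considering a system or component purchase should consult other sources of information to evaluate performance as a whole. For more information on performance tests and on the performance of Intel products, see http://www.intel.com/performance. Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages readers to visit the referenced Web sites, or others where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase. Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, then dividing the actual benchmark result of each platform by the actual benchmark result of the baseline platform, which yields a relative performance number proportional to the reported performance improvement. SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English). The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. No liability is assumed for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right. Software and workloads used in performance tests may have been optimized for performance on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Results vary with these factors. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance as a whole.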
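The relative-performance calculation described in the disclaimer above (assign 1.0 to the baseline, divide every other result by the baseline result) can be sketched in a few lines of Python. This is a minimal illustration, not Intel's tooling: the function name `relative_performance`, the platform labels, and the raw scores are hypothetical, and the sketch assumes that a higher raw score means better performance.

```python
def relative_performance(raw_scores, baseline):
    """Scale raw benchmark scores so the baseline platform reads 1.0.

    raw_scores: dict of platform label -> raw score (assumed higher-is-better)
    baseline:   label of the platform that is assigned the 1.0 baseline value
    """
    base = raw_scores[baseline]
    if base <= 0:
        raise ValueError("baseline score must be positive")
    # Each platform's result divided by the baseline platform's result.
    return {name: score / base for name, score in raw_scores.items()}


# Hypothetical raw scores, for illustration only (not measured data).
scores = {"platform_a": 50.0, "platform_b": 75.0}
relative = relative_performance(scores, baseline="platform_a")
print(relative)
```

With these made-up inputs, platform_b comes out at 1.5 relative to platform_a, which is how the "X" factors on the benchmark charts should be read.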
Intel® Xeon Phi™ Coprocessor 7120P
61 cores, 244 threads, 1.238 GHz
1.21 TFLOPS (peak double-precision floating-point performance), 512-bit SIMD instructions
32 vector registers per thread, 16 GB GDDR5 memory, 352 GB/s
300 W (passive cooling), PCIe x16 (requires an IA host processor)
22 nm with the world's first 3-D Tri-Gate transistors
Linux operating system, IP addressable
Common x86/IA programming models and SW tools
www.intel.com/xeonphi
Is Xeon Phi† performance compelling vs. Xeon† E5 v2? "2-socket Xeon E5 v2 system" vs. "2-socket Xeon E5 v2 system + Xeon Phi 7120"
Xeon Phi delivers up to 165 higher performance (with 1 card) versus 2-socket Xeon E5 v2
http://www.intel.com/performance
†Xeon = Intel® Xeon® processor; †Xeon Phi = Intel® Xeon Phi™ coprocessor
Intel® Xeon Phi™ Product Family
Knights Corner: 1 TFLOPS [1]
Knights Landing: 3+ TFLOPS [2]; processor (bootable), high-bandwidth memory on package, integrated interconnect; first commercial systems in 2H'15; >50 systems providers expected [3], plus many more card-based systems; >100 PFLOPS customer system compute commits to date [3]
Knights Hill: 3rd-generation Intel® Xeon Phi™ Product Family; 2nd-generation Intel Omni-Path Architecture; 10 nm process technology
Intel® Xeon Phi™ Coprocessor: Applications and Solutions Catalog
[1] Claim based on calculated theoretical peak double-precision performance capability for a single coprocessor: 16 DP FLOPS/clock/core × 61 cores × 1.23 GHz = 1.208 TeraFLOPS.
[2] Over 3 Teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency, and floating-point operations per cycle. FLOPS = cores × clock frequency × floating-point operations per second per cycle.
[3] Intel internal estimate.
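The peak-performance formula in the footnotes (cores × clock frequency × floating-point operations per cycle) is easy to check by hand. The sketch below is only a worked example: the function name `peak_dp_tflops` is hypothetical, and the clock is taken as 1.238 GHz, the Xeon Phi 7120P figure from the specification slide, which reproduces the quoted 1.208 TFLOPS (the footnote abbreviates the clock).

```python
def peak_dp_tflops(cores, clock_ghz, dp_flops_per_clock_per_core):
    """Theoretical peak double-precision throughput in TFLOPS.

    cores x GHz x FLOPs-per-clock-per-core gives GFLOPS; divide by 1000 for TFLOPS.
    """
    return cores * clock_ghz * dp_flops_per_clock_per_core / 1000.0


# Values from the slide: 61 cores, 1.238 GHz, 16 DP FLOPS per clock per core.
knc_7120 = peak_dp_tflops(cores=61, clock_ghz=1.238, dp_flops_per_clock_per_core=16)
print(f"{knc_7120:.3f} TFLOPS")  # prints "1.208 TFLOPS"
```

This is a theoretical ceiling; the application speedups reported elsewhere in the deck are measured results and are much smaller than the ratio of peak numbers.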
Intel® Omni-Path Architecture: Coming 2H'15
www.intel.com/omnipath
100 Gbps line speed
48-port switch chip architecture
56% lower latency [4] (lower is better; vs 36 in InfiniBand)
High system scalability; high application-performance scalability
High port density: 1.3x. Maximize SINGLE SWITCH investment: 48 ports supports up to 12 add'l nodes by only adding CABLES [1]
Fewer switches: up to ½ [2]
Scalable: 2.3x. Over 27k NODES in a 2-tier, 5-hop FABRIC [3]
Small clusters, mainstream clusters, supercomputers
[1] As compared to a shipping 36-port edge InfiniBand switch.
[2] Reduction in up to ½ fewer switches claim based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration, using a 48-port switch for Intel® Omni-Path cluster and 36-port switch ASIC for either Mellanox or Intel® True Scale clusters.
[3] 2.3X based on 27,648 nodes, based on a cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes.
[4] Latency reductions based on Mellanox CS7500 Director Switch and Mellanox SB7700/SB7790 Edge switches compared to preliminary Intel simulations for Intel® Omni-Path switches, based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration (2-tier, 5 total switch hops), using a 48-port switch for Intel® Omni-Path cluster and 36-port switch ASIC for either Mellanox or Intel® True Scale clusters. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
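The node counts in the scalability footnote coincide with the textbook capacity of a full-bisection fat tree built from k-port switches with at most five switch hops, k³/4. Whether Intel derived its figures this way is an assumption; the sketch below only shows that the formula reproduces 27,648 and 11,664. The function name `fat_tree_max_nodes` is hypothetical.

```python
def fat_tree_max_nodes(ports_per_switch):
    """Maximum end nodes in a three-level full-bisection fat tree: k**3 / 4.

    Assumption: this standard formula is used here only because it matches the
    slide's node counts; the slide itself does not state how they were derived.
    """
    k = ports_per_switch
    return k ** 3 // 4


omni_path_nodes = fat_tree_max_nodes(48)   # 48-port switch ASIC
infiniband_nodes = fat_tree_max_nodes(36)  # 36-port switch ASIC
ratio = omni_path_nodes / infiniband_nodes

print(omni_path_nodes, infiniband_nodes, f"{ratio:.2f}")  # prints "27648 11664 2.37"
```

Because capacity grows with the cube of the port count, a 1.33x increase in ports per switch (48 vs 36) yields roughly 2.37x more reachable nodes, in line with the "2.3x" on the slide.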
Intel® EE for Lustre: Connectivity with Hadoop
Breakdown: open source; Intel proprietary extensions; connectors to Hadoop distributions
Lustre that can connect to Hadoop
ANL Selects Intel for World's Biggest Supercomputer
2-system CORAL award extends IA leadership in extreme-scale HPC (>$200M)
Aurora, Argonne National Laboratory: >180 PF (April '15)
Theta, Argonne National Laboratory: >8.5 PF
Trinity, NNSA†: >40 PF (July '14)
Cori, NERSC‡: >30 PF (April '14)
‡ Cray XC Series at National Energy Research Scientific Computing Center (NERSC). † Cray XC Series at National Nuclear Security Administration (NNSA).
The Most Advanced Supercomputer Ever Built: an Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
>180 PFLOPS (option to increase up to 450 PF)
>50,000 nodes
13 MW
2018 delivery
18X higher performance†
>6X more energy efficient†
Prime Contractor
Subcontractor
Source: Argonne National Laboratory and Intel. †Comparison of theoretical peak double-precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PFs and 4.8 MW).
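The two Aurora-versus-Mira ratios can be reproduced from the figures on the slide. This is a worked check only, reading Aurora as 180 PF at 13 MW and Mira as 10 PF at 4.8 MW; the variable names are illustrative, and since the slide gives Aurora's performance as a lower bound (>180 PF) the results are lower bounds too.

```python
# Peak double-precision performance (PFLOPS) and power (MW), as read from the slide.
aurora_pf, aurora_mw = 180.0, 13.0
mira_pf, mira_mw = 10.0, 4.8

# Performance ratio: 180 / 10.
performance_ratio = aurora_pf / mira_pf

# Energy-efficiency ratio: (PF per MW for Aurora) / (PF per MW for Mira).
efficiency_ratio = (aurora_pf / aurora_mw) / (mira_pf / mira_mw)

print(f"{performance_ratio:.0f}X performance, {efficiency_ratio:.2f}X efficiency")
```

The performance ratio is exactly 18, and the efficiency ratio works out to about 6.65, consistent with the ">6X more energy efficient" claim.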
Application Support Status: Life Sciences
"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being"
Wang Bingqiang, Head of High Performance Computing, BGI
AMBER 14 Particle Mesh Ewald (PME) Tobacco Virus
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 18.7 of the manual).
Usage Model: Baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Offload processing runs on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (whoever has a license), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: Optimized Intel Xeon processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel Xeon processor E5-2697 v2. The optimized offload process demonstrated 1.07X increased performance compared to NVIDIA K40 performance.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
[Chart: AMBER 14 PME Tobacco Virus, 1 Million Atoms, 1 node. Y-axis: Comparative Performance. Legend: Intel® Xeon® processor E5-2697 v2 (baseline); Intel® Xeon® processor E5-2697 v2 (optimized); Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP; Intel® Xeon® processor E5-2697 v3; Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A. Bar labels: 1, 1.52X, 2X, 2.26X, 1.93X, 2.41X]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
AMBER 14 Particle Mesh Ewald (PME) Cellulose NPT
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 18.7 of the manual).
Usage Model: Baseline is on the Intel® Xeon® processor E5-2697 v2 host only (also measured in http://ambermd.org/gpus/benchmarks.htm#Benchmarks), and the speedup is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized, will be delivered to the AMBER community (whoever has a license), and will be available as an update patch during code configuration.
Results: The optimized offload process demonstrated compelling cluster performance improvement, up to 2.6X over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
[Chart: AMBER PME Cellulose NPT (408K Atoms), 3-node cluster benchmark. X-axis groups include 2 nodes and 3 nodes. Y-axis: Comparative Performance. Legend: Intel® Xeon® processor E5-2697 v2 (baseline); Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 + NVIDIA K40 DPFP. Bar labels: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X]
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2
GROMACS: 512K H2O with RF
Application: GROMACS 5.0-RC1. Workload: 512K H2O with RF method.
Description: GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: version 5.0-rc1 available here and here. Recipe: available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF APRIL 2014
[Chart: GROMACS 512K H2O with RF Speed Up, 1 node. Y-axis: Comparative Performance. Legend: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120PX; 2 Intel® Xeon Phi™ coprocessor 7120PX; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120PX; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120PX. Bar labels: 1, 1.03X, 1.56X, 1.72X, 1.79X]
NWChem CCSD(T) Method
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: available here and from the SVN repository. Recipe: available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application for the NWChem community, including at cluster scale.
Results: Compared to the NWChem 6.3rev2 and Intel® Xeon® processor E5-2697 v2 baseline: 1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2; 2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A.
SOURCE: INTEL MEASURED RESULTS AS OF JULY 2014
[Chart: NWChem 6.3rev2 and 6.5 CCSD(T) Method, 32-Node Speed Up (cluster benchmark, 32 nodes). Y-axis: Comparative Performance. Bars: NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2, 1; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2, 1.24X; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ Coprocessor 7120A 2, 1.52X]
NAMD 2.10 Pre-Release: STMV
Application: NAMD 2.10 pre-release, STMV
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: available here.
Usage Model: Single rank on host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms), 32-node cluster benchmark. X-axis: 1 node, 8 nodes, 32 nodes. Y-axis: Comparative Performance, 0 to 30. Legend: Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T)]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
NAMD 2.10 Pre-Release: ApoA1
Application: NAMD 2.10 pre-release, ApoA1
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: available here.
Usage Model: Single rank on host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN, 2-node cluster benchmark. X-axis: 1 node, 2 nodes. Y-axis: Comparative Performance. Legend: Intel® Xeon® processor E5-2697 v3 (baseline: 1 node, 55 PPN); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T). Bar labels: 1, 1.52X, 1.94X, 2.61X]
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculations with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculations of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup and higher node counts.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
[Chart (NEW): LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision), 32-node cluster benchmark. X-axis: 1 node, 32 nodes. Y-axis: Comparative Performance, 0 to 3. Legend: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package). Bar labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X. One bar position is labeled "No testing on Tesla"]
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
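The usage model on this slide, a load balancer that decides how much work to offload so that host and coprocessor run concurrently, can be illustrated with a simple proportional rebalancing rule. This is a generic sketch and not the algorithm used by the LAMMPS Intel Package: the function name `rebalance_offload_fraction` and the timings are hypothetical, and the sketch assumes each device's time scales linearly with its share of the work.

```python
def rebalance_offload_fraction(fraction, host_seconds, coproc_seconds):
    """Return a new offload fraction that would equalize host and coprocessor time.

    fraction:       share of the work currently sent to the coprocessor (0 < f < 1)
    host_seconds:   measured time the host spent on its share in the last step
    coproc_seconds: measured time the coprocessor spent on its share
    Assumes time is proportional to work on each device (illustrative only).
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if host_seconds <= 0 or coproc_seconds <= 0:
        raise ValueError("timings must be positive")
    host_rate = (1.0 - fraction) / host_seconds   # work units per second on host
    coproc_rate = fraction / coproc_seconds       # work units per second on coprocessor
    # Give each device work in proportion to its measured rate.
    return coproc_rate / (host_rate + coproc_rate)


# Hypothetical timings: with a 50/50 split the host took twice as long.
new_fraction = rebalance_offload_fraction(0.5, host_seconds=2.0, coproc_seconds=1.0)
print(round(new_fraction, 3))  # prints 0.667
```

Under the linear-time assumption, moving to a two-thirds offload makes both devices finish at the same moment, which is the condition a dynamic balancer tries to maintain as the workload changes.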
LAMMPS Rhodopsin Benchmark, 512K Atoms
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculations with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculations of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement utilizing Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization on a single node, compared to the baseline configuration. Performance gains continue to hold at 1.47X when scaling up to 32 nodes.
SOURCE: INTEL MEASURED RESULTS AS OF AUGUST 2014
[Chart: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision), 32-node cluster benchmark. X-axis: 1 node, 32 nodes. Y-axis: Comparative Performance. Legend: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, Turbo Off (LAMMPS IA Package). Bar labels: 1 node: 1, 1.27X, 1.68X; 32 nodes: 1, 1.07X, 1.47X]
Johns Hopkins Bowtie 2: Multiple Workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: NVBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: available here. Recipe: not available. Check for future availability here.
Usage Model: ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1
Highlights: See more here.
Results: Bowtie2 running on the Intel® Xeon® processor E5-2697 v3 with the Intel® AVX2 port is faster than NVBowtie running on the Intel® Xeon® processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published data of the K40 compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2015
[Chart (NEW): Johns Hopkins Bowtie 2 TGen Workload Speed Up, 1 node. X-axis: ERR161544, SRR034966_1, ERR000589, SRR002273_1. Y-axis: Comparative Increase. Legend: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3. Bar labels: 1, 1.87X, 1.59X, 1.08X, 0.88X]
Burrows-Wheeler Aligner (BWA-ALN), Human Genome
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference data base).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: available here. Recipe: available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor symmetric process demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2014
[Chart: BWA-ALN Speed Up, 1 node. Y-axis: Comparative Performance. Bars: 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN), 1; 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN), 1.24X; 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A, 1.86X]
BLAST: BLASTn v30
Application: Basic Local Alignment Search Tool (BLASTn) v30.
Description: Searches for alignments of nucleotide query sequences against a known nucleotide database volume set from the National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor at the sweet spot for maximum speedup. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline simulation rate, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3. "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A.
[Figure: BLASTn v30 Speed Up (1 node). Y-axis: Comparative Performance. Marked NEW. Data labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X.]
Series:
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
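The load-sharing usage model above amounts to splitting one batch of queries across devices in a fixed ratio. The following is a minimal Python sketch of such a static split, not the benchmark's actual driver. It assumes the reported split figures are shares of the 100 queries (for example 80 and 20); the function `split_queries`, the placeholder query IDs, and the assignment of shares to host and coprocessor are all hypothetical.

```python
def split_queries(queries, weights):
    """Split a list of queries into consecutive chunks proportional to weights."""
    total = sum(weights)
    n = len(queries)
    # Integer share per device first, then hand any leftover queries
    # to the devices with the largest remainders.
    counts = [n * w // total for w in weights]
    remainders = [n * w % total for w in weights]
    leftover = n - sum(counts)
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    chunks, start = [], 0
    for count in counts:
        chunks.append(queries[start:start + count])
        start += count
    return chunks


# Placeholder query IDs; a real run would read concatenated FASTA records.
queries = [f"query_{i}" for i in range(100)]
host_share, coprocessor_share = split_queries(queries, [80, 20])
print(len(host_share), len(coprocessor_share))
```

A static split like this only pays off near the ratio at which both devices finish at about the same time, which is one way to read the "small sweet spot" noted in the highlights.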
BLAST: BLASTp v30
Application: Basic Local Alignment Search Tool (BLASTp) v30.
Description: Searches for alignments of protein query sequences against a known protein database volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor at the sweet spot for maximum speedup. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline simulation rate, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3. "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A.
[Figure: BLASTp v30 Speed Up (1 node). Y-axis: Comparative Performance. Marked NEW. Data labels: 1.41X, 1, 1.39X, 1.21X, 1.3X, 1.15X.]
Series:
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number. Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for currently characterized errata. Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance, or other benefits of the technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html. No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). In addition, Intel® TXT requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html. A system that supports Intel® Turbo Boost Technology is required. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost. Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as third-party software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English). Intel, インテル, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized quotation or reproduction is prohibited.
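Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in system hardware or software design or configuration may cause actual performance to differ from the published tests and ratings. When considering the purchase of systems or components, consult other sources of information and evaluate performance as a whole. For more information on the performance of Intel products, see http://www.intel.com/performance. Intel does not control or audit the design or implementation of third-party benchmarks or websites referenced in this document. Intel encourages readers to visit the referenced websites, or others where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase. Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's actual benchmark result by the baseline platform's actual benchmark result, and assigning a relative performance number proportional to the reported performance improvement. SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English). The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. Intel assumes no liability for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right. Software and workloads used in performance tests may have been optimized for performance on Intel® microprocessors. Performance tests such as SYSmark and MobileMark are measured using specific computer systems, components, software, operations, and functions, and results vary with these factors. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, and evaluate performance as a whole.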
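The relative-performance calculation described in the disclaimer (baseline assigned 1.0, every other result divided by the baseline result) can be sketched in a few lines of Python. This is an illustration only, not Intel's tooling: the function `relative_performance`, the configuration names, and the raw scores are hypothetical, and the sketch assumes that higher raw scores are better.

```python
def relative_performance(results, baseline):
    """Normalize raw benchmark results so the baseline platform scores 1.0."""
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    return {name: score / base for name, score in results.items()}


# Hypothetical raw scores (higher is better); these are not measured data.
raw_scores = {"baseline": 50.0, "optimized": 62.0, "with_coprocessor": 93.0}
for name, relative in relative_performance(raw_scores, "baseline").items():
    print(f"{name}: {relative:.2f}X")
```

For a benchmark that reports elapsed time instead of throughput, the ratio would be inverted (baseline time divided by platform time) so that larger numbers still mean faster.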
Is Xeon Phi† performance compelling vs. Xeon† E5 v2? "2-socket Xeon E5 v2 system" vs. "2-socket Xeon E5 v2 system + Xeon Phi 7120"
http://www.intel.com/performance
† Xeon = Intel® Xeon® processor. † Xeon Phi = Intel® Xeon Phi™ coprocessor.
Xeon Phi delivers up to 165 higher performance (with 1 card) versus 2-socket Xeon E5 v2
Intel® Xeon Phi™ Product Family
Knights Corner
Knights Landing: 3+ TFLOPS [2]; first commercial systems in 2H'15; >50 systems providers expected [3]; many more card-based systems
Knights Hill: 3rd-generation Intel® Xeon Phi™ product family; 2nd-generation Intel® Omni-Path Architecture; 10 nm process technology
Intel® Xeon Phi™ Coprocessor – Applications and Solutions Catalog
1 Claim based on calculated theoretical peak double-precision performance capability for a single coprocessor: 16 DP FLOPS/clock/core × 61 cores × 1.238 GHz = 1.208 TeraFLOPS.
2 Over 3 teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency, and floating-point operations per cycle. FLOPS = cores × clock frequency × floating-point operations per second per cycle.
3 Intel internal estimate.
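The peak figure in footnote 1 follows from the formula in footnote 2 (cores × clock frequency × floating-point operations per cycle). A small Python sketch of that arithmetic is given here; the function name is hypothetical, and the 1.238 GHz clock matches the 7120P specification slide earlier in the deck.

```python
def peak_dp_tflops(cores, clock_ghz, dp_flops_per_cycle):
    """Theoretical peak = cores x clock frequency x DP FLOPs per cycle per core."""
    gflops = cores * clock_ghz * dp_flops_per_cycle
    return gflops / 1000.0  # convert GFLOPS to TFLOPS


# Knights Corner 7120P figures from the deck:
# 61 cores, 1.238 GHz, 16 double-precision FLOPs per clock per core.
print(f"{peak_dp_tflops(61, 1.238, 16):.3f} TFLOPS")
```

This is a theoretical ceiling only; it says nothing about sustained application performance, which is what the benchmark slides in this deck measure.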
Processor (bootable); high-bandwidth memory on package; integrated interconnect
1 TFLOPS [1]
>100 PFLOPS customer system compute commits to date [3]
Intel® Omni-Path Architecture: Coming 2H'15
100 Gbps line speed
56% lower latency [4] (lower is better)
48-port switch chip architecture, vs. 36 ports in InfiniBand
Higher port density: 1.3x
Higher system scalability: 2.3x; over 27k nodes in a 2-tier, 5-hop fabric [3]
Higher application performance scalability
Maximize single-switch investment: 48 ports support up to 12 additional nodes by only adding cables [1]
Fewer switches [2]: up to ½
Scalable: small clusters, mainstream clusters, supercomputers
www.intel.com/omnipath
1 As compared to a shipping 36-port edge InfiniBand switch.
2 Claim of up to ½ fewer switches is based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration, using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters.
3 2.3X is based on 27,648 nodes in a cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes.
4 Latency reductions are based on the Mellanox CS7500 Director Switch and Mellanox SB7700/SB7790 Edge switches compared to preliminary Intel simulations for Intel® Omni-Path switches, based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration (2-tier, 5 total switch hops), using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided to you for informational purposes. Any differences in your system hardware, software, or configuration may affect your actual performance.
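The node counts in footnote 3 are consistent with the textbook bound for a three-level fat tree built from identical k-port switches, which supports at most k³/4 end nodes with five switch hops between the farthest pair. The Python sketch below checks that arithmetic. Treating this formula as the basis of Intel's figures is an assumption, and `max_fat_tree_nodes` is a hypothetical helper.

```python
def max_fat_tree_nodes(ports):
    """Maximum end nodes in a three-level fat tree of identical `ports`-port switches."""
    if ports <= 0 or ports % 2:
        raise ValueError("port count must be a positive even number")
    return ports ** 3 // 4


omni_path_nodes = max_fat_tree_nodes(48)    # 48-port switch ASIC
infiniband_nodes = max_fat_tree_nodes(36)   # 36-port switch ASIC
print(omni_path_nodes, infiniband_nodes)
print(f"ratio: {omni_path_nodes / infiniband_nodes:.2f}")
```

Because the bound grows with the cube of the port count, a 1.33x increase in ports per switch yields a bit more than a 2.3x increase in maximum fabric size.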
Intel® EE for Lustre: Connectivity with Hadoop
Breakdown: open source; Intel proprietary extensions; connectors to Hadoop distributions
Lustre that can connect to Hadoop
ANL Selects Intel for World's Biggest Supercomputer
2-system CORAL award (>$200M) extends IA leadership in extreme-scale HPC
Aurora, Argonne National Laboratory: >180 PF (April '15)
Theta, Argonne National Laboratory: >8.5 PF
Trinity, NNSA†: >40 PF (July '14)
Cori, NERSC‡: >30 PF (April '14)
‡ Cray XC Series at National Energy Research Scientific Computing Center (NERSC). † Cray XC Series at National Nuclear Security Administration (NNSA).
The Most Advanced Supercomputer Ever Built
An Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
>180 PFLOPS (option to increase up to 450 PF)
>50,000 nodes
13 MW
2018 delivery
18X higher performance†
>6X more energy efficient†
Prime Contractor: Intel. Subcontractor: Cray.
Source: Argonne National Laboratory and Intel. † Comparison of theoretical peak double-precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PF and 4.8 MW).
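The two headline ratios on this slide can be reproduced from the quoted figures. The Python sketch below is a minimal check, assuming Mira's values are 10 PF and 4.8 MW and using the lower-bound 180 PF and 13 MW for Aurora; the helper names are hypothetical.

```python
def speedup(new_pf, old_pf):
    """Ratio of theoretical peak performance, new system over old."""
    return new_pf / old_pf


def efficiency_gain(new_pf, new_mw, old_pf, old_mw):
    """Ratio of peak performance per megawatt, new system over old."""
    return (new_pf / new_mw) / (old_pf / old_mw)


aurora_pf, aurora_mw = 180.0, 13.0   # lower-bound peak and power from the slide
mira_pf, mira_mw = 10.0, 4.8         # comparison system from the footnote

print(f"performance: {speedup(aurora_pf, mira_pf):.0f}X")
print(f"energy efficiency: {efficiency_gain(aurora_pf, aurora_mw, mira_pf, mira_mw):.1f}X")
```

Both ratios compare theoretical peaks, as the footnote states, so they describe the machines' ceilings and not any particular application.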
Application Support Status: Life Sciences
"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being."
Wang Bingqiang, Head of High Performance Computing, BGI
AMBER 14 Particle Mesh Ewald (PME) Tobacco Virus
Application: AMBER 14.
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DP/DP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 18.7 of the manual).
Usage Model: Baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 plus the Intel® Xeon Phi™ coprocessor 7120A with offload processing on both, using the released double-precision code across the platforms: 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: Optimized Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel® Xeon® processor E5-2697 v2. The optimized offload process demonstrated 1.07X increased performance compared to NVIDIA K40 performance.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3.
[Figure: AMBER 14 PME Tobacco Virus, 1 Million Atoms (1 node). Y-axis: Comparative Performance.]
Intel® Xeon® processor E5-2697 v2 (baseline): 1
Intel® Xeon® processor E5-2697 v2 (optimized): 1.52X
Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A: 2X
Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP: 2.26X
Intel® Xeon® processor E5-2697 v3: 1.93X
Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A: 2.41X
AMBER 14 Particle Mesh Ewald (PME) Cellulose NPT
Application: AMBER 14.
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DP/DP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 18.7 of the manual).
Usage Model: Baseline is on the Intel® Xeon® processor E5-2697 v2 host only (also measured at http://ambermd.org/gpus/benchmarks.htm#Benchmarks), and speedup is shown with offload processing on both the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms: 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized, will be delivered to the AMBER community (license holders), and will be available as an update patch during code configuration.
Results: The optimized offload process demonstrated cluster performance improvement of up to 2.6X over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2.
[Figure: AMBER PME Cellulose NPT (408K Atoms), 3-node cluster benchmark. Y-axis: Comparative Performance. X-axis groups: 2 nodes, 3 nodes. Data labels: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X.]
Series:
Intel® Xeon® processor E5-2697 v2 (baseline)
Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A
Xeon E5-2697 v2 + NVIDIA K40 DPFP
GROMACS 512K H2O with RF
Application: GROMACS 5.0-RC1. Workload: 512K H2O with RF method.
Description: GROMACS is a versatile package to perform molecular dynamics, i.e., simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: version 5.0-rc1 available here and here. Recipe: available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel® Xeon Phi™ coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF APRIL 2014
[Figure: GROMACS 512K H2O with RF Speed Up (1 node). Y-axis: Comparative Performance. Data labels: 1, 1.03X, 1.56X, 1.72X, 1.79X.]
Series:
Intel® Xeon® processor E5-2697 v2
1 Intel® Xeon Phi™ coprocessor 7120PX
2 Intel® Xeon Phi™ coprocessor 7120PX
Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120PX
Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120PX
NWChem CCSD(T) Method
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: available here and from the SVN repository. Recipe: available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application, including at cluster scale, for the NWChem community.
Results: Compared to the baseline of NWChem 6.3 rev2 on the Intel® Xeon® processor E5-2697 v2: 1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2; 2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A.
SOURCE: INTEL MEASURED RESULTS AS OF JULY 2014
[Figure: NWChem 6.3 rev2 and 6.5 CCSD(T) Method, 32 Node Speed Up (32-node cluster benchmark). Y-axis: Comparative Performance.]
NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2: 1
NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2: 1.24X
NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A 2: 1.52X
NAMD 2.10 Pre-Release: STMV
Application: NAMD 2.10 pre-release, STMV.
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: available here.
Usage Model: Single rank on host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3.
[Figure: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms), 32-node cluster benchmark. Y-axis: Comparative Performance, 0 to 30. X-axis groups: 1 Node, 8 Nodes, 32 Nodes. Data labels: 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X.]
Series:
Intel® Xeon® processor E5-2697 v2 (Baseline: 1 node, 23 or 47 PPN)
Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN)
Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T)
Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T)
NAMD 2.10 Pre-Release: ApoA1
Application: NAMD 2.10 pre-release, ApoA1.
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: available here.
Usage Model: Single rank on host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
[Figure: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN, 2-node cluster benchmark. Y-axis: Comparative Performance. X-axis groups: 1 Node, 2 Nodes. Data labels: 1, 1.52X, 1.94X, 2.61X.]
Series:
Intel® Xeon® processor E5-2697 v3 (Baseline: 1 node, 55 PPN)
Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T)
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS.
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculations with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: The simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel® Xeon Phi™ coprocessor computations and MPI communications yield improved speedup at higher node counts.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3. "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A.
[Figure: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision), 32-node cluster benchmark. Y-axis: Comparative Performance, 0 to 3. X-axis groups: 1 Node, 32 Nodes. Marked NEW. Data labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X. One bar is annotated "No testing on Tesla".]
Series:
2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline)
2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package)
2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on
2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package)
LAMMPS Rhodopsin Benchmark 512K Atoms
Application: LAMMPS.
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculations with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. Performance gains hold at 1.47X when scaling up to 32 nodes.
SOURCE: INTEL MEASURED RESULTS AS OF AUGUST 2014
[Figure: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision), 32-node cluster benchmark. Y-axis: Comparative Performance.]
2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline): 1 (1 node), 1 (32 nodes)
2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package): 1.27X (1 node), 1.07X (32 nodes)
2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, turbo off (LAMMPS IA Package): 1.68X (1 node), 1.47X (32 nodes)
Johns Hopkins Bowtie 2: Multiple Workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port.
Description: NVBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: available here. Recipe: not available. Check for future availability here.
Usage Model: ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1.
Highlights: See more here.
[Figure legend. X-axis workloads: ERR161544, SRR034966_1, ERR000589, SRR002273_1. Series: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3.]
Results Bowtie2 running on the Intelreg Xeonreg processor E5-2697 v3 with Intelreg AVX2 port faster than NVBowtie running on the Intelreg Xeonreg processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads NVIDIA published data of K40 compared to Intelreg Xeonreg processor E5-2600 (6 cores) on one workload
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
SOURCE INTEL MEASURED RESULTS AS OF JANUARY 2015
Figure: Johns Hopkins Bowtie 2 TGen Workload Speed Up (1 node; y-axis: Comparative Increase). Bar labels: 1, 1.87X, 1.59X, 1.08X, 0.88X.
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code available. Recipe available.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor, running as a symmetric process, demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2014
Burrows-Wheeler Aligner (BWA-ALN): Human Genome
Figure: BWA-ALN Speed Up (1 node; y-axis: Comparative Performance). 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN): 1; 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN): 1.24X; 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A: 1.86X.
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Application: Basic Local Alignment Search Tool (BLASTn) v30.
Description: Searching for alignments of nucleotide query sequences against a known nucleotide db volume set from the National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code available. Recipe available.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor at the sweet spot for maximum speedup. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
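The load-sharing split described in the Usage Model can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the helper names `split_queries` and `ideal_speedup` are hypothetical, the shares are treated as query counts, every query is assumed to cost the same, and the host is assumed to hold the first share. It is not how the benchmarked BLAST code distributes work.

```python
from itertools import accumulate


def split_queries(queries, counts):
    """Partition a query list into consecutive chunks of the given sizes."""
    if sum(counts) != len(queries):
        raise ValueError("counts must add up to the number of queries")
    bounds = [0, *accumulate(counts)]
    return [queries[a:b] for a, b in zip(bounds, bounds[1:])]


def ideal_speedup(counts, host_index=0):
    """Upper bound on speedup over a host-only run.

    Assumes equal-cost queries and a perfectly balanced split, so the run
    ends when the host finishes its own share.
    """
    return sum(counts) / counts[host_index]


# Hypothetical query names standing in for the 100 NCBI queries.
queries = [f"query_{i:03d}" for i in range(100)]
host_part, phi_part = split_queries(queries, [80, 20])
print(len(host_part), len(phi_part), ideal_speedup([80, 20]))  # 80 20 1.25
```

Under these assumptions an 80/20 split can yield at most a 1.25X speedup over the host alone, which shows why the sweet spot depends on how much work the coprocessor can absorb.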
BLAST: BLASTn v30
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
Figure: BLASTn v30 Speed Up (1 node; y-axis: Comparative Performance). Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X.
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
BLAST: BLASTp v30
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
Application: Basic Local Alignment Search Tool (BLASTp) v30.
Description: Searching for alignments of protein query sequences against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code available. Recipe available.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor at the sweet spot for maximum speedup. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
Figure: BLASTp v30 Speed Up (1 node; y-axis: Comparative Performance). Bar labels: 1, 1.41X, 1.39X, 1.21X, 1.3X, 1.15X.
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for currently characterized errata.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance, or other benefits of Virtualization Technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). In addition, Intel® TXT requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html
A system that supports Intel® Turbo Boost Technology is required. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English).
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized citation or reproduction is prohibited.
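Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in the design or configuration of system hardware or software may cause actual performance to differ from the published tests and ratings. When considering the purchase of systems or components, consult other information as well to evaluate performance comprehensively. For more information on the performance evaluation of Intel products, see http://www.intel.com/performance
Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages you to visit the referenced Web sites, or other Web sites where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.
Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number proportional to the reported performance improvement.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English).
The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. Intel assumes no liability for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance comprehensively.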
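The relative-performance rule in the disclaimer (baseline fixed at 1.0, every other result divided by the baseline result) is how the "X" bar labels throughout this deck are read. A minimal sketch follows, assuming higher-is-better scores; the function name `relative_performance` and the sample scores are hypothetical and are not taken from any benchmark in this deck.

```python
def relative_performance(results: dict[str, float], baseline: str) -> dict[str, float]:
    """Divide each platform's result by the baseline platform's result.

    Assumes higher-is-better scores (for example, throughput). For elapsed
    times, the ratio would need to be inverted (baseline time / platform time).
    """
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


# Hypothetical raw scores, for illustration only.
scores = {"baseline": 40.0, "platform_a": 50.0, "platform_b": 74.4}
for name, rel in relative_performance(scores, "baseline").items():
    print(f"{name}: {rel:.2f}X")
```

With these made-up scores the baseline reads 1.00X and the other two platforms read 1.25X and 1.86X.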
Intel® Xeon Phi™ Product Family
Knights Corner: Intel® Xeon Phi™ Coprocessor (see the Applications and Solutions Catalog)
Knights Landing: 3+ TFLOPS²; first commercial systems in 2H'15; >50 systems providers expected³, plus many more card-based systems
Knights Hill: 3rd-generation Intel® Xeon Phi™ Product Family; 2nd-generation Intel Omni-Path Architecture; 10 nm process technology
¹ Claim based on calculated theoretical peak double-precision performance capability for a single coprocessor: 16 DP FLOPS/clock/core × 61 cores × 1.23 GHz = 1.208 TeraFLOPS.
² Over 3 TeraFLOPS of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency, and floating-point operations per cycle. FLOPS = cores × clock frequency × floating-point operations per cycle.
³ Intel internal estimate.
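The peak-performance claim in footnote 1 is a plain product of three factors, so it is easy to check. The sketch below is illustrative only: the function name `peak_tflops` is hypothetical, and the 1.238 GHz clock is taken from the Xeon Phi 7120P specification slide earlier in this deck, on the assumption that it is the clock behind the 1.208 TeraFLOPS figure.

```python
def peak_tflops(flops_per_clock_per_core: float, cores: int, clock_ghz: float) -> float:
    """Theoretical peak = FLOPS/clock/core x cores x clock frequency."""
    gflops = flops_per_clock_per_core * cores * clock_ghz  # GHz gives GFLOPS
    return gflops / 1000.0  # GFLOPS -> TFLOPS


# 16 DP FLOPS/clock/core, 61 cores, 1.238 GHz (assumed clock, see above).
knights_corner = peak_tflops(16, 61, 1.238)
print(f"{knights_corner:.3f} TFLOPS")  # 1.208 TFLOPS
```

The same function can be reused for the preliminary "3+ TFLOPS" figure once core count, clock, and FLOPS per cycle are known; the deck does not state those values.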
Knights Corner: 1 TFLOPS¹
Knights Landing: processor (bootable); high-bandwidth memory on package; integrated interconnect
>100 PFLOPS customer system compute commits to date³
Intel® Omni-Path Architecture: Coming 2H'15
100 Gbps line speed
48-port switch chip architecture (vs. 36 in InfiniBand)
56% lower latency⁴ than InfiniBand (lower is better)
High system scalability; high application performance scalability
High port density (1.3x): maximize single-switch investment; 48 ports support up to 12 additional nodes by only adding cables¹
Fewer switches: up to ½ fewer²
Scalable (2.3x): over 27k nodes in a 2-tier, 5-hop fabric³
Target segments: small clusters, mainstream clusters, supercomputers
www.intel.com/omnipath
¹ As compared to a shipping 36-port edge InfiniBand switch.
² The claim of up to ½ fewer switches is based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration, using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters.
³ 2.3X based on 27,648 nodes in a cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes.
⁴ Latency reductions based on Mellanox CS7500 Director Switch and Mellanox SB7700/SB7790 Edge switches compared to preliminary Intel simulations for Intel® Omni-Path switches, based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration (2-tier, 5 total switch hops), using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided to you for informational purposes. Any differences in your system hardware, software, or configuration may affect your actual performance.
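The node counts in footnote 3 coincide with the textbook limit for a three-level full-bisection fat tree built from k-port switch chips, k³/4, and the port-density figures follow from the port counts. The sketch below only checks that arithmetic; it is not Intel's sizing method. The function name `max_fat_tree_nodes` is hypothetical, and reading the "2-tier, 5-hop" fabric as three levels of switch chips is an assumption.

```python
def max_fat_tree_nodes(ports: int) -> int:
    """Maximum end nodes in a three-level full-bisection fat tree
    built from switches with the given port count: k^3 / 4."""
    return ports ** 3 // 4


omni_path = max_fat_tree_nodes(48)   # 48-port switch chip
infiniband = max_fat_tree_nodes(36)  # 36-port switch chip

print(omni_path, infiniband)              # 27648 11664
print(round(omni_path / infiniband, 2))   # 2.37
print(48 - 36)                            # 12 additional ports per switch
print(round(48 / 36, 2))                  # 1.33
```

The computed ratios of about 2.37 and 1.33 are consistent with the 2.3x scalability and 1.3x port-density figures quoted on the slide.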
Intel® EE for Lustre: Connectivity with Hadoop
Diagram legend: open source; Intel proprietary extensions; connectors to Hadoop distributions
Lustre that can connect to Hadoop
ANL Selects Intel for World's Biggest Supercomputer
2-system CORAL award (>$200M) extends IA leadership in extreme-scale HPC
Aurora, Argonne National Laboratory: >180 PF (April '15)
Theta, Argonne National Laboratory: >8.5 PF
Trinity, NNSA†: >40 PF (July '14)
Cori, NERSC‡: >30 PF (April '14)
‡ Cray XC Series at National Energy Research Scientific Computing Center (NERSC). † Cray XC Series at National Nuclear Security Administration (NNSA).
The Most Advanced Supercomputer Ever Built
An Intel-led collaboration with ANL and Cray to accelerate discovery and innovation
>180 PFLOPS (option to increase up to 450 PF); >50,000 nodes; 13 MW; 2018 delivery
18X higher performance†; >6X more energy efficient†
Prime contractor: Intel. Subcontractor: Cray.
Source: Argonne National Laboratory and Intel. † Comparison of theoretical peak double-precision FLOPS and power consumption to ANL's largest current system, Mira (10 PF and 4.8 MW).
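The two comparison figures against Mira can be reproduced from the numbers on this slide. This is a rough check rather than an official calculation: it takes Aurora at its stated floor of 180 PF and 13 MW, reads Mira's figures as 10 PF and 4.8 MW, and uses a hypothetical helper name, `pflops_per_mw`.

```python
def pflops_per_mw(peak_pflops: float, power_mw: float) -> float:
    """Energy efficiency expressed as peak PFLOPS per megawatt."""
    return peak_pflops / power_mw


aurora_pf, aurora_mw = 180.0, 13.0  # slide figures (">180 PFLOPS", "13MW")
mira_pf, mira_mw = 10.0, 4.8        # footnote figures, as read above

perf_ratio = aurora_pf / mira_pf
eff_ratio = pflops_per_mw(aurora_pf, aurora_mw) / pflops_per_mw(mira_pf, mira_mw)
print(f"{perf_ratio:.0f}X performance, {eff_ratio:.1f}X efficiency")
# 18X performance, 6.6X efficiency
```

Both results line up with the slide's "18X higher performance" and ">6X more energy efficient" claims.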
Application Support Status: Life Sciences
"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being."
Wang Bingqiang, Head of High Performance Computing, BGI
Figure legend (AMBER 14 PME Tobacco Virus):
Intel® Xeon® processor E5-2697 v2 (baseline)
Intel® Xeon® processor E5-2697 v2 (optimized)
Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A
Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP
Intel® Xeon® processor E5-2697 v3
Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code available as a patch. Recipe available (Section 187 of the manual).
Usage Model: The baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A with offload processing on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and made available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: The optimized Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel® Xeon® processor E5-2697 v2. The optimized offload process demonstrated 1.07X increased performance compared to NVIDIA K40 performance.
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
AMBER 14 Particle Mesh Ewald (PME): Tobacco Virus
Figure: AMBER 14 PME Tobacco Virus, 1 Million Atoms (1 node; y-axis: Comparative Performance). Bar labels in legend order: 1, 1.52X, 2X, 2.26X, 1.93X, 2.41X.
Figure legend (AMBER PME Cellulose NPT; x-axis: 2 nodes, 3 nodes):
Intel® Xeon® processor E5-2697 v2 (baseline)
Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A
Xeon E5-2697 v2 + NVIDIA K40 DPFP
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code available as a patch. Recipe available (Section 187 of the manual).
Usage Model: The baseline is the Intel® Xeon® processor E5-2697 v2, host only (also measured at http://ambermd.org/gpus/benchmarks.htm#Benchmarks). The speedup is shown with offload processing on both the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized, will be delivered to the AMBER community (license holders), and will be available as an update patch during code configuration.
Results: The optimized offload process demonstrated cluster performance improvement of up to 26X over the baseline Intel® Xeon® processor E5-2697 v2.
AMBER 14 Particle Mesh Ewald (PME): Cellulose NPT
Figure: AMBER PME Cellulose NPT (408K Atoms), 3-node cluster benchmark (y-axis: Comparative Performance). Bar labels: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X.
GROMACS 512K H2O with RF
Application: GROMACS 5.0-RC1. Workload: 512K H2O with RF method.
Description: GROMACS is a versatile package to perform molecular dynamics, i.e., simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: version 5.0-rc1 available. Recipe available.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel® Xeon Phi™ coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF APRIL 2014
Figure: GROMACS 512K H2O with RF Speed Up (1 node; y-axis: Comparative Performance). Series: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120P/X; 2 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120P/X. Bar labels: 1, 1.03X, 1.56X, 1.72X, 1.79X.
NWChem CCSD(T) Method
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code available, including from the SVN repository. Recipe available.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application for the NWChem community, including at cluster scale.
Results: Compared to the baseline of NWChem 6.3 rev2 on the Intel® Xeon® processor E5-2697 v2: 1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2; 2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A.
SOURCE: INTEL MEASURED RESULTS AS OF JULY 2014
Figure: NWChem 6.3 rev2 and 6.5 CCSD(T) Method, 32-Node Speed Up (cluster benchmark; y-axis: Comparative Performance). NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2: 1; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2: 1.24X; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A: 1.52X.
NAMD 2.10 Pre-Release: STMV
Application: NAMD 2.10 pre-release, STMV workload.
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe available.
Usage Model: Single rank on the host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 with the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
Figure: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms); 32-node cluster benchmark; x-axis: 1 node, 8 nodes, 32 nodes; y-axis: Comparative Performance (0 to 30).
Series: Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T).
Bar labels: 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X.
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
NAMD 2.10 Pre-Release: ApoA1
Application: NAMD 2.10 pre-release, ApoA1 workload.
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe available.
Usage Model: Single rank on the host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
Figure: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN; 2-node cluster benchmark; x-axis: 1 node, 2 nodes; y-axis: Comparative Performance.
Series: Intel® Xeon® processor E5-2697 v3 (baseline: 1 node, 55 PPN); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T).
Bar labels: 1, 1.52X, 1.94X, 2.61X.
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe available.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for calculation concurrent with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel package also run faster on the CPU.
Results: The simulation rate increase with the Intel® package is up to 3.6X. Concurrent Intel® Xeon Phi™ coprocessor computations and MPI communications yield improved speedup at higher node counts.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
Figure: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision); 32-node cluster benchmark; x-axis: 1 node, 32 nodes; y-axis: Comparative Performance.
Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA package).
Bar labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X. One bar position is marked "No testing on Tesla".
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
LAMMPS Rhodopsin Benchmark: 512K Atoms
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe available.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for calculation concurrent with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. Performance gains continue to hold at 1.47X when scaling up to 32 nodes.
SOURCE: INTEL MEASURED RESULTS AS OF AUGUST 2014
Figure: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision); 32-node cluster benchmark; x-axis: 1 node, 32 nodes; y-axis: Comparative Performance.
Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA package); 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, turbo off (LAMMPS IA package).
Bar labels, 1 node: 1, 1.27X, 1.68X; 32 nodes: 1, 1.07X, 1.47X.
0
1
ERR161544 SRR034966_1 ERR000589 SRR002273_1
Intelreg Xeonreg processor E5-2697 v2 + 1 NVIDIA Tesla K40
Intelreg Xeonreg processor E5-2697 v3
21
Johns Hopkins Bowtie 2 Multiple workloads
Application Bowtie2 version 223 Intelreg AVX2 port
Description NVBowtie version 0993 Bowtie is a GPU-accelerated re-engineering of Bowtie2 a very widely used short-read aligner While being completely rewritten from scratch nvBowtie reproduces many (though not all) of the features of Bowtie2 httpnvlabsgithubionvbionvbowtie_pagehtml
Availability Code Available here Recipe Not available Check for future availability here
Usage Model ERR161544 SRR002273_1 HEK001(TGen) ERR000589_1 SRR033552_1 SRR034966_1 amp ERR024139_1
Highlights See more here
Results Bowtie2 running on the Intelreg Xeonreg processor E5-2697 v3 with Intelreg AVX2 port faster than NVBowtie running on the Intelreg Xeonreg processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads NVIDIA published data of K40 compared to Intelreg Xeonreg processor E5-2600 (6 cores) on one workload
Source: Intel measured results as of January 2015.
[Figure: Johns Hopkins Bowtie 2 TGen Workload Speed Up, 1 node. Vertical axis: comparative increase. Data labels: 1, 1.87X, 1.59X, 1.08X, 0.88X. Marked NEW.]
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, reference database 30 GB).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: available here. Recipe: available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The Intel® Xeon® processor E5-2697 v2 and Intel® Xeon Phi™ coprocessor symmetric process demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
Source: Intel measured results as of January 2014.
Burrows-Wheeler Aligner (BWA-ALN): Human Genome
[Figure: BWA-ALN Speed Up, 1 node. Vertical axis: comparative performance. Series and data labels: 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN): 1; 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN): 1.24X; 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A: 1.86X.]
[Figure legend. Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searches for alignments of nucleotide query sequences against a known nucleotide database volume set from the National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor to find the maximum-speedup sweet spot. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
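The load-sharing idea in the usage model above can be illustrated with a short Python sketch. This is not the code used in the benchmark: the helper `split_queries` and the device labels are hypothetical, and it assumes the split figures are read as query counts out of 100 (80/20 for one coprocessor, 59/23/18 for two).

```python
import random

def split_queries(queries, shares, seed=None):
    """Randomly assign queries to devices in proportion to integer shares.

    `shares` maps a device label to its share, e.g. {"host": 80, "phi0": 20}.
    The device labels and this helper are hypothetical illustrations.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    rng = random.Random(seed)
    shuffled = list(queries)
    rng.shuffle(shuffled)  # randomize which queries land on which device

    assignment = {}
    start = 0
    cumulative = 0
    for device, share in shares.items():
        cumulative += share
        # Integer boundaries avoid rounding drift across devices.
        end = len(shuffled) * cumulative // total
        assignment[device] = shuffled[start:end]
        start = end
    return assignment

queries = [f"query_{i:03d}" for i in range(100)]
one_card = split_queries(queries, {"host": 80, "phi0": 20}, seed=0)
two_cards = split_queries(queries, {"host": 59, "phi0": 23, "phi1": 18}, seed=0)
print({device: len(batch) for device, batch in one_card.items()})
print({device: len(batch) for device, batch in two_cards.items()})
```

Repeating the call with different seeds mirrors the repeated, randomized runs described above; finding the sweet spot would mean timing each candidate split, which this sketch does not attempt.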
BLAST: BLASTn v30
Source: Intel measured results as of March 2015.
[Figure: BLASTn v30 Speed Up, 1 node. Vertical axis: comparative performance. Data labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X. Marked NEW.]
[Figure legend. Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized.]
BLAST: BLASTp v30
Source: Intel measured results as of March 2015.
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searches for alignments of protein query sequences against a known protein database volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor to find the maximum-speedup sweet spot. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
[Figure: BLASTp v30 Speed Up, 1 node. Vertical axis: comparative performance. Data labels: 1, 1.41X, 1.39X, 1.21X, 1.3X, 1.15X. Marked NEW.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Legal Information
All products, computer systems, dates and figures specified in this document are based on current expectations and are subject to change without notice. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number. Intel® processors, chipsets and desktop boards may contain design defects known as errata, which may cause them to deviate from published specifications. Contact Intel for currently characterized errata. Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS and virtual machine monitor (VMM). Functionality, performance or other benefits of virtualization technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html. No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules and an Intel® TXT-compatible Measured Launched Environment (MLE). Intel® TXT also requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html. Intel® Turbo Boost Technology requires a system that supports it. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost. Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English). Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized quotation or reproduction is prohibited.
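Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers considering the purchase of systems or components should consult other sources of information to evaluate performance as a whole. For more information on the performance of Intel products, see http://www.intel.com/performance. Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages you to visit the referenced Web sites, or others where similar performance benchmarks are reported, and to confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase. Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's actual benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number that correlates with the performance improvement reported. SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English). The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. Intel assumes no liability for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright or other intellectual property right. Software and workloads used in performance tests may have been optimized for performance on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Results vary with those factors. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance as a whole.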
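The relative-performance rule stated in the disclaimer above (the baseline is assigned 1.0 and every platform's result is divided by the baseline result) can be written as a small Python helper. This is a minimal sketch: the function name, the `higher_is_better` switch for time-based metrics, and the sample figures are illustrative assumptions, not values from this deck.

```python
def relative_performance(results, baseline, higher_is_better=True):
    """Normalize raw benchmark results so the baseline platform scores 1.0."""
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    if higher_is_better:
        # Throughput-style metric: a larger raw value means better performance.
        return {name: value / base for name, value in results.items()}
    # Time-to-solution metric: a smaller raw value is better, so invert the ratio.
    return {name: base / value for name, value in results.items()}

# Hypothetical throughput figures (for example ns/day), not measurements from this deck.
throughput = {"baseline": 2.0, "optimized": 2.5, "optimized+coprocessor": 3.5}
for name, score in relative_performance(throughput, "baseline").items():
    print(f"{name}: {score:.2f}X")
```

The "X" labels on the charts in this deck are ratios of this kind, each relative to the baseline configuration named on its slide.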
Intel® Omni-Path Architecture: Coming 2H'15
www.intel.com/omnipath
100 Gbps line speed
48-port switch chip architecture (vs 36 in InfiniBand)
56% lower latency [4], compared with InfiniBand (lower is better)
1.3x higher port density [1]: maximize single-switch investment. 48 ports support up to 12 additional nodes by only adding cables.
Up to ½ fewer switches [2]
2.3x higher system scalability [3]: over 27k nodes in a 2-tier, 5-hop fabric
Higher application performance scalability
Scalable from small clusters through mainstream clusters to supercomputers
1. As compared to a shipping 36-port edge InfiniBand switch.
2. The claim of up to ½ fewer switches is based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration, using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters.
3. 2.3X is based on 27,648 nodes in a cluster configured with the Intel® Omni-Path Architecture using 48-port switch ASICs, as compared with a 36-port switch chip that can support up to 11,664 nodes.
4. Latency reductions are based on the Mellanox CS7500 Director Switch and Mellanox SB7700/SB7790 Edge switches, compared to preliminary Intel simulations for Intel® Omni-Path switches, based on a 1024-node full bisectional bandwidth (FBB) Fat-Tree configuration (2-tier, 5 total switch hops), using a 48-port switch for the Intel® Omni-Path cluster and a 36-port switch ASIC for either Mellanox or Intel® True Scale clusters. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling, and are provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance.
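The node counts in footnote 3 match the textbook capacity of a three-level, full-bisection fat tree built from k-port switches, which is k³/4 end nodes. Whether Intel derived its figures this way is an assumption; the Python sketch below only checks that the arithmetic agrees with the numbers quoted on the slide.

```python
def fat_tree_max_nodes(ports: int) -> int:
    """End nodes in a full-bisection three-level fat tree of `ports`-port switches (k^3 / 4)."""
    if ports <= 0 or ports % 2:
        raise ValueError("port count must be a positive even number")
    return ports ** 3 // 4

omni_path_nodes = fat_tree_max_nodes(48)   # 48-port switch ASIC
infiniband_nodes = fat_tree_max_nodes(36)  # 36-port switch ASIC
print(omni_path_nodes, infiniband_nodes)
print(f"scalability ratio: {omni_path_nodes / infiniband_nodes:.2f}")

# Port-density figures quoted on the slide: 48 vs 36 ports per switch.
print(f"port ratio: {48 / 36:.2f}, additional ports per switch: {48 - 36}")
```

Under this model the 48-port switch reaches 27,648 nodes against 11,664 for a 36-port switch, a ratio of about 2.37, in line with the "2.3x" scalability claim; the 48/36 port ratio of about 1.33 is consistent with the "1.3x" figure and the 12 additional nodes.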
Intel® EE for Lustre: Connectivity with Hadoop
Breakdown: open source; Intel proprietary extensions; connectors to Hadoop distributions
Lustre that can connect to Hadoop
ANL Selects Intel for World's Biggest Supercomputer
2-system CORAL award extends IA leadership in extreme-scale HPC
>$200M
Aurora, Argonne National Laboratory: >180 PF (April '15)
Theta, Argonne National Laboratory: >8.5 PF
Trinity, NNSA†: >40 PF (July '14)
Cori, NERSC‡: >30 PF (April '14)
† Cray XC Series at National Nuclear Security Administration (NNSA). ‡ Cray XC Series at National Energy Research Scientific Computing Center (NERSC).
The Most Advanced Supercomputer Ever Built: an Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
>180 PFLOPS (option to increase up to 450 PF)
>50,000 nodes
13 MW
2018 delivery
18X higher performance†
>6X more energy efficient†
Prime Contractor
Subcontractor
Source: Argonne National Laboratory and Intel. †Comparison of theoretical peak double precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PFs and 4.8 MW).
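The two headline ratios can be checked from the figures in the footnote. The short Python calculation below is a sanity check, not Intel's own derivation; reading the extracted MIRA power figure as 4.8 MW is an assumption, chosen because it is the reading under which the ">6X" claim holds.

```python
# Peak performance (PFLOPS) and power (MW) as quoted on the slide.
aurora_pf, aurora_mw = 180.0, 13.0
mira_pf, mira_mw = 10.0, 4.8   # assumed reading of the extracted power figure

performance_ratio = aurora_pf / mira_pf
# Energy efficiency is performance per unit power (PFLOPS per MW).
efficiency_ratio = (aurora_pf / aurora_mw) / (mira_pf / mira_mw)

print(f"performance: {performance_ratio:.0f}X")
print(f"energy efficiency: {efficiency_ratio:.2f}X")
```

With these inputs the performance ratio is 18 and the efficiency ratio is roughly 6.6, matching "18X higher performance" and ">6X more energy efficient".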
Application Support Status: Life Sciences
"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being."
Wang Bingqiang, Head of High Performance Computing, BGI
[Figure legend. Series: Intel® Xeon® processor E5-2697 v2 (baseline); Intel® Xeon® processor E5-2697 v2 (optimized); Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP; Intel® Xeon® processor E5-2697 v3; Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A.]
Source: Intel measured results as of September 2014.
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 187 of the manual).
Usage Model: The baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 plus the Intel® Xeon Phi™ coprocessor 7120A. Offload processing runs on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: The optimized Intel® Xeon® processor E5-2697 v3 with Intel® Xeon Phi™ coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel® Xeon® processor E5-2697 v2. The optimized offload process demonstrated 1.07X increased performance compared to the NVIDIA K40.
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
AMBER 14 Particle Mesh Ewald (PME): Tobacco Virus
[Figure: AMBER 14 PME Tobacco Virus, 1 Million Atoms; 1 node. Vertical axis: comparative performance. Data labels, in legend order: 1, 1.52X, 2X, 2.26X, 1.93X, 2.41X.]
[Figure legend. Groups: 2 nodes, 3 nodes. Series: Intel® Xeon® processor E5-2697 v2 (baseline); Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 + NVIDIA K40 DPFP.]
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2
Source: Intel measured results as of September 2014.
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 187 of the manual).
Usage Model: The baseline is the Intel® Xeon® processor E5-2697 v2 host only (also measured in http://ambermd.org/gpus/benchmarks.htm#Benchmarks). The speedup is shown with offload processing on both the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized and will be delivered to the AMBER community (license holders) as an update patch available during code configuration.
Results: The optimized offload process demonstrated compelling cluster performance improvement, up to 26X over the baseline Intel® Xeon® processor E5-2697 v2.
AMBER 14 Particle Mesh Ewald (PME): Cellulose NPT
[Figure: AMBER PME Cellulose NPT (408K Atoms), 3-node cluster benchmark. Vertical axis: comparative performance. Data labels: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X.]
[Figure legend. Series: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120P/X; 2 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120P/X. Data label: 1.56X.]
GROMACS: 512K H2O with RF
Application: GROMACS 5.0-RC1. Workload: 512K H2O with the RF method.
Description: GROMACS is a versatile package for molecular dynamics, i.e. it simulates the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: version 5.0-rc1, available here and here. Recipe: available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel® Xeon Phi™ coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
Source: Intel measured results as of April 2014.
[Figure: GROMACS 512K H2O with RF Speed Up, 1 node. Vertical axis: comparative performance. Data labels: 1, 1.03X, 1.72X, 1.79X.]
[Figure legend. Series: NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A 2. Vertical axis: comparative performance.]
NWChem 6.3rev2 and 6.5 CCSD(T) Method: 32 Node Speed Up
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: available here and from the SVN repository. Recipe: available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application for the NWChem community, including at cluster scale.
Results: Compared to the NWChem 6.3rev2 and Intel® Xeon® processor E5-2697 v2 baseline: 1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2; 2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A.
Source: Intel measured results as of July 2014.
[Figure: NWChem CCSD(T) Method, 32-node cluster benchmark. Data labels: 1, 1.24X, 1.52X.]
[Figure legend. Groups: 1 Node, 8 Nodes, 32 Nodes. Series: Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T).]
NAMD 2.10 Pre-Release: STMV
Application: NAMD 2.10 pre-release, STMV.
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available here.
Usage Model: Single rank on the host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
Source: Intel measured results as of September 2014.
[Figure: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms); 32-node cluster benchmark. Vertical axis: comparative performance, 0 to 30. Data labels, 1 node: 1, 1.2X, 2X, 2.1X; 8 nodes: 6.8X, 7.9X, 12.2X, 13.1X; 32 nodes: 20X, 24.2X, 27.2X, 32X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
[Figure legend. Groups: 1 Node, 2 Nodes. Series: Intel® Xeon® processor E5-2697 v3 (baseline: 1 node); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T).]
NAMD 2.10 Pre-Release: ApoA1
Application: NAMD 2.10 pre-release, ApoA1.
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available here.
Usage Model: Single rank on the host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
Source: Intel measured results as of September 2014.
[Figure: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN; 2-node cluster benchmark. Vertical axis: comparative performance. Baseline: 1 node, 55 PPN. Data labels: 1, 1.52X, 1.94X, 2.61X.]
[Figure legend. Groups: 1 Node, 32 Nodes. Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package).]
LAMMPS: Stillinger-Weber Water Benchmark
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows data transfer between host and coprocessor to run concurrently with calculations of neighbor-list, non-bond, bond and long-range terms. The same routines in the LAMMPS Intel package also run faster on the CPU.
Results: The simulation rate increase with the Intel® package is up to 3.6X. Concurrent Intel® Xeon Phi™ coprocessor computations and MPI communications yield improved speedup at higher node counts.
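The load-balancing idea in the usage model above can be illustrated with a simple proportional rule: measure how long the host and the coprocessor each took on their share of the work, then pick the offload fraction that would make both finish at the same time. This Python sketch is a generic illustration and is not taken from the LAMMPS source; the function name, the linear-scaling assumption and the timings are all hypothetical.

```python
def rebalance(fraction, host_time, coprocessor_time):
    """Return the offload fraction that would equalize host and coprocessor time.

    `fraction` of the work ran on the coprocessor in `coprocessor_time` seconds;
    the remainder ran on the host in `host_time` seconds. Assumes run time
    scales linearly with the amount of work on each device.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if host_time <= 0 or coprocessor_time <= 0:
        raise ValueError("timings must be positive")
    host_rate = (1.0 - fraction) / host_time        # work units per second
    coprocessor_rate = fraction / coprocessor_time  # work units per second
    return coprocessor_rate / (host_rate + coprocessor_rate)

# Hypothetical timings: the coprocessor finished early, so it should receive more work.
fraction = 0.5
fraction = rebalance(fraction, host_time=1.0, coprocessor_time=0.5)
print(round(fraction, 3))
```

Applying the rule once per timestep, or once every few timesteps, lets the split follow changes in the workload; a production balancer would typically also smooth the measurements so that one noisy timing does not swing the split.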
Source: Intel measured results as of March 2015.
[Figure: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision); 32-node cluster benchmark. Vertical axis: comparative performance. Data labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X; one bar is annotated "No testing on Tesla". Marked NEW.]
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows data transfer between host and coprocessor to run concurrently with calculations of neighbor-list, non-bond, bond and long-range terms. The same routines in the LAMMPS Intel package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. Performance gains hold at 1.47X when scaling up to 32 nodes.
LAMMPS: Rhodopsin Benchmark, 512K Atoms
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
SOURCE INTEL MEASURED RESULTS AS OF AUGUST 2014
0
1
1 Node 32 Nodes
2S Intelreg Xeonreg processor E5-2697 v3 (LAMMPS Baseline)
2S Intelreg Xeonreg processor E5-2697 v3 (LAMMPS IA Package)
2S E5-2697 v3 + Intelreg Xeon Phitrade coprocessor 7110P7120A Turbo Off
(LAMMPS IA Package)
Co
mp
ara
tiv
e P
erf
orm
an
ce
LAMMPS Rhodopsin Benchmark Performance (Mixed Precision)
APPROVED FOR PUBLIC PRESENTATION 32 NODES CLUSTER BENCHMARK
For configuration details go here
1
127X
168X
1 107X
147X
Other names and brands may be claimed as the property of others
0
1
ERR161544 SRR034966_1 ERR000589 SRR002273_1
Intelreg Xeonreg processor E5-2697 v2 + 1 NVIDIA Tesla K40
Intelreg Xeonreg processor E5-2697 v3
21
Johns Hopkins Bowtie 2 Multiple workloads
Application Bowtie2 version 223 Intelreg AVX2 port
Description NVBowtie version 0993 Bowtie is a GPU-accelerated re-engineering of Bowtie2 a very widely used short-read aligner While being completely rewritten from scratch nvBowtie reproduces many (though not all) of the features of Bowtie2 httpnvlabsgithubionvbionvbowtie_pagehtml
Availability Code Available here Recipe Not available Check for future availability here
Usage Model ERR161544 SRR002273_1 HEK001(TGen) ERR000589_1 SRR033552_1 SRR034966_1 amp ERR024139_1
Highlights See more here
Results Bowtie2 running on the Intelreg Xeonreg processor E5-2697 v3 with Intelreg AVX2 port faster than NVBowtie running on the Intelreg Xeonreg processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads NVIDIA published data of K40 compared to Intelreg Xeonreg processor E5-2600 (6 cores) on one workload
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2015
[Chart: Johns Hopkins Bowtie 2, TGen Workload Speed Up. Y-axis: Comparative Increase. 1 node. Bar labels in extracted order: 1, 1.87X, 1.59X, 1.08X, 0.88X. Marked NEW.]
Application: Burrows-Wheeler Aligner (BWA) version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: Available here. Recipe: Available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The Intel® Xeon® processor E5-2697 v2 together with the Intel® Xeon Phi™ coprocessor, run in symmetric mode, demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2014
Burrows-Wheeler Aligner (BWA-ALN): Human Genome
[Chart: BWA-ALN Speed Up. Y-axis: Comparative Performance. 1 node.
2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN): 1
2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN): 1.24X
2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A: 1.86X]
[Chart legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searching for alignments of nucleotide query sequences against a known nucleotide db volume set from the National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: Available here. Recipe: Available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor to find the split that gives maximum speedup (the sweet spot). The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 8020 and 592318.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
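The load-sharing usage model above distributes one batch of queries between the host processor and the coprocessor according to a split ratio. A minimal sketch of such a split follows, assuming a simple contiguous partition. The function name `split_queries`, the query names, and the 0.8 host share (one possible reading of the "8020" split) are illustrative assumptions, not the benchmark's actual scheduling code.

```python
def split_queries(queries, host_share):
    """Partition queries into (host_batch, coprocessor_batch).

    host_share is the fraction of queries assigned to the host CPU;
    the remainder goes to the coprocessor. Hypothetical helper.
    """
    if not 0.0 <= host_share <= 1.0:
        raise ValueError("host_share must be between 0 and 1")
    cut = round(len(queries) * host_share)
    return queries[:cut], queries[cut:]


# 100 placeholder query names standing in for the concatenated NCBI queries.
queries = [f"query_{i:03d}" for i in range(100)]
host_batch, coprocessor_batch = split_queries(queries, host_share=0.8)
print(len(host_batch), len(coprocessor_batch))
```

Sweeping `host_share` over a range and timing each run is one way to locate the sweet spot the slide refers to; the slide does not say how the reported splits were found.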
BLAST: BLASTn v30
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
[Chart: BLASTn v30 Speed Up. Y-axis: Comparative Performance. 1 node. Bar labels in extracted order: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X. Marked NEW.]
BLAST: BLASTp v30
[Chart: BLASTp v30 Speed Up. Y-axis: Comparative Performance. 1 node. Legend as extracted: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized. Bar labels in extracted order: 1.41X, 1, 1.39X, 1.21X, 1.3X, 1.15X. Marked NEW.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3. "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A.
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searching for alignments of a protein query sequence against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: Available here. Recipe: Available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor to find the split that gives maximum speedup (the sweet spot). The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 337 and 2857.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause them to deviate from published specifications. Contact Intel for currently characterized errata.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance, or other benefits of the technology will vary depending on your hardware and software configuration. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). Intel® TXT also requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html
A system capable of Intel® Turbo Boost Technology is required. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software to execute the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English).
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized quotation or reproduction is prohibited.
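Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in system hardware or software design or configuration may cause actual performance to differ from the published tests and ratings. When considering the purchase of systems or components, you should consult other sources of information to evaluate performance comprehensively. For more information on the performance evaluation of Intel products, see http://www.intel.com/performance
Intel does not control or audit the design or implementation of the third-party benchmarks or Web sites referenced in this document. Intel encourages you to visit the referenced Web sites, or others where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.
Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number proportional to the reported performance improvement.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English).
The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. Intel assumes no liability whatsoever for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. When considering a purchase, you should consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance comprehensively.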
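The relative-performance rule stated in the disclaimer above (assign 1.0 to the baseline result, then divide every platform's result by the baseline platform's result) can be sketched as follows. The function name and the sample figures are invented for illustration and are not measurements from this deck; the sketch also assumes a "higher is better" metric such as throughput.

```python
def relative_performance(results, baseline_key):
    """Normalize higher-is-better results so the baseline equals 1.0.

    results maps a platform name to its raw benchmark result.
    Hypothetical helper, not an Intel tool.
    """
    baseline = results[baseline_key]
    if baseline <= 0:
        raise ValueError("baseline result must be positive")
    return {name: value / baseline for name, value in results.items()}


# Made-up raw results (for example, jobs per hour) for three platforms.
measured = {"baseline": 40.0, "platform_a": 50.0, "platform_b": 70.0}
relative = relative_performance(measured, "baseline")
for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f}X")
```

For a "lower is better" metric such as elapsed time, the ratio would be inverted (baseline time divided by platform time) to keep larger numbers meaning faster.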
Intel® EE for Lustre: Connectivity with Hadoop
Breakdown: open source; Intel proprietary extensions; connectors to Hadoop distributions
Lustre that can connect to Hadoop
ANL Selects Intel for World's Biggest Supercomputer
2-system CORAL award extends IA leadership in extreme scale HPC
Aurora, Argonne National Laboratory: >180 PF (April '15)
Theta, Argonne National Laboratory: >8.5 PF
Trinity, NNSA†: >40 PF (July '14)
Cori, NERSC‡: >30 PF (April '14)
>$200M
‡ Cray XC Series at National Energy Research Scientific Computing Center (NERSC). † Cray XC Series at National Nuclear Security Administration (NNSA).
The Most Advanced Supercomputer Ever Built
An Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
>180 PFLOPS (option to increase up to 450 PF)
>50,000 nodes
13 MW
2018 delivery
18X higher performance†
>6X more energy efficient†
Prime Contractor: Intel. Subcontractor: Cray.
Source: Argonne National Laboratory and Intel. †Comparison of theoretical peak double precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PFs and 4.8 MW).
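The two headline ratios on this slide can be checked with simple arithmetic from the footnote. The sketch below reads the slide's lower bounds as exact values (180 PF and 13 MW for Aurora) and reads the MIRA figures as 10 PF and 4.8 MW; the 4.8 MW reading is an assumption about a decimal point lost in extraction, chosen because it is the reading consistent with the ">6X" claim. Variable names are illustrative.

```python
# Figures taken from the slide; Aurora values are stated lower bounds.
aurora_peak_pf = 180.0
aurora_power_mw = 13.0
# MIRA comparison system from the footnote (4.8 MW is an assumed reading).
mira_peak_pf = 10.0
mira_power_mw = 4.8

# Peak-performance ratio: how many times more peak FLOPS.
performance_ratio = aurora_peak_pf / mira_peak_pf

# Energy-efficiency ratio: peak FLOPS per megawatt, Aurora relative to MIRA.
efficiency_ratio = (aurora_peak_pf / aurora_power_mw) / (mira_peak_pf / mira_power_mw)

print(f"performance: {performance_ratio:.1f}X, efficiency: {efficiency_ratio:.2f}X")
```

Under these assumptions the performance ratio is 18 and the efficiency ratio is a little above 6.6, which matches the "18X" and ">6X" labels.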
Application Support Status: Life Sciences
"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being"
Wang Bingqiang, Head of High Performance Computing, BGI
AMBER 14 Particle Mesh Ewald (PME): Tobacco Virus
[Chart: AMBER 14 PME Tobacco Virus, 1 Million Atoms. Y-axis: Comparative Performance. 1 node.
Intel® Xeon® processor E5-2697 v2 (baseline): 1
Intel® Xeon® processor E5-2697 v2 (optimized): 1.52X
Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A: 2X
Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP: 2.26X
Intel® Xeon® processor E5-2697 v3: 1.93X
Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A: 2.41X]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: Available as a patch. Recipe: Available here (Section 187 of the manual).
Usage Model: The baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 with the Intel® Xeon Phi™ coprocessor 7120A. Offload processing runs on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: The optimized Intel Xeon processor E5-2697 v3 with Intel Xeon Phi coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel Xeon processor E5-2697 v2. The optimized offload process demonstrated 1.07X increased performance compared to the NVIDIA K40.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
AMBER 14 Particle Mesh Ewald (PME): Cellulose NPT
[Chart: AMBER PME Cellulose NPT (408K Atoms). Y-axis: Comparative Performance. Groups: 2 nodes, 3 nodes. 3-node cluster benchmark. Series: Intel® Xeon® processor E5-2697 v2 (baseline); Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 + NVIDIA K40 DPFP. Bar labels in extracted order: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X.]
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: Available as a patch. Recipe: Available here (Section 187 of the manual).
Usage Model: The baseline is the Intel® Xeon® processor E5-2697 v2, host only (also measured in http://ambermd.org/gpus/benchmarks.htm#Benchmarks). The speedup is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized and will be delivered to the AMBER community (license holders) and made available as an update patch during code configuration.
Results: The optimized offload process demonstrated compelling cluster performance improvement, up to 26X over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
GROMACS 512K H2O with RF
[Chart: GROMACS 512K H2O with RF Speed Up. Y-axis: Comparative Performance. 1 node. Series: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120P/X; 2 Intel® Xeon Phi™ coprocessors 7120P/X; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessors 7120P/X. Bar labels in extracted order: 1.56X, 1.79X, 1, 1.03X, 1.72X.]
Application: GROMACS 5.0-RC1. Workload: 512K H2O with RF method.
Description: GROMACS is a versatile package to perform molecular dynamics, i.e., simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: Version 5.0-rc1 available here and here. Recipe: Available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF APRIL 2014
NWChem CCSD(T) Method
[Chart: NWChem 6.3rev2 and 6.5 CCSD(T) Method, 32 Node Speed Up. Y-axis: Comparative Performance. 32-node cluster benchmark.
NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2: 1
NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2: 1.24X
NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessors 7120A: 1.52X]
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: Available here and from the SVN repository. Recipe: Available here.
Usage Model: Offload using LEO and OpenMP
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application for the NWChem community, including at cluster scale.
Results: Compared to the baseline of NWChem 6.3rev2 on the Intel® Xeon® processor E5-2697 v2:
1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2.
2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A.
SOURCE: INTEL MEASURED RESULTS AS OF JULY 2014
NAMD 2.10 Pre-Release: STMV
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms). Y-axis: Comparative Performance (0 to 30). Groups: 1 node, 8 nodes, 32 nodes. 32-node cluster benchmark. Series: Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T). Bar labels in extracted order: 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
Application: NAMD 2.10 pre-release, STMV
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: Available here.
Usage Model: Single rank on the host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
NAMD 2.10 Pre-Release: ApoA1
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN. Y-axis: Comparative Performance (0 to 2). Groups: 1 node, 2 nodes. 2-node cluster benchmark. Series: Intel® Xeon® processor E5-2697 v3 (baseline: 1 node, 55 PPN); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T). Bar labels in extracted order: 1, 1.52X, 1.94X, 2.61X.]
Application: NAMD 2.10 pre-release, ApoA1
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: Available here.
Usage Model: Single rank on the host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
LAMMPS Stillinger-Weber Water Benchmark
[Chart legend: axis 0 to 3; groups: 1 node, 32 nodes. Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package).]
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In the main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: The simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup at higher node counts.
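The dynamic load balancing described above can be illustrated with a toy feedback loop: after each step, measure how long the host and the coprocessor took, then shift the offloaded fraction toward the split at which both would finish together. This is a sketch under stated assumptions, not the LAMMPS Intel Package implementation; the function names, the damping factor, and the speed values are all hypothetical.

```python
def rebalance(fraction, host_time, coprocessor_time, damping=0.5):
    """Return an updated offload fraction from the last step's timings.

    fraction: share of the work sent to the coprocessor (0 < fraction < 1).
    host_time, coprocessor_time: seconds each side needed for its share.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    # Estimate each side's throughput (work per second) from the last step.
    host_rate = (1.0 - fraction) / host_time
    coprocessor_rate = fraction / coprocessor_time
    # The split at which both sides would finish at the same time.
    target = coprocessor_rate / (host_rate + coprocessor_rate)
    # Move only part of the way so noisy timings do not cause oscillation.
    return fraction + damping * (target - fraction)


def simulate(host_speed, coprocessor_speed, steps=20, fraction=0.5):
    """Toy model in which time equals work share divided by device speed."""
    for _ in range(steps):
        host_time = (1.0 - fraction) / host_speed
        coprocessor_time = fraction / coprocessor_speed
        fraction = rebalance(fraction, host_time, coprocessor_time)
    return fraction


# With a coprocessor 1.5 times as fast as the host, the balanced split
# sends 1.5 / (1.0 + 1.5) of the work to the coprocessor.
balanced = simulate(host_speed=1.0, coprocessor_speed=1.5)
print(f"offload fraction after balancing: {balanced:.3f}")
```

The damping factor trades convergence speed for stability; in a real run the timings would come from measurements rather than from a fixed speed model.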
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
[Chart: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision). Y-axis: Comparative Performance. 32-node cluster benchmark. Bar labels in extracted order: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X. One bar position is labeled "No testing on Tesla". Marked NEW.]
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In the main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. The performance gain holds at 1.47X when scaling up to 32 nodes.
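One way to read the two figures in the Results line is as a retention ratio: how much of the single-node gain survives at 32 nodes. The sketch below assumes the figures are 1.68X and 1.47X (decimal points restored from the extracted "168X" and "147X"); the function name is illustrative.

```python
def speedup_retention(speedup_small_scale, speedup_large_scale):
    """Fraction of the small-scale speedup that remains at large scale."""
    if speedup_small_scale <= 0:
        raise ValueError("speedup must be positive")
    return speedup_large_scale / speedup_small_scale


# Single-node and 32-node speedups over the baseline configuration.
retention = speedup_retention(1.68, 1.47)
print(f"retained at 32 nodes: {retention:.1%}")
```

A retention close to 1 means the coprocessor's benefit is largely preserved as communication costs grow with node count.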
LAMMPS Rhodopsin Benchmark, 512K Atoms
SOURCE: INTEL MEASURED RESULTS AS OF AUGUST 2014
[Chart: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision). Y-axis: Comparative Performance. Groups: 1 node, 32 nodes. 32-node cluster benchmark.
2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline): 1 (1 node), 1 (32 nodes)
2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package): 1.27X (1 node), 1.07X (32 nodes)
2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, Turbo Off (LAMMPS IA Package): 1.68X (1 node), 1.47X (32 nodes)]
0
1
ERR161544 SRR034966_1 ERR000589 SRR002273_1
Intelreg Xeonreg processor E5-2697 v2 + 1 NVIDIA Tesla K40
Intelreg Xeonreg processor E5-2697 v3
21
Johns Hopkins Bowtie 2 Multiple workloads
Application Bowtie2 version 223 Intelreg AVX2 port
Description NVBowtie version 0993 Bowtie is a GPU-accelerated re-engineering of Bowtie2 a very widely used short-read aligner While being completely rewritten from scratch nvBowtie reproduces many (though not all) of the features of Bowtie2 httpnvlabsgithubionvbionvbowtie_pagehtml
Availability Code Available here Recipe Not available Check for future availability here
Usage Model ERR161544 SRR002273_1 HEK001(TGen) ERR000589_1 SRR033552_1 SRR034966_1 amp ERR024139_1
Highlights See more here
Results Bowtie2 running on the Intelreg Xeonreg processor E5-2697 v3 with Intelreg AVX2 port faster than NVBowtie running on the Intelreg Xeonreg processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads NVIDIA published data of K40 compared to Intelreg Xeonreg processor E5-2600 (6 cores) on one workload
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
SOURCE INTEL MEASURED RESULTS AS OF JANUARY 2015
Johns Hopkins Bowtie 2 TGen Workload Speed Up
Co
mp
ara
tiv
e I
ncr
ea
se
1 NODE
For configuration details go here
1
187X
Other names and brands may be claimed as the property of others
APPROVED FOR PUBLIC PRESENTATION
159X
108X
88X
NEW
22
Burrows-Wheeler Aligner (BWA-ALN): Human Genome
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: available here. Recipe: available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor symmetric process demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
[Figure: BWA-ALN speed-up, 1 node. Y-axis: comparative performance.
  2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN): 1
  2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN): 1.24X
  2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A: 1.86X]
Source: Intel measured results as of January 2014.
BLAST: BLASTn v30
Application: Basic Local Alignment Search Tool (BLASTn) v30.
Description: Searches for alignments of nucleotide query sequences against a known nucleotide db volume set from the National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor to find the split that gives the maximum speed-up (the sweet spot). The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline simulation rate, the speed-up with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
[Figure: BLASTn v30 speed-up, 1 node. Y-axis: comparative performance.
  Legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized.
  Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3. "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A.
Source: Intel measured results as of March 2015.
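The load-sharing idea in the BLASTn usage model (split a query set between host and coprocessor, then look for the split with the best throughput) can be illustrated with a toy model. This is a minimal sketch, not the procedure Intel used: the function names, the placeholder query list, and the device throughputs are all hypothetical, and the real experiments located the sweet spot by measurement rather than from a formula.

```python
def split_queries(queries, host_percent):
    """Give the first host_percent % of the queries to the host, the rest to the coprocessor."""
    if not 0 <= host_percent <= 100:
        raise ValueError("host_percent must be between 0 and 100")
    cut = len(queries) * host_percent // 100
    return queries[:cut], queries[cut:]


def makespan(n_host, n_copro, host_rate, copro_rate):
    """Wall time when both devices work concurrently; rates are queries per second."""
    return max(n_host / host_rate, n_copro / copro_rate)


def best_split(queries, host_rate, copro_rate, step=5):
    """Scan candidate host shares and return the one with the shortest makespan."""
    def cost(percent):
        host_part, copro_part = split_queries(queries, percent)
        return makespan(len(host_part), len(copro_part), host_rate, copro_rate)
    return min(range(0, 101, step), key=cost)


# Hypothetical inputs: 100 placeholder queries and made-up device throughputs.
queries = [f"query_{i}" for i in range(100)]
best = best_split(queries, host_rate=4.0, copro_rate=1.0)
host_part, copro_part = split_queries(queries, best)
print(best, len(host_part), len(copro_part))
```

With a host assumed to be four times faster than the coprocessor, the scan settles on an 80/20 split, because that is where both devices finish at the same time. Moving a few queries either way leaves one device idle, which is one way to read the "small sweet spot" noted in the highlights.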
BLAST: BLASTp v30
Application: Basic Local Alignment Search Tool (BLASTp) v30.
Description: Searches for alignments of a protein query sequence against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor to find the split that gives the maximum speed-up (the sweet spot). The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline simulation rate, the speed-up with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
[Figure: BLASTp v30 speed-up, 1 node. Y-axis: comparative performance.
  Legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized.
  Bar labels: 1, 1.41X, 1.39X, 1.21X, 1.3X, 1.15X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3. "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A.
Source: Intel measured results as of March 2015.
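Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for the currently characterized errata.
Intel® Virtualization Technology requires a computer system with an Intel® processor, BIOS, and virtual machine monitor (VMM) that support the technology. Functionality, performance, or other benefits of virtualization technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). Intel® TXT also requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html
A system with Intel® Turbo Boost Technology capability is required. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as third-party software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (in English).
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries.
© 2012 Intel Corporation. Unauthorized quotation or reproduction is prohibited.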
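Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on the performance of Intel products, see http://www.intel.com/performance
Intel does not control or audit the design or implementation of third-party benchmarks or web sites referenced in this document. Intel encourages readers to visit the referenced web sites, or other web sites where similar performance benchmarks are reported, and to confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.
Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing the actual benchmark result of each platform by the actual benchmark result of the baseline platform, and assigning a relative performance number that correlates with the reported performance improvement.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (in English).
TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (in English).
SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (in English).
The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. Intel assumes no liability for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. When considering a purchase, you should consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance as a whole.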
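The relative-performance rule in the disclaimer (the baseline gets 1.0, every other platform is its result divided by the baseline result) is simple enough to sketch. The platform names and throughput numbers below are made up for illustration and are not taken from any slide; the sketch also assumes a "higher is better" metric, so a time-based result would need to be inverted first.

```python
def relative_performance(results, baseline):
    """Return each platform's result divided by the baseline platform's result."""
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    return {name: value / base for name, value in results.items()}


# Hypothetical throughput measurements (higher is better).
measured = {
    "baseline": 40.0,
    "optimized": 50.0,
    "optimized + coprocessor": 74.0,
}
relative = relative_performance(measured, "baseline")
for name, value in relative.items():
    print(f"{name}: {value:.2f}X")
```

Every "X" figure on the benchmark slides is a ratio of this kind, which is why each chart carries a bar labeled 1 for its baseline configuration.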
ANL Selects Intel for World's Biggest Supercomputer
2-system CORAL award extends IA leadership in extreme-scale HPC
Aurora, Argonne National Laboratory: >180 PF (April '15)
Theta, Argonne National Laboratory: >8.5 PF
Trinity, NNSA†: >40 PF (July '14)
Cori, NERSC‡: >30 PF (April '14)
>$200M
‡ Cray XC Series at National Energy Research Scientific Computing Center (NERSC)
† Cray XC Series at National Nuclear Security Administration (NNSA)
The Most Advanced Supercomputer Ever Built
An Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
>180 PFLOPS (option to increase up to 450 PF)
>50,000 nodes
13 MW
2018 delivery
18X higher performance†
>6X more energy efficient†
Prime Contractor
Subcontractor
Source: Argonne National Laboratory and Intel. †Comparison of theoretical peak double-precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PF and 4.8 MW).
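The two headline ratios on this slide follow from the footnote figures. The short calculation below is a sketch under one assumption: the extracted "48MW" for MIRA is read as 4.8 MW, since decimal points were lost throughout the extracted text. The Aurora figures are the slide's lower bounds (>180 PF, 13 MW).

```python
# Peak performance in PFLOPS and power in MW, as stated on the slide.
# ASSUMPTION: MIRA's power is 4.8 MW (extracted text reads "48MW").
aurora_pf, aurora_mw = 180.0, 13.0
mira_pf, mira_mw = 10.0, 4.8

performance_ratio = aurora_pf / mira_pf
efficiency_ratio = (aurora_pf / aurora_mw) / (mira_pf / mira_mw)

print(f"performance: {performance_ratio:.0f}X")
print(f"energy efficiency (PF per MW): {efficiency_ratio:.2f}X")
```

Under that reading the performance ratio is 18 and the efficiency ratio is about 6.6, consistent with the "18X" and ">6X" claims.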
Application Support Status: Life Sciences
"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being."
Wang Bingqiang, Head of High Performance Computing, BGI
AMBER 14 Particle Mesh Ewald (PME): Tobacco Virus
Application: AMBER 14.
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DP/DP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 18.7 of the manual).
Usage Model: The baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 plus the Intel® Xeon Phi™ coprocessor 7120A. Offload processing runs on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (anyone who has a license), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV/
Results: The optimized Intel Xeon processor E5-2697 v3 with Intel Xeon Phi coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel Xeon processor E5-2697 v2. The optimized offload process demonstrated 1.07X increased performance compared to the NVIDIA K40.
[Figure: AMBER 14 PME, Tobacco Virus, 1 million atoms, 1 node. Y-axis: comparative performance.
  Intel® Xeon® processor E5-2697 v2 (baseline): 1
  Intel® Xeon® processor E5-2697 v2 (optimized): 1.52X
  Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A: 2X
  Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP: 2.26X
  Intel® Xeon® processor E5-2697 v3: 1.93X
  Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A: 2.41X]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3.
Source: Intel measured results as of September 2014.
AMBER 14 Particle Mesh Ewald (PME): Cellulose NPT
Application: AMBER 14.
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DP/DP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 18.7 of the manual).
Usage Model: The baseline runs on the Intel® Xeon® processor E5-2697 v2 host only (also measured in http://ambermd.org/gpus/benchmarks.htm#Benchmarks). The speed-up is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized, will be delivered to the AMBER community (anyone who has a license), and will be available as an update patch during code configuration.
Results: The optimized offload process demonstrated cluster performance improvement of up to 2.6X over the baseline Intel® Xeon® processor E5-2697 v2.
[Figure: AMBER PME Cellulose NPT (408K atoms), 3-node cluster benchmark. X-axis: 2 nodes, 3 nodes. Y-axis: comparative performance.
  Legend: Intel® Xeon® processor E5-2697 v2 (baseline); Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 + NVIDIA K40 DPFP.
  Bar labels: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X.]
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2.
Source: Intel measured results as of September 2014.
GROMACS 512K H2O with RF
Application: GROMACS 5.0-RC1. Workload: 512K H2O with the RF method.
Description: GROMACS is a versatile package for molecular dynamics, i.e., simulating the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: version 5.0-rc1 available here and here. Recipe: available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor together with the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
[Figure: GROMACS 512K H2O with RF speed-up, 1 node. Y-axis: comparative performance.
  Legend: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120P/X; 2 Intel® Xeon Phi™ coprocessors 7120P/X; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessors 7120P/X.
  Bar labels: 1, 1.03X, 1.56X, 1.72X, 1.79X.]
Source: Intel measured results as of April 2014.
NWChem CCSD(T) Method
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: available here and from the SVN repository. Recipe: available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application for the NWChem community, including at cluster scale.
Results: Compared to the baseline of NWChem 6.3 rev2 on the Intel® Xeon® processor E5-2697 v2:
1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2.
2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A.
[Figure: NWChem 6.3 rev2 and 6.5, CCSD(T) method, 32-node speed-up (cluster benchmark). Y-axis: comparative performance.
  NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2: 1
  NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2: 1.24X
  NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessors 7120A: 1.52X]
Source: Intel measured results as of July 2014.
NAMD 2.10 Pre-Release: STMV
Application: NAMD 2.10 pre-release, STMV workload.
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd/
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available here.
Usage Model: Single rank on the host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of the NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
[Figure: NAMD 2.10 (pre-release) cluster performance increase, STMV (~1M atoms), 32-node cluster benchmark. X-axis: 1 node, 8 nodes, 32 nodes. Y-axis: comparative performance (0 to 30).
  Legend: Intel® Xeon® processor E5-2697 v2 (baseline, 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T).
  Bar labels: 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3.
Source: Intel measured results as of September 2014.
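One way to read a cluster result such as "up to 32X on 32 nodes" is to divide the speed-up by the growth in node count. The helper below is a hypothetical illustration, not a metric from the deck. Note that the slide's 32X compares a v3 processor plus coprocessor at 32 nodes against a v2-only single node, so the ratio mixes the hardware change with node scaling and is not a pure parallel-efficiency figure.

```python
def scaling_efficiency(speedup, nodes, baseline_nodes=1):
    """Speed-up divided by the growth in node count (1.0 means linear scaling)."""
    if baseline_nodes < 1 or nodes < baseline_nodes:
        raise ValueError("expected nodes >= baseline_nodes >= 1")
    return speedup / (nodes / baseline_nodes)


# The STMV headline result: 32X on 32 nodes versus a 1-node baseline.
stmv = scaling_efficiency(32.0, 32)
print(f"speed-up per added node factor: {stmv:.2f}")
```

A value of 1.0 here means the added coprocessors and newer processors made up for whatever was lost to communication overhead when scaling out.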
NAMD 2.10 Pre-Release: ApoA1
Application: NAMD 2.10 pre-release, ApoA1 workload.
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd/
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available here.
Usage Model: Single rank on the host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of the NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
[Figure: NAMD 2.10 (pre-release) cluster performance increase, ApoA1 (~92K atoms), 55 PPN, 2-node cluster benchmark. X-axis: 1 node, 2 nodes. Y-axis: comparative performance.
  Legend: Intel® Xeon® processor E5-2697 v3 (baseline, 1 node, 55 PPN); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T).
  Bar labels: 1, 1.52X, 1.94X, 2.61X.]
Source: Intel measured results as of September 2014.
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS.
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel package also run faster on the CPU.
Results: The simulation rate increase with the Intel® package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speed-up at higher node counts.
[Figure: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision), 32-node cluster benchmark. X-axis: 1 node, 32 nodes. Y-axis: comparative performance (0 to 3).
  Legend: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA package).
  Bar labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X; "No testing on Tesla".]
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3. "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A.
Source: Intel measured results as of March 2015.
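The dynamic load balancing described for the LAMMPS runs can be pictured with a toy feedback loop: after each step, shift the offloaded fraction toward the value at which host and coprocessor finish together. The sketch below is hypothetical and is not the algorithm in the LAMMPS Intel package; the `rebalance` function, its damping factor, and the per-unit costs are all assumptions made for illustration.

```python
def rebalance(fraction, host_time, copro_time, damping=0.5):
    """Nudge the offloaded fraction toward equal host and coprocessor times."""
    if host_time <= 0 or copro_time <= 0:
        return fraction
    # Per-unit work costs inferred from the timings of the last step.
    host_cost = host_time / (1.0 - fraction)
    copro_cost = copro_time / fraction
    # Both sides finish together when f * copro_cost == (1 - f) * host_cost.
    target = host_cost / (host_cost + copro_cost)
    new_fraction = fraction + damping * (target - fraction)
    return min(max(new_fraction, 0.01), 0.99)


# Toy timing model: the coprocessor is assumed 3x faster per unit of work.
HOST_COST, COPRO_COST = 3.0, 1.0
fraction = 0.5
for _ in range(20):
    host_time = (1.0 - fraction) * HOST_COST
    copro_time = fraction * COPRO_COST
    fraction = rebalance(fraction, host_time, copro_time)
print(round(fraction, 3))
```

In this toy model the fraction converges to 0.75, the point at which both devices take the same time per step. The damping term is a design choice that keeps a noisy timing measurement from swinging the split too far in one step.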
LAMMPS Rhodopsin Benchmark: 512K Atoms
Application: LAMMPS.
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel package also run faster on the CPU.
Results: Up to 1.68X performance improvement using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization on a single node, compared to the baseline configuration. Performance gains continue to hold at 1.47X when scaling up to 32 nodes.
[Figure: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision), 32-node cluster benchmark. Y-axis: comparative performance.
  1 node: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline): 1; 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA package): 1.27X; 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, turbo off (LAMMPS IA package): 1.68X.
  32 nodes: baseline: 1; LAMMPS IA package: 1.07X; with coprocessor: 1.47X.]
Source: Intel measured results as of August 2014.
Johns Hopkins Bowtie 2: Multiple Workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port.
Description: NVBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: available here. Recipe: not available; check for future availability here.
Usage Model: ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1.
Highlights: See more here.
Results: Bowtie2 with the Intel® AVX2 port running on the Intel® Xeon® processor E5-2697 v3 is faster than NVBowtie running on the Intel® Xeon® processor E5-2697 v2 with the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published data for the K40 compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
[Figure legend: workloads ERR161544, SRR034966_1, ERR000589, SRR002273_1; series: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3.]
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
SOURCE INTEL MEASURED RESULTS AS OF JANUARY 2015
Johns Hopkins Bowtie 2 TGen Workload Speed Up
Co
mp
ara
tiv
e I
ncr
ea
se
1 NODE
For configuration details go here
1
187X
Other names and brands may be claimed as the property of others
APPROVED FOR PUBLIC PRESENTATION
159X
108X
88X
NEW
22
Application Burrows-Wheeler Aligner version 0510 BWA-ALN is represented in this benchmark Workload is korean_female (read file 35 GB 30 GB reference data base)
Description BWA is a popular software package for mapping low-divergent sequences against a large reference genome such as the human genome More at httpbio-bwasourceforgenet
Availability Code Available here Recipe Available here
Usage Model Hybrid MPI + OpenMP using symmetric mode
Highlights Results are identical to the unmodified run of BWA-ALN
Results The Intelreg Xeonreg processor E5-2697 v2 and the Intelreg Xeon Phitrade coprocessor symmetric process demonstrated up to 186X improved performance over the baseline Intelreg Xeonreg processor E5-2697 v2
SOURCE INTEL MEASURED RESULTS AS OF JANUARY 2014
0
1
BWA-ALN Speed Up
2S Intelreg Xeonreg processor E5-2697 v2 (baseline BWA-ALN)
2S Intelreg Xeonreg processor E5-2697 v2 (optimized BWA-ALN)
2S Intelreg Xeonreg processor E5-2697 v2 + Intelreg Xeon Phitrade coprocessor
7120A
Co
mp
ara
tiv
e P
erf
orm
an
ce
APPROVED FOR PUBLIC PRESENTATION 1 NODE
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
Burrows-Wheeler Aligner (BWA-ALN) Human Genome
For configuration details go here
1
124X
186X
Other names and brands may be claimed as the property of others
0
1
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
ldquoXeon E5-2697 v2v3rdquo = Intelreg Xeonreg processor E5-2697 v2v3
ldquoXeon Phi 7120Ardquo = Intelreg Xeon Phitrade coprocessor 7120A
23
Application Basic Local Alignment Search Tool (BLASTn) v30
Description Searching for alignment in nucleotide query sequences against a known nucleotide db volume set National Center for Biotechnology Information (NCBI) More at httpblastncbinlmnihgov
Availability Code Available here Recipe Available here
Usage Model 4 (multiple queries multiple db) 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intelreg Xeonreg processor and Intelreg Xeon Phitrade coprocessor for maximum speedup sweet spot Experiment was repeated 20 times with the pick of queries randomized for a sweet spot split 8020 and 592318
Highlights Throughput for this load sharing model has a small sweet spot for a sufficiently large query set
Results Compared to the baseline simulation rate speed up with Intelreg Xeonreg processor E5-2697 v3 and Intelreg Xeon Phitrade coprocessor 7120A heterogeneous model is 152X Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
BLAST: BLASTn v30
Other names and brands may be claimed as the property of others. SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
APPROVED FOR PUBLIC PRESENTATION
[Chart: BLASTn v30 Speed Up, 1 node. Y-axis: Comparative Performance. Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X]
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
BLAST: BLASTp v30
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searches for alignments of protein query sequences against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: Available here. Recipe: Available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor to find the maximum-speedup sweet spot. The experiment was repeated 20 times with the pick of queries randomized for a sweet spot split 337 and 2857
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
[Chart: BLASTp v30 Speed Up, 1 node. Y-axis: Comparative Performance. Bar labels: 1.41X, 1, 1.39X, 1.21X, 1.3X, 1.15X]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Legal Information
All products, computer systems, dates, and figures described in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within a processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause them to deviate from published specifications. Contact Intel for currently characterized errata.
Intel® Virtualization Technology requires a computer system with an Intel® processor, BIOS, and virtual machine monitor (VMM) that support it. Functionality, performance, or other benefits of virtualization technology vary depending on your hardware and software configuration. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). Intel® TXT also requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html
A system that supports Intel® Turbo Boost Technology is required. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as third-party software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English)
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized quotation or reproduction is prohibited.
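Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, see http://www.intel.com/performance
Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites, or others where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.
Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number that correlates with the reported performance improvement.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English).
The information in this document is provided "as is". No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted. Intel assumes no liability whatsoever for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.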
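The relative-performance rule in the disclaimer (assign the baseline a value of 1.0, then divide each platform's result by the baseline platform's result) is simple enough to sketch. The scores below are hypothetical and only illustrate the arithmetic; `relative_performance` is an illustrative name, not part of any Intel tool.

```python
def relative_performance(results, baseline):
    """Normalize raw benchmark scores so the baseline platform equals 1.0.

    results maps a platform label to a higher-is-better score
    (for example, simulated nanoseconds per day).
    """
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline score must be positive")
    return {name: score / base for name, score in results.items()}


# Hypothetical raw scores, for illustration only.
raw = {"baseline": 4.0, "optimized": 5.0, "optimized + coprocessor": 8.0}
relative = relative_performance(raw, "baseline")
for name, value in relative.items():
    print(f"{name}: {value:.2f}X")
```

For benchmarks that report elapsed time rather than throughput, the ratio would be inverted (baseline time divided by platform time) so that larger values still mean faster.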
The Most Advanced Supercomputer Ever Built
An Intel-led collaboration with ANL and Cray to accelerate discovery & innovation
>180 PFLOPS (option to increase up to 450 PF)
>50,000 nodes
13 MW
2018 delivery
18X higher performance†
>6X more energy efficient†
Prime Contractor
Subcontractor
Source: Argonne National Laboratory and Intel. †Comparison of theoretical peak double-precision FLOPS and power consumption to ANL's largest current system, MIRA (10 PF and 4.8 MW)
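The two headline ratios can be checked from the figures in the footnote. The sketch below assumes the MIRA numbers read as 10 PF and 4.8 MW (the extracted text has lost its decimal points) and uses the lower bound of 180 PFLOPS at 13 MW for the new system; the variable names are illustrative.

```python
# Theoretical peak performance (PFLOPS) and power (MW) as quoted on the slide.
new_pflops, new_mw = 180.0, 13.0
mira_pflops, mira_mw = 10.0, 4.8  # assumed reading of "10PFs and 48MW"

performance_ratio = new_pflops / mira_pflops
efficiency_ratio = (new_pflops / new_mw) / (mira_pflops / mira_mw)

print(f"performance: {performance_ratio:.0f}X")
print(f"energy efficiency: {efficiency_ratio:.2f}X")
```

Under these assumptions the performance ratio is 18 and the efficiency ratio lands between 6 and 7, which agrees with the "18X" and ">6X" claims.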
Application Support Status: Life Sciences
"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being"
Wang Bingqiang, Head of High Performance Computing, BGI
Legend (AMBER 14 PME Tobacco Virus chart):
Intel® Xeon® processor E5-2697 v2 (baseline)
Intel® Xeon® processor E5-2697 v2 (optimized)
Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A
Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP
Intel® Xeon® processor E5-2697 v3
Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: Available as a patch. Recipe: Available here (Section 187 of the manual).
Usage Model: The baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 plus the Intel® Xeon Phi™ coprocessor 7120A. Offload processing runs on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: The optimized Intel Xeon processor E5-2697 v3 with Intel Xeon Phi coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel Xeon processor E5-2697 v2. The optimized offload process demonstrated 1.07X the performance of the NVIDIA K40.
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
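To see why a fixed host/coprocessor split matters, consider a simple timing model in which both sides work concurrently and a step finishes when the slower side finishes. This is an illustrative model only, not how AMBER schedules its offload; the rates and the names `step_time` and `balanced_fraction` are hypothetical.

```python
def step_time(work, host_rate, coprocessor_rate, host_fraction):
    """Time for one step when host and coprocessor run concurrently."""
    host_time = work * host_fraction / host_rate
    coprocessor_time = work * (1.0 - host_fraction) / coprocessor_rate
    # The step completes only when the slower side is done.
    return max(host_time, coprocessor_time)


def balanced_fraction(host_rate, coprocessor_rate):
    """Host share at which both sides finish at the same moment."""
    return host_rate / (host_rate + coprocessor_rate)


# Equal rates: the balanced split is 50/50.
even_split = balanced_fraction(1.0, 1.0)

# Hypothetical unequal rates: a 50/50 split leaves the faster side idle.
time_even = step_time(100.0, 1.0, 3.0, 0.5)
time_balanced = step_time(100.0, 1.0, 3.0, balanced_fraction(1.0, 3.0))
print(even_split, time_even, time_balanced)
```

In this model a 50/50 split is optimal only when the two sides process their shares at about the same rate; otherwise the faster side idles while the slower one finishes.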
[Chart: AMBER 14 Particle Mesh Ewald (PME), Tobacco Virus, 1 million atoms, 1 node. Y-axis: Comparative Performance. Bar labels: 1, 1.52X, 2X, 2.26X, 1.93X, 2.41X]
Legend (AMBER 14 PME Cellulose NPT chart; x-axis group labels: 2 nodes, 3 nodes):
Intel® Xeon® processor E5-2697 v2 (baseline)
Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A
Xeon E5-2697 v2 + NVIDIA K40 DPFP
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: Available as a patch. Recipe: Available here (Section 187 of the manual).
Usage Model: The baseline is the Intel® Xeon® processor E5-2697 v2, host only (also measured in http://ambermd.org/gpus/benchmarks.htm#Benchmarks). Speedup is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized, will be delivered to the AMBER community (license holders), and will be available as an update patch during code configuration.
Results: The optimized offload process demonstrated a cluster performance improvement of up to 2.6X over the baseline Intel® Xeon® processor E5-2697 v2.
[Chart: AMBER 14 Particle Mesh Ewald (PME), Cellulose NPT (408K atoms), 3-node cluster benchmark. Y-axis: Comparative Performance. Bar labels: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X]
Legend (GROMACS 512K H2O with RF chart):
Intel® Xeon® processor E5-2697 v2
1 Intel® Xeon Phi™ coprocessor 7120PX
2 Intel® Xeon Phi™ coprocessor 7120PX
Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120PX
Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120PX
(Additional bar label: 1.56X)
GROMACS 512K H2O with RF
Application: GROMACS 5.0-RC1. Workload: 512K H2O with RF method.
Description: GROMACS is a versatile package for molecular dynamics, i.e. simulating the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: Version 5.0-rc1 available here and here. Recipe: Available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF APRIL 2014
[Chart: GROMACS 512K H2O with RF Speed Up, 1 node. Y-axis: Comparative Performance. Bar labels: 1, 1.03X, 1.72X, 1.79X]
Legend (NWChem CCSD(T) chart):
NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2
NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2
NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A 2
NWChem 6.3rev2 and 6.5 CCSD(T) Method, 32 Node Speed Up
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: Available here and from the SVN repository. Recipe: Available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling option for the NWChem community, including at cluster scale.
Results: Compared to the baseline of NWChem 6.3rev2 on the Intel® Xeon® processor E5-2697 v2:
1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2
2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A
SOURCE: INTEL MEASURED RESULTS AS OF JULY 2014
[Chart: NWChem CCSD(T) Method, 32-node cluster benchmark. Y-axis: Comparative Performance. Bar labels: 1, 1.24X, 1.52X]
Legend (NAMD STMV chart; x-axis groups: 1 Node, 8 Nodes, 32 Nodes; y-axis ticks 0 to 30):
Intel® Xeon® processor E5-2697 v2 (Baseline, 1 node, 23 or 47 PPN)
Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN)
Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T)
Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T)
NAMD 2.10 Pre-Release STMV
Application: NAMD 2.10 pre-release, STMV
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: Available here.
Usage Model: Single rank on the host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms), 32-node cluster benchmark. Y-axis: Comparative Performance. Bar labels: 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
Legend (NAMD ApoA1 chart; x-axis groups: 1 Node, 2 Nodes):
Intel® Xeon® processor E5-2697 v3 (Baseline, 1 node)
Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T)
NAMD 2.10 Pre-Release ApoA1
Application: NAMD 2.10 pre-release, ApoA1
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: Available here.
Usage Model: Single rank on the host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN, 2-node cluster benchmark. Y-axis: Comparative Performance. Bar labels: 1 (Baseline, 1 node, 55 PPN), 1.52X, 1.94X, 2.61X]
Legend (LAMMPS Stillinger-Weber chart; x-axis groups: 1 Node, 32 Nodes; y-axis ticks 0 to 3):
2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline)
2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package)
2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on
2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package)
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In the main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel package also run faster on the CPU.
Results: The simulation rate increase with the Intel® package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup at higher node counts.
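The dynamic load balancing described in the usage model can be pictured as a feedback loop: measure how long each side took on the last step, then shift the offloaded share toward the point where both finish together. The sketch below is a generic illustration of that idea, not the LAMMPS Intel package's actual algorithm; `rebalance`, `damping`, and the simulated rates are all hypothetical.

```python
def rebalance(offload_fraction, host_time, coprocessor_time, damping=0.5):
    """Nudge the offloaded share toward equal host and coprocessor times."""
    if not 0.0 < offload_fraction < 1.0:
        raise ValueError("offload_fraction must be strictly between 0 and 1")
    if host_time <= 0 or coprocessor_time <= 0:
        raise ValueError("measured times must be positive")
    # Throughput of each side, estimated from the last measured step.
    host_rate = (1.0 - offload_fraction) / host_time
    coprocessor_rate = offload_fraction / coprocessor_time
    # Share at which both sides would finish simultaneously.
    target = coprocessor_rate / (host_rate + coprocessor_rate)
    return offload_fraction + damping * (target - offload_fraction)


# Simulated hardware: the coprocessor is assumed to be 3x faster than the host.
TRUE_HOST_RATE, TRUE_COPROCESSOR_RATE = 1.0, 3.0
fraction = 0.5
for _ in range(20):
    host_time = (1.0 - fraction) / TRUE_HOST_RATE
    coprocessor_time = fraction / TRUE_COPROCESSOR_RATE
    fraction = rebalance(fraction, host_time, coprocessor_time)
print(round(fraction, 3))
```

The damping factor keeps the share from oscillating when timings are noisy; a real balancer would also need to account for data-transfer time, which this model ignores.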
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
[Chart: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision), 32-node cluster benchmark. Y-axis: Comparative Performance. Bar labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X. Annotation: No testing on Tesla]
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
LAMMPS Rhodopsin Benchmark, 512K Atoms
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In the main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. The gain holds at 1.47X when scaling up to 32 nodes.
SOURCE: INTEL MEASURED RESULTS AS OF AUGUST 2014
Legend (LAMMPS Rhodopsin chart; x-axis groups: 1 Node, 32 Nodes):
2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline)
2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package)
2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, Turbo Off (LAMMPS IA Package)
[Chart: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision), 32-node cluster benchmark. Y-axis: Comparative Performance. Bar labels: 1, 1.27X, 1.68X, 1, 1.07X, 1.47X]
X-axis workloads (Bowtie 2 chart): ERR161544, SRR034966_1, ERR000589, SRR002273_1
Legend:
Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40
Intel® Xeon® processor E5-2697 v3
Johns Hopkins Bowtie 2: Multiple Workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: nvBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: Available here. Recipe: Not available. Check for future availability here.
Usage Model: ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1
Highlights: See more here.
Results: Bowtie2 with the Intel® AVX2 port running on the Intel® Xeon® processor E5-2697 v3 is faster than nvBowtie running on the Intel® Xeon® processor E5-2697 v2 with the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published K40 data compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2015
[Chart: Johns Hopkins Bowtie 2 TGen Workload Speed Up, 1 node. Y-axis: Comparative Increase. Bar labels: 1, 1.87X, 1.59X, 1.08X, 0.88X]
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference data base).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: Available here. Recipe: Available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The symmetric process on the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2014
Burrows-Wheeler Aligner (BWA-ALN) Human Genome
Legend (BWA-ALN Speed Up chart):
2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN)
2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN)
2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A
[Chart: BWA-ALN Speed Up, 1 node. Y-axis: Comparative Performance. Bar labels: 1, 1.24X, 1.86X]
0
1
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
ldquoXeon E5-2697 v2v3rdquo = Intelreg Xeonreg processor E5-2697 v2v3
ldquoXeon Phi 7120Ardquo = Intelreg Xeon Phitrade coprocessor 7120A
23
Application Basic Local Alignment Search Tool (BLASTn) v30
Description Searching for alignment in nucleotide query sequences against a known nucleotide db volume set National Center for Biotechnology Information (NCBI) More at httpblastncbinlmnihgov
Availability Code Available here Recipe Available here
Usage Model 4 (multiple queries multiple db) 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intelreg Xeonreg processor and Intelreg Xeon Phitrade coprocessor for maximum speedup sweet spot Experiment was repeated 20 times with the pick of queries randomized for a sweet spot split 8020 and 592318
Highlights Throughput for this load sharing model has a small sweet spot for a sufficiently large query set
Results Compared to the baseline simulation rate speed up with Intelreg Xeonreg processor E5-2697 v3 and Intelreg Xeon Phitrade coprocessor 7120A heterogeneous model is 152X Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
BLAST BLASTn v30
Other names and brands may be claimed as the property of others SOURCE INTEL MEASURED RESULTS AS OF MARCH 2015
BLASTn v30 Speed Up
1 NODE
For configuration details go here
Co
mp
ara
tiv
e P
erf
orm
an
ce
1
152X
APPROVED FOR PUBLIC PRESENTATION
149X
122X
141X
126X
NEW
0
1
BLAST: BLASTp v30 (NEW)

Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searches for alignments of protein query sequences against a known protein database volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple databases). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor at the split that gives the maximum speedup (the sweet spot). The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance also improves on the CPU thanks to Output Formatting Section (OFS) parallelization.

Chart: BLASTp v30 Speed Up (1 node). Vertical axis: Comparative Performance.
Series:
  2S Xeon E5-2697 v2 (BLASTn v30 baseline)
  2S Xeon E5-2697 v2 + Xeon Phi 7120A
  2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
  2S Xeon E5-2697 v3 (BLASTn v30 baseline)
  2S Xeon E5-2697 v3 + Xeon Phi 7120A2
  2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
Bar labels (in extraction order): 1.41X, 1, 1.39X, 1.21X, 1.3X, 1.15X

"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3. "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
For configuration details go here
APPROVED FOR PUBLIC PRESENTATION
Other names and brands may be claimed as the property of others.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
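Legal Information

All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within a single processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number

Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for currently characterized errata.

Intel® Virtualization Technology requires a computer system with an Intel® processor, BIOS, and virtual machine monitor (VMM) that support the technology. Functionality, performance, or other benefits of virtualization technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html

No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). In addition, Intel® TXT requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html

A system that supports Intel® Turbo Boost Technology is required. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost

Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as third-party software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English)

Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized citation or reproduction is prohibited.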
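Legal Disclaimers: Performance

Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in system hardware or software design or configuration may cause actual performance to differ from the published tests and ratings. Buyers considering the purchase of systems or components should consult other sources of information to evaluate performance as a whole. For more information on the performance of Intel products, see http://www.intel.com/performance

Intel does not control or audit the design or implementation of third-party benchmarks or websites referenced in this document. Intel encourages readers to visit the referenced websites, or others where similar performance benchmarks are reported, and to confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.

Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number proportional to the reported performance improvement.

SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English).

The information in this document is provided "as is". No license to any intellectual property rights is granted, whether express or implied, by estoppel or otherwise. No liability is assumed for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests such as SYSmark and MobileMark are measured using specific computer systems, components, software, operations, and functions. Results vary with any of those factors. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance as a whole.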
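The relative-performance calculation described in the performance disclaimer (the baseline gets 1.0, every other platform's result is divided by the baseline result) can be sketched as follows. This is a minimal illustration, not Intel's tooling: the function name `relative_performance` and the throughput figures are hypothetical, and the sketch assumes a higher-is-better metric.

```python
def relative_performance(results, baseline):
    """Normalize higher-is-better benchmark results so the baseline scores 1.0."""
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    # Each platform's score is its raw result divided by the baseline's raw result.
    return {name: value / base for name, value in results.items()}


# Hypothetical raw throughput values (for example ns/day); not measured data.
example = {"baseline": 2.0, "optimized": 3.0, "optimized+coprocessor": 4.5}
scores = relative_performance(example, "baseline")
print(scores)
```

For a lower-is-better metric such as wall-clock time, the division would be inverted (baseline time divided by platform time).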
Application Support Status: Life Sciences

"Intel's leading technology & product provide great high performance computing power which enable us achieve more genome scientific research success for genome application development for China and for the whole human being."
Wang Bingqiang, Head of High Performance Computing, BGI
AMBER 14 Particle Mesh Ewald (PME): Tobacco Virus

Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 18.7 of the manual).
Usage Model: The baseline is the Intel® Xeon® processor E5-2697 v2, compared with the Intel® Xeon® processor E5-2697 v2 plus the Intel® Xeon Phi™ coprocessor 7120A. Offload processing runs on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: The optimized Intel Xeon processor E5-2697 v3 with Intel Xeon Phi coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel Xeon processor E5-2697 v2. The optimized offload process demonstrated 1.07X the performance of the NVIDIA K40.

Chart: AMBER 14 PME, Tobacco Virus, 1 Million Atoms (1 node). Vertical axis: Comparative Performance (ticks 0 to 2).
Series:
  Intel® Xeon® processor E5-2697 v2 (baseline)
  Intel® Xeon® processor E5-2697 v2 (optimized)
  Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A
  Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP
  Intel® Xeon® processor E5-2697 v3
  Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A
Bar labels (in extraction order): 1, 1.52X, 2X, 2.26X, 1.93X, 2.41X

"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
For configuration details go here
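The 1.07X figure quoted against the NVIDIA K40 appears to follow from the bar labels, reading the extracted figures as 2.41X and 2.26X. The short check below is a sketch under one loud assumption: that the 2.26X bar is the "Xeon E5-2697 v2 (optimized) + NVIDIA K40" series. The slide text does not state that mapping explicitly.

```python
def speedup_ratio(speedup_a, speedup_b):
    """Ratio of two speedups that share the same baseline."""
    if speedup_b <= 0:
        raise ValueError("speedups must be positive")
    return speedup_a / speedup_b


xeon_v3_plus_phi = 2.41  # stated in the Results text
k40 = 2.26               # ASSUMPTION: the 2.26X bar label belongs to the K40 series

ratio = speedup_ratio(xeon_v3_plus_phi, k40)
print(f"{ratio:.2f}X")
```

Because both speedups are relative to the same E5-2697 v2 baseline, their quotient is the relative performance of the two accelerated configurations.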
AMBER 14 Particle Mesh Ewald (PME): Cellulose NPT

Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available here (Section 18.7 of the manual).
Usage Model: The baseline runs on the Intel® Xeon® processor E5-2697 v2, host only (also measured in http://ambermd.org/gpus/benchmarks.htm#Benchmarks). Speedup is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized, will be delivered to the AMBER community (license holders), and will be available as an update patch during code configuration.
Results: The optimized offload process demonstrated cluster performance improvement of up to 2.6X over the baseline Intel® Xeon® processor E5-2697 v2.

Chart: AMBER PME Cellulose NPT (408K Atoms); 3-node cluster benchmark. Vertical axis: Comparative Performance. Groups: 2 nodes, 3 nodes.
Series:
  Intel® Xeon® processor E5-2697 v2 (baseline)
  Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A
  Xeon E5-2697 v2 + NVIDIA K40 DPFP
Bar labels (in extraction order): 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X

"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
For configuration details go here
GROMACS: 512K H2O with RF

Application: GROMACS 5.0-RC1. Workload: 512K H2O with the RF method.
Description: GROMACS is a versatile package for molecular dynamics, i.e. it simulates the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: version 5.0-rc1 available here and here. Recipe: available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor plus the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.

Chart: GROMACS 512K H2O with RF Speed Up (1 node). Vertical axis: Comparative Performance.
Series:
  Intel® Xeon® processor E5-2697 v2
  1 Intel® Xeon Phi™ coprocessor 7120PX
  2 Intel® Xeon Phi™ coprocessor 7120PX
  Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120PX
  Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120PX
Bar labels (in extraction order): 1.56X, 1.79X, 1, 1.03X, 1.72X

SOURCE: INTEL MEASURED RESULTS AS OF APRIL 2014
For configuration details go here
NWChem: CCSD(T) Method

Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: available here and from the SVN repository. Recipe: available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is compelling for the NWChem community, including at cluster scale.
Results: Compared to the baseline of NWChem 6.3rev2 on the Intel® Xeon® processor E5-2697 v2:
  1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2.
  2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A.

Chart: NWChem 6.3rev2 and 6.5, CCSD(T) Method, 32-Node Speed Up (32 nodes, cluster benchmark). Vertical axis: Comparative Performance.
Series and bar labels:
  NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2: 1
  NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2: 1.24X
  NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A: 1.52X

SOURCE: INTEL MEASURED RESULTS AS OF JULY 2014
For configuration details go here
NAMD 2.10 Pre-Release: STMV

Application: NAMD 2.10 pre-release, STMV workload
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available here.
Usage Model: Single rank on the host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 with the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).

Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms); 32-node cluster benchmark. Vertical axis: Comparative Performance (ticks 0 to 30). Groups: 1 node, 8 nodes, 32 nodes.
Series:
  Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN)
  Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN)
  Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T)
  Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T)
Bar labels (in extraction order): 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X

"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
For configuration details go here
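To put the 32X-on-32-nodes result in context, it can help to divide a cluster speedup by the node count. The helper below is a hypothetical sketch, not part of NAMD or of the benchmark recipe. Note the caveat in the comments: the slide's baseline is a single E5-2697 v2 node without a coprocessor, so the quotient mixes hardware gains with scaling behaviour and is not a pure parallel-efficiency figure.

```python
def speedup_per_node(speedup, nodes):
    """Cluster speedup divided by node count, relative to a one-node baseline."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup / nodes


# Figures taken from the Results text: up to 32X on 32 nodes versus 1 baseline node.
# The 32-node run uses E5-2697 v3 plus a coprocessor, while the baseline is a
# CPU-only E5-2697 v2 node, so this is not a same-hardware scaling efficiency.
eff = speedup_per_node(32.0, 32)
print(f"{eff:.2f} baseline-node equivalents per node")
```

A same-hardware efficiency would instead divide the 32-node result of one series by 32 times that series' own 1-node result.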
NAMD 2.10 Pre-Release: ApoA1

Application: NAMD 2.10 pre-release, ApoA1 workload
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available here.
Usage Model: Single rank on the host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.

Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN; 2-node cluster benchmark. Vertical axis: Comparative Performance (ticks 0 to 2). Groups: 1 node, 2 nodes.
Series:
  Intel® Xeon® processor E5-2697 v3 (baseline: 1 node, 55 PPN)
  Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T)
Bar labels (in extraction order): 1, 1.52X, 1.94X, 2.61X

SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
For configuration details go here
LAMMPS: Stillinger-Weber Water Benchmark (NEW)

Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor so that they run concurrently with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: The simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup at higher node counts.

Chart: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision); 32-node cluster benchmark. Vertical axis: Comparative Performance (ticks 0 to 3). Groups: 1 node, 32 nodes.
Series:
  2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline)
  2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package)
  2S Xeon E5-2697 v3 + Tesla K40c (boost off, ECC on)
  2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package)
Bar labels (in extraction order): 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X. One bar position is marked "No testing on Tesla".

"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
For configuration details go here
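The idea of a load balancer that decides how much work to offload can be illustrated with a small feedback loop. The sketch below is a generic proportional scheme and is not the algorithm used by the LAMMPS Intel Package: the function `rebalance`, the `damping` parameter, and the simulated device speeds are all hypothetical. It adjusts the offloaded fraction so that host and coprocessor finish their shares at about the same time.

```python
def rebalance(fraction, t_host, t_coproc, damping=0.5):
    """Return an updated offload fraction from the last step's measured times.

    fraction: share of the work sent to the coprocessor, strictly between 0 and 1.
    t_host, t_coproc: time each side needed for its share in the last step.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if t_host <= 0 or t_coproc <= 0:
        raise ValueError("measured times must be positive")
    rate_host = (1.0 - fraction) / t_host   # work units per second on the host
    rate_coproc = fraction / t_coproc       # work units per second on the coprocessor
    target = rate_coproc / (rate_host + rate_coproc)
    # Move only part of the way to the target to avoid oscillating on noisy timings.
    return fraction + damping * (target - fraction)


fraction = 0.5
for _ in range(20):
    # Simulated timings for a hypothetical coprocessor that is 3x faster than the host.
    t_host = (1.0 - fraction) / 1.0
    t_coproc = fraction / 3.0
    fraction = rebalance(fraction, t_host, t_coproc)
print(round(fraction, 3))
```

With these simulated speeds the fraction settles near 0.75, the point where both sides take equal time for their share.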
LAMMPS: Rhodopsin Benchmark, 512K Atoms

Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor so that they run concurrently with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. The gain holds at 1.47X when scaling up to 32 nodes.

Chart: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision); 32-node cluster benchmark. Vertical axis: Comparative Performance.
Series and bar labels (1 node; 32 nodes):
  2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline): 1; 1
  2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package): 1.27X; 1.07X
  2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, turbo off (LAMMPS IA Package): 1.68X; 1.47X

SOURCE: INTEL MEASURED RESULTS AS OF AUGUST 2014
For configuration details go here
Johns Hopkins Bowtie 2: Multiple Workloads (NEW)

Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: NVBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: available here. Recipe: not available; check for future availability here.
Usage Model: Workloads ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1
Highlights: See more here.
Results: Bowtie2 running on the Intel® Xeon® processor E5-2697 v3 with the Intel® AVX2 port is faster than NVBowtie running on the Intel® Xeon® processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published data for the K40 compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.

Chart: Johns Hopkins Bowtie 2 TGen Workload Speed Up (1 node). Vertical axis: Comparative Increase.
Workloads shown: ERR161544, SRR034966_1, ERR000589, SRR002273_1
Series:
  Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40
  Intel® Xeon® processor E5-2697 v3
Bar labels (in extraction order): 1, 1.87X, 1.59X, 1.08X, 0.88X

SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2015
For configuration details go here
Burrows-Wheeler Aligner (BWA-ALN): Human Genome

Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. The workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: available here. Recipe: available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The symmetric process on the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.

Chart: BWA-ALN Speed Up (1 node). Vertical axis: Comparative Performance.
Series and bar labels:
  2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN): 1
  2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN): 1.24X
  2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A: 1.86X

SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2014
For configuration details go here
BLAST: BLASTn v30 (NEW)

Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searches for alignments of nucleotide query sequences against a known nucleotide database volume set. National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple databases). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor at the split that gives the maximum speedup (the sweet spot). The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance also improves on the CPU thanks to Output Formatting Section (OFS) parallelization.

Chart: BLASTn v30 Speed Up (1 node). Vertical axis: Comparative Performance.
Series:
  2S Xeon E5-2697 v2 (BLASTn v30 baseline)
  2S Xeon E5-2697 v2 + Xeon Phi 7120A
  2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
  2S Xeon E5-2697 v3 (BLASTn v30 baseline)
  2S Xeon E5-2697 v3 + Xeon Phi 7120A2
  2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized
Bar labels (in extraction order): 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X

"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
For configuration details go here
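The usage model of distributing a concatenated query set between host and coprocessor, with the pick of queries randomized on each repetition, can be sketched as below. This is an illustration only and not the published recipe: the function `split_queries`, the placeholder query IDs, and the reading of the first split as 80 queries for the host and 20 for the coprocessor are all assumptions.

```python
import random


def split_queries(queries, host_share, seed=None):
    """Randomly partition queries into a host batch and a coprocessor batch."""
    if not 0.0 <= host_share <= 1.0:
        raise ValueError("host_share must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(queries)
    rng.shuffle(shuffled)  # randomize which queries land on which device
    n_host = round(len(shuffled) * host_share)
    return shuffled[:n_host], shuffled[n_host:]


# Placeholder IDs standing in for the 100 concatenated NCBI queries.
queries = [f"query_{i:03d}" for i in range(100)]

# ASSUMPTION: 80% of the queries go to the host, 20% to the coprocessor.
host, coproc = split_queries(queries, host_share=0.80, seed=0)
print(len(host), len(coproc))
```

Repeating the call with different seeds mirrors the slide's 20 randomized repetitions; finding the sweet spot would additionally require timing each split, which this sketch leaves out.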
0
1
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
24
BLAST BLASTp v30
Other names and brands may be claimed as the property of others SOURCE INTEL MEASURED RESULTS AS OF MARCH 2015
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
Application Basic Local Alignment Search Tool (BLASTp) v30
Description Searching for alignment in protein query sequence against a known protein db volume set More at httpblastncbinlmnihgov
Availability Code Available here Recipe Available here
Usage Model 4 (multiple queries multiple db) 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to Intelreg Xeonreg processor and Intelreg Xeon Phitrade coprocessor for maximum speedup sweet spot Experiment was repeated 20 times with the pick of queries randomized for a sweet spot split 337 and 2857
Highlights Throughput for this offload model has a small sweet spot for a sufficiently large query set Throughput is limited due to GAT stage not parallelized
Results Compared to the baseline simulation rate speed up with Intelreg Xeonreg processor E5-2697 v3 and Intelreg Xeon Phitrade coprocessor 7120A heterogeneous model is 141X Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization
1 NODE
For configuration details go here
BLASTp v30 Speed Up
Co
mp
ara
tiv
e P
erf
orm
an
ce
ldquoXeon E5-2697 v2v3rdquo = Intelreg Xeonreg processor E5-2697 v2v3 ldquoXeon Phi 7120Ardquo = Intelreg Xeon Phitrade coprocessor 7120A
141X
1
APPROVED FOR PUBLIC PRESENTATION
139X
121X 13X
115X
NEW
法務情報
本資料に記載されているすべての製品コンピューターシステム日付および数値は現在の予想に基づくものであり予告なく変更されることがあります インテルプロセッサーナンバーはパフォーマンスの指標ではありませんプロセッサーナンバーは 同一プロセッサーファミリー内の製品の機能を区別します異なるプロセッサーファミリー 間の機能の区別には用いません 詳細についてはhttpwwwintelcojpjpproductsprocessor_number を参照してください インテルreg プロセッサーチップセットおよびデスクトップボードにはエラッタと呼ばれる設計上の不具合が含まれている可能性があり公表されている仕様とは異なる動作をする場合があります現在確認済みのエラッタについてはインテルまでお問い合わせください インテルreg バーチャライゼーションテクノロジーを利用するには同テクノロジーに対応したインテルreg プロセッサーBIOSおよび仮想マシンモニター (VMM) を搭載したコンピューターシステムが必要です機能性性能もしくはその他のバーチャライゼーションテクノロジーの特長はご使用のハードウェアやソフトウェアの構成によって異なりますご利用になる OS によってはソフトウェアアプリケーションとの互換性がない場合があります各 PC メーカーにお問い合わせください 詳細についてはhttpwwwintelcojpcontentwwwjpjavirtualizationvirtualization-technologyhardware-assist-virtualization-technologyhtml を参照してください すべての条件下で絶対的なセキュリティーを提供できるコンピューターシステムはありませんインテルreg トラステッドエグゼキューションテクノロジー (インテルreg TXT) を利用するにはインテルreg バーチャライゼーションテクノロジーインテルreg TXT に対応したプロセッサーチップセットBIOSAuthenticated Code モジュールインテルreg TXT に対応した Measured Launched Environment (MLE) を搭載するコンピューターシステムが必要ですさらにインテルreg TXTを利用するにはシステムが TPM v1s を搭載している必要があります 詳細についてはhttpwwwintelcojpcontentwwwjpjadata-securitysecurity-overview-general-technologyhtml を参照してください インテルreg ターボブーストテクノロジーに対応したシステムが必要ですインテルreg ターボブーストテクノロジーおよびインテルreg ターボブーストテクノロジー 20 は一部のインテルreg プロセッサーでのみ利用可能です各 PC メーカーにお問い合わせください実際の性能はハードウェアソフトウェアシステム構成によって異なります詳細についてはhttpwwwintelcojpjptechnologyturboboost を参照してください インテルreg AES New Instructions (インテルreg AES-NI) を利用するにはインテルreg AES-NI に対応したプロセッサーを搭載したコンピューターシステムおよび命令を正しい手順で実行する他社製ソフトウェアが必要ですインテルreg AES-NI は一部のインテルreg プロセッサーで利用できます提供状況については各 PC メーカーなどにお問い合わせください詳細についてはhttpsoftwareintelcomen-usarticlesintel-advanced-encryption-standard-instructions-aes-ni (英語) を参照してください IntelインテルIntel ロゴIntel Inside ロゴXeonXeon InsideIntel Xeon Phi はアメリカ合衆国および またはその他の国における Intel Corporation の商標です copy 2012 Intel Corporation 無断での引用転載を禁じます
26
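Legal Disclaimers: Performance

Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in system hardware or software design or configuration may cause actual performance to differ from the published tests and ratings. Buyers considering a system or component purchase should consult other sources of information to evaluate performance as a whole. For more information on the performance evaluation of Intel products, see http://www.intel.com/performance

Intel does not control or audit the design or implementation of third-party benchmarks or web sites referenced in this document. Intel encourages readers to visit the referenced web sites, or other sites where similar performance benchmarks are reported, and to confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.

Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number proportional to the reported performance improvement.

SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English).

The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. Intel assumes no liability for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.

Software and workloads used in performance tests may have been optimized for performance on Intel® microprocessors. Performance tests such as SYSmark and MobileMark are measured using specific computer systems, components, software, operations, and functions. Results vary with these factors. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance as a whole.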
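The relative-performance calculation described in the performance disclaimer can be sketched in a few lines. This is a minimal illustration, not Intel's tooling: the function name `relative_performance`, the `higher_is_better` switch (an assumed extension for time-based results, where the ratio is inverted), and the sample numbers are all hypothetical.

```python
def relative_performance(results, baseline, higher_is_better=True):
    """Normalize raw benchmark results so the baseline platform scores 1.0.

    results: dict mapping platform name -> raw benchmark result
    baseline: name of the platform used as the 1.0 reference
    higher_is_better: True for throughput-style results; False for
        elapsed-time results, where the ratio is inverted (assumption).
    """
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    relative = {}
    for name, value in results.items():
        if value <= 0:
            raise ValueError(f"result for {name!r} must be positive")
        ratio = value / base if higher_is_better else base / value
        relative[name] = round(ratio, 2)
    return relative


# Hypothetical throughput numbers, for illustration only.
throughput = {"baseline": 4.0, "optimized": 6.0, "optimized + coprocessor": 8.0}
print(relative_performance(throughput, "baseline"))
```

The "X" labels on the benchmark charts in this deck are numbers of this kind: each bar is a ratio to the baseline configuration named on that slide.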
AMBER 14: Particle Mesh Ewald (PME), Tobacco Virus
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available (Section 18.7 of the manual).
Usage Model: Baseline is the Intel® Xeon® processor E5-2697 v2, compared to the Intel® Xeon® processor E5-2697 v2 with the Intel® Xeon Phi™ coprocessor 7120A. Offload processing runs on both, using the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code was optimized, delivered to the AMBER community (license holders), and is available as an update patch during code configuration. The benchmark information is at http://www.ks.uiuc.edu/Research/STMV
Results: The optimized Intel Xeon processor E5-2697 v3 with Intel Xeon Phi coprocessor 7120A offload demonstrated up to 2.41X improved performance over the Intel Xeon processor E5-2697 v2. The optimized offload process demonstrated 1.07X increased performance compared to NVIDIA K40 performance.
[Chart: AMBER 14 PME, Tobacco Virus, 1 million atoms, 1 node. Y axis: comparative performance. Series: Intel® Xeon® processor E5-2697 v2 (baseline); Intel® Xeon® processor E5-2697 v2 (optimized); Xeon E5-2697 v2 (optimized) + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 (optimized) + NVIDIA K40 DPFP; Intel® Xeon® processor E5-2697 v3; Xeon E5-2697 v3 (optimized) + Intel® Xeon Phi™ coprocessor 7120A. Bar labels: 1, 1.52X, 2X, 2.26X, 1.93X, 2.41X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
Other names and brands may be claimed as the property of others.
APPROVED FOR PUBLIC PRESENTATION
AMBER 14: Particle Mesh Ewald (PME), Cellulose NPT
Application: AMBER 14
Description: Biomolecular simulations (protein, DNA, RNA, virus, etc.). Full double precision (DPDP). More at http://ambermd.org
Availability: Code: available as a patch. Recipe: available (Section 18.7 of the manual).
Usage Model: Baseline is the Intel® Xeon® processor E5-2697 v2, host only (also measured in http://ambermd.org/gpus/benchmarks.htm#Benchmarks). Speedup is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A. Performance shown is for the released double-precision code across the platforms, with 50% of the workload on the host and 50% on the coprocessor.
Highlights: The code has been optimized, will be delivered to the AMBER community (license holders), and will be available as an update patch during code configuration.
Results: The optimized offload process demonstrated cluster performance improvement of up to 2.6X over the baseline Intel® Xeon® processor E5-2697 v2.
[Chart: AMBER PME Cellulose NPT (408K atoms), 3-node cluster benchmark. Y axis: comparative performance. Groups: 2 nodes, 3 nodes. Series: Intel® Xeon® processor E5-2697 v2 (baseline); Xeon E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A; Xeon E5-2697 v2 + NVIDIA K40 DPFP. Bar labels: 1, 1.14X, 1.11X, 1.37X, 1.57X, 1.32X.]
"Xeon E5-2697 v2" = Intel® Xeon® processor E5-2697 v2
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
GROMACS: 512K H2O with RF
Application: GROMACS 5.0-RC1. Workload: 512K H2O with RF method.
Description: GROMACS is a versatile package to perform molecular dynamics, i.e., simulate the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: version 5.0-rc1 available. Recipe: available.
Highlights: Highly optimized for Intel® Xeon® processors (AVX-intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor together with the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
[Chart: GROMACS 512K H2O with RF speedup, 1 node. Y axis: comparative performance. Series: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120P/X; 2 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120P/X; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120P/X. Bar labels: 1, 1.03X, 1.56X, 1.72X, 1.79X.]
SOURCE: INTEL MEASURED RESULTS AS OF APRIL 2014
NWChem: CCSD(T) Method
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: available, including from the SVN repository. Recipe: available.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application for the NWChem community, including at cluster scale.
Results: Compared to the baseline of NWChem 6.3rev2 on the Intel® Xeon® processor E5-2697 v2:
1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2.
2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A.
[Chart: NWChem 6.3rev2 and 6.5, CCSD(T) method, 32-node speedup, cluster benchmark. Y axis: comparative performance. Series: NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A. Bar labels: 1, 1.24X, 1.52X.]
SOURCE: INTEL MEASURED RESULTS AS OF JULY 2014
NAMD 2.10 Pre-Release: STMV
Application: NAMD 2.10 pre-release, STMV
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available.
Usage Model: Single rank on host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 with the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
[Chart: NAMD 2.10 (pre-release) cluster performance increase, STMV (~1M atoms), 32-node cluster benchmark. Y axis: comparative performance, 0 to 30. Groups: 1 node, 8 nodes, 32 nodes. Series: Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T). Bar labels: 1, 1.2X, 2X, 2.1X, 6.8X, 7.9X, 12.2X, 13.1X, 20X, 24.2X, 27.2X, 32X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
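A cluster speedup figure is easier to judge when divided by the node count. The sketch below is illustrative only: `parallel_efficiency` is a hypothetical helper, and the input is the "up to 32X on 32 nodes" figure from the Results line. Because that figure compares a newer processor plus coprocessor against an older single-node baseline, the ratio is not a pure strong-scaling efficiency.

```python
def parallel_efficiency(speedup, nodes):
    """Speedup per node relative to a single-node baseline (1.0 = linear)."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup / nodes


# Figure quoted in the NAMD STMV Results line: up to 32X on 32 nodes
# versus the 1-node baseline (different hardware, so read with care).
print(parallel_efficiency(32.0, 32))  # 1.0
```

A value near 1.0 here means the extra hardware generation and the coprocessor roughly offset the scaling losses across 32 nodes; it does not mean the code scales perfectly.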
NAMD 2.10 Pre-Release: ApoA1
Application: NAMD 2.10 pre-release, ApoA1
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: available.
Usage Model: Single rank on host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
[Chart: NAMD 2.10 (pre-release) cluster performance increase, ApoA1 (~92K atoms), 55 PPN, 2-node cluster benchmark. Y axis: comparative performance. Groups: 1 node, 2 nodes. Series: Intel® Xeon® processor E5-2697 v3 (baseline: 1 node, 55 PPN); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T). Bar labels: 1, 1.52X, 1.94X, 2.61X.]
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available.
Usage Model: The load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel package also run faster on the CPU.
Results: The simulation rate increase with the Intel® package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup and higher node counts.
[Chart: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision), 32-node cluster benchmark. Y axis: comparative performance. Groups: 1 node, 32 nodes. Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA package); 2S Xeon E5-2697 v3 + Tesla K40c (boost off, ECC on); 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA package). Bar labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X. Chart note: no testing on Tesla.]
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
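The idea behind dynamic load balancing between host and coprocessor can be illustrated with a simple feedback rule. This is a sketch under stated assumptions, not the LAMMPS Intel package's actual balancer: the function name `balanced_offload_fraction` is hypothetical, and it assumes each device processes work at a constant rate, so that the last step's timings predict the next step's.

```python
def balanced_offload_fraction(fraction, host_time, coproc_time):
    """Estimate the offload fraction that equalizes host and coprocessor time.

    fraction: share of the work sent to the coprocessor on the last step (0..1)
    host_time: time the host took for its (1 - fraction) share
    coproc_time: time the coprocessor took for its `fraction` share
    Assumes constant per-device throughput (illustrative assumption).
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if host_time <= 0 or coproc_time <= 0:
        raise ValueError("timings must be positive")
    host_rate = (1.0 - fraction) / host_time   # work units per second on host
    coproc_rate = fraction / coproc_time       # work units per second offloaded
    # Both devices finish together when work is split in proportion to rate.
    return coproc_rate / (host_rate + coproc_rate)


# Hypothetical timings: a 50/50 split where the coprocessor finished in half
# the time suggests shifting about two thirds of the work to it.
print(balanced_offload_fraction(0.5, host_time=2.0, coproc_time=1.0))
```

Re-evaluating the fraction every few timesteps lets the split track changes in the workload, which is the sense in which such balancing is "dynamic".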
LAMMPS Rhodopsin Benchmark: 512K Atoms
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available.
Usage Model: The load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel Xeon Phi coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel package also run faster on the CPU.
Results: Up to 1.68X performance improvement using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization on a single node, compared to the baseline configuration. Performance gains continue to hold at 1.47X when scaling up to 32 nodes.
[Chart: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision), 32-node cluster benchmark. Y axis: comparative performance. Groups: 1 node, 32 nodes. Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA package); 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, turbo off (LAMMPS IA package). Bar labels: 1, 1.27X, 1.68X, 1, 1.07X, 1.47X.]
SOURCE: INTEL MEASURED RESULTS AS OF AUGUST 2014
Johns Hopkins Bowtie 2: Multiple Workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: NVBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: available. Recipe: not available; check for future availability.
Usage Model: Workloads: ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1, and ERR024139_1.
Results: Bowtie2 running on the Intel® Xeon® processor E5-2697 v3 with the Intel® AVX2 port was faster than NVBowtie running on the Intel® Xeon® processor E5-2697 v2 with the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published data for the K40 compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
[Chart: Johns Hopkins Bowtie 2 TGen workload speedup, 1 node. Y axis: comparative increase. Workloads: ERR161544, SRR034966_1, ERR000589, SRR002273_1. Series: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3. Bar labels: 1, 1.87X, 1.59X, 1.08X, 0.88X.]
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2015
Burrows-Wheeler Aligner (BWA-ALN): Human Genome
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: available. Recipe: available.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The Intel® Xeon® processor E5-2697 v2 with the Intel® Xeon Phi™ coprocessor in a symmetric process demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
[Chart: BWA-ALN speedup, 1 node. Y axis: comparative performance. Series: 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A. Bar labels: 1, 1.24X, 1.86X.]
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2014
BLAST: BLASTn v30
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searching for alignments of nucleotide query sequences against a known nucleotide db volume set from the National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available. Recipe: available.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna.00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor for the maximum-speedup sweet spot. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline, the simulation rate speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
[Chart: BLASTn v30 speedup, 1 node. Y axis: comparative performance. Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized. Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
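The query distribution described in the BLASTn usage model (a randomized pick of queries, divided among devices by a fixed share such as 80/20 or 59/23/18) can be sketched as follows. This is an illustrative sketch, not the benchmark's actual harness: `split_queries` and the `seed` parameter are hypothetical, and which device receives which share is left to the caller.

```python
import random


def split_queries(queries, shares, seed=None):
    """Shuffle queries and divide them among devices in proportion to shares.

    queries: iterable of query identifiers
    shares: relative weights, one per device, e.g. (80, 20) or (59, 23, 18)
    seed: optional seed so a randomized pick can be reproduced
    """
    if not shares or any(s < 0 for s in shares) or sum(shares) == 0:
        raise ValueError("shares must be non-negative with a positive sum")
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)
    total = sum(shares)
    parts, start, cumulative = [], 0, 0
    for share in shares:
        cumulative += share
        # Cumulative boundaries guarantee every query lands in exactly one part.
        end = round(len(shuffled) * cumulative / total)
        parts.append(shuffled[start:end])
        start = end
    return parts


# 100 hypothetical query IDs divided 59/23/18 across three devices.
parts = split_queries(range(100), (59, 23, 18), seed=0)
print([len(p) for p in parts])  # [59, 23, 18]
```

Repeating the call with different seeds mirrors the idea of re-running the experiment with a freshly randomized pick of queries each time.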
BLAST: BLASTp v30
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searching for alignments of protein query sequences against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available. Recipe: available.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted.00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor for the maximum-speedup sweet spot. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the simulation rate speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
[Chart: BLASTp v30 speedup, 1 node. Y axis: comparative performance. Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized. Bar labels: 1, 1.41X, 1.39X, 1.21X, 1.3X, 1.15X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
0
1
2 nodes 3 nodes
Intelreg Xeonreg processor E5-2697 v2 (baseline)
Xeon E5-2697 v2 + Intelreg Xeon Phitrade coprocessor 7120A
Xeon E5-2697 v2 + NVIDIA K40 DPFP
ldquoXeon E5-2697 v2rdquo = Intelreg Xeonreg processor E5-2697 v2
14 14
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
SOURCE INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
Application AMBER 14
Description Bimolecular Simulations (Protein DNA RNA virus etc) Full double precision (DPDP) More at httpambermdorg
Availability Code Available as a patch Recipe Available here (Section 187 of the manual)
Usage Model Baseline is on the Intelreg Xeonreg processor E5-2697 v2 host only
(also measured in httpambermdorggpusbenchmarkshtmBenchmarks) and speed up is shown with offload processing on both the Intel Xeon processor E5-2697 v2 and the Intelreg Xeon Phitrade coprocessor 7120A
Performance shown is for the released code double precision across the platforms 50 workload on the host 50 on the coprocessor
Highlights The code had been optimized will be delivered to the AMBER community (whoever has license) and available as update patch during code configuration
Results Optimized offload process demonstrated compelling cluster performance improvement up to 26X over the baseline Intelreg Xeonreg processor E5-2697 v2
AMBER 14 Particle Mesh Ewald (PME) Cellulose NPT
For configuration details go here
Co
mp
ara
tiv
e P
erf
orm
an
ce
AMBER PME Cellulose NPT (408K Atoms)
1
114X 111X
137X
3 NODES CLUSTER BENCHMARK
Other names and brands may be claimed as the property of others
157X
132X
APPROVED FOR PUBLIC PRESENTATION
GROMACS 512K H2O with RF
Application: GROMACS 5.0-RC1. Workload: 512K H2O with RF method.
Description: GROMACS is a versatile package for molecular dynamics, i.e. it simulates the Newtonian equations of motion for systems with hundreds to millions of particles. It is one of the fastest and most popular molecular dynamics packages.
Availability: Code: Version 5.0-rc1 available here and here. Recipe: Available here.
Highlights: Highly optimized for Intel® Xeon® processors (AVX-intrinsics). Able to run the full simulation natively on the Intel® Xeon Phi™ coprocessor together with the host processor using a symmetric model. Optimized with intrinsics for 512-bit vectorization on Intel® Xeon Phi™ coprocessors.
Results: The symmetric process demonstrated up to 1.79X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
[Chart: GROMACS 512K H2O with RF Speed Up; vertical axis: Comparative Performance; series: Intel® Xeon® processor E5-2697 v2; 1 Intel® Xeon Phi™ coprocessor 7120PX; 2 Intel® Xeon Phi™ coprocessor 7120PX; Intel® Xeon® processor E5-2697 v2 + 1 Intel® Xeon Phi™ coprocessor 7120PX; Intel® Xeon® processor E5-2697 v2 + 2 Intel® Xeon Phi™ coprocessor 7120PX]
1 NODE
SOURCE: INTEL MEASURED RESULTS AS OF APRIL 2014
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
NWChem CCSD(T) Method
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: Available here and from the SVN repository. Recipe: Available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is a compelling application for the NWChem community, including on clusters.
Results: Compared to the baseline of NWChem 6.3rev2 on the Intel® Xeon® processor E5-2697 v2: 1) NWChem 6.5 CCSD(T) performed up to 1.24X faster on the Intel® Xeon® processor E5-2697 v2; 2) NWChem 6.5 CCSD(T) performed up to 1.52X faster on the Intel® Xeon® processor E5-2697 v2 with the Intel® Xeon Phi™ coprocessor 7120A.
[Chart: NWChem 6.3rev2 and 6.5 CCSD(T) Method, 32 Node Speed Up; vertical axis: Comparative Performance; series: NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A]
32 NODES CLUSTER BENCHMARK
SOURCE: INTEL MEASURED RESULTS AS OF JULY 2014
NAMD 2.10 Pre-Release STMV
Application: NAMD 2.10 pre-release, STMV workload.
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: Available here.
Usage Model: Single rank on host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 with the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms); vertical axis: Comparative Performance (0 to 30); horizontal axis: 1 Node, 8 Nodes, 32 Nodes; series: Intel® Xeon® processor E5-2697 v2 (Baseline, 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T)]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
32 NODES CLUSTER BENCHMARK
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
NAMD 2.10 Pre-Release ApoA1
Application: NAMD 2.10 pre-release, ApoA1 workload.
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: Available here.
Usage Model: Single rank on host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN; vertical axis: Comparative Performance (0 to 2); horizontal axis: 1 Node, 2 Nodes; series: Intel® Xeon® processor E5-2697 v3 (Baseline, 1 node, 55 PPN); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T)]
2 NODES CLUSTER BENCHMARK
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In main LAMMPS repository. Recipe: Available here.
[Chart: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision); vertical axis: Comparative Performance (0 to 3); horizontal axis: 1 Node, 32 Nodes; series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package); annotation: No testing on Tesla]
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
32 NODES CLUSTER BENCHMARK
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel® Xeon Phi™ coprocessor computations and MPI communications yield improved speedup and higher node counts.
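The load-balancing idea in the LAMMPS usage model can be illustrated with a small sketch. This is not the LAMMPS Intel Package algorithm, whose details the slides do not give. It is a minimal feedback rule under stated assumptions: the function names `rebalance` and `simulate`, the `damping` parameter, and the device speeds are all hypothetical. Each step measures how long the host and the coprocessor took on their shares of the work, then moves the offload fraction toward the split at which both would finish together.

```python
def rebalance(fraction, host_seconds, coproc_seconds, damping=0.5):
    """Return an updated offload fraction from the last step's timings.

    `fraction` is the share of work sent to the coprocessor (0 < fraction < 1).
    This update rule is an illustrative assumption, not LAMMPS code.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if host_seconds <= 0 or coproc_seconds <= 0:
        raise ValueError("timings must be positive")
    host_rate = (1.0 - fraction) / host_seconds   # work per second on the host
    coproc_rate = fraction / coproc_seconds       # work per second on the coprocessor
    # Split at which both devices would finish at the same time.
    target = coproc_rate / (host_rate + coproc_rate)
    # Move only part of the way to avoid oscillating on noisy timings.
    return fraction + damping * (target - fraction)


def simulate(steps=20, host_speed=1.0, coproc_speed=3.0, fraction=0.5):
    """Drive `rebalance` with synthetic timings from two made-up device speeds."""
    for _ in range(steps):
        host_seconds = (1.0 - fraction) / host_speed
        coproc_seconds = fraction / coproc_speed
        fraction = rebalance(fraction, host_seconds, coproc_seconds)
    return fraction


if __name__ == "__main__":
    print(f"offload fraction after balancing: {simulate():.3f}")
```

With the made-up speeds above (coprocessor three times faster than the host), the fraction settles near 0.75, the point where neither device waits for the other. A real load balancer would also have to account for data-transfer time, which this sketch ignores.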
LAMMPS Rhodopsin Benchmark 512K Atoms
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculation with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and concurrent calculation of neighbor-list, non-bond, bond and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization, compared to the baseline configuration. Performance gains hold at 1.47X when scaling up to 32 nodes.
[Chart: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision); vertical axis: Comparative Performance; horizontal axis: 1 Node, 32 Nodes; series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, Turbo Off (LAMMPS IA Package)]
32 NODES CLUSTER BENCHMARK
SOURCE: INTEL MEASURED RESULTS AS OF AUGUST 2014
Johns Hopkins Bowtie 2 Multiple workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port.
Description: nvBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: Available here. Recipe: Not available; check for future availability here.
Usage Model: ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1
Highlights: See more here.
Results: Bowtie2 with the Intel® AVX2 port running on the Intel® Xeon® processor E5-2697 v3 is faster than nvBowtie running on the Intel® Xeon® processor E5-2697 v2 with the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published K40 data compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
[Chart: Johns Hopkins Bowtie 2 TGen Workload Speed Up; vertical axis: Comparative Increase; workloads shown: ERR161544, SRR034966_1, ERR000589, SRR002273_1; series: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3]
1 NODE
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2015
Burrows-Wheeler Aligner (BWA-ALN) Human Genome
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: Available here. Recipe: Available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The symmetric process on the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
[Chart: BWA-ALN Speed Up; vertical axis: Comparative Performance; series: 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A]
1 NODE
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2014
BLAST: BLASTn v30
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searches for alignments of nucleotide query sequences against a known nucleotide db volume set. National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: Available here. Recipe: Available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor to find the split that gives maximum speedup (the sweet spot). The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 8020 and 592318.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline, the simulation rate speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
[Chart: BLASTn v30 Speed Up; vertical axis: Comparative Performance; series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
1 NODE
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
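The BLASTn usage model distributes one concatenated query set between the host processor and the coprocessor in a fixed proportion. The sketch below shows one way such a proportional split could be computed. It is an illustration only and not part of BLAST or of Intel's recipe: the function name `split_queries`, the device labels, and the 4:1 weighting are all assumptions made for the example.

```python
def split_queries(queries, weights):
    """Partition `queries` into contiguous chunks sized in proportion to `weights`.

    `weights` maps a device label to a non-negative share. The labels and
    shares used below are illustrative assumptions, not benchmark values.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    chunks = {}
    start = 0
    cumulative = 0.0
    labels = list(weights)
    for index, label in enumerate(labels):
        cumulative += weights[label]
        if index == len(labels) - 1:
            end = len(queries)  # the last device takes whatever remains
        else:
            end = round(len(queries) * cumulative / total)
        chunks[label] = queries[start:end]
        start = end
    return chunks


if __name__ == "__main__":
    # Hypothetical query identifiers standing in for a concatenated query file.
    queries = [f"query_{i:03d}" for i in range(100)]
    shares = split_queries(queries, {"host": 4, "coprocessor": 1})
    print({label: len(chunk) for label, chunk in shares.items()})
```

Finding the sweet spot then amounts to timing the run for several candidate weightings and keeping the one with the highest throughput, which is what repeating the experiment with randomized query picks guards against overfitting.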
BLAST: BLASTp v30
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searches for alignments of protein query sequences against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: Available here. Recipe: Available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor to find the split that gives maximum speedup (the sweet spot). The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 337 and 2857.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the simulation rate speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
[Chart: BLASTp v30 Speed Up; vertical axis: Comparative Performance; series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
1 NODE
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015
Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within a processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for currently characterized errata.
Intel® Virtualization Technology requires a computer system with an Intel® processor, BIOS, and virtual machine monitor (VMM) enabled for it. Functionality, performance, and other benefits of the technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). In addition, Intel® TXT requires the system to contain a TPM v1s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html
Requires a system with Intel® Turbo Boost Technology. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as third-party software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (in English)
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized quotation or reproduction is prohibited.
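Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in system hardware or software design or configuration may cause actual performance to differ from the published tests and ratings. If you are considering the purchase of systems or components, you should consult other sources of information to evaluate performance as a whole. For more information on performance evaluation of Intel products, see http://www.intel.com/performance
Intel does not control or audit the design or implementation of the third-party benchmarks or websites referenced in this document. Intel encourages you to visit the referenced websites, or other websites where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.
Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to the benchmark result of the baseline platform, dividing the benchmark result of each platform by the actual benchmark result of the baseline platform, and assigning a relative performance number proportional to the reported performance improvement.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (in English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (in English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (in English).
The information in this document is provided as is. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted. Intel assumes no liability for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.
Software and workloads used in performance tests may have been optimized for performance on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Results vary with these factors. If you are considering a purchase, you should consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance as a whole.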
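The relative-performance calculation described in the disclaimer above (baseline set to 1.0, every other platform divided by the baseline result) can be written out as a short sketch. The function name `relative_performance`, the platform labels, and the sample figures are hypothetical and chosen only for illustration. The `higher_is_better` switch for time-like metrics is an added assumption that the source does not mention.

```python
def relative_performance(results, baseline, higher_is_better=True):
    """Normalize raw benchmark results so the baseline platform scores 1.0.

    `results` maps a platform label to its raw benchmark result.
    """
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    if higher_is_better:
        # Throughput-like metric: divide each result by the baseline result.
        return {name: value / base for name, value in results.items()}
    # Time-like metric (assumption, not in the source): invert the ratio
    # so that a shorter run still yields a number above 1.0.
    return {name: base / value for name, value in results.items()}


if __name__ == "__main__":
    # Made-up throughput figures, for illustration only.
    raw = {"host_only": 2.0, "host_plus_coprocessor": 3.0}
    print(relative_performance(raw, baseline="host_only"))
```

A value such as 1.5 in the output corresponds to what the slides label "1.5X": the platform completed 1.5 times the baseline's work in the same time.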
0
1
Intelreg Xeonreg processor E5-2697 v2
1 Intelreg Xeon Phitrade coprocessor 7120PX
2 Intelreg Xeon Phitrade coprocessor 7120PX
Intelreg Xeonreg processor E5-2697 v2 + 1 Intelreg Xeon Phitrade coprocessor 7120PX
Intelreg Xeonreg processor E5-2697 v2 + 2 Intelreg Xeon Phitrade coprocessor 7120PX
156X
15
GROMACS 512K H2O with RF
Application GROMACS 50-RC1 Workload 512K H2O with RF method
Description GROMACS is a versatile package to perform molecular dynamics ie simulate the Newtonian equations of motion for systems with hundreds to millions of particles It is one of the fastest and the most popular Molecular Dynamics packages
Availability Code Version 50-rc1 available here and here Recipe Available here
Highlights Highly optimized for Intelreg Xeonreg Processors (AVX-
intrinsics) Able to run full simulation on Intelreg Xeon Phitrade
coprocessor natively + host processor using a symmetric model
Optimized with intrinsics for 512-bit vectorization on Intel Xeon Phi coprocessors
Results Symmetric process demonstrated up to 179X improved performance over the baseline Intelreg Xeonreg processor E5-2697 v2
APPROVED FOR PUBLIC PRESENTATION
179X
1 NODE
SOURCE INTEL MEASURED RESULTS AS OF APRIL 2014
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
Co
mp
ara
tiv
e P
erf
orm
an
ce
For configuration details go here
GROMACS 512K H2O with RF Speed Up
1 103X
172X
Other names and brands may be claimed as the property of others
16
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
0
1
NWChem 63 64S Intelreg Xeonreg processor E5-2697 v2
NWChem 65 64S Intelreg Xeonreg processor E5-2697 v2
NWChem 65 64S Intelreg Xeonreg processor E5-2697 v2 + 64 Intelreg Xeon
Phitrade Coprocessor 7120A 2
Co
mp
ara
tiv
e P
erf
orm
an
ce
NWChem 63rev2 and 65 CCSD(T) Method 32 Node Speed Up Application NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality NWChem is developed the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL) More at httpwwwnwchem-sworg
Availability Code Available here and from the SVN repository Recipe Available here
Usage Model Offload using LEO and OpenMP
Highlights NWChem with Intelreg Xeon Phitrade coprocessor 7120A offloading is a compelling and cluster compelling application for the NWChem community
Results Compared to the NWChem 63rev2 and Intelreg Xeonreg processor E5-2697 v2 baseline 1) NWChem 65 CCSD(T) performed up to 124X faster with the
Intelreg Xeonreg processor E5-2697 v2 2) NWChem 65 CCSD(T) performed up to 152X faster with the
Intelreg Xeonreg processor E5-2697 v2 and the Intel Xeon Phi coprocessor 7120A
SOURCE INTEL MEASURED RESULTS AS OF JULY 2014
APPROVED FOR PUBLIC PRESENTATION 32 NODES
NWChem CCSD(T) Method
CLUSTER BENCHMARK
For configuration details go here
1
124X
152X
Other names and brands may be claimed as the property of others
0
5
10
15
20
25
30
1 Node 8 Nodes 32 Nodes
Intelreg Xeonreg processor E5-2697 v2 (Baseline 1 node 23 or 47 PPN)
Intelreg Xeonreg processor E5-2697 v3 (27 or 55 PPN)
Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intelreg Xeon Phitrade coprocessor 7120A (240T)
Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intelreg Xeon Phitrade coprocessor 7110A (240T)
17
NAMD 210 Pre-Release STMV
Application NAMD 210 pre-release STMV
Description A parallel object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems More at httpwwwksuiuceduResearchnamd
Availability Code Intelreg Xeon Phitrade coprocessor support is
available as a pre-release Use the nightly build Recipe Available here
Usage Model Single rank on host with 47 threads Various computations are offloaded to Intelreg Xeon Phitrade coprocessor from each thread
Highlights Intelreg Xeon Phitrade coprocessor support is now in the development branch of NAMD 210 pre-release
Results For the STMV workload the Intelreg Xeonreg processor E5-2697 v3 and the Intelreg Xeon Phitrade coprocessor (32 nodes 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node 47 PPN)
APPROVED FOR PUBLIC PRESENTATION
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
SOURCE INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
32 NODES CLUSTER BENCHMARK
For configuration details go here
NAMD 210 (pre-release) Cluster Performance Increase STMV (~1M atoms)
Co
mp
ara
tiv
e P
erf
orm
an
ce
1 2X
68X
122X
272X
12X 21X
79X
131X
20X
32X
Other names and brands may be claimed as the property of others
ldquoXeon E5-2697 v2v3rdquo = Intelreg Xeonreg processor E5-2697 v2v3
242X
0
1
2
1 Node 2 Nodes
Intelreg Xeonreg processor E5-2697 v3 (Baseline 1 node)
Intelreg Xeonreg processor E5-2697 v3 + Intelreg Xeon Phitrade coprocessor B1-
7110A (240T)
18
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
NAMD 210 Pre-Release ApoA1
Application NAMD 210 pre-release ApoA1
Description A parallel object-oriented molecular dynamics code designed for high-performance simulation of large bio molecular systems More at httpwwwksuiuceduResearchnamd
Availability Code Intelreg Xeon Phitrade coprocessor support is available as a pre-
release Use the nightly build Recipe Available here
Usage Model Single rank on host with 55 threads Various computations are offloaded to Intelreg Xeon Phitrade coprocessor from each thread
Highlights Intelreg Xeon Phitrade coprocessor support is now in the development branch of NAMD 210 pre-release
Results For the ApoA1 workload 2-node performance can be accelerated by up to 261X using a single Intelreg Xeon Phitrade coprocessor
SOURCE INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
APPROVED FOR PUBLIC PRESENTATION 2 NODES CLUSTER BENCHMARK
For configuration details go here
NAMD 210 (pre-release) Cluster Performance Increase ApoA1 (~92K atoms) 55 PPN
Co
mp
ara
tiv
e P
erf
orm
an
ce
1
152X
194X
261X
(Baseline 1 node 55PPN)
Other names and brands may be claimed as the property of others
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor, which computes them concurrently with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent (1) data transfer between host and coprocessor and (2) calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: The simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel® Xeon Phi™ coprocessor computations and MPI communications yield improved speedup at higher node counts.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015. 32 NODES CLUSTER BENCHMARK.
[Figure: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision). Y axis: comparative performance, 0 to 3. X axis: 1 Node, 32 Nodes. Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package). Bar labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X. Annotation: No testing on Tesla.]
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
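The dynamic load balancing named in the Usage Model can be illustrated with a small sketch. This is not the LAMMPS Intel Package implementation: the function names `rebalance` and `simulate`, the damping factor, and the device speeds are all hypothetical. It only shows the general idea that timing the host and the coprocessor on each step lets a balancer shift the offloaded fraction until both sides finish at about the same time.

```python
def rebalance(fraction, host_time, coproc_time, damping=0.5):
    """Return an updated offload fraction from one step's measured times.

    fraction     -- share of the work currently sent to the coprocessor (0..1)
    host_time    -- seconds the host needed for its share (1 - fraction)
    coproc_time  -- seconds the coprocessor needed for its share (fraction)
    damping      -- hypothetical smoothing factor; 1.0 jumps straight to target
    """
    # Estimate each device's throughput in "work units per second".
    host_rate = (1.0 - fraction) / host_time
    coproc_rate = fraction / coproc_time
    # Both devices finish together when work is split in proportion to rate.
    target = coproc_rate / (host_rate + coproc_rate)
    updated = fraction + damping * (target - fraction)
    # Keep both devices busy so the next step still yields a timing sample.
    return min(max(updated, 0.01), 0.99)


def simulate(host_speed, coproc_speed, steps=30, fraction=0.5):
    """Run the balancer against devices with fixed, assumed speeds."""
    for _ in range(steps):
        host_time = (1.0 - fraction) / host_speed
        coproc_time = fraction / coproc_speed
        fraction = rebalance(fraction, host_time, coproc_time)
    return fraction


if __name__ == "__main__":
    # Assumed speeds: the coprocessor is twice as fast as the host.
    print(round(simulate(1.0, 2.0), 4))
```

With a coprocessor assumed to be twice as fast as the host, the offload fraction settles near two thirds, which is the point where neither device waits for the other.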
LAMMPS Rhodopsin Benchmark: 512K Atoms
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor, which computes them concurrently with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent (1) data transfer between host and coprocessor and (2) calculation of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. Performance gains hold at 1.47X when scaling up to 32 nodes.
SOURCE: INTEL MEASURED RESULTS AS OF AUGUST 2014. 32 NODES CLUSTER BENCHMARK.
[Figure: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision). Y axis: comparative performance. X axis: 1 Node, 32 Nodes. Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, Turbo Off (LAMMPS IA Package). Bar labels: 1, 1.27X, 1.68X (1 node); 1, 1.07X, 1.47X (32 nodes).]
Johns Hopkins Bowtie 2: Multiple Workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: nvBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: Available here. Recipe: Not available. Check for future availability here.
Usage Model: Workloads ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1, and ERR024139_1.
Highlights: See more here.
Results: Bowtie2 with the Intel® AVX2 port running on the Intel® Xeon® processor E5-2697 v3 is faster than nvBowtie running on the Intel® Xeon® processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published data for the K40 compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2015. 1 NODE.
[Figure: Johns Hopkins Bowtie 2 TGen Workload Speed Up. Y axis: comparative increase. Workloads shown: ERR161544, SRR034966_1, ERR000589, SRR002273_1. Series: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3. Bar labels: 1, 1.87X, 1.59X, 1.08X, 0.88X.]
Burrows-Wheeler Aligner (BWA-ALN): Human Genome
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference data base).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: Available here. Recipe: Available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor, running symmetric processes, demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
SOURCE: INTEL MEASURED RESULTS AS OF JANUARY 2014. 1 NODE.
[Figure: BWA-ALN Speed Up. Y axis: comparative performance. Series: 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A. Bar labels: 1, 1.24X, 1.86X.]
BLAST: BLASTn v30
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searching for alignment in nucleotide query sequences against a known nucleotide db volume set. National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: Available here. Recipe: Available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor for the maximum-speedup sweet spot. The experiment was repeated 20 times, with the pick of queries randomized, for a sweet spot split of 8020 and 592318.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline, the simulation rate speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015. 1 NODE.
[Figure: BLASTn v30 Speed Up. Y axis: comparative performance. Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized. Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
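The "sweet spot" in the Usage Model follows from the CPU and the coprocessor working on their shares at the same time: the run finishes only when the slower side finishes. A minimal sketch, assuming each device processes queries at a constant rate, searches for the split that minimizes that finish time. The function name `best_split` and the rates used below are hypothetical and are not measurements from this slide.

```python
def best_split(n_queries, cpu_rate, coproc_rate):
    """Find the query split that minimizes the overall finish time.

    n_queries    -- total number of queries to distribute
    cpu_rate     -- assumed CPU throughput in queries per second
    coproc_rate  -- assumed coprocessor throughput in queries per second
    Returns (queries_on_cpu, queries_on_coprocessor, finish_time_seconds).
    """
    best = None
    for n_cpu in range(n_queries + 1):
        n_coproc = n_queries - n_cpu
        # Both devices run concurrently, so the slower one sets the finish time.
        finish = max(n_cpu / cpu_rate, n_coproc / coproc_rate)
        if best is None or finish < best[2]:
            best = (n_cpu, n_coproc, finish)
    return best


if __name__ == "__main__":
    # Hypothetical rates: the CPU is three times as fast as the coprocessor.
    n_cpu, n_coproc, finish = best_split(100, 3.0, 1.0)
    cpu_only = 100 / 3.0
    print(n_cpu, n_coproc, finish, round(cpu_only / finish, 2))
```

Moving even one query away from the best split lengthens the run, because one device then idles while the other finishes. That narrowness is one way to read the slide's remark that the sweet spot is small.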
BLAST: BLASTp v30
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searching for alignment in protein query sequences against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: Available here. Recipe: Available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor for the maximum-speedup sweet spot. The experiment was repeated 20 times, with the pick of queries randomized, for a sweet spot split of 337 and 2857.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the simulation rate speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
SOURCE: INTEL MEASURED RESULTS AS OF MARCH 2015. 1 NODE.
[Figure: BLASTp v30 Speed Up. Y axis: comparative performance. Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized. Bar labels: 1, 1.41X, 1.39X, 1.21X, 1.3X, 1.15X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for currently characterized errata.
Intel® Virtualization Technology requires a computer system with an Intel® processor, BIOS, and virtual machine monitor (VMM) that support the technology. Functionality, performance, or other benefits of virtualization technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). In addition, Intel® TXT requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html
Requires a system with Intel® Turbo Boost Technology capability. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English)
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized quotation or reproduction is prohibited.
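Legal Disclaimer: Performance
Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in system hardware or software design or configuration may cause actual performance to differ from the published tests and ratings. If you are considering the purchase of systems or components, consult other sources of information to evaluate performance comprehensively. For more information on performance evaluation of Intel products, see http://www.intel.com/performance
Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages you to visit the referenced Web sites, or other Web sites where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.
Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number proportional to the reported performance improvement.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English).
The information in this document is provided "as is" and grants no license, express or implied, by estoppel or otherwise, to any intellectual property rights. Intel assumes no liability whatsoever for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Results vary with these factors. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance comprehensively.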
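The relative-performance calculation described in the disclaimer above (assign the baseline platform a value of 1.0 and divide every platform's result by the baseline's actual result) can be sketched as follows. The function name `relative_performance` and the sample numbers are hypothetical, not results from this deck, and the sketch assumes a metric where higher is better, such as throughput.

```python
def relative_performance(results, baseline):
    """Normalize benchmark results so the baseline platform scores 1.0.

    results   -- mapping of platform name to a measured result where
                 higher is better (for example, simulation steps per second)
    baseline  -- name of the platform used as the reference
    """
    base_value = results[baseline]
    if base_value <= 0:
        raise ValueError("baseline result must be positive")
    # Each platform's result is divided by the baseline's actual result.
    return {name: value / base_value for name, value in results.items()}


if __name__ == "__main__":
    # Hypothetical throughput figures for three configurations.
    measured = {"A": 40.0, "B": 50.0, "C": 70.0}
    for name, ratio in relative_performance(measured, "A").items():
        print(f"{name}: {ratio:.2f}X")
```

For a metric where lower is better, such as elapsed time, the ratio would be inverted (baseline time divided by platform time) so that a larger number still means faster.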
NWChem CCSD(T) Method
Application: NWChem is a computational chemistry software package that includes quantum chemical and molecular dynamics functionality. NWChem is developed by the Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory (PNNL). More at http://www.nwchem-sw.org
Availability: Code: Available here and from the SVN repository. Recipe: Available here.
Usage Model: Offload using LEO and OpenMP.
Highlights: NWChem with Intel® Xeon Phi™ coprocessor 7120A offloading is compelling for the NWChem community, including at cluster scale.
Results: Compared to the baseline of NWChem 6.3 rev2 on the Intel® Xeon® processor E5-2697 v2:
1) NWChem 6.5 CCSD(T) performed up to 1.24X faster with the Intel® Xeon® processor E5-2697 v2.
2) NWChem 6.5 CCSD(T) performed up to 1.52X faster with the Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor 7120A.
SOURCE: INTEL MEASURED RESULTS AS OF JULY 2014. 32 NODES CLUSTER BENCHMARK.
[Figure: NWChem 6.3 rev2 and 6.5 CCSD(T) Method, 32 Node Speed Up. Y axis: comparative performance. Series: NWChem 6.3, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2; NWChem 6.5, 64S Intel® Xeon® processor E5-2697 v2 + 64 Intel® Xeon Phi™ coprocessor 7120A 2. Bar labels: 1, 1.24X, 1.52X.]
NAMD 2.10 Pre-Release: STMV
Application: NAMD 2.10 pre-release, STMV workload
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release; use the nightly build. Recipe: Available here.
Usage Model: Single rank on the host with 47 threads. Each thread offloads various computations to the Intel® Xeon Phi™ coprocessor.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
SOURCE: INTEL MEASURED RESULTS AS OF SEPTEMBER 2014. 32 NODES CLUSTER BENCHMARK.
[Figure: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms). Y axis: comparative performance, 0 to 30. X axis: 1 Node, 8 Nodes, 32 Nodes. Series: Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T). Bar labels: 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
[Figure legend: NAMD 2.10 (pre-release) ApoA1 chart. Y axis ticks: 0 to 2. X axis: 1 Node, 2 Nodes. Series: Intel® Xeon® processor E5-2697 v3 (baseline: 1 node); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T).]
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
NAMD 210 Pre-Release ApoA1
Application NAMD 210 pre-release ApoA1
Description A parallel object-oriented molecular dynamics code designed for high-performance simulation of large bio molecular systems More at httpwwwksuiuceduResearchnamd
Availability Code Intelreg Xeon Phitrade coprocessor support is available as a pre-
release Use the nightly build Recipe Available here
Usage Model Single rank on host with 55 threads Various computations are offloaded to Intelreg Xeon Phitrade coprocessor from each thread
Highlights Intelreg Xeon Phitrade coprocessor support is now in the development branch of NAMD 210 pre-release
Results For the ApoA1 workload 2-node performance can be accelerated by up to 261X using a single Intelreg Xeon Phitrade coprocessor
SOURCE INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
APPROVED FOR PUBLIC PRESENTATION 2 NODES CLUSTER BENCHMARK
For configuration details go here
NAMD 210 (pre-release) Cluster Performance Increase ApoA1 (~92K atoms) 55 PPN
Co
mp
ara
tiv
e P
erf
orm
an
ce
1
152X
194X
261X
(Baseline 1 node 55PPN)
Other names and brands may be claimed as the property of others
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
0
1
2
3
1 Node 32 Nodes
2S Intelreg Xeonreg processor E5-2697 v3 (LAMMPS Baseline)
2S Intelreg Xeonreg processor E5-2697 v3 (LAMMPS IA Package)
2S Xeon E5-2697 v3 + Tesla K40c boost off ECC on
2S Xeon E5-2697 v3 + Xeon Phi 7120A turbo off (LAMMPS IA Package)
19
LAMMPS Stillinger-Weber Water Benchmark
Application LAMMPS
Description Simulation of molecular systems with classical models More at httplammpssandiagov
Availability Code In main LAMMPS repository Recipe Available here
Usage Model Load balancer offloads part of neighbor-list and non-bond force calculations to Intelreg Xeon Phitrade coprocessor for concurrent calculations with CPU
Highlights Improved results with Intelreg Xeonreg processor E5-2697 v3 and Intelreg Xeon Phitrade coprocessor 7120A Dynamic load balancing allows for concurrent Data transfer between host and coprocessor Calculations of neighbor-list non-bond bond and long-
range terms Same routines in LAMMPS Intel Package also run faster on CPU
Results Simulation rate increase with Intelreg Package is up to 36X Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup and higher node counts
SOURCE INTEL MEASURED RESULTS AS OF MARCH 2015
LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision)
Co
mp
ara
tiv
e P
erf
orm
an
ce
For configuration details go here
1
3X
341X
1
305X
36X
Other names and brands may be claimed as the property of others
ldquoXeon E5-2697 v3rdquo = Intelreg Xeonreg processor E5-2697 v3
ldquoXeon Phi 7120Ardquo = Intelreg Xeon Phitrade coprocessor 7120A
09X
APPROVED FOR PUBLIC PRESENTATION
No testing
on Tesla
NEW
32 NODES CLUSTER BENCHMARK
20
Application LAMMPS
Description Simulation of molecular systems with classical models More at httplammpssandiagov
Availability Code In main LAMMPS repository Recipe Available here
Usage Model Load balancer offloads part of neighbor-list and non-bond force calculations to Intelreg Xeon Phitrade coprocessor for concurrent calculations with CPU
Highlights Improved results with Intelreg Xeonreg processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A Dynamic load balancing allows for concurrent Data transfer between host and coprocessor Calculations of neighbor-list non-bond bond and long-range
terms Same routines in LAMMPS Intel Package also run faster on CPU
Results Up to 168X performance improvement utilizing Intelreg Xeonreg processors and Intelreg Xeon Phitrade coprocessors with application optimization on a single node compared to the baseline configuration Performance gains continue to hold at 147X when scaling up to 32 nodes
LAMMPS Rhodopsin Benchmark 512K Atoms
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
SOURCE INTEL MEASURED RESULTS AS OF AUGUST 2014
0
1
1 Node 32 Nodes
2S Intelreg Xeonreg processor E5-2697 v3 (LAMMPS Baseline)
2S Intelreg Xeonreg processor E5-2697 v3 (LAMMPS IA Package)
2S E5-2697 v3 + Intelreg Xeon Phitrade coprocessor 7110P7120A Turbo Off
(LAMMPS IA Package)
Co
mp
ara
tiv
e P
erf
orm
an
ce
LAMMPS Rhodopsin Benchmark Performance (Mixed Precision)
APPROVED FOR PUBLIC PRESENTATION 32 NODES CLUSTER BENCHMARK
For configuration details go here
1
127X
168X
1 107X
147X
Other names and brands may be claimed as the property of others
0
1
ERR161544 SRR034966_1 ERR000589 SRR002273_1
Intelreg Xeonreg processor E5-2697 v2 + 1 NVIDIA Tesla K40
Intelreg Xeonreg processor E5-2697 v3
21
Johns Hopkins Bowtie 2 Multiple workloads
Application Bowtie2 version 223 Intelreg AVX2 port
Description NVBowtie version 0993 Bowtie is a GPU-accelerated re-engineering of Bowtie2 a very widely used short-read aligner While being completely rewritten from scratch nvBowtie reproduces many (though not all) of the features of Bowtie2 httpnvlabsgithubionvbionvbowtie_pagehtml
Availability Code Available here Recipe Not available Check for future availability here
Usage Model ERR161544 SRR002273_1 HEK001(TGen) ERR000589_1 SRR033552_1 SRR034966_1 amp ERR024139_1
Highlights See more here
Results Bowtie2 running on the Intelreg Xeonreg processor E5-2697 v3 with Intelreg AVX2 port faster than NVBowtie running on the Intelreg Xeonreg processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads NVIDIA published data of K40 compared to Intelreg Xeonreg processor E5-2600 (6 cores) on one workload
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
SOURCE INTEL MEASURED RESULTS AS OF JANUARY 2015
Johns Hopkins Bowtie 2 TGen Workload Speed Up
Co
mp
ara
tiv
e I
ncr
ea
se
1 NODE
For configuration details go here
1
187X
Other names and brands may be claimed as the property of others
APPROVED FOR PUBLIC PRESENTATION
159X
108X
88X
NEW
22
Application Burrows-Wheeler Aligner version 0510 BWA-ALN is represented in this benchmark Workload is korean_female (read file 35 GB 30 GB reference data base)
Description BWA is a popular software package for mapping low-divergent sequences against a large reference genome such as the human genome More at httpbio-bwasourceforgenet
Availability Code Available here Recipe Available here
Usage Model Hybrid MPI + OpenMP using symmetric mode
Highlights Results are identical to the unmodified run of BWA-ALN
Results The Intelreg Xeonreg processor E5-2697 v2 and the Intelreg Xeon Phitrade coprocessor symmetric process demonstrated up to 186X improved performance over the baseline Intelreg Xeonreg processor E5-2697 v2
SOURCE INTEL MEASURED RESULTS AS OF JANUARY 2014
0
1
BWA-ALN Speed Up
2S Intelreg Xeonreg processor E5-2697 v2 (baseline BWA-ALN)
2S Intelreg Xeonreg processor E5-2697 v2 (optimized BWA-ALN)
2S Intelreg Xeonreg processor E5-2697 v2 + Intelreg Xeon Phitrade coprocessor
7120A
Co
mp
ara
tiv
e P
erf
orm
an
ce
APPROVED FOR PUBLIC PRESENTATION 1 NODE
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
Burrows-Wheeler Aligner (BWA-ALN) Human Genome
For configuration details go here
1
124X
186X
Other names and brands may be claimed as the property of others
0
1
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
ldquoXeon E5-2697 v2v3rdquo = Intelreg Xeonreg processor E5-2697 v2v3
ldquoXeon Phi 7120Ardquo = Intelreg Xeon Phitrade coprocessor 7120A
23
Application Basic Local Alignment Search Tool (BLASTn) v30
Description Searching for alignment in nucleotide query sequences against a known nucleotide db volume set National Center for Biotechnology Information (NCBI) More at httpblastncbinlmnihgov
Availability Code Available here Recipe Available here
Usage Model 4 (multiple queries multiple db) 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intelreg Xeonreg processor and Intelreg Xeon Phitrade coprocessor for maximum speedup sweet spot Experiment was repeated 20 times with the pick of queries randomized for a sweet spot split 8020 and 592318
Highlights Throughput for this load sharing model has a small sweet spot for a sufficiently large query set
Results Compared to the baseline simulation rate speed up with Intelreg Xeonreg processor E5-2697 v3 and Intelreg Xeon Phitrade coprocessor 7120A heterogeneous model is 152X Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
BLAST: BLASTn v30
Other names and brands may be claimed as the property of others. Source: Intel measured results as of March 2015.
APPROVED FOR PUBLIC PRESENTATION
[Chart: BLASTn v30 Speed Up (1 node, NEW). Vertical axis: Comparative Performance. Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X. For configuration details go here.]
[Chart legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized]
BLAST: BLASTp v30
Source: Intel measured results as of March 2015.
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searches for alignments of protein query sequences against a known protein database volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: Available here. Recipe: Available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor at the sweet-spot split that maximizes speedup. The experiment was repeated 20 times with randomized query selection, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the speedup with the heterogeneous model (Intel® Xeon® processor E5-2697 v3 plus Intel® Xeon Phi™ coprocessor 7120A) is 1.41X. CPU performance also improves due to parallelization of the Output Formatting Section (OFS).
[Chart: BLASTp v30 Speed Up (1 node, NEW). Vertical axis: Comparative Performance. Bar labels: 1.41X, 1, 1.39X, 1.21X, 1.3X, 1.15X. For configuration details go here.]
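Legal Information

All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.

Intel processor numbers are not a measure of performance. Processor numbers differentiate features within a processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number

Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause them to deviate from published specifications. Contact Intel for currently characterized errata.

Intel® Virtualization Technology requires a computer system with an Intel® processor, BIOS, and virtual machine monitor (VMM) that support it. Functionality, performance, or other benefits of virtualization technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html

No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). In addition, Intel® TXT requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html

Intel® Turbo Boost Technology requires a system that supports it. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost

Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (English).

Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized citation or reproduction is prohibited.

Legal Disclaimers: Performance

Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in system hardware or software design or configuration may cause actual performance to differ from the published tests and ratings. Buyers considering the purchase of systems or components should consult other sources of information to evaluate performance comprehensively. For more information on the performance of Intel products, see http://www.intel.com/performance

Intel does not control or audit the design or implementation of third-party benchmarks or Web sites referenced in this document. Intel encourages readers to visit the referenced Web sites, or others where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.

Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number that is proportional to the reported performance improvement.

SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (English).

The information in this document is provided "as is". No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Intel assumes no liability whatsoever for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.

Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Results vary with those factors. When considering a purchase, you should consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance comprehensively.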
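The relative-performance calculation described in the disclaimer (the baseline is assigned 1.0 and every other platform is divided by the baseline result) can be written out as a short sketch. The function name `relative_performance` and the throughput figures are hypothetical and chosen only for illustration. The sketch also assumes a "higher is better" metric; for elapsed times the ratio would need to be inverted.

```python
def relative_performance(results, baseline):
    """Normalize throughput-style results so the baseline platform scores 1.0."""
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    # Each platform's score is its raw result divided by the baseline's result.
    return {name: value / base for name, value in results.items()}


# Hypothetical throughput numbers (higher is better), not measured data.
raw = {"baseline_platform": 200.0, "candidate_platform": 304.0}
relative = relative_performance(raw, "baseline_platform")
for name, score in relative.items():
    print(f"{name}: {score:.2f}X")
```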
[Chart legend: Intel® Xeon® processor E5-2697 v2 (baseline: 1 node, 23 or 47 PPN); Intel® Xeon® processor E5-2697 v3 (27 or 55 PPN); Xeon E5-2697 v2 (23 or 47 PPN) + 1 Intel® Xeon Phi™ coprocessor 7120A (240T); Xeon E5-2697 v3 (27 or 55 PPN) + 1 Intel® Xeon Phi™ coprocessor 7110A (240T). Horizontal axis: 1 Node, 8 Nodes, 32 Nodes. Vertical axis range: 0 to 30.]
NAMD 2.10 Pre-Release: STMV
Application: NAMD 2.10 pre-release, STMV
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: Available here.
Usage Model: Single rank on host with 47 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the STMV workload, the Intel® Xeon® processor E5-2697 v3 with the Intel® Xeon Phi™ coprocessor (32 nodes, 55 PPN) improved performance by up to 32X compared to the baseline processor (1 node, 47 PPN).
Source: Intel measured results as of September 2014.
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, STMV (~1M atoms); 32-node cluster benchmark. Vertical axis: Comparative Performance. Bar labels: 1, 2X, 6.8X, 12.2X, 27.2X, 1.2X, 2.1X, 7.9X, 13.1X, 20X, 32X, 24.2X. For configuration details go here.]
[Chart legend: Intel® Xeon® processor E5-2697 v3 (baseline: 1 node); Intel® Xeon® processor E5-2697 v3 + Intel® Xeon Phi™ coprocessor B1-7110A (240T). Horizontal axis: 1 Node, 2 Nodes. Vertical axis range: 0 to 2.]
NAMD 2.10 Pre-Release: ApoA1
Application: NAMD 2.10 pre-release, ApoA1
Description: A parallel, object-oriented molecular dynamics code designed for high-performance simulation of large biomolecular systems. More at http://www.ks.uiuc.edu/Research/namd
Availability: Code: Intel® Xeon Phi™ coprocessor support is available as a pre-release. Use the nightly build. Recipe: Available here.
Usage Model: Single rank on host with 55 threads. Various computations are offloaded to the Intel® Xeon Phi™ coprocessor from each thread.
Highlights: Intel® Xeon Phi™ coprocessor support is now in the development branch of NAMD 2.10 pre-release.
Results: For the ApoA1 workload, 2-node performance can be accelerated by up to 2.61X using a single Intel® Xeon Phi™ coprocessor.
Source: Intel measured results as of September 2014.
[Chart: NAMD 2.10 (pre-release) Cluster Performance Increase, ApoA1 (~92K atoms), 55 PPN; 2-node cluster benchmark. Vertical axis: Comparative Performance. Bar labels: 1 (baseline: 1 node, 55 PPN), 1.52X, 1.94X, 2.61X. For configuration details go here.]
[Chart legend: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package). Horizontal axis: 1 Node, 32 Nodes. Vertical axis range: 0 to 3.]
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In the main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor so that they run concurrently with the CPU calculations.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows data transfer between host and coprocessor to run concurrently with calculations of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel® Xeon Phi™ coprocessor computations and MPI communications yield improved speedup at higher node counts.
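The idea behind the dynamic load balancing in the usage model can be illustrated with a small sketch. This is not the LAMMPS implementation. It assumes each side's time scales linearly with its share of the work, and the function names `rebalance` and `measure`, along with the device speeds, are hypothetical.

```python
def rebalance(fraction, host_time, coprocessor_time):
    """Return an offload fraction that would equalize host and coprocessor time.

    Assumes (for illustration) that time scales linearly with the work share.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    host_rate = (1.0 - fraction) / host_time        # work per second on host
    coprocessor_rate = fraction / coprocessor_time  # work per second offloaded
    return coprocessor_rate / (host_rate + coprocessor_rate)


HOST_SPEED = 1.0         # hypothetical work units per second
COPROCESSOR_SPEED = 3.0  # hypothetical work units per second


def measure(fraction, total_work=1.0):
    """Stand-in for timing one step: returns (host_time, coprocessor_time)."""
    host_time = total_work * (1.0 - fraction) / HOST_SPEED
    coprocessor_time = total_work * fraction / COPROCESSOR_SPEED
    return host_time, coprocessor_time


fraction = 0.5  # start by offloading half of the work
for step in range(3):
    host_time, coprocessor_time = measure(fraction)
    fraction = rebalance(fraction, host_time, coprocessor_time)
    print(f"step {step}: offload fraction = {fraction:.3f}")
```

With these made-up speeds the fraction settles where both sides finish at about the same time. A real balancer would work from noisy per-step timings and would likely smooth its updates.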
Source: Intel measured results as of March 2015.
[Chart: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision); 32-node cluster benchmark, NEW. Vertical axis: Comparative Performance. Bar labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X. Annotation: No testing on Tesla. For configuration details go here.]
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
LAMMPS Rhodopsin Benchmark: 512K Atoms
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: In the main LAMMPS repository. Recipe: Available here.
Usage Model: A load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor so that they run concurrently with the CPU calculations.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows data transfer between host and coprocessor to run concurrently with calculations of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement on a single node, compared to the baseline configuration, using Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization. The gain holds at 1.47X when scaling up to 32 nodes.
Source: Intel measured results as of August 2014.
[Chart: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision); 32-node cluster benchmark. Vertical axis: Comparative Performance. Horizontal axis: 1 Node, 32 Nodes. Legend: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, Turbo Off (LAMMPS IA Package). Bar labels: 1, 1.27X, 1.68X, 1, 1.07X, 1.47X. For configuration details go here.]
[Chart legend: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3. Horizontal axis: ERR161544, SRR034966_1, ERR000589, SRR002273_1.]
Johns Hopkins Bowtie 2: Multiple Workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: nvBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: Available here. Recipe: Not available. Check for future availability here.
Usage Model: Workloads ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1
Highlights: See more here.
Results: Bowtie2 with the Intel® AVX2 port running on the Intel® Xeon® processor E5-2697 v3 is faster than nvBowtie running on the Intel® Xeon® processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published data for the K40 compared to an Intel® Xeon® processor E5-2600 (6 cores) on one workload.
Source: Intel measured results as of January 2015.
[Chart: Johns Hopkins Bowtie 2 TGen Workload Speed Up (1 node, NEW). Vertical axis: Comparative Increase. Bar labels: 1, 1.87X, 1.59X, 1.08X, 0.88X. For configuration details go here.]
Burrows-Wheeler Aligner (BWA-ALN): Human Genome
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: Available here. Recipe: Available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The Intel® Xeon® processor E5-2697 v2 with the Intel® Xeon Phi™ coprocessor in symmetric mode demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
Source: Intel measured results as of January 2014.
[Chart: BWA-ALN Speed Up (1 node). Vertical axis: Comparative Performance. Legend: 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A. Bar labels: 1, 1.24X, 1.86X. For configuration details go here.]
0
1
2
1 Node 2 Nodes
Intelreg Xeonreg processor E5-2697 v3 (Baseline 1 node)
Intelreg Xeonreg processor E5-2697 v3 + Intelreg Xeon Phitrade coprocessor B1-
7110A (240T)
18
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
NAMD 210 Pre-Release ApoA1
Application NAMD 210 pre-release ApoA1
Description A parallel object-oriented molecular dynamics code designed for high-performance simulation of large bio molecular systems More at httpwwwksuiuceduResearchnamd
Availability Code Intelreg Xeon Phitrade coprocessor support is available as a pre-
release Use the nightly build Recipe Available here
Usage Model Single rank on host with 55 threads Various computations are offloaded to Intelreg Xeon Phitrade coprocessor from each thread
Highlights Intelreg Xeon Phitrade coprocessor support is now in the development branch of NAMD 210 pre-release
Results For the ApoA1 workload 2-node performance can be accelerated by up to 261X using a single Intelreg Xeon Phitrade coprocessor
SOURCE INTEL MEASURED RESULTS AS OF SEPTEMBER 2014
APPROVED FOR PUBLIC PRESENTATION 2 NODES CLUSTER BENCHMARK
For configuration details go here
NAMD 210 (pre-release) Cluster Performance Increase ApoA1 (~92K atoms) 55 PPN
Co
mp
ara
tiv
e P
erf
orm
an
ce
1
152X
194X
261X
(Baseline 1 node 55PPN)
Other names and brands may be claimed as the property of others
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
0
1
2
3
1 Node 32 Nodes
2S Intelreg Xeonreg processor E5-2697 v3 (LAMMPS Baseline)
2S Intelreg Xeonreg processor E5-2697 v3 (LAMMPS IA Package)
2S Xeon E5-2697 v3 + Tesla K40c boost off ECC on
2S Xeon E5-2697 v3 + Xeon Phi 7120A turbo off (LAMMPS IA Package)
19
LAMMPS Stillinger-Weber Water Benchmark
Application LAMMPS
Description Simulation of molecular systems with classical models More at httplammpssandiagov
Availability Code In main LAMMPS repository Recipe Available here
Usage Model Load balancer offloads part of neighbor-list and non-bond force calculations to Intelreg Xeon Phitrade coprocessor for concurrent calculations with CPU
Highlights Improved results with Intelreg Xeonreg processor E5-2697 v3 and Intelreg Xeon Phitrade coprocessor 7120A Dynamic load balancing allows for concurrent Data transfer between host and coprocessor Calculations of neighbor-list non-bond bond and long-
range terms Same routines in LAMMPS Intel Package also run faster on CPU
Results Simulation rate increase with Intelreg Package is up to 36X Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup and higher node counts
SOURCE INTEL MEASURED RESULTS AS OF MARCH 2015
LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision)
Co
mp
ara
tiv
e P
erf
orm
an
ce
For configuration details go here
1
3X
341X
1
305X
36X
Other names and brands may be claimed as the property of others
ldquoXeon E5-2697 v3rdquo = Intelreg Xeonreg processor E5-2697 v3
ldquoXeon Phi 7120Ardquo = Intelreg Xeon Phitrade coprocessor 7120A
09X
APPROVED FOR PUBLIC PRESENTATION
No testing
on Tesla
NEW
32 NODES CLUSTER BENCHMARK
LAMMPS Rhodopsin Benchmark (512K Atoms)
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: Load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculations with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and calculations of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Up to 1.68X performance improvement utilizing Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors with application optimization on a single node, compared to the baseline configuration. Performance gains continue to hold at 1.47X when scaling up to 32 nodes.
Source: Intel measured results as of August 2014.
[Figure: LAMMPS Rhodopsin Benchmark Performance (Mixed Precision), 32-node cluster benchmark. Y axis: comparative performance. Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S E5-2697 v3 + Intel® Xeon Phi™ coprocessor 7110P/7120A, Turbo Off (LAMMPS IA Package). Bar labels: 1 node: 1, 1.27X, 1.68X; 32 nodes: 1, 1.07X, 1.47X.]
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
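As a rough illustration of the load-balancing idea in the LAMMPS usage model, the sketch below shows one simple way a balancer could choose the fraction of work to offload so that the CPU and the coprocessor finish a step at about the same time. It is not the LAMMPS Intel Package implementation: the function name `rebalance`, the constant-throughput model, and the timings in the usage lines are all assumptions made for illustration.

```python
def rebalance(offload_fraction, cpu_seconds, coprocessor_seconds):
    """Return an updated offload fraction (hypothetical model).

    offload_fraction    -- share of the work sent to the coprocessor last step
    cpu_seconds         -- time the CPU needed for its share
    coprocessor_seconds -- time the coprocessor needed for its share

    Assumes each device keeps the throughput it showed in the last step,
    and picks the split at which both devices would finish together.
    """
    if not 0.0 < offload_fraction < 1.0:
        raise ValueError("offload_fraction must be strictly between 0 and 1")
    if cpu_seconds <= 0.0 or coprocessor_seconds <= 0.0:
        raise ValueError("timings must be positive")

    # Work units per second observed on each device in the last step.
    cpu_rate = (1.0 - offload_fraction) / cpu_seconds
    coprocessor_rate = offload_fraction / coprocessor_seconds

    # Equal finish times require work shares proportional to throughput.
    return coprocessor_rate / (cpu_rate + coprocessor_rate)


# Made-up timings: with a 50/50 split the CPU took twice as long,
# so the balancer shifts more work to the coprocessor.
new_fraction = rebalance(0.5, cpu_seconds=2.0, coprocessor_seconds=1.0)
print(f"next offload fraction: {new_fraction:.3f}")
```

Recomputing the fraction every few steps from fresh timings is what would make such a scheme dynamic: if either device slows down, the split follows it.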
Johns Hopkins Bowtie 2: Multiple Workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: NVBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: available here. Recipe: not available; check for future availability here.
Usage Model: ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1
Highlights: See more here.
Results: Bowtie2 running on the Intel® Xeon® processor E5-2697 v3 with the Intel® AVX2 port is faster than NVBowtie running on the Intel® Xeon® processor E5-2697 v2 and the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA published data of the K40 compared to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
Source: Intel measured results as of January 2015.
[Figure: Johns Hopkins Bowtie 2 TGen Workload Speed Up, 1 node. Y axis: comparative increase. Workloads shown: ERR161544, SRR034966_1, ERR000589, SRR002273_1. Series: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3. Bar labels: 1, 1.87X, 1.59X, 1.08X, 0.88X.]
Burrows-Wheeler Aligner (BWA-ALN) Human Genome
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: available here. Recipe: available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode.
Highlights: Results are identical to the unmodified run of BWA-ALN.
Results: The Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor symmetric process demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
Source: Intel measured results as of January 2014.
[Figure: BWA-ALN Speed Up, 1 node. Y axis: comparative performance. Series: 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN); 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A. Bar labels: 1, 1.24X, 1.86X.]
BLAST: BLASTn v30
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searching for alignment in nucleotide query sequences against a known nucleotide db volume set, National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor for the maximum-speedup sweet spot. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
Source: Intel measured results as of March 2015.
[Figure: BLASTn v30 Speed Up, 1 node. Y axis: comparative performance. Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized. Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
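The query distribution in the BLASTn usage model (a randomized pick of queries, divided between processor and coprocessor in a fixed ratio such as 80/20) can be sketched as follows. This is only an illustration under stated assumptions: `split_queries`, the seed handling, and the placeholder query names are hypothetical and are not taken from the benchmark recipe.

```python
import random


def split_queries(queries, shares, seed=0):
    """Randomly assign queries to devices in proportion to `shares`.

    queries -- list of query identifiers
    shares  -- relative share per device, e.g. (80, 20) or (59, 23, 18)
    seed    -- seed for the shuffle, so a given run can be repeated

    Returns one list of queries per device, in the order of `shares`.
    """
    total = sum(shares)
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)  # randomize the pick of queries

    parts, start, cumulative = [], 0, 0
    for share in shares:
        cumulative += share
        # Cumulative rounding keeps the parts contiguous and exhaustive.
        end = round(len(shuffled) * cumulative / total)
        parts.append(shuffled[start:end])
        start = end
    return parts


# Placeholder names standing in for the 100 concatenated NCBI queries.
queries = [f"query_{i:03d}" for i in range(100)]
host_part, coprocessor_part = split_queries(queries, (80, 20), seed=1)
print(len(host_part), len(coprocessor_part))
```

Repeating the call with different seeds mirrors the repeated, randomized experiment described above, and a three-way share such as (59, 23, 18) follows the same pattern.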
BLAST: BLASTp v30
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searching for alignment in protein query sequences against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor for the maximum-speedup sweet spot. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
Source: Intel measured results as of March 2015.
[Figure: BLASTp v30 Speed Up, 1 node. Y axis: comparative performance. Series: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A, OFS parallelized. Bar labels: 1.41X, 1, 1.39X, 1.21X, 1.3X, 1.15X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for currently characterized errata.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance, or other benefits of virtualization technology will vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). In addition, Intel® TXT requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html
Requires a system with Intel® Turbo Boost Technology capability. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (in English)
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized quotation or reproduction is prohibited.
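Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in system hardware or software design or configuration may cause actual performance to differ from the published tests and ratings. When considering the purchase of systems or components, consult other sources of information as well to evaluate performance as a whole. For more information on the performance evaluation of Intel products, see http://www.intel.com/performance
Intel does not control or audit the design or implementation of the third-party benchmarks or websites referenced in this document. Intel encourages you to visit the referenced websites, or other websites where similar performance benchmarks are reported, and confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.
Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number proportional to the reported performance improvement.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (in English)
TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (in English)
SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (in English)
The information in this document is provided as is. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Intel assumes no liability whatsoever for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.
Software and workloads used in performance tests may have been optimized for performance on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Results vary depending on those factors. When considering a purchase, consult other information and performance tests as well, including the performance of the product when combined with other products, to evaluate performance as a whole.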
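The relative-performance rule in the disclaimer above (baseline fixed at 1.0, every other platform divided by the baseline platform's result) can be written out as a short sketch. The helper name `relative_performance` and the raw scores are hypothetical; the scores are made-up throughput values chosen only to show the arithmetic, and the sketch assumes a higher raw score means better performance.

```python
def relative_performance(results, baseline):
    """Normalize raw benchmark results against a baseline platform.

    results  -- mapping of platform name to raw score (higher is better)
    baseline -- name of the platform that is assigned the value 1.0
    """
    base = results[baseline]
    if base <= 0:
        raise ValueError("baseline result must be positive")
    # Each platform's result divided by the baseline platform's result.
    return {name: score / base for name, score in results.items()}


# Made-up raw scores for illustration only (not measured data).
raw_scores = {
    "baseline": 100.0,
    "optimized": 124.0,
    "with_coprocessor": 186.0,
}
for name, value in relative_performance(raw_scores, "baseline").items():
    print(f"{name}: {value:.2f}X")
```

For benchmarks reported as elapsed time rather than throughput, the ratio would be inverted (baseline time divided by platform time) so that larger values still mean faster.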
LAMMPS Stillinger-Weber Water Benchmark
Application: LAMMPS
Description: Simulation of molecular systems with classical models. More at http://lammps.sandia.gov
Availability: Code: in the main LAMMPS repository. Recipe: available here.
Usage Model: Load balancer offloads part of the neighbor-list and non-bond force calculations to the Intel® Xeon Phi™ coprocessor for concurrent calculations with the CPU.
Highlights: Improved results with the Intel® Xeon® processor E5-2697 v3 and the Intel® Xeon Phi™ coprocessor 7120A. Dynamic load balancing allows for concurrent data transfer between host and coprocessor and calculations of neighbor-list, non-bond, bond, and long-range terms. The same routines in the LAMMPS Intel Package also run faster on the CPU.
Results: Simulation rate increase with the Intel® Package is up to 3.6X. Concurrent Intel Xeon Phi coprocessor computations and MPI communications yield improved speedup at higher node counts.
Source: Intel measured results as of March 2015.
[Figure: LAMMPS Liquid Crystal Benchmark Performance (Mixed Precision), 32-node cluster benchmark. Y axis: comparative performance (0 to 3). Groups: 1 node, 32 nodes. Series: 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS Baseline); 2S Intel® Xeon® processor E5-2697 v3 (LAMMPS IA Package); 2S Xeon E5-2697 v3 + Tesla K40c, boost off, ECC on; 2S Xeon E5-2697 v3 + Xeon Phi 7120A, turbo off (LAMMPS IA Package). Bar labels: 1, 3X, 3.41X, 1, 3.05X, 3.6X, 0.9X; no testing on Tesla.]
"Xeon E5-2697 v3" = Intel® Xeon® processor E5-2697 v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
20
Application LAMMPS
Description Simulation of molecular systems with classical models More at httplammpssandiagov
Availability Code In main LAMMPS repository Recipe Available here
Usage Model Load balancer offloads part of neighbor-list and non-bond force calculations to Intelreg Xeon Phitrade coprocessor for concurrent calculations with CPU
Highlights Improved results with Intelreg Xeonreg processor E5-2697 v3 and Intel Xeon Phi coprocessor 7120A Dynamic load balancing allows for concurrent Data transfer between host and coprocessor Calculations of neighbor-list non-bond bond and long-range
terms Same routines in LAMMPS Intel Package also run faster on CPU
Results Up to 168X performance improvement utilizing Intelreg Xeonreg processors and Intelreg Xeon Phitrade coprocessors with application optimization on a single node compared to the baseline configuration Performance gains continue to hold at 147X when scaling up to 32 nodes
LAMMPS Rhodopsin Benchmark 512K Atoms
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
SOURCE INTEL MEASURED RESULTS AS OF AUGUST 2014
0
1
1 Node 32 Nodes
2S Intelreg Xeonreg processor E5-2697 v3 (LAMMPS Baseline)
2S Intelreg Xeonreg processor E5-2697 v3 (LAMMPS IA Package)
2S E5-2697 v3 + Intelreg Xeon Phitrade coprocessor 7110P7120A Turbo Off
(LAMMPS IA Package)
Co
mp
ara
tiv
e P
erf
orm
an
ce
LAMMPS Rhodopsin Benchmark Performance (Mixed Precision)
APPROVED FOR PUBLIC PRESENTATION 32 NODES CLUSTER BENCHMARK
For configuration details go here
1
127X
168X
1 107X
147X
Other names and brands may be claimed as the property of others
Johns Hopkins Bowtie 2: Multiple Workloads
Application: Bowtie2 version 2.2.3, Intel® AVX2 port
Description: nvBowtie version 0.9.9.3. nvBowtie is a GPU-accelerated re-engineering of Bowtie2, a very widely used short-read aligner. While being completely rewritten from scratch, nvBowtie reproduces many (though not all) of the features of Bowtie2. http://nvlabs.github.io/nvbio/nvbowtie_page.html
Availability: Code: available here. Recipe: not available; check for future availability here.
Usage Model: ERR161544, SRR002273_1, HEK001 (TGen), ERR000589_1, SRR033552_1, SRR034966_1 & ERR024139_1
Highlights: See more here
Results: Bowtie2 with the Intel® AVX2 port, running on the Intel® Xeon® processor E5-2697 v3, is faster than nvBowtie running on the Intel® Xeon® processor E5-2697 v2 with the NVIDIA Tesla K40 for 6 of 7 workloads. NVIDIA's published data compares the K40 to the Intel® Xeon® processor E5-2600 (6 cores) on one workload.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
Source: Intel measured results as of January 2015
[Chart: Johns Hopkins Bowtie 2 TGen Workload Speed Up, 1 node. Y-axis: Comparative Increase. X-axis workloads shown: ERR161544, SRR034966_1, ERR000589, SRR002273_1. Series: Intel® Xeon® processor E5-2697 v2 + 1 NVIDIA Tesla K40; Intel® Xeon® processor E5-2697 v3. Bar labels: 1, 1.87X, 1.59X, 1.08X, 0.88X.]
Burrows-Wheeler Aligner (BWA-ALN): Human Genome
Application: Burrows-Wheeler Aligner version 0.5.10. BWA-ALN is represented in this benchmark. Workload is korean_female (read file 35 GB, 30 GB reference database).
Description: BWA is a popular software package for mapping low-divergent sequences against a large reference genome, such as the human genome. More at http://bio-bwa.sourceforge.net
Availability: Code: available here. Recipe: available here.
Usage Model: Hybrid MPI + OpenMP using symmetric mode
Highlights: Results are identical to the unmodified run of BWA-ALN
Results: The Intel® Xeon® processor E5-2697 v2 and the Intel® Xeon Phi™ coprocessor, running in symmetric mode, demonstrated up to 1.86X improved performance over the baseline Intel® Xeon® processor E5-2697 v2.
Source: Intel measured results as of January 2014
[Chart: BWA-ALN Speed Up, 1 node. Y-axis: Comparative Performance. Bars: 2S Intel® Xeon® processor E5-2697 v2 (baseline BWA-ALN): 1; 2S Intel® Xeon® processor E5-2697 v2 (optimized BWA-ALN): 1.24X; 2S Intel® Xeon® processor E5-2697 v2 + Intel® Xeon Phi™ coprocessor 7120A: 1.86X.]
BLAST: BLASTn v30
Application: Basic Local Alignment Search Tool (BLASTn) v30
Description: Searches for alignments of nucleotide query sequences against a known nucleotide database volume set from the National Center for Biotechnology Information (NCBI). More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor at the sweet spot for maximum speedup. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 80/20 and 59/23/18.
Highlights: Throughput for this load-sharing model has a small sweet spot for a sufficiently large query set.
Results: Compared to the baseline simulation rate, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.52X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
Source: Intel measured results as of March 2015
[Chart: BLASTn v30 Speed Up, 1 node. Y-axis: Comparative Performance. Legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized. Bar labels: 1, 1.52X, 1.49X, 1.22X, 1.41X, 1.26X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3
"Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
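The load-sharing step in the BLASTn usage model, where a batch of queries is divided between devices in fixed proportions, can be sketched as follows. This is only an illustration under stated assumptions: it reads the split figures as 80/20 and 59/23/18 (each sums to the 100 queries), the function name `split_queries` and the query IDs are hypothetical, and the slide does not say which device receives which share or how the benchmark actually dispatched the work.

```python
from typing import List, Sequence


def split_queries(queries: Sequence[str], shares: Sequence[int]) -> List[List[str]]:
    """Partition queries into consecutive chunks proportional to `shares`.

    Hypothetical helper for illustration; not taken from the benchmark code.
    """
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    chunks: List[List[str]] = []
    start = 0
    cumulative = 0
    for share in shares:
        cumulative += share
        # Rounding the cumulative boundary (not each chunk) guarantees that
        # the chunk sizes add up to exactly len(queries).
        end = round(len(queries) * cumulative / total)
        chunks.append(list(queries[start:end]))
        start = end
    return chunks


# Placeholder IDs standing in for the 100 concatenated NCBI queries.
queries = [f"query_{i:03d}" for i in range(100)]

two_way = split_queries(queries, (80, 20))        # two devices
three_way = split_queries(queries, (59, 23, 18))  # three devices

print([len(chunk) for chunk in two_way])
print([len(chunk) for chunk in three_way])
```

A static split like this only pays off when each device finishes its share at about the same time, which is consistent with the slide's remark that the sweet spot is small.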
BLAST: BLASTp v30
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searches for alignments of protein query sequences against a known protein database volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor at the sweet spot for maximum speedup. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline simulation rate, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
Source: Intel measured results as of March 2015
[Chart: BLASTp v30 Speed Up, 1 node. Y-axis: Comparative Performance. Legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized. Bar labels: 1, 1.41X, 1.39X, 1.21X, 1.3X, 1.15X.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
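The BLASTp highlight that throughput is limited by a stage that is not parallelized is the situation described by Amdahl's law. The sketch below shows that bound. The slide gives no timing breakdown, so the serial fraction used here is a made-up value and `amdahl_speedup` is a hypothetical helper, not part of BLAST.

```python
def amdahl_speedup(parallel_fraction: float, parallel_speedup: float) -> float:
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated.

    parallel_fraction: share of the baseline runtime that benefits (0.0 to 1.0).
    parallel_speedup:  factor by which that share gets faster (> 0).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_speedup <= 0.0:
        raise ValueError("parallel_speedup must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / parallel_speedup)


# Assumption for illustration only: half of the runtime sits in a serial stage.
assumed_parallel_fraction = 0.5
for factor in (2.0, 4.0, 1000.0):
    overall = amdahl_speedup(assumed_parallel_fraction, factor)
    print(f"parallel part {factor:g}x faster -> overall {overall:.2f}x")
```

With these assumed numbers the overall gain can never reach 2X, however fast the parallel part becomes. A serial stage caps the benefit of adding a coprocessor in the same way.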
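Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within a processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number.
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for currently characterized errata.
Intel® Virtualization Technology requires a computer system with an Intel® processor, BIOS, and virtual machine monitor (VMM) that support the technology. Functionality, performance, or other benefits of Virtualization Technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html.
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). In addition, Intel® TXT requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html.
Requires a system with Intel® Turbo Boost Technology capability. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost.
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer or reseller. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni (in English).
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries. © 2012 Intel Corporation. Unauthorized quotation or reproduction is prohibited.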
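Legal Disclaimers: Performance
Performance tests and ratings are measured using specific computer systems, components, or combinations of them, and reflect the approximate performance of Intel products as measured by those tests. Differences in system hardware or software design or configuration may cause actual performance to differ from the published tests and ratings. When considering the purchase of systems or components, consult other sources of information to evaluate performance comprehensively. For more information on the performance evaluation of Intel products, see http://www.intel.com/performance.
Intel does not control or audit the design or implementation of the third-party benchmarks or websites referenced in this document. Intel encourages readers to visit the referenced websites, or other websites where similar performance benchmarks are reported, and to confirm whether the referenced benchmarks accurately reflect the performance of systems available for purchase.
Relative performance for each benchmark is calculated by assigning a baseline value of 1.0 to one benchmark result, dividing each platform's benchmark result by the actual benchmark result of the baseline platform, and assigning a relative performance number proportional to the reported performance improvement.
SPEC, SPECint, SPECfp, SPECrate, SPECpower, SPECjAppServer, SPECjEnterprise, SPECjbb, SPECompM, SPECompL, and SPEC MPI are trademarks of the Standard Performance Evaluation Corporation. For details, see http://www.spec.org/spec/trademarks.html (in English). TPC Benchmark is a trademark of the Transaction Processing Council. For details, see http://www.tpc.org (in English). SAP and SAP NetWeaver are registered trademarks of SAP AG in Germany and other countries. For details, see http://www.sap.com/benchmark (in English).
The information in this document is provided "as is". No license to any intellectual property rights is granted, whether express or implied, by estoppel or otherwise. Intel assumes no liability whatsoever for any express or implied warranty relating to this information, including warranties of fitness for a particular purpose, merchantability, or non-infringement of any patent, copyright, or other intellectual property right.
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations, and functions. Any change to any of those factors may cause the results to vary. When considering a purchase, consult other information and performance tests, including the performance of the product when combined with other products, to evaluate performance comprehensively.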
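The relative-performance calculation described in the performance disclaimer (baseline fixed at 1.0, every other result divided by the baseline result) can be sketched as below. The platform names and throughput figures are invented for illustration and are not measurements from this deck. The sketch also assumes a "higher is better" metric such as throughput. For elapsed times the ratio would need to be inverted.

```python
from typing import Dict


def relative_performance(results: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Normalize 'higher is better' benchmark results so the baseline equals 1.0."""
    if baseline not in results:
        raise KeyError(f"baseline platform {baseline!r} not in results")
    base_value = results[baseline]
    if base_value <= 0:
        raise ValueError("baseline result must be positive")
    return {name: value / base_value for name, value in results.items()}


# Hypothetical throughput numbers (for example, work units per hour).
measured = {
    "baseline": 200.0,
    "optimized": 250.0,
    "with coprocessor": 372.0,
}

for name, ratio in relative_performance(measured, "baseline").items():
    print(f"{name}: {ratio:.2f}X")
```

A label such as "1.86X" on the charts is a ratio of this kind, so it says nothing about absolute runtime without the underlying baseline measurement.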
0
1
2S Xeon E5-2697 v2 (BLASTn v30 baseline)
2S Xeon E5-2697 v2 + Xeon Phi 7120A
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
2S Xeon E5-2697 v3 (BLASTn v30 baseline)
2S Xeon E5-2697 v3 + Xeon Phi 7120A2
2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized
ldquoXeon E5-2697 v2v3rdquo = Intelreg Xeonreg processor E5-2697 v2v3
ldquoXeon Phi 7120Ardquo = Intelreg Xeon Phitrade coprocessor 7120A
23
Application Basic Local Alignment Search Tool (BLASTn) v30
Description Searching for alignment in nucleotide query sequences against a known nucleotide db volume set National Center for Biotechnology Information (NCBI) More at httpblastncbinlmnihgov
Availability Code Available here Recipe Available here
Usage Model 4 (multiple queries multiple db) 100 NCBI queries (concatenated) against db refseq_rna00-02 are distributed to the Intelreg Xeonreg processor and Intelreg Xeon Phitrade coprocessor for maximum speedup sweet spot Experiment was repeated 20 times with the pick of queries randomized for a sweet spot split 8020 and 592318
Highlights Throughput for this load sharing model has a small sweet spot for a sufficiently large query set
Results Compared to the baseline simulation rate speed up with Intelreg Xeonreg processor E5-2697 v3 and Intelreg Xeon Phitrade coprocessor 7120A heterogeneous model is 152X Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors Performance tests such as SYSmark and MobileMark are measured using specific computer systems components software operations and functions Any change to any of those factors may cause the results to vary You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases including the performance of that product when combined with other products See benchmark tests and configurations in the speaker notes For more information go to httpwwwintelcomperformance
BLAST BLASTn v30
Other names and brands may be claimed as the property of others SOURCE INTEL MEASURED RESULTS AS OF MARCH 2015
BLASTn v30 Speed Up
1 NODE
For configuration details go here
Co
mp
ara
tiv
e P
erf
orm
an
ce
1
152X
APPROVED FOR PUBLIC PRESENTATION
149X
122X
141X
126X
NEW
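The load-sharing usage model on this slide, where a randomized query set is divided between the host processor and the coprocessor, can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper `split_queries`, the placeholder query IDs, and the fixed random seed are hypothetical, and reading the split figures as per-device query counts (host share first) is an assumption, not something the slide states.

```python
import random


def split_queries(queries, counts):
    """Partition `queries` into consecutive chunks of the given sizes."""
    if sum(counts) != len(queries):
        raise ValueError("split counts must add up to the number of queries")
    chunks, start = [], 0
    for n in counts:
        chunks.append(queries[start:start + n])
        start += n
    return chunks


# 100 placeholder query IDs standing in for the real query set (hypothetical).
queries = [f"query_{i:03d}" for i in range(100)]

# Randomize the pick of queries; the seed is fixed only for reproducibility.
shuffled = queries[:]
random.Random(0).shuffle(shuffled)

# Two-way split (assumed: host share first, coprocessor share second).
host_part, phi_part = split_queries(shuffled, [80, 20])

# Three-way split, e.g. one host share and two coprocessor shares (assumed).
three_way = split_queries(shuffled, [59, 23, 18])
```

Each chunk would then be written out as its own concatenated query file and searched on its assigned device; that step depends on the actual BLAST setup and is left out here.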
BLAST: BLASTp v30
Application: Basic Local Alignment Search Tool (BLASTp) v30
Description: Searches for alignments of protein query sequences against a known protein db volume set. More at http://blast.ncbi.nlm.nih.gov
Availability: Code: available here. Recipe: available here.
Usage Model: 4 (multiple queries, multiple db). 40 NCBI queries (concatenated) against db nr_sorted00-02 are distributed to the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor to find the sweet spot of maximum speedup. The experiment was repeated 20 times with the pick of queries randomized, for sweet-spot splits of 33/7 and 28/5/7.
Highlights: Throughput for this offload model has a small sweet spot for a sufficiently large query set. Throughput is limited because the GAT stage is not parallelized.
Results: Compared to the baseline simulation rate, the speedup with the Intel® Xeon® processor E5-2697 v3 and Intel® Xeon Phi™ coprocessor 7120A heterogeneous model is 1.41X. Performance is also improved on the CPU due to Output Formatting Section (OFS) parallelization.
[Figure: BLASTp v30 Speed Up, 1 node. Y axis: Comparative Performance (baseline = 1). Legend: 2S Xeon E5-2697 v2 (BLASTn v30 baseline); 2S Xeon E5-2697 v2 + Xeon Phi 7120A; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized; 2S Xeon E5-2697 v3 (BLASTn v30 baseline); 2S Xeon E5-2697 v3 + Xeon Phi 7120A2; 2S Xeon E5-2697 v2 + Xeon Phi 7120A OFS parallelized. Data labels: 1.41X, 1, 1.39X, 1.21X, 1.3X, 1.15X. For configuration details go here.]
"Xeon E5-2697 v2/v3" = Intel® Xeon® processor E5-2697 v2/v3; "Xeon Phi 7120A" = Intel® Xeon Phi™ coprocessor 7120A
Other names and brands may be claimed as the property of others. Source: Intel measured results as of March 2015.
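The "sweet spot" idea on this slide, a split of the query set that maximizes combined throughput, can be illustrated with a small Python sketch. It assumes a deliberately simplified cost model that the slide does not give: every query costs the same, each device processes its share at a constant rate, both run concurrently, and no serial stage is modeled. The function names and the rates are hypothetical, not measured values.

```python
def makespan(n_host, n_phi, host_rate, phi_rate):
    """Wall time when both devices run concurrently: the slower side decides."""
    return max(n_host / host_rate, n_phi / phi_rate)


def best_split(total, host_rate, phi_rate):
    """Try every host/coprocessor split and return (n_host, n_phi, time)."""
    best = None
    for n_host in range(total + 1):
        t = makespan(n_host, total - n_host, host_rate, phi_rate)
        if best is None or t < best[2]:
            best = (n_host, total - n_host, t)
    return best


# Hypothetical rates in queries per second (not measured values).
n_host, n_phi, t = best_split(40, host_rate=4.0, phi_rate=1.0)

# Speedup of the best split over running all 40 queries on the host alone.
speedup = (40 / 4.0) / t
```

In this toy model the finishing time rises on both sides of the optimum, so shifting queries in either direction leaves one device idle while the other finishes, which is one way to read the slide's remark that the sweet spot is small.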
Legal Information
All products, computer systems, dates, and figures specified in this document are based on current expectations and are subject to change without notice.
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. For details, see http://www.intel.co.jp/jp/products/processor_number.
Intel® processors, chipsets, and desktop boards may contain design defects known as errata, which may cause the product to deviate from published specifications. Contact Intel for currently characterized errata.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, and virtual machine monitor (VMM). Functionality, performance, or other benefits of the technology vary depending on hardware and software configurations. Software applications may not be compatible with all operating systems. Consult your PC manufacturer. For details, see http://www.intel.co.jp/content/www/jp/ja/virtualization/virtualization-technology/hardware-assist-virtualization-technology.html.
No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology (Intel® TXT) requires a computer system with Intel® Virtualization Technology, an Intel® TXT-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel® TXT-compatible Measured Launched Environment (MLE). Intel® TXT also requires the system to contain a TPM v1.s. For details, see http://www.intel.co.jp/content/www/jp/ja/data-security/security-overview-general-technology.html.
Requires a system with Intel® Turbo Boost Technology capability. Intel® Turbo Boost Technology and Intel® Turbo Boost Technology 2.0 are available only on select Intel® processors. Consult your PC manufacturer. Actual performance varies depending on hardware, software, and system configuration. For details, see http://www.intel.co.jp/jp/technology/turboboost.
Intel® AES New Instructions (Intel® AES-NI) requires a computer system with an Intel® AES-NI-enabled processor, as well as non-Intel software that executes the instructions in the correct sequence. Intel® AES-NI is available on select Intel® processors. For availability, consult your PC manufacturer. For details, see http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni.
Intel, the Intel logo, the Intel Inside logo, Xeon, Xeon Inside, and Intel Xeon Phi are trademarks of Intel Corporation in the United States and/or other countries.
© 2012 Intel Corporation. Unauthorized quotation or reproduction is prohibited.