Hot Chips & SC14 Topics, and the Current Status and Future of the Prototype Board

2014/12/10

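Hot Chips & SC14 Topics, and the Current Status and Future of the CAE Prototype Board

Toshiaki Kitamura, Graduate School of Information Sciences, Hiroshima City University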

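From the Hot Chips 26 Presentations

What Is Hot Chips?

✤ A conference held every summer since 1989, centered on semiconductors such as microprocessors

✤ Most presentations come from companies, and new products are often announced there these days

✤ Sessions cover processors from mobile through PC and server to supercomputer

✤ There is also an FPGA session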

Hot Chips 26: A Symposium on High-Performance Chips

Advance Program

August 10-12, 2014
Flint Center for the Performing Arts, Cupertino, CA
http://www.hotchips.org

A Symposium of the Technical Committee on Microprocessors and Microcomputers of the IEEE Computer Society and the Solid-State Circuits Society


High-Performance Computing
• SX-ACE Processor: NEC's Brand-New Vector Processor (NEC)
• SPARC64 XIfx: Fujitsu’s Next Generation Processor for HPC (Fujitsu)
• Anton 2: A 2nd-Generation ASIC for Molecular Dynamics Simulation (D.E. Shaw Research)

Keynote 1
• Power Constraints: From Sensors to Servers (Michael Muller, ARM)

Mobile Processors
• NVIDIA’s Tegra K1 System-on-Chip (NVIDIA)
• Applying AMD’s “Kaveri” APU for Heterogeneous Computing (AMD)
• NVIDIA’s Denver Processor (NVIDIA)

Technology
• HBM: Memory Solution for Bandwidth-Hungry Processors (SK Hynix Inc)
• Improved 3D Chip Stacking with ThruChip Wireless Connections (ThruChip Communications)
• CMOS Biochips for Point-of-Care Molecular Diagnostics (InSilixa)

ARM Servers
• The AMD Opteron “Seattle”: A 64b ARM Dense Server Processor (AMD)
• ARM Next-Generation IP Supporting LSI’s High-End Networking (ARM, LSI Logic)
• X-Gene2: 28nm Scale-Out Processor (Applied Micro)

FPGAs
• Design of a High-Density SOC-FPGA at 20nm (Altera)
• Large-Scale Reconfigurable Computing in a Microsoft Datacenter (Microsoft)
• Xilinx FPGAs Case Study: High Capacity and Performance 20nm FPGAs (Xilinx)
• SDA: Software-Defined Accelerator for Large-Scale DNN Systems (Baidu)

High-Performance ASICs
• Hardware-Accelerated Text Analytics (IBM)
• Myriad2 “Eye” of the Computational-Vision Storm (Movidius)
• Goldstrike 1: A 1st Generation Cryptocurrency Processor for Bitcoin Mining (Cointerra)
• RayChip: Real-Time Ray Tracing Chip for Embedded Applications (Siliconarts)

Keynote 2
• The Internet of Everything: What is it? What’s driving it? What comes next? (Rob Chandhok, Qualcomm)

Dense Servers and Server Technology
• SCORPIO: 36-Core Shared-Memory Processor with a Coherent Mesh (MIT)
• Oracle’s Next-Generation SPARC Processor Cache Hierarchy (Oracle)
• Unchaining the Datacenter with OpenPOWER: Reengineering a Server Ecosystem (IBM)
• Intel C2000 Atom Microserver: Power Efficient Processing for the Data Center (Intel)

Big-Iron Servers
• Performance Characteristics of the POWER8 Processor (IBM)
• Next-Generation Oracle SPARC Processor (Oracle)
• IvyBridge Server: Delivering Performance from Workstations to Mission Critical (Intel)

Tutorial 1: Emerging Trends in Hardware Support for Security
• Security Basics (Princeton)
• Mobile HW Security (ARM)
• Secure Systems Design (AMD)
• Mitigating Exploits, Rootkits and Advanced Persistent Threats (Intel)
• University Research in Hardware Security (Princeton)

Tutorial 2: Internet of Things
• Powering the Internet of Things (TI)
• Ultra Low Power Design Approaches for IoT (National University of Singapore)
• Connecting the IoT (Qualcomm)
• Standards for Constrained IoT Devices (ARM)

Organizing Committee
• Chair: Krste Asanovic (UC Berkeley)
• Vice Chair: Fred Weber
• Finance: Lily Jow (HP)
• Advertising: Don Draper (Oracle)
• Sponsorship: Amr Zaky (Invensense)
• Publications: Randall Neff
• Press: Ralph Wittig (Xilinx)
• Registration: Charlie Neuhauser (Neuhauser Associates)
• Location Services: John Sell (Microsoft), Allen Baum
• Volunteer Coordinator: Gary Brown (Tensilica)
• Webmaster, IT: Kevin Broch
• Production: Lance Hammond, Mike Albaugh, Keith Diefendorff

Steering Committee
• Chair: Alan Jay Smith
• Committee Members: Allen Baum, Don Draper (Oracle), Pradeep Dubey (Intel), Lily Jow (HP), John Mashey (Techviser), John Sell (Microsoft), Keith Diefendorff

Program Committee
• Program Co-Chairs: Sam Naffziger (AMD), Guri Sohi (U. Wisconsin)
• Committee Members: Forest Baskett (NEA), Pradeep Dubey (Intel), John Davis (Microsoft), Alan Jay Smith (UC Berkeley), Steve Miller (NetApp), Subhasish Mitra (Stanford), Stefan Rusu (Intel), Tom McWilliams (BayStorage), Behnam Robatmili (Qualcomm), Ralph Wittig (Xilinx), Mike Taylor (UCSD), Bill Dally (NVIDIA)

Founder: Bob Stewart (SRE)

HOT CHIPS brings together designers and architects of high-performance chips, software, and systems. The tutorial and presentation sessions focus on up-to-the-minute developments in leading-edge industrial designs and research projects. Register now at: https://www.123signup.com/register?id=drvzv

AMD's ARM Server

✤ Designed by AMD, not by ARM

✤ Targets server use with the ARM architecture rather than x86

THE AMD OPTERON™ A1100 PROCESSOR, CODENAMED "SEATTLE"

SEAN WHITE 11 AUGUST 2014


“SEATTLE” – WHAT IS IT AND WHY?

• What is it?
‒ “Seattle” is AMD’s first 64-bit ARM-based processor
‒ 8 ARM Cortex™-A57 cores
‒ 2 DDR3/4 DRAM channels
‒ 10G Ethernet, PCI-Express, SATA
‒ GlobalFoundries 28nm process

• Why did AMD build it?
‒ “Seattle” is a dense server processor for datacenter applications
‒ Performance/dollar/watt drives today’s datacenter designs
‒ A significant number of datacenter workloads have inherently low Instructions Per Clock (IPC) and high cache miss rates
‒ For such workloads, processors like “Seattle,” with smaller cores and caches, can deliver performance equivalent to traditional server processors with large cores and caches, while using much less power and area
‒ The 32-bit to 64-bit transition for the ARM architecture is a major shift in the industry, as the 32-bit to 64-bit transition in x86 was
‒ AMD is taking a leadership role in the 64-bit ARM space, as it did in the 64-bit x86 space
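The point about low-IPC, miss-heavy workloads can be illustrated with a toy cycles-per-instruction model. This sketch is not from the slides: the base CPI values, the miss rates, the 200-cycle miss penalty, and the function names are all hypothetical and chosen only to show the shape of the argument.

```python
# Toy model: CPI = core-limited CPI + memory stall cycles per instruction.
# Every number here is a hypothetical value chosen for illustration only.

def cpi(base_cpi, misses_per_instr, miss_penalty_cycles):
    """Average cycles per instruction including cache-miss stalls."""
    return base_cpi + misses_per_instr * miss_penalty_cycles


def small_vs_big(misses_big, misses_small,
                 base_big=0.5, base_small=1.0, miss_penalty_cycles=200):
    """Throughput of a small core relative to a big core at equal clock rate.

    The small core is assumed to have a higher base CPI (narrower pipeline)
    and a somewhat higher miss rate (smaller caches).
    """
    cpi_big = cpi(base_big, misses_big, miss_penalty_cycles)
    cpi_small = cpi(base_small, misses_small, miss_penalty_cycles)
    return cpi_big / cpi_small


if __name__ == "__main__":
    # Cache-friendly workload: core width dominates, the small core lags.
    print(f"cache-friendly: {small_vs_big(0.001, 0.002):.2f}")
    # Miss-heavy workload: stalls dominate, the gap nearly closes.
    print(f"miss-heavy:     {small_vs_big(0.030, 0.032):.2f}")
```

With these made-up inputs the small core reaches about half of the big core's throughput on the cache-friendly case and close to 90% on the miss-heavy one. Real ratios depend on the actual cores, caches, and workloads.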

“SEATTLE” SOC OVERVIEW

28nm Process Technology

[Block diagram labels: eight 64-bit Cortex A57 Cores; L2 Cache 1MB blocks; L3 Cache 8MB; DDR3/4 Memory Controller; Cortex A5 System Control Processor; Cryptographic Coprocessor; 1Gbit Ethernet (RGMII); 10Gbit Ethernet (KR); SATA 3; PCIe Gen 3]

Package
• 27mm x 27mm, SP1 BGA

Power Efficient Cores
• Up to Eight ARM Cortex-A57 cores
• Up to 4MB shared L2 cache total

Cache Coherent Network
• Full cache coherency
• 8MB L3 cache
• SMMU: I/O address mapping and protection

High Performance, Flexible Memory
• Two 64-bit DDR3/4 channels with ECC
• Two DIMMs/channel up to 1866 MHz
• SODIMM, UDIMM, RDIMM support
• Up to 128GB per CPU

Highly Integrated I/O
• 8x SATA 3 (6Gb/s) ports
• Two 10GBASE-KR Ethernet ports
• 8 lanes PCI-Express® Gen 3, supports x8, x4, x2

System Control Processor
• TrustZone® technology for enhanced security
• Dedicated 1GbE system management port (RGMII)
• SPI, UART, I2C interfaces

Cryptographic Coprocessor
• Separate cryptographic algorithm engine for offloading encryption, decryption, compression, decompression computations
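To put the memory and I/O figures above on a common scale, the following back-of-the-envelope sketch converts them to peak GB/s. Several things are assumptions rather than slide content: "1866" is read as 1866 MT/s, only the 64 data bits of each DDR channel are counted, protocol overhead beyond line encoding is ignored, and the function names are illustrative.

```python
# Rough peak bandwidths for the listed interfaces (1 GB = 1e9 bytes).
# Assumptions (not stated on the slide): "1866" means 1866 MT/s, ECC bits
# are excluded, and only line-encoding overhead is accounted for.

def ddr_peak_gbytes(channels, data_bits, megatransfers_per_s):
    """Peak DRAM bandwidth in GB/s."""
    return channels * (data_bits / 8) * megatransfers_per_s * 1e6 / 1e9


def pcie_gen3_gbytes(lanes):
    """PCIe Gen 3: 8 GT/s per lane with 128b/130b encoding, per direction."""
    return lanes * 8e9 * (128 / 130) / 8 / 1e9


def sata3_gbytes(ports):
    """SATA 3: 6 Gb/s line rate with 8b/10b encoding."""
    return ports * 6e9 * (8 / 10) / 8 / 1e9


def ethernet_gbytes(ports, gbits_per_port):
    """Raw Ethernet data rate per direction, ignoring framing overhead."""
    return ports * gbits_per_port / 8


if __name__ == "__main__":
    print(f"DDR, 2 channels x 64 bit: {ddr_peak_gbytes(2, 64, 1866):.1f} GB/s")
    print(f"PCIe Gen 3 x8:            {pcie_gen3_gbytes(8):.1f} GB/s")
    print(f"SATA 3 x8 ports:          {sata3_gbytes(8):.1f} GB/s")
    print(f"10GbE x2 ports:           {ethernet_gbytes(2, 10):.1f} GB/s")
```

Under these assumptions the two DRAM channels peak near 30 GB/s, while the PCIe, SATA, and 10GbE interfaces together come to roughly 15 GB/s in one direction.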

“SEATTLE” REFERENCE SYSTEM

Standalone uATX board

• 1P standalone platform intended to meet the needs of partners (ISV, OSV, IHV)

• Off-the-shelf 2U rack mount chassis

• DDR3 DIMMs only

• x8 PCIe Gen3 lanes supporting (1) x8 slot or alternatively (2) x4 slots

• NIC supported through add-in card option

• Supports up to 8 hard drives

• Provisions for remote access to start, stop, and remote console will be provided

“SEATTLE” REFERENCE SYSTEM BOARD

• uATX form factor

• 1 “Seattle” SP1 BGA processor

• DDR3 2-DIMM per memory channel config (up to 4 DIMMs per CPU)

• 1 x8 PCIe slot, or alternatively 2 x4 PCIe slots via mux

• 8 SATA3 ports

• 2 10GBase-T connectors

• 4 I2C ports

• 2 UARTs

• Supports required debug features

FPGAs with ARM Cores

✤ Products built on a 20nm process

✤ An entire SoC, including ARM cores, on a single chip

Design of a High-Density SoC FPGA at 20nm

Brad Vest, Sean Atsatt, Mike Hutton Altera, San Jose

Device Goals

• Mid-Range FPGA: balance of performance/power/cost targeting Key Market Applications

• Key Targets and Metrics:
− 491 MHz fixed-point DSP datapath for Wireless RRU
− 1M+ LEs at 350 MHz for 4xOTU4 (400G) OTN networks, with Partial Reconfig
− Cloud Server Acceleration: Hardened Floating-Point
− 28G transceivers to support 200G to 400G networking/routing
− Dramatic die-size reduction
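The 350 MHz target for 400G traffic implies a very wide internal datapath. The sketch below estimates the minimum bus width, assuming one word is transferred per clock and taking a nominal 400 Gb/s; the helper name `min_datapath_bits` is illustrative and not from the slides.

```python
import math


def min_datapath_bits(throughput_bits_per_s, clock_hz):
    """Smallest bus width (bits) that sustains the throughput at one word per clock."""
    return math.ceil(throughput_bits_per_s / clock_hz)


if __name__ == "__main__":
    # Nominal 4 x 100 Gb/s at the 350 MHz fabric clock quoted on the slide.
    print(min_datapath_bits(400e9, 350e6), "bits")
```

For the nominal 400 Gb/s this gives a lower bound of 1143 bits. Actual OTU4 line rates are somewhat above 100 Gb/s each, so a real design would need at least this much, and buses this wide consume a large amount of logic.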

Overview and Floorplan

• TSMC 20SOC Process
− 5.3B Tx, 11LM

• Resources
− 1.15M LEs, 1.7M FFs
− 64Mb embedded SRAM
− 32 fPLL, 16 PLLs, 32 GCLK
− 1.5 TFlops IEEE754 DSP
− Dual-Core ARM A9
− Row-based redundancy

• I/O
− 28G SERDES, >1.7Tb b/w
− x72 2.667Gbps DDR4 w/ Hard Memory Controller
− Hardened PCIe/ILKN/10GE

Hardened Floating Point DSP

• Hardened IEEE 754 Floating Point Adder & Multiplier
− 12% DSP area increase (<<1% die area)

• 100% Fixed Point backwards compatible
− No performance or power penalty

• ‘Have your cake and eat it too’

• How is this possible?
− Overlaid FP algorithms on Fixed Point circuits

Major Innovation – Hard Floating Point on a Commercial FPGA

DSP Block: 1000s of blocks at very low latency

• 1.5 TFLOPS of aggregate computation; 50 GFLOPS/W
− 1678 blocks @ 2 FLOPS/clock @ 450 MHz ≈ 1.5 TFLOPS
− Can run individually or as large integrated DSP system

• Hardware recursive structure support (Vector Mode)
− 10s/100s of DSP blocks can be seamlessly integrated
− Internal/External pipelining of individual DSP elements

• Very small latency
− Floating Point is used for iterative algorithms, which require small latency
− Arria 10 Floating Point: 256-length dot product ~25 clocks
− Standard FPGA Technology: 256-length systolic FIR filter ~750 clocks
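The aggregate throughput quoted above is the product of block count, FLOPs per clock, and clock rate. The sketch below redoes that arithmetic from the figures as transcribed; the function names are illustrative, and the implied power is only meaningful if the 50 GFLOPS/W figure applies at the same operating point, which the slide does not state.

```python
# Figures as transcribed from the slide.
DSP_BLOCKS = 1678
FLOPS_PER_CLOCK = 2          # one multiply and one add per block per clock
CLOCK_HZ = 450e6
EFFICIENCY_GFLOPS_PER_W = 50


def peak_gflops(blocks, flops_per_clock, clock_hz):
    """Aggregate peak floating-point throughput in GFLOPS."""
    return blocks * flops_per_clock * clock_hz / 1e9


def implied_power_watts(gflops, gflops_per_watt):
    """Power implied by a throughput and an efficiency figure (assumes both apply together)."""
    return gflops / gflops_per_watt


def latency_ratio(slow_clocks, fast_clocks):
    """How many times longer the slower pipeline takes, in clocks."""
    return slow_clocks / fast_clocks


if __name__ == "__main__":
    peak = peak_gflops(DSP_BLOCKS, FLOPS_PER_CLOCK, CLOCK_HZ)
    print(f"peak throughput: {peak:.1f} GFLOPS")
    print(f"implied power:   {implied_power_watts(peak, EFFICIENCY_GFLOPS_PER_W):.1f} W")
    print(f"latency ratio:   {latency_ratio(750, 25):.0f}x")
```

The product of the transcribed figures comes to about 1.51 TFLOPS, in line with the 1.5 TFLOPS headline, and the two latency figures differ by a factor of 30.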

[Figure: chained DSP blocks building a dot product from partial sums: AB+CD, AB+CD+EF+GH, IJ+KL, IJ+KL+MN+OP, and finally AB+CD+EF+GH+IJ+KL+MN+OP]
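The partial sums in the figure (AB+CD, then AB+CD+EF+GH, and so on) suggest a dot product built up by combining neighbouring results. The following is a loose software sketch of that idea, not a description of the actual block wiring: the function name `vector_mode_dot` and the level-counting scheme are assumptions made for illustration.

```python
def vector_mode_dot(xs, ys):
    """Dot product as pairwise multiply-adds followed by an adder tree.

    Returns (result, levels), where levels counts the combining stages.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    terms = list(zip(xs, ys))
    if len(terms) % 2:
        terms.append((0, 0))  # pad to an even number of terms
    # First level: each pair of terms yields one partial sum, e.g. A*B + C*D.
    partial = [a * b + c * d
               for (a, b), (c, d) in zip(terms[0::2], terms[1::2])]
    levels = 1
    # Later levels: neighbouring partial sums are added, halving the count.
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0)
        partial = [p + q for p, q in zip(partial[0::2], partial[1::2])]
        levels += 1
    return partial[0], levels


if __name__ == "__main__":
    # Eight products, matching the A..P operands in the figure.
    print(vector_mode_dot([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))
```

In this model the eight-product case from the figure takes three levels, and the number of levels grows logarithmically with vector length (eight levels for 256 terms).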

UltraScale Results

• Vivado® routes more complex designs on UltraScale
• UltraScale shows lower congestion on complex designs
• As a result, timing closure is accelerated
• Delivers 1 speedgrade higher Fmax

[Figure: routing congestion maps, with a legend ranging from "No routing congestion" to "High routing congestion / Cannot route"]

Power Optimizations

[Figure: power broken down into static, dynamic, I/O, and transceiver components for Spartan-6/Virtex-6 (45nm/40nm), 7 Series (28nm), and UltraScale (20nm/16nm), with reductions of up to 50% from one generation to the next. Per-technique savings labels on the slide read 25-45% and up to 30%, 40%, 50%, 60%, and 65%; their mapping to individual techniques is not recoverable from the extracted text.]

• Architectural optimizations
• Low power mode
• I/O multi-mode control (cont’d from 28nm)
• DDR4 voltage reduction
• CLB packing & reduced wire length
• HW based clock gating on leaf cells
• BRAM hardened data cascading
• BRAM dynamic power gating
• DSP hardened features
• MMCM & PLL lower supply voltage
• Process node
• Power binning & lower voltage scaling
• 3D IC static power binned slices

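From One Component of a System to the Whole System

✤ Following the SoC trend, designs are shifting from using FPGA-implemented functions as one component of a system toward building the entire SoC on the FPGA

✤ What makes this possible is the increase in semiconductor integration density

✤ Faster circuits are demanded, along with reduced power consumption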

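Hot Chips 26 Materials

✤ Materials from past years are available at http://www.hotchips.org.

✤ Presentation videos are also available, going back several years.

✤ For Hot Chips 26, only the keynotes are open to the public so far.

✤ Everything is scheduled to be made public in December.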

Topics from SuperComputing 2014

SuperComputing

✤ Held every November

✤ This year it ran November 16-22 at the convention center in New Orleans

✤ Besides the paper sessions, there are also an exhibition and BoF sessions.
