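SUMMARY

Mobile CPU Technology Trends and Industry Outlook

Authors: Tae Hee Han (한태희), PD, KEIT System Semiconductor PD office; Yu Sang Lee (이유상), Senior Researcher, KEIT System Semiconductor PD office

Mobile CPU: Intel vs ARM

- Unlike the PC, mobile devices are contested by several operating systems, and battery life is critical, so ARM CPU cores are preferred over Intel's x86 CPUs for their higher power efficiency
- Intel is developing the Atom processor, a reduced-power version of its x86 architecture aimed at mobile devices; its strengths are its overwhelming process and device technology and compatibility with existing PC software
- ARM pursues higher performance by supporting out-of-order superscalar execution in ARMv7 and 64-bit in v8, while also proposing the energy-efficient big.LITTLE structure, and has entered into competition with Intel in the desktop and server processor markets as well

Rapid growth of the market for Application Processors (AP) based on mobile CPUs

- The arrival of smartphones and tablets is expanding the AP market, which is forecast to grow from USD 8.2 billion in 2011 to USD 36.2 billion in 2015 and overtake the PC CPU market
- For the next 2-3 years growth will center on smartphone and tablet products, and is then expected to spread to digital home appliances and automotive IT

Direction of microprocessor development

- With progress under Moore's law expected to slow because of power consumption and heat, clock-speed gains will taper off and multi-core and many-core designs will become the norm; structurally, heterogeneous cores, or a combination of a few performance-oriented big cores and many energy-efficiency-oriented little cores, are the likely outcome
- Development is expected to maximize energy efficiency by combining custom hardware accelerators, and to minimize data movement through efficient memory hierarchies, new interconnection schemes and specialized software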

Source: Mobile CPU Technology Trends and Industry Outlook, home.skku.edu/~mobiletech/201207_CPU.pdf



    PD ISSUE REPORT JULY 2012 VOL 12-6

    1.

    CPU( )

    CPU //UMPC PDA PC , SoC AP(Application Processor)

    - CPU, GPU (OS) ,

    AP CPU, 2D/3D GPU, ISP(Image Signal Processor) , DSP, IP

    * VPU(Visual Processing Unit or Video Processing Unit)

    [Figure 3-1] AP block diagram (blocks shown: MPU with I/D cache and SIMD; GPU with 2D/3D shaders; DSP; MMP with control processor, video encoder, video decoder (720p, 1080p), ISP, JPEG, HQ audio encoder/decoder, effector (BBE) and codec modules; BB for GSM/HSDPA and WCDMA; connectivity blocks: timer, keypad, I2S/AC97/SPDIF, USB/IrDA/MMC, haptics, UART/I2C/SPI, SDRAM/Flash)

    - CPU OS , SW , ARM Cortex-A

    - GPU OpenVG, OpenGL ES Imagination Technologies PowerVR ARM Mali GPU IP

    - H.264, MPEG2, DivX, Xvid IP


    ISSUE 3 CPU

    CPU

    CPU PC ARM

    - CPU , PC 2011 5 , ARM 500$

    ARM [1, 2]

    - Calxeda 32- ARM 64- ARM AppliedMicro

    - , 4 Lava International Atom

    -

    CPU 2006

    [Figure 3-2] Pollack's rule (bar charts comparing frequency, transistor count, power, performance and performance per watt across CPU configurations, including the effect of architecture enhancement; die size (transistors) 2x gives performance 1.4-1.5x)

    * Source: Vol. 25, No. 5 (2010. 10)

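    - Pollack's rule

    * Pollack's rule: doubling the transistor count (die area) of a core raises its performance by only about 1.4x (the square root of 2)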
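Pollack's rule as stated above can be written as a one-line formula. The following is a minimal sketch, not taken from the report: the function names are hypothetical, and the second function additionally assumes a perfectly parallel workload, which real software rarely achieves.

```python
import math


def pollack_performance(area_ratio: float) -> float:
    """Relative single-core performance when a core's transistor budget
    (die area) is scaled by area_ratio, following Pollack's rule."""
    if area_ratio <= 0:
        raise ValueError("area_ratio must be positive")
    return math.sqrt(area_ratio)


def split_core_throughput(total_area: float, n_cores: int) -> float:
    """Aggregate throughput when total_area is divided evenly among n_cores.

    Assumption (idealized): the workload scales perfectly across cores.
    """
    return n_cores * pollack_performance(total_area / n_cores)


if __name__ == "__main__":
    # Doubling the area of one core: about 1.41x performance.
    print(round(pollack_performance(2.0), 2))
    # A 4x area budget spent on one big core vs. four small cores.
    print(split_core_throughput(4.0, 1))
    print(split_core_throughput(4.0, 4))
```

Under these assumptions a 4x area budget yields 2x performance as one large core but 4x aggregate throughput as four small cores, which is the usual argument for multi-core designs.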


    ARM CPU

    - PC ,

    - CPU HD , AV , , , 3D

    vs. ARM

    CPU PC x86 ARM

    - PC CPU 86%, AMD 12% x86 , MS OS CPU

    x86 CISC(Complex Instruction Set Computing) ARM RISC (Reduced Instruction Set Computing)

    * CISC RISC , RISC

    * 30 RISC CISC

    , , ARM

    - Out-of-Order

    * Out-of-Order Xeon In-Order Atom 2.5 [2]

    ARM

    *

    - Calxeda ARM , Xeon 1/10 1.5W ,


    * Calxeda HP 8 288 Calxeda ARM

    - , 3-4 3D

    PC , OS

    - AP ARM , (, , , , TI 600 ) ARM AP

    / x86 ARM

    - x86 Medfield CPU, GPU, , I/O 2012 22 3D Tri-gate Medfield 2013

    - ARM CPU , IDM(Integrated Device Manufacturer)

    (Win-Tel) MS ARM Windows 8 Windows RT(RunTime) CPU

    - Windows 8(Windows RT) ARM CPU Windows SW ARM , MS Silverlight, Adobe Flash10 AIR, Mozilla Firefox Windows 8 ARM SW ARM CPU

    - 2015 13% , PC(/) ARM 8%

    * PC CPU 107$ 20$ , 2015 2.2B$


    2. CPU

    IT AP

    AP , CPU SoC 5

    - AP 45.0% 2015 362$

    - AP , 2015 AP PC CPU( 342$)

    , , 3D PC CPU

    , AP 2011 (4.9 ), (5,500 ), TV(700 ) , 2015 (9.6 ), (2.6 ), TV(0.9 )

    8 16 MCU(Micro Controller Unit) , AP , PC CPU

    - AP , 2015 (0.3 ), (0.3 ), (0.2 ) AP

    AP PC CPU , AP

    - 2009 13% 2015 48% 9.6 ,


    [Table 3-1] AP market forecast

    Item                     2009    2010    2011E   2012E   2013E   2014E   2015E

    Device shipments
      Total                  890     1,125   1,384   1,615   1,848   2,022   2,276
      YoY                    11.9%   26.5%   23.0%   16.7%   14.5%   9.4%    12.5%
      PC                     316     363     381     398     425     451     478
      Smartphone             180     307     485     620     749     845     960
      Tablet                 4       16      55      109     155     191     256
      TV                     170     205     217     226     238     242     268
      (label lost)           156     162     169     176     185     191     198
      (label lost)           64      72      77      83      88      92      95
      (label lost)           -       -       -       3       8       10      20

    AP adoption rate
      PC                     -       -       -       1%      8%      12%     15%
      Smartphone             80%     100%    100%    100%    100%    100%    100%
      Tablet                 100%    100%    100%    100%    100%    100%    100%
      TV                     -       1%      3%      8%      15%     25%     35%
      (label lost)           -       -       -       1%      3%      12%     18%
      (label lost)           1%      3%      4%      7%      10%     16%     25%
      (label lost)           -       -       -       100%    100%    100%    100%

    Devices with AP
      Total                  148     328     550     761     996     1,198   1,461
      YoY                    43.5%   121.4%  67.8%   38.3%   30.9%   20.3%   21.9%
      PC                     -       -       -       4       34      54      72
      Smartphone             144     307     485     620     749     845     960
      Tablet                 4       16      55      109     155     191     256
      TV                     -       2       7       18      36      61      94
      (label lost)           -       -       -       1       6       23      36
      (label lost)           1       2       3       6       9       15      24
      (label lost)           -       -       -       3       8       10      20

    APs per device
      PC                     -       -       -       1       1       1       1
      Smartphone             1       1       1       1       1       1       1
      Tablet                 1       1       1       1       1       1       1
      TV                     -       1       1       1       1       1       1
      (label lost)           -       -       -       1       1       1       1
      (label lost)           1       1       1       1       2       2       3
      (label lost)           -       -       -       1       1       1       1

    AP shipments
      Total                  148     328     550     761     1,005   1,213   1,509
      YoY                    43.5%   121.4%  67.8%   38.3%   32.1%   20.7%   24.4%
      PC                     -       -       -       4       34      54      72
      Smartphone             144     307     485     620     749     845     960
      Tablet                 4       16      55      109     155     191     256
      TV                     -       2       7       18      36      61      94
      (label lost)           -       -       -       1       6       23      36
      (label lost)           1       2       3       6       18      29      71
      (label lost)           -       -       -       3       8       10      20

    ASP ($)                  18      16      15      18      20      22      24
      YoY                    -10.0%  -11.1%  -6.3%   20.0%   11.1%   10.0%   9.1%

    AP market                27      52      82      137     201     267     362
    CPU market               323     355     384     407     399     369     342
      YoY (%, AP)            29.2%   96.8%   57.3%   66.0%   46.8%   32.8%   35.7%
      YoY (%, CPU)           -1.8%   9.8%    8.2%    6.1%    -2.2%   -7.3%   -7.3%

    * Source: Gartner,
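The market rows at the bottom of this table can be cross-checked against the unit and ASP rows. The sketch below is an illustration rather than part of the report: it assumes that the shipment row (148 ... 1,509) is in millions of units and that the market row (27 ... 362) is in units of 100 million dollars, since the unit words were lost in extraction. Under that assumption, units times ASP reproduces every printed market value, and the 2011 to 2015 growth works out to roughly the 45.0% quoted in the text.

```python
# Values copied from the table rows for 2009 .. 2015E.
YEARS = ["2009", "2010", "2011E", "2012E", "2013E", "2014E", "2015E"]
AP_UNITS_MILLION = [148, 328, 550, 761, 1005, 1213, 1509]   # assumed: millions of units
ASP_USD = [18, 16, 15, 18, 20, 22, 24]
AP_MARKET_PRINTED = [27, 52, 82, 137, 201, 267, 362]        # assumed: 100 million USD


def market_in_100m_usd(units_million: float, asp_usd: float) -> float:
    """Millions of units x USD per unit = millions of USD; /100 gives 100M USD."""
    return units_million * asp_usd / 100.0


def cagr(first: float, last: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (last / first) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    for year, units, asp, printed in zip(
        YEARS, AP_UNITS_MILLION, ASP_USD, AP_MARKET_PRINTED
    ):
        computed = market_in_100m_usd(units, asp)
        print(f"{year}: computed {computed:.1f}, printed {printed}")
    # Growth of the AP market from 2011E (82) to 2015E (362).
    print(f"CAGR 2011-2015: {cagr(82, 362, 4):.1%}")
```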


    - PC PC CPU AP

    20W x86 ARM CPU AP 0.5W ,

    - 46.9% 2015 2.6

    - , , OS AP

    7 , , AP AP 1 5 AP

    - , AP OS 1.7

    AP 2015 18%(3.6 ) AP

    PC

    - PC AP Windows 8(Windows RT) ARM CPU AP

    PC CPU 3.0GHz, , 64 , ARM AP 1.5 GHz, , 32 5 ,

    Windows 8(Windows RT) ARM

    ARMv7 Cortex-A15 2GHz ,

    - ARM AP CPU SoC ,

    CPU 608$, AP 499$ 18% , SoC 20$


    , 40% 600$ ,

    - ARM AP

    CPU AP

    PC , AP HD AP PC CPU

    - AP CPU ( )

    * AP A5(122mm2) A4(53mm2) 2

    3. CPU

    CPU

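    Intel Architecture (IA) began in 1978 with the 8086, a CISC (Complex Instruction Set Computing) processor for PCs, and has since been extended to the 32-bit IA-32 and the 64-bit x86-64 [3]

    * IA is the instruction set architecture (ISA) commonly known as x86; because AMD and Via also build on the x86 ISA, Intel uses the name IA to stress its ownership

    * IA-32 is the 32-bit version of x86, spanning the 80386 (1986) through the Pentium4 (2000) [4]

    * IA-64 is the 64-bit architecture used in Itanium, developed with HP from 1994 on the basis of EPIC (Explicit Parallel Instruction Computing), a VLIW (Very Long Instruction Word) style design. IA-64 is not compatible with x86

    * The 64-bit extension of x86, distinct from IA-64, is x86-64, which AMD introduced as AMD64 in the Athlon 64 and Opteron. Intel's implementation of AMD64 was called IA-32e and EM64T before being renamed Intel 64 in 2006; Intel 64 first shipped in the Xeon in 2004 [5]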

    - CPU Atom x86


    - Atom Core2Duo I/O Controller Hub(ICH) Memory Controller Hub(MCH) System Controller Hub(SCH)

    Atom IA-32 x86-64 CPU 2008 45 CMOS , UMPC, 32 High-K Metal Gate (HKMG) [6]

    - , x86 PC

    Atom 2-issue x86 CISC RISC In-order Bonnell

    - Bonnell ALU load/store , P6 Netburst

    - CPU P5 i486 , hyper - threading

    [Figure 3-3] Platform block diagrams of the Atom processor, Core 2 Duo processor and Core i7 processor (each showing the CPU, system memory, peripherals, BIOS, high-speed I/O and the controller hubs that connect them)

    Components: IOH (I/O Hub), ICH (I/O Controller Hub), SCH (System Controller Hub), MCH (Memory Controller Hub)

    Interfaces: FSB (Front Side Bus), QPI (Quick Path Interconnect), DMI (Direct Media Interface)

    1 45 UMPC/MID Silverthorne(Atom Z5xx) Classmate PC Diamondville(Atom N2xx)


    - Atom , Sony Atom

    2 45 Pineview HD GMA X3100 GMA 3100 GPU 7W 40% 6 10

    - Pine Trail-M Atom Pineview-M Tiger Point Platform Controller Hub , 3 2

    - 2010 1 / N4xx DDR2 GPU , hyper- threading , Menlow Atom 1/2

    2011 5 Whitney Point Moorestown Oak Trail 2 MID Lincroft

    32 3 Cedarview 2011 11 , NM10 Southbridge Cedar Trail

    - 3 Blu-ray 2.0 DirectX 9 GPU , 1080p , HDMI DisplayPort , TDP

    3 MID UMPC Medfield 3D HD 2012 5

    - 240M pixels per second ISP(Image Signal Processor), low frequency mode, high frequency mode, max frequency mode

    [ 3-4] ATOM


    [Figure 3-5] Medfield (Z2460) block diagram (blocks shown: CPU with 512KB L2 cache; 2D/3D graphics; video decode 1080p30; video encode 1080p30; video image enhance; programmable image signal processor; display controller with 3 pipes; LP-DDR2 controller; security block with programmable execution environment and crypto engine; IOSF-OCP bridge; I/O: MIPI-HSI, UART, ULPI, LPDDR2, eMMC, MIPI-DSI, HDMI 1.3a, MIPI-CSI)

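    [Table 3-2] ATOM processor lineup

    Generation | Codename (release) | Model | Process | Platform | Specifications
    - | Stealey | A100 | 90nm | - | TDP: 3W, clock: 600, 800MHz
    1 | Diamondville* (2008. 03) | N2xx | 45nm | - | TDP: -2.5W, clock: 1.6GHz, FSB: -533MHz
    1 | Diamondville* (2008. 03) | 2xx | 45nm | - | TDP: 4W, 64-bit
    1 | Silverthorne** (2008. 03) | Z5xx | 45nm | Menlow | TDP: 0.65 and 2.4W (0.1W), clock: 0.8 to 2.0GHz, FSB: 533MHz
    1 | Dual Diamondville* (2008. 09) | 3xx | 45nm, dual | - | TDP: 8W, clock: 1.6 to 1.66GHz (64-bit), FSB: -667MHz
    2 | Pineview* (2009. 12) | N4xx | 45nm | Pine Trail | TDP: 6.5W, clock: 1.83GHz (64-bit), GPU: 200MHz
    2 | Pineview* (2009. 12) | D4xx | 45nm | Pine Trail | TDP: 10W, clock: 1.66GHz (64-bit), GPU: 400MHz
    2 | Pineview* (2009. 12) | N5xx | 45nm, dual | Pine Trail | TDP: 8.5W, 64-bit, GPU: 200MHz
    2 | Pineview* (2009. 12) | D5xx | 45nm, dual | Pine Trail | TDP: 13W, clock: 1.66GHz (64-bit), GPU: 400MHz
    2 | Lincroft** (2010. 05) | Z6xx | 45nm | Moorestown & Oak Trail | TDP: 1.3-3W, GPU: 400MHz
    3 | Cedarview* (2011. 11) | D2500 | 32nm, dual | Cedar Trail | TDP: 10W, 64-bit, GPU: 400MHz
    3 | Cedarview* (2011. 11) | D2700 | 32nm, dual | Cedar Trail | TDP: 10W, 64-bit, GPU: 600MHz
    3 | Cedarview* (2011. 11) | N2600 | 32nm, dual | Cedar Trail | TDP: 3.5W, clock: 1.6GHz (64-bit), GPU: 400MHz
    3 | Cedarview* (2011. 11) | N2800 | 32nm, dual | Cedar Trail | TDP: 6.5W, clock: 1.86GHz (64-bit), GPU: 600MHz
    3 | Medfield** (2012. 05) | Z24xx | 32nm | - | clock: 1.6GHz, GPU: 400MHz

    * : Classmate PC class, ** : MID, UMPC class

    * TDP: Thermal Design Power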

    4. ARM

    ARM

    RISC ARM feature Thumb ARMv4 ARM7-TDMI(ARM7-Thumb+Debug+Multiplier+ICE)

    2011 ARM ARMv7 Cortex ,

    - 2011 10 ARM Cortex-A7 big.LITTLE energy

    - Application A-Profile(ARMv7-A), Real-time R-Profile (ARMv7-R), Mid-end Low-end Micro-controller M-Profile(ARMv7-M)

    - A-Profile Cortex-A8 Cortex-A9 SMP(Symmetric Multi-Processor) , , ARM926 Cortex-A5 ,

    - ARMv7 ecosystem , SW backward compatible


    -

    ARMv7, ARMv8 , ARM PC CPU

    - Cortex-A9 ARM PC , ARM

    , ARM 32 ARMv7 Cortex-A15 LPAE(Large Physical Address Extensions) (Virtualization)

    - 32 4KB 40 LPAE , CPU GPU 4GB

    - , feature SW

    ARMv8 [7]

    ARMv8 Cortex-A9 Cortex-A15 32 ARMv7 64 (virtual addressing)

    - ARMv7 Real-time(R-Profile) Micro-controller(M-Profile) ARMv8 Cortex A-Profile(ARMv7-A)

    - A-Profile Cortex-A15 LPAE , LPAE 40 , ARMv8-A 48 , Cortex-A15 SW

    ARMv8 64 A64 , SW

    - Backward compatibility , 64 SW 64


    A64 32 , (operand)

    - JIT(Just-in-Time) , 31 64

    - , LDM/STM(load/store multiple) ,

    - SW , SIMD 64

    ARMv8 64 (Exception) OS TrustZone Hypervisor HYP ARMv7-A Cortex-A15

    - VM hypervisor 32 64

    AArch64 MMU(memory management unit) 48 Cortex-A15 , 48 4

    - AArch64 4KB 64KB 4 2 , 64 4GB 256TB

    - Cortex-A15 , ARMv8 MMU (VA: Virtual Address), (IPA: Intermediate Physical Address), (PA: Physical Address)
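The address-space figures quoted in this part (4GB for 32-bit addressing, 40-bit LPAE physical addresses, 256TB for 48-bit AArch64 addresses, and 4KB pages translated in 4 levels) follow from simple powers of two. The sketch below is an illustration only. The helper names are hypothetical, and `bits_covered` assumes 8-byte table descriptors with each table filling exactly one page; that assumption is not stated in the text above.

```python
def address_space_bytes(address_bits: int) -> int:
    """Number of bytes addressable with the given number of address bits."""
    return 1 << address_bits


def human_size(n_bytes: int) -> str:
    """Format a power-of-two byte count with binary units (1KB = 1024B)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    index = 0
    while n_bytes >= 1024 and index < len(units) - 1:
        n_bytes //= 1024
        index += 1
    return f"{n_bytes}{units[index]}"


def bits_covered(page_bits: int, levels: int, descriptor_bytes: int = 8) -> int:
    """Address bits resolved by `levels` table lookups plus the page offset.

    Assumption: every table occupies one page of fixed-size descriptors,
    so one level resolves log2(page_size / descriptor_bytes) bits.
    """
    entries_per_table = (1 << page_bits) // descriptor_bytes
    bits_per_level = entries_per_table.bit_length() - 1
    return page_bits + levels * bits_per_level


if __name__ == "__main__":
    for bits in (32, 40, 48):
        print(bits, "bits ->", human_size(address_space_bytes(bits)))
    print("4KB pages, 4 levels ->", bits_covered(12, 4), "bits")
    print("64KB pages, 2 levels ->", bits_covered(16, 2), "bits")
```

Under the stated descriptor assumption, 4KB pages with 4 levels cover exactly 48 bits, while 64KB pages with 2 levels cover 42 bits.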

    A64 30 , X0-X30 64 ,

    - , X30 PLR(Procedure Link Register) 32 64 32 64 OS Hypervisor 32

    ecosystem tool SW, ARMv8 OS A-Profile

    - 32 64 Linux OS , Linux 64 , ecosystem 64 OS 32


    [Figure 3-6] ARM architecture evolution (generations shown: ARMv5, ARMv6, ARMv7-A/R, ARMv8-A; features shown: VFPv2, Jazelle, Thumb-2, TrustZone, SIMD, VFPv3/v4, NEON Adv. SIMD; key feature: ARMv7-A compatibility)

    - ARMv8-A: A-profile only (at this time), 64-bit architecture support
    - AArch32: A32+T32 ISAs, CRYPTO, including scalar FP (SP and DP) and Adv SIMD (SP float)
    - AArch64: A64 ISA, CRYPTO, including scalar FP (SP and DP) and Adv SIMD (SP+DP float)

    big.LITTLE

    CPU (big) CPU (LITTLE) Cortex-A15 Cortex-A7

    [Figure 3-7] big.LITTLE: Cortex-A7 and Cortex-A15 (performance and energy efficiency)

    - LITTLE, Cortex-A7: most energy-efficient processor from ARM; simple, in-order, 8 stage pipeline; performance better than today's mainstream, high-volume smartphones
    - big, Cortex-A15: highest performance in mobile power envelope; complex, out-of-order, multi-issue pipeline; up to 5x the performance of today's mainstream, high-volume smartphones

    - Cortex-A CPU 1-4 CPU L2 , SCU(Snoop Control Unit) , big.LITTLE 2 Cortex-A15 2 Cortex-A7


    - Cortex-A7 Cortex-A15 , Cortex-A15 LPAE

    [Figure 3-8] big.LITTLE system (blocks shown: two Cortex-A15 cores with L2 and two Cortex-A7 cores with L2, GIC-400 delivering interrupts to both clusters, IO coherent master, and CCI-400 (Cache Coherent Interconnect) linking them to the memory controller ports and system ports)

    big.LITTLE coherency Cortex-A15 Cortex-A7 L1 L2 ,

    - coherency ARM CCI-400 GIC(Generic Interrupt Controller)-400

    OS Task Migration Model

    - Task Migration OS DVFS(Dynamic Voltage and Frequency Scaling) ( ) , Cortex-A15 Cortex-A7

    * : Cortex-A15 Cortex-A15 Cortex-A7

    - Inbound Outbound L2 Inbound , Outbound Outbound
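The task migration model described above, in which the OS's DVFS logic treats the Cortex-A7 and Cortex-A15 operating points as one ladder, can be sketched as follows. This is a simplified illustration, not ARM's implementation: the performance values are hypothetical, and the cost of saving and restoring state during a migration is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    cluster: str  # "A7" (LITTLE) or "A15" (big)
    perf: int     # relative performance in arbitrary, hypothetical units


# Hypothetical DVFS tables, ordered from lowest to highest operating point.
A7_POINTS = [OperatingPoint("A7", p) for p in (20, 40, 60)]
A15_POINTS = [OperatingPoint("A15", p) for p in (80, 120, 160)]
LADDER = A7_POINTS + A15_POINTS


def select_point(demand: int) -> OperatingPoint:
    """Pick the lowest operating point that satisfies the demand.

    Demands above the highest Cortex-A15 point saturate at that point.
    """
    for point in LADDER:
        if point.perf >= demand:
            return point
    return LADDER[-1]


def run(demands):
    """Return (demand, cluster, perf, migrated) for each demand sample."""
    current = LADDER[0]
    log = []
    for demand in demands:
        target = select_point(demand)
        migrated = target.cluster != current.cluster
        log.append((demand, target.cluster, target.perf, migrated))
        current = target
    return log


if __name__ == "__main__":
    for entry in run([10, 55, 70, 200, 30]):
        print(entry)
```

In this sketch a migration happens only when demand crosses the boundary between the highest Cortex-A7 point and the lowest Cortex-A15 point; within a cluster only voltage and frequency change.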


    [Figure 3-9] Cortex-A15, Cortex-A7 DVFS curves (x-axis: performance; y-axis: power; marked points: lowest and highest Cortex-A7 operating points, lowest and highest Cortex-A15 operating points, overdrive condition)

    CPU MP Cortex-A15 Cortex-A7 , OS Dispatcher big.LITTLE OS

    big.LITTLE , LG, TI, , , ST,

    [Figure 3-10] ARM CPU-GPU roadmap: Scalable Mobile Processor Evolution (x-axis: 2009 to 2016; y-axis: relative performance, 0 to 25; tiers: entry level, mid range, superphone)

    Configurations shown: Cortex-A8 + Mali-200; Cortex-A8 + Mali-300; Cortex-A8 + Mali-400 MP; Cortex-A5 + Mali-300; Cortex-A5 + Mali-400 MP; Dual Cortex-A9 + Quad Mali-400 MP; Dual Cortex-A7 + Dual Mali-400 MP; Quad Cortex-A7 + Quad Mali-400 MP; Quad Cortex-A9 + Quad Mali-T604; Cortex-A15 + Cortex-A7 + Dual Mali-T604; Dual Cortex-A15 + Dual Cortex-A7 + Quad Mali-T658; Dual Cortex-A16 + Dual Cortex-A7 + Eight Mali-T658


    [Table 3-3] ARM processor cores

    Family | Architecture | Core | Cache (I/D), MMU | Typical MIPS
    ARM11 | ARMv6 | ARM1136J(F)-S | variable, MMU | 740 @ 532-665 MHz (i.MX31 SoC), 400-528 MHz
    ARM11 | ARMv6T2 | ARM1156T2(F)-S | variable, MPU | -
    ARM11 | ARMv6Z | ARM1176JZ(F)-S | variable, MMU+TrustZone | 965 DMIPS @ 772 MHz, up to 2,600 DMIPS with four processors
    ARM11 | ARMv6K | ARM11 MPCore | variable, MMU | -
    Cortex-A | ARMv7-A | Cortex-A5 (MPCore) | 4-64KB/4-64KB L1, MMU+TrustZone | 1.57 DMIPS/MHz per core
    Cortex-A | ARMv7-A | Cortex-A7 MPCore | 32KB/32KB L1, 0-4MB L2, L1&L2 have Parity&ECC, MMU+TrustZone | -
    Cortex-A | ARMv7-A | Cortex-A8 | 16-32KB/16-32KB L1, 0-1MB L2 opt ECC, MMU+TrustZone | up to 2,000 (2.0 DMIPS/MHz in speed from 600 MHz to greater than 1 GHz)
    Cortex-A | ARMv7-A | Cortex-A9 MPCore | 16-64KB/16-64KB L1, 0-8MB L2 opt Parity, MMU+TrustZone | 2.5 DMIPS/MHz per core, 10,000 DMIPS @ 2 GHz on Performance Optimized TSMC 40G (dual core)
    Cortex-A | ARMv7-A | Cortex-A15 MPCore | 32KB/32KB L1, 0-4MB L2, L1&L2 have Parity&ECC, MMU+TrustZone | At least 3.5 DMIPS/MHz per core (up to 4.01 DMIPS/MHz depending on implementation)
    Cortex-R | ARMv7-R | Cortex-R4 | 0-64KB/0-64KB L1, 0-2 of 0-8MB TCM, opt MPU with 8/12 regions | -
    Cortex-R | ARMv7-R | Cortex-R5 (MPCore) | 0-64KB/0-64KB L1, 0-2 of 0-8MB TCM, opt MPU with 12/16 regions | -
    Cortex-R | ARMv7-R | Cortex-R7 (MPCore) | 0-64KB/0-64KB L1, 0-128KB of TCM, opt MPU with 16 regions | -
    Cortex-M | ARMv6-M | Cortex-M0 | No cache, No TCM, No MPU | 0.9 DMIPS/MHz
    Cortex-M | ARMv6-M | Cortex-M1 | No cache, 0-1024KB I-TCM, 0-1024KB D-TCM, No MPU | 136 DMIPS @ 170 MHz (0.8 DMIPS/MHz, FPGA-dependent)
    Cortex-M | ARMv7-M | Cortex-M3 | No cache, No TCM, opt MPU with 8 regions | 1.25 DMIPS/MHz
    Cortex-M | ARMv7-ME | Cortex-M4 | No cache, No TCM, opt MPU with 8 regions | 1.25 DMIPS/MHz
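The DMIPS figures in this table relate to the DMIPS/MHz ratings through clock speed and core count. The sketch below is an illustration with a hypothetical helper name; it treats multi-core DMIPS as a simple sum over cores, which is how the table's dual-core Cortex-A9 figure appears to be derived but is not a measure of real application performance.

```python
def peak_dmips(dmips_per_mhz: float, clock_mhz: float, cores: int = 1) -> float:
    """Aggregate Dhrystone MIPS: per-MHz rating x clock x number of cores."""
    return dmips_per_mhz * clock_mhz * cores


if __name__ == "__main__":
    # Cortex-A9, dual core at 2 GHz (table: 10,000 DMIPS).
    print(peak_dmips(2.5, 2000, cores=2))
    # Cortex-A8 at 1 GHz (table: up to 2,000).
    print(peak_dmips(2.0, 1000))
    # Cortex-M1 at 170 MHz (table: 136 DMIPS).
    print(round(peak_dmips(0.8, 170)))
    # Quad Cortex-A9 at 1.3 GHz vs. dual Cortex-A15 at 2 GHz, using the
    # table's per-MHz ratings (2.5 and "at least 3.5").
    print(peak_dmips(2.5, 1300, cores=4), peak_dmips(3.5, 2000, cores=2))
```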


    ARM AP

    Windows PC OS , OS

    - OS 2010 38%, 23%, RIM 17%, iOS 16%, Windows Mobile 4% , 2015 43.8%, iOS 16.9%, Windows 20.3%, RIM OS 13.4% OS

    ARM x86 IP ARM AP

    [ 3-4] x86 vs. ARM

    x86 Atom ARM Cortex-A9 AP

    / 35$ 15$

    70mm2 45nm 55mm2 45nm

    / 950 8 1,150 8

    ARM

    ARM IP

    - , ARM

    * , MS, ,

    ARM AP

    ARM AP , AP

    - PC CPU , 2010 52% 1, TI 23%, 13% 2, 3

    - 2007 AP AP 1 , 2010 , AP Baseband AP


    * AP , AP

    - AP TI AP AP , TI AP

    AP ARM , AP

    - AP , 1.2-1.5GHz ,

    5.

    CPU

    AP CPU Cortex-A , Atom //, MIPS DTV

    [ 3-5]

    ()

    ARM(Cortex-A8/A9/A15), MIPS(MIPS32), IBM(PowerPC), (Atom), ITC(Godson, ),

    Ceva(Ceva-X), Tensilica(Xtensa), ADChips(EISC)

    AP(A4, A5), (SnapDragon), (S5PC111, Exynos), Freescale(i.MX53), AMD(Au1250), (Tegra2/3/4), (MV8657), ADChips(EAGLE)

    ( )

    (iPhone, iPad), (Galaxy S/S2), HTC(Desire), LG(Q), RIM(Blackberry), (N95), (Moto QRTY), MS(KIN)

    [Table 3-6] ARM CPU comparison

    Item | ARM11 | ARM Cortex-A7 | ARM Cortex-A8 | ARM Cortex-A9 | Scorpion | Krait
    Decode | single-issue | partial dual-issue | 2-wide | 2-wide | 2-wide | 3-wide
    Pipeline Depth | 8 stages | 8 stages | 13 stages | 8 stages | 10 stages | 11 stages
    Out of Order Execution | N | N | N | Y | Partial | Y
    Pipelined FPU | Y | Y | N | Y | Y | Y
    NEON | N/A | Y (64-bit wide) | Y (64-bit wide) | Optional MPE (64-bit wide) | Y (128-bit wide) | Y (128-bit wide)
    Process Technology | 90nm | 40nm/28nm | 65nm/45nm | 40nm | 40nm | 28nm
    Typical Clock Speeds | 412MHz | 1.5GHz | 600MHz/1GHz | 1.2GHz | 1GHz | 1.5GHz

    iPhone 3GS AP 2008 P.A.Semi , 2009 Intrinsity

    - Intrinsity Dynamic logic EDA SW Fast14 1GHz Cortex-A8 , Intrinsity S5PC111 , A4 iPad iPhone4, TV CPU

    - 2011 2 iPad2 ARM Cortex-A9 CPU A4 PowerVR 545 GPU A5

    - A AP iPhone iPad

    iPad 3(The new iPad) ARM Cortex-A9 A6 , Cortex-A9

    [Figure 3-11] Apple AP (A4, A5, A5X) die size comparison (Chipworks polysilicon die photos with annotated ARM cores, GPU cores, processor data paths, digital logic blocks, DDR SDRAM interface and I/O): A4 53.3mm2 (7.3mm x 7.3mm), A5 122.2mm2 (10.09mm x 12.15mm), A5X 165mm2 (12.90mm x 12.79mm)

    * Source: www.chipworks.com


    - 2011 2 iPad2 iPhone 4S AP A5(S5L8940) 45 1GHz ARM Cortex-A9 CPU A4 PowerVR543-MP2 GPU die 122.2mm2

    - 2012 3 A5 2 S5L8942 TV iPad2 2 , 32 S5L8940 41% 69.6mm2 die

    - 2012 3 iPad3 1 GHz ARM Cortex-A9 CPU PowerVR543-MP4 GPU S5L8945 , 45 GPU 210%

    1990 DEC MPU , 2000 iPhone AP AP

    - 2007 iPhone 2G, 3GS AP , iPhone4 iPad AP AP

    [Figure 3-12] Exynos block diagram (blocks on a multi-layer AXI/AHB bus)

    - Cortex-A9 dual core: CPU 0 and CPU 1 at 1.0GHz, each with 32KB/32KB cache and NEON; SCU and ACP; 1MB L2 cache; CPU cache coherence
    - Display / camera: 2-ch MIPI CSI input, HDMI v1.3a output, TV out (composite), single WXGA / dual WSVGA
    - Graphic / video / ISP: JPEG HW codec, 3D / OpenVG / 2D HW, 1080p 30fps codec
    - Memory I/F: LPDDR2 / DDR2 / DDR3, SRAM / ROM / NOR, OneNAND / SLC, memory interleaving, dynamic addressing
    - High speed I/F: USB Host 2.0, USB Dev 2.0, HSIC, SATA 1/2, PCI Express, eMMC4.4 DDR 8-bit
    - System peripheral: 32x DMA, 4 PLLs, timers / 4x PWM, LP co-processor, PPMU, SecureRAM, SecureROM, crypto engine
    - External peripheral: HS-SPI, 14x8 keyboard, UART, MIPI Slimbus, SD2.0 / MMC4.3, I2C, I2S/PCM, AC97, S/PDIF
    - Modem I/F and wireless BB: HSIC, DPSRAM, GPS


    Imagination Technologies 3D GPU , Intrinsity 2009 4 1GHz ARM Cortex-A8 Hummingbird (S5PC110 S5PC111) [8]

    - S5PC111 MMP MFC(Multi-Format Video Codec)

    2010 S5PC111 45 Cortex-A9 Mali-400 MP GPU Exynos 4210

    - AP Exynos , Imagination Technologies PowerVR GPU ARM Mali GPU

    - 2012 Exynos 4210 32 HKMG LP 4212, Cortex-A15 Mali-T400 Mali-T604 GPU 5250 , Cortex-A9 Mali-T604 GPU 4412 S3

    - Mali-T604 GPU 4 Shader Tri-pipe OpenVG 1.1, OpenGL 1.1, 2.0 DirectX11 Mali-400 5

    PC

    - ARM CPU GeForce Tegra MID , MID

    - 2003 , , Mesia Q, 2005 ULi Electronics, 2006 2D/3D Hybrid Graphics, 2007 , PortalPlayer CPU

    60% AP , 3D

    - Tegra, Tesla, PC GeForce

    - ARM CPU IP IP, ARM11 Tegra600 ARM Cortex-A8 , ARM Cortex-A9 AP Tegra2

    - , Shader 3D IP GPU Shader , ARM7 Tegra 2


    [Figure 3-13] Tegra2 block diagram

    - Cortex-A9 dual-core: two Cortex-A9 processors
    - ARM7: chip management (dataflow, power management and other similar tasks)
    - Image processor: still/video camera driver
    - Audio processor: audio decoder (technology gained by acquiring PortalPlayer in 2007)
    - HD video encode and decode processors: encoding of 1080p H.264; 1080p HP H.264 decode (NVIDIA says no one else can do 1080p decode power-efficiently like them)
    - 2D/3D graphics processor (GPU): OpenGL ES 2.0 support, 32b 333MHz LPDDR2

    Tegra 3 Cortex-A9 Cortex-A9 Companion , MPE(media processing engine) [9]

    - Tegra 3 TSMC 40 LPG , Cortex-A9 G(generic) , Companion LP

    * TSMC 40 40 LP

    [Figure 3-14] 40nm LP (CPU B) vs. G (CPU A) process: power vs. performance

    - CPU A, on the fast process: higher leakage power in active standby
    - CPU B, on the low power process: lower leakage power, but consumes more power at higher performance ranges
    - Operating regions: companion core ON with main cores OFF in the low range; companion core OFF with main cores ON up to maximum quad-core performance

    - , Variable SMP OS , DVFS CPU Hot-Plug companion


    vSMP companion activate coherency OS , hysteresis thrashing [9]

    * Thrashing CPU

    - Companion L2 nano sec. ,

    - , vSMP OS CPU , voltage regulator ,
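The hysteresis mentioned above, used to keep the system from thrashing between the companion core and the main cores, can be illustrated with a small state machine. This is a generic sketch and not NVIDIA's algorithm: the class name, the thresholds and the number of consecutive samples required before a switch are all hypothetical.

```python
class HysteresisSwitch:
    """Switch between a companion core and the main cores with hysteresis.

    A switch happens only after `hold` consecutive load samples fall on the
    far side of the relevant threshold, so short load spikes or dips do not
    cause repeated back-and-forth switching (thrashing).
    """

    def __init__(self, up: float = 0.7, down: float = 0.3, hold: int = 2):
        if down >= up:
            raise ValueError("down threshold must be below up threshold")
        self.up = up
        self.down = down
        self.hold = hold
        self.state = "companion"
        self._streak = 0

    def update(self, load: float) -> str:
        """Feed one load sample (0.0 to 1.0) and return the active state."""
        if self.state == "companion":
            wants_switch = load > self.up
        else:
            wants_switch = load < self.down
        self._streak = self._streak + 1 if wants_switch else 0
        if self._streak >= self.hold:
            self.state = "main" if self.state == "companion" else "companion"
            self._streak = 0
        return self.state


if __name__ == "__main__":
    switch = HysteresisSwitch()
    for load in [0.8, 0.2, 0.8, 0.8, 0.5, 0.2, 0.2]:
        print(load, switch.update(load))
```

The single 0.8 spike at the start does not trigger a switch; only the two consecutive high samples do, and the return to the companion core likewise waits for two consecutive low samples.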

    [Figure 3-15] vSMP core activation by workload (four cases, each showing which of the companion core and Core 1 to Core 4 are active: single (companion) core; single core performance; dual core performance; quad core performance)

    Tegra 3 vertex shader 4 3D pixel shader 4 8 GPU

    - ARM App. NEON Tegra 2 Tegra 3


    [Figure 3-16] vSMP power consumption, Tegra 2 vs. Project Kal-El (y-axis: 20% to 100%; workloads: LP0, MP3 playback, HD video playback, gaming; labels in the order printed: 28%, 14% lower, 61% lower, 34% lower)

    * LP0:

    CPU AP

    - 2011 5 USB 3G/4G Icera , CPU Tegra+Icera

    - OS OS 3.0 Tegra 2 Tegra AP , Windows RT Tegra 3 kal-El+ Wayne OS

    [Figure 3-17] Tegra roadmap, 2011 to 2013 (segments: smartphone, superphone, tablets, clamshells; products shown: T2, Kal-El, Kal-El+, Wayne, Grey, Tegra+Icera)


    Baseband AP Baseband one-chip MSM , SnapDragon

    SnapDragon , , , Windows OS AP, 1GHz Baseband AP

    - SnapDragon MSM8660 1.5GHz ,

    ARM ARMv7(Cortex-A8 ISA) Scorpion ARM

    - SnapDragon CPU ARMv7 1.5GHz Cortex-A8 , AMD Adreno GPU GPU

    - 2011 Krait Adreno225 GPU AP Krait 28 , 1.7GHz Cortex-A9 Krait CPU Adreno320 GPU APQ8064 LG

    - 65 S1 , 45 S2 S3 , Krait CPU SnapDragon S4

    [Figure 3-18] SnapDragon S4: integration of modem and AP (blocks shown: memory (MCP) with SDRAM and NAND flash; AP (Application Processor) with multi-core CPU, 2D/3D graphics, multimedia codec, DSP and display/imaging/memory support; digital baseband modem for 2G/3G/4G cellular plus WiFi/BT/FM/GPS; connectivity: Bluetooth, WiFi, GPS, mobile broadcasting; A/D and D/A (analog + digital); RF transceiver and RF front end (antenna, LNA, PAs))


    [Figure 3-19] SnapDragon S4 block diagram

    - Multicore subsystem: Krait CPUs, each with VeNum and L1 cache; L2 cache
    - Modem subsystem: LTE world modem, Hexagon DSP, GPS/WiFi/BT/FM
    - Multimedia subsystem: Adreno GPU, multimedia processor, Hexagon DSP, audio/video HW accelerators
    - Shared: Snapdragon System Fabric, dual channel memory, Snapdragon Adaptive Power Technologies

    SnapDragon S4 MSM8960 LTE Krait CPU SoC TSMC 28 LP S3

    - Krait CPU Scorpion 1.6 , S4 S1() 8 [10]

    [Figure 3-20] Thermal limit vs. performance, 40nm G vs. 28nm LP (x-axis: performance; y-axis: power): a 40G ARM core hits the thermal limit, starts throttling and results in lower performance; Krait in 28LP sustains higher performance with better thermal performance

    Krait L2 (aSMP: asynchronous Symmetrical Multi-Processor)


    - CPU ( ) , 25-40%

    - Krait double-precision VFP(Vector Floating Point) SIMD(Single Instruction Multiple Data) ,

    [Table 3-7] SnapDragon S4 AP lineup

    Item | S4 Play | S4 Plus | S4 Pro | S4 Prime
    CPU | 1.7 GHz Dual ARM Cortex A5 CPU | 1.7 GHz Dual Krait CPU | 1.7 GHz Dual or Quad Krait CPU | 1.7 GHz Quad Krait CPU
    GPU | Adreno 230 GPU | Adreno 305 GPU | Adreno 320 GPU | Adreno 320 GPU
    Video | FWVGA | 1080p HD | 1080p HD | 1080p HD
    Modem | 3G/4G / LTE | 3G/4G / LTE | 3G/4G / LTE | 3G/4G / LTE
    Camera | 8 MP | 20MP, 3D | 20MP, 3D | 20MP, 3D
    GPS | gpsOne Gen 7 | gpsOne Gen8A | gpsOne Gen8A | gpsOne Gen8A
    USB | USB 2.0 | USB 2.0 OTG (480Mbps) | USB 2.0 OTG (480Mbps) | USB 2.0 OTG (480Mbps)
    Bluetooth | BT3.x | BT4.0 | BT4.0 | BT4.0
    Wifi | 802.11n (2.4/5GHz) | 802.11n (2.4/5GHz) | 802.11n (2.4/5GHz) | 802.11n (2.4/5GHz)
    Process | 45nm | 28nm | 28nm | 28nm
    Individual Chips | MSM8625, MSM8225 | APQ8060A, MSM8960, MSM8660A, MSM8260A, APQ8030, MSM8930, MSM8630, MSM8230, MSM8627, MSM8227 | APQ8064, MSM8960T | MPQ8064

    App. , TV,

    Vertex, pixel geometry shader Unified Shader AMD GPU Adreno 200 GPU ,


    - SnapDragon S4 Adreno 220 50% Adreno 225 GPU , Windows RT Adreno 320 GPU LTE S4 CES2012

    TI(Texas Instruments)

    TI 2011 Cortex-A9 OMAP4 , Cortex-A15 OMAP5 ARM AP

    - TI , SoC , Tegra [11]

    [ 3-21] OMAP5430 SoC

    OMAP5 OMAP5430 OMAP5432 28 LP Cortex-A15 Symmetric Multi-processing (SMP)

    - TSMC 40 1.3 GHz Cortex-A9 Tegra 3 2GHz Cortex-A15 DMIPS OMAP5 Cortex-A15

    - OMAP5430 Cortex-A15 ARM Cortex-M4 , big.LITTLE Cortex-M4 16 Thumb/Thumb-2 32 (1-cycle 32 , SIMD )


    OpenGL ES, OpenGL, OpenVG DirectX PowerVR SGX544-MPx GPU

    - AP GPU 100-400 MHz 1-16 GPU , iPad 3 Cortex-A9 A5X SGX543-MP4 GPU , TI OMAP5430 SGX544-MP2 GPU

    - 2012 6 MWC 2012 TI OMAP5430 iPad 3 1080p GLBenchmark 2.5 iPad3 34 FPS(on-screen) 43 FPS(off-screen), OMAP5430 38 FPS 45 FPS , SGX543 4 SGX544 2

    [Table 3-8] OMAP5 (5430, 5432) comparison

    Item | OMAP5430 | OMAP5432
    Target | Area-sensitive | Cost-sensitive
    Process | 28 | 28
    ARM Cortex-A15 Clock Speed (two) | Up to 2 GHz | Up to 2 GHz
    2D & 3D Graphics | , | ,
    Video Performance (2D) | 1080p60 multi-standard | 1080p60 multi-standard
    Video Performance (3D) | 1080p30 multi-standard | 1080p30 multi-standard
    Imaging Performance | Up to 24 MP (MIPI CSI-3 + 3x MIPI CSI-2 + CPI interfaces) | Up to 20MP (3x CSI-2 + CPI interfaces)
    Memory Support | 2x LPDDR2 | 2x DDR3/DDR3L
    Peripheral Support | UART(6x), HSIC(3x), SPI(4x), MIPI UniPortSM-M, MIPI LLI, HSI(2x) | UART(5x), HSIC(2x), SPI(3x), MIPI UniPortSM-M, MIPI LLI, HSI
    Package | 14mm x 14mm PoP, 980 balls, 0.4mm pitch (240-ball, 0.5mm PoP) | 17mm x 17mm BGA, 754 balls, 0.5mm pitch (w/depop)

    IP

    AP CPU GPU IP , CPU 2011 Cortex-A9

    - GPU GeForce GPU, Adreno GPU, PowerVR , ARM Mali


    [Table 3-9] AP IP comparison

    AP | Embedded CPU | GPU | Fab.
    A4 (Apple) | Cortex-A8 (single core) | PowerVR SGX 535 | Samsung 45nm
    A5 (Apple) | Cortex-A9 (dual core) | PowerVR SGX 543MP2 | Samsung 45nm
    S5PC110 (Samsung) | Cortex-A8 (single core) | PowerVR SGX540 | Samsung 45nm
    Exynos (Samsung) | Cortex-A9 (dual core) | ARM Mali | Samsung 45nm
    Tegra2 (NVIDIA) | Cortex-A9 (dual core) | GeForce ULP (4 core) | TSMC 40nm
    OMAP5 (TI) | Cortex-A15 (dual core) | PowerVR SGX544-MP2 | TSMC 28nm
    MSM8655 (Qualcomm) | Scorpion (ARMv7, single) | Adreno 205 | TSMC 45nm
    MSM8960 (Qualcomm) | Krait (dual core) | Adreno GPU | TSMC 28nm

    [Table 3-10] GPU comparison

    Item | Adreno 225 | PowerVR SGX 540 | PowerVR SGX 543 | PowerVR SGX 543MP2 | Mali-400 MP4 | GeForce ULP | Kal-El GeForce
    SIMD Name | - | USSE | USSE2 | USSE2 | Core | Core | Core
    # of SIMDs | 8 | 4 | 4 | 8 | 4+1 | 8 | 12
    MADs per SIMD | 4 | 2 | 4 | 4 | 4/2 | 1 | 1
    Total MADs | 32 | 8 | 16 | 32 | 18 | 8 | 12
    GFLOPS @200MHz | 12.8 GFLOPS | 3.2 GFLOPS | 6.4 GFLOPS | 12.8 GFLOPS | 7.2 GFLOPS | 3.2 GFLOPS | 4.8 GFLOPS
    GFLOPS @300MHz | 19.2 GFLOPS | 4.8 GFLOPS | 9.6 GFLOPS | 19.2 GFLOPS | 10.8 GFLOPS | 4.8 GFLOPS | 7.2 GFLOPS
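The GFLOPS rows of this table follow from the MAD counts: each multiply-add (MAD) is counted as two floating-point operations per clock. The sketch below reproduces the table's figures; the function names are hypothetical, and the Mali-400 MP4 entry interprets "4+1" SIMDs with "4/2" MADs as four units of 4 MADs plus one unit of 2 MADs, which matches the printed total of 18.

```python
def total_mads(simd_groups) -> int:
    """Sum MAD units over (number_of_simds, mads_per_simd) groups."""
    return sum(count * mads for count, mads in simd_groups)


def peak_gflops(mads: int, clock_mhz: float) -> float:
    """Peak GFLOPS: each MAD counts as 2 FLOPs (multiply + add) per clock."""
    return mads * 2 * clock_mhz / 1000.0


# (number of SIMDs, MADs per SIMD) groups taken from the table.
GPUS = {
    "Adreno 225": [(8, 4)],
    "PowerVR SGX 540": [(4, 2)],
    "PowerVR SGX 543": [(4, 4)],
    "PowerVR SGX 543MP2": [(8, 4)],
    "Mali-400 MP4": [(4, 4), (1, 2)],
    "GeForce ULP": [(8, 1)],
    "Kal-El GeForce": [(12, 1)],
}

if __name__ == "__main__":
    for name, groups in GPUS.items():
        mads = total_mads(groups)
        print(
            f"{name}: {mads} MADs, "
            f"{peak_gflops(mads, 200):.1f} GFLOPS @200MHz, "
            f"{peak_gflops(mads, 300):.1f} GFLOPS @300MHz"
        )
```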

    2012 Windows RT CPU AP Exynos SnapDragon S4, TI OMAP 5 Tegra 3

    - Exynos , TSMC 28 TSMC 28 SnapDragon S4, OMAP 5 , TSMC 40 Tegra 3 [12]

    - Atom Z2460 22


    [Table 3-11] Mobile CPU comparison

    Item | SnapDragon S4 | OMAP5430 | Exynos 5250 | Tegra 3 | Atom Z2460
    Process | 28nm LP | 28nm LP | 32nm HKMG LP | 40nm LPG | 32nm HKMG LP
    CPU | Krait | Cortex-A15, Cortex-M4 | Cortex-A15 | Cortex-A9 | x86 Atom
    CPU cores | (lost) | (lost) | (lost) | (+1) | (w/HT)
    CPU clock | 1.2GHz | 2.0GHz | 2.0GHz | 1.4GHz | 1.6GHz
    L2 cache | 2MB | 1MB | - | 1MB | 512KB
    Memory | LPDDR2 1066 | LPDDR2 1066 | LPDDR2 1600? | LPDDR2 1066 | LPDDR2 800
    (label lost) | - | 8GB | - | 2GB | 1GB
    Memory bandwidth | 8.5GB/s | 8.5GB/s | 12.8GB/s | 4.26GB/s | 6.4GB/s
    GPU | Qualcomm Adreno 305 | PowerVR SGX544MPx | ARM Mali-T604 | NVIDIA ULP GeForce | PowerVR SGX540
    HD video | 1080p30 | 1080p60 | 1080p60 | 1080p30 | 1080p30
    Camera | 20MP | 24MP | - | 32MP | 24MP
    Display resolution | 2560x1440 | 2560x2048 | 2560x1600 | 2048x1536 | (1280x1024)
    Release | 2012 Q2 | 2012 Q3 | 2012 Q2 | 2011 Q4 | 2012 Q2

    * Source: www.bodnara.co.kr
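The GB/s figures in this table are consistent with peak memory bandwidth computed as transfer rate times bus width. The sketch below is an illustration: the bus widths are assumptions chosen because they reproduce the printed values (64 bits, for example two 32-bit channels, for all parts except Tegra 3, which is assumed to be 32 bits); the table itself does not state them.

```python
def peak_bandwidth_gb_s(transfer_rate_mt_s: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: mega-transfers per second x bytes per transfer."""
    return transfer_rate_mt_s * (bus_width_bits / 8) / 1000.0


# (LPDDR2 transfer rate in MT/s from the table, ASSUMED bus width in bits)
PARTS = {
    "SnapDragon S4": (1066, 64),
    "OMAP5430": (1066, 64),
    "Exynos 5250": (1600, 64),
    "Tegra 3": (1066, 32),
    "Atom Z2460": (800, 64),
}

if __name__ == "__main__":
    for name, (rate, width) in PARTS.items():
        print(f"{name}: {peak_bandwidth_gb_s(rate, width):.2f} GB/s")
```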

    3 AP CPU,

    - , , , AR(Augmented Reality) , OS

    [ 3-22] AP

    1 AP AMP+DSP()

    2010 2015 2020

    2 AP

    3 AP Smart AP( )

    - - , , - Sensing, AR, Recognition

    MPU+DSP+MMP+GPU(, MID)

    : Hummmingbird(+ ) Qualcomm: Snapdragon NVIDIA: Tegra 1, 2 (GPU )


    6.

    CPU

    IT CPU ,

    20 , , , 1 20 [13]

    - (Heterogeneous core),

    - , , SW

    vs ARM CPU

    CPU ARM

    - ARM v8 big.LITTLE , IDM(Integrated Device Manufacturer) / PC SW

    CPU AP OS , x86 CPU ARM

    - AP 82$ 2015 362$ , PC CPU

    - ARM AP , , TI, ,


    [References]

    1. Chipmakers ARM for Battle in Traditional Computing Market, Sixto Ortiz Jr., 2011. 4, IEEE Computer

    2. The high stakes of low power, Rachel Courtland, 2012. 5, IEEE Spectrum

    3. White Paper: Introduction to Intel Architecture, Todd Langley and Rob Kowalczyk, 2009. 1, Intel Press

    4. Online, http://en.wikipedia.org/wiki/IA-32

    5. Online, http://en.wikipedia.org/wiki/X86

    6. Online, http://en.wikipedia.org/wiki/Intel_atom

    7. White Paper: The ARM v8 Architecture, John Goodacre, 2011. 11, ARM Processor Division

    8. , /, 2010. 10, Vol. 25, No. 5

    9. White Paper: Variable SMP (4-PLUS-1™) - A Multi-Core CPU Architecture for Low Power and High Performance, NVIDIA, 2011, NVIDIA

    10. White Paper: Snapdragon S4 Processors: System on Chip Solutions for a New Mobile Age, Qualcomm, 2011, Qualcomm

    11. White Paper: Going beyond a faster horse to transform mobile devices, Brian Carlson, 2011. 5, Texas Instruments

    12. AP , , 2011. 9,

    13. The future of Microprocessors, Shekhar Borkar / Andrew A. Chien, 2011. 5, Communications of the ACM


    [ ]

    Multi-core

    Multi-core Application algorithm architecture Target , RTOS

    2008. 06-2011. 12

    ()

    OpenGL|ES 2.0, OpenVG 1.1, OpenCL 1.1 GPU

    SoC

    Multi-threading Processor OpenGL|ES 2.0, OpenVG, OpenCL SoC

    2011. 11-2014. 09

    Shader GPU (Fusion)

    Unified Shader Shader CPU -Shader Architecture SoC

    2012. 05-2016. 04

    16GOPS SDK

    2010. 03-2014. 02

    MPCore SoC

    ASIP (SVC, H.264, MPEG-4/2, VC-1, AVS ) C/SystemC Virtual Platform SDK

    2007. 03-2011. 02

    / DSP

    DSP architecture / C/C++ compiler, Hardware debugger SDK

    2006. 03-2010. 02