  • Chapter 2: Structures of the Embedded Processors

    Professor Tzyy-Kuen Tien
    E-mail: [email protected]
    http://www.eecs.stut.edu.tw
    STUT/EE
    Ministry of Education Advisory Office, PAL Consortium / System Prototyping and Hardware-Software Integrated Design

  • P-2/151PAL/

    Outline

    2.1 Characteristics of Embedded Processors
    2.2 MIPS, ARM, and Cell Processors
    2.3 Introduction to ARM 32-bit CPU Core Family
    2.4 Intel XScale Core
    2.5 XScale (ARM V5TE) Instruction Set


  • P-4/151PAL/

    2.1 Characteristics of Embedded Processors

    On the Meaning of Embedded Systems
    Embedded System Requirements
    Embedded Processors
    Characteristics of Embedded Processors

  • P-5/151PAL/

    2.1 Digital Systems

    Classification of digital systems.

    General-purpose systems.
        The systems are not customized for any specific application.
        Examples of general-purpose systems include desktop computers, workstations, and server systems.

    Application-specific systems.
        The systems are designed for dedicated applications.
        The systems can be found in process control, networking, home appliances, consumer-electronics devices, etc.

  • P-6/151PAL/

    2.1 Embedded System

    An embedded system usually includes an application-specific system and some non-electronic or electronic systems.

    Definition of embedded systems.
        Embedded systems are (inexpensive) mass-produced elements of a larger system, providing a dedicated, possibly time-constrained, service to that system.

    Examples of embedded systems include fax machines, cell phones, printers, CD players, etc.

  • P-7/151PAL/

    2.1 An Embedded System Example

    An electronic system is embedded within an external process (plant) comprising a physical system and human operators who perform supervising and parameter-setting activities.


  • P-9/151PAL/

    2.1 Embedded System Requirements

    Functional requirements.
    Temporal requirements.
    Dependability requirements.

  • P-10/151PAL/

    2.1 Functional Requirements

    Data gathering.
        To obtain information from the embedded system or from the environment surrounding the embedded system.

    Data transformation.
        To display, send, or operate on the digital data, etc.

    Data control.
        Based on the transformed data, a decision is taken to act on the environment.
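The three functional requirements can be pictured as one pass through a sense, process, and act cycle. The sketch below is only an illustration under assumptions: the sensor scaling (0.1 degree Celsius per count), the 30-degree threshold, and the names `control_step`, `read_sensor`, and `actuate` are all hypothetical and do not come from the slides.

```python
from typing import Callable, List


def control_step(read_sensor: Callable[[], float],
                 actuate: Callable[[bool], None],
                 threshold_celsius: float = 30.0) -> bool:
    """Run one gather / transform / control pass and return the decision."""
    raw_counts = read_sensor()               # data gathering: read the environment
    celsius = raw_counts * 0.1               # data transformation (hypothetical scaling)
    fan_on = celsius > threshold_celsius     # data control: decide how to act
    actuate(fan_on)                          # act on the environment
    return fan_on


if __name__ == "__main__":
    commands: List[bool] = []
    control_step(lambda: 325, commands.append)   # 32.5 degrees: above threshold
    control_step(lambda: 250, commands.append)   # 25.0 degrees: below threshold
    print(commands)
```

Passing the sensor and actuator in as callables keeps the decision logic testable without real hardware.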

  • P-11/151PAL/

    2.1 Temporal Requirements

    Some tasks have deadlines.
    All tasks with hard real-time deadlines are critical: a failure to complete any of them has catastrophic results.

  • P-12/151PAL/

    2.1 Dependability Requirements

    Reliability measures.
        The time the embedded system remains operational within a certain time span.

    Maintainability measures.
        The time needed to repair a system after a failure occurs.

    Availability measures.
        The fraction of time that the embedded system is available to provide its services, with respect to the total time.
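The availability measure above is a simple ratio, which a short sketch makes concrete. The function name `availability` and the example figures are illustrative assumptions, not values from the slides.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the total time during which the system could provide its service."""
    total_hours = uptime_hours + downtime_hours
    if total_hours <= 0:
        raise ValueError("total observation time must be positive")
    return uptime_hours / total_hours


if __name__ == "__main__":
    # Hypothetical example: 999 hours in service, 1 hour under repair.
    print(f"{availability(999.0, 1.0):.3%}")
```

Downtime here corresponds to the repair time counted by the maintainability measure, so shorter repairs raise availability directly.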


  • P-14/151PAL/

    2.1 Embedded Processor

    Definition of an embedded processor.
        An embedded processor is an (inexpensive) mass-produced processing element of a larger system, providing dedicated computations and other, possibly real-time, services to that system.


  • P-16/151PAL/

    2.1 Characteristics of Embedded Processors

    Embedded processors are application-specific processors.
        The processors can be, and should be, optimized for their applications.
        Hardware/software co-design methodologies can be applied.

    Embedded processors have a static structure.
        Limited access to programming.

    Embedded processors are non-homogeneous processors.
        This is induced by the heterogeneous character of the process within which the processor is embedded.

  • P-17/151PAL/

    2.1 Heterogeneous Characters

    Both the digital sub-processors may be present in the system.
    The topology of the system is rather irregular.
    The hardware may include microprocessors, micro-controllers, DSPs, ASICs, FPGAs, etc.
    The software may include various software modules as well as a multitasking real-time operating system.


  • P-19/151PAL/

    2.2 MIPS, ARM, and Cell Processors

    MIPS processors
    ARM processors
    Cell processors

  • P-20/151PAL/

    2.2 MIPS 32-bit Cores

    [Figure: MIPS 32-bit cores roadmap.]

    Entry-level cores.
        M4K: microcontroller replacement; 225 MHz @ < 0.5 mm² in 130 nm G.
        4K, 4KE: mainstream embedded processor; 233 MHz at 130 nm G.
        4KSd: most secure licensable core, for consumer devices.

    High-performance cores, single-threaded.
        24K: fastest single-threaded performance; 400 MHz @ 130 nm G, 625 MHz @ 130 nm LV lowK-OD.
        24KE: host-based signal processing; SIMD, up to 3x signal processing performance.
        NextGen.

    High-performance cores, multi-threaded.
        34K: MT + DSP extensions; up to 2x system-level performance; 500 MHz @ 90 nm G.
        NextGen.

  • P-21/151PAL/

    2.2 MIPS32 34K Family

    Designed to exploit multi-threading in embedded applications.
    Processes multiple software threads in parallel.

    [Block diagram: fetch unit, TC, dispatch unit, execution unit, MDU, load/store unit, MMU, policy manager (for QoS), power management, BIU, I-cache (0-64K), D-cache (0-64K), I-SPRAM, D-SPRAM, ITC, CorExtend, coprocessor interface, FPU, EJTAG, trace, off-chip debug I/F, OCP I/F, and two OCP-DMA I/Fs. Some blocks are marked optional.]

  • P-22/151PAL/

    2.2 MIPS32 24K Family

    Designed to power through graphics, Java, and demanding code, with features such as an ultra-fast multiply.

    [Block diagram: MIPS32 32-bit execution unit, Mul/Div unit, optional FPU, CorExtend unit, MMU control, FMT or TLB, instruction cache, data cache, cache controller, power management, BIU for 64-bit interface, EJTAG on-chip debugging, OCP interface, TAP interface.]

  • P-23/151PAL/

    2.2 MIPS32 24KE Family

    Incorporates the MIPS DSP Application Specific Extension (ASE) into the MIPS32 24K.
    Provides efficient DSP capability while reducing overall SoC die area, cost, and power consumption.

    [Block diagram: MIPS32 32-bit execution unit with DSP ASE, Mul/Div unit with DSP ASE, optional FPU, optional CorExtend unit, MMU control, FMT or TLB, instruction cache, data cache, cache controller, power management, BIU for 64-bit interface, EJTAG on-chip debugging, OCP interface, TAP interface.]

  • P-24/151PAL/

    2.2 MIPS32 4K Family

    Designed for SoC applications that require an easy-to-use and cost-efficient embedded processor.

    [Block diagram: register file, ALU/shifter, multiply/divide unit, branch control, MMU, cache control, instruction cache, data cache, BIU, power management, EJTAG.]

  • P-25/151PAL/

    2.2 MIPS32 4KE Family

    The mainstream embedded processor of the MIPS32 family.

    [Block diagram: MIPS32 32-bit execution unit, Mul/Div unit, user-defined coprocessor, MMU control, FMT or TLB, instruction cache, data cache, cache controller, power management, BIU for 32-bit EC interface, EJTAG on-chip debugging, EC interface, COP interface, TAP interface.]

  • P-26/151PAL/

    2.2 MIPS32 4KSd Family

    Designed for emerging secure data applications and for the stringent power, security, and size requirements of smart cards.

    [Block diagram of the processor core, with fixed and configurable blocks: execution core, secure MMU, security, TLB, instruction cache, data cache, secure cache control, EJTAG, BIU, power management, on-chip bus(es), coprocessor 2 interface.]

  • P-27/151PAL/

    2.2 MIPS32 M4K Core

    The M4K core enables designers to meet the high system throughput demands of multi-CPU SoC designs while controlling silicon cost.

    [Block diagram: MIPS32 32-bit execution unit, Mul/Div unit, coprocessor 2 interface, MMU control, FMT, SRAM interface to on-chip SRAM, power management, EJTAG on-chip debugging, TAP interface.]

  • P-28/151PAL/

    2.2 MIPS64 5K Family

    The series consists of synthesizable 64-bit MIPS processor cores designed for SoC applications.
    MIPS64 5Kc core: suited for digital consumer, networking, office automation, and embedded applications.
    MIPS64 5Kf core: fully pipelined IEEE 754-compliant floating-point unit with the multiply/add (MADD) instruction.

    [Block diagram, MIPS64 5Kf: MIPS32 32-bit execution unit, Mul/Div unit, coprocessor 2 interface, MMU control, FMT, SRAM interface to on-chip SRAM, power management, EJTAG on-chip debugging, TAP interface.]

    [Block diagram, MIPS64 5Kc: register file, ALU/shifter, multiply/divide unit, branch control, MMU, cache control, instruction cache, data cache, BIU, power management, EJTAG, co-processor interface.]

  • P-29/151PAL/

    2.2 MIPS64 20Kc Core

    A 64-bit embedded core providing a MIPS-3D enhanced floating-point unit.

    [Block diagram:
        MIPS-3D SIMD FPU: two sets of Div/Sqrt, Multiply, and Add pipes; bypass and pipefile; registers.
        Instruction dispatch unit: fetch buffer, decode, dispatch, pipe queue.
        Integer execution unit (pipe A and pipe B): ALU a, ALU b, shifter, Mul/Div (MAC), address generator, branch, bypass and pipefile, registers.
        Instruction fetch unit: branch history table, return prediction stack, 32 KB I-cache, PC generation, prefetch buffer.
        MMU: joint TLB, uTLB, MMU control.
        Load/store unit: 32 KB D-cache, load/store control, write buffer, fill/store buffer, memory transaction queue.
        SoC interface: AddrOut, DataOut, AddrIn, DataIn.
        Power management and clock control; EJTAG.]


  • P-31/151PAL/

    2.2 Introduction to ARM

    ARM: Advanced RISC Machines.
    ARM Limited: Advanced RISC Machines Limited.

    Characteristics of ARM processors.
        Low die area, low power, low cost, and high performance.
        The most widely used 16/32-bit embedded RISC solution in the world.

    ARM products.
        ARM7, ARM9, ARM9E, ARM10, ARM11, SecurCore.
        Intel StrongARM, XScale.

  • P-32/151PAL/

    2.2 ARM History (1/2)

    1985: Acorn Computer Group developed the world's first commercial RISC processor.
    1987: Acorn's ARM processor debuts as the first RISC processor for low-cost PCs.
    1990: ARM Ltd. was established as a separate company.
    1991: ARM introduced its first embeddable RISC core, the ARM6 solution.
    1993: ARM introduced the ARM7 core.
    1995: ARM announced the Thumb architecture extension.
    1997: ARM9TDMI family announced.
    1998: ARM announced the ARM10 Thumb family of processors.
    2000: ARM introduced the ARM922T core. ARM supported Intel's launch of the ARM architecture-compliant XScale microarchitecture.

  • P-33/151PAL/

    2.2 ARM History (2/2)

    2001: ARM announced the new ARMv6 architecture.
    2002: ARM launched the ARM11 microarchitecture.
    2003: AMBA 3.0 (AXI) methodology announced.
    2004: ARM Cortex-M3 processor announced.
    2005: ARM Cortex-A8 processor announced.
    2006: Cortex-R4 processor announced.

  • P-34/151PAL/

    2.2 ARM Architecture Revisions

    [Figure: timeline of ARM architecture revisions, 1994 to 2006. V4: ARM7TDMI, ARM720T, StrongARM, ARM920T. ARM v5: ARM9E, ARM1020, ARM1022E, XScale, ARM926EJ, ARM10EJ. ARM v6: V6 cores.]

  • P-35/151PAL/

    2.2 ARM7 Processors (1/2)

    ARM7 processors.
        ARM7TDMI, ARM7TDMI-S, ARM7EJ-S, ARM720T.

    Characteristics.
        EmbeddedICE-RT debug logic.
        Very low power consumption.
        0.9 MIPS/MHz, three-stage pipeline.
        A von Neumann machine architecture.
            Each successive operation can read or write any memory location, independent of the location accessed by the previous operation.
            A von Neumann machine also has a CPU with one or more registers that hold the data being operated on.

  • P-36/151PAL/

    2.2 ARM7 Processors (2/2)

    [Figure: the ARM7 Thumb family.]

    ARM7TDMI, integer core.
        ARM7 core, ARM v4T, Thumb extensions, ETM7 interface, EmbeddedICE-RT.
    ARM7TDMI-S, synthesizable integer core.
        ARM7 core, ARM v4T, Thumb extensions, ETM7 interface, EmbeddedICE-RT.
    ARM7EJ-S, Jazelle DBX enabled core.
        ARM v5TEJ, Jazelle DBX extensions, DSP extensions, Thumb extensions, ETM9 interface, EmbeddedICE-RT.
    ARM720T, open platform processor core.
        ARM7 core, ARM v4T, Thumb extensions, ETM7 interface, EmbeddedICE-RT, 8K cache, MMU, AHB interface.

  • P-37/151PAL/

    2.2 ARM7TDMI and ARM7TDMI-S

    The ARM7TDMI core is the industry's most widely used 32-bit RISC embedded microprocessor.
    The ARM7TDMI-S is the synthesizable version of the ARM7TDMI.

    [Block diagram, ARM7TDMI: ARM7TDMI core, coprocessor interface, bus interface unit, ETM interface, EmbeddedICE-RT logic.]

  • P-38/151PAL/

    2.2 ARM7EJ-S and ARM720T

    [Block diagram, ARM7EJ-S: ARM v5TEJ CPU core, ETM interface, coprocessor interface, control logic and bus interface unit.]

    [Block diagram, ARM720T: ARM7TDMI core, ETM interface, coprocessor interface, cache, MMU, write buffer, control logic and bus interface unit, AMBA AHB bus.]

  • P-39/151PAL/

    2.2 ARM9 Processor Cores (1/2)

    ARM9 processor cores.
        ARM920T, ARM922T, ARM940T.

    Characteristics.
        5-stage integer pipeline achieves 1.1 MIPS/MHz.
        Single 32-bit AMBA bus interface.
        Excellent debug support for SoC designers, including ETM interface.
        MMU supporting Windows CE, Symbian OS, Linux, Palm OS.

  • P-40/151PAL/

    2.2 ARM9 Processor Cores (2/2)

    ARM920T
        Dual 16k caches for applications running Symbian OS, Palm OS, Linux and Windows CE.

    ARM922T
        Dual 8k caches for applications running Symbian OS, Palm OS, Linux and Windows CE.

    ARM940T
        Dual 4k caches for embedded control applications running an RTOS.

  • P-41/151PAL/

    2.2 ARM920T & ARM922T Cores (1/2)

    ARM922T, open platform processor core.
        ARM9 core, ARM v4T, Thumb extensions, MMU, dual 8K caches, ETM9 interface, Embedded ICE, ASB interface.

    ARM920T, open platform processor core.
        ARM9 core, ARM v4T, Thumb extensions, MMU, dual 16K caches, ETM9 interface, Embedded ICE, ASB interface.

  • P-42/151PAL/

    2.2 ARM920T & ARM922T Cores (2/2)

    [Block diagram, ARM920T: ARM9TDMI core, ETM interface, coprocessor interface, 16K instruction cache with MMU, 16K data cache with MMU, control logic and bus interface unit, write buffer, AMBA ASB interface.]

    [Block diagram, ARM922T: ARM9TDMI core, ETM interface, coprocessor interface, 8K instruction cache with MMU, 8K data cache with MMU, control logic and bus interface unit, write buffer, AMBA ASB interface.]

  • P-43/151PAL/

    2.2 ARM9E Processor Family (1/2)

    ARM9E processors.
        ARM926EJ-S, ARM946E-S, ARM966E-S, ARM968E-S and ARM996HS.

    Characteristics.
        DSP-enhanced 32-bit RISC processors.
        Enable single-processor solutions for microcontroller, DSP and Java applications.
        Offer savings in chip area and complexity, power consumption, and time-to-market.
        Suited for applications requiring a mix of DSP and microcontroller performance.

  • P-44/151PAL/

    2.2 ARM9E Processor Family (2/2)

    [Figure: the ARM9E family.]

    ARM968E-S: ARM9E core, 0-4MB TCMs, TCM interfaces, dual AHB-Lite, ETM9 interface.
    ARM966E-S: ARM9E core, 0-64MB TCMs, TCM interfaces, AHB interface, ETM9 interface.
    ARM946E-S: ARM9E core, 0-1MB TCMs, 0-1MB caches, protection unit, TCM interfaces, AHB interface, ETM9 interface.
    ARM926EJ-S: ARM9EJ core, 0-1MB TCMs, 4-128KB caches, MMU, TCM interfaces, dual AHB, ETM9 interface.

  • P-45/151PAL/

    2.2 ARM946E-S & ARM926EJ-S

    [Block diagram, ARM946E-S: ARM9E core, ETM interface, coprocessor interface, instruction cache with MPU, data cache with MPU, instruction TCM interface, data TCM interface, control logic and bus interface unit, write buffer, AMBA AHB interface.]

    [Block diagram, ARM926EJ-S: ARM9EJ-S core, ETM interface, coprocessor interface, instruction cache with MMU, 16K data cache with MMU, instruction TCM interface, data TCM interface, control logic and bus interface unit, write buffer, separate instruction and data AMBA AHB interfaces.]

  • P-46/151PAL/

    2.2 ARM966E-S & ARM968E-S

    [Block diagram, ARM966E-S: ARM9E core, ETM interface, coprocessor interface, instruction TCM interface, data TCM interface, control logic and bus interface unit, write buffer, AMBA AHB interface.]

    [Block diagram, ARM968E-S: ARM9E core, ETM interface, coprocessor interface, instruction TCM interface, data TCM interface, control logic and bus interface unit, arbitration, write buffer, AMBA AHB interface.]

  • P-47/151PAL/

    2.2 ARM10E Processors (1/2)

    ARM10E processors.
        ARM1020E, ARM1022E and ARM1026EJ-S.

    Characteristics.
        Offer an excellent combination of high performance and low power consumption.
        Include new architectural features to deliver the highest MIPS/MHz of any ARM product.
        Feature new power-saving modes, a 64-bit load-store micro-architecture, and an IEEE 754-compatible floating-point coprocessor with vector operations.

  • P-48/151PAL/

    2.2 ARM10E Processors (2/2)

    The ARM1022E is identical to the ARM1020E macrocell but has 16K/16K caches, while the ARM1020E has 32K/32K caches.

    [Block diagram, ARM1020E: ARM10TM core, ETM interface, coprocessor interface, 22K instruction cache with MMU, 22K data cache with MMU, control logic and bus interface unit, write buffer, separate instruction and data AMBA AHB interfaces.]

    [Block diagram, ARM1026EJ-S: ARM1026EJ-S core, ETM10C interface, VFP10 interface, WC interface, instruction cache with MMU and MPU, data cache with MMU and MPU, instruction TCM interface, data TCM interface, control logic and bus interface unit, write buffer, separate instruction and data AMBA AHB interfaces.]


  • P-50/151PAL/

    2.2 Key Attributes of Cell Processor (1/2)

    Cell is multi-core.
        Contains a 64-bit Power Architecture core.
        Contains 8 Synergistic Processor Elements (SPEs).

    Cell is a flexible architecture.
        Multi-OS support (including Linux) with virtualization technology.
        Path for OS, legacy apps, and software development.

    Cell is a broadband architecture.
        The SPE is a RISC architecture with SIMD organization and a Local Store.
        128+ concurrent transactions to memory per processor.

  • P-51/151PAL/

    2.2 Key Attributes of Cell Processor (2/2)

    Cell is a real-time architecture.
        Resource allocation (for bandwidth measurement).
        Locking caches (via Replacement Management Tables).

    Cell is a security-enabled architecture.
        SPEs are dynamically reconfigurable as secure processors.

  • P-52/151PAL/

    2.2 Cell Chip Block Diagram

    [Block diagram: eight SPEs, each consisting of an SPU (SXU plus LS) and an SMF, attached to the EIB (up to 96 bytes/cycle). Also on the EIB: the PPE (PXU, L1, L2), the MIC to dual XDR memory, and the BIC to FlexIO.]

  • P-53/151PAL/

    2.2 Cell Prototype Die

  • P-54/151PAL/

    2.2 Cell Highlights

    Observed clock speed: > 4 GHz.
    Peak performance (single precision): > 256 GFlops.
    Peak performance (double precision): > 26 GFlops.
    Area: 221 mm².
    Technology: 90 nm Silicon on Insulator (SOI).
    Total number of transistors: 234M.
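The single-precision peak figure can be related to the clock speed by a back-of-the-envelope product. The sketch below rests on assumptions that the slide does not state: that all 8 SPEs contribute, that each 128-bit SIMD operation covers four 32-bit lanes, and that a fused multiply-add counts as two floating-point operations per lane per cycle. The function name `peak_gflops` is hypothetical.

```python
def peak_gflops(units: int, lanes: int, flops_per_lane_per_cycle: int,
                clock_ghz: float) -> float:
    """Theoretical peak throughput in GFLOPS for identical SIMD units."""
    return units * lanes * flops_per_lane_per_cycle * clock_ghz


if __name__ == "__main__":
    # Assumed: 8 SPEs, 4 single-precision lanes, multiply-add = 2 flops, 4 GHz clock.
    print(peak_gflops(units=8, lanes=4, flops_per_lane_per_cycle=2, clock_ghz=4.0))
```

Under these assumptions the product is 256 GFLOPS at exactly 4 GHz, which is in line with the "> 256 GFlops at > 4 GHz" pairing on the slide.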

  • P-55/151PAL/

    2.2 Element Interconnect Bus

    EIB data ring for internal communication.
        Four 16-byte data rings, supporting multiple transfers.
        96 B/cycle peak bandwidth.
        Over 100 outstanding requests.

  • P-56/151PAL/

    2.2 Power Processor Element

    The PPE handles operating system and control tasks.
        64-bit Power Architecture.
        2-way hardware multi-threading.
        Coherent load/store with 32KB I and D L1 caches and a 512KB L2 cache.

  • P-57/151PAL/

    2.2 Synergistic Processor Element

    The SPE provides computational performance.
        Up to 16-way 128-bit SIMD.
        Dedicated resources: a register file of 128 128-bit registers and a 256KB Local Store.
        Dedicated DMA engine: up to 16 outstanding requests.
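The "up to 16-way" figure follows from dividing the 128-bit SIMD width by the element size. The sketch below works through that arithmetic and the register-file capacity; the function names are illustrative, and the choice of 8-, 16-, and 32-bit element widths is an assumption used only to show the pattern.

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements processed by one SIMD operation."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return register_bits // element_bits


def register_file_bytes(register_count: int, register_bits: int) -> int:
    """Total register-file capacity in bytes."""
    return register_count * register_bits // 8


if __name__ == "__main__":
    for width in (8, 16, 32):                 # assumed element widths
        print(width, "bit elements:", simd_lanes(128, width), "lanes")
    print("register file:", register_file_bytes(128, 128), "bytes")
```

With 8-bit elements a 128-bit register holds 16 lanes, matching the slide's upper bound; wider elements give proportionally fewer lanes.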

  • P-58/151PAL/

    2.2 SPE Organization

    [Block diagram: floating point unit, fixed point unit, permute unit, channel unit, load store, forwarding macro, register file, issue control, instruction load buffer, read data latch, DMA read data latch, DMA write data latch, DMA unit. Execution is split into an even pipe and an odd pipe, each taking 3 x 16B operands and producing a 16B result. Other labelled paths: 16B load/store, 2 instructions, 128B line read, 128B line write, 64B read transfer, 8B DMA out bus, 8B DMA in bus, on-chip interconnects.]

  • P-59/151PAL/

    2.2 I/O and Memory Interfaces

    I/O provides wide bandwidth.
        Dual XDR controller.
        Two configurable interfaces.
        Flexible bandwidth between interfaces.
        Allows for multiple system configurations.

  • P-60/151PAL/

    2.2 Cell Processor Example Application Areas

    Cell is a processor that excels at processing rich media content in the context of broad connectivity.
        Digital content creation.
        Game playing and game serving.
        Distribution of (dynamic, media-rich) content.
        Imaging and image processing.
        Image analysis (e.g. video surveillance).
        Next-generation physics-based visualization.
        Streaming applications (codecs etc.).
        Physical simulation and science.


  • P-62/151PAL/

    2.3 ARM 32-bit CPU Family (1/2)

    ARM7 family.
        Application core: ARM720T.
        Embedded cores: ARM7EJ-S, ARM7TDMI, ARM7TDMI-S.

    ARM9 family.
        Application cores: ARM920T, ARM922T.

    ARM9E family.
        Application core: ARM926EJ-S.
        Embedded cores: ARM946E-S, ARM966E-S, ARM968E-S, ARM996HS.

    ARM10E family.
        Application cores: ARM1020E, ARM1022E, ARM1026EJ-S.
        Embedded core: ARM1026EJ-S.

  • P-63/151PAL/

    2.3 ARM 32-bit CPU Family (2/2)

    ARM11 family.
        Application cores: ARM11 MPCore, ARM1136J(F)-S, ARM1176JZ(F)-S.
        Embedded core: ARM1156T2(F)-S.

    T: 16-bit Thumb instruction set.
    D: On-chip debug.
    M: Enhanced multiplier.
    I: EmbeddedICE hardware, interrupt and test.
    S: Synthesizable.
    E: DSP application.
    J: Jazelle technology for efficient embedded Java execution.
    F: Integrated floating-point coprocessor.
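The suffix letters compose directly into core names, so a name such as ARM7TDMI can be read letter by letter. The sketch below is a hypothetical decoder, not an ARM tool: the table holds only the eight letters defined on this slide, the names `SUFFIX_MEANINGS` and `decode_core_name` are invented for illustration, and any other character in the suffix (digits, brackets, hyphens, or letters not listed here) is simply skipped.

```python
import re
from typing import List, Tuple

# Only the letters defined on the slide; other suffix characters are ignored.
SUFFIX_MEANINGS = {
    "T": "16-bit Thumb instruction set",
    "D": "On-chip debug",
    "M": "Enhanced multiplier",
    "I": "EmbeddedICE hardware, interrupt and test",
    "S": "Synthesizable",
    "E": "DSP application",
    "J": "Jazelle technology",
    "F": "Integrated floating-point coprocessor",
}


def decode_core_name(name: str) -> List[Tuple[str, str]]:
    """Return (letter, meaning) pairs for the suffix that follows 'ARM<digits>'."""
    match = re.match(r"ARM\d+", name)
    if match is None:
        raise ValueError(f"not an ARM core name: {name!r}")
    suffix = name[match.end():]
    return [(ch, SUFFIX_MEANINGS[ch]) for ch in suffix if ch in SUFFIX_MEANINGS]


if __name__ == "__main__":
    for core in ("ARM7TDMI", "ARM926EJ-S", "ARM1136J(F)-S"):
        letters = "".join(letter for letter, _ in decode_core_name(core))
        print(core, "->", letters)
```

Because unknown characters are skipped rather than rejected, the decoder is deliberately forgiving; a stricter version would need the full naming rules, which this slide does not give.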

  • P-64/151PAL/

    2.3 ARM Instruction Set Architecture (1/4)

    ARMv4 is the oldest version of the architecture that ARM supports now.
        A 32-bit ISA operating in a 32-bit address space.
        Implementation: ARM7 and ARM9 core families, Intel StrongARM.

    The ARMv4T architecture added the 16-bit Thumb instruction set to ARMv4.

  • P-65/151PAL/

    2.3 ARM Instruction Set Architecture (2/4)

    Features supported by ARMv5, ARMv6, and ARMv7.

    [Figure: architecture features by version.]
        ARMv5: Jazelle, VFPv2.
        ARMv6: SIMD, TrustZone, Thumb-2 (option).
        ARMv7 A&R: Thumb-2 (mandated), NEON advanced SIMD, VFPv3, dynamic compiler support.
        ARMv7 M: Thumb-2 only.

  • P-66/151PAL/

    2.3 ARM Instruction Set Architecture (3/4)

    ARMv5TE
        Improvements to the Thumb architecture, along with the ARM Enhanced DSP instruction set extensions to the ARM ISA (in 1999).
        Implementation: ARM9E, ARM10E families.

    ARMv5TEJ
        Added the Jazelle extension to support Java acceleration technology (in 2000).

    ARMv6
        Announced in 2001; better support for multiprocessing environments.
        Includes media instructions to support Single Instruction Multiple Data (SIMD) software execution.
        Implementation: ARM11 core family.

  • P-67/151PAL/

    2.3 ARM Instruction Set Architecture (4/4)

    ARMv7
        Defines three distinct processor profiles: the A profile for sophisticated, virtual memory-based OS and user applications; the R profile for real-time systems; and the M profile optimized for microcontroller and low-cost applications.
        Implements Thumb-2 technology.
        Includes the NEON technology extensions to increase DSP and media processing throughput.
        Implementation: ARM Cortex-XX cores.

    Vector Floating Point (VFP).
        Vector Floating Point (VFP) coprocessor.

    ARM TrustZone.
        Provides hardware support for two separate address spaces.
        Provides a secure environment.

  • P-68/151PAL/

    2.3 ARM7 Family (1/3)

    ARM7TDMI: integer processor.
    ARM7TDMI-S: synthesizable version of the ARM7TDMI processor.
    ARM7EJ-S: synthesizable core with DSP and Jazelle technology enhancements for Java acceleration.
    ARM720T: cached core with Memory Management Unit (MMU) supporting operating systems including Windows CE, Palm OS, Symbian OS and Linux.

    Established, high-volume 32-bit RISC architecture.
    Up to 130 MIPS (Dhrystone 2.1) performance on a typical 0.13 µm process.
    Small die size and very low power consumption.
    High code density, comparable to a 16-bit microcontroller.

  • P-69/151PAL/

    2.3 ARM7 Family (2/3)

    Wide operating system and RTOS support, including Windows CE, Palm OS, Symbian OS, Linux and market-leading RTOSs.
    Wide choice of development tools.
    Simulation models for leading EDA environments.
    Excellent debug support for SoC designers, including ETM interface.
    Multiple sourcing from industry-leading silicon vendors.
    Availability in 0.25 µm, 0.18 µm and 0.13 µm processes.
    Migration and support across new process technologies.
    Code is forward-compatible with ARM9, ARM9E and ARM10 processors as well as Intel's XScale technology.

  • P-70/151PAL/

    2.3 ARM7 Family (3/3)

    Performance characteristics.

    Core         Cache size (Inst/Data)   Tightly coupled memory   Memory mgt   Bus interface   Thumb   DSP   Jazelle
    ARM720T      8K unified               -                        MMU          AHB             Yes     No    No
    ARM7EJ-S     -                        -                        -            Yes             Yes     Yes   Yes
    ARM7TDMI     -                        -                        -            Yes**           Yes     No    No
    ARM7TDMI-S   -                        -                        -            Yes             Yes     No    No

    Applications
        Personal audio (MP3, WMA, AAC players).
        Entry-level wireless handsets.
        Two-way pagers.

  • P-71/151PAL/

    2.3 ARM9 Family (1/2)

    ARM920T: dual 16k caches for applications running Symbian OS, Palm OS, Linux and Windows CE.
    ARM922T: dual 8k caches for applications running Symbian OS, Palm OS, Linux and Windows CE.

    32-bit RISC processor with ARM and Thumb instruction sets.
    5-stage integer pipeline achieves 1.1 MIPS/MHz.
    Up to 300 MIPS (Dhrystone 2.1) in a typical 0.13 µm process.
    Single 32-bit AMBA bus interface.
    MMU supporting Windows CE, Symbian OS, Linux, Palm OS.
    Integrated instruction and data caches.
    Excellent debug support for SoC designers, including ETM interface.
    8-entry write buffer avoids stalling the processor when writes to external memory are performed.
    Portable to the latest 0.18 µm, 0.15 µm, 0.13 µm silicon processes.
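The MIPS/MHz rating and the headline MIPS figure are linked by the clock frequency. The sketch below works through that relation using the numbers quoted for the ARM9 family; the function names are illustrative, and the resulting clock is only what the two quoted figures imply if the rating holds at that speed, not a documented operating frequency.

```python
def dhrystone_mips(mips_per_mhz: float, clock_mhz: float) -> float:
    """Throughput obtained from a MIPS/MHz rating at a given clock."""
    return mips_per_mhz * clock_mhz


def clock_for_target(target_mips: float, mips_per_mhz: float) -> float:
    """Clock frequency (MHz) needed to reach a target MIPS figure."""
    if mips_per_mhz <= 0:
        raise ValueError("MIPS/MHz rating must be positive")
    return target_mips / mips_per_mhz


if __name__ == "__main__":
    # Figures quoted on the slide: 1.1 MIPS/MHz and up to 300 MIPS.
    implied_mhz = clock_for_target(300.0, 1.1)
    print(f"implied clock: {implied_mhz:.1f} MHz")
    print(f"check: {dhrystone_mips(1.1, implied_mhz):.1f} MIPS")
```

The same relation applies to the ARM7 figure of 0.9 MIPS/MHz quoted earlier in the chapter.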

  • P-72/151PAL/

    2.3 ARM9 Family (2/2)

    Performance characteristics.

    Core      Cache size (Inst/Data)   Tightly coupled memory   Memory mgt   Bus interface   Thumb   DSP   Jazelle
    ARM920T   16K/16K                  -                        MMU          AHB             Yes     No    No
    ARM922T   8K/8K                    -                        MMU          AHB             Yes     No    No

    Applications
        Next-generation hand-held products: videophones, portable communicators, PDAs.
        Digital consumer products: set-top boxes, home gateways, games consoles, MP3 audio, MPEG4 video.
        Imaging: desktop printers, still picture cameras, digital video cameras.
        Automotive: telematic and infotainment systems.

  • P-73/151PAL/

    2.3 ARM9E Family (1/4)
    ARM926EJ-S: Jazelle technology, memory management unit (MMU), variable size instruction and data caches (4K - 128K), instruction and data tightly coupled memory (TCM) interfaces.
    ARM946E-S: Variable size instruction and data cache (0K - 1M), instruction and data TCM (0 - 1M), and memory protection unit for embedded applications.
    ARM966E-S: Targets "hard real-time" applications requiring predictable instruction execution timings with high performance and low power consumption.
    ARM968E-S: The smallest, lowest power ARM9E family processor to date, aimed specifically at embedded real-time applications.
    MOVE Video Coprocessor: This coprocessor for ARM9E family processors accelerates the Sum of Absolute Differences (SAD) operation used in MPEG-4 Encoder Motion Estimation.

  • P-74/151PAL/

    2.3 ARM9E Family (2/4)
    32-bit RISC processor with ARM, Thumb and DSP instruction sets.
    ARM Jazelle technology delivers 8x Java acceleration (ARM926EJ-S).
    5-stage integer pipeline achieves 1.1 MIPS/MHz.
    Up to 300 MIPS (Dhrystone 2.1) in a typical 0.13µm process.
    Integrated real-time trace and debug support.
    Optional VFP9 coprocessor delivers floating-point performance: 215 MFLOPS for 3D graphics and real-time control systems.
    High-performance AHB system.
    MMU supporting Windows CE, Symbian OS, Linux, Palm OS (ARM926EJ-S).
    Integrated instruction and data caches.
    Real-time debug support for SoC designers, including ETM interface.
    16-entry write buffer avoids stalling the processor when writes to external memory are performed.
    Flexible soft IP delivery, synthesizable to the latest 0.18µm, 0.15µm, 0.13µm silicon processes.

  • P-75/151PAL/

    2.3 ARM9E Family (3/4)
    Performance characteristics.

    Core       | Cache Size (Inst/Data) | Tightly Coupled Memory | Memory Mgt     | Bus Interface | Thumb | DSP | Jazelle
    ARM926EJ-S | Variable               | Yes                    | MMU            | 2 x AHB       | Yes   | Yes | Yes
    ARM946E-S  | Variable               | Yes                    | MPU            | AHB           | Yes   | Yes | No
    ARM966E-S  | -                      | Yes                    | -              | AHB           | Yes   | Yes | No
    ARM968E-S  | n/a                    | Yes                    | DMA            | AHB-Lite      | Yes   | Yes | No
    ARM996HS   | n/a                    |                        | MPU (optional) | Dual AMBA AHB | Yes   | Yes | No

  • P-76/151PAL/

    2.3 ARM9E Family (4/4)
    Applications

    Next-generation hand-held products: videophones, portable communicators, Internet appliances.
    Digital consumer products: set-top boxes, home gateways, games consoles.
    Imaging: desktop printers, still picture cameras, digital video cameras.
    Storage: HDD and DVD drives.
    Automotive: powertrain, infotainment, ABS, body control systems.
    Industrial control systems: motion controls, power delivery.
    Networking: VoIP, Wireless LAN, xDSL.

  • P-77/151PAL/

    2.3 ARM10E Family (1/5)
    ARM1020E: Features DSP instruction-set extensions, on-chip debugging capability, dual 32 KB cache memories and full memory management unit (MMU) supporting Windows CE, Symbian OS, Linux and Palm OS.
    ARM1022E: As the ARM1020E, but with dual 16 KB cache memories.
    ARM1026EJ-S: A fully synthesizable processor delivering a new level of performance, functionality and flexibility to enable innovative SoC applications.
    32-bit RISC processor with ARM, Thumb and DSP instruction sets.
    Jazelle technology extension set (ARM1026EJ-S).
    6-stage integer pipeline with branch prediction achieves 1.35 MIPS/MHz.

  • P-78/151PAL/

    2.3 ARM10E Family (2/5)
    430+ Dhrystone 2.1 MIPS in a widely available 0.13µm process.
    Optional VFP10 coprocessor delivers floating-point performance: 650 MFLOPS for 3D graphics and real-time control systems.
    Dual 64-bit AMBA AHB bus interface and 64-bit internal bus architecture.
    MMU supporting Windows CE, Symbian OS, Linux, Palm OS.
    Integrated instruction and data caches.
    Parallel load/store unit.
    Non-blocking hit-under-miss data cache to maximize processor performance with slow memory systems.

  • P-79/151PAL/

    2.3 ARM10E Family (3/5)
    8-entry, double-word write buffer avoids stalling the processor when writes to external memory are performed.
    Real-time debug support for SoC designers, including ETM interface.
    High-performance AHB system with dual 64-bit bus masters.
    Portable to latest 0.18µm, 0.15µm, 0.13µm silicon processes.

  • P-80/151PAL/

    2.3 ARM10E Family (4/5)
    Performance characteristics.

    Core        | Cache Size (Inst/Data) | Tightly Coupled Memory | Memory Mgt | Bus Interface | Thumb | DSP | Jazelle
    ARM1020E    | 32K/32K                | -                      | MMU        | 2 x AHB       | Yes   | Yes | No
    ARM1022E    | 16K/16K                | -                      | MMU        | 2 x AHB       | Yes   | Yes | No
    ARM1026EJ-S | Variable               | Yes                    | MMU or MPU | 2 x AHB       | Yes   | Yes | Yes

  • P-81/151PAL/

    2.3 ARM10E Family (5/5)
    Applications

    Next-generation hand-held products: videophones, portable communicators, subnotebook computers, Internet appliances.
    Digital consumer products: set-top boxes, home gateways, games consoles.
    Imaging: laser printers, still digital cameras, digital video cameras.
    Automotive: powertrain, infotainment systems.
    Industrial control systems: motion controls, power delivery.

  • P-82/151PAL/

    2.3 ARM11 Family (1/5)
    Powerful ARMv6 instruction set architecture.
    ARM Thumb instruction set reduces memory bandwidth and size requirements by up to 35%.
    ARM Jazelle technology for efficient embedded Java execution.
    ARM DSP extensions.
    SIMD (Single Instruction Multiple Data) media processing extensions deliver up to 2x performance for video processing.
    ARM TrustZone technology for on-chip security foundation (ARM1176JZ-S and ARM1176JZF-S cores).
    Thumb-2 core technology for enhanced performance, energy efficiency and code density (ARM1156T2-S and ARM1156T2F-S cores).

  • P-83/151PAL/

    2.3 ARM11 Family (2/5)
    Low power consumption:
        0.6mW/MHz (0.13µm, 1.2V) including cache controllers.
        Energy saving power-down modes address static leakage currents in advanced processes.
        Intelligent Energy Manager (IEM) technology for dynamic power management (ARM1176JZ-S and ARM1176JZF-S cores).

    High performance integer processor:
        8-stage integer pipeline delivers high clock frequency (9 stages for ARM1156T2(F)-S).
        Separate load-store and arithmetic pipelines.
        Branch prediction and return stack.

  • P-84/151PAL/

    2.3 ARM11 Family (3/5)
    High performance memory system design:
        Supports 4-64K cache sizes.
        Optional tightly coupled memories with DMA for multi-media applications.
        High-performance 64-bit memory system speeds data access for media processing and networking applications.
        ARMv6 memory system architecture accelerates OS context-switch.

    Vectored interrupt interface and low-interrupt-latency mode speed up interrupt response and real-time performance.
    Optional Vector Floating Point coprocessor (ARM1136JF-S, ARM1176JZF-S and ARM1156T2F-S cores) for automotive/industrial controls and 3D graphics acceleration.

  • P-85/151PAL/

    2.3 ARM11 Family (4/5)
    Performance characteristics.

    Core           | Cache Size (Inst/Data) | Tightly Coupled Memory | Memory Mgt            | Bus Interface       | Thumb | DSP | Jazelle
    ARM11 MPCore   | Variable               | -                      | MMU + cache coherency | 1 or 2 x AMBA AXI   | Yes   | Yes | Yes
    ARM1136J(F)-S  | Variable               | Yes                    | MMU                   | 5 x AHB             | Yes   | Yes | Yes
    ARM1156T2(F)-S | Variable               | Yes                    | MPU                   | 3 x AXI             | Yes   | Yes | No
    ARM1176JZ(F)-S | Variable               | Yes                    | MMU + TrustZone       | 4 x AXI             | Yes   | Yes | Yes

  • P-86/151PAL/

    2.3 ARM11 Family (5/5)
    Applications

    ARM1136J(F)-S

    Automotive

    Computer

    Consumer

    Industrial

    Infotainment, DVDnavigation

    PDA

    Digital TV, DVDPVR, Set-top box,games

    Networking

    ARM1156T2(F)-S ARM1176JZ(F)-S ARM11 MPCore

    Wireless

    Power train, body *

    Infotainment, DVD,navigation, imageand speechrecognition

    -

    Infrastructure,Switch/router

    Smartphone,applicationsprocessor

    Printer, data storage

    Digital camera

    Embedded control

    Modem

    Base station

    PDA

    Set-top Box*

    EPOS Terminal

    *

    Combinedapplications andbase-bandprocessor forSmartphone

    PDA, Printer,industrial, Kiosks,server blade

    DTV, IPSTB, DSC,Camcorder, Gamesbox

    -

    CPE Terminals,Switch/Router, NAS

    Mobile gaming,PDA, Media Player

  • P-87/151PAL/

    Outline
    2.1 Characteristics of Embedded Processors
    2.2 MIPS, ARM, and Cell Processors
    2.3 Introduction to ARM 32-bit CPU Core Family
    2.4 Intel XScale Core
    2.5 XScale (ARM V5TE) Instruction Set

  • P-88/151PAL/

    2.4 High-level Overview of Intel XScale Core

    An ARM V5TE compliant microprocessor.

    Designed as an embedded core for an ASSP (Application Specific Standard Product).

    The Intel XScale core implements the integer instruction set architecture of ARM V5, but does not provide hardware support of the floating point instructions.

    The Intel XScale core provides the Thumb instruction set (ARM V5T) and the ARM V5E DSP extensions.

  • P-89/151PAL/

    2.4 Features of XScale Microarchitecture (1/2)

    A 7-stage integer/8-stage memory super-pipelined core.
    Dynamic Voltage Management.
    Media Processing Technology.
        A multiply-accumulate coprocessor performing two simultaneous 16-bit SIMD multiplies with 40-bit accumulation.
    Power Management Unit.
    128-entry Branch Target Buffer.
    32 KB Instruction Cache, 32 KB Data Cache, 2 KB Mini-Data Cache.
    32-Entry Instruction Memory Management Unit.
    Four-Entry Fill and Pend Buffers.
    Performance Monitoring Unit.

  • P-90/151PAL/

    2.4 Features of XScale Microarchitecture (2/2)

    Debug Unit.
    32-bit Coprocessor Interface.
    64-bit Core Memory Bus with Simultaneous 32-bit Input Path and 32-bit Output Path.
    8-Entry Write Buffer.
    Thumb Instruction Set Supported.

  • P-91/151PAL/

    2.4 XScale Architecture
    The following figure shows the major functional blocks of the XScale core.

  • P-92/151PAL/

    2.4 Multiply/Accumulate (MAC)
    Supports early termination of multiplies/accumulates in two cycles.
    Sustains a throughput of one MAC operation every cycle.
    Provides a 40-bit accumulator and support for 16-bit packed data for audio coding algorithms.

  • P-93/151PAL/

    2.4 Memory Management
    The MMU provides access protection and virtual to physical address translation.
    It also specifies the caching policies for the instruction cache and data cache.
    The caching policies include:
        Identifying code as cacheable or non-cacheable.
        Selecting between the mini-data cache and the data cache.
        Write-back or write-through data caching.
        Enabling the data write allocation policy.
        Enabling the write buffer to coalesce stores to external memory.

  • P-94/151PAL/

    2.4 Instruction Cache

    Implements a 32-Kbyte, 32-way set associative instruction cache with a line size of 32 bytes.
    All requests that miss the instruction cache generate a 32-byte read request to external memory.
    A mechanism to lock critical code within the cache is also provided.
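
    The geometry quoted above (32 Kbytes, 32 ways, 32-byte lines) determines how many sets the cache has and how an address can be split. The sketch below is only a worked calculation in Python; it assumes a conventional tag/index/offset split, which the slide does not spell out, and the names `cache_geometry` and `split_address` are made up for illustration.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2, valid for powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, size_bytes=32 * 1024, ways=32, line_bytes=32):
    """Split a 32-bit address into (tag, set index, byte offset).

    Assumption: the set index is taken from the bits just above the
    line offset, as in a textbook set-associative cache.
    """
    sets, offset_bits, index_bits = cache_geometry(size_bytes, ways, line_bytes)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

    With these numbers the cache has 32 KB / (32 ways x 32 bytes) = 32 sets, so 5 bits select the byte within a line, 5 bits select the set, and the remaining 22 bits form the tag.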

  • P-95/151PAL/

    2.4 Branch Target Buffer

    Can predict the outcome of branch type instructions.
    Provides storage for the target address of branch type instructions.
    Predicts the next address to present to the instruction cache.
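
    To make the three roles listed above concrete, here is a toy branch target buffer in Python. It is a generic sketch, not a description of the XScale hardware: the direct-mapped organisation, the 2-bit saturating counter, and the allocate-on-taken policy are all assumptions chosen for illustration; only the 128-entry size comes from the feature list.

```python
class BranchTargetBuffer:
    """Toy direct-mapped BTB: each slot holds [tag, target, 2-bit counter]."""

    def __init__(self, entries=128):
        self.entries = entries
        self.table = [None] * entries

    def _slot(self, pc):
        word = pc >> 2                      # ARM instructions are word aligned
        return word % self.entries, word // self.entries

    def predict(self, pc):
        """Return the next fetch address for the instruction at pc."""
        index, tag = self._slot(pc)
        entry = self.table[index]
        if entry and entry[0] == tag and entry[2] >= 2:
            return entry[1]                 # predicted taken: use stored target
        return pc + 4                       # otherwise fall through

    def update(self, pc, taken, target):
        """Train the entry with the resolved branch outcome."""
        index, tag = self._slot(pc)
        entry = self.table[index]
        if entry is None or entry[0] != tag:
            if taken:                       # assumed policy: allocate on taken
                self.table[index] = [tag, target, 2]
            return
        entry[1] = target
        entry[2] = min(3, entry[2] + 1) if taken else max(0, entry[2] - 1)
```

    A branch that has never been seen is predicted as fall-through; after it is taken once, the stored target is presented as the next fetch address.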

  • P-96/151PAL/

    2.4 Data Cache

    Implements a 32-Kbyte, 32-way set associative data cache and a 2-Kbyte, 2-way set associative mini-data cache.
    Each cache has a line size of 32 bytes, supporting write-through or write-back caching.

  • P-97/151PAL/

    2.4 Fill Buffer & Write Buffer
    Enable the loading and storing of data to memory beyond the Intel XScale core.
    The Write Buffer carries stores to memory beyond the core, allowing data coalescing when coalescing is globally enabled and the access is associated with an appropriate memory page type.
    The Fill Buffer assists the loading of data from memory and allows the application processor's external SDRAM to be read as 4-word bursts, rather than single-word accesses.

  • P-98/151PAL/

    2.4 Performance Monitoring

    Two performance monitoring counters have been added to monitor various events in the Intel XScale core.
    These events allow a software developer to measure cache efficiency, detect system bottlenecks and reduce the overall latency of programs.

  • P-99/151PAL/

    2.4 Power Management

    Assists ASSPs in controlling their clocking and managing their power.

  • P-100/151PAL/

    2.4 Debug
    Two instruction address breakpoint registers.
    One data-address breakpoint register.
    One data-address/mask breakpoint register.
    A mini-instruction cache and a trace buffer.
    Testability and hardware debugging are supported on the Intel XScale core through the Test Access Port (TAP) Controller implementation.

  • P-101/151PAL/

    2.4 JTAG
    Based on IEEE 1149.1 (JTAG) Standard Test Access Port and Boundary-Scan Architecture.

  • P-102/151PAL/

    Outline
    2.1 Characteristics of Embedded Processors
    2.2 MIPS, ARM, and Cell Processors
    2.3 Introduction to ARM 32-bit CPU Core Family
    2.4 Intel XScale Core
    2.5 XScale (ARM V5TE) Instruction Set

  • P-103/151PAL/

    2.5 XScale (ARM V5TE) Instruction Set
    Overview of the XScale Instruction Set.
    The ARM Addressing Modes.
    The ARM V5TE Instruction Set.
    The Thumb Instruction Set.

  • P-104/151PAL/

    2.5 Overview of XScale Instruction Set
    The Intel XScale core implements the integer instruction set architecture specified in ARM Version 5TE.

    The Intel XScale core supports both big and little endian data representation.

    The Intel XScale core supports the Thumb instruction set.

    The Intel XScale core implements ARM's DSP-enhanced instruction set.

  • P-105/151PAL/

    2.5 Extensions to ARM Architecture
    The Intel XScale core made a few extensions to the ARM Version 5 architecture to meet the needs of various markets and design requirements.

    A DSP coprocessor (CP0) has been added that contains a 40-bit accumulator and 8 new operations in coprocessor space, hereafter referred to as new instructions.
    New page attributes were added to the page table descriptors.
    Additional functionality has been added to coprocessor 15.
    Coprocessor 14 was created.
    Enhancements were made to the Event Architecture, including instruction cache and data cache parity error exceptions, breakpoint events, and imprecise external data aborts.

  • P-106/151PAL/

    2.5 XScale DSP Coprocessor 0 (CP0)
    The 40-bit accumulator is referenced by several new instructions that were added to the architecture.

    MIA, MIAPH and MIAxy are multiply/accumulate instructions that reference the 40-bit accumulator instead of a register specified accumulator.

    MRA and MAR provide the ability to read and write the 40-bit accumulator.
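
    The effect of accumulating into a 40-bit register can be sketched in Python. This is a behavioural illustration only: the function names are invented here, and the model assumes the accumulator wraps modulo 2^40 and that MIAPH adds the two signed 16x16 products of the packed halfwords, which the slides do not state explicitly.

```python
ACC_BITS = 40
ACC_MASK = (1 << ACC_BITS) - 1


def to_signed(value, bits):
    """Interpret the low `bits` of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mia(acc, rm, rs):
    """MIA-style step: acc += signed 32x32 product, kept to 40 bits."""
    return (acc + to_signed(rm, 32) * to_signed(rs, 32)) & ACC_MASK


def miaph(acc, rm, rs):
    """MIAPH-style step: add both signed 16x16 products of packed halfwords."""
    low = to_signed(rm, 16) * to_signed(rs, 16)
    high = to_signed(rm >> 16, 16) * to_signed(rs >> 16, 16)
    return (acc + low + high) & ACC_MASK
```

    For example, `miaph(0, 0x00020003, 0x00040005)` accumulates 3 * 5 + 2 * 4 = 23, and `to_signed(acc, 40)` recovers the signed value that an MRA-style read would split across two registers.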

  • P-107/151PAL/

    2.5 XScale (ARM V5TE) Instruction Set
    Overview of the XScale Instruction Set
    The ARM Addressing Modes
    The ARM V5TE Instruction Set
    The Thumb Instruction Set

  • P-108/151PAL/

    2.5 Addressing Modes
    Data-processing operands.
    Load and store word or unsigned byte.
    Miscellaneous loads and stores.
    Load and store multiple.
    Load and store coprocessor.

  • P-109/151PAL/

    2.5 Data-processing Operands
    General instruction syntax.
        <opcode>{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

    There are 11 addressing modes to calculate the <shifter_operand> in an ARM data-processing instruction.
        Immediate: #<immediate>
        Register: <Rm>
        Logical shift left by immediate: <Rm>, LSL #<shift_imm>
        Logical shift left by register: <Rm>, LSL <Rs>
        Logical shift right by immediate: <Rm>, LSR #<shift_imm>
        Logical shift right by register: <Rm>, LSR <Rs>
        Arithmetic shift right by immediate: <Rm>, ASR #<shift_imm>
        Arithmetic shift right by register: <Rm>, ASR <Rs>
        Rotate right by immediate: <Rm>, ROR #<shift_imm>
        Rotate right by register: <Rm>, ROR <Rs>
        Rotate right with extend: <Rm>, RRX

  • P-110/151PAL/

    2.5 Examples
    Immediate operand value.
        MOV R0, #0              ; move zero to R0.
        ADD R3, R3, #1          ; add one to R3.
        CMP R7, #1000           ; compare value of R7 with 1000.

    Register operand value.
        MOV R2, R0              ; move the value of R0 to R2.
        ADD R4, R3, R2          ; R4 = R3 + R2.
        CMP R7, R8              ; compare the value of R7 and R8.

    Shifted register operand value.
        MOV R2, R0, LSL #2      ; shift R0 left by 2, write to R2.
        ADD R9, R5, R5, LSL #3  ; R9 = R5 + R5 * 8.
        RSB R9, R5, R5, LSL #3  ; R9 = R5 * 8 - R5.
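
    The shifted-register examples rely on the barrel shifter producing the second operand before the ALU runs. The following Python sketch models that step for immediate shift amounts in the range 0 to 31; the function names are illustrative, and the special encodings the architecture defines (for example a shift amount of 32) are deliberately left out.

```python
MASK32 = 0xFFFFFFFF


def shifter_operand(rm, kind="LSL", amount=0, carry_in=0):
    """Value of Rm after the barrel shifter (shift amounts 0..31 only)."""
    rm &= MASK32
    if kind == "LSL":
        return (rm << amount) & MASK32
    if kind == "LSR":
        return rm >> amount
    if kind == "ASR":
        signed = rm - (1 << 32) if rm & 0x80000000 else rm
        return (signed >> amount) & MASK32
    if kind == "ROR":
        amount %= 32
        return ((rm >> amount) | (rm << (32 - amount))) & MASK32
    if kind == "RRX":                       # rotate right by one through carry
        return (carry_in << 31) | (rm >> 1)
    raise ValueError(kind)


def add(rn, operand2):
    """ADD Rd, Rn, <operand2>."""
    return (rn + operand2) & MASK32


def rsb(rn, operand2):
    """RSB Rd, Rn, <operand2>: reverse subtract, operand2 - Rn."""
    return (operand2 - rn) & MASK32
```

    With R5 = 6, `add(6, shifter_operand(6, "LSL", 3))` gives 54 (R5 * 9) and `rsb(6, shifter_operand(6, "LSL", 3))` gives 42 (R5 * 7), which is why these forms are commonly used to multiply by small constants in a single instruction.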

  • P-111/151PAL/

    2.5 Load and Store Word or Unsigned Byte
    General instruction syntax.
        LDR|STR{<cond>}{B}{T} <Rd>, <addressing_mode>

    Addressing modes.
        Immediate offset: [<Rn>, #+/-<offset_12>]
        Register offset: [<Rn>, +/-<Rm>]
        Scaled register offset: [<Rn>, +/-<Rm>, <shift> #<shift_imm>]
        Immediate pre-indexed: [<Rn>, #+/-<offset_12>]!
        Register pre-indexed: [<Rn>, +/-<Rm>]!
        Scaled register pre-indexed: [<Rn>, +/-<Rm>, <shift> #<shift_imm>]!
        Immediate post-indexed: [<Rn>], #+/-<offset_12>
        Register post-indexed: [<Rn>], +/-<Rm>
        Scaled register post-indexed: [<Rn>], +/-<Rm>, <shift> #<shift_imm>
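
    The nine modes above are three ways of forming the offset (immediate, register, scaled register) combined with three indexing behaviours. A minimal Python sketch of the indexing behaviours follows; `effective_address` is a hypothetical helper, and the offset is assumed to have been computed already (for a scaled register offset it would be Rm shifted by the immediate).

```python
MASK32 = 0xFFFFFFFF


def effective_address(base, offset, mode):
    """Return (address used for the access, base register value afterwards)."""
    if mode == "offset":        # [Rn, #off]   base register unchanged
        return (base + offset) & MASK32, base
    if mode == "pre":           # [Rn, #off]!  access at Rn + off, then write back
        address = (base + offset) & MASK32
        return address, address
    if mode == "post":          # [Rn], #off   access at Rn, then Rn := Rn + off
        return base, (base + offset) & MASK32
    raise ValueError(mode)
```

    For instance, a scaled register offset such as `[R3, R5, LSL #2]` with R3 = 0x2000 and R5 = 3 corresponds to `effective_address(0x2000, 3 << 2, "offset")`, which accesses 0x200C and leaves the base untouched.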

  • P-112/151PAL/

    2.5 Miscellaneous Loads and Stores
    There are six addressing modes used to calculate the address for load and store (signed or unsigned) halfword, load signed byte, or load and store doubleword instructions.
    General instruction syntax.
        LDR|STR{<cond>}H|SH|SB|D <Rd>, <addressing_mode>

    Addressing modes.
        Immediate offset: [<Rn>, #+/-<offset_8>]
        Register offset: [<Rn>, +/-<Rm>]
        Immediate pre-indexed: [<Rn>, #+/-<offset_8>]!
        Register pre-indexed: [<Rn>, +/-<Rm>]!
        Immediate post-indexed: [<Rn>], #+/-<offset_8>
        Register post-indexed: [<Rn>], +/-<Rm>

  • P-113/151PAL/

    2.5 Load and Store Multiple
    General instruction syntax.
        LDM|STM{<cond>}<addressing_mode> <Rn>{!}, <registers>{^}

    Addressing modes.
        Increment after: IA     ; non-stack addressing mode.
        Increment before: IB    ; non-stack addressing mode.
        Decrement after: DA     ; non-stack addressing mode.
        Decrement before: DB    ; non-stack addressing mode.
        Full descending: FD     ; stack addressing mode.
        Empty descending: ED    ; stack addressing mode.
        Full ascending: FA      ; stack addressing mode.
        Empty ascending: EA     ; stack addressing mode.
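
    The stack mnemonics are alternative names for the four non-stack modes, and which one they map to depends on whether the instruction is a load or a store. The Python sketch below captures that mapping together with the address range each non-stack mode touches; the helper names are illustrative only.

```python
STACK_ALIASES = {   # (stack mode, is_load) -> equivalent non-stack mode
    ("FD", True): "IA", ("FD", False): "DB",
    ("ED", True): "IB", ("ED", False): "DA",
    ("FA", True): "DA", ("FA", False): "IB",
    ("EA", True): "DB", ("EA", False): "IA",
}


def block_transfer(base, count, mode):
    """Return (lowest address accessed, base value after write-back).

    Registers are always transferred in ascending address order,
    lowest-numbered register at the lowest address.
    """
    size = 4 * count
    if mode == "IA":
        return base, base + size
    if mode == "IB":
        return base + 4, base + size
    if mode == "DA":
        return base - size + 4, base - size
    if mode == "DB":
        return base - size, base - size
    raise ValueError(mode)
```

    Pushing 14 registers with STMFD from a stack pointer of 0x8000 uses mode DB and leaves the stack pointer at 0x7FC8; the matching LDMFD uses mode IA from 0x7FC8 and restores it to 0x8000.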

  • P-114/151PAL/

    2.5 Load and Store Coprocessor
    General instruction syntax.
        <opcode>{<cond>}{L} <coproc>, <CRd>, <addressing_mode>

    Addressing modes.
        Immediate offset: [<Rn>, #+/-<offset_8>*4]
        Immediate pre-indexed: [<Rn>, #+/-<offset_8>*4]!
        Immediate post-indexed: [<Rn>], #+/-<offset_8>*4
        Unindexed: [<Rn>], <option>

  • P-115/151PAL/

    2.5 XScale (ARM V5TE) Instruction Set
    Overview of the XScale Instruction Set
    The ARM Addressing Modes
    The ARM V5TE Instruction Set
    The Thumb Instruction Set

  • P-116/151PAL/

    2.5 The Condition Field
    Almost all ARM instructions can be conditionally executed.
        If the N, Z, C, and V flags in the CPSR satisfy a condition specified in the instruction, the instruction is executed.
        If the flags do not satisfy this condition, the instruction acts as a NOP.

    Every instruction contains a 4-bit condition code field in bits 31 to 28.

    Bits  | 31-28 | 27-0
    Field | cond  | (remaining instruction bits)
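
    The check performed on the condition field can be written out as a small Python table. The mnemonics and flag tests below are the standard ARM condition codes rather than something listed on the slide, and the value 0b1111, which ARMv5 reserves for a few unconditional instructions, is intentionally not modelled.

```python
CONDITIONS = {
    0x0: ("EQ", lambda n, z, c, v: z),
    0x1: ("NE", lambda n, z, c, v: not z),
    0x2: ("CS", lambda n, z, c, v: c),
    0x3: ("CC", lambda n, z, c, v: not c),
    0x4: ("MI", lambda n, z, c, v: n),
    0x5: ("PL", lambda n, z, c, v: not n),
    0x6: ("VS", lambda n, z, c, v: v),
    0x7: ("VC", lambda n, z, c, v: not v),
    0x8: ("HI", lambda n, z, c, v: c and not z),
    0x9: ("LS", lambda n, z, c, v: (not c) or z),
    0xA: ("GE", lambda n, z, c, v: n == v),
    0xB: ("LT", lambda n, z, c, v: n != v),
    0xC: ("GT", lambda n, z, c, v: (not z) and n == v),
    0xD: ("LE", lambda n, z, c, v: z or n != v),
    0xE: ("AL", lambda n, z, c, v: True),
}


def condition_passed(instruction, n, z, c, v):
    """True if the instruction executes; otherwise it behaves as a NOP."""
    cond = (instruction >> 28) & 0xF        # bits 31..28
    _name, test = CONDITIONS[cond]
    return bool(test(bool(n), bool(z), bool(c), bool(v)))
```

    An instruction word starting with 0xE (AL) always executes, while one starting with 0x0 (EQ) executes only when the Z flag is set.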

  • P-117/151PAL/

    2.5 Branch Instructions
    Allow a conditional branch forwards or backwards up to 32MB.
    List of branch instructions.
        B, BL: branch and branch with link.
        BLX: branch with link and exchange.
        BX: branch and exchange instruction.

    Operation                    | Assembler       | Action
    Branch                       | B{cond} label   | R15 := label
      with link                  | BL{cond} label  | R14 := R15 - 4, R15 := label
      and exchange               | BX{cond} Rm     | R15 := Rm, Change to Thumb if Rm[0] is 1
      with link and exchange (1) | BLX label       | R14 := R15 - 4, R15 := label
      with link and exchange (2) | BLX{cond} Rm    | R14 := R15 - 4, R15 := Rm[31:1]

  • P-118/151PAL/

    2.5 Examples
    B label         ; branch unconditionally to label.
    BCC label       ; branch to label if carry flag is clear.
    BEQ label       ; branch to label if zero flag is set.
    MOV PC, #0      ; R15 = 0, branch to location zero.
    BL func         ; subroutine call to function func.
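
    The 32MB reach of B and BL follows from their encoding: a signed 24-bit word offset relative to the address of the branch plus 8. The Python sketch below shows that arithmetic; `encode_branch` and `branch_target` are illustrative names, and the model covers only ARM-state B and BL.

```python
def encode_branch(pc, target, link=False, cond=0xE):
    """Encode B (or BL if link) located at address pc, jumping to target."""
    offset = target - (pc + 8)          # PC reads as instruction address + 8
    if offset % 4:
        raise ValueError("target must be word aligned")
    if not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target outside the +/-32MB range")
    imm24 = (offset >> 2) & 0xFFFFFF
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | imm24


def branch_target(pc, instruction):
    """Recover the target address from an encoded B/BL at address pc."""
    imm24 = instruction & 0xFFFFFF
    if imm24 & 0x800000:                # sign-extend the 24-bit field
        imm24 -= 1 << 24
    return (pc + 8 + (imm24 << 2)) & 0xFFFFFFFF
```

    A branch to itself encodes as 0xEAFFFFFE, since the offset is -8 bytes, or -2 words, from the value the PC reads as.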

  • P-119/151PAL/

    2.5 Data Processing Instructions [1/3]

    Operation          | Assembler                         | Action
    Test               | TST{cond} Rn, <Oprnd2>            | Update CPSR flags on Rn AND Oprnd2
    Test equivalence   | TEQ{cond} Rn, <Oprnd2>            | Update CPSR flags on Rn EOR Oprnd2
    AND                | AND{cond}{S} Rd, Rn, <Oprnd2>     | Rd := Rn AND Oprnd2
    EOR                | EOR{cond}{S} Rd, Rn, <Oprnd2>     | Rd := Rn EOR Oprnd2
    ORR                | ORR{cond}{S} Rd, Rn, <Oprnd2>     | Rd := Rn OR Oprnd2
    Bit Clear          | BIC{cond}{S} Rd, Rn, <Oprnd2>     | Rd := Rn AND NOT Oprnd2
    Compare            | CMP{cond} Rn, <Oprnd2>            | Update CPSR flags on Rn - Oprnd2
      negative         | CMN{cond} Rn, <Oprnd2>            | Update CPSR flags on Rn + Oprnd2

  • P-120/151PAL/

    2.5 Data Processing Instructions [2/3]

    Operation                     | Assembler                         | Action
    Add                           | ADD{cond}{S} Rd, Rn, <Oprnd2>     | Rd := Rn + Oprnd2
      with carry                  | ADC{cond}{S} Rd, Rn, <Oprnd2>     | Rd := Rn + Oprnd2 + Carry
      saturating                  | QADD{cond} Rd, Rm, Rn             | Rd := SAT(Rm + Rn)
      double saturating           | QDADD{cond} Rd, Rm, Rn            | Rd := SAT(Rm + SAT(Rn * 2))
    Subtract                      | SUB{cond}{S} Rd, Rn, <Oprnd2>     | Rd := Rn - Oprnd2
      with carry                  | SBC{cond}{S} Rd, Rn, <Oprnd2>     | Rd := Rn - Oprnd2 - NOT(Carry)
      reverse subtract            | RSB{cond}{S} Rd, Rn, <Oprnd2>     | Rd := Oprnd2 - Rn
      reverse subtract with carry | RSC{cond}{S} Rd, Rn, <Oprnd2>     | Rd := Oprnd2 - Rn - NOT(Carry)
      saturating                  | QSUB{cond} Rd, Rm, Rn             | Rd := SAT(Rm - Rn)
      double saturating           | QDSUB{cond} Rd, Rm, Rn            | Rd := SAT(Rm - SAT(Rn * 2))
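
    The SAT() operation in the saturating rows clamps a result to the signed 32-bit range instead of letting it wrap. A small Python sketch of the four saturating instructions is given below; it works on ordinary signed integers, and returning a second value to indicate that clamping occurred is a modelling choice made here to mirror the sticky overflow flag.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def sat(value):
    """Clamp to the signed 32-bit range; also report whether clamping happened."""
    clamped = max(INT32_MIN, min(INT32_MAX, value))
    return clamped, clamped != value


def qadd(rm, rn):
    return sat(rm + rn)


def qsub(rm, rn):
    return sat(rm - rn)


def qdadd(rm, rn):
    doubled, q1 = sat(rn * 2)           # the doubling saturates on its own
    result, q2 = sat(rm + doubled)
    return result, q1 or q2


def qdsub(rm, rn):
    doubled, q1 = sat(rn * 2)
    result, q2 = sat(rm - doubled)
    return result, q1 or q2
```

    Adding 1 to the largest positive value with `qadd` stays at 0x7FFFFFFF and reports saturation, whereas a plain ADD would wrap around to the most negative value.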

  • P-121/151PAL/

    2.5 Data Processing Instructions [3/3]

    Operation             | Assembler                              | Action
    Move                  | MOV{cond}{S} Rd, <Oprnd2>              | Rd := Oprnd2
      NOT                 | MVN{cond}{S} Rd, <Oprnd2>              | Rd := 0xFFFFFFFF EOR Oprnd2
      SPSR to register    | MRS{cond} Rd, SPSR                     | Rd := SPSR
      CPSR to register    | MRS{cond} Rd, CPSR                     | Rd := CPSR
      register to SPSR    | MSR{cond} SPSR_<fields>, Rm            | SPSR := Rm (selected bytes only)
      register to CPSR    | MSR{cond} CPSR_<fields>, Rm            | CPSR := Rm (selected bytes only)
      immediate to SPSR   | MSR{cond} SPSR_<fields>, #<immed_8r>   | SPSR := immed_8r (selected bytes only)
      immediate to CPSR   | MSR{cond} CPSR_<fields>, #<immed_8r>   | CPSR := immed_8r (selected bytes only)

  • P-122/151PAL/

    2.5 Instruction Encoding
    MOV, MVN
        <opcode1>{<cond>}{S} <Rd>, <shifter_operand>
    CMP, CMN, TST, TEQ
        <opcode2>{<cond>} <Rn>, <shifter_operand>
    ADD, SUB, RSB, ADC, SBC, RSC, AND, BIC, EOR, ORR
        <opcode3>{<cond>}{S} <Rd>, <Rn>, <shifter_operand>

    Bits  | 31-28 | 27-26 | 25 | 24-21  | 20 | 19-16 | 15-12 | 11-0
    Field | cond  | 0 0   | I  | opcode | S  | Rn    | Rd    | shifter_operand

    I bit: distinguishes between the immediate and register forms of <shifter_operand>.
    S bit: signifies that the instruction updates the condition codes.
    Rn: specifies the first source operand register.
    Rd: specifies the destination register.
    shifter_operand: specifies the second source operand.
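
    Packing the fields into a 32-bit word is a matter of shifting each one to its bit position. The Python sketch below does this for the data-processing format; the opcode numbers are the standard ARM values rather than something shown on the slide, and `encode_dp` is a hypothetical helper that expects the 12-bit shifter operand to be pre-encoded.

```python
OPCODES = {"AND": 0x0, "EOR": 0x1, "SUB": 0x2, "RSB": 0x3, "ADD": 0x4,
           "ADC": 0x5, "SBC": 0x6, "RSC": 0x7, "TST": 0x8, "TEQ": 0x9,
           "CMP": 0xA, "CMN": 0xB, "ORR": 0xC, "MOV": 0xD, "BIC": 0xE,
           "MVN": 0xF}


def encode_dp(op, rd=0, rn=0, shifter_operand=0, immediate=False,
              set_flags=False, cond=0xE):
    """Pack a data-processing instruction into its 32-bit encoding."""
    if op in ("TST", "TEQ", "CMP", "CMN"):
        set_flags = True                # compare/test forms always have S = 1
    return ((cond << 28)                # bits 31..28: condition
            | (int(immediate) << 25)    # bit 25: I
            | (OPCODES[op] << 21)       # bits 24..21: opcode
            | (int(set_flags) << 20)    # bit 20: S
            | (rn << 16)                # bits 19..16: Rn
            | (rd << 12)                # bits 15..12: Rd
            | (shifter_operand & 0xFFF))
```

    For example, `ADD R3, R3, #1` packs to 0xE2833001 and `MOV R0, #0` to 0xE3A00000; in the register form, such as `MOV R2, R0`, the low bits of the shifter operand simply hold the register number of Rm.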

  • P-123/151PAL/

    2.5 Multiply Instructions [1/2]
    ARM has two classes of multiply instruction.
        Normal, 32-bit result.
        Long, 64-bit result.

    List of multiply instructions.
    Normal
        MUL: multiplies the values of two registers together, truncates the result to 32 bits, and stores the result in a third register.
        MLA: multiplies the values of two registers together, adds the value of a third register, truncates the result to 32 bits, and stores the result in a fourth register.

    Long
        SMLAL: signed multiply accumulate long.
        SMULL: signed multiply long.
        UMLAL: unsigned multiply accumulate long.
        UMULL: unsigned multiply long.

  • P-124/151PAL/

    2.5 Multiply Instructions [2/2]

    Operation                        | Assembler                               | Action
    Multiply                         | MUL{cond}{S} Rd, Rm, Rs                 | Rd := (Rm * Rs)[31:0]
      accumulate                     | MLA{cond}{S} Rd, Rm, Rs, Rn             | Rd := ((Rm * Rs) + Rn)[31:0]
      unsigned long                  | UMULL{cond}{S} RdLo, RdHi, Rm, Rs       | RdHi,RdLo := unsigned(Rm * Rs)
      unsigned accumulate long       | UMLAL{cond}{S} RdLo, RdHi, Rm, Rs       | RdHi,RdLo := unsigned(RdHi,RdLo + Rm * Rs)
      signed long                    | SMULL{cond}{S} RdLo, RdHi, Rm, Rs       | RdHi,RdLo := signed(Rm * Rs)
      signed accumulate long         | SMLAL{cond}{S} RdLo, RdHi, Rm, Rs       | RdHi,RdLo := signed(RdHi,RdLo + Rm * Rs)
      signed 16 * 16 bit             | SMULxy{cond} Rd, Rm, Rs                 | Rd := Rm[x] * Rs[y]
      signed 32 * 16 bit             | SMULWy{cond} Rd, Rm, Rs                 | Rd := (Rm * Rs[y])[47:16]
      signed accumulate 16 * 16      | SMLAxy{cond} Rd, Rm, Rs, Rn             | Rd := Rn + Rm[x] * Rs[y] (sticky overflow flag)
      signed accumulate 32 * 16      | SMLAWy{cond} Rd, Rm, Rs, Rn             | Rd := Rn + (Rm * Rs[y])[47:16]
      signed accumulate long 16 * 16 | SMLALxy{cond} RdLo, RdHi, Rm, Rs        | RdHi,RdLo := RdHi,RdLo + Rm[x] * Rs[y]

  • P-125/151PAL/

    2.5 Examples
    MUL R4, R2, R1          ; set R4 to value of R2 * R1.
    MULS R4, R2, R1         ; R4 = R2 * R1, set N and Z flags.
                            ; N=1 if value of R4 is negative.
                            ; Z=1 if value of R4 is zero.
    MLA R7, R8, R9, R3      ; R7 = R8 * R9 + R3.
    SMULL R4, R8, R2, R3    ; R4 = bits 0 to 31 of R2 * R3.
                            ; R8 = bits 32 to 63 of R2 * R3.
    UMULL R6, R8, R0, R1    ; R8, R6 = R0 * R1.
    UMLAL R5, R8, R0, R1    ; R8, R5 = R0 * R1 + R8, R5.
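
    The difference between the normal and long forms, and between the signed and unsigned long forms, is easiest to see with concrete numbers. The Python sketch below models register values as 32-bit unsigned integers; the function names simply mirror the mnemonics and are not part of any library.

```python
MASK32 = 0xFFFFFFFF


def signed32(x):
    """Interpret a 32-bit register value as a signed number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def split64(value):
    """Return (RdLo, RdHi) for a 64-bit result."""
    value &= (1 << 64) - 1
    return value & MASK32, value >> 32


def mul(rm, rs):
    return (rm * rs) & MASK32                   # truncated to 32 bits


def mla(rm, rs, rn):
    return (rm * rs + rn) & MASK32


def umull(rm, rs):
    return split64((rm & MASK32) * (rs & MASK32))


def smull(rm, rs):
    return split64(signed32(rm) * signed32(rs))


def umlal(rdlo, rdhi, rm, rs):
    return split64(((rdhi << 32) | rdlo) + (rm & MASK32) * (rs & MASK32))
```

    Multiplying 0xFFFFFFFF by 2 gives a high word of 1 with `umull` (the operand is 4294967295) but 0xFFFFFFFF with `smull` (the operand is -1), while the low word is 0xFFFFFFFE in both cases.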

  • P-126/151PAL/

    2.5 Status Register Access Instructions
    There are two instructions for moving the contents of a program status register (PSR) to or from a general-purpose register. Both the CPSR and SPSR can be accessed.
    MRS: move PSR to general-purpose register.
    MSR: move general-purpose register to PSR.

    Operation           | Assembler                              | Action
    SPSR to register    | MRS{cond} Rd, SPSR                     | Rd := SPSR
    CPSR to register    | MRS{cond} Rd, CPSR                     | Rd := CPSR
    register to SPSR    | MSR{cond} SPSR_<fields>, Rm            | SPSR := Rm (selected bytes only)
    register to CPSR    | MSR{cond} CPSR_<fields>, Rm            | CPSR := Rm (selected bytes only)
    immediate to SPSR   | MSR{cond} SPSR_<fields>, #<immed_8r>   | SPSR := immed_8r (selected bytes only)
    immediate to CPSR   | MSR{cond} CPSR_<fields>, #<immed_8r>   | CPSR := immed_8r (selected bytes only)

  • P-127/151PAL/

    2.5 Examples
    These examples assume that the ARM processor is already in a privileged mode. If the ARM processor starts in User mode, only the flag update has any effect.

    MRS R0, CPSR                ; read the CPSR.
    BIC R0, R0, #0xF0000000     ; clear the N, Z, C, and V bits.
    MSR CPSR_f, R0              ; update the flag bits in the CPSR.
                                ; N, Z, C, and V flags now all clear.

    MRS R0, CPSR                ; read the CPSR.
    ORR R0, R0, #0x80           ; set the interrupt disable bit.
    MSR CPSR_c, R0              ; update the control bits in the CPSR.
                                ; interrupts (IRQ) now disabled.
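
    The "selected bytes only" behaviour of MSR can be modelled as a byte mask chosen by the field suffix. The Python sketch below assumes the usual field letters (c, x, s, f for the four bytes from low to high) and ignores privilege checks; `msr` here is an illustrative function, not an API.

```python
FIELD_MASKS = {"c": 0x000000FF,     # control byte: mode bits, T, F, I
               "x": 0x0000FF00,
               "s": 0x00FF0000,
               "f": 0xFF000000}     # flags byte: N, Z, C, V (and Q)
I_BIT = 0x80                        # IRQ disable bit in the control byte


def msr(psr, value, fields):
    """Write only the bytes of psr selected by fields, e.g. "f" or "cf"."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    return (psr & ~mask & 0xFFFFFFFF) | (value & mask)
```

    Because the interrupt disable bit lives in the control byte, setting it requires the `c` field: with a CPSR of 0x60000013, `msr(cpsr, cpsr | I_BIT, "c")` yields 0x60000093, while the same write with only the `f` field leaves the CPSR unchanged.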

  • P-128/151PAL/

    2.5 Load and Store Instructions [1/3]
    Support two broad types.
        Load or store a 32-bit word or an 8-bit unsigned byte.
        Load or store a 16-bit unsigned halfword, and can load and sign extend a 16-bit halfword or an 8-bit byte.

    The addressing mode is formed from two parts:
        The base register and the offset.
        The base register can be any one of the general-purpose registers.
        The offset takes one of three formats: immediate, register, and scaled register.

  • P-129/151PAL/

    2.5 Load and Store Instructions [2/3]
    Load instructions:

    Operation               | Assembler                            | Action
    Word                    | LDR{cond} Rd, <addressing_mode>      | Rd := [address]
      User mode privilege   | LDR{cond}T Rd, <addressing_mode>     |
      branch (and exchange) | LDR{cond} R15, <addressing_mode>     | R15 := [address][31:1], Change to Thumb if [address][0] is 1
    Byte                    | LDR{cond}B Rd, <addressing_mode>     | Rd := ZeroExtend[byte from address]
      User mode privilege   | LDR{cond}BT Rd, <addressing_mode>    |
      signed                | LDR{cond}SB Rd, <addressing_mode>    | Rd := SignExtend[byte from address]
    Halfword                | LDR{cond}H Rd, <addressing_mode>     | Rd := ZeroExtend[halfword from address]
      signed                | LDR{cond}SH Rd, <addressing_mode>    | Rd := SignExtend[halfword from address]

  • P-130/151PAL/

    2.5 Load and Store Instructions [3/3]
    Store instructions:

    Operation               | Assembler                            | Action
    Word                    | STR{cond} Rd, <addressing_mode>      | [address] := Rd
      User mode privilege   | STR{cond}T Rd, <addressing_mode>     | [address] := Rd
    Byte                    | STR{cond}B Rd, <addressing_mode>     | [address][7:0] := Rd[7:0]
      User mode privilege   | STR{cond}BT Rd, <addressing_mode>    | [address][7:0] := Rd[7:0]
    Halfword                | STR{cond}H Rd, <addressing_mode>     | [address][15:0] := Rd[15:0]

  • P-131/151PAL/

    2.5 Examples [1/2]
    LDR R1, [R0]                ; load R1 from the address in R0.
    LDR R8, [R3, #4]            ; load R8 from the address in R3 + 4.
    LDR R12, [R13, #-4]         ; load R12 from the address in R13 - 4.
    STR R2, [R1, #0x100]        ; store R2 to the address in R1 + 0x100.

    LDRB R3, [R8, #3]           ; load byte to R3 from R8 + 3.
    STRB R10, [R4, #0x200]      ; store byte from R10 to R4 + 0x200.
    LDR R11, [R3, R5, LSL #2]   ; load R11 from R3 + (R5 * 4).
    LDR R1, [R0, #4]!           ; load R1 from R0 + 4, then R0 = R0 + 4.
    STRB R7, [R6, #-1]!         ; store byte from R7 to R6 - 1, then
                                ; R6 = R6 - 1.

  • P-132/151PAL/

    2.5 Examples [2/2]
    LDR R3, [R9], #4            ; load R3 from R9, then R9 = R9 + 4.
    STR R2, [R5], #8            ; store R2 to R5, then R5 = R5 + 8.
    LDR R2, [R1], R0            ; load R2 from R1, then R1 = R1 + R0.

    LDRH R1, [R0]               ; load halfword to R1 from R0.
    STRH R2, [R1, #0x80]        ; store halfword from R2 to R1 + 0x80.
    LDRH R11, [R0, R2]          ; load halfword into R11 from
                                ; address in R0 + R2.

    LDRSH R1, [R0, #2]!         ; load signed halfword to R1 from
                                ; R0 + 2, then R0 = R0 + 2.

  • P-133/151PAL/

    2.5 Load and Store Multiple Instructions

    Operation                   | Assembler                          | Action
    Pop, or Block data load     | LDM{cond} Rd{!}, <registers>       | Load list of registers from [Rd]
      return (and exchange)     | LDM{cond} Rd{!}, <registers+PC>    | Load registers, R15 := [address][31:1]
      and restore CPSR          | LDM{cond} Rd{!}, <registers+PC>^   | Load registers, branch and exchange
      User mode registers       | LDM{cond} Rd, <registers>^         | Load list of User mode registers from [Rd]
    Push, or Block data store   | STM{cond} Rd{!}, <registers>       | Store list of registers to [Rd]
      User mode registers       | STM{cond} Rd{!}, <registers>^      | Store list of User mode registers to [Rd]

    Examples
    STMFD R13!, {R0-R12, LR}
    LDMFD R13!, {R0-R12, PC}
    LDMIA R0, {R5-R8}
    STMDA R1!, {R2, R5, R11}

  • P-134/151PAL/

    2.5 Semaphore Instructions

    Operation | Assembler                 | Action
    Word      | SWP{cond} Rd, Rm, [Rn]    | temp := [Rn], [Rn] := Rm, Rd := temp
    Byte      | SWP{cond}B Rd, Rm, [Rn]   | temp := ZeroExtend([Rn][7:0]), [Rn][7:0] := Rm[7:0], Rd := temp

    Examples
    SWP R12, R10, [R9]      ; load R12 from address R9 and store
                            ; R10 to address R9.
    SWPB R3, R4, [R8]       ; load byte to R3 from address R8
                            ; and store byte from R4 to address R8.
    SWP R1, R1, [R2]        ; exchange value in R1 and address in R2.
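
    SWP is called a semaphore instruction because the read and the write form one indivisible step, which is enough to build a simple lock. The Python sketch below models memory as a dictionary and shows the usual pattern; the lock convention (0 = free, 1 = taken) and the helper names are assumptions made for the example, and real atomicity is of course provided by the hardware, not by this model.

```python
def swp(memory, rn, rm):
    """Model SWP Rd, Rm, [Rn]: return the old word and store rm in one step."""
    temp = memory[rn]
    memory[rn] = rm & 0xFFFFFFFF
    return temp


def try_lock(memory, lock_address):
    """Binary semaphore, 0 = free and 1 = taken. True if we acquired it."""
    return swp(memory, lock_address, 1) == 0


def unlock(memory, lock_address):
    memory[lock_address] = 0
```

    Whoever swaps a 1 in and gets a 0 back owns the lock; every other caller gets 1 back and knows the lock was already held, without having disturbed its state.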

  • P-135/151PAL/

    2.5 Exception-generating Instructions
    List of exception-generating instructions.
        BKPT: breakpoint.
        SWI: software interrupt.

    The Breakpoint (BKPT) instruction is used for software breakpoints in ARM architecture versions 5 and above.
    The Software Interrupt (SWI) instruction is used to cause a SWI exception to occur.

  • P-136/151PAL/

    2.5 Coprocessor Instructions
    List of coprocessor instructions.
        CDP: coprocessor data operation.
        LDC: load coprocessor register.
        MCR: move to coprocessor from ARM register.
        MRC: move to ARM register from coprocessor.
        STC: store coprocessor register.

  • P-137/151PAL/

    2.5 XScale (ARM V5TE) Instruction Set
    Overview of the XScale Instruction Set.
    The ARM Addressing Modes.
    The ARM V5TE Instruction Set.
    The Thumb Instruction Set.

  • P-138/151PAL/

    2.5 Thumb Instruction Set
    A re-encoded subset of the ARM instruction set.
    Designed to increase the performance of ARM implementations using a 16-bit or narrower memory data bus.
    Every Thumb instruction is encoded in 16 bits.
    Thumb execution is normally entered by executing an ARM BX instruction. On architecture V5 and above, the BLX instruction and LDR/LDM instructions that load the PC can be used similarly.

  • P-139/151PAL/

    2.5 Branch Instructions

    Operation                        Assembler        Action
    Conditional branch               B{cond} label    R15 := label. See Table Condition Field (ARM side). AL not allowed.
    Unconditional branch             B label          R15 := label
    Long branch with link            BL label         R14 := R15 - 2, R15 := label
    Branch and exchange              BX Rm            R15 := Rm AND 0xFFFFFFFE
    Branch with link and exchange    BLX label        R14 := R15 - 2, R15 := label, change to ARM
    Branch with link and exchange    BLX Rm           R14 := R15 - 2, R15 := Rm AND 0xFFFFFFFE

  • P-140/151PAL/

    2.5 Examples
    B   label     ; unconditionally branch to label.
    BCC label     ; branch to label if carry flag is clear.
    BEQ label     ; branch to label if zero flag is set.
    BL  func      ; subroutine call to func.

    func          ; include body of function here.
    MOV PC, LR    ; R15 = R14, return to instruction after the BL.
    BX  R12       ; branch to address in R12, begin ARM execution if bit 0 of R12 is zero;
                  ; otherwise continue executing Thumb code.
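The BX behavior described in the last example (bit 0 of the register selects the instruction set, and the branch target has bit 0 cleared) can be sketched as follows. The function name `bx` and the string labels for the two states are hypothetical and used only for illustration.

```python
def bx(rm_value):
    """Return (branch target, instruction-set state) for BX Rm."""
    # R15 := Rm AND 0xFFFFFFFE, as in the branch table.
    target = rm_value & 0xFFFFFFFE
    # Bit 0 clear: begin ARM execution. Bit 0 set: continue in Thumb.
    state = "Thumb" if rm_value & 1 else "ARM"
    return target, state


print(bx(0x8001))  # same target address, Thumb state
print(bx(0x8000))  # same target address, ARM state
```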

  • P-141/151PAL/

    2.5 Data-processing Instructions [1/4]

    Arithmetic instructions:

    Operation                           Assembler                Action
    Add                                 ADD Rd, Rn, #immed_3     Rd := Rn + immed_3
      Lo and Lo                         ADD Rd, Rn, Rm           Rd := Rn + Rm
      Hi to Lo, Lo to Hi, Hi to Hi      ADD Rd, Rm               Rd := Rd + Rm
      immediate                         ADD Rd, #immed_8         Rd := Rd + immed_8
      with carry                        ADC Rd, Rm               Rd := Rd + Rm + C-bit
      value to SP                       ADD SP, #immed_7*4       SP := SP + immed_7 * 4
      form address from SP              ADD Rd, SP, #immed_8*4   Rd := SP + immed_8 * 4
      form address from PC              ADD Rd, PC, #immed_8*4   Rd := (PC AND 0xFFFFFFFC) + immed_8 * 4
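The "form address from PC" row masks the low two bits of the PC before adding the scaled immediate, so the result is always word-aligned. A minimal sketch, assuming the usual Thumb convention that the PC reads as the address of the current instruction plus 4; the function name and parameters are hypothetical.

```python
def add_rd_pc(instr_addr, immed_8):
    """Model ADD Rd, PC, #immed_8*4: Rd := (PC AND 0xFFFFFFFC) + immed_8 * 4."""
    pc = instr_addr + 4                 # assumption: Thumb PC = instruction address + 4
    aligned = pc & 0xFFFFFFFC           # force word alignment
    return (aligned + immed_8 * 4) & 0xFFFFFFFF


# Two adjacent halfword-aligned instructions can form the same address.
print(hex(add_rd_pc(0x1000, 2)))
print(hex(add_rd_pc(0x1002, 2)))
```

Both calls yield the same word-aligned address, which shows why the mask is part of the definition: without it, an instruction at a non-word-aligned address would produce an unaligned result.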

  • P-142/151PAL/

    2.5 Data-processing Instructions [2/4]

    Arithmetic instructions:

    Operation           Assembler                Action
    Subtract            SUB Rd, Rn, Rm           Rd := Rn - Rm
      immediate 3       SUB Rd, Rn, #immed_3     Rd := Rn - immed_3
      immediate 8       SUB Rd, #immed_8         Rd := Rd - immed_8
      with carry        SBC Rd, Rm               Rd := Rd - Rm - NOT C-bit
      value from SP     SUB SP, #immed_7*4       SP := SP - immed_7 * 4
    Negate              NEG Rd, Rm               Rd := -Rm
    Multiply            MUL Rd, Rm               Rd := Rm * Rd
    Compare             CMP Rn, Rm               update CPSR flags on Rn - Rm
      negative          CMN Rn, Rm               update CPSR flags on Rn + Rm
      immediate         CMP Rn, #immed_8         update CPSR flags on Rn - immed_8
    No operation        NOP                      R8 := R8

  • P-143/151PAL/

    2.5 Data-processing Instructions [3/4]

    Move instructions:

    Operation                           Assembler            Action
    Move
      Immediate                         MOV Rd, #immed_8     Rd := immed_8
      Lo to Lo                          MOV Rd, Rm           Rd := Rm
      Hi to Lo, Lo to Hi, Hi to Hi      MOV Rd, Rm           Rd := Rm

    Logical instructions:

    Operation           Assembler        Action
    AND                 AND Rd, Rm       Rd := Rd AND Rm
    Exclusive OR        EOR Rd, Rm       Rd := Rd EOR Rm
    OR                  ORR Rd, Rm       Rd := Rd OR Rm
    Bit clear           BIC Rd, Rm       Rd := Rd AND NOT Rm
    Move NOT            MVN Rd, Rm       Rd := NOT Rm
    Test bits           TST Rn, Rm       update CPSR flags on Rn AND Rm

  • P-144/151PAL/

    2.5 Data-processing Instructions [4/4]

    Shift/Rotate instructions:

    Operation                   Assembler                Action
    Logical shift left          LSL Rd, Rm, #immed_5     Rd := Rm << immed_5
                                LSL Rd, Rs               Rd := Rd << Rs
    Logical shift right         LSR Rd, Rm, #immed_5     Rd := Rm >> immed_5
                                LSR Rd, Rs               Rd := Rd >> Rs
    Arithmetic shift right      ASR Rd, Rm, #immed_5     Rd := Rm ASR immed_5
                                ASR Rd, Rs               Rd := Rd ASR Rs
    Rotate right                ROR Rd, Rs               Rd := Rd ROR Rs
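The difference between the logical, arithmetic, and rotate forms is easiest to see on a value with the top bit set. The helpers below are an illustrative sketch on 32-bit values; their names are hypothetical, and they assume shift amounts from 0 to 31 only (the special cases for larger register-specified amounts are not modeled).

```python
MASK = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left: zeros enter from the right."""
    return (value << n) & MASK


def lsr(value, n):
    """Logical shift right: zeros enter from the left."""
    return (value & MASK) >> n


def asr(value, n):
    """Arithmetic shift right: copies of the sign bit enter from the left."""
    value &= MASK
    if value & 0x80000000:
        value -= 1 << 32        # reinterpret as a negative Python integer
    return (value >> n) & MASK


def ror(value, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    value &= MASK
    n %= 32
    return ((value >> n) | (value << (32 - n))) & MASK


print(hex(lsr(0x80000000, 4)), hex(asr(0x80000000, 4)), hex(ror(0x1, 1)))
```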

  • P-145/151PAL/

    2.5 Examples
    ADD R0, R4, R7    ; R0 = R4 + R7.
    SUB R6, R1, R2    ; R6 = R1 - R2.
    ADD R0, #255      ; R0 = R0 + 255.
    ADD R1, R4, #4    ; R1 = R4 + 4.
    NEG R3, R1        ; R3 = 0 - R1.
    AND R2, R5        ; R2 = R2 AND R5.
    EOR R1, R6        ; R1 = R1 EOR R6.
    CMP R7, #100      ; update flags after R7 - 100.
    MOV R0, R12       ; R0 = R12.
    ADD R8, R10       ; R8 = R8 + R10.
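The CMP example updates the CPSR flags from a subtraction whose result is discarded. A minimal sketch of that flag computation on 32-bit values follows; the function name and the dictionary return format are assumptions for illustration. Note that on ARM the carry flag after a subtraction means "no borrow".

```python
MASK = 0xFFFFFFFF


def cmp_flags(rn, operand):
    """Return the N, Z, C, V flags produced by CMP Rn, operand (Rn - operand)."""
    rn &= MASK
    operand &= MASK
    result = (rn - operand) & MASK
    return {
        "N": result >> 31,                                  # result is negative
        "Z": int(result == 0),                              # result is zero
        "C": int(rn >= operand),                            # no borrow occurred
        "V": ((rn ^ operand) & (rn ^ result)) >> 31,        # signed overflow
    }


print(cmp_flags(100, 100))   # equal values set Z and C
print(cmp_flags(5, 100))     # smaller value sets N and clears C
```

A following BEQ would be taken in the first case, and BCC in the second.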

  • P-146/151PAL/

    2.5 Load and Store Register Instructions [1/2]

    Load instructions:

    Operation                           Assembler                     Action
    Load
      with immediate offset, word       LDR Rd, [Rn, #immed_5*4]      Rd := [Rn + immed_5 * 4]
      halfword                          LDRH Rd, [Rn, #immed_5*2]     Rd := ZeroExtend([Rn + immed_5 * 2][15:0])
      byte                              LDRB Rd, [Rn, #immed_5]       Rd := ZeroExtend([Rn + immed_5][7:0])
      with register offset, word        LDR Rd, [Rn, Rm]              Rd := [Rn + Rm]
      halfword                          LDRH Rd, [Rn, Rm]             Rd := ZeroExtend([Rn + Rm][15:0])
      signed halfword                   LDRSH Rd, [Rn, Rm]            Rd := SignExtend([Rn + Rm][15:0])
      byte                              LDRB Rd, [Rn, Rm]             Rd := ZeroExtend([Rn + Rm][7:0])
      signed byte                       LDRSB Rd, [Rn, Rm]            Rd := SignExtend([Rn + Rm][7:0])
      PC-relative                       LDR Rd, [PC, #immed_8*4]      Rd := [(PC AND 0xFFFFFFFC) + immed_8 * 4]
      SP-relative                       LDR Rd, [SP, #immed_8*4]      Rd := [SP + immed_8 * 4]
      Multiple                          LDMIA Rn!, {reglist}          Loads list of registers
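The ZeroExtend and SignExtend operations in the Action column decide how a loaded byte or halfword fills the 32-bit destination register. A small sketch of both follows; the helper names are hypothetical.

```python
def zero_extend(value, bits):
    """Keep the low `bits` bits and fill the upper bits with zeros (LDRB, LDRH)."""
    return value & ((1 << bits) - 1)


def sign_extend(value, bits):
    """Keep the low `bits` bits and copy the top one into the upper bits (LDRSB, LDRSH)."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value |= 0xFFFFFFFF & ~((1 << bits) - 1)
    return value


# The same memory byte 0x80 read by LDRB and by LDRSB.
print(hex(zero_extend(0x80, 8)), hex(sign_extend(0x80, 8)))
```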

  • P-147/151PAL/

    2.5 Load and Store Register Instructions [2/2]

    Store instructions:

    Operation                           Assembler                     Action
    Store
      with immediate offset, word       STR Rd, [Rn, #immed_5*4]      [Rn + immed_5 * 4] := Rd
      halfword                          STRH Rd, [Rn, #immed_5*2]     [Rn + immed_5 * 2][15:0] := Rd[15:0]
      byte                              STRB Rd, [Rn, #immed_5]       [Rn + immed_5][7:0] := Rd[7:0]
      with register offset, word        STR Rd, [Rn, Rm]              [Rn + Rm] := Rd
      halfword                          STRH Rd, [Rn, Rm]             [Rn + Rm][15:0] := Rd[15:0]
      byte                              STRB Rd, [Rn, Rm]             [Rn + Rm][7:0] := Rd[7:0]
      SP-relative, word                 STR Rd, [SP, #immed_8*4]      [SP + immed_8 * 4] := Rd
      Multiple                          STMIA Rn!, {reglist}          Stores list of registers

  • P-148/151PAL/

    2.5 Examples
    LDR   R4, [R2, #4]          ; load word into R4 from address R2+4.
    STR   R0, [R7, #12]         ; store word from R0 to address R7+12.
    STRB  R1, [R5, #31]         ; store byte from R1 to address R5+31.
    STRH  R4, [R2, R3]          ; store halfword from R4 to address R2+R3.
    LDMIA R7!, {R0-R3,R5}       ; load R0-R3 and R5 from the address in R7, then add 20 to R7.
    STMIA R0!, {R3,R4,R5}       ; store R3-R5 to the address in R0, then add 12 to R0.
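The "add 20 to R7" in the LDMIA example is the base-register writeback: five registers are loaded, four bytes each. A sketch of that behavior, assuming a little-endian `bytearray` memory and registers held in a list indexed by register number (both are illustrative choices, as is the function name):

```python
def ldmia(regs, mem, rn, reglist):
    """Model LDMIA Rn!, {reglist}: load words from ascending addresses, then write back."""
    addr = regs[rn]
    # The lowest-numbered register is loaded from the lowest address.
    for r in sorted(reglist):
        regs[r] = int.from_bytes(mem[addr:addr + 4], "little")
        addr += 4
    regs[rn] = addr             # writeback: Rn advances by 4 * number of registers


regs = [0] * 16
regs[7] = 0x10
mem = bytearray(64)
for i in range(5):              # place the words 1..5 starting at address 0x10
    mem[0x10 + 4 * i:0x14 + 4 * i] = (i + 1).to_bytes(4, "little")

ldmia(regs, mem, 7, [0, 1, 2, 3, 5])
print(regs[0:6], hex(regs[7]))
```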

  • P-149/151PAL/

    2.5 Stack Operation and Exception Instructions

    Push & pop instructions:

    Operation                         Assembler             Action
    Push                              PUSH {reglist}        Push registers onto stack
    Push with link                    PUSH {reglist, LR}    Push LR and registers onto stack
    Pop                               POP {reglist}         Pop registers from stack
    Pop and return                    POP {reglist, PC}     Pop registers, branch to address loaded to PC
    Pop and return with exchange      POP {reglist, PC}     Pop, branch, and change to ARM state if address[0] = 0

    Software interrupt & breakpoint instructions:

    Operation                         Assembler             Action
    Software interrupt                SWI immed_8           Software interrupt processor exception
    Breakpoint                        BKPT immed_8          Prefetch abort or enter debug state

  • P-150/151PAL/

    2.5 Examples
    function
        PUSH {R0-R7, LR}    ; push R0-R7 and the return address onto the stack (R13).
        ...                 ; code of the function body.
        POP  {R0-R7, PC}    ; restore R0-R7 and the program counter from the stack, and return.
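The PUSH/POP pair in the function example works because LR is stored in the highest stack slot and that same slot is later loaded into PC. The sketch below models a full descending stack on R13 with a little-endian `bytearray` memory; the function names and data layout are assumptions for illustration, and the interworking bit of the value loaded into PC is not modeled.

```python
def push(regs, mem, reglist):
    """Model PUSH: decrement SP, then store registers at ascending addresses."""
    sp = regs[13] - 4 * len(reglist)
    regs[13] = sp
    for r in sorted(reglist):
        mem[sp:sp + 4] = (regs[r] & 0xFFFFFFFF).to_bytes(4, "little")
        sp += 4


def pop(regs, mem, reglist):
    """Model POP: load registers from ascending addresses, then increment SP."""
    sp = regs[13]
    for r in sorted(reglist):
        regs[r] = int.from_bytes(mem[sp:sp + 4], "little")
        sp += 4
    regs[13] = sp


regs = list(range(16))
regs[13], regs[14] = 64, 0x8001          # stack pointer and return address
mem = bytearray(64)
push(regs, mem, list(range(8)) + [14])   # PUSH {R0-R7, LR}
regs[0:8] = [0xDEAD] * 8                 # the function body clobbers R0-R7
pop(regs, mem, list(range(8)) + [15])    # POP {R0-R7, PC}
print(regs[0:8], hex(regs[15]), regs[13])
```

After the pop, R0-R7 hold their original values, the PC slot holds the saved return address, and the stack pointer is back where it started.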
