Chapter 2: Structures of the Embedded Processors
Professor Tzyy-Kuen Tien
E-mail: [email protected]
http://www.eecs.stut.edu.tw
STUT/EE
P-2/151PAL/
Outline
2.1 Characteristics of Embedded Processors
2.2 MIPS, ARM, and Cell Processors
2.3 Introduction to ARM 32-bit CPU Core Family
2.4 Intel XScale Core
2.5 XScale (ARM V5TE) Instruction Set

2.1 Characteristics of Embedded Processors
On the Meaning of Embedded Systems
Embedded System Requirements
Embedded Processors
Characteristics of Embedded Processors

2.1 Digital Systems
Classification of digital systems.
General-purpose systems.
  The systems are not customized for any specific application.
  Examples of general-purpose systems include desktop computers, workstations, and server systems.
Application-specific systems.
  The systems are designed for dedicated applications.
  The systems can be found in process control, networking, home appliances, consumer-electronics devices, etc.

2.1 Embedded System
An embedded system usually includes an application-specific system and some non-electronic or electronic systems.
Definition of embedded systems.
  Embedded systems are (inexpensive) mass-produced elements of a larger system, providing a dedicated, possibly time-constrained, service to that system.
Examples of embedded systems include fax machines, cell phones, printers, CD players, etc.

2.1 An Embedded System Example
An electronic system is embedded within an external process (plant) comprising a physical system and human operators who perform supervision and parameter-setting activities.

2.1 Embedded System Requirements
Functional requirements.
Temporal requirements.
Dependability requirements.

2.1 Functional Requirements
Data gathering.
  To obtain information from the embedded system or the environment surrounding the embedded system.
Data transformation.
  To display, send, or operate on the digital data, etc.
Data control.
  Based on the transformed data, a decision is taken to act on the environment.

2.1 Temporal Requirements
Some tasks have deadlines.
All tasks with hard real-time deadlines are critical: failing to complete any of them has catastrophic results.

2.1 Dependability Requirements
Reliability measures.
  The time the embedded system stays operational within a certain time span.
Maintainability measures.
  The time to repair a system after a failure occurrence.
Availability measures.
  The fraction of time that the embedded system is available to provide its services with respect to the total time.

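The availability measure is a simple ratio, so it can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the function name `availability` and the sample figures (999 hours in service, 1 hour under repair) are illustrative assumptions, not values from the slides.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return the fraction of total time the system was available."""
    total = uptime_hours + downtime_hours
    if uptime_hours < 0 or downtime_hours < 0 or total == 0:
        raise ValueError("times must be non-negative and not both zero")
    return uptime_hours / total


if __name__ == "__main__":
    # Hypothetical figures: 999 hours of service, 1 hour spent on repair.
    print(f"availability = {availability(999.0, 1.0):.3f}")
```

A shorter repair time (better maintainability) shrinks the downtime term and therefore raises availability.
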
2.1 Embedded Processor
Definition of an embedded processor.
  An embedded processor is an (inexpensive) mass-produced processing element of a larger system providing dedicated computations and other, possibly real-time, services to that system.

2.1 Characteristics of Embedded Processors
Embedded processors are application-specific processors.
  The processors can be and should be optimized for the applications.
  Hardware/software co-design methodologies can be applied.
Embedded processors have a static structure.
  Limited access to programming.
Embedded processors are non-homogeneous processors.
  Induced by the heterogeneous character of the process within which the processor is embedded.

2.1 Heterogeneous Characters
Both the digital sub-processors may be present in the system.
The topology of the system is rather irregular.
The hardware may include microprocessors, microcontrollers, DSPs, ASICs, FPGAs, etc.
The software may include various software modules as well as a multitasking real-time operating system.

2.2 MIPS, ARM, and Cell Processors
MIPS processors
ARM processors
Cell processors

2.2 MIPS 32-bit Cores
[Figure: MIPS 32-bit core roadmap. Entry-level cores: M4K, 4K, 4KE, 4KSd. High-performance cores: 24K and 24KE (single-threaded), 34K (multi-threaded), NextGen. Chart annotations: microcontroller replacement, 225 MHz @ < 0.5 mm² in 130 nm G; mainstream embedded processor, 233 MHz at 130 nm G; most secure licensable core; for consumer devices; fastest single-threaded performance, 400 MHz @ 130 nm G, 625 MHz @ 130 nm LV lowK-OD; host-based signal processing, SIMD up to 3x signal processing performance; MT + DSP extensions, up to 2x system-level performance, 500 MHz @ 90 nm G.]

2.2 MIPS32 34K Family
Designed to exploit multi-threading in embedded applications.
Processing multiple software threads in parallel.
[Figure: MIPS32 34K block diagram. Blocks: fetch unit, TC dispatch unit, execution unit, MDU, FPU, load/store unit, MMU, policy manager (for QoS), power management, ITC, BIU, I-cache (0-64K), D-cache (0-64K), I-SPRAM, D-SPRAM, CorExtend, coprocessor interface, EJTAG trace, off-chip debug interface, OCP interface, OCP-DMA interfaces. Some blocks are marked optional.]

2.2 MIPS32 24K Family
Designed to power through graphics, Java, and demanding code, with features such as an ultra-fast multiply.
[Figure: MIPS32 24K block diagram. Blocks: Mul/Div unit, MIPS32 32-bit execution unit, optional FPU, CorExtend unit, MMU control, FMT or TLB, data cache, cache controller, instruction cache, power management, BIU for 64-bit interface, EJTAG on-chip debugging, OCP interface, TAP interface.]

2.2 MIPS32 24KE Family
Incorporates the MIPS DSP Application Specific Extension (ASE) into the MIPS32 24K.
Provides efficient DSP capability while reducing overall SoC die area, cost, and power consumption.
[Figure: MIPS32 24KE block diagram. Blocks: Mul/Div unit, MIPS32 32-bit execution unit, DSP ASE, optional FPU, optional CorExtend unit, MMU control, FMT or TLB, data cache, cache controller, instruction cache, power management, BIU for 64-bit interface, EJTAG on-chip debugging, OCP interface, TAP interface.]

2.2 MIPS32 4K Family
Designed for SoC applications that require an easy-to-use and cost-efficient embedded processor.
[Figure: MIPS32 4K block diagram. Blocks: multiply/divide unit, register file, branch control, ALU/shifter, instruction cache, data cache, cache control, MMU, BIU, power management, EJTAG.]

2.2 MIPS32 4KE Family
The mainstream embedded processor of the MIPS32 family.
[Figure: MIPS32 4KE block diagram. Blocks: Mul/Div unit, MIPS32 32-bit execution unit, user-defined coprocessor, MMU control, FMT or TLB, data cache, cache controller, instruction cache, power management, BIU for 32-bit EC interface, EJTAG on-chip debugging, EC interface, COP interface, TAP interface.]

2.2 MIPS32 4KSd Family
Designed for emerging secure data applications and the stringent power, security, and size requirements of smart cards.
[Figure: MIPS32 4KSd processor core block diagram. Blocks: execution core, secure MMU, security, TLB, instruction cache, secure cache control, data cache, EJTAG, BIU, power management, coprocessor interface 2, on-chip bus(es). The legend distinguishes fixed and configurable blocks.]

2.2 MIPS32 M4K Core
The M4K core enables designers to meet the high system throughput demands of multi-CPU SoC designs while controlling silicon cost.
[Figure: MIPS32 M4K block diagram. Blocks: Mul/Div unit, MIPS32 32-bit execution unit, coprocessor 2 interface, MMU control, FMT, SRAM interface, on-chip SRAM, power management, EJTAG on-chip debugging, TAP interface.]

2.2 MIPS64 5K Family
The series consists of synthesizable 64-bit MIPS processor cores designed for SoC applications.
MIPS64 5Kc core: suited for digital consumer, networking, office automation, and embedded applications.
MIPS64 5Kf core: includes a fully pipelined IEEE 754-compliant floating-point unit with the multiply/add (MADD) instruction.
[Figure: block diagrams labeled MIPS64 5Kf and MIPS64 5Kc. 5Kf labels: Mul/Div unit, MIPS32 32-bit execution unit, coprocessor 2 interface, MMU control, FMT, SRAM interface, on-chip SRAM, power management, EJTAG on-chip debugging, TAP interface. 5Kc labels: multiply/divide unit, register file, branch control, ALU/shifter, instruction cache, data cache, cache control, MMU, BIU, power management, EJTAG, coprocessor interface.]

2.2 MIPS64 20Kc Core
A 64-bit embedded core providing a MIPS-3D enhanced floating-point unit.
[Figure: MIPS64 20Kc block diagram.
  Instruction fetch unit: PC generation, branch history table, return prediction stack, 32 KB I-cache, prefetch buffer.
  Instruction dispatch unit: fetch buffer, decode, dispatch, pipe queue.
  Integer execution unit: ALU a, ALU b, shifter, Multi/Div (MAC), address generator, branch, bypass and pipefile, registers.
  MIPS-3D SIMD FPU: add, multiply, div/sqrt, bypass and pipefile, registers.
  MMU: joint TLB, uTLB, MMU control.
  Load/store unit: 32 KB D-cache, load/store control, write buffer, fill/store buffer, memory transaction queue.
  SoC interface: AddrOut, DataOut, AddrIn, DataIn.
  Other labels: pipe A, pipe B, power management and clock control, EJTAG.]

2.2 Introduction to ARM
ARM: Advanced RISC Machines.
ARM Limited: Advanced RISC Machines Limited.
Characteristics of ARM processors.
  Low die area, low power, low cost, and high performance.
  The most widely used 16/32-bit embedded RISC solution in the world.
ARM products.
  ARM7, ARM9, ARM9E, ARM10, ARM11, SecurCore.
  Intel StrongARM, XScale.

2.2 ARM History (1/2)
1985: Acorn Computer Group developed the world's first commercial RISC processor.
1987: Acorn's ARM processor debuts as the first RISC processor for low-cost PCs.
1990: ARM Ltd. was established as a separate company.
1991: ARM introduced its first embeddable RISC core, the ARM6 solution.
1993: ARM introduced the ARM7 core.
1995: ARM announced the Thumb architecture extension.
1997: ARM9TDMI family announced.
1998: ARM announced the ARM10 Thumb family of processors.
2000: ARM introduced the ARM922T core. ARM supported Intel's launch of the ARM architecture-compliant XScale microarchitecture.

2.2 ARM History (2/2)
2001: ARM announced the new ARMv6 architecture.
2002: ARM launched the ARM11 microarchitecture.
2003: AMBA 3.0 (AXI) methodology announced.
2004: ARM Cortex-M3 processor announced.
2005: ARM Cortex-A8 processor announced.
2006: Cortex-R4 processor announced.

2.2 ARM Architecture Revisions
[Figure: timeline of ARM architecture revisions, 1994 to 2006. Architecture labels: V4, ARM v5, ARM v6. Core labels: ARM7TDMI, ARM720T, StrongARM, ARM920T, ARM1020, ARM9E, XScale, ARM926EJ, ARM10EJ, ARM1022E, V6 cores.]

2.2 ARM7 Processors (1/2)
ARM7 processors.
  ARM7TDMI, ARM7TDMI-S, ARM7EJ-S, ARM720T.
Characteristics.
  EmbeddedICE-RT debug logic.
  Very low power consumption.
  0.9 MIPS/MHz, three-stage pipeline.
A von Neumann machine architecture.
  Each successive operation can read or write any memory location, independent of the location accessed by the previous operation.
  A von Neumann machine also has a CPU with one or more registers that hold the data being operated on.

2.2 ARM7 Processors (2/2)
[Figure: the ARM7 Thumb family.
  ARM7TDMI: integer core.
  ARM7TDMI-S: synthesizable integer core.
  ARM7EJ-S: Jazelle DBX enabled core.
  ARM720T: open platform processor core.
  Feature labels: Thumb extensions, ARM7 core, ARM v4T, ARM v5TEJ, Jazelle DBX extensions, DSP extensions, ETM7 interface, ETM9 interface, EmbeddedICE-RT, AHB interface, 8K cache, MMU.]

2.2 ARM7TDMI and ARM7TDMI-S
The ARM7TDMI core is the industry's most widely used 32-bit RISC embedded microprocessor.
ARM7TDMI-S is the synthesizable version of ARM7TDMI.
[Figure: ARM7TDMI block diagram. Blocks: ARM7TDMI core, coprocessor interface, bus interface unit, ETM interface, EmbeddedICE-RT logic.]

2.2 ARM7EJ-S and ARM720T
[Figure: block diagrams of the ARM7EJ-S and the ARM720T. Labels: ARM v5TEJ CPU core, ARM7TDMI core, ETM interface, coprocessor interface, control logic and bus interface unit, write buffer, cache, MMU, AMBA AHB bus.]

2.2 ARM9 Processor Cores (1/2)
ARM9 processor cores.
  ARM920T, ARM922T, ARM940T.
Characteristics.
  5-stage integer pipeline achieves 1.1 MIPS/MHz.
  Single 32-bit AMBA bus interface.
  Excellent debug support for SoC designers, including ETM interface.
  MMU supporting Windows CE, Symbian OS, Linux, Palm OS.

2.2 ARM9 Processor Cores (2/2)
ARM920T
  Dual 16k caches for applications running Symbian OS, Palm OS, Linux and Windows CE.
ARM922T
  Dual 8k caches for applications running Symbian OS, Palm OS, Linux and Windows CE.
ARM940T
  Dual 4k caches for embedded control applications running an RTOS.

2.2 ARM920T & ARM922T Cores (1/2)
[Figure: feature stacks of the ARM920T and ARM922T open platform processor cores. Both: ARM9 core, ARM v4T, Thumb extensions, MMU, ETM9 interface, EmbeddedICE, ASB interface. ARM920T: dual 16K caches. ARM922T: dual 8K caches.]

2.2 ARM920T & ARM922T Cores (2/2)
[Figure: block diagrams of the ARM920T and ARM922T. Both: ARM9TDMI core, ETM interface, coprocessor interface, instruction cache with MMU, data cache with MMU, write buffer, control logic and bus interface unit, AMBA ASB interface. ARM920T: 16K instruction cache and 16K data cache. ARM922T: 8K instruction cache and 8K data cache.]

2.2 ARM9E Processor Family (1/2)
ARM9E processors.
  ARM926EJ-S, ARM946E-S, ARM966E-S, ARM968E-S and ARM996HS.
Characteristics.
  DSP-enhanced 32-bit RISC processors.
  Enable single-processor solutions for microcontroller, DSP and Java applications.
  Offer savings in chip area and complexity, power consumption, and time-to-market.
  Suited for applications requiring a mix of DSP and microcontroller performance.

2.2 ARM9E Processor Family (2/2)
[Figure: the ARM9E family: ARM968E-S, ARM966E-S, ARM946E-S, ARM926EJ-S. Feature labels: ARM9E and ARM9EJ cores, ETM9 interfaces, TCM interfaces, 0-4MB TCMs, 0-64MB TCMs, 0-1MB TCMs, 0-1MB caches, 4-128KB caches, MMU, protection unit, dual AHB-Lite, AHB interface, dual AHB.]

2.2 ARM946E-S & ARM926EJ-S
[Figure: block diagrams of the ARM946E-S and ARM926EJ-S.
  ARM946E-S: ARM9E core, ETM interface, coprocessor interface, instruction cache, data cache, MPU, instruction TCM interface, data TCM interface, write buffer, control logic and bus interface unit, AMBA AHB interface.
  ARM926EJ-S: ARM9EJ-S core, ETM interface, coprocessor interface, instruction cache, data cache (labeled 16K), MMU, instruction TCM interface, data TCM interface, write buffer, control logic and bus interface unit, instruction and data AMBA AHB interfaces.]

2.2 ARM966E-S & ARM968E-S
[Figure: block diagrams of the ARM966E-S and ARM968E-S. Both: ARM9E core, ETM interface, coprocessor interface, instruction TCM interface, data TCM interface, write buffer, control logic and bus interface unit, AMBA AHB interface. The ARM968E-S diagram also shows an arbitration block.]

2.2 ARM10E Processors (1/2)
ARM10E processors.
  ARM1020E, ARM1022E and ARM1026EJ-S.
Characteristics.
  Offer an excellent combination of high performance and low power consumption.
  Include new architectural features to deliver the highest MIPS/MHz of any ARM product.
  Feature new power-saving modes, a 64-bit load-store micro-architecture, and an IEEE 754-compatible floating-point coprocessor with vector operations.

2.2 ARM10E Processors (2/2)
The ARM1022E is identical to the ARM1020E macrocell but has 16K/16K caches, while the ARM1020E has 32K/32K caches.
[Figure: block diagrams of the ARM1020E and ARM1026EJ-S.
  ARM1020E: ARM10 core, ETM interface, coprocessor interface, instruction cache with MMU, data cache with MMU, write buffer, control logic and bus interface unit.
  ARM1026EJ-S: ARM1026EJ-S core, ETM10C interface, instruction cache, data cache, MMU, MPU, instruction TCM interface, data TCM interface, write buffer, control logic and bus interface unit.
  Other labels: instruction and data AMBA AHB interfaces, VFP10 interface, WC interface.]

2.2 Key Attributes of Cell Processor (1/2)
Cell is multi-core.
  Contains a 64-bit Power Architecture core.
  Contains 8 Synergistic Processor Elements (SPEs).
Cell is a flexible architecture.
  Multi-OS support (including Linux) with virtualization technology.
  Path for OS, legacy apps, and software development.
Cell is a broadband architecture.
  The SPE is a RISC architecture with SIMD organization and Local Store.
  128+ concurrent transactions to memory per processor.

2.2 Key Attributes of Cell Processor (2/2)
Cell is a real-time architecture.
  Resource allocation (for bandwidth measurement).
  Locking caches (via Replacement Management Tables).
Cell is a security-enabled architecture.
  SPEs are dynamically reconfigurable as secure processors.

2.2 Cell Chip Block Diagram
[Figure: Cell chip block diagram. Eight SPEs, each with an SPU execution unit (labeled SUX), a local store (LS), and an SMF; the EIB (up to 96 bytes/cycle); the PPE with PXU, L1, and L2; the MIC connected to dual XDR; the BIC connected to FlexIO.]

2.2 Cell Prototype Die
[Figure: Cell prototype die.]

2.2 Cell Highlights
Observed clock speed: > 4 GHz.
Peak performance (single precision): > 256 GFlops.
Peak performance (double precision): > 26 GFlops.
Area: 221 mm².
Technology: 90 nm Silicon on Insulator (SOI).
Total number of transistors: 234M.

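The single-precision peak figure can be reproduced from numbers quoted elsewhere in this section (8 SPEs, 128-bit SIMD, a clock of about 4 GHz). The sketch below is a back-of-the-envelope check, not a statement of how the figure was originally derived: it assumes each SPE completes one SIMD fused multiply-add per cycle and counts that as 2 floating-point operations per lane. The function name `peak_gflops` is hypothetical.

```python
def peak_gflops(num_units: int, simd_bits: int, element_bits: int,
                flops_per_lane_per_cycle: int, clock_ghz: float) -> float:
    """Peak throughput = units x SIMD lanes x flops per lane per cycle x clock."""
    lanes = simd_bits // element_bits
    return num_units * lanes * flops_per_lane_per_cycle * clock_ghz


if __name__ == "__main__":
    # Assumption: 2 flops per lane per cycle (one fused multiply-add).
    estimate = peak_gflops(num_units=8, simd_bits=128, element_bits=32,
                           flops_per_lane_per_cycle=2, clock_ghz=4.0)
    print(f"estimated single-precision peak: {estimate} GFlops")
```

With a clock slightly above 4 GHz, the same product lands above 256 GFlops, which is consistent with the figure quoted on the slide.
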
2.2 Element Interconnect Bus
EIB data ring for internal communication.
  Four 16-byte data rings, supporting multiple transfers.
  96 B/cycle peak bandwidth.
  Over 100 outstanding requests.

2.2 Power Processor Element
The PPE handles operating system and control tasks.
  64-bit Power Architecture.
  2-way hardware multi-threading.
  Coherent load/store with 32 KB I and D L1 caches and a 512 KB L2 cache.

2.2 Synergistic Processor Element
The SPE provides computational performance.
  Up to 16-way 128-bit SIMD.
  Dedicated resources: 128 x 128-bit register file, 256 KB Local Store.
  Dedicated DMA engine: up to 16 outstanding requests.

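The "up to 16-way" figure follows from dividing the 128-bit SIMD width by the element size. A minimal sketch, with the hypothetical helper name `simd_lanes`:

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one SIMD register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return register_bits // element_bits


if __name__ == "__main__":
    for element_bits in (8, 16, 32):
        lanes = simd_lanes(128, element_bits)
        print(f"{element_bits}-bit elements: {lanes}-way")
```

The 16-way case corresponds to 8-bit elements; wider elements give proportionally fewer lanes per instruction.
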
2.2 SPE Organization
[Figure: SPE organization. Blocks: floating point unit, fixed point unit, permute unit, channel unit, load/store, forwarding macro, register file, issue control, instruction buffer, read data latch, DMA read data latch, DMA write data latch, DMA unit, on-chip interconnects. The even pipe and the odd pipe each take 3 x 16B operands and produce a 16B result. Data path labels: 16B load/store, 2 instructions, 128B line read, 128B line write, 64B read transfer, 8B DMA out bus, 8B DMA in bus.]

2.2 I/O and Memory Interfaces
I/O provides wide bandwidth.
  Dual XDR controller.
  Two configurable interfaces.
  Flexible bandwidth between interfaces.
  Allows for multiple system configurations.

2.2 Cell Processor Example Application Areas
Cell is a processor that excels at processing rich media content in the context of broad connectivity.
  Digital content creation.
  Game playing and game serving.
  Distribution of (dynamic, media-rich) content.
  Imaging and image processing.
  Image analysis (e.g. video surveillance).
  Next-generation physics-based visualization.
  Streaming applications (codecs etc.).
  Physical simulation and science.

2.3 ARM 32-bit CPU Family (1/2)
ARM7 family.
  Application core: ARM720T.
  Embedded cores: ARM7EJ-S, ARM7TDMI, ARM7TDMI-S.
ARM9 family.
  Application cores: ARM920T, ARM922T.
ARM9E family.
  Application core: ARM926EJ-S.
  Embedded cores: ARM946E-S, ARM966E-S, ARM968E-S, ARM996HS.
ARM10E family.
  Application cores: ARM1020E, ARM1022E, ARM1026EJ-S.
  Embedded core: ARM1026EJ-S.

2.3 ARM 32-bit CPU Family (2/2)
ARM11 family.
  Application cores: ARM11 MPCore, ARM1136J(F)-S, ARM1176JZ(F)-S.
  Embedded core: ARM1156T2(F)-S.
T: 16-bit Thumb instruction set.
D: On-chip debug.
M: Enhanced multiplier.
I: EmbeddedICE hardware, interrupt and test.
S: Synthesizable.
E: DSP application.
J: Jazelle technology for efficient embedded Java execution.
F: Integrated floating-point coprocessor.

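The suffix legend can be applied mechanically to a core name. The sketch below is a deliberately naive illustration: the table `SUFFIX_MEANINGS` and the function `decode_suffix` are hypothetical names, the meanings are copied from the legend on this slide, and any character not in the legend (digits, hyphens, parentheses, or letters such as Z) is simply skipped.

```python
import re

# Meanings taken from the slide's legend; anything else is ignored.
SUFFIX_MEANINGS = {
    "T": "16-bit Thumb instruction set",
    "D": "on-chip debug",
    "M": "enhanced multiplier",
    "I": "EmbeddedICE hardware, interrupt and test",
    "S": "synthesizable",
    "E": "DSP application",
    "J": "Jazelle technology",
    "F": "integrated floating-point coprocessor",
}


def decode_suffix(core_name: str) -> list[tuple[str, str]]:
    """Split 'ARM<digits><suffix>' and look up each known suffix letter."""
    match = re.fullmatch(r"ARM(\d+)(.*)", core_name)
    if match is None:
        raise ValueError(f"not an ARM core name: {core_name!r}")
    suffix = match.group(2)
    return [(ch, SUFFIX_MEANINGS[ch]) for ch in suffix if ch in SUFFIX_MEANINGS]


if __name__ == "__main__":
    for letter, meaning in decode_suffix("ARM7TDMI-S"):
        print(f"{letter}: {meaning}")
```

Because the lookup is letter by letter, it does not interpret multi-character markers such as "T2" in ARM1156T2(F)-S; it only reports the letters the legend defines.
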
2.3 ARM Instruction Set Architecture (1/4)
ARMv4 is the oldest version of the architecture still supported by ARM.
  A 32-bit ISA operating in a 32-bit address space.
  Implementation: ARM7 and ARM9 core families, Intel StrongARM.
The ARMv4T architecture added the 16-bit Thumb instruction set to ARMv4.

2.3 ARM Instruction Set Architecture (2/4)
Features supported by ARMv5, ARMv6, and ARMv7.
[Figure: architecture features by version. Version labels: ARMv5, ARMv6, ARMv7 A&R, ARMv7 M. Feature labels: Jazelle, VFPv2, SIMD, TrustZone, Thumb-2 (option), Thumb-2 (mandated), Thumb-2 only, NEON advanced SIMD, VFPv3, dynamic compiler support.]

2.3 ARM Instruction Set Architecture (3/4)
ARMv5TE
  Improves the Thumb architecture and adds the ARM Enhanced DSP instruction set extensions to the ARM ISA (in 1999).
  Implementation: ARM9E, ARM10E families.
ARMv5TEJ
  Adds the Jazelle extension to support Java acceleration technology (in 2000).
ARMv6
  Announced in 2001, with better support for multiprocessing environments.
  Includes media instructions to support Single Instruction Multiple Data (SIMD) software execution.
  Implementation: ARM11 core family.

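The idea behind the ARMv6 media instructions is that one 32-bit register is treated as several narrow lanes that are processed together. The Python sketch below emulates a packed unsigned 8-bit add in the spirit of instructions such as UADD8; it is an illustration of the concept only, it ignores the status flags a real instruction would update, and the function name `packed_add_u8` is hypothetical.

```python
def packed_add_u8(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes packed in 32-bit words.

    Each lane wraps modulo 256, and a carry never spills into the next lane.
    """
    result = 0
    for lane in range(4):
        shift = 8 * lane
        lane_sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF)
        result |= (lane_sum & 0xFF) << shift
    return result


if __name__ == "__main__":
    # Lanes of a, low to high: 0x10, 0x7F, 0xFF, 0x01; each gets 0x01 added.
    print(hex(packed_add_u8(0x01FF7F10, 0x01010101)))
```

A plain 32-bit add of the same operands would let the overflow from the 0xFF lane carry into its neighbor, which is exactly what lane-wise SIMD arithmetic avoids.
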
2.3 ARM Instruction Set Architecture (4/4)
ARMv7
  Defines three distinct processor profiles: the A profile for sophisticated, virtual memory-based OS and user applications; the R profile for real-time systems; and the M profile optimized for microcontroller and low-cost applications.
  Implements Thumb-2 technology.
  Includes the NEON technology extensions to increase DSP and media processing throughput.
  Implementation: ARM Cortex-XX cores.
Vector Floating Point (VFP).
  Vector Floating Point (VFP) coprocessor.
ARM TrustZone.
  Provides hardware support for two separate address spaces.
  Provides a secure environment.

2.3 ARM7 Family (1/3)
ARM7TDMI: integer processor.
ARM7TDMI-S: synthesizable version of the ARM7TDMI processor.
ARM7EJ-S: synthesizable core with DSP and Jazelle technology enhancements for Java acceleration.
ARM720T: cached core with Memory Management Unit (MMU) supporting operating systems including Windows CE, Palm OS, Symbian OS and Linux.
Established, high-volume 32-bit RISC architecture.
Up to 130 MIPS (Dhrystone 2.1) performance on a typical 0.13 µm process.
Small die size and very low power consumption.
High code density, comparable to a 16-bit microcontroller.

2.3 ARM7 Family (2/3)
Wide operating system and RTOS support, including Windows CE, Palm OS, Symbian OS, Linux and market-leading RTOSs.
Wide choice of development tools.
Simulation models for leading EDA environments.
Excellent debug support for SoC designers, including ETM interface.
Multiple sourcing from industry-leading silicon vendors.
Availability in 0.25 µm, 0.18 µm and 0.13 µm processes.
Migration and support across new process technologies.
Code is forward-compatible to ARM9, ARM9E and ARM10 processors as well as Intel's XScale technology.

2.3 ARM7 Family (3/3)
Performance characteristics.

Core        Cache Size (Inst/Data)  Tightly Coupled Memory  Memory Mgt  Bus Interface  Thumb  DSP  Jazelle
ARM720T     8K unified              -                       MMU         AHB            Yes    No   No
ARM7EJ-S    -                       -                       -           Yes            Yes    Yes  Yes
ARM7TDMI    -                       -                       -           Yes**          Yes    No   No
ARM7TDMI-S  -                       -                       -           Yes            Yes    No   No

Applications
  Personal audio (MP3, WMA, AAC players).
  Entry-level wireless handsets.
  Two-way pagers.

2.3 ARM9 Family (1/2)
ARM920T: dual 16k caches for applications running Symbian OS, Palm OS, Linux and Windows CE.
ARM922T: dual 8k caches for applications running Symbian OS, Palm OS, Linux and Windows CE.
32-bit RISC processor with ARM and Thumb instruction sets.
5-stage integer pipeline achieves 1.1 MIPS/MHz.
Up to 300 MIPS (Dhrystone 2.1) in a typical 0.13 µm process.
Single 32-bit AMBA bus interface.
MMU supporting Windows CE, Symbian OS, Linux, Palm OS.
Integrated instruction and data caches.
Excellent debug support for SoC designers, including ETM interface.
8-entry write buffer avoids stalling the processor when writes to external memory are performed.
Portable to latest 0.18 µm, 0.15 µm, 0.13 µm silicon processes.

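The MIPS/MHz rating and the headline MIPS figure are linked by the clock frequency: throughput is the per-MHz rating multiplied by the clock. The sketch below shows the arithmetic in both directions; the function names are hypothetical, and the 200 MHz clock in the example is an arbitrary illustrative value, not a figure from the slides.

```python
def dhrystone_mips(mips_per_mhz: float, clock_mhz: float) -> float:
    """Throughput implied by a MIPS/MHz rating at a given clock."""
    return mips_per_mhz * clock_mhz


def clock_for_mips(target_mips: float, mips_per_mhz: float) -> float:
    """Clock frequency (MHz) needed to reach a target MIPS figure."""
    if mips_per_mhz <= 0:
        raise ValueError("MIPS/MHz rating must be positive")
    return target_mips / mips_per_mhz


if __name__ == "__main__":
    # ARM9 rating from the slide: 1.1 MIPS/MHz; 200 MHz is a made-up clock.
    print(f"{dhrystone_mips(1.1, 200.0):.0f} MIPS at 200 MHz")
    # Clock implied by the slide's 'up to 300 MIPS' at 1.1 MIPS/MHz.
    print(f"{clock_for_mips(300.0, 1.1):.1f} MHz for 300 MIPS")
```

Read this way, the "up to 300 MIPS" figure corresponds to a clock of roughly 273 MHz at 1.1 MIPS/MHz.
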
2.3 ARM9 Family (2/2)
Performance characteristics.

Core     Cache Size (Inst/Data)  Tightly Coupled Memory  Memory Mgt  Bus Interface  Thumb  DSP  Jazelle
ARM920T  16K/16K                 -                       MMU         AHB            Yes    No   No
ARM922T  8K/8K                   -                       MMU         AHB            Yes    No   No

Applications
  Next-generation hand-held products: videophones, portable communicators, PDAs.
  Digital consumer products: set-top boxes, home gateways, games consoles, MP3 audio, MPEG4 video.
  Imaging: desktop printers, still picture cameras, digital video cameras.
  Automotive: telematic and infotainment systems.

2.3 ARM9E Family (1/4)
ARM926EJ-S: Jazelle technology, memory management unit (MMU), variable-size instruction and data caches (4K - 128K), instruction and data tightly coupled memory (TCM) interfaces.
ARM946E-S: variable-size instruction and data caches (0K - 1M), instruction and data TCM (0 - 1M), and memory protection unit for embedded applications.
ARM966E-S: targets "hard real-time" applications requiring predictable instruction execution timings with high performance and low power consumption.
ARM968E-S: the smallest, lowest-power ARM9E family processor to date, aimed specifically at embedded real-time applications.
MOVE Video Coprocessor: this coprocessor for ARM9E family processors accelerates the Sum of Absolute Differences (SAD) operation used in MPEG-4 encoder motion estimation.

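The Sum of Absolute Differences compares two equally sized pixel blocks by adding up the absolute per-pixel differences; a motion estimator looks for the candidate block with the smallest sum. The sketch below shows the operation in plain Python with made-up 2 x 2 blocks. It illustrates the arithmetic only and says nothing about how the coprocessor implements it; the function name is hypothetical.

```python
def sum_of_absolute_differences(block_a: list[list[int]],
                                block_b: list[list[int]]) -> int:
    """Sum |a - b| over corresponding pixels of two same-sized blocks."""
    if len(block_a) != len(block_b) or any(
            len(row_a) != len(row_b) for row_a, row_b in zip(block_a, block_b)):
        raise ValueError("blocks must have the same dimensions")
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


if __name__ == "__main__":
    # Made-up pixel values for illustration.
    current = [[1, 2], [3, 4]]
    candidate = [[1, 0], [5, 4]]
    print(sum_of_absolute_differences(current, candidate))
```

The operation is a long run of subtract, absolute-value, and accumulate steps over many pixels, which is why it is a natural target for hardware acceleration.
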
2.3 ARM9E Family (2/4)
32-bit RISC processor with ARM, Thumb and DSP instruction sets.
ARM Jazelle technology delivers 8x Java acceleration (ARM926EJ-S).
5-stage integer pipeline achieves 1.1 MIPS/MHz.
Up to 300 MIPS (Dhrystone 2.1) in a typical 0.13 µm process.
Integrated real-time trace and debug support.
Optional VFP9 coprocessor delivers floating-point performance.
215 MFLOPS for 3D graphics and real-time control systems.
High-performance AHB system.
MMU supporting Windows CE, Symbian OS, Linux, Palm OS (ARM926EJ-S).
Integrated instruction and data caches.
Real-time debug support for SoC designers, including ETM interface.
16-entry write buffer avoids stalling the processor when writes to external memory are performed.
Flexible soft IP delivery, synthesizable to the latest 0.18 µm, 0.15 µm, 0.13 µm silicon processes.

2.3 ARM9E Family (3/4)
Performance characteristics.

Core        Cache Size (Inst/Data)  Tightly Coupled Memory  Memory Mgt      Bus Interface  Thumb  DSP  Jazelle
ARM926EJ-S  Variable                Yes                     MMU             2AHB           Yes    Yes  Yes
ARM946E-S   Variable                Yes                     MPU             AHB            Yes    Yes  No
ARM966E-S   -                       Yes                     -               AHB            Yes    Yes  No
ARM968E-S   n/a                     Yes                     DMA             AHB-Lite       Yes    Yes  No
ARM996HS    n/a                                             MPU (optional)  Dual AMBA AHB  Yes    Yes  No

2.3 ARM9E Family (4/4)
Applications
  Next-generation hand-held products: videophones, portable communicators, Internet appliances.
  Digital consumer products: set-top boxes, home gateways, games consoles.
  Imaging: desktop printers, still picture cameras, digital video cameras.
  Storage: HDD and DVD drives.
  Automotive: powertrain, infotainment, ABS, body control systems.
  Industrial control systems: motion controls, power delivery.
  Networking: VoIP, wireless LAN, xDSL.

2.3 ARM10E Family (1/5)
ARM1020E: features DSP instruction-set extensions, on-chip debugging capability, dual 32 KB cache memories and a full memory management unit (MMU) supporting Windows CE, Symbian OS, Linux and Palm OS.
ARM1022E: as the ARM1020E, but with dual 16 KB cache memories.
ARM1026EJ-S: a fully synthesizable processor delivering a new level of performance, functionality and flexibility to enable innovative SoC applications.
32-bit RISC processor with ARM, Thumb and DSP instruction sets.
Jazelle technology extension set (ARM1026EJ-S).
6-stage integer pipeline with branch prediction achieves 1.35 MIPS/MHz.

2.3 ARM10E Family (2/5)
430+ Dhrystone 2.1 MIPS in a widely available 0.13 µm process.
Optional VFP10 coprocessor delivers floating-point performance.
650 MFLOPS for 3D graphics and real-time control systems.
Dual 64-bit AMBA AHB bus interface and 64-bit internal bus architecture.
MMU supporting Windows CE, Symbian OS, Linux, Palm OS.
Integrated instruction and data caches.
Parallel load/store unit.
Non-blocking hit-under-miss data cache to maximize processor performance with slow memory systems.

2.3 ARM10E Family (3/5)
8-entry, double-word write buffer avoids stalling the processor when writes to external memory are performed.
Real-time debug support for SoC designers, including ETM interface.
High-performance AHB system with dual 64-bit bus masters.
Portable to latest 0.18 µm, 0.15 µm, 0.13 µm silicon processes.

2.3 ARM10E Family (4/5)
Performance characteristics.
Core | Cache Size (Inst/Data) | Tightly Coupled Memory | Memory Mgt | Bus Interface | Thumb | DSP | Jazelle
ARM1020E | 32K/32K | - | MMU | 2 AHB | Yes | Yes | No
ARM1022E | 16K/16K | - | MMU | 2 AHB | Yes | Yes | No
ARM1026EJ-S | Variable | Yes | MMU or MPU | 2 AHB | Yes | Yes | Yes

2.3 ARM10E Family (5/5)
Applications
  Next-generation hand-held products: videophones, portable communicators, subnotebook computers, Internet appliances.
  Digital consumer products: set-top boxes, home gateways, games consoles.
  Imaging: laser printers, still digital cameras, digital video cameras.
  Automotive: powertrain, infotainment systems.
  Industrial control systems: motion controls, power delivery.

2.3 ARM11 Family (1/5)
  Powerful ARMv6 instruction set architecture.
  ARM Thumb instruction set reduces memory bandwidth and size requirements by up to 35%.
  ARM Jazelle technology for efficient embedded Java execution.
  ARM DSP extensions.
  SIMD (Single Instruction Multiple Data) media processing extensions deliver up to 2x performance for video processing.
  ARM TrustZone technology for on-chip security foundation (ARM1176JZ-S and ARM1176JZF-S cores).
  Thumb-2 core technology for enhanced performance, energy efficiency, and code density (ARM1156T2-S and ARM1156T2F-S cores).

2.3 ARM11 Family (2/5)
Low power consumption:
  0.6 mW/MHz (0.13 µm, 1.2 V) including cache controllers.
  Energy-saving power-down modes address static leakage currents in advanced processes.
  Intelligent Energy Manager (IEM) technology for dynamic power management (ARM1176JZ-S and ARM1176JZF-S cores).
High-performance integer processor:
  8-stage integer pipeline delivers high clock frequency (9 stages for ARM1156T2(F)-S).
  Separate load-store and arithmetic pipelines.
  Branch prediction and return stack.

2.3 ARM11 Family (3/5)
High-performance memory system design:
  Supports 4-64 KB cache sizes.
  Optional tightly coupled memories with DMA for multimedia applications.
  High-performance 64-bit memory system speeds data access for media processing and networking applications.
  ARMv6 memory system architecture accelerates OS context switches.
Vectored interrupt interface and low-interrupt-latency mode speed up interrupt response and real-time performance.
Optional Vector Floating Point coprocessor (ARM1136JF-S, ARM1176JZF-S, and ARM1156T2F-S cores) for automotive/industrial controls and 3D graphics acceleration.

2.3 ARM11 Family (4/5)
Performance characteristics.
Core | Cache Size (Inst/Data) | Tightly Coupled Memory | Memory Mgt | Bus Interface | Thumb | DSP | Jazelle
ARM11 MPCore | Variable | - | MMU + cache coherency | 1 or 2 AMBA AXI | Yes | Yes | Yes
ARM1136J(F)-S | Variable | Yes | MMU | 5 AHB | Yes | Yes | Yes
ARM1156T2(F)-S | Variable | Yes | MPU | 3 AXI | Yes | Yes | No
ARM1176JZ(F)-S | Variable | Yes | MMU + TrustZone | 4 AXI | Yes | Yes | Yes

2.3 ARM11 Family (5/5)
Applications
Market | ARM1136J(F)-S | ARM1156T2(F)-S | ARM1176JZ(F)-S | ARM11 MPCore
Automotive | Infotainment, DVD navigation | Power train, body | * | Infotainment, DVD, navigation, image and speech recognition
Computer | PDA | Printer, data storage | PDA | PDA, printer, industrial, kiosks, server blade
Consumer | Digital TV, DVD PVR, set-top box, games | Digital camera | Set-top box* | DTV, IPSTB, DSC, camcorder, games box
Industrial | - | Embedded control | EPOS terminal | -
Networking | Infrastructure, switch/router | Modem | * | CPE terminals, switch/router, NAS
Wireless | Smartphone, applications processor | Base station | Combined applications and base-band processor for smartphone | Mobile gaming, PDA, media player

2.4 High-level Overview of Intel XScale Core
  An ARM V5TE compliant microprocessor.
  Designed as an embedded core for an ASSP (Application Specific Standard Product).
  The Intel XScale core implements the integer instruction set architecture of ARM V5, but does not provide hardware support for the floating-point instructions.
  The Intel XScale core provides the Thumb instruction set (ARM V5T) and the ARM V5E DSP extensions.

2.4 Features of XScale Microarchitecture (1/2)
  A 7-stage integer/8-stage memory super-pipelined core.
  Dynamic Voltage Management.
  Media Processing Technology: a multiply-accumulate coprocessor performing two simultaneous 16-bit SIMD multiplies with 40-bit accumulation.
  Power Management Unit.
  128-entry Branch Target Buffer.
  32 KB Instruction Cache, 32 KB Data Cache, 2 KB Mini-Data Cache.
  32-entry Instruction Memory Management Unit.
  Four-entry Fill and Pend Buffers.
  Performance Monitoring Unit.

2.4 Features of XScale Microarchitecture (2/2)
  Debug Unit.
  32-bit Coprocessor Interface.
  64-bit Core Memory Bus with simultaneous 32-bit input path and 32-bit output path.
  8-entry Write Buffer.
  Thumb Instruction Set supported.

2.4 XScale Architecture
The following figure shows the major functional blocks of the XScale core.
[Figure: block diagram of the XScale core]

2.4 Multiply/Accumulate (MAC)
  Supports early termination of multiplies/accumulates in two cycles.
  Sustains a throughput of one MAC operation every cycle.
  Provides a 40-bit accumulator and support for 16-bit packed data for audio coding algorithms.

2.4 Memory Management
The MMU provides access protection and virtual-to-physical address translation.
It also specifies the caching policies for the instruction cache and data cache.
The caching policies include:
  Identifying code as cacheable or non-cacheable.
  Selecting between the mini-data cache and the data cache.
  Write-back or write-through data caching.
  Enabling the data write allocation policy.
  Enabling the write buffer to coalesce stores to external memory.

2.4 Instruction Cache
  Implements a 32-Kbyte, 32-way set associative instruction cache with a line size of 32 bytes.
  All requests that miss the instruction cache generate a 32-byte read request to external memory.
  A mechanism to lock critical code within the cache is also provided.

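The cache parameters quoted above fix the number of sets and the way an address splits into tag, set index, and line offset. The following is a minimal sketch of that arithmetic, assuming a plain power-of-two set-associative layout and 32-bit addresses; the function names are illustrative and are not taken from any Intel documentation.

```python
def cache_geometry(size_bytes, ways, line_bytes, addr_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for a set-associative cache.

    Assumes every parameter is a power of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2(line size)
    index_bits = sets.bit_length() - 1          # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# 32-Kbyte, 32-way, 32-byte lines, as stated on the slide.
sets, off_b, idx_b, tag_b = cache_geometry(32 * 1024, 32, 32)
print(sets, off_b, idx_b, tag_b)
```

With these numbers the cache has 32 sets of 32 lines each, so a set index needs only 5 bits; the high associativity is what keeps the index this small.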
2.4 Branch Target Buffer
  Can predict the outcome of branch type instructions.
  Provides storage for the target address of branch type instructions.
  Predicts the next address to present to the instruction cache.

2.4 Data Cache
  Implements a 32-Kbyte, 32-way set associative data cache and a 2-Kbyte, 2-way set associative mini-data cache.
  Each cache has a line size of 32 bytes and supports write-through or write-back caching.

2.4 Fill Buffer & Write Buffer
Together they enable the loading and storing of data to memory beyond the Intel XScale core.
The Write Buffer:
  carries write traffic beyond the core, allowing data coalescing when it is both globally enabled and associated with the appropriate memory page types.
The Fill Buffer:
  assists the loading of data from memory.
  allows the application processor's external SDRAM to be read as 4-word bursts, rather than single-word accesses.

2.4 Performance Monitoring
  Two performance monitoring counters have been added to monitor various events in the Intel XScale core.
  These events allow a software developer to measure cache efficiency, detect system bottlenecks, and reduce the overall latency of programs.

2.4 Power Management
  Assists ASSPs in controlling their clocking and managing their power.

2.4 Debug
  Two instruction address breakpoint registers.
  One data-address breakpoint register.
  One data-address/mask breakpoint register.
  A mini-instruction cache and a trace buffer.
  Testability and hardware debugging are supported on the Intel XScale core through the Test Access Port (TAP) Controller implementation.

2.4 JTAG
  Based on the IEEE 1149.1 (JTAG) Standard Test Access Port and Boundary-Scan Architecture.

2.5 XScale (ARM V5TE) Instruction Set
  Overview of the XScale Instruction Set.
  The ARM Addressing Modes.
  The ARM V5TE Instruction Set.
  The Thumb Instruction Set.

2.5 Overview of XScale Instruction Set
  The Intel XScale core implements the integer instruction set architecture specified in ARM Version 5TE.
  The Intel XScale core supports both big-endian and little-endian data representation.
  The Intel XScale core supports the Thumb instruction set.
  The Intel XScale core implements ARM's DSP-enhanced instruction set.

2.5 Extensions to ARM Architecture
The Intel XScale core made a few extensions to the ARM Version 5 architecture to meet the needs of various markets and design requirements.
  A DSP coprocessor (CP0) has been added that contains a 40-bit accumulator and 8 new operations in coprocessor space, hereafter referred to as new instructions.
  New page attributes were added to the page table descriptors.
  Additional functionality has been added to coprocessor 15.
  Coprocessor 14 was created.
  Enhancements were made to the Event Architecture: instruction cache and data cache parity error exceptions, breakpoint events, and imprecise external data aborts.

2.5 XScale DSP Coprocessor 0 (CP0)
The 40-bit accumulator is referenced by several new instructions that were added to the architecture.
  MIA, MIAPH, and MIAxy are multiply/accumulate instructions that reference the 40-bit accumulator instead of a register-specified accumulator.
  MRA and MAR provide the ability to read and write the 40-bit accumulator.

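To make the role of the 40-bit accumulator concrete, here is a small behavioural sketch of MIA and MIAPH. It assumes the accumulator wraps modulo 2^40 and that MIAPH sums the products of the two signed 16-bit halves of its operands, matching the "two simultaneous 16-bit SIMD multiplies" described earlier; the Python function names are illustrative only.

```python
ACC_BITS = 40
ACC_MASK = (1 << ACC_BITS) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mia(acc, rm, rs):
    """Model of MIA: acc += signed(Rm) * signed(Rs), kept to 40 bits."""
    return (acc + to_signed(rm, 32) * to_signed(rs, 32)) & ACC_MASK


def miaph(acc, rm, rs):
    """Model of MIAPH: acc += lo16(Rm)*lo16(Rs) + hi16(Rm)*hi16(Rs), kept to 40 bits."""
    lo = to_signed(rm, 16) * to_signed(rs, 16)
    hi = to_signed(rm >> 16, 16) * to_signed(rs >> 16, 16)
    return (acc + lo + hi) & ACC_MASK


# Packed halfwords (2, 3) and (4, 5): 3*5 + 2*4 accumulates to 23.
print(miaph(0, 0x00020003, 0x00040005))
```

The 8 guard bits above bit 31 are what let a long run of 16-bit products accumulate without overflowing a 32-bit register.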
2.5 Addressing Modes
  Data-processing operands.
  Load and store word or unsigned byte.
  Miscellaneous loads and stores.
  Load and store multiple.
  Load and store coprocessor.

2.5 Data-processing Operands
General instruction syntax:
  <opcode>{<cond>}{S} <Rd>, <Rn>, <shifter_operand>
There are 11 addressing modes to calculate the <shifter_operand> in an ARM data-processing instruction:
  Immediate: #<immediate>
  Register: <Rm>
  Logical shift left by immediate: <Rm>, LSL #<shift_imm>
  Logical shift left by register: <Rm>, LSL <Rs>
  Logical shift right by immediate: <Rm>, LSR #<shift_imm>
  Logical shift right by register: <Rm>, LSR <Rs>
  Arithmetic shift right by immediate: <Rm>, ASR #<shift_imm>
  Arithmetic shift right by register: <Rm>, ASR <Rs>
  Rotate right by immediate: <Rm>, ROR #<shift_imm>
  Rotate right by register: <Rm>, ROR <Rs>
  Rotate right with extend: <Rm>, RRX

2.5 Examples
Immediate operand value.
  MOV R0, #0              ; move zero to R0.
  ADD R3, R3, #1          ; add one to R3.
  CMP R7, #1000           ; compare the value of R7 with 1000.
Register operand value.
  MOV R2, R0              ; move the value of R0 to R2.
  ADD R4, R3, R2          ; R4 = R3 + R2.
  CMP R7, R8              ; compare the values of R7 and R8.
Shifted register operand value.
  MOV R2, R0, LSL #2      ; shift R0 left by 2, write to R2.
  ADD R9, R5, R5, LSL #3  ; R9 = R5 + R5 * 8.
  RSB R9, R5, R5, LSL #3  ; R9 = R5 * 8 - R5.

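The shifted-register examples rely on the barrel shifter applying LSL, LSR, ASR, or ROR to the second operand before the ALU uses it. The sketch below models those four shifts on 32-bit values; it is a simplified illustration (shift amounts limited to 0..31, no carry-out, no RRX), and the `shift` helper is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def shift(value, kind, amount):
    """Apply an ARM-style shift to a 32-bit value; amount must be in 0..31."""
    if not 0 <= amount <= 31:
        raise ValueError("this sketch only handles shift amounts 0..31")
    value &= MASK32
    if amount == 0:
        return value
    if kind == "LSL":
        return (value << amount) & MASK32
    if kind == "LSR":
        return value >> amount
    if kind == "ASR":
        # Sign bit is replicated into the vacated positions.
        signed = value - (1 << 32) if value & 0x80000000 else value
        return (signed >> amount) & MASK32
    if kind == "ROR":
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError("unknown shift kind: " + kind)


r5 = 7
r9_add = (r5 + shift(r5, "LSL", 3)) & MASK32   # ADD R9, R5, R5, LSL #3 -> 9 * R5
r9_rsb = (shift(r5, "LSL", 3) - r5) & MASK32   # RSB R9, R5, R5, LSL #3 -> 7 * R5
print(r9_add, r9_rsb)
```

This is why a single ADD or RSB with a shifted operand can multiply by constants such as 9 or 7 without a multiply instruction.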
2.5 Load and Store Word or Unsigned Byte
General instruction syntax:
  LDR|STR{<cond>}{B}{T} <Rd>, <addressing_mode>
Addressing modes:
  Immediate offset: [<Rn>, #+/-<offset_12>]
  Register offset: [<Rn>, +/-<Rm>]
  Scaled register offset: [<Rn>, +/-<Rm>, <shift> #<shift_imm>]
  Immediate pre-indexed: [<Rn>, #+/-<offset_12>]!
  Register pre-indexed: [<Rn>, +/-<Rm>]!
  Scaled register pre-indexed: [<Rn>, +/-<Rm>, <shift> #<shift_imm>]!
  Immediate post-indexed: [<Rn>], #+/-<offset_12>
  Register post-indexed: [<Rn>], +/-<Rm>
  Scaled register post-indexed: [<Rn>], +/-<Rm>, <shift> #<shift_imm>

2.5 Miscellaneous Loads and Stores
There are six addressing modes used to calculate the address for load and store (signed or unsigned) halfword, load signed byte, or load and store doubleword instructions.
General instruction syntax:
  LDR|STR{<cond>}H|SH|SB|D <Rd>, <addressing_mode>
Addressing modes:
  Immediate offset: [<Rn>, #+/-<offset_8>]
  Register offset: [<Rn>, +/-<Rm>]
  Immediate pre-indexed: [<Rn>, #+/-<offset_8>]!
  Register pre-indexed: [<Rn>, +/-<Rm>]!
  Immediate post-indexed: [<Rn>], #+/-<offset_8>
  Register post-indexed: [<Rn>], +/-<Rm>

2.5 Load and Store Multiple
General instruction syntax:
  LDM|STM{<cond>}<addressing_mode> <Rn>{!}, <registers>{^}
Addressing modes:
  Increment after: IA ; non-stack addressing mode.
  Increment before: IB ; non-stack addressing mode.
  Decrement after: DA ; non-stack addressing mode.
  Decrement before: DB ; non-stack addressing mode.
  Full descending: FD ; stack addressing mode.
  Empty descending: ED ; stack addressing mode.
  Full ascending: FA ; stack addressing mode.
  Empty ascending: EA ; stack addressing mode.

2.5 Load and Store Coprocessor
General instruction syntax:
  <opcode>{<cond>}{L} <coproc>, <CRd>, <addressing_mode>
Addressing modes:
  Immediate offset: [<Rn>, #+/-<offset_8>*4]
  Immediate pre-indexed: [<Rn>, #+/-<offset_8>*4]!
  Immediate post-indexed: [<Rn>], #+/-<offset_8>*4
  Unindexed: [<Rn>], <option>

2.5 The Condition Field
Almost all ARM instructions can be conditionally executed.
  If the N, Z, C, and V flags in the CPSR satisfy the condition specified in the instruction, the instruction is executed.
  If the flags do not satisfy this condition, the instruction acts as a NOP.
Every instruction contains a 4-bit condition code field in bits 31 to 28.
  Bits 31-28: cond
  Bits 27-0: remainder of the instruction

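The condition check can be written out as a small table lookup. The sketch below extracts the `cond` field from bits 31-28 and evaluates it against the N, Z, C, and V flags using the standard ARM condition-code meanings; the function names are illustrative, and the value 0b1111 is deliberately rejected rather than modelled.

```python
def cond_field(instr):
    """Extract the 4-bit condition field from bits 31..28 of an ARM instruction."""
    return (instr >> 28) & 0xF


def condition_passed(cond, n, z, c, v):
    """Return True if the flags satisfy the given ARM condition code."""
    checks = {
        0x0: z,                      # EQ: equal
        0x1: not z,                  # NE: not equal
        0x2: c,                      # CS/HS: carry set
        0x3: not c,                  # CC/LO: carry clear
        0x4: n,                      # MI: negative
        0x5: not n,                  # PL: positive or zero
        0x6: v,                      # VS: overflow
        0x7: not v,                  # VC: no overflow
        0x8: c and not z,            # HI: unsigned higher
        0x9: (not c) or z,           # LS: unsigned lower or same
        0xA: n == v,                 # GE: signed greater or equal
        0xB: n != v,                 # LT: signed less than
        0xC: (not z) and n == v,     # GT: signed greater than
        0xD: z or n != v,            # LE: signed less or equal
        0xE: True,                   # AL: always
    }
    if cond not in checks:
        raise ValueError("condition 0b1111 is not modelled in this sketch")
    return bool(checks[cond])


# 0xE2833001 encodes ADD R3, R3, #1 with the AL (always) condition.
print(cond_field(0xE2833001), condition_passed(0xE, False, False, False, False))
```

An instruction whose condition fails simply does nothing, which is what lets short if/else sequences be written without branches.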
2.5 Branch Instructions
Allow a conditional branch forwards or backwards up to 32 MB.
List of branch instructions:
  B, BL: branch and branch with link.
  BLX: branch with link and exchange.
  BX: branch and exchange instruction set.
Operation | Assembler | Action
Branch | B{cond} label | R15 := label
Branch with link | BL{cond} label | R14 := R15 - 4, R15 := label
Branch and exchange | BX{cond} Rm | R15 := Rm, change to Thumb if Rm[0] is 1
Branch with link and exchange (1) | BLX label | R14 := R15 - 4, R15 := label, change to Thumb
Branch with link and exchange (2) | BLX{cond} Rm | R14 := R15 - 4, R15 := Rm[31:1], change to Thumb if Rm[0] is 1

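The 32 MB limit follows from the encoding of B and BL: a signed 24-bit word offset is shifted left by two and added to the PC, which in ARM state reads as the address of the branch plus 8. A minimal sketch of the encode and decode arithmetic follows; the helper names are hypothetical.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def encode_branch(instr_addr, target, link=False, cond=0xE):
    """Encode B (or BL when link=True) located at instr_addr jumping to target."""
    offset = target - (instr_addr + 8)          # PC reads as instruction address + 8
    if offset % 4 or not -(1 << 25) <= offset < (1 << 25):
        raise ValueError("target is misaligned or outside the +/-32 MB range")
    return (cond << 28) | (0b101 << 25) | (int(link) << 24) | ((offset >> 2) & 0xFFFFFF)


def branch_target(instr_addr, instr):
    """Recover the target address from an encoded B/BL instruction."""
    offset = sign_extend(instr & 0x00FFFFFF, 24) << 2
    return (instr_addr + 8 + offset) & 0xFFFFFFFF


word = encode_branch(0x8000, 0x8000)   # a branch to itself
print(hex(word), hex(branch_target(0x8000, word)))
```

The same PC offset explains the "R14 := R15 - 4" entry for BL: R15 reads as the branch address plus 8, so the link register ends up holding the address of the instruction after the BL.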
2.5 Examples
  B label       ; branch unconditionally to label.
  BCC label     ; branch to label if carry flag is clear.
  BEQ label     ; branch to label if zero flag is set.
  MOV PC, #0    ; R15 = 0, branch to location zero.
  BL func       ; subroutine call to function func.

2.5 Data Processing Instructions [1/3]
Operation | Assembler | Action
Test | TST{cond} Rn, <Oprnd2> | Update CPSR flags on Rn AND Oprnd2
Test equivalence | TEQ{cond} Rn, <Oprnd2> | Update CPSR flags on Rn EOR Oprnd2
AND | AND{cond}{S} Rd, Rn, <Oprnd2> | Rd := Rn AND Oprnd2
EOR | EOR{cond}{S} Rd, Rn, <Oprnd2> | Rd := Rn EOR Oprnd2
ORR | ORR{cond}{S} Rd, Rn, <Oprnd2> | Rd := Rn OR Oprnd2
Bit clear | BIC{cond}{S} Rd, Rn, <Oprnd2> | Rd := Rn AND NOT Oprnd2
Compare | CMP{cond} Rn, <Oprnd2> | Update CPSR flags on Rn - Oprnd2
Compare negative | CMN{cond} Rn, <Oprnd2> | Update CPSR flags on Rn + Oprnd2

2.5 Data Processing Instructions [2/3]
Operation | Assembler | Action
Add | ADD{cond}{S} Rd, Rn, <Oprnd2> | Rd := Rn + Oprnd2
Add with carry | ADC{cond}{S} Rd, Rn, <Oprnd2> | Rd := Rn + Oprnd2 + Carry
Add saturating | QADD{cond} Rd, Rm, Rn | Rd := SAT(Rm + Rn)
Add double saturating | QDADD{cond} Rd, Rm, Rn | Rd := SAT(Rm + SAT(Rn * 2))
Subtract | SUB{cond}{S} Rd, Rn, <Oprnd2> | Rd := Rn - Oprnd2
Subtract with carry | SBC{cond}{S} Rd, Rn, <Oprnd2> | Rd := Rn - Oprnd2 - NOT(Carry)
Reverse subtract | RSB{cond}{S} Rd, Rn, <Oprnd2> | Rd := Oprnd2 - Rn
Reverse subtract with carry | RSC{cond}{S} Rd, Rn, <Oprnd2> | Rd := Oprnd2 - Rn - NOT(Carry)
Subtract saturating | QSUB{cond} Rd, Rm, Rn | Rd := SAT(Rm - Rn)
Subtract double saturating | QDSUB{cond} Rd, Rm, Rn | Rd := SAT(Rm - SAT(Rn * 2))

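The SAT(...) notation in the saturating rows means the result is clamped to the signed 32-bit range instead of wrapping. A short sketch of the four saturating operations, working on ordinary Python integers that stand for signed register values (the sticky Q flag is not modelled, and the function names are illustrative):

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def sat(x):
    """Clamp x to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def qadd(rm, rn):
    return sat(rm + rn)                 # Rd := SAT(Rm + Rn)


def qsub(rm, rn):
    return sat(rm - rn)                 # Rd := SAT(Rm - Rn)


def qdadd(rm, rn):
    return sat(rm + sat(rn * 2))        # Rd := SAT(Rm + SAT(Rn * 2))


def qdsub(rm, rn):
    return sat(rm - sat(rn * 2))        # Rd := SAT(Rm - SAT(Rn * 2))


# An ordinary ADD would wrap to a negative value here; QADD stays at the maximum.
print(qadd(INT32_MAX, 1))
```

Note that the doubling in QDADD and QDSUB saturates on its own before the add or subtract, so two clamps can occur in one instruction.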
2.5 Data Processing Instructions [3/3]
Operation | Assembler | Action
Move | MOV{cond}{S} Rd, <Oprnd2> | Rd := Oprnd2
Move NOT | MVN{cond}{S} Rd, <Oprnd2> | Rd := 0xFFFFFFFF EOR Oprnd2
Move SPSR to register | MRS{cond} Rd, SPSR | Rd := SPSR
Move CPSR to register | MRS{cond} Rd, CPSR | Rd := CPSR
Move register to SPSR | MSR{cond} SPSR_<fields>, Rm | SPSR := Rm (selected bytes only)
Move register to CPSR | MSR{cond} CPSR_<fields>, Rm | CPSR := Rm (selected bytes only)
Move immediate to SPSR | MSR{cond} SPSR_<fields>, #<immed_8r> | SPSR := immed_8r (selected bytes only)
Move immediate to CPSR | MSR{cond} CPSR_<fields>, #<immed_8r> | CPSR := immed_8r (selected bytes only)

2.5 Instruction Encoding
MOV, MVN:
  <opcode>{<cond>}{S} <Rd>, <shifter_operand>
CMP, CMN, TST, TEQ:
  <opcode>{<cond>} <Rn>, <shifter_operand>
ADD, SUB, RSB, ADC, SBC, RSC, AND, BIC, EOR, ORR:
  <opcode>{<cond>}{S} <Rd>, <Rn>, <shifter_operand>
Bit layout:
  Bits 31-28: cond
  Bits 27-26: 0 0
  Bit 25: I
  Bits 24-21: opcode
  Bit 20: S
  Bits 19-16: Rn
  Bits 15-12: Rd
  Bits 11-0: shifter_operand
Fields:
  I bit: distinguishes between the immediate and register forms of <shifter_operand>.
  S bit: signifies that the instruction updates the condition codes.
  Rn: specifies the first source operand register.
  Rd: specifies the destination register.
  shifter_operand: specifies the second source operand.

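The bit layout above can be checked by pulling the fields out of an encoded word. The following sketch decodes the data-processing fields; it assumes the word really is a data-processing instruction (other instruction classes that share bits 27-26 = 00, such as multiplies, are not distinguished), and the opcode table uses the standard ARM numbering.

```python
OPCODES = [
    "AND", "EOR", "SUB", "RSB", "ADD", "ADC", "SBC", "RSC",
    "TST", "TEQ", "CMP", "CMN", "ORR", "MOV", "BIC", "MVN",
]


def decode_data_processing(instr):
    """Split a 32-bit data-processing instruction into its named fields."""
    if (instr >> 26) & 0b11 != 0b00:
        raise ValueError("bits 27..26 must be 00 for data-processing instructions")
    return {
        "cond": (instr >> 28) & 0xF,
        "I": (instr >> 25) & 1,
        "opcode": OPCODES[(instr >> 21) & 0xF],
        "S": (instr >> 20) & 1,
        "Rn": (instr >> 16) & 0xF,
        "Rd": (instr >> 12) & 0xF,
        "shifter_operand": instr & 0xFFF,
    }


# 0xE2833001 is the encoding of ADD R3, R3, #1.
print(decode_data_processing(0xE2833001))
```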
2.5 Multiply Instructions [1/2]
ARM has two classes of multiply instruction:
  Normal, 32-bit result.
  Long, 64-bit result.
List of multiply instructions.
Normal:
  MUL: multiplies the values of two registers together, truncates the result to 32 bits, and stores the result in a third register.
  MLA: multiplies the values of two registers together, adds the value of a third register, truncates the result to 32 bits, and stores the result in a fourth register.
Long:
  SMLAL: signed multiply accumulate long.
  SMULL: signed multiply long.
  UMLAL: unsigned multiply accumulate long.
  UMULL: unsigned multiply long.

2.5 Multiply Instructions [2/2]
Operation | Assembler | Action
Multiply | MUL{cond}{S} Rd, Rm, Rs | Rd := (Rm * Rs)[31:0]
Multiply accumulate | MLA{cond}{S} Rd, Rm, Rs, Rn | Rd := ((Rm * Rs) + Rn)[31:0]
Multiply unsigned long | UMULL{cond}{S} RdLo, RdHi, Rm, Rs | RdHi,RdLo := unsigned(Rm * Rs)
Multiply unsigned accumulate long | UMLAL{cond}{S} RdLo, RdHi, Rm, Rs | RdHi,RdLo := unsigned(RdHi,RdLo + Rm * Rs)
Multiply signed long | SMULL{cond}{S} RdLo, RdHi, Rm, Rs | RdHi,RdLo := signed(Rm * Rs)
Multiply signed accumulate long | SMLAL{cond}{S} RdLo, RdHi, Rm, Rs | RdHi,RdLo := signed(RdHi,RdLo + Rm * Rs)
Multiply signed 16 * 16 bit | SMULxy{cond} Rd, Rm, Rs | Rd := Rm[x] * Rs[y]
Multiply signed 32 * 16 bit | SMULWy{cond} Rd, Rm, Rs | Rd := (Rm * Rs[y])[47:16]
Multiply signed accumulate 16 * 16 | SMLAxy{cond} Rd, Rm, Rs, Rn | Rd := Rn + Rm[x] * Rs[y] (sticky overflow flag)
Multiply signed accumulate 32 * 16 | SMLAWy{cond} Rd, Rm, Rs, Rn | Rd := Rn + (Rm * Rs[y])[47:16]
Multiply signed accumulate long 16 * 16 | SMLALxy{cond} RdLo, RdHi, Rm, Rs | RdHi,RdLo := RdHi,RdLo + Rm[x] * Rs[y]

2.5 Examples
  MUL R4, R2, R1          ; set R4 to value of R2 * R1.
  MULS R4, R2, R1         ; R4 = R2 * R1, set N and Z flags.
                          ; N=1 if value of R4 is negative.
                          ; Z=1 if value of R4 is zero.
  MLA R7, R8, R9, R3      ; R7 = R8 * R9 + R3.
  SMULL R4, R8, R2, R3    ; R4 = bits 0 to 31 of R2 * R3.
                          ; R8 = bits 32 to 63 of R2 * R3.
  UMULL R6, R8, R0, R1    ; R8, R6 = R0 * R1.
  UMLAL R5, R8, R0, R1    ; R8, R5 = R0 * R1 + R8, R5.

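The difference between the normal and long multiplies is easiest to see by computing both halves of the product. This is a behavioural sketch of MUL, MLA, UMULL, SMULL, and UMLAL on 32-bit register values; flag updates are not modelled and the function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul(rm, rs):
    return (rm * rs) & MASK32                       # result truncated to 32 bits


def mla(rm, rs, rn):
    return (rm * rs + rn) & MASK32


def umull(rm, rs):
    """Return (RdLo, RdHi) of the unsigned 64-bit product."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm, rs):
    """Return (RdLo, RdHi) of the signed 64-bit product."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, product >> 32


def umlal(rdlo, rdhi, rm, rs):
    """Return (RdLo, RdHi) after adding the unsigned product to RdHi:RdLo."""
    acc = (((rdhi << 32) | rdlo) + (rm & MASK32) * (rs & MASK32)) & MASK64
    return acc & MASK32, acc >> 32


lo, hi = umull(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(lo), hex(hi))
```

The return order (low word first) matches the operand order RdLo, RdHi used in the assembler syntax.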
2.5 Status Register Access Instructions
There are two instructions for moving the contents of a program status register (PSR) to or from a general-purpose register. Both the CPSR and the SPSR can be accessed.
  MRS: move PSR to general-purpose register.
  MSR: move general-purpose register to PSR.
Operation | Assembler | Action
Move SPSR to register | MRS{cond} Rd, SPSR | Rd := SPSR
Move CPSR to register | MRS{cond} Rd, CPSR | Rd := CPSR
Move register to SPSR | MSR{cond} SPSR_<fields>, Rm | SPSR := Rm (selected bytes only)
Move register to CPSR | MSR{cond} CPSR_<fields>, Rm | CPSR := Rm (selected bytes only)
Move immediate to SPSR | MSR{cond} SPSR_<fields>, #<immed_8r> | SPSR := immed_8r (selected bytes only)
Move immediate to CPSR | MSR{cond} CPSR_<fields>, #<immed_8r> | CPSR := immed_8r (selected bytes only)

2.5 Examples
These examples assume that the ARM processor is already in a privileged mode. If the ARM processor starts in User mode, only the flag update has any effect.
  MRS R0, CPSR              ; read the CPSR.
  BIC R0, R0, #0xF0000000   ; clear the N, Z, C, and V bits.
  MSR CPSR_f, R0            ; update the flag bits in the CPSR.
                            ; N, Z, C, and V flags now all clear.
  MRS R0, CPSR              ; read the CPSR.
  ORR R0, R0, #0x80         ; set the interrupt disable bit.
  MSR CPSR_c, R0            ; update the control bits in the CPSR.
                            ; interrupts (IRQ) now disabled.

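The "selected bytes only" behaviour of MSR can be sketched as a byte-lane mask: the field suffix chooses which of the four PSR bytes are written. The model below assumes the usual field letters (c = bits 7-0, x = bits 15-8, s = bits 23-16, f = bits 31-24) and that User mode can only change the flags byte; the `msr` helper and its parameters are illustrative.

```python
FIELD_MASKS = {
    "c": 0x000000FF,   # control: mode bits, T, F, I
    "x": 0x0000FF00,   # extension
    "s": 0x00FF0000,   # status
    "f": 0xFF000000,   # flags: N, Z, C, V (and Q)
}
I_BIT = 0x80           # IRQ disable bit in the control byte


def msr(psr, value, fields, privileged=True):
    """Return the new PSR after writing `value` through the selected byte fields."""
    mask = 0
    for field in fields:
        mask |= FIELD_MASKS[field]
    if not privileged:
        mask &= FIELD_MASKS["f"]        # User mode may only alter the flags byte
    return ((psr & ~mask) | (value & mask)) & 0xFFFFFFFF


cpsr = 0x60000013                       # Z and C set, Supervisor mode, IRQs enabled
cpsr = msr(cpsr, cpsr & ~0xF0000000, "f")   # clear N, Z, C, V
cpsr = msr(cpsr, cpsr | I_BIT, "c")         # set the I bit to disable IRQs
print(hex(cpsr))
```

Writing the I bit through the flags field alone would leave it unchanged, since bit 7 lies in the control byte; in this model `msr(0x13, 0x93, "f")` returns 0x13.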
2.5 Load and Store Instructions [1/3]
Two broad types are supported:
  Load or store a 32-bit word or an 8-bit unsigned byte.
  Load or store a 16-bit unsigned halfword, and load and sign-extend a 16-bit halfword or an 8-bit byte.
The addressing mode is formed from two parts: the base register and the offset.
  The base register can be any one of the general-purpose registers.
  The offset takes one of three formats: immediate, register, and scaled register.

2.5 Load and Store Instructions [2/3]
Load instructions:
Operation | Assembler | Action
Word | LDR{cond} Rd, <addressing_mode> | Rd := [address]
Word, User mode privilege | LDR{cond}T Rd, <addressing_mode> | Rd := [address]
Word, branch (and exchange) | LDR{cond} R15, <addressing_mode> | R15 := [address][31:1], change to Thumb if [address][0] is 1
Byte | LDR{cond}B Rd, <addressing_mode> | Rd := ZeroExtend[byte from address]
Byte, User mode privilege | LDR{cond}BT Rd, <addressing_mode> | Rd := ZeroExtend[byte from address]
Byte, signed | LDR{cond}SB Rd, <addressing_mode> | Rd := SignExtend[byte from address]
Halfword | LDR{cond}H Rd, <addressing_mode> | Rd := ZeroExtend[halfword from address]
Halfword, signed | LDR{cond}SH Rd, <addressing_mode> | Rd := SignExtend[halfword from address]

2.5 Load and Store Instructions [3/3]
Store instructions:
Operation | Assembler | Action
Word | STR{cond} Rd, <addressing_mode> | [address] := Rd
Word, User mode privilege | STR{cond}T Rd, <addressing_mode> | [address] := Rd
Byte | STR{cond}B Rd, <addressing_mode> | [address][7:0] := Rd[7:0]
Byte, User mode privilege | STR{cond}BT Rd, <addressing_mode> | [address][7:0] := Rd[7:0]
Halfword | STR{cond}H Rd, <addressing_mode> | [address][15:0] := Rd[15:0]

2.5 Examples [1/2]
  LDR R1, [R0]                ; load R1 from the address in R0.
  LDR R8, [R3, #4]            ; load R8 from the address in R3 + 4.
  LDR R12, [R13, #-4]         ; load R12 from the address in R13 - 4.
  STR R2, [R1, #0x100]        ; store R2 to the address in R1 + 0x100.
  LDRB R3, [R8, #3]           ; load byte to R3 from R8 + 3.
  STRB R10, [R4, #0x200]      ; store byte from R10 to R4 + 0x200.
  LDR R11, [R3, R5, LSL #2]   ; load R11 from R3 + (R5 * 4).
  LDR R1, [R0, #4]!           ; load R1 from R0 + 4, then R0 = R0 + 4.
  STRB R7, [R6, #-1]!         ; store byte from R7 to R6 - 1, then R6 = R6 - 1.

2.5 Examples [2/2]
  LDR R3, [R9], #4        ; load R3 from R9, then R9 = R9 + 4.
  STR R2, [R5], #8        ; store R2 to R5, then R5 = R5 + 8.
  LDR R2, [R1], R0        ; load R2 from R1, then R1 = R1 + R0.
  LDRH R1, [R0]           ; load halfword to R1 from R0.
  STRH R2, [R1, #0x80]    ; store halfword from R2 to R1 + 0x80.
  LDRH R11, [R0, R2]      ; load halfword into R11 from address in R0 + R2.
  LDRSH R1, [R0, #2]!     ; load signed halfword to R1 from R0 + 2, then R0 = R0 + 2.

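The offset, pre-indexed, and post-indexed forms differ only in which address is used for the access and whether the base register is written back. A compact sketch of that rule, using a dictionary as a stand-in register file (the helper name and the mode strings are illustrative):

```python
MASK32 = 0xFFFFFFFF


def effective_address(regs, rn, offset, mode):
    """Return the address used for the access and apply any base-register writeback.

    mode is 'offset'  for [Rn, #off]   (no writeback),
            'pre'     for [Rn, #off]!  (writeback, access at Rn + off),
            'post'    for [Rn], #off   (access at Rn, then writeback).
    """
    base = regs[rn]
    updated = (base + offset) & MASK32
    if mode == "offset":
        return updated
    if mode == "pre":
        regs[rn] = updated
        return updated
    if mode == "post":
        regs[rn] = updated
        return base
    raise ValueError("unknown addressing mode: " + mode)


regs = {"R0": 0x1000, "R9": 0x2000}
print(hex(effective_address(regs, "R0", 4, "offset")), hex(regs["R0"]))  # R0 unchanged
print(hex(effective_address(regs, "R0", 4, "pre")), hex(regs["R0"]))     # R0 updated first
print(hex(effective_address(regs, "R9", 4, "post")), hex(regs["R9"]))    # R9 updated after
```

Post-indexing is the natural fit for walking an array, since each access uses the current pointer and then advances it in the same instruction.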
2.5 Load and Store Multiple Instructions
Operation | Assembler | Action
Pop, or block data load | LDM{cond}<addressing_mode> Rd{!}, <registers> | Load list of registers from [Rd]
Load and return (and exchange) | LDM{cond}<addressing_mode> Rd{!}, <registers including PC> | Load registers, R15 := [address][31:1]
Load, return, and restore CPSR | LDM{cond}<addressing_mode> Rd{!}, <registers including PC>^ | Load registers, branch and exchange
Load User mode registers | LDM{cond}<addressing_mode> Rd, <registers>^ | Load list of User mode registers from [Rd]
Push, or block data store | STM{cond}<addressing_mode> Rd{!}, <registers> | Store list of registers to [Rd]
Store User mode registers | STM{cond}<addressing_mode> Rd{!}, <registers>^ | Store list of User mode registers to [Rd]
Examples
  STMFD R13!, {R0-R12, LR}
  LDMFD R13!, {R0-R12, PC}
  LDMIA R0, {R5-R8}
  STMDA R1!, {R2, R5, R11}

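The STMFD/LDMFD pair in the examples is the usual function entry and exit sequence on a full descending stack. The sketch below models it with a 16-entry register list and a dictionary as word-addressed memory; it assumes writeback (`!`) is always used and that the lowest-numbered register goes to the lowest address, and the helper names are illustrative.

```python
SP, LR, PC = 13, 14, 15


def stmfd(regs, mem, base, reglist):
    """STMFD base!, {reglist}: decrement the base first, then store ascending."""
    addr = regs[base] - 4 * len(reglist)
    regs[base] = addr                      # writeback: base now points at the last pushed word
    for r in sorted(reglist):              # lowest register at the lowest address
        mem[addr] = regs[r]
        addr += 4


def ldmfd(regs, mem, base, reglist):
    """LDMFD base!, {reglist}: load ascending, then leave the base above the popped words."""
    addr = regs[base]
    for r in sorted(reglist):
        regs[r] = mem[addr]
        addr += 4
    regs[base] = addr


regs = [0] * 16
mem = {}
regs[SP], regs[0], regs[1], regs[LR] = 0x8000, 11, 22, 0x1234

stmfd(regs, mem, SP, [0, 1, LR])           # like STMFD R13!, {R0, R1, LR}
regs[0] = regs[1] = 0                      # function body clobbers R0 and R1
ldmfd(regs, mem, SP, [0, 1, PC])           # like LDMFD R13!, {R0, R1, PC}
print(regs[0], regs[1], hex(regs[PC]), hex(regs[SP]))
```

Popping into PC the word that was pushed from LR restores the registers and returns to the caller in a single instruction.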
2.5 Semaphore Instructions
Operation | Assembler | Action
Swap word | SWP{cond} Rd, Rm, [Rn] | temp := [Rn], [Rn] := Rm, Rd := temp
Swap byte | SWP{cond}B Rd, Rm, [Rn] | temp := ZeroExtend([Rn][7:0]), [Rn][7:0] := Rm[7:0], Rd := temp
Examples
  SWP R12, R10, [R9]    ; load R12 from address R9 and store R10 to address R9.
  SWPB R3, R4, [R8]     ; load byte to R3 from address R8 and store byte from R4 to address R8.
  SWP R1, R1, [R2]      ; exchange the value in R1 with the value at the address in R2.

2.5 Exception-generating Instructions
List of exception-generating instructions:
  BKPT: breakpoint.
  SWI: software interrupt.
The Breakpoint (BKPT) instruction is used for software breakpoints in ARM architecture versions 5 and above.
The Software Interrupt (SWI) instruction is used to cause a SWI exception to occur.

2.5 Coprocessor Instructions
List of coprocessor instructions:
  CDP: coprocessor data operation.
  LDC: load coprocessor register.
  MCR: move to coprocessor from ARM register.
  MRC: move to ARM register from coprocessor.
  STC: store coprocessor register.

2.5 Thumb Instruction Set
  A re-encoded subset of the ARM instruction set.
  Intended to increase the performance of ARM implementations that use a 16-bit or narrower memory data bus.
  Every Thumb instruction is encoded in 16 bits.
  Thumb execution is normally entered by executing an ARM BX instruction. On architecture V5 and above, the BLX instruction and LDR/LDM instructions that load the PC can be used similarly.

2.5 Branch Instructions
Operation | Assembler | Action
Conditional branch | B{cond} label | R15 := label (AL not allowed; see the Condition Field table on the ARM side)
Unconditional branch | B label | R15 := label
Long branch with link | BL label | R14 := R15 - 2, R15 := label
Branch and exchange | BX Rm | R15 := Rm AND 0xFFFFFFFE, change to ARM if Rm[0] is 0
Branch with link and exchange | BLX label | R14 := R15 - 2, R15 := label, change to ARM
Branch with link and exchange | BLX Rm | R14 := R15 - 2, R15 := Rm AND 0xFFFFFFFE, change to ARM if Rm[0] is 0

2.5 Examples
  B label       ; unconditionally branch to label.
  BCC label     ; branch to label if carry flag is clear.
  BEQ label     ; branch to label if zero flag is set.
  BL func       ; subroutine call to func.
func            ; include body of function here.
  MOV PC, LR    ; R15 = R14, return to instruction after the BL.
  BX R12        ; branch to address in R12, begin ARM execution if bit 0 of R12 is zero;
                ; otherwise continue executing Thumb code.

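The BX behaviour described in the last example comes down to one bit: bit 0 of the register selects the instruction set and is then cleared from the branch target. A very small sketch of that rule (the `bx` helper is illustrative, and alignment checks on ARM targets are left out):

```python
def bx(rm):
    """Return (new_pc, state) for BX Rm: bit 0 of Rm selects Thumb (1) or ARM (0)."""
    state = "Thumb" if rm & 1 else "ARM"
    return rm & 0xFFFFFFFE, state


print(bx(0x00008001))   # odd address: continue in Thumb state
print(bx(0x00008000))   # even address: switch to ARM state
```

This is why pointers to Thumb functions are conventionally stored with bit 0 set.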
2.5 Data-processing Instructions [1/4]
Arithmetic instructions:
Operation | Assembler | Action
Add | ADD Rd, Rn, #<immed_3> | Rd := Rn + immed_3
Add Lo and Lo | ADD Rd, Rn, Rm | Rd := Rn + Rm
Add Hi to Lo, Lo to Hi, Hi to Hi | ADD Rd, Rm | Rd := Rd + Rm
Add immediate | ADD Rd, #<immed_8> | Rd := Rd + immed_8
Add value to SP | ADD SP, #<immed_7*4> | SP := SP + immed_7 * 4
Form address from SP | ADD Rd, SP, #<immed_8*4> | Rd := SP + immed_8 * 4
Form address from PC | ADD Rd, PC, #<immed_8*4> | Rd := (PC AND 0xFFFFFFFC) + immed_8 * 4
Add with carry | ADC Rd, Rm | Rd := Rd + Rm + C-bit

2.5 Data-processing Instructions [2/4]
Arithmetic instructions:
Operation | Assembler | Action
Subtract | SUB Rd, Rn, Rm | Rd := Rn - Rm
Subtract immediate 3 | SUB Rd, Rn, #<immed_3> | Rd := Rn - immed_3
Subtract immediate 8 | SUB Rd, #<immed_8> | Rd := Rd - immed_8
Subtract with carry | SBC Rd, Rm | Rd := Rd - Rm - NOT C-bit
Subtract value from SP | SUB SP, #<immed_7*4> | SP := SP - immed_7 * 4
Negate | NEG Rd, Rm | Rd := -Rm
Multiply | MUL Rd, Rm | Rd := Rm * Rd
Compare | CMP Rn, Rm | update CPSR flags on Rn - Rm
Compare negative | CMN Rn, Rm | update CPSR flags on Rn + Rm
Compare immediate | CMP Rn, #<immed_8> | update CPSR flags on Rn - immed_8
No operation | NOP | R8 := R8

2.5 Data-processing Instructions [3/4]
Move instructions:
Operation | Assembler | Action
Move immediate | MOV Rd, #<immed_8> | Rd := immed_8
Move Lo to Lo | MOV Rd, Rm | Rd := Rm
Move Hi to Lo, Lo to Hi, Hi to Hi | MOV Rd, Rm | Rd := Rm
Logical instructions:
Operation | Assembler | Action
AND | AND Rd, Rm | Rd := Rd AND Rm
Exclusive OR | EOR Rd, Rm | Rd := Rd EOR Rm
OR | ORR Rd, Rm | Rd := Rd OR Rm
Bit clear | BIC Rd, Rm | Rd := Rd AND NOT Rm
Move NOT | MVN Rd, Rm | Rd := NOT Rm
Test bits | TST Rn, Rm | update CPSR flags on Rn AND Rm

2.5 Data-processing Instructions [4/4]
Shift/Rotate instructions:
Operation | Assembler | Action
Logical shift left | LSL Rd, Rm, #<immed_5> | Rd := Rm << immed_5
Logical shift left | LSL Rd, Rs | Rd := Rd << Rs
Logical shift right | LSR Rd, Rm, #<immed_5> | Rd := Rm >> immed_5
Logical shift right | LSR Rd, Rs | Rd := Rd >> Rs
Arithmetic shift right | ASR Rd, Rm, #<immed_5> | Rd := Rm ASR immed_5
Arithmetic shift right | ASR Rd, Rs | Rd := Rd ASR Rs
Rotate right | ROR Rd, Rs | Rd := Rd ROR Rs

2.5 Examples
  ADD R0, R4, R7    ; R0 = R4 + R7.
  SUB R6, R1, R2    ; R6 = R1 - R2.
  ADD R0, #255      ; R0 = R0 + 255.
  ADD R1, R4, #4    ; R1 = R4 + 4.
  NEG R3, R1        ; R3 = 0 - R1.
  AND R2, R5        ; R2 = R2 AND R5.
  EOR R1, R6        ; R1 = R1 EOR R6.
  CMP R7, #100      ; update flags after R7 - 100.
  MOV R0, R12       ; R0 = R12.
  ADD R8, R10       ; R8 = R8 + R10.

2.5 Load and Store Register Instructions [1/2]
Load instructions:
Operation | Assembler | Action
Load with immediate offset, word | LDR Rd, [Rn, #<immed_5*4>] | Rd := [Rn + immed_5 * 4]
Load with immediate offset, halfword | LDRH Rd, [Rn, #<immed_5*2>] | Rd := ZeroExtend([Rn + immed_5 * 2][15:0])
Load with immediate offset, byte | LDRB Rd, [Rn, #<immed_5>] | Rd := ZeroExtend([Rn + immed_5][7:0])
Load with register offset, word | LDR Rd, [Rn, Rm] | Rd := [Rn + Rm]
Load with register offset, halfword | LDRH Rd, [Rn, Rm] | Rd := ZeroExtend([Rn + Rm][15:0])
Load with register offset, signed halfword | LDRSH Rd, [Rn, Rm] | Rd := SignExtend([Rn + Rm][15:0])
Load with register offset, byte | LDRB Rd, [Rn, Rm] | Rd := ZeroExtend([Rn + Rm][7:0])
Load with register offset, signed byte | LDRSB Rd, [Rn, Rm] | Rd := SignExtend([Rn + Rm][7:0])
Load PC-relative | LDR Rd, [PC, #<immed_8*4>] | Rd := [(PC AND 0xFFFFFFFC) + immed_8 * 4]
Load SP-relative | LDR Rd, [SP, #<immed_8*4>] | Rd := [SP + immed_8 * 4]
Load multiple | LDMIA Rn!, <reglist> | Loads list of registers

2.5 Load and Store Register Instructions [2/2]
Store instructions:
Operation | Assembler | Action
Store with immediate offset, word | STR Rd, [Rn, #<immed_5*4>] | [Rn + immed_5 * 4] := Rd
Store with immediate offset, halfword | STRH Rd, [Rn, #<immed_5*2>] | [Rn + immed_5 * 2][15:0] := Rd[15:0]
Store with immediate offset, byte | STRB Rd, [Rn, #<immed_5>] | [Rn + immed_5][7:0] := Rd[7:0]
Store with register offset, word | STR Rd, [Rn, Rm] | [Rn + Rm] := Rd
Store with register offset, halfword | STRH Rd, [Rn, Rm] | [Rn + Rm][15:0] := Rd[15:0]
Store with register offset, byte | STRB Rd, [Rn, Rm] | [Rn + Rm][7:0] := Rd[7:0]
Store SP-relative, word | STR Rd, [SP, #<immed_8*4>] | [SP + immed_8 * 4] := Rd
Store multiple | STMIA Rn!, <reglist> | Stores list of registers

2.5 Examples
  LDR R4, [R2, #4]          ; load word into R4 from address R2 + 4.
  STR R0, [R7, #12]         ; store word from R0 to address R7 + 12.
  STRB R1, [R5, #31]        ; store byte from R1 to address R5 + 31.
  STRH R4, [R2, R3]         ; store halfword from R4 to address R2 + R3.
  LDMIA R7!, {R0-R3, R5}    ; load R0-R3 and R5 from R7, add 20 to R7.
  STMIA R0!, {R3, R4, R5}   ; store R3-R5 to R0, add 12 to R0.

2.5 Stack Operation and Exception Instructions
Push & pop instructions:
Operation | Assembler | Action
Push | PUSH <reglist> | Push registers onto stack
Push with link | PUSH <reglist, LR> | Push LR and registers onto stack
Pop | POP <reglist> | Pop registers from stack
Pop and return | POP <reglist, PC> | Pop registers, branch to address loaded to PC
Pop and return with exchange | POP <reglist, PC> | Pop, branch, and change to ARM state if address[0] = 0
Software interrupt & breakpoint instructions:
Operation | Assembler | Action
Software interrupt | SWI <immed_8> | Software interrupt processor exception
Breakpoint | BKPT <immed_8> | Prefetch abort or enter debug state

2.5 Examples
function
  PUSH {R0-R7, LR}    ; push R0-R7 and the return address onto the stack (R13).
  ..                  ; code of the function body.
  POP {R0-R7, PC}     ; restore R0-R7 from the stack and the program counter, and return.
