Embedded Processor Architecture and Programming
王建民, Institute of Information Science, Academia Sinica
July 2008
Contents
Introduction
Computer Architecture
ARM Architecture
Development Tools
GNU Development Tools
ARM Instruction Set
ARM Assembly Language
ARM Assembly Programming
GNU ARM ToolChain
Interrupts and Monitor
Lecture 4: Development Tools
Outline
Compilers
Assemblers
Linkers and Loaders
Runtime Environment
What is a compiler?
A program translator
Source language: e.g., C, C++, Java, Pascal
Target language: e.g., assembly language for x86, MIPS, ARM
Historical Background
Machine language came first
1957: the first FORTRAN compiler
18 programmer-years of effort
Extremely ad hoc
Today’s techniques were created in response to the difficulties of implementing early compilers
Phases of a Compiler
Analysis (“front end”): lexical analysis, syntax analysis, semantic analysis
Synthesis (“back end”): intermediate code generation, intermediate code optimization, target code generation/optimization
The front and back ends share the symbol table
Lexical Analysis
Also known as “scanning”: transforms characters into tokens
Example:
double f = sqrt(-1);
TDOUBLE (“double”) TIDENT (“f”) TOP (“=”) TIDENT (“sqrt”) TLPAREN (“(”)
TOP (“-”) TINTCONSTANT (“1”) TRPAREN (“)”) TSEP (“;”)
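A minimal hand-written scanner makes this step concrete. The C sketch below recognizes only what the example needs: the keyword `double`, identifiers, decimal integer constants, parentheses, and the `;` separator, and it treats every other single character as an operator (TOP). The function names `next_token` and `scan` and the 32-byte lexeme limit are assumptions for illustration, not part of any real compiler.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Token classes from the example above (a deliberately tiny subset of C). */
typedef enum { TDOUBLE, TIDENT, TOP, TLPAREN, TRPAREN, TINTCONSTANT, TSEP, TEOF } TokenType;

typedef struct {
    TokenType type;
    char text[32];          /* the lexeme: the characters that form the token */
} Token;

/* Scans one token starting at *src and advances *src past it. */
static Token next_token(const char **src)
{
    const char *s = *src;
    Token tok = { TEOF, "" };
    size_t len = 0;

    while (isspace((unsigned char)*s))                /* skip white space */
        s++;
    if (*s == '\0') {                                 /* end of input */
        *src = s;
        return tok;
    }
    if (isalpha((unsigned char)*s) || *s == '_') {    /* identifier or keyword */
        while (isalnum((unsigned char)s[len]) || s[len] == '_')
            len++;
        tok.type = TIDENT;
    } else if (isdigit((unsigned char)*s)) {          /* integer constant */
        while (isdigit((unsigned char)s[len]))
            len++;
        tok.type = TINTCONSTANT;
    } else {                                          /* any other single character */
        len = 1;
        tok.type = (*s == '(') ? TLPAREN
                 : (*s == ')') ? TRPAREN
                 : (*s == ';') ? TSEP
                 : TOP;                               /* e.g. "=" and "-" */
    }
    assert(len < sizeof tok.text);
    memcpy(tok.text, s, len);
    tok.text[len] = '\0';
    if (tok.type == TIDENT && strcmp(tok.text, "double") == 0)
        tok.type = TDOUBLE;                           /* keywords look like identifiers */
    *src = s + len;
    return tok;
}

/* Scans a whole string into toks[] and returns the number of tokens found. */
static int scan(const char *src, Token toks[], int max)
{
    int n = 0;
    for (Token t = next_token(&src); t.type != TEOF; t = next_token(&src)) {
        assert(n < max);
        toks[n++] = t;
    }
    return n;
}
```

Scanning `double f = sqrt(-1);` with `scan` yields the nine tokens listed above, in the same order.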
Syntax Analysis
Also known as “parsing”
Uses context-free grammars
Structural validation
Creates a parse tree or derivation
Derivation of “sqrt(-1)”
Expression
-> FuncCall
-> TIDENT TLPAREN Expression TRPAREN
-> TIDENT TLPAREN UnaryExpression TRPAREN
-> TIDENT TLPAREN TOP Expression TRPAREN
-> TIDENT TLPAREN TOP TINTCONSTANT TRPAREN
Grammar rules used:
Expression -> UnaryExpression
Expression -> FuncCall
Expression -> TINTCONSTANT
UnaryExpression -> TOP Expression
FuncCall -> TIDENT TLPAREN Expression TRPAREN
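This grammar is simple enough for a recursive-descent parser with one token of lookahead: TOP, TIDENT, and TINTCONSTANT each select a different alternative for Expression. The C sketch below assumes the input has already been scanned into token types and records the parse tree as a bracketed string; the `Parser` structure and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Token types from the derivation above; the input is assumed to be already scanned. */
typedef enum { TIDENT, TLPAREN, TRPAREN, TOP, TINTCONSTANT, TEND } Tok;

static const char *const tok_name[] = {
    "TIDENT", "TLPAREN", "TRPAREN", "TOP", "TINTCONSTANT", "TEND"
};

typedef struct {
    const Tok *toks;    /* token stream, terminated by TEND */
    int pos;            /* index of the next unread token */
    char out[256];      /* bracketed parse tree built so far */
    int ok;             /* cleared on the first syntax error */
} Parser;

/* Appends text to the parse-tree string, truncating if it would overflow. */
static void emit(Parser *p, const char *s)
{
    strncat(p->out, s, sizeof p->out - strlen(p->out) - 1);
}

/* Consumes the expected terminal and records it as a leaf of the tree. */
static void match(Parser *p, Tok t)
{
    if (p->toks[p->pos] != t) {
        p->ok = 0;
        return;
    }
    emit(p, tok_name[t]);
    emit(p, " ");
    p->pos++;
}

static void expression(Parser *p);

/* FuncCall -> TIDENT TLPAREN Expression TRPAREN */
static void func_call(Parser *p)
{
    emit(p, "FuncCall( ");
    match(p, TIDENT);
    match(p, TLPAREN);
    expression(p);
    match(p, TRPAREN);
    emit(p, ") ");
}

/* UnaryExpression -> TOP Expression */
static void unary_expression(Parser *p)
{
    emit(p, "UnaryExpression( ");
    match(p, TOP);
    expression(p);
    emit(p, ") ");
}

/* Expression -> UnaryExpression | FuncCall | TINTCONSTANT,
   choosing the alternative from the next token (one-token lookahead). */
static void expression(Parser *p)
{
    if (!p->ok)
        return;
    emit(p, "Expression( ");
    switch (p->toks[p->pos]) {
    case TOP:          unary_expression(p);     break;
    case TIDENT:       func_call(p);            break;
    case TINTCONSTANT: match(p, TINTCONSTANT);  break;
    default:           p->ok = 0;               break;  /* no rule applies */
    }
    emit(p, ") ");
}

/* Parses toks as an Expression; returns 1 on success and copies the tree into tree. */
static int parse(const Tok *toks, char *tree, size_t size)
{
    Parser p = { toks, 0, "", 1 };

    assert(size > 0);
    expression(&p);
    if (p.toks[p.pos] != TEND)      /* all input must be consumed */
        p.ok = 0;
    snprintf(tree, size, "%s", p.out);
    return p.ok;
}
```

For the tokens of `sqrt(-1)` the bracketed output is `Expression( FuncCall( TIDENT TLPAREN Expression( UnaryExpression( TOP Expression( TINTCONSTANT ) ) ) TRPAREN ) )`, the parse tree of “sqrt(-1)” written on one line; each parsing function corresponds to one step of the derivation.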
Parse Tree of “sqrt(-1)”
Expression
  FuncCall
    TIDENT
    TLPAREN
    Expression
      UnaryExpression
        TOP
        Expression
          TINTCONSTANT
    TRPAREN
Semantic Analysis
“Does it make sense?” Checking semantic rules, mostly about types, such as:
Is the variable declared?
Are operand types compatible?
Do function arguments match the function declarations?
Intermediate Code Generation
A program for an abstract machine
Requirements: easy to generate from the parse tree; easy to translate into target code
A variety of forms: quadruples or three-address code; register transfer language
Intermediate Code Example
Three-address code (TAC) for the source:
j = 2 * i + 1;
if (j >= n) j = 2 * i + 3;
return a[j];
TAC:
t1 = 2 * i
t2 = t1 + 1
j = t2
t3 = j < n
if t3 goto L0
t4 = 2 * i
t5 = t4 + 3
j = t5
L0: t6 = a[j]
return t6
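One way to check that the three-address code preserves the meaning of the source is to transcribe it literally into C: each TAC instruction becomes one statement, the temporaries become local variables, and `if t3 goto L0` becomes a C `goto`. The function names and the test values are assumptions for this sketch.

```c
#include <assert.h>

/* The source statements, written directly. */
static int source_version(const int a[], int i, int n)
{
    int j = 2 * i + 1;
    if (j >= n)
        j = 2 * i + 3;
    return a[j];
}

/* A literal transcription of the three-address code: one operator per statement,
   temporaries t1..t6, and an explicit jump to label L0. */
static int tac_version(const int a[], int i, int n)
{
    int t1, t2, t3, t4, t5, t6, j;

    t1 = 2 * i;
    t2 = t1 + 1;
    j = t2;
    t3 = j < n;
    if (t3) goto L0;    /* the condition is inverted: skip the update when j < n */
    t4 = 2 * i;
    t5 = t4 + 3;
    j = t5;
L0:
    t6 = a[j];
    return t6;
}
```

Comparing `tac_version` with `source_version` over a range of `i` and `n` values is a quick sanity check that the translation is faithful.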
Intermediate Code Optimization
Suppressing code generation for unreachable code segments
Getting rid of unused variables
Eliminating multiplication by 1 and addition of 0
Loop optimization
Common sub-expression elimination
..., etc.
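As an illustration of one item on this list, eliminating multiplication by 1 and addition of 0, the sketch below rewrites such quadruples into plain copies. The `Quad` layout (operator and operands stored as short strings, with `:=` for a copy) is a simplifying assumption, not a standard IR format.

```c
#include <assert.h>
#include <string.h>

/* A quadruple: result := arg1 op arg2 (arg2 is empty for a copy). */
typedef struct {
    char op[4];       /* "+", "*", or ":=" */
    char arg1[8];
    char arg2[8];
    char result[8];
} Quad;

/* Rewrites x * 1, 1 * x, x + 0 and 0 + x into the copy result := x.
   Returns the number of quadruples simplified. */
static int simplify(Quad code[], int n)
{
    int changed = 0;

    for (int k = 0; k < n; k++) {
        Quad *q = &code[k];
        const char *keep = NULL;        /* the operand that survives, if any */

        if (strcmp(q->op, "*") == 0) {
            if (strcmp(q->arg2, "1") == 0)      keep = q->arg1;
            else if (strcmp(q->arg1, "1") == 0) keep = q->arg2;
        } else if (strcmp(q->op, "+") == 0) {
            if (strcmp(q->arg2, "0") == 0)      keep = q->arg1;
            else if (strcmp(q->arg1, "0") == 0) keep = q->arg2;
        }
        if (keep != NULL) {
            char tmp[8];
            assert(strlen(keep) < sizeof tmp);
            strcpy(tmp, keep);          /* keep may point into q itself, so copy first */
            strcpy(q->op, ":=");
            strcpy(q->arg1, tmp);
            q->arg2[0] = '\0';
            changed++;
        }
    }
    return changed;
}
```

The resulting copies are then candidates for copy propagation, which replaces later uses of the copied result with the surviving operand.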
Code Optimization Example
Before:
t1 = 2 * i
t2 = t1 + 1
j = t2
t3 = j < n
if t3 goto L0
t4 = 2 * i
t5 = t4 + 3
j = t5
L0: t6 = a[j]
return t6
After:
t1 = 2 * i
j = t1 + 1
t3 = j < n
if t3 goto L0
j = t1 + 3
L0: t6 = a[j]
return t6
Target Code Generation
Example (SPARC): a in %o0, i in %o1, n in %o2, j in %g2
Intermediate code:
t1 = 2 * i
j = t1 + 1
t3 = j < n
if t3 goto L0
j = t1 + 3
L0: t6 = a[j]
return t6
Target code:
        sll  %o1, 1, %o1
        add  %o1, 1, %g2
        cmp  %g2, %o2
        blt  .LL3
        nop                     ! fills the delay slot of blt
        add  %o1, 3, %g2
.LL3:   sll  %g2, 2, %g2
        retl
        ld   [%o0+%g2], %o0     ! delayed branch: executes in the delay slot of retl
Pascal Example: Source Code
PROGRAM STATS
VAR
    SUM, SUMSQ, I, VALUE, MEAN, VARIANCE : INTEGER
BEGIN
    SUM := 0;
    SUMSQ := 0;
    FOR I := 1 TO 100 DO
        BEGIN
            READ(VALUE);
            SUM := SUM + VALUE;
            SUMSQ := SUMSQ + VALUE * VALUE
        END;
    MEAN := SUM DIV 100;
    VARIANCE := SUMSQ DIV 100 - MEAN * MEAN;
    WRITE(MEAN, VARIANCE)
END.
Pascal Example: Token Coding
Token      Code
PROGRAM    1
VAR        2
BEGIN      3
END        4
END.       5
INTEGER    6
FOR        7
READ       8
WRITE      9
TO         10
DO         11
;          12
:          13
,          14
:=         15
+          16
-          17
*          18
DIV        19
(          20
)          21
id         22
int        23
Scanner Output: Token Stream I
Line   Token type and token specifier, in order
1      1, 22 ^STATS
2      2
3      22 ^SUM, 14, 22 ^SUMSQ, 14, 22 ^I, 14, 22 ^VALUE, 14, 22 ^MEAN, 14, 22 ^VARIANCE, 13, 6
4      3
5      22 ^SUM, 15, 23 #0, 12
6      22 ^SUMSQ, 15, 23 #0, 12
7      7, 22 ^I, 15, 23 #1, 10, 23 #100, 11
8      3
9      8, 20, 22 ^VALUE, 21, 12
Scanner Output: Token Stream II
Line   Token type and token specifier, in order
10     22 ^SUM, 15, 22 ^SUM, 16, 22 ^VALUE, 12
11     22 ^SUMSQ, 15, 22 ^SUMSQ, 16, 22 ^VALUE, 18, 22 ^VALUE
12     4, 12
13     22 ^MEAN, 15, 22 ^SUM, 19, 23 #100, 12
14     22 ^VARIANCE, 15, 22 ^SUMSQ, 19, 23 #100, 17, 22 ^MEAN, 18, 22 ^MEAN, 12
15     9, 20, 22 ^MEAN, 14, 22 ^VARIANCE, 21
16     5
Pascal Example: BNF Grammar
1   <prog> ::= PROGRAM <prog-name> VAR <decl-list> BEGIN <stmt-list> END.
2   <prog-name> ::= id
3   <decl-list> ::= <dec> | <decl-list> ; <dec>
4   <dec> ::= <id-list> : <type>
5   <type> ::= INTEGER
6   <id-list> ::= id | <id-list> , id
7   <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
8   <stmt> ::= <assign> | <read> | <write> | <for>
9   <assign> ::= id := <exp>
10  <exp> ::= <term> | <exp> + <term> | <exp> - <term>
11  <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
12  <factor> ::= id | int | ( <exp> )
13  <read> ::= READ ( <id-list> )
14  <write> ::= WRITE ( <id-list> )
15  <for> ::= FOR <index-exp> DO <body>
16  <index-exp> ::= id := <exp> TO <exp>
17  <body> ::= <stmt> | BEGIN <stmt-list> END
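Rules 10 and 11 are left-recursive, so a recursive-descent parser cannot use them literally: parsing `<exp>` would immediately call itself again without consuming input. The usual fix, sketched below in C, turns the left recursion into a loop, which also keeps `+`, `-`, `*`, and `DIV` left-associative. This sketch evaluates the expression instead of building a tree, and the `Binding` table standing in for symbol-table lookups of `id` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for symbol-table lookups: a name and its current value. */
typedef struct { const char *name; long value; } Binding;

typedef struct {
    const char *s;          /* unread input */
    const Binding *env;     /* values of the identifiers */
    int nenv;
} Input;

static long expression(Input *in);

static void skip_space(Input *in)
{
    while (isspace((unsigned char)*in->s))
        in->s++;
}

/* Consumes the symbol or keyword kw if it comes next; returns 1 if it did. */
static int accept_kw(Input *in, const char *kw)
{
    size_t len = strlen(kw);
    skip_space(in);
    if (strncmp(in->s, kw, len) != 0)
        return 0;
    in->s += len;
    return 1;
}

/* Rule 12: <factor> ::= id | int | ( <exp> ) */
static long factor(Input *in)
{
    size_t len = 0;

    if (accept_kw(in, "(")) {
        long v = expression(in);
        int closed = accept_kw(in, ")");
        assert(closed && "missing )");
        (void)closed;
        return v;
    }
    if (isdigit((unsigned char)*in->s)) {             /* int */
        long v = 0;
        while (isdigit((unsigned char)*in->s))
            v = v * 10 + (*in->s++ - '0');
        return v;
    }
    while (isalnum((unsigned char)in->s[len]))        /* id */
        len++;
    for (int k = 0; k < in->nenv; k++)
        if (strlen(in->env[k].name) == len && strncmp(in->env[k].name, in->s, len) == 0) {
            in->s += len;
            return in->env[k].value;
        }
    assert(!"undeclared identifier");
    return 0;
}

/* Rule 11: <term> ::= <factor> | <term> * <factor> | <term> DIV <factor>
   The left recursion becomes a loop, which keeps * and DIV left-associative. */
static long term(Input *in)
{
    long v = factor(in);
    for (;;) {
        if (accept_kw(in, "*"))
            v *= factor(in);
        else if (accept_kw(in, "DIV"))
            v /= factor(in);
        else
            return v;
    }
}

/* Rule 10: <exp> ::= <term> | <exp> + <term> | <exp> - <term> */
static long expression(Input *in)
{
    long v = term(in);
    for (;;) {
        if (accept_kw(in, "+"))
            v += term(in);
        else if (accept_kw(in, "-"))
            v -= term(in);
        else
            return v;
    }
}

/* Evaluates an <exp> of the Pascal subset, using env for identifiers. */
static long evaluate(const char *text, const Binding *env, int nenv)
{
    Input in = { text, env, nenv };
    return expression(&in);
}
```

With SUMSQ = 338350 and MEAN = 50 (the values for the inputs 1 to 100), `evaluate("SUMSQ DIV 100 - MEAN * MEAN", ...)` returns 883.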
Parser Output: Parse Tree I
Parser Output: Parse Tree II
Intermediate Code I
      Operation  Op1     Op2     Result
(1)   :=         #0              SUM       {SUM := 0}
(2)   :=         #0              SUMSQ     {SUMSQ := 0}
(3)   :=         #1              I         {FOR I := 1 TO 100}
(4)   JGT        I       #100    (15)
(5)   CALL       XREAD                     {READ(VALUE)}
(6)   PARAM      VALUE
(7)   +          SUM     VALUE   i1        {SUM := SUM + VALUE}
(8)   :=         i1              SUM
(9)   *          VALUE   VALUE   i2        {SUMSQ := SUMSQ + VALUE * VALUE}
(10)  +          SUMSQ   i2      i3
(11)  :=         i3              SUMSQ
(12)  +          I       #1      i4        {end of FOR loop}
Intermediate Code II
      Operation  Op1     Op2     Result
(13)  :=         i4              I
(14)  J                          (4)
(15)  DIV        SUM     #100    i5        {MEAN := SUM DIV 100}
(16)  :=         i5              MEAN
(17)  DIV        SUMSQ   #100    i6        {VARIANCE := SUMSQ DIV 100 - MEAN * MEAN}
(18)  *          MEAN    MEAN    i7
(19)  -          i6      i7      i8
(20)  :=         i8              VARIANCE
(21)  CALL       XWRITE                    {WRITE(MEAN, VARIANCE)}
(22)  PARAM      MEAN
(23)  PARAM      VARIANCE
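The quadruples can be checked by transcribing them one-for-one into C, with labels for the jump targets (4) and (15). In this sketch, the `CALL XREAD` / `PARAM VALUE` pair is replaced by reading the next element of an input array, and `CALL XWRITE` by storing MEAN and VARIANCE through output pointers; those substitutions, the lowercase variable names, and the function name `stats` are assumptions.

```c
#include <assert.h>

/* One C statement per quadruple (1)-(23); Q4 and Q15 are the jump targets. */
static void stats(const int input[100], int *mean_out, int *variance_out)
{
    int sum, sumsq, i, value, mean, variance;
    int i1, i2, i3, i4, i5, i6, i7, i8;

    assert(input != 0 && mean_out != 0 && variance_out != 0);
    sum = 0;                        /* (1)  :=  #0            SUM      */
    sumsq = 0;                      /* (2)  :=  #0            SUMSQ    */
    i = 1;                          /* (3)  :=  #1            I        */
Q4:
    if (i > 100) goto Q15;          /* (4)  JGT I      #100   (15)     */
    value = input[i - 1];           /* (5)-(6) CALL XREAD, PARAM VALUE */
    i1 = sum + value;               /* (7)  +   SUM    VALUE  i1       */
    sum = i1;                       /* (8)  :=  i1            SUM      */
    i2 = value * value;             /* (9)  *   VALUE  VALUE  i2       */
    i3 = sumsq + i2;                /* (10) +   SUMSQ  i2     i3       */
    sumsq = i3;                     /* (11) :=  i3            SUMSQ    */
    i4 = i + 1;                     /* (12) +   I      #1     i4       */
    i = i4;                         /* (13) :=  i4            I        */
    goto Q4;                        /* (14) J                 (4)      */
Q15:
    i5 = sum / 100;                 /* (15) DIV SUM    #100   i5       */
    mean = i5;                      /* (16) :=  i5            MEAN     */
    i6 = sumsq / 100;               /* (17) DIV SUMSQ  #100   i6       */
    i7 = mean * mean;               /* (18) *   MEAN   MEAN   i7       */
    i8 = i6 - i7;                   /* (19) -   i6     i7     i8       */
    variance = i8;                  /* (20) :=  i8            VARIANCE */
    *mean_out = mean;               /* (21)-(23) CALL XWRITE, PARAM MEAN, PARAM VARIANCE */
    *variance_out = variance;
}
```

With the inputs 1, 2, ..., 100 it yields MEAN = 5050 DIV 100 = 50 and VARIANCE = 338350 DIV 100 - 50 * 50 = 3383 - 2500 = 883.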
Assembly Code I
Assembly Code II
Compiler Issues
Symbol table management
Scoping
Error handling and recovery
Passes: one-pass vs. multi-pass
Most compilers are one-pass up to the code optimization phase
Several passes are usually required for code optimization
End-to-End Compilation
(Figure: the end-to-end compilation pipeline)
source code -> Lexical analysis -> tokens -> Syntax analysis -> parse tree -> Semantic analysis -> AST -> IR gen. -> IR -> IR optimization -> IR -> Target code generation -> assembly code -> Assembler -> object code -> Linker & Loader -> executable code -> Execution
Outline
Compilers
Assemblers
Linkers and Loaders
Runtime Environment
C versus Assembly Language
C is called a “portable assembly language”:
Allows low-level operations on bits and bytes
Allows access to memory via pointers
Integrates well with assembly language functions
Advantages over assembly code:
Source code is easier to read and understand
Requires fewer lines of code for the same function
Doesn’t require detailed knowledge of the hardware
C versus Assembly Language
Good reasons for learning assembly language:
In time-critical sections of code, assembly language can improve performance
It is a good way to learn how a processor works
When writing a new operating system, or porting an existing one to a new machine, some sections of code must be written in assembly language
Best of Both Worlds
Integrating C and assembly code
It is convenient to let C do most of the work and integrate with assembly code where needed
Make our gas routines callable from C:
Use the C compiler’s conventions for function calls
Preserve the registers that the C compiler expects to be saved
GNU vs. Intel Assembler (1)
In this course, we will be using the GNU assembler, referred to as “gas”
Available on UNIX machines as “i386-as”
The GNU assembler uses the AT&T syntax (instead of the official Intel/Microsoft syntax)
The textbook is written using Intel assembly language syntax, which is not the same as GNU syntax
Local references will provide gas notes for the textbook sections written in Intel assembly language
GNU vs. Intel Assembler (2)
Overall, the following are the key differences between the Intel and the gas syntax:
GNU operation codes have a size suffix that is not present on Intel operation codes
Intel: MOV    gas: movb, movw, or movl
GNU operands are in the opposite order from Intel operands
Intel: MOV dest, source    gas: movb source, dest
GNU vs. Intel Assembler (3)
GNU register names are preceded by a % that is not present on Intel register names
Intel: MOV AH, AL    gas: movb %al, %ah
GNU constants are represented differently from Intel constants
Intel: MOV AL, 0AH    gas: movb $0xa, %al
Comments are indicated with # instead of ;
Intel: ; comment here    gas: # comment here
GNU vs. Intel Assembler (4)
You should familiarize yourself with both Intel and GNU assembly language syntax
Even for the GNU assembler, the syntax may not be the same on different platforms (for example, gas for ARM uses @ rather than # to start a comment)
You may need to use Intel syntax in your professional work someday
The Four Field Format (1)
The Label Field
A label is a symbol followed by :
It can be used to represent an address
The ‘Opcode’ Field
A mnemonic specifies the instruction and its size, so there is no need to remember instruction code values
Directives guide the work of the assembler; in GNU assembly language, a directive begins with .
The Four Field Format (2)
The Operand Field(s)
The data on which the instruction operates
Zero, one, or two operands, depending on the instruction
The Comment Field
A comment contains documentation
It begins with a # anywhere on the line and continues to the end of the line
Symbolic Constants
Allow the use of symbols for numeric values
Perform the same function as #define in C
Format: SYMBOL = value
Example:
NCASES = 8
movl $NCASES, %eax
Assembly Coding for a C Function
General form for a C function in assembly:
.globl _mycode
.text
_mycode:
. . .
ret
.data
_mydata: .long 17
.end
Assembler Directives (1)
Defining a label for external reference (call): .globl _mycode
Defining the code section of the program: .text
Defining the data section of the program: .data
End of the assembly language source: .end
Assembler Directives (2)
Defining / initializing storage locations:
.long 0x12345678    # 32 bits
.word 0x1234        # 16 bits
.byte 0x12          # 8 bits
Defining / initializing a string:
.ascii "Hello World\n\0"
.asciz "Hello World\n"     # .asciz appends the terminating \0 itself
C Function Coding Conventions
Use the same function name as in the calling C program, except with a leading _ (needed where the C compiler prefixes external names with _; ELF toolchains on Linux do not)
Use only %eax, %ecx, and %edx, to avoid using registers that the C compiler expects to be preserved across a call and return
Save/restore other registers on the stack as needed
Put the return value in %eax before the “ret” instruction
Example #1: Sum Two Numbers
The C “driver” that executes sum2.s is called sum2c.c:
#include <stdio.h>
extern int sum2(void);
int main(void)
{
    printf("Sum2 returned %d\n", sum2());
    return 0;
}
Example #1: Assembly Code
Assembly code for sum2.s:
# sum2.s -- Sum of two numbers
        .text
        .globl _sum2
_sum2:  movl $8, %eax
        addl $3, %eax
        ret                     # result in %eax
How to pass parameters?
How would you modify the source code so that sum2(int a, int b) returns the sum of its two integer parameters a and b?
#include <stdio.h>
extern int sum2(int a, int b);
int main(void)
{
    printf("3 + 8 = %d\n", sum2(3, 8));
    return 0;
}
Addressing Memory and I/O
With gas, we’ve already seen operands with:
% for registers (part of the processor itself)
$ for immediate data (part of the instruction itself)
Accessing memory versus I/O:
         Memory                I/O
Read     movb address, %al     inb address, %al
Write    movb %al, address     outb %al, address
Addressing Memory (1)
Direct addressing for memory: Intel uses [ ]; gas does not use any “operator”
Example:
.text
movl %eax, 0x1234
movl 0x1234, %edx
. . .
Addressing Memory (2)
Direct addressing for memory: gas allows the use of a variable name as an address
Examples:
.text
movl %eax, total
movl total, %edx
. . .
.data
total: .long 0
Addressing Memory (3)
Direct addressing for memory: why can’t we write an instruction such as this?
movl first, second
The Intel instruction set does not support instructions that move a value from memory to memory!
We must always use a register as an intermediate location for the value being moved
Addressing Memory (4)
Indirect addressing: using a register as the address of the memory location to access in an instruction
movl $0x1234, %ebx
movb (%ebx), %al
(Figure: %ebx holds 0x00001234, the address of one byte in memory, which is copied into %al.)
Addressing Memory (5)
Indirect addressing may also be done with a fixed offset, e.g., 4:
movl $0x1234, %ebx
movb 4(%ebx), %al
(Figure: the byte at address 0x00001234 + 4 is copied into %al.)
Outline
Compilers
Assemblers
Linkers and Loaders
Runtime Environment
Linker and Loader Functions
Allocation
Loading
Relocation
Linking
Execution
Program Relocation
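Problem: If the starting address at which a program is loaded differs from the one assumed in the original program, the program will not produce the expected results when executed
Absolute program vs. relocatable program
Solutions:
Use PC-relative or base-relative addressing modes
Provide address-modification information to the loader, which completes the corrections before the program is executed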
real address = starting address + offset
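The formula above is what a relocating loader applies to each address field that the assembler marked as needing modification. The C sketch below assumes a hypothetical object-module layout: a program image of 32-bit words assembled as if loaded at address 0, plus a list of the word indices that hold addresses (the address-modification information).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object module: code assembled for load address 0. */
typedef struct {
    uint32_t *words;          /* the program image */
    size_t nwords;
    const size_t *reloc;      /* indices of the words that hold addresses */
    size_t nreloc;
} ObjectModule;

/* Adds the actual starting address to every relocatable word:
   real address = starting address + offset. */
static void relocate(ObjectModule *m, uint32_t start)
{
    for (size_t k = 0; k < m->nreloc; k++) {
        assert(m->reloc[k] < m->nwords);
        m->words[m->reloc[k]] += start;
    }
}
```

Words that are not listed, such as instructions that use PC-relative addressing, are left untouched, which is why that addressing mode avoids relocation altogether.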
Program Linking
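Allows independent program units to be assembled separately and linked with other program units only when the program is executed
Section: a program unit that can be independently assembled, loaded, and relocated; typically used for a subroutine or a logical unit of the program
External definition: defines symbols within a section for use by other sections
External reference: declares symbols used within a section as external symbols defined in other sections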
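Resolving external references amounts to a table lookup: collect every section's external definitions, then patch each external reference with the defining section's load address plus the symbol's offset. The record layouts (`ExtDef`, `ExtRef`) and the symbol names in the test are hypothetical simplifications of a real object-file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* External definition: symbol defined at `offset` within section `section`. */
typedef struct { const char *name; int section; uint32_t offset; } ExtDef;

/* External reference: word `index` of the image must receive the symbol's address. */
typedef struct { const char *name; size_t index; } ExtRef;

/* Finds a symbol among all external definitions and returns its absolute address
   (section load address + offset); sets *found to 0 if it is undefined. */
static uint32_t lookup(const char *name, const ExtDef defs[], size_t ndefs,
                       const uint32_t load_addr[], int *found)
{
    for (size_t k = 0; k < ndefs; k++)
        if (strcmp(defs[k].name, name) == 0) {
            *found = 1;
            return load_addr[defs[k].section] + defs[k].offset;
        }
    *found = 0;
    return 0;
}

/* Patches every external reference; returns the number of unresolved references. */
static int link_refs(uint32_t image[], size_t nwords,
                     const ExtRef refs[], size_t nrefs,
                     const ExtDef defs[], size_t ndefs, const uint32_t load_addr[])
{
    int unresolved = 0;

    for (size_t k = 0; k < nrefs; k++) {
        int found;
        uint32_t addr = lookup(refs[k].name, defs, ndefs, load_addr, &found);
        assert(refs[k].index < nwords);
        if (found)
            image[refs[k].index] = addr;
        else
            unresolved++;       /* a real linker would report an undefined symbol */
    }
    return unresolved;
}
```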
Classification
Loader: a system program that performs the loading function
Absolute loader, relocating loader, linking loader
Linker: a system program that performs the linking function
Linkage editor, binder
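Linkage Editors (1)
Function: A linkage editor performs linking and part of the relocation work, and writes the linked program to a file or library
Notes: The linked program is called a load module or executable image
At execution time, a simple relocating loader is all that is needed to load the program into memory
Loading can be completed in a single pass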
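Linkage Editors (2)
Comparison:
A linking loader performs all the linking work, including automatic library search and resolving external references, every time the program is executed; a linkage editor does this work only once, when it produces the load module
A linking loader places the linked program directly in memory; a linkage editor places it in a file or library
A linking loader costs more time and suits situations where the program is reassembled before every execution; a linkage editor suits programs that are executed many times without being reassembled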
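Dynamic Linking (1)
Function: No linking is done before the program runs; a subroutine is loaded into memory and linked with the rest of the program only when it is first called during execution
Also called dynamic loading or load-on-call
Comparison:
A linkage editor performs linking before the program is loaded for execution
A linking loader performs linking while the program is being loaded
Dynamic linking postpones linking until after the program starts executing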
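Dynamic Linking (2)
Advantages:
A subroutine is loaded only when it is actually used, saving the time and space needed to load and link subroutines that are never executed
Users can call any subroutine interactively without loading the entire library
The system can be reconfigured dynamically at any time
Disadvantages:
The system is more complex and incurs more overhead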
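Dynamic Linking (3)
Method: The dynamic loader is treated as part of the operating system; every routine that needs to be loaded dynamically must be called through an operating-system service
To call a subroutine, the program issues a load-and-call service request to the operating system, with the subroutine name as a parameter
The operating system checks its internal tables to determine whether the subroutine is already loaded; if not, it loads the subroutine from the library
The operating system transfers control to the called subroutine
When the called subroutine finishes, it first returns control to the operating system, which then returns control to the original program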
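The load-and-call sequence can be simulated in ordinary C. In the sketch below, a hypothetical `library` table maps subroutine names to functions, and "loading" only copies an entry into the operating system's internal table of loaded routines. A real system would map the code in from a library file, for example through the POSIX `dlopen`/`dlsym` interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOADED 8

typedef int (*Routine)(int);

/* Two routines standing in for subroutines stored in a library. */
static int square(int x) { return x * x; }
static int negate(int x) { return -x; }

typedef struct { const char *name; Routine fn; } Entry;

/* Hypothetical subroutine library, searched by name. */
static const Entry library[] = { { "square", square }, { "negate", negate } };

/* The "operating system's" internal table of routines loaded so far. */
static Entry loaded[MAX_LOADED];
static int nloaded = 0;
static int load_count = 0;      /* number of times a routine had to be loaded */

/* The load-and-call service: the caller passes the subroutine name (and an argument). */
static int load_and_call(const char *name, int arg)
{
    Routine fn = NULL;

    for (int k = 0; k < nloaded && fn == NULL; k++)     /* already loaded? */
        if (strcmp(loaded[k].name, name) == 0)
            fn = loaded[k].fn;

    if (fn == NULL) {                                   /* no: "load" it from the library */
        for (size_t k = 0; k < sizeof library / sizeof library[0] && fn == NULL; k++)
            if (strcmp(library[k].name, name) == 0) {
                assert(nloaded < MAX_LOADED);
                loaded[nloaded++] = library[k];
                fn = library[k].fn;
                load_count++;
            }
        assert(fn != NULL && "no such subroutine in the library");
    }
    return fn(arg);     /* transfer control; the result is passed back to the caller */
}
```

Calling `load_and_call("square", ...)` twice loads `square` only once: the second call finds it in the internal table.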
Outline
Compilers
Assemblers
Linkers and Loaders
Runtime Environment
Runtime Environment
To understand the environment in which your final output will run:
How a program is laid out in memory: code, data, stack, heap
How function callers and callees pass information
Executable Layout in Memory
From low memory up:
Code (text segment, instructions)
Static (constant) data
Global data
Dynamic data (heap)
Runtime stack (procedure calls)
Review of what’s in each section:
(Figure: memory map showing the code, static, global, stack, and heap sections.)
Text Segment (Executable Code)
Actual machine instructions: arithmetic / logical / comparison / branch / jump / load / store / move / …
The code segment is write-protected, so running code can’t overwrite itself (a debugger can overwrite it)
You’ll create the precursor for the code in this segment by emitting assembly code
The assembler will build the final text segment
Data Segment (1)
Data objects whose size is known at compile time and whose lifetime is the full run of the program (not just a function invocation)
Static data includes things that won’t change (and can be write-protected):
Virtual-function dispatch tables
String literals used in instructions
Arithmetic literals could be stored here, but are more likely incorporated into instructions
Data Segment (2)
Global data (other than static):
Variables declared global
Local variables declared static (in C): declared local to a function, they retain their values between invocations of that function (their lifetime is the whole run)
Semantic analysis ensures that static locals are not referenced outside their function scope
Dynamic Data (Heap) (1)
Data created by malloc or new
Heap data lives until it is deallocated or until the program ends (sometimes longer than you want, if you lose track of it)
Garbage collection and reference counting are ways of automatically deallocating dead storage in the heap
Dynamic Data (Heap) (2)
Heap allocation starts at the bottom of the heap (lower addresses) and allocates upward
Alignment requirements and the specifics of the allocation algorithm may cause storage to be allocated out of (address) order:
p1 = new Big();
p2 = new Medium();
p3 = new Big();
p4 = new Tiny();
So (int)p2 > (int)p1, but (int)p4 < (int)p3
Compare pointers for equality, not with < or >
(Figure: heap beginning at 0x1000000, containing *p1, *p2, *p3, and *p4.)
Runtime Stack (1)
Data used for function invocation:
Variables declared local to functions (including main), aka “automatic” data, except for statics (in the data segment)
Variables declared in anonymous blocks inside a function
Arguments to the function (passed by the caller)
Temporaries used by generated code (not representing names in the source)
Possibly the value returned by the callee to the caller
Runtime Stack (2)
Types of data that can be allocated on the runtime stack:
In C, all kinds of data: simple types, structs, arrays
C++: the stack can hold objects of class type as well as pointer type
Some languages don’t allow arrays on the stack
Stack Terminology (1)
A stack is an abstract data type
Push new values onto the Top; pop values off the Top
Higher elements are more recent; lower elements are older
(Figure: a stack with its Top and Base labeled.)
Stack Terminology (2)
A stack implementation can grow in either direction
The MIPS stack grows downward (from higher memory addresses to lower)
This can make the terminology confusing:
Some people (and documents) talk about going “up” and “down” the stack
Some use the abstraction, where “up” means “more recent”, towards Top
Some (including gdb) say “up” meaning “towards older entries”, toward Base
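Whichever direction a stack grows, push and pop behave the same; only the address arithmetic changes. The sketch below is an array-based stack that grows downward like the MIPS stack: the index `sp` starts just past the highest slot and each push decrements it, so the oldest entry (nearest the Base) sits at the highest index. The type and function names are illustrative.

```c
#include <assert.h>

#define STACK_WORDS 16

/* A stack that grows downward: each push moves sp to a lower index,
   just as $sp moves to lower addresses on MIPS. */
typedef struct {
    int mem[STACK_WORDS];
    int sp;                 /* index of the top element; STACK_WORDS when empty */
} DownStack;

static void stack_init(DownStack *s)
{
    s->sp = STACK_WORDS;
}

static void push(DownStack *s, int v)
{
    assert(s->sp > 0 && "stack overflow");
    s->mem[--s->sp] = v;    /* decrement first, then store: the top moves down */
}

static int pop(DownStack *s)
{
    assert(s->sp < STACK_WORDS && "stack underflow");
    return s->mem[s->sp++]; /* read the top, then move back toward the base */
}
```

After pushing 1 and then 2, the older value 1 is at the higher index: “up” toward more recent entries means toward lower addresses here, which is exactly the ambiguity described above.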
Other Resources
Caches (very fast), possibly multiple levels
Physical memory (fast)
Virtual memory (swapping is slower), including main memory plus swap space on disk
Registers (the fastest)
Storage Layout Issues
Variables (local and file-scope), functions, objects, arrays, strings
Arrays
C uses row-major order: the whole first row is stored first, then the whole second row, … then the whole last row
Fortran uses column-major order: the whole first column is stored first, then the whole second column, … then the whole last column
Storage is still one big block, but a column is contiguous instead of a row
Generating Code for Array Refs
In C, use the size of an element and the range of each dimension to compute the offset of any given element. If A has m rows and n columns of 4-byte elements:
&A[i][j] is &A + 4 * (n * i + j), where the addition is in bytes
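The row-major formula can be checked against the compiler's own array addressing. The sketch below writes `sizeof(int)` where the slide writes 4 (an `int` is 4 bytes on the 32-bit targets these slides assume, but that is not guaranteed everywhere), and it also gives the column-major offset a Fortran compiler would use, for comparison. The array dimensions `M` and `N` are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

#define M 3   /* rows */
#define N 4   /* columns */

/* Byte offset of element [i][j] in row-major order (C): size * (n * i + j). */
static size_t row_major_offset(size_t i, size_t j, size_t n, size_t size)
{
    return size * (n * i + j);
}

/* Byte offset of element (i, j) in column-major order (Fortran): size * (m * j + i). */
static size_t col_major_offset(size_t i, size_t j, size_t m, size_t size)
{
    return size * (m * j + i);
}

/* Returns 1 if the compiler's address of every A[i][j] matches the row-major formula. */
static int check_row_major(int A[M][N])
{
    assert(A != NULL);
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < N; j++) {
            const char *expected = (const char *)A + row_major_offset(i, j, N, sizeof(int));
            if ((const char *)&A[i][j] != expected)
                return 0;
        }
    return 1;
}
```

With 4-byte elements, element [1][2] of a 3 x 4 array is at byte offset 4 * (4 * 1 + 2) = 24 in row-major order, but at 4 * (3 * 2 + 1) = 28 in column-major order.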
C struct Objects
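Structs in C are stored in adjacent words, with enough space for all fields to be aligned:
struct cow {
    char milk;      // 3 slack bytes after this (with 4-byte pointers)
    char *name;     // aligned on a word boundary
} Cow;
(Figure: Cow.milk holds 'A'; Cow.name holds 0x20000, the address of the string "Bossy".)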
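The slack bytes can be observed with `offsetof` from `<stddef.h>`. The 3 slack bytes on the slide assume 4-byte pointers; on a typical 64-bit target, pointers are 8-byte aligned and 7 slack bytes appear instead, so the sketch checks the alignment rule rather than a fixed count.

```c
#include <assert.h>
#include <stddef.h>

struct cow {
    char milk;      /* 1 byte, followed by slack bytes */
    char *name;     /* starts at the next multiple of the pointer's alignment */
};

/* The first member always starts at offset 0 (guaranteed by the C standard). */
static_assert(offsetof(struct cow, milk) == 0, "first member is at offset 0");

/* Number of unused (slack) bytes between milk and name. */
static size_t slack_after_milk(void)
{
    return offsetof(struct cow, name) - sizeof(char);
}
```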
Making Function Calls
Each active function call has its own unique stack frame, holding the:
Frame pointer
Static link
Return address
Arguments
Local variables and temporaries
Who does which tasks (caller vs. callee)? How do callers and callees communicate?
Who does what? (1)
Before a function call, the calling routine:
Saves any necessary registers
Pushes arguments onto the stack
Sets up the static link (if appropriate)
Saves the return address into $ra
Jumps (or branches) to the target (aka the callee, the called function)
Who does what? (2)
During a function call, the called routine:
Saves any necessary registers
Sets up the new frame pointer
Makes space for any locals or temporaries
Does its work
Sets up the return value in $v0 (works only for integer or pointer values)
Tears down the frame pointer and static link
Restores any saved registers
Jumps to the return address (saved on the stack)
Who does what? (3)
After a function call, the calling routine:
Removes the return address and parameters from the stack
Gets the return value from $v0
Restores any saved registers
Continues executing
Parameter Passing
Call by value: supported by C
Call by value-result: supported by Ada
Call by reference: supported by Fortran
Call by name: behaves like C preprocessor macros
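C itself provides only call by value, but the other mechanisms can be imitated in C, which makes their differences easy to see. In the sketch below (all names are illustrative), a pointer gives the effect of call by reference, an explicit copy-in/copy-out gives value-result, and a macro shows the call-by-name behavior of re-evaluating the argument expression at every use.

```c
#include <assert.h>

/* Call by value: the callee works on a copy; the caller's variable is unchanged. */
static void inc_by_value(int x)
{
    x = x + 1;          /* changes only the local copy */
    (void)x;
}

/* Call by reference, imitated with a pointer: the callee updates the caller's variable. */
static void inc_by_reference(int *x)
{
    *x = *x + 1;
}

/* Call by value-result, imitated: copy in, work on a local copy, copy back on return. */
static void inc_by_value_result(int *x)
{
    int local = *x;     /* copy in */
    local = local + 1;
    *x = local;         /* copy out */
}

/* Call by name, imitated with a macro: the argument expression is substituted
   textually, so it is re-evaluated every time the parameter is used. */
#define TWICE(e) ((e) + (e))

static int counter = 0;
static int next_value(void) { return ++counter; }   /* has a side effect */
```

`TWICE(next_value())` calls `next_value` twice and yields 1 + 2 = 3, whereas a function `twice(int e)` receiving the argument by value would see a single evaluation. Value-result and reference give the same answer in this sketch; they differ only when one variable is reachable through two names (aliasing).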
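An Example Program
#include <stdio.h>
/* Prints every word on the stack from `stop` down to the local variable p.
   This deliberately walks the stack: it assumes the 32-bit x86 frame layout shown
   in the sample output (e.g. gcc -m32 -O0) and is not portable, well-defined C. */
int dump(int arg1, int arg2, int arg3, int *stop)
{
    int loc1 = 5, loc2 = 6, loc3 = 7;
    int *p;
    printf("Address Content\n");
    for (p = stop; p >= (int *)&p; p--)
        printf("%8lx: %8x\n", (unsigned long)p, (unsigned)*p);
    return 9;
}
int main(int argc, char *argv[], char *envp[])
{
    int var1 = 1, var2 = 2, var3 = 3;
    var3 = dump(var1, var2, var3, (int *)&envp);
    return 0;
}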
Sample Output
Address     Content     Comment
bffff9a8:   bffff9ec    envp
bffff9a4:   bffff9e4    argv
bffff9a0:   1           argc
bffff99c:   420158d4    return address (crt0)
bffff998:   bffff9b8    fp (crt0) <- fp (main)
bffff994:   1           var1
bffff990:   2           var2
bffff98c:   3           var3
bffff988:   bffff998
bffff984:   4212a2d0
bffff980:   4212aa58
bffff97c:   bffff9a8    stop
bffff978:   3           arg3
bffff974:   2           arg2
bffff970:   1           arg1
bffff96c:   80483d1     return address (main)
bffff968:   bffff998    fp (main) <- fp (dump)
bffff964:   5           loc1
bffff960:   6           loc2
bffff95c:   7           loc3
bffff958:   bffff958    p