Embedded Processor Architecture and Programming (嵌入式處理器架構與程式設計)

王建民, Institute of Information Science, Academia Sinica

July 2008

Contents

    Introduction
    Computer Architecture
    ARM Architecture
    Development Tools
    GNU Development Tools
    ARM Instruction Set
    ARM Assembly Language
    ARM Assembly Programming
    GNU ARM ToolChain
    Interrupts and Monitor

Lecture 4: Development Tools

Outline

    Compilers
    Assemblers
    Linkers and Loaders
    Runtime Environment

What is a compiler?

A program translator
Source language
    E.g., C, C++, Java, Pascal
Target language
    E.g., assembly language for x86, MIPS, ARM

Historical Background

Machine language first
1957: First FORTRAN compiler
    18 programmer-years of effort
    Extremely ad hoc
Today's techniques were created in response to the difficulties of implementing early compilers

Phases of a Compiler

Analysis ("front end")
    Lexical Analysis
    Syntax Analysis
    Semantic Analysis
Synthesis ("back end")
    Intermediate Code Generation
    Intermediate Code Optimization
    Target Code Generation/Optimization
Front and back ends share the symbol table

Lexical Analysis

Aka "scanning": transforms characters into tokens

Example:

    double f = sqrt(-1);

    TDOUBLE ("double")
    TIDENT ("f")
    TOP ("=")
    TIDENT ("sqrt")
    TLPAREN ("(")
    TOP ("-")
    TINTCONSTANT ("1")
    TRPAREN (")")
    TSEP (";")

Syntax Analysis

Aka "parsing"
Uses context-free grammars
Structural validation
Creates parse tree or derivation

Derivation of “sqrt(-1)”

Expression

-> FuncCall

-> TIDENT TLPAREN Expression TRPAREN

-> TIDENT TLPAREN UnaryExpression TRPAREN

-> TIDENT TLPAREN TOP Expression TRPAREN

-> TIDENT TLPAREN TOP TINTCONSTANT TRPAREN

Grammar used in the derivation:

    Expression -> UnaryExpression
    Expression -> FuncCall
    Expression -> TINTCONSTANT
    UnaryExpression -> TOP Expression
    FuncCall -> TIDENT TLPAREN Expression TRPAREN

Parse Tree of "sqrt(-1)"

    Expression
        FuncCall
            TIDENT
            TLPAREN
            Expression
                UnaryExpression
                    TOP
                    Expression
                        TINTCONSTANT
            TRPAREN

Semantic Analysis

"Does it make sense?"
Checking semantic rules, such as
    Is the variable declared?
    Are operand types compatible?
    Do function arguments match function declarations?
Types

Intermediate Code Generation

A program for an abstract machine
Requirements
    Easy to generate from parse tree
    Easy to translate into target code
A variety of forms
    Quadruple or three-address code
    Register transfer language

Intermediate Code Example

Three-address code (TAC)

Source:

    j = 2 * i + 1;
    if (j >= n) j = 2 * i + 3;
    return a[j];

TAC:

        t1 = 2 * i
        t2 = t1 + 1
        j = t2
        t3 = j < n
        if t3 goto L0
        t4 = 2 * i
        t5 = t4 + 3
        j = t5
    L0: t6 = a[j]
        return t6

Intermediate Code Optimization

Inhibiting code generation of unreachable code segments
Getting rid of unused variables
Eliminating multiplication by 1 and addition by 0
Loop optimization
Common sub-expression elimination
. . ., etc.

Code Optimization Example

Before:

    t1 = 2 * i
    t2 = t1 + 1
    j = t2
    t3 = j < n
    if t3 goto L0
    t4 = 2 * i
    t5 = t4 + 3
    j = t5

After:

    t1 = 2 * i
    j = t1 + 1
    t3 = j < n
    if t3 goto L0
    j = t1 + 3

Target Code Generation

Example: a in %o0, i in %o1, n in %o2, j in %g2

    Intermediate code           Target code
        t1 = 2 * i                  sll %o1, 1, %o1
        j = t1 + 1                  add %o1, 1, %g2
        t3 = j < n                  cmp %g2, %o2
        if t3 goto L0               blt .LL3
                                    nop
        j = t1 + 3                  add %o1, 3, %g2
    L0: t6 = a[j]             .LL3: sll %g2, 2, %g2
        return t6                   retl
                                    ld [%o0+%g2], %o0    (delayed branch)

Pascal Example: Source Code

    PROGRAM STATS
    VAR
       SUM,SUMSQ,I,VALUE,MEAN,VARIANCE : INTEGER
    BEGIN
       SUM := 0;
       SUMSQ := 0;
       FOR I := 1 TO 100 DO
          BEGIN
             READ(VALUE);
             SUM := SUM + VALUE;
             SUMSQ := SUMSQ + VALUE * VALUE
          END;
       MEAN := SUM DIV 100;
       VARIANCE := SUMSQ DIV 100 - MEAN * MEAN;
       WRITE(MEAN,VARIANCE)
    END.

Pascal Example: Token Coding

    Token      Code        Token   Code
    PROGRAM    1           ;       12
    VAR        2           :       13
    BEGIN      3           ,       14
    END        4           :=      15
    END.       5           +       16
    INTEGER    6           -       17
    FOR        7           *       18
    READ       8           DIV     19
    WRITE      9           (       20
    TO         10          )       21
    DO         11          id      22
                           int     23

Scanner Output: Token Stream I

    Line   Token type (token specifier)
    1      1, 22 (^STATS)
    2      2
    3      22 (^SUM), 14, 22 (^SUMSQ), 14, 22 (^I), 14, 22 (^VALUE), 14, 22 (^MEAN), 14, 22 (^VARIANCE), 13, 6
    4      3
    5      22 (^SUM), 15, 23 (#0), 12
    6      22 (^SUMSQ), 15, 23 (#0), 12
    7      7, 22 (^I), 15, 23 (#1), 10, 23 (#100), 11
    8      3
    9      8, 20, 22 (^VALUE), 21, 12

Scanner Output: Token Stream II

    Line   Token type (token specifier)
    10     22 (^SUM), 15, 22 (^SUM), 16, 22 (^VALUE), 12
    11     22 (^SUMSQ), 15, 22 (^SUMSQ), 16, 22 (^VALUE), 18, 22 (^VALUE)
    12     4, 12
    13     22 (^MEAN), 15, 22 (^SUM), 19, 23 (#100), 12
    14     22 (^VARIANCE), 15, 22 (^SUMSQ), 19, 23 (#100), 17, 22 (^MEAN), 18, 22 (^MEAN), 12
    15     9, 20, 22 (^MEAN), 14, 22 (^VARIANCE), 21
    16     5

Pascal Example: BNF Grammar

    1  <prog>      ::= PROGRAM <prog-name> VAR <decl-list> BEGIN <stmt-list> END.
    2  <prog-name> ::= id
    3  <decl-list> ::= <dec> | <decl-list> ; <dec>
    4  <dec>       ::= <id-list> : <type>
    5  <type>      ::= INTEGER
    6  <id-list>   ::= id | <id-list> , id
    7  <stmt-list> ::= <stmt> | <stmt-list> ; <stmt>
    8  <stmt>      ::= <assign> | <read> | <write> | <for>
    9  <assign>    ::= id := <exp>
    10 <exp>       ::= <term> | <exp> + <term> | <exp> - <term>
    11 <term>      ::= <factor> | <term> * <factor> | <term> DIV <factor>
    12 <factor>    ::= id | int | ( <exp> )
    13 <read>      ::= READ ( <id-list> )
    14 <write>     ::= WRITE ( <id-list> )
    15 <for>       ::= FOR <index-exp> DO <body>
    16 <index-exp> ::= id := <exp> TO <exp>
    17 <body>      ::= <stmt> | BEGIN <stmt-list> END

Parser Output: Parse Tree I and II

    [Figures: parse tree of the STATS program]

Intermediate Code I

          Operation  Op1     Op2     Result   Pascal statement
    (1)   :=         #0              SUM      {SUM := 0}
    (2)   :=         #0              SUMSQ    {SUMSQ := 0}
    (3)   :=         #1              I        {FOR I := 1 TO 100}
    (4)   JGT        I       #100    (15)
    (5)   CALL       XREAD                    {READ(VALUE)}
    (6)   PARAM      VALUE
    (7)   +          SUM     VALUE   i1       {SUM := SUM + VALUE}
    (8)   :=         i1              SUM
    (9)   *          VALUE   VALUE   i2       {SUMSQ := SUMSQ + VALUE * VALUE}
    (10)  +          SUMSQ   i2      i3
    (11)  :=         i3              SUMSQ
    (12)  +          I       #1      i4       {end of FOR loop}

Intermediate Code II

          Operation  Op1     Op2     Result     Pascal statement
    (13)  :=         i4              I
    (14)  J                          (4)
    (15)  DIV        SUM     #100    i5         {MEAN := SUM DIV 100}
    (16)  :=         i5              MEAN
    (17)  DIV        SUMSQ   #100    i6         {VARIANCE := SUMSQ DIV 100 - MEAN * MEAN}
    (18)  *          MEAN    MEAN    i7
    (19)  -          i6      i7      i8
    (20)  :=         i8              VARIANCE
    (21)  CALL       XWRITE                     {WRITE(MEAN,VARIANCE)}
    (22)  PARAM      MEAN
    (23)  PARAM      VARIANCE

Assembly Code I and II

    [Figures: assembly code generated for the STATS program]

Compiler Issues

Symbol Table Management
    Scoping
Error Handling & Recovery
Passes
    One-pass vs. multi-pass
    Most compilers are one-pass up to the code optimization phase
    Several passes are usually required for code optimization

End-to-End Compilation

    source code
      -> Lexical analysis        -> tokens
      -> Syntax analysis         -> parse tree
      -> Semantic analysis       -> AST
      -> IR gen.                 -> IR
      -> IR optimization         -> IR
      -> Target code generation  -> assembly code
      -> Assembler               -> object code
      -> Linker & Loader         -> executable code
      -> Execution


C versus Assembly Language

C is called a "portable assembly language"
    Allows low-level operations on bits and bytes
    Allows access to memory via use of pointers
    Integrates well with assembly language functions
Advantages over assembly code
    Easier to read and understand source code
    Requires fewer lines of code for same function
    Doesn't require knowledge of the hardware

C versus Assembly Language

Good reasons for learning assembly language
    In time-critical sections of code, it is possible to improve performance with assembly language
    It is a good way to learn how a processor works
    In writing a new operating system or in porting an existing system to a new machine, there are sections of code which must be written in assembly language

Best of Both Worlds

Integrating C and assembly code
    Convenient to let C do most of the work and integrate with assembly code where needed
Make our gas routines callable from C
    Use C compiler conventions for function calls
    Preserve registers that the C compiler expects to be saved

GNU vs. Intel Assembler (1)

In this course, we will be using the GNU assembler, referred to as "gas"
    Available on UNIX machines as "i386-as"
The GNU assembler uses the AT&T syntax (instead of the official Intel/Microsoft syntax)
    The textbook is written using Intel assembly language syntax, which is not the same as GNU syntax
    Local references will provide gas notes for the text sections with Intel assembly language


GNU vs. Intel Assembler (2)

Overall, the following are the key differences between the Intel and the gas syntax:
    The GNU operation codes have a size indicator that is not present on the Intel operation codes
        Intel: MOV
        gas:   movb, movw, or movl
    The GNU operands are in the opposite order from the Intel operands
        Intel: MOV dest, source
        gas:   movb source, dest


GNU vs. Intel Assembler (3)

    The GNU register names are preceded by a % that is not present on the Intel register names
        Intel: MOV AH, AL
        gas:   movb %al, %ah
    GNU constants are represented differently from the Intel constant representations
        Intel: MOV AL, 0AH
        gas:   movb $0xa, %al
    Comments are indicated with # instead of ;
        Intel: ; comment here
        gas:   # comment here


You should familiarize yourself with both Intel and GNU assembly language syntax

Even for GNU assembler, the syntax may not be the same on different platforms.

You may need to use Intel syntax in your professional work someday


The Four Field Format (1)

The Label Field
    A label is a symbol followed by :
    Can be referred to as a representation of the address
The "Opcode" Field
    Mnemonic to specify the instruction and size
        Unnecessary to remember instruction code values
    Directives to guide the work of the assembler
        In GNU assembly language, a directive begins with .

The Four Field Format (2)

The Operand Field(s)
    On which the instruction operates
    Zero, one, or two operands depending on the instruction
The Comment Field
    Comment contains documentation
    It begins with a # anywhere and goes to the end of the line

Symbolic Constants

Allow use of symbols for numeric values
    Perform same function as #define in C
    Format is: SYMBOL = value
    Example:

        NCASES = 8
        movl $NCASES, %eax

Assembly Coding for a C Function

General form for a C function in assembly:

            .globl _mycode
            .text
    _mycode:
            . . .
            ret
            .data
    _mydata: .long 17
            .end

Assembler Directives (1)

Defining a label for external reference (call)
    .globl _mycode
Defining code section of program
    .text
Defining data section of program
    .data
End of the assembly language source
    .end

Assembler Directives (2)

Defining / initializing storage locations:
    .long 0x12345678    # 32 bits
    .word 0x1234        # 16 bits
    .byte 0x12          # 8 bits
Defining / initializing a string:
    .ascii "Hello World\n\0"
    .asciz "Hello World\n"

C Function Coding Conventions

Same function name as used in the calling C program except with a leading _

Use only %eax, %ecx, and %edx to avoid using registers that the C compiler expects to be preserved across a call and return

Save/restore other registers on stack as needed

Return value in %eax before “ret” instruction


Example #1: Sum Two Numbers

C "driver" to execute sum2.s is called sum2c.c

    #include <stdio.h>

    extern int sum2(void);   /* implemented in sum2.s */

    int main(void)
    {
        printf("Sum2 returned %d\n", sum2());
        return 0;
    }

Example #1: Assembly Code

Assembly code for sum2.s

    # sum2.s -- Sum of two numbers
            .text
            .globl _sum2
    _sum2:  movl $8, %eax
            addl $3, %eax
            ret                 # number in eax

How to pass parameters?

How would you modify the source code so that sum2(int a, int b) returns the sum of two integer parameters a and b?

    #include <stdio.h>

    extern int sum2(int a, int b);

    int main(void)
    {
        printf("3 + 8 = %d\n", sum2(3, 8));
        return 0;
    }

Addressing Memory and I/O

With gas, we've already seen operands with:
    % for registers (part of the processor itself)
    $ for immediate data (part of the instruction itself)
Accessing memory versus I/O

                Memory                  I/O
    Read        movb address, %al       inb address, %al
    Write       movb %al, address       outb %al, address

Addressing Memory (1)

Direct addressing for memory
    Intel uses [ ]
    gas does not use any "operator"
    Example:

        .text
        movl %eax, 0x1234
        movl 0x1234, %edx
        . . .

Addressing Memory (2)

Direct addressing for memory
    gas allows use of a variable name for the address
    Example:

               .text
               movl %eax, total
               movl total, %edx
               . . .
               .data
        total: .long 0

Addressing Memory (3)

Direct addressing for memory
    Why can't we write an instruction such as this?

        movl first, second

    The Intel instruction set does not support instructions that move a value from memory to memory!
    Must always use a register as an intermediate location for the value being moved

Addressing Memory (4)

Indirect Addressing
    Defined as using a register as the address of the memory location to access in an instruction

        movl $0x1234, %ebx
        movb (%ebx), %al

    [Figure: %ebx holds 0x00001234, the address of the one byte in memory that is loaded into %al]

Addressing Memory (5)

Indirect Addressing
    May also be done with a fixed offset, e.g. 4

        movl $0x1234, %ebx
        movb 4(%ebx), %al

    [Figure: %ebx holds 0x00001234; the one byte at that address + 4 is loaded into %al]


Linker and Loader Functions

    Allocation
    Loading
    Relocation
    Linking
    Execution

Program Relocation

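Problem: if the starting address at which a program is loaded differs from the address assumed by the original program, execution will not produce the expected result
    Absolute Program
    Relocatable Program
Solutions
    Use PC-relative or base-relative addressing modes
    Provide address modification information to the loader, which completes the corrections after loading and before the program executes

        real address = starting address + offset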

Program Linking

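Allows independent program units to be assembled separately and linked with other program units only at execution time
    Section: a program unit that can be assembled, loaded, and relocated independently; typically used for subroutines or logical units of a program
    External definition: defines symbols inside a section for use by other sections
    External reference: declares that symbols used inside a section are external symbols defined in other sections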

Classification

Loader: a system program that performs the loading function.
    Absolute Loader
    Relocating Loader
    Linking Loader
Linker: a system program that performs the linking function.
    Linkage Editor
    Binder

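Linkage Editors (1)

Function
    A linkage editor performs linking and part of the relocation work, and writes the linked program to a file or program library
Notes
    The linked program is called a load module or executable image
    At execution time, a simple relocating loader is enough to load the program into memory
    Loading can be completed in a single pass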

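Linkage Editors (2)

Comparison
    A linking loader must perform all of the linking work on every execution, including automatic library search and external references; a linkage editor does this work only once, when the load module is produced
    A linking loader places the linked program directly in memory; a linkage editor places the linked program in a file or program library
    A linking loader costs more time and suits cases where the program is reassembled for every execution; a linkage editor suits cases where a program is executed many times without being reassembled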

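Dynamic Linking (1)

Function
    No linking is done before execution; after the program starts running, a subroutine is loaded into memory and linked with the rest of the program the first time it is called
    Also called dynamic loading or load-on-call
Comparison
    A linkage editor links before the program is loaded for execution
    A linking loader links while the program is being loaded
    Dynamic linking postpones linking until after execution has begun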

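Dynamic Linking (2)

Advantages
    A subroutine is loaded only when it is actually used, saving the time and space that would be spent loading and linking subroutines that are never executed
    Allows users to call any subroutine interactively without loading the entire program library
    The system can be reconfigured dynamically at any time
Disadvantages
    The system is more complex and has more overhead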

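Dynamic Linking (3)

Method
    Treat the dynamic loader as part of the operating system; any program that needs dynamic loading must call it through an operating system service
    To call a subroutine, the program issues a load-and-call operating system service request with the subroutine name as its parameter
    The operating system checks its internal tables to determine whether the subroutine is already loaded; if not, it loads the subroutine from the program library
    The operating system transfers control to the called subroutine
    When the called subroutine finishes, it returns control to the operating system, which then returns control to the original program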


Runtime Environment

To understand the environment in which your final output will be running.

How a program is laid out in memory:
    Code
    Data
    Stack
    Heap
How function callers and callees pass info

Executable Layout in Memory

From low memory up:
    Code (text segment, instructions)
    Static (constant) data
    Global data
    Dynamic data (heap)
    Runtime stack (procedure calls)
Review of what's in each section:

    [Figure: memory map showing the code, static, global, heap, and stack regions]

Text Segment (Executable Code)

Actual machine instructions
    Arithmetic / logical / comparison / branch / jump / load / store / move / ...
Code segment is write-protected, so running code can't overwrite itself. (Debugger can overwrite it.)
You'll create the precursor for the code in this segment by emitting assembly code.
Assembler will build final text.

Data Segment (1)

Data objects
    Whose size is known at compile time
    Whose lifetime is the full run of the program (not just during a function invocation)
Static data includes things that won't change (can be write-protected):
    Virtual-function dispatching tables
    String literals used in instructions
    Arithmetic literals could be, but are more likely incorporated into instructions.

Data Segment (2)

Global data (other than static)
    Variables declared global
Local variables declared static (in C)
    Declared local to a function.
    Retain values even between invocations of that function (lifetime is whole run).
    Semantic analysis ensures that static locals are not referenced outside their function scope.

Dynamic Data (Heap) (1)

Data created by malloc or new.
Heap data lives until deallocated or until the program ends. (Sometimes longer than you want, if you lose track of it.)
Garbage collection and reference counting are ways of automatically de-allocating dead storage in the heap.

Dynamic Data (Heap) (2)

Heap allocation starts at bottom of heap (lower addresses) and allocates upward.
Requirements of alignment and specifics of the allocation algorithm may cause storage to be allocated out of (address) order.

    p1 = new Big();
    p2 = new Medium();
    p3 = new Big();
    p4 = new Tiny();

    So (int)p2 > (int)p1
    But (int)p4 < (int)p3
    Compare pointers for equality, not < or >.

    [Figure: heap starting at 0x1000000, with *p4 and *p2 placed below *p1 and *p3 above it]

Runtime Stack (1)

Data used for function invocation:
    Variables declared local to functions (including main), aka "automatic" data.
        Except for statics (in data segment)
    Variables declared in anonymous blocks inside a function.
    Arguments to function (passed by caller).
    Temporaries used by generated code (not representing names in source).
    Possibly value returned by callee to caller.

Runtime Stack (2)

Types of data that can be allocated on runtime stack:
    In C, all kinds of data: simple types, structs, arrays.
    C++: stack can hold objects declared as class type, as well as pointer type.
    Some languages don't allow arrays on stack.

Stack Terminology (1)

A stack is an abstract data type.
    Push new value onto Top; pop value off Top.
    Higher elements are more recent, lower elements are older.

    [Figure: a stack with Top at the most recent element and Base at the oldest]

Stack Terminology (2)

Stack implementation can grow in any direction.
    MIPS stack grows downward (from higher memory addresses to lower).
Possible difficulty with terminology.
    Some people (and documents) talk about going "up" and "down" the stack.
    Some use the abstraction, where "up" means "more recent", towards Top.
    Some (including gdb) say "up" meaning "towards older entries", toward Base.

Other Resources

Caches (very fast)
    Possibly multiple levels
Physical memory (fast)
Virtual memory (swapping is slower)
    Includes main memory + swap space on disk
Registers (the fastest)

Storage Layout Issues

    Variables (local & file-scope)
    Functions
    Objects
    Arrays
    Strings

Arrays

C uses row-major order
    Whole first row is stored first, then whole second row, ... then whole nth row.
Fortran uses column-major order
    Whole first column is stored first, then whole second column, ... then whole kth column.
    Storage is still one big block, but a column is contiguous instead of a row.

Generating Code for Array Refs

In C, use the size of an element and the range of each dimension to compute the offset of any given element:
    If A has m rows and n columns of 4-byte elements:

        &A[i][j] is &A + 4 * (n * i + j), with the addition counted in bytes

C struct Objects

Structs in C are stored in adjacent words, enough for all fields to be aligned:

    struct cow {
        char milk;     // 3 slack-bytes after this
        char* name;    // aligned on single word
    } Cow;

    [Figure: Cow.milk holds 'A'; Cow.name holds 0x20000, the address of the string "Bossy"]

Making Function Calls

Each active function call has its own unique stack frame
    Frame pointer
    Static link
    Return address
    Arguments
    Local variables and temporaries
Who does which (caller vs callee)?
How do callers and callees communicate?

Who does what? (1)

Before a function call, the calling routine:
    Saves any necessary registers
    Pushes arguments onto the stack
    Sets up the static link (if appropriate)
    Saves the return address into $ra
    Jumps (or branches) to the target (AKA the callee, the called function)

Who does what? (2)

During a function call, the called routine:
    Saves any necessary registers
    Sets up the new frame pointer
    Makes space for any locals or temporaries
    Does its work
    Sets up return value in $v0
        Works only for integer or pointer values
    Tears down frame pointer and static link
    Restores any saved registers
    Jumps to return address (saved on stack)

Who does what? (3)

After a function call, the calling routine:
    Removes return address and parameters from the stack
    Gets return value from $v0
    Restores any saved registers
    Continues executing

Parameter Passing

Call by value
    Supported by C
Call by value-result
    Supported by Ada
Call by reference
    Supported by Fortran
Call by name
    Like C preprocessor macros

An Example Program

    #include <stdio.h>

    int dump(arg1, arg2, arg3, stop)
    int arg1, arg2, arg3, *stop;
    {
        int loc1 = 5, loc2 = 6, loc3 = 7;
        int *p;

        printf("Address Content\n");
        /* walk down the stack from stop to the address of p itself */
        for (p = stop; p >= (int*)(&p); p--)
            printf("%8x: %8x\n", p, *p);
        return 9;
    }

    int main(argc, argv, envp)
    int argc;
    char *argv[], *envp[];
    {
        int var1 = 1, var2 = 2, var3 = 3;

        var3 = dump(var1, var2, var3, &envp);
    }

Sample Output

    Address    Content    Comment
    bffff9a8:  bffff9ec   envp
    bffff9a4:  bffff9e4   argv
    bffff9a0:         1   argc
    bffff99c:  420158d4   return address (crt0)
    bffff998:  bffff9b8   fp (crt0) <- fp (main)
    bffff994:         1   var1
    bffff990:         2   var2
    bffff98c:         3   var3
    bffff988:  bffff998
    bffff984:  4212a2d0
    bffff980:  4212aa58
    bffff97c:  bffff9a8   stop
    bffff978:         3   arg3
    bffff974:         2   arg2
    bffff970:         1   arg1
    bffff96c:   80483d1   return address (main)
    bffff968:  bffff998   fp (main) <- fp (dump)
    bffff964:         5   loc1
    bffff960:         6   loc2
    bffff95c:         7   loc3
    bffff958:  bffff958   p
