BINARY TRAIN - PART I CMJ / 2017.03.18

[2017.03.18] hst binary training part 1

OUTLINE

NEXT 45 MIN

▸ In the next 45 min

▸ Learn the Mach-O binary format

▸ X86-64 Assembly Language / Machine Code

▸ Trivial Binary Bugs

▸ Order by DESC

The vulnerability is right there

民明書房 (Minmei Shobo)

Ten Famous Quotes You Must Know

BUG TO VULNERABILITY

SIGNAL

▸ There are many signals in *nix-like systems

▸ Some are helpful

▸ Some exist for bug prevention

▸ Understanding the bug helps find the vulnerability

▸ SIGFPE - division by zero

▸ SIGILL - illegal instruction

▸ SIGSEGV - invalid virtual memory reference
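A minimal sketch of how these signals look from the outside, assuming a POSIX system and Python 3. The helper name `exit_status_after` is hypothetical, and the child process sends the signal to itself instead of triggering a real bug.

```python
import signal
import subprocess
import sys


def exit_status_after(sig):
    """Run a child Python process that sends `sig` to itself; return its exit status."""
    child_code = f"import os; os.kill(os.getpid(), {int(sig)})"
    return subprocess.run([sys.executable, "-c", child_code]).returncode


if __name__ == "__main__":
    for sig in (signal.SIGFPE, signal.SIGILL, signal.SIGSEGV):
        # On POSIX, subprocess reports "terminated by signal N" as return code -N.
        print(sig.name, exit_status_after(sig))
```

A negative return code is how a parent process (a shell, a fuzzer, a test harness) learns that the child did not exit normally.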

BUG TO VULNERABILITY

ILLEGAL & INVALID

▸ Caused by the compiler, a library, or program logic

▸ Compiler - switch to a newer compiler

▸ Run-time library - switch to a newer library

▸ Run-time logic - supply correct input

▸ It is all Their fault

VULNERABILITY

INPUT

▸ User Input

▸ User-Name, Age, email-address, Gender

▸ Store the user input into memory space

▸ ISSUE

A. How

B. What

C. Where

WORLD IN

X86-64

CPU

X86-64

▸ Registers - extended to 64 bits

▸ 8 / 16 / 32 / 64 bits

▸ 128 bits (SSE)

▸ NX (No-Execute) bit

▸ Registers are limited

▸ 16 general-purpose registers

▸ 16 SSE registers

CPU

X86-64

▸ Von Neumann model

▸ Code and data share the same memory

▸ When data needs to be stored / loaded

▸ from register to memory

▸ from memory to register

STORAGE

SOMETHING IN MEMORY

▸ Code vs Data vs BSS vs Stack vs Heap

▸ Code is read-execute

▸ Data is read-write

▸ BSS stores uninitialized data

▸ Stack stores temporary (local) data

▸ Heap stores dynamic data

▸ All of these are stored in the memory

HOPE YOU HAVE …

DATA IN PROGRAM

▸ Data

▸ Gender - one letter or full description

▸ Age - possible integer or impossible integer

▸ Name - alphabet or unicode

▸ All data in register / memory are integer-like

▸ 8-bit (0 ~ 255) to 128-bit SSE (0 ~ 3.4e38)

▸ signed or unsigned is a question
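The "integer-like" point can be sketched in a few lines of Python. The helper names `as_unsigned` and `as_signed` are made up for illustration; the same raw byte gives two different ages depending on how the program chooses to read it.

```python
def as_unsigned(raw):
    """Interpret raw little-endian bytes as an unsigned integer."""
    return int.from_bytes(raw, "little", signed=False)


def as_signed(raw):
    """Interpret the same bytes as a two's-complement signed integer."""
    return int.from_bytes(raw, "little", signed=True)


if __name__ == "__main__":
    one_byte = bytes([0xFF])
    print(as_unsigned(one_byte), as_signed(one_byte))  # same bits, two meanings
    print(hex(ord("F")))                               # a letter is just a small integer
    print(f"{2**128 - 1:.1e}")                         # largest unsigned 128-bit value
```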

HOPE YOU HAVE …

DATA IN PROGRAM

▸ Can simply put age into register

▸ Gender could be

▸ one letter - to ASCII and put in register

▸ Fixed-length - store in memory

▸ Name should be

▸ store in memory

MEMORY

WHERE TO STORE

▸ Memory

▸ Store user input sequentially

▸ decode by program / programmer

▸ ISSUE

▸ size

▸ permission

MEMORY

WHERE TO STORE

▸ Data vs BSS vs Stack vs Heap

▸ Fit the scenario (assumption)

▸ data is

1. temporary

2. global view

3. variable size

Mung bean cake, or manuscript paper

Which one?

You may have read it - "Tolerance" (雅量)

DECODE

MOV

▸ In x86-64 opcodes

▸ Many opcodes are MOV

▸ Moving from/to memory is a frequent action

▸ mov ch, dl

▸ mov rax, [rax-0x10]

▸ mov [r8], rsp

▸ lea cx, [rbx]

▸ But these are all different opcodes!

AGE

SAVE DATA

▸ Save 18 as age into program

▸ mov rax, 18 ; save in a register

▸ mov [rax], 18 ; save into memory

▸ push 18 ; save onto the stack

GENDER

SAVE DATA

▸ Save ‘F’ (0x46) as gender into program

▸ mov rax, 0x46 ; save in a register

▸ mov [rax], 0x46 ; save into memory

▸ push 0x46 ; save onto the stack

GENDER

SAVE DATA

▸ Save ‘Female’ as gender into program

▸ mov [rax], 0x616D6546 ; bytes "Fema" in memory (little-endian)

▸ mov [rax+0x04], 0x0000656C ; bytes "le\0\0"

▸ push 0x46

▸ push 0x65

▸ push …
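x86-64 is little-endian, so the byte order of an immediate matters when text is packed into it. The sketch below, with the hypothetical helpers `imm32_for` and `stored_bytes`, computes which 32-bit immediate lands as a given text in memory and what bytes a given immediate actually stores.

```python
def imm32_for(text):
    """Return the 32-bit immediate whose little-endian store puts `text` in memory order."""
    if len(text) > 4:
        raise ValueError("at most 4 bytes fit in a 32-bit immediate")
    # Pad with NUL bytes so short strings such as b"le" still fill 4 bytes.
    return int.from_bytes(text.ljust(4, b"\x00"), "little")


def stored_bytes(imm32):
    """Return the 4 bytes a 32-bit store of `imm32` writes, lowest address first."""
    return imm32.to_bytes(4, "little")


if __name__ == "__main__":
    print(hex(imm32_for(b"Fema")), hex(imm32_for(b"le")))
    # An immediate written in reading order ends up reversed in memory.
    print(stored_bytes(0x46656D61))
```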

MEMORY

SIZE MATTERS

▸ Steps to store data in memory

1. decide the size of memory

2. how to encode/decode data

3. decide the location of memory

4. put into / get from memory

MEMORY

OVERESTIMATE VS UNDERESTIMATE

▸ Over

▸ memory leak - OOM

▸ wasted resources

▸ Under

▸ data corruption

▸ overflow

MEMORY

▸ move to memory space

▸ Where is the space? BSS or Data or Heap

▸ Compile-time or Run-time

▸ fixed-length or variable-length

▸ Save into Stack

▸ The stack is not unlimited

IN C LANGUAGE

ASSUMPTION

▸ Struct in C

struct foo {
    int age;
    char gender[8];
    char email[128];
};

‣ What happens if gender overflows?

‣ email is corrupted / age is corrupted

[Figure: memory layout of struct foo from 0x1230 to 0x12B9, in order: age | gender | email]
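The overflow can be reproduced safely with Python's `ctypes`, which lays the fields out the way a C compiler would. This is only a sketch: `Foo` mirrors the struct above, `overflow_gender` is a hypothetical stand-in for an unchecked copy such as `strcpy`, and the sample values are invented. The offsets assume a 4-byte `int`, as on common x86-64 platforms.

```python
import ctypes


class Foo(ctypes.Structure):
    _fields_ = [
        ("age", ctypes.c_int),
        ("gender", ctypes.c_char * 8),
        ("email", ctypes.c_char * 128),
    ]


def overflow_gender(record, payload):
    """Copy `payload` to the start of `gender` without checking the field size."""
    room = ctypes.sizeof(Foo) - Foo.gender.offset
    if len(payload) > room:
        # Stay inside the struct so this demo never touches foreign memory.
        raise ValueError("payload would leave the struct; refusing in this sketch")
    ctypes.memmove(ctypes.addressof(record) + Foo.gender.offset, payload, len(payload))


if __name__ == "__main__":
    record = Foo(age=18, gender=b"F", email=b"bob@example.com")
    overflow_gender(record, b"A" * 12)  # 4 bytes more than gender can hold
    print(record.age, record.gender, record.email)
```

With a 12-byte payload the last four bytes spill into `email`, which sits directly after `gender`; `age` sits before `gender` and is left untouched by a forward overflow.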

IN ASM

ASSUMPTION

[0x400000] call 0x400043

[0x400043] mov rax, 18

[0x400048] ret

IN ASM

ASSUMPTION

[0x400000] call 0x400043

[0x400043] push 18

[0x400048] ret

IN ASM

ASSUMPTION

[0x400000] call 0x400043

[0x400043] mov [rbp-0x10], 0x46

[0x40004E] ret

IN ASM

ASSUMPTION

[0x400000] call 0x400043

[0x400043] mov r8, [rip+0x08]

[0x40004A] mov [r8], 18

[0x400051] ret

LEGACY

CODE/DATA BOTH IN MEMORY

▸ First: call is a combination of push and jump

▸ call 0x400035

1. push rip (the address of the next instruction)

2. jump 0x400035

‣ ret

1. pop the return address

2. jump to it

‣ And more

▸ call rax

▸ call [rax]
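The push/jump and pop/jump decomposition can be modeled with a toy CPU. This is an illustrative sketch, not an emulator: `TinyCpu` is a made-up class, and the 5-byte instruction length assumes the `call rel32` encoding.

```python
class TinyCpu:
    """Toy model with only an instruction pointer and a stack."""

    def __init__(self, rip):
        self.rip = rip
        self.stack = []

    def call(self, target, insn_len=5):
        # push: save the address of the instruction after the call
        self.stack.append(self.rip + insn_len)
        # jump: continue at the call target
        self.rip = target

    def ret(self):
        # pop the saved address and jump to it
        self.rip = self.stack.pop()


if __name__ == "__main__":
    cpu = TinyCpu(0x400000)
    cpu.call(0x400043)
    print(hex(cpu.stack[-1]))     # return address lives in writable memory
    cpu.stack[-1] = 0x41414141    # simulate data overwriting the return address
    cpu.ret()
    print(hex(cpu.rip))           # control flow now follows the overwritten value
```

Because the return address is ordinary data on the stack, anything that can overwrite stack memory can also redirect `ret`.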

LEGACY PROGRAMS ALWAYS HAVE BUGS

EVEN COMPILERS

QUESTION

▸ If a vulnerability can be introduced

▸ from source code to assembly code

▸ Is there NO BUG from assembly code to machine code?

ASSEMBLE

▸ From assembly code to machine code

▸ 1-1 mapping

▸ platform-dependent

▸ Example

▸ pop rax - 58

▸ syscall - 0F 05

▸ xor r8, 0x10 - 49 83 F0 10

▸ mov eax, 0xDEADBEEF - B8 EF BE AD DE
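A few of these mappings are simple enough to write by hand. The sketch below is not how any particular assembler is implemented; the function names are invented, and it covers only three instructions to show that "assembling" is a table lookup plus little-endian packing.

```python
import struct

# Register numbers used by the one-byte pop encoding (58 + register number).
REG64 = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
         "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def pop_r64(reg):
    """Encode `pop <reg>` for the eight legacy 64-bit registers."""
    return bytes([0x58 + REG64[reg]])


def mov_eax_imm32(value):
    """Encode `mov eax, imm32`: opcode B8 followed by the little-endian immediate."""
    return b"\xB8" + struct.pack("<I", value)


def syscall():
    """Encode `syscall`, a fixed two-byte opcode."""
    return b"\x0F\x05"


if __name__ == "__main__":
    for code in (pop_r64("rax"), syscall(), mov_eax_imm32(0xDEADBEEF)):
        print(code.hex(" ").upper())
```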

X86-64 OPCODE

INSTRUCTION

X86-64 MACHINE CODE

▸ X86-64 machine code layout

▸ [prefix] [opcode] [ModR/M] [SIB] [Displacement] [Immediate]

▸ At most 15 bytes per instruction

▸ Displacement / Immediate: at most 8 bytes (64-bit address)

▸ R(educed)ISC vs C(omplex)ISC

STFW

OPCODE

▸ X86-64 opcode

▸ Intel Manual[0]

▸ Web Resource[1]

▸ The opcode byte can be 00 ~ FF

▸ Each value is either a valid instruction or invalid

[0]: https://software.intel.com/sites/default/files/managed/ad/01/253666-sdm-vol-2a.pdf

[1]: http://ref.x86asm.net/coder64.html

SIMPLE LIFE

OPCODE

▸ Simple (frequently-used) opcode

▸ No-OPeration

▸ NOP 90 (maybe xchg eax, eax)

▸ NOP 0F 0D

▸ FNOP D9 D0 (FPU nop)

[0]: http://stackoverflow.com/questions/25008772/whats-the-difference-between-the-x86-nop-and-fnop-instructions

X86-64

SLIGHTLY COMPLICATED

▸ Extension OPCODE

▸ add (01) supports 16 / 32 / 64-bit operands

▸ add r/m16/32/64, r16/32/64

▸ One opcode does multiple things?

▸ REX prefixes 48 ~ 4F (W = 1) extend the operand size to 64 bits

  7     4   3   2   1   0
+---------+---+---+---+---+
| 0 1 0 0 | W | R | X | B |
+---------+---+---+---+---+
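The bit layout of the prefix byte translates directly into shifts and masks. A small sketch, with the hypothetical helpers `rex` and `decode_rex`:

```python
def rex(w=0, r=0, x=0, b=0):
    """Build a REX prefix byte: fixed high nibble 0100, then the W, R, X, B bits."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def decode_rex(byte):
    """Split a REX prefix byte back into its four flag bits."""
    if byte & 0xF0 != 0x40:
        raise ValueError(f"{byte:#04x} is not a REX prefix")
    return {"W": (byte >> 3) & 1, "R": (byte >> 2) & 1,
            "X": (byte >> 1) & 1, "B": byte & 1}


if __name__ == "__main__":
    print(hex(rex(w=1)))        # 64-bit operand size
    print(hex(rex(w=1, b=1)))   # 64-bit operand size plus an extended register
    print(decode_rex(0x41))
```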

X86-64

REGISTER EXTENSION

▸ Extension

▸ Size (32 bits to 64 bits)

▸ Registers (legacy to extended r8 ~ r15)

▸ mov eax, 0xdeadbeef B8 EF BE AD DE

▸ mov rax, 0xdeadbeef 48 B8 EF BE AD DE 00 00 00 00

▸ mov r8, 0xdeadbeef 49 B8 EF BE AD DE 00 00 00 00

X86-64

TRICKY

▸ OPCODE

▸ push implies r64, so the REX.W prefix (48) is redundant

▸ push rax 50

▸ push rax 48 50

X86-64

PRIMARY OPCODE

▸ Some opcodes are mixed with an operand

▸ OPCODE + register number

▸ push r16/64 is merged into 1 byte

▸ push ax 66 50

▸ push rax 50

▸ push r9w 66 41 51

▸ push r9 41 51
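The four encodings follow one rule: the opcode byte is 50 plus the low three bits of the register number, 66 selects the 16-bit form, and 41 (REX.B) selects r8 ~ r15. A sketch of that rule, where `push` is a hypothetical helper that accepts only 16-bit and 64-bit register names:

```python
LOW8 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]


def push(reg):
    """Encode `push <reg>` for a 16-bit or 64-bit general-purpose register."""
    is16 = False
    if reg.startswith("r") and reg[1:].rstrip("w").isdigit():
        # r8 ~ r15 (64-bit) or r8w ~ r15w (16-bit)
        is16 = reg.endswith("w")
        number = int(reg[1:].rstrip("w"))
        if not 8 <= number <= 15:
            raise ValueError(f"unknown register {reg!r}")
    elif reg in LOW8:
        is16, number = True, LOW8.index(reg)
    elif reg.startswith("r") and reg[1:] in LOW8:
        number = LOW8.index(reg[1:])
    else:
        raise ValueError(f"unknown register {reg!r}")

    code = bytearray()
    if is16:
        code.append(0x66)              # operand-size override prefix
    if number >= 8:
        code.append(0x41)              # REX.B extends the register number
    code.append(0x50 + (number & 7))   # opcode with the register merged in
    return bytes(code)


if __name__ == "__main__":
    for name in ("ax", "rax", "r9w", "r9"):
        print(name, push(name).hex(" ").upper())
```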

X86-64

TWO-BYTE OPCODE

▸ Some opcodes are two bytes long

▸ ADD 05

▸ syscall 0F 05

▸ Two-byte escape prefix: 0F

X86-64

SOME PROBLEM

▸ Trivial case - condition check

▸ jz LABEL 48 0F 84 06 00 00 00

▸ Can be patched to

▸ nop 90 90 90 90 90 90 90
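Patching out a check is just overwriting bytes. The helper `nop_out` below is a hypothetical sketch that works on an in-memory copy of the code; the trailing `C3` (ret) is added only so there is something after the patched jump.

```python
def nop_out(code, offset, length):
    """Return a copy of `code` with `length` bytes at `offset` replaced by NOP (90)."""
    if offset < 0 or offset + length > len(code):
        raise ValueError("patch range is outside the code")
    patched = bytearray(code)
    patched[offset:offset + length] = b"\x90" * length
    return bytes(patched)


if __name__ == "__main__":
    original = bytes.fromhex("48 0F 84 06 00 00 00") + b"\xC3"
    print(nop_out(original, 0, 7).hex(" ").upper())
```

The replacement has the same length as the original instruction, so every later address and jump offset stays valid.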

X86-64

SOME PROBLEM

▸ If we have

▸ add ax, 0x5150 66 05 50 51

▸ Changing one byte (66 to 0F) turns it into

▸ syscall 0F 05

▸ push rax 50

▸ push rcx 51
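A toy decoder makes the effect visible. It is a sketch that knows only the handful of encodings used on this slide; `decode` and its `db` fallback are invented for illustration and are far from a real disassembler.

```python
PUSH_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode(code):
    """Decode a byte string into mnemonics, covering only a few encodings."""
    out = []
    i = 0
    while i < len(code):
        if code[i:i + 2] == b"\x66\x05" and i + 4 <= len(code):
            # 66 05 iw: add ax, imm16 (immediate is little-endian)
            imm = int.from_bytes(code[i + 2:i + 4], "little")
            out.append(f"add ax, {imm:#06x}")
            i += 4
        elif code[i:i + 2] == b"\x0f\x05":
            out.append("syscall")
            i += 2
        elif 0x50 <= code[i] <= 0x57:
            out.append("push " + PUSH_REGS[code[i] - 0x50])
            i += 1
        else:
            out.append(f"db {code[i]:#04x}")  # unknown byte, emitted as raw data
            i += 1
    return out


if __name__ == "__main__":
    print(decode(bytes.fromhex("66 05 50 51")))
    print(decode(bytes.fromhex("0F 05 50 51")))
```

The two inputs differ in a single byte, yet the bytes that used to be an immediate are now decoded as instructions of their own.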

REAL-CASE - MAC OS X

POSSIBILITY

MACHO

▸ Mach-O is a binary format

▸ Header

▸ Commands

▸ Sections

▸ Segment

▸ Binary payload

▸ Multi-architecture binaries

MACH-O 64

HEADER

▸ Magic Number 0xFEEDFACF

▸ 64-bit

▸ CPU info

▸ X86_64 / ARM / ARM64 / POWERPC64 / …

▸ File Type

▸ Execute / Preload / DYLIB / …

▸ Number of commands (section/segment)

▸ Flags

▸ PIE / NOUNDEFS / DYLDLINK / LAZY_INIT / …
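The 64-bit header is eight 32-bit fields, so it can be read with `struct`. The sketch below assumes a little-endian x86-64 file; the constant values follow Apple's `mach-o/loader.h`, while `parse_header` and the sample bytes are made up for illustration rather than taken from a real binary.

```python
import struct
from collections import namedtuple

MH_MAGIC_64 = 0xFEEDFACF
CPU_TYPE_X86_64 = 0x01000007
MH_EXECUTE = 0x2
MH_NOUNDEFS, MH_DYLDLINK, MH_PIE = 0x1, 0x4, 0x200000

HEADER_FORMAT = "<8I"  # eight little-endian uint32 fields, 32 bytes in total
MachHeader64 = namedtuple(
    "MachHeader64",
    "magic cputype cpusubtype filetype ncmds sizeofcmds flags reserved",
)


def parse_header(data):
    """Parse the first 32 bytes of a 64-bit Mach-O file."""
    header = MachHeader64._make(struct.unpack_from(HEADER_FORMAT, data))
    if header.magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O header")
    return header


if __name__ == "__main__":
    # Synthetic header: x86-64 executable, 7 load commands, PIE enabled.
    sample = struct.pack(HEADER_FORMAT, MH_MAGIC_64, CPU_TYPE_X86_64, 3, MH_EXECUTE,
                         7, 664, MH_NOUNDEFS | MH_DYLDLINK | MH_PIE, 0)
    header = parse_header(sample)
    print(header.ncmds, bool(header.flags & MH_PIE))
```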

MACH-O 64

COMMANDS

▸ Lots of commands

▸ LC_SEGMENT_64

▸ LC_SYMTAB

▸ LC_LOAD_DYLIB

▸ LC_UNIXTHREAD

▸ LC_MAIN

▸ LC_RPATH

MACH-O 64

SEGMENT

▸ Segment

▸ command name

▸ memory address

▸ memory size

▸ file offset

▸ file size

▸ max VM protection

▸ initial VM protection

▸ number of sections
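These fields map onto the 72-byte `segment_command_64` structure. A hedged sketch of reading one: the field order and the `LC_SEGMENT_64` / `VM_PROT_*` values follow Apple's headers, while `parse_segment` and the sample `__TEXT` segment are invented for illustration.

```python
import struct

LC_SEGMENT_64 = 0x19
VM_PROT_READ, VM_PROT_WRITE, VM_PROT_EXECUTE = 0x1, 0x2, 0x4

# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags
SEGMENT_FORMAT = "<II16sQQQQiiII"


def parse_segment(data, offset=0):
    """Parse one LC_SEGMENT_64 load command starting at `offset`."""
    (cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, flags) = struct.unpack_from(SEGMENT_FORMAT, data, offset)
    if cmd != LC_SEGMENT_64:
        raise ValueError("not an LC_SEGMENT_64 command")
    return {
        "segname": segname.rstrip(b"\x00").decode("ascii"),
        "vmaddr": vmaddr, "vmsize": vmsize,
        "fileoff": fileoff, "filesize": filesize,
        "maxprot": maxprot, "initprot": initprot,
        "nsects": nsects, "flags": flags, "cmdsize": cmdsize,
    }


if __name__ == "__main__":
    # Synthetic __TEXT segment: one section, mapped read + execute.
    sample = struct.pack(SEGMENT_FORMAT, LC_SEGMENT_64, 72 + 80, b"__TEXT",
                         0x100000000, 0x1000, 0, 0x1000,
                         VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE,
                         VM_PROT_READ | VM_PROT_EXECUTE, 1, 0)
    segment = parse_segment(sample)
    print(segment["segname"], hex(segment["vmaddr"]), segment["initprot"])
```

The initial protection is where the code / data split from earlier becomes concrete: a segment mapped without `VM_PROT_WRITE` cannot be patched in place at run time.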

MACH-O 64

SECTION

▸ Section Name

▸ Segment Name

▸ memory address

▸ size

▸ offset

▸ align

▸ flags

MACH-O 64

MINIMAL

▸ Minimal Mach-O 64 binary

▸ Small footprint - 4K

▸ Header

▸ 7 commands - 664 bytes

▸ Machine Code - 12 bytes

▸ Dummy \x00

ZASM

ASSEMBLER

▸ Assembler

▸ From assembly language to machine code

▸ Target format (ELF / Mach-O / …)

▸ Target platform (x86-64 / ARMv8 / …)

▸ Generator

[0]: https://github.com/cmj0121/Zerg/tree/master/src/zasm

Q&A THANKS FOR YOUR ATTENTION