Architecture Support for Dynamic Integrity Checking 2015. 6. 2 Bonhyun Koo Arun K. Kanuparthi 1 , Ramesh Karri 1 , Gaston Ormazabal 2 , Sateesh K. Addepalli 3 1 Polytechnic Institute of NYU, Brooklyn, NY USA 2 Columbia University, New York, NY USA 3 Cisco Systems, San Jose, CA USA Information Forensics and Security, IEEE Transactions on (2012)


Architecture Support for Dynamic Integrity Checking

2015. 6. 2
Bonhyun Koo

Arun K. Kanuparthi (1), Ramesh Karri (1), Gaston Ormazabal (2), Sateesh K. Addepalli (3)

(1) Polytechnic Institute of NYU, Brooklyn, NY, USA
(2) Columbia University, New York, NY, USA
(3) Cisco Systems, San Jose, CA, USA

IEEE Transactions on Information Forensics and Security (2012)


Contents


1. Introduction

2. Background

3. Dynamic Integrity Checker (DIC)

4. DIC Design

5. Experiment and Evaluation

6. Conclusion


Problem and Contributions

Problem

Existing TPM Architectures do not support runtime integrity checking

Vulnerability : TOCTOU (Time of Check-Time of Use) Attacks

Contribution

1) Integrity Checker Module with a superscalar pipeline

2) Architecture for Dynamic Integrity Checking (dynamic execution traces)

3) Optimizations to reduce the performance impact (without compromising the security of the system)

4) Evaluation of the proposed scheme (using a cycle-accurate simulator)


1. Introduction

- Trusted Platform Module (TPM)

- Trusted Platform Module (developed by the Trusted Computing Group, TCG): a separate chip that addresses security concerns. The TPM acts as a root of trust for checking platform integrity at boot time (it guarantees only boot-time security).

[Figure: Example of integrity verification on a PC with a TPM module. The TPM provides integrity measurement and RSA key-pair generation, and its PCR holds the hashes H(PA), H(PB), H(PC), ..., H(PN). For Program C, a nonce is sent, E_PukC(Nonce, H(PC)) is returned and decrypted as D_PriC(E_PukC(Nonce, H(PC))), and the result is compared with H(PC).]


2. Background

TOCTOU Threat Model

TOC (t_i): time of check

TOU (t_i+k): time of use

An attacker can exploit the interval between TOC and TOU by changing the system state after it has been checked and before it is used.

One approach to countering such TOCTOU threats is to check the integrity of the instructions being executed frequently (by calculating their hash).
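The TOCTOU window can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `measure` helper, the three-byte stand-in for program code, and the use of SHA-1 as the measurement hash are assumptions made only for illustration.

```python
import hashlib

def measure(code: bytes) -> str:
    """Return the SHA-1 digest of a code region as a hex string."""
    return hashlib.sha1(code).hexdigest()

# Hypothetical stand-in for instructions held in (insecure) main memory.
memory = bytearray(b"\x90\x90\xc3")
golden = measure(bytes(memory))          # known-good measurement

# Time of check (t_i): the measurement matches the golden value.
checked_ok = measure(bytes(memory)) == golden

# Between check and use, an attacker overwrites one byte.
memory[0] = 0xCC

# Time of use (t_i+k): a check-once scheme would still trust `checked_ok`,
# whereas measuring again at use time exposes the modification.
rechecked_ok = measure(bytes(memory)) == golden
```

Here `checked_ok` is true and `rechecked_ok` is false, which is why a single boot-time check is not enough and the instructions need to be checked repeatedly at run time.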


2. Background

Vulnerabilities of Computer Systems

Stack smashing attack

Cold boot attack

No attacks on the superscalar processor are considered: disk and main memory are insecure, but the processor is secure.


2. Background

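Cryptographic Hash Function

Hash functions (MAC and MDC):
- MAC (keyed): MAC = h(k, m)
- MDC (unkeyed): MDC = h(m)

[Figure: MAC for integrity checking. The message m is sent together with h(k, m); the receiver recomputes h(k, m) with the key k and compares the two values (?=).]

[Figure: Integrity + confidentiality (encryption). The sender computes m || h(k1, m) and encrypts it as E_k2(m || h(k1, m)); the receiver decrypts with k2 (D), recomputes h(k1, m), and compares (?=).]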
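The difference between the unkeyed MDC and the keyed MAC can be sketched with Python's standard library. The slides only specify h(m) and h(k, m); the choice of SHA-1 for h and of HMAC as the keyed construction, as well as the key and message values, are assumptions made for this example.

```python
import hashlib
import hmac

def mdc(m: bytes) -> bytes:
    """Unkeyed digest: MDC = h(m)."""
    return hashlib.sha1(m).digest()

def mac(k: bytes, m: bytes) -> bytes:
    """Keyed digest: MAC = h(k, m), realised here as HMAC-SHA1."""
    return hmac.new(k, m, hashlib.sha1).digest()

def verify_mac(k: bytes, m: bytes, tag: bytes) -> bool:
    """Recompute the MAC and compare it with the received tag (the ?= step)."""
    return hmac.compare_digest(mac(k, m), tag)

key = b"shared-secret"            # hypothetical key k
msg = b"basic block bytes"        # hypothetical message m
tag = mac(key, msg)

ok = verify_mac(key, msg, tag)                  # unmodified message
tampered_ok = verify_mac(key, msg + b"!", tag)  # modified message
```

Anyone can recompute an MDC, so it only protects integrity when the digest itself is stored or transmitted securely; the MAC additionally requires knowledge of the key.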


2. Background

Different Dynamic Integrity Schemes

1. REM (basic block): comparison with a precomputed hash (at execution time, AES-128)
2. XOM (instruction): session key to encrypt/decrypt different programs (DES); the program is decrypted instruction by instruction
3. CODESSEAL (basic block): FPGA between memory and cache (hash)
4. SPEF (basic block): compiler and micro-architecture modification



3. Dynamic Integrity Checker (DIC)

A. Motivation (choose the optimum granularity level)

Example: 403.gcc (reference input 166.i) from the SPEC CPU2006 benchmark

250x increase in total execution cycles over the baseline (caused by the DIC)


3. Dynamic Integrity Checker (DIC)

B. DIC of Program Traces

[Figure: CFG (Control Flow Graph) of basic blocks (BB), with edges labeled T(1) and N(0).]

There are six possible traces:

#  Trace Path
1  BB0-BB1-BB4-BB5
2  BB0-BB1-BB3
3  BB0-BB1-BB3-BB5
4  BB0-BB2
5  BB0-BB2-BB3
6  BB0-BB2-BB3-BB5

TraceID: 0x00400000-010, where 0x00400000 is the starting address of the first instruction.
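A TraceID of the form shown on this slide could be assembled as follows. This is only a sketch: it assumes that the suffix after the dash records one bit per branch, using the T(1) / N(0) edge labels from the CFG figure, and the function name `make_trace_id` is hypothetical.

```python
from typing import Sequence

def make_trace_id(start_address: int, branch_taken: Sequence[bool]) -> str:
    """Join the trace's start address with one bit per branch (assumed: T=1, N=0)."""
    bits = "".join("1" if taken else "0" for taken in branch_taken)
    return f"0x{start_address:08X}-{bits}"

# Not taken, taken, not taken, starting at the address used on the slide.
trace_id = make_trace_id(0x00400000, [False, True, False])
```

With these inputs `trace_id` is the string `0x00400000-010`, matching the example on the slide under the stated assumption.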


4. DIC Design

A. Application Profiling and Trace Generation (compile time)

B. Interaction with the Pipeline (DIC) (run time)

C. Hash Storage Hierarchy: Hash Trace Cache (HTC)

D. Prefetching: load-time prefetching, to reduce the impact of cold start


4. DIC Design

A. Compile-Time Preprocessing : Application Profiling and Trace Generation

(1) Generate a list of all basic blocks of the program.
(2) Construct the control flow graph (CFG) of the program, where each node is a basic block.
(3) Enumerate all traces up to a given length (in terms of number of basic blocks).
(4) Profile the program by applying test inputs to count the number of times each trace is encountered.
(5) Order the traces by their frequency of execution.
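Steps (3) to (5) can be sketched in Python. The CFG below is hypothetical (its edges are merely chosen to be consistent with the six example traces listed earlier), the profiling data is invented, and for simplicity traces are enumerated only from the entry block, so the result also contains shorter prefixes that the slide does not list.

```python
from collections import Counter

# Hypothetical CFG: basic block -> successor basic blocks.
CFG = {
    "BB0": ["BB1", "BB2"],
    "BB1": ["BB3", "BB4"],
    "BB2": ["BB3"],
    "BB3": ["BB5"],
    "BB4": ["BB5"],
    "BB5": [],
}

def enumerate_traces(cfg, start, max_len):
    """Step 3: every path from `start` with at most `max_len` basic blocks."""
    traces = []
    stack = [(start,)]
    while stack:
        path = stack.pop()
        traces.append(path)
        if len(path) < max_len:
            for succ in cfg[path[-1]]:
                stack.append(path + (succ,))
    return traces

def rank_by_frequency(observed):
    """Steps 4 and 5: count observed traces and order them by frequency."""
    return Counter(observed).most_common()

traces = enumerate_traces(CFG, "BB0", max_len=4)

# Invented profiling run: traces seen while applying test inputs.
observed = (
    [("BB0", "BB2", "BB3")] * 3
    + [("BB0", "BB1", "BB4", "BB5")] * 2
    + [("BB0", "BB2")]
)
ranked = rank_by_frequency(observed)
```

The ranked list is what a later stage would use to decide which trace hashes are worth keeping close to the processor.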


4. DIC Design

B. DIC : Interaction with the Pipeline

Architecture of the proposed scheme

(1) Tag all instructions with a pending bit. (※ The pending bit is cleared at commit.)
(2) Build the TraceID (e.g., 0x00400000-010...).
(3) The DIC initiates a fetch of the hash from the disk, using the TraceID.
(4) The DIC calculates Hash(TraceID) for the TraceID generated in step 2.
(5) Decrypt the encrypted hash using the RSA key (hardwired in the DIC) and compare it with the one calculated by the DIC.
    - If the hashes are equal: commit and clear the pending bit.
    - If the hashes are not equal: execution is aborted.
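The commit decision in steps (1), (4) and (5) can be modelled in a few lines of Python. This is a software sketch under several assumptions, not the hardware design: the hash is taken to be SHA-1 over the instruction bytes of the trace, the RSA decryption and the disk fetch are left out (the expected hash is passed in already decrypted), and the names `Instruction`, `IntegrityError` and `check_and_commit` are hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Instruction:
    encoding: bytes
    pending: bool = True      # step 1: every instruction starts as pending

class IntegrityError(Exception):
    """Raised when the computed hash differs from the expected one."""

def check_and_commit(trace, expected_hash: bytes) -> None:
    """Hash the trace, compare, then commit or abort (steps 4 and 5)."""
    computed = hashlib.sha1(b"".join(i.encoding for i in trace)).digest()
    if computed != expected_hash:
        raise IntegrityError("hash mismatch: execution aborted")
    for instruction in trace:
        instruction.pending = False   # commit clears the pending bit
```

A caller would build the trace, obtain the expected hash for its TraceID, and call `check_and_commit`; instructions whose pending bit is still set must not be allowed to commit.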


4. DIC Design

B. DIC : Interaction with the Pipeline

Case: the hash comparison (A) completes before the instructions reach the head of the reorder buffer (ROB) (B). No performance degradation.
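One way to read this case is that the integrity check only costs cycles when it finishes after the instructions are ready to commit. The following Python sketch expresses that reading; the function name and the cycle numbers in the usage note are hypothetical.

```python
def commit_stall_cycles(hash_ready_cycle: int, rob_head_cycle: int) -> int:
    """Cycles an instruction waits at the ROB head for the hash comparison.

    hash_ready_cycle: cycle at which the hash comparison (A) completes.
    rob_head_cycle:   cycle at which the instruction reaches the ROB head (B).
    """
    return max(0, hash_ready_cycle - rob_head_cycle)
```

For example, `commit_stall_cycles(120, 150)` is 0 because the comparison is already done when the instruction reaches the head of the ROB, while `commit_stall_cycles(400, 150)` is 250 because commit has to wait for the check.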


4. DIC Design

C. Hash Storage Hierarchy

Performance problem (disk access): if each disk access consumes such a large number of cycles, instructions will line up at the commit stage, and performance will degrade to an unacceptable level.

One solution to this problem is a storage hierarchy of hashes.

Hash Trace Cache (HTC): it stores only the hashes of traces predetermined at compile time.

The main goal of the HTC is to cache hashes fetched from the disk.

The DIC does not need to generate cryptographic hashes and make the comparison if there is an HTC hit.
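A hash storage hierarchy of this kind can be sketched as a three-level lookup in Python. The latencies are the ones listed in the experimental setup later in the deck (hash search 1 cycle, main memory 200 cycles, disk 2,656,250 cycles); the LRU replacement policy, the serial accumulation of latencies, and the class name `HashHierarchy` are assumptions of this sketch.

```python
from collections import OrderedDict

HTC_CYCLES = 1              # hash search
MEMORY_CYCLES = 200         # main memory access
DISK_CYCLES = 2_656_250     # disk access

class HashHierarchy:
    """HTC -> main memory -> disk lookup for trace hashes (LRU HTC assumed)."""

    def __init__(self, htc_entries: int, memory: dict, disk: dict):
        self.htc = OrderedDict()
        self.htc_entries = htc_entries
        self.memory = memory
        self.disk = disk

    def _fill_htc(self, trace_id, trace_hash):
        self.htc[trace_id] = trace_hash
        self.htc.move_to_end(trace_id)
        if len(self.htc) > self.htc_entries:
            self.htc.popitem(last=False)      # evict the least recently used

    def lookup(self, trace_id):
        """Return (hash, level that served the request, cycles spent)."""
        if trace_id in self.htc:
            self.htc.move_to_end(trace_id)
            return self.htc[trace_id], "HTC", HTC_CYCLES
        if trace_id in self.memory:
            trace_hash = self.memory[trace_id]
            self._fill_htc(trace_id, trace_hash)
            return trace_hash, "memory", HTC_CYCLES + MEMORY_CYCLES
        trace_hash = self.disk[trace_id]      # KeyError if the trace is unknown
        self.memory[trace_id] = trace_hash
        self._fill_htc(trace_id, trace_hash)
        return trace_hash, "disk", HTC_CYCLES + MEMORY_CYCLES + DISK_CYCLES
```

The first lookup of a trace pays the full disk latency; repeated lookups are served by the HTC, and a trace evicted from the HTC is still found in main memory rather than on disk.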


4. DIC Design

D. Prefetching (to reduce the impact of Cold Start)

Example: trace access pattern

AAAAA....AAAAAABCFFFFFF....FFFFFXYGGGG..GG

TraceID  Frequency
A        100
F        70
G        30
B        1
C        1
X        1
Y        1

Load-time prefetching: an easy and cost-effective way to get rid of compulsory misses.
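The effect of load-time prefetching on the example above can be sketched in Python. The frequency table is the one from the slide; the helper names, the choice to preload three entries, and the unbounded cache used to isolate compulsory misses are assumptions of this sketch.

```python
def prefetch_at_load(profile: dict, entries: int) -> list:
    """Pick the `entries` most frequently executed traces to preload."""
    ranked = sorted(profile.items(), key=lambda item: item[1], reverse=True)
    return [trace_id for trace_id, _ in ranked[:entries]]

def compulsory_misses(access_pattern, preloaded) -> int:
    """Count first-touch misses; the cache is unbounded, so nothing is evicted."""
    cached = set(preloaded)
    misses = 0
    for trace_id in access_pattern:
        if trace_id not in cached:
            misses += 1
            cached.add(trace_id)
    return misses

profile = {"A": 100, "F": 70, "G": 30, "B": 1, "C": 1, "X": 1, "Y": 1}
pattern = "A" * 100 + "BC" + "F" * 70 + "XY" + "G" * 30

preloaded = prefetch_at_load(profile, entries=3)
cold = compulsory_misses(pattern, [])
warm = compulsory_misses(pattern, preloaded)
```

Without prefetching, each of the seven traces misses once (`cold` is 7); preloading A, F and G leaves only the four rarely executed traces to miss (`warm` is 4).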


5. Experimental Result

A. Experimental Setup

Simulator: Zesto (with DIC and HTC implemented)

Architecture: Intel Core2, Nehalem

Benchmarks: (1) SPEC CPU2006 (runspec), (2) BioBench, (3) STREAM (GNU gcc)

Profiling: exp-bbv (basic block generation tool)

- Hash design
  Hash comparison: 2 cycles; hash search: 1 cycle
  HTC size: 32 KB

- Access time
  Main memory: 200 cycles; encryption/decryption: 150 cycles
  Disk: 2,656,250 cycles


5. Experimental Result

[Figure: HTC effectiveness (baseline: without DIC). Values annotated in the chart: 54% (1.54x), 42%, 35%, 32%; HTC (35%) + main memory (17%); 35%, 17.8%.]


5. Experimental Result

HTC + main memory + prefetching
Worst case: 11.2%
Average: 8.03%

Hit rate comparison
Scheme #1: 32 KB direct-mapped HTC only
Scheme #2: HTC + main memory
Scheme #3: 4-way set-associative + prefetching
Average: 97%; lowest: 94%


- Appendix -



Example: basic block trace

[Figure: basic block trace for OpenFile and ReadFile.]


2. Background


RSA Public-Key Encryption Algorithm


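SHA-1 Hash Algorithm

Overall Sequence Flow

Padding: message (K bits, K < 2^64) || 100...0 (padding) || message length (64 bits), for a total of L x 512 bits = N x 32 bits. The message is padded to 448 bits (mod 512) before the 64-bit length is appended.

Processing:
(1) The padded message is split into 512-bit input blocks.
(2) Each input block is expanded into W0~W79 (80 words of 32 bits).
(3) Starting from the 160-bit initial state (A, B, C, D, E: 5 x 32 bits), each block is processed in 80 steps, producing a 160-bit internal state that feeds the processing of the next block.
(4) The 160-bit final state (A, B, C, D, E: 5 x 32 bits) is the 160-bit hash value.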
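The padding rule can be written as a short Python function. This is a sketch that assumes the message is a whole number of bytes; the function name `sha1_pad` is hypothetical.

```python
def sha1_pad(message: bytes) -> bytes:
    """Append 1, then zeros up to 448 bits (mod 512), then the 64-bit length."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                          # a single 1 bit, then 0s
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)  # reach 56 bytes mod 64
    padded += bit_length.to_bytes(8, "big")             # 64-bit message length
    return padded
```

A 3-byte message becomes a single 64-byte (512-bit) block, while a 56-byte message no longer has room for the length field in its first block and is padded to two blocks.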


SHA-1 Hash Algorithm

Generation of Wt

The 512-bit input block provides W0~W15. Each later word is the XOR of four earlier words, rotated left by 1 bit:

Wt = ROTL1(Wt-16 XOR Wt-14 XOR Wt-8 XOR Wt-3), for t = 16, ..., 79

For example, W16 = ROTL1(W0 XOR W2 XOR W8 XOR W13) and W79 = ROTL1(W63 XOR W65 XOR W71 XOR W76).
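The generation of Wt can be sketched in Python as follows. The helper names `rotl32` and `message_schedule` are hypothetical; the block is interpreted as sixteen big-endian 32-bit words, as in standard SHA-1.

```python
import struct

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def message_schedule(block: bytes) -> list:
    """Expand one 512-bit input block into the 80 words W0..W79."""
    if len(block) != 64:
        raise ValueError("SHA-1 input blocks are 64 bytes (512 bits)")
    w = list(struct.unpack(">16I", block))        # W0..W15 come from the block
    for t in range(16, 80):
        w.append(rotl32(w[t - 16] ^ w[t - 14] ^ w[t - 8] ^ w[t - 3], 1))
    return w
```

Each of the 80 processing steps then consumes one of these words.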


SHA-1 Hash Algorithm

[Figure: processing of one block. The 160-bit internal state before the block (buffers A, B, C, D, E, 32 bits each) passes through steps 0 to 79 together with the 512-bit input block; the result is added (+) to the state before the block, giving the 160-bit internal state after the block.]

The 512-bit input block is mixed into the 160-bit internal state (80 steps).


SHA-1 Hash Algorithm

Processing of each SHA-1 step

[Figure: one step transforms the 160-bit internal state (buffers A, B, C, D, E, 32 bits each) before the step into the 160-bit internal state after the step, using the logical function ft, a 5-bit rotation, a 30-bit rotation, the word Wt (32 bits, depends on the input block and the step), and the constant Kt (32 bits, depends on the step). In the figure, + denotes addition modulo 2^32.]

Initial values of the buffers:
A = 67 45 23 01
B = EF CD AB 89
C = 98 BA DC FE
D = 10 32 54 76
E = C3 D2 E1 F0

Logical functions ft (· is AND, + is OR, ~ is NOT, ⊕ is XOR):
f0~f19 = (B · C) + (~B · D)
f20~f39 = B ⊕ C ⊕ D
f40~f59 = (B · C) + (C · D) + (D · B)
f60~f79 = B ⊕ C ⊕ D

Constants Kt:
K0~K19 = 5A 82 79 99
K20~K39 = 6E D9 EB A1
K40~K59 = 8F 1B BC DC
K60~K79 = CA 62 C1 D6
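The pieces of the appendix (padding, Wt generation, the 80 steps with ft and Kt, and the final addition modulo 2^32) can be combined into a compact SHA-1 sketch in Python. It is written for illustration only, assumes byte-aligned messages, and uses hypothetical helper names; in practice a library implementation such as `hashlib.sha1` is used, and it serves here as a reference for a self-check.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def f_and_k(t: int, b: int, c: int, d: int):
    """Return the logical function value ft(B, C, D) and the constant Kt."""
    if t < 20:
        return (b & c) | (~b & d), 0x5A827999
    if t < 40:
        return b ^ c ^ d, 0x6ED9EBA1
    if t < 60:
        return (b & c) | (c & d) | (d & b), 0x8F1BBCDC
    return b ^ c ^ d, 0xCA62C1D6

def sha1(message: bytes) -> str:
    # Initial values of the A, B, C, D, E buffers.
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]

    # Padding: 1 bit, zeros up to 448 bits (mod 512), 64-bit message length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")

    for offset in range(0, len(padded), 64):
        # Generation of W0..W79 for this 512-bit input block.
        w = list(struct.unpack(">16I", padded[offset:offset + 64]))
        for t in range(16, 80):
            w.append(rotl32(w[t - 16] ^ w[t - 14] ^ w[t - 8] ^ w[t - 3], 1))

        # 80 steps: mix the block into the 160-bit internal state.
        a, b, c, d, e = state
        for t in range(80):
            f, k = f_and_k(t, b, c, d)
            temp = (rotl32(a, 5) + f + e + k + w[t]) & MASK
            a, b, c, d, e = temp, a, rotl32(b, 30), c, d

        # Add the result to the state before the block (modulo 2^32).
        state = [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e))]

    return "".join(f"{word:08x}" for word in state)

# Self-check against the standard library implementation.
matches_hashlib = sha1(b"abc") == hashlib.sha1(b"abc").hexdigest()
```

The self-check compares the sketch with `hashlib.sha1` on a short message; the same comparison can be repeated on longer inputs to exercise multi-block processing.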