Victor Costan (龍望), Hsin-Jung Yang (楊昕蓉), Srini Devadas, Nickolai Zeldovich Secure Cloud Storage and Computing Using Reconfigurable Hardware

Trusted Cloud Storage Tech Talk

Page 1: Trusted Cloud Storage Tech Talk

Victor Costan (龍望), Hsin-Jung Yang (楊昕蓉), Srini Devadas, Nickolai Zeldovich

Secure Cloud Storage and Computing Using Reconfigurable Hardware

Page 2: Trusted Cloud Storage Tech Talk

Why Security Matters

Page 3: Trusted Cloud Storage Tech Talk

Cloud Computing: Dreams and Reality

• The Cloud: Ideal Picture

• The Cloud: Reality

Page 4: Trusted Cloud Storage Tech Talk

Cloud Storage: Attack Vectors

Hypervisor Bugs

State Manipulation

Hardware Attacks

Page 5: Trusted Cloud Storage Tech Talk

Replay Attacks are Harmful

Page 6: Trusted Cloud Storage Tech Talk

Spot the Differences

Page 7: Trusted Cloud Storage Tech Talk

Spot the Differences

Page 8: Trusted Cloud Storage Tech Talk

Spot the Differences

Page 9: Trusted Cloud Storage Tech Talk

Spot the Differences: Job

Page 10: Trusted Cloud Storage Tech Talk

Spot the Differences: Job

Page 11: Trusted Cloud Storage Tech Talk

Spot the Differences: Name, Relationship Status

Page 12: Trusted Cloud Storage Tech Talk

Why It Matters

• We rely on fresh data to make decisions

– Google searches

– Facebook profiles

– Twitter, LinkedIn

• Outdated data has a big impact on users

– Wrong profile information: confusion, embarrassment

– Old search results: bad business decisions, embarrassment

– Old document versions: costly business decisions, regulatory issues

Page 13: Trusted Cloud Storage Tech Talk

System Design

Page 14: Trusted Cloud Storage Tech Talk

Design: Cloud Storage API

• Block Device

– Fixed block size (1 MB)

– Write(block number, block)

– Read(block number) → block

• Easy to reason about the security

• File systems operate on top of this abstraction

[Diagram: disk divided into 1 MB blocks B1, B2, B3, B4]
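The block interface above can be sketched in a few lines of Python. This is a minimal in-memory stand-in, not the prototype's implementation; the class name `BlockDevice`, the zero-filled default for unwritten blocks, and the error handling are assumptions made for illustration.

```python
BLOCK_SIZE = 1 << 20  # 1 MB, the fixed block size from the slide


class BlockDevice:
    """In-memory stand-in for the Write/Read block interface (hypothetical)."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self._blocks = {}  # block number -> block contents

    def write(self, block_number, block):
        self._check(block_number)
        if len(block) != BLOCK_SIZE:
            raise ValueError("block must be exactly BLOCK_SIZE bytes")
        self._blocks[block_number] = bytes(block)

    def read(self, block_number):
        self._check(block_number)
        # Assumption of this sketch: blocks never written read back as zeros.
        return self._blocks.get(block_number, bytes(BLOCK_SIZE))

    def _check(self, block_number):
        if not 0 <= block_number < self.num_blocks:
            raise IndexError("block number out of range")
```

Keeping the interface this small is what makes the security argument tractable: every operation names exactly one block, so integrity only has to be tracked per block.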

Page 15: Trusted Cloud Storage Tech Talk

Design: System Architecture

Server components:

• CPU (Untrusted)

• RAM (Untrusted)

• Disk (Untrusted)

• Network Card (Untrusted)

• System Bus (Untrusted)

• FPGA / ASIC (Trusted)

• Secure NVRAM Chip

The Client reaches the server over the Internet (Untrusted).

Page 16: Trusted Cloud Storage Tech Talk

Design: Trusted Storage on Untrusted Disks

160-bit hash in trusted memory authenticates 1TB disk

• Disk divided into 1MB blocks: B1 B2 B3 B4

• Leaves hash their blocks: h1=h(B1), h2=h(B2), h3=h(B3), h4=h(B4)

• Nodes hash their children: h5=h(h1||h2), h6=h(h3||h4)

• Root Hash: h7=h(h5||h6)

• Root hash matches iff all blocks match

• 20 levels for the 1TB disk (2^20 blocks of 1MB)
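The four-block tree above can be reproduced with a short Python sketch. SHA-1 is assumed here only because its digest is 160 bits, matching the slide; the deck does not name the hash function, and the helper names `h` and `root_hash` are illustrative.

```python
import hashlib


def h(data):
    # Assumption: SHA-1, chosen only because it produces a 160-bit digest.
    return hashlib.sha1(data).digest()


def root_hash(blocks):
    """Root of a binary hash tree over a power-of-two number of blocks."""
    if not blocks or len(blocks) & (len(blocks) - 1):
        raise ValueError("need a power-of-two number of blocks")
    level = [h(b) for b in blocks]  # leaves hash their blocks
    while len(level) > 1:           # nodes hash their children
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

For four blocks this computes exactly h7 = h(h(h1||h2) || h(h3||h4)), so changing any single block changes the root.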

Page 17: Trusted Cloud Storage Tech Talk

Design: Hash Tree Caching

Node number   Hash                   Verified   Left child   Right child
1             fabe3c05d8ba995af93e   Y          Y            N
2             e6fc9bc13d624ace2394   Y          Y            Y
4             53a81fc2dcc53e4da819   Y          N            N
5             b2ce548dfa2f91d83ec6   Y          N            N

[Diagram: hash tree with root 1, children 2 and 3, and leaves 4, 5, 6, 7]

The FPGA caches hash tree nodes

The untrusted OS is free to choose the caching policy, for maximum performance

Page 18: Trusted Cloud Storage Tech Talk

Design: Hash Tree Cache

• Server stores entire hash tree in RAM

• FPGA has a cache that stores a subset of nodes

• Server tells FPGA what nodes to store

Node   Hash    Verified
1      fabe…   Y
2      e6fc…   Y
4      53a8…   Y
5      b2ce…   Y

[Diagram: hash tree with nodes 1 to 7; the server sends cache management commands to the FPGA]

Page 19: Trusted Cloud Storage Tech Talk

Design: Hash Tree Cache - Load

Before the load (nodes 1, 2, 4 cached):

Node   Hash    Verified
1      fabe…   Y
2      e6fc…   Y
4      53a8…   N

After loading node 5 (nodes 1, 2, 4, 5 cached):

Node   Hash    Verified
1      fabe…   Y
2      e6fc…   Y
4      53a8…   N
5      b2ce…   N

• Server tells the FPGA to load a node into a cache entry

• The cache entry is unverified right after a load

Page 20: Trusted Cloud Storage Tech Talk

Design: Hash Tree Cache - Verify

Before the verify (nodes 1, 2, 4, 5 cached):

Node   Hash    Verified
1      fabe…   Y
2      e6fc…   Y
4      53a8…   N
5      b2ce…   N

After using node 2 to verify its children:

Node   Hash    Verified
1      fabe…   Y
2      e6fc…   Y
4      53a8…   Y
5      b2ce…   Y

• Server tells the FPGA to use a node to verify its children

• FPGA checks that parent’s hash matches children hashes
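The load and verify commands can be modelled with a toy cache in Python. This is a sketch under stated assumptions, not the FPGA logic: SHA-1 stands in for the 160-bit hash, nodes use heap numbering (root = 1, children of n are 2n and 2n+1), and the class and method names are hypothetical. Refusing to load a node twice follows the deck's rule that a node may occupy at most one cache line.

```python
import hashlib


def h(data):
    return hashlib.sha1(data).digest()  # assumed 160-bit hash


class HashTreeCache:
    """Toy model of the trusted cache; heap numbering with the root at 1."""

    def __init__(self, root_hash):
        # The root comes from trusted memory, so it starts out verified.
        self.entries = {1: [root_hash, True]}  # node -> [hash, verified]

    def load(self, node, node_hash):
        """Store a hash supplied by the untrusted server; not yet verified."""
        if node in self.entries:
            raise ValueError("node is already cached")
        self.entries[node] = [node_hash, False]

    def verify(self, parent):
        """Use a verified parent to verify both of its cached children."""
        left, right = 2 * parent, 2 * parent + 1
        if parent not in self.entries or not self.entries[parent][1]:
            return False
        if left not in self.entries or right not in self.entries:
            return False
        combined = h(self.entries[left][0] + self.entries[right][0])
        if combined != self.entries[parent][0]:
            return False
        self.entries[left][1] = self.entries[right][1] = True
        return True
```

Because the server only supplies candidate hashes and the cache decides what counts as verified, a wrong or stale hash can waste time but cannot become trusted.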

Page 21: Trusted Cloud Storage Tech Talk

Design: Hash Tree Cache - Efficiency

• Checking leaf 33 requires 10 node loads for a cold cache on this toy example (38 loads on the real FPGA tree)

• Remember the root is always loaded in the cache

[Diagram: path from root 1 to leaf 33; nodes loaded, level by level: 2 3, 4 5, 8 9, 16 17, 32 33]

Page 22: Trusted Cloud Storage Tech Talk

Design: Hash Tree Cache - Efficiency

• Checking leaf 38 requires only 4 node loads, because 9 is already in the cache and verified

• Server can predict client requests and manage cache for high performance

[Diagram: the tree from the previous slide plus the newly loaded nodes 18 19 and 38 39 on the path to leaf 38]
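The load counts on the two efficiency slides can be checked with a small Python sketch. It assumes heap numbering (the sibling of node n is n XOR 1, the parent is n // 2) and assumes that verifying a node requires loading it together with its sibling; the function names are illustrative.

```python
def loads_needed(leaf, verified):
    """Nodes to load so that `leaf` can be verified (root = 1)."""
    to_load = []
    node = leaf
    # Walk up until we reach a node that is already cached and verified.
    while node > 1 and node not in verified:
        to_load.extend(sorted((node, node ^ 1)))  # the node and its sibling
        node //= 2
    return to_load


def check(leaf, verified):
    """Load and verify the path to `leaf`; return the number of node loads."""
    loaded = loads_needed(leaf, verified)
    verified.update(loaded)
    return len(loaded)


cache = {1}                # cold cache: only the root is present
cold = check(33, cache)    # loads 32 33, 16 17, 8 9, 4 5, 2 3
warm = check(38, cache)    # node 9 is now verified, so only 38 39 and 18 19
print(cold, warm)
```

The second lookup stops climbing at node 9, which is why a server that anticipates client requests can keep most lookups short.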

Page 23: Trusted Cloud Storage Tech Talk

Results

Page 24: Trusted Cloud Storage Tech Talk

Results: System Architecture

Server components:

• CPU (Untrusted)

• RAM (Untrusted)

• Disk (Untrusted)

• Network Card (Untrusted)

• System Bus (Untrusted)

• FPGA / ASIC (Trusted)

• Secure NVRAM Chip

The Client reaches the server over the Internet (Untrusted).

Page 25: Trusted Cloud Storage Tech Talk

Results: Server Prototype

Page 26: Trusted Cloud Storage Tech Talk

Results: Server Prototype

Page 27: Trusted Cloud Storage Tech Talk

Results: Normal Operation

Page 28: Trusted Cloud Storage Tech Talk

Results: FPGA Board, Normal Operation

Page 29: Trusted Cloud Storage Tech Talk

Results: Attack Does Not Impact User

Page 30: Trusted Cloud Storage Tech Talk

Results: FPGA Board, Under Attack

Page 31: Trusted Cloud Storage Tech Talk

Results: Performance Block Diagram

Step                                  Limits
HMAC (Sign) Result                    Hash Engine Speed
Update Hash Tree (Writes Only)        Hash Engine Speed, Dependencies
Load & Verify Hash Tree Nodes         Hash Engine Speed, Dependencies
Hash 1MB Data Block                   Hash Engine Speed, FPGA Data Bus
Read / Write 1MB Data Block to Disk   Disk I/O Speed

Page 33: Trusted Cloud Storage Tech Talk

Results: Prototype Performance (est.)

1 MB = 1 block

Disk             I/O Throughput
7,200 RPM HDD    70 MB/s
10,000 RPM HDD   100 MB/s
15,000 RPM HDD   130 MB/s
SSD              250 MB/s

Page 35: Trusted Cloud Storage Tech Talk

Results: Prototype Performance (est.)

1 MB = 1 block

Operation              Throughput
Block Hash             800 MB/s
Pipelined Block Hash   3,200 MB/s

Transport         Throughput
PCI Express x16   4,096 MB/s
SATA II           384 MB/s
PCI Express x1    250 MB/s
Ethernet          125 MB/s

Page 37: Trusted Cloud Storage Tech Talk

Results: Prototype Performance (est.)

1 MB = 1 block

Operation                   Throughput
Tree Node Hash              1.25 M/s
Pipelined Tree Node Hash    5.0 M/s
Tree Operations             62.5 k/s
Optimized Tree Operations   2.5 M/s

Transport         Throughput
PCI Express x16   4,096 MB/s
SATA II           384 MB/s
PCI Express x1    250 MB/s
Ethernet          125 MB/s

Page 39: Trusted Cloud Storage Tech Talk

Results: Prototype Performance (est.)

1 MB = 1 block

Operation                  Throughput
Tree Node Hash             1.25 M/s
Pipelined Tree Node Hash   5.0 M/s
Tree Operations            62.5 k/s

Transport         Throughput
PCI Express x16   4,096 MB/s
SATA II           384 MB/s
PCI Express x1    250 MB/s
Ethernet          125 MB/s

Page 41: Trusted Cloud Storage Tech Talk

Results: Prototype Performance (est.)

1 MB = 1 block

Operation   Throughput
Node HMAC   1.25 M/s

Transport         Throughput
PCI Express x16   4,096 MB/s
SATA II           384 MB/s
PCI Express x1    250 MB/s
Ethernet          125 MB/s

Page 42: Trusted Cloud Storage Tech Talk

Results: Performance Block Diagram

• Steps are performed in parallel (pipelined), because they are in different system components

• However, the slowest step is the bottleneck for the entire system

• Each step can be made faster by adding more hardware (e.g. more disks), assuming cache policies can scale up
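The bottleneck argument can be made concrete with a few lines of Python. The throughput figures are the estimates from the prototype slides; pairing the FPGA data bus with PCI Express x16 and assuming that disk throughput scales linearly with the number of disks are simplifications made for this sketch.

```python
# Estimated per-stage throughput in MB/s, taken from the prototype slides.
stages = {
    "disk": 100,        # one 10,000 RPM HDD
    "bus": 4096,        # PCI Express x16 (assumed to be the FPGA data bus)
    "block_hash": 800,  # unpipelined block hash
}


def bottleneck(stage_throughput):
    """Return (stage name, MB/s) of the slowest stage in the pipeline."""
    name = min(stage_throughput, key=stage_throughput.get)
    return name, stage_throughput[name]


def with_disks(stage_throughput, count, per_disk=100):
    """Same pipeline with `count` disks, assuming linear scaling."""
    scaled = dict(stage_throughput)
    scaled["disk"] = count * per_disk
    return scaled


print(bottleneck(stages))                  # a single disk limits the system
print(bottleneck(with_disks(stages, 10)))  # ten disks shift the limit to hashing
```

Adding hardware to one stage only helps until another stage becomes the slowest, which is why each stage has to be scaled in turn.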

Page 43: Trusted Cloud Storage Tech Talk

Results: Ping-Pong Workload

[Figure: block accessed (axis 0 to 10) versus time (axis 0 to 20)]

• Typical collaboration scenario

• Real-Life

– Google Docs

– Facebook Messages

– Dropbox

• Straight-up LRU shines here

Page 44: Trusted Cloud Storage Tech Talk

Results: Photo Gallery Workload

[Figure: block accessed (axis 0 to 10) versus time (axis 0 to 20)]

• Modeled after data on photo applications

• Real-Life

– Facebook’s #1 Feature

– Google Picasa

– Flixster

• Special policy inspired by Facebook Haystack classifies photos, loads cache predictively

Page 45: Trusted Cloud Storage Tech Talk

Results: Map-Reduce Workload

[Figure: block accessed (axis 0 to 30) versus time (axis 0 to 10)]

• Index-generating Map-Reduce

• Real-Life

– Google PageRank

– Facebook friend graph (EdgeRank)

• Special policy that takes advantage of Map-Reduce access pattern

Page 46: Trusted Cloud Storage Tech Talk

Results: Cache Hit Rates

[Figure: cache hit rate (axis 0.5 to 1) for the Spec LRU, Haystack, and MR-Aware policies]

• Applications: 2 users collaborating on a file (ping-pong), photo gallery browsing, Map-Reduce job

• Cache policies: Speculative Least-Recently Used (Spec LRU), a policy inspired by Facebook Haystack (Haystack), and a policy optimized for Map-Reduce access patterns (MR-Aware)

• Conclusion: no policy works well on all applications, so app server must drive policy
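Why a single policy cannot win everywhere is easy to see with a plain LRU simulation in Python. The two traces below are hypothetical stand-ins (an alternating two-block pattern and a single sequential pass), not the workloads measured in the talk, and the policy is ordinary LRU rather than the speculative variant from the slides.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Fraction of accesses in `trace` served from an LRU cache."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(trace)


ping_pong = [3, 7] * 10  # hypothetical: two users alternating on two blocks
scan = list(range(20))   # hypothetical: one pass over 20 distinct blocks

print(lru_hit_rate(ping_pong, capacity=4))
print(lru_hit_rate(scan, capacity=4))
```

LRU misses only the first touch of each block in the alternating trace, but it never hits on the sequential pass, where a policy that knows the access pattern could prefetch instead.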

Page 47: Trusted Cloud Storage Tech Talk

Results: Protocol Overhead

• Client – Server Bandwidth overhead: 0.002%

– Operation: 1 HMAC (20 bytes) per 1MB = 0.002%

– Handshake: extra secret exchange piggybacks on SSL: 5%

• Latency overhead (1 client): 4%

– Without security: 8.2ms / request

– With security: 8.5ms / request

– Latency overhead = the latency of a very fast Internet hop

• No throughput overhead (N-clients)

– With or without security: 100MB/s

– Need 40 HDDs to saturate PCI-E x16, 52 HDDs to saturate FPGA
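The overhead figures above follow from simple arithmetic, reproduced here in Python as a sanity check. The 100 MB/s per-disk figure is the 10,000 RPM HDD estimate from the prototype slides; the variable names are illustrative, and the 52-HDD figure is not derived here because the slides do not state the FPGA limit it is based on.

```python
hmac_bytes = 20
block_bytes = 1 << 20  # 1 MB block
bandwidth_overhead = hmac_bytes / block_bytes * 100  # percent

latency_plain_ms, latency_secure_ms = 8.2, 8.5
latency_overhead = (latency_secure_ms - latency_plain_ms) / latency_plain_ms * 100

hdd_mb_s, pcie_x16_mb_s = 100, 4096
hdds_within_pcie_x16 = pcie_x16_mb_s // hdd_mb_s  # whole disks the bus can feed

print(f"bandwidth overhead: {bandwidth_overhead:.3f}%")
print(f"latency overhead:   {latency_overhead:.1f}%")
print(f"HDDs per PCI-E x16: {hdds_within_pcie_x16}")
```

The values round to the 0.002% bandwidth overhead, the 4% latency overhead, and the 40 HDDs quoted on the slide.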

MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY

Page 48: Trusted Cloud Storage Tech Talk

Results: Protocol Overhead

• Protocol is simple enough to implement on browser side

– Chrome

– Firefox

– Internet Explorer 10

• Easy integration in existing Web applications

• End-to-end security

Page 49: Trusted Cloud Storage Tech Talk

Questions?

Thank You!

Page 50: Trusted Cloud Storage Tech Talk

Other Applications

• FPGA can be used to load user-specified circuits and perform arbitrary computation with security guarantees

• Applications: encrypted image search, financial calculations

• Potential applications in highly regulated industries, e.g. medical record keeping and processing, secure financial services

Page 51: Trusted Cloud Storage Tech Talk

Secure Computation: Overview

Task

• Untrusted computation: VM image

• Trusted computation: Circuit spec

Cloud Machine

• VM image → CPU cores

• Circuit spec → FPGA LUTs

• Most code is untrusted, executes in a VM

• Trusted code is broken up into kernels which become circuits deployed onto an FPGA

• If efficiency is not an issue, deploy a processor on the FPGA, execute software securely

6/9/2011

Page 52: Trusted Cloud Storage Tech Talk

Secure Computation: Challenge

• Multi-tenancy is the key to the cloud’s cost effectiveness

• FPGA can host different applications running in parallel

• Challenge: isolation between applications, just like a hypervisor

[Diagram: an FPGA controller hosts the Client 1, Client 2, and Client 3 applications; a VM hypervisor hosts the Client 1, Client 2, and Client 3 VMs; the two sides are connected over PCI Express]

Page 54: Trusted Cloud Storage Tech Talk

Design: FPGA Boot Sequence

Messages exchanged:

• PKcard + Manufacturer Certificate

• random nonce

• PUFsyndrome + SignPKcard(PUFsyndrome)

• Root Hash + SignPKcard(nonce || Root Hash)

• EncSKfpga(SKcard) + MACSKfpga(nonce || SKcard)

Checks performed:

• Check certificate against e-fuses

• Compute SKfpga from PUFsyndrome

• Verify signature

• Verify MAC

• Check PKcard against certificate

Page 55: Trusted Cloud Storage Tech Talk

Design: Client Trust Model

• Each FPGA – NVRAM pair has an Endorsement Key (EK)

• Manufacturer certifies the public EK

• Client uses the public EK to encrypt an HMAC key, which becomes its shared secret with the trusted hardware

[Diagram: the Manufacturer signs the Endorsement Certificate for PubEK; the Client verifies the certificate, generates an HMAC key, and encrypts it with PubEK; the trusted hardware decrypts the HMAC key with PrivEK]
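Once the HMAC key is shared, each result can be authenticated with a 20-byte tag. The Python sketch below shows one way this could look; the message layout (nonce, block number, block hash), the use of a per-request nonce, and the function names are assumptions for illustration, since the deck does not spell out the wire format. HMAC-SHA1 is used only because it yields the 20-byte HMAC mentioned in the overhead figures.

```python
import hashlib
import hmac
import os


def sign_read(key, nonce, block_number, block):
    """Trusted-hardware side: tag over nonce, block number, and block hash."""
    message = nonce + block_number.to_bytes(8, "big") + hashlib.sha1(block).digest()
    return hmac.new(key, message, hashlib.sha1).digest()


def verify_read(key, nonce, block_number, block, tag):
    """Client side: recompute the tag and compare in constant time."""
    expected = sign_read(key, nonce, block_number, block)
    return hmac.compare_digest(expected, tag)


key = os.urandom(20)    # stands in for the HMAC key sent under PubEK
nonce = os.urandom(16)  # fresh per request, so an old answer cannot be replayed
tag = sign_read(key, nonce, 7, b"current contents")
print(verify_read(key, nonce, 7, b"current contents", tag))
print(verify_read(key, nonce, 7, b"stale contents", tag))
```

Binding the nonce into the tag is what lets the client reject a correctly signed but outdated answer.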

Page 56: Trusted Cloud Storage Tech Talk

Design: Hash Tree Security

1. Computationally infeasible to come up with a block B1’ such that B1 ≠ B1’ but h(B1) = h(B1’)

2. Computationally infeasible to come up with a node hash h1’ such that h1 ≠ h1’ but h(h1||h2) = h(h1’||h2)

Therefore, the root hash authenticates the entire contents of the tree.

Page 57: Trusted Cloud Storage Tech Talk

Design: FPGA Boot Sequence Security

• Server OS transfers messages between FPGA and Trusted Memory over an untrusted channel

• FPGA authenticates Trusted Memory using Manufacturer Certificate, whose public key is burned into FPGA’s e-fuses

• Trusted Memory authenticates FPGA using its Physically Unclonable Function (PUF)

• At manufacturing time, FPGA is paired with memory chip

• FPGA can be paired with new memory chip if necessary

Page 58: Trusted Cloud Storage Tech Talk

Design: Hash Tree Cache Security

• Server OS responsible for loading and verifying tree nodes

• Parent node hash verifies children nodes

• Reading a block requires the block’s leaf to be verified

• Writing a block requires the path from the block’s leaf to the root to be loaded and verified

• A node can be loaded in at most one cache line, to prevent replay attacks using stale node hashes
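The write rule can be illustrated with a Python sketch that refreshes every hash on the path from a leaf to the root. It assumes the path has already been loaded and verified, keeps the whole tree in a dictionary for simplicity, uses heap numbering with the root at 1, and again uses SHA-1 as a stand-in 160-bit hash; none of these details come from the deck.

```python
import hashlib


def h(data):
    return hashlib.sha1(data).digest()  # assumed 160-bit hash


def build_tree(blocks):
    """Full tree as {node number: hash}; len(blocks) must be a power of two."""
    n = len(blocks)
    tree = {n + i: h(b) for i, b in enumerate(blocks)}  # leaves
    for node in range(n - 1, 0, -1):                    # inner nodes, bottom up
        tree[node] = h(tree[2 * node] + tree[2 * node + 1])
    return tree


def write_block(tree, n, index, new_block):
    """Replace one leaf, then refresh every hash on its path to the root."""
    node = n + index
    tree[node] = h(new_block)
    while node > 1:
        node //= 2
        tree[node] = h(tree[2 * node] + tree[2 * node + 1])
```

Only the nodes on one root-to-leaf path change per write, so the cost grows with the tree height rather than with the number of blocks.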