Trusted Cloud Storage Tech Talk

Victor Costan (龍望), Hsin-Jung Yang (楊昕蓉), Srini Devadas, Nickolai Zeldovich

Secure Cloud Storage and Computing Using Reconfigurable Hardware

Why Security Matters

Cloud Computing: Dreams and Reality

• The Cloud: Ideal Picture

• The Cloud: Reality

Cloud Storage: Attack Vectors

Hypervisor Bugs

State Manipulation

Hardware Attacks

Replay Attacks are Harmful

Spot the Differences

Spot the Differences: Job

Spot the Differences: Name, Relationship Status

Why It Matters

• We rely on fresh data to make decisions

– Google searches

– Facebook profiles

– Twitter, LinkedIn

• Outdated data has a big impact on users

– Wrong profile information: confusion, embarrassment

– Old search results: bad business decisions, embarrassment

– Old document versions: costly business decisions, regulatory issues

System Design

Design: Cloud Storage API

• Block Device

– Fixed block size (1MB)

– Write(block number, block)

– Read(block number) → block

• Easy to reason about the security

• File systems operate on top of this abstraction

B1 B2 B3 B4

Disk divided into 1MB blocks
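The block device API above can be sketched in a few lines. This is only an illustration under assumptions: the class name `BlockDevice`, the in-memory dictionary standing in for the disk, and the zero-filled default for unwritten blocks are hypothetical and not taken from the slides.

```python
BLOCK_SIZE = 1 << 20  # 1MB fixed block size, as on the slide


class BlockDevice:
    """Hypothetical in-memory stand-in for the untrusted disk."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self._blocks = {}  # block number -> block contents

    def _check(self, block_number: int) -> None:
        if not 0 <= block_number < self.num_blocks:
            raise IndexError("block number out of range")

    def write(self, block_number: int, block: bytes) -> None:
        # Write(block number, block): every block has exactly BLOCK_SIZE bytes.
        self._check(block_number)
        if len(block) != BLOCK_SIZE:
            raise ValueError("block must be exactly 1MB")
        self._blocks[block_number] = bytes(block)

    def read(self, block_number: int) -> bytes:
        # Read(block number) -> block; unwritten blocks read as zeros (assumption).
        self._check(block_number)
        return self._blocks.get(block_number, bytes(BLOCK_SIZE))


disk = BlockDevice(num_blocks=4)
disk.write(2, b"\x01" * BLOCK_SIZE)
block = disk.read(2)
```

A file system would sit on top of `read` and `write`; the security argument only has to cover these two calls.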

Design: System Architecture

CPU (Untrusted)

Disk (Untrusted)

RAM (Untrusted)

Network Card (Untrusted)

FPGA / ASIC (Trusted)

Secure NVRAM Chip

System Bus (Untrusted)

Client

Internet (Untrusted)

Design: Trusted Storage on Untrusted Disks

160-bit hash in trusted memory authenticates 1TB disk

Disk divided into 1MB blocks: B1 B2 B3 B4

Leaves hash their blocks: h1=h(B1), h2=h(B2), h3=h(B3), h4=h(B4)

Nodes hash their children: h5=h(h1||h2), h6=h(h3||h4)

Root Hash: h7=h(h5||h6)

Root hash matches iff all blocks match

20 levels
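A minimal sketch of the hash tree above, assuming SHA-1 as the 160-bit hash (the slides give only the digest size, so the choice of SHA-1 is an assumption) and a block count that is a power of two. The helper names `h` and `root_hash` are illustrative.

```python
import hashlib


def h(data: bytes) -> bytes:
    # 160-bit hash; SHA-1 is an assumption, the slides only state the size.
    return hashlib.sha1(data).digest()


def root_hash(blocks):
    # Leaves hash their blocks.
    level = [h(block) for block in blocks]
    # Nodes hash the concatenation of their two children, up to the root.
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


blocks = [b"B1", b"B2", b"B3", b"B4"]
trusted_root = root_hash(blocks)  # the only value kept in trusted memory

# A stale or modified block changes the root, so the mismatch is detected.
tampered = [b"B1", b"old B2", b"B3", b"B4"]
detected = root_hash(tampered) != trusted_root
```

With 1MB blocks, a 1TB disk has 2^20 leaves, which is where the 20 levels come from.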

Design: Hash Tree Caching

Node number  Hash                  Verified  Left child  Right child
1            fabe3c05d8ba995af93e  Y         Y           N
2            e6fc9bc13d624ace2394  Y         Y           Y
4            53a81fc2dcc53e4da819  Y         N           N
5            b2ce548dfa2f91d83ec6  Y         N           N

[Figure: hash tree with node 1 at the root, nodes 2 and 3 below it, and nodes 4 to 7 on the third level]

The FPGA caches hash tree nodes

The untrusted OS is free to choose the caching policy, for maximum performance

Design: Hash Tree Cache

• Server stores entire hash tree in RAM

• FPGA has a cache that stores a subset of nodes

• Server tells FPGA what nodes to store

Node  Hash   Verified
1     fabe…  Y
2     e6fc…  Y
4     53a8…  Y
5     b2ce…  Y

[Figure: hash tree with nodes 1; 2, 3; 4, 5, 6, 7]

Cache management commands

Design: Hash Tree Cache - Load

Cache before loading node 5:

Node  Hash   Verified
1     fabe…  Y
2     e6fc…  Y
4     53a8…  N

Cache after loading node 5:

Node  Hash   Verified
1     fabe…  Y
2     e6fc…  Y
4     53a8…  N
5     b2ce…  N

[Figure: hash tree with nodes 1; 2; 4, 5]

• Server tells the FPGA to load a node into a cache entry

• The cache entry is unverified right after a load

Design: Hash Tree Cache - Verify

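Cache before node 2 verifies its children:

Node  Hash   Verified
1     fabe…  Y
2     e6fc…  Y
4     53a8…  N
5     b2ce…  N

Cache after node 2 verifies its children:

Node  Hash   Verified
1     fabe…  Y
2     e6fc…  Y
4     53a8…  Y
5     b2ce…  Y

[Figure: hash tree with nodes 1; 2; 4, 5]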

• Server tells the FPGA to use a node to verify its children

• FPGA checks that the parent’s hash matches the hash of its children’s hashes
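The load and verify commands can be sketched as follows. This is an illustration, not the FPGA implementation: the class `HashTreeCache`, the use of SHA-1, and the convention that node n has children 2n and 2n+1 (which matches the node numbers in the figures) are assumptions.

```python
import hashlib
from dataclasses import dataclass


def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()  # assumed 160-bit hash


@dataclass
class Entry:
    digest: bytes
    verified: bool = False


class HashTreeCache:
    def __init__(self, root_digest: bytes):
        # The root comes from trusted memory, so it starts out verified.
        self.entries = {1: Entry(root_digest, True)}

    def load(self, node: int, digest: bytes) -> None:
        # The untrusted server supplies the digest; it is unverified after a load.
        if node == 1:
            raise ValueError("the root is never loaded from the server")
        self.entries[node] = Entry(digest, False)

    def verify_children(self, parent: int) -> bool:
        p = self.entries[parent]
        if not p.verified:
            raise ValueError("parent must be verified first")
        left, right = self.entries[2 * parent], self.entries[2 * parent + 1]
        if h(left.digest + right.digest) != p.digest:
            return False  # the server supplied a wrong or stale digest
        left.verified = right.verified = True
        return True


leaf4, leaf5 = h(b"B1"), h(b"B2")
node2, node3 = h(leaf4 + leaf5), h(b"other subtree")
root = h(node2 + node3)

cache = HashTreeCache(root)
cache.load(2, node2)
cache.load(3, node3)
ok_level1 = cache.verify_children(1)
cache.load(4, leaf4)
cache.load(5, leaf5)
ok_level2 = cache.verify_children(2)
```

Trust flows downward from the root: a node's digest is believed only after a verified parent has vouched for it.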

Design: Hash Tree Cache - Efficiency

• Checking leaf 33 requires 10 node loads for a cold cache on this toy example (38 loads on the real FPGA tree)

• Remember the root is always loaded in the cache

[Figure: hash tree nodes needed to check leaf 33, by level: 1; 2, 3; 4, 5; 8, 9; 16, 17; 32, 33]

Design: Hash Tree Cache - Efficiency

• Checking leaf 38 requires only 4 node loads, because 9 is already in the cache and verified

• Server can predict client requests and manage cache for high performance

[Figure: the same tree with the path to leaf 38 added: nodes 18, 19 under node 9 and nodes 38, 39 under node 19]
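The load counts on these two slides can be reproduced with a short sketch. It assumes that node n has children 2n and 2n+1 and that siblings are always loaded and verified as a pair; the function name `loads_needed` is hypothetical.

```python
def loads_needed(leaf: int, verified: set) -> list:
    """Nodes the server must load so that `leaf` can be verified."""
    to_load = []
    node = leaf
    # Walk up until we reach a node that is already cached and verified.
    while node != 1 and node not in verified:
        to_load += [node, node ^ 1]  # the node and its sibling
        node //= 2                   # move to the parent
    return to_load


# Cold cache: only the root (node 1) is present.
cold = loads_needed(33, {1})

# After checking leaf 33, everything on its path (with siblings) is verified.
warm_cache = {1} | set(cold)
warm = loads_needed(38, warm_cache)
```

Here `len(cold)` is 10 and `len(warm)` is 4, because the walk from leaf 38 stops at node 9.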

Results

Results: System Architecture

CPU (Untrusted)

Disk (Untrusted)

RAM (Untrusted)

Network Card (Untrusted)

FPGA / ASIC (Trusted)

Secure NVRAM Chip

System Bus (Untrusted)

Client

Internet (Untrusted)

Results: Server Prototype

Results: Normal Operation

Results: FPGA Board, Normal Operation

Results: Attack Does Not Impact User

Results: FPGA Board, Under Attack

Results: Performance Block Diagram

• HMAC (Sign) Result

– Limit: Hash Engine Speed

• Update Hash Tree (Writes Only)

– Limit: Hash Engine Speed

– Limit: Dependencies

• Load & Verify Hash Tree Nodes

– Limit: Hash Engine Speed

– Limit: Dependencies

• Hash 1MB Data Block

– Limit: Hash Engine Speed

– Limit: FPGA Data Bus

• Read / Write 1MB Data Block to Disk

– Limit: Disk I/O Speed

Results: Prototype Performance (est.)

1 MB = 1 block

Disk I/O         Throughput
7,200 RPM HDD    70 MB/s
10,000 RPM HDD   100 MB/s
15,000 RPM HDD   130 MB/s
SSD              250 MB/s

Results: Prototype Performance (est.)

Operation              Throughput
Block Hash             800 MB/s
Pipelined Block Hash   3,200 MB/s

Transport              Throughput
PCI Express x16        4,096 MB/s
SATA II                384 MB/s
PCI Express x1         250 MB/s
Ethernet               125 MB/s

1 MB = 1 block

Results: Prototype Performance (est.)

Operation                  Throughput
Tree Node Hash             1.25 M/s
Pipelined Tree Node Hash   5.0 M/s
Tree Operations            62.5 k/s
Optimized Tree Operations  2.5 M/s

Transport                  Throughput
PCI Express x16            4,096 MB/s
SATA II                    384 MB/s
PCI Express x1             250 MB/s
Ethernet                   125 MB/s

1 MB = 1 block

Results: Prototype Performance (est.)

Operation         Throughput
Node HMAC         1.25 M/s

Transport         Throughput
PCI Express x16   4,096 MB/s
SATA II           384 MB/s
PCI Express x1    250 MB/s
Ethernet          125 MB/s

1 MB = 1 block

Results: Performance Block Diagram

• Steps are performed in parallel (pipelined), because they are in different system components

• However, the slowest step is the bottleneck for the entire system

• Each step can be made faster by adding more hardware (e.g. more disks), assuming cache policies can scale up
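The bottleneck argument can be illustrated with the estimated figures from the performance slides. The mapping of figures to steps below is an assumption made for illustration (one 10,000 RPM HDD, PCI Express x16 as the FPGA data bus, non-pipelined hash rates), and per-second operation counts are converted to MB/s using 1 block = 1 MB.

```python
MB_PER_BLOCK = 1  # 1 MB = 1 block, as stated on the slides

# Estimated throughput of each pipeline step, in MB/s.
# Which figure applies to which step is an assumption for this sketch.
steps = {
    "Read / Write 1MB Data Block to Disk": 100.0,                # 10,000 RPM HDD
    "FPGA Data Bus": 4096.0,                                     # PCI Express x16
    "Hash 1MB Data Block": 800.0,                                # Block Hash
    "Load & Verify Hash Tree Nodes": 62_500.0 * MB_PER_BLOCK,    # Tree Operations
    "HMAC (Sign) Result": 1_250_000.0 * MB_PER_BLOCK,            # Node HMAC
}

# Steps run in parallel, so the slowest step sets the system throughput.
bottleneck = min(steps, key=steps.get)
system_throughput = steps[bottleneck]
```

Under these assumptions the disk is the bottleneck at 100 MB/s, so adding disks is the first way to scale.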

Results: Ping-Pong Workload

[Plot: block number (y-axis, 0 to 10) over time (x-axis, 0 to 20)]

• Typical collaboration scenario

• Real-Life

– Google Docs

– Facebook Messages

– Dropbox

• Straight-up LRU shines here
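A small simulation shows why plain LRU does well on a ping-pong pattern. The access trace (two users alternating between two blocks) and the function `lru_hit_rate` are illustrative assumptions, not the workload or the policy code measured in the talk.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity: int) -> float:
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(accesses)


# Hypothetical trace: two collaborators alternate between blocks 3 and 7.
ping_pong = [3, 7] * 10
rate = lru_hit_rate(ping_pong, capacity=2)
```

Only the first access to each block misses, so 18 of the 20 accesses hit.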

Results: Photo Gallery Workload

[Plot: block number (y-axis, 0 to 10) over time (x-axis, 0 to 20)]

• Modeled after data on photo applications

• Real-Life

– Facebook’s #1 Feature

– Google Picasa

– Flixter

• Special policy inspired by Facebook Haystack classifies photos, loads cache predictively

Results: Map-Reduce Workload

[Plot: block number (y-axis, 0 to 30) over time (x-axis, 0 to 10)]

• Index-generating Map-Reduce

• Real-Life

– Google PageRank

– Facebook friend graph (EdgeRank)

• Special policy that takes advantage of Map-Reduce access pattern

Results: Cache Hit Rates

[Chart: cache hit rate (axis from 0.5 to 1) for the Spec LRU, Haystack, and MR-Aware policies]

• Applications: 2 users collaborating on a file (ping-pong), photo gallery browsing, Map-Reduce job

• Cache policies: Speculative Least-Recently Used, Facebook Haystack’s policy optimized for caching, policy optimized for Map-Reduce access patterns

• Conclusion: no policy works well on all applications, so app server must drive policy

Results: Protocol Overhead

• Client – Server Bandwidth overhead: 0.002%

– Operation: 1 HMAC (20 bytes) per 1MB = 0.002%

– Handshake: extra secret exchange piggybacks on SSL: 5%

• Latency overhead (1 client): 4%

– Without security: 8.2ms / request

– With security: 8.5ms / request

– Latency overhead = the latency of a very fast Internet hop

• No throughput overhead (N-clients)

– With or without security: 100MB/s

– Need 40 HDDs to saturate PCI-E x16, 52 HDDs to saturate FPGA
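The overhead figures above follow from simple arithmetic, reproduced here as a check. The only assumption is that 1MB means 2^20 bytes; with 10^6 bytes the bandwidth overhead is exactly 0.002%.

```python
hmac_bytes = 20            # one HMAC per operation
block_bytes = 1 << 20      # 1MB block, assumed to be 2**20 bytes

bandwidth_overhead_pct = 100 * hmac_bytes / block_bytes   # about 0.0019%

latency_plain_ms = 8.2     # without security
latency_secure_ms = 8.5    # with security
latency_overhead_pct = 100 * (latency_secure_ms - latency_plain_ms) / latency_plain_ms

pcie_x16_mb_s = 4096       # PCI Express x16 estimate from the slides
hdd_mb_s = 100             # one 10,000 RPM HDD
hdds_below_pcie_limit = pcie_x16_mb_s // hdd_mb_s   # whole disks that fit under the bus limit
```

Rounded, these give 0.002%, 4%, and 40 disks, matching the slide.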

MIT COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE LABORATORY

Results: Protocol Overhead

• Protocol is simple enough to implement on the browser side

– Chrome

– Firefox

– Internet Explorer 10

• Easy integration in existing Web applications

• End-to-end security

Questions?

Thank You!

Other Applications

• FPGA can be used to load user-specified circuits and perform arbitrary computation with security guarantees

• Applications: encrypted image search, financial calculations

• Potential applications in highly regulated industries, e.g. medical record keeping and processing, secure financial services

Secure Computation: Overview

Task

– Untrusted computation: VM image

– Trusted computation: Circuit spec

Cloud Machine

– VM image → CPU cores

– Circuit spec → FPGA LUTs

• Most code is untrusted, executes in a VM

• Trusted code is broken up into kernels which become circuits deployed onto an FPGA

• If efficiency is not an issue, deploy a processor on the FPGA, execute software securely

6/9/2011

Secure Computation: Challenge

• Multi-tenancy is the key to the cloud’s cost effectiveness

• FPGA can host different applications running in parallel

• Challenge: isolation between applications, just like a hypervisor

[Figure: an FPGA controller hosts the Client 1, Client 2, and Client 3 Applications; a VM Hypervisor hosts the Client 1, Client 2, and Client 3 VMs; the two sides are connected by PCI Express]

Design: FPGA Boot Sequence

• Messages exchanged

– PKcard + Manufacturer Certificate

– random nonce

– PUFsyndrome + Sign_PKcard(PUFsyndrome)

– Root Hash + Sign_PKcard(nonce || Root Hash)

– Enc_SKfpga(SKcard) + MAC_SKfpga(nonce || SKcard)

• Checks performed

– Check certificate against e-fuses

– Compute SKfpga from PUFsyndrome

– Verify signature

– Verify MAC

– Check PKcard against certificate

Design: Client Trust Model

• Each FPGA – NVRAM pair has an Endorsement Key (EK)

• Manufacturer certifies the public EK

• Client uses the public EK to encrypt an HMAC key, which becomes its shared secret with the trusted hardware

[Figure: the Manufacturer signs the Endorsement Certificate for PubEK; the Client verifies the certificate, generates an HMAC key, and encrypts it with PubEK; the trusted hardware decrypts the HMAC key with PrivEK]
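Once the client and the trusted hardware share the HMAC key, each result can be authenticated with a 20-byte tag. The sketch below covers only that step and leaves out the public-key encryption of the key. The message layout (nonce, block number, block hash), the client-chosen nonce, and the function names are assumptions; the slides do not specify the wire format.

```python
import hashlib
import hmac
import secrets

# Client side: generate the shared secret (sent to the hardware encrypted with PubEK).
hmac_key = secrets.token_bytes(20)


def sign_result(key: bytes, nonce: bytes, block_number: int, block: bytes) -> bytes:
    # Trusted hardware side. Hypothetical layout: nonce || block number || hash(block).
    message = nonce + block_number.to_bytes(8, "big") + hashlib.sha1(block).digest()
    return hmac.new(key, message, hashlib.sha1).digest()  # 20-byte tag


def client_accepts(key: bytes, nonce: bytes, block_number: int,
                   block: bytes, tag: bytes) -> bool:
    expected = sign_result(key, nonce, block_number, block)
    return hmac.compare_digest(expected, tag)  # constant-time comparison


nonce = secrets.token_bytes(16)  # fresh per request, so old replies cannot be replayed
tag = sign_result(hmac_key, nonce, 7, b"current contents")
fresh_ok = client_accepts(hmac_key, nonce, 7, b"current contents", tag)
stale_ok = client_accepts(hmac_key, nonce, 7, b"stale contents", tag)
```

The untrusted server never sees `hmac_key`, so it cannot forge a tag for stale data.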

Design: Hash Tree Security

1. Computationally infeasible to come up with a block B1’ such that B1 ≠ B1’ but h(B1) = h(B1’)

2. Computationally infeasible to come up with a node hash h1’ such that h1 ≠ h1’ but h(h1||h2) = h(h1’||h2)

Therefore, the root hash authenticates the entire contents of the tree.

Design: FPGA Boot Sequence Security

• Server OS transfers messages between FPGA and Trusted Memory (untrusted channel)

• FPGA authenticates Trusted Memory using Manufacturer Certificate, whose public key is burned into FPGA’s e-fuses

• Trusted Memory authenticates FPGA using its Physically Unclonable Function (PUF)

• At manufacturing time, FPGA is paired with memory chip

• FPGA can be paired with new memory chip if necessary

Design: Hash Tree Cache Security

• Server OS responsible for loading and verifying tree nodes

• Parent node hash verifies children nodes

• Reading a block requires the block’s leaf to be verified

• Writing a block requires the path from the block’s leaf to the root to be loaded and verified

• A node can be loaded in at most one cache line, to prevent replay attacks using stale node hashes
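The rules above can be written as checks on the cache state. This is a sketch under assumptions: the class `NodeCache`, the reserved line 0 for the root, and `mark_verified` (a stand-in for the FPGA's hash comparison) are hypothetical, and node n is taken to have parent n // 2.

```python
class NodeCache:
    def __init__(self, num_lines: int):
        self.lines = [None] * num_lines
        self.lines[0] = {"node": 1, "verified": True}  # root, from trusted memory

    def _find(self, node: int):
        for entry in self.lines:
            if entry is not None and entry["node"] == node:
                return entry
        return None

    def load(self, line: int, node: int) -> None:
        if line == 0:
            raise ValueError("line 0 is reserved for the root")
        # A node may live in at most one cache line, so a stale copy
        # cannot sit next to an updated one.
        if self._find(node) is not None:
            raise ValueError("node is already cached")
        self.lines[line] = {"node": node, "verified": False}

    def mark_verified(self, node: int) -> None:
        # Stand-in for the parent-hash check performed by the FPGA.
        self._find(node)["verified"] = True

    def is_verified(self, node: int) -> bool:
        entry = self._find(node)
        return entry is not None and entry["verified"]

    def can_read(self, leaf: int) -> bool:
        return self.is_verified(leaf)  # reading needs a verified leaf

    def can_write(self, leaf: int) -> bool:
        node = leaf
        while node >= 1:               # writing needs the whole path to the root
            if not self.is_verified(node):
                return False
            node //= 2
        return True


cache = NodeCache(num_lines=4)
cache.load(1, 2)
cache.load(2, 3)
cache.mark_verified(2)
cache.mark_verified(3)
readable = cache.can_read(2)
writable = cache.can_write(2)
```

A write changes every hash on the path to the root, which is why the whole path must be present and verified before the update.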