Recovering Device Drivers
Michael M. Swift, Muthukaruppan Annamalai, Brian N. Bershad, and Henry M. Levy
University of Washington, USA
Symposium on Operating Systems Design and Implementation, 2004
2016-05-16, Daecheon Kim, HeeJin Kim, Mincheol Cha
Networking Laboratory, Sungkyunkwan University
Copyright 2000-2016 Networking Laboratory


Contents

¢ Introduction

¢ Shadow driver

¢ Evaluation

¢ Conclusion

Introduction

¢ Most system failures are caused by device driver failures.
► Reducing driver failures improves overall system reliability.

¢ In existing failure-isolation systems, the kernel is protected when a driver fails, but applications still see errors.
► The failed driver is not kept loaded in the kernel; the driver is returned to its initial state.
► Application state attached to the failed driver is lost.
► Incorrect values are passed to running applications.

¢ Proposal: a mechanism that keeps connected applications unaffected even when a device driver fails
► Shadow Drivers

Why do device driver failures occur?

¢ Most device driver failures are caused by unexpected inputs or events.
► Deterministic
« Triggered by I/O requests or configuration settings
« Cannot be recovered by generic tools
► Transient
« Occurs when additional input arrives from the device
► Fail-stop
« The failed driver is stopped before it can affect the OS, devices, or applications

Shadow Drivers (1/6): What is a shadow driver?

¢ Shadow driver
► Uses taps.
► Monitors communication between the kernel and the driver.
► When a driver fails, the shadow driver temporarily plays the role of the failed driver while recovering it.

¢ Three services are needed for the implementation
► Isolation service
► Redirection mechanism
► Object tracking service
→ Nooks is used

Shadow Drivers (2/6): Shadow drivers

¢ Passive mode
► Normal operating mode
► Monitors communication between the kernel and the device driver and saves a copy
► Records driver configuration information

Shadow Drivers (3/6): Shadow drivers

¢ Active mode
► Operating mode when the driver has failed
► Plays the role of the failed driver, intercepting calls from the kernel
► Impersonates the kernel in order to recover the failed driver
« Intercepts calls from the driver so that the driver can restart

Shadow Drivers (4/6): Shadow driver operation

¢ Passive mode operation
► I/O requests
« Connection-oriented driver: saves the state of each active connection
« Request-oriented driver: saves a log of the commands that are queued and pending
► Monitors communication between the kernel and the device driver and saves a copy
« Data exchanged in real time is not copied
► Records driver configuration and parameter values
« Only the persistent log is kept

Shadow Drivers (5/6): Shadow driver operation

¢ Active mode operation
► Stops the failed driver
► Reinitializes the driver
► Restores the driver to its state before the failure
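The passive and active modes described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the Nooks or shadow driver implementation: `FlakyDriver`, `ShadowDriver`, and every method name are hypothetical, the "driver" is a Python object, and a failure is simulated with an exception.

```python
from collections import deque


class FlakyDriver:
    """Toy driver (hypothetical): keeps configuration, fails on the request "bad"."""

    def __init__(self):
        self.config = {}

    def configure(self, key, value):
        self.config[key] = value

    def handle(self, request):
        if request == "bad":
            raise RuntimeError("simulated driver failure")
        return f"done:{request}"


class ShadowDriver:
    """Toy tap sitting between the 'kernel' side and the driver (hypothetical API)."""

    def __init__(self, driver_factory):
        self.driver_factory = driver_factory
        self.driver = driver_factory()
        self.config_log = []    # passive mode: configuration calls seen so far
        self.pending = deque()  # passive mode: requests queued but not completed
        self.recoveries = 0

    def configure(self, key, value):
        # Passive mode: record the configuration call, then pass it through.
        self.config_log.append((key, value))
        self.driver.configure(key, value)

    def submit(self, request):
        # Passive mode: remember the outstanding request.
        self.pending.append(request)

    def run(self):
        results = []
        while self.pending:
            request = self.pending[0]
            try:
                results.append(self.driver.handle(request))
            except RuntimeError:
                # Active mode: hide the failure from the caller and recover.
                self._recover()
                results.append(None)  # this sketch drops the failing request
            self.pending.popleft()
        return results

    def _recover(self):
        self.recoveries += 1
        self.driver = self.driver_factory()   # stop and reinitialize the driver
        for key, value in self.config_log:    # replay the log: pre-failure state
            self.driver.configure(key, value)


shadow = ShadowDriver(FlakyDriver)
shadow.configure("mtu", 1500)
for request in ["a", "bad", "b"]:
    shadow.submit(request)
print(shadow.run(), shadow.driver.config)
```

Requests queued behind the failing one are still served by the restarted driver, and the replayed log brings its configuration back, which is the behavior the slides attribute to active mode.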

Shadow Drivers (6/6): Driver recovery system architecture

Evaluation (1/4)

¢ Test environment

[Drivers used] [Applications connected to the drivers]

Evaluation (2/4): Performance

► Compares the overhead of the existing failure-isolation environment with that of the environment with shadow drivers added

Evaluation (3/4): Fault tolerance

► Compares whether the connected applications also fail when a driver fails

Evaluation (4/4): Limitations

► Compares how often non-fail-stop failures occur

Conclusion (1/3)

¢ Rx
► On a software failure, rolls back to a checkpoint and re-executes
► Recovers programs from many kinds of bugs
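The rollback-and-re-execute idea can be sketched in a few lines. This is a toy illustration, not Rx itself: `run_with_rollback`, the dictionary-based state, and the `padding` setting are hypothetical, and retrying under a changed environment is our reading of how Rx gets a different outcome on re-execution.

```python
import copy


def run_with_rollback(state, step, environments):
    """Run step(state, env); on failure, roll back and retry with the next env."""
    checkpoint = copy.deepcopy(state)           # take a checkpoint
    for env in environments:
        try:
            step(state, env)
            return state, env                   # this execution succeeded
        except Exception:
            state = copy.deepcopy(checkpoint)   # discard partial side effects
    raise RuntimeError("all re-executions failed")


def step(state, env):
    state["log"].append("started")              # side effect before the failure
    if env["padding"] < 8:                      # hypothetical environment knob
        raise MemoryError("simulated buffer overflow")
    state["log"].append("finished")


final_state, used_env = run_with_rollback(
    {"log": []}, step, [{"padding": 0}, {"padding": 16}]
)
print(final_state, used_env)
```

The first attempt leaves a partial side effect and fails; the rollback throws that away, so the second attempt starts from the checkpointed state.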

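Conclusion (2/3)

¢ Shadow driver limitations
► Cannot detect errors in the data exchanged between the driver and the kernel
► Cannot detect invalid parameter values
► Many of the wrappers are written by hand, so they may themselves contain faults
► Cannot prevent the driver from being able to write to all of memory

→ Even so, it is useful for improving kernel reliability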

Conclusion (3/3)

¢ Today..
► With HAL in use, drivers still have the highest number of faults, but Arch has the highest fault rate
► Fault patterns change from study to study, so periodic source code studies are needed to set research priorities
► A tool that can find every kind of fault that can occur still needs to be developed

IX: A Protected Dataplane Operating System for High Throughput and Low Latency
2016-05-16, Mincheol Cha / Daecheon Kim / HeeJin Kim
Sungkyunkwan University


Contents

¢ Introduction

¢ Terminology

¢ IX

¢ Evaluation

¢ Discussion & Conclusion

Introduction: Datacenter application problems (1/2)

¢ Microsecond tail latency
► Needed to allow rich interactions among a large number of services
► Latency distribution of RPC requests
► Each service node should ideally keep the 99th-percentile request latency tightly bounded

¢ High packet rates
► The requests and replies that make up a datacenter application are smaller than one might expect
► An application suffers if it cannot manage large connection counts
► High packet rates must be sustained under a large number of concurrent connections and high connection churn
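To make the tail latency requirement concrete, the sketch below computes a nearest-rank percentile and shows why the 99th percentile matters when one user request fans out to many services. The sample latencies are made up, and the fan-out formula assumes the leaf requests are independent; neither comes from the slides.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def prob_any_slow(fanout, quantile=0.99):
    """Chance that at least one of `fanout` independent requests is in the slow tail."""
    return 1 - quantile ** fanout


# Made-up latencies in microseconds: mostly fast, with two slow outliers.
latencies_us = [100] * 98 + [150, 5000]
print("p50:", percentile(latencies_us, 50))
print("p99:", percentile(latencies_us, 99))
print("P(any slow) with fan-out 100:", round(prob_any_slow(100), 3))
```

Under the independence assumption, a request that touches 100 nodes meets at least one slower-than-p99 response most of the time, which is why each node's tail, not its median, drives end-to-end latency.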

Introduction: Datacenter application problems (2/2)

¢ Protection
► Multiple services share a server, so isolation between applications is required
► Kernel-based or hypervisor-based networking stacks are the usual answer to this protection problem

¢ Resource efficiency
► The load on datacenter applications varies considerably because of diurnal patterns and user traffic
► Each service node should meet its packet rate and tail latency requirements with as few resources as possible

Introduction: The Hardware - OS and hardware mismatch

¢ A wealth of hardware resources, but a poorly matched OS
► Low latency and high packet rates are required
► An OS that can use the hardware resources efficiently needs to be developed

Introduction: The Hardware - Alternative approaches

¢ User-space networking stacks
► Move the networking stack into user space to reduce kernel overhead
► Reduce kernel complexity to optimize packet processing
► Trade off packet rate against latency

¢ TCP alternatives
► Use UDP
► Reliable communication is then handled at user level

¢ POSIX API alternatives
► MegaPipe: reduces software overhead and increases packet rate
► Still has the problems of a kernel-based networking stack

¢ OS enhancements
► Tuning kernel-based stacks is good for ease of deployment
► Not optimized for epoll

Terminology: Modern system (1/3)

¢ Zero-copy API: reduces copies across context switches and sends data directly to the network

Terminology: Modern system (2/3)

¢ mTCP: a high-performance user-level TCP stack for multicore systems

Terminology: Modern system (3/3)

¢ Hierarchical ring (protection ring): protects data and functionality from faults and malicious behavior

IX: Design of IX

¢ Separate the control plane and the dataplane
► Control plane: resource configuration, provisioning, and so on (simple kernel mode)
► Dataplane: runs the networking stack and the application logic

¢ Run to completion (RTC) with adaptive batching
► Data and instruction cache locality

¢ Zero-copy API with explicit flow control

¢ Flow-consistent, synchronization-free processing
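Run to completion with adaptive batching can be sketched as a loop that takes whatever is already queued, up to a cap, and pushes that whole batch through every stage before polling again. This is a toy model, not IX code: `dataplane_loop`, `process_batch`, and the stage functions are hypothetical, and both the cap of 64 and the "never wait for more packets" rule are assumptions of this sketch.

```python
from collections import deque

MAX_BATCH = 64  # assumed upper bound on the batch size


def process_batch(batch, stages):
    """Run the whole batch through one stage at a time (better cache locality)."""
    for stage in stages:
        batch = [stage(item) for item in batch]
    return batch


def dataplane_loop(rx_queue, stages, max_batch=MAX_BATCH):
    """Drain rx_queue to completion; batch size adapts to the current backlog."""
    sent, batch_sizes = [], []
    while rx_queue:
        n = min(len(rx_queue), max_batch)   # never wait for the batch to fill
        batch = [rx_queue.popleft() for _ in range(n)]
        sent.extend(process_batch(batch, stages))
        batch_sizes.append(n)
    return sent, batch_sizes


stages = [lambda pkt: pkt + 1, lambda pkt: pkt * 2]   # stand-ins for real stages
print(dataplane_loop(deque(range(150)), stages)[1])   # backlog: large batches
print(dataplane_loop(deque([7]), stages)[1])          # light load: batch of one
```

Under light load the batch degenerates to a single packet, so batching adds no waiting latency; under a backlog the batches grow and amortize per-batch costs.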

IX: IX implementation

¢ The IX dataplane operating system
► Separation of the control plane and the dataplane
► Run to completion (RTC) with adaptive batching (network stack)

[Figure labels: Dataplane, Control plane]

IX: IX parts, hierarchical rings

¢ Rings in IX
► VMX root ring 0: full Linux kernel with Dune
« Similar to the hypervisor in a virtualized system
« Dune lets the dataplane run as a special-purpose OS in VMX non-root ring 0 (like a guest kernel in a virtualized system)
► VMX non-root ring 3: applications (App, App)
► What this arrangement provides
1. Direct dataplane access to page tables, exceptions, and so on
2. Dataplane access to the NIC
3. Full 3-way protection among the control plane, the dataplanes, and untrusted app code

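IX: IX parts, control plane

¢ The control plane in IX

IX control plane = full Linux kernel + IXCP (user-level program)

► Full Linux kernel
1. Initializes PCIe devices (LAN cards and so on)
2. Provides the basic mechanisms for resource allocation (cores, memory, network)
► IXCP
1. Monitors resource usage and dataplane operation
2. Implements the resource allocation policy
« Requires understanding the difficult tradeoff between the dataplane and energy proportionality
« Requires understanding the co-located apps (load that changes over time)
► Control plane to dataplane: resources are allocated in a coarse-grained manner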

IX: IX parts, dataplane (1/4)

¢ Threads in the dataplane

IX dataplane
1. High-performance network I/O
2. Runs a single application (similar to a library OS), with memory isolation
3. Provides a number of kernel-level services

► Supports a single, multithreaded application; a single-address-space OS
► Thread support
« Elastic threads: run the dataplane and consume network I/O
« Background threads
« Both thread types can issue POSIX system calls
► Elastic threads
« Do not make blocking calls (the resulting delay in packet processing would hurt network behavior)
« Have exclusive use of a core or hardware thread allocated to the dataplane, for high performance with predictable latency
► Background threads
« Share an allocated hardware thread
► The number of elastic threads can be adjusted as the control plane allocates more resources (similar to an exokernel)

IX: IX parts, dataplane (2/4)

¢ Memory management in the dataplane

► Each hardware thread allocates data objects from memory pools
► Used on the hot path
1. Organized as arrays of identically sized objects
2. Provisioned in page-sized blocks
► Benefits
1. Lower complexity of internal memory management
2. Better memory efficiency
► Unallocated (free) objects are tracked in a list, and the allocation routines hand one out immediately
► A single address space is maintained
► Kernel pages are protected by the supervisor bit
► Swappable memory is not supported (to avoid performance variability)
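A fixed-size object pool with a free list, as described above, might look like the following. This is a sketch only: `MemoryPool` and its methods are hypothetical names, the 4096-byte page size is illustrative, and a real dataplane would hand out raw memory rather than Python `memoryview` slices.

```python
class MemoryPool:
    """Pool of identically sized objects carved out of page-sized blocks."""

    def __init__(self, object_size, page_size=4096, pages=1):
        self.object_size = object_size
        per_page = page_size // object_size
        self.capacity = per_page * pages
        self.storage = bytearray(page_size * pages)
        # Free list: byte offsets of every slot that is not currently allocated.
        self.free = [
            page * page_size + slot * object_size
            for page in range(pages)
            for slot in range(per_page)
        ]

    def alloc(self):
        """Pop a free slot: constant time, no searching, no size classes."""
        if not self.free:
            raise MemoryError("pool exhausted")
        return self.free.pop()

    def release(self, offset):
        """Return a slot to the free list."""
        self.free.append(offset)

    def view(self, offset):
        """Writable view of one object's bytes."""
        return memoryview(self.storage)[offset:offset + self.object_size]


pool = MemoryPool(object_size=1024, pages=2)   # one pool per hardware thread
first = pool.alloc()
pool.view(first)[:5] = b"hello"
pool.release(first)
print(pool.capacity, len(pool.free))
```

Because every object has the same size, allocation and release are a single list operation each, and a pool owned by one hardware thread needs no locking.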

IX: IX parts, dataplane (3/4)

¢ Remaining dataplane functions

¢ Virtual address translation is managed with nested paging
► Unlike contemporary OSes, uses 2MB large pages exclusively
► Reasons
1. Lower address translation overhead
2. Physical memory is relatively abundant in modern servers

¢ Mbuf: used for packet transmission

¢ Hierarchical timing wheel: implements the timers needed, for example, to retransmit when a TCP transmission fails
► Supports high-resolution timeouts (improves performance under TCP incast congestion)
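A hierarchical timing wheel can be sketched with two levels: a fine wheel with one tick per slot and a coarse wheel whose slots each cover one full turn of the fine wheel. This is a toy model, not the IX timer code: `TimingWheel`, the two-level layout, and the slot count of 8 are assumptions chosen to keep the example short.

```python
class TimingWheel:
    """Two-level timing wheel; delays are in ticks, 1 <= delay < slots * slots."""

    def __init__(self, slots=8):
        self.slots = slots
        self.now = 0
        self.fine = [[] for _ in range(slots)]     # one tick per slot
        self.coarse = [[] for _ in range(slots)]   # `slots` ticks per slot

    def add(self, delay, callback):
        if not 1 <= delay < self.slots * self.slots:
            raise ValueError("delay out of range for this wheel")
        self._insert(self.now + delay, callback)

    def _insert(self, expiry, callback):
        if expiry - self.now < self.slots:
            self.fine[expiry % self.slots].append((expiry, callback))
        else:
            index = (expiry // self.slots) % self.slots
            self.coarse[index].append((expiry, callback))

    def tick(self):
        """Advance time by one tick and fire the timers that are now due."""
        self.now += 1
        if self.now % self.slots == 0:
            # The fine wheel wrapped: cascade one coarse slot down into it.
            index = (self.now // self.slots) % self.slots
            pending, self.coarse[index] = self.coarse[index], []
            for expiry, callback in pending:
                self._insert(expiry, callback)
        slot = self.now % self.slots
        due, self.fine[slot] = self.fine[slot], []
        for _, callback in due:
            callback()


wheel = TimingWheel(slots=8)
fired = []
for delay in (3, 20, 8):   # e.g. retransmission timeouts, in ticks
    wheel.add(delay, lambda d=delay: fired.append((wheel.now, d)))
for _ in range(25):
    wheel.tick()
print(fired)
```

Adding a timer and advancing a tick both touch only one slot, so the cost stays small even with many outstanding timers, which is what makes short, high-resolution timeouts affordable.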

IX: IX parts, dataplane (4/4)

¢ The dataplane today
► Uses Dune and VT-x virtualization
► Can be ported to ARM, SPARC, Power, and other architectures with virtualization support
► Uses Intel 82599 chipset NICs (support for additional drivers is easy to add)
► Consists of 39K SLOC
► RFC-compliant: also supports UDP, ARP, and ICMP
► Uses lwIP: originally for embedded systems (memory-efficient), modified for multicore scalability and fine-grained timer management
► Code breakdown: DPDK (Intel NIC device driver) 41%, lwIP 26%, Dune library 16%, new code 7K SLOC
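The code breakdown can be sanity-checked with a little arithmetic. The sketch assumes that the new code is exactly what remains after the three reused code bases, which the slide does not state outright.

```python
total_sloc = 39_000
reused_percent = {"DPDK NIC driver": 41, "lwIP": 26, "Dune library": 16}

# Assumption: whatever is not reused is the new code.
remaining_percent = 100 - sum(reused_percent.values())
remaining_sloc = total_sloc * remaining_percent // 100

print(remaining_percent, remaining_sloc)
```

The remainder comes to about 6.6K lines, which rounds to the 7K SLOC of new code given on the slide.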

IX: Structure of dataplane batched system calls

¢ The IX dataplane system call and event condition API
► Elastic threads issue batched system calls and consume event conditions
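The batched system call pattern can be sketched as follows: the application thread queues calls locally, crosses into the dataplane once per batch, and gets back a list of event conditions. `ToyDataplane`, `ElasticThread`, `run_io`, and the `"send"` / `"sent"` names are placeholders for this illustration and are not the IX API.

```python
class ToyDataplane:
    """Stand-in dataplane that takes a whole batch of calls per transition."""

    def __init__(self):
        self.transitions = 0   # how many times the app crossed into the dataplane
        self.sent = []

    def run_io(self, batched_calls):
        self.transitions += 1
        events = []
        for name, args in batched_calls:
            if name == "send":
                handle, data = args
                self.sent.append((handle, data))
                events.append(("sent", handle, len(data)))
            else:
                events.append(("error", name))
        return events


class ElasticThread:
    """Queues calls locally and issues them as one batch."""

    def __init__(self, dataplane):
        self.dataplane = dataplane
        self.batch = []

    def queue_call(self, name, *args):
        self.batch.append((name, args))

    def poll(self):
        """One transition: submit the queued calls, receive event conditions."""
        calls, self.batch = self.batch, []
        return self.dataplane.run_io(calls)


dataplane = ToyDataplane()
thread = ElasticThread(dataplane)
thread.queue_call("send", 1, b"hello")
thread.queue_call("send", 2, b"hi")
print(thread.poll(), dataplane.transitions)
```

Two calls cost a single transition here, which is the point of batching: the fixed cost of crossing the protection boundary is shared across the batch.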

Evaluation: Experimental methodology and general evaluation

¢ NetPIPE performance: varying message sizes and system software configuration
¢ Comparison of IX with Linux and mTCP

SPEC.
► Quanta/Cumulus 48x10GbE switch
► Mix of Xeon E5-2637
► Xeon E5-2665 server
► 256GB of DRAM
► 8 cores and 16 hyperthreads
► Intel x520 10GbE NICs

Evaluation: Latency and single-flow bandwidth (1/4)

¢ Multicore scalability for short connections

Evaluation: Latency and single-flow bandwidth (2/4)

¢ N round-trips per connection

Evaluation: Latency and single-flow bandwidth (3/4)

¢ Different message sizes s (n=1)

Evaluation: Latency and single-flow bandwidth (4/4)

¢ Connection scalability

Evaluation: Throughput and scalability

¢ Two memcached workloads

Discussion & Conclusion: Discussion

¢ What makes IX fast?

¢ Difficulties of adaptive batching

¢ Limitations of the current prototype

Discussion & Conclusion: Discussion

¢ IX vs Arrakis
► Arrakis: like IX, builds a per-application network stack (without the usual batch handling)

[Figure labels: IX, Arrakis]

Discussion & Conclusion: Conclusion

¢ Is IX really a good OS?

¢ Resource constrained issue

¢ Potential uses of IX