Simple Distributed Programming for
Scalable Software-Defined Networks
by
Soheil Hassas Yeganeh
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
Copyright © 2015 by Soheil Hassas Yeganeh
Abstract
Simple Distributed Programming for
Scalable Software-Defined Networks
Soheil Hassas Yeganeh
Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
2015
Software-Defined Networking (SDN) aims at simplifying the process of network programming
by decoupling the control and data planes. This is often exemplified by implementing
complicated control logic as a simple control application deployed on a centralized controller.
In practice, for scalability and resilience, such simple control applications are implemented
as complex distributed applications that demand significant effort to develop, measure, and
optimize. Such complexities obviate the benefits of SDN.
In this thesis, we present two distributed control platforms, Kandoo and Beehive, that are
straightforward to program. Kandoo is a two-layer control plane: (i) the bottom layer is a colony
of controllers with only local state, and (ii) the top layer is a logically centralized controller
that maintains the network-wide state. Controllers at the bottom layer run only local control
applications (i.e., applications that can function using the state of a single switch) near the datapath.
These controllers handle frequent events and shield the top layer. Using Kandoo, programmers
flag their applications as either local or non-local, and the offloading process is seamlessly
handled by the platform. Our evaluations show that a network controlled by Kandoo has an
order of magnitude lower control channel consumption compared to centralized controllers.
We demonstrate that Kandoo can be used to adapt and optimize the Transmission Control Protocol
(TCP) in a large-scale High-Performance Computing (HPC) datacenter with a low network
overhead.
Beehive is a more generic proposal built around a programming model that is similar
to centralized controllers, yet enables the platform to automatically infer how applications
maintain their state and depend on one another. Using this programming model, the platform
automatically generates the distributed version of each control application. With runtime
instrumentation, the platform dynamically and seamlessly migrates applications among
controllers aiming to optimize the control plane. Beehive also provides feedback to identify design
bottlenecks in control applications, helping developers enhance the performance of the control
plane. Implementing a distributed controller and a simple routing algorithm using Beehive,
we demonstrate that Beehive is as simple to use as centralized controllers and can scale on
par with existing distributed controllers. Moreover, we demonstrate that Beehive automatically
improves the control plane's latency by optimizing the placement of control applications without
any intervention by network operators.
Dedication
To Mom, Dad, and Shabnam.
Acknowledgements
Above all, I would like to sincerely thank my advisor, Yashar Ganjali, for his support, for his dedication, and for being an amazing source of encouragement and inspiration. I consider myself very fortunate to have him as my supervisor, mentor, and friend. I am also grateful to the members of my supervisory committee, Eyal de Lara and David Lie, for their continuous guidance, and to my committee members Nick Feamster and Angela Demke Brown for their insightful and invaluable feedback on my research.
I would also like to thank Renée Miller for her strong support and continuous research advice. I cannot go without mentioning my friends and co-authors Oktie Hassanzadeh, Kaveh Aasaraai, and Ken Pu, who have been extremely helpful during my five years at the University of Toronto. I am also grateful to the SciNet team, especially Chris Loken, Daniel Gruner, and Ching-Hsing Yu, for trusting and supporting me during my experiments.
This thesis is dedicated to my wife and my parents for their unconditional love, constant encouragement, and support.
Contents
1 Introduction 1
1.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Thesis Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Dissertation Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 Background 17
2.1 Precursors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.1 ForCES: An initial proposal . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 4D: Scalable Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.3 Ethane: Centralized Authentication . . . . . . . . . . . . . . . . . . . . . . 21
2.1.4 OpenFlow: Convergence of Proposals . . . . . . . . . . . . . . . . . . . . 22
2.2 Forwarding Abstractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Control Plane Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.1 In-Data-Plane Proposals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.2 Distributed Control Planes . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Resilience to Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Programming Abstractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6 Correctness and Consistency Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.7 Monitoring and Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.8 Network Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3 Kandoo 44
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Design and Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4 Beehive 59
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Programming Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3 Control Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.3.1 Primitives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.3.2 Fault-Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.3 Live Migration of Bees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.4 Runtime Instrumentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.5 Automatic Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3.6 Application Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.6 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5 Use Cases 91
5.1 OpenTCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.1.1 OpenTCP Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.1.2 OpenTCP Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.1.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 Beehive SDN Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.3 Simple Routing in Beehive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6 Conclusion and Future Directions 117
Bibliography 120
List of Tables
4.1 Sample use-cases implemented in Beehive and their code size . . . . . . . . . . . 84
5.1 The approximate OpenTCP overhead for a network with ∼4000 nodes and ∼100
switches for different Oracle refresh cycles. . . . . . . . . . . . . . . . . . . . . . . 106
List of Figures
1.1 Software-Defined Networking: A generic scheme. . . . . . . . . . . . . . . . . . . 2
1.2 Using an eventually consistent state, two controllers can steer traffic (e.g., the
green and the red flows) onto the same link, which results in an unnecessary
oversubscription in the network. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 In a distributed routing application that tries to avoid loops in a network, we
need coordination and leader election mechanisms to avoid service disruption.
Without such mechanisms, an eventually consistent state may lead to faults in
the network. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 When a virtual network migrates, the control plane can migrate the respective
virtual networking application instance. Without fine-grained control on the
external datastore, however, such a migration is ineffective. . . . . . . . . . . . . 7
1.5 Kandoo has two layers of controllers: Local controllers run local applications
and handle frequent events, while the root controller runs non-local applications
processing infrequent events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6 Beehive’s control platform is built based on the intuitive primitives of hives,
bees, and cells. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1 ForCES Components. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 4D Four Planes: (i) data, (ii) discovery, (iii) dissemination, and (iv) decision planes 20
2.3 OpenFlow Switch: An OpenFlow 1.1 switch stores multiple flow tables each
consisting of multiple flow-entries. . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 ONIX maintains the network-wide state in a distributed fashion. . . . . . . . . . 29
2.5 ONOS maintains the network-wide state using an external database system. . . 30
2.6 The five steps when converging upon a link failure. . . . . . . . . . . . . . . . . . 33
3.1 Kandoo’s Two Levels of Controllers. Local controllers handle frequent events,
while a logically centralized root controller handles rare events. . . . . . . . . . . 46
3.2 Toy example for Kandoo’s design: In this example, two hosts are connected using
a simple line topology. Each switch is controlled by one local Kandoo controller.
The root controller controls the local controllers. In this example, we have two
control applications: Appdetect is a local control application, but Appreroute is
non-local. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Kandoo’s high level architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Kandoo in a virtualized environment. For software switches, we can leverage the
same end-host for local controllers, and, for physical switches, we use separate
processing resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.5 Experiment Setup. We use a simple tree topology. Each switch is controlled
by one local Kandoo controller. The root Kandoo controller manages the local
controllers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.6 Control Plane Load for the Elephant Flow Detection Scenario. The load is based
on the number of elephant flows in the network. . . . . . . . . . . . . . . . . . . . 54
3.7 Control Plane Load for the Elephant Flow Detection Scenario. The load is based
on the number of nodes in the network. . . . . . . . . . . . . . . . . . . . . . . . . 55
4.1 A simple learning switch application in Beehive. . . . . . . . . . . . . . . . . . . . 62
4.2 The LearningSwitch application (Figure 4.1) learns the MAC address to port
mapping on each switch and stores them in the Switches dictionary. . . . . . . 64
4.3 This simple elephant flow routing application periodically collects flow
stats and accordingly reroutes traffic. The schematic view of this algorithm is
also portrayed in Figure 4.4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 The schematic view of the simple elephant flow routing application presented
in Figure 4.3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.5 In Beehive, each hive (i.e., controller) maintains the application state in the form
of cells (i.e., key-value pairs). Cells that must be colocated are exclusively owned
by one bee. Messages mapped to the same cell(s) are processed by the same bee. 68
4.6 Automatically generated Map functions based on the code in Figure 4.4. . . . . 69
4.7 Life of a message in Beehive. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.8 Throughput of a queen bee for creating new bees, using different batch sizes.
Here, we have used clusters of 1, 3, 5, 7, 9 and 11 hives. For larger batch sizes, the
throughput of a queen bee is almost the same for all clusters. . . . . . . . . . . . 72
4.9 Hives form a quorum on the assignment of cells to bees. Bees, on the other hand,
form quorums (i.e., colonies) to replicate their respective cells in a consistent
manner. The size of a bee colony depends on the replication factor of its
application. Here, all applications have a replication factor of 3. . . . . . . . . . . 75
4.10 The latency and throughput of the sample key-value store application: (a)
Throughput using replication factors of 1 to 5, and batch sizes of 128 to 8k. (b)
The PDF of the end-to-end latency for GET and PUT requests in a cluster of 20
hives and 80 clients. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.11 Beehive failover characteristics using different Raft election timeouts. In this
experiment, we use the key-value store sample application with a replication
factor of 3 on a 3-node cluster. Smaller election timeouts result in faster failover,
but they must be chosen to avoid spurious timeouts based on the network
characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.12 The PDF of a bee’s throughput in the sample key-value application, with and
without runtime instrumentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.13 Inter-hive traffic matrix of the elephant flow routing application (a) with
an ideal placement, and (b) when the placement is automatically optimized.
(c) Bandwidth usage of the control plane in (b). . . . . . . . . . . . . . . . . . . . 82
4.14 A centralized application maps all messages to the same cell. This results in
processing all messages in the same bee. . . . . . . . . . . . . . . . . . . . . . . . . 85
4.15 A local application maps all messages to a cell local to the hive (e.g., the switch’s
ID). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.16 To manage a graph of network objects, one can easily map all messages to the
node’s ID. This way nodes are spread among hives on the platform. . . . . . . . . 86
4.17 For virtual networks, messages of the same virtual network are naturally
processed in the same bee. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.1 Using Kandoo to implement OpenTCP: The Oracle, SDN-enabled switches, and
Congestion Control Agents are the three components of OpenTCP. . . . . . . . . 93
5.2 The Oracle collects network state and statistics through the SDN controller and
switches. Combining those with the CCP defined by the operator, the Oracle
generates CUEs which are sent to CCAs sitting on end-hosts. CCAs will adapt
TCP based on the CUEs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3 The topology of SciNet, the datacenter network used for deploying and
evaluating OpenTCP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.4 SciNet traffic characterization. The x-axis is in log scale. (a) CDF of flow sizes
excluding TCP/IP headers. (b) CDF of flow completion times (right). . . . . . . 99
5.5 The locality pattern as seen in the matrix of log(number of bytes) exchanged
between server pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.6 CDF of flow sizes in different benchmarks used in experiments. The x-axis is in
log scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.7 Comparing the CDF of FCT (a) and drop rate (b) of OpenTCP1, OpenTCP2,
and regular TCP while running the “all-to-all” IMB benchmark. . . . . . . . . . 102
5.8 Congestion window distribution for OpenTCP1 and unmodified TCP. . . . . . . 103
5.9 Comparing the CDF of FCT (a) and drop rate (b) of OpenTCP1, OpenTCP2,
and regular TCP while running the “Flash” benchmark. . . . . . . . . . . . . . . 105
5.10 Beehive’s SDN controller suite. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.11 Fault-tolerance characteristics of the control suite using a Raft election timeout of
300ms: (a) The default placement. (b) When the newly elected masters are
on the same hive. (c) When the new masters are on different hives. (d)
Reactive forwarding latency of (b). (e) Reactive forwarding latency of (c). (f)
Reactive forwarding latency of (c) which is improved by automatic placement
optimization at second 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.12 When a switch disconnects from its master driver: (a) One of the slave drivers
will become the master. (b) The reactive forwarding latency is optimized using
the placement heuristic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.13 Throughput of Beehive’s learning switch application and ONOS’s forwarding
application in different cluster sizes. The number of nodes (i.e., the x-axis) in
this graph represents the nodes on which we launch cbench. For Beehive, we
always use a 5-node cluster. For ONOS, the cluster size is equal to the replication
factor since all nodes in the cluster will participate in running the application. . 113
5.14 A distributed shortest path algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.15 Convergence time of the flat routing algorithm in a k-ary fat-tree topology. . . . 116
Chapter 1
Introduction
Computer networks have traditionally been designed based on a plethora of protocols implemented
by autonomous networking elements. In such a setting, controlling the network is
difficult and network programmability is limited, since these network protocols are focused
on specific functionalities implemented using specific mechanisms. As networks and their
applications evolve, the need for fine-grained control and network programming becomes more
pronounced. Without programmability, implementing new ideas (e.g., network virtualization
and isolation) and adapting to new changes (e.g., new routing mechanisms) are quite challenging
and tedious [88].
Software-Defined Networking (SDN) emerged with the goal of lowering the barrier
for innovation in computer networks. There are numerous SDN proposals, ranging from
ForCES [128] to 4D [49] to OpenFlow [88]. Although these proposals differ in scope,
moving the control functions out of data-path elements is the common denominator among
them. These proposals essentially decouple the control and data planes, and provide a standard API
through which the control plane can configure the data plane. In SDN, as shown in Figure 1.1,
data-path elements (such as routers and switches) are high-performance entities implementing
a set of forwarding primitives (mostly match-action forwarding rules). The control plane, on
the other hand, is a cohort of programs (i.e., the control applications) that collectively control
the data-path elements using the exposed forwarding primitives. These control applications are
developed in general-purpose programming languages and are hosted on top of a middleware
called the network controller. The network between the data plane and the control plane is called
the control channel, which can be either in-band (i.e., shared with the actual network flows) or
out-of-band (i.e., a dedicated and isolated network).
Figure 1.1: Software-Defined Networking: A generic scheme.
Advantages of SDN. The separation of control from data-path elements brings about numerous
advantages such as high flexibility, programmability, and the possibility of realizing a centralized
network view, with the promise of simplifying network programming. After all, it is significantly
easier to implement complex policies with arbitrarily large scopes on a decoupled, centralized
controller. For example, consider a simple shortest-path routing application. In traditional
networks, one needs to implement the shortest-path algorithm in a highly distributed,
heterogeneous environment and must define boilerplate including the serialization protocol,
the advertisement primitives, and the coordination mechanisms, in addition to the core of
the shortest-path routing algorithm. In contrast, on a centralized controller, such a routing
application boils down to finding shortest paths over the centralized network-wide view.
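To make this concrete, the following is a minimal sketch of such a centralized routing application: shortest paths are computed directly over the network-wide view with a breadth-first search, and none of the distributed boilerplate listed above is needed. The names (Topology, NodeID, ShortestPath) are illustrative and not taken from any particular controller.

```go
// Minimal sketch (illustrative names): shortest-path routing computed
// directly over the controller's network-wide view.
package main

import "fmt"

type NodeID string

// Topology is the centralized network-wide view: an adjacency list of switches.
type Topology map[NodeID][]NodeID

// ShortestPath runs a breadth-first search over the global view and returns
// the hop-by-hop path from src to dst, or nil if dst is unreachable.
func ShortestPath(topo Topology, src, dst NodeID) []NodeID {
	prev := map[NodeID]NodeID{src: src}
	queue := []NodeID{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			var path []NodeID
			for n := dst; ; n = prev[n] { // walk predecessors back to src
				path = append([]NodeID{n}, path...)
				if n == src {
					return path
				}
			}
		}
		for _, next := range topo[cur] {
			if _, seen := prev[next]; !seen {
				prev[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

func main() {
	// A small line topology: s1 - s2 - s3.
	topo := Topology{"s1": {"s2"}, "s2": {"s1", "s3"}, "s3": {"s2"}}
	fmt.Println(ShortestPath(topo, "s1", "s3")) // [s1 s2 s3]
}
```

A distributed version of the same application would additionally have to serialize and advertise link state and coordinate between instances, which is exactly the boilerplate the centralized view removes.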
Despite all its simplicity, centralization is not feasible in large-scale networks due to concerns
about scalability and resilience [60]. In such environments, network programmers need to
implement distributed control applications. As we explain next, implementing distributed
control applications is quite challenging without proper platform-level support.
1.1 Problem Statement
In large-scale networks, to overcome scalability and performance limitations, control applications
are usually implemented on distributed control platforms. Implementing a distributed
control application is clearly more difficult than implementing its centralized version. Some of
these complexities (e.g., synchronization and coordination) are common among all distributed
applications, whereas others stem from the application’s design (e.g., parallelization).
Existing distributed control platforms either (i) expose the complexities of distributed
programming to control applications (e.g., ONIX [75]), or (ii) delegate distributed state
management to external distributed systems (e.g., ONOS [17] and OpenDayLight [4]). In
such environments, designing a distributed control application requires significant effort
beyond the design and implementation of the application’s core: one needs to address timing,
consistency, synchronization, coordination, and other common complications of distributed
programming. More importantly, network programmers have to develop tools and techniques
to measure the performance of their control applications in order to identify design bottlenecks,
and need to manually optimize the control plane to come up with a scalable design. This directly
obviates the promised simplicity of SDN.
Issues of Exposing Complexities. The most notable distributed control platform is ONIX [75],
which is built using an eventually consistent representation of the network graph (i.e., the Network
Information Base). Providing eventual consistency greatly simplifies the control plane’s design,
as there is no need to deal with timing complications that can happen as a result of delays
and failures in the network. Yet, such a design simplification, admittedly, pushes a lot of
complexity into control applications and hinders the promised simplicity of SDN [75, 86].
Figure 1.2: Using an eventually consistent state, two controllers can steer traffic (e.g., the green and the red flows) onto the same link, which results in an unnecessary oversubscription in the network.
This becomes more pronounced when control applications need to implement coordination
and synchronization mechanisms that are mostly overlooked in existing control platforms.
For instance, consider a routing application designed to prevent link oversubscription by
steering flows along paths with sufficient vacant capacity, shown in Figure 1.2. This application
is simple to implement on a centralized controller, but it can easily oversubscribe a link when
deployed on a distributed controller using an eventually consistent state. For instance, due to
latency in updating the state and lack of proper synchronization, two distinct instances of the
application can route two elephant flows (e.g., the green and the red flows) to the same link,
leading to an unnecessary oversubscription. To overcome this issue, the network programmer
needs to implement or integrate complex coordination mechanisms, which requires more effort
than implementing the core of the routing functionality.
In addition to such suboptimalities, eventual consistency may result in invalid control.
For example, consider a naive minimum spanning tree (MST) routing application, depicted
in Figure 1.3, that tries to prevent loops by keeping one outgoing port up per neighbor
switch. When two neighbor switches are controlled by two physically separate controllers, each
Figure 1.3: In a distributed routing application that tries to avoid loops in a network, we need coordination and leader election mechanisms to avoid service disruption. Without such mechanisms, an eventually consistent state may lead to faults in the network.
controller immediately blocks a few ports, but will eventually notify the neighbor controller.
Without a proper coordination mechanism, this can result in obscure connectivity problems
even in extremely simple situations. This problem can be overcome by employing a leader
election mechanism, which inevitably complicates the control application’s logic.
Moreover, in scenarios where control applications require a subset of network events
(e.g., network events related to a specific tenant in a multi-tenant datacenter), it would be
futile to keep a copy of the complete network state in each controller instance. This is
especially important when network policies are naturally enforced using the state of a small
partition. For instance, to enforce isolation and to provide network statistics and billing for each
tenant, the control application needs the network events of that specific tenant in a consistent
way, not a holistic view of the entire network. This suboptimal behavior can potentially
become a scalability bottleneck by needlessly consuming resources. To prevent unnecessary
duplication of state, different instances of an application must explicitly specify what parts of
the network state they need to access. This requires proper partitioning that, if not facilitated by
the platform, can easily dwarf the main part of the control logic.
Issues of Delegating to External Datastores. An alternative design for a distributed control
plane is to delegate state management to an external distributed datastore. Although using
an external datastore can simplify distributed state management, it has several important
drawbacks. Delegating SDN state management to an external system is shown to have
considerable overheads [76]. Further, such an over-delegation of the control plane functionality
does not resolve the problems of distributed programming, such as concurrent updates and
consistency [123]. More importantly, it is the external datastore that dictates the architecture
of the control plane: if the datastore forms a ring, we cannot form an effective hierarchy in the
control plane.
ONOS [17] is a distributed control plane that delegates all the core functions of a distributed
control application to distributed state storage systems (e.g., ZooKeeper [64], Cassandra [77], or
RAMCloud [99]). This simplifies ONOS’s design but inherits all the complications of distributed
programming with one layer of indirection. In such a setting, network programmers need to
design and optimize their control application as well as the external distributed system, since the
logic is implemented in one environment and the state is stored in another system. As a result,
using an external system does not allow the network programmer to design the functions and the
respective state in concert. At the very least, one needs to design and measure two separate systems
with different characteristics and fundamentally different designs.
For instance, suppose we have a network managed by multiple controllers that store their
state in a distributed datastore. Despite sharing a common datastore (even if the datastore
is strongly consistent), the control applications on these controllers still need to coordinate
to correctly manage the network. For example, if they see packets of the same flow in
parallel (say, due to a multi-path route), they need to coordinate with one another to avoid
black holes and routing loops. In general, they need to synchronize on any decision that
might lead to conflicts. Such synchronization problems require orchestration facilities such as
Corybantic [90], Athens [15], and STN [23]. It is important to note that providing such facilities
in the control plane would eliminate the need for an external state management system.
Relying on an external datastore has strong performance implications. Consider a simple
learning switch application, as another example, that stores the state of each switch and forwards
Figure 1.4: When a virtual network migrates, the control plane can migrate the respective virtual networking application instance. Without fine-grained control on the external datastore, however, such a migration is ineffective.
packets accordingly. To store its state, this application will have to save the state of all switches on
the external distributed database that ONOS supports. This learning switch application should
be high-throughput, and the overheads of using an external system can be prohibitive for this
simple control application. This makes network programming unnecessarily complicated, as
one needs to not only make the learning switch application scale, but also to come up with a
scalable design for the external state management system.
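For concreteness, the following is a minimal sketch of such a learning switch; the names (SwitchID, PacketIn, handlePacketIn) are illustrative and not tied to any controller API. Its entire state is a per-switch MAC-to-port table, so every packet-in event is served from state that is naturally local to one switch, which is exactly the kind of access an external, network-wide datastore makes needlessly expensive.

```go
// Minimal learning-switch sketch (illustrative names, no real controller API):
// the only state is a per-switch MAC-address-to-port table.
package main

import "fmt"

type SwitchID string
type MAC string

// PacketIn models the event a switch sends for an unmatched packet.
type PacketIn struct {
	Switch SwitchID
	SrcMAC MAC
	DstMAC MAC
	InPort int
}

// macTables holds one MAC-to-port table per switch.
var macTables = map[SwitchID]map[MAC]int{}

// handlePacketIn learns where the source lives and either forwards the packet
// to the known output port or floods it when the destination is unknown.
func handlePacketIn(pkt PacketIn) (outPort int, flood bool) {
	table, ok := macTables[pkt.Switch]
	if !ok {
		table = map[MAC]int{}
		macTables[pkt.Switch] = table
	}
	table[pkt.SrcMAC] = pkt.InPort
	if port, known := table[pkt.DstMAC]; known {
		return port, false
	}
	return 0, true
}

func main() {
	fmt.Println(handlePacketIn(PacketIn{"s1", "aa", "bb", 1})) // 0 true  (flood)
	fmt.Println(handlePacketIn(PacketIn{"s1", "bb", "aa", 2})) // 1 false (forward)
}
```

Every packet-in requires one read and one write on this table; issuing those operations against a remote datastore on the critical path is what makes the overhead prohibitive for such a simple application.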
Using an external datastore also limits the control plane’s ability to self-optimize. For
example, consider an instance of a virtual networking application that controls the ports and
the flows of a virtual network in a virtualized environment. In such a setting, virtual machines
have mobility and can migrate. Thus, it is important that the control plane places the virtual
networking application instance close to the virtual network upon migration. As shown in
Figure 1.4, even if we implement an optimized placement mechanism for the control application,
the external state storage system may not move the state according to the application placement.
That necessitates changes in the external system, which is quite challenging.
Motivation. These simple examples demonstrate that, without a proper control plane abstraction,
control applications have to deal with practical complications of distributed systems,
which can be a huge burden if not impossible to deal with. Pushing such complications
(e.g., synchronization, coordination, and placement) into control applications clearly obviates
many benefits of SDN. A viable way to provide an easy-to-use network programming paradigm
is to hide these generic complications and boilerplates from control applications behind a
suitable abstraction. Such an abstraction must be simple and, more importantly, powerful
enough to express essential control functions.
1.2 Objectives
Simplicity and scalability are not mutually exclusive. Map-Reduce [35] and Spark [132] are
well-known programming models that hide the complexities of batch data processing. Storm [84]
and D-Stream [133] are successful examples that hide the boilerplate of distributed stream
processing. In essence, such systems find the sweet spot between what they hide from and what
they expose to the distributed applications. The key to finding a proper programming model is
an easy-to-use abstraction that is generic enough to cover all important use cases in its target
domain. We believe similar approaches are viable for SDN.
The main objective of this thesis is a new distributed programming platform that can simplify
network programming in SDN. By programming platform, we mean a composition of (i) a
programming model that provides the APIs to develop control applications and (ii) a distributed
runtime that automatically deploys, runs, and optimizes the applications written in the proposed
programming model.
Such a platform must have the following characteristics:
1. Simple: We aim at creating a programming paradigm that is as simple to program
as existing centralized controllers. That is, the effort required to develop a control
application (e.g., lines of code and cyclomatic complexity [85]) using the distributed
programming model should be almost the same as implementing it on a centralized
controller. More importantly, the effort required to instrument and measure the
distributed control application should be comparable to existing centralized controllers.
2. Generic: The programming model should be sufficiently expressive for essential control
functionalities. For example, one should be able to implement forwarding, routing, and
isolation using this programming model. Moreover, one should be able to implement and
emulate existing control platforms using the proposed programming model. This ensures
that the programming model is generic enough to accommodate existing proposals and
legacy control applications.
3. Scalable: We need a programming model that has no inherent scalability bottleneck.
It is important to note that the scalability of any programming model depends heavily
on the applications. For instance, one can create an inefficient and unscalable Map-
Reduce application. Such scalability bottlenecks that reside in applications cannot be
automatically resolved. Having said that, our objective is to propose a programming
paradigm that scales well given a control application design that is not inherently
centralized. In other words, our proposal by itself should not be a scalability bottleneck.
4. Autonomous: There are many commonalities among distributed control applications.
One of the most important examples of such commonalities is optimizing the placement of
control applications. Our goal is to develop an environment in which such commonalities
are automated with no programmer intervention. Having such an autonomous system
simplifies network programming and operation. More importantly, it results in an
elastic control plane that can automatically adapt to the network characteristics and the
workload.
1.3 Overview
We first present a comprehensive overview of existing SDN proposals and research. To gain
a better perspective on the topic of this dissertation, we start with earlier proposals from
academia and industry that emerged before SDN. Then, we present high-level
architectural proposals, as well as the latest research on the SDN control plane, SDN data plane, SDN
abstractions, and SDN tools.
In our background research, we deconstruct the main scalability concerns in SDN
and discuss how they compare with traditional networks. We argue that some of these scalability
concerns are not unique to SDN and coexist in traditional networks [60]. Moreover, we
explore scalability and performance bottlenecks (such as scarce control channels and resilience
to failures) in different settings for different control applications.
These bottlenecks reside in the data plane, in the control plane, or both; hence, there
are numerous places (e.g., the forwarding fabric, the controller, and the control applications)
to tackle them. We enumerate the different alternatives and trade-offs in the design
space to address the scalability and performance bottlenecks. Furthermore, we compare these
alternatives and analyze the pros and cons of each approach. This analysis forms the foundation
of our distributed control plane proposals: Kandoo and Beehive. Both of our proposals scale
well and are simple to program.
Kandoo. One of the main scalability bottlenecks of the control plane in SDN is the scarce
control channels that can be easily overwhelmed by frequent events. Even when one proactively
installs flow-entries, collecting flow statistics to implement adaptive networking algorithms
can overwhelm the control channels and consequently the control plane. Kandoo [57] is a
distributed programming model that tries to address this scalability bottleneck in environments
where computing resources are available close to the data-path elements (e.g., datacenters and
campus networks). In contrast to previous proposals such as DIFANE [131] and DevoFlow [34],
Kandoo limits the overhead of frequent events on the control plane without any modifications
to the data plane and without hindering the visibility of the control applications.
As shown in Figure 1.5, Kandoo has two layers of controllers: (i) the bottom layer is a group
of controllers with no interconnection and no knowledge of the network-wide state, and (ii) the
top layer is a logically centralized controller that maintains the network-wide state. Controllers
Figure 1.5: Kandoo has two layers of controllers: Local controllers run local applications and handle frequent events, while the root controller runs non-local applications processing infrequent events.
at the bottom layer run only local control applications (i.e., applications that can function using
the state of a single switch) close to the datapath elements. These controllers handle most of the
frequent events and effectively shield the top layer.
Local control applications have implicit parallelism and can be easily replicated since they
access only their local state. As such, Kandoo’s design enables network operators to replicate
local controllers on demand without any concerns about distributed state replication. These
highly replicated local controllers process frequent events and relieve the load on the top layer.
Our evaluations show that a network controlled by Kandoo has an order of magnitude lower
control channel consumption compared to centralized OpenFlow controllers. Moreover, the
only extra information we need from network programmers is a flag that states whether a control
application is local or non-local. As a result, control applications written in Kandoo are as simple
as their centralized versions.
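The sketch below illustrates this flag under stated assumptions; it is not Kandoo's actual API, only a hypothetical rendering of the idea that an application keeps the same handler shape as in a centralized controller and adds a single local/non-local declaration that drives where the platform deploys it.

```go
// Hypothetical sketch of the Kandoo idea (not Kandoo's actual API): the only
// extra information an application provides is whether it is local; the
// platform then decides where it runs.
package main

import "fmt"

type Event struct{ Switch, Kind string }

type App interface {
	IsLocal() bool  // the single flag the programmer adds
	Handle(e Event) // same handler shape as in a centralized controller
}

// appDetect inspects the statistics of a single switch: local.
type appDetect struct{}

func (appDetect) IsLocal() bool  { return true }
func (appDetect) Handle(e Event) { fmt.Println("detect:", e.Switch, e.Kind) }

// appReroute needs the network-wide state to pick new paths: non-local.
type appReroute struct{}

func (appReroute) IsLocal() bool  { return false }
func (appReroute) Handle(e Event) { fmt.Println("reroute:", e.Switch, e.Kind) }

// deploy places local applications on every local controller and non-local
// applications on the root controller (the actual offloading is elided).
func deploy(apps []App) {
	for _, a := range apps {
		if a.IsLocal() {
			fmt.Printf("%T -> every local controller\n", a)
		} else {
			fmt.Printf("%T -> the root controller\n", a)
		}
	}
}

func main() {
	deploy([]App{appDetect{}, appReroute{}})
}
```

Because local applications never read anything beyond the state of one switch, replicating them on demand, as described above, requires no distributed state replication.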
Beehive. Beehive [59], the second programming paradigm presented in this dissertation, is
more generic and more powerful than Kandoo. Beehive consists of a programming model
and a distributed control plane. Using Beehive’s programming model, network programmers
implement control applications as asynchronous message handlers storing data in their local
dictionaries (i.e., associative arrays or maps). Based on this programming model, Beehive’s
distributed control platform seamlessly partitions the dictionaries of each application, and
assigns each partition exclusively to one leader thread. Beehive’s partitioning mechanism
guarantees that each message can be processed solely using the entries in one partition. That
is, for each message, there is one and only one partition (and accordingly one thread
managing that partition) that is used to serve the message. To that end, Beehive automatically
infers the mapping between each message and the keys required to process that message, and
sends the message to the leader thread that exclusively owns those keys (i.e., the thread that
owns the partition containing those keys). It is important to note that all these functionalities
(e.g., inferring partitions, parallelization through threads, and the platform-wide coordination)
are internal to our control platform and are not exposed to network programmers.
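The following sketch illustrates this programming model; the names and signatures (Rcv, Map, Cell, Dicts) are hypothetical stand-ins that mirror the description above rather than the platform's actual API. The handler touches nothing but its dictionaries, and the cells a message maps to (which Beehive infers automatically) determine which partition, and hence which thread, serves it.

```go
// Hypothetical sketch of the Beehive programming model (illustrative names,
// not the platform's actual API): an asynchronous message handler that stores
// state in named dictionaries, plus the cells each message needs.
package main

import "fmt"

type Msg struct {
	Switch string
	MAC    string
	Port   int
}

// Cell identifies one key in one application dictionary.
type Cell struct{ Dict, Key string }

// Dicts is a stand-in for a bee's local state: named dictionaries of key-value pairs.
type Dicts map[string]map[string]interface{}

// Put writes a key-value pair into one of the named dictionaries.
func (d Dicts) Put(dict, key string, v interface{}) {
	if d[dict] == nil {
		d[dict] = map[string]interface{}{}
	}
	d[dict][key] = v
}

// LearningSwitch stores the MAC-to-port mapping of each switch.
type LearningSwitch struct{}

// Rcv is the asynchronous message handler; it only reads and writes dictionaries.
func (LearningSwitch) Rcv(m Msg, state Dicts) {
	state.Put("Switches", m.Switch+"/"+m.MAC, m.Port)
}

// Map declares (in Beehive, it is inferred from Rcv) the cells a message needs:
// all messages of one switch map to the same cell and hence to the same thread.
func (LearningSwitch) Map(m Msg) []Cell {
	return []Cell{{Dict: "Switches", Key: m.Switch}}
}

func main() {
	app, state := LearningSwitch{}, Dicts{}
	m := Msg{Switch: "s1", MAC: "aa:bb:cc:dd:ee:ff", Port: 3}
	fmt.Println("cells:", app.Map(m)) // cells: [{Switches s1}]
	app.Rcv(m, state)
	fmt.Println("state:", state)
}
```

Writing the handler this way is essentially what one would write for a centralized controller; the partitioning, parallelization, and coordination described above happen behind this interface.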
Beehive provides seamless fault-tolerance, runtime instrumentation, and optimized placement
of control applications without developer intervention. To provide such autonomous
functionalities and to remain generic, Beehive is not built on top of existing external datastores
(e.g., memcached [43] or RAMCloud [99]) since we need fine-grained control over all aspects
of the control plane, including but not limited to the placement of state and the replication
mechanism.
As depicted in Figure 1.6, Beehive’s control platform is internally built based on the intuitive
primitives of hives, cells, and bees. A hive is basically an instance of the controller running
on a physical machine, a container, or a virtual machine. A cell is an entry in the dictionary
of a control application. A bee, on the other hand, is a light-weight thread of execution that
exclusively manages its own cells and runs the message handlers of a control application. The
platform guarantees that each message is processed by the bee that exclusively owns the cells
required to process that message. The interplay of these primitives results in concurrent and
consistent state access in a distributed fashion.
Figure 1.6: Beehive’s control platform is built based on the intuitive primitives of hives, bees, and cells.
The Beehive control platform is fault-tolerant. Each bee forms a colony (i.e., a Raft [96]
quorum) with follower bees on other hives to consistently replicate its cells. Upon a failure, the
colony reelects its leader and continues to process messages in a consistent manner. Another
important capability of Beehive is runtime instrumentation of control applications. This enables
Beehive to automatically live migrate the bees among hives aiming to optimize the control plane.
The runtime instrumentation, as feedback provided to network programmers, can also be
used to identify bottlenecks in control applications, and can assist developers in enhancing their
designs.
It is important to note that Beehive hides many commonalities of distributed programming
(such as coordination, locking, concurrency, consistency, optimization and fault-tolerance),
while providing sufficient knobs for network programmers to implement a wide range of
application designs. In this dissertation, we demonstrate that Beehive is generic enough for
implementing existing distributed control platforms (including Kandoo) as well as important
control applications (such as routing and traffic engineering) with ease and at scale.
Use Cases. We present how important use cases can be implemented using Kandoo and Beehive.
We first present OpenTCP [46], our proposal to optimize and adapt TCP using the centralized
network-wide view available in SDN, and how one can use Kandoo to implement OpenTCP at
scale. Furthermore, we present a distributed SDN controller and a simple shortest-path routing
algorithm implemented using Beehive to showcase the expressiveness of our proposal. We
demonstrate that Beehive’s automatic optimizer can be advantageous for SDN applications in
real-world settings. Evaluating these use-cases, we demonstrate that Beehive and Kandoo are
simple to program and can scale.
1.4 Thesis Statement
We present expressive and extensible distributed programming paradigms for SDN that are easy to
use, straightforward to instrument, and autonomous in optimizing the control plane.
1.5 Contributions
This dissertation makes the following contributions:
1. We present Kandoo [57], a two-layer hierarchy of controllers that is as simple to
program as centralized controllers, yet scales considerably better by offloading local
control applications onto local computing resources close to data-path elements to process
frequent events. The key differentiation of Kandoo is that, unlike previous proposals such
as DIFANE [131] and DevoFlow [34], it requires no modifications to the forwarding
hardware or to the southbound protocols (e.g., OpenFlow [88]).
2. As a use case of Kandoo, we present OpenTCP [46], which can adapt and optimize
TCP by exploiting SDN’s network-wide view. By deploying OpenTCP in SciNet (the
largest academic HPC datacenter in Canada), we demonstrate that OpenTCP can achieve
considerable improvements in flow completion time with simple optimization policies, at
scale. Such a deployment is quite challenging using centralized controllers.
3. We propose Beehive, which provides a programming model (i.e., a set of APIs) that
is intentionally similar to centralized controllers to preserve simplicity. Applications
developed using this programming model are seamlessly transformed into distributed
applications by our distributed control platform. Using Beehive, we have implemented
our own SDN controller as well as existing distributed controllers such as Kandoo. To
demonstrate the generality of our proposal, we have implemented several non-SDN use
cases on top of Beehive including a key-value store, a message queue, and graph processing
algorithms. Using these SDN and non-SDN use cases, we demonstrate that our proposal
is simple to program, scales considerably well, can automatically tolerate faults, provides
low-overhead runtime instrumentation, and can optimize itself upon changes in the
workload.
To preserve simplicity, Beehive applications are written in a generic programming
language with an API similar to centralized controllers. To that end, our proposal does not
introduce a new domain-specific language (DSL), such as BOOM’s disorderly programming [13]
and Fabric’s DSL for secure storage and computation [80]. Moreover, unlike
existing distributed SDN controllers, such as ONIX [75] and ONOS [17], Beehive hides
the boilerplates of distributed programming and does not expose them to programmers.
Beehive is generic: it does not impose a specific distribution model (e.g., a hierarchy or
a ring), and is not tied to a specific control application (e.g., forwarding or routing) or a
specific southbound protocol (e.g., OpenFlow). As such, it can accommodate a variety of
designs for control applications, including hierarchies, rings and pub-sub. Similarly, its
self-optimization mechanism is generic and applies to all types of control applications, in
contrast to proposals such as ElasticCon [37], which has the specific goal of optimizing the
workload of switches among several controllers.
1.6 Dissertation Structure
This dissertation is organized as follows:
• In Chapter 2, we present an overview of the state-of-the-art SDN research from both
academia and industry. We present SDN precursors (which include the proposals
that led to what we know as SDN today), SDN proposals in the design space, SDN tools,
and novel applications of SDN. Moreover, we deconstruct challenges in SDN, and discuss
different alternatives to address these challenges.
• In Chapter 3, we present Kandoo. Kandoo is one of the two programming abstractions
presented in this dissertation. We explain Kandoo’s design philosophy in detail and
compare it with other alternative proposals in the same domain. Using a sample elephant
flow detection application, we demonstrate that Kandoo can result in a considerably more
scalable control plane.
• In Chapter 4, we present our second programming paradigm, Beehive. We present the
details of Beehive’s design, and demonstrate its programming model. Moreover, we
explain the internals of the Beehive distributed control plane and how it acts as a scalable
runtime for the proposed programming model. We also explain fault-tolerance, runtime
instrumentation, and automatic placement optimization in Beehive.
• We present important use cases of our proposals in Chapter 5. We first present OpenTCP,
and explain how it can be implemented using Kandoo (and consequently Beehive). Then,
we present and evaluate a real-world distributed controller and a distributed shortest path
algorithm implemented using Beehive.
• We summarize and conclude this dissertation in Chapter 6, and enumerate all the
potential directions to follow up this research in the future.
Chapter 2
Background
SDN is a broad area and has a vast body of research ranging from low-level hardware designs to
high-level abstractions to real-world network applications. Covering all SDN research is beyond
the scope of this dissertation. Thus, in this chapter, we limit the scope of our literature review
to fundamental SDN research and the state-of-the-art proposals related to the topic of this
dissertation. We start with precursors and important fundamental SDN abstractions. Then, we
enumerate and analyze proposals that address scalability, resilience, correctness and simplicity
concerns in SDN. Moreover, we present important measurement techniques and optimization
applications using SDN.
2.1 Precursors
Traditional networks are built as a fragile assemblage of networking protocols (i.e., slightly
less than 7000 RFCs), which are instrumental for building the Internet, but inessential
for orchestration within a single networking domain. These protocols often solve very
straightforward problems but are unnecessarily complicated since they are designed to operate
in highly heterogeneous infrastructures with no orchestration mechanism. For that reason, to
enforce a simple network policy (e.g., a simple isolation rule) in traditional networks, we either
have to use a set of complicated signaling protocols supported by all networking elements, or
build an overlay network to steer traffic through middleboxes.
There are various architectural proposals to address this inherent problem of traditional
network architectures, and SDN is the result of their convergence. These proposals, although
different in scope and mechanisms, agree upon the separation of control functions from the data
plane: data-path elements support addressing protocols and respective forwarding primitives,
and all other problems are tackled in the control plane, which communicates with data-path
elements using a standard API.¹
In this chapter, we present an overview of proposals which profoundly influence SDN’s
design principles.
2.1.1 ForCES: An Initial Proposal
In most operating systems, networking applications are designed as separate fast-path and
slow-path components. The former provides line-rate packet processing functionality, needs
to be extremely efficient, and is usually implemented within the kernel, whereas the latter
is user-space software that provides higher-level functionality and manages its
kernel-level counterparts via signaling mechanisms (e.g., NetLink sockets).
Routers and switches are internally designed the same way. Their fast path (i.e., the
forwarding plane) is usually implemented in hardware (using ASICs, programmable
hardware, or network processors) separated from the slow path (i.e., the controller), which is
usually implemented in software running on a general-purpose processor. Observing this
analogy², Forwarding and Control Element Separation (ForCES) [128] proposes to use a
standard signaling mechanism between the control and forwarding elements, which effectively
decouples the control plane from the forwarding elements.
As portrayed in Figure 2.1, ForCES has two types of networking entities: (i) Forwarding
Elements (FE) that forward packets, and (ii) Control Elements (CE) that control FEs. A
¹ It is important to note that traditional network equipment is implemented in a similar fashion, except for the fact that its control plane is highly coupled with the data plane and thus is inflexible.
² ForCES was proposed by the designers of Linux NetLink sockets.
(a) Forwarding Elements (FE) are configured as a set of interconnected Logical Function Blocks (LFB).
(b) Control Elements (CE) can associate with any Forwarding Element (FE), configure and query its Logical Function Blocks.
Figure 2.1: ForCES Components.
FE is configured as a topology of Logical Function Blocks (LFB) which define high-level
forwarding functions (Figure 2.1a). As shown in Figure 2.1b, each CE can associate with any
FE and configure, query, and subscribe to its LFBs. LFBs are designed to be fine-grained, and
thus almost all sorts of forwarding functionality can be implemented in ForCES’s FEs. This
advantage, however, is also the main disadvantage of ForCES: it makes the model too generic
and underspecified, and for that reason ForCES is not deployed in practice.
2.1.2 4D: Scalable Routing
RCP. Routing Control Platform (RCP) [22] aims at addressing scalability issues in traditional
routing protocols, specifically iBGP, by delegating routing decisions to a control platform.
To centralize routing decisions, RCP gathers internal network state (by monitoring link-state
advertisements from IGP protocols) as well as candidate routes to external networks (by
establishing iBGP connections to network routers). Utilizing this centralized information, RCP
selects and assigns routes for each router, essentially preventing protocol oscillation and
persistent forwarding loops. To be resilient to failures, RCP realizes multiple replicas and
guarantees that different replicas will have identical routing decisions in steady state.
4D. Extending RCP, 4D [49] proposes a clean-slate approach that goes beyond routing
protocols to all kinds of decision logic in networks. 4D has three major design objectives:
First, network requirements should be expressed as a set of network-level objectives that should
Chapter 2. Background 20
be independent of data plane elements, and control decisions should satisfy and optimize those
objectives. Second, decisions are made based on a timely, accurate network-wide view of the
data plane’s state. Consequently, routers and switches should provide the information required
to build such a view. Third, instead of implementing decision logic inside network protocols,
the data plane elements should only use the output of decision logic to operate. This ensures
direct control over the data plane elements. In other words, we can directly enforce decisions
and make sure they are obeyed by switches and routers.
To achieve its design objectives, 4D (as implied by its name) partitions network functionality
into four planes (Figure 2.2): (i) data, (ii) discovery, (iii) dissemination, and (iv) decision planes.
The decision plane makes all control decisions, such as load balancing, access control, and
routing. The data plane provides packet processing functions such as forwarding, filtering,
scheduling, and measurement, and handles packets according to the state enforced by the decision
plane. Discovering physical elements in the data plane is the responsibility of the discovery plane.
This plane merely discovers physical networking elements and represents them as logical entities
encapsulating their capabilities and adjacency information. The dissemination plane, on the
other hand, is a passive communication medium that relays decisions to the data plane, and
provides the discovered state to the decision plane.
Figure 2.2: 4D Four Planes: (i) data, (ii) discovery, (iii) dissemination, and (iv) decision planes
Tesseract. 4D is a generic architectural template and does not enforce any implementation
details for the planes, nor any protocols and APIs for communicating between them.
Tesseract [127] is an experimental implementation of 4D that embeds the four planes into
two components: (i) the Decision Element (DE) is the centralized entity that makes networking
decisions and has direct control over forwarding elements. DE is essentially an implementation
of 4D's decision plane that uses the dissemination service to communicate with switches, and is
replicated using a simple master-slave model to ensure resilience. (ii) The Switch is the component
residing in the forwarding elements that provides packet processing, discovery functions, and
the dissemination service endpoint. In essence, the dissemination plane is partly deployed on
switches and partly on DE to maintain a robust communication channel. Due to the lack of
standard protocols in 4D, Tesseract uses its own node configuration API to configure forwarding
elements. Evaluated in Emulab, Tesseract is shown to be resilient to failures and sufficiently
scalable to handle a thousand nodes.
2.1.3 Ethane: Centralized Authentication
Traditional networks “retrofit” security using auxiliary middleboxes and network processors.
This approach does not scale as it is not an integral part of the network. It is difficult for network
operators to maintain policies over time, and it is straightforward for attackers to find a way
through the maze of broken policies. To resolve this problem, Casado et al. proposed clean-
slate approaches, SANE [26] and then Ethane [25], incorporating a centralized authority to
authenticate flows in the network.
SANE. SANE [26] aims at transforming the permissive network into a secure infrastructure
using a centralized authorization mechanism. In SANE, a centralized Domain Controller (DC) is
responsible for authentication, service management, and authorization. The DC assigns capabilities
to authorized flows, and if a flow does not have a valid capability, switches drop it. Upon initiating
a new flow, end-hosts need to ask the DC for capability grants.
When capabilities are granted, packets are tagged with the capability and sent out from
the end-hosts.
Ethane. Ethane [25] extends SANE such that no modifications are required on end-hosts. In
contrast to SANE, Ethane switches have an initially empty flow table that holds active flow-
entries (i.e., forwarding rules). When a packet with no matching flow-entry arrives at a switch, it
is sent to the controller. Then, it is the controller's responsibility to either set up proper flow-
entries to forward the packet, or drop the packet. This ensures centralized policy enforcement
while minimizing the modifications on the end-hosts.
2.1.4 OpenFlow: Convergence of Proposals
Inspired by 4D, OpenFlow [88] generalizes Ethane for all network programming applications.
In contrast to 4D and other clean-slate designs, OpenFlow takes a pragmatic approach with
a few compromises on generality. OpenFlow adds a standard API between the data plane
and the control plane. In contrast to providing a fully programmable datapath, OpenFlow
imposes minimal changes to switches for programming their forwarding state (i.e., how
incoming packets are forwarded in a switch). This is to enable high-performance and low-cost
implementations in new hardware, as well as straightforward implementation of OpenFlow in
existing switches.
Flow-entries. OpenFlow models the forwarding state as a set of match-action rules, called
flow-entries, composed of a set of conditions over packet headers and a set of actions. The
conditions can match information such as the ingress port, MAC addresses, IPv4/v6 fields, TCP ports,
and the outermost VLAN/MPLS fields, to name a few. These conditions are designed to be easily
implementable in ternary content-addressable memories (TCAMs). Once all the conditions
match a packet, the set of respective actions is executed. These actions range from sending
packets out to a port to pushing/popping tags and QoS actions.
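To make the match-action model concrete, the following is a minimal sketch in plain Python (deliberately not tied to any particular OpenFlow library); field names such as `eth_type` and `tcp_dst`, and the `output` action, are illustrative stand-ins for OpenFlow match fields and actions:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical, simplified flow-entry: wildcarded match fields, a priority
# for disambiguation, a list of actions, and per-entry counters.
@dataclass
class FlowEntry:
    match: Dict[str, object]              # e.g., {"in_port": 1, "tcp_dst": 80}
    actions: List[Callable[[Dict[str, object]], None]]
    priority: int = 0
    packets: int = 0                       # counters, as in OpenFlow
    bytes: int = 0

    def matches(self, pkt: Dict[str, object]) -> bool:
        # A field absent from `match` acts as a wildcard.
        return all(pkt.get(k) == v for k, v in self.match.items())

def output(port: int) -> Callable[[Dict[str, object]], None]:
    return lambda pkt: print(f"forward packet to port {port}")

# Example: send TCP traffic destined to port 80 out of port 3.
entry = FlowEntry(match={"eth_type": 0x0800, "tcp_dst": 80},
                  actions=[output(3)], priority=10)
pkt = {"eth_type": 0x0800, "ipv4_dst": "10.0.0.2", "tcp_dst": 80, "len": 1500}
if entry.matches(pkt):
    entry.packets += 1
    entry.bytes += pkt["len"]
    for act in entry.actions:
        act(pkt)
```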
In addition to forwarding rules, OpenFlow adds minimal functionality for collecting
network statistics. Beyond generic per-port statistics, OpenFlow maintains aggregate
[Figure: an OpenFlow switch runs an OpenFlow agent (in software) in front of a hardware pipeline of flow tables, and talks to the controller and its control applications over the OpenFlow protocol on TCP or TLS. Each flow-entry consists of a rule (e.g., source/destination MAC, source/destination IP, TCP source/destination port), actions (e.g., forward/drop the packet, push/pop tags, send to the next table, enqueue the packet), and counters (number of packets, bytes, duration).]
Figure 2.3: OpenFlow Switch: An OpenFlow 1.1 switch stores multiple flow tables, each consisting of multiple flow-entries.
metrics (e.g., number of packets, size of packets, etc.) for each flow-entry. This uniquely enables
adaptive forwarding.
Flow-tables. In OpenFlow 1.0 (and prior versions), all flow-entries are stored in a single table.
Each flow-entry has a priority for disambiguation. The problem with this approach is space
explosion: we are storing the Cartesian product of all rules as flow-entries. In OpenFlow 1.1,
this problem is resolved by replacing the single table with a pipeline of flow-tables. In addition
to normal actions, each flow-entry can point to the next flow-table in the pipeline (note that no
back references are allowed). This process is presented in Figure 2.3.
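The pipeline semantics can be illustrated with a small sketch; the table contents and action strings below are hypothetical, and the sketch only captures the forward-only goto-table behavior described above:

```python
from typing import Dict, List, Optional

# Minimal sketch of an OpenFlow 1.1-style pipeline: each table holds
# prioritized entries, and an entry may "goto" a later table only, so
# packets never loop back in the pipeline.
class Entry:
    def __init__(self, match: Dict[str, object], priority: int,
                 actions: List[str], goto: Optional[int] = None):
        self.match, self.priority, self.actions, self.goto = match, priority, actions, goto

    def matches(self, pkt: Dict[str, object]) -> bool:
        return all(pkt.get(k) == v for k, v in self.match.items())

def process(tables: List[List[Entry]], pkt: Dict[str, object]) -> List[str]:
    actions, table_id = [], 0
    while table_id is not None and table_id < len(tables):
        candidates = [e for e in tables[table_id] if e.matches(pkt)]
        if not candidates:
            break                               # table miss
        best = max(candidates, key=lambda e: e.priority)
        actions.extend(best.actions)
        assert best.goto is None or best.goto > table_id, "no back references"
        table_id = best.goto                    # None terminates the pipeline
    return actions

tables = [
    [Entry({"vlan": 10}, 10, ["pop_vlan"], goto=1)],     # table 0: tag handling
    [Entry({"ipv4_dst": "10.0.0.2"}, 5, ["output:3"])],  # table 1: forwarding
]
print(process(tables, {"vlan": 10, "ipv4_dst": "10.0.0.2"}))  # ['pop_vlan', 'output:3']
```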
2.2 Forwarding Abstractions
Separating the control and data planes is the high-level design approach advocated by all
SDN proposals, but they differ considerably in how they draw the border between the
responsibilities of the control plane and the functions of the data plane. Such fundamental
differences result in different forwarding abstractions, for which there are three major schools of
thought.
Programmable Abstractions. Programmable abstractions view the data plane as a set of
programmable entities with an instruction set. Similar to general-purpose processors, for-
warding elements do not provide any predefined network function; instead, one can offload
code into these elements to get the desired functionality. Active networks, the pioneer of
these abstractions, allow programmability at packet transport granularity. Active networks can
be implemented in software running on general-purpose processors, or using programmable
hardware, such as NetFPGAs [94] or BEE [29], to achieve line-rate processing.
Active network proposals fall into two categories: Some proposals, such as
ActiveIP [126], Smart Packets [107], and ANTS [125], advocate encapsulating code inside packets.
These proposals assume that the forwarding elements are stateless, generic processing elements
that execute the code accompanying a packet on a per-packet basis. This code can either modify
the packet itself or result in a forwarding decision. Other proposals, such as Switchware [114]
and DAN [36], advocate offloading code into switches. They assume a switch is a stateful,
programmable processing element. The offloaded code is executed per incoming packet and
can result in different forwarding decisions.
We also note that there are extensible network programming platforms, such as XORP [55]
and Click [73], that must be deployed on generic x86 platforms. These platforms are well suited for
implementing routing protocols, but they are not suitable as a forwarding abstraction.
Generic Abstractions. Proposals in this class keep the data plane as generic as possible,
essentially defining data-path functions as a set of black boxes with no predefined characteristics.
In such abstractions, the data plane is responsible for providing a formal specification of its
functionality, and it is the control plane's job to compose these black boxes to program the
network. In other words, the control plane merely composes a set of predefined functions to
modify a packet or reach a forwarding decision.
ForCES [128] is the most notable proposal in this category. In ForCES, Logical Function
Blocks (LFBs) represent the generic networking functions available in forwarding elements. For
instance, there are LFBs for address translation, packet validation, IP unicast, and queueing.
Composing these LFBs, the control elements can program forwarding elements.
SwitchBlade [14] is another example in this category, which exposes predefined networking
functions as black boxes that can be composed to perform a cohesive behavior. SwitchBlade
slices a switch into one or more Virtual Data Planes (VDPs), each having its own isolated
pipeline. Each VDP is composed of a set of modules (note that modules can be shared among
different VDPs). The set of modules of each VDP is dynamic and configurable.
SwitchBlade processes incoming packets in four stages: (i) It selects the appropriate VDP(s)
based on the first few bytes of the packet. (ii) Once the VDPs are selected, the packet enters a
shaping module for rate limiting. (iii) Then, packets are preprocessed through the VDP's respective
preprocessing modules to calculate metadata about packets. (iv) Finally, packets are forwarded
using an output port lookup module, which can use longest-prefix matching, exact matching, or
forward packets to the CPU.
Reconfigurable Abstractions. The other school of thought, led by OpenFlow [88], views the
data plane as a minimal set of functions that can be directly implemented in hardware, and
pushes all other functions to the control plane. These proposals define the data plane as a set
of packet processing elements providing basic forwarding and queuing primitives. The state of
these forwarding elements is configured by the control plane. For that reason, reconfigurable
abstractions are not completely programmable. For instance, OpenFlow allows the control plane
to add, remove, and modify flow-entries (§2.1.4) in switches. Some proposals (e.g., Fabric [27])
even push the abstraction further towards a model in which edge switches are programmable,
and the core uses a high-throughput label switching protocol (e.g., MPLS [105]).
P4 [20, 67] is another proposal that aims at providing more programmability and more
flexibility in reconfigurable networks. In essence, the goal of P4 is to bring the benefits of
programmable abstractions into reconfigurable abstractions. P4 has a high-level programming
language with which one can specify packet fields, how packets are parsed, and what
actions to apply to packets. To be target independent, P4 automatically compiles this
high-level program onto the actual target.
Remarks. Programmable and generic abstractions are more evolvable, and accommodate a
broader range of data plane functions (including the reconfigurable abstractions) at the cost
of simplicity, since, to use a specific data plane function, one would need to define several
programming contracts, effectively specifying a custom southbound API. Reconfigurable
abstractions, on the other hand, enforce a well-defined southbound protocol that both the data
and control planes agree upon. This results in a relatively limited data plane, but simplifies
network control. More importantly, it prevents API fragmentation for control programs and
favors a vendor-agnostic ecosystem. For that reason, the latter model is well adopted by the
research community as well as the industry. In essence, reconfigurable abstractions are the most
pragmatic abstraction, and thus most SDN proposals have converged on them.
2.3 Control Plane Scalability
Decoupling control from the data plane leads to several scalability concerns. Moving
traditionally local control functionality to a remote controller can potentially result in new
bottlenecks. The common perception that control in SDN is centralized leads to concerns about
SDN scalability and resiliency. It can also lead to signaling overheads that can be significant
depending on the type of network and associated applications.
We argue that there is no inherent bottleneck to SDN scalability, i.e., these concerns stem
from implicit and extrinsic assumptions [60]. For instance, early benchmarks on NOX (the
first SDN controller) showed it could only handle 30k flow initiations per second [51] while
maintaining a sub-10ms flow install time. Such a low flow setup throughput in NOX is shown
to be an implementation artifact [118]; likewise, the control plane being centralized was simply due to
the historical evolution of SDN. Without such assumptions, similar to any distributed system,
one can design a scalable SDN control plane.
We believe there are legitimate concerns for SDN scalability, but we argue that these
scalability limitations are not restricted to SDN, i.e., traditional control protocol design also
faces the same challenges. While this does not address these scalability issues, it shows that
we do not need to worry about most scalability issues in SDN more than we do for traditional
networks. In this section, we present recent proposals to address these scalability concerns.
2.3.1 In-Data-Plane Proposals
Datapath elements sit at the base layer of SDN with significant influence on the whole
architecture. One way to address scalability concerns in the control plane is delegating more
responsibilities to the data plane. This simply reduces the load on the control plane and can
eliminate bottlenecks.
Decision Caching. One way to eliminate the load on the control plane is to cache the decision
logic (i.e., the function that maps events onto concrete rules) inside the datapath elements. The
most notable work pursuing this idea is DIFANE [131]. DIFANE delegates forwarding logic
from the controller to a special kind of switch, called Authority Switches, in order to shield the
controller from packet-in events. The controller offloads forwarding rules to Authority Switches,
and Authority Switches, on behalf of the controller, install rules in other switches. In other
words, Authority Switches act as a proxy that caches simple forwarding decisions. Although
considerably efficient, DIFANE is only applicable to forwarding rules that can be implemented
in a switch, and cannot help with complex control logic.
Aggregate Visibility. Another way to reduce the control plane's load is to handle frequent events
(such as short-lived flow arrivals) within the datapath, and selectively send aggregate events to
the control plane. This approach is pioneered by DevoFlow [34], which uses wildcarded flows
to forward traffic, and sends only elephant flows (e.g., flows with more than a certain amount
of data, bandwidth consumption, etc.) to the controller. Although this approach would result
in a scalable SDN, it comes at the cost of controller visibility: the controller never gets to see the
short-lived flows. Some advantages of SDN would be hindered when the controller does not
have the power to see all flows.
Remarks. These in-data-plane proposals are mostly motivated by early benchmarks on
NOX [51] that are shown to be fallacies rooted in a naive IO implementation, which can be
simply resolved by using asynchronous IO mechanisms [118] or by exploiting parallelism [134].
Newer implementations of NOX and Beacon [39] can easily cope with hundreds of thousands
of flows on commodity machines. We therefore believe that the controller's throughput is not an
inherent bottleneck in SDN and there is no need to push functionality back to the data plane.
Moreover, delegating more to the data plane would result in less visibility for control applications,
which can obviate the benefits of SDN.
2.3.2 Distributed Control Planes
The controller plays a central role in SDN: it programs forwarding elements, maintains the
network state, and runs the control applications. In essence, it is the network operating
system, and thus must be scalable, reliable, and efficient. Led by NOX [51], SDN controllers
are traditionally designed as a centralized middleware that communicates with switches and
hosts control applications. These controllers mostly use an abstract object model to represent
the network and provide an event-based programming model for control applications. These
controllers all had centralized architectures with inefficient implementations. These limitations
have recently been the topic of extensive research, which we present in this section.
Regardless of the controller's capability, a centralized controller does not scale as the network
grows (increasing the number of switches, flows, bandwidth, etc.) and will fail to handle
all the incoming requests while providing the same service guarantees. This necessitates
scalable distributed control platforms that can cope with arbitrary loads in different network
environments.
ONIX. ONIX [75] is the successor of NOX that maintains the centralized network state in
a distributed fashion. As shown in Figure 2.4, ONIX models the network using an abstract
data structure called the Network Information Base (NIB). The NIB represents a network as a graph in
which nodes represent networking elements and edges represent logical connections. The NIB is
maintained in a distributed fashion and is eventually consistent. Within ONIX, each controller
has its own NIB, and when a part of the NIB is updated in one of the controllers, the update is
eventually propagated to the whole cluster. All other aspects of state distribution (e.g.,
partitioning and consistency) are handled by control applications.
HyperFlow. HyperFlow [116] is another distributed controller built based on NOX. It partitions
the network, and assigns a controller to each partition. In order to synchronize the network
state, HyperFlow logs events on each controller and propagates them on the cluster. Within
HyperFlow, consistency is preserved by having each switch managed only by its local controller.
Whenever a remote application needs to modify a switch, the message is sent to the controller
that manages that switch, and replies are routed back to the application accordingly. In contrast
to ONIX, HyperFlow does not require any change in control applications but is not as scalable
as ONIX.
DISCO. DISCO [103] is a distributed controller focusing on scalable overlay networks. DISCO
partitions the network into multiple domains, each controlled by its own controller. In contrast
to HyperFlow, DISCO focuses on providing end-to-end connectivity among nodes in the
network, not detailed path setup. As such, neighbor controllers form pub-sub relationships
with one another. Using this pub-sub system, which is built on top of AMQP [95], network
controllers share aggregated network-wide views and establish end-to-end connections.
ONOS. ONOS [17] is another distributed controller that tries to address the shortcomings of
Figure 2.4: ONIX maintains the network-wide state in a distributed fashion.
ONIX. In contrast to ONIX, the goal of ONOS is to provide a strongly consistent network-
wide state. To that end, as shown in Figure 2.5, ONOS stores the network state in an external
distributed datastore: Cassandra and ZooKeeper in its first version, and RAMCloud in its second
version. As a result, the control platform consists of controllers that synchronize their state
using the external state management system. Delegating state management to an external
system simplifies ONOS's design, yet can lead to suboptimal performance, scalability issues, and
maintenance difficulties.
B4. B4 [66] is Google’s production wide-area network (WAN) that consists of multiple WAN
sites. Each site in B4 is controlled by a distributed control plane, called Network Control Servers
(NCS), that manage the SDN switches in that site. NCS internally uses ONIX NIBs [75] to
implement OpenFlow, and Quagga [6] to implement routing protocols. At a higher level, a
logically-centralized global view of the network is used by a centralized Traffic Engineering (TE)
application to optimize traffic across B4's planet-scale network.
ElastiCon. ElastiCon [37] is a proposal on how to distribute the load of the network among
controllers in a distributed control plane. The main contribution of ElastiCon is a four-phase
protocol that controllers use to elect an apt master for a new switch. Moreover, it proposes an
algorithm to migrate a switch from one controller to another. ElastiCon is tackling an important
Figure 2.5: ONOS maintains the network-wide state using an external database system.
problem, and the proposal is orthogonal to all distributed controllers. That is, ElastiCon can be
easily adopted in other distributed controllers.
Remarks. An interesting observation is that control plane scalability challenges in SDN
(e.g., simplicity and consistency requirements) are not inherently different from those faced in
traditional network design. SDN, by itself, is neither likely to eliminate the control plane design
complexity nor make it more or less scalable.3
SDN, however, (a) allows us to rethink the constraints traditionally imposed on control
protocol designs (e.g., a fixed distribution model) and decide on our own trade-offs in the design
space; and (b) encourages us to apply common software and distributed systems development
practices to simplify development, verification, and debugging. Unlike traditional networks,
in SDN, we do not need to address basic but challenging issues like topology discovery, state
distribution, and resiliency over and over again. Control applications can rely on the control
platform to provide these common functions; functions such as maintaining a cohesive view of
the network in a distributed and scalable fashion.
3 After all, one can replicate a traditional network design with SDN by co-locating an equal number of forwarding and control elements. Even though this obviates the benefits of SDN, it is technically possible.
2.4 Resilience to Failures
Resilience to failures and convergence time after a failure have always been a key concern in
network performance. SDN is no exception, and, with the early systems setting an example
of designs with a single central controller, resiliency to failures has been a major concern. A
state-synchronized slave controller would be sufficient to recover from controller failures, but
a network partition would leave half of the network brainless. In a multi-controller network,
with an appropriate controller discovery mechanism, switches can always discover a controller
if one exists within their partition. Therefore, given a scalable discovery mechanism, controller
failures do not pose a challenge to SDN scalability.
Link Failures. Let us decompose the process of repairing a broken link or switch to see how it
is different from traditional networks. As shown in Figure 2.6, convergence in response to
a link failure has five steps. The switch detects a change. Then, the switch notifies the controller.
Upon notification, the control program computes the repair actions and pushes updates to
the affected datapath elements which, in turn, update their forwarding tables.4 In traditional
networks, link failure notifications are flooded across the network, whereas with SDN, this
information is sent directly to a controller. Therefore, the information propagation delay in
SDN is no worse than in traditional networks. Also, as an advantage for SDN, the computation is
carried out on more capable controller machines as opposed to the weak management CPUs of all
switches, regardless of whether they are affected by the failure or not.
Note that the above argument was built on the implicit assumption that the failed switch or
link does not affect the switch-controller communication channel. The control network itself
needs to be repaired first, if a failed link or switch was part of it. In that case, if the control
network – built with traditional network gear – is running an IGP, the IGP needs to converge
first before switches can communicate with the controller to repair the data network. In this
corner case, therefore, convergence may be slower than in traditional networks. If this proves to
be a problem, the network operator should deploy an out-of-band control network to alleviate
this issue.
Control Plane Failure. To tolerate a controller failure, controllers have to ensure that their
state is persistent and properly replicated. For a centralized controller, this
means persistent state with master-slave replication. For distributed controllers, on the other
hand, this depends on the type of consistency guarantees they provide. Existing distributed
controllers either provide eventually consistent replication (e.g., ONIX [75]), or rely on an
external distributed datastore to replicate the state for them (e.g., ONOS [17]). They handle
network partitions in the control plane similarly. It is important to note that beyond the
distributed system mechanisms, one can use the state stored in switches to preserve consistency
upon failure [100].
4 For switch failures, the process is very similar, with the exception that the controller itself detects the failure.
[Figure: a controller and three switches, annotated with the five convergence steps: (1) a switch detects a link failure; (2) the switch notifies the controller; (3) the controller computes the required updates; (4) the controller pushes the updates to switches; (5) the switches update their forwarding tables.]
Figure 2.6: The five steps when converging upon a link failure.
Control applications can also result in failures in the control plane. For example, when a
control application fails to handle an event or when it crashes, it can result in a control plane
failure (e.g., loss of connectivity and a transient high loss episode). LegoSDN [28] aims at
isolating application failures using a sandbox called AppVisor which contains applications and,
according to the operator policies, can stop applications or revert their transactions upon an
application error. Approaches like LegoSDN can be very useful to improve the reliability of the
control plane.
Remarks. Overall, the failure recovery process in SDN is no worse than traditional networks.
Consequently, similar scalability concerns exist and the same techniques used to minimize
downtime in traditional networks are applicable to SDN. For instance, SDN design can and
should also leverage local fast failover mechanisms available in switches to transiently forward
traffic towards preprogrammed backup paths while a failure is being addressed. We stress that
control platforms must provide the essential failover and recovery mechanisms that control
applications can reuse and rely upon.
2.5 Programming Abstractions
Almost all SDN controllers provide APIs that expose the details embedded in the southbound
interface between the data and control planes. That level of detail is essential for implementing
a controller, but not generally required for managing a network. Network operators and
programmers instead require an abstraction through which they can specify the required
behavior of the network with minimal effort. To that end, network programming boilerplate
and complexities should be handled by the controllers and be hidden from management
interfaces. Such an interface will accommodate network growth and will ease management at
scale. Preserving this simplicity is a major challenge for the future of SDN. In this section,
we present proposals that aim at simpler programming abstractions.
Frenetic. The Frenetic project [45, 91] is the foremost initiative aiming at raising the abstraction
level for network programming. Frenetic provides a declarative language (inspired by
SQL) for querying network state, and a reactive functional language for programming the
network. Frenetic provides functional composition by design and facilitates easy reuse.
Frenetic languages run atop a runtime environment that manages all low-level semantics, such as
installing and updating rules and gathering network statistics. Moreover, the Frenetic runtime
provides optimizations and consistency guarantees (both at the packet and flow levels).
Nettle and Procera. Nettle [120] and Procera [121] are high-level functional abstractions for
network programming on top of OpenFlow, very similar to Frenetic. The main difference is
that Nettle and Procera support all networking events, while Frenetic is limited to packet-level
events. On the other hand, Frenetic provides a declarative language for statistics collection, which
neither Nettle nor Procera provides.
Kinetic. Kinetic [72] is another programming abstraction, built on top of Pyretic, with the goal
of creating a domain-specific language (DSL) for dynamic network configurations. In contrast
to Frenetic, which provides a static network programming abstraction, Kinetic aims at creating
a network programming environment that can react to temporal changes and events in the
network. More importantly, policies implemented in Kinetic are formally verifiable. Internally,
Kinetic uses a finite state machine (FSM) to present a dynamic abstraction of the policy. These
state machines can be automatically composed to create larger policies and are fed into NuSMV
for formal verification. These features make Kinetic simpler than Pyretic to use in
real environments, yet they incur their own performance and scalability overheads.
Remarks. These high-level programming abstractions can indeed simplify network program-
ming by hiding the internal details of SDN protocols. All these proposals, however, focus on
centralized controller systems. Providing such abstractions on distributed controllers requires
extensive architectural changes in both the control plane and the abstraction itself.
2.6 Correctness and Consistency Tools
Networks are dynamic entities that constantly change state; links go up and down, elements
fail, and new switches are added. Each of these events necessitates new configurations that are
enforced while the network is operating. This results in transient inconsistencies, subtle errors
(such as loops), and colorful failures that are difficult to detect and resolve. One of the advantages
of SDN is that it paves the way for developing new tools that help detect and prevent such
consistency and validation problems.
Verification. Verification tools mostly focus on verifying a set of invariants against the network
configuration. There are static verification tools that verify the network configuration offline and
out of band. Such verification tools aim at checking “what-if” rules, and usually pursue formal
modeling of packets, protocols, and flows in order to check network invariants.
FlowChecker [31] and its predecessor ConfigChecker [11] use Binary Decision Diagrams
(BDDs) to model the network state, and use model checking techniques to check whether the
model matches a set of predefined specifications and rules. Anteater [82], similarly, models
network state as a boolean satisfiability problem (SAT), and uses a SAT solver to check invariants.
NICE [24] is a similar approach that uses model checking to validate OpenFlow applications.
It emulates a simple switch for the controller and tries to explore all valid state spaces (with
regard to the controller, control channels, and switches) to find bugs and inconsistencies.
One limitation of NICE is that it can only check one control application at a time, due to the
complexity of verifying parallel control logic.
Header Space Analysis (HSA) [69], in contrast, models packets as binary vectors and
formally defines the network configuration as a function that transforms these binary vectors.
Invariants are then verified based on such header transformations. These methods have
interesting theoretical properties and are able to detect some configuration problems, while
they fall short in modeling complex network transformations (even when it comes to modeling
simple address translations).
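As a rough illustration of the idea (not the actual HSA formalism or toolchain), one can model headers as bit strings and every hop as a transfer function from a header to a set of headers, and verify an invariant by composing those functions; the bit widths and transfer functions below are made up for the example:

```python
from typing import Callable, List, Set

# Headers are fixed-width bit strings; each hop is a transfer function
# mapping one header to the set of headers it can produce.
Header = str
Transfer = Callable[[Header], Set[Header]]

def rewrite_bit(pos: int, value: str) -> Transfer:
    # Models a rewriting action, e.g., a (very) simple address translation.
    return lambda h: {h[:pos] + value + h[pos + 1:]}

def drop_if(prefix: str) -> Transfer:
    # Models an ACL: headers matching the prefix are dropped.
    return lambda h: set() if h.startswith(prefix) else {h}

def compose(hops: List[Transfer]) -> Transfer:
    def path(h: Header) -> Set[Header]:
        current = {h}
        for hop in hops:
            current = set().union(*(hop(x) for x in current)) if current else set()
        return current
    return path

# Invariant along a two-hop path: headers starting with "11" must not
# reach the destination.
path = compose([rewrite_bit(3, "0"), drop_if("11")])
assert path("1101") == set()       # blocked, as required
print(path("0101"))                # {'0100'} reaches the destination
```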
Another category of verification tools is focused on real-time, in-band verification. The
challenge here is to develop methods for (i) gathering a real-time view of the network and (ii)
efficient verification. VeriFlow [70, 71], for instance, limits the search space to a subset of the
network and, with that, significantly improves verification performance. VeriFlow, however,
cannot detect transient inconsistencies that are caused by different synchronizations between
the control plane and the datapath.
Consistency Guarantees. Verification mechanisms, although important, cannot detect tran-
sient inconsistencies resulting from out-of-sync configuration updates. For instance, to install a
path, we need to push network configuration to every switch on the path. This can simply go out
of sync because of networking delays, and until the system reaches a stable state, many packets
are mistakenly dropped or wrongly forwarded. These sorts of problems are very frequent in
practice (due to the nature of distributed systems) and are significantly difficult to discover.
Reitblatt et al. [104] have systematically studied these problems, classified them, and
proposed update mechanisms to prevent such inconsistencies. They have identified two
classes of consistency guarantees: (i) per-packet consistency, which ensures each packet is
processed using the same version of the configuration throughout the network, and (ii) per-flow
consistency, which ensures packets of a flow are all forwarded using the same version of the network
configuration. Note that the latter is a stronger guarantee and thus more difficult to implement.
For per-packet consistency, one can add a tag match to each rule representing the
configuration version. Once all rules of a specific version are installed, packets can be labeled
with the apt configuration version at ingress ports. The authors have proposed three mechanisms
for per-flow consistency. The first mechanism mixes the per-packet mechanism with
flow timeouts. This ensures configurations are not applied until all flows using a specific
configuration version have ended. This approach falls short when a flow spans different ingress
ports. The second approach is to use wildcard cloning, which requires DevoFlow switches [34],
and the third approach requires notifications from end-hosts. This approach is further extended
to provide bandwidth guarantees for virtual machine migration in [47]. The main issue with all
these approaches is that, in general, they can double the size of forwarding tables during an
update.
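A minimal sketch of the per-packet mechanism, with stand-in switch and rule objects rather than real OpenFlow messages, could look as follows; the two-phase structure (install versioned rules everywhere, then flip the tag at the ingress) is the essential point:

```python
# Stand-in switch objects; real deployments would send OpenFlow messages.
class Switch:
    def __init__(self, name):
        self.name, self.rules, self.ingress_version = name, [], None

    def install(self, version, rule):
        self.rules.append((version, rule))       # rule matches on the version tag

    def set_ingress_version(self, version):
        self.ingress_version = version           # packets get stamped with this tag

def consistent_update(internal, ingress, version, new_rules):
    # Phase 1: push the new, version-tagged rules to all internal switches.
    for sw in internal:
        for rule in new_rules.get(sw.name, []):
            sw.install(version, rule)
    # Phase 2: only after phase 1 completes, flip the tag at the ingress
    # switches so new packets are handled exclusively by the new version.
    for sw in ingress:
        sw.set_ingress_version(version)
    # Old-version rules can be garbage-collected once in-flight packets drain.

s1, s2, s3 = Switch("s1"), Switch("s2"), Switch("s3")
consistent_update(internal=[s2, s3], ingress=[s1], version=7,
                  new_rules={"s2": ["fwd:1"], "s3": ["fwd:2"]})
print(s1.ingress_version, s2.rules, s3.rules)
```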
McGeer [87] has proposed an alternative approach for rule updates which incorporates the
controller. This approach pushes a set of intermediate rules to switches which forward affected
flows to the controller. Once the controller has received all packets from all affected flows, it installs
the new version of the rules, and then forwards all flows. This approach offers more flexibility while
increasing the load on the control plane.
Customizable Consistency Generator (CCG) [135] is another proposal that aims at mini-
mizing the relatively high performance and space overheads of the aforementioned consistency
proposals. The main idea behind CCG is to minimize the network state transition delay upon a
configuration update. To that end, for each update, it verifies whether the update is safe to apply
based on a set of user-defined invariants. If safe, the update is installed; otherwise, it is buffered. In
the presence of a deadlock (i.e., when none of the updates can be safely applied), CCG falls back
to stronger consistency methods (such as [104]) to apply the updates. The authors have shown
that CCG can lower the space overheads and the transition delays of consistent state updates.
Congestion-Free Updates. Even when rules are updated consistently (say, with no transient
loops or dead-end paths), the network might observe transient congestion and correspondingly high
loss episodes during an update, due to asynchronous updates among switches. That is, because
switches apply updates asynchronously, there is a possibility of significant transient traffic spikes
in the network. This can cause high packet loss and considerable temporary congestion, which
hinders the usability of interactive (i.e., latency-intolerant) systems.
zUpdate [79] and SWAN [62] are both proposals aiming for congestion-free network
updates. zUpdate calculates the steps required to transition from the current state to the desired
state in a way that minimizes congestion and traffic imbalance, and installs the configuration
updates using the same method proposed in [104]. SWAN proposes a similar technique with
the difference that it keeps scratch link capacity (e.g., 1/3 of the link capacity) to avoid infeasible
update scenarios. Both zUpdate and SWAN perform better than distributed network protocols
such as MPLS-TE, but are limited to specific network topologies.
Conflict-Free Configurations. Policies and configurations can come from different sources for
different purposes, and hence can be conflicting. For example, a load balancing application
naturally has a different objective than an energy optimizer. Such control applications will
compete for the resources of the network. When dealing with user-defined policies, one can design
an autonomous system that resolves conflicts (e.g., the conflict resolution mechanism used for
hierarchical policies in PANE [41, 42]).
Composing control applications in a conflict-free manner is more complicated than
resolving conflicts among user-defined policies, since control applications are not as well-defined
and limited as user-defined policies. There are two important proposals for this problem.
Corybantic [90] proposes a coarse-grained system in which control modules propose their
intended updates in each update round. Then, the system selects a non-conflicting set of updates
to apply and notifies the control modules of the selected updates. This way, the control modules
will adapt their next updates according to the decisions in this round. Corybantic is a plausible
solution, yet it is quite coarse-grained and appears to be difficult to fully automate.
Volpano et al. [122] propose a finer-grained solution based on deterministic finite-state
transducers (DFTs) [63]. DFTs are essentially finite state machines that can translate an input to
an output. Volpano et al. assume that control applications are implemented or specified using
DFTs. Given such DFTs, an autonomous system can compile updates in safe (i.e., conflict-free)
zones. Although the proposed system is fine-grained, the main problem with this proposal is
that DFTs are limited and cannot express all sorts of control applications.
Debuggers. Although written in software, a network application is not easy to debug using
existing software debugging tools. The reason is that network applications program a distributed
system (a cluster of switches), and to debug its behavior we need to access arbitrary switches
and inspect arbitrary packets in flight.
ndb [54] aims at providing basic debugging features (such as breakpoints and traces). To
provide breakpoints, ndb installs rules on switches that notify the information collector. To
provide packet backtraces, ndb stores postcards (i.e., packet headers along with their flow match
information, on all flows). When packets traverse the network, they generate postcards,
and once a packet hits a breakpoint, the debugger can build the backtrace. ndb is built in a way that
postcards and breakpoints do not hit the control plane. However, they consume a considerable
amount of bandwidth on the datapath.
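The postcard-and-backtrace idea can be sketched in a few lines; the collector below is a simplified stand-in and does not reflect ndb's actual implementation:

```python
from collections import defaultdict

# Every switch a packet traverses mails a "postcard" (packet id, switch,
# matched rule) to a collector; when a breakpoint fires, the collector
# reconstructs the packet's path from its postcards.
postcards = defaultdict(list)          # packet_id -> [(switch, rule), ...]

def on_postcard(packet_id, switch, rule):
    postcards[packet_id].append((switch, rule))

def on_breakpoint(packet_id):
    return list(postcards[packet_id])  # the backtrace collected so far

on_postcard("pkt-42", "s1", "table0:fwd-port2")
on_postcard("pkt-42", "s2", "table0:fwd-port1")
print(on_breakpoint("pkt-42"))
```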
SDN Troubleshooting System (STS) [108] takes an alternative path and tries to find software
bugs by replaying logs. Due to the non-deterministic nature of SDN (e.g., asynchronous events, race
conditions, and distributed actors), STS cannot rely on a simple replay of logs. Instead, STS
does a random walk among possible replay sequences, tries each sequence multiple times, and
checks invariants to find the minimum sequence of inputs that triggers the bug. Although STS
uses pruning techniques to narrow its search space, it is still unclear how it will scale to realistic
networks with tens of thousands of switches.
Remarks. Consistency, correctness, and troubleshooting are all very important aspects of
network programming. The proposals that we discussed in this chapter are all useful for
implementing a correct and consistent SDN. Having said that, there are still significant open
problems that are not addressed by existing proposals. Specifically, correctness and consistency
are still open problems in a distributed setting. For example, when we have multiple controllers
managing the network, providing flow-level consistency is quite challenging. Another
important open challenge is performance troubleshooting and debugging. That is, existing
proposals cannot answer queries such as “why is my control plane slow?” or “why doesn't it
scale?”
2.7 Monitoring and Measurements
Before SDN, networking applications relied on protocols such as SNMP [56] to capture network
statistics and metrics. SNMP metrics, however, are coarse-grained and difficult to customize. For
instance, it was only possible to estimate the traffic matrix using SNMP in traditional networks [89].
For that reason, one would need to use sampling mechanisms (such as sFlow [102]) or capture
statistics by modifying end-hosts [46]. To that end, the emergence of SDN protocols provides an
opportunity for improving monitoring and measurement tools. That is because SDN protocols
provide meaningful counters that are attached to individual flows, ports, queues, and switches.
This powerful construct enables control applications to gather an accurate traffic matrix.
OpenTM. OpenTM [117] was the first effort to capture the traffic matrix using SDN counters.
OpenTM is basically a control application running on NOX [51] that queries OpenFlow switches
for flow counters in the network, and creates the traffic matrix accordingly. To lower the
overhead of statistics collection, OpenTM tries to find the optimal set of switches to query. The
authors have conducted experiments with different selection schemes (i.e., uniform random,
non-uniform random, last switch, round-robin, and least-loaded switch) and have shown that
querying the least-loaded switch is the best switch selection scheme in an OpenFlow network.
Despite all the value of OpenTM, we believe that in a real-world setting edge switches are always the
least-loaded switches, and hence querying edge switches would be the simplest, most realistic,
and an optimal way of gathering the traffic matrix.
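An OpenTM-like polling loop might look like the following sketch; `flow_bytes` and `switch_load` are hypothetical stand-ins for an OpenFlow flow-stats request and a switch-load metric, and the least-loaded selection mirrors the scheme discussed above:

```python
import random
from collections import defaultdict

def flow_bytes(switch, flow):
    # Stand-in for an OpenFlow flow-stats request; returns cumulative bytes.
    return random.randint(10_000, 1_000_000)

def switch_load(switch):
    # Stand-in for a load metric (e.g., number of active flow-entries).
    return random.random()

def traffic_matrix(paths):
    """paths: {(src, dst): [switches on the path]} -> {(src, dst): bytes}"""
    tm = defaultdict(int)
    for (src, dst), path in paths.items():
        probe = min(path, key=switch_load)    # query the least-loaded switch
        tm[(src, dst)] = flow_bytes(probe, (src, dst))
    return dict(tm)

paths = {("h1", "h2"): ["s1", "s2", "s3"], ("h1", "h3"): ["s1", "s4"]}
print(traffic_matrix(paths))
```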
FlowSense. FlowSense [129] has the same goal as OpenTM, but proposes an alternative
approach. Instead of querying switches and polling the stat counters, FlowSense relies on Packet-
In and Flow-Removed OpenFlow messages to collect the traffic matrix. Packet-In events are
fired when a packet does not match any flow in the switch, and Flow-Removed is fired when a
flow is removed or expires. FlowSense uses Packet-In events as a trigger for starting a flow, and
uses the byte counters and duration in Flow-Removed events to estimate the traffic of the flow. The
authors have shown that FlowSense has a low overhead compared to polling while maintaining
the same level of accuracy. The major problem with FlowSense is that it is limited to reactive flow
installation and will not work for a network that is proactively programmed. As most real-world
networks are proactively managed, FlowSense would not be widely adopted in practice.
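The estimation step can be sketched as follows; the event handlers are simplified stand-ins for the Packet-In and Flow-Removed callbacks of an OpenFlow controller:

```python
# Track flow start times from Packet-In, and turn Flow-Removed byte counts
# and durations into an average rate over the flow's lifetime.
flows = {}          # flow_id -> start timestamp
estimates = []      # (flow_id, start, end, avg_rate_bytes_per_sec)

def on_packet_in(flow_id, timestamp):
    flows[flow_id] = timestamp

def on_flow_removed(flow_id, timestamp, byte_count, duration_sec):
    start = flows.pop(flow_id, timestamp - duration_sec)
    rate = byte_count / duration_sec if duration_sec > 0 else 0.0
    estimates.append((flow_id, start, timestamp, rate))

on_packet_in("h1->h2:80", timestamp=0.0)
on_flow_removed("h1->h2:80", timestamp=12.5, byte_count=5_000_000, duration_sec=12.5)
print(estimates)    # [('h1->h2:80', 0.0, 12.5, 400000.0)]
```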
OpenSketch. In contrast to OpenTM and FlowSense, which monitor the network in the control
plane, OpenSketch [130] proposes to implement measurement and monitoring facilities in the
data plane. OpenSketch has a three-stage data plane pipeline to select and count packets: (i)
hashing to summarize packet headers, (ii) classification to classify packets into different
classes, and (iii) counting, which uses probabilistic counters (i.e., Count-Min Sketches) to store
the count for each class. In the control plane, OpenSketch provides a measurement library
to assist the control logic in identifying flows, storing statistics, and querying counters. OpenSketch is
shown to be accurate while preserving a low memory footprint. The major issue with OpenSketch
is that it requires changes in the data plane, which results in bloated switches and fragmentation,
which is in contrast to SDN design principles.
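The counting stage relies on Count-Min sketches; a minimal, self-contained version of that data structure (not OpenSketch's hardware implementation) is sketched below:

```python
import hashlib

# A Count-Min sketch: a few rows of counters indexed by independent hashes.
# The estimate is the minimum over the rows, so it can over-count (due to
# hash collisions) but never under-count.
class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        digest = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))

sketch = CountMinSketch()
sketch.add("10.0.0.1->10.0.0.2", count=1500)   # e.g., bytes of a flow's packets
sketch.add("10.0.0.1->10.0.0.2", count=900)
print(sketch.estimate("10.0.0.1->10.0.0.2"))   # >= 2400 (exact here)
```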
DREAM. DREAM [92] is another in-band measurement system that tries to implement fine-
grained measurements using existing TCAM counters in hardware switches. The main challenge of
using TCAMs is their limited capacity, especially when one needs fine-grained measurements.
DREAM proposes an adaptive method, trying to find a trade-off between the granularity of
matches installed in TCAMs and the TCAM space. Although this method can be advantageous
in existing networks, it has limits in the level of visibility it provides for control applications.
NetAssay. NetAssay [38] takes a different approach to measurement and monitoring by
focusing on high-level concepts (e.g., devices and applications) instead of low-level constructs
(e.g., flows or addresses). To that end, NetAssay forms a dynamic mapping between higher-level
concepts and the flow space, and uses that mapping to translate low-level measurements
into high-level monitoring. To keep NetAssay generic, these mappings can be implemented by
the operators. It is important to note that NetAssay's approach is orthogonal to other SDN
measurement approaches and can complement other proposals.
2.8 Network Optimization
Network applications are traditionally built based on the end-to-end design argument [106], and
traditional networks act as a black-box medium for packet transport. This has resulted in suboptimal
behaviors in the transport and application layers [18]. This problem can only be solved when the
network provides an open integration/programming point for applications. SDN provides such
an open API and creates an opportunity to optimize network applications. In what follows, we
briefly explain some of the most important application optimization efforts based on SDN.
Hedera and Helios. Data center networks usually have a multi-rooted topology (e.g., a leaf-spine
topology) with high aggregate bandwidth. Traditional Interior Gateway Protocols (IGPs), such
as IS-IS [97] and OSPF [93], mostly yield a minimum spanning tree, and hence quite a few
underutilized links. Link aggregation protocols (e.g., MLAG or MC-MLAG), on the other hand,
are passive and cannot optimize for traffic patterns in a network. For instance, they can result in
congestion for elephant flows. To tackle this problem, Hedera [10] proposes a dynamic flow
scheduling mechanism using SDN. The Hedera scheduler finds elephant flows (i.e., flows that
need bandwidth but are network limited) and tries to find an optimal placement for elephant
flows on ECMP paths.
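A highly simplified sketch of this kind of scheduler is shown below; the 10% elephant threshold and the greedy least-loaded placement are illustrative simplifications of Hedera's actual demand estimation and placement algorithms:

```python
LINK_CAPACITY = 10e9                        # 10 Gbps, illustrative
ELEPHANT_THRESHOLD = 0.10 * LINK_CAPACITY   # flows above this are "elephants"

def schedule(flows, paths):
    """flows: {flow_id: demand_bps}; paths: {path_id: reserved_bps}."""
    placement = {}
    for flow_id, demand in sorted(flows.items(), key=lambda kv: -kv[1]):
        if demand < ELEPHANT_THRESHOLD:
            continue                        # mice stay on default ECMP hashing
        best = min(paths, key=paths.get)    # least-reserved equal-cost path
        paths[best] += demand
        placement[flow_id] = best
    return placement

flows = {"f1": 3e9, "f2": 2e9, "f3": 50e6}
paths = {"core1": 0.0, "core2": 0.0}
print(schedule(flows, paths))               # {'f1': 'core1', 'f2': 'core2'}
```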
Helios [40] focuses on hybrid datacenter fabrics that use all-optical switches alongside
an Ethernet network. In such networks, the optical fabric is high-throughput but does not
have enough (bandwidth and switching) capacity to forward all connections in the network.
Moreover, the optical fabric has a relatively high latency for switching and thus cannot efficiently
react to short-lived traffic. The idea behind Helios is to forward short-lived traffic on the Ethernet
fabric, and forward long-lived, elephant flows on optical paths. To that end, Helios queries
switches to detect elephant flows. Once an elephant flow is detected, it is forwarded over a
pre-setup optical path.
Both Hedera and Helios result in higher link utilization, lower congestion, and lower
latency.
Hadoop Optimization. Helios and Hedera are both application-agnostic and work without
knowledge of the applications and their networking demands. Although this results in
transparent network optimization, there are applications (or development frameworks) where
we know about their network demand and application topology. For instance, in a Hadoop
cluster, we can take advantage of our knowledge about job allocation and jobs’ size, to optimize
the network. Wang et al. [124] have proposed to use networking information of Hadoop
clusters for flow scheduling. They focus on hybrid (optical/Ethernet) fabrics, and reconfigure
the topology of the optical fabric based on Map-Reduce jobs and HDFS demand.
2.9 Discussion
Previous works have addressed SDN research challenges in innovative ways that were not
feasible in traditional networks. However, we believe that there is still no clear proposal on how
to resolve scalability issues in SDN without hindering its simplicity. Implementing, measuring,
and optimizing a distributed control application is quite difficult using existing proposals. On
the other hand, using simplifying programming models results in centralization and possibly a
lack of scalability. Our goal in this dissertation is to create an easy-to-use network programming
paradigm that is distributed, intuitive, scalable, and self-optimizing.
Chapter 3
Kandoo
Limiting the overhead of frequent events on the control plane is essential for realizing a scalable
SDN. One way of limiting this overhead is to process frequent events in the data plane. This
requires modifying switches and comes at the cost of visibility in the control plane. Taking an
alternative route, in this chapter, we present Kandoo [57], a framework for preserving scalability
without changing switches.
3.1 Introduction
Frequent and resource-exhaustive events, such as flow arrivals and network-wide statistics
collection events, stress the control plane and consequently limit the scalability of OpenFlow
networks [34, 116, 33]. Although one can suppress flow arrivals by proactively pushing the
network state, this approach falls short when it comes to other types of frequent events, such as
network-wide statistics collection. Existing solutions either view this as an intrinsic limitation
or try to address it by modifying switches. For instance, HyperFlow [116] can handle a few
thousand events per second, and anything beyond that is considered a scalability limitation. In
contrast, DIFANE [131] and DevoFlow [34] introduce new functionalities in switches to suppress
frequent events and to reduce the load on the control plane.
of datapaths, preferably without modifying switches. Adding new primitives to switches is
undesirable. It breaks the general principles of SDN, requires changes to the standards, and
necessitates the costly process of modifying switching hardware. Thus, the important question
is:
How can we move control functionalities toward datapaths, without introducing new datapath
mechanisms in switches?
To answer this question, we focus on: (i) environments where processing power is readily
available close to switches (such as datacenter networks) or can be easily added (such as
enterprise networks), and (ii) applications that are local in scope, i.e., applications that process
events from a single switch without using the network-wide state. We show that, under these
two conditions, one can offload local event processing to local resources, and therefore realize a
control plane that handles frequent events at scale.
We note that some applications, such as routing, require the network-wide state, and cannot
be offloaded to local processing resources. However, a large class of applications are either
local (e.g., a local policy enforcer and the Link Layer Discovery Protocol [7]) or can be decomposed
into modules that are local (e.g., the elephant flow detection module in an elephant flow rerouting
application).
Kandoo. In this chapter, we present the design and implementation of Kandoo [57], a
novel distributed control plane that offloads control applications onto available resources in
the network with minimal developer intervention and without violating any requirements of
control applications. Kandoo's control plane essentially distinguishes local control applications
(i.e., applications that process events locally) from non-local applications (i.e., applications that
require access to the network-wide state). Kandoo creates a two-level hierarchy for controllers:
(i) local controllers execute local applications as close as possible to switches, and (ii) a logically
centralized root controller runs non-local control applications. As illustrated in Figure 3.1,
several local controllers are deployed throughout the network; each of these controllers controls
one or a handful of switches. The root controller, on the other hand, controls all local controllers.
Figure 3.1: Kandoo's Two Levels of Controllers. Local controllers handle frequent events, while a logically centralized root controller handles rare events.
It is easy to realize local controllers since they are merely switch proxies for the root
controller, and they do not need the network-wide state. They can even be implemented directly
in OpenFlow switches. Interestingly, local controllers scale linearly with the number of
switches in a network. Thus, the control plane scales as long as we process frequent events in
local applications and shield the root controller from these frequent events. Needless to say,
Kandoo cannot help any control application that requires network-wide state (even though it
does not hurt such applications, either). In Chapter 4, we present Beehive, which provides a framework to
implement such applications.
Our implementation of Kandoo is fully compliant with the OpenFlow specifications.
Data and control planes are decoupled in Kandoo. Switches can operate without having a
local controller; control applications function regardless of their physical location. The main
advantage of Kandoo is that it gives network operators the freedom to configure the deployment
model of control plane functionalities based on the characteristics of control applications.
The design and implementation of Kandoo are presented in Section 3.2. Our experiments
confirm that Kandoo scales an order of magnitude better than a normal OpenFlow network and
would lead to more than 90% of events being processed locally under reasonable assumptions,
as described in Section 3.3. Applications of Kandoo are not limited to the evaluation scenarios
presented in this section. In Section 3.4, we briefly discuss other potential applications of
Kandoo and compare it to existing solutions. We conclude our discussion of Kandoo in
Section 3.5.
3.2 Design and Implementation
Design Objectives. Kandoo is designed with the following goals in mind:
1. Kandoo must be compatible with OpenFlow: we do not introduce any new data plane
functionality in switches, and, as long as they support OpenFlow, Kandoo supports them,
as well.
2. Kandoo is easy to use and automatically distributes control applications without any
manual intervention. In other words, Kandoo control applications are not aware of
how they are deployed in the network, and application developers can assume their
applications would be run on a centralized OpenFlow controller. The only extra
information Kandoo needs is a flag showing whether a control application is local or not.
In this section, we explain Kandoo's design using a toy example. We show how Kandoo can
be used to reroute elephant flows in a simple network of three switches (Figure 3.2). Our example
has two applications: (i) Appdetect, and (ii) Appreroute. Appdetect constantly queries each switch
to detect elephant flows. Once an elephant flow is detected, Appdetect notifies Appreroute, which
in turn may install or update flow-entries on network switches.
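The following sketch (not Kandoo's actual API) illustrates how little the developer has to change in this example: each application is written as if it ran on a centralized controller, and only the `local` flag tells the platform where it may be offloaded; the event fields, threshold, and context methods are hypothetical:

```python
# Base class: applications only declare whether they are local.
class App:
    local = False
    def handle(self, event, ctx): ...

class AppDetect(App):
    local = True                              # may run on a local controller
    THRESHOLD = 1 << 20                       # bytes; illustrative elephant cutoff

    def handle(self, event, ctx):
        if event["type"] == "flow_stats" and event["bytes"] > self.THRESHOLD:
            # Emit an application-defined event; the platform relays it upward.
            ctx.emit({"type": "elephant_detected", "flow": event["flow"]})

class AppReroute(App):
    local = False                             # needs the network-wide state

    def handle(self, event, ctx):
        if event["type"] == "elephant_detected":
            ctx.install_path(event["flow"])   # push flow-entries along a new path

class FakeCtx:
    def emit(self, ev): print("emit:", ev)
    def install_path(self, flow): print("reroute:", flow)

ctx = FakeCtx()
AppDetect().handle({"type": "flow_stats", "flow": "h1->h2", "bytes": 5 << 20}, ctx)
AppReroute().handle({"type": "elephant_detected", "flow": "h1->h2"}, ctx)
```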
It is extremely challenging, if not impossible, to implement this application in current
OpenFlow networks without modifying switches [34]. If switches are not modified, a (logically)
centralized control needs to frequently query all switches, which would place a considerable
load on control channels.
Kandoo Controller. As shown in Figure 3.3, Kandoo has a controller component at its core.
This component has the same role as a general OpenFlow controller, but it has Kandoo-specific functionalities for identifying local and non-local applications, hiding the complexity of the underlying distributed application model, and propagating events in the control plane.
Figure 3.2: Toy example for Kandoo's design: In this example, two hosts are connected using a simple line topology. Each switch is controlled by one local Kandoo controller. The root controller controls the local controllers. In this example, we have two control applications: Appdetect is a local control application, but Appreroute is non-local.
A network controlled by Kandoo has multiple local controllers and a logically centralized root
controller (we note that the root controller in Kandoo can itself be distributed). These controllers collectively form Kandoo's distributed control plane. Each switch
is controlled by only one Kandoo controller, and each Kandoo controller can control multiple
switches. If the root controller needs to install �ow-entries on switches of a local controller, it
delegates the requests to the respective local controller. Note that for high availability, the root
controller can register itself as the slave controller for a speci�c switch (this behavior is supported
in OpenFlow 1.2).
Deployment Model. The deployment model of Kandoo controllers depends on the charac-
teristics of a network. For software switches, local controllers can be directly deployed on the
same end-host. Similarly, if we can change the software of a physical switch, we can deploy
Kandoo directly on the switch. Otherwise, we deploy Kandoo local controllers on the processing
resources closest to the switches. In such a setting, one should provision the number of local
controllers based on the workload and available processing resources. Note that we can use a hybrid model in real settings.
Figure 3.3: Kandoo's high level architecture.
For instance, consider a virtualized deployment environment
depicted in Figure 3.4, where virtual machines are connected to the network using software
switches. In this environment, we can place local controllers in end-hosts next to software
switches and in separate nodes for other switches.
In our toy example (Figure 3.2), we have four Kandoo controllers: three local controllers
controlling the switches and a root controller. The local controllers can be physically positioned
using any deployment model explained above. Note that, in this example, we have the maximum
number of local controllers required.
Control Applications. In Kandoo, control applications are implemented using the abstraction
provided by the platform and are not aware of their actual placement (e.g., on local controllers
or on the root controller). They are generally OpenFlow applications and can therefore send
OpenFlow messages and listen on events. Moreover, they can emit application-defined events,
which can be consumed by other applications, and can reply to an event emitted by another
application. In Kandoo, control applications use Packet [5] to declare, serialize, and deserialize
events.
Figure 3.4: Kandoo in a virtualized environment. For software switches, we can leverage the same end-host for local controllers, and, for physical switches, we use separate processing resources.
Control applications are loaded in local name spaces and can communicate only using
events. This is to ensure that Kandoo does not introduce faults by offloading applications. In
our example, E_elephant is an application-defined event that carries matching information about
the detected elephant flow (e.g., its OpenFlow match structure) and is emitted by Appdetect.
A local controller can run an application only if the application is local. In our example,
Appreroute is not local, i.e., it may install flow-entries on any switch in the network. Thus, the root
controller is the only controller able to run Appreroute. In contrast, Appdetect is local; therefore,
all controllers can run it.
Event Propagation. The root controller can subscribe to specific events in the local controllers
using a simple messaging channel plus a filtering component. Once the local controller receives
and locally processes an event, it relays the event to the root controller for further processing.
Note that all communications between Kandoo controllers are event-based and asynchronous.
In our example, the root controller subscribes to events of type E_elephant in the local
controllers since it is running Appreroute listening on E_elephant. E_elephant is fired by an Appdetect
instance deployed on one of the local controllers and is relayed to the root controller. Note that if
the root controller does not subscribe to E_elephant, the local controllers will not relay E_elephant
events.
It is important to note that the data flow in Kandoo is not always bottom-up. A local
application can explicitly request data from an application deployed on the root controller by
emitting an event, and applications on the root controllers can send data by replying to that
event. For instance, we can have a topology service running on the root controller that sends
topology information to local applications by replying to events of a speci�c type.
Reactive vs. Proactive. Although Kandoo provides a scalable method for event handling, we
strongly recommend pushing network state proactively. We envision Kandoo to be used as a
scalable, adaptive control plane, where the default con�guration is pushed proactively and is
adaptively refined afterwards. In our toy example, default paths can be pushed proactively, while
elephant flows will be rerouted adaptively.
3.3 Evaluation
In this section, we present the results obtained for the elephant flow detection problem. This
evaluation demonstrates the feasibility of Kandoo to distribute the control plane at scale and
strongly supports our argument.
Setup. In our experiments, we realize a two-layer hierarchy of Kandoo controllers as shown
in Figure 3.5. In each experiment, we emulate an OpenFlow network using a slightly modified
version of Mininet [78] hosted on a physical server equipped with 64 GB of RAM and 4 Intel
Xeon(R) E7-4807 CPUs (each with 6 cores). Here, we do not enforce any rate, delay or buffering
for the emulated network. We use OpenVSwitch 1.4 [101] as our kernel-level software switch.
Elephant Flow Detection. We implemented Elephant Flow Detection applications (i.e.,
Appreroute and Appdetect) as described in Section 3.2. As depicted in Figure 3.5, Appdetect
is deployed on all Kandoo local controllers, whereas Appreroute is deployed only on the root controller.
Figure 3.5: Experiment Setup. We use a simple tree topology. Each switch is controlled by one local Kandoo controller. The root Kandoo controller manages the local controllers.
Our implementation of Appdetect queries only the top-of-rack (ToR) switches to
detect the elephant flows. To distinguish ToR switches from the core switch, we implemented
a simple link discovery technique. Appdetect fires one query per flow per second and reports a
flow as elephant if it has sent more than 1MB of data. Our implementation of Appreroute installs
new flow entries on all switches for the detected elephant flow.
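As a rough illustration of the detection rule only (not Kandoo's actual code), the per-flow check used by Appdetect can be sketched as follows in Go; the FlowStats type is an assumption made for this example.

package sketch

// A minimal sketch of Appdetect's detection rule: a flow queried once per
// second is reported as an elephant once it has sent more than 1MB of data.
// The FlowStats type is an assumption made for this illustration.

const elephantThreshold = 1 << 20 // 1MB, the threshold used in our experiments

type FlowStats struct {
	Match     string // an identifier for the flow (e.g., its OpenFlow match)
	ByteCount uint64 // bytes sent by the flow so far, from the switch's flow stats
}

// isElephant reports whether the flow should be reported to Appreroute.
func isElephant(fs FlowStats) bool {
	return fs.ByteCount > elephantThreshold
}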
Learning Switch. In addition to Appdetect and Appreroute, we use a simple learning switch
application on all controllers to set up paths. This application associates MAC addresses to ports
and installs respective fine-grained flow entries (i.e., destination MAC address, IP address, and
TCP port number) with an idle timeout of 100ms on the switch. We note that the bandwidth
consumed for path setup is negligible compared to the bandwidth consumption for elephant
flow detection. Thus, our evaluation results would still apply, even if we install all paths
proactively.
Methodology. In our experiments, we aim to study how this control plane scales with respect to
the number of elephant flows and the network size compared to a normal OpenFlow network
(where all three applications are on a single controller and Appdetect queries switches at the same
rate). We measure the number of requests processed by each controller and their bandwidth
consumption. We note that our goal is to decrease the load on the root controller in Kandoo.
Local controllers handle events locally, which consume far less bandwidth compared to events
sent to the root controller. Moreover, Kandoo’s design makes it easy to add local controllers
when needed, effectively making the root controller the only potential bottleneck in terms of
scalability.
Our first experiment studies how these applications scale with respect to the number of
elephant flows in the network. In this experiment, we use a tree topology of depth 2 and
fanout 6 (i.e., 1 core switch, 6 top-of-rack switches, and 36 end-hosts). Each end-host initiates
one hundred UDP flows randomly to other hosts in the network over the course of four seconds.
We use two types of UDP flows: (i) mouse flows that will last only 1 second, and (ii) elephant
flows that will last for 4 seconds. This results in 3600 UDP flows in total. In this experiment, we
measure the rate of messages sent over the control channels, averaged for 25 experiments.
To forward a UDP flow towards an end-host connected to the same ToR switch, we install
only one flow-entry. Otherwise, we will install three flow-entries (one on the source ToR switch,
one on the core, and one on the destination ToR switch). The probability of establishing a UDP
flow from a given host to another host on the same ToR switch is only 14%. This synthetic
workload stresses Kandoo since most flows are not local. As reported in [16], datacenter traffic
has locality, and Kandoo would therefore perform better in practice.
As depicted in Figure 3.6, control channel consumption and the load on the central
controller are considerably lower for Kandoo, even when all flows are elephant (in which case
Appdetect fires an event for each flow, reflecting the maximum number of events Kandoo will
fire for elephant flow detection): five times smaller in terms of messages (Figure 3.6a), and an
order of magnitude smaller in terms of bytes (Figure 3.6b). The main reason is that, unlike in the
normal OpenFlow setup, the central controller does not need to query the ToR switches. Moreover,
Appdetect fires one event for each elephant flow, which results in significantly fewer events.
To study Kandoo's scalability based on the number of nodes in the network, we fix the ratio of elephant flows at 20% and experiment with networks of different fanouts.
Figure 3.6: Control Plane Load for the Elephant Flow Detection Scenario. The load is based on the number of elephant flows in the network: (a) average number of messages received by the controllers; (b) average number of bytes received by the controllers.
As illustrated in
Figure 3.7, the role of local controllers is more pronounced when we have larger networks. These
controllers scale linearly with the size of the network and effectively shield the control channels
from frequent events. Consequently, Kandoo's root controller handles significantly fewer events
compared to a normal OpenFlow network.
Based on these two experiments, Kandoo scales significantly better than a normal OpenFlow
network. The performance of Kandoo can be improved even further by simple optimization in
Appdetect. For example, the load on the central controller can be halved if Appdetect queries only
the flow-entries initiated by the hosts directly connected to the respective switch.
Chapter 3. Kandoo 55
Figure 3.7: Control Plane Load for the Elephant Flow Detection Scenario. The load is based on the number of nodes in the network: (a) average number of packets received by the controllers; (b) average number of bytes received by the controllers.
Moreover,
the perceived functionality of our elephant flow rerouting application remains the same in both
cases. That is, Kandoo neither improves nor hurts the ability to detect or reroute elephant flows.
In essence, the main advantage of Kandoo is that it enables us to deploy the same application in
large-scale networks by shielding the control channels from the load of frequent events.
3.4 Discussion
Datapath Extensions. The problem that we tackle with Kandoo is a generalization of several
previous attempts at scaling SDN. A class of solutions, such as DIFANE [131] and DevoFlow [34],
address this problem by extending data plane mechanisms of switches with the objective of
reducing the load towards the controller. DIFANE tries to partly offload forwarding decisions
from the controller to special switches, called authority switches. Using this approach, network
operators can reduce the load on the controller and the latencies of rule installation. DevoFlow,
on the other hand, introduces new mechanisms in switches to dispatch far fewer “important”
events to the control plane. Kandoo has the same goal, but, in contrast to DIFANE and
DevoFlow, it does not extend switches; instead, it moves control plane functions closer to
switches. Kandoo’s approach is more general and works well in datacenters, but it might have a
lower throughput than specific extensions implemented in hardware.
Processing frequent events using hardware primitives in forwarding silicon can result in
considerably better efficiency and lower costs at scale compared to a centralized controller. This
method, however, comes with its own pitfalls [58]: (i) Designing special-purpose hardware
for a new function imposes a rigid structure; hence, that function cannot evolve frequently. If
we need to update the design (say, to fix an error or to improve performance), we need to wait
a considerably long time for the next development cycle. (ii) Designing, developing, and
manufacturing special-purpose hardware is quite costly (orders of magnitude more expensive
than software), which increases the risk for already risk-averse ASIC manufacturers.
Because of these pitfalls, silicon is not the best way to introduce new network functions
whose chance of change is not negligible. In essence, it would make sense to use silicon for
implementing a specific function only when (i) the function is mature enough (i.e., applied
and tested in the real world for quite some time), and (ii) we have hit a performance barrier on
commodity hardware. This explains OpenFlow's focus on forwarding as the main data path
functionality.
Interestingly, we can use Kandoo to prototype and test DIFANE, DevoFlow, or other
potential hardware extensions. For instance, an authority switch in DIFANE can be emulated by
a local Kandoo controller that manages a subset of switches in the network. As another example,
DevoFlow’s extensions can also be emulated using Kandoo controllers directly installed on
switches. These controllers not only replace the functionality of DIFANE or DevoFlow, but they
also provide a platform to run any local control application in their context.
Distributed Controllers. HyperFlow [116], ONIX [75], SiBF [81], and Devolved Controllers [115]
try to distribute the control plane while maintaining logically centralized, eventually consistent
network state. Although these approaches have their own merits, they impose limitations on
applications they can run. This is because they assume that all applications require the network-
wide state; hence, they cannot be of much help when it comes to local control applications. That
said, the distributed controllers can be used to realize a scalable root controller, the controller
that runs non-local applications in Kandoo.
Middleboxes. An alternative way to introduce a new network function is outside the scope of
SDN in software modeled as black-boxes sitting either at the core or at the edge of the network,
or software deployed on processing resources available inside switches such as co-processors and
FPGAs. The common trait among these proposals is that they all rely on SDN solely for steering
packets through black-boxes and packet processing units [58]. Middlebox architectures, such as
FlowStream [50], SideCar [113] and CoMb [110], provide scalable programmability in the data
plane by intercepting flows using processing nodes in which network applications are deployed.
Kandoo is orthogonal to these approaches in the sense that it operates in the control plane,
but it provides a similar distribution for control applications. In a network equipped with
FlowStream, SideCar or CoMb, Kandoo can share the processing resources with middleboxes
(given that control and data plane applications are isolated) in order to increase resource
utilization and decrease the number of nodes used by Kandoo.
Active Networks. Active Networks (AN) and SDN represent different schools of thought on
programmable networks. SDN provides programmable control planes, whereas ANs allow
programmability in networking elements at packet transport granularity by running code
encapsulated in the packet [107, 125] or installed on the switches [114, 36]. An extreme
deployment of Kandoo can deploy local controllers on all switches in the network. In such
a setting, we can emulate most functionality of ANs. That said, Kandoo differs from active
networks in two ways. First, Kandoo does not provide inbound packet processing; instead, it
follows the fundamentally different approach proposed by SDN. Second, Kandoo is not an all-
or-nothing solution (i.e., there is no need to have Kandoo support on switches). Using Kandoo,
network operators can still gain efficiency and scalability using commodity middleboxes, each
controlling a partition of switches.
3.5 Remarks
Kandoo is highly configurable, quite simple and scalable. It uses a simple yet effective approach
for creating a distributed control plane: it processes frequent events in highly parallel local
control applications and rare events in a central location. As confirmed by our experiments,
Kandoo scales remarkably better than a normal OpenFlow implementation, without modifying
switches or using sampling techniques.
Kandoo can co-exist with other controllers by using either FlowVisor [112] or customized
Kandoo adapters. Having said that, extra measures should be taken to ensure consistency. The
major issue is that Kandoo local controllers do not propagate an OpenFlow event unless the root
controller subscribes to that event. Thus, without subscribing to all OpenFlow events in all local
controllers, we cannot guarantee that existing OpenFlow applications work as expected.
In the next chapter, we present another distributed control platform, Beehive, that supports
a larger category of distributed control applications beyond local control applications. Using
Beehive, one can create a generic hierarchy of controllers as well as other topologies for control
applications.
Chapter 4
Beehive
As demonstrated in Chapter 3, Kandoo is quite effective when the control logic can be broken
into local and centralized applications. However, there are applications that are neither local
nor in need of the whole network-wide state. For example, a virtual networking application can
operate using the state of a single tenant and does not require a centralized view of the network.
Kandoo models such applications as centralized applications, which is unnecessarily limited.
Aiming to address this issue, in this chapter, we present Beehive, which is powerful enough
to support a wide range of applications from local to centralized, and a variety of distribution
models including generic hierarchies.
4.1 Introduction
Distributed control platforms are employed in practice for the obvious reasons of scale and
resilience [75, 61]. Existing distributed controllers are designed for scalability and availability,
yet expose the complexities and boilerplates of realizing a control application to network
programmers [86]. In such a setting, designing a distributed control application involves
significant effort beyond the design and implementation of the application's core: one needs
to address consistency, concurrency, coordination, and other common hurdles of distributed
programming that are pushed into control applications in existing distributed control platforms.
Beehive. Our motivation is to develop a distributed control plane that is scalable, efficient,
and yet straightforward to program. Our proposal is composed of a programming model (by
programming model, we mean a set of APIs used to develop control applications in a general
purpose programming language, not a new domain-specific language) and a control platform.
Using Beehive's programming model, control applications are developed
as a collection of message handlers that store their state in dictionaries (i.e., a hash-map,
an associative array), intentionally similar to centralized controllers to preserve simplicity.
Beehive’s distributed control platform is the runtime of our programming model and is hidden
from network programmers. Exploiting the proposed programming model, it seamlessly
transforms centralized applications into their distributed counterparts.
Using this programming model, Beehive is able to provide fault-tolerance, instrumentation,
and automatic optimization. To isolate local faults, our platform employs application-level
transactions with "all-or-nothing" semantics. Moreover, it employs mini-quorums to replicate
application dictionaries and to tolerate failure. This is all seamless without intervention by the
programmer. Beehive also provides efficient runtime instrumentation (including but not limited
to resource consumption, and the volume of messages exchanged) for control applications. The
instrumentation data is used to (i) automatically optimize the control plane, and (ii) to provide
feedback to network programmers. In this chapter, we showcase a greedy heuristic as a proof of
concept for optimizing the placement of control applications, and provide examples of Beehive’s
runtime analytics for control applications.
Why not an external datastore? Emerging control platforms, most notably ONOS [17], try
to resolve controller complexities by delegating their state management to an external system
(e.g., replicated or distributed databases). In Beehive, we propose an alternative design since
delegating such functionality has important drawbacks.
It has been shown that, for SDN, using an external datastore incurs a considerable
overhead [76]. After all, embedding the state inside the controllers (as pioneered by ONIX [75])
has significantly lower communication overheads than using an external system. Further,
creating a correct distributed control function requires coordination and synchronization
among control applications. This is essentially beyond state management as demonstrated
in [123, 23]. For example, even if two instances of a routing application store their state in a
distributed store, they need to coordinate in order to avoid blackholes and loops. Implementing
such coordination mechanisms eliminates many of the original motivations of utilizing an
external datastore.
More importantly, in such a setting, it is the external datastore that dictates the placement
and the topology of control functions. For instance, if the external datastore is built as a ring, the
control plane cannot form an effective hierarchy, and vice versa. This lack of fine-grained control
over the placement of state also limits the capabilities of the control plane to self-optimize,
which is one of the goals of Beehive and has been shown to be crucial for SDN [37]. For instance,
consider a controller with a virtual networking application that is managing a virtual network.
If the virtual network migrates, the control plane needs to migrate the state in accordance with
the change in the workload. This is difficult to achieve without fine-grained control. Similarly,
providing Beehive's runtime instrumentation, analytics, and automatic optimization is quite
difficult to realize when an external datastore is employed.
Overview. Our implementation of Beehive is open source and publicly available [2]. Using
Beehive, we have also implemented a full-featured SDN controller [3] (presented in Chapter 5)
as well as important SDN and non-SDN sample applications such as Kandoo [57], traffic
engineering, a key-value store (similar to memcached), a message queue, and graph processing.
With systematic optimizations, our implementation can fail over in less than 200ms, and our
SDN controller can handle more than 200k OpenFlow messages per machine while persisting
and replicating the state of control applications. Interestingly, existing controllers that use an
external datastore can handle up to a thousand messages with persistence and replication. We
also demonstrate that our runtime instrumentation has negligible overheads. Moreover, using
this runtime instrumentation data, we showcase a heuristic algorithm that can optimize the
placement of control applications.
App LearningSwitch:
  // A message handler that handles messages of type PacketIn.
  Rcv(pin PacketIn):
    switches = Dict("Switches")
    // Associate the source MAC address to the input port for the switch
    // that has sent the PacketIn message.
    mac2port = switches.Get(pin.Switch)
    mac2port[pin.SrcMac] = pin.InPort
    switches.Put(pin.Switch, mac2port)
    // Look up the output port for the destination MAC address.
    outp = mac2port.Get(pin.DstMac)
    if !outp then
      // Flood the packet if the destination MAC address is not learned yet.
      Reply(pin, PacketOut{Port: FLOOD_PORT, ...})
      return
    // Send the packet to the learned output port.
    Reply(pin, PacketOut{Port: outp, ...})
    // Install a flow on the switch to forward subsequent packets.
    Reply(pin, FlowMod{...})

App OpenFlowDriver:
  Rcv(pout PacketOut):
    // Send the packet out to the switch using the OpenFlow connection.
    ...
  Rcv(fmod FlowMod):
    // Send the flow mod to the switch using the OpenFlow connection.
    ...
  // The IO routine.
  while read data from the OpenFlow connection do
    ...
    // The emitted PacketIn will be sent to all applications that have a
    // handler for PacketIn, including the LearningSwitch application.
    Emit(PacketIn{...})
    ...

Figure 4.1: A simple learning switch application in Beehive.
4.2 Programming Model
In Beehive, programs are written in a general purpose language (Go in our implementation)
using a specific set of APIs called the Beehive programming model. Creating the distributed
version of an arbitrary control application that stores and shares its state in free and subtle forms
is extremely difficult in general. The Beehive programming model enables us to do that, while
imposing minimal limitations on the control applications.
The Anatomy of an Application. Using Beehive's programming model, a control application is
implemented as a set of single-threaded message handlers (denoted by Rcv) that are triggered by
asynchronous messages and can emit further messages to communicate with other applications.
For example, as shown in Figure 4.1, a sample learning switch application has a Rcv function for
PacketIn messages, and an OpenFlow driver has Rcv functions for PacketOut and FlowMod
messages. In our design, messages are either (i) emitted to all applications that have a handler
for that particular type of message, or (ii) generated as a reply to another message. A reply
message is processed only at its destination application. For example, the OpenFlow driver
emits a PacketIn message whenever it receives the event from the switch. Inside the PacketIn
message, Beehive automatically stores that this message is sent by the OpenFlow driver. The
PacketIn message is processed in any application that has a handler for PacketIn messages, say
the learning switch and a discovery application. �e learning switch application, upon handling
this PacketIn message, replies to the message with a PacketOut and/or a FlowMod message,
and these messages are directly delivered to the OpenFlow driver, not any other application.
To preserve state, message handlers use local application dictionaries (i.e., maps or
associative arrays). In Figure 4.1, the learning switch application uses its Switches dictionary
to store the learned “MAC address to port mapping” for each switch. As shown in Figure 4.2,
the Switches dictionary associates a switch to the “MAC address to port mapping” learned
from the packets received from that switch. As explained in Section 4.3, these dictionaries
are transactional, replicated, and persistent behind the scenes (i.e., hidden from control
applications, and without network programmer’s intervention). Note that message handlers
are arbitrary programs written in a general purpose programming language, with the limitation
that any data stored outside the dictionaries is ephemeral.
Let us demonstrate this programming model with another simple example.
Example: Elephant Flow Routing. As shown in Figure 4.4, an elephant flow routing application
can be modeled as four message handlers at its simplest: (i) Init, which handles SwitchJoined
messages emitted by the OpenFlow driver and initializes the Stats dictionary for switches.
(ii) Query, which handles periodic timeouts and queries switches. (iii) Collect, which handles StatReplies (i.e., sent by the OpenFlow switch in response to stat queries), populates the time-series of flow statistics in the Stats dictionary, and detects and reports traffic spikes. (iv) Route, which re-steers elephant flows using FlowMod messages upon detecting a traffic spike. Init, Query, and Collect share the Stats dictionary to maintain the flow stats, and Route stores an estimated traffic matrix in the TrafficMatrix dictionary. It also maintains its topology data, which we omitted for brevity.
Figure 4.2: The LearningSwitch application (Figure 4.1) learns the MAC address to port mapping on each switch and stores them in the Switches dictionary.
Inter-Dependencies. Message handlers can depend on one another in two ways: (i) they can
either exchange messages, or (ii) access an application-defined dictionary, only if they belong to
the same application. The interplay of these two mechanisms enables the control applications to
mix different levels of state synchronization and hence different levels of scalability. As explained
shortly, communication using the shared state results in synchronization in the application in
contrast to asynchronous messages.
In our sample elephant flow routing application, Init, Collect, and Query share the Stats
dictionary, and communicate with Route using TrafficSpike messages. Moreover, Init,
Collect, Query, and Route depend on the OpenFlow driver that emits SwitchJoined and
StatReply messages and handles StatQuerys and FlowMods.
Message handlers of two different applications can communicate using messages, but cannot
access each other's dictionaries. This is not a limitation, but a programming paradigm that
fosters scalability (as advocated in Haskell, Erlang, Go, and Scala). Dependencies based on messages result in a better decoupling of functions, and hence can give Beehive the freedom to optimize the placement of applications. That said, due to the nature of stateful control applications, functions inside an application can share state.
App ElephantFlowRouting:
  // Init: Handles SwitchJoined messages.
  Rcv(joined SwitchJoined):
    stats = Dict("Stats")
    switch = joined.Switch
    // Initializes the stats dictionary for the new switch.
    stats.Put(switch, FlowStat{Switch: switch})

  // Query: Handles timeout messages. The timer emits one timeout message per second.
  Rcv(to TimeOut):
    stats = Dict("Stats")
    // Iterates over the keys in the stats dictionary. Note that these entries are initialized in Init.
    for each switch in stats:
      // Emits a StatQuery for each switch in the "stats" dictionary.
      // Note that StatQuery is handled by the OpenFlow driver.
      Emit(StatQuery{Switch: switch})

  // Collect: Handles StatReply messages emitted by the OpenFlow driver in response to StatQuery messages.
  Rcv(reply StatReply):
    stats = Dict("Stats")
    switch = reply.Switch
    // Compares the previous statistics with the new one, and emits a TrafficSpike
    // message if it detects a significant change in the stats.
    oldSwitchStats = stats.Get(switch)
    newSwitchStats = reply.Stats
    if there is a spike in newSwitchStats vs oldSwitchStats then
      Emit(TrafficSpike{Stats: newSwitchStats})
    stats.Put(switch, newSwitchStats)

  // Route: Handles messages of type TrafficSpike.
  Rcv(spike TrafficSpike):
    trafficMatrix = Dict("TrafficMatrix")
    topology = Dict("Topology")
    // Update trafficMatrix based on spike and then find the elephant flows.
    ...
    elephantFlows = findElephantFlows(trafficMatrix)
    for each flow in elephantFlows:
      // Emit flow mods to reroute elephant flows based on the topology.
      Emit(FlowMod{...})

  // Details on handling switch and link discovery are omitted for brevity.

Figure 4.3: This simple elephant flow routing application periodically collects flow stats and reroutes traffic accordingly. The schematic view of this algorithm is also portrayed in Figure 4.4.
Figure 4.4: The schematic view of the simple elephant flow routing application presented in Figure 4.3.
It is important to note that Beehive
provides utilities to emulate synchronous and asynchronous cross-application calls on top of
asynchronous messages. These utilities are as easy to use as function calls across applications.
Consistent Concurrency. Transforming a “centralized looking” application to a concurrent,
distributed system is trivial for stateless applications: one can replicate all message handlers on
all cores on all machines. In contrast, this process can be challenging when we deal with stateful
applications. To realize a valid, concurrent version of a control application, we need to make
sure that all message handlers, when distributed on several controllers, have a consistent view
of their state and their behavior is identical to when they are run as a single-threaded, centralized
application.
To preserve state consistency, we need to ensure that each key in each dictionary is
exclusively accessed in only one thread of execution in the distributed control plane. In
our elephant flow routing example (Figure 4.4), Init, Query and Collect access the Stats
dictionary on a per-switch basis. As such, if we process all the messages of a given switch
in the same thread, this thread will have a consistent view of the statistics of that switch. In
contrast, Route accesses the whole Topology and TrafficMatrix dictionaries and, to remain
consistent, we have to make sure that all the keys in those dictionaries are exclusively owned by
a single thread. Such a message handler is essentially centralized.
4.3 Control Platform
Beehive’s control platform acts as the runtime environment for the proposed programming
model. In other words, the programming model is the public API that is used to develop control
applications, and the control platform is our implementation of that API. We have implemented
Beehive’s control platform in Go. Our implementation is open source and available [2] as a
generic, distributed message passing platform with functionalities such as queuing, parallelism,
replication, and synchronization, with no external dependencies. This control platform is built
based on the primitives of hives, cells, and bees for concurrent and consistent state access in a
distributed fashion with runtime instrumentation and automatic optimization. We note that
these primitives (i.e., hives, cells and bees) are internal to the control platform and are not
exposed. That is, network programmers are not expected to know or use these concepts in their
control applications.
4.3.1 Primitives
Hives and Cells. In Beehive, a controller is denoted as a hive that maintains applications’
state in the form of cells. Each cell is a key-value pair in a specific application dictionary:
(dict, key, val). For instance, our elephant flow routing application (Figure 4.4) has a cell
for the flow statistics of switch S in the form of (Stats, S, Stats_S).
Figure 4.5: In Beehive, each hive (i.e., controller) maintains the application state in the form of cells (i.e., key-value pairs). Cells that must be colocated are exclusively owned by one bee. Messages mapped to the same cell(s) are processed by the same bee.
Bees. For each set of cells that must be co-located to preserve consistency (as discussed in
Section 4.2), we create an exclusive, logical and light-weight thread of execution called a bee.
As shown in Figure 4.5, upon receiving a message, the hive finds the particular cells required to
process that message in a given application and, consequently, relays the message to the bee that
exclusively owns those cells. A bee, in response to a message, invokes the respective message
handlers using its cells as the application's state. If there is no such bee, the hive creates one
to handle the message.
In our elephant flow routing example, once the hive receives the first SwitchJoined
message for switch S, it creates a bee that exclusively owns the cell (Stats, S, Stats_S) since
the Rcv function (i.e., Init) requires that cell to process the SwitchJoined message. Then,
the respective bee invokes the Rcv function for that message. Similarly, the same bee handles
consequent StatReplys for S since it exclusively owns (Stats, S, Stats_S). This ensures that
Collect and Init share the same consistent view of the Stats dictionary and their behavior is
identical to when they are deployed on a centralized controller, even though the cells of different
switches might be physically distributed over different hives.
Finding the Bee. The precursor to finding the bee to relay a message is to infer the cells required
to process that message. In Beehive’s control platform, each message handler is accompanied
with a Map function that is either (i) automatically generated by the platform or (ii) explicitly
implemented by the programmer.
Map(A, M) is a function of application A that maps a message of type M to a set of cells (i.e., keys
in application dictionaries {(D_1, K_1), ..., (D_n, K_n)}). We call this set the mapped cells of M in A.
We note that, although we have not used a stateful Map function, a Map function can be stateful
with the limitation that its state is local to each hive.
We envision two special cases for the mapped cells: (i) if nil, the message is dropped for
application A. (ii) if empty, the message is broadcast to all bees on the local hive. The latter
is particularly useful for simple iterations over all keys in a dictionary (such as Query in our
elephant flow routing example).
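As a concrete illustration, an explicitly written Map for the Collect handler of Figure 4.3 could look roughly like the following Go sketch; the CellKey and StatReply types are stand-ins defined here for this example and mirror the pseudocode, not the verbatim API of our library.

package sketch

// A minimal sketch of an explicit Map function for the Collect handler of
// Figure 4.3. CellKey and StatReply are stand-in types defined for this
// illustration.

// CellKey identifies a key in a named application dictionary.
type CellKey struct {
	Dict string
	Key  string
}

// StatReply is the message carrying flow statistics for one switch.
type StatReply struct {
	Switch string
	// Stats omitted for brevity.
}

// MapCollect returns the mapped cells of a StatReply: the per-switch entry in
// the "Stats" dictionary. All messages mapped to intersecting cells are
// guaranteed to be handled by the same bee.
func MapCollect(reply StatReply) []CellKey {
	return []CellKey{{Dict: "Stats", Key: reply.Switch}}
}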
Automatically Generated Map Functions. Our control platform is capable of generating the Map
functions at both compile-time and runtime. Our compiler automatically generates the missing
Map functions based on the keys used by message handlers to process a message. The compiler
parses the message handlers, prunes the code paths that are not reachable by a dictionary access
instruction, and generates the code to retrieve respective keys to build the intended mapped
cells.
Map(joined SwitchJoined):
  switch = joined.Switch
  return {("Stats", switch)}

Map(to TimeOut):
  return {}

Map(reply StatReply):
  switch = reply.Switch
  return {("Stats", switch)}

Figure 4.6: Automatically generated Map functions based on the code in Figure 4.3.
For example, as shown in Figure 4.6, the compiler generates the Map functions based on the
code in Init, Query, and Collect, by parsing the Rcv functions and inferring the dictionary
accesses. Processing Collect, for instance, our compiler first detects that the stats local
variable is a reference to a dictionary named Stats. Then, it finds the Put and Get invocations
on Stats. Parsing the parameters of those functions, it infers that the local variable switch
is passed as the key to both Get and Put invocations. Thus, the Map function for Collect
should return ("Stats", switch). Since switch is an expression, the compiler also needs to
include all the statements that affect switch (e.g., switch = reply.Switch), while ignoring all
other statements that have no effect on this expression. In other words, the compiler keeps the
statements required to create the keys used to access dictionaries in the Rcv function.
Our compiler treats foreach loops in a different way. In cases where the statements inside the
foreach loop access only the iterator key, it generates a local broadcast (i.e., an empty mapped
cells) for the Map function. For instance, in Query, we only access switch, which is the iterator
key. As such, Query accesses the dictionary on a per-key basis. Thus, if we send the TimeOut
message to all bees on the current hive, each bee will call Query that iterates on all the cells of
that bee. This is the expected behavior for Query.
In complicated cases where the compiler does not have enough information to infer the keys
(e.g., cross-library or recursive calls), it generates a Map function that results in a centralized
application. This ensures the generated Map function is correct, but not necessarily efficient.
To mitigate that, Beehive also provides a generic runtime Map function that invokes the Rcv
function for the message and records all state accesses in that invocation without actually
applying the modifications. This method is generic and widely applicable, but it has a higher
cost compared to when the Map function is generated by the compiler or when it is explicitly
provided by the application.
Life of a Message. In Beehive, a message is basically an envelope wrapping a piece of data that is
emitted by a bee and optionally has a specific destination bee. As shown in Figure 4.7, a message
has the following life-cycle on our platform:
Figure 4.7: Life of a message in Beehive.
1. A bee emits the message upon receiving an event (such as an IO event, or a timeout) or replies
to a message that it is handling in a Rcv function. In the latter case, the message will have a
particular destination bee.
2. If the emitted message has a destination bee, it is directly relayed to that bee. Otherwise, on
the same hive, we pass the message to the Map functions of all applications that have a handler
for that type of message. This is done asynchronously via queen bees. For each application on
each hive, we allocate a single queen bee that is responsible for mapping messages using the Map
function of its application. Queen bees are in essence Beehive’s message routers.
3. For each application, the Map function maps the message to application cells. Note that hives
are homogeneous in the sense that they all host the same set of applications, even though
they process different messages and their bees and cells are not identical at runtime.
4. After finding the mapped cells of a message, the queen bee tries to find the bee that exclusively
owns those cells. If there is such a bee (either on the same hive or on a remote hive), the
message is accordingly relayed. Otherwise, the queen bee creates a new bee (by default, the
new bee is created on the local hive and moved to an optimal hive later; the application can
provide a custom placement strategy in cases where the default local placement is not a good
fit for that application's design), assigns the cells to it, and relays the message. To assign cells
to a bee, we use the Raft consensus algorithm [96] among hives. Paying one Raft consensus
overhead to create a bee can be quite costly. To avoid that, queen bees batch lock proposals
(used to assign cells to bees) for messages that are already enqueued with no added latency or
wait. This results in paying one Raft overhead for each batch.
Figure 4.8: Throughput of a queen bee for creating new bees, using different batch sizes. Here, we have used clusters of 1, 3, 5, 7, 9 and 11 hives. For larger batch sizes, the throughput of a queen bee is almost the same for all clusters.
When the queen bee sends a batch of lock proposals, each proposal can either
succeed or fail. If a lock proposal fails, a bee on another hive has already locked the cells. In
such cases, the queen bee relays the message to the remote bee. As shown in Figure 4.8, the
average latency of creating a new bee can be as low as 140µs even when we use a cluster of 11
hives. Clearly, the larger the batch size, the better the throughput.
5. For each application, the respective bee processes the message by invoking the Rcv function.
6. If the Rcv function replies to the message, the reply message is directly sent to the
bee that has emitted the received message (i.e., emitted in Step 1), and if it emits a message,
the message will go through Step 1.
It is important to note that, in our control platform, we use an unreliable message delivery
mechanism by default. For applications that require reliable message delivery, Beehive provides
lightweight reliable delivery wrappers. These wrappers have their own overheads as they store
their messages in a dictionary, exchange sequenced messages with other bees, acknowledge
received messages, drop duplicate messages, and retransmit messages upon timeouts.
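The receive side of such a wrapper can be sketched roughly as follows; all names here are assumptions made for illustration and not the wrappers' actual interface.

package sketch

// A minimal sketch of the receive side of a reliable-delivery wrapper as
// described above: sequenced messages, acknowledgements, and duplicate
// suppression.

// SequencedMsg wraps an application message with a per-sender sequence number.
type SequencedMsg struct {
	Sender string
	Seq    uint64
	Body   interface{}
}

// Ack acknowledges that every message from Sender up to Seq has been received.
type Ack struct {
	Sender string
	Seq    uint64
}

// receiver tracks the highest in-order sequence number seen per sender; in
// Beehive this state would live in an application dictionary.
type receiver struct {
	lastSeq map[string]uint64
}

func newReceiver() *receiver { return &receiver{lastSeq: map[string]uint64{}} }

// onMessage returns the acknowledgement to send back and whether the message
// is new (and should be delivered) or a duplicate to drop.
func (r *receiver) onMessage(m SequencedMsg) (Ack, bool) {
	last := r.lastSeq[m.Sender]
	if m.Seq <= last {
		// Duplicate or stale retransmission: re-acknowledge, do not deliver.
		return Ack{Sender: m.Sender, Seq: last}, false
	}
	r.lastSeq[m.Sender] = m.Seq
	return Ack{Sender: m.Sender, Seq: m.Seq}, true
}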
Preserving Consistency Using Map. To preserve consistency, Beehive guarantees that all
messages with intersecting mapped cells for application A are processed by the same bee
using A’s message handlers. For instance, consider two messages that are mapped to
{(Switch, 1), (MAC, A1:...)} and {(Switch, 1), (Port, 2)} respectively by an application. Since
these two messages share the cell (Switch, 1), the platform guarantees that both messages are
handled by the same bee that owns the 3 cells of {(Switch, 1), (MAC, A1:...), (Port, 2)}. This
way, the platform prevents two different bees from modifying or reading the same part of the
state. As a result, each bee is identical to a centralized, single-threaded application for its own
cells.
Significance of Map. The exact nature of the mapping process (i.e., how the state is accessed) can
impact Beehive's performance. Even though Beehive cannot automatically redesign message
handlers (say, by breaking a specific function in two) or change Map patterns for a given
implementation, it can help developers in that area by providing feedback that includes various
performance metrics, as we will explain in Section 4.3.4.
Conflicting Mapped Cells. It is rare (we have not observed this situation in our real-world
use-cases) yet possible that the platform observes conflicting mapped cells. For example,
consider an application that maps two messages to {(Mac, FF...)} and {(Port, 12)}. For these
two mapped cells, Beehive will assign two distinct bees since they are not intersecting. Now,
if the application maps another message to {(Switch, 1), (Mac, FF...), (Port, 12)}, we cannot
assign this mapped cell to any of the existing bees, nor can we start a new one. In such
conflicting cases, before processing the conflicting message, the platform stops the bees with
conflicting mapped cells, consolidates their dictionaries, and allocates a new bee that owns all
the three cells in {(Switch, 1), (Mac, FF...), (Port, 12)}. If the bees are on different hives, the
platform uses Beehive's bee migration mechanism that is
explained shortly.
Auxiliaries. In addition to the functions presented here, Beehive provides auxiliary utilities for
proactively locking a cell in a handler, for postponing a message, for composing applications, and
for asynchronous and synchronous cross-application calls (i.e., continuation): (i) Applications
can send message M directly to the bee responsible for cell (D, K) of Application A using
SendToCell(M, A, (D, K)). (ii) Using Lock(A, (D, K)), a bee running application A can proac-
tively lock the cell (D, K). This is very useful when a bee needs to own a cell prior to receiving
any message. (iii) Calling Snooze(M, T) immediately exits the message handler and schedules
message M to be redelivered after timeout T. (iv) Applications can use the DeferReply utility
to postpone replying to a message. This utility returns a serializable object that applications can
store in their dictionaries, and later use to reply to the original message. This method is Beehive's
way of providing continuation.
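To make the calling pattern concrete, the sketch below shows how a handler might use some of these utilities; only the utility names come from the text above, while the stub signatures and surrounding types are assumptions made for this illustration.

package sketch

import "time"

// Stand-in types and stubs; the signatures are assumptions for illustration.
type Msg interface{}
type Cell struct{ Dict, Key string }

func SendToCell(m Msg, app string, c Cell) {}
func Lock(app string, c Cell)              {}
func Snooze(m Msg, t time.Duration)        {}

// handle sketches a handler that proactively locks a cell before any message
// for it arrives, forwards a message directly to the bee owning another cell,
// and postpones a message that cannot be processed yet.
func handle(m Msg, ready bool) {
	Lock("ElephantFlowRouting", Cell{Dict: "Stats", Key: "S1"})
	SendToCell(m, "ElephantFlowRouting", Cell{Dict: "TrafficMatrix", Key: "global"})
	if !ready {
		Snooze(m, time.Second) // redeliver m after one second
	}
}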
4.3.2 Fault-Tolerance
For some simple applications, having a consistent replication of the state is not a requirement nor
is transactional behavior. For example, a hub or learning switch can simply restart functioning
with an empty state. For most stateful applications, however, we need a proper replication
mechanism for cells.
Transactions. In Beehive, Rcv functions are transactional. Beehive transactions include the
messages emitted in a function in addition to the operations on the dictionaries. Upon calling
the Rcv function for a message, we automatically start a transaction. This transaction buffers
all the emitted messages along with all the updates on application dictionaries. If there is
an error in the message handler, the transaction will be automatically rolled back. Otherwise,
the transaction will be committed and all the messages will be emitted by the platform. We
will shortly explain how Beehive batches transactions to achieve high throughput, and how it
replicates these transactions for fault-tolerance.
It is important to note that Beehive, by default, hides all the aspects of transaction
management. With that, control applications are implemented as if there is no transaction.
Having said that, network programmers have full control over transactions, and can begin,
commit, or abort a new transaction if they choose to.
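The all-or-nothing behavior of a Rcv invocation can be sketched roughly as follows; the types and helper names are illustrative stand-ins written for this example, not Beehive's internals.

package sketch

// A minimal sketch of the transactional Rcv semantics described above: the
// platform buffers dictionary updates and emitted messages while the handler
// runs, drops them if the handler fails, and applies and emits them on success.

type update struct{ dict, key, val string }

type tx struct {
	updates []update      // buffered dictionary writes
	emitted []interface{} // buffered outgoing messages
}

func (t *tx) Put(dict, key, val string) { t.updates = append(t.updates, update{dict, key, val}) }
func (t *tx) Emit(m interface{})        { t.emitted = append(t.emitted, m) }

// invoke runs a handler inside an implicit transaction with all-or-nothing
// semantics: on error nothing is applied or emitted.
func invoke(handler func(*tx) error, apply func(update), send func(interface{})) error {
	t := &tx{}
	if err := handler(t); err != nil {
		return err // roll back: buffered updates and messages are simply dropped
	}
	for _, u := range t.updates {
		apply(u)
	}
	for _, m := range t.emitted {
		send(m)
	}
	return nil
}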
Colonies. As shown in Figure 4.9, to replicate the transactions of a bee, the platform
automatically forms a colony. Each colony has one leader bee and a few follower bees in
accordance with the application's replication factor (applications can choose to skip replication
and persistence). In a colony, only the leader bee receives and processes messages, and followers
act as purely passive replicas.
Figure 4.9: Hives form a quorum on the assignment of cells to bees. Bees, on the other hand, form quorums (i.e., colonies) to replicate their respective cells in a consistent manner. The size of a bee colony depends on the replication factor of its application. Here, all applications have a replication factor of 3.
Replicating Transactions. Bees inside the colony form a Raft [96] quorum to replicate the
transactions. Note that the platform enforces that the leader of the Raft quorum is always the
leader bee. Before committing a transaction, the leader first replicates the transaction on the
majority of its followers. Consequently, the colony has a consistent state and can tolerate faults
as long as the majority of its bees (⌊replication_factor/2⌋ + 1) are alive.
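For reference, this tolerance bound works out as in the small sketch below.

package sketch

// The fault-tolerance bound stated above: a colony with a given replication
// factor stays consistent as long as a majority of its bees, that is
// floor(replication_factor/2) + 1, are alive.
func majority(replicationFactor int) int {
	return replicationFactor/2 + 1
}

// For example, majority(3) == 2, so a colony with replication factor 3
// tolerates one failed bee; majority(5) == 3 tolerates two.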
Batched Transactions. With a naive implementation, replicating each transaction incurs the
overhead of a quorum for each message. This hinders performance and can result in subpar
throughput for real world applications. To overcome this issue, we batch transactions in
Beehive and replicate each batch once. With that, the overheads of replicating a transaction
are amortized over all the messages processed in that batch.
To evaluate the impact of batching on Beehive’s throughput, we have implemented a simple
key-value store conceptually similar to memcached [43] in less than 150 lines of code. To
measure its throughput, we deployed our sample key-value store application on a 5-node cluster
on Google Compute Engine. Each VM has 4 cores, 16GB of RAM, and 10GB of SSD storage.
We configured the application to use 1024 buckets for the keys with replication factors of 1, 3
and 5, and measured the throughput for batch sizes from 128 to 8192 messages.
Figure 4.10: The latency and throughput of the sample key-value store application: (a) throughput using replication factors of 1 to 5, and batch sizes of 128 to 8k; (b) the PDF of the end-to-end latency for GET and PUT requests in a cluster of 20 hives and 80 clients.
In each run, we
emit 10M PUT (i.e., write) messages and measure the throughput. As shown in Figure 4.10a,
using larger batch sizes significantly improves the throughput of Beehive applications. This
improvement is more pronounced (3 to 4 orders of magnitude) when we use replication factors
of 3 and 5, clearly because the overheads are amortized. We note that here, for measurement
purposes, all the master bees are on the same hive. In real settings, one observes a higher
aggregate throughput from all active hives.
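The bucketing scheme of the sample key-value store can be sketched roughly as follows; the request types and helpers are written for this illustration and are not the application's actual code.

package sketch

import "hash/fnv"

// A minimal sketch of the sample key-value store's bucketing: requests are
// mapped to one of 1024 bucket cells in a "Buckets" dictionary, so all keys in
// the same bucket are owned (and replicated) by the same bee.

const numBuckets = 1024

type Put struct{ Key, Value string }
type Get struct{ Key string }

// bucket maps a key to its bucket index.
func bucket(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % numBuckets
}

// mapPut and mapGet return the mapped cell of a request, so requests for keys
// in the same bucket are processed by the same bee.
func mapPut(m Put) (dict string, cell uint32) { return "Buckets", bucket(m.Key) }
func mapGet(m Get) (dict string, cell uint32) { return "Buckets", bucket(m.Key) }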
Batching vs Latency. In our throughput micro-benchmarks, we send requests to our application
in batches. Although that is the case in practice for most applications, one might ask how the
platform performs when each request is sent only after the response to the previous one is received. To answer
this question, we used another 20-node cluster to query the key-value store sample application.
Each node has 4 clients that send PUT or GET requests over HTTP and then measure the
response time (including the HTTP overheads).
As shown in Figure 4.10b, despite the lack of an effective batching for requests, Beehive
handles most GETs in less than 1ms if the key is on the local hive, and less than 7ms if the key is
on a remote hive. The reason for such a low read latency is that there is no need to replicate read-
only transactions. Moreover, it handles most PUTs in less than 20ms. This indicates that, even in
such a worst-case scenario, Beehive handles requests with a reasonable latency. As demonstrated
above, in the presence of a batch of messages, Beehive achieves a significantly better latency on
average.
Elasticity. Beehive's colonies are elastic. When a follower fails or when a hive leaves the cluster, the leader of the colony creates new followers on live hives if possible. Moreover, if a new hive joins the cluster and the colony is not fully formed, the leader automatically creates new followers. Note that forming a consistent colony is only possible when the majority of hives are alive, since any update to a colony requires a consensus among hives.
Failover. When the leader bee fails, one of the follower bees will become the leader of the Raft quorum. Once the new leader is elected, the quorum updates its transaction logs and any uncommitted (but sufficiently replicated) transaction will be committed (i.e., the state updates are applied and buffered messages are emitted). Note that the failover timeout (i.e., the timeout to start a new election) is configurable by the user.

To demonstrate the failover properties of Beehive, we ran the key-value store sample application with a replication factor of 3 on a 3-node cluster. In this experiment, we allocate the master bee on Hive 1, and send the benchmark queries to Hive 2. After ~5 seconds, we kill Hive 1, and measure the time the system takes to fail over. As shown in Figure 4.11, it takes 198ms, 451ms, and 753ms for the sample key-value store application to fail over for Raft election timeouts of 100ms, 300ms, and 500ms, respectively. Clearly, the smaller the election timeout, the shorter the failover. Having said that, users should choose a Raft election timeout that does not result in spurious timeouts, based on the characteristics of their network.
Figure 4.11: Beehive failover characteristics using different Raft election timeouts. In this experiment, we use the key-value store sample application with a replication factor of 3 on a 3-node cluster. Smaller election timeouts result in faster failover, but they must be chosen to avoid spurious timeouts based on the network characteristics.

Beehive's failover time would not be adversely affected when we increase the number of hives or the colony size, since Raft elections will start according to the election timeout regardless of the quorum size. Moreover, Raft adds randomness to the election timeout to avoid having multiple candidates at the same time. This reduces the probability of unsuccessful elections.
Network Partitions. In the event of a network partition, by default, the platform will be fully operational in the partition with the majority of nodes, to prevent a split brain. Colonies, however, will remain operational as long as the majority of their bees are in the same partition. Moreover, if a leader bee falls into a minority partition, it can serve messages with read-only access to its state. For example, when a LAN is disconnected from other parts of the network, the bees managing switches in that LAN will be able to keep the existing switches operational until the LAN reconnects to the network. To achieve partition tolerance, application dependencies and their placement should be properly designed. Heedless dependencies can result in subpar partition tolerance, similar to any multi-application platform. To mitigate partitions, one can deploy one cluster of Beehive in each availability zone and connect them through the easy-to-use proxy functionalities that Beehive provides.
4.3.3 Live Migration of Bees
We provide the functionality to migrate a bee from one hive to another along with its cells. This is instrumental in optimizing the placement of bees.
Migration without Replication. For bees that are not replicated (i.e., that have a replication factor of one), Beehive first stops the bee, buffers all incoming messages, and then replicates its cells to the destination hive. Then a new bee is created on the destination hive to own the migrated cells. At the end, the cells are assigned to the new bee and all buffered messages are drained accordingly.
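The sequence just described (stop, buffer, copy the cells, create a new owner bee, drain the buffer) can be sketched in Go as follows; the types and the channel-based buffering are illustrative simplifications, not Beehive's implementation.

package main

import "fmt"

// Minimal, illustrative types; not Beehive's actual API.
type Msg string
type Cells map[string]string

type bee struct {
	cells Cells
	inbox chan Msg
}

type hive struct{ name string }

// migrate sketches the unreplicated migration sequence: stop the bee, buffer
// messages that have already arrived, copy its cells to the destination hive,
// create a new bee that owns those cells, and finally drain the buffer.
func migrate(old *bee, dst *hive) *bee {
	// 1. Stop the old bee and collect the messages buffered in its inbox.
	close(old.inbox) // a real system would redirect new messages instead
	var buffered []Msg
	for m := range old.inbox {
		buffered = append(buffered, m)
	}

	// 2. Replicate the cells on the destination hive.
	copied := make(Cells, len(old.cells))
	for k, v := range old.cells {
		copied[k] = v
	}
	fmt.Printf("copied %d cell(s) to %s\n", len(copied), dst.name)

	// 3. Create a new bee on the destination hive that owns the cells.
	nb := &bee{cells: copied, inbox: make(chan Msg, len(buffered)+16)}

	// 4. Drain the buffered messages into the new bee.
	for _, m := range buffered {
		nb.inbox <- m
	}
	return nb
}

func main() {
	old := &bee{cells: Cells{"switch-1": "state"}, inbox: make(chan Msg, 4)}
	old.inbox <- "PacketIn"
	nb := migrate(old, &hive{name: "hive-2"})
	fmt.Println("new bee has", len(nb.inbox), "buffered message(s)")
}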
Migration with Replication. We use a slightly different method for bees that are replicated. For such leader bees, the platform first checks whether the colony has a follower on the destination hive. If there is such a follower, the platform simply stops the current leader from processing further messages while asking the follower to sync all the replicated transactions. Afterwards, the leader stops, and the follower explicitly campaigns to become the new leader of its colony. After one Raft election timeout, the previous leader restarts. If the colony has elected a new leader, the previous leader becomes a follower. Otherwise, this is an indication of a migration failure, and the colony continues as it was. In cases where the colony has no follower on the destination hive, the platform creates a new follower on the destination hive and performs the same process.
4.3.4 Runtime Instrumentation
Control functions that access a minimal state would naturally result in a well-balanced load on all controllers in the control platform, since such applications handle events locally in small silos. In practice, however, control functions depend on each other in subtle ways. This makes it difficult for network programmers to detect and revise design bottlenecks.
Sometimes, even with apt designs, suboptimalities arise because of changes in the workload. For example, if a virtual network is migrated from one datacenter to another, the control functions managing that virtual network should also be placed in the new datacenter to minimize latency.
There is clearly no effective way to define a concrete offline method to optimize the control plane placement. For that reason, we rely on runtime instrumentation of control applications. This is feasible in our platform, since we have a well-defined abstraction for message handlers, their state, and the messages. Beehive's runtime instrumentation tries to answer the following questions: (i) how are the messages balanced among hives and bees? (ii) which bees are overloaded? and (iii) which bees exchange the most messages?
Metrics. Our runtime instrumentation system measures the resource consumption of each bee along with the number of messages it exchanges with other bees. For instance, we measure the number of messages that are exchanged between an OpenFlow driver of a switch and a virtual networking application managing a particular virtual network. This metric essentially indicates the correlation of each switch to each virtual network. We also store the provenance and causation data for messages. For example, in a learning switch application, we can report that most PacketOut messages are emitted by the learning switch application upon receiving PacketIns.
Design. We measure runtime metrics on each hive locally. This instrumentation data is further used to find the optimal placement of bees and is also utilized for application analytics. We implemented this mechanism as a control application using Beehive's programming model. That is, each bee emits its metrics in the form of a message. Then, our application locally aggregates those messages and sends them to a centralized application that optimizes the placement of the bees. The optimizer is designed to avoid complications such as conflicts in bee migrations.
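As a rough illustration of how instrumentation can itself be expressed as messages and handlers, the Go sketch below defines a hypothetical per-bee metrics message and a local aggregator that sums it per hive before it would be forwarded to the centralized optimizer; none of these names are Beehive's actual types.

package main

import "fmt"

// beeMetrics is an illustrative metrics message: how many messages this bee
// exchanged with each peer bee during the last measurement interval.
type beeMetrics struct {
	BeeID    uint64
	HiveID   string
	MsgCount map[uint64]int // peer bee ID -> messages exchanged
}

// aggregator mimics the local instrumentation application: it sums per-bee
// metrics on each hive before forwarding them to the centralized optimizer.
type aggregator struct {
	perHive map[string]int
}

func (a *aggregator) handle(m beeMetrics) {
	total := 0
	for _, c := range m.MsgCount {
		total += c
	}
	a.perHive[m.HiveID] += total
}

func main() {
	agg := &aggregator{perHive: map[string]int{}}
	// Each bee periodically emits its metrics as a message (here, a call).
	agg.handle(beeMetrics{BeeID: 1, HiveID: "hive-1", MsgCount: map[uint64]int{2: 120, 3: 4}})
	agg.handle(beeMetrics{BeeID: 2, HiveID: "hive-1", MsgCount: map[uint64]int{1: 120}})
	fmt.Println("messages observed on hive-1:", agg.perHive["hive-1"])
}

Because the aggregation happens off the message-processing path, this style of collection is what keeps the instrumentation overhead low, as measured next.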
Overheads of Runtime Instrumentation. One of the common concerns about runtime instrumentation is its performance overhead. To measure the effects of runtime instrumentation, we have run the key-value store sample application for a batch size of 8192 messages and a replication factor of 1, with and without runtime instrumentation. As shown in Figure 4.12, Beehive's runtime instrumentation imposes a very low overhead (less than 3%) on the bees as long as enough resources are provisioned for runtime instrumentation. This is because instrumentation data is processed asynchronously and does not interfere with the critical path in message processing.

Figure 4.12: The PDF of a bee's throughput in the sample key-value application, with and without runtime instrumentation.
4.3.5 Automatic Placement
By default, the hive that receives the first message for a specific cell successfully acquires the lock and creates a new local bee for that cell. At this point, all subsequent messages for the same mapped cells will be processed on the newly created bee on this hive. This default behavior can perform well when most messages have locality (i.e., sources are in close vicinity), which is the case for most networks [16]. Having said that, there is a possibility of workload migration. In such cases, Beehive needs to self-optimize by optimizing the placement of bees.

Finding the optimum placement of bees is NP-hard (by reduction from the facility location problem) and we consider it outside the scope of this thesis. Here, we employ a greedy heuristic aiming at processing messages close to their source. That is, our greedy optimizer migrates bees among hives, aiming at minimizing the volume of inter-hive messages. We stress that optimality is not our goal and this heuristic is presented as a proof of concept.
It is important to note that, in Beehive, bees that have an open socket or file cannot be migrated, since Beehive cannot move the open sockets and files along with the bee. As a result, our greedy heuristic will move other bees close to the bees performing IO, if they exchange a lot of messages. For instance, as we will demonstrate in Section 5.2, bees of a forwarding application will be migrated next to the bee of an OpenFlow driver, which improves latency. We also note that, using our platform, it is straightforward to implement other optimization strategies, which we leave to future work.
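The greedy heuristic can be approximated by the following Go sketch: for every bee that is not pinned by open sockets or files, pick the hive it exchanges the most messages with, and propose a migration if that hive differs from the current one. The data layout and names are assumptions made for illustration.

package main

import "fmt"

// traffic[b][h] is the number of messages bee b exchanged with bees on hive h
// in the last measurement interval (an illustrative layout, not Beehive's).
type traffic map[string]map[string]int

// greedyPlacement returns, for each migratable bee, the hive that receives
// most of its messages. Bees pinned by open sockets or files are skipped.
func greedyPlacement(t traffic, current map[string]string, pinned map[string]bool) map[string]string {
	moves := map[string]string{}
	for bee, perHive := range t {
		if pinned[bee] {
			continue
		}
		best, bestCount := current[bee], perHive[current[bee]]
		for hive, count := range perHive {
			if count > bestCount {
				best, bestCount = hive, count
			}
		}
		if best != current[bee] {
			moves[bee] = best
		}
	}
	return moves
}

func main() {
	t := traffic{
		"collector-sw7": {"hive-1": 10, "hive-3": 900}, // mostly talks to hive-3
		"route":         {"hive-1": 500, "hive-3": 40},
	}
	current := map[string]string{"collector-sw7": "hive-1", "route": "hive-1"}
	pinned := map[string]bool{}
	fmt.Println(greedyPlacement(t, current, pinned))
}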
Figure 4.13: Inter-hive traffic matrix of the elephant flow routing application (a) with an ideal placement, and (b) when the placement is automatically optimized. (c) Bandwidth usage of the control plane in (b).
Effectiveness of Automatic Placement. To demonstrate the effectiveness of our greedy placement algorithm, we have run an experiment using our elephant flow routing example shown in Figure 4.4 and measured the traffic matrix between hives, which is readily provided by Beehive. We have simulated a cluster of 40 controllers and 400 switches in a simple tree topology. We initiate 100 fixed-rate flows from each switch, and instrument the elephant flow routing application. 10% of these flows have spikes (i.e., are detected as an elephant flow in Collect).

In the ideal scenario shown in Figure 4.13a, the switches are locally polled and all messages are processed on the same hives, except the TrafficSpike messages that are centrally processed by Route. To demonstrate how Beehive can dynamically optimize the control plane, we artificially assign the cells of all switches to the bees on the first hive. Once our runtime instrumentation collects enough data about the futile communications over the control channels, it starts to migrate the bee invoking Collect and Query to the hive connected to each switch. In particular, it migrates the cell (S, SWi, StatSWi) next to the OpenFlow driver that controls SWi. As shown in Figures 4.13b and 4.13c, this live migration of bees localizes message processing and results in a placement similar to Figure 4.13a. Note that this is all done automatically at runtime with no manual intervention.
Application-Defined Placement. In Beehive, applications can define their own explicit, customized placement method. An application-defined placement method can be defined as a function that chooses a hive among the live hives for a set of mapped cells: Place(cells, live_hives). This method is called whenever the cells are not already assigned to a bee. Using this feature, applications can simply employ placement strategies such as random placement, consistent hashing, and resource-based placement (e.g., delegate to another hive if the memory usage is more than 70%), to name a few. It is important to note that, if none of these solutions works for a particular use case, one can implement a centralized load-balancer application and a distributed worker application.
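As an example of an application-defined placement method with the Place(cells, live_hives) shape described above, the Go sketch below delegates to the least-loaded live hive once local memory usage exceeds 70%; the Cell and Hive types are illustrative, not Beehive's interfaces.

package main

import "fmt"

// Illustrative types; Beehive's real interfaces may differ.
type Cell struct{ Dict, Key string }

type Hive struct {
	ID       string
	MemUsage float64 // fraction of memory in use, 0..1
}

// resourceBasedPlace sketches an application-defined placement method: prefer
// the local hive unless its memory usage exceeds 70%, in which case delegate
// to the least-loaded live hive.
func resourceBasedPlace(cells []Cell, local Hive, liveHives []Hive) Hive {
	if local.MemUsage <= 0.7 {
		return local
	}
	best := local
	for _, h := range liveHives {
		if h.MemUsage < best.MemUsage {
			best = h
		}
	}
	return best
}

func main() {
	local := Hive{ID: "hive-1", MemUsage: 0.85}
	others := []Hive{{ID: "hive-2", MemUsage: 0.40}, {ID: "hive-3", MemUsage: 0.55}}
	chosen := resourceBasedPlace([]Cell{{Dict: "D", Key: "sw-9"}}, local, others)
	fmt.Println("placing cells on", chosen.ID)
}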
4.3.6 Application Composition
Our approach to foster code reuse in the control platform is to exchange messages among applications. Although this approach results in a high degree of decoupling, sometimes network programmers need to compose applications with an explicit, predefined ordering. For example, for security reasons, one might want to compose an access control application with a learning switch in such a way that messages are always processed by the access control application first, and then relayed to the learning switch.
For such cases, Beehive provides generic composers that compose a set of control functions into a single composed function. The composed function isolates the dictionaries of the different applications. The Map function of this composition returns the union of the mapped cells of the composed functions to preserve consistency.
In Beehive, we provide two generic composers:
1. All(f1, f2, ...) composes a set of message handlers as a sequence in such a way that the incoming message is passed to the next function only if the previous function has successfully handled the message. If there is an error in any function, the transaction will be aborted.
2. Any(f1, f2, ...) composes a set of message handlers in such a way that the incoming message is passed to the next function only if the previous function could not successfully handle the message. In other words, Any guarantees that the side effect of at most one of these functions will be applied to the composed application's state. A minimal sketch of these composers follows this list.
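The control flow of these two composers can be sketched in Go as follows; the Handler signature is illustrative, and returning an error stands in for Beehive aborting the composed transaction.

package main

import (
	"errors"
	"fmt"
)

// Handler is an illustrative message-handler signature; Beehive's real
// handler interface also carries the application state and context.
type Handler func(msg string) error

// All sketches the All(f1, f2, ...) composer: the message is passed to the
// next handler only if the previous one succeeded; any error aborts the
// composed transaction.
func All(handlers ...Handler) Handler {
	return func(msg string) error {
		for _, h := range handlers {
			if err := h(msg); err != nil {
				return fmt.Errorf("aborting composed transaction: %w", err)
			}
		}
		return nil
	}
}

// Any sketches Any(f1, f2, ...): handlers are tried in order until one
// succeeds; in Beehive, transactional isolation ensures that at most one
// handler's side effects survive.
func Any(handlers ...Handler) Handler {
	return func(msg string) error {
		var lastErr error
		for _, h := range handlers {
			if lastErr = h(msg); lastErr == nil {
				return nil
			}
		}
		return lastErr
	}
}

func main() {
	acl := func(msg string) error {
		if msg == "blocked" {
			return errors.New("denied by ACL")
		}
		return nil
	}
	learn := func(msg string) error { fmt.Println("learning switch handled", msg); return nil }

	pipeline := All(acl, learn) // access control first, then the learning switch
	fmt.Println(pipeline("PacketIn:host-a"), pipeline("blocked"))
}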
4.4 Applications
As shown in Table 4.1, various use-cases are implemented using Beehive with an effort comparable to existing centralized controllers. Note that these numbers include the generated Map functions. These numbers indicate that the developer efforts have been devoted to the core of the use-cases, not the boilerplate of distributed programming, and the end result can be run as a distributed application.

Use-Case                    Code Size
Hub                         16 LOC
Kandoo [57]                 47 LOC
Learning Switch             69 LOC
Elephant Flow Re-Routing    93 LOC
Key-Value Store             136 LOC
Message Queue               158 LOC
Pub-Sub                     162 LOC
Network Virtualization      183 LOC
Network Discovery           239 LOC
Routing                     385 LOC
ONIX NIBs                   471 LOC

Table 4.1: Sample use-cases implemented in Beehive and their code size.
Interestingly, quite a few of these use-cases are similar to well-known distributed architectures. For example, the key-value store application behaves similarly to memcached [43] and our distributed graph processing application behaves similarly to Pregel [83]. This implies that Beehive's programming model would accommodate a variety of distributed programming flavors without hindering simplicity.
It is important to note that, among these applications, ONIX NIB is the largest project. The reason is that more than 60% of our code was devoted to implementing NIB's object model (e.g., nodes, links, and paths) and their APIs. We expect any centralized implementation of ONIX NIBs to have the same amount of programming complexity. Moreover, we note that the active component that manages these NIB objects is less than 100 LOC.
Centralized Applications. A centralized application is a composition of functions that require the whole application state in one physical location. In our framework, a function is centralized if it accesses whole dictionaries to handle messages. The Map function of a centralized application should return the same value (i.e., the same key) for all messages. As discussed earlier in Section 4.3, for such a message handler, Beehive guarantees that all the messages are processed in a single bee. It is important to note that, since applications do not share state, the platform may place different centralized applications on different hives to satisfy extensive resource requirements (e.g., a large state).
App Centralized:
    ...
    Map(msg):
        // Returning key 0 in dictionary D for all messages, this application
        // will process all messages in the same thread (i.e., in the same
        // bee). As such, this application becomes a centralized application.
        return {(D, 0)}

Figure 4.14: A centralized application maps all messages to the same cell. This results in processing all messages in the same bee.
Kandoo. At the other end of the spectrum, there are the local control applications proposed in Kandoo [57] that use the local state of a single switch to process frequent events. The message handlers of a local control application maintain their state on a per-switch basis: they use switch IDs as the keys in their dictionaries and, to handle messages, access their state using a single key. As such, their Map function would map messages to a cell specific to that switch (Figure 4.15). Consequently, Beehive conceives one bee for each switch. If the application uses the default placement method, the bee will be placed on the hive that receives the first message from the switch. This effectively results in a local instance of the control application.
An important advantage of Beehive over Kandoo is that it automatically pushes control functions as close as possible to the source of the messages they process (e.g., switches for local applications). In such a setting, network programmers do not deliberately design for a specific placement. Instead, the platform automatically optimizes the placement of local applications. This is in contrast to proposals like Kandoo [57] where the developer has to decide on function placement (e.g., local controllers close to switches). The other difference is that Beehive requires one Raft proposal to lock the local cells for the local bee, while Kandoo does not incur any consensus overhead in the local controllers. This overhead is, however, negligible, since it is incurred only for the first message of each local bee.
App Local:
    ...
    Map(msg):
        // Returning the switch ID as the mapped cell for each message, the
        // messages of each switch will be processed in their own thread
        // (i.e., the bee that exclusively owns the cell allocated for the
        // switch). Using the default placement method, the bee will be
        // placed on the first hive that receives a message from that switch.
        // This effectively results in a local Kandoo application.
        return {(D, msg.Switch)}

Figure 4.15: A local application maps all messages to a cell local to the hive (e.g., the switch's ID).
ONIX's NIB [75]. The NIB is basically an abstract graph that represents networking elements and their interlinking. To process a message in the NIB manager, we only need the state of a particular node. As such, each node would be equivalent to a cell managed by a single bee in Beehive. With that, all queries (e.g., the flows of a switch) and update messages (e.g., adding an outgoing link) on a particular node in the NIB will be handled by its own bee in the platform. In Chapter 5, we present the Beehive SDN controller that provides an abstraction similar to ONIX's NIBs.
App NetworkManager:
    ...
    Map(msg):
        // By mapping messages to their nodes, the network manager will
        // maintain the NIB on a per-node basis.
        return {(D, msg.NodeID)}

Figure 4.16: To manage a graph of network objects, one can easily map all messages to the node's ID. This way, nodes are spread among the hives on the platform.
Logical xBars [86]. Logical xBar represents the control plane as a hierarchy in which each controller acts as a switch to its parent controller. To emulate such an architecture (and any similar hierarchical controller), we attach a hierarchical identifier to network messages and store the state of each xBar controller using the same identifiers as the keys in the application state. This means each controller in the xBar hierarchy would have its own cell and its own bee in Beehive. Upon receiving a message, Beehive maps the message to its cell using the message's hierarchical identifier, and the message is hence processed by its respective bee.
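In the same spirit as the Map functions of Figures 4.14 to 4.16, a Map function for such a hierarchical controller could look like the following Go sketch, where the hierarchical identifier is the dictionary key; the types are illustrative.

package main

import "fmt"

// xBarMsg is an illustrative network message tagged with the hierarchical
// identifier of the xBar controller responsible for it (e.g., "dc1/pod2/tor7").
type xBarMsg struct {
	ControllerID string
	Payload      string
}

// Cell mirrors the (dictionary, key) pairs returned by Map in Beehive.
type Cell struct{ Dict, Key string }

// mapXBar sketches the Map function described above: every message is mapped
// to the cell of its xBar controller, so each controller in the hierarchy is
// handled by its own bee.
func mapXBar(m xBarMsg) []Cell {
	return []Cell{{Dict: "XBarState", Key: m.ControllerID}}
}

func main() {
	fmt.Println(mapXBar(xBarMsg{ControllerID: "dc1/pod2/tor7", Payload: "PacketIn"}))
}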
Network Virtualization. Typically, network virtualization applications (such as NVP [74]) process the messages of each virtual network independently. Such applications can be modeled as a set of functions that, to process messages, access the state using a virtual network identifier as the key. This is basically sharding messages based on virtual networks, with minimal shared state between the shards. Each shard forms a set of colocated cells in Beehive, and the platform guarantees that messages of the same virtual network are handled by the same bee.
App NetworkVirtualization:
    ...
    Map(create CreatePort):
        // By mapping the create port message to the port's entry in the
        // Ports dictionary and the virtual network's entry in the VN
        // dictionary, this port will always be processed in the bee that
        // owns the virtual network.
        return {(Ports, create.PortID), (VN, create.VirtualNetworkID)}
    Map(update PortStatusUpdate):
        return {(Ports, update.PortID)}
    Map(msg VNMsg):
        return {(VN, msg.VirtualNetworkID)}

Figure 4.17: For virtual networks, messages of the same virtual network are naturally processed in the same bee.
4.5 Discussion
What are the main limitations of Beehive's programming model? As we mentioned in Section 4.2, in Beehive, applications cannot store persistent data outside the application dictionaries. Since the value of an entry in a dictionary can be an arbitrary object, this requirement would not impose a strong limitation on control applications. We note that this requirement is the key enabler for the platform to automatically infer the Map function and to provide fault tolerance for control applications.
Another design-level decision is to provide a serialized, consistent view of messages for each partition of cells owned by a bee. That is, each bee is a single thread that processes its messages in sequence. Thus, Beehive does not provide eventual consistency by default and instead processes messages in consistent shards. A Beehive application can, in fact, implement eventual consistency by breaking the functionalities into local message handlers and using asynchronous messages to share state instead of dictionaries. We note that eventual consistency has many implementation and performance drawbacks [21, 32].
How does Beehive compare with existing distributed programming frameworks? There is a significant body of research on how to simplify distributed programming. Some proposals focus on remote invocations, ranging from systems as early as CORBA to programming environments such as Live Objects [98]. Moreover, special-purpose programming models, such as MapReduce [35], MPI, Cilk [19], Storm [1], and Spark [132], are proposed for parallel and distributed processing. Beehive has been inspired by the approach taken by these proposals to simplify distributed programming. Having said that, the focus and the applications of these proposals are different from what we need. Therefore, they cannot be used as a basis to implement Beehive. For example, MapReduce and Spark are suitable for batch jobs, and Storm is designed for one-way stream processing. In contrast, Beehive requires bi-directional communication among generic distributed applications handling a live workload.
There are Domain-Specific Languages (DSLs), such as BOOM [13] and Fabric [80], that propose new ways of programming for the purpose of provability, verifiability, or security. In contrast to proposing a new DSL, our goal here is to create a distributed programming environment similar to centralized controllers. To that end, control applications in Beehive are developed in a general-purpose programming language, using the familiar APIs provided by Beehive's programming model.
How does Beehive compare with existing distributed controllers? Existing control planes such as ONIX [75] and ONOS [17] are focused on scalability and performance. Beehive, on the other hand, provides a programming paradigm that simplifies the development of scalable control applications. We note that existing distributed controllers such as Kandoo [57], ONIX [75], and Logical xBars [86] can all be implemented using Beehive. More importantly, Beehive's automatic placement optimization leads to results similar to what ElasticCon [37] achieves by load balancing. Our optimization, however, is generic: it is not limited to balancing the workload of OpenFlow switches, applies to all control applications, and can be extended to accommodate alternative optimization methods.
How does Beehive compare with fault-tolerance proposals? LegoSDN [28] is a framework to isolate application failures in a controller. Beehive similarly isolates application failures using automatic transactions, which is similar to LegoSDN's Absolute Compromise policy for centralized controllers. Other LegoSDN policies can be adopted in Beehive, if needed. Moreover, Ravana [68] is a system that provides transparent slave-master fault tolerance for centralized controllers. As demonstrated in Figure 5.12a, Beehive has the capability not only to tolerate a controller failure, but also to seamlessly and consistently tolerate faults in the control channel, which is outside the scope of Ravana.
Do applications interplay well in Beehive? The control plane is an ensemble of control applications managing the network. For the most part, these applications have interdependencies. No matter how scalable an application is on its own, heedless dependency on a poorly designed application may result in subpar performance. For instance, a local application that depends on messages from a centralized application might not scale well. Beehive cannot automatically fix a poor design, but it provides analytics to highlight the design bottlenecks of control applications, thus assisting developers in identifying and resolving such performance issues.
Moreover, as shown in [23] and [90], there can be conflicts in the decisions made by different control applications. Although we do not propose a specific solution for that issue, these proposals can be easily adopted to implement control applications in Beehive. Specifically, one can implement the Corybantic Coordinator [90] or the STN Middleware [23] as a Beehive application that sits between the SDN controller and the control applications. With Beehive's automatic optimization, control modules are easily distributed on the control platform to make the best use of the available resources.
4.6 Remarks
In this chapter, we have presented Beehive, a distributed control platform that is easy to use and instrument. Using a simple programming model, Beehive infers the state required to process a message, and automatically compiles applications into their distributed counterparts. By instrumenting applications at runtime, Beehive optimizes the placement of control applications, and provides feedback that helps developers in the design process. We have demonstrated that our proposal is able to model existing distributed control planes. Moreover, our evaluations confirm that this approach can be effective in designing scalable control planes.
In the next chapter, we continue our evaluations with two real-world use cases of Beehive.
Chapter 5
Use Cases
In this chapter, we present a few real-world use cases of the programming models proposed in this thesis. We start with OpenTCP, which is an effort to adapt different flavors of TCP based on the scalable statistics collection provided by Kandoo. We present our real-world evaluation of OpenTCP in the SciNet HPC datacenter. We continue with a distributed SDN controller we have designed and implemented based on Beehive. We demonstrate that our controller is fault-tolerant and scalable, yet is implemented in a few hundred lines of code, similar to centralized controllers. We conclude this chapter with a sample routing algorithm that we implemented as a proof of concept to showcase the generality of Beehive.
5.1 OpenTCP
The Transmission Control Protocol (TCP) [65] is designed to fully utilize the network bandwidth while keeping the entire network stable. TCP's behaviour can be sub-optimal and even erroneous [119, 12, 30] mainly for two reasons. First, TCP is expected to operate in a diverse set of networks with different characteristics and traffic conditions, making TCP a "Jack of all trades, master of none" protocol. Limiting TCP to a specific network and taking advantage of the local characteristics of that network can lead to major performance gains. For instance, DCTCP [12] outperforms TCP in datacenter networks, even though the results might not be applicable in the Internet. With this mindset, one can adjust TCP (the protocol itself and its parameters) to gain better performance in specific networks (e.g., datacenters).
Second, even focusing on a particular network, the effect of dynamically adapting congestion control to the traffic pattern is not yet well understood in today's networks. Such adaptation can potentially lead to major improvements, as it provides another dimension that today's TCP does not explore.

Clearly, adapting TCP requires meticulous consideration of the network characteristics and traffic patterns. The fact that TCP relies solely on end-to-end measurements of packet loss or delay as the only sources of feedback from the network means that TCP, at its best, has a very limited view of the network state (e.g., the trajectory of available bandwidth, congested links, network topology, and traffic). Thus, a natural question here is:

Can we build a system that observes the state and dynamics of a computer network and adapts TCP's behaviour accordingly?

If the answer to this question is positive, it can simplify tuning different TCP parameters. It can also facilitate dynamically changing the protocol itself.
OpenTCP. Here we present OpenTCP [46] as a system for dynamic adaptation of TCP based on network and traffic conditions in SDN [88]. OpenTCP mainly focuses on internal traffic in SDN-based datacenters for four reasons: (i) the SDN controller already has a global view of the network (such as topology and routing information), (ii) the controller can collect any relevant statistics (such as link utilization and the traffic matrix), (iii) it is straightforward to realize OpenTCP as a control application in SDNs, and (iv) the operator can easily customize end-hosts' TCP stacks.
Our preliminary implementation of OpenTCP has been deployed in SciNet, a high-performance computing (HPC) datacenter with ∼4000 hosts. We use OpenTCP to tune TCP parameters in this environment to show how it can simplify the process of adapting TCP to network and traffic conditions. We also show how modifying TCP using OpenTCP can improve the network performance. Our experiments show that using OpenTCP to adapt init_cwnd based on link utilization leads to up to a 59% reduction in flow completion times.

Figure 5.1: Using Kandoo to implement OpenTCP: the Oracle, SDN-enabled switches, and Congestion Control Agents are the three components of OpenTCP.
OpenTCP and related work. OpenTCP is orthogonal to previous works improving TCP's performance. It is not meant to be a new variation of TCP. Instead, it complements previous efforts by making it easy to switch between different TCP variants automatically (or in a semi-supervised manner), or to tune TCP parameters based on network conditions. For instance, one can use OpenTCP to utilize either DCTCP or CUBIC in a datacenter environment. The decision on which variant to use is made in advance through the congestion control policies defined by the network operator.
5.1.1 OpenTCP Architecture
OpenTCP collects data regarding the underlying network state (e.g., topology and routing information) as well as statistics about network traffic (e.g., link utilization and the traffic matrix). Then, using this aggregated information and based on policies defined by the network operator, OpenTCP decides on a specific set of adaptations for TCP. Subsequently, OpenTCP sends periodic updates to the end-hosts that, in turn, update their TCP variant using a simple kernel module. Figure 5.1 presents the schematic view of how OpenTCP works. As shown, OpenTCP's architecture has three main components that aptly fit into Kandoo's design (§3):
1. The Oracle is a non-local SDN control application that lies at the heart of OpenTCP and is deployed on the root controller. It collects information about the underlying network and traffic. Then, based on congestion control policies defined by the network operator, it determines the appropriate changes to optimize TCP. Finally, it distributes update messages to end-hosts.
2. Collectors are typical local applications that collect statistics from switches; they are deployed on local controllers to scale.
3. Congestion Control Agents (CCAs) are also local applications that receive update messages from the Oracle and are responsible for modifying the TCP stack at each host using a kernel module.
5.1.2 OpenTCP Operation
OpenTCP goes through a cycle composed of three steps: (i) data collection, (ii) optimization and CUE generation, and (iii) TCP adaptation. These steps are repeated at a pre-determined interval of T seconds. To keep the network stable, T needs to be several orders of magnitude larger than the network RTT. In the rest of this section, we describe each of these steps in detail.
Data Collection. OpenTCP relies on two types of data for its operation. First, it needs information about the overall structure of the network, including the topology, link properties such as delay and capacity, and routing information. Since this information is readily available in the SDN controller, the Oracle can easily access it. Second, OpenTCP collects statistics about traffic characteristics by periodically querying flow tables in SDN switches. In order to satisfy the low-overhead design objective, our implementation of OpenTCP relies on the Kandoo local controllers [57] to collect summarized link-level statistics at scale. The operator can choose to collect only a subset of possible statistics to further reduce the overhead of data collection. Later, we will show how this is done using congestion control policies.
Figure 5.2: The Oracle collects network state and statistics through the SDN controller and switches. Combining those with the CCP defined by the operator, the Oracle generates CUEs which are sent to CCAs sitting on end-hosts. CCAs will adapt TCP based on the CUEs.
CUE Generation. After data collection, we must determine what changes to make to TCP sources. This decision might vary from network to network depending on the network operator's preferences. In OpenTCP, we formalize these preferences by defining a Congestion Control Policy (CCP). The network operator provides OpenTCP with the CCP. At a high level, the CCP defines which statistics to collect, which objective function to optimize, what constraints OpenTCP must satisfy, and how the proposed updates should be mapped onto specific changes in TCP sources. Based on the CCP defined by the network operator, OpenTCP finds the appropriate changes and determines how to adapt the individual TCP sessions. These changes are sent to individual CCAs in messages we call Congestion Update Epistles (CUEs). The CCAs, which are present in individual end-hosts, use these CUEs to update TCP. Figure 5.2 shows how CCPs are integrated into the TCP adaptation process.
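Although the thesis does not prescribe a concrete CCP format, the Go sketch below shows one plausible encoding of the four parts just described (statistics mask, objective, constraints, and mapping rules); all field names and the example policy are assumptions made for illustration.

package main

import "fmt"

// These types sketch one possible encoding of a Congestion Control Policy
// (CCP); they are illustrative and not OpenTCP's actual data structures.
type MappingRule struct {
	Condition string // e.g., "link_utilization <= 0.8"
	Command   string // e.g., "set_congestion_control cubic"
}

type CCP struct {
	StatisticsMask []string // which statistics the collectors should gather
	Objective      string   // e.g., "minimize flow_completion_time"
	Constraints    []string // constraints the Oracle must satisfy
	MappingRules   []MappingRule
}

func main() {
	ccp := CCP{
		StatisticsMask: []string{"link_utilization", "drop_rate"},
		Objective:      "minimize flow_completion_time",
		Constraints:    []string{"link_utilization <= 0.8"},
		MappingRules: []MappingRule{
			{Condition: "link_utilization <= 0.8", Command: "set_congestion_control cubic"},
			{Condition: "otherwise", Command: "set_congestion_control reno"},
		},
	}
	fmt.Printf("CCP with %d mapping rule(s): %+v\n", len(ccp.MappingRules), ccp)
}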
TCP Adaptation. The final step in OpenTCP's operation is to change TCP based on the CUEs generated by the Oracle. After solving the optimization problem, the Oracle forms CUEs and sends them directly to the CCAs sitting in the end-hosts. CUEs are usually very small packets. In our implementation, each CUE is less than 10-20 bytes. Thus, their overhead is very small.
CCAs are responsible for receiving and enforcing updates. Upon receiving a CUE, each CCA will check the condition given in each mapping rule. If this condition is satisfied, the CCA immediately applies the command. Changes can be simple adaptations of TCP parameters or can require switching between different variants of TCP. If the CCA does not receive any CUEs from the Oracle for a long time (we use 2T, where T is the OpenTCP operation cycle), or if the mapping rule conditions are violated, the CCA reverts all changes and OpenTCP goes back to the default TCP settings.
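The CCA behaviour just described (apply a CUE's command when its condition holds, revert to the defaults when no CUEs arrive for 2T) can be sketched as follows; the CUE layout and the cca type are hypothetical, and a real agent would apply the command through a kernel module rather than a field assignment.

package main

import (
	"fmt"
	"time"
)

// CUE is an illustrative Congestion Update Epistle: a condition and the TCP
// change to apply while the condition holds (not OpenTCP's wire format).
type CUE struct {
	Condition func() bool
	Command   string
}

// cca sketches a Congestion Control Agent: it applies the command of each CUE
// whose condition holds, and reverts to the default settings if no CUE has
// been received for two OpenTCP cycles (2T).
type cca struct {
	T        time.Duration
	lastCUE  time.Time
	current  string
	defaults string
}

func (c *cca) onCUE(cue CUE) {
	c.lastCUE = time.Now()
	if cue.Condition() {
		c.current = cue.Command // in practice: applied via the kernel module
		fmt.Println("applied:", c.current)
	}
}

func (c *cca) tick() {
	if time.Since(c.lastCUE) > 2*c.T && c.current != c.defaults {
		c.current = c.defaults
		fmt.Println("no CUEs for 2T; reverted to", c.defaults)
	}
}

func main() {
	agent := &cca{T: 50 * time.Millisecond, current: "reno", defaults: "reno", lastCUE: time.Now()}
	agent.onCUE(CUE{Condition: func() bool { return true }, Command: "cubic"})
	time.Sleep(120 * time.Millisecond) // longer than 2T with no CUEs
	agent.tick()
}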
Adaptation example. Assume the operator simply wants to switch from TCP Reno to CUBIC while keeping an eye on packet drop rates. Here, we assume that the default TCP variant running on end-hosts is TCP Reno, and that there is a constraint limiting link utilization to 80%. If the network has a low load (i.e., as long as all links have a utilization of less than 80%), the Oracle will send CUEs to all CCAs instructing them to switch to CUBIC. If network conditions change and utilization goes above 80% on any link, the Oracle will not be able to satisfy the optimization constraints. In this case, the CCAs will fall back to the default TCP variant, which is TCP Reno here.
OpenTCP without SDN Support. OpenTCP is expressly designed for SDN. In fact, the existence of a Kandoo root controller with access to the global state of the network, as well as local controllers which can be used to collect flow- and link-level statistics, makes SDN the perfect platform for OpenTCP. Having said that, we can use OpenTCP on a traditional (non-SDN) network as long as we can collect the required network state and traffic statistics. Clearly, this requires more effort and might incur a higher overhead, since we do not have a centralized controller or built-in statistics collection features. In the absence of SDN switches, we can use a dedicated node in the network to act as the Oracle. Moreover, we can use the CCAs to collect flow statistics (e.g., the number of active flows, utilization, and drop rate) at each end-host using a kernel module in a distributed manner and send the relevant information to the Oracle. The Oracle can query the CCAs to collect various statistics and, if needed, aggregate them to match the statistics available in SDN. This needs to be done with extra care, so that the overhead remains low and the data collection system does not have a major impact on the underlying network traffic.
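In our deployment the CCAs rely on the tcp_flow_spy kernel module; purely as an illustration of end-host collection without any SDN or kernel support, the Go sketch below counts established TCP connections on a Linux host by parsing /proc/net/tcp, a much coarser signal than what the kernel module provides.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// countEstablished reads /proc/net/tcp on a Linux end-host and counts sockets
// in the ESTABLISHED state (state code "01"). This is a coarse, user-space
// stand-in for the richer statistics a kernel module can provide.
func countEstablished(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	count := 0
	scanner := bufio.NewScanner(f)
	scanner.Scan() // skip the header line
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Field 3 (0-indexed) is the connection state in hex.
		if len(fields) > 3 && fields[3] == "01" {
			count++
		}
	}
	return count, scanner.Err()
}

func main() {
	n, err := countEstablished("/proc/net/tcp")
	if err != nil {
		fmt.Println("cannot read statistics:", err)
		return
	}
	fmt.Println("active (established) TCP flows:", n)
}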
Conditions and constraints for OpenTCP. The current implementation of OpenTCP relies on a single administrative authority to derive network and traffic statistics. To this end, we also assume that there is no adversary in the network and that we can change the end-host stacks. Hence, OpenTCP does not have to operate in a datacenter per se, and any network that satisfies the above conditions can benefit from OpenTCP.
Changing parameters in a running system. Depending on the TCP parameter, making changes to TCP flows in an operational network may or may not have an immediate effect on TCP behaviour. For example, adapting init_cwnd only affects new TCP flows, but a max_cwnd change is respected almost immediately. If the congestion window size is greater than the new max_cwnd, TCP will not drop those packets; it will deliver them and afterwards respect the new max_cwnd. Similarly, enabling TCP pacing will have an immediate effect on the delivery of packets. The operator should be aware of when and how each parameter setting will affect ongoing TCP sessions.
5.1.3 Evaluation
To evaluate the performance of OpenTCP, we deploy our proposal in a High-Performance Computing (HPC) datacenter. SciNet is mainly used for scientific computations, large-scale data analyses, and simulations by a large number of researchers with diverse backgrounds, from biological sciences and chemistry to astronomy and astrophysics. Over the course of 20 days, we enabled OpenTCP and collected 200TB of packet-level traces as well as flow- and link-level statistics using our tcp_flow_spy kernel module [8].
Setup. The topology of SciNet is illustrated in Figure 5.3. There are 92 racks of 42 servers. Each server connects to a Top of Rack (ToR) switch via 1Gbps Ethernet. Each end-host in SciNet runs Linux 2.6.18 with TCP BIC [52] as the default TCP. SciNet consists of 3,864 nodes with a total of 30,912 cores (Intel Xeon Nehalem) at 2.53GHz, with 16GB of RAM per node. The ToR switches are connected to a core Myricom Myri-10G 256-port 10GigE chassis, each having a 10Gbps connection.

Figure 5.3: The topology of SciNet, the datacenter network used for deploying and evaluating OpenTCP.
OpenTCP in SciNet. We did not have access to a large-scale SDN and could not change any hardware elements in SciNet. Therefore, we took the extra steps to deploy OpenTCP without SDN. In our experiments, the CCAs periodically collect socket-level statistics using the tcp_flow_spy kernel module, aggregate them, and send the results to the Oracle, a dedicated server in SciNet. The Oracle performs another level of aggregation to produce final statistics combining data from the CCAs. Throughout our experiments, we monitor OpenTCP to ensure that we do not have a major CPU or bandwidth overhead. The average CPU overhead of our instrumentation is less than 0.5%, and we have a negligible bandwidth requirement. We believe this is a reasonably low overhead, one which is unlikely to have a significant impact on the underlying jobs and traffic.
Evaluation overview. We begin this section by describing the workload in SciNet (§5.1.3). We show that, despite the differences in the nature of the jobs running in SciNet, the overall properties of the network traffic, such as flow sizes, flow completion times (FCT), traffic locality, and link utilizations, are analogous to those found by previous studies of datacenter traffic [53, 12, 48]. We take this step to ensure our results are not isolated or limited to SciNet.
Figure 5.4: SciNet traffic characterization. The x-axis is in log scale. (a) CDF of flow sizes excluding TCP/IP headers. (b) CDF of flow completion times.
Traffic Characterization. At SciNet, the majority of jobs are Message Passing Interface (MPI) [44] programs running on one or more servers. Jobs are scheduled through a queuing system that allows a maximum of 48 hours per job. Any job requiring more time must be broken into 48-hour chunks. Note that each node runs a single job at any point in time. We use OpenTCP to collect traces of flow- and link-level statistics, capturing a total of ∼200TB of flow-level logs over the course of 20 days.
Workload. We recognize two types of co-existing TCP traffic in our traces (shown in Figure 5.4): (i) MPI traffic and (ii) distributed file system flows. The majority of MPI flows are less than 1MB in size and finish within 1 second. This agrees with previous studies of datacenter traffic [12, 16]. It is interesting to note that only a few MPI flows (1%) last up to the maximum job time allowance of 48 hours. Unlike the MPI flows, the distributed file system traffic ranges from 20B to 62GB in size, and most (93%) finish within 100 seconds. A few large background jobs last more than 6 days.
Locality. Figure 5.5 represents the log of the number of bytes exchanged between server pairs in 24 hours. Element (i, j) in this matrix represents the amount of traffic that host i sends to host j.
Figure 5.5: The locality pattern as seen in the matrix of log(number of bytes) exchanged between server pairs.
The nodes are ordered such that those within a rack are adjacent on the axes (virtualization is not used in this cluster). The accumulation of dots around the diagonal line in the figure shows the traffic being locally exchanged among servers within a rack, a pattern that is similar to previous measurements [16, 53]. Note that this matrix is not symmetrical, as it depicts the traffic from nodes on the x-axis to nodes on the y-axis. The vertical lines in the traffic matrix represent "token management traffic" belonging to the distributed file system. This type of traffic exchanges tokens between end-hosts and a small set of nodes called token managers. Finally, the rectangular regions in the figure represent traffic between the nodes participating in multi-node MPI jobs.
Utilization. We observe two utilization patterns in SciNet. First, 80% of the time, link utilizations are below 50%. Second, there are congestion epochs in the network where both edge and core links experience high levels of utilization and, thus, packet losses. Again, this observation accords with previous studies where, irrespective of the type of datacenter, link utilizations are known to be low [16, 53].
Experiment Methodology. We run a series of experiments to study how OpenTCP works in practice, using a combination of standard benchmarks and sample user jobs as the workload. During each experiment, we use OpenTCP to make simple modifications to end-hosts' congestion control schemes, and measure the impact of these changes in terms of flow completion times and packet drop rates. Throughout our experiments, we set OpenTCP's slow time scale to 1 min, unless otherwise stated (in this environment, the fast time scale, i.e., the RTT, is less than 1ms), directing OpenTCP to collect statistics and send CUEs once every 1 min. We also measure the overhead of running OpenTCP on CPU usage and bandwidth. In our experiments, we have used 40 nodes in 10 ToR racks, and we run all the benchmarks in over 10 iterations.

Figure 5.6: CDF of flow sizes in the different benchmarks used in the experiments (all-to-all, all gather, exchange, scatter, reduce scatter, and Flash). The x-axis is in log scale.
To evaluate the impact of the changes made by OpenTCP, we symmetrically split our nodes into two sets. Both sets run similar jobs, but we deploy OpenTCP on one of these two sets only (i.e., half the nodes). Because the two sets are comparable, we can observe the impact of OpenTCP with minimum influence from other random effects.
Benchmarks. In order to study different properties of OpenTCP, we use the Intel MPI Benchmarks (IMB) [9] as well as sample user jobs in SciNet. These benchmarks cover a wide range of computing applications, have different processing and traffic load characteristics, and stress the SciNet network in multiple ways. We also use sample user jobs (selected from the pool of jobs submitted by real users) to gain an understanding of the behaviour of the system under typical user-generated workload patterns. Figure 5.6 shows the CDF of flow sizes for these benchmarks. As illustrated in the figure, flow sizes range from 50B to 1GB, much like those observed in §5.1.3. The results presented here are for the "all-to-all" IMB benchmark and the "Flash" user job. We measured similarly consistent results with the other IMB benchmarks as well as other sample user jobs.
70.180157 1.05E-01 15.78526 8.25E-01 705.517967 1.27E-01
70.38899 1.05E-01 15.787495 8.25E-01 705.518487 1.27E-01
70.722106 1.05E-01 15.795955 8.25E-01 705.519263 1.28E-01
71.209582 1.06E-01 15.807256 8.25E-01 705.520136 1.28E-01
71.424457 1.06E-01 1.60048E+01 8.25E-01 705.520651 1.29E-01
72.308568 1.06E-01 16.126457 8.25E-01 705.521669 1.29E-01
72.711471 1.06E-01 16.186198 8.25E-01 705.522172 1.30E-01
73.810904 1.06E-01 16.246352 8.25E-01 705.522796 1.30E-01
74.070622 1.06E-01 16.321635 8.25E-01 705.523591 1.31E-01
74.515974 1.06E-01 16.420644 8.25E-01 705.524159 1.31E-01
74.733857 1.06E-01 16.491284 8.25E-01 705.525129 1.31E-01
74.975429 1.06E-01 16.539571 8.25E-01 705.52607 1.32E-01
75.425762 1.06E-01 16.604929 8.25E-01 705.526643 1.32E-01
75.906547 1.06E-01 16.745071 8.25E-01 705.527477 1.32E-01
76.314575 1.06E-01 16.786036 8.25E-01 705.528077 1.32E-01
76.598516 1.06E-01 16.806058 8.25E-01 705.528894 1.32E-01
76.818648 1.06E-01 16.823727 8.25E-01 705.534509 1.32E-01
77.355746 1.06E-01 16.830387 8.25E-01 705.540515 1.32E-01
77.581435 1.07E-01 16.833098 8.25E-01 705.541436 1.32E-01
77.857878 1.07E-01 16.873134 8.25E-01 708.301604 1.32E-01
78.167184 1.07E-01 16.875383 8.25E-01 708.302833 1.32E-01
78.90516 1.07E-01 16.907647 8.25E-01 708.303695 1.33E-01
79.302811 1.07E-01 16.9209 8.26E-01 708.304206 1.33E-01
79.611842 1.07E-01 1.77786E+01 8.26E-01 708.304716 1.34E-01
79.861229 1.07E-01 18.6142 8.26E-01 708.305233 1.35E-01
80.558719 1.07E-01 1.94881E+01 8.26E-01 708.306021 1.35E-01
80.954443 1.07E-01 2.03332E+01 8.26E-01 708.306538 1.36E-01
81.192639 1.07E-01 21.1694 8.26E-01 708.307409 1.36E-01
82.005395 1.07E-01 2.20021E+01 8.26E-01 708.307965 1.37E-01
82.227956 1.07E-01 2.28439E+01 8.26E-01 708.308467 1.37E-01
82.49765 1.08E-01 2.36864E+01 8.26E-01 708.309016 1.38E-01
82.728709 1.08E-01 2.4531E+01 8.26E-01 708.309525 1.39E-01
82.974757 1.08E-01 2.77067E+02 8.26E-01 708.310056 1.39E-01
83.576652 1.08E-01 277.072 8.26E-01 708.310576 1.40E-01
84.032569 1.08E-01 2.7714E+02 8.26E-01 708.311105 1.41E-01
84.331913 1.08E-01 2.77194E+02 8.26E-01 708.311649 1.42E-01
84.924975 1.08E-01 277.197 8.26E-01 708.312153 1.43E-01
85.018672 1.18E-01 2.772E+02 8.26E-01 708.312657 1.44E-01
85.218768 1.26E-01 2.77212E+02 8.26E-01 708.313166 1.46E-01
85.384934 1.36E-01 277.214 8.26E-01 708.31372 1.47E-01
85.472665 1.46E-01 2.77218E+02 8.26E-01 708.314242 1.47E-01
85.618525 1.56E-01 277.221 8.26E-01 708.314755 1.48E-01
85.816273 1.66E-01 2.77248E+02 8.26E-01 708.315294 1.49E-01
85.845264 1.76E-01 2.77251E+02 8.26E-01 708.315876 1.50E-01
85.967433 1.86E-01 2.77253E+02 8.26E-01 708.316601 1.50E-01
86.029059 1.96E-01 277.255 8.26E-01 708.317182 1.51E-01
86.249811 2.05E-01 2.77297E+02 8.26E-01 708.317764 1.52E-01
86.497188 2.12E-01 2.77314E+02 8.26E-01 708.318297 1.52E-01
86.93712 2.12E-01 2.77318E+02 8.26E-01 708.318834 1.52E-01
87.170557 2.12E-01 2.77324E+02 8.26E-01 708.319392 1.53E-01
87.905188 2.12E-01 2.77328E+02 8.26E-01 708.319947 1.53E-01
88.583444 2.12E-01 2.77335E+02 8.26E-01 708.320455 1.54E-01
89.230838 2.12E-01 2.77337E+02 8.26E-01 708.321038 1.55E-01
89.462139 2.12E-01 2.77342E+02 8.26E-01 708.321674 1.55E-01
89.500357 2.22E-01 2.77347E+02 8.26E-01 708.322264 1.55E-01
89.651014 2.32E-01 277.351 8.26E-01 708.32277 1.56E-01
89.677108 2.42E-01 277.358 8.26E-01 708.323459 1.56E-01
89.768366 2.52E-01 2.77365E+02 8.26E-01 708.324276 1.56E-01
89.909622 2.62E-01 277.38 8.26E-01 708.324852 1.56E-01
89.925259 2.72E-01 277.399 8.26E-01 708.325529 1.56E-01
90.183108 2.77E-01 277.404 8.26E-01 708.326189 1.57E-01
90.411207 2.77E-01 2.77431E+02 8.26E-01 708.326904 1.57E-01
90.869531 2.78E-01 2.77433E+02 8.26E-01 708.327521 1.57E-01
91.348562 2.78E-01 277.435 8.26E-01 708.328361 1.57E-01
91.595225 2.78E-01 2.77438E+02 8.26E-01 708.498928 1.57E-01
9.19368E+01 2.78E-01 277.44 8.26E-01 708.499682 1.57E-01
92.309943 2.78E-01 2.77442E+02 8.26E-01 708.501238 1.57E-01
92.731703 2.78E-01 2.77453E+02 8.26E-01 708.502572 1.58E-01
92.990176 2.78E-01 2.77455E+02 8.26E-01 708.50352 1.58E-01
93.318035 2.78E-01 277.464 8.26E-01 708.504226 1.58E-01
93.614331 2.78E-01 2.77475E+02 8.26E-01 708.505023 1.58E-01
93.862171 2.78E-01 2.77479E+02 8.26E-01 708.505528 1.59E-01
94.15529 2.78E-01 2.77485E+02 8.26E-01 708.506055 1.60E-01
95.106563 2.78E-01 2.77487E+02 8.26E-01 708.506607 1.61E-01
95.720935 2.78E-01 2.7749E+02 8.26E-01 708.507133 1.62E-01
9.61314E+01 2.78E-01 277.495 8.26E-01 708.507643 1.63E-01
96.340584 2.78E-01 2.77498E+02 8.26E-01 708.50821 1.64E-01
96.657448 2.78E-01 277.5 8.26E-01 708.50878 1.65E-01
97.400291 2.79E-01 2.77508E+02 8.26E-01 708.509294 1.67E-01
97.717285 2.79E-01 2.77513E+02 8.26E-01 708.509824 1.70E-01
98.067925 2.79E-01 277.517 8.26E-01 708.510342 1.73E-01
98.345988 2.79E-01 2.77521E+02 8.26E-01 708.510844 1.76E-01
98.666786 2.79E-01 277.524 8.26E-01 708.511357 1.79E-01
98.906759 2.79E-01 2.77526E+02 8.26E-01 708.511873 1.81E-01
99.166432 2.79E-01 2.77533E+02 8.26E-01 708.512375 1.85E-01
100.16891 2.79E-01 277.541 8.26E-01 708.512881 1.89E-01
101.441207 2.79E-01 2.77547E+02 8.26E-01 708.513406 1.95E-01
101.736446 2.79E-01 2.77554E+02 8.26E-01 708.513919 2.00E-01
101.977815 2.79E-01 2.77573E+02 8.26E-01 708.514421 2.04E-01
102.322988 2.79E-01 2.77575E+02 8.26E-01 708.514935 2.09E-01
102.529123 2.79E-01 2.77578E+02 8.26E-01 708.515437 2.14E-01
102.945466 2.79E-01 2.77581E+02 8.27E-01 708.515944 2.19E-01
103.215773 2.80E-01 2.77586E+02 8.27E-01 708.516449 2.25E-01
103.629686 2.80E-01 2.77609E+02 8.27E-01 708.516952 2.31E-01
103.864877 2.80E-01 277.613 8.27E-01 708.517457 2.37E-01
104.478801 2.80E-01 2.77617E+02 8.27E-01 708.517976 2.44E-01
104.701847 2.80E-01 2.77623E+02 8.27E-01 708.518478 2.50E-01
105.09099 2.80E-01 2.77626E+02 8.27E-01 708.518981 2.58E-01
105.326168 2.80E-01 277.63 8.27E-01 708.519491 2.66E-01
105.573944 2.80E-01 2.77636E+02 8.27E-01 708.519994 2.74E-01
105.778261 2.80E-01 2.77644E+02 8.27E-01 708.520497 2.81E-01
106.042395 2.81E-01 2.77647E+02 8.27E-01 708.520998 2.88E-01
106.285135 2.81E-01 277.649 8.27E-01 708.521499 2.94E-01
106.512165 2.81E-01 2.77653E+02 8.27E-01 708.522005 3.02E-01
106.821264 2.81E-01 2.77658E+02 8.27E-01 708.522508 3.11E-01
107.25213 2.81E-01 2.77663E+02 8.27E-01 708.523018 3.21E-01
107.758034 2.81E-01 277.69 8.27E-01 708.523473 3.31E-01
107.987405 2.81E-01 2.77693E+02 8.27E-01 708.52399 3.40E-01
108.387977 2.81E-01 277.697 8.27E-01 708.524491 3.47E-01
108.666183 2.82E-01 277.702 8.27E-01 708.524997 3.55E-01
108.937391 2.82E-01 2.77704E+02 8.27E-01 708.5255 3.61E-01
109.180215 2.82E-01 2.77706E+02 8.27E-01 708.526003 3.69E-01
109.55324 2.82E-01 277.726 8.27E-01 708.52651 3.77E-01
109.793077 2.82E-01 2.77731E+02 8.27E-01 708.527016 3.85E-01
110.092519 2.82E-01 277.733 8.27E-01 708.527517 3.93E-01
110.412302 2.82E-01 2.77736E+02 8.27E-01 708.528026 4.01E-01
110.963994 2.82E-01 2.77739E+02 8.27E-01 708.528533 4.09E-01
112.318584 2.82E-01 2.77748E+02 8.27E-01 708.529035 4.17E-01
112.757307 2.82E-01 2.77761E+02 8.27E-01 708.529538 4.25E-01
113.212968 2.83E-01 2.77765E+02 8.27E-01 708.530039 4.34E-01
113.594431 2.83E-01 2.77784E+02 8.27E-01 708.53054 4.42E-01
113.802629 2.83E-01 2.77787E+02 8.27E-01 708.531051 4.50E-01
114.269456 2.83E-01 2.77792E+02 8.27E-01 708.531555 4.60E-01
114.475496 2.83E-01 2.77794E+02 8.27E-01 708.532056 4.70E-01
114.695856 2.83E-01 277.827 8.27E-01 708.532565 4.77E-01
114.896798 2.93E-01 2.77833E+02 8.27E-01 708.533067 4.86E-01
115.096093 3.03E-01 2.77849E+02 8.27E-01 708.533578 4.95E-01
115.284229 3.13E-01 277.851 8.27E-01 708.534083 5.02E-01
115.445925 3.23E-01 2.77854E+02 8.27E-01 708.534586 5.10E-01
115.610599 3.33E-01 2.77857E+02 8.27E-01 708.535089 5.18E-01
115.902029 3.42E-01 2.77867E+02 8.27E-01 708.535597 5.27E-01
116.18424 3.42E-01 277.87 8.27E-01 708.536098 5.34E-01
116.588251 3.42E-01 2.77874E+02 8.27E-01 708.536599 5.42E-01
117.009517 3.42E-01 2.77878E+02 8.27E-01 708.5371 5.50E-01
117.526725 3.42E-01 2.77884E+02 8.27E-01 708.537607 5.59E-01
117.773753 3.43E-01 277.887 8.27E-01 708.538108 5.67E-01
118.082054 3.43E-01 2.77897E+02 8.27E-01 708.538609 5.76E-01
118.466949 3.43E-01 2.779E+02 8.27E-01 708.539109 5.84E-01
119.082229 3.43E-01 2.77905E+02 8.27E-01 708.539612 5.93E-01
119.571642 3.43E-01 2.77907E+02 8.27E-01 708.540116 6.03E-01
120.178774 3.43E-01 2.77921E+02 8.27E-01 708.540619 6.12E-01
120.388223 3.43E-01 277.923 8.27E-01 708.541123 6.21E-01
120.620865 3.43E-01 2.77925E+02 8.27E-01 708.541635 6.31E-01
120.862594 3.43E-01 2.77927E+02 8.27E-01 708.542156 6.40E-01
121.223627 3.43E-01 2.77931E+02 8.27E-01 708.542657 6.50E-01
121.394691 3.53E-01 277.952 8.27E-01 708.543151 6.60E-01
121.41424 3.63E-01 2.77955E+02 8.27E-01 708.543652 6.70E-01
121.636143 3.69E-01 2.77957E+02 8.27E-01 708.544151 6.80E-01
121.880221 3.69E-01 277.959 8.27E-01 708.54466 6.89E-01
122.133583 3.69E-01 2.77962E+02 8.27E-01 708.545161 6.99E-01
122.596868 3.69E-01 2.77968E+02 8.27E-01 708.545676 7.07E-01
123.01539 3.69E-01 2.77999E+02 8.27E-01 708.546179 7.17E-01
123.677619 3.69E-01 2.78002E+02 8.27E-01 708.546665 7.27E-01
123.938072 3.69E-01 2.78008E+02 8.27E-01 708.547169 7.36E-01
124.341122 3.69E-01 2.7802E+02 8.28E-01 708.54768 7.46E-01
124.570596 3.70E-01 2.78022E+02 8.28E-01 708.548182 7.55E-01
124.83813 3.70E-01 2.78028E+02 8.28E-01 708.548637 7.66E-01
125.437652 3.70E-01 2.78031E+02 8.28E-01 708.549138 7.75E-01
125.663189 3.70E-01 2.78033E+02 8.28E-01 708.549639 7.83E-01
125.876302 3.70E-01 278.036 8.28E-01 708.550144 7.93E-01
126.472777 3.71E-01 2.78038E+02 8.28E-01 708.550647 8.02E-01
126.705614 3.71E-01 278.048 8.28E-01 708.551153 8.11E-01
126.96485 3.71E-01 2.7805E+02 8.28E-01 708.551658 8.20E-01
127.167953 3.71E-01 2.78054E+02 8.28E-01 708.552161 8.28E-01
127.443413 3.71E-01 2.78059E+02 8.28E-01 708.552662 8.34E-01
127.645838 3.71E-01 2.78063E+02 8.28E-01 708.553165 8.40E-01
127.875152 3.71E-01 2.7807E+02 8.28E-01 708.553668 8.45E-01
128.083369 3.72E-01 2.78081E+02 8.28E-01 708.554179 8.50E-01
128.66276 3.72E-01 2.78087E+02 8.28E-01 708.554681 8.55E-01
128.871078 3.72E-01 2.78092E+02 8.28E-01 708.555183 8.60E-01
129.217128 3.72E-01 278.096 8.28E-01 708.555684 8.64E-01
129.476226 3.72E-01 2.78099E+02 8.28E-01 708.556196 8.67E-01
129.972863 3.72E-01 2.78111E+02 8.28E-01 708.5567 8.70E-01
130.458366 3.72E-01 278.12 8.28E-01 708.557201 8.74E-01
131.123556 3.72E-01 2.78155E+02 8.28E-01 708.557718 8.78E-01
131.369163 3.72E-01 2.78157E+02 8.28E-01 708.558228 8.81E-01
131.707645 3.72E-01 2.7816E+02 8.28E-01 708.558757 8.84E-01
131.99021 3.73E-01 2.78172E+02 8.28E-01 708.559258 8.87E-01
132.232757 3.73E-01 2.78181E+02 8.28E-01 708.559761 8.90E-01
132.435301 3.73E-01 278.19 8.28E-01 708.560268 8.92E-01
132.910741 3.73E-01 2.78204E+02 8.28E-01 708.560783 8.94E-01
133.571414 3.74E-01 2.78317E+02 8.28E-01 708.561289 8.96E-01
133.960253 3.74E-01 2.7832E+02 8.28E-01 708.561793 8.99E-01
134.166575 3.74E-01 278.322 8.28E-01 708.562316 9.00E-01
134.858197 3.74E-01 2.78324E+02 8.29E-01 708.562818 9.02E-01
135.382507 3.74E-01 2.78326E+02 8.29E-01 708.563325 9.03E-01
135.647877 3.74E-01 2.78329E+02 8.32E-01 708.563839 9.04E-01
135.862844 3.74E-01 2.78331E+02 8.34E-01 708.564342 9.05E-01
136.234645 3.75E-01 2.78333E+02 8.37E-01 708.564902 9.06E-01
137.762181 3.75E-01 2.78336E+02 8.42E-01 708.565403 9.07E-01
137.990669 3.75E-01 2.78338E+02 8.46E-01 708.566086 9.08E-01
138.349357 3.75E-01 2.78341E+02 8.53E-01 708.566658 9.08E-01
138.677556 3.76E-01 2.78343E+02 8.58E-01 708.567324 9.09E-01
138.983542 3.76E-01 2.78345E+02 8.63E-01 708.567954 9.09E-01
140.293125 3.76E-01 2.78348E+02 8.70E-01 708.568478 9.10E-01
140.85909 3.76E-01 2.7835E+02 8.77E-01 708.569041 9.11E-01
141.398054 3.76E-01 2.78353E+02 8.87E-01 708.569655 9.11E-01
141.703307 3.76E-01 2.78355E+02 8.93E-01 708.570259 9.11E-01
141.912583 3.76E-01 2.78357E+02 8.99E-01 708.570827 9.12E-01
142.125788 3.79E-01 2.7836E+02 9.09E-01 708.571415 9.12E-01
142.33538 3.81E-01 2.78362E+02 9.16E-01 708.571968 9.12E-01
142.539519 3.83E-01 2.78365E+02 9.27E-01 708.572495 9.12E-01
142.744033 3.87E-01 2.78367E+02 9.33E-01 708.573117 9.13E-01
142.946144 3.88E-01 2.78369E+02 9.39E-01 708.57366 9.13E-01
143.146444 3.90E-01 2.78372E+02 9.48E-01 708.574349 9.13E-01
143.355002 3.99E-01 2.78374E+02 9.53E-01 708.734902 9.13E-01
143.507791 4.09E-01 2.78377E+02 9.60E-01 708.737214 9.13E-01
143.609955 4.19E-01 2.78379E+02 9.62E-01 708.74071 9.13E-01
143.739163 4.29E-01 2.78382E+02 9.66E-01 730.532142 9.14E-01
143.91073 4.39E-01 2.78384E+02 9.69E-01 730.532688 9.14E-01
143.918794 4.49E-01 2.78386E+02 9.70E-01 730.533206 9.14E-01
144.005829 4.59E-01 2.78389E+02 9.71E-01 730.53377 9.14E-01
144.271131 4.63E-01 2.78391E+02 9.72E-01 730.534496 9.15E-01
168.353625 4.71E-01 2.78394E+02 9.73E-01 730.535007 9.15E-01
205.550873 4.73E-01 2.78396E+02 9.73E-01 730.535515 9.15E-01
206.371261 4.73E-01 2.78398E+02 9.74E-01 730.536129 9.16E-01
207.200939 4.73E-01 2.78401E+02 9.74E-01 730.536655 9.16E-01
208.033265 4.73E-01 2.78403E+02 9.75E-01 730.537233 9.17E-01
208.864968 4.73E-01 2.78406E+02 9.75E-01 730.537775 9.17E-01
209.712604 4.73E-01 2.78408E+02 9.75E-01 730.538315 9.17E-01
210.522206 4.73E-01 2.7841E+02 9.75E-01 730.538829 9.18E-01
211.331604 4.73E-01 2.78413E+02 9.75E-01 730.53934 9.19E-01
212.185829 4.73E-01 278.416 9.76E-01 730.539858 9.19E-01
212.992211 4.73E-01 2.78418E+02 9.76E-01 730.540368 9.20E-01
213.799864 4.73E-01 2.78421E+02 9.76E-01 730.540876 9.21E-01
214.62675 4.73E-01 278.423 9.76E-01 730.541396 9.22E-01
215.479057 4.73E-01 2.78426E+02 9.76E-01 730.54191 9.23E-01
216.295796 4.73E-01 2.7843E+02 9.76E-01 730.542417 9.24E-01
217.287887 4.73E-01 2.86546E+02 9.76E-01 730.542951 9.26E-01
218.085839 4.73E-01 286.548 9.76E-01 730.543506 9.27E-01
248.570381 4.73E-01 2.86648E+02 9.76E-01 730.544011 9.28E-01
248.582965 4.83E-01 286.726 9.76E-01 730.544532 9.29E-01
248.602264 4.93E-01 2.86979E+02 9.76E-01 730.545079 9.30E-01
250.667727 4.95E-01 2.87013E+02 9.76E-01 730.54558 9.31E-01
2.50978E+02 4.95E-01 287.428 9.76E-01 730.546131 9.32E-01
251.178429 4.96E-01 287.791 9.76E-01 730.546634 9.32E-01
251.450359 5.01E-01 2.87793E+02 9.76E-01 730.547135 9.33E-01
251.577147 5.11E-01 2.87795E+02 9.76E-01 730.547655 9.35E-01
251.5814 5.21E-01 2.87797E+02 9.76E-01 730.548265 9.35E-01
251.585063 5.31E-01 2.878E+02 9.76E-01 730.548825 9.36E-01
251.588196 5.41E-01 2.87802E+02 9.77E-01 730.549424 9.37E-01
251.590875 5.51E-01 2.87805E+02 9.78E-01 730.549944 9.38E-01
251.593276 5.61E-01 2.87807E+02 9.78E-01 730.55048 9.38E-01
251.595566 5.71E-01 2.87809E+02 9.79E-01 730.551023 9.39E-01
251.597953 5.81E-01 2.87812E+02 9.79E-01 730.551602 9.39E-01
251.600493 5.91E-01 2.87814E+02 9.80E-01 730.552188 9.39E-01
251.603049 6.01E-01 2.87817E+02 9.80E-01 730.552719 9.39E-01
251.605817 6.11E-01 2.87819E+02 9.80E-01 730.553309 9.40E-01
251.608813 6.21E-01 2.87821E+02 9.81E-01 730.553887 9.41E-01
251.611617 6.31E-01 2.87824E+02 9.81E-01 730.554445 9.41E-01
251.614217 6.41E-01 2.87826E+02 9.81E-01 730.555148 9.42E-01
251.616523 6.51E-01 2.8783E+02 9.81E-01 730.555665 9.42E-01
251.618835 6.61E-01 2.87832E+02 9.81E-01 730.556229 9.42E-01
251.620634 6.71E-01 2.87835E+02 9.81E-01 730.55677 9.43E-01
251.623102 6.81E-01 2.87837E+02 9.81E-01 730.557478 9.43E-01
251.62579 6.91E-01 2.8784E+02 9.81E-01 730.558167 9.43E-01
251.628944 7.01E-01 2.87842E+02 9.81E-01 730.558985 9.43E-01
251.633684 7.11E-01 2.87844E+02 9.81E-01 730.559949 9.44E-01
251.643573 7.21E-01 2.87848E+02 9.81E-01 730.561118 9.44E-01
251.663133 7.31E-01 287.858 9.81E-01 730.562007 9.44E-01
251.669186 7.41E-01 3.16832E+02 9.81E-01 730.564193 9.44E-01
251.672983 7.51E-01 3.19796E+02 9.81E-01 730.564785 9.44E-01
251.676297 7.61E-01 319.798 9.81E-01 730.565398 9.44E-01
251.680184 7.71E-01 3.198E+02 9.82E-01 730.565933 9.44E-01
251.684952 7.81E-01 3.19802E+02 9.82E-01 730.568703 9.44E-01
253.605018 7.90E-01 3.19805E+02 9.83E-01 798.192375 9.44E-01
253.830593 7.91E-01 3.19807E+02 9.83E-01 798.19288 9.45E-01
254.032387 7.95E-01 3.19809E+02 9.84E-01 798.193486 9.45E-01
254.232723 7.98E-01 3.19812E+02 9.84E-01 798.194012 9.45E-01
254.432982 8.01E-01 3.19814E+02 9.85E-01 798.194544 9.45E-01
254.655516 8.02E-01 3.19817E+02 9.85E-01 798.195065 9.46E-01
254.674669 8.12E-01 3.19819E+02 9.85E-01 798.195797 9.46E-01
254.679678 8.23E-01 3.19821E+02 9.85E-01 798.196458 9.46E-01
254.684061 8.33E-01 3.20099E+02 9.85E-01 798.19703 9.46E-01
254.687398 8.43E-01 3.20102E+02 9.86E-01 798.197973 9.46E-01
254.691011 8.53E-01 3.20104E+02 9.86E-01 798.198574 9.47E-01
254.695441 8.63E-01 3.20106E+02 9.87E-01 798.199115 9.47E-01
254.698646 8.73E-01 320.108 9.87E-01 798.199629 9.47E-01
254.701712 8.83E-01 3.2011E+02 9.87E-01 798.200334 9.48E-01
254.704391 8.93E-01 3.20112E+02 9.87E-01 798.200892 9.48E-01
254.707251 9.03E-01 3.20115E+02 9.87E-01 798.201777 9.48E-01
254.709434 9.13E-01 3.20117E+02 9.88E-01 798.202418 9.48E-01
254.711908 9.23E-01 3.20119E+02 9.88E-01 798.202978 9.49E-01
254.714907 9.33E-01 3.20122E+02 9.88E-01 798.204407 9.49E-01
254.719086 9.43E-01 3.20124E+02 9.88E-01 798.205017 9.49E-01
261.353827 9.50E-01 3.20127E+02 9.89E-01 798.20556 9.50E-01
263.471832 9.59E-01 3.20129E+02 9.89E-01 798.206085 9.50E-01
263.672785 9.60E-01 3.20132E+02 9.89E-01 798.207028 9.51E-01
263.937979 9.60E-01 3.20135E+02 9.89E-01 798.207554 9.51E-01
264.166399 9.60E-01 320.137 9.89E-01 798.20826 9.52E-01
264.367359 9.68E-01 3.20163E+02 9.89E-01 798.209308 9.52E-01
291.301074 9.68E-01 3.20195E+02 9.89E-01 798.210202 9.52E-01
291.502108 9.68E-01 3.20218E+02 9.89E-01 798.211121 9.52E-01
291.734693 9.69E-01 3.20234E+02 9.89E-01 798.211632 9.53E-01
291.987937 9.69E-01 3.20237E+02 9.89E-01 798.212514 9.54E-01
292.256392 9.69E-01 3.20256E+02 9.89E-01 798.213112 9.54E-01
292.374567 9.79E-01 3.20268E+02 9.89E-01 798.21362 9.56E-01
293.590341 9.83E-01 3.2027E+02 9.89E-01 798.214179 9.57E-01
293.790637 9.84E-01 3.20285E+02 9.89E-01 798.214753 9.58E-01
293.818018 9.94E-01 3.20296E+02 9.89E-01 798.215261 9.60E-01
294.089318 1.00E+00 320.298 9.89E-01 798.21583 9.61E-01
294.114777 1.00E+00 3.20314E+02 9.89E-01 798.216562 9.61E-01
3.20328E+02 9.89E-01 798.217112 9.63E-01
3.20331E+02 9.89E-01 798.217642 9.63E-01
3.20342E+02 9.89E-01 798.218156 9.64E-01
320.346 9.89E-01 798.218661 9.65E-01
3.20359E+02 9.89E-01 798.219163 9.67E-01
3.20372E+02 9.89E-01 798.219707 9.68E-01
3.20385E+02 9.89E-01 798.220228 9.70E-01
3.20388E+02 9.89E-01 798.220764 9.72E-01
3.204E+02 9.89E-01 798.221294 9.73E-01
3.20406E+02 9.89E-01 798.22185 9.75E-01
3.2041E+02 9.89E-01 798.222376 9.77E-01
3.20933E+02 9.89E-01 798.222895 9.79E-01
3.20939E+02 9.89E-01 798.223401 9.80E-01
3.20944E+02 9.89E-01 798.223903 9.82E-01
3.2096E+02 9.89E-01 798.224435 9.83E-01
3.20965E+02 9.89E-01 798.224974 9.84E-01
3.21006E+02 9.89E-01 798.225477 9.85E-01
3.21009E+02 9.89E-01 798.226055 9.86E-01
3.21011E+02 9.89E-01 798.226591 9.87E-01
3.21014E+02 9.89E-01 798.227096 9.88E-01
3.21027E+02 9.89E-01 798.227682 9.89E-01
3.21032E+02 9.89E-01 798.228251 9.90E-01
321.036 9.89E-01 798.228837 9.91E-01
321.041 9.89E-01 798.229361 9.92E-01
3.21047E+02 9.89E-01 798.229929 9.93E-01
3.21052E+02 9.90E-01 798.230618 9.93E-01
3.21057E+02 9.90E-01 798.231123 9.93E-01
3.21062E+02 9.90E-01 798.231681 9.94E-01
3.21064E+02 9.90E-01 798.232215 9.95E-01
3.21082E+02 9.90E-01 798.232724 9.96E-01
3.21087E+02 9.90E-01 798.233263 9.96E-01
321.101 9.90E-01 798.233861 9.97E-01
3.21106E+02 9.90E-01 798.234377 9.97E-01
3.21465E+02 9.90E-01 798.234903 9.97E-01
3.21468E+02 9.90E-01 798.235651 9.98E-01
3.21473E+02 9.90E-01 798.236198 9.98E-01
3.21741E+02 9.90E-01 798.236792 9.99E-01
3.21874E+02 9.90E-01 798.23772 9.99E-01
3.21881E+02 9.90E-01 798.239139 9.99E-01
3.21919E+02 9.90E-01 798.239735 9.99E-01
3.21956E+02 9.90E-01 798.240493 9.99E-01
3.21981E+02 9.90E-01 798.241815 9.99E-01
3.22114E+02 9.90E-01 798.245137 1.00E+00
3.22386E+02 9.90E-01 798.246312 1.00E+00
3.22391E+02 9.90E-01 799.77378 1.00E+00
3.22396E+02 9.90E-01 799.775975 1.00E+00
3.2245E+02 9.90E-01 800.301105 1.00E+00
3.22453E+02 9.90E-01 800.301266 1.00E+00
3.22511E+02 9.90E-01
3.22514E+02 9.90E-01
3.22516E+02 9.90E-01
3.22523E+02 9.90E-01
3.22538E+02 9.90E-01
3.22603E+02 9.90E-01
3.22638E+02 9.90E-01
322.697 9.90E-01
322.815 9.90E-01
3.22981E+02 9.90E-01
322.983 9.90E-01
3.22985E+02 9.90E-01
3.22987E+02 9.90E-01
3.2299E+02 9.90E-01
3.22992E+02 9.90E-01
3.22994E+02 9.91E-01
3.22997E+02 9.91E-01
3.22999E+02 9.91E-01
3.23002E+02 9.92E-01
3.23004E+02 9.93E-01
3.23007E+02 9.94E-01
3.23009E+02 9.94E-01
3.23011E+02 9.95E-01
3.23014E+02 9.95E-01
3.23016E+02 9.96E-01
3.23019E+02 9.97E-01
3.23021E+02 9.97E-01
3.23023E+02 9.98E-01
3.23026E+02 9.99E-01
3.23028E+02 9.99E-01
3.23031E+02 9.99E-01
3.23033E+02 9.99E-01
3.23035E+02 9.99E-01
3.23038E+02 1.00E+00
3.2304E+02 1.00E+00
3.23043E+02 1.00E+00
3.23045E+02 1.00E+00
3.23047E+02 1.00E+00
3.23056E+02 1.00E+00
3.23058E+02 1.00E+00
323.065 1.00E+00
3.24224E+02 1.00E+00
3.24227E+02 1.00E+00
3.24229E+02 1.00E+00
3.24232E+02 1.00E+00
3.24249E+02 1.00E+00
3.24253E+02 1.00E+00
3.24276E+02 1.00E+00
3.2428E+02 1.00E+00
3.24306E+02 1.00E+00
3.24312E+02 1.00E+00
3.24316E+02 1.00E+00
324.322 1.00E+00
324.327 1.00E+00
3.24331E+02 1.00E+00
3.24333E+02 1.00E+00
3.24336E+02 1.00E+00
3.24352E+02 1.00E+00
324.358 1.00E+00
3.24364E+02 1.00E+00
3.24378E+02 1.00E+00
3.24383E+02 1.00E+00
3.24787E+02 1.00E+00
3.24796E+02 1.00E+00
3.24799E+02 1.00E+00
324.803 1.00E+00
324.803 1.00E+00
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 90 180 270 360 450 540 630 720 810 900C
um
ula
tive F
racti
on
FCT (Seconds)
OpenTCP1
OpenTCP2
TCP
(a)0.0 1.33E-01 0.0 8.16E-01 0.0 2.33E-01
0.0 1.33E-01 0.0 8.16E-01 0.0 2.33E-01
1.0 2.46E-01 1.0 8.29E-01 1.0 3.56E-01
2.0 3.05E-01 2.0 8.31E-01 2.0 4.56E-01
3.0 3.52E-01 3.0 8.33E-01 3.0 5.48E-01
4.0 3.73E-01 4.0 8.33E-01 4.0 6.24E-01
6.0 3.86E-01 5.0 8.33E-01 5.0 6.92E-01
11.0 3.98E-01 6.0 8.33E-01 6.0 7.51E-01
15.0 4.10E-01 8.0 8.33E-01 7.0 8.00E-01
18.0 4.21E-01 9.0 8.33E-01 8.0 8.45E-01
21.0 4.36E-01 11.0 8.33E-01 9.0 8.78E-01
23.0 4.46E-01 12.0 8.33E-01 10.0 9.04E-01
25.0 4.57E-01 13.0 8.33E-01 11.0 9.26E-01
27.0 4.68E-01 14.0 8.33E-01 12.0 9.41E-01
29.0 4.78E-01 15.0 8.33E-01 13.0 9.53E-01
31.0 4.89E-01 16.0 8.33E-01 14.0 9.62E-01
33.0 5.00E-01 17.0 8.34E-01 15.0 9.71E-01
35.0 5.10E-01 19.0 8.34E-01 16.0 9.76E-01
38.0 5.24E-01 21.0 8.34E-01 17.0 9.81E-01
41.0 5.38E-01 23.0 8.34E-01 18.0 9.85E-01
44.0 5.51E-01 31.0 8.34E-01 19.0 9.87E-01
47.0 5.64E-01 32.0 8.34E-01 20.0 9.90E-01
50.0 5.79E-01 33.0 8.34E-01 21.0 9.91E-01
52.0 5.89E-01 34.0 8.34E-01 22.0 9.92E-01
54.0 6.00E-01 35.0 8.34E-01 23.0 9.93E-01
56.0 6.11E-01 36.0 8.34E-01 24.0 9.93E-01
58.0 6.24E-01 37.0 8.34E-01 25.0 9.94E-01
60.0 6.36E-01 38.0 8.34E-01 26.0 9.95E-01
62.0 6.48E-01 41.0 8.34E-01 27.0 9.96E-01
64.0 6.61E-01 42.0 8.34E-01 28.0 9.96E-01
66.0 6.73E-01 43.0 8.34E-01 29.0 9.96E-01
68.0 6.84E-01 44.0 8.34E-01 30.0 9.97E-01
70.0 6.96E-01 45.0 8.34E-01 31.0 9.97E-01
72.0 7.08E-01 46.0 8.34E-01 32.0 9.98E-01
74.0 7.21E-01 47.0 8.34E-01 33.0 9.98E-01
76.0 7.33E-01 48.0 8.34E-01 34.0 9.98E-01
78.0 7.43E-01 49.0 8.35E-01 35.0 9.98E-01
80.0 7.55E-01 50.0 8.35E-01 36.0 9.99E-01
82.0 7.66E-01 51.0 8.35E-01 37.0 9.99E-01
84.0 7.77E-01 52.0 8.35E-01 38.0 9.99E-01
86.0 7.87E-01 53.0 8.35E-01 39.0 9.99E-01
89.0 8.01E-01 54.0 8.36E-01 42.0 9.99E-01
92.0 8.14E-01 55.0 8.36E-01 43.0 9.99E-01
95.0 8.26E-01 56.0 8.37E-01 46.0 9.99E-01
98.0 8.37E-01 57.0 8.38E-01 47.0 9.99E-01
101.0 8.48E-01 58.0 8.38E-01 48.0 9.99E-01
104.0 8.58E-01 59.0 8.39E-01 51.0 1.00E+00
108.0 8.71E-01 60.0 8.40E-01 52.0 1.00E+00
112.0 8.83E-01 61.0 8.42E-01 55.0 1.00E+00
116.0 8.95E-01 62.0 8.43E-01 71.0 1.00E+00
120.0 9.06E-01 63.0 8.44E-01 71.0 1.00E+00
124.0 9.17E-01 64.0 8.46E-01
129.0 9.28E-01 65.0 8.48E-01
134.0 9.38E-01 66.0 8.50E-01
140.0 9.47E-01 67.0 8.51E-01
146.0 9.52E-01 68.0 8.53E-01
152.0 9.57E-01 69.0 8.56E-01
158.0 9.59E-01 70.0 8.58E-01
164.0 9.61E-01 71.0 8.60E-01
170.0 9.62E-01 72.0 8.62E-01
176.0 9.62E-01 73.0 8.65E-01
182.0 9.63E-01 74.0 8.68E-01
188.0 9.64E-01 75.0 8.70E-01
194.0 9.65E-01 76.0 8.73E-01
200.0 9.65E-01 77.0 8.76E-01
206.0 9.66E-01 78.0 8.79E-01
212.0 9.67E-01 79.0 8.81E-01
218.0 9.67E-01 80.0 8.83E-01
224.0 9.68E-01 81.0 8.86E-01
230.0 9.69E-01 82.0 8.88E-01
236.0 9.69E-01 83.0 8.90E-01
242.0 9.70E-01 84.0 8.93E-01
248.0 9.70E-01 85.0 8.94E-01
254.0 9.71E-01 86.0 8.96E-01
260.0 9.71E-01 87.0 8.98E-01
266.0 9.72E-01 88.0 9.00E-01
272.0 9.72E-01 89.0 9.02E-01
278.0 9.73E-01 90.0 9.03E-01
284.0 9.73E-01 91.0 9.05E-01
290.0 9.73E-01 92.0 9.06E-01
296.0 9.74E-01 93.0 9.08E-01
302.0 9.74E-01 94.0 9.09E-01
308.0 9.75E-01 95.0 9.11E-01
314.0 9.75E-01 96.0 9.11E-01
320.0 9.76E-01 97.0 9.13E-01
326.0 9.76E-01 98.0 9.13E-01
332.0 9.77E-01 99.0 9.14E-01
338.0 9.77E-01 100.0 9.15E-01
344.0 9.77E-01 101.0 9.16E-01
350.0 9.78E-01 102.0 9.16E-01
356.0 9.78E-01 103.0 9.17E-01
362.0 9.78E-01 104.0 9.18E-01
368.0 9.79E-01 105.0 9.18E-01
374.0 9.79E-01 106.0 9.19E-01
380.0 9.79E-01 107.0 9.19E-01
386.0 9.80E-01 108.0 9.20E-01
392.0 9.80E-01 109.0 9.20E-01
398.0 9.80E-01 110.0 9.21E-01
404.0 9.81E-01 111.0 9.21E-01
410.0 9.81E-01 112.0 9.22E-01
416.0 9.81E-01 113.0 9.23E-01
422.0 9.81E-01 114.0 9.23E-01
428.0 9.82E-01 115.0 9.24E-01
434.0 9.82E-01 116.0 9.25E-01
440.0 9.82E-01 117.0 9.26E-01
446.0 9.82E-01 118.0 9.27E-01
452.0 9.83E-01 119.0 9.27E-01
458.0 9.83E-01 120.0 9.28E-01
464.0 9.83E-01 121.0 9.29E-01
470.0 9.83E-01 122.0 9.31E-01
476.0 9.83E-01 123.0 9.32E-01
482.0 9.84E-01 124.0 9.33E-01
488.0 9.84E-01 125.0 9.34E-01
495.0 9.84E-01 126.0 9.36E-01
501.0 9.84E-01 127.0 9.37E-01
507.0 9.84E-01 128.0 9.39E-01
513.0 9.84E-01 129.0 9.41E-01
521.0 9.84E-01 130.0 9.43E-01
528.0 9.85E-01 131.0 9.44E-01
534.0 9.85E-01 132.0 9.46E-01
541.0 9.85E-01 133.0 9.48E-01
548.0 9.85E-01 134.0 9.50E-01
556.0 9.85E-01 135.0 9.52E-01
562.0 9.85E-01 136.0 9.54E-01
568.0 9.85E-01 137.0 9.56E-01
574.0 9.85E-01 138.0 9.58E-01
582.0 9.86E-01 139.0 9.60E-01
589.0 9.86E-01 140.0 9.62E-01
596.0 9.86E-01 141.0 9.64E-01
602.0 9.86E-01 142.0 9.66E-01
608.0 9.86E-01 143.0 9.68E-01
615.0 9.86E-01 144.0 9.70E-01
621.0 9.86E-01 145.0 9.72E-01
628.0 9.86E-01 146.0 9.74E-01
635.0 9.87E-01 147.0 9.75E-01
641.0 9.87E-01 148.0 9.77E-01
651.0 9.87E-01 149.0 9.78E-01
658.0 9.87E-01 150.0 9.80E-01
666.0 9.87E-01 151.0 9.82E-01
672.0 9.87E-01 152.0 9.83E-01
680.0 9.87E-01 153.0 9.84E-01
686.0 9.87E-01 154.0 9.85E-01
692.0 9.87E-01 155.0 9.87E-01
699.0 9.88E-01 156.0 9.88E-01
707.0 9.88E-01 157.0 9.89E-01
716.0 9.88E-01 158.0 9.90E-01
722.0 9.88E-01 159.0 9.91E-01
730.0 9.88E-01 160.0 9.92E-01
738.0 9.88E-01 161.0 9.92E-01
745.0 9.88E-01 162.0 9.93E-01
755.0 9.88E-01 163.0 9.94E-01
763.0 9.88E-01 164.0 9.95E-01
772.0 9.88E-01 165.0 9.95E-01
778.0 9.88E-01 166.0 9.96E-01
785.0 9.88E-01 167.0 9.96E-01
792.0 9.89E-01 168.0 9.96E-01
798.0 9.89E-01 169.0 9.97E-01
808.0 9.89E-01 170.0 9.97E-01
814.0 9.89E-01 171.0 9.97E-01
820.0 9.89E-01 172.0 9.97E-01
826.0 9.89E-01 173.0 9.97E-01
832.0 9.89E-01 174.0 9.98E-01
838.0 9.89E-01 175.0 9.98E-01
844.0 9.89E-01 176.0 9.98E-01
852.0 9.89E-01 177.0 9.98E-01
859.0 9.89E-01 178.0 9.98E-01
865.0 9.89E-01 179.0 9.98E-01
873.0 9.89E-01 180.0 9.98E-01
879.0 9.90E-01 181.0 9.98E-01
885.0 9.90E-01 182.0 9.98E-01
891.0 9.90E-01 183.0 9.98E-01
898.0 9.90E-01 184.0 9.98E-01
904.0 9.90E-01 185.0 9.98E-01
918.0 9.90E-01 187.0 9.98E-01
924.0 9.90E-01 188.0 9.98E-01
934.0 9.90E-01 189.0 9.98E-01
941.0 9.90E-01 190.0 9.98E-01
947.0 9.90E-01 191.0 9.98E-01
955.0 9.90E-01 192.0 9.99E-01
961.0 9.90E-01 193.0 9.99E-01
967.0 9.90E-01 194.0 9.99E-01
976.0 9.90E-01 195.0 9.99E-01
982.0 9.91E-01 196.0 9.99E-01
990.0 9.91E-01 197.0 9.99E-01
996.0 9.91E-01 198.0 9.99E-01
1002.0 9.91E-01 202.0 9.99E-01
1008.0 9.91E-01 203.0 9.99E-01
1014.0 9.91E-01 204.0 9.99E-01
1020.0 9.91E-01 205.0 9.99E-01
1027.0 9.91E-01 206.0 9.99E-01
1034.0 9.91E-01 207.0 9.99E-01
1042.0 9.91E-01 209.0 9.99E-01
1052.0 9.91E-01 210.0 9.99E-01
1060.0 9.91E-01 211.0 9.99E-01
1070.0 9.91E-01 212.0 9.99E-01
1081.0 9.91E-01 214.0 9.99E-01
1092.0 9.91E-01 225.0 9.99E-01
1099.0 9.91E-01 227.0 9.99E-01
1107.0 9.91E-01 228.0 9.99E-01
1113.0 9.92E-01 231.0 9.99E-01
1120.0 9.92E-01 232.0 9.99E-01
1131.0 9.92E-01 233.0 9.99E-01
1142.0 9.92E-01 234.0 9.99E-01
1148.0 9.92E-01 236.0 9.99E-01
1154.0 9.92E-01 237.0 9.99E-01
1162.0 9.92E-01 241.0 9.99E-01
1173.0 9.92E-01 244.0 9.99E-01
1182.0 9.92E-01 246.0 9.99E-01
1190.0 9.92E-01 248.0 9.99E-01
1201.0 9.92E-01 250.0 9.99E-01
1207.0 9.92E-01 252.0 9.99E-01
1215.0 9.92E-01 253.0 9.99E-01
1221.0 9.92E-01 254.0 9.99E-01
1228.0 9.92E-01 256.0 9.99E-01
1234.0 9.92E-01 257.0 9.99E-01
1240.0 9.92E-01 258.0 9.99E-01
1246.0 9.92E-01 259.0 9.99E-01
1256.0 9.92E-01 262.0 9.99E-01
1262.0 9.92E-01 263.0 9.99E-01
1268.0 9.93E-01 267.0 9.99E-01
1275.0 9.93E-01 268.0 9.99E-01
1281.0 9.93E-01 272.0 9.99E-01
1288.0 9.93E-01 274.0 9.99E-01
1294.0 9.93E-01 275.0 9.99E-01
1301.0 9.93E-01 277.0 9.99E-01
1307.0 9.93E-01 278.0 9.99E-01
1314.0 9.93E-01 279.0 9.99E-01
1320.0 9.93E-01 283.0 9.99E-01
1326.0 9.93E-01 285.0 9.99E-01
1332.0 9.93E-01 286.0 9.99E-01
1338.0 9.94E-01 287.0 9.99E-01
1344.0 9.94E-01 289.0 9.99E-01
1350.0 9.94E-01 291.0 9.99E-01
1356.0 9.94E-01 294.0 9.99E-01
1364.0 9.94E-01 295.0 9.99E-01
1370.0 9.94E-01 299.0 9.99E-01
1378.0 9.94E-01 300.0 9.99E-01
1384.0 9.94E-01 302.0 9.99E-01
1395.0 9.94E-01 306.0 9.99E-01
1401.0 9.94E-01 307.0 9.99E-01
1411.0 9.94E-01 309.0 1.00E+00
1417.0 9.95E-01 313.0 1.00E+00
1424.0 9.95E-01 316.0 1.00E+00
1432.0 9.95E-01 320.0 1.00E+00
1442.0 9.95E-01 325.0 1.00E+00
1449.0 9.95E-01 330.0 1.00E+00
1455.0 9.95E-01 333.0 1.00E+00
1461.0 9.95E-01 334.0 1.00E+00
1468.0 9.95E-01 335.0 1.00E+00
1474.0 9.95E-01 336.0 1.00E+00
1484.0 9.95E-01 341.0 1.00E+00
1492.0 9.95E-01 344.0 1.00E+00
1499.0 9.96E-01 346.0 1.00E+00
1510.0 9.96E-01 358.0 1.00E+00
1517.0 9.96E-01 365.0 1.00E+00
1524.0 9.96E-01 368.0 1.00E+00
1533.0 9.96E-01 369.0 1.00E+00
1539.0 9.96E-01 390.0 1.00E+00
1547.0 9.96E-01 393.0 1.00E+00
1553.0 9.96E-01 411.0 1.00E+00
1559.0 9.96E-01 422.0 1.00E+00
1565.0 9.96E-01 423.0 1.00E+00
1571.0 9.96E-01 449.0 1.00E+00
1577.0 9.97E-01 459.0 1.00E+00
1585.0 9.97E-01 462.0 1.00E+00
1592.0 9.97E-01 499.0 1.00E+00
1599.0 9.97E-01 505.0 1.00E+00
1605.0 9.97E-01 510.0 1.00E+00
1617.0 9.97E-01 545.0 1.00E+00
1626.0 9.97E-01 559.0 1.00E+00
1641.0 9.97E-01 593.0 1.00E+00
1647.0 9.97E-01 619.0 1.00E+00
1657.0 9.97E-01 644.0 1.00E+00
1664.0 9.97E-01 788.0 1.00E+00
1672.0 9.97E-01 828.0 1.00E+00
1683.0 9.97E-01 853.0 1.00E+00
1692.0 9.97E-01 935.0 1.00E+00
1699.0 9.97E-01 941.0 1.00E+00
1709.0 9.97E-01 1018.0 1.00E+00
1715.0 9.97E-01 1263.0 1.00E+00
1727.0 9.97E-01 1306.0 1.00E+00
1733.0 9.97E-01 1373.0 1.00E+00
1739.0 9.98E-01 1373.0 1.00E+00
1749.0 9.98E-01
1757.0 9.98E-01
1763.0 9.98E-01
1771.0 9.98E-01
1784.0 9.98E-01
1791.0 9.98E-01
1801.0 9.98E-01
1817.0 9.98E-01
1836.0 9.98E-01
1848.0 9.98E-01
1859.0 9.98E-01
1870.0 9.98E-01
1885.0 9.98E-01
1898.0 9.98E-01
1913.0 9.98E-01
1930.0 9.98E-01
1936.0 9.98E-01
1949.0 9.98E-01
1956.0 9.98E-01
1969.0 9.98E-01
1976.0 9.99E-01
1985.0 9.99E-01
1999.0 9.99E-01
2008.0 9.99E-01
2018.0 9.99E-01
2033.0 9.99E-01
2049.0 9.99E-01
2055.0 9.99E-01
2063.0 9.99E-01
2074.0 9.99E-01
2091.0 9.99E-01
2098.0 9.99E-01
2112.0 9.99E-01
2123.0 9.99E-01
2133.0 9.99E-01
2140.0 9.99E-01
2150.0 9.99E-01
2156.0 9.99E-01
2169.0 9.99E-01
2175.0 9.99E-01
2181.0 9.99E-01
2193.0 9.99E-01
2210.0 9.99E-01
2222.0 9.99E-01
2232.0 9.99E-01
2251.0 9.99E-01
2262.0 9.99E-01
2271.0 9.99E-01
2281.0 9.99E-01
2288.0 9.99E-01
2297.0 9.99E-01
2319.0 9.99E-01
2325.0 1.00E+00
2336.0 1.00E+00
2343.0 1.00E+00
2401.0 1.00E+00
2420.0 1.00E+00
2435.0 1.00E+00
2451.0 1.00E+00
2478.0 1.00E+00
2509.0 1.00E+00
2526.0 1.00E+00
2570.0 1.00E+00
2614.0 1.00E+00
2718.0 1.00E+00
2786.0 1.00E+00
3231.0 1.00E+00
3242.0 1.00E+00
3439.0 1.00E+00
4005.0 1.00E+00
4005.0 1.00E+00
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1E+00 1E+01 1E+02 1E+03 1E+04
Cum
ula
tive
Fra
ctio
n
Log of Normalized Drop Rate
1E+001E-011E-021E-031E-04
OpenTCP
1
OpenTCP2
TCP
(b)
Figure 5.7: Comparing the CDF of FCT (a) and drop rate (b) of OpenTCP1, OpenTCP2, and regular TCP while running the “all-to-all” IMB benchmark.
Experiment Results. To run our OpenTCP experiments, we first define two congestion control policies for OpenTCP: OpenTCP1 and OpenTCP2. We then compare the performance of OpenTCP1, OpenTCP2, and unmodified TCP in terms of Flow Completion Times (FCT) and packet drop rates using different benchmarks. Finally, we take a closer look at the overheads of OpenTCP and the impact of changing the refresh rate.

OpenTCP1 – Minimize FCTs. Our first experiment with OpenTCP aims at reducing FCTs by updating the initial congestion window size and the retransmission time-out (RTO). In SciNet, there is a strong correlation between MPI job completion times and the tail of FCT. Therefore, reducing FCTs will have a direct impact on job completion times.

OpenTCP1 collects link utilizations, the drop rate, the ratio of short-lived to long-lived flows, and the number of active flows. The optimization function is to find the appropriate value for the initial
[Histogram data for Figure 5.8 omitted: percentage of packets versus congestion window size (segments) for OpenTCP1 and unmodified TCP.]
Figure 5.8: Congestion window distribution for OpenTCP1 and unmodified TCP.
congestion window size (w0) as long as the maximum link utilization is kept below 70%. If the
link utilization goes above 70% for any of the links, the Oracle will stop sending CUEs and the
CCAs will fall back to the default values of the initial congestion window size and the RTO.
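To make the OpenTCP1 policy concrete, the following is a minimal sketch of the Oracle-side decision just described. Only the 70% utilization cap comes from the text; the NetworkStats and Cue types, the default window and RTO values, and the rule for picking the new init_cwnd are illustrative assumptions, not the actual optimizer used in our experiments.

```python
from dataclasses import dataclass
from typing import List, Optional

UTIL_THRESHOLD = 0.70        # stop sending CUEs above 70% link utilization (from the text)
DEFAULT_INIT_CWND = 3        # segments (illustrative default)
DEFAULT_RTO_MS = 200         # milliseconds (illustrative default)

@dataclass
class NetworkStats:
    link_utilizations: List[float]   # per-link utilization in [0, 1]
    drop_rate: float
    short_to_long_ratio: float       # short-lived to long-lived flows
    active_flows: int

@dataclass
class Cue:
    init_cwnd: int   # segments
    rto_ms: int

def opentcp1_policy(stats: NetworkStats) -> Optional[Cue]:
    """OpenTCP1: raise init_cwnd while every link stays under 70% utilization;
    otherwise send no CUE so the CCAs fall back to the kernel defaults."""
    busiest = max(stats.link_utilizations)
    if busiest > UTIL_THRESHOLD:
        return None                   # Oracle stops sending CUEs
    # Illustrative rule: grow init_cwnd with the spare capacity of the busiest link.
    spare = 1.0 - busiest
    init_cwnd = DEFAULT_INIT_CWND + int(40 * spare)
    return Cue(init_cwnd=init_cwnd, rto_ms=DEFAULT_RTO_MS)
```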
OpenTCP2 – Minimize FCTs, limit drops. While OpenTCP1 aims at improving FCTs by adapting init_cwnd and ignoring packet drops, we define a second variant, called OpenTCP2, which improves upon OpenTCP1 by keeping the packet drop rate below 0.1%. This means that when the CCA observes a packet drop rate above 0.1%, it reverts the initial congestion window size and the RTO to their default values.
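The end-host side of this conditional behaviour can be sketched in the same spirit. The 0.1% threshold is from the policy above; the setter callbacks, default values, and function name are hypothetical stand-ins for whatever mechanism the CCA uses to adjust the kernel's init_cwnd and RTO.

```python
DROP_THRESHOLD = 0.001   # 0.1% packet drop rate (from the policy above)

def apply_conditional_cue(cue, observed_drop_rate,
                          set_init_cwnd, set_rto_ms,
                          default_init_cwnd=3, default_rto_ms=200):
    """OpenTCP2 CCA: apply the Oracle's CUE only while the locally observed
    drop rate stays at or below 0.1%; otherwise revert to the defaults."""
    if observed_drop_rate > DROP_THRESHOLD:
        set_init_cwnd(default_init_cwnd)   # revert initial congestion window
        set_rto_ms(default_rto_ms)         # revert retransmission timeout
    else:
        set_init_cwnd(cue.init_cwnd)
        set_rto_ms(cue.rto_ms)
```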
Impact on flow completion times. Figure 5.7 depicts the CDF of flow completion times for OpenTCP1, OpenTCP2, and unmodified TCP. Here, the tail of the FCT curve for OpenTCP1 is almost 64% shorter than for unmodified TCP. As mentioned above, this leads to faster job completion in our experiments. Additionally, more than 45% of the flows finish in under 260 seconds in OpenTCP1, indicating a 62% improvement over unmodified TCP. Similarly, OpenTCP2 helps 80% of the flows finish in a fraction of a second, a significant improvement over OpenTCP1. This is because of OpenTCP2's fast reaction to drop rate at the end-hosts. In OpenTCP2, the tail of the FCT curve is 324 seconds, a 59% improvement over unmodified
TCP. The FCT tail of OpenTCP2 is slightly (8%) longer than that of OpenTCP1, since OpenTCP2 incorporates conditional CUEs and does not aggressively increase the initial congestion window. Moreover, note that the large gap between the FCT of TCP and the OpenTCP variants is due to the nature of the all-to-all benchmark, which stresses the SciNet network. In this benchmark, every process sends to and receives from all other processes. This creates behaviour similar to TCP incast throughput collapse [119].

Impact on congestion window size. The improvements in FCTs are a direct result of increasing init_cwnd in OpenTCP1. Figure 5.8 presents a histogram of congestion window sizes for OpenTCP1 and unmodified TCP. In the case of unmodified TCP, more than 90% of packets are sent when the source has a congestion window size of three, four, or five segments. OpenTCP1, however, is able to operate over a larger range of congestion window sizes: more than 50% of packets are sent while the congestion window size is greater than five segments.
Impact on packet drops. The FCT improvements in OpenTCP1 and OpenTCP2 come at a price. Clearly, operating at larger congestion window sizes results in a higher probability of congestion in the network and, thus, may lead to packet drops. As Figure 5.9 shows, the drop rate distribution for unmodified TCP has a shorter tail compared to OpenTCP1 and OpenTCP2. As expected, OpenTCP1 introduces a considerable number of drops in the network, since its CUEs are not conditional and thus the CCAs do not react to packet drops. OpenTCP2 has no drops for more than 81% of the flows, a significant improvement over unmodified TCP. However, the highest 18% of drops in OpenTCP2 are worse than those in unmodified TCP, as OpenTCP2 needs to observe packet drops before it can react. Some flows take the hit in this case and might end up with relatively large drop rates.

The Flash Benchmark. All the experiments described so far use the “all-to-all” IMB benchmark. The Flash benchmark is a selection of user jobs that we use to evaluate OpenTCP under realistic workloads. Flash solves the Sedov point-blast wave problem [109], a self-similar description of the evolution of a blast wave arising from a powerful explosion. The Flash benchmark represents a common implementation and traffic pattern in SciNet; it generates
[Plot data omitted: panel (a), cumulative fraction versus FCT (seconds) for OpenTCP1, OpenTCP2, and TCP.]
1610.0 9.79E-01 2413.0 1.00E+00
1612.0 9.79E-01 2424.0 1.00E+00
1620.0 9.80E-01 2434.0 1.00E+00
1627.0 9.80E-01 2445.0 1.00E+00
1629.0 9.80E-01 2462.0 1.00E+00
1631.0 9.80E-01 2475.0 1.00E+00
1635.0 9.80E-01 2495.0 1.00E+00
1637.0 9.80E-01 2525.0 1.00E+00
1639.0 9.80E-01 2562.0 1.00E+00
1641.0 9.80E-01 2572.0 1.00E+00
1648.0 9.81E-01 2654.0 1.00E+00
1650.0 9.81E-01 2674.0 1.00E+00
1652.0 9.81E-01 2766.0 1.00E+00
1657.0 9.81E-01 2830.0 1.00E+00
1659.0 9.81E-01 2916.0 1.00E+00
1663.0 9.81E-01 3125.0 1.00E+00
1665.0 9.81E-01 3182.0 1.00E+00
1668.0 9.82E-01 3778.0 1.00E+00
1670.0 9.82E-01 3778.0 1.00E+00
1673.0 9.82E-01
1678.0 9.82E-01
1681.0 9.82E-01
1683.0 9.82E-01
1686.0 9.83E-01
1688.0 9.83E-01
1691.0 9.83E-01
1693.0 9.83E-01
1697.0 9.83E-01
1699.0 9.84E-01
1701.0 9.84E-01
1703.0 9.84E-01
1708.0 9.84E-01
1712.0 9.84E-01
1716.0 9.84E-01
1719.0 9.84E-01
1725.0 9.85E-01
1730.0 9.85E-01
1733.0 9.85E-01
1737.0 9.85E-01
1741.0 9.85E-01
1743.0 9.85E-01
1752.0 9.85E-01
1759.0 9.85E-01
1762.0 9.86E-01
1766.0 9.86E-01
1774.0 9.86E-01
1776.0 9.86E-01
1778.0 9.86E-01
1780.0 9.86E-01
1790.0 9.87E-01
1792.0 9.87E-01
1800.0 9.87E-01
1802.0 9.87E-01
1807.0 9.87E-01
1810.0 9.87E-01
1822.0 9.87E-01
1826.0 9.87E-01
1828.0 9.88E-01
1830.0 9.88E-01
1835.0 9.88E-01
1837.0 9.88E-01
1844.0 9.88E-01
1848.0 9.88E-01
1861.0 9.88E-01
1865.0 9.89E-01
1867.0 9.89E-01
1874.0 9.89E-01
1878.0 9.89E-01
1882.0 9.89E-01
1885.0 9.89E-01
1887.0 9.89E-01
1889.0 9.89E-01
1892.0 9.89E-01
1899.0 9.90E-01
1906.0 9.90E-01
1910.0 9.90E-01
1913.0 9.90E-01
1920.0 9.90E-01
1922.0 9.90E-01
1930.0 9.90E-01
1934.0 9.90E-01
1939.0 9.91E-01
1943.0 9.91E-01
1946.0 9.91E-01
1948.0 9.91E-01
1950.0 9.91E-01
1952.0 9.91E-01
1959.0 9.91E-01
1966.0 9.91E-01
1973.0 9.91E-01
1977.0 9.92E-01
1984.0 9.92E-01
1999.0 9.92E-01
2004.0 9.92E-01
2007.0 9.92E-01
2032.0 9.92E-01
2036.0 9.92E-01
2041.0 9.92E-01
2047.0 9.92E-01
2051.0 9.93E-01
2059.0 9.93E-01
2061.0 9.93E-01
2067.0 9.93E-01
2072.0 9.93E-01
2075.0 9.93E-01
2077.0 9.93E-01
2083.0 9.93E-01
2086.0 9.93E-01
2093.0 9.94E-01
2095.0 9.94E-01
2103.0 9.94E-01
2108.0 9.94E-01
2110.0 9.94E-01
2124.0 9.94E-01
2126.0 9.94E-01
2128.0 9.95E-01
2130.0 9.95E-01
2132.0 9.95E-01
2134.0 9.95E-01
2136.0 9.95E-01
2159.0 9.95E-01
2162.0 9.95E-01
2165.0 9.95E-01
2175.0 9.95E-01
2177.0 9.95E-01
2179.0 9.95E-01
2184.0 9.95E-01
2187.0 9.96E-01
2191.0 9.96E-01
2193.0 9.96E-01
2195.0 9.96E-01
2199.0 9.96E-01
2201.0 9.96E-01
2211.0 9.96E-01
2213.0 9.96E-01
2240.0 9.96E-01
2243.0 9.96E-01
2246.0 9.97E-01
2250.0 9.97E-01
2257.0 9.97E-01
2263.0 9.97E-01
2278.0 9.97E-01
2293.0 9.97E-01
2307.0 9.97E-01
2313.0 9.97E-01
2321.0 9.97E-01
2341.0 9.98E-01
2371.0 9.98E-01
2373.0 9.98E-01
2378.0 9.98E-01
2390.0 9.98E-01
2392.0 9.98E-01
2395.0 9.98E-01
2406.0 9.98E-01
2410.0 9.98E-01
2429.0 9.98E-01
2453.0 9.98E-01
2456.0 9.98E-01
2458.0 9.99E-01
2462.0 9.99E-01
2478.0 9.99E-01
2495.0 9.99E-01
2498.0 9.99E-01
2501.0 9.99E-01
2532.0 9.99E-01
2541.0 9.99E-01
2570.0 9.99E-01
2605.0 9.99E-01
2612.0 9.99E-01
2622.0 9.99E-01
2652.0 1.00E+00
2753.0 1.00E+00
2794.0 1.00E+00
3003.0 1.00E+00
3075.0 1.00E+00
3211.0 1.00E+00
3211.0 1.00E+00
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1E+00 1E+01 1E+02 1E+03 1E+04
Cum
ula
tive
Fra
ctio
n
1E-04 1E-03 1E-02 1E-01 1E-00
OpenTCP1
OpenTCP2TCP
Log of Normalized Drop Rate
(b)
Figure 5.9: Comparing the CDF of FCT (a) and drop rate (b) of OpenTCP1, OpenTCP2, and regular TCP while running the “Flash” benchmark.
about 12,000 flows. Figure 5.9 compares the CDF of flow completion times and drop rates for
OpenTCP1, OpenTCP2, and unmodified TCP. Consistent with the results presented above, both OpenTCP1
and OpenTCP2 improve the tail of the FCT curve by at least 40%. OpenTCP2, however, is more
successful at reducing the drop rate, as it has an explicit rule for controlling drops.
OpenTCP has low overhead. Table 5.1 divides OpenTCP’s overhead into four categories: (i)
the CPU overhead associated with data aggregation and CUE generation at the Oracle, (ii) the
CPU overhead associated with data collection and CUE enforcement in the end-hosts, (iii)
the required bandwidth to transfer statistics from the end-hosts to the Oracle, and (iv) the
required bandwidth to disseminate CUEs to the end-hosts. We summarize OpenTCP’s overhead
for refresh periods of 1, 5, and 10 minutes in Table 5.1. The table shows that OpenTCP has a
negligible processing and bandwidth overhead, making it easy to scale OpenTCP to large clusters
Table 5.1: The approximate OpenTCP overhead for a network with ∼4000 nodes and ∼100 switches for different Oracle refresh cycles.
                           1 min   5 min   10 min
Oracle CPU Overhead (%)     0.9     0.0      0.0
CCA CPU Overhead (%)        0.5     0.1      0.0
Data Collection (Kbps)      453     189       95
CUE Dissemination (Kbps)    530     252       75
of hundreds of thousands of nodes. Clearly, OpenTCP’s refresh cycle plays a critical role in its
overhead on networking elements.
Stability. To realize OpenTCP in practical settings, we assume that the network operator has
expertise in defining the congestion control policies appropriate for the network. Moreover,
to achieve pragmatic stability, there should be a monitoring system in place which alerts the
operator of churns and instabilities in the network. This monitoring component should alert
the controller whenever there is an oscillation between states. For example, if the Oracle makes
adjustments to TCP flows at time t1 and those changes are reverted immediately at time t1 + T,
that is a good indication of a potentially unstable condition. In this case, the operator should
be notified to adjust the congestion control policies, the stability constraints, or the overall
resources in the network. One simple stability metric is the number of times a rule is applied and
then reverted. The monitoring system can measure such stability metrics and alert the operator.
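As a rough illustration of such a metric, the following Go sketch counts apply-and-revert transitions per rule within a refresh cycle; the StabilityMonitor type, its threshold, and the logging alert are hypothetical and not part of OpenTCP's implementation.

package opentcp

import "log"

// StabilityMonitor counts how often each congestion-control rule is applied
// and then reverted within one Oracle refresh cycle. The type, the threshold,
// and the alert below are illustrative only; OpenTCP's monitoring component
// may be implemented differently.
type StabilityMonitor struct {
	flips     map[string]int // rule ID -> apply/revert transitions this cycle
	threshold int            // transitions considered an oscillation
}

func NewStabilityMonitor(threshold int) *StabilityMonitor {
	return &StabilityMonitor{flips: make(map[string]int), threshold: threshold}
}

// RecordTransition is called whenever a rule is applied or reverted.
func (m *StabilityMonitor) RecordTransition(ruleID string) {
	m.flips[ruleID]++
	if m.flips[ruleID] >= m.threshold {
		// A real deployment would notify the operator; here we only log.
		log.Printf("rule %s flipped %d times this cycle", ruleID, m.flips[ruleID])
	}
}

// ResetCycle clears the counters at the start of each refresh cycle.
func (m *StabilityMonitor) ResetCycle() {
	m.flips = make(map[string]int)
}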
Beyond adapting TCP parameters to network and traffic conditions, OpenTCP can also incorporate
application semantics into its decision-making process. For example, the target streaming rate
of video servers can be fed to OpenTCP's Oracle. Similarly, the sizes of files being transferred
can be used by OpenTCP to derive hints for TCP sessions.
5.2 Beehive SDN Controller
Beehive is a generic platform for distributed programming. As such, Beehive provides the
building blocks for an SDN controller, not the specific functionalities of a controller. That is,
in our design, the SDN controller is an application (and perhaps the most important use-case)
of Beehive. In this section, we discuss the SDN controller we have built on top of Beehive to
showcase the effectiveness of our proposal for SDN. It is important to note that the controller
we present here is only one possible design, and Beehive can be used as a framework to develop
other SDN control plane abstractions.
SDN Controller Suite. We have designed and implemented an SDN abstraction on top of
Beehive. Our implementation is open-source [3]. This abstraction can act as a general-purpose,
distributed SDN controller as-is, or can be reused by the community to create other networking
systems. Our abstraction is formed based on simple and familiar SDN constructs and is
implemented as shown in Figure 5.10.
Network object model (NOM) is the core of our SDN abstraction that represents the network
in the form of a graph (similar to ONIX NIB [75]): nodes, links, ports, and flow entries in a
southbound protocol-agnostic manner. We skip the definitions of nodes, ports, links, and flow
entries, as they are generic and well known.
In addition to these common and well-known graph constructs, Beehive’s SDN abstraction
supports triggers and paths that will simplify network programming at scale. Triggers are
basically active thresholds on bandwidth consumption and duration of flow entries. They are
installed by network applications and pushed close to their respective switches by the node
controller (explained shortly). Triggers are a scalable replacement for polling switches directly
from different control applications. A path is an atomic, transactional end-to-end forwarding
rule that is automatically compiled into individual flows across switches in the network. Using
paths, control applications can rely on the platform to manage the details of routing for them.
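To make these two constructs concrete, the sketch below shows how a control application might register a bandwidth trigger and request an end-to-end path. The NodeID, Trigger, and Path types and the emit callback are simplified stand-ins for the abstraction described here, not the exact API of beehive-netctrl.

package nomsketch

import "time"

// NodeID, Trigger, and Path are simplified stand-ins for the NOM constructs
// described in the text; the actual beehive-netctrl types may differ.
type NodeID string

type Trigger struct {
	Node         NodeID
	BandwidthBps uint64        // fire when a flow entry exceeds this rate
	Duration     time.Duration // or when it has been installed this long
}

type Path struct {
	Src, Dst NodeID // edge switches; the platform compiles the per-switch flows
	Priority uint16
}

// installPolicies registers a trigger and requests a path. emit stands in for
// Beehive's message-emission primitive available inside a handler.
func installPolicies(emit func(msg interface{})) {
	// Watch for elephant flows on switch s1 without polling it from the app.
	emit(Trigger{Node: "s1", BandwidthBps: 100_000_000, Duration: 10 * time.Second})

	// Ask the path manager for an atomic end-to-end rule; the platform breaks
	// it into individual flow entries on every switch along the route.
	emit(Path{Src: "edge-1", Dst: "edge-7", Priority: 10})
}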
Node Controller. In our design, node controllers are the active components that manage
[Figure 5.10 diagram omitted: hives hosting the controller, drivers, consolidator, monitor, and path manager, with flows, triggers, and paths exchanged between master and slave bees and the switches.]
Figure 5.10: Beehive’s SDN controller suite.
the NOM objects. To make them protocol-agnostic, we have drivers that act as southbound
mediators. That is, drivers are responsible for mapping physical networking entities into NOM logical
entities. The node controller is responsible for managing those drivers (e.g., electing a master among
them). The node controller stores its data on a per-switch basis and is automatically distributed
and placed close to switches.
The node controller is responsible for configurations at the node level. At a higher level, we have path
managers that install logical paths on network nodes. We have implemented the path manager
as a centralized application, since it requires a global view of the network to install the proper flow
entries. Note that all the expensive functionalities are delegated to the node controller, which is
a distributed application and close to switches.
Consolidator. There is always a possibility of discrepancy between the state of a switch and the
state of a node controller: the switch may remove a flow right when the master driver fails, or a
switch may restart while a flow is being installed by the driver. The consolidator is essentially a
poller that removes any flow that is not in the controller's state, and notifies the controller if the
switch has removed any previously installed flows.
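A minimal reconciliation pass in that spirit might look like the following sketch; the FlowEntry type, the two flow snapshots, and the removal callback are hypothetical placeholders rather than the consolidator's actual interfaces.

package consolidator

// FlowEntry, the two snapshots, and removeFromSwitch are illustrative
// placeholders; the real consolidator works against the NOM and the driver.
type FlowEntry struct{ Match, Action string }

// reconcile removes from the switch any flow unknown to the controller and
// returns the previously installed flows that the switch has silently dropped,
// so the node controller can be notified.
func reconcile(onSwitch, inController []FlowEntry,
	removeFromSwitch func(FlowEntry)) (missing []FlowEntry) {

	want := make(map[FlowEntry]bool, len(inController))
	for _, f := range inController {
		want[f] = true
	}
	have := make(map[FlowEntry]bool, len(onSwitch))
	for _, f := range onSwitch {
		have[f] = true
		if !want[f] {
			removeFromSwitch(f) // flow not in the controller's state: clean it up
		}
	}
	for _, f := range inController {
		if !have[f] {
			missing = append(missing, f) // installed flow vanished from the switch
		}
	}
	return missing
}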
Monitor. The monitor is another poller in the controller suite that implements triggers. For each
node, the monitor periodically sends flow stat queries to find out whether any trigger should
[Figure 5.11 diagrams and plots omitted: panels (a)–(c) show the placement of the master and slave bees of the node controller, the forwarding application, and the driver for switch S1 across hives; panels (d)–(f) plot the reactive forwarding latency in milliseconds (log scale) over 10 seconds, marking the failure, failover, and placement-optimization events.]
Figure 5.11: Fault-tolerance characteristics of the control suite using a Raft election timeout of 300ms: (a) The default placement. (b) When the newly elected masters are on the same hive. (c) When the new masters are on different hives. (d) Reactive forwarding latency of (b). (e) Reactive forwarding latency of (c). (f) Reactive forwarding latency of (c), improved by automatic placement optimization at second 6.
fire. The monitor is distributed on a per-node basis and, in most cases, is placed next to the
node's controller. This reduces control channel overheads.
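As an illustration, a per-node trigger check could look like the following sketch; the FlowStat and Trigger shapes and the fire callback are assumed names rather than the controller suite's actual types.

package monitor

import "time"

// FlowStat and Trigger are assumed shapes for the statistics the monitor polls
// and the thresholds installed by applications; the real types live in the NOM.
type FlowStat struct {
	Match   string
	RateBps uint64
	Age     time.Duration
}

type Trigger struct {
	Match       string
	MaxRateBps  uint64
	MaxDuration time.Duration
}

// checkTriggers fires every trigger whose bandwidth or duration threshold is
// exceeded by a matching flow statistic returned from the node.
func checkTriggers(stats []FlowStat, triggers []Trigger, fire func(Trigger, FlowStat)) {
	for _, t := range triggers {
		for _, s := range stats {
			if s.Match != t.Match {
				continue
			}
			if s.RateBps > t.MaxRateBps || s.Age > t.MaxDuration {
				fire(t, s)
			}
		}
	}
}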
Beehive vs Centralized Controllers. To compare the simplicity of Beehive with centralized
controllers, we have implemented a hub, a learning switch, a host tracker, a firewall (i.e.,
ACL at the edge) and a layer-3 router with respectively 16, 69, 158, 237 and 310 lines of code
using Beehive's SDN abstraction. These applications are similar in functionality and on par
in simplicity with what is available in POX and Beacon. The differences in lines of code are
due to the different verbosity of Python, Java, and Go. The significant difference here is that
Beehive SDN applications automatically benefit from persistence, fault-tolerance, runtime
instrumentation, and optimized placement, readily available in Beehive. We admit that the lines
of code, although commonly used, are not a perfect measure of application complexity.
Finding a proper complexity metric for control applications is beyond the scope of this thesis,
and we leave it to future work.
Fault-Tolerance. To tolerate a failure (i.e., either a controller failure or a lost switch-to-controller
channel), our node controller application pings the drivers of each node (by default, the
controller sends one ping every 100ms). If a driver does not respond to a few (3 by default)
consecutive pings from the node controller, the driver is declared dead. If the lost driver is the master
driver, one of the slave drivers will be instructed to become the master driver. It is important to
note that when a hive that hosts the node controller fails, the platform will automatically delegate
the node to a follower bee on another hive. In such scenarios, once the node controller resumes,
it continues to health-check the drivers.
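The sketch below captures the gist of this health-check loop; the ping interval and the three-strike rule follow the defaults stated above, while the driver handle and the ping and promote callbacks are hypothetical.

package nodectrl

import "time"

// driver is a placeholder handle for a NOM driver; ping and promote stand in
// for the real messages exchanged between the node controller and drivers.
type driver struct {
	id     string
	master bool
	missed int
}

const (
	pingInterval = 100 * time.Millisecond // default ping period
	maxMissed    = 3                      // missed pings before a driver is declared dead
)

// healthCheck pings every driver once per interval. ping reports whether the
// driver answered; promote elects a new master among the remaining drivers.
func healthCheck(drivers []*driver, ping func(*driver) bool, promote func([]*driver)) {
	ticker := time.NewTicker(pingInterval)
	defer ticker.Stop()
	for range ticker.C {
		for _, d := range drivers {
			if ping(d) {
				d.missed = 0
				continue
			}
			d.missed++
			if d.missed < maxMissed {
				continue
			}
			// Driver declared dead: if it was the master, elect a new one.
			if d.master {
				d.master = false
				promote(drivers)
			}
		}
	}
}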
To evaluate how our SDN controller reacts to faults, we have developed a reactive forwarding
application that learns layer-2 and layer-3 addresses, but instead of installing flow entries, it
sends PacketOuts. With that, a failure in the control plane is directly observed in the data
plane. In this experiment, we create a simple network emulated in mininet, and measure the
round-trip time between two hosts every 100ms for 10 seconds. To emulate a fault, we kill the
hive that hosts the OpenFlow driver at second 3. In this set of experiments, we use a Raft election
timeout of 300ms.
Before presenting the results, let us first clarify what happens to the control applications
upon a failure. As shown in Figure 5.11a, because of the default placement, the master bees of
node controller and the forwarding application will be collocated on the hive to which the switch
first connects. The driver on that hive will consequently be the master driver. In this setting, the
average latency of the forwarding application will be about 1ms.
Failures in any of the hives hosting the slave bees would have no effect on the control
application (given that the majority of hives – 2 in this case – are up and running). Now, when
the hive that hosts the master bees fails (or goes into a minority partition), one of the follower bees
of each application will be elected as the master bee for that application. There are two possibilities
here: (i) as shown in Figure 5.11b, the new masters may both be collocated on the same hive, or (ii)
as shown in Figure 5.11c, the new masters may be on different hives3.
As portrayed in Figure 5.11d, when the new masters are elected from the same hive the
latency after the failover will be as low as 1ms. In contrast, when the new masters are located
on different hives, the latency will be as high as 8ms (Figure 5.11e). This is where the automatic
placement optimization can be of measurable help. As shown in Figure 5.11f, our greedy
algorithm detects the suboptimality and migrates the bees accordingly. After a short spike
in latency, caused by the buffering of a couple of messages during migration, the forwarding
latency returns to 1ms. It is important to note that, as demonstrated in Figure 4.11, the Raft election
timeout can be tuned to lower values if a faster failover is required in an environment.
When the link between the switch and the master driver fails (i.e., due to an update in
the control network, congestion in channels, or a switch reconfiguration), something similar
happens (Figure 5.12a). In such scenarios, the master driver notifies the node controller that its
switch is disconnected. In response, the node controller elects one of the remaining slave drivers
as the master. Although the forwarding application and the master driver are not from the
same hive anymore, the forwarding application will perform consistently. �is is an important
3We cannot prefer one bee over another in a colony at election time, since doing so would undermine the provable convergence of Raft.
property of Beehive. As shown in Figure 5.12b, the forwarding latency would accordingly be
high at first. As soon as the automatic optimizer detects this suboptimality, the bees are migrated
and the latency drops. We note that, since the drivers have IO routines, they will never be
migrated by the optimizer. As a result, it is always the other bees that will be placed close to
those IO channels.
Scalability. To characterize the scalability of the Beehive SDN controller, we benchmarked our
learning switch application deployed on 1, 3, and 5 hives with no persistence as well as replication
factors of 1, 3, and 5. We stress each hive with one cbench [111] process. In these experiments, we
use 5 GCE virtual machines with 16 virtual cores, 14.4GB of RAM, and directly attached SSDs.
We launch cbench on either 1, 3, or 5 nodes in parallel, each emulating 16 different switches. That
is, when we use 1, 3, and 5 nodes to stress the application we have 16, 48, and 80 switches equally
spread among 1, 3, and 5 hives, respectively. Moreover, all SDN applications in our controller
suite (e.g., the node controller and discovery) are enabled as well as runtime instrumentation.
Although disabling all extra applications and using a Hub will result in a throughput of one
million messages per second on a single machine, we believe such a benchmark is not a realistic
demonstration of the real-world performance of a control platform.
[Figure 5.12 diagram and plot omitted: panel (a) shows the master and slave drivers of switch S1 across hives; panel (b) plots the reactive forwarding latency in milliseconds (log scale) over 10 seconds, marking the disconnect and the placement optimization.]
Figure 5.12: When a switch disconnects from its master driver: (a) One of the slave drivers will become the master. (b) The reactive forwarding latency is optimized using the placement heuristic.
As shown in Figure 5.13, when we stress only 1 node in the cluster the throughputs of the
learning switch application with di�erent replication factors are very close. As we engage more
controllers, the throughput of the application with replication factors of 3 and 5 increases, but
at a slower pace. This is because, when we use larger replication factors, other hives
will consume resources to replicate the transactions in the follower bees. Moreover, Beehive’s
throughput is significantly higher than that of ONOS 1.2 [17] for the same forwarding functionality.
This experiment demonstrates that, although Beehive is easy to use, it can scale considerably
well. In essence, hiding the boilerplates of distributed programming enables us to systematically
optimize the control plane for all applications. In these experiments, ONOS replicates the
forwarding application on all nodes. As a result, unlike Beehive, we cannot use a replication
factor smaller than the cluster size. For example, when we use a 5-node cluster, ONOS uses all
the five nodes and we cannot run an experiment with a replication factor of 3.
[Figure 5.13 plot omitted: throughput in messages per second versus the number of nodes, for Beehive with replication factors 0, 1, 3, and 5, and for ONOS with replication factors 0, 1, 3, and 5.]
Figure 5.13: Throughput of Beehive's learning switch application and ONOS's forwarding application in different cluster sizes. The number of nodes (i.e., the x-axis) in this graph represents the nodes on which we launch cbench. For Beehive, we always use a 5-node cluster. For ONOS, the cluster size is equal to the replication factor, since all nodes in the cluster participate in running the application.
5.3 Simple Routing in Beehive
Routing mechanisms can be more challenging to implement on existing distributed control
platforms compared to a centralized controller, especially if we want to realize a scalable routing
algorithm that requires no manual configuration and is not topology-specific. In this section,
we demonstrate how simple it is to design and implement a generic k-shortest path algorithm
using Beehive.
Distributed Shortest Path. Suppose we already have a topology discovery algorithm that
[Figure 5.14 sketch: the Routing application keeps two dictionaries, Neighbors and Routes. On Rcv(msg LinkDiscovered), it updates Neighbors and emits RouteAdv{msg.From, msg.To}. On Rcv(msg RouteAdv), it sets dist = msg.Distance and, if dist is a shortest path, emits RouteAdv{To: msg.To, From: N, Distance: dist + 1} for each N in Neighbors.]
Figure 5.14: A distributed shortest path algorithm.
automatically emits a Discovery message whenever an edge is found in the graph. To calculate
the shortest paths in a distributed fashion, we employ an algorithm that is very similar to
traditional distance vector routing protocols. As such, we will emit routing advertisement
messages to signal that we have found a new path towards a given destination. The routing
advertisements are in the form of R = (N1, N2, D), which means N1 has a route to N2 with a length
of D. Whenever a node N receives an advertisement, it emits further advertisements by adding its
neighbors to the route iff the received advertisement is a shortest path. For example, suppose N2,
N3, and N4 are the neighbors of N1. When N1 receives a shortest route R = (N1, A, D) to A, it emits
(N2, A, D + 1), (N3, A, D + 1), and (N4, A, D + 1). This process continues until the network
converges.
In such a setting, the state of our application is accessed on a per-node basis. That is, as shown
in Figure 5.14, we have a dictionary that stores the neighbors of each node, and a dictionary that
stores all the routes initiated from each node. Such a simple application can be easily distributed
on several hives to scale.
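A handler-style rendering of the advertisement logic above is sketched below in Go; the message types, the state struct, and the emit callback are simplified stand-ins inspired by Figure 5.14 rather than the exact Beehive API, which would also supply the Map function that keys this state on a per-node basis.

package routing

// Node, RouteAdv, and LinkDiscovered are simplified versions of the messages
// in Figure 5.14; Beehive's real handler signature and dictionary API differ.
type Node string

type RouteAdv struct {
	From, To Node
	Distance int
}

type LinkDiscovered struct{ From, To Node }

// state stands in for the per-node cells of the Neighbors and Routes
// dictionaries that Beehive maps this application onto.
type state struct {
	Neighbors map[Node][]Node
	Routes    map[Node]map[Node]int // Routes[src][dst] = best known distance
}

func newState() *state {
	return &state{
		Neighbors: make(map[Node][]Node),
		Routes:    make(map[Node]map[Node]int),
	}
}

// rcvLink records the new edge and advertises a one-hop route to the neighbor.
func (s *state) rcvLink(msg LinkDiscovered, emit func(RouteAdv)) {
	s.Neighbors[msg.From] = append(s.Neighbors[msg.From], msg.To)
	emit(RouteAdv{From: msg.From, To: msg.To, Distance: 1})
}

// rcvAdv re-advertises a route through every neighbor iff it is shorter than
// the best route currently stored for that destination.
func (s *state) rcvAdv(msg RouteAdv, emit func(RouteAdv)) {
	if best, ok := s.Routes[msg.From][msg.To]; ok && msg.Distance >= best {
		return // not a shortest path; nothing to propagate
	}
	if s.Routes[msg.From] == nil {
		s.Routes[msg.From] = make(map[Node]int)
	}
	s.Routes[msg.From][msg.To] = msg.Distance
	for _, n := range s.Neighbors[msg.From] {
		emit(RouteAdv{From: n, To: msg.To, Distance: msg.Distance + 1})
	}
}

Because both handlers read and write only the cells keyed by the receiving node, the platform can partition this state across hives and run the handlers in parallel.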
Convergence Evaluation. We have implemented this routing algorithm using Beehive in a few
hundred lines of code and evaluated our implementation using a simulated network. In these
simulations, we assume a k-ary fat-tree network topology with homogeneous switches of 4, 8,
16, 32, 64, and 128 ports (respectively with 8, 16, 128, 512, 2k, 8k edge switches connecting
16, 128, 1k, 8k, 65k, 524k end-hosts). For the purpose of simulation, we use two auxiliary
control applications: one randomly emits discovery events based on the simulated network
topology, and the other receives and calculates the statistics for rule installation commands
from the routing module. We define the convergence time as the difference between the first
discovery message and the last rule installation, that is, the time it takes to provide full multi-path
connectivity between every pair of edge switches in the network.
Using a cluster of six controllers, each allocated four cores and 10GB of memory, we
measured the convergence time (i) when the platform does a cold start (i.e., has no information
about the network topology), (ii) when a new POD is added to the network and (iii) when a new
link is added to the network. In all experiments, each controller controls an equal number of
core switches and PODs.
Note that we have O(k²) alternative paths between two edge switches in a k-ary fat-tree
network and, hence, O(k⁶) alternative paths in total. When we use high-density switches with
64 or 128 ports in a fat-tree network, the number of Route messages exchanged and the memory
needed to store the routes are well beyond the capacity of a single commodity VM. This is where Beehive's
control plane demonstrates its scalability.
As shown in Figure 5.15a, considering the size of the network, this algorithm converges
in a reasonable time from a cold state. Moreover, when we add a new POD, the system converges
in less than a second for switches of 32-ports or less (Figure 5.15b). More interestingly, as
shown in Figure 5.15c, when we randomly add a new link, the system converges in less than a
second, even for larger networks. By allocating more resources for Beehive, one can seamlessly
improve these results with no modifications to the routing application. To demonstrate this, we
conducted another set of experiments for a fat-tree network built from 64-port switches. As
shown in Figure 5.15d, Beehive’s control plane seamlessly scales as we add more controllers and
the convergence time continually diminishes. We note that, if additional computing resources
are available, adding a new controller requires no additional configuration and is practically
[Figure 5.15 plots omitted; each panel shows convergence time versus the number of ports per switch: (a) convergence time from a cold state when there is no route; (b) convergence time when a new POD is added to the network; (c) convergence time when a new link is added between two randomly selected switches; (d) adding more controllers results in faster convergence (using 64-port switches).]
Figure 5.15: Convergence time of the flat routing algorithm in a k-ary fat-tree topology.
seamless in Beehive.
5.4 Remarks
In this chapter, we presented real-world use cases of our proposals. OpenTCP, the Beehive SDN
controller, and the simple routing algorithm demonstrate that Kandoo and Beehive are quite
scalable, despite their simplicity. It is important to note that our proposals are not limited to
these specific use cases and can be applied to a wide range of problems beyond SDN applications.
For instance, Beehive can be utilized to implement a key-value store and a message queue that
are scalable, distributed, and automatically optimized.
Chapter 6
Conclusion and Future Directions
In this thesis, we have deconstructed several scalability concerns of SDN and proposed two
distributed programming models, Kandoo and Beehive, that can scale well while preserving
simplicity. We showed that some of the presumed limitations of SDN are not inherent and only
stem from the historical evolution of OpenFlow. Using the real-world use cases of OpenTCP
and the Beehive SDN controller, we demonstrated that such limitations can be easily overcome
using an aptly designed software platform that is able to accommodate various control functions.
In essence, with a well-designed distributed network programming paradigm, we are able to
preserve the promised simplicity of SDN at scale.
The two-layer hierarchy of controllers in Kandoo (Chapter 3) gives network programmers
the option of choosing between local and network-wide state to implement di�erent control
functionalities. Local controllers, without the need for a centralized network view, have high
parallelism and can be placed close to forwarding elements to shield the control channels from
the load of frequent events. Non-local applications, on the other hand, are deployed in the root
controller and can access the network-wide state. Employing Kandoo, network programmers
merely provide a flag indicating whether an application is local or non-local, and the rest is seamlessly
handled by Kandoo. As such, Kandoo is very simple to use, yet scales an order of magnitude
better than centralized controllers.
Kandoo enables network programmers to implement applications such as elephant flow re-
routing and OpenTCP (Chapter 5) in a simple way with low overheads. For example, OpenTCP
can collect the traffic matrix in a large-scale HPC datacenter without hindering the performance
of MPI jobs, and can adapt TCP to optimize the network. As we demonstrated, OpenTCP can
considerably improve flow completion times, which interestingly improves the perceived end-
to-end performance of the MPI jobs.
The generic programming model proposed in Beehive (Chapter 4) has a wider application
domain than Kandoo and can support different types of functions in addition to local and
centralized applications. Using Beehive’s programming model, applications are written as
centralized message handlers that store their state in dictionaries. Applications written in that
programming model are automatically transformed (either at compile-time or at runtime) into
their distributed counterparts while preserving their intended behavior. Beehive hides many
boilerplates of distributed programming such as concurrency, synchronization, and replication.
With runtime instrumentation and placement optimization, Beehive is capable of automatically
improving the performance of the control plane.
Beehive is expressive and can be used to implement other distributed controllers including
Kandoo. Using several SDN and non-SDN use cases, such as a key-value store and an
SDN controller (Chapter 5), we demonstrated that real-world, distributed applications can
be implemented in Beehive with an effort comparable to centralized controllers. Despite
their simplicity, the throughput and the latency of the implemented applications are on par
with existing distributed controllers. More importantly, Beehive instruments applications at
runtime and automatically optimizes the placement of control applications. Using Beehive SDN
controller, we have demonstrated that this automatically optimized placement results in significant
improvements in the perceived latency of the controller without any intervention by network
programmers.
Future Direction. Over the course of this research, we have encountered interesting open
problems that have not been addressed in this thesis. We believe addressing the following open
problems can result in valuable and impactful research:
1. Alternative Optimized Placement Methods: In Beehive, we presented a greedy algorithm
that tries to optimize the control plane’s latency by reducing the volume of inter-hive
traffic. Although this is very effective for forwarding SDN applications (as shown in
Chapter 5), there are optimization approaches that can be exploited for other types of
applications. For example, another potentially useful placement method is to minimize
the tail latency of control applications. That objective can indeed increase the inter-hive
traffic, but may result in better performance for some specific applications. Moreover,
extending Beehive’s runtime instrumentation, one can provide richer information to the
placement and the optimization methods.
2. Acceleration: Porting Kandoo and Beehive runtimes to switches and co-processors in
forwarding hardware enables us to push the control functions as close as possible to the
data path. This can result in considerable performance improvements and can widen the
application range of our proposals. It is important to note that this does not result in
coupling the control plane back with the data plane. Instead, we still keep the control
plane decoupled, yet provide an option of running code closest to the data path. That
is, the control plane is still decoupled from the data plane, but is placed directly on the
forwarding element. We do not aim at packet processing in the control plane, but with
this acceleration we can process more frequent events at scale.
Another approach to accelerate network programs is to compile local control application
(either Kandoo local applications or Beehive application with local mapped cells) onto
programmable hardware (e.g., NetFPGA [94]). This cross compilation will probably need
a limited programming abstraction, but can result in a system that is capable of providing
scalable packet processing. More importantly, with such a cross compilation, Beehive can
automatically choose the placement of these functions and can seamlessly optimize the
network.
3. Verification: Given that Beehive and Kandoo both have well-defined structures for
control applications, it is feasible to develop static analyzers for control applications that are
able to detect bugs or potential performance bottlenecks. We developed such an analyzer
to automatically generate Map functions based on message handlers. �is static analyzer
can be extended to detect inter-dependencies of message handlers and to provide the cases
in which an application will become centralized. Such a static analyzer can also shed light
on potential performance problems and bugs (e.g., deadlocks) without the need to test the
application at runtime.
Bibliography
[1] Apache Storm. http://storm.apache.org/.
[2] Beehive. http://github.com/kandoo/beehive.
[3] Beehive Network Controller. http://github.com/kandoo/beehive-netctrl.
[4] OpenDaylight - An Open Source Community and Meritocracy for Software-Defined
Networking. http://bit.ly/XJlWjU.
[5] Packet Binary Serialization Framework. http://github.com/packet/packet.
[6] Quagga Routing Suite. http://www.nongnu.org/quagga/.
[7] Station and Media Access Control Connectivity Discovery, IEEE Standard 802.1AB.
http://tinyurl.com/6s739pe.
[8] TCP Flow Spy. https://github.com/soheilhy/tcp_flow_spy.
[9] Intel MPI Benchmarks, http://software.intel.com/en-us/articles/intel-mpi-benchmarks/.
[10] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat. Hedera: Dynamic
Flow Scheduling for Data Center Networks. In Proceedings of NSDI’10, pages 19–19, 2010.
[11] E. Al-Shaer, W. Marrero, A. El-Atawy, and K. Elbadawi. Network Configuration in a Box:
Towards End-to-End Verification of Network Reachability and Security. In Proceedings
of ICNP’09, pages 123–132, 2009.
[12] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and
M. Sridharan. Data Center TCP (DCTCP). In Proceedings of SIGCOMM’10, pages 63–74,
2010.
[13] P. Alvaro, T. Condie, N. Conway, K. Elmeleegy, J. M. Hellerstein, and R. Sears.
BOOM Analytics: Exploring Data-centric, Declarative Programming for the Cloud. In
Proceedings of EuroSys’10, pages 223–236, 2010.
[14] M. B. Anwer, M. Motiwala, M. b. Tariq, and N. Feamster. SwitchBlade: A Platform for
Rapid Deployment of Network Protocols on Programmable Hardware. In Proceedings of
SIGCOMM’10, pages 183–194, 2010.
[15] A. AuYoung, Y. Ma, S. Banerjee, J. Lee, P. Sharma, Y. Turner, C. Liang, and J. C. Mogul.
Democratic Resolution of Resource Conflicts Between SDN Control Programs. In
Proceedings of CoNEXT’14, pages 391–402, 2014.
[16] T. Benson, A. Akella, and D. A. Maltz. Network Traffic Characteristics of Data Centers in
the Wild. In Proceedings of IMC’10, pages 267–280, 2010.
[17] P. Berde, M. Gerola, J. Hart, Y. Higuchi, M. Kobayashi, T. Koide, B. Lantz, B. O’Connor,
P. Radoslavov, W. Snow, and G. Parulkar. ONOS: Towards an Open, Distributed SDN OS.
In Proceedings of HotSDN’14, pages 1–6, 2014.
[18] M. S. Blumenthal and D. D. Clark. Rethinking the Design of the Internet: the End-to-End
arguments vs. the Brave New World. ACM Transaction on Internet Technologies, 1(1):70–
109, Aug. 2001.
[19] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou.
Cilk: An Efficient Multithreaded Runtime System. Journal of Parallel and Distributed
Computing, 37(1):55–69, August 1995.
[20] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger,
D. Talayco, A. Vahdat, G. Varghese, and D. Walker. P4: Programming Protocol-
independent Packet Processors. SIGCOMM Comput. Commun. Rev., 44(3):87–95, July
2014.
[21] E. Brewer. CAP Twelve Years Later: How the "Rules" Have Changed. Computer Magazine,
45(2):23–29, 2012.
[22] M. Caesar, D. Caldwell, N. Feamster, J. Rexford, A. Shaikh, and J. van der Merwe. Design
and Implementation of a Routing Control Platform. In Proceedings of the 2nd Symposium
on Networked Systems Design & Implementation, volume 2, pages 15–28, 2005.
[23] M. Canini, P. Kuznetsov, D. Levin, and S. Schmid. Software Transactional Networking:
Concurrent and Consistent Policy Composition. In Proceedings of HotSDN’13, pages 1–6,
2013.
[24] M. Canini, D. Venzano, P. Perešíni, D. Kostić, and J. Rexford. A NICE Way to Test
OpenFlow Applications. In Proceedings of NSDI'12, pages 10–10, 2012.
[25] M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. McKeown, and S. Shenker. Ethane: Taking
Control of the Enterprise. In Proceedings of SIGCOMM’07, pages 1–12, 2007.
[26] M. Casado, T. Garfinkel, A. Akella, M. J. Freedman, D. Boneh, N. McKeown, and
S. Shenker. SANE: A Protection Architecture for Enterprise Networks. In Proceedings
of USENIX-SS’06, 2006.
[27] M. Casado, T. Koponen, S. Shenker, and A. Tootoonchian. Fabric: A Retrospective on
Evolving SDN. In Proceedings of HotSDN’12, pages 85–90, 2012.
[28] B. Chandrasekaran and T. Benson. Tolerating SDN Application Failures with LegoSDN.
In Proceedings of HotNets-XIII, pages 22:1–22:7, 2014.
[29] C. Chang, J. Wawrzynek, and R. Brodersen. BEE2: A High-End Reconfigurable
Computing System. Design Test of Computers, IEEE, 22(2):114–125, 2005.
[30] Y. Chen, R. Griffith, J. Liu, R. H. Katz, and A. D. Joseph. Understanding TCP Incast
Throughput Collapse in Datacenter Networks. In Proceedings of WREN'09, pages 73–82,
2009.
[31] Z. Chen, Q. Gao, W. Zhang, and F. Qin. FlowChecker: Detecting Bugs in MPI Libraries
via Message Flow Checking. In Proceedings of SC’10, pages 1–11, 2010.
[32] J. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev,
C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik,
D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor,
R. Wang, and D. Woodford. Spanner: Google’s Globally-Distributed Database. In
Proceedings of OSDI’12, pages 251–264, Berkeley, CA, USA, 2012.
[33] A. Curtis, W. Kim, and P. Yalagandula. Mahout: Low-overhead Datacenter Traffic
Management Using End-Host-Based Elephant Detection. In Proceedings of INFOCOM’11,
pages 1629–1637, 2011.
[34] A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma, and S. Banerjee.
DevoFlow: Scaling Flow Management for High-Performance Networks. In Proceedings
of SIGCOMM’11, pages 254–265, 2011.
[35] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters.
Commun. ACM, 51(1):107–113, Jan. 2008.
[36] D. Decasper and B. Plattner. DAN: Distributed Code Caching for Active Networks. In
Proceedings of INFOCOM’98, volume 2, pages 609–616, 1998.
[37] A. A. Dixit, F. Hao, S. Mukherjee, T. Lakshman, and R. Kompella. ElastiCon: An Elastic
Distributed Sdn Controller. In Proceedings of ANCS’14, pages 17–28, 2014.
[38] S. Donovan and N. Feamster. Intentional Network Monitoring: Finding the Needle
Without Capturing the Haystack. In Proceedings of HotNets’14, pages 5:1–5:7, 2014.
[39] D. Erickson. The Beacon OpenFlow Controller. In Proceedings of HotSDN'13, pages 13–18,
2013.
[40] N. Farrington, G. Porter, S. Radhakrishnan, H. H. Bazzaz, V. Subramanya, Y. Fainman,
G. Papen, and A. Vahdat. Helios: A Hybrid Electrical/Optical Switch Architecture for
Modular Data Centers. In Proceedings of SIGCOMM’10, pages 339–350, 2010.
[41] A. D. Ferguson, A. Guha, C. Liang, R. Fonseca, and S. Krishnamurthi. A Northbound
API for Sharing SDNs. In Open Networking Summit’13, 2013.
[42] A. D. Ferguson, A. Guha, C. Liang, R. Fonseca, and S. Krishnamurthi. Participatory
Networking: An API for Application Control of SDNs. In Proceedings of SIGCOMM’13,
pages 327–338, 2013.
[43] B. Fitzpatrick. Distributed Caching with Memcached. Linux Journal, 2004(124):5, August
2004.
[44] M. P. Forum. MPI: A Message-Passing Interface Standard. Technical report, Knoxville,
TN, USA, 1994.
[45] N. Foster, R. Harrison, M. J. Freedman, C. Monsanto, J. Rexford, A. Story, and D. Walker.
Frenetic: A Network Programming Language. In Proceedings of ICFP’11, pages 279–291,
2011.
[46] M. Ghobadi, S. Hassas Yeganeh, and Y. Ganjali. Rethinking End-to-End Congestion
Control in Software-Defined Networks. In Proceedings of HotNets'12, pages 61–66, 2012.
[47] S. Ghorbani and M. Caesar. Walk the Line: Consistent Network Updates With Bandwidth
Guarantees. In Proceedings of HotSDN’12, pages 67–72, 2012.
[48] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel,
and S. Sengupta. VL2: A Scalable and Flexible Data Center Network. Commun. ACM,
54(3):95–104, Mar. 2011.
[49] A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, J. Rexford, G. Xie, H. Yan, J. Zhan,
and H. Zhang. A Clean Slate 4D Approach to Network Control and Management.
SIGCOMM CCR, 35(5):41–54, October 2005.
[50] A. Greenhalgh, F. Huici, M. Hoerdt, P. Papadimitriou, M. Handley, and L. Mathy. Flow
Processing and the Rise of Commodity Network Hardware. SIGCOMM CCR, 39(2):20–
26, 2009.
[51] N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. McKeown, and S. Shenker. NOX:
Towards an Operating System for Networks. SIGCOMM CCR, 38(3):105–110, July 2008.
[52] S. Ha, I. Rhee, and L. Xu. CUBIC: A New TCP-friendly High-speed TCP Variant. SIGOPS
Oper. Syst. Rev., 42(5):64–74, July 2008.
[53] D. Halperin, S. Kandula, J. Padhye, P. Bahl, and D. Wetherall. Augmenting Data Center
Networks with Multi-gigabit Wireless Links. In Proceedings of SIGCOMM’11, pages 38–49,
2011.
[54] N. Handigol, B. Heller, V. Jeyakumar, D. Maziéres, and N. McKeown. Where is the
Debugger for My Software-Defined Network? In Proceedings of HotSDN'12, pages 55–
60, 2012.
[55] M. Handley, O. Hodson, and E. Kohler. XORP: An Open Platform for Network Research.
SIGCOMM CCR, 33(1):53–57, Jan. 2003.
[56] D. Harrington, R. Presuhn, and B. Wijnen. An Architecture for Describing Simple
Network Management Protocol (SNMP) Management Frameworks. http://www.
ietf.org/rfc/rfc3411.txt, Dec. 2002.
[57] S. Hassas Yeganeh and Y. Ganjali. Kandoo: A Framework for Efficient and Scalable
Offloading of Control Applications. In Proceedings of HotSDN'12, pages 19–24, 2012.
[58] S. Hassas Yeganeh and Y. Ganjali. Turning the Tortoise to the Hare: An Alternative
Perspective on Event Handling in SDN. In Proceedings of BigSystem’14, pages 29–32, 2014.
[59] S. Hassas Yeganeh and Y. Ganjali. Beehive: Towards a Simple Abstraction for Scalable
Software-Defined Networking. In Proceedings of HotNets'14, pages 13:1–13:7, 2014.
[60] S. Hassas Yeganeh, A. Tootoonchian, and Y. Ganjali. On Scalability of Software-Defined
Networking. IEEE Communications Magazine, 51(2):136–141, February 2013.
[61] B. Heller, R. Sherwood, and N. McKeown. The Controller Placement Problem. In
Proceedings of HotSDN’12, pages 7–12, 2012.
[62] C.-Y. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri, and R. Wattenhofer.
Achieving High Utilization with Software-driven WAN. In Proceedings of SIGCOMM'13,
pages 15–26, 2013.
[63] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and
Computation. Adison-Wesley Publishing Company, Reading, Massachusets, USA, 1979.
[64] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free Coordination for
Internet-scale Systems. In Proceedings of USENIXATC’10, pages 11–11, Berkeley, CA, USA,
2010.
[65] V. Jacobson. Congestion Avoidance and Control. SIGCOMM CCR, 25(1):157–187, 1995.
[66] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer,
J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, and A. Vahdat. B4: Experience with a
Globally-deployed So�ware De�ned WAN. In Proceedings of SIGCOMM’13, pages 3–14,
2013.
[67] L. Jose, L. Yan, G. Varghese, and N. McKeown. Compiling Packet Programs to
Reconfigurable Switches. In Proceedings of NSDI'15, 2015.
[68] N. Katta, H. Zhang, M. Freedman, and J. Rexford. Ravana: Controller Fault-Tolerance in
Software-Defined Networking. In Proceedings of SOSR'15, 2015.
[69] P. Kazemian, G. Varghese, and N. McKeown. Header Space Analysis: Static Checking for
Networks. In Proceedings of NSDI’12, pages 9–9, 2012.
[70] A. Khurshid, W. Zhou, M. Caesar, and P. B. Godfrey. VeriFlow: Verifying Network-wide
Invariants in Real Time. In Proceedings of HotSDN’12, pages 49–54, 2012.
[71] A. Khurshid, X. Zou, W. Zhou, M. Caesar, and P. B. Godfrey. VeriFlow: Verifying
Network-wide Invariants in Real Time. In Proceedings NSDI’13, pages 15–28, 2013.
[72] H. Kim, J. Reich, A. Gupta, M. Shahbaz, N. Feamster, and R. Clark. Kinetic: Verifiable
Dynamic Network Control. In Proceedings of NSDI’15, 2015.
[73] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click Modular Router.
ACM Trans. Comput. Syst., 18(3):263–297, Aug. 2000.
[74] T. Koponen, K. Amidon, P. Balland, M. Casado, A. Chanda, B. Fulton, I. Ganichev,
J. Gross, N. Gude, P. Ingram, E. Jackson, A. Lambeth, R. Lenglet, S.-H. Li, A. Padman-
abhan, J. Pettit, B. Pfaff, R. Ramanathan, S. Shenker, A. Shieh, J. Stribling, P. Thakkar,
D. Wendlandt, A. Yip, and R. Zhang. Network Virtualization in Multi-tenant Datacenters.
In Proceedings of NSDI’14, pages 203–216, 2014.
[75] T. Koponen, M. Casado, N. Gude, J. Stribling, L. Poutievski, M. Zhu, R. Ramanathan,
Y. Iwata, H. Inoue, T. Hama, and S. Shenker. Onix: A Distributed Control Platform for
Large-scale Production Networks. In Proceedings of OSDI’10, pages 1–6, 2010.
[76] A. Krishnamurthy, S. P. Chandrabose, and A. Gember-Jacobson. Pratyaastha: An Efficient
Elastic Distributed SDN Control Plane. In Proceedings of HotSDN’14, pages 133–138, 2014.
[77] A. Lakshman and P. Malik. Cassandra: A Decentralized Structured Storage System.
SIGOPS Oper. Syst. Rev., 44(2):35–40, Apr. 2010.
[78] B. Lantz, B. Heller, and N. McKeown. A Network in a Laptop: Rapid Prototyping for
Software-defined Networks. In Proceedings of HotNets-IX, pages 19:1–19:6, 2010.
[79] H. H. Liu, X. Wu, M. Zhang, L. Yuan, R. Wattenhofer, and D. Maltz. zUpdate: Updating
Data Center Networks with Zero Loss. In Proceedings of SIGCOMM’13, pages 411–422,
2013.
[80] J. Liu, M. D. George, K. Vikram, X. Qi, L. Waye, and A. C. Myers. Fabric: A Platform for
Secure Distributed Computation and Storage. In Proceedings of SOSP’09, pages 321–334,
2009.
[81] C. A. B. Macapuna, C. E. Rothenberg, and F. Magalh. In-packet Bloom filter based
data center networking with distributed OpenFlow controllers. In Proceedings of IEEE
Globecom MENS’10, pages 584–588, 2010.
[82] H. Mai, A. Khurshid, R. Agarwal, M. Caesar, P. B. Godfrey, and S. T. King. Debugging the
Data Plane with Anteater. In Proceedings of SIGCOMM’11, pages 290–301, 2011.
[83] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski.
Pregel: A System for Large-scale Graph Processing. In Proceedings of SIGMOD’10, pages
135–146, 2010.
[84] N. Marz and J. Warren. Big Data: Principles and Best Practices of Scalable Realtime Data
Systems. Manning Publications Co., 1st edition, 2015.
[85] T. McCabe. A Complexity Measure. IEEE Transactions on Software Engineering, (4):308–
320, 1976.
[86] J. McCauley, A. Panda, M. Casado, T. Koponen, and S. Shenker. Extending SDN to Large-
Scale Networks. In Open Networking Summit, 2013.
[87] R. McGeer. A Safe, Efficient Update Protocol for OpenFlow Networks. In Proceedings of
HotSDN’12, pages 61–66, 2012.
[88] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford,
S. Shenker, and J. Turner. OpenFlow: Enabling Innovation in Campus Networks.
SIGCOMM CCR, 38(2):69–74, March 2008.
[89] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, and C. Diot. Traffic Matrix
Estimation: Existing Techniques and New Directions. In Proceedings of SIGCOMM’02,
pages 161–174, 2002.
[90] J. C. Mogul, A. AuYoung, S. Banerjee, L. Popa, J. Lee, J. Mudigonda, P. Sharma, and
Y. Turner. Corybantic: Towards the Modular Composition of SDN Control Programs.
In Proceedings of HotNets-XII, pages 1:1–1:7, 2013.
[91] C. Monsanto, J. Reich, N. Foster, J. Rexford, and D. Walker. Composing Software-defined
Networks. In Proceedings of NSDI’13, pages 1–14, 2013.
[92] M. Moshref, M. Yu, R. Govindan, and A. Vahdat. DREAM: Dynamic Resource Allocation
for Software-defined Measurement. In Proceedings of SIGCOMM'14, pages 419–430, 2014.
[93] J. Moy. OSPF Version 2. http://www.ietf.org/rfc/rfc2328.txt, 1998.
[94] J. Naous, G. Gibb, S. Bolouki, and N. McKeown. NetFPGA: reusable router architecture
for experimental research. In Proceedings of PRESTO’08, pages 1–7, 2008.
[95] OASIS. Advanced Message Queuing Protocol (AMQP), Version 1.0. http://docs.
oasis-open.org/amqp/core/v1.0/os/amqp-core-complete-v1.0-os.pdf,
2011. [retrieved 23 April 2015].
[96] D. Ongaro and J. Ousterhout. In Search of an Understandable Consensus Algorithm. In
Proceedings of USENIX ATC’14, pages 305–320, 2014.
[97] D. Oran. OSI IS-IS Intra-domain Routing Protocol. http://www.ietf.org/rfc/
rfc1142.txt, 1990.
[98] K. Ostrowski, K. Birman, D. Dolev, and J. H. Ahnn. Programming With Live Distributed
Objects. In Proceedings of ECOOP’08, pages 463–489, 2008.
[99] J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra,
A. Narayanan, D. Ongaro, G. Parulkar, M. Rosenblum, S. M. Rumble, E. Stratmann, and
R. Stutsman. The Case for RAMCloud. Commun. ACM, 54(7):121–130, July 2011.
[100] A. Panda, C. Scott, A. Ghodsi, T. Koponen, and S. Shenker. CAP for Networks. In
Proceedings of HotSDN’13, pages 91–96, 2013.
[101] B. Pfaff, J. Pettit, K. Amidon, M. Casado, T. Koponen, and S. Shenker. Extending
Networking into the Virtualization Layer. In Proceedings of HotNets-VIII, 2009.
[102] P. Phaal and M. Lavine. sFlow Version 5. http://www.sflow.org/sflow_version_
5.txt, 2004.
[103] K. Phemius, M. Bouet, and J. Leguay. DISCO: Distributed Multi-domain SDN
Controllers. In Proceedings of NOMS 2014, pages 1–4, 2014.
[104] M. Reitblatt, N. Foster, J. Rexford, C. Schlesinger, and D. Walker. Abstractions for Network
Update. In Proceedings of SIGCOMM’12, pages 323–334, 2012.
[105] E. Rosen, A. Viswanathan, and R. Callon. Multiprotocol Label Switching Architecture.
RFC 3031 (Proposed Standard), Jan. 2001.
[106] J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-End Arguments in System Design. ACM
Trans. Comput. Syst., 2(4):277–288, Nov. 1984.
[107] B. Schwartz, A. W. Jackson, W. T. Strayer, W. Zhou, R. D. Rockwell, and C. Partridge.
Smart Packets: Applying Active Networks to Network Management. ACM Transactions
on Computer Systems, 18(1):67–88, Feb. 2000.
[108] C. Scott, A. Wundsam, B. Raghavan, A. Panda, A. Or, J. Lai, E. Huang, Z. Liu, A. El-
Hassany, S. Whitlock, H. Acharya, K. Zarifis, and S. Shenker. Troubleshooting Blackbox
SDN Control Software with Minimal Causal Sequences. In Proceedings of SIGCOMM'14,
pages 395–406, 2014.
[109] L. I. Sedov. Similarity and Dimensional Methods in Mechanics. Moscow, 1982. Translated
from the Russian by V. I. Kisin.
[110] V. Sekar, N. Egi, S. Ratnasamy, M. K. Reiter, and G. Shi. Design and Implementation of a
Consolidated Middlebox Architecture. In Proceedings of NSDI’12, pages 24–24, 2012.
[111] R. Sherwood. Oflops. http://archive.openflow.org/wk/index.php/Oflops.
[112] R. Sherwood, M. Chan, A. Covington, G. Gibb, M. Flajslik, N. Handigol, T.-Y. Huang,
P. Kazemian, M. Kobayashi, J. Naous, S. Seetharaman, D. Underhill, T. Yabe, K.-K. Yap,
Y. Yiakoumis, H. Zeng, G. Appenzeller, R. Johari, N. McKeown, and G. Parulkar. Carving
Research Slices out of Your Production Networks with OpenFlow. SIGCOMM CCR,
40(1):129–130, Jan 2010.
[113] A. Shieh, S. Kandula, and E. G. Sirer. SideCar: Building Programmable Datacenter
Networks Without Programmable Switches. In Proceedings of HotNets-IX, pages 21:1–21:6,
2010.
[114] J. M. Smith, D. J. Farber, C. A. Gunter, S. M. Nettles, D. C. Feldmeier, and W. D. Sincoskie.
SwitchWare: Accelerating Network Evolution (White Paper). Technical report, 1996.
[115] A.-W. Tam, K. Xi, and H. Chao. Use of Devolved Controllers in Data Center Networks.
In Processings of the IEEE Computer Communications Workshops, pages 596–601, 2011.
[116] A. Tootoonchian and Y. Ganjali. HyperFlow: A Distributed Control Plane for OpenFlow.
In Proceedings of INM/WREN’10, pages 3–3, 2010.
[117] A. Tootoonchian, M. Ghobadi, and Y. Ganjali. OpenTM: Traffic Matrix Estimator for
OpenFlow Networks. In Proceedings of PAM’10, pages 201–210, Berlin, Heidelberg, 2010.
Springer-Verlag.
[118] A. Tootoonchian, S. Gorbunov, Y. Ganjali, M. Casado, and R. Sherwood. On Controller
Performance in Software-Defined Networks. In Proceedings of Hot-ICE'12, pages 10–10,
2012.
[119] V. Vasudevan, A. Phanishayee, H. Shah, E. Krevat, D. G. Andersen, G. R. Ganger,
G. A. Gibson, and B. Mueller. Safe and Effective Fine-grained TCP Retransmissions for
Datacenter Communication. In Proceedings of SIGCOMM’09, pages 303–314, 2009.
[120] A. Voellmy and P. Hudak. Nettle: Taking the Sting Out of Programming Network Routers.
In Proceedings of PADL’11, pages 235–249, 2011.
[121] A. Voellmy, H. Kim, and N. Feamster. Procera: A Language for High-Level Reactive
Network Control. In Proceedings of HotSDN’12, pages 43–48, 2012.
[122] D. M. Volpano, X. Sun, and G. G. Xie. Towards Systematic Detection and Resolution of
Network Control Conflicts. In Proceedings of HotSDN'14, pages 67–72, 2014.
[123] A. Wang, W. Zhou, B. Godfrey, and M. Caesar. Software-Defined Networks as Databases.
In ONS’14, 2014.
[124] G. Wang, T. E. Ng, and A. Shaikh. Programming Your Network at Run-time for Big Data
Applications. In Proceedings of HotSDN’12, pages 103–108, 2012.
[125] D. J. Wetherall, J. V. Guttag, and D. L. Tennenhouse. ANTS: A Toolkit for Building and
Dynamically Deploying Network Protocols. In Proceedings of the IEEE Open Architectures
and Network Programming conference, pages 117–129, 1998.
[126] D. J. Wetherall and D. L. Tennenhouse. The ACTIVE IP option. In Proceedings of EW 7,
pages 33–40, 1996.
[127] H. Yan, D. A. Maltz, T. E. Ng, H. Gogineni, H. Zhang, and Z. Cai. Tesseract: A 4D Network
Control Plane. In Proceedings of NSDI’07, 2007.
[128] L. Yang, R. Dantu, T. Anderson, and R. Gopal. Forwarding and Control Element Sepa-
ration (ForCES) Framework. RFC 3746 (http://tools.ietf.org/html/rfc3746),
April 2004.
[129] C. Yu, C. Lumezanu, Y. Zhang, V. Singh, G. Jiang, and H. V. Madhyastha. FlowSense:
Monitoring Network Utilization with Zero Measurement Cost. In Proceedings of PAM’13,
pages 31–41, 2013.
[130] M. Yu, L. Jose, and R. Miao. Software-defined traffic measurement with OpenSketch. In
Proceedings of NSDI’13, pages 29–42, 2013.
[131] M. Yu, J. Rexford, M. J. Freedman, and J. Wang. Scalable Flow-based Networking with
DIFANE. In Proceedings of SIGCOMM’10, pages 351–362, 2010.
[132] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin,
S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction
for In-memory Cluster Computing. In Proceedings of NSDI’12, pages 2–2, 2012.
[133] M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized Streams: Fault-
tolerant Streaming Computation at Scale. In Proceedings of SOSP’13, pages 423–438, 2013.
[134] T. S. E. N. Zheng Cai, Alan L. Cox. Maestro: A System for Scalable OpenFlow Control.
Technical report, Department of Computer Science, Rice University, 2010.
[135] W. Zhou, D. Jin, J. Croft, M. Caesar, and P. B. Godfrey. Enforcing Customizable
Consistency Properties in Software-Defined Networks. In Proceedings of NSDI'15, pages
73–85, 2015.