Simple Distributed Programming for
Scalable Software-Defined Networks
by
Soheil Hassas Yeganeh
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
Copyright © 2015 by Soheil Hassas Yeganeh
Abstract
Simple Distributed Programming for
Scalable Software-Defined Networks
Soheil Hassas Yeganeh
Doctor of Philosophy
Graduate Department of Computer Science
University of Toronto
2015
Software-Defined Networking (SDN) aims at simplifying the process of network programming
by decoupling the control and data planes. This is often exemplified by implementing
complicated control logic as a simple control application deployed on a centralized controller.
In practice, for scalability and resilience, such simple control applications are implemented
as complex distributed applications that demand significant effort to develop, measure, and
optimize. Such complexities obviate the benefits of SDN.
In this thesis, we present two distributed control platforms, Kandoo and Beehive, that are
straightforward to program. Kandoo is a two-layer control plane: (i) the bottom layer is a colony
of controllers with only local state, and (ii) the top layer is a logically centralized controller
that maintains the network-wide state. Controllers at the bottom layer run only local control
applications (i.e., applications that can function using the state of a single switch) near the datapath.
These controllers handle frequent events and shield the top layer. Using Kandoo, programmers
flag their applications as either local or non-local, and the offloading process is seamlessly
handled by the platform. Our evaluations show that a network controlled by Kandoo has an
order of magnitude lower control channel consumption compared to centralized controllers.
We demonstrate that Kandoo can be used to adapt and optimize the Transmission Control Protocol
(TCP) in a large-scale High-Performance Computing (HPC) datacenter with a low network
overhead.
Beehive is a more generic proposal built around a programming model that is similar
to centralized controllers, yet enables the platform to automatically infer how applications
maintain their state and depend on one another. Using this programming model, the platform
automatically generates the distributed version of each control application. With runtime
instrumentation, the platform dynamically and seamlessly migrates applications among
controllers aiming to optimize the control plane. Beehive also provides feedback to identify design
bottlenecks in control applications, helping developers enhance the performance of the control
plane. Implementing a distributed controller and a simple routing algorithm using Beehive,
we demonstrate that Beehive is as simple to use as centralized controllers and can scale on
par with existing distributed controllers. Moreover, we demonstrate that Beehive automatically
improves the control plane's latency by optimizing the placement of control applications without
any intervention by network operators.
Dedication
To Mom, Dad, and Shabnam.
Acknowledgements
Above all, I would like to sincerely thank my advisor, Yashar Ganjali, for his support, for his dedication, and for being an amazing source of encouragement and inspiration. I consider myself very fortunate to have him as my supervisor, mentor, and friend. I am also grateful to the members of my supervisory committee, Eyal de Lara and David Lie, for their continuous guidance, and to my committee members Nick Feamster and Angela Demke Brown for their insightful and invaluable feedback on my research.
I would also like to thank Renée Miller for her strong support and continuous research advice. I cannot go without mentioning my friends and co-authors Oktie Hassanzadeh, Kaveh Aasaraai, and Ken Pu, who have been extremely helpful during my five years at the University of Toronto. I am also grateful to the SciNet team, especially Chris Loken, Daniel Gruner, and Ching-Hsing Yu, for trusting and supporting me during my experiments.
This thesis is dedicated to my wife and my parents for their unconditional love, constant encouragement, and support.
Contents
1 Introduction 1
1.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Thesis Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.5 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Dissertation Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2 Background 17
2.1 Precursors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.1 ForCES: An initial proposal . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.1.2 4D: Scalable Routing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.1.3 Ethane: Centralized Authentication . . . . . . . . . . . . . . . . . . . . . . 21
2.1.4 OpenFlow: Convergence of Proposals . . . . . . . . . . . . . . . . . . . . 22
2.2 Forwarding Abstractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.3 Control Plane Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.1 In-Data-Plane Proposals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.2 Distributed Control Planes . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4 Resilience to Failures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5 Programming Abstractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.6 Correctness and Consistency Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.7 Monitoring and Measurements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.8 Network Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2.9 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3 Kandoo 44
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.2 Design and Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
3.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4 Beehive 59
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.2 Programming Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3 Control Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.3.1 Primitives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.3.2 Fault-Tolerance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.3.3 Live Migration of Bees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.4 Runtime Instrumentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3.5 Automatic Placement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3.6 Application Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.6 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5 Use Cases 91
5.1 OpenTCP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.1.1 OpenTCP Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.1.2 OpenTCP Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.1.3 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.2 Beehive SDN Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.3 Simple Routing in Beehive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.4 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6 Conclusion and Future Directions 117
Bibliography 120
List of Tables
4.1 Sample use-cases implemented in Beehive and their code size . . . . . . . . . . . 84
5.1 The approximate OpenTCP overhead for a network with ∼4000 nodes and ∼100
switches for different Oracle refresh cycles. . . . . . . . . . . . . . . . . . . . . . . 106
List of Figures
1.1 Software-Defined Networking: A generic scheme. . . . . . . . . . . . . . . . . . . 2
1.2 Using an eventually consistent state, two controllers can steer traffic (e.g., the
green and the red flows) onto the same link, which results in an unnecessary
oversubscription in the network. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 In a distributed routing application that tries to avoid loops in a network, we
need coordination and leader election mechanisms to avoid service disruption.
Without such mechanisms, an eventually consistent state may lead to faults in
the network. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 When a virtual network migrates, the control plane can migrate the respective
virtual networking application instance. Without fine-grained control on the
external datastore, however, such a migration is ineffective. . . . . . . . . . . . . 7
1.5 Kandoo has two layers of controllers: Local controllers run local applications
and handle frequent events, while the root controller runs non-local applications
processing infrequent events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6 Beehive’s control platform is built based on the intuitive primitives of hives,
bees, and cells. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1 ForCES Components. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 4D Four Planes: (i) data, (ii) discovery, (iii) dissemination, and (iv) decision planes 20
2.3 OpenFlow Switch: An OpenFlow 1.1 switch stores multiple flow tables each
consisting of multiple flow-entries. . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 ONIX maintains the network-wide state in a distributed fashion. . . . . . . . . . 29
2.5 ONOS maintains the network-wide state using an external database system. . . 30
2.6 The five steps when converging upon a link failure. . . . . . . . . . . . . . . . . . 33
3.1 Kandoo’s Two Levels of Controllers. Local controllers handle frequent events,
while a logically centralized root controller handles rare events. . . . . . . . . . . 46
3.2 Toy example for Kandoo’s design: In this example, two hosts are connected using
a simple line topology. Each switch is controlled by one local Kandoo controller.
The root controller controls the local controllers. In this example, we have two
control applications: Appdetect is a local control application, but Appreroute is
non-local. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3 Kandoo’s high level architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.4 Kandoo in a virtualized environment. For software switches, we can leverage the
same end-host for local controllers, and, for physical switches, we use separate
processing resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.5 Experiment Setup. We use a simple tree topology. Each switch is controlled
by one local Kandoo controller. The root Kandoo controller manages the local
controllers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.6 Control Plane Load for the Elephant Flow Detection Scenario. The load is based
on the number of elephant flows in the network. . . . . . . . . . . . . . . . . . . . 54
3.7 Control Plane Load for the Elephant Flow Detection Scenario. The load is based
on the number of nodes in the network. . . . . . . . . . . . . . . . . . . . . . . . . 55
4.1 A simple learning switch application in Beehive. . . . . . . . . . . . . . . . . . . . 62
4.2 The LearningSwitch application (Figure 4.1) learns the MAC address to port
mapping on each switch and stores them in the Switches dictionary. . . . . . . 64
4.3 This simple elephant flow routing application periodically collects flow
stats and accordingly reroutes traffic. The schematic view of this algorithm is
also portrayed in Figure 4.4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.4 The schematic view of the simple elephant flow routing application presented
in Figure 4.3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.5 In Beehive, each hive (i.e., controller) maintains the application state in the form
of cells (i.e., key-value pairs). Cells that must be colocated are exclusively owned
by one bee. Messages mapped to the same cell(s) are processed by the same bee. 68
4.6 Automatically generated Map functions based on the code in Figure 4.4. . . . . 69
4.7 Life of a message in Beehive. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.8 Throughput of a queen bee for creating new bees, using different batch sizes.
Here, we have used clusters of 1, 3, 5, 7, 9 and 11 hives. For larger batch sizes, the
throughput of a queen bee is almost the same for all clusters. . . . . . . . . . . . 72
4.9 Hives form a quorum on the assignment of cells to bees. Bees, on the other hand,
form quorums (i.e., colonies) to replicate their respective cells in a consistent
manner. The size of a bee colony depends on the replication factor of its
application. Here, all applications have a replication factor of 3. . . . . . . . . . . 75
4.10 The latency and throughput of the sample key-value store application: (a)
Throughput using replication factors of 1 to 5, and batch sizes of 128 to 8k. (b)
The PDF of the end-to-end latency for GET and PUT requests in a cluster of 20
hives and 80 clients. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.11 Beehive failover characteristics using different Raft election timeouts. In this
experiment, we use the key-value store sample application with a replication
factor of 3 on a 3-node cluster. Smaller election timeouts result in faster failover,
but they must be chosen to avoid spurious timeouts based on the network
characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.12 The PDF of a bee’s throughput in the sample key-value application, with and
without runtime instrumentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.13 Inter-hive traffic matrix of the elephant flow routing application (a) with
an ideal placement, and (b) when the placement is automatically optimized.
(c) Bandwidth usage of the control plane in (b). . . . . . . . . . . . . . . . . . . . 82
4.14 A centralized application maps all messages to the same cell. This results in
processing all messages in the same bee. . . . . . . . . . . . . . . . . . . . . . . . . 85
4.15 A local application maps all messages to a cell local to the hive (e.g., the switch’s
ID). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
4.16 To manage a graph of network objects, one can easily map all messages to the
node’s ID. This way nodes are spread among hives on the platform. . . . . . . . . 86
4.17 For virtual networks, messages of the same virtual network are naturally
processed in the same bee. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.1 Using Kandoo to implement OpenTCP: The Oracle, SDN-enabled switches, and
Congestion Control Agents are the three components of OpenTCP. . . . . . . . . 93
5.2 The Oracle collects network state and statistics through the SDN controller and
switches. Combining those with the CCP defined by the operator, the Oracle
generates CUEs which are sent to CCAs sitting on end-hosts. CCAs will adapt
TCP based on the CUEs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.3 The topology of SciNet, the datacenter network used for deploying and
evaluating OpenTCP. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
5.4 SciNet traffic characterization. The x-axis is in log scale. (a) CDF of flow sizes
excluding TCP/IP headers. (b) CDF of flow completion times (right). . . . . . . 99
5.5 The locality pattern as seen in the matrix of log(number of bytes) exchanged
between server pairs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.6 CDF of flow sizes in different benchmarks used in experiments. The x-axis is in
log scale. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.7 Comparing the CDF of FCT (a) and drop rate (b) of OpenTCP1, OpenTCP2,
and regular TCP while running the “all-to-all” IMB benchmark. . . . . . . . . . 102
5.8 Congestion window distribution for OpenTCP1 and unmodified TCP. . . . . . . 103
5.9 Comparing the CDF of FCT (a) and drop rate (b) of OpenTCP1, OpenTCP2,
and regular TCP while running the “Flash” benchmark. . . . . . . . . . . . . . . 105
5.10 Beehive’s SDN controller suite. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.11 Fault-tolerance characteristics of the control suite using a Raft election timeout of
300ms: (a) The default placement. (b) When the newly elected masters are
on the same hive. (c) When the new masters are on different hives. (d)
Reactive forwarding latency of (b). (e) Reactive forwarding latency of (c). (f)
Reactive forwarding latency of (c) which is improved by automatic placement
optimization at second 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.12 When a switch disconnects from its master driver: (a) One of the slave drivers
will become the master. (b) The reactive forwarding latency is optimized using
the placement heuristic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.13 Throughput of Beehive’s learning switch application and ONOS’s forwarding
application in different cluster sizes. The number of nodes (i.e., the x-axis) in
this graph represents the nodes on which we launch cbench. For Beehive, we
always use a 5-node cluster. For ONOS, the cluster size is equal to the replication
factor since all nodes in the cluster will participate in running the application. . 113
5.14 A distributed shortest path algorithm. . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.15 Convergence time of the flat routing algorithm in a k-ary fat-tree topology. . . . 116
Chapter 1
Introduction
Computer networks have traditionally been designed based on a plethora of protocols implemented
by autonomous networking elements. In such a setting, controlling the network is
difficult and network programmability is limited, since these network protocols are focused
on specific functionalities implemented using specific mechanisms. As networks and their
applications evolve, the need for fine-grained control and network programming becomes more
pronounced. Without programmability, implementing new ideas (e.g., network virtualization
and isolation) and adapting to new changes (e.g., new routing mechanisms) are quite challenging
and tedious [88].
Software-Defined Networking (SDN) emerged with the goal of lowering the barrier
for innovation in computer networks. There are numerous SDN proposals, ranging from
ForCES [128] to 4D [49] to OpenFlow [88]. Although these proposals differ in scope,
moving the control functions out of data-path elements is the common denominator among
them. These proposals essentially decouple the control and data planes, and provide a standard API
through which the control plane can configure the data plane. In SDN, as shown in Figure 1.1,
data-path elements (such as routers and switches) are high-performance entities implementing
a set of forwarding primitives (mostly match-action forwarding rules). The control plane, on
the other hand, is a cohort of programs (i.e., the control applications) that collectively control
the data-path elements using the exposed forwarding primitives. These control applications are
developed in general-purpose programming languages and are hosted on top of a middleware
called the network controller. The network between the data plane and the control plane is called
the control channel, which can be either in-band (i.e., shared with the actual network flows) or
out-of-band (i.e., a dedicated and isolated network).
Figure 1.1: Software-Defined Networking: A generic scheme.
Advantages of SDN. The separation of control from data-path elements brings about numerous
advantages such as high flexibility, programmability, and the possibility of realizing a centralized
network view, with the promise of simplifying network programming. After all, it is significantly
easier to implement complex policies with arbitrarily large scopes on a decoupled, centralized
controller. For example, consider a simple shortest-path routing application. In traditional
networks, one needs to implement the shortest-path algorithm in a highly distributed,
heterogeneous environment and must define boilerplate including the serialization protocol,
the advertisement primitives, and the coordination mechanisms, in addition to the core of
the shortest-path routing algorithm. In contrast, on a centralized controller, such a routing
application boils down to finding shortest paths over the centralized network-wide view.
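To make this concrete, the following is a minimal sketch of such a centralized routing application: shortest paths are computed directly over the network-wide view with a breadth-first search, and none of the distributed boilerplate listed above is needed. The names (Topology, NodeID, ShortestPath) are illustrative and not taken from any particular controller.

```go
// Minimal sketch (illustrative names): shortest-path routing computed
// directly over the controller's network-wide view.
package main

import "fmt"

type NodeID string

// Topology is the centralized network-wide view: an adjacency list of switches.
type Topology map[NodeID][]NodeID

// ShortestPath runs a breadth-first search over the global view and returns
// the hop-by-hop path from src to dst, or nil if dst is unreachable.
func ShortestPath(topo Topology, src, dst NodeID) []NodeID {
	prev := map[NodeID]NodeID{src: src}
	queue := []NodeID{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			var path []NodeID
			for n := dst; ; n = prev[n] { // walk predecessors back to src
				path = append([]NodeID{n}, path...)
				if n == src {
					return path
				}
			}
		}
		for _, next := range topo[cur] {
			if _, seen := prev[next]; !seen {
				prev[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

func main() {
	// A small line topology: s1 - s2 - s3.
	topo := Topology{"s1": {"s2"}, "s2": {"s1", "s3"}, "s3": {"s2"}}
	fmt.Println(ShortestPath(topo, "s1", "s3")) // [s1 s2 s3]
}
```

A distributed version of the same application would additionally have to serialize and advertise link state and coordinate between instances, which is exactly the boilerplate the centralized view removes.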
Despite all its simplicity, centralization is not feasible in large-scale networks due to concerns
about scalability and resilience [60]. In such environments, network programmers need to
implement distributed control applications. As we explain next, implementing distributed
control applications is quite challenging without proper platform-level support.
1.1 Problem Statement
In large-scale networks, to overcome scalability and performance limitations, control applications
are usually implemented on distributed control platforms. Implementing a distributed
control application is clearly more difficult than implementing its centralized version. Some of
these complexities (e.g., synchronization and coordination) are common among all distributed
applications, whereas others stem from the application’s design (e.g., parallelization).
Existing distributed control platforms either (i) expose the complexities of distributed
programming to control applications (e.g., ONIX [75]), or (ii) delegate distributed state
management to external distributed systems (e.g., ONOS [17] and OpenDayLight [4]). In
such environments, designing a distributed control application requires significant effort
beyond the design and implementation of the application’s core: one needs to address timing,
consistency, synchronization, coordination, and other common complications of distributed
programming. More importantly, network programmers have to develop tools and techniques
to measure the performance of their control applications in order to identify design bottlenecks,
and need to manually optimize the control plane to come up with a scalable design. This directly
obviates the promised simplicity of SDN.
Issues of Exposing Complexities. The most notable distributed control platform is ONIX [75],
which is built using an eventually consistent representation of the network graph (i.e., the Network
Information Base). Providing eventual consistency greatly simplifies the control plane’s design,
as there is no need to deal with timing complications that can happen as a result of delays
and failures in the network. Yet, such a design simplification, admittedly, pushes a lot of
complexity into control applications and hinders the promised simplicity of SDN [75, 86].
Figure 1.2: Using an eventually consistent state, two controllers can steer traffic (e.g., the green and the red flows) onto the same link, which results in an unnecessary oversubscription in the network.
This becomes more pronounced when control applications need to implement coordination
and synchronization mechanisms that are mostly overlooked in existing control platforms.
For instance, consider a routing application designed to prevent link oversubscription by
steering flows along paths with sufficient vacant capacity, shown in Figure 1.2. This application
is simple to implement on a centralized controller, but it can easily oversubscribe a link when
deployed on a distributed controller using an eventually consistent state. For instance, due to
latency in updating the state and lack of proper synchronization, two distinct instances of the
application can route two elephant flows (e.g., the green and the red flows) to the same link,
leading to an unnecessary oversubscription. To overcome this issue, the network programmer
needs to implement or integrate complex coordination mechanisms, which requires more effort
than implementing the core of the routing functionality.
In addition to such suboptimalities, eventual consistency may result in invalid control.
For example, consider a naive minimum spanning tree (MST) routing application, depicted
in Figure 1.3, that tries to prevent loops by keeping one outgoing port up per neighbor
switch. When two neighbor switches are controlled by two physically separate controllers, each
Figure 1.3: In a distributed routing application that tries to avoid loops in a network, we need coordination and leader election mechanisms to avoid service disruption. Without such mechanisms, an eventually consistent state may lead to faults in the network.
controller immediately blocks a few ports, but will eventually notify the neighbor controller.
Without a proper coordination mechanism, this can result in obscure connectivity problems
even in extremely simple situations. This problem can be overcome by employing a leader
election mechanism, which inevitably complicates the control application’s logic.
Moreover, in scenarios where control applications require a subset of network events
(e.g., network events related to a specific tenant in a multi-tenant datacenter), it would be
futile to keep a copy of the complete network state in each controller instance. This is
especially important when network policies are naturally enforced using the state of a small
partition. For instance, to enforce isolation and to provide network statistics and billing for each
tenant, the control application needs the network events of that specific tenant in a consistent
way, not a holistic view of the entire network. This suboptimal behavior can potentially
become a scalability bottleneck by needlessly consuming resources. To prevent unnecessary
duplication of state, different instances of an application must explicitly specify what parts of
the network state they need to access. This requires proper partitioning that, if not facilitated by
the platform, can easily dwarf the main part of the control logic.
Issues of Delegating to External Datastores. An alternative design for a distributed control
plane is to delegate state management to an external distributed datastore. Although using
an external datastore can simplify distributed state management, it has several important
drawbacks. Delegating SDN state management to an external system is shown to have
considerable overheads [76]. Further, such an over-delegation of the control plane functionality
does not resolve the problems of distributed programming, such as concurrent updates and
consistency [123]. More importantly, it is the external datastore that dictates the architecture
of the control plane: if the datastore forms a ring, we cannot form an effective hierarchy in the
control plane.
ONOS [17] is a distributed control plane that delegates all the core functions of a distributed
control application to distributed state storage systems (e.g., ZooKeeper [64], Cassandra [77], or
RAMCloud [99]). This simplifies ONOS’s design but inherits all the complications of distributed
programming with one layer of indirection. In such a setting, network programmers need to
design and optimize their control application as well as the external distributed system, since the
logic is implemented in one environment and the state is stored in another system. As a result,
using an external system does not allow the network programmer to design the functions and the
respective state in concert. At the very least, one needs to design and measure two separate systems
with different characteristics and fundamentally different designs.
For instance, suppose we have a network managed by multiple controllers that store their
state in a distributed datastore. Despite sharing a common datastore (even if the datastore
is strongly consistent), the control applications on these controllers still need to coordinate
to correctly manage the network. For example, if they see packets of the same flow in
parallel (say, due to a multi-path route), they need to coordinate with one another to avoid
black holes and routing loops. In general, they need to synchronize on any decision that
might lead to conflicts. Such synchronization problems require orchestration facilities such as
Corybantic [90], Athens [15], and STN [23]. It is important to note that providing such facilities
in the control plane would eliminate the need for an external state management system.
Relying on an external datastore has strong performance implications. Consider a simple
learning switch application, as another example, that stores the state of each switch and forwards
Figure 1.4: When a virtual network migrates, the control plane can migrate the respective virtual networking application instance. Without fine-grained control on the external datastore, however, such a migration is ineffective.
packets accordingly. To store its state, this application will have to save the state of all switches on
the external distributed database that ONOS supports. This learning switch application should
be high-throughput, and the overheads of using an external system can be prohibitive for this
simple control application. This makes network programming unnecessarily complicated, as
one needs to not only make the learning switch application scale, but also to come up with a
scalable design for the external state management system.
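For concreteness, the following is a minimal sketch of such a learning switch; the names (SwitchID, PacketIn, handlePacketIn) are illustrative and not tied to any controller API. Its entire state is a per-switch MAC-to-port table, so every packet-in event is served from state that is naturally local to one switch, which is exactly the kind of access an external, network-wide datastore makes needlessly expensive.

```go
// Minimal learning-switch sketch (illustrative names, no real controller API):
// the only state is a per-switch MAC-address-to-port table.
package main

import "fmt"

type SwitchID string
type MAC string

// PacketIn models the event a switch sends for an unmatched packet.
type PacketIn struct {
	Switch SwitchID
	SrcMAC MAC
	DstMAC MAC
	InPort int
}

// macTables holds one MAC-to-port table per switch.
var macTables = map[SwitchID]map[MAC]int{}

// handlePacketIn learns where the source lives and either forwards the packet
// to the known output port or floods it when the destination is unknown.
func handlePacketIn(pkt PacketIn) (outPort int, flood bool) {
	table, ok := macTables[pkt.Switch]
	if !ok {
		table = map[MAC]int{}
		macTables[pkt.Switch] = table
	}
	table[pkt.SrcMAC] = pkt.InPort
	if port, known := table[pkt.DstMAC]; known {
		return port, false
	}
	return 0, true
}

func main() {
	fmt.Println(handlePacketIn(PacketIn{"s1", "aa", "bb", 1})) // 0 true  (flood)
	fmt.Println(handlePacketIn(PacketIn{"s1", "bb", "aa", 2})) // 1 false (forward)
}
```

Every packet-in requires one read and one write on this table; issuing those operations against a remote datastore on the critical path is what makes the overhead prohibitive for such a simple application.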
Using an external datastore also limits the control plane’s ability to self-optimize. For
example, consider an instance of a virtual networking application that controls the ports and
the flows of a virtual network in a virtualized environment. In such a setting, virtual machines
have mobility and can migrate. Thus, it is important that the control plane places the virtual
networking application instance close to the virtual network upon migration. As shown in
Figure 1.4, even if we implement an optimized placement mechanism for the control application,
the external state storage system may not move the state according to the application placement.
That necessitates changes in the external system, which is quite challenging.
Motivation. These simple examples demonstrate that, without a proper control plane abstraction,
control applications have to deal with practical complications of distributed systems,
which can be a huge burden if not impossible to deal with. Pushing such complications
(e.g., synchronization, coordination, and placement) into control applications clearly obviates
many benefits of SDN. A viable way to provide an easy-to-use network programming paradigm
is to hide these generic complications and boilerplates from control applications behind a
suitable abstraction. Such an abstraction must be simple and, more importantly, powerful
enough to express essential control functions.
1.2 Objectives
Simplicity and scalability are not mutually exclusive. Map-Reduce [35] and Spark [132] are
well-known programming models that hide the complexities of batch data processing. Storm [84]
and D-Stream [133] are successful examples that hide the boilerplate of distributed stream
processing. In essence, such systems find the sweet spot between what they hide from and what
they expose to the distributed applications. The key to finding a proper programming model is
an easy-to-use abstraction that is generic enough to cover all important use cases in its target
domain. We believe similar approaches are viable for SDN.
The main objective of this thesis is a new distributed programming platform that can simplify
network programming in SDN. By programming platform, we mean a composition of (i) a
programming model that provides the APIs to develop control applications and (ii) a distributed
runtime that automatically deploys, runs, and optimizes the applications written in the proposed
programming model.
Such a platform must have the following characteristics:
1. Simple: We aim at creating a programming paradigm that is as simple to program
as existing centralized controllers. That is, the effort required to develop a control
application (e.g., lines of code and cyclomatic complexity [85]) using the distributed
programming model should be almost the same as implementing it on a centralized
controller. More importantly, the effort required to instrument and measure the
distributed control application should be comparable to existing centralized controllers.
2. Generic: The programming model should be sufficiently expressive for essential control
functionalities. For example, one should be able to implement forwarding, routing, and
isolation using this programming model. Moreover, one should be able to implement and
emulate existing control platforms using the proposed programming model. This ensures
that the programming model is generic enough to accommodate existing proposals and
legacy control applications.
3. Scalable: We need a programming model that has no inherent scalability bottleneck.
It is important to note that the scalability of any programming model depends heavily
on the applications. For instance, one can create an inefficient and unscalable Map-
Reduce application. Such scalability bottlenecks that reside in applications cannot be
automatically resolved. Having said that, our objective is to propose a programming
paradigm that scales well given a control application design that is not inherently
centralized. In other words, our proposal by itself should not be a scalability bottleneck.
4. Autonomous: There are many commonalities among distributed control applications.
One of the most important examples of such commonalities is optimizing the placement of
control applications. Our goal is to develop an environment in which such commonalities
are automated with no programmer intervention. Having such an autonomous system
simplifies network programming and operation. More importantly, it results in an
elastic control plane that can automatically adapt to the network characteristics and the
workload.
1.3 Overview
We first present a comprehensive overview of existing SDN proposals and research. To gain
a better perspective on the topic of this dissertation, we start with earlier proposals from
academia and industry that emerged before SDN. Then, we present high-level
architectural proposals, as well as the latest research on the SDN control plane, SDN data plane, SDN
abstractions, and SDN tools.
In our background research, we deconstruct the main scalability concerns in SDN
and discuss how they compare with traditional networks. We argue that some of these scalability
concerns are not unique to SDN and coexist in traditional networks [60]. Moreover, we
explore scalability and performance bottlenecks (such as scarce control channels and resilience
to failures) in different settings for different control applications.
These bottlenecks reside in the data plane, in the control plane, or both; hence, there
are numerous places (e.g., the forwarding fabric, the controller, and the control applications)
to tackle them. We enumerate the different alternatives and trade-offs in the design
space to address the scalability and performance bottlenecks. Furthermore, we compare these
alternatives and analyze the pros and cons of each approach. This analysis forms the foundation
of our distributed control plane proposals: Kandoo and Beehive. Both of our proposals scale
well and are simple to program.
Kandoo. One of the main scalability bottlenecks of the control plane in SDN is the scarce
control channels that can be easily overwhelmed by frequent events. Even when one proactively
installs flow-entries, collecting flow statistics to implement adaptive networking algorithms
can overwhelm the control channels and consequently the control plane. Kandoo [57] is a
distributed programming model that tries to address this scalability bottleneck in environments
where computing resources are available close to the data-path elements (e.g., datacenters and
campus networks). In contrast to previous proposals such as DIFANE [131] and DevoFlow [34],
Kandoo limits the overhead of frequent events on the control plane without any modifications
to the data plane and without hindering the visibility of the control applications.
As shown in Figure 1.5, Kandoo has two layers of controllers: (i) the bottom layer is a group
of controllers with no interconnection and no knowledge of the network-wide state, and (ii) the
top layer is a logically centralized controller that maintains the network-wide state. Controllers
Figure 1.5: Kandoo has two layers of controllers: Local controllers run local applications and handle frequent events, while the root controller runs non-local applications processing infrequent events.
at the bottom layer run only local control applications (i.e., applications that can function using
the state of a single switch) close to the datapath elements. These controllers handle most of the
frequent events and effectively shield the top layer.
Local control applications have implicit parallelism and can be easily replicated since they
access only their local state. As such, Kandoo’s design enables network operators to replicate
local controllers on demand without any concerns about distributed state replication. These
highly replicated local controllers process frequent events and relieve the load on the top layer.
Our evaluations show that a network controlled by Kandoo has an order of magnitude lower
control channel consumption compared to centralized OpenFlow controllers. Moreover, the
only extra information we need from network programmers is a flag that states whether a control
application is local or non-local. As a result, control applications written in Kandoo are as simple
as their centralized versions.
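The sketch below illustrates this flag under stated assumptions; it is not Kandoo's actual API, only a hypothetical rendering of the idea that an application keeps the same handler shape as in a centralized controller and adds a single local/non-local declaration that drives where the platform deploys it.

```go
// Hypothetical sketch of the Kandoo idea (not Kandoo's actual API): the only
// extra information an application provides is whether it is local; the
// platform then decides where it runs.
package main

import "fmt"

type Event struct{ Switch, Kind string }

type App interface {
	IsLocal() bool  // the single flag the programmer adds
	Handle(e Event) // same handler shape as in a centralized controller
}

// appDetect inspects the statistics of a single switch: local.
type appDetect struct{}

func (appDetect) IsLocal() bool  { return true }
func (appDetect) Handle(e Event) { fmt.Println("detect:", e.Switch, e.Kind) }

// appReroute needs the network-wide state to pick new paths: non-local.
type appReroute struct{}

func (appReroute) IsLocal() bool  { return false }
func (appReroute) Handle(e Event) { fmt.Println("reroute:", e.Switch, e.Kind) }

// deploy places local applications on every local controller and non-local
// applications on the root controller (the actual offloading is elided).
func deploy(apps []App) {
	for _, a := range apps {
		if a.IsLocal() {
			fmt.Printf("%T -> every local controller\n", a)
		} else {
			fmt.Printf("%T -> the root controller\n", a)
		}
	}
}

func main() {
	deploy([]App{appDetect{}, appReroute{}})
}
```

Because local applications never read anything beyond the state of one switch, replicating them on demand, as described above, requires no distributed state replication.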
Beehive. Beehive [59], the second programming paradigm presented in this dissertation, is
more generic and more powerful than Kandoo. Beehive consists of a programming model
and a distributed control plane. Using Beehive’s programming model, network programmers
implement control applications as asynchronous message handlers storing data in their local
dictionaries (i.e., associative arrays or maps). Based on this programming model, Beehive’s
distributed control platform seamlessly partitions the dictionaries of each application, and
assigns each partition exclusively to one leader thread. Beehive’s partitioning mechanism
guarantees that each message can be processed solely using the entries in one partition. That
is, for each message, there is one and only one partition (and accordingly one thread
managing that partition) that is used to serve the message. To that end, Beehive automatically
infers the mapping between each message and the keys required to process that message, and
sends the message to the leader thread that exclusively owns those keys (i.e., the thread that
owns the partition containing those keys). It is important to note that all these functionalities
(e.g., inferring partitions, parallelization through threads, and the platform-wide coordination)
are internal to our control platform and are not exposed to network programmers.
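The following sketch illustrates this programming model; the names and signatures (Rcv, Map, Cell, Dicts) are hypothetical stand-ins that mirror the description above rather than the platform's actual API. The handler touches nothing but its dictionaries, and the cells a message maps to (which Beehive infers automatically) determine which partition, and hence which thread, serves it.

```go
// Hypothetical sketch of the Beehive programming model (illustrative names,
// not the platform's actual API): an asynchronous message handler that stores
// state in named dictionaries, plus the cells each message needs.
package main

import "fmt"

type Msg struct {
	Switch string
	MAC    string
	Port   int
}

// Cell identifies one key in one application dictionary.
type Cell struct{ Dict, Key string }

// Dicts is a stand-in for a bee's local state: named dictionaries of key-value pairs.
type Dicts map[string]map[string]interface{}

// Put writes a key-value pair into one of the named dictionaries.
func (d Dicts) Put(dict, key string, v interface{}) {
	if d[dict] == nil {
		d[dict] = map[string]interface{}{}
	}
	d[dict][key] = v
}

// LearningSwitch stores the MAC-to-port mapping of each switch.
type LearningSwitch struct{}

// Rcv is the asynchronous message handler; it only reads and writes dictionaries.
func (LearningSwitch) Rcv(m Msg, state Dicts) {
	state.Put("Switches", m.Switch+"/"+m.MAC, m.Port)
}

// Map declares (in Beehive, it is inferred from Rcv) the cells a message needs:
// all messages of one switch map to the same cell and hence to the same thread.
func (LearningSwitch) Map(m Msg) []Cell {
	return []Cell{{Dict: "Switches", Key: m.Switch}}
}

func main() {
	app, state := LearningSwitch{}, Dicts{}
	m := Msg{Switch: "s1", MAC: "aa:bb:cc:dd:ee:ff", Port: 3}
	fmt.Println("cells:", app.Map(m)) // cells: [{Switches s1}]
	app.Rcv(m, state)
	fmt.Println("state:", state)
}
```

Writing the handler this way is essentially what one would write for a centralized controller; the partitioning, parallelization, and coordination described above happen behind this interface.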
Beehive provides seamless fault-tolerance, runtime instrumentation, and optimized placement
of control applications without developer intervention. To provide such autonomous
functionalities and to remain generic, Beehive is not built on top of existing external datastores
(e.g., memcached [43] or RAMCloud [99]) since we need fine-grained control over all aspects
of the control plane, including but not limited to the placement of state and the replication
mechanism.
As depicted in Figure 1.6, Beehive’s control platform is internally built based on the intuitive
primitives of hives, cells, and bees. A hive is basically an instance of the controller running
on a physical machine, a container, or a virtual machine. A cell is an entry in the dictionary
of a control application. A bee, on the other hand, is a light-weight thread of execution that
exclusively manages its own cells and runs the message handlers of a control application. The
platform guarantees that each message is processed by the bee that exclusively owns the cells
required to process that message. The interplay of these primitives results in concurrent and
consistent state access in a distributed fashion.
Figure 1.6: Beehive’s control platform is built based on the intuitive primitives of hives, bees, and cells.
The Beehive control platform is fault-tolerant. Each bee forms a colony (i.e., a Raft [96]
quorum) with follower bees on other hives to consistently replicate its cells. Upon a failure, the
colony reelects its leader and continues to process messages in a consistent manner. Another
important capability of Beehive is runtime instrumentation of control applications. This enables
Beehive to automatically live migrate the bees among hives aiming to optimize the control plane.
The runtime instrumentation, as feedback provided to network programmers, can also be
used to identify bottlenecks in control applications, and can assist developers in enhancing their
designs.
It is important to note that Beehive hides many commonalities of distributed programming
(such as coordination, locking, concurrency, consistency, optimization and fault-tolerance),
while providing sufficient knobs for network programmers to implement a wide range of
application designs. In this dissertation, we demonstrate that Beehive is generic enough for
implementing existing distributed control platforms (including Kandoo) as well as important
control applications (such as routing and traffic engineering) with ease and at scale.
Use Cases. We present how important use cases can be implemented using Kandoo and Beehive.
We first present OpenTCP [46], our proposal to optimize and adapt TCP using the centralized
network-wide view available in SDN, and how one can use Kandoo to implement OpenTCP at
scale. Furthermore, we present a distributed SDN controller and a simple shortest-path routing
algorithm implemented using Beehive to showcase the expressiveness of our proposal. We
demonstrate that Beehive’s automatic optimizer can be advantageous for SDN applications in
real-world settings. Evaluating these use-cases, we demonstrate that Beehive and Kandoo are
simple to program and can scale.
1.4 Thesis Statement
We present expressive and extensible distributed programming paradigms for SDN that are easy to
use, straightforward to instrument, and autonomous in optimizing the control plane.
1.5 Contributions
This dissertation makes the following contributions:
1. We present Kandoo [57], a two-layer hierarchy of controllers that is as simple to
program as centralized controllers, yet scales considerably better by offloading local
control applications onto local computing resources close to data-path elements to process
frequent events. The key differentiation of Kandoo is that, unlike previous proposals such
as DIFANE [131] and DevoFlow [34], it requires no modifications to the forwarding
hardware or to the southbound protocols (e.g., OpenFlow [88]).
2. As a use case of Kandoo, we present OpenTCP [46], which can adapt and optimize
TCP by exploiting SDN’s network-wide view. By deploying OpenTCP in SciNet (the
largest academic HPC datacenter in Canada), we demonstrate that OpenTCP can achieve
considerable improvements in flow completion time with simple optimization policies, at
scale. Such a deployment is quite challenging using centralized controllers.
3. We propose Beehive, which provides a programming model (i.e., a set of APIs) that
is intentionally similar to centralized controllers to preserve simplicity. Applications
developed using this programming model are seamlessly transformed into distributed
applications by our distributed control platform. Using Beehive, we have implemented
our own SDN controller as well as existing distributed controllers such as Kandoo. To
demonstrate the generality of our proposal, we have implemented several non-SDN use
cases on top of Beehive including a key-value store, a message queue, and graph processing
algorithms. Using these SDN and non-SDN use cases, we demonstrate that our proposal
is simple to program, scales considerably well, can automatically tolerate faults, provides
low-overhead runtime instrumentation, and can optimize itself upon changes in the
workload.
To preserve simplicity, Beehive applications are written in a generic programming
language with an API similar to centralized controllers. To that end, our proposal does not
introduce a new domain-specific language (DSL), such as BOOM’s disorderly programming [13]
and Fabric’s DSL for secure storage and computation [80]. Moreover, unlike
existing distributed SDN controllers, such as ONIX [75] and ONOS [17], Beehive hides
the boilerplates of distributed programming and does not expose them to programmers.
Beehive is generic: it does not impose a specific distribution model (e.g., a hierarchy or
a ring), and is not tied to a specific control application (e.g., forwarding or routing) or a
specific southbound protocol (e.g., OpenFlow). As such, it can accommodate a variety of
designs for control applications, including hierarchies, rings and pub-sub. Similarly, its
self-optimization mechanism is generic and applies to all types of control applications, in
contrast to proposals such as ElasticCon [37], which has the specific goal of optimizing the
workload of switches among several controllers.
1.6 Dissertation Structure
This dissertation is organized as follows:
• In Chapter 2, we present an overview of the state-of-the-art SDN research from both
academia and industry. We present SDN precursors (which include the proposals
that led to what we know as SDN today), SDN proposals in the design space, SDN tools,
and novel applications of SDN. Moreover, we deconstruct challenges in SDN, and discuss
different alternatives to address these challenges.
• In Chapter 3, we present Kandoo. Kandoo is one of the two programming abstractions
presented in this dissertation. We explain Kandoo’s design philosophy in detail and
compare it with other alternative proposals in the same domain. Using a sample elephant
flow detection application, we demonstrate that Kandoo can result in a considerably more
scalable control plane.
• In Chapter 4, we present our second programming paradigm, Beehive. We present the
details of Beehive’s design, and demonstrate its programming model. Moreover, we
explain the internals of the Beehive distributed control plane and how it acts as a scalable
runtime for the proposed programming model. We also explain fault-tolerance, runtime
instrumentation, and automatic placement optimization in Beehive.
• We present important use cases of our proposals in Chapter 5. We first present OpenTCP,
and explain how it can be implemented using Kandoo (and consequently Beehive). Then,
we present and evaluate a real-world distributed controller and a distributed shortest path
algorithm implemented using Beehive.
• We summarize and conclude this dissertation in Chapter 6, and enumerate all the
potential directions to follow up this research in the future.
Chapter 2
Background
SDN is a broad area and has a vast body of research ranging from low-level hardware designs to
high-level abstractions to real-world network applications. Covering all SDN research is beyond
the scope of this dissertation. Thus, in this chapter, we limit the scope of our literature review
to fundamental SDN research and the state-of-the-art proposals related to the topic of this
dissertation. We start with precursors and important fundamental SDN abstractions. Then, we
enumerate and analyze proposals that address scalability, resilience, correctness and simplicity
concerns in SDN. Moreover, we present important measurement techniques and optimization
applications using SDN.
2.1 Precursors
Traditional networks are built as a fragile assemblage of networking protocols (i.e., slightly
less than 7000 RFCs), which are instrumental for building the Internet, but inessential
for orchestration within a single networking domain. These protocols often solve very
straightforward problems but are unnecessarily complicated since they are designed to operate
in highly heterogeneous infrastructures with no orchestration mechanism. For that reason, to
enforce a simple network policy (e.g., a simple isolation rule) in traditional networks, we either
have to use a set of complicated signaling protocols supported by all networking elements, or
build an overlay network to steer traffic through middleboxes.
There are various architectural proposals to address this inherent problem of traditional
network architectures, and SDN is the result of their convergence. These proposals, although
different in scope and mechanisms, agree upon the separation of control functions from the data
plane: data-path elements support addressing protocols and respective forwarding primitives,
and all other problems are tackled in the control plane, which communicates with data-path
elements using a standard API.¹
In this chapter, we present an overview of proposals which profoundly influence SDN’s
design principles.
2.1.1 ForCES: An Initial Proposal
In most operating systems, networking applications are designed as separate fast-path and
slow-path components. The former provides line-rate packet processing functionality, needs
to be extremely efficient, and is usually implemented within the kernel, whereas the latter
is user-space software that provides higher-level functionality and manages its
kernel-level counterparts via signaling mechanisms (e.g., NetLink sockets).
Routers and switches are internally designed the same way. Their fast path (i.e., the
forwarding plane) is usually implemented in hardware (using ASICs, programmable
hardware, or network processors) separated from the slow path (i.e., the controller), which is
usually implemented in software running on a general-purpose processor. Observing this
analogy², Forwarding and Control Element Separation (ForCES) [128] proposes to use a
standard signaling mechanism between the control and forwarding elements, which effectively
decouples the control plane from the forwarding elements.
As portrayed in Figure 2.1, ForCES has two types of networking entities: (i) Forwarding
Elements (FE) that forward packets, and (ii) Control Elements (CE) that control FEs. A
¹ It is important to note that traditional network equipment is implemented in a similar fashion, except for the fact that its control plane is highly coupled with the data plane and thus is inflexible.
² ForCES was proposed by the designers of Linux NetLink sockets.
(a) Forwarding Elements (FE) are configured as a set of interconnected Logical Function Blocks (LFB).
(b) Control Elements (CE) can associate with any Forwarding Element (FE), configure and query its Logical Function Blocks.
Figure 2.1: ForCES Components.
FE is configured as a topology of Logical Function Blocks (LFB) which define high-level
forwarding functions (Figure 2.1a). As shown in Figure 2.1b, each CE can associate with any
FE and configure, query, and subscribe to its LFBs. LFBs are designed to be fine-grained, and
thus almost all sorts of forwarding functionality can be implemented in ForCES’s FEs. This
advantage, however, is also the main disadvantage of ForCES: it makes the model too generic
and underspecified, and for that reason ForCES is not deployed in practice.
2.1.2 4D: Scalable Routing
RCP. Routing Control Platform (RCP) [22] aims at addressing scalability issues in traditional
routing protocols, specifically iBGP, by delegating routing decisions to a control platform.
To centralize routing decisions, RCP gathers internal network state (by monitoring link-state
advertisements from IGP protocols) as well as candidate routes to external networks (by
establishing iBGP connections to network routers). Utilizing this centralized information, RCP
selects and assigns routes for each router, essentially preventing protocol oscillation and
persistent forwarding loops. To be resilient to failures, RCP realizes multiple replicas and
guarantees that different replicas will have identical routing decisions in steady state.
4D. Extending RCP, 4D [49] proposes a clean-slate approach that goes beyond routing
protocols to all kinds of decision logic in networks. 4D has three major design objectives:
First, network requirements should be expressed as a set of network-level objectives that should
Chapter 2. Background 20
be independent of data plane elements, and control decisions should satisfy and optimize those
objectives. Second, decisions are made based on a timely, accurate network-wide view of the
data plane’s state. Consequently, routers and switches should provide the information required
to build such a view. Third, instead of implementing decision logic inside network protocols,
the data plane elements should only use the output of decision logic to operate. This ensures
direct control over the data plane elements. In other words, we can directly enforce decisions
and make sure they are obeyed by switches and routers.
To achieve its design objectives, 4D (as implied by its name) partitions network functionality
into four planes (Figure 2.2): (i) data, (ii) discovery, (iii) dissemination, and (iv) decision planes.
The decision plane makes all control decisions, such as load balancing, access control, and
routing. The data plane provides packet processing functions such as forwarding, filtering,
scheduling, and measurement, and handles packets according to the state enforced by the decision
plane. Discovering physical elements in the data plane is the responsibility of the discovery plane.
This plane merely discovers physical networking elements and represents them as logical entities
encapsulating their capabilities and adjacency information. The dissemination plane, on the
other hand, is a passive communication medium that relays decisions to the data plane, and
provides the discovered state to the decision plane.
Figure 2.2: 4D Four Planes: (i) data, (ii) discovery, (iii) dissemination, and (iv) decision planes
Tesseract. 4D is a generic architectural template and does not enforce any implementation
details for the planes, nor any protocols and APIs for communicating between them.
Tesseract [127] is an experimental implementation of 4D that embeds the four planes into
two components: (i) the Decision Element (DE) is the centralized entity that makes networking
decisions and has direct control over forwarding elements. DE is essentially an implementation
of 4D's decision plane that uses the dissemination service to communicate with switches, and is
replicated using a simple master-slave model to ensure resilience. (ii) The Switch is the component
residing in the forwarding elements that provides packet processing, discovery functions, and
the dissemination service endpoint. In essence, the dissemination plane is partly deployed on
switches and partly on DE to maintain a robust communication channel. Due to the lack of
standard protocols in 4D, Tesseract uses its own node configuration API to configure forwarding
elements. Evaluated in Emulab, Tesseract is shown to be resilient to failures and sufficiently
scalable to handle a thousand nodes.
2.1.3 Ethane: Centralized Authentication
Traditional networks “retrofit” security using auxiliary middleboxes and network processors.
This approach does not scale as it is not an integral part of the network. It is difficult for network
operators to maintain policies over time, and it is straightforward for attackers to find a way
through the maze of broken policies. To resolve this problem, Casado et al. proposed clean-
slate approaches, SANE [26] and then Ethane [25], incorporating a centralized authority to
authenticate flows in the network.
SANE. SANE [26] aims at transforming the permissive network into a secure infrastructure
using a centralized authorization mechanism. In SANE, a centralized Domain Controller (DC) is
responsible for authentication, service management, and authorization. The DC assigns capabilities
to authorized flows, and if a flow does not have a valid capability, switches drop it. Upon initiating
a new flow, end-hosts need to ask the DC for capability grants.
When capabilities are granted, packets are tagged with the capability and sent out from
the end-hosts.
Ethane. Ethane [25] extends SANE such that no modifications are required on end-hosts. In
contrast to SANE, Ethane switches have an initially empty flow table that holds active flow-
entries (i.e., forwarding rules). When a packet with no matching flow-entry arrives at a switch, it
is sent to the controller. Then, it is the controller's responsibility to either set up proper flow-
entries to forward the packet, or drop the packet. This ensures centralized policy enforcement
while minimizing the modifications on the end-hosts.
2.1.4 OpenFlow: Convergence of Proposals
Inspired by 4D, OpenFlow [88] generalizes Ethane for all network programming applications.
In contrast to 4D and other clean-slate designs, OpenFlow takes a pragmatic approach with
a few compromises on generality. OpenFlow adds a standard API between the data plane
and the control plane. In contrast to providing a fully programmable datapath, OpenFlow
imposes minimal changes to switches for programming their forwarding state (i.e., how
incoming packets are forwarded in a switch). This is to enable high-performance and low-cost
implementations in new hardware, as well as straightforward implementation of OpenFlow in
existing switches.
Flow-entries. OpenFlow models the forwarding state as a set of match-action rules, called
flow-entries, composed of a set of conditions over packet headers and a set of actions. The
conditions can match information such as the ingress port, MAC addresses, IPv4/v6 fields, TCP ports,
and the outermost VLAN/MPLS fields, to name a few. These conditions are designed to be easily
implementable in ternary content-addressable memories (TCAMs). Once all the conditions
match a packet, the set of respective actions is executed. These actions range from sending
packets out to a port to pushing/popping tags and QoS actions.
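To make the match-action model concrete, the following is a minimal sketch in plain Python (deliberately not tied to any particular OpenFlow library); field names such as `eth_type` and `tcp_dst`, and the `output` action, are illustrative stand-ins for OpenFlow match fields and actions:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical, simplified flow-entry: wildcarded match fields, a priority
# for disambiguation, a list of actions, and per-entry counters.
@dataclass
class FlowEntry:
    match: Dict[str, object]              # e.g., {"in_port": 1, "tcp_dst": 80}
    actions: List[Callable[[Dict[str, object]], None]]
    priority: int = 0
    packets: int = 0                       # counters, as in OpenFlow
    bytes: int = 0

    def matches(self, pkt: Dict[str, object]) -> bool:
        # A field absent from `match` acts as a wildcard.
        return all(pkt.get(k) == v for k, v in self.match.items())

def output(port: int) -> Callable[[Dict[str, object]], None]:
    return lambda pkt: print(f"forward packet to port {port}")

# Example: send TCP traffic destined to port 80 out of port 3.
entry = FlowEntry(match={"eth_type": 0x0800, "tcp_dst": 80},
                  actions=[output(3)], priority=10)
pkt = {"eth_type": 0x0800, "ipv4_dst": "10.0.0.2", "tcp_dst": 80, "len": 1500}
if entry.matches(pkt):
    entry.packets += 1
    entry.bytes += pkt["len"]
    for act in entry.actions:
        act(pkt)
```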
In addition to forwarding rules, OpenFlow adds minimal functionality for collecting
network statistics. Beyond generic per-port statistics, OpenFlow maintains aggregate
[Figure: an OpenFlow switch runs an OpenFlow agent (in software) in front of a hardware pipeline of flow tables, and talks to the controller and its control applications over the OpenFlow protocol on TCP or TLS. Each flow-entry consists of a rule (e.g., source/destination MAC, source/destination IP, TCP source/destination port), actions (e.g., forward/drop the packet, push/pop tags, send to the next table, enqueue the packet), and counters (number of packets, bytes, duration).]
Figure 2.3: OpenFlow Switch: An OpenFlow 1.1 switch stores multiple flow tables, each consisting of multiple flow-entries.
metrics (e.g., number of packets, size of packets, etc.) for each flow-entry. This uniquely enables
adaptive forwarding.
Flow-tables. In OpenFlow 1.0 (and prior versions), all flow-entries are stored in a single table.
Each flow-entry has a priority for disambiguation. The problem with this approach is space
explosion: we are storing the Cartesian product of all rules as flow-entries. In OpenFlow 1.1,
this problem is resolved by replacing the single table with a pipeline of flow-tables. In addition
to normal actions, each flow-entry can point to the next flow-table in the pipeline (note that no
back references are allowed). This process is presented in Figure 2.3.
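The pipeline semantics can be illustrated with a small sketch; the table contents and action strings below are hypothetical, and the sketch only captures the forward-only goto-table behavior described above:

```python
from typing import Dict, List, Optional

# Minimal sketch of an OpenFlow 1.1-style pipeline: each table holds
# prioritized entries, and an entry may "goto" a later table only, so
# packets never loop back in the pipeline.
class Entry:
    def __init__(self, match: Dict[str, object], priority: int,
                 actions: List[str], goto: Optional[int] = None):
        self.match, self.priority, self.actions, self.goto = match, priority, actions, goto

    def matches(self, pkt: Dict[str, object]) -> bool:
        return all(pkt.get(k) == v for k, v in self.match.items())

def process(tables: List[List[Entry]], pkt: Dict[str, object]) -> List[str]:
    actions, table_id = [], 0
    while table_id is not None and table_id < len(tables):
        candidates = [e for e in tables[table_id] if e.matches(pkt)]
        if not candidates:
            break                               # table miss
        best = max(candidates, key=lambda e: e.priority)
        actions.extend(best.actions)
        assert best.goto is None or best.goto > table_id, "no back references"
        table_id = best.goto                    # None terminates the pipeline
    return actions

tables = [
    [Entry({"vlan": 10}, 10, ["pop_vlan"], goto=1)],     # table 0: tag handling
    [Entry({"ipv4_dst": "10.0.0.2"}, 5, ["output:3"])],  # table 1: forwarding
]
print(process(tables, {"vlan": 10, "ipv4_dst": "10.0.0.2"}))  # ['pop_vlan', 'output:3']
```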
2.2 Forwarding Abstractions
Separating the control and data planes is the high-level design approach advocated by all
SDN proposals, but they differ considerably in how they draw the border between the
responsibilities of the control plane and the functions of the data plane. Such fundamental
differences result in different forwarding abstractions, for which there are three major schools of
thought.
Programmable Abstractions. Programmable abstractions view the data plane as a set of
programmable entities with an instruction set. Similar to general-purpose processors, for-
warding elements do not provide any predefined network function; instead, one can offload
code into these elements to get the desired functionality. Active networks, the pioneer of
these abstractions, allow programmability at packet transport granularity. Active networks can
be implemented in software running on general-purpose processors, or using programmable
hardware, such as NetFPGAs [94] or BEE [29], to achieve line-rate processing.
Active network proposals fall into two categories: Some proposals, such as
ActiveIP [126], Smart Packets [107], and ANTS [125], advocate encapsulating code inside packets.
These proposals assume that the forwarding elements are stateless, generic processing elements
that execute the code accompanying a packet on a per-packet basis. This code can either modify
the packet itself or result in a forwarding decision. Other proposals, such as Switchware [114]
and DAN [36], advocate offloading code into switches. They assume a switch is a stateful,
programmable processing element. The offloaded code is executed per incoming packet and
can result in different forwarding decisions.
We also note that there are extensible network programming platforms, such as XORP [55]
and Click [73], that must be deployed on generic x86 platforms. These platforms are well suited for
implementing routing protocols, but they are not suitable as a forwarding abstraction.
Generic Abstractions. Proposals in this class keep the data plane as generic as possible,
essentially defining data-path functions as a set of black boxes with no predefined characteristics.
In such abstractions, the data plane is responsible for providing a formal specification of its
functionality, and it is the control plane's job to compose these black boxes to program the
network. In other words, the control plane merely composes a set of predefined functions to
modify a packet or reach a forwarding decision.
ForCES [128] is the most notable proposal in this category. In ForCES, Logical Function
Blocks (LFBs) represent the generic networking functions available in forwarding elements. For
instance, there are LFBs for address translation, packet validation, IP unicast, and queueing.
Composing these LFBs, the control elements can program forwarding elements.
SwitchBlade [14] is another example in this category, which exposes predefined networking
functions as black boxes that can be composed to perform a cohesive behavior. SwitchBlade
slices a switch into one or more Virtual Data Planes (VDPs), each having its own isolated
pipeline. Each VDP is composed of a set of modules (note that modules can be shared among
different VDPs). The set of modules of each VDP is dynamic and configurable.
SwitchBlade processes incoming packets in four stages: (i) It selects the appropriate VDP(s)
based on the first few bytes of the packet. (ii) Once the VDPs are selected, the packet enters a
shaping module for rate limiting. (iii) Then, packets are preprocessed through the VDP's respective
preprocessing modules to calculate metadata about packets. (iv) Finally, packets are forwarded
using an output port lookup module, which can use longest-prefix matching, exact matching, or
forward packets to the CPU.
Reconfigurable Abstractions. The other school of thought, led by OpenFlow [88], views the
data plane as a minimal set of functions that can be directly implemented in hardware, and
pushes all other functions to the control plane. These proposals define the data plane as a set
of packet processing elements providing basic forwarding and queuing primitives. The state of
these forwarding elements is configured by the control plane. For that reason, reconfigurable
abstractions are not completely programmable. For instance, OpenFlow allows the control plane
to add, remove, and modify flow-entries (§2.1.4) in switches. Some proposals (e.g., Fabric [27])
even push the abstraction further towards a model in which edge switches are programmable,
and the core uses a high-throughput label switching protocol (e.g., MPLS [105]).
P4 [20, 67] is another proposal that aims at providing more programmability and more
flexibility in reconfigurable networks. In essence, the goal of P4 is to bring the benefits of
programmable abstractions into reconfigurable abstractions. P4 has a high-level programming
language with which one can specify packet fields, how packets are parsed, and what
actions to apply to packets. To be target independent, P4 automatically compiles this
high-level program onto the actual target.
Remarks. Programmable and generic abstractions are more evolvable, and accommodate a
broader range of data plane functions (including the reconfigurable abstractions) at the cost
of simplicity, since, to use a specific data plane function, one would need to define several
programming contracts, effectively specifying a custom southbound API. Reconfigurable
abstractions, on the other hand, enforce a well-defined southbound protocol that both the data
and control planes agree upon. This results in a relatively limited data plane, but simplifies
network control. More importantly, it prevents API fragmentation for control programs and
favors a vendor-agnostic ecosystem. For that reason, the latter model is well adopted by the
research community as well as the industry. In essence, reconfigurable abstractions are the most
pragmatic abstraction, and thus most SDN proposals have converged on them.
2.3 Control Plane Scalability
Decoupling control from the data plane leads to several scalability concerns. Moving
traditionally local control functionality to a remote controller can potentially result in new
bottlenecks. The common perception that control in SDN is centralized leads to concerns about
SDN scalability and resiliency. It can also lead to signaling overheads that can be significant
depending on the type of network and associated applications.
We argue that there is no inherent bottleneck to SDN scalability, i.e., these concerns stem
from implicit and extrinsic assumptions [60]. For instance, early benchmarks on NOX (the
first SDN controller) showed it could only handle 30k flow initiations per second [51] while
maintaining a sub-10ms flow install time. Such a low flow setup throughput in NOX is shown
to be an implementation artifact [118]; likewise, the control plane being centralized was simply due to
the historical evolution of SDN. Without such assumptions, similar to any distributed system,
one can design a scalable SDN control plane.
We believe there are legitimate concerns for SDN scalability, but we argue that these
scalability limitations are not restricted to SDN, i.e., traditional control protocol design also
faces the same challenges. While this does not address these scalability issues, it shows that
we do not need to worry about most scalability issues in SDN more than we do for traditional
networks. In this section, we present recent proposals to address these scalability concerns.
2.3.1 In-Data-Plane Proposals
Datapath elements sit at the base layer of SDN with significant influence on the whole
architecture. One way to address scalability concerns in the control plane is delegating more
responsibilities to the data plane. This simply reduces the load on the control plane and can
eliminate bottlenecks.
Decision Caching. One way to eliminate the load on the control plane is to cache the decision
logic (i.e., the function that maps events onto concrete rules) inside the datapath elements. The
most notable work pursuing this idea is DIFANE [131]. DIFANE delegates forwarding logic
from the controller to a special kind of switch, called Authority Switches, in order to shield the
controller from packet-in events. The controller offloads forwarding rules to Authority Switches,
and Authority Switches, on behalf of the controller, install rules in other switches. In other
words, Authority Switches act as a proxy that caches simple forwarding decisions. Although
considerably efficient, DIFANE is only applicable to forwarding rules that can be implemented
in a switch, and cannot help with complex control logic.
Aggregate Visibility. Another way to reduce the control plane's load is to handle frequent events
(such as short-lived flow arrivals) within the datapath, and selectively send aggregate events to
the control plane. This approach is pioneered by DevoFlow [34], which uses wildcarded flows
to forward traffic, and sends only elephant flows (e.g., flows with more than a certain amount
of data, bandwidth consumption, etc.) to the controller. Although this approach would result
in a scalable SDN, it comes at the cost of controller visibility: the controller never gets to see the
short-lived flows. Some advantages of SDN would be hindered when the controller does not
have the power to see all flows.
Remarks. These in-data-plane proposals are mostly motivated by early benchmarks on
NOX [51] that are shown to be fallacies rooted in a naive IO implementation, which can be
simply resolved by using asynchronous IO mechanisms [118] or by exploiting parallelism [134].
Newer implementations of NOX and Beacon [39] can easily cope with hundreds of thousands
of flows on commodity machines. We therefore believe that the controller's throughput is not an
inherent bottleneck in SDN and there is no need to push functionality back to the data plane.
Moreover, delegating more to the data plane would result in less visibility for control applications,
which can obviate the benefits of SDN.
2.3.2 Distributed Control Planes
The controller plays a central role in SDN: it programs forwarding elements, maintains the
network state, and runs the control applications. In essence, it is the network operating
system, and thus must be scalable, reliable, and efficient. Led by NOX [51], SDN controllers
are traditionally designed as a centralized middleware that communicates with switches and
hosts control applications. These controllers mostly use an abstract object model to represent
the network and provide an event-based programming model for control applications. These
controllers all had centralized architectures with inefficient implementations. These limitations
have recently been the topic of extensive research, which we present in this section.
Regardless of the controller's capability, a centralized controller does not scale as the network
grows (increasing the number of switches, flows, bandwidth, etc.) and will fail to handle
all the incoming requests while providing the same service guarantees. This necessitates
scalable distributed control platforms that can cope with arbitrary loads in different network
environments.
ONIX. ONIX [75] is the successor of NOX that maintains the centralized network state in
a distributed fashion. As shown in Figure 2.4, ONIX models the network using an abstract
data structure called the Network Information Base (NIB). The NIB represents a network as a graph in
which nodes represent networking elements and edges represent logical connections. The NIB is
maintained in a distributed fashion and is eventually consistent. Within ONIX, each controller
has its own NIB, and when a part of the NIB is updated in one of the controllers, the update is
eventually propagated to the whole cluster. All other aspects of state distribution (e.g.,
partitioning and consistency) are handled by control applications.
HyperFlow. HyperFlow [116] is another distributed controller built based on NOX. It partitions
the network, and assigns a controller to each partition. In order to synchronize the network
state, HyperFlow logs events on each controller and propagates them on the cluster. Within
HyperFlow, consistency is preserved by having each switch managed only by its local controller.
Whenever a remote application needs to modify a switch, the message is sent to the controller
that manages that switch, and replies are routed back to the application accordingly. In contrast
to ONIX, HyperFlow does not require any change in control applications but is not as scalable
as ONIX.
DISCO. DISCO [103] is a distributed controller focusing on scalable overlay networks. DISCO
partitions the network into multiple domains, each controlled by its own controller. In contrast
to HyperFlow, DISCO focuses on providing end-to-end connectivity among nodes in the
network, not detailed path setup. As such, neighbor controllers form pub-sub relationships
with one another. Using this pub-sub system, which is built on top of AMQP [95], network
controllers share aggregated network-wide views and establish end-to-end connections.
ONOS. ONOS [17] is another distributed controller that tries to address the shortcomings of
Figure 2.4: ONIX maintains the network-wide state in a distributed fashion.
ONIX. In contrast to ONIX, the goal of ONOS is to provide a strongly consistent network-
wide state. To that end, as shown in Figure 2.5, ONOS stores the network state in an external
distributed datastore: Cassandra and ZooKeeper in its first version, and RAMCloud in its second
version. As a result, the control platform consists of controllers that synchronize their state
using the external state management system. Delegating state management to an external
system simplifies ONOS's design, yet can lead to suboptimal performance, scalability issues, and
maintenance difficulties.
B4. B4 [66] is Google’s production wide-area network (WAN) that consists of multiple WAN
sites. Each site in B4 is controlled by a distributed control plane, called Network Control Servers
(NCS), that manage the SDN switches in that site. NCS internally uses ONIX NIBs [75] to
implement OpenFlow, and Quagga [6] to implement routing protocols. At a higher level, a
logically-centralized global view of the network is used by a centralized Traffic Engineering (TE)
application to optimize traffic across B4's planet-scale network.
ElastiCon. ElastiCon [37] is a proposal on how to distribute the load of the network among
controllers in a distributed control plane. The main contribution of ElastiCon is a four-phase
protocol that controllers use to elect an apt master for a new switch. Moreover, it proposes an
algorithm to migrate a switch from one controller to another. ElastiCon is tackling an important
Figure 2.5: ONOS maintains the network-wide state using an external database system.
problem, and the proposal is orthogonal to all distributed controllers. That is, ElastiCon can be
easily adopted in other distributed controllers.
Remarks. An interesting observation is that control plane scalability challenges in SDN
(e.g., simplicity and consistency requirements) are not inherently different from those faced in
traditional network design. SDN, by itself, is neither likely to eliminate the control plane design
complexity nor make it more or less scalable.3
SDN, however, (a) allows us to rethink the constraints traditionally imposed on control
protocol designs (e.g., a fixed distribution model) and decide on our own trade-offs in the design
space; and (b) encourages us to apply common software and distributed systems development
practices to simplify development, verification, and debugging. Unlike traditional networks,
in SDN, we do not need to address basic but challenging issues like topology discovery, state
distribution, and resiliency over and over again. Control applications can rely on the control
platform to provide these common functions; functions such as maintaining a cohesive view of
the network in a distributed and scalable fashion.
3 After all, one can replicate a traditional network design with SDN by co-locating an equal number of forwarding and control elements. Even though this obviates the benefits of SDN, it is technically possible.
2.4 Resilience to Failures
Resilience to failures and convergence time after a failure have always been a key concern in
network performance. SDN is no exception, and, with the early systems setting an example
of designs with a single central controller, resiliency to failures has been a major concern. A
state-synchronized slave controller would be sufficient to recover from controller failures, but
a network partition would leave half of the network brainless. In a multi-controller network,
with an appropriate controller discovery mechanism, switches can always discover a controller
if one exists within their partition. Therefore, given a scalable discovery mechanism, controller
failures do not pose a challenge to SDN scalability.
Link Failures. Let us decompose the process of repairing a broken link or switch to see how it
is different from traditional networks. As shown in Figure 2.6, convergence in response to
a link failure has five steps. The switch detects a change. Then, the switch notifies the controller.
Upon notification, the control program computes the repair actions and pushes updates to
the affected datapath elements which, in turn, update their forwarding tables.4 In traditional
networks, link failure notifications are flooded across the network, whereas with SDN, this
information is sent directly to a controller. Therefore, the information propagation delay in
SDN is no worse than in traditional networks. Also, as an advantage for SDN, the computation is
carried out on more capable controller machines as opposed to the weak management CPUs of all
switches, regardless of whether they are affected by the failure or not.
Note that the above argument was built on the implicit assumption that the failed switch or
link does not affect the switch-controller communication channel. The control network itself
needs to be repaired first, if a failed link or switch was part of it. In that case, if the control
network – built with traditional network gear – is running an IGP, the IGP needs to converge
first before switches can communicate with the controller to repair the data network. In this
corner case, therefore, convergence may be slower than in traditional networks. If this proves to
be a problem, the network operator should deploy an out-of-band control network to alleviate
this issue.
Control Plane Failure. To tolerate a controller failure, controllers have to ensure that their
state is persistent and properly replicated. For a centralized controller, this
means persistent state with master-slave replication. For distributed controllers, on the other
hand, this depends on the type of consistency guarantees they provide. Existing distributed
controllers either provide eventually consistent replication (e.g., ONIX [75]), or rely on an
external distributed datastore to replicate the state for them (e.g., ONOS [17]). They handle
network partitions in the control plane similarly. It is important to note that beyond the
distributed system mechanisms, one can use the state stored in switches to preserve consistency
upon failure [100].
4 For switch failures, the process is very similar, with the exception that the controller itself detects the failure.
[Figure: a controller and three switches, annotated with the five convergence steps: (1) a switch detects a link failure; (2) the switch notifies the controller; (3) the controller computes the required updates; (4) the controller pushes the updates to switches; (5) the switches update their forwarding tables.]
Figure 2.6: The five steps when converging upon a link failure.
Control applications can also result in failures in the control plane. For example, when a
control application fails to handle an event or when it crashes, it can result in a control plane
failure (e.g., loss of connectivity and a transient high loss episode). LegoSDN [28] aims at
isolating application failures using a sandbox called AppVisor which contains applications and,
according to the operator policies, can stop applications or revert their transactions upon an
application error. Approaches like LegoSDN can be very useful to improve the reliability of the
control plane.
Remarks. Overall, the failure recovery process in SDN is no worse than traditional networks.
Consequently, similar scalability concerns exist and the same techniques used to minimize
downtime in traditional networks are applicable to SDN. For instance, SDN design can and
should also leverage local fast failover mechanisms available in switches to transiently forward
traffic towards preprogrammed backup paths while a failure is being addressed. We stress that
control platforms must provide the essential failover and recovery mechanisms that control
applications can reuse and rely upon.
2.5 Programming Abstractions
Almost all SDN controllers provide APIs that expose the details embedded in the southbound
interface between the data and control planes. That level of detail is essential for implementing
a controller, but not generally required for managing a network. Network operators and
programmers instead require an abstraction through which they can specify the required
behavior of the network with minimal effort. To that end, network programming boilerplate
and complexities should be handled by the controllers and be hidden from management
interfaces. Such an interface will accommodate network growth and will ease management at
scale. Preserving this simplicity is a major challenge for the future of SDN. In this section,
we present proposals that aim at simpler programming abstractions.
Frenetic. The Frenetic project [45, 91] is the foremost initiative aiming at raising the abstraction
level for network programming. Frenetic provides a declarative language (inspired by
SQL) for querying network state, and a reactive functional language for programming the
network. Frenetic provides functional composition by design and facilitates easy reuse.
Frenetic languages run atop a runtime environment that manages all low-level semantics, such as
installing and updating rules and gathering network statistics. Moreover, the Frenetic runtime
provides optimizations and consistency guarantees (both at the packet and flow levels).
Nettle and Procera. Nettle [120] and Procera [121] are high-level functional abstractions for
network programming on top of OpenFlow, very similar to Frenetic. The main difference is
that Nettle and Procera support all networking events, while Frenetic is limited to packet-level
events. On the other hand, Frenetic provides a declarative language for statistics collection, which
neither Nettle nor Procera provides.
Kinetic. Kinetic [72] is another programming abstraction, built on top of Pyretic, with the goal
of creating a domain-specific language (DSL) for dynamic network configurations. In contrast
to Frenetic, which provides a static network programming abstraction, Kinetic aims at creating
a network programming environment that can react to temporal changes and events in the
network. More importantly, policies implemented in Kinetic are formally verifiable. Internally,
Kinetic uses a finite state machine (FSM) to present a dynamic abstraction of the policy. These
state machines can be automatically composed to create larger policies and are fed into NuSMV
for formal verification. These features make Kinetic simpler than Pyretic to use in
real environments, yet they incur their own performance and scalability overheads.
Remarks. These high-level programming abstractions can indeed simplify network program-
ming by hiding the internal details of SDN protocols. All these proposals, however, focus on
centralized controller systems. Providing such abstractions on distributed controllers requires
extensive architectural changes in both the control plane and the abstraction itself.
2.6 Correctness and Consistency Tools
Networks are dynamic entities that constantly change state; links go up and down, elements
fail, and new switches are added. Each of these events necessitates new configurations that are
enforced while the network is operating. This results in transient inconsistencies, subtle errors
(such as loops), and colorful failures that are difficult to detect and resolve. One of the advantages
of SDN is that it paves the way for developing new tools that help detect and prevent such
consistency and validation problems.
Verification. Verification tools mostly focus on verifying a set of invariants against the network
configuration. There are static verification tools that verify the network configuration offline and
out of band. Such verification tools aim at checking “what-if” rules, and usually pursue formal
modeling of packets, protocols, and flows in order to check network invariants.
FlowChecker [31] and its predecessor ConfigChecker [11] use Binary Decision Diagrams
(BDDs) to model the network state, and use model checking techniques to check whether the
model matches a set of predefined specifications and rules. Anteater [82], similarly, models
network state as a boolean satisfiability problem (SAT), and uses a SAT solver to check invariants.
NICE [24] is a similar approach that uses model checking to validate OpenFlow applications.
It emulates a simple switch for the controller and tries to explore all valid state spaces (with
regard to the controller, control channels, and switches) to find bugs and inconsistencies.
One limitation of NICE is that it can only check one control application at a time, due to the
complexity of verifying parallel control logic.
Header Space Analysis (HSA) [69], in contrast, models packets as binary vectors and
formally defines the network configuration as a function that transforms these binary vectors.
Invariants are then verified based on such header transformations. These methods have
interesting theoretical properties and are able to detect some configuration problems, while
they fall short in modeling complex network transformations (even when it comes to modeling
simple address translations).
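As a rough illustration of the idea (not the actual HSA formalism or toolchain), one can model headers as bit strings and every hop as a transfer function from a header to a set of headers, and verify an invariant by composing those functions; the bit widths and transfer functions below are made up for the example:

```python
from typing import Callable, List, Set

# Headers are fixed-width bit strings; each hop is a transfer function
# mapping one header to the set of headers it can produce.
Header = str
Transfer = Callable[[Header], Set[Header]]

def rewrite_bit(pos: int, value: str) -> Transfer:
    # Models a rewriting action, e.g., a (very) simple address translation.
    return lambda h: {h[:pos] + value + h[pos + 1:]}

def drop_if(prefix: str) -> Transfer:
    # Models an ACL: headers matching the prefix are dropped.
    return lambda h: set() if h.startswith(prefix) else {h}

def compose(hops: List[Transfer]) -> Transfer:
    def path(h: Header) -> Set[Header]:
        current = {h}
        for hop in hops:
            current = set().union(*(hop(x) for x in current)) if current else set()
        return current
    return path

# Invariant along a two-hop path: headers starting with "11" must not
# reach the destination.
path = compose([rewrite_bit(3, "0"), drop_if("11")])
assert path("1101") == set()       # blocked, as required
print(path("0101"))                # {'0100'} reaches the destination
```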
Another category of verification tools is focused on real-time, in-band verification. The
challenge here is to develop methods for (i) gathering a real-time view of the network and (ii)
efficient verification. VeriFlow [70, 71], for instance, limits the search space to a subset of the
network and, with that, significantly improves verification performance. VeriFlow, however,
cannot detect transient inconsistencies that are caused by different synchronizations between
the control plane and the datapath.
Consistency Guarantees. Verification mechanisms, although important, cannot detect tran-
sient inconsistencies resulting from out-of-sync configuration updates. For instance, to install a
path, we need to push network configuration to every switch on the path. This can simply go out
of sync because of networking delays, and until the system reaches a stable state, many packets
are mistakenly dropped or wrongly forwarded. These sorts of problems are very frequent in
practice (due to the nature of distributed systems) and are significantly difficult to discover.
Reitblatt et al. [104] have systematically studied these problems, classified them, and
proposed update mechanisms to prevent such inconsistencies. They have identified two
classes of consistency guarantees: (i) per-packet consistency, which ensures each packet is
processed using the same version of the configuration throughout the network, and (ii) per-flow
consistency, which ensures packets of a flow are all forwarded using the same version of the network
configuration. Note that the latter is a stronger guarantee and thus more difficult to implement.
For per-packet consistency, one can add a tag match to each rule representing the
configuration version. Once all rules of a specific version are installed, packets can be labeled
with the apt configuration version at ingress ports. The authors have proposed three mechanisms
for per-flow consistency. The first mechanism mixes the per-packet mechanism with
flow timeouts. This ensures configurations are not applied until all flows using a specific
configuration version have ended. This approach falls short when a flow spans different ingress
ports. The second approach is to use wildcard cloning, which requires DevoFlow switches [34],
and the third approach requires notifications from end-hosts. This approach is further extended
to provide bandwidth guarantees for virtual machine migration in [47]. The main issue with all
these approaches is that, in general, they can double the size of forwarding tables during an
update.
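A minimal sketch of the per-packet mechanism, with stand-in switch and rule objects rather than real OpenFlow messages, could look as follows; the two-phase structure (install versioned rules everywhere, then flip the tag at the ingress) is the essential point:

```python
# Stand-in switch objects; real deployments would send OpenFlow messages.
class Switch:
    def __init__(self, name):
        self.name, self.rules, self.ingress_version = name, [], None

    def install(self, version, rule):
        self.rules.append((version, rule))       # rule matches on the version tag

    def set_ingress_version(self, version):
        self.ingress_version = version           # packets get stamped with this tag

def consistent_update(internal, ingress, version, new_rules):
    # Phase 1: push the new, version-tagged rules to all internal switches.
    for sw in internal:
        for rule in new_rules.get(sw.name, []):
            sw.install(version, rule)
    # Phase 2: only after phase 1 completes, flip the tag at the ingress
    # switches so new packets are handled exclusively by the new version.
    for sw in ingress:
        sw.set_ingress_version(version)
    # Old-version rules can be garbage-collected once in-flight packets drain.

s1, s2, s3 = Switch("s1"), Switch("s2"), Switch("s3")
consistent_update(internal=[s2, s3], ingress=[s1], version=7,
                  new_rules={"s2": ["fwd:1"], "s3": ["fwd:2"]})
print(s1.ingress_version, s2.rules, s3.rules)
```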
McGeer [87] has proposed an alternative approach for rule updates which incorporates the
controller. This approach pushes a set of intermediate rules to switches which forward affected
flows to the controller. Once the controller has received all packets from all affected flows, it installs
the new version of the rules, and then forwards all flows. This approach offers more flexibility while
increasing the load on the control plane.
Customizable Consistency Generator (CCG) [135] is another proposal that aims at mini-
mizing the relatively high performance and space overheads of the aforementioned consistency
proposals. The main idea behind CCG is to minimize the network state transition delay upon a
configuration update. To that end, for each update, it verifies whether the update is safe to apply
based on a set of user-defined invariants. If safe, the update is installed; otherwise, it is buffered. In
the presence of a deadlock (i.e., when none of the updates can be safely applied), CCG falls back
to stronger consistency methods (such as [104]) to apply the updates. The authors have shown
that CCG can lower the space overheads and the transition delays of consistent state updates.
Congestion-Free Updates. Even when rules are updated consistently (say, with no transient
loops or dead-end paths), the network might observe transient congestion and correspondingly high
loss episodes during an update, due to asynchronous updates among switches. That is, because
switches apply updates asynchronously, there is a possibility of significant transient traffic spikes
in the network. This can cause high packet loss and considerable temporary congestion, which
hinders the usability of interactive (i.e., latency-intolerant) systems.
zUpdate [79] and SWAN [62] are both proposals aiming for congestion-free network
updates. zUpdate calculates the steps required to transition from the current state to the desired
state in a way that minimizes congestion and traffic imbalance, and installs the configuration
updates using the same method proposed in [104]. SWAN proposes a similar technique with
the difference that it keeps scratch link capacity (e.g., 1/3 of the link capacity) to avoid infeasible
update scenarios. Both zUpdate and SWAN perform better than distributed network protocols
such as MPLS-TE, but are limited to specific network topologies.
Conflict-Free Configurations. Policies and configurations can come from different sources for
different purposes, and hence can be conflicting. For example, a load balancing application
naturally has a different objective than an energy optimizer. Such control applications will
compete for the resources of the network. When dealing with user-defined policies, one can design
an autonomous system that resolves conflicts (e.g., the conflict resolution mechanism used for
hierarchical policies in PANE [41, 42]).
Composing control applications in a conflict-free manner is more complicated than
resolving conflicts among user-defined policies, since control applications are not as well-defined
and limited as user-defined policies. There are two important proposals for this problem.
Corybantic [90] proposes a coarse-grained system in which control modules propose their
intended updates in each update round. Then, the system selects a non-conflicting set of updates
to apply and notifies the control modules of the selected updates. This way, the control modules
will adapt their next updates according to the decisions in this round. Corybantic is a plausible
solution, yet it is quite coarse-grained and appears to be difficult to fully automate.
Volpano et al. [122] propose a finer-grained solution based on deterministic finite-state
transducers (DFTs) [63]. DFTs are essentially finite state machines that can translate an input to
an output. Volpano et al. assume that control applications are implemented or specified using
DFTs. Given such DFTs, an autonomous system can compile updates in safe (i.e., conflict-free)
zones. Although the proposed system is fine-grained, the main problem with this proposal is
that DFTs are limited and cannot express all sorts of control applications.
Debuggers. Although written in software, a network application is not easy to debug using
existing software debugging tools. The reason is that network applications program a distributed
system (a cluster of switches), and to debug its behavior we need to access arbitrary switches
and inspect arbitrary packets in flight.
ndb [54] aims at providing basic debugging features (such as breakpoints and traces). To
provide breakpoints, ndb installs rules on switches that notify the information collector. To
provide packet backtraces, ndb stores postcards (i.e., packet headers along with their flow match
information, on all flows). When packets traverse the network, they generate postcards,
and once a packet hits a breakpoint, the debugger can build the backtrace. ndb is built in a way that
postcards and breakpoints do not hit the control plane. However, they consume a considerable
amount of bandwidth on the datapath.
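The postcard-and-backtrace idea can be sketched in a few lines; the collector below is a simplified stand-in and does not reflect ndb's actual implementation:

```python
from collections import defaultdict

# Every switch a packet traverses mails a "postcard" (packet id, switch,
# matched rule) to a collector; when a breakpoint fires, the collector
# reconstructs the packet's path from its postcards.
postcards = defaultdict(list)          # packet_id -> [(switch, rule), ...]

def on_postcard(packet_id, switch, rule):
    postcards[packet_id].append((switch, rule))

def on_breakpoint(packet_id):
    return list(postcards[packet_id])  # the backtrace collected so far

on_postcard("pkt-42", "s1", "table0:fwd-port2")
on_postcard("pkt-42", "s2", "table0:fwd-port1")
print(on_breakpoint("pkt-42"))
```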
SDN Troubleshooting System (STS) [108] takes an alternative path and tries to find software
bugs by replaying logs. Due to the non-deterministic nature of SDN (e.g., asynchronous events, race
conditions, and distributed actors), STS cannot rely on a simple replay of logs. Instead, STS
does a random walk among possible replay sequences, tries each sequence multiple times, and
checks invariants to find the minimum sequence of inputs that triggers the bug. Although STS
uses pruning techniques to narrow its search space, it is still unclear how it will scale to realistic
networks with tens of thousands of switches.
Remarks. Consistency, correctness, and troubleshooting are all very important aspects of
network programming. The proposals that we discussed in this chapter are all useful for
implementing a correct and consistent SDN. Having said that, there are still significant open
problems that are not addressed by existing proposals. Specifically, correctness and consistency
are still open problems in a distributed setting. For example, when we have multiple controllers
managing the network, providing flow-level consistency is quite challenging. Another
important open challenge is performance troubleshooting and debugging. That is, existing
proposals cannot answer queries such as “why is my control plane slow?” or “why doesn't it
scale?”
2.7 Monitoring and Measurements
Before SDN, networking applications relied on protocols such as SNMP [56] to capture network
statistics and metrics. SNMP metrics, however, are coarse-grained and difficult to customize. For
instance, it was only possible to estimate the traffic matrix using SNMP in traditional networks [89].
For that reason, one would need to use sampling mechanisms (such as sFlow [102]) or capture
statistics by modifying end-hosts [46]. To that end, the emergence of SDN protocols provides an
opportunity for improving monitoring and measurement tools. That is because SDN protocols
provide meaningful counters that are attached to individual flows, ports, queues, and switches.
This powerful construct enables control applications to gather an accurate traffic matrix.
OpenTM. OpenTM [117] was the first effort to capture the traffic matrix using SDN counters.
OpenTM is basically a control application running on NOX [51] that queries OpenFlow switches
for flow counters in the network, and creates the traffic matrix accordingly. To lower the
overhead of statistics collection, OpenTM tries to find the optimal set of switches to query. The
authors have conducted experiments with different selection schemes (i.e., uniform random,
non-uniform random, last switch, round-robin, and least-loaded switch) and have shown that
querying the least-loaded switch is the best switch selection scheme in an OpenFlow network.
Despite all the value of OpenTM, we believe that in a real-world setting edge switches are always the
least-loaded switches, and hence querying edge switches would be the simplest, most realistic,
and an optimal way of gathering the traffic matrix.
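An OpenTM-like polling loop might look like the following sketch; `flow_bytes` and `switch_load` are hypothetical stand-ins for an OpenFlow flow-stats request and a switch-load metric, and the least-loaded selection mirrors the scheme discussed above:

```python
import random
from collections import defaultdict

def flow_bytes(switch, flow):
    # Stand-in for an OpenFlow flow-stats request; returns cumulative bytes.
    return random.randint(10_000, 1_000_000)

def switch_load(switch):
    # Stand-in for a load metric (e.g., number of active flow-entries).
    return random.random()

def traffic_matrix(paths):
    """paths: {(src, dst): [switches on the path]} -> {(src, dst): bytes}"""
    tm = defaultdict(int)
    for (src, dst), path in paths.items():
        probe = min(path, key=switch_load)    # query the least-loaded switch
        tm[(src, dst)] = flow_bytes(probe, (src, dst))
    return dict(tm)

paths = {("h1", "h2"): ["s1", "s2", "s3"], ("h1", "h3"): ["s1", "s4"]}
print(traffic_matrix(paths))
```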
FlowSense. FlowSense [129] has the same goal as OpenTM, but proposes an alternative
approach. Instead of querying switches and polling the stat counters, FlowSense relies on Packet-
In and Flow-Removed OpenFlow messages to collect the traffic matrix. Packet-In events are
fired when a packet does not match any flow in the switch, and Flow-Removed is fired when a
flow is removed or expires. FlowSense uses Packet-In events as a trigger for starting a flow, and
uses the byte counters and duration in Flow-Removed events to estimate the traffic of the flow. The
authors have shown that FlowSense has a low overhead compared to polling while maintaining
the same level of accuracy. The major problem with FlowSense is that it is limited to reactive flow
installation and will not work for a network that is proactively programmed. As most real-world
networks are proactively managed, FlowSense would not be widely adopted in practice.
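The estimation step can be sketched as follows; the event handlers are simplified stand-ins for the Packet-In and Flow-Removed callbacks of an OpenFlow controller:

```python
# Track flow start times from Packet-In, and turn Flow-Removed byte counts
# and durations into an average rate over the flow's lifetime.
flows = {}          # flow_id -> start timestamp
estimates = []      # (flow_id, start, end, avg_rate_bytes_per_sec)

def on_packet_in(flow_id, timestamp):
    flows[flow_id] = timestamp

def on_flow_removed(flow_id, timestamp, byte_count, duration_sec):
    start = flows.pop(flow_id, timestamp - duration_sec)
    rate = byte_count / duration_sec if duration_sec > 0 else 0.0
    estimates.append((flow_id, start, timestamp, rate))

on_packet_in("h1->h2:80", timestamp=0.0)
on_flow_removed("h1->h2:80", timestamp=12.5, byte_count=5_000_000, duration_sec=12.5)
print(estimates)    # [('h1->h2:80', 0.0, 12.5, 400000.0)]
```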
OpenSketch. In contrast to OpenTM and FlowSense, which monitor the network in the control
plane, OpenSketch [130] proposes to implement measurement and monitoring facilities in the
data plane. OpenSketch has a three-stage data plane pipeline to select and count packets: (i)
hashing to summarize packet headers, (ii) classification to classify packets into different
classes, and (iii) counting, which uses probabilistic counters (i.e., Count-Min Sketches) to store
the count for each class. In the control plane, OpenSketch provides a measurement library
to assist the control logic in identifying flows, storing statistics, and querying counters. OpenSketch is
shown to be accurate while preserving a low memory footprint. The major issue with OpenSketch
is that it requires changes in the data plane, which results in bloated switches and fragmentation,
which is in contrast to SDN design principles.
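The counting stage relies on Count-Min sketches; a minimal, self-contained version of that data structure (not OpenSketch's hardware implementation) is sketched below:

```python
import hashlib

# A Count-Min sketch: a few rows of counters indexed by independent hashes.
# The estimate is the minimum over the rows, so it can over-count (due to
# hash collisions) but never under-count.
class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        digest = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))

sketch = CountMinSketch()
sketch.add("10.0.0.1->10.0.0.2", count=1500)   # e.g., bytes of a flow's packets
sketch.add("10.0.0.1->10.0.0.2", count=900)
print(sketch.estimate("10.0.0.1->10.0.0.2"))   # >= 2400 (exact here)
```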
DREAM. DREAM [92] is another in-band measurement system that tries to implement fine-
grained measurements using existing TCAM counters in hardware switches. The main challenge of
using TCAMs is their limited capacity, especially when one needs fine-grained measurements.
DREAM proposes an adaptive method, trying to find a trade-off between the granularity of
matches installed in TCAMs and the TCAM space. Although this method can be advantageous
in existing networks, it has limits in the level of visibility it provides for control applications.
NetAssay. NetAssay [38] takes a different approach to measurement and monitoring by
focusing on high-level concepts (e.g., devices and applications) instead of low-level constructs
(e.g., flows or addresses). To that end, NetAssay forms a dynamic mapping between higher-level
concepts and the flow space, and uses that mapping to translate low-level measurements
into high-level monitoring. To keep NetAssay generic, these mappings can be implemented by
the operators. It is important to note that NetAssay's approach is orthogonal to other SDN
measurement approaches and can complement other proposals.
2.8 Network Optimization
Network applications are traditionally built based on the end-to-end design argument [106], and
traditional networks act as a black-box medium for packet transport. This has resulted in suboptimal
behaviors in the transport and application layers [18]. This problem can only be solved when the
network provides an open integration/programming point for applications. SDN provides such
an open API and creates an opportunity to optimize network applications. In what follows, we
briefly explain some of the most important application optimization efforts based on SDN.
Hedera and Helios. Data center networks usually have a multi-rooted topology (e.g., a leaf-spine
topology) with high aggregate bandwidth. Traditional Interior Gateway Protocols (IGPs), such
as IS-IS [97] and OSPF [93], mostly yield a minimum spanning tree, and hence quite a few
underutilized links. Link aggregation protocols (e.g., MLAG or MC-MLAG), on the other hand,
are passive and cannot optimize for traffic patterns in a network. For instance, they can result in
congestion for elephant flows. To tackle this problem, Hedera [10] proposes a dynamic flow
scheduling mechanism using SDN. The Hedera scheduler finds elephant flows (i.e., flows that
need bandwidth but are network limited) and tries to find an optimal placement for elephant
flows on ECMP paths.
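A highly simplified sketch of this kind of scheduler is shown below; the 10% elephant threshold and the greedy least-loaded placement are illustrative simplifications of Hedera's actual demand estimation and placement algorithms:

```python
LINK_CAPACITY = 10e9                        # 10 Gbps, illustrative
ELEPHANT_THRESHOLD = 0.10 * LINK_CAPACITY   # flows above this are "elephants"

def schedule(flows, paths):
    """flows: {flow_id: demand_bps}; paths: {path_id: reserved_bps}."""
    placement = {}
    for flow_id, demand in sorted(flows.items(), key=lambda kv: -kv[1]):
        if demand < ELEPHANT_THRESHOLD:
            continue                        # mice stay on default ECMP hashing
        best = min(paths, key=paths.get)    # least-reserved equal-cost path
        paths[best] += demand
        placement[flow_id] = best
    return placement

flows = {"f1": 3e9, "f2": 2e9, "f3": 50e6}
paths = {"core1": 0.0, "core2": 0.0}
print(schedule(flows, paths))               # {'f1': 'core1', 'f2': 'core2'}
```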
Helios [40] focuses on hybrid datacenter fabrics that use all-optical switches alongside
an Ethernet network. In such networks, the optical fabric is high-throughput but does not
have enough (bandwidth and switching) capacity to forward all connections in the network.
Moreover, the optical fabric has a relatively high latency for switching and thus cannot efficiently
react to short-lived traffic. The idea behind Helios is to forward short-lived traffic on the Ethernet
fabric, and forward long-lived, elephant flows on optical paths. To that end, Helios queries
switches to detect elephant flows. Once an elephant flow is detected, it is forwarded over a
pre-setup optical path.
Both Hedera and Helios result in higher link utilization, lower congestion, and lower
latency.
Hadoop Optimization. Helios and Hedera are both application-agnostic and work without
knowledge of the applications and their networking demands. Although this results in
transparent network optimization, there are applications (or development frameworks) where
we know about their network demand and application topology. For instance, in a Hadoop
cluster, we can take advantage of our knowledge about job allocation and jobs’ size, to optimize
the network. Wang et al. [124] have proposed to use networking information of Hadoop
clusters for flow scheduling. They focus on hybrid (optical/Ethernet) fabrics, and reconfigure
the topology of the optical fabric based on Map-Reduce jobs and HDFS demand.
2.9 Discussion
Previous works have addressed SDN research challenges in innovative ways that were not
feasible in traditional networks. However, we believe that there is still no clear proposal on how
to resolve scalability issues in SDN without hindering its simplicity. Implementing, measuring,
and optimizing a distributed control application is quite difficult using existing proposals. On
the other hand, using simplifying programming models results in centralization and possibly a
lack of scalability. Our goal in this dissertation is to create an easy-to-use network programming
paradigm that is distributed, intuitive, scalable, and self-optimizing.
Chapter 3
Kandoo
Limiting the overhead of frequent events on the control plane is essential for realizing a scalable
SDN. One way of limiting this overhead is to process frequent events in the data plane. This
requires modifying switches and comes at the cost of visibility in the control plane. Taking an
alternative route, in this chapter, we present Kandoo [57], a framework for preserving scalability
without changing switches.
3.1 Introduction
Frequent and resource-exhaustive events, such as flow arrivals and network-wide statistics
collection events, stress the control plane and consequently limit the scalability of OpenFlow
networks [34, 116, 33]. Although one can suppress flow arrivals by proactively pushing the
network state, this approach falls short when it comes to other types of frequent events, such as
network-wide statistics collection. Existing solutions either view this as an intrinsic limitation
or try to address it by modifying switches. For instance, HyperFlow [116] can handle a few
thousand events per second, and anything beyond that is considered a scalability limitation. In
contrast, DIFANE [131] and DevoFlow [34] introduce new functionalities in switches to suppress
frequent events and to reduce the load on the control plane.
of datapaths, preferably without modifying switches. Adding new primitives to switches is
undesirable. It breaks the general principles of SDN, requires changes to the standards, and
necessitates the costly process of modifying switching hardware. Thus, the important question
is:
How can we move control functionalities toward datapaths, without introducing new datapath
mechanisms in switches?
To answer this question, we focus on: (i) environments where processing power is readily
available close to switches (such as datacenter networks) or can be easily added (such as
enterprise networks), and (ii) applications that are local in scope, i.e., applications that process
events from a single switch without using the network-wide state. We show that, under these
two conditions, one can offload local event processing to local resources, and therefore realize a
control plane that handles frequent events at scale.
We note that some applications, such as routing, require the network-wide state, and cannot
be offloaded to local processing resources. However, a large class of applications are either
local (e.g., a local policy enforcer and the Link Layer Discovery Protocol [7]) or can be decomposed
into modules that are local (e.g., the elephant flow detection module in an elephant flow rerouting
application).
Kandoo. In this chapter, we present the design and implementation of Kandoo [57], a
novel distributed control plane that offloads control applications onto available resources in
the network with minimal developer intervention and without violating any requirements of
control applications. Kandoo's control plane essentially distinguishes local control applications
(i.e., applications that process events locally) from non-local applications (i.e., applications that
require access to the network-wide state). Kandoo creates a two-level hierarchy for controllers:
(i) local controllers execute local applications as close as possible to switches, and (ii) a logically
centralized root controller runs non-local control applications. As illustrated in Figure 3.1,
several local controllers are deployed throughout the network; each of these controllers controls
one or a handful of switches. The root controller, on the other hand, controls all local controllers.
Figure 3.1: Kandoo's Two Levels of Controllers. Local controllers handle frequent events, while a logically centralized root controller handles rare events.
It is easy to realize local controllers since they are merely switch proxies for the root
controller, and they do not need the network-wide state. They can even be implemented directly
in OpenFlow switches. Interestingly, local controllers scale linearly with the number of
switches in a network. Thus, the control plane scales as long as we process frequent events in
local applications and shield the root controller from these frequent events. Needless to say,
Kandoo cannot help any control application that requires network-wide state (even though it
does not hurt such applications, either). In Chapter 4, we present Beehive, which provides a framework to
implement such applications.
Our implementation of Kandoo is fully compliant with the OpenFlow specifications.
Data and control planes are decoupled in Kandoo. Switches can operate without having a
local controller; control applications function regardless of their physical location. The main
advantage of Kandoo is that it gives network operators the freedom to configure the deployment
model of control plane functionalities based on the characteristics of control applications.
The design and implementation of Kandoo are presented in Section 3.2. Our experiments
confirm that Kandoo scales an order of magnitude better than a normal OpenFlow network and
would lead to more than 90% of events being processed locally under reasonable assumptions,
as described in Section 3.3. Applications of Kandoo are not limited to the evaluation scenarios
presented in this section. In Section 3.4, we briefly discuss other potential applications of
Kandoo and compare it to existing solutions. We conclude our discussion of Kandoo in
Section 3.5.
3.2 Design and Implementation
Design Objectives. Kandoo is designed with the following goals in mind:
1. Kandoo must be compatible with OpenFlow: we do not introduce any new data plane
functionality in switches, and, as long as they support OpenFlow, Kandoo supports them,
as well.
2. Kandoo is easy to use and automatically distributes control applications without any
manual intervention. In other words, Kandoo control applications are not aware of
how they are deployed in the network, and application developers can assume their
applications would be run on a centralized OpenFlow controller. The only extra
information Kandoo needs is a flag showing whether a control application is local or not.
In this section, we explain Kandoo's design using a toy example. We show how Kandoo can
be used to reroute elephant flows in a simple network of three switches (Figure 3.2). Our example
has two applications: (i) Appdetect, and (ii) Appreroute. Appdetect constantly queries each switch
to detect elephant flows. Once an elephant flow is detected, Appdetect notifies Appreroute, which
in turn may install or update flow-entries on network switches.
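The following sketch (not Kandoo's actual API) illustrates how little the developer has to change in this example: each application is written as if it ran on a centralized controller, and only the `local` flag tells the platform where it may be offloaded; the event fields, threshold, and context methods are hypothetical:

```python
# Base class: applications only declare whether they are local.
class App:
    local = False
    def handle(self, event, ctx): ...

class AppDetect(App):
    local = True                              # may run on a local controller
    THRESHOLD = 1 << 20                       # bytes; illustrative elephant cutoff

    def handle(self, event, ctx):
        if event["type"] == "flow_stats" and event["bytes"] > self.THRESHOLD:
            # Emit an application-defined event; the platform relays it upward.
            ctx.emit({"type": "elephant_detected", "flow": event["flow"]})

class AppReroute(App):
    local = False                             # needs the network-wide state

    def handle(self, event, ctx):
        if event["type"] == "elephant_detected":
            ctx.install_path(event["flow"])   # push flow-entries along a new path

class FakeCtx:
    def emit(self, ev): print("emit:", ev)
    def install_path(self, flow): print("reroute:", flow)

ctx = FakeCtx()
AppDetect().handle({"type": "flow_stats", "flow": "h1->h2", "bytes": 5 << 20}, ctx)
AppReroute().handle({"type": "elephant_detected", "flow": "h1->h2"}, ctx)
```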
It is extremely challenging, if not impossible, to implement this application in current
OpenFlow networks without modifying switches [34]. If switches are not modified, a (logically)
centralized control needs to frequently query all switches, which would place a considerable
load on control channels.
Kandoo Controller. As shown in Figure 3.3, Kandoo has a controller component at its core.
This component has the same role as a general OpenFlow controller, but it has Kandoo-specific functionalities for identifying local and non-local applications, hiding the complexity of the underlying distributed application model, and propagating events in the control plane.
Figure 3.2: Toy example for Kandoo's design: In this example, two hosts are connected using a simple line topology. Each switch is controlled by one local Kandoo controller. The root controller controls the local controllers. In this example, we have two control applications: Appdetect is a local control application, but Appreroute is non-local.
A network controlled by Kandoo has multiple local controllers and a logically centralized root
controller (we note that the root controller in Kandoo can itself be distributed). These controllers collectively form Kandoo's distributed control plane. Each switch
is controlled by only one Kandoo controller, and each Kandoo controller can control multiple
switches. If the root controller needs to install �ow-entries on switches of a local controller, it
delegates the requests to the respective local controller. Note that for high availability, the root
controller can register itself as the slave controller for a speci�c switch (this behavior is supported
in OpenFlow 1.2).
Deployment Model. The deployment model of Kandoo controllers depends on the charac-
teristics of a network. For software switches, local controllers can be directly deployed on the
same end-host. Similarly, if we can change the software of a physical switch, we can deploy
Kandoo directly on the switch. Otherwise, we deploy Kandoo local controllers on the processing
resources closest to the switches. In such a setting, one should provision the number of local
controllers based on the workload and available processing resources. Note that we can use a hybrid model in real settings.
Figure 3.3: Kandoo's high level architecture.
For instance, consider a virtualized deployment environment
depicted in Figure 3.4, where virtual machines are connected to the network using software
switches. In this environment, we can place local controllers in end-hosts next to software
switches and in separate nodes for other switches.
In our toy example (Figure 3.2), we have four Kandoo controllers: three local controllers
controlling the switches and a root controller. The local controllers can be physically positioned
using any deployment model explained above. Note that, in this example, we have the maximum
number of local controllers required.
Control Applications. In Kandoo, control applications are implemented using the abstraction
provided by the platform and are not aware of their actual placement (e.g., on local controllers
or on the root controller). They are generally OpenFlow applications and can therefore send
OpenFlow messages and listen on events. Moreover, they can emit application-defined events,
which can be consumed by other applications, and can reply to an event emitted by another
application. In Kandoo, control applications use Packet [5] to declare, serialize, and deserialize
events.
Figure 3.4: Kandoo in a virtualized environment. For software switches, we can leverage the same end-host for local controllers, and, for physical switches, we use separate processing resources.
Control applications are loaded in local name spaces and can communicate only using
events. This is to ensure that Kandoo does not introduce faults by offloading applications. In
our example, E_elephant is an application-defined event that carries matching information about
the detected elephant flow (e.g., its OpenFlow match structure) and is emitted by Appdetect.
A local controller can run an application only if the application is local. In our example,
Appreroute is not local, i.e., it may install flow-entries on any switch in the network. Thus, the root
controller is the only controller able to run Appreroute. In contrast, Appdetect is local; therefore,
all controllers can run it.
Event Propagation. The root controller can subscribe to specific events in the local controllers
using a simple messaging channel plus a filtering component. Once the local controller receives
and locally processes an event, it relays the event to the root controller for further processing.
Note that all communications between Kandoo controllers are event-based and asynchronous.
In our example, the root controller subscribes to events of type E_elephant in the local
controllers since it is running Appreroute listening on E_elephant. E_elephant is fired by an Appdetect
instance deployed on one of the local controllers and is relayed to the root controller. Note that if
the root controller does not subscribe to E_elephant, the local controllers will not relay E_elephant
events.
It is important to note that the data flow in Kandoo is not always bottom-up. A local
application can explicitly request data from an application deployed on the root controller by
emitting an event, and applications on the root controllers can send data by replying to that
event. For instance, we can have a topology service running on the root controller that sends
topology information to local applications by replying to events of a speci�c type.
Reactive vs. Proactive. Although Kandoo provides a scalable method for event handling, we
strongly recommend pushing network state proactively. We envision Kandoo to be used as a
scalable, adaptive control plane, where the default con�guration is pushed proactively and is
adaptively refined afterwards. In our toy example, default paths can be pushed proactively, while
elephant flows will be rerouted adaptively.
3.3 Evaluation
In this section, we present the results obtained for the elephant flow detection problem. This
evaluation demonstrates the feasibility of Kandoo to distribute the control plane at scale and
strongly supports our argument.
Setup. In our experiments, we realize a two-layer hierarchy of Kandoo controllers as shown
in Figure 3.5. In each experiment, we emulate an OpenFlow network using a slightly modified
version of Mininet [78] hosted on a physical server equipped with 64 GB of RAM and 4 Intel
Xeon(R) E7-4807 CPUs (each with 6 cores). Here, we do not enforce any rate, delay or buffering
for the emulated network. We use OpenVSwitch 1.4 [101] as our kernel-level software switch.
Elephant Flow Detection. We implemented Elephant Flow Detection applications (i.e.,
Appreroute and Appdetect) as described in Section 3.2. As depicted in Figure 3.5, Appdetect
is deployed on all Kandoo local controllers, whereas Appreroute is deployed only on the root controller.
Figure 3.5: Experiment Setup. We use a simple tree topology. Each switch is controlled by one local Kandoo controller. The root Kandoo controller manages the local controllers.
Our implementation of Appdetect queries only the top-of-rack (ToR) switches to
detect the elephant flows. To distinguish ToR switches from the core switch, we implemented
a simple link discovery technique. Appdetect fires one query per flow per second and reports a
flow as elephant if it has sent more than 1MB of data. Our implementation of Appreroute installs
new flow entries on all switches for the detected elephant flow.
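As a rough illustration of the detection rule only (not Kandoo's actual code), the per-flow check used by Appdetect can be sketched as follows in Go; the FlowStats type is an assumption made for this example.

package sketch

// A minimal sketch of Appdetect's detection rule: a flow queried once per
// second is reported as an elephant once it has sent more than 1MB of data.
// The FlowStats type is an assumption made for this illustration.

const elephantThreshold = 1 << 20 // 1MB, the threshold used in our experiments

type FlowStats struct {
	Match     string // an identifier for the flow (e.g., its OpenFlow match)
	ByteCount uint64 // bytes sent by the flow so far, from the switch's flow stats
}

// isElephant reports whether the flow should be reported to Appreroute.
func isElephant(fs FlowStats) bool {
	return fs.ByteCount > elephantThreshold
}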
Learning Switch. In addition to Appdetect and Appreroute, we use a simple learning switch
application on all controllers to set up paths. This application associates MAC addresses to ports
and installs respective fine-grained flow entries (i.e., destination MAC address, IP address, and
TCP port number) with an idle timeout of 100ms on the switch. We note that the bandwidth
consumed for path setup is negligible compared to the bandwidth consumption for elephant
flow detection. Thus, our evaluation results would still apply, even if we install all paths
proactively.
Methodology. In our experiments, we aim to study how this control plane scales with respect to
the number of elephant flows and the network size compared to a normal OpenFlow network
(where all three applications are on a single controller and Appdetect queries switches at the same
rate). We measure the number of requests processed by each controller and their bandwidth
consumption. We note that our goal is to decrease the load on the root controller in Kandoo.
Local controllers handle events locally, which consume far less bandwidth compared to events
sent to the root controller. Moreover, Kandoo’s design makes it easy to add local controllers
when needed, effectively making the root controller the only potential bottleneck in terms of
scalability.
Our first experiment studies how these applications scale with respect to the number of
elephant flows in the network. In this experiment, we use a tree topology of depth 2 and
fanout 6 (i.e., 1 core switch, 6 top-of-rack switches, and 36 end-hosts). Each end-host initiates
one hundred UDP flows randomly to other hosts in the network over the course of four seconds.
We use two types of UDP flows: (i) mouse flows that will last only 1 second, and (ii) elephant
flows that will last for 4 seconds. This results in 3600 UDP flows in total. In this experiment, we
measure the rate of messages sent over the control channels, averaged for 25 experiments.
To forward a UDP flow towards an end-host connected to the same ToR switch, we install
only one flow-entry. Otherwise, we will install three flow-entries (one on the source ToR switch,
one on the core, and one on the destination ToR switch). The probability of establishing a UDP
flow from a given host to another host on the same ToR switch is only 14%. This synthetic
workload stresses Kandoo since most flows are not local. As reported in [16], datacenter traffic
has locality, and Kandoo would therefore perform better in practice.
As depicted in Figure 3.6, control channel consumption and the load on the central
controller are considerably lower for Kandoo, even when all flows are elephant (in which case
Appdetect fires an event for each flow, reflecting the maximum number of events Kandoo will
fire for elephant flow detection): five times smaller in terms of messages (Figure 3.6a), and an
order of magnitude smaller in terms of bytes (Figure 3.6b). The main reason is that, unlike in the
normal OpenFlow setup, the central controller does not need to query the ToR switches. Moreover,
Appdetect fires one event for each elephant flow, which results in significantly fewer events.
To study Kandoo's scalability based on the number of nodes in the network, we fix the ratio of elephant flows at 20% and experiment with networks of different fanouts.
Figure 3.6: Control Plane Load for the Elephant Flow Detection Scenario. The load is based on the number of elephant flows in the network: (a) average number of messages received by the controllers; (b) average number of bytes received by the controllers.
As illustrated in
Figure 3.7, the role of local controllers is more pronounced when we have larger networks. These
controllers scale linearly with the size of the network and effectively shield the control channels
from frequent events. Consequently, Kandoo's root controller handles significantly fewer events
compared to a normal OpenFlow network.
Based on these two experiments, Kandoo scales significantly better than a normal OpenFlow
network. The performance of Kandoo can be improved even further by simple optimization in
Appdetect. For example, the load on the central controller can be halved if Appdetect queries only
the flow-entries initiated by the hosts directly connected to the respective switch.
Chapter 3. Kandoo 55
Figure 3.7: Control Plane Load for the Elephant Flow Detection Scenario. The load is based on the number of nodes in the network: (a) average number of packets received by the controllers; (b) average number of bytes received by the controllers.
Moreover,
the perceived functionality of our elephant flow rerouting application remains the same in both
cases. That is, Kandoo neither improves nor hurts the ability to detect or reroute elephant flows.
In essence, the main advantage of Kandoo is that it enables us to deploy the same application in
large-scale networks by shielding the control channels from the load of frequent events.
3.4 Discussion
Datapath Extensions. The problem that we tackle with Kandoo is a generalization of several
previous attempts at scaling SDN. A class of solutions, such as DIFANE [131] and DevoFlow [34],
address this problem by extending data plane mechanisms of switches with the objective of
reducing the load towards the controller. DIFANE tries to partly offload forwarding decisions
from the controller to special switches, called authority switches. Using this approach, network
operators can reduce the load on the controller and the latencies of rule installation. DevoFlow,
on the other hand, introduces new mechanisms in switches to dispatch far fewer “important”
events to the control plane. Kandoo has the same goal, but, in contrast to DIFANE and
DevoFlow, it does not extend switches; instead, it moves control plane functions closer to
switches. Kandoo’s approach is more general and works well in datacenters, but it might have a
lower throughput than specific extensions implemented in hardware.
Processing frequent events using hardware primitives in forwarding silicon can result in
considerably better efficiency and lower costs at scale compared to a centralized controller. This
method, however, comes with its own pitfalls [58]: (i) Designing special-purpose hardware
for a new function imposes a rigid structure; hence, that function cannot evolve frequently. If
we need to update the design (say, to fix an error or to improve performance), we need to wait
a considerably long time for the next development cycle. (ii) Designing, developing, and
manufacturing special-purpose hardware is quite costly (orders of magnitude more expensive
than software), which increases the risk for already risk-averse ASIC manufacturers.
Because of these pitfalls, silicon is not the best way to introduce new network functions
whose chance of change is not negligible. In essence, it would make sense to use silicon for
implementing a specific function only when (i) the function is mature enough (i.e., applied
and tested in the real world for quite some time), and (ii) we have hit a performance barrier on
commodity hardware. This explains OpenFlow's focus on forwarding as the main data path
functionality.
Interestingly, we can use Kandoo to prototype and test DIFANE, DevoFlow, or other
potential hardware extensions. For instance, an authority switch in DIFANE can be emulated by
a local Kandoo controller that manages a subset of switches in the network. As another example,
DevoFlow’s extensions can also be emulated using Kandoo controllers directly installed on
switches. These controllers not only replace the functionality of DIFANE or DevoFlow, but they
also provide a platform to run any local control application in their context.
Distributed Controllers. HyperFlow [116], ONIX [75], SiBF [81], and Devolved Controllers [115]
try to distribute the control plane while maintaining logically centralized, eventually consistent
network state. Although these approaches have their own merits, they impose limitations on
applications they can run. This is because they assume that all applications require the network-
wide state; hence, they cannot be of much help when it comes to local control applications. That
said, the distributed controllers can be used to realize a scalable root controller, the controller
that runs non-local applications in Kandoo.
Middleboxes. An alternative way to introduce a new network function is outside the scope of
SDN in software modeled as black-boxes sitting either at the core or at the edge of the network,
or software deployed on processing resources available inside switches such as co-processors and
FPGAs. The common trait among these proposals is that they all rely on SDN solely for steering
packets through black-boxes and packet processing units [58]. Middlebox architectures, such as
FlowStream [50], SideCar [113] and CoMb [110], provide scalable programmability in the data
plane by intercepting flows using processing nodes in which network applications are deployed.
Kandoo is orthogonal to these approaches in the sense that it operates in the control plane,
but it provides a similar distribution for control applications. In a network equipped with
FlowStream, SideCar or CoMb, Kandoo can share the processing resources with middleboxes
(given that control and data plane applications are isolated) in order to increase resource
utilization and decrease the number of nodes used by Kandoo.
Active Networks. Active Networks (AN) and SDN represent different schools of thought on
programmable networks. SDN provides programmable control planes, whereas ANs allow
programmability in networking elements at packet transport granularity by running code
encapsulated in the packet [107, 125] or installed on the switches [114, 36]. An extreme
deployment of Kandoo can deploy local controllers on all switches in the network. In such
a setting, we can emulate most functionality of ANs. That said, Kandoo differs from active
networks in two ways. First, Kandoo does not provide inbound packet processing; instead, it
follows the fundamentally different approach proposed by SDN. Second, Kandoo is not an all-
or-nothing solution (i.e., there is no need to have Kandoo support on switches). Using Kandoo,
network operators can still gain efficiency and scalability using commodity middleboxes, each
controlling a partition of switches.
3.5 Remarks
Kandoo is highly configurable, quite simple and scalable. It uses a simple yet effective approach
for creating a distributed control plane: it processes frequent events in highly parallel local
control applications and rare events in a central location. As confirmed by our experiments,
Kandoo scales remarkably better than a normal OpenFlow implementation, without modifying
switches or using sampling techniques.
Kandoo can co-exist with other controllers by using either FlowVisor [112] or customized
Kandoo adapters. Having said that, extra measures should be taken to ensure consistency. The
major issue is that Kandoo local controllers do not propagate an OpenFlow event unless the root
controller subscribes to that event. Thus, without subscribing to all OpenFlow events in all local
controllers, we cannot guarantee that existing OpenFlow applications work as expected.
In the next chapter, we present another distributed control platform, Beehive, that supports
a larger category of distributed control applications beyond local control applications. Using
Beehive, one can create a generic hierarchy of controllers as well as other topologies for control
applications.
Chapter 4
Beehive
As demonstrated in Chapter 3, Kandoo is quite effective when the control logic can be broken
into local and centralized applications. However, there are applications that are neither local
nor in need of the whole network-wide state. For example, a virtual networking application can
operate using the state of a single tenant and does not require a centralized view of the network.
Kandoo models such applications as centralized applications, which is unnecessarily limited.
Aiming to address this issue, in this chapter, we present Beehive, which is powerful enough
to support a wide range of applications from local to centralized, and a variety of distribution
models including generic hierarchies.
4.1 Introduction
Distributed control platforms are employed in practice for the obvious reasons of scale and
resilience [75, 61]. Existing distributed controllers are designed for scalability and availability,
yet expose the complexities and boilerplates of realizing a control application to network
programmers [86]. In such a setting, designing a distributed control application involves
significant effort beyond the design and implementation of the application's core: one needs
to address consistency, concurrency, coordination, and other common hurdles of distributed
programming that are pushed into control applications in existing distributed control platforms.
Beehive. Our motivation is to develop a distributed control plane that is scalable, efficient,
and yet straightforward to program. Our proposal is composed of a programming model (by
programming model, we mean a set of APIs used to develop control applications in a general
purpose programming language, not a new domain-specific language) and a control platform.
Using Beehive's programming model, control applications are developed
as a collection of message handlers that store their state in dictionaries (i.e., a hash-map,
an associative array), intentionally similar to centralized controllers to preserve simplicity.
Beehive’s distributed control platform is the runtime of our programming model and is hidden
from network programmers. Exploiting the proposed programming model, it seamlessly
transforms centralized applications into their distributed counterparts.
Using this programming model, Beehive is able to provide fault-tolerance, instrumentation,
and automatic optimization. To isolate local faults, our platform employs application-level
transactions with "all-or-nothing" semantics. Moreover, it employs mini-quorums to replicate
application dictionaries and to tolerate failure. This is all seamless without intervention by the
programmer. Beehive also provides efficient runtime instrumentation (including but not limited
to resource consumption, and the volume of messages exchanged) for control applications. The
instrumentation data is used to (i) automatically optimize the control plane, and (ii) to provide
feedback to network programmers. In this chapter, we showcase a greedy heuristic as a proof of
concept for optimizing the placement of control applications, and provide examples of Beehive’s
runtime analytics for control applications.
Why not an external datastore? Emerging control platforms, most notably ONOS [17], try
to resolve controller complexities by delegating their state management to an external system
(e.g., replicated or distributed databases). In Beehive, we propose an alternative design since
delegating such functionality has important drawbacks.
It has been shown that, for SDN, using an external datastore incurs a considerable
overhead [76]. After all, embedding the state inside the controllers (as pioneered by ONIX [75])
has significantly lower communication overheads than using an external system. Further,
creating a correct distributed control function requires coordination and synchronization
among control applications. This is essentially beyond state management as demonstrated
in [123, 23]. For example, even if two instances of a routing application store their state in a
distributed store, they need to coordinate in order to avoid blackholes and loops. Implementing
such coordination mechanisms eliminates many of the original motivations of utilizing an
external datastore.
More importantly, in such a setting, it is the external datastore that dictates the placement
and the topology of control functions. For instance, if the external datastore is built as a ring, the
control plane cannot form an effective hierarchy, and vice versa. This lack of fine-grained control
over the placement of state also limits the capabilities of the control plane to self-optimize,
which is one of the goals of Beehive and has been shown to be crucial for SDN [37]. For instance,
consider a controller with a virtual networking application that is managing a virtual network.
If the virtual network migrates, the control plane needs to migrate the state in accordance with
the change in the workload. This is difficult to achieve without fine-grained control. Similarly,
providing Beehive's runtime instrumentation, analytics, and automatic optimization is quite
difficult to realize when an external datastore is employed.
Overview. Our implementation of Beehive is open source and publicly available [2]. Using
Beehive, we have also implemented a full-featured SDN controller [3] (presented in Chapter 5)
as well as important SDN and non-SDN sample applications such as Kandoo [57], traffic
engineering, a key-value store (similar to memcached), a message queue, and graph processing.
With systematic optimizations, our implementation can fail over in less than 200ms, and our
SDN controller can handle more than 200k OpenFlow messages per machine while persisting
and replicating the state of control applications. Interestingly, existing controllers that use an
external datastore can handle up to a thousand messages with persistence and replication. We
also demonstrate that our runtime instrumentation has negligible overheads. Moreover, using
this runtime instrumentation data, we showcase a heuristic algorithm that can optimize the
placement of control applications.
App LearningSwitch:
  // A message handler that handles messages of type PacketIn.
  Rcv(pin PacketIn):
    switches = Dict("Switches")
    // Associate the source MAC address to the input port for the switch
    // that has sent the PacketIn message.
    mac2port = switches.Get(pin.Switch)
    mac2port[pin.SrcMac] = pin.InPort
    switches.Put(pin.Switch, mac2port)
    // Look up the output port for the destination MAC address.
    outp = mac2port.Get(pin.DstMac)
    if !outp then
      // Flood the packet if the destination MAC address is not learned yet.
      Reply(pin, PacketOut{Port: FLOOD_PORT, ...})
      return
    // Send the packet to the learned output port.
    Reply(pin, PacketOut{Port: outp, ...})
    // Install a flow on the switch to forward subsequent packets.
    Reply(pin, FlowMod{...})

App OpenFlowDriver:
  Rcv(pout PacketOut):
    // Send the packet out to the switch using the OpenFlow connection.
    ...
  Rcv(fmod FlowMod):
    // Send the flow mod to the switch using the OpenFlow connection.
    ...
  // The IO routine.
  while read data from the OpenFlow connection do
    ...
    // The emitted PacketIn will be sent to all applications that have a
    // handler for PacketIn, including the LearningSwitch application.
    Emit(PacketIn{...})
    ...

Figure 4.1: A simple learning switch application in Beehive.
4.2 Programming Model
In Beehive, programs are written in a general purpose language (Go in our implementation)
using a specific set of APIs called the Beehive programming model. Creating the distributed
version of an arbitrary control application that stores and shares its state in free and subtle forms
is extremely difficult in general. The Beehive programming model enables us to do that, while
imposing minimal limitations on the control applications.
The Anatomy of an Application. Using Beehive's programming model, a control application is
implemented as a set of single-threaded message handlers (denoted by Rcv) that are triggered by
asynchronous messages and can emit further messages to communicate with other applications.
For example, as shown in Figure 4.1, a sample learning switch application has a Rcv function for
PacketIn messages, and an OpenFlow driver has Rcv functions for PacketOut and FlowMod
messages. In our design, messages are either (i) emitted to all applications that have a handler
for that particular type of message, or (ii) generated as a reply to another message. A reply
message is processed only at its destination application. For example, the OpenFlow driver
emits a PacketIn message whenever it receives the event from the switch. Inside the PacketIn
message, Beehive automatically stores that this message is sent by the OpenFlow driver. The
PacketIn message is processed in any application that has a handler for PacketIn messages, say
the learning switch and a discovery application. �e learning switch application, upon handling
this PacketIn message, replies to the message with a PacketOut and/or a FlowMod message,
and these messages are directly delivered to the OpenFlow driver, not any other application.
To preserve state, message handlers use local application dictionaries (i.e., maps or
associative arrays). In Figure 4.1, the learning switch application uses its Switches dictionary
to store the learned “MAC address to port mapping” for each switch. As shown in Figure 4.2,
the Switches dictionary associates a switch to the “MAC address to port mapping” learned
from the packets received from that switch. As explained in Section 4.3, these dictionaries
are transactional, replicated, and persistent behind the scenes (i.e., hidden from control
applications, and without network programmer’s intervention). Note that message handlers
are arbitrary programs written in a general purpose programming language, with the limitation
that any data stored outside the dictionaries is ephemeral.
Let us demonstrate this programming model with another simple example.
Example: Elephant Flow Routing. As shown in Figure 4.4, an elephant flow routing application
can be modeled as four message handlers at its simplest: (i) Init, which handles SwitchJoined
messages emitted by the OpenFlow driver and initializes the Stats dictionary for switches.
(ii) Query, which handles periodic timeouts and queries switches. (iii) Collect, which handles StatReplies (i.e., sent by the OpenFlow switch in response to stat queries), populates the time-series of flow statistics in the Stats dictionary, and detects and reports traffic spikes. (iv) Route, which re-steers elephant flows using FlowMod messages upon detecting a traffic spike. Init, Query, and Collect share the Stats dictionary to maintain the flow stats, and Route stores an estimated traffic matrix in the TrafficMatrix dictionary. It also maintains its topology data, which we omitted for brevity.
Figure 4.2: The LearningSwitch application (Figure 4.1) learns the MAC address to port mapping on each switch and stores them in the Switches dictionary.
Inter-Dependencies. Message handlers can depend on one another in two ways: (i) they can
either exchange messages, or (ii) access an application-defined dictionary, only if they belong to
the same application. The interplay of these two mechanisms enables the control applications to
mix different levels of state synchronization and hence different levels of scalability. As explained
shortly, communication using the shared state results in synchronization in the application in
contrast to asynchronous messages.
In our sample elephant flow routing application, Init, Collect, and Query share the Stats
dictionary, and communicate with Route using TrafficSpike messages. Moreover, Init,
Collect, Query, and Route depend on the OpenFlow driver that emits SwitchJoined and
StatReply messages and handles StatQuerys and FlowMods.
Message handlers of two different applications can communicate using messages, but cannot
access each other's dictionaries. This is not a limitation, but a programming paradigm that
fosters scalability (as advocated in Haskell, Erlang, Go, and Scala). Dependencies based on messages result in a better decoupling of functions, and hence can give Beehive the freedom to optimize the placement of applications. That said, due to the nature of stateful control applications, functions inside an application can share state.
App ElephantFlowRouting:
  // Init: Handles SwitchJoined messages.
  Rcv(joined SwitchJoined):
    stats = Dict("Stats")
    switch = joined.Switch
    // Initializes the stats dictionary for the new switch.
    stats.Put(switch, FlowStat{Switch: switch})

  // Query: Handles timeout messages. The timer emits one timeout message per second.
  Rcv(to TimeOut):
    stats = Dict("Stats")
    // Iterates over the keys in the stats dictionary. Note that these entries are initialized in Init.
    for each switch in stats:
      // Emits a StatQuery for each switch in the "stats" dictionary.
      // Note that StatQuery is handled by the OpenFlow driver.
      Emit(StatQuery{Switch: switch})

  // Collect: Handles StatReply messages emitted by the OpenFlow driver in response to StatQuery messages.
  Rcv(reply StatReply):
    stats = Dict("Stats")
    switch = reply.Switch
    // Compares the previous statistics with the new one, and emits a TrafficSpike
    // message if it detects a significant change in the stats.
    oldSwitchStats = stats.Get(switch)
    newSwitchStats = reply.Stats
    if there is a spike in newSwitchStats vs oldSwitchStats then
      Emit(TrafficSpike{Stats: newSwitchStats})
    stats.Put(switch, newSwitchStats)

  // Route: Handles messages of type TrafficSpike.
  Rcv(spike TrafficSpike):
    trafficMatrix = Dict("TrafficMatrix")
    topology = Dict("Topology")
    // Update trafficMatrix based on spike and then find the elephant flows.
    ...
    elephantFlows = findElephantFlows(trafficMatrix)
    for each flow in elephantFlows:
      // Emit flow mods to reroute elephant flows based on the topology.
      Emit(FlowMod{...})

  // Details on handling switch and link discovery are omitted for brevity.

Figure 4.3: This simple elephant flow routing application periodically collects flow stats and reroutes traffic accordingly. The schematic view of this algorithm is also portrayed in Figure 4.4.
Figure 4.4: The schematic view of the simple elephant flow routing application presented in Figure 4.3.
It is important to note that Beehive
provides utilities to emulate synchronous and asynchronous cross-application calls on top of
asynchronous messages. These utilities are as easy to use as function calls across applications.
Consistent Concurrency. Transforming a “centralized looking” application to a concurrent,
distributed system is trivial for stateless applications: one can replicate all message handlers on
all cores on all machines. In contrast, this process can be challenging when we deal with stateful
applications. To realize a valid, concurrent version of a control application, we need to make
sure that all message handlers, when distributed on several controllers, have a consistent view
of their state and their behavior is identical to when they are run as a single-threaded, centralized
application.
To preserve state consistency, we need to ensure that each key in each dictionary is
exclusively accessed in only one thread of execution in the distributed control plane. In
our elephant flow routing example (Figure 4.4), Init, Query and Collect access the Stats
dictionary on a per-switch basis. As such, if we process all the messages of a given switch
in the same thread, this thread will have a consistent view of the statistics of that switch. In
contrast, Route accesses the whole Topology and TrafficMatrix dictionaries and, to remain
consistent, we have to make sure that all the keys in those dictionaries are exclusively owned by
a single thread. Such a message handler is essentially centralized.
4.3 Control Platform
Beehive’s control platform acts as the runtime environment for the proposed programming
model. In other words, the programming model is the public API that is used to develop control
applications, and the control platform is our implementation of that API. We have implemented
Beehive’s control platform in Go. Our implementation is open source and available [2] as a
generic, distributed message passing platform with functionalities such as queuing, parallelism,
replication, and synchronization, with no external dependencies. This control platform is built
based on the primitives of hives, cells, and bees for concurrent and consistent state access in a
distributed fashion with runtime instrumentation and automatic optimization. We note that
these primitives (i.e., hives, cells and bees) are internal to the control platform and are not
exposed. That is, network programmers are not expected to know or use these concepts in their
control applications.
4.3.1 Primitives
Hives and Cells. In Beehive, a controller is denoted as a hive that maintains applications’
state in the form of cells. Each cell is a key-value pair in a specific application dictionary:
(dict, key, val). For instance, our elephant flow routing application (Figure 4.4) has a cell
for the flow statistics of switch S in the form of (Stats, S, Stats_S).
Figure 4.5: In Beehive, each hive (i.e., controller) maintains the application state in the form of cells (i.e., key-value pairs). Cells that must be colocated are exclusively owned by one bee. Messages mapped to the same cell(s) are processed by the same bee.
Bees. For each set of cells that must be co-located to preserve consistency (as discussed in
Section 4.2), we create an exclusive, logical and light-weight thread of execution called a bee.
As shown in Figure 4.5, upon receiving a message, the hive finds the particular cells required to
process that message in a given application and, consequently, relays the message to the bee that
exclusively owns those cells. A bee, in response to a message, invokes the respective message
handlers using its cells as the application's state. If there is no such bee, the hive creates one
to handle the message.
In our elephant flow routing example, once the hive receives the first SwitchJoined
message for switch S, it creates a bee that exclusively owns the cell (Stats, S, Stats_S) since
the Rcv function (i.e., Init) requires that cell to process the SwitchJoined message. Then,
the respective bee invokes the Rcv function for that message. Similarly, the same bee handles
consequent StatReplys for S since it exclusively owns (Stats, S, Stats_S). This ensures that
Collect and Init share the same consistent view of the Stats dictionary and their behavior is
identical to when they are deployed on a centralized controller, even though the cells of different
switches might be physically distributed over different hives.
Finding the Bee. The precursor to finding the bee to relay a message is to infer the cells required
to process that message. In Beehive’s control platform, each message handler is accompanied
with a Map function that is either (i) automatically generated by the platform or (ii) explicitly
implemented by the programmer.
Map(A, M) is a function of application A that maps a message of type M to a set of cells (i.e., keys
in application dictionaries {(D_1, K_1), ..., (D_n, K_n)}). We call this set the mapped cells of M in A.
We note that, although we have not used a stateful Map function, a Map function can be stateful
with the limitation that its state is local to each hive.
We envision two special cases for the mapped cells: (i) if nil, the message is dropped for
application A. (ii) if empty, the message is broadcast to all bees on the local hive. The latter
is particularly useful for simple iterations over all keys in a dictionary (such as Query in our
elephant flow routing example).
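As a concrete illustration, an explicitly written Map for the Collect handler of Figure 4.3 could look roughly like the following Go sketch; the CellKey and StatReply types are stand-ins defined here for this example and mirror the pseudocode, not the verbatim API of our library.

package sketch

// A minimal sketch of an explicit Map function for the Collect handler of
// Figure 4.3. CellKey and StatReply are stand-in types defined for this
// illustration.

// CellKey identifies a key in a named application dictionary.
type CellKey struct {
	Dict string
	Key  string
}

// StatReply is the message carrying flow statistics for one switch.
type StatReply struct {
	Switch string
	// Stats omitted for brevity.
}

// MapCollect returns the mapped cells of a StatReply: the per-switch entry in
// the "Stats" dictionary. All messages mapped to intersecting cells are
// guaranteed to be handled by the same bee.
func MapCollect(reply StatReply) []CellKey {
	return []CellKey{{Dict: "Stats", Key: reply.Switch}}
}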
Automatically Generated Map Functions. Our control platform is capable of generating the Map
functions at both compile-time and runtime. Our compiler automatically generates the missing
Map functions based on the keys used by message handlers to process a message. The compiler
parses the message handlers, prunes the code paths that are not reachable by a dictionary access
instruction, and generates the code to retrieve respective keys to build the intended mapped
cells.
Map(joined SwitchJoined):
  switch = joined.Switch
  return {("Stats", switch)}

Map(to TimeOut):
  return {}

Map(reply StatReply):
  switch = reply.Switch
  return {("Stats", switch)}

Figure 4.6: Automatically generated Map functions based on the code in Figure 4.3.
For example, as shown in Figure 4.6, the compiler generates the Map functions based on the
code in Init, Query, and Collect, by parsing the Rcv functions and inferring the dictionary
accesses. Processing Collect, for instance, our compiler first detects that the stats local
variable is a reference to a dictionary named Stats. Then, it finds the Put and Get invocations
on Stats. Parsing the parameters of those functions, it infers that the local variable switch
is passed as the key to both Get and Put invocations. Thus, the Map function for Collect
should return ("Stats", switch). Since switch is an expression, the compiler also needs to
include all the statements that affect switch (e.g., switch = reply.Switch), while ignoring all
other statements that have no effect on this expression. In other words, the compiler keeps the
statements required to create the keys used to access dictionaries in the Rcv function.
Our compiler treats foreach loops in a different way. In cases where the statements inside the
foreach loop access only the iterator key, it generates a local broadcast (i.e., an empty mapped
cells) for the Map function. For instance, in Query, we only access switch, which is the iterator
key. As such, Query accesses the dictionary on a per-key basis. Thus, if we send the TimeOut
message to all bees on the current hive, each bee will call Query that iterates on all the cells of
that bee. This is the expected behavior for Query.
In complicated cases where the compiler does not have enough information to infer the keys
(e.g., cross-library or recursive calls), it generates a Map function that results in a centralized
application. This ensures the generated Map function is correct, but not necessarily efficient.
To mitigate that, Beehive also provides a generic runtime Map function that invokes the Rcv
function for the message and records all state accesses in that invocation without actually
applying the modifications. This method is generic and widely applicable, but it has a higher
cost compared to when the Map function is generated by the compiler or when it is explicitly
provided by the application.
Life of a Message. In Beehive, a message is basically an envelope wrapping a piece of data that is
emitted by a bee and optionally has a specific destination bee. As shown in Figure 4.7, a message
has the following life-cycle on our platform:
Figure 4.7: Life of a message in Beehive.
1. A bee emits the message upon receiving an event (such as an IO event, or a timeout) or replies
to a message that it is handling in a Rcv function. In the latter case, the message will have a
particular destination bee.
2. If the emitted message has a destination bee, it is directly relayed to that bee. Otherwise, on
the same hive, we pass the message to the Map functions of all applications that have a handler
for that type of message. This is done asynchronously via queen bees. For each application on
each hive, we allocate a single queen bee that is responsible for mapping messages using the Map
function of its application. Queen bees are in essence Beehive’s message routers.
3. For each application, the Map function maps the message to application cells. Note that hives
are homogeneous in the sense that they all host the same set of applications, even though
they process different messages and their bees and cells are not identical at runtime.
4. After finding the mapped cells of a message, the queen bee tries to find the bee that exclusively
owns those cells. If there is such a bee (either on the same hive or on a remote hive), the
message is accordingly relayed. Otherwise, the queen bee creates a new bee (by default, the
new bee is created on the local hive and moved to an optimal hive later; the application can
provide a custom placement strategy in cases where the default local placement is not a good
fit for that application's design), assigns the cells to it, and relays the message. To assign cells
to a bee, we use the Raft consensus algorithm [96] among hives. Paying one Raft consensus
overhead to create a bee can be quite costly. To avoid that, queen bees batch lock proposals
(used to assign cells to bees) for messages that are already enqueued with no added latency or
wait. This results in paying one Raft overhead for each batch.
Figure 4.8: Throughput of a queen bee for creating new bees, using different batch sizes. Here, we have used clusters of 1, 3, 5, 7, 9 and 11 hives. For larger batch sizes, the throughput of a queen bee is almost the same for all clusters.
When the queen bee sends a batch of lock proposals, each proposal can either
succeed or fail. If a lock proposal fails, a bee on another hive has already locked the cells. In
such cases, the queen bee relays the message to the remote bee. As shown in Figure 4.8, the
average latency of creating a new bee can be as low as 140µs even when we use a cluster of 11
hives. Clearly, the larger the batch size, the better the throughput.
5. For each application, the respective bee processes the message by invoking the Rcv function.
6. If the Rcv function replies to the message, the reply message is directly sent to the
bee that has emitted the received message (i.e., emitted in Step 1), and if it emits a message,
the message will go through Step 1.
It is important to note that, in our control platform, we use an unreliable message delivery
mechanism by default. For applications that require reliable message delivery, Beehive provides
lightweight reliable delivery wrappers. These wrappers have their own overheads as they store
their messages in a dictionary, exchange sequenced messages with other bees, acknowledge
received messages, drop duplicate messages, and retransmit messages upon timeouts.
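The receive side of such a wrapper can be sketched roughly as follows; all names here are assumptions made for illustration and not the wrappers' actual interface.

package sketch

// A minimal sketch of the receive side of a reliable-delivery wrapper as
// described above: sequenced messages, acknowledgements, and duplicate
// suppression.

// SequencedMsg wraps an application message with a per-sender sequence number.
type SequencedMsg struct {
	Sender string
	Seq    uint64
	Body   interface{}
}

// Ack acknowledges that every message from Sender up to Seq has been received.
type Ack struct {
	Sender string
	Seq    uint64
}

// receiver tracks the highest in-order sequence number seen per sender; in
// Beehive this state would live in an application dictionary.
type receiver struct {
	lastSeq map[string]uint64
}

func newReceiver() *receiver { return &receiver{lastSeq: map[string]uint64{}} }

// onMessage returns the acknowledgement to send back and whether the message
// is new (and should be delivered) or a duplicate to drop.
func (r *receiver) onMessage(m SequencedMsg) (Ack, bool) {
	last := r.lastSeq[m.Sender]
	if m.Seq <= last {
		// Duplicate or stale retransmission: re-acknowledge, do not deliver.
		return Ack{Sender: m.Sender, Seq: last}, false
	}
	r.lastSeq[m.Sender] = m.Seq
	return Ack{Sender: m.Sender, Seq: m.Seq}, true
}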
Preserving Consistency Using Map. To preserve consistency, Beehive guarantees that all
messages with intersecting mapped cells for application A are processed by the same bee
using A’s message handlers. For instance, consider two messages that are mapped to
{(Switch, 1), (MAC, A1:...)} and {(Switch, 1), (Port, 2)} respectively by an application. Since
these two messages share the cell (Switch, 1), the platform guarantees that both messages are
handled by the same bee that owns the 3 cells of {(Switch, 1), (MAC, A1:...), (Port, 2)}. This
way, the platform prevents two different bees from modifying or reading the same part of the
state. As a result, each bee is identical to a centralized, single-threaded application for its own
cells.
Significance of Map. The exact nature of the mapping process (i.e., how the state is accessed) can
impact Beehive's performance. Even though Beehive cannot automatically redesign message
handlers (say, by breaking a specific function in two) or change Map patterns for a given
implementation, it can help developers in that area by providing feedback that includes various
performance metrics, as we will explain in Section 4.3.4.
Conflicting Mapped Cells. It is rare (we have not observed this situation in our real-world
use-cases) yet possible that the platform observes conflicting mapped cells. For example,
consider an application that maps two messages to {(Mac, FF...)} and {(Port, 12)}. For these
two mapped cells, Beehive will assign two distinct bees since they are not intersecting. Now,
if the application maps another message to {(Switch, 1), (Mac, FF...), (Port, 12)}, we cannot
assign this mapped cell to any of the existing bees, nor can we start a new one. In such
conflicting cases, before processing the conflicting message, the platform stops the bees with
conflicting mapped cells, consolidates their dictionaries, and allocates a new bee that owns all
the three cells in {(Switch, 1), (Mac, FF...), (Port, 12)}. If the bees are on different hives, the
platform uses Beehive's bee migration mechanism that is
explained shortly.
Auxiliaries. In addition to the functions presented here, Beehive provides auxiliary utilities for
proactively locking a cell in a handler, for postponing a message, for composing applications, and
for asynchronous and synchronous cross-application calls (i.e., continuation): (i) Applications
can send message M directly to the bee responsible for cell (D, K) of Application A using
SendToCell(M, A, (D, K)). (ii) Using Lock(A, (D, K)), a bee running application A can proac-
tively lock the cell (D, K). This is very useful when a bee needs to own a cell prior to receiving
any message. (iii) Calling Snooze(M, T) immediately exits the message handler and schedules
message M to be redelivered after timeout T. (iv) Applications can use the DeferReply utility
to postpone replying to a message. This utility returns a serializable object that applications can
store in their dictionaries, and later use to reply to the original message. This method is Beehive's
way of providing continuation.
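To make the calling pattern concrete, the sketch below shows how a handler might use some of these utilities; only the utility names come from the text above, while the stub signatures and surrounding types are assumptions made for this illustration.

package sketch

import "time"

// Stand-in types and stubs; the signatures are assumptions for illustration.
type Msg interface{}
type Cell struct{ Dict, Key string }

func SendToCell(m Msg, app string, c Cell) {}
func Lock(app string, c Cell)              {}
func Snooze(m Msg, t time.Duration)        {}

// handle sketches a handler that proactively locks a cell before any message
// for it arrives, forwards a message directly to the bee owning another cell,
// and postpones a message that cannot be processed yet.
func handle(m Msg, ready bool) {
	Lock("ElephantFlowRouting", Cell{Dict: "Stats", Key: "S1"})
	SendToCell(m, "ElephantFlowRouting", Cell{Dict: "TrafficMatrix", Key: "global"})
	if !ready {
		Snooze(m, time.Second) // redeliver m after one second
	}
}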
4.3.2 Fault-Tolerance
For some simple applications, having a consistent replication of the state is not a requirement nor
is transactional behavior. For example, a hub or learning switch can simply restart functioning
with an empty state. For most stateful applications, however, we need a proper replication
mechanism for cells.
Transactions. In Beehive, Rcv functions are transactional. Beehive transactions include the
messages emitted in a function in addition to the operations on the dictionaries. Upon calling
the Rcv function for a message, we automatically start a transaction. This transaction buffers
all the emitted messages along with all the updates on application dictionaries. If there is
an error in the message handler, the transaction will be automatically rolled back. Otherwise,
the transaction will be committed and all the messages will be emitted by the platform. We
will shortly explain how Beehive batches transactions to achieve high throughput, and how it
replicates these transactions for fault-tolerance.
It is important to note that Beehive, by default, hides all the aspects of transaction
management. With that, control applications are implemented as if there is no transaction.
Having said that, network programmers have full control over transactions, and can begin,
commit, or abort a new transaction if they choose to.
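The all-or-nothing behavior of a Rcv invocation can be sketched roughly as follows; the types and helper names are illustrative stand-ins written for this example, not Beehive's internals.

package sketch

// A minimal sketch of the transactional Rcv semantics described above: the
// platform buffers dictionary updates and emitted messages while the handler
// runs, drops them if the handler fails, and applies and emits them on success.

type update struct{ dict, key, val string }

type tx struct {
	updates []update      // buffered dictionary writes
	emitted []interface{} // buffered outgoing messages
}

func (t *tx) Put(dict, key, val string) { t.updates = append(t.updates, update{dict, key, val}) }
func (t *tx) Emit(m interface{})        { t.emitted = append(t.emitted, m) }

// invoke runs a handler inside an implicit transaction with all-or-nothing
// semantics: on error nothing is applied or emitted.
func invoke(handler func(*tx) error, apply func(update), send func(interface{})) error {
	t := &tx{}
	if err := handler(t); err != nil {
		return err // roll back: buffered updates and messages are simply dropped
	}
	for _, u := range t.updates {
		apply(u)
	}
	for _, m := range t.emitted {
		send(m)
	}
	return nil
}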
Colonies. As shown in Figure 4.9, to replicate the transactions of a bee, the platform
automatically forms a colony. Each colony has one leader bee and a few follower bees in
accordance with the application's replication factor (applications can choose to skip replication
and persistence). In a colony, only the leader bee receives and processes messages, and followers
act as purely passive replicas.
Figure 4.9: Hives form a quorum on the assignment of cells to bees. Bees, on the other hand, form quorums (i.e., colonies) to replicate their respective cells in a consistent manner. The size of a bee colony depends on the replication factor of its application. Here, all applications have a replication factor of 3.
Replicating Transactions. Bees inside the colony form a Raft [96] quorum to replicate the
transactions. Note that the platform enforces that the leader of the Raft quorum is always the
leader bee. Before committing a transaction, the leader first replicates the transaction on the
majority of its followers. Consequently, the colony has a consistent state and can tolerate faults
as long as the majority of its bees (⌊replication_factor/2⌋ + 1) are alive.
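For reference, this tolerance bound works out as in the small sketch below.

package sketch

// The fault-tolerance bound stated above: a colony with a given replication
// factor stays consistent as long as a majority of its bees, that is
// floor(replication_factor/2) + 1, are alive.
func majority(replicationFactor int) int {
	return replicationFactor/2 + 1
}

// For example, majority(3) == 2, so a colony with replication factor 3
// tolerates one failed bee; majority(5) == 3 tolerates two.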
Batched Transactions. With a naive implementation, replicating each transaction incurs the
overhead of a quorum for each message. This hinders performance and can result in subpar
throughput for real world applications. To overcome this issue, we batch transactions in
Beehive and replicate each batch once. With that, the overheads of replicating a transaction
are amortized over all the messages processed in that batch.
To evaluate the impact of batching on Beehive’s throughput, we have implemented a simple
key-value store conceptually similar to memcached [43] in less than 150 lines of code. To
measure its throughput, we deployed our sample key-value store application on a 5-node cluster
on Google Compute Engine. Each VM has 4 cores, 16GB of RAM, and 10GB of SSD storage.
We configured the application to use 1024 buckets for the keys with replication factors of 1, 3
and 5, and measured the throughput for batch sizes from 128 to 8192 messages.
Figure 4.10: The latency and throughput of the sample key-value store application: (a) throughput using replication factors of 1 to 5, and batch sizes of 128 to 8k; (b) the PDF of the end-to-end latency for GET and PUT requests in a cluster of 20 hives and 80 clients.
In each run, we
emit 10M PUT (i.e., write) messages and measure the throughput. As shown in Figure 4.10a,
using larger batch sizes significantly improves the throughput of Beehive applications. This
improvement is more pronounced (3 to 4 orders of magnitude) when we use replication factors
of 3 and 5, clearly because the overheads are amortized. We note that here, for measurement
purposes, all the master bees are on the same hive. In real settings, one observes a higher
aggregate throughput from all active hives.
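The bucketing scheme of the sample key-value store can be sketched roughly as follows; the request types and helpers are written for this illustration and are not the application's actual code.

package sketch

import "hash/fnv"

// A minimal sketch of the sample key-value store's bucketing: requests are
// mapped to one of 1024 bucket cells in a "Buckets" dictionary, so all keys in
// the same bucket are owned (and replicated) by the same bee.

const numBuckets = 1024

type Put struct{ Key, Value string }
type Get struct{ Key string }

// bucket maps a key to its bucket index.
func bucket(key string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % numBuckets
}

// mapPut and mapGet return the mapped cell of a request, so requests for keys
// in the same bucket are processed by the same bee.
func mapPut(m Put) (dict string, cell uint32) { return "Buckets", bucket(m.Key) }
func mapGet(m Get) (dict string, cell uint32) { return "Buckets", bucket(m.Key) }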
Batching vs Latency. In our throughput micro-benchmarks, we send requests to our application
in batches. Although that is the case in practice for most applications, one might ask how the
platform performs when each request is sent only after the response to the previous one is received. To answer
this question, we used another 20-node cluster to query the key-value store sample application.
Each node has 4 clients that send PUT or GET requests over HTTP and then measure the
response time (including the HTTP overheads).
As shown in Figure 4.10b, despite the lack of an effective batching for requests, Beehive
handles most GETs in less than 1ms if the key is on the local hive, and less than 7ms if the key is
on a remote hive. The reason for such a low read latency is that there is no need to replicate read-
only transactions. Moreover, it handles most PUTs in less than 20ms. This indicates that, even in
such a worst-case scenario, Beehive handles requests with a reasonable latency. As demonstrated
above, in the presence of a batch of messages, Beehive achieves a significantly better latency on
average.
Elasticity. Beehive's colonies are elastic. When a follower fails or when a hive leaves the cluster, the leader of the colony creates new followers on live hives if possible. Moreover, if a new hive joins the cluster and the colony is not fully formed, the leader automatically creates new followers. Note that forming a consistent colony is only possible when the majority of hives are alive, since any update to a colony requires a consensus among hives.
Failover. When the leader bee fails, one of the follower bees will become the leader of the Raft quorum. Once the new leader is elected, the quorum updates its transaction logs and any uncommitted (but sufficiently replicated) transaction will be committed (i.e., the state updates are applied and buffered messages are emitted). Note that the failover timeout (i.e., the timeout to start a new election) is configurable by the user.

To demonstrate the failover properties of Beehive, we ran the key-value store sample application with a replication factor of 3 on a 3-node cluster. In this experiment, we allocate the master bee on Hive 1, and send the benchmark queries to Hive 2. After ~5 seconds, we kill Hive 1, and measure the time the system takes to fail over. As shown in Figure 4.11, it takes 198ms, 451ms, and 753ms for the sample key-value store application to fail over for Raft election timeouts of 100ms, 300ms, and 500ms, respectively. Clearly, the smaller the election timeout, the shorter the failover. Having said that, users should choose a Raft election timeout that does not result in spurious timeouts, based on the characteristics of their network.
Figure 4.11: Beehive failover characteristics using different Raft election timeouts. In this experiment, we use the key-value store sample application with a replication factor of 3 on a 3-node cluster. Smaller election timeouts result in faster failover, but they must be chosen to avoid spurious timeouts based on the network characteristics.

Beehive's failover time would not be adversely affected when we increase the number of hives or the colony size, since Raft elections will start according to the election timeout regardless of the quorum size. Moreover, Raft adds randomness to the election timeout to avoid having multiple candidates at the same time. This reduces the probability of unsuccessful elections.
Network Partitions. In the event of a network partition, by default, the platform will be fully operational in the partition with the majority of nodes, to prevent a split brain. Colonies, however, will remain operational as long as the majority of their bees are in the same partition. Moreover, if a leader bee falls into a minority partition, it can serve messages with read-only access to its state. For example, when a LAN is disconnected from other parts of the network, the bees managing switches in that LAN will be able to keep the existing switches operational until the LAN reconnects to the network. To achieve partition tolerance, application dependencies and their placement should be properly designed. Heedless dependencies can result in subpar partition tolerance, similar to any multi-application platform. To mitigate partitions, one can deploy one cluster of Beehive in each availability zone and connect them through the easy-to-use proxy functionalities that Beehive provides.
4.3.3 Live Migration of Bees
We provide the functionality to migrate a bee from one hive to another along with its cells. This is instrumental in optimizing the placement of bees.
Migration without Replication. For bees that are not replicated (i.e., that have a replication factor of one), Beehive first stops the bee, buffers all incoming messages, and then replicates its cells to the destination hive. Then a new bee is created on the destination hive to own the migrated cells. At the end, the cells are assigned to the new bee and all buffered messages are drained accordingly.
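The sequence just described (stop, buffer, copy the cells, create a new owner bee, drain the buffer) can be sketched in Go as follows; the types and the channel-based buffering are illustrative simplifications, not Beehive's implementation.

package main

import "fmt"

// Minimal, illustrative types; not Beehive's actual API.
type Msg string
type Cells map[string]string

type bee struct {
	cells Cells
	inbox chan Msg
}

type hive struct{ name string }

// migrate sketches the unreplicated migration sequence: stop the bee, buffer
// messages that have already arrived, copy its cells to the destination hive,
// create a new bee that owns those cells, and finally drain the buffer.
func migrate(old *bee, dst *hive) *bee {
	// 1. Stop the old bee and collect the messages buffered in its inbox.
	close(old.inbox) // a real system would redirect new messages instead
	var buffered []Msg
	for m := range old.inbox {
		buffered = append(buffered, m)
	}

	// 2. Replicate the cells on the destination hive.
	copied := make(Cells, len(old.cells))
	for k, v := range old.cells {
		copied[k] = v
	}
	fmt.Printf("copied %d cell(s) to %s\n", len(copied), dst.name)

	// 3. Create a new bee on the destination hive that owns the cells.
	nb := &bee{cells: copied, inbox: make(chan Msg, len(buffered)+16)}

	// 4. Drain the buffered messages into the new bee.
	for _, m := range buffered {
		nb.inbox <- m
	}
	return nb
}

func main() {
	old := &bee{cells: Cells{"switch-1": "state"}, inbox: make(chan Msg, 4)}
	old.inbox <- "PacketIn"
	nb := migrate(old, &hive{name: "hive-2"})
	fmt.Println("new bee has", len(nb.inbox), "buffered message(s)")
}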
Migration with Replication. We use a slightly different method for bees that are replicated. For such leader bees, the platform first checks whether the colony has a follower on the destination hive. If there is such a follower, the platform simply stops the current leader from processing further messages while asking the follower to sync all the replicated transactions. Afterwards, the leader stops, and the follower explicitly campaigns to become the new leader of its colony. After one Raft election timeout, the previous leader restarts. If the colony has elected a new leader, the previous leader becomes a follower. Otherwise, this is an indication of a migration failure, and the colony continues as it was. In cases where the colony has no follower on the destination hive, the platform creates a new follower on the destination hive and performs the same process.
4.3.4 Runtime Instrumentation
Control functions that access a minimal state would naturally result in a well-balanced load on all controllers in the control platform, since such applications handle events locally in small silos. In practice, however, control functions depend on each other in subtle ways. This makes it difficult for network programmers to detect and revise design bottlenecks.
Sometimes, even with apt designs, suboptimalities arise because of changes in the workload. For example, if a virtual network is migrated from one datacenter to another, the control functions managing that virtual network should also be placed in the new datacenter to minimize latency.
There is clearly no effective way to define a concrete offline method to optimize the control plane placement. For that reason, we rely on runtime instrumentation of control applications. This is feasible in our platform, since we have a well-defined abstraction for message handlers, their state, and the messages. Beehive's runtime instrumentation tries to answer the following questions: (i) how are the messages balanced among hives and bees? (ii) which bees are overloaded? and (iii) which bees exchange the most messages?
Metrics. Our runtime instrumentation system measures the resource consumption of each bee along with the number of messages it exchanges with other bees. For instance, we measure the number of messages that are exchanged between an OpenFlow driver of a switch and a virtual networking application managing a particular virtual network. This metric essentially indicates the correlation of each switch to each virtual network. We also store the provenance and causation data for messages. For example, in a learning switch application, we can report that most PacketOut messages are emitted by the learning switch application upon receiving PacketIns.
Design. We measure runtime metrics on each hive locally. This instrumentation data is further used to find the optimal placement of bees and is also utilized for application analytics. We implemented this mechanism as a control application using Beehive's programming model. That is, each bee emits its metrics in the form of a message. Then, our application locally aggregates those messages and sends them to a centralized application that optimizes the placement of the bees. The optimizer is designed to avoid complications such as conflicts in bee migrations.
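As a rough illustration of how instrumentation can itself be expressed as messages and handlers, the Go sketch below defines a hypothetical per-bee metrics message and a local aggregator that sums it per hive before it would be forwarded to the centralized optimizer; none of these names are Beehive's actual types.

package main

import "fmt"

// beeMetrics is an illustrative metrics message: how many messages this bee
// exchanged with each peer bee during the last measurement interval.
type beeMetrics struct {
	BeeID    uint64
	HiveID   string
	MsgCount map[uint64]int // peer bee ID -> messages exchanged
}

// aggregator mimics the local instrumentation application: it sums per-bee
// metrics on each hive before forwarding them to the centralized optimizer.
type aggregator struct {
	perHive map[string]int
}

func (a *aggregator) handle(m beeMetrics) {
	total := 0
	for _, c := range m.MsgCount {
		total += c
	}
	a.perHive[m.HiveID] += total
}

func main() {
	agg := &aggregator{perHive: map[string]int{}}
	// Each bee periodically emits its metrics as a message (here, a call).
	agg.handle(beeMetrics{BeeID: 1, HiveID: "hive-1", MsgCount: map[uint64]int{2: 120, 3: 4}})
	agg.handle(beeMetrics{BeeID: 2, HiveID: "hive-1", MsgCount: map[uint64]int{1: 120}})
	fmt.Println("messages observed on hive-1:", agg.perHive["hive-1"])
}

Because the aggregation happens off the message-processing path, this style of collection is what keeps the instrumentation overhead low, as measured next.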
Overheads of Runtime Instrumentation. One of the common concerns about runtime instrumentation is its performance overhead. To measure the effects of runtime instrumentation, we have run the key-value store sample application for a batch size of 8192 messages and a replication factor of 1, with and without runtime instrumentation. As shown in Figure 4.12, Beehive's runtime instrumentation imposes a very low overhead (less than 3%) on the bees as long as enough resources are provisioned for runtime instrumentation. This is because instrumentation data is processed asynchronously and does not interfere with the critical path in message processing.

Figure 4.12: The PDF of a bee's throughput in the sample key-value application, with and without runtime instrumentation.
4.3.5 Automatic Placement
By default, the hive that receives the first message for a specific cell successfully acquires the lock and creates a new local bee for that cell. At this point, all subsequent messages for the same mapped cells will be processed on the newly created bee on this hive. This default behavior can perform well when most messages have locality (i.e., sources are in close vicinity), which is the case for most networks [16]. Having said that, there is a possibility of workload migration. In such cases, Beehive needs to self-optimize by optimizing the placement of bees.

Finding the optimum placement of bees is NP-hard (by reduction from the facility location problem) and we consider it outside the scope of this thesis. Here, we employ a greedy heuristic aiming at processing messages close to their source. That is, our greedy optimizer migrates bees among hives, aiming at minimizing the volume of inter-hive messages. We stress that optimality is not our goal and this heuristic is presented as a proof of concept.
It is important to note that, in Beehive, bees that have an open socket or file cannot be migrated, since Beehive cannot move the open sockets and files along with the bee. As a result, our greedy heuristic will move other bees close to the bees performing IO, if they exchange a lot of messages. For instance, as we will demonstrate in Section 5.2, bees of a forwarding application will be migrated next to the bee of an OpenFlow driver, which improves latency. We also note that, using our platform, it is straightforward to implement other optimization strategies, which we leave to future work.
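The greedy heuristic can be approximated by the following Go sketch: for every bee that is not pinned by open sockets or files, pick the hive it exchanges the most messages with, and propose a migration if that hive differs from the current one. The data layout and names are assumptions made for illustration.

package main

import "fmt"

// traffic[b][h] is the number of messages bee b exchanged with bees on hive h
// in the last measurement interval (an illustrative layout, not Beehive's).
type traffic map[string]map[string]int

// greedyPlacement returns, for each migratable bee, the hive that receives
// most of its messages. Bees pinned by open sockets or files are skipped.
func greedyPlacement(t traffic, current map[string]string, pinned map[string]bool) map[string]string {
	moves := map[string]string{}
	for bee, perHive := range t {
		if pinned[bee] {
			continue
		}
		best, bestCount := current[bee], perHive[current[bee]]
		for hive, count := range perHive {
			if count > bestCount {
				best, bestCount = hive, count
			}
		}
		if best != current[bee] {
			moves[bee] = best
		}
	}
	return moves
}

func main() {
	t := traffic{
		"collector-sw7": {"hive-1": 10, "hive-3": 900}, // mostly talks to hive-3
		"route":         {"hive-1": 500, "hive-3": 40},
	}
	current := map[string]string{"collector-sw7": "hive-1", "route": "hive-1"}
	pinned := map[string]bool{}
	fmt.Println(greedyPlacement(t, current, pinned))
}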
Figure 4.13: Inter-hive traffic matrix of the elephant flow routing application (a) with an ideal placement, and (b) when the placement is automatically optimized. (c) Bandwidth usage of the control plane in (b).
Effectiveness of Automatic Placement. To demonstrate the effectiveness of our greedy placement algorithm, we have run an experiment using our elephant flow routing example shown in Figure 4.4 and measured the traffic matrix between hives, which is readily provided by Beehive. We have simulated a cluster of 40 controllers and 400 switches in a simple tree topology. We initiate 100 fixed-rate flows from each switch, and instrument the elephant flow routing application. 10% of these flows have spikes (i.e., are detected as an elephant flow in Collect).

In the ideal scenario shown in Figure 4.13a, the switches are locally polled and all messages are processed on the same hives, except the TrafficSpike messages that are centrally processed by Route. To demonstrate how Beehive can dynamically optimize the control plane, we artificially assign the cells of all switches to the bees on the first hive. Once our runtime instrumentation collects enough data about the futile communications over the control channels, it starts to migrate the bee invoking Collect and Query to the hive connected to each switch. In particular, it migrates the cell (S, SWi, StatSWi) next to the OpenFlow driver that controls SWi. As shown in Figures 4.13b and 4.13c, this live migration of bees localizes message processing and results in a placement similar to Figure 4.13a. Note that this is all done automatically at runtime with no manual intervention.
Application-Defined Placement. In Beehive, applications can define their own explicit, customized placement method. An application-defined placement method can be defined as a function that chooses a hive among the live hives for a set of mapped cells: Place(cells, live_hives). This method is called whenever the cells are not already assigned to a bee. Using this feature, applications can simply employ placement strategies such as random placement, consistent hashing, and resource-based placement (e.g., delegate to another hive if the memory usage is more than 70%), to name a few. It is important to note that, if none of these solutions works for a particular use case, one can implement a centralized load-balancer application and a distributed worker application.
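As an example of an application-defined placement method with the Place(cells, live_hives) shape described above, the Go sketch below delegates to the least-loaded live hive once local memory usage exceeds 70%; the Cell and Hive types are illustrative, not Beehive's interfaces.

package main

import "fmt"

// Illustrative types; Beehive's real interfaces may differ.
type Cell struct{ Dict, Key string }

type Hive struct {
	ID       string
	MemUsage float64 // fraction of memory in use, 0..1
}

// resourceBasedPlace sketches an application-defined placement method: prefer
// the local hive unless its memory usage exceeds 70%, in which case delegate
// to the least-loaded live hive.
func resourceBasedPlace(cells []Cell, local Hive, liveHives []Hive) Hive {
	if local.MemUsage <= 0.7 {
		return local
	}
	best := local
	for _, h := range liveHives {
		if h.MemUsage < best.MemUsage {
			best = h
		}
	}
	return best
}

func main() {
	local := Hive{ID: "hive-1", MemUsage: 0.85}
	others := []Hive{{ID: "hive-2", MemUsage: 0.40}, {ID: "hive-3", MemUsage: 0.55}}
	chosen := resourceBasedPlace([]Cell{{Dict: "D", Key: "sw-9"}}, local, others)
	fmt.Println("placing cells on", chosen.ID)
}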
4.3.6 Application Composition
Our approach to foster code reuse in the control platform is to exchange messages among applications. Although this approach results in a high degree of decoupling, sometimes network programmers need to compose applications with an explicit, predefined ordering. For example, for security reasons, one might want to compose an access control application with a learning switch in such a way that messages are always processed by the access control application first, and then relayed to the learning switch.
For such cases, Beehive provides generic composers that compose a set of control functions into a single composed function. The composed function isolates the dictionaries of the different applications. The Map function of this composition returns the union of the mapped cells of the composed functions to preserve consistency.
In Beehive, we provide two generic composers:
1. All(f1, f2, ...) composes a set of message handlers as a sequence in such a way that the incoming message is passed to the next function only if the previous function has successfully handled the message. If there is an error in any function, the transaction will be aborted.
2. Any(f1, f2, ...) composes a set of message handlers in such a way that the incoming message is passed to the next function only if the previous function could not successfully handle the message. In other words, Any guarantees that the side effect of at most one of these functions will be applied to the composed application's state. A minimal sketch of these composers follows this list.
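The control flow of these two composers can be sketched in Go as follows; the Handler signature is illustrative, and returning an error stands in for Beehive aborting the composed transaction.

package main

import (
	"errors"
	"fmt"
)

// Handler is an illustrative message-handler signature; Beehive's real
// handler interface also carries the application state and context.
type Handler func(msg string) error

// All sketches the All(f1, f2, ...) composer: the message is passed to the
// next handler only if the previous one succeeded; any error aborts the
// composed transaction.
func All(handlers ...Handler) Handler {
	return func(msg string) error {
		for _, h := range handlers {
			if err := h(msg); err != nil {
				return fmt.Errorf("aborting composed transaction: %w", err)
			}
		}
		return nil
	}
}

// Any sketches Any(f1, f2, ...): handlers are tried in order until one
// succeeds; in Beehive, transactional isolation ensures that at most one
// handler's side effects survive.
func Any(handlers ...Handler) Handler {
	return func(msg string) error {
		var lastErr error
		for _, h := range handlers {
			if lastErr = h(msg); lastErr == nil {
				return nil
			}
		}
		return lastErr
	}
}

func main() {
	acl := func(msg string) error {
		if msg == "blocked" {
			return errors.New("denied by ACL")
		}
		return nil
	}
	learn := func(msg string) error { fmt.Println("learning switch handled", msg); return nil }

	pipeline := All(acl, learn) // access control first, then the learning switch
	fmt.Println(pipeline("PacketIn:host-a"), pipeline("blocked"))
}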
4.4 Applications
As shown in Table 4.1, various use-cases are implemented using Beehive with an effort comparable to existing centralized controllers. Note that these numbers include the generated Map functions. These numbers indicate that the developer efforts have been devoted to the core of the use-cases, not the boilerplate of distributed programming, and the end result can be run as a distributed application.

Use-Case                    Code Size
Hub                         16 LOC
Kandoo [57]                 47 LOC
Learning Switch             69 LOC
Elephant Flow Re-Routing    93 LOC
Key-Value Store             136 LOC
Message Queue               158 LOC
Pub-Sub                     162 LOC
Network Virtualization      183 LOC
Network Discovery           239 LOC
Routing                     385 LOC
ONIX NIBs                   471 LOC

Table 4.1: Sample use-cases implemented in Beehive and their code size.
Interestingly, quite a few of these use-cases are similar to well-known distributed architectures. For example, the key-value store application behaves similarly to memcached [43] and our distributed graph processing application behaves similarly to Pregel [83]. This implies that Beehive's programming model would accommodate a variety of distributed programming flavors without hindering simplicity.
It is important to note that, among these applications, ONIX NIB is the largest project. The reason is that more than 60% of our code was devoted to implementing NIB's object model (e.g., nodes, links, and paths) and their APIs. We expect any centralized implementation of ONIX NIBs to have the same amount of programming complexity. Moreover, we note that the active component that manages these NIB objects is less than 100 LOC.
Centralized Applications. A centralized application is a composition of functions that require the whole application state in one physical location. In our framework, a function is centralized if it accesses whole dictionaries to handle messages. The Map function of a centralized application should return the same value (i.e., the same key) for all messages. As discussed earlier in Section 4.3, for such a message handler, Beehive guarantees that all the messages are processed in a single bee. It is important to note that, since applications do not share state, the platform may place different centralized applications on different hives to satisfy extensive resource requirements (e.g., a large state).
App Centralized:
    ...
    Map(msg):
        // Returning key 0 in dictionary D for all messages, this application
        // will process all messages in the same thread (i.e., in the same
        // bee). As such, this application becomes a centralized application.
        return {(D, 0)}

Figure 4.14: A centralized application maps all messages to the same cell. This results in processing all messages in the same bee.
Kandoo. At the other end of the spectrum, there are the local control applications proposed in Kandoo [57] that use the local state of a single switch to process frequent events. The message handlers of a local control application maintain their state on a per-switch basis: they use switch IDs as the keys in their dictionaries and, to handle messages, access their state using a single key. As such, their Map function would map messages to a cell specific to that switch (Figure 4.15). Consequently, Beehive conceives one bee for each switch. If the application uses the default placement method, the bee will be placed on the hive that receives the first message from the switch. This effectively results in a local instance of the control application.
An important advantage of Beehive over Kandoo is that it automatically pushes control functions as close as possible to the source of the messages they process (e.g., switches for local applications). In such a setting, network programmers do not deliberately design for a specific placement. Instead, the platform automatically optimizes the placement of local applications. This is in contrast to proposals like Kandoo [57] where the developer has to decide on function placement (e.g., local controllers close to switches). The other difference is that Beehive requires one Raft proposal to lock the local cells for the local bee, while Kandoo does not incur any consensus overhead in the local controllers. This overhead is, however, negligible, since it is incurred only for the first message of each local bee.
App Local:
    ...
    Map(msg):
        // Returning the switch ID as the mapped cell for each message, the
        // messages of each switch will be processed in their own thread
        // (i.e., the bee that exclusively owns the cell allocated for the
        // switch). Using the default placement method, the bee will be
        // placed on the first hive that receives a message from that switch.
        // This effectively results in a local Kandoo application.
        return {(D, msg.Switch)}

Figure 4.15: A local application maps all messages to a cell local to the hive (e.g., the switch's ID).
ONIX's NIB [75]. The NIB is basically an abstract graph that represents networking elements and their interlinking. To process a message in the NIB manager, we only need the state of a particular node. As such, each node would be equivalent to a cell managed by a single bee in Beehive. With that, all queries (e.g., the flows of a switch) and update messages (e.g., adding an outgoing link) on a particular node in the NIB will be handled by its own bee in the platform. In Chapter 5, we present the Beehive SDN controller that provides an abstraction similar to ONIX's NIBs.
App NetworkManager:
    ...
    Map(msg):
        // By mapping messages to their nodes, the network manager will
        // maintain the NIB on a per-node basis.
        return {(D, msg.NodeID)}

Figure 4.16: To manage a graph of network objects, one can easily map all messages to the node's ID. This way, nodes are spread among the hives on the platform.
Logical xBars [86]. Logical xBar represents the control plane as a hierarchy in which each controller acts as a switch to its parent controller. To emulate such an architecture (and any similar hierarchical controller), we attach a hierarchical identifier to network messages and store the state of each xBar controller using the same identifiers as the keys in the application state. This means each controller in the xBar hierarchy would have its own cell and its own bee in Beehive. Upon receiving a message, Beehive maps the message to its cell using the message's hierarchical identifier, and the message is hence processed by its respective bee.
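In the same spirit as the Map functions of Figures 4.14 to 4.16, a Map function for such a hierarchical controller could look like the following Go sketch, where the hierarchical identifier is the dictionary key; the types are illustrative.

package main

import "fmt"

// xBarMsg is an illustrative network message tagged with the hierarchical
// identifier of the xBar controller responsible for it (e.g., "dc1/pod2/tor7").
type xBarMsg struct {
	ControllerID string
	Payload      string
}

// Cell mirrors the (dictionary, key) pairs returned by Map in Beehive.
type Cell struct{ Dict, Key string }

// mapXBar sketches the Map function described above: every message is mapped
// to the cell of its xBar controller, so each controller in the hierarchy is
// handled by its own bee.
func mapXBar(m xBarMsg) []Cell {
	return []Cell{{Dict: "XBarState", Key: m.ControllerID}}
}

func main() {
	fmt.Println(mapXBar(xBarMsg{ControllerID: "dc1/pod2/tor7", Payload: "PacketIn"}))
}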
Network Virtualization. Typically, network virtualization applications (such as NVP [74]) process the messages of each virtual network independently. Such applications can be modeled as a set of functions that, to process messages, access the state using a virtual network identifier as the key. This is basically sharding messages based on virtual networks, with minimal shared state between the shards. Each shard forms a set of colocated cells in Beehive, and the platform guarantees that messages of the same virtual network are handled by the same bee.
App NetworkVirtualization:
    ...
    Map(create CreatePort):
        // By mapping the create port message to the port's entry in the
        // Ports dictionary and the virtual network's entry in the VN
        // dictionary, this port will always be processed in the bee that
        // owns the virtual network.
        return {(Ports, create.PortID), (VN, create.VirtualNetworkID)}
    Map(update PortStatusUpdate):
        return {(Ports, update.PortID)}
    Map(msg VNMsg):
        return {(VN, msg.VirtualNetworkID)}

Figure 4.17: For virtual networks, messages of the same virtual network are naturally processed in the same bee.
4.5 Discussion
What are the main limitations of Beehive's programming model? As we mentioned in Section 4.2, in Beehive, applications cannot store persistent data outside the application dictionaries. Since the value of an entry in a dictionary can be an arbitrary object, this requirement would not impose a strong limitation on control applications. We note that this requirement is the key enabler for the platform to automatically infer the Map function and to provide fault tolerance for control applications.
Another design-level decision is to provide a serialized, consistent view of messages for each partition of cells owned by a bee. That is, each bee is a single thread that processes its messages in sequence. Thus, Beehive does not provide eventual consistency by default and instead processes messages in consistent shards. A Beehive application can, in fact, implement eventual consistency by breaking the functionalities into local message handlers and using asynchronous messages to share state instead of dictionaries. We note that eventual consistency has many implementation and performance drawbacks [21, 32].
How does Beehive compare with existing distributed programming frameworks? There is a significant body of research on how to simplify distributed programming. Some proposals focus on remote invocations, ranging from systems as early as CORBA to programming environments such as Live Objects [98]. Moreover, special-purpose programming models, such as MapReduce [35], MPI, Cilk [19], Storm [1], and Spark [132], are proposed for parallel and distributed processing. Beehive has been inspired by the approach taken by these proposals to simplify distributed programming. Having said that, the focus and the applications of these proposals are different from what we need. Therefore, they cannot be used as a basis to implement Beehive. For example, MapReduce and Spark are suitable for batch jobs, and Storm is designed for one-way stream processing. In contrast, Beehive requires bi-directional communication among generic distributed applications handling a live workload.
There are Domain-Specific Languages (DSLs), such as BOOM [13] and Fabric [80], that propose new ways of programming for the purpose of provability, verifiability, or security. In contrast to proposing a new DSL, our goal here is to create a distributed programming environment similar to centralized controllers. To that end, control applications in Beehive are developed in a general-purpose programming language, using the familiar APIs provided by Beehive's programming model.
How does Beehive compare with existing distributed controllers? Existing control planes such as ONIX [75] and ONOS [17] are focused on scalability and performance. Beehive, on the other hand, provides a programming paradigm that simplifies the development of scalable control applications. We note that existing distributed controllers such as Kandoo [57], ONIX [75], and Logical xBars [86] can all be implemented using Beehive. More importantly, Beehive's automatic placement optimization leads to results similar to what ElasticCon [37] achieves by load balancing. Our optimization, however, is generic: it is not limited to balancing the workload of OpenFlow switches, applies to all control applications, and can be extended to accommodate alternative optimization methods.
How does Beehive compare with fault-tolerance proposals? LegoSDN [28] is a framework to isolate application failures in a controller. Beehive similarly isolates application failures using automatic transactions, which is similar to LegoSDN's Absolute Compromise policy for centralized controllers. Other LegoSDN policies can be adopted in Beehive, if needed. Moreover, Ravana [68] is a system that provides transparent slave-master fault tolerance for centralized controllers. As demonstrated in Figure 5.12a, Beehive has the capability not only to tolerate a controller failure, but also to seamlessly and consistently tolerate faults in the control channel, which is outside the scope of Ravana.
Do applications interplay well in Beehive? The control plane is an ensemble of control applications managing the network. For the most part, these applications have interdependencies. No matter how scalable an application is on its own, heedless dependency on a poorly designed application may result in subpar performance. For instance, a local application that depends on messages from a centralized application might not scale well. Beehive cannot automatically fix a poor design, but it provides analytics to highlight the design bottlenecks of control applications, thus assisting developers in identifying and resolving such performance issues.
Moreover, as shown in [23] and [90], there can be conflicts in the decisions made by different control applications. Although we do not propose a specific solution for that issue, these proposals can be easily adopted to implement control applications in Beehive. Specifically, one can implement the Corybantic Coordinator [90] or the STN Middleware [23] as a Beehive application that sits between the SDN controller and the control applications. With Beehive's automatic optimization, control modules are easily distributed on the control platform to make the best use of the available resources.
4.6 Remarks
In this chapter, we have presented Beehive, a distributed control platform that is easy to use and instrument. Using a simple programming model, Beehive infers the state required to process a message, and automatically compiles applications into their distributed counterparts. By instrumenting applications at runtime, Beehive optimizes the placement of control applications, and provides feedback that helps developers in the design process. We have demonstrated that our proposal is able to model existing distributed control planes. Moreover, our evaluations confirm that this approach can be effective in designing scalable control planes.
In the next chapter, we continue our evaluations with two real-world use cases of Beehive.
Chapter 5
Use Cases
In this chapter, we present a few real-world use cases of the programming models proposed in this thesis. We start with OpenTCP, which is an effort to adapt different flavors of TCP based on the scalable statistics collection provided by Kandoo. We present our real-world evaluation of OpenTCP in the SciNet HPC datacenter. We continue with a distributed SDN controller we have designed and implemented based on Beehive. We demonstrate that our controller is fault-tolerant and scalable, yet is implemented in a few hundred lines of code, similar to centralized controllers. We conclude this chapter with a sample routing algorithm that we implemented as a proof of concept to showcase the generality of Beehive.
5.1 OpenTCP
The Transmission Control Protocol (TCP) [65] is designed to fully utilize the network bandwidth while keeping the entire network stable. TCP's behaviour can be sub-optimal and even erroneous [119, 12, 30] mainly for two reasons. First, TCP is expected to operate in a diverse set of networks with different characteristics and traffic conditions, making TCP a "Jack of all trades, master of none" protocol. Limiting TCP to a specific network and taking advantage of the local characteristics of that network can lead to major performance gains. For instance, DCTCP [12] outperforms TCP in datacenter networks, even though the results might not be applicable in the Internet. With this mindset, one can adjust TCP (the protocol itself and its parameters) to gain better performance in specific networks (e.g., datacenters).
Second, even focusing on a particular network, the effect of dynamically adapting congestion control to the traffic pattern is not yet well understood in today's networks. Such adaptation can potentially lead to major improvements, as it provides another dimension that today's TCP does not explore.

Clearly, adapting TCP requires meticulous consideration of the network characteristics and traffic patterns. The fact that TCP relies solely on end-to-end measurements of packet loss or delay as the only sources of feedback from the network means that TCP, at its best, has a very limited view of the network state (e.g., the trajectory of available bandwidth, congested links, network topology, and traffic). Thus, a natural question here is:

Can we build a system that observes the state and dynamics of a computer network and adapts TCP's behaviour accordingly?

If the answer to this question is positive, it can simplify tuning different TCP parameters. It can also facilitate dynamically changing the protocol itself.
OpenTCP. Here we present OpenTCP [46] as a system for dynamic adaptation of TCP based on network and traffic conditions in SDN [88]. OpenTCP mainly focuses on internal traffic in SDN-based datacenters for four reasons: (i) the SDN controller already has a global view of the network (such as topology and routing information), (ii) the controller can collect any relevant statistics (such as link utilization and the traffic matrix), (iii) it is straightforward to realize OpenTCP as a control application in SDNs, and (iv) the operator can easily customize end-hosts' TCP stacks.
Our preliminary implementation of OpenTCP has been deployed in SciNet, a high-performance computing (HPC) datacenter with ∼4000 hosts. We use OpenTCP to tune TCP parameters in this environment to show how it can simplify the process of adapting TCP to network and traffic conditions. We also show how modifying TCP using OpenTCP can improve the network performance. Our experiments show that using OpenTCP to adapt init_cwnd based on link utilization leads to up to a 59% reduction in flow completion times.

Figure 5.1: Using Kandoo to implement OpenTCP: the Oracle, SDN-enabled switches, and Congestion Control Agents are the three components of OpenTCP.
OpenTCP and related work. OpenTCP is orthogonal to previous works improving TCP's performance. It is not meant to be a new variation of TCP. Instead, it complements previous efforts by making it easy to switch between different TCP variants automatically (or in a semi-supervised manner), or to tune TCP parameters based on network conditions. For instance, one can use OpenTCP to utilize either DCTCP or CUBIC in a datacenter environment. The decision on which variant to use is made in advance through the congestion control policies defined by the network operator.
5.1.1 OpenTCP Architecture
OpenTCP collects data regarding the underlying network state (e.g., topology and routing information) as well as statistics about network traffic (e.g., link utilization and the traffic matrix). Then, using this aggregated information and based on policies defined by the network operator, OpenTCP decides on a specific set of adaptations for TCP. Subsequently, OpenTCP sends periodic updates to the end-hosts that, in turn, update their TCP variant using a simple kernel module. Figure 5.1 presents the schematic view of how OpenTCP works. As shown, OpenTCP's architecture has three main components that aptly fit into Kandoo's design (§3):
1. The Oracle is a non-local SDN control application that lies at the heart of OpenTCP and is deployed on the root controller. It collects information about the underlying network and traffic. Then, based on congestion control policies defined by the network operator, it determines the appropriate changes to optimize TCP. Finally, it distributes update messages to end-hosts.
2. Collectors are typical local applications that collect statistics from switches; they are deployed on local controllers to scale.
3. Congestion Control Agents (CCAs) are also local applications that receive update messages from the Oracle and are responsible for modifying the TCP stack at each host using a kernel module.
5.1.2 OpenTCP Operation
OpenTCP goes through a cycle composed of three steps: (i) data collection, (ii) optimization and CUE generation, and (iii) TCP adaptation. These steps are repeated at a pre-determined interval of T seconds. To keep the network stable, T needs to be several orders of magnitude larger than the network RTT. In the rest of this section, we describe each of these steps in detail.
Data Collection. OpenTCP relies on two types of data for its operation. First, it needs information about the overall structure of the network, including the topology, link properties such as delay and capacity, and routing information. Since this information is readily available in the SDN controller, the Oracle can easily access it. Second, OpenTCP collects statistics about traffic characteristics by periodically querying flow tables in SDN switches. In order to satisfy the low-overhead design objective, our implementation of OpenTCP relies on the Kandoo local controllers [57] to collect summarized link-level statistics at scale. The operator can choose to collect only a subset of possible statistics to further reduce the overhead of data collection. Later, we will show how this is done using congestion control policies.
Figure 5.2: The Oracle collects network state and statistics through the SDN controller and switches. Combining those with the CCP defined by the operator, the Oracle generates CUEs which are sent to CCAs sitting on end-hosts. CCAs will adapt TCP based on the CUEs.
CUE Generation. After data collection, we must determine what changes to make to TCP sources. This decision might vary from network to network depending on the network operator's preferences. In OpenTCP, we formalize these preferences by defining a Congestion Control Policy (CCP). The network operator provides OpenTCP with the CCP. At a high level, the CCP defines which statistics to collect, which objective function to optimize, what constraints OpenTCP must satisfy, and how the proposed updates should be mapped onto specific changes in TCP sources. Based on the CCP defined by the network operator, OpenTCP finds the appropriate changes and determines how to adapt the individual TCP sessions. These changes are sent to individual CCAs in messages we call Congestion Update Epistles (CUEs). The CCAs, which are present in individual end-hosts, use these CUEs to update TCP. Figure 5.2 shows how CCPs are integrated into the TCP adaptation process.
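Although the thesis does not prescribe a concrete CCP format, the Go sketch below shows one plausible encoding of the four parts just described (statistics mask, objective, constraints, and mapping rules); all field names and the example policy are assumptions made for illustration.

package main

import "fmt"

// These types sketch one possible encoding of a Congestion Control Policy
// (CCP); they are illustrative and not OpenTCP's actual data structures.
type MappingRule struct {
	Condition string // e.g., "link_utilization <= 0.8"
	Command   string // e.g., "set_congestion_control cubic"
}

type CCP struct {
	StatisticsMask []string // which statistics the collectors should gather
	Objective      string   // e.g., "minimize flow_completion_time"
	Constraints    []string // constraints the Oracle must satisfy
	MappingRules   []MappingRule
}

func main() {
	ccp := CCP{
		StatisticsMask: []string{"link_utilization", "drop_rate"},
		Objective:      "minimize flow_completion_time",
		Constraints:    []string{"link_utilization <= 0.8"},
		MappingRules: []MappingRule{
			{Condition: "link_utilization <= 0.8", Command: "set_congestion_control cubic"},
			{Condition: "otherwise", Command: "set_congestion_control reno"},
		},
	}
	fmt.Printf("CCP with %d mapping rule(s): %+v\n", len(ccp.MappingRules), ccp)
}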
TCP Adaptation. The final step in OpenTCP's operation is to change TCP based on the CUEs generated by the Oracle. After solving the optimization problem, the Oracle forms CUEs and sends them directly to the CCAs sitting in the end-hosts. CUEs are usually very small packets. In our implementation, each CUE is less than 10-20 bytes. Thus, their overhead is very small.
CCAs are responsible for receiving and enforcing updates. Upon receiving a CUE, each CCA will check the condition given in each mapping rule. If this condition is satisfied, the CCA immediately applies the command. Changes can be simple adaptations of TCP parameters or can require switching between different variants of TCP. If the CCA does not receive any CUEs from the Oracle for a long time (we use 2T, where T is the OpenTCP operation cycle), or if the mapping rule conditions are violated, the CCA reverts all changes and OpenTCP goes back to the default TCP settings.
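The CCA behaviour just described (apply a CUE's command when its condition holds, revert to the defaults when no CUEs arrive for 2T) can be sketched as follows; the CUE layout and the cca type are hypothetical, and a real agent would apply the command through a kernel module rather than a field assignment.

package main

import (
	"fmt"
	"time"
)

// CUE is an illustrative Congestion Update Epistle: a condition and the TCP
// change to apply while the condition holds (not OpenTCP's wire format).
type CUE struct {
	Condition func() bool
	Command   string
}

// cca sketches a Congestion Control Agent: it applies the command of each CUE
// whose condition holds, and reverts to the default settings if no CUE has
// been received for two OpenTCP cycles (2T).
type cca struct {
	T        time.Duration
	lastCUE  time.Time
	current  string
	defaults string
}

func (c *cca) onCUE(cue CUE) {
	c.lastCUE = time.Now()
	if cue.Condition() {
		c.current = cue.Command // in practice: applied via the kernel module
		fmt.Println("applied:", c.current)
	}
}

func (c *cca) tick() {
	if time.Since(c.lastCUE) > 2*c.T && c.current != c.defaults {
		c.current = c.defaults
		fmt.Println("no CUEs for 2T; reverted to", c.defaults)
	}
}

func main() {
	agent := &cca{T: 50 * time.Millisecond, current: "reno", defaults: "reno", lastCUE: time.Now()}
	agent.onCUE(CUE{Condition: func() bool { return true }, Command: "cubic"})
	time.Sleep(120 * time.Millisecond) // longer than 2T with no CUEs
	agent.tick()
}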
Adaptation example. Assume the operator simply wants to switch from TCP Reno to CUBIC while keeping an eye on packet drop rates. Here, we assume that the default TCP variant running on end-hosts is TCP Reno, and that there is a constraint limiting link utilization to 80%. If the network has a low load (i.e., as long as all links have a utilization of less than 80%), the Oracle will send CUEs to all CCAs instructing them to switch to CUBIC. If network conditions change and utilization goes above 80% on any link, the Oracle will not be able to satisfy the optimization constraints. In this case, the CCAs will fall back to the default TCP variant, which is TCP Reno here.
OpenTCP without SDN Support. OpenTCP is expressly designed for SDN. In fact, the existence of a Kandoo root controller with access to the global state of the network, as well as local controllers which can be used to collect flow- and link-level statistics, makes SDN the perfect platform for OpenTCP. Having said that, we can use OpenTCP on a traditional (non-SDN) network as long as we can collect the required network state and traffic statistics. Clearly, this requires more effort and might incur a higher overhead, since we do not have a centralized controller or built-in statistics collection features. In the absence of SDN switches, we can use a dedicated node in the network to act as the Oracle. Moreover, we can use the CCAs to collect flow statistics (e.g., the number of active flows, utilization, and drop rate) at each end-host using a kernel module in a distributed manner and send the relevant information to the Oracle. The Oracle can query the CCAs to collect various statistics and, if needed, aggregate them to match the statistics available in SDN. This needs to be done with extra care, so that the overhead remains low and the data collection system does not have a major impact on the underlying network traffic.
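In our deployment the CCAs rely on the tcp_flow_spy kernel module; purely as an illustration of end-host collection without any SDN or kernel support, the Go sketch below counts established TCP connections on a Linux host by parsing /proc/net/tcp, a much coarser signal than what the kernel module provides.

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// countEstablished reads /proc/net/tcp on a Linux end-host and counts sockets
// in the ESTABLISHED state (state code "01"). This is a coarse, user-space
// stand-in for the richer statistics a kernel module can provide.
func countEstablished(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	count := 0
	scanner := bufio.NewScanner(f)
	scanner.Scan() // skip the header line
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Field 3 (0-indexed) is the connection state in hex.
		if len(fields) > 3 && fields[3] == "01" {
			count++
		}
	}
	return count, scanner.Err()
}

func main() {
	n, err := countEstablished("/proc/net/tcp")
	if err != nil {
		fmt.Println("cannot read statistics:", err)
		return
	}
	fmt.Println("active (established) TCP flows:", n)
}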
Conditions and constraints for OpenTCP. The current implementation of OpenTCP relies on a single administrative authority to derive network and traffic statistics. To this end, we also assume that there is no adversary in the network and that we can change the end-host stacks. Hence, OpenTCP does not have to operate in a datacenter per se, and any network that satisfies the above conditions can benefit from OpenTCP.
Changing parameters in a running system. Depending on the TCP parameter, making changes to TCP flows in an operational network may or may not have an immediate effect on TCP behaviour. For example, adapting init_cwnd only affects new TCP flows, but a max_cwnd change is respected almost immediately. If the congestion window size is greater than the new max_cwnd, TCP will not drop those packets; it will deliver them and afterwards respect the new max_cwnd. Similarly, enabling TCP pacing will have an immediate effect on the delivery of packets. The operator should be aware of when and how each parameter setting will affect ongoing TCP sessions.
5.1.3 Evaluation
To evaluate the performance of OpenTCP, we deploy our proposal in a High-Performance Computing (HPC) datacenter. SciNet is mainly used for scientific computations, large-scale data analyses, and simulations by a large number of researchers with diverse backgrounds, from biological sciences and chemistry to astronomy and astrophysics. Over the course of 20 days, we enabled OpenTCP and collected 200TB of packet-level traces as well as flow- and link-level statistics using our tcp_flow_spy kernel module [8].
Setup. The topology of SciNet is illustrated in Figure 5.3. There are 92 racks of 42 servers. Each server connects to a Top of Rack (ToR) switch via 1Gbps Ethernet. Each end-host in SciNet runs Linux 2.6.18 with TCP BIC [52] as the default TCP. SciNet consists of 3,864 nodes with a total of 30,912 cores (Intel Xeon Nehalem) at 2.53GHz, with 16GB of RAM per node. The ToR switches are connected to a core Myricom Myri-10G 256-port 10GigE chassis, each having a 10Gbps connection.

Figure 5.3: The topology of SciNet, the datacenter network used for deploying and evaluating OpenTCP.
OpenTCP in SciNet. We did not have access to a large-scale SDN and could not change any hardware elements in SciNet. Therefore, we took the extra steps to deploy OpenTCP without SDN. In our experiments, the CCAs periodically collect socket-level statistics using the tcp_flow_spy kernel module, aggregate them, and send the results to the Oracle, a dedicated server in SciNet. The Oracle performs another level of aggregation to produce final statistics combining data from the CCAs. Throughout our experiments, we monitor OpenTCP to ensure that we do not have a major CPU or bandwidth overhead. The average CPU overhead of our instrumentation is less than 0.5%, and we have a negligible bandwidth requirement. We believe this is a reasonably low overhead, one which is unlikely to have a significant impact on the underlying jobs and traffic.
Evaluation overview. We begin this section by describing the workload in SciNet (§5.1.3). We show that, despite the differences in the nature of the jobs running in SciNet, the overall properties of the network traffic, such as flow sizes, flow completion times (FCT), traffic locality, and link utilizations, are analogous to those found by previous studies of datacenter traffic [53, 12, 48]. We take this step to ensure our results are not isolated or limited to SciNet.
Figure 5.4: SciNet traffic characterization. The x-axis is in log scale. (a) CDF of flow sizes excluding TCP/IP headers. (b) CDF of flow completion times.
Traffic Characterization. At SciNet, the majority of jobs are Message Passing Interface (MPI) [44] programs running on one or more servers. Jobs are scheduled through a queuing system that allows a maximum of 48 hours per job. Any job requiring more time must be broken into 48-hour chunks. Note that each node runs a single job at any point in time. We use OpenTCP to collect traces of flow- and link-level statistics, capturing a total of ∼200TB of flow-level logs over the course of 20 days.
Workload. We recognize two types of co-existing TCP traffic in our traces (shown in Figure 5.4): (i) MPI traffic and (ii) distributed file system flows. The majority of MPI flows are less than 1MB in size and finish within 1 second. This agrees with previous studies of datacenter traffic [12, 16]. It is interesting to note that only a few MPI flows (1%) last up to the maximum job time allowance of 48 hours. Unlike the MPI flows, the distributed file system traffic ranges from 20B to 62GB in size, and most (93%) finish within 100 seconds. A few large background jobs last more than 6 days.
Locality. Figure 5.5 represents the log of the number of bytes exchanged between server pairs in 24 hours. Element (i, j) in this matrix represents the amount of traffic that host i sends to host j.
Figure 5.5: The locality pattern as seen in the matrix of log(number of bytes) exchanged between server pairs.
The nodes are ordered such that those within a rack are adjacent on the axes (virtualization is not used in this cluster). The accumulation of dots around the diagonal line in the figure shows the traffic being locally exchanged among servers within a rack, a pattern that is similar to previous measurements [16, 53]. Note that this matrix is not symmetrical, as it depicts the traffic from nodes on the x-axis to nodes on the y-axis. The vertical lines in the traffic matrix represent "token management traffic" belonging to the distributed file system. This type of traffic exchanges tokens between end-hosts and a small set of nodes called token managers. Finally, the rectangular regions in the figure represent traffic between the nodes participating in multi-node MPI jobs.
Utilization. We observe two utilization patterns in SciNet. First, 80% of the time, link utilizations are below 50%. Second, there are congestion epochs in the network where both edge and core links experience high levels of utilization and, thus, packet losses. Again, this observation accords with previous studies where, irrespective of the type of datacenter, link utilizations are known to be low [16, 53].
Experiment Methodology. We run a series of experiments to study how OpenTCP works in practice, using a combination of standard benchmarks and sample user jobs as the workload. During each experiment, we use OpenTCP to make simple modifications to end-hosts' congestion control schemes, and measure the impact of these changes in terms of flow completion times and packet drop rates. Throughout our experiments, we set OpenTCP's slow time scale to 1 min, unless otherwise stated (in this environment, the fast time scale, i.e., the RTT, is less than 1ms), directing OpenTCP to collect statistics and send CUEs once every 1 min. We also measure the overhead of running OpenTCP on CPU usage and bandwidth. In our experiments, we have used 40 nodes in 10 ToR racks, and we run all the benchmarks in over 10 iterations.

Figure 5.6: CDF of flow sizes in the different benchmarks used in the experiments (all-to-all, all gather, exchange, scatter, reduce scatter, and Flash). The x-axis is in log scale.
To evaluate the impact of the changes made by OpenTCP, we symmetrically split our nodes into two sets. Both sets run similar jobs, but we deploy OpenTCP on one of these two sets only (i.e., half the nodes). Because the two sets are comparable, we can observe the impact of OpenTCP with minimum influence from other random effects.
Benchmarks. In order to study different properties of OpenTCP, we use the Intel MPI Benchmarks (IMB) [9] as well as sample user jobs in SciNet. These benchmarks cover a wide range of computing applications, have different processing and traffic load characteristics, and stress the SciNet network in multiple ways. We also use sample user jobs (selected from the pool of jobs submitted by real users) to gain an understanding of the behaviour of the system under typical user-generated workload patterns. Figure 5.6 shows the CDF of flow sizes for these benchmarks. As illustrated in the figure, flow sizes range from 50B to 1GB, much like those observed in §5.1.3. The results presented here are for the "all-to-all" IMB benchmark and the "Flash" user job. We measured similarly consistent results with the other IMB benchmarks as well as other sample user jobs.
70.180157 1.05E-01 15.78526 8.25E-01 705.517967 1.27E-01
70.38899 1.05E-01 15.787495 8.25E-01 705.518487 1.27E-01
70.722106 1.05E-01 15.795955 8.25E-01 705.519263 1.28E-01
71.209582 1.06E-01 15.807256 8.25E-01 705.520136 1.28E-01
71.424457 1.06E-01 1.60048E+01 8.25E-01 705.520651 1.29E-01
72.308568 1.06E-01 16.126457 8.25E-01 705.521669 1.29E-01
72.711471 1.06E-01 16.186198 8.25E-01 705.522172 1.30E-01
73.810904 1.06E-01 16.246352 8.25E-01 705.522796 1.30E-01
74.070622 1.06E-01 16.321635 8.25E-01 705.523591 1.31E-01
74.515974 1.06E-01 16.420644 8.25E-01 705.524159 1.31E-01
74.733857 1.06E-01 16.491284 8.25E-01 705.525129 1.31E-01
74.975429 1.06E-01 16.539571 8.25E-01 705.52607 1.32E-01
75.425762 1.06E-01 16.604929 8.25E-01 705.526643 1.32E-01
75.906547 1.06E-01 16.745071 8.25E-01 705.527477 1.32E-01
76.314575 1.06E-01 16.786036 8.25E-01 705.528077 1.32E-01
76.598516 1.06E-01 16.806058 8.25E-01 705.528894 1.32E-01
76.818648 1.06E-01 16.823727 8.25E-01 705.534509 1.32E-01
77.355746 1.06E-01 16.830387 8.25E-01 705.540515 1.32E-01
77.581435 1.07E-01 16.833098 8.25E-01 705.541436 1.32E-01
77.857878 1.07E-01 16.873134 8.25E-01 708.301604 1.32E-01
78.167184 1.07E-01 16.875383 8.25E-01 708.302833 1.32E-01
78.90516 1.07E-01 16.907647 8.25E-01 708.303695 1.33E-01
79.302811 1.07E-01 16.9209 8.26E-01 708.304206 1.33E-01
79.611842 1.07E-01 1.77786E+01 8.26E-01 708.304716 1.34E-01
79.861229 1.07E-01 18.6142 8.26E-01 708.305233 1.35E-01
80.558719 1.07E-01 1.94881E+01 8.26E-01 708.306021 1.35E-01
80.954443 1.07E-01 2.03332E+01 8.26E-01 708.306538 1.36E-01
81.192639 1.07E-01 21.1694 8.26E-01 708.307409 1.36E-01
82.005395 1.07E-01 2.20021E+01 8.26E-01 708.307965 1.37E-01
82.227956 1.07E-01 2.28439E+01 8.26E-01 708.308467 1.37E-01
82.49765 1.08E-01 2.36864E+01 8.26E-01 708.309016 1.38E-01
82.728709 1.08E-01 2.4531E+01 8.26E-01 708.309525 1.39E-01
82.974757 1.08E-01 2.77067E+02 8.26E-01 708.310056 1.39E-01
83.576652 1.08E-01 277.072 8.26E-01 708.310576 1.40E-01
84.032569 1.08E-01 2.7714E+02 8.26E-01 708.311105 1.41E-01
84.331913 1.08E-01 2.77194E+02 8.26E-01 708.311649 1.42E-01
84.924975 1.08E-01 277.197 8.26E-01 708.312153 1.43E-01
85.018672 1.18E-01 2.772E+02 8.26E-01 708.312657 1.44E-01
85.218768 1.26E-01 2.77212E+02 8.26E-01 708.313166 1.46E-01
85.384934 1.36E-01 277.214 8.26E-01 708.31372 1.47E-01
85.472665 1.46E-01 2.77218E+02 8.26E-01 708.314242 1.47E-01
85.618525 1.56E-01 277.221 8.26E-01 708.314755 1.48E-01
85.816273 1.66E-01 2.77248E+02 8.26E-01 708.315294 1.49E-01
85.845264 1.76E-01 2.77251E+02 8.26E-01 708.315876 1.50E-01
85.967433 1.86E-01 2.77253E+02 8.26E-01 708.316601 1.50E-01
86.029059 1.96E-01 277.255 8.26E-01 708.317182 1.51E-01
86.249811 2.05E-01 2.77297E+02 8.26E-01 708.317764 1.52E-01
86.497188 2.12E-01 2.77314E+02 8.26E-01 708.318297 1.52E-01
86.93712 2.12E-01 2.77318E+02 8.26E-01 708.318834 1.52E-01
87.170557 2.12E-01 2.77324E+02 8.26E-01 708.319392 1.53E-01
87.905188 2.12E-01 2.77328E+02 8.26E-01 708.319947 1.53E-01
88.583444 2.12E-01 2.77335E+02 8.26E-01 708.320455 1.54E-01
89.230838 2.12E-01 2.77337E+02 8.26E-01 708.321038 1.55E-01
89.462139 2.12E-01 2.77342E+02 8.26E-01 708.321674 1.55E-01
89.500357 2.22E-01 2.77347E+02 8.26E-01 708.322264 1.55E-01
89.651014 2.32E-01 277.351 8.26E-01 708.32277 1.56E-01
89.677108 2.42E-01 277.358 8.26E-01 708.323459 1.56E-01
89.768366 2.52E-01 2.77365E+02 8.26E-01 708.324276 1.56E-01
89.909622 2.62E-01 277.38 8.26E-01 708.324852 1.56E-01
89.925259 2.72E-01 277.399 8.26E-01 708.325529 1.56E-01
90.183108 2.77E-01 277.404 8.26E-01 708.326189 1.57E-01
90.411207 2.77E-01 2.77431E+02 8.26E-01 708.326904 1.57E-01
90.869531 2.78E-01 2.77433E+02 8.26E-01 708.327521 1.57E-01
91.348562 2.78E-01 277.435 8.26E-01 708.328361 1.57E-01
91.595225 2.78E-01 2.77438E+02 8.26E-01 708.498928 1.57E-01
9.19368E+01 2.78E-01 277.44 8.26E-01 708.499682 1.57E-01
92.309943 2.78E-01 2.77442E+02 8.26E-01 708.501238 1.57E-01
92.731703 2.78E-01 2.77453E+02 8.26E-01 708.502572 1.58E-01
92.990176 2.78E-01 2.77455E+02 8.26E-01 708.50352 1.58E-01
93.318035 2.78E-01 277.464 8.26E-01 708.504226 1.58E-01
93.614331 2.78E-01 2.77475E+02 8.26E-01 708.505023 1.58E-01
93.862171 2.78E-01 2.77479E+02 8.26E-01 708.505528 1.59E-01
94.15529 2.78E-01 2.77485E+02 8.26E-01 708.506055 1.60E-01
95.106563 2.78E-01 2.77487E+02 8.26E-01 708.506607 1.61E-01
95.720935 2.78E-01 2.7749E+02 8.26E-01 708.507133 1.62E-01
9.61314E+01 2.78E-01 277.495 8.26E-01 708.507643 1.63E-01
96.340584 2.78E-01 2.77498E+02 8.26E-01 708.50821 1.64E-01
96.657448 2.78E-01 277.5 8.26E-01 708.50878 1.65E-01
97.400291 2.79E-01 2.77508E+02 8.26E-01 708.509294 1.67E-01
97.717285 2.79E-01 2.77513E+02 8.26E-01 708.509824 1.70E-01
98.067925 2.79E-01 277.517 8.26E-01 708.510342 1.73E-01
98.345988 2.79E-01 2.77521E+02 8.26E-01 708.510844 1.76E-01
98.666786 2.79E-01 277.524 8.26E-01 708.511357 1.79E-01
98.906759 2.79E-01 2.77526E+02 8.26E-01 708.511873 1.81E-01
99.166432 2.79E-01 2.77533E+02 8.26E-01 708.512375 1.85E-01
100.16891 2.79E-01 277.541 8.26E-01 708.512881 1.89E-01
101.441207 2.79E-01 2.77547E+02 8.26E-01 708.513406 1.95E-01
101.736446 2.79E-01 2.77554E+02 8.26E-01 708.513919 2.00E-01
101.977815 2.79E-01 2.77573E+02 8.26E-01 708.514421 2.04E-01
102.322988 2.79E-01 2.77575E+02 8.26E-01 708.514935 2.09E-01
102.529123 2.79E-01 2.77578E+02 8.26E-01 708.515437 2.14E-01
102.945466 2.79E-01 2.77581E+02 8.27E-01 708.515944 2.19E-01
103.215773 2.80E-01 2.77586E+02 8.27E-01 708.516449 2.25E-01
103.629686 2.80E-01 2.77609E+02 8.27E-01 708.516952 2.31E-01
103.864877 2.80E-01 277.613 8.27E-01 708.517457 2.37E-01
104.478801 2.80E-01 2.77617E+02 8.27E-01 708.517976 2.44E-01
104.701847 2.80E-01 2.77623E+02 8.27E-01 708.518478 2.50E-01
105.09099 2.80E-01 2.77626E+02 8.27E-01 708.518981 2.58E-01
105.326168 2.80E-01 277.63 8.27E-01 708.519491 2.66E-01
105.573944 2.80E-01 2.77636E+02 8.27E-01 708.519994 2.74E-01
105.778261 2.80E-01 2.77644E+02 8.27E-01 708.520497 2.81E-01
106.042395 2.81E-01 2.77647E+02 8.27E-01 708.520998 2.88E-01
106.285135 2.81E-01 277.649 8.27E-01 708.521499 2.94E-01
106.512165 2.81E-01 2.77653E+02 8.27E-01 708.522005 3.02E-01
106.821264 2.81E-01 2.77658E+02 8.27E-01 708.522508 3.11E-01
107.25213 2.81E-01 2.77663E+02 8.27E-01 708.523018 3.21E-01
107.758034 2.81E-01 277.69 8.27E-01 708.523473 3.31E-01
107.987405 2.81E-01 2.77693E+02 8.27E-01 708.52399 3.40E-01
108.387977 2.81E-01 277.697 8.27E-01 708.524491 3.47E-01
108.666183 2.82E-01 277.702 8.27E-01 708.524997 3.55E-01
108.937391 2.82E-01 2.77704E+02 8.27E-01 708.5255 3.61E-01
109.180215 2.82E-01 2.77706E+02 8.27E-01 708.526003 3.69E-01
109.55324 2.82E-01 277.726 8.27E-01 708.52651 3.77E-01
109.793077 2.82E-01 2.77731E+02 8.27E-01 708.527016 3.85E-01
110.092519 2.82E-01 277.733 8.27E-01 708.527517 3.93E-01
110.412302 2.82E-01 2.77736E+02 8.27E-01 708.528026 4.01E-01
110.963994 2.82E-01 2.77739E+02 8.27E-01 708.528533 4.09E-01
112.318584 2.82E-01 2.77748E+02 8.27E-01 708.529035 4.17E-01
112.757307 2.82E-01 2.77761E+02 8.27E-01 708.529538 4.25E-01
113.212968 2.83E-01 2.77765E+02 8.27E-01 708.530039 4.34E-01
113.594431 2.83E-01 2.77784E+02 8.27E-01 708.53054 4.42E-01
113.802629 2.83E-01 2.77787E+02 8.27E-01 708.531051 4.50E-01
114.269456 2.83E-01 2.77792E+02 8.27E-01 708.531555 4.60E-01
114.475496 2.83E-01 2.77794E+02 8.27E-01 708.532056 4.70E-01
114.695856 2.83E-01 277.827 8.27E-01 708.532565 4.77E-01
114.896798 2.93E-01 2.77833E+02 8.27E-01 708.533067 4.86E-01
115.096093 3.03E-01 2.77849E+02 8.27E-01 708.533578 4.95E-01
115.284229 3.13E-01 277.851 8.27E-01 708.534083 5.02E-01
115.445925 3.23E-01 2.77854E+02 8.27E-01 708.534586 5.10E-01
115.610599 3.33E-01 2.77857E+02 8.27E-01 708.535089 5.18E-01
115.902029 3.42E-01 2.77867E+02 8.27E-01 708.535597 5.27E-01
116.18424 3.42E-01 277.87 8.27E-01 708.536098 5.34E-01
116.588251 3.42E-01 2.77874E+02 8.27E-01 708.536599 5.42E-01
117.009517 3.42E-01 2.77878E+02 8.27E-01 708.5371 5.50E-01
117.526725 3.42E-01 2.77884E+02 8.27E-01 708.537607 5.59E-01
117.773753 3.43E-01 277.887 8.27E-01 708.538108 5.67E-01
118.082054 3.43E-01 2.77897E+02 8.27E-01 708.538609 5.76E-01
118.466949 3.43E-01 2.779E+02 8.27E-01 708.539109 5.84E-01
119.082229 3.43E-01 2.77905E+02 8.27E-01 708.539612 5.93E-01
119.571642 3.43E-01 2.77907E+02 8.27E-01 708.540116 6.03E-01
120.178774 3.43E-01 2.77921E+02 8.27E-01 708.540619 6.12E-01
120.388223 3.43E-01 277.923 8.27E-01 708.541123 6.21E-01
120.620865 3.43E-01 2.77925E+02 8.27E-01 708.541635 6.31E-01
120.862594 3.43E-01 2.77927E+02 8.27E-01 708.542156 6.40E-01
121.223627 3.43E-01 2.77931E+02 8.27E-01 708.542657 6.50E-01
121.394691 3.53E-01 277.952 8.27E-01 708.543151 6.60E-01
121.41424 3.63E-01 2.77955E+02 8.27E-01 708.543652 6.70E-01
121.636143 3.69E-01 2.77957E+02 8.27E-01 708.544151 6.80E-01
121.880221 3.69E-01 277.959 8.27E-01 708.54466 6.89E-01
122.133583 3.69E-01 2.77962E+02 8.27E-01 708.545161 6.99E-01
122.596868 3.69E-01 2.77968E+02 8.27E-01 708.545676 7.07E-01
123.01539 3.69E-01 2.77999E+02 8.27E-01 708.546179 7.17E-01
123.677619 3.69E-01 2.78002E+02 8.27E-01 708.546665 7.27E-01
123.938072 3.69E-01 2.78008E+02 8.27E-01 708.547169 7.36E-01
124.341122 3.69E-01 2.7802E+02 8.28E-01 708.54768 7.46E-01
124.570596 3.70E-01 2.78022E+02 8.28E-01 708.548182 7.55E-01
124.83813 3.70E-01 2.78028E+02 8.28E-01 708.548637 7.66E-01
125.437652 3.70E-01 2.78031E+02 8.28E-01 708.549138 7.75E-01
125.663189 3.70E-01 2.78033E+02 8.28E-01 708.549639 7.83E-01
125.876302 3.70E-01 278.036 8.28E-01 708.550144 7.93E-01
126.472777 3.71E-01 2.78038E+02 8.28E-01 708.550647 8.02E-01
126.705614 3.71E-01 278.048 8.28E-01 708.551153 8.11E-01
126.96485 3.71E-01 2.7805E+02 8.28E-01 708.551658 8.20E-01
127.167953 3.71E-01 2.78054E+02 8.28E-01 708.552161 8.28E-01
127.443413 3.71E-01 2.78059E+02 8.28E-01 708.552662 8.34E-01
127.645838 3.71E-01 2.78063E+02 8.28E-01 708.553165 8.40E-01
127.875152 3.71E-01 2.7807E+02 8.28E-01 708.553668 8.45E-01
128.083369 3.72E-01 2.78081E+02 8.28E-01 708.554179 8.50E-01
128.66276 3.72E-01 2.78087E+02 8.28E-01 708.554681 8.55E-01
128.871078 3.72E-01 2.78092E+02 8.28E-01 708.555183 8.60E-01
129.217128 3.72E-01 278.096 8.28E-01 708.555684 8.64E-01
129.476226 3.72E-01 2.78099E+02 8.28E-01 708.556196 8.67E-01
129.972863 3.72E-01 2.78111E+02 8.28E-01 708.5567 8.70E-01
130.458366 3.72E-01 278.12 8.28E-01 708.557201 8.74E-01
131.123556 3.72E-01 2.78155E+02 8.28E-01 708.557718 8.78E-01
131.369163 3.72E-01 2.78157E+02 8.28E-01 708.558228 8.81E-01
131.707645 3.72E-01 2.7816E+02 8.28E-01 708.558757 8.84E-01
131.99021 3.73E-01 2.78172E+02 8.28E-01 708.559258 8.87E-01
132.232757 3.73E-01 2.78181E+02 8.28E-01 708.559761 8.90E-01
132.435301 3.73E-01 278.19 8.28E-01 708.560268 8.92E-01
132.910741 3.73E-01 2.78204E+02 8.28E-01 708.560783 8.94E-01
133.571414 3.74E-01 2.78317E+02 8.28E-01 708.561289 8.96E-01
133.960253 3.74E-01 2.7832E+02 8.28E-01 708.561793 8.99E-01
134.166575 3.74E-01 278.322 8.28E-01 708.562316 9.00E-01
134.858197 3.74E-01 2.78324E+02 8.29E-01 708.562818 9.02E-01
135.382507 3.74E-01 2.78326E+02 8.29E-01 708.563325 9.03E-01
135.647877 3.74E-01 2.78329E+02 8.32E-01 708.563839 9.04E-01
135.862844 3.74E-01 2.78331E+02 8.34E-01 708.564342 9.05E-01
136.234645 3.75E-01 2.78333E+02 8.37E-01 708.564902 9.06E-01
137.762181 3.75E-01 2.78336E+02 8.42E-01 708.565403 9.07E-01
137.990669 3.75E-01 2.78338E+02 8.46E-01 708.566086 9.08E-01
138.349357 3.75E-01 2.78341E+02 8.53E-01 708.566658 9.08E-01
138.677556 3.76E-01 2.78343E+02 8.58E-01 708.567324 9.09E-01
138.983542 3.76E-01 2.78345E+02 8.63E-01 708.567954 9.09E-01
140.293125 3.76E-01 2.78348E+02 8.70E-01 708.568478 9.10E-01
140.85909 3.76E-01 2.7835E+02 8.77E-01 708.569041 9.11E-01
141.398054 3.76E-01 2.78353E+02 8.87E-01 708.569655 9.11E-01
141.703307 3.76E-01 2.78355E+02 8.93E-01 708.570259 9.11E-01
141.912583 3.76E-01 2.78357E+02 8.99E-01 708.570827 9.12E-01
142.125788 3.79E-01 2.7836E+02 9.09E-01 708.571415 9.12E-01
142.33538 3.81E-01 2.78362E+02 9.16E-01 708.571968 9.12E-01
142.539519 3.83E-01 2.78365E+02 9.27E-01 708.572495 9.12E-01
142.744033 3.87E-01 2.78367E+02 9.33E-01 708.573117 9.13E-01
142.946144 3.88E-01 2.78369E+02 9.39E-01 708.57366 9.13E-01
143.146444 3.90E-01 2.78372E+02 9.48E-01 708.574349 9.13E-01
143.355002 3.99E-01 2.78374E+02 9.53E-01 708.734902 9.13E-01
143.507791 4.09E-01 2.78377E+02 9.60E-01 708.737214 9.13E-01
143.609955 4.19E-01 2.78379E+02 9.62E-01 708.74071 9.13E-01
143.739163 4.29E-01 2.78382E+02 9.66E-01 730.532142 9.14E-01
143.91073 4.39E-01 2.78384E+02 9.69E-01 730.532688 9.14E-01
143.918794 4.49E-01 2.78386E+02 9.70E-01 730.533206 9.14E-01
144.005829 4.59E-01 2.78389E+02 9.71E-01 730.53377 9.14E-01
144.271131 4.63E-01 2.78391E+02 9.72E-01 730.534496 9.15E-01
168.353625 4.71E-01 2.78394E+02 9.73E-01 730.535007 9.15E-01
205.550873 4.73E-01 2.78396E+02 9.73E-01 730.535515 9.15E-01
206.371261 4.73E-01 2.78398E+02 9.74E-01 730.536129 9.16E-01
207.200939 4.73E-01 2.78401E+02 9.74E-01 730.536655 9.16E-01
208.033265 4.73E-01 2.78403E+02 9.75E-01 730.537233 9.17E-01
208.864968 4.73E-01 2.78406E+02 9.75E-01 730.537775 9.17E-01
209.712604 4.73E-01 2.78408E+02 9.75E-01 730.538315 9.17E-01
210.522206 4.73E-01 2.7841E+02 9.75E-01 730.538829 9.18E-01
211.331604 4.73E-01 2.78413E+02 9.75E-01 730.53934 9.19E-01
212.185829 4.73E-01 278.416 9.76E-01 730.539858 9.19E-01
212.992211 4.73E-01 2.78418E+02 9.76E-01 730.540368 9.20E-01
213.799864 4.73E-01 2.78421E+02 9.76E-01 730.540876 9.21E-01
214.62675 4.73E-01 278.423 9.76E-01 730.541396 9.22E-01
215.479057 4.73E-01 2.78426E+02 9.76E-01 730.54191 9.23E-01
216.295796 4.73E-01 2.7843E+02 9.76E-01 730.542417 9.24E-01
217.287887 4.73E-01 2.86546E+02 9.76E-01 730.542951 9.26E-01
218.085839 4.73E-01 286.548 9.76E-01 730.543506 9.27E-01
248.570381 4.73E-01 2.86648E+02 9.76E-01 730.544011 9.28E-01
248.582965 4.83E-01 286.726 9.76E-01 730.544532 9.29E-01
248.602264 4.93E-01 2.86979E+02 9.76E-01 730.545079 9.30E-01
250.667727 4.95E-01 2.87013E+02 9.76E-01 730.54558 9.31E-01
2.50978E+02 4.95E-01 287.428 9.76E-01 730.546131 9.32E-01
251.178429 4.96E-01 287.791 9.76E-01 730.546634 9.32E-01
251.450359 5.01E-01 2.87793E+02 9.76E-01 730.547135 9.33E-01
251.577147 5.11E-01 2.87795E+02 9.76E-01 730.547655 9.35E-01
251.5814 5.21E-01 2.87797E+02 9.76E-01 730.548265 9.35E-01
251.585063 5.31E-01 2.878E+02 9.76E-01 730.548825 9.36E-01
251.588196 5.41E-01 2.87802E+02 9.77E-01 730.549424 9.37E-01
251.590875 5.51E-01 2.87805E+02 9.78E-01 730.549944 9.38E-01
251.593276 5.61E-01 2.87807E+02 9.78E-01 730.55048 9.38E-01
251.595566 5.71E-01 2.87809E+02 9.79E-01 730.551023 9.39E-01
251.597953 5.81E-01 2.87812E+02 9.79E-01 730.551602 9.39E-01
251.600493 5.91E-01 2.87814E+02 9.80E-01 730.552188 9.39E-01
251.603049 6.01E-01 2.87817E+02 9.80E-01 730.552719 9.39E-01
251.605817 6.11E-01 2.87819E+02 9.80E-01 730.553309 9.40E-01
251.608813 6.21E-01 2.87821E+02 9.81E-01 730.553887 9.41E-01
251.611617 6.31E-01 2.87824E+02 9.81E-01 730.554445 9.41E-01
251.614217 6.41E-01 2.87826E+02 9.81E-01 730.555148 9.42E-01
251.616523 6.51E-01 2.8783E+02 9.81E-01 730.555665 9.42E-01
251.618835 6.61E-01 2.87832E+02 9.81E-01 730.556229 9.42E-01
251.620634 6.71E-01 2.87835E+02 9.81E-01 730.55677 9.43E-01
251.623102 6.81E-01 2.87837E+02 9.81E-01 730.557478 9.43E-01
251.62579 6.91E-01 2.8784E+02 9.81E-01 730.558167 9.43E-01
251.628944 7.01E-01 2.87842E+02 9.81E-01 730.558985 9.43E-01
251.633684 7.11E-01 2.87844E+02 9.81E-01 730.559949 9.44E-01
251.643573 7.21E-01 2.87848E+02 9.81E-01 730.561118 9.44E-01
251.663133 7.31E-01 287.858 9.81E-01 730.562007 9.44E-01
251.669186 7.41E-01 3.16832E+02 9.81E-01 730.564193 9.44E-01
251.672983 7.51E-01 3.19796E+02 9.81E-01 730.564785 9.44E-01
251.676297 7.61E-01 319.798 9.81E-01 730.565398 9.44E-01
251.680184 7.71E-01 3.198E+02 9.82E-01 730.565933 9.44E-01
251.684952 7.81E-01 3.19802E+02 9.82E-01 730.568703 9.44E-01
253.605018 7.90E-01 3.19805E+02 9.83E-01 798.192375 9.44E-01
253.830593 7.91E-01 3.19807E+02 9.83E-01 798.19288 9.45E-01
254.032387 7.95E-01 3.19809E+02 9.84E-01 798.193486 9.45E-01
254.232723 7.98E-01 3.19812E+02 9.84E-01 798.194012 9.45E-01
254.432982 8.01E-01 3.19814E+02 9.85E-01 798.194544 9.45E-01
254.655516 8.02E-01 3.19817E+02 9.85E-01 798.195065 9.46E-01
254.674669 8.12E-01 3.19819E+02 9.85E-01 798.195797 9.46E-01
254.679678 8.23E-01 3.19821E+02 9.85E-01 798.196458 9.46E-01
254.684061 8.33E-01 3.20099E+02 9.85E-01 798.19703 9.46E-01
254.687398 8.43E-01 3.20102E+02 9.86E-01 798.197973 9.46E-01
254.691011 8.53E-01 3.20104E+02 9.86E-01 798.198574 9.47E-01
254.695441 8.63E-01 3.20106E+02 9.87E-01 798.199115 9.47E-01
254.698646 8.73E-01 320.108 9.87E-01 798.199629 9.47E-01
254.701712 8.83E-01 3.2011E+02 9.87E-01 798.200334 9.48E-01
254.704391 8.93E-01 3.20112E+02 9.87E-01 798.200892 9.48E-01
254.707251 9.03E-01 3.20115E+02 9.87E-01 798.201777 9.48E-01
254.709434 9.13E-01 3.20117E+02 9.88E-01 798.202418 9.48E-01
254.711908 9.23E-01 3.20119E+02 9.88E-01 798.202978 9.49E-01
254.714907 9.33E-01 3.20122E+02 9.88E-01 798.204407 9.49E-01
254.719086 9.43E-01 3.20124E+02 9.88E-01 798.205017 9.49E-01
261.353827 9.50E-01 3.20127E+02 9.89E-01 798.20556 9.50E-01
263.471832 9.59E-01 3.20129E+02 9.89E-01 798.206085 9.50E-01
263.672785 9.60E-01 3.20132E+02 9.89E-01 798.207028 9.51E-01
263.937979 9.60E-01 3.20135E+02 9.89E-01 798.207554 9.51E-01
264.166399 9.60E-01 320.137 9.89E-01 798.20826 9.52E-01
264.367359 9.68E-01 3.20163E+02 9.89E-01 798.209308 9.52E-01
291.301074 9.68E-01 3.20195E+02 9.89E-01 798.210202 9.52E-01
291.502108 9.68E-01 3.20218E+02 9.89E-01 798.211121 9.52E-01
291.734693 9.69E-01 3.20234E+02 9.89E-01 798.211632 9.53E-01
291.987937 9.69E-01 3.20237E+02 9.89E-01 798.212514 9.54E-01
292.256392 9.69E-01 3.20256E+02 9.89E-01 798.213112 9.54E-01
292.374567 9.79E-01 3.20268E+02 9.89E-01 798.21362 9.56E-01
293.590341 9.83E-01 3.2027E+02 9.89E-01 798.214179 9.57E-01
293.790637 9.84E-01 3.20285E+02 9.89E-01 798.214753 9.58E-01
293.818018 9.94E-01 3.20296E+02 9.89E-01 798.215261 9.60E-01
294.089318 1.00E+00 320.298 9.89E-01 798.21583 9.61E-01
294.114777 1.00E+00 3.20314E+02 9.89E-01 798.216562 9.61E-01
3.20328E+02 9.89E-01 798.217112 9.63E-01
3.20331E+02 9.89E-01 798.217642 9.63E-01
3.20342E+02 9.89E-01 798.218156 9.64E-01
320.346 9.89E-01 798.218661 9.65E-01
3.20359E+02 9.89E-01 798.219163 9.67E-01
3.20372E+02 9.89E-01 798.219707 9.68E-01
3.20385E+02 9.89E-01 798.220228 9.70E-01
3.20388E+02 9.89E-01 798.220764 9.72E-01
3.204E+02 9.89E-01 798.221294 9.73E-01
3.20406E+02 9.89E-01 798.22185 9.75E-01
3.2041E+02 9.89E-01 798.222376 9.77E-01
3.20933E+02 9.89E-01 798.222895 9.79E-01
3.20939E+02 9.89E-01 798.223401 9.80E-01
3.20944E+02 9.89E-01 798.223903 9.82E-01
3.2096E+02 9.89E-01 798.224435 9.83E-01
3.20965E+02 9.89E-01 798.224974 9.84E-01
3.21006E+02 9.89E-01 798.225477 9.85E-01
3.21009E+02 9.89E-01 798.226055 9.86E-01
3.21011E+02 9.89E-01 798.226591 9.87E-01
3.21014E+02 9.89E-01 798.227096 9.88E-01
3.21027E+02 9.89E-01 798.227682 9.89E-01
3.21032E+02 9.89E-01 798.228251 9.90E-01
321.036 9.89E-01 798.228837 9.91E-01
321.041 9.89E-01 798.229361 9.92E-01
3.21047E+02 9.89E-01 798.229929 9.93E-01
3.21052E+02 9.90E-01 798.230618 9.93E-01
3.21057E+02 9.90E-01 798.231123 9.93E-01
3.21062E+02 9.90E-01 798.231681 9.94E-01
3.21064E+02 9.90E-01 798.232215 9.95E-01
3.21082E+02 9.90E-01 798.232724 9.96E-01
3.21087E+02 9.90E-01 798.233263 9.96E-01
321.101 9.90E-01 798.233861 9.97E-01
3.21106E+02 9.90E-01 798.234377 9.97E-01
3.21465E+02 9.90E-01 798.234903 9.97E-01
3.21468E+02 9.90E-01 798.235651 9.98E-01
3.21473E+02 9.90E-01 798.236198 9.98E-01
3.21741E+02 9.90E-01 798.236792 9.99E-01
3.21874E+02 9.90E-01 798.23772 9.99E-01
3.21881E+02 9.90E-01 798.239139 9.99E-01
3.21919E+02 9.90E-01 798.239735 9.99E-01
3.21956E+02 9.90E-01 798.240493 9.99E-01
3.21981E+02 9.90E-01 798.241815 9.99E-01
3.22114E+02 9.90E-01 798.245137 1.00E+00
3.22386E+02 9.90E-01 798.246312 1.00E+00
3.22391E+02 9.90E-01 799.77378 1.00E+00
3.22396E+02 9.90E-01 799.775975 1.00E+00
3.2245E+02 9.90E-01 800.301105 1.00E+00
3.22453E+02 9.90E-01 800.301266 1.00E+00
3.22511E+02 9.90E-01
3.22514E+02 9.90E-01
3.22516E+02 9.90E-01
3.22523E+02 9.90E-01
3.22538E+02 9.90E-01
3.22603E+02 9.90E-01
3.22638E+02 9.90E-01
322.697 9.90E-01
322.815 9.90E-01
3.22981E+02 9.90E-01
322.983 9.90E-01
3.22985E+02 9.90E-01
3.22987E+02 9.90E-01
3.2299E+02 9.90E-01
3.22992E+02 9.90E-01
3.22994E+02 9.91E-01
3.22997E+02 9.91E-01
3.22999E+02 9.91E-01
3.23002E+02 9.92E-01
3.23004E+02 9.93E-01
3.23007E+02 9.94E-01
3.23009E+02 9.94E-01
3.23011E+02 9.95E-01
3.23014E+02 9.95E-01
3.23016E+02 9.96E-01
3.23019E+02 9.97E-01
3.23021E+02 9.97E-01
3.23023E+02 9.98E-01
3.23026E+02 9.99E-01
3.23028E+02 9.99E-01
3.23031E+02 9.99E-01
3.23033E+02 9.99E-01
3.23035E+02 9.99E-01
3.23038E+02 1.00E+00
3.2304E+02 1.00E+00
3.23043E+02 1.00E+00
3.23045E+02 1.00E+00
3.23047E+02 1.00E+00
3.23056E+02 1.00E+00
3.23058E+02 1.00E+00
323.065 1.00E+00
3.24224E+02 1.00E+00
3.24227E+02 1.00E+00
3.24229E+02 1.00E+00
3.24232E+02 1.00E+00
3.24249E+02 1.00E+00
3.24253E+02 1.00E+00
3.24276E+02 1.00E+00
3.2428E+02 1.00E+00
3.24306E+02 1.00E+00
3.24312E+02 1.00E+00
3.24316E+02 1.00E+00
324.322 1.00E+00
324.327 1.00E+00
3.24331E+02 1.00E+00
3.24333E+02 1.00E+00
3.24336E+02 1.00E+00
3.24352E+02 1.00E+00
324.358 1.00E+00
3.24364E+02 1.00E+00
3.24378E+02 1.00E+00
3.24383E+02 1.00E+00
3.24787E+02 1.00E+00
3.24796E+02 1.00E+00
3.24799E+02 1.00E+00
324.803 1.00E+00
324.803 1.00E+00
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 90 180 270 360 450 540 630 720 810 900C
um
ula
tive F
racti
on
FCT (Seconds)
OpenTCP1
OpenTCP2
TCP
(a)0.0 1.33E-01 0.0 8.16E-01 0.0 2.33E-01
0.0 1.33E-01 0.0 8.16E-01 0.0 2.33E-01
1.0 2.46E-01 1.0 8.29E-01 1.0 3.56E-01
2.0 3.05E-01 2.0 8.31E-01 2.0 4.56E-01
3.0 3.52E-01 3.0 8.33E-01 3.0 5.48E-01
4.0 3.73E-01 4.0 8.33E-01 4.0 6.24E-01
6.0 3.86E-01 5.0 8.33E-01 5.0 6.92E-01
11.0 3.98E-01 6.0 8.33E-01 6.0 7.51E-01
15.0 4.10E-01 8.0 8.33E-01 7.0 8.00E-01
18.0 4.21E-01 9.0 8.33E-01 8.0 8.45E-01
21.0 4.36E-01 11.0 8.33E-01 9.0 8.78E-01
23.0 4.46E-01 12.0 8.33E-01 10.0 9.04E-01
25.0 4.57E-01 13.0 8.33E-01 11.0 9.26E-01
27.0 4.68E-01 14.0 8.33E-01 12.0 9.41E-01
29.0 4.78E-01 15.0 8.33E-01 13.0 9.53E-01
31.0 4.89E-01 16.0 8.33E-01 14.0 9.62E-01
33.0 5.00E-01 17.0 8.34E-01 15.0 9.71E-01
35.0 5.10E-01 19.0 8.34E-01 16.0 9.76E-01
38.0 5.24E-01 21.0 8.34E-01 17.0 9.81E-01
41.0 5.38E-01 23.0 8.34E-01 18.0 9.85E-01
44.0 5.51E-01 31.0 8.34E-01 19.0 9.87E-01
47.0 5.64E-01 32.0 8.34E-01 20.0 9.90E-01
50.0 5.79E-01 33.0 8.34E-01 21.0 9.91E-01
52.0 5.89E-01 34.0 8.34E-01 22.0 9.92E-01
54.0 6.00E-01 35.0 8.34E-01 23.0 9.93E-01
56.0 6.11E-01 36.0 8.34E-01 24.0 9.93E-01
58.0 6.24E-01 37.0 8.34E-01 25.0 9.94E-01
60.0 6.36E-01 38.0 8.34E-01 26.0 9.95E-01
62.0 6.48E-01 41.0 8.34E-01 27.0 9.96E-01
64.0 6.61E-01 42.0 8.34E-01 28.0 9.96E-01
66.0 6.73E-01 43.0 8.34E-01 29.0 9.96E-01
68.0 6.84E-01 44.0 8.34E-01 30.0 9.97E-01
70.0 6.96E-01 45.0 8.34E-01 31.0 9.97E-01
72.0 7.08E-01 46.0 8.34E-01 32.0 9.98E-01
74.0 7.21E-01 47.0 8.34E-01 33.0 9.98E-01
76.0 7.33E-01 48.0 8.34E-01 34.0 9.98E-01
78.0 7.43E-01 49.0 8.35E-01 35.0 9.98E-01
80.0 7.55E-01 50.0 8.35E-01 36.0 9.99E-01
82.0 7.66E-01 51.0 8.35E-01 37.0 9.99E-01
84.0 7.77E-01 52.0 8.35E-01 38.0 9.99E-01
86.0 7.87E-01 53.0 8.35E-01 39.0 9.99E-01
89.0 8.01E-01 54.0 8.36E-01 42.0 9.99E-01
92.0 8.14E-01 55.0 8.36E-01 43.0 9.99E-01
95.0 8.26E-01 56.0 8.37E-01 46.0 9.99E-01
98.0 8.37E-01 57.0 8.38E-01 47.0 9.99E-01
101.0 8.48E-01 58.0 8.38E-01 48.0 9.99E-01
104.0 8.58E-01 59.0 8.39E-01 51.0 1.00E+00
108.0 8.71E-01 60.0 8.40E-01 52.0 1.00E+00
112.0 8.83E-01 61.0 8.42E-01 55.0 1.00E+00
116.0 8.95E-01 62.0 8.43E-01 71.0 1.00E+00
120.0 9.06E-01 63.0 8.44E-01 71.0 1.00E+00
124.0 9.17E-01 64.0 8.46E-01
129.0 9.28E-01 65.0 8.48E-01
134.0 9.38E-01 66.0 8.50E-01
140.0 9.47E-01 67.0 8.51E-01
146.0 9.52E-01 68.0 8.53E-01
152.0 9.57E-01 69.0 8.56E-01
158.0 9.59E-01 70.0 8.58E-01
164.0 9.61E-01 71.0 8.60E-01
170.0 9.62E-01 72.0 8.62E-01
176.0 9.62E-01 73.0 8.65E-01
182.0 9.63E-01 74.0 8.68E-01
188.0 9.64E-01 75.0 8.70E-01
194.0 9.65E-01 76.0 8.73E-01
200.0 9.65E-01 77.0 8.76E-01
206.0 9.66E-01 78.0 8.79E-01
212.0 9.67E-01 79.0 8.81E-01
218.0 9.67E-01 80.0 8.83E-01
224.0 9.68E-01 81.0 8.86E-01
230.0 9.69E-01 82.0 8.88E-01
236.0 9.69E-01 83.0 8.90E-01
242.0 9.70E-01 84.0 8.93E-01
248.0 9.70E-01 85.0 8.94E-01
254.0 9.71E-01 86.0 8.96E-01
260.0 9.71E-01 87.0 8.98E-01
266.0 9.72E-01 88.0 9.00E-01
272.0 9.72E-01 89.0 9.02E-01
278.0 9.73E-01 90.0 9.03E-01
284.0 9.73E-01 91.0 9.05E-01
290.0 9.73E-01 92.0 9.06E-01
296.0 9.74E-01 93.0 9.08E-01
302.0 9.74E-01 94.0 9.09E-01
308.0 9.75E-01 95.0 9.11E-01
314.0 9.75E-01 96.0 9.11E-01
320.0 9.76E-01 97.0 9.13E-01
326.0 9.76E-01 98.0 9.13E-01
332.0 9.77E-01 99.0 9.14E-01
338.0 9.77E-01 100.0 9.15E-01
344.0 9.77E-01 101.0 9.16E-01
350.0 9.78E-01 102.0 9.16E-01
356.0 9.78E-01 103.0 9.17E-01
362.0 9.78E-01 104.0 9.18E-01
368.0 9.79E-01 105.0 9.18E-01
374.0 9.79E-01 106.0 9.19E-01
380.0 9.79E-01 107.0 9.19E-01
386.0 9.80E-01 108.0 9.20E-01
392.0 9.80E-01 109.0 9.20E-01
398.0 9.80E-01 110.0 9.21E-01
404.0 9.81E-01 111.0 9.21E-01
410.0 9.81E-01 112.0 9.22E-01
416.0 9.81E-01 113.0 9.23E-01
422.0 9.81E-01 114.0 9.23E-01
428.0 9.82E-01 115.0 9.24E-01
434.0 9.82E-01 116.0 9.25E-01
440.0 9.82E-01 117.0 9.26E-01
446.0 9.82E-01 118.0 9.27E-01
452.0 9.83E-01 119.0 9.27E-01
458.0 9.83E-01 120.0 9.28E-01
464.0 9.83E-01 121.0 9.29E-01
470.0 9.83E-01 122.0 9.31E-01
476.0 9.83E-01 123.0 9.32E-01
482.0 9.84E-01 124.0 9.33E-01
488.0 9.84E-01 125.0 9.34E-01
495.0 9.84E-01 126.0 9.36E-01
501.0 9.84E-01 127.0 9.37E-01
507.0 9.84E-01 128.0 9.39E-01
513.0 9.84E-01 129.0 9.41E-01
521.0 9.84E-01 130.0 9.43E-01
528.0 9.85E-01 131.0 9.44E-01
534.0 9.85E-01 132.0 9.46E-01
541.0 9.85E-01 133.0 9.48E-01
548.0 9.85E-01 134.0 9.50E-01
556.0 9.85E-01 135.0 9.52E-01
562.0 9.85E-01 136.0 9.54E-01
568.0 9.85E-01 137.0 9.56E-01
574.0 9.85E-01 138.0 9.58E-01
582.0 9.86E-01 139.0 9.60E-01
589.0 9.86E-01 140.0 9.62E-01
596.0 9.86E-01 141.0 9.64E-01
602.0 9.86E-01 142.0 9.66E-01
608.0 9.86E-01 143.0 9.68E-01
615.0 9.86E-01 144.0 9.70E-01
621.0 9.86E-01 145.0 9.72E-01
628.0 9.86E-01 146.0 9.74E-01
635.0 9.87E-01 147.0 9.75E-01
641.0 9.87E-01 148.0 9.77E-01
651.0 9.87E-01 149.0 9.78E-01
658.0 9.87E-01 150.0 9.80E-01
666.0 9.87E-01 151.0 9.82E-01
672.0 9.87E-01 152.0 9.83E-01
680.0 9.87E-01 153.0 9.84E-01
686.0 9.87E-01 154.0 9.85E-01
692.0 9.87E-01 155.0 9.87E-01
699.0 9.88E-01 156.0 9.88E-01
707.0 9.88E-01 157.0 9.89E-01
716.0 9.88E-01 158.0 9.90E-01
722.0 9.88E-01 159.0 9.91E-01
730.0 9.88E-01 160.0 9.92E-01
738.0 9.88E-01 161.0 9.92E-01
745.0 9.88E-01 162.0 9.93E-01
755.0 9.88E-01 163.0 9.94E-01
763.0 9.88E-01 164.0 9.95E-01
772.0 9.88E-01 165.0 9.95E-01
778.0 9.88E-01 166.0 9.96E-01
785.0 9.88E-01 167.0 9.96E-01
792.0 9.89E-01 168.0 9.96E-01
798.0 9.89E-01 169.0 9.97E-01
808.0 9.89E-01 170.0 9.97E-01
814.0 9.89E-01 171.0 9.97E-01
820.0 9.89E-01 172.0 9.97E-01
826.0 9.89E-01 173.0 9.97E-01
832.0 9.89E-01 174.0 9.98E-01
838.0 9.89E-01 175.0 9.98E-01
844.0 9.89E-01 176.0 9.98E-01
852.0 9.89E-01 177.0 9.98E-01
859.0 9.89E-01 178.0 9.98E-01
865.0 9.89E-01 179.0 9.98E-01
873.0 9.89E-01 180.0 9.98E-01
879.0 9.90E-01 181.0 9.98E-01
885.0 9.90E-01 182.0 9.98E-01
891.0 9.90E-01 183.0 9.98E-01
898.0 9.90E-01 184.0 9.98E-01
904.0 9.90E-01 185.0 9.98E-01
918.0 9.90E-01 187.0 9.98E-01
924.0 9.90E-01 188.0 9.98E-01
934.0 9.90E-01 189.0 9.98E-01
941.0 9.90E-01 190.0 9.98E-01
947.0 9.90E-01 191.0 9.98E-01
955.0 9.90E-01 192.0 9.99E-01
961.0 9.90E-01 193.0 9.99E-01
967.0 9.90E-01 194.0 9.99E-01
976.0 9.90E-01 195.0 9.99E-01
982.0 9.91E-01 196.0 9.99E-01
990.0 9.91E-01 197.0 9.99E-01
996.0 9.91E-01 198.0 9.99E-01
1002.0 9.91E-01 202.0 9.99E-01
1008.0 9.91E-01 203.0 9.99E-01
1014.0 9.91E-01 204.0 9.99E-01
1020.0 9.91E-01 205.0 9.99E-01
1027.0 9.91E-01 206.0 9.99E-01
1034.0 9.91E-01 207.0 9.99E-01
1042.0 9.91E-01 209.0 9.99E-01
1052.0 9.91E-01 210.0 9.99E-01
1060.0 9.91E-01 211.0 9.99E-01
1070.0 9.91E-01 212.0 9.99E-01
1081.0 9.91E-01 214.0 9.99E-01
1092.0 9.91E-01 225.0 9.99E-01
1099.0 9.91E-01 227.0 9.99E-01
1107.0 9.91E-01 228.0 9.99E-01
1113.0 9.92E-01 231.0 9.99E-01
1120.0 9.92E-01 232.0 9.99E-01
1131.0 9.92E-01 233.0 9.99E-01
1142.0 9.92E-01 234.0 9.99E-01
1148.0 9.92E-01 236.0 9.99E-01
1154.0 9.92E-01 237.0 9.99E-01
1162.0 9.92E-01 241.0 9.99E-01
1173.0 9.92E-01 244.0 9.99E-01
1182.0 9.92E-01 246.0 9.99E-01
1190.0 9.92E-01 248.0 9.99E-01
1201.0 9.92E-01 250.0 9.99E-01
1207.0 9.92E-01 252.0 9.99E-01
1215.0 9.92E-01 253.0 9.99E-01
1221.0 9.92E-01 254.0 9.99E-01
1228.0 9.92E-01 256.0 9.99E-01
1234.0 9.92E-01 257.0 9.99E-01
1240.0 9.92E-01 258.0 9.99E-01
1246.0 9.92E-01 259.0 9.99E-01
1256.0 9.92E-01 262.0 9.99E-01
1262.0 9.92E-01 263.0 9.99E-01
1268.0 9.93E-01 267.0 9.99E-01
1275.0 9.93E-01 268.0 9.99E-01
1281.0 9.93E-01 272.0 9.99E-01
1288.0 9.93E-01 274.0 9.99E-01
1294.0 9.93E-01 275.0 9.99E-01
1301.0 9.93E-01 277.0 9.99E-01
1307.0 9.93E-01 278.0 9.99E-01
1314.0 9.93E-01 279.0 9.99E-01
1320.0 9.93E-01 283.0 9.99E-01
1326.0 9.93E-01 285.0 9.99E-01
1332.0 9.93E-01 286.0 9.99E-01
1338.0 9.94E-01 287.0 9.99E-01
1344.0 9.94E-01 289.0 9.99E-01
1350.0 9.94E-01 291.0 9.99E-01
1356.0 9.94E-01 294.0 9.99E-01
1364.0 9.94E-01 295.0 9.99E-01
1370.0 9.94E-01 299.0 9.99E-01
1378.0 9.94E-01 300.0 9.99E-01
1384.0 9.94E-01 302.0 9.99E-01
1395.0 9.94E-01 306.0 9.99E-01
1401.0 9.94E-01 307.0 9.99E-01
1411.0 9.94E-01 309.0 1.00E+00
1417.0 9.95E-01 313.0 1.00E+00
1424.0 9.95E-01 316.0 1.00E+00
1432.0 9.95E-01 320.0 1.00E+00
1442.0 9.95E-01 325.0 1.00E+00
1449.0 9.95E-01 330.0 1.00E+00
1455.0 9.95E-01 333.0 1.00E+00
1461.0 9.95E-01 334.0 1.00E+00
1468.0 9.95E-01 335.0 1.00E+00
1474.0 9.95E-01 336.0 1.00E+00
1484.0 9.95E-01 341.0 1.00E+00
1492.0 9.95E-01 344.0 1.00E+00
1499.0 9.96E-01 346.0 1.00E+00
1510.0 9.96E-01 358.0 1.00E+00
1517.0 9.96E-01 365.0 1.00E+00
1524.0 9.96E-01 368.0 1.00E+00
1533.0 9.96E-01 369.0 1.00E+00
1539.0 9.96E-01 390.0 1.00E+00
1547.0 9.96E-01 393.0 1.00E+00
1553.0 9.96E-01 411.0 1.00E+00
1559.0 9.96E-01 422.0 1.00E+00
1565.0 9.96E-01 423.0 1.00E+00
1571.0 9.96E-01 449.0 1.00E+00
1577.0 9.97E-01 459.0 1.00E+00
1585.0 9.97E-01 462.0 1.00E+00
1592.0 9.97E-01 499.0 1.00E+00
1599.0 9.97E-01 505.0 1.00E+00
1605.0 9.97E-01 510.0 1.00E+00
1617.0 9.97E-01 545.0 1.00E+00
1626.0 9.97E-01 559.0 1.00E+00
1641.0 9.97E-01 593.0 1.00E+00
1647.0 9.97E-01 619.0 1.00E+00
1657.0 9.97E-01 644.0 1.00E+00
1664.0 9.97E-01 788.0 1.00E+00
1672.0 9.97E-01 828.0 1.00E+00
1683.0 9.97E-01 853.0 1.00E+00
1692.0 9.97E-01 935.0 1.00E+00
1699.0 9.97E-01 941.0 1.00E+00
1709.0 9.97E-01 1018.0 1.00E+00
1715.0 9.97E-01 1263.0 1.00E+00
1727.0 9.97E-01 1306.0 1.00E+00
1733.0 9.97E-01 1373.0 1.00E+00
1739.0 9.98E-01 1373.0 1.00E+00
1749.0 9.98E-01
1757.0 9.98E-01
1763.0 9.98E-01
1771.0 9.98E-01
1784.0 9.98E-01
1791.0 9.98E-01
1801.0 9.98E-01
1817.0 9.98E-01
1836.0 9.98E-01
1848.0 9.98E-01
1859.0 9.98E-01
1870.0 9.98E-01
1885.0 9.98E-01
1898.0 9.98E-01
1913.0 9.98E-01
1930.0 9.98E-01
1936.0 9.98E-01
1949.0 9.98E-01
1956.0 9.98E-01
1969.0 9.98E-01
1976.0 9.99E-01
1985.0 9.99E-01
1999.0 9.99E-01
2008.0 9.99E-01
2018.0 9.99E-01
2033.0 9.99E-01
2049.0 9.99E-01
2055.0 9.99E-01
2063.0 9.99E-01
2074.0 9.99E-01
2091.0 9.99E-01
2098.0 9.99E-01
2112.0 9.99E-01
2123.0 9.99E-01
2133.0 9.99E-01
2140.0 9.99E-01
2150.0 9.99E-01
2156.0 9.99E-01
2169.0 9.99E-01
2175.0 9.99E-01
2181.0 9.99E-01
2193.0 9.99E-01
2210.0 9.99E-01
2222.0 9.99E-01
2232.0 9.99E-01
2251.0 9.99E-01
2262.0 9.99E-01
2271.0 9.99E-01
2281.0 9.99E-01
2288.0 9.99E-01
2297.0 9.99E-01
2319.0 9.99E-01
2325.0 1.00E+00
2336.0 1.00E+00
2343.0 1.00E+00
2401.0 1.00E+00
2420.0 1.00E+00
2435.0 1.00E+00
2451.0 1.00E+00
2478.0 1.00E+00
2509.0 1.00E+00
2526.0 1.00E+00
2570.0 1.00E+00
2614.0 1.00E+00
2718.0 1.00E+00
2786.0 1.00E+00
3231.0 1.00E+00
3242.0 1.00E+00
3439.0 1.00E+00
4005.0 1.00E+00
4005.0 1.00E+00
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1E+00 1E+01 1E+02 1E+03 1E+04
Cum
ula
tive
Fra
ctio
n
Log of Normalized Drop Rate
1E+001E-011E-021E-031E-04
OpenTCP
1
OpenTCP2
TCP
(b)
Figure 5.7: Comparing the CDF of FCT (a) and drop rate (b) of OpenTCP1, OpenTCP2, and regular TCP while running the “all-to-all” IMB benchmark.
Experiment Results. To run our OpenTCP experiments, we first define two congestion control policies for OpenTCP: OpenTCP1 and OpenTCP2. We then compare the performance of OpenTCP1, OpenTCP2, and unmodified TCP in terms of Flow Completion Times (FCT) and packet drop rates using different benchmarks. Finally, we take a closer look at the overheads of OpenTCP and the impact of changing the refresh rate.

OpenTCP1 – Minimize FCTs. Our first experiment with OpenTCP aims at reducing FCTs by updating the initial congestion window size and the retransmission time-out (RTO). In SciNet, there is a strong correlation between MPI job completion times and the tail of FCT. Therefore, reducing FCTs will have a direct impact on job completion times.

OpenTCP1 collects link utilizations, the drop rate, the ratio of short-lived to long-lived flows, and the number of active flows. The optimization function is to find the appropriate value for the initial
[Histogram data for Figure 5.8 omitted: percentage of packets versus congestion window size (segments) for OpenTCP1 and unmodified TCP.]
Figure 5.8: Congestion window distribution for OpenTCP1 and unmodified TCP.
congestion window size (w0) as long as the maximum link utilization is kept below 70%. If the
link utilization goes above 70% for any of the links, the Oracle will stop sending CUEs and the
CCAs will fall back to the default values of the initial congestion window size and the RTO.
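To make the OpenTCP1 policy concrete, the following is a minimal sketch of the Oracle-side decision just described. Only the 70% utilization cap comes from the text; the NetworkStats and Cue types, the default window and RTO values, and the rule for picking the new init_cwnd are illustrative assumptions, not the actual optimizer used in our experiments.

```python
from dataclasses import dataclass
from typing import List, Optional

UTIL_THRESHOLD = 0.70        # stop sending CUEs above 70% link utilization (from the text)
DEFAULT_INIT_CWND = 3        # segments (illustrative default)
DEFAULT_RTO_MS = 200         # milliseconds (illustrative default)

@dataclass
class NetworkStats:
    link_utilizations: List[float]   # per-link utilization in [0, 1]
    drop_rate: float
    short_to_long_ratio: float       # short-lived to long-lived flows
    active_flows: int

@dataclass
class Cue:
    init_cwnd: int   # segments
    rto_ms: int

def opentcp1_policy(stats: NetworkStats) -> Optional[Cue]:
    """OpenTCP1: raise init_cwnd while every link stays under 70% utilization;
    otherwise send no CUE so the CCAs fall back to the kernel defaults."""
    busiest = max(stats.link_utilizations)
    if busiest > UTIL_THRESHOLD:
        return None                   # Oracle stops sending CUEs
    # Illustrative rule: grow init_cwnd with the spare capacity of the busiest link.
    spare = 1.0 - busiest
    init_cwnd = DEFAULT_INIT_CWND + int(40 * spare)
    return Cue(init_cwnd=init_cwnd, rto_ms=DEFAULT_RTO_MS)
```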
OpenTCP2 – Minimize FCTs, limit drops. While OpenTCP1 aims at improving FCTs by adapting init_cwnd and ignoring packet drops, we define a second variant, called OpenTCP2, which improves upon OpenTCP1 by keeping the packet drop rate below 0.1%. This means that when the CCA observes a packet drop rate above 0.1%, it reverts the initial congestion window size and the RTO to their default values.
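The end-host side of this conditional behaviour can be sketched in the same spirit. The 0.1% threshold is from the policy above; the setter callbacks, default values, and function name are hypothetical stand-ins for whatever mechanism the CCA uses to adjust the kernel's init_cwnd and RTO.

```python
DROP_THRESHOLD = 0.001   # 0.1% packet drop rate (from the policy above)

def apply_conditional_cue(cue, observed_drop_rate,
                          set_init_cwnd, set_rto_ms,
                          default_init_cwnd=3, default_rto_ms=200):
    """OpenTCP2 CCA: apply the Oracle's CUE only while the locally observed
    drop rate stays at or below 0.1%; otherwise revert to the defaults."""
    if observed_drop_rate > DROP_THRESHOLD:
        set_init_cwnd(default_init_cwnd)   # revert initial congestion window
        set_rto_ms(default_rto_ms)         # revert retransmission timeout
    else:
        set_init_cwnd(cue.init_cwnd)
        set_rto_ms(cue.rto_ms)
```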
Impact on flow completion times. Figure 5.7 depicts the CDF of flow completion times for OpenTCP1, OpenTCP2, and unmodified TCP. Here, the tail of the FCT curve for OpenTCP1 is almost 64% shorter than for unmodified TCP. As mentioned above, this leads to faster job completion in our experiments. Additionally, more than 45% of the flows finish in under 260 seconds in OpenTCP1, indicating a 62% improvement over unmodified TCP. Similarly, OpenTCP2 helps 80% of the flows finish in a fraction of a second, a significant improvement over OpenTCP1. This is because of OpenTCP2's fast reaction to drop rate at the end-hosts. In OpenTCP2, the tail of the FCT curve is 324 seconds, a 59% improvement over unmodified
TCP. The FCT tail of OpenTCP2 is slightly (8%) longer than that of OpenTCP1, since OpenTCP2 incorporates conditional CUEs and does not aggressively increase the initial congestion window. Moreover, note that the large gap between the FCT of TCP and the OpenTCP variants is due to the nature of the all-to-all benchmark, which stresses the SciNet network. In this benchmark, every process sends to and receives from all other processes. This creates behaviour similar to TCP incast throughput collapse [119].

Impact on congestion window size. The improvements in FCTs are a direct result of increasing init_cwnd in OpenTCP1. Figure 5.8 presents a histogram of congestion window sizes for OpenTCP1 and unmodified TCP. In the case of unmodified TCP, more than 90% of packets are sent when the source has a congestion window size of three, four, or five segments. OpenTCP1, however, is able to operate over a larger range of congestion window sizes: more than 50% of packets are sent while the congestion window size is greater than five segments.
Impact on packet drops. The FCT improvements in OpenTCP1 and OpenTCP2 come at a price. Clearly, operating at larger congestion window sizes results in a higher probability of congestion in the network and, thus, may lead to packet drops. As Figure 5.9 shows, the drop rate distribution for unmodified TCP has a shorter tail compared to OpenTCP1 and OpenTCP2. As expected, OpenTCP1 introduces a considerable number of drops in the network, since its CUEs are not conditional and thus the CCAs do not react to packet drops. OpenTCP2 has no drops for more than 81% of the flows, a significant improvement over unmodified TCP. However, the highest 18% of drops in OpenTCP2 are worse than those in unmodified TCP, as OpenTCP2 needs to observe packet drops before it can react. Some flows take the hit in this case and might end up with relatively large drop rates.

The Flash Benchmark. All the experiments described so far use the “all-to-all” IMB benchmark. The Flash benchmark is a selection of user jobs that we use to evaluate OpenTCP under realistic workloads. Flash solves the Sedov point-blast wave problem [109], a self-similar description of the evolution of a blast wave arising from a powerful explosion. The Flash benchmark represents a common implementation and traffic pattern in SciNet; it generates
[Plot data omitted: panel (a), cumulative fraction versus FCT (seconds) for OpenTCP1, OpenTCP2, and TCP.]
1610.0 9.79E-01 2413.0 1.00E+00
1612.0 9.79E-01 2424.0 1.00E+00
1620.0 9.80E-01 2434.0 1.00E+00
1627.0 9.80E-01 2445.0 1.00E+00
1629.0 9.80E-01 2462.0 1.00E+00
1631.0 9.80E-01 2475.0 1.00E+00
1635.0 9.80E-01 2495.0 1.00E+00
1637.0 9.80E-01 2525.0 1.00E+00
1639.0 9.80E-01 2562.0 1.00E+00
1641.0 9.80E-01 2572.0 1.00E+00
1648.0 9.81E-01 2654.0 1.00E+00
1650.0 9.81E-01 2674.0 1.00E+00
1652.0 9.81E-01 2766.0 1.00E+00
1657.0 9.81E-01 2830.0 1.00E+00
1659.0 9.81E-01 2916.0 1.00E+00
1663.0 9.81E-01 3125.0 1.00E+00
1665.0 9.81E-01 3182.0 1.00E+00
1668.0 9.82E-01 3778.0 1.00E+00
1670.0 9.82E-01 3778.0 1.00E+00
1673.0 9.82E-01
1678.0 9.82E-01
1681.0 9.82E-01
1683.0 9.82E-01
1686.0 9.83E-01
1688.0 9.83E-01
1691.0 9.83E-01
1693.0 9.83E-01
1697.0 9.83E-01
1699.0 9.84E-01
1701.0 9.84E-01
1703.0 9.84E-01
1708.0 9.84E-01
1712.0 9.84E-01
1716.0 9.84E-01
1719.0 9.84E-01
1725.0 9.85E-01
1730.0 9.85E-01
1733.0 9.85E-01
1737.0 9.85E-01
1741.0 9.85E-01
1743.0 9.85E-01
1752.0 9.85E-01
1759.0 9.85E-01
1762.0 9.86E-01
1766.0 9.86E-01
1774.0 9.86E-01
1776.0 9.86E-01
1778.0 9.86E-01
1780.0 9.86E-01
1790.0 9.87E-01
1792.0 9.87E-01
1800.0 9.87E-01
1802.0 9.87E-01
1807.0 9.87E-01
1810.0 9.87E-01
1822.0 9.87E-01
1826.0 9.87E-01
1828.0 9.88E-01
1830.0 9.88E-01
1835.0 9.88E-01
1837.0 9.88E-01
1844.0 9.88E-01
1848.0 9.88E-01
1861.0 9.88E-01
1865.0 9.89E-01
1867.0 9.89E-01
1874.0 9.89E-01
1878.0 9.89E-01
1882.0 9.89E-01
1885.0 9.89E-01
1887.0 9.89E-01
1889.0 9.89E-01
1892.0 9.89E-01
1899.0 9.90E-01
1906.0 9.90E-01
1910.0 9.90E-01
1913.0 9.90E-01
1920.0 9.90E-01
1922.0 9.90E-01
1930.0 9.90E-01
1934.0 9.90E-01
1939.0 9.91E-01
1943.0 9.91E-01
1946.0 9.91E-01
1948.0 9.91E-01
1950.0 9.91E-01
1952.0 9.91E-01
1959.0 9.91E-01
1966.0 9.91E-01
1973.0 9.91E-01
1977.0 9.92E-01
1984.0 9.92E-01
1999.0 9.92E-01
2004.0 9.92E-01
2007.0 9.92E-01
2032.0 9.92E-01
2036.0 9.92E-01
2041.0 9.92E-01
2047.0 9.92E-01
2051.0 9.93E-01
2059.0 9.93E-01
2061.0 9.93E-01
2067.0 9.93E-01
2072.0 9.93E-01
2075.0 9.93E-01
2077.0 9.93E-01
2083.0 9.93E-01
2086.0 9.93E-01
2093.0 9.94E-01
2095.0 9.94E-01
2103.0 9.94E-01
2108.0 9.94E-01
2110.0 9.94E-01
2124.0 9.94E-01
2126.0 9.94E-01
2128.0 9.95E-01
2130.0 9.95E-01
2132.0 9.95E-01
2134.0 9.95E-01
2136.0 9.95E-01
2159.0 9.95E-01
2162.0 9.95E-01
2165.0 9.95E-01
2175.0 9.95E-01
2177.0 9.95E-01
2179.0 9.95E-01
2184.0 9.95E-01
2187.0 9.96E-01
2191.0 9.96E-01
2193.0 9.96E-01
2195.0 9.96E-01
2199.0 9.96E-01
2201.0 9.96E-01
2211.0 9.96E-01
2213.0 9.96E-01
2240.0 9.96E-01
2243.0 9.96E-01
2246.0 9.97E-01
2250.0 9.97E-01
2257.0 9.97E-01
2263.0 9.97E-01
2278.0 9.97E-01
2293.0 9.97E-01
2307.0 9.97E-01
2313.0 9.97E-01
2321.0 9.97E-01
2341.0 9.98E-01
2371.0 9.98E-01
2373.0 9.98E-01
2378.0 9.98E-01
2390.0 9.98E-01
2392.0 9.98E-01
2395.0 9.98E-01
2406.0 9.98E-01
2410.0 9.98E-01
2429.0 9.98E-01
2453.0 9.98E-01
2456.0 9.98E-01
2458.0 9.99E-01
2462.0 9.99E-01
2478.0 9.99E-01
2495.0 9.99E-01
2498.0 9.99E-01
2501.0 9.99E-01
2532.0 9.99E-01
2541.0 9.99E-01
2570.0 9.99E-01
2605.0 9.99E-01
2612.0 9.99E-01
2622.0 9.99E-01
2652.0 1.00E+00
2753.0 1.00E+00
2794.0 1.00E+00
3003.0 1.00E+00
3075.0 1.00E+00
3211.0 1.00E+00
3211.0 1.00E+00
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1E+00 1E+01 1E+02 1E+03 1E+04
Cum
ula
tive
Fra
ctio
n
1E-04 1E-03 1E-02 1E-01 1E-00
OpenTCP1
OpenTCP2TCP
Log of Normalized Drop Rate
(b)
Figure 5.9: Comparing the CDF of FCT (a) and drop rate (b) of OpenTCP1, OpenTCP2, and regular TCP while running the “Flash” benchmark.
about 12,000 flows. Figure 5.9 compares the CDF of flow completion times and drop rates for
OpenTCP1, OpenTCP2, and unmodified TCP. Consistent with the results presented above, both OpenTCP1
and OpenTCP2 improve the tail of the FCT curve by at least 40%. OpenTCP2, however, is more
successful at reducing the drop rate, as it has an explicit rule for controlling drops.
OpenTCP has low overhead. Table 5.1 divides OpenTCP’s overhead into four categories: (i)
the CPU overhead associated with data aggregation and CUE generation at the Oracle, (ii) the
CPU overhead associated with data collection and CUE enforcement in the end-hosts, (iii)
the required bandwidth to transfer statistics from the end-hosts to the Oracle, and (iv) the
required bandwidth to disseminate CUEs to the end-hosts. We summarize OpenTCP’s overhead
for refresh periods of 1, 5, and 10 minutes in Table 5.1. The table shows that OpenTCP has a
negligible processing and bandwidth overhead, making it easy to scale OpenTCP to large clusters
Table 5.1: The approximate OpenTCP overhead for a network with ∼4000 nodes and ∼100 switches for different Oracle refresh cycles.
                           1 min   5 min   10 min
Oracle CPU Overhead (%)     0.9     0.0      0.0
CCA CPU Overhead (%)        0.5     0.1      0.0
Data Collection (Kbps)      453     189       95
CUE Dissemination (Kbps)    530     252       75
of hundreds of thousands of nodes. Clearly, OpenTCP’s refresh cycle plays a critical role in its
overhead on networking elements.
Stability. To realize OpenTCP in practical settings, we assume that the network operator has
expertise in defining the congestion control policies appropriate for the network. Moreover,
to achieve pragmatic stability, there should be a monitoring system in place which alerts the
operator of churns and instabilities in the network. This monitoring component should alert
the controller whenever there is an oscillation between states. For example, if the Oracle makes
adjustments to TCP flows at time t1 and those changes are reverted immediately at time t1 + T,
that is a good indication of a potentially unstable condition. In this case, the operator should
be notified to adjust the congestion control policies, the stability constraints, or the overall
resources in the network. One simple stability metric is the number of times a rule is applied and
then reverted. The monitoring system can measure such stability metrics and alert the operator.
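As a rough illustration of such a metric, the following Go sketch counts apply-and-revert transitions per rule within a refresh cycle; the StabilityMonitor type, its threshold, and the logging alert are hypothetical and not part of OpenTCP's implementation.

package opentcp

import "log"

// StabilityMonitor counts how often each congestion-control rule is applied
// and then reverted within one Oracle refresh cycle. The type, the threshold,
// and the alert below are illustrative only; OpenTCP's monitoring component
// may be implemented differently.
type StabilityMonitor struct {
	flips     map[string]int // rule ID -> apply/revert transitions this cycle
	threshold int            // transitions considered an oscillation
}

func NewStabilityMonitor(threshold int) *StabilityMonitor {
	return &StabilityMonitor{flips: make(map[string]int), threshold: threshold}
}

// RecordTransition is called whenever a rule is applied or reverted.
func (m *StabilityMonitor) RecordTransition(ruleID string) {
	m.flips[ruleID]++
	if m.flips[ruleID] >= m.threshold {
		// A real deployment would notify the operator; here we only log.
		log.Printf("rule %s flipped %d times this cycle", ruleID, m.flips[ruleID])
	}
}

// ResetCycle clears the counters at the start of each refresh cycle.
func (m *StabilityMonitor) ResetCycle() {
	m.flips = make(map[string]int)
}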
Beyond adapting TCP parameters to network and traffic conditions, OpenTCP can also incorporate
application semantics into its decision-making process. For example, the target streaming rate
of video servers can be fed to OpenTCP's Oracle. Similarly, the sizes of files being transferred
can be used by OpenTCP to derive hints for TCP sessions.
5.2 Beehive SDN Controller
Beehive is a generic platform for distributed programming. As such, Beehive provides the
building blocks for an SDN controller, not the specific functionalities of a controller. That is,
in our design, the SDN controller is an application (and perhaps the most important use-case)
of Beehive. In this section, we discuss the SDN controller we have built on top of Beehive to
showcase the effectiveness of our proposal for SDN. It is important to note that the controller
we present here is only one possible design, and Beehive can be used as a framework to develop
other SDN control plane abstractions.
SDN Controller Suite. We have designed and implemented an SDN abstraction on top of
Beehive. Our implementation is open-source [3]. This abstraction can act as a general-purpose,
distributed SDN controller as-is, or can be reused by the community to create other networking
systems. Our abstraction is formed based on simple and familiar SDN constructs and is
implemented as shown in Figure 5.10.
Network object model (NOM) is the core of our SDN abstraction that represents the network
in the form of a graph (similar to ONIX NIB [75]): nodes, links, ports, and flow entries in a
southbound protocol-agnostic manner. We skip the definitions of nodes, ports, links, and flow
entries, as they are generic and well known.
In addition to these common and well-known graph constructs, Beehive’s SDN abstraction
supports triggers and paths that will simplify network programming at scale. Triggers are
basically active thresholds on bandwidth consumption and duration of flow entries. They are
installed by network applications and pushed close to their respective switches by the node
controller (explained shortly). Triggers are a scalable replacement for polling switches directly
from different control applications. A path is an atomic, transactional end-to-end forwarding
rule that is automatically compiled into individual flows across switches in the network. Using
paths, control applications can rely on the platform to manage the details of routing for them.
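To make these two constructs concrete, the sketch below shows how a control application might register a bandwidth trigger and request an end-to-end path. The NodeID, Trigger, and Path types and the emit callback are simplified stand-ins for the abstraction described here, not the exact API of beehive-netctrl.

package nomsketch

import "time"

// NodeID, Trigger, and Path are simplified stand-ins for the NOM constructs
// described in the text; the actual beehive-netctrl types may differ.
type NodeID string

type Trigger struct {
	Node         NodeID
	BandwidthBps uint64        // fire when a flow entry exceeds this rate
	Duration     time.Duration // or when it has been installed this long
}

type Path struct {
	Src, Dst NodeID // edge switches; the platform compiles the per-switch flows
	Priority uint16
}

// installPolicies registers a trigger and requests a path. emit stands in for
// Beehive's message-emission primitive available inside a handler.
func installPolicies(emit func(msg interface{})) {
	// Watch for elephant flows on switch s1 without polling it from the app.
	emit(Trigger{Node: "s1", BandwidthBps: 100_000_000, Duration: 10 * time.Second})

	// Ask the path manager for an atomic end-to-end rule; the platform breaks
	// it into individual flow entries on every switch along the route.
	emit(Path{Src: "edge-1", Dst: "edge-7", Priority: 10})
}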
Node Controller. In our design, node controllers are the active components that manage
[Figure 5.10 diagram omitted: hives hosting the controller, drivers, consolidator, monitor, and path manager, with flows, triggers, and paths exchanged between master and slave bees and the switches.]
Figure 5.10: Beehive’s SDN controller suite.
the NOM objects. To make them protocol-agnostic, we have drivers that act as southbound
mediators. That is, drivers are responsible for mapping physical networking entities into NOM logical
entities. The node controller is responsible for managing those drivers (e.g., electing a master among
them). The node controller stores its data on a per-switch basis and is automatically distributed
and placed close to switches.
The node controller is responsible for configurations at the node level. At a higher level, we have path
managers that install logical paths on network nodes. We have implemented the path manager
as a centralized application, since it requires a global view of the network to install the proper flow
entries. Note that all the expensive functionalities are delegated to the node controller, which is
a distributed application and close to switches.
Consolidator. There is always a possibility of discrepancy between the state of a switch and the
state of a node controller: the switch may remove a flow right when the master driver fails, or a
switch may restart while a flow is being installed by the driver. The consolidator is essentially a
poller that removes any flow that is not in the controller's state, and notifies the controller if the
switch has removed any previously installed flows.
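A minimal reconciliation pass in that spirit might look like the following sketch; the FlowEntry type, the two flow snapshots, and the removal callback are hypothetical placeholders rather than the consolidator's actual interfaces.

package consolidator

// FlowEntry, the two snapshots, and removeFromSwitch are illustrative
// placeholders; the real consolidator works against the NOM and the driver.
type FlowEntry struct{ Match, Action string }

// reconcile removes from the switch any flow unknown to the controller and
// returns the previously installed flows that the switch has silently dropped,
// so the node controller can be notified.
func reconcile(onSwitch, inController []FlowEntry,
	removeFromSwitch func(FlowEntry)) (missing []FlowEntry) {

	want := make(map[FlowEntry]bool, len(inController))
	for _, f := range inController {
		want[f] = true
	}
	have := make(map[FlowEntry]bool, len(onSwitch))
	for _, f := range onSwitch {
		have[f] = true
		if !want[f] {
			removeFromSwitch(f) // flow not in the controller's state: clean it up
		}
	}
	for _, f := range inController {
		if !have[f] {
			missing = append(missing, f) // installed flow vanished from the switch
		}
	}
	return missing
}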
Monitor. The monitor is another poller in the controller suite that implements triggers. For each
node, the monitor periodically sends flow stat queries to find out whether any trigger should
[Figure 5.11 diagrams and plots omitted: panels (a)–(c) show the placement of the master and slave bees of the node controller, the forwarding application, and the driver for switch S1 across hives; panels (d)–(f) plot the reactive forwarding latency in milliseconds (log scale) over 10 seconds, marking the failure, failover, and placement-optimization events.]
Figure 5.11: Fault-tolerance characteristics of the control suite using a Raft election timeout of 300ms: (a) The default placement. (b) When the newly elected masters are on the same hive. (c) When the new masters are on different hives. (d) Reactive forwarding latency of (b). (e) Reactive forwarding latency of (c). (f) Reactive forwarding latency of (c), improved by automatic placement optimization at second 6.
fire. The monitor is distributed on a per-node basis and, in most cases, is placed next to the
node's controller. This reduces control channel overheads.
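As an illustration, a per-node trigger check could look like the following sketch; the FlowStat and Trigger shapes and the fire callback are assumed names rather than the controller suite's actual types.

package monitor

import "time"

// FlowStat and Trigger are assumed shapes for the statistics the monitor polls
// and the thresholds installed by applications; the real types live in the NOM.
type FlowStat struct {
	Match   string
	RateBps uint64
	Age     time.Duration
}

type Trigger struct {
	Match       string
	MaxRateBps  uint64
	MaxDuration time.Duration
}

// checkTriggers fires every trigger whose bandwidth or duration threshold is
// exceeded by a matching flow statistic returned from the node.
func checkTriggers(stats []FlowStat, triggers []Trigger, fire func(Trigger, FlowStat)) {
	for _, t := range triggers {
		for _, s := range stats {
			if s.Match != t.Match {
				continue
			}
			if s.RateBps > t.MaxRateBps || s.Age > t.MaxDuration {
				fire(t, s)
			}
		}
	}
}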
Beehive vs Centralized Controllers. To compare the simplicity of Beehive with centralized
controllers, we have implemented a hub, a learning switch, a host tracker, a firewall (i.e.,
ACL at the edge) and a layer-3 router with respectively 16, 69, 158, 237 and 310 lines of code
using Beehive's SDN abstraction. These applications are similar in functionality and on par
in simplicity with what is available in POX and Beacon. The differences in lines of code are
due to the different verbosity of Python, Java, and Go. The significant difference here is that
Beehive SDN applications automatically benefit from persistence, fault-tolerance, runtime
instrumentation, and optimized placement, readily available in Beehive. We admit that the lines
of code, although commonly used, are not a perfect measure of application complexity.
Finding a proper complexity metric for control applications is beyond the scope of this thesis,
and we leave it to future work.
Fault-Tolerance. To tolerate a failure (i.e., either a controller failure or a lost switch-to-controller
channel), our node controller application pings the drivers of each node (by default, the
controller sends one ping every 100ms). If a driver does not respond to a few (3 by default)
consecutive pings from the node controller, the driver is declared dead. If the lost driver is the master
driver, one of the slave drivers will be instructed to become the master driver. It is important to
note that when a hive that hosts the node controller fails, the platform will automatically delegate
the node to a follower bee on another hive. In such scenarios, once the node controller resumes,
it continues to health-check the drivers.
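The sketch below captures the gist of this health-check loop; the ping interval and the three-strike rule follow the defaults stated above, while the driver handle and the ping and promote callbacks are hypothetical.

package nodectrl

import "time"

// driver is a placeholder handle for a NOM driver; ping and promote stand in
// for the real messages exchanged between the node controller and drivers.
type driver struct {
	id     string
	master bool
	missed int
}

const (
	pingInterval = 100 * time.Millisecond // default ping period
	maxMissed    = 3                      // missed pings before a driver is declared dead
)

// healthCheck pings every driver once per interval. ping reports whether the
// driver answered; promote elects a new master among the remaining drivers.
func healthCheck(drivers []*driver, ping func(*driver) bool, promote func([]*driver)) {
	ticker := time.NewTicker(pingInterval)
	defer ticker.Stop()
	for range ticker.C {
		for _, d := range drivers {
			if ping(d) {
				d.missed = 0
				continue
			}
			d.missed++
			if d.missed < maxMissed {
				continue
			}
			// Driver declared dead: if it was the master, elect a new one.
			if d.master {
				d.master = false
				promote(drivers)
			}
		}
	}
}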
To evaluate how our SDN controller reacts to faults, we have developed a reactive forwarding
application that learns layer-2 and layer-3 addresses, but instead of installing flow entries, it
sends PacketOuts. With that, a failure in the control plane is directly observed in the data
plane. In this experiment, we create a simple network emulated in mininet, and measure the
round-trip time between two hosts every 100ms for 10 seconds. To emulate a fault, we kill the
hive that hosts the OpenFlow driver at second 3. In this set of experiments, we use a Raft election
timeout of 300ms.
Before presenting the results, let us first clarify what happens to the control applications
upon a failure. As shown in Figure 5.11a, because of the default placement, the master bees of
node controller and the forwarding application will be collocated on the hive to which the switch
first connects. The driver on that hive will consequently be the master driver. In this setting, the
average latency of the forwarding application will be about 1ms.
Failures in any of the hives hosting the slave bees would have no effect on the control
application (given that the majority of hives – 2 in this case – are up and running). Now, when
the hive that hosts the master bees fails (or goes into a minority partition), one of the follower bees
of each application will be elected as the master bee for that application. There are two possibilities
here: (i) as shown in Figure 5.11b, the new masters may both be collocated on the same hive, or (ii)
as shown in Figure 5.11c, the new masters may be on different hives3.
As portrayed in Figure 5.11d, when the new masters are elected from the same hive the
latency after the failover will be as low as 1ms. In contrast, when the new masters are located
on different hives, the latency will be as high as 8ms (Figure 5.11e). This is where the automatic
placement optimization can be of measurable help. As shown in Figure 5.11f, our greedy
algorithm detects the suboptimality and migrates the bees accordingly. After a short spike
in latency, caused by the buffering of a couple of messages during migration, the forwarding
latency returns to 1ms. It is important to note that, as demonstrated in Figure 4.11, the Raft election
timeout can be tuned to lower values if a faster failover is required in an environment.
When the link between the switch and the master driver fails (i.e., due to an update in
the control network, congestion in channels, or a switch reconfiguration), something similar
happens (Figure 5.12a). In such scenarios, the master driver notifies the node controller that its
switch is disconnected. In response, the node controller elects one of the remaining slave drivers
as the master. Although the forwarding application and the master driver are not from the
same hive anymore, the forwarding application will perform consistently. �is is an important
3We cannot prefer one bee over another in a colony at election time, since doing so would undermine the provable convergence of Raft.
property of Beehive. As shown in Figure 5.12b, the forwarding latency would accordingly be
high at first. As soon as the automatic optimizer detects this suboptimality, the bees are migrated
and the latency drops. We note that, since the drivers have IO routines, they will never be
migrated by the optimizer. As a result, it is always the other bees that will be placed close to
those IO channels.
Scalability. To characterize the scalability of the Beehive SDN controller, we benchmarked our
learning switch application deployed on 1, 3, and 5 hives with no persistence as well as replication
factors of 1, 3, and 5. We stress each hive with one cbench [111] process. In these experiments, we
use 5 GCE virtual machines with 16 virtual cores, 14.4GB of RAM, and directly attached SSDs.
We launch cbench on either 1, 3, or 5 nodes in parallel, each emulating 16 different switches. That
is, when we use 1, 3, and 5 nodes to stress the application we have 16, 48, and 80 switches equally
spread among 1, 3, and 5 hives, respectively. Moreover, all SDN applications in our controller
suite (e.g., the node controller and discovery) are enabled as well as runtime instrumentation.
Although disabling all extra applications and using a Hub will result in a throughput of one
million messages per second on a single machine, we believe such a benchmark is not a realistic
demonstration of the real-world performance of a control platform.
[Figure 5.12 diagram and plot omitted: panel (a) shows the master and slave drivers of switch S1 across hives; panel (b) plots the reactive forwarding latency in milliseconds (log scale) over 10 seconds, marking the disconnect and the placement optimization.]
Figure 5.12: When a switch disconnects from its master driver: (a) One of the slave drivers will become the master. (b) The reactive forwarding latency is optimized using the placement heuristic.
As shown in Figure 5.13, when we stress only 1 node in the cluster the throughputs of the
learning switch application with di�erent replication factors are very close. As we engage more
controllers, the throughput of the application with replication factors of 3 and 5 increases, but
at a slower pace. This is because, when we use larger replication factors, other hives
will consume resources to replicate the transactions in the follower bees. Moreover, Beehive’s
throughput is significantly higher than that of ONOS 1.2 [17] for the same forwarding functionality.
This experiment demonstrates that, although Beehive is easy to use, it can scale considerably
well. In essence, hiding the boilerplates of distributed programming enables us to systematically
optimize the control plane for all applications. In these experiments, ONOS replicates the
forwarding application on all nodes. As a result, unlike Beehive, we cannot use a replication
factor smaller than the cluster size. For example, when we use a 5-node cluster, ONOS uses all
the five nodes and we cannot run an experiment with a replication factor of 3.
[Figure 5.13 plot omitted: throughput in messages per second versus the number of nodes, for Beehive with replication factors 0, 1, 3, and 5, and for ONOS with replication factors 0, 1, 3, and 5.]
Figure 5.13: Throughput of Beehive's learning switch application and ONOS's forwarding application in different cluster sizes. The number of nodes (i.e., the x-axis) in this graph represents the nodes on which we launch cbench. For Beehive, we always use a 5-node cluster. For ONOS, the cluster size is equal to the replication factor, since all nodes in the cluster participate in running the application.
5.3 Simple Routing in Beehive
Routing mechanisms can be more challenging to implement on existing distributed control
platforms compared to a centralized controller, especially if we want to realize a scalable routing
algorithm that requires no manual configuration and is not topology-specific. In this section,
we demonstrate how simple it is to design and implement a generic k-shortest path algorithm
using Beehive.
Distributed Shortest Path. Suppose we already have a topology discovery algorithm that
[Figure 5.14 sketch: the Routing application keeps two dictionaries, Neighbors and Routes. On Rcv(msg LinkDiscovered), it updates Neighbors and emits RouteAdv{msg.From, msg.To}. On Rcv(msg RouteAdv), it sets dist = msg.Distance and, if dist is a shortest path, emits RouteAdv{To: msg.To, From: N, Distance: dist + 1} for each N in Neighbors.]
Figure 5.14: A distributed shortest path algorithm.
automatically emits a Discovery message whenever an edge is found in the graph. To calculate
the shortest paths in a distributed fashion, we employ an algorithm that is very similar to
traditional distance vector routing protocols. As such, we will emit routing advertisement
messages to signal that we have found a new path towards a given destination. The routing
advertisements are in the form of R = (N1, N2, D), which means N1 has a route to N2 with a length
of D. Whenever a node N receives an advertisement, it emits further advertisements by adding its
neighbors to the route iff the received advertisement is a shortest path. For example, suppose N2,
N3, and N4 are the neighbors of N1. When N1 receives a shortest route R = (N1, A, D) to A, it emits
(N2, A, D + 1), (N3, A, D + 1), and (N4, A, D + 1). This process continues until the network
converges.
In such a setting, the state of our application is accessed on a per-node basis. That is, as shown
in Figure 5.14, we have a dictionary that stores the neighbors of each node, and a dictionary that
stores all the routes initiated from each node. Such a simple application can be easily distributed
on several hives to scale.
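A handler-style rendering of the advertisement logic above is sketched below in Go; the message types, the state struct, and the emit callback are simplified stand-ins inspired by Figure 5.14 rather than the exact Beehive API, which would also supply the Map function that keys this state on a per-node basis.

package routing

// Node, RouteAdv, and LinkDiscovered are simplified versions of the messages
// in Figure 5.14; Beehive's real handler signature and dictionary API differ.
type Node string

type RouteAdv struct {
	From, To Node
	Distance int
}

type LinkDiscovered struct{ From, To Node }

// state stands in for the per-node cells of the Neighbors and Routes
// dictionaries that Beehive maps this application onto.
type state struct {
	Neighbors map[Node][]Node
	Routes    map[Node]map[Node]int // Routes[src][dst] = best known distance
}

func newState() *state {
	return &state{
		Neighbors: make(map[Node][]Node),
		Routes:    make(map[Node]map[Node]int),
	}
}

// rcvLink records the new edge and advertises a one-hop route to the neighbor.
func (s *state) rcvLink(msg LinkDiscovered, emit func(RouteAdv)) {
	s.Neighbors[msg.From] = append(s.Neighbors[msg.From], msg.To)
	emit(RouteAdv{From: msg.From, To: msg.To, Distance: 1})
}

// rcvAdv re-advertises a route through every neighbor iff it is shorter than
// the best route currently stored for that destination.
func (s *state) rcvAdv(msg RouteAdv, emit func(RouteAdv)) {
	if best, ok := s.Routes[msg.From][msg.To]; ok && msg.Distance >= best {
		return // not a shortest path; nothing to propagate
	}
	if s.Routes[msg.From] == nil {
		s.Routes[msg.From] = make(map[Node]int)
	}
	s.Routes[msg.From][msg.To] = msg.Distance
	for _, n := range s.Neighbors[msg.From] {
		emit(RouteAdv{From: n, To: msg.To, Distance: msg.Distance + 1})
	}
}

Because both handlers read and write only the cells keyed by the receiving node, the platform can partition this state across hives and run the handlers in parallel.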
Convergence Evaluation. We have implemented this routing algorithm using Beehive in a few
hundred lines of code and evaluated our implementation using a simulated network. In these
simulations, we assume a k-ary fat-tree network topology with homogeneous switches of 4, 8,
16, 32, 64, and 128 ports (respectively with 8, 16, 128, 512, 2k, 8k edge switches connecting
16, 128, 1k, 8k, 65k, 524k end-hosts). For the purpose of simulation, we use two auxiliary
control applications: one randomly emits discovery events based on the simulated network
topology, and the other receives and calculates the statistics for rule installation commands
from the routing module. We define the convergence time as the difference between the first
discovery message and the last rule installation, that is, the time it takes to provide full multi-path
connectivity between every pair of edge switches in the network.
Using a cluster of six controllers, each allocated four cores and 10GB of memory, we
measured the convergence time (i) when the platform does a cold start (i.e., has no information
about the network topology), (ii) when a new POD is added to the network and (iii) when a new
link is added to the network. In all experiments, each controller controls an equal number of
core switches and PODs.
Note that we have O(k²) alternative paths between two edge switches in a k-ary fat-tree
network and, hence, O(k⁶) alternative paths in total. When we use high-density switches with
64 or 128 ports in a fat-tree network, the number of Route messages exchanged and the memory
needed to store the routes are well beyond the capacity of a single commodity VM. This is where Beehive's
control plane demonstrates its scalability.
As shown in Figure 5.15a, considering the size of the network, this algorithm converges
in a reasonable time from a cold state. Moreover, when we add a new POD, the system converges
in less than a second for switches of 32-ports or less (Figure 5.15b). More interestingly, as
shown in Figure 5.15c, when we randomly add a new link, the system converges in less than a
second, even for larger networks. By allocating more resources for Beehive, one can seamlessly
improve these results with no modifications to the routing application. To demonstrate this, we
conducted another set of experiments for a fat-tree network built from 64-port switches. As
shown in Figure 5.15d, Beehive’s control plane seamlessly scales as we add more controllers and
the convergence time continually diminishes. We note that, if additional computing resources
are available, adding a new controller requires no additional configuration and is practically
[Figure 5.15 plots omitted; each panel shows convergence time versus the number of ports per switch: (a) convergence time from a cold state when there is no route; (b) convergence time when a new POD is added to the network; (c) convergence time when a new link is added between two randomly selected switches; (d) adding more controllers results in faster convergence (using 64-port switches).]
Figure 5.15: Convergence time of the flat routing algorithm in a k-ary fat-tree topology.
seamless in Beehive.
5.4 Remarks
In this chapter, we presented real-world use cases of our proposals. OpenTCP, the Beehive SDN
controller, and the simple routing algorithm demonstrate that Kandoo and Beehive are quite
scalable, despite their simplicity. It is important to note that our proposals are not limited to
these specific use cases and can be applied to a wide range of problems beyond SDN applications.
For instance, Beehive can be utilized to implement a key-value store and a message queue that
are scalable, distributed, and automatically optimized.
Chapter 6
Conclusion and Future Directions
In this thesis, we have deconstructed several scalability concerns of SDN and proposed two
distributed programming models, Kandoo and Beehive, that can scale well while preserving
simplicity. We showed that some of the presumed limitations of SDN are not inherent and only
stem from the historical evolution of OpenFlow. Using the real-world use cases of OpenTCP
and the Beehive SDN controller, we demonstrated that such limitations can be easily overcome
using an aptly designed software platform that is able to accommodate various control functions.
In essence, with a well-designed distributed network programming paradigm, we are able to
preserve the promised simplicity of SDN at scale.
The two-layer hierarchy of controllers in Kandoo (Chapter 3) gives network programmers
the option of choosing between local and network-wide state to implement di�erent control
functionalities. Local controllers, without the need for a centralized network view, have high
parallelism and can be placed close to forwarding elements to shield the control channels from
the load of frequent events. Non-local applications, on the other hand, are deployed in the root
controller and can access the network-wide state. Employing Kandoo, network programmers
merely provide a flag indicating whether an application is local or non-local, and the rest is seamlessly
handled by Kandoo. As such, Kandoo is very simple to use, yet scales an order of magnitude
better than centralized controllers.
Kandoo enables network programmers to implement applications such as elephant flow re-
routing and OpenTCP (Chapter 5) in a simple way with low overheads. For example, OpenTCP
can collect the traffic matrix in a large-scale HPC datacenter without hindering the performance
of MPI jobs, and can adapt TCP to optimize the network. As we demonstrated, OpenTCP can
considerably improve flow completion times, which interestingly improves the perceived end-
to-end performance of the MPI jobs.
The generic programming model proposed in Beehive (Chapter 4) has a wider application
domain than Kandoo and can support different types of functions in addition to local and
centralized applications. Using Beehive’s programming model, applications are written as
centralized message handlers that store their state in dictionaries. Applications written in that
programming model are automatically transformed (either at compile-time or at runtime) into
their distributed counterparts while preserving their intended behavior. Beehive hides many
boilerplates of distributed programming such as concurrency, synchronization, and replication.
With runtime instrumentation and placement optimization, Beehive is capable of automatically
improving the performance of the control plane.
Beehive is expressive and can be used to implement other distributed controllers including
Kandoo. Using several SDN and non-SDN use cases, such as a key-value store and an
SDN controller (Chapter 5), we demonstrated that real-world, distributed applications can
be implemented in Beehive with an effort comparable to centralized controllers. Despite
their simplicity, the throughput and the latency of the implemented applications are on par
with existing distributed controllers. More importantly, Beehive instruments applications at
runtime and automatically optimizes the placement of control applications. Using Beehive SDN
controller, we have demonstrated that this automatically optimized placement results in significant
improvements in the perceived latency of the controller without any intervention by network
programmers.
Future Direction. Over the course of this research, we have encountered interesting open
problems that have not been addressed in this thesis. We believe addressing the following open
problems can result in valuable and impactful research:
1. Alternative Optimized Placement Methods: In Beehive, we presented a greedy algorithm
that tries to optimize the control plane’s latency by reducing the volume of inter-hive
traffic. Although this is very effective for forwarding SDN applications (as shown in
Chapter 5), there are optimization approaches that can be exploited for other types of
applications. For example, another potentially useful placement method is to minimize
the tail latency of control applications. That objective can indeed increase the inter-hive
traffic, but may result in better performance for some specific applications. Moreover,
extending Beehive’s runtime instrumentation, one can provide richer information to the
placement and the optimization methods.
2. Acceleration: Porting Kandoo and Beehive runtimes to switches and co-processors in
forwarding hardware enables us to push the control functions as close as possible to the
data path. This can result in considerable performance improvements and can widen the
application range of our proposals. It is important to note that this does not result in
coupling the control plane back with the data plane. Instead, we still keep the control
plane decoupled, yet provide an option of running code closest to the data path. That
is, the control plane is still decoupled from the data plane, but is placed directly on the
forwarding element. We do not aim at packet processing in the control plane, but with
this acceleration we can process more frequent events at scale.
Another approach to accelerate network programs is to compile local control application
(either Kandoo local applications or Beehive application with local mapped cells) onto
programmable hardware (e.g., NetFPGA [94]). This cross compilation will probably need
a limited programming abstraction, but can result in a system that is capable of providing
scalable packet processing. More importantly, with such a cross compilation, Beehive can
automatically choose the placement of these functions and can seamlessly optimize the
network.
3. Verification: Given that Beehive and Kandoo both have well-defined structures for
control applications, it is feasible to develop static analyzers for control applications that are
able to detect bugs or potential performance bottlenecks. We developed such an analyzer
to automatically generate Map functions based on message handlers. �is static analyzer
can be extended to detect inter-dependencies of message handlers and to provide the cases
in which an application will become centralized. Such a static analyzer can also shed light
on potential performance problems and bugs (e.g., deadlocks) without the need to test the
application at runtime.
Bibliography
[1] Apache Storm. http://storm.apache.org/.
[2] Beehive. http://github.com/kandoo/beehive.
[3] Beehive Network Controller. http://github.com/kandoo/beehive-netctrl.
[4] OpenDaylight - An Open Source Community and Meritocracy for Software-Defined
Networking. http://bit.ly/XJlWjU.
[5] Packet Binary Serialization Framework. http://github.com/packet/packet.
[6] Quagga Routing Suite. http://www.nongnu.org/quagga/.
[7] Station and Media Access Control Connectivity Discovery, IEEE Standard 802.1AB.
http://tinyurl.com/6s739pe.
[8] TCP Flow Spy. https://github.com/soheilhy/tcp_flow_spy.
[9] Intel MPI Benchmarks, http://software.intel.com/en-us/articles/intel-mpi-benchmarks/.
[10] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat. Hedera: Dynamic
Flow Scheduling for Data Center Networks. In Proceedings of NSDI’10, pages 19–19, 2010.
[11] E. Al-Shaer, W. Marrero, A. El-Atawy, and K. Elbadawi. Network Configuration in a Box:
Towards End-to-End Verification of Network Reachability and Security. In Proceedings
of ICNP’09, pages 123–132, 2009.
[12] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and
M. Sridharan. Data Center TCP (DCTCP). In Proceedings of SIGCOMM’10, pages 63–74,
2010.
[13] P. Alvaro, T. Condie, N. Conway, K. Elmeleegy, J. M. Hellerstein, and R. Sears.
BOOM Analytics: Exploring Data-centric, Declarative Programming for the Cloud. In
Proceedings of EuroSys’10, pages 223–236, 2010.
[14] M. B. Anwer, M. Motiwala, M. b. Tariq, and N. Feamster. SwitchBlade: A Platform for
Rapid Deployment of Network Protocols on Programmable Hardware. In Proceedings of
SIGCOMM’10, pages 183–194, 2010.
[15] A. AuYoung, Y. Ma, S. Banerjee, J. Lee, P. Sharma, Y. Turner, C. Liang, and J. C. Mogul.
Democratic Resolution of Resource Conflicts Between SDN Control Programs. In
Proceedings of CoNEXT’14, pages 391–402, 2014.
[16] T. Benson, A. Akella, and D. A. Maltz. Network Traffic Characteristics of Data Centers in
the Wild. In Proceedings of IMC’10, pages 267–280, 2010.
[17] P. Berde, M. Gerola, J. Hart, Y. Higuchi, M. Kobayashi, T. Koide, B. Lantz, B. O’Connor,
P. Radoslavov, W. Snow, and G. Parulkar. ONOS: Towards an Open, Distributed SDN OS.
In Proceedings of HotSDN’14, pages 1–6, 2014.
[18] M. S. Blumenthal and D. D. Clark. Rethinking the Design of the Internet: the End-to-End
arguments vs. the Brave New World. ACM Transaction on Internet Technologies, 1(1):70–
109, Aug. 2001.
[19] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou.
Cilk: An Efficient Multithreaded Runtime System. Journal of Parallel and Distributed
Computing, 37(1):55–69, August 1995.
[20] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger,
D. Talayco, A. Vahdat, G. Varghese, and D. Walker. P4: Programming Protocol-
independent Packet Processors. SIGCOMM Comput. Commun. Rev., 44(3):87–95, July
2014.
[21] E. Brewer. CAP Twelve Years Later: How the "Rules" Have Changed. Computer Magazine,
45(2):23–29, 2012.
[22] M. Caesar, D. Caldwell, N. Feamster, J. Rexford, A. Shaikh, and J. van der Merwe. Design
and Implementation of a Routing Control Platform. In Proceedings of the 2nd Symposium
on Networked Systems Design & Implementation, volume 2, pages 15–28, 2005.
[23] M. Canini, P. Kuznetsov, D. Levin, and S. Schmid. Software Transactional Networking:
Concurrent and Consistent Policy Composition. In Proceedings of HotSDN’13, pages 1–6,
2013.
[24] M. Canini, D. Venzano, P. Perešíni, D. Kostić, and J. Rexford. A NICE Way to Test
OpenFlow Applications. In Proceedings of NSDI'12, pages 10–10, 2012.
[25] M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. McKeown, and S. Shenker. Ethane: Taking
Control of the Enterprise. In Proceedings of SIGCOMM’07, pages 1–12, 2007.
[26] M. Casado, T. Garfinkel, A. Akella, M. J. Freedman, D. Boneh, N. McKeown, and
S. Shenker. SANE: A Protection Architecture for Enterprise Networks. In Proceedings
of USENIX-SS’06, 2006.
[27] M. Casado, T. Koponen, S. Shenker, and A. Tootoonchian. Fabric: A Retrospective on
Evolving SDN. In Proceedings of HotSDN’12, pages 85–90, 2012.
[28] B. Chandrasekaran and T. Benson. Tolerating SDN Application Failures with LegoSDN.
In Proceedings of HotNets-XIII, pages 22:1–22:7, 2014.
[29] C. Chang, J. Wawrzynek, and R. Brodersen. BEE2: A High-End Reconfigurable
Computing System. Design Test of Computers, IEEE, 22(2):114–125, 2005.
[30] Y. Chen, R. Griffith, J. Liu, R. H. Katz, and A. D. Joseph. Understanding TCP Incast
Throughput Collapse in Datacenter Networks. In Proceedings of WREN'09, pages 73–82,
2009.
[31] Z. Chen, Q. Gao, W. Zhang, and F. Qin. FlowChecker: Detecting Bugs in MPI Libraries
via Message Flow Checking. In Proceedings of SC’10, pages 1–11, 2010.
[32] J. Corbett, J. Dean, M. Epstein, A. Fikes, C. Frost, J. Furman, S. Ghemawat, A. Gubarev,
C. Heiser, P. Hochschild, W. Hsieh, S. Kanthak, E. Kogan, H. Li, A. Lloyd, S. Melnik,
D. Mwaura, D. Nagle, S. Quinlan, R. Rao, L. Rolig, Y. Saito, M. Szymaniak, C. Taylor,
R. Wang, and D. Woodford. Spanner: Google’s Globally-Distributed Database. In
Proceedings of OSDI’12, pages 251–264, Berkeley, CA, USA, 2012.
[33] A. Curtis, W. Kim, and P. Yalagandula. Mahout: Low-overhead Datacenter Traffic
Management Using End-Host-Based Elephant Detection. In Proceedings of INFOCOM’11,
pages 1629–1637, 2011.
[34] A. R. Curtis, J. C. Mogul, J. Tourrilhes, P. Yalagandula, P. Sharma, and S. Banerjee.
DevoFlow: Scaling Flow Management for High-Performance Networks. In Proceedings
of SIGCOMM’11, pages 254–265, 2011.
[35] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters.
Commun. ACM, 51(1):107–113, Jan. 2008.
[36] D. Decasper and B. Plattner. DAN: Distributed Code Caching for Active Networks. In
Proceedings of INFOCOM’98, volume 2, pages 609–616, 1998.
[37] A. A. Dixit, F. Hao, S. Mukherjee, T. Lakshman, and R. Kompella. ElastiCon: An Elastic
Distributed Sdn Controller. In Proceedings of ANCS’14, pages 17–28, 2014.
[38] S. Donovan and N. Feamster. Intentional Network Monitoring: Finding the Needle
Without Capturing the Haystack. In Proceedings of HotNets’14, pages 5:1–5:7, 2014.
[39] D. Erickson. The Beacon OpenFlow Controller. In Proceedings of HotSDN'13, pages 13–18,
2013.
[40] N. Farrington, G. Porter, S. Radhakrishnan, H. H. Bazzaz, V. Subramanya, Y. Fainman,
G. Papen, and A. Vahdat. Helios: A Hybrid Electrical/Optical Switch Architecture for
Modular Data Centers. In Proceedings of SIGCOMM’10, pages 339–350, 2010.
[41] A. D. Ferguson, A. Guha, C. Liang, R. Fonseca, and S. Krishnamurthi. A Northbound
API for Sharing SDNs. In Open Networking Summit’13, 2013.
[42] A. D. Ferguson, A. Guha, C. Liang, R. Fonseca, and S. Krishnamurthi. Participatory
Networking: An API for Application Control of SDNs. In Proceedings of SIGCOMM’13,
pages 327–338, 2013.
[43] B. Fitzpatrick. Distributed Caching with Memcached. Linux Journal, 2004(124):5, August
2004.
[44] M. P. Forum. MPI: A Message-Passing Interface Standard. Technical report, Knoxville,
TN, USA, 1994.
[45] N. Foster, R. Harrison, M. J. Freedman, C. Monsanto, J. Rexford, A. Story, and D. Walker.
Frenetic: A Network Programming Language. In Proceedings of ICFP’11, pages 279–291,
2011.
[46] M. Ghobadi, S. Hassas Yeganeh, and Y. Ganjali. Rethinking End-to-End Congestion
Control in Software-Defined Networks. In Proceedings of HotNets'12, pages 61–66, 2012.
[47] S. Ghorbani and M. Caesar. Walk the Line: Consistent Network Updates With Bandwidth
Guarantees. In Proceedings of HotSDN’12, pages 67–72, 2012.
[48] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel,
and S. Sengupta. VL2: A Scalable and Flexible Data Center Network. Commun. ACM,
54(3):95–104, Mar. 2011.
[49] A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, J. Rexford, G. Xie, H. Yan, J. Zhan,
and H. Zhang. A Clean Slate 4D Approach to Network Control and Management.
SIGCOMM CCR, 35(5):41–54, October 2005.
[50] A. Greenhalgh, F. Huici, M. Hoerdt, P. Papadimitriou, M. Handley, and L. Mathy. Flow
Processing and the Rise of Commodity Network Hardware. SIGCOMM CCR, 39(2):20–
26, 2009.
[51] N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. McKeown, and S. Shenker. NOX:
Towards an Operating System for Networks. SIGCOMM CCR, 38(3):105–110, July 2008.
[52] S. Ha, I. Rhee, and L. Xu. CUBIC: A New TCP-friendly High-speed TCP Variant. SIGOPS
Oper. Syst. Rev., 42(5):64–74, July 2008.
[53] D. Halperin, S. Kandula, J. Padhye, P. Bahl, and D. Wetherall. Augmenting Data Center
Networks with Multi-gigabit Wireless Links. In Proceedings of SIGCOMM’11, pages 38–49,
2011.
[54] N. Handigol, B. Heller, V. Jeyakumar, D. Maziéres, and N. McKeown. Where is the
Debugger for My Software-Defined Network? In Proceedings of HotSDN'12, pages 55–
60, 2012.
[55] M. Handley, O. Hodson, and E. Kohler. XORP: An Open Platform for Network Research.
SIGCOMM CCR, 33(1):53–57, Jan. 2003.
[56] D. Harrington, R. Presuhn, and B. Wijnen. An Architecture for Describing Simple
Network Management Protocol (SNMP) Management Frameworks. http://www.
ietf.org/rfc/rfc3411.txt, Dec. 2002.
[57] S. Hassas Yeganeh and Y. Ganjali. Kandoo: A Framework for Efficient and Scalable
Offloading of Control Applications. In Proceedings of HotSDN'12, pages 19–24, 2012.
[58] S. Hassas Yeganeh and Y. Ganjali. Turning the Tortoise to the Hare: An Alternative
Perspective on Event Handling in SDN. In Proceedings of BigSystem’14, pages 29–32, 2014.
[59] S. Hassas Yeganeh and Y. Ganjali. Beehive: Towards a Simple Abstraction for Scalable
Software-Defined Networking. In Proceedings of HotNets'14, pages 13:1–13:7, 2014.
[60] S. Hassas Yeganeh, A. Tootoonchian, and Y. Ganjali. On Scalability of Software-Defined
Networking. IEEE Communications Magazine, 51(2):136–141, February 2013.
[61] B. Heller, R. Sherwood, and N. McKeown. The Controller Placement Problem. In
Proceedings of HotSDN’12, pages 7–12, 2012.
[62] C.-Y. Hong, S. Kandula, R. Mahajan, M. Zhang, V. Gill, M. Nanduri, and R. Wattenhofer.
Achieving High Utilization with Software-driven WAN. In Proceedings of SIGCOMM'13,
pages 15–26, 2013.
[63] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and
Computation. Adison-Wesley Publishing Company, Reading, Massachusets, USA, 1979.
[64] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free Coordination for
Internet-scale Systems. In Proceedings of USENIXATC’10, pages 11–11, Berkeley, CA, USA,
2010.
[65] V. Jacobson. Congestion Avoidance and Control. SIGCOMM CCR, 25(1):157–187, 1995.
[66] S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer,
J. Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, and A. Vahdat. B4: Experience with a
Globally-deployed So�ware De�ned WAN. In Proceedings of SIGCOMM’13, pages 3–14,
2013.
[67] L. Jose, L. Yan, G. Varghese, and N. McKeown. Compiling Packet Programs to
Reconfigurable Switches. In Proceedings of NSDI'15, 2015.
[68] N. Katta, H. Zhang, M. Freedman, and J. Rexford. Ravana: Controller Fault-Tolerance in
Software-Defined Networking. In Proceedings of SOSR'15, 2015.
[69] P. Kazemian, G. Varghese, and N. McKeown. Header Space Analysis: Static Checking for
Networks. In Proceedings of NSDI’12, pages 9–9, 2012.
[70] A. Khurshid, W. Zhou, M. Caesar, and P. B. Godfrey. VeriFlow: Verifying Network-wide
Invariants in Real Time. In Proceedings of HotSDN’12, pages 49–54, 2012.
[71] A. Khurshid, X. Zou, W. Zhou, M. Caesar, and P. B. Godfrey. VeriFlow: Verifying
Network-wide Invariants in Real Time. In Proceedings NSDI’13, pages 15–28, 2013.
[72] H. Kim, J. Reich, A. Gupta, M. Shahbaz, N. Feamster, and R. Clark. Kinetic: Verifiable
Dynamic Network Control. In Proceedings of NSDI’15, 2015.
[73] E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click Modular Router.
ACM Trans. Comput. Syst., 18(3):263–297, Aug. 2000.
[74] T. Koponen, K. Amidon, P. Balland, M. Casado, A. Chanda, B. Fulton, I. Ganichev,
J. Gross, N. Gude, P. Ingram, E. Jackson, A. Lambeth, R. Lenglet, S.-H. Li, A. Padman-
abhan, J. Pettit, B. Pfaff, R. Ramanathan, S. Shenker, A. Shieh, J. Stribling, P. Thakkar,
D. Wendlandt, A. Yip, and R. Zhang. Network Virtualization in Multi-tenant Datacenters.
In Proceedings of NSDI’14, pages 203–216, 2014.
[75] T. Koponen, M. Casado, N. Gude, J. Stribling, L. Poutievski, M. Zhu, R. Ramanathan,
Y. Iwata, H. Inoue, T. Hama, and S. Shenker. Onix: A Distributed Control Platform for
Large-scale Production Networks. In Proceedings of OSDI’10, pages 1–6, 2010.
[76] A. Krishnamurthy, S. P. Chandrabose, and A. Gember-Jacobson. Pratyaastha: An Efficient
Elastic Distributed SDN Control Plane. In Proceedings of HotSDN’14, pages 133–138, 2014.
[77] A. Lakshman and P. Malik. Cassandra: A Decentralized Structured Storage System.
SIGOPS Oper. Syst. Rev., 44(2):35–40, Apr. 2010.
[78] B. Lantz, B. Heller, and N. McKeown. A Network in a Laptop: Rapid Prototyping for
Software-defined Networks. In Proceedings of HotNets-IX, pages 19:1–19:6, 2010.
[79] H. H. Liu, X. Wu, M. Zhang, L. Yuan, R. Wattenhofer, and D. Maltz. zUpdate: Updating
Data Center Networks with Zero Loss. In Proceedings of SIGCOMM’13, pages 411–422,
2013.
[80] J. Liu, M. D. George, K. Vikram, X. Qi, L. Waye, and A. C. Myers. Fabric: A Platform for
Secure Distributed Computation and Storage. In Proceedings of SOSP’09, pages 321–334,
2009.
[81] C. A. B. Macapuna, C. E. Rothenberg, and F. Magalh. In-packet Bloom filter based
data center networking with distributed OpenFlow controllers. In Proceedings of IEEE
Globecom MENS’10, pages 584–588, 2010.
[82] H. Mai, A. Khurshid, R. Agarwal, M. Caesar, P. B. Godfrey, and S. T. King. Debugging the
Data Plane with Anteater. In Proceedings of SIGCOMM’11, pages 290–301, 2011.
[83] G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski.
Pregel: A System for Large-scale Graph Processing. In Proceedings of SIGMOD’10, pages
135–146, 2010.
[84] N. Marz and J. Warren. Big Data: Principles and Best Practices of Scalable Realtime Data
Systems. Manning Publications Co., 1st edition, 2015.
[85] T. McCabe. A Complexity Measure. IEEE Transactions on Software Engineering, (4):308–
320, 1976.
[86] J. McCauley, A. Panda, M. Casado, T. Koponen, and S. Shenker. Extending SDN to Large-
Scale Networks. In Open Networking Summit, 2013.
[87] R. McGeer. A Safe, Efficient Update Protocol for OpenFlow Networks. In Proceedings of
HotSDN’12, pages 61–66, 2012.
[88] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford,
S. Shenker, and J. Turner. OpenFlow: Enabling Innovation in Campus Networks.
SIGCOMM CCR, 38(2):69–74, March 2008.
[89] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, and C. Diot. Traffic Matrix
Estimation: Existing Techniques and New Directions. In Proceedings of SIGCOMM’02,
pages 161–174, 2002.
[90] J. C. Mogul, A. AuYoung, S. Banerjee, L. Popa, J. Lee, J. Mudigonda, P. Sharma, and
Y. Turner. Corybantic: Towards the Modular Composition of SDN Control Programs.
In Proceedings of HotNets-XII, pages 1:1–1:7, 2013.
[91] C. Monsanto, J. Reich, N. Foster, J. Rexford, and D. Walker. Composing Software-defined
Networks. In Proceedings of NSDI’13, pages 1–14, 2013.
[92] M. Moshref, M. Yu, R. Govindan, and A. Vahdat. DREAM: Dynamic Resource Allocation
for Software-defined Measurement. In Proceedings of SIGCOMM'14, pages 419–430, 2014.
[93] J. Moy. OSPF Version 2. http://www.ietf.org/rfc/rfc2328.txt, 1998.
[94] J. Naous, G. Gibb, S. Bolouki, and N. McKeown. NetFPGA: reusable router architecture
for experimental research. In Proceedings of PRESTO’08, pages 1–7, 2008.
[95] OASIS. Advanced Message Queuing Protocol (AMQP), Version 1.0. http://docs.
oasis-open.org/amqp/core/v1.0/os/amqp-core-complete-v1.0-os.pdf,
2011. [retrieved 23 April 2015].
[96] D. Ongaro and J. Ousterhout. In Search of an Understandable Consensus Algorithm. In
Proceedings of USENIX ATC’14, pages 305–320, 2014.
[97] D. Oran. OSI IS-IS Intra-domain Routing Protocol. http://www.ietf.org/rfc/
rfc1142.txt, 1990.
[98] K. Ostrowski, K. Birman, D. Dolev, and J. H. Ahnn. Programming With Live Distributed
Objects. In Proceedings of ECOOP’08, pages 463–489, 2008.
[99] J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra,
A. Narayanan, D. Ongaro, G. Parulkar, M. Rosenblum, S. M. Rumble, E. Stratmann, and
R. Stutsman. The Case for RAMCloud. Commun. ACM, 54(7):121–130, July 2011.
[100] A. Panda, C. Scott, A. Ghodsi, T. Koponen, and S. Shenker. CAP for Networks. In
Proceedings of HotSDN’13, pages 91–96, 2013.
[101] B. Pfaff, J. Pettit, K. Amidon, M. Casado, T. Koponen, and S. Shenker. Extending
Networking into the Virtualization Layer. In Proceedings of HotNets-VIII, 2009.
[102] P. Phaal and M. Lavine. sFlow Version 5. http://www.sflow.org/sflow_version_
5.txt, 2004.
[103] K. Phemius, M. Bouet, and J. Leguay. DISCO: Distributed Multi-domain SDN
Controllers. In Proceedings of NOMS 2014, pages 1–4, 2014.
[104] M. Reitblatt, N. Foster, J. Rexford, C. Schlesinger, and D. Walker. Abstractions for Network
Update. In Proceedings of SIGCOMM’12, pages 323–334, 2012.
[105] E. Rosen, A. Viswanathan, and R. Callon. Multiprotocol Label Switching Architecture.
RFC 3031 (Proposed Standard), Jan. 2001.
[106] J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-End Arguments in System Design. ACM
Trans. Comput. Syst., 2(4):277–288, Nov. 1984.
[107] B. Schwartz, A. W. Jackson, W. T. Strayer, W. Zhou, R. D. Rockwell, and C. Partridge.
Smart Packets: Applying Active Networks to Network Management. ACM Transactions
on Computer Systems, 18(1):67–88, Feb. 2000.
[108] C. Scott, A. Wundsam, B. Raghavan, A. Panda, A. Or, J. Lai, E. Huang, Z. Liu, A. El-
Hassany, S. Whitlock, H. Acharya, K. Zarifis, and S. Shenker. Troubleshooting Blackbox
SDN Control Software with Minimal Causal Sequences. In Proceedings of SIGCOMM'14,
pages 395–406, 2014.
[109] L. I. Sedov. Similarity and Dimensional Methods in Mechanics. Moscow, 1982. Translated
from the Russian by V. I. Kisin.
[110] V. Sekar, N. Egi, S. Ratnasamy, M. K. Reiter, and G. Shi. Design and Implementation of a
Consolidated Middlebox Architecture. In Proceedings of NSDI’12, pages 24–24, 2012.
[111] R. Sherwood. Oflops. http://archive.openflow.org/wk/index.php/Oflops.
[112] R. Sherwood, M. Chan, A. Covington, G. Gibb, M. Flajslik, N. Handigol, T.-Y. Huang,
P. Kazemian, M. Kobayashi, J. Naous, S. Seetharaman, D. Underhill, T. Yabe, K.-K. Yap,
Y. Yiakoumis, H. Zeng, G. Appenzeller, R. Johari, N. McKeown, and G. Parulkar. Carving
Research Slices out of Your Production Networks with OpenFlow. SIGCOMM CCR,
40(1):129–130, Jan 2010.
[113] A. Shieh, S. Kandula, and E. G. Sirer. SideCar: Building Programmable Datacenter
Networks Without Programmable Switches. In Proceedings of HotNets-IX, pages 21:1–21:6,
2010.
[114] J. M. Smith, D. J. Farber, C. A. Gunter, S. M. Nettles, D. C. Feldmeier, and W. D. Sincoskie.
SwitchWare: Accelerating Network Evolution (White Paper). Technical report, 1996.
[115] A.-W. Tam, K. Xi, and H. Chao. Use of Devolved Controllers in Data Center Networks.
In Processings of the IEEE Computer Communications Workshops, pages 596–601, 2011.
[116] A. Tootoonchian and Y. Ganjali. HyperFlow: A Distributed Control Plane for OpenFlow.
In Proceedings of INM/WREN’10, pages 3–3, 2010.
[117] A. Tootoonchian, M. Ghobadi, and Y. Ganjali. OpenTM: Traffic Matrix Estimator for
OpenFlow Networks. In Proceedings of PAM’10, pages 201–210, Berlin, Heidelberg, 2010.
Springer-Verlag.
[118] A. Tootoonchian, S. Gorbunov, Y. Ganjali, M. Casado, and R. Sherwood. On Controller
Performance in Software-Defined Networks. In Proceedings of Hot-ICE'12, pages 10–10,
2012.
[119] V. Vasudevan, A. Phanishayee, H. Shah, E. Krevat, D. G. Andersen, G. R. Ganger,
G. A. Gibson, and B. Mueller. Safe and Effective Fine-grained TCP Retransmissions for
Datacenter Communication. In Proceedings of SIGCOMM’09, pages 303–314, 2009.
[120] A. Voellmy and P. Hudak. Nettle: Taking the Sting Out of Programming Network Routers.
In Proceedings of PADL’11, pages 235–249, 2011.
[121] A. Voellmy, H. Kim, and N. Feamster. Procera: A Language for High-Level Reactive
Network Control. In Proceedings of HotSDN’12, pages 43–48, 2012.
[122] D. M. Volpano, X. Sun, and G. G. Xie. Towards Systematic Detection and Resolution of
Network Control Conflicts. In Proceedings of HotSDN'14, pages 67–72, 2014.
[123] A. Wang, W. Zhou, B. Godfrey, and M. Caesar. Software-Defined Networks as Databases.
In ONS’14, 2014.
[124] G. Wang, T. E. Ng, and A. Shaikh. Programming Your Network at Run-time for Big Data
Applications. In Proceedings of HotSDN’12, pages 103–108, 2012.
[125] D. J. Wetherall, J. V. Guttag, and D. L. Tennenhouse. ANTS: A Toolkit for Building and
Dynamically Deploying Network Protocols. In Proceedings of the IEEE Open Architectures
and Network Programming conference, pages 117–129, 1998.
[126] D. J. Wetherall and D. L. Tennenhouse. The ACTIVE IP option. In Proceedings of EW 7,
pages 33–40, 1996.
[127] H. Yan, D. A. Maltz, T. E. Ng, H. Gogineni, H. Zhang, and Z. Cai. Tesseract: A 4D Network
Control Plane. In Proceedings of NSDI’07, 2007.
[128] L. Yang, R. Dantu, T. Anderson, and R. Gopal. Forwarding and Control Element Sepa-
ration (ForCES) Framework. RFC 3746 (http://tools.ietf.org/html/rfc3746),
April 2004.
[129] C. Yu, C. Lumezanu, Y. Zhang, V. Singh, G. Jiang, and H. V. Madhyastha. FlowSense:
Monitoring Network Utilization with Zero Measurement Cost. In Proceedings of PAM’13,
pages 31–41, 2013.
[130] M. Yu, L. Jose, and R. Miao. Software-defined traffic measurement with OpenSketch. In
Proceedings of NSDI’13, pages 29–42, 2013.
[131] M. Yu, J. Rexford, M. J. Freedman, and J. Wang. Scalable Flow-based Networking with
DIFANE. In Proceedings of SIGCOMM’10, pages 351–362, 2010.
[132] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin,
S. Shenker, and I. Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction
for In-memory Cluster Computing. In Proceedings of NSDI’12, pages 2–2, 2012.
[133] M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized Streams: Fault-
tolerant Streaming Computation at Scale. In Proceedings of SOSP’13, pages 423–438, 2013.
[134] T. S. E. N. Zheng Cai, Alan L. Cox. Maestro: A System for Scalable OpenFlow Control.
Technical report, Department of Computer Science, Rice University, 2010.
[135] W. Zhou, D. Jin, J. Croft, M. Caesar, and P. B. Godfrey. Enforcing Customizable
Consistency Properties in Software-Defined Networks. In Proceedings of NSDI'15, pages
73–85, 2015.