Programming the Topology of Networks: Technology and Algorithms
Manya Ghobadi

with Pierre Blanche, Jeff Cox, Nikhil Devanur, Mark Filer, Klaus-Tycho Foerster, Jamie Gaudette, Phillipa Gill, Madeleine Glick, Daniel Kilper, Janardhan Kulkarni, Ratul Mahajan, Amar Phanishayee, Gireeja Ranade, Houman Rastegarfar, Stefan Schmid, and Rachee Singh
The cloud infrastructure

Inefficiencies and waste in the cloud infrastructure
[Figure: data centers connected by optical cables]

Network topology, traffic engineering, and capacity provisioning

Technology and algorithms to optimize network topology
High-level idea

Networks with programmable topologies

[Figure: four nodes A–D connected by unit-capacity links. Static topology: throughput 2 units. Dynamic topology, rewired to match demand: throughput 3 units.]
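The gain in the figure can be illustrated with a toy fluid model (my own simplification, not the talk's exact setting): with a fixed budget of unit-capacity links, aggregate throughput shrinks as demand paths get longer, so rewiring links to connect hot pairs directly raises throughput.

```python
from collections import deque

def hops(adj, s, t):
    """BFS hop count from s to t in an undirected graph."""
    seen, q = {s}, deque([(s, 0)])
    while q:
        u, d = q.popleft()
        if u == t:
            return d
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append((v, d + 1))
    return float("inf")

def throughput(links, demands):
    """Aggregate flow under a fluid model: unit-capacity links, every
    demand gets an equal share, one capacity unit consumed per hop."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total_hops = sum(hops(adj, s, t) for s, t in demands)
    return len(links) / total_hops * len(demands)

demands = [("A", "C"), ("B", "D")]                           # hot pairs
static_ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
rewired = [("A", "C"), ("B", "D"), ("A", "B"), ("C", "D")]   # direct hot links

print(throughput(static_ring, demands))  # 2 hops per flow -> 2.0 units
print(throughput(rewired, demands))      # 1 hop per flow  -> 4.0 units
```

The exact numbers depend on the demand set and routing model; the point is only that a demand-aware topology dominates a demand-oblivious one.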
Programmable topologies

• Challenging:
  • Requires reconfigurable hardware technology
  • Requires revisiting networking-layer algorithms
• Impactful:
  • Cheaper networks
  • Higher throughput
Talk outline

Technology and algorithms to enable programmable topologies in the cloud:

• Data center networks (e.g., a Google data center)
  ProjecToR: Programming the network topology [SIGCOMM’16]
• Wide-area networks (e.g., the Level3 global backbone)
  Programming the capacity of links [SIGCOMM’18, HotNets’17, OFC’16]
Today’s data center interconnects

Static capacity (10 Gbps) between Top-of-Rack (ToR) pairs.

Ideal demand matrix: uniform and static

     A  B  C  D
  A  0  3  3  3
  B  3  0  3  3
  C  3  3  0  3
  D  3  3  3  0

Non-ideal demand matrix: skewed and dynamic, e.g. at one instant:

     A  B  C  D
  A  0  6  6  0
  B  0  0  0  0
  C  0  0  0  0
  D  0  0  0  0

with the hot entries shifting to different rack pairs from moment to moment.
Need for a reconfigurable interconnect

Data:
• 200K servers across 4 production clusters
• Cluster sizes: 100 to 2,500 racks

Observation:
• Many rack pairs exchange little traffic
• Only a few hot rack pairs are active

Implication: a static topology with uniform capacity is
• Over-provisioned for most rack pairs
• Under-provisioned for a few others

A reconfigurable interconnect can dynamically provide additional capacity between hot rack pairs.
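The over/under-provisioning mismatch is easy to quantify. A sketch with illustrative numbers (the matrix below is hypothetical, not the production data):

```python
# Given a static uniform capacity per rack pair, count how many pairs a
# skewed demand matrix leaves over- or under-provisioned.
uniform_capacity = 3  # capacity units per rack pair

demand = {            # hypothetical skewed demand between racks A-D
    ("A", "B"): 6,    # hot pair
    ("A", "C"): 6,    # hot pair
}                     # every other pair: 0

racks = "ABCD"
pairs = [(s, t) for s in racks for t in racks if s != t]

over = sum(1 for p in pairs if demand.get(p, 0) < uniform_capacity)
under = sum(1 for p in pairs if demand.get(p, 0) > uniform_capacity)
print(over, under)  # 10 of 12 pairs over-provisioned, 2 under-provisioned
```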
Our proposal: ProjecToR interconnect

• Free-space topology (programmable)
• Digital micromirror device to redirect light
• Disco-ball shaped mirror assembly to magnify reachability

[Figure: lasers and photodetectors on the racks replace the static topology]

Digital Micromirror Device (DMD)

[Figure: array of ~10 µm micromirrors, each controlled by a memory cell]
A 3-ToR ProjecToR interconnect prototype

[Figure: ToR1’s source laser hits a DMD; micromirrors reflect the beam to ToR2 and ToR3]
Routing algorithm

• A highly flexible topology allows millions of ways to connect lasers to photodetectors
• Ideal solution: a fast-changing topology that adapts to demand changes
• Challenge: it takes 12 μs to reprogram a link

[Figure: lasers on ToR1–ToR3 steered to photodetectors on ToR1–ToR3]
Routing algorithm

• Two-topology approach:
  • A slowly switching dedicated topology
  • Fast-switching opportunistic links

[Figure: dedicated-topology links and opportunistic links between the lasers and photodetectors of ToR1–ToR3]
Routing packets

• Dedicated topology: K-shortest-paths routing
• Opportunistic links: served from virtual output queues

[Figure: queued packets at each ToR forwarded over the dedicated topology or an opportunistic link]
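K-shortest-paths routing over the dedicated topology can be sketched as follows. This brute-force version enumerates all simple paths and keeps the k with fewest hops; it is fine for small topologies, while Yen's algorithm scales better (the 3-ToR graph below is illustrative):

```python
def k_shortest_paths(adj, src, dst, k):
    """Return up to k simple paths from src to dst, fewest hops first."""
    paths, stack = [], [(src, [src])]
    while stack:                      # iterative DFS over simple paths
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        for nxt in adj[node]:
            if nxt not in path:       # simple paths only: no revisits
                stack.append((nxt, path + [nxt]))
    return sorted(paths, key=len)[:k]

adj = {"ToR1": ["ToR2", "ToR3"],
       "ToR2": ["ToR1", "ToR3"],
       "ToR3": ["ToR1", "ToR2"]}
print(k_shortest_paths(adj, "ToR1", "ToR3", 2))
# [['ToR1', 'ToR3'], ['ToR1', 'ToR2', 'ToR3']]
```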
Scheduling opportunistic links

• Given a set of potential links and the current traffic demand, find a set of active opportunistic links

Example demand matrix (rows: source ToR, columns: destination ToR):

        ToR1  ToR2  ToR3
  ToR1    0     0   100
  ToR2  100     0     0
  ToR3    0     0     0
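A simple way to pick active links is shown below. This is a greedy sketch of the problem statement, not ProjecToR's actual scheduler: repeatedly activate the highest-demand pair whose laser and photodetector are still free.

```python
def schedule_links(demand):
    """Greedily activate opportunistic links, using at most one laser
    per source ToR and one photodetector per destination ToR."""
    active, used_src, used_dst = [], set(), set()
    # Visit (source, destination) pairs in decreasing order of demand.
    cells = sorted(((d, s, t) for (s, t), d in demand.items() if d > 0),
                   reverse=True)
    for d, s, t in cells:
        if s not in used_src and t not in used_dst:
            active.append((s, t))
            used_src.add(s)
            used_dst.add(t)
    return active

# Demand matrix from the slide: ToR1->ToR3 and ToR2->ToR1, 100 units each.
demand = {("ToR1", "ToR3"): 100, ("ToR2", "ToR1"): 100}
print(schedule_links(demand))
```

Both links are activated here because their lasers and photodetectors do not conflict.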
Scheduling opportunistic links

• A standard switch scheduling problem:
  • Blossom matching
  • Birkhoff–von Neumann (BvN) matrix decomposition
• Centralized scheduler
• Single-tiered matching (input ports to output ports)

Example demand matrix (rows: source, columns: destination):

     0  100    0
     0    0  100
   100    0    0
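BvN decomposition expresses a demand matrix as a weighted sum of permutation matrices, each of which is a conflict-free switch configuration. A brute-force sketch for tiny matrices (real schedulers use matching algorithms rather than enumerating permutations):

```python
from itertools import permutations

def bvn(matrix):
    """Greedy BvN-style decomposition of a small square demand matrix
    into (weight, permutation) switch schedules."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    schedules = []
    while any(any(row) for row in m):
        # Pick the permutation covering the most remaining positive entries.
        best = max(permutations(range(n)),
                   key=lambda p: sum(m[i][p[i]] > 0 for i in range(n)))
        # Serve it for as long as its smallest covered entry allows.
        weight = min(m[i][best[i]] for i in range(n) if m[i][best[i]] > 0)
        schedules.append((weight, best))
        for i in range(n):
            if m[i][best[i]] > 0:
                m[i][best[i]] -= weight
    return schedules

# The slide's matrix decomposes into a single permutation schedule:
print(bvn([[0, 100, 0], [0, 0, 100], [100, 0, 0]]))
# [(100, (1, 2, 0))]
```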
Scheduling opportunistic links

In contrast to the standard centralized, single-tiered approach, our scheduler is:
• Two-tiered (source ToRs matched to destination ToRs)
• Decentralized
• An extension of the Gale-Shapley algorithm for finding stable matches [GS-1962]
• Constant competitive against an offline optimal allocation
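For reference, the classic deferred-acceptance procedure the scheduler builds on can be sketched as below. This is the textbook [GS-1962] version, not the paper's two-tiered decentralized extension; the ToR names and preference lists are illustrative.

```python
def gale_shapley(src_prefs, dst_prefs):
    """Stable matching via deferred acceptance. Each dict maps a node to
    its preference list over the other side, best first."""
    free = list(src_prefs)                    # unmatched proposers
    next_pick = {s: 0 for s in src_prefs}     # next preference to try
    engaged = {}                              # destination -> source
    rank = {d: {s: i for i, s in enumerate(p)} for d, p in dst_prefs.items()}
    while free:
        s = free.pop()
        d = src_prefs[s][next_pick[s]]        # propose to best untried dst
        next_pick[s] += 1
        if d not in engaged:
            engaged[d] = s
        elif rank[d][s] < rank[d][engaged[d]]:
            free.append(engaged[d])           # d prefers s: bump incumbent
            engaged[d] = s
        else:
            free.append(s)                    # d rejects s
    return {s: d for d, s in engaged.items()}

match = gale_shapley({"s1": ["d1", "d2"], "s2": ["d1", "d2"]},
                     {"d1": ["s2", "s1"], "d2": ["s1", "s2"]})
print(match)  # {'s2': 'd1', 's1': 'd2'} — no blocking pair exists
```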
Simulation results

Evaluated:
• Tail flow completion time
• Different traffic matrices
• Impact of switching time
• Impact of fan-out

Compared designs:
• Fat tree [SIGCOMM’08]: no reconfigurability
• FireFly [SIGCOMM’14]: reconfigurable, but slow switching time
• ProjecToR: reconfigurable, 12 μs switching time

[Chart: average flow completion time (0–40 ms) vs. average load (20–80%) for the three designs; 95% marker]
The key takeaway from this talk

Current assumption: Network topology is fixed
New world: Network topology is dynamic

Problems to solve:
• Scheduling
• Capacity provisioning
• Traffic engineering
• Load balancing

• Exciting: Unusual wealth of algorithms
• Challenging: Changes fundamental assumptions
• Impactful: Better efficiency ($/Gbps)