Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Hitesh Ballani
Daniel Cletheroe, Paolo Costa, Istvan Haller, Krzysztof Jozwik, Fotini Karinou, Sophie Lange, Kai Shi, Benn Thomsen
Microsoft Research
Will free scaling of the network continue?
2
Sw
itch
perf. p
er
do
llar
Gb
ps/
$
Years
Requirements for data center switching
1. Ultra-fast
- nanosecond switching
Bursty cloud applications
- 90% packets less than 576 bytes
Requirements for data center switching
1. Ultra-fast
- nanosecond switching
2. Scalable (ideally, high-radix)
- flat network
3. Reliable and easy to manage
Flatter network
5
Could photonics offer a new growth curve?
Potential benefits
Low and predictable
latency
Clo
ud
netw
ork
co
st
Time
Electrical
Optical
Why the hold-up?Sca
le o
f so
lutio
n
Switching speed
Slow (ms) Fast (ns)
Cluster scale
(10s-100s)
Data center scale
(1,000s-10,000s)
Maintaining two separate networks
- Too much complexity
EAMs
MZIs
SOAs
Tunable lasers and AWGRs
…
What is the
last mile so hard?
Bridging the last mile
Apps
Network Stack
NIC
PHY
Apps
Network Stack
NIC
PHY
A fundamentally different abstraction
From asynchronous packet switches to synchronous circuit switches
Optical Circuit Switch
Lots of very hard problems to solve to make
optical switching practical
Need for cloud-centric, cross-layer solutions
Two case studies
Problem Traditional solution Cloud-centric solution
Lack of buffering Centralized Scheduler Scheduler-less network
Sub-nanosecond CDR - Phase caching
10
Case study #1: Scheduling the network
11
1 2 3 N
1011… 1011…
Scheduler
High-radix optical switchNo buffers
Building a data center wide scheduler is hard
Scale to 100K servers, Infer demand, Communicate demand
Scheduler-less network [Cheng et al., 2000]
A B C D E F G H
1 2 3 4 5 6 7Time slot
A
B
C
DE
F
G
H
BC
DEF
G
H
A
CD
EFG
H
A
B
DE
FGH
A
B
C
EF
GHA
B
C
D
FG
HAB
C
D
E
GH
ABC
D
E
F
HA
BCD
E
F
G
(a cyclic permutation)
A permutation
of connections N-1 time slots
Uniform traffic 100% throughputStatic pre-defined schedule
Scheduler-less network [Cheng et al., 2000]
A B C D E F G H
1 2 3 4 5 6 7Time slot
A
B
C
DE
F
G
H
BC
DEF
G
H
A
CD
EFG
H
A
B
DE
FGH
A
B
C
EF
GHA
B
C
D
FG
HAB
C
D
E
GH
ABC
D
E
F
HA
BCD
E
F
GUniform traffic
Arbitrary traffic pattern
Static pre-defined schedule
Load Balancing
Scheduler-less network [Cheng et al., 2000]
A B C D E F G H
1 2 3 4 5 6 7Time slot
A
B
C
DE
F
G
H
BC
DEF
G
H
A
CD
EFG
H
A
B
DE
FGH
A
B
C
EF
GHA
B
C
D
FG
HAB
C
D
E
GH
ABC
D
E
F
HA
BCD
E
F
GUniform traffic
Arbitrary traffic pattern
50% throughput
in worst-caseStatic pre-defined schedule
Load Balancing
Good trade-off
- Through and latency overhead of 2-hop paths
+ No scheduling!
“Shoal: A Network Architecture for Disaggregated Racks”, Cornell and Microsoft Research, NSDI 2019
“RotorNet: A Scalable, Low-complexity, Optical Datacenter Network”, UCSD, SIGCOMM 2017
Case study#2: Clock and Data Recovery
Today’s network Optical network
16
• Links are point to point and always on− CDR Locking time does not matter
• Links can be established every few ns− Locking time impacts overall throughput
High-radix optical switch
Transient links
Status quo on CDR
Commercial
State-of-the-art*
>100 ns
CDR LockingTime
256-ByteData
20 ns
100 Gbps (4x25)
Optical Switching
1 ns
8 ns 20 ns
100 Gbps (4x25)
1 ns
High locking time impacts network throughput
69% txput
* A Cevrero et al., IBM Research and EPFL, OFC 2018
Need for sub-nanosecond CDR
Phase caching: sub-nanosecond CDR
Phase does not change often, cache it!
Synchronous
Tx RxBut Optical Switches requiresynchronisation to avoid packet collisions!
CDR problem reduced to phase discovery
But it can still take >40ns to recover phase
Fast optical switching starts looking viable
20
Cross-layer approach that
leverages DC peculiarities to
achieve sub-ns CDR
Good trade-off in this setting
20 ns
4×25 Gbps
1 ns
0sssssss625 ps
“Sub-Nanosecond Clock and Data Recovery in an Optically-Switched Data Centre Network”,
UCL and Microsoft Research, ECOC PDP, 2018
Innovation across the cloud stack needed
Physical
Network
Stack
New optics
New hardware/software
New protocols
Architecture New algorithms
Need an end-to-end, cloud-centric approach to
make optical switching viable
http://aka.ms/OpticsForTheCloud