A-975f5095-4d33-4907-8af5-ed1e7e50b5a0-8cfe9203-f3d3-4a98-823b-50b02ebc2d97_130423_35702_24

  • 7/30/2019 A-975f5095-4d33-4907-8af5-ed1e7e50b5a0-8cfe9203-f3d3-4a98-823b-50b02ebc2d97_130423_35702_24

    1/85

    2010 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1

    David Tsiang, Cedrik Begin, Guglielmo Morandin

    4/22/13


    Goals and requirements of switch fabrics
    Buffering strategies (input, output, CIOQ)
    Transport (packet vs cell)
    Topologies (single stage, multi-stage)
    Congestion management (proactive, reactive)
    Multicast
    Service provider examples
    Enterprise and Datacenter examples


    Scale
      Bandwidth per fabric interface
      Number of fabric interfaces
    Fairness
      Usually want non-blocking and fair (sometimes weighted fairness)
      Non-blocking: no cross-flow interference between src-dest flows
        e.g. a congested flow doesn't unduly interfere with a non-congested flow
    Latency
      Service provider: 100 us (WAN distances dominate, jitter)
      Enterprise: 10s of us (campus distances)
      Datacenter: 1 us (datacenter distances, compute perf. latency sensitive)
    Cost
      SP vs Datacenter vs Enterprise
    Redundancy
      1:1, 1+1, 1:N


    Central: shared memory
    Input: deep buffers only on input
    Output: deep buffers only on output
    Combined input/output: deep buffers on input and output


    Usually associated with central memory switch fabric designs
    Bandwidth scale limited by memory bandwidth (can be distributed over several parallel memory slices to improve)
    Limited queue scale (not practical for multi-chassis)
    Similar performance characteristics to output buffered switch without need for a large speedup
    Examples: early Cisco routers (AGS+, 7000, 7500), smaller routers (ISR, ASR1K, Procket, early Juniper routers M40, M160)

    [Diagram: FIAs write to and read from a central memory. FIA: Fabric Interface Adaptor]


    Buffers on input
    Requires Virtual Output Queues (VOQs) to be non-blocking
    Most common type of buffering (GSR, N7K, ASR9K, Panini)

    [Diagram: input-buffered fabric. Labels: FIA, Switch Fabric, Send, Receive, VOQ, Mem]
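The reason an input-buffered fabric needs virtual output queues can be illustrated with a toy timeslot model. The sketch below is only an illustration under stated assumptions: each input sends at most one cell per slot, each output accepts at most one cell per slot, and inputs are served in fixed index order. The function names `drain_fifo` and `drain_voq` are hypothetical and do not come from the slides.

```python
from collections import deque


def drain_fifo(inputs):
    """Slots needed when each input has a single FIFO (only the head cell is eligible)."""
    queues = [deque(cells) for cells in inputs]
    slots = 0
    while any(queues):
        busy_outputs = set()
        for q in queues:  # fixed priority: lowest input index first
            if q and q[0] not in busy_outputs:
                busy_outputs.add(q.popleft())
        slots += 1
    return slots


def drain_voq(inputs):
    """Slots needed when each input keeps one queue per output (VOQ)."""
    voqs = []
    for cells in inputs:
        per_output = {}
        for dst in cells:
            per_output[dst] = per_output.get(dst, 0) + 1
        voqs.append(per_output)
    slots = 0
    while any(voqs):
        busy_outputs = set()
        for per_output in voqs:
            # Any non-empty VOQ whose output is still free may send this slot.
            for dst in sorted(per_output):
                if dst not in busy_outputs:
                    busy_outputs.add(dst)
                    per_output[dst] -= 1
                    if per_output[dst] == 0:
                        del per_output[dst]
                    break
        slots += 1
    return slots


# Input 0 holds two cells for output 0; input 1 holds one cell for output 0
# followed by one cell for output 1.
traffic = [[0, 0], [0, 1]]
print(drain_fifo(traffic), drain_voq(traffic))
```

With this traffic the FIFO model needs 4 slots because the cell for output 1 waits behind a blocked head cell, while the VOQ model needs 3, which is the minimum since output 0 must receive three cells.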


    Only works if there is no congestion within the switch fabric!
    Can be achieved if speedup is high (path from Send to RCV is >> FIA input BW).
    Pure output buffered switch is not practical for large systems (needs a speedup of N)

    [Diagram: output-buffered fabric. Labels: FIA, Switch Fabric, Send, Receive, OQ, Mem]


    High speedup (e.g. 2-3X FIA BW) is enough most of the time
    Input queues for cases where speedup is insufficient: blocking
    CRS uses this (VOQ scale impractical for input only approach)

    [Diagram: combined input/output buffered fabric. Labels: FIA, Switch Fabric, Send, Receive, IQ, OQ, Mem]


    Two main methods of transporting data across a switch fabric
    Packet: send whole packets (or even multiple pkts as a frame)
      Advantages:
        Simpler: no reassembly of cells (but may have to reorder packets)
        Higher efficiency: per packet overhead vs per cell overhead
        Lower average latency (can do cut-through on egress)
      Disadvantages:
        Slightly higher complexity for buffered switch chips
        Higher worst-case (WC) latency (small packets must wait behind larger packets)
        Not as scalable (large scale switches require distribution, which requires cells to be efficient; packet requires bundling of links to achieve low latency for large packets, which does not allow for large scale distribution)
    Cell: segment packets into smaller sized cells
      Advantages:
        Lower WC latency (important for TDM types of traffic)
        Scalable (easy to evenly distribute cells)


    (Cell continued):
      Disadvantages:
        Higher complexity: requires segmentation and reassembly of cells
        Higher overhead: per cell overhead, packet packing efficiency
        Worse average latency: reassembly and reordering cell buffer adds latency, can't do egress cut-through
    Generally:
      Packet transport: pure packet fabrics, single chassis scale fabrics
      Cell transport: hybrid packet/TDM switches, most large scale multi-chassis fabrics.
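The worst-case latency argument (a small unit waiting behind a large one on the same link) comes down to serialization delay. The sketch below is a back-of-the-envelope calculation, not a model of any particular fabric: the 10 Gb/s link rate is an assumption chosen for illustration, the 9600-byte packet and 136-byte cell sizes are taken from elsewhere in this deck, and `serialization_delay_us` is a hypothetical helper name.

```python
def serialization_delay_us(size_bytes, link_gbps):
    """Time to clock size_bytes onto a link running at link_gbps, in microseconds."""
    return size_bytes * 8 / (link_gbps * 1000)


LINK_GBPS = 10  # assumed link rate, for illustration only

# Worst case wait behind one maximum-size packet vs. behind one fixed-size cell.
wait_behind_packet = serialization_delay_us(9600, LINK_GBPS)
wait_behind_cell = serialization_delay_us(136, LINK_GBPS)
print(wait_behind_packet, wait_behind_cell)
```

Under these assumptions the wait behind a 9600-byte packet is 7.68 us, against roughly 0.11 us behind a 136-byte cell, which is why cell transport is attractive for TDM-style traffic.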


    Mesh (pt-pt, bus)
    Scale limited to bus bandwidth or FIA bandwidth
    Used in smaller and/or older systems

    [Diagram: FIAs interconnected by a bus. Labels: FIA, Bus]


    Single stage (crossbar, central memory)
    Scale limited to number of serdes I/O on a single chip
      e.g. 224x224 (SM15): the number of serdes on a chip limits a system to 224 FIAs; Panini has 768 FIAs
    Use parallel crossbars for bandwidth scale and redundancy

    [Diagram: FIAs connected through parallel crossbars. Labels: FIA, Crossbar]


    3-stage symmetric CLOS
      #S1 = #S2 = #S3. Traffic always takes two hops (S1->S2, S2->S3)
      For an NxN xbar chip, can connect up to N^2 FIAs
        Scale is usually less due to the common use of combining S1 and S3 (folded: N^2/2) and the number of S1s and FIAs achievable in a LC chassis.
      Provably non-blocking via rearrangement (connection oriented) or load balancing (requires some speedup to overcome imperfect randomized load balancing)

    [Diagram: 3-stage Clos. Labels: FIA, S1, S2, S3]
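The N^2 and N^2/2 scaling bounds quoted for the 3-stage Clos can be written as a one-line calculation. This is a sketch of the upper bound only; as the slide notes, packaging limits in a linecard chassis usually reduce the achievable scale. The helper name `clos_max_fias` is hypothetical.

```python
def clos_max_fias(n, folded=False):
    """Upper bound on FIAs in a 3-stage symmetric Clos built from n x n crossbar chips.

    Unfolded: n first-stage chips, each with n FIA-facing ports, gives n * n.
    Folded: S1 and S3 share a chip, so half of its ports face the FIAs.
    """
    full = n * n
    return full // 2 if folded else full


print(clos_max_fias(128), clos_max_fias(128, folded=True))
```

For a 128x128 chip this gives bounds of 16384 FIAs unfolded and 8192 folded.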


    n-stage topologies (Hyper-cube, Torus, Hyper-torus)
    Flows can take a variable number of hops from src to destination
    Lower cost
      Typically FIAs interconnect directly: fewer components, no fabric chassis needed
      Less interconnection cost, but interconnection can be complex for:
        less than a fully populated system
        system with varying speed nodes
    Requires complex scheduling to be non-blocking
      Typically flow based path selection
      Must be able to dynamically change path if flow bandwidths change (reordering?)
      Slow to recover from failures (massive path recomputations)


    Centralized timeslot scheduling
      Allows bufferless crossbars with no data loss (no collisions)
      Difficult to achieve maximum match (O(n^2.5), O(n^3) complexity)
        Approximate maximum match instead (e.g. iSLIP, PIM, WFA): needs speedup to overcome imperfect match.
      Not that scalable (O(n) complexity, but n can be large. Scheduler speed can be an issue as well).
    Distributed timeslot scheduling
      Scheduling done by each destination independently
      Imprecise: sources may receive multiple grants but can only act on one
        Results in loss of bandwidth: can be overcome with speedup
      Scalable since it's distributed, but somewhat inefficient
    Distributed bandwidth scheduling
      Distribute bandwidth (credits or MTU pkts) on request
        src sends when ready
      Collisions can occur: need buffering in the xbar and some speedup
      Scalable since it's distributed, and efficient
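The request/grant/accept pattern behind approximate matchers can be sketched in a few lines. The code below is a simplified single iteration in the style of round-robin matchers such as iSLIP. It is an illustration under assumptions, not a faithful reproduction of any published algorithm or product scheduler, and the names `rr_pick` and `match_once` are hypothetical.

```python
def rr_pick(candidates, pointer, n):
    """Return the first port in candidates at or after pointer, wrapping around n ports."""
    for offset in range(n):
        port = (pointer + offset) % n
        if port in candidates:
            return port
    return None


def match_once(requests, grant_ptr, accept_ptr):
    """Run one request/grant/accept iteration.

    requests[i] is the set of outputs for which input i has queued cells.
    grant_ptr[o] and accept_ptr[i] are round-robin pointers; they are
    advanced in place only when a grant is accepted.
    Returns a dict mapping each matched input to its output.
    """
    n = len(requests)
    # Grant phase: every output independently picks one requesting input.
    grants = {}  # input -> set of outputs that granted it
    for out in range(n):
        requesters = {i for i in range(n) if out in requests[i]}
        chosen = rr_pick(requesters, grant_ptr[out], n)
        if chosen is not None:
            grants.setdefault(chosen, set()).add(out)
    # Accept phase: an input holding several grants can act on only one.
    matching = {}
    for inp, outs in grants.items():
        out = rr_pick(outs, accept_ptr[inp], n)
        matching[inp] = out
        grant_ptr[out] = (inp + 1) % n
        accept_ptr[inp] = (out + 1) % n
    return matching


requests = [{0, 1}, {0}, {1}]
grant_ptr, accept_ptr = [0, 0, 0], [0, 0, 0]
first = match_once(requests, grant_ptr, accept_ptr)
second = match_once(requests, grant_ptr, accept_ptr)
print(first, second)
```

In the first slot both outputs grant input 0, which can accept only one, so a single connection is made even though a two-connection match exists. In the second slot the advanced pointers yield two connections. This wasted grant is the bandwidth loss that speedup is meant to cover.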


    i.e. no scheduling: just send
    Switch fabric either:
      buffers and asserts flow control if buffers get too full
      or just drops if buffers get too full (may require ack + retransmission)
    Requires a large speedup to get good performance in oversubscribed scenarios
    Is blocking for congestion > speedup because flow control within the switch fabric is usually coarse (not to VOQ level)
      Can reduce blocking by adding secondary flow control from destination back to source: can be at a VOQ level


    Hybrid schemes
      Can combine proactive and reactive schemes
        e.g. send speculatively and if congested (reactive) request to re-send (proactive)
      Better latency if non-congested


    Typically very challenging
      Scheduling per MC group not practical
        Combinatoric number of groups: 2^n - 1, where n is the number of FIAs
      Usually drop on congestion or reactive flow control
    Alternative: turn multicast into unicast, can now isolate congestion
      But can be blocking of unicast (ingress replication) or expensive (server replication).
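The 2^n - 1 figure counts every non-empty set of destination FIAs, which grows far too quickly for per-group scheduling state. A minimal sketch of the count follows; `possible_multicast_groups` is a hypothetical helper name.

```python
def possible_multicast_groups(n_fias):
    """Number of distinct non-empty destination sets among n_fias fabric interfaces."""
    return 2 ** n_fias - 1


for n in (4, 16, 64):
    print(n, possible_multicast_groups(n))
```

Even 64 FIAs already allow about 1.8e19 possible groups, which is why fabrics fall back to drop-on-congestion or coarse reactive flow control for multicast.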


    Ingress replication
      Can block ingress if not enough speedup to overcome replication dilation
      Staggered delivery
    Fabric multicast
      No impact to linecards, scalable to 100% multicast
      Drop on congestion

    [Diagrams: for each scheme, an LC sending copies 1 to 4 through the switch fabric. Labels: LC, Switch Fabric]


    Binary tree replication
      Can block ingress if not enough speedup to overcome replication dilation (but less chance of this due to distribution of replication).
      Very staggered delivery
    MC server replication
      No ingress blocking
      Additional expense of MC server cards
      Staggered delivery

    [Diagrams: for each scheme, copies 1 to 4 delivered through the switch fabric. Labels: LC, Switch Fabric, MC server]


    Fabric works in cell periods of 128 ns. Cell clock is distributed to LCs and switch cards.
    Packets are segmented into cells in the ingress path
    For each cell a request is sent to the SCA (Scheduler Control ASIC)
    SCA determines which input -> output connections to make. Sends grants to the IFIA and controls the XBARs
    Cells are sent across the serial links and XBARs to the EFIA
    Packets are reassembled in the egress path


    Ingress ToFab queues: per destination slot HPQ + LPQs, multicast HPQ + LPQs (MDRR)
    FIA (ToFab): H/L unicast queues per destination LC + H/L multicast queue
    SCA algorithm used to ensure fairness + maximize throughput over the fabric
      Schedules between UC/MC requests (alternates priority between UC/MC).
      Within a priority, input LCs get their fair share of traffic towards output linecards
    FIA (FrFab) has: per source UC/MC reassembly queues that can flow control the SCA if nearing full.


    Multicast to different LCs is performed by the crossbar.
      A given multicast cell is transmitted to N destinations across the crossbar.
    Partial grants supported.
      If a cell wants to go to destinations 1, 2 and 3, the fabric may first grant 1 and 3 and then grant 2 in a subsequent cell time.


    Switch cards:
      Redundant switch card allows errors in a single serial link stream to be corrected.
      Redundant stream carries XOR of 4 other streams
      Provides 4+1 redundancy.
    CSC (Clock and Scheduler) cards:
      2 of these in the system; one is operational and the other standby.
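The 4+1 scheme relies on the property that XOR-ing the parity stream with the three surviving data streams reproduces the missing one. The sketch below shows that property on byte strings. It is a generic illustration of XOR parity, not the actual link-level implementation, and the names `xor_parity` and `recover` are hypothetical.

```python
from functools import reduce


def xor_parity(streams):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*streams))


def recover(streams, parity, bad_index):
    """Rebuild the stream at bad_index from the other data streams plus the parity stream."""
    survivors = [s for i, s in enumerate(streams) if i != bad_index]
    return xor_parity(survivors + [parity])


data = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\xf0"]  # 4 data streams
parity = xor_parity(data)                                     # 5th, redundant stream
print(recover(data, parity, 2) == data[2])
```

XOR parity can rebuild a stream only when the failed stream is known, so a real design also needs per-link error detection to identify which stream to discard.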


    Generation FabricConnection

    FIA SCH XBAR

    622M 5x1.25G FIA SCA XBAR

    2.5G FIA-48Fusilli

    10G 20x1.25G TFIA,FFIASuperfish

    SCA192 XBAR192

    20G 20x2.5G EROS HADHecate

    (priority)

    IRIS


    Cell based (fixed 136B cells with cell packing)
    Unscheduled
    Single stage / 3 stage fabric
    Single chassis / multi-chassis capable
      Architecture scales up to 1536 EFIAs.
      2/4 EFIAs per LC.
    VOQ not feasible (system has 1M+ output queues): solved with fabric speedup with flow control
    3 generations: 40G -> 120G -> 400G per slot.
      New generations required to support previous generations
    Input buffered (fabric congestion)
    Output buffered (fabric speedup)
    Multicast replication in fabric.
    2 priorities per UC/MC in fabric.


    [Diagram: line cards at 40/120/400 Gbps connected through 8 fabric planes (1 of 8 ... 8 of 8), each with S1, S2 and S3 stages; S2 in the fabric chassis]
    136-byte cells
    2.5X speedup
    Buffered non-blocking switch
    Multi-stage interconnect, 3-stage Clos topology
    2 levels of priority
    Multicast support: 1M multicast groups


    IFIA:
      Segments packets into fixed size cells
      Distributes cells evenly to planes.
    S1 stage: distributes all cells evenly to all S2s.
    S2 stage:
      UC: directs cell to S3 stage based on destination address.
      MC: replicates cell to S3 stages based on FGID
    S3 stage:
      UC: directs cell to EFIA based on destination address.
      MC: replicates cell to EFIAs based on FGID
    EFIA:
      Receives cells from all planes and reassembles packet per source/cast.


    Single chassis can have:
      3-stage topology if switching element does not have enough links to do single stage.
      Single stage topology:
        Full mesh between IFIAs and EFIAs and fabric chips.
        Fabric chips in S123 mode, whereby incoming cells are routed directly to the EFIAs.


    Linecard chassis:
      Fabric cards contain S1 and S3 stages (these may be combined into ASICs doing both stages).
    Fabric chassis:
      Fabric cards contain S2 stages.
      1 or more fabric cards may implement the S2 stage of a plane.


    Full mesh between IFIAs, EFIAs and fabric chips.
    Fabric chips in S13 mode:
      Traffic local to a chassis does not go over optical links.
      Traffic destined to other chassis goes over optical links.


    Input buffering: per destination (EFIA) H/L queue.
      System scale (~1M OQ) deemed too large for VOQ.
    Output buffering: per faceplate port, configurable number of queues.

    [Diagram: unicast queuing. Packets from NPU, IFIA, S1, S2, S3, EFIA (reseq & reassembly, discard filter, 8k shaped queues), packets to NPU; fabric destination BP towards the IFIA]
      IFIA: 3072 high priority and 3072 low priority fabric destination (EFIA) queues
      S1 has a single data queue
      S2: queues per priority per S3 group
      S3: queues per priority per fabric destination (EFIA)
      EFIA: 4K raw queues


    [Diagram: multicast queuing. Packets from NPU, IFIA, S1, S2, S3, EFIA (reseq & reassembly, discard filter, 8k shaped queues), packets to NPU; fabric destination BP towards the IFIA]
      IFIA: 1 high priority and 1 low priority multicast queue
      S1 has a single data queue for both UC and MC data cells
      S2: queues per S3 group per priority and cast (i.e. separate queues for MC)
      S3: queues per destination per priority and cast (i.e. separate queues for MC)
      EFIA: some number of MC raw queues


    IFIAs send traffic into the fabric.
    2 main flow controls to regulate.
      Handle cases where fabric speedup is insufficient
    Destination backpressure:
      Used to minimize buffer occupancy in the fabric for short term congestion.
      Operates at per destination EFIA granularity.
      S3 queue congestion + S2 feed forward counts contribute.
      Ingress FIAs implement a slow start algorithm to minimize overshoot.
    Discard:
      Operates at a per faceplate port granularity.
      Used to alleviate potential fabric congestion by reducing the amount of congested traffic entering the fabric.
        That is, we do not want to send packets across the fabric which are going to be discarded at the EFIA anyway.
      Also minimizes the amount of queuing delay at the IFIA due to congestion at the destination.


    Other flow controls in the fabric as well:
      Stages may backpressure upstream stages if they are out of resources.
        Not expected.

    [Diagram: IFIA, S1, S2, S3, EFIA (reseq & reassembly, discard filter, 8k shaped queues), with Discard and Dest BP feedback towards the IFIA]


    Multicast is performed in the fabric at the S2 and S3 stages.
    IFIAs have 2 queues (H/L)
    Cell header contains FGID field, which is used by S2 and S3 stages as an index into the replication table.
      1M fabric groups.
      S2 and S3 replicate cells
    No flow control mechanisms.
      Separate queues for H/L multicast.
      If there is congestion, multicast cells are dropped.
    Scheduling between unicast and multicast cells is WRR at S2 and S3 stages.


    4/8 planes in the system.
      FQ: 8 planes: 1 plane per fabric card
      HQ: 8 planes: 2 planes per fabric card
      QQ: 4 planes: 1 plane per fabric card
    Enough speedup in fabric to handle one plane down and not adversely affect performance.
      Further planes down result in reduced fabric performance
    Fabric can operate with a minimum of 2 planes.
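The effect of losing planes can be estimated with simple arithmetic. The sketch below assumes that fabric capacity is divided equally across planes and that the 2.5X speedup quoted earlier applies to the full 8-plane configuration. Both are assumptions made for illustration, and `remaining_speedup` is a hypothetical helper name.

```python
def remaining_speedup(nominal_speedup, total_planes, planes_down):
    """Fabric speedup left after some planes fail, assuming equal capacity per plane."""
    if not 0 <= planes_down <= total_planes:
        raise ValueError("planes_down must be between 0 and total_planes")
    return nominal_speedup * (total_planes - planes_down) / total_planes


for down in (0, 1, 6):
    print(down, remaining_speedup(2.5, 8, down))
```

Under these assumptions one failed plane still leaves a speedup of about 2.19, while running on the minimum of 2 planes leaves 0.625, meaning the fabric would carry less than line rate.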


    Generation   Fabric links               FIA               XBAR
    40G          2.5G (8b10b)               Sprayer, Sponge   SEA (36x72)
    120G         5G (scrambler + 8b10b)     Seal, Crab        Superstar (128x144)
    400G         8.625G (scrambler)         Inbar             Sapir (128x128)


    Cell based (variable cell size)
    Distributed scheduling
    Single stage / 3 stage fabric topologies
    Single chassis / multi-chassis capable
      Architecture scales up to 768 FIAs
      Up to 6 FIAs per LC.
    Input buffered: 64K VOQs (4 COS per 10GE)
    Multicast replication in fabric (512K groups)
    2 independent pipes in fabric.
      OTN, Data UC, MC

    Panini Multi chassis Architecture


    Panini Multi-Chassis Fabric Architecture

    [Diagram: 6 fabric planes (1 of 6 ... 6 of 6), each with S1, S2 and S3 stages; S2 in the fabric chassis; 240 Gbps links; CXP]
    64~256B cells
    nx200G (1x speedup)
    3-stage Clos topology
    Single priority, 2 paths in fabric
    Multicast support: 512K multicast groups
      Replication at S2 and S3


    IFIA:
      Segments packets into variable sized cells
      Distributes cells evenly to planes.
    S1 stage: distributes all cells evenly to all S2s.
    S2 stage:
      UC: directs cell to S3 stage based on destination address.
      MC: replicates cell to S3 stages based on FGID
    S3 stage:
      UC: directs cell to EFIA based on destination address.
      MC: replicates cell to EFIAs based on FGID
    EFIA:
      Receives cells from all planes and reassembles packet per source/cast.


    Single stage topology
    Full mesh between FIAs and fabric chips.
    FIAs spray cells to fabric chips, which will send them to the correct FIA.


    3 stage topology.
    In each Linecard Chassis of the FIA are connected to each S1S3.
    Full mesh between S1S3s and S2s on a plane.
    Fabric chassis:
      Fabric cards contain S2 stages.
      1 or multiple fabric cards may implement the S2 stage of a plane.


    3 stage topology
    In each Linecard Chassis of the FIAs are connected to each S1S3.
    Full mesh between S1S3 and S2 stages
    S2 stage shared between 2 chassis fabric cards.


    Unicast: distributed scheduling
      VOQs in IFIA indicate occupancy state to the EFIAs they are associated with.
      EFIAs will issue credits fairly (WFQ) to IFIAs based on:
        VOQ state from IFIAs
        Number of links up between S3 and itself
        Congestion indication from fabric
      IFIAs will send cells of packets into fabric for VOQs with credit.
    Multicast: unscheduled
      Packets sent into the fabric
      Congestion may result in drops or global flow control depending on priority.
    Congestion in fabric between UC/MC:
      When there is only UC traffic, there should be no sustained congestion, as the EFIAs control the traffic towards them.
      When MC is introduced:
        congestion may occur and the fabric will indicate it to the EFIA
        EFIA will adapt UC credits down to a configured value to make room for MC traffic
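The egress-driven credit scheme can be sketched as a weighted split of a per-interval credit budget among the ingress VOQs that report backlog. This is a simplified illustration, not the WFQ implementation used in the product: the function name `issue_credits`, the per-interval `budget`, and the per-source `weights` are all assumptions introduced for the example. In a real EFIA the budget would itself depend on the number of S3 links up and on fabric congestion indications.

```python
def issue_credits(backlog, weights, budget):
    """Distribute up to `budget` credits among sources with pending data.

    backlog[src] : credits' worth of data waiting in that source's VOQ
    weights[src] : relative share configured for that source
    Returns a dict mapping each source to the credits granted.
    """
    granted = {src: 0 for src in backlog}
    remaining = budget
    while remaining > 0:
        # Only sources that still have ungranted backlog compete.
        active = [s for s in backlog if granted[s] < backlog[s]]
        if not active:
            break
        # Next credit goes to the source furthest below its weighted share
        # (ties broken by name so the result is deterministic).
        src = min(active, key=lambda s: (granted[s] / weights[s], s))
        granted[src] += 1
        remaining -= 1
    return granted


backlog = {"ifia0": 10, "ifia1": 10, "ifia2": 1}
weights = {"ifia0": 2, "ifia1": 1, "ifia2": 1}
print(issue_credits(backlog, weights, budget=10))
```

With a budget of 10 credits this grants 6, 3 and 1: the lightly loaded source gets all it asked for, and the other two share the rest in their 2:1 weight ratio. Because a source sends only when it holds credit, the egress side bounds the unicast traffic converging on it.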


    Ingress unicast:
      64K VOQs: enough for 4 COS queues per 10GE
    Ingress multicast:
      4 class queues towards the fabric
    Fabric: queues per pipe per destination.
      S2: queue per S3 destination.
      S3: queue per EFIA.
    Egress: per cast/class queues towards the egress NPU


    Multicast is performed in the fabric at the S2 and S3 stages.
    Cell header contains FGID field, which is used by S2 and S3 stages as an index into the replication table.
      512K fabric groups.
      S2 and S3 replicate cells to wanted destinations


    6 planes in the system.
    Enough speedup in fabric to handle one plane down and not adversely affect performance.
      Further planes down result in reduced fabric performance
    Fabric can operate with a minimum of 1 plane.


    Packets are segmented into 64B cells
      48B of payload
      8B of cell header
      8B of CRC
    No cell packing: a given cell may only have data for one packet.
    Cell is split (in 16-bit chunks) over 4 serial links, one to each XBAR
    A fifth redundant serial link contains information for error correction
    Links are 8b10b encoded


    Packets are segmented into 136B cells:
      12B of header
      120B of payload
      4B of RS code (for error correction)
    Unicast cells can be packed such that a cell can contain data from 2 packets
    Multicast cells are not packed
    Control cells: Idle, Discard, SRCC
    Cells are:
      8b10b encoded for 2.5G links
      Scrambled + 8b10b encoded for 5G links
      Scrambled for 8.625G links

    [Diagram: 136B cell layout: header (12 bytes), cell payload (120B) drawn as four 30-byte units, RS (4 bytes). A single packet payload carries 120 bytes of Packet 1; two packet payloads divide the four units between Packet 1 and Packet 2]
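The benefit of cell packing can be estimated by counting cells for a burst of back-to-back unicast packets. The sketch below uses the 120-byte payload from the cell format above and allows at most two packets per cell. It ignores the 30-byte unit granularity suggested by the diagram and any per-packet headers carried in the payload, so it is an approximation. The function names are hypothetical.

```python
import math

CELL_PAYLOAD = 120  # payload bytes in a 136-byte cell (12B header, 4B RS code)


def cells_unpacked(packet_sizes):
    """Cells needed when every packet starts in a fresh cell."""
    return sum(math.ceil(size / CELL_PAYLOAD) for size in packet_sizes)


def cells_packed(packet_sizes):
    """Cells needed when the tail cell of one packet may also carry the start
    of the next packet, with at most two packets sharing a cell."""
    cells = 0
    free = 0             # unused payload bytes in the most recent cell
    packets_in_last = 0  # packets that already have data in that cell
    for size in packet_sizes:
        if free > 0 and packets_in_last == 1:
            used = min(free, size)
            size -= used
            free -= used
            packets_in_last = 2
        if size > 0:
            n = math.ceil(size / CELL_PAYLOAD)
            cells += n
            free = n * CELL_PAYLOAD - size
            packets_in_last = 1
    return cells


burst = [64, 64, 64, 64]
print(cells_unpacked(burst), cells_packed(burst))
```

For four 64-byte packets this gives 4 cells without packing and 3 with packing, which shows how packing recovers some of the efficiency lost to fixed-size cells on small packets.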


    13 byte header | 64-256 bytes payload | 1 byte CRC

    Idle cells sent if no data to send

    11.5G Serdes

    64/66 encoding

    FEC covers a group of cells (for optical links)

    Retransmit used for electrical link error correction

    CRC-32 covers the packet

    [Figure: cell and packet formats. Cell: Fabric hdr (13 bytes), data payload (64-255 bytes), crc8. Packet: Fabric hdr, Pkt hdr (14 bytes), pkt payload (up to 9.6k bytes), crc32 (4 bytes).]


    [Figure: fabric queueing and backpressure. Blocks: Packets from NPU, IFIA, S1, S2, S3, EFIA, Reseq & Reassembly, Discard Filter, Packets to NPU. Signal: Fabric Destination BP.]

    8k shaped queues

    3072 high priority fabric destination queues

    3072 low priority fabric destination queues

    S2: queues per priority per fabric group

    S3: queues per priority per fabric destination

    4K raw queues in EFIA

    EFIA raw queue state controls the discard filter

    S1 hiccups control per plane scheduling at Sprayer

    S2 OOR state used to control scheduling in S1

    S3 OOR state used to control scheduling in S2

    S3 queue state generates destination BP

    S2 queue state and feed forward incorporated into destination BP


    First high performance enterprise switch

      FCS in 1998

    First implementation was shared bus, evolved to single stage fabric

    Large set of features, supported also by special service cards

    Wire rate at minimum packet size


    Port asics on linecards, decision engine and Dbus arbiter on supervisor card

    16 Gb/s total system bandwidth

    Input and output buffers

    No backpressure from output

    Bus arbiter prevented multiple port asics from writing on the bus simultaneously

    Two bus priorities to support VoIP

    More queuing classes on port asics

    [Figure: shared bus architecture. Port asics and the decision engine attach to the DBUS and RBUS; ARB is the bus arbiter.]


    [Figure: crossbar datapath. Ioslice with input queue (per priority) and decision engine, fabric interface, crossbar, fabric interface, Ioslice with output queues; high speed serial links connect the fabric interfaces to the crossbar.]


    [Figure: fabric interfaces connected through the crossbar, with decision engine.]


    Input queues:

      No VOQ, only per-priority input queues

      No congestion feedback from egress ports to ingress

      Blocking possible

    Crossbar:

      Can drop packets when congested

    Initially centralized decision engine, later distributed on each linecard:

      To support line rate at min packet size as newer generations of linecards got faster

    [Figure: fabric interfaces connected through the crossbar, with decision engine.]


    Packet based:

      Arbiters (conceptually one per output buffer) independently decide which input is writing to which output

    Each crossbar link is a bundle of 8 serdes:

      Lower port count required

    Input and output queues:

      Input queues cause blocking

      3x internal overspeed to compensate

    Requires store and forward:

      Input queues can drop

    Two priorities supported as two separate datapaths and queues

    [Figure: crossbar with separate Prio1 and Prio2 datapaths and queues, combined at egress by WRR.]


    Crossbar does replication according to Fabric Port Of Exit (FPOE) mask:

      FPOE set by ingress port asic

      Done by writing to multiple output queues simultaneously

      Multiple retries possible to satisfy replication mask

      3x internal overspeed helps to maintain rate

    Egress fabric interface and egress port asic use Destination Index in packet header to access lookup table and perform further replications
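
    The mask-driven replication with retries can be sketched as a loop over a bitmask. This is a minimal model, not the hardware behavior: `replicate`, `output_can_accept`, and `max_retries` are hypothetical names, and the retry limit is an assumption (the slide only says multiple retries are possible).

```python
def replicate(fpoe_mask, output_can_accept, max_retries=3):
    """Write one packet to every output port whose bit is set in fpoe_mask.

    output_can_accept(port, attempt) is a hypothetical callback that says
    whether the output queue of `port` can take a copy on this attempt.
    Returns (mask of ports still missing a copy, attempts used).
    """
    pending = fpoe_mask
    for attempt in range(1 + max_retries):
        delivered = 0
        port = 0
        mask = pending
        while mask:
            if mask & 1 and output_can_accept(port, attempt):
                delivered |= 1 << port   # copy written to this output queue
            mask >>= 1
            port += 1
        pending &= ~delivered            # retry only the ports that failed
        if pending == 0:
            return 0, attempt + 1
    return pending, 1 + max_retries
```

    Each pass writes to all outputs that can accept the packet at once and retries only the remainder, which is why internal overspeed helps keep the multicast rate up.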



    Support for no-drop protocols (Fibre Channel over Ethernet)

    Bandwidth optimized

    More than 16 slots per chassis

    High density, including oversubscribed linecards


    Centrally scheduled

    Packet based

    3 stage fabric

    Buffered crossbar

    Input buffered

    Single chassis topology, up to 16 linecard slots + 2 supervisor slots

    Multicast replication in fabric

    One priority per cast in fabric

    8 priorities per cast in Ioslices


    [Figure: VOQs and per port, class output queues around a crossbar with input and output buffer (one of three stages shown); the scheduler exchanges request, grant, and credit return messages.]


    Packets are queued into VOQs:

      VOQ system shared among ports on an ingress ioslice

    Multiple small packets are accumulated into a superframe:

      Up to a max size of about 3000 bytes

    Requests are made to central scheduler:

      No size information, MTU assumed

      Destination port and priority

    Grants are generated according to egress buffer availability:

      Central scheduler keeps track of buffer availability on every egress queue in the system

    Superframes are sent to fabric upon grant reception:

      Smaller than max size if grants arrive quickly, when little or no congestion is present

      Split into fragments if packet is bigger than max superframe size

    No drops in crossbars and outputs:

      Optionally no drops in VOQs by issuing pause frames
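
    The request/grant/credit loop above can be modeled in a few lines. This is a sketch under stated assumptions: one credit stands for one MTU-sized unit of egress buffer (since requests carry no size), and pending requests are served in arrival order per egress queue. The class and method names are hypothetical, and the slides do not say how the real scheduler arbitrates among VOQs.

```python
from collections import deque


class CentralScheduler:
    """Toy model of a credit-based central scheduler."""

    def __init__(self, credits_per_queue):
        # (egress port, priority) -> free egress buffer units (credits)
        self.credits = dict(credits_per_queue)
        # (egress port, priority) -> VOQs waiting for a grant
        self.pending = {q: deque() for q in self.credits}

    def request(self, voq_id, egress_queue):
        """An ingress VOQ asks to send to (destination port, priority)."""
        self.pending[egress_queue].append(voq_id)

    def grant(self):
        """Issue grants while egress buffer is available; no buffer, no grant."""
        grants = []
        for queue, waiting in self.pending.items():
            while waiting and self.credits[queue] > 0:
                self.credits[queue] -= 1
                grants.append((waiting.popleft(), queue))
        return grants

    def credit_return(self, egress_queue):
        """Called when a packet leaves the system and frees egress buffer."""
        self.credits[egress_queue] += 1
```

    Because a grant is issued only when egress buffer is known to be free, nothing is dropped in the crossbars or at the outputs; congestion shows up as VOQs waiting for grants at ingress.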


    Up to 8 VOQs per destination port at ingress ioslice:

      Shared across ingress ports

      Ingress drops according to tail drop or some form of AQM (WRED, AFD)

    Ingress Ioslice determines load balancing over fabric planes:

      Round robin or pseudo-random

    One egress queue for each egress port, priority:

      No drops at egress

      One credit loop per egress port, priority

      Buffer hard-partitioned

    Credits are returned to central scheduler when packet leaves system, creating available buffer

    Egress scheduler controls egress queue drain rate, so it controls credit return rate

    Central scheduler distributes aggregate grant rate across requesting voqs

    [Figure: credit loop for egress port 42. Ingress VOQs (voq 42,1 and voq 42,2), central scheduler, egress queues (eg q 42,1 and eg q 42,2), egress scheduler.]


    Three stage folded CLOS network

    Links are bundles of 8 serdes

    Aggregator asic in linecard to reduce number of arbitration links to central schedulers (active/standby)

    Depending on ioslice bandwidth, multiple links between ioslices and crossbar(s) on linecard

    [Figure: linecard with FIAs, xbar (Stage 1 and 3), and credit aggregator; spine cards with xbar (Stage 2); two Sups, each with a central scheduler.]


    Not scheduled:

      No superframing possible

      No voqs, only input queues, independent of group membership

    Replication performed at S2 and S3 stages:

      Separate internal datapath in xbar for multicast

      Packet header contains index into replication table

      Multiple retries to satisfy all required destination crossbar ports

      Limited by timer. On timeout, drop

    Egress ioslices do further replication to individual ports

    Load balanced to fabric planes using flow hash:

      No reordering required

      Max rate of single flow limited by link bundle capacity

    Fabric can also be programmed to flow control multicast:

      More blocking, but preferable for financial applications where MC is low average bandwidth but very bursty

    Scheduling between unicast and multicast is DWRR


    Low latency:

      Sub microsecond

      Cut through operation

    Single chassis, multiple chips

    Support for no-drop classes (FCoE)


    Packet-based, with full cut-through

    Speculative transmission:

      Similar to shared Ethernet: collision detection and retransmission

    Single crossbar stage with large overspeed

    Unbuffered crossbar

    Input and output buffered


    Single stage crossbars

    Fabric latency-optimized for 10 Gbps or 40 Gbps:

      Single links or bundles of 4

    Fabric speedup 3.6

    Out of band Ack/nack from crossbar for each transmission

    Out of band Xon/Xoff broadcast from each unicast output queue to all ingress

    [Figure: IFIA, XBAR, and EFIA with ack/nack and Xon/Xoff signaling. Labels: 12 x 10G or 3 x 40G (at IFIA and at EFIA), 48, 48, 4, 576 x 10G or 144 x 40G.]


    Packets are queued into VOQs:

      Separate VOQ system for each input port on an ingress ioslice to support ingress cut-through

      8 classes of service across entire switch

      Superframing when congested

    If destination not congested, unicast packet is sent immediately:

      Random path selection

      Crossbar Nacks packet if no downlink available, ingress stops and retries on different path

      Only one packet in flight per voq

      No need to reorder packets at egress

    Large speedup in egress buffer and egress downlinks to reduce collision probability

    Egress queue per (priority, port):

      Broadcasts Xoff before it gets too full, to avoid egress drops

      After getting Xon, ingress waits a random time before attempting to send
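
    The speculative send with ack/nack can be sketched as a retry loop over randomly ordered paths. This is a simplified model: `send_speculative` and `downlink_free` are hypothetical names, and trying each path at most once per packet is an assumption, since the slide only says the ingress retries on a different path.

```python
import random


def send_speculative(paths, downlink_free, rng, max_tries=None):
    """Send one packet speculatively and retry on another path after a nack.

    downlink_free(path) is a hypothetical stand-in for the crossbar's
    ack (True) or nack (False) on that path.
    Returns (path that was acked or None, list of paths tried).
    """
    order = list(paths)
    rng.shuffle(order)               # random path selection
    tried = []
    for path in order[:max_tries]:
        tried.append(path)
        if downlink_free(path):      # crossbar acks: packet is through
            return path, tried
        # crossbar nacked: stop and retry on a different path
    return None, tried
```

    With only one packet in flight per VOQ, a retry on a different path cannot overtake an earlier packet of the same VOQ, which is why no reordering is needed at egress.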


    Sent immediately by ingress:

      Subject only to uplink availability, no egress xon/xoff

    Replication performed in crossbar:

      One copy to each ioslice participating in the mc group

    Crossbar nacks with success mask:

      If some destinations did not get a copy, ingress retries

      No memory in crossbar

    Egress chip has shared memory:

      One memory write, multiple reads according to group membership

      Some destinations may be pruned based on queue lengths