Architecture and Design
***********************

Overview
--------

.. image:: images/arch-overview.png
    :width: 1000px

Trellis operates as a **hybrid L2/L3 fabric**.

As a **pure (or classic) SDN** solution, Trellis **does not use any of the
traditional control protocols** typically found in networking. A non-exhaustive
list of such protocols includes STP, MSTP, RSTP, LACP, MLAG, PIM, IGMP, OSPF,
IS-IS, TRILL, RSVP, LDP and BGP.

Instead, Trellis **uses an SDN Controller (ONOS), decoupled from the data-plane
hardware, to directly program ASIC forwarding tables using OpenFlow and
OF-DPA**, an open API from Broadcom that runs on the switches.

In this design, a set of applications running on ONOS implements all the fabric
functionality and features, such as **Ethernet switching**, **IP routing**,
**multicast**, **DHCP Relay**, **pseudowires** and more.

.. note::
    You can learn more about Trellis features and design concepts by visiting
    the `Project Website <https://opennetworking.org/trellis>`_ and reading the
    `Platform Brief
    <https://www.opennetworking.org/wp-content/uploads/2019/09/TrellisPlatformBrief.pdf>`_.


Introduction to OF-DPA Pipeline
-------------------------------

In this design note, we explain the design choices we have made and how we
worked around OF-DPA (OpenFlow Data Plane Abstraction) pipeline restrictions
to implement the features we need. We start by explaining the OF-DPA flow
tables and group tables we use.

Fig. 1 shows a simplified overview of the OF-DPA pipeline.

.. image:: images/arch-ofdpa.png
    :width: 1000px

Fig. 1: Simplified OF-DPA pipeline overview

Flow Tables
-----------

VLAN Table
^^^^^^^^^^
.. note::
    The **VLAN Flow Table (id=10)** is used for IEEE 802.1Q VLAN assignment and
    filtering to specify how VLANs are to be handled on a particular port.
    **All packets must have an associated VLAN id in order to be processed by
    subsequent tables**.

    **Table miss**: goto **ACL table**.

According to the OF-DPA specification, a VLAN ID must be assigned even to
untagged packets. Each untagged packet is tagged with an **internal VLAN**
when it is handled by the VLAN table. The internal VLAN is popped when the
packet is sent to an output port or to the controller. The internal VLAN is
assigned according to the subnet configuration of the input port. Packets
coming from ports that do not have a subnet configured (e.g. the spine-facing
ports) are tagged with VLAN ID **4094**.

The internal VLAN is also used to determine the subnet when a packet needs to
be flooded to all ports in the same subnet. (See the L2 Broadcast section for
details.)
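
The VLAN table behavior above can be sketched as a small Python function. This
is a minimal, hypothetical model, not ONOS or OF-DPA code: ``assign_vlan``,
``PORT_SUBNET``, ``SUBNET_VLAN`` and the port/VLAN numbers are illustrative
assumptions, and the VLAN filtering of tagged packets that the VLAN table also
performs is left out.

```python
from typing import Dict, Optional

# VLAN assigned to untagged packets on ports without a subnet (e.g. spine-facing).
NO_SUBNET_VLAN = 4094

# Hypothetical example configuration: input port -> subnet, subnet -> internal VLAN.
PORT_SUBNET: Dict[int, str] = {1: "10.0.1.0/24", 2: "10.0.1.0/24", 3: "10.0.2.0/24"}
SUBNET_VLAN: Dict[str, int] = {"10.0.1.0/24": 10, "10.0.2.0/24": 20}


def assign_vlan(in_port: int, vlan_tag: Optional[int]) -> int:
    """Return the VLAN ID a packet carries after the VLAN table.

    Tagged packets keep their VLAN ID. Untagged packets get the internal VLAN
    of the subnet configured on the input port, or 4094 if the port has no
    subnet, so all ports of one subnet share one internal VLAN.
    """
    if vlan_tag is not None:
        return vlan_tag
    subnet = PORT_SUBNET.get(in_port)
    if subnet is None:
        return NO_SUBNET_VLAN
    return SUBNET_VLAN[subnet]
```

Ports 1 and 2 share a subnet, so both map to internal VLAN 10, while a port
with no subnet (say port 48) gets 4094.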

Termination MAC Table
^^^^^^^^^^^^^^^^^^^^^
.. note::
    The **Termination MAC (TMAC) Flow Table (id=20)** determines whether to do
    bridging or routing on a packet.

    It identifies routed packets by their destination MAC, VLAN, and Ethertype.

    Routed packet rule types use a Goto-Table instruction to indicate that the
    next table is one of the routing tables.

    **Table miss**: goto **Bridging table**.

In this table, we determine which table the packet should go to by checking the
destination MAC address and the Ethernet type of the packet.

- if dst_mac = router MAC and eth_type = ip, goto **unicast routing** table

- if dst_mac = router MAC and eth_type = mpls, goto **MPLS table**

- if dst_mac = multicast MAC (01:00:5E:00:00:00/FF:FF:FF:80:00:00), goto
  **multicast routing** table

- none of the above, goto **bridging table**
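
The TMAC decision list above can be modelled as follows. This sketch assumes
one router MAC per switch and treats both IPv4 and IPv6 as "ip"; the function
and constant names are hypothetical, while the EtherType values are the
standard IEEE assignments. Only the IPv4 multicast MAC range from the list is
checked.

```python
# EtherType values (IEEE-assigned).
ETH_TYPE_IPV4 = 0x0800
ETH_TYPE_IPV6 = 0x86DD
ETH_TYPE_MPLS = 0x8847  # MPLS unicast

# IPv4 multicast MAC range 01:00:5E:00:00:00/FF:FF:FF:80:00:00 as integers.
MCAST_MAC_VALUE = 0x01005E000000
MCAST_MAC_MASK = 0xFFFFFF800000


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' (any letter case) to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def tmac_next_table(dst_mac: str, eth_type: int, router_mac: str) -> str:
    """Return the name of the table a packet goes to after the TMAC table."""
    dst = mac_to_int(dst_mac)
    is_router_mac = dst == mac_to_int(router_mac)
    if is_router_mac and eth_type in (ETH_TYPE_IPV4, ETH_TYPE_IPV6):
        return "unicast_routing"
    if is_router_mac and eth_type == ETH_TYPE_MPLS:
        return "mpls"
    if (dst & MCAST_MAC_MASK) == MCAST_MAC_VALUE:
        return "multicast_routing"
    return "bridging"  # table miss
```

Note that ``01:00:5e:81:02:03`` falls outside the ``FF:FF:FF:80:00:00`` mask
and therefore goes to the bridging table, as does an ARP packet addressed to
the router MAC.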

MPLS Tables
^^^^^^^^^^^
.. note::
    The MPLS pipeline can support three **MPLS Flow Tables, MPLS Table 0
    (id=23), MPLS Table 1 (id=24) and MPLS Table 2 (id=25)**.

    An MPLS Flow Table lookup matches the label in the outermost MPLS shim
    header in the packets.

    - MPLS Table 0 is only used to pop a protection label on platforms that
      support this table, or to detect an MPLS-TP Section OAM PDU.

    - MPLS Table 1 and MPLS Table 2 can be used for all label operations.

    - MPLS Table 1 and MPLS Table 2 are synchronized flow tables and updating
      one updates the other.

    **Table miss**: goto **ACL table**.

We only use MPLS Table 1 (id=24) in the current design.

MPLS packets are matched by the MPLS label.

The MPLS label is then popped, and the packet goes to an **L3 interface
group** and on to the destination leaf switch.


Unicast Routing Table
^^^^^^^^^^^^^^^^^^^^^
.. note::
    The **Unicast Routing Flow Table (id=30)** supports routing for potentially
    large numbers of IPv4 and IPv6 flow entries using the hardware L3 tables.

    **Table miss**: goto **ACL table**.

In this table, we determine where to output a packet by checking its
**destination IP (unicast)** address.

- if dst_ip is located behind a **remote switch**, the packet goes to an **L3
  ECMP group**, is tagged with an MPLS label, and is forwarded to a spine switch

- if dst_ip is located on the **same switch**, the packet goes to an **L3
  interface group** and is forwarded to a host

Note that the priority of flow entries in this table is determined by prefix
length.

A longer prefix (e.g. /32) has a higher priority than a shorter prefix (e.g.
/0), which gives longest-prefix-match behavior.
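
Because the hardware picks the highest-priority matching entry, tying priority
to prefix length yields longest-prefix match. A minimal sketch, assuming IPv4
only and a priority equal to the prefix length (the actual priority values
ONOS assigns may differ); ``RouteEntry``, ``route_lookup`` and the group names
are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RouteEntry:
    prefix: ipaddress.IPv4Network
    group: str  # hypothetical name of the L3 ECMP / L3 interface group

    @property
    def priority(self) -> int:
        # Assumption: priority grows with prefix length (here it equals it).
        return self.prefix.prefixlen


def route_lookup(table: List[RouteEntry], dst_ip: str) -> Optional[str]:
    """Return the group of the highest-priority matching entry, or None on a miss."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [entry for entry in table if addr in entry.prefix]
    if not matches:
        return None  # table miss: continue to the ACL table
    return max(matches, key=lambda entry: entry.priority).group


# Hypothetical example table: entry order does not matter, priority does.
TABLE = [
    RouteEntry(ipaddress.ip_network("0.0.0.0/0"), "ecmp-default"),
    RouteEntry(ipaddress.ip_network("10.0.2.0/24"), "ecmp-to-leaf2"),
    RouteEntry(ipaddress.ip_network("10.0.1.5/32"), "l3-host-10.0.1.5"),
]
```

For ``10.0.1.5`` both the /0 and the /32 entries match, and the /32 entry wins
because of its higher priority, regardless of its position in the list.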


Multicast Routing Table
^^^^^^^^^^^^^^^^^^^^^^^
.. note::
    The **Multicast Routing Flow Table (id=40)** supports routing for IPv4 and
    IPv6 multicast packets.

    **Table miss**: goto **ACL table**.

Flow entries in this table always match the **destination IP (multicast)**.

Matched packets go to an **L3 multicast group** and are then forwarded to the
next switch or host.


Bridging Table
^^^^^^^^^^^^^^
.. note::
    The **Bridging Flow Table (id=50)** supports Ethernet packet switching for
    potentially large numbers of flow entries using the hardware L2 tables.

    The Bridging Flow Table forwards either based on VLAN (normal switched
    packets) or Tunnel id (isolated forwarding domain packets), with the Tunnel
    id metadata field used to distinguish different flow table entry types by
    range assignment.

    **Table miss**: goto **ACL table**.

In this table, we match the **VLAN ID** and the **destination MAC address** and
determine where the packet should be forwarded.

- if the destination MAC can be matched, the packet goes to the **L2
  interface group** and is then sent to the destination host.

- if the destination MAC cannot be matched, the packet goes to the **L2
  flood group** and is then flooded to the same subnet.

  Since we cannot match the IP address in the bridging table, we use the VLAN
  ID to determine which subnet the packet should be flooded to.

  The VLAN ID can be either (1) the internal VLAN assigned to untagged packets
  in the VLAN table or (2) the VLAN ID that comes with tagged packets.
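
The two-level lookup above (exact VLAN + MAC first, then VLAN-only flooding)
can be sketched like this. The dictionaries stand in for flow entries of
different priorities; all names and values are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical example entries. Exact (VLAN, MAC) matches point to L2 interface
# groups; VLAN-only entries (lower priority) point to the VLAN's L2 flood group.
UNICAST_ENTRIES: Dict[Tuple[int, str], str] = {
    (10, "00:aa:00:00:00:01"): "l2-interface-vlan10-port1",
    (10, "00:aa:00:00:00:02"): "l2-interface-vlan10-port2",
}
FLOOD_ENTRIES: Dict[int, str] = {10: "l2-flood-vlan10", 20: "l2-flood-vlan20"}


def bridging_lookup(vlan: int, dst_mac: str) -> Optional[str]:
    """Return the group chosen by the bridging table, or None on a table miss."""
    group = UNICAST_ENTRIES.get((vlan, dst_mac.lower()))
    if group is not None:
        return group  # known destination: L2 unicast
    return FLOOD_ENTRIES.get(vlan)  # unknown destination: flood in the subnet
```

Because the VLAN is part of the key, a MAC learned in VLAN 10 is still flooded
when it shows up as a destination in VLAN 20.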


Policy ACL Table
^^^^^^^^^^^^^^^^
.. note::
    The Policy ACL Flow Table supports wide, multi-field matching.

    Most fields can be wildcard matched, and relative priority must be
    specified in all flow entry modification API calls.

    This is the preferred table for matching BPDU and ARP packets. It also
    provides the Metering instruction.

    **Table miss**: **do nothing**.
    The packet will be forwarded using the output or group in the action set,
    if any.

    If the action set does not have a group or output action, the packet is
    dropped.

In the ACL table, we trap **ARP**, **LLDP**, **BDDP** and **DHCP** packets and
send them to the **controller**.
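
A sketch of the match side of the ACL rules above, assuming IPv4 DHCP on UDP
ports 67/68 and assuming BDDP uses EtherType 0x8942 (to our knowledge the
value ONOS uses for its BDDP frames). DHCPv6 and the distinction between
trapping and copying (ARP is copied, see the ARP section) are left out; the
names are hypothetical.

```python
from typing import Optional

ETH_TYPE_IPV4 = 0x0800
ETH_TYPE_ARP = 0x0806
ETH_TYPE_LLDP = 0x88CC
ETH_TYPE_BDDP = 0x8942  # assumption: EtherType used for BDDP frames
IP_PROTO_UDP = 17
DHCP_PORTS = {67, 68}  # DHCPv4 server / client UDP ports


def acl_sends_to_controller(eth_type: int, ip_proto: Optional[int] = None,
                            udp_src: Optional[int] = None,
                            udp_dst: Optional[int] = None) -> bool:
    """True if the ACL rules described above would send the packet to ONOS."""
    if eth_type in (ETH_TYPE_ARP, ETH_TYPE_LLDP, ETH_TYPE_BDDP):
        return True
    if eth_type == ETH_TYPE_IPV4 and ip_proto == IP_PROTO_UDP:
        return udp_src in DHCP_PORTS or udp_dst in DHCP_PORTS
    return False
```

Everything else misses the ACL table and is forwarded by whatever group the
earlier tables placed in the action set.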

Group Tables
------------

L3 ECMP Group
^^^^^^^^^^^^^
.. note::
    OF-DPA L3 ECMP group entries are of OpenFlow type **SELECT**.

    For IP routing the action buckets reference the OF-DPA **L3 Unicast group**
    entries that are members of the multipath group for ECMP forwarding.

    An OF-DPA L3 ECMP Group entry can also be used in a Provider Edge Router.

    In this packet flow it can chain to either an **MPLS L3 Label** group entry
    or to an **MPLS Fast Failover** group entry.

    An OF-DPA L3 ECMP Group entry can be specified as a routing target instead
    of an OF-DPA L3 Unicast Group entry. Selection of an action bucket for
    forwarding a particular packet is hardware-specific.

MPLS Label Group
^^^^^^^^^^^^^^^^
.. note::
    MPLS Label Group entries are of OpenFlow **INDIRECT** type.

    There are four MPLS Label Group entry subtypes, all with similar structure.

    These can be used in different configurations to **push up to three
    labels** for tunnel initiation or LSR swap.

MPLS Interface Group
^^^^^^^^^^^^^^^^^^^^
.. note::
    MPLS Interface Group entries are of OpenFlow type **INDIRECT**.

    They are used to **set the outgoing L2 header** to reach the next hop label
    switch router or provider edge router.

We use an **L3 ECMP** group to pick one of the spine switches when we need to
route a packet from a leaf to the spines.

Each bucket points to an **MPLS Label** Group, in which the MPLS labels are
pushed onto the packet to realize the Segment Routing mechanism. (More
specifically, we use subtype 2, **MPLS L3 VPN Label**.)

Each MPLS Label Group in turn points to an **MPLS Interface** Group, in which
the destination MAC is set to the next hop (the spine router).

Finally, the packet goes to an **L2 Interface** Group and is sent to the
output port that leads to the spine router.

Details of how segment routing is implemented are explained in the L3 unicast
section below.
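
The leaf-to-spine group chain described above can be sketched as plain
Python. The CRC32 hash over the 5-tuple is a stand-in: actual bucket selection
is hardware-specific, as the L3 ECMP note says. ``Packet``, ``EcmpBucket``,
``forward_to_spine`` and all MACs, labels and ports are hypothetical.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (src_ip, dst_ip, ip_proto, l4_src_port, l4_dst_port)
FiveTuple = Tuple[str, str, int, int, int]


@dataclass
class Packet:
    five_tuple: FiveTuple
    vlan: Optional[int]
    src_mac: str = ""
    dst_mac: str = ""
    labels: List[int] = field(default_factory=list)  # outermost label first


@dataclass
class EcmpBucket:
    label: int     # MPLS Label Group: label identifying the destination leaf
    src_mac: str   # MPLS Interface Group: this leaf's router MAC
    dst_mac: str   # MPLS Interface Group: MAC of the next-hop spine
    out_port: int  # L2 Interface Group: port facing that spine


def select_bucket(buckets: List[EcmpBucket], ft: FiveTuple) -> EcmpBucket:
    """L3 ECMP (SELECT) group: hash the flow so all its packets use one bucket."""
    key = "|".join(str(x) for x in ft).encode()
    return buckets[zlib.crc32(key) % len(buckets)]


def forward_to_spine(pkt: Packet, buckets: List[EcmpBucket]) -> int:
    """Apply the group chain to ``pkt`` in place and return the output port."""
    bucket = select_bucket(buckets, pkt.five_tuple)
    pkt.labels.insert(0, bucket.label)  # MPLS Label Group: push label
    # MPLS Interface Group: rewrite the L2 header toward the chosen spine.
    pkt.src_mac, pkt.dst_mac = bucket.src_mac, bucket.dst_mac
    pkt.vlan = None  # L2 Interface Group pops the internal VLAN
    return bucket.out_port
```

Every bucket pushes the same label (the one identifying the destination leaf)
and differs only in next-hop MAC and output port, so the flow's 5-tuple alone
decides which spine carries it.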

L3 Unicast Group
^^^^^^^^^^^^^^^^
.. note::
    OF-DPA L3 Unicast group entries are of OpenFlow **INDIRECT** type.

    L3 Unicast group entries are used to supply the routing next hop and output
    interface for packet forwarding.

    To properly route a packet from either the Routing Flow Table or the Policy
    ACL Flow Table, the forwarding flow entry must reference an L3 Unicast
    Group entry.

    All packets must have a VLAN tag.

    **A chained L2 Interface group entry must be in the same VLAN as assigned
    by the L3 Unicast Group** entry.

We use the L3 Unicast Group to rewrite the **source MAC**, **destination MAC**
and **VLAN ID** when routing is needed.

L3 Multicast Group
^^^^^^^^^^^^^^^^^^
.. note::
    OF-DPA L3 Multicast group entries are of OpenFlow **ALL** type.

    The action buckets describe the interfaces to which multicast packet
    replicas are forwarded.

    Note that:

    - Chained OF-DPA **L2 Interface** Group entries must be in the **same
      VLAN** as the OF-DPA **L3 Multicast** group entry. However,

    - Chained OF-DPA **L3 Interface** Group entries must be in **different
      VLANs** from the OF-DPA **L3 Multicast** Group entry, **and from each
      other**.

We use the L3 multicast group to replicate multicast packets when necessary.

An L3 multicast group may also consist of only one bucket when replication is
not needed.

Details of how multicast is implemented are explained in the L3 multicast
section below.
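
The two chaining rules in the note above can be checked mechanically. The
sketch below is a hypothetical validator, not part of any OF-DPA or ONOS API;
it returns the list of rules a proposed L3 Multicast group would break.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChainedGroup:
    kind: str  # "l2_interface" or "l3_interface"
    vlan: int


def l3_mcast_violations(group_vlan: int, buckets: List[ChainedGroup]) -> List[str]:
    """List the OF-DPA chaining rules a proposed L3 Multicast group would break."""
    problems: List[str] = []
    seen_l3_vlans = set()
    for bucket in buckets:
        if bucket.kind == "l2_interface":
            # L2 Interface groups must share the L3 Multicast group's VLAN.
            if bucket.vlan != group_vlan:
                problems.append(f"L2 interface group VLAN {bucket.vlan} != {group_vlan}")
        elif bucket.kind == "l3_interface":
            # L3 Interface groups must differ from the group VLAN and each other.
            if bucket.vlan == group_vlan:
                problems.append(f"L3 interface group reuses group VLAN {group_vlan}")
            if bucket.vlan in seen_l3_vlans:
                problems.append(f"L3 interface groups share VLAN {bucket.vlan}")
            seen_l3_vlans.add(bucket.vlan)
        else:
            raise ValueError(f"unknown group kind: {bucket.kind}")
    return problems
```

Rewriting every replica to one new VLAN, say 200, fails either way: as L2
interface buckets in a group with VLAN 10 it breaks the same-VLAN rule, and as
two L3 interface buckets both in VLAN 200 it breaks the distinct-VLAN rule.
The L3 Multicast section below explains how we work around this.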

L2 Interface Group
^^^^^^^^^^^^^^^^^^
.. note::
    L2 Interface Group entries are of OpenFlow **INDIRECT** type, with a single
    action bucket.

    OF-DPA L2 Interface group entries are used for egress VLAN filtering and
    tagging.

    If a specific set of VLANs is allowed on a port, appropriate group entries
    must be defined for the VLAN and port combinations.

    Note: OF-DPA uses the L2 Interface group declaration to configure the port
    VLAN filtering behavior.

    This approach was taken since OpenFlow does not support configuring VLANs
    on physical ports.
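
Since OF-DPA derives port VLAN filtering from the L2 Interface groups that
exist, egress handling reduces to a lookup keyed by (VLAN, port). A minimal
sketch with hypothetical names and values: a missing group means the VLAN is
not allowed on that port, and ``pop_vlan`` decides whether the packet leaves
tagged.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class L2InterfaceGroup:
    vlan: int
    port: int
    pop_vlan: bool  # True: send untagged (e.g. internal VLAN); False: keep tag


def egress_tag(groups: Dict[Tuple[int, int], L2InterfaceGroup],
               vlan: int, port: int) -> Optional[int]:
    """Return the VLAN tag on the wire, or None if the packet leaves untagged.

    Raises ValueError when no group exists for (vlan, port), i.e. the VLAN
    is not allowed on that port.
    """
    group = groups.get((vlan, port))
    if group is None:
        raise ValueError(f"VLAN {vlan} is not allowed on port {port}")
    return None if group.pop_vlan else group.vlan


# Hypothetical example: internal VLAN 10 popped on port 1, VLAN 200 kept on port 3.
GROUPS = {
    (10, 1): L2InterfaceGroup(vlan=10, port=1, pop_vlan=True),
    (200, 3): L2InterfaceGroup(vlan=200, port=3, pop_vlan=False),
}
```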

L2 Flood Group
^^^^^^^^^^^^^^
.. note::
    L2 Flood Group entries are used by Bridging Flow Table wildcard (destination
    location forwarding, or DLF) rules.

    Like OF-DPA L2 Multicast group entry types, they are of OpenFlow **ALL**
    type.

    The action buckets each encode an output port.

    Each OF-DPA L2 Flood Group entry bucket forwards a replica to an output
    port, except for the packet's IN_PORT.

    All of the OF-DPA L2 Interface Group entries referenced by the OF-DPA Flood
    Group entry, and the OF-DPA Flood Group entry itself, must be in the
    **same VLAN**.

    Note: There can only be **one OF-DPA L2 Flood Group** entry defined **per
    VLAN**.
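
The flood-group rules above (one group per VLAN, one bucket per member port,
no replica back to IN_PORT) can be sketched as follows; the function names and
port/VLAN values are hypothetical.

```python
from typing import Dict, List


def build_flood_groups(port_vlans: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Build exactly one flood group (sorted member ports) per VLAN."""
    groups: Dict[int, List[int]] = {}
    for port, vlans in port_vlans.items():
        for vlan in vlans:
            groups.setdefault(vlan, []).append(port)
    return {vlan: sorted(ports) for vlan, ports in groups.items()}


def flood(groups: Dict[int, List[int]], vlan: int, in_port: int) -> List[int]:
    """Replicate to every member port of the VLAN's group except IN_PORT."""
    return [port for port in groups.get(vlan, []) if port != in_port]
```

With a two-member group such as ``{100: [5, 6]}``, ``flood(groups, 100, 5)``
returns ``[6]``, which is the behavior VLAN Cross Connect relies on.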

L2 Unicast
----------

.. image:: images/arch-l2u.png
    :width: 800px

Fig. 2: L2 unicast

.. image:: images/arch-l2u-pipeline.png
    :width: 1000px

Fig. 3: Simplified L2 unicast pipeline

The L2 unicast mechanism is designed to support **intra-rack (intra-subnet)**
communication when the destination host is **known**.

Pipeline Walkthrough - L2 Unicast
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **VLAN Table**: An untagged packet is assigned an internal VLAN ID
  according to the input port and the subnet configured on the input port.
  Packets of the same subnet have the same internal VLAN ID.

- **TMAC Table**: Since the destination MAC of an L2 unicast packet is not the
  MAC of the leaf router, the packet misses the TMAC table and goes to the
  bridging table.

- **Bridging Table**: If the destination MAC has been learned, there is a flow
  entry matching that destination MAC and pointing to an L2 interface group.

- **ACL Table**: IP packets miss the ACL table and the L2 interface group is
  executed.

- **L2 Interface Group**: The internally assigned VLAN is popped before the
  packet is sent to the output port.

L2 Broadcast
------------

.. image:: images/arch-l2f.png
    :width: 800px

Fig. 4: L2 broadcast

.. image:: images/arch-l2f-pipeline.png
    :width: 1000px

Fig. 5: Simplified L2 broadcast pipeline

Pipeline Walkthrough - L2 Broadcast
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **VLAN Table**: (same as L2 unicast)

- **TMAC Table**: (same as L2 unicast)

- **Bridging Table**: If the destination MAC has not been learned, there is NO
  flow entry matching that destination MAC.

  The packet then falls back to a lower-priority entry that matches the VLAN
  (subnet) and points to an L2 flood group.

- **ACL Table**: IP packets miss the ACL table and the L2 flood group is
  executed.

- **L2 Flood Group**: Consists of all L2 interface groups related to this VLAN
  (subnet).

- **L2 Interface Group**: The internally assigned VLAN is popped before the
  packet is sent to the output port.

ARP
---

.. image:: images/arch-arp-pipeline.png
    :width: 1000px

Fig. 6: Simplified ARP pipeline

All ARP packets are forwarded according to the bridging pipeline.

In addition, a **copy of the ARP packet is sent to the controller**.

- The controller uses the ARP packets for **learning purposes and updates the
  host store** accordingly.

- The controller only **replies** to an ARP request if the request is trying to
  **resolve an interface address configured on the switch edge port**.

Pipeline Walkthrough - ARP
^^^^^^^^^^^^^^^^^^^^^^^^^^

The ARP pipeline is similar to L2 broadcast, except that ARP packets are
matched by a special ACL table entry and copied to the controller.
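
The controller side of the ARP handling described above might look like the
following sketch. It is hypothetical, not the ONOS host store or ARP handler
API: ``ArpHandler`` keeps a plain dictionary of learned hosts and answers only
requests for a configured interface address.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class Arp:
    op: str          # "request" or "reply"
    sender_mac: str
    sender_ip: str
    target_ip: str


class ArpHandler:
    """Controller-side handling of ARP copies punted by the ACL table."""

    def __init__(self, router_mac: str, interface_ips: Set[str]) -> None:
        self.router_mac = router_mac
        self.interface_ips = interface_ips  # addresses configured on edge ports
        self.hosts: Dict[str, Tuple[str, int]] = {}  # IP -> (MAC, port)

    def handle(self, pkt: Arp, in_port: int) -> Optional[Arp]:
        # Every copy (request or reply) is used for host learning.
        self.hosts[pkt.sender_ip] = (pkt.sender_mac, in_port)
        # Reply only when the request targets an interface address.
        if pkt.op == "request" and pkt.target_ip in self.interface_ips:
            return Arp("reply", self.router_mac, pkt.target_ip, pkt.sender_ip)
        return None  # the bridged copy reaches the real target host
```

Requests for other hosts return ``None`` here because the original packet is
still bridged to them by the data plane; the controller only learns from it.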


L3 Unicast
----------

.. image:: images/arch-l3u.png
    :width: 800px

Fig. 7: L3 unicast

.. image:: images/arch-l3u-src-pipeline.png
    :width: 1000px

Fig. 8: Simplified L3 unicast pipeline - source leaf

Pipeline Walkthrough - Source Leaf Switch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **VLAN Table**: An untagged packet is assigned an internal VLAN ID
  according to the input port and the subnet configured on the input port.
  Packets of the same subnet have the same internal VLAN ID.

- **TMAC Table**: Since the destination MAC of an L3 unicast packet is the MAC
  of the leaf router and the Ethernet type is IPv4, the packet matches the TMAC
  table and goes to the unicast routing table.

- **Unicast Routing Table**: This table looks up the destination IP of the
  packet and points the packet to the corresponding L3 ECMP group.

- **ACL Table**: IP packets miss the ACL table and the L3 ECMP group is
  executed.

- **L3 ECMP Group**: Hashes on the 5-tuple to pick a spine switch and goes to
  the MPLS Label Group.

- **MPLS Label Group**: Pushes the MPLS label corresponding to the destination
  leaf switch and goes to the MPLS Interface Group.

- **MPLS Interface Group**: Sets the source MAC address, destination MAC
  address and VLAN ID, then goes to the L2 Interface Group.

- **L2 Interface Group**: The internally assigned VLAN is popped before the
  packet is sent to the output port that goes to the spine.

.. image:: images/arch-l3u-transit-pipeline.png
    :width: 1000px

Fig. 9: Simplified L3 unicast pipeline - spine

Pipeline Walkthrough - Spine Switch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **VLAN Table**: An untagged packet is assigned an internal VLAN ID
  according to the input port and the subnet configured on the input port.
  Packets of the same subnet have the same internal VLAN ID.

- **TMAC Table**: Since the destination MAC of an L3 unicast packet is the MAC
  of the spine router and the Ethernet type is MPLS, the packet matches the
  TMAC table and goes to the MPLS table.

- **MPLS Table**: This table looks up the MPLS label of the packet, determines
  the destination leaf switch, pops the MPLS label and points to an L3 ECMP
  Group.

- **ACL Table**: IP packets miss the ACL table and the L3 ECMP group is
  executed.

- **L3 ECMP Group**: Hashes to pick a link (if there are multiple links) to
  the destination leaf and goes to the L3 Interface Group.

- **MPLS Interface Group**: Sets the source MAC address, destination MAC
  address and VLAN ID, then goes to the L2 Interface Group.

- **L2 Interface Group**: The internally assigned VLAN is popped before the
  packet is sent to the output port that goes to the destination leaf switch.
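
The spine's MPLS handling described above can be sketched as a label lookup
followed by a hash-based choice among parallel links. CRC32 is only a stand-in
for the hardware hash, and ``spine_forward`` and its arguments are
hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Tuple


def spine_forward(labels: List[int], five_tuple: Tuple,
                  label_to_leaf: Dict[int, str],
                  leaf_ports: Dict[str, List[int]]) -> Optional[Tuple[str, int]]:
    """Return (destination leaf, output port), or None on an MPLS table miss.

    ``labels`` is the packet's label stack, outermost first; the matched
    label is popped in place.
    """
    if not labels or labels[0] not in label_to_leaf:
        return None  # miss: ACL table, no group in the action set, dropped
    leaf = label_to_leaf[labels.pop(0)]  # MPLS table: match and pop
    ports = leaf_ports[leaf]             # parallel links toward that leaf
    key = "|".join(str(x) for x in five_tuple).encode()
    return leaf, ports[zlib.crc32(key) % len(ports)]
```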

.. image:: images/arch-l3u-dst-pipeline.png
    :width: 1000px

Fig. 10: Simplified L3 unicast pipeline - destination leaf

Pipeline Walkthrough - Destination Leaf Switch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **VLAN Table**: An untagged packet is assigned an internal VLAN ID
  according to the input port and the subnet configured on the input port.
  Packets of the same subnet have the same internal VLAN ID.

- **TMAC Table**: Since the destination MAC of an L3 unicast packet is the MAC
  of the leaf router and the Ethernet type is IPv4, the packet matches the TMAC
  table and goes to the unicast routing table.

- **Unicast Routing Table**: This table looks up the destination IP of the
  packet and points the packet to the corresponding L3 Unicast Group.

- **ACL Table**: IP packets miss the ACL table and the L3 Unicast Group is
  executed.

- **L3 Unicast Group**: Sets the source MAC address, destination MAC address
  and VLAN ID, then goes to the L2 Interface Group.

- **L2 Interface Group**: The internally assigned VLAN is popped before the
  packet is sent to the output port that goes to the destination host.


The L3 unicast mechanism is designed to support inter-rack (inter-subnet)
untagged communication when the destination host is known.

Path Calculation and Failover - Unicast
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Coming soon...


L3 Multicast
------------

.. image:: images/arch-l3m.png
    :width: 800px

Fig. 11: L3 multicast

.. image:: images/arch-l3m-pipeline.png
    :width: 1000px

Fig. 12: Simplified L3 multicast pipeline

The L3 multicast mechanism is designed to support use cases such as IPTV.
Multicast traffic comes in from the upstream router, is replicated by the leaf
and spine switches, is sent to multiple OLTs, and eventually reaches the
subscribers.

.. note::
    We would like to support different combinations of ingress/egress VLAN,
    including:

    - untagged in -> untagged out
    - untagged in -> tagged out
    - tagged in -> untagged out
    - tagged in -> same tagged out
    - tagged in -> different tagged out

    However, due to the above-mentioned OF-DPA restrictions:

    - It is NOT possible to chain an L3 multicast group directly to an L2
      interface group if we want to change the VLAN ID, since the chained L2
      interface group must be in the same VLAN as the L3 multicast group.

    - It is NOT possible to change the VLAN ID by chaining an L3 multicast
      group to L3 interface groups, since all output ports should have the
      same VLAN but the spec requires chained L3 interface groups to have VLAN
      IDs that differ from each other.

    This means that if we need to change the VLAN ID, we must change it before
    the packets enter the multicast routing table.

    The only viable solution is changing the VLAN ID in the VLAN table.

    We change the VLAN tag on the ingress switch (i.e. the switch that connects
    to the upstream router) when necessary.

    On transit (spine) and egress (destination leaf) switches, the output VLAN
    tag remains the same as the input VLAN tag.

Pipeline Walkthrough - Ingress Leaf Switch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. csv-table:: Table 1. All Possible VLAN Combinations on Ingress Switch
    :file: tables/arch-mcast-ingress.csv
    :widths: 2, 5, 5, 10, 10, 5
    :header-rows: 1

.. note::
    In the presence of ``vlan-untagged`` configuration on the ingress port of
    the ingress switch, the ``vlan-untagged`` VLAN will be used instead of
    4094.

    The reason is that we cannot distinguish unicast and multicast traffic in
    that case, and therefore must assign the same VLAN to the packet.

    In this case, the VLAN is popped in the L2 Interface Group (L2IG) anyway.

Table 1 shows all possible VLAN combinations on the ingress switches and how
the packet is processed through the pipeline. We take the second case,
**untagged -> tagged 200**, as an example to explain the details.

- **VLAN Table**: An untagged packet is assigned the **egress VLAN ID**.

- **TMAC Table**: Since the destination MAC of an L3 multicast packet is a
  multicast MAC address, the packet matches the TMAC table and goes to the
  multicast routing table.

- **Multicast Routing Table**: This table looks up the multicast group
  (destination multicast IP) and points the packet to the corresponding L3
  multicast group.

- **ACL Table**: Multicast packets miss the ACL table and the L3 multicast
  group is executed.

- **L3 Multicast Group**: The packet is matched by the **egress VLAN ID** and
  forwarded to multiple L2 interface groups that map to output ports.

- **L2 Interface Group**: The egress VLAN is kept in this case and the
  packet is sent to the output port that goes to the transit spine switch.
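
A rough model of the ingress rule described above: the VLAN table rewrites
straight to the egress VLAN, and for untagged egress the pipeline carries
4094, or the port's ``vlan-untagged`` VLAN per the note. Table 1 itself is not
reproduced here; in particular, using 4094 for *tagged in -> untagged out* is
our assumption, and the function names are hypothetical.

```python
from typing import Optional

INTERNAL_VLAN = 4094


def ingress_pipeline_vlan(in_vlan: Optional[int], egress_vlan: Optional[int],
                          port_vlan_untagged: Optional[int] = None) -> int:
    """VLAN assigned by the ingress leaf's VLAN table for a multicast stream.

    in_vlan / egress_vlan: VLAN on the wire at ingress / toward the
    subscribers (None means untagged).
    """
    if egress_vlan is not None:
        return egress_vlan  # rewrite to the egress VLAN up front
    if in_vlan is None and port_vlan_untagged is not None:
        return port_vlan_untagged  # vlan-untagged replaces 4094
    return INTERNAL_VLAN  # assumption also covers tagged in -> untagged out


def wire_vlan(pipeline_vlan: int, egress_vlan: Optional[int]) -> Optional[int]:
    """L2 Interface Group: pop the VLAN for untagged egress, keep it otherwise."""
    return None if egress_vlan is None else pipeline_vlan
```

``ingress_pipeline_vlan(None, 200)`` returns 200 and ``wire_vlan(200, 200)``
keeps the tag, matching the *untagged -> tagged 200* walkthrough above.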
Charles Chan33bac082019-09-12 01:07:51 -0700644
645
646Pipeline Walkthrough - Transit Spine Switch and Egress Leaf Switch
647^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
648
649.. csv-table:: Table 2. All Possible VLAN Combinations on Transit/Egress Switch
650 :file: tables/arch-mcast-transit-egress.csv
651 :widths: 2, 5, 5, 10, 10, 5
652 :header-rows: 1
653
Zack Williamsd63d35b2020-06-23 14:12:46 -0700654Table 2 shows all possible VLAN combinations on the transit/egress switches and
655how the packet is processed through the pipeline.
Charles Chan33bac082019-09-12 01:07:51 -0700656
Zack Williamsd63d35b2020-06-23 14:12:46 -0700657Note that we have already changed the VLAN tag to the desired egress VLAN on
658the ingress switch.
659
660Therefore, there are only two cases on the transit/egress switches - either
661keep it untagged or keep it tagged. We take the first case **untagged ->
662untagged** as an example to explain more details.


- **VLAN Table**: An untagged packet will be assigned an **internal VLAN ID**
  according to the input port and the subnet configured on the input port.
  Packets of the same subnet will have the same internal VLAN ID.

- **TMAC Table**: (same as ingress switch)

- **Multicast Routing Table**: (same as ingress switch)

- **ACL Table**: (same as ingress switch)

- **L3 Multicast Group**: The packet will be matched by **internal VLAN ID**
  and forwarded to multiple L2 interface groups that map to output ports.

- **L2 Interface Group**: The egress VLAN will be popped in this case and the
  packet will be sent to the output port that goes to the egress leaf switch.
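
The **untagged -> untagged** walkthrough above can be sketched in Python as
follows. This is an illustration under stated assumptions, not ONOS code: the
port-to-subnet map, the one-internal-VLAN-per-subnet allocation, the L3
multicast group contents, and every numeric value are hypothetical, and the
TMAC and ACL stages are left out because they behave as on the ingress switch.

.. code-block:: python

   from typing import Dict, List, Optional, Tuple

   # Hypothetical configuration: input port -> subnet configured on that port.
   PORT_SUBNETS: Dict[int, str] = {1: "10.0.1.0/24", 2: "10.0.1.0/24", 3: "10.0.2.0/24"}

   # Hypothetical allocation: one internal VLAN per configured subnet.
   INTERNAL_VLANS: Dict[str, int] = {"10.0.1.0/24": 4000, "10.0.2.0/24": 4001}

   # Hypothetical L3 multicast groups keyed by (VLAN ID, group IP);
   # each bucket is an L2 interface group given as (output port, pop_vlan).
   L3_MCAST_GROUPS: Dict[Tuple[int, str], List[Tuple[int, bool]]] = {
       (4000, "239.0.0.1"): [(5, True), (6, True)],
   }


   def vlan_table(in_port: int, vlan: Optional[int]) -> int:
       """Tagged packets keep their VLAN; untagged ones get the subnet's internal VLAN."""
       if vlan is not None:
           return vlan
       return INTERNAL_VLANS[PORT_SUBNETS[in_port]]


   def transit_multicast(
       in_port: int, vlan: Optional[int], dst_ip: str
   ) -> List[Tuple[int, Optional[int]]]:
       """Return (output port, VLAN on the wire or None if untagged) per copy."""
       pipeline_vlan = vlan_table(in_port, vlan)
       # L3 Multicast Group: selected by the internal VLAN ID for untagged input.
       buckets = L3_MCAST_GROUPS.get((pipeline_vlan, dst_ip), [])
       # L2 Interface Group: pop the VLAN (untagged out) or keep it (tagged out).
       return [(port, None if pop else pipeline_vlan) for port, pop in buckets]


   # Ports 1 and 2 share a subnet, so both are assigned internal VLAN 4000.
   print(vlan_table(1, None), vlan_table(2, None))  # 4000 4000
   print(transit_multicast(1, None, "239.0.0.1"))  # [(5, None), (6, None)]

Because the VLAN is popped in the L2 interface group, both copies leave the
switch untagged, matching the **untagged -> untagged** case.
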

Path Calculation and Failover - Multicast
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Coming soon...

VLAN Cross Connect
------------------

.. image:: images/arch-xconnect.png
   :width: 800px

Fig. 13 VLAN cross connect

.. image:: images/arch-xconnect-pipeline.png
   :width: 1000px

Fig. 14 Simplified VLAN cross connect pipeline

VLAN Cross Connect was originally designed to support Q-in-Q packets between
OLTs and BNGs.

The cross connect pair consists of two output ports.

Any packet that comes in on one port with the specified VLAN tag will be sent
to the other port.

.. note::
   VLAN cross connect can only connect **two ports on the same switch**. A
   :doc:`Pseudowire <configuration/pseudowire>` is required to connect ports
   across different switches.
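
The constraint in the note above can be written as a validation step. The
following is a minimal Python sketch; the ``XConnect`` record and its fields
are hypothetical and do not reflect the actual ONOS cross connect configuration
format.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Tuple


   @dataclass(frozen=True)
   class XConnect:
       # Hypothetical record for one cross connect.
       vlan: int
       endpoints: Tuple[Tuple[str, int], ...]  # (device ID, port number) pairs


   def validate(xc: XConnect) -> None:
       """Raise ValueError unless xc joins exactly two distinct ports on one switch."""
       if len(xc.endpoints) != 2:
           raise ValueError("a cross connect pair consists of exactly two ports")
       (dev_a, port_a), (dev_b, port_b) = xc.endpoints
       if dev_a != dev_b:
           raise ValueError("ports are on different switches; use a pseudowire")
       if port_a == port_b:
           raise ValueError("the two ports must be different")


   validate(XConnect(vlan=10, endpoints=(("leaf1", 1), ("leaf1", 2))))  # passes
   try:
       validate(XConnect(vlan=10, endpoints=(("leaf1", 1), ("leaf2", 1))))
   except ValueError as err:
       print(err)  # ports are on different switches; use a pseudowire
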

We use an L2 Flood Group to implement VLAN Cross Connect.

The L2 Flood Group for a cross connect consists of only two ports.

The input port will be removed before flooding according to the spec, which
creates exactly the desired cross connect behavior.
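
Why a two-port flood group behaves as a cross connect can be shown in a few
lines. This Python sketch models only the input-port pruning described above;
``l2_flood`` is a hypothetical helper, not an OF-DPA API, and real group
buckets carry more than a port number.

.. code-block:: python

   from typing import List


   def l2_flood(group_ports: List[int], in_port: int) -> List[int]:
       """Flood to every port in the group except the one the packet came in on."""
       return [port for port in group_ports if port != in_port]


   # A cross connect flood group holds exactly two ports, so removing the
   # input port always leaves exactly one output: the other port.
   xconnect_group = [1, 2]
   print(l2_flood(xconnect_group, 1))  # [2]
   print(l2_flood(xconnect_group, 2))  # [1]

With more than two ports the same rule produces ordinary flooding, so the
two-port limit is what turns flooding into a point-to-point connection.
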

Pipeline Walkthrough - Cross Connect
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **VLAN Table**: When a tagged packet comes in, we no longer need to assign
  the internal VLAN. The original VLAN will be carried through the entire
  pipeline.

- **TMAC Table**: Since the VLAN will not match any internal VLAN assigned to
  untagged packets, the packet will miss the TMAC table and go to the bridging
  table.

- **Bridging Table**: The packet will hit the flow rule that matches the cross
  connect VLAN ID and will be sent to the corresponding L2 Flood Group.

- **ACL Table**: IP packets will miss the ACL table and the L2 flood group will
  be executed.

- **L2 Flood Group**: Consists of two L2 interface groups related to this cross
  connect VLAN.

- **L2 Interface Group**: The original VLAN will NOT be popped before the
  packet is sent to the output port.
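
The cross connect walkthrough above can be traced table by table with the
following Python sketch. The internal VLAN set, the VLAN-to-flood-group map,
and the port numbers are hypothetical, and the ACL Table is assumed to miss as
described; the sketch is an illustration, not the OF-DPA pipeline itself.

.. code-block:: python

   from typing import Dict, List, Optional, Set, Tuple

   # Hypothetical switch state.
   INTERNAL_VLANS: Set[int] = {4000, 4001}  # assigned to untagged subnets
   XCONNECT_FLOOD_GROUPS: Dict[int, List[int]] = {10: [1, 2]}  # VLAN -> flood ports


   def xconnect_pipeline(in_port: int, vlan: Optional[int]) -> List[Tuple[int, int]]:
       """Return (output port, VLAN tag on the wire) for a cross connect packet."""
       # VLAN Table: only tagged traffic is handled here; the original VLAN is kept.
       if vlan is None:
           raise ValueError("this sketch covers tagged cross connect traffic only")
       # TMAC Table: per the walkthrough, a non-internal VLAN misses TMAC and
       # goes on to bridging; internal VLANs are out of scope for this sketch.
       if vlan in INTERNAL_VLANS:
           raise ValueError("internal VLANs are not used for cross connect")
       # Bridging Table: match the cross connect VLAN ID -> L2 Flood Group
       # (an unknown VLAN yields no flood group and no output here).
       flood_ports = XCONNECT_FLOOD_GROUPS.get(vlan, [])
       # ACL Table: assumed to miss, so the L2 Flood Group is executed.
       # L2 Flood Group: every member except the input port.
       # L2 Interface Group: the original VLAN is NOT popped.
       return [(port, vlan) for port in flood_ports if port != in_port]


   print(xconnect_pipeline(1, 10))  # [(2, 10)]
   print(xconnect_pipeline(2, 10))  # [(1, 10)]
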

vRouter
-------

.. image:: images/arch-vr.png
   :width: 800px

Fig. 15 vRouter

The Trellis fabric needs to be connected to the external world via the vRouter
functionality. **In the networking industry, the term vRouter implies a
"router in a VM". This is not the case in Trellis**. Trellis vRouter is NOT a
software router.

**Only the control plane of the router, i.e., routing protocols, runs in a VM**.
We use the Quagga routing protocol suite as the control plane for vRouter.

The **vRouter data plane is entirely in hardware**. Essentially, the entire
hardware fabric serves as the (distributed) data plane for vRouter.

The **external router views the entire Trellis fabric as a single router**.

.. image:: images/arch-vr-overview.png

.. image:: images/arch-vr-logical.png

.. note::
   Dual external routers are also supported for redundancy. Visit
   :doc:`External Connectivity <configuration/dual-homing>` for details.

Pipeline Walkthrough - vRouter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The pipeline is exactly the same as for L3 unicast. We just install additional
flow rules in the unicast routing table on each leaf router.
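
A minimal Python sketch of that idea follows. It assumes the routes learned by
Quagga are available as (prefix, next hop) pairs and that every leaf receives
the same additional entries; the helper names, leaf names, and addresses are
hypothetical, and how each leaf reaches the next hop across the fabric is not
modeled. The lookup applies longest-prefix match, as a unicast routing table
does.

.. code-block:: python

   import ipaddress
   from typing import Dict, List, Optional, Tuple

   # (prefix, next hop) as learned by the routing control plane.
   Route = Tuple[ipaddress.IPv4Network, str]


   def install_external_routes(leaves: List[str], routes: List[Route]) -> Dict[str, List[Route]]:
       """Give every leaf's unicast routing table the same additional routes."""
       return {leaf: list(routes) for leaf in leaves}


   def lookup(table: List[Route], dst: str) -> Optional[str]:
       """Longest-prefix match over one leaf's routing entries."""
       addr = ipaddress.ip_address(dst)
       matches = [(net, nh) for net, nh in table if addr in net]
       if not matches:
           return None
       _, next_hop = max(matches, key=lambda m: m[0].prefixlen)
       return next_hop


   routes: List[Route] = [
       (ipaddress.ip_network("0.0.0.0/0"), "192.168.100.1"),
       (ipaddress.ip_network("203.0.113.0/24"), "192.168.100.2"),
   ]
   tables = install_external_routes(["leaf1", "leaf2"], routes)
   print(lookup(tables["leaf2"], "203.0.113.7"))  # 192.168.100.2
   print(lookup(tables["leaf1"], "8.8.8.8"))  # 192.168.100.1

Installing identical entries on every leaf is what lets the external router
treat the whole fabric as one router: any leaf can forward toward the same
next hop.
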


Learn More
----------

.. tip::
   Most of our design discussions and meeting notes are kept in `Google Drive
   <https://drive.google.com/drive/folders/0Bz9dNKPVvtgsR0M5R0hWSHlfZ0U>`_.
   If you are wondering why features are designed and implemented in a certain
   way, you may find the answers there.