| Architecture and Design |
| *********************** |
| |
| Overview |
| -------- |
| |
| .. image:: images/arch-overview.png |
| :width: 1000px |
| |
| Trellis operates as a **hybrid L2/L3 fabric**. |
| |
As a **pure (or classic) SDN** solution, Trellis **does not use any of the
traditional control protocols** typically found in networking, a non-exhaustive
list of which includes: STP, MSTP, RSTP, LACP, MLAG, PIM, IGMP, OSPF, IS-IS,
TRILL, RSVP, LDP and BGP.
| |
Instead, Trellis **uses an SDN Controller (ONOS) decoupled from the data-plane
hardware to directly program ASIC forwarding tables using OpenFlow, in
conjunction with OF-DPA**, an open API from Broadcom running on the switches.
| |
| In this design, a set of applications running on ONOS implement all the fabric |
| functionality and features, such as **Ethernet switching**, **IP routing**, |
| **multicast**, **DHCP Relay**, **pseudowires** and more. |
| |
| .. note:: |
| You can learn more about Trellis features and design concepts by visiting |
| the `Project Website <https://opennetworking.org/trellis>`_ and reading the |
| `Platform Brief |
| <https://www.opennetworking.org/wp-content/uploads/2019/09/TrellisPlatformBrief.pdf>`_. |
| |
| |
| Introduction to OF-DPA Pipeline |
| ------------------------------- |
| |
In this design note, we explain the design choices we have made and how we
worked around the OF-DPA (OpenFlow Data Plane Abstraction) pipeline
restrictions to implement the features we need. We start by explaining the
OF-DPA flow tables and group tables we use.
| |
Fig. 1 shows a simplified overview of the OF-DPA pipeline.
| |
| .. image:: images/arch-ofdpa.png |
| :width: 1000px |
| |
| Fig. 1 Simplified OF-DPA pipeline overview |
| |
| Flow Tables |
| ----------- |
| |
| VLAN Table |
| ^^^^^^^^^^ |
| .. note:: |
    The **VLAN Flow Table (id=10)** is used for IEEE 802.1Q VLAN assignment and
    filtering, to specify how VLANs are to be handled on a particular port.
| **All packets must have an associated VLAN id in order to be processed by |
| subsequent tables**. |
| |
| **Table miss**: goto **ACL table**. |
| |
According to the OF-DPA spec, we need to assign a VLAN ID even to untagged
packets. Each untagged packet will be tagged with an **internal VLAN** when
handled by the VLAN table. The internal VLAN will be popped when the packet is
sent to an output port or to the controller. The internal VLAN is assigned
according to the subnet configuration of the input port. Packets coming from
ports that do not have a subnet configured (e.g. the spine-facing ports) will
be tagged with VLAN ID **4094**.
| |
| The internal VLAN is also used to determine the subnet when a packet needs to |
| be flooded to all ports in the same subnet. (See L2 Broadcast section for |
| detail.) |
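The assignment logic can be summarized with a small sketch. This is an
illustrative model only; the port-to-VLAN mapping and the port numbers are
hypothetical, not actual ONOS code or configuration:

.. code-block:: python

    from typing import Optional

    # Untagged packets on ports with no subnet configured (e.g. spine-facing
    # ports) get the reserved internal VLAN 4094.
    DEFAULT_INTERNAL_VLAN = 4094

    # Hypothetical mapping: input port -> internal VLAN of its configured subnet.
    port_subnet_vlan = {
        1: 100,  # e.g. port 1 serves subnet 10.0.1.0/24
        2: 200,  # e.g. port 2 serves subnet 10.0.2.0/24
    }

    def assign_internal_vlan(in_port: int, vlan: Optional[int]) -> int:
        """Model of the VLAN table: every packet leaves with a VLAN ID."""
        if vlan is not None:
            return vlan  # tagged packets keep their VLAN
        return port_subnet_vlan.get(in_port, DEFAULT_INTERNAL_VLAN)

    assert assign_internal_vlan(1, None) == 100   # untagged, subnet port
    assert assign_internal_vlan(9, None) == 4094  # untagged, spine-facing port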
| |
| Termination MAC Table |
| ^^^^^^^^^^^^^^^^^^^^^ |
| .. note:: |
| The **Termination MAC (TMAC) Flow Table (id=20)** determines whether to do |
| bridging or routing on a packet. |
| |
    It identifies routed packets by their destination MAC, VLAN, and Ethertype.
| |
| Routed packet rule types use a Goto-Table instruction to indicate that the |
| next table is one of the routing tables. |
| |
| **Table miss**: goto **Bridging table**. |
| |
In this table, we determine which table the packet should go to by checking
its destination MAC address and Ethernet type (summarized in the sketch after
this list):
| |
| - if dst_mac = router MAC and eth_type = ip, goto **unicast routing** table |
| |
| - if dst_mac = router MAC and eth_type = mpls, goto **MPLS table** |
| |
- if dst_mac = multicast MAC (01:00:5E:00:00:00/FF:FF:FF:80:00:00), goto
  **multicast routing** table
| |
- if none of the above match, goto **bridging table**
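A conceptual model of this dispatch follows. The router MAC is a made-up
value and the table names are shorthand for the tables described in this
section; this is not ONOS code:

.. code-block:: python

    ETH_IPV4, ETH_MPLS = 0x0800, 0x8847
    ROUTER_MAC = "00:00:00:00:aa:01"  # hypothetical leaf router MAC

    def is_ipv4_multicast_mac(mac: str) -> bool:
        # 01:00:5E:00:00:00/FF:FF:FF:80:00:00 (the mask covers 25 bits)
        value = int(mac.replace(":", "")[:8], 16)
        return (value & 0xFFFFFF80) == 0x01005E00

    def tmac_next_table(dst_mac: str, eth_type: int) -> str:
        if dst_mac == ROUTER_MAC and eth_type == ETH_IPV4:
            return "unicast_routing"    # table 30
        if dst_mac == ROUTER_MAC and eth_type == ETH_MPLS:
            return "mpls_1"             # table 24
        if is_ipv4_multicast_mac(dst_mac):
            return "multicast_routing"  # table 40
        return "bridging"               # table 50 (table miss)

    assert tmac_next_table("01:00:5E:00:00:01", ETH_IPV4) == "multicast_routing"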
| |
| MPLS Tables |
| ^^^^^^^^^^^ |
| .. note:: |
| The MPLS pipeline can support three **MPLS Flow Tables, MPLS Table 0 |
| (id=23), MPLS Table 1 (id=24) and MPLS Table 2 (id=25)**. |
| |
| An MPLS Flow Table lookup matches the label in the outermost MPLS shim |
| header in the packets. |
| |
    - MPLS Table 0 is only used to pop a protection label on platforms that
      support this table, or to detect an MPLS-TP Section OAM PDU.
| |
| - MPLS Table 1 and MPLS Table 2 can be used for all label operations. |
| |
    - MPLS Table 1 and MPLS Table 2 are synchronized flow tables: updating
      one updates the other.
| |
| **Table miss**: goto **ACL table**. |
| |
We only use MPLS Table 1 (id=24) in the current design.

MPLS packets are matched on their MPLS label.

The packet will go to an **L3 interface group**, where the MPLS label is
popped, and will then be forwarded to the destination leaf switch.
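As a sketch, the MPLS table can be thought of as a map from the label (the
segment ID we assign to each leaf) to the group chain that reaches that leaf.
The label values and group names below are invented for illustration:

.. code-block:: python

    # Hypothetical label-to-leaf mapping on a spine switch. The tuple holds
    # the actions applied on a hit: pop the label, then hand the packet to
    # the group that reaches the destination leaf.
    mpls_table_1 = {
        101: ("pop_mpls", "group_to_leaf1"),
        102: ("pop_mpls", "group_to_leaf2"),
    }

    def mpls_lookup(label: int):
        try:
            return mpls_table_1[label]
        except KeyError:
            return ("goto", "acl_table")  # table miss: goto ACL table

    assert mpls_lookup(101) == ("pop_mpls", "group_to_leaf1")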
| |
| |
| Unicast Routing Table |
| ^^^^^^^^^^^^^^^^^^^^^ |
| .. note:: |
| The **Unicast Routing Flow Table (id=30)** supports routing for potentially |
| large numbers of IPv4 and IPv6 flow entries using the hardware L3 tables. |
| |
| **Table miss**: goto **ACL table**. |
| |
| In this table, we determine where to output a packet by checking its |
| **destination IP (unicast)** address. |
| |
- if dst_ip is located at a **remote switch**, the packet will go to an **L3
  ECMP group**, be tagged with an MPLS label, and continue on to a spine switch

- if dst_ip is located at the **same switch**, the packet will go to an **L3
  interface group** and continue on to a host
| |
Note that the priority of flow entries in this table is sorted by prefix
length: a longer prefix (e.g. /32) has higher priority than a shorter prefix
(e.g. /0).
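For illustration, this ordering amounts to deriving the flow priority from the
prefix length so that the hardware realizes longest-prefix matching. The exact
priority values used internally may differ; this is a minimal sketch:

.. code-block:: python

    import ipaddress

    def flow_priority(prefix: str) -> int:
        """Longer prefixes get higher priority, so /32 beats /24 beats /0."""
        return ipaddress.ip_network(prefix).prefixlen

    routes = ["0.0.0.0/0", "10.0.1.0/24", "10.0.1.7/32"]
    routes.sort(key=flow_priority, reverse=True)
    assert routes[0] == "10.0.1.7/32"  # the most specific entry matches first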
| |
| |
| Multicast Routing Table |
| ^^^^^^^^^^^^^^^^^^^^^^^ |
| .. note:: |
| The **Multicast Routing Flow Table (id=40)** supports routing for IPv4 and |
| IPv6 multicast packets. |
| |
| **Table miss**: goto **ACL table**. |
| |
| Flow entries in this table always match the **destination IP (multicast)**. |
| |
| Matched packets will go to an **L3 multicast group** and further go to the next |
| switch or host. |
| |
| |
| Bridging Table |
| ^^^^^^^^^^^^^^ |
| .. note:: |
| The **Bridging Flow Table (id=50)** supports Ethernet packet switching for |
| potentially large numbers of flow entries using the hardware L2 tables. |
| |
| The Bridging Flow Table forwards either based on VLAN (normal switched |
| packets) or Tunnel id (isolated forwarding domain packets), with the Tunnel |
| id metadata field used to distinguish different flow table entry types by |
| range assignment. |
| |
| **Table miss**: goto **ACL table**. |
| |
| In this table, we match the **VLAN ID** and the **destination MAC address** and |
| determine where the packet should be forwarded to. |
| |
- if the destination MAC can be matched, the packet will go to the **L2
  interface group** and be sent to the destination host.

- if the destination MAC cannot be matched, the packet will go to the **L2
  flood group** and be flooded to all ports in the same subnet.
| |
Since we cannot match on IP in the bridging table, we use the VLAN ID to
determine which subnet this packet should be flooded to.
| |
The VLAN ID can be either (1) the internal VLAN assigned to untagged packets
in the VLAN table or (2) the VLAN ID that comes with tagged packets.
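A minimal model of the bridging lookup follows; the MAC table contents,
group names and VLAN values are hypothetical:

.. code-block:: python

    # (vlan, dst_mac) -> L2 interface group, for hosts that have been learnt.
    mac_table = {
        (100, "00:00:00:00:00:02"): "l2_interface_group_port2",
    }
    # One L2 flood group per VLAN (subnet), as the spec allows.
    flood_group = {100: "l2_flood_group_vlan100"}

    def bridging_lookup(vlan: int, dst_mac: str) -> str:
        hit = mac_table.get((vlan, dst_mac))
        if hit is not None:
            return hit            # known host: forward to its port
        return flood_group[vlan]  # unknown host: flood the subnet

    assert bridging_lookup(100, "00:00:00:00:00:99") == "l2_flood_group_vlan100"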
| |
| |
| Policy ACL Table |
| ^^^^^^^^^^^^^^^^ |
| .. note:: |
| The Policy ACL Flow Table supports wide, multi-field matching. |
| |
| Most fields can be wildcard matched, and relative priority must be |
| specified in all flow entry modification API calls. |
| |
| This is the preferred table for matching BPDU and ARP packets. It also |
| provides the Metering instruction. |
| |
| **Table miss**: **do nothing**. |
| The packet will be forwarded using the output or group in the action set, |
| if any. |
| |
| If the action set does not have a group or output action the packet is dropped. |
| |
In the ACL table we trap **ARP**, **LLDP**, **BDDP** and **DHCP** packets and
send them to the **controller**.
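A sketch of these trap rules, using the well-known ethertypes and the DHCP
server port. The rule representation is invented for illustration, and note
that ARP is copied to the controller while still being bridged (see the ARP
section below):

.. code-block:: python

    TRAP_RULES = [
        {"eth_type": 0x0806},             # ARP (copied, still bridged)
        {"eth_type": 0x88CC},             # LLDP
        {"eth_type": 0x8942},             # BDDP
        {"ip_proto": 17, "udp_dst": 67},  # DHCP towards the server
    ]

    def acl_lookup(pkt: dict) -> str:
        for rule in TRAP_RULES:
            if all(pkt.get(field) == value for field, value in rule.items()):
                return "send_to_controller"
        return "table_miss"  # do nothing: the earlier action set is executed

    assert acl_lookup({"eth_type": 0x0806}) == "send_to_controller"
    assert acl_lookup({"eth_type": 0x0800}) == "table_miss"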
| |
| Group Tables |
| ------------ |
| |
| L3 ECMP Group |
| ^^^^^^^^^^^^^ |
| .. note:: |
| OF-DPA L3 ECMP group entries are of OpenFlow type **SELECT**. |
| |
| For IP routing the action buckets reference the OF-DPA **L3 Unicast group** |
| entries that are members of the multipath group for ECMP forwarding. |
| |
| An OF-DPA L3 ECMP Group entry can also be used in a Provider Edge Router. |
| |
| In this packet flow it can chain to either an **MPLS L3 Label** group entry |
| or to an **MPLS Fast Failover** group entry. |
| |
| An OF-DPA L3 ECMP Group entry can be specified as a routing target instead |
| of an OF-DPA L3 Unicast Group entry. Selection of an action bucket for |
| forwarding a particular packet is hardware-specific. |
| |
| MPLS Label Group |
| ^^^^^^^^^^^^^^^^ |
| .. note:: |
| MPLS Label Group entries are of OpenFlow **INDIRECT** type. |
| |
| There are four MPLS label Group entry subtypes, all with similar structure. |
| |
| These can be used in different configurations to **push up to three |
| labels** for tunnel initiation or LSR swap. |
| |
| MPLS Interface Group |
| ^^^^^^^^^^^^^^^^^^^^ |
| .. note:: |
| MPLS Interface Group Entry is of OpenFlow type **INDIRECT**. |
| |
| It is used to **set the outgoing L2 header** to reach the next hop label |
| switch router or provider edge router. |
| |
We use the **L3 ECMP** group to randomly pick one spine switch when we need to
route a packet from a leaf to the spines.

We point each bucket to an **MPLS Label** Group, in which the MPLS labels are
pushed onto the packets to realize the Segment Routing mechanism. (More
specifically, we use the subtype 2 **MPLS L3 VPN Label**.)

Each MPLS Label Group in turn points to an **MPLS Interface** Group, in which
the destination MAC is set to the next hop (spine router).

Finally, the packet will go to an **L2 Interface** Group and be sent to the
output port that goes to the spine router.
| |
Details of how segment routing is implemented are given in the L3 unicast
section below; the sketch below shows the resulting group chain.
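A data-structure sketch of the chain just described. Group contents, MAC
addresses and the label value are hypothetical; the chaining order follows
the text: L3 ECMP -> MPLS L3 VPN Label (subtype 2) -> MPLS Interface -> L2
Interface:

.. code-block:: python

    l2_interface = {"type": "L2_INTERFACE", "out_port": 32, "pop_vlan": True}

    mpls_interface = {
        "type": "MPLS_INTERFACE",
        "src_mac": "00:00:00:00:aa:01",  # this leaf's router MAC (hypothetical)
        "dst_mac": "00:00:00:00:bb:01",  # next-hop spine MAC (hypothetical)
        "chained": l2_interface,
    }

    mpls_l3vpn_label = {
        "type": "MPLS_L3_VPN_LABEL",     # MPLS Label Group, subtype 2
        "push_label": 102,               # segment ID of the destination leaf
        "chained": mpls_interface,
    }

    # One bucket per spine; the hardware picks one bucket per flow.
    l3_ecmp = {"type": "L3_ECMP", "buckets": [mpls_l3vpn_label]}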
| |
| L3 Unicast Group |
| ^^^^^^^^^^^^^^^^ |
| .. note:: |
| OF-DPA L3 Unicast group entries are of OpenFlow **INDIRECT** type. |
| |
| L3 Unicast group entries are used to supply the routing next hop and output |
| interface for packet forwarding. |
| |
| To properly route a packet from either the Routing Flow Table or the Policy |
| ACL Flow Table, the forwarding flow entry must reference an L3 Unicast |
| Group entry. |
| |
| All packets must have a VLAN tag. |
| |
| **A chained L2 Interface group entry must be in the same VLAN as assigned |
| by the L3 Unicast Group** entry. |
| |
| We use L3 Unicast Group to rewrite the **source MAC**, **destination MAC** and |
| **VLAN ID** when routing is needed. |
| |
| L3 Multicast Group |
| ^^^^^^^^^^^^^^^^^^ |
| .. note:: |
| OF-DPA L3 Multicast group entries are of OpenFlow **ALL** type. |
| |
| The action buckets describe the interfaces to which multicast packet |
| replicas are forwarded. |
| |
| Note that: |
| |
| - Chained OF-DPA **L2 Interface** Group entries must be in the **same |
| VLAN** as the OF-DPA **L3 Multicast** group entry. However, |
| |
| - Chained OF-DPA **L3 Interface** Group entries must be in **different |
| VLANs** from the OF-DPA **L3 Multicast** Group entry, **and from each |
| other**. |
| |
| We use L3 multicast group to replicate multicast packets when necessary. |
| |
An L3 multicast group may also consist of only one bucket, when replication is
not needed.

Details of how multicast is implemented are given in the L3 multicast section
below.
| |
| L2 Interface Group |
| ^^^^^^^^^^^^^^^^^^ |
| .. note:: |
| L2 Interface Group entries are of OpenFlow **INDIRECT** type, with a single |
| action bucket. |
| |
| OF-DPA L2 Interface group entries are used for egress VLAN filtering and |
| tagging. |
| |
| If a specific set of VLANs is allowed on a port, appropriate group entries |
| must be defined for the VLAN and port combinations. |
| |
| Note: OF-DPA uses the L2 Interface group declaration to configure the port |
| VLAN filtering behavior. |
| |
| This approach was taken since OpenFlow does not support configuring VLANs |
| on physical ports. |
| |
| L2 Flood Group |
| ^^^^^^^^^^^^^^ |
| .. note:: |
    L2 Flood Group entries are used by VLAN Flow Table wildcard (destination
    lookup failure, or DLF) rules.
| |
| Like OF-DPA L2 Multicast group entry types they are of OpenFlow **ALL** |
| type. |
| |
| The action buckets each encode an output port. |
| |
    Each OF-DPA L2 Flood Group entry bucket forwards a replica to an output
    port, except for the packet's IN_PORT.

    All of the OF-DPA L2 Interface Group entries referenced by the OF-DPA Flood
    Group entry, and the OF-DPA Flood Group entry itself, must be in the
    **same VLAN**.
| |
| Note: There can only be **one OF-DPA L2 Flood Group** entry defined **per |
| VLAN**. |
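The replication behavior can be modeled in a few lines; the port numbers are
arbitrary:

.. code-block:: python

    def flood(bucket_ports, in_port):
        """ALL-type group: one replica per bucket, excluding IN_PORT."""
        return [port for port in bucket_ports if port != in_port]

    vlan100_flood_ports = [1, 2, 3]  # the single flood group for VLAN 100
    assert flood(vlan100_flood_ports, 2) == [1, 3]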
| |
| L2 Unicast |
| ---------- |
| |
| .. image:: images/arch-l2u.png |
| :width: 800px |
| |
| Fig. 2: L2 unicast |
| |
| .. image:: images/arch-l2u-pipeline.png |
| :width: 1000px |
| |
| Fig. 3: Simplified L2 unicast pipeline |
| |
| The L2 unicast mechanism is designed to support **intra-rack (intra-subnet)** |
| communication when the destination host is **known**. |
| |
| Pipeline Walkthrough - L2 Unicast |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| - **VLAN Table**: An untagged packet will be assigned an internal VLAN ID |
| according to the input port and the subnet configured on the input port. |
| Packets of the same subnet will have the same internal VLAN ID. |
| |
- **TMAC Table**: Since the destination MAC of an L2 unicast packet is not the
  MAC of the leaf router, the packet will miss the TMAC table and go to the
  bridging table.
| |
| - **Bridging Table**: If the destination MAC is learnt, there will be a flow |
| entry matching that destination MAC and pointing to an L2 interface group. |
| |
| - **ACL Table**: IP packets will miss the ACL table and the L2 interface group |
| will be executed. |
| |
- **L2 Interface Group**: The internally assigned VLAN will be popped before
  the packet is sent to the output port.
| |
| L2 Broadcast |
| ------------ |
| |
| .. image:: images/arch-l2f.png |
| :width: 800px |
| |
| Fig. 4: L2 broadcast |
| |
| .. image:: images/arch-l2f-pipeline.png |
| :width: 1000px |
| |
| Fig. 5: Simplified L2 broadcast pipeline |
| |
| Pipeline Walkthrough - L2 Broadcast |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| - **VLAN Table**: (same as L2 unicast) |
| |
| - **TMAC Table**: (same as L2 unicast) |
| |
| - **Bridging Table**: If the destination MAC is not learnt, there will NOT be a |
| flow entry matching that destination MAC. |
| |
  It will then fall back to a lower-priority entry that matches the VLAN
  (subnet) and points to an L2 flood group.
| |
| - **ACL Table**: IP packets will miss the ACL table and the L2 flood group will |
| be executed. |
| |
| - **L2 Flood Group**: Consists of all L2 interface groups related to this VLAN |
| (subnet). |
| |
- **L2 Interface Group**: The internally assigned VLAN will be popped before
  the packet is sent to the output port.
| |
| ARP |
| --- |
| |
| .. image:: images/arch-arp-pipeline.png |
| :width: 1000px |
| |
| Fig. 6: Simplified ARP pipeline |
| |
| All ARP packets will be forwarded according to the bridging pipeline. |
| |
| In addition, a **copy of the ARP packet will be sent to the controller**. |
| |
- The controller will use the ARP packets for **learning purposes and will
  update the host store** accordingly.

- The controller only **replies** to an ARP request if the request is trying to
  **resolve an interface address configured on the switch edge port**.
| |
| Pipeline Walkthrough - ARP |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
It is similar to L2 broadcast, except that ARP packets are matched by a
special ACL table entry and copied to the controller.
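A conceptual sketch of the controller-side behavior described above; the
configuration structure, the gateway IPs and the function names are invented:

.. code-block:: python

    edge_interface_ips = {"10.0.1.254", "10.0.2.254"}  # configured gateway IPs
    host_store = {}                                    # IP -> MAC, learnt hosts

    def handle_arp_copy(sender_ip, sender_mac, target_ip, is_request):
        host_store[sender_ip] = sender_mac  # always learn from the sender
        if is_request and target_ip in edge_interface_ips:
            return "reply_with_router_mac"  # controller answers for itself
        return None  # otherwise the data plane has already bridged/flooded it

    assert handle_arp_copy("10.0.1.1", "00:00:00:00:00:01",
                           "10.0.1.254", True) == "reply_with_router_mac"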
| |
| |
| L3 Unicast |
| ---------- |
| |
| .. image:: images/arch-l3u.png |
| :width: 800px |
| |
| Fig. 7: L3 unicast |
| |
| .. image:: images/arch-l3u-src-pipeline.png |
| :width: 1000px |
| |
| Fig. 8 Simplified L3 unicast pipeline - source leaf |
| |
| Pipeline Walkthrough - Source Leaf Switch |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| - **VLAN Table**: An untagged packet will be assigned an internal VLAN ID |
| according to the input port and the subnet configured on the input port. |
| Packets of the same subnet will have the same internal VLAN ID. |
| |
- **TMAC Table**: Since the destination MAC of an L3 unicast packet is the MAC
  of the leaf router and the Ethernet type is IPv4, the packet will match the
  TMAC table and go to the unicast routing table.
| |
- **Unicast Routing Table**: In this table we look up the destination IP of
  the packet and point the packet to the corresponding L3 ECMP group.
| |
| - **ACL Table**: IP packets will miss the ACL table and the L3 ECMP group will |
| be executed. |
| |
- **L3 ECMP Group**: Hashes on the 5-tuple to pick a spine switch and goes to
  the MPLS Label Group (see the hashing sketch after this list).
| |
| - **MPLS Label Group**: Push the MPLS label corresponding to the destination |
| leaf switch and goto the MPLS Interface Group. |
| |
| - **MPLS Interface Group**: Set source MAC address, destination MAC address, |
| VLAN ID and goto the L2 Interface Group. |
| |
- **L2 Interface Group**: The internally assigned VLAN will be popped before
  the packet is sent to the output port that goes to the spine.
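A sketch of 5-tuple hashing for ECMP bucket selection. Per the OF-DPA spec the
actual selection is hardware-specific; the point is only that packets of the
same flow always pick the same spine, avoiding reordering. CRC32 here is an
arbitrary stand-in for the hardware hash:

.. code-block:: python

    import zlib

    def pick_spine(src_ip, dst_ip, proto, src_port, dst_port, n_spines):
        """Deterministically map a 5-tuple onto one of n_spines buckets."""
        key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
        return zlib.crc32(key) % n_spines

    first = pick_spine("10.0.1.1", "10.0.2.1", 6, 12345, 80, 2)
    again = pick_spine("10.0.1.1", "10.0.2.1", 6, 12345, 80, 2)
    assert first == again  # same flow, same spine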
| |
| .. image:: images/arch-l3u-transit-pipeline.png |
| :width: 1000px |
| |
| Fig. 9 Simplified L3 unicast pipeline - spine |
| |
| Pipeline Walkthrough - Spine Switch |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| - **VLAN Table**: An untagged packet will be assigned an internal VLAN ID |
| according to the input port and the subnet configured on the input port. |
| Packets of the same subnet will have the same internal VLAN ID. |
| |
- **TMAC Table**: Since the destination MAC of an L3 unicast packet is the MAC
  of the spine router and the Ethernet type is MPLS, the packet will match the
  TMAC table and go to the MPLS table.
| |
- **MPLS Table**: In this table we look up the MPLS label of the packet,
  figure out the destination leaf switch, pop the MPLS label and point the
  packet to the L3 ECMP Group.
| |
- **ACL Table**: IP packets will miss the ACL table and the L3 ECMP group will
  be executed.
| |
- **L3 ECMP Group**: Hashes to pick a link (if there are multiple links) to
  the destination leaf and goes to the MPLS Interface Group.
| |
| - **MPLS Interface Group**: Set source MAC address, destination MAC address, |
| VLAN ID and goto the L2 Interface Group. |
| |
- **L2 Interface Group**: The internally assigned VLAN will be popped before
  the packet is sent to the output port that goes to the destination leaf
  switch.
| |
| .. image:: images/arch-l3u-dst-pipeline.png |
| :width: 1000px |
| |
| Fig. 10 Simplified L3 unicast pipeline - destination leaf |
| |
| Pipeline Walkthrough - Destination Leaf Switch |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| - **VLAN Table**: An untagged packet will be assigned an internal VLAN ID |
| according to the input port and the subnet configured on the input port. |
| Packets of the same subnet will have the same internal VLAN ID. |
| |
- **TMAC Table**: Since the destination MAC of an L3 unicast packet is the MAC
  of the leaf router and the Ethernet type is IPv4, the packet will match the
  TMAC table and go to the unicast routing table.
| |
- **Unicast Routing Table**: In this table we look up the destination IP of
  the packet and point the packet to the corresponding L3 Unicast Group.
| |
| - **ACL Table**: IP packets will miss the ACL table and the L3 Unicast Group |
| will be executed. |
| |
| - **L3 Unicast Group**: Set source MAC address, destination MAC address, VLAN |
| ID and goto the L2 Interface Group. |
| |
- **L2 Interface Group**: The internally assigned VLAN will be popped before
  the packet is sent to the output port that goes to the destination host.
| |
| |
The L3 unicast mechanism is designed to support **inter-rack (inter-subnet)**
untagged communication when the destination host is **known**.
| |
| Path Calculation and Failover - Unicast |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| Coming soon... |
| |
| |
| L3 Multicast |
| ------------ |
| |
| .. image:: images/arch-l3m.png |
| :width: 800px |
| |
| Fig. 11 L3 multicast |
| |
| .. image:: images/arch-l3m-pipeline.png |
| :width: 1000px |
| |
| Fig.12 Simplified L3 multicast pipeline |
| |
The L3 multicast mechanism is designed to support use cases such as IPTV. The
multicast traffic comes in from the upstream router, is replicated by the
leaf-spine switches, sent to multiple OLTs, and eventually reaches the
subscribers.
| |
| .. note:: |
| We would like to support different combinations of ingress/egress VLAN, |
| including |
| |
| - untagged in -> untagged out |
| - untagged in -> tagged out |
| - tagged in -> untagged out |
| - tagged in -> same tagged out |
| - tagged in -> different tagged out |
| |
    However, due to the above-mentioned OF-DPA restrictions,
| |
| - It is NOT possible to chain L3 multicast group to L2 interface group |
| directly if we want to change the VLAN ID |
| |
| - It is NOT possible to change VLAN ID by chaining L3 multicast group to L3 |
| interface group since all output ports should have the same VLAN but the |
| spec requires chained L3 interface group to have different VLAN ID from |
| each other. |
| |
| That means, if we need to change VLAN ID, we need to change it before the |
| packets get into the multicast routing table. |
| |
| The only viable solution is changing the VLAN ID in the VLAN table. |
| |
| We change the VLAN tag on the ingress switch (i.e. the switch that connects |
| to the upstream router) when necessary. |
| |
| On transit (spine) and egress (destination leaf) switches, output VLAN tag |
| will remain the same as input VLAN tag. |
| |
| Pipeline Walkthrough - Ingress Leaf Switch |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| .. csv-table:: Table 1. All Possible VLAN Combinations on Ingress Switch |
| :file: tables/arch-mcast-ingress.csv |
| :widths: 2, 5, 5, 10, 10, 5 |
| :header-rows: 1 |
| |
| .. note:: |
    In the presence of a ``vlan-untagged`` configuration on the ingress port
    of the ingress switch, the ``vlan-untagged`` VLAN will be used instead of
    4094.

    The reason is that we cannot distinguish unicast from multicast traffic in
    that case, and therefore must assign the same VLAN to the packet.

    The VLAN will in any case be popped in the L2 Interface Group (L2IG).
| |
Table 1 shows all possible VLAN combinations on the ingress switches and how
the packet is processed through the pipeline. We take the second case,
**untagged -> tagged 200**, as an example to explain the details (a sketch of
the VLAN handling follows the list).
| |
| - **VLAN Table**: An untagged packet will be assigned the **egress VLAN ID**. |
| |
- **TMAC Table**: Since the destination MAC of a multicast packet is a
  multicast MAC address, the packet will match the TMAC table and go to the
  multicast routing table.
| |
- **Multicast Routing Table**: In this table we look up the multicast group
  (destination multicast IP) and point the packet to the corresponding L3
  multicast group.
| |
| - **ACL Table**: Multicast packets will miss the ACL table and the L3 multicast |
| group will be executed. |
| |
| - **L3 Multicast Group**: The packet will be matched by **egress VLAN ID** and |
| forwarded to multiple L2 interface groups that map to output ports. |
| |
| - **L2 Interface Group**: The egress VLAN will be kept in this case and the |
| packet will be sent to the output port that goes to the transit spine switch. |
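One plausible reading of the cases spelled out above can be sketched as
follows; the authoritative matrix is Table 1 itself. ``None`` stands for
untagged, and all handling shown here happens in the VLAN table, before the
multicast routing table, as the note above explains:

.. code-block:: python

    def ingress_pipeline_vlan(ingress_vlan, egress_vlan):
        """VLAN carried through the ingress-switch pipeline; popping happens
        later, in the L2 Interface Group, whenever the egress is untagged."""
        if ingress_vlan is None and egress_vlan is None:
            return 4094         # untagged -> untagged: internal VLAN
        if egress_vlan is None:
            return ingress_vlan  # tagged -> untagged: popped at egress
        return egress_vlan       # rewrite (or keep) the VLAN in the VLAN table

    assert ingress_pipeline_vlan(None, 200) == 200  # untagged -> tagged 200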
| |
| |
| Pipeline Walkthrough - Transit Spine Switch and Egress Leaf Switch |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| .. csv-table:: Table 2. All Possible VLAN Combinations on Transit/Egress Switch |
| :file: tables/arch-mcast-transit-egress.csv |
| :widths: 2, 5, 5, 10, 10, 5 |
| :header-rows: 1 |
| |
| Table 2 shows all possible VLAN combinations on the transit/egress switches and |
| how the packet is processed through the pipeline. |
| |
| Note that we have already changed the VLAN tag to the desired egress VLAN on |
| the ingress switch. |
| |
Therefore, there are only two cases on the transit/egress switches: either
keep the packet untagged or keep it tagged. We take the first case,
**untagged -> untagged**, as an example to explain the details.
| |
| |
| - **VLAN Table**: An untagged packet will be assigned an **internal VLAN ID** |
| according to the input port and the subnet configured on the input port. |
| Packets of the same subnet will have the same internal VLAN ID. |
| |
| - **TMAC Table**: (same as ingress switch) |
| |
| - **Multicast Routing Table**: (same as ingress switch) |
| |
| - **ACL Table**: (same as ingress switch) |
| |
| - **L3 Multicast Group**: The packet will be matched by **internal VLAN ID** |
| and forwarded to multiple L2 interface groups that map to output ports. |
| |
- **L2 Interface Group**: The internal VLAN will be popped in this case and
  the packet will be sent to the output port that goes to the egress leaf
  switch.
| |
| Path Calculation and Failover - Multicast |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| Coming soon... |
| |
| VLAN Cross Connect |
| ------------------ |
| |
| .. image:: images/arch-xconnect.png |
| :width: 800px |
| |
| Fig. 13 VLAN cross connect |
| |
| .. image:: images/arch-xconnect-pipeline.png |
| :width: 1000px |
| |
| Fig. 14 Simplified VLAN cross connect pipeline |
| |
VLAN Cross Connect was originally designed to support Q-in-Q packets between
OLTs and BNGs.
| |
| The cross connect pair consists of two output ports. |
| |
Any packet that comes in on one port with the specific VLAN tag will be sent
out the other port.
| |
| .. note:: |
    It can only cross-connect **two ports on the same switch**.
    :doc:`Pseudowire <configuration/pseudowire>` is required to connect ports
    across different switches.
| |
We use an L2 Flood Group to implement VLAN Cross Connect.

The L2 Flood Group for a cross connect consists of only two ports.

The input port is removed before flooding, as the spec requires, which creates
exactly the desired behavior of a cross connect, as sketched below.
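Modeled as code, a two-bucket flood group behaves exactly like a cross
connect; the port numbers are arbitrary:

.. code-block:: python

    def cross_connect_output(pair, in_port):
        """Flooding a two-port group minus IN_PORT leaves the opposite port."""
        return [port for port in pair if port != in_port]

    xconnect_ports = (5, 6)  # e.g. an OLT-facing and a BNG-facing port
    assert cross_connect_output(xconnect_ports, 5) == [6]
    assert cross_connect_output(xconnect_ports, 6) == [5]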
| |
| Pipeline Walkthrough - Cross Connect |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| - **VLAN Table**: When a tagged packet comes in, we no longer need to assign |
| the internal VLAN. The original VLAN will be carried through the entire |
| pipeline. |
| |
- **TMAC Table**: Since the VLAN will not match any internal VLAN assigned to
  untagged packets, the packet will miss the TMAC table and go to the
  bridging table.
| |
- **Bridging Table**: The packet will hit the flow rule that matches the cross
  connect VLAN ID and be sent to the corresponding L2 Flood Group.
| |
| - **ACL Table**: IP packets will miss the ACL table and the L2 flood group will |
| be executed. |
| |
- **L2 Flood Group**: Consists of the two L2 interface groups related to this
  cross connect VLAN.

- **L2 Interface Group**: The original VLAN will NOT be popped before the
  packet is sent to the output port.
| |
| vRouter |
| ------- |
| |
| .. image:: images/arch-vr.png |
| :width: 800px |
| |
| Fig. 15 vRouter |
| |
| The Trellis fabric needs to be connected to the external world via the vRouter |
| functionality. **In the networking industry, the term vRouter implies a |
| "router in a VM". This is not the case in Trellis**. Trellis vRouter is NOT a |
| software router. |
| |
**Only the control plane of the router, i.e. the routing protocols, runs in a
VM**. We use the Quagga routing protocol suite as the control plane for
vRouter.
| |
| The **vRouter data plane is entirely in hardware**. Essentially the entire |
| hardware fabric serves as the (distributed) data plane for vRouter. |
| |
| The **external router views the entire Trellis fabric as a single router**. |
| |
| .. image:: images/arch-vr-overview.png |
| |
| .. image:: images/arch-vr-logical.png |
| |
| .. note:: |
    Dual external routers are also supported for redundancy. Visit
    :doc:`External Connectivity <configuration/dual-homing>` for details.
| |
| Pipeline Walkthrough - vRouter |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
The pipeline is exactly the same as for L3 unicast. We just install additional
flow rules in the unicast routing table on each leaf router, as sketched
below.
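Conceptually, routes learnt by Quagga become ordinary entries in each leaf's
unicast routing table, next to the fabric's own host and subnet routes. This
sketch models the behavior only; it is not the actual ONOS/Quagga integration
code, and the names are invented:

.. code-block:: python

    unicast_routing_table = {}  # prefix -> group that reaches the next hop

    def on_route_from_quagga(prefix, next_hop_group):
        """Install an externally learnt route on a leaf switch."""
        unicast_routing_table[prefix] = next_hop_group

    # e.g. a default route towards the external router's attachment point
    on_route_from_quagga("0.0.0.0/0", "l3_group_to_external_router")
    assert unicast_routing_table["0.0.0.0/0"] == "l3_group_to_external_router"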
| |
| |
| Learn More |
| ---------- |
| .. tip:: |
| Most of our design discussion and meeting notes are kept in `Google Drive |
| <https://drive.google.com/drive/folders/0Bz9dNKPVvtgsR0M5R0hWSHlfZ0U>`_. |
| If you are wondering why features are designed and implemented in a certain |
| way, you may find the answers there. |