Cumulus switch config

IBM has started delivering Cumulus-based Nvidia switches for ESS, so I’ve had to learn how to configure them.

Switches seem to always be delivered with a very old Cumulus version, 5.1.0, which doesn’t seem to be upgradeable to the current release without dropping the switch configuration. There are two release streams of Cumulus for Nvidia switches: a “mainline” stream and a Long-Term Support (LTS) stream. The current LTS version is 5.11.z, and mainline is 5.15.z:

Cumulus Linux Version   LTS Start Date   End of Life Date
5.8.z or earlier        N/A              April 2025
5.9.z                   April 2024       April 2027
5.10.z                  N/A              November 2025
5.11.z                  November 2024    November 2027
5.12.z                  N/A              February 2026
5.13.z                  N/A              May 2026
5.14.z                  N/A              July 2026
5.15.z                  N/A              November 2026

Both release streams should be stable and robust, according to Nvidia documentation. I’ve found documentation stating that an in-place upgrade to 5.15 is supported from 5.12 and later, but not from the 5.11 LTS version. That seems unfortunate, so I prefer the mainline stream, to avoid getting stuck needing a full ONIE re-install after the system is in production. According to https://docs.nvidia.com/networking-ethernet-software/knowledge-base/Support/Support-Offerings/Cumulus-Linux-Release-Versioning-and-Support-Policy/ “NVIDIA recommends that you run: The latest Cumulus Linux 5.y.z release on Spectrum switches”, so that’s what I do.
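
The upgrade-path rule above can be turned into a quick check. This is only a sketch: the function name upgrade_method and the MIN_INPLACE variable are my own, the 5.12 threshold is just the rule quoted above for a 5.15 target, and sort -V assumes GNU coreutils (or busybox). The version string is passed in by hand.

```shell
# Sketch: decide between an in-place upgrade and an ONIE re-install.
# MIN_INPLACE is an assumption taken from the "5.12 and later" rule for a
# 5.15 target; check the release notes of the real target release.
MIN_INPLACE="5.12.0"

# Prints "in-place" if version $1 is >= MIN_INPLACE, otherwise "onie".
upgrade_method() {
    current="$1"
    # sort -V orders version strings numerically, component by component.
    lowest=$(printf '%s\n%s\n' "$MIN_INPLACE" "$current" | sort -V | head -n 1)
    if [ "$lowest" = "$MIN_INPLACE" ]; then
        echo "in-place"
    else
        echo "onie"
    fi
}

upgrade_method 5.1.0     # prints: onie
upgrade_method 5.12.1    # prints: in-place
```

A switch still on the factory 5.1.0 image therefore lands in the ONIE path described below.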

Assumptions

I’ll assume we have two brand new switches, which should be clustered to provide MLAG to the hosts. The switches will have management addresses 192.168.200.2 and 192.168.200.3. Additionally, we have a management node (EMS) in the same network with IP address 192.168.200.4. The default username/password for the switch should be cumulus/cumulus.

IP plan:

Component         IP Address
Switch1           192.168.200.2/24
Switch2           192.168.200.3/24
Management node   192.168.200.4/24

Initial IP config

When first receiving the switches, we need to connect to them using a serial cable to configure the initial IP address:

switch1:
nv set interface eth0 ip address 192.168.200.2/24
nv config apply

switch2:
nv set interface eth0 ip address 192.168.200.3/24
nv config apply

ONIE Upgrade

Assuming the switch is running a version older than 5.12, we need to upgrade using ONIE. To do that, download the new Cumulus image to the management node, and serve it via HTTP:

Management node:
# curl -O https://somewhere/cumulus-linux-5.15.0-mlx-amd64.bin
# python3 -m http.server 9999

The switches can then be upgraded to this image using:

sudo onie-install -a -i http://192.168.200.4:9999/cumulus-linux-5.15.0-mlx-amd64.bin
sudo reboot

After approximately 10 minutes, the switches should have completed the upgrade, and will have lost all configuration. If you still have the serial connection available, use it to configure the management IP again, as above. Otherwise, you should have made a note of the MAC address of eth0 on the switches beforehand, so that you can configure a DHCP server on the management node and provide IP addresses to the switches that way:

# yum install dhcp-server
# cat <<'EOF' > /etc/dhcp/dhcpd.conf
subnet 192.168.200.0 netmask 255.255.255.0 {
  default-lease-time 60;
  max-lease-time 7200;
}

host switch1 {
    hardware ethernet    d8:94:24:e6:d5:8a;
    fixed-address        192.168.200.2;
    max-lease-time       3600;
}

host switch2 {
    hardware ethernet    d8:94:24:cd:62:0a;
    fixed-address        192.168.200.3;
    max-lease-time       3600;
}
EOF
# systemctl enable --now dhcpd
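
With more than a couple of switches, the host stanzas get repetitive. A small sketch that prints them from name/MAC/IP triplets; the function name dhcp_host_stanza is my own, and the MAC addresses are the ones from the example above:

```shell
# Sketch: print one dhcpd.conf host stanza per switch.
# Arguments: hostname, MAC address of eth0, fixed IP address.
dhcp_host_stanza() {
    printf 'host %s {\n' "$1"
    printf '    hardware ethernet    %s;\n' "$2"
    printf '    fixed-address        %s;\n' "$3"
    printf '    max-lease-time       3600;\n'
    printf '}\n'
}

dhcp_host_stanza switch1 d8:94:24:e6:d5:8a 192.168.200.2
dhcp_host_stanza switch2 d8:94:24:cd:62:0a 192.168.200.3
```

The output can be appended to /etc/dhcp/dhcpd.conf after the subnet block.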

Connect back into the switches to disable DHCP and configure a static management IP:

switch1:

nv set system hostname switch1
nv set interface eth0 ip address 192.168.200.2/24
nv unset interface eth0 ipv4 dhcp-client
nv config apply

switch2:

nv set system hostname switch2
nv set interface eth0 ip address 192.168.200.3/24
nv unset interface eth0 ipv4 dhcp-client
nv config apply

NTP Config

We’ll use the management node as the NTP server for the switches:

nv unset system ntp server
nv set system ntp server 192.168.200.4 iburst enabled
nv config apply

MLAG peer config

Next we need to configure peerlink interfaces for MLAG. It’s probably best to avoid the first two and the last two ports, as those tend to be the only ports supporting high-power transceivers, at least on SN2xx0 switches. So we connect ports 3 and 4 between the two switches (3-to-3, 4-to-4) and configure:

Switch1:

nv set interface peerlink bond member swp3-4
nv set mlag mac-address 44:38:39:E5:5E:55
nv set mlag peer-ip linklocal
nv set mlag backup 192.168.200.3 vrf mgmt
nv config apply

Switch2:

nv set interface peerlink bond member swp3-4
nv set mlag mac-address 44:38:39:E5:5E:55
nv set mlag peer-ip linklocal
nv set mlag backup 192.168.200.2 vrf mgmt
nv config apply

Note: Make sure to use the same mac-address on both switches in a pair, and a different mac-address for every other pair. Also, the mac-address should start with 44:38:39.
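
Since the two peer configs differ only in the backup address, they can be generated from one place, which also makes it hard to end up with mismatched MAC addresses. A sketch under my own naming (mlag_peer_cmds is not a Cumulus tool); it only prints the commands from the listing above and refuses a MAC without the 44:38:39 prefix:

```shell
# Sketch: print the MLAG peer commands for one switch of a pair.
# Arguments: shared MLAG MAC address, management IP of the *other* switch.
mlag_peer_cmds() {
    mac="$1"
    peer_mgmt_ip="$2"
    # Enforce the 44:38:39 prefix.
    case "$mac" in
        44:38:39:*) ;;
        *) echo "error: MLAG MAC $mac does not start with 44:38:39" >&2
           return 1 ;;
    esac
    echo "nv set interface peerlink bond member swp3-4"
    echo "nv set mlag mac-address $mac"
    echo "nv set mlag peer-ip linklocal"
    echo "nv set mlag backup $peer_mgmt_ip vrf mgmt"
    echo "nv config apply"
}

MLAG_MAC="44:38:39:E5:5E:55"
echo "# switch1"; mlag_peer_cmds "$MLAG_MAC" 192.168.200.3
echo "# switch2"; mlag_peer_cmds "$MLAG_MAC" 192.168.200.2
```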

MLAG host port config

Connect each host to the same port number(s) on both switches, then configure bonding for these ports. I like to number each bond after its first switch port.

nv set interface bond5 bond member swp5
nv set interface bond5 description ems
nv set interface bond5 bond mlag id 5
nv set interface bond5 bridge domain br_default

nv set interface bond6 bond member swp6-7
nv set interface bond6 description essio1
nv set interface bond6 bond mlag id 6
nv set interface bond6 bridge domain br_default

nv set interface bond8 bond member swp8-9
nv set interface bond8 description essio2
nv set interface bond8 bond mlag id 8
nv set interface bond8 bridge domain br_default

nv config apply
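
The naming convention above (bond number = first port number = MLAG id) is mechanical enough to script. A sketch, with host_bond_cmds as a made-up helper name; it prints the same commands as the listing above and can be run identically for both switches:

```shell
# Sketch: print the bond commands for one host.
# Arguments: first port number, last port number, description.
host_bond_cmds() {
    first="$1"; last="$2"; descr="$3"
    if [ "$first" -eq "$last" ]; then
        members="swp$first"          # single-port bond
    else
        members="swp$first-$last"    # port range, e.g. swp6-7
    fi
    echo "nv set interface bond$first bond member $members"
    echo "nv set interface bond$first description $descr"
    echo "nv set interface bond$first bond mlag id $first"
    echo "nv set interface bond$first bridge domain br_default"
}

host_bond_cmds 5 5 ems
host_bond_cmds 6 7 essio1
host_bond_cmds 8 9 essio2
echo "nv config apply"
```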

VLAN

We can add VLANs to the br_default device:

nv set bridge domain br_default vlan 123,321

Configure some ports to be access ports in given VLAN:

nv set interface bond7 bridge domain br_default access 123

Disable native VLAN:

nv set interface bond7 bridge domain br_default untagged none

Show port-vlan:

$ nv show bridge port-vlan
domain        port            vlan   tag-state
-------    ---------     ---------   ---------
br_default   bond1            4070    untagged
             bond3            4070    untagged
             bond5            4070    untagged
             bond7            4070    untagged
             bond9            4070    untagged
             bond11           4070    untagged
            peerlink             1    untagged
                         4070-4071      tagged

Debug

Verify the MLAG config between peers:

$ nv show mlag consistency-checker global
Parameter               LocalValue                 PeerValue                  Conflict  Summary
----------------------  -------------------------  -------------------------  --------  -------
anycast-ip              -                          -                          -
bridge-priority         32768                      32768                      -
bridge-stp              on                         on                         -
bridge-type             vlan-aware                 vlan-aware                 -
clag-pkg-version        1.6.0-cl5.6.0u15           1.6.0-cl5.6.0u15           -
clag-protocol-version   1.6.1                      1.6.1                      -
peer-ip                 fe80::4ab0:2dff:fe78:7569  fe80::4ab0:2dff:fe78:7569  -
peerlink-bridge-member  Yes                        Yes                        -
peerlink-mtu            9216                       9216                       -
peerlink-native-vlan    1                          1                          -
peerlink-vlans          1, 123, 321                1, 123, 321                -
redirect2-enable        yes                        yes                        -
system-mac              44:38:39:be:ef:bb          44:38:39:be:ef:bb          -

$ nv show interface --view=mlag-cc
Interface  Parameter         LocalValue         PeerValue          Conflict
---------  ----------------  -----------------  -----------------  --------------------------------------------------
bond3      bridge-learning   yes                yes                -
           clag-id           3                  3                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  44:38:39:e5:5e:55  44:38:39:e5:5e:55  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       1                  1                  -
           vlan-id           1, 123, 321        1, 123, 321        -
bond5      bridge-learning   yes                yes                -
           clag-id           5                  5                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  12:26:05:34:2a:b1  12:26:05:34:2a:b1  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       123                123                -
           vlan-id           123                123                -
bond6      bridge-learning   yes                yes                -
           clag-id           6                  6                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  ee:da:e0:ab:d3:7f  ee:da:e0:ab:d3:7f  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       1                  1                  -
           vlan-id           1, 123, 321        1, 123, 321        -
bond7      bridge-learning   yes                yes                -
           clag-id           7                  7                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  02:e2:5f:ef:8d:6c  02:e2:5f:ef:8d:6c  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       -                  -                  -
           vlan-id           123                123, 321           vlan mismatch on clag interface between clag peers
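
With many bonds, the conflicts are easy to miss in that listing. A small awk filter can reduce it to just the conflicting rows. This is a sketch that assumes the column layout shown above (a "-" in the last column means no conflict, and continuation rows start with whitespace); mlag_conflicts is my own name for it:

```shell
# Sketch: read `nv show interface --view=mlag-cc` output on stdin and print
# "<interface> <parameter>" for every row whose Conflict column is not "-".
mlag_conflicts() {
    awk '
        /^Interface/ || /^-/ { next }        # skip header and dashed rule
        NF == 0 { next }                     # skip blank lines
        /^[^ ]/ { iface = $1; param = $2 }   # row that starts a new interface
        /^ /    { param = $1 }               # continuation row, same interface
        $NF != "-" { print iface, param }
    '
}

# Trimmed-down sample input; prints: bond7 vlan-id
mlag_conflicts <<'EOF'
Interface  Parameter  LocalValue  PeerValue  Conflict
---------  ---------  ----------  ---------  --------
bond7      clag-id    7           7          -
           vlan-id    123         123, 321   vlan mismatch on clag interface between clag peers
EOF
```

On a switch it would be used as: nv show interface --view=mlag-cc | mlag_conflicts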

Command snippets

Show the commands needed to reproduce the current configuration:

nv config show -o commands

Set the password for the cumulus user, bypassing the strict password policy by providing a pre-hashed password:

nv set system aaa user cumulus hashed-password '$6$goMu8n27O1gmG5MK$ZzmvRgn8M1nPJcJhkuK/OZbQH.esZ7.xsHO4pM7YLXVoUwyo50BH88acPFvCqwizFMl91/gzIgYBqgqVY8Ev80'
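
The hash above is a SHA-512 crypt string (it starts with $6$). One way to generate such a hash, assuming OpenSSL 1.1.1 or newer is available on the management node (the hash_password wrapper is just for illustration):

```shell
# Sketch: produce a SHA-512 crypt hash ($6$...) usable as hashed-password.
# Assumes OpenSSL 1.1.1 or newer, which provides `openssl passwd -6`.
hash_password() {
    openssl passwd -6 "$1"
}

hash_password 'example-password'   # output differs on every run (random salt)
```

Passing the password as an argument leaves it in the shell history and process list; running openssl passwd -6 without an argument prompts for it instead.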

Save the current configuration to /etc/nvue.d/startup.yaml, and copy it off the switch:

nv config save
scp /etc/nvue.d/startup.yaml 192.168.200.4:

Clear physical counters per interface:

cumulus@switch:~$ nv action clear interface swp1 link phy-detail
Action executing ... 
swp1 link phy-detail counters cleared. 
Action succeeded

Reset transceiver:

cumulus@switch:~$ nv action reset platform transceiver swp1 
Action executing ... 
Resetting module swp1 ... OK 
Action succeeded 

Show transceivers:

cumulus@switch:~$ nv show platform transceiver 
Transceiver  Identifier  Vendor name  Vendor PN         Vendor SN      Vendor revision
-----------  ----------  -----------  ----------------  -------------  --------------- 
swp1         QSFP28      Mellanox     MCP1600-C001E30N  MT2039VB01185  A3 
swp10        QSFP28      Mellanox     MCP1600-C001E30N  MT2211VS01792  A3 
swp11        QSFP28      Mellanox     MCP1600-C001E30N  MT2211VS01792  A3 
swp12        QSFP28      Mellanox     MCP1650-V00AE30   MT2122VB02220  A2 
swp13        QSFP28      Mellanox     MCP1650-V00AE30   MT2122VB02220  A2 

More details:

cumulus@switch:~$ nv show platform transceiver swp2
cumulus@switch:~$ nv show interface swp1 transceiver

Show all config, including defaults:

nv config show --all

Prescriptive Topology Manager - PTM

PTM can be used for cabling verification. By creating a Graphviz DOT formatted /etc/ptm.d/topology.dot file containing all nodes that support LLDP, we can make sure the real-world cabling matches expectations. For example:

$ sudo cat /etc/ptm.d/topology.dot
graph G {
    "ess-sw-02":"eth0" -- "sonic":"Eth47(Port47)";
    "ess-sw-01":"swp27" -- "nas-sw1":"Eth1/27";
    "ess-sw-02":"swp27" -- "nas-sw2":"Eth1/27";
    "ess-sw-01":"swp29" -- "ess-sw-02":"swp29";
    "ess-sw-01":"swp30" -- "ess-sw-02":"swp30";
}
$ sudo systemctl restart ptmd

The same topology file can be pushed to all nodes, and verified with the ptmctl command:

$ sudo ptmctl
---------------
port   cbl
       status
---------------
eth0   pass
swp27  pass
swp29  pass
swp30  pass

$ sudo ptmctl -d
-----------------------------------------------------------------------------------------------------------------------------
port   cbl     exp                     act                     sysname           portID         portDescr   match   last
       status  nbr                     nbr                                                                  on      upd
-----------------------------------------------------------------------------------------------------------------------------
eth0   pass    sonic:Eth47(Port47)     sonic:Eth47(Port47)     sonic             Eth47(Port47)  Ethernet46  IfName   9m:46s
swp27  pass    nas-sw2:Eth1/27         nas-sw2:Eth1/27         nas-sw2           Eth1/27                    IfName   9m:46s
swp29  pass    ess-sw-01:swp29         ess-sw-01:swp29         ess-sw-01         swp29          swp29       IfName   9m:46s
swp30  pass    ess-sw-01:swp30         ess-sw-01:swp30         ess-sw-01         swp30          swp30       IfName   9m:46s
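
For a scripted check across many switches, the short ptmctl listing can be filtered down to the ports that did not pass. A sketch that assumes the two-column layout shown above; ptm_failures is my own name, and the failing port in the sample input is made up:

```shell
# Sketch: read plain `ptmctl` output on stdin and print "<port> <status>"
# for every port whose cable status is not "pass".
ptm_failures() {
    # Rules and the "status" header line have one field; "port cbl" is the header.
    awk 'NF >= 2 && $1 != "port" && $2 != "pass" { print $1, $2 }'
}

# Hypothetical sample input; prints: swp29 fail
ptm_failures <<'EOF'
---------------
port   cbl
       status
---------------
eth0   pass
swp27  pass
swp29  fail
swp30  pass
EOF
```

On a switch it would be used as: sudo ptmctl | ptm_failures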

TODO/FIXME

curl -k -u cumulus:cumulus -X GET "https://127.0.0.1:8765/nvue_v1/?rev=applied"