Cumulus switch config

IBM has started delivering Cumulus-based Nvidia switches for ESS, so I’ve had to learn how to configure them.

The switches always seem to be delivered with a very old Cumulus version, 5.1.0, which doesn’t seem to be upgradeable to the current release without dropping the switch configuration. There are two release streams of Cumulus for Nvidia switches: a “mainline” stream and a Long-Term Support (LTS) stream. The current LTS version is 5.11.z, and mainline is 5.15.z:

Cumulus Linux Version  LTS Start Date  End of Life Date
---------------------  --------------  ----------------
5.8.z or earlier       N/A             April 2025
5.9.z                  April 2024      April 2027
5.10.z                 N/A             November 2025
5.11.z                 November 2024   November 2027
5.12.z                 N/A             February 2026
5.13.z                 N/A             May 2026
5.14.z                 N/A             July 2026
5.15.z                 N/A             November 2026

Both release streams should be stable and robust, according to Nvidia documentation. I’ve found documentation stating that an upgrade to 5.15 is supported from 5.12 and later, but not from the 5.11 LTS version. That seems unfortunate, so I prefer the mainline stream, to avoid getting stuck needing a full ONIE re-install after the system is in production. According to https://docs.nvidia.com/networking-ethernet-software/knowledge-base/Support/Support-Offerings/Cumulus-Linux-Release-Versioning-and-Support-Policy/ “NVIDIA recommends that you run: The latest Cumulus Linux 5.y.z release on Spectrum switches”, so that’s what I do.
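The upgrade-path rule can be written down as a small version check. This is a minimal sketch in Python, assuming the only rule is the one I found in the documentation (5.15 accepts in-place upgrades from 5.12 and later). The function names and the default threshold are my own, so check the release notes for the version you actually target.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '5.11.0' into (5, 11, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_onie_reinstall(current: str, minimum_for_upgrade: str = "5.12") -> bool:
    """Return True if `current` is older than the oldest release that can be
    upgraded in place (assumed here to be 5.12 when targeting 5.15)."""
    return parse_version(current) < parse_version(minimum_for_upgrade)


if __name__ == "__main__":
    for version in ("5.1.0", "5.11.3", "5.12.0", "5.14.1"):
        action = "ONIE re-install" if needs_onie_reinstall(version) else "in-place upgrade"
        print(f"{version}: {action}")
```

Comparing tuples of integers avoids the classic mistake of comparing "5.9" and "5.11" as strings.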

Assumptions

I’ll assume we have two brand new switches, which should be clustered to provide MLAG to the hosts. The switches will have management addresses 192.168.200.2 and 192.168.200.3. Additionally, we have a management node (EMS) in the same network with IP address 192.168.200.4. The default username/password for the switches should be cumulus/cumulus.

IP plan:

Component        IP Address
---------------  ----------------
Switch1          192.168.200.2/24
Switch2          192.168.200.3/24
Management node  192.168.200.4/24

Initial IP config

When first receiving the switches, we need to connect to them using a serial cable to configure the initial IP address:

switch1:
nv set interface eth0 ip address 192.168.200.2/24
nv config apply

switch2:
nv set interface eth0 ip address 192.168.200.3/24
nv config apply

ONIE Upgrade

Assuming the switch is running a version older than 5.12, we need to upgrade using ONIE. To do that, download the new Cumulus image to the management node, and serve it via HTTP:

Management node:
# curl -O https://somewhere/cumulus-linux-5.15.0-mlx-amd64.bin
# python3 -m http.server 9999

The switches can then be upgraded to this image using:

sudo onie-install -a -i http://192.168.200.4:9999/cumulus-linux-5.15.0-mlx-amd64.bin
sudo reboot

After approximately 10 minutes, the switches should have completed the upgrade, and will have lost all configuration. If you still have the serial connection available, use it to configure the management IP again as above. Otherwise, you should have made a note of the MAC address of eth0 on each switch beforehand, so that you can configure a DHCP server on the management node and hand out IP addresses to the switches that way:

# yum install dhcp-server
# cat <<'EOF' > /etc/dhcp/dhcpd.conf
subnet 192.168.200.0 netmask 255.255.255.0 {
  default-lease-time 60;
  max-lease-time 7200;
}

host switch1 {
    hardware ethernet    d8:94:24:e6:d5:8a;
    fixed-address        192.168.200.2;
    max-lease-time       3600;
}

host switch2 {
    hardware ethernet    d8:94:24:cd:62:0a;
    fixed-address        192.168.200.3;
    max-lease-time       3600;
}
EOF
# systemctl enable --now dhcpd
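With more than a couple of switches, the host stanzas get tedious to type. Below is a minimal sketch in Python that renders the same dhcpd.conf from a table of hostname, MAC and IP. The function name and the SWITCHES table are hypothetical, and the MAC addresses are simply the ones from the example above.

```python
import ipaddress

# Hostname -> (eth0 MAC, management IP); values copied from the example above.
SWITCHES = {
    "switch1": ("d8:94:24:e6:d5:8a", "192.168.200.2"),
    "switch2": ("d8:94:24:cd:62:0a", "192.168.200.3"),
}


def render_dhcpd_conf(network: str, hosts: dict[str, tuple[str, str]]) -> str:
    """Render a dhcpd.conf with one subnet and a fixed address per host."""
    net = ipaddress.ip_network(network)
    lines = [
        f"subnet {net.network_address} netmask {net.netmask} {{",
        "  default-lease-time 60;",
        "  max-lease-time 7200;",
        "}",
    ]
    for name, (mac, ip) in hosts.items():
        # Catch typos early: a fixed address should sit inside the subnet.
        if ipaddress.ip_address(ip) not in net:
            raise ValueError(f"{ip} is not in {net}")
        lines += [
            "",
            f"host {name} {{",
            f"    hardware ethernet    {mac};",
            f"    fixed-address        {ip};",
            "    max-lease-time       3600;",
            "}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dhcpd_conf("192.168.200.0/24", SWITCHES), end="")
```

Redirect the output to /etc/dhcp/dhcpd.conf on the management node and restart dhcpd.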

Connect back into the switches to disable DHCP and configure a static management IP:

switch1:

nv set system hostname switch1
nv set interface eth0 ip address 192.168.200.2/24
nv unset interface eth0 ipv4 dhcp-client
nv config apply

switch2:

nv set system hostname switch2
nv set interface eth0 ip address 192.168.200.3/24
nv unset interface eth0 ipv4 dhcp-client
nv config apply

NTP Config

We’ll use the management node as NTP server for the switches:

nv unset system ntp server
nv set system ntp server 192.168.200.4 iburst enabled
nv config apply

MLAG peer config

Next we need to configure peerlink interfaces for MLAG. It is probably best to avoid the first two and the last two ports, as those tend to be the only ports supporting high-power transceivers, at least on SN2xx0 switches. So we connect ports 3 and 4 between the two switches, 3-to-3 and 4-to-4, and configure:

switch1:

nv set interface peerlink bond member swp3-4
nv set mlag mac-address 44:38:39:E5:5E:55
nv set mlag peer-ip linklocal
nv set mlag backup 192.168.200.3 vrf mgmt
nv config apply

switch2:

nv set interface peerlink bond member swp3-4
nv set mlag mac-address 44:38:39:E5:5E:55
nv set mlag peer-ip linklocal
nv set mlag backup 192.168.200.2 vrf mgmt
nv config apply

Note: Make sure to use the same MAC address on both switches in a pair, and a different MAC address for every other pair. Also, the MAC address should start with 44:38:39.
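Since the two peers only differ in the backup IP, the commands can be generated from one description of the pair. A minimal sketch in Python: the function name is hypothetical, the nv commands are the ones shown above, and the prefix check only encodes the rule of thumb from the note.

```python
def mlag_peer_commands(peerlink_ports: str, mlag_mac: str, peer_mgmt_ip: str) -> list[str]:
    """Return the nv commands for one MLAG peer.

    peer_mgmt_ip is the management address of the *other* switch, used as
    the backup link through the mgmt VRF.
    """
    if not mlag_mac.lower().startswith("44:38:39:"):
        raise ValueError(f"MLAG MAC {mlag_mac} should start with 44:38:39")
    return [
        f"nv set interface peerlink bond member {peerlink_ports}",
        f"nv set mlag mac-address {mlag_mac}",
        "nv set mlag peer-ip linklocal",
        f"nv set mlag backup {peer_mgmt_ip} vrf mgmt",
        "nv config apply",
    ]


if __name__ == "__main__":
    mgmt_ips = {"switch1": "192.168.200.2", "switch2": "192.168.200.3"}
    mac = "44:38:39:E5:5E:55"  # same MAC on both peers of a pair
    for name, peer in (("switch1", "switch2"), ("switch2", "switch1")):
        print(f"{name}:")
        print("\n".join(mlag_peer_commands("swp3-4", mac, mgmt_ips[peer])))
        print()
```

Generating both sides from the same variables makes it harder to end up with mismatched MAC addresses between the peers.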

MLAG host port config

Connect each host to the same port number(s) on both switches, then configure bonding for these ports. I like to name each bond after the number of its first switch port.

nv set interface bond5 bond member swp5
nv set interface bond5 description ems
nv set interface bond5 bond mlag id 5
nv set interface bond5 bridge domain br_default

nv set interface bond6 bond member swp6-7
nv set interface bond6 description essio1
nv set interface bond6 bond mlag id 6
nv set interface bond6 bridge domain br_default

nv set interface bond8 bond member swp8-9
nv set interface bond8 description essio2
nv set interface bond8 bond mlag id 8
nv set interface bond8 bridge domain br_default

nv config apply
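The naming convention (bond name and MLAG ID taken from the first port number) is easy to automate. A minimal sketch in Python, assuming ports are always given as swpN or swpN-M; the function name and the hosts table are made up for illustration.

```python
import re


def bond_commands(description: str, ports: str, bridge: str = "br_default") -> list[str]:
    """Return nv commands for one MLAG host bond.

    `ports` is a switch port or range such as 'swp5' or 'swp6-7'. The bond
    name and MLAG ID are taken from the first port number.
    """
    match = re.fullmatch(r"swp(\d+)(?:-(\d+))?", ports)
    if match is None:
        raise ValueError(f"unexpected port specification: {ports!r}")
    first = int(match.group(1))
    bond = f"bond{first}"
    return [
        f"nv set interface {bond} bond member {ports}",
        f"nv set interface {bond} description {description}",
        f"nv set interface {bond} bond mlag id {first}",
        f"nv set interface {bond} bridge domain {bridge}",
    ]


if __name__ == "__main__":
    hosts = {"ems": "swp5", "essio1": "swp6-7", "essio2": "swp8-9"}
    for description, ports in hosts.items():
        print("\n".join(bond_commands(description, ports)))
        print()
    print("nv config apply")
```

The same output can be pasted into both switches, since the host port configuration is identical on the two peers.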

VLAN

We can add VLANs to the br_default device:

nv set bridge domain br_default vlan 123,321

Configure some ports to be access ports in a given VLAN:

nv set interface bond7 bridge domain br_default access 123

Disable native VLAN:

nv set interface bond7 bridge domain br_default untagged none

Show port-vlan:

$ nv show bridge port-vlan
domain        port            vlan   tag-state
-------    ---------     ---------   ---------
br_default   bond1            4070    untagged
             bond3            4070    untagged
             bond5            4070    untagged
             bond7            4070    untagged
             bond9            4070    untagged
             bond11           4070    untagged
            peerlink             1    untagged
                         4070-4071      tagged

Debug

Verify MLAG config between peers:

$ nv show mlag consistency-checker global
Parameter               LocalValue                 PeerValue                  Conflict  Summary
----------------------  -------------------------  -------------------------  --------  -------
anycast-ip              -                          -                          -
bridge-priority         32768                      32768                      -
bridge-stp              on                         on                         -
bridge-type             vlan-aware                 vlan-aware                 -
clag-pkg-version        1.6.0-cl5.6.0u15           1.6.0-cl5.6.0u15           -
clag-protocol-version   1.6.1                      1.6.1                      -
peer-ip                 fe80::4ab0:2dff:fe78:7569  fe80::4ab0:2dff:fe78:7569  -
peerlink-bridge-member  Yes                        Yes                        -
peerlink-mtu            9216                       9216                       -
peerlink-native-vlan    1                          1                          -
peerlink-vlans          1, 123, 321                1, 123, 321                -
redirect2-enable        yes                        yes                        -
system-mac              44:38:39:be:ef:bb          44:38:39:be:ef:bb          -

$ nv show interface --view=mlag-cc
Interface  Parameter         LocalValue         PeerValue          Conflict
---------  ----------------  -----------------  -----------------  --------------------------------------------------
bond3      bridge-learning   yes                yes                -
           clag-id           3                  3                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  44:38:39:e5:5e:55  44:38:39:e5:5e:55  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       1                  1                  -
           vlan-id           1, 123, 321        1, 123, 321        -
bond5      bridge-learning   yes                yes                -
           clag-id           5                  5                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  12:26:05:34:2a:b1  12:26:05:34:2a:b1  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       123                123                -
           vlan-id           123                123                -
bond6      bridge-learning   yes                yes                -
           clag-id           6                  6                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  ee:da:e0:ab:d3:7f  ee:da:e0:ab:d3:7f  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       1                  1                  -
           vlan-id           1, 123, 321        1, 123, 321        -
bond7      bridge-learning   yes                yes                -
           clag-id           7                  7                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  02:e2:5f:ef:8d:6c  02:e2:5f:ef:8d:6c  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       -                  -                  -
           vlan-id           123                123, 321           vlan mismatch on clag interface between clag peers
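On a switch with many bonds this table gets long, and the one line that matters is easy to miss. Here is a minimal sketch in Python that parses the text output and keeps only the rows with a conflict. It assumes the column layout shown above (column positions are read from the dashed line), and the function names are my own.

```python
import re

SAMPLE = """\
Interface  Parameter         LocalValue         PeerValue          Conflict
---------  ----------------  -----------------  -----------------  --------------------------------------------------
bond7      bridge-learning   yes                yes                -
           vlan-id           123                123, 321           vlan mismatch on clag interface between clag peers
"""


def parse_mlag_cc(output: str) -> list[dict[str, str]]:
    """Parse the table printed by `nv show interface --view=mlag-cc`."""
    lines = [line for line in output.splitlines() if line.strip()]
    # The dashed separator line tells us where each column starts.
    sep_index = next(i for i, line in enumerate(lines) if set(line.strip()) <= {"-", " "})
    starts = [m.start() for m in re.finditer(r"-+", lines[sep_index])]
    ends = starts[1:] + [None]

    def cells(line: str) -> list[str]:
        return [line[a:b].strip() for a, b in zip(starts, ends)]

    header = cells(lines[sep_index - 1])
    rows, interface = [], ""
    for line in lines[sep_index + 1:]:
        row = dict(zip(header, cells(line)))
        # The interface name is only printed on the first row of each bond.
        interface = row["Interface"] or interface
        row["Interface"] = interface
        rows.append(row)
    return rows


def conflicts(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only rows where the Conflict column holds a message."""
    return [row for row in rows if row["Conflict"] not in ("", "-")]


if __name__ == "__main__":
    for row in conflicts(parse_mlag_cc(SAMPLE)):
        print(f"{row['Interface']} {row['Parameter']}: local={row['LocalValue']} "
              f"peer={row['PeerValue']} ({row['Conflict']})")
```

To use it on a real switch, replace SAMPLE with the saved output of the command.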

Command snippets

Show the commands needed to reproduce the current configuration:

nv config show -o commands

Find serial number:

$ nv show platform
operational
-------------  -----------------------------------------------
system-mac     2c:5e:ab:53:d9:f9
manufacturer   Mellanox
cpu            x86_64 Intel(R) Xeon(R) CPU D-1527 @ 2.20GHz x8
memory         15.02 GB
disk-size      28GB
port-layout    32 x 200G-QSFP56
part-number    SSG7A92187
serial-number  M2NJ5766671
asic-model     Spectrum-2
system-uuid    d4abdca6-5a00-11f0-8000-2c5ebbb12300
system-type    MSN3700

Set the password for the cumulus user, overriding the strict password policy by providing a pre-hashed password:

nv set system aaa user cumulus hashed-password '$6$goMu8n27O1gmG5MK$ZzmvRgn8M1nPJcJhkuK/OZbQH.esZ7.xsHO4pM7YLXVoUwyo50BH88acPFvCqwizFMl91/gzIgYBqgqVY8Ev80'

Save the current configuration to /etc/nvue.d/startup.yaml, and copy it off the switch:

nv config save
scp /etc/nvue.d/startup.yaml 192.168.200.4:

Clear physical counters per interface:

cumulus@switch:~$ nv action clear interface swp1 link phy-detail
Action executing ... 
swp1 link phy-detail counters cleared. 
Action succeeded

Reset transceiver:

cumulus@switch:~$ nv action reset platform transceiver swp1 
Action executing ... 
Resetting module swp1 ... OK 
Action succeeded 

Show transceivers:

cumulus@switch:~$ nv show platform transceiver 
Transceiver  Identifier  Vendor name  Vendor PN         Vendor SN      Vendor revision
-----------  ----------  -----------  ----------------  -------------  --------------- 
swp1         QSFP28      Mellanox     MCP1600-C001E30N  MT2039VB01185  A3 
swp10        QSFP28      Mellanox     MCP1600-C001E30N  MT2211VS01792  A3 
swp11        QSFP28      Mellanox     MCP1600-C001E30N  MT2211VS01792  A3 
swp12        QSFP28      Mellanox     MCP1650-V00AE30   MT2122VB02220  A2 
swp13        QSFP28      Mellanox     MCP1650-V00AE30   MT2122VB02220  A2 

More details:

cumulus@switch:~$ nv show platform transceiver swp2
cumulus@switch:~$ nv show interface swp1 transceiver

Show all config, including defaults:

nv config show --all

Show MAC addresses:

$ nv show bridge domain br_default mac-table
entry-id  MAC address        vlan  interface   remote-dst  src-vni  entry-type  last-update      age
--------  -----------------  ----  ----------  ----------  -------  ----------  ---------------  ---------------
1         2c:5e:ab:52:d1:fd  4070  peerlink                         static      0:01:35          0:51:21
2         2c:5e:ab:53:c9:48  1     peerlink                         permanent   8 days, 4:16:21  8 days, 4:16:21
3         2c:5e:ab:53:c9:48        peerlink                         permanent   8 days, 4:16:21  8 days, 4:16:21

Carrier stats/changes:

$ nv show interface --view=carrier-stats
Interface      Oper Status  Up Count  Down Count  Total State Changes  Last State Change
-------------  -----------  --------  ----------  -------------------  -----------------------
bond1          down         0         0           0                    Never
bond3          down         0         0           0                    Never
bond5          up           1         0           1                    2025/12/17 14:43:49.869
bond7          down         0         0           0                    Never
bond9          down         0         0           0                    Never
bond11         down         0         0           0                    Never
bond27         up           1         0           1                    2025/12/17 14:22:18.781
br_default     up           2         2           4                    2025/12/10 00:51:29.644
eth0           up           1         1           2                    2025/12/10 00:37:51.149
lo             unknown      0         0           0                    Never
mgmt           up           0         0           0                    Never
peerlink       up           1         0           1                    2025/12/10 00:51:21.914
peerlink.4094  up           1         0           1                    2025/12/10 00:51:21.915
swp1           down         1         2           3                    2025/12/10 00:51:29.688
swp2           down         0         1           1                    2025/12/10 00:40:05.037
swp3           down         1         2           3                    2025/12/10 00:51:29.689
swp4           down         0         1           1                    2025/12/10 00:40:05.042
swp5           up           2         2           4                    2025/12/17 14:43:47.416

Show difference between two revisions:

$ nv config history
...
$ nv config diff 18 19
- set:
    interface:
      bond7:
        bridge:
          domain:
            br_default:
              access: 4071
$ nv config diff 18 19 -o commands
nv set interface bond7 bridge domain br_default access 4071

Prescriptive Topology Manager - PTM

PTM can be used for cabling verification. By creating a Graphviz DOT formatted /etc/ptm.d/topology.dot file containing all nodes that support LLDP, we can make sure the real-world cabling matches expectations. For example:

$ sudo cat /etc/ptm.d/topology.dot
graph G {
    "ess-sw-02":"eth0" -- "sonic":"Eth47(Port47)";
    "ess-sw-01":"swp27" -- "nas-sw1":"Eth1/27";
    "ess-sw-02":"swp27" -- "nas-sw2":"Eth1/27";
    "ess-sw-01":"swp29" -- "ess-sw-02":"swp29";
    "ess-sw-01":"swp30" -- "ess-sw-02":"swp30";
}
$ sudo systemctl restart ptmd
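For a larger setup, the topology file can be rendered from a plain list of links instead of being edited by hand. A minimal sketch in Python; the function name and the LINKS list are hypothetical, with node and port names taken from the example above.

```python
# Each link is ((node, port), (node, port)); names copied from the example above.
LINKS = [
    (("ess-sw-01", "swp27"), ("nas-sw1", "Eth1/27")),
    (("ess-sw-02", "swp27"), ("nas-sw2", "Eth1/27")),
    (("ess-sw-01", "swp29"), ("ess-sw-02", "swp29")),
    (("ess-sw-01", "swp30"), ("ess-sw-02", "swp30")),
]


def render_topology(links) -> str:
    """Render an undirected Graphviz DOT graph with one edge per cable."""
    lines = ["graph G {"]
    for (node_a, port_a), (node_b, port_b) in links:
        lines.append(f'    "{node_a}":"{port_a}" -- "{node_b}":"{port_b}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_topology(LINKS), end="")
```

Write the output to /etc/ptm.d/topology.dot on each node and restart ptmd as shown above.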

The same topology file can be pushed to all nodes, and verified with the ptmctl command:

$ sudo ptmctl
---------------
port   cbl
       status
---------------
eth0   pass
swp27  pass
swp29  pass
swp30  pass

$ sudo ptmctl -d
-----------------------------------------------------------------------------------------------------------------------------
port   cbl     exp                     act                     sysname           portID         portDescr   match   last
       status  nbr                     nbr                                                                  on      upd
-----------------------------------------------------------------------------------------------------------------------------
eth0   pass    sonic:Eth47(Port47)     sonic:Eth47(Port47)     sonic             Eth47(Port47)  Ethernet46  IfName   9m:46s
swp27  pass    nas-sw2:Eth1/27         nas-sw2:Eth1/27         nas-sw2           Eth1/27                    IfName   9m:46s
swp29  pass    ess-sw-01:swp29         ess-sw-01:swp29         ess-sw-01         swp29          swp29       IfName   9m:46s
swp30  pass    ess-sw-01:swp30         ess-sw-01:swp30         ess-sw-01         swp30          swp30       IfName   9m:46s

TODO/FIXME

curl -k -u cumulus:cumulus -X GET "https://127.0.0.1:8765/nvue_v1/?rev=applied"

Interesting experiences…

Port down because of config mismatch

swp7, a member of an MLAG bond, refused to work on one of the switches. We swapped cabling and server-side connections, but the problem persisted on the same port on the same switch, while everything worked on other ports and on the other switch. Looking at signal strength, we saw good RX power, but TX power was nil.

Eventually I found that there was an issue reported here:

$ nv show interface status
Interface      Admin Status  Oper Status  Protodown  Protodown Reason  Fault
-------------  ------------  -----------  ---------  ----------------  -----
<snip>
swp7           up            down         enabled    clag              N/A

$ nv show interface swp7 link protodown-reason
operational
-----------
clag

$ nv show interface --view=mlag-cc

Interface  Parameter         LocalValue         PeerValue          Conflict
---------  ----------------  -----------------  -----------------  ---------------------------------------------------------
<snip>
bond7      clag-id           7                  7                  -
           lacp-partner-mac  00:00:00:00:00:00  00:00:00:00:00:00  -
           lacp-actor-mac    44:38:39:e5:5e:54  44:38:39:e5:5e:54  -
           vlan-id           4071               1, 4070->4071      vlan mismatch on clag interface between clag peers
           native-vlan       4071               1                  native vlan mismatch on clag interface between clag peers
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           bridge-learning   yes                yes                -
           bpdu-guard        off                off                -
           restricted-role   no                 no                 -

Apparently we had forgotten to configure the port as an access port on one of the switches. Correcting that fixed the problem. I’m surprised by the behaviour, but it’s probably sensible…
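Next time I would look for protodown ports first. A minimal sketch in Python that picks them out of the nv show interface status output. It assumes the column order shown above and that every column is filled in on protodown rows; the function name is my own.

```python
SAMPLE = """\
Interface      Admin Status  Oper Status  Protodown  Protodown Reason  Fault
-------------  ------------  -----------  ---------  ----------------  -----
swp7           up            down         enabled    clag              N/A
"""


def protodown_ports(output: str) -> list[tuple[str, str]]:
    """Return (interface, reason) for every row whose Protodown column is 'enabled'."""
    found = []
    for line in output.splitlines():
        fields = line.split()
        # Columns: interface, admin status, oper status, protodown, reason, fault.
        if len(fields) >= 5 and fields[3] == "enabled":
            found.append((fields[0], fields[4]))
    return found


if __name__ == "__main__":
    for name, reason in protodown_ports(SAMPLE):
        print(f"{name}: protodown ({reason})")
```

A reason of clag points at the MLAG consistency check, so nv show interface --view=mlag-cc is the next place to look.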