Cumulus switch config
IBM has started delivering Cumulus-based Nvidia switches for ESS, so I’ve had to learn how to configure them.
The switches always seem to be delivered with a very old Cumulus version, 5.1.0, which doesn’t seem to be upgradeable to the current release without dropping the switch configuration. There are 2 release streams of Cumulus for Nvidia switches: a “mainline” stream and a Long-Term Support (LTS) stream. The current LTS version is 5.11.z, and mainline is 5.15.z:
| Cumulus Linux Version | LTS Start Date | End of Life Date |
|---|---|---|
| 5.8.z or earlier | N/A | April 2025 |
| 5.9.z | April 2024 | April 2027 |
| 5.10.z | N/A | November 2025 |
| 5.11.z | November 2024 | November 2027 |
| 5.12.z | N/A | February 2026 |
| 5.13.z | N/A | May 2026 |
| 5.14.z | N/A | July 2026 |
| 5.15.z | N/A | November 2026 |
Both release streams should be stable and robust, according to Nvidia documentation. I’ve found documentation stating that an upgrade to 5.15 is possible from 5.12 and later, but not from the 5.11 LTS version, which seems unfortunate. I therefore prefer the mainline stream, to avoid getting stuck needing a full ONIE re-install after the system is in production. According to https://docs.nvidia.com/networking-ethernet-software/knowledge-base/Support/Support-Offerings/Cumulus-Linux-Release-Versioning-and-Support-Policy/ “NVIDIA recommends that you run: The latest Cumulus Linux 5.y.z release on Spectrum switches”, so that’s what I do.
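The upgrade-path reasoning above boils down to a single version comparison. Here is a minimal sketch, assuming the 5.12 threshold from the documentation I found holds. `needs_onie_reinstall` is a hypothetical helper name of my own, not a Cumulus tool:

```shell
# Hypothetical helper, not part of Cumulus Linux.
# Prints "yes" when the given version is older than 5.12 (so reaching 5.15
# would need an ONIE re-install, per the upgrade path described above),
# otherwise prints "no".
needs_onie_reinstall() {
    version=$1                 # e.g. 5.1.0
    major=${version%%.*}       # text before the first dot
    rest=${version#*.}         # text after the first dot
    minor=${rest%%.*}          # second version component
    if [ "$major" -lt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -lt 12 ]; }; then
        echo yes
    else
        echo no
    fi
}

needs_onie_reinstall 5.1.0
needs_onie_reinstall 5.12.0
```

The version string is passed in by hand here. Reading it automatically from the switch is left out on purpose, since I have not verified a stable field name for it across releases.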
Assumptions
I’ll assume we have 2 brand new switches, which should be clustered to provide MLAG to the hosts. The switches will have management addresses 192.168.200.2 and 192.168.200.3. Additionally, we have a management node (EMS) in the same network with IP address 192.168.200.4. The default username/password for the switches should be cumulus/cumulus.
IP plan:
| Component | IP Address |
|---|---|
| Switch1 | 192.168.200.2/24 |
| Switch2 | 192.168.200.3/24 |
| Management node | 192.168.200.4/24 |
Initial IP config
When first receiving the switches, we need to connect to them using a serial cable to configure an initial IP address:
switch1:
nv set interface eth0 ip address 192.168.200.2/24
nv config apply
switch2:
nv set interface eth0 ip address 192.168.200.3/24
nv config apply
ONIE Upgrade
Assuming the switch is running a version older than 5.12, we need to upgrade using ONIE. To do that, download the new Cumulus image to the management node, and serve it via HTTP:
Management node:
# curl -O https://somewhere/cumulus-linux-5.15.0-mlx-amd64.bin
# python3 -m http.server 9999
The switches can then be upgraded to this image using:
sudo onie-install -a -i http://192.168.200.4:9999/cumulus-linux-5.15.0-mlx-amd64.bin
sudo reboot
After approximately 10 minutes, the switches should have completed the upgrade, and will have lost all configuration. If you still have the serial connection available, use it to configure the management IP again as above. Otherwise, you should have made a note of the MAC address of eth0 on each switch, so that you can configure a DHCP server on the management node and provide IP addresses to the switches that way:
# yum install dhcp-server
# cat <<'EOF' > /etc/dhcp/dhcpd.conf
subnet 192.168.200.0 netmask 255.255.255.0 {
default-lease-time 60;
max-lease-time 7200;
}
host switch1 {
hardware ethernet d8:94:24:e6:d5:8a;
fixed-address 192.168.200.2;
max-lease-time 3600;
}
host switch2 {
hardware ethernet d8:94:24:cd:62:0a;
fixed-address 192.168.200.3;
max-lease-time 3600;
}
EOF
# systemctl enable --now dhcpd
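With more than a couple of switches, typing the host stanzas by hand gets error-prone. A minimal sketch of generating them follows. `dhcp_host_stanza` is a hypothetical helper name, and it simply reproduces the stanza layout used in the dhcpd.conf above:

```shell
# Hypothetical helper: print one dhcpd.conf host stanza in the same shape
# as the hand-written ones above.
# $1 = host name, $2 = MAC address of eth0, $3 = fixed IP address
dhcp_host_stanza() {
    printf 'host %s {\n' "$1"
    printf '  hardware ethernet %s;\n' "$2"
    printf '  fixed-address %s;\n' "$3"
    printf '  max-lease-time 3600;\n'
    printf '}\n'
}

dhcp_host_stanza switch1 d8:94:24:e6:d5:8a 192.168.200.2
dhcp_host_stanza switch2 d8:94:24:cd:62:0a 192.168.200.3
```

The output can be appended to /etc/dhcp/dhcpd.conf after the subnet block. Running `dhcpd -t -cf /etc/dhcp/dhcpd.conf` should then syntax-check the result before the service is started.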
Connect back into the switches to disable DHCP and configure a static management IP:
switch1:
nv set system hostname switch1
nv set interface eth0 ip address 192.168.200.2/24
nv unset interface eth0 ipv4 dhcp-client
nv config apply
switch2:
nv set system hostname switch2
nv set interface eth0 ip address 192.168.200.3/24
nv unset interface eth0 ipv4 dhcp-client
nv config apply
NTP Config
We’ll use the management node as NTP server for the switches:
nv unset system ntp server
nv set system ntp server 192.168.200.4 iburst enabled
nv config apply
MLAG peer config
Then we need to configure some peerlink interfaces for MLAG. It’s probably best to avoid the first two and last two ports, as those tend to be the only ports supporting high-power transceivers, at least on SN2xx0 switches. So we connect ports 3 and 4 between the two switches, 3-to-3 and 4-to-4, and configure:
Switch1:
nv set interface peerlink bond member swp3-4
nv set mlag mac-address 44:38:39:E5:5E:55
nv set mlag peer-ip linklocal
nv set mlag backup 192.168.200.3 vrf mgmt
nv config apply
Switch2:
nv set interface peerlink bond member swp3-4
nv set mlag mac-address 44:38:39:E5:5E:55
nv set mlag peer-ip linklocal
nv set mlag backup 192.168.200.2 vrf mgmt
nv config apply
Note: Make sure to use the same MAC address on both switches in a pair, and a different MAC address for every other pair. Also, the MAC address should start with 44:38:39.
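The MAC address rule is easy to get wrong when copying addresses around, so a quick check before applying can help. This is a minimal sketch that only tests the 44:38:39 prefix and the overall shape of the address. `is_mlag_mac` is a hypothetical helper name, not an NVUE command:

```shell
# Hypothetical helper: succeed only if $1 looks like a MAC address that
# starts with 44:38:39 (upper or lower case hex digits accepted).
is_mlag_mac() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    case "$mac" in
        44:38:39:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]) return 0 ;;
        *) return 1 ;;
    esac
}

if is_mlag_mac 44:38:39:E5:5E:55; then
    echo "ok"
else
    echo "not an MLAG-style MAC"
fi
```

It does not check that both switches in a pair use the same address, or that other pairs use a different one. That still has to be verified by comparing the configuration of the switches.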
MLAG host port config
Connect each host to the same port number(s) on both switches, then configure bonding for these ports. I like to name the bond after the first switch port number.
nv set interface bond5 bond member swp5
nv set interface bond5 description ems
nv set interface bond5 bond mlag id 5
nv set interface bond5 bridge domain br_default
nv set interface bond6 bond member swp6-7
nv set interface bond6 description essio1
nv set interface bond6 bond mlag id 6
nv set interface bond6 bridge domain br_default
nv set interface bond8 bond member swp8-9
nv set interface bond8 description essio2
nv set interface bond8 bond mlag id 8
nv set interface bond8 bridge domain br_default
nv config apply
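Since every bond follows the same four-line pattern, the commands can be generated instead of typed. Here is a minimal sketch of the naming convention described above, where the bond name and MLAG id are taken from the first port number. `bond_cmds` is a hypothetical helper name. It only prints commands and does not run them:

```shell
# Hypothetical helper: print the nv commands for one MLAG host bond.
# $1 = first switch port number, $2 = last switch port number, $3 = description
bond_cmds() {
    if [ "$1" -eq "$2" ]; then
        members="swp$1"          # single-port bond
    else
        members="swp$1-$2"       # port range
    fi
    echo "nv set interface bond$1 bond member $members"
    echo "nv set interface bond$1 description $3"
    echo "nv set interface bond$1 bond mlag id $1"
    echo "nv set interface bond$1 bridge domain br_default"
}

bond_cmds 5 5 ems
bond_cmds 6 7 essio1
bond_cmds 8 9 essio2
```

The printed commands match the ones above and can be pasted into both switches, followed by `nv config apply`.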
VLAN
We can add VLANs to the br_default device:
nv set bridge domain br_default vlan 123,321
Configure ports to be access ports in a given VLAN:
nv set interface bond7 bridge domain br_default access 123
Disable native VLAN:
nv set interface bond7 bridge domain br_default untagged none
Show port-vlan:
$ nv show bridge port-vlan
domain port vlan tag-state
------- --------- --------- ---------
br_default bond1 4070 untagged
bond3 4070 untagged
bond5 4070 untagged
bond7 4070 untagged
bond9 4070 untagged
bond11 4070 untagged
peerlink 1 untagged
4070-4071 tagged
Debug
Verify MLAG config between peers:
$ nv show mlag consistency-checker global
Parameter LocalValue PeerValue Conflict Summary
---------------------- ------------------------- ------------------------- -------- -------
anycast-ip - - -
bridge-priority 32768 32768 -
bridge-stp on on -
bridge-type vlan-aware vlan-aware -
clag-pkg-version 1.6.0-cl5.6.0u15 1.6.0-cl5.6.0u15 -
clag-protocol-version 1.6.1 1.6.1 -
peer-ip fe80::4ab0:2dff:fe78:7569 fe80::4ab0:2dff:fe78:7569 -
peerlink-bridge-member Yes Yes -
peerlink-mtu 9216 9216 -
peerlink-native-vlan 1 1 -
peerlink-vlans 1, 123, 321 1, 123, 321 -
redirect2-enable yes yes -
system-mac 44:38:39:be:ef:bb 44:38:39:be:ef:bb -
$ nv show interface --view=mlag-cc
Interface Parameter LocalValue PeerValue Conflict
--------- ---------------- ----------------- ----------------- --------------------------------------------------
bond3 bridge-learning yes yes -
clag-id 3 3 -
lacp-actor-mac 44:38:39:be:ef:bb 44:38:39:be:ef:bb -
lacp-partner-mac 44:38:39:e5:5e:55 44:38:39:e5:5e:55 -
master br_default NOT-SYNCED -
mtu 9216 9216 -
native-vlan 1 1 -
vlan-id 1, 123, 321 1, 123, 321 -
bond5 bridge-learning yes yes -
clag-id 5 5 -
lacp-actor-mac 44:38:39:be:ef:bb 44:38:39:be:ef:bb -
lacp-partner-mac 12:26:05:34:2a:b1 12:26:05:34:2a:b1 -
master br_default NOT-SYNCED -
mtu 9216 9216 -
native-vlan 123 123 -
vlan-id 123 123 -
bond6 bridge-learning yes yes -
clag-id 6 6 -
lacp-actor-mac 44:38:39:be:ef:bb 44:38:39:be:ef:bb -
lacp-partner-mac ee:da:e0:ab:d3:7f ee:da:e0:ab:d3:7f -
master br_default NOT-SYNCED -
mtu 9216 9216 -
native-vlan 1 1 -
vlan-id 1, 123, 321 1, 123, 321 -
bond7 bridge-learning yes yes -
clag-id 7 7 -
lacp-actor-mac 44:38:39:be:ef:bb 44:38:39:be:ef:bb -
lacp-partner-mac 02:e2:5f:ef:8d:6c 02:e2:5f:ef:8d:6c -
master br_default NOT-SYNCED -
mtu 9216 9216 -
native-vlan - - -
vlan-id 123 123, 321 vlan mismatch on clag interface between clag peers
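With many bonds, the one conflicting row is easy to miss in the mlag-cc table. The following is a minimal sketch of a filter that keeps only rows whose Conflict column is not "-". It assumes the real table layout, where the interface name sits at the start of the first row of each group and the remaining rows are indented. `mlag_conflicts` is a hypothetical helper name:

```shell
# Hypothetical helper: read `nv show interface --view=mlag-cc` output on
# stdin and print "<interface>: <parameter>" for every row with a conflict.
mlag_conflicts() {
    awk '
        /^Interface/ || /^-+/ || NF == 0 { next }   # skip header, separator, blank lines
        /^[^ \t]/ { ifname = $1; param = $2 }       # first row of a group names the interface
        /^[ \t]/  { param = $1 }                    # indented rows carry only the parameter
        $NF != "-" { print ifname ": " param }      # last field is "-" when there is no conflict
    '
}

# Usage sketch on a switch:
#   nv show interface --view=mlag-cc | mlag_conflicts
```

For the table above this should report the vlan-id row of bond7. The filter depends on the column layout, so it may need adjusting if a release changes the table format.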
Command snippets
Show the commands needed to reproduce the current configuration:
nv config show -o commands
Find serial number:
$ nv show platform
operational
------------- -----------------------------------------------
system-mac 2c:5e:ab:53:d9:f9
manufacturer Mellanox
cpu x86_64 Intel(R) Xeon(R) CPU D-1527 @ 2.20GHz x8
memory 15.02 GB
disk-size 28GB
port-layout 32 x 200G-QSFP56
part-number SSG7A92187
serial-number M2NJ5766671
asic-model Spectrum-2
system-uuid d4abdca6-5a00-11f0-8000-2c5ebbb12300
system-type MSN3700
Set the password for the cumulus user, overriding the strict password policy by providing a pre-hashed password:
nv set system aaa user cumulus hashed-password '$6$goMu8n27O1gmG5MK$ZzmvRgn8M1nPJcJhkuK/OZbQH.esZ7.xsHO4pM7YLXVoUwyo50BH88acPFvCqwizFMl91/gzIgYBqgqVY8Ev80'
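The `$6$` prefix in the hash above marks a SHA-512 crypt hash. One way to produce such a hash, as a sketch, is with the openssl command-line tool. This assumes openssl 1.1.1 or newer is installed on the machine where you run it. `hash_password` is a hypothetical wrapper name:

```shell
# Hypothetical helper: print a SHA-512 crypt hash ($6$...) for the password in $1.
# Assumes the openssl command-line tool (1.1.1 or newer) is available.
# A random salt is used, so the output differs on every run.
hash_password() {
    openssl passwd -6 "$1"
}

# Usage sketch (the password is a placeholder):
#   hash_password 'ChangeMe'
# then paste the printed hash, in single quotes, into the nv set command above.
```

Passing the password as an argument makes it visible in the process list and shell history. Running `openssl passwd -6` without an argument prompts for it instead.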
Save the current configuration to /etc/nvue.d/startup.yaml, and copy it off the switch:
nv config save
scp /etc/nvue.d/startup.yaml 192.168.200.4:
Clear physical counters per interface:
cumulus@switch:~$ nv action clear interface swp1 link phy-detail
Action executing ...
swp1 link phy-detail counters cleared.
Action succeeded
Reset transceiver:
cumulus@switch:~$ nv action reset platform transceiver swp1
Action executing ...
Resetting module swp1 ... OK
Action succeeded
Show transceivers:
cumulus@switch:~$ nv show platform transceiver
Transceiver Identifier Vendor name Vendor PN Vendor SN Vendor revision
----------- ---------- ----------- ---------------- ------------- ---------------
swp1 QSFP28 Mellanox MCP1600-C001E30N MT2039VB01185 A3
swp10 QSFP28 Mellanox MCP1600-C001E30N MT2211VS01792 A3
swp11 QSFP28 Mellanox MCP1600-C001E30N MT2211VS01792 A3
swp12 QSFP28 Mellanox MCP1650-V00AE30 MT2122VB02220 A2
swp13 QSFP28 Mellanox MCP1650-V00AE30 MT2122VB02220 A2
More details:
cumulus@switch:~$ nv show platform transceiver swp2
cumulus@switch:~$ nv show interface swp1 transceiver
Show all config, including defaults:
nv config show --all
Show MAC addresses:
$ nv show bridge domain br_default mac-table
entry-id MAC address vlan interface remote-dst src-vni entry-type last-update age
-------- ----------------- ---- ---------- ---------- ------- ---------- --------------- ---------------
1 2c:5e:ab:52:d1:fd 4070 peerlink static 0:01:35 0:51:21
2 2c:5e:ab:53:c9:48 1 peerlink permanent 8 days, 4:16:21 8 days, 4:16:21
3 2c:5e:ab:53:c9:48 peerlink permanent 8 days, 4:16:21 8 days, 4:16:21
Carrier stats/changes:
$ nv show interface --view=carrier-stats
Interface Oper Status Up Count Down Count Total State Changes Last State Change
------------- ----------- -------- ---------- ------------------- -----------------------
bond1 down 0 0 0 Never
bond3 down 0 0 0 Never
bond5 up 1 0 1 2025/12/17 14:43:49.869
bond7 down 0 0 0 Never
bond9 down 0 0 0 Never
bond11 down 0 0 0 Never
bond27 up 1 0 1 2025/12/17 14:22:18.781
br_default up 2 2 4 2025/12/10 00:51:29.644
eth0 up 1 1 2 2025/12/10 00:37:51.149
lo unknown 0 0 0 Never
mgmt up 0 0 0 Never
peerlink up 1 0 1 2025/12/10 00:51:21.914
peerlink.4094 up 1 0 1 2025/12/10 00:51:21.915
swp1 down 1 2 3 2025/12/10 00:51:29.688
swp2 down 0 1 1 2025/12/10 00:40:05.037
swp3 down 1 2 3 2025/12/10 00:51:29.689
swp4 down 0 1 1 2025/12/10 00:40:05.042
swp5 up 2 2 4 2025/12/17 14:43:47.416
Show difference between two revisions:
$ nv config history
...
$ nv config diff 18 19
- set:
interface:
bond7:
bridge:
domain:
br_default:
access: 4071
$ nv config diff 18 19 -o commands
nv set interface bond7 bridge domain br_default access 4071
Prescriptive Topology Manager - PTM
PTM can be used for cabling verification. By creating a Graphviz DOT formatted /etc/ptm.d/topology.dot file containing all nodes supporting LLDP, we can make sure the real world matches expectations. For example:
$ sudo cat /etc/ptm.d/topology.dot
graph G {
"ess-sw-02":"eth0" -- "sonic":"Eth47(Port47)";
"ess-sw-01":"swp27" -- "nas-sw1":"Eth1/27";
"ess-sw-02":"swp27" -- "nas-sw2":"Eth1/27";
"ess-sw-01":"swp29" -- "ess-sw-02":"swp29";
"ess-sw-01":"swp30" -- "ess-sw-02":"swp30";
}
$ sudo systemctl restart ptmd
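The quoting in topology.dot is fiddly to type by hand, so the edges can be generated from a cabling list. This is a minimal sketch that only produces the same edge syntax as the file above. `dot_edge` is a hypothetical helper name:

```shell
# Hypothetical helper: print one topology.dot edge.
# $1/$2 = host name and port on one end, $3/$4 = host name and port on the other end
dot_edge() {
    printf '  "%s":"%s" -- "%s":"%s";\n' "$1" "$2" "$3" "$4"
}

# Print a complete graph for the two inter-switch links from the example above.
{
    echo 'graph G {'
    dot_edge ess-sw-01 swp29 ess-sw-02 swp29
    dot_edge ess-sw-01 swp30 ess-sw-02 swp30
    echo '}'
}
```

Redirecting the output to a file gives something that can be copied to /etc/ptm.d/topology.dot on each switch.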
The same topology file can be pushed to all nodes, and verified with the ptmctl command:
$ sudo ptmctl
---------------
port cbl
status
---------------
eth0 pass
swp27 pass
swp29 pass
swp30 pass
$ sudo ptmctl -d
-----------------------------------------------------------------------------------------------------------------------------
port cbl exp act sysname portID portDescr match last
status nbr nbr on upd
-----------------------------------------------------------------------------------------------------------------------------
eth0 pass sonic:Eth47(Port47) sonic:Eth47(Port47) sonic Eth47(Port47) Ethernet46 IfName 9m:46s
swp27 pass nas-sw2:Eth1/27 nas-sw2:Eth1/27 nas-sw2 Eth1/27 IfName 9m:46s
swp29 pass ess-sw-01:swp29 ess-sw-01:swp29 ess-sw-01 swp29 swp29 IfName 9m:46s
swp30 pass ess-sw-01:swp30 ess-sw-01:swp30 ess-sw-01 swp30 swp30 IfName 9m:46s
TODO/FIXME
- send syslog to central log host (nv show system syslog server)
- OpenTelemetry export?
- Ansible? https://galaxy.ansible.com/ui/repo/published/nvidia/nvue/
- REST API:
curl -k -u cumulus:cumulus -X GET "https://127.0.0.1:8765/nvue_v1/?rev=applied"
Interesting experiences…
Port down because of config mismatch
swp7 in an MLAG bond refused to work on one of the switches. We swapped around cabling and server-side connections, but the problem persisted on the same port in the same switch, while everything was working on other ports and on the other switch. Looking at signal strength, we saw good RX power, but TX power was nil.
Eventually I found that there was an issue reported here:
$ nv show interface status
Interface Admin Status Oper Status Protodown Protodown Reason Fault
------------- ------------ ----------- --------- ---------------- -----
<snip>
swp7 up down enabled clag N/A
$ nv show interface swp7 link protodown-reason
operational
-----------
clag
$ nv show interface --view=mlag-cc
Interface Parameter LocalValue PeerValue Conflict
--------- ---------------- ----------------- ----------------- ---------------------------------------------------------
<snip>
bond7 clag-id 7 7 -
lacp-partner-mac 00:00:00:00:00:00 00:00:00:00:00:00 -
lacp-actor-mac 44:38:39:e5:5e:54 44:38:39:e5:5e:54 -
vlan-id 4071 1, 4070->4071 vlan mismatch on clag interface between clag peers
native-vlan 4071 1 native vlan mismatch on clag interface between clag peers
master br_default NOT-SYNCED -
mtu 9216 9216 -
bridge-learning yes yes -
bpdu-guard off off -
restricted-role no no -
Apparently we had forgotten to configure the port as an access port on one of the switches. Correcting that fixed the problem. I’m surprised by the behaviour, but it’s probably sensible…
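To spot this kind of problem faster next time, the status table can be filtered for ports held in protodown. This is a minimal sketch, assuming the column order shown in the `nv show interface status` output above. `protodown_ports` is a hypothetical helper name:

```shell
# Hypothetical helper: read `nv show interface status` output on stdin and
# print "<interface> <reason>" for every row whose Protodown column is "enabled".
# Assumes the column order: Interface, Admin Status, Oper Status, Protodown,
# Protodown Reason, Fault.
protodown_ports() {
    awk '$4 == "enabled" { print $1, $5 }'
}

# Usage sketch on a switch:
#   nv show interface status | protodown_ports
```

A reason of clag points back at the mlag-cc view for the bond that the port belongs to.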