Cumulus switch config
IBM has started delivering Cumulus-based Nvidia switches for ESS, so I’ve had to learn how to configure them.
Switches always seem to be delivered with a very old Cumulus version, 5.1.0, which doesn’t seem to be upgradeable to the current release without dropping the switch configuration. There are two release streams of Cumulus for Nvidia switches, a “mainline” stream and a Long-Term Support (LTS) stream. The current LTS version is 5.11.z, and mainline is 5.15.z:
| Cumulus Linux Version | LTS Start Date | End of Life Date |
|---|---|---|
| 5.8.z or earlier | N/A | April 2025 |
| 5.9.z | April 2024 | April 2027 |
| 5.10.z | N/A | November 2025 |
| 5.11.z | November 2024 | November 2027 |
| 5.12.z | N/A | February 2026 |
| 5.13.z | N/A | May 2026 |
| 5.14.z | N/A | July 2026 |
| 5.15.z | N/A | November 2026 |
Both release streams should be stable and robust, according to Nvidia documentation. I’ve found documentation stating that 5.15 can be upgraded to from 5.12 and later, but not from the 5.11 LTS version. That seems unfortunate, so I prefer the mainline stream to avoid getting stuck needing a full ONIE re-install after the system is in production. According to https://docs.nvidia.com/networking-ethernet-software/knowledge-base/Support/Support-Offerings/Cumulus-Linux-Release-Versioning-and-Support-Policy/, “NVIDIA recommends that you run: The latest Cumulus Linux 5.y.z release on Spectrum switches” – so that’s what I do.
Assumptions
I’ll assume we have two brand new switches, which should be clustered to provide MLAG to the hosts. The switches will have management addresses 192.168.200.2 and 192.168.200.3, and additionally we have a management node (EMS) in the same network with IP address 192.168.200.4. The default username/password for the switches is cumulus/cumulus.
IP plan:
| Component | IP Address |
|---|---|
| Switch1 | 192.168.200.2/24 |
| Switch2 | 192.168.200.3/24 |
| Management node | 192.168.200.4/24 |
Initial IP config
When first receiving the switches, we need to connect to them using a serial cable to configure the initial IP address:
switch1:
nv set interface eth0 ip address 192.168.200.2/24
nv config apply
switch2:
nv set interface eth0 ip address 192.168.200.3/24
nv config apply
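With the addresses applied, a quick reachability check from the management node is worthwhile before going further. A minimal sketch, assuming the management node already has 192.168.200.4/24 configured on the interface facing the switches:
# ping -c 3 192.168.200.2
# ping -c 3 192.168.200.3
# ssh cumulus@192.168.200.2 nv show interface eth0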
ONIE Upgrade
Assuming the switch is running a version older than 5.12, we need to upgrade using ONIE. To do that, download the new Cumulus image to the management node and serve it via HTTP:
Management node:
# curl -O https://somewhere/cumulus-linux-5.15.0-mlx-amd64.bin
# python3 -m http.server 9999
The switches can then be upgraded to this image using:
sudo onie-install -a -i http://192.168.200.4:9999/cumulus-linux-5.15.0-mlx-amd64.bin
sudo reboot
After approximately 10 minutes, the switches should have completed the upgrade, and will have lost all configuration. If you still have the serial connection available, use it to configure the management IP again as above. Otherwise, you should have made a note of the MAC address of eth0 on the switches, so that you can configure a DHCP server on the management node and provide IP addresses to the switches that way:
# yum install dhcp-server
# cat <<'EOF' > /etc/dhcp/dhcpd.conf
subnet 192.168.200.0 netmask 255.255.255.0 {
  default-lease-time 60;
  max-lease-time 7200;
}
host switch1 {
  hardware ethernet d8:94:24:e6:d5:8a;
  fixed-address 192.168.200.2;
  max-lease-time 3600;
}
host switch2 {
  hardware ethernet d8:94:24:cd:62:0a;
  fixed-address 192.168.200.3;
  max-lease-time 3600;
}
EOF
# systemctl enable --now dhcpd
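If the management node runs firewalld, the DHCP service (and the HTTP port used for the ONIE image above) may need to be opened, and the dhcpd log will show when a switch picks up its lease. A rough sketch, assuming a RHEL-like management node:
# firewall-cmd --permanent --add-service=dhcp
# firewall-cmd --permanent --add-port=9999/tcp
# firewall-cmd --reload
# journalctl -u dhcpd | grep -E 'DHCPOFFER|DHCPACK'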
Connect back into the switches to disable DHCP and configure the static management IP:
switch1:
nv set system hostname switch1
nv set interface eth0 ip address 192.168.200.2/24
nv unset interface eth0 ipv4 dhcp-client
nv config apply
switch2:
nv set system hostname switch2
nv set interface eth0 ip address 192.168.200.3/24
nv unset interface eth0 ipv4 dhcp-client
nv config apply
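After the reinstall it’s worth confirming that the switches actually ended up on the expected release; either of these should show it (the exact output format varies between versions):
cumulus@switch:~$ nv show system
cumulus@switch:~$ cat /etc/lsb-release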
NTP Config
We’ll use the management node as NTP server for the switches:
nv unset system ntp server
nv set system ntp server 192.168.200.4 iburst enabled
nv config apply
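To verify, the show counterpart of the same path should list the configured server and its state. I’m assuming here that the show tree mirrors the set path used above:
nv show system ntp
nv show system ntp server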
MLAG peer config
Then we need to configure peerlink interfaces for MLAG. It’s probably best to avoid the first two and last two ports, as those tend to be the only ports supporting high-power transceivers, at least on SN2xx0 switches. So we connect ports 3 and 4 between the two switches, 3-to-3 and 4-to-4, and configure:
Switch1:
nv set interface peerlink bond member swp3-4
nv set mlag mac-address 44:38:39:E5:5E:55
nv set mlag peer-ip linklocal
nv set mlag backup 192.168.200.3 vrf mgmt
nv config apply
Switch2:
nv set interface peerlink bond member swp3-4
nv set mlag mac-address 44:38:39:E5:5E:55
nv set mlag peer-ip linklocal
nv set mlag backup 192.168.200.2 vrf mgmt
nv config apply
Note: Make sure to use the same mac-address on both switches in a pair, and different mac-addresses for other pairs. Also, the mac-address should start with 44:38:39.
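Once both switches are configured and the peerlink cables are in place, a quick sanity check that the peers see each other before moving on to the host ports:
cumulus@switch1:~$ nv show mlag
cumulus@switch1:~$ nv show interface peerlink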
MLAG host port config
Connect each host to the same port number(s) on both switches, then configure bonding for these ports. I like to name the bond after the first switch port number.
nv set interface bond5 bond member swp5
nv set interface bond5 description ems
nv set interface bond5 bond mlag id 5
nv set interface bond5 bridge domain br_default
nv set interface bond6 bond member swp6-7
nv set interface bond6 description essio1
nv set interface bond6 bond mlag id 6
nv set interface bond6 bridge domain br_default
nv set interface bond8 bond member swp8-9
nv set interface bond8 description essio2
nv set interface bond8 bond mlag id 8
nv set interface bond8 bridge domain br_default
nv config apply
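The host side must bring up a matching LACP (802.3ad) bond across the two NICs cabled to the switch pair. How that’s done depends on the OS and tooling on the host; as a minimal sketch with NetworkManager, assuming the host NICs are named ens1f0 and ens1f1 (adjust to your hardware):
# nmcli con add type bond con-name bond0 ifname bond0 bond.options "mode=802.3ad,miimon=100"
# nmcli con add type ethernet con-name bond0-port1 ifname ens1f0 master bond0
# nmcli con add type ethernet con-name bond0-port2 ifname ens1f1 master bond0
# nmcli con up bond0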
VLAN
We can add VLANs to the br_default device:
nv set bridge domain br_default vlan 123,321
Configure some ports as access ports in a given VLAN:
nv set interface bond7 bridge domain br_default access 123
Disable native VLAN:
nv set interface bond7 bridge domain br_default untagged none
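Trunk ports carry all VLANs in br_default by default. To restrict a bond to a subset of tagged VLANs, something like this should work; I’m assuming the per-interface vlan path here, so verify against your release:
nv set interface bond6 bridge domain br_default vlan 123,321
nv config apply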
Show port-vlan:
$ nv show bridge port-vlan
domain      port      vlan       tag-state
----------  --------  ---------  ---------
br_default  bond1     4070       untagged
            bond3     4070       untagged
            bond5     4070       untagged
            bond7     4070       untagged
            bond9     4070       untagged
            bond11    4070       untagged
            peerlink  1          untagged
                      4070-4071  tagged
Debug
Verify mlag config between peers:
$ nv show mlag consistency-checker global
Parameter               LocalValue                 PeerValue                  Conflict  Summary
----------------------  -------------------------  -------------------------  --------  -------
anycast-ip              -                          -                          -
bridge-priority         32768                      32768                      -
bridge-stp              on                         on                         -
bridge-type             vlan-aware                 vlan-aware                 -
clag-pkg-version        1.6.0-cl5.6.0u15           1.6.0-cl5.6.0u15           -
clag-protocol-version   1.6.1                      1.6.1                      -
peer-ip                 fe80::4ab0:2dff:fe78:7569  fe80::4ab0:2dff:fe78:7569  -
peerlink-bridge-member  Yes                        Yes                        -
peerlink-mtu            9216                       9216                       -
peerlink-native-vlan    1                          1                          -
peerlink-vlans          1, 123, 321                1, 123, 321                -
redirect2-enable        yes                        yes                        -
system-mac              44:38:39:be:ef:bb          44:38:39:be:ef:bb          -
$ nv show interface --view=mlag-cc
Interface  Parameter         LocalValue         PeerValue          Conflict
---------  ----------------  -----------------  -----------------  --------------------------------------------------
bond3      bridge-learning   yes                yes                -
           clag-id           3                  3                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  44:38:39:e5:5e:55  44:38:39:e5:5e:55  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       1                  1                  -
           vlan-id           1, 123, 321        1, 123, 321        -
bond5      bridge-learning   yes                yes                -
           clag-id           5                  5                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  12:26:05:34:2a:b1  12:26:05:34:2a:b1  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       123                123                -
           vlan-id           123                123                -
bond6      bridge-learning   yes                yes                -
           clag-id           6                  6                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  ee:da:e0:ab:d3:7f  ee:da:e0:ab:d3:7f  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       1                  1                  -
           vlan-id           1, 123, 321        1, 123, 321        -
bond7      bridge-learning   yes                yes                -
           clag-id           7                  7                  -
           lacp-actor-mac    44:38:39:be:ef:bb  44:38:39:be:ef:bb  -
           lacp-partner-mac  02:e2:5f:ef:8d:6c  02:e2:5f:ef:8d:6c  -
           master            br_default         NOT-SYNCED         -
           mtu               9216               9216               -
           native-vlan       -                  -                  -
           vlan-id           123                123, 321           vlan mismatch on clag interface between clag peers
Command snippets
Show commands to implement current configuration:
nv config show -o commands
Set the password for the cumulus user, overriding the strict password policy by providing a pre-hashed password:
nv set system aaa user cumulus hashed-password '$6$goMu8n27O1gmG5MK$ZzmvRgn8M1nPJcJhkuK/OZbQH.esZ7.xsHO4pM7YLXVoUwyo50BH88acPFvCqwizFMl91/gzIgYBqgqVY8Ev80'
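The hash itself can be generated on any Linux machine, for example with openssl, which prompts for the password and prints a SHA-512 crypt string suitable for hashed-password (the hash above is just a placeholder):
openssl passwd -6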
Save current configuration to /etc/nvue.d/startup.yaml, and copy offline:
nv config save
scp /etc/nvue.d/startup.yaml 192.168.200.4:
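To restore such a backup onto a freshly installed or replacement switch, the saved YAML can be copied back and loaded. A sketch, assuming nv config replace accepts the startup file as-is:
scp 192.168.200.4:startup.yaml .
nv config replace startup.yaml
nv config apply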
Clear physical counters per interface:
cumulus@switch:~$ nv action clear interface swp1 link phy-detail
Action executing ...
swp1 link phy-detail counters cleared.
Action succeeded
Reset a transceiver:
cumulus@switch:~$ nv action reset platform transceiver swp1
Action executing ...
Resetting module swp1 ... OK
Action succeeded
Show transceivers:
cumulus@switch:~$ nv show platform transceiver
Transceiver  Identifier  Vendor name  Vendor PN         Vendor SN      Vendor revision
-----------  ----------  -----------  ----------------  -------------  ---------------
swp1         QSFP28      Mellanox     MCP1600-C001E30N  MT2039VB01185  A3
swp10        QSFP28      Mellanox     MCP1600-C001E30N  MT2211VS01792  A3
swp11        QSFP28      Mellanox     MCP1600-C001E30N  MT2211VS01792  A3
swp12        QSFP28      Mellanox     MCP1650-V00AE30   MT2122VB02220  A2
swp13        QSFP28      Mellanox     MCP1650-V00AE30   MT2122VB02220  A2
More details:
cumulus@switch:~$ nv show platform transceiver swp2
cumulus@switch:~$ nv show interface swp1 transceiver
Show all config, including defaults:
nv config show --all
Prescriptive Topology Manager - PTM
PTM can be used for cabling verification. By creating a Graphviz DOT formatted /etc/ptm.d/topology.dot file containing all nodes supporting LLDP, we can make sure the real-world cabling matches expectations. For example:
$ sudo cat /etc/ptm.d/topology.dot
graph G {
"ess-sw-02":"eth0" -- "sonic":"Eth47(Port47)";
"ess-sw-01":"swp27" -- "nas-sw1":"Eth1/27";
"ess-sw-02":"swp27" -- "nas-sw2":"Eth1/27";
"ess-sw-01":"swp29" -- "ess-sw-02":"swp29";
"ess-sw-01":"swp30" -- "ess-sw-02":"swp30";
}
$ sudo systemctl restart ptmd
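To keep both switches in sync, the file can be distributed from the management node and ptmd restarted in one go. A small loop, assuming the switch hostnames resolve and SSH access as the cumulus user (the local path is just an example):
for sw in ess-sw-01 ess-sw-02; do
    scp topology.dot cumulus@${sw}:/tmp/
    ssh cumulus@${sw} 'sudo cp /tmp/topology.dot /etc/ptm.d/topology.dot && sudo systemctl restart ptmd'
done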
Once pushed to all nodes, the cabling can be verified with the ptmctl command:
$ sudo ptmctl
---------------
port cbl
status
---------------
eth0 pass
swp27 pass
swp29 pass
swp30 pass
$ sudo ptmctl -d
-----------------------------------------------------------------------------------------------------------------------------
port cbl exp act sysname portID portDescr match last
status nbr nbr on upd
-----------------------------------------------------------------------------------------------------------------------------
eth0 pass sonic:Eth47(Port47) sonic:Eth47(Port47) sonic Eth47(Port47) Ethernet46 IfName 9m:46s
swp27 pass nas-sw2:Eth1/27 nas-sw2:Eth1/27 nas-sw2 Eth1/27 IfName 9m:46s
swp29 pass ess-sw-01:swp29 ess-sw-01:swp29 ess-sw-01 swp29 swp29 IfName 9m:46s
swp30 pass ess-sw-01:swp30 ess-sw-01:swp30 ess-sw-01 swp30 swp30 IfName 9m:46s
TODO/FIXME
- send syslog to central log host (nv show system syslog server)
- open telemetry export ?
- ansible ? https://galaxy.ansible.com/ui/repo/published/nvidia/nvue/
- REST API:
curl -k -u cumulus:cumulus -X GET "https://127.0.0.1:8765/nvue_v1/?rev=applied"