KVM SR-IOV InfiniBand on P9 EMS

Notes on configuring SR-IOV InfiniBand ports for use by KVM VMs running on a RHEL 8 based EMS.

Install and enable libvirtd on the host:

dnf install libvirt virt-install qemu-kvm
systemctl enable libvirtd
systemctl start libvirtd

Delete the mgmt connection and make the mgmt interface a port of a new bridge, br0, so the VMs can share it:

nmcli con del mgmt
nmcli con add type bridge con-name bridge-br0 ifname br0 ip4 10.11.12.125/27 stp no
nmcli con add type bridge-slave ifname mgmt master br0 con-name bridge-slave-mgmt

List the InfiniBand devices and their port state:

# ibdev2netdev
mlx5_0 port 1 ==> ib0 (Up)
mlx5_1 port 1 ==> ib1 (Down)
mlx5_2 port 1 ==> ib2 (Up)
mlx5_3 port 1 ==> ib3 (Down)

Configure 2 VFs for each of the online InfiniBand ports

Only mlx5_0 and mlx5_2 are up. We want 2 VFs on each of these physical ports, so write 2 into their sriov_numvfs files:

# echo 2 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs

This added the following devices to the virsh nodedev-list output; they can be assigned to VMs:

pci_0000_01_00_2
pci_0000_01_00_3

# echo 2 > /sys/class/infiniband/mlx5_2/device/sriov_numvfs

This added two more devices to the virsh nodedev-list output:

pci_0004_0d_00_2
pci_0004_0d_00_3
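
The nodedev names follow from the PCI addresses of the VFs, so they can also be derived from sysfs instead of comparing virsh nodedev-list output before and after. A minimal sketch, assuming the kernel exposes the usual virtfnN symlinks under the device directory of the physical function. The function name vf_nodedev_names and the SYSFS_ROOT override (present only to allow a dry run against a fake tree) are made up for this note:

```shell
# Print the libvirt nodedev name of every VF belonging to an InfiniBand device.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
vf_nodedev_names() {
    local dev="$1" root="${SYSFS_ROOT:-/sys}" link addr
    for link in "$root/class/infiniband/$dev/device"/virtfn*; do
        [ -L "$link" ] || continue                    # glob did not match: no VFs
        addr=$(basename "$(readlink -f "$link")")     # PCI address, e.g. 0000:01:00.2
        echo "pci_$(echo "$addr" | tr ':.' '__')"     # libvirt form, e.g. pci_0000_01_00_2
    done
}
```

Calling vf_nodedev_names mlx5_0 on the host described here would be expected to list the two devices shown for mlx5_0.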

Configure node and port GUIDs for the VFs. VF 0 on each port is assigned to quorum01 and VF 1 to quorum02:

quorum01:
pci_0000_01_00_2 mlx5_0
pci_0004_0d_00_2 mlx5_2
guidnode=de:ca:fd:ea:db:ad:11:21
guidport=de:ca:fd:ea:db:ad:12:21
guidnode=ba:dd:ea:dd:ec:af:11:21
guidport=ba:dd:ea:dd:ec:af:12:21

echo de:ca:fd:ea:db:ad:11:21 > /sys/class/infiniband/mlx5_0/device/sriov/0/node
echo de:ca:fd:ea:db:ad:12:21 > /sys/class/infiniband/mlx5_0/device/sriov/0/port
echo Follow > /sys/class/infiniband/mlx5_0/device/sriov/0/policy

echo ba:dd:ea:dd:ec:af:11:21 > /sys/class/infiniband/mlx5_2/device/sriov/0/node
echo ba:dd:ea:dd:ec:af:12:21 > /sys/class/infiniband/mlx5_2/device/sriov/0/port
echo Follow > /sys/class/infiniband/mlx5_2/device/sriov/0/policy

quorum02:
pci_0000_01_00_3 mlx5_0
pci_0004_0d_00_3 mlx5_2
guidnode=de:ca:fd:ea:db:ad:21:21
guidport=de:ca:fd:ea:db:ad:22:21
guidnode=ba:dd:ea:dd:ec:af:21:21
guidport=ba:dd:ea:dd:ec:af:22:21

echo de:ca:fd:ea:db:ad:21:21 > /sys/class/infiniband/mlx5_0/device/sriov/1/node
echo de:ca:fd:ea:db:ad:22:21 > /sys/class/infiniband/mlx5_0/device/sriov/1/port
echo Follow > /sys/class/infiniband/mlx5_0/device/sriov/1/policy

echo ba:dd:ea:dd:ec:af:21:21 > /sys/class/infiniband/mlx5_2/device/sriov/1/node
echo ba:dd:ea:dd:ec:af:22:21 > /sys/class/infiniband/mlx5_2/device/sriov/1/port
echo Follow > /sys/class/infiniband/mlx5_2/device/sriov/1/policy
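
A mistyped GUID is easy to miss in a list like this, so it may be worth checking the format before writing it to sysfs. A small sketch, assuming GUIDs are always written as eight colon-separated hex bytes as in this note; is_guid is a hypothetical helper name:

```shell
# Succeed only if the argument looks like a 64-bit GUID: eight colon-separated hex bytes.
is_guid() {
    printf '%s\n' "$1" | grep -Eqx '([0-9A-Fa-f]{2}:){7}[0-9A-Fa-f]{2}'
}
```

For example, is_guid de:ca:fd:ea:db:ad:11:21 && echo ok prints ok, while a value with a missing byte or a non-hex digit is rejected.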

Unbind and rebind the VF driver so that the new GUIDs take effect:

for i in $(lspci -D | grep Mellanox | grep Virtual | awk '{print $1}'); do
        echo "$i" > /sys/bus/pci/drivers/mlx5_core/unbind
        echo "$i" > /sys/bus/pci/drivers/mlx5_core/bind
done

Create the VMs and install them:

cp RHEL-8.10.0-20240516.14-ppc64le-dvd1.iso   /vm/RHEL-8.10.0-20240516.14-ppc64le-dvd1.iso

virt-install --connect qemu:///system --name quorum01 --ram 32000 --vcpus 8 --disk /vm/quorum01.qcow2,size=80 --os-type=linux --os-variant=rhel8.10 --network=bridge:br0,model=virtio --serial pty --graphics none --console pty,target.type=virtio --cdrom /vm/RHEL-8.10.0-20240516.14-ppc64le-dvd1.iso --hostdev pci_0000_01_00_2 --hostdev pci_0004_0d_00_2

virt-install --connect qemu:///system --name quorum02 --ram 32000 --vcpus 8 --disk /vm/quorum02.qcow2,size=80 --os-type=linux --os-variant=rhel8.10 --network=bridge:br0,model=virtio --serial pty --graphics none --console pty,target.type=virtio --cdrom /vm/RHEL-8.10.0-20240516.14-ppc64le-dvd1.iso --hostdev pci_0000_01_00_3 --hostdev pci_0004_0d_00_3

Once a VM is up, configure bonding over its two IPoIB interfaces (inside the guest):

nmcli con add type bond ifname bond0 con-name bond-bond0 mtu 2044 ipv4.method manual ipv6.method ignore bond.options mode=active-backup,miimon=100 ipv4.addresses 10.112.99.130/16
nmcli con mod bond-bond0 ipv4.routes "10.255.82.0/24 10.112.99.9","172.28.22.128/25 10.112.99.86"
nmcli con up bond-bond0

nmcli con add type infiniband ifname ib0 con-name bond-slave-ib0 mtu 2044 master bond0 slave-type bond
nmcli con add type infiniband ifname ib1 con-name bond-slave-ib1 mtu 2044 master bond0 slave-type bond
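
To see which IPoIB port currently carries traffic in the active-backup bond, the status file of the bonding driver can be read. A small sketch, assuming the bond is named bond0 as above; bond_active_slave and the PROC_ROOT override (only there for dry runs against a fake file) are hypothetical names:

```shell
# Print the currently active port of an active-backup bond.
# PROC_ROOT is a hypothetical override for testing; it defaults to /proc.
bond_active_slave() {
    local file="${PROC_ROOT:-/proc}/net/bonding/$1"
    # The bonding driver reports the active port on a "Currently Active Slave:" line.
    sed -n 's/^Currently Active Slave: //p' "$file"
}
```

Inside the guest, bond_active_slave bond0 should name either ib0 or ib1.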

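Configure SR-IOV before libvirtd is started at boot, and autostart the VMs

The sysfs settings above do not survive a reboot, so a oneshot service reapplies them before libvirtd starts: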

# vi /etc/systemd/system/sr-iov-setup.service
[Unit]
Description=Configure SR-IOV interfaces for KVM
After=network.target rsyslog.service
Before=libvirtd.service

[Service]
Type=oneshot
ExecStart=/usr/local/bin/sr-iov-setup.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
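
The unit runs /usr/local/bin/sr-iov-setup.sh, whose contents are not shown in this note. A minimal sketch of what it could look like, assuming it simply replays the host-side steps above (VF count, GUIDs, policy, rebind). The function names and the SYSFS_ROOT override are made up for illustration, and the rebind step walks the virtfnN symlinks instead of filtering lspci output:

```shell
#!/bin/bash
# Hypothetical sr-iov-setup.sh: reapply the SR-IOV settings from this note at boot.
# SYSFS_ROOT is an assumption added for dry runs against a fake tree; default is /sys.

setup_vf() {      # usage: setup_vf <ibdev> <vf-index> <node-guid> <port-guid>
    local vfdir="${SYSFS_ROOT:-/sys}/class/infiniband/$1/device/sriov/$2"
    echo "$3" > "$vfdir/node"
    echo "$4" > "$vfdir/port"
    echo Follow > "$vfdir/policy"
}

rebind_vfs() {    # usage: rebind_vfs <ibdev>; rebind each VF so the GUIDs take effect
    local root="${SYSFS_ROOT:-/sys}" link addr
    for link in "$root/class/infiniband/$1/device"/virtfn*; do
        [ -L "$link" ] || continue
        addr=$(basename "$(readlink -f "$link")")
        echo "$addr" > "$root/bus/pci/drivers/mlx5_core/unbind"
        echo "$addr" > "$root/bus/pci/drivers/mlx5_core/bind"
    done
}

setup_device() {  # usage: setup_device <ibdev> <first six GUID bytes>
    local dev="$1" prefix="$2" root="${SYSFS_ROOT:-/sys}"
    if [ ! -d "$root/class/infiniband/$dev" ]; then
        echo "sr-iov-setup: $dev not present, skipping" >&2
        return 0
    fi
    echo 2 > "$root/class/infiniband/$dev/device/sriov_numvfs"
    setup_vf "$dev" 0 "$prefix:11:21" "$prefix:12:21"    # VF 0: quorum01
    setup_vf "$dev" 1 "$prefix:21:21" "$prefix:22:21"    # VF 1: quorum02
    rebind_vfs "$dev"
}

setup_device mlx5_0 de:ca:fd:ea:db:ad
setup_device mlx5_2 ba:dd:ea:dd:ec:af
```

The script needs to be executable (chmod +x /usr/local/bin/sr-iov-setup.sh) for the ExecStart line to work. Skipping a missing device with a warning is a design choice of this sketch; failing the unit instead may be preferable if the VMs must not start without their VFs.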


# systemctl daemon-reload
# systemctl enable sr-iov-setup.service
# virsh autostart  quorum01
# virsh autostart  quorum02

After rebooting the host (ess-kvm01), everything came back up automatically.