IPoIB For Infiniband
English | 简体中文
Introduction
This chapter introduces how POD access network with the IPoIP interface.
IPoIB CNI provides an IPoIB network card for POD, without RDMA device. It is suitable for conventional applications that require TCP/IP communication, as it does not require an SRIOV network card, allowing more PODs to run on the host
IPoIB
The following steps demonstrate how to use IPoIB on a cluster with 2 nodes, it enables Pod to own a regular TCP/IP network cards without RDMA device.
-
Ensure that the host machine has an Infiniband card installed and the driver is properly installed.
In our demo environment, the host machine is equipped with a Mellanox ConnectX-5 VPI NIC.
Follow the official NVIDIA guide to install the latest OFED driver.
For Mellanox's VPI series network cards, you can refer to the official Switching Infiniband Mode to ensure that the network card is working in Infiniband mode.
To confirm the presence of Inifiniband devices, use the following command:
~# lspci -nn | grep Infiniband 86:00.0 Infiniband controller [0207]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017] ~# rdma link link mlx5_0/1 subnet_prefix fe80:0000:0000:0000 lid 2 sm_lid 2 lmc 0 state ACTIVE physical_state LINK_UP ~# ibstat mlx5_0 | grep "Link layer" Link layer: InfiniBand
Check the ipoib interface of the Inifiniband device
~# ip a show ibs5f0 9: ibs5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256 link/infiniband 00:00:10:49:fe:80:00:00:00:00:00:00:e8:eb:d3:03:00:93:ae:10 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff altname ibp134s0f0 inet 172.91.0.10/16 brd 172.91.255.255 scope global ibs5f0 valid_lft forever preferred_lft forever inet6 fd00:91::172:91:0:10/64 scope global valid_lft forever preferred_lft forever inet6 fe80::eaeb:d303:93:ae10/64 scope link valid_lft forever preferred_lft forever
-
Install Spiderpool
If you are a user from China, you can specify the parameter
--set global.imageRegistryOverride=ghcr.m.daocloud.io
to pull image from China registry.Once the installation is complete, the following components will be installed:
~# kubectl get pod -n kube-system spiderpool-agent-9sllh 1/1 Running 0 1m spiderpool-agent-h92bv 1/1 Running 0 1m spiderpool-controller-7df784cdb7-bsfwv 1/1 Running 0 1m spiderpool-init 0/1 Completed 0 1m
-
Create the CNI configuration of ipoib, and the ippool. The
spec.ipoib.master
of SpiderMultusConfig should be set to the infiniband interface of the node.cat <<EOF | kubectl apply -f - apiVersion: spiderpool.spidernet.io/v2beta1 kind: SpiderIPPool metadata: name: v4-91 spec: gateway: 172.91.0.1 ips: - 172.91.0.100-172.91.0.120 subnet: 172.91.0.0/16 --- apiVersion: spiderpool.spidernet.io/v2beta1 kind: SpiderMultusConfig metadata: name: ipoib namespace: kube-system spec: cniType: ipoib ipoib: master: "ibs5f0" ippools: ipv4: ["v4-91"] EOF
-
Following the configurations from the previous step, create a DaemonSet application that spans across nodes for testing
ANNOTATION_MULTUS="v1.multus-cni.io/default-network: kube-system/ipoib" NAME=ipoib cat <<EOF | kubectl apply -f - apiVersion: apps/v1 kind: DaemonSet metadata: name: ${NAME} labels: app: $NAME spec: selector: matchLabels: app: $NAME template: metadata: name: $NAME labels: app: $NAME annotations: ${ANNOTATION_MULTUS} spec: containers: - image: docker.io/mellanox/rping-test imagePullPolicy: IfNotPresent name: mofed-test securityContext: capabilities: add: [ "IPC_LOCK" ] command: - sh - -c - | ls -l /dev/infiniband /sys/class/net sleep 1000000 EOF
-
Verify that the network communication is correct between the PODs across nodes.
~# kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES ipoib-psf4q 1/1 Running 0 34s 172.91.0.112 10-20-1-20 <none> <none> ipoib-t9hm7 1/1 Running 0 34s 172.91.0.116 10-20-1-10 <none> <none>
Succeed to access each other
~# kubectl exec -it ipoib-psf4q bash kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead. root@ipoib-psf4q:/# ping 172.91.0.116 PING 172.91.0.116 (172.91.0.116) 56(84) bytes of data. 64 bytes from 172.91.0.116: icmp_seq=1 ttl=64 time=1.10 ms 64 bytes from 172.91.0.116: icmp_seq=2 ttl=64 time=0.235 ms