Skip to content

IPoIB For Infiniband

English简体中文

Introduction

This chapter introduces how POD access network with the IPoIP interface.

IPoIB CNI provides an IPoIB network card for POD, without RDMA device. It is suitable for conventional applications that require TCP/IP communication, as it does not require an SRIOV network card, allowing more PODs to run on the host

IPoIB

The following steps demonstrate how to use IPoIB on a cluster with 2 nodes, it enables Pod to own a regular TCP/IP network cards without RDMA device.

  1. Ensure that the host machine has an Infiniband card installed and the driver is properly installed.

    In our demo environment, the host machine is equipped with a Mellanox ConnectX-5 VPI NIC.

    Follow the official NVIDIA guide to install the latest OFED driver.

    For Mellanox's VPI series network cards, you can refer to the official Switching Infiniband Mode to ensure that the network card is working in Infiniband mode.

    To confirm the presence of Inifiniband devices, use the following command:

    ~# lspci -nn | grep Infiniband
    86:00.0 Infiniband controller [0207]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
    
    ~# rdma link
    link mlx5_0/1 subnet_prefix fe80:0000:0000:0000 lid 2 sm_lid 2 lmc 0 state ACTIVE physical_state LINK_UP
    
    ~# ibstat mlx5_0 | grep "Link layer"
    Link layer: InfiniBand
    

    Check the ipoib interface of the Inifiniband device

    ~# ip a show ibs5f0
    9: ibs5f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:10:49:fe:80:00:00:00:00:00:00:e8:eb:d3:03:00:93:ae:10 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp134s0f0
    inet 172.91.0.10/16 brd 172.91.255.255 scope global ibs5f0
    valid_lft forever preferred_lft forever
    inet6 fd00:91::172:91:0:10/64 scope global
    valid_lft forever preferred_lft forever
    inet6 fe80::eaeb:d303:93:ae10/64 scope link
    valid_lft forever preferred_lft forever
    
  2. Install Spiderpool

    If you are a user from China, you can specify the parameter --set global.imageRegistryOverride=ghcr.m.daocloud.io to pull image from China registry.

    Once the installation is complete, the following components will be installed:

    ~# kubectl get pod -n kube-system
    spiderpool-agent-9sllh                         1/1     Running     0          1m
    spiderpool-agent-h92bv                         1/1     Running     0          1m
    spiderpool-controller-7df784cdb7-bsfwv         1/1     Running     0          1m
    spiderpool-init                                0/1     Completed   0          1m
    
  3. Create the CNI configuration of ipoib, and the ippool. The spec.ipoib.master of SpiderMultusConfig should be set to the infiniband interface of the node.

    cat <<EOF | kubectl apply -f -
    apiVersion: spiderpool.spidernet.io/v2beta1
    kind: SpiderIPPool
    metadata:
      name: v4-91
    spec:
      gateway: 172.91.0.1
      ips:
        - 172.91.0.100-172.91.0.120
      subnet: 172.91.0.0/16
    ---
    apiVersion: spiderpool.spidernet.io/v2beta1
    kind: SpiderMultusConfig
    metadata:
      name: ipoib
      namespace: kube-system
    spec:
      cniType: ipoib
      ipoib:
        master: "ibs5f0"
        ippools:
          ipv4: ["v4-91"]
    EOF
    
  4. Following the configurations from the previous step, create a DaemonSet application that spans across nodes for testing

    ANNOTATION_MULTUS="v1.multus-cni.io/default-network: kube-system/ipoib"
    NAME=ipoib
    cat <<EOF | kubectl apply -f -
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ${NAME}
      labels:
        app: $NAME
    spec:
      selector:
        matchLabels:
          app: $NAME
      template:
        metadata:
          name: $NAME
          labels:
            app: $NAME
          annotations:
            ${ANNOTATION_MULTUS}
        spec:
          containers:
          - image: docker.io/mellanox/rping-test
            imagePullPolicy: IfNotPresent
            name: mofed-test
            securityContext:
              capabilities:
                add: [ "IPC_LOCK" ]
            command:
            - sh
            - -c
            - |
              ls -l /dev/infiniband /sys/class/net
              sleep 1000000
    EOF
    
  5. Verify that the network communication is correct between the PODs across nodes.

    ~# kubectl get pod -o wide
    NAME                         READY   STATUS             RESTARTS          AGE    IP             NODE         NOMINATED NODE   READINESS GATES
    ipoib-psf4q                  1/1     Running            0                 34s    172.91.0.112   10-20-1-20   <none>           <none>
    ipoib-t9hm7                  1/1     Running            0                 34s    172.91.0.116   10-20-1-10   <none>           <none>
    

    Succeed to access each other

    ~# kubectl exec -it ipoib-psf4q bash
    kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
    root@ipoib-psf4q:/# ping 172.91.0.116
    PING 172.91.0.116 (172.91.0.116) 56(84) bytes of data.
    64 bytes from 172.91.0.116: icmp_seq=1 ttl=64 time=1.10 ms
    64 bytes from 172.91.0.116: icmp_seq=2 ttl=64 time=0.235 ms