Dynamic-Resource-Allocation
Introduction
Dynamic-Resource-Allocation (DRA) is a new feature introduced by Kubernetes that puts resource scheduling in the hands of third-party developers. It provides an API more akin to a storage persistent volume, instead of the countable model (e.g., "nvidia.com/gpu: 2") that device-plugin used to request access to resources, with the main benefit being a more flexible and dynamic allocation of hardware resources, resulting in improved resource utilization. The main benefit is more flexible and dynamic allocation of hardware resources, which improves resource utilization and enhances resource scheduling, enabling Pods to schedule the best nodes. DRA is currently available as an alpha feature in Kubernetes 1.26 (December 2022 release), driven by Nvidia and Intel. Spiderpool currently integrates with the DRA framework, which allows for the following, but not limited to:
- Automatically scheduling to the appropriate node based on the NIC and subnet information reported by each node, combined with the SpiderMultusConfig configuration used by the Pod, so as to prevent the Pod from not being able to start up after scheduling to the node.
- Unify the resource usage of multiple device-plugins: sriov-network-device-plugin, k8s-rdma-shared-dev-plugin in the SpiderClaimParameter.
- Continuously updated, see for details. RoadMap
Explanation of nouns
- ResourceClaimTemplate: resourceclaim template for generating resourceclaim resources. One resourceClaimTemplate can generate multiple resourceclaims.
- ResourceClaim: ResourceClaim binds a specific set of node resources for use by the Pod.
- ResourceClass: A ResourceClass represents a resource (e.g., GPU), and a DRA plugin is responsible for driving the resource represented by a ResourceClass.
Environment Preparation
- Prepare a Kubernetes cluster with a higher version than v1.29.0, and enable the dra feature-gate function of the cluster.
- Have Kubectl, [Helm] (https://helm.sh/docs/intro/install/) installed.
Quick Start
-
Currently DRA is not turned on by default as an alpha feature of Kubernetes. So we need to turn it on manualways, as following steps.
Add the following to the kube-apiserver startup parameters.
Add the following to the kube-controller-manager startup parameters.
Add the following to kube-scheduler's startup parameters:
-
DRA needs to rely on [CDI] (https://github.com/cncf-tags/container-device-interface), so it needs container runtime support. In this article, we take containerd as an example, and we need to enable cdi function manually.
Modify the containerd configuration file to configure CDI.
~# vim /etc/containerd/config.toml ... [plugins. "io.containerd.grpc.v1.cri"] enable_cdi = true cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"] ~# systemctl restart containerd
It is recommended that containerd be older than v1.7.0, as CDI is supported in later versions. The version supported by different runtimes is not the same, please check if it is supported first.
-
Install Spiderpool, taking care to enable CDI.
``` helm repo add spiderpool https://spidernet-io.github.io/spiderpool helm repo update spiderpool helm install spiderpool spiderpool/spiderpool --namespace kube-system --set dra.enabled=true
-
Verify the installation
Check that the Spiderpool pod is running correctly, and check for the presence of the resourceclass resource:
~# kubectl get po -n kube-system | grep spiderpool spiderpool-agent-hqt2b 1/1 Running 0 20d spiderpool-agent-nm9vl 1/1 Running 0 20d spiderpool-controller-7d7f4f55d4-w2rv5 1/1 Running 0 20d spiderpool-init 0/1 Completed 0 21d ~# kubectl get resourceclass NAME DRIVERNAME AGE netresources.spidernet.io netresources.spidernet.io 20d
netresources.spidernet.io is Spiderpool's resourceclass, and Spiderpool will take care of creating and allocating resourceclaims belonging to this resourceclass.
-
Create SpiderIPPool and SpiderMultusConfig instances.
Note: This step can be skipped if your cluster already has other CNIs installed or does not require an underlay CNI with Macvlan.
MACVLAN_MASTER_INTERFACE="eth0" cat <<EOF | kubectl apply -f - apiVersion: spiderpool.spidernet.io/v2beta1 kind: SpiderMultusConfig metadata: name: macvlan-config name: macvlan-conf namespace: kube-system metadata: name: macvlan-conf namespace: kube-system cniType: macvlan macvlan. master: ${MACVLAN_MASTER_INTERFACE} - ${MACVLAN_MASTER_INTERFACE} EOF
SpiderMultusConfig will automatically create the Multus network-attachment-definetion instance
`shell cat <<EOF | kubectl apply -f - apiVersion: spiderpool.spidernet.io/v2beta1 kind: SpiderIPPool metadata: name: ippool-test name: ippool-test spec. ips. - "172.18.30.131-172.18.30.140" subnet: 172.18.0.0/16 gateway: 172.18.0.1 multusName. - kube-system/macvlan-conf EOF
-
Create resource files such as workloads and resourceClaim.
~# export NAME=demo ~# cat <<EOF | kubectl apply -f - apiVersion: spiderpool.spidernet.io/v2beta1 kind: SpiderClaimParameter metadata: name: ${NAME} --- apiVersion: resource.k8s.io/v1alpha2 kind: ResourceClaimTemplate metadata: name: ${NAME} spec: spec: resourceClassName: netresources.spidernet.io parametersRef: apiGroup: spiderpool.spidernet.io kind: SpiderClaimParameter name: ${NAME} --- apiVersion: apps/v1 kind: Deployment metadata: name: ${NAME} spec: replicas: 2 selector: matchLabels: app: ${NAME} template: metadata: annotations: v1.multus-cni.io/default-network: kube-system/macvlan-conf labels: app: ${NAME} spec: containers: - name: ctr image: nginx resources: claims: - name: ${NAME} resourceClaims: - name: ${NAME} source: resourceClaimTemplateName: ${NAME} EOF
Create a ResourceClaimTemplate, K8s will create its own unique Resourceclaim for each Pod based on this ResourceClaimTemplate. the declaration cycle of the Resourceclaim will be consistent with that of the Pod. The declaration cycle of the Resourceclaim is consistent with that of the Pod.
The SpiderClaimParameter is used to extend the configuration parameters of the ResourceClaim, which will affect the scheduling of the ResourceClaim and the generation of its CDI file. In this example, setting rdmaAcc to true will affect whether or not the configured so file is mounted.
A Pod's container affects the resources required by containerd by declaring the use of claims in Resources. The CDI file corresponding to the claim is translated into an OCI Spec configuration when the container is run, which determines the container's creation.
If the Pod creation fails with "unresolvable CDI devices: xxxx", it is possible that the CDI version supported by the container at runtime is too low, which makes the container unable to parse the cdi file at runtime. Currently, the default CDI version of Spiderpool is the latest one. You can specify a lower version in the SpiderClaimParameter instance via annotation: "dra.spidernet.io/cdi-version", e.g.: dra.spidernet.io/cdi-version: 0.5.0
-
Validation
After creating the Pod, view the generated resource files such as ResourceClaim.
~# kubectl get resourceclaim NAME RESOURCECLASSNAME ALLOCATIONMODE STATE AGE demo-745fb4c498-72g7g-demo-7d458 netresources.spidernet.io WaitForFirstConsumer allocated,reserved 20d ~# cat /var/run/cdi/k8s.netresources.spidernet.io-claim_1e15705a-62fe-4694-8535-93a5f0ccf996.yaml --- cdiVersion: 0.6.0 containerEdits: {} devices: - containerEdits: env: - DRA_CLAIM_UID=1e15705a-62fe-4694-8535-93a5f0ccf996 name: 1e15705a-62fe-4694-8535-93a5f0ccf996 kind: k8s.netresources.spidernet.io/claim
This shows that the ResourceClaim has been created, and STATE shows allocated and reserverd, indicating that it has been used by the pod. And spiderpool has generated a CDI file for the ResourceClaim, which describes the files and environment variables to be mounted.
Check that the pod is Running and verify that the the environment variable (DRA_CLAIM_UID) is declared.
~# kubectl get po NAME READY STATUS RESTARTS AGE nginx-745fb4c498-72g7g 1/1 Running 0 20m nginx-745fb4c498-s92qr 1/1 Running 0 20m ~# kubectl exec -it nginx-745fb4c498-72g7g sh ~# printenv DRA_CLAIM_UID 1e15705a-62fe-4694-8535-93a5f0ccf996
You can see that the Pod's containers have correctly declared environment variables, It shows the dra is works.
Welcome to try it out
DRA is currently available as an alpha feature of Spiderpool, and we'll be expanding it with more capabilities in the future, so feel free to try it out. Please let us know if you have any further questions or requests.