Allocate existing devices to workloads with DRA

You can flexibly request devices for your Google Kubernetes Engine (GKE) workloads by using dynamic resource allocation (DRA). This document shows you how to create a ResourceClaimTemplate to request existing devices in node pools in your cluster, and then create a workload to observe how Kubernetes flexibly allocates the devices to your Pods.

This document is intended for Application operators and Data engineers who run workloads such as AI/ML or high-performance computing (HPC).

About requesting devices with DRA

When you set up your GKE infrastructure for DRA, the DRA drivers on your nodes create DeviceClass objects in the cluster. A DeviceClass defines a category of devices, such as GPUs, that are available to request for workloads. A platform administrator can optionally deploy additional DeviceClasses that limit which devices you can request in specific workloads.
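
For example, a platform administrator might deploy a DeviceClass similar to the following sketch, which limits requests to devices managed by the NVIDIA GPU DRA driver. The object name and selector expression are illustrative, not values that GKE creates for you:

    # Hypothetical administrator-created DeviceClass (illustrative names).
    apiVersion: resource.k8s.io/v1
    kind: DeviceClass
    metadata:
      name: nvidia-gpus-example
    spec:
      selectors:
      - cel:
          # CEL expression that matches only devices advertised by this driver.
          expression: device.driver == "gpu.nvidia.com"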

To request devices within a DeviceClass, you create one of the following objects:

  • ResourceClaim: A ResourceClaim lets a Pod or a user request hardware resources by filtering for certain parameters within a DeviceClass.
  • ResourceClaimTemplate: A ResourceClaimTemplate defines a template that Pods can use to automatically create new per-Pod ResourceClaims.

For more information about ResourceClaims and ResourceClaimTemplates, see When to use ResourceClaims and ResourceClaimTemplates.
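
For example, a standalone ResourceClaim that requests one device from a DeviceClass might look similar to the following sketch. Unlike a ResourceClaimTemplate, you create this object directly, and Pods that reference it share the same allocated device. The object name is illustrative:

    # Hypothetical standalone ResourceClaim (illustrative name).
    apiVersion: resource.k8s.io/v1
    kind: ResourceClaim
    metadata:
      name: single-gpu-claim
    spec:
      devices:
        requests:
        - name: single-gpu
          exactly:
            deviceClassName: gpu.nvidia.com
            allocationMode: ExactCount
            count: 1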

The examples in this document use a basic ResourceClaimTemplate to request the specified device configuration. For more information about all of the fields that you can specify, see the ResourceClaimTemplate API reference.

Limitations

The following limitations apply:

Before you begin

Before you start, make sure that you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.

Claim devices and deploy workloads

To request per-Pod device allocation, you create a ResourceClaimTemplate that has your requested device configuration, such as GPUs of a specific type. When you deploy a workload that references the ResourceClaimTemplate, Kubernetes creates ResourceClaims for each Pod in the workload based on the ResourceClaimTemplate. Kubernetes allocates the requested resources and schedules the Pods on corresponding nodes.

To request GPUs in a workload with DRA, follow these steps:

  1. Save the following manifest as gpu-claim-template.yaml:

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaimTemplate
    metadata:
      name: gpu-claim-template
    spec:
      spec:
        devices:
          requests:
          - name: single-gpu
            exactly:
              deviceClassName: gpu.nvidia.com
              allocationMode: ExactCount
              count: 1
    
  2. Create the ResourceClaimTemplate:

    kubectl create -f gpu-claim-template.yaml
    
  3. To create a workload that references the ResourceClaimTemplate, save the following manifest as dra-gpu-example.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: dra-gpu-example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: dra-gpu-example
      template:
        metadata:
          labels:
            app: dra-gpu-example
        spec:
          containers:
          - name: ctr
            image: ubuntu:22.04
            command: ["bash", "-c"]
            args: ["echo $(nvidia-smi -L || echo Waiting...); sleep infinity"]
            resources:
              claims:
              - name: single-gpu
          resourceClaims:
          - name: single-gpu
            resourceClaimTemplateName: gpu-claim-template
          tolerations:
          - key: "nvidia.com/gpu"
            operator: "Exists"
            effect: "NoSchedule"
    
  4. Deploy the workload:

    kubectl create -f dra-gpu-example.yaml
    
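
The ResourceClaimTemplate in the preceding steps requests exactly one GPU for each Pod. To request a different number of devices for each Pod, you can adjust the allocation fields in the template. The following sketch shows only the devices portion of the spec and assumes that your nodes have enough attached GPUs to satisfy the request:

    # Illustrative variation of the preceding ResourceClaimTemplate spec.
    devices:
      requests:
      - name: multi-gpu
        exactly:
          deviceClassName: gpu.nvidia.com
          allocationMode: ExactCount
          count: 2  # Request exactly two GPUs for each Pod.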

Verify the hardware allocation

You can verify that your workloads have been allocated hardware by checking the ResourceClaim or by looking at the logs for your Pod. To verify the allocation, follow these steps:

  1. Get the ResourceClaim associated with the workload that you deployed:

    kubectl get resourceclaims
    

    The output is similar to the following:

    NAME                                               STATE                AGE
    dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh   allocated,reserved   9s
    
  2. Get more details about the hardware assigned to the Pod:

    kubectl describe resourceclaims RESOURCECLAIM
    

    Replace RESOURCECLAIM with the full name of the ResourceClaim that you got from the output of the previous step.

    The output is similar to the following:

Name:         dra-gpu-example-64b75dc6b-x8bd6-single-gpu-jwwdh
    Namespace:    default
    Labels:       <none>
    Annotations:  resource.kubernetes.io/pod-claim-name: single-gpu
    API Version:  resource.k8s.io/v1
    Kind:         ResourceClaim
    Metadata:
    # Multiple lines are omitted here.
    Spec:
      Devices:
        Requests:
          Exactly:
            Allocation Mode:    ExactCount
            Count:              1
            Device Class Name:  gpu.nvidia.com
          Name:                 single-gpu
    Status:
      Allocation:
        Devices:
          Results:
            Device:   gpu-0
            Driver:   gpu.nvidia.com
            Pool:     gke-cluster-1-dra-gpu-pool-b56c4961-7vnm
            Request:  single-gpu
        Node Selector:
          Node Selector Terms:
            Match Fields:
              Key:       metadata.name
              Operator:  In
              Values:
                gke-cluster-1-dra-gpu-pool-b56c4961-7vnm
      Reserved For:
Name:      dra-gpu-example-64b75dc6b-x8bd6
        Resource:  pods
        UID:       e16c2813-08ef-411b-8d92-a72f27ebf5ef
    Events:        <none>
    
  3. Get logs for the workload that you deployed:

    kubectl logs deployment/dra-gpu-example --all-pods=true
    

    The output is similar to the following:

    [pod/dra-gpu-example-64b75dc6b-x8bd6/ctr] GPU 0: Tesla T4 (UUID: GPU-2087ac7a-f781-8cd7-eb6b-b00943cc13ef)
    

    This output indicates that GKE allocated one GPU to the container.

What's next