Idle Giants

#gpu #kubernetes #devops #scheduling #machine-learning

Modern machine learning loves GPUs. But when you're running your workloads on Kubernetes, those pricey cards can either speed things up—or quietly drain your budget.

This guide breaks down the most common ways GPU scheduling goes wrong on Kubernetes. And it shows how to fix them. If you're dealing with slow jobs and sky-high bills, you're not the only one.


Where Things Go Sideways: 2 Real-World Examples

Case 1: DeepDive AI’s Costly Idle Time

DeepDive AI, a fast-growing startup, moved their ML training to Kubernetes using NVIDIA T4 GPUs. They expected things to get faster. They didn’t.

They ran five GPUs per node. But on average, each GPU sat idle 35% of the time. That’s 56 hours a week doing nothing. By the end of the month? They’d wasted $1,500 on unused GPU time.

What happened?

  • Their pods didn’t ask for GPU resources correctly.
  • Kubernetes didn’t know how to pack them efficiently.
  • The default scheduler just... shrugged.

Case 2: TechCorp’s Overkill Setup

TechCorp Innovations built a GPU-backed microservices architecture. Eighteen services shared one GPU pool. Each one asked for a slice—whether they needed it or not.

The result?

  • Services held on to GPUs they barely touched.
  • Others had to wait in line.
  • The autoscaler? Totally blind to GPU usage.

Monthly GPU waste: $10,000+. Ouch.


Smarter GPU Scheduling: What Actually Works

By default, Kubernetes treats GPUs like a black box. So if you want real efficiency, you need to teach it how to handle them. Here’s how.

1. Be Specific With GPU Requests

Always set both requests and limits for GPUs. That tells Kubernetes exactly what your pods need—and prevents it from guessing wrong.

resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1

Don’t mix and match values here. GPUs are an extended resource, so Kubernetes won’t let you overcommit them: if you set both a request and a limit they have to be equal, and if you only set the limit, the request defaults to match it.

2. Install the NVIDIA Device Plugin

Without it, Kubernetes won’t even know your GPUs exist.

Grab the NVIDIA device plugin and install it. For production, the NVIDIA GPU Operator (via Helm) gives you extras like monitoring, drivers, and lifecycle management.
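
Once the plugin (or operator) is up, a quick smoke test confirms that your nodes are actually advertising nvidia.com/gpu. This is a minimal sketch: the pod name and CUDA image tag are placeholders, and any image that can run nvidia-smi will do.

# Minimal GPU smoke-test pod (name and image tag are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda-check
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # any CUDA base image works
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1

If this pod sits in Pending with an "Insufficient nvidia.com/gpu" event, the plugin isn’t exposing your GPUs yet, and nothing else in this post will help until that’s fixed.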

3. Watch Your GPUs Like a Hawk

Use tools like:

  • DCGM Exporter – collects GPU telemetry
  • Prometheus + Grafana – shows you what’s happening

Here’s a Terraform snippet to set up GPU metrics tracking:

resource "kubernetes_deployment" "gpu_metrics" {
  metadata {
    name      = "gpu-metrics"
    namespace = "monitoring"
  }

  spec {
    replicas = 1
    selector {
      match_labels = {
        app = "gpu-metrics"
      }
    }

    template {
      metadata {
        labels = {
          app = "gpu-metrics"
        }
      }

      spec {
        container {
          name  = "gpu-exporter"
          image = "nvidia/k8s-gpu-exporter:latest"

          ports {
            container_port = 9100
          }

          resources {
            requests {
              cpu    = "100m"
              memory = "128Mi"
            }
            limits {
              cpu    = "500m"
              memory = "512Mi"
            }
          }
        }
      }
    }
  }
}

Tip: Build dashboards that show you GPU usage over time, idle rates, and scheduling failures. It’ll pay off.
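
You can also turn the idle-rate piece of that dashboard into an alert. The rule below is a sketch: it assumes you run the Prometheus Operator (so the PrometheusRule CRD exists) and that the DCGM exporter’s DCGM_FI_DEV_GPU_UTIL metric is being scraped; the alert name, threshold, and durations are placeholders to tune.

# Alert on GPUs that have been sitting nearly idle (illustrative values)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-idle-alerts          # placeholder name
  namespace: monitoring
spec:
  groups:
    - name: gpu-utilization
      rules:
        - alert: GPUMostlyIdle
          # Fires when a GPU has averaged under 10% utilization for 2 hours.
          expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[2h]) < 10
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "GPU {{ $labels.gpu }} has been nearly idle for 2 hours"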

4. Fight Fragmentation With Bin Packing

GPU fragmentation = wasted money.

To avoid it:

  • Use pod affinity/anti-affinity rules to group similar jobs.
  • Run short jobs back-to-back on the same nodes.
  • Add taints and tolerations to keep critical workloads from being pushed aside (see the sketch below).
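
Here’s what that can look like in practice. This is a minimal sketch: the node label, taint key, workload label, and image are all made up, so adapt them to however your nodes are already labeled and tainted.

# Pack training pods onto dedicated GPU nodes.
# The "dedicated=gpu" taint and "gpu-pool" label are hypothetical examples.
apiVersion: v1
kind: Pod
metadata:
  name: trainer
  labels:
    workload: gpu-training
spec:
  nodeSelector:
    node-pool: gpu-pool
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                workload: gpu-training
            topologyKey: kubernetes.io/hostname
  containers:
    - name: train
      image: my-registry/trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1

The taint keeps non-GPU pods off the expensive nodes, and the pod affinity nudges similar jobs onto hosts that already have GPU work, which is what keeps per-node utilization high.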

5. Use a Smarter Scheduler

Kubernetes' default scheduler is too generic for serious GPU workloads.

Try tools like:

  • Volcano, Kube-batch, or KubeSlice – good for batch and ML jobs (see the example below)
  • Karpenter – an autoscaler that supports GPU-aware decisions
  • GPUScheduler – an NVIDIA experiment that knows about GPU topology
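
Opting a workload into one of these is usually just a schedulerName away. A minimal sketch, assuming Volcano is installed under its default scheduler name; the pod name and image are placeholders:

# Ask Volcano (rather than the default scheduler) to place this pod.
apiVersion: v1
kind: Pod
metadata:
  name: batch-trainer            # placeholder name
spec:
  schedulerName: volcano         # Volcano registers under this name by default
  containers:
    - name: train
      image: my-registry/trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1

For gang-scheduling multi-pod training jobs, Volcano also has its own Job and PodGroup resources; the per-pod switch above is just the simplest entry point.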

Tool Stack at a Glance

Tool                      What It Does
Kubernetes                Manages your containers
NVIDIA GPU Operator       Handles drivers + monitoring
DCGM Exporter             Sends GPU stats to Prometheus
Prometheus + Grafana      Shows graphs, sets alerts
Helm                      Makes deployments repeatable and clean

Final Thoughts

GPUs are powerful—but expensive. If you’re using Kubernetes, you can’t afford to let them sit idle.

Don’t just trust the defaults. Start monitoring your usage. Set resource requests properly. Use smarter scheduling strategies. And most of all—treat GPU efficiency as a first-class problem.

Because every idle GPU hour? That’s money out the door.