Idle Giants

#gpu #kubernetes #devops #scheduling #machine-learning

Modern machine learning loves GPUs. But when you're running your workloads on Kubernetes, those pricey cards can either speed things up—or quietly drain your budget.

This guide breaks down the most common ways GPU scheduling goes wrong on Kubernetes. And it shows how to fix them. If you're dealing with slow jobs and sky-high bills, you're not the only one.


Where Things Go Sideways: 2 Real-World Examples

Case 1: DeepDive AI’s Costly Idle Time

DeepDive AI, a fast-growing startup, moved their ML training to Kubernetes using NVIDIA T4 GPUs. They expected things to get faster. They didn’t.

They ran five GPUs per node. But on average, each GPU sat idle 35% of the time. That’s 56 hours a week doing nothing. By the end of the month? They’d wasted $1,500 on unused GPU time.

What happened?

  • Their pods didn’t ask for GPU resources correctly.
  • Kubernetes didn’t know how to pack them efficiently.
  • The default scheduler just... shrugged.

Case 2: TechCorp’s Overkill Setup

TechCorp Innovations built a GPU-backed microservices architecture. Eighteen services shared one GPU pool. Each one asked for a slice—whether they needed it or not.

The result?

  • Services held on to GPUs they barely touched.
  • Others had to wait in line.
  • The autoscaler? Totally blind to GPU usage.

Monthly GPU waste: $10,000+. Ouch.


Smarter GPU Scheduling: What Actually Works

By default, Kubernetes treats GPUs like a black box. So if you want real efficiency, you need to teach it how to handle them. Here’s how.

1. Be Specific With GPU Requests

Always set both requests and limits for GPUs. That tells Kubernetes exactly what your pods need—and prevents it from guessing wrong.

resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1

Don’t mix and match values here. GPUs are an extended resource, so Kubernetes won’t let you overcommit them: if you set both a request and a limit they have to be equal, and if you only set the limit, the request defaults to match it.

2. Install the NVIDIA Device Plugin

Without it, Kubernetes won’t even know your GPUs exist.

Grab the NVIDIA device plugin and install it. For production, the NVIDIA GPU Operator (via Helm) gives you extras like monitoring, drivers, and lifecycle management.
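
Once the plugin (or operator) is up, a quick smoke test confirms that your nodes are actually advertising nvidia.com/gpu. This is a minimal sketch: the pod name and CUDA image tag are placeholders, and any image that can run nvidia-smi will do.

# Minimal GPU smoke-test pod (name and image tag are illustrative)
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda-check
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # any CUDA base image works
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1

If this pod sits in Pending with an "Insufficient nvidia.com/gpu" event, the plugin isn’t exposing your GPUs yet, and nothing else in this post will help until that’s fixed.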

3. Watch Your GPUs Like a Hawk

Use tools like:

  • DCGM Exporter – collects GPU telemetry
  • Prometheus + Grafana – shows you what’s happening

Here’s a Terraform snippet to set up GPU metrics tracking:

resource "kubernetes_deployment" "gpu_metrics" {
  metadata {
    name      = "gpu-metrics"
    namespace = "monitoring"
  }

  spec {
    replicas = 1
    selector {
      match_labels = {
        app = "gpu-metrics"
      }
    }

    template {
      metadata {
        labels = {
          app = "gpu-metrics"
        }
      }

      spec {
        container {
          name  = "gpu-exporter"
          image = "nvidia/k8s-gpu-exporter:latest"

          ports {
            container_port = 9100
          }

          resources {
            requests {
              cpu    = "100m"
              memory = "128Mi"
            }
            limits {
              cpu    = "500m"
              memory = "512Mi"
            }
          }
        }
      }
    }
  }
}

Tip: Build dashboards that show you GPU usage over time, idle rates, and scheduling failures. It’ll pay off.
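
You can also turn the idle-rate piece of that dashboard into an alert. The rule below is a sketch: it assumes you run the Prometheus Operator (so the PrometheusRule CRD exists) and that the DCGM exporter’s DCGM_FI_DEV_GPU_UTIL metric is being scraped; the alert name, threshold, and durations are placeholders to tune.

# Alert on GPUs that have been sitting nearly idle (illustrative values)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-idle-alerts          # placeholder name
  namespace: monitoring
spec:
  groups:
    - name: gpu-utilization
      rules:
        - alert: GPUMostlyIdle
          # Fires when a GPU has averaged under 10% utilization for 2 hours.
          expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[2h]) < 10
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "GPU {{ $labels.gpu }} has been nearly idle for 2 hours"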

4. Fight Fragmentation With Bin Packing

GPU fragmentation = wasted money.

To avoid it:

  • Use pod affinity/anti-affinity rules to group similar jobs.
  • Run short jobs back-to-back on the same nodes.
  • Add taints and tolerations to keep critical workloads from being pushed aside (see the sketch below).
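
Here’s what that can look like in practice. This is a minimal sketch: the node label, taint key, workload label, and image are all made up, so adapt them to however your nodes are already labeled and tainted.

# Pack training pods onto dedicated GPU nodes.
# The "dedicated=gpu" taint and "gpu-pool" label are hypothetical examples.
apiVersion: v1
kind: Pod
metadata:
  name: trainer
  labels:
    workload: gpu-training
spec:
  nodeSelector:
    node-pool: gpu-pool
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                workload: gpu-training
            topologyKey: kubernetes.io/hostname
  containers:
    - name: train
      image: my-registry/trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1

The taint keeps non-GPU pods off the expensive nodes, and the pod affinity nudges similar jobs onto hosts that already have GPU work, which is what keeps per-node utilization high.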

5. Use a Smarter Scheduler

Kubernetes' default scheduler is too generic for serious GPU workloads.

Try tools like:

  • Volcano, Kube-batch, or KubeSlice – good for batch and ML jobs (see the example below)
  • Karpenter – an autoscaler that supports GPU-aware decisions
  • GPUScheduler – an NVIDIA experiment that knows about GPU topology
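
Opting a workload into one of these is usually just a schedulerName away. A minimal sketch, assuming Volcano is installed under its default scheduler name; the pod name and image are placeholders:

# Ask Volcano (rather than the default scheduler) to place this pod.
apiVersion: v1
kind: Pod
metadata:
  name: batch-trainer            # placeholder name
spec:
  schedulerName: volcano         # Volcano registers under this name by default
  containers:
    - name: train
      image: my-registry/trainer:latest   # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1

For gang-scheduling multi-pod training jobs, Volcano also has its own Job and PodGroup resources; the per-pod switch above is just the simplest entry point.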

Tool Stack at a Glance

Tool                      What It Does
Kubernetes                Manages your containers
NVIDIA GPU Operator       Handles drivers + monitoring
DCGM Exporter             Sends GPU stats to Prometheus
Prometheus + Grafana      Shows graphs, sets alerts
Helm                      Makes deployments repeatable and clean

Final Thoughts

GPUs are powerful—but expensive. If you’re using Kubernetes, you can’t afford to let them sit idle.

Don’t just trust the defaults. Start monitoring your usage. Set resource requests properly. Use smarter scheduling strategies. And most of all—treat GPU efficiency as a first-class problem.

Because every idle GPU hour? That’s money out the door.