Modern machine learning loves GPUs. But when you're running your workloads on Kubernetes, those pricey cards can either speed things up—or quietly drain your budget.
This guide breaks down the most common ways GPU scheduling goes wrong on Kubernetes. And it shows how to fix them. If you're dealing with slow jobs and sky-high bills, you're not the only one.
Where Things Go Sideways: 2 Real-World Examples
Case 1: DeepDive AI’s Costly Idle Time
DeepDive AI, a fast-growing startup, moved their ML training to Kubernetes using NVIDIA T4 GPUs. They expected things to get faster. They didn’t.
They ran five GPUs per node. But on average, each GPU sat idle 35% of the time. That’s 56 hours a week doing nothing. By the end of the month? They’d wasted $1,500 on unused GPU time.
What happened?
- Their pods didn’t ask for GPU resources correctly.
- Kubernetes didn’t know how to pack them efficiently.
- The default scheduler just... shrugged.
Case 2: TechCorp’s Overkill Setup
TechCorp Innovations built a GPU-backed microservices architecture. Eighteen services shared one GPU pool. Each one asked for a slice—whether they needed it or not.
The result?
- Services held on to GPUs they barely touched.
- Others had to wait in line.
- The autoscaler? Totally blind to GPU usage.
Monthly GPU waste: $10,000+. Ouch.
Smarter GPU Scheduling: What Actually Works
By default, Kubernetes treats GPUs like a black box. So if you want real efficiency, you need to teach it how to handle them. Here’s how.
1. Be Specific With GPU Requests
Always set both requests and limits for GPUs. That tells Kubernetes exactly what your pods need and keeps it from guessing wrong.
```yaml
resources:
  requests:
    nvidia.com/gpu: 1
  limits:
    nvidia.com/gpu: 1
```
Don’t mix and match values here. For extended resources like nvidia.com/gpu, Kubernetes requires the request and the limit to match—you can’t overcommit GPUs the way you can CPU or memory, and a mismatched pair gets rejected outright.
2. Install the NVIDIA Device Plugin
Without it, Kubernetes won’t even know your GPUs exist.
Grab the NVIDIA device plugin and install it. For production, the NVIDIA GPU Operator (via Helm) gives you extras like monitoring, drivers, and lifecycle management.
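If you go the Operator route, most of what you need can be toggled from a small Helm values file. The sketch below is illustrative rather than definitive: the key names (driver, toolkit, devicePlugin, dcgmExporter) come from the GPU Operator chart, but verify them against the chart version you actually install.
```yaml
# values.yaml for the NVIDIA GPU Operator Helm chart (illustrative sketch;
# double-check key names against your chart version before relying on them).
driver:
  enabled: true          # let the operator install and manage the NVIDIA driver
toolkit:
  enabled: true          # install the NVIDIA container toolkit on GPU nodes
devicePlugin:
  enabled: true          # advertise nvidia.com/gpu to the scheduler
dcgmExporter:
  enabled: true          # export GPU telemetry for Prometheus (see next section)
```
Then install the chart from NVIDIA's Helm repo (helm.ngc.nvidia.com/nvidia) and point it at this values file.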
3. Watch Your GPUs Like a Hawk
Use tools like:
- DCGM Exporter – collects GPU telemetry
- Prometheus + Grafana – shows you what’s happening
Here’s a Terraform snippet to set up GPU metrics tracking:
resource "kubernetes_deployment" "gpu_metrics" {
metadata {
name = "gpu-metrics"
namespace = "monitoring"
}
spec {
replicas = 1
selector {
match_labels = {
app = "gpu-metrics"
}
}
template {
metadata {
labels = {
app = "gpu-metrics"
}
}
spec {
container {
name = "gpu-exporter"
image = "nvidia/k8s-gpu-exporter:latest"
ports {
container_port = 9100
}
resources {
requests {
cpu = "100m"
memory = "128Mi"
}
limits {
cpu = "500m"
memory = "512Mi"
}
}
}
}
}
}
}
Tip: Build dashboards that show you GPU usage over time, idle rates, and scheduling failures. It’ll pay off.
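Once DCGM metrics are flowing into Prometheus, you can also turn the idle-rate tip into an alert instead of a dashboard you have to remember to check. This is a minimal sketch assuming the Prometheus Operator's PrometheusRule CRD and the DCGM exporter's DCGM_FI_DEV_GPU_UTIL gauge; the label names in the annotation vary by exporter version, so treat them as placeholders.
```yaml
# Minimal idle-GPU alert sketch. Assumes the Prometheus Operator (PrometheusRule
# CRD) and DCGM exporter metrics; adjust metric/label names to your setup.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-idle-alerts
  namespace: monitoring
spec:
  groups:
    - name: gpu-utilization
      rules:
        - alert: GPUIdleTooLong
          # Fires when a GPU has averaged under 5% utilization over the last 30 minutes.
          expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[30m]) < 5
          labels:
            severity: warning
          annotations:
            summary: "GPU {{ $labels.gpu }} on {{ $labels.Hostname }} has been mostly idle for 30m"
```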
4. Fight Fragmentation With Bin Packing
GPU fragmentation = wasted money.
To avoid it:
- Use pod affinity/anti-affinity rules to group similar jobs.
- Run short jobs back-to-back on the same nodes.
- Add taints and tolerations to keep critical workloads from being pushed aside (a combined sketch of these ideas follows the list).
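Here's what that looks like in practice, as a single hedged pod spec. The taint key (gpu-workload), the node label (accelerator: nvidia-t4), and the image are placeholders for illustration; the structure is the point.
```yaml
# Sketch combining affinity, tolerations, and explicit GPU requests.
# The taint key, node label, and image are placeholders; adapt to your cluster.
apiVersion: v1
kind: Pod
metadata:
  name: training-job
  labels:
    workload-type: gpu-training
spec:
  tolerations:
    - key: "gpu-workload"               # matches a taint you place on GPU nodes
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: accelerator        # assumed node label on your GPU nodes
                operator: In
                values: ["nvidia-t4"]
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                workload-type: gpu-training   # pack similar jobs onto the same node
            topologyKey: kubernetes.io/hostname
  containers:
    - name: trainer
      image: my-registry/trainer:latest       # placeholder image
      resources:
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
```
The podAffinity term is what does the bin packing: it nudges new GPU jobs toward nodes that already run similar jobs instead of spreading them thin across the cluster.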
5. Use a Smarter Scheduler
Kubernetes' default scheduler is too generic for serious GPU workloads.
Try tools like:
- Volcano, Kube-batch, or KubeSlice – good for batch and ML jobs (see the pod example after this list)
- Karpenter – an autoscaler that supports GPU-aware decisions
- GPUScheduler – an NVIDIA experiment that knows about GPU topology
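As the smallest possible taste of what switching schedulers looks like, the pod below opts into Volcano by name. This assumes Volcano is installed in the cluster; for real batch workloads you'd typically use Volcano's own Job resource, which layers gang scheduling and queueing on top.
```yaml
# Minimal sketch: hand this pod to the Volcano scheduler instead of the default.
# Assumes Volcano is installed; otherwise the pod will sit in Pending.
apiVersion: v1
kind: Pod
metadata:
  name: batch-training
spec:
  schedulerName: volcano                # any pod can pick its scheduler by name
  restartPolicy: Never
  containers:
    - name: trainer
      image: my-registry/trainer:latest # placeholder image
      resources:
        requests:
          nvidia.com/gpu: 1
        limits:
          nvidia.com/gpu: 1
```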
Tool Stack at a Glance
| Tool | What It Does |
|---|---|
| Kubernetes | Manages your containers |
| NVIDIA GPU Operator | Handles drivers + monitoring |
| DCGM Exporter | Sends GPU stats to Prometheus |
| Prometheus + Grafana | Shows graphs, sets alerts |
| Helm | Makes deployments repeatable and clean |
Final Thoughts
GPUs are powerful—but expensive. If you’re using Kubernetes, you can’t afford to let them sit idle.
Don’t just trust the defaults. Start monitoring your usage. Set resource requests properly. Use smarter scheduling strategies. And most of all—treat GPU efficiency as a first-class problem.
Because every idle GPU hour? That’s money out the door.