Invisible Failures

#devops#infrastructure#terraform#drift detection

The Illusion of Stability

Everything looks fine. You run terraform plan—clean. No changes. No red flags.

But that sense of control? It might be a lie.

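By default, terraform plan refreshes the resources recorded in its state file and compares them with your config files. It has nothing to say about resources it doesn’t manage, attributes excluded with ignore_changes, or anything that changed since the last time someone actually ran it.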

That gap? That’s where infrastructure drift sneaks in.

Small tweaks in the cloud console. One-off fixes during an outage. Scripts that live outside Terraform’s world. Over time, your live environment drifts away from what your code says it should be.

And Terraform? It stays quiet. Because as far as it knows, everything is just fine.

But it’s not. And the impact can be serious.


What Drift Looks Like in Real Life

Let’s talk about what happens when drift slips through the cracks.

Case 1: MoneyMakers Inc. — A Breach Hiding in Plain Sight

A fintech team thought their AWS setup was locked down tight. Terraform handled it all—VPCs, subnets, security groups.

Then a junior engineer made a tiny change in the AWS Console. They opened up a port to test something and forgot to close it. Terraform didn’t catch it, because the code hadn’t changed.

Weeks went by. Then, one Monday morning, alert fatigue gave way to real alarms. Someone had found that open port. Personal data from 10,000 users was exposed.

What it cost them:

  • $1M in cleanup and legal fees
  • Emergency drift detection tooling (too late)
  • Damaged trust inside and outside the company

Case 2: RetailRevamp LLC — Performance Problems That Weren’t in Code

This team used Terraform and Kubernetes to manage microservices. But one late night, a developer changed autoscaling settings manually in the K8s dashboard.

It worked—for a while.

Then a major campaign hit. Traffic spiked. The infrastructure tried to scale, but Terraform’s expectations clashed with reality. Nodes spun up and shut down unpredictably.

The fallout:

  • $300K in lost sales over two days
  • Engineers stuck in damage control
  • Angry customers. A competitive market. Not great timing.

Why terraform plan Isn’t Telling the Whole Story

Here’s the deal.

When you run terraform plan, you’re asking, “If I apply this code, what changes?”

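To answer that, Terraform refreshes the resources recorded in its state file and compares them with your code. It never looks at anything outside that list.

If someone creates a resource by hand, changes an attribute your config ignores, or a third-party tool updates settings between runs, Terraform doesn’t know. It finds out only when:

  • Someone runs a plan with refresh enabled (the default), or terraform plan -refresh-only, which replaces the deprecated terraform refresh
  • And the change touches a resource and attribute Terraform tracks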

Otherwise? Silence.

It’s like driving with a fogged-up mirror and hoping the road behind you is still clear.
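One blind spot is easy to check by hand: resources that exist in the account but were never in Terraform state, which no plan will ever mention. The sketch below is a plain set difference in POSIX shell. The function name and the one-ID-per-line file convention are assumptions made for illustration, not part of Terraform or the AWS CLI.

```shell
# unmanaged_ids MANAGED_FILE LIVE_FILE
# Print IDs that appear in LIVE_FILE (what the cloud account reports) but not
# in MANAGED_FILE (what Terraform state contains). One resource ID per line.
unmanaged_ids() {
  managed_sorted=$(mktemp) || return 1
  sort -u "$1" > "$managed_sorted"
  # comm -13 suppresses lines unique to file 1 and lines common to both,
  # leaving only lines unique to file 2 (read here from stdin).
  sort -u "$2" | comm -13 "$managed_sorted" -
  rm -f "$managed_sorted"
}
```

How the two files get produced depends on the resource type. For security groups, the live side could come from aws ec2 describe-security-groups --query 'SecurityGroups[*].GroupId' --output text (split onto one ID per line), and the managed side from the IDs in terraform show -json output. Expect some legitimate entries, such as default security groups, and treat the result as a list to review rather than a verdict.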


Build a Workflow That Fights Drift

Drift detection shouldn’t be a post-mortem. It needs to be part of your day-to-day.

Here’s how to stay ahead of it.

1. Automate Your Drift Checks

Make drift detection part of your CI/CD pipeline:

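  • Terraform Refresh + Diff: Schedule terraform plan -refresh-only runs (the replacement for the deprecated terraform refresh) and alert when they report changes
  • AWS Config: Records configuration changes to supported resources across your AWS environment
  • Third-Party Tools:
    • Bridgecrew, Checkov: Scan your Terraform code for misconfigurations before it ships. Checkov on its own is static analysis and does not read live cloud resources, so pair it with a check that does
    • Infracost, CloudQuery: Infracost estimates cost from your Terraform code; CloudQuery inventories live cloud resources so you can query them for drift and security gaps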
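A scheduled pipeline job can turn a Terraform plan into a pass/fail drift signal. The sketch below relies on the documented -detailed-exitcode flag of terraform plan (0 means an empty diff, 1 means the plan failed, 2 means a non-empty diff). The function names and the working-directory argument are hypothetical, and it assumes the job already has credentials and an initialized backend.

```shell
# classify_plan_exit CODE
# Translate a `terraform plan -detailed-exitcode` exit status into a label
# that a pipeline step can act on.
classify_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;  # plan succeeded, nothing to change
    2) echo "drift" ;;    # plan succeeded, live resources differ from code
    *) echo "error" ;;    # the plan itself failed
  esac
}

# check_drift DIR
# Run a read-only plan in DIR and print one of: in-sync, drift, error.
# -lock=false avoids blocking real deployments; -input=false keeps the job
# from waiting on a prompt.
check_drift() {
  code=0
  terraform -chdir="$1" plan -detailed-exitcode -input=false -lock=false \
    >/dev/null 2>&1 || code=$?
  classify_plan_exit "$code"
}
```

A job would call check_drift with the path to a root module and fail or notify when the output is anything other than in-sync. A non-empty diff also shows up when the code has changes that were never applied, so this works best against the branch that was last deployed.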

2. Use GitOps and Immutable Deployments

For Kubernetes workloads, tools like Argo CD or Flux continuously compare what’s running in the cluster with what’s in your repo.

If something drifts, they’ll:

  • Alert you
  • Or auto-fix it by redeploying the desired state

Either way, you stay in control.
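The alert-or-fix choice can be approximated by hand to see what those tools do on every loop. This is a rough sketch built on kubectl diff, which exits 0 when nothing differs, 1 when differences exist, and higher on errors. It is not how Argo CD or Flux are implemented, and the function names and the alert/heal mode argument are made up for illustration.

```shell
# cluster_drift MANIFEST_DIR
# Compare the manifests in MANIFEST_DIR with the live cluster and print one
# of: in-sync, drift, error.
cluster_drift() {
  code=0
  kubectl diff -f "$1" >/dev/null 2>&1 || code=$?
  case "$code" in
    0) echo "in-sync" ;;
    1) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# reconcile MANIFEST_DIR MODE
# MODE "alert" only reports the status. MODE "heal" re-applies the manifests
# when drift is found and reports "healed" if the apply succeeds.
reconcile() {
  status=$(cluster_drift "$1")
  if [ "$status" = "drift" ] && [ "$2" = "heal" ]; then
    kubectl apply -f "$1" >/dev/null && status="healed"
  fi
  echo "$status"
}
```

Running reconcile in alert mode from a scheduled job is the low-risk starting point; heal mode overwrites whatever was changed by hand, which is the point, but it is worth knowing who made the change first.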


Quick Checks You Can Start Using Today

Sometimes, small steps catch big issues. Here are a couple of practical checks.

✅ Double-Check Security Groups from the CLI

aws ec2 describe-security-groups --group-ids sg-12345678 --query 'SecurityGroups[*].IpPermissions[]'

This prints the inbound rules that are live on that group right now, no matter what Terraform thinks.
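The same command can sweep the whole account instead of one group. A minimal sketch, assuming the ip-permission.cidr filter of describe-security-groups and a hypothetical allowlist file with one group ID per line (for example, groups that front public load balancers):

```shell
# world_open_groups
# List the IDs of security groups that have at least one inbound rule open
# to 0.0.0.0/0, one ID per line.
world_open_groups() {
  aws ec2 describe-security-groups \
    --filters Name=ip-permission.cidr,Values=0.0.0.0/0 \
    --query 'SecurityGroups[*].GroupId' \
    --output text | tr '\t' '\n'
}

# unexpected_open_groups ALLOWLIST_FILE
# Print world-open groups that are not listed in ALLOWLIST_FILE.
# -x matches whole lines, -F treats IDs as fixed strings, -v inverts.
unexpected_open_groups() {
  world_open_groups | grep -v -x -F -f "$1"
}
```

Any ID this prints is a group that is open to the internet and that nobody has signed off on, which is exactly the kind of change described in Case 1. The filter only covers IPv4; IPv6 rules open to ::/0 would need a second pass.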

✅ Lock Down S3 Buckets in Code

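resource "aws_s3_bucket" "my_private_bucket" {
  bucket = "my-secure-bucket"
}

# Public access settings belong to their own resource, not to aws_s3_bucket.
# The inline acl argument is deprecated in AWS provider v4+, and new buckets
# are private by default, so it is omitted here.
resource "aws_s3_bucket_public_access_block" "my_private_bucket" {
  bucket = aws_s3_bucket.my_private_bucket.id

  block_public_acls       = true
  ignore_public_acls      = true
  block_public_policy     = true
  restrict_public_buckets = true
}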

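With these four settings in place, S3 rejects or ignores public ACLs and public bucket policies, even if someone adds one by hand. If someone edits the settings themselves, Terraform won’t undo it on its own: the next plan flags the change and the next apply restores it.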
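To confirm the live bucket still has those settings, they can be read back from S3 directly. A minimal sketch using aws s3api get-public-access-block; the function name is hypothetical, and the output is lowercased before comparing because the exact capitalization of booleans in text output is treated as an assumption here.

```shell
# bucket_fully_blocked BUCKET
# Exit 0 only if all four public access block flags are true on the live
# bucket, 1 if any flag is false, and 2 if the settings cannot be read
# (for example, when no public access block is configured at all).
bucket_fully_blocked() {
  flags=$(aws s3api get-public-access-block --bucket "$1" \
    --query 'PublicAccessBlockConfiguration.[BlockPublicAcls,IgnorePublicAcls,BlockPublicPolicy,RestrictPublicBuckets]' \
    --output text) || return 2
  # Unquoted expansion collapses tabs and newlines to single spaces.
  normalized=$(echo $flags | tr 'A-Z' 'a-z')
  [ "$normalized" = "true true true true" ]
}
```

Called as bucket_fully_blocked my-secure-bucket from a scheduled job, a non-zero exit is the cue to run a plan and find out who changed what.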


The Bottom Line

Drift isn’t rare. It’s inevitable.

Depending only on terraform plan is like assuming your house is safe just because the doors used to be locked.

Real-world infrastructure is messy. People make changes. Tools go rogue. Stuff breaks.

A solid drift detection strategy isn’t about paranoia—it’s about visibility. It gives you a shot at catching the silent problems before they turn into costly outages.

So don’t wait for the breach, the performance dip, or the angry phone call.

Start looking for drift before it finds you.