Blue-Green Deployments on AWS EKS: Reliable Zero-Downtime Release Pattern
Production rollouts routinely put availability at risk. In user-facing SaaS environments, even sub-minute interruptions during a deployment are unacceptable. The Kubernetes rolling update strategy (Deployment.spec.strategy.type: RollingUpdate) sometimes falters: liveness probe issues, delayed startup, or cache invalidation can all cause user-facing hiccups. In regulated industries or fintech, this is a non-starter.
Blue-green deployments eliminate the problem by decoupling release from in-place modification. With AWS EKS as the foundation, this pattern is both clean and infrastructure-agnostic. Here’s a practical implementation, including caveats the official docs rarely mention.
Blue-Green Model (Kubernetes, EKS)
Two parallel environments are mandatory:
- Blue: Current production version (“known good”).
- Green: Candidate version (new release).
Routing is handled exclusively at the Service (or Ingress) layer. There is no in-place pod replacement, and cutover is controlled atomically, which is critical for distributed or stateful workloads.
Visual:

        [ Service ]
             |
  -----------+-----------
  |                     |
[Blue pods]        [Green pods]
(v1, prod)         (v2, candidate)
Switching traffic is as simple as updating a label selector.
Prerequisites
- AWS account with EKS 1.27+ cluster (EKS control plane and worker nodes operational)
- kubectl v1.27+ configured (aws eks update-kubeconfig --name <cluster>)
- IAM permissions for deployments, services, and optional Route53 changes
- Familiarity with Kubernetes Deployment/Service YAML
Note: If using AWS-provisioned load balancers, account for Service-level delays caused by subnet or security group misconfiguration. Initial load balancer provisioning can take up to 3 minutes.
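A quick pre-flight check, assuming a cluster named my-cluster in us-west-2 (substitute your own):

aws eks update-kubeconfig --name my-cluster --region us-west-2
# All nodes should report Ready before proceeding
kubectl get nodes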
Step 1: Dual Deployments
Suppose the current release is v1. To stage a v2 upgrade:
# myapp-blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      track: blue
  template:
    metadata:
      labels:
        app: myapp
        track: blue
    spec:
      containers:
        - name: myapp
          image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/myapp:v1
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
          ports:
            - containerPort: 8080
# myapp-green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      track: green
  template:
    metadata:
      labels:
        app: myapp
        track: green
    spec:
      containers:
        - name: myapp
          image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/myapp:v2
          env:
            - name: FEATURE_FLAG
              value: "true"
          ports:
            - containerPort: 8080
kubectl apply -f myapp-blue-deployment.yaml
kubectl apply -f myapp-green-deployment.yaml
The Deployments now exist side by side, each carrying a distinct track label (track: blue or track: green).
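To confirm both tracks are up and carry the expected labels (-L adds a column per label key):

kubectl get pods -l app=myapp -L track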
Step 2: Service Routing (Atomic Switch)
A single Service object routes traffic:
apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 8080
  selector:
    app: myapp
    track: blue
Initial routing is to blue.
Wait for the ELB to provision (watch with kubectl get svc myapp-svc -w). Expect the external load balancer to occasionally hang or return 503s until its health checks stabilize.
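While waiting, it is worth confirming that only blue pods sit behind the Service:

# Endpoint list should contain only the blue pods' addresses at this point
kubectl get endpoints myapp-svc -o wide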
Testing green:
Port-forward to the green pods for validation:
kubectl port-forward deployment/myapp-green 9000:8080
or expose a temporary Service if ingress testing is required.
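A quick smoke test through the port-forward (the /healthz path is an assumption; substitute your app's health endpoint):

curl -i http://localhost:9000/healthz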
Step 3: Cutover
Switch all traffic with one patch:
kubectl patch svc myapp-svc -p '{"spec": {"selector": {"app": "myapp", "track": "green"}}}'
Effect is nearly instantaneous; all requests to the load balancer now route to v2. Observe via pod logs, e.g.:
kubectl logs deployment/myapp-green -c myapp | grep "Connection from"
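To double-check that the switch landed, read the live selector back from the Service:

# Expect the selector to show track: green
kubectl get svc myapp-svc -o jsonpath='{.spec.selector}'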
Note: Some ALBs cache target health for 20–30 seconds. In high-traffic scenarios, a brief mixing of old/new may occur as connections drain naturally. No perfect solution; for ultra-low-latency cutover, move traffic at the DNS or ALB Target Group level instead.
Step 4: Validate and Roll Back (If Necessary)
Monitor SLOs. If the green deployment degrades, revert routing as before:
kubectl patch svc myapp-svc -p '{"spec": {"selector": {"app": "myapp", "track": "blue"}}}'
Pods never die; recovery is sub-second.
Cleanup:
kubectl delete deployment myapp-blue
That said, keep blue running for a “hot standby” period first, especially after major schema changes.
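If outright deletion feels premature, one middle ground is to keep blue warm at reduced capacity during the standby window:

# Hot standby: shrink blue rather than removing it
kubectl scale deployment myapp-blue --replicas=1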
Advanced: DNS-Based Switchover
If latency requirements or third-party dependencies dictate, consider moving the cutover to Route53:
- Assign distinct ELBs to myapp-blue and myapp-green Service objects.
- Use weighted or failover Route53 records for traffic management.
- TTLs < 60s work, but actual client DNS caching may introduce lag.
This approach increases infra overhead, but decouples routing entirely from Kubernetes service logic.
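As a sketch of the weighted variant (hosted zone ID, record name, and ELB hostname are placeholders), shifting 10% of traffic to green could look like:

aws route53 change-resource-record-sets \
  --hosted-zone-id Z123EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "CNAME",
        "SetIdentifier": "green",
        "Weight": 10,
        "TTL": 60,
        "ResourceRecords": [{"Value": "green-elb.us-west-2.elb.amazonaws.com"}]
      }
    }]
  }'

A matching record with SetIdentifier "blue" and Weight 90 completes the split; repeat the UPSERT with new weights to rebalance.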
Automation and CI/CD
Automate the process in CodePipeline, Argo CD, or Jenkins:
- Template manifests via Helm; include deterministic selectors (track: blue|green).
- Run automated canary or smoke tests on the green environment before cutover.
- Integrate manual or automated approval stages before switching Service selectors.
- Always log the cutover event and the result (auditability matters for regulated pipelines).
Non-obvious tip: Set up Slack or SNS notifications triggered by cutover events, especially in high-frequency deploy shops.
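As one minimal sketch of such a stage (the script name, /healthz path, and log destination are assumptions to adapt):

#!/usr/bin/env bash
# cutover.sh: smoke-test the target track, then flip the Service selector atomically.
set -euo pipefail

TARGET=${1:?usage: cutover.sh blue|green}

# Throwaway port-forward for the smoke test (assumes an HTTP /healthz endpoint)
kubectl port-forward "deployment/myapp-${TARGET}" 9000:8080 &
PF_PID=$!
trap 'kill "${PF_PID}" 2>/dev/null || true' EXIT
sleep 3
curl -fsS "http://localhost:9000/healthz"

# Atomic cutover, then record the event for the audit trail
kubectl patch svc myapp-svc \
  -p "{\"spec\":{\"selector\":{\"app\":\"myapp\",\"track\":\"${TARGET}\"}}}"
echo "$(date -u +%FT%TZ) cutover to ${TARGET}" >> cutover.log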
Known Pitfalls
- Watch for lingering zombie pods when scaling down; ensure readiness/liveness probes are strictly enforced on both blue and green (see the probe sketch after this list).
- Cloud load balancers have propagation lag—don’t attempt to guarantee a hard 0s switch unless using direct client-side feature flags or DNS SRV records.
- If using stateful or session-aware applications, session stickiness (especially in AWS Classic ELB) can send returning users to old pods. ALB with cookie-based stickiness can mitigate.
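The Step 1 manifests omit probes for brevity. A minimal readiness probe sketch to add under each container spec, assuming the app serves an HTTP health endpoint at /healthz:

readinessProbe:
  httpGet:
    path: /healthz   # assumption: adjust to your app's health endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5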
Summary table:
| Step | Command/Config | Outcome |
|---|---|---|
| Blue up | kubectl apply -f myapp-blue-deployment.yaml | v1 pods running |
| Green up | kubectl apply -f myapp-green-deployment.yaml | v2 pods running (idle) |
| Route | Service selector track: blue | All traffic to blue |
| Test | Port-forward or temp Service to green | Green validated, not public |
| Switch | kubectl patch svc ... "track": "green" | Traffic moves to green |
| Rollback | kubectl patch svc ... "track": "blue" | Instantly revert to blue |
| Cleanup | kubectl delete deployment myapp-blue | Only green active |
Note: No deployment strategy is truly “one size fits all.” Evaluate rollout mechanics versus team supportability and operational complexity. Blue-green on EKS is robust—but not magic.
Got a stateful workload or critical 24/7 API? Consider layering feature flags, or hybrid blue-green/canary with traffic splitting (e.g. via Istio's VirtualService) for even finer granularity.
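For reference, a minimal sketch of such a split with Istio (assumes Istio is installed with sidecar injection enabled; names mirror the examples above):

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp-svc
  subsets:
    - name: blue
      labels:
        track: blue
    - name: green
      labels:
        track: green
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp-svc
  http:
    - route:
        - destination:
            host: myapp-svc
            subset: blue
          weight: 90
        - destination:
            host: myapp-svc
            subset: green
          weight: 10

Note that for subset routing the Service selector should drop the track key so both Deployments back the same Service; Istio then chooses the subset per request.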