#Cloud #DevOps #Kubernetes #EKS #BlueGreenDeployment #AWS

Blue-Green Deployments on AWS EKS: Reliable Zero-Downtime Release Pattern

Production rollouts routinely put availability at risk. In user-facing SaaS environments, even sub-minute interruptions during a deployment are unacceptable. The Kubernetes rolling update strategy (Deployment.spec.strategy.type: RollingUpdate) sometimes falters—liveness probe issues, delayed startup, or cache invalidation can all cause user-facing hiccups. In regulated industries or fintech, this is a non-starter.

Blue-green deployments eliminate the problem by decoupling release from in-place modification. With AWS EKS as the foundation, this pattern is both clean and infrastructure-agnostic. Here’s a practical implementation, including caveats the official docs rarely mention.


Blue-Green Model (Kubernetes, EKS)

Two parallel environments are mandatory:

  • Blue: Current production version (“known good”).
  • Green: Candidate version (new release).

Routing is exclusively handled at the Service (or Ingress) layer. No in-place pod replacement. Cutover can be atomically controlled—critical in distributed or stateful workloads.

Visual:

          [ Service ]
               |
       +-------+-------+
       |               |
  [Blue pods]      [Green pods]
  (v1, prod)       (v2, candidate)

Switching traffic is as simple as updating a label selector.


Prerequisites

  • AWS account with EKS 1.27+ cluster (EKS control plane and worker nodes operational)
  • kubectl v1.27+ configured (aws eks update-kubeconfig --name <cluster>)
  • IAM permissions for deployments, services, and optional Route53 changes
  • Familiarity with Kubernetes Deployment/Service YAML

Note: If using AWS-provisioned load balancers, account for Service-level delays caused by subnet or security group misconfiguration. Initial load balancer provisioning can take up to 3 minutes.
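
If the kubeconfig context is not yet in place, a quick sanity check along these lines confirms cluster access before any manifests are applied (cluster name and region are placeholders; us-west-2 simply matches the ECR region used in the examples below):

aws eks update-kubeconfig --name <cluster> --region us-west-2
kubectl get nodes                                   # all worker nodes should report Ready
kubectl auth can-i create deployments -n default    # verify RBAC allows the steps that follow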


Step 1: Dual Deployments

Suppose the current release is v1. To stage a v2 upgrade:

# myapp-blue-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      track: blue
  template:
    metadata:
      labels:
        app: myapp
        track: blue
    spec:
      containers:
      - name: myapp
        image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/myapp:v1
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
        ports:
        - containerPort: 8080

# myapp-green-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      track: green
  template:
    metadata:
      labels:
        app: myapp
        track: green
    spec:
      containers:
      - name: myapp
        image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/myapp:v2
        env:
        - name: FEATURE_FLAG
          value: "true"
        ports:
        - containerPort: 8080

kubectl apply -f myapp-blue-deployment.yaml
kubectl apply -f myapp-green-deployment.yaml

The two Deployments now run side by side, distinguished only by the track label: blue or green.
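
Before wiring any routing, it is worth confirming that both tracks are actually up; a minimal check (the -L flag just adds the track label as an extra column):

kubectl get deployments myapp-blue myapp-green
kubectl get pods -l app=myapp -L track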


Step 2: Service Routing (Atomic Switch)

A single Service object routes traffic:

apiVersion: v1
kind: Service
metadata:
  name: myapp-svc
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: myapp
    track: blue

Initial routing is to blue.

Wait for the ELB (inspect with kubectl get svc myapp-svc -w). Expect initial external load balancer provisioning to occasionally hang or return 503s until health checks are stable.
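
Once the load balancer exists, its DNS name can be read straight from the Service status. A quick check, assuming the application answers plain HTTP on /:

LB_HOST=$(kubectl get svc myapp-svc \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
curl -I "http://${LB_HOST}/"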


Testing green:

Port-forward to the green pods for validation:

kubectl port-forward deployment/myapp-green 9000:8080

or expose a temporary Service if ingress testing is required.
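
With the port-forward active, smoke tests run against localhost. The paths below are illustrative only; substitute whatever health or version endpoints the application actually exposes:

curl -s http://localhost:9000/healthz    # hypothetical health endpoint
curl -s http://localhost:9000/version    # should report the v2 build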


Step 3: Cutover

Switch all traffic with one patch:

kubectl patch svc myapp-svc -p '{"spec": {"selector": {"app": "myapp", "track": "green"}}}'

Effect is nearly instantaneous; all requests to the load balancer now route to v2. Observe via pod logs, e.g.:

kubectl logs deployment/myapp-green -c myapp | grep "Connection from"

Note: Some ALBs cache target health for 20–30 seconds. In high-traffic scenarios, a brief mix of old and new responses may occur while existing connections drain. There is no perfect fix; for an ultra-fast cutover, move traffic at the DNS or ALB Target Group level instead.
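
The switch can also be verified from the Kubernetes side: after the patch, the Service endpoints should list only green pod IPs. A quick sketch:

kubectl get endpoints myapp-svc -o wide
kubectl get pods -l app=myapp,track=green -o wide   # pod IPs here should match the endpoints above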


Step 4: Validate and Roll Back (If Necessary)

Monitor SLOs. If the green deployment degrades, revert routing as before:

kubectl patch svc myapp-svc -p '{"spec": {"selector": {"app": "myapp", "track": "blue"}}}'

Pods never die; recovery is sub-second.

Cleanup:

kubectl delete deployment myapp-blue

That said, keep blue running for a “hot standby” period before deleting it, especially after major schema changes.
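
If cost is a concern during that standby window, one middle ground is to shrink blue rather than delete it outright; even a single warm replica preserves the near-instant selector rollback:

kubectl scale deployment myapp-blue --replicas=1
# later, once green has proven stable:
kubectl delete deployment myapp-blue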


Advanced: DNS-Based Switchover

If latency requirements or third-party dependencies dictate it, consider moving the cutover to Route53:

  • Assign distinct ELBs to myapp-blue and myapp-green Service objects.
  • Use weighted or failover Route53 records for traffic management.
  • TTLs < 60s work, but actual client DNS caching may introduce lag.

This approach increases infra overhead, but decouples routing entirely from Kubernetes service logic.
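
As a rough sketch of that variant (hosted zone ID, record name, and ELB hostnames are placeholders), the cutover becomes a weighted-record update:

# shift-to-green.json
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "myapp.example.com",
        "Type": "CNAME",
        "SetIdentifier": "blue",
        "Weight": 0,
        "TTL": 60,
        "ResourceRecords": [{ "Value": "blue-elb.us-west-2.elb.amazonaws.com" }]
      }
    },
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "myapp.example.com",
        "Type": "CNAME",
        "SetIdentifier": "green",
        "Weight": 100,
        "TTL": 60,
        "ResourceRecords": [{ "Value": "green-elb.us-west-2.elb.amazonaws.com" }]
      }
    }
  ]
}

aws route53 change-resource-record-sets \
  --hosted-zone-id Z0EXAMPLE \
  --change-batch file://shift-to-green.json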


Automation and CI/CD

Automate the process in CodePipeline, Argo CD, or Jenkins:

  • Template manifests via Helm; include deterministic selectors (track: blue|green).
  • Run automated canary or smoke tests on green environment before cutover.
  • Integrate manual or automated approval stages before switching Service selectors.
  • Always log the cutover event and the result (auditability matters for regulated pipelines).

Non-obvious tip: Set up Slack or SNS notifications triggered by cutover events, especially in high-frequency deploy shops.
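
A minimal cutover stage, as it might look in any of those pipelines (the SNS topic ARN is an assumption, not a convention):

#!/usr/bin/env bash
set -euo pipefail
TRACK="${1:-green}"

# atomic selector switch
kubectl patch svc myapp-svc \
  -p "{\"spec\": {\"selector\": {\"app\": \"myapp\", \"track\": \"${TRACK}\"}}}"

# audit log + notification of the cutover event
aws sns publish \
  --topic-arn arn:aws:sns:us-west-2:123456789012:deploy-events \
  --message "myapp-svc switched to track=${TRACK} at $(date -u +%FT%TZ)"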


Known Pitfalls

  • Watch for lingering zombie pods when scaling down; ensure readiness/liveness probes are strictly enforced on both blue and green.
  • Cloud load balancers have propagation lag—don’t attempt to guarantee a hard 0s switch unless using direct client-side feature flags or DNS SRV records.
  • If using stateful or session-aware applications, session stickiness (especially in AWS Classic ELB) can send returning users to old pods. ALB with cookie-based stickiness can mitigate.
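
If the ALB in question is managed by the AWS Load Balancer Controller via an Ingress, cookie-based stickiness is typically enabled through target-group attributes along these lines (a sketch; treat the duration value as a starting point):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.type=lb_cookie,stickiness.lb_cookie.duration_seconds=300
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-svc
            port:
              number: 80

Note that this assumes the Service sits behind an ALB Ingress (typically ClusterIP or NodePort) rather than the type LoadBalancer Service shown earlier.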

Summary table:

Step     | Command/Config                               | Outcome
---------|----------------------------------------------|----------------------------
Blue up  | kubectl apply -f myapp-blue-deployment.yaml  | v1 pods running
Green up | kubectl apply -f myapp-green-deployment.yaml | v2 pods running (idle)
Route    | Service selector track: blue                 | All traffic to blue
Test     | Port-forward or temp Service to green        | Green validated, not public
Switch   | kubectl patch svc ... "track": "green"       | Traffic moves to green
Rollback | kubectl patch svc ... "track": "blue"        | Instantly revert to blue
Cleanup  | kubectl delete deployment myapp-blue         | Only green active

Note: No deployment strategy is truly “one size fits all.” Evaluate rollout mechanics versus team supportability and operational complexity. Blue-green on EKS is robust—but not magic.

Got a stateful workload or critical 24/7 API? Consider layering feature flags, or hybrid blue-green/canary with traffic splitting (e.g. via Istio's VirtualService) for even finer granularity.
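
For that hybrid variant, a traffic split might look roughly like the sketch below. It assumes Istio is installed with sidecar injection enabled, and that the Service selector drops the track label so both tracks remain reachable endpoints; the subsets simply reuse the track labels defined above:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp-dr
spec:
  host: myapp-svc
  subsets:
  - name: blue
    labels:
      track: blue
  - name: green
    labels:
      track: green
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  hosts:
  - myapp-svc
  http:
  - route:
    - destination:
        host: myapp-svc
        subset: blue
      weight: 90
    - destination:
        host: myapp-svc
        subset: green
      weight: 10

Shifting the weights from 90/10 toward 0/100 gives a gradual, per-request cutover instead of the all-or-nothing Service selector patch.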