How to Strategically Migrate Complex AWS Architectures to Azure Without Disruption
Most “migration checklists” skip real-world complexity. Here’s a practical approach, tuned for engineers facing brownfield AWS-to-Azure workloads with actual downtime risk, layered permissions, incompatible managed services, and interdependent release trains.
1. Deep Inventory and Architectural Mapping
Don’t trust CMDB exports alone. Shadow infrastructure, orphaned S3 buckets, Lambda invocations from legacy billing apps—these are lurking in almost every account. Before migration, run a full discovery:
- EC2 inventory: Instance types, attached EBS, launch templates, immutable AMIs.
- Serverless: Lambda (nodejs14.x? python3.8?) with triggers from S3, SNS, EventBridge.
- Data plane: S3 buckets (cross-region replication?), RDS clusters (Aurora flavors, read replicas), DynamoDB throughput configs.
- Networking: Custom VPC CIDRs, multi-account peering, VPN attachments, security group layer overlaps.
- IAM: Managed vs inline policies, cross-account role assumptions (see `packed_policy_size_limit` errors).
- Monitoring/Telemetry: CloudWatch log groups with retention overrides, custom dashboards, alarm actions.
Real scenario:
One migration hit a wall on missed IAM trust relationships. Lambda functions assumed roles from a centralized “identity” account; after the move, several broke because the corresponding Azure AD objects had never been mapped. Lesson: enumerate every trust policy, especially on cross-account or federated principals.
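Enumerating those trust policies can be scripted. Here is a stdlib-only sketch; the policy document and account IDs are illustrative, and in practice you would feed it each role's `AssumeRolePolicyDocument` as returned by `aws iam get-role`:

```python
import json

def cross_account_principals(trust_policy: dict, own_account: str) -> set:
    """Extract account IDs (or raw principals) a role trusts,
    excluding the role's own account. Covers AWS and Federated kinds."""
    found = set()
    for stmt in trust_policy.get("Statement", []):
        principal = stmt.get("Principal", {})
        if isinstance(principal, str):  # e.g. "Principal": "*"
            found.add(principal)
            continue
        for kind in ("AWS", "Federated"):
            vals = principal.get(kind, [])
            if isinstance(vals, str):
                vals = [vals]
            for arn in vals:
                # arn:aws:iam::123456789012:role/foo -> account ID at index 4
                parts = arn.split(":")
                acct = parts[4] if len(parts) > 4 else arn
                if acct != own_account:
                    found.add(acct or arn)
    return found

# Illustrative: a role trusted by a centralized "identity" account
policy = json.loads("""{
  "Version": "2012-10-17",
  "Statement": [{"Effect": "Allow",
                 "Principal": {"AWS": "arn:aws:iam::999988887777:role/identity-hub"},
                 "Action": "sts:AssumeRole"}]
}""")
print(cross_account_principals(policy, own_account="111122223333"))
# -> {'999988887777'}
```

Run it across every role in every account before the move; each ID it surfaces is a trust edge that needs an explicit Azure AD mapping decision.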
2. Service Mapping—But Mind the Non-Parity
No cloud vendor offers direct drop-in replacements for every service; embracing this early avoids costly rework.
| AWS | Azure | Notable Differences / Gotchas |
| --- | --- | --- |
| EC2 | Virtual Machines | VM sizes and disk types differ; Managed Disks naming |
| S3 | Blob Storage | Flat namespace, different consistency semantics, no bucket policies |
| Lambda | Functions | Consumption plan scaling limits; Durable timers work differently |
| RDS (Postgres/MySQL) | Azure Database for ... | Minor version support lags (e.g., Azure Postgres often 1-2 patch revisions behind AWS) |
| DynamoDB | Cosmos DB (Table API) | Indexing, throughput, TTL syntax variants |
| CloudFormation | ARM, Bicep, Terraform | Drift detection, stack update behaviors not 1:1 |
When matching features, look for gaps (e.g., VPC endpoint equivalents, S3 event triggers, dedicated tenancy). Some will require net-new Azure design—don’t fake parity where it’s missing.
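One way to keep the non-parity honest is a machine-readable gap list checked into the repo. A minimal sketch; the feature names and mappings below are illustrative, not an exhaustive or authoritative parity matrix:

```python
# Hypothetical parity map distilled from a mapping exercise; None marks
# features with no drop-in Azure equivalent. Unknown features are
# treated as gaps too, so omissions fail loudly.
PARITY = {
    "s3_event_trigger": "Event Grid",
    "vpc_endpoint": "Private Endpoint",
    "dedicated_tenancy": "Dedicated Host",
    "dynamodb_ttl": "Cosmos DB TTL",
    "s3_bucket_policy": None,  # gap: RBAC/SAS redesign needed
}

def audit(required_features):
    """Split the features a workload needs into (mapped, gaps)."""
    mapped, gaps = {}, []
    for feat in required_features:
        target = PARITY.get(feat)
        if target:
            mapped[feat] = target
        else:
            gaps.append(feat)
    return mapped, gaps

mapped, gaps = audit(["s3_event_trigger", "s3_bucket_policy"])
print(gaps)  # features that need net-new Azure design, not a port
```

Anything landing in `gaps` gets a design ticket instead of a line in the migration runbook.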
3. Azure Target Architecture: Don’t Lift-and-Shift Without Optimization
Most lift-and-shift attempts end up legacy-in-the-cloud. Use migration as opportunity for modernization:
- Define environments using Infrastructure-as-Code: Prefer Bicep or Terraform (azurerm_provider >=3.70) for idempotent Azure deployments. Avoid ARM unless you need exact feature parity with Azure Portal.
- Networking: Model VNets, subnets, NSGs in code. Plan CIDR carefully—Azure doesn’t allow all AWS ranges.
- Resource organization: Map AWS tags to Azure Resource Groups/Tags. Pro tip: Use tag policies in Azure Policy for compliance enforcement post-migration.
- Automation: Consider Azure DevOps YAML pipelines, or (if keeping parity with existing pipelines) adapt your GitHub Actions or Jenkinsfile steps with Azure CLI tasks.
Non-obvious tip:
Azure’s “soft delete” on blob storage (enabled by default) can wreck third-party data migration tools that expect immediate delete. Test!
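CIDR planning is cheap to validate before any Azure API call. A stdlib sketch using `ipaddress`; the disallowed ranges reflect my reading of current Azure VNet restrictions, so verify them against the docs for your subscription:

```python
import ipaddress

# Ranges Azure reserves or rejects for VNet address space (assumption
# based on Azure VNet documentation; confirm for your environment).
DISALLOWED = [ipaddress.ip_network(n) for n in
              ("224.0.0.0/4", "255.255.255.255/32", "127.0.0.0/8",
               "169.254.0.0/16", "168.63.129.16/32")]

def check_vnet_plan(cidrs):
    """Return a list of problems: disallowed ranges and overlaps."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    problems = []
    for net in nets:
        if any(net.overlaps(bad) for bad in DISALLOWED):
            problems.append(f"{net} overlaps a disallowed Azure range")
    for i, a in enumerate(nets):
        for b in nets[i + 1:]:
            if a.overlaps(b):  # Azure forbids overlapping subnets in a VNet
                problems.append(f"{a} overlaps {b}")
    return problems

print(check_vnet_plan(["10.0.0.0/24", "10.0.0.128/25"]))
# flags the overlap before Azure's deployment API does
```

Wire this into the IaC pipeline so a bad AWS-derived plan fails at plan time, not at `apply`.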
4. Data Migration: Strategy Drives Downtime
“Just replicate and cut over” isn’t viable for TB-scale, high-volume workloads. You need real tooling and deliberate coordination.
- Databases: Azure Database Migration Service (DMS), but beware: for Postgres, the source instance can’t be on a version that DMS doesn’t support. Delta-capture is sensitive to network hiccups.
- Files/blobs: `AzCopy` (`azcopy sync` for incremental passes), or third-party solutions for S3-compatible-to-Blob Storage transfers. Watch multipart upload limitations.
- Cutover windows: Reduce the final switch delta with multi-phase syncs (full copy → incremental diff → read-only → cutover).
- Legacy data: For rarely-accessed “cold” objects, consider archive tiers on Blob Storage during migration—cost savings, but rehydration can take up to 15 hours.
Real command used in outage-free moves (note: `azcopy sync` does not accept an S3 source—use `azcopy copy` for the S3-to-Blob leg, then `sync` between Blob containers for incrementals):

```
azcopy copy 'https://s3.amazonaws.com/bucket/' 'https://<account>.blob.core.windows.net/container/' --recursive=true
```
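To see why multi-phase syncing shrinks the cutover window, here is a toy model (all numbers illustrative) of the delta remaining after each pass—new writes accumulate only while the previous, ever-smaller delta is copying:

```python
# Toy model of multi-phase sync: each pass copies everything written
# during the previous pass, so the remaining delta decays geometrically
# while the application keeps writing.
def cutover_deltas(total_gb, write_gbph, copy_gbph, passes):
    """Return the data (GB) still unsynced after each sync pass."""
    deltas, remaining = [], total_gb
    for _ in range(passes):
        hours = remaining / copy_gbph   # time to copy the current delta
        remaining = write_gbph * hours  # new writes arriving meanwhile
        deltas.append(round(remaining, 2))
    return deltas

# 10 TB dataset, 5 GB/h of new writes, 200 GB/h of sync throughput
print(cutover_deltas(10_000, 5, 200, 4))
```

After a handful of passes the residual delta is small enough that the final read-only window is minutes, not hours—which is the whole point of the full copy → incremental → read-only → cutover sequence.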
5. Networking & Security: Old Gotchas, New Names
- Subnet design: Azure doesn’t allow overlapping subnets inside a VNet. Double-check before porting AWS VPC plans.
- Firewall rules: Translate AWS Security Groups to NSGs. Both deny inbound by default, but NSG rules carry explicit priorities and deny rules, and there is no direct equivalent of referencing another security group by sg-id as a rule source (Application Security Groups are the closest analogue).
- IAM → Azure RBAC: Manual translation needed; Azure AD groups rarely map 1:1 to AWS IAM constructs—especially for AssumeRole policies.
- Hybrid connectivity: ExpressRoute ≈ Direct Connect, but setup, latency, and provisioning order differ. Site-to-site VPNs exist, but note that Azure’s policy-based VPN gateways are heavily restricted (single tunnel, no BGP)—plan on route-based.
Known issue:
Migrating direct-attach EIPs? Static public IPs in Azure aren’t attached to VM lifecycle by default—must be provisioned explicitly.
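Security-group translation is mechanical enough to script for the easy cases. A hypothetical sketch—the field names only loosely follow the NSG rule schema, and sg-id sources are deliberately rejected because they need manual resolution to CIDRs or Application Security Groups:

```python
# Hypothetical translator from an AWS security-group-style ingress rule
# to an NSG-style rule dict. NSG rules need an explicit priority and an
# Allow/Deny access field; SG rules are allow-only.
def sg_to_nsg(rule, priority):
    if rule["source"].startswith("sg-"):
        raise ValueError("resolve sg-id sources to CIDRs/ASGs first")
    return {
        "name": rule["name"],
        "priority": priority,  # lower number wins in Azure
        "direction": "Inbound",
        "access": "Allow",     # SG rules are allow-only
        "protocol": rule["protocol"].capitalize(),  # "tcp" -> "Tcp"
        "sourceAddressPrefix": rule["source"],
        "destinationPortRange": (
            str(rule["from_port"]) if rule["from_port"] == rule["to_port"]
            else f'{rule["from_port"]}-{rule["to_port"]}'
        ),
    }

nsg = sg_to_nsg({"name": "allow-https", "protocol": "tcp",
                 "source": "10.0.0.0/16", "from_port": 443, "to_port": 443},
                priority=100)
print(nsg["destinationPortRange"])  # "443"
```

The `ValueError` branch is the important part: it forces a human decision on exactly the rules that don’t translate 1:1.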
6. Automation & Test Before You Cut
Migrations fail mostly due to misconfiguration, not fundamental incompatibility.
- Use CI/CD: Create new build/test pipelines on Azure (Azure DevOps, GitHub Actions, etc).
- Automated validation: Run smoke tests after each deployment, ideally in a pre-production slot with real data shadow copies.
- Dry runs: Maintain both clouds live during trial runs; compare logs, error rates, and actual application outputs for divergence.
Typical test log from a real shadow launch:
```
[app-backend] ERROR: Unable to open DB connection: FATAL: no pg_hba.conf entry for host "10.240.7.112"
```
Missed firewall entry—surface such issues early.
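Log diffing between the AWS baseline and the Azure shadow can start as simply as comparing error rates. A minimal sketch—the marker regex and tolerance are assumptions to tune for your log format:

```python
import re
from collections import Counter  # handy if you later bucket by error type

ERROR_RE = re.compile(r"\b(ERROR|FATAL)\b")

def error_rate(log_lines):
    """Fraction of lines carrying ERROR/FATAL markers."""
    if not log_lines:
        return 0.0
    hits = sum(1 for line in log_lines if ERROR_RE.search(line))
    return hits / len(log_lines)

def diverged(aws_logs, azure_logs, tolerance=0.01):
    """True if the Azure shadow's error rate exceeds the AWS
    baseline by more than `tolerance` (absolute)."""
    return error_rate(azure_logs) - error_rate(aws_logs) > tolerance

# Illustrative shadow run: the Azure side trips on a firewall miss
aws = ["ok"] * 99 + ["ERROR: timeout"]
azure = ["ok"] * 95 + ['ERROR: no pg_hba.conf entry for host "10.240.7.112"'] * 5
print(diverged(aws, azure))  # flags the regression before cutover
```

Even this crude comparison would have surfaced the `pg_hba.conf` failure above during the dry run rather than during cutover.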
7. Gradual Cutover: Blue-Green, DNS, and Rollback Plans
Critical systems demand a phased approach.
- Blue-green deployment: Stand up full Azure parallel environment. Test with actual traffic (synthetic or canary clients) before routing production.
- Traffic shifting: Use Azure Front Door, Traffic Manager, or basic Azure DNS with TTL <60s. Shift traffic gradually, monitor impact.
- Rollback: Don’t burn the bridge; keep AWS infra warm until Azure stabilizes in production.
Side note:
Azure Traffic Manager probes can be overly sensitive to 4xx responses—set custom health endpoints to avoid false failovers.
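A dedicated health endpoint that reflects only hard dependency failures keeps the probe from failing over on application 4xx noise. A minimal Python sketch with the dependency checks stubbed out; in a real service they would ping the DB and cache:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def health_status(db_ok: bool, cache_ok: bool) -> int:
    """Probe verdict decoupled from application 4xx noise: only
    hard dependency failures flip the probe to unhealthy."""
    return 200 if db_ok and cache_ok else 503

class HealthHandler(BaseHTTPRequestHandler):
    # Stubbed: real checks would come from live DB/cache pings.
    checks = {"db_ok": True, "cache_ok": True}

    def do_GET(self):
        code = health_status(**self.checks) if self.path == "/healthz" else 404
        self.send_response(code)
        self.end_headers()

# To serve probes locally:
# HTTPServer(("", 8080), HealthHandler).serve_forever()
print(health_status(db_ok=True, cache_ok=False))  # 503
```

Point the Traffic Manager probe at `/healthz` instead of an application route, so a burst of client-caused 4xx responses can’t trigger a false failover.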
8. Monitoring—Telemetry Shift Is a Change Project
- Azure Monitor & App Insights: Replace/rebuild CloudWatch dashboards. Watch for missing custom metrics; port log queries and alerts.
- Baseline comparison: Keep a week of AWS latency/error baselines to define new alert thresholds.
- Operations hand-off: Train ops and SRE teams on Azure’s console, CLI, and REST APIs.
- Post-migration audit: Look for “zombie” resources left in AWS—ramp down gradually, don’t nuke.
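Deriving the new alert thresholds from that AWS baseline can also be scripted. A sketch using the stdlib `statistics` module—p95 plus headroom is one reasonable starting rule, not a standard, and the sample data is illustrative:

```python
import statistics

def alert_threshold(baseline_ms, headroom=1.25):
    """Derive a p95-based latency alert threshold from baseline
    samples, with multiplicative headroom so the new Azure
    environment isn't paged for expected variance."""
    p95 = statistics.quantiles(baseline_ms, n=20)[-1]  # 95th percentile
    return p95 * headroom

# Illustrative week of AWS request latencies (ms)
week_of_latencies = [42, 45, 44, 41, 47, 52, 49, 43, 60, 46,
                     44, 48, 50, 45, 43, 42, 55, 44, 46, 47]
print(round(alert_threshold(week_of_latencies), 1))
```

Recompute against Azure’s own telemetry after a few stable weeks; the AWS-derived threshold is a bridge, not a destination.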
Summary Table: Migration Battle Card
| Step | Key Actions/Reminders |
| --- | --- |
| Inventory | Full-stack mapping; catch shadow roles, cross-account trusts |
| Service mapping | Document every incompatible feature, plan workarounds |
| Target design | Define infra-as-code; enforce RBAC/NSG via Azure tooling |
| Data migration | Use live replication; incremental + cutover step |
| Network/security | CIDR planning, explicit NSGs, RBAC not IAM |
| Automation/testing | End-to-end smoke tests, log diffing, parallel cloud deployment |
| Cutover | Blue-green, DNS shift, rollback plan in place |
| Monitoring | Port log/metric pipelines; train the team |
Final Notes
No two migrations are identical. Expect drift. The tools—ARM, DMS, AzCopy—work, but have quirks the glossy docs omit. The real challenge isn’t technical limits but hidden dependencies and mismatch in cloud philosophies.
Whenever in doubt, isolate workloads, migrate incrementally, and keep rollback paths viable. Not every edge case gets documented—sometimes a late Friday deploy surfaces the odd S3 lifecycle rule no one knew about.
If your migration scenario includes nonstandard services (e.g., custom AWS Marketplace AMIs, Transit Gateway, Outposts), plan distinct waves and validate vendor support for Azure equivalents first.
Questions or unique stalling points? Comment below. Real-world troubleshooting always welcome.