#Cloud #Migration #AI #GCP #Azure #Kubernetes

Azure to GCP: Pragmatic Migration for Cloud Engineering Teams

Two priorities separate an Azure-to-GCP migration that works from one that only looks good on a slide: avoiding unnecessary downtime and keeping costs contained. Many teams face the migration question because of a merger, product fit, or simply a desire to better leverage Google's stream-processing and ML toolchain. Whatever the driver, the underlying engineering problem is the same: how to move real, running workloads without interrupting the business.


Azure vs. GCP: Key Engineering Drivers

Any team considering migration should review core differences, not marketing claims. Here’s a fact-based shortlist:

  • ML/AI stack: Google Vertex AI, AutoML, and TPUs are fully managed, with smooth integration—Azure’s equivalents (e.g., Azure ML) often lag in ecosystem maturity.
  • Analytics at scale: BigQuery’s serverless columnar engine outperforms Synapse for “big scan” workloads.
  • Kubernetes: GKE is generally acknowledged to have faster rollout velocity and lower operational toil than AKS, especially post-1.22.
  • Pricing dynamics: GCP’s custom machine types and sustained use discounts genuinely affect long-term cost structure, as real TCO analyses tend to confirm.
  • Backbone network: GCP’s global load balancing can be a differentiator in transregional architectures.

Note: Most organizations don’t need every GCP feature, but missing a critical mapping results in expensive rework.


Step 1: Baseline, Inventory, and Stakeholder Buy-In

Inventory your assets—there’s no shortcut:

  • Pull a machine-readable export of your Azure resources: az resource list -o tsv > ./azure-inventory.tsv (the table output format is for humans; use tsv or json for downstream processing). See the sketch after this list.
  • Classify: By workload type (stateless, stateful), regulatory zone, and uptime requirement.
  • Dependencies: Gather network diagrams to untangle service mesh, Azure-specific PaaS (e.g., Service Bus), and external integrations.
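As a starting point, here is a small sketch along those lines; the query fields and file names are just examples, adjust to whatever classification scheme you settle on:

# Export a machine-readable inventory: name, type, resource group, location
az resource list \
  --query "[].{name:name, type:type, rg:resourceGroup, location:location}" \
  -o tsv > azure-inventory.tsv

# Rough shape of the estate: resource counts by type
cut -f2 azure-inventory.tsv | sort | uniq -c | sort -rn | head -20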

Few migrations stall on compute; most derail on untracked dependencies or hidden data gravity. Review all DNS, identity providers, and hardwired endpoints.

Define goals with engineering leadership: is this a “lift-and-shift” (minimize code changes), or do you want to re-platform critical workloads for scalability and cost (e.g., moving a VM-based API to Cloud Run)? Document this up front. Skimp here, regret later.


Step 2: Service and API Mapping

Not all services translate one-to-one from Azure to GCP. Avoid assuming feature parity.

Azure Service | GCP Equivalent | Caveats
Virtual Machines (VMs) | Compute Engine | Disk types and GPU availability differ
Blob Storage | Cloud Storage | Lifecycle and inventory API differences
SQL Database | Cloud SQL / Spanner | Spanner is not a direct SQL substitute
AKS (Kubernetes) | Google Kubernetes Engine (GKE) | Ingress controllers, versioning
Azure Functions | Cloud Functions | Timeout limits and triggers vary
Cosmos DB | Firestore / Bigtable | Consistency and partitioning differences
Azure AD | Cloud Identity, IAM | SAML/OIDC integration varies

Gotcha: Azure’s Managed Identity does not directly map to GCP—service accounts must be architected and legacy code may require patching.
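A rough sketch of the GCP-side replacement, assuming Workload Identity on GKE; the project, role, namespace, and account names below are placeholders, not a prescription:

# Create a dedicated service account for the app
gcloud iam service-accounts create app-backend --project=myproj

# Grant only the roles the app actually needs
gcloud projects add-iam-policy-binding myproj \
  --member="serviceAccount:app-backend@myproj.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# Allow the Kubernetes service account to impersonate it (Workload Identity)
gcloud iam service-accounts add-iam-policy-binding \
  app-backend@myproj.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:myproj.svc.id.goog[backend-ns/backend-ksa]"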


Step 3: GCP Environment Construction

Standard approach:

  1. Project creation with billing constraints: use gcloud projects create, then attach the billing account with gcloud billing projects link (see the sketch after this list).
  2. IAM modeling: establish custom roles; avoid project-wide Owner permissions in production. Consider mapping existing RBAC patterns.
  3. Network topology: replicate the VNet structure (subnets, peering, VPN, firewall) using Terraform for repeatability. Sample snippet below:
resource "google_compute_network" "core" {
  name                    = "azure-migrated-net"
  auto_create_subnetworks = false
}
  4. API enablement: automate with gcloud services enable compute.googleapis.com storage.googleapis.com ....
  5. Service account hygiene: create per-app service accounts. Rotate keys prior to cutover.
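A condensed sketch of steps 1 and 4; project ID, display name, billing account, and the API list are placeholders to adapt:

# Create the project and attach billing
gcloud projects create my-migration-proj --name="azure-migration"
gcloud billing projects link my-migration-proj \
  --billing-account=000000-AAAAAA-000000

# Enable the APIs the migrated workloads will need
gcloud services enable compute.googleapis.com storage.googleapis.com \
  sqladmin.googleapis.com container.googleapis.com artifactregistry.googleapis.com \
  --project=my-migration-proj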

Note: Don’t assume Azure’s NSG rules match GCP’s VPC firewall semantics—test with staging firewalls ahead of time.


Step 4: Migrating Compute—VMs and Kubernetes Workloads

VM Hosting: Practical Methods

Approach: Export each VM disk as VHD, upload to Cloud Storage, import via gcloud compute images import.

Example Windows Server migration (tested with 2019 and 2022):

  • Azure:
    Export the OS disk via the portal, or grant temporary read access and copy out the VHD:
    az disk grant-access -n <disk-name> -g <rg> --access-level Read --duration-in-seconds 3600
    then download the returned SAS URL with azcopy or az storage blob copy.
  • GCP:
    Upload with gsutil cp, then:
gcloud compute images import legacy-sql-2019 \
  --source-file gs://my-bucket/vm-disk.vhd \
  --os windows-2019 \
  --timeout=6h

Known issue: Some Azure VMs with nested hypervisors fail import with the error:
FAILED_PRECONDITION: osconfig agent not available.

Trade-off: Recreating the image from scratch on GCP sometimes yields better network driver performance than importing existing disks.
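Once the import finishes, it is worth booting a throwaway instance from the image before any cutover traffic touches it; a sketch with placeholder machine type, zone, and subnet names:

# Boot a test instance from the imported image on the migrated network
gcloud compute instances create legacy-sql-2019-test \
  --image=legacy-sql-2019 \
  --machine-type=n2-standard-4 \
  --zone=us-central1-a \
  --network=azure-migrated-net \
  --subnet=app-subnet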

Container Workloads

Images:
Push images from Azure Container Registry to Google Artifact Registry:

docker login <registry>.azurecr.io
docker pull <registry>.azurecr.io/myteam/app:v2
docker tag <registry>.azurecr.io/myteam/app:v2 us-central1-docker.pkg.dev/myproj/myrepo/app:v2
docker push us-central1-docker.pkg.dev/myproj/myrepo/app:v2
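Note that the pull/push above assumes Docker is already authenticated against both registries; a typical setup (host matches the example region, adjust as needed):

# Authenticate Docker against ACR (source) and Artifact Registry (target)
az acr login --name <registry>
gcloud auth configure-docker us-central1-docker.pkg.dev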

Manifests:
Update storage class specs (standard-rwo on GKE is not a drop-in for managed-premium on AKS). For ingress, rewrite rules for the GKE Ingress controller, or run the NGINX ingress controller as a bridge while you validate.

Helm charts may need patching for cloud-provider-specific annotations. Don’t forget to audit for embedded Azure resource URIs.
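A quick, low-tech way to surface those Azure-specific bits before they bite, assuming your manifests live in a local directory; the path and grep patterns are examples, not an exhaustive list:

# Flag Azure-specific storage classes, registries, and annotations in exported manifests
grep -rnE 'managed-premium|azurefile|azure-disk|azurecr\.io|service\.beta\.kubernetes\.io/azure' ./k8s-manifests/

# List the storage classes available on the target GKE cluster to pick replacements
kubectl get storageclass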


Step 5: Data and Database Migration

Here’s where most migrations hit risk.

Databases:
If running SQL Server/Azure SQL Database, use Database Migration Service (DMS):

  • Target Cloud SQL (MySQL/Postgres/SQL Server).
  • Set up source as read-replica if possible.
  • DMS configuration may require custom flags for collation or encryption (e.g., --enable_sqlserver_ae).

Real-World Example
Migrated a 3 TB Azure SQL database to Cloud SQL (PostgreSQL 14). The initial DMS full transfer took roughly 8 hours; change data capture (CDC) during cutover applied about 15 minutes of delta.

Common error:
ERROR: peer authentication failed for user "migration_user"
→ Caused by a role-mapping mismatch. Verify with SELECT * FROM pg_roles;.
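Beyond role mapping, a quick row-count comparison between source and target catches silent load failures; a minimal sketch assuming PostgreSQL on both ends, with placeholder connection strings and table names:

# Compare per-table row counts between the source replica and the Cloud SQL target
for t in users orders events; do
  src=$(psql "$SRC_CONN" -Atc "SELECT count(*) FROM $t;")
  dst=$(psql "$DST_CONN" -Atc "SELECT count(*) FROM $t;")
  echo "$t: source=$src target=$dst"
done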

Blob Storage:
Storage Transfer Service is recommended for anything over 1 TB. Parallelization is critical; set --max-worker-count to at least 10 for large buckets.

Tip: Validate ACL and signed URL expiration semantics; they are subtly different from Azure to GCP.
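After the transfer completes, a rough parity check on object counts helps catch partial copies; a sketch with placeholder account, container, and bucket names (for very large containers, prefer each side's inventory reports):

# Source side: count blobs in the Azure container (may need --num-results for >5000 blobs)
az storage blob list --account-name myaccount --container-name mycontainer \
  --query "length(@)" -o tsv

# Target side: count objects in the GCS bucket
gsutil ls gs://my-bucket/** | wc -l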


Step 6: Validation and Pre-Cutover Testing

  • End-to-end smoke tests; automate via your CI/CD (e.g., GitHub Actions, Cloud Build).
  • Benchmark key workloads and review process lists and kernel logs (e.g., dmesg) for leftover Hyper-V artifacts such as:
kernel: [    4.712395] hv_balloon: Max. dynamic memory size: 13056 MB
  • Validate all external APIs—especially OAuth redirect URIs—against GCP endpoints.

Some failures will only emerge under production load. Plan a full blue-green or shadow deployment if possible.
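A minimal smoke-test sketch along those lines, with placeholder endpoints and expected status codes, that can run from CI after each deploy:

#!/usr/bin/env bash
# Fail the pipeline if any key endpoint returns an unexpected status code.
set -euo pipefail
checks=(
  "https://api.example.com/healthz 200"
  "https://api.example.com/v1/login 401"   # unauthenticated call should be rejected, not 5xx
)
for check in "${checks[@]}"; do
  url=${check% *}; want=${check##* }
  got=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  if [[ "$got" != "$want" ]]; then
    echo "FAIL: $url returned $got (expected $want)" >&2
    exit 1
  fi
  echo "OK: $url -> $got"
done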


Step 7: Cutover, Observability, and Post-Launch Optimization

  1. DNS switch: reduce TTL to 60 seconds at least 48 hours before cutover.
  2. Production switchover: Use maintenance windows for major service transitions.
  3. Instrumentation: Enable Cloud Monitoring (Stackdriver) and set up alerting on error rates, latency, and spending spikes.
  4. Review IAM: Identify any lingering global roles. Least privilege, always.
  5. Cost controls: activate budgets and the Recommender API.
    Known cost pitfall: orphaned disks and reserved IPs driving up untracked spend (see the sketch below).
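A read-only sweep for those two leaks, as a sketch; review the output before deleting anything:

# Disks with no attached instances, i.e., likely orphaned
gcloud compute disks list --filter="-users:*" \
  --format="table(name,zone,sizeGb,lastAttachTimestamp)"

# Static addresses that are reserved but not currently in use
gcloud compute addresses list --filter="status=RESERVED"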

Advanced Tips

  • Use Terraform or Google’s Deployment Manager for repeatable infra creation; version configs in your source repo.
  • For hybrid workloads, evaluate Anthos. Not always straightforward—egress charges and ops complexity can negate benefits for some teams.
  • Security: Disable automatic mounting of the default service account token (automountServiceAccountToken: false on the ServiceAccount or pod spec) in GKE namespaces unless a workload explicitly needs it; one approach is sketched below.
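One way to apply that across namespaces, sketched with kubectl; adjust the exclusion list for whatever system namespaces your clusters carry:

# Turn off automatic token mounting on the default service account in non-system namespaces
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  case "$ns" in kube-system|kube-public|kube-node-lease) continue ;; esac
  kubectl patch serviceaccount default -n "$ns" \
    -p '{"automountServiceAccountToken": false}'
done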

Not a Silver Bullet

No migration is perfect on the first pass. Expect at least one major adjustment post-migration, often in IAM or DNS. Some manual scripting and error handling (e.g., transient 429: RESOURCE_EXHAUSTED during DMS cutover) is unavoidable in larger moves.


The process above emphasizes standard control points but leaves room for engineering judgement. New features crop up; no checklist stays perfectly current. For SQL workloads, stay skeptical: the downtime window during CDC sync is never exactly what the docs claim. Test everything.

Integrate lessons learned into your runbooks—because odds are, this won’t be your last cloud migration.


For code samples, nuanced errors, or questions about mixed-platform networking, bring them to the comments or DM.