Azure Blob Storage to Google Cloud Storage: Reliable Data Transfer Patterns
Blob storage to bucket—sounds trivial, isn’t. Whether you’re consolidating analytics pipelines, migrating a legacy backup, or satisfying regulatory redundancy, moving terabytes from Microsoft Azure to Google Cloud Platform (GCP) involves a few architectural calls. Network throughput, integrity checks, and tool selection can make or break the experience.
Below: practical guidance, command references, and a few edge-case notes for engineers tasked with actual cloud-to-cloud movement—not proof-of-concept demos.
Use Cases for Azure-to-GCP Transfer
- Cost–performance optimization (e.g., shifting cold data to GCP's Nearline tier)
- Migration of workloads leveraging GCP-native analytics services like BigQuery
- Cross-cloud DR (Disaster Recovery) or geographic sovereignty requirements
- Decommissioning legacy Azure dependencies
Core Prerequisites
| Resource | Azure | GCP |
| --- | --- | --- |
| Storage | Blob Storage Account (Hot/Cool) | Storage Bucket |
| Tooling | Azure CLI ≥ 2.36, AzCopy ≥ 10.19 | gsutil (Google Cloud SDK ≥ 429.0.0) |
| IAM | Reader or Storage Blob Data Reader | Storage Admin |
| Credentials | az login or service principal | gcloud auth login or service account |
Note: If your compliance regime restricts local downloads, see direct transfer notes below.
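Before any of the options below, both CLIs need working credentials on the machine driving the transfer. A minimal interactive setup (subscription and project names are illustrative; use service principals and service accounts for unattended runs):
az login
az account set --subscription "my-subscription"
gcloud auth login
gcloud config set project my-gcp-project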
Option 1: Indirect (Local) Transfer
Classic, robust, but potentially slow for large datasets.
1. Download blobs from Azure
# Basic batch download
az storage blob download-batch \
  --account-name mystorageacct \
  -s my-container \
  -d ./azure-export
Add --auth-mode login to authenticate with Azure AD (or a service principal) rather than account keys; recommended for automation.
Use --pattern '*.parquet' (or a similar glob) for partial exports.
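Putting both flags together (account and container names are the placeholders used above):
az storage blob download-batch \
  --auth-mode login \
  --account-name mystorageacct \
  -s my-container \
  -d ./azure-export \
  --pattern '*.parquet'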
2. Upload to GCS
gsutil -m cp -r ./azure-export/* gs://my-gcp-bucket/
-m enables parallelism, typically essential for throughput.
Caveat: Space and bandwidth become constraints quickly for >500GB.
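Before committing to a local copy, it is worth totalling the container size first. A rough sketch (note that listings may be capped unless --num-results '*' is passed):
# Sum contentLength across all blobs in the container and print it in GiB
az storage blob list --account-name mystorageacct -c my-container --num-results '*' \
  --query "[].properties.contentLength" -o tsv | awk '{s+=$1} END {printf "%.1f GiB\n", s/1024/1024/1024}'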
Option 2: Proxy VM in Cloud
Running out of local disk, or need speed? Use a VM (in Azure, GCP, or a neutral cloud) positioned for low latency to both endpoints.
Workflow:
- Spin up a suitably sized VM (e.g., n2-standard-4 on GCP; Ubuntu 22.04 LTS)
- Install both the Azure CLI and the Google Cloud SDK (apt install azure-cli google-cloud-sdk, after adding the Microsoft and Google apt repositories)
- Mount a large persistent disk (SSD for write throughput)
- Transfer with AzCopy on the Azure side, then gsutil on the GCP side. For jobs that may be interrupted or re-run, azcopy sync with the --from-to=BlobLocal / --from-to=LocalBlob options is worth considering; see the sketch below.
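A hedged sketch of the sync variant (SAS token and paths are placeholders); re-running it only moves new or changed objects, and pairing it with gsutil rsync keeps the GCP leg incremental as well:
# Azure -> staging disk; incremental on re-run
azcopy sync "https://mystorageacct.blob.core.windows.net/mycontainer?SAS_TOKEN" "/mnt/transfer" --from-to=BlobLocal
# Staging disk -> GCS; rsync gives the same incremental behaviour on this leg
gsutil -m rsync -r /mnt/transfer gs://my-gcp-bucket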
Sample AzCopy + gsutil Sequence:
azcopy copy "https://mystorageacct.blob.core.windows.net/mycontainer/*?SAS_TOKEN" "/mnt/transfer" --recursive --log-level=INFO
gsutil -o 'GSUtil:parallel_thread_count=32' -m cp -r /mnt/transfer/* gs://my-gcp-bucket/
Known issue: AzCopy retries can fail quietly. Review the per-job logs under ~/.azcopy (or wherever AZCOPY_LOG_LOCATION points) for failures that never surface on stdout.
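Per-file failures are easier to pull out of AzCopy's job tracking than out of the raw log; a quick check (job ID left as a placeholder):
azcopy jobs list
azcopy jobs show <job-id> --with-status=Failed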
Option 3: Cloud-Managed Services
For petabyte-scale or compliance-bound workloads.
Google Storage Transfer Service (STS) can pull directly from Amazon S3, HTTP/HTTPS URL lists, and Azure Blob Storage; for Azure sources it authenticates with a container-level SAS token, so no intermediate VM or local staging is required.
Notes:
- The SAS token must grant read and list on the source container. If organizational policy rules out SAS tokens, the proxy-VM pattern above remains the fallback, with the extra complexity and cost that implies.
- For massive cold archives, consider Google Transfer Appliance (shipped hardware), but plan 4-12 weeks ahead.
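A hedged sketch of defining an STS job via gcloud; account, container, and bucket names are placeholders, and the exact credentials-file format for the SAS token should be checked against the current STS documentation:
# creds.json holds the Azure SAS token in the format STS expects
gcloud transfer jobs create \
  https://mystorageacct.blob.core.windows.net/mycontainer \
  gs://my-gcp-bucket \
  --source-creds-file=creds.json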
Automation: Minimal Bash Approach
Even in 2024, glue scripts win for atomic, repeatable jobs.
#!/usr/bin/env bash
set -euo pipefail

SRC="azure-container"      # Azure source container
DST="./buffer_dir"         # local staging directory
BUCKET="gs://org-bucket"   # GCS destination bucket
ACCOUNT="azstorage"        # Azure storage account name
THREADS=32

mkdir -p "$DST"  # download-batch expects the destination directory to exist
az storage blob download-batch -s "$SRC" -d "$DST" --account-name "$ACCOUNT"
gsutil -m -o "GSUtil:parallel_thread_count=$THREADS" cp -r "$DST"/* "$BUCKET/"
rm -rf -- "$DST"
Not perfect: neither CLI preserves the blob's access tier or its custom metadata by default. Write your own mappings or handle metadata via post-upload scripts if necessary.
Data Validation and Integrity
Blind trust in CLI exit codes is ill-advised.
- Use checksums (MD5/SHA-256) pre- and post-migration. az storage blob show --container-name <container> --name <blob> reports the stored Content-MD5; on the GCS side, gsutil stat gs://<bucket>/<object> shows the stored MD5/CRC32C, and gsutil hash computes matching hashes for local files.
- For critical data, run a checksum validation loop post-upload (a sketch follows the example below).
Example:
# Hash local files before upload
sha256sum azure-export/* > checksums.txt
# After upload, re-download one blob and compare its hash with the recorded value
gsutil cp gs://my-gcp-bucket/sample.csv /tmp/sample_restored.csv
sha256sum /tmp/sample_restored.csv   # should match the azure-export/sample.csv entry in checksums.txt
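For the per-object validation loop, a minimal sketch that compares each local file's MD5 with the MD5 GCS recorded at upload (assumes objects landed at the bucket root with unchanged names; composite-uploaded objects carry no MD5, so fall back to CRC32C for those):
for f in azure-export/*; do
  # base64 MD5 of the local file, and the MD5 stored on the uploaded object
  local_md5=$(gsutil hash -m "$f" | awk '/Hash \(md5\)/ {print $3}')
  remote_md5=$(gsutil stat "gs://my-gcp-bucket/$(basename "$f")" | awk '/Hash \(md5\)/ {print $3}')
  if [ "$local_md5" = "$remote_md5" ]; then echo "OK        $f"; else echo "MISMATCH  $f"; fi
done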
Throughput Tuning & Common Bottlenecks
- AzCopy/gsutil: both parallelize transfers; tune AzCopy via the AZCOPY_CONCURRENCY_VALUE environment variable, and gsutil via -m plus -o boto overrides (see the sketch after this list).
- Azure-side errors: if you see ERROR: Server failed to authenticate the request. Please refer to the information in the www-authenticate header., verify SAS token expiry; for genuine throttling (503 ServerBusy), lower concurrency or raise --block-size-mb for larger objects.
- GCP ingress: VPC Service Controls and firewall rules may block or throttle unexpectedly; relax them for initial tests, then tighten again.
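A tuning sketch under the proxy-VM setup above (the values are illustrative starting points, not recommendations):
# AzCopy: concurrency comes from an environment variable, block size from a flag
export AZCOPY_CONCURRENCY_VALUE=64
azcopy copy "https://mystorageacct.blob.core.windows.net/mycontainer/*?SAS_TOKEN" "/mnt/transfer" --recursive --block-size-mb=64
# gsutil: -m for parallel workers, -o to override boto settings inline
gsutil -m -o "GSUtil:parallel_thread_count=32" -o "GSUtil:parallel_process_count=8" cp -r /mnt/transfer/* gs://my-gcp-bucket/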
Tip: Compressed Archive Transfer
Compress small files into fewer, larger archives before moving (e.g., tar, zstd).
Fewer files → fewer API calls → faster batch operations.
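For example, bundling an export directory with GNU tar's -I hook and multithreaded zstd before the upload (archive name is illustrative):
# One large archive instead of thousands of small objects
tar -I 'zstd -T0' -cf export.tar.zst ./azure-export
gsutil cp export.tar.zst gs://my-gcp-bucket/archives/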
Closing Notes
Cost models are not symmetrical: GCP ingress is free, Azure egress is not. Budget accordingly.
Cloud-to-cloud transfer remains a CLI-heavy task—a fact not lost on teams scripting this process weekly. No vendor-native atomic transfer yet; expect to script, validate, retry.
For fully automated, repeatable flows, integrate into existing CI/CD (e.g., trigger via GitHub Actions, or as a Terraform external resource), but that’s another topic.
Alternatives exist: several vendors (e.g., CloudSync, Resilio) offer managed UIs, though some enterprises prefer operational control over convenience.
Gotcha: Blob-level metadata is not always preserved. For datasets where metadata is critical (e.g., scientific archives), extract metadata to JSON prior to transfer and reapply after migration using tool APIs.
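A minimal sketch of that pattern for a single blob, using the illustrative account, container, and bucket names from the bash script above (in practice you would loop over a blob listing):
# Capture the blob's custom metadata as JSON before the move
az storage blob metadata show --account-name azstorage --container-name azure-container --name data/file1.parquet > file1.metadata.json
# After upload, replay each key/value as x-goog-meta-* custom metadata on the GCS object (keys shown are examples)
gsutil setmeta -h "x-goog-meta-owner:analytics" -h "x-goog-meta-source:azure" gs://org-bucket/data/file1.parquet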