Step-by-Step Guide: AWS RDS to Google Cloud SQL Migration
Neither compliance audits nor cost pressure justifies vendor lock-in. Replicating a production database from AWS RDS to Google Cloud SQL is feasible with careful planning and attention to edge cases, especially for organizations seeking to align with internal GCP-first roadmaps, integrate BigQuery/Vertex AI, or simply reduce OpEx.
Decision Table: Why Move Workloads?
| Rationale | Notes |
|---|---|
| GCP-native integration (security, IAM) | Service account-based auth, Private Service Connect |
| BigQuery pipelines, ML/AI workloads | Pub/Sub and batch ETL on GCP simplify data pipelines |
| Pricing volatility on AWS | Sustained use discounts on GCP may improve TCO |
| Geo-latency or compliance | Data residency, specific GCP regions |
Prerequisites
- Source DB: Confirm engine (MySQL/PostgreSQL/SQL Server), version (e.g., MySQL 8.0.32), and total data size.
- Target: Pre-provision the Google Cloud SQL instance with storage >= source and version parity or a superset (see the provisioning example after this list). Prefer SSD storage; note that not every source feature (e.g., certain extensions) is available on Cloud SQL.
- Networking: Temporary inbound firewall rules. If using VPC Peering or Interconnect, verify subnet overlap and route propagation.
- Migration Tools: `gcloud` CLI (v465+), SQL clients (`mysqldump`, `pg_dump`, `psql`). For zero-downtime migrations: Google Database Migration Service (DMS), with IAM roles configured (`roles/cloudsql.admin`, `roles/datamigration.admin`).
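If the target instance does not exist yet, it can be pre-provisioned from the CLI. The snippet below is a sketch only; the instance name, tier, region, and storage size are placeholder values to adapt to your workload:

```bash
# Sketch only: instance name, tier, region, and storage size are placeholders.
gcloud sql instances create target-mysql \
  --database-version=MYSQL_8_0 \
  --tier=db-custom-4-16384 \
  --region=us-central1 \
  --storage-type=SSD \
  --storage-size=200GB
```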
Migration Strategies
A. Google Database Migration Service (DMS) — Recommended for Minimal Downtime
- Supports MySQL 5.6–8.0, PostgreSQL 9.6+ (with caveats around extension support).
- Online mode: continuous logical replication; supports cutover testing.
B. Native Dump & Restore — Simpler, but Offline
- Use `mysqldump`/`mysql` (MySQL) or `pg_dump`/`pg_restore` (PostgreSQL).
- Suitable for sub-100 GB datasets, or where downtime is tolerable.
Gotcha: Cloud SQL for SQL Server exists, but migration tooling support for it is more limited than for MySQL/PostgreSQL; if you are moving SQL Server, expect additional constraints.
Source Prep
- Enable binary logging (MySQL: `binlog_format=ROW`) in the RDS parameter group. May require a reboot.
- Provision a replication user with at least the `REPLICATION SLAVE` privilege.
Example:
CREATE USER 'replicator'@'%' IDENTIFIED BY 'REDACTEDpw';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;
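Once the parameter group change has taken effect, a quick sanity check on the RDS source confirms the prerequisites (the user name matches the example above):

```sql
-- Sanity-check binary logging and the replication user on the RDS MySQL source.
SHOW VARIABLES LIKE 'log_bin';        -- expect ON
SHOW VARIABLES LIKE 'binlog_format';  -- expect ROW
SHOW GRANTS FOR 'replicator'@'%';     -- confirm REPLICATION SLAVE is present
```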
For PostgreSQL: set `rds.logical_replication=1` in the parameter group and grant the `rds_replication` role to the migration user.
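A minimal PostgreSQL equivalent of the MySQL example above, with placeholder credentials:

```sql
-- RDS PostgreSQL: create the migration user and grant the RDS-managed replication role.
CREATE USER replicator WITH LOGIN PASSWORD 'REDACTEDpw';
GRANT rds_replication TO replicator;
```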
Snapshot the RDS instance first—rollbacks can be messy.
Step 1: Establish Connectivity
- Confirm network reachability: open temporary ingress (TCP 3306/5432) from Cloud SQL instance or DMS migration job workers to RDS endpoint/IP. For production, avoid public IP if possible.
- Validate with `telnet <rds-endpoint> 3306`.
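A quick end-to-end check from a GCP-side host (for example, a bastion VM in the VPC that will carry the migration traffic) might look like the following; the endpoint and credentials are placeholders:

```bash
# Placeholder endpoint/credentials; run from a host on the GCP side of the network path.
RDS_HOST=<RDS-ENDPOINT>

# TCP reachability (MySQL shown; use 5432 for PostgreSQL).
nc -vz "$RDS_HOST" 3306

# Authenticated check with the replication user (prompts for the password).
mysql --host="$RDS_HOST" --user=replicator -p -e "SELECT 1;"
```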
Step 2: Migration Pipeline
Option 1: Online Migration via DMS
- Create a Migration Job from the GCP Console or `gcloud` (a CLI sketch follows this list):
  - Select Source: RDS endpoint, engine/version.
  - Select Destination: Cloud SQL instance.
  - Provide replication user credentials.
- Validation Phase
  - DMS performs connectivity and privilege checks; error example: `ERROR 1045 (28000): Access denied for user 'replicator'...`
  - Fix privilege issues before proceeding.
- Initial Sync + Change Replication
  - DMS copies base data, then streams changes (logical replication).
  - Monitor lag and throughput: GCP Console → DMS job details.
- Cutover
  - Put the application in read-only mode. Confirm zero replica lag.
  - Promote Cloud SQL; update the endpoint in the application config.
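For teams that prefer to script this, a rough `gcloud database-migration` outline is shown below. The resource names are placeholders and the exact flags vary between gcloud releases, so treat it as an assumption-laden sketch and verify each command with `--help` before use:

```bash
# Outline only: resource names are placeholders; verify flags for your gcloud version.
REGION=us-central1

# Source connection profile pointing at the RDS MySQL endpoint.
gcloud database-migration connection-profiles create mysql rds-source \
  --region="$REGION" \
  --host=<RDS-ENDPOINT> --port=3306 \
  --username=replicator --password='REDACTEDpw'

# Continuous (online) migration job from the source profile to a Cloud SQL destination profile.
gcloud database-migration migration-jobs create rds-to-cloudsql \
  --region="$REGION" \
  --type=CONTINUOUS \
  --source=rds-source \
  --destination=<CLOUDSQL-CONNECTION-PROFILE>

# Start the job, then poll its state while the initial copy and change replication run.
gcloud database-migration migration-jobs start rds-to-cloudsql --region="$REGION"
gcloud database-migration migration-jobs describe rds-to-cloudsql --region="$REGION"
```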
Known issue: Large tables (>1 TB) may hit DMS timeouts or performance bottlenecks. In such cases, use staged dump/restore for the large static tables and reserve DMS for the critical transactional ones; a parallel-export sketch follows below.
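One way to stage those large, mostly-static tables (assuming a MySQL source; the table names and credentials below are illustrative placeholders) is to dump them one file per table, in parallel, ahead of the DMS cutover:

```bash
# Illustrative: dump large, mostly-static tables one file per table, in parallel.
TABLES="orders_archive events_2022 audit_log"   # placeholder table names
for t in $TABLES; do
  mysqldump --single-transaction --set-gtid-purged=OFF \
    --host=<RDS-ENDPOINT> --user=admin --password='<PW>' \
    mydb "$t" | gzip > "$t.sql.gz" &
done
wait   # block until all background dumps have finished
```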
Option 2: Dump & Restore
MySQL Example:
mysqldump --set-gtid-purged=OFF \
--single-transaction \
--routines --events --triggers \
--host=<RDS-ENDPOINT> \
--user=admin --password='<PW>' \
mydb > mydb.sql
gsutil cp mydb.sql gs://<BUCKET>/mydb.sql
gcloud sql import sql $CLOUDSQL_INSTANCE gs://<BUCKET>/mydb.sql --database=mydb
PostgreSQL Example:
pg_dump --format=plain --no-owner --no-acl \
--host=<RDS-ENDPOINT> \
--username=postgres \
mydb > mydb.sql
gsutil cp mydb.sql gs://<BUCKET>/mydb.sql
gcloud sql import sql $CLOUDSQL_INSTANCE gs://<BUCKET>/mydb.sql --database=mydb
Note: Cloud SQL imports read from Cloud Storage and support a limited set of formats (plain SQL and CSV); the instance's service account needs read access to the bucket. Consult the docs.
Step 3: Data Consistency Check
Always verify row count and checksums:
SELECT COUNT(*) FROM important_table;
For very large datasets, check with sampling or hash aggregation (e.g., `SELECT MD5(STRING_AGG(col, ',')) ...`).
Validate all objects: foreign keys, triggers, stored routines. Occasionally, certain triggers or procedures may require manual fix-up due to engine differences or permissions.
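For a source-vs-target comparison, one lightweight approach (assuming a MySQL engine and the placeholder endpoints/credentials below) is to run identical counts against both sides and diff the output:

```bash
# Placeholder endpoints/credentials; compares exact per-table row counts on source vs. target.
TABLES="important_table orders customers"   # placeholder table names
SRC=<RDS-ENDPOINT>
DST=<CLOUDSQL-IP>

for side in SRC DST; do
  host=${!side}                             # indirect expansion: SRC -> $SRC, DST -> $DST
  for t in $TABLES; do
    printf '%s\t' "$t"
    mysql --host="$host" --user=admin -p'<PW>' -N -e "SELECT COUNT(*) FROM mydb.$t;"
  done > "counts_${side}.txt"
done
diff counts_SRC.txt counts_DST.txt && echo "Row counts match"
```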
Step 4: Application Cutover
- Freeze writes to RDS.
- Finalize sync (DMS: wait for target lag = 0; Dump/Restore: ensure no app writes since start).
- Update application configs (`DATABASE_URL`, dotenv files, Secret Manager refs); a Secret Manager example follows this list.
- Monitor logs for `ERROR 1045`, connection limit issues, or increased query latency.
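If the connection string lives in Secret Manager, the config change can be as small as adding a new secret version; the secret name and connection URL below are placeholders:

```bash
# Placeholder secret name and connection string; adds a new version pointing at Cloud SQL.
echo -n "mysql://app_user:<PW>@<CLOUDSQL-IP>:3306/mydb" | \
  gcloud secrets versions add prod-database-url --data-file=-
```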
Tips and Caveats
- Dry-run on Staging: Always simulate on a copy with prod-like size and topology.
- Throughput bottlenecks: For >500GB, use high-throughput network links (e.g., VPN/Direct Interconnect); consider parallel table exports.
- Extension support: Cloud SQL restricts some PostgreSQL extensions; test compatibility (e.g., `postgis` needs extra setup).
- TLS enforcement: Google Cloud SQL generally requires SSL connections; update application drivers accordingly.
- Timeouts/errors: Cloud SQL import jobs have hard limits (near 24 hours in some regions); monitor duration, chunk large dumps, and check operation status from the CLI (see below).
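Long-running imports can be watched from the CLI by listing operations on the target instance (the instance name is a placeholder):

```bash
# Placeholder instance name; shows recent import/export operations and their status.
gcloud sql operations list --instance=target-mysql --limit=10
```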
Conclusion
Vendor-neutral database portability is achievable with either DMS for online replication or classic dump/restore for simpler cases. Set up robust validations (row counts, schema diffs, custom health checks) before updating production configs. Don't skip the dry run. And, as always, be ready to roll back fast if post-migration latency or consistency issues emerge.
Reference Logs from the Field:
[2024-03-19T11:04:05Z] DMS_INITIAL_COPY_COMPLETE: Initial sync succeeded
[2024-03-19T12:13:52Z] DMS_REPLICATION_LAG_WARNING: Replica lag 15s on table 'orders'
[2024-03-19T12:20:51Z] DMS_CUTOVER_RECOMMENDED: Replica lag <1s, OK to cutover
For more, see GCP DMS documentation or ping your infrastructure team’s #cloud-sql-migration Slack channel.
Note: Some teams use native replication or dual-write proxy designs for zero-downtime cutovers. These introduce operational complexity outside typical migration scopes.