Oracle to AWS: Practical Low-Downtime Database Migration Tactics
Deciding to run Oracle workloads on AWS is rarely about the software—it’s about scale, costs, and freeing engineering time from hardware maintenance. But Oracle migrations are notorious for their operational risk, especially around downtime. The real objective: keep systems online and users satisfied throughout the transition.
Why Move Oracle to AWS?
For workloads like high-throughput financial ledgers or CRM systems, Oracle delivers. But running on dedicated infrastructure leads to bottlenecks: rigid scaling, large up-front expenses, and slow disaster recovery.
Running Oracle (11g–19c) on AWS buys:
- Automatic failovers (Multi-AZ in RDS)
- Storage elasticity (no more lengthy SAN expansions)
- Consumption-based costs (stop overprovisioning)
- Disaster-recovery runbooks that shrink to “click and wait”
Yet, one wrong move can impact users for hours—or worse, days.
Core Migration Obstacle: Downtime Control
Can you tolerate four hours offline during quarter close? Most can’t. True low-downtime migration hinges on pre-migration analysis, continuous change replication, and flawless cutover logistics. Skipping even a single dependency check risks a post-migration scramble.
Pre-Migration Groundwork: Environment and Workload Assessment
Start with facts:
-- Find database version and size
SELECT * FROM v$version;
SELECT SUM(bytes)/1024/1024/1024 AS DB_SIZE_GB FROM dba_segments;
- Map all interdependent schemas.
- List out triggers, DB links, and scheduled jobs (an inventory query sketch follows this list).
- Benchmark: capture AWR reports for both typical and peak loads.
- Measure site-to-AWS bandwidth with iPerf; note: high latency can cripple LOB transfers.
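For the dependency inventory above, a minimal sketch using the python-oracledb driver; the credentials and DSN are placeholders, and the queries assume read access to the DBA_* views:
# inventory_dependencies.py -- quick inventory of objects that complicate migration.
# Assumes python-oracledb is installed and the user can read the DBA_* views;
# connection details below are placeholders.
import oracledb

QUERIES = {
    "db_links": "SELECT owner, db_link FROM dba_db_links",
    "triggers": "SELECT owner, trigger_name, table_name FROM dba_triggers",
    "scheduler_jobs": "SELECT owner, job_name, enabled FROM dba_scheduler_jobs",
}

conn = oracledb.connect(user="migration_audit", password="change_me",
                        dsn="onprem-db:1521/PROD")
with conn.cursor() as cur:
    for label, sql in QUERIES.items():
        cur.execute(sql)
        rows = cur.fetchall()
        print(f"{label}: {len(rows)} found")
        for row in rows[:10]:   # print a sample only
            print("   ", row)
conn.close()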
Critical gotcha: Legacy schema features? AWS SCT isn’t magic—hierarchical queries, custom datatypes, or complex PL/SQL packages may not port cleanly.
Determining the AWS Oracle Target
Three principal targets:
| Service | Use-Case | Gotcha |
|---|---|---|
| Amazon RDS for Oracle | Managed, automation-heavy | No access to OS, kernel-level ops |
| EC2 + Oracle | Full control | Maintenance overhead, licensing |
| AWS Outposts Oracle | On-prem/hybrid | Lead times, regional constraints |
Often, RDS with Multi-AZ suffices. But for advanced customization (e.g., custom patching or filesystem tweaks), EC2 is a fallback.
Continuous Replication with AWS DMS: Tactics
AWS Database Migration Service (DMS) is the go-to for phased cutovers. Out of the box, DMS supports Oracle sources (min 11.2.0.4) and targets—both RDS and EC2. Workflow:
- Initial Load: DMS creates target tables and migrates historical data.
- Change Data Capture (CDC): ongoing changes (INSERT/UPDATE/DELETE) are replicated in near-real time.
- Cutover: pause application writes, confirm replication lag is below 0.5 s, then swing over connection strings.
Configuration Example (task settings excerpt; a CDC start time such as 2024-06-14T20:10:00Z is supplied via the task's CdcStartTime/CdcStartPosition parameter, not in the task settings JSON):
{
  "FullLoadSettings": {
    "TargetTablePrepMode": "DO_NOTHING",
    "MaxFullLoadSubTasks": 8
  }
}
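To verify the cutover condition above (replication lag under 0.5 s), the DMS CloudWatch metrics CDCLatencySource and CDCLatencyTarget can be polled before pausing writes. A minimal boto3 sketch; the task identifier and threshold are placeholders:
# check_dms_lag.py -- poll DMS CDC latency metrics before starting the cutover (sketch).
# Assumes boto3 is configured; the task identifier and threshold are placeholders,
# and depending on your setup the metrics may also require the
# ReplicationInstanceIdentifier dimension.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
TASK_ID = "oracle-prod-migration"   # hypothetical replication task identifier
THRESHOLD_SECONDS = 0.5

def latest(metric_name):
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/DMS",
        MetricName=metric_name,   # CDCLatencySource / CDCLatencyTarget, in seconds
        Dimensions=[{"Name": "ReplicationTaskIdentifier", "Value": TASK_ID}],
        StartTime=datetime.now(timezone.utc) - timedelta(minutes=5),
        EndTime=datetime.now(timezone.utc),
        Period=60,
        Statistics=["Maximum"],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    return points[-1]["Maximum"] if points else float("inf")

source_lag = latest("CDCLatencySource")
target_lag = latest("CDCLatencyTarget")
print(f"source lag {source_lag}s, target lag {target_lag}s")
print("OK to cut over" if max(source_lag, target_lag) < THRESHOLD_SECONDS
      else "Lag too high -- hold the window")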
Side Note: DMS logs cryptic errors when table structures drift—watch for:
ERROR: Table definition mismatch, code '42080'
Oracle GoldenGate: Optional, Except for Complex Cases
For ultra-low-latency CDC or multi-path replication, GoldenGate (>=19.1) is unmatched:
- Active-active syncs (bi-directional) for split-brain tolerant apps
- Cross-platform replications (on-prem Solaris to AWS Linux? Supported.)
- Real-time transformation (e.g., encrypted number fields to hashed CHAR)
Real-world Example:
An insurance client with 4 TB of transactional data maintained production load throughout, running GoldenGate alongside DMS as rollback insurance. The final cutover clocked in at under 4 minutes of downtime, with all LOB columns intact and audit triggers retained.
Known Issue:
GoldenGate on AWS can saturate ENA-based network interfaces under bursty write loads. Mitigate with placement groups and monitor packets per second (the EC2 NetworkPacketsIn/NetworkPacketsOut CloudWatch metrics); a short check sketch follows.
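A quick CloudWatch spot-check for that saturation symptom; the instance ID and alert threshold are assumptions, and the realistic ceiling depends on the instance type:
# check_ena_pps.py -- spot-check packet throughput on the GoldenGate EC2 host (sketch).
# The instance ID and alert threshold are placeholders.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="NetworkPacketsOut",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
    EndTime=datetime.now(timezone.utc),
    Period=300,                 # 5-minute buckets (basic monitoring)
    Statistics=["Sum"],
)
for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
    pps = point["Sum"] / 300    # packets per period -> packets per second
    flag = "  <-- investigate" if pps > 500_000 else ""
    print(f"{point['Timestamp']:%H:%M} {pps:,.0f} pkt/s{flag}")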
Validation & Cutover Essentials
- Use dbms_comparison.compare and AWS DMS data validation post-migration (a lightweight cross-check sketch follows this list).
- Automate a pre-cutover checklist: connection string updates, failover scripts, and archive log clearance.
- Collect application-level metrics (response times, error rates) ready for regression.
- Prepare abort/rollback: keep source in read-only and verify client DNS caches—TTL settings can backfire here.
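As a coarse supplement to dbms_comparison and DMS validation, a row-count cross-check between source and target can catch gross divergence early. A python-oracledb sketch; the table list, credentials, and DSNs are placeholders:
# rowcount_crosscheck.py -- compare row counts between source and target (sketch).
# A coarse supplement to dbms_comparison and DMS validation, not a replacement;
# table list, credentials, and DSNs are placeholders.
import oracledb

TABLES = ["APP.ORDERS", "APP.ORDER_LINES", "APP.CUSTOMERS"]

source = oracledb.connect(user="audit", password="change_me",
                          dsn="onprem-db:1521/PROD")
target = oracledb.connect(user="audit", password="change_me",
                          dsn="mydb.abc123.us-east-1.rds.amazonaws.com:1521/PROD")

def row_count(conn, table):
    with conn.cursor() as cur:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]

for table in TABLES:
    src, tgt = row_count(source, table), row_count(target, table)
    print(f"{table}: source={src} target={tgt} {'OK' if src == tgt else 'MISMATCH'}")

source.close()
target.close()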
Execution: The Cutover Moment
Procedure:
- Quiesce the source DB: ALTER DATABASE CLOSE NORMAL; (or the equivalent for a logical standby, if used).
- Confirm DMS/GoldenGate lag is zero (aws dms describe-replication-tasks).
- Update application configs/secrets (prefer CI/CD-managed deployment, not ad hoc edits); a Secrets Manager sketch follows this list.
- Flip production traffic. Monitor CloudWatch and RDS Enhanced Monitoring for spikes.
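For the configs/secrets step, one common pattern is to keep the Oracle DSN in AWS Secrets Manager so the flip is a single secret update plus an application refresh. A minimal boto3 sketch; the secret name, key layout, and endpoint are assumptions:
# flip_connection_secret.py -- point applications at the new endpoint (sketch).
# Assumes apps read their Oracle DSN from Secrets Manager and reload it on rotation;
# the secret name, JSON layout, and endpoint below are placeholders.
import json
import boto3

secrets = boto3.client("secretsmanager", region_name="us-east-1")
SECRET_NAME = "prod/app/oracle-connection"                    # hypothetical secret name
NEW_DSN = "mydb.abc123.us-east-1.rds.amazonaws.com:1521/PROD" # placeholder RDS endpoint

current = json.loads(secrets.get_secret_value(SecretId=SECRET_NAME)["SecretString"])
current["dsn"] = NEW_DSN

secrets.put_secret_value(SecretId=SECRET_NAME, SecretString=json.dumps(current))
print(f"Secret {SECRET_NAME} now points at {NEW_DSN}")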
Downtime Window: in practiced hands, 2–10 minutes.
If you have over-tuned CDC batch sizes in DMS, lag can spike at cutover; test at production load scale beforehand.
Tips & Caveats from Production Migrations
- Bandwidth matters. Favor AWS Direct Connect for bulk transfers >1 TB; VPN throttles can choke full loads.
- Automate runbooks (Ansible/Terraform) to eliminate human cutover errors.
- Security parity: replicate Oracle Roles/Grants; RDS IAM authentication can simplify password policy management.
- Watch for Oracle sequence drift in active-active scenarios. Temporarily double sequence increments pre-cutover (a sketch follows this list).
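One hedged way to implement the doubled-increment trick, using python-oracledb: both sides step by 2, with the target nudged one value ahead so the streams interleave. The sequence name, credentials, and endpoints are placeholders, and it assumes both sides start from the same sequence value:
# interleave_sequences.py -- avoid sequence collisions during active-active CDC (sketch).
# Assumes both sequences currently match with INCREMENT BY 1; names and DSNs are placeholders.
import oracledb

SEQUENCE = "APP.ORDER_ID_SEQ"   # hypothetical sequence

source = oracledb.connect(user="dba_user", password="change_me",
                          dsn="onprem-db:1521/PROD")
target = oracledb.connect(user="dba_user", password="change_me",
                          dsn="mydb.abc123.us-east-1.rds.amazonaws.com:1521/PROD")

with source.cursor() as cur:
    # Source keeps its current parity and now steps by 2.
    cur.execute(f"ALTER SEQUENCE {SEQUENCE} INCREMENT BY 2")

with target.cursor() as cur:
    # Advance the target by one (while it still steps by 1), then double its step,
    # giving it the opposite parity from the source.
    cur.execute(f"SELECT {SEQUENCE}.NEXTVAL FROM dual")
    cur.execute(f"ALTER SEQUENCE {SEQUENCE} INCREMENT BY 2")

source.close()
target.close()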
Summary
Not all Oracle-to-AWS migrations are equal. With the right mix of pre-migration mapping, DMS/GoldenGate for CDC, and automation at cutover, downtime shrinks to minutes. But edge-cases—LOBs, proprietary datatypes, overloaded CDC—require both vigilance and tooling flexibility.
No migration is flawless, but with these approaches, surprises will be operational, not catastrophic.
Practical Question:
Ever encountered ORA-28040 incompatibility errors post-migration? Scenario-specific solutions are welcome below.