MongoDB to DocumentDB

#Cloud #Databases #Migration #MongoDB #DocumentDB #AWS

Seamless Migration: Transitioning MongoDB Workloads to Amazon DocumentDB Without Downtime

Enterprises running MongoDB frequently encounter operational overhead—manual failovers, backup management, scaling work, regular security patching. Teams looking for AWS-native integration often see operational and security advantages in migrating to Amazon DocumentDB, Amazon’s managed MongoDB-compatible service. However, the migration path is not a straight swap: API support, replication topologies, and feature sets differ, introducing complications during live cutover.

Below: an engineer’s approach to minimizing application downtime and data drift during such a migration, with hard details you’ll need for a production move.


Key Drivers for Shifting from MongoDB to DocumentDB

  • Operational Overhead: Offload backups, patching, and host management to AWS.
  • Native AWS integration: VPC, CloudWatch, IAM, and KMS support out of the box.
  • Availability Targets: Multi-AZ clusters designed for 99.99% uptime.
  • Security Posture: Encryption at rest with KMS; fine-grained access using IAM roles.

Trade-off: DocumentDB does not claim feature parity with MongoDB. Notable gaps: aggregation pipeline stages, oplog tailing, and older or edge-case wire protocol features. Multi-document transactions are available (since DocumentDB 4.0+), but review the subset you rely on: https://docs.aws.amazon.com/documentdb/latest/developerguide/functionality-differences.html
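Before committing, it helps to inventory the aggregation stages your codebase actually uses and check them against that differences page. A minimal sketch; the deny-list below is illustrative only, build the real one from the AWS page for your target engine version:

```python
# Deny-list of aggregation stages to treat as unsupported. ILLUSTRATIVE ONLY:
# populate it from the AWS functionality-differences page for your engine version.
UNSUPPORTED_STAGES = {"$facet", "$bucket"}

def check_pipeline(pipeline):
    """Return any stages in an aggregation pipeline that appear on the deny-list."""
    used = {stage for step in pipeline for stage in step}
    return sorted(used & UNSUPPORTED_STAGES)

pipeline = [
    {"$match": {"status": "active"}},
    {"$facet": {"byRegion": [{"$sortByCount": "$region"}]}},
]
print(check_pipeline(pipeline))  # → ['$facet']
```

Running this over every pipeline your application ships flags rework items before, not during, cutover.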


Migration Constraints

  • Replication Model: MongoDB’s oplog-based replica sets vs. DocumentDB’s distributed storage backend. No drop-in streaming.
  • Wire Protocol: DocumentDB implements a subset of the MongoDB API, and coverage varies by engine version. Test every critical feature, including your indexes.
  • Connection TLS: DocumentDB enables TLS by default; clients must trust Amazon’s RDS CA bundle.
  • Authentication: SCRAM mechanisms are supported; x.509 and LDAP are not.
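Putting those constraints together, a working DocumentDB connection string typically looks like the following (host and credentials are placeholders mirroring the examples below, with the `@` in the password percent-encoded); note retryWrites=false, which AWS recommends because DocumentDB does not support retryable writes:

```
mongodb://admin:securepasswd%402024@prod-migrate-docdb.cluster-XXXXXXXXXX.us-east-1.docdb.amazonaws.com:27017/?tls=true&tlsCAFile=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false
```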

Migration Blueprint (Zero Downtime)

Here’s the migration pattern, tuned for live environments:

  1. Stand up a DocumentDB Cluster (provisioning, basic network and security setup)
  2. Export Baseline Data (mongodump with attention to versions)
  3. Initial Import to DocumentDB (mongorestore)
  4. Live Change Synchronization (Change Streams daemon or AWS DMS in Change Data Capture mode)
  5. Read/Write Traffic Preparation (feature flags, dual writing, or canarying)
  6. Application Cutover—Endpoint Switch
  7. Retirement of Legacy Stack

Consider failure domains and timeouts at each phase.


1. Provision DocumentDB (Engine Version and Instance Sizing Matter)

Always select the latest DocumentDB engine version compatible with your workload (as of 2024, 5.0.x). Cluster setup requires at least one instance (a primary, plus at least one replica for HA), VPC configuration, and security groups.

aws docdb create-db-cluster \
  --db-cluster-identifier prod-migrate-docdb \
  --engine-version 5.0.0 \
  --master-username admin \
  --master-user-password 'securepasswd@2024' \
  --vpc-security-group-ids sg-abcdef123

Instances provisioned with:

aws docdb create-db-instance \
  --db-instance-identifier prod-docdb-instance-1 \
  --db-cluster-identifier prod-migrate-docdb \
  --engine docdb \
  --db-instance-class db.r6g.large

Note: For real traffic, instance class sizing should reflect storage IO needs, not just CPU/memory. Under-sizing results in elevated write latency or replication lag.


2. Full Data Export from MongoDB

Use mongodump from a machine close (network-wise) to the Mongo source. Match the CLI tool version to the source MongoDB version to avoid BSON compatibility headaches.

mongodump --uri="mongodb://prod_user:password@mongo-source-01:27017/prod_db" \
  --out=/migration/dump-2024-06-11

Gotcha: Large collections may require additional RAM; set --numParallelCollections=1 if seeing memory spikes.


3. Import to DocumentDB

DocumentDB is best loaded via mongorestore. SSL is enforced; the CA bundle is available here: https://docs.aws.amazon.com/documentdb/latest/developerguide/connect_programmatically.html

mongorestore --host prod-migrate-docdb.cluster-XXXXXXXXXX.us-east-1.docdb.amazonaws.com:27017 \
  --username admin \
  --password 'securepasswd@2024' \
  --ssl \
  --sslCAFile rds-combined-ca-bundle.pem \
  /migration/dump-2024-06-11/prod_db

Note: Indexes are mostly preserved—but TTL settings and certain unique index options can silently fail due to unsupported features. Validate post-import:

db.getCollectionNames().forEach(function (c) { printjson(db.getCollection(c).getIndexes()); })
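A quick way to catch silently dropped indexes is to diff the getIndexes() output from both sides. A minimal sketch in Python that compares index names only (the specs shown are illustrative; extend the comparison to options such as TTL and uniqueness as needed):

```python
def missing_indexes(source_specs, target_specs):
    """Names of indexes present on the source but absent on the target."""
    src = {spec["name"] for spec in source_specs}
    tgt = {spec["name"] for spec in target_specs}
    return sorted(src - tgt)

# Illustrative getIndexes() output from each side
source = [{"name": "_id_"}, {"name": "email_1"}, {"name": "createdAt_ttl"}]
target = [{"name": "_id_"}, {"name": "email_1"}]   # TTL index silently dropped
print(missing_indexes(source, target))  # → ['createdAt_ttl']
```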

4. Continuous Change Sync: Change Streams Daemon or AWS DMS

Post-restore, your destination is immediately outdated as writes hit the source. Bridge this gap with live change replication.

Option A: Change Streams Listener (Manual, Flexible, More Engineering Overhead)

Use a Change Streams consumer (available since MongoDB 3.6) to tail prod_db for inserts/updates/deletes and apply them to DocumentDB in near real time. Node.js, Python, and Java drivers all work. Skeleton example (Node.js):

const { MongoClient } = require('mongodb');

const sourceClient = new MongoClient(SRC_URI);
const targetClient = new MongoClient(DOCDB_URI, {
    tls: true,
    tlsCAFile: 'rds-combined-ca-bundle.pem'
});

await sourceClient.connect();
await targetClient.connect();

const cs = sourceClient.db('prod_db').collection('orders').watch();
cs.on('change', async (change) => {
    // Switch on change.operationType (insert/update/delete) and re-apply
    // the write against the corresponding DocumentDB collection
});

Known issue: Transient network interruptions can break Change Stream cursors. Build in backoff/reconnect. De-duplication is also required to handle idempotency during retries.
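The reconnect pattern can be sketched as below, with hypothetical open_stream and apply hooks standing in for your driver calls (e.g. watch(resume_after=token) in PyMongo). The resume token should be persisted durably between runs:

```python
import time

def backoff_delays(base=1.0, cap=30.0):
    """Yield 1, 2, 4, ... seconds of delay, capped, forever."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= 2

def run_consumer(open_stream, apply, sleep=time.sleep):
    """Tail a change stream, resuming from the last token after drops.

    open_stream(token) and apply(change) are hypothetical hooks onto your
    driver; apply must be idempotent, since retries can re-deliver events.
    """
    token = None
    delays = backoff_delays()
    while True:
        try:
            for change in open_stream(token):
                apply(change)              # idempotent re-apply to DocumentDB
                token = change["_id"]      # resume token; persist this durably
            return token                   # stream exhausted (demo/test only)
        except ConnectionError:
            sleep(next(delays))            # transient drop: back off, then resume
```

In production the loop never returns; the return on stream exhaustion is only there to make the sketch easy to exercise against a finite fake stream.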

Option B: AWS DMS (Managed, Works for Most Use Cases)

AWS Database Migration Service can orchestrate full load + ongoing change capture (CDC). Define MongoDB as source, DocumentDB as target. Set up a replication instance in the same VPC. Configurable task settings allow full load, then CDC:

  • MongoDB endpoints: the source must be a replica set, with a user able to read the oplog.
  • DocumentDB endpoints: enable SSL mode on the endpoint and upload the RDS CA certificate to DMS.

DMS Error Example:

ERROR: Oplog not enabled or user lacks permissions [MongoDB Source Endpoint]

Ensure you have a user with readAnyDatabase, dbAdmin, and readWrite on the source DB.

Side note: DMS does not capture all cluster-level operations; new collections created during migration may need manual synchronization.
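One way to catch these is to diff the collection name lists (from list_collection_names() / db.getCollectionNames() on each side) at cutover time. A minimal sketch over plain lists:

```python
def missing_on_target(source_colls, target_colls):
    """Collections present on the source but absent on the target."""
    return sorted(set(source_colls) - set(target_colls))

# Illustrative name lists; invoices_2024 was created after the DMS task started
src = ["orders", "users", "invoices_2024"]
tgt = ["orders", "users"]
print(missing_on_target(src, tgt))  # → ['invoices_2024']
```

Any collection it reports needs a one-off dump/restore (or a DMS task reload) before cutover.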


5. Application Traffic Preparation

At this stage, DocumentDB is nearly in sync behind MongoDB. There are three realistic application cutover models:

  • Flag-driven endpoint switch: Deploy a feature flag controlling DB connection logic; allows instant rollback.
  • Dual-writes: Application writes to both MongoDB and DocumentDB during a short window. Ensures no write is lost but can add latency and consistency headaches.
  • Windowed Pause: Brief maintenance window (seconds to a minute): halt writes, finish the last change sync (“write quiesce”), cut over endpoints.

For strict zero downtime, dual writes or feature-flag switch are preferred, but they increase operational complexity.
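The dual-write model can be sketched as a thin wrapper, shown here with hypothetical primary/secondary client objects; in a real system these would be MongoDB and DocumentDB collection handles, and the flag would come from your feature-flag service:

```python
import logging

log = logging.getLogger("dualwrite")

class DualWriter:
    """Write to the primary (source of truth); mirror to the secondary best-effort."""

    def __init__(self, primary, secondary, dual_writes_enabled=True):
        self.primary = primary
        self.secondary = secondary
        self.enabled = dual_writes_enabled   # feature flag: instant rollback

    def insert_one(self, doc):
        result = self.primary.insert_one(doc)     # must succeed or raise
        if self.enabled:
            try:
                self.secondary.insert_one(doc)    # mirrored write
            except Exception:
                # Never fail the request over the mirror; log for reconciliation
                log.exception("secondary write failed; reconcile later")
        return result
```

Failed mirror writes are logged rather than raised, which is exactly the consistency headache the bullet above warns about: a reconciliation pass (or the still-running change sync) has to repair them.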


6. Cutover to DocumentDB

  • Pause traffic, drain in-flight writes (queue or maintenance endpoint).
  • Ensure the change stream/DMS lag is zero (aws dms describe-replication-tasks --filters ...)
  • Update application secrets/config to point to DocumentDB endpoint(s).
  • Deploy. Gradually ramp up traffic if possible.

Monitoring required: CloudWatch metrics for ReplicaLag, CPUUtilization, and connection count. Watch application metrics too—occasionally, differences in query planner or aggregation support cause production issues.


7. Post-Cutover Cleanup

  • Maintain the Change Stream/DMS until confident no data drift exists (typically 24-48h).
  • Run a schema diff or row counts between prod_db on both platforms. Scripts like mongo-data-matcher or simple aggregation checks can help.
  • Archive or decommission the legacy MongoDB after backup validation.

Tip: Retain the original mongodump archive at rest for at least a month after migration for emergency point-in-time rollback.


Practical Example: Dealing with Index Conflicts

During one production migration, a unique index (users.email) failed to import as DocumentDB rejected partially null fields unsupported under its unique constraint logic. Manual step: Cleanse records prior to dump, or drop/recreate the index post-import with compatible options.
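A pre-dump cleansing pass can surface such conflicts before they break the import. A sketch over plain dicts (wiring it to a live cursor is left to your tooling); it reports both null/missing values and outright duplicates on the field the unique index covers:

```python
def index_conflicts(docs, field="email"):
    """Return (_ids with a null/missing field, duplicated field values)."""
    null_ids, seen, dupes = [], {}, set()
    for doc in docs:
        value = doc.get(field)
        if value is None:
            null_ids.append(doc["_id"])
        elif value in seen:
            dupes.add(value)
        else:
            seen[value] = doc["_id"]
    return null_ids, sorted(dupes)

# Illustrative documents only
docs = [
    {"_id": 1, "email": "a@example.com"},
    {"_id": 2},                              # missing email
    {"_id": 3, "email": None},               # explicit null
    {"_id": 4, "email": "a@example.com"},    # duplicate
]
print(index_conflicts(docs))  # → ([2, 3], ['a@example.com'])
```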


Migration Checklist

| Step | Tool/Action | Notes |
|---|---|---|
| Provision DB | AWS CLI | Use correct engine version |
| Export data | mongodump | Match tool version to source |
| Import data | mongorestore | Validate indexes |
| Live sync | Change Streams daemon / AWS DMS | Watch for lag, schema drift |
| Traffic switch | Feature flag / dual write / pause + swap | Test in lower environment first |
| Validation | Data diff scripts, query consistency | Look for missing edge-case features |
| Legacy retire | Archive dumps, decommission VMs/resources | Keep for at least 30 days |

Final tip: Always test your entire migration pipeline against a production-sized QA dataset before scheduling the production run. Subtle DocumentDB behaviors around aggregation and index support surface only under real data.

Migration between MongoDB and DocumentDB is never fully “push button”—but with a careful blend of bulk export/import and change replication, coupled with rigorous cutover validation, a seamless transition is absolutely achievable.

For precise code samples, DMS troubleshooting, or lessons learned under production load, reach out.