#Cloud #Database #Migration #MongoDB #DynamoDB #NoSQL

Migrating from MongoDB to DynamoDB: A Practical Guide

Data migrations between NoSQL systems often fail because teams underestimate how different the systems are beneath the surface. This guide covers the real-world challenges of moving production workloads from MongoDB (v4.x/v5.x) to DynamoDB, focusing on schema transformation, data integrity, and operational cutover.

Scenario: A SaaS team has an evolving user database in MongoDB Atlas. AWS adoption is driving a move to DynamoDB for operational simplification and tighter latency controls.


Why DynamoDB—Not Just Managed NoSQL

MongoDB’s flexible document model enables rapid feature evolution, at least early on. DynamoDB, in exchange, offers operational simplicity (no patching, scaling, or backup scripts), and native integration with AWS IAM, KMS, and Lambda closes several security and automation gaps typical of self-managed MongoDB clusters. Trade-off: DynamoDB’s rigid key schema and limited query model require careful up-front modeling.

Note: DynamoDB pricing is heavily influenced by read/write patterns. Underestimate traffic or choose the wrong capacity mode, and operational costs can spike.


Key Technical Differences

Data Model:
MongoDB stores data as BSON documents, allowing deep nesting, array fields, and a wide palette of types. DynamoDB items are simpler:

  • Partition key (mandatory), optional sort key
  • A smaller set of data types (String, Number, Binary, Boolean, Null, List, Map, and String/Number/Binary Sets); nesting via List and Map is possible but far less ergonomic than BSON

Indexing & Queries:
MongoDB supports ad-hoc queries, compound indexes, $lookup aggregation (limited joins), and text search. DynamoDB offers only key-based access through the primary key and secondary indexes (GSIs, LSIs); there are no joins, and a full Scan is the expensive fallback for anything else. Query flexibility is traded for predictable speed at scale.
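
To make this concrete, here is a minimal sketch of the same lookup in both systems (assumes the Users table and EmailIndex GSI designed later in this guide; connection strings are placeholders):

# Python: the same lookup in MongoDB (ad hoc) and DynamoDB (index-bound)
import boto3
import pymongo
from boto3.dynamodb.conditions import Key

mongo = pymongo.MongoClient('mongodb://...')
# MongoDB answers this whether or not an index on email exists
user = mongo['prod_db']['users'].find_one({'email': 'john@example.com'})

table = boto3.resource('dynamodb').Table('Users')
# A DynamoDB Query must target the table key or a secondary index
resp = table.query(
    IndexName='EmailIndex',
    KeyConditionExpression=Key('email').eq('john@example.com'),
)
items = resp['Items']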

Atomicity & Consistency:
Both systems guarantee atomicity at the single-document/single-item level; MongoDB 4.0+ also offers multi-document transactions, and DynamoDB’s counterpart is TransactWriteItems/TransactGetItems. DynamoDB reads are eventually consistent by default but can be requested as strongly consistent. Batch operations (BatchWriteItem) are not transactions.
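
A brief boto3 sketch of TransactWriteItems (the Counters table and attribute names are illustrative, not part of this migration):

import boto3

client = boto3.client('dynamodb')

# All operations commit atomically or none do; unlike BatchWriteItem,
# transactions also support condition expressions.
client.transact_write_items(
    TransactItems=[
        {
            'Put': {
                'TableName': 'Users',
                'Item': {'username': {'S': 'johndoe'}},
                # Fail the whole transaction if the username already exists
                'ConditionExpression': 'attribute_not_exists(username)',
            }
        },
        {
            'Update': {
                'TableName': 'Counters',
                'Key': {'name': {'S': 'user_count'}},
                'UpdateExpression': 'ADD #c :one',
                'ExpressionAttributeNames': {'#c': 'count'},
                'ExpressionAttributeValues': {':one': {'N': '1'}},
            }
        },
    ]
)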

Operational Model:
Backups in DynamoDB (on-demand and point-in-time recovery) have no performance impact and integrate with AWS Backup. Replication and high availability within a region are automatic.


Stepwise Migration: Technical Perspective

1. Analyze Current Collections and Usage

Perform a comprehensive audit (a small PyMongo sketch follows this list):

  • List collections/tables. Identify document schemas, including optional or polymorphic fields.
  • Extract frequent query patterns (which fields are queried, not just how often; db.collection.getIndexes() is a useful starting point).
  • Audit array and embedded document usage.
  • Note: MongoDB’s _id: ObjectId fields have no natural analog; DynamoDB keys are application-chosen strings, numbers, or binary values.
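
A minimal PyMongo sketch for this audit (connection string and sample size are placeholders):

import pymongo

client = pymongo.MongoClient('mongodb://...')
db = client['prod_db']

# For each collection: size, indexes, and the field names/types seen in a sample
for name in db.list_collection_names():
    coll = db[name]
    print(name, coll.estimated_document_count(), 'docs')
    print('  indexes:', list(coll.index_information()))
    field_types = {}
    for doc in coll.find().limit(1000):          # sample, not a full scan
        for field, value in doc.items():
            field_types.setdefault(field, set()).add(type(value).__name__)
    for field, types in sorted(field_types.items()):
        flag = ' <-- polymorphic' if len(types) > 1 else ''
        print(f"  {field}: {', '.join(sorted(types))}{flag}")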

Sample document shape (users):

{
  "_id": {"$oid": "614e020137c5ee64245b37c5"},
  "username": "johndoe",
  "roles": ["admin", "editor"],
  "profile": {"theme": "dark", "notifications": true},
  "created_at": {"$date": "2023-10-01T12:32:00.000Z"}
}

Known issue: BSON-specific types (Decimal128, Timestamp) require special handling or string casting.


2. DynamoDB Table and Index Design—Driven by Access Patterns

DynamoDB schema design starts from expected queries, not data shape.
Example:

  • If all lookups are by username, make that the partition key.
  • Secondary indexes may be needed for alternate access (e.g., by email or role).

Table definition (pseudo):

TableName: Users
PartitionKey: username (String)
Attributes:
  - profile (Map)
  - roles (String Set)
  - created_at (String, ISO8601)
GlobalSecondaryIndexes:
  - Name: EmailIndex
    PartitionKey: email
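
A boto3 equivalent of this definition, assuming on-demand billing for illustration:

import boto3

client = boto3.client('dynamodb')

# Only key attributes are declared up front; non-key attributes
# (profile, roles, created_at) are schemaless and simply appear on items.
client.create_table(
    TableName='Users',
    AttributeDefinitions=[
        {'AttributeName': 'username', 'AttributeType': 'S'},
        {'AttributeName': 'email', 'AttributeType': 'S'},
    ],
    KeySchema=[{'AttributeName': 'username', 'KeyType': 'HASH'}],
    GlobalSecondaryIndexes=[
        {
            'IndexName': 'EmailIndex',
            'KeySchema': [{'AttributeName': 'email', 'KeyType': 'HASH'}],
            'Projection': {'ProjectionType': 'ALL'},
        }
    ],
    BillingMode='PAY_PER_REQUEST',  # on-demand; switch to PROVISIONED if needed
)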

Gotcha: DynamoDB enforces uniqueness only on the primary key; GSIs are not unique, and there are no multi-attribute unique constraints. Enforce uniqueness (a unique email, for example) at the application level, typically with conditional writes.


3. Data Type Mapping & Transformation

Map MongoDB types to DynamoDB’s required formats:

MongoDB        DynamoDB        Notes
ObjectId       String          Use toHexString()
String         String
Int/Double     Number          Single Number type; no separate int/float
Boolean        Boolean
Date           String/Number   Store as ISO 8601 string or Unix timestamp
Array          List/Set        Use a Set only for unique, same-typed values
Embedded Doc   Map

Tip: For fields like created_at, always format to UTC ISO8601 for cross-language compatibility.
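
A hedged conversion helper along these lines (Decimal128 is cast to string per the earlier note; PyMongo's default naive-UTC datetimes are assumed):

from datetime import datetime, timezone
from decimal import Decimal

from bson import ObjectId
from bson.decimal128 import Decimal128

def to_dynamo_value(value):
    """Map a BSON/Python value to a DynamoDB-friendly representation."""
    if isinstance(value, ObjectId):
        return str(value)                         # equivalent of toHexString()
    if isinstance(value, Decimal128):
        return str(value)                         # string cast avoids precision loss
    if isinstance(value, datetime):
        # PyMongo returns naive UTC datetimes by default
        return value.replace(tzinfo=timezone.utc).isoformat()
    if isinstance(value, float):
        return Decimal(str(value))                # boto3 requires Decimal, not float
    if isinstance(value, dict):
        return {k: to_dynamo_value(v) for k, v in value.items()}   # -> Map
    if isinstance(value, list):
        return [to_dynamo_value(v) for v in value]                  # -> List
    return value                                  # str, int, bool, None pass through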


4. Export: MongoDB Extraction

For datasets <10M records, native tools suffice.
Typical commands:

mongoexport --db prod_db --collection users --jsonArray --out users.json

Or, for finer control:

# MongoDB Python driver (Python >= 3.8)
import bson.json_util
import pymongo

client = pymongo.MongoClient('mongodb://...')
# Stream the cursor rather than materializing the whole collection in memory;
# json_util preserves BSON types (ObjectId, dates) as Extended JSON.
with open('users.json', 'w') as f:
    for doc in client['prod_db']['users'].find({}):
        f.write(bson.json_util.dumps(doc) + '\n')

Known issue: mongoexport serializes some types (DBRefs, for example) differently from the drivers. Test the round trip before trusting either path.


5. Transform: From BSON to DynamoDB Format

Automate type and key conversion. Most migrations use a custom script (Python or Node.js). See below:

// Node.js, AWS SDK v3 low-level attribute-value format
function mongoToDynamoItem(doc) {
  return {
    // Keep the original ObjectId for audits and legacy lookups (see the tip below)
    mongo_id: { S: doc._id.toHexString() },
    username: { S: doc.username },
    // String Set: valid only while roles is a non-empty array of unique strings
    roles: { SS: doc.roles },
    profile: {
      M: {
        theme: { S: doc.profile.theme },
        notifications: { BOOL: doc.profile.notifications }
      }
    },
    // created_at is a Date when read via the driver; store as UTC ISO 8601
    created_at: { S: doc.created_at.toISOString() }
  };
}

Edge case: arrays containing duplicates, mixed types, or no elements at all cannot be stored as a DynamoDB set; use a List instead.


6. Ingest: Loading into DynamoDB

For ≤100,000 records:

  • PutItem in a loop is sufficient, but beware of write throttling; a minimal boto3 sketch follows.
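
A minimal boto3 loop of that shape (transformed_items is a hypothetical list of items already converted to plain Python dicts; boto3's resource layer serializes them, but requires Decimal rather than float for numbers):

import boto3

table = boto3.resource('dynamodb').Table('Users')

# One PutItem call per record; simple, but every call counts against
# the table's write throughput.
for item in transformed_items:
    table.put_item(Item=item)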

For medium/large datasets (>100,000):

  • Use BatchWriteItem (max 25 items or 16MB per call).
    Example with AWS SDK v3 (simplified; add backoff between retries in production):
    // SDK v3, simplified batching
    import { DynamoDBClient, BatchWriteItemCommand } from '@aws-sdk/client-dynamodb';
    const client = new DynamoDBClient({});

    async function batchWrite(items) {
      while (items.length) {
        const chunk = items.splice(0, 25);   // BatchWriteItem caps at 25 items per call
        const resp = await client.send(new BatchWriteItemCommand({
          RequestItems: {
            'Users': chunk.map(i => ({ PutRequest: { Item: i } }))
          }
        }));
        // Re-queue anything DynamoDB could not process (usually throttling)
        const leftover = resp.UnprocessedItems?.Users ?? [];
        items.push(...leftover.map(r => r.PutRequest.Item));
      }
    }
    

Known throttling issue:
Provision adequate write capacity (or use on-demand mode) and retry with exponential backoff on ProvisionedThroughputExceededException. BatchWriteItem also reports partially failed writes under UnprocessedItems, which must be resubmitted:

{
  "UnprocessedItems": {
    "Users": [
      {
        "PutRequest": { ... }
      }
    ]
  }
}
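
In Python, boto3's batch_writer handles the 25-item chunking and resubmits unprocessed items automatically; a sketch under the same assumptions as the PutItem loop above:

import boto3

table = boto3.resource('dynamodb').Table('Users')

# batch_writer buffers puts into BatchWriteItem calls of up to 25 items
# and re-sends anything returned as UnprocessedItems.
with table.batch_writer() as batch:
    for item in transformed_items:
        batch.put_item(Item=item)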

For >10M records or complex pipelines:

  • Consider AWS Glue (Spark), DMS, or writing directly to S3 as DynamoDB JSON.

7. Validation and Verification

Critical step.

  • Count checks: db.collection.countDocuments() vs. the DynamoDB item count (note that ItemCount from DescribeTable refreshes only about every six hours; a Scan with Select=COUNT gives a current figure, at a cost).
  • Data integrity: Pick records at random and compare all fields (consider automation; see the sketch after this list).
  • Access pattern simulation: Run production queries or via scripts; verify performance and accuracy.
  • Application-side rewrite: Queries using $in, aggregation pipeline, or deep projections will break and need refactoring.
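
A spot-check sketch comparing a random MongoDB sample against DynamoDB (field normalization is simplified; assumes username is the partition key and roles was stored as a String Set):

import boto3
import pymongo

mongo_users = pymongo.MongoClient('mongodb://...')['prod_db']['users']
table = boto3.resource('dynamodb').Table('Users')

# Pull a random sample from MongoDB and compare field by field with DynamoDB.
mismatches = 0
for doc in mongo_users.aggregate([{'$sample': {'size': 100}}]):
    item = table.get_item(Key={'username': doc['username']}).get('Item')
    if item is None:
        print('missing in DynamoDB:', doc['username'])
        mismatches += 1
        continue
    for field in ('roles', 'profile'):
        # boto3 returns String Sets as Python sets, Maps as dicts
        mongo_val = set(doc[field]) if isinstance(doc[field], list) else doc[field]
        if item.get(field) != mongo_val:
            print('field mismatch:', doc['username'], field)
            mismatches += 1
print('mismatches:', mismatches)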

Side note: Subtle mismatches in map vs. list encoding aren't always visible via the console. Inspect with AWS CLI:

aws dynamodb get-item --table-name Users --key '{"username":{"S":"johndoe"}}'

8. Cutover & Ops

Cutover Playbook:

  • Freeze writes to MongoDB (if possible, maintenance mode).
  • Replay the tail of changes using MongoDB Change Streams or an oplog tailer (see the sketch after this list).
  • Redirect production traffic to DynamoDB.
  • Monitor success/error rates with CloudWatch; review throttling and sudden latency shifts.
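
For the change-replay step, a minimal Change Streams sketch (requires a replica set; resume-token persistence is omitted, and transform() stands in for the step 5 mapping):

import boto3
import pymongo

users = pymongo.MongoClient('mongodb://...')['prod_db']['users']
table = boto3.resource('dynamodb').Table('Users')

# Tail changes made after the bulk export and replay them into DynamoDB.
# A real cutover should persist the resume token so the tail can restart safely.
with users.watch(full_document='updateLookup') as stream:
    for change in stream:
        op = change['operationType']
        if op in ('insert', 'update', 'replace'):
            table.put_item(Item=transform(change['fullDocument']))
        elif op == 'delete':
            # Deletes carry only the _id; an _id -> username lookup is needed
            print('manual handling needed for delete:', change['documentKey'])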

After Go-Live:

  • Tune indexes (Global Secondary Indexes for new access needs).
  • Use Auto Scaling or On-Demand capacity.
  • Revisit IAM policies—least privilege for all table access.

Not perfect: DynamoDB doesn’t replicate MongoDB’s aggregation pipeline or rich text search. Offload reporting/analytics elsewhere (e.g., DynamoDB Streams feeding Lambda into OpenSearch/Elasticsearch).


Summary

A migration from MongoDB to DynamoDB is not a mechanical export/import task. Treat it as a schema and process redesign, dictated by query access and operational priorities. Manual field mapping and index design are non-optional.
If deep arrays, polymorphic documents, or $lookup joins are frequent, expect to rewrite significant portions of application logic.

Tip, learned in practice:
Store the original Mongo _id as a string attribute (as in the transform sketch above); legacy systems and future audits will appreciate the hint.


For edge cases, anomalies, or tailored scripts, reach out. Migration rarely fits a one-size-fits-all strategy.