Migrating from MongoDB to DynamoDB: A Practical Guide
Data migrations between NoSQL systems often fail due to underestimating the differences beneath the surface. This guide covers the real-world challenges of moving production workloads from MongoDB (v4.x/v5.x) to DynamoDB, focusing on schema transformation, data integrity, and operational cutover.
Scenario: A SaaS team has an evolving user database in MongoDB Atlas. AWS adoption is driving a move to DynamoDB for operational simplification and tighter latency controls.
Why DynamoDB—Not Just Managed NoSQL
MongoDB's flexible document model enables rapid feature evolution, at least early on. However, DynamoDB offers operational simplicity (no patching, scaling, or backup scripts). Native integration with AWS IAM, KMS, and Lambda closes several security and automation gaps typical of self-managed MongoDB clusters. Trade-off: DynamoDB's rigid key schema and limited query model require careful up-front modeling.
Note: DynamoDB pricing is heavily influenced by read/write patterns. Underestimate traffic or choose the wrong capacity mode, and operational costs can spike.
Key Technical Differences
Data Model:
MongoDB stores data as BSON documents, allowing deep nesting, array fields, and a wide palette of types. DynamoDB items are attribute maps built around a mandatory key schema, with a much narrower type system:
- Partition key (mandatory), optional sort key
- Limited data types (String, Number, Binary, Boolean, Null, String Set, Number Set, Binary Set, List, Map); nesting is possible via List and Map, but far less ergonomic than BSON
Indexing & Queries:
MongoDB supports ad-hoc queries, compound indexes, $lookup aggregation (limited joins), and text search. DynamoDB features only primary and secondary indexes (GSI, LSI); no multi-table JOINs. Query flexibility is traded for speed at scale.
Atomicity & Consistency:
Both systems guarantee single-document (or single-item) atomicity, and MongoDB 4.x+ adds multi-document transactions on replica sets. DynamoDB reads are eventually consistent by default but can be requested as strongly consistent, and TransactWriteItems offers multi-item transactions; batch operations (BatchWriteItem) are not transactional.
Operational Model:
Backups in DynamoDB are zero-impact and integrate with AWS Backup. Replication and high availability are automatic.
Stepwise Migration: Technical Perspective
1. Analyze Current Collections and Usage
Perform a comprehensive audit:
- List collections/tables. Identify document schemas, including optional or polymorphic fields.
- Extract frequent query patterns (what is queried, not just how often; db.collection.getIndexes() can help here; see the audit sketch after this list).
- Audit array and embedded document usage.
- Note: MongoDB's _id: ObjectId fields do not have a natural analog; DynamoDB uses arbitrary strings or numbers as keys.
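A minimal audit sketch with pymongo (the database name and sampling depth are assumptions):

# Quick audit: collections, counts, indexes, and observed top-level fields
import pymongo

client = pymongo.MongoClient('mongodb://...')
db = client['prod_db']

for name in db.list_collection_names():
    coll = db[name]
    print(name, coll.estimated_document_count())
    print('  indexes:', [ix['key'] for ix in coll.list_indexes()])
    fields = set()
    for doc in coll.find({}).limit(100):   # sample to spot optional/polymorphic fields
        fields.update(doc.keys())
    print('  observed fields:', sorted(fields))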
Sample document shape (users):
{
"_id": {"$oid": "614e020137c5ee64245b37c5"},
"username": "johndoe",
"roles": ["admin", "editor"],
"profile": {"theme": "dark", "notifications": true},
"created_at": {"$date": "2023-10-01T12:32:00.000Z"}
}
Known issue: BSON-specific types (Decimal128, Timestamp) require special handling or string casting.
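A hedged conversion helper for those types; mapping Decimal128 to a string (rather than a Number) is an assumption here, chosen to avoid precision surprises:

from bson.decimal128 import Decimal128
from bson.timestamp import Timestamp

def normalize_bson_value(value):
    # Cast BSON-specific types to portable representations before the DynamoDB transform.
    if isinstance(value, Decimal128):
        return str(value)                        # keep full precision as a string
    if isinstance(value, Timestamp):
        return value.as_datetime().isoformat()   # BSON Timestamp is an internal clock value, not a Date
    return value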
2. DynamoDB Table and Index Design—Driven by Access Patterns
DynamoDB schema design starts from expected queries, not data shape.
Example:
- If all lookups are by username, make that the partition key.
- Secondary indexes may be needed for alternate access (e.g., by email or role).
Table definition (pseudo):
TableName: Users
PartitionKey: username (String)
Attributes:
- profile (Map)
- roles (String Set)
- created_at (String, ISO8601)
GlobalSecondaryIndexes:
- Name: EmailIndex
PartitionKey: email
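For reference, a minimal boto3 sketch of that definition (the billing mode and the ALL projection are assumptions, not requirements):

import boto3

client = boto3.client('dynamodb')

# Only key and index attributes are declared; everything else stays schemaless.
client.create_table(
    TableName='Users',
    AttributeDefinitions=[
        {'AttributeName': 'username', 'AttributeType': 'S'},
        {'AttributeName': 'email', 'AttributeType': 'S'},
    ],
    KeySchema=[{'AttributeName': 'username', 'KeyType': 'HASH'}],
    GlobalSecondaryIndexes=[{
        'IndexName': 'EmailIndex',
        'KeySchema': [{'AttributeName': 'email', 'KeyType': 'HASH'}],
        'Projection': {'ProjectionType': 'ALL'},
    }],
    BillingMode='PAY_PER_REQUEST',
)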
Gotcha: DynamoDB enforces uniqueness only on the primary key; there are no unique constraints on other attributes (GSI keys included). Application-level enforcement is needed (a sketch follows).
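Primary-key uniqueness can be guarded with a conditional put; uniqueness on non-key attributes such as email needs a separate lookup item or table, typically written in the same TransactWriteItems call. A minimal sketch of the conditional put (create_user is a hypothetical helper):

import boto3
from botocore.exceptions import ClientError

client = boto3.client('dynamodb')

def create_user(item):
    try:
        client.put_item(
            TableName='Users',
            Item=item,
            # Reject the write if an item with this username already exists.
            ConditionExpression='attribute_not_exists(username)',
        )
    except ClientError as err:
        if err.response['Error']['Code'] == 'ConditionalCheckFailedException':
            raise ValueError('username already taken') from err
        raise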
3. Data Type Mapping & Transformation
Map MongoDB types to DynamoDB’s required formats:
MongoDB | DynamoDB | Notes |
---|---|---|
ObjectId | String | Use toHexString() (or str() in Python) |
String | String | |
Int/Double | Number | DynamoDB has no separate int/float types |
Boolean | Boolean | |
Date | String or Number | Store as an ISO8601 string or a Unix-epoch Number |
Array | List or Set | Use a Set only for unique, same-typed values |
Embedded Doc | Map | |
Tip: For fields like created_at, always format to UTC ISO8601 for cross-language compatibility.
4. Export: MongoDB Extraction
For datasets <10M records, native tools suffice.
Typical commands:
mongoexport --db prod_db --collection users --jsonArray --out users.json
Or, for finer control:
# MongoDB Python Driver (python>=3.8)
import bson.json_util
import pymongo

client = pymongo.MongoClient('mongodb://...')

# Stream the cursor rather than materializing the whole collection in memory;
# Extended JSON (json_util) preserves BSON types for the transform step.
with open('users.json', 'w') as f:
    for doc in client['prod_db']['users'].find({}):
        f.write(bson.json_util.dumps(doc) + '\n')
Known issue: mongoexport will flatten some types (like DBRefs) differently than drivers. Test round-trip.
5. Transform: From BSON to DynamoDB Format
Automate type and key conversion. Most migrations use a custom script (Python or Node.js). See below:
// Node.js, AWS SDK v3
function mongoToDynamoItem(doc) {
return {
username: { S: doc.username },
roles: { SS: doc.roles },
profile: {
M: {
theme: { S: doc.profile.theme },
notifications: { BOOL: doc.profile.notifications }
}
},
created_at: { S: doc.created_at.toISOString() }
};
}
Edge: Arrays containing duplicate or non-string types can’t go to a DynamoDB set—use List instead.
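If the transform runs in Python instead, boto3's TypeSerializer produces the same attribute-value format; a minimal sketch that falls back to a List when roles cannot be a String Set (field names assumed from the sample document above):

from boto3.dynamodb.types import TypeSerializer

serializer = TypeSerializer()

def mongo_to_dynamo_item(doc):
    roles = doc.get('roles', [])
    # A String Set needs unique, non-empty string members; otherwise keep a plain List.
    use_set = bool(roles) and len(set(roles)) == len(roles) and all(isinstance(r, str) for r in roles)
    item = {
        'username': doc['username'],
        'roles': set(roles) if use_set else roles,
        'profile': doc.get('profile', {}),
        'created_at': doc['created_at'].isoformat(),
    }
    # Serialize to DynamoDB's low-level format: {'S': ...}, {'SS': [...]}, {'M': {...}}, ...
    return {key: serializer.serialize(value) for key, value in item.items()}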
6. Ingest: Loading into DynamoDB
For ≤100,000 records: PutItem in a loop is sufficient, but beware of write throttling.
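A minimal sequential load, assuming the mongo_to_dynamo_item helper sketched in step 5:

import boto3

client = boto3.client('dynamodb')

# docs: iterable of source documents (e.g., parsed from users.json).
# Fine for small datasets; no parallelism, no retry beyond the SDK defaults.
for doc in docs:
    client.put_item(TableName='Users', Item=mongo_to_dynamo_item(doc))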
For medium/large datasets (>100,000):
- Use BatchWriteItem (max 25 items or 16 MB per call).
Example with AWS SDK (pseudocode):

// SDK v3, simplified batching
async function batchWrite(items) {
  while (items.length) {
    const chunk = items.splice(0, 25);
    await client.send(new BatchWriteItemCommand({
      RequestItems: { 'Users': chunk.map(i => ({ PutRequest: { Item: i } })) }
    }));
    // Handle UnprocessedItems if returned
  }
}
Known throttling issue:
Provision adequate write capacity and use exponential backoff on ProvisionedThroughputExceededException. Throttled writes also come back in the response as UnprocessedItems:
{
"UnprocessedItems": {
"Users": [
{
"PutRequest": { ... }
}
]
}
}
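A retry sketch in Python with boto3 (batch size and backoff parameters are arbitrary assumptions):

import time
import boto3

client = boto3.client('dynamodb')

def batch_write_with_retry(items, table_name='Users', max_retries=8):
    # items: already-marshalled attribute-value dicts, at most 25 per call.
    request_items = {table_name: [{'PutRequest': {'Item': i}} for i in items]}
    for attempt in range(max_retries):
        response = client.batch_write_item(RequestItems=request_items)
        unprocessed = response.get('UnprocessedItems', {})
        if not unprocessed:
            return
        request_items = unprocessed                 # resend only what was throttled
        time.sleep(min(0.1 * (2 ** attempt), 5.0))  # exponential backoff, capped
    raise RuntimeError('UnprocessedItems remained after retries')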
For >10M records or complex pipelines:
- Consider AWS Glue (Spark), DMS, or writing directly to S3 as DynamoDB JSON.
7. Validation and Verification
Critical step.
- Count checks: db.collection.countDocuments() vs. the DynamoDB item count (the ItemCount from DescribeTable is updated only periodically, roughly every six hours; a Scan with Select=COUNT gives an exact figure).
- Data integrity: Pick records at random and compare all fields; consider automating this (a spot-check sketch appears at the end of this step).
- Access pattern simulation: Run production queries, directly or via scripts; verify performance and accuracy.
- Application-side rewrite: Queries using $in, aggregation pipelines, or deep projections will break and need refactoring.
Side note: Subtle mismatches in map vs. list encoding aren't always visible via the console. Inspect with AWS CLI:
aws dynamodb get-item --table-name Users --key '{"username":{"S":"johndoe"}}'
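A spot-check sketch comparing a random sample across both stores (sample size and compared fields are assumptions):

import boto3
import pymongo
from boto3.dynamodb.types import TypeDeserializer

mongo_users = pymongo.MongoClient('mongodb://...')['prod_db']['users']
dynamo = boto3.client('dynamodb')
deserializer = TypeDeserializer()

# Sample source documents and verify the migrated items match field by field.
for doc in mongo_users.aggregate([{'$sample': {'size': 50}}]):
    resp = dynamo.get_item(TableName='Users', Key={'username': {'S': doc['username']}})
    item = {k: deserializer.deserialize(v) for k, v in resp.get('Item', {}).items()}
    assert item, f"missing item for {doc['username']}"
    assert set(item.get('roles', [])) == set(doc.get('roles', [])), doc['username']
    assert item.get('profile', {}).get('theme') == doc.get('profile', {}).get('theme'), doc['username']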
8. Cutover & Ops
Cutover Playbook:
- Freeze writes to MongoDB (if possible, maintenance mode).
- Replay the tail of changes using MongoDB Change Streams or an oplog scraper (a change-stream sketch follows this list).
- Redirect production traffic to DynamoDB.
- Monitor success/error rates with CloudWatch; review throttling and sudden latency shifts.
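A change-stream replay sketch with pymongo; write_to_dynamo and delete_from_dynamo stand in for the transform and write logic from steps 5 and 6:

import pymongo

collection = pymongo.MongoClient('mongodb://...')['prod_db']['users']

# Tail changes committed after the bulk export and re-apply them to DynamoDB.
# In production you would resume from a resume token captured before the export.
with collection.watch(full_document='updateLookup') as stream:
    for change in stream:
        op = change['operationType']
        if op in ('insert', 'update', 'replace'):
            write_to_dynamo(change['fullDocument'])
        elif op == 'delete':
            delete_from_dynamo(change['documentKey']['_id'])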
After Go-Live:
- Tune indexes (Global Secondary Indexes for new access needs).
- Use Auto Scaling or On-Demand capacity.
- Revisit IAM policies—least privilege for all table access.
Not perfect: DynamoDB doesn't replicate MongoDB's aggregations or rich text search. Offload reporting/analytics elsewhere (e.g., Elasticsearch/Lambda pipelines).
Summary
A migration from MongoDB to DynamoDB is not a mechanical export/import task. Treat it as a schema and process redesign, dictated by query access and operational priorities. Manual field mapping and index design are non-optional.
If deep arrays, polyglot documents, or $lookup are frequent, expect to rewrite significant portions of application logic.
Tip, learned in practice: Store the original Mongo _id field as a string attribute; legacy systems and future audits will appreciate the hint.
For edge cases, anomalies, or tailored scripts, reach out. Migration rarely fits a one-size strategy.