Efficient MongoDB Backups to Google Cloud Storage
Teams running MongoDB in production know the pain: no verified backups, no tested recovery path, no real uptime guarantee. Lost data means real consequences: customer outages, regulatory risk, or project resets. Yet too many shops push database backups down the backlog, or rely on a fragile cobbling-together of mongodump and hope. Here's a hardened backup pipeline that pushes MongoDB data to Google Cloud Storage (GCS): scriptable, automatable, and robust enough for real-world ops.
Why Push Backups to GCS?
A few reminders for the risk-averse:
- Durability: GCS is designed for eleven nines (99.999999999%) of annual durability; data is stored redundantly across devices and zones, so individual hardware failures are absorbed.
- Elasticity: No concern about scaling local storage or swapping disks.
- Pay-as-you-go: Use Nearline or Coldline for large, rare restores; Standard class for frequent/urgent recovery paths.
- Easy retrieval: Useful for cross-region restores or spinning up a clone instance elsewhere.
Typical alternatives: USB drives? NFS share? Not in 2024.
Prerequisites
- MongoDB 4.2+ (tested with 6.0.4); a replica set is optional but strongly advised if you want point-in-time (oplog) backups.
- mongodump and mongorestore installed; confirm with mongodump --version.
- A Google Cloud account with IAM permissions to create storage buckets and write objects.
- gcloud (Google Cloud SDK) >= 442.0.0 (older versions can break auth workflows).
- Access to Bash or a POSIX shell on the database host.
- Outbound internet connectivity (firewalls: open port 443 to GCS endpoints).
Side note: Consider staging the backup off the database node to reduce local I/O impact.
Step 1: Prepare Google Cloud Storage
Log into the Google Cloud Console:
- Create a bucket, e.g., mongo-prod-backups-us.
- Set the region according to compliance/latency needs. For most, us-central1 or europe-west1 are sensible defaults.
- Choose a storage class: Standard for daily backups; Coldline if you only care about DR.
- Adjust policies: prohibit public access and enable versioning for accidental-overwrite protection.
- Grant a service account at least Storage Object Admin on the bucket (not Editor on the project).
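If you prefer the CLI to the console, the same setup can be scripted with gsutil. This is a sketch; the bucket name, region, and service account match the examples used throughout this article, so adjust them to your environment:
# Create the bucket with uniform bucket-level access in the chosen region and class
gsutil mb -l us-central1 -c standard -b on gs://mongo-prod-backups-us
# Block public access and enable object versioning
gsutil pap set enforced gs://mongo-prod-backups-us
gsutil versioning set on gs://mongo-prod-backups-us
# Grant the backup service account object-admin rights on this bucket only
gsutil iam ch serviceAccount:backup-mongodb@YOUR_PROJECT.iam.gserviceaccount.com:roles/storage.objectAdmin gs://mongo-prod-backups-us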
Step 2: Configure Service Account and Local Environment
- Create a dedicated service account for backups (backup-mongodb@YOUR_PROJECT.iam.gserviceaccount.com).
- Download its JSON key and move it to /etc/gcp/backup-mongodb-sa.json (restrict permissions with chmod 600).
- Export GOOGLE_APPLICATION_CREDENTIALS before invoking scripts:
export GOOGLE_APPLICATION_CREDENTIALS="/etc/gcp/backup-mongodb-sa.json"
Set this where your automation actually picks it up (for the cron job below, /etc/profile; .bashrc only helps interactive shells). User-based gcloud auth login is too fragile for automation.
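Depending on how gsutil is installed, it may read credentials from gcloud rather than from GOOGLE_APPLICATION_CREDENTIALS; explicitly activating the service account covers that case as well. A one-line sketch, assuming the key path above:
gcloud auth activate-service-account --key-file=/etc/gcp/backup-mongodb-sa.json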
Step 3: Production-Grade Backup Script
Not just a dump and copy. Always timestamp. Trap errors. Prune old backups. Log all actions.
backup-mongodb-to-gcs.sh
#!/bin/bash
set -euo pipefail

# --- config ---
TS=$(date +"%Y-%m-%d-%H%M")
BK_ROOT="/tmp/mongodb-backups"
BK_NAME="backup-$TS"
BK_PATH="$BK_ROOT/$BK_NAME"
GCS_BUCKET="gs://mongo-prod-backups-us"
MONGODB_URI="mongodb://localhost:27017"

# --- logging helpers ---
err() { echo "[$(date --iso-8601=seconds)] ERROR: $*" >&2; }
log() { echo "[$(date --iso-8601=seconds)] $*"; }

# Fail loudly on any unhandled error
trap 'err "backup script aborted (line $LINENO)"' ERR

# Prepare local staging directory
mkdir -p "$BK_ROOT"

# Dump
log "Starting mongodump -> $BK_PATH.archive.gz"
mongodump --uri "$MONGODB_URI" --gzip --archive="$BK_PATH.archive.gz" \
  || { err "mongodump failed"; exit 2; }

# Push
log "Uploading $BK_PATH.archive.gz to $GCS_BUCKET"
gsutil cp "$BK_PATH.archive.gz" "$GCS_BUCKET/" \
  || { err "gsutil upload failed"; exit 3; }

# Clean up the local copy
rm -f "$BK_PATH.archive.gz"

# Rotate old backups in the bucket (keep the 15 most recent objects)
gsutil ls -l "$GCS_BUCKET/" \
  | grep 'gs://' \
  | sort -k2 -r \
  | awk 'NR>15 {print $NF}' \
  | xargs -r -I {} gsutil rm {}

log "Backup complete"
Note: For point-in-time consistency, add --oplog to the mongodump command, but only if the target deployment is a replica set and you are dumping all databases (--oplog cannot be combined with --db or --collection).
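For reference, the script's dump line with oplog capture enabled looks like this (replica sets only):
mongodump --uri "$MONGODB_URI" --oplog --gzip --archive="$BK_PATH.archive.gz"
On restore, pair it with mongorestore --oplogReplay to apply the captured operations.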
Step 4: Automate with Cron
For Linux hosts. Run at 01:00 daily, log output, ensure environment:
0 1 * * * . /etc/profile; /bin/bash /usr/local/sbin/backup-mongodb-to-gcs.sh >> /var/log/mongodb_backup.log 2>&1
Critically, avoid overlapping runs if a backup takes longer than the interval between scheduled jobs; use a lockfile or monitoring to serialize them.
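One way to guarantee no overlap is to wrap the cron command in flock (util-linux), which skips a run if the previous one still holds the lock. A sketch; the lock path is an assumption:
0 1 * * * . /etc/profile; /usr/bin/flock -n /var/lock/mongodb-backup.lock /bin/bash /usr/local/sbin/backup-mongodb-to-gcs.sh >> /var/log/mongodb_backup.log 2>&1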
Step 5: Restore Workflow
Restore is deliberately a separate, manual workflow rather than a mirror image of the backup job; test this process before you need it.
gsutil cp gs://mongo-prod-backups-us/backup-2024-06-18-0100.archive.gz /tmp/
mongorestore --gzip --archive=/tmp/backup-2024-06-18-0100.archive.gz --drop --stopOnError
- Use --drop to replace existing collections (risk: it removes current data before restoring).
- For partial restores, use --nsInclude or --nsExclude (example after this list).
- Known issue: restoring users and roles on MongoDB 4.0+ requires --authenticationDatabase admin.
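For example, pulling a single collection out of the same archive; the appdb.orders namespace here is just a placeholder:
mongorestore --gzip --archive=/tmp/backup-2024-06-18-0100.archive.gz --nsInclude="appdb.orders"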
Production Considerations
- Security: Always use dedicated service accounts and restrict bucket permissions. Never store unencrypted sensitive data at rest—consider encrypting archives before upload if compliance requires.
- Reliability: Check gsutil output for 429 Too Many Requests or auth failures; these are easy to miss silently in automation.
- Scalability: For large (>40 GB) dumps, split the dump by database and parallelize; gsutil -m helps, but it won't outpace network constraints.
- Pruning: Use GCS lifecycle rules to auto-delete or transition older backups to colder storage (see the sketch after this list).
- Monitoring: Integrate backup job status with Prometheus, Grafana, or external alerting. A backup you haven’t tested is only theoretical.
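For the pruning point above, a minimal lifecycle policy that deletes backups older than 30 days could look like this. A sketch: the 30-day age is an assumption, so tune it to your retention requirements:
# Write the lifecycle policy and apply it to the backup bucket
cat > lifecycle.json <<'EOF'
{
  "rule": [
    { "action": { "type": "Delete" }, "condition": { "age": 30 } }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://mongo-prod-backups-us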
Common Pitfalls
- Using mongodump without --oplog on a live replica set introduces consistency gaps (and mongodump alone cannot give you a consistent snapshot of a sharded cluster).
- Authentication/permissions drift: the service account loses access, yet the dump step still "succeeds" locally, so the failed upload goes unnoticed unless someone checks.
- Neglecting to check logs: always scan scheduled-job output for ERROR lines such as "gsutil upload failed" after each run.
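A quick post-run check along these lines can feed whatever alerting you already have. A sketch; the log path matches the cron example above:
# Inspect only the most recent run's output (last ~200 lines of the log)
if tail -n 200 /var/log/mongodb_backup.log | grep -q "ERROR"; then
  echo "MongoDB backup errors detected, investigate before the next cycle" >&2
  exit 1
fi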
Tip: Streaming Around Disk Constraints
If your dump exceeds instance disk size or fills up /tmp, pipe the mongodump directly:
mongodump --uri "$MONGODB_URI" --archive | gzip | gsutil cp - "$GCS_BUCKET/backup-$TS.archive.gz"
Tradeoff: No local copy. This works well in containers or ephemeral servers.
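The reverse direction streams just as well, pulling the archive straight from the bucket into mongorestore without touching local disk. A sketch that pairs with the piped backup above:
gsutil cat "$GCS_BUCKET/backup-2024-06-18-0100.archive.gz" | gunzip | mongorestore --archive --drop
Because the streamed backup was compressed with external gzip, gunzip plus a plain --archive is correct here; archives produced by the script's mongodump --gzip would instead be restored with mongorestore --gzip --archive.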
Backup Flow
[mongodump] --gzip+archive--> [local .archive.gz] --gsutil cp--> [GCS bucket]
     |                                                                ^
     +------- (option: stream directly to the bucket via pipe) ------+
This workflow isn't bulletproof: network failures, permission changes, or local disk problems can still bite. But it's a battle-tested baseline. For Kubernetes or managed MongoDB Atlas, the core pattern remains the same; adapt the interface, not the process.
Questions about Kubernetes CronJob specs or integrating with CI/CD? The same shell logic applies—swap mounts, service accounts, and storage classes as needed.
