Efficient MongoDB Backups to Google Cloud Storage
Teams running MongoDB in production know the pain: no verified backups, no tested recovery, no real uptime guarantee. Lost data means real consequences: customer outages, regulatory risk, or project resets. Yet too many shops push database backups down the backlog, or rely on a fragile shell script cobbled around mongodump and hope for the best. Here's a hardened backup pipeline that pushes MongoDB data to Google Cloud Storage (GCS): scriptable, automatable, and robust enough for real-world ops.
Why Push Backups to GCS?
A few reminders for the risk-averse:
- Durability: GCS is designed for 11 nines (99.999999999%) annual durability, with multi-regional redundancy that absorbs hardware failures.
- Elasticity: No concern about scaling local storage or swapping disks.
- Pay-as-you-go: Use Nearline or Coldline for large, rare restores; Standard class for frequent/urgent recovery paths.
- Easy retrieval: Useful for cross-region restores or spinning up a clone instance elsewhere.
Typical alternatives: USB drives? NFS share? Not in 2024.
Prerequisites
- MongoDB 4.2+ (tested with 6.0.4); replica set for point-in-time (oplog) optional but strongly advised.
- mongodump and mongorestore installed; confirm with mongodump --version.
- Google Cloud account with IAM permissions to create storage buckets and write objects.
- gcloud (Google Cloud SDK) >= 442.0.0 (older versions can break auth workflows).
- Access to Bash or a POSIX shell on the database host.
- Outbound internet connectivity (firewalls: open port 443 to GCS endpoints).
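A quick way to confirm the tooling is in place on the backup host:
mongodump --version
mongorestore --version
gcloud --version
gsutil version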
Side note: Consider staging the backup off the database node to reduce local I/O impact.
Step 1: Prepare Google Cloud Storage
Log into the Google Cloud Console:
- Create a bucket, e.g., mongo-prod-backups-us.
- Set the region according to compliance/latency needs. For most, us-central1 or europe-west1 are sensible defaults.
- Choose a storage class: Standard for daily backups; Coldline if you only care about DR.
- Adjust policies: prohibit public access, enable versioning for protection against accidental overwrites.
- Grant a service account at least Storage Object Admin on the bucket (not Editor on the project).
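The same setup can be scripted instead of clicked through. A minimal sketch with gsutil, assuming a reasonably recent Cloud SDK and using the bucket name and region above (YOUR_PROJECT is a placeholder):
# Create the bucket (Standard class, single region)
gsutil mb -p YOUR_PROJECT -c standard -l us-central1 gs://mongo-prod-backups-us
# Block public access and keep old object versions around
gsutil pap set enforced gs://mongo-prod-backups-us
gsutil versioning set on gs://mongo-prod-backups-us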
Step 2: Configure Service Account and Local Environment
- Create a dedicated service account for backups (backup-mongodb@YOUR_PROJECT.iam.gserviceaccount.com).
- Download its JSON key and move it to /etc/gcp/backup-mongodb-sa.json (restrict permissions with chmod 600).
- Export GOOGLE_APPLICATION_CREDENTIALS before invoking scripts:
export GOOGLE_APPLICATION_CREDENTIALS="/etc/gcp/backup-mongodb-sa.json"
Set this in your .bashrc/.profile on backup hosts. User-based gcloud auth login is fragile for automation.
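The same steps from the command line, as a sketch (YOUR_PROJECT, paths, and names as above):
# Dedicated service account for backups
gcloud iam service-accounts create backup-mongodb \
  --project=YOUR_PROJECT --display-name="MongoDB backups"
# Key file referenced by GOOGLE_APPLICATION_CREDENTIALS
gcloud iam service-accounts keys create /etc/gcp/backup-mongodb-sa.json \
  --iam-account=backup-mongodb@YOUR_PROJECT.iam.gserviceaccount.com
chmod 600 /etc/gcp/backup-mongodb-sa.json
# Bucket-scoped Storage Object Admin, as in Step 1
gsutil iam ch \
  serviceAccount:backup-mongodb@YOUR_PROJECT.iam.gserviceaccount.com:roles/storage.objectAdmin \
  gs://mongo-prod-backups-us
# If gsutil on the host doesn't pick up the key automatically, activate it explicitly
gcloud auth activate-service-account --key-file=/etc/gcp/backup-mongodb-sa.json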
Step 3: Production-Grade Backup Script
Not just a dump and copy. Always timestamp. Trap errors. Prune old backups. Log all actions.
backup-mongodb-to-gcs.sh
#!/bin/bash
set -euo pipefail

# --- config ---
TS=$(date +"%Y-%m-%d-%H%M")
BK_ROOT="/tmp/mongodb-backups"
BK_NAME="backup-$TS"
BK_PATH="$BK_ROOT/$BK_NAME"
GCS_BUCKET="gs://mongo-prod-backups-us"
MONGODB_URI="mongodb://localhost:27017"
KEEP=15   # number of backups to retain in the bucket

# --- logging helpers ---
err() { echo "[$(date --iso-8601=seconds)] ERROR: $*" >&2; }
log() { echo "[$(date --iso-8601=seconds)] $*"; }

# Prepare local staging directory
mkdir -p "$BK_ROOT"

# Dump to a single gzip-compressed archive file
log "Starting mongodump -> $BK_PATH.archive.gz"
if ! mongodump --uri "$MONGODB_URI" --gzip --archive="$BK_PATH.archive.gz"; then
  err "mongodump failed"
  exit 2
fi

# Push to GCS
log "Uploading $BK_PATH.archive.gz to $GCS_BUCKET"
gsutil cp "$BK_PATH.archive.gz" "$GCS_BUCKET/" || { err "gsutil upload failed"; exit 3; }

# Remove the local copy once the upload has succeeded
rm -f "$BK_PATH.archive.gz"

# Rotate old files in the bucket (keep the $KEEP most recent).
# gsutil ls -l prints "size  timestamp  url" plus a TOTAL line, so filter on gs://
# before sorting newest-first by timestamp and deleting everything past $KEEP.
gsutil ls -l "$GCS_BUCKET/" | grep "gs://" | sort -k2 -r \
  | awk -v keep="$KEEP" 'NR>keep {print $NF}' | xargs -r -I {} gsutil rm {}

log "Backup complete"
Note: For point-in-time consistency, add --oplog to the mongodump command, but only if the target MongoDB is a replica set.
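For example, the dump line in the script above becomes the following, and the matching restore adds --oplogReplay (both are standard mongodump/mongorestore flags):
mongodump --uri "$MONGODB_URI" --oplog --gzip --archive="$BK_PATH.archive.gz"
# ...and when restoring that archive:
mongorestore --gzip --archive=/tmp/backup-2024-06-18-0100.archive.gz --oplogReplay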
Step 4: Automate with Cron
For Linux hosts. Run at 01:00 daily, log output, ensure environment:
0 1 * * * . /etc/profile; /bin/bash /usr/local/sbin/backup-mongodb-to-gcs.sh >> /var/log/mongodb_backup.log 2>&1
Critically, avoid overlapping runs if a backup takes longer than the 24-hour gap between runs; use lockfiles or monitoring.
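One simple guard is flock from util-linux: the job skips a run if the previous one still holds the lock. A sketch of the crontab entry (the lock path is illustrative):
0 1 * * * . /etc/profile; /usr/bin/flock -n /var/lock/mongodb-backup.lock /bin/bash /usr/local/sbin/backup-mongodb-to-gcs.sh >> /var/log/mongodb_backup.log 2>&1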
Step 5: Restore Workflow
Restoring is deliberately a separate, manual workflow; test this process before you need it.
gsutil cp gs://mongo-prod-backups-us/backup-2024-06-18-0100.archive.gz /tmp/
mongorestore --gzip --archive=/tmp/backup-2024-06-18-0100.archive.gz --drop --stopOnError
- Use --drop to replace existing collections (risk: data loss).
- For partial restores, use --nsInclude or --nsExclude.
- Known issue: restoring users/roles on MongoDB 4.0 and later requires --authenticationDatabase admin.
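As an example of a partial restore, pulling a single namespace out of the same archive (database and collection names are illustrative):
mongorestore --gzip --archive=/tmp/backup-2024-06-18-0100.archive.gz \
  --nsInclude="appdb.orders" --drop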
Production Considerations
- Security: Always use dedicated service accounts and restrict bucket permissions. Never store unencrypted sensitive data at rest—consider encrypting archives before upload if compliance requires.
- Reliability: Check gsutil logs for 429 Too Many Requests or auth failures; these are easy to miss silently in automation.
- Scalability: For large (>40 GB) dump files, split the dump by database and parallelize. gsutil -m helps but won't outpace network constraints.
- Pruning: Use GCS lifecycle rules to auto-delete or transition older backups to colder storage (see the sketch after this list).
- Monitoring: Integrate backup job status with Prometheus, Grafana, or external alerting. A backup you haven’t tested is only theoretical.
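A sketch of a lifecycle policy for the bucket from Step 1: move objects to Coldline after 30 days and delete them after 90 (the thresholds are examples; align them with your retention requirements):
cat > /tmp/lifecycle.json <<'EOF'
{
  "rule": [
    {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
     "condition": {"age": 30}},
    {"action": {"type": "Delete"}, "condition": {"age": 90}}
  ]
}
EOF
gsutil lifecycle set /tmp/lifecycle.json gs://mongo-prod-backups-us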
Common Pitfalls
- Using mongodump without --oplog on a live replicated or sharded cluster introduces consistency gaps.
- Authentication/permissions drift: service accounts lose access, but backups "succeed" because dumps are local.
- Neglecting to check logs: always scan for errors such as "gsutil upload failed" (the message the script above emits) after scheduled jobs.
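A crude check you can run (or schedule) against the cron log; the strings match the error messages emitted by the script above:
if grep -qE "mongodump failed|gsutil upload failed" /var/log/mongodb_backup.log; then
  echo "MongoDB backup errors found in the log" >&2
  exit 1   # wire this into whatever alerting you already have
fi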
Tip: Scripting Around Disk Limits
If your dump exceeds the instance's disk size or fills up /tmp, pipe mongodump directly to GCS:
mongodump --uri "$MONGODB_URI" --archive | gzip | gsutil cp - "$GCS_BUCKET/backup-$TS.archive.gz"
Tradeoff: no local copy to fall back on if the upload fails mid-stream. This works well in containers or on ephemeral servers.
Backup Flow
[mongodump] --gzip+archive--> [local .gz file] --gsutil cp--> [GCS bucket]
      |                                                            ^
      +------- (option: stream directly to the bucket via pipe) ---+
This workflow isn't bulletproof: network failures, permission drift, or local disk problems can still bite. But it's a battle-tested baseline. For Kubernetes or managed MongoDB Atlas, the core pattern remains; adapt the interface, not the process.
Questions about Kubernetes CronJob specs or integrating with CI/CD? The same shell logic applies—swap mounts, service accounts, and storage classes as needed.