Rsync To Google Drive

How to Seamlessly Sync Large Files to Google Drive with Rsync Without Losing Data Integrity

Typical workflows for uploading large files to Google Drive are both inefficient and fragile, especially when handling multi-gigabyte data or incremental backups. Drag-and-drop introduces risk: interrupted uploads frequently require a restart from scratch, and modified files often result in full re-uploads, squandering bandwidth and exposing the system to silent corruption or data loss.

Instead, pair rsync, a battle-tested Linux tool for incremental file synchronization, with rclone, which talks directly to the Google Drive API. Together they bridge the gap between POSIX filesystem tools and Drive's API-based storage, enabling selective updates and checksum-based verification.


Why “Smart Sync” is Mandatory

Direct uploads ignore file deltas and often skip checksum verification. The results:

  • Bandwidth spike during each backup.
  • Accumulation of duplicates and stale data.
  • Virtually no protection against silent file corruption.
  • No audit trail or resumable sync unless handled manually.

By contrast, rsync, when properly deployed, offers:

  • Incremental changes: transfer only new or changed files and, over a network transport, only the changed blocks within them (saves hours on large VM images or raw datasets).
  • Preservation of permissions and timestamps, plus optional checksum-based comparison (-c).
  • Built-in verbose audit and dry-run capability.

The catch: Google Drive is not a native filesystem, so you need either a FUSE-backed mount or API-driven sync.


Options for Interfacing Rsync with Google Drive

  1. Google’s Desktop Sync/Drive for Desktop

    • Pros: Out-of-the-box GUI, end-user focus.
    • Cons: Flaky with huge files, issues with deep directory hierarchies, no tuning of caching.
  2. Mount Google Drive via rclone FUSE

    • Exposes Google Drive at $HOME/mnt/gdrive to filesystem tools (including rsync).
    • rclone v1.63.1 or later recommended for fewer VFS edge cases.
    • Gotcha: the FUSE layer adds latency; not suitable for millions of tiny files.
  3. Direct use of rclone’s sync commands (not covered here; see docs).

    • Talks to the Drive API directly; you trade rsync's options for rclone's own flags and filters.

This guide uses option 2: mount Google Drive via rclone, then run a standard rsync pass.


Installation (Debian 11 and macOS 13.0+)

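# Debian 11 (the fuse package provides fusermount, which rclone mount relies on)
sudo apt-get update && sudo apt-get install -y rclone fuse
# macOS
brew install rclone

Debian 11 packages an rclone release older than the v1.63.1 recommended below, so the official install script or a precompiled binary from rclone.org may be the better choice there. On macOS, the Homebrew build of rclone is not compiled with mount support; for rclone mount, use the precompiled binary from rclone.org together with macFUSE.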

Check the version. rclone v1.63.1 (or later) is preferred:

rclone version
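Because distribution packages can lag behind, a backup script may want to confirm the installed version before mounting. A minimal bash sketch, assuming GNU-style sort -V and that the first line of rclone version reads like "rclone v1.63.1"; the function name rclone_at_least is hypothetical:

```shell
# Hypothetical helper: succeed if the installed rclone is at least version $1 (default 1.63.1).
# Assumes the first line of `rclone version` looks like "rclone v1.63.1" and that sort supports -V.
rclone_at_least() {
    local want=${1:-1.63.1} have
    have=$(rclone version | head -n 1 | awk '{print $2}')
    have=${have#v}    # strip the leading "v"
    # Version-sort both strings; if the wanted version sorts first (or they are equal), have >= want.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}
```

For example: rclone_at_least 1.63.1 || echo "rclone is older than 1.63.1" >&2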

Rclone: Configuring Access to Google Drive

Initialize OAuth and get the remote storage connected:

rclone config

Prompts and settings:

  • Press n to create a new remote.
  • Name: gdrive (or another short alias).
  • Type: drive.
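  • Standard Google account OAuth flow (browser-based). On headless servers, answer n to the auto-config question and follow the printed instructions: run the suggested rclone authorize command on a machine with a browser, then paste the resulting token back.
  • Accept the other defaults unless using a service account. For heavy use, consider creating your own Google API client ID, since rclone's shared default client ID is rate-limited.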

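Sanity check (lsd lists only top-level folders; rclone ls gdrive: would recurse through every file in the Drive):

rclone lsd gdrive: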

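If authentication fails, you may see:

Failed to create file system for "gdrive:": couldn't find root directory ID

This usually means the OAuth token has expired or been revoked; refresh it with rclone config reconnect gdrive:. A misspelled remote name fails differently, with an error saying the remote was not found in the config file.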
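A mistyped remote name is easy to rule out before any mount or sync step runs. A small bash sketch with a hypothetical function name, relying on rclone listremotes, which prints each configured remote on its own line followed by a colon:

```shell
# Hypothetical helper: succeed if an rclone remote named $1 exists in the config.
# `rclone listremotes` prints one remote per line with a trailing colon, e.g. "gdrive:".
remote_configured() {
    rclone listremotes | grep -qxF "${1}:"
}
```

A script can then bail out early with remote_configured gdrive || exit 1.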


Mount the Google Drive Filesystem

Create a target mountpoint (~/mnt/gdrive), then establish the FUSE mount. The --vfs-cache-mode writes flag is strongly recommended: it stages written files in a local cache before uploading them, which makes rsync's write-to-a-temporary-file-then-rename pattern reliable.

mkdir -p ~/mnt/gdrive ~/.cache/rclone
rclone mount gdrive: ~/mnt/gdrive --vfs-cache-mode writes --log-file ~/.cache/rclone/rclone.log &
sleep 3

Unmount via:

fusermount -u ~/mnt/gdrive

Note: On macOS, substitute umount ~/mnt/gdrive.

Monitor logs:

tail -f ~/.cache/rclone/rclone.log
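The fixed sleep 3 after starting the mount is a guess: too short on a slow connection, wasted time on a fast one. A bash sketch that polls instead, assuming util-linux mountpoint (Linux only; macOS lacks it) and using a hypothetical function name:

```shell
# Hypothetical helper: poll until $1 is a mountpoint, giving up after $2 seconds (default 30).
wait_for_mount() {
    local dir=$1 timeout=${2:-30} waited=0
    until mountpoint -q "$dir"; do
        if (( waited >= timeout )); then
            echo "$(date): $dir not mounted after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

Run it right after the backgrounded rclone mount, for example wait_for_mount ~/mnt/gdrive 30 || exit 1, so later steps never write into an unmounted directory.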

Rsync Execution: Practical Example

Suppose /srv/projects/data/ hosts source files.

Initial sync:

rsync -avh --progress --delete /srv/projects/data/ ~/mnt/gdrive/backup-data/

Where:

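  • -a: archive mode (recursive; preserves timestamps, permissions, ownership, and symlinks). Google Drive does not store POSIX permissions or ownership, so over the mount mainly modification times carry over.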
  • -v: verbose
  • -h: human-readable output
  • --delete: remove extra files at destination (danger: double-check source path)
  • --progress: shows file progress
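Because --delete mirrors whatever the source currently contains, an empty or missing source (say, a data volume that failed to mount) would wipe the backup. A minimal bash guard, with a hypothetical function name, that refuses to run in that case:

```shell
# Hypothetical wrapper: run the mirroring rsync only if the source is a non-empty directory.
safe_sync() {
    local src=$1 dst=$2
    if [ ! -d "$src" ]; then
        echo "Source $src is not a directory; refusing to sync with --delete" >&2
        return 1
    fi
    # An empty source would make --delete remove everything at the destination.
    if [ -z "$(ls -A "$src")" ]; then
        echo "Source $src is empty; refusing to sync with --delete" >&2
        return 1
    fi
    rsync -avh --progress --delete "$src" "$dst"
}
```

Usage: safe_sync /srv/projects/data/ ~/mnt/gdrive/backup-data/. Keep the trailing slash on the source, which copies the directory's contents rather than the directory itself.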

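For checksum-based comparison (rsync then decides what to transfer by comparing full-file checksums instead of size and modification time; files it does transfer are verified either way):

rsync -avhc --progress --delete /srv/projects/data/ ~/mnt/gdrive/backup-data/
# -c == --checksum. Over the mount, every destination file is read back (typically downloaded from Drive) on each run, on top of the extra local disk and CPU load.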
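To verify the finished backup without reading it back through the mount, rclone check can compare the local files against the MD5 hashes Google Drive stores server-side. A bash sketch assuming the gdrive remote and a backup-data folder at the Drive root, as in the commands above; the function name is hypothetical:

```shell
# Hypothetical helper: compare local files with the copies on Drive using Drive's stored MD5s.
# --one-way only reports source files that are missing or different on Drive.
# Any extra arguments (e.g. --exclude patterns) are passed through to rclone.
verify_backup() {
    local src=${1:-/srv/projects/data} dst=${2:-gdrive:backup-data}
    rclone check "$src" "$dst" --one-way "${@:3}"
}
```

rclone check exits non-zero when it finds differences. Pass the same exclusions as the rsync run (for example, verify_backup /srv/projects/data gdrive:backup-data --exclude '*.bak') so that skipped files are not reported as missing.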

Excluding files and directories you don't need keeps each run smaller:

rsync -avh --exclude='*.bak' --exclude='cache/' /srv/projects/data/ ~/mnt/gdrive/backup-data/
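As the list of patterns grows, it is easier to keep them in a file and pass it with rsync's --exclude-from option. A sketch; the file location and the EXCLUDES variable are hypothetical:

```shell
# Hypothetical location for the shared exclusion list; override by setting EXCLUDES.
EXCLUDES=${EXCLUDES:-$HOME/.config/gdrive-backup/excludes.txt}
mkdir -p "$(dirname "$EXCLUDES")"
cat > "$EXCLUDES" <<'EOF'
# rsync ignores blank lines and lines starting with '#' or ';'
*.bak
cache/
*.tmp
*.log
EOF
```

Then: rsync -avh --exclude-from="$EXCLUDES" /srv/projects/data/ ~/mnt/gdrive/backup-data/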

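Non-obvious tip: when both paths are local, as they are with a FUSE mount, rsync defaults to --whole-file and skips its delta-transfer algorithm, so a changed file is rewritten and re-uploaded in full (Drive cannot patch a file in place anyway). For many small files, rclone's native copy or sync is often faster.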
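For comparison, here is the same mirror done through the Drive API with rclone sync, bypassing FUSE. A bash sketch with a hypothetical function name and the paths from the examples above. rclone's --checksum compares size plus the MD5 that Drive already stores, so nothing is downloaded, and its filter syntax differs from rsync's (cache/** rather than cache/ to skip a directory):

```shell
# Hypothetical helper: mirror the source straight to Drive via the API, without the mount.
# Extra arguments (e.g. --dry-run) are passed through to rclone.
native_sync() {
    rclone sync /srv/projects/data gdrive:backup-data \
        --checksum \
        --exclude '*.bak' \
        --exclude 'cache/**' \
        --progress \
        "$@"
}
```

Like rsync --delete, rclone sync removes destination files that are missing from the source, so run native_sync --dry-run first.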


Automating with a Robust Script

Common pitfall: the rclone FUSE mount can drop after network loss or a crash, and a script may then fail quietly or, worse, rsync into the empty local mountpoint directory. Mitigate with sanity checks:

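#!/usr/bin/env bash
# Nightly rsync backup to Google Drive through an rclone FUSE mount.
set -euo pipefail

mnt="$HOME/mnt/gdrive"
SRC=/srv/projects/data/
DST="$mnt/backup-data/"

if ! mountpoint -q "$mnt"; then
    echo "$(date): Remounting gdrive"
    # Clear a stale mount ("Transport endpoint is not connected"); harmless if nothing is mounted.
    fusermount -uz "$mnt" 2>/dev/null || true
    mkdir -p "$HOME/.cache/rclone"
    nohup rclone mount gdrive: "$mnt" --vfs-cache-mode writes \
        --log-file "$HOME/.cache/rclone/rclone.log" >/dev/null 2>&1 &
    sleep 5
    # Abort instead of letting rsync fill the bare local directory.
    if ! mountpoint -q "$mnt"; then
        echo "$(date): Mount failed, aborting" >&2
        exit 1
    fi
fi

# -c is omitted: over the mount it would re-read (download) every destination file each run.
rsync -avh --delete "$SRC" "$DST"
echo "$(date): Sync complete"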

Permissions:

chmod 750 /usr/local/bin/google-drive-backup.sh

Crontab entry (in the crontab of the user who owns ~/mnt/gdrive) for 2:00 AM backups, appending to a log file that user must be able to write:

0 2 * * * /usr/local/bin/google-drive-backup.sh >> /var/log/google-drive-backup.log 2>&1
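A first upload of a large dataset can outlast the 24 hours between runs, letting the next job start while the previous one is still going. One way to prevent overlap, assuming util-linux flock at /usr/bin/flock and a hypothetical lock-file path, is to wrap the crontab entry:

```shell
# Same schedule; flock -n skips this run if the previous one still holds the (hypothetical) lock file.
0 2 * * * /usr/bin/flock -n /tmp/google-drive-backup.lock /usr/local/bin/google-drive-backup.sh >> /var/log/google-drive-backup.log 2>&1
```

With -n, flock exits immediately instead of waiting when the lock is held, so the overlapping run is simply skipped.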

Advanced Efficiency & Safety Notes

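Issue | Symptom | Recommendation
Incomplete mount | Transport endpoint is not connected | Unmount (fusermount -uz) and remount
Rate limits | 403/429 errors in rclone log | Use your own Google API client ID, throttle with --tpslimit, or stagger jobs; --drive-chunk-size=64M also cuts the request count for large uploads, at the cost of memory
Daily upload quota | 403 errors after about 750 GB uploaded in one day | Seed large backups over several days; --drive-stop-on-upload-limit makes the run stop cleanly
File size limits | Single files over 5 TB not supported | Split archives upfront
Cache thrashing | Slow syncs with huge directory trees | Raise --dir-cache-time (e.g. 12h), which caches directory listings; --vfs-cache-max-age only governs cached file data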

Exclude non-critical files:

rsync ... --exclude='*.tmp' --exclude='*.log'

Dry-run before deletion-heavy updates:

rsync --dry-run -avh --delete "$SRC" "$DST"
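The dry run can also be turned into an automatic gate. rsync's built-in --max-delete=NUM caps deletions, but it only skips the ones beyond the limit while the rest of the transfer proceeds. The bash sketch below instead refuses the whole run when the preview lists too many deletions; the function name and the default limit of 100 are hypothetical:

```shell
# Hypothetical wrapper: preview with --dry-run and abort if more than $3 deletions are planned.
guarded_sync() {
    local src=$1 dst=$2 max=${3:-100} planned
    # With --itemize-changes, each planned deletion is reported on a line starting with "*deleting".
    planned=$(rsync -a --dry-run --itemize-changes --delete "$src" "$dst" | grep -c '^\*deleting' || true)
    if (( planned > max )); then
        echo "$(date): $planned deletions planned (limit $max); aborting" >&2
        return 1
    fi
    rsync -avh --delete "$src" "$dst"
}
```

Usage: guarded_sync "$SRC" "$DST" 100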

Conclusion

Mounting Google Drive with rclone and handling file synchronization with rsync achieves a more deterministic, verifiable backup pipeline. This approach is particularly effective for:

  • Nightly syncs of project repos, data science artifacts, and log archives.
  • Teams needing reliable off-site backup without vendor lock-in or brittle web UIs.
  • Anyone requiring strict data validation (e.g., -c for checksum-based comparison, or rclone check against the MD5 hashes Drive stores).

Known limitation: Directory listing on GDrive FUSE is not instantaneous at scale (>200k files can stall). For extremely large sets, blend periodic rclone sync runs with event-based rsync for subdirectories.

Hardware quirks, service limits, and error logs always tell the deeper story. Monitor them, adapt the process. There’s no “one-size-fits-all” in storage workflows—only tradeoffs to be tuned.


No-frills. Efficient. Reliable. That’s how you sync to Google Drive.