SFTP to Google Cloud Storage

Integrating SFTP with Google Cloud Storage: Robust File Transfer for Modern Infrastructure

Moving from legacy SFTP servers to managed cloud storage is rarely optional for enterprises facing scale, security, and audit requirements. Critical SFTP-based workflows, however, are embedded deep in business processes, and ripping them out isn’t pragmatic. The solution: bridge those SFTP workloads directly into Google Cloud Storage (GCS), preserving protocol compatibility while gaining managed scalability, a stronger DR posture, and auditability.

Problem: SFTP Holds the Business, But Hardware Holds SFTP Back

A typical scenario: batch data vendors push nightly files to a company SFTP drop-off. The data team collects those files, then copies them onto shared storage, triggers ETLs, maybe even struggles with a clunky SMB share for analytics. Every audit raises flags about backups, encryption-at-rest, limited log retention, or disaster recovery.

Frankenstein fixes such as naive rsyncs and hurried backup cron jobs rarely scale. Google Cloud Storage, by contrast, provides a durable object store with optional versioning and regional or multi-regional replication. The question: how to securely ingest SFTP drops into GCS without retraining vendors or rewriting every client?

Solution Overview

Two reliable patterns emerge:

  1. Classic Hybrid: Stand up a traditional SFTP endpoint, then push/rsync uploaded files to GCS via CLI tools (gsutil).
  2. Direct SFTP-to-GCS: Mount a GCS bucket directly as a filesystem (via gcsfuse), so SFTP writes land in GCS as objects once each file is closed.

Each comes with operational quirks. For most teams, method 1 offers lower migration risk. Power users can skip the middleman with method 2, provided they understand gcsfuse’s metadata caching and its departures from POSIX semantics.


Pattern 1: VM SFTP Server + Automated GCS Sync

Straightforward. Deploy an SFTP server (Debian 11 "bullseye" with OpenSSH 8.4p1 works fine), and automate upload syncs.

1. Provision SFTP Endpoint

sudo apt-get update && sudo apt-get install openssh-server=1:8.4p1-5+deb11u1
sudo adduser --disabled-password sftpuser

Restrict the user to a chroot jail by editing /etc/ssh/sshd_config:

Match User sftpuser
  ChrootDirectory /home/sftpuser
  ForceCommand internal-sftp
  X11Forwarding no
  AllowTcpForwarding no
  PermitTunnel no

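If you skip the chroot, the user can traverse the filesystem, which is bad for compliance. sshd also requires the chroot directory and every parent directory to be owned by root and not writable by group or others, so uploads must land in a user-owned subdirectory such as /home/sftpuser/incoming.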
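The ownership rule for ChrootDirectory is a common reason logins fail right after this change. The following is a minimal sketch of a pre-flight check, assuming GNU stat and readlink; the function names are illustrative and not part of OpenSSH.

```shell
#!/bin/sh
# Sketch: check a ChrootDirectory path against sshd's ownership rules.

# Return 0 if a path component with owner uid $1 and octal mode $2 is
# acceptable: owned by root (uid 0) and not writable by group or others.
mode_is_chroot_safe() {
  [ "$1" -eq 0 ] && [ $(( 0$2 & 022 )) -eq 0 ]
}

# Walk from the given directory up to / and report every offending component.
check_chroot_path() {
  chk_dir=$(readlink -f "$1") || return 2
  chk_rc=0
  while :; do
    # stat prints "<uid> <octal mode>"; split it into $1 and $2.
    set -- $(stat -c '%u %a' "$chk_dir")
    if ! mode_is_chroot_safe "$1" "$2"; then
      echo "unsafe: $chk_dir (uid=$1 mode=$2)"
      chk_rc=1
    fi
    [ "$chk_dir" = "/" ] && break
    chk_dir=$(dirname "$chk_dir")
  done
  return "$chk_rc"
}
```

Running check_chroot_path /home/sftpuser before restarting sshd shows which component needs fixing. A typical remedy is sudo chown root:root /home/sftpuser and sudo chmod 755 /home/sftpuser, followed by creating /home/sftpuser/incoming owned by sftpuser.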

Restart SSH:

sudo systemctl restart ssh

2. GCS Bucket Bootstrap

Use a dedicated bucket (enable object versioning if rollback is important):

gsutil mb -c standard -l us-central1 gs://corp-sftp-inbox/
gsutil versioning set on gs://corp-sftp-inbox/

3. Install Google Cloud SDK

sudo apt-get install curl
curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-445.0.0-linux-x86_64.tar.gz
tar zxvf google-cloud-sdk-445.0.0-linux-x86_64.tar.gz
./google-cloud-sdk/install.sh
./google-cloud-sdk/bin/gcloud init

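Authenticate a service account with least privilege. roles/storage.objectCreator is enough for plain uploads, but gsutil rsync also lists the destination, and overwriting or deleting objects (the -d flag) requires delete permission. For the rsync job in the next step, a role such as roles/storage.objectAdmin granted on this bucket only is the practical minimum.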
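One way to set this up is a dedicated service account with a role bound on the single bucket, not on the project. This is a sketch only: the account name sftp-sync and project ID my-project in the usage note are placeholders, and the role shorthand should be whatever the sync job actually needs.

```shell
#!/bin/sh
# Sketch: dedicated service account with a bucket-scoped role.

# Build the service account email from account name ($1) and project ID ($2).
sa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Create the service account in the given project.
create_sync_account() {
  gcloud iam service-accounts create "$1" \
    --project "$2" \
    --display-name "SFTP to GCS sync"
}

# Bind a role on one bucket only.
# $1 = account name, $2 = project ID, $3 = bucket name,
# $4 = role shorthand as accepted by gsutil iam ch (e.g. objectAdmin)
grant_bucket_role() {
  gsutil iam ch "serviceAccount:$(sa_email "$1" "$2"):$4" "gs://$3"
}
```

For example, create_sync_account sftp-sync my-project followed by grant_bucket_role sftp-sync my-project corp-sftp-inbox objectAdmin keeps the grant limited to the inbox bucket.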

4. Automate File Transfer with gsutil rsync

Sync script (/usr/local/bin/sftp_to_gcs.sh):

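#!/bin/bash
# Sync SFTP drops to GCS. pipefail makes the run fail if gsutil fails,
# even though its output is piped through tee.
set -euo pipefail

SRC=/home/sftpuser/incoming/
DEST=gs://corp-sftp-inbox/

# -m: parallel transfers, -r: recurse, -d: mirror local deletions to the bucket.
gsutil -m rsync -d -r "$SRC" "$DEST" 2>&1 | tee -a /var/log/sftp-gcs-sync.log

# Remove local files only if confirmed delivered.
# Never combine this with -d above: the next run would delete the same
# objects from the bucket.
# find "$SRC" -type f -mmin +30 -delete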

The -d flag deletes bucket objects that are no longer present in the source. Remove it if you want append-only behavior, and always remove it if you prune local files after upload.

Schedule (every 5 minutes):

*/5 * * * * /usr/local/bin/sftp_to_gcs.sh
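A five-minute schedule can overlap with a previous run that is still transferring a large batch. A possible guard, assuming flock from util-linux is installed and using an illustrative lock path, is sketched below.

```shell
#!/bin/sh
# Run a command only if nobody else holds the lock file ($1);
# with -n, flock exits non-zero immediately when the lock is taken.
run_locked() {
  lock=$1
  shift
  flock -n "$lock" "$@"
}

# The same idea applied directly in the crontab entry:
# */5 * * * * flock -n /var/lock/sftp_to_gcs.lock /usr/local/bin/sftp_to_gcs.sh
```

With this in place, a run that finds the lock held simply skips its turn and the next scheduled run picks up the remaining files.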

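Known issue: gsutil rsync cannot tell a finished upload from one still in progress, so it may copy a truncated file. Delay syncs until files have stopped changing, or have clients upload under a temporary name and rename on completion.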
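One way to delay syncs is to upload only files whose modification time is older than a settling period. The sketch below assumes a flat drop directory and GNU find; the function names and the settling period are illustrative.

```shell
#!/bin/sh
# List regular files under $1 whose mtime is more than $2 minutes old.
settled_files() {
  find "$1" -type f -mmin +"$2" -print
}

# Upload only settled files.
# $1 = source directory, $2 = destination URL, $3 = minimum age in minutes
# gsutil cp -I reads the list of files to copy from stdin. Each file is
# copied by its base name, so subdirectory structure is not preserved.
upload_settled() {
  up_list=$(settled_files "$1" "$3")
  [ -n "$up_list" ] || return 0   # nothing has settled yet
  printf '%s\n' "$up_list" | gsutil -m cp -I "$2"
}
```

For example, upload_settled /home/sftpuser/incoming gs://corp-sftp-inbox/ 5 skips anything written in the last five minutes. A slow upload that pauses longer than the settling period would still slip through, so rename-on-completion remains the more reliable option where clients support it.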


Pattern 2: Mount GCS Bucket via gcsfuse for Native SFTP Writes

For teams needing “write-once, no local footprint,” gcsfuse (v0.41.10+) can expose GCS buckets directly as a Linux directory.

1. Install gcsfuse

export GCSFUSE_REPO=gcsfuse-$(lsb_release -c -s)
echo "deb https://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update && sudo apt-get install gcsfuse=0.41.10-1

2. Authenticate

For Compute Engine VMs, attach a service account that has the required Storage permissions on the bucket. For on-prem hosts, export a JSON key file:

export GOOGLE_APPLICATION_CREDENTIALS=/etc/gcp/sftp-sa-key.json

Don’t leave key files readable by other users: chmod 400 is the minimum, with the file owned by the account that runs gcsfuse.
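A small check like the following can run before each mount to catch a key file that has drifted to looser permissions. It is a sketch assuming GNU stat; the function name is illustrative.

```shell
#!/bin/sh
# Return 0 if the file at $1 grants no permissions to group or others
# (for example mode 400 or 600), 1 if it does, 2 if it cannot be read.
key_perms_ok() {
  key_mode=$(stat -c '%a' "$1") || return 2
  [ $(( 0$key_mode & 077 )) -eq 0 ]
}
```

For example, key_perms_ok /etc/gcp/sftp-sa-key.json || echo "key file permissions too open" can be placed at the top of a mount script.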

3. Mount the Bucket

mkdir -p /srv/gcs-inbox
gcsfuse corp-sftp-inbox /srv/gcs-inbox

Persist mount via /etc/fstab or a systemd unit:

gcsfuse#corp-sftp-inbox /srv/gcs-inbox fuse _netdev,allow_other 0 0
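By default, files in a gcsfuse mount are owned by the user who ran gcsfuse, so an SFTP user writing into a root-mounted bucket usually needs explicit ownership mapping. The sketch below only prints a candidate command line for review; the helper name is illustrative, and --implicit-dirs is optional (it makes directories implied by object names visible, at the cost of extra listing calls).

```shell
#!/bin/sh
# Print a gcsfuse command line that maps ownership of the mount to one user.
# $1 = local user, $2 = bucket name, $3 = mount point
gcsfuse_mount_cmd() {
  m_uid=$(id -u "$1") || return 1
  m_gid=$(id -g "$1") || return 1
  echo "gcsfuse -o allow_other --uid $m_uid --gid $m_gid --implicit-dirs $2 $3"
}
```

Calling gcsfuse_mount_cmd sftpuser corp-sftp-inbox /srv/gcs-inbox prints the command to run with sudo. The same uid and gid values can be added to the fstab options alongside allow_other.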

4. Wire the SFTP User’s Home to the GCS Mount

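sudo usermod -d /srv/gcs-inbox sftpuser

If the ChrootDirectory rule from Pattern 1 is still active, the mount must be reachable inside the chroot, for example by mounting the bucket on a subdirectory of /home/sftpuser instead of changing the home directory.

Uploaded files then appear in GCS as soon as each file is closed; gcsfuse uploads on close, not on every write.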

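Trade-off: gcsfuse is not fully POSIX-compliant. GCS itself is strongly consistent, but gcsfuse caches metadata, so changes made outside the mount can take a while to show up, and deleting non-empty directories sometimes fails with: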

device or resource busy: directory not empty

Retry or clean up with gsutil rm -r.
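The retry step can be wrapped in a small helper so cleanup jobs do not fail on the first transient error. This is a generic sketch, not a gcsfuse feature; the attempt count and delay are arbitrary.

```shell
#!/bin/sh
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 on the first success, 1 if every attempt fails.
retry() {
  r_max=$1
  r_delay=$2
  r_attempt=1
  shift 2
  while ! "$@"; do
    [ "$r_attempt" -ge "$r_max" ] && return 1
    sleep "$r_delay"
    r_attempt=$((r_attempt + 1))
  done
}
```

For example, retry 3 5 rmdir /srv/gcs-inbox/old-batch || gsutil -m rm -r gs://corp-sftp-inbox/old-batch tries the filesystem path first and falls back to the API. The old-batch directory name is hypothetical.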


Operating in Production: Security and Observability

  • IAM: Don’t grant roles/storage.admin unless absolutely necessary. Use bucket-level permissions.
  • Transport Security: Default SFTP gives strong encryption, but validate Ciphers in sshd_config (exclude legacy ones like arcfour if an older configuration still lists them).
  • Audit Logging: Enable GCS Data Access audit logs.
  • Monitoring: Set up Cloud Monitoring (formerly Stackdriver) alerts on Cloud Storage API errors.

Step            Default    Best Practice
Bucket perms    Owner      Scoped ServiceAcct
SSH Port        22         Nonstandard (e.g., 2222)
Logging         None       Centralized syslog + GCS audit logs
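To validate the cipher list mentioned above, the effective sshd configuration can be checked for legacy entries. The pattern below is an illustrative shortlist, not an authoritative one; align it with your own baseline.

```shell
#!/bin/sh
# Print each entry of a comma-separated cipher list ($1) that matches a
# legacy pattern. Prints nothing when the list is clean.
legacy_ciphers() {
  printf '%s\n' "$1" | tr ',' '\n' \
    | grep -E '^(arcfour|3des-cbc|blowfish-cbc|cast128-cbc)' || true
}
```

On the server, legacy_ciphers "$(sudo sshd -T | awk '$1 == "ciphers" {print $2}')" inspects the ciphers sshd will actually offer, including defaults that are not written in sshd_config.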

Non-Obvious Tips

  • When using gsutil rsync, a large number of files can bottleneck due to API quotas; stagger cron jobs or request quota increases.
  • gcsfuse doesn’t support file locking semantics natively. Avoid for multi-writer workloads.
  • SFTP clients sometimes “probe” with .tmp files; filter these if you process by trigger.
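For the temporary-file tip, a name filter can sit in front of whatever processes new uploads. The suffix list here is an assumption; check what your vendors' clients actually produce.

```shell
#!/bin/sh
# Return 0 if the file name ($1) looks like a client-side temporary file
# that should not be processed or uploaded yet.
skip_upload() {
  case "$1" in
    *.tmp|*.part|*.filepart) return 0 ;;
    *) return 1 ;;
  esac
}
```

The equivalent for the sync job is the gsutil rsync exclude option, for example gsutil -m rsync -r -x '.*\.(tmp|part|filepart)$' "$SRC" "$DEST", which takes a Python regular expression matched against the relative path.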

Summary

With the above patterns, you can continue accepting SFTP uploads while storing files durably in Google Cloud Storage. Both approaches keep protocol compatibility on the edge and relieve ops teams from the underlying hardware treadmill.

For edge cases (NTFS ACL preservation, inotify triggers, etc.), alternatives such as Google’s Storage Transfer Service or third-party gateways are available, but increase complexity and lock-in.

Small imperfections remain—SFTP in 2024 is a workaround, not a strategy. But if you must, this will keep you off most incident reviews.

Note: Validate all settings against your internal infosec baseline before rollout.