SFTP to Google Drive

#Cloud#Security#Automation#SFTP#GoogleDrive#Python

Seamless Integration: Automating Secure SFTP Transfers to Google Drive

Moving critical files from legacy systems to collaborative cloud storage often exposes awkward seams between protocols—SFTP and Google Drive top that list. Manual workflows, inconsistent file drops, and gaps in auditability emerge, particularly during periods of rapid scaling or compliance audits.

Here’s a pattern for establishing a reliable, auditable bridge between SFTP and Google Drive. This flow leverages Python 3.10+, Paramiko (SFTP), and Google’s Drive API (v3), all orchestrated from a hardened Linux VM. The goal: automated, actionable, and production-ready. Below, code, caveats, and operational notes.


Overview: Why Bother Integrating These?

Requirement           | SFTP        | Google Drive        | Integration Outcome
Encryption in Transit | Yes         | Partial (API/HTTPS) | End-to-end security, no manual weak links
Compliance/Audit      | Manual logs | Audit logs          | Unified event traceability
Collaboration         | No          | Yes                 | Instant team access post-transfer

Bridging these with automation supports regulatory requirements (SOX, HIPAA), reduces human error, and accelerates data surfacing—particularly in analytics or operational alerting chains.


Prerequisites

  • SFTP server: SSH key or password access required. Test with sftp user@host.
  • Google Cloud project: Drive API enabled, Service Account JSON credentials downloaded.
  • Linux server/VM: Python 3.10+, pip, outbound HTTPS and SSH open. Avoid Windows—permission inconsistencies are common with batch file cleanups.
  • Python libraries: paramiko, google-api-python-client, google-auth-httplib2, google-auth-oauthlib.
  • Target folder: Share specific Drive folder with service account (service-account@project.iam.gserviceaccount.com). Without this, upload attempts silently fail with HttpError 403: Insufficient Permission.

Step 1: Google Drive API Setup (Service Account)

  1. Enable Google Drive API at console.cloud.google.com/apis/library/drive.googleapis.com.
  2. Create Service Account via IAM > Service Accounts.
    No broad IAM roles are needed; Drive access is governed by the drive.file OAuth scope plus explicit folder sharing, so keep grants minimal.
  3. Download credentials: Place service_account.json securely on automation host.
  4. Grant Drive folder access:
    • Locate Drive folder → Share → add service account’s email.

Gotcha: Folder permission changes can propagate on Google’s side for several minutes. If automation reports 404/notFound, re-check sharing.


Step 2: SFTP Sync – Parameterized, Resilient Download

Paramiko is the de facto standard SFTP library for Python, but plan for timeouts (socket.timeout), dropped connections, and partial transfers. Key point: use the SFTP get operation rather than shelling out to scp, and remove partial downloads on failure so downstream stages never see incomplete files.

import paramiko
import os
import socket
import stat

def sftp_pull_all(host, port, username, password, remote_dir, local_dir):
    os.makedirs(local_dir, exist_ok=True)
    try:
        with paramiko.Transport((host, port)) as transport:
            transport.connect(username=username, password=password)
            with paramiko.SFTPClient.from_transport(transport) as sftp:
                for entry in sftp.listdir_attr(remote_dir):
                    if not stat.S_ISREG(entry.st_mode):
                        continue  # skip directories and special files
                    remote = f"{remote_dir}/{entry.filename}"
                    local = os.path.join(local_dir, entry.filename)
                    try:
                        sftp.get(remote, local)
                    except Exception as e:
                        # drop partial downloads so later stages never see them
                        if os.path.exists(local):
                            os.unlink(local)
                        print(f"Failed {remote}: {e}")
    except (paramiko.SSHException, socket.timeout) as e:
        print(f"SFTP error: {e}")

# Example use
sftp_pull_all("sftp.example.com", 22, "apiuser", "secret-2024", "/exports", "./sftp_tmp")

Test log sample:

Failed /exports/hugefile.tar.gz: Channel closed.

If you see this, your SFTP timeout is too short or quotas are exceeded.


Step 3: Uploading to Google Drive (Python, Drive v3)

Install dependencies:

python3.10 -m pip install --upgrade \
  google-api-python-client==2.108.0 \
  google-auth-httplib2 google-auth-oauthlib

Drive uploads can be made resumable; pass resumable=True to MediaFileUpload so interrupted transfers can continue rather than restart.

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload
from google.oauth2 import service_account
import os

def gdrive_upload(file_path, folder_id, creds_file):
    SCOPES = ['https://www.googleapis.com/auth/drive.file']
    creds = service_account.Credentials.from_service_account_file(
        creds_file, scopes=SCOPES)
    service = build('drive', 'v3', credentials=creds)
    file_metadata = {
        "name": os.path.basename(file_path),
        "parents": [folder_id],
    }
    media = MediaFileUpload(file_path, resumable=True)
    try:
        uploaded = service.files().create(
            body=file_metadata, media_body=media, fields='id'
        ).execute()
        print(f"Uploaded '{file_path}' as {uploaded['id']}")
        return uploaded['id']  # callers can gate cleanup on success
    except Exception as e:
        print(f"GDrive upload failed for {file_path}: {e}")
        return None

# Usage example
gdrive_upload("./sftp_tmp/report.xml", "1ko-DriveF0lderID", "service_account.json")

Known issue: very large (multi-gigabyte) uploads can fail intermittently under API quota pressure and connection resets. Split or chunk large files upstream.


Step 4: End-to-End Automation Script

Combine stages into an atomic sequence—download, upload, cleanup.

def main():
    sftp_cfg = {
        "host": "sftp.example.com",
        "port": 22,
        "username": "apiuser",
        "password": os.environ["SFTP_PASSWORD"],  # keep secrets out of source
        "remote_dir": "/exports",
    }
    gdrive_cfg = {
        "folder_id": "1ko-DriveF0lderID",
        "service_account_file": "service_account.json"
    }
    temp_dir = "./sftp_tmp"
    sftp_pull_all(
        sftp_cfg["host"],
        sftp_cfg["port"],
        sftp_cfg["username"],
        sftp_cfg["password"],
        sftp_cfg["remote_dir"],
        temp_dir,
    )
    for fname in os.listdir(temp_dir):
        fpath = os.path.join(temp_dir, fname)
        if gdrive_upload(fpath, gdrive_cfg["folder_id"], gdrive_cfg["service_account_file"]):
            # delete the local copy only after a confirmed upload
            os.unlink(fpath)

if __name__ == "__main__":
    main()

Security note: Do not persist plaintext credentials or the service_account.json on world-readable filesystems.


Scheduling & Monitoring

Linux/Unix:
Use a dedicated unprivileged user. Sample crontab (run every 20 minutes):

*/20 * * * * /usr/bin/python3.10 /opt/sftp_to_gdrive/bridge.py >> /var/log/sftp_bridge.log 2>&1
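A slow transfer can outlive the 20-minute interval and stack concurrent runs. One common guard, assuming util-linux flock is available on the host, is to skip a run while the previous one still holds the lock:

```
*/20 * * * * /usr/bin/flock -n /tmp/sftp_bridge.lock /usr/bin/python3.10 /opt/sftp_to_gdrive/bridge.py >> /var/log/sftp_bridge.log 2>&1
```

With -n, flock exits immediately instead of queueing, so overlapping windows are dropped rather than serialized.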

Windows: Not advised—inconsistent scheduled task environments. If necessary, use Task Scheduler with python.exe and monitor exit codes.

Non-obvious tip: ship logs somewhere searchable (Google Cloud Logging, Datadog, or a local rsyslog tail). Capture both stdout logs and HttpError tracebacks for operational watchdogging.


Production Recommendations

  • Logging: Swap all print for logging.info/warning/error. Enable rotating log files; avoid stdout-only for root-cause analysis.
  • Error handling: Wrap SFTP and Drive methods in retries with exponential backoff.
  • Checksums: Run sha256sum pre/post-transfer for chain-of-custody verification.
  • SSH keys: Use key-based auth for SFTP. Store keys in a restricted directory (chmod 600), and avoid embedded passwords in scripts.
  • File lifecycle: Automate SFTP-side file deletion only after upload/validation.
  • Alternative: For high throughput, consider Google Workflows or cloud-based SFTP-to-Drive connectors (e.g., GCP Transfer Service). These add cost but reduce local maintenance.
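The retry bullet above can be sketched as a small decorator; attempt counts and delays are illustrative, so tune them against your actual quotas:

```python
import random
import time
from functools import wraps

def with_backoff(attempts=4, base_delay=1.0):
    """Retry a function with exponential backoff plus jitter."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # out of retries; surface the error
                    # double the delay each attempt, jittered to avoid thundering herds
                    time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)
        return wrapper
    return decorator

# e.g. wrap the upload stage:
# gdrive_upload = with_backoff()(gdrive_upload)
```

Wrapping at the call boundary keeps the transfer functions themselves free of retry logic, and raising after the final attempt ensures failures reach the logs instead of being silently swallowed.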

Concluding Note

Treat data interchange as a pipeline, not occasional plumbing. End-to-end SFTP-to-Drive automation provides auditability, reliability, and scale. Realistically, edge cases (API throttling, lost connectivity, permission drift) will surface—so build for detection and controlled failure, not silent skips.

If you encounter intermittent 403s from Google Drive, revalidate sharing on the folder. For SFTP hosts enforcing MFA or IP restrictions, use a VPN endpoint on the automation box. There’s always another gap to close.