Mastering Efficient Data Export: How to Seamlessly Transfer Large Datasets from Snowflake to Amazon S3
Think exporting data is just running a simple unload command? Think again. This guide dives deep into optimizing data movement from Snowflake to S3, debunking common myths and exposing costly pitfalls that can blindside your data engineering team.
Why Efficient Data Export Matters
When building scalable data lakes or preparing datasets for downstream processing, transferring data from Snowflake to Amazon S3 is a core step. It’s easy to overlook the complexity here; after all, Snowflake’s COPY INTO command looks straightforward enough.
But in reality, inefficient exports can lead to:
- Longer job runtimes
- Higher AWS egress charges
- Increased Snowflake compute credits usage
- Fragmented files causing slower downstream reading
Mastering this export process means saving both time and budget while making your overall data pipeline more robust.
Step 1: Understand the Basics — The COPY INTO Command
Snowflake’s workhorse for exports is the COPY INTO SQL command. Here’s a simple example that unloads a table into S3:
COPY INTO 's3://my-bucket/my-export-folder/'
FROM my_database.my_schema.my_table
STORAGE_INTEGRATION = my_s3_integration
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' COMPRESSION = GZIP)
HEADER = TRUE
MAX_FILE_SIZE = 50000000; -- Max file size of ~50MB per unload file
This command tells Snowflake:
- Where to unload (s3://my-bucket/my-export-folder/)
- What data to export (my_table)
- How to format files (CSV with gzip compression)
- How big the export files should be (~50 MB max)
Pro Tip: Defining file sizes can drastically impact export runtime and downstream usability.
Step 2: Use Storage Integrations for Secure, Scalable Transfers
Never embed AWS keys directly in your SQL commands! Instead, use a Storage Integration, which provides role-based access with fine-grained permissions.
To create one:
CREATE STORAGE INTEGRATION my_s3_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-snowflake-role'
STORAGE_ALLOWED_LOCATIONS = ('s3://my-bucket/my-export-folder/');
Using integrations not only enhances security but also helps reduce authorization overhead for large or frequent exports.
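After creating the integration, you still have to complete the trust handshake on the AWS side. A quick sketch (the exact IAM policy details depend on your AWS setup):
-- Retrieve the IAM user ARN and external ID that Snowflake will use;
-- add them to the trust policy of the IAM role referenced above
DESC STORAGE INTEGRATION my_s3_integration;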
Step 3: Pick the Right File Format and Compression
The export file format has a significant impact on export speed, file size, and ease of later consumption.
Common picks:
| Format | Pros | Cons |
|---|---|---|
| CSV | Universal, easy debugging | Larger files, no schema or type information |
| Parquet | Columnar compression, fast analytical reads | Slightly more complex setup |
| JSON | Semi-structured flexibility | Larger files, parsing overhead |
For large exports, Parquet with Snappy compression is usually best because it reduces storage costs and accelerates reads in analytics engines like Athena or Redshift Spectrum.
Example file format creation for Parquet:
CREATE FILE FORMAT my_parquet_format
TYPE = PARQUET
COMPRESSION = SNAPPY;
Reference it in your unload:
COPY INTO 's3://my-bucket/my-export-folder/'
FROM my_database.my_schema.my_table
STORAGE_INTEGRATION = my_s3_integration
FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format');
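Optionally, to avoid repeating the integration and format in every statement, you can bundle them into a named external stage (my_export_stage is an illustrative name):
-- A named external stage that bundles location, integration, and file format
CREATE STAGE my_export_stage
  URL = 's3://my-bucket/my-export-folder/'
  STORAGE_INTEGRATION = my_s3_integration
  FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format');

-- Unload through the stage instead of a raw S3 URL
COPY INTO @my_export_stage
FROM my_database.my_schema.my_table;
You can then run LIST @my_export_stage to confirm the unloaded files landed.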
Step 4: Optimize Parallelism & File Sizes
Snowflake unloads in parallel, splitting output into multiple files whose size is capped by MAX_FILE_SIZE. Small files hurt both performance and cost (each S3 PUT request costs money), while enormous files can slow downstream processing.
Aim for 50–100 MB per unload file and adjust it with MAX_FILE_SIZE.
Alternatively, for immense datasets you can control the output explicitly by splitting the export into several smaller unload statements:
COPY INTO 's3://my-bucket/my-export-folder/bucket_0/'
FROM (
  SELECT * FROM my_database.my_schema.my_table WHERE MOD(ABS(HASH(id)), 10) = 0
)
STORAGE_INTEGRATION = my_s3_integration
FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
MAX_FILE_SIZE = 100000000;
Run the same statement for each hash bucket (0 through 9), changing the WHERE condition and the target prefix each time, to partition the export workload manually and produce controlled, chunked output files.
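If the table has a natural partitioning column, Snowflake’s PARTITION BY option on COPY INTO can produce similar chunking without manual hash buckets. A minimal sketch, assuming a created_date column that exists only for illustration:
-- Partitioned unload: one S3 prefix per created_date value
-- (created_date is an assumed column, used here for illustration)
COPY INTO 's3://my-bucket/my-export-folder/'
FROM my_database.my_schema.my_table
PARTITION BY ('dt=' || TO_VARCHAR(created_date))
STORAGE_INTEGRATION = my_s3_integration
FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
MAX_FILE_SIZE = 100000000;
Downstream engines like Athena can then treat the dt= prefixes as partitions instead of scanning everything.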
Step 5: Monitor Costs — Bandwidth & Compute Credits
Snowflake charges compute credits based on warehouse size and runtime during unload jobs.
Tips to minimize credits:
- Use appropriately sized warehouses (Medium or Large instead of XL or bigger)
- Suspend warehouses immediately after exports finish via automation scripts or tasks
- Monitor warehouse credit usage with Snowflake’s UI or queries against SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY, as shown below
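Here’s a minimal monitoring sketch; it assumes your role can read the SNOWFLAKE.ACCOUNT_USAGE share, and export_wh is a placeholder warehouse name:
-- Credits consumed per warehouse over the last 7 days
SELECT warehouse_name,
       SUM(credits_used) AS credits_last_7_days
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_7_days DESC;

-- Make the export warehouse suspend itself quickly when idle
ALTER WAREHOUSE export_wh SET AUTO_SUSPEND = 60;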
On the AWS side, bandwidth costs matter when moving large datasets out of Snowflake’s region or between regions. To minimize egress charges:
- Keep Snowflake and S3 buckets within the same AWS region.
- Avoid excessive read/write traffic caused by retries or inefficient transfers.
Step 6: Automate & Monitor Your Exports
Large-scale production pipelines require automation and alerting around data exports.
Sample shell script using SnowSQL CLI:
#!/usr/bin/env bash
# Unload to S3 and alert by email if the export fails
snowsql -a <account> -u <user> -w <warehouse> -d <db> -s <schema> \
  -o exit_on_error=true -q "
COPY INTO 's3://my-bucket/my-export-folder/'
FROM my_table
STORAGE_INTEGRATION = my_s3_integration
FILE_FORMAT = (FORMAT_NAME='my_parquet_format')
MAX_FILE_SIZE=83886080;
"
if [ $? -ne 0 ]; then
  echo "Export failed at $(date)" | mail -s "Snowflake Export Alert" your-email@example.com
fi
Build orchestration around this with Airflow, Dagster, or Prefect tasks that call the script (or connect to Snowflake directly), making sure retries and logging are in place. For lightweight scheduling you can also run the unload as a native Snowflake task, as sketched below.
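A minimal sketch of that native-task route; the task name, warehouse, and cron schedule are placeholders:
-- Schedule the unload inside Snowflake with a task
-- (nightly_s3_export, my_warehouse, and the schedule are illustrative)
CREATE OR REPLACE TASK nightly_s3_export
  WAREHOUSE = my_warehouse
  SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
  COPY INTO 's3://my-bucket/my-export-folder/'
  FROM my_database.my_schema.my_table
  STORAGE_INTEGRATION = my_s3_integration
  FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format')
  MAX_FILE_SIZE = 83886080;

-- Tasks start out suspended; resume to activate the schedule
ALTER TASK nightly_s3_export RESUME;
If you go this route, watch for failures via ACCOUNT_USAGE.TASK_HISTORY (or the INFORMATION_SCHEMA.TASK_HISTORY table function), since errors won’t surface in an external orchestrator.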
Summary Checklist for Efficient Snowflake → S3 Export
- Use COPY INTO with Storage Integrations for secure access.
- Select optimal file formats; Parquet + Snappy preferred.
- Tune MAX_FILE_SIZE to 50–100 MB per file to balance parallelism & costs.
- Size warehouses smartly; suspend them when idle.
- Keep Snowflake & S3 buckets in same AWS region where possible.
- Automate exports with scripting & monitoring.
- Consider manual partitioned queries for very large datasets.
Exporting from Snowflake doesn’t have to be a mysterious black-box operation. By mastering these practical steps you can save hours of wait time, reduce cloud bills, and keep your pipelines humming smoothly.
Got special cases or tips around exporting? Share your experiences in the comments!
Happy exporting!