How to Seamlessly Migrate Data from AWS S3 to RDS Using AWS Glue for Scalable Analytics
Forget manual ETL scripts—AWS Glue automates the grunt work of moving data from S3 to RDS, letting you focus on insights instead of infrastructure headaches.
Moving large datasets from Amazon S3 to Amazon Relational Database Service (RDS) efficiently is crucial for building robust, scalable analytics pipelines and reducing data latency in modern cloud architectures. Manual ETL processes can be cumbersome and error-prone, especially as data scales. Thankfully, AWS Glue, a fully managed serverless ETL service, simplifies this migration with powerful automation.
In this post, I’ll walk you through a practical, step-by-step guide on how to leverage AWS Glue to seamlessly migrate your data from S3 to RDS for scalable analytics.
Why Use AWS Glue for Migrating Data from S3 to RDS?
Traditionally, you might write custom scripts or use third-party tools to move and transform data. This manual approach is hard to maintain and does not scale easily. Here are the main reasons why AWS Glue stands out:
- Serverless: No infrastructure management — Glue handles provisioning and scaling.
- ETL Automation: Generates code automatically based on your data.
- Data Catalog: Keeps your metadata organized for easy discovery.
- Integration: Deeply integrated with both S3 and RDS services.
- Cost-effective: You pay only for the resources used during the ETL jobs.
What You’ll Need Before Starting
- An S3 bucket with your source data files (CSV, JSON, Parquet, etc.).
- An operational Amazon RDS database (MySQL/PostgreSQL/SQL Server/Aurora).
- Appropriate IAM roles with permissions for Glue to access S3 and RDS securely.
- AWS CLI or Console access and some basic familiarity with AWS services.
Step 1: Prepare Your Data in S3
Make sure your data files are stored in an organized folder structure inside your S3 bucket. For example:
s3://my-data-bucket/sales_data/2024/
Ensure that the files share a consistent schema, because AWS Glue relies on this information when inferring table structures and generating the ETL job.
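If you stage the files with the AWS CLI, a minimal sketch might look like this (the local file names are placeholders; the bucket and prefix match the example above):
# Upload exported CSV files into the year-based prefix shown above
aws s3 cp ./exports/sales_jan.csv s3://my-data-bucket/sales_data/2024/
aws s3 cp ./exports/sales_feb.csv s3://my-data-bucket/sales_data/2024/
# Confirm the files landed where the crawler will later look
aws s3 ls s3://my-data-bucket/sales_data/2024/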
Step 2: Create a Database in AWS Glue Data Catalog
The Glue Data Catalog is central to all AWS Glue operations; it stores metadata about your datasets.
- Open the AWS Glue Console.
- Navigate to Databases → click Add database.
- Give it a meaningful name such as sales_data_db.
- Click Create.
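If you prefer scripting over the console, the same database can be created with the AWS CLI (a one-line sketch; the description is just an example):
aws glue create-database --database-input '{"Name": "sales_data_db", "Description": "Catalog database for S3 sales data"}'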
Step 3: Create a Crawler for Your S3 Data
The crawler scans your source data and populates the Data Catalog tables with schemas.
- In AWS Glue Console → Crawlers → Add crawler.
- Name it descriptively, e.g., S3_Sales_Data_Crawler.
- Set the data store to your S3 path (s3://my-data-bucket/sales_data/2024/).
- Set the output database to sales_data_db.
- Create or choose an existing IAM role that can read from S3 and write to the Catalog.
- Run the crawler; after completion, verify the tables created under sales_data_db.
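For repeatable setups, the crawler can also be created and started from the AWS CLI. A sketch, assuming an IAM role named AWSGlueServiceRole-SalesData (substitute your own role):
# Define the crawler against the S3 path and output database from the steps above
aws glue create-crawler \
  --name S3_Sales_Data_Crawler \
  --role AWSGlueServiceRole-SalesData \
  --database-name sales_data_db \
  --targets '{"S3Targets": [{"Path": "s3://my-data-bucket/sales_data/2024/"}]}'
# Kick off a crawl, then list the tables it created
aws glue start-crawler --name S3_Sales_Data_Crawler
aws glue get-tables --database-name sales_data_db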
Step 4: Configure Your Amazon RDS Instance Security
Before moving data into RDS:
- Ensure your RDS instance is accessible by the AWS Glue service by configuring VPC security groups and subnet associations correctly.
- Allow inbound traffic on the database port (e.g., 3306 for MySQL) from the security group Glue will use (a CLI sketch follows this list).
- Note the endpoint address, database name, username, and password; you will need them for the Glue connection.
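As an example of that security-group rule, here is a minimal sketch that allows MySQL traffic from the security group Glue will use into the RDS instance's security group (both group IDs are placeholders):
# sg-0aaa1111rdsexample: security group attached to the RDS instance (placeholder)
# sg-0bbb2222glueexample: security group used by the Glue connection (placeholder)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0aaa1111rdsexample \
  --protocol tcp \
  --port 3306 \
  --source-group sg-0bbb2222glueexample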
Step 5: Create an AWS Glue Connection to Amazon RDS
Glue needs a connection object pointing at your target RDS database.
- Go to Glue Console → Connections → Add connection.
- Choose type: JDBC.
- Enter:
  - Connection name, e.g., rds-sales-db-conn.
  - JDBC URL, for example: jdbc:mysql://your-rds-endpoint.amazonaws.com:3306/your_db_name
  - VPC settings matching those of your RDS instance.
  - Credentials (username/password).
- Test the connection before saving.
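The same connection can be defined from the AWS CLI. A minimal sketch, assuming placeholder subnet, security group, and Availability Zone values that you would replace with the ones your RDS instance actually uses:
aws glue create-connection --connection-input '{
  "Name": "rds-sales-db-conn",
  "ConnectionType": "JDBC",
  "ConnectionProperties": {
    "JDBC_CONNECTION_URL": "jdbc:mysql://your-rds-endpoint.amazonaws.com:3306/your_db_name",
    "USERNAME": "admin",
    "PASSWORD": "your-password"
  },
  "PhysicalConnectionRequirements": {
    "SubnetId": "subnet-0123example",
    "SecurityGroupIdList": ["sg-0bbb2222glueexample"],
    "AvailabilityZone": "us-east-1a"
  }
}'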
Step 6: Create an ETL Job in AWS Glue
Now you’re ready to create an ETL job that reads from the source table created by your crawler and writes into your RDS table.
- In Glue Console → Jobs → Add job:
  - Name: S3ToRDS_Migration_Job
  - IAM Role with permission to access both the source (S3/Catalog) and target (RDS).
  - Choose a Spark environment (Glue version 3.0 or later is recommended).
- When asked for the script source:
  - Select “Create a new script”.
  - Specify the input table created by the crawler (e.g., sales_data_db.sales_2024).
- For the output target:
  - Select “Data store” → JDBC connection (rds-sales-db-conn).
  - Specify the target table name in RDS (create the table manually if it does not already exist).
AWS Glue will auto-generate a PySpark ETL script that reads from S3 via the catalog table and writes into the relational database using batched JDBC inserts.
You can customize this script – for example:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read data from the Glue Catalog table created by the crawler
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="sales_data_db",
    table_name="sales_2024"
)

# Apply any transformations if needed here...

# Write to RDS through the JDBC connection defined in Step 5
glueContext.write_dynamic_frame.from_jdbc_conf(
    frame=datasource0,
    catalog_connection="rds-sales-db-conn",
    connection_options={"dbtable": "rds_sales_2024", "database": "your_db_name"},
    transformation_ctx="datasink"
)

job.commit()
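If you script the job definition rather than clicking through the console, a sketch with the AWS CLI might look like the following (the script location in S3 and the role name are placeholders; job bookmarks are enabled here as a preview of the pro tips below):
aws glue create-job \
  --name S3ToRDS_Migration_Job \
  --role AWSGlueServiceRole-SalesData \
  --glue-version "4.0" \
  --command '{"Name": "glueetl", "ScriptLocation": "s3://my-data-bucket/scripts/s3_to_rds_job.py", "PythonVersion": "3"}' \
  --connections '{"Connections": ["rds-sales-db-conn"]}' \
  --default-arguments '{"--job-bookmark-option": "job-bookmark-enable"}'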
Step 7: Run & Monitor Your Job
Run the job via console or CLI:
aws glue start-job-run --job-name S3ToRDS_Migration_Job
Watch job progress in the AWS Console under the job's Runs tab, or poll run status from the CLI as sketched below.
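A minimal sketch of CLI-based monitoring (the run ID shown is a placeholder; the real one is returned by start-job-run):
# Check a specific run
aws glue get-job-run --job-name S3ToRDS_Migration_Job --run-id jr_0123exampleid
# Or list recent runs and their states
aws glue get-job-runs --job-name S3ToRDS_Migration_Job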
When successful:
- Verify data imported correctly by querying your Amazon RDS database.
- Check performance metrics, and if needed tweak the number of workers and the JDBC batch size in the job parameters for faster throughput.
Pro Tips for Production Readiness
- Partition your data in S3 and use pushdown predicates (partition pruning) in your jobs when datasets are large, so each run scans fewer files and finishes faster.
- Enable job bookmarks in AWS Glue to process incremental changes only.
- Store sensitive credentials in AWS Secrets Manager and reference them during connection setup instead of embedding passwords (see the sketch after this list).
- Schedule jobs with Glue triggers, Amazon EventBridge (formerly CloudWatch Events), or Step Functions for automation and end-to-end pipeline orchestration.
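For the Secrets Manager tip above, a minimal sketch of creating the secret (the name and values are placeholders); you can then reference it when defining the Glue connection rather than typing the password directly:
aws secretsmanager create-secret \
  --name rds-sales-db-credentials \
  --secret-string '{"username": "admin", "password": "your-password"}'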
Conclusion
Migrating massive amounts of analytical data from Amazon S3 into Amazon RDS no longer requires arduous manual scripting! By leveraging AWS Glue, you can build maintainable, efficient pipelines that scale effortlessly and keep latency low — giving analytic teams fresh insights faster than ever before.
Start automating your S3-to-RDS migration today with these steps and free yourself from tedious plumbing so you can focus on what really matters — unlocking value hidden inside your data!
Happy migrating! 🚀
Have you used AWS Glue for similar migrations? Share your experiences or questions below!