Extract, Transform, Load (ETL) is a data pipeline process that extracts data from a source system, transforms it in some way, and loads it into a target system. In this article, we’ll demonstrate how to build an ETL job that extracts data from a MySQL database and loads it into a Redshift data warehouse. We’ll also apply the Change Data Capture (CDC) concept so that each run picks up only the delta changes, and trigger this ETL job every hour.
Using a Plain Python Script
Prerequisites
- Python 3 installed on your local machine
- MySQL and AWS Redshift instances up and running
- mysql-connector-python and psycopg2 Python libraries installed
- An orders table in your MySQL database with create_date and update_date columns
Step 1: Extract Data from MySQL
First, we will extract the data from the MySQL database using the mysql-connector-python library. Here's a simple Python function that connects to a MySQL database and fetches new records from the orders table:
import mysql.connector
from datetime import datetime

# CDC watermark: start at the epoch so the first run extracts every record
last_update = datetime(1970, 1, 1)

def extract_new_records():
    global last_update
    connection = mysql.connector.connect(user='mysql_user', password='mysql_password',
                                         host='mysql_host', database='mysql_database')
    cursor = connection.cursor()
    # Capture the new watermark before running the query, so rows updated while
    # the query executes are picked up on the next run instead of being missed
    extraction_time = datetime.now()
    # Parameterized query: the connector handles datetime formatting and quoting
    cursor.execute("SELECT * FROM orders WHERE update_date > %s", (last_update,))
    records = cursor.fetchall()
    last_update = extraction_time
    cursor.close()
    connection.close()
    return records
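Note that last_update is held only in memory, so every time the script restarts it falls back to the epoch and the next run re-extracts the whole table. In a real pipeline you would typically persist this watermark between runs, for example in a small metadata table or a local file.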
Step 2: Load Data to Redshift
The next step is to load the extracted data into Redshift. We’ll use the psycopg2 library to insert the records into the Redshift table.
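A minimal sketch of this load step is shown below. It assumes a Redshift table named orders whose columns match the MySQL source table, and placeholder connection details (redshift_user, redshift_host, and so on) that you would replace with your own:
import psycopg2

def load_to_redshift(records):
    # Nothing to do if the extract step returned no new or updated rows
    if not records:
        return
    # Placeholder credentials; substitute your Redshift cluster's values (5439 is the default Redshift port)
    connection = psycopg2.connect(user='redshift_user', password='redshift_password',
                                  host='redshift_host', port=5439, dbname='redshift_database')
    cursor = connection.cursor()
    # Build one %s placeholder per column so the statement matches the row width;
    # this assumes the target orders table has the same column order as the source
    placeholders = ', '.join(['%s'] * len(records[0]))
    cursor.executemany(f"INSERT INTO orders VALUES ({placeholders})", records)
    connection.commit()
    cursor.close()
    connection.close()
For larger batches, psycopg2.extras.execute_values or staging the data in S3 and issuing a COPY command is usually much faster than executemany, but the simple version keeps the example readable.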