Write Your First SQL ETL Pipeline (Part II)

How to create and load an aggregate table for your GCP usage using ETL principles and SQL commands.

Image: White loading text on black background. Photo by Mike van den Bos on Unsplash.

Writing A SQL ETL Pipeline: Desired Output

Although Python gets more love for building ETL/ELT/EL pipelines, SQL can be just as efficient for a recurring load job.

This is part II in a two-part series. If you’re at all lost, please see part I.

When creating a new pipeline, whether in SQL or Python, I find it especially helpful to have an idea of what my output should be.

Luckily, I included the desired output in part I.

I’ll re-share it here for your review:

Having written queries that pull specific fields like serviceDescription and perform transformations like extracting the year from invoice (invoiceYear), I want to focus on one column that requires an extra step.
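As a rough sketch of the kind of query described above, the following BigQuery-style SQL pulls a service description, derives invoiceYear from an invoice month, and aggregates cost. It is not the query from part I. The inline `billing_sample` rows, the flat `service_description`, `invoice_month`, and `cost` columns, the `YYYYMM` string format, and the `totalCost` aggregate are all assumptions made for illustration.

```sql
-- Hypothetical sample rows standing in for a GCP billing export table.
-- Column names and values are assumptions, not the author's schema.
WITH billing_sample AS (
  SELECT 'Compute Engine' AS service_description, '202211' AS invoice_month, 12.50 AS cost
  UNION ALL
  SELECT 'Compute Engine', '202301', 8.00
  UNION ALL
  SELECT 'BigQuery', '202301', 3.25
  UNION ALL
  SELECT 'BigQuery', '202302', 1.75
)
SELECT
  -- Pull a specific field and rename it to match the output column.
  service_description AS serviceDescription,
  -- Transformation: the first four characters of a YYYYMM string are the year.
  CAST(SUBSTR(invoice_month, 1, 4) AS INT64) AS invoiceYear,
  -- Aggregate so each service/year pair becomes one row.
  SUM(cost) AS totalCost
FROM billing_sample
GROUP BY serviceDescription, invoiceYear
ORDER BY invoiceYear, serviceDescription;
```

Because the sample data lives in a CTE, the query runs without any existing table. Swapping `billing_sample` for a real table reference would be the main change needed to point it at actual usage data.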
