Using BigQuery SQL To Prettify Your Ugliest URLs And Extract Insights

Learn BigQuery SQL techniques to extract only the most relevant elements of your ugliest URLs.

Share
Spilled paints on blank canvas.
Photo by Ricardo Viana on Unsplash

Making Sense of Messy URLs

Often discounted as mere metadata, URLs you scrape, ingest or create contain a lot of relevant information that is often ignored simply because they’re too “ugly.” It’s true that some URLs can be off-putting. But more often, URLs contain information that doesn’t just make for interesting analysis; it can actively drive traffic and, by extension, revenue for your clients or yourself.

Take, for instance, the utm paramter you’ve undoubtedly seen in paragraph-long URLs. For digital marketers, this is a useful tool because it tells us what source pushed you to our site, landing page or affiliate link. Though, if you’re learning SQL and not interested in digital advertising, I, admittedly, may have lost you by now.

If you don’t care about gathering precise data to drive traffic you should care about parsing URLs because URLs represent some of the more complex STRING type data you’ll encounter. The functions you’ll use to extract the “good stuff” from URLs are applicable to a lot of STRING operations in SQL.

Therefore, my intent isn’t to bore you with a bunch of functions exclusive to parsing URL information. Instead, I hope to share a few approaches that can be transferrable when working with longer strings of many types like message text or even genetic information.

Build Your Pipeline To A Data Engineering Career

You’ve reached the limit of the public preview. The full version of this post includes the implementation details: The code, the edge cases, and the "why" behind the architecture.

When you join PipelineToDE, you get:

  • The DA → DE Pathway Course: A structured roadmap to bridge the gap between analysis and engineering.
  • Weekly Senior Deep Dives: Fresh, tactical insights on Python, Cloud (GCP/AWS), and modern orchestration delivered every week.
  • Production-Ready Blueprints: Access to 80+ protected stories and code repos from my time in the trenches as a Senior DE
  • The DE Job Board (Coming Soon): Exclusive access to a curated board of high-agency Data Engineering roles.