Scrape, Clean and Store Zillow Apartment Data (ETL Pipeline)
Use Python to find an apartment on Zillow.
Most Zillow projects and tutorials focus on home buying. As a current apartment dweller, I thought it would be more interesting to work with Zillow apartment data: the fields returned vary less than they do for home listings and, in my opinion, the results can be more interesting to examine.
I’ll demonstrate the three main steps involved in getting recent apartment data:
- Scraping a Zillow web page for apartments in Orlando
- Cleaning/transforming the resulting data frame
- Storing the 400+ rows in a BigQuery table for later analysis
Along the way I’ll use tools you may already have encountered: BeautifulSoup, Pandas operations for data frame manipulation, basic SQL, and the BigQuery API.
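As a rough preview of the clean and store steps, here is a minimal sketch. It is not the pipeline built in this post: the column names, the sample rows, and the helper names `clean_listings` and `load_to_bigquery` are all assumptions made for illustration. The load function assumes the `google-cloud-bigquery` package (with `pyarrow`) is installed and that credentials are configured, so it is defined here but not called.

```python
import pandas as pd

# Hypothetical raw rows, shaped the way scraped listing text often looks.
raw = pd.DataFrame(
    {
        "address": ["100 Example St", "200 Sample Ave", "100 Example St"],
        "price": ["$1,450/mo", "$1,725/mo", "$1,450/mo"],
    }
)


def clean_listings(df):
    """Return a copy with price converted from text like '$1,450/mo' to an integer."""
    cleaned = df.copy()
    cleaned["price"] = (
        cleaned["price"]
        .str.replace(r"[^\d]", "", regex=True)  # keep digits only
        .astype(int)
    )
    # Drop exact duplicate rows and renumber the index.
    return cleaned.drop_duplicates().reset_index(drop=True)


def load_to_bigquery(df, table_id):
    """Load the data frame into a BigQuery table and return the row count loaded.

    table_id is a placeholder of the form 'project.dataset.table'.
    """
    # Imported lazily so the cleaning step runs without the BigQuery client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.load_table_from_dataframe(df, table_id)
    job.result()  # wait for the load job to finish
    return job.output_rows


cleaned = clean_listings(raw)
print(cleaned)
```

Converting the price to an integer before loading means the BigQuery column can be queried numerically (for example, filtered or averaged in SQL) without further casting.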
Scraping Zillow
Unlike text-heavy sites such as Wikipedia, Zillow incorporates many visual and dynamic elements, such as slideshows and map applications.
This doesn’t necessarily make it harder to scrape data, but it does mean you’ll have to dig a little deeper into the underlying HTML/CSS to find the exact elements you’ll want to target.
To get the initial data, we need to solve three problems:
- Find the relevant elements and store their output
- Increase the page count to account for all results
- Convert the resulting dictionary to a more legible and workable data frame (personal preference but highly recommended)
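The three problems above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code from the full post: the HTML snippet, the `listing`, `address`, and `price` class names, and the `?page=` URL pattern are all hypothetical stand-ins, and Zillow's real markup and URLs will differ. To keep the example runnable offline, it parses an inline sample instead of fetching a live page.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup standing in for one page of search results.
SAMPLE_HTML = """
<ul>
  <li class="listing">
    <span class="address">100 Example St, Orlando, FL</span>
    <span class="price">$1,450/mo</span>
  </li>
  <li class="listing">
    <span class="address">200 Sample Ave, Orlando, FL</span>
    <span class="price">$1,725/mo</span>
  </li>
</ul>
"""


def page_url(base_url, page):
    """Build the URL for a results page (hypothetical URL pattern)."""
    return f"{base_url}?page={page}"


def parse_listings(html):
    """Collect address and price text from every listing card in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = {"address": [], "price": []}
    for card in soup.find_all("li", class_="listing"):
        address = card.find("span", class_="address").get_text(strip=True)
        price = card.find("span", class_="price").get_text(strip=True)
        results["address"].append(address)
        results["price"].append(price)
    return results


# Problems 1 and 2: parse each page and accumulate the results in one dictionary.
all_results = {"address": [], "price": []}
for page in range(1, 3):
    url = page_url("https://example.com/apartments", page)
    # A real run would download the HTML at `url`; the sample stands in for each page.
    page_results = parse_listings(SAMPLE_HTML)
    for key, values in page_results.items():
        all_results[key].extend(values)

# Problem 3: convert the dictionary of lists into a data frame.
df = pd.DataFrame(all_results)
print(df.shape)
```

Keeping the results as a dictionary of equal-length lists makes the final conversion a single `pd.DataFrame(...)` call, with one column per key.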