Using PyPDF For PDF → CSV Conversion To Find Missing Groceries
Convert PDFs to CSVs in an unusual but practical bit of real-life data engineering problem-solving.
There’s an epidemic plaguing my neighborhood. Thankfully, it’s not biological. And because it’s an apartment complex, the area is rarely targeted by “porch pirate” package thieves.
Our current issue: grocery deliveries that go missing or arrive at the wrong address.
Though the errors span the delivery service spectrum from Kroger to Instacart, I’ve only had experience with Walmart Plus (it’s part of my cell phone plan). After several… incidents, I’ve become vigilant (or paranoid) about the accuracy of my household’s deliveries. When I shopped for groceries in person, I could check that I had purchased, and made it home with, everything on my list.
Since I don’t want to tap through the Walmart app or scroll the website, and because I’m a nerd, I’ve been looking for a way to automate grocery tracking.
Walmart offers an API, but only to vendors and e-commerce partners. Given the MFA in the current sign-in flow and the discussions on the web scraping subreddit, I’m not going to attempt a web scraper that depends on ever-changing proxies.
I’ve found a workaround, but it’s not ideal.