Parse 12 Months Of Credit Card Statements In 3 Minutes
How to use Python to read multi-page PDFs and transform unstructured data, and SQL to format the final result in BigQuery.
Set up a virtual environment, install Python & pip and run Python scripts in a Google Cloud Compute Engine virtual machine.
Contrasting Google Cloud Platform's authentication process with a typical API's highlights the need for secure credential storage.
Leverage BigQuery SQL table metadata to deduplicate, partition and delete data — all using only one word.
Covering GitHub versioning, CI/CD pipeline development and scheduling jobs within Google Cloud Platform.
A risk-averse approach to “flipping the switch” from test tables to production tables, featuring a subtle BigQuery SQL function.
How data engineers can set realistic development expectations and respond to impatient stakeholders.
Convert PDFs to CSVs: an unusual but practical example of real-life data engineering problem-solving.
Spreadsheets are a breeding ground for data inaccuracy — unless you can solve one core problem.
How data engineers can anticipate, adapt to and recover from inevitable data downtimes and API outages.
The 35-Line SQL Query That Powers A Top 3 SQL Publication’s Analytics (Part II)
One function you gloss over has the power to save you hours of development time — and preserve data accuracy.