The 1 Question New Data Engineers Never Ask (But Absolutely Must)

Whether out of fear or old habits, new engineers fail to ask important probing questions. Here’s how devs can think more critically.

Photo by Nik on Unsplash


As overwhelmed as I was during my first week as a data engineer, I gave my team plenty to consider. Since I was a new hire and very junior, I was in probationary limbo for nearly 2 months. And even though I couldn’t materially contribute, I could fulfill one request: Ask us anything. Coming from a journalism background, I’m very comfortable seeking out answers.

In fact, until my pivot to data engineering, bothering people in exchange for information was my most marketable skill.

And while the questions I asked helped me “get up to speed”, I recently realized that in my quest to understand processes, through years in both an entry-level and a now-senior position, I (like many, many juniors) haven’t asked one question nearly enough.

It’s a line of thought best summarized by Jurassic Park’s iconic quote, which I’ll paraphrase.

“Your [data engineers] were so preoccupied with whether or not they could, they didn’t stop to think if they should.” - Dr. Ian Malcolm, Jurassic Park (bracketed substitution mine).

If you’re new to a workplace or learning a new technology, you likely follow this line of logic. Collegiate learning and early professional development, like a simple programming script, emphasize input and output. Onboarding is also fairly binary.

In either case, it’s an effort to ask or answer the same question: “How do you do x?”

Often, this is phrased as a binary: “(Yes or no) Do you know how to do x?”

Do you know how to write SQL that sources data to solve our business needs?
Do you know how to use GitHub for CI/CD?
Do you know how to deploy a cloud function?

The implicit bit of this “how do you?” line of thought is an unspoken “…So I don’t have to teach or worry about you.” With new hires struggling to stay afloat among a deluge of org-specific vocab, domain knowledge and nuances of the chosen tech stack, there is no time to consider “should” when there are stakeholders, deadlines and quarterly goals.

Unless you founded the data team, until you rise in the ranks, your job is to learn, churn out efficient code and not burn through too much of your org’s compute resources. Also be pleasant to work with. That’s underrated.

But is this no-hard-questions-asked approach really productive?

Failing to ask “why” makes you a close-minded developer. Professionally, this also siloes you as someone who just churns out code and doesn’t think critically about the scope of your problem and the efficiency of your solution. Unless you’re working for Elon Musk at the platform formerly known as Twitter, volume of code is not a reliable metric to determine how good you are at your job. At 3+ years of experience, I’m still relatively new to the data world, but I’d bet that few, if any, engineers earned raises and promotions solely on the merits of their GitHub commit histories.

A recent innocent but effective “why” helped crystallize this concept and, frankly, inspired this piece.

Without getting into too many specifics, I’m part of a subset of a team that is redesigning pipelines and a data storage framework for one of our org’s high-priority data sources. For context, I’ve been working with this data for nearly two years at this point. Up until this meeting, the data fulfilled stakeholder needs, but, like me, stakeholders are really good at asking questions.

One of their favorite questions: Can you get x data? Unfortunately, the data sources we accumulated had become quite complex. When I say I’ve been working with this particular data for nearly two years, I don’t mean I occasionally pick up a ticket.

I’ve spent hours (probably days) at this point designing, refining and tearing apart various iterations of pipelines. Fun fact: I did critical work for this initiative on the floor of my wife’s studio apartment when she briefly lived and worked in France. Maybe it was the lack of air conditioning adding pressure, but I wrote some of my best code during my summer living in Europe.

For this particular meeting, however, a more senior team member joined. They listened intently to a comprehensive technical design discussion. Then, at the end, they chimed in.

“I have a question… Why? Why do they need this? Why are we doing this?” Maybe it was the lag of my post-hurricane WiFi or maybe there really was a pronounced, contemplative pause.

It was nearly as jarring as a moment in HBO’s Silicon Valley.

“I have a question. That was horrible.”

— Gavin Belson, Silicon Valley

In either case, it gave me an opportunity to step back from the grind and, for the first time, question why we needed this increasingly complex data source. Instead of making team members defensive, the question led to a productive conversation about not just refining our approach to ingestion, but also managing stakeholder expectations and softly encouraging them to “do more with less.”

As important as these questions are in the context of larger initiatives, I’d argue that individual developers need to constantly ask “why?”

The downside of doing without thinking critically is that, once you’ve reached at least an intermediate level of proficiency, you can unconsciously go on autopilot.

This leads to what I call a paint-by-numbers approach to coding and data science. I’m not saying you need to ponder every variable naming convention you use, but you should at least be able to rationalize, if not defend, your micro-level design choices. Knowing why (or why not) you’re doing something can help you adapt when something changes.

For instance, I was a habitual user of Pandas’ .append() method. To my disappointment, Pandas 2.0 removed DataFrame.append() (it had been deprecated since version 1.4). I easily could have panicked and said “Iterating and appending key values to an empty data frame is how I’ve always converted JSON to a data frame. What am I going to do?” But being forced to adapt to the change made me think about what prompted that habit initially.

The explanation I arrived at was that, for whatever reason, I’m most comfortable working with data in list form. I know that this piece is all about asking “why”, but I honestly can’t justify this with a technical explanation. It’s like why I drive a certain route or why I like docking boats on a certain side (typically with a dock on the right). It’s just comfort.
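For anyone breaking the same habit, here is a minimal sketch of what the list-first approach can look like without .append(). The `records` payload and its field names are made up for illustration; this is one reasonable pattern, not the only replacement.

```python
import pandas as pd

# Hypothetical parsed JSON, e.g. the body of an API response.
records = [
    {"id": 1, "name": "alpha", "meta": {"region": "us"}},
    {"id": 2, "name": "beta", "meta": {"region": "eu"}},
]

# Old habit (DataFrame.append no longer exists in pandas 2.x):
# df = pd.DataFrame()
# for rec in records:
#     df = df.append(rec, ignore_index=True)

# Option 1: collect plain dicts in a list, then build the frame once.
rows = []
for rec in records:
    rows.append(
        {"id": rec["id"], "name": rec["name"], "region": rec["meta"]["region"]}
    )
df = pd.DataFrame(rows)

# Option 2: let pandas flatten the nested JSON itself.
# Nested keys become dotted column names such as "meta.region".
df_flat = pd.json_normalize(records)

# Option 3: if you already hold several small frames, concatenate them once.
chunks = [pd.DataFrame([row]) for row in rows]
df_concat = pd.concat(chunks, ignore_index=True)
```

Building the frame once from a list also avoids copying the whole data frame on every loop iteration, which is what the old append-in-a-loop pattern did.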

And if your approach is legible, resource-efficient and fits with your team’s style guide, then I don’t think anyone has grounds to criticize it. But there is one context in which it is unacceptable to say “that’s just how I do it.”

As I’ve documented elsewhere, when I first applied to jobs out of school I spent a lot of time crafting a portfolio. In doing this, I clearly thought a lot about how I’d approach my projects and, ultimately, present the data.

There were a few times in interviews where I got “tripped up” when it came to technical questions. It’s not that I didn’t know the technology (I truthfully wouldn’t have put it on my resume otherwise), it’s that I didn’t know the tech that well. But in an interview with senior management, after presenting my projects, I got a question that nearly left me tongue-tied.

For context, I used BigQuery for several projects I discussed and, after learning that this was highly desired in a new hire, I mentioned it in this interview. After the interviewer patiently sat through my discussion of my projects and technical skills, they asked:

“You mention BigQuery a lot, which is great. That’s something we need. But then in this project in your portfolio you use SQLite. Why use two databases? And especially SQLite?”

The honest answer, at the time, would have been “Because that’s what school assignments used and I’m too busy applying for 10+ jobs a day to really question anything I’ve learned.”

Caught off-guard, I managed to croak: “For my learning.” Mercifully, this was acceptable (enough) and we moved on to other topics.

The core of that overly honest response should prompt anyone who frequently uses a technology due to a school mandate to question the applicability of the tech and the marketability of the skill associated with that tool.

In short: Just because you learn something in school doesn’t mean you’re forever bound to that tool or approach. I wrote recently about my habit of using Jupyter Notebook for development having spawned out of a school requirement. Having had time to reflect and compare development environments, I’ve since branched out beyond the school’s environment of choice, PyCharm, to the more professionally apt VS Code (nothing against PyCharm). This instance made me realize it’s ok to question past and present choices in tech stacks, especially when it comes to well-regarded “tools of the trade.”

While I’m encouraging you to ask “why” more often than you do (and, make no mistake, this is advice I need to heed as well), your “why” needs to be tactful. Asking “why” too often can annoy a more senior teammate, reminding them of a child incessantly asking a parent “why” at inopportune times.

I’ll close with some examples of appropriate whys:

Why are we using x tool over y when y clearly offers a more streamlined integration with our data warehouse?
Why are we dedicating development resources to solving this issue when there isn’t a clear business outcome?
Why are stakeholders asking for a new data pipeline when this existing table provides nearly all of the dimensions they’re seeking?
Why are we paying for x service when we could feasibly build our own solution?

Tactful, occasional whys can raise your professional profile and distinguish you as a critical thinker who considers not just code-level choices, but also business implications.

Ideally, each “why” and subsequent choice helps you build credibility.

And then the response to your next big request will be my favorite “why” response of all.

“Why not?”

