Why Do My Data Engineering Requests Take Forever?
How data engineers can set realistic development expectations and respond to impatient stakeholders.
ZachOverflow
ZachOverflow is a recurring column in which I attempt to answer one frequently asked data science question thoroughly and honestly. No oversaturated topics. No listicles. No clickbait. Just my (mostly) unfiltered responses based on professional experience, technical exposure and, yes, the occasional unsubstantiated opinion.
Journalism school teaches you to do one thing really well: scramble at the last minute. The practice is so widely adopted that deadlines are often set at the dead end of the day, and assignments are often what are known as “day turns”, meaning you’re expected to begin and submit work within the same day. A certain kind of individual thrives in these conditions. But even if you’re the last-minute type (as I definitely am), you will find yourself missing deadlines due to factors out of your control: interviews falling through, equipment malfunctioning and any number of personnel failures.
Had you just been submitting work to a supervisor, little mistakes could be forgiven or go unnoticed. However, a reporter’s ultimate stakeholder is the reading, listening or viewing audience. And, especially at the overwhelmed, under-experienced local level, teams would make cringe-worthy mistakes like airing broadcast videos without sound, forgetting to adjust teleprompters, etc.
Because pressure and mistakes were the name of the game, I assumed this was how most jobs functioned. But software engineering and data engineering are not last-minute endeavors.
There’s a reason sprints are measured in weeks and not hours. In orgs that have mastered project management, there is a defined release process designed to refine, test and scrutinize code before it makes it into a product, pipeline or a customer’s hands.
And, unfortunately for those who live by the “get it done yesterday” ethos, these things take time, which is why you will rarely, if ever, have a “day turn” experience as a data engineer. Often, when stakeholders ask about timelines or about speeding up a process, I remind them that the end goal is accuracy, not speed.
I’ve been getting more last-minute requests than usual, and I suspect that if you’re a working engineer you receive requests with the same level of “urgency”. So I wanted to offer an explanation for why the data engineering process takes “so long” and what to say when someone wants you to hurry.
One of the guardrails that will slow development is a project management methodology. I use Agile, so this is what I’m most familiar with. Agile can be a double-edged sword. On one hand, it can restrict what you work on and make you seem less willing or able to “jump on” a last-minute request. On the other hand, it allows you to efficiently allocate working hours to prioritize tasks linked to projects that have organizational impact. The difficult parts of working within an Agile framework, in my experience, are consistency and enforcement.
Sprint schedules can vary between and even within organizations, which can lead to a disconnect when speaking with stakeholders about expectations and deadlines. Like any framework, Agile is only effective if it is consistently enforced. Say yes to too many “quick favors” and you have overcommitted. Say no too often and you quickly gain a reputation for not being a “team player.” Luckily, this is why teams have team leads, department managers and project managers. They act as the “enforcers”, so they can speak with anyone who just wants a “quick data pipeline.”
If you’re receiving consistent last-minute requests, before you go to your manager, I would offer some variation of the following as a response to manage expectations and set future boundaries:
Hi (stakeholder)! Thank you for reaching out. Unfortunately, we’re unable to look into your task/deliverable/project at the moment. As a reminder, we follow an Agile methodology and work in (length of sprint) periods. Consequently, we have already allocated resources for this sprint. If this request needs a higher level of priority, please escalate to (team lead/manager/technical project manager).
In my experience, this is a sufficient response that concisely and professionally offers an explanation for why you’re refusing the request and how such requests might be considered in the future.
It sets boundaries and expectations.
Aside from being restricted by project framework boundaries, data engineering projects can stall due to a lack of alignment on requirements. This could be due to a stakeholder’s lack of understanding of the data or a need to wait for clarity or further detail up the corporate chain. It is not unusual to wait several weeks for data engineering requirements. During this time, it may feel like you’re not doing your due diligence to figure out what your stakeholders want. Unfortunately, there’s not much you can do if they are meeting internally with their teams or waiting for the higher-ups to make a decision.
The best way to make sure something is moving forward, even when it’s paused, is to politely and consistently follow up. And, most importantly, document communication (Slack, email, JIRA ticket) to show that you’re doing all you can to keep things moving on your end. The corporate ping-pong between departments and levels of management isn’t the most difficult thing to deal with in this scenario. The worst situation to find yourself in, and something that will most definitely make data engineering “take forever”, is a stakeholder who can’t make up their mind or doesn’t know what they want out of the data you’re tasked with fetching.
There are several strategies to engage these individuals to ensure that they’re thinking more deeply and concretely about their requests before approaching the data engineering team. For prompts (the human-to-human kind) to help encourage a decision, you can see my guide to requirements gathering.
The last point of disconnect that can make a stakeholder feel as if a data engineering project is taking forever is a lack of knowledge regarding the complexity of data ingestion and data infrastructure. Intuitive BI tools and code-less pipelines have spoiled data consumers by offering near-instant access to critical data. Worse, they’ve made the very difficult data ingestion process seem simple and transactional–when it definitely isn’t.
While requirements gathering calls may touch on aspects of the API documentation and other technical details, it is rare that a stakeholder will see any hint of the development process required to build robust, responsive pipelines. Even data analysts, who may write a ton of SQL, may not fully understand the undertaking involved with creating an automated data pipeline that keeps data continuously flowing.
Even hinting at the technical complexity required to effectively and accurately fulfill requests can help bridge the disconnect between request and fulfillment. One of the most interesting calls I had was with an analyst within my department. After working closely on a few projects, he wanted to know more about what it actually took to pull data and put a pipeline into production. Over the course of a 90-minute call, I provided a crash course in data engineering, taking him from documentation interpretation to CI/CD and final production.
His reaction? “I can’t believe you guys can and have to do all that every time.”
While you don’t need to subject every one of your stakeholders to data engineering 101, if you’re receiving requests that seem impatient in tone and come with unreasonable expectations, having a technical deep dive call may be one way to get on the same page. If I were explaining to a non-technical stakeholder how data populates their dashboard, I would start backward, beginning with what they see and working back to the upstream “magic” that generates that output.
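To make that backward walk concrete, here is a minimal sketch in Python. The asset names and the `UPSTREAM` mapping are hypothetical, invented purely for illustration; in practice the lineage would come from whatever orchestrator or data catalog your team uses.

```python
# Hypothetical lineage: each asset maps to the assets it reads from.
# Every name here is made up for illustration.
UPSTREAM = {
    "revenue_dashboard": ["daily_revenue_table"],
    "daily_revenue_table": ["cleaned_orders"],
    "cleaned_orders": ["raw_orders_api_extract"],
    "raw_orders_api_extract": [],
}


def trace_upstream(asset, lineage):
    """Return the asset and everything upstream of it, nearest first."""
    ordered = []
    seen = set()
    queue = [asset]
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # an asset can feed several downstream assets
        seen.add(current)
        ordered.append(current)
        queue.extend(lineage.get(current, []))
    return ordered


if __name__ == "__main__":
    # Start with what the stakeholder sees and walk back to the source.
    steps = trace_upstream("revenue_dashboard", UPSTREAM)
    for step, name in enumerate(steps, start=1):
        print(f"{step}. {name}")
```

Reading the output top to bottom mirrors the conversation: the dashboard first, then each layer it depends on, ending at the raw extract that most stakeholders never see.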
No matter which factor impedes your pipeline development, it is important to resist the temptation to rush. A data pipeline is most effective when it is:
- Timely
- Robust (not prone to breakdown)
- Accurate
Consistently delivering timely, accurate data is critical when developing credibility within your team and among your stakeholders. If stakeholders know they can trust you to provide data that has been sufficiently tested and vetted, they are far more likely to lean on your team for help with larger, higher-impact initiatives.
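As a rough illustration of why those three qualities take time to build, here is a minimal Python sketch of one check per quality. The function names, the 24-hour freshness window and the row-count comparison are all assumptions chosen for the example, not a standard; real pipelines usually need far more thorough validation.

```python
import time
from datetime import datetime, timedelta, timezone


def is_timely(last_loaded_at, now, max_age=timedelta(hours=24)):
    """Timely: the newest load is no older than the agreed window (assumed 24h)."""
    return now - last_loaded_at <= max_age


def is_accurate(source_row_count, loaded_row_count):
    """Accurate, in the narrow sense checked here: no rows dropped or duplicated."""
    return source_row_count == loaded_row_count


def run_with_retries(task, attempts=3, wait_seconds=0.0):
    """Robust: retry a flaky step a few times before giving up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return task()
        except Exception as error:  # broad on purpose for this sketch
            last_error = error
            time.sleep(wait_seconds)
    raise last_error


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print("timely:", is_timely(now - timedelta(hours=3), now))
    print("accurate:", is_accurate(1000, 1000))
    print("robust:", run_with_retries(lambda: "extract succeeded"))
```

Each check is trivial on its own. The time goes into deciding what the thresholds should be, wiring the checks into every pipeline and testing what happens when they fail.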
The bottom line is that, even though development can seem fast-paced, data engineering is not a day-turn activity.
It takes many sprints to complete a marathon.