Data Scientists: Answer 6 Questions to Summarize Your Insights In 1 Sentence
A simplified writing technique to help you communicate complex data science problems, describe methods and share results.
As a data engineer with an undergraduate degree in journalism, I used to be self-conscious about my unconventional background.
However, over time, I’ve realized that the communication skills I developed are highly desired by managers who need data professionals who can interact with internal stakeholders and communicate insights to them.
The problem is that these “soft” skills are still criminally undervalued in data science, data engineering and data analysis, not to mention the tech industry at large.
Few authors, maybe aside from Cole Nussbaumer Knaflic (Storytelling with Data; no affiliation), have written solely about the skill required to share technical knowledge without getting too into the weeds or, worse, “dumbing it down.”
To emphasize the importance of developing soft skills, this post shares a very simple writing prompt that nearly every media outlet in the world uses to deliver information quickly, conversationally and compellingly.
My goal is to show how you can streamline your data science-oriented communications.
5 Ws. 1 H.
Nearly every news story you’ve ever read opens with 1 sentence that answers at least 6 questions.
These questions, known as the 5Ws and 1H, are foundational to the craft of journalism.
However, the dirty secret is that anyone in any industry, especially data science, can instantly improve their technical writing by remembering to include the 5 Ws and 1 H in the first sentence of their presentation, white paper or company-wide email.
Without further ado:
The Ws
- Who?
- What?
- Where?
- When?
- Why?
The H
- How?
That sentence I mentioned?
In reporter-speak, it’s called a lede.
And examining an example helps us see why this type of sentence is so powerful when communicating ideas.
Let’s say I wanted to report on a traffic accident based on these facts:
- When: The accident occurred on Tuesday afternoon
- What: An accident that stopped traffic
- Who: Six cars, including one tractor trailer
- Why: It was raining heavily
- Where: The accident happened on (fictional) I-999
- How: The cars swerved off the road
Writing a precise, concise account of this incident in one sentence is simple.
Answer the questions.
“A six-car pile-up stopped traffic for two hours on I-999 Tuesday afternoon when a tractor trailer swerved off the road due to heavy rain.”
Notice how I didn’t waste words on irrelevant information like how often tractor trailers cause accidents or explain the concept of a traffic jam.
If you can’t remember the 5 Ws and 1 H, just remember two words that describe the ideal technical communication: Precise. Concise.
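If a checklist helps, the six questions can be sketched as a small Python structure that flags any question you haven't answered yet. This is only an illustration: the `LedeFacts` class and its `missing` method are hypothetical names of my own, not part of any library, and the draft values are borrowed loosely from the examples in this post.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class LedeFacts:
    """Hypothetical helper: one field per question in the 5 Ws and 1 H."""

    who: str
    what: str
    where: str
    when: str
    why: str
    how: str

    def missing(self) -> list[str]:
        """Return the names of any questions that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# The traffic accident facts from above: every question is answered.
accident = LedeFacts(
    who="Six cars, including one tractor trailer",
    what="An accident that stopped traffic",
    where="I-999",
    when="Tuesday afternoon",
    why="It was raining heavily",
    how="The cars swerved off the road",
)

# An unfinished draft: the why and how are still blank.
draft = LedeFacts(
    who="Data engineering team",
    what="Optimized pipeline load times",
    where="Data warehouse",
    when="August sprint",
    why="",
    how="",
)

print(accident.missing())  # []
print(draft.missing())     # ['why', 'how']
```

An empty list means you have what you need to write the sentence. Anything else tells you which question to go back and answer first.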
What This Has To Do With Data Science
Since your data career will rarely require you to write traffic reports, let’s look at 3 examples that show how to apply this technique to scenarios you’ll face in each of the data roles.
Each example includes a scenario, a 5 Ws and 1 H breakdown and a sample communication tying everything together.
1. Data Engineer
Your team just completed work to optimize an existing pipeline.
By refactoring, engineers were able to cut 150 lines of code.
By tweaking SQL queries, engineers reduced load time from 10 to 5 minutes.
The senior engineer discovered the opportunity for a performance increase during an audit of a stubborn Airflow DAG.
The high-priority work was completed during the August sprint.
- Who: Senior engineer, along with engineering team
- What: Optimizing pipeline load times
- Where: Data warehouse
- When: August sprint
- Why: Airflow DAG performance suffered
- How: Tweaked SQL queries
“The data engineering team, led by senior engineer Roy, completed their August sprint by optimizing the performance of the stubborn_load_time DAG, decreasing load time from 10 minutes to 5 minutes and eliminating 150 lines of excess code.”
2. Data Scientist
A churn model, running on a VM instance, recently did better than expected when deployed, predicting an accurate outcome more than 80% of the time.
The lead data scientist credits the team’s emphasis on feature engineering and monthly re-training as the reason for the impressive performance.
Thanks to the new model, the data science team increased customer retention by more than 25%.
This occurred in Q2.
- Who: Lead Data Scientist
- What: Model outperformed expectations
- Where: VM instance
- When: Q2
- Why: Decrease churn; increase retention
- How: Emphasis on feature engineering and monthly re-training
“A new churn model running on a compute-optimized VM outperformed expectations in Q2, predicting accurately more than 80% of the time and increasing customer retention by more than 25%; team lead George credits feature engineering and monthly re-training for the recent success.”
3. Data Analyst
The data analysis team just finished a new Tableau dashboard that is accessible to anyone in the company.
The new dashboard features 5 dynamic, configurable visualizations.
Recently, C-level executives used the dashboard to pitch to new investors.
As a result of that presentation, investors pledged an additional $5 million to your company.
This occurred shortly before an annual investor retreat.
- Who: The data analysis team
- What: Investors pledged an additional $5 million
- Where: Tableau
- When: Before the annual investor retreat
- Why: Attract new capital
- How: A new dashboard with 5 configurable visualizations
“C-level executives used the data analysis team’s new dynamic Tableau dashboard to pitch new investors shortly before the annual investor retreat, earning an additional $5 million in capital.”
Takeaway
Even if you don’t consider yourself a writer, I hope you can see the power of breaking information down into 6 key questions.
By considering information in the form of a question, you make sure you’re accountable to the answers you provide.
With the popularity of remote and hybrid working environments, it is more important than ever to communicate clearly and concisely.
The sentences I came up with, while a bit wordy for a slide show, could easily be shared via your office’s messaging platform of choice, like Slack or Teams.
Even if you continue to think like a data professional, I encourage you, the next time you have an important insight to share, to write like a reporter.