The Question That Every Data Engineer Should Ask
The path from business problems to data solutions
Hello there! Welcome to my new blog series: Data Engineer Things! In this series, I write about what I’ve learned in my data engineering journey and random thoughts on data and engineering. As always, opinions are my own.
What’s the secret sauce behind good data engineering solutions? Technical expertise in programming, distributed systems, and data modeling (just to name a few) is certainly important, but it’s not enough. Because every data engineering problem starts with an ambiguous business problem (if not several), data engineers first need to understand the business motivation and translate it into a concrete data problem. Therefore, a good data engineer always asks this question before jumping into solutions:
What business problem are you trying to solve?
Translating a business problem into a data problem is such a critical part of data engineering, but we don’t talk about it enough. And that’s what this blog post is all about:
- Why is business context critical for data engineering?
- How do we build the bridge between data engineering and business?

The bigger picture
Why is it so important for data engineers to be equipped with the business context? The short answer is that business context provides the bigger picture of the data engineering problem: what the short-term and long-term business goals are, which systems and teams are involved and how, and who the data producers and consumers will be. Grasping the bigger picture is essential for data engineers to translate an open business problem into well-defined data problems and build optimized data solutions accordingly.
Now let’s dive deeper into the connections between business and data engineering.
The business model determines the nature of the data, and business context enables data engineers to truly interpret the meanings and values of the data they work with. With a comprehensive understanding of the data, data engineers can better predict traffic, handle edge cases, and monitor data quality. As a result, the data systems built will be more scalable and resilient and serve more accurate data insights.
Data solutions are only performant for the use cases they were designed for. In other words, there is no one-size-fits-all solution in the data engineering world. Many design aspects, such as data modeling and data storage, depend heavily on expected usage patterns. For example, Iceberg tables are often partitioned by columns that users will likely filter or aggregate on. Moreover, given the many distributed storage options available these days, such as Kafka, Cassandra, and Druid, data engineers must choose carefully among them based on downstream use cases (e.g. further real-time processing, random by-key lookup, or real-time dashboards). Knowing the business use cases helps data engineers understand how their data products will be used (directly or by other teams) to derive business insights. Business context also enables data engineers to anticipate long-term data needs in addition to what’s required in the short term.
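To make the partitioning point a bit more concrete, here is a minimal sketch in plain Python. It is a toy model and not Iceberg’s API: the events list, the column names, and the partition_by and rows_scanned helpers are all hypothetical and exist only for illustration. The idea it tries to show is that the same data, partitioned by different columns, costs a very different amount to query depending on which column users actually filter on.

```python
from collections import defaultdict


def partition_by(rows, column):
    """Group rows into partitions keyed by the value of `column` (toy model)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


def rows_scanned(rows, partition_column, filter_column, filter_value):
    """Count the rows a query must read to apply `filter_column == filter_value`."""
    partitions = partition_by(rows, partition_column)
    if filter_column == partition_column:
        # The filter matches the partition key: only one partition is read.
        return len(partitions.get(filter_value, []))
    # The filter is on another column: every partition has to be read.
    return sum(len(part) for part in partitions.values())


# Hypothetical event data, made up for this example.
events = [
    {"event_date": "2023-01-01", "country": "US", "plays": 3},
    {"event_date": "2023-01-01", "country": "BR", "plays": 1},
    {"event_date": "2023-01-02", "country": "US", "plays": 5},
    {"event_date": "2023-01-02", "country": "BR", "plays": 2},
]

# Users filter by date, and the table is partitioned by date.
print(rows_scanned(events, "event_date", "event_date", "2023-01-01"))  # 2

# Same query, but the table is partitioned by country.
print(rows_scanned(events, "country", "event_date", "2023-01-01"))  # 4
```

With four rows the difference is trivial, but the same ratio applies at any scale, which is why knowing how the data will be queried has to come before choosing the partition column.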

Not every business problem should be a data engineering problem. As data engineers acquire the business context behind a data request, they will inevitably be exposed to various decoupled systems in the organization. It’s not rare to conclude that part or all of the solution should be built upstream or downstream. In other words, knowing the bigger picture promotes better decision-making in the engineering ecosystem. This is also how data engineers can influence the requirements of the business problem with their unique data knowledge. Leaders need to make sure data engineers feel empowered to push back on requests that are not best solved through data engineering.
Finally, understanding business problems helps data engineers recognize the impact of their work. In the real data world, a data engineering solution is often not the final answer to a business problem. For example, data analysts might build a product metrics dashboard on top of a dataset created by data engineers, and researchers could use it as a source of new features for machine learning models. Given the backend nature of data engineering work, its value can easily be underappreciated if the collaboration model in the organization is suboptimal. Building the connection between data engineering work and business initiatives is important for promoting the visibility and morale of the data engineering team.
Data inquiries
While not all business problems require new data engineering efforts, handling data inquiries on existing data systems is a big part of the data engineering team’s day-to-day operations. These data inquiries often don’t come with any business context. In my own experience, many of the data inquiries my team received from other teams turned out not to be the right thing to ask for once we probed the use cases behind them.
Let me give you an example. A data analyst once posted in my team’s Slack channel asking how to query a specific type of event in client logs. We could have shared a query doing exactly what he asked for. Instead, we asked him to tell us more about his use case, and it turned out client logs were not the best dataset for calculating the product metric he had in mind.

In reality, every data question is exploratory in nature, so take each one with a grain of salt. To summarize, when the business motivation behind a data request isn’t clear, data engineers should always ask for the business context before approaching solutions.
Partnership
Now that we have acknowledged the significance of business context for data engineering, that leads to another question worth discussing… The business side is complex and constantly evolving, so how can we make it easy for data engineers to gain the necessary business context?
The right partnership can build the bridge between business and data engineering.
In my opinion, the best way for data engineers to comprehend business problems is to be part of the cross-functional project collaboration. To be specific, data engineers should be involved in project planning and sync meetings so that they have the full picture of the project. However, it is unrealistic for data engineers to participate in all the meetings because (trust me) there will be too many meetings to go to! I would get nothing done if I went to all the project syncs. To manage their time well, data engineers will have to judge when their meeting attendance is necessary and when offline collaboration is more suitable.
In addition, it’s important for data engineers to establish boundaries and trust in the partnership. Boundaries allow data engineers to push back and say no to partners. When a project requires data engineering efforts, the bandwidth of the data engineering team should be considered before committing to a timeline. When data engineers believe a data feature shouldn’t be implemented in the data engineering systems, they should feel entitled to say no (for good). When it comes to trust… everything goes south when there is no trust. Trust makes teamwork a smoother and more pleasant experience for everyone, for example when learning about downstream and upstream systems from colleagues or when convincing partners that a decision is suboptimal.
I talked more about organization structure and data engineering partnership in my last blog post: Data Engineering Excellency at Netflix.
So… What do you think? Feel free to leave a comment if you have any feedback or questions. See you next time!
Want to read more about Data Engineering? Check out other articles from Data Engineer Things.
