Elliott Stam

Summary

The article emphasizes the importance of understanding the multiple layers of data management to effectively lead and unblock data-intensive product teams.

Abstract

The article draws parallels between the layers of a cake and the layers of data management, underscoring the necessity for product leaders to comprehend each layer to ensure successful product delivery. It outlines the layers as raw data, derived data, summarized data, and curated product data, and stresses that each layer requires unique ingredients for success. The author highlights the role of product leaders in negotiating roadmaps and timelines, advocating for the integration of product value into data from the outset. The article also touches on the challenges of data quality, the application of business logic, and the need for a balance between engineering and product expertise to create insightful, targeted data products.

Opinions

  • The author believes that not understanding the different layers of data management can debilitate a team's ability to deliver a product vision.
  • Product leadership is seen as responsible for the successful execution of the product vision, which includes a deep understanding of data as a product.
  • There is an emphasis on the importance of data quality from the raw data layer, adhering to the 'garbage in, garbage out' principle.
  • The author opines that data engineering and the application of business logic are critical to the derived data layer and are not merely technical concerns but central to product leadership.
  • Investment in the derived data layer, though often undervalued, is framed as a value multiplier that can prevent project delays and enable the discovery of new product opportunities.
  • The summarized data layer is presented as a potential bottleneck if disconnected from business context, leading to shallow insights and the need for more targeted data summaries.
  • The author advocates for a 'top down' approach to data management, starting with product ideas and working backwards to design the necessary data architecture.
  • The article suggests that a balance of product and engineering awareness is essential for data teams to avoid producing monolithic, less valuable data tables.
  • The author concludes that increased awareness of the various layers of data management can lead to more consistent product delivery and enable teams to succeed.

Understanding the Multiple Layers of Data Management Enabling Products

What product leaders need to know to get unblocked by data

Photo by American Heritage Chocolate on Unsplash

When I saw the layered cake pictured above, I immediately wanted to eat it, but I also realized there are many useful parallels to the layers of data management driving modern data products.

Similar to how each layer of the cake involves chocolate, each layer of data management involves data. No surprise there.

But each layer is unique. The cake has dark, milk, and white chocolate layers. Different ingredients are required to successfully execute each layer of the cake, just as different ingredients are required to successfully execute each layer of our data management processes.

And so we arrive at the point of this article: delivering on a product vision requires having the right ingredients at each layer of development. For data-intensive products, not understanding the different layers of data management required to deliver the product is as debilitating as not understanding the layers of a cake we are trying to bake.

If we (product leaders) only understand the top layer (white chocolate), we will struggle to unblock our team when it encounters friction at any other layer. We might fail to see why we are even blocked to begin with, and miss opportunities to set the team up for success.

Product leadership is responsible for negotiating roadmaps, deliverables, and timelines. When we don’t understand what’s required to successfully execute the product vision, we put delivery of the product at risk.

On that note, let’s dive into the different layers of data management product leaders should be aware of and how we can unblock teams experiencing friction at any layer of the data management stack.

For more data engineering advice and reflection, find me on Substack! And tune into my YouTube channel for hands-on tutorials.

Labeling the layers of data management in data products

Here are the layers I’ve encountered in the modern analytics stack, having worked with teams spanning several industries including entertainment, energy, finance, retail, health care, and advertising:

  • Raw data, typically very detailed and unrefined/unprocessed
  • Derived data, resulting from processing and transforming raw data
  • Summarized data, aggregations of raw and/or derived data
  • Curated product data, combining derived and/or summarized data

In some environments the lines between these layers may get blurry. Often the “raw data” observed at the beginning of a data product’s lifecycle has a wealth of software engineering behind it as well. But to simplify the concepts, we’ll consider “raw data” as data existing in the most granular and raw format accessible to the data engineering side of the operation.

A quick hot take for product leaders: data is your product

Photo by Cullan Smith on Unsplash

My entire career has been spent strategizing, designing, engineering, and explaining analytics products. Not every project has been successful, and over the years I have observed that the easiest way to fail is to treat data and the processes/pipelines surrounding it as simple commodities.

Data starts as exactly that: a commodity. But through the application of business logic, context, and product/design tradeoffs your data becomes a product. This product is valuable to other internal teams, and it is valuable to external applications.

This is why I am so adamant about helping product leaders understand the different layers of data. In the modern technological landscape, we need to understand how raw data ends up becoming a product in order to be effective at managing the ins and outs of delivering results.

Product leaders, hear me: data is no longer a dumb, unfiltered resource you plug into your front-end application. Data should come pre-loaded with product value before it reaches the application layer.

Raw data: what you need to know

A simple explanation of raw data is that it’s the most granular and detailed information your data teams have access to. For example:

  • raw transaction details
  • raw customer information
  • raw event/instrumentation data
  • messy data from a third party system

At this layer of the cake, product leaders should be primarily concerned with two things:

  1. Do we have all the raw ingredients we need?
  2. Are the raw ingredients high quality?

If the answer to these two questions is yes, then we are in good shape and our engineers and analysts are one step closer to being set up for success.

If the answer to either question is no, then we have some work to do in enabling the team for success. Often this requires taking charge of:

  • Setting expectations with leaders and stakeholders.
  • Making product tradeoffs, adapting to the ingredients available.
  • Collaborating with other teams to scope out efforts to introduce new (or improve existing) ingredients.

We could get lost in the topic of “raw data quality”, so for now let’s leave it at the old idiom: garbage in, garbage out. Advocating for data quality should be a priority even at the highest levels of product leadership.
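To make the two raw-layer questions a little more concrete, here is a minimal sketch of the kind of quality profile an engineering team might report back to product leadership. The record type `RawTransaction`, its fields, and the rule that amounts must be non-negative are all hypothetical assumptions for illustration, not part of any particular stack.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical raw transaction record; the field names are illustrative only.
@dataclass(frozen=True)
class RawTransaction:
    transaction_id: str
    customer_id: Optional[str]
    amount: Optional[float]
    currency: Optional[str]


REQUIRED_FIELDS = ("customer_id", "amount", "currency")


def profile_raw_quality(rows: list[RawTransaction]) -> dict[str, float]:
    """Return simple quality ratios (0.0 to 1.0) for a batch of raw rows."""
    total = len(rows)
    if total == 0:
        return {"completeness": 0.0, "valid_amount": 0.0, "unique_ids": 0.0}
    # Completeness: every required field is populated.
    complete = sum(
        all(getattr(row, field) is not None for field in REQUIRED_FIELDS)
        for row in rows
    )
    # Validity: amount is present and non-negative (an assumed business rule).
    valid_amount = sum(row.amount is not None and row.amount >= 0 for row in rows)
    # Uniqueness: distinct transaction IDs divided by row count (1.0 = no duplicates).
    unique_ids = len({row.transaction_id for row in rows})
    return {
        "completeness": complete / total,
        "valid_amount": valid_amount / total,
        "unique_ids": unique_ids / total,
    }


rows = [
    RawTransaction("t1", "c1", 20.0, "USD"),
    RawTransaction("t2", None, 15.0, "USD"),   # missing customer
    RawTransaction("t2", "c2", -5.0, "USD"),   # duplicate ID, negative amount
    RawTransaction("t3", "c3", None, "EUR"),   # missing amount
]
print(profile_raw_quality(rows))
# {'completeness': 0.5, 'valid_amount': 0.5, 'unique_ids': 0.75}
```

A report like this turns "are the raw ingredients high quality?" into numbers that can be tracked over time and discussed with the teams that own the upstream systems.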

The raw data layer can be a bit abstract for non-technical leaders and it’s not a common consideration when planning roadmaps and timelines of deliverables. Ironically, inadequate data at this layer dramatically reduces the ability to go to market with innovative ideas. Ideas struggle to convert into products when we lack key ingredients for success.

Derived data: applying context to your raw data

Imagine you go to a restaurant and order a meal. How would you feel if the server dumped a bunch of raw vegetables on your table, with dirt still clinging to the roots? That is roughly how business users feel when they connect to “self-service analytics” built directly on dirty, raw data.

Photo by Markus Spiske on Unsplash

The derived data layer is where raw data gets refined and transformed into the fuel that will power your products. This involves cleaning data, and often it also involves applying business logic and assumptions which have significant impact on downstream processes. An important part of setting products up for success is ensuring the business logic implemented at this layer is correct and consistent.
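As a rough illustration of cleaning plus business logic, the sketch below turns hypothetical raw orders into a derived table. Both rules it applies (drop test accounts, compute net revenue as gross minus refunds, floored at zero) are assumptions chosen for the example; the point is that each one is a product decision hiding inside an ETL step.

```python
from dataclasses import dataclass


# Hypothetical raw and derived record shapes; names are illustrative only.
@dataclass(frozen=True)
class RawOrder:
    order_id: str
    customer_id: str
    gross_amount: float
    refunded_amount: float
    is_test_account: bool


@dataclass(frozen=True)
class DerivedOrder:
    order_id: str
    customer_id: str
    net_revenue: float


def derive_orders(raw: list[RawOrder]) -> list[DerivedOrder]:
    """Clean raw orders and apply two assumed business rules."""
    derived = []
    for order in raw:
        # Assumed rule 1: orders from test accounts are not real revenue.
        if order.is_test_account:
            continue
        # Assumed rule 2: net revenue = gross minus refunds, never below zero.
        net = max(order.gross_amount - order.refunded_amount, 0.0)
        # Cleaning step: normalize customer IDs so joins downstream line up.
        customer = order.customer_id.strip().lower()
        derived.append(DerivedOrder(order.order_id, customer, net))
    return derived


raw = [
    RawOrder("o1", " C1 ", 100.0, 0.0, False),
    RawOrder("o2", "c2", 50.0, 20.0, False),
    RawOrder("o3", "qa", 999.0, 0.0, True),
]
for order in derive_orders(raw):
    print(order)
```

If either rule is wrong for the business (for example, if refunds should be counted in a later period), every downstream summary and product inherits the error.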

Here is what product leaders need to know about the derived layer of data:

  • Processes exist to extract, transform, and load (ETL) data.
  • Your team owns some of these ETL processes.
  • Your team likely depends on ETL processes owned by other teams.
  • Business logic and assumptions are inevitable at this layer.

Distilling the “so what” for product leaders:

  1. Business logic and assumptions are being applied when ETL processes convert raw data into a more business-relevant format.
  2. If any of the business logic or assumptions are incomplete or incorrect, this negatively impacts your product.
  3. If any of the incomplete/incorrect definitions are owned by other teams, we need to engage in cross-team diplomacy to address it.
  4. Making tradeoffs between the available ingredients and the product vision is a dance between engineering and product.

If you’ve ever seen two products (or dashboards) which don’t agree on an important metric, odds are good that the root cause exists at this layer of data management.
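A minimal sketch of how that happens: two teams compute "active customers" from the same hypothetical event log, but each applies a different, equally reasonable definition. The event data and both definitions are invented for illustration.

```python
from datetime import date, timedelta

# Hypothetical event log rows: (customer_id, event_type, event_date).
EVENTS = [
    ("c1", "purchase", date(2024, 3, 28)),
    ("c2", "login", date(2024, 3, 30)),
    ("c3", "purchase", date(2024, 2, 10)),
    ("c4", "login", date(2024, 3, 5)),
]


def active_customers_team_a(events, as_of: date) -> set[str]:
    """Team A's definition: any event within 30 days of as_of counts."""
    cutoff = as_of - timedelta(days=30)
    return {customer for customer, _, day in events if day > cutoff}


def active_customers_team_b(events, as_of: date) -> set[str]:
    """Team B's definition: only purchases within 30 days of as_of count."""
    cutoff = as_of - timedelta(days=30)
    return {
        customer
        for customer, event_type, day in events
        if event_type == "purchase" and day > cutoff
    }


as_of = date(2024, 3, 31)
print(len(active_customers_team_a(EVENTS, as_of)))  # 3
print(len(active_customers_team_b(EVENTS, as_of)))  # 1
```

Neither dashboard is "broken"; they simply encode different business logic in the derived layer. Resolving the disagreement is a conversation about definitions, which is why it lands on product leadership's desk.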

On the surface this looks like a pure data engineering problem, but remember: data is the product. This is absolutely a product leadership concern, and product leaders can move things in a positive direction by helping senior managers understand the value proposition of eliminating friction to set the product’s lifeblood (data) up for success.

Photo by Iván Díaz on Unsplash

Investing resources into this layer tends to be seen by senior product leadership as pure cost (little or no return on investment), and projects here often take a back seat to “net new” feature development. There is always pressure to break new ground and ship something innovative. Communicating effectively in these conversations means framing the narrative so non-technical leaders understand how eliminating friction in various technical processes will multiply the value generated.

Driving on a flat tire slows down the journey and can cause further damage, resulting in higher costs and missed deadlines. Investing in fixing the flat tire is a value multiplier in terms of helping your team and your products reach the destination. That investment can also lead to discovering new product opportunities, as your team is enabled to be flexible and move quickly when developing valuable features.

Summarized data: a blessing and a curse

When I first entered the analytics scene, I rarely saw analysts getting involved with the derived data layer. Data was aggregated and delivered to summary tables by a separate data engineering team, and an analyst living closer to the business context would take things from there. The recipe looked like this:

  • Grab your business intelligence (BI) tool of choice
  • Get a data analyst or BI analyst
  • Get connected to the database containing summary tables
  • Apply SQL as needed to join and deliver data to the BI tool
  • Connect to the summary tables/views and build dashboards

This is still very much a relevant pattern today, and if you recognize this pattern then you might recognize these questions:

  • “Why can’t we get visibility on XYZ metric?”
  • “Why can’t we drill down to specific transactions/products/customers?”
  • “How can we get more actionable insights from the dashboard?”
  • “Why do our top two dashboards show different numbers?”

These are the questions driving many modern analytics teams deeper into the engineering stack to take ownership of the derived data layer. As more data gets generated, more engineering processes are being developed to manage that data at scale. Where decades ago we had “IT” teams, a slice of those teams forked off into data engineering, which has since forked into data platform engineering, analytics engineering, and more.

Photo by Jens Lelie on Unsplash

The roles are still evolving, but one thing is clear: the scope of work and responsibilities of data teams is expanding. And the importance of knowing when and where to inject product expertise into the pipelines is increasing.

Here is what product leaders need to understand to keep up with the times:

  • Not all summaries are created equal.
  • Summary tables can easily over-simplify the nugget of value you seek.
  • Summarizing data without product intent leads to shallow insights.
  • Product leaders should be aware of the inverse relationship between granularity and performance, and how this influences product tradeoffs.
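The granularity/performance tradeoff can be shown with a toy example. The sketch below aggregates hypothetical sales rows at two levels of detail; the column names and data are assumptions for illustration, and a real warehouse would do this in SQL rather than Python.

```python
from collections import defaultdict

COLUMNS = ("month", "region", "customer_id", "product", "amount")

# Hypothetical granular sales rows, one per transaction.
SALES = [
    ("2024-01", "west", "c1", "widget", 40.0),
    ("2024-01", "west", "c2", "gadget", 60.0),
    ("2024-01", "east", "c3", "widget", 25.0),
    ("2024-02", "west", "c1", "widget", 35.0),
]


def summarize(rows, keys: tuple[str, ...]) -> dict[tuple, float]:
    """Sum `amount` grouped by the named key columns.

    Fewer keys produce fewer output rows (cheaper to store and query),
    but any column left out can no longer be drilled into downstream.
    """
    key_idx = [COLUMNS.index(k) for k in keys]
    amount_idx = COLUMNS.index("amount")
    totals: dict[tuple, float] = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in key_idx)] += row[amount_idx]
    return dict(totals)


coarse = summarize(SALES, ("month", "region"))
fine = summarize(SALES, ("month", "region", "customer_id"))
print(len(coarse), len(fine))  # 3 4
print(coarse[("2024-01", "west")])  # 100.0
```

The coarse table can report that the west region sold 100.0 in January, but it can no longer answer "which customers drove that number?" At real data volumes, the choice of grouping keys is a product tradeoff, not just an engineering one.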

Challenges surface in the summary layer when data engineers are disconnected from the business but are responsible for building summary tables for analysts. When business context is not imprinted on the data engineering effort, data engineers typically default to summarizing as much data as possible such that business intelligence tools (or other downstream systems) can work with it. Analysts then use that data because it’s what they have available, but they struggle to dig deep enough to extract the information their business counterparts seek.

Over the years I have observed teams do one or more of the following when encountering the challenges above:

  1. Introduce more business-savvy engineers (analytics engineers) to bridge the gap between business and engineering.
  2. Depend less on generic denormalized tables attempting to summarize everything in one place (wide breadth, shallow depth for insights).
  3. Depend more on ad hoc analysis, which is capable of diving deep into the data but is less suitable for generic summary tables.
  4. Develop targeted summary tables tailored to specific product needs, tightly coupled with product objectives.

The fourth point above leads into the final layer of data management product leaders need to know about: curated product data.

Curated product data: a well-oiled machine

Photo by Tim Mossholder on Unsplash

Not all summary data is ready for the big stage. Good product design doesn’t happen by accident, and neither does good data architecture.

There are “ground up” data pipelines at every organization. That is, pipelines starting with large volumes of data transformed and summarized until it’s usable by downstream systems such as dashboards. We don’t always know how the resulting data will be used, but we know that it needs to be summarized before it can be used.

This approach continues to be relevant for various efforts ranging from automated workflows to self-serve analytics. But skimming the surface of aggregated data often falls short of scratching the product itch. It leaves a lot of value on the table, struggling to extract insights buried deep in the granular data.

Often we need to apply specific business logic and assumptions on granular data to produce data sources capable of serving insightful, targeted information to our products.

This calls for a “top down” approach:

  • Start with the product ideas, or with product-savvy engineers who see product potential existing in the data.
  • Identify data capable of supporting the ideas.
  • Collaborate (engineering + product) to discover what can be accomplished using the available ingredients.
  • Surface product/technical tradeoffs and define the minimum viable product data layer.
  • Design the data architecture; start with what the product layer needs and work backwards through the other data management layers.
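Here is one way the "start with what the product layer needs and work backwards" step might look in miniature. The feature (a per-customer engagement panel), the output fields, and the assumed derived input of `(customer_id, order_date, net_revenue)` tuples are all hypothetical; the sketch only illustrates defining the product-facing shape first and then building exactly that from derived data.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Step 1 (top down): define the shape the product feature needs.
# Hypothetical feature: a per-customer engagement panel in an account app.
@dataclass(frozen=True)
class CustomerPanelRow:
    customer_id: str
    days_since_last_order: int
    spend_last_90_days: float


# Step 2: work backwards to the derived data that can feed it,
# here assumed to be (customer_id, order_date, net_revenue) tuples.
def build_customer_panel(orders, as_of: date) -> list[CustomerPanelRow]:
    window_start = as_of - timedelta(days=90)
    last_order: dict[str, date] = {}
    spend: dict[str, float] = {}
    for customer_id, order_date, net_revenue in orders:
        # Track the most recent order per customer.
        if customer_id not in last_order or order_date > last_order[customer_id]:
            last_order[customer_id] = order_date
        # Only orders inside the 90-day window count toward spend.
        if order_date > window_start:
            spend[customer_id] = spend.get(customer_id, 0.0) + net_revenue
    # Step 3: emit exactly the rows and columns the product layer asked for.
    return sorted(
        (
            CustomerPanelRow(c, (as_of - d).days, spend.get(c, 0.0))
            for c, d in last_order.items()
        ),
        key=lambda row: row.customer_id,
    )


orders = [
    ("c1", date(2024, 6, 1), 100.0),
    ("c1", date(2024, 1, 15), 50.0),
    ("c2", date(2024, 2, 1), 80.0),
]
for row in build_customer_panel(orders, as_of=date(2024, 6, 30)):
    print(row)
```

Unlike a generic "summarize everything" table, this curated layer is narrow by design: each column exists because a product requirement asked for it, which makes the tradeoffs explicit and easier to negotiate.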

This really is a process. It requires product people capable of communicating with engineers, and it requires engineers capable of communicating with product. Ideally it involves senior engineers who have half their brain in product, and half their brain in engineering. Having a good balance of both in a data professional is valuable. Investing in that value opens the door to innovative data products; not investing in that value tends to lead to lacklustre “ground up” analytics.

The product/engineering balance is important. Data teams that are all engineering with no product awareness are destined to produce monolithic tables delivering a fraction of the data’s hard-to-reach value to the business layer. Leaders who are all product and lack technical awareness are likely to apply pressure in the wrong places, wondering why their team can’t seem to make the data sing.

Modern analytics teams that are set up for success have the product vision and the technical skills to develop curated product layers, capable of carving a path of value from the most granular data available. The skills making this happen are part engineering, part product. I call it product engineering.

Knowing is half the battle

Photo by Greg Rakozy on Unsplash

In this article we zoomed in on one slice of what’s happening behind the scenes in data product teams. There’s plenty more to discuss, as successfully navigating real-world product delivery involves even more engineering, identifying what your users want, figuring out what those same users really need, and awareness of the office politics that can make or break the supply chain of resources required for the team to be successful.

My goal here is to get leaders of data-intensive product teams thinking about how they can set their teams up for success. It can be stressful and frustrating for everyone involved when product expectations don’t align with engineering realities. Hopefully the topics we touched on in this article will be a catalyst for readers to consider how data engineering expectations and ownership are shifting within their own organization.

Let’s close with one more nod to the layers of data management impacting data-intensive product delivery. Keeping these top of mind can make conversations between engineering and product more fruitful:

  1. Raw data layer
  2. Derived data layer
  3. Summarized data layer
  4. Curated product data layer

Successfully scoping product requirements, negotiating priorities across the organization, setting expectations with senior leadership, and unblocking your data team are all important factors in shipping data products.

Increased awareness of the various layers of data management behind the scenes will help you enable your team’s success and deliver more consistent results. The data does matter, and perhaps equally important to the data itself are the processes you develop allowing data to become a valuable resource powering your products.
