The Fallacy of Big Data: Data Science and the Theory of “Jobs to Be Done”
While the fever pitch around ‘Big Data’ has given way to the AI and Machine Learning craze, the central tenet of both is largely the same:
With enough data, we can solve every problem.
There was a time when I believed this idea was true. It’s the reason I got into the Data Science field — I once felt that data could tell you everything you needed to know. The truth is, Big Data has its limitations.

One area where Big Data has particularly weak application is in Innovation. Well-meaning and talented business leaders fall into this specific ‘Big Data’ trap all the time — they attempt to find ‘gems’ in their disparate data sets through extensive correlation analysis and customer segmentation. Unfortunately, this is where most organizations go wrong — correlations won’t tell you if any specific individual will actually buy your product.
Know Your Customers’ “Jobs to Be Done”
Harvard Business School Professor and renowned business thinker Clayton Christensen’s Disruptive Innovation Theory suggests that companies open themselves up to disruption when they pursue ‘sustaining innovations’ (the products and services that worked for them in the past). By unwittingly (and often logically) doing what’s always worked, they allow smaller, more agile companies to create products that open up the market to a whole new population of consumers.
Disruptive Innovation Theory is useful for explaining why businesses fail, but it’s not very helpful at explaining how to successfully make new products and services that consumers want to buy (how to actually innovate). For that reason, Dr. Christensen created a complementary theory to Disruptive Innovation, called the theory of “Jobs to Be Done.”
Jobs to Be Done theory provides an extremely useful toolset for predicting whether a product will succeed. The Harvard Business Review article Know Your Customers’ “Jobs to Be Done” explains what ‘the job’ is:
We all have many jobs to be done in our lives. Some are little (pass the time while waiting in line); some are big (find a more fulfilling career). Some surface unpredictably (dress for an out-of-town business meeting after the airline lost my suitcase); some regularly (pack a healthful lunch for my daughter to take to school). When we buy a product, we essentially “hire” it to help us do a job. If it does the job well, the next time we’re confronted with the same job, we tend to hire that product again. And if it does a crummy job, we “fire” it and look for an alternative.
Dr. Christensen’s theory helps us contextualize products in terms of the jobs you would hire them to perform rather than their inherent features. The well-known phrase “People don’t want to buy a quarter-inch drill. They want a quarter-inch hole!” applies perfectly here. When a customer buys a product, they’re not just buying the product; they’re hiring it to do a job that has arisen in their lives.
What causes a customer to ‘hire’ or ‘fire’ a product is rooted in the very specific progress that they are trying to make in their unique circumstances.
A Prediction Problem
Given how crucial innovation is for businesses, it’s surprising how bad at it they are. McKinsey reported that 84% of global executives stated that innovation was an important part of their growth strategies, yet 94% were dissatisfied with their organizations’ innovation performance.
Why are organizations so bad at innovating?
Creating products and innovating business models is a prediction problem — but it’s not one that can be solved with an algorithm. Businesses need to predict, with a high degree of accuracy, if a product will sell or not in order to thrive. In the world of Big Data, Prediction is the realm of Data Scientists. The customary belief in our current business climate is that with enough data, we should be able to predict whatever outcomes we want — even a successful innovation. Dr. Christensen explains why this is a fallacy:
The fundamental problem is, most of the masses of customer data companies create is structured to show correlations: This customer looks like that one, or 68% of customers say they prefer version A to version B. While it’s exciting to find patterns in the numbers, they don’t mean that one thing actually caused another. And though it’s no surprise that correlation isn’t causality, we suspect that most managers have grown comfortable basing decisions on correlations.
Why is this misguided? Consider the case of one of this article’s coauthors, Clayton Christensen. He’s 64 years old. He’s six feet eight inches tall. His shoe size is 16. He and his wife have sent all their children off to college. He drives a Honda minivan to work. He has a lot of characteristics, but none of them has caused him to go out and buy the New York Times. His reasons for buying the paper are much more specific. He might buy it because he needs something to read on a plane or because he’s a basketball fan and it’s March Madness time. Marketers who collect demographic or psychographic information about him — and look for correlations with other buyer segments — are not going to capture those reasons.
Data Scientists and business leaders sometimes forget that data is only a representation of a much more complex reality. All data is man-made — somebody at some point in time chose what data to collect, how to collect it, how often, and where to put it. Quantitative data, the kind that we can put into a regression model, is enticing to us. We believe that the numbers will tell us the answers and point us in the right direction.
There’s a pervasive belief that there is some set of ideal data that can, together, yield the perfect insights about customers. It’s just a matter of figuring out what the right data is. In short, we can know the “truth” if we just gather the right data in quantitative form.
(Competing Against Luck, Christensen et al., 2016, p. 189)
At least for the time being, data sources and our methods of data collection are not nearly complex enough to capture signal that can indicate if an innovation will work or not. The “job to be done” is far too nuanced and personal to be solved with inputs from a database. Nevertheless, that doesn’t mean Data Scientists are useless when it comes to clearly understanding and organizing around the “Job to be Done” — on the contrary, Data Scientists may be better suited than many to uncover the progress that customers are seeking in their lives. After all, innovation is a prediction problem.
Uncovering the Job
What can Data Scientists do to help their companies and organizations innovate effectively?
Operationalize data that will help uncover the ‘jobs’ of your customers. No matter how ‘Big’ it is, the data that companies store on their customers is not a clean-cut representation of reality. Instead of making data bigger or more complex, Data Scientists should make data smaller — i.e. turn your customer data into qualitative insights that can help your organization uncover the job to be done.
For example, at Franklin Sports, we developed a set of dashboards and reports that help synthesize Amazon product reviews into practical information for the Product teams. Using the web scraping tool Import.io, we pulled reviews for all of our Amazon listings into our Azure SQL Database and ran NLTK word tokenizers to segment reviews into one- and two-token n-grams (unigrams and bigrams). We then layered in filtering criteria that highlighted reviews using words associated with defects, like ‘broken,’ ‘cheap,’ and ‘ripped.’ This dataset, combined with our internal product taxonomy, allows Product Managers to quickly examine negative market feedback about their class of products.
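The tokenizing and filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual Franklin Sports pipeline: it swaps the NLTK tokenizer for a plain regular expression so it runs with the standard library alone, the function names (tokenize, ngrams, flag_review) are hypothetical, and the sample reviews are invented for the example.

```python
import re
from collections import Counter

# Defect vocabulary: the three words named in the article. A real list would be longer.
DEFECT_WORDS = {"broken", "cheap", "ripped"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (regex stand-in for an NLTK tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def flag_review(text):
    """Return the unigrams, bigrams, and defect words found in one review."""
    tokens = tokenize(text)
    return {
        "unigrams": ngrams(tokens, 1),
        "bigrams": ngrams(tokens, 2),
        "defects": sorted(set(tokens) & DEFECT_WORDS),
    }


# Invented sample reviews, for illustration only.
reviews = [
    "The net ripped after two games.",
    "Great ball, my kids love it!",
    "Feels cheap and the pump arrived broken.",
]

# Keep only the reviews that mention at least one defect word.
flagged = [r for r in reviews if flag_review(r)["defects"]]

# Count bigrams across flagged reviews to see which phrases recur.
bigram_counts = Counter(bg for r in flagged for bg in flag_review(r)["bigrams"])
```

Counting bigrams only within the flagged reviews is one way to surface what is going wrong (for example, which part ‘ripped’), rather than just that something did.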

With this data, our Product Managers are able to learn crucial insights about the jobs that our products are doing (or not doing). We’ve found that there are many 5-star reviews in this data set where the customer expresses enjoyment of the product but highlights specific ways that it didn’t meet their needs.
This simple application saves our Product Managers countless hours of poring over review data to learn about their products, allowing them to focus on creating innovations that will solve our customers’ jobs to be done.
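One rough way to surface those mixed-signal reviews is to keep only high-rated reviews whose text contains a hedging word. The sketch below assumes each review is a dict with hypothetical "stars" and "text" keys, and the cue words are a guess at how customers phrase an unmet need; both would need tuning against real review data. The sample reviews are invented.

```python
import re

# Hypothetical cue words suggesting an unmet need inside an otherwise positive review.
UNMET_NEED_CUES = ("but", "wish", "however", "except")


def mentions_unmet_need(text):
    """True if any cue word appears as a whole word in the review text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return any(cue in words for cue in UNMET_NEED_CUES)


def high_rating_with_caveat(reviews, min_stars=5):
    """Return reviews rated at least min_stars that also contain a caveat."""
    return [
        r for r in reviews
        if r["stars"] >= min_stars and mentions_unmet_need(r["text"])
    ]


# Invented sample reviews, for illustration only.
reviews = [
    {"stars": 5, "text": "Love this goal, but the carry bag is too small."},
    {"stars": 5, "text": "Perfect for the backyard."},
    {"stars": 2, "text": "Wish it were sturdier."},
]

caveats = high_rating_with_caveat(reviews)
```

Matching on whole words rather than substrings keeps a word like "button" from triggering the "but" cue.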
Measure what truly matters. Are your data products organized around the jobs that customers hire your company to solve? Clicks, impressions, frequency, time spent, cost, and revenue are all metrics that most of us have easy access to, so we spend an enormous amount of time analyzing them and leveraging them (or feature variations thereof) to build models and data products. Unfortunately, most of these metrics don’t really matter to customers. If you sit down and really try to figure out what’s important to your customers, you’ll find that it’s probably very hard to measure. Don’t give up. Really spend time on this, even if it means pausing statistics and programming work to focus on the business. Your technical work will go much further if it’s aligned with the job.
Amazon has mastered this idea. Its retail business is intensely focused on three target areas: vast selection, low prices, and fast delivery. It measures each of these on a “minute-to-minute basis.” For example, Amazon employs a shopping robot that crawls the web to benchmark product prices. If a lower price is found, Amazon’s price is automatically lowered to beat the competitor’s price. This process is narrowly focused on solving the job customers are hiring Amazon to do. (Competing Against Luck, Christensen et al., 2016, p. 209)
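To make the price-benchmarking rule concrete, here is a toy sketch of that kind of logic. It is not Amazon’s actual system: the function name, the one-cent undercut, and the price floor are all assumptions added for illustration.

```python
from decimal import Decimal


def reprice(our_price, competitor_prices, undercut=Decimal("0.01"), floor=Decimal("0.00")):
    """Return a new price that beats the lowest competitor price, if one is lower than ours.

    undercut and floor are hypothetical parameters: how far to go below the
    competitor, and the lowest price we are willing to charge.
    """
    if not competitor_prices:
        return our_price  # nothing to benchmark against
    lowest = min(competitor_prices)
    if lowest >= our_price:
        return our_price  # we are already the cheapest (or tied)
    # Beat the competitor, but never drop below the floor or raise our price.
    return min(our_price, max(lowest - undercut, floor))


new_price = reprice(Decimal("19.99"), [Decimal("18.50"), Decimal("21.00")])
```

Decimal is used instead of float so that currency arithmetic stays exact.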
Take a step back from your ongoing projects and assess whether your activities are aligned with the job that customers hire your company to solve. Are they in agreement? If not, slow down and perform some realignment. Make plans to get the data sources you need so you can measure and take action on what truly matters — even if that data is qualitative in nature. You’ll be tempted to make do with what you have just to “get work done,” but you’ll, at best, make incremental gains. Instead, make a real difference by aggressively focusing on the job to be done above everything else.
Apply the theory. Everything I’ve outlined above is going to be incredibly difficult if you don’t know what ‘job to be done’ your customers hire your products to solve. Above all else, figure that out. Without a clear understanding of your customers’ jobs, you’ll be stuck in “feature chase,” or worse, you’ll actively be working against your customers without even knowing it. I’ve cited the articles and the book that explain “Jobs to Be Done” theory in detail throughout this post. Read them and understand them, then apply them in your organization:
Know Your Customers’ “Jobs to Be Done”
Marketing Malpractice: The Cause and the Cure
Competing Against Luck: The Story of Innovation and Customer Choice
Solve Problems
The “Holy Grail” of data that will solve every problem doesn’t exist, and if we keep trying to find it, we’ll waste countless hours that could be spent actually working on the problem.
Quit looking for the hidden gem in the data and actually look towards your customers. What are they hiring your company to do? When you fully understand that, you’ll become a much better Data Scientist and Business Leader.
