Decision Skills

Operationalization: the art and science of making metrics

Essential psychology for all data professionals

It’s time to tackle a topic from psychology that’s essential for all data professionals:

How do you measure user happiness?

Let me try to guess what you’re thinking. Survey responses? Lack of complaints? Number of returns? Propensity to click?

The correct answer is…

You don’t. If you think your happiness survey can close an open question philosophers have been kicking around for millennia, think again, Professor Dunning-Kruger.

How about success? How do you measure it? You don’t.

How do you measure anger? You don’t. (See the pattern? Yes, I’m making you angry on purpose. Channel that rage into pondering the similarities between the nouns I’m describing.)

How do you measure creditworthiness? You don’t.

How do you measure love? You don’t.

How do you measure the goodness of a marketing campaign? You don’t!

What is happiness?

What do these things have in common? They’re fuzzy.

When you hear the word happiness, something floats to the top of your mind. Whatever you’re thinking of is probably not the same something as whatever comes to mind for the person next to you. Perhaps on other days and in other moods, you might not agree with yourself either.

Tempted to argue the neuroscience angle? “Wait, we can look into their brains to see if they’re happy…” Been there, done that. Before I went to graduate school for statistics, I was a PhD student in neuroscience, specializing in the field of neuroeconomics (yes, there is such a thing) and studying utility and value signals in the human brain. In other words, my thesis topic was quite literally this.

Proof your author has a brain. Image: SOURCE.

I was lucky enough to be part of an extremely well-equipped lab with:

fMRI scanners. The f makes it functional magnetic resonance imaging, meaning that the brain pictures aren’t static, but rather a map of changes in blood oxygenation — lagged by several seconds — used to localize brain activity during experiments.
EEG rigs. That’s short for electroencephalography, a technique that uses nets of electrodes to record electrical activity on the scalp, supplementing fMRI’s “where?” with a “when?” by giving you much more precise timing data (at the millisecond level) about the brain’s reaction to various stimuli.
Eye trackers. Specialized kits that combine a nifty camera with niftier software for mapping out where study participants choose to direct their gaze, allowing researchers to map visual attention.
TMS rigs. Transcranial magnetic stimulation uses powerful electromagnets to induce electrical activity in the brain and cause behavior. Yes, mind control is real (and it’s even FDA-approved as a treatment for depression), but don’t worry: it’s hard to sneak up on you with a fridge-sized machine in tow that doesn’t work unless you sit ve-e-e-ery still.

Despite this wonderful array of tools, I could not measure user happiness. Luckily, I knew that before I began. Just like every other neuroscientist on the planet.

Why’s that? During week 1 of neuroscience/psychology grad school, they make sure to beat an important concept into us. With a stick if they have to. It’s time for you to learn it too: hello, operationalization.

A certain je ne sais quoi. Image: SOURCE.

The art and science of making metrics

Even the word operationalization means different things to different people. When I say the word, I mean:

Operationalization is the creation of measurable proxies for rigorously investigating fuzzy concepts.

I mean it the same way Wikipedia means it: “In research design, operationalization is a process of defining the measurement of a phenomenon that is not directly measurable, though its existence is indicated by other phenomena. It is the process of defining a fuzzy concept so as to make the theoretical concept clearly distinguishable or measurable, and to understand it in terms of empirical observations.”

Operationalization is the reason psychology can claim to be a science. Without it, you’d get nowhere. How do you measure something you can’t even define?

You don’t.

The swamp of human expression

Some nouns are defined by the way they’re measured. Fixed meaning is part of the charm of temperature, mass, calorie count, distance, day of the week, and so on. When it comes to measuring things, psychologists would say that physicists have it easy. Frankly, it’s grueling work to wade through the swamp of human language to drag out a science that kicks and screams the whole way.

The trouble is that much of human communication is imprecise by design. The most precise form is mathematics (a language for saying very little very carefully) and, no matter how much you love math, I hope you’ll agree that it’s too slow for most of our everyday chatter. It’s faster to get things across if we turn up the ambiguity dial and let listeners interpret our words however they like.

Human language can be so imprecise that it has next-level rounding errors for its rounding errors.

Every time we speak abstractly, we say too little (some intended meaning evaporates in flight) and too much (with forking roads of extrapolative interpretation). Ironically, these words won’t mean exactly the same thing entering your brain as they meant exiting mine.

Since I don’t know the wetware settings you’re filtering my words through, it’s near-miraculous I can make myself even approximately understandable. And yet, I’m fairly confident that most of you will “get” it. Humans are amazing critters.

Rounding off reality

While it helps us convey more information per unit time, all that poetry makes transfer of information less reliable. That’s not necessarily all bad, especially if you want to leave your audience a bit of space for innovation.

Unfortunately, while room for interpretation is usually a feature in song lyrics and marketing campaigns, it’s a bug as far as mathematical proofs and scientific research are concerned. When the objective is to convey a precise recipe your audience can follow without screwing up, such as a scientific discovery your colleague can build on, poetry is counterproductive. That’s why psychology wouldn’t be a science without operationalization… and it’s also why sloppily-defined business metrics usually leave you worse off than if you hadn’t tried to measure them in the first place.

We round off reality when we put labels on it.

One of the worst places to leave room for interpretation is machine learning. Humans are used to exchanging fuzzy nonsense with other humans, so we get used to the forgiving way our audience searches for meaning on our behalf even though we don’t know what we’re trying to say. Machines don’t do that. They do exactly what they’re told. If you tell them to optimize for accuracy, they’ll do exactly that. They won’t say, “Hey, boss, I think you said accuracy when you actually meant to say precision…”
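To make the accuracy-versus-precision mix-up concrete, here is a minimal sketch in plain Python. The labels and both “models” are made up for illustration, and returning 0.0 when a model makes no positive predictions is a convention chosen here (precision is strictly undefined in that case).

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred):
    """Fraction of positive predictions that are truly positive."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    pred_pos = sum(p == 1 for p in y_pred)
    # Assumption: report 0.0 when nothing was predicted positive.
    return true_pos / pred_pos if pred_pos else 0.0


# Made-up data: 1 = positive, 0 = negative; only 2 of 20 cases are positive.
y_true = [1, 1] + [0] * 18

# Hypothetical candidate models, represented by their predictions.
candidates = {
    "always_negative": [0] * 20,             # never flags anything
    "flags_six": [1, 1, 1, 1, 1, 1] + [0] * 14,  # flags 6 cases, 2 of them real
}

# The machine picks whatever maximizes the number it was told to maximize.
best_by_accuracy = max(candidates, key=lambda n: accuracy(y_true, candidates[n]))
best_by_precision = max(candidates, key=lambda n: precision(y_true, candidates[n]))

print("optimize accuracy  ->", best_by_accuracy)
print("optimize precision ->", best_by_precision)
```

Asked for accuracy, the selection step prefers the model that never finds a single positive case, because that is what the number rewards on this data. Nothing in the code can guess that you meant something else.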

Language is even fuzzier. Image: SOURCE.

Furthermore, the categories we create in our data come from that same fuzzy place and the labels we put on reality reveal a lot about our biases. Unfortunately, machine learning systems pick these up and amplify them if we’re not careful. To avoid some terrible — and terribly common — ML/AI/data gotchas, it’s crucial that you develop the skill of saying what you mean and understanding what you’ve actually said.

It’s crucial that you develop the skill of saying what you mean *and* understanding what you’ve actually said.

A page out of the psych playbook

Psychology has had over a century to stub its toe on the dangers of measuring what you haven’t properly defined, so we’ve learned a nugget or two that business leaders and data scientists would be wise to borrow. If you’d like me to write down some of our tips for operationalization, retweets are the surest way to my heart.

The best piece of metric-making advice psychology could give you is this:

Define your metric before you name it.

At its heart, operationalization is all about flipping your standard thinking around: instead of falling in love with a word and pursuing it for its own sake, think deeply about what real-world quantity you want to measure. Even if you were inspired by a poetic expression, think about why that word caught your interest. What is it about “happiness” that seems relevant for your business problem? Why? What real-world behavior is that related to? What does that behavior look like? Does it look like a smiling user cheerfully hanging out on your website? Maybe you decide it does.

Now forget all about the original word. Instead of happiness, call it X. Then do what the mathematicians do, for example: “Let X = the user’s propensity to spend time on your website.”

Think carefully about this quantity. Is it actually the one you want to measure and base your decisions on? Maybe you decide it is.

Excellent. Now you can name it. The reason you’re naming it is to save time while writing and talking. Perhaps you’ll call it “X.” Perhaps you’ll call it “happiness” or “blorktibork” or even “X Æ A-12.”

Here comes the important bit. You’re truly allowed to name it whatever you like, as long as it’s not offensive (to etiquette or common sense) and you (along with your audience!) remember that the name is a placeholder for something very specific: the user’s propensity to spend time on your website. It is not the Platonic ideal of “happiness” even if you’ve named it that.
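If it helps to see “define first, name last” written down, here is one possible sketch. Everything in it is hypothetical: the `Metric` class, the session data, and the choice to operationalize “propensity to spend time on your website” as mean seconds per session are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from statistics import mean
from typing import Callable, Sequence


@dataclass(frozen=True)
class Metric:
    definition: str                              # step 1: say exactly what is measured
    compute: Callable[[Sequence[float]], float]  # step 2: say how it is computed
    name: str = "X"                              # step 3: pick a label, any label

    def report(self, data: Sequence[float]) -> str:
        # Always print the definition next to the name so nobody forgets it.
        return f"{self.name} = {self.compute(data):.1f} ({self.definition})"


# Made-up session durations in seconds.
session_seconds = [120.0, 45.0, 300.0, 15.0]

# Let X = mean seconds per session on the website.
x = Metric(definition="mean seconds per session on the website", compute=mean)

# Renaming changes the label and nothing else.
happiness = replace(x, name="happiness")
blorktibork = replace(x, name="blorktibork")

for metric in (x, happiness, blorktibork):
    print(metric.report(session_seconds))
```

All three lines report the same number with the same definition attached. Only the shorthand differs, which is the whole point.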

If you remember that the name is merely a placeholder, you’ll be less likely to do something stupid like:

Argue with tech bros (of all genders) about whether the metric really is measuring happiness. It isn’t. That argument is as nonsensical as whether I’m allowed to call my variable X if you’ve already called yours X. As long as we both write what we mean at the top of our pages, we’re good, bro.
Misinterpret an increase in “happiness” to mean something good is going on with your website. Perhaps the users can’t find what they’re looking for and they’re spending more time on the page while howling in frustration. If you remember what you’re actually measuring and that its name is just shorthand, you’ll be much safer.
Assume someone else’s research about “happiness” applies to your own scenario. Chances are that they defined their metric differently. Pay attention at a psychology mixer sometime and you’ll probably hear a sequence structured like this: “It’s lovely to meet you! What do you work on?” “Memory.” “Awesome, me too. What kind?” “Visuospatial working memory development in humans, you?” Notice that “memory” could mean anything, while visuospatial working memory (VSWM) is a well-defined technical concept (at some point someone wanted to study it and said, “Let VSWM = _____”). If the other scientist doesn’t work on VSWM, they’ll know better than to think their new friend’s research applies to their own work.
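The second pitfall in the list above is easy to reproduce with a toy example. The sessions below are invented, and pairing time on site with a “did the user find what they wanted” flag is an assumption about what you might have logged, not a recommendation for any particular tracking setup.

```python
from statistics import mean

# Made-up sessions: (seconds on site, did the user find what they wanted?)
before_redesign = [(60, True), (90, True), (75, True), (120, False)]
after_redesign = [(150, False), (200, False), (95, True), (180, False)]


def time_on_site(sessions):
    """The quantity we nicknamed 'happiness': mean seconds per session."""
    return mean(seconds for seconds, _ in sessions)


def completion_rate(sessions):
    """A second, separately defined quantity: share of sessions that succeeded."""
    return sum(found for _, found in sessions) / len(sessions)


for label, sessions in (("before", before_redesign), ("after", after_redesign)):
    print(
        f"{label}: 'happiness' = {time_on_site(sessions):.2f} s, "
        f"completion = {completion_rate(sessions):.0%}"
    )
```

In this invented data, “happiness” nearly doubles after the redesign while most users stop finding what they came for. Reading the metric by its definition (seconds on site) instead of its nickname makes the problem much harder to miss.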

Being careful with language and taking stock of our fuzziness with careful operationalization-based thinking is helpful not just for statistical inference but also for data collection and practical machine learning.


Liked the author? Connect with Cassie Kozyrkov

Let’s be friends! You can find me on Twitter, YouTube, Substack, and LinkedIn. Interested in having me speak at your event? Use this form to get in touch.

Next up

In the follow-up article, I’ll give you 7 tips for creating metrics. If this topic intrigues you, your retweets are my favorite force for getting me to keep writing in the direction you enjoy.
