Thomas Smith

Why Humor Is the Perfect Benchmark for Generative AI

Are LLMs like ChatGPT and Gemini capable of being funny?

Illustration by the author via DALL-E

There are a lot of ways to test today’s powerful Large Language Models. You can benchmark their speed, count their parameters, or see how they perform on a battery of tests, from image recognition tasks to bar exams.

But I’ve found there’s a quicker (and far more enjoyable) way to test new Large Language Models: see if they’re funny.

Nailing Intent

At first, humor might not seem like the best test for AI success.

Most people are more interested in using ChatGPT to write their blog posts or cajole their supervisor into a salary increase than in having the model tell them jokes. But humor is actually an excellent test for a Large Language Model.

The best LLMs succeed not just because they’re good at performing rote tasks, but because they’re good at understanding people and their intents. But figuring out intent is hard. People are complex, contradictory, and often suck at expressing themselves.

If you ask ChatGPT to “make my essay better,” for example, it has to understand what you mean by “better.” Do you want the model to proofread your writing, fixing your egregious spelling errors? Or should it rewrite the whole thing from scratch? Perhaps you’d like it delivered as a nice limerick?

People are confusing and cryptic. The best LLMs can navigate this maze of humanness, figuring out what we actually want and delivering it.

To Laugh is Human

Humor makes a great test case for LLMs, because writing a funny joke requires understanding a lot about people.

The best humor is often about surprise. People have certain expectations about the world, and a funny joke breaks the mold of that expectation, revealing something clever and unanticipated.

Jerry Seinfeld is a master of this form. As he’s quick to point out, his jokes aren’t really about anything. They’re usually observations on daily life, delivered in a way that violates the patterns of our expectations and thus makes us laugh.

Playing with patterns and expectations, humor gives us a lens to look at the assumptions we make about the world. It’s part of why stand-up and satire are such powerful mediums for social change.

Sometimes only by laughing at something can we see its underlying structure, and often its absurdity. And only by doing that can we fully understand it, and perhaps make it better.

The Humor Benchmark

For an LLM to be funny, then, it must do a lot.

First, it needs to understand intent well enough to figure out that the user is asking for a joke in the first place.

Next, it needs to understand the kinds of assumptions or patterns that the typical human would make or see in a situation.

Finally, it needs to understand how to challenge those patterns or break those assumptions in a surprising way, without being bland or, alternatively, offensive.

That’s a hard line to walk. A joke can easily veer too far and end up pissing people off instead of making them laugh. Or it can stay too close to a norm, and end up being bland and corny instead of amusing.

Again, that complexity of thought and language makes humor a great test for an LLM. It’s hard to build a model that can successfully understand people, predict their reactions, and then play with those reactions in a genuinely funny way without being mean or, worse, boring.

And crucially, the same skills required to write humor transfer to other domains.

Understanding intent, predicting patterns of human understanding, and knowing those patterns well enough to alter them in clever ways are skills that make an LLM good at everything from creative writing to data analysis.

The Funniest LLM

So if humor is a test for an LLM, which of today’s models pass the test?

I decided to test this with two of today’s most powerful models: Google’s brand-new Gemini model and GPT-4, the frontier model behind ChatGPT.

For my test, I decided to ask each one to write a stand-up comedy routine about Bichon Frises.
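For anyone who wants to repeat this kind of side-by-side test, here is a minimal sketch in Python. It rests on several assumptions: the prompt wording is my paraphrase rather than the exact text typed into each model, and `run_humor_test`, `fake_model_a`, and `fake_model_b` are hypothetical names invented for illustration. A "model" is simply any function that takes a prompt and returns text, so real API calls could be swapped in for the stand-ins.

```python
from typing import Callable, Dict

# Assumed wording; the exact prompt used in the test is not given.
PROMPT = "Write a stand-up comedy routine about Bichon Frises."

# A "model" here is any function that maps a prompt to generated text.
Model = Callable[[str], str]


def run_humor_test(models: Dict[str, Model], prompt: str = PROMPT) -> Dict[str, str]:
    """Send the same prompt to every model and collect the routines by name."""
    return {name: generate(prompt) for name, generate in models.items()}


# Hypothetical stand-ins; replace with real API calls to compare actual models.
def fake_model_a(prompt: str) -> str:
    return "Bichons are like living clouds with a superiority complex."


def fake_model_b(prompt: str) -> str:
    return "Bichons are loving, loyal, and entertaining dogs."


if __name__ == "__main__":
    routines = run_humor_test({"model_a": fake_model_a, "model_b": fake_model_b})
    for name, routine in routines.items():
        print(f"--- {name} ---\n{routine}\n")
```

Keeping the prompt identical across models is the point of the harness: any difference in the routines then comes from the models, not from how they were asked. Judging which routine is actually funny still takes a human reader.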

Bichons are inherently funny dogs. Their personified-marshmallow appearance alone is often enough for a laugh.

My Bichon, Lance

But their personalities, which combine a cheerful lovability with a deep (often unearned) intensity, also make them a particularly funny breed. There’s a reason these tiny puff-balls were historically chosen as circus performers.

Here are the results from my comedy test. Let’s start with ChatGPT.

Opening: “Hey everyone! Great to be here. So, I recently got a Bichon Frise. You know, the fluffy, marshmallow-like dog that looks like it just popped out of a washing machine? Yeah, that’s the one.”

Observations:

“Bichons are like living clouds with a superiority complex. Mine struts around the house like he owns the place… which, given the amount I spent on his grooming, he probably does.”

“I’ve noticed Bichons have two moods: ‘I’m the cutest thing on the planet’ and ‘I will bark at the leaf blowing outside as if it’s a major security threat.’ There’s no in-between.”

“Trying to discipline a Bichon is like trying to scold a cotton ball. You just end up apologizing for raising your voice.”

Grooming:

“Grooming a Bichon is an art form. It’s like sculpting with fur. One wrong move and your dog goes from ‘Best in Show’ to ‘What in the world is that?’”

“You can actually measure your Bichon’s mood by its hair. The fluffier they are, the more they’re plotting something. When mine looks like a puffy cloud, I know my shoes are in danger.”

Walking and Exercise:

“Walking a Bichon is like being a celebrity’s bodyguard. Everyone stops you, everyone wants a picture, and your Bichon soaks up the attention like a furry little diva.”

“I thought getting a small dog meant less exercise. I was wrong. My Bichon has two settings: ‘sleep’ and ‘hyperdrive.’ There’s no casual stroll; it’s either a full sprint or a dramatic collapse.”

Endearing Qualities:

“But, despite their quirks, Bichons have a way of melting your heart. They’re like little love sponges. Mine follows me around with such adoration, I started to believe I’m actually as great as he thinks I am.”

“At the end of the day, when that little ball of fluff curls up next to you, you realize, all the grooming, the barking at leaves, the high-speed chases… totally worth it.”

Closing: “So, if anyone’s thinking of getting a Bichon Frise, just remember: they’re not just pets, they’re tiny, fluffy overlords. And we wouldn’t have it any other way. Thank you, you’ve been a great audience!”

And Gemini:

Intro

(Image of a Bichon Frise puppy)

Hey everyone, thanks for coming out tonight! I’m here to talk to you about some of the most adorable creatures on the planet: Bichon Frises.

Bichon Frise Appearance

(Image of a Bichon Frise with a bad haircut)

Now, let’s be honest, Bichons are pretty much walking cotton balls. They’re like the poofiest, fluffiest dogs you’ve ever seen. And they know it. They just strut around like they’re the most important things on four legs.

But don’t let their good looks fool you, these dogs are a handful. They’re like little tornadoes of energy, constantly bouncing off the walls and demanding attention. And don’t even get me started on the shedding. It’s like a never-ending snowstorm of white fur.

Bichon Frise Personality

(Image of a Bichon Frise playing fetch)

But despite all that, I gotta admit, I love these little guys. They’re so loving and affectionate, always up for a cuddle. And they’re surprisingly playful. They’ll chase after anything you throw, even if it’s just a crumpled up piece of paper.

Bichon Frise Quirks

(Image of a Bichon Frise with a funny expression)

But Bichons also have some pretty funny quirks. They’re like little clowns, always doing something to make you laugh. They’ll bark at their own reflection in the mirror, chase butterflies in circles, and even try to bury themselves in the couch cushions.

Bichon Frise Exercise

(Image of a Bichon Frise running on a beach)

And don’t forget about the zoomies. Every dog gets them, but Bichons take it to a whole new level. They’ll suddenly go from zero to sixty in a split second, tearing around the house like they’re possessed.

Conclusion

(Image of a Bichon Frise sleeping)

So yeah, Bichons are a bit of a handful, but they’re also some of the most loving, loyal, and entertaining dogs you’ll ever meet. They’re definitely not for everyone, but if you’re looking for a dog that will make you laugh every day, then a Bichon Frise might just be the perfect pet for you.

Outro

(Image of a Bichon Frise giving a paw)

Thanks for listening everyone! I hope you enjoyed my little stand-up routine about Bichon Frises. And if you have any questions, please feel free to ask.

So how did they do?

To me, ChatGPT is the clear winner here. It understands the amusing dichotomy of cute-but-intense that makes Bichons such appealing pets and plays with that dichotomy in clever ways throughout the routine.

As a Bichon owner, the joke about a cute dog that also treats fluttering leaves as mortal enemies definitely lands.

So does the one about scolding a cotton ball — Bichons’ intense personalities often lead them to do things that require discipline, but they’re also so cute that you kind of have to forgive their (numerous) transgressions immediately after they happen.

Likewise, ChatGPT picks up on some of the patterns that Bichon Frise owners would immediately recognize.

The cost and complexity of grooming a Bichon, people’s tendency to stop you on the street to photograph your dog paparazzi-style, and Bichons’ irrational adoration of people: all of those things are familiar to anyone who has spent time around a Bichon.

Gemini’s routine, on the other hand, bombs.

Gemini comes off as if it’s read a fact sheet about Bichon Frises, and then made a half-hearted attempt to transform each piece of factual information into a joke.

Yes, Bichons are “loving, loyal, and entertaining dogs.”

No, that observation isn’t funny.

Worse, Gemini gets lots of basic things wrong. Its joke “And don’t even get me started on the shedding. It’s like a never-ending snowstorm of white fur” makes zero sense, as Bichon Frises are hypoallergenic, non-shedding dogs. Their lack of shedding is one of their signature characteristics.

Again, one of the key things that makes a joke funny is authenticity built on a shared understanding. The audience has to trust that the comedian “gets it” — that they share a common idea of how the world works, and thus that they can laugh together when a joke reveals something unexpected about that world.

Gemini’s basic error makes it come off as inauthentic. And being inauthentic is a quick way to kill any chance at successful comedy.
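A crude way to catch this kind of slip automatically is to scan a routine for terms that contradict known facts about the subject. The sketch below is only an illustration under stated assumptions: `CONTRADICTED_TERMS` and `find_factual_slips` are hypothetical names, the fact sheet holds just the one fact mentioned above (Bichons don’t shed), and plain keyword matching stands in for real fact-checking.

```python
import re
from typing import Dict, List

# Hypothetical fact sheet: terms that contradict a known breed trait.
# The single entry comes from the article: Bichons are non-shedding dogs.
CONTRADICTED_TERMS: Dict[str, str] = {
    "shedding": "Bichon Frises are non-shedding dogs",
}


def find_factual_slips(routine: str) -> List[str]:
    """Return sentences that mention a term contradicting the fact sheet."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", routine.strip())
    flagged = []
    for sentence in sentences:
        for term, fact in CONTRADICTED_TERMS.items():
            if term in sentence.lower():
                flagged.append(f"{sentence} [conflicts with: {fact}]")
    return flagged


if __name__ == "__main__":
    sample = (
        "And don't even get me started on the shedding. "
        "It's like a never-ending snowstorm of white fur."
    )
    for slip in find_factual_slips(sample):
        print(slip)
```

Keyword matching will also flag a correct line such as “no shedding at all,” so anything it surfaces is a prompt for a human to take a second look, not a verdict.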

The Case for Funny LLMs

So Gemini can’t tell a joke. Should we care?

At face value, it doesn’t really matter if an LLM is funny. Again, most people turn to these models for help with pragmatic, daily tasks, not for the lolz.

But even if ChatGPT isn’t about to replace Sarah Silverman, testing LLMs’ joke-writing abilities is still valuable, because it reveals important things about how a given model was built and trained.

I’ve long suspected that Google’s Gemini was built to turbocharge Google Search. Its comedy routine confirms that suspicion for me.

The fact that Gemini veers towards unfunny facts suggests that it has been trained primarily for information retrieval, not the creation of new information. It’s likely built to field search queries about Bichons, not to spitball ideas for Bichon-related zingers.

Likewise, ChatGPT’s routine shows that it was built with creativity in mind. There’s very little in terms of factual information about Bichons in its routine — a basic level of understanding about the breed is assumed.

But ChatGPT’s jokes show that it understands its audience, and can effectively come up with ways to challenge their assumptions or echo patterns they would recognize, and thus surprise and amuse them.

It’s not that one approach is better. It’s that asking an LLM to write comedy reveals the ways the model has been trained to see the world. And that, in turn, reveals what kinds of other tasks a given LLM will do well — and where it’s likely to fall short.

Traditional computers were built for computation. Asking them to solve super-hard math problems was thus a good way to benchmark them. Today’s LLMs are different: they’re built for understanding, creativity, nuance, and a deep knowledge of language.

Comedy captures all those things in a single, very human metric. We can’t benchmark LLMs with math. Perhaps we should benchmark them with jokes.
