OF WHISKEY AND DICK PICS

Autotuned Emotion — AI’s Tone Deafness

Reflections on what gives a human voice its power, and why AI should struggle to reproduce it

**Pour another glass of that rock’n’roll** (Photo by Anthony Torres on Unsplash)

Ask me to choose between the jazz greats Billie Holiday and Peggy Lee, and I’ll pick Lady Day every time.

Peggy Lee’s voice is smooth perfection, pure and clear as the ring of a Bohemian crystal champagne flute. Billie Holiday’s is cracked and chipped, the whiskey tumbler of a barfly with a million stories to tell, each one more moving than the last.

That’s what I want from a singer — emotion rasped and scraped from human experience. Wrinkles and scars and laughter lines and bad tattoos. The signs that tell me they are not simply going through the motions, skilfully session-singing from sheet music on a stand. I need them to feel it, if I am to feel it.

Tom Waits or Tom Petty? One was backed by the Heartbreakers. One was a heartbreaker.

There is, as the croaky old poet Leonard Cohen sang, a crack in everything. That’s how the light gets in.

Or to bring in an implausibly different cultural reference, does anyone remember the 1980 feature film of Flash Gordon, with Ingmar Bergman veteran Max von Sydow hamming it up as Emperor Ming the Merciless? As Flash is led off to be executed, Dale Arden begins to cry.

“Look!” Princess Aura says to her father. “Water is leaking from her eyes.” “It’s what they call ‘tears’,” Ming replies. “It is a sign of their weakness.” “No!” we all call out silently in response. “It is a sign of our strength!”

From the Tin Man to Mr Spock to Data, it is a long-standing trope of sci-fi and fantasy that machines and aliens struggle to replicate or interpret emotion. And we now see that being played out before our eyes for real with generative AI.

James Bellerjeau asked in an article earlier today what the secret sauce might be to demonstrating our humanity as writers, to passing what he referred to as an ‘anti-Turing test’. My initial response, which I more or less reproduce here — hey, it’s okay to plagiarise ourselves, right? — was:

I think one aspect that an AI system would struggle to handle convincingly is the blend of conflicting emotions in response to incidents in our personal lives. Because the system is designed to operate by probabilities, it will tend to smooth out less common or unexpected emotional responses — what I call the ‘bland airline food phenomenon’.

When we write, we tend to include examples from our own lives that illustrate particular points, that have elicited strong emotions or prompted telling insights.

But there will also often be conflicting or overlapping feelings in there, which I don’t think AI is likely to be able to reproduce, as it will tend — for all its famed love of tapestries — to follow one conventional strand of ‘thought’, rather than intertwining several different emotional and logical responses.

In other words, even if an AI application could be trained to fake an account of ‘an emotion’, it would struggle to add layers to its narrative, to capture all the possible sources feeding into that moment, and all the different strands leading on from it.

That is because it seems to structure its narrative by taking one probabilistic step forward at a time, rejecting all other pathways as it goes. Likewise, it never seems to look backwards to enrich its narrative by weaving in related ideas from earlier points.

It feels ‘flat’ because it is quite literally two-dimensional, a series of straight lines traced from point to point, eventually closed into a crude figure with its childishly presented ‘Conclusion:’.

We humans do not think or express ourselves in linear form, but through a meshwork of overlapping trains of thought. Our voices are not a perfect, single frequency, but waver and quaver like an erratic oscilloscope, deviating from the simple x-axis in peaks and troughs.

The now ubiquitous Auto-Tune was devised to eliminate unwanted deviations from the norm, to standardise a singing voice to a single, simple harmonic line. But the technology in fact grew out of signal-processing work in the oil industry, where it was used to interpret seismic data.

It was meant to give a greater insight into the depths of the Earth’s crust, but was then instead repurposed in the interests of shallowness, homogenising voices, and ensuring that the superficial aesthetics of a pop star would not be undermined by vocal imperfections.

Machines, like record company executives, thrive on predictability. Systems like ChatGPT in fact have nothing else to work with.

IF word X, THEN highest probability for next word = Y.
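
For anyone who likes to see the rule spelled out, here is a minimal sketch of that IF/THEN in Python. Everything in it is an assumption made for illustration: the toy corpus is invented, the function names are hypothetical, and the approach is a crude word-pair counter rather than anything ChatGPT actually runs. Real systems weigh the whole preceding text and usually sample from the likely candidates instead of always taking the top one, so treat this as a cartoon of the rule above, not a description of the real thing.

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration (not real training data).
CORPUS = (
    "the voice is smooth . the voice is clear . "
    "the voice is cracked . the glass is smooth ."
)


def build_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def most_likely_next(counts, word):
    """IF word X, THEN return the most frequent next word Y (None if unseen)."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def greedy_continue(counts, start, extra_words):
    """Extend a text one step at a time, always taking the top-ranked word."""
    out = [start]
    for _ in range(extra_words):
        next_word = most_likely_next(counts, out[-1])
        if next_word is None:
            break
        out.append(next_word)
    return out


counts = build_bigram_counts(CORPUS)
print(" ".join(greedy_continue(counts, "the", 4)))
```

In this toy corpus "smooth" follows "is" twice, while "clear" and "cracked" follow it once each, so the greedy rule picks "smooth" every time. The cracked voice is in the data, but the highway never passes it.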

Not only can such a system neither draw on nor convincingly express emotion, but it is likewise incapable of telling a story in a surprising or unexpected manner. It cannot meander along the forest paths where the untamed, disturbing, surprising things in life are to be found lurking in the shadows. It instead sticks to the highway, and maybe reaches its intended destination more quickly, but has nothing of interest to show for the journey.

To plagiarise myself again, my comment in this regard on that earlier piece was:

It’s a hard task to mimic inherently illogical thinking, with a logical and probabilistic function. As I understand it, LLMs move like pawns — only forwards, one step at a time, and with a limited choice of squares available to them.

Decent human writers are bishops or queens, capable of sweeping moves.

The best are knights, leaping and pirouetting unexpectedly from one place to another, then back again via a different route.

Not even they know to begin with quite how they ended up on that square, nor which of the options they will take for their next move. And nor does the reader.

I think that is, for the moment, at odds with the modus operandi of a generative LLM.

Those seem to me the clearest elements of human writing that machines cannot reproduce or mimic, except by direct plagiarism, which is relatively straightforward to detect, and against which established legal recourse exists. And so those are the aspects that I feel writers wishing to ensure they stand out from the ranks of the bot army should emphasise.

I think it unlikely, for example, that if ChatGPT were invited to write 1,000 words on why and how AI writing fails to convince us it is human, it would begin with Billie Holiday, and proceed through Flash Gordon, seismography and chess pieces to make its point.

It may seem like a bizarre Heath Robinson contraption, but rather that, for me, than a sleek, flat, soulless, and ultimately destructive and ugly Cybertruck.

Who knows? Maybe AI could eventually replicate such stylistic and logical extravagances. But, as Smillew Rahcuef also pointed out in the comments on James’ piece, one thing it will never do is post dick pics.

Princess Aura’s jury is still out as to whether that is a sign of its strength, or its weakness.

On a related theme, here is an article I wrote recently about an attempt to put ChatGPT through its somewhat unconvincing paces as a boostworthy writer:

You Are the Boostinator!
A heartbreaking work of staggering gracelessness (medium.com)

And this is a follow-up in the words of ChatGPT itself:

Queenly Delusions of Grandeur
But no clear opinions in the Billie Holiday vs. Peggy Lee debate (matthewclapham.medium.com)