Summary

Researchers Li Zhang and Chris Callison-Burch have successfully fine-tuned OpenAI's GPT-3 language model to act as a drum machine, demonstrating the potential for language-to-music knowledge transfer.

Abstract

In an exciting new development, researchers Li Zhang and Chris Callison-Burch have published a paper showcasing how language models like OpenAI's GPT-3 can be fine-tuned to generate music. By fine-tuning GPT-3 with a few hundred MIDI files, they were able to create a model that can take a 2-bar prompt and generate a 16-bar drum groove. The researchers used a straightforward approach involving filtering MIDI files, transforming them into a multi-line string format, and fine-tuning GPT-3 with the text data. The fine-tuned model was able to generate new grooves within the musical style it was fine-tuned with, demonstrating the potential for language-to-music transfer learning with large language models.

Bullet points

Researchers Li Zhang and Chris Callison-Burch have published a paper on using language models like OpenAI's GPT-3 for automatic music generation.
The researchers used a straightforward approach involving filtering MIDI files, transforming them into a multi-line string format, and fine-tuning GPT-3 with the text data.
The fine-tuned model was able to take a 2-bar prompt and generate a 16-bar drum groove, demonstrating the potential for language-to-music transfer learning with large language models.
The generated grooves still contain some errors that a professional human drummer would not make; the researchers argue that further refinement of the fine-tuning method could fix them.
The researchers conclude that language-to-music transfer learning with large language models is viable and promising.
The author of the post tried fine-tuning GPT-3 to generate a popular Middle Eastern groove called "Semai Al Thaqil" and succeeded in generating its basic rhythmic structure.

AI & Music: Using GPT-3 As A Drum Machine! 🥁

GPT-3's Language-To-Music Knowledge Transfer: Exciting New Paper

In an exciting new paper, Li Zhang & Chris Callison-Burch showcase how language models like OpenAI’s GPT-3 can be fine-tuned to [drumroll 🥁] …

… act as drum machines. 🤯

In this post, we’ll find out how they did it, how to replicate it, and how my experiment in turning GPT-3 into a Middle Eastern drum machine went. Buckle up, this is exciting new stuff!

Teaching GPT-3 To Be A Drummer

In “Language Models Are Drummers” Zhang and Callison-Burch present preliminary results on a method for automatic music generation using GPT-3.

Yes, that’s right: the same GPT-3 that everyone is raving about right now for its incredible ability to generate text is now taking the stage as a music generator.

In their approach, Zhang and Callison-Burch present a method for transferring GPT-3’s knowledge of language to music by fine-tuning the standard GPT-3 model on just a few hundred MIDI files.

Sounds intriguing, right?

Here’s their straightforward approach:

1. From Google’s Groove MIDI Dataset, a collection of 1,150 MIDI files and over 22,000 measures of drumming by 10 professional drummers, Zhang & Callison-Burch selected a few hundred grooves by style, length, and time signature (Western rock/pop, 16 measures, 4/4) to keep things simple. MIDI (Musical Instrument Digital Interface) is a protocol standard that lets electronic musical instruments connect and communicate with each other. A MIDI file stores music not as audio but as machine-readable events, such as which note (pitch) is played, how hard, and when, along with instrument information.

2. The MIDI files were then converted into a multi-line string called “drumroll format”, in which one measure of music corresponds to 16 lines of text and each line corresponds to a 16th note. Each selected 16-bar drum groove is thus represented as text (the domain of GPT-3): 16 columns, one per measure, each 16 lines long.

3. Finally, GPT-3 was fine-tuned on this text data: the first two measures of each groove (2 columns of 16 lines of text) served as the prompt, and the following fourteen measures (14 columns of 16 lines of text) served as the desired completion.
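The three steps above can be sketched in Python. This is a minimal illustration under loudly stated assumptions, not Zhang & Callison-Burch’s code: the metadata fields (`style`, `time_signature`, `bpm`, `duration_s`), the three drum voices, the 3-character cell such as `K-H` (kick plus hi-hat), the `|` column separator, and the example rock beat are all hypothetical choices. It follows the layout described in step 2 (one line per 16th note, one column per measure) and emits each prompt/completion pair as a JSON object in the shape that OpenAI’s legacy GPT-3 fine-tuning used for JSON Lines training files.

```python
import json
from dataclasses import dataclass
from typing import List, Set, Tuple

# Step 1: select grooves. The metadata fields below are hypothetical.
@dataclass
class GrooveInfo:
    filename: str
    style: str           # e.g. "rock"
    time_signature: str  # e.g. "4-4"
    bpm: float
    duration_s: float    # performance length in seconds

def num_measures(info: GrooveInfo, beats_per_measure: int = 4) -> float:
    # seconds * (beats per second) / (beats per measure)
    return info.duration_s * info.bpm / 60.0 / beats_per_measure

def select_grooves(rows: List[GrooveInfo],
                   styles: Tuple[str, ...] = ("rock", "pop"),
                   time_signature: str = "4-4",
                   measures: int = 16,
                   tolerance: float = 0.5) -> List[GrooveInfo]:
    return [r for r in rows
            if r.style in styles
            and r.time_signature == time_signature
            and abs(num_measures(r) - measures) <= tolerance]

# Step 2: drumroll-style text. Voices, symbols and separator are hypothetical.
VOICES = ("K", "S", "H")  # kick, snare, closed hi-hat
STEPS = 16                # 16th notes in one 4/4 measure
Measure = List[Set[str]]  # per step: the set of voices that are hit

def encode_cell(hits: Set[str]) -> str:
    # {"K", "H"} -> "K-H"; an empty step -> "---"
    return "".join(v if v in hits else "-" for v in VOICES)

def to_drumroll(measures: List[Measure]) -> str:
    # Line i holds step i of every measure; each measure is one "|"-separated column.
    return "\n".join("|".join(encode_cell(m[i]) for m in measures)
                     for i in range(STEPS))

# Step 3: first columns -> prompt, remaining columns -> completion.
def to_training_record(drumroll: str, prompt_measures: int = 2) -> str:
    prompt_lines, completion_lines = [], []
    for line in drumroll.split("\n"):
        cells = line.split("|")
        prompt_lines.append("|".join(cells[:prompt_measures]))
        completion_lines.append("|".join(cells[prompt_measures:]))
    # One JSON Lines record in the legacy prompt/completion shape.
    return json.dumps({"prompt": "\n".join(prompt_lines),
                       "completion": "\n".join(completion_lines)})

def basic_rock_measure() -> Measure:
    # Invented example beat, used only to exercise the encoding.
    m: Measure = [set() for _ in range(STEPS)]
    for i in range(0, STEPS, 2):
        m[i].add("H")  # eighth-note hi-hat
    for i in (0, 8):
        m[i].add("K")  # kick on beats 1 and 3
    for i in (4, 12):
        m[i].add("S")  # snare on beats 2 and 4
    return m

if __name__ == "__main__":
    rows = [GrooveInfo("a.mid", "rock", "4-4", 120, 32.0),
            GrooveInfo("b.mid", "jazz", "4-4", 120, 32.0)]
    print([r.filename for r in select_grooves(rows)])  # ['a.mid']
    groove = [basic_rock_measure() for _ in range(16)]
    print(to_training_record(to_drumroll(groove)))
```

The length filter is plain arithmetic: a 4/4 performance lasting 32 seconds at 120 bpm contains 32 × 120 / 60 / 4 = 16 measures, so it passes, while the jazz groove is dropped by the style filter.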

And that was it.

The fine-tuned model was then able to take any given 2-bar prompt (presented to GPT-3 in the “drumroll” format) and turn it into a 16-bar drum groove.
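Because the model’s output is just text, it is worth checking that a completion actually forms a well-formed 16-bar groove before playing it back. The sketch below is a hypothetical post-processing step, not part of the published method: it assumes the column layout described in step 2 (one line per 16th note, one `|`-separated column per measure) and an invented 3-character cell per step covering kick, snare, and hi-hat (for example `K-H`). The function names `merge_columns` and `check_groove` are illustrative. It rejoins the 2-column prompt with the 14-column completion and lists any formatting problems.

```python
import re
from typing import List

# Hypothetical 3-voice cell: kick/snare/hi-hat letter or "-" (e.g. "K-H").
CELL = re.compile(r"[K-][S-][H-]")

def merge_columns(prompt: str, completion: str) -> str:
    """Append the completion's columns to the prompt's, line by line."""
    p_lines = prompt.split("\n")
    c_lines = completion.strip("\n").split("\n")
    if len(p_lines) != len(c_lines):
        raise ValueError(f"line count mismatch: {len(p_lines)} vs {len(c_lines)}")
    return "\n".join(p + "|" + c for p, c in zip(p_lines, c_lines))

def check_groove(drumroll: str, steps: int = 16, measures: int = 16) -> List[str]:
    """Return a list of formatting problems (empty if the groove looks well-formed)."""
    problems: List[str] = []
    lines = drumroll.split("\n")
    if len(lines) != steps:
        problems.append(f"expected {steps} lines, got {len(lines)}")
    for n, line in enumerate(lines):
        cells = line.split("|")
        if len(cells) != measures:
            problems.append(f"line {n}: expected {measures} columns, got {len(cells)}")
        bad = [c for c in cells if not CELL.fullmatch(c)]
        if bad:
            problems.append(f"line {n}: unrecognised cells {bad}")
    return problems

if __name__ == "__main__":
    prompt = "\n".join(["K-H|---"] * 16)
    completion = "\n".join(["|".join(["-SH"] * 14)] * 16)
    print(check_groove(merge_columns(prompt, completion)))  # []
```

A check like this only catches formatting errors; musical mistakes, such as ones a professional drummer would not make, pass straight through it.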

GPT-3 was not only copying its input: it managed to create new grooves within the musical style it had been fine-tuned on, and the latest DaVinci model produced much better results than the cheaper and faster Ada model. That is pretty insane!

Of course, there are still some errors slipping into the resulting drum grooves that a professional human drummer would not make, but these can be fixed, Zhang & Callison-Burch argue, by further refinement of the fine-tuning method. Evaluating the strengths and weaknesses of their approach, they come to the conclusion that “language-to-music transfer learning with large language models is viable and promising”.

Experiment: GPT-3 As a Middle Eastern Drum Machine

Viable and promising is good enough for the ethnomusicologist in me, so I tackled a specific case: fine-tuning GPT-3 to generate a popular Middle Eastern groove called “Semai Al Thaqil”, which sounds pretty foreign to the untrained ear because it is built on a ten-beat structure rather than the Western standard of 4/4.

I was curious whether GPT-3 could handle this unorthodox musical style and… it worked! 🤯

Okay, it’s just the basic rhythmic structure so far, but I’ll keep working on it to add more rhythmic detail and, hopefully, get my fine-tuned GPT-3 Middle Eastern drum machine to recognize and “play” a set of different rhythms with ornamentation.

If you want to know how I fine-tuned my Middle Eastern drum machine, follow me here on Medium as I am preparing a step-by-step guide on how to do this.
