Exploring the Creative Possibilities of AI Music with GPT-3
Creating Polyrhythmic Grooves with Large Language Models

While it has been shown that you can fine-tune the latest GPT-3 models to create polyphonic drum loops, there is also a way to produce polyrhythmic grooves with GPT-3 without any fine-tuning involved.
You just need a prompt with enough information for GPT-3 to know what you want it to do. So let’s do this.
Using syllable notation
If you have an OpenAI account, head over to the OpenAI Playground; otherwise, check this post on how to get an account and $18 of free credits to play around with.
First, we’ll let the latest model (text-davinci-003) know what we’re up to: teaching it about rhythm.
Research in cognitive science suggests that music and rhythm are a by-product of human language and embodied cognition (which is another day’s story). That gives us a tool for talking to GPT-3 about sounds: syllable notation.
use D for a bass-drum sound on one beat.
use T for a snare drum sound on one beat.
use - for a silenced beat.
use | to mark the end of bars (this sign does not count as a beat!)
This rhythm is called "malfuf": D--T--T-
This is a 4-bar sequence of "malfuf": D--T--T-|D--T--T-|D--T--T-|D--T--T-
This is a 4-bar sequence of "malfuf" with light variations: D--T--T-|DD-T--T-|D-DT--T-|DD-TT-TT
After describing some drum sounds and how they should occur, we give the model some examples of a specific use case for that notation, in this case a popular Middle Eastern rhythm.
By the way, you could have used “low pitch” and “high pitch” instead of bass drum and snare drum; GPT-3 cannot hear and won’t care. I used these names for later reference, and reference is something GPT-3 cares a lot about.
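To make the notation concrete, here is a minimal Python sketch that reads a pattern string the same way the prompt describes it: D, T and - are beats, and | only separates bars. The names SYMBOLS and parse_pattern are made up for this illustration and are not part of any library.

```python
# Hypothetical helper names; the symbol meanings come from the prompt above.
SYMBOLS = {"D": "bass drum", "T": "snare drum", "-": "silence"}

def parse_pattern(pattern):
    """Split a pattern string into bars, each a list of sound names."""
    bars = []
    # "|" marks the end of a bar and does not count as a beat.
    for bar in pattern.strip().strip("|").split("|"):
        unknown = set(bar) - set(SYMBOLS)
        if unknown:
            raise ValueError(f"unknown symbols {sorted(unknown)} in bar {bar!r}")
        bars.append([SYMBOLS[symbol] for symbol in bar])
    return bars

if __name__ == "__main__":
    for bar in parse_pattern("D--T--T-|DD-T--T-"):
        print(bar)
```

Each inner list has one entry per beat, so the length of a bar can be read off directly.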
Rules & Bad Examples
If you now add “Create some bars of malfuf” you will already get pretty decent results, but sooner or later you’ll see GPT-3 omitting or adding beats (using 7 or 9 beats in a bar, instead of the 8 beats needed). We need to make it clear what the “Rules for ‘malfuf’” actually are:
use D for a bass-drum sound on one beat.
use T for a snare drum sound on one beat.
use - for a silenced beat.
use | to mark the end of bars (this sign does not count as a beat!)
This rhythm is called "malfuf": D--T--T-
This is a 4-bar sequence of "malfuf": D--T--T-|D--T--T-|D--T--T-|D--T--T-
This is a 4-bar sequence of "malfuf" with light variations: D--T--T-|DD-T--T-|D-DT--T-|DD-TT-TT
Rules for "malfuf":
1. A "malfuf" always has 8 beats per bar. This is very important: you need to keep it 8 beats per bar, no matter what. Even when making variations to the rhythm you have to take care that a "malfuf" consists of 8 beats per bar at all times.
2. On the first beat of a bar always use D. only when using heavy variations you can change this.
The “rules section” explains the structural necessities for this particular rhythm. We could go into much more detail here and explain variation styles, breaks, tropes, etc., but for this demo it should be enough. We just don’t want GPT-3 to hallucinate additional beats that mess up the time signature.
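Since GPT-3 will not reliably count beats on its own, it can help to check its output mechanically. The following is a minimal sketch, assuming the output is a single pattern string in the notation above; check_pattern and its parameters are hypothetical names chosen for this example.

```python
def check_pattern(pattern, beats_per_bar=8, first_symbol="D"):
    """Return a list of human-readable problems found in a pattern string."""
    problems = []
    bars = pattern.strip().strip("|").split("|")
    for number, bar in enumerate(bars, start=1):
        # Rule 1: every bar must have exactly beats_per_bar beats.
        if len(bar) != beats_per_bar:
            problems.append(
                f"bar {number} ({bar}) has {len(bar)} beats, expected {beats_per_bar}"
            )
        # Rule 2: bars start with D. Pass first_symbol=None to skip this check,
        # since the rule allows exceptions for heavy variations.
        if first_symbol and not bar.startswith(first_symbol):
            problems.append(f"bar {number} ({bar}) does not start with {first_symbol}")
    return problems

if __name__ == "__main__":
    print(check_pattern("D--T--T-|DD-T--T-|D-DT--T-|DD-TT-TT"))
    print(check_pattern("D--T--T-|DD-T---"))
```

An empty list means the pattern follows both rules. The second call uses a made-up two-bar pattern whose last bar is one beat short.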
Unfortunately, after trying the above prompt a few times and asking GPT-3 for “a bunch of 16 bar sequences of malfuf, light variations in bar 5 to 8 …” and so on, the erroneous beats came back. So I added a bad example for GPT-3 to “learn” from:
use D for a bass-drum sound on one beat.
use T for a snare drum sound on one beat.
use - for a silenced beat.
use | to mark the end of bars (this sign does not count as a beat!)
This rhythm is called "malfuf": D--T--T-
This is a 4-bar sequence of "malfuf": D--T--T-|D--T--T-|D--T--T-|D--T--T-
This is a 4-bar sequence of "malfuf" with light variations: D--T--T-|DD-T--T-|D-DT--T-|DD-TT-TT
Rules for "malfuf":
1. A "malfuf" always has 8 beats per bar. This is very important: you need to keep it 8 beats per bar, no matter what. Even when making variations to the rhythm you have to take care that a "malfuf" consists of 8 beats per bar at all times.
2. On the first beat of a bar always use D. only when using heavy variations you can change this.
Here is a bad example: D--T--T-|DD-T---|D--T-T-|D-T--T-
This is wrong because only the first bar has 8 beats while the others have only 7! In a "malfuf" every bar has to have 8 beats.
With this in place, the prompt seemed ready for some rhythm generation, and it did quite well:
Write 16 bars of malfuf, use only light variations, and only use T in the last one, and make sure there are always 8 beats per bar:
D--T--T-|DD-T--T-|D-DT--T-|DD-TT-T-|D--T--T-|DD-T--T-|D-DT--T-|DD-TT-T-|D--T--T-|DD-T--T-|D-DT--T-|DD-TT-T-|D--T--T-|DD-T--T-|D-DT--T-|TT-TT-TT
Write another 16-bar sequence with 8 beats per bar but only use T on the 5th beat of every bar, the rest of the beats are all silence:
----T---|----T---|----T---|----T---|----T---|----T---|----T---|----T---|----T---|----T---|----T---|----T---|----T---|----T---|----T---|----T---
write another 16-bar sequence but divide the 128 beats of the 16-bar sequence in bars of 3 beats each. use a D on beat one of every bar. make sure that there are 3 beats in every bar:
D--|D--|D--|D--|D--|D--|D--|D--|D--|D--|D--|D--|D--|D--|D--|D--|D--
replace the 5th D with a T:
D--|D--|D--|D--|T--|D--|D--|D--|D--|D--|D--|D--|D--|D--|D--|D--|D--
The last two were just additional rhythmic layers that would give a nice polyrhythmic vibe when added to the main groove.
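To see why these layers feel polyrhythmic, you can line them up beat by beat. The sketch below drops the bar lines and stacks the layers; it loops shorter layers to fill the requested length, which is an assumption made here because the 3-beat layer printed above is shorter than the 128-beat groove. The function names are hypothetical.

```python
from math import gcd

def flatten(pattern):
    """Drop the bar lines so only beats remain."""
    return pattern.replace("|", "")

def stack(layers, length=None):
    """Return one tuple per beat holding the symbol of every layer.

    Shorter layers are looped (an assumption of this sketch).
    """
    flat = [flatten(layer) for layer in layers]
    length = length or max(len(beats) for beats in flat)
    return [tuple(beats[i % len(beats)] for beats in flat) for i in range(length)]

def downbeats_realign_every(bar_a, bar_b):
    """Number of beats until two bar lengths start together again (their LCM)."""
    return bar_a * bar_b // gcd(bar_a, bar_b)

if __name__ == "__main__":
    for beat, symbols in enumerate(stack(["D--T--T-", "D--"], length=8), start=1):
        print(beat, symbols)
    print(downbeats_realign_every(8, 3))
```

Bars of 8 and bars of 3 only start together every 24 beats, so the accents of the 3-beat layer keep shifting against the main groove.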
Putting it together
I converted the three rhythms to MIDI here and put them on top of each other in GarageBand.
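If you would rather script the conversion, here is one possible way to turn a pattern string into a Standard MIDI File using only the Python standard library. This is a sketch under several assumptions and not necessarily how the conversion above was done: each symbol is treated as an eighth note, the resolution is 480 ticks per quarter note, and D and T are mapped to the General MIDI percussion notes 36 (bass drum) and 38 (snare) on channel 10. All names are hypothetical.

```python
import struct

TICKS_PER_QUARTER = 480
TICKS_PER_BEAT = TICKS_PER_QUARTER // 2   # assumption: one symbol = one eighth note
NOTE_FOR = {"D": 36, "T": 38}             # General MIDI bass drum and snare
NOTE_ON, NOTE_OFF = 0x99, 0x89            # status bytes for channel 10 (percussion)

def vlq(value):
    """Encode a delta time as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))

def pattern_to_midi(pattern):
    """Return the bytes of a format 0 MIDI file for one pattern string."""
    track = bytearray()
    pending = 0  # ticks of silence since the last event
    for symbol in pattern.replace("|", ""):
        note = NOTE_FOR.get(symbol)
        if note is None:  # "-" (or any other symbol) is treated as silence
            pending += TICKS_PER_BEAT
            continue
        track += vlq(pending) + bytes([NOTE_ON, note, 100])
        track += vlq(TICKS_PER_BEAT) + bytes([NOTE_OFF, note, 0])
        pending = 0
    track += vlq(pending) + b"\xff\x2f\x00"  # end-of-track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, TICKS_PER_QUARTER)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)
```

Writing the returned bytes to a file opened in binary mode gives you one file per layer, which you can then drag onto separate tracks in your DAW.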
Here’s the result:
