Generate a piano cover with AI

A new model generates a piano cover from a pop song: how does it work, and how can you try it?

Image by Jordan Whitfield at unsplash.com

A piano cover is a version of a song in which every instrument is replaced by the sound of the piano alone. Plenty of them can be found on YouTube, and making one may seem almost trivial (spoiler: it is not).

To create a piano cover, a person must recognize all the musical elements in the song and reinterpret them using only the piano. That takes musical skill as well as creativity in recreating the melody. If this is already difficult for a human being, can an AI succeed?

Recently, an article called “POP2PIANO : POP AUDIO-BASED PIANO COVER GENERATION” came out that intends to do exactly that. In this post, we will discuss it and how you can try it.

The AI who wanted to do a Lady Gaga cover.

Actually, as the authors state in the article, such a challenge has already been attempted. The idea is to extract the tracks of the various instruments from the audio and rearrange them. The task is not easy, because a good cover is shaped by both the atmosphere of the song and the composer’s style.

The authors started by building a dataset of about 300 hours of synchronized piano covers. Instead of using raw music alone, they paired original songs with their piano covers, synchronized each pair, and divided the result into segments. The covers were converted to MIDI and quantized to eighth-note units. In total, they collected 5989 piano covers from 21 arrangers on YouTube, of which they then used only 4989 (307 hours).
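To make the quantization step concrete, here is a minimal sketch. It is not the authors' code: the function name `quantize_to_eighths`, the note tuples, and the constant tempo are assumptions for illustration only (a real pipeline would need beat positions estimated from the audio rather than a fixed BPM).

```python
def quantize_to_eighths(notes, bpm=120.0):
    """Snap (onset_sec, offset_sec, pitch) notes to an eighth-note grid.

    Returns (onset_step, offset_step, pitch), with steps counted in eighth notes.
    A constant tempo is assumed here purely to keep the example small.
    """
    eighth = 60.0 / bpm / 2.0  # duration of one eighth note in seconds
    quantized = []
    for onset, offset, pitch in notes:
        start = round(onset / eighth)
        end = max(start + 1, round(offset / eighth))  # keep at least one step
        quantized.append((start, end, pitch))
    return quantized


# Three slightly "human" notes (timings are made up for the example).
notes = [(0.02, 0.49, 60), (0.51, 0.74, 64), (0.76, 1.27, 67)]
grid_notes = quantize_to_eighths(notes, bpm=120.0)
print(grid_notes)
```

After this step, every note starts and ends exactly on an eighth-note position, which is what makes the covers easy to turn into a compact token sequence later on.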

“Fig. 1. A preprocessing pipeline for synchronizing and filtering paired {Pop, Piano Cover} audio data”. image source from the original article (source)

The model is basically a transformer:

The Pop2Piano model architecture is T5-small [7] used for [9]. It is a Transformer network with an encoder-decoder structure. The number of learnable parameters is about 59M. Unlike [9], the relative positional embedding of the original T5 is used instead of the absolute positional embedding. Additionally, A learnable embedding layer is used for embedding the arranger style. — from the original article (source)

As the figure shows, it consists of an encoder and a decoder.

“Fig. 2. The architecture of our model is an encoder-decoder Transformer. Each input position for the encoder is one frame of the spectrogram. We concatenated an embedding vector representing a target arranger style to the spectrogram. Output midi tokens are autoregressively generated from the decoder.” image source from the original article (source)
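The arranger conditioning described in the caption can be pictured with a toy sketch in plain Python. Everything here is an assumption made for illustration: the table `arranger_table`, the tiny width `D_MODEL`, and the reading that "concatenated" means adding one extra position in front of the frame sequence (the caption does not say along which axis). In the real model the embedding is learned during training; here it is just random numbers.

```python
import random

random.seed(0)

N_ARRANGERS = 21  # number of arrangers reported in the article
D_MODEL = 8       # toy feature width, hypothetical (the real model is much wider)

# Hypothetical embedding table: one style vector per arranger.
arranger_table = [
    [random.gauss(0.0, 1.0) for _ in range(D_MODEL)]
    for _ in range(N_ARRANGERS)
]


def build_encoder_input(spectrogram_frames, arranger_id):
    """Prepend the arranger's style vector to the sequence of frames."""
    for frame in spectrogram_frames:
        if len(frame) != D_MODEL:
            raise ValueError("each frame must have D_MODEL features")
    style = arranger_table[arranger_id]
    return [style] + list(spectrogram_frames)


frames = [[0.0] * D_MODEL for _ in range(5)]  # 5 dummy spectrogram frames
encoder_input = build_encoder_input(frames, arranger_id=3)
print(len(frames), "frames ->", len(encoder_input), "encoder positions")
```

The point of the design is that the same audio frames can be paired with a different style vector, so one model can imitate several arrangers.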

And the authors have presented an example of the output:

“Fig. 3. An example of piano tokenization. the beat shift token means a relative time shift from that point in time.” image source from the original article (source)
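To get a feeling for this kind of tokenization, here is a small sketch that turns quantized notes into a flat list of tokens. The token names (`note_on`, `note_off`, `beat_shift`) and the exact layout are my own simplification, not the vocabulary used in the article; I also assume that a beat shift is counted in eighth-note steps from the previous event.

```python
def tokenize(notes):
    """Turn (onset_step, offset_step, pitch) notes into a flat token list.

    Hypothetical scheme: a ("beat_shift", n) token moves time forward by n steps,
    and note_on / note_off tokens carry the MIDI pitch.
    """
    events = []
    for onset, offset, pitch in notes:
        events.append((onset, 1, pitch))   # 1 = note on
        events.append((offset, 0, pitch))  # 0 = note off
    # Sort by time; at equal times note-offs (0) come before note-ons (1).
    events.sort()

    tokens = []
    current_step = 0
    for step, is_on, pitch in events:
        if step > current_step:
            tokens.append(("beat_shift", step - current_step))
            current_step = step
        tokens.append(("note_on" if is_on else "note_off", pitch))
    return tokens


tokens = tokenize([(0, 2, 60), (2, 3, 64)])
for token in tokens:
    print(token)
```

The decoder's job is then to predict such a sequence one token at a time, given the audio seen by the encoder.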

Although the original song is complex (composed of several instruments and the vocal part), the generated piano accompaniment seems plausible. Beyond sounding plausible, it is also similar to the arranger’s work.

Moreover, the subjective evaluation points the same way. Twenty-five participants, none of them musicians, listened to 10-second excerpts of 25 songs and compared the model’s output with the arrangement made by a human. Seventy percent preferred the model’s work.

Here is a video released by the authors as an example:

Also, on the project website, you can test other songs and arrangements (you can find them here).

The authors acknowledge that there are still limitations:

We recognize that some improvements can be made to our model. For instance, Pop2Piano uses only four-beat length audio for the context of input. Therefore, features such as melody contour or texture of accompaniment have less consistency when generating longer than four-beat. Also, time quantization based on eighth note beats prevents the model from generating piano covers with other rhythms such as triplets, 16th notes, and trills. — from the original article (source)
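Both limitations are easy to see with a few lines of Python. This is only an illustration under my own assumptions (the helper `four_beat_windows` and the beat timestamps are made up): the first part splits a song into independent four-beat windows, the second shows what an eighth-note grid does to a triplet.

```python
def four_beat_windows(beat_times, beats_per_window=4):
    """Split beat timestamps into consecutive (start_sec, end_sec) windows."""
    windows = []
    for i in range(0, len(beat_times) - 1, beats_per_window):
        end_index = min(i + beats_per_window, len(beat_times) - 1)
        windows.append((beat_times[i], beat_times[end_index]))
    return windows


# 12 beats at 120 BPM (hypothetical timestamps, one beat every 0.5 s).
beats = [0.5 * i for i in range(13)]
windows = four_beat_windows(beats)
print(windows)  # each window would be fed to the model on its own

# A triplet places three onsets inside one beat, at 0, 1/3 and 2/3 of the beat.
triplet_onsets_in_beats = [0.0, 1 / 3, 2 / 3]
# On an eighth-note grid there are only two positions per beat.
eighth_steps = [round(onset * 2) for onset in triplet_onsets_in_beats]
print(eighth_steps)  # two of the three onsets end up on the same step
```

Since each window is generated without seeing its neighbours, nothing forces the accompaniment pattern to stay the same from one window to the next, and any rhythm finer than the grid simply cannot be written down.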

How to obtain a piano cover

The authors have provided both a GitHub repository and a Google Colab notebook.

First, change the runtime: in the menu at the top select Runtime, then Change Runtime Type, and choose GPU in the drop-down menu. Once that is done, run the first block of code (CTRL+ENTER or click the little play symbol). This may take a few minutes; as soon as it completes, go to the second block.

Again, execute the code block. It should take about a minute.

This block should also take a short time (it depends on your connection, since it downloads the model).

This block allows you to choose the arranger. You can choose in the drop-down menu which of the composers you prefer (if you want some guidance, they show the differences between the various composers on the project site).

In this block, you can upload the audio track whose piano cover you want to create. You can choose between WAV and MP3; I used an MP3 converted from a YouTube video.

Run this block of code (it shouldn’t take long).

Finally, you only need to run this block to download the piano cover (in MIDI format). You will find it in the same folder as the original track.
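If you want to quickly check that the downloaded file really is a MIDI file before opening it in a player or a DAW, a few lines of standard-library Python are enough. This is a minimal sketch, not part of the authors' notebook: the function name `read_midi_header` is mine, and it only reads the 14-byte header of a Standard MIDI File.

```python
import os
import struct
import tempfile


def read_midi_header(path):
    """Return (format, n_tracks, division) from a Standard MIDI File header."""
    with open(path, "rb") as f:
        header = f.read(14)
    if len(header) < 14 or header[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    # Big-endian: 4-byte chunk length, then three 2-byte fields.
    length, fmt, n_tracks, division = struct.unpack(">IHHH", header[4:14])
    if length < 6:
        raise ValueError("unexpected header length")
    return fmt, n_tracks, division


# Self-check on a hand-made header: format 1, 2 tracks, 480 ticks per quarter note.
with tempfile.NamedTemporaryFile(suffix=".mid", delete=False) as tmp:
    tmp.write(b"MThd" + struct.pack(">IHHH", 6, 1, 2, 480))
demo_header = read_midi_header(tmp.name)
os.remove(tmp.name)
print(demo_header)
```

To use it on your own result, call `read_midi_header` with the path of the file you downloaded from the Colab; a `ValueError` means the file is not a valid MIDI file.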

Conclusions

Once a song is loaded, the proposed model lets you download a piano track in MIDI format (mind you, it is not synchronized with the vocals as in the examples on the project site). I have tried several songs, and it works quite well with pop songs but less so with other genres (for example, when there is a long drum sequence).

In general, the result is interesting, especially considering the architecture and the fact that the number of parameters is not very large (only about 59 million). As we have seen, Microsoft has also launched a model that generates music, and Google itself is investing in the same field. It seems that after images, music is the next frontier. What do you think? Have you tried it? Let me know in the comments.

Here is the link to my GitHub repository, where I am planning to collect code and many resources related to machine learning, artificial intelligence, and more.

GitHub - SalvatoreRa/tutorial: Tutorials on machine learning, artificial intelligence, data science…
