Build an AI Composer – Machine Learning for Hackers #2
I actually didn’t play anything. You just heard AI-generated music. Hello World, welcome to Sirajology! In this episode, we’re going to train a neural network to compose music all by itself. Machine-generated music! The technical term for this is ‘music language modeling’, and it has a long history of research behind it, from Markov models to restricted Boltzmann machines. Which kind of sounds like something out of Half-Life or BioShock. Hold on babe, I’ve got to go save the world using my restricted Boltzmann machine.
Music is how we communicate our emotions and passions, and it’s completely based on mathematical relationships. Octaves, chords, scales, keys: all of it is math. At the lowest level, music is a series of sound waves that create pockets of air pressure, and the pitch we hear depends on the frequency of changes in this air pressure. We’ve created notation to help us map these sounds into an instruction set. So if machine learning is all about feeding data into models to find patterns and make predictions, could we use it to generate music all by itself? Absofruitly. We’re going to build an app that learns how to compose British folk music by training on a dataset of British folk music. We’ll be using Tensorflow, the sickest machine learning library ever, to do this in just 10 lines of Python. We’ll be following the tried-and-true 4-step machine learning methodology to do this: collect a dataset, build the model, train the model, and test the model. To start off, we’ll want to collect our dataset.
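That “pitch depends on frequency” relationship is concrete math. A quick sketch using the standard equal-temperament formula (this is general music theory, not code from the video): each octave doubles the frequency, there are 12 semitones per octave, and MIDI note 69 is A4 at 440 Hz.

```python
# Equal temperament: each octave doubles the frequency, with 12
# semitones per octave. MIDI note 69 is concert A (A4) at 440 Hz.
def midi_to_hz(note):
    return 440.0 * 2 ** ((note - 69) / 12)

print(midi_to_hz(69))            # A4 -> 440.0
print(round(midi_to_hz(60), 2))  # middle C -> 261.63
print(midi_to_hz(81))            # A5, one octave up -> 880.0
```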
So let’s import the urllib module, which will let us download a file from the web. Once we import it, we can call the urlretrieve method to do just that. We’ll set the parameters to the link to the dataset and the name we’ll give the downloaded file. We’re using the Nottingham dataset for this demo, which is a collection of 1000 British folk songs in MIDI format. MIDI is perfect for us since it encodes all the note and time information exactly how it would be written in music notation. It comes in a zip file, so we’ll want to unzip it as well.
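The download step looks roughly like this (the URL below is a placeholder, not the real dataset link; in Python 2, which the episode may have used, `urlretrieve` lived directly under `urllib` instead of `urllib.request`):

```python
import urllib.request  # Python 3; in Python 2 this was just `urllib`

def download_dataset(url, filename):
    # urlretrieve(url, filename) fetches `url` and saves it
    # locally under `filename`.
    urllib.request.urlretrieve(url, filename)
    return filename

# Hypothetical URL -- substitute the actual link to the Nottingham dataset:
# download_dataset("http://example.com/Nottingham.zip", "dataset.zip")
```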
We can do this programmatically using the zipfile module. We’ll extract the data from the zip and place it in the data directory. Now that we’ve got our data, it’s time to create the model. But before we do that, we need to think about how we want to represent our input data. There are 88 possible pitches in a MIDI file, so we could do one vector representation per note.
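A minimal sketch of the unzip step (file and directory names are assumptions, matching the transcript’s description of a zip extracted into a data directory):

```python
import os
import zipfile

def unzip_dataset(archive="dataset.zip", dest="data"):
    # Extract everything in the archive into the data directory.
    os.makedirs(dest, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
```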
But let’s be more specific. At each time step in the music, there are two things happening. There’s the main tune, or melody, and then there are the supporting notes, or harmony. Let’s represent each as a vector. And to make things easier, we’ll make two assumptions. The first is that the melody is monophonic; that means only one note is played at each time step. The second is that the harmony at each time step can be classified into a chord class. So that’s two different vectors, one for melody and one for harmony. We’ll then combine them into one vector for each time step. We can just import our ML helper class and then call the create model method to do this. Music plays out over a period of time; it’s a sequence of notes. So we need to use a sequence learning model: it has to accept a sequence of notes as an input and output a new sequence of notes.
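The combined melody-plus-harmony vector can be sketched like this. The pitch count comes from the transcript; the chord-class count and function names are assumptions for illustration, not the helper class’s actual encoding:

```python
NUM_PITCHES = 88        # possible melody notes, one per pitch
NUM_CHORD_CLASSES = 24  # hypothetical: 12 major + 12 minor chord classes

def one_hot(index, size):
    # A vector of zeros with a single 1.0 at `index`.
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def encode_time_step(melody_note, chord_class):
    # One vector for the (monophonic) melody note, one for the harmony's
    # chord class, concatenated into a single input vector per time step.
    return one_hot(melody_note, NUM_PITCHES) + one_hot(chord_class, NUM_CHORD_CLASSES)

vec = encode_time_step(melody_note=40, chord_class=3)
print(len(vec))  # 112 = 88 + 24
```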
Plain old neural nets can’t do this. They accept fixed-size inputs, like an image or a number. We’ll need a special kind of neural network, a recurrent neural network. Yeah! Those can deal with sequences, since data doesn’t just flow one way; it loops. This allows the network to have a kind of short-term memory. Yeah, that’ll work. But wait. We want our network to remember not just the most recent music it’s heard, but all the music it’s heard. Like, a piece of music can have multiple themes in different parts of it (hopeful, melancholic, angry), and if the network only remembers the most recent part, which was cheery, it’s just going to compose cheery stuff. We need a special type of recurrent neural network called a Long Short-Term Memory (LSTM) network. Super specific, I know. This type of network has a short-term memory that is LONG: it can remember things from way back in the sequence of data, and it uses everything it remembers to generate new sequences.
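Conceptually, that long memory lives in the LSTM’s cell state. A bare-bones sketch of a single LSTM step in NumPy (this is the textbook formulation for intuition, not the Tensorflow implementation the episode uses; sizes and weights are dummy values):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    # One LSTM time step. The cell state c carries the long-term memory;
    # gates decide what to forget, what to write, and what to expose.
    z = W @ np.concatenate([x, h_prev]) + b       # all four gates at once
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input / forget / output gates
    g = np.tanh(g)                                # candidate new content
    c = f * c_prev + i * g                        # update long-term memory
    h = o * np.tanh(c)                            # hidden state passed onward
    return h, c

# Tiny demo: input size 4, hidden size 3, zero-initialized weights.
D, H = 4, 3
W, b = np.zeros((4 * H, D + H)), np.zeros(4 * H)
h, c = lstm_step(np.ones(D), np.zeros(H), np.zeros(H), W, b)
print(h.shape, c.shape)  # (3,) (3,)
```

Because the forget gate `f` multiplies the old cell state rather than overwriting it, information from way back in the sequence can survive many steps; that is the “short-term memory that is LONG”.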
We can add this model in our code with just one line using our helper class. It’ll write the sequences and chord mapping to a file in the data folder. This is just a serialized byte stream representation of our music that we’re going to train our model with. Now that we have our model, we can go ahead and train it. You might be thinking, wait, this is a little too easy, isn’t there more to it? Well yeah, every machine learning model has a set of what are called ‘hyperparameters’. These are the parameters that we humans set for how our model operates, like knobs on a control panel. How many layers do we want? How many iterations for training? How many neurons? You could play around with these, turning all the knobs in different ways to perfect your end result, but chances are someone somewhere has already solved the problem you’re working on, and you can just use an existing model with pre-tuned hyperparameters to build something awesome. So now we’re ready to train our model.
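A set of hyperparameters typically looks something like the dictionary below. The names and values here are illustrative assumptions, not the pre-tuned ones that ship with the project:

```python
# Illustrative hyperparameters -- the "knobs on the control panel":
hyperparams = {
    "num_layers": 2,        # how many stacked LSTM layers
    "hidden_size": 200,     # how many neurons per layer
    "num_epochs": 250,      # how many iterations for training
    "learning_rate": 5e-3,  # step size for gradient descent
    "dropout": 0.5,         # regularization to avoid overfitting
    "batch_size": 100,      # sequences processed per training step
}
```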
We can just call the train_model method of our recurrent neural net class to do this. This’ll get the network to start consuming the input data piece by piece. It took me about 2 hours to train it on my 2013 MacBook Pro, but you don’t have to wait until it’s completely done training to test it out. Just wait until you see the “Best loss so far encountered, saving model.” message. Once you see that, you can type ‘rnn_sample’ into the terminal with the --config_file flag and point it to the newly generated config file in your models folder. That will generate a new song using the newly trained model you’ve just created. To generate music, we just sample the melody and harmony at each time step and plug it into our trained model. The model will then predict what the next notes will be.
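The sampling loop can be sketched like this. `model.predict` here is a hypothetical stand-in for the trained RNN: predict a distribution over next notes, sample one, append it, and feed the longer sequence back in.

```python
import random

def sample_song(model, seed, length=64):
    # Generate a song one time step at a time: feed the sequence so far
    # into the trained model, sample the next note from the predicted
    # distribution, append it, and repeat.
    song = list(seed)
    for _ in range(length):
        probs = model.predict(song)  # distribution over possible next notes
        next_note = random.choices(range(len(probs)), weights=probs)[0]
        song.append(next_note)
    return song
```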
The collection of all the predicted notes is our newly generated song. Let’s listen to what I’ve generated. So it sounds nice; it could be better, but it gives off that British folk vibe. There are definitely some improvements that could be made. The time signature is kinda sporadic, and in terms of long-term structure, there seems to be a lack of repeated themes and phrases. The solution may well be more data and more computing power; it usually is when it comes to machine learning with deep neural nets. Machine learning can help us learn the fundamental nature of how music works in ways that we haven’t even thought about. I’ve got links below, check ’em out.
And I’ve gotta go fix a runtime error, so thanks for watching.