AI-generated music is everywhere. Websites like Udio and Suno allow anyone to create rich, well-constructed pieces of music in many popular genres in seconds. While these websites are absolutely amazing, I would point out that such platforms, in some cases, appear to be trained on large quantities of copyrighted content. But I digress: before the likes of Udio and Suno came into being, simpler - but still interesting - generative music platforms were available. Often, these simpler systems produce sparse melodies, perhaps something akin to ambient music.
This guide explains how to build a simple generative music application (that doesn't infringe on copyrighted music) using a Hidden Markov Model (HMM). You will build the application using a local Jupyter Notebook. A corpus of classical music will be used as input data and will serve as the basis for different experiments with the application.
While a background in music is helpful, it is not essential to complete this guide. I am breaking up the overall exercise into two separate guides: (1) The first guide (i.e. this one) will focus on setting up the application and running a single experiment with it; (2) The second guide will focus on importing/playing your generated music with the freely available Signal Online MIDI Editor along with running additional experiments.
Before you begin this guide, I want to reiterate: You will not be building anything that comes close to what you can generate on sites like Udio and Suno. Moreover, the model will often generate sequences that may not sound very musical at all! However, despite being a basic model, it is still capable of producing some rather interesting melodies. Non-musicians and musicians alike might utilize these melodies as inspiration for more intricate compositions. If you proceed with this guide and the Part 2 guide, I would encourage you to run multiple experiments to develop different musical sequences that you can use in your own music projects. :-)
To complete the guide, you will need to have:
For convenience, I have uploaded a copy of the notebook that you will build to Kaggle, which you can access here. You can download a copy of the notebook to your local computer.
Before you get started with building the notebook, it may be helpful to review the sub-sections directly below. They describe:
If you have an existing musical background and are well-grounded in music symbols and music theory, you do not need to review this section.
For those readers without a background in music, I imagine that you've still likely seen a musical score before...something like this:
This is a beginning snippet of the musical score for Handel's The Harmonious Blacksmith in E. It is not necessary for you to understand how to read the score. Instead, we will use it to introduce a few musical concepts that we will rely upon when building our generative model:
Music is (generally speaking) composed of individual measures, much as paragraphs are composed of individual sentences. A musical score is a combination of a number of measures. Measures are defined by single vertical lines in musical scores. Looking at the first line of the musical score example above, the green arrows below delineate the measures in that particular line of music:
As seen, there are 3 measures in that first line given that the middle 2 arrows delineate the end of one measure and the beginning of another.
You've probably caught yourself tapping your foot to the rhythm of a piece of music that you really like. Moreover, you may have discerned that the music seems to follow a certain number of beats, such as "one, two, three, four...one, two, three, four". The measures in a given piece of music will each contain a certain number of beats, e.g. 4. And, at its outset, a score will specify this quantity. Often, you will see something that looks like a fraction at the beginning of musical scores:
For now, you can ignore the bottom number in each "fraction" (these "fractions" are actually known as time signatures). The top number tells us the number of beats in each measure. So, the first time signature specifies there are 2 beats/measure, the second specifies 3 beats/measure, the third specifies 9 beats/measure, the fourth specifies 4 beats/measure, and so on. Four beats per measure is most common, and the 4/4 time signature in the figure above is also known as common time. Our example musical score uses common time; it is specified by the mark that looks like a "C", as highlighted by the blue box:
So, our musical example uses 4 beats/measure. A musical score can actually have more than one time signature which might delineate different movements of the musical piece. For our purposes in this guide, we'll stick with "simpler" musical scores that only have one time signature.
Measures of music need to contain something - namely, music! Musical notes are the most basic musical element and might be conceived of as specific pitches (frequencies) of sound. Notes are represented via a (non-classical) notation using letters: C, D, E, F, G, A, and B. These base letters can be modified as sharp or flat to represent other pitches. Sharp notes are represented using the # symbol, e.g. C#, D#, etc. Flat notes are represented using a symbol that looks like a small b, e.g. Eb, Bb, etc. Some software programs, such as the music21 package discussed below, also represent flat notes with a minus sign -, e.g. E-, B-, etc. As you might already suspect, the small black dots with vertical stems inside the measures of a musical score represent the notes. Referring again to our musical example, I highlighted a handful of notes using purple arrows:
Notes give rise to another important musical concept: keys. A musical key can be thought of as a group of distinct musical notes that sound "good" together. Sharp and flat notes have particular importance with respect to musical keys. A full discussion on musical keys is beyond the scope of this guide, but suffice it to say that notes outside of a musical key will often sound "weird" or "strange". Thus, a composer will typically write a composition in a specific key that uses particular notes. Like time signatures, a composition can be written with more than one key and different keys might delineate different movements or sections of the composition.
You will see notes followed by a number, such as C2, F#3, or E-5. These numbers represent the octave of the note. The idea behind octaves can be visualized with a graphic of a piano keyboard:
Lower keys on a piano, i.e. to the left, have lower pitches and higher keys, i.e. to the right, have higher pitches. The octave of a note reflects where it is located on the piano keyboard. A lower octave number signifies a lower pitch while a higher octave number signifies a higher pitch.
Unsurprisingly, musical notes can have certain durations within a measure. A note that lasts for an entire measure is known as a whole note; a note that lasts for half of a measure is known as a half note; a note that lasts for one-fourth of a measure is known as a quarter note; a note that lasts for one-eighth of a measure is known as an eighth note; a note that lasts for one-sixteenth of a measure is known as a sixteenth note; and so on. So, with our musical example above where each measure consists of 4 beats, a whole note lasts for 4 beats; a half note lasts for 2 beats; a quarter note lasts for 1 beat; an eighth note lasts for half of a beat; and a sixteenth note lasts for one-fourth of a beat. The duration of a single beat as expressed by a musical note is represented by the bottom number in the time signatures seen in the Beats sub-section above. So, a 4/4 time signature means there are 4 beats per measure and a single beat has the duration of a quarter note. The symbols used to represent different note durations are not necessary to know to follow this guide, but are generally reflected in how the vertical stems are drawn for each musical note.
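The arithmetic described above can be captured in a tiny calculation. The helper below is purely illustrative (its name and the fractional note values - 1.0 for a whole note, 0.5 for a half note, and so on - are my own convention, not part of any music software):

```python
# Illustrative only: compute how many beats a note value occupies, given the
# bottom number of the time signature (i.e. the note value that "gets the beat").
def beats_for(note_fraction: float, sig_bottom: int) -> float:
    return note_fraction * sig_bottom

print(beats_for(1.0, 4))    # whole note in 4/4   -> 4.0 beats
print(beats_for(0.25, 4))   # quarter note in 4/4 -> 1.0 beat
print(beats_for(0.125, 4))  # eighth note in 4/4  -> 0.5 beat
```

Note that the same formula also covers time signatures with a bottom number of 8, where an eighth note gets the beat.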
A chord is easily understood: it is two or more musical notes played simultaneously. I highlighted a handful of chords using orange arrows:
Like musical notes, chords have a certain duration.
A rest is an "empty" space within a measure where no musical notes or chords are played. Rests have particular durations just like notes and chords. Hence, a whole rest would be the absence of music for an entire measure; a half rest would be the absence of music for half of a measure; a quarter rest would be the absence of music for one-fourth of a measure; and so on. Rests of different durations are denoted by different symbols, which are not necessary to follow this guide. Nonetheless, I highlighted a couple of rests in the musical score example with red arrows:
The point of this sub-section was to (hopefully) acquaint you with enough musical concepts so that you can successfully complete this Part 1 guide and the upcoming Part 2 guide. There are several music symbols and music theory concepts that I opted to skip over. I encourage you to do a Google search on music symbols and basic music theory to review more comprehensive articles and write-ups.
The Musical Instrument Digital Interface (MIDI) is a standard introduced in the early 1980s that allows electronic instruments, such as synthesizers, to exchange data. A couple of simple examples might help to explain the idea:
Since MIDI is a digital interface, it "talks" in numbers. For example, different musical notes (i.e. pitches) are represented by different numbers. The full specification for MIDI 1.0, the original MIDI standard introduced in 1983, is quite lengthy. You can read more about the MIDI 1.0 specification here.
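As a small illustration of MIDI "talking" in numbers, the sketch below converts a MIDI note number to a pitch name. It assumes the common convention that MIDI note 60 is middle C (C4); be aware that some tools label that same note C3 instead:

```python
# Sketch: map a MIDI note number to a pitch name, assuming the common
# convention that MIDI note 60 is middle C (C4). MIDI note 0 is then C-1.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_pitch_name(n: int) -> str:
    octave = n // 12 - 1          # octave number under the C4 = 60 convention
    return f"{NOTE_NAMES[n % 12]}{octave}"

print(midi_to_pitch_name(60))  # C4
print(midi_to_pitch_name(69))  # A4 (440 Hz in standard tuning)
```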
In this Part 1 guide, you will run a single experiment using 5 classical music MIDI files from the GiantMIDI-Piano dataset, which was released in 2020. This dataset was developed by scientists at ByteDance, the Chinese firm behind the ubiquitous TikTok application. Per its authors, GiantMIDI-Piano is "a classical piano MIDI dataset [that] contains 10,855 MIDI files of 2,786 composers. The total duration of GiantMIDI-Piano is 1,237 hours, and there are 34,504,873 transcribed piano notes." To accomplish this feat, the ByteDance scientists designed logic to automatically convert audio recordings of classical piano music to MIDI data. That conversion supports their consumption and analysis via a variety of computational tools. The GitHub repository for the GiantMIDI-Piano dataset is located here. Note that the GitHub repository only contains a small subset of the dataset's full set of MIDI files. Those samples are sufficient for this guide. However, as described in the repository's disclaimer.md file, the full set of MIDI files (~200 MB) can be downloaded from the Google Drive folder located here.
You won't need to use the full GiantMIDI-Piano dataset with this guide. That being said, I've uploaded 10,841 of the original dataset's 10,855 MIDI files to Kaggle. The Kaggle dataset can be accessed here. The 14 missing files are due to errors I encountered when unzipping the original dataset archive. Also, please note that the filenames in the Kaggle dataset are different as compared to the original GiantMIDI-Piano dataset. It was necessary to change the filenames to overcome upload errors due to illegal characters in several original filenames.
music21 offers a comprehensive and powerful set of tools for computer-aided music analysis and computational musicology. Particularly, music21 can be used to analyze and manipulate MIDI data, and to consume/generate MIDI files. You will use music21 to load data from the GiantMIDI-Piano dataset, as well as to generate MIDI files for your generated music. You will only scratch the surface of its capabilities in this guide. The project's full documentation can be viewed/built from the documentation folder of the project's GitHub repository.
It is assumed you already understand the mechanics of Hidden Markov Models.
The input to the HMM is a sequence of musical-type observations. There are 3 musical types: <REST>, <NOTE>, and <CHORD>. So, a sample sequence of these types could be:
<NOTE> <REST> <CHORD> <NOTE> <NOTE> <NOTE> <CHORD> <REST>...
The HMM predicts a hidden sequence of musical elements from the observation sequence. So, the prediction for the partial sequence above might start out as:
[{'name': '<NOTE>', 'pitch': 'C6', 'duration': 1.0}, {'name': '<REST>', 'duration': 0.25},...]
To determine the hidden sequence of musical elements, the HMM needs to calculate the probability of a particular musical element given an observed musical type over the entire observation sequence. As you know, we can reformulate this calculation as the product of two probabilities: the probability of musical type m being emitted by a musical element e, and the probability of musical element e_i given the preceding musical element e_i-1. These represent our emission and transition probabilities respectively. So, for this HMM, we need to define/calculate:
O = o_1, o_2, ..., o_M, a set of `M` possible observations (i.e. musical types).
π = π_1, π_2, ..., π_N, an initial probability distribution that an observation sequence begins with a given state (i.e. musical element) where `N` is the number of states (i.e. musical elements).
A = a_11, a_12, ..., a_NN, a transition probability matrix where `a_ij` is the probability of moving from state (i.e. musical element) `i` to state (i.e. musical element) `j`; each row of `A` sums to 1.
B = b_11, b_12, ..., b_NM, an emission probability matrix where b_ij is the probability of state `i` (i.e. musical element `e_i`) emitting observation (i.e. musical type) `j` (musical type `m_j`).
From the components above, we will derive:
E = e_1, e_2, ..., e_N, the set of `N` hidden states (i.e. musical elements) from which the most likely hidden sequence is drawn.
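Before working with the real data, it may help to see these components as concrete NumPy arrays. The toy model below uses N = 2 made-up musical elements and M = 2 observable types; every probability is invented purely for illustration:

```python
import numpy as np

# Toy versions of the HMM components, with N = 2 hidden musical elements and
# M = 2 observable types. All numbers here are invented for illustration.
pi = np.array([0.6, 0.4])        # initial distribution over the 2 elements
A = np.array([[0.7, 0.3],        # transition probabilities between elements
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],        # emission probabilities: element -> musical type
              [0.1, 0.9]])

# pi and each row of A and B are probability distributions, so each sums to 1
assert np.isclose(pi.sum(), 1.0)
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
```

As we will see, the notebook's emission matrix is laid out slightly differently (one column per position in the observation sequence), but the idea is the same.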
You are now ready to start building the music generation notebook. Broadly, the notebook is broken into 3 parts:
The GiantMIDI-Piano GitHub repository contains a subset of the dataset's full set of MIDI files that will be sufficient for use with this guide. Use git to clone the repository to your local computer:
git clone https://github.com/bytedance/GiantMIDI-Piano.git
Create a new directory on your local computer. This directory will hold selected GiantMIDI-Piano musical scores that will serve as input data for our first experiment with the HMM. You can name the directory as you wish. This guide uses the name midi-music.
Using the GiantMIDI-Piano repository that you cloned in Step 1a, navigate to GiantMIDI-Piano/midis_for_evaluation/giantmidi-piano.
Copy the following 5 MIDI files from GiantMIDI-Piano/midis_for_evaluation/giantmidi-piano to the input data directory you just created, e.g. midi-music if you opted to use that name:
Schubert_Fantasie_in_C_major_D760_lR43Ti4w5MM_cut_mov_1.mid
Schubert_Fantasie_in_C_major_D760_lR43Ti4w5MM_cut_mov_2.mid
Schubert_Fantasie_in_C_major_D760_lR43Ti4w5MM_cut_mov_3.mid
Schubert_Fantasie_in_C_major_D760_lR43Ti4w5MM_cut_mov_4.mid
The MIDI files above represent one distinct musical score: Fantasie in C major by Franz Schubert. This is a fairly long composition that is composed of multiple movements or sections. Interestingly, not all sections have the same time signature and musical key. The "C major" designation in the composition name represents the main musical key of the score. If you reviewed the Notes sub-section above, I explained that notes outside of a musical key might sound "weird" or "strange". So, the multiple keys used with this composition could lead to some rather non-musical results. Similarly, the use of multiple time signatures may result in some strange rhythms.
Install the NumPy and music21 packages using pip:
pip install numpy
pip install music21
The following 23 sub-steps build each of the music generation notebook's 23 cells in order.
The first cell imports required Python packages. Set the first cell to:
### CELL 1: Import Python packages ###
import os
from typing import Tuple, Union
# Data and arithmetic packages
import numpy as np
# Music analysis packages
from music21 import *
The second cell sets constants used throughout the notebook. Replace [INSERT_PATH_TO_YOUR_INPUT_DATA_DIRECTORY_HERE] with the directory that you created in Step 1b-1 to hold the MIDI files that will be used as input data.
### CELL 2: Constants ###
DATA_DIR = "[INSERT_PATH_TO_YOUR_INPUT_DATA_DIRECTORY_HERE]" # e.g. "C:/midi-music"
OBS = {
    "OBS1": ["<NOTE>", "<CHORD>", "<NOTE>", "<NOTE>", "<REST>", "<NOTE>", "<NOTE>", "<NOTE>", "<CHORD>", "<NOTE>", "<REST>", "<NOTE>", "<NOTE>", "<NOTE>", "<CHORD>"],
    "OBS2": ["<CHORD>", "<CHORD>", "<CHORD>", "<CHORD>", "<NOTE>", "<CHORD>", "<CHORD>", "<REST>", "<CHORD>", "<REST>", "<CHORD>", "<REST>", "<NOTE>", "<CHORD>", "<REST>"],
    "OBS3": ["<REST>", "<NOTE>", "<REST>", "<NOTE>", "<REST>", "<NOTE>", "<CHORD>", "<NOTE>", "<NOTE>", "<CHORD>", "<REST>", "<NOTE>", "<REST>", "<NOTE>", "<NOTE>"],
    "OBS4": ["<REST>", "<NOTE>", "<CHORD>", "<NOTE>", "<CHORD>", "<NOTE>", "<NOTE>", "<REST>", "<CHORD>", "<REST>", "<CHORD>", "<NOTE>", "<NOTE>", "<NOTE>", "<CHORD>"]
}
USE_OBS = OBS["OBS1"]
The OBS dictionary holds a set of observation sequences that will be used with the HMM. In other words, these are the sequences that we observe and from which we wish to predict a hidden sequence via the HMM. As you can see, each observation sequence is built from a 3-member set: {<REST>, <NOTE>, <CHORD>}. These items represent the musical types that can comprise a musical sequence for our purposes in this guide. The USE_OBS constant simply refers to a specific observation sequence in the OBS dictionary.
The representations <REST>, <NOTE>, and <CHORD> are chosen arbitrarily. They could have just as easily been represented as <1>, <2>, and <3>. However, I think explicit representations are more useful as they make the guide easier to follow.
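If you later want additional observation sequences beyond OBS1 through OBS4 (for example, for the extra experiments in Part 2), one simple approach is to sample the three types at random. This snippet is not part of the notebook, just a convenience:

```python
import random

# Not part of the notebook: build extra observation sequences in the same
# shape as OBS1-OBS4 by sampling the three musical types at random.
TYPES = ["<REST>", "<NOTE>", "<CHORD>"]

random.seed(42)                    # fixed seed for reproducibility
obs = random.choices(TYPES, k=15)  # a 15-item sequence, like the others
print(obs)
```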
The third cell defines a utility function that will be used to load MIDI data using the music21 converter.parse method. Set the third cell to:
### CELL 3: Utility function to load MIDI data ###
def load_data() -> list:
    scores = []
    for filename in os.listdir(DATA_DIR):
        if filename.endswith(".mid"):
            # Join the directory and filename so the path is valid on any OS
            filepath = os.path.join(DATA_DIR, filename)
            midi = converter.parse(filepath)
            scores.append(midi)
    return scores
converter is a music21 class for reading different music file formats, including MIDI files.

The fourth cell runs the load_data function defined in Step 2.3 to load the MIDI files in your input data directory created in Step 1b.
### CELL 4: Load MIDI Data ###
scores = load_data()
The fifth cell defines a utility function that "flattens" each MIDI musical score via the music21 recurse method and, in doing so, extracts a list of the specific musical elements (i.e. rests, notes, chords) comprising each musical score. It also builds a list of indices pointing to the locations of measures in the scores. Set the fifth cell to:
### CELL 5: Utility function to get all musical elements and indices of measures for each score ###
def get_indices_of_measures_and_musical_elements_by_score(scores: list) -> Tuple[list, list]:
    all_scores_elements = []
    indices_of_all_measures = []
    for score in scores:
        elements = [element for element in score.recurse()]
        measures = []
        for e in range(len(elements)):
            if isinstance(elements[e], stream.Measure):
                measures.append(e)
        indices_of_all_measures.append(measures)
        all_scores_elements.append(elements)
    return indices_of_all_measures, all_scores_elements
The sixth cell runs the get_indices_of_measures_and_musical_elements_by_score function defined in Step 2.5. Set the sixth cell to:
### CELL 6: Getting all musical elements and indices of measures for each score ###
indices_of_all_measures, all_scores_elements = get_indices_of_measures_and_musical_elements_by_score(scores)
If you print the returned indices_of_all_measures list, you should see:
[[2, 6, 21, 51, 61, 81, 91, 117, 138, 161, 188, 211, 232, 269, 301, 315, 324, 343, 349, 362, 386, 392, 405, 418, 424, 426, 453, 473, 493, 529, 569, 602, 642,...
If you print the returned all_scores_elements list, you should see a sequence of music21 objects:
[[<music21.metadata.Metadata object at 0x25158fc32e0>, <music21.stream.Part 0x2516afb7e50>, <music21.stream.Measure 1 offset=0.0>,...
The following are examples of a music21 Rest object, a Note object, and a Chord object:
<music21.note.Rest 16th>
<music21.note.Note C#>
<music21.chord.Chord G4 C#5 B-5>
Each musical score contained within the all_scores_elements list from Step 2.6 is composed of these types of music objects. Moreover, each of these music objects has different properties that we can access. For example, they all have a duration property which tells us how long (i.e. for how many beats) a given music element lasts. music21 Note objects have a pitch property that tells us the specific note being played. Chord objects have a notes property that tells us which specific notes comprise the chord.

The seventh cell defines a utility function that extracts the musical properties of the musical elements in each score. The musical elements are organized by the measure that they belong to in each score. This utility function also returns the average measure length - in terms of number of musical elements - for all scores. Set the seventh cell to:
### CELL 7: Utility function to extract musical properties of musical elements by measure and by score ###
def extract_musical_elements(all_scores_elements: list, indices_of_all_measures: list) -> Tuple[dict, int]:
    def normalize_duration(dur: Union[float, duration.Duration]) -> float:
        # Durations sometimes arrive as fractions; coerce to a rounded float
        _duration = dur
        if type(_duration) is not float:
            _duration = float(_duration)
        return round(_duration, 2)
    def normalize_volume(volume: float) -> float:
        return round(float(volume), 1)
    elements_by_measure_by_score = {}
    num_measures = 0
    len_all_measures = 0
    for s in range(len(all_scores_elements)):
        elements_by_measure_by_score[s] = {}
        # Extract the musical elements of this score
        elements = all_scores_elements[s]
        indices = indices_of_all_measures[s]
        i = 0 # Measure counter
        while True:
            measure = []
            start = indices[i]
            end = indices[i + 1]
            for e in range(start + 1, end):
                element_props = None
                if isinstance(elements[e], note.Rest):
                    element_props = {"name": "<REST>", "duration": normalize_duration(elements[e].quarterLength)}
                if isinstance(elements[e], note.Note):
                    element_props = {"name": "<NOTE>", "pitch": str(elements[e].pitch), "duration": normalize_duration(elements[e].quarterLength)}
                if isinstance(elements[e], chord.Chord):
                    element_props = {"name": "<CHORD>", "notes": [], "duration": normalize_duration(elements[e].duration.quarterLength)}
                    for chord_note in elements[e].notes:
                        element_props["notes"].append(str(chord_note.pitch))
                if element_props is not None:
                    measure.append(element_props)
            elements_by_measure_by_score[s][i] = measure
            num_measures += 1
            len_all_measures += len(measure)
            if i + 1 == len(indices_of_all_measures[s]) - 1:
                break
            else:
                i += 1
    avg_measure_len = round(len_all_measures/num_measures)
    return elements_by_measure_by_score, avg_measure_len
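One detail worth seeing in isolation: music21 sometimes reports quarterLength values as Python Fraction objects (a triplet note, for example, lasts 1/3 of a beat). A standalone illustration, with the normalize_duration helper reproduced without its music21 type hints so it runs on its own:

```python
from fractions import Fraction

# Reproduction of the normalize_duration helper (minus the music21 type
# hints) to show why it exists: durations arrive as floats OR Fractions.
def normalize_duration(d):
    if type(d) is not float:
        d = float(d)
    return round(d, 2)

print(normalize_duration(Fraction(1, 3)))  # 0.33 (a triplet's share of a beat)
print(normalize_duration(0.5))             # 0.5
```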
The normalize_duration helper method is used to normalize duration property values, which are sometimes expressed as a float and other times as fractions of a beat. Note and Chord objects also have a volume property. I opted to neglect that property at this point in the guide to keep things a bit simpler. This is why the normalize_volume helper method is defined but not used.

The eighth cell runs the extract_musical_elements function defined in Step 2.7. Set the eighth cell to:
### CELL 8: Extracting musical elements and average measure length ###
elements_by_measure_by_score, avg_measure_len = extract_musical_elements(all_scores_elements, indices_of_all_measures)
If you print the elements_by_measure_by_score dictionary, you will see the following output:
{0: {0: [{'name': '<REST>', 'duration': 4.0}], 1: [{'name': '<REST>', 'duration': 1.0}, {'name': '<CHORD>', 'notes': ['C3', 'E3', 'G2', 'C4', 'E4', 'C2'], 'duration': 0.5},...
If you print avg_measure_len, you should see that the average measure length is 22. You may have also noticed that the length of each observation sequence in Step 2.2 is 15. I opted to set the observation sequence length to this value arbitrarily.

We need to define a "vocabulary" for our input data. In other words, we need to extract the unique elements across all of the input musical scores. Be aware that <NOTE> and <CHORD> elements with the same pitches, but different durations, are themselves different. So, each of the following examples is a unique element:
{'name': '<CHORD>', 'notes': ['G#3', 'C4', 'E-4'], 'duration': 0.75}
{'name': '<CHORD>', 'notes': ['G#3', 'C4', 'E-4'], 'duration': 1.00}
{'name': '<NOTE>', 'pitch': 'G5', 'duration': 0.25}
{'name': '<NOTE>', 'pitch': 'G5', 'duration': 0.50}
Of course, <REST> elements of different durations also represent unique elements. The ninth cell defines a utility function to generate a dictionary of unique elements. It also returns a second dictionary with the frequency of each unique element. Set the ninth cell to:
### CELL 9: Utility function to generate dictionaries of unique elements and their frequencies ###
def get_unique_elements_and_freqs(elements_by_measure_by_score: dict) -> Tuple[dict, dict]:
    unique_elems = {}
    freqs = {}
    key = -1
    for score in elements_by_measure_by_score:
        for measure in elements_by_measure_by_score[score]:
            elements = elements_by_measure_by_score[score][measure]
            for element in elements:
                if element not in list(unique_elems.values()):
                    # Create a new key for this element
                    key += 1
                    # Add this element to unique elements
                    unique_elems[key] = element
                    # Add this element to `freqs` dictionary
                    freqs[key] = 1
                else:
                    # Get the key for this element
                    key_for_this_element = list(unique_elems.values()).index(element)
                    # Update the frequency in the `freqs` dictionary
                    freqs[key_for_this_element] += 1
    return unique_elems, freqs
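A side note on performance: the function above scans unique_elems.values() for every element, which is fine for this small corpus but slow on larger ones. A faster variant (my own sketch, not from the notebook) hashes a canonical, tuple-based form of each element dict so lookups are O(1):

```python
# My own variant of the deduplication idea above: hash a canonical tuple
# form of each element dict so membership checks are O(1), not a list scan.
def canonical(element: dict) -> tuple:
    # Chord note lists are unhashable, so convert them to tuples
    return tuple(
        (k, tuple(v) if isinstance(v, list) else v)
        for k, v in sorted(element.items())
    )

toy_elements = [
    {"name": "<REST>", "duration": 1.0},
    {"name": "<CHORD>", "notes": ["C4", "E4"], "duration": 0.5},
    {"name": "<REST>", "duration": 1.0},
]
key_by_canon, toy_unique_elems, toy_freqs = {}, {}, {}
for element in toy_elements:
    c = canonical(element)
    if c not in key_by_canon:
        key_by_canon[c] = len(toy_unique_elems)
        toy_unique_elems[key_by_canon[c]] = element
        toy_freqs[key_by_canon[c]] = 1
    else:
        toy_freqs[key_by_canon[c]] += 1

print(toy_freqs)  # {0: 2, 1: 1}
```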
Note and Chord objects have a volume property, but I opted not to extract it at this point in the guide. This was done to simplify the "vocabulary" of the musical scores. If volume were extracted as a property, Note and Chord elements with the same pitch and duration values, but different volumes, would themselves be different, thus increasing the size of our musical vocabulary and the number of calculations we need to make for the HMM.

The tenth cell runs the get_unique_elements_and_freqs function defined in Step 2.9. Set the tenth cell to:
### CELL 10: Generating dictionary of unique elements and their frequencies ###
unique_elems, freqs = get_unique_elements_and_freqs(elements_by_measure_by_score)
If you print the unique_elems dictionary, you will see the following output:
{0: {'name': '<REST>', 'duration': 4.0}, 1: {'name': '<REST>', 'duration': 1.0}, 2: {'name': '<CHORD>', 'notes': ['C3', 'E3', 'G2', 'C4', 'E4', 'C2'],...
You can also print the length of the unique_elems dictionary, which should be 5163.
If you print the freqs dictionary, you will see the following output:
{0: 1, 1: 273, 2: 1, 3: 1, 4: 1175, 5: 1, 6: 1, 7: 509, 8: 1,...
The freqs dictionary is not used in the setup of the HMM. That being said, you can easily write a function to output the top n unique elements to better understand the structure of this selection of music. It is left as an exercise for you to implement such a function; it should output the following top 10 unique elements:
# Top 10 musical elements expressed as a tuple (element, frequency)
({'name': '<REST>', 'duration': 0.25}, 1175)
({'name': '<REST>', 'duration': 0.08}, 509)
({'name': '<REST>', 'duration': 0.33}, 437)
({'name': '<REST>', 'duration': 0.5}, 430)
({'name': '<REST>', 'duration': 0.17}, 395)
({'name': '<REST>', 'duration': 0.75}, 311)
({'name': '<REST>', 'duration': 1.0}, 273)
({'name': '<REST>', 'duration': 0.67}, 229)
({'name': '<REST>', 'duration': 0.42}, 203)
({'name': '<REST>', 'duration': 1.25}, 170)
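One possible implementation of that exercise is sketched below. The toy dictionaries stand in for the real unique_elems and freqs produced in Cell 10, so the printed values here are illustrative rather than the real top-10 output:

```python
# A sketch of the exercise: rank elements by frequency and pair each key's
# count with the element's properties from unique_elems.
def top_n_elements(unique_elems: dict, freqs: dict, n: int = 10) -> list:
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    return [(unique_elems[key], count) for key, count in ranked[:n]]

# Toy data standing in for the real dictionaries from Cell 10:
toy_unique_elems = {0: {"name": "<REST>", "duration": 0.25},
                    1: {"name": "<NOTE>", "pitch": "G5", "duration": 0.5},
                    2: {"name": "<REST>", "duration": 0.08}}
toy_freqs = {0: 1175, 1: 3, 2: 509}
for element, count in top_n_elements(toy_unique_elems, toy_freqs, n=2):
    print(element, count)
```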
Of the 5163 unique elements, there are 44 unique <REST> elements, 1156 unique <NOTE> elements, and 3963 unique <CHORD> elements.

We are now ready to start setting up the HMM. In this step, we will calculate an initial probability distribution matrix, I. The matrix tells us the probability of starting a musical sequence with a particular element from unique_elems. The shape of the initial probability distribution matrix is (1, 5163). The eleventh cell defines a function to generate I. Set the eleventh cell to:
### CELL 11: Utility function to generate initial probability distribution ###
def calculate_initial_probability_distribution(elements_by_measure_by_score: dict, unique_elems: dict) -> np.ndarray:
    num_elements = len(unique_elems)
    I = np.zeros((num_elements))
    count = 0
    for score in elements_by_measure_by_score:
        for measure in elements_by_measure_by_score[score]:
            first_element = elements_by_measure_by_score[score][measure][0]
            first_element_key = list(unique_elems.values()).index(first_element)
            I[first_element_key] += 1
            count += 1
    I = I/count
    return I
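For the curious, the same counting can be done in a single vectorized call (my own variant, not the notebook's). Given the integer key of the first element of every measure, np.bincount tallies them all at once; the toy data below stands in for the real keys derived from unique_elems:

```python
import numpy as np

# Vectorized equivalent of the counting loop above, on toy data: the key of
# the first element of each of 5 hypothetical measures, over a 5-element vocabulary.
first_element_keys = [0, 1, 1, 3, 1]
num_elements = 5

I_alt = np.bincount(first_element_keys, minlength=num_elements) / len(first_element_keys)
print(I_alt.sum())  # 1.0, as any probability distribution should be
```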
The twelfth cell runs the calculate_initial_probability_distribution function defined in Step 2.11. Set the twelfth cell to:
### CELL 12: Generating initial probability distribution ###
I = calculate_initial_probability_distribution(elements_by_measure_by_score, unique_elems)
The sum of the initial probabilities should be 1.0, which you can check by calling the sum method on I:
print(I.sum())
# Should output 1.0
Next, we need to calculate the transition probability matrix, A. The transition probability matrix tells us the probability of "moving" from one particular musical element to another musical element. The shape of the transition probability matrix is (5163, 5163). The thirteenth cell defines a utility function to generate A. Set the thirteenth cell to:
### CELL 13: Utility function to generate transition probability matrix ###
def calculate_transition_probability_matrix(elements_by_measure_by_score: dict, unique_elems: dict) -> np.ndarray:
    num_elements = len(unique_elems)
    A = np.zeros((num_elements, num_elements), dtype = "float32")
    for key in range(0, len(unique_elems)):
        unique_element = unique_elems[key]
        count = 0
        for score in elements_by_measure_by_score:
            for m_index in range(len(elements_by_measure_by_score[score])):
                elements = elements_by_measure_by_score[score][m_index]
                for e_index in range(len(elements)):
                    element = elements[e_index]
                    if element == unique_element:
                        count += 1
                        if e_index < len(elements) - 1:
                            next_element = elements[e_index + 1]
                        elif m_index < len(elements_by_measure_by_score[score]) - 1:
                            next_element = elements_by_measure_by_score[score][m_index + 1][0]
                        else:
                            # The last element of the last measure of every score won't have a
                            # succeeding musical element. Arbitrarily assign `next_element` to a half rest.
                            next_element = {'name': '<REST>', 'duration': 2.0}
                        next_key = list(unique_elems.values()).index(next_element)
                        A[key, next_key] += 1
        A[key, :] = A[key, :]/count
    return A
The fourteenth cell runs the calculate_transition_probability_matrix function defined in Step 2.13. Set the fourteenth cell to:
### CELL 14: Generating transition probability matrix ###
A = calculate_transition_probability_matrix(elements_by_measure_by_score, unique_elems)
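A quick way to verify the transition matrix is well-formed is a NumPy one-liner, shown here on a toy 2x2 matrix; the same check applies to the real A built in Cell 14:

```python
import numpy as np

# Well-formedness check for a transition matrix: every row must be a
# probability distribution. Toy 2x2 example; run the same check on the real A.
A_toy = np.array([[0.25, 0.75],
                  [1.00, 0.00]], dtype="float32")
assert np.allclose(A_toy.sum(axis=1), 1.0)
print("all rows sum to 1")
```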
Each row of A should sum to 1.0. This can be verified by calling the sum() method on each row.

The emission probability matrix, B, gives us the last set of probabilities that we need to set up the HMM. The emission probability matrix tells us the probability that a given musical element "emits" a given observation. Remember from Step 2.2 that we only have 3 possible observation values: <REST>, <NOTE>, and <CHORD>. So, if a musical element is a rest, its probability of emitting the observation <REST> is 1. The same logic applies to notes and chords. Because we build B with one column per position of the 15-element observation sequence, its shape is (5163, 15). The fifteenth cell defines a utility function to generate the emission probability matrix. Set the fifteenth cell to:
### CELL 15: Utility function to generate emission probability matrix ###
def calculate_emission_probability_matrix(unique_elems: dict, OBS: list) -> np.ndarray:
    num_elements = len(unique_elems)
    B = np.zeros((num_elements, len(OBS)), dtype = "float32")
    for key in unique_elems:
        emitting_element = unique_elems[key]
        for o in range(len(OBS)):
            observation = OBS[o]
            if emitting_element["name"] == observation:
                prob = 1
            else:
                prob = 0
            B[key, o] = prob
    return B
The sixteenth cell runs the calculate_emission_probability_matrix function defined in Step 2.15. For this preliminary experiment, we'll use the observation sequence specified by the USE_OBS constant from Step 2.2. Set the sixteenth cell to:
### CELL 16: Generating emission probability matrix ###
B = calculate_emission_probability_matrix(unique_elems, USE_OBS)
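To see the one-hot emission behavior concretely, here is a minimal, self-contained sketch using a toy element dictionary and a short observation sequence (the element names mirror the guide's <REST>/<NOTE>/<CHORD> convention; the specific pitches and durations are made up for illustration):

```python
import numpy as np

# Toy stand-ins for the real unique_elems dictionary and USE_OBS sequence.
toy_elems = {
    0: {"name": "<REST>", "duration": 1.0},
    1: {"name": "<NOTE>", "pitch": "C4", "duration": 1.0},
    2: {"name": "<CHORD>", "notes": ["C4", "E4"], "duration": 1.0},
}
toy_obs = ["<NOTE>", "<NOTE>", "<REST>", "<CHORD>"]

# Same logic as calculate_emission_probability_matrix: probability 1 when
# the element's kind matches the observation at that time step, else 0.
B_toy = np.zeros((len(toy_elems), len(toy_obs)), dtype="float32")
for key, elem in toy_elems.items():
    for o, observation in enumerate(toy_obs):
        B_toy[key, o] = 1.0 if elem["name"] == observation else 0.0

print(B_toy)
# Row 1 (the note) is 1 in the first two columns, row 0 (the rest) in the
# third column, and row 2 (the chord) in the last column.
```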
Now that we have the initial probability distribution matrix, the transition probability matrix, and the emission probability matrix, we can calculate the Viterbi lattice for the observation sequence chosen in Step 2.16. Simultaneously, we will generate the backpointer matrix to read out the best (i.e. highest probability) path from the lattice. Both the Viterbi lattice and the backpointer matrix will have shape (5163, 15). The seventeenth cell defines a utility function to generate the Viterbi lattice and the backpointer matrix. Set the seventeenth cell to:
### CELL 17: Utility function to generate Viterbi lattice and backpointer matrix ###
def viterbi(OBS, unique_elems, I, A, B):
    states = list(unique_elems.keys())
    S = len(states)
    T = len(OBS)
    viterbi_lattice = np.zeros((S, T))
    backpointer_matrix = np.zeros((S, T))
    # Initialize the first column of the lattice from the initial distribution.
    for state in range(S):
        prob = I[state] * B[state][0]
        viterbi_lattice[state][0] = prob
        backpointer_matrix[state][0] = 0
    # Fill in the remaining columns one time step at a time.
    for t in range(1, T):
        for state in range(S):
            max_prob = 0
            argmax = 0
            for state_prior in range(S):
                prob = viterbi_lattice[state_prior][t - 1] * A[state_prior][state] * B[state][t]
                if prob > max_prob:
                    max_prob = prob
                    argmax = state_prior
            viterbi_lattice[state][t] = max_prob
            backpointer_matrix[state][t] = argmax
    return viterbi_lattice, backpointer_matrix
The eighteenth cell runs the viterbi function defined in Step 2.17. Set the eighteenth cell to:
### CELL 18: Generating Viterbi lattice and backpointer matrix ###
viterbi_lattice, backpointer_matrix = viterbi(USE_OBS, unique_elems, I, A, B)
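To convince yourself that the recurrence behaves sensibly, you can run the same logic on a tiny two-state example (all numbers below are illustrative; B_toy is laid out per time step, with shape (num_states, T), just like the matrix produced in Step 2.16):

```python
import numpy as np

I_toy = np.array([0.6, 0.4])        # initial distribution
A_toy = np.array([[0.7, 0.3],       # transition probabilities
                  [0.4, 0.6]])
B_toy = np.array([[0.9, 0.2, 0.8],  # emission probability per
                  [0.1, 0.8, 0.2]])  # state and time step
S, T = B_toy.shape

lattice = np.zeros((S, T))
backptr = np.zeros((S, T), dtype=int)
lattice[:, 0] = I_toy * B_toy[:, 0]  # initialization step
for t in range(1, T):
    for s in range(S):
        # Score of reaching state s at time t from each prior state.
        scores = lattice[:, t - 1] * A_toy[:, s] * B_toy[s, t]
        backptr[s, t] = int(np.argmax(scores))
        lattice[s, t] = scores[backptr[s, t]]

print(lattice[:, 0])  # [0.54 0.04]
```

The vectorized inner computation is equivalent to the cell's innermost state_prior loop; it just lets NumPy find the maximizing prior state.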
We can read the best (i.e. highest probability) path using the backpointer matrix generated in Step 2.18. The nineteenth cell defines a utility function to read out the highest probability path from the Viterbi lattice. Set the nineteenth cell to:
### CELL 19: Utility function to read out best path from Viterbi lattice and backpointer matrix ###
def read_best_path(OBS, unique_elems, viterbi_lattice, backpointer_matrix):
    elements = []
    # Get musical elements for time steps len(OBS) - 1 down to 1, starting from the back.
    for t in range(len(OBS) - 1, 0, -1):
        max_val = 0
        argmax = 0
        for s in range(viterbi_lattice.shape[0]):
            val = viterbi_lattice[s][t]
            if val > max_val:
                max_val = val
                argmax = s
        elem_key = backpointer_matrix[argmax][t]
        elem = unique_elems[elem_key]
        elements.append(elem)
    # Get the state for the first observation.
    max_start = 0
    argmax_start = 0
    for s in range(viterbi_lattice.shape[0]):
        val = viterbi_lattice[s][0]
        if val > max_start:
            max_start = val
            argmax_start = s
    start_elem = unique_elems[argmax_start]
    elements.append(start_elem)
    elements.reverse()
    return elements
The twentieth cell runs the read_best_path function defined in Step 2.19. Set the twentieth cell to:
### CELL 20: Reading best path ###
elements = read_best_path(USE_OBS, unique_elems, viterbi_lattice, backpointer_matrix)
Now that we have our predicted musical elements from Step 2.20, we can generate a MIDI stream from them using the music21 stream.Stream class. The first step is to iterate over each predicted element and create an instance of the analogous music21 Rest, Note, or Chord object. These music21 objects are appended to the music list. The stream.Stream constructor is then called on the music list to generate the MIDI output that we need. The twenty-first cell defines a utility function to generate the MIDI stream. Set the twenty-first cell to:
### CELL 21: Utility function to generate MIDI stream from generated musical elements ###
def generate_midi_stream(elements: list) -> Tuple[list, stream.Stream]:
    music = []
    for element in elements:
        # Create the analogous music21 object for each predicted element.
        if element["name"] == "<REST>":
            m = note.Rest()
        elif element["name"] == "<CHORD>":
            m = chord.Chord(element["notes"])
        elif element["name"] == "<NOTE>":
            m = note.Note(element["pitch"])
        m.duration.quarterLength = element["duration"]
        music.append(m)
    midi_stream = stream.Stream(music)
    return music, midi_stream
The twenty-second cell runs the generate_midi_stream function defined in Step 2.21. Set the twenty-second cell to:
### CELL 22: Generating MIDI stream from generated musical elements ###
music, midi_stream = generate_midi_stream(elements)
If you print the returned music list, you should see the following output:
[<music21.note.Rest 16th>, <music21.note.Rest 16th>, <music21.note.Note C>, <music21.chord.Chord E4 B-3>, <music21.note.Note B->, <music21.chord.Chord D2 E2>, <music21.note.Note F#>, <music21.note.Note G#>, <music21.note.Rest 217/100ql>, <music21.chord.Chord C2 B4>, <music21.note.Rest 52/25ql>, <music21.chord.Chord E5 A4>, <music21.note.Note G#>, <music21.note.Note D>, <music21.note.Note C>]
If you print the returned midi_stream object, you should see output similar to:
<music21.stream.Stream 0x2a5ec64d720>
We can easily export our MIDI stream to a file using the music21 write method. The twenty-third cell exports the MIDI data. Set the twenty-third cell to:
### CELL 23: Writing MIDI data to disk ###
FILENAME = "hmm_music_exp_1_using_schubert_fantasie.mid"
midi_stream.write("midi", FILENAME)
Obviously, you can set FILENAME to whatever you would like. I opted to use the format:
hmm_music_exp_[EXPERIMENT NUMBER]_using_[DESCRIPTION OF INPUT DATA].mid
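If you prefer to assemble that filename programmatically, a minimal sketch (the experiment number and input description below are hypothetical placeholders):

```python
# Hypothetical values -- substitute your own experiment metadata.
experiment_number = 2
input_description = "bach_preludes"
FILENAME = f"hmm_music_exp_{experiment_number}_using_{input_description}.mid"
print(FILENAME)  # hmm_music_exp_2_using_bach_preludes.mid
```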
To run additional experiments, simply: (1) update the USE_OBS constant to point to the observation sequence that you would like to use; and (2) update the FILENAME constant in Step 2.23, assuming that you would like to save newly generated music to a different MIDI file.

I ran the same experiments (i.e. using the same 4 observation sequences) outlined in this Part 1 guide and uploaded the results to Kaggle. You can access the experimental results via the Music Generation with GiantMIDI-Piano dataset, specifically within the hmm_experiments folder. The hmm_experiments folder contains four sub-folders that include the results of the four experiments using a Hidden Markov Model (HMM) for music generation. Each sub-folder includes a:
- Setup .txt file: Details the MIDI scores used as input data for the HMM, along with the observation sequence used to generate the music sequence output.
- Output .mid file: The MIDI representation of the generated music for the particular experiment.
- Output .wav file: The audio representation of the generated music for the particular experiment. A piano synthesizer was used to generate the audio outputs.
I will describe how you can listen to and interact with your generated MIDI files in the Part 2 guide. However, if you would like a preview, you can listen to the .wav files in the four experiment sub-folders. These .wav files were generated using a commercial piano synthesizer (i.e. a synthesizer that I own). Your experimental results should be exactly the same, assuming you followed the guide as written.
Congratulations on completing this Part 1 guide. In the Part 2 guide, you will use Signal Online MIDI Editor to listen to and interact with your generated music. Please do a search on my HackerNoon username to find the Part 2 guide. See you in Part 2!