Learn how to generate lyrics using a deep (multi-layer) LSTM in this article by Matthew Lamons, founder and CEO of Skejul, the AI platform that helps people manage their activities, and Rahul Kumar, an AI scientist, deep learning practitioner, and independent researcher.
This article will show you how to create a deep LSTM model suited to the task of generating music lyrics. Here’s your goal: to build and train a model that outputs entirely new and original lyrics in the style of an arbitrary number of artists. You can refer to the code files for this exercise at Lyrics-ai (https://github.com/PacktPublishing/Python-Deep-Learning-Projects/tree/master/Chapter06/Lyrics-ai).
To build a model that can generate lyrics, you’ll need a large amount of lyric data, which can easily be extracted from various sources. The repository linked above contains a text file called lyrics_data.txt, which includes the lyrics of around 10,000 songs.
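If you're unfamiliar with one-hot encoding at the character level, here is a tiny illustration before diving into the full preprocessing module (purely illustrative code, not part of the project):

import numpy as np

# Tiny illustration of character-level one-hot encoding (not project code)
text = "aba"
vocab = {ch: i for i, ch in enumerate(sorted(set(text)))}   # {'a': 0, 'b': 1}
one_hot = np.eye(len(vocab))[[vocab[ch] for ch in text]]
print(one_hot)   # [[1. 0.] [0. 1.] [1. 0.]] -- one row per character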
Now that you have your data, convert this raw text into a one-hot encoded representation:
import numpy as np
import codecs

# Class to perform all preprocessing operations
class Preprocessing:
    vocabulary = {}
    binary_vocabulary = {}
    char_lookup = {}
    size = 0
    separator = '->'

    # This will take the data file, convert the data into one-hot encoding,
    # and dump the vocabulary into a file.
    def generate(self, input_file_path):
        input_file = codecs.open(input_file_path, 'r', 'utf_8')
        index = 0
        for line in input_file:
            for char in line:
                if char not in self.vocabulary:
                    self.vocabulary[char] = index
                    self.char_lookup[index] = char
                    index += 1
        input_file.close()
        self.set_vocabulary_size()
        self.create_binary_representation()

    # This method loads the vocabulary back into memory
    def retrieve(self, input_file_path):
        input_file = codecs.open(input_file_path, 'r', 'utf_8')
        buffer = ""
        for line in input_file:
            try:
                separator_position = len(buffer) + line.index(self.separator)
                buffer += line
                key = buffer[:separator_position]
                value = buffer[separator_position + len(self.separator):]
                value = np.fromstring(value, sep=',')
                self.binary_vocabulary[key] = value
                self.vocabulary[key] = np.where(value == 1)[0][0]
                self.char_lookup[np.where(value == 1)[0][0]] = key
                buffer = ""
            except ValueError:
                buffer += line
        input_file.close()
        self.set_vocabulary_size()

    # Below are some helper functions to perform pre-processing.
    def create_binary_representation(self):
        for key, value in self.vocabulary.items():
            binary = np.zeros(self.size)
            binary[value] = 1
            self.binary_vocabulary[key] = binary

    def set_vocabulary_size(self):
        self.size = len(self.vocabulary)
        print("Vocabulary size: {}".format(self.size))

    def get_serialized_binary_representation(self):
        string = ""
        np.set_printoptions(threshold=np.inf)
        for key, value in self.binary_vocabulary.items():
            array_as_string = np.array2string(value, separator=',', max_line_width=self.size * self.size)
            string += "{}{}{}\n".format(key, self.separator, array_as_string[1:len(array_as_string) - 1])
        return string
The overall objective of the pre-processing module is to convert the raw text data into one-hot encoding, as shown in the following diagram:

The data preprocessing pipeline: the raw lyrics data is used to build the vocabulary mapping, which is then transformed into one-hot encoding

After the successful execution of the pre-processing module, a binary file will be dumped as {dataset_filename}.vocab. This vocab file is one of the mandatory files that need to be fed into the model during the training process, along with the dataset.
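For reference, here is a minimal sketch of how you might drive this class to produce the vocab file; the driver below and the exact output filename follow the {dataset_filename}.vocab convention described above, but are assumptions, and the repository's own scripts may differ:

import codecs

# Build the vocabulary from the raw lyrics and dump it next to the dataset
preprocessing = Preprocessing()
preprocessing.generate('lyrics_data.txt')
with codecs.open('lyrics_data.txt.vocab', 'w', 'utf_8') as vocab_file:
    vocab_file.write(preprocessing.get_serialized_binary_representation())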
Unlike a high-level Keras model, this article uses TensorFlow to write each layer from scratch, which gives you finer-grained control over your model’s architecture. For this model, use the code in the following block to create two placeholders that will store the input and output values:
import tensorflow as tf
import pickle
from tensorflow.contrib import rnn

def build(self, input_number, sequence_length, layers_number, units_number, output_number):
    # Placeholders for the input sequences and the expected outputs
    self.x = tf.placeholder("float", [None, sequence_length, input_number])
    self.y = tf.placeholder("float", [None, output_number])
    self.sequence_length = sequence_length
Next, store the weights and biases in the variables that you’ve created:
    self.weights = {
        'out': tf.Variable(tf.random_normal([units_number, output_number]))
    }
    self.biases = {
        'out': tf.Variable(tf.random_normal([output_number]))
    }

    # Reshape the [batch, time, features] input into a time-major list of
    # [batch, features] tensors, the format expected by rnn.static_rnn
    x = tf.transpose(self.x, [1, 0, 2])
    x = tf.reshape(x, [-1, input_number])
    x = tf.split(x, sequence_length, 0)
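To see concretely what this transpose/reshape/split sequence does, here is a small NumPy analogue (illustrative only, not part of the project code):

import numpy as np

# Toy check of the reshape logic: [batch, time, features] -> time-major list
batch, time_steps, features = 2, 3, 4
x = np.arange(batch * time_steps * features).reshape(batch, time_steps, features)
x = np.transpose(x, (1, 0, 2))        # [time, batch, features]
x = x.reshape(-1, features)           # [time * batch, features]
steps = np.split(x, time_steps, 0)    # list of `time` arrays of [batch, features]
assert len(steps) == time_steps
assert steps[0].shape == (batch, features)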
You can build this model using multiple LSTM layers, stacking basic LSTM cells with the specified number of units in each layer, as shown in the following diagram:

TensorBoard visualization of the LSTM architecture
The following is the code for this:
    # Stack the requested number of LSTM layers into a single deep cell
    lstm_layers = []
    for i in range(0, layers_number):
        lstm_layer = rnn.BasicLSTMCell(units_number)
        lstm_layers.append(lstm_layer)
    deep_lstm = rnn.MultiRNNCell(lstm_layers)

    self.outputs, states = rnn.static_rnn(deep_lstm, x, dtype=tf.float32)

    print("Build model with input_number: {}, sequence_length: {}, layers_number: {}, "
          "units_number: {}, output_number: {}".format(input_number, sequence_length, layers_number,
                                                       units_number, output_number))

    # Dump the model configuration so that it can be restored at inference time
    self.save(input_number, sequence_length, layers_number, units_number, output_number)
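The weights and biases defined above are used to project the output of the last LSTM time step onto the output classes. That projection is not shown in the excerpt; here is a minimal sketch of what the get_classifier method called in the training code below might look like (the method name comes from the later code, the body is an assumption; see the repository for the actual implementation):

    def get_classifier(self):
        # Assumed implementation: project the last time step's LSTM output
        # onto the output classes with the 'out' weights and biases
        return tf.matmul(self.outputs[-1], self.weights['out']) + self.biases['out']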
Now that you have the mandatory inputs, that is, the dataset file path, the vocab file path, and the model name, you can initiate the training process. Define all of the hyperparameters for the model:
import os
import argparse
from modules.Model import *
from modules.Batch import *

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--training_file', type=str, required=True)
    parser.add_argument('--vocabulary_file', type=str, required=True)
    parser.add_argument('--model_name', type=str, required=True)
    parser.add_argument('--epoch', type=int, default=200)
    parser.add_argument('--batch_size', type=int, default=50)
    parser.add_argument('--sequence_length', type=int, default=50)
    parser.add_argument('--log_frequency', type=int, default=100)
    parser.add_argument('--learning_rate', type=float, default=0.002)
    parser.add_argument('--units_number', type=int, default=128)
    parser.add_argument('--layers_number', type=int, default=2)
    args = parser.parse_args()

    # Unpack the parsed arguments into the local names used below
    training_file, vocabulary_file = args.training_file, args.vocabulary_file
    model_name = args.model_name
    epoch, batch_size = args.epoch, args.batch_size
    sequence_length, log_frequency = args.sequence_length, args.log_frequency
    learning_rate = args.learning_rate
    units_number, layers_number = args.units_number, args.layers_number
Since the model is trained in batches, divide the dataset into batches of a defined batch_size using the Batch module:
    batch = Batch(training_file, vocabulary_file, batch_size, sequence_length)
Each batch will return two arrays. One will be the input vector of the input sequence, with a shape of [batch_size, sequence_length, vocab_size]; the other will hold the label vector, with a shape of [batch_size, vocab_size].
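The Batch module itself is not listed in this excerpt. As a rough illustration of the contract just described, here is a minimal generator producing the same shapes (hypothetical code; the real module in the repository also tracks epochs via dataset_full_passes and exposes get_next_batch):

import numpy as np

def iterate_batches(text, vocabulary, batch_size, sequence_length):
    """Yield (x, y) pairs: x is [batch_size, sequence_length, vocab_size]
    one-hot input windows, y is [batch_size, vocab_size] next-char labels."""
    size = len(vocabulary)
    identity = np.eye(size)
    encode = lambda ch: identity[vocabulary[ch]]
    inputs, labels = [], []
    for start in range(len(text) - sequence_length):
        window = text[start:start + sequence_length]
        inputs.append([encode(ch) for ch in window])
        labels.append(encode(text[start + sequence_length]))
        if len(inputs) == batch_size:
            yield np.array(inputs), np.array(labels)
            inputs, labels = [], []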
Now, initialize your model and create the optimizer function. This model uses the Adam optimizer. Then, train your model and perform the optimization over each batch:
    # Building the model instance and the classifier; input_number and
    # classes_number both equal the vocabulary size (obtained from the
    # Batch module in the full script)
    model = Model(model_name)
    model.build(input_number, sequence_length, layers_number, units_number, classes_number)
    classifier = model.get_classifier()

    # Building the cost function
    cost = tf.reduce_mean(tf.square(classifier - model.y))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
    # Computing the accuracy metrics
    expected_prediction = tf.equal(tf.argmax(classifier, 1), tf.argmax(model.y, 1))
    accuracy = tf.reduce_mean(tf.cast(expected_prediction, tf.float32))

    # Preparing logs for TensorBoard
    loss_summary = tf.summary.scalar("loss", cost)
    acc_summary = tf.summary.scalar("accuracy", accuracy)
    train_summary_op = tf.summary.merge_all()
    out_dir = "{}/{}".format(model_name, model_name)
    train_summary_dir = os.path.join(out_dir, "summaries")
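    # Sketch (assumed, not shown in the original excerpt): attach a writer so
    # that the merged summaries can actually be written for TensorBoard
    train_summary_writer = tf.summary.FileWriter(train_summary_dir)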
    # Initializing the session and executing the training
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        iteration = 0
        while batch.dataset_full_passes < epoch:
            iteration += 1
            batch_x, batch_y = batch.get_next_batch()
            batch_x = batch_x.reshape((batch_size, sequence_length, input_number))
            sess.run(optimizer, feed_dict={model.x: batch_x, model.y: batch_y})
            if iteration % log_frequency == 0:
                acc = sess.run(accuracy, feed_dict={model.x: batch_x, model.y: batch_y})
                loss = sess.run(cost, feed_dict={model.x: batch_x, model.y: batch_y})
                print("Iteration {}, batch loss: {:.6f}, training accuracy: {:.5f}".format(
                    iteration * batch_size, loss, acc))
        batch.clean()
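The excerpt above ends without showing the checkpoint step that the next paragraph refers to. A minimal sketch of how the trained weights might be persisted at the end of the session (placement and naming are assumptions; the repository's training script handles this itself):

        # Sketch (assumed): save the trained weights so that inference can
        # later restore them with tf.train.get_checkpoint_state(model_name)
        saver = tf.train.Saver()
        saver.save(sess, os.path.join(model_name, model_name))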
Once the model completes its training, the checkpoints are stored, and you can use them later for inference. The following is a graph of the accuracy and the loss recorded during the training process:
The accuracy (top) and the loss (bottom) plotted over training time

You can see that the accuracy increases and the loss decreases over time.
Now that the model is ready, you can use it to make predictions. Start by defining all of the parameters. When building the inference step, you need to provide some seed text, just as you did in the previous model. Along with this, you should also provide the path of the vocab file and an output file in which to store the generated lyrics, as well as the length of the text that you want to generate:
import argparse
import codecs
import numpy as np
import tensorflow as tf
from modules.Model import *
from modules.Preprocessing import *
from collections import deque

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model_name', type=str, required=True)
    parser.add_argument('--vocabulary_file', type=str, required=True)
    parser.add_argument('--output_file', type=str, required=True)
    parser.add_argument('--seed', type=str, default="Yeah, oho ")
    parser.add_argument('--sample_length', type=int, default=1500)
    parser.add_argument('--log_frequency', type=int, default=100)
    args = parser.parse_args()

    # Unpack the parsed arguments into the local names used below
    model_name, vocabulary_file = args.model_name, args.vocabulary_file
    output_file, seed = args.output_file, args.seed
    sample_length, log_frequency = args.sample_length, args.log_frequency
Next, load the model by providing the model name that you used in the training step, and restore the vocabulary from the vocab file:
    model = Model(model_name)
    model.restore()
    classifier = model.get_classifier()

    vocabulary = Preprocessing()
    vocabulary.retrieve(vocabulary_file)
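The restore call above is the counterpart of the save call from the build step. Its implementation isn't shown in this excerpt; a rough, hypothetical sketch of the idea follows (the pickle filename and config layout are assumptions, not the repository's actual code):

    def restore(self):
        # Hypothetical sketch: reload the pickled configuration dumped by
        # save() during training, then rebuild the same graph
        with open("{}.pkl".format(self.model_name), 'rb') as config_file:
            config = pickle.load(config_file)
        self.build(*config)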
Use a deque as a sliding window over the characters: push the seed characters onto it, and then repeatedly feed the same window back into the model in an iterative fashion, appending each newly generated character:
    # Preparing the raw input data: the deque acts as a sliding window of the
    # most recent characters, and the output file collects the sampled text
    stack = deque([])
    sample_file = codecs.open(output_file, 'w', 'utf_8')
    for char in seed:
        if char not in vocabulary.vocabulary:
            print("{} is not in vocabulary file".format(char))
            char = u' '
        stack.append(char)
        sample_file.write(char)
    # Restoring the model and making inferences
    with tf.Session() as sess:
        tf.global_variables_initializer().run()
        saver = tf.train.Saver(tf.global_variables())
        ckpt = tf.train.get_checkpoint_state(model_name)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
            for i in range(0, sample_length):
                vector = []
                for char in stack:
                    vector.append(vocabulary.binary_vocabulary[char])
                vector = np.array([vector])
                prediction = sess.run(classifier, feed_dict={model.x: vector})
                predicted_char = vocabulary.char_lookup[np.argmax(prediction)]

                # Slide the window: drop the oldest character, append the new one
                stack.popleft()
                stack.append(predicted_char)
                sample_file.write(predicted_char)
                if i % log_frequency == 0:
                    print("Progress: {}%".format((i * 100) // sample_length))
    sample_file.close()
    print("Sample saved in {}".format(output_file))
After successful execution, you’ll get your own freshly brewed, AI-generated lyrics. The following is one sample of such lyrics, with some spellings modified so that the sentences make sense:
Yeah, oho once upon a time, on ir intasd
I got monk that wear your good
So heard me down in my clipp
Cure me out brick
Coway got baby, I wanna sheart in faic
I could sink awlrook and heart your all feeling in the firing of to the still hild, gavelly mind, have before you, their lead
Oh, oh shor,s sheld be you und make
Oh, fseh where sufl gone for the runtome
Weaaabe the ligavus I feed themust of hear
Here, you can see that the model has learned to generate paragraphs and sentences with appropriate spacing, but the output still lacks coherence and much of it doesn’t make sense.
Still, these are signs of success: the first task is to create a model that can learn, and the second is to improve on that model. Improvement can be achieved by training the model on a larger dataset and for longer durations.
If you found this article interesting, you can explore Python Deep Learning Projects to master deep learning and neural network architectures using Python and Keras. Python Deep Learning Projects imparts all the knowledge needed to implement complex deep learning projects in the field of computational linguistics and computer vision.