
Cocktail Alchemy: Creating New Recipes With Transformers

by Pavan Madduru, February 24th, 2023

Transformers have revolutionized natural language processing (NLP) tasks by providing superior performance in language translation, text classification, and sequence modeling.


The transformer architecture is built on a self-attention mechanism, which lets each element in a sequence attend to every other element, and on stacked encoder layers that process the input sequence.
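Concretely, self-attention reduces to scaled dot-product attention. The sketch below is a minimal illustration of that computation; it is not part of the recipe model itself, which uses Keras's built-in MultiHeadAttention layer.

import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = tf.matmul(q, k, transpose_b=True)
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    weights = tf.nn.softmax(scores / tf.math.sqrt(d_k), axis=-1)
    return tf.matmul(weights, v)

# Each of the 5 positions attends to all 5 positions of the same sequence
x = tf.random.uniform((1, 5, 16))  # (batch, seq_len, depth)
print(scaled_dot_product_attention(x, x, x).shape)  # (1, 5, 16)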


This article will demonstrate how to build a transformer model to generate new cocktail recipes. We will use the Cocktail DB dataset, which contains information about thousands of cocktails, including their ingredients and recipes.

Download the Cocktail DB Dataset:

First, we need to download and preprocess the Cocktail DB dataset. We will use the requests and Pandas libraries to accomplish this.

import pandas as pd
import requests

# Query TheCocktailDB API once per first letter and collect the results
url = 'https://www.thecocktaildb.com/api/json/v1/1/search.php?f='
cocktail_df = pd.DataFrame()
for letter in 'abcdefghijklmnopqrstuvwxyz':
    drinks = requests.get(url + letter).json()['drinks']
    if drinks:  # the API returns null for letters with no cocktails
        cocktail_df = pd.concat([cocktail_df, pd.DataFrame(drinks)],
                                ignore_index=True)

Preprocess the Dataset:

cocktail_df = cocktail_df.dropna(subset=['strInstructions'])
cocktail_df = cocktail_df[['strDrink', 'strInstructions', 'strIngredient1',
                           'strIngredient2', 'strIngredient3', 'strIngredient4',
                           'strIngredient5', 'strIngredient6']]
cocktail_df = cocktail_df.fillna('')

Next, we need to tokenize and encode the cocktail recipe instructions with a subword tokenizer.

import tensorflow_datasets as tfds

Define the Tokenizer and Vocabulary Size:

tokenizer = tfds.features.text.SubwordTextEncoder.build_from_corpus(
    (text for text in cocktail_df['strInstructions']),
    target_vocab_size=2**13)

Define the Encoding Function:

def encode(text):
    encoded_text = tokenizer.encode(text)
    return encoded_text

Apply the Encoding Function to the Dataset:

cocktail_df['encoded_recipe'] = cocktail_df['strInstructions'].apply(encode)

Define the Maximum Length of the Recipe:

MAX_LEN = max([len(recipe) for recipe in cocktail_df['encoded_recipe']])

With the tokenized cocktail recipes, we can define the transformer decoder layer. The decoder layer consists of three sub-layers: a masked multi-head self-attention layer, a multi-head attention layer over the encoder output, and a point-wise feed-forward network.


import tensorflow as tf
from tensorflow.keras.layers import LayerNormalization, MultiHeadAttention, Dense

class TransformerDecoderLayer(tf.keras.layers.Layer):
    def __init__(self, num_heads, d_model, dff, rate=0.1):
        super(TransformerDecoderLayer, self).__init__()

        self.mha1 = MultiHeadAttention(num_heads, d_model)  # masked self-attention
        self.mha2 = MultiHeadAttention(num_heads, d_model)  # attention over encoder output
        self.ffn = tf.keras.Sequential([
            Dense(dff, activation='relu'),
            Dense(d_model)
        ])
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.layernorm3 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.dropout2 = tf.keras.layers.Dropout(rate)
        self.dropout3 = tf.keras.layers.Dropout(rate)

    def call(self, x, enc_output, training, look_ahead_mask):
        # Masked self-attention over the decoder input
        attn1 = self.mha1(x, x, x, look_ahead_mask)
        attn1 = self.dropout1(attn1, training=training)
        out1 = self.layernorm1(x + attn1)
        # Attention over the encoder output: queries come from the decoder,
        # keys and values from the encoder
        attn2 = self.mha2(out1, enc_output, enc_output)
        attn2 = self.dropout2(attn2, training=training)
        out2 = self.layernorm2(out1 + attn2)
        # Point-wise feed-forward network with a residual connection
        ffn_output = self.ffn(out2)
        ffn_output = self.dropout3(ffn_output, training=training)
        out3 = self.layernorm3(out2 + ffn_output)
        return out3


In the code above, the TransformerDecoderLayer class takes four arguments: the number of heads for the masked multi-head attention layer, the dimension of the model, the number of units in the point-wise feed-forward layer, and the dropout rate.
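As a quick sanity check (a hypothetical smoke test, not part of the original walkthrough), a single decoder layer can be run on random activations to confirm that it preserves the (batch, seq_len, d_model) shape:

layer = TransformerDecoderLayer(num_heads=8, d_model=256, dff=1024)
dummy = tf.random.uniform((2, 10, 256))  # (batch, seq_len, d_model)
mask = tf.linalg.band_part(tf.ones((10, 10)), -1, 0)  # lower-triangular mask
out = layer(dummy, dummy, training=False, look_ahead_mask=mask)
print(out.shape)  # (2, 10, 256)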


The call method defines the forward pass of the decoder layer, where x is the input sequence, enc_output is the output of the encoder, training is a Boolean flag that indicates whether the model is in training or inference mode, and look_ahead_mask is a mask that prevents the decoder from attending to future tokens.
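To make the look-ahead mask concrete, here is what it looks like for a toy sequence of length 4. Keras's MultiHeadAttention treats 1 as "may attend" and 0 as "masked", so each position only sees itself and earlier positions:

mask = tf.linalg.band_part(tf.ones((4, 4)), -1, 0)
print(mask.numpy())
# [[1. 0. 0. 0.]
#  [1. 1. 0. 0.]
#  [1. 1. 1. 0.]
#  [1. 1. 1. 1.]]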


We can now define the transformer model, which consists of multiple stacked transformer decoder layers followed by a Dense layer that maps the decoder output to the vocabulary size.


from tensorflow.keras.layers import Input

Define the Input Layer:

input_layer = Input(shape=(MAX_LEN,))

Define the Transformer Decoder Layers:

NUM_LAYERS = 4
NUM_HEADS = 8
D_MODEL = 256
DFF = 1024
DROPOUT_RATE = 0.1

decoder_layers = [TransformerDecoderLayer(NUM_HEADS, D_MODEL, DFF, DROPOUT_RATE) for _ in range(NUM_LAYERS)]

Define the Output Layer:

output_layer = Dense(tokenizer.vocab_size)

Connect the Layers:

# Embed the token ids into D_MODEL-dimensional vectors; the attention
# layers expect vectors, not raw integer ids
embedding = tf.keras.layers.Embedding(tokenizer.vocab_size, D_MODEL)

x = embedding(input_layer)
look_ahead_mask = tf.linalg.band_part(tf.ones((MAX_LEN, MAX_LEN)), -1, 0)
for decoder_layer in decoder_layers:
    # The model is decoder-only, so the sequence attends over itself
    x = decoder_layer(x, x, True, look_ahead_mask)
output = output_layer(x)

Define the Model:

model = tf.keras.models.Model(inputs=input_layer, outputs=output)


In the code above, we define the input layer to accept the padded sequences with a length of MAX_LEN, embed the token ids into D_MODEL-dimensional vectors, and then define the transformer decoder layers by creating a list of TransformerDecoderLayer objects stacked together to process the sequence.


The output of the last transformer decoder layer is passed through a Dense layer whose width equals the tokenizer's vocabulary size. We can train the model using an Adam optimizer and evaluate its performance after a certain number of epochs.

Define the Loss Function:

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

def loss_function(real, pred):
    # Ignore padded positions (token id 0) when computing the loss
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)

    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask

    return tf.reduce_mean(loss_)

Define the Learning Rate Schedule:

class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super(CustomSchedule, self).__init__()

        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        # The schedule from "Attention Is All You Need": a linear warmup for
        # warmup_steps, then decay with the inverse square root of the step
        arg1 = tf.math.rsqrt(step)
        arg2 = step * (self.warmup_steps ** -1.5)
        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)
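As an illustrative check (not in the original article), evaluating the schedule at a few steps shows the learning rate ramping up during warmup and then decaying:

schedule = CustomSchedule(D_MODEL)
for step in [100.0, 4000.0, 40000.0]:
    print(int(step), float(schedule(tf.constant(step))))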

Define the Optimizer:

LR = CustomSchedule(D_MODEL)
optimizer = tf.keras.optimizers.Adam(LR, beta_1=0.9, beta_2=0.98, epsilon=1e-9)

Define the Accuracy Metric:

train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

Define the Training Step Function:

@tf.function
def train_step(inp, tar):
    # The model predicts the next token, so the targets are the inputs
    # shifted one position to the left
    tar_real = tar[:, 1:]

    with tf.GradientTape() as tape:
        # The look-ahead mask is already baked into the model graph, so only
        # the inputs and the training flag are passed here
        predictions = model(inp, training=True)
        # Position t of the output predicts token t + 1 of the input
        loss = loss_function(tar_real, predictions[:, :-1, :])

    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_accuracy.update_state(tar_real, predictions[:, :-1, :])

    return loss

Train the Model:

EPOCHS = 50
BATCH_SIZE = 64
NUM_EXAMPLES = len(cocktail_df)

for epoch in range(EPOCHS):
    print('Epoch', epoch + 1)
    total_loss = 0

    for i in range(0, NUM_EXAMPLES, BATCH_SIZE):
        batch = cocktail_df[i:i + BATCH_SIZE]
        input_batch = tf.keras.preprocessing.sequence.pad_sequences(
            batch['encoded_recipe'], padding='post', maxlen=MAX_LEN)
        target_batch = input_batch

        loss = train_step(input_batch, target_batch)
        total_loss += loss

    print('Loss:', total_loss)
    print('Accuracy:', train_accuracy.result().numpy())
    train_accuracy.reset_states()


Once the model is trained, we can generate new cocktail recipes by feeding the model a seed sequence and iteratively predicting the next token until an end-of-sequence token is generated or the maximum length is reached.


def generate_recipe(seed, max_len):
    encoded_seed = encode(seed)
    for i in range(max_len):
        input_sequence = tf.keras.preprocessing.sequence.pad_sequences(
            [encoded_seed], padding='post', maxlen=MAX_LEN)
        predictions = model(input_sequence, training=False)
        # Sample the next token from the distribution at the last position
        predicted_id = int(tf.random.categorical(
            predictions[:, -1, :], num_samples=1)[0, 0])
        if predicted_id == tokenizer.vocab_size:  # reserved end-of-sequence id
            break
        # Append the sampled token and feed the extended sequence back in
        encoded_seed.append(predicted_id)
    recipe = tokenizer.decode(encoded_seed)
    return recipe
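As an example, with a hypothetical seed string (sampling is random, so every run yields a different recipe):

print(generate_recipe('Shake all ingredients with ice', max_len=50))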


In summary, transformers are a powerful tool for sequence modeling that can be used in a wide range of applications beyond NLP.


By following the steps outlined in this article, it is possible to build a transformer model to generate new cocktail recipes, demonstrating the transformer architecture's flexibility and versatility.

