by Pavan madduru, February 24th, 2023

Transformers have revolutionized natural language processing (NLP) tasks by providing superior performance in language translation, text classification, and sequence modeling.

The transformer architecture is based on a self-attention mechanism that allows each element in a sequence to attend to all other elements and stacked encoders that process the input sequence.
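To make the self-attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the operation at the heart of the mechanism (illustrative only; the model built below relies on Keras's `MultiHeadAttention` layer instead):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)  # pairwise similarity of queries and keys
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Three 4-dimensional token vectors attending to each other (q = k = v)
x = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0]])
out, w = scaled_dot_product_attention(x, x, x)
# each row of w sums to 1: every token distributes its attention over all tokens
```

Each output row is a weighted mixture of all the value vectors, which is exactly what lets every element of a sequence attend to every other element.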

This article will demonstrate how to build a transformer model to generate new cocktail recipes. We will use the Cocktail DB dataset, which contains information about thousands of cocktails, including their ingredients and recipes.

First, we need to download and preprocess the Cocktail DB dataset. We will use the Pandas library to accomplish this.

```
import pandas as pd

# Fetch pages of results from the Cocktail DB API and collect them in a DataFrame
url = 'https://www.thecocktaildb.com/api/json/v1/1/search.php?s='
cocktail_df = pd.DataFrame()
for i in range(1, 26):
    response = pd.read_json(url + str(i))
    cocktail_df = pd.concat([cocktail_df, response['drinks']], ignore_index=True)
```

```
# Keep only rows that have instructions, and only the columns we need
cocktail_df = cocktail_df.dropna(subset=['strInstructions'])
cocktail_df = cocktail_df[['strDrink', 'strInstructions',
                           'strIngredient1', 'strIngredient2', 'strIngredient3',
                           'strIngredient4', 'strIngredient5', 'strIngredient6']]
cocktail_df = cocktail_df.fillna('')
```

Next, we need to tokenize and encode the cocktail recipes using a subword tokenizer from TensorFlow Datasets.

```
import tensorflow_datasets as tfds

# Build a subword vocabulary from the recipe instructions
# (in recent TFDS versions this class lives under tfds.deprecated.text)
tokenizer = tfds.features.text.SubwordTextEncoder.build_from_corpus(
    (text for text in cocktail_df['strInstructions']),
    target_vocab_size=2**13)
```

```
def encode(text):
    return tokenizer.encode(text)
```

```
cocktail_df['encoded_recipe'] = cocktail_df['strInstructions'].apply(encode)
```

`MAX_LEN = max([len(recipe) for recipe in cocktail_df['encoded_recipe']])`

With the tokenized cocktail recipes, we can define the transformer decoder layer. The decoder layer consists of three sub-layers: a masked multi-head self-attention layer, a multi-head cross-attention layer over the encoder output, and a point-wise feed-forward layer.

```
import tensorflow as tf
from tensorflow.keras.layers import LayerNormalization, MultiHeadAttention, Dense

class TransformerDecoderLayer(tf.keras.layers.Layer):
    def __init__(self, num_heads, d_model, dff, rate=0.1):
        super(TransformerDecoderLayer, self).__init__()
        self.mha1 = MultiHeadAttention(num_heads, d_model)
        self.mha2 = MultiHeadAttention(num_heads, d_model)
        self.ffn = tf.keras.Sequential([
            Dense(dff, activation='relu'),
            Dense(d_model)
        ])
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.layernorm3 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = tf.keras.layers.Dropout(rate)
        self.dropout2 = tf.keras.layers.Dropout(rate)
        self.dropout3 = tf.keras.layers.Dropout(rate)

    def call(self, x, enc_output, training, look_ahead_mask):
        # Masked self-attention: each position attends only to itself and earlier positions
        attn1 = self.mha1(x, x, x, attention_mask=look_ahead_mask)
        attn1 = self.dropout1(attn1, training=training)
        out1 = self.layernorm1(x + attn1)
        # Cross-attention over the encoder output (query=out1, value=key=enc_output)
        attn2 = self.mha2(out1, enc_output, enc_output)
        attn2 = self.dropout2(attn2, training=training)
        out2 = self.layernorm2(out1 + attn2)
        # Point-wise feed-forward network with a residual connection
        ffn_output = self.ffn(out2)
        ffn_output = self.dropout3(ffn_output, training=training)
        out3 = self.layernorm3(out2 + ffn_output)
        return out3
```

In the code above, the TransformerDecoderLayer class takes four arguments: the number of heads for the masked multi-head attention layer, the dimension of the model, the number of units in the point-wise feed-forward layer, and the dropout rate.

The call method defines the forward pass of the decoder layer, where x is the input sequence, enc_output is the output of the encoder, training is a Boolean flag that indicates whether the model is in training or inference mode, and look_ahead_mask is a mask that prevents the decoder from attending to future tokens.
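The effect of the look-ahead mask is easy to see on a toy example. This NumPy sketch builds the same lower-triangular matrix that `tf.linalg.band_part(tf.ones((n, n)), -1, 0)` produces in the model definition below:

```python
import numpy as np

seq_len = 4
# Lower-triangular ones: row i has ones in columns 0..i and zeros afterwards,
# so token i may attend to itself and earlier tokens but never to future ones
look_ahead_mask = np.tril(np.ones((seq_len, seq_len)))
print(look_ahead_mask)
```

Positions where the mask is zero are ignored by the attention layer, which is what forces the decoder to generate text strictly left to right.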

We can now define the transformer model, which consists of multiple stacked transformer decoder layers followed by a Dense layer that maps the decoder output to the vocabulary size.

```
from tensorflow.keras.layers import Input

# Hyperparameters
NUM_LAYERS = 4
NUM_HEADS = 8
D_MODEL = 256
DFF = 1024
DROPOUT_RATE = 0.1

# Token IDs in; an embedding layer turns them into D_MODEL-dimensional
# vectors, which is what the decoder layers expect
input_layer = Input(shape=(MAX_LEN,))
x = tf.keras.layers.Embedding(tokenizer.vocab_size, D_MODEL)(input_layer)

decoder_layers = [TransformerDecoderLayer(NUM_HEADS, D_MODEL, DFF, DROPOUT_RATE)
                  for _ in range(NUM_LAYERS)]

output_layer = Dense(tokenizer.vocab_size)

# Lower-triangular mask: 1 where a position may attend, 0 for future positions
look_ahead_mask = tf.linalg.band_part(tf.ones((MAX_LEN, MAX_LEN)), -1, 0)
for decoder_layer in decoder_layers:
    x = decoder_layer(x, x, True, look_ahead_mask)
output = output_layer(x)

model = tf.keras.models.Model(inputs=input_layer, outputs=output)
```

In the code above, we define the input layer to accept the padded sequences with a length of MAX_LEN. We then define the transformer decoder layers by creating a list of TransformerDecoderLayer objects stacked together to process the input sequence.

The output of the last transformer decoder layer is passed through a Dense layer with a vocabulary size corresponding to the number of subwords in the tokenizer. We can train the model using an Adam optimizer and evaluate its performance after a certain number of epochs.

```
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction='none')

def loss_function(real, pred):
    # Ignore padding (token id 0) when averaging the loss
    mask = tf.math.logical_not(tf.math.equal(real, 0))
    loss_ = loss_object(real, pred)
    mask = tf.cast(mask, dtype=loss_.dtype)
    loss_ *= mask
    return tf.reduce_mean(loss_)
```

```
class CustomSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super(CustomSchedule, self).__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        # Schedule from "Attention Is All You Need":
        # lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        step = tf.cast(step, tf.float32)
        arg1 = tf.math.rsqrt(step)
        arg2 = step * (self.warmup_steps ** -1.5)
        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)
```

```
LR = CustomSchedule(D_MODEL)
optimizer = tf.keras.optimizers.Adam(LR, beta_1=0.9, beta_2=0.98, epsilon=1e-9)
```

```
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
```

```
@tf.function
def train_step(inp, tar):
    # Targets are the input shifted one position to the left
    tar_real = tar[:, 1:]
    with tf.GradientTape() as tape:
        # The look-ahead mask is already baked into the model graph above
        predictions = model(inp, training=True)
        # Drop the last timestep so predictions align with the shifted targets
        loss = loss_function(tar_real, predictions[:, :-1, :])
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_accuracy.update_state(tar_real, predictions[:, :-1, :])
    return loss

# Train the model
EPOCHS = 50
BATCH_SIZE = 64
NUM_EXAMPLES = len(cocktail_df)

for epoch in range(EPOCHS):
    print('Epoch', epoch + 1)
    total_loss = 0
    for i in range(0, NUM_EXAMPLES, BATCH_SIZE):
        batch = cocktail_df[i:i+BATCH_SIZE]
        input_batch = tf.keras.preprocessing.sequence.pad_sequences(
            batch['encoded_recipe'], padding='post', maxlen=MAX_LEN)
        target_batch = input_batch
        loss = train_step(input_batch, target_batch)
        total_loss += loss
    print('Loss:', float(total_loss))
    print('Accuracy:', train_accuracy.result().numpy())
    train_accuracy.reset_states()
```

Once the model is trained, we can generate new cocktail recipes by feeding the model a seed sequence and iteratively predicting the next token until an end-of-sequence token is generated.

```
def generate_recipe(seed, max_len):
    encoded_seed = encode(seed)
    for i in range(max_len):
        input_sequence = tf.keras.preprocessing.sequence.pad_sequences(
            [encoded_seed], padding='post', maxlen=MAX_LEN)
        predictions = model(input_sequence, training=False)
        # Sample the next token from the distribution at the last real position
        logits = predictions[:, len(encoded_seed) - 1, :]
        predicted_id = int(tf.squeeze(tf.random.categorical(logits, num_samples=1)).numpy())
        if predicted_id == tokenizer.vocab_size:
            break
        # Append the new token and feed the extended sequence back in
        encoded_seed.append(predicted_id)
    recipe = tokenizer.decode(encoded_seed)
    return recipe
```

In summary, transformers are a powerful tool for sequence modeling that can be used in a wide range of applications beyond NLP.

By following the steps outlined in this article, it is possible to build a transformer model to generate new cocktail recipes, demonstrating the transformer architecture's flexibility and versatility.

**References**

- The Cocktail DB dataset can be accessed and downloaded from the website
https://www.thecocktaildb.com/ - The Pandas library is part of the Python programming language and can be found in its official documentation:
https://pandas.pydata.org/docs/ - TensorFlow is an open-source machine learning library for Python, and its official documentation can be found at
https://www.tensorflow.org/api_docs - The Keras API is integrated with TensorFlow, and its documentation can be found at
https://keras.io/api/ - The Transformer architecture and its application in natural language processing are discussed in the paper "Attention Is All You Need" by Vaswani et al. (2017), available at
https://arxiv.org/abs/1706.03762
