paint-brush
How to Host a Large Language Model Locallyby@thomascherickal
24,795 reads
24,795 reads

How to Host a Large Language Model Locally

by Thomas CherickalJune 5th, 2023
Read on Terminal Reader
Read this story w/o Javascript

Too Long; Didn't Read

Large Language Models (LLMs) are a subset of artificial intelligence (AI) that are designed to understand and generate natural language. LLMs have been used to develop a range of applications, from chatbots to language translation services. In this article, we will discuss how to create an LLM on a single computer and provide source code examples.
featured image - How to Host a Large Language Model Locally
Thomas Cherickal HackerNoon profile picture


Large Language Models (LLMs) are a subset of artificial intelligence (AI) that are designed to understand and generate natural language. These models are trained on vast amounts of textual data and use complex algorithms to learn patterns in language. LLMs have been used to develop a range of applications, from chatbots to language translation services. In this article, we will discuss how to create an LLM on a single computer and provide source code examples.


Prerequisites

Before we begin, it’s essential to ensure that your computer meets the following requirements:


  1. A powerful computer with at least 16GB of RAM and 1TB of free storage
  2. A modern GPU, such as an NVIDIA GeForce or AMD Radeon, with at least 8GB of video memory
  3. A Linux operating system, such as Ubuntu or CentOS
  4. Basic knowledge of the command line interface (CLI) and Python programming language

Step 1: Install Anaconda

Anaconda is a popular open-source platform for data science and machine learning. It includes various tools and libraries that we will use to create an LLM. To install Anaconda, follow these steps:


  1. Download the Anaconda installer from the official website: https://www.anaconda.com/products/individual

  2. Open the terminal and navigate to the directory where the installer was downloaded

  3. Run the installer by typing the following command: bash Anaconda3–2021.11-Linux-x86_64.sh

  4. Follow the instructions on the screen to complete the installation


Step 2: Create a Python environment

We will create a Python environment to install the necessary libraries and dependencies for the LLM. To create an environment, follow these steps:


  1. Open the terminal and type the following command: conda create — name lm python=3.8

  2. Activate the environment by typing: conda activate lm


Step 3: Install TensorFlow

TensorFlow is an open-source platform for building and training machine learning models. We will use TensorFlow to create the LLM. To install TensorFlow, follow these steps:


  1. Type the following command in the terminal: pip install tensorflow

  2. Verify that TensorFlow is installed by importing it in Python: import tensorflow as tf


Step 4: Download a pre-trained LLM

Training an LLM from scratch requires an enormous amount of computational power and time. Luckily, pre-trained models are available that we can fine-tune for our specific use case. One of the most popular pre-trained LLMs is GPT-2 (Generative Pre-trained Transformer 2). To download the pre-trained GPT-2 model, follow these steps:


  1. Open the terminal and type the following command: git clone https://github.com/openai/gpt-2.git

  2. Navigate to the gpt-2 directory by typing: cd gpt-2

  3. Download the pre-trained model by typing: python download_model.py 117M


Step 5: Fine-tune the pre-trained LLM

Fine-tuning a pre-trained LLM involves training the model on a specific dataset for a specific task. In our example, we will fine-tune the pre-trained GPT-2 model on a text generation task. To fine-tune the model, follow these steps:


  1. Create a new directory for the fine-tuned model by typing: mkdir my_model
  2. Navigate to the my_model directory by typing: cd my_model
  3. Create a new Python file by typing: touch train.py
  4. Open the train.py file in a text editor and paste the following code:
import tensorflow as tf
import numpy as np
import os
import json
import random
import time
import argparse

# Define the command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--dataset_path", type=str, required=True,
                    help="Path to the dataset")
parser.add_argument("--model_path", type=str, required=True,
                    help="Path to the pre-trained model")
parser.add_argument("--output_path", type=str, required=True,
                    help="Path to save the fine-tuned model")
parser.add_argument("--batch_size", type=int, default=16,
                    help="Batch size for training")
parser.add_argument("--epochs", type=int, default=1,
                    help="Number of epochs to train for")
args = parser.parse_args()
# Load the pre-trained GPT-2 model
with open(os.path.join(args.model_path, "hparams.json"), "r") as f:
    hparams = json.load(f)
model = tf.compat.v1.estimator.Estimator(
    model_fn=model_fn,
    model_dir=args.output_path,
    params=hparams,
    config=tf.compat.v1.estimator.RunConfig(
        save_checkpoints_steps=5000,
        keep_checkpoint_max=10,
        save_summary_steps=5000
    )
)
# Define the input function for the dataset
def input_fn(mode):
    dataset = tf.data.TextLineDataset(args.dataset_path)
    dataset = dataset.repeat()
    dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.batch(args.batch_size)
    dataset = dataset.map(lambda x: tf.strings.substr(x, 0, hparams["n_ctx"]))
    iterator = dataset.make_one_shot_iterator()
    return iterator.get_next()
# Define the training function
def train():
    for epoch in range(args.epochs):
        model.train(input_fn=lambda: input_fn(tf.estimator.ModeKeys.TRAIN))
        print(f"Epoch {epoch+1} completed.")
# Start the training
train()



In the code above, we define the command-line arguments for the dataset path, pre-trained model path, output path, batch size, and number of epochs to train for. We then load the pre-trained GPT-2 model and define the input function for the dataset. Finally, we define the training function and start the training.


Step 6: Generate text using the fine-tuned LLM

Once the LLM is fine-tuned, we can use it to generate text. To generate text, follow these steps:


  1. Open the terminal and navigate to the my_model directory by typing: cd my_model
  2. Create a new Python file by typing: touch generate.py
  3. Open the generate.py file in a text editor and paste the following code:


import tensorflow as tf
import numpy as np
import os
import json
import random
import time
import argparse

# Define the command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument("--model_path", type=str, required=True,
                    help="Path to the fine-tuned model")
parser.add_argument("--length", type=int, default=100,
                    help="Length of the generated text")
parser.add_argument("--temperature", type=float, default=0.7,
                    help="Temperature for text generation")
args = parser.parse_args()
# Load the fine-tuned model
with open(os.path.join(args.model_path, "hparams.json"), "r") as f:
    hparams = json.load(f)
model_fn = model_fn(hparams, tf.estimator.ModeKeys.PREDICT)
model = tf.compat.v1.estimator.Estimator(
    model_fn=model_fn,
    model_dir=args.model_path,
    params=hparams
)
# Define the generation function
def generate_text(length, temperature):
    start_token = "<|startoftext|>"
    tokens = tokenizer.convert_tokens_to_ids([start_token])
    token_length = len(tokens)
    while token_length < length:
        prediction_input = np.array(tokens[-hparams["n_ctx"]:])
        output = list(model.predict(input_fn=lambda: [[prediction_input]]))[0]["logits"]
        logits = output[-1] / temperature
        logits = logits - np.max(logits)
        probs = np.exp(logits) / np.sum(np.exp(logits))
        token = np.random.choice(range(hparams["n_vocab"]), p=probs)
        tokens.append(token)
        token_length += 1
    output_text = tokenizer.convert_ids_to_tokens(tokens)
    output_text = "".join(output_text).replace("▁", " ")
    output_text = output_text.replace(start_token, "")
    return output_text
# Generate text
text = generate_text(args.length, args.temperature)
print(text)




In the code above, we define the command-line arguments for the fine-tuned model path, length of the generated text, and temperature for text generation. We then load the fine-tuned model and define the generation function. Finally, we generate text using the generate_text function and print the output, and we are done!


Conclusion

In this article, we have learned how to create a Large Language Model (LLM) on a single computer using TensorFlow and the GPT-2 architecture. We started by installing TensorFlow and downloading the GPT-2 code from the OpenAI GitHub repository. We then trained a GPT-2 model on a dataset and fine-tuned the pre-trained GPT-2 model using the same dataset. Finally, we generated text using the fine-tuned LLM.


By following the steps in this article, you can create your own LLM and generate text for various tasks such as language translation, chatbots, and content generation.


The Next Generation?


References

  1. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
  2. OpenAI. (2019). GPT-2: Language models are unsupervised multitask learners. Github. Retrieved from https://github.com/openai/gpt-2.
  3. TensorFlow. (2021). TensorFlow: An open source machine learning framework for everyone. Retrieved from https://www.tensorflow.org/.
  4. Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.


Also published here.