
Running the Turing Test on Myself

by Sergej Chicherin, August 29th, 2023

Too Long; Didn't Read

Fine-tune Llama 2 on your text dialogues and have it speak as if it were you.


Last month, most AI news was dedicated to Llama 2, a large language model developed by Meta and offered for free. On many tasks, such as code generation or common-sense reasoning, it is on par with ChatGPT and other SOTA models. I revisited one of my earlier experiments to assess the advances in Large Language Models (LLMs). Specifically, I aim to perform a Turing Test with myself as the subject. In other words, I want the AI to 'speak' as if it were me!

In his seminal 1950 work, Alan Turing described the imitation game, in which two people, A and B, are talking, and then a machine takes the part of A, imitating his answers. My idea is to collect all the previous answers by A and train a large language model to imitate them. In this article, I fine-tune Llama 2 in a chat-based style to take part in the game.

The word “game” also recalls the concept of a language game, introduced by the philosopher Ludwig Wittgenstein. In his work Philosophical Investigations, he described every conversation as tightly bound to its context. Just as games resemble one another without sharing a single defining feature, a particular conversation does not fit neatly into one category. Wittgenstein used this metaphor to emphasize that the use of a word can shift with context and take on different meanings. From this we can deduce the ordinary fact that the answers from A usually depend on B.

Notably, when I act as both the observer and the subject A, the Turing Test becomes especially challenging.

Fine-tune Llama 2 on all your texts and run a bot. Step-by-step instructions.

Export your data.

First, I need my own messaging data. Thanks to regulations like GDPR, you have access to all the messages you have sent. You can download your data at any time from services such as Google Takeout, Apple Privacy or the Facebook data export tool. Different parts of the world use different communication tools, each with its own format; for example, I retrieved my data from Facebook and Telegram.

Although every export format is different, they all have something in common. First, I collect the names of the recipients B and mark my own texts with a special tag. Next, I gather all chats in a dictionary with the structure Dict[str, List[Dict[str, Any]]], where the key is the name of B and the value is a list of messages with their descriptions. The simplest example is {"B_name": [{"text": "hi", "from": "B"}, {"text": "how are you", "from": "textual_avatar"}]}

Each message in the list consists of an author, a timestamp and a text. This makes it easy to sample the history of the chat, the previous input and my reply. Because the text model is stateless, I preserve the context of a chat by keeping the previous 10 messages.

I use timestamps to split all messages into temporal chunks. If nobody talks for a day, I drop the previous messages from the context.
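The windowing described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the article: the function name build_samples, the "timestamp" field name, and the "author: text" rendering of history lines are assumptions; only the 10-message context and the one-day gap come from the text.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, List

ME = "textual_avatar"         # tag that marks my own messages
MAX_CONTEXT = 10              # previous messages kept as context
MAX_GAP = timedelta(days=1)   # a longer silence starts a new chunk


def build_samples(chats: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Turn each of my replies into a (history, input, response) sample."""
    samples = []
    for counterpart, messages in chats.items():
        context: List[Dict[str, Any]] = []
        for msg in sorted(messages, key=lambda m: m["timestamp"]):
            # Forget the context if nobody has talked for more than a day.
            if context and msg["timestamp"] - context[-1]["timestamp"] > MAX_GAP:
                context = []
            # A sample is one of my replies to a message from the other party.
            if msg["from"] == ME and context and context[-1]["from"] != ME:
                samples.append({
                    "counterpart": counterpart,
                    "history": [f"{m['from']}: {m['text']}" for m in context[:-1]],
                    "input": context[-1]["text"],
                    "response": msg["text"],
                })
            # Keep only the most recent messages as context.
            context = (context + [msg])[-MAX_CONTEXT:]
    return samples
```

Replies that open a conversation, or that follow another message of mine, are skipped here because they have no incoming message to answer; that is a design choice of this sketch.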

Create the instruction-based dataset.

Each sample, made up of the history of previous messages, the input text and the name of the other party, goes into my instruction-following dataset. I followed the design of the Stanford Alpaca project: each sample consists of an instruction, an input (the previous messages) and a response (my reply).

For the instruction, I use the following prompt: "You are {TEXTUAL_AVATAR}, a sophisticated AI designed to engage in text conversations. Your goal is to provide relevant responses based on the given context. Imagine you have been having a conversation with {sample['counterpart']}. Your task is to mimic a text reply to the last message as {TEXTUAL_AVATAR}."
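As a rough sketch, one dataset record could be assembled like this. The helper name to_instruction_record and the "history" and "input" fields of the incoming sample are assumptions made for illustration; the prompt wording and the instruction/input/response keys follow the text.

```python
TEXTUAL_AVATAR = "textual_avatar"  # placeholder for my own name (assumption)


def to_instruction_record(sample: dict) -> dict:
    """Build one Alpaca-style record from a chat sample."""
    instruction = (
        f"You are {TEXTUAL_AVATAR}, a sophisticated AI designed to engage in "
        "text conversations. Your goal is to provide relevant responses based "
        "on the given context. Imagine you have been having a conversation "
        f"with {sample['counterpart']}. Your task is to mimic a text reply to "
        f"the last message as {TEXTUAL_AVATAR}."
    )
    # The previous messages followed by the last incoming message form the input.
    context = "\n".join(sample["history"] + [sample["input"]])
    return {
        "instruction": instruction,
        "input": context,
        "response": sample["response"],
    }
```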

I exported all these messages to a CSV file and encrypted it. From this point on, no private messages need to remain on disk unencrypted.

Obtain LLM.

To download Llama 2, you first have to accept the license agreement. I used the Hugging Face version of Llama 2 and SFTTrainer from trl. I load the model in 4 bits using a config from bitsandbytes:

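import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization; compute runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)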

SFTTrainer requires a function that formats each sample; I use the default format of the Alpaca dataset:

def format_instruction(sample):
    return f"""### Instruction: {sample['instruction']} ### Input: {sample['input']} ### Response: {sample['response']} """

While there are web services that can quickly fine-tune various models on your data, I've chosen not to use them due to several factors: non-zero cost, privacy concerns, and the potential to enhance my experiments further, independently.

Fine-tune.

I use Parameter-Efficient Fine-Tuning (PEFT) with Quantized Low-Rank Adaptation (QLoRA).

Low-Rank Adaptation (LoRA) adds a small trainable block to the large linear layers of the base model. The block has the same inputs and outputs as the original layer, but instead of a full N×N matrix it uses the product of an N×r matrix and an r×N matrix, where the rank r is small. These blocks are commonly applied to the attention layers, and they contain far fewer parameters than the base model. With LoRA, I train just 4 million parameters for the 7-billion-parameter model. Training on 50K samples takes two hours per epoch on an RTX 3090.
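To see where a figure of about 4 million can come from, here is a small worked calculation. The hidden size (4096) and layer count (32) are those of Llama 2 7B; the rank of 8 and the choice of two adapted attention projections per layer are assumptions picked for illustration, since the article does not state the exact LoRA settings.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA block: a (d_in x rank) and a (rank x d_out) matrix."""
    return rank * (d_in + d_out)


HIDDEN = 4096   # hidden size of Llama 2 7B
LAYERS = 32     # transformer layers in Llama 2 7B
RANK = 8        # assumed LoRA rank
TARGETS = 2     # assumed: two attention projections adapted per layer

per_matrix = lora_param_count(HIDDEN, HIDDEN, RANK)
trainable = per_matrix * TARGETS * LAYERS

# The same matrices trained in full, for comparison.
full = HIDDEN * HIDDEN * TARGETS * LAYERS

print(f"LoRA parameters: {trainable:,}")        # 4,194,304
print(f"Full matrices:   {full:,}")             # 1,073,741,824
print(f"Reduction:       {full // trainable}x")  # 256x
```

Under these assumptions each adapted matrix needs 8 × (4096 + 4096) = 65,536 parameters instead of 4096 × 4096 = 16,777,216.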

Exporting model and inference.

After fine-tuning, to merge the QLoRA adapter into the base weights and unload it, I had to reload the base model in fp16 because of a bug present at the time of writing:

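import glob
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# model_id is the name of the base Llama 2 checkpoint (not shown in this article).
new_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    load_in_8bit=False,
    device_map="auto",
    trust_remote_code=True,
)
# Attach the first saved adapter checkpoint, then fold it into the base weights.
new_model = PeftModel.from_pretrained(
    new_model, glob.glob("llama-7-int4-textualavatar/checkpoint-*")[0]
)
model = new_model.merge_and_unload()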

I take as the model's output the text that follows '### Response:'.
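A minimal sketch of that extraction step is shown below. The function name extract_response is hypothetical, and cutting the reply at the next "###" is an assumption of this sketch, added in case the model keeps generating a new section.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(generated: str) -> str:
    """Return the text that follows the first '### Response:' marker."""
    if RESPONSE_MARKER not in generated:
        # Nothing to cut; return the text as is.
        return generated.strip()
    tail = generated.split(RESPONSE_MARKER, 1)[1]
    # Stop at the next section header if the model started another one.
    tail = tail.split("###", 1)[0]
    return tail.strip()
```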

After comparing this output with the actual response, I find that they are stylistically similar and generally align well in straightforward scenarios.

Chatbots.

I use the method def predict(input_text, history, counterpart=""), which runs inference on the model and returns a reply given the input text and the history of the conversation with a particular person.

Given this method, a Gradio demo is easy to start.
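The original launch snippet is not reproduced here, so the following is only one plausible way to do it, using Gradio's ChatInterface. The predict body below is a stand-in that echoes its input (the real method calls the fine-tuned model), and launch_demo and the title are hypothetical.

```python
def predict(input_text, history, counterpart=""):
    """Stand-in for the real method; it only echoes the input."""
    return f"(reply to {counterpart or 'someone'}) {input_text}"


def launch_demo():
    # Imported lazily so the stub above can be tried without Gradio installed.
    import gradio as gr

    # ChatInterface calls fn(message, history); counterpart keeps its default.
    gr.ChatInterface(fn=predict, title="Textual avatar").launch()
```

Calling launch_demo() starts a local web page with a chat window.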

To set up the Telegram bot, you'll need to obtain a bot token from @BotFather. Incoming messages contain only user IDs. Fortunately, the exported chat data already contains a dictionary that maps these IDs to the users' real names. As a result, the chatbot recognizes these individuals and responds to each one in the same conversational style you've used with them before.
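The routing logic can be sketched independently of any Telegram library. Everything here is illustrative: make_handler, the id_to_name dictionary and the per-user history list are assumptions, and wiring the handler into an actual bot framework is left out.

```python
from typing import Callable, Dict, List


def make_handler(
    id_to_name: Dict[int, str],
    predict_fn: Callable[..., str],
    max_context: int = 10,
) -> Callable[[int, str], str]:
    """Create a message handler that keeps a separate history per user."""
    histories: Dict[int, List[str]] = {}

    def handle(user_id: int, text: str) -> str:
        # Resolve the sender's name; unknown users get an empty name.
        name = id_to_name.get(user_id, "")
        history = histories.setdefault(user_id, [])
        reply = predict_fn(text, list(history), counterpart=name)
        # Remember both sides of the exchange, trimmed to the last messages.
        history.extend([text, reply])
        del history[:-max_context]
        return reply

    return handle
```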

My results

The initial result was not promising: the model was good at mimicking me, but it said nothing of substance. The loss curve jumped like a heart rate on an electrocardiogram. To improve this, I refined the dataset to include only my replies between 10 and 500 characters long. I also adjusted the initial learning rate, which led to a smoother loss curve. Although the model may not represent me perfectly, it has surfaced some long-forgotten biographical details.
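The length filter amounts to a few lines. This sketch assumes that each sample stores my reply under a "response" key and that both bounds are inclusive; neither detail is stated in the text.

```python
MIN_CHARS = 10    # shorter replies carry too little content
MAX_CHARS = 500   # longer replies are dropped as outliers


def keep_sample(sample: dict) -> bool:
    """Keep a sample only if my reply is 10 to 500 characters long."""
    return MIN_CHARS <= len(sample["response"].strip()) <= MAX_CHARS


def filter_samples(samples: list) -> list:
    return [s for s in samples if keep_sample(s)]
```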

Possible ways to improve.

On recent GPUs, such as those with the Ampere architecture, flash attention can be used to speed up training.

The fine-tuning process could use Ghost Attention (GAtt), the technique from the Llama 2 paper that helps the model keep following an instruction across many turns of a dialogue.

The quality of a fine-tuned model depends largely on the quality of its input data. To improve it, you could apply stricter filtering rules with the help of a large language model. It is also possible to add more data to the instruction-based dataset from other sources of my writing. Alternatively, extra data could be generated with an LLM and then filtered with Reinforcement Learning from Human or AI Feedback.