嘿大家！   我是纳塔拉吉 就像你一样，我一直对人工智能的最新进展着迷。意识到我需要跟上所有正在发生的发展，我决定踏上个人学习之旅，因此 ， 人工智能 100 天 出生于！通过这个系列，我将学习法学硕士，并通过我的博客文章分享想法、实验、观点、趋势和学习。您可以在 HackerNoon 上跟踪整个旅程 这里 或我的个人网站 这里 。 在之前的一篇文章中，我们讨论了微调及其重要性。在这篇文章中，我们将了解一种称为 指令微调的特定微调。 预训练基础模型的局限性： 像 gpt-3 这样的预训练基础模型是根据大量数据进行训练的。对于 gpt-3，它是互联网上的所有数据。好吧，我们不确定这一点，但大多数模型都是经过大量的手动清理和格式化后，在互联网规模的数据上进行训练的。当他们接受训练时，基础模型学习如何预测下一个令牌并真正擅长令牌预测。但纯粹的代币预测并不像你想象的那么有用。如果你问一个预训练的基础模型“  ”它不会回复答案，但可能会用“ ”来完成输入句子。因此，即使像 gpt-3 这样的模型在令牌预测方面非常强大，它也无法充当聊天机器人或副驾驶。那么我们如何将预先训练的模型转换为像 chat-gpt 这样有用的聊天机器人呢？答案是微调，主要是一种特定类型的微调，称为“ ”。 墨西哥的首都是哪里？ 哥伦比亚的首都是什么 指令微调 什么是指令微调？ 指令微调也称为“指令跟随”，是一个教导预先训练的基础模型表现得像聊天机器人的过程。  需要问题和答案形式的数据集。您可以使用公共数据集或您公司的问答形式的数据集。如果您的数据集不是问答形式，您可以使用羊驼等不同技术或在其他法学硕士上使用自定义提示将数据转换为问答形式。请注意，指令微调为模型提供了一种新的行为，不仅可以回答微调中使用的数据的问题，而且这种新行为适用于模型已经拥有的现有知识，这使得微调成为一种强大的技术。 指令微调 使用 Lamini 进行指令微调：  Lamini 是一家人工智能公司，它允许开发人员以简单的方式处理语言模型，从而抽象出托管、培训和其他复杂方面的复杂性。 查看它的全部功能。我们将使用 Lamini 训练名为 的小型语言模型，这是由 创建的开源模型，并使用名为 Alpaca 的公司数据集对其进行 。 在这里 pythia Eleuther AI 指令微调 第 1 步：初始化并加载指令微调数据集 在此步骤中，我们初始化所需的模块并查看羊驼训练数据集。这是代码。   import itertools import jsonlines from datasets import load_dataset from pprint import pprint from llama import BasicModelRunner from transformers import AutoTokenizer, AutoModelForCausalLM from transformers import AutoModelForSeq2SeqLM, AutoTokenizer ## we are using alpaca data set, which is an open source fine tuning data set instruction_tuned_dataset = load_dataset("tatsu-lab/alpaca", split="train", streaming=True) m = 5 print("Instruction-tuned dataset:") top_m = list(itertools.islice(instruction_tuned_dataset, m)) for j in top_m: print(j) 这就是指令调整数据集的样子。它包含问题和答案形式的数据。  第 2 步：补充提示 在此步骤中，我们从羊驼集中获取数据并将其放入下面显示的提示中。   prompt_template_with_input = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: {instruction} ### Input: {input} ### Response:""" prompt_template_without_input = """Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: {instruction} ### Response:""" ## hydrate prompts - meaning add data to the above prompts processed_data = [] for j in top_m: if not j["input"]: processed_prompt = prompt_template_without_input.format(instruction=j["instruction"]) else: processed_prompt = prompt_template_with_input.format(instruction=j["instruction"], input=j["input"]) processed_data.append({"input": processed_prompt, "output": j["output"]}) 完成此操作后，数据集将如下所示。  我们基本上将原始问答数据转换为对法学硕士有意义的格式，当被问到问题时，该问题的回答应该是什么样子。我们迭代地执行此操作并将其存储在 文件中。 jsonl   with jsonlines.open(f'alpaca_processed.jsonl', 'w') as writer: writer.write_all(processed_data) 步骤 3 – 非微调输出 在步骤 1 和 2 中，我们加载原始数据并将其水化并以 格式存储。但 Lamini 已准备好这些水合数据，因此从技术上讲，步骤 1 和 2 不是必需的。但需要展示以了解指令微调的工作原理。让我们首先看看 Pythia 模型的非微调版本如何响应一个简单的问题。 jsonl   tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m") #70M parameter model that is not instruction tuned. model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-70m") def inference(text, model, tokenizer, max_input_tokens=1000, max_output_tokens=100): # Tokenize input_ids = tokenizer.encode( text, return_tensors="pt", truncation=True, max_length=max_input_tokens ) # Generate device = model.device generated_tokens_with_prompt = model.generate( input_ids=input_ids.to(device), max_length=max_output_tokens ) # Decode generated_text_with_prompt = tokenizer.batch_decode(generated_tokens_with_prompt, skip_special_tokens=True) # Strip the prompt generated_text_answer = generated_text_with_prompt[0][len(text):] return generated_text_answer ## the 70M model doesnt have any company specific data, we will use the alpace data set from hosted on lamini and fine tune this model # load alpaca dataset finetuning_dataset_path = "lamini/lamini_docs" finetuning_dataset = load_dataset(finetuning_dataset_path) #print(finetuning_dataset) test_sample = finetuning_dataset["test"][0] print(test_sample) print("untrained output sample") print(inference(test_sample["question"], model, tokenizer)) 这是我得到的输出。您会注意到输出没有帮助，并且模型正在尝试完成标记，但没有给出实际答案。  步骤 4 – 指令微调输出 一旦我们使用上一步中看到的问答数据来指导微调，同一模型将开始像聊天机器人一样运行，并将为您的问题提供更准确的答案，包括关于精细数据的问题，以及模型已经获得的数据组成关闭。这几乎就像当孩子第一次学习语言时，他或她现在能够表达他们已经有了的感受以及他们因语言训练而学到的新事物。就像模型的预训练版本一样，指令微调模型也托管在 Lamini 上，可以使用如下所示的命令进行推断。 （是的，拉米尼太棒了！）   ## finetuned output instruction_model = AutoModelForCausalLM.from_pretrained("lamini/lamini_docs_finetuned") print("instruction finetuned output") print(inference(test_sample["question"], instruction_model, tokenizer)) 输出如下所示。您会注意到，我们有更准确的输出，而不是我们在上一步中看到的乱码。  这篇文章的目的是介绍 以及如何使用它来将基础模型制作成更可用的版本。在以后的文章中，我将深入探讨 指令微调 指令微调的实际过程。 这就是人工智能 100 天中的第 13 天。 我写了一篇名为“高于平均水平”的时事通讯，其中讨论了大型科技领域正在发生的一切背后的二阶见解。如果您从事科技行业并且不想成为平庸的人， 。 请订阅它 在  、  上关注我，了解 AI 100 天的最新动态。如果您从事技术工作，您可能有兴趣加入我的技术专业人士 。 Twitter LinkedIn 社区 也出现 。 在这里

Product & Engineering @Microsoft Azure | On Deck Fellow |
Partner at planbcapital.co

2021 - HackerNoon Contributor of the Year - CROWDFUNDING

2022 - HackerNoon Contributor of the Year - Business Strategy

2022 - HackerNoon Contributor of the Year - India

2022 - HackerNoon Contributor of the Year - Netflix

2022 - Startup Blogger of the Year

Listen to Startup Project Podcast

Subscribe to Above Average Co.

Follow me @natarajsindam

Portfolio

Meet the Writer: HackerNoon Contributor Nataraj Sindam on Experimenting With AI 

該音頻是用故事的原始語言製作的！

太長; 讀書

人工智能 100 天，第 13 天：指令微调如何改进预训练的法学硕士

人工智能 100 天，第 13 天：指令微调如何改进预训练的法学硕士

About Author

註釋

標籤

这篇文章刊登在

Related Stories

成功云迁移的完整指南：策略和最佳实践

比特币 UTXO 模型，为独特的生态系统提供动力

点击赚钱：Telegram 可能会在 Solana 之前吸引下一个 100 亿加密用户

如何将您的工作流程提高 10 倍：17 个必备应用程序

成功云迁移的完整指南：策略和最佳实践

比特币 UTXO 模型，为独特的生态系统提供动力

点击赚钱：Telegram 可能会在 Solana 之前吸引下一个 100 亿加密用户

如何将您的工作流程提高 10 倍：17 个必备应用程序

Light-Mode

Classic

Newspaper

Dark-Mode

Neon Noir

Minty

HN StartUps