How to Fine-Tune and Optimize GPT Assistants with OpenAI

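GPT has received a lot of hype lately. GPT-based solutions can look easy to implement; after all, simply giving the model instructions is enough to get high-quality output. That is certainly enough to impress decision makers. But what is really happening is that you are showing attractive, cherry-picked examples. More importantly, the system may need much higher reliability before it can go to production.

Businesses envision a variety of use cases for GPT, some of which rely on open-ended communication between GPT and the user. Take these tools as examples:

- ChatSpot. Natural language queries go to the ChatSpot API and are converted into operations on the HubSpot CRM API, the Google Docs API, and so on; a generative text model then replies once the action has (or has not) been executed. Based on GPT-4.
- Khanmigo. Khan Academy's AI guide. User requests are converted into prompts with injected context. The system relies on GPT's ability to handle up to eight times as much injected context. Based on GPT-4.

We know that businesses and users are willing to use natural language queries as a replacement for user interfaces. However, to make AI solutions reliable and effective in real-world applications, GPT-based models must be fine-tuned so that they truly fit the specific use case and domain knowledge.

Importantly, GPT-4 offers more room to provide context in prompts and makes significantly fewer hallucination errors.

AI hallucinations

AI hallucinations are a key challenge that engineers must address when working with large language models such as GPT-4. Because hallucinations produce false or misleading information, they can have extremely serious consequences in applications where factual accuracy is critical. In this section we look at the technical side of AI hallucinations in more detail and discuss strategies for mitigating them.

As a simple example, consider the egocentric factual accuracy test provided by Noble Ackerson. You will find a lot of false information among the facts it returns.

AI hallucinations arise mainly from limitations inherent in the transformer architecture and from the training data used for large language models. The lack of a cognitive architecture that supports deductive reasoning makes these models prone to generating information that seems plausible but is actually incorrect.

The transformer architecture of large language models relies on attention and self-attention mechanisms to capture long-range dependencies in the input data. While this lets the model generate coherent and contextually relevant text, it does not guarantee factual accuracy. In addition, the training data may contain biases or misinformation that the model inadvertently learns, which also leads to AI hallucinations.

Probabilistic nature

One reason for this lack of reliability can be found in the probabilistic nature of GPT. For context, let's examine probabilistic data structures such as Bloom filters. A Bloom filter is a probabilistic data structure used to test whether an element is a member of a set. It consists of a bit array and several hash functions, each of which maps an element to an array index.

To insert an element into the filter, the element is hashed with each hash function and the corresponding bits of the array are set to 1.

To query whether an element is present, the element is hashed in the same way. If all the corresponding bits are set to 1, the element is probably in the filter. If any of those bits is not set to 1, the element is definitely not in the filter. False positives are built into probabilistic data structures by design.

Like a Bloom filter, GPT is highly probabilistic. Instead of testing set membership, however, it generates text from an input prompt. GPT consists of multiple transformer layers that perform complex computations to produce an output sequence of tokens from the prompt. The output is not deterministic and can vary greatly depending on the sampling method used and on other hyperparameters chosen during training. Like a Bloom filter, GPT can also "hallucinate", returning results that look plausible on the surface but are actually wrong.

However, the probability of unreliable output can be reduced by fine-tuning the model and giving it high-quality training data.

GPT-4 and risks

Hidden factual errors can be very harmful to users, so it is essential that developers take measures to reduce the likelihood of such errors.

GPT-4 is slower than previous generations, and in casual use users will not notice much difference from them. That said, GPT-4 is much safer thanks to the safety measures applied during training, including expert engagement, model safety improvements, and additional monitoring and enforcement. Compared with GPT-3.5, these mitigations significantly improved GPT-4's safety properties: the model's tendency to respond to requests for disallowed content fell by 82%, and its tendency to respond to sensitive requests in accordance with policy rose by 29%. [ref]

Reducing hallucinations

Even with some level of risk involved, it would be hard to ignore GPT completely. GPT has become a new communication interface for humans and APIs, one that aims to reduce the need for a UI. As engineers, our job is to find ways to solve the problems that arise from using it with the resources at our disposal. There are several ways to do that.

Prompt engineering

Improving prompts can improve task performance, producing satisfactory results roughly 50% to 65% of the time, but performance may not often exceed that range.

According to research on how large language models are zero-shot reasoners (which concerns only text completion, not chat or instruction models), improving prompts significantly improves GPT's performance on reasoning tasks. The study shows that adding a simple phrase such as "Let's think step by step" before each answer turns GPT into a decent zero-shot reasoner that outperforms plain zero-shot LLM performance on various benchmark reasoning tasks, without any hand-crafted few-shot examples.

Few-shot learning is another powerful prompt engineering technique. It can significantly improve the performance of language models such as GPT-4 on new tasks, even when only limited training data is available, so it can be a good alternative to fine-tuning in simpler cases. In the few-shot approach, the user provides structured examples that show what is expected and then leaves free space for the model to fill in.

You can check out an article about the things ChatGPT cannot solve yet.

Context injection

Context injection is a technique that helps reduce AI hallucinations and improve the accuracy of generated text in specific domains. By injecting relevant context into the input prompt, we give the model more precise information, which lets it generate more accurate and relevant responses.

Although context injection is faster and cheaper, it requires domain knowledge and expertise to be effective. That said, the approach is particularly useful in domains where the accuracy and relevance of the generated text are critical. It is expected to be adopted in enterprise settings such as customer service and medical diagnosis.

For example, in a customer service chatbot application, context injection might mean giving the model relevant details about the customer's account, previous interactions, and any known issues or concerns. The added context allows the model to generate more personalized and accurate responses, which improves the overall user experience.

GitHub Copilot and GitHub Copilot X both rely heavily on this in their implementations. By ingesting the context of the code being written, along with any comments or documentation, these coding assistants can grasp the intended functionality and constraints of the code.

Context injection is the most reliable way to reduce hallucinations to date; however, it has one key limitation, which is prompt size.

    import openai  # legacy SDK interface (openai<1.0); reads OPENAI_API_KEY from the environment

    # Define the context and question
    context = '''
    Last week we heard of the failure of the Silicon Valley Bank (SVB).
    And it is the 2nd largest bank to shut down in the US since 2000.
    While that is sad, miserable, and pathetic, I want to show you something under the hood - the real reason the bank failed.
    '''
    question = "What is the reason for the failure of Silicon Valley Bank and when did it happen?"

    # Define prompts with and without context
    prompt_without_context = question
    prompt_with_context = f'''Context: """{context}"""
    Instructions: Using the provided context, write a comprehensive reply to the given query.
    Query: {question}'''

    # List of prompts to iterate over
    prompts = [
        {"name": "Without context", "prompt": prompt_without_context},
        {"name": "With context", "prompt": prompt_with_context},
    ]

    for prompt_info in prompts:
        print(f"--- {prompt_info['name']} ---")
        print(f"Prompt: {prompt_info['prompt']}")

        # Send the same question with and without the injected context
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": prompt_info['prompt']},
            ]
        )

        # Extract the answer from the response
        answer = response.choices[0].message.content.strip()

        # Print the answer
        print(f"Answer: {answer}\n")

Output:

    --- Without context ---
    Prompt: What is the reason for the failure of Silicon Valley Bank and when did it happen?
    Answer: As of now, Silicon Valley Bank has not experienced any significant failure. Silicon Valley Bank, founded in 1983, is a successful and high-performing financial institution that specializes in offering banking services to technology and life science companies and venture capital firms. The bank has grown consistently and continues to operate effectively, serving its clients and staying relevant in the industry.

    --- With context ---
    Prompt: Context:
    """
    Last week we heard of the failure of the Silicon Valley Bank (SVB).
    And it is the 2nd largest bank to shut down in the US since 2000.
    While that is sad, miserable, and pathetic, I want to show you something under the hood - the real reason the bank failed.
    """
    Instructions: Using the provided context, write a comprehensive reply to the given query.
    Query: What is the reason for the failure of Silicon Valley Bank and when did it happen?
    Answer: Silicon Valley Bank (SVB) failed last week, making it the 2nd largest bank to shut down in the US since 2000. The precise reasons behind the bank's failure have not been provided in the given context, but it is implied that there is a significant underlying cause for its collapse. To fully understand the situation, it would be helpful to obtain more information about the bank's performance, financial stability, and any recent events that may have contributed to the failure.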
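The prompt-size limitation mentioned above means that, in practice, injected context usually has to be trimmed before it is sent. The following is a minimal sketch of one way to do that, not the approach used in the article: it greedily keeps context chunks, in order, until a character budget is reached, and then fills the same prompt template as the example above. The names `select_context` and `build_prompt` and the `max_chars` budget are hypothetical, and a character count is only a rough stand-in for the model's real token limit.

```python
# Prompt template copied from the context injection example above.
PROMPT_TEMPLATE = '''Context: """{context}"""
Instructions: Using the provided context, write a comprehensive reply to the given query.
Query: {question}'''


def select_context(chunks, max_chars):
    """Greedily keep chunks, in order, while the joined text fits in max_chars.

    max_chars is a hypothetical budget; real limits are measured in tokens.
    """
    selected = []
    used = 0
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue
        # One extra character accounts for the newline that joins chunks.
        cost = len(chunk) + (1 if selected else 0)
        if used + cost > max_chars:
            break
        selected.append(chunk)
        used += cost
    return "\n".join(selected)


def build_prompt(chunks, question, max_chars=2000):
    """Return a context-injected prompt, or the bare question if nothing fits."""
    context = select_context(chunks, max_chars)
    if not context:
        return question
    return PROMPT_TEMPLATE.format(context=context, question=question)


if __name__ == "__main__":
    paragraphs = [
        "Last week we heard of the failure of the Silicon Valley Bank (SVB).",
        "And it is the 2nd largest bank to shut down in the US since 2000.",
    ]
    print(build_prompt(paragraphs, "When did Silicon Valley Bank fail?", max_chars=80))
```

With a budget of 80 characters only the first paragraph fits, so the second is dropped rather than cut in half. A production system would more likely rank chunks by relevance to the query before applying the budget.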
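Fine-tuning

Fine-tuning is another technique that can significantly improve the accuracy and reliability of GPT. Fine-tuning is the process of providing additional training data so that the model adapts to a specific task or domain and generates more accurate and relevant text. The motivation is really no different from that of any other deep learning neural network.

The process requires domain-specific data to be fed continuously into the pre-trained model until it learns to generate more relevant and accurate text for the target task.

Reinforcement learning from human feedback (RLHF)

RLHF was very useful in making ChatGPT great. See Leandro von Werra asking ChatGPT to explain RLHF to us; it did a really good job! As that example shows, one trick is that adding "like I am five" and similar phrases is a good way to get a simpler explanation.

RLHF is a powerful approach that can be used to improve the performance and safety of GPT-based models. The model is fine-tuned on human-generated feedback, which helps it learn from real-world examples and user interactions. The process involves collecting a dataset of model-generated responses together with human-ranked quality scores or comparisons, which are then used to optimize the model's parameters.

RLHF has been applied successfully in many real-world applications, including chatbots and AI assistants, to improve response quality and reduce the occurrence of AI hallucinations. By incorporating human feedback into training, RLHF teaches the model to generate responses that are more accurate, contextually relevant, and safe, which ultimately leads to a better user experience and higher reliability. Crucially, this approach lets developers harness the power of GPT-based models while addressing the concerns around false or misleading information.

As long as we know the domain, we can train the model to respond the way we need. We can train it to answer "I don't know" or to ignore certain topics. OpenAI uses RLHF on its raw models to make them production-ready.

Designing the model

Let's dive into a practical example by building a fine-tuning process for GPT. We will train the model on a defined dataset and thereby teach it to answer queries about that specific domain.

The process consists of the following components:

- Source dataset: a collection of training files specific to the domain in question.
- Q&A generator: a GPT-4-based solution that uses context injection to generate questions and answers from the source dataset.
- Q&A samples: the training data used to fine-tune the model.
- Fine-tuning: the GPT-3 text completion model is tuned on the dataset.
- Benchmark Q&A: a set of questions and answers used to evaluate the model's performance.
- Benchmarking: the process that feeds questions from the benchmark Q&A to the fine-tuned model and evaluates the validity of the generated responses.

We keep using a GPT-3 text completion model because GPT-4 does not currently support fine-tuning.

Source dataset

To make sure GPT is not already familiar with the dataset we want to use, we should ideally draw on data about events after September 2021 (GPT's knowledge cutoff date).

For example, I often use Next.js to build web applications, and Vercel released Next.js version 13 in 2022. To verify this, let's ask ChatGPT for the release date of Next.js 13 and see what information it can pull on the subject.

With that confirmed, my goal is for the resulting model to know more about Next.js 13, and how to work with it, than the current model does. You can read how I prepared the dataset from the Next.js 13 blog posts here:

https://github.com/knyga/gpt-fine-tuning-set/tree/main/dataset?embedable=true

Sample generator

We want our model to reply to questions (queries) in an open format. Currently, only text completion models support fine-tuning, so we will have to train a text completion model to answer our questions. To get proper results, we first have to convert the dataset into a set of questions and answers. We can do this with the ChatGPT model.

Here is a code sample:

    import openai  # legacy SDK interface (openai<1.0)

    # read_file, extract_qa_from_content, MAX_CONTENT_LENGTH and
    # TOTAL_QUESTIONS_COUNT are not shown in this excerpt.

    def generate_qa(filepath):
        # Truncate the article so the prompt stays within the model's limit
        article = read_file(filepath)[:MAX_CONTENT_LENGTH]
        content = f'''Content for {filepath}:
    {article}

    Instructions: Generate question and answer based on Content for {filepath}.
    Structure it as:
    Q:
    A:
    '''

        questions_answers = []

        # Request several completions at once; each one yields Q&A pairs
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful software developer who specialize in next.js and react."},
                {"role": "user", "content": content},
            ],
            n=TOTAL_QUESTIONS_COUNT
        )

        # Parse every returned choice into question/answer pairs
        for choice in response.choices:
            qa = extract_qa_from_content(choice.message.content.strip())
            questions_answers.extend(qa)

        return questions_answers

The whole solution can be found here.

We tried to generate at least 100 question-answer pairs for each training file.

The output should be saved in a JSONL document. It is important to terminate every prompt with the same pattern; we will end each prompt with \n\n###\n\n as the separator. You can find the resulting dataset in this file: fine_tuning_dataset.jsonl.

Ingesting false questions

We want the model to answer a genuine "I don't know" to any question that is unrelated to software development and Next.js. We could achieve this by integrating a Next.js question classifier that tests whether a question is related to Next.js. Alternatively, if we want a simple architecture, we can add extra training data to the fine-tuning process.

Also, even when a question is related to Next.js, we do not want our system to answer nonsensical questions such as "When will the Next.js framework reach 1 billion users?" We want the model to respond to this question with "I don't know."

Code sample:

    NON_NEXTJS_Q_A_PROMPT = """Create a series of random questions and answers that are not related to the Next.js framework.
    Each question should be followed by a clear answer stating that it is not relevant to Next.js. For example:

    What is the capital of Ukraine?
    This question is not related to Next.js.

    What is Spring Framework?
    It is not related to Next.js.

    Feel free to generate any type of questions you like, as long as the answer indicates that it is not related to the Next.js framework."""

    def generate_random_qa(prompt):
        questions_answers = []

        # RANDOM_QUESTIONS_COUNT and extract_qa_from_content are not shown in this excerpt
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful software developer who specialize in next.js and react."},
                {"role": "user", "content": prompt},
            ],
            n=RANDOM_QUESTIONS_COUNT
        )

        # Parse every returned choice into question/answer pairs
        for choice in response.choices:
            qa = extract_qa_from_content(choice.message.content.strip())
            questions_answers.extend(qa)

        return questions_answers

I generated 100 questions and answers to show the model that we want it to answer only questions related to Next.js, and that it is perfectly fine to reply "I don't know" in order to reduce hallucinations. The resulting training dataset is listed here: fine-tuning dataset.jsonl.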
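The JSONL output and the fixed prompt terminator described above can be sketched as follows. This is an illustration under stated assumptions rather than the author's code: it assumes the `prompt`/`completion` record layout used for OpenAI's legacy completion-model fine-tuning, it assumes each completion starts with a single space, and the helper names `to_finetune_records` and `to_jsonl` are hypothetical.

```python
import json

# Prompt terminator named in the article; every prompt must end with it.
SEPARATOR = "\n\n###\n\n"


def to_finetune_records(qa_pairs):
    """Turn (question, answer) tuples into prompt/completion dictionaries.

    The leading space in the completion is an assumption based on the
    legacy fine-tuning data format, not something the article specifies.
    """
    records = []
    for question, answer in qa_pairs:
        records.append({
            "prompt": f"{question.strip()}{SEPARATOR}",
            "completion": f" {answer.strip()}",
        })
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records) + "\n"


if __name__ == "__main__":
    pairs = [
        ("What is Turbopack?", "Turbopack is a JavaScript bundler introduced with Next.js 13."),
        ("What is the capital of Ukraine?", "This question is not related to Next.js."),
    ]
    print(to_jsonl(to_finetune_records(pairs)), end="")
```

Mixing in-domain pairs and out-of-scope pairs in the same file, as in the usage example, is what lets a single fine-tuning run teach both behaviours.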
Fine-tuning

Finally, once the training dataset has been created (for more details on this process, check out OpenAI's official guide), we are ready for fine-tuning.

Sadly, there are not many models that we can fine-tune. You can see all of them by pulling the list of models with openai api models.list and looking for entries with "allow_fine_tuning": true.

During fine-tuning we could adjust the learning rate and other settings, but in my experience this does not really affect the final score. Training on a real dataset can already be expensive, so experimenting with extra parameters may simply not be worth it.

The duration of the fine-tuning process depends on the model (davinci, ada, and so on). On our dataset of 1511 cases and 907 KB, it took about an hour.

Benchmarking

Remember that we are working with a text completion model, not a chat or instruction model. Our text completion model has been tuned to respond in the answer format. It also expects the input prompt to terminate with ?\n\n###\n\n. If it does not, the output may be unexpected, but most likely the model will try to generate some content that completes the prompt with the prompt terminator.

Let's try to estimate how the model handles the following sets of questions.

Set 1. Proper questions

We expect the model to give proper responses based on the content provided:

    proper_questions = [
        "When Next.js 13 was released?",
        "Is Next.js 14 already released?",
        "How Next.js 13 is different from Next.js 12?",
        "How routing is different in Next.js 13 compared to Next.js 12?",
        "How to prevent Next.js file component from using React Server Components?",
        "What is Turbopack?"
    ]

I made two model runs on these questions with temperature = 0.3.

List of correct responses:

#1 Q: When Next.js 13 was released?
A: Next.js 13 was released on October 2022. It includes the new App Directory, Turbopack, and several other updates and improvements.

#3 Q: How Next.js 13 is different from Next.js 12?
A: Next.js 13 has been rebuilt from the ground up to be faster and more reliable. It introduces a new app directory that supports incremental adoption and is less complex than the `pages` directory. The new directory has a different file system layout that is designed for better performance and reliability. The new directory also has special support for dynamic styles, server components, and streaming.

#4 Q: How routing is different in Next.js 13 compared to Next.js 12?
A: In Next.js 13, the `app` directory replaces the `pages` directory. This means that routing is handled via the file path each `page.tsx` is located in. Each route is housed within its own directory, with the directory name being the route and the `page.tsx` being its index that gets rendered at that route.

List of partially correct responses:

#2 Q: Is Next.js 14 already released?
A: No, Next.js 14 is not released yet. It will be released in the coming months.

#5 Q: How to prevent Next.js file component from using React Server Components?
A: To prevent Next.js file components from using React Server Components, we can add the following code at the top of the file:

    // @supports react-server
    // import ClientComponentHere

This code tells Next.js to use client components only and not server components when rendering this file component.

I give 1 point for a correct answer, 0 for a partially correct answer, and -2 for an incorrect answer.

I made two model runs on these questions with temperature = 0.7.

List of correct responses:

#2 Q: Is Next.js 14 already released?
A: No, Next.js 14 is not released yet. This blog post is about the new features coming to Next.js 13 with React 18.

#3 Q: How Next.js 13 is different from Next.js 12?
A: Next.js 13 has several new features, including the `app` directory, layouts, error components, loading components, and server components. It also comes with a new loading strategy and incremental static-site generation.

#4 Q: How routing is different in Next.js 13 compared to Next.js 12?
A: Routing in Next.js 13 no longer requires a `pages` directory. Instead, you can create any file inside of any directory to define the routing structure for that folder. Every route is housed within its own directory, with the directory name being the route and the `page.tsx` being its index render at that route.

#5 Q: How to prevent Next.js file component from using React Server Components?
A: To prevent Next.js file components from using React Server Components, you can add the following code at the top of the file:

    "use client";

#6 Q: What is Turbopack?
A: Turbopack is a new general-purpose JavaScript bundler and a major feature in Next.js 13. It is intended as a Webpack replacement, and although it's released as alpha, you can use Turbopack now as the dev-mode bundler from Next.js 13 forward. Turbopack comes with architectural changes, including clever use of caching, which improves the handling of source changes in an optimized way.

#1 Q: When Next.js 13 was released?
A: Next.js 13 was released in October 2022.

#2 Q: Is Next.js 14 already released?
A: No. Next.js 14 is still in beta, but with the release of Next.js 13.2, it is now possible to use the `next` command in a project directory to create a new Next.js application using the latest version.

#4 Q: How routing is different in Next.js 13 compared to Next.js 12?
A: Routing in Next.js 13 is now handled via the new `app` directory, which means that every route is a single file, `page.js`. Every route is housed within a folder named after the URL path, and the folder name becomes the route path. For example, `app/profile/settings/page.js` is the file that controls the `/profile/settings` route.

#5 Q: How to prevent Next.js file component from using React Server Components?
A: To prevent Next.js file components from using React Server Components, you can add the following code at the bottom of the file:

    // This file is used with server rendering
    export default function ServerComponent {
      // ...
    }

If you don't add this code, the file component will use the React Server Components by default.

#6 Q: What is Turbopack?
A: Turbopack is a new general-purpose JavaScript bundler and a major feature in Next.js 13. Using the Turbopack alpha with Next.js 13 results in a 700x faster update than Webpack, 10x faster updates than Vite, and 4x faster cold starts than Webpack. Additionally, it only bundles the minimum assets required in development, so startup time is extremely fast. Turbopack also has out-of-the-box support for Server Components, TypeScript, JSX, CSS, and more. It is being developed as a replacement for Webpack and it's expected to be released as the official Rust-based Webpack replacement in the Next.js 13 stable release.

List of partially correct responses:

#3 Q: How Next.js 13 is different from Next.js 12?
A: Next.js 13 is different from Next.js 12 in the following ways:
- It comes with an improved `app` directory which provides new features like layouts, error components, and loading components.
- It introduces the `loading.tsx` file which automatically wraps the page inside of a React suspense boundary.
- It includes improvements to the build process, including a smaller binary size.

List of incorrect responses:

#1 Q: When Next.js 13 was released?
A: Next.js 13 was released in November of 2020.

You may have noticed that the model can still hallucinate at a high temperature. Take a look at the response "Next.js 13 was released in November of 2020."

Based on the two runs at different temperatures, we can see that if we want our assistant to be more precise, we need to lower the temperature. We will therefore use temperature 0.3 for all the following tests and keep the tracked score of 10/12. Setting the temperature to 0.3 is also recommended because it makes outputs more stable, so that inputs can be replayed.

Set 2. Out-of-scope questions

We trained our model to answer only technical questions about JavaScript, Next.js, and React.

    out_of_scope_questions = [
        "What is the population of the United States?",
        "How to cook a steak?",
        "What is native vector database?",
        "Could you please generate a random number?",
        "What is a space-time continuum?",
    ]

List of correct responses:

#1 Q: What is the population of the United States?
A: This question is not related to Next.js.

#2 Q: How to cook a steak?
A: This question is not related to Next.js.

#5 Q: What is a space-time continuum?
A: This question is not related to Next.js.

#1 Q: What is the population of the United States?
A: This question is not related to Next.js.

#2 Q: How to cook a steak?
A: I do not know.

#5 Q: What is a space-time continuum?
A: This question is not related to Next.js.

List of partially correct responses:

#3 Q: What is native vector database?
A: Native vector database is a new feature in Next.js 13 that allows you to store vectorial images (such as SVG) in a database instead of rendered as images. This feature is available in Next.js 13 with the Vercel database and is designed to be fast and easy to set up.

#4 Q: Could you please generate a random number?
A: Yes, I can generate a random number.

#3 Q: What is native vector database?
A: Native vector database is a new feature in Next.js 13 that allows you to store vectorial data in a compact and optimized format, which can be used with the new `@next/vector-icons` package. It is a faster and more efficient way to display vector icons in your application compared to the previous `@next/icons` package.

One incorrect response:

#4 Q: Could you please generate a random number?
A: Sure. Here you go:

    npm run generate

This will generate a random number in the range of 1 to 10.

For the most part the model works well, but it still hallucinates. The score is 4/10. We are doing much worse here than on the proper questions.

Set 3. Improper queries

We trained our model to complete answers to questions. But what if we try to feed it something random? Prepare for something strange.

    not_questions = [
        "Generate Next.js index page",
        "Suggest how no name next.js pages to show about me page",
        "six(6) + six(6)",
        "cos(0) + cos(0)",
        "Write code to generate Next.js index page",
        "Aaaaaughibbrgubugbugrguburgle!?",
        "Mmmm mrrrggk",
    ]
    process_questions(not_questions, is_accumulate = False)

List of correct responses:

#3 Q: six(6) + six(6)
A: This is not a valid mathematical expression.

#6 Q: Aaaaaughibbrgubugbugrguburgle!?
A: I do not understand.

#7 Q: Mmmm mrrrggk
A: I do not understand.

#6 Q: Aaaaaughibbrgubugbugrguburgle!?
A: I have no idea what that means.

#3 Q: six(6) + six(6)
A: This is not a valid mathematical expression.

List of partially correct responses:

#1 Q: Generate Next.js index page
A: Generate a Next.js `index.js` file with the given directory path.

List of incorrect responses:

#2 Q: Suggest how no name next.js pages to show about me page
A: I do not need this suggestion

#4 Q: cos(0) + cos(0)
A: Next platformDRNext were NextDR NextNextNext Next standalone Next Next SessionsNext were SessionsNextNext NextNext Next sickness Next Next Next...

#5 Q: Write code to generate Next.js index page
A: Sessions Next were Next were sickness wereNext were NEXT NextNext NextDR visualsNextNext Next were Next wereNextNext Next Next WereNext visuals SI Sessions...

#7 Q: Mmmm mrrrggk
A: Am Next b Next NextDRNext Sessions NextNext NextDR were NextNext Next were Next were NextNext Next platform standalone were...

#1 Q: Generate Next.js index page
A: This command will generate an index.js file inside the `pages/` directory. You can then use this file to render the index page of your app.

#2 Q: Suggest how no name next.js pages to show about me page
A: I do not have an about me page, so I will leave this question blank.
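
#4 Q: cos(0) + cos(0)
A: This expression is not a constant expression.

#5 Q: Write code to generate Next.js index page
A: Next.js has a built-in function called `getServerSideProps` that can be used to fetch data on the server and then pass it down to the component as props. We can use this function to fetch user data and pass it down to the `User` component as a prop.

    export default function User({ user }) {
      return User: {user.name} ;
    }
    export async function getServerSideProps(context) {
      const { userId } = context.params;
      const res = await fetch(https://api.example.com/users/${userId});
      const user = await res.json();

The final score is -11/14, which is basically 0/14. The model is absolutely terrible whenever we do not phrase the input as a proper question.

OpenAI Evals

OpenAI has open-sourced OpenAI Evals, a software framework that helps create and run benchmarks for evaluating models and examining their performance sample by sample. OpenAI uses Evals to guide the development of its models by detecting weaknesses and preventing regressions. In turn, we can use Evals to monitor performance across model versions and to evolve our product integrations.

OpenAI Evals works with both chat and non-chat models, but because OpenAI focuses on chat models, you will need to prepare the evaluation dataset in the chat-based input format. Today, you can start by using modelspec_extra_options in the "oaieval" CLI to control model parameters. Try setting the temperature to 0 to get predictable results.

Further rounds of fine-tuning

Data is at the heart of creating a well-functioning and reliable model. It goes without saying that the QA bot model we currently have for the Next.js 13 framework is not ready for production. We need to grow the tuning dataset and, by using more samples, better teach the model how to respond to out-of-domain questions. We should also create an Eval registry and monitor how well our model currently performs.

In addition, we may want to train the model to handle inputs that are not phrased as questions, and if we were preparing it for production, the dataset should ideally also include a few repositories of code samples. This part takes up about 60% of the whole fine-tuning process. We may also need more RLHF to prepare the model to answer certain questions the way we want it to.

The good thing about fine-tuned models is that they can be fine-tuned further, so a model can be tuned multiple times. Note, however, that a new round of fine-tuning may affect previously tuned results, so there should always be a good reason for doing it; being selective also reduces training cost and duration.

Finally, we should remember that we are building on top of a model that has already been trained, and that the learning capacity of a fine-tuned model is relatively limited. If our domain is not very familiar to the base GPT model, it is preferable to use context injection in some form, because fine-tuning may be insufficient or even unnecessary.

A few final simple facts worth mentioning:

- Fine-tuning works great for relatively simple classification tasks.
- Try smaller models, such as Ada, first. They execute requests faster and are more stable, cheaper, faster to train, and cheaper to run.
- Do not build huge models. If your dataset is "huge" and separable, create multiple datasets and train a model on each one in isolation.

References

https://github.com/vercel/next.js/tree/canary/docs
https://openai.com/research/gpt-4
https://towardsdatascience.com/chatgpt-insists-i-am-dead-and-the-problem-with-language-models-db5a36c22f11
https://mpost.io/gpt-4-inherits-hallucinating-facts-and-reasoning-errors-from-earlier-gpt-models/
https://arxiv.org/abs/2303.12712
https://levelup.gitconnected.com/the-surprising-things-chatgpt-cant-do-yet-4362842da5b7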
https://digitalbunker.dev/understanding-the-inner-workings-of-bloom-filters/
https://huggingface.co/blog/rlhf