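Hey everyone!
Retrieval-augmented generation (RAG) is one of the most common applications built with LLMs. We previously explored how to build a RAG with LangChain. In this post, we'll build one with Microsoft's Semantic Kernel.
To follow along, you'll need an OpenAI API key.
The first step is to initialize Semantic Kernel and tell the kernel that we want to use OpenAI's chat completion and OpenAI's embedding model, which we'll use later to create embeddings. We also tell the kernel that we want a memory store, which in our case is Chroma DB. Note that we also instruct the kernel that this memory store needs to be persistent.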
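# Import paths assume the pre-1.0 Python semantic-kernel package this post was written against.
import os

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion, OpenAITextEmbedding
from semantic_kernel.connectors.memory.chroma import ChromaMemoryStore

# Assumption: the key is read from the OPENAI_API_KEY environment variable.
api_key = os.environ["OPENAI_API_KEY"]

kernel = sk.Kernel()
# Chat completion service used to generate answers
kernel.add_text_completion_service("openai", OpenAIChatCompletion("gpt-4", api_key))
# Embedding service used to turn text into vectors
kernel.add_text_embedding_generation_service("openai-embedding", OpenAITextEmbedding("text-embedding-ada-002", api_key))
# Chroma DB, persisted to disk in the 'mymemories2' directory
kernel.register_memory_store(memory_store=ChromaMemoryStore(persist_directory="mymemories2"))
print("Made two new services attached to the kernel and made a Chroma memory store that's persistent.")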
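In this example, we build a RAG that answers questions about a SWOT analysis we created for a pizza business. To do that, we take the SWOT analysis, generate its embeddings, and store them in a collection named "SWOT" in the persistent data store we created in the previous step.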
import asyncio

strength_questions = [
    "What unique recipes or ingredients does the pizza shop use?",
    "What are the skills and experience of the staff?",
    "Does the pizza shop have a strong reputation in the local area?",
    "Are there any unique features of the shop or its location that attract customers?",
]
weakness_questions = [
    "What are the operational challenges of the pizza shop? (eg, slow service, high staff turnover)",
    "Are there financial constraints that limit growth or improvements?",
    "Are there any gaps in the product offering?",
    "Are there customer complaints or negative reviews that need to be addressed?",
]
opportunities_questions = [
    "Is there potential for new products or services (eg, catering, delivery)?",
    "Are there under-served customer segments or market areas?",
    "Can new technologies or systems enhance the business operations?",
    "Are there partnerships or local events that can be leveraged for marketing?",
]
threats_questions = [
    "Who are the major competitors and what are they offering?",
    "Are there potential negative impacts due to changes in the local area (eg, construction, closure of nearby businesses)?",
    "Are there economic or industry trends that could impact the business negatively (eg, increased ingredient costs)?",
    "Is there any risk due to changes in regulations or legislation (eg, health and safety, employment)?",
]

strengths = [
    "Unique garlic pizza recipe that wins top awards",
    "Owner trained in Sicily at some of the best pizzerias",
    "Strong local reputation",
    "Prime location on university campus",
]
weaknesses = [
    "High staff turnover",
    "Floods in the area damaged the seating areas that are in need of repair",
    "Absence of popular calzones from menu",
    "Negative reviews from younger demographic for lack of hip ingredients",
]
opportunities = [
    "Untapped catering potential",
    "Growing local tech startup community",
    "Unexplored online presence and order capabilities",
    "Upcoming annual food fair",
]
threats = [
    "Competition from cheaper pizza businesses nearby",
    "There's nearby street construction that will impact foot traffic",
    "Rising cost of cheese will increase the cost of pizzas",
    "No immediate local regulatory changes but it's election season",
]

print("✅ SWOT analysis for the pizza shop is resident in native memory")

memoryCollectionName = "SWOT"

# Let's put these in memory / the vector store.
# Each record pairs a question with its answer and gets a stable id such as "strength-0".
async def run_storeinmemory_async():
    for i in range(len(strengths)):
        await kernel.memory.save_information_async(
            memoryCollectionName, id=f"strength-{i}",
            text=f"Internal business strength (S in SWOT) that makes customers happy and satisfied Q&A: Q: {strength_questions[i]} A: {strengths[i]}")
    for i in range(len(weaknesses)):
        await kernel.memory.save_information_async(
            memoryCollectionName, id=f"weakness-{i}",
            text=f"Internal business weakness (W in SWOT) that makes customers unhappy and dissatisfied Q&A: Q: {weakness_questions[i]} A: {weaknesses[i]}")
    for i in range(len(opportunities)):
        await kernel.memory.save_information_async(
            memoryCollectionName, id=f"opportunity-{i}",
            text=f"External opportunity (O in SWOT) for the business to gain entirely new customers Q&A: Q: {opportunities_questions[i]} A: {opportunities[i]}")
    for i in range(len(threats)):
        await kernel.memory.save_information_async(
            memoryCollectionName, id=f"threat-{i}",
            text=f"External threat (T in SWOT) to the business that impacts its survival Q&A: Q: {threats_questions[i]} A: {threats[i]}")

# In a Jupyter notebook, use `await run_storeinmemory_async()` instead of asyncio.run().
asyncio.run(run_storeinmemory_async())
print("😶🌫️ Embeddings for SWOT have been generated and stored in vector db")
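To build some intuition for what a vector store does with these embeddings, here is a minimal sketch of similarity search in plain Python. It is not how Chroma or Semantic Kernel are implemented. The three-dimensional vectors are made up for illustration (real embeddings from text-embedding-ada-002 have far more dimensions), and the `search` helper and `toy_store` are hypothetical names. The `limit` and `min_relevance_score` parameters mirror the ones passed to the search call later in this post.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(store, query_vec, limit=5, min_relevance_score=0.5):
    """Return up to `limit` (score, id) pairs scoring at least the threshold, best first."""
    scored = [(cosine_similarity(vec, query_vec), record_id)
              for record_id, vec in store.items()]
    hits = [hit for hit in scored if hit[0] >= min_relevance_score]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits[:limit]


# Made-up toy embeddings keyed by ids shaped like the ones stored above.
toy_store = {
    "opportunity-0": [0.9, 0.1, 0.0],
    "threat-2": [0.1, 0.9, 0.0],
    "strength-3": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]  # pretend embedding of a question about revenue

results = search(toy_store, toy_query, limit=2)
for score, record_id in results:
    print(f"{record_id}: {score:.3f}")
```

With these toy vectors, only the record pointing in nearly the same direction as the query clears the 0.5 threshold; the other two are filtered out before the `limit` is applied.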
Now that the embeddings of our data are stored in the Chroma vector store, we can ask questions about the pizza business and get answers.
# Ask questions about the SWOT analysis
potential_question = "What are the easiest ways to make more money?"

async def run_askquestions_async():
    memories = await kernel.memory.search_async(
        memoryCollectionName, potential_question, limit=5, min_relevance_score=0.5)
    print(f"### ❓ Potential question: {potential_question}")
    related_memory = None
    # Number the results from 1 and remember the text of the top hit.
    for counter, memory in enumerate(memories, start=1):
        if counter == 1:
            related_memory = memory.text
        print(f" > 🧲 Similarity result {counter}:\n >> ID: {memory.id}\n Text: {memory.text} Relevance: {memory.relevance}\n")
    return related_memory

# In a Jupyter notebook, use `await run_askquestions_async()` instead of asyncio.run().
asyncio.run(run_askquestions_async())
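The search above covers the retrieval half of RAG. The generation half usually means placing the retrieved text into a prompt for the chat model. Here is a minimal sketch of that step in plain Python. The `build_rag_prompt` helper, its prompt wording, and the `max_context_items` parameter are my own illustration, not part of Semantic Kernel, and the example memories are shortened versions of the records stored earlier.

```python
def build_rag_prompt(question, retrieved_texts, max_context_items=3):
    """Assemble a prompt that grounds the model's answer in retrieved memories."""
    context_items = retrieved_texts[:max_context_items]
    context = "\n".join(f"- {text}" for text in context_items)
    return (
        "Answer the question using only the context below. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Shortened stand-ins for the memory texts a search might return.
example_memories = [
    "Q: Is there potential for new products or services (eg, catering, delivery)? "
    "A: Untapped catering potential",
    "Q: Are there under-served customer segments or market areas? "
    "A: Growing local tech startup community",
]

prompt = build_rag_prompt(
    "What are the easiest ways to make more money?", example_memories)
print(prompt)
```

The resulting string can then be sent to whichever chat completion service you registered on the kernel. Capping the number of context items keeps the prompt short and limits how much loosely related text the model sees.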
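This is a simplified version of building a RAG with Semantic Kernel. The most popular framework for building with LLMs right now is LangChain, and we've previously seen how to build a RAG with it. LangChain keeps gaining popularity, but as more companies build tooling, more sophisticated tools will come to market, and I find that Semantic Kernel has some special features that make it stand out.
That's it for Day 18 of 100 Days of AI.
I write a newsletter called Above Average, where I talk about the second-order insights behind everything that is happening in big tech. If you're in tech and don't want to be average, subscribe to it.
Follow me on Twitter, LinkedIn, or HackerNoon for the latest updates on 100 Days of AI, or bookmark this page. If you're in tech, you might be interested in joining my community of tech professionals.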