Speedrun Your RAG: Build an AI Recommender for Your Steam Library

A smarter, faster way to search your Steam library

You know the feeling. You're looking for a game that's strategic and co-op, maybe with a sci-fi theme, and you get back a wall of titles that only sort of match. What you actually wanted was a short list that captures the vibe behind your words. In this guide we show how to build exactly that by pairing Superlinked with LlamaIndex. The result is a custom Steam game retriever that understands genre plus description plus tags, and serves answers in milliseconds (actual latency depends on data volume).

Want to see this on your own data, with real queries and latency numbers? Get in touch.

TL;DR

- Custom retrievers give you control over domain context, metadata, and ranking logic. They outperform generic similarity search when queries are messy or jargon-heavy.
- Superlinked combines multiple text fields into a single semantic space and runs queries in memory for fast results.
- LlamaIndex provides a clean retriever interface that plugs straight into query engines and response synthesis.
- There is an official Superlinked Retriever integration for LlamaIndex that you can import and use; see below.

Superlinked Retriever for LlamaIndex

Superlinked integrates with LlamaIndex through the official SuperlinkedRetriever listed on LlamaHub. You can add Superlinked to an existing LlamaIndex stack with a simple install and "from llama_index.retrievers.superlinked import SuperlinkedRetriever", then drop the retriever into a RetrieverQueryEngine. Learn more on the official integration page. The class and constructor parameters are documented in the LlamaIndex API reference.
```shell
pip install llama-index-retrievers-superlinked
```

```python
from llama_index.retrievers.superlinked import SuperlinkedRetriever

# sl_app: a running Superlinked App
# query_descriptor: a Superlinked QueryDescriptor that describes your query plan
retriever = SuperlinkedRetriever(
    sl_client=sl_app,
    sl_query=query_descriptor,
    page_content_field="text",
    query_text_param="query_text",
    metadata_fields=None,
    top_k=10,
)
nodes = retriever.retrieve("strategic co-op sci fi game")
```

Prefer to build it by hand, or customize the strategy further? Read on.

Why Superlinked + LlamaIndex?

The goal is simple: take Superlinked's strengths in multi-field retrieval and package them so developers can adopt and extend them in real RAG systems. Superlinked lets you define expressive vector spaces and queries that blend fields such as name, description, and genre into a single semantic view. LlamaIndex provides retrieval abstractions, query engines, and response synthesis that slot into apps and agents with minimal glue code. You can also follow along in Google Colab, using the same building blocks from the Superlinked notebook.

Why Custom Retrievers Matter

- Tailored for your domain: Generic retrievers are fine for general use, but they tend to miss subtle cues. Jargon, shorthand, and domain-specific phrasing usually go unrecognized unless your retriever knows what to look for. That is where custom retrievers shine: you can hardwire that context in.
- Works beyond just text: Most real-world data isn't plain text. You'll often have metadata and tags too. In a game recommendation system, for example, we care about more than the game description; we also want to factor in genres, tags, user ratings, and more. Someone searching for a "strategic co-op game with sci-fi elements" won't get far with text-only matching.
- Custom filtering and ranking logic: Sometimes you want to apply your own rules for scoring or filtering. Maybe you want to prioritize newer content
or penalize results that fall below a certain quality threshold. This kind of control lets your retriever reason about relevance instead of relying on vector distance alone.
- Performance optimization: Let's be realistic. General-purpose solutions are built to work "okay" for everyone, not great for you. If you know your data and access patterns, you can tune your retriever to run faster, rank better, and cut unnecessary noise from the results.

Implementation Breakdown

Part 1: Core Dependencies and Imports

```python
import time
import logging
from typing import List

import pandas as pd

from llama_index.core import Settings
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode
from llama_index.llms.openai import OpenAI

import superlinked.framework as sl
```

The import structure reflects our hybrid approach:
- LlamaIndex Core: provides the retrieval abstraction layer.
- Superlinked Framework: handles vector computation and semantic search.
- Pandas: handles data preprocessing and manipulation.

Part 2: Understanding LlamaIndex Custom Retrievers

Before diving into our Superlinked implementation, it's worth understanding how LlamaIndex's custom retriever architecture works and why it is so effective for building domain-specific RAG applications.

The BaseRetriever abstraction

LlamaIndex provides an abstract BaseRetriever class that serves as the foundation for all retrieval operations. The beauty of this design is its simplicity: every custom retriever only needs to implement one core method:

```python
from abc import ABC, abstractmethod
from typing import List

from llama_index.core.schema import NodeWithScore, QueryBundle


# Simplified view of the contract. The real class is llama_index.core.retrievers.BaseRetriever,
# which also provides the public retrieve() method, an async variant, and callback handling.
class BaseRetriever(ABC):
    @abstractmethod
    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes given query."""
```
The key idea here is LlamaIndex's retrieval protocol. Because every retriever honors the same contract, you can plug in different backends or strategies without touching the rest of your system. Let's break down what happens:

- Input: QueryBundle. This is the query object passed into your retriever. At minimum it contains the user's raw query string (e.g., "sci-fi strategy games"), but it can also carry extras such as a precomputed query embedding or custom strings to embed.
- Output: List[NodeWithScore]. The retriever returns a list of nodes (chunks of content, documents, or data entries), each paired with a relevance score. The higher the score, the more relevant the node is to the query. This list is passed downstream to the LLM or to other post-processing steps.
- Processing: backend-agnostic. Here's the nice part: how you get from query to results is entirely up to you. You can use a vector database, a traditional search engine, a REST API, or something handcrafted for your use case. This decouples the retrieval logic from everything else and gives you full control over the retrieval stack. In our case, the backend is Superlinked.
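To make the contract concrete without installing anything, here is a minimal, self-contained sketch of the same pattern in plain Python. Query and ScoredNode are hypothetical stand-ins for QueryBundle and NodeWithScore. Retriever mimics the shape of BaseRetriever (a public retrieve() that delegates to an abstract _retrieve). KeywordRetriever is one made-up backend that scores documents by word overlap. It illustrates the pattern only and is not LlamaIndex's actual implementation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Query:
    """Hypothetical stand-in for LlamaIndex's QueryBundle (raw query string only)."""
    query_str: str


@dataclass
class ScoredNode:
    """Hypothetical stand-in for NodeWithScore: text, a relevance score, and metadata."""
    text: str
    score: float
    metadata: Dict[str, str] = field(default_factory=dict)


class Retriever(ABC):
    """Toy retriever contract: callers use retrieve(), subclasses implement _retrieve()."""

    def retrieve(self, query_str: str) -> List[ScoredNode]:
        # Wrap the raw string in a query object, then delegate to the backend.
        return self._retrieve(Query(query_str))

    @abstractmethod
    def _retrieve(self, query: Query) -> List[ScoredNode]:
        """Return nodes for the query, highest score first."""


class KeywordRetriever(Retriever):
    """Made-up backend: score each document by the fraction of query words it contains."""

    def __init__(self, docs: Dict[str, str], top_k: int = 3) -> None:
        self.docs = docs
        self.top_k = top_k

    def _retrieve(self, query: Query) -> List[ScoredNode]:
        terms = set(query.query_str.lower().split())
        results = []
        for name, text in self.docs.items():
            hits = terms & set(text.lower().split())
            if hits:
                score = len(hits) / len(terms)
                results.append(ScoredNode(f"{name}: {text}", score, {"name": name}))
        # Downstream components expect the most relevant node first.
        results.sort(key=lambda node: node.score, reverse=True)
        return results[: self.top_k]


# Usage with three made-up games.
docs = {
    "Star Tactics": "sci-fi strategy in space",
    "Kitchen Chaos": "co-op cooking party game",
    "Mech Front": "turn-based sci-fi strategy with mechs",
}
for node in KeywordRetriever(docs, top_k=2).retrieve("sci-fi strategy space"):
    print(f"{node.score:.2f} {node.metadata['name']}")
```

Callers only ever touch retrieve(). Swapping KeywordRetriever for a vector-search backend, such as the Superlinked-backed retriever in this guide, would not change any calling code. A subclass that forgets to implement _retrieve cannot even be instantiated.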
Why this matters

This abstraction is clean and powerful. It means you can:
- Combine multiple strategies: use dense vector search and keyword filtering together if needed.
- Run A/B tests easily: compare different retrievers to see which one gives better results for your users.
- Plug into any agent or tool: whether you're building a chatbot, a search UI, or a full agent system, this retriever interface slots in easily.

Think of the retrieval protocol as the API contract between your "retrieval brain" and everything else. As long as you honor it, you're free to innovate however you like behind the scenes.

Plugging Superlinked into LlamaIndex

The SuperlinkedSteamGamesRetriever class is the backbone of our game recommendation tool. We'll start with a quick look at how it's put together, then dive into each part to see what makes it work.

First up is the schema definition. Think of it as the foundation: it keeps everything organized and reliable. Using a Superlinked schema, GameSchema, we declare key fields such as game_number, name, desc_snippet, and genre. This keeps all the game data clean and consistent, and it plugs straight into Superlinked's pipeline so everything flows smoothly.

```python
# Excerpt from _setup_superlinked(); sl is superlinked.framework
class GameSchema(sl.Schema):
    game_number: sl.IdField
    name: sl.String
    desc_snippet: sl.String
    game_details: sl.String
    languages: sl.String
    genre: sl.String
    game_description: sl.String
    original_price: sl.Float
    discount_price: sl.Float
    combined_text: sl.String  # New field for combined text

self.game = GameSchema()
```

Next up is the text similarity space. This is where semantic search happens. It uses the sentence-transformers/all-mpnet-base-v2 model to turn the game data (name, description, genre, and so on) into dense vector representations; in effect, it blends all the text into something the model can understand. This helps the retriever grasp what users mean: if someone is looking for something like "open-world adventure", it can find games that actually fit that vibe, not just the ones containing those exact words.
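For intuition about how a text similarity space ranks items, the sketch below scores a query vector against a few item vectors with cosine similarity. The three-dimensional vectors and game names are made up for illustration. Real all-mpnet-base-v2 embeddings have 768 dimensions and come from the model, not from hand-picked numbers.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: Vector, items: Dict[str, Vector], top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k item names with their cosine similarity to the query."""
    scored = [(name, cosine(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up 3-d "embeddings"; imagine the axes loosely mean exploration, combat, teamwork.
games = {
    "Open World Saga": [0.9, 0.4, 0.1],
    "Arena Brawler": [0.1, 0.9, 0.2],
    "Team Builder": [0.2, 0.1, 0.9],
}
query = [0.8, 0.5, 0.0]  # pretend this encodes "open-world adventure"

for name, score in rank(query, games):
    print(f"{score:.3f} {name}")
```

Cosine similarity compares direction rather than magnitude, so scaling a vector up or down leaves its score unchanged. Items rank high because they point the same way in embedding space, not because they share literal words with the query.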
In code, the text similarity space is a single declaration:

```python
self.text_space = sl.TextSimilaritySpace(
    text=self.game.combined_text,
    model="sentence-transformers/all-mpnet-base-v2",
)
```

So what is the combined text field? It takes separate pieces of data (game name, description, genre, and more) and merges them into a single chunk of text. This gives the model a more complete picture of each game when it is turned into a vector. The result is better recommendations, because the retriever draws on all these details at once instead of looking at each one in isolation.

```python
self.df['combined_text'] = (
    self.df['name'].astype(str) + " " +
    self.df['desc_snippet'].astype(str) + " " +
    self.df['genre'].astype(str) + " " +
    self.df['game_details'].astype(str) + " " +
    self.df['game_description'].astype(str)
)
```

Finally, in-memory execution is what makes it all feel snappy. Thanks to Superlinked's InMemoryExecutor, the retriever can handle queries in real time, with no database round-trip. Whether someone is hunting for a specific genre or just browsing for something new to play, they get fast, accurate recommendations without waiting.

```python
# Set up in-memory source and executor
source = sl.InMemorySource(self.game, parser=parser)
self.executor = sl.InMemoryExecutor(sources=[source], indices=[self.index])
self.app = self.executor.run()

# Load data
source.put([self.df])
```

Put these pieces together and you have the SuperlinkedSteamGamesRetriever: a solid setup for delivering game recommendations that make sense to users. Here is the complete class:

```python
class SuperlinkedSteamGamesRetriever(BaseRetriever):
    """A custom LlamaIndex retriever using Superlinked for Steam games data."""

    def __init__(self, csv_file: str, top_k: int = 10):
        """
        Initialize the retriever with a CSV file path and top_k parameter.

        Args:
            csv_file (str): Path to games_data.csv
            top_k (int): Number of results to return (default: 10)
        """
        # BaseRetriever.__init__ sets up internal state (e.g. the callback manager) used by retrieve()
        super().__init__()
        self.top_k = top_k

        # Load the dataset and ensure all required columns are present
        self.df = pd.read_csv(csv_file)
        print(f"Loaded dataset with {len(self.df)} games")
        print("DataFrame Columns:", list(self.df.columns))
        required_columns = [
            'game_number', 'name', 'desc_snippet', 'game_details', 'languages',
            'genre', 'game_description', 'original_price', 'discount_price'
        ]
        for col in required_columns:
            if col not in self.df.columns:
                raise ValueError(f"Missing required column: {col}")

        # Combine relevant columns into a single field for text similarity
        self.df['combined_text'] = (
            self.df['name'].astype(str) + " " +
            self.df['desc_snippet'].astype(str) + " " +
            self.df['genre'].astype(str) + " " +
            self.df['game_details'].astype(str) + " " +
            self.df['game_description'].astype(str)
        )

        self._setup_superlinked()

    def _setup_superlinked(self):
        """Set up Superlinked schema, space, index, and executor."""
        # Define schema
        class GameSchema(sl.Schema):
            game_number: sl.IdField
            name: sl.String
            desc_snippet: sl.String
            game_details: sl.String
            languages: sl.String
            genre: sl.String
            game_description: sl.String
            original_price: sl.Float
            discount_price: sl.Float
            combined_text: sl.String  # New field for combined text

        self.game = GameSchema()

        # Create text similarity space using the combined_text field
        self.text_space = sl.TextSimilaritySpace(
            text=self.game.combined_text,
            model="sentence-transformers/all-mpnet-base-v2",
        )

        # Create index
        self.index = sl.Index([self.text_space])

        # Map DataFrame columns to schema
        parser = sl.DataFrameParser(
            self.game,
            mapping={
                self.game.game_number: "game_number",
                self.game.name: "name",
                self.game.desc_snippet: "desc_snippet",
                self.game.game_details: "game_details",
                self.game.languages: "languages",
                self.game.genre: "genre",
                self.game.game_description: "game_description",
                self.game.original_price: "original_price",
                self.game.discount_price: "discount_price",
                self.game.combined_text: "combined_text",
            },
        )

        # Set up in-memory source and executor
        source = sl.InMemorySource(self.game, parser=parser)
        self.executor = sl.InMemoryExecutor(sources=[source], indices=[self.index])
        self.app = self.executor.run()

        # Load data
        source.put([self.df])
        print(f"Initialized Superlinked retriever with {len(self.df)} games")

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """
        Retrieve top-k games based on the query string.

        Args:
            query_bundle (QueryBundle): Contains the query string

        Returns:
            List[NodeWithScore]: List of retrieved games with scores
        """
        query_text = query_bundle.query_str

        # Define Superlinked query with explicit field selection
        query = (
            sl.Query(self.index)
            .find(self.game)
            .similar(self.text_space, query_text)
            .select([
                self.game.game_number,
                self.game.name,
                self.game.desc_snippet,
                self.game.game_details,
                self.game.languages,
                self.game.genre,
                self.game.game_description,
                self.game.original_price,
                self.game.discount_price,
            ])
            .limit(self.top_k)
        )

        # Execute query
        result = self.app.query(query)
        df_result = sl.PandasConverter.to_pandas(result)

        # Convert results to NodeWithScore objects
        nodes_with_scores = []
        # enumerate yields 0-based ranks even if the DataFrame index is not 0..n-1
        for rank, (_, row) in enumerate(df_result.iterrows()):
            text = f"{row['name']}: {row['desc_snippet']}"
            metadata = {
                "game_number": row["id"],
                "name": row["name"],
                "desc_snippet": row["desc_snippet"],
                "game_details": row["game_details"],
                "languages": row["languages"],
                "genre": row["genre"],
                "game_description": row["game_description"],
                "original_price": row["original_price"],
                "discount_price": row["discount_price"],
            }
            score = 1.0 - (rank / self.top_k)
            node = TextNode(text=text, metadata=metadata)
            nodes_with_scores.append(NodeWithScore(node=node, score=score))

        return nodes_with_scores


print("✅ SuperlinkedSteamGamesRetriever class defined successfully!")
```

Integration Architecture Deep Dive

Part 3: Superlinked Schema Configuration and Setup

Now it's time to dig into the details, starting with the schema design.
In Superlinked, the schema isn't just about defining data types. It is closer to a formal contract between our data and the underlying vector compute engine. The schema determines how our data is parsed, indexed, and queried, so it's important to get it right. In our SuperlinkedSteamGamesRetriever, the schema is defined as follows:

```python
class GameSchema(sl.Schema):
    game_number: sl.IdField
    name: sl.String
    desc_snippet: sl.String
    game_details: sl.String
    languages: sl.String
    genre: sl.String
    game_description: sl.String
    original_price: sl.Float
    discount_price: sl.Float
    combined_text: sl.String  # New field for combined text

self.game = GameSchema()
```

Let's break down what some of these elements actually do:
- sl.IdField (→ game_number): Think of this as our primary key. It gives each game a unique identity and lets Superlinked index and retrieve items efficiently. In other words, it tells Superlinked how to tell games apart, which matters especially when you're dealing with thousands of records.
- sl.String and sl.Float: These aren't just type hints. They let Superlinked handle each field according to its type. For instance, sl.String fields can be embedded and compared semantically, while sl.Float fields can support numeric filtering or sorting.
- combined_text: This is the semantic anchor of our retriever. It's a synthetic field where we concatenate the game name, description, genre, and other relevant attributes into a single block of text.
This lets us build a single text similarity space on combined_text using sentence-transformer embeddings:

```python
self.text_space = sl.TextSimilaritySpace(
    text=self.game.combined_text,
    model="sentence-transformers/all-mpnet-base-v2",
)
```

Why do this? Users don't just search by genre or name; they describe what they're looking for. By packing all the important signals into combined_text, we can better match fuzzy, natural-language queries to the most relevant games.

Part 4: Vector Space Configuration

```python
# Create text similarity space using the combined_text field
self.text_space = sl.TextSimilaritySpace(
    text=self.game.combined_text,
    model="sentence-transformers/all-mpnet-base-v2",
)

# Create index
self.index = sl.Index([self.text_space])
```

To power semantic search over our Steam games dataset, I made two deliberate design choices that balance performance, simplicity, and flexibility.

First, for the embedding model I chose all-mpnet-base-v2 from the Sentence Transformers library. This model produces 768-dimensional embeddings that strike a solid middle ground: expressive enough to capture rich semantic meaning, yet lightweight enough to be fast in production. It's a reliable general-purpose model, known to perform well across many kinds of text. That matters when your data ranges from short genre tags to long-form game descriptions. I needed a model that wouldn't struggle at either end of that spectrum, and all-mpnet-base-v2 handles both cleanly.

Second, although Superlinked supports multi-space indexing, where you can combine multiple fields or even modalities (such as text and images), I deliberately kept things simple with a single TextSimilaritySpace. I would have added a RecencySpace as well, but the dataset has no release-date information. If we did have release dates, I could plug in a RecencySpace here and rank games by freshness as well as relevance.
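For intuition about what a recency signal would add, here is a hand-rolled sketch of blending a text-similarity score with a freshness score. It assumes a hypothetical release_date field (the tutorial dataset has none) and uses an exponential half-life decay with made-up weights. It is not how Superlinked's RecencySpace computes its scores; it only illustrates weighting two signals against each other.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass
class Candidate:
    name: str
    similarity: float   # text-similarity score in [0, 1], e.g. from a text space
    release_date: date  # hypothetical field: the tutorial dataset has no release dates


def recency_score(released: date, today: date, half_life_days: float = 365.0) -> float:
    """Exponential decay: 1.0 for a game released today, 0.5 after one half-life."""
    age_days = max((today - released).days, 0)
    return 0.5 ** (age_days / half_life_days)


def blended_rank(
    candidates: List[Candidate],
    today: date,
    text_weight: float = 1.0,
    recency_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank by a weighted average of text similarity and recency (weights are made up)."""
    total = text_weight + recency_weight
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    scored = []
    for c in candidates:
        fresh = recency_score(c.release_date, today)
        score = (text_weight * c.similarity + recency_weight * fresh) / total
        scored.append((c.name, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


today = date(2024, 1, 1)
candidates = [
    Candidate("Old Classic", similarity=0.90, release_date=date(2014, 1, 1)),
    Candidate("Fresh Release", similarity=0.80, release_date=date(2023, 7, 1)),
]
print(blended_rank(candidates, today))                    # recency lifts the newer game
print(blended_rank(candidates, today, recency_weight=0))  # pure text similarity
```

With recency_weight set to 0, the ranking falls back to text similarity alone. Raising it trades a little semantic closeness for freshness, which is the kind of knob a multi-space index is meant to expose.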
Part 5: Data Pipeline and Executor Setup

```python
# Map DataFrame columns to schema - Critical for data integrity
parser = sl.DataFrameParser(
    self.game,
    mapping={
        self.game.game_number: "game_number",
        self.game.name: "name",
        self.game.desc_snippet: "desc_snippet",
        self.game.game_details: "game_details",
        self.game.languages: "languages",
        self.game.genre: "genre",
        self.game.game_description: "game_description",
        self.game.original_price: "original_price",
        self.game.discount_price: "discount_price",
        self.game.combined_text: "combined_text",
    },
)

# Set up in-memory source and executor
source = sl.InMemorySource(self.game, parser=parser)
self.executor = sl.InMemoryExecutor(sources=[source], indices=[self.index])
self.app = self.executor.run()

# Load data
source.put([self.df])
print(f"Initialized Superlinked retriever with {len(self.df)} games")
```

At the heart of our retrieval system is a streamlined pipeline built for clarity and speed. It starts with the DataFrameParser, which acts as our ETL layer. It ensures that each field in the dataset is correctly typed and consistently mapped to our schema. In effect, it is the contract between our raw CSV data and the Superlinked indexing layer.

Once the data is structured, I feed it into an InMemorySource, which is ideal for datasets that fit comfortably in memory. This keeps everything fast without introducing storage overhead or network latency. Finally, queries are handled by the InMemoryExecutor. This is what makes Superlinked a good fit for real-time applications such as interactive recommendation systems, where speed directly affects the user experience.

Part 6: The Retrieval Engine

```python
def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
    """
    Retrieve top-k games based on the query string.

    Args:
        query_bundle (QueryBundle): Contains the query string

    Returns:
        List[NodeWithScore]: List of retrieved games with scores
    """
    query_text = query_bundle.query_str

    # Define Superlinked query with explicit field selection
    query = (
        sl.Query(self.index)
        .find(self.game)
        .similar(self.text_space, query_text)
        .select([
            self.game.game_number,
            self.game.name,
            self.game.desc_snippet,
            self.game.game_details,
            self.game.languages,
            self.game.genre,
            self.game.game_description,
            self.game.original_price,
            self.game.discount_price,
        ])
        .limit(self.top_k)
    )

    # Execute query
    result = self.app.query(query)
    df_result = sl.PandasConverter.to_pandas(result)
    # ...continued in Part 7
```

One of the things that makes Superlinked genuinely pleasant to work with is its fluent query builder. If you've used libraries such as SQLAlchemy or the Django ORM, the pattern will feel familiar: each method in the chain adds clarity rather than clutter. Here the query starts from our index, targets the GameSchema with .find(), and defines a similarity search with the .similar() method, which computes cosine similarity in the embedding space. This is what lets us retrieve semantically close games for the user's natural-language query.

Another deliberate design choice: I explicitly select the fields I care about in the result set instead of doing something like SELECT *. It may sound minor, but it keeps the data lean, reduces processing, and avoids passing unnecessary payload around during post-processing. Think of it as precision over bulk, especially when you're moving data between components in a latency-sensitive pipeline.

Part 7: Result Processing and Node Creation

```python
    # Convert to LlamaIndex NodeWithScore format
    nodes_with_scores = []
    # enumerate yields 0-based ranks even if the DataFrame index is not 0..n-1
    for rank, (_, row) in enumerate(df_result.iterrows()):
        text = f"{row['name']}: {row['desc_snippet']}"
        metadata = {
            "game_number": row["id"],
            "name": row["name"],
            "desc_snippet": row["desc_snippet"],
            "game_details": row["game_details"],
            "languages": row["languages"],
            "genre": row["genre"],
            "game_description": row["game_description"],
            "original_price": row["original_price"],
            "discount_price": row["discount_price"],
        }

        # Simple ranking score based on result position
        score = 1.0 - (rank / self.top_k)
        node = TextNode(text=text, metadata=metadata)
        nodes_with_scores.append(NodeWithScore(node=node, score=score))

    return nodes_with_scores
```

Once we get results back from Superlinked, I convert them into a format that works well with LlamaIndex. First, I build a human-readable text string by combining the game's name with its short description. This becomes the content of each node, making it easy for the language model to reason about. It's a small touch, but it improves the relevance and readability of the retrieved data once it reaches the LLM.

Next, I make sure all original fields from the dataset, including genre, pricing, and game details, are kept in the metadata. This matters because downstream processes may want to filter, display, or rank results based on this information. I don't want to lose any useful context once we start working with the retrieved nodes.

Finally, I apply a lightweight score normalisation strategy. Instead of relying on raw similarity scores, each result is scored by its position in the ranked list: score = 1.0 - rank / top_k, so with top_k = 10 the scores run 1.0, 0.9, 0.8, and so on down to 0.1. This keeps things simple and consistent: the top result always has the highest score, and the rest follow in descending order. It's not fancy, and it discards how far apart the raw similarities are, but it gives us a stable, interpretable scoring scheme that behaves the same way across different queries.

Showtime: Running the Pipeline

Now that all the components are in place, it's time to bring our Retrieval-Augmented Generation (RAG) system to life. Below is the end-to-end integration of Superlinked and LlamaIndex in action.

```python
# Assumption: the dataset file is named games_data.csv, as in the retriever's docstring
retriever = SuperlinkedSteamGamesRetriever("games_data.csv", top_k=10)

# get_response_synthesizer() uses Settings.llm; this assumes OPENAI_API_KEY is set
Settings.llm = OpenAI()

# Initialize the RAG pipeline
print("Setting up complete Retrieval pipeline...")

# Create response synthesizer and query engine
response_synthesizer = get_response_synthesizer()
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

print("✅ RAG pipeline configured successfully!")
print("\n" + "=" * 60)
print("FULL RAG PIPELINE DEMONSTRATION")
print("=" * 60)

# Test queries with full RAG responses
test_queries = [
    "I want to find a magic game with spells and wizards",
    "Recommend a fun party game for friends",
    "I'm looking for a strategic sci-fi game",
    "What's a good cooperative game for teamwork?",
]

for i, query in enumerate(test_queries, 1):
    print(f"\nQuery {i}: '{query}'")
    print("-" * 50)
    response = query_engine.query(query)
    print(f"Response: {response}")
    print("\n" + "=" * 50)
```

This setup combines our custom retriever with an LLM-driven response synthesizer. Queries move smoothly through the pipeline, and instead of dumping raw data, the system returns considered suggestions about which games the user might want to play, based on what they asked.

Takeaways

- Custom retrievers let you bake domain rules and jargon into the system.
- Combining multiple text fields into one index improves query understanding.
- In LlamaIndex you only need to implement _retrieve for a custom backend.
- Superlinked's InMemoryExecutor gives real-time latency on moderate datasets.
- Schema choice matters for clean parsing and mapping.
- Simple position-based scoring is a stable default when you want predictable ranks.

If you want a quick chat about where a mixture of encoders or multi-field retrieval fits in your pipeline, talk to one of our engineers!

Integration references: the Superlinked Retriever package on PyPI and the LlamaIndex docs for custom retrievers.

Contributors
Vipul Maheshwari, Author
Filip Makraduli, Author