Learn how to build an AI agent for research paper retrieval, search, and summarization. For researchers, keeping up with the latest findings often means searching for a needle in a haystack. Imagine an AI-powered assistant that not only fetches the most relevant papers but also summarizes their key insights and answers your specific questions about them, all in real time.

TL;DR: Build an agentic AI research assistant using Superlinked's vector search. It replaces heavyweight RAG-style re-ranking by embedding and querying documents directly, making retrieval faster, simpler, and smarter. (Want to go straight to the code? Check out the notebook on GitHub. Ready to try semantic search for your own agentic use case? We are here to help.) Find it open source on GitHub.

The article below shows how to build an agent system that uses a Kernel Agent to route queries. If you'd rather open and run the code in your browser, here's the .
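To make the TL;DR concrete before diving in: the idea of blending recency into the relevance score itself, rather than re-ranking a large candidate set afterwards, can be sketched in a few lines of plain Python. Everything below is a toy stand-in for illustration, not Superlinked's actual scoring; the function names and the linear decay are hypothetical.

```python
from datetime import datetime, timezone

def recency_score(published: datetime, now: datetime,
                  max_age_days: float = 3 * 365,
                  negative_filter: float = -0.25) -> float:
    """Toy recency: decay from 1.0 (just published) to 0.0 at max_age_days;
    anything older than the window gets a flat penalty instead."""
    age_days = (now - published).days
    if age_days > max_age_days:
        return negative_filter
    return 1.0 - age_days / max_age_days

def combined_score(text_similarity: float, published: datetime, now: datetime,
                   relevance_weight: float = 1.0,
                   recency_weight: float = 0.5) -> float:
    # Similarity and recency are blended in a single pass at query time,
    # so no second re-ranking stage over a large candidate set is needed.
    return (relevance_weight * text_similarity
            + recency_weight * recency_score(published, now))

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
fresh = combined_score(0.8, datetime(2024, 7, 1, tzinfo=timezone.utc), now)
stale = combined_score(0.8, datetime(1993, 8, 1, tzinfo=timezone.utc), now)
assert fresh > stale  # equal text relevance, but the newer paper wins
```

With equal text similarity, the older paper falls behind purely because of the recency term, which is exactly the behavior the article builds with RecencySpace below.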
Why build an agentic retrieval system in the first place? Building one usually involves complexity and trade-offs. Retrieval systems typically fetch a large initial candidate set based on relevance and then apply a secondary re-ranking step to refine the results. While re-ranking improves precision, it significantly increases computational cost, latency, and overhead because of the extensive data retrieval performed up front. Superlinked addresses this by combining numeric and categorical embeddings with semantic text embeddings, producing complete multimodal vectors.

Building the agent system with Superlinked

The AI agent can do three main things:

Find Papers: Search for research papers by topic (e.g. "quantum computing") and rank them by relevance and recency.
Summarize Papers: Condense the retrieved papers into bite-size insights.
Answer Questions: Extract answers to targeted questions directly from specific research papers.

The system uses Superlinked's RecencySpace, which encodes temporal metadata and prioritizes recent documents at query time, removing the need for computationally expensive re-ranking. For example, if two papers have identical relevance, the more recently published one ranks higher.

Step 1: Set up the toolbox

%pip install superlinked

To keep things simple and modular, I created an abstract Tool class.
This streamlines the process of defining and adding new tools:

import os
from abc import ABC, abstractmethod
from datetime import timedelta
from typing import Any, Dict, Optional

import pandas as pd
import superlinked.framework as sl
from openai import OpenAI
from tqdm import tqdm
from google.colab import userdata

# Abstract Tool class: every tool exposes a name, a description, and a use() method
class Tool(ABC):
    @abstractmethod
    def name(self) -> str:
        pass

    @abstractmethod
    def description(self) -> str:
        pass

    @abstractmethod
    def use(self, *args, **kwargs) -> Any:
        pass

# Get the API key from Google Colab secrets, falling back to the environment
try:
    api_key = userdata.get('OPENAI_API_KEY')
except KeyError:
    api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise ValueError(
        "OPENAI_API_KEY not found. Add it using Tools > User secrets, "
        "or set the OPENAI_API_KEY environment variable."
    )

# Initialize the OpenAI client
client = OpenAI(api_key=api_key)
model = "gpt-4"

Step 2: Load the dataset

For this example we use a dataset containing roughly 10,000 AI research papers. To keep things simple, just run the cell below and the data is downloaded automatically into your working directory. You can of course use your own data sources instead, such as your own research papers or other domain documents. If you choose to do so, all you need to change is the schema definition and the column names accordingly.

import pandas as pd
!wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=1FCR3TW5yLjGhEmm-Uclw0_5PWVEaLk1j' -O arxiv_ai_data.csv

For now, to keep things running quickly, we work with a smaller subset of the papers for prototyping, but feel free to repeat the example with the full dataset.
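If you substitute your own corpus, the only change needed (as noted above) is aligning your column names with the four the walkthrough relies on. A minimal pandas sketch, using entirely hypothetical source column names:

```python
import io
import pandas as pd

# Hypothetical custom corpus whose columns don't match the names
# this walkthrough expects (entry_id, published, title, summary).
raw = io.StringIO(
    "doc_id,date,name,abstract\n"
    "doc-1,1993-08-01 00:00:00+00:00,Learning Logic Programs,A study of cut.\n"
)
custom_df = pd.read_csv(raw)

# Rename to the column names used by the schema later in the article.
custom_df = custom_df.rename(columns={
    "doc_id": "entry_id",
    "date": "published",
    "name": "title",
    "abstract": "summary",
})

# Same preprocessing as the main pipeline: datetime conversion plus a
# combined 'text' column for similarity search.
custom_df["published"] = pd.to_datetime(custom_df["published"])
custom_df["text"] = custom_df["title"] + " " + custom_df["summary"]

assert list(custom_df.columns) == ["entry_id", "published", "title", "summary", "text"]
```

After this, the rest of the pipeline (schema, spaces, index) applies unchanged.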
One important technical detail here is that the timestamps in the dataset are converted from string timestamps (such as '1993-08-01 00:00:00+00:00') to pandas datetime objects. This conversion is necessary because it enables date/time operations.

df = pd.read_csv('arxiv_ai_data.csv').head(100)

# Convert to datetime but keep it as datetime (more readable and usable)
df['published'] = pd.to_datetime(df['published'])

# Ensure summary is a string
df['summary'] = df['summary'].astype(str)

# Add 'text' column for similarity search
df['text'] = df['title'] + " " + df['summary']

Debug: Columns in original DataFrame: ['authors', 'categories', 'comment', 'doi', 'entry_id', 'journal_ref', 'pdf_url', 'primary_category', 'published', 'summary', 'title', 'updated']

Understanding the dataset columns

Below are the key columns in our dataset, which matter for the steps that follow:

title: the title of the research paper.
summary: the paper's abstract, which provides a concise overview.
entry_id: the unique arXiv ID of each paper.

For this walkthrough we rely on exactly four columns: entry_id, published, title, and summary. To improve retrieval quality, title and summary are combined into a single, comprehensive text column, which forms the basis of our embedding and search process.

Superlinked's in-memory indexer: Superlinked's in-memory indexing stores our dataset directly in RAM, which makes retrieval extremely fast, ideal for real-time querying and rapid prototyping. For this proof of concept with 1,000 documents, the in-memory approach keeps query performance high by avoiding the latency of disk access.

Step 3: Define the Superlinked schema

Next, we need a schema to structure our data.
We define a PaperSchema with the key fields:

class PaperSchema(sl.Schema):
    text: sl.String
    published: sl.Timestamp  # This will handle datetime objects properly
    entry_id: sl.IdField
    title: sl.String
    summary: sl.String

paper = PaperSchema()

Defining Superlinked spaces for effective retrieval

A crucial step in organizing and searching our dataset effectively is defining two specialized vector spaces: TextSimilaritySpace and RecencySpace.

TextSimilaritySpace is designed to encode textual information, such as the titles and abstracts of research papers, into vectors. By converting text into embeddings, this space greatly improves the ease and accuracy of semantic search. It is optimized specifically to handle longer text sequences efficiently, allowing precise similarity comparisons across documents.

text_space = sl.TextSimilaritySpace(
    text=sl.chunk(paper.text, chunk_size=200, chunk_overlap=50),
    model="sentence-transformers/all-mpnet-base-v2"
)

RecencySpace captures temporal metadata, emphasizing how recently a paper was published. By encoding timestamps, this space gives greater weight to newer documents. As a result, retrieval results naturally balance content relevance with publication dates, favoring recent insights.

recency_space = sl.RecencySpace(
    timestamp=paper.published,
    period_time_list=[
        sl.PeriodTime(timedelta(days=365)),      # papers within 1 year
        sl.PeriodTime(timedelta(days=2*365)),    # papers within 2 years
        sl.PeriodTime(timedelta(days=3*365)),    # papers within 3 years
    ],
    negative_filter=-0.25
)

Think of RecencySpace as a time-based filter, similar to sorting your email by date or browsing Instagram posts with the newest first. It answers the question, 'How fresh is this paper?' Smaller timedeltas (such as 365 days) allow more granular, time-based rankings.
Larger timedeltas (such as 1095 days) create broader time buckets. To see how negative_filter works, consider the example below, in which two papers have identical content relevance but different publication dates.

Paper A: Published in 1996
Paper B: Published in 1993

Scoring example:
- Text similarity score: Both papers get 0.8
- Recency score:
    - Paper A: Receives the full recency boost (1.0)
    - Paper B: Gets penalized (-0.25 due to negative_filter)

Final combined scores:
- Paper A: Higher final rank
- Paper B: Lower final rank

These spaces make the dataset both searchable and efficient. They balance content relevance against recency, providing an adjustable way to explore the dataset by both topic and publication date.

Step 4: Build the index

Next, the spaces are combined into an index, the backbone of the search engine:

paper_index = sl.Index([text_space, recency_space])

Then the DataFrame is mapped to the schema and ingested in batches (10 papers at a time) into the in-memory store:

# Parser to map DataFrame columns to schema fields
parser = sl.DataFrameParser(
    paper,
    mapping={
        paper.entry_id: "entry_id",
        paper.published: "published",
        paper.text: "text",
        paper.title: "title",
        paper.summary: "summary",
    }
)

# Set up in-memory source and executor
source = sl.InMemorySource(paper, parser=parser)
executor = sl.InMemoryExecutor(sources=[source], indices=[paper_index])
app = executor.run()

# Load the DataFrame with a progress bar using batches
batch_size = 10
data_batches = [df[i:i + batch_size] for i in range(0, len(df), batch_size)]
for batch in tqdm(data_batches, total=len(data_batches), desc="Loading Data into Source"):
    source.put([batch])

The in-memory executor is where Superlinked shines: 1,000 papers fit comfortably in RAM, and queries run
without disk I/O bottlenecks.

Step 5: Craft the query

Next comes the query definition. This is where the query template used for retrieval is created. To support this, we define a template that weights both relevance and recency. Here is what it looks like:

# Define the query
knowledgebase_query = (
    sl.Query(
        paper_index,
        weights={
            text_space: sl.Param("relevance_weight"),
            recency_space: sl.Param("recency_weight"),
        }
    )
    .find(paper)
    .similar(text_space, sl.Param("search_query"))
    .select(paper.entry_id, paper.published, paper.text, paper.title, paper.summary)
    .limit(sl.Param("limit"))
)

With this, you can choose whether to prioritize relevance (relevance_weight) or recency (recency_weight), a flexible combination that adapts to our users' needs.

Step 6: Build the tools

Now we get to the tooling part. We build three tools.

The Retrieval Tool: This tool is wired into Superlinked's index, allowing it to pull the top 5 papers matching a query. It balances relevance (1.0 weight) and recency (0.5 weight) to accomplish the "find papers" goal. What we want is to find the papers that are relevant to the query. So if the query is "What quantum computing papers were published between 1993 and 1994?", the retrieval tool retrieves those papers, summarizes them one by one, and returns the results.

class RetrievalTool(Tool):
    def __init__(self, df, app, knowledgebase_query, client, model):
        self.df = df
        self.app = app
        self.knowledgebase_query = knowledgebase_query
        self.client = client
        self.model = model

    def name(self) -> str:
        return "RetrievalTool"

    def description(self) -> str:
        return "Retrieves a list of relevant papers based on a query using Superlinked."
    def use(self, query: str) -> pd.DataFrame:
        result = self.app.query(
            self.knowledgebase_query,
            relevance_weight=1.0,
            recency_weight=0.5,
            search_query=query,
            limit=5
        )
        df_result = sl.PandasConverter.to_pandas(result)
        # Ensure summary is a string
        if 'summary' in df_result.columns:
            df_result['summary'] = df_result['summary'].astype(str)
        else:
            print("Warning: 'summary' column not found in retrieved DataFrame.")
        return df_result

Next comes the Summarization Tool. This tool is designed for situations where a condensed summary of one or more papers is needed. To use it, a paper_ids parameter must be provided, identifying the papers in question. If no paper_ids are passed, the tool cannot operate, since it needs those IDs to locate the relevant documents in the dataset.

class SummarizationTool(Tool):
    def __init__(self, df, client, model):
        self.df = df
        self.client = client
        self.model = model

    def name(self) -> str:
        return "SummarizationTool"

    def description(self) -> str:
        return "Generates a concise summary of specified papers using an LLM."

    def use(self, query: str, paper_ids: list) -> str:
        papers = self.df[self.df['entry_id'].isin(paper_ids)]
        if papers.empty:
            return "No papers found with the given IDs."
        summaries = papers['summary'].tolist()
        summary_str = "\n\n".join(summaries)
        prompt = f"""
        Summarize the following paper summaries:\n\n{summary_str}\n\nProvide a concise summary.
        """
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=500
        )
        return response.choices[0].message.content.strip()

Finally, we build the QuestionAnsweringTool. This tool chains the RetrievalTool to fetch relevant papers and then uses them to answer questions.
If the retrieved papers cannot answer the question, the tool responds based on general knowledge instead.

class QuestionAnsweringTool(Tool):
    def __init__(self, retrieval_tool, client, model):
        self.retrieval_tool = retrieval_tool
        self.client = client
        self.model = model

    def name(self) -> str:
        return "QuestionAnsweringTool"

    def description(self) -> str:
        return "Answers questions about research topics using retrieved paper summaries or general knowledge if no specific context is available."

    def use(self, query: str) -> str:
        df_result = self.retrieval_tool.use(query)
        if 'summary' not in df_result.columns:
            # Tag as a general question if summary is missing
            prompt = f"""
            You are a knowledgeable research assistant. This is a general question tagged as [GENERAL].
            Answer based on your broad knowledge, not limited to specific paper summaries.
            If you don't know the answer, provide a brief explanation of why.

            User's question: {query}
            """
        else:
            # Use paper summaries for specific context
            contexts = df_result['summary'].tolist()
            context_str = "\n\n".join(contexts)
            prompt = f"""
            You are a research assistant. Use the following paper summaries to answer the user's question.
            If you don't know the answer based on the summaries, say 'I don't know.'

            Paper summaries:
            {context_str}

            User's question: {query}
            """
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=500
        )
        return response.choices[0].message.content.strip()

Step 7: Build the Kernel Agent

Next comes the Kernel Agent. It serves as the central controller, keeping the system's operations coordinated and coherent. In a setup where multiple agents run concurrently, the Kernel Agent would route communication between them according to their needs. In a single-agent system like ours, it simply invokes the appropriate tool directly, based on the query.
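The Kernel Agent's job of routing each query to exactly one tool can be sketched independently of the LLM. The keyword rules below are a hypothetical stand-in for the GPT-4 classification call that the actual implementation uses, and the lambda "tools" are placeholders:

```python
def classify_query(query: str) -> str:
    """Crude stand-in for the LLM classifier: route on surface keywords."""
    q = query.lower()
    if "summarize" in q or "summary of" in q:
        return "summarization"
    if q.startswith(("find", "list", "search")):
        return "retrieval"
    return "question_answering"

def route(query: str, tools: dict) -> str:
    # Dispatch to whichever tool is registered for the predicted category.
    return tools[classify_query(query)](query)

# Placeholder tools that just tag the query they receive.
tools = {
    "retrieval": lambda q: f"[retrieval] {q}",
    "summarization": lambda q: f"[summarization] {q}",
    "question_answering": lambda q: f"[qa] {q}",
}

assert classify_query("Find papers on quantum computing") == "retrieval"
assert classify_query("Summarize paper 123") == "summarization"
assert route("What is AI ethics?", tools) == "[qa] What is AI ethics?"
```

The real KernelAgent follows the same classify-then-dispatch shape, but delegates the classification to an LLM so the routing is robust to arbitrary phrasings.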
class KernelAgent:
    def __init__(self, retrieval_tool: RetrievalTool, summarization_tool: SummarizationTool, question_answering_tool: QuestionAnsweringTool, client, model):
        self.retrieval_tool = retrieval_tool
        self.summarization_tool = summarization_tool
        self.question_answering_tool = question_answering_tool
        self.client = client
        self.model = model

    def classify_query(self, query: str) -> str:
        prompt = f"""
        Classify the following user prompt into one of the three categories:
        - retrieval: The user wants to find a list of papers based on some criteria (e.g., 'Find papers on AI ethics from 2020').
        - summarization: The user wants to summarize a list of papers (e.g., 'Summarize papers with entry_id 123, 456, 789').
        - question_answering: The user wants to ask a question about research topics and get an answer (e.g., 'What is the latest development in AI ethics?').

        User prompt: {query}

        Respond with only the category name (retrieval, summarization, question_answering).
        If unsure, respond with 'unknown'.
        """
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=10
        )
        classification = response.choices[0].message.content.strip().lower()
        print(f"Query type: {classification}")
        return classification

    def process_query(self, query: str, params: Optional[Dict] = None) -> str:
        query_type = self.classify_query(query)
        if query_type == 'retrieval':
            df_result = self.retrieval_tool.use(query)
            response = "Here are the top papers:\n"
            for i, row in df_result.iterrows():
                # Ensure summary is a string and handle empty cases
                summary = str(row['summary']) if pd.notna(row['summary']) else ""
                response += f"{i+1}. {row['title']} \nSummary: {summary[:200]}...\n\n"
            return response
        elif query_type == 'summarization':
            if not params or 'paper_ids' not in params:
                return "Error: Summarization query requires a 'paper_ids' parameter with a list of entry_ids."
            return self.summarization_tool.use(query, params['paper_ids'])
        elif query_type == 'question_answering':
            return self.question_answering_tool.use(query)
        else:
            return "Error: Unable to classify query as 'retrieval', 'summarization', or 'question_answering'."

At this point, all the components of the Research Agent System have been assembled. The system is initialized by equipping the Kernel Agent with the appropriate tools, after which the Research Agent System is fully operational.

retrieval_tool = RetrievalTool(df, app, knowledgebase_query, client, model)
summarization_tool = SummarizationTool(df, client, model)
question_answering_tool = QuestionAnsweringTool(retrieval_tool, client, model)

# Initialize KernelAgent
kernel_agent = KernelAgent(retrieval_tool, summarization_tool, question_answering_tool, client, model)

Let's test the system.

# Test query
print(kernel_agent.process_query("Find papers on quantum computing in last 10 years"))

This query runs the RetrievalTool. It fetches the papers most relevant to the query, ranked by relevance and recency, and returns their summaries. Since the results include the summary column (indicating the papers came from the dataset), it returns those summaries along with the titles.

Query type: retrieval
Here are the top papers:
1. Quantum Computing and Phase Transitions in Combinatorial Search
Summary: We introduce an algorithm for combinatorial search on quantum computers that is capable of significantly concentrating amplitude into solutions for some NP search problems, on average. This is done by...

1. The Road to Quantum Artificial Intelligence
Summary: This paper overviews the basic principles and recent advances in the emerging field of Quantum Computation (QC), highlighting its potential application to Artificial Intelligence (AI). The paper provi...

1.
Solving Highly Constrained Search Problems with Quantum Computers
Summary: A previously developed quantum search algorithm for solving 1-SAT problems in a single step is generalized to apply to a range of highly constrained k-SAT problems. We identify a bound on the number o...

1. The model of quantum evolution
Summary: This paper has been withdrawn by the author due to extremely unscientific errors....

1. Artificial and Biological Intelligence
Summary: This article considers evidence from physical and biological sciences to show machines are deficient compared to biological systems at incorporating intelligence. Machines fall short on two counts: fi...

Let's try another query. This time, a summarization one:

print(kernel_agent.process_query("Summarize this paper", params={"paper_ids": ["http://arxiv.org/abs/cs/9311101v1"]}))

Query type: summarization
This paper discusses the challenges of learning logic programs that contain the cut predicate (!). Traditional learning methods cannot handle clauses with cut because it has a procedural meaning. The proposed approach is to first generate a candidate base program that covers positive examples, and then make it consistent by inserting cut where needed. Learning programs with cut is difficult due to the need for intensional evaluation, and current induction techniques may need to be limited to purely declarative logic languages.

Hopefully this example proves useful as you build AI agents and agent-based systems. Much of the retrieval functionality shown here is powered by Superlinked, so please consider starring the Superlinked repo for future reference whenever precise retrieval capabilities could strengthen your AI agents!

Key takeaways

The combination of semantic and temporal relevance eliminates expensive re-ranking while maintaining high-quality research retrieval.
Time-based penalties (negative_filter=-0.25) effectively separate otherwise equally relevant papers when their topics match but their publication dates differ.

The modular, tool-based architecture lets individual components handle distinct tasks (retrieval, summarization, question answering) while maintaining overall system cohesion.

Ingesting data in small batches (batch_size=10) with progress tracking keeps loading robust when working with research datasets.

Adjustable query weights let users tune the balance between relevance (1.0) and recency (0.5) to match specific research needs.

The question-answering component gracefully falls back to general knowledge when paper-specific context is unavailable, avoiding dead-end user experiences.

Keeping up with the large volume of research papers published regularly takes real effort and time. An agentic AI assistant workflow that can quickly retrieve relevant papers, distill their key insights, and answer targeted questions about them can dramatically streamline this process.

Contributors

Vipul Maheshwari, author
Filip Makraduli, author