"Help! Our AI model cost waa ka mid ah dhismaha!" Sida loo yaqaan ChatGPT iyo caawinkii waxay ka dhigi karaa dhismaha dhismaha ah ee loo isticmaali karaa AI, xawaaraha loo isticmaali karaa ee loo isticmaali karaa LLM-ka waa mid ka mid ah sida loo isticmaali karaa API-ka ee web interface. Sida loo yaabaa in ay ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ka mid ah mid ah mid ka mid Sida loo yaabaa in ay ku yaalaa in ay ka mid ah wax soo saarka, waxaa laga yaabaa in ay ka mid ah wax soo saarka iyo wax soo saarka iyo wax soo saarka. To understand it better, let's break this down with a real-world example. Imagine we're building “ResearchIt” (not a real product, but bear with me), an application that helps researchers digest academic papers. Want a quick summary of that dense methodology section? Need to extract key findings from a 50-page paper? Our app has got you covered. Version 1.0: The Naive Approach Versión 1.0: The Naive Approach Waayo, waxaan ku raaxay in la xira OpenAI hype train - version our ugu horeysay waa ugu caawin ah: Shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha ugu horeysay ee shuruudaha. 
Our backend forwards the text to GPT-5 with a prompt like "You are a helpful research assistant. Analyze the following text and deliver insights strictly from the section provided by the user…" It works, and users love it. But as usage grows, our API bill grows with it - at an alarming rate. The problem is that we are paying for GPT-5, the Rolls-Royce of language models, when for many requests a Toyota Corolla would get the job done. Yes, GPT-5 is powerful - a 128K-token context window and strong reasoning - but summarizing a methodology section rarely needs that much firepower, and a smaller, cheaper model could serve a large share of our traffic. So how do you pick the right model for the job? The choice really depends on the following things:

Output Quality: Can the model consistently deliver the accuracy your application needs? Does the model support the language that you want to work with?
Response Speed: Do your users need answers in milliseconds, or can they wait a moment for a better one?
The typical response time your app can tolerate depends on the experience you are building - an interactive chat demands far snappier replies than an overnight batch report.
Data Integrity: How sensitive is your data, and what privacy guarantees do you need?
Resource Provisioning: What is your budget, both for costs and engineering time?

For our summarization-heavy workload, we don't need the biggest model on the market - we need one that is accurate enough, fast enough, and affordable at scale.

Bottom Line: Pick the cheapest model that reliably clears your quality bar, and upgrade only for the queries that truly need more power. So: swap in a right-sized model, send it the paper, and we're done?
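The Version 1.0 backend is little more than one API call per request. A minimal sketch, assuming the OpenAI Python client; the `build_request` helper and its exact prompt wording are illustrative, not part of any library:

```python
# Version 1.0: ship the whole section to a big model on every request.
# build_request() is an illustrative helper, not a real library function.

def build_request(section_text: str, question: str, model: str = "gpt-5") -> dict:
    """Assemble a chat-completion-style payload for the naive approach."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": ("You are a helpful research assistant. Analyze the "
                         "following text and deliver insights strictly from "
                         "the section provided by the user.")},
            {"role": "user",
             "content": f"Section:\n{section_text}\n\nQuestion: {question}"},
        ],
    }

payload = build_request("We train a 12-layer transformer...",
                        "Summarize the method.")
# The actual call would then look something like:
#   from openai import OpenAI
#   resp = OpenAI().chat.completions.create(**payload)
# Every request pays for every token in section_text - the cost problem above.
```

Note that the full `section_text` rides along on every single request, which is exactly why the bill scales with document size rather than question complexity.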
Not so fast. Academic papers are long. Even with GPT-5's generous 128K token limit, sending full documents per query is an expensive overkill. Plus, studies have shown that as context length increases, LLM performance can degrade - models lose track of details buried in the middle of long inputs. So, what's the solution?

Version 2.0: Smarter chunking and retrieval

The question we really need to answer is: how do we feed the model only the parts of a document that are relevant to the user's query? Answer is: Retrieval-Augmented Generation (RAG). Instead of dumping the entire document into the LLM, we retrieve just the chunks that are most relevant to the question and pass only those along. That way we stop paying for tokens the model doesn't need, and the model reasons over focused, relevant context instead of an entire paper.

There are 3 important aspects to consider here: Chunking, Storage, and Chunk Retrieval.
Step 1: Chunking – Splitting documents intelligently

Before anything can be retrieved, each paper has to be split into chunks that are small enough to search precisely but large enough to stay meaningful. Common strategies:

Section-based chunking: Use the document's natural structure (title, abstract, methodology, etc.) to form semantically coherent chunks.
Sliding-window chunking: Fixed-size windows with overlap (e.g., 200-token windows) so that context isn't cut off at chunk boundaries.
Adaptive chunking: Adjust chunk boundaries dynamically based on the content itself, splitting at natural topic shifts rather than at fixed sizes.

Step 2: Intelligent storage and retrieval

Once your document chunks are ready, the next challenge is storing and retrieving them efficiently. With modern LLM applications handling millions of chunks, your storage choice directly impacts performance. Traditional approaches that separate storage and retrieval often fall short. Instead, the storage architecture should be designed with retrieval in mind, as different patterns offer distinct trade-offs for speed, scalability, and flexibility. The conventional distinction of using relational databases for structured data and NoSQL for unstructured data still applies, but with a twist: LLM applications store not just text but semantic representations (embeddings). At small scale, chunks and their embeddings can simply live in a general-purpose database like PostgreSQL or MongoDB. This works for small to medium-scale applications but has clear limitations as data and query volume grow. The challenge here isn't storage, it's the retrieval mechanism. Traditional databases excel at exact matches and range queries, but they weren't built for semantic similarity searches.
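To make the sliding-window strategy from Step 1 concrete, here is a minimal sketch; approximating tokens by whitespace-delimited words is a simplification (a real system would use the model's tokenizer):

```python
# Sliding-window chunking sketch: fixed-size overlapping windows.
# Window/overlap are counted in words here as a stand-in for tokens.

def sliding_window_chunks(text: str, window: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks."""
    words = text.split()
    step = window - overlap          # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break                    # last window already reached the end
    return chunks

chunks = sliding_window_chunks("word " * 500, window=200, overlap=50)
# 500 words, step 150 -> windows at 0, 150, 300; the third reaches the end.
```

The overlap guarantees that a sentence straddling a chunk boundary still appears intact in at least one chunk.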
You'd need to implement additional indexing strategies or use extensions like pgvector, which add approximate nearest neighbor (ANN) indexes such as HNSW (Hierarchical Navigable Small World) and IVF (Inverted File Index) to make similarity search efficient.

Vector Databases

Similarity itself is usually measured with one of 2 main metrics:

Euclidean distance: The straight-line distance between vectors; useful when magnitude carries meaning.
Cosine similarity: The standard for semantic search - it compares the direction of vectors regardless of magnitude.

Dedicated vector databases are purpose-built for exactly this kind of LLM retrieval workload, and they differ in scalability, indexing approach, and operating cost. Managed offerings provide ANN search with automatic scaling that suits dynamic workloads with unpredictable data growth. Self-hosted engines (IVF-based) offer more control and cost-effectiveness at scale, but require careful tuning. pgvector integrated with Postgres enables hybrid search, though it may hit limits under high-throughput workloads. The choice finally depends on workload size, query patterns, and operational constraints.
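At query time, cosine similarity is just a normalized dot product. A minimal sketch of brute-force top-k retrieval over stored embeddings - the exhaustive loop that an ANN index like HNSW or IVF approximates at scale; the tiny 3-dimensional vectors and chunk IDs are illustrative:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Direction-only comparison: dot product over the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force nearest chunks; ANN indexes (HNSW/IVF) approximate this."""
    ranked = sorted(index, key=lambda cid: cosine_similarity(query, index[cid]),
                    reverse=True)
    return ranked[:k]

index = {
    "methods":  [0.9, 0.1, 0.0],
    "results":  [0.7, 0.6, 0.2],
    "appendix": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.2, 0.0], index))  # → ['methods', 'results']
```

Because cosine similarity ignores magnitude, a long chunk and a short chunk about the same topic score similarly - one reason it is the default metric for semantic search.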
Popular options here include dedicated engines such as Weaviate.

Step 3: Smarter Retrieval Strategies

With chunks stored and indexed, the remaining question is how to fetch the right ones for a given query - and naively taking the top similarity hits leaves accuracy on the table. A common challenge in retrieval systems is balancing precision and recall. Keyword-based search (e.g., BM25, TF-IDF) is excellent for finding exact term matches but struggles with semantic understanding. On the other hand, vector search (e.g., FAISS, HNSW, or IVFFlat) excels at capturing semantic relationships but can sometimes return loosely related results that miss crucial keywords. To overcome this, a hybrid retrieval strategy combines the strengths of both methods. This involves:

Retrieving candidates – running both a keyword and vector similarity search in parallel.
Merging results – fusing the two candidate lists into a single ranking, for example via weighted score combination or reciprocal rank fusion.
Reranking for optimal ordering – ensuring the most relevant information appears at the top based on semantic requirements.
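The merge step is often implemented with reciprocal rank fusion (RRF), which needs only the two rank orderings rather than comparable scores. A sketch, where the constant 60 is the conventional RRF default and the document IDs are illustrative:

```python
# Reciprocal rank fusion: each document earns 1/(k + rank) from every list
# it appears in; documents ranked well by BOTH searches float to the top.

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_c", "doc_b"]   # e.g., BM25 ordering
vector_hits  = ["doc_b", "doc_a", "doc_d"]   # e.g., cosine-similarity ordering
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# → ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Note how doc_a and doc_b, each strong in both lists, outrank doc_c and doc_d, which appear in only one - exactly the behavior hybrid retrieval is after.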
Hybrid retrieval narrows the candidates down, but the passages that reach the LLM can still be noisy or incomplete. A few techniques commonly used to refine what the LLM actually consumes:

Semantic Coherence Filtering: From the top-K retrieved passages, the LLM filters out those that don't logically fit the question. By ranking passages for semantic coherence, loosely related results are dropped before generation.
Relevance-Based Reranking: Models such as Cohere Rerank, BGE, or MonoT5 re-evaluate the retrieved documents, capturing relevance patterns that raw similarity scores miss, and reorder them accordingly.
Context Expansion with Iterative Retrieval: Static retrieval can leave gaps in context. The LLM can generate follow-up queries, retrieve additional chunks based on what it has read so far, and repeat until the context is sufficient to answer.

Now, with these updates, our system is better equipped to handle complex questions across multiple sections of a paper, while maintaining accuracy by grounding responses strictly in the provided content. But what happens when a single source isn't enough?
Some questions require synthesizing information across multiple papers or performing calculations using equations from different sources - challenges that pure retrieval can't solve.

Version 3.0 - Building a Comprehensive and Reliable System

As "ResearchIt" gained traction, our researchers grew more ambitious. Summarizing one section of one paper was no longer enough; they wanted answers that cut across many sources at once. The new wave of questions looks like: "Which transformer optimization techniques deliver the best speed improvements, judging from published benchmarks, open-source implementations, and recent findings?" Answering that means comparing numbers reported under different conditions, reconciling conflicting results, and weighing the evidence - analysis, not just lookup.

Multi-Source Reasoning

Despite its strong comprehension abilities, “ResearchIt” 2.0 struggles with two major limitations when reasoning across diverse information sources:

Cross-Sectional Analysis: When answers require both interpretation and computation (e.g., extracting FLOPs or accuracy from tables and comparing them across conditions).
The model must not only extract numbers but also understand context and significance. Cross-Source Synthesis: When relevant data lives across multiple systems - PDFs, experiment logs, GitHub repos, or structured CSVs - and the model must coordinate retrieval, merge conflicting findings, and produce one coherent explanation. These issues aren’t just theoretical. They reflect real-world challenges in AI scalability. As data ecosystems grow more complex, organizations need to move beyond basic retrieval toward reasoned orchestration - systems that can plan, act, evaluate, and continuously adapt. Take our "transformer optimization techniques" question: how would the system actually answer it? It would have to locate the relevant papers and benchmarks, pull the right metrics from each, run the comparisons, and only then assemble a conclusion - no single retrieval pass can do all of that. So, what exactly did we do here? Break down the overarching question into smaller, focused subproblems - which sources to search, what metrics to analyze, and how comparisons should be run.
Then, route each subproblem to the capability best suited for it - retrieval to find sources, extraction to pull out the numbers, computation to run the comparisons. Last, synthesize the partial results into a single coherent, evidence-backed answer. In short, the system now has to plan, delegate, and integrate rather than answer in one shot - and that requires some new machinery.

Step 1: Chain of Thought / Planning

Before a system can delegate work, it has to reason about what the work actually is. This is where Chain of Thought (CoT) comes in. CoT prompting pushes the model to think before it answers: it lays out intermediate reasoning steps, decomposing a complex question into a sequence of smaller, manageable ones that can each be tackled in turn.
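In its simplest form, CoT is just a prompting pattern. A sketch of a planning prompt for a multi-source research question; the template wording is illustrative, not an official or benchmarked prompt:

```python
# Chain-of-Thought planning prompt: ask the model to decompose the question
# before answering. The template text below is illustrative only.

COT_PLANNER = """You are a research planner.
Before answering, think step by step:
1. List the sub-questions needed to answer the user's question.
2. For each sub-question, name the source to consult (papers, benchmarks, code).
3. Only then draft the final answer, citing which step supports each claim.

Question: {question}"""

def build_planning_prompt(question: str) -> str:
    return COT_PLANNER.format(question=question)

prompt = build_planning_prompt(
    "Which transformer optimization techniques give the best speed improvements?")
```

Sent as the first turn, a prompt like this elicits an explicit plan whose steps can then be executed one by one.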
Beyond the linear "chains" of reasoning popularized by frameworks like LangChain, approaches such as Tree of Thought (ToT) and Graph of Thought (GoT) let the model explore several reasoning paths in parallel and backtrack from dead ends, instead of committing to a single line of thought. Of course, adopting these reasoning-heavy models introduces practical considerations - primarily, cost. Running multi-step reasoning chains is computationally expensive, so model choice matters. Current options include:

Closed-source models like OpenAI’s o3 and o4-mini, which offer high reasoning quality and strong orchestration capabilities.
Open-source alternatives such as DeepSeek-R1, which provide transparent reasoning with more flexibility/ engineering effort for customization.

Even non-reasoning LLMs (such as LLaMA 3) can approximate this behavior with careful prompt engineering, though typically with less reliability than models trained to reason step by step.

Step 2: Multi-source workflows- Function Calling to Agents

Breaking down complex problems into logical steps is only half the battle. The system must then coordinate across different specialized tools - each acting as an "expert" - to answer sub-questions, execute tasks, gather data, and refine its understanding through iterative interaction with its environment.
Function calling gives an LLM the ability to invoke external tools rather than simply predict text. You provide the model with a toolkit - for example, functions like search_papers(), extract_table(), and calculate_statistics() - and the model decides which one to call, when to call it, and in what order. Let’s take a simple example:

Query: "Calculate the average accuracy for BERT fine-tuning."

A model equipped with function calling would chain the tools linearly, like this: search_papers("BERT fine-tuning accuracy"), then extract_table() on the results, then calculate_statistics() to produce the final figure. This dummy example of a simple deterministic pipeline where an LLM and a set of tools are orchestrated through predefined code paths is straightforward and effective and can often serve the purpose for a variety of use cases. However, it’s linear and rigid by design. An adaptive agentic workflow might be the better option when flexibility, better task performance and model-driven decision-making are needed at scale (with the tradeoff of latency and cost). Iterative agentic workflows take this a step further: rather than following a fixed script, the system decides at runtime what to do next. Like a human researcher, the model must observe intermediate results, reflect on them, revise its plan, and re-run steps when the data looks off.
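The linear search_papers → extract_table → calculate_statistics pipeline can be sketched with stub tools; every function body here is a hypothetical stand-in for a real search, parsing, or stats implementation:

```python
# A deterministic tool pipeline: the code path is fixed, only the data flows.
# All three tools are hypothetical stubs standing in for real implementations.

def search_papers(query: str) -> list[str]:
    """Stub: would hit a paper index; returns fake document IDs."""
    return ["paper_1", "paper_2"]

def extract_table(doc_id: str) -> list[float]:
    """Stub: would parse a results table; returns fake accuracy values."""
    return {"paper_1": [0.91, 0.89], "paper_2": [0.88]}[doc_id]

def calculate_statistics(values: list[float]) -> float:
    """Average the collected metric values."""
    return sum(values) / len(values)

# "Calculate the average accuracy for BERT fine-tuning."
docs = search_papers("BERT fine-tuning accuracy")
accuracies = [v for d in docs for v in extract_table(d)]
print(round(calculate_statistics(accuracies), 4))  # → 0.8933
```

In an agentic version, the model itself would choose this call order (and could deviate from it); here the order is hard-coded, which is exactly what makes the pipeline cheap and predictable - and rigid.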
For ResearchIt, this means an orchestrator coordinating a team of specialized agents, each with a distinct role:

Retrieval Agent: The information scout. It expands the initial query, runs both semantic and keyword searches across research papers, APIs, github repos, and structured datasets, ensuring that no relevant source is overlooked.
Extraction Agent: The data wrangler. It parses PDFs, tables, and JSON outputs, then standardizes the extracted data - normalizing metrics, reconciling units, and preparing clean inputs for downstream analysis.
Computation Agent: The analyst. It performs the necessary calculations, statistical tests, and consistency checks to quantify trends and verify that the extracted data makes sense.
Validation Agent: The quality gatekeeper. It identifies anomalies, missing entries, or conflicting findings, and if something looks off, it automatically triggers re-runs or additional searches to fill the gaps.
Synthesis Agent: The integrator. It pulls together all verified insights and composes the final evidence-backed summary or report.

Each one can request clarifications, rerun analyses, or trigger new searches when context is incomplete, essentially forming a self-correcting loop - an evolving dialogue among specialized reasoning systems that mirror how real research teams work. Let's trace how this plays out for our transformer-optimization question:

Initial Planning (Reasoning LLM): The orchestrator begins by breaking the task into sub-objectives discussed before.
First Retrieval Loop: The Retrieval Agent executes the plan by gathering candidate materials — academic papers, MLPerf benchmark results, and open-source repositories related to transformer optimization. During this step, it detects that two benchmark results reference outdated datasets and flags them for review, prompting the orchestrator to mark those as lower confidence. Extraction & Computation Loop: Next, the Extraction Agent processes the retrieved documents, parsing FLOPs and latency metrics from tables and converting inconsistent units (e.g., TFLOPs vs GFLOPs) into a standardized format. The cleaned dataset is then passed to the Computation Agent, which calculates aggregated improvements across optimization techniques. Meanwhile, the Validation Agent identifies an anomaly - an unusually high accuracy score from one repository. It initiates a follow-up query and discovers the result was computed on a smaller test subset. This correction is fed back to the orchestrator, which dynamically revises the reasoning plan to account for the new context. Iterative Refinement: Following the Validation Agent’s discovery that the smaller test set introduced inconsistencies in the reported results - the Retrieval Agent initiates a secondary, targeted search to gather additional benchmark data and papers on quantization techniques. The goal is to fill missing entries, verify reported accuracy-loss trade-offs, and ensure comparable evaluation settings across sources. The Extraction and Computation Agents then process this newly retrieved data, recalculating averages and confidence intervals for all optimization methods. An optional Citation Agent could examine citation frequency and publication timelines to identify which techniques are gaining traction in recent research. 
Final Synthesis: Once all agents agree, the orchestrator compiles a verified, grounded summary like: “Across 14 evaluated studies, structured pruning yields 40–60 % FLOPs reduction with < 2 % accuracy loss (Chen 2023; Liu 2024). Quantization maintains ≈ 99 % accuracy while reducing memory by 75 % (Park 2024). Efficient-attention techniques achieve linear-time scaling (Wang 2024) with only minor degradation on long-context tasks (Zhao 2024). Recent citation trends show a 3× rise in attention-based optimization research since 2023, suggesting a growing consensus toward hybrid pruning + linear-attention approaches.”

This whole process is powered by two foundational mechanisms - the Model Context Protocol (MCP) and agent-to-agent (A2A) communication - which together enable seamless collaboration across specialized reasoning agents. MCP standardizes how models and tools exchange structured information - such as retrieved documents, parsed tables, or computed results - ensuring that each agent can understand and build upon the others’ outputs. Complementing this, A2A communication allows agents to directly coordinate with one another - sharing intermediate reasoning states, requesting clarifications, or triggering follow-up actions without intervention. Together, MCP and A2A form the backbone of collaborative reasoning: a flexible, modular infrastructure that enables agents to plan, act, and refine collectively in real time.
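The shape of the structured messages agents exchange can be sketched as follows; this is an illustrative data structure for intuition, not the actual MCP or A2A wire format:

```python
from dataclasses import dataclass, field

# Illustrative agent-to-agent message envelope - NOT the real MCP/A2A schema.

@dataclass
class AgentMessage:
    sender: str                      # e.g., "validation_agent"
    recipient: str                   # e.g., "retrieval_agent"
    intent: str                      # "request", "result", or "clarification"
    payload: dict = field(default_factory=dict)

# The Validation Agent asks the Retrieval Agent for more evidence after
# spotting an anomalous accuracy score:
followup = AgentMessage(
    sender="validation_agent",
    recipient="retrieval_agent",
    intent="request",
    payload={"reason": "anomalous accuracy score",
             "query": "quantization benchmark on full test set"},
)
```

The point of a shared envelope like this is that any agent can parse any other agent's output without bespoke glue code - which is the practical problem MCP and A2A standardize.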
Step 3: Guardrails and Validation

However capable the agent stack becomes, one hard truth remains: LLMs hallucinate. They generate the most probable next token given the patterns in their training data and context - remarkably useful, but it also means they can produce confident, plausible-sounding claims with no basis in the sources. In a pipeline that chains retrieval, extraction, and computation, a single fabricated number can silently corrupt the final answer. To earn user trust, the system must verify its outputs before presenting them as correct.

Here are a few techniques that make this possible:

Rule-Based Filtering: Define domain-specific rules or patterns that catch obvious errors before they reach the user. For example, if a model outputs an impossible metric, a missing data field, or a malformed document ID, the system can flag and regenerate it.
Cross-Verification: Automatically re-query trusted APIs, structured databases, or benchmarks to confirm key numbers and facts. If the model says “structured pruning reduces FLOPs by 50%,” the system cross-checks that against benchmark data before accepting it.
Self-Consistency Checks: Sample multiple answers or reasoning paths for the same question and compare them. Hallucinated details tend to vary across runs, while genuinely grounded facts stay stable - so only claims that recur consistently are kept.

Together, these layers form the final safeguard - closing the reasoning loop. Every answer the system produces is not just well-structured but verified.
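The self-consistency idea can be sketched as a majority vote over sampled answers; the sampled strings below are mock data, and the percent-extraction regex is a deliberately simple illustration:

```python
from collections import Counter
from typing import Optional
import re

# Self-consistency check: sample the model several times and keep a numeric
# claim only if most samples agree on it. The samples below are mock strings.

def majority_claim(samples: list[str]) -> Optional[str]:
    """Return the percentage figure most samples agree on, else None."""
    numbers = [m.group(0) for s in samples
               for m in [re.search(r"\d+(\.\d+)?%", s)] if m]
    if not numbers:
        return None
    value, count = Counter(numbers).most_common(1)[0]
    return value if count > len(samples) / 2 else None

samples = [
    "Structured pruning reduces FLOPs by 50% on average.",
    "We see roughly a 50% FLOPs reduction from structured pruning.",
    "Pruning cuts FLOPs by 72% in our runs.",   # the outlier run
]
print(majority_claim(samples))  # → 50%
```

The outlier (72%) is discarded because it appears in only one of three runs - the behavior described above, where hallucinated details vary while grounded facts recur.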
From a single naive GPT call, to chunked retrieval, to an orchestrated team of reasoning agents with guardrails, each version of ResearchIt fixed the failure mode the previous one exposed. That is the broader lesson for building with LLMs: scaling is less about reaching for the biggest model and more about matching the architecture - model choice, retrieval, orchestration, and validation - to what your users actually ask of it.