Enhancing RAG with Knowledge Graphs: Integrating Llama 3.1, NVIDIA NIM, and LangChain for Dynamic AI

by Neo4j, 9 min read, 2024/10/22

Too Long; Didn't Read

This article demonstrates how to use Llama 3.1, NVIDIA NIM, and LangChain to build a knowledge graph-based agent for retrieval-augmented generation (RAG), using structured data and dynamic query generation to improve information retrieval and response accuracy.



While most people focus on retrieval-augmented generation (RAG) over unstructured text, such as company documents or documentation, I am quite bullish on retrieval systems over structured information, particularly knowledge graphs. There has been a lot of excitement about GraphRAG, specifically Microsoft's implementation. However, in their implementation, the input data is unstructured text in the form of documents, which is transformed into a knowledge graph using a large language model (LLM).


In this blog post, we will show how to implement a retriever over a knowledge graph containing structured information from the FDA Adverse Event Reporting System (FAERS), which provides information about adverse drug events. If you have ever tinkered with knowledge graphs and retrieval, your first thought might be to use an LLM to generate database queries that retrieve relevant information from the knowledge graph to answer a given question. However, database query generation with LLMs is still evolving and may not yet offer the most consistent or robust solution. So, what are the viable alternatives at the moment?


In my opinion, the best current solution is dynamic query generation. Rather than relying entirely on an LLM to generate the complete query, this method employs a logic layer that deterministically generates a database query from predefined input parameters. This solution can be implemented using an LLM with function-calling support. The advantage of function calling lies in the ability to define for an LLM how it should prepare a structured input to a function. This approach keeps the query generation process controlled and consistent while still allowing flexibility in user input.


Dynamic query generation flow (image by author)


The image illustrates the process of understanding a user's question in order to retrieve specific information. The flow involves three main steps:


  1. A user asks a question about common side effects of the drug Lyrica for people under 35 years old.


  2. The LLM decides which function to call and which parameters are needed. In this example, it chose a function named side_effects with parameters including the drug Lyrica and a maximum age of 35.


  3. The identified function and parameters are used to deterministically and dynamically generate a database query (Cypher) statement that retrieves the relevant information.
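Steps 2 and 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in this post: the `tool_call` dictionary stands in for what a function-calling LLM might return, and `build_side_effects_query` is a hypothetical helper. The node labels and relationship types follow the tool defined later in the post.

```python
from typing import Any, Dict, Tuple

# Hypothetical structured output of a function-calling LLM for the question
# "What are common side effects of Lyrica for people under 35?"
tool_call = {"name": "side_effects", "arguments": {"drug": "LYRICA", "max_age": 35}}


def build_side_effects_query(args: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Deterministically turn structured arguments into Cypher plus parameters."""
    filters = []
    params: Dict[str, Any] = {}
    if args.get("drug") is not None:
        filters.append("d.name = $drug")
        params["drug"] = args["drug"]
    if args.get("min_age") is not None:
        filters.append("c.age > $min_age")
        params["min_age"] = args["min_age"]
    if args.get("max_age") is not None:
        filters.append("c.age < $max_age")
        params["max_age"] = args["max_age"]

    query = (
        "MATCH (c:Case)-[:HAS_REACTION]->(r:Reaction), "
        "(c)-[:IS_PRIMARY_SUSPECT]->(d:Drug)"
    )
    # Only add a WHERE clause when at least one filter was requested
    if filters:
        query += " WHERE " + " AND ".join(filters)
    query += (
        " RETURN r.description AS side_effect, count(*) AS count"
        " ORDER BY count DESC LIMIT 10"
    )
    return query, params


query, params = build_side_effects_query(tool_call["arguments"])
print(query)
print(params)
```

In this sketch the LLM only chooses values; the shape of the query is fixed by code, and user-supplied values travel as query parameters rather than being concatenated into the query text.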


Function-calling support is vital for advanced LLM use cases, such as allowing LLMs to use multiple retrievers based on user intent or building multi-agent flows. I have written several articles using commercial LLMs with native function-calling support. Here, however, we will use the recently released Llama-3.1, a leading open-source LLM with native function-calling support.


The code is available on GitHub.

Setting Up the Knowledge Graph

We will use Neo4j, a native graph database, to store the adverse event information. You can set up a free Sandbox project that comes pre-populated with FAERS data by following this link.


The instantiated database instance contains a graph with the following schema.


Adverse events graph schema (image by author)


The schema centers on the Case node, which links the various aspects of a drug safety report, including the drugs involved, the reactions experienced, the outcomes, and the therapies prescribed. Each drug is characterized as primary, secondary, concomitant, or interacting. Cases are also associated with information about the manufacturer, the patient's age group, and the source of the report. This schema makes it possible to track and analyze the relationships between drugs, their reactions, and outcomes in a structured manner.


We'll start by creating a connection to the database by instantiating a Neo4jGraph object:


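    import os

    from langchain_community.graphs import Neo4jGraph

    # Connection details for the Sandbox instance
    os.environ["NEO4J_URI"] = "bolt://18.206.157.187:7687"
    os.environ["NEO4J_USERNAME"] = "neo4j"
    os.environ["NEO4J_PASSWORD"] = "elevation-reservist-thousands"

    # Credentials are read from the NEO4J_* environment variables
    graph = Neo4jGraph(refresh_schema=False)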


Setting Up the LLM Environment

There are many options for hosting open-source LLMs like Llama-3.1. We will use the NVIDIA API catalog, which provides NVIDIA NIM inference microservices and supports function calling for Llama 3.1 models. When you create an account, you get 1,000 tokens, which is more than enough to follow along. You'll need to create an API key and copy it into the notebook:


    from langchain_nvidia_ai_endpoints import ChatNVIDIA

    # Paste your own API key here
    os.environ["NVIDIA_API_KEY"] = "nvapi-"
    llm = ChatNVIDIA(model="meta/llama-3.1-70b-instruct")


We'll use llama-3.1-70b because the 8b version has some hiccups with optional parameters in function definitions.


The nice thing about NVIDIA NIM microservices is that you can easily host them locally if you have security or other concerns. The setup is easily swappable: you only need to add a URL parameter to the LLM configuration:


    # connect to a local NIM running at localhost:8000,
    # specifying a specific model
    llm = ChatNVIDIA(
        base_url="http://localhost:8000/v1",
        model="meta/llama-3.1-70b-instruct"
    )
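One way to make the hosted and local setups swappable without editing code is to derive the constructor arguments from configuration. The sketch below is only an illustration: the `NIM_BASE_URL` variable name and the `chat_nvidia_kwargs` helper are assumptions made here, not part of the NVIDIA library or the original notebook.

```python
from typing import Dict

MODEL = "meta/llama-3.1-70b-instruct"


def chat_nvidia_kwargs(env: Dict[str, str]) -> Dict[str, str]:
    """Build keyword arguments for ChatNVIDIA from environment-style settings.

    NIM_BASE_URL is a hypothetical variable name chosen for this sketch.
    """
    kwargs = {"model": MODEL}
    base_url = env.get("NIM_BASE_URL")
    if base_url:
        # Point the client at a self-hosted NIM instead of the hosted API
        kwargs["base_url"] = base_url
    return kwargs


hosted = chat_nvidia_kwargs({})
local = chat_nvidia_kwargs({"NIM_BASE_URL": "http://localhost:8000/v1"})
print(hosted)
print(local)
# With the library installed, the result could be passed along as:
# llm = ChatNVIDIA(**chat_nvidia_kwargs(dict(os.environ)))
```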

Tool Definition

We will configure a single tool with four optional parameters. Based on those parameters, we'll construct a corresponding Cypher statement to retrieve the relevant information from the knowledge graph. Our tool will be able to identify the most frequent side effects based on the input drug, age, and drug manufacturer.


    from typing import Optional

    from langchain_core.tools import tool
    from pydantic import Field

    # get_candidates is not shown in this post; see the code on GitHub.


    @tool
    def get_side_effects(
        drug: Optional[str] = Field(
            description="drug mentioned in the question. Return None if no mentioned."
        ),
        min_age: Optional[int] = Field(
            description="Minimum age of the patient. Return None if no mentioned."
        ),
        max_age: Optional[int] = Field(
            description="Maximum age of the patient. Return None if no mentioned."
        ),
        manufacturer: Optional[str] = Field(
            description="manufacturer of the drug. Return None if no mentioned."
        ),
    ):
        """Useful for when you need to find common side effects."""
        params = {}
        filters = []
        side_effects_base_query = """
        MATCH (c:Case)-[:HAS_REACTION]->(r:Reaction), (c)-[:IS_PRIMARY_SUSPECT]->(d:Drug)
        """
        # Map the drug name from the question to names stored in the graph
        if drug and isinstance(drug, str):
            candidate_drugs = [el["candidate"] for el in get_candidates(drug, "drug")]
            if not candidate_drugs:
                return "The mentioned drug was not found"
            filters.append("d.name IN $drugs")
            params["drugs"] = candidate_drugs
        # Optional age range filters
        if min_age and isinstance(min_age, int):
            filters.append("c.age > $min_age ")
            params["min_age"] = min_age
        if max_age and isinstance(max_age, int):
            filters.append("c.age < $max_age ")
            params["max_age"] = max_age
        # Optional manufacturer filter, using the best matching candidate
        if manufacturer and isinstance(manufacturer, str):
            candidate_manufacturers = [
                el["candidate"] for el in get_candidates(manufacturer, "manufacturer")
            ]
            if not candidate_manufacturers:
                return "The mentioned manufacturer was not found"
            filters.append(
                "EXISTS {(c)<-[:REGISTERED]-(:Manufacturer {manufacturerName: $manufacturer})}"
            )
            params["manufacturer"] = candidate_manufacturers[0]
        # Append the WHERE clause only if at least one filter applies
        if filters:
            side_effects_base_query += " WHERE "
            side_effects_base_query += " AND ".join(filters)
        side_effects_base_query += """
        RETURN d.name AS drug, r.description AS side_effect, count(*) AS count
        ORDER BY count DESC
        LIMIT 10
        """
        print(f"Using parameters: {params}")
        data = graph.query(side_effects_base_query, params=params)
        return data


The get_side_effects function is designed to retrieve common side effects of drugs from a knowledge graph using the specified search criteria. It accepts optional parameters for the drug name, the patient's age range, and the drug manufacturer to narrow the search. Each parameter has a description that is passed to the LLM along with the function description, which helps the LLM understand how to use them. The function then constructs a dynamic Cypher query based on the provided inputs, executes this query against the knowledge graph, and returns the resulting side-effect data.


Let's test the function:


    get_side_effects("lyrica")
    # Using parameters: {'drugs': ['LYRICA', 'LYRICA CR']}
    # [{'drug': 'LYRICA', 'side_effect': 'Pain', 'count': 32},
    # {'drug': 'LYRICA', 'side_effect': 'Fall', 'count': 21},
    # {'drug': 'LYRICA', 'side_effect': 'Intentional product use issue', 'count': 20},
    # {'drug': 'LYRICA', 'side_effect': 'Insomnia', 'count': 19},
    # ...


Our tool first mapped the drug Lyrica mentioned in the question to the values "['LYRICA', 'LYRICA CR']" in the knowledge graph, and then executed a corresponding Cypher statement to find the most frequent side effects.
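The get_candidates helper that performs this mapping is not shown in this post. As a rough stand-in to illustrate the idea of mapping user wording onto values stored in the graph, the sketch below uses a case-insensitive prefix match over an in-memory list. The function name `find_candidates` and the sample list are assumptions for illustration only; the real helper may work quite differently (for example, by querying a database index).

```python
from typing import List

# Sample values for illustration; a real graph stores many more drug names
KNOWN_DRUGS = ["LYRICA", "LYRICA CR", "ASPIRIN", "HUMIRA"]


def find_candidates(user_input: str, known_values: List[str], limit: int = 3) -> List[str]:
    """Return stored values that start with the user's term, ignoring case."""
    needle = user_input.strip().upper()
    if not needle:
        return []
    matches = [value for value in known_values if value.upper().startswith(needle)]
    return matches[:limit]


print(find_candidates("lyrica", KNOWN_DRUGS))
print(find_candidates("unknown drug", KNOWN_DRUGS))
```

Returning an empty list for unknown terms lets the calling tool report that the drug was not found instead of running a query that silently matches nothing.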

Graph-Based LLM Agent

The only thing left to do is to configure an LLM agent that can use the defined tool to answer questions about a drug's side effects.


Agent data flow (image by author)


The image depicts a user interacting with a Llama 3.1 agent to ask about drug side effects. The agent accesses a side-effects tool that retrieves information from the knowledge graph to provide the user with the relevant data.


We'll start by defining the prompt template:


    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "system",
                "You are a helpful assistant that finds information about common side effects. "
                "If tools require follow up questions, "
                "make sure to ask the user for clarification. Make sure to include any "
                "available options that need to be clarified in the follow up questions. "
                "Do only the things the user specifically requested. ",
            ),
            MessagesPlaceholder(variable_name="chat_history"),
            ("user", "{input}"),
            # Holds intermediate tool calls and their results
            MessagesPlaceholder(variable_name="agent_scratchpad"),
        ]
    )


The prompt template includes the system message, optional chat history, and the user input. The agent_scratchpad is reserved for the LLM, as it sometimes needs multiple steps to answer a question, such as executing tools and retrieving information from them.


The LangChain library makes it straightforward to add tools to the LLM using the bind_tools method:


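    from langchain.agents import AgentExecutor
    from langchain.agents.format_scratchpad import format_to_openai_function_messages
    from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser

    # _format_chat_history, AgentInput, and Output are not shown in this post;
    # see the code on GitHub.
    tools = [get_side_effects]
    llm_with_tools = llm.bind_tools(tools=tools)

    agent = (
        {
            "input": lambda x: x["input"],
            "chat_history": lambda x: _format_chat_history(x["chat_history"])
            if x.get("chat_history")
            else [],
            "agent_scratchpad": lambda x: format_to_openai_function_messages(
                x["intermediate_steps"]
            ),
        }
        | prompt
        | llm_with_tools
        | OpenAIFunctionsAgentOutputParser()
    )

    agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True).with_types(
        input_type=AgentInput, output_type=Output
    )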


The agent processes the input through transformations and handlers that format the chat history, apply the LLM with the bound tools, and parse the output. Finally, the agent is wrapped in an executor that manages the execution flow, specifies the input and output types, and enables verbose logging during execution.
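The execution flow that an executor manages can be sketched without any libraries. The loop below is a simplified illustration, not LangChain's actual implementation: `fake_llm` and `fake_get_side_effects` are stubs invented for this sketch, and the returned row is made-up data. It shows how the scratchpad accumulates tool calls and observations until the model produces a final answer.

```python
from typing import Any, Callable, Dict, List, Tuple

# Each scratchpad entry pairs a tool call with the observation it produced
Step = Tuple[Dict[str, Any], Any]


def fake_llm(question: str, scratchpad: List[Step]) -> Dict[str, Any]:
    """Stand-in for a function-calling model (not a real LLM)."""
    if not scratchpad:
        # First pass: request a tool call with structured arguments
        return {"tool": "get_side_effects", "args": {"drug": "lyrica", "max_age": 35}}
    # Second pass: an observation is available, so produce the final answer
    _, observation = scratchpad[-1]
    top = observation[0]["side_effect"]
    return {"final": f"The most common reported side effect is {top}."}


def fake_get_side_effects(drug: str, max_age: int) -> List[Dict[str, Any]]:
    """Stub tool returning a made-up row instead of querying Neo4j."""
    return [{"drug": drug.upper(), "side_effect": "Pain", "count": 1}]


def run_agent(
    question: str,
    llm: Callable[[str, List[Step]], Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
    max_steps: int = 5,
) -> str:
    scratchpad: List[Step] = []
    for _ in range(max_steps):
        decision = llm(question, scratchpad)
        if "final" in decision:
            return decision["final"]
        # Run the requested tool and record the result for the next pass
        observation = tools[decision["tool"]](**decision["args"])
        scratchpad.append((decision, observation))
    raise RuntimeError("agent did not finish within max_steps")


answer = run_agent(
    "What are the most common side effects of lyrica under 35?",
    fake_llm,
    {"get_side_effects": fake_get_side_effects},
)
print(answer)
```

The `max_steps` cap in this sketch guards against a model that keeps requesting tools without ever producing a final answer.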


Let's test the agent:


    agent_executor.invoke(
        {
            "input": "What are the most common side effects when using lyrica for people below 35 years old?"
        }
    )


Results:


Agent execution (image by author)


The LLM identified that it needs to use the get_side_effects function with the appropriate arguments. The function then dynamically generates a Cypher statement, fetches the relevant information, and returns it to the LLM to generate the final answer.

Summary

Function-calling capabilities are a powerful addition to open-source models like Llama 3.1, enabling more structured and controlled interactions with external data sources and tools. Beyond querying unstructured documents, graph-based agents offer exciting possibilities for interacting with knowledge graphs and structured data. The ease of hosting these models with platforms like NVIDIA NIM microservices makes them increasingly accessible.


As always, the code is available on GitHub.


To learn more about this topic, join us at NODES 2024 on November 7, our free virtual developer conference on intelligent apps, knowledge graphs, and AI. Register now!