Constructing knowledge graphs from text has been a fascinating research area for quite some time. With the advent of large language models (LLMs), the field has gained much more mainstream attention. However, LLMs can be quite costly. An alternative approach is to fine-tune smaller, task-specific models, which is backed by academic research and yields efficient solutions. Today, we'll explore Relik, a framework for running blazing fast and lightweight information extraction models, developed by the NLP group at Sapienza University of Rome.
A typical information extraction pipeline without an LLM looks like the following:
The image shows an information extraction pipeline, starting from input data containing the text "Tomaz likes to write blog posts. He is particularly interested in drawing diagrams." The process begins with coreference resolution, which identifies "Tomaz" and "He" as the same entity. Named entity recognition (NER) then identifies entities such as "Tomaz," "Blog," and "Diagram."
Entity linking is the step that follows NER, where the recognized entities are mapped to corresponding entries in a database or knowledge base. For example, "Tomaz" is linked to "Tomaz Bratanic (Q12345)" and "Blog" to "Blog (Q321)," while "Diagram" has no match in the knowledge base.
Relationship extraction is the subsequent step, where the system identifies and extracts meaningful relationships between the recognized entities. This example shows that "Tomaz" has a relationship with "Blog" denoted by "WRITES," indicating that Tomaz writes blog posts. It also shows that "Tomaz" has a relationship with "Diagram" denoted by "INTERESTED_IN," indicating that Tomaz is interested in diagrams.
Finally, this structured information, including the entities and their relationships, is stored in a knowledge graph, making the data organized and accessible for further analysis or retrieval.
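In code terms, the end result of such a pipeline can be thought of as a list of (subject, relation, object) triples that get loaded into the graph. A purely illustrative sketch, with made-up Wikidata-style identifiers that simply mirror the figure above:

# Illustrative only: the entity IDs and relation names below are hypothetical
# and mirror the example in the figure.
triples = [
    ("Tomaz Bratanic (Q12345)", "WRITES", "Blog (Q321)"),
    ("Tomaz Bratanic (Q12345)", "INTERESTED_IN", "Diagram"),
]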
Traditionally, without the power of LLMs, this entire pipeline relies on a suite of specialized models, each handling a specific task from coreference resolution to relationship extraction. While integrating these models demands more effort and coordination, it offers a significant advantage: lower cost. By fine-tuning smaller, task-specific models, the overall expense of building and running the system can be kept in check.
The code is available on GitHub.
I recommend using a separate Python environment such as Google Colab, since we'll have to play around with dependencies a bit. The models are faster on a GPU, so you can use a GPU-powered runtime if you have the Pro version.
Additionally, we need to set up Neo4j, a native graph database, to store the extracted information. There are many ways to set up your database instance. However, I recommend using Neo4j Aura, which provides a free cloud instance that can easily be accessed from a Google Colab notebook.
Neo4j Aura - Fully Managed Cloud Solution
After the database has been created, we can define the connection using LlamaIndex:
from llama_index.graph_stores.neo4j import Neo4jPGStore

username = "neo4j"
password = "rubber-cuffs-radiator"
url = "bolt://54.89.19.156:7687"

graph_store = Neo4jPGStore(
    username=username,
    password=password,
    url=url,
    refresh_schema=False,
)
We'll use a news dataset I obtained via the Diffbot API some time ago. The dataset is conveniently available on GitHub for reuse:
import pandas as pd

NUMBER_OF_ARTICLES = 100

news = pd.read_csv(
    "https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/news_articles.csv"
)
news = news.head(NUMBER_OF_ARTICLES)
The first step in the pipeline is a coreference resolution model. Coreference resolution is the task of identifying all expressions in a text that refer to the same entity.
To my knowledge, not many open-source models are available for coreference resolution. I tried maverick-coref, but in my tests Coreferee from spaCy worked better, so we'll use that. The only downside to using Coreferee is that we have to deal with dependency hell, which is resolved in the accompanying notebook and which we won't walk through here.
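For reference, the setup roughly boils down to a Colab cell along these lines; treat it as a sketch, since the notebook pins the exact, mutually compatible package versions:

# Sketch of the Colab setup; the notebook pins exact versions known to work together.
!pip install spacy coreferee
!python -m spacy download en_core_web_lg
!python -m coreferee install en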
You can load the coreference model in spaCy with the following code:
import spacy, coreferee

coref_nlp = spacy.load('en_core_web_lg')
coref_nlp.add_pipe('coreferee')
The Coreferee model detects clusters of expressions that refer to the same entity or entities. To rewrite the text based on these clusters, we have to implement our own function:
def coref_text(text):
    coref_doc = coref_nlp(text)
    resolved_text = ""

    for token in coref_doc:
        # Check whether the token is part of a coreference chain
        repres = coref_doc._.coref_chains.resolve(token)
        if repres:
            # Replace the token with its representative mention(s);
            # prefer the full named-entity span when one is available
            resolved_text += " " + " and ".join(
                [
                    t.text
                    if t.ent_type_ == ""
                    else [e.text for e in coref_doc.ents if t in e][0]
                    for t in repres
                ]
            )
        else:
            resolved_text += " " + token.text

    return resolved_text
Let's test the function to make sure the models and dependencies are set up correctly:
print(
    coref_text("Tomaz is so cool. He can solve various Python dependencies and not cry")
)
# Tomaz is so cool . Tomaz can solve various Python dependencies and not cry
In this example, the model identifies that "Tomaz" and "He" refer to the same entity. Using the coref_text function, we replace "He" with "Tomaz."
Note that the rewriting doesn't always return grammatically correct sentences because of the simple replacement logic applied to entities within a cluster. However, it should be good enough for most scenarios.
Now we apply coreference resolution to our news dataset and wrap the results as LlamaIndex documents:
from llama_index.core import Document

news["coref_text"] = news["text"].apply(coref_text)
documents = [
    Document(text=f"{row['title']}: {row['coref_text']}")
    for i, row in news.iterrows()
]
Relik is a library with models for entity linking (EL) and relationship extraction (RE), and it also supports models that combine the two. In entity linking, Wikipedia is used as the target knowledge base, mapping entities in the text to their corresponding entries in the encyclopedia.
Relationship extraction, on the other hand, involves identifying and categorizing the relationships between entities within the text, enabling the extraction of structured information from unstructured data.
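If you want a feel for what these models return before wiring them into LlamaIndex, here is a minimal sketch of calling the relik library directly, based on its documented from_pretrained interface; the example sentence and output handling are illustrative:

# Minimal sketch of using Relik outside LlamaIndex; output fields may vary by model.
from relik import Relik

relik_model = Relik.from_pretrained("relik-ie/relik-relation-extraction-small")
relik_out = relik_model(
    "Tomaz likes to write blog posts. Tomaz is interested in drawing diagrams."
)
print(relik_out.triplets)  # extracted (subject, relation, object) spans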
If you're using the free Colab version, use the relik-ie/relik-relation-extraction-small model, which performs only relationship extraction. If you have the Pro version, or will run this on a stronger local machine, you can try the relik-ie/relik-cie-small model, which performs both entity linking and relationship extraction.
from llama_index.extractors.relik.base import RelikPathExtractor

relik = RelikPathExtractor(
    model="relik-ie/relik-relation-extraction-small"
)

# Use on Pro Colab with GPU
# relik = RelikPathExtractor(
#     model="relik-ie/relik-cie-small", model_config={"skip_metadata": True, "device": "cuda"}
# )
Additionally, we have to define the embedding model that will be used to embed entities and the LLM for the question-answering flow:
import os

from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

os.environ["OPENAI_API_KEY"] = "sk-"

llm = OpenAI(model="gpt-4o", temperature=0.0)
embed_model = OpenAIEmbedding(model_name="text-embedding-3-small")
Note that the LLM will not be used during graph construction.
Now that we have everything in place, we can instantiate a PropertyGraphIndex and use the news documents as input data for the knowledge graph.
Additionally, we need to pass the relik model as the kg_extractors value so it is used to extract the relationships:
from llama_index.core import PropertyGraphIndex

index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[relik],
    llm=llm,
    embed_model=embed_model,
    property_graph_store=graph_store,
    show_progress=True,
)
After the graph is constructed, you can open Neo4j Browser to validate the imported graph. You should get a similar visualization by running the following Cypher statement:
MATCH p=(:__Entity__)--(:__Entity__) RETURN p LIMIT 250
Results
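If you prefer numbers over a visualization, you can also run a quick sanity check from Python. This sketch assumes the structured_query helper that recent LlamaIndex Neo4j property graph stores expose:

# Count extracted relationships by type; assumes graph_store.structured_query is available.
counts = graph_store.structured_query(
    """
    MATCH (:__Entity__)-[r]->(:__Entity__)
    RETURN type(r) AS relationship, count(*) AS count
    ORDER BY count DESC LIMIT 10
    """
)
print(counts)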
Using LlamaIndex, it is now easy to perform question answering. To use the default graph retrievers, you can ask straightforward questions like:
query_engine = index.as_query_engine(include_text=True)

response = query_engine.query("What happened at Ryanair?")
print(str(response))
This is where the defined LLM and embedding model come into play. Of course, you can also implement custom retrievers for potentially better accuracy, as in the sketch below.
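As a starting point, here is a minimal sketch that combines the built-in property-graph sub-retrievers; it assumes the LLMSynonymRetriever and VectorContextRetriever classes and the sub_retrievers argument available in recent LlamaIndex versions:

from llama_index.core.indices.property_graph import (
    LLMSynonymRetriever,
    VectorContextRetriever,
)

# Sketch only: combine LLM-based synonym expansion with vector-based entity lookup.
sub_retrievers = [
    LLMSynonymRetriever(index.property_graph_store, llm=llm),
    VectorContextRetriever(index.property_graph_store, embed_model=embed_model),
]

retriever = index.as_retriever(sub_retrievers=sub_retrievers)
nodes = retriever.retrieve("What happened at Ryanair?")
for node in nodes:
    print(node.get_content())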
Constructing knowledge graphs without relying on LLMs is not only feasible but also cost-effective and efficient. By fine-tuning smaller, task-specific models, such as those in the Relik framework, you can achieve high-performing information extraction for your retrieval-augmented generation (RAG) applications.
Entity linking, a critical step in this process, ensures that the recognized entities are accurately mapped to corresponding entries in a knowledge base, thereby maintaining the integrity and usability of the knowledge graph.
By using frameworks like Relik and platforms like Neo4j, it's possible to construct high-quality knowledge graphs that facilitate complex data analysis and retrieval tasks, all without the high costs typically associated with deploying LLMs. This approach not only makes data processing more accessible but also promotes innovation and efficiency in information extraction workflows.
Make sure to give the Relik library a star. The code is available on GitHub.
To learn more about this topic, join us at NODES 2024 on November 7, our free virtual developer conference on intelligent apps, knowledge graphs, and AI. Register NOW!