Using knowledge graphs to improve the results of retrieval-augmented generation (RAG) applications has become a hot topic. Most examples show how to build a knowledge graph from a relatively small number of documents. That's because the typical approach, extracting fine-grained, entity-centric information, simply doesn't scale. Running every document through a model to extract entities (nodes) and relationships (edges) takes too long (and costs too much) on large datasets.
We've argued this before.
Loading the documents into an entity graph store such as Neo4j was done using LangChain's LLMGraphTransformer. The code is based on LangChain's documentation.
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo")
llm_transformer = LLMGraphTransformer(llm=llm)

from time import perf_counter
start = perf_counter()

documents_to_load = [Document(page_content=line) for line in lines_to_load]
graph_documents = llm_transformer.convert_to_graph_documents(documents_to_load)

end = perf_counter()
print(f"Loaded (but NOT written) {NUM_LINES_TO_LOAD} in {end - start:0.2f}s")
Loading data into the GraphVectorStore is much like loading it into a vector store. The only addition is that we compute metadata indicating how each page links to other pages.
import json
from langchain_core.graph_vectorstores.links import METADATA_LINKS_KEY, Link

def parse_document(line: str) -> Document:
    para = json.loads(line)
    id = para["id"]
    links = {
        Link.outgoing(kind="href", tag=id)
        for m in para["mentions"]
        if m["ref_ids"] is not None
        for id in m["ref_ids"]
    }
    links.add(Link.incoming(kind="href", tag=id))
    return Document(
        id=id,
        page_content=" ".join(para["sentences"]),
        metadata={
            "content_id": para["id"],
            METADATA_LINKS_KEY: list(links),
        },
    )
This is a good example of how you can add your own links between nodes.
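The same pattern works for any relationship you can derive from your content, not just hyperlinks. The sketch below is hypothetical (the "shared_keyword" link kind, the helper function, and the sample documents are made up for illustration), and it assumes the Link class also exposes a bidir() constructor alongside incoming() and outgoing(); documents built this way can be passed to add_documents() just like the parsed wiki pages below.

from langchain_core.documents import Document
from langchain_core.graph_vectorstores.links import METADATA_LINKS_KEY, Link

def doc_with_keyword_links(doc_id: str, text: str, keywords: list[str]) -> Document:
    # Hypothetical helper: a bidirectional "shared_keyword" link with a given
    # tag connects every document carrying that tag, so retrieval can traverse
    # from one such document to the others.
    links = {Link.bidir(kind="shared_keyword", tag=kw) for kw in keywords}
    return Document(
        id=doc_id,
        page_content=text,
        metadata={METADATA_LINKS_KEY: list(links)},
    )

docs = [
    doc_with_keyword_links("doc-1", "First paragraph about some topic.", ["topic-a"]),
    doc_with_keyword_links("doc-2", "Second paragraph on the same topic.", ["topic-a"]),
]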
from langchain_openai import OpenAIEmbeddings
from langchain_community.graph_vectorstores.cassandra import CassandraGraphVectorStore
import cassio

cassio.init(auto=True)
TABLE_NAME = "wiki_load"
store = CassandraGraphVectorStore(
    embedding=OpenAIEmbeddings(),
    node_table=TABLE_NAME,
    insert_timeout=1000.0,
)

from time import perf_counter
start = perf_counter()

from datasets.wikimultihop.load import parse_document
kg_documents = [parse_document(line) for line in lines_to_load]
store.add_documents(kg_documents)

end = perf_counter()
print(f"Loaded (and written) {NUM_LINES_TO_LOAD} in {end - start:0.2f}s")
Running over 100 rows, the entity-centric approach using GPT-4o took 405.93s to extract the GraphDocuments and 10.99s to write them to Neo4j, while the content-centric approach took 1.43s. Extrapolating, it would take 41 weeks to load all 5,989,847 pages using the entity-centric approach and about 24 hours using the content-centric approach. Thanks to parallelism, though, the content-centric approach completes in only 2.5 hours! Assuming the same parallelism benefits, the entity-centric approach would still take over four weeks to load everything. I didn't try it, since the estimated cost would be $58,700, and that assumes everything works the first time!
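The extrapolation is simple linear scaling of the 100-row timings; a quick back-of-the-envelope sketch, assuming the per-row times stay constant:

TOTAL_PAGES = 5_989_847

# Measured over 100 rows (see above).
entity_secs = 405.93 + 10.99   # extraction + write to Neo4j
content_secs = 1.43            # GraphVectorStore load

entity_weeks = entity_secs / 100 * TOTAL_PAGES / (60 * 60 * 24 * 7)
content_hours = content_secs / 100 * TOTAL_PAGES / (60 * 60)

print(f"entity-centric:  ~{entity_weeks:.0f} weeks")   # ~41 weeks
print(f"content-centric: ~{content_hours:.0f} hours")  # ~24 hours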
Bottom line: the entity-centric approach of extracting knowledge graphs from content using an LLM was both time- and cost-prohibitive at scale. Using the GraphVectorStore, on the other hand, was fast and inexpensive.
In this section, a few questions drawn from the subset of documents that were loaded are asked of each approach to compare the quality of the answers.
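Roughly, the two approaches were queried along the following lines. This is a sketch rather than the exact benchmark code: GraphCypherQAChain and Neo4jGraph are LangChain's standard components for Cypher-based QA (recent versions may also require an allow_dangerous_requests flag), while the "traversal" search type and "depth" option are my assumption about the GraphVectorStore retriever; llm and store come from the earlier snippets.

from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph

question = "When was 'The Circle' released?"

# Entity-centric: the LLM writes a Cypher query against the extracted Neo4j graph.
graph = Neo4jGraph()  # connection details taken from the NEO4J_* environment variables
cypher_chain = GraphCypherQAChain.from_llm(llm, graph=graph, verbose=True)
print(cypher_chain.invoke({"query": question}))

# Content-centric: retrieve linked passages from the GraphVectorStore and
# stuff them into the prompt.
retriever = store.as_retriever(search_type="traversal", search_kwargs={"depth": 1})
context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))
answer = llm.invoke(f"Answer using only this context:\n\n{context}\n\nQuestion: {question}")
print(answer.content)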
The entity-centric approach used 7,324 prompt tokens and cost $0.03 to produce essentially useless answers, while the content-centric approach used 450 prompt tokens and cost $0.002 to produce concise answers that directly addressed the questions.
It may be surprising that the fine-grained Neo4j graph returns useless answers. Looking at the logging from the chain, we can see some of the reasons why this happens:
> Entering new GraphCypherQAChain chain...
Generated Cypher:
cypher
MATCH (a:Album {id: 'The Circle'})-[:RELEASED_BY]->(r:Record_label)
RETURN a.id, r.id
Full Context:
[{'a.id': 'The Circle', 'r.id': 'Restless'}]

> Finished chain.
{'query': "When was 'The Circle' released?",
 'result': "I don't know the answer."}
So, the fine-grained schema only returned information about the record label. It makes sense that the LLM couldn't answer the question based on the information that was retrieved.
Extracting fine-grained, entity-specific knowledge graphs is time- and cost-prohibitive at scale. When asked questions about the subset of the data that was loaded, the additional granularity (and the additional cost of loading the fine-grained graph) returned more tokens to include in the prompt, yet produced useless answers!
GraphVectorStore takes a coarse-grained, content-centric approach that makes it quick and easy to build a knowledge graph. You can start with your existing code for populating a vector store using LangChain and add links (edges) between documents to improve the retrieval process.
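Concretely, the change to existing vector-store loading code is small: the documents stay the same and only gain link metadata. A rough sketch under stated assumptions (docs is the list of Documents your current pipeline already builds, and the "ref_ids" metadata field is hypothetical):

from langchain_core.graph_vectorstores.links import METADATA_LINKS_KEY, Link

# Before: the same docs were written to a plain vector store, e.g.
#   store = Cassandra(embedding=OpenAIEmbeddings(), table_name="wiki")
#   store.add_documents(docs)
# After: attach link metadata and write to a GraphVectorStore instead.
for doc in docs:
    doc.metadata[METADATA_LINKS_KEY] = [
        Link.incoming(kind="href", tag=doc.id),
        *(Link.outgoing(kind="href", tag=ref) for ref in doc.metadata.get("ref_ids", [])),
    ]

graph_store = CassandraGraphVectorStore(
    embedding=OpenAIEmbeddings(),
    node_table="wiki_graph",
)
graph_store.add_documents(docs)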
Graph RAG is a useful tool for enabling generative AI RAG applications to retrieve more relevant contexts. But using a fine-grained, entity-centric approach doesn't scale to production needs. If you're looking to add knowledge graph capabilities to your RAG application, give GraphVectorStore a try.
By Ben Chambers, DataStax