Graph retrieval-augmented generation (GraphRAG) is gaining momentum and is becoming a powerful addition to traditional vector search retrieval methods. This approach leverages the structured nature of graph databases, which organize data as nodes and relationships, to enhance the depth and contextuality of retrieved information.
Graphs are great at representing and storing heterogeneous and interconnected information in a structured manner, effortlessly capturing complex relationships and attributes across diverse data types. In contrast, vector databases often struggle with such structured information, as their strength lies in handling unstructured data through high-dimensional vectors. In your RAG application, you can combine structured graph data with vector search over unstructured text to achieve the best of both worlds. That is what we will demonstrate in this blog post.
Constructing a knowledge graph is typically the most challenging step. It involves gathering and structuring the data, which requires a deep understanding of both the domain and graph modeling.
To simplify this process, we have been experimenting with LLMs. With their profound understanding of language and context, LLMs can automate significant parts of the knowledge graph creation process. By analyzing text data, these models can identify entities, understand their relationships, and suggest how they would best be represented in a graph structure.
As a result of these experiments, we have added the first version of the graph construction module to LangChain, which we will demonstrate in this blog post.
The code is available on GitHub.
You need to set up a Neo4j instance to follow along with the examples in this blog post. The easiest way is to start a free instance on Neo4j Aura, which offers cloud instances of the Neo4j database. Alternatively, you can set up a local instance of the Neo4j database by downloading the Neo4j Desktop application and creating a local database instance.
os.environ["OPENAI_API_KEY"] = "sk-" os.environ["NEO4J_URI"] = "bolt://localhost:7687" os.environ["NEO4J_USERNAME"] = "neo4j" os.environ["NEO4J_PASSWORD"] = "password" graph = Neo4jGraph()
Additionally, you must provide an OpenAI key, since we will be using their models in this blog post.
For this demonstration, we will use Elizabeth I's Wikipedia page. We can use LangChain loaders to fetch and split documents from Wikipedia seamlessly.
from langchain_community.document_loaders import WikipediaLoader
from langchain_text_splitters import TokenTextSplitter

# Read the Wikipedia article
raw_documents = WikipediaLoader(query="Elizabeth I").load()

# Define chunking strategy
text_splitter = TokenTextSplitter(chunk_size=512, chunk_overlap=24)
documents = text_splitter.split_documents(raw_documents[:3])
It's time to construct a graph based on the retrieved documents. For this purpose, we have implemented an LLMGraphTransformer module that significantly simplifies constructing and storing a knowledge graph in a graph database.
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer

llm = ChatOpenAI(temperature=0, model_name="gpt-4-0125-preview")
llm_transformer = LLMGraphTransformer(llm=llm)

# Extract graph data
graph_documents = llm_transformer.convert_to_graph_documents(documents)

# Store to Neo4j
graph.add_graph_documents(
    graph_documents,
    baseEntityLabel=True,
    include_source=True
)
You can define which LLM you want the knowledge graph generation chain to use. At the moment, we support only function-calling models from OpenAI and Mistral. However, we plan to expand the LLM selection in the future. This example uses the latest GPT-4. Note that the quality of the generated graph depends heavily on the model you use; in theory, you always want to use the most capable one. The LLM graph transformers return graph documents, which can be imported to Neo4j via the add_graph_documents method. The baseEntityLabel parameter assigns an additional __Entity__ label to each node, enhancing indexing and query performance, while the include_source parameter links nodes to the documents they were extracted from, making the data easier to trace back to its source.
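As a rough illustration of swapping in a different function-calling model, the sketch below assumes the langchain-mistralai integration is installed and a MISTRAL_API_KEY is set; the model name is only an example, not a recommendation.

# Sketch: using a Mistral function-calling model instead of OpenAI.
# Assumes the `langchain-mistralai` package is installed and
# MISTRAL_API_KEY is set; the model name is illustrative.
from langchain_mistralai import ChatMistralAI

mistral_llm = ChatMistralAI(model="mistral-large-latest", temperature=0)
mistral_transformer = LLMGraphTransformer(llm=mistral_llm)
graph_documents = mistral_transformer.convert_to_graph_documents(documents)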
You can inspect the generated graph in Neo4j Browser.
Note that this image represents only a part of the generated graph.
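If you prefer querying instead of browsing, a quick check like the sketch below samples the extracted entities and their relationships through the existing graph connection; the query itself is illustrative and relies on the __Entity__ label added by the baseEntityLabel option.

# Illustrative check: sample extracted entities and their relationships
# via the existing `graph` connection (the query is an example only).
print(graph.query(
    """
    MATCH (s:__Entity__)-[r]->(t:__Entity__)
    RETURN s.id AS source, type(r) AS relationship, t.id AS target
    LIMIT 25
    """
))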
After the graph has been generated, we will use a hybrid retrieval approach that combines vector and keyword indexes with graph retrieval for RAG applications.
The diagram illustrates a retrieval process that begins with a user posing a question, which is then directed to an RAG retriever. This retriever employs keyword and vector search over unstructured text data and combines it with the information it collects from the knowledge graph. Since Neo4j features both keyword and vector indexes, you can implement all three retrieval options with a single database system. The data collected from these sources is fed into an LLM to generate and deliver the final answer.
You can use the Neo4jVector.from_existing_graph method to add both keyword and vector retrieval over documents. This method configures keyword and vector search indexes for a hybrid search approach, targeting nodes labeled Document. Additionally, it calculates text embedding values if they are missing.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Neo4jVector

vector_index = Neo4jVector.from_existing_graph(
    OpenAIEmbeddings(),
    search_type="hybrid",
    node_label="Document",
    text_node_properties=["text"],
    embedding_node_property="embedding"
)
The vector index can then be called with the similarity_search method.
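For instance, a minimal sanity check might look like the sketch below; the question text is purely an example.

# Illustrative usage of the hybrid vector index; the question is an example.
docs = vector_index.similarity_search("Who was Elizabeth I's father?")
print(docs[0].page_content[:200])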
On the other hand, configuring graph retrieval is more involved but offers more freedom. This example will use a full-text index to identify relevant nodes and then return their direct neighborhood.
The graph retriever starts by identifying relevant entities in the input. For simplicity, we instruct the LLM to identify people, organizations, and locations. To achieve this, we will use LCEL with the newly added with_structured_output method.
from typing import List

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

# Extract entities from text
class Entities(BaseModel):
    """Identifying information about entities."""

    names: List[str] = Field(
        ...,
        description="All the person, organization, or business entities that "
        "appear in the text",
    )

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are extracting organization and person entities from the text.",
        ),
        (
            "human",
            "Use the given format to extract information from the following "
            "input: {question}",
        ),
    ]
)

entity_chain = prompt | llm.with_structured_output(Entities)
Let's test it out:
entity_chain.invoke({"question": "Where was Amelia Earhart born?"}).names # ['Amelia Earhart']
Great, now that we can detect entities in the question, let's use a full-text index to map them to the knowledge graph. First, we need to define a full-text index and a function that will generate full-text queries that allow a bit of misspelling, which we won't go into much detail about here.
graph.query( "CREATE FULLTEXT INDEX entity IF NOT EXISTS FOR (e:__Entity__) ON EACH [e.id]") def generate_full_text_query(input: str) -> str: """ Generate a full-text search query for a given input string. This function constructs a query string suitable for a full-text search. It processes the input string by splitting it into words and appending a similarity threshold (~2 changed characters) to each word, then combines them using the AND operator. Useful for mapping entities from user questions to database values, and allows for some misspelings. """ full_text_query = "" words = [el for el in remove_lucene_chars(input).split() if el] for word in words[:-1]: full_text_query += f" {word}~2 AND" full_text_query += f" {words[-1]}~2" return full_text_query.strip()
Let's now put it all together.
# Fulltext index query
def structured_retriever(question: str) -> str:
    """
    Collects the neighborhood of entities mentioned in the question
    """
    result = ""
    entities = entity_chain.invoke({"question": question})
    for entity in entities.names:
        response = graph.query(
            """CALL db.index.fulltext.queryNodes('entity', $query, {limit:2})
            YIELD node,score
            CALL {
              MATCH (node)-[r:!MENTIONS]->(neighbor)
              RETURN node.id + ' - ' + type(r) + ' -> ' + neighbor.id AS output
              UNION
              MATCH (node)<-[r:!MENTIONS]-(neighbor)
              RETURN neighbor.id + ' - ' + type(r) + ' -> ' + node.id AS output
            }
            RETURN output LIMIT 50
            """,
            {"query": generate_full_text_query(entity)},
        )
        result += "\n".join([el['output'] for el in response])
    return result
The structured_retriever function starts by detecting entities in the user question. Next, it iterates over the detected entities and uses a Cypher template to retrieve the neighborhood of the relevant nodes. Let's test it out!
print(structured_retriever("Who is Elizabeth I?")) # Elizabeth I - BORN_ON -> 7 September 1533 # Elizabeth I - DIED_ON -> 24 March 1603 # Elizabeth I - TITLE_HELD_FROM -> Queen Of England And Ireland # Elizabeth I - TITLE_HELD_UNTIL -> 17 November 1558 # Elizabeth I - MEMBER_OF -> House Of Tudor # Elizabeth I - CHILD_OF -> Henry Viii # and more...
As mentioned at the start, we will combine the unstructured and graph retrievers to create the final context that is passed to the LLM.
def retriever(question: str):
    print(f"Search query: {question}")
    structured_data = structured_retriever(question)
    unstructured_data = [
        el.page_content for el in vector_index.similarity_search(question)
    ]
    final_data = f"""Structured data:
{structured_data}
Unstructured data:
{"#Document ".join(unstructured_data)}
    """
    return final_data
Since we are dealing with Python, we can simply concatenate the outputs of the two retrievers using an f-string.
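As a quick, illustrative check of the combined retriever (the question is only an example):

# Illustrative call to the combined retriever; the question is an example.
print(retriever("Who is Elizabeth I?"))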
We have successfully implemented the retrieval component of the RAG pipeline. Next, we introduce a prompt that leverages the context provided by the integrated hybrid retriever to produce the response, completing the implementation of the RAG chain.
template = """Answer the question based only on the following context: {context} Question: {question} """ prompt = ChatPromptTemplate.from_template(template) chain = ( RunnableParallel( { "context": _search_query | retriever, "question": RunnablePassthrough(), } ) | prompt | llm | StrOutputParser() )
Finally, we can go ahead and test our hybrid RAG implementation.
chain.invoke({"question": "Which house did Elizabeth I belong to?"}) # Search query: Which house did Elizabeth I belong to? # 'Elizabeth I belonged to the House of Tudor.'
I also incorporated a query rewriting feature, enabling the RAG chain to adapt to conversational settings that allow follow-up questions. Given that we use vector and keyword search methods, we must rewrite follow-up questions to optimize our search process.
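The `_search_query` runnable referenced in the chain above performs this rewriting. A minimal sketch of one way to build it is shown below, assuming chat history is condensed with the follow-up question into a standalone question by an LLM; the prompt wording is illustrative rather than definitive.

from typing import List, Tuple

from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableBranch, RunnableLambda

# Illustrative prompt for condensing a follow-up question into a
# standalone question; the wording is an example.
_template = """Given the following conversation and a follow up question,
rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

def _format_chat_history(chat_history: List[Tuple[str, str]]) -> List:
    # Convert (human, ai) tuples into message objects for the prompt
    buffer = []
    for human, ai in chat_history:
        buffer.append(HumanMessage(content=human))
        buffer.append(AIMessage(content=ai))
    return buffer

_search_query = RunnableBranch(
    # If chat history is present, condense it with the follow-up question
    (
        RunnableLambda(lambda x: bool(x.get("chat_history"))),
        RunnablePassthrough.assign(
            chat_history=lambda x: _format_chat_history(x["chat_history"])
        )
        | CONDENSE_QUESTION_PROMPT
        | ChatOpenAI(temperature=0)
        | StrOutputParser(),
    ),
    # Otherwise, just pass the question through unchanged
    RunnableLambda(lambda x: x["question"]),
)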
chain.invoke( { "question": "When was she born?", "chat_history": [("Which house did Elizabeth I belong to?", "House Of Tudor")], } ) # Search query: When was Elizabeth I born? # 'Elizabeth I was born on 7 September 1533.'
You can observe that "When was she born?" was first rewritten to "When was Elizabeth I born?". The rewritten query was then used to retrieve the relevant context and answer the question.
With the introduction of LLMGraphTransformer, the process of generating knowledge graphs should now be smoother and more accessible, making it easier for anyone looking to enhance their RAG-based applications with the depth and context that knowledge graphs provide. This is just the start, as we have many improvements planned.
If you have insights, suggestions, or questions about our graph construction with LLMs, please don't hesitate to reach out.
The code is available on GitHub.