In this blog, we'll show you how to index a codebase for RAG with CocoIndex. CocoIndex is a tool that helps you index and query your data. It is designed to be used as a framework for building your own data pipeline. CocoIndex provides built-in support for codebase chunking, with native Tree-sitter support.

Tree-sitter is a parser generator tool and an incremental parsing library, available in Rust 🦀 (Tree-sitter on GitHub). CocoIndex has a built-in Rust integration with Tree-sitter to efficiently parse code and extract syntax trees for various programming languages.

Codebase chunking is the process of breaking a codebase down into smaller, semantically meaningful chunks. CocoIndex uses Tree-sitter to chunk code intelligently based on its actual syntax structure, rather than at arbitrary line breaks. These semantically coherent chunks are then used to build a more effective index for RAG systems, which enables more precise code retrieval and better context preservation.

Fast pass 🚀: you can find the full code here. It's only ~50 lines of Python for the whole RAG pipeline, check it out 🤗!

If you like our work, please support us by giving CocoIndex on GitHub a star. Thank you so much, with a warm coconut hug 🥥🤗.

Prerequisites

If you don't have Postgres installed, please refer to the installation guide. CocoIndex uses Postgres to manage the data index; support for other databases, including some already in progress, is on our roadmap. If you are interested in other databases, please let us know by opening a GitHub issue or reaching out on Discord.

Define the CocoIndex flow

Let's define the CocoIndex flow that reads from a codebase and indexes it for RAG.
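Before building the flow, the difference between syntax-aware and line-based chunking described in the introduction can be illustrated without CocoIndex. The sketch below is an assumption-laden stand-in: it uses Python's built-in `ast` module instead of Tree-sitter, handles Python source only, ignores chunk size, and is not how CocoIndex's splitter works internally. The helper names `syntax_aware_chunks` and `fixed_line_chunks` are hypothetical.

```python
import ast


def syntax_aware_chunks(source: str) -> list[str]:
    """Split Python source into one chunk per top-level statement.

    Each chunk is a complete function, class, import, etc., so no chunk
    ever cuts a definition in half (a toy version of syntax-based chunking).
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        # Decorators sit above the `def`/`class` line; include them in the chunk.
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators])
        # end_lineno is available on Python 3.8+.
        chunks.append("\n".join(lines[start - 1:node.end_lineno]))
    return chunks


def fixed_line_chunks(source: str, lines_per_chunk: int) -> list[str]:
    """Naive baseline: cut every `lines_per_chunk` lines, ignoring syntax."""
    lines = source.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]


SAMPLE = '''import os

def extract_extension(filename):
    return os.path.splitext(filename)[1]

class Greeter:
    def greet(self, name):
        return "hello " + name
'''

if __name__ == "__main__":
    for chunk in syntax_aware_chunks(SAMPLE):
        print(chunk, end="\n---\n")
    # The naive splitter separates `def extract_extension` from its body.
    for chunk in fixed_line_chunks(SAMPLE, 3):
        print(chunk, end="\n---\n")
```

The syntax-aware version keeps each definition intact, which is the property that makes chunks useful as self-contained retrieval units.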
The flow processes our codebase in these steps:

- Read code files from the local filesystem
- Extract the file extensions
- Split the code into semantic chunks using Tree-sitter
- Generate an embedding for each chunk
- Store the results in a vector database for retrieval

Let's implement this flow step by step.

1. Add the codebase as a source

```python
# Imports used throughout main.py
import os

import cocoindex
from dotenv import load_dotenv  # from the python-dotenv package; used in the __main__ block


@cocoindex.flow_def(name="CodeEmbedding")
def code_embedding_flow(flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope):
    """
    Define an example flow that embeds files into a vector database.
    """
    data_scope["files"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="../..",
                                    included_patterns=["*.py", "*.rs", "*.toml", "*.md", "*.mdx"],
                                    excluded_patterns=[".*", "target", "**/node_modules"]))
    code_embeddings = data_scope.add_collector()
```

In this example, we index the CocoIndex codebase from its root directory. You can change the path to point at the codebase you want to index. We index all files with the extensions `.py`, `.rs`, `.toml`, `.md`, and `.mdx`, and skip directories whose names start with `.`, `target` (in the root), and `node_modules` (under any directory).

`flow_builder.add_source` creates a table with the following sub-fields (see the documentation here):

- `filename` (key, type: `str`): the filename of the file, e.g. `dir1/file1.md`
- `content` (type: `str` if `binary` is `False`, otherwise `bytes`): the content of the file

2. Process each file and collect the information

2.1 Extract the extension of a filename

First, let's define a function that extracts the extension of a filename while each file is processed. You can find the documentation for custom functions here.

```python
@cocoindex.op.function()
def extract_extension(filename: str) -> str:
    """Extract the extension of a filename."""
    return os.path.splitext(filename)[1]
```

Then we'll process each file and collect the information.
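Before wiring it into the flow, note that `extract_extension` is a thin wrapper around the standard library's `os.path.splitext`, so it inherits that function's conventions. This matters because the extension is later handed to the chunker as its language hint. The plain-Python check below drops the `@cocoindex.op.function()` decorator so it runs without CocoIndex; the example filenames are made up for illustration.

```python
import os


def extract_extension(filename: str) -> str:
    """Same logic as the flow's extract_extension, minus the CocoIndex decorator."""
    return os.path.splitext(filename)[1]


# Edge cases worth knowing before the extension is used as a language hint.
EXAMPLES = {
    "dir1/file1.md": ".md",     # the directory part is ignored
    "src/spec.rs": ".rs",
    "archive.tar.gz": ".gz",    # only the last suffix is returned
    "Makefile": "",             # no extension at all
    ".gitignore": "",           # a leading dot does not count as an extension
}

for name, expected in EXAMPLES.items():
    assert extract_extension(name) == expected, name
print("all extension checks passed")
```

Files that end up with an empty extension would presumably take the plain-text path described in the chunking step.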
```python
    # ...
    with data_scope["files"].row() as file:
        file["extension"] = file["filename"].transform(extract_extension)
```

Here we extract the extension of the filename and store it in the `extension` field. For example, if the filename is `spec.rs`, the `extension` field will be `.rs`.

2.2 Split the file into chunks

Next, we split the file into chunks using the `SplitRecursively` function. You can find the documentation for the function here.

CocoIndex provides built-in support for Tree-sitter, so you can pass the language in the `language` parameter. To see all supported language names and extensions, see the documentation here. All major languages are supported, e.g. Python, Rust, JavaScript, TypeScript, Java, C++, etc. If no language is specified, or the specified language is not supported, the content is treated as plain text.

```python
    with data_scope["files"].row() as file:
        # ...
        file["chunks"] = file["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language=file["extension"], chunk_size=1000, chunk_overlap=300)
```

2.3 Embed the chunks

We use the `SentenceTransformerEmbed` function to embed the chunks. You can find the documentation for the function here. There are 12k models supported by 🤗 Hugging Face; you can just pick your favorite one.

```python
def code_to_embedding(text: cocoindex.DataSlice) -> cocoindex.DataSlice:
    """
    Embed the text using a SentenceTransformer model.
    """
    return text.transform(
        cocoindex.functions.SentenceTransformerEmbed(
            model="sentence-transformers/all-MiniLM-L6-v2"))
```

Then, for each chunk, we embed it with the `code_to_embedding` function and collect the embeddings in the `code_embeddings` collector.

We factor out the `code_to_embedding` function instead of calling `transform(cocoindex.functions.SentenceTransformerEmbed(...))` directly in place because we want to share it between building the indexing flow and defining the query handler.
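Why sharing one embedding function matters can be sketched in plain Python, without CocoIndex. In the sketch below, `toy_embed`, `cosine_similarity`, and `search` are hypothetical names, and `toy_embed` is a bag-of-words counter standing in for `SentenceTransformerEmbed`; it captures no real semantics and only shows the plumbing. Indexed chunks and the incoming query pass through the same function, so their vectors live in the same space and cosine similarity produces a meaningful ranking.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Dot product divided by the product of the vector norms (1.0 = same direction)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Indexing side: every chunk is embedded with the shared function.
CHUNKS = {
    "main.py:1": "def extract_extension(filename): return os.path.splitext(filename)[1]",
    "spec.rs:10": "fn parse_spec(input: &str) -> Spec { todo!() }",
}
INDEX = {location: toy_embed(code) for location, code in CHUNKS.items()}


def search(query: str, embed=toy_embed, top_k: int = 10) -> list[tuple[float, str]]:
    """Query side: embed the query, then rank indexed chunks by cosine similarity."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, vec), loc) for loc, vec in INDEX.items()]
    return sorted(scored, reverse=True)[:top_k]


if __name__ == "__main__":
    print(search("filename extension"))

    # A query embedded by a *different* function lands in a different vector
    # space: every score collapses to 0 and the ranking becomes meaningless.
    def mismatched_embed(text: str) -> Counter:
        return Counter("v2:" + t for t in text.lower().split())

    print(search("filename extension", embed=mismatched_embed))
```

With a real model the failure is subtler than all-zero scores, but the principle is the same: query vectors are only comparable to index vectors produced by the same model.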
Alternatively, to keep things simple, it's also fine to skip this extra function and do things directly in place. Copying and pasting a little code is no big deal; we did exactly that for the quickstart project.

```python
    with data_scope["files"].row() as file:
        # ...
        with file["chunks"].row() as chunk:
            chunk["embedding"] = chunk["text"].call(code_to_embedding)
            code_embeddings.collect(filename=file["filename"], location=chunk["location"],
                                    code=chunk["text"], embedding=chunk["embedding"])
```

2.4 Collect the embeddings

Finally, let's export the embeddings to a table.

```python
    code_embeddings.export(
        "code_embeddings",
        cocoindex.storages.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_index=[("embedding", cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)])
```

3. Set up the query handler for your index

We'll use `SimpleSemanticsQueryHandler` to query the index. Note that we need to pass the `code_to_embedding` function to the `query_transform_flow` parameter, so that the query handler uses the same embedding model as the flow.

```python
query_handler = cocoindex.query.SimpleSemanticsQueryHandler(
    name="SemanticsSearch",
    flow=code_embedding_flow,
    target_name="code_embeddings",
    query_transform_flow=code_to_embedding,
    default_similarity_metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY)
```

Define the main function that runs the query handler.

```python
@cocoindex.main_fn()
def _run():
    # Run queries in a loop to demonstrate the query capabilities.
    while True:
        try:
            query = input("Enter search query (or Enter to quit): ")
            if query == '':
                break
            # Return the top 10 matches for the query.
            results, _ = query_handler.search(query, 10)
            print("\nSearch results:")
            for result in results:
                print(f"[{result.score:.3f}] {result.data['filename']}")
                print(f"    {result.data['code']}")
                print("---")
            print()
        except KeyboardInterrupt:
            break


if __name__ == "__main__":
    load_dotenv(override=True)
    _run()
```

The `@cocoindex.main_fn()` decorator initializes the library with settings loaded from environment variables. See the initialization documentation for more details.

Run index setup and update

🎉 Now you're all set! Run the following commands to set up and update the index.

```sh
python main.py cocoindex setup
python main.py cocoindex update
```

You'll see the index update status in the terminal.

Test the query

At this point, you can start the CocoIndex server and develop your RAG runtime against the data. To test your index, there are two options:

Option 1: Run the index server in the terminal

```sh
python main.py
```

When you see the prompt, you can enter your search query, for example `spec`:

```
Enter search query (or Enter to quit): spec
```

You'll find the search results in the terminal. Each returned entry contains the score (cosine similarity), the filename, and the matching code snippet. In CocoIndex, we use `cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY` to measure the similarity between the query and the indexed data. You can also switch to other metrics and quickly test them out. To learn more about cosine similarity, see the Wiki.

Option 2: Run CocoInsight to understand your data pipeline and data index

CocoInsight is a tool that helps you understand your data pipeline and data index. It connects to your local CocoIndex server with zero data retention. CocoInsight is in Early Access now (free) 😊 You found us!
Here's a quick 3-minute video tutorial about CocoInsight: Watch on YouTube.

Run the CocoIndex server

```sh
python main.py cocoindex server -c https://cocoindex.io
```

Once the server is running, open CocoInsight in your browser. You'll be able to connect to your local CocoIndex server and explore your data pipeline and index.

On the right side, you can see the data flow we defined. On the left side, you can see the data index in the data preview. You can click on any row to see the details of that data entry, including the full content of the code chunks and their embeddings.

Community

We love hearing from the community! You can find us on GitHub and Discord.

If you like this post and our work, please support CocoIndex on GitHub with a star ⭐. Thank you, with a warm coconut hug 🥥🤗.