Communities, chats, and forums are an endless source of information on a multitude of topics. Slack often replaces technical documentation, while Telegram and Discord groups help with gaming, startup, crypto, and travel questions. Despite the value of first-hand information, it is often highly unstructured, which makes it hard to search through. In this article, we will explore the complexities of implementing a Telegram bot that answers questions by extracting information from a chat's message history.

Here are the challenges that await us:

- **Searching for relevant messages.** The answer may be scattered across a conversation between several people, or hidden in a link to an external resource.
- **Ignoring off-topic.** There is a lot of spam and off-topic chatter, which we need to learn to identify and filter out.
- **Prioritizing fresh content.** Information becomes outdated. How do you know whether an answer is still correct today?

## Basic chatbot user flow

1. The user asks the bot a question
2. The bot searches for the closest answers in the message history
3. The bot summarizes the search results with the help of an LLM
4. The bot returns the final answer to the user, with links to the relevant messages

Let's walk through the major stages of this user flow and highlight the main challenges we will face.

## Data preparation

To prepare the message history for search, we need to build embeddings of the messages — vectorized representations of the text. If we were dealing with a wiki article or a PDF document, we would split the text into paragraphs and compute a sentence embedding for each one. However, we need to account for peculiarities that are typical for chats but not for well-structured text:

- Multiple short consecutive messages coming from a single user.
  In such cases, it is better to merge the messages into larger blocks of text.
- Some messages are very long and cover several different topics.
- Meaningless messages and spam need to be filtered out.
- A user may reply without tagging the original message.
- A question and its answer may be separated by many other messages in the conversation.
- A user may answer with a link to an external resource (for example, an article or a document).

Next, we need to choose an embedding model. There are many different models for building embeddings, and several factors must be considered when choosing the right one:

- **Dimensionality of the embeddings.** The higher it is, the more nuance the model can capture in the data. The search will be more precise, but it will require more memory and compute resources.
- **The dataset the embedding model was trained on.** This determines, for example, how well it handles the language you need.

To improve the quality of the search results, we can also classify messages by topic. For example, in a chat dedicated to frontend development, users may discuss topics such as CSS, tooling, React, Vue, etc. You can use an LLM (more expensive) or classic topic-modeling methods from libraries such as BERTopic to classify messages by topic.

We will also need a vector database to store the embeddings along with meta-information (links to the original posts, categories, dates). Many vector stores exist for this purpose, such as FAISS, Milvus, or Pinecone. A regular PostgreSQL with the pgvector extension will also do.

## Processing a user question

To answer a user's question, we need to convert the question into a searchable form: compute the embedding of the question and, additionally, determine its intent.
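Before wiring up a real vector store, it helps to see what similarity search does at its core: it ranks stored embeddings by their distance to the query embedding. Below is a minimal in-memory sketch (illustrative only — function names and the item shape are assumptions, and real stores like pgvector use an index instead of scanning every row):

```typescript
// L2 (Euclidean) distance between two embedding vectors of equal length
function l2Distance(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Return the k stored items closest to the query embedding
function nearestNeighbors<T extends { embedding: number[] }>(
  query: number[],
  items: T[],
  k: number,
): T[] {
  return [...items]
    .sort((x, y) => l2Distance(query, x.embedding) - l2Distance(query, y.embedding))
    .slice(0, k);
}
```

A vector database performs exactly this ranking, but backed by an approximate index (e.g. HNSW or IVF), so it does not have to compute the distance to every stored row.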
The result of a semantic search on a question may be similar questions from the chat history, but not the answers to them. To improve this, we can use one of the popular optimization techniques: HyDE (hypothetical document embeddings). The idea is to generate a hypothetical answer to the question using an LLM, and then compute the embedding of that answer. In some cases, this allows a more accurate and relevant search for information among answers rather than among questions.

## Searching for the most relevant messages

Once we have the embedding of the question, we can search for the closest messages in the database. The LLM has a limited context window, so we may not be able to add all the search results if there are too many. The question arises of how to prioritize the answers. There are several approaches:

- **Recency score.** Over time, information becomes outdated, and to prioritize fresh messages you can compute a recency score with a simple formula: 1 / (today - date_of_message + 1).
- **Metadata filtering.** (Requires identifying the topic of the question and of the posts.) This helps narrow down your search, keeping only the posts relevant to the topic you are looking for.
- **Full-text search.** Classic full-text search, which is well supported by all popular databases, can sometimes be useful.
- **Reranking.** Once we have the results, we can reorder them by how 'close' each one is to the question, keeping only the most relevant. Reranking requires a CrossEncoder model, or we can use a reranking API, for example from Cohere.

## Generating the final answer

After searching and ranking at the previous step, we can keep the 50-100 posts that fit into the LLM context.
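The recency-score formula above can be sketched directly. Here it is also blended with a similarity score via a weighted sum — the blending and the `alpha` weight are an illustrative choice, not something prescribed by the article:

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// recency score = 1 / (days since the message + 1), per the formula above
function recencyScore(messageDate: Date, today: Date): number {
  const days = Math.max(0, Math.floor((today.getTime() - messageDate.getTime()) / MS_PER_DAY));
  return 1 / (days + 1);
}

// Blend similarity and recency; alpha close to 1 favors similarity
function combinedScore(similarity: number, messageDate: Date, today: Date, alpha = 0.8): number {
  return alpha * similarity + (1 - alpha) * recencyScore(messageDate, today);
}
```

A fresh message scores 1, a 9-day-old message scores 1/10, so older posts are smoothly demoted rather than cut off at a hard date threshold.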
The next step is to craft a clear and concise prompt for the LLM using the original user query and the search results. It should tell the LLM how to answer the question, pass in the user's question, and provide the context — the relevant messages we found. For this, it is essential to consider the following:

- **System prompt** — the instructions to the model explaining how it should process the information. For example, you can tell the LLM to look for the answer only in the provided data.
- **Context length** — the maximum length of the messages we can use as input. We can count the number of tokens using the tokenizer corresponding to the model we use. For example, OpenAI uses Tiktoken.
- **Model hyperparameters** — for example, temperature controls how creative the model will be in its responses.
- **Choice of model.** It is not always worth overpaying for the biggest and most powerful model. It makes sense to run several tests with different models and compare their results. In some cases, less resource-hungry models will do the job if high precision is not required.

## Implementation

Now let's try to implement these steps with NodeJS. Here is the tech stack I will be using:

- NodeJS and TypeScript
- Grammy — Telegram bot framework
- PostgreSQL — as the primary storage for all our data
- pgvector — PostgreSQL extension for storing text embeddings
- OpenAI API — LLM and embedding models
- Mikro-ORM — to simplify DB interactions

We'll skip the basic steps of installing dependencies and setting up the Telegram bot, and move straight to the most important functionality.
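To respect the context length, the retrieved messages have to be trimmed to a token budget before they go into the prompt. A production version would count tokens with the model's real tokenizer (e.g. Tiktoken); the sketch below uses a rough ~4-characters-per-token heuristic purely to show the trimming loop (both function names are illustrative):

```typescript
// Rough token estimate: ~4 characters per token for English text.
// Swap in the model's real tokenizer (e.g. Tiktoken) in production.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep adding messages (most relevant first) until the budget is exhausted
function fitToContext(texts: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const text of texts) {
    const cost = estimateTokens(text);
    if (used + cost > maxTokens) break;
    kept.push(text);
    used += cost;
  }
  return kept;
}
```

Because the inputs are assumed to be sorted by relevance, cutting off at the budget keeps the best candidates and drops the weakest ones.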
Here is the database schema we will need later:

```ts
import {
  ArrayType, DateTimeType, Entity, ManyToOne,
  PrimaryKey, Property, TextType,
} from '@mikro-orm/core';
// BaseEntity and VectorType (pgvector column support) are defined elsewhere in the project

@Entity({ tableName: 'groups' })
export class Group extends BaseEntity {
  @PrimaryKey()
  id!: number;

  @Property({ type: 'bigint' })
  channelId!: number;

  @Property({ type: 'text', nullable: true })
  title?: string;

  @Property({ type: 'json' })
  attributes!: Record<string, unknown>;
}

@Entity({ tableName: 'messages' })
export class Message extends BaseEntity {
  @PrimaryKey()
  id!: number;

  @Property({ type: 'bigint' })
  messageId!: number;

  @Property({ type: TextType })
  text!: string;

  @Property({ type: DateTimeType })
  date!: Date;

  @ManyToOne(() => Group, { onDelete: 'cascade' })
  group!: Group;

  @Property({ type: 'string', nullable: true })
  fromUserName?: string;

  @Property({ type: 'bigint', nullable: true })
  replyToMessageId?: number;

  @Property({ type: 'bigint', nullable: true })
  threadId?: number;

  @Property({ type: 'json' })
  attributes!: {
    raw: Record<any, any>;
  };
}

@Entity({ tableName: 'content_chunks' })
export class ContentChunk extends BaseEntity {
  @PrimaryKey()
  id!: number;

  @ManyToOne(() => Group, { onDelete: 'cascade' })
  group!: Group;

  @Property({ type: TextType })
  text!: string;

  @Property({ type: VectorType, length: 1536, nullable: true })
  embeddings?: number[];

  @Property({ type: 'int' })
  tokens!: number;

  @Property({ type: new ArrayType<number>((i: string) => +i), nullable: true })
  messageIds?: number[];

  @Property({ persist: false, nullable: true })
  distance?: number;
}
```

## Splitting user dialogues into chunks

Splitting long dialogues between multiple users into chunks is not a trivial task. Unfortunately, default approaches such as RecursiveCharacterTextSplitter, available in the Langchain library, do not account for all the peculiarities specific to chatting. However, in the case of Telegram, we can take advantage of Telegram threads, which group the relevant messages and the replies users send to them.
Every time a new batch of messages arrives from a chat room, our bot needs to:

1. Filter out short messages using a list of stop words (e.g. 'hello', 'bye', etc.)
2. Merge messages coming from the same user if they were sent consecutively within a short period of time
3. Group together all messages belonging to the same thread
4. Merge the grouped messages into larger blocks of text, and split these blocks again into chunks using RecursiveCharacterTextSplitter
5. Compute embeddings for each chunk
6. Persist the text chunks in the database together with their embeddings and links to the original messages

```ts
class ChatContentSplitter {
  constructor(
    private readonly splitter: RecursiveCharacterTextSplitter,
    private readonly longMessageLength = 200,
  ) {}

  public async split(messages: EntityDTO<Message>[]): Promise<ContentChunk[]> {
    const filtered = this.filterMessages(messages);
    const merged = this.mergeUserMessageSeries(filtered);
    const threads = this.toThreads(merged);
    const chunks = await this.threadsToChunks(threads);
    return chunks;
  }

  toThreads(messages: EntityDTO<Message>[]): EntityDTO<Message>[][] {
    const threads = new Map<number, EntityDTO<Message>[]>();
    const orphans: EntityDTO<Message>[][] = [];
    for (const message of messages) {
      if (message.threadId) {
        let thread = threads.get(message.threadId);
        if (!thread) {
          thread = [];
          threads.set(message.threadId, thread);
        }
        thread.push(message);
      } else {
        orphans.push([message]);
      }
    }
    return [...threads.values(), ...orphans];
  }

  private async threadsToChunks(
    threads: EntityDTO<Message>[][],
  ): Promise<ContentChunk[]> {
    const result: ContentChunk[] = [];
    for (const thread of threads) {
      const content = thread
        .map((m) => this.dtoToString(m))
        .join('\n');
      const texts = await this.splitter.splitText(content);
      const messageIds = thread.map((m) => m.id);
      const chunks = texts.map(
        (text) => new ContentChunk(text, messageIds),
      );
      result.push(...chunks);
    }
    return result;
  }
```
```ts
  mergeUserMessageSeries(messages: EntityDTO<Message>[]): EntityDTO<Message>[] {
    if (messages.length === 0) return [];
    const result: EntityDTO<Message>[] = [];
    let current = messages[0];
    for (const message of messages.slice(1)) {
      const short = message.text.length < this.longMessageLength;
      const sameUser = current.fromUserName === message.fromUserName;
      const subsequent = differenceInMinutes(message.date, current.date) < 10;
      if (sameUser && subsequent && short) {
        // append the short follow-up to the current merged message
        current.text += `\n${message.text}`;
      } else {
        result.push(current);
        current = message;
      }
    }
    result.push(current);
    return result;
  }

  // ....
}
```

## Content embeddings

Next, we need to compute embeddings for each of the chunks. For this, we can use the OpenAI model text-embedding-3-large:

```ts
public async getEmbeddings(chunks: ContentChunk[]) {
  // the API accepts batched input, so we process the chunks in batches of 100
  const batches = groupArray(chunks, 100);
  for (const batch of batches) {
    const res = await this.openai.embeddings.create({
      input: batch.map((c) => c.text),
      model: 'text-embedding-3-large',
      encoding_format: 'float',
    });
    batch.forEach((chunk, i) => {
      chunk.embeddings = res.data[i].embedding;
    });
  }
  await this.orm.em.flush();
}
```

## Answering user questions

To answer a user's question, we first compute the embedding of the question itself, and then search for the most relevant messages in the chat history:

```ts
public async similaritySearch(embeddings: number[], groupId: number): Promise<ContentChunk[]> {
  return this.orm.em.qb(ContentChunk)
    .where({
      embeddings: { $ne: null },
      group: this.orm.em.getReference(Group, groupId),
    })
    .orderBy({ [l2Distance('embeddings', embeddings)]: 'ASC' })
    .limit(100)
    .getResultList();
}
```

Then we rerank the search results with the help of Cohere's reranking model:

```ts
public async rerank(query: string, chunks: ContentChunk[]): Promise<ContentChunk[]> {
  const { results } = await cohere.v2.rerank({
    documents: chunks.map((c) => c.text),
    query,
    model: 'rerank-v3.5',
  });
  // `results` comes back ordered by relevance; `index` points at the original document
  return results.map(({ index }) => chunks[index]);
}
```
Finally, we ask the LLM to answer the user's question by summarizing the search results. A simplified version of the search query handling will look like this:

```ts
public async search(query: string, group: Group) {
  // simplified: assumes a variant of getEmbeddings that embeds a single query string
  const queryEmbeddings = await this.getEmbeddings(query);
  const chunks = await this.chunkService.similaritySearch(queryEmbeddings, group.id);
  const reranked = await this.cohereService.rerank(query, chunks);
  const completion = await this.openai.chat.completions.create({
    model: 'gpt-4-turbo',
    temperature: 0,
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: this.userPromptTemplate(query, reranked) },
    ],
  });
  return completion.choices[0].message;
}

// naive prompt
public userPromptTemplate(query: string, chunks: ContentChunk[]) {
  const history = chunks
    .map((c) => `${c.text}`)
    .join('\n----------------------------\n');
  return `
    Answer the user's question: ${query}
    By summarizing the following content: ${history}
    Keep your answer direct and concise. Provide references to the corresponding messages.
  `;
}
```

## Further improvements

Even after all the optimizations, we may feel that the LLM-based bot's answers are non-ideal and incomplete. What else could be improved?

- For user posts that contain links, we can also parse the content of the web pages and PDF documents.
- **Query routing** — directing user queries to the most appropriate data source, model, or index based on the query's intent and context, in order to optimize accuracy, efficiency, and cost.
- We can add resources relevant to the chat room's subject to the search index — at work, this could be documentation from Confluence; for visa chats, consulate websites with the rules, etc.
- **RAG evaluation** — we need to build a pipeline to evaluate the quality of our bot's answers.
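As a very first step towards such an evaluation pipeline, one can measure retrieval hit rate on a small hand-written set of question → expected-message pairs. The sketch below is illustrative: the `EvalCase` shape and the injected `retrieve` callback (standing in for the real embedding + search + rerank pipeline) are assumptions, not part of the codebase above:

```typescript
interface EvalCase {
  question: string;
  expectedMessageIds: number[]; // messages a correct retrieval should surface
}

// Fraction of eval cases where at least one expected message was retrieved.
// `retrieve` stands in for the real pipeline (embedding + search + rerank).
function retrievalHitRate(
  cases: EvalCase[],
  retrieve: (question: string) => number[],
): number {
  let hits = 0;
  for (const c of cases) {
    const retrieved = new Set(retrieve(c.question));
    if (c.expectedMessageIds.some((id) => retrieved.has(id))) hits++;
  }
  return cases.length ? hits / cases.length : 0;
}
```

Tracking this single number over time already makes it possible to compare embedding models, chunking strategies, and reranking settings against each other.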