
AI for Knowledge Management: Iterating on RAG with the QE-RAG Architecture

By Shanglun Wang | 27 min read | 2023/09/12

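Too Long; Didn't Read

Retrieval-Augmented Generation (RAG) is a popular architecture for building powerful LLM apps. However, the current architecture has some real limitations. We will walk through building a RAG application, then look at how a new architecture called QE-RAG can improve on it.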


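As the LLM revolution begins to take shape, hype has given way to commercial development. As the initial wave of excitement dies down, generative AI is no longer seen as an omniscient black box, but as a component, an extremely powerful tool, in an engineer's arsenal. As a result, entrepreneurs and technologists now have an increasingly mature set of tools and techniques with which to develop LLM applications.

One of the most interesting use cases for LLMs has been in the field of knowledge management. Specialized LLMs, based on either OpenAI's GPT technology or open-source models like LLaMa 2 and Flan-T5, are being used in clever ways to manage large quantities of data. Where organizations with large text datasets previously had to rely on text search techniques like fuzzy matching or full-text indexing, they now have access to a powerful system that can not only find the information but also summarize it in a time-efficient and reader-friendly fashion.

Within this use case, the retrieval-augmented generation (RAG) architecture has emerged as a standout architecture with enormous flexibility and performance. With this architecture, organizations can quickly index a body of work, perform semantic queries on it, and generate informative and cogent answers to user-defined queries based on the corpus. Several companies and services have sprung up to support implementations of the RAG architecture, highlighting its staying power.

As effective as RAG can be, this architecture also has several real limitations. In this article, we will explore the RAG architecture, identify its limitations, and propose an improved architecture to address them.

As with all my other articles, I am looking to connect with other technologists and AI enthusiasts. If you have thoughts on how this architecture can be improved, or have ideas about AI that you would like to discuss, please do not hesitate to reach out! You can find me on Github or LinkedIn; the links are in my profile as well as at the bottom of this article.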

Content Overview

Retrieval Augmented Generation (RAG) Architecture
Limitations of the RAG Architecture
Proposing QE-RAG, or Question-Enhanced RAG
Conclusion

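Retrieval Augmented Generation (RAG) Architecture

With names like RAG, Flan, and LLaMa, the AI community is unlikely to win awards for futuristic and stylish names anytime soon. However, the RAG architecture certainly deserves an award for its combination of two extremely powerful techniques made available by the development of LLMs: contextual document embedding and prompt engineering.

At its simplest, the RAG architecture is a system that uses embedding vector search to find the part(s) of the corpus most relevant to a question, inserts those part(s) into a prompt, and then uses prompt engineering to ensure that the answer is based on the excerpts given in the prompt. If this all sounds a little confusing, please read on, because I will explain each component in turn. I will also include example code so you can follow along.

Embedding Models

First and foremost, an effective RAG system requires a powerful embedding model. The embedding model transforms a natural text document into a series of numbers, or a "vector", that roughly represents the semantic content of the document. Assuming the embedding model is a good one, you can compare the semantic values of two different documents and determine, using vector arithmetic, whether they are semantically similar.

To see this in action, paste the following code into a Python file and run it: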

    # Written against the pre-1.0 openai Python SDK (openai.Embedding, openai.embeddings_utils).
    import openai
    from openai.embeddings_utils import cosine_similarity

    openai.api_key = "YOUR KEY"  # replace with your own API key
    EMBEDDING_MODEL = "text-embedding-ada-002"

    def get_cos_sim(input_1, input_2):
        # Embed both inputs in a single request, then compare the two vectors.
        embeds = openai.Embedding.create(model=EMBEDDING_MODEL, input=[input_1, input_2])
        return cosine_similarity(embeds['data'][0]['embedding'], embeds['data'][1]['embedding'])

    print(get_cos_sim('Driving a car', 'William Shakespeare'))
    print(get_cos_sim('Driving a car', 'Riding a horse'))

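The above code generates embeddings for the phrases "Driving a car", "William Shakespeare", and "Riding a horse", then compares them using the cosine similarity algorithm. We expect cosine similarity to be higher when phrases are semantically similar, so "Driving a car" and "Riding a horse" should be much closer, whereas "Driving a car" and "William Shakespeare" should be dissimilar.

You should see that, according to OpenAI's embedding model, ada-002, the phrase "Driving a car" is 88% similar to the phrase "Riding a horse" and 76% similar to the phrase "William Shakespeare". This means the embedding model is performing as we expect. This determination of semantic similarity is the foundation of the RAG system.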
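For readers curious about what a cosine similarity function computes, here is a minimal pure-Python sketch. The name `cosine_sim` and the toy three-dimensional vectors are illustrative assumptions, not part of the OpenAI library; real ada-002 embeddings have 1,536 dimensions.

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors, made up for illustration only.
same_direction = cosine_sim([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_sim([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score close to 1 regardless of their length, and unrelated (orthogonal) vectors score 0, which is why the measure works for comparing a short phrase with a long document.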

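The idea of cosine similarity is remarkably robust when you extend it to comparisons of much larger documents. For example, take the powerful monologue from Shakespeare's Macbeth, "Tomorrow, and tomorrow, and tomorrow":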

    monologue = '''Tomorrow, and tomorrow, and tomorrow,
    Creeps in this petty pace from day to day,
    To the last syllable of recorded time;
    And all our yesterdays have lighted fools
    The way to dusty death. Out, out, brief candle!
    Life's but a walking shadow, a poor player,
    That struts and frets his hour upon the stage,
    And then is heard no more. It is a tale
    Told by an idiot, full of sound and fury,
    Signifying nothing.'''

    print(get_cos_sim(monologue, 'Riding a car'))
    print(get_cos_sim(monologue, 'The contemplation of mortality'))

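You should see that the monologue is only 75% similar to the idea of "Riding a car" and 82% similar to the idea of "The contemplation of mortality".

But we can not only compare monologues with ideas, we can actually compare monologues with questions. For example: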

    question = 'Which Shakespearean monologue contemplates mortality?'

    macbeth_monologue = '''Tomorrow, and tomorrow, and tomorrow,
    Creeps in this petty pace from day to day,
    To the last syllable of recorded time;
    And all our yesterdays have lighted fools
    The way to dusty death. Out, out, brief candle!
    Life's but a walking shadow, a poor player,
    That struts and frets his hour upon the stage,
    And then is heard no more. It is a tale
    Told by an idiot, full of sound and fury,
    Signifying nothing.'''

    egeus_monologue = '''Full of vexation come I, with complaint
    Against my child, my daughter Hermia.
    Stand forth, Demetrius. My noble lord,
    This man hath my consent to marry her.
    Stand forth, Lysander. And my gracious Duke,
    This man hath bewitch'd the bosom of my child.
    Thou, thou, Lysander, thou hast given her rhymes,
    And interchanged love-tokens with my child:
    Thou hast by moonlight at her window sung
    With feigning voice verses of feigning love,
    And stol'n the impression of her fantasy
    With bracelets of thy hair, rings, gauds, conceits,
    Knacks, trifles, nosegays, sweetmeats (messengers
    Of strong prevailment in unharden'd youth):
    With cunning hast thou filch'd my daughter's heart,
    Turn'd her obedience, which is due to me,
    To stubborn harshness. And, my gracious Duke,
    Be it so she will not here, before your Grace,
    Consent to marry with Demetrius,
    I beg the ancient privilege of Athens:
    As she is mine, I may dispose of her;
    Which shall be either to this gentleman,
    Or to her death, according to our law
    Immediately provided in that case.'''

    # Print both scores so the two monologues can be compared side by side.
    print(get_cos_sim(macbeth_monologue, question))
    print(get_cos_sim(egeus_monologue, question))

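You should see that the embedding places the Macbeth monologue contextually much closer to the question "Which Shakespearean monologue contemplates mortality?" than the Egeus monologue, which does mention death but does not directly grapple with the concept of mortality.

Vector Lookup

Now that we have the embedding, how do we use it in our RAG system? Well, suppose we wanted to give our RAG system the knowledge of all Shakespeare monologues so that it can answer questions about Shakespeare. In this case, we would download all of Shakespeare's monologues and generate embeddings for them. If you are following along, you can generate the embeddings like this: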

 embedding = openai.Embedding.create(model=EMBEDDING_MODEL, input=[monologue])['data'][0]['embedding']

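Once we have the embeddings, we will want to store them in a way that lets us query them and compare them against a new embedding. Normally we would put them in a vector database, a specialized data store that allows fast comparisons of vectors. However, unless your corpus is extremely large, brute-force comparisons are surprisingly tolerable for most non-production, experimental use cases where performance is not critical.

Whether or not you choose to use a database, you will want to build a system that can find the item(s) in your corpus that best fit the question. In our example, we will want the ability to find the monologue most relevant to the user question at hand. You might do something like this: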

    monologues_embeddings = [
        # text in the left position, embedding in the right position
        ['Tomorrow, and tomorrow, and tomorrow...', [...]],
        ['Full of vexation come I...', [...]],
        # ... more monologues and their embeddings as you see fit
    ]

    def lookup_most_relevant(question):
        embed = openai.Embedding.create(model=EMBEDDING_MODEL, input=[question])['data'][0]['embedding']
        # Brute-force scan: pick the entry whose embedding is closest to the question.
        top_monologue = max(monologues_embeddings, key=lambda x: cosine_similarity(embed, x[1]))
        return top_monologue

    lookup_most_relevant("How does Macbeth evaluate his life when he is confronted with his mortality?")

If you run this example, you should see the Macbeth monologue being selected, with roughly 82% similarity to the question.
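When more than one passage is needed, the same brute-force scan extends naturally to a top-k lookup. The following is a minimal sketch under stated assumptions: `top_k`, `cosine_sim`, and the two-dimensional toy vectors are hypothetical names and data invented for illustration, standing in for real embeddings.

```python
import heapq
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, entries, k=3):
    """Return the k (score, text) pairs most similar to query_vec.

    entries is a list of (text, embedding) pairs, like monologues_embeddings.
    """
    scored = ((cosine_sim(query_vec, vec), text) for text, vec in entries)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])

# Toy 2-dimensional "embeddings", made up for illustration only.
toy_entries = [
    ("monologue about death", [0.9, 0.1]),
    ("monologue about marriage", [0.1, 0.9]),
    ("monologue about fate", [0.7, 0.3]),
]
results = top_k([1.0, 0.0], toy_entries, k=2)
for score, text in results:
    print(round(score, 3), text)
```

A linear scan like this is usually fine for a few thousand vectors; a vector database becomes worthwhile once the corpus or the query volume grows.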

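Prompt Engineering

The last step in the RAG model is prompt engineering. In our case, it is not too difficult. Now that we have the monologue at hand, we can construct a prompt by simply embedding the monologue in our query, then asking the question: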

    completion = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": """
    Consider the following monologue:

    Tomorrow, and tomorrow, and tomorrow,
    Creeps in this petty pace from day to day,
    To the last syllable of recorded time;
    And all our yesterdays have lighted fools
    The way to dusty death. Out, out, brief candle!
    Life's but a walking shadow, a poor player,
    That struts and frets his hour upon the stage,
    And then is heard no more. It is a tale
    Told by an idiot, full of sound and fury,
    Signifying nothing.

    How does Macbeth evaluate his life when he is confronted with his mortality?
    """},
        ]
    )

You should see an answer like this:

 In the well-known "Tomorrow and tomorrow and tomorrow" monologue from Shakespeare's Macbeth, the titular character plunges into existential despair when confronted with his impending death. Macbeth evaluates his life as hollow, futile, and meaningless when facing his mortality. He sees it as a "walking shadow" and himself as a "poor player," highlighting his view of life as a fleeting, contrived performance with no lasting substance or impact. The "brief candle" is a metaphor for his own mortality, suggesting that life's flame can be extinguished suddenly and unceremoniously. By stating "It is a tale told by an idiot, full of sound and fury, signifying nothing," Macbeth is expressing his belief that life, despite all its noise and action, is ultimately empty, absurd, and void of meaning. Overall, Macbeth's examination of his life is profoundly pessimistic, revealing his deep despair and cynicism.

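Of course, this particular example is not the most powerful demonstration of the RAG architecture, since most GPT models are already aware of Shakespeare's monologues and have been trained on the large body of Shakespeare analysis publicly available on the internet. In fact, if you ask GPT-4 this exact question without the monologue embedded, you will likely get a very good answer, though it likely won't make as many quotation references to the monologue. However, it should be apparent that, in a commercial setting, this technique can be cross-applied to proprietary or esoteric datasets that are not accessible to existing GPT implementations.

In fact, readers who are familiar with my previous article, Building a Document Analyzer with ChatGPT, Google Cloud, and Python, may recognize that the last part of the technique is very similar to the prompt engineering I did in that article. Extending from that idea, we can very easily imagine a RAG system built on top of publications from the Japanese government (the sample data from that article), which would allow users to search for and ask questions about Japanese economic policy. The system would quickly retrieve the most relevant documents, summarize them, and produce an answer based on deep domain-specific knowledge not available to the base GPT models. This power and simplicity is precisely why the RAG architecture is getting so much traction among LLM developers.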
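To tie the three pieces together (embed, look up, prompt), here is a minimal end-to-end sketch. It is an illustration under assumptions, not the implementation used in this article: `answer_with_rag` and its injected callables are hypothetical names, and the fake embedding and completion functions exist only so the flow can run without network calls. In real use, `embed_fn` would wrap `openai.Embedding.create` and `complete_fn` would wrap `openai.ChatCompletion.create`.

```python
def answer_with_rag(question, corpus, embed_fn, similarity_fn, complete_fn):
    """Retrieve the most similar document and ask the model about it.

    corpus is a list of (text, embedding) pairs. embed_fn, similarity_fn and
    complete_fn are injected so the flow can be tested without network calls.
    """
    question_vec = embed_fn(question)
    best_text, _ = max(corpus, key=lambda item: similarity_fn(question_vec, item[1]))
    prompt = "Consider the following passage:\n\n%s\n\n%s" % (best_text, question)
    return complete_fn(prompt)

# Stand-ins for the real API calls, invented purely for this demonstration.
def fake_embed(text):
    return [1.0, 0.0] if "death" in text.lower() else [0.0, 1.0]

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def fake_complete(prompt):
    return prompt  # echo the prompt so we can inspect what would be sent

toy_corpus = [
    ("Tomorrow, and tomorrow, and tomorrow...", [1.0, 0.0]),
    ("Full of vexation come I...", [0.0, 1.0]),
]
sent_prompt = answer_with_rag(
    "How does Macbeth view death?", toy_corpus, fake_embed, dot_product, fake_complete
)
print(sent_prompt)
```

Passing the API calls in as functions keeps the retrieval logic separate from any one provider, which also makes it easy to swap in a different corpus such as the government publications mentioned above.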

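Now that we have gone over the RAG architecture, let's explore some of its shortcomings.

Limitations of the RAG Architecture

Embedding Debuggability

Because many RAG systems rely on document embedding and vector search to connect the question with the relevant documents, the whole system is often only as good as the embedding model used. The OpenAI embedding model is incredibly flexible, and there are many techniques for tuning embeddings. LLaMa, Meta's open-source competitor to GPT, offers fine-tunable embedding models. However, there is an inescapable black-box aspect to embedding models. This is somewhat manageable when comparing short text strings, but it becomes difficult to validate and debug when comparing short strings with much longer documents. In our previous example, we have to take a slight leap of faith that the embedding lookup is able to connect "mortality" to the "Tomorrow, and tomorrow, and tomorrow" monologue. This can be quite uncomfortable for workloads where transparency and debuggability are critical.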

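Context Overload

Another limitation of the RAG model is the relatively limited amount of context that can be passed to it. Because the embedding model requires document-level context to work well, we need to be careful when chopping up the corpus for embedding. The Macbeth monologue may have 82% similarity to the question about mortality, but that similarity drops when you compare the question against only the first two lines of the monologue, that is, "Tomorrow, and tomorrow, and tomorrow. Creeps in this petty pace from day to day, To the last syllable of recorded time."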
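One common way to soften this problem, offered here as an assumption rather than something this article's implementation does, is to split the corpus into overlapping windows so that each chunk carries some of its neighbors' context. The helper name `chunk_lines` and the window sizes are hypothetical.

```python
def chunk_lines(lines, chunk_size=6, overlap=2):
    """Split a list of lines into overlapping windows of chunk_size lines."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(lines), step):
        chunks.append("\n".join(lines[start:start + chunk_size]))
        if start + chunk_size >= len(lines):
            break  # the last window already reached the end of the text
    return chunks

monologue_lines = [
    "Tomorrow, and tomorrow, and tomorrow,",
    "Creeps in this petty pace from day to day,",
    "To the last syllable of recorded time;",
    "And all our yesterdays have lighted fools",
    "The way to dusty death. Out, out, brief candle!",
    "Life's but a walking shadow, a poor player,",
    "That struts and frets his hour upon the stage,",
    "And then is heard no more. It is a tale",
    "Told by an idiot, full of sound and fury,",
    "Signifying nothing.",
]
chunks = chunk_lines(monologue_lines, chunk_size=6, overlap=2)
print(len(chunks))
```

Overlap does not remove the trade-off: larger chunks embed better but consume more of the prompt, which leads directly to the next limitation.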

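As a result, the context passed to the RAG prompt needs to be rather large. Currently, the highest-context GPT models are still limited to 16,000 tokens, which is quite a lot of text, but when you are working with long interview transcripts or context-rich articles, you will be limited in how much context you can give in the final generation prompt.

Novel Terminology

The final limitation of the RAG model is its inability to work with novel terminology. People working in specific fields tend to develop terminologies and ways of speaking that are unique to that field. When these terminologies are not present in the training data of the embedding model, the lookup process will suffer.

For example, the ada-002 embedding model may not know that the "Rust programming language" is related to "LLVM". In fact, it returns a relatively low cosine similarity of 78%. This means documents talking about LLVM may not show a strong similarity in a query about Rust, even though the two ideas are closely related in real life.

Usually, the problem of novel terminology can be overcome with some prompt engineering, but in the context of an embedding search, that is relatively difficult to do. Fine-tuning an embedding model is, as mentioned earlier, possible, but teaching the embedding model the novel terminology in all contexts can be error-prone and time-consuming.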

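Proposing QE-RAG, or Question-Enhanced RAG

Given these limitations, I would like to propose a modified architecture for a new class of RAG system that sidesteps many of the limitations described above. The idea is based on doing vector searches on frequently asked questions, in addition to the corpus, and using an LLM to preprocess the corpus in the context of the questions. If that process sounds complicated, don't worry: we will go over the implementation details in this section, along with code examples you can use to follow along.

One thing to note is that QE-RAG should be run alongside a vanilla RAG implementation, so that it can fall back on vanilla RAG if needed. As the implementation matures, it should need the fallback less and less, but QE-RAG is still intended to be an enhancement to, rather than a replacement of, the vanilla RAG architecture.

The Architecture

The broad strokes of the QE-RAG architecture are as follows:

Create a vector database of questions that can be, or are likely to be, asked about the corpus.
Preprocess and summarize the corpus against the questions in the vector database.
When a user query comes in, compare the user query with the questions in the vector database.
If a question in the database is highly similar to the user query, retrieve the version of the corpus that is summarized to answer that question.
Use the summarized corpus to answer the user question.
If no question in the DB is highly similar to the user query, fall back to a vanilla RAG implementation.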
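The routing decision in the steps above can be sketched as a small function. This is a hedged illustration, not a definitive implementation: `route_query`, the 0.9 threshold, and the toy two-dimensional vectors are assumptions chosen for the example, and a suitable threshold would need to be tuned on real question embeddings.

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def route_query(user_vec, question_bank, threshold=0.9):
    """Pick the stored question closest to the user query, or None.

    question_bank is a list of (question, embedding) pairs. Returning None
    signals that the caller should fall back to vanilla RAG.
    """
    if not question_bank:
        return None
    best_question, best_vec = max(
        question_bank, key=lambda item: cosine_sim(user_vec, item[1])
    )
    if cosine_sim(user_vec, best_vec) >= threshold:
        return best_question
    return None

# Toy 2-dimensional vectors and a threshold chosen only for illustration.
bank = [
    ("How does Shakespeare explore death and mortality?", [0.95, 0.05]),
    ("How does Shakespeare use supernatural elements?", [0.05, 0.95]),
]
print(route_query([1.0, 0.0], bank))   # close to the first stored question
print(route_query([0.7, 0.7], bank))   # not close to either: fall back
```

Returning `None` rather than raising keeps the fallback path explicit: the caller checks the result and hands the query to the vanilla RAG pipeline when no stored question is close enough.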

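Let's go through each part in turn.

Question Embeddings

The architecture begins, much like vanilla RAG, with an embedding and a vector database. However, instead of embedding the documents, we will embed a series of questions.

To illustrate this, suppose we are trying to build an LLM that is an expert on Shakespeare. We might want it to answer questions like: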

    questions = [
        "How does political power shape the way characters interact in Shakespeare's plays?",
        "How does Shakespeare use supernatural elements in his plays?",
        "How does Shakespeare explore the ideas of death and mortality in his plays?",
        "How does Shakespeare explore the idea of free will in his plays?"
    ]

We will want to create embeddings for them like so, and save them for later use:

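    # Keep only the embedding vectors so they can be compared directly later.
    # Named question_embed to match its use in the vector search step.
    question_embed = [
        item['embedding']
        for item in openai.Embedding.create(model=EMBEDDING_MODEL, input=questions)['data']
    ]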

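Preprocessing and Summarization

Now that we have the questions, we will want to download and summarize the corpus. For this example, we will download the HTML versions of Macbeth and Hamlet: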

    import openai
    import os
    import requests
    from bs4 import BeautifulSoup

    plays = {
        'shakespeare_macbeth': 'https://www.gutenberg.org/cache/epub/1533/pg1533-images.html',
        'shakespeare_hamlet': 'https://www.gutenberg.org/cache/epub/1524/pg1524-images.html',
    }

    if not os.path.exists('training_plays'):
        os.mkdir('training_plays')

    for name, url in plays.items():
        print(name)
        file_path = os.path.join('training_plays', '%s.txt' % name)
        # Skip the download if the play is already cached on disk.
        if not os.path.exists(file_path):
            res = requests.get(url)
            with open(file_path, 'w') as fp_write:
                fp_write.write(res.text)

We then process the plays into scenes, using the HTML tags as a guide:

    with open(os.path.join('training_plays', 'shakespeare_hamlet.txt')) as fp_file:
        # Name the parser explicitly to avoid BeautifulSoup guessing one.
        soup = BeautifulSoup(''.join(fp_file.readlines()), 'html.parser')

    headers = soup.find_all('div', {'class': 'chapter'})[1:]

    scenes = []
    for header in headers:
        cur_act = None
        cur_scene = None
        lines = []
        for i in header.find_all('h2')[0].parent.find_all():
            if i.name == 'h2':
                print(i.text)
                cur_act = i.text
            elif i.name == 'h3':
                print('\t', i.text.replace('\n', ' '))
                # A new scene heading closes the previous scene.
                if cur_scene is not None:
                    scenes.append({'act': cur_act, 'scene': cur_scene, 'lines': lines})
                    lines = []
                cur_scene = i.text
            elif (i.text != '' and
                  not i.text.strip('\n').startswith('ACT') and
                  not i.text.strip('\n').startswith('SCENE')):
                lines.append(i.text)
        # Flush the final scene, which has no following heading to close it.
        if cur_scene is not None:
            scenes.append({'act': cur_act, 'scene': cur_scene, 'lines': lines})

And here is the part that makes QE-RAG unique: instead of creating embeddings for the specific scenes, we create summaries of them, targeted at each of the questions:

    def summarize_for_question(text, question, location):
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-16k",
            messages=[
                {"role": "system", "content": "You are a literature assistant that provides helpful summaries."},
                {"role": "user", "content": """Is the following excerpt from %s relevant to the following question? %s

    ===
    %s
    ===
    If so, summarize the sections that are relevant. Include references to specific passages that would be useful.
    If not, simply say: \"nothing is relevant\" without additional explanation""" % (
                    location, question, text
                )},
            ]
        )
        return completion

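This function asks ChatGPT to perform 2 tasks: 1) identify whether the passage is actually useful for answering the question at hand, and 2) summarize the parts of the scene that are useful for answering the question.

If you try this function with a few pivotal scenes from Macbeth or Hamlet, you will see that GPT-3.5 is quite good at identifying whether a scene is relevant to the question, and that the summary is quite a bit shorter than the scene itself. This makes it much easier to embed in the prompt engineering step later on.

Now we can do this for all the scenes: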

    for scene in scenes:
        scene_text = ''.join(scene['lines'])
        question_summaries = {}
        for question in questions:
            completion = summarize_for_question(scene_text, question, "Shakespeare's Hamlet")
            question_summaries[question] = completion.choices[0].message['content']
        scene['question_summaries'] = question_summaries

In production workloads, we would put the summaries in a database, but in our case, we will just write them to disk as JSON files.
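Writing the summaries to disk can be as simple as the sketch below. The helper names `save_scenes` and `load_scenes`, the file name, and the example record are assumptions made for illustration; the record only mirrors the keys (`act`, `scene`, `lines`, `question_summaries`) built up in the loop above.

```python
import json
import os
import tempfile

def save_scenes(scenes, path):
    """Write the list of scene dicts (including summaries) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(scenes, fp, ensure_ascii=False, indent=2)

def load_scenes(path):
    """Read the scene dicts back from disk."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)

# A made-up scene record with the same keys the summarization loop produces.
example_scenes = [{
    "act": "ACT V",
    "scene": "SCENE V.",
    "lines": ["Tomorrow, and tomorrow, and tomorrow,"],
    "question_summaries": {"How does Shakespeare explore death?": "nothing is relevant"},
}]
demo_path = os.path.join(tempfile.mkdtemp(), "macbeth_scenes.json")
save_scenes(example_scenes, demo_path)
restored = load_scenes(demo_path)
print(restored == example_scenes)
```

Caching the summaries this way matters in practice because each scene costs one LLM call per question, so you will not want to regenerate them on every run.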

Two-Stage Vector Search

Now suppose we receive a user question like the one below:

 user_question = "How do Shakespearean characters deal with the concept of death?"

As in vanilla RAG, we will want to create an embedding for the question:

 uq_embed = openai.Embedding.create(model=EMBEDDING_MODEL, input=[user_question])['data'][0]['embedding']

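In vanilla RAG, we would compare the user question embedding with the embeddings of the scenes in Shakespeare, but in QE-RAG, we compare it with the embeddings of the questions: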

 print([cosine_similarity(uq_embed, q) for q in question_embed])

We see that the vector search has (correctly) identified question 3 as the most relevant question. Now, we retrieve the summary data for question 3:

    # Phrases GPT tends to use when a scene has nothing relevant to the question.
    IRRELEVANT_MARKERS = (
        "NOTHING IS RELEVANT",
        "NOTHING IN THIS EXCERPT",
        "NOTHING FROM THIS EXCERPT",
        "NOT DIRECTLY ADDRESSED",
    )

    relevant_texts = []
    for scene in hamlet + macbeth:  # hamlet and macbeth are the scene lists from the above code
        summary = scene['question_summaries'][questions[2]]
        if not any(marker in summary.upper() for marker in IRRELEVANT_MARKERS):
            relevant_texts.append(summary)

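Note that, because GPT summarization is not deterministic, you may get several different strings indicating that a scene is not relevant to the question at hand. The key is to push only the relevant excerpts into the list of relevant summaries.

At this stage, we could do a second-level vector search to include only the most relevant summaries in our prompt, but given the size of our corpus, we can simply use the entire relevant_texts list in our prompt.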
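For a larger corpus, the optional second-level search might look like the sketch below. It assumes each summary has already been embedded; `rank_summaries`, the dot-product similarity, and the toy vectors are hypothetical stand-ins for illustration rather than code from this article.

```python
def rank_summaries(user_vec, summaries, similarity_fn, top_n=5):
    """Return the top_n summary texts most similar to the user question.

    summaries is a list of (summary_text, embedding) pairs.
    """
    ranked = sorted(
        summaries, key=lambda item: similarity_fn(user_vec, item[1]), reverse=True
    )
    return [text for text, _ in ranked[:top_n]]

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy summaries with made-up 2-dimensional embeddings.
toy_summaries = [
    ("Hamlet contemplates suicide.", [0.9, 0.1]),
    ("Polonius gives advice to Laertes.", [0.1, 0.9]),
    ("Macbeth reflects on the futility of life.", [0.8, 0.2]),
]
best = rank_summaries([1.0, 0.0], toy_summaries, dot_product, top_n=2)
print(best)
```

Because the summaries are much shorter than the scenes they came from, this second pass compares texts of similar length, which tends to be an easier job for an embedding model than matching a short question against a long scene.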

Prompt Engineering

Now that we have all the excerpts, we can build the prompt like so:

    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k",
        messages=[
            {"role": "system", "content": "You are a literature assistant that answers questions based on excerpts."},
            {"role": "user", "content": """Consider the following passages:
    %s
    ====
    Answer this question: %s
    Be detailed and complete in your analysis. Make references to specific passages where helpful.
    Answer only based on passages provided.
    """ % (
                '\n===\n'.join(relevant_texts),
                user_question
            )},
        ]
    )

Run the API call, and you should see something like the below:

 Based on the passages provided, Shakespearean characters deal with the concept of death in various ways. In Hamlet, for example, the characters contemplate the nature of death and its implications. Hamlet himself reflects on the brevity of life and the corrupting influence of vices. He questions the purpose of existence and contemplates suicide as a release from the burdens of life. In the passage, Hamlet says, "To die, to sleep— / To sleep, perchance to dream. Ay, there's the rub, / For in that sleep of death what dreams may come" (Lines 65-67). This shows his contemplation of the afterlife and the uncertainty surrounding it. In Macbeth, characters are confronted with the consequences of their actions and the fragility of life. Lady Macbeth, for instance, experiences guilt and remorse for her involvement in the murder of King Duncan. She is haunted by visions of blood on her hands and expresses the desire to escape the consequences of her actions. In the passage, Lady Macbeth says, "Out, damned spot! Out, I say!" (Line 30). This demonstrates her struggle to cope with the guilt and the psychological toll it takes on her. Furthermore, both Macbeth and Lady Macbeth reflect on the inevitability and the consequences of their actions. In the passage, Macbeth laments the decay of life and the loss of purpose and meaning. He reflects on the transitory nature of existence and the insignificance of worldly achievements. Macbeth states, "Life's but a walking shadow, a poor player / That struts and frets his hour upon the stage" (Lines 217-218). This shows his acceptance of mortality and his realization that material possessions and power do not bring true fulfillment. In summary, Shakespearean characters deal with the concept of death in various ways. They contemplate the nature of life and the afterlife, struggle with guilt and remorse, and question the meaning and purpose of their existence. 
Shakespeare's exploration of death and mortality through his characters provides insight into the complexities of human nature and the universal human experience of confronting one's mortality.

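This is a very good summary that draws from many different parts of Hamlet and Macbeth.

For comparison, I created embeddings for entire scenes and ran a vanilla RAG lookup using the same prompt, finding the top N most relevant scenes until I reached 10,000 tokens: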

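    import tiktoken

    # Assumption: the original snippet did not define `encoding`; cl100k_base is
    # the tokenizer used by the gpt-3.5-turbo models.
    encoding = tiktoken.get_encoding("cl100k_base")

    # macbeth_embed and hamlet_embed hold one embedding per scene, in scene order.
    relevant_texts = []
    for i in range(10):
        # Stop once the accumulated context passes the 10,000-token budget.
        if len(encoding.encode('\n===\n'.join(relevant_texts))) > 10000:
            break
        relevant_texts.append(''.join(macbeth[sorted(
            [(idx, cosine_similarity(uq_embed, q)) for idx, q in enumerate(macbeth_embed)],
            key=lambda x: x[1], reverse=True
        )[i][0]]['lines']))
        relevant_texts.append(''.join(hamlet[sorted(
            [(idx, cosine_similarity(uq_embed, q)) for idx, q in enumerate(hamlet_embed)],
            key=lambda x: x[1], reverse=True
        )[i][0]]['lines']))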

Feed this into the prompt, and you should see something like the below:

 Based on the provided passages, Shakespearean characters deal with the concept of death in various ways. In Hamlet, the characters reflect on death and mortality extensively. Hamlet himself contemplates suicide, expressing his desire to end his suffering and questioning the purpose of life. He says, "To be, or not to be: that is the question" (Hamlet, Act III, Scene I, Lines 56). This soliloquy reveals his struggle with the concept of death and his contemplation of the consequences of choosing either life or death. Additionally, in the same play, the appearance of the Ghost prompts discussions about death. The characters question the meaning and significance of encountering the dead, and Hamlet grapples with the idea of mortality in relation to his father's death. He reflects, "Oh God! a beast that wants discourse of reason / Would have mourn'd longer" (Hamlet, Act I, Scene II, Lines 152-153). This passage highlights Hamlet's grief and his pondering of the brevity of life and the transitory nature of existence. In Macbeth, the characters also confront death and its consequences. Macbeth, after killing King Duncan, reflects on the significance of the act and its implications. He says, "Will all great Neptune's ocean wash this blood Clean from my hand? No, this my hand will rather The multitudinous seas incarnadine" (Macbeth, Act II, Scene II, Lines 77-79). This passage showcases Macbeth's awareness of the guilt associated with murder and his realization that the blood on his hands cannot be cleansed. Lady Macbeth, on the other hand, shows a different perspective on death. She suppresses her guilt and tries to detach herself from the consequences of their actions. She states, "A little water clears us of this deed" (Macbeth, Act II, Scene II, Line 69). This passage reveals her denial and determination to move forward without remorse. In conclusion, Shakespearean characters deal with the concept of death in various ways. 
Some contemplate the meaning of life and question the purpose of existence, while others grapple with the guilt and consequences that come with taking a life. Shakespeare explores the complexities of death and mortality through his characters' introspection and their different responses to the inevitability of death.

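This is a very cogent analysis, but it does not engage with the most important passages in Hamlet and Macbeth. As you can see, QE-RAG has a distinct advantage in being able to embed more relevant context than a standard RAG system.

The above example, however, does not demonstrate another advantage of QE-RAG: the ability to give the developer better control over the embedding process. To see how QE-RAG achieves this, let's look at an extension of this problem: dealing with new terminology.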

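Extending QE-RAG to New Terminology

Where QE-RAG really comes into its own is when you are introducing new terminology. For example, suppose you are introducing a new concept, like the Japanese word "zetsubou", a term that sits between despair and hopelessness, specifically conveying a surrender to one's circumstances. It is not as immediately catastrophic as the English concept of despair, but much more about acquiescence to the unpleasant things that are happening.

Suppose we want to answer a question like: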

user_question = "How do Shakespearean characters cope with Zetsubou?"

With vanilla RAG, we would do an embedding search, then add an explainer in the final prompt engineering step:

    # Re-embed the new user question before searching.
    uq_embed = openai.Embedding.create(model=EMBEDDING_MODEL, input=[user_question])['data'][0]['embedding']

    # encoding, macbeth_embed and hamlet_embed come from the earlier vanilla RAG comparison.
    relevant_texts = []
    for i in range(10):
        if len(encoding.encode('\n===\n'.join(relevant_texts))) > 10000:
            break
        relevant_texts.append(''.join(macbeth[sorted(
            [(idx, cosine_similarity(uq_embed, q)) for idx, q in enumerate(macbeth_embed)],
            key=lambda x: x[1], reverse=True
        )[i][0]]['lines']))
        relevant_texts.append(''.join(hamlet[sorted(
            [(idx, cosine_similarity(uq_embed, q)) for idx, q in enumerate(hamlet_embed)],
            key=lambda x: x[1], reverse=True
        )[i][0]]['lines']))

    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k",
        messages=[
            {"role": "system", "content": "You are a literature assistant that answers questions based on excerpts."},
            {"role": "user", "content": """Zetsubou is the concept of hopelessness and despair, combined with a surrender to whim of one's circumstances.
    Consider the following passages:
    %s
    ====
    Answer this question: %s
    Be detailed and complete in your analysis. Make references to specific passages where helpful.
    Answer only based on passages provided.
    """ % (
                '\n===\n'.join(relevant_texts),
                user_question
            )},
        ]
    )

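The result is a very well-written and cogent but slightly overextended answer focusing on a few scenes from Hamlet. Macbeth is not mentioned at all in this answer, because none of its scenes passed the embedding search. Looking at the embeddings, it is very clear that the semantic meaning of "zetsubou" was not captured properly, and therefore the relevant texts could not be retrieved.

In QE-RAG, we can inject the definition of the new term at the summarization stage, dramatically improving the quality of the text accessible to the system: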

    def summarize_for_question(text, question, location, context=''):
        completion = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-16k",
            messages=[
                {"role": "system", "content": "You are a literature assistant that provides helpful summaries."},
                {"role": "user", "content": """%s
    Is the following excerpt from %s relevant to the following question? %s

    ===
    %s
    ===
    If so, summarize the sections that are relevant. Include references to specific passages that would be useful.
    If not, simply say: \"nothing is relevant\" without additional explanation""" % (
                    context, location, question, text
                )},
            ]
        )
        return completion

    questions = [
        "How do characters deal with Zetsubou in Shakespearean plays?"
    ]

    # The definition of the new term is passed in as extra context for the summarizer.
    summarize_for_question(
        ''.join(scene['lines']),
        questions[0],
        "Shakespeare's Macbeth",
        "Zetsubou is the concept of hopelessness and despair, combined with a surrender to whim of one's circumstances."
    )

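Use this summarization prompt with the passages, and you will see summaries that are extremely accurate and on-context. These summaries can then be used in the subsequent QE-RAG.

In addition, the question embeddings will be extremely similar: even though ada-002 does not know the precise meaning of "zetsubou", the presence of this word in both questions pulls their embeddings close together.

Hence, using QE-RAG to answer the original question about Zetsubou, we get the following answer: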

 Based on the passages provided, Shakespearean characters cope with Zetsubou, or hopelessness and despair, in different ways. In "Macbeth," Lady Macbeth copes with Zetsubou through guilt and anguish. She is consumed by the guilt of her and her husband's actions, and this guilt manifests in her sleepwalking and attempts to wash her hands of imagined blood stains. This can be seen in her utterances of "Out, damned spot!" and her preoccupation with the blood on her hands. Lady Macbeth's coping mechanism is to face the consequences of her actions head-on, even if it means descending into madness. The Doctor and Gentlewoman observe her behavior and acknowledge that her condition requires divine intervention. In "Hamlet," the excerpt does not provide specific passages or examples of characters coping with Zetsubou. However, it does address Hamlet's inner turmoil and struggle to cope with his circumstances. Hamlet's state of mind, confusion, and madness are a reflection of despair. The characters in this scene are attempting to understand Hamlet's condition and find ways to deal with it, but the excerpt does not provide details on how they cope with Zetsubou. In "Macbeth," Macbeth himself copes with Zetsubou by surrendering to his dire circumstances. He becomes desensitized to fear and horror, indicating a kind of resignation to his grim circumstances. He acknowledges that there is no escape from his fate and chooses to face it head-on, even though he knows it will lead to his downfall. Macbeth's reaction to the news of his wife's death and his subsequent soliloquy reflect a deep despair and hopelessness. He sees life as meaningless, a "tale Told by an idiot, full of sound and fury, Signifying nothing." Macbeth's surrender to his circumstances and his belief in the futility of life demonstrate his coping mechanism for Zetsubou. In "Macbeth," Lady Macduff copes with Zetsubou by expressing her feelings of despair and hopelessness. 
She feels betrayed and abandoned by her husband, who has fled and left their family behind. She questions his actions and believes that they show a lack of love and wisdom. Lady Macduff experiences a sense of abandonment, which aligns with the concept of Zetsubou. She tries to make her son understand the severity of their circumstances, but he questions the definition of a traitor. Lady Macduff ultimately meets a tragic end, indicating that her coping mechanism for Zetsubou is not successful. In summary, Shakespearean characters cope with Zetsubou in different ways. Lady Macbeth copes through guilt and faces the consequences of her actions, even if it means descending into madness. Macbeth copes through surrendering to his dire circumstances and embracing his fate. Lady Macduff copes through expressing her feelings of despair and hopelessness but ultimately meets a tragic end. These characters' coping mechanisms reflect different responses to Zetsubou in Shakespearean plays

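The output still struggles a bit with this concept, but it should be apparent that QE-RAG is much more able to engage with the meat of the question and draw from a variety of sources.

And with that, we have implemented QE-RAG and demonstrated its use case!

Conclusion

In today's article, we explored the increasingly popular RAG architecture and its limitations. We then extended the RAG architecture with a new architecture called QE-RAG, which looks to more fully use the capabilities of large language models. In addition to improved accuracy and contextual access, QE-RAG allows the entire system to grow as it interacts with users and becomes more familiar with the types of questions being asked, allowing firms to develop unique intellectual property on top of open-source or commercially available LLMs.

Of course, as an experimental idea, QE-RAG is not perfect. If you have ideas on how this architecture can be improved, or simply want to have a discussion about LLM technologies, please do not hesitate to reach out to me through my Github or LinkedIn.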