Querying News Articles Via a Streamlit App Using OpenAI, Langchain, and Qdrant DB
by @hacker4897555

January 3rd, 2024

Chatbots integrated into news querying serve various crucial purposes. They offer a convenient and conversational approach for users to access news, eliminating the need to navigate websites or apps. Users can simply ask a chatbot for news on a specific topic or event, making information more accessible, especially for those who find traditional methods challenging.


Personalization is a key feature of chatbots in the news industry. By learning from users’ past queries and interactions, chatbots can present more relevant and personalized news articles, enhancing the overall user experience. This tailoring of content ensures that users receive information aligned with their preferences.


Time efficiency is another significant advantage. Chatbots quickly sift through vast amounts of information, presenting users with the most relevant news articles. This time-saving aspect is particularly beneficial for users who would otherwise have to manually search and filter through numerous sources.


Moreover, chatbots contribute to interactive news consumption. They engage users in a conversation, answering follow-up questions and providing additional context or related information. This interactive approach adds depth to the news-reading experience, surpassing the passive nature of traditional methods.


Information overload is a common issue in the digital age, and chatbots help mitigate it by filtering out noise. They deliver news that is most relevant to the user’s interests and needs, streamlining the consumption process and enhancing user satisfaction.


Visually impaired users benefit significantly from chatbots, especially when integrated with voice technology. This combination provides an invaluable audio-based method for accessing news, promoting inclusivity in news consumption.


Integration into commonly used platforms, such as messaging apps, enhances user convenience. Users can receive news updates in the same environment where they communicate with others, streamlining their digital experience.


Automated updates and alerts are a proactive feature of chatbots. Programmed to send timely news updates or alerts about breaking news, chatbots ensure that users stay informed in real time, contributing to a more connected and aware user base.


Language and regional customization further extend the accessibility of news. Chatbots can be designed to deliver news in multiple languages and tailor content to regional or local interests, catering to diverse demographics and preferences.


In summary, chatbots in the news industry elevate user experience through convenient, personalized, and interactive access to news. They address various challenges in traditional news consumption methods while catering to a diverse range of user needs and preferences.


In this article, we’ll design a RAG pipeline using OpenAI, Langchain, and Qdrant DB and wrap it in a user interface built with Streamlit.

What Is RAG

“RAG” stands for “Retrieval-Augmented Generation.” It is a technique used in natural language processing and machine learning, particularly in applications built on large language models, such as chatbots. In a RAG setup, when a query is input (like a question or a prompt), the retrieval system first searches its database for relevant information or documents. This information is then passed to the generative model, which synthesizes it into a coherent and contextually appropriate response.
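
The retrieve-then-generate flow can be sketched in a few lines of plain Python. This is only a toy illustration, not the pipeline built later in the article: retrieval here uses simple word overlap instead of vector embeddings, and the "generation" step merely assembles the prompt that would be sent to a language model. The names `tokenize`, `retrieve`, and `build_prompt` are made up for this sketch.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only the best top_k documents that share at least one word
    return [doc for score, doc in scored[:top_k] if score > 0]

def build_prompt(query, context_docs):
    """Combine retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

documents = [
    "The central bank raised interest rates by a quarter point on Tuesday.",
    "A new stadium opened downtown ahead of the football season.",
    "Analysts expect interest rates to stay high through next year.",
]

question = "What happened to interest rates?"
relevant = retrieve(question, documents)
prompt = build_prompt(question, relevant)
print(prompt)
```

In a real RAG system, the word-overlap scoring would be replaced by a vector similarity search, and the prompt would be passed to a generative model rather than printed.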

Visual of a RAG pipeline

A Brief Note on the Components

Langchain: LangChain is an open-source framework designed to simplify the creation of applications using large language models (LLMs). LangChain’s use cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis. LangChain enables developers to connect LLMs to other data sources, interact with their environment, and build complex applications. It is available for Python and JavaScript and supports a variety of language models, including GPT-3, LLaMA, models hosted on Hugging Face, Jurassic-1 Jumbo, and more.


Qdrant: Qdrant is an open-source vector similarity search engine and vector database written in Rust. It provides a production-ready service with a convenient API to store, search, and manage points — vectors with an additional payload. Qdrant is tailored to extended filtering support, making it useful for various neural network or semantic-based matching, faceted search, and other applications.
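
To make the idea of a "point" concrete, here is a small conceptual sketch in plain Python. It does not use Qdrant's API; the `Point` class and the `search` function are hypothetical stand-ins that only illustrate what a vector database does: store vectors together with a payload, optionally filter on the payload, and rank the remaining points by cosine similarity to a query vector.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Point:
    """A vector plus arbitrary metadata (the payload)."""
    id: int
    vector: list
    payload: dict = field(default_factory=dict)

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(points, query_vector, payload_filter=None, limit=2):
    """Return the most similar points whose payload matches the filter."""
    payload_filter = payload_filter or {}
    candidates = [
        p for p in points
        if all(p.payload.get(key) == value for key, value in payload_filter.items())
    ]
    candidates.sort(
        key=lambda p: cosine_similarity(p.vector, query_vector), reverse=True
    )
    return candidates[:limit]

points = [
    Point(1, [1.0, 0.0], {"topic": "sports"}),
    Point(2, [0.9, 0.1], {"topic": "finance"}),
    Point(3, [0.0, 1.0], {"topic": "finance"}),
]

# Only finance points are considered, ranked by similarity to the query
hits = search(points, [1.0, 0.0], payload_filter={"topic": "finance"})
print([p.id for p in hits])
```

A production engine such as Qdrant uses approximate nearest-neighbor indexes rather than scanning and sorting every point, but the inputs and outputs follow the same shape.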

Setting Up the Environment & Code

First, in your directory, create a requirements.txt file with the following content:


langchain
streamlit
requests
openai
qdrant-client
tiktoken


Then run the command to install these dependencies:


pip install -r requirements.txt


Now create a file named app.py and paste the following code into it; the comments explain what each part does:


#importing the needed libraries
import os

import requests
import streamlit as st
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Qdrant
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA



#function to fetch text data from the links of news websites
def fetch_article_content(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        return response.text
    except requests.RequestException as e:
        st.error(f"Error fetching {url}: {e}")
        return ""


#function to collate all the text from the news websites into a single string
def process_links(links):
    all_contents = ""
    for link in links:
        link = link.strip()
        if not link:
            continue  # skip blank lines in the uploaded file
        content = fetch_article_content(link)
        all_contents += content + "\n\n"
    return all_contents


#function to chunk the articles before creating vector embeddings
def get_text_chunks_langchain(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    texts = text_splitter.split_text(text)
    return texts


#creating the streamlit app
def main():
    st.title('News Article Fetcher')

    # Initialize state variables
    if 'articles_fetched' not in st.session_state:
        st.session_state.articles_fetched = False
    if 'chat_history' not in st.session_state:
        st.session_state.chat_history = ""

    # Model selection
    model_choice = st.radio("Choose your model", ["GPT 3.5", "GPT 4"], key="model_choice")
    model = "gpt-3.5-turbo-1106" if st.session_state.model_choice == "GPT 3.5" else "gpt-4-1106-preview"

    # API key
    API_KEY = st.text_input("Enter your OpenAI API key", type="password", key="API_KEY")

    # Ensure API_KEY is set before proceeding
    if not API_KEY:
        st.warning("Please enter your OpenAI API key.")
        st.stop()

    # Ask the user to upload a text file with links to news articles (1 link per line)
    uploaded_file = st.file_uploader("Upload a file with links", type="txt")

    # Read the file into a list of links
    if uploaded_file:
        stringio = uploaded_file.getvalue().decode("utf-8")
        links = stringio.splitlines()

    # Fetch the articles' content
    if st.button("Fetch Articles") and uploaded_file:
        progress_bar = st.progress(0)
        with st.spinner('Fetching articles...'):
            article_contents = process_links(links)
            progress_bar.progress(0.25)  # Update progress to 25%

            # Process the article contents
            texts = get_text_chunks_langchain(article_contents)
            progress_bar.progress(0.5)  # Update progress to 50%

            # Store the chunked articles as embeddings in Qdrant
            os.environ["OPENAI_API_KEY"] = st.session_state.API_KEY
            embeddings = OpenAIEmbeddings()
            vector_store = Qdrant.from_texts(texts, embeddings, location=":memory:")
            retriever = vector_store.as_retriever()
            progress_bar.progress(0.75)  # Update progress to 75%

            # Create a QA chain against the vector store
            # (the chain is created only once per session)
            llm = ChatOpenAI(model_name=model)
            if 'qa' not in st.session_state:
                st.session_state.qa = RetrievalQA.from_llm(llm=llm, retriever=retriever)
            progress_bar.progress(1.0)  # Update progress to 100% (an int 1 would mean 1%)

            st.success('Articles fetched successfully!')
            st.session_state.articles_fetched = True

    # Once articles are fetched, take input for the user query
    if 'articles_fetched' in st.session_state and st.session_state.articles_fetched:
        query = st.text_input("Enter your query here:", key="query")

        if query:
            # Process the query using the QA chain stored in session state
            with st.spinner('Analyzing query...'):
                qa = st.session_state.qa
                response = qa.run(st.session_state.query)
            # Update chat history
            st.session_state.chat_history += f"> {st.session_state.query}\n{response}\n\n"

        # Display conversation history
        st.text_area("Conversation:", st.session_state.chat_history, height=1000, key="conversation_area")
        # JavaScript to scroll to the bottom of the text area
        st.markdown(
            "<script>document.getElementById('conversation_area').scrollTop = document.getElementById('conversation_area').scrollHeight;</script>",
            unsafe_allow_html=True
        )


if __name__ == "__main__":
    main()
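
The chunk_size and chunk_overlap arguments used in get_text_chunks_langchain are easier to picture with a simplified sketch. The function below is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as paragraphs and sentences); it is a plain fixed-width sliding window, written only to show how consecutive chunks share some characters. The name `sliding_window_chunks` is made up for this example.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size chunks; neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk after the first starts with the last three characters of the previous one. The overlap exists so that a sentence cut at a chunk boundary still appears with some of its surrounding context in at least one chunk.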

Then save the app.py and run the following command in your terminal:


streamlit run app.py


This launches your application on localhost. By default, Streamlit serves it on port 8501 (http://localhost:8501).
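
One thing to be aware of when trying the app: fetch_article_content returns response.text, which for most news sites is the raw HTML of the page, so tags, scripts, and styles end up in the chunks that get embedded. A possible refinement, sketched here with only the standard library, is to strip the markup before chunking. The `TextExtractor` class and `html_to_text` helper are hypothetical additions, not part of the original app, and real pages may need a more robust parser.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    """Return the text content of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)

page = """
<html><head><style>p { color: red; }</style></head>
<body><h1>Rates rise</h1><script>track();</script>
<p>The bank raised rates on <b>Tuesday</b>.</p></body></html>
"""
text = html_to_text(page)
print(text)
```

In the app, a helper like this could be applied to the return value of fetch_article_content before the text is passed to get_text_chunks_langchain.
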

Conclusion

In conclusion, integrating chatbots into news querying addresses the challenges of traditional news consumption and improves the user experience by providing convenient, personalized, and interactive access to information. The RAG pipeline discussed here, which combines OpenAI, Langchain, and Qdrant DB with a Streamlit-based user interface, shows how language models and vector similarity search can be brought together to fetch and query news articles. The same approach has the potential to deliver tailored content, mitigate information overload, and, when paired with voice technology, improve access for visually impaired users. The code outlined above serves as a practical starting point for developers interested in building chatbot applications for news retrieval.

References

  • https://api.python.langchain.com/en/latest/api_reference.html
  • https://python.langchain.com/docs/integrations/vectorstores/qdrant