This is a summary of, and my key takeaways from, the original post by LinkedIn on how NLP is being used (as of 2019) in designing its Help Search System. It covers the problem statement and the successive solutions that were adopted, along with their shortcomings.

Problem Statement: Given a query by a user, fetch the most relevant Help Article from the database.

Iteration 1: Initial Solution

1. All the help articles (documents) in the database are indexed using a Lucene index. In short, this generates an inverted dictionary that maps each term to all the documents it appears in.
2. The given query is used to fetch all the relevant documents (hits) from the Lucene index.
3. Each hit is scored using the BM25F algorithm, which takes the document structure into account: hits in the Title get the highest weight, then hits in the Keywords, and then hits in the Body. The result is a weighted score.
4. The best-scored articles are returned.

Why it failed

The document retrieval system is term-based (syntactic) and does not take semantics into account. The original post gives two example failure cases.

Iteration 2: Final Solution

Step 1: Text Normalization

“how canceling my premium accounts immediately” is normalized to “cancel premium account”.

Step 2: Query Mapping

The normalized query may not have any words in common with the words in the articles. Hence, each query is mapped to a more representative query to bridge the gap between the user’s terminology and the article’s terminology. This is done in the following two steps:

1. Query Grouping: Queries are grouped together based on similarity metrics.
2. Topic Mining and Rep Scoring: For each of the queries in the query group, a repScore is calculated and the top K queries are selected as Rep Queries.

The repScore is defined in terms of two similarities:

sim(RQ, Q2) is the similarity between the raw query and another query in the group.
sim(Q2, title) is the maximum similarity between Q2 and one of the topics from the title (similarly for the body).

Step 3: Intent Classification

Long-tailed queries might not have a Rep Query, in which case a CNN is used to classify the intent of the query.

For example: “Canceling Your Premium Subscription” and “Canceling or Updating a Premium Subscription Purchased on Your Apple Device” are considered to have the same intent of “cancel premium.”

Overall Flow

A query is first normalized and then mapped to its Rep Queries; if it has no Rep Query, the CNN intent classifier is used instead.
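
To make Iteration 1 more concrete, here is a minimal sketch of an inverted index with field-weighted scoring. It is not Lucene and not BM25F: the scoring is a simplified sum of field weights, and the toy articles, the weight values, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy corpus; fields mirror the Title / Keywords / Body structure.
ARTICLES = {
    "a1": {"title": "cancel premium subscription",
           "keywords": "cancel premium billing",
           "body": "steps to cancel your premium subscription"},
    "a2": {"title": "update profile photo",
           "keywords": "profile photo",
           "body": "you can cancel a photo upload at any time"},
}

# Assumed weights (not from the post): Title > Keywords > Body.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}


def build_index(articles):
    """Map each term to the set of (doc_id, field) pairs it appears in."""
    index = defaultdict(set)
    for doc_id, fields in articles.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[term].add((doc_id, field))
    return index


def search(query, index):
    """Sum the field weights of every matched term; return best-scored first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, field in index.get(term, ()):
            scores[doc_id] += FIELD_WEIGHTS[field]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


INDEX = build_index(ARTICLES)
RANKED = search("cancel premium", INDEX)
```

With this toy data, `search("stop paying", INDEX)` returns an empty list even though the intent matches article `a1`, which is the kind of vocabulary gap a purely term-based system cannot bridge.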
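
The text normalization in Step 1 can be sketched as stop-word removal followed by lemmatization. The summary does not say how LinkedIn implements this step, so the stop-word list and the lemma table below are hypothetical and sized only for the example query.

```python
# Hypothetical stop-word list and lemma table, just large enough for the example.
STOP_WORDS = {"how", "my", "immediately", "i", "to", "the", "a"}
LEMMAS = {"canceling": "cancel", "cancelling": "cancel", "accounts": "account"}


def normalize(query):
    """Lowercase, drop stop words, and map the remaining tokens to their lemma."""
    tokens = query.lower().split()
    kept = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(LEMMAS.get(t, t) for t in kept)


NORMALIZED = normalize("how canceling my premium accounts immediately")
```

A production system would more likely rely on a full lemmatizer and a curated stop-word list; the lookup table simply keeps the sketch self-contained.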
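
The Rep Query selection in Step 2 can be sketched as follows. Two things here are assumptions, since this summary does not reproduce them: the similarity metric (Jaccard token overlap is used as a stand-in) and the way the two similarities are combined into a repScore (a product with the larger of the title and body similarities). All names and sample queries are hypothetical.

```python
def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for the real metric."""
    set_a, set_b = set(a.split()), set(b.split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def topic_sim(query, topics):
    """Maximum similarity between the query and any one of the topics."""
    return max((jaccard(query, t) for t in topics), default=0.0)


def rep_score(raw_query, candidate, title_topics, body_topics):
    """ASSUMED combination: sim(RQ, Q2) * max(sim(Q2, title), sim(Q2, body))."""
    article_sim = max(topic_sim(candidate, title_topics),
                      topic_sim(candidate, body_topics))
    return jaccard(raw_query, candidate) * article_sim


def top_k_rep_queries(raw_query, group, title_topics, body_topics, k=2):
    """Score the other queries in the group and keep the k best as Rep Queries."""
    scored = [(q, rep_score(raw_query, q, title_topics, body_topics))
              for q in group if q != raw_query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


GROUP = ["cancel premium", "cancel premium account", "stop premium billing"]
TITLE_TOPICS = ["cancel premium subscription"]
BODY_TOPICS = ["premium billing"]
TOP = top_k_rep_queries("cancel premium account", GROUP, TITLE_TOPICS, BODY_TOPICS)
```

Here "cancel premium" ranks above "stop premium billing" because it is closer both to the raw query and to the title topic.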