In this article

- 🔍 We'll learn what semantic search is.
- 🎉 We'll discuss how semantic search impacts the user experience.
- 🦀 We'll see the use case of a new npm library called Voy, a WASM vector similarity search engine written in Rust.

Let's go.

What Is Semantic Search?

Semantic search is a type of search that focuses on meaning. You can search with human language or vague concepts, and the search result will give you similar data points in the database based on the semantics of your search query.

It almost feels like semantic search engines "understand" the meaning of your questions. You can ask any question in natural language, like "Which Marvel movie to watch after Shang-Chi?". The engine understands the meaning of the question and why you are asking it, and returns the most relevant results.

In fact, semantic search engines commonly use pre-trained neural network models to understand the search intent and contextual meanings, and generate a computational representation of the search queries and the database. We often refer to this representation as "vector embeddings" or "embeddings". The engine will then classify the embeddings and find the nearest neighbors of the search query embeddings. The nearest neighbors are the data points that are the most relevant to what you're looking for.

To illustrate how semantic search works:

[Image: illustration of how semantic search works]

The image originated from Google Research's blog "Announcing ScaNN: Efficient Vector Similarity Search".

Who Uses Semantic Search?

Semantic search is everywhere. It's so valuable because it has a big impact on user experience:

- Users are able to search and access digital content in a more human way.
- Search engines are able to provide more relevant and helpful content to the users and make SEO strategy around keywords irrelevant.
- Search can be blazingly fast for big data.
- Search isn't limited to text anymore. We can create embeddings from different types of digital content, like images and videos.
Some examples of major companies that have integrated semantic search:

- Google: Hummingbird
- Amazon: Semantic product search
- Spotify: Natural Language Search
- Meta: Facebook AI Similarity Search
- Redis: Redis Vector Similarity Search

Introducing Voy

Voy is an open source semantic search engine in WebAssembly (WASM). I created it to empower more projects to build semantic features and create a better user experience for people around the world. Voy follows several design principles:

- 🤏 Tiny: Reduce overhead for limited devices, such as mobile browsers with slow networks or IoT.
- ⚡️ Fast: Create the best search experience for the users.
- 🌳 Tree Shakable: Optimize bundle size and enable asynchronous capabilities for modern Web APIs, such as Web Workers.
- 🔋 Resumable: Generate a portable embeddings index anywhere, anytime.
- ☁️ Edgy: Run semantic search on CDN edge servers.

It's available on npm. You can simply install it with your favorite package manager and start using it.

```sh
# with npm
npm i voy-search

# with Yarn
yarn add voy-search

# with pnpm
pnpm add voy-search
```

To demonstrate what it looks like:

[Image: Voy demo]

You can find Voy's repository on GitHub! Feel free to try it out. The repository includes an example where you can see how to load the WASM module with Webpack 5.

Let's break it down a bit to see what it did. First, the demo loaded the WASM module asynchronously. After loading, it started indexing the following phrases:

- "This is a very happy Person"
- "That is a Happy Dog"
- "Today is a sunny day"

Indexing is where we transform the phrases into embeddings and organize them in an embedding space. Once the index was ready, the demo performed a similarity search with the phrase "That is a happy Person". Finally, we saw the search return "This is a very happy Person" as the top result.

The result is credible because, of the three indexed phrases, "This is a very happy Person" has the highest semantic similarity to the query "That is a happy Person".

How Does Voy Work?
Voy takes care of two things:

- Indexing resources.
- Retrieving nearest neighbors from the index.

Index Resources

To demonstrate, we'll use "text" as our resource. The embeddings are organized and stored in a k-d tree under the hood. A k-d tree is a data structure for organizing data in a k-dimensional space. It is very useful for our vector embeddings index because the embeddings are fixed-length arrays of floating-point numbers.

As of now, packaging an embeddings transformer into WASM is still under development, so Voy relies on other libraries like Web AI to generate embeddings.

```js
// Assumption: TextModel is exported by the Web AI package (@visheratin/web-ai)
import { TextModel } from '@visheratin/web-ai'

// Dynamically import Voy (installed from npm as voy-search)
const voy = await import('voy-search')

const phrases = [
  'That is a very happy Person',
  'That is a Happy Dog',
  'Today is a sunny day',
]

// Use web-ai to create text embeddings
const model = await (await TextModel.create('gtr-t5-quant')).model
const processed = await Promise.all(phrases.map((q) => model.process(q)))

// Index embeddings with Voy
const data = processed.map(({ result }, i) => ({
  id: String(i),
  title: phrases[i],
  url: `/path/${i}`, // link to your resource for the search result
  embeddings: result,
}))

const index = voy.index({ embeddings: data }) // index is a serialized k-d tree
```

As you can see, after executing voy.index(), it returns a serialized index. This allows Voy to deserialize the index when executing searches without being in the same environment. For example, the index can be created at build time and shipped to the client to perform searches. This is referred to as resumability.

Retrieve Nearest Neighbors

```js
// Create query embeddings
const query = await model.process('That is a happy Person')

// Search with Voy and return the 1 result
const nearests = voy.search(index, query.result, 1)

// Display vector similarity search result
nearests.forEach((result) =>
  console.log(`🕸️ voy similarity search result 👉 "${result.title}"`)
) // That is a very happy Person
```

Internally, Voy uses Squared Euclidean distance to calculate the nearest neighbors.
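To make the distance calculation concrete, here is a minimal sketch of squared Euclidean distance combined with a brute-force nearest neighbor lookup. It is not Voy's implementation: Voy searches a k-d tree, while this sketch simply scans every item. The helper names `squaredEuclidean` and `nearestNeighbors`, the `doc-*` titles, and the toy 3-dimensional vectors are all made up for illustration.

```javascript
// Squared Euclidean distance: the sum of squared differences per dimension.
// Skipping the square root does not change which neighbors are closest.
function squaredEuclidean(a, b) {
  if (a.length !== b.length) {
    throw new Error('Embeddings must have the same number of dimensions')
  }
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i]
    sum += diff * diff
  }
  return sum
}

// Hypothetical brute-force search: score every item against the query,
// sort by ascending distance, and keep the k closest items.
function nearestNeighbors(items, query, k) {
  return items
    .map((item) => ({
      ...item,
      distance: squaredEuclidean(item.embeddings, query),
    }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
}

// Toy vectors standing in for embeddings (assumption: real embeddings
// have far more dimensions and come from a model).
const items = [
  { title: 'doc-a', embeddings: [1, 0, 0] },
  { title: 'doc-b', embeddings: [0, 1, 0] },
  { title: 'doc-c', embeddings: [1, 1, 0] },
]

const nearest = nearestNeighbors(items, [1, 0, 0], 2)
console.log(nearest.map((n) => n.title)) // [ 'doc-a', 'doc-c' ]
```

Scanning every item costs time proportional to the size of the index for each query, which is the work a tree-based index is meant to reduce.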
There are a few ways to calculate the distance between points. Here the points are the nodes of the embeddings object in the k-d tree. The common formulas are:

- Euclidean distance
- Manhattan distance
- Cosine similarity

Final Thoughts

Voy was created to make semantic search accessible for developers to build, ship, and create user value. It still has a few more steps to go before it is a self-sufficient semantic search engine. If you're interested, you can follow Voy's public roadmap.

If you believe in Voy's mission and would like to support the project, please check out the sponsor section on the GitHub repository.

If you are interested in open source embeddings transformers, here are some of the projects I experimented with:

- spotify/annoy
- facebookresearch/faiss
- google-research/bert
- UKPLab/sentence-transformers with the model all-MiniLM-L12-v2

References

- Announcing ScaNN: Efficient Vector Similarity Search - Google Research
- Build Intelligent Apps with New Redis Vector Similarity Search - Redis
- Cosine similarity - Wikipedia
- Euclidean distance - Wikipedia
- facebookresearch/faiss - GitHub
- Faiss: A library for efficient similarity search - Engineering at Meta
- Find anything blazingly fast with Google's vector search technology - Google Cloud
- Google Hummingbird - Wikipedia
- google-research/bert - GitHub
- I Built A Snappy Static Full-text Search with WebAssembly, Rust, Next.js, and Xor Filters - Daw-Chih Liou
- Introducing Natural Language Search for Podcast Episodes - Spotify
- k-d tree - Wikipedia
- Manhattan distance - Wikipedia
- Nearest neighbor search - Wikipedia
- Neural network - Wikipedia
- Semantic product search - Amazon Science
- Semantic search - Wikipedia
- sentence-transformers/all-MiniLM-L12-v2 - Hugging Face
- Similarity search - Wikipedia
- spotify/annoy - GitHub
- tantaraio/voy - GitHub
- UKPLab/sentence-transformers - GitHub
- visheratin/web-ai - GitHub
- Voy Roadmap - GitHub Projects
- What is semantic search: A deep dive into entity-based search - Search Engine Land
- WebAssembly - W3C Community Group
- Web Workers API - mdn web docs
- Webpack - OpenJS Foundation

This article was originally posted on Daw-Chih's website.