Intelligent search has become an essential part of information retrieval in the digital world. Classic keyword-based search engines generally do not account for the user's actual intent or context, which leads to missed opportunities and lost revenue for online companies. Hybrid Search, which combines keyword-based and AI-driven semantic search, addresses this gap and can be added to your existing search setup in just a few minutes.
This step-by-step guide shows you how to elevate your search functionality using an open-source project, OpenSearch. By following these steps, you can have a working hybrid search prototype in about 5 minutes.
Hybrid Search combines:
Keyword-based search: Classic Boolean and text-based approaches to searching (e.g., a search for "bag" returns results that contain the word "bag").
Semantic or vector-based search: AI techniques that comprehend context (e.g., “backpack” also retrieves “knapsack”, “school bag” etc., as the search engine recognizes their similarity).
By combining these two paradigms, you offer users not only relevance but also context-sensitive answers. In e-commerce, this means presenting users with the exact items they are searching for, even if they phrase the query differently. For content searches, it ensures that the most contextually relevant documents appear first.
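To make the idea of vector similarity concrete, here is a minimal sketch of cosine similarity, a measure commonly used to compare embeddings. The three-dimensional vectors and the helper names are invented for illustration; real embeddings come from a model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (made-up values, not produced by any model).
toy_embeddings = {
    "backpack": [0.9, 0.8, 0.1],
    "knapsack": [0.85, 0.75, 0.2],
    "blender": [0.1, 0.2, 0.95],
}


def rank_by_similarity(query_word, embeddings):
    """Rank every other word by similarity to query_word, best match first."""
    query = embeddings[query_word]
    scored = [
        (word, cosine_similarity(query, vector))
        for word, vector in embeddings.items()
        if word != query_word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for word, score in rank_by_similarity("backpack", toy_embeddings):
        print(f"{word}: {score:.3f}")
```

With these toy values, "knapsack" ranks above "blender" for the query "backpack", which is the behavior a semantic search engine relies on.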
To follow along, you need:
Docker (recommended for a quick setup) or an existing environment to run OpenSearch.
Some command-line experience and a client such as curl or Postman to test the requests.
A dataset to index (you can try a simple JSON file or use the sample data provided in this article).
Note: For those who would prefer a cloud-based solution, you do not need to follow the local Docker instructions, and you can leverage a managed OpenSearch service available from any major cloud provider. The steps to create and manage indices remain similar.
Run OpenSearch locally using Docker. Copy and paste the following command in your terminal:
docker run -d --name opensearch -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "DISABLE_SECURITY_PLUGIN=true" opensearchproject/opensearch:2.9.0
Here’s what each part does:
-d --name opensearch: Runs the container in the background with the name “opensearch.”
-p 9200:9200 -p 9600:9600: Exposes ports for HTTP requests and performance monitoring.
-e "discovery.type=single-node": Tells OpenSearch to run as a single-node cluster for simplicity.
-e "DISABLE_SECURITY_PLUGIN=true": Turns off the security plugin so that the plain-HTTP requests in this guide work without TLS or credentials. Use this for local testing only.
Verify the container is running by visiting http://localhost:9200. You should see a JSON response with OpenSearch details.
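If you prefer to script this check, the following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the sketch assumes the node answers plain HTTP on localhost:9200 and returns the usual root document with cluster_name and version fields.

```python
import http.client
import json
import urllib.error
import urllib.request


def describe_cluster(info):
    """Summarize the JSON document that OpenSearch returns at its root URL."""
    version = info.get("version", {})
    return "{} {} (cluster: {})".format(
        version.get("distribution", "unknown"),
        version.get("number", "unknown"),
        info.get("cluster_name", "unknown"),
    )


def fetch_cluster_info(base_url="http://localhost:9200", timeout=5):
    """Return the parsed root response, or None if the node cannot be reached."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers refused connections, timeouts, dropped connections, and non-JSON replies.
        return None


if __name__ == "__main__":
    info = fetch_cluster_info()
    if info is None:
        print("OpenSearch is not reachable yet; the container may still be starting.")
    else:
        print(describe_cluster(info))
```

The container can take some time to start, so a None result right after docker run does not necessarily indicate a problem.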
To enable AI-powered semantic search, you need to store vector embeddings along with traditional keyword fields in your documents. Let’s create an index called products. We’ll set it up with both text and vector fields to showcase a hybrid approach. OpenSearch Documentation Reference
Copy and paste this index mapping:
curl -X PUT "http://localhost:9200/products" \
  -H 'Content-Type: application/json' \
  -d '{
    "settings": {
      "index": {
        "number_of_shards": 1,
        "number_of_replicas": 0
      }
    },
    "mappings": {
      "properties": {
        "name": {
          "type": "text"
        },
        "description": {
          "type": "text"
        },
        "keywords": {
          "type": "keyword"
        },
        "embedding": {
          "type": "knn_vector",
          "dimension": 384
        }
      }
    }
  }'
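The same index definition can be built programmatically, which makes it easier to keep the vector size in sync with the embedding model. This is a minimal sketch: the helper name is hypothetical, and it assumes the `knn_vector` field type from the OpenSearch k-NN plugin, whose `dimension` must equal the length of the embeddings you store (384 for all-MiniLM-L6-v2).

```python
import json


def build_products_index_body(dimension):
    """Build an index-creation body with text, keyword, and vector fields."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        "settings": {
            "index": {"number_of_shards": 1, "number_of_replicas": 0}
        },
        "mappings": {
            "properties": {
                "name": {"type": "text"},
                "description": {"type": "text"},
                "keywords": {"type": "keyword"},
                # The vector field; its dimension must match the embedding model.
                "embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


if __name__ == "__main__":
    # 384 is the embedding size of all-MiniLM-L6-v2.
    print(json.dumps(build_products_index_body(384), indent=2))
```

The printed JSON can be sent as the body of a PUT request to http://localhost:9200/products.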
Next, we need to generate an embedding for each document: a numerical representation that reflects the meaning of the text. You can use either an open-source model (for example, one hosted on Hugging Face) or a service such as the OpenAI Embeddings API. Below is an example using the sentence-transformers library. OpenSearch Documentation Reference
#!/usr/bin/env python3
import requests
from sentence_transformers import SentenceTransformer

# 1. Load a model from Hugging Face (all-MiniLM-L6-v2 produces 384-dimensional embeddings)
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# 2. Sample documents to index
documents = [
    {
        "name": "Blue Backpack",
        "description": "A spacious bag suitable for school and travel",
        "keywords": ["bags", "backpack", "school"],
    },
    {
        "name": "Leather Handbag",
        "description": "Stylish handbag made from genuine leather",
        "keywords": ["fashion", "handbag", "leather"],
    },
]

# 3. Generate embeddings and index each document
for i, doc in enumerate(documents):
    # Embed the name and description together so both contribute to the meaning
    text_to_embed = doc["name"] + " " + doc["description"]
    embedding = model.encode(text_to_embed).tolist()  # Convert to list for JSON serialization
    payload = {
        "name": doc["name"],
        "description": doc["description"],
        "keywords": doc["keywords"],
        "embedding": embedding,
    }
    # Use the loop index as the document ID
    response = requests.post(
        "http://localhost:9200/products/_doc/" + str(i),
        json=payload,
    )
    print(f"Indexed document {i}. Status code: {response.status_code}, Response: {response.json()}")
1. Save the code above in a file named index_documents.py.
2. Run pip install sentence-transformers requests if you haven’t already.
3. Execute python index_documents.py.
This script loads a small transformer model that produces 384-dimensional embeddings, iterates over a couple of sample documents, combines the name and description of each into a single string, encodes that string into an embedding, and then sends each document with its embedding to OpenSearch via HTTP POST. The embedding length must match the vector dimension declared in the index mapping.
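For more than a handful of documents, one request per document becomes slow. A possible alternative is the `_bulk` endpoint, which accepts a newline-delimited body of alternating action and document lines. The sketch below only builds that body; the helper name is hypothetical, and the short embedding values are placeholders rather than real model output.

```python
import json


def build_bulk_payload(index_name, documents):
    """Serialize documents into the newline-delimited body that _bulk expects."""
    lines = []
    for doc_id, doc in enumerate(documents):
        # Action line: index this document under the given ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample_docs = [
        {"name": "Blue Backpack", "embedding": [0.1, 0.2, 0.3]},    # placeholder vector
        {"name": "Leather Handbag", "embedding": [0.3, 0.2, 0.1]},  # placeholder vector
    ]
    print(build_bulk_payload("products", sample_docs), end="")
```

The resulting string would be sent with a POST to http://localhost:9200/_bulk using the Content-Type header application/x-ndjson.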
A hybrid query pairs a standard keyword-based match with a vector-based similarity query. OpenSearch provides various query types; below is an example using the _search endpoint. OpenSearch Documentation Reference
curl -X POST "http://localhost:9200/products/_search" \
  -H 'Content-Type: application/json' \
  -d '{
    "size": 5,
    "query": {
      "bool": {
        "should": [
          {
            "match": {
              "name": "bag"
            }
          },
          {
            "match": {
              "description": "bag"
            }
          },
          {
            "script_score": {
              "query": {
                "match_all": {}
              },
              "script": {
                "source": "cosineSimilarity(params.queryVector, doc[params.field]) + 1.0",
                "params": {
                  "field": "embedding",
                  "queryVector": [REPLACE_WITH_QUERY_EMBEDDING_VALUES]
                }
              }
            }
          }
        ]
      }
    }
  }'
• script_score: Vector-based scoring using cosine similarity. Add 1.0 to avoid negative scores.
• match: Standard keyword-based queries on name and description
• bool -> should: We use “OR-like” logic such that documents matching any of the clauses are boosted
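Because the query vector has to be pasted into the request by hand, it can be more convenient to assemble the search body in code. The following is a minimal sketch with a hypothetical helper name. It assumes the vector field is called embedding and that the query vector was produced by the same model used at indexing time (for example, with model.encode(query_text).tolist()); the two-element vector in the demo is a placeholder.

```python
import json


def build_hybrid_query(query_text, query_vector, size=5, vector_field="embedding"):
    """Build a search body that mixes keyword matches with cosine-similarity scoring."""
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Keyword clauses on the two text fields.
                    {"match": {"name": query_text}},
                    {"match": {"description": query_text}},
                    # Vector clause: scores every document by cosine similarity.
                    {
                        "script_score": {
                            "query": {"match_all": {}},
                            "script": {
                                # Adding 1.0 keeps the score non-negative.
                                "source": "cosineSimilarity(params.queryVector, doc[params.field]) + 1.0",
                                "params": {
                                    "field": vector_field,
                                    "queryVector": list(query_vector),
                                },
                            },
                        }
                    },
                ]
            }
        },
    }


if __name__ == "__main__":
    # Placeholder vector; a real one has as many values as the index dimension.
    print(json.dumps(build_hybrid_query("bag", [0.1, 0.2]), indent=2))
```

Passing the field name as a script parameter avoids nesting quotes inside the script source, which is easy to get wrong when the body is embedded in a shell command.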
Creating a robust, smart search experience is easier than ever. With OpenSearch and cloud-based services, you can merge familiar keyword-based matching with semantic matching, which opens the door to improved relevancy, happier users, and potentially higher conversions.
Ready to upgrade your search experience? Start with the simple commands above, then explore more advanced capabilities such as synonyms, re-ranking, and personalized suggestions. The potential is vast once you unlock the power of hybrid AI search.
Happy Searching!