When building search for an application, you typically face two broad approaches:

- **Traditional keyword-based search**: match words exactly or with simple variants.
- **Semantic (or vector) search**: match meaning or context using AI embeddings.

There's also a hybrid approach, but I will leave that for a future article. Instead, in this post, I'll walk you through how the two approaches work in Python using MariaDB and an AI embedding model, highlight where they differ, and show code that you can adapt.

## The Key Components

For this example, I used MariaDB Cloud to spin up a free serverless database. Within seconds, I had a free instance ready. I grabbed the host/user/password details, connected with VS Code, created a database called `demo`, created a `products` table, and loaded ~500 rows of product names via `LOAD DATA LOCAL INFILE`. This is an extremely small dataset, but it's enough for learning and experimentation.

Then I built a small Python + FastAPI app. First, I implemented a simple keyword search (by product name) endpoint using a full-text index; then I implemented a semantic (vector) search using AI-generated vector embeddings plus MariaDB's vector support. You can see the whole process in this video.

## Keyword-based Search: Simple and Familiar

For keyword search, I used a full-text index on the `name` column of the `products` table. With this index in place, I could search by product name using this SQL query:

```sql
SELECT name FROM products
ORDER BY MATCH(name) AGAINST(?)
LIMIT 10;
```

I exposed this functionality using a FastAPI endpoint as follows:

```python
@app.get("/products/text-search")
def text_search(query: str):
    cursor = connection.cursor()
    cursor.execute(
        "SELECT name FROM products ORDER BY MATCH(name) AGAINST(?) LIMIT 10;",
        (query,)
    )
    return [name for (name,) in cursor]
```

Pros:

- Runs fast.
- Works well when users type exact or close terms.
- Uses built-in SQL features (no external AI model needed).

Cons:

- Misses synonyms, context, or related meaning.
- Doesn't understand intent (if a user types "running shoes", a strict keyword search may miss "jogging trainers" or "sneakers").
- Quality depends heavily on the wording.

In my demo, the endpoint returned several products that were not relevant to "running shoes".

## Semantic (Vector) Search: Matching Meaning

To go beyond keywords, I implemented a second endpoint:

1. Use an AI embedding model (Google Generative AI via LangChain) to convert each product name into a high-dimensional vector.
2. Store those vectors in MariaDB with the vector integration for LangChain.
3. At query time, embed the user's search phrase into a vector (using exactly the same AI embedding model as in the first step), then perform a similarity search with the highly performant HNSW algorithm in MariaDB (e.g., top 10 nearest vectors) and return the corresponding products.
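To build intuition for what the similarity search in the last step actually does, here is a toy, brute-force sketch in plain Python. It uses cosine similarity over tiny made-up three-dimensional vectors (real embedding models produce hundreds of dimensions, and MariaDB's HNSW index avoids comparing against every row); it is an illustration of the idea, not the database's implementation:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: 1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, candidates, k=2):
    """Rank candidate (name, vector) pairs by similarity to the query vector."""
    ranked = sorted(candidates,
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

# Made-up 3-dimensional "embeddings" for illustration only
products = [
    ("running shoes", [0.9, 0.1, 0.0]),
    ("jogging trainers", [0.85, 0.15, 0.05]),
    ("coffee maker", [0.0, 0.2, 0.95]),
]
print(top_k([0.88, 0.12, 0.02], products))  # the two footwear items rank first
```

A query vector close to the footwear vectors retrieves both shoe products even though their names share no words, which is exactly the behavior the keyword endpoint lacked.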
Here's how I implemented the ingestion endpoint:

```python
@app.post("/products/ingest")
def ingest_products():
    cursor = connection.cursor()
    cursor.execute("SELECT name FROM products;")
    vector_store.add_texts([name for (name,) in cursor])
    return "Products ingested successfully"
```

And this is the semantic search endpoint:

```python
@app.get("/products/semantic-search")
def search_products(query: str):
    results = vector_store.similarity_search(query, k=10)
    return [doc.page_content for doc in results]
```

The LangChain integration for MariaDB makes the whole process extremely easy. The integration creates two tables:

- `langchain_collection`: each row represents a related set of vector embeddings. I have only one in this demo, which corresponds to the product names.
- `langchain_embedding`: the vector embeddings. Each vector belongs to a collection (many-to-one to `langchain_collection`).
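The two endpoints above use a `vector_store` object created at application startup. As a rough sketch of what that setup can look like: the package, class, and parameter names below are my assumptions based on the LangChain integrations for MariaDB and Google Generative AI, so check each integration's documentation for the exact signatures before copying this:

```python
# Sketch only: names and parameters are assumptions, not verified code
# from the demo repository.
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_mariadb import MariaDBStore

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

vector_store = MariaDBStore(
    embeddings=embeddings,
    datasource="mariadb+mariadbconnector://user:password@host:3306/demo",
    collection_name="products",
)
```

With a store like this in place, `add_texts` handles embedding and insertion, and `similarity_search` handles query embedding and HNSW lookup.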
When I ran the semantic search endpoint with the same query "running shoes", the results felt much more relevant: they included products that didn't match "running" or "shoes" literally but were semantically close.

## Keyword vs. Semantic: When to Use Which

Here's a quick comparison:

| Approach | Pros | Cons |
|---|---|---|
| Keyword search | Quick to set up, uses SQL directly | Limited to literal term matching, less clever |
| Semantic search | Matches meaning and context, more flexible | Requires embedding model + vector support |

Pick keyword search when:

- Your search domain is small and predictable, or, obviously, you need an exact keyword match.
- Users know exactly what they're looking for (specific codes, exact names).
- You want minimal dependencies and complexity.
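The "limited to literal term matching" drawback is easy to demonstrate with a naive matcher. The function below is a toy stand-in for full-text search (real full-text indexes tokenize and rank, but they still match terms literally):

```python
def keyword_match(query: str, names: list[str]) -> list[str]:
    """Naive literal matching: keep names containing at least one query term."""
    terms = query.lower().split()
    return [n for n in names if any(t in n.lower() for t in terms)]

catalog = ["trail running shoes", "jogging trainers", "sneakers", "shoe rack"]
print(keyword_match("running shoes", catalog))  # → ['trail running shoes']
```

"jogging trainers" and "sneakers" are exactly what a shopper searching for "running shoes" wants, yet literal matching cannot surface them.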
Pick semantic search when:

- You need to handle synonyms, similar concepts, and user intent.
- The dataset or domain has natural language variation.
- You're willing to integrate an embedding model and manage vector storage/indexing. MariaDB helps with this.

In many real-world apps, you'll use a hybrid: start with keyword search, and for higher-value queries or when an exact match fails, fall back to semantic search. Or even mix the two via hybrid search. MariaDB helps with this, too.

## How Simple the Integration Can Be

In my demo, I triggered vector ingestion via a POST endpoint (`/products/ingest`). That reads all product names, computes embeddings, and writes them to MariaDB. One line of code (via the LangChain + MariaDB integration) handled the insertion of ~500 rows of vectors.

Once the vectors are stored, adding a semantic search endpoint is just a few lines of code. MariaDB's vector support hides most of the complexity.

## The Source Code

You can find the code on GitHub. There is one simplistic, easy-to-follow program in `webinar-main.py` and a more elaborate one with good practices in `backend.py`. Feel free to clone the repository, modify it, experiment with your own datasets, and let us know if there's anything you'd like to see in the LangChain integration for MariaDB.

https://www.youtube.com/watch?v=B8XGe4KIv8o
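As a closing illustration, the keyword-first, semantic-fallback strategy mentioned in the hybrid discussion can be sketched with stand-in functions. The two inner functions here are hypothetical stubs replacing the SQL full-text query and `vector_store.similarity_search`; only the dispatch logic is the point:

```python
def keyword_search(query):
    # Stand-in for the full-text SQL query; returns [] when nothing matches.
    index = {"blue kettle": ["blue kettle 1.7L"]}
    return index.get(query, [])

def semantic_search(query):
    # Stand-in for vector_store.similarity_search(query, k=10).
    return ["jogging trainers", "sneakers"]

def search(query):
    """Try the cheap keyword search first; fall back to semantic search."""
    results = keyword_search(query)
    return results if results else semantic_search(query)

print(search("blue kettle"))    # exact match served by keyword search
print(search("running shoes"))  # no literal match, so semantic fallback
```

Keeping the dispatch in one place makes it easy to later replace the fallback rule with a true hybrid ranker that merges both result lists.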