Generative AI and large language models (LLMs) are generating lots of excitement. They’ve sparked the imagination of the general public because of how useful they can be when prompted for help with writing text. But for developers, they’re even more revolutionary, because of how dramatically they’ve simplified the way AI applications can be built. Let’s take a look at why.
Why AI was hard until very recently
Traditionally, the way you build an AI application has been a four-step process:
- Encode the most relevant pieces of your data as vectors. Understanding what the "most relevant pieces" are is a hard problem, and often involves building a separate model just to figure this out instead of making an educated guess. Extracting the relevant pieces from the raw data is another hard problem. (These are important problems, which is why we built Kaskada to solve them.)
- Train a model using those vectors to accomplish your goal. For example, one of Netflix's most important goals is to predict "what will Jonathan want to watch when he logs in." If you have a common problem like image recognition, you can "fine-tune" an existing model instead of starting from scratch, but this is often many-GPUs-for-hours-or-days territory.
- Deploy the model and expose it as an API.
- To run it in production, run your realtime data through the same encoding as in step 1, then send it through the model you built and deployed to do its prediction ("inference").
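The four steps above can be compressed into a toy sketch. This is not a real recommendation model, just an illustration of the shape of the pipeline: a hand-rolled feature encoding, a trivial perceptron standing in for the trained model, and inference that reuses the same encoding. All feature names and data are invented.

```python
# Step 1: encode the most relevant pieces of raw data as a vector.
# Choosing these features well is the hard part the article describes.
def encode(session):
    return [session["minutes"] / 100.0,
            1.0 if session["genre"] == "drama" else 0.0,
            1.0 if session["genre"] == "comedy" else 0.0]

# Step 2: train a model on those vectors (a trivial perceptron here;
# a real system would use a far larger model and far more data).
def train(vectors, labels, epochs=20, lr=0.1):
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            pred = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
            err = y - pred
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias

# Step 4: inference -- run new data through the SAME encoding, then the model.
def predict(model, session):
    weights, bias = model
    x = encode(session)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

history = [{"minutes": 95, "genre": "drama"}, {"minutes": 5, "genre": "comedy"},
           {"minutes": 80, "genre": "drama"}, {"minutes": 10, "genre": "comedy"}]
watched_to_end = [1, 0, 1, 0]
model = train([encode(s) for s in history], watched_to_end)
print(predict(model, {"minutes": 90, "genre": "drama"}))  # → 1
```

Step 3 (deployment behind an API) is omitted; the point is that steps 1 and 2 are where the PhD-level difficulty lives.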
Step 3 is generally straightforward.
Not surprisingly, when a problem domain requires a team of PhDs to address it successfully, it will be cost- and skill-prohibitive for all but a few companies.
Why AI is easy now with LLMs
The reason everyone's so excited about generative AI with LLMs is that you can often solve a problem "well enough" without any of the steps above. With generative AI, your job is to:
- Figure out how to get your data into GPT as text
- Formulate queries about that data in English
That's it, really. Everything else is details.
The most important detail: what data do you give GPT in step 1? You can't throw everything at it; it can only handle 4k tokens in GPT-3.5, or up to 32k in GPT-4, which is much slower and more expensive.
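Staying under that token limit can be sketched as a simple budgeting problem. The ~4-characters-per-token ratio below is only a common rule of thumb (a real application would use an actual tokenizer such as tiktoken), and the limit constants are illustrative:

```python
CONTEXT_LIMIT_TOKENS = 4096      # a GPT-3.5-class context window
RESERVED_FOR_ANSWER = 500        # leave room for the model's completion

def estimate_tokens(text):
    # crude heuristic: roughly 4 characters per token
    return len(text) // 4 + 1

def fit_context(chunks, question):
    """Greedily keep chunks (assumed ranked by relevance) that fit the budget."""
    budget = CONTEXT_LIMIT_TOKENS - RESERVED_FOR_ANSWER - estimate_tokens(question)
    kept = []
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if cost <= budget:
            kept.append(chunk)
            budget -= cost
    return kept

chunks = ["a" * 16000, "b" * 4000, "c" * 2000]
kept = fit_context(chunks, "What should Jonathan watch?")
print([c[0] for c in kept])  # → ['b', 'c']: the oversized first chunk is dropped
```

The real question this leaves open is how to rank chunks by relevance in the first place, which is where vector search comes in.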
Vector search enables you to take the exact query you already created to send to GPT and throw it over to your database, where you keep everything you know about the customer. Vector search literally answers "what is the most relevant data for this query" with no further effort on your part; it's almost magical.
(I firmly believe that vector search should be a feature of your main application database and not a separate system, which is why we added it to Apache Cassandra and DataStax Astra DB.)
Once you have your most relevant data and your query, you bundle them together and make a call to OpenAI; you get your answer back, and you're done. (I'm glossing over some challenges here.)
So a streaming content provider would use data like: every session Jonathan ever watched, with title, actors, and category; how long he watched it; plus all the metadata we can come up with. Then just denormalize it all into a single text blob and run it through the encoder to get a vector.
And if this were Netflix, it would be super easy.
Once that’s done, you can fetch the relevant rows from Cassandra with a query like this, where ? is a bind variable for your query vector that you get from the same embedding API:
SELECT original_data_text
FROM user_recommendation_data
WHERE user_id = 'jonathan'
ORDER BY embedding ANN OF ?
LIMIT 20
Then you add those results to your LLM prompt, and … that's it. Now you have a recommendation system that you built in a week with no PhDs, just Astra DB and your LLM of choice.
Is the traditional model still useful?
Even if you do need to build a custom model, LLMs can help with generating data, labels, and features to do so. That, however, is a topic for another article!
How can I try this out?
I recommend reading
By Jonathan Ellis, DataStax