Generative AI and large language models (LLMs) are generating lots of excitement. They’ve sparked the imagination of the general public because of how useful they can be when prompted for help with writing text. But for developers, they’re even more revolutionary, because of how dramatically they’ve simplified the way AI applications can be built. Let’s take a look at why.
Why AI was hard until very recently
Traditionally, the way you build an AI application has been a four-step process:
- Encode the most relevant pieces of your data as vectors. Understanding what the "most relevant pieces" are is a hard problem, and often involves building a separate model just to figure this out instead of making an educated guess. Extracting the relevant pieces from the raw data is another hard problem. (These are important problems, which is why we built Kaskada to solve them.)
- Train a model using those vectors to accomplish your goal. For example, one of Netflix's most important goals is to predict "what will Jonathan want to watch when he logs in." If you have a common problem like image recognition, you can "fine-tune" an existing model instead of starting from scratch, but this is often many-GPUs-for-hours-or-days territory.
- Deploy the model and expose it as an API.
- To run it in production, run your realtime data through the same encoding as in step 1, then send it through the model you built and deployed to do its prediction ("inference").
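The four steps above can be compressed into a toy sketch. This is not a real recommendation model, just an illustration of the shape of the pipeline: a hand-rolled feature encoding, a trivial perceptron standing in for the trained model, and inference that reuses the same encoding. All feature names and data are invented.

```python
# Step 1: encode the most relevant pieces of raw data as a vector.
# Choosing these features well is the hard part the article describes.
def encode(session):
    return [session["minutes"] / 100.0,
            1.0 if session["genre"] == "drama" else 0.0,
            1.0 if session["genre"] == "comedy" else 0.0]

# Step 2: train a model on those vectors (a trivial perceptron here;
# a real system would use a far larger model and far more data).
def train(vectors, labels, epochs=20, lr=0.1):
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            pred = 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0
            err = y - pred
            weights = [w + lr * err * xi for w, xi in zip(weights, x)]
            bias += lr * err
    return weights, bias

# Step 4: inference -- run new data through the SAME encoding, then the model.
def predict(model, session):
    weights, bias = model
    x = encode(session)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

history = [{"minutes": 95, "genre": "drama"}, {"minutes": 5, "genre": "comedy"},
           {"minutes": 80, "genre": "drama"}, {"minutes": 10, "genre": "comedy"}]
watched_to_end = [1, 0, 1, 0]
model = train([encode(s) for s in history], watched_to_end)
print(predict(model, {"minutes": 90, "genre": "drama"}))  # → 1
```

Step 3 (deployment behind an API) is omitted; the point is that steps 1 and 2 are where the PhD-level difficulty lives.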
Step 3 is generally straightforward.
Not surprisingly, when a problem domain requires a team of PhDs to address it successfully, it will be cost- and skill-prohibitive for all but a few companies.
Why AI is easy now with LLMs
The reason everyone's so excited about generative AI with LLMs is that you can often solve a problem "well enough" without any of the steps above. With generative AI, your job is to:
- Figure out how to get your data into GPT as text
- Formulate queries about that data in English
That's it, really. Everything else is details.
The most important detail: what data do you give GPT in step 1? You can't throw everything at it; it can only handle 4k tokens in GPT-3.5, or up to 32k in GPT-4, which is much slower and more expensive.
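Staying under that token limit can be sketched as a simple budgeting problem. The ~4-characters-per-token ratio below is only a common rule of thumb (a real application would use an actual tokenizer such as tiktoken), and the limit constants are illustrative:

```python
CONTEXT_LIMIT_TOKENS = 4096      # a GPT-3.5-class context window
RESERVED_FOR_ANSWER = 500        # leave room for the model's completion

def estimate_tokens(text):
    # crude heuristic: roughly 4 characters per token
    return len(text) // 4 + 1

def fit_context(chunks, question):
    """Greedily keep chunks (assumed ranked by relevance) that fit the budget."""
    budget = CONTEXT_LIMIT_TOKENS - RESERVED_FOR_ANSWER - estimate_tokens(question)
    kept = []
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if cost <= budget:
            kept.append(chunk)
            budget -= cost
    return kept

chunks = ["a" * 16000, "b" * 4000, "c" * 2000]
kept = fit_context(chunks, "What should Jonathan watch?")
print([c[0] for c in kept])  # → ['b', 'c']: the oversized first chunk is dropped
```

The real question this leaves open is how to rank chunks by relevance in the first place, which is where vector search comes in.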
Vector search enables you to take the exact query you already created to send to GPT and throw it over to your database, where you keep everything you know about the customer. Vector search literally answers "what is the most relevant data for this query" with no further effort on your part; it's almost magical.
(I firmly believe that vector search should be a feature of your main application database and not a separate system, which is why we added it to Apache Cassandra and DataStax Astra DB.)
Once you have your most relevant data and your query, you bundle them together and make a call to OpenAI; you get your answer back, and you're done. (I'm glossing over some challenges here.)
So a streaming content provider would use data like: every session Jonathan ever watched, with title, actors, and category; how long he watched it; plus all the metadata we can come up with. Then just denormalize it all into a single text blob and run it through the encoder to get a vector.
And if this were Netflix, it would be super easy.
Once that’s done, you can fetch the relevant rows from Cassandra with a query like this, where ? is a bind variable for your query vector that you get from the same embedding API:
SELECT original_data_text
FROM user_recommendation_data
WHERE user_id = 'jonathan'
ORDER BY embedding ANN OF ?
LIMIT 20
Then you add those results to your LLM prompt, and … that's it. Now you have a recommendation system that you built in a week with no PhDs, just Astra DB and your LLM of choice.
Is the traditional model still useful?
Even if you do need to build a custom model, LLMs can help with generating data, labels, and features to do so. That, however, is a topic for another article!
How can I try this out?
I recommend reading
By Jonathan Ellis, DataStax