Generative AI and large language models (LLMs) are generating lots of excitement. They’ve sparked the imagination of the general public because of how useful they can be when prompted for help with writing text. But for developers, they’re even more revolutionary, because of how dramatically they’ve simplified the way AI applications can be built. Let’s take a look at why.
Traditionally, the way you build an AI application has been a four-step process:

1. Gather the data relevant to your problem.
2. Clean and label the data, and engineer the features the model will learn from.
3. Train the model.
4. Deploy it and serve predictions.

Step 3 is generally straightforward; the steps around it are where the deep expertise comes in.
Not surprisingly, when a problem domain requires a team of PhDs to address it successfully, it will be cost- and skill-prohibitive for all but a few companies.
One reason everyone’s so excited about generative AI with LLMs is that you can often solve a problem "well enough" without any of the steps above. With generative AI, your job is to:

1. Figure out what data is relevant to the question you want answered.
2. Bundle that data and the question into a prompt and send it to the LLM.
That's it, really. Everything else is details.
The most important detail: what data do you give GPT in step 1? You can't throw everything at it; it can only handle 4k tokens in GPT-3.5, or up to 32k in GPT-4, which is much slower and more expensive.
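If you want to know how much of that budget a prompt will consume before you send it, a tokenizer library such as OpenAI's tiktoken makes the check trivial. A quick sketch (the model name and prompt text are placeholders):

# Count the tokens a prompt will use; requires `pip install tiktoken`.
import tiktoken

def count_tokens(text, model="gpt-3.5-turbo"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

prompt = "Recommend something for Jonathan to watch, given the viewing history below: ..."
print(count_tokens(prompt), "of ~4,096 tokens used")  # GPT-4 variants stretch that to 32k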
Vector search lets you take the exact query you already created to send to GPT and throw it over to your database first, where you keep everything you know about the customer. Vector search literally answers "what is the most relevant data for this query?" with no further effort on your part; it's almost magical.
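Under the hood, "relevant" just means "nearby in embedding space": your query and your stored data are both turned into vectors, and the database ranks rows by similarity. A toy illustration with three-dimensional stand-ins for real embeddings (which have on the order of a thousand dimensions):

# Cosine similarity is the usual measure of "closeness" between embedding vectors.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query     = np.array([0.9, 0.1, 0.0])   # toy "query" embedding
doc_space = np.array([0.8, 0.2, 0.1])   # points the same way -> high score (~0.98)
doc_other = np.array([0.0, 0.1, 0.9])   # points elsewhere -> low score (~0.01)

print(cosine_similarity(query, doc_space), cosine_similarity(query, doc_other))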
(I firmly believe that vector search should be a feature of your main application database and not a separate system, which is why we added it to Apache Cassandra and DataStax Astra DB.)
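To make that concrete, here is roughly what a vector-enabled table and index look like, created through the Python driver. The table and column names match the query shown later in this article; the session_id column, the 1,536-dimension vector size (OpenAI's ada-002 embeddings), and the connection details are my assumptions, so treat this as a sketch rather than copy-paste DDL.

# Sketch: a table with a vector column plus a storage-attached index (SAI) for ANN search.
# Assumes a cassandra-driver version recent enough to understand the vector type.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("recommendations")  # for Astra DB, connect with your secure connect bundle instead

session.execute("""
    CREATE TABLE IF NOT EXISTS user_recommendation_data (
        user_id text,
        session_id uuid,
        original_data_text text,
        embedding vector<float, 1536>,
        PRIMARY KEY (user_id, session_id)
    )
""")

session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS user_rec_ann_index
    ON user_recommendation_data (embedding)
    USING 'StorageAttachedIndex'
""")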
Once you have your most relevant data and your query, you bundle them together, make a call to OpenAI, get your answer back, and you’re done. (I’m glossing over some challenges here, but that really is the core of it.)
So a streaming content provider would use data like: every session Jonathan has ever watched, with its title, actors, and category; how long he watched it; plus all the metadata we can come up with. Then we just denormalize it all into a single text blob and run it through the encoder to get a vector.
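As a sketch of that ingestion step, continuing with the table above and using OpenAI's pre-1.0 Python SDK as the encoder (the viewing-session fields here are made up; flatten whatever you actually have):

# Flatten one viewing session into a text blob, embed it, and store it alongside its vector.
# `session` is the Cassandra session from the previous snippet; assumes OPENAI_API_KEY is set.
import uuid
import openai

viewing_session = {
    "title": "The Expanse S02E05",
    "actors": "Steven Strait, Dominique Tipper",
    "category": "sci-fi",
    "minutes_watched": 47,
}
blob = "; ".join(f"{k}: {v}" for k, v in viewing_session.items())

# Run the blob through the embedding API to get a 1,536-float vector.
embedding = openai.Embedding.create(
    model="text-embedding-ada-002", input=blob
)["data"][0]["embedding"]

insert = session.prepare(
    "INSERT INTO user_recommendation_data "
    "(user_id, session_id, original_data_text, embedding) VALUES (?, ?, ?, ?)"
)
session.execute(insert, ("jonathan", uuid.uuid4(), blob, embedding))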
And if this were Netflix, it would be super easy, because Netflix already stores its viewing history in Cassandra.
Once that’s done, you can fetch the relevant rows from Cassandra with a query like this, where ? is a bind variable for your query vector that you get from the same embedding API:
SELECT original_data_text
FROM user_recommendation_data
WHERE user_id = 'jonathan'
ORDER BY embedding ANN OF ?
LIMIT 20
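In application code, running that query might look something like the following, reusing the Cassandra session and embedding model from the snippets above (the question text is just an example):

# Embed the user's request with the same model used at ingest time,
# then let ANN search return the 20 most relevant rows.
query_text = "What should Jonathan watch tonight?"
query_vector = openai.Embedding.create(
    model="text-embedding-ada-002", input=query_text
)["data"][0]["embedding"]

select = session.prepare("""
    SELECT original_data_text
    FROM user_recommendation_data
    WHERE user_id = 'jonathan'
    ORDER BY embedding ANN OF ?
    LIMIT 20
""")
rows = session.execute(select, [query_vector])
context = "\n".join(row.original_data_text for row in rows)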
Then you add those results to your LLM prompt, and … that's it. Now you have a recommendation system that you built in a week with no PhDs, just Astra DB and your LLM of choice.
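To make that last step concrete, here is the sketch wrapped up: the context and query_text from the previous snippet go into a prompt, and a single chat-completion call does the rest (the prompt wording and model name are placeholders, not a recipe; ChatCompletion is the pre-1.0 OpenAI SDK interface):

# Bundle the retrieved context with the question and ask the LLM for recommendations.
prompt = (
    "You are a recommendation assistant for a streaming service.\n"
    "Here is what we know about this viewer:\n"
    f"{context}\n\n"
    f"Question: {query_text}\n"
    "Recommend three titles and briefly explain each choice."
)
reply = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(reply["choices"][0]["message"]["content"])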
But even if you do need to build a custom model, LLMs can help with generating data, labels, and features to do so. That, however, is a topic for another article!
By Jonathan Ellis, DataStax