Generative AI and large language models (LLMs) are generating lots of excitement. They’ve sparked the imagination of the general public because of how useful they can be when prompted for help with writing text. But for developers, they’re even more revolutionary, because of how dramatically they’ve simplified the way AI applications can be built. Let’s take a look at why.
Traditionally, the way you build an AI application has been a four-step process:

1. Gather the data relevant to your problem.
2. Clean and label the data, and engineer the features the model will learn from.
3. Train the model.
4. Deploy it and serve predictions.

Step 3 is generally straightforward; the steps around it are where the deep expertise comes in.
Not surprisingly, when a problem domain requires a team of PhDs to address it successfully, it will be cost- and skill-prohibitive for all but a few companies.
One reason everyone’s so excited about generative AI with LLMs is that you can often solve a problem "well enough" without any of the steps above. With generative AI, your job is to:

1. Figure out what data is relevant to the question you want answered.
2. Bundle that data and the question into a prompt and send it to the LLM.
That's it, really. Everything else is details.
The most important detail: what data do you give GPT in step 1? You can't throw everything at it; it can only handle 4k tokens in GPT-3.5, or up to 32k in GPT-4, which is much slower and more expensive.
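If you want to know how much of that budget a prompt will consume before you send it, a tokenizer library such as OpenAI's tiktoken makes the check trivial. A quick sketch (the model name and prompt text are placeholders):

# Count the tokens a prompt will use; requires `pip install tiktoken`.
import tiktoken

def count_tokens(text, model="gpt-3.5-turbo"):
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

prompt = "Recommend something for Jonathan to watch, given the viewing history below: ..."
print(count_tokens(prompt), "of ~4,096 tokens used")  # GPT-4 variants stretch that to 32k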
Vector search lets you take the exact query you already created to send to GPT and throw it over to your database first, where you keep everything you know about the customer. Vector search literally answers "what is the most relevant data for this query?" with no further effort on your part; it's almost magical.
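Under the hood, "relevant" just means "nearby in embedding space": your query and your stored data are both turned into vectors, and the database ranks rows by similarity. A toy illustration with three-dimensional stand-ins for real embeddings (which have on the order of a thousand dimensions):

# Cosine similarity is the usual measure of "closeness" between embedding vectors.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query     = np.array([0.9, 0.1, 0.0])   # toy "query" embedding
doc_space = np.array([0.8, 0.2, 0.1])   # points the same way -> high score (~0.98)
doc_other = np.array([0.0, 0.1, 0.9])   # points elsewhere -> low score (~0.01)

print(cosine_similarity(query, doc_space), cosine_similarity(query, doc_other))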
(I firmly believe that vector search should be a feature of your main application database and not a separate system, which is why we added it to Apache Cassandra and DataStax Astra DB.)
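To make that concrete, here is roughly what a vector-enabled table and index look like, created through the Python driver. The table and column names match the query shown later in this article; the session_id column, the 1,536-dimension vector size (OpenAI's ada-002 embeddings), and the connection details are my assumptions, so treat this as a sketch rather than copy-paste DDL.

# Sketch: a table with a vector column plus a storage-attached index (SAI) for ANN search.
# Assumes a cassandra-driver version recent enough to understand the vector type.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("recommendations")  # for Astra DB, connect with your secure connect bundle instead

session.execute("""
    CREATE TABLE IF NOT EXISTS user_recommendation_data (
        user_id text,
        session_id uuid,
        original_data_text text,
        embedding vector<float, 1536>,
        PRIMARY KEY (user_id, session_id)
    )
""")

session.execute("""
    CREATE CUSTOM INDEX IF NOT EXISTS user_rec_ann_index
    ON user_recommendation_data (embedding)
    USING 'StorageAttachedIndex'
""")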
Once you have your most relevant data and your query, you bundle them together, make a call to OpenAI, get your answer back, and you’re done. (I’m glossing over some challenges here, but that really is the core of it.)
So a streaming content provider would use data like: every session Jonathan has ever watched, with its title, actors, and category; how long he watched it; plus all the metadata we can come up with. Then we just denormalize it all into a single text blob and run it through the encoder to get a vector.
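As a sketch of that ingestion step, continuing with the table above and using OpenAI's pre-1.0 Python SDK as the encoder (the viewing-session fields here are made up; flatten whatever you actually have):

# Flatten one viewing session into a text blob, embed it, and store it alongside its vector.
# `session` is the Cassandra session from the previous snippet; assumes OPENAI_API_KEY is set.
import uuid
import openai

viewing_session = {
    "title": "The Expanse S02E05",
    "actors": "Steven Strait, Dominique Tipper",
    "category": "sci-fi",
    "minutes_watched": 47,
}
blob = "; ".join(f"{k}: {v}" for k, v in viewing_session.items())

# Run the blob through the embedding API to get a 1,536-float vector.
embedding = openai.Embedding.create(
    model="text-embedding-ada-002", input=blob
)["data"][0]["embedding"]

insert = session.prepare(
    "INSERT INTO user_recommendation_data "
    "(user_id, session_id, original_data_text, embedding) VALUES (?, ?, ?, ?)"
)
session.execute(insert, ("jonathan", uuid.uuid4(), blob, embedding))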
And if this were Netflix, it would be super easy, because Netflix already stores its viewing history in Cassandra.
Once that’s done, you can fetch the relevant rows from Cassandra with a query like this, where ? is a bind variable for your query vector that you get from the same embedding API:
SELECT original_data_text
FROM user_recommendation_data
WHERE user_id = 'jonathan'
ORDER BY embedding ANN OF ?
LIMIT 20
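In application code, running that query might look something like the following, reusing the Cassandra session and embedding model from the snippets above (the question text is just an example):

# Embed the user's request with the same model used at ingest time,
# then let ANN search return the 20 most relevant rows.
query_text = "What should Jonathan watch tonight?"
query_vector = openai.Embedding.create(
    model="text-embedding-ada-002", input=query_text
)["data"][0]["embedding"]

select = session.prepare("""
    SELECT original_data_text
    FROM user_recommendation_data
    WHERE user_id = 'jonathan'
    ORDER BY embedding ANN OF ?
    LIMIT 20
""")
rows = session.execute(select, [query_vector])
context = "\n".join(row.original_data_text for row in rows)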
Then you add those results to your LLM prompt, and … that's it. Now you have a recommendation system that you built in a week with no PhDs, just Astra DB and your LLM of choice.
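To make that last step concrete, here is the sketch wrapped up: the context and query_text from the previous snippet go into a prompt, and a single chat-completion call does the rest (the prompt wording and model name are placeholders, not a recipe; ChatCompletion is the pre-1.0 OpenAI SDK interface):

# Bundle the retrieved context with the question and ask the LLM for recommendations.
prompt = (
    "You are a recommendation assistant for a streaming service.\n"
    "Here is what we know about this viewer:\n"
    f"{context}\n\n"
    f"Question: {query_text}\n"
    "Recommend three titles and briefly explain each choice."
)
reply = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(reply["choices"][0]["message"]["content"])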
But even if you do need to build a custom model, LLMs can help with generating data, labels, and features to do so. That, however, is a topic for another article!
By Jonathan Ellis, DataStax