Building generative AI means working with a variety of fast-evolving machine learning and AI technologies. It also requires buckets of patience. These two requirements are not unrelated.
Having worked alongside extremely talented machine learning experts at different firms building generative AI applications, I can say there are a number of ‘yikes’ aspects to developing a fully realized generative AI tool. In this article, I’ll briefly summarise five pitfalls to avoid when developing a generative AI application.
This may seem obvious, but I was surprised at how unreliable the output of commercial LLMs could be. When testing major LLMs (in particular, ChatGPT, Anthropic’s Claude AI, and Amazon Bedrock) on complex financial datasets, I noted that they can generate errors or hallucinations at a rate of one per page. LLMs are often guilty of generating information that sounds plausible but is false.
Overall, the lack of a calibrated confidence score in commercial LLMs is a huge limitation. Application developers must devise their own methods for estimating confidence and catching errors.
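One common workaround is self-consistency sampling: ask the model the same question several times and treat the level of agreement as a rough confidence proxy. The sketch below assumes a hypothetical `ask_llm` wrapper around whichever provider API you use - it is an illustration, not any vendor’s built-in feature:

```python
from collections import Counter

def ask_llm(prompt: str) -> str:
    """Hypothetical wrapper around your LLM provider's client."""
    raise NotImplementedError("Plug in your provider's API call here.")

def answer_with_confidence(prompt: str, n_samples: int = 5) -> tuple[str, float]:
    """Query the model n_samples times and use answer agreement
    as a crude, self-computed confidence score."""
    answers = [ask_llm(prompt).strip().lower() for _ in range(n_samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n_samples  # e.g. 4 of 5 matching answers -> 0.8
```

Anything scoring below a chosen threshold can then be routed to human review instead of being shown to the user.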
Any generative AI application that depends on paid production credits - such as those that fund compute on Google’s Tensor Processing Units (TPUs) - needs built-in spending controls to avoid burning through more than you budgeted for, potentially bankrupting the company or developer.
Bankruptcy through overenthusiastic spending of production credits might seem like an urban myth - certainly, there are no noted cases of production credits running amok - but better safe than sorry, right?
There are several ways to stretch your production credits to their fullest potential: caching responses to repeated prompts, batching requests where possible, setting billing alerts, and enforcing a hard spending cap inside the application itself, as sketched below.
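Here is a minimal sketch of such a cap - the class name and pricing numbers are illustrative, and real per-call costs depend on your provider:

```python
class BudgetGuard:
    """Hard spending cap for metered LLM calls (illustrative only)."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def charge(self, tokens: int, usd_per_1k_tokens: float) -> None:
        """Record the cost of one call; refuse it if the cap would be blown."""
        cost = tokens / 1000 * usd_per_1k_tokens
        if self.spent_usd + cost > self.budget_usd:
            raise RuntimeError(
                f"Budget exceeded: ${self.spent_usd + cost:.2f} > ${self.budget_usd:.2f}"
            )
        self.spent_usd += cost

guard = BudgetGuard(budget_usd=50.0)
guard.charge(tokens=1200, usd_per_1k_tokens=0.03)  # fails loudly once the cap is hit
```

Failing loudly is the point: a raised exception is far cheaper than a surprise invoice.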
Again, it may seem obvious, but when stuck looking at a profusion of code, it’s easy to forget that a real human being will be using the product. In production, consider the pathways the user will take, even in complex and code-heavy environments. Don’t be like Google Bard, whose generative AI model answered a question about the James Webb Space Telescope incorrectly in its very first public demo.
Many traditional UX testing tools can be adapted for generative AI products - such as A/B testing, moderated usability sessions, and in-product feedback prompts.
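In-product feedback is especially cheap to add. Here is a minimal sketch of capturing a thumbs-up/down rating alongside each prompt/response pair so standard UX analysis can consume it later - the field names and file path are illustrative:

```python
import json
import time

def log_interaction(prompt: str, response: str, rating: int,
                    path: str = "ux_log.jsonl") -> None:
    """Append one prompt/response/rating record for later UX analysis.

    rating: +1 for thumbs up, -1 for thumbs down.
    """
    record = {"ts": time.time(), "prompt": prompt,
              "response": response, "rating": rating}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```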
Clean, normalize, and potentially enrich your data to improve the training process. Techniques like deduplication, Unicode normalization, and whitespace cleanup are cheap and pay for themselves quickly.
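A minimal sketch of that kind of hygiene pass - the thresholds and steps are illustrative, not a definitive pipeline:

```python
import unicodedata

def clean_corpus(documents: list[str]) -> list[str]:
    """Basic pre-training hygiene: normalize Unicode, collapse whitespace,
    drop near-empty records and exact duplicates."""
    seen: set[str] = set()
    cleaned = []
    for doc in documents:
        doc = unicodedata.normalize("NFKC", doc)  # unify lookalike characters
        doc = " ".join(doc.split())               # collapse runs of whitespace
        if len(doc) < 20 or doc in seen:          # skip trivial or duplicate docs
            continue
        seen.add(doc)
        cleaned.append(doc)
    return cleaned
```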
Generally, the more data, the better (the models I worked with drew on a document store of 25 million documents, for example), but too much low-quality or duplicated data may result in overfitting or computational bottlenecks.
Slightly off-topic, but in the future, wrangling with training data may not be such an issue. Promising advancements like synthetic data generation and few-shot learning may greatly reduce how much hand-curated training data is needed.
There are many ways to scale a model. For example, you might split the model across multiple machines (model parallelism) or replicate it across multiple machines (data parallelism). Models too large to fit on a single device benefit from model parallelism’s device distribution; larger datasets paired with smaller architectures benefit more from data parallelism’s increased throughput.
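As an illustration of the difference - this sketch uses PyTorch and assumes a machine with at least two GPUs; neither snippet is the only way to do it:

```python
import torch
import torch.nn as nn

# Data parallelism: replicate the whole model and split each batch across GPUs.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model).cuda()  # DistributedDataParallel is preferred at scale

# Model parallelism: place different layers on different devices
# when the model itself is too large for one GPU.
class TwoDeviceModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(512, 512).to("cuda:0")
        self.part2 = nn.Linear(512, 10).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        return self.part2(x.to("cuda:1"))
```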
Another consideration is whether to upgrade compute resources on a single machine, such as by increasing GPU memory (vertical scaling), rather than distributing the workload across more machines (horizontal scaling).
Of course, rigorous testing and validation after model scaling is a must. To ensure the system can handle the increased load, combine standard functional testing with dedicated load and stress testing.
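A minimal load-test sketch - the endpoint URL and payload are hypothetical stand-ins for your own inference service:

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests  # third-party: pip install requests

ENDPOINT = "http://localhost:8000/generate"  # hypothetical inference endpoint

def one_request(prompt: str) -> float:
    """Time a single inference call; returns latency in seconds."""
    start = time.perf_counter()
    requests.post(ENDPOINT, json={"prompt": prompt}, timeout=30)
    return time.perf_counter() - start

def load_test(concurrency: int, total: int) -> None:
    """Fire `total` requests at the given concurrency and report latency percentiles."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(one_request, ["ping"] * total))
    print(f"{concurrency=} p50={latencies[len(latencies) // 2]:.2f}s "
          f"p95={latencies[int(len(latencies) * 0.95)]:.2f}s")

# Ramp up until latency degrades or errors appear:
for c in (1, 4, 16, 64):
    load_test(concurrency=c, total=100)
```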
It’s unrealistic to expect to build a generative AI product without encountering at least one of these challenges. Ultimately, each of these challenges presents a fork in decision-making. At each juncture, the choice you make gets you one step closer to a mature, production-ready model.
Working with the latest advances in academic and industrial machine learning will help counter some of the typical frustrations, as machine learning’s competitive landscape is constantly pushing out new innovations.
Above all, accuracy and cost-effectiveness don’t have to be mutually exclusive. With controlled and strategic experimentation, building a generative AI product can be more rewarding and less frustrating than you might think.
Good luck!