326 reads

5 Key Updates Unveiled at OpenAI's DevDay 2024

by Eleanor HecksOctober 21st, 2024

Too Long; Didn't Read

OpenAI's 2024 DevDay introduced major updates, including model distillation, vision fine-tuning, prompt caching, higher rate limits, and a Realtime API, enhancing ChatGPT's capabilities for developers and promoting future innovation in AI applications.

featured image - 5 Key Updates Unveiled at OpenAI's DevDay 2024

OpenAI — the company behind ChatGPT — held its second annual DevDay in the fall of 2024. It unveiled several massive updates, tools, and features to delight attendees and livestream viewers. What were these changes? Will they benefit developers? Here’s a breakdown of the event and what it means for the future of ChatGPT.

OpenAI’s 2024 DevDay

DevDay — a portmanteau of developer and day — is a global conference hosted by OpenAI. While 2024’s updates were unveiled in San Francisco on Oct. 1, the company plans to visit London on Oct. 30 and Singapore on Nov. 21. This is in contrast to last year when the only event was in San Francisco.

OpenAI’s reasoning for expanding its scope is twofold. For one, it wants to be closer to its global audience. The second reason is that developers requested more time to learn from each other after 2023’s DevDay. So, while there will technically be three events, every major update has already been revealed.

The most recent conference was the company’s second. It held its first almost one year prior, in November 2023, where it announced a JSON mode, custom models, and a copyright protection tool, among other things. Since 2023’s DevDay was packed with massive updates, many developers expected 2024 to be just as big, if not bigger.

While many expected OpenAI to unveil the next iteration of ChatGPT, which started training in 2024, it announced it would prioritize application programming interface (API) advancements and developer tools. This DevDay focused on ChatGPT-4o and GPT-4o mini, which were released in 2024 as successors to GPT-4.

The “o” in GPT-4o stands for omni, emphasizing the model’s multimodality. It outperforms GPT-4 in many ways. It supports more languages,has a 128,000-token context window, and a knowledge cutoff date of late 2023. As artificial intelligence goes, it is among the most powerful, fast models today — and it has just received multiple massive updates.

5 Major Updates Revealed During DevDay

Five major updates were unveiled during DevDay 2024. While they became available to developers on Oct. 1, some are only for paying customers. Each has unique implications for developers.

1. Model Distillation

Model distillation involves leveraging a larger model to teach a smaller one. OpenAI announced multiple features for this cost-effective training method. The first is stored completions, which lets developers automatically keep input-output pairs on the platform permanently. This enables them to seamlessly generate datasets for distillation.

The second model distillation tool is a beta of an Evals product. It lets developers conduct distillation end-to-end on the platform to measure their model’s performance. Instead of manually creating scripts and tinkering with a mess of disparate logging tools, they can easily create and run custom evaluations.

2. Vision Fine-Tuning

Vision fine-tuning is a customization feature that makes it possible to fine-tune for GPT-4o and 4o-mini with images — not just text. This way, models will have a stronger understanding of visual input. As a result, they perform better in object recognition, detection, image search, and classification tasks.

3. Prompt Caching

OpenAI’s prompt caching provides automatic discounts on input the model has recently seen. Instead of paying full price, developers get 50% off — and faster processing speeds — starting at about 1,024 cached tokens. This feature is available on GPT-4o, GPT4-o mini, o1-preview, and o1 mini.

Since 1,000 tokens represent approximately 750 words, companies will save on commonly reused prompts. Notably, specifics on how long the cached storage is saved vary depending on whether the user is active during peak or off-peak hours. However, they don’t have to go out of their way to manually apply the discount.

4. Higher Rate Limit

A rate limit controls how often users can access the server to make requests within a certain time frame. OpenAI announced it is doubling its rate limit for GPT-4o, going from 5,000 requests per minute to 10,000. This change may not seem as grand as the others, but it will be just as significant in the long run.

5. Realtime API

In the past, developers needed three different platforms to create a speech-to-speech model — one for transcription, one for inference, and one for text-to-speech. With Realtime API, they can now integrate fast, natural speech-to-speech conversations into their applications. They have six preset voices to choose from.

Realtime API was previously exclusive to ChatGPT as Advanced Voice Mode. Now, professionals can build models that have lifelike speech-to-speech discussions. The delays are shorter and the voices are clearer. Users can even interrupt the AI midsentence to steer the conversation in a different direction.

Applications of an Updated ChatGPT

Model distillation is one of the most significant features unveiled during the 2024 DevDay since it allows inexperienced professionals to build robust models. It has applications in health care, finance, and retail. It could even be used in a nontechnical setting like education. Since upwards of 60% of educators already use AI in the classroom, integration would be relatively seamless.

Notably, model distillation has proven to be an effective tool for improving open models on various tasks. However, it doesn’t reach the teacher model’s level of performance. This slight drop in accuracy and speed may be unacceptable in high-risk, fast-paced industries. However, users can spend extra time fine-tuning their model if needed.

Vision fine-tuning is just as significant as model distillation. Developers could create highly accurate image detection and classification models. Trained models could evaluate X-rays to identify cancerous cells, analyze surveillance video to identify intruders, or improve quality control in high-risk settings like aerospace manufacturing.

Prompt caching’s real-world impact isn’t as large, but it is still important. It may even be the feature developers are most excited about. After all, it’s something they’ve been waiting years for. Automatically getting half off commonly used prompts could save individuals and companies a lot of money.

Notably, prompt caching isn’t exclusive to GPT-4o and GPT-4o mini — Anthropic’s Claude and Google’s Gemini already have it. For reference, Gemini offers a 75% discount on input tokens. However, the key difference is that it isn’t automatic — users must use CachedContent.create to create a cache and explicitly specify usage when defining the model.

Realtime API is last but not least. This feature could revolutionize speech-to-speech conversations, enabling seamless communication between humans and AI. Businesses could leverage it for customer service, scheduling, translation, or summarization. They could use it as an assistant, companion, or support tool.

What These Updates Mean for ChatGPT’s Future

The significance of focusing on API advancements and developer tools for this year’s DevDay cannot be ignored. Instead of rolling out a flashy new model at the big annual conference to make headlines and generate hype, OpenAI is ensuring its foundation is strong. These updates will encourage external innovation and generate insightful feedback from users.

These updates could kick-start a growth period for OpenAI, causing speech-to-speech and image-focused models to trend. At the very least, other AI engineers and businesses will look toward this company as inspiration. Similar updates from competitor models will likely follow in the near future.

As for the implications of Realtime API, prompt caching, vision fine-tuning, and model distillation, the one-day conferences in London and Singapore will likely be microcosms of what’s to come. Attendees and livestream viewers should expect to see developer-created content, insights, and questions featured at the upcoming events.