OpenAI — the company behind ChatGPT — held its second annual DevDay in the fall of 2024. It unveiled several massive updates, tools, and features to delight attendees and livestream viewers. What were these changes? Will they benefit developers? Here’s a breakdown of the event and what it means for the future of ChatGPT.
DevDay — a portmanteau of developer and day — is a global conference hosted by OpenAI. While 2024’s updates were unveiled in San Francisco on Oct. 1, the company
OpenAI’s reasoning for expanding its scope is twofold. For one, it
The most recent conference was the company’s second. It held its first almost one year prior, in November 2023, where it announced a JSON mode, custom models, and a copyright protection tool, among other things. Since 2023’s DevDay was packed with massive updates, many developers expected 2024 to be just as big, if not bigger.
While many expected OpenAI to unveil the next iteration of ChatGPT, which started training in 2024, it announced it would prioritize application programming interface (API) advancements and developer tools. This DevDay focused on ChatGPT-4o and GPT-4o mini, which were released in 2024 as successors to GPT-4.
The “o” in GPT-4o stands for omni, emphasizing the model’s multimodality. It outperforms GPT-4 in many ways. It supports more languages,
Five major updates were unveiled during DevDay 2024. While they became available to developers on Oct. 1, some are only for paying customers. Each has unique implications for developers.
Model distillation involves leveraging a larger model to teach a smaller one. OpenAI announced multiple features for this cost-effective training method. The first is stored completions, which lets developers automatically keep input-output pairs on the platform permanently. This enables them to seamlessly generate datasets for distillation.
The second model distillation tool is a beta of an Evals product. It lets developers conduct distillation end-to-end on the platform to measure their model’s performance. Instead of manually creating scripts and tinkering with a mess of disparate logging tools, they can easily create and run custom evaluations.
Vision fine-tuning is a customization feature that makes it possible to fine-tune for GPT-4o and 4o-mini with images — not just text. This way, models will have a stronger understanding of visual input. As a result, they perform better in object recognition, detection, image search, and classification tasks.
OpenAI’s prompt caching provides automatic discounts on input the model has recently seen. Instead of paying full price, developers get 50% off — and faster processing speeds —
Since 1,000 tokens
A rate limit controls how often users can access the server to make requests within a certain time frame. OpenAI announced
In the past, developers needed three different platforms to create a speech-to-speech model — one for transcription, one for inference, and one for text-to-speech. With Realtime API, they can now integrate fast, natural speech-to-speech conversations into their applications. They
Realtime API was previously exclusive to ChatGPT as Advanced Voice Mode. Now, professionals can build models that have lifelike speech-to-speech discussions. The delays are shorter and the voices are clearer. Users can even interrupt the AI midsentence to steer the conversation in a different direction.
Model distillation is one of the most significant features unveiled during the 2024 DevDay since it allows inexperienced professionals to build robust models. It has applications in health care, finance, and retail. It could even be used in a nontechnical setting like education. Since upwards of
Notably, model distillation has
Vision fine-tuning is just as significant as model distillation. Developers could create highly accurate image detection and classification models. Trained models could evaluate X-rays to identify cancerous cells, analyze surveillance video to identify intruders, or improve quality control in high-risk settings like aerospace manufacturing.
Prompt caching’s real-world impact isn’t as large, but it is still important. It may even be the feature developers are most excited about. After all, it’s something they’ve been waiting years for. Automatically getting half off commonly used prompts could save individuals and companies a lot of money.
Notably, prompt caching isn’t exclusive to GPT-4o and GPT-4o mini — Anthropic’s Claude and Google’s Gemini already have it. For reference, Gemini
Realtime API is last but not least. This feature could revolutionize speech-to-speech conversations, enabling seamless communication between humans and AI. Businesses could leverage it for customer service, scheduling, translation, or summarization. They could use it as an assistant, companion, or support tool.
The significance of focusing on API advancements and developer tools for this year’s DevDay cannot be ignored. Instead of rolling out a flashy new model at the big annual conference to make headlines and generate hype, OpenAI is ensuring its foundation is strong. These updates will encourage external innovation and generate insightful feedback from users.
These updates could kick-start a growth period for OpenAI, causing speech-to-speech and image-focused models to trend. At the very least, other AI engineers and businesses will look toward this company as inspiration. Similar updates from competitor models will likely follow in the near future.
As for the implications of Realtime API, prompt caching, vision fine-tuning, and model distillation, the one-day conferences in London and Singapore will likely be microcosms of what’s to come. Attendees and livestream viewers should expect to see developer-created content, insights, and questions featured at the upcoming events.