OpenAI made waves on Monday with the announcement of its latest flagship model, GPT-4o, short for GPT-4 Omni. The new model promises enhanced capabilities for all users, including smarter and faster real-time voice interactions.
Why is this important? Well, tech giants like Google, Microsoft, and Apple are all pivoting towards a future powered by generative AI. With OpenAI leading the charge, they’re determined to maintain their position at the forefront of this rapidly evolving landscape.
According to CTO Mira Murati, the standout feature of GPT-4o is its accessibility — it brings the formidable intelligence of GPT-4 to all users, including those on the free tier. During a livestream presentation, Murati emphasized that GPT-4o, denoted by the letter “o” for “Omni,” represents a significant leap forward in terms of user-friendliness and speed.
GPT-4o accepts input in three forms (text, audio, and images) and can generate output in all three formats as well. The model can also pick up on the emotion behind a user's input, and it can be interrupted mid-speech, responding quickly enough to roughly keep pace with human conversation. Together, these abilities make interacting with GPT-4o feel markedly more natural than with earlier models.
Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN
— OpenAI (@OpenAI) May 13, 2024
Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx
But the excitement doesn't stop there. Murati hinted at an even more substantial update to the model, likely GPT-5, set to be unveiled later this year.
In a demonstration, OpenAI showcased the real-time capabilities of ChatGPT’s voice assistant, highlighting faster responses and the ability to seamlessly interrupt the AI. This glimpse into the future of AI-driven interactions underscores the potential of GPT-4o to revolutionize how we engage with technology.
During one demonstration, OpenAI showcased a real-time tutorial on deep breathing, illustrating the potential for practical guidance using GPT-4o.
In another demo, ChatGPT displayed its versatility by reading an AI-generated story in various voices, ranging from dramatic recitals to robotic tones and even singing.
A third demonstration showcased ChatGPT’s problem-solving prowess, as it helped a user work through an algebra equation instead of simply providing an answer.
Throughout the demos, GPT-4o exhibited significantly enhanced personality and conversational abilities compared to previous iterations.
OpenAI also demonstrated the chatbot’s ability to seamlessly switch between languages, facilitating translations between English and Italian in real-time.
These demonstrations underscored ChatGPT’s multimodal capabilities, spanning visual, audio, and text interactions. The AI assistant was able to utilize a phone’s camera to read written notes and even attempt to discern the emotions of individuals.
In the broader context, the online event came just ahead of Google's I/O developer conference, where advancements in generative AI are expected to take centre stage.
OpenAI has also announced the release of a desktop version of ChatGPT, initially catering to Mac users, with access rolling out to paid users starting today. While a Windows version is in the pipeline, OpenAI chose to prioritize Mac users initially due to their larger user base.
In addition, OpenAI revealed plans to grant free users access to custom GPTs and its GPT store. These capabilities will be phased in over the coming weeks.
The rollout of GPT-4o’s text and image capabilities has commenced for paid ChatGPT Plus and Team users, with availability for Enterprise users on the horizon. Free users will also begin gaining access, albeit with rate limits.
Furthermore, the voice version of GPT-4o is set to become available “in the coming weeks,” expanding its utility beyond text-based interactions.
Developers can expect to leverage GPT-4o’s text and vision modes, with audio and video capabilities slated for release to “a small group of trusted partners” in the near future.
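As an illustration of what that developer access could look like, here is a minimal sketch of assembling a text-plus-image request for GPT-4o. The message shape follows OpenAI's documented Chat Completions format for multimodal input; the prompt and image URL are placeholders, and the snippet only builds the request body rather than sending it to the API.

```python
# Sketch: the shape of a multimodal (text + vision) request a developer
# might send to the Chat Completions API with GPT-4o. No network call is
# made here; the function only assembles the JSON-serializable payload.

def build_gpt4o_request(prompt: str, image_url: str) -> dict:
    """Assemble a text-plus-image request body for the gpt-4o model."""
    return {
        "model": "gpt-4o",
        "messages": [
            {
                "role": "user",
                # A single user message can mix content parts of
                # different types: plain text and an image URL.
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

# Placeholder values for illustration only.
payload = build_gpt4o_request(
    "What is written in this note?",
    "https://example.com/note.jpg",
)
print(payload["model"])  # gpt-4o
```

In practice this payload would be passed to an API client with a valid key; audio and video input, per OpenAI, remain limited to "a small group of trusted partners" for now.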
In a tweet, OpenAI confirmed that the enigmatic GPT2-chatbot spotted on a benchmarking site is indeed GPT-4o, shedding light on its identity.