OpenAI’s new flagship model can reason across audio, vision, and text in real time.
OpenAI has unveiled GPT-4o, an upgraded version of the GPT-4 model that powers ChatGPT. OpenAI CTO Mira Murati said the new model “is much faster” and improves “capabilities across text, vision, and audio.” GPT-4o will be free for all users, while paid users will get “up to five times the capacity limits” of free users.
According to OpenAI, GPT-4o’s capabilities will be introduced gradually, with text and image features starting to roll out in ChatGPT. CEO Sam Altman noted that the model is “natively multimodal,” enabling it to generate content or understand commands in voice, text, or images. Developers will have access to the API, which Altman mentioned is half the price and twice as fast as GPT-4 Turbo.
ChatGPT’s voice mode is also getting new features as part of the GPT-4o update, letting the app work as a voice assistant in the vein of the one in the movie Her, responding in real time and observing the world around you. Altman also reflected on OpenAI’s trajectory, noting that the company’s focus has shifted toward making its advanced AI models available to developers through paid APIs, so that third parties can build new applications on top of them.
Ahead of the GPT-4o launch, speculation suggested that OpenAI might announce an AI search engine to compete with Google, a voice assistant built into GPT-4, or even an entirely new model, GPT-5. The timing of the launch, just before Google I/O, Google’s flagship conference, suggests OpenAI deliberately positioned the announcement ahead of its biggest rival’s event.