OpenAI introduces GPT-4o: A free multimodal AI model for real-time audio, vision, and text

OpenAI has announced the release of GPT-4o, a new flagship AI model that can process and generate any combination of text, audio, and images in real time. The “o” in GPT-4o stands for “omni,” signifying the model’s ability to handle multiple modalities.

GPT-4o is a single model trained end-to-end across text, vision, and audio, so all inputs and outputs are processed by the same neural network. This lets the model preserve information such as tone, multiple speakers, and background noise. According to OpenAI, GPT-4o is its first model to combine all of these modalities.

GPT-4o aims to provide more natural human-computer interaction by accepting any combination of text, audio, and image inputs and generating any combination of text, audio, and image outputs. The model can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is comparable to human response time in a conversation.

The new model matches GPT-4 Turbo's performance on English text and code while offering improved performance on text in non-English languages. In the API, GPT-4o is also faster than GPT-4 Turbo and 50% cheaper. According to OpenAI, the model is especially strong at vision and audio understanding compared with existing models.

GPT-4o’s text and image inputs and text outputs are being released publicly, while other modalities will be rolled out gradually. Audio outputs will initially be limited to preset voices and will adhere to existing safety policies.

The new model’s capabilities will be rolled out iteratively, with extended red team access starting today. GPT-4o’s text and image capabilities are now available in ChatGPT, and a new version of Voice Mode powered by GPT-4o is planned for alpha release in ChatGPT Plus in the coming weeks.

Developers can access GPT-4o in the API as a text and vision model, with support for audio and video capabilities planned for release to a small group of trusted partners in the coming weeks.

GPT-4o is available to free-tier users, and to Plus users with message limits up to 5x higher.


Just in

Vercel raises $250M

San Francisco-based Vercel, a frontend cloud platform provider, has secured $250 million in Series E funding, bringing the company's valuation to $3.25 billion.

Worky raises $6M (Mexico)

Mexico City-based Worky, a provider of HR and payroll software solutions for Mexican companies, has closed a $6 million Series A financing round.

Amazon announces $1.31B investment in France

Amazon has announced a new investment of about $1.31 billion (€1.2 billion) in France, which the company says will lead to the creation of over 3,000 permanent jobs in the country.

Amazon Web Services CEO Adam Selipsky to step down (CNBC)

Adam Selipsky, CEO of Amazon’s cloud computing business, will step down from his role next month. Matt Garman, senior vice president of sales and marketing at Amazon Web Services, will succeed Selipsky after he leaves the company on June 3, writes CNBC's Annie Palmer.

Palo Alto Networks, Accenture expand alliance to offer generative AI services

Palo Alto Networks and Accenture have announced the expansion of their strategic alliance to provide new offerings that combine Palo Alto Networks' Precision AI technology with Accenture's secure generative AI services.