GPT-4o gives OpenAI a multimodal boost in its race against Google Gemini

Summary

  • OpenAI’s ChatGPT is shifting the AI landscape, pushing Google to innovate faster to keep up with the competition.
  • The new GPT-4o model is faster and better at interpreting written, audio, and visual information, and it’s available to free users with some limitations.
  • OpenAI’s advancements are forcing Google to step up its game, signaling a competitive race toward AI dominance.



More than any other product, ChatGPT is responsible for the AI revolution that we are currently in the midst of. The sophistication of OpenAI’s large language model was unlike anything the world had seen before it, forcing Google, one of the world’s largest tech companies, to play catch-up. Now, a mere 18 months after ChatGPT upended how we think about artificial intelligence, OpenAI has released one of its biggest updates to date, and it’s still making Google play catch-up.



What’s in the new model

In a live stream earlier today, OpenAI announced GPT-4o (the “o” stands for “omni”), its most advanced LLM yet. According to OpenAI, GPT-4o is faster than previous models and better at interpreting written, audio, and visual information; basically everything you’d want from an AI chatbot update. Even better, it’s bringing these features to free-tier users (albeit with some limits) and unlocking other features that were previously paid-only. All of this is already underway, so check your app or open ChatGPT in your browser to see if you can play with the new goodies.


OpenAI shows off a lot of what the new model can do in a blog post on its website. The first thing that struck me is that ChatGPT can now laugh convincingly enough to almost escape the uncanny valley, and given some of its new skills, I expect to see an uptick in adoption (assuming this isn’t a huge marketing stunt). GPT-4o also seems to be much better at interpreting visual input; OpenAI says it could recognize what sport you’re watching on TV and explain the rules to you.
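
To get a feel for what that looks like in practice, here’s a minimal sketch of asking GPT-4o about an image through OpenAI’s Python SDK. The image URL and the question are placeholders for illustration, not something from OpenAI’s demo:

```python
# Minimal sketch: asking GPT-4o about an image via OpenAI's Python SDK.
# The image URL and question below are placeholders for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What sport is this, and what are the basic rules?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/tv-frame.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```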

GPT-4o should also be much better at understanding voice input. With OpenAI’s other models, voice input is converted to text, passed to GPT-3.5/GPT-4, and the response is finally converted back to audio. In contrast, the new model was trained more holistically, with text, audio, and images processed by the same neural network, which, in theory, should allow the model to pick up on how many speakers it’s interacting with and their tone.
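
For context, the old cascaded approach looks roughly like the sketch below when built on OpenAI’s existing speech-to-text, chat, and text-to-speech endpoints. The glue code and file names are illustrative, but it shows exactly where tone and speaker information get dropped:

```python
# Illustrative sketch of the three-stage voice pipeline that GPT-4o replaces
# with a single model. Each hop discards information: the transcription step
# outputs plain text, so tone and speaker identity never reach the LLM.
from openai import OpenAI

client = OpenAI()

# 1. Speech-to-text: audio in, plain text out (tone and speakers are lost here).
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2. The text-only LLM answers the transcribed question.
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = chat.choices[0].message.content

# 3. Text-to-speech: the answer is synthesized back into audio.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=answer,
)
speech.write_to_file("answer.mp3")
```

Because GPT-4o handles all three modalities in one network, it skips both lossy conversions in steps 1 and 3, which is also a big part of why it responds so much faster in voice conversations.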




If you’re a developer, OpenAI has even more news. The GPT-4o API is available now as a text and vision model. Compared to GPT-4 Turbo, the new model is half the price, twice as fast, and has a rate limit five times higher. If you want to play with the audio and video capabilities, OpenAI says you’ll have to wait a bit longer and be on its short list of trusted developers.
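
If you already use the Chat Completions API, switching over should be little more than a model-name change. A minimal sketch, with a placeholder prompt:

```python
# Minimal sketch: calling GPT-4o through OpenAI's Chat Completions API.
# Moving from GPT-4 Turbo should be just a model-name change.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # previously: model="gpt-4-turbo"
    messages=[{"role": "user", "content": "Summarize what GPT-4o changes for developers."}],
)

print(response.choices[0].message.content)
```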

And if you’re like me, still on the free tier, the GPT-4o rollout has already begun, although I have yet to see it. When using GPT-4o, free users will be able to get responses informed by the web, chat about photos, upload files, and access enterprise-level data analysis tools.


Now it’s Google’s move

All of this information dropped less than an hour after Google previewed some very similar features for Gemini ahead of tomorrow’s Google I/O event. In case there was any doubt, OpenAI has made it clear that its arms race with Google is alive and well.

