Happy Monday! ☀️
Exactly a week ago, OpenAI announced GPT-4o. I watched the livestream event and the demo videos they released.
Although most of the new features, such as real-time audio and video, have not been released yet, I'll use this issue to share my early impressions of GPT-4o's potential.
1. The multimodal capabilities open up a range of exciting applications.
The "o" in GPT-4o stands for "omni" (meaning "all" in Latin). It accepts any combination of text, audio, and image inputs and generates any combination of text, audio, and image outputs.
The speed and quality with which it processes what it hears and sees are a significant upgrade over GPT-4. With GPT-4, audio-to-text, text-to-text, and text-to-audio were three separate models; there were inevitable lags between the stages, and information such as tone and background sounds could get lost in the handoffs.
GPT-4o is a newly trained single model that spans text, vision, and audio, with all inputs and outputs processed by the same neural network. It can even detect tone of voice, multiple speakers, and background noise.
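As a side note for anyone who wants to try this programmatically: the text and image sides of GPT-4o are already exposed through OpenAI's API. Below is a minimal sketch using the official Python SDK; it assumes an OPENAI_API_KEY environment variable, and the image URL is a placeholder. (The real-time audio and video modalities were not available through this endpoint at the time of writing.)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One request mixing text and an image; both go to the same gpt-4o model.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this screenshot."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/screenshot.png"},  # placeholder URL
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The notable part is that the text and the image travel in the same request to the same model, rather than being stitched together from separate speech and text systems.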
This streamlined multimodal interaction made me think of several designer use cases that would not have been possible before.
Scenario 1 - User Interview Analysis
When I was watching this demo video, I thought about the user interviews I had conducted in the past.
It would open up a new world if ChatGPT (with the GPT-4o model) could serve as a meeting assistant in live user interviews. It could not only take notes but also pick up on subtle cues such as facial expressions and tone of voice. This non-verbal information is valuable.
For example, I wouldn't have to rewatch a user testing recording multiple times just to capture who said what and how they said it. Additionally, the detected emotions could be included as part of the interview report.
Scenario 2 - Design Reviews
Similarly, if GPT-4o can process visuals and voices and provide live feedback, it could potentially serve as a partner not just in user interviews but in design reviews as well.
Imagine if ChatGPT could observe the designs and listen to the conversation between designers: it could provide live design feedback and summarize the key points afterward.
Scenario 3 - Real-time Design Assistance
This demo video demonstrates how ChatGPT with the GPT-4o model can process visuals and engage in real-time dialogue with the user.
It would be mind-blowing if ChatGPT could watch me design and provide guidance. For instance, as I create wireframes in Figma, ChatGPT could engage in a live conversation with me, offering suggestions and adjustments.
2. Responses are significantly faster.
Although the real-time audio and video features of GPT-4o have not been released yet, the updated model is already available for regular chats in my account.
I switched to GPT-4o and tested it to see how it performs compared to GPT-4.
I found a random draft of an app design online and asked ChatGPT (with the GPT-4o model) to provide me with some suggestions for improving the UI design.
This is the result I received. It was extensive, and at the bottom, there were some actionable suggestions.
Then I switched to GPT-4 and asked the same prompt.
Here's the result: similar quality, but with less detail than GPT-4o's response.
GPT-4, however, was much slower: GPT-4o responded at least twice as fast.
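If you want to check a speed claim like this yourself, the API makes it easy to measure. Here is a rough sketch, again assuming the OpenAI Python SDK and an OPENAI_API_KEY in the environment; the prompt is just an example, and any identical prompt sent to both models works.

```python
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Example prompt (placeholder): any identical prompt works for both models.
PROMPT = "List five quick ways to improve the visual hierarchy of a mobile app screen."

for model in ("gpt-4o", "gpt-4-turbo"):
    start = time.perf_counter()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    elapsed = time.perf_counter() - start
    tokens = response.usage.completion_tokens
    print(f"{model}: {elapsed:.1f}s, {tokens} tokens, {tokens / elapsed:.1f} tokens/s")
```

Tokens per second is the fairer metric here, since a longer answer naturally takes longer to generate.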
Then, I asked ChatGPT (GPT-4o) to give me some real-world examples with layouts similar to the UI I provided.
Surprisingly, all of the examples were mobile apps focused on “courses” and “assignments”. This indicates that ChatGPT detected that the UI I provided was for a course-related mobile app.
Lastly, I switched to GPT-4 and asked the same follow-up question.
The quality of the results was not as good as GPT-4o's: only one example, Google Classroom, was a course-themed app. The other examples matched the clean layout but were not as relevant as GPT-4o's suggestions.
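For anyone who wants to rerun this comparison outside the ChatGPT interface, here is a sketch of how the same experiment could be sent to either model through the API. The file name and prompt are placeholders standing in for the design draft and question I used; swap the model name to get GPT-4's answer.

```python
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode a local screenshot of the design draft (placeholder file name).
with open("app_design_draft.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # or "gpt-4-turbo" to compare against GPT-4
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Suggest improvements to this UI design, "
                            "then list real-world apps with similar layouts.",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Running it twice, once per model, reproduces the side-by-side comparison above.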
Summary
GPT-4o is significantly faster than GPT-4.
The text-to-text generation quality of GPT-4o is slightly better than that of GPT-4.
The multimodal capabilities of GPT-4o open up a lot of potential for designers.
The new features of GPT-4o will roll out gradually over the coming months.
Thanks for reading. If you enjoyed this issue, please consider sharing the Design with AI newsletter with someone who might benefit from it.
Have a great week! I will be visiting Seattle for a work-related event. Exciting!
—Xinran
P.S. Maven launched my AI course: AI for Product Designer. Secure your spot today before seats fill up!