
After taking baby steps, it's time for giant leaps. And OpenAI has proven to the world once more that it is more than capable.
OpenAI has been in the business of creating and developing numerous AIs, but it is ChatGPT that makes it shine the most. At first, the AI was text-based and could only respond in text. Later, it could listen. This time, OpenAI has enhanced it so it can "see, hear, and speak."
According to OpenAI in a blog post:
"Voice and image give you more ways to use ChatGPT in your life. Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow up questions for a step by step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you."
Use your voice to engage in a back-and-forth conversation with ChatGPT. Speak with it on the go, request a bedtime story, or settle a dinner table debate.
Sound on pic.twitter.com/3tuWzX0wtS— OpenAI (@OpenAI) September 25, 2023
In the first update, users can opt into voice conversations on ChatGPT's mobile app and choose from five different synthetic voices for the chatbot to respond with.
This feature allows ChatGPT to both "listen" and "speak."
OpenAI said that the voice capability is powered by a new text-to-speech model capable of generating human-like audio from just text and a few seconds of sample speech, while Whisper, the company's speech recognition system, transcribes the user's spoken words into text.
In the second update, users can also share images with ChatGPT and highlight areas of focus for further analysis.
This allows ChatGPT to "see."
The image capability is part of what is known as GPT-4 with vision (GPT-4V).
Together, the two updates are considered OpenAI's biggest since the introduction of GPT-4.
This upgrade comes right after OpenAI unveiled DALL·E 3, its most advanced text-to-image generator yet.
The integration of DALL·E 3 and the ChatGPT bot signifies OpenAI’s push towards AI assistants that can perceive the world more like humans do.
By giving ChatGPT "senses," OpenAI is, in its own words, giving users "more ways to use ChatGPT in your life."
Long story short, this launch represents a major step towards OpenAI’s vision of advancing AI from mere artificial narrow intelligence (ANI) to artificial general intelligence (AGI).
Sam Altman, the co-founder and CEO of OpenAI, has previously said that transitioning to AGI is perhaps the most important project in human history.
Read: Artificial General Intelligence, And How Necessary Controls Can Help Us Prepare For Their Arrival
ChatGPT can now see, hear, and speak. Rolling out over next two weeks, Plus users will be able to have voice conversations with ChatGPT (iOS & Android) and to include images in conversations (all platforms). https://t.co/uNZjgbR5Bm pic.twitter.com/paG0hMshXb
— OpenAI (@OpenAI) September 25, 2023
This feature is significant, and it arrives amid the ever-rising stakes of the AI arms race among chatbot leaders such as OpenAI, Microsoft, Google and Anthropic.
OpenAI said that voice and image support for ChatGPT is rolling out to paying subscribers over the next two weeks.
It's worth noting that voice functionality is initially limited to the iOS and Android apps, whereas the image processing capabilities will be available on all platforms.
In an effort to encourage consumers to adopt generative AI into their daily lives, tech giants are racing not only to improve their chatbots with new features, but also to integrate them with their other products.
Experts have raised concerns about these kinds of AI, and about how the companies behind them handle user data. OpenAI acknowledged those concerns and said that the synthetic voices were "created with voice actors we have directly worked with," rather than collected from strangers.
The company's terms of service also state that consumers own their inputs "to the extent permitted by applicable law."
However, the company also noted there that voice transcriptions are considered inputs and may be used to improve its large language models.
The underlying research — voice generation and image understanding — offers a glimpse at what much more advanced AI will be capable of in the future. Learn more about this update and our safety measures: https://t.co/uNZjgbR5Bm
— OpenAI (@OpenAI) September 25, 2023
As for the risks of more powerful multimodal AI systems involving vision and voice generation, OpenAI says it is aware of the potential for impersonation, bias and reliance.
"OpenAI’s goal is to build AGI that is safe and beneficial," the company wrote in its announcement.
"We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future."
Read: Paving The Roads To Artificial Intelligence: It's Either Us, Or Them