
In the escalating war among large language models (LLMs), there is no single clear winner.
OpenAI had a head start after launching ChatGPT, but it was quickly rivaled by tech titans like Google and Meta, as well as emerging players like Anthropic. Each brings something unique to the table.
But in this battle for AI supremacy, ChatGPT has emerged as a frontrunner, captivating more users than any of its rivals.
Things, however, haven't always gone smoothly.
For example, the company has stumbled notably in its audio ambitions. Back in May 2024, Scarlett Johansson publicly accused OpenAI of giving ChatGPT's "Sky" voice an eerie resemblance to her own, drawing uncomfortable parallels to her role in the film Her, in which she voiced an AI companion.
Johansson revealed that CEO Sam Altman had approached her in September 2023 about voicing the system, an offer she declined for personal reasons. She said she was "shocked, angered" when the demo launched with a voice that sounded uncannily like hers.
OpenAI paused the voice amid the backlash, insisting it was based on another actress, but the incident highlighted the ethical tightrope of making AI feel too human, and too familiar.
This controversy underscored a broader challenge: while text-based interactions dominate ChatGPT usage, voice has lagged behind, with users often preferring keyboards over conversations.
Now that the dust has settled, OpenAI is doubling down on changing that, having unified engineering, product, and research teams over the past two months to overhaul its audio models.
According to reporting from The Information, these efforts are geared toward a new audio language model set for release in the first quarter of 2026, designed to sound more natural, handle interruptions seamlessly, and even speak simultaneously with users.
These are capabilities current models lack.
"A new audio-model architecture produces responses that sound more natural and emotive and provide more accurate, in-depth answers," said a person with knowledge of the effort, adding that it "will also be able to speak at the same time as a human user, which today's models can’t do, and will handle interruptions better."
This push isn't just about software tweaks; it's a stepping stone to hardware, with OpenAI envisioning an audio-first personal device launching about a year later, potentially in 2027.
The move aligns with an industry-wide shift away from screens toward audio as the primary interface, as Silicon Valley declares war on device addiction.
Smart speakers already sit in more than a third of U.S. homes, but the next wave promises more immersive experiences. Meta's Ray-Ban smart glasses, for instance, use a five-microphone array to make conversations easier to hear in noisy environments, turning wearables into directional listeners. Google has rolled out "Audio Overviews" to turn search results into spoken summaries, while Tesla is integrating xAI's Grok into its vehicles as a conversational assistant that handles everything from navigation to climate control.
Startups are joining the fray too, though not without pitfalls. The Humane AI Pin, a screenless wearable, flopped after burning through millions of dollars, and the Friend AI pendant has raised privacy alarms by recording its wearer's daily life in the name of companionship.
Even AI rings, from Sandbar and from a company led by Pebble founder Eric Migicovsky, are slated for 2026, letting users quite literally talk to their hands.

For OpenAI, this audio bet extends to a family of devices, possibly including glasses or screenless smart speakers, all emphasizing companionship over utility.
"Among the ideas the company has discussed are glasses and a smart speaker without a display," reports The Information.
Influencing this vision is former Apple design chief Jony Ive, whose hardware startup io was acquired by OpenAI in a $6.5 billion deal in May 2025, bringing a focus on reducing screen dependency and "righting the wrongs" of addictive gadgets. Ive and others see voice interfaces as less habit-forming, though the evidence for that remains anecdotal.
As the LLM wars intensify, with OpenAI facing stiffer competition from Google's integrated ecosystem and Meta's talent poaching, this audio hardware pivot could redefine how we interact with AI.
Echoing the sci-fi intimacy of "Her," OpenAI's future devices aim to make ChatGPT not just a tool, but a seamless audio companion woven into everyday spaces, from homes and cars to faces and fingers.
Whether OpenAI can overcome its past voice controversies and users' preference for text remains to be seen, but one thing is clear: in a world drowning in screens, audio is poised to take center stage.