Background

Six Hours And Codex: How One Developer Single-Handedly Built OpenAI's Vision For A Voice-First Future

OpenAI's Voice Hack Night

During OpenAI's Voice Hack Night in San Francisco, one solo developer stepped onto the stage and demonstrated something that felt straight out of a sci-fi movie.

The project, simply titled Agentic OS for a Phone, showcased a fully voice-first, agentic operating system designed to replace traditional mobile apps. Selected as one of just four finalists at the hackathon, the prototype was built by Isa Usmanov in roughly six hours.

Rather than adding voice controls to an existing smartphone interface, it reimagines the phone entirely.

The proof-of-concept shows AI agents that can listen, reason, act, and generate real-time user interfaces on demand.

The idea echoes the 2013 film Her, directed by Spike Jonze.

In the movie, a lonely man develops a relationship with Samantha, an intelligent conversational operating system voiced by Scarlett Johansson. The AI doesn't simply respond to questions; she anticipates needs, manages tasks, and interacts in a way that feels remarkably human.

OpenAI CEO Sam Altman has repeatedly described Her as his favorite AI film and has often cited its interaction model as a vision of how humans may eventually engage with artificial intelligence.

While the resemblance of one ChatGPT voice to Johansson's became controversial, the company's broader ambition remained unchanged: building a voice-native AI experience that feels as natural and intuitive as the one depicted in the film.

The launch of ChatGPT's Advanced Voice Mode marked a major step away from text-based interaction and toward fluid, real-time conversations. By 2026, OpenAI's latest real-time models had advanced far beyond simple dialogue, becoming capable of taking actions, coordinating tools, and driving autonomous workflows.

The larger goal is an agentic future.

And at Voice Hack Night, Usmanov's prototype offered a tangible glimpse of what that future could look like.

Holding up a standard smartphone, he spoke naturally:

"Find me flights from Munich to San Francisco late next week."

"What's on my calendar today?"

"Delete the 1:30 a.m. meeting — I'm not going to make it."

"Show me the latest important AI industry news."

"What's the weather in San Francisco?"

The phone responded almost instantly.

A glowing blue orb interface transformed into dynamically generated, context-aware screens: flight cards displaying prices, durations, and airline information; a clean calendar view; weather summaries; and AI news briefings. When he verbally deleted a meeting, it disappeared. When he asked about flights to Rio, new travel options appeared immediately.

Everything was built by a single developer using OpenAI's Codex for rapid code generation alongside the company's latest real-time voice models. No traditional app ecosystem. No manually designed screens for every workflow. Just an AI system capable of understanding intent and generating the necessary interface on demand.
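The article does not describe the prototype's internals, so the following is only a minimal sketch of the general pattern, not Usmanov's implementation. It assumes a voice model has already turned speech into a structured intent (a tool name plus arguments), and shows how that intent could be dispatched to a tool that returns a declarative screen description for a renderer to draw. Every name here (`Screen`, `TOOLS`, `handle_intent`, the sample calendar data) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical in-memory calendar standing in for real device data.
CALENDAR = [
    {"time": "01:30", "title": "Sync call"},
    {"time": "10:00", "title": "Design review"},
]


@dataclass
class Screen:
    """Declarative description of a screen; a renderer would draw it."""
    kind: str
    cards: list = field(default_factory=list)


def show_calendar(args: dict) -> Screen:
    # Copy each event so the screen does not alias the stored data.
    return Screen("calendar", [dict(event) for event in CALENDAR])


def delete_event(args: dict) -> Screen:
    # Remove every event at the requested time, then show the updated view.
    CALENDAR[:] = [e for e in CALENDAR if e["time"] != args["time"]]
    return show_calendar({})


# Registry mapping tool names (assumed to be emitted by the model) to handlers.
TOOLS: dict[str, Callable[[dict], Screen]] = {
    "show_calendar": show_calendar,
    "delete_event": delete_event,
}


def handle_intent(intent: dict) -> Screen:
    """Dispatch a structured intent to a tool and return the screen to render."""
    tool = TOOLS.get(intent["tool"])
    if tool is None:
        return Screen("error", [{"message": f"Unknown tool: {intent['tool']}"}])
    return tool(intent.get("args", {}))
```

Under these assumptions, a spoken request such as "Delete the 1:30 a.m. meeting" would reach the dispatcher as something like `handle_intent({"tool": "delete_event", "args": {"time": "01:30"}})`, and the returned `Screen` would describe the refreshed calendar. In this sketch no screen is designed by hand per workflow; the interface is derived from whatever the tool returns.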

"I actually built everything in Codex, and then the rest is powered by the new Real-time 2 model," Usmanov explained during his pitch.

"I started completely from scratch and built an agentic phone for myself. I think it resonates with people because I'm not the only one who has this problem."

The concept feels increasingly inevitable, not only because it aligns with OpenAI's long-term vision, but because it addresses a frustration shared by nearly every smartphone user. Modern devices are crowded with apps, notifications, menus, and fragmented experiences.

An agentic operating system flips that model entirely. Instead of navigating software, users simply express their intentions. The AI handles the rest.

While still an early prototype, the hackathon project demonstrates that much of the technical foundation already exists. As real-time AI models continue to improve, the idea of a voice-native, agent-driven operating system no longer feels like science fiction. It feels like a logical next step in the evolution of personal computing.

Published: 31/05/2026