Background

How Sesame's AI Achieves An Uncanny, Lifelike Conversational Voice: From Fiction To Reality

In business, the best way to compete and thrive in a crowded industry is to have a distinctive product.

In an era where Large Language Models have become the buzzword of technology, companies are racing to win more attention. However, since the introduction of OpenAI's ChatGPT, pretty much everyone who has jumped on the bandwagon has introduced essentially the same product, with only variations and tweaks to set it apart.

Sesame is just another player in this ever-growing competition.

Instead of creating yet another LLM-powered generative AI, it focuses on developing an AI that is consistent.

Unlike xAI's Grok, which offers many personalities to fit different tastes and preferences, Sesame is building an AI that makes 'Samantha' from the 2013 film Her a reality.

On its website, the company writes:

"At Sesame, our goal is to achieve 'voice presence'—the magical quality that makes spoken interactions feel real, understood, and valued. We are creating conversational partners that do not just process requests; they engage in genuine dialogue that builds confidence and trust over time. In doing so, we hope to realize the untapped potential of voice as the ultimate interface for instruction and understanding."

The project grew out of the goal of creating an AI that can convey the "spark of excitement, the thoughtful hesitation, the comforting warmth."

Because these kinds of emotions make people feel understood, the company argues that the best way to convey them is through voice, not text.

"Voice is our most intimate medium as humans, carrying layers of meaning through countless variations in tone, pitch, rhythm, and emotion."

"Today’s digital voice assistants lack essential qualities to make them truly useful. Without unlocking the full power of voice, they cannot hope to effectively collaborate with us. A personal assistant who speaks only in a neutral tone has difficulty finding a permanent place in our daily lives after the initial novelty wears off."

"Over time this emotional flatness becomes more than just disappointing—it becomes exhausting."

Sesame recently showcased Maya and Miles, two AI companions designed to push the boundaries of interactive speech generation.

Unlike conventional text-to-speech (TTS) models that simply convert text into spoken words, these AI companions aim to understand and adapt to context in real time.

"Traditional text-to-speech (TTS) models generate spoken output directly from text but lack the contextual awareness needed for natural conversations," the team said.

Even though recent advancements have enabled AI to produce human-like speech, these models still struggle with the one-to-many problem: there are countless ways to deliver a sentence, but only a few are appropriate for a given situation.

Sesame’s AI aims to overcome this by considering tone, rhythm, and conversational history, allowing it to reason across multiple aspects of language and prosody. This enables richer, more dynamic interactions that go beyond just high-quality audio, bringing AI companions closer to truly natural communication.

The idea is to make computers more than just a tool, as explained by one of the company's founders, Brendan Iribe.

In the film Her, Samantha, the AI, is portrayed as deeply intelligent, emotionally attuned, and ever-evolving.

She isn’t just a programmed assistant—she feels alive. Her voice, brought to life by Scarlett Johansson, is warm, playful, and deeply expressive, making her presence feel real despite her lack of a physical form.

Unlike traditional AI, Samantha doesn’t just respond; she engages. She laughs, teases, reflects, and grows alongside Theodore Twombly (played by Joaquin Phoenix), a lonely, introverted man struggling with the emotional aftermath of his divorce.

As they continue to engage, they develop desires, fears, and even love. And this is where Samantha showcases her unique capabilities: she adapts to his emotions, offering comfort, excitement, and intellectual stimulation, making her feel more human than machine.

At her core, Samantha represents both the beauty and limitations of artificial companionship, and that is what Sesame is after, more or less.

If an AI like the one Sesame has built is a preview of future AIs, Samantha from Her seems more real than ever, for better or worse.

Published: 01/03/2025