OpenAI's 'Realtime API' Lets Developers Use Its Voice Assistant In Third-Party Apps


The AI field was rather dull before OpenAI came along.

When it announced ChatGPT, the company quickly wowed the world with the generative AI product, and the rest is history. Rivals began racing to create competing products, and things quickly became an arms race.

Even with many players now in the field, OpenAI remains the buzzword of generative AI.

But with so many competitors emerging, OpenAI is operating in an increasingly competitive space.

To remain relevant, the company is introducing a set of new tools to make it easier for developers to jump on board and build applications on top of its AI technology.

Chief among them is what OpenAI calls the 'Realtime API'.

According to a post on its website, the Realtime API allows developers to "build fast speech-to-speech experiences into their applications."

"Today, we're introducing a public beta of the Realtime API, enabling all paid developers to build low-latency, multimodal experiences in their apps. Similar to ChatGPT’s Advanced Voice Mode, the Realtime API supports natural speech-to-speech conversations using the six preset voices(opens in a new window) already supported in the API."

"We’re also introducing audio input and output in the Chat Completions API(opens in a new window) to support use cases that don’t require the low-latency benefits of the Realtime API. With this update, developers can pass any text or audio inputs into GPT-4o and have the model respond with their choice of text, audio, or both."

The API is meant to simplify developers' work.

Previously, developers had to transcribe audio with an automatic speech recognition model like Whisper, pass the text to a text model for inference or reasoning, and finally play the model's output using a text-to-speech model.

"This approach often resulted in loss of emotion, emphasis and accents, plus noticeable latency," said OpenAI.

With the API, OpenAI is simplifying those three steps into just one.

"The Realtime API improves this by streaming audio inputs and outputs directly, enabling more natural conversational experiences. It can also handle interruptions automatically, much like Advanced Voice Mode in ChatGPT," the company added.

As part of the rollout, OpenAI also introduced a fine-tuning tool that lets developers improve a model's responses after training, using both images and text.

The fine-tuning process can incorporate human feedback, with people feeding the model examples of good and bad answers to its responses.

Using images to fine-tune models would give them stronger image understanding capabilities, enabling applications such as enhanced visual search and improved object detection for autonomous vehicles, OpenAI said.
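Training data for image fine-tuning reuses the familiar chat "messages" format, with image parts alongside text. A hypothetical example (the question, answer, and URL are all invented):

```python
import json

# One made-up training example for image-based fine-tuning. The
# chat-style "messages" layout with "image_url" content parts follows
# OpenAI's fine-tuning docs; everything else here is illustrative.
example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What traffic sign is shown?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sign_042.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "A yield sign."},
    ]
}

# Fine-tuning data is uploaded as JSONL: one example object per line.
with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```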

The startup also unveiled a tool that allows smaller models to learn from larger ones, along with "Prompt Caching," which cuts some costs in half by reusing pieces of text the AI has previously processed.
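Prompt Caching works on shared prompt prefixes, so the practical pattern is to put the large, unchanging part of a prompt first and the per-request part last. A minimal sketch, assuming the Chat Completions API and an invented system prompt:

```python
from openai import OpenAI

client = OpenAI()

# Prompt Caching discounts input tokens whose prefix matches a recent
# request, so the large static part of the prompt should come first
# and the per-request part last. This short string stands in for any
# long system prompt or few-shot examples (caching kicks in only past
# a minimum prompt length).
STATIC_SYSTEM_PROMPT = "You are a support agent for Acme Corp. ..."  # imagine ~1,000+ tokens

def answer(user_question: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cacheable prefix
            {"role": "user", "content": user_question},           # varies per call
        ],
    )
    return completion.choices[0].message.content
```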

While OpenAI earns a lot of money from paying subscribers, a larger chunk of its revenue comes from businesses that use its services to build their own AI applications.

The Realtime API's introduction should therefore be a key selling point.

After all, OpenAI faces heated competition: Google with Gemini, Microsoft with Copilot, Meta with LLaMA, and many smaller players have caught up with generative AI trends so quickly that their products are no longer considered mere "alternatives" to OpenAI's.

The Realtime API is part of OpenAI's plan to stay ahead in an increasingly crowded market for AI software, at a time when it's also looking to close a large funding round.

It also comes days after several leaders, including CTO Mira Murati, announced their departures from the company.

Published: 
02/10/2024