
OpenAI previews Realtime API for speech-to-speech apps


OpenAI has launched a public beta of the Realtime API, an API that allows paid developers to build low-latency, multimodal experiences combining text and speech in their apps.
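Under the hood, the Realtime API is exposed as a persistent WebSocket connection that streams JSON events in both directions. As a rough illustration only, a minimal connection in Python might look like the sketch below; the model name, beta header, and event types follow OpenAI's documentation at launch and may change:

```python
import asyncio
import json
import os

import websockets  # pip install websockets


async def main() -> None:
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: on websockets >= 14 the keyword is additional_headers instead.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model for a spoken-plus-text response.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text", "audio"],
                "instructions": "Greet the listener in one sentence.",
            },
        }))
        # Events stream back incrementally; stop once the response completes.
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])
            if event["type"] == "response.done":
                break


asyncio.run(main())
```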

Introduced October 1, the Realtime API, similar to OpenAI's ChatGPT Advanced Voice Mode, supports natural speech-to-speech conversations using the preset voices the API already supports. OpenAI is also introducing audio input and output in the Chat Completions API to support use cases that don't need the low-latency benefits of the Realtime API. Developers can pass text or audio inputs into GPT-4o and have the model respond with text, audio, or both.
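For the Chat Completions route, audio is just another modality on the request. A minimal sketch, assuming the official openai Python SDK and the gpt-4o audio preview model name as published for the beta, sends a text prompt and saves the spoken reply to a WAV file:

```python
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",               # audio-capable GPT-4o variant (beta name)
    modalities=["text", "audio"],               # ask for both a transcript and speech
    audio={"voice": "alloy", "format": "wav"},  # one of the API's preset voices
    messages=[{"role": "user", "content": "Briefly explain what an API is."}],
)

message = completion.choices[0].message
print(message.audio.transcript)  # text transcript of the spoken answer

# The audio itself arrives base64-encoded; decode and save it.
with open("reply.wav", "wb") as f:
    f.write(base64.b64decode(message.audio.data))
```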

With the Realtime API and audio support in the Chat Completions API, developers no longer have to string together multiple models to power voice experiences. They can build natural conversational experiences with a single API call, OpenAI said. Previously, creating a similar voice experience required developers to transcribe audio with an automatic speech recognition model such as Whisper, pass the text to a text model for inference or reasoning, and play the model's output using a text-to-speech model. This approach often resulted in loss of emotion, emphasis, and accents, plus noticeable latency.
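For comparison, the chained approach the new APIs replace looked roughly like the following sketch (using the openai Python SDK; the whisper-1 and tts-1 model names and the alloy voice are illustrative). Three separate round trips, each adding latency, and the intermediate text step is where vocal nuance gets lost:

```python
from openai import OpenAI

client = OpenAI()

# Step 1: speech-to-text with an ASR model (Whisper).
with open("question.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Step 2: reasoning over the transcribed text with a text model.
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
answer = chat.choices[0].message.content

# Step 3: text-to-speech on the model's reply.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=answer,
)
speech.write_to_file("answer.mp3")
```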



