OpenAI's latest 'Voice Engine' requires only 15 seconds to replicate speech accurately

30 Mar


OpenAI says "Voice Engine," currently in closed testing, can help restore people's voices, though the company acknowledges the technology could be abused.

OpenAI, the company behind the popular generative AI tool ChatGPT, has unveiled a new voice cloning technology called "Voice Engine." From a comparatively small amount of original audio, this model can replicate an individual's voice, including their intonation and other distinctive speech characteristics.

In a Friday blog post, the company highlights the capability of its small model, which can generate emotive and realistic voices from a single 15-second sample.

For comparison, ElevenLabs, an AI voice platform, offers an instant voice cloning tool that typically requires at least one minute of samples. For optimal performance at its professional service level, nearly 10 minutes of continuous speech is recommended.

The company showcased several examples of the technology in use. One involved cloning the voice of a young patient who had lost much of her ability to speak due to a vascular brain tumor. Using an older recording she made for a school project, OpenAI was able to replicate her voice.

OpenAI worked with the creators of Livox, an "alternative communication app" for people with disabilities, as well as Lifespan, a nonprofit connected to Brown University's medical school. Working from the recording she had produced for a school presentation, Voice Engine's instant text-to-speech functionality allowed the patient to communicate effectively in her own voice.

OpenAI also showcased how HeyGen uses its technology to translate speech from one language to another while keeping it natural-sounding.

Voice Engine was first created in late 2022 and currently powers the preset voices in ChatGPT's Voice and Read Aloud features as well as OpenAI's text-to-speech API. Even so, OpenAI is moving cautiously ahead of a broader release.

"OpenAI aims to initiate a discussion on the responsible deployment of synthetic voices and how society can adapt to these new capabilities," the organization stated, acknowledging the widespread condemnation of "deepfakes."

These synthetic voices have been used to impersonate celebrities, government officials, and even private citizens for various nefarious purposes, including political campaigns, fraudulent advertisements, and criminal activities. U.S. President Joe Biden has been advocating for stronger safeguards against the malicious use of AI voice impersonations.

In fact, Meta revealed last summer that its AI voice tool was being withheld due to concerns about the "potential risks of misuse."

"In line with our commitment to AI safety and our voluntary pledges, we have decided to offer a preview of this technology without wide release at this time," OpenAI explained.

The partners currently testing Voice Engine have agreed to adhere to OpenAI's usage policies, which explicitly forbid the impersonation of another individual or organization without consent. Furthermore, the company mandates explicit and informed consent from the original speaker and prohibits developers from creating mechanisms for individual users to clone their own voices.

March 2024, Cryptoniteuae

