Artificial intelligence (AI) company OpenAI has begun rolling out an advanced voice feature for its ChatGPT platform.
The feature, which utilizes the company’s GPT-4o model, offers hyper-realistic audio responses, according to a Tuesday (July 30) TechCrunch report. The new audio capabilities reportedly enable users to have real-time, delay-free conversations with ChatGPT and even interrupt it mid-sentence, addressing key challenges in achieving realistic AI interactions.
The alpha version of Advanced Voice Mode is being released to a select group of ChatGPT Plus subscribers, with plans for a broader rollout to all premium users this fall. This cautious approach comes after controversy surrounding the technology’s initial demonstration in May.
During that showcase, the voice capability, dubbed “Sky,” drew attention for its uncanny resemblance to the voice of actress Scarlett Johansson, who said she had repeatedly denied OpenAI permission to use her voice.
Johansson, who had a starring role in the AI-themed film “Her,” subsequently sought legal counsel to protect her likeness. OpenAI denied using Johansson’s voice but removed the controversial demo, highlighting the complex legal landscape surrounding AI and celebrity likeness rights.
To mitigate potential misuse, OpenAI has limited the system to four preset voices created in collaboration with paid voice actors. The company emphasized that ChatGPT cannot impersonate specific individuals or public figures, a measure designed to prevent the creation of deceptive deepfakes — a growing concern in the AI industry.
“We tested GPT-4o’s voice capabilities with 100+ external red teamers across 45 languages,” the company wrote on X, formerly Twitter, in a series of posts on Tuesday to announce the new offering. “To protect people’s privacy, we’ve trained the model to only speak in the four preset voices, and we built systems to block outputs that differ from those voices. We’ve also implemented guardrails to block requests for violent or copyrighted content.”
OpenAI has also implemented filters to block requests for generating music or copyrighted audio, a move likely influenced by recent legal actions against AI companies for alleged copyright infringement.
The music industry, in particular, has been proactive in challenging AI-generated content, with lawsuits already filed against AI song-generators Suno and Udio.