The Creator of ChatGPT's Voice Aims to Build Tech from "Her" Without Dystopia

Alexis Conneau has long been fascinated by the movie "Her" and has been dedicated to turning its fictional voice technology into reality. His Twitter banner features a picture of Joaquin Phoenix's character from the movie. With ChatGPT's Advanced Voice Mode, a project he started at OpenAI after similar work at Meta, he achieved a significant milestone. Now, he has launched WaveForms AI, a new audio LLM company aiming to release AI audio products in 2025 that compete with those from OpenAI and Google. The startup raised $40 million in seed funding led by Andreessen Horowitz.

How "Her" Influenced Conneau and His New Venture

In an interview with TechCrunch, Conneau spent a good deal of time discussing how to avoid the dystopia depicted in "Her". The movie is a science fiction tale about a world where people form intimate relationships with AI systems instead of with other humans. Conneau says he wants to use the existing and future technology for good and do the opposite of what the company in the movie does. Building the tech without the dystopia may sound contradictory, but he is determined to do it and is convinced his new AI startup will help people "feel the AGI" with their ears.

WaveForms AI is an audio LLM company training its own foundation models. It aims to supply "emotionally intelligent" AI that facilitates a wide range of interactions, such as talking to your car or your computer. Conneau is wary of the AI companionship space; he wants the company to be more "horizontal" and have a broader impact. He believes talking to generative AI will become a more common way to interact with technology in the future.

The Technology Behind WaveForms AI

ChatGPT's Advanced Voice Mode, the project Conneau led at OpenAI, is a significant innovation and points to where WaveForms is headed. It doesn't just translate voice into text and back the way the old voice mode did. Instead, it breaks the audio of your voice down into tokens and runs those tokens directly through an audio-specific transformer model, enabling low latency. This more advanced approach is what sets it apart from ChatGPT's regular voice mode.

As for the claim that AI audio models can "understand emotions": much like text-based LLMs, audio LLMs recognize patterns in audio clips that humans have labeled "sad" or "excited", and they respond with matching emotional intonation based on those learned patterns.
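The article doesn't disclose implementation details, so the following is only a minimal PyTorch sketch of the token-in, token-out idea described above; every class name, dimension, and vocabulary size is invented for illustration, and this is not WaveForms' or OpenAI's actual code.

```python
# Hypothetical sketch only: "audio tokens run straight through a transformer",
# with no intermediate text step. All names and sizes are invented.
import torch
import torch.nn as nn

class AudioTokenTransformer(nn.Module):
    """Speech-to-speech model: discrete audio tokens in, audio-token logits out."""
    def __init__(self, vocab_size=1024, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)      # one vector per audio token
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)          # predict the next audio token

    def forward(self, audio_tokens):                        # (batch, seq) of token ids
        h = self.backbone(self.embed(audio_tokens))
        return self.head(h)                                 # logits over the audio vocab

# Old pipeline: audio -> speech-to-text -> text LLM -> text-to-speech, where each hop
# adds delay and discards intonation. In the token-based approach, a neural codec
# quantizes the waveform into token ids, and the model predicts response tokens that
# a decoder turns back into audio.
model = AudioTokenTransformer()
codec_tokens = torch.randint(0, 1024, (1, 200))             # stand-in for codec output
next_token_logits = model(codec_tokens)
```

Because a model like this consumes the raw audio tokens rather than a transcript, cues such as tone and pacing remain available to it, which is where the "emotionally intelligent" behavior described above would come from.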

Making AI More Personable

Conneau believes that generative AI doesn't need to get significantly smarter than GPT-4o to create better products. WaveForms is focusing on making AI better to talk to rather than on improving the underlying intelligence. It aims to develop smaller foundation models that are less expensive and faster to run, given the evidence that the old AI scaling laws are slowing down.

Conneau's former co-worker at OpenAI, Ilya Sutskever, often talked about trying to "feel the AGI". Conneau is convinced that achieving AGI will be more of a feeling, and that audio LLMs will be the key. He believes you will be able to feel the AGI more when you can talk to it and hear it.

Responsibility in AI Development

As startups make AI better to talk to, they have a responsibility to ensure people don't get addicted to it. Andreessen Horowitz's Martin Casado believes it may not be a bad thing if people talk to AI more often, comparing it to alternatives like talking to a random person on the internet or playing a video game. From a societal standpoint, though, there is a concern about people developing loving relationships with AI, as depicted in "Her". WaveForms now has to walk that fine line.