A new “empathic voice interface” launched today by Hume AI, a New York–based startup, makes it possible to add a range of emotionally expressive voices, plus an emotionally attuned ear, to large language models from Anthropic, Google, Meta, Mistral, and OpenAI—portending an era when AI helpers may more routinely get all gushy on us.
“We specialize in building empathic personalities that speak in ways people would speak, rather than stereotypes of AI assistants,” says Hume AI cofounder Alan Cowen, a psychologist who has coauthored a number of research papers on AI and emotion, and who previously worked on emotional technologies at Google and Facebook.
WIRED tested Hume’s latest voice technology, called EVI 2, and found its output to be similar to that developed by OpenAI for ChatGPT. (When OpenAI gave ChatGPT a flirtatious voice in May, company CEO Sam Altman touted the interface as feeling “like AI from the movies.” Later, a real movie star, Scarlett Johansson, claimed OpenAI had ripped off her voice.)
Like ChatGPT, Hume is far more emotionally expressive than most conventional voice interfaces. If you tell it that your pet has died, for example, it will adopt a suitably somber and sympathetic tone. (Also, as with ChatGPT, you can interrupt Hume mid-flow, and it will pause and adapt with a new response.)
OpenAI has not said how much its voice interface tries to measure the emotions of users, but Hume’s is expressly designed to do that. During interactions, Hume’s developer interface shows values measuring things like “determination,” “anxiety,” and “happiness” in the user’s voice. If you talk to Hume with a sad tone, it will also pick up on that, something that ChatGPT does not seem to do.
Hume also makes it easy to deploy a voice with specific emotions by adding a prompt in its UI. Here it is when I asked it to be “sexy and flirtatious”:
And when told to be “sad and morose”:
And here’s the particularly nasty message when asked to be “angry and rude”:
The technology did not always seem as polished and smooth as OpenAI’s, and it occasionally behaved in odd ways. For example, at one point the voice suddenly sped up and spewed gibberish. But if the voice can be refined and made more reliable, it has the potential to help make humanlike voice interfaces more common and varied.
The idea of recognizing, measuring, and simulating human emotion in technological systems goes back decades and is studied in a field known as “affective computing,” a term introduced by Rosalind Picard, a professor at the MIT Media Lab, in the 1990s.
Albert Salah, a professor at Utrecht University in the Netherlands who studies affective computing, is impressed with Hume AI’s technology and recently demonstrated it to his students. “What EVI seems to be doing is assigning emotional valence and arousal values [to the user], and then modulating the speech of the agent accordingly,” he says. “It is a very interesting twist on LLMs.”