David Ferrucci, CEO of AI company Elemental Cognition and previously the lead on IBM’s Watson project, says language models have removed a great deal of the complexity from building useful assistants. Parsing complex commands previously required a huge amount of hand-coding to cover the different variations of language, and the final systems were often annoyingly brittle and prone to failure. “Large language models give you a huge lift,” he says.
Ferrucci says, however, that because language models are not well suited to providing precise and reliable information, making a voice assistant truly useful will still require a lot of careful engineering.
More capable and lifelike voice assistants could also have subtle effects on users. The huge popularity of ChatGPT has been accompanied by confusion over the nature of the technology behind it as well as its limits.
Motahhare Eslami, an assistant professor at Carnegie Mellon University who studies users’ interactions with AI helpers, says large language models may alter the way people perceive their devices. The striking confidence exhibited by chatbots such as ChatGPT causes people to trust them more than they should, she says.
People may also be more likely to anthropomorphize a fluent agent that has a voice, Eslami says, which could further muddy their understanding of what the technology can and can’t do. Care is also needed to ensure that none of the algorithms involved propagate harmful racial biases, which can surface in subtle ways in voice assistants. “I’m a fan of the technology, but it comes with limitations and challenges,” Eslami says.
Tom Gruber, who cofounded Siri, the startup that Apple acquired in 2010 for its voice assistant technology of the same name, expects large language models to produce significant leaps in voice assistants’ capabilities in coming years but says they may also introduce new flaws.
“The biggest risk—and the biggest opportunity—is personalization based on personal data,” Gruber says. An assistant with access to a user’s emails, Slack messages, voice calls, web browsing, and other data could potentially help recall useful information or unearth valuable insights, especially if a user can engage in a natural back-and-forth conversation. But this kind of personalization would also create a potentially vulnerable new repository of sensitive private data.
“It’s inevitable that we’re going to build a personal assistant that will be your personal memory, that can track everything you’ve experienced and augment your cognition,” Gruber says. “Apple and Google are the two trusted platforms, and they could do this but they have to make some pretty strong guarantees.”
Hsiao says her team is certainly thinking about ways to advance Assistant further with help from Bard and generative AI. This could include using personal information, such as the conversations in a user’s Gmail, to make responses to queries more individualized. Another possibility is for Assistant to take on tasks on behalf of a user, like making a restaurant reservation or booking a flight.
Hsiao stresses, however, that work on such features has yet to begin. She says it will take a while for a virtual assistant to be ready to perform complex tasks on a user’s behalf and wield their credit card. “Maybe in a certain number of years, this technology has become so advanced and so trustworthy that yes, people will be willing to do that, but we would have to test and learn our way forward,” she says.