15 January 2025

Amazon is preparing to relaunch its voice-powered Alexa digital assistant as an AI “agent” that can complete practical tasks, as the technology group races to solve the challenges that have stymied the system's AI overhaul.

The $2.4 trillion company has sought over the past two years to redesign Alexa, its chat system built into 500 million consumer devices worldwide, so that the software's “brain” is implanted with generative artificial intelligence.

Rohit Prasad, who leads the Artificial General Intelligence (AGI) team at Amazon, told the Financial Times that the voice assistant still needs to overcome many technical hurdles before it is launched.

These include solving the problem of “hallucinations,” or fabricated answers, as well as improving its response speed, or “latency,” and its reliability. “The hallucinations must be close to zero,” Prasad said. “It is still an open issue in the industry, but we are working hard on it.”

Amazon leaders' vision is to turn Alexa, which is currently still used for a narrow set of simple tasks like playing music and setting alarms, into an “agentic” product that acts as a personal concierge. This can include anything from suggesting restaurants to adjusting the lights in the bedroom based on a person's sleep cycles.

Alexa's redesign has been in the works since Microsoft-backed OpenAI launched ChatGPT in late 2022. While Microsoft, Google, Meta and others have quickly integrated generative AI into their computing platforms and bolstered their software services, critics have questioned whether Amazon will be able to resolve its technical and organizational struggles in time to compete with its rivals.

According to several employees who have worked on Amazon's voice assistant teams in recent years, its efforts have been plagued by complications and come after years of AI research and development.

Several former workers said the long wait for launch was largely due to the unforeseen difficulties involved in switching between, and combining, the simpler, pre-defined algorithms on which Alexa was built and more powerful but unpredictable large language models.

In response, Amazon said it is “working hard to enable more proactive and capable assistance” for its voice assistant. It added that a technical implementation at this scale, in a live service and across a range of devices used by customers around the world, was unprecedented, and was not as simple as overlaying an LLM on the Alexa service.

Prasad, the former chief engineer for Alexa, said the launch last month of the company's in-house Amazon Nova models, led by its AGI team, was driven in part by specific needs for optimal speed, cost and reliability, in order to help AI applications like Alexa “get into the last mile, which is really hard.”

To function as an agent, Alexa's “brain” must be able to connect to hundreds of third-party software programs and services, Prasad said.

“Sometimes we underestimate the number of services built into Alexa, which is a huge number. These apps receive billions of requests weekly, so when you're trying to perform reliable actions quickly… you have to be able to do it in a cost-effective way,” he added.

The complexity comes from Alexa users expecting quick responses as well as extremely high levels of accuracy. These qualities run counter to the inherently probabilistic nature of today's generative AI, which is statistical software that predicts words based on speech and language patterns.

Some former employees also point to the difficulty of maintaining the assistant's original features, including its consistency and functionality, while imbuing it with new generative qualities such as creativity and free dialogue.

Given the more personal and chatty nature of LLMs, the company also plans to hire experts to shape the AI's personality, voice and diction so that it remains familiar to Alexa users, according to a person familiar with the matter.

A former senior member of the Alexa team said that while LLMs are very sophisticated, they come with risks, such as producing answers that are “sometimes completely invented.”

“At the scale Amazon operates, this could happen large numbers of times a day,” they said, damaging its brand and reputation.

In June, Mihail Eric, a former machine learning scientist at Alexa and a founding member of its Conversation Modeling Team, said publicly that Amazon had “dropped the ball” on becoming the “unequivocal market leader in conversational AI” with Alexa.

Although the company had strong scientific talent and “vast” financial resources, it was “riddled with technical and bureaucratic problems,” Eric said, suggesting that “data was poorly annotated” and “documentation was either nonexistent or outdated.”

According to two former employees who worked on Alexa-related AI, the legacy technology underlying the voice assistant was inflexible and difficult to change quickly, burdened by an outdated and disorganized code base and an engineering team that was “overly spread out.”

The original Alexa, built on technology acquired from British startup Evi in 2012, was a question-answering machine that worked by searching within a defined universe of facts to find the right response, such as the day's weather or a specific song in your music library.

The new Alexa uses a range of different AI models to recognize voice queries, translate them, and generate responses, as well as identify policy violations, such as picking up inappropriate responses and hallucinations. Creating software to translate between legacy systems and new AI models was a major hurdle in the Alexa-LLM integration.

The models include Amazon's in-house software, including the latest Nova models, as well as Claude, an AI model from startup Anthropic, in which Amazon has invested more than $8 billion over the past 18 months.

“The most challenging thing about AI agents is making sure they are safe, reliable and predictable,” Anthropic CEO Dario Amodei told the Financial Times last year.

The agent-like AI program needs to get to the point “where . . . people can actually trust the system,” he added. “Once we get to that point, we will launch these systems.”

One current employee said more steps are still needed, such as installing child safety filters and testing custom Alexa integrations like smart lights and the Ring doorbell.

“Reliability is the issue: getting it to work close to 100 percent of the time,” the employee added. “That's why you see us… or Apple or Google shipping slowly and gradually.”

Several third parties developing “skills” or features for Alexa said they were unsure when the new AI-enabled device would be released and how to create new functionality for it.

“We are awaiting details and understanding,” said Thomas Lindgren, co-founder of Swedish content development company Wanderword. “When we started working with them, they were more open…and then over time they changed.”

Another partner said that after an initial period of “pressure” from Amazon on developers to start preparing for the next generation of Alexa, things calmed down.

The perennial challenge facing Amazon's Alexa team, which was hit by major layoffs in 2023, is how to make money. Figuring out how to make assistants “cheap enough to run at scale” will be a big task, said Jared Roesch, co-founder of generative AI group OctoAI.

A former Alexa employee said options being discussed include creating a new Alexa subscription service, or taking a share of sales of goods and services.

Prasad said Amazon's goal is to create a variety of AI models that can serve as “building blocks” for a variety of applications beyond Alexa.

“What we're always focused on is customers and practical AI, we're not doing science for science's sake,” Prasad said. “We do this . . . to deliver value and impact to customers, which in this age of generative AI is more important than ever because customers want to see a return on investment.”
