What are AI agents? | MIT Technology Review

Agents featured prominently in Google’s annual I/O conference in May, when the company unveiled its new AI agent called Astra, which allows users to interact with it using audio and video. OpenAI’s new GPT-4o model has also been called an AI agent.

And it’s not just hype, although there is definitely some of that too. Tech companies are plowing vast sums into creating AI agents, and their research efforts could usher in the kind of useful AI we have been dreaming about for decades. Many experts, including Sam Altman, say they are the next big thing.

But what are they? And how can we use them?

How are they defined?

It is still early days for research into AI agents, and the field has no settled definition for them. Put simply, they are AI models and algorithms that can autonomously make decisions in a dynamic world, says Jim Fan, a senior research scientist at NVIDIA who leads the company’s AI agents initiative.

The grand vision for AI agents is a system that can execute a vast range of tasks, much like a human assistant can. In the future, it could help you book your vacation, but it will also remember if you prefer swanky hotels, so it will only suggest hotels that have four stars or more, then go ahead and book the one you pick from the range of options it offers you. It will then also suggest flights that work best with your calendar, and plan the itinerary for your trip based on your preferences. It could make a list of things to pack based on that plan and the weather forecast. It might even send your itinerary to any friends it knows live in your destination, and invite them along. In the workplace, it could analyze your to-do list and execute tasks from it, such as sending calendar invites, memos or emails.

One vision for agents is that they are multimodal, meaning they can process language, audio and video. For example, in Google’s Astra demo, users could point their smartphone cameras at things and ask the agent questions. The agent could respond to inputs across text, audio and video.

These agents could also make processes smoother for businesses and public organizations, says David Barber, the director of the University College London Centre for Artificial Intelligence. For example, an AI agent might be able to function as a more sophisticated customer service bot. The current generation of language model-based assistants can only generate the next likely word in a sentence. But an AI agent would have the ability to act on natural language commands autonomously, and process customer service tasks without supervision. For example, the agent will be able to analyze customer complaint emails, and then know it needs to check the customer’s reference number, access databases such as customer relationship management and delivery systems to see whether the complaint is legitimate, and process it according to the company’s policies, Barber says.
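The customer service workflow Barber describes can be sketched in a few lines of Python. This is a minimal illustration, not how any real agent is built: the reference-number format, the in-memory "CRM" and "delivery" tables, the refund policy, and every function name are assumptions made up for the example. In a real agent, a language model would decide which system to consult and when; here those steps are hard-coded to show the order of checks.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for a CRM and a delivery system (illustration only).
CRM = {"REF-1001": {"customer": "Ada", "order_id": "ORD-1"},
       "REF-1002": {"customer": "Grace", "order_id": "ORD-2"}}
DELIVERIES = {"ORD-1": {"status": "lost"}, "ORD-2": {"status": "delivered"}}

# Assumed company policy: which delivery statuses justify an automatic refund.
REFUNDABLE_STATUSES = {"lost", "damaged"}


@dataclass
class Decision:
    reference: Optional[str]
    action: str
    reason: str


def extract_reference(email_text: str) -> Optional[str]:
    """Find a reference number of the assumed form REF-<digits>."""
    match = re.search(r"REF-\d+", email_text)
    return match.group(0) if match else None


def handle_complaint(email_text: str) -> Decision:
    """Check the reference, look up both systems, then apply the policy."""
    reference = extract_reference(email_text)
    if reference is None:
        return Decision(None, "ask_for_reference", "no reference number found")

    customer = CRM.get(reference)
    if customer is None:
        return Decision(reference, "escalate_to_human", "reference not in CRM")

    delivery = DELIVERIES.get(customer["order_id"])
    if delivery is not None and delivery["status"] in REFUNDABLE_STATUSES:
        return Decision(reference, "refund", f"delivery status is {delivery['status']}")

    return Decision(reference, "escalate_to_human", "complaint could not be verified")


if __name__ == "__main__":
    print(handle_complaint("My parcel never arrived. Reference: REF-1001"))
```

Anything the sketch cannot verify is handed to a person rather than resolved automatically, which is a design choice of this example and not something the researchers quoted here prescribe.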

Broadly speaking, there are two categories of agents: software agents and embodied agents, says Fan.