Artificial intelligence is taking a dramatic next step: OpenAI has announced a new AI agent, ChatGPT Agent, that operates within the popular ChatGPT platform. The agent, which has already started rolling out to Pro users worldwide, can carry out complete digital tasks on the user's behalf using a virtual browser and advanced file-generation capabilities.

This is one of the company's most significant announcements of the past year. The agent functions like an automated personal assistant that understands the user's instructions and performs complex tasks on the internet and in digital work environments. Examples demonstrated so far include generating complete PowerPoint presentations from financial data, building Excel spreadsheets with analyses, planning a couple's evening according to personal preferences, filling out online forms, using a programming terminal, and even ordering cakes online.

According to Isa Fulford, the lead researcher on the agent, one of the first tests of the service was her request that it order cakes according to precise requirements. "It took almost an hour, but it was better than doing it myself," she said.

The new agent combines two services that OpenAI launched earlier this year: Operator, which allowed ChatGPT to navigate websites through a visual browser, and Deep Research, which enabled in-depth processing of multiple information sources. Combining the two lets the new agent switch between visual navigation and text reading, adapting to each task as needed.

Despite its similarities to services like Microsoft's Copilot, the new agent does not directly replace Microsoft's Office software, though it may reduce the need for it. This is especially interesting given that Microsoft is one of OpenAI's largest investors and that the two companies are currently negotiating the continuation of Microsoft's access to OpenAI's models.

The agent is initially available to ChatGPT Pro, Plus, and Team users. Pro users receive up to 400 agent actions per month, while Plus and Team users receive only 40. It is not yet known when the service will reach free users, but that is unlikely to happen soon.

In demonstrations for the press, the agent successfully performed a variety of tasks, ranging from generating a presentation on Nvidia's quarterly results to booking a restaurant table while cross-checking the user's calendar. Simple tasks like scheduling a meeting take about five minutes, but research tasks may take 20 minutes or more. Tasks can also run in parallel: you can ask the agent to handle several at once, a significant advantage for business users.

One of the most intriguing features at launch is the ability to watch a replay of the agent's actions: the user can see exactly where the agent browsed, which websites it opened, and which steps it took, as if watching a screen recording. This way, the user retains control over what is done on their behalf and learns how the agent operates.

However, the company notes that certain actions, such as logging into social networks or financial websites, require active user approval. In these cases a "watch mode" is activated, requiring the user to stay on the page where the action is taking place rather than switching to another app.

For now, the new agent does not support the user's personal memory, meaning it cannot draw on past interactions or previously stated preferences when performing new tasks. OpenAI emphasizes that this is mainly a safety measure, intended to guard against "prompt injection" attacks that could disrupt the agent or lead to its misuse.

Although the new agent’s capabilities are not perfect and some tasks take a considerable amount of time to complete, it is clear that this is a significant step towards making ChatGPT a key player not only in text-based conversation but also in performing actual digital actions. If the trend of "smart agents" takes hold — and it may just be a matter of time — the traditional use of browsers, forms, and office software could change fundamentally. Perhaps in the near future, we will spend less time typing, searching, and clicking, and more time simply asking the AI to do these things for us.