
Large Language Models (LLMs) have come a long way, and they are getting better and more powerful.
Since OpenAI introduced ChatGPT and redefined how AI can be commercialized to help people, the company has set off an arms race in which others compete in this emerging, lucrative business.
But the market has since become saturated, and competitors offer pretty much the same thing. Choosing one over another matters little, because the rivals are all capable alternatives.
To keep thriving and remain relevant, OpenAI is enhancing its AI to be a better "assistant" to users.
In this case, the company introduces 'Operator,' which it describes as "an agent that can use its own browser to perform tasks for you."
In a post on its website, OpenAI said that:
"Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes. The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses."
A research preview of Operator, an agent that can use its own browser to perform tasks for you. pic.twitter.com/wkBBDIlVqj
— OpenAI (@OpenAI) January 23, 2025
Operator is based on a new model we’re calling “computer-using agent” (CUA).
CUA combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. It’s trained to control a computer in the same way a human would—it looks at the screen, and uses a…— OpenAI (@OpenAI) January 23, 2025
Operator is designed to handle a wide range of repetitive browser tasks, from filling out forms and ordering groceries to generating memes.
By leveraging the same interfaces and tools that humans use daily, this feature enhances LLM-powered AI's practicality, saving time on routine activities while creating new opportunities for businesses to engage with their audiences.
To ensure a thoughtful and gradual launch, OpenAI said that it is starting on a smaller scale.
The company said that Operator is initially available as a research preview in the U.S., and only to Pro users.
"This research preview allows us to learn from our users and the broader ecosystem, refining and improving as we go. Our plan is to expand to Plus, Team, and Enterprise users and integrate these capabilities into ChatGPT in the future," said OpenAI.
As for what it's capable of, Operator is powered by an AI model called the Computer-Using Agent (CUA).
This model combines the visual understanding of GPT-4o with advanced reasoning skills developed through reinforcement learning, enabling it to interact with graphical user interfaces (GUIs) through the buttons, menus, and text fields commonly seen on screens.
CUA allows Operator to "see" by analyzing screenshots and "interact" by performing actions with the equivalent of a mouse and keyboard. This enables Operator to perform tasks on the web independently, without the need for custom API integrations.
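The see-then-act cycle described above can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not OpenAI's implementation: `FakeBrowser`, `Action`, `choose_action`, and `run_agent` are hypothetical names invented for illustration, the "screenshot" is a plain dictionary rather than pixels, and a hand-written rule stands in for the trained model.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "type", "click", or "done"
    target: str = ""   # name of the on-screen element
    text: str = ""     # text to type, if any


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a real browser: one text field and a submit button."""
    fields: dict = field(default_factory=lambda: {"name": ""})
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; this toy returns the visible state.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Equivalent of keyboard and mouse input.
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "submit":
            self.submitted = True


def choose_action(screen: dict, goal_name: str) -> Action:
    """Hypothetical policy: a trained model would sit here in a real system."""
    if screen["submitted"]:
        return Action("done")
    if screen["fields"]["name"] != goal_name:
        return Action("type", target="name", text=goal_name)
    return Action("click", target="submit")


def run_agent(browser: FakeBrowser, goal_name: str, max_steps: int = 10) -> list:
    """Repeat: look at the screen, pick an action, apply it, until done."""
    history = []
    for _ in range(max_steps):
        screen = browser.screenshot()
        action = choose_action(screen, goal_name)
        history.append(action)
        if action.kind == "done":
            break
        browser.apply(action)
    return history


browser = FakeBrowser()
history = run_agent(browser, "Ada")
print([a.kind for a in history])
```

The point of the sketch is that the agent only ever touches the page through what it can see and the inputs a person would use, which is why no site-specific API is needed.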
Because the web is not always straightforward, Operator can run into challenges and make mistakes.
According to OpenAI, Operator can address this by leveraging its reasoning capabilities to self-correct.
And if it encounters a situation where it cannot proceed, it simply hands control back to the user, ensuring a smooth and collaborative experience.
Although CUA is still in its early stages and has certain limitations, it has already set new state-of-the-art results on WebArena and WebVoyager, two key benchmarks for browser-based tasks.
To use Operator, users can just describe the task they want to accomplish, and let it handle the rest.
Since Operator uses its own browser, users can watch its progress and take over control at any time. Operator is also designed to ask for user intervention when tasks involve sensitive actions like logging in, entering payment details, or solving CAPTCHAs.
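The hand-off behavior can be pictured as a simple gate in front of each step. The following is a rough sketch, assuming steps arrive already labeled by kind; the labels in `SENSITIVE_KINDS` and the functions `needs_user` and `run_steps` are hypothetical and do not reflect how Operator actually detects sensitive actions.

```python
# Hypothetical labels for steps that should not run without the user.
SENSITIVE_KINDS = {"login", "payment", "captcha"}


def needs_user(step_kind: str) -> bool:
    """Return True when a step should be handed to the user."""
    return step_kind in SENSITIVE_KINDS


def run_steps(steps, ask_user):
    """Run steps in order, pausing for the user on sensitive ones.

    `steps` is a list of (kind, description) pairs.
    `ask_user` is a callback returning True if the user completed the step.
    """
    log = []
    for kind, description in steps:
        if needs_user(kind):
            if not ask_user(description):
                log.append(("stopped", description))
                break  # the user declined, so the task ends here
            log.append(("user", description))
        else:
            log.append(("agent", description))
    return log


steps = [
    ("navigate", "open the grocery site"),
    ("login", "sign in to the account"),
    ("click", "add milk to the cart"),
    ("payment", "enter card details"),
]
print(run_steps(steps, ask_user=lambda description: True))
```

If the callback returns False at any sensitive step, the run stops there instead of continuing on the user's behalf.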
Operator also offers the flexibility to personalize workflows by adding custom instructions.
These can apply universally across all websites or be tailored for specific platforms, such as selecting preferred airlines. For convenience, users can save frequently used prompts on the homepage, making it easy to automate recurring tasks like restocking groceries.
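One way to think about universal versus site-specific instructions is as two layers that are combined when a task visits a page. This is only an illustrative sketch: the function `instructions_for`, the rule text, and the `example` domains are all made up, and the way Operator stores or applies instructions is not described in the announcement.

```python
from urllib.parse import urlparse


def instructions_for(url: str, global_rules: list, site_rules: dict) -> list:
    """Combine rules that apply everywhere with rules for the page's site."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]  # treat www.example and example as the same site
    return list(global_rules) + list(site_rules.get(host, []))


# Hypothetical user preferences.
global_rules = ["Never accept optional marketing emails."]
site_rules = {
    "flights.example": ["Prefer my usual airline when prices are similar."],
    "groceries.example": ["Choose the store-brand option if available."],
}

print(instructions_for("https://www.flights.example/search", global_rules, site_rules))
print(instructions_for("https://news.example/today", global_rules, site_rules))
```

A site with no specific rules simply falls back to the universal ones.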
Much like managing multiple tabs in a web browser, Operator can run several tasks simultaneously in separate conversations.
In one example, OpenAI said that users can have Operator order a personalized enamel mug on Etsy while booking a campsite on Hipcamp, all at the same time.
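Running independent tasks side by side resembles ordinary concurrent programming. As a loose analogy only, the sketch below uses Python's `asyncio` to interleave two pretend tasks; `run_task` and `run_all` are hypothetical names, each step is a placeholder that merely yields control, and nothing here reflects how Operator schedules work internally.

```python
import asyncio


async def run_task(name: str, steps: int) -> str:
    """Pretend task: each step yields so other tasks can make progress."""
    for _ in range(steps):
        await asyncio.sleep(0)  # placeholder for one browser action
    return f"{name}: finished after {steps} steps"


async def run_all(tasks: dict) -> list:
    """Start every task together and collect results in the order given."""
    return await asyncio.gather(
        *(run_task(name, steps) for name, steps in tasks.items())
    )


results = asyncio.run(run_all({"order mug": 3, "book campsite": 5}))
for line in results:
    print(line)
```

Each task keeps its own state, much as each conversation in Operator tracks its own job.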
AI that can act on a user's behalf like this is prone to mistakes that can harm users.
To ensure that Operator is safe, OpenAI said that it employs three layers of safeguards to prevent abuse and ensure users are firmly in control.
First, Operator is designed to keep the person using it in control, asking for input at critical points. Second, OpenAI makes it easy for users to manage data privacy in Operator. Third, OpenAI built defenses against adversarial websites that may try to mislead Operator through hidden prompts, malicious code, or phishing attempts.
Even so, mistakes can happen, especially while Operator is in its early phase.
"While Operator is designed with these safeguards, no system is flawless and this is still a research preview; we are committed to continuous improvement through real-world feedback and rigorous testing," explained OpenAI.
There are also limitations.
"Operator is currently in an early research preview, and while it’s already capable of handling a wide range of tasks, it’s still learning, evolving and may make mistakes. For instance, it currently encounters challenges with complex interfaces like creating slideshows or managing calendars. Early user feedback will play a vital role in enhancing its accuracy, reliability, and safety, helping us make Operator better for everyone," added OpenAI.
For the first time, our models can take actions on the internet, so we did a lot of internal testing and external red teaming to help ensure Operator is safe to use. https://t.co/F8C54j3fem
— OpenAI (@OpenAI) January 23, 2025