With laptop and smartphone makers like Samsung spreading generative AI across every aspect of their devices, OpenAI is trying to do the same with an agent tool announced on January 23rd. The tool, called Operator, runs on the same basic technology as ChatGPT but resides within a proprietary web browser. This lets it autonomously perform actions such as ordering groceries or booking tours.
OpenAI suggested in a blog post that Operator could "ope[n] up new engagement opportunities for companies," but did not elaborate.
What is OpenAI's Operator?
Operator is an application that combines a web browser with the generative AI model GPT-4o. It is the result of an OpenAI project to train GPT-4o's vision capabilities on the graphical user interfaces found on typical web pages. OpenAI boasted that Operator's ability to make multi-step plans and fix errors on its own when necessary sets it apart from other efforts to create agentic AI. Operator's Computer-Using Agent (CUA) model is trained specifically on the buttons, forms, and menus likely to be found on a web page.
Operator is in beta. OpenAI said feedback from early users will be used to improve it.
ChatGPT Pro subscribers can sign up for Operator starting today.
OpenAI plans to make Operator available to Plus, Team and Enterprise users soon. The company also intends to integrate Operator's capabilities into ChatGPT itself. According to the blog post, OpenAI will add CUA to its API "soon."
How does Operator work?
The company says CUA’s reasoning technique, which it calls an “inner monologue,” helps the model understand intermediate steps and adapt to unexpected inputs. Under the hood, CUA takes screenshots of web pages and uses a virtual mouse and keyboard to navigate.
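OpenAI has not published CUA's internals beyond this description, so the following is only a minimal sketch of the general screenshot-then-act loop it describes, not OpenAI's implementation or API. Every name here (`FakeBrowser`, `Action`, `run_agent`, `scripted_policy`) is hypothetical, and the `policy` function is an assumed stand-in for the model that looks at a screenshot and chooses the next mouse or keyboard action. The loop also stops and returns control when the policy requests a handoff, as Operator does for steps like logins and payments.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal

# Hypothetical action vocabulary; not OpenAI's actual action format.
ActionKind = Literal["click", "type", "done", "handoff"]


@dataclass
class Action:
    kind: ActionKind
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser driven by a virtual mouse and keyboard."""
    log: list[str] = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real agent would capture pixels; this returns a placeholder "image".
        return f"screen-{len(self.log)}".encode()

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type_text(self, text: str) -> None:
        self.log.append(f"type({text!r})")


# Assumed model interface: (screenshot, task) -> next action.
Policy = Callable[[bytes, str], Action]


def run_agent(task: str, browser: FakeBrowser, policy: Policy, max_steps: int = 20) -> str:
    """Perceive -> decide -> act loop; returns 'done', 'handoff' or 'step_limit'."""
    for _ in range(max_steps):
        frame = browser.screenshot()   # perceive: look at the current page
        action = policy(frame, task)   # decide: the model picks the next action
        if action.kind == "done":
            return "done"
        if action.kind == "handoff":   # e.g. login, payment or CAPTCHA
            return "handoff"
        if action.kind == "click":     # act via the virtual mouse
            browser.click(action.x, action.y)
        elif action.kind == "type":    # act via the virtual keyboard
            browser.type_text(action.text)
    return "step_limit"


def scripted_policy(steps: list[Action]) -> Policy:
    """Hypothetical fake 'model' that replays fixed actions, then reports done."""
    it = iter(steps)
    return lambda frame, task: next(it, Action("done"))


browser = FakeBrowser()
policy = scripted_policy([Action("click", 120, 40), Action("type", text="pad thai")])
print(run_agent("order dinner", browser, policy))  # done
print(browser.log)  # ['click(120,40)', "type('pad thai')"]
```

The step limit is a design choice in this sketch so that a confused policy cannot loop forever; how Operator bounds its own runs is not described in the source.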
As with ChatGPT, users can add custom instructions that Operator will remember, such as a preferred airline.
Users can prompt Operator in natural language, just as they prompt ChatGPT. Operator is trained to hand control back to the user for steps such as logging into websites, entering payment information or solving CAPTCHAs. It is also programmed to decline certain requests, such as bank transactions, and not to weigh in on high-stakes decisions, such as whether to hire an employee.
If Operator encounters an interface it cannot figure out how to interact with, it returns the task to the user. OpenAI worked directly with the following companies to ensure that Operator can interact with their websites:
- DoorDash.
- Instacart.
- OpenTable.
- Priceline.
- StubHub.
- Thumbtack.
- Uber.
OpenAI notes that this early version of Operator tends to struggle with "complex interfaces," such as those for creating slideshows or adding items to calendars.
Operator is part of a crowded generative AI landscape
Some of Operator's functionality overlaps with competing tools such as Google Gemini and Apple Intelligence.
Operator invites comparison to Microsoft's much-maligned Recall feature, which takes periodic screenshots so users can search their past activity on a PC. Operator also shares some capabilities with Google Lens in Chrome. However, its ability to navigate websites autonomously may set it apart. Agentic AI, in which generative AI models carry out multi-step errands on the user's behalf, is either the hot new thing in tech or a new way to package still-limited products.