I Tried the Agentic Browsers
Today I tried three agentic browsers: Comet, Dia, and Fellou. I gave each of them a real task I wanted to automate: extract data from a webpage and write it to a Google Sheet in another tab. The task requires clicking buttons to reveal data, which I thought would be the most challenging part.
The results were quite disappointing.
- Comet: decent at parsing the page and extracting data (including clicking buttons), but it froze at the very end, maybe because it exceeded the context limit or hit some other issue. As for writing to the Google Sheet, it said it couldn't do it.
- Fellou: page parsing was poor, and it also froze. However, it could at least interact with the Google Sheet, although its CPU usage spiked to 20%.
- Dia: can extract information that's already on the webpage, but it cannot interact with the buttons or write to a Google Sheet.
Overall, I see two major hurdles for agentic/AI browsers:
Problem 1: Webpages are not designed for AI.
Webpages contain a massive amount of redundant information, which consumes a huge amount of context, interferes with the AI's judgment, and slows down execution. Services that convert webpages to Markdown don't handle dynamic content well. I feel this ultimately needs to be solved by websites and content providers, for example by offering paid, AI-friendly Markdown APIs. Trying to handle this entirely on the client side is extremely difficult.
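To make the context problem concrete, here is a minimal sketch of client-side cleanup using only Python's standard library. The list of tags to skip and the sample page are my own assumptions for illustration, not what any of these browsers actually do.

```python
from html.parser import HTMLParser

# Tags whose content is assumed to be noise for an agent (an illustrative choice).
SKIP_TAGS = {"script", "style", "nav", "footer", "svg", "noscript"}


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def compact_text(html: str) -> str:
    """Return the page's text content with the assumed noise removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical page: the interesting data sits behind the button.
SAMPLE_HTML = """
<html><head><style>.a{color:red}</style><script>var x = 1;</script></head>
<body><nav><a href="/">Home</a></nav>
<h1>Prices</h1><p>Widget: $5</p>
<button id="more">Show more</button>
<footer>Copyright</footer></body></html>
"""

if __name__ == "__main__":
    text = compact_text(SAMPLE_HTML)
    print(text)
    print(len(SAMPLE_HTML), "->", len(text), "characters")
```

The output is much smaller than the raw HTML, but it also shows the limit of this approach: whatever "Show more" would reveal is not in the static markup, so no amount of stripping or Markdown conversion on the client recovers it.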
Problem 2: Screen Reading.
On a Mac, for instance, reading screen content relies on the accessibility API. This creates a problem: if the AI wants to "see" the webpage (rather than just its HTML), the browser must be the foreground app; it can't just run in the background. But the whole point of using an AI assistant is to save time so I can do other things, right? If I still have to keep the browser in the foreground, I might as well just do it myself. Allowing the AI to take screenshots through a browser API might solve this to some extent, but it would be less precise than the accessibility API.
If we're just talking about integrating an AI sidebar, Comet already does a great job, and Chrome is about to catch up. But they are still a long way from being true, general-purpose agentic browsers. I think the future directions might be:
- From the client/browser perspective: Focus on optimizing the most common daily operations and polishing the user experience (e.g., writing emails, summarizing news, etc.).
- Combine agents with old-school "record/replay" functionality, allowing users to create their own workflows more easily.
- From the server/website perspective: Explore business models for providing AI-friendly content interfaces. I believe there is a huge demand and a large market for this.
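As a rough illustration of the record/replay idea in the list above, here is a small sketch of how a recorded workflow might be stored and replayed, with the agent only supplying the values that change between runs. Everything here (`Step`, `FakeDriver`, the selectors, the `{price}` placeholder) is hypothetical and not taken from any real browser.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Step:
    action: str      # e.g. "click" or "type"
    selector: str    # CSS selector of the target element
    value: str = ""  # text to type; may contain {placeholders}


class FakeDriver:
    """Stand-in for a real browser driver; it only logs what it is asked to do."""

    def __init__(self):
        self.log = []

    def perform(self, step: Step) -> None:
        self.log.append(f"{step.action} {step.selector} {step.value}".strip())


def save_workflow(steps) -> str:
    """Serialize recorded steps to JSON so the user can keep and share them."""
    return json.dumps([asdict(s) for s in steps])


def load_workflow(raw: str):
    """Rebuild Step objects from the JSON produced by save_workflow."""
    return [Step(**d) for d in json.loads(raw)]


def replay(steps, driver, params) -> None:
    """Replay steps in order, filling placeholders from params."""
    for step in steps:
        filled = Step(step.action, step.selector, step.value.format(**params))
        driver.perform(filled)


if __name__ == "__main__":
    # Recorded once by the user: reveal the data, then type it into a cell.
    recorded = [Step("click", "#more"), Step("type", "#cell-A1", "{price}")]
    raw = save_workflow(recorded)

    # On later runs, an agent would supply the extracted value (assumed here).
    driver = FakeDriver()
    replay(load_workflow(raw), driver, {"price": "$5"})
    print(driver.log)
```

The point of the split is that the clicks stay deterministic and cheap, while the AI is only asked to do the part that actually needs judgment, such as extracting the value to fill in.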