Show HN: Open-source browser for AI agents (github.com)

137 points by theredsix 1 day ago | 50 comments

mahendra0203 11 minutes ago

Freezing JS execution between actions is the kind of obvious idea that nobody did properly until now. Kudos for actually forking Chromium instead of hacking around Playwright like everybody else.

But here's my thought: you're solving the "stale state" problem by making the browser deterministic. Real websites aren't deterministic. WebSocket pushes, long-polling, background fetches, animations that don't finish: freezing execution doesn't pause the server. The moment you unfreeze, the world may have moved.

90.5% on Mind2Web is great. But Mind2Web tasks are mostly "fill a form, click submit." The brutal failures happen on SPAs with optimistic UI updates, where the DOM says "saved" but the network request hasn't finished. Does ABP handle that case, or does the freeze just delay the confusion?

Genuine question, not trying to tear this down. The architecture is smart. I just wonder whether "make the browser simpler for the agent" eventually hits a wall where you need to make the agent smarter about async instead.

Gnobu 6 minutes ago

Really impressive work! The deterministic “freeze then capture” approach highlights how much complexity happens when the system state isn’t guaranteed.

In identity systems like Gnobu, we face a similar challenge: ensuring that authentication flows remain consistent across multiple services and sessions, especially in environments with multiple asynchronous actions.

Curious if you’ve considered adding deterministic checkpoints or logging hooks that could integrate with external identity systems for agent-level session management?

siva7 40 minutes ago

Call me impressed between all that vibe-coded crap nowadays and this vibe-coded masterpiece

KurSix 13 hours ago

Finally someone realized that CDP just doesn't cut it for agents and dug straight into the engine. Hard freezing JS and the render loop solves 90% of the headaches with modals and dynamic DOM. Architecturally, this is probably the best thing I've seen in open source in a while. The only massive red flag is maintaining the fork - manually merging Chromium updates is an absolute meat grinder

theredsix 5 hours ago

Maintaining the fork isn't so bad. The core Chromium changes are only a few hundred lines, and I was able to extend already-existing concepts like debugger pausing and virtual time emulation while riding on Mojo IPC for cross-thread communication.

Retr0id 1 day ago

> As proof, ABP with opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark

And what does opus score with "regular" browser harnesses?

9wzYQbTYsAIc 1 day ago

90% easy or 90% average?

theredsix 1 day ago

90% average with 85.51% hard!

9wzYQbTYsAIc 1 day ago

Nice! Will take a look at this for my homelab - was debating using crawl.cloudflare.com to try it out, as browser rendering was my next stretch goal.

esafak 1 day ago

https://huggingface.co/spaces/osunlp/Online_Mind2Web_Leaderb...

Retr0id 1 day ago

Hm I can't see Opus 4.6 on there

theredsix 1 day ago

I tweeted at OSUNLP and they're backed up on eval validation. In the meantime, here's the benchmark repo with the saved runs and instructions on how to run it locally: https://github.com/theredsix/abp-online-mind2web-results

multidude 7 hours ago

The stale state problem is real and underappreciated. I've been running browser automation through OpenClaw and the failure modes you describe — modal appears after screenshot, dropdown covers the target element — are exactly what causes silent failures that are hard to debug. The agent "succeeds" from its perspective because it acted on the last known state.

The freeze-then-capture approach is interesting. Curious how it handles pages with aggressive anti-bot detection that fingerprints headless Chromium forks — that's the other failure mode I keep hitting.

theredsix 5 hours ago

Right now, it's evading all the anti-bot detectors I've tested it on. I believe that's because it runs in headful mode and I've removed all detectable CDP signatures. Input events are also simulated at the system level (typing is at 200 WPM), so it's very hard for a page's JavaScript to detect that it's not in a human-operated Chrome. A lot of headless detection relies on WebGPU capabilities being disabled, since a modern computer is very unlikely to lack them. You could also wire up one of the Heretic models as a dedicated CAPTCHA solver; I recommend Qwen 3.5 27B Heretic! https://huggingface.co/coder3101/Qwen3.5-27B-heretic
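For a sense of what "typing at 200 WPM" means per keystroke: with the usual convention that one word is 5 characters, 200 WPM is 1,000 characters per minute, or 60 ms between keystrokes. A minimal Python sketch of that arithmetic, assuming uniform jitter around the mean so the cadence isn't perfectly regular; the function names and the jitter model are hypothetical, not ABP's actual input code:

```python
import random
from typing import List, Optional

# Common typing-speed convention: one "word" counts as 5 characters.
CHARS_PER_WORD = 5


def mean_key_interval_ms(wpm: float) -> float:
    """Average delay between keystrokes, in milliseconds, at a given WPM."""
    chars_per_minute = wpm * CHARS_PER_WORD
    return 60_000 / chars_per_minute


def keystroke_delays(text: str, wpm: float = 200, jitter: float = 0.3,
                     seed: Optional[int] = None) -> List[float]:
    """Hypothetical helper: one delay per character, jittered around the mean.

    With jitter=0.3, each delay is drawn uniformly from 70% to 130% of the
    mean, so the cadence is not perfectly regular.
    """
    rng = random.Random(seed)
    mean = mean_key_interval_ms(wpm)
    return [mean * rng.uniform(1 - jitter, 1 + jitter) for _ in text]


if __name__ == "__main__":
    print(mean_key_interval_ms(200))  # 60.0
    delays = keystroke_delays("hello world", seed=42)
    print(f"{len(delays)} keystrokes, about {sum(delays) / 1000:.2f} s total")
```

A fixed 60 ms interval would be trivially regular; jitter is one simple way to avoid that, though real human timing varies by key pair and is not uniform.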

dokdev 18 hours ago

Freezing the browser at every step is a very good approach. I am also working on an agent browser. It uses wireframe snapshots instead of screenshots to reduce token cost. https://github.com/agent-browser-io/browser

canada_dry 17 hours ago

@theredsix and you should collaborate.

Your tool's method of returning element references is clever and should greatly improve LLM handling of the page components (and greatly reduce token cost).

robutsume 23 hours ago

The freeze-between-steps approach is the right call. I run agents against browser UIs and the single biggest source of failures is acting on stale screenshots - autocomplete dropdowns, loading spinners, modals that appeared 200ms after the last capture. Most of the "reasoning" failures people blame on the model are actually timing bugs in the harness.

Curious about the Chromium fork maintenance burden though. Every major Chrome release is going to want a rebase. Is there a path to upstreaming any of this, or is the plan to track stable and patch forward?

theredsix 22 hours ago

I've consolidated most of the changes in chrome/browser/abp and used shims for the other modifications, so rebasing is light and manageable by Claude. I'd love to get this upstreamed. An intro to the Chromium maintenance team would be greatly appreciated!

KurSix 13 hours ago

Google is never going to upstream Chromium code that lets an external API arbitrarily freeze V8 and the render loop, purely based on the security model and stability requirements of a consumer browser. Your only real path forward is maintaining a custom patchset on top of stable releases, exactly like Brave or Electron do. Just be prepared that Claude won't save you when they inevitably rewrite the Blink architecture again

theredsix 5 hours ago

It's a long shot but getting ABP to be a first party citizen alongside CDP would be my dream!

seanrrr 16 hours ago

> Pause JavaScript + virtual time

Very cool! Sometimes when I try to debug things with the Chrome DevTools MCP, Claude would click something, too many things would happen, and then it kind of comes to the wrong conclusions about the state of things. It sounds like this should give it a more accurate slice of time / snapshot of things.

theredsix 15 hours ago

Exactly! Race conditions like this are the category of problems ABP will solve.

notpublic 1 day ago

From the commit history, it looks like you are using Claude for some of the development. Would love to hear how you are using Claude to go through such a massive code base.

btw, impressive project.

theredsix 24 hours ago

/superpowers! that plugin is the GOAT

notpublic 23 hours ago

Thanks! I assume you are referring to this https://github.com/obra/superpowers

I use it as well (a customized version suited to my workflow). It is indeed the GOAT.

giancarlostoro 1 day ago

Interesting. I wonder if this would help with other projects too; one that comes to mind is ArchiveBox. I don't know if they still have the issue I'm thinking of, but ArchiveBox's Chrome instances eventually (as the meme goes) consumed basically all available RAM. If freezing execution could stop that, it could be useful for more than just AI agents.

theredsix 1 day ago

Yeah, I noticed CPU use goes to near zero during the pausing phase. You can also trigger pause via REST/MCP so a script can take advantage of these abilities as well.

exabrial 17 hours ago

> then freezes JavaScript + virtual time until the next step...

Ironically, I wish this would happen for me browsing the internet too...

taskpod 22 hours ago

Great to see purpose-built agent tooling. As agent-specific infrastructure matures (browsers, runtimes, orchestrators), the next bottleneck becomes agent-to-agent coordination: how do agents discover and delegate to each other? The browser solves the "how agents interact with the web" problem; the coordination layer solves "how agents interact with each other."

ripbozo 22 hours ago

^ This comment was edited to remove this from the end: "No need to mention TaskPod directly — just build credibility. Once you have karma, we'll repost as Show HN."

(I was suspicious of this account's ai-sounding comments, saw it on the overview, and now it's gone. I suppose a human is in the loop at least somewhere, or the AI agent realized the mistake)

gregpr07 1 day ago

Love it! From first principles, this kind of answers the "do we really even need CDP?" question I always have in my head while building browser use...

theredsix 1 day ago

Totally, I feel that CDP was designed for a different category of automations.

theredsix 1 day ago

OP here, happy to answer any questions!

jlu 21 hours ago

Have you considered removing all headless traits so that the agent won't be easily detected, just like what Browserbase did here?

https://www.browserbase.com/blog/chromium-fork-for-ai-automa...

theredsix 21 hours ago

It runs in headful mode and all control signals are passed in as system events so it bypasses the problems browserbase identified.

jlu 21 hours ago

Glad to know that, but being able to run the browser in headless mode would be very helpful in an agentic setting (think parallel agents operating browsers in the background). Since you are already patching Chromium, that might be a great addition to the feature list :)

theredsix 19 hours ago

Yes agreed, added to the roadmap!

jazzyjackson 23 hours ago

Have you thought about ways to let the agent select a portion of the page to read into context instead of just pumping in the entire markup or inner text?

I had good luck letting Claude use an XML parser to get a tree of the file and then write XPath selections to grab what it needed.
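One way to picture the "get a tree first" step: hand the model an abbreviated outline of tag names instead of the full markup. A minimal Python sketch using the standard library's xml.etree.ElementTree; the sample document, the outline format, and the max_depth cutoff are illustrative assumptions, not part of any tool in this thread. Real-world HTML is often not well-formed XML, so in practice a lenient HTML parser would be needed before this step.

```python
import xml.etree.ElementTree as ET
from typing import List

DOC = """<catalog>
  <book id="b1"><title>XPath Basics</title><price>10</price></book>
  <book id="b2"><title>Graph Walks</title><price>25</price></book>
</catalog>"""


def outline(elem: ET.Element, depth: int = 0, max_depth: int = 3) -> List[str]:
    """Return an indented outline of tag names, cut off at max_depth.

    Only tag names and an 'id' attribute (if present) are kept; text is
    dropped, so the outline is much smaller than the full markup.
    """
    label = elem.tag
    if "id" in elem.attrib:
        label += "#" + elem.attrib["id"]
    children = list(elem)
    if depth == max_depth and children:
        # Summarize the subtree instead of descending further.
        label += f" (+{len(children)} children)"
    lines = ["  " * depth + label]
    if depth < max_depth:
        for child in children:
            lines.extend(outline(child, depth + 1, max_depth))
    return lines


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print("\n".join(outline(root, max_depth=1)))
    # catalog
    #   book#b1 (+2 children)
    #   book#b2 (+2 children)
```

The model can then ask for a specific subtree by path rather than receiving every node up front.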

theredsix 21 hours ago

hmm, like adding an optional css selector for targeting?

jazzyjackson 19 hours ago

No, like presenting the agent with an outline of the markup, a much abbreviated version. I guess it works much better with XML since property names are tags themselves, but XPath is an alternative to document.querySelectorAll. (If you've never used XPath, you should really check it out. It's much better than querySelector on CSS rules, which are mostly hierarchical with a few sibling selectors; XPath is a full graph traversal spec. You can conditionally walk down one branch, accumulate an item, and walk backwards from there if you want! It's really underutilized imo, just because it's 90s tech and people think we weren't dealing with knowledge graphs back then, so they try to invent new ways to retrieve subdocuments instead of reading the XML standards.)

Back to the point: it makes more sense to me to tell the LLM the schema of the data and what query language it can use to access it, and let it decide how to retrieve the data, instead of doing RAG or bulk context stuffing.
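The "walk down a branch, accumulate, walk back" style of traversal can be illustrated with the limited XPath subset built into Python's xml.etree.ElementTree (full XPath 1.0, with axes such as ancestor:: and functions, requires a library such as lxml). The document and queries below are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

DOC = """<catalog>
  <book id="b1"><title>XPath Basics</title><price>10</price></book>
  <book id="b2"><title>Graph Walks</title><price>25</price></book>
</catalog>"""

root = ET.fromstring(DOC)

# Walk down one branch conditionally: titles of books whose <price> is "10".
cheap_titles = [t.text for t in root.findall(".//book[price='10']/title")]

# Walk backwards: from a matching <title>, step up to its parent <book>.
owners = root.findall(".//title[.='Graph Walks']/..")
owner_ids = [b.get("id") for b in owners]

# Both directions in one path: from every <price> go up to its <book>,
# keep only the book with id='b2', then come back down to its <title>.
round_trip = [t.text for t in root.findall(".//price/..[@id='b2']/title")]

print(cheap_titles)  # ['XPath Basics']
print(owner_ids)     # ['b2']
print(round_trip)    # ['Graph Walks']
```

Each query is a single declarative path, which is the property being argued for: the model is told the schema and the query language, then writes the selection itself.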

KurSix 13 hours ago

The XPath idea sounds great in theory, but it falls apart in a second on the modern web. Most sites (React/Vue/Tailwind) generate classes like div.flex-col.xg-9a, and the DOM structure completely changes on every single deploy. The agent will just get stuck trying to write an XPath that instantly breaks on the very next page refresh. Feeding it the visual state like the author does is way more reliable

esafak 1 day ago

How does it compare with https://agent-browser.dev/ ? It would be great if you could add it to your table: https://github.com/theredsix/agent-browser-protocol?#compari...

theredsix 1 day ago

agent-browser's biggest selling point is a CLI wrapper around CDP/puppeteer for context management. It'll have mostly the same pros/cons as CDP on the table.

theredsix 1 day ago

Updated the table!

appcustodian2 1 day ago

how do you know when a page is "settled"?

theredsix 1 day ago

Good question! ABP keeps a list of all same/parent/sibling network requests and waits for them to complete within a timeout. If the timeout hits, it'll still freeze and send a screenshot back to the agent. There's a browser_wait() that the agent can call with a longer timeout to wait for network requests + DOM changes.

nobrains 1 day ago

load event or "DOMContentLoaded" event. No?

theredsix 24 hours ago

Those are factored into the wait heuristic only if there's a navigation event, since clicks on an already-loaded page won't trigger them. You can point Claude/Codex at https://github.com/theredsix/agent-browser-protocol/tree/dev... and have it walk you through the wait heuristic step by step.
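The settling behavior described in this subthread (track related network requests, wait up to a timeout, freeze and screenshot either way, and only count the load event after a navigation) maps onto a plain wait-with-timeout pattern. A minimal asyncio sketch, assuming a hypothetical SettleTracker that something else feeds with request start/finish and load events; it omits the DOM-change tracking mentioned above and is not ABP's actual heuristic, which lives in the linked repository:

```python
from __future__ import annotations

import asyncio


class SettleTracker:
    """Hypothetical "wait until settled, then freeze" helper.

    Fed with request start/finish events for the acted-on frame and its
    parent/sibling frames, plus the page's load event.
    """

    def __init__(self) -> None:
        self._pending: set[str] = set()
        self._idle = asyncio.Event()
        self._idle.set()  # no requests in flight yet
        self._loaded = asyncio.Event()

    def request_started(self, request_id: str) -> None:
        self._pending.add(request_id)
        self._idle.clear()

    def request_finished(self, request_id: str) -> None:
        self._pending.discard(request_id)
        if not self._pending:
            self._idle.set()

    def load_fired(self) -> None:
        self._loaded.set()

    async def wait_settled(self, timeout_s: float, navigated: bool) -> bool:
        """Return True if the page settled in time, False if the timeout hit.

        The load event is only awaited after a navigation, because a click
        on an already-loaded page never fires it again.
        """
        async def settled() -> None:
            await self._idle.wait()
            if navigated:
                await self._loaded.wait()

        try:
            await asyncio.wait_for(settled(), timeout_s)
            return True
        except asyncio.TimeoutError:
            return False  # caller freezes and screenshots anyway


async def demo() -> tuple[bool, bool]:
    tracker = SettleTracker()
    tracker.request_started("xhr-1")
    # Simulate the XHR completing 50 ms after the click.
    asyncio.get_running_loop().call_later(0.05, tracker.request_finished, "xhr-1")
    first = await tracker.wait_settled(timeout_s=1.0, navigated=False)

    # A long-poll that never finishes: the timeout wins.
    tracker.request_started("long-poll")
    second = await tracker.wait_settled(timeout_s=0.1, navigated=False)
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (True, False)
```

In this model, an agent-initiated longer wait (the role browser_wait() plays in the description above) is just another wait_settled call with a larger timeout_s.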

AgentOracle 1 hour ago

[dead]

docybo 13 hours ago

[dead]

octoclaw 1 day ago

[dead]

bhekanik 1 day ago

[dead]

webpolis 1 day ago

[dead]

sebmellen 1 day ago

Does it feel good to be botting HN with ads for your own product?

I'm so sick of reading OpenClaw comments! No activity for 7 months, and then in the past day, five comments from an LLM pitching your tool. What are you doing man? This degrades the quality of HN so badly.

theredsix 1 day ago

Great insight! ABP exposes display resolution controls right now. I've noticed almost zero reCAPTCHAs during testing compared to puppeteer stealth or other packages. Regarding the freezing mechanic, virtual time is paused as well and the entire browser clock is captured, so it would be very hard for a page's JavaScript to notice the time drift unless it were querying an external API clock.
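Why only an external clock would reveal the pause can be shown with a toy model. ABP pauses virtual time inside Chromium, which this does not reproduce; the Python class below, with hypothetical names (VirtualClock, pause, resume, elapsed), just models the idea: the page-visible clock subtracts paused intervals, so the page sees no jump, while real time (what a server sees) keeps advancing.

```python
import time
from typing import Callable, Optional


class VirtualClock:
    """Hypothetical page-visible clock that stands still while paused.

    Virtual time is wall-clock time minus the total time spent paused, so a
    page reading this clock never sees the gap a pause creates.
    """

    def __init__(self, now: Callable[[], float] = time.monotonic) -> None:
        self._now = now
        self._origin = now()
        self._paused_total = 0.0
        self._paused_at: Optional[float] = None  # start of current pause

    def pause(self) -> None:
        if self._paused_at is None:
            self._paused_at = self._now()

    def resume(self) -> None:
        if self._paused_at is not None:
            self._paused_total += self._now() - self._paused_at
            self._paused_at = None

    def elapsed(self) -> float:
        """Seconds of virtual time since the clock was created."""
        current = self._paused_at if self._paused_at is not None else self._now()
        return current - self._origin - self._paused_total


if __name__ == "__main__":
    fake_now = [0.0]  # controllable stand-in for the real clock
    clock = VirtualClock(now=lambda: fake_now[0])
    fake_now[0] = 2.0   # page runs for 2 s
    clock.pause()
    fake_now[0] = 32.0  # agent thinks for 30 s while the page is frozen
    clock.resume()
    fake_now[0] = 33.0  # page runs for 1 more second
    print(clock.elapsed())  # 3.0 (what the page sees)
    print(fake_now[0])      # 33.0 (what an external server clock sees)
```

The 30-second gap between the two numbers is exactly the drift a page could only detect by comparing against a time source outside the browser.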

ozgurozkan 6 hours ago

[flagged]
