Next.js App Router + React Server Components Demo

Zerostack – A Unix-inspired coding agent written in pure Rust (crates.io)

544 points by gidellav 1 day ago | 298 comments

parhamn 1 day ago

I (somewhat jokingly) wrote one recently too... https://github.com/pnegahdar/nano in under 200 lines. REPL, sessions, non-interactive, approvals, etc.

The smarter the models get the less the harnesses matter (outside of devx).
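If you're wondering how a harness fits in 200 lines: its core is a loop. It sends the history to the model, runs whatever tool the model asks for, appends the result, and repeats until the model gives an answer. Below is a minimal Rust sketch of that loop. It assumes a stubbed `fake_model` in place of a real LLM API call and a single hypothetical `echo` tool. None of these names come from nano or zerostack.

```rust
// What the model can ask for on each turn.
#[derive(Debug)]
enum Step {
    ToolCall { name: String, arg: String },
    Final(String),
}

// Hypothetical stand-in for an LLM call: requests `echo` once, then answers.
fn fake_model(history: &[String]) -> Step {
    if history.iter().any(|m| m.starts_with("tool:")) {
        Step::Final(format!("done after {} messages", history.len()))
    } else {
        Step::ToolCall { name: "echo".into(), arg: "hello".into() }
    }
}

// Hypothetical tool dispatcher; unknown tools report an error back to the model.
fn run_tool(name: &str, arg: &str) -> String {
    match name {
        "echo" => arg.to_string(),
        other => format!("error: unknown tool {other}"),
    }
}

// The agent loop: ask, act, append, repeat (bounded by a step budget).
fn agent_loop(prompt: &str, max_steps: usize) -> Option<String> {
    let mut history = vec![format!("user:{prompt}")];
    for _ in 0..max_steps {
        match fake_model(&history) {
            Step::Final(answer) => return Some(answer),
            Step::ToolCall { name, arg } => {
                let out = run_tool(&name, &arg);
                history.push(format!("tool:{name}:{out}"));
            }
        }
    }
    None // step budget exhausted without a final answer
}

fn main() {
    println!("{:?}", agent_loop("say hi", 5));
}
```

REPL input, session persistence and approvals can all be layered onto this loop without changing its shape.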

Maybe one day I'll run it through SWE-bench.

freakynit 1 day ago

So freaking cool... in just 200 (190, actually) lines.

I also wrote one by myself last week (just for fun and learning). It works, including integration with configured mcpServers (like you do in most coding agents). Wrote about the whole step-by-step process and what is needed at what step and why: https://nb1t.sh/building-a-real-agent-step-by-step/

tasuki 13 hours ago

Ok, I know it's a joke. And also, are you daily-driving it?

parhamn 9 hours ago

Not a daily driver, but I have used it as a utility a few times.

For my daily work I like letting different harnesses compete and look over each other's work (while it's subsidized by the subscriptions), so I use OpenADE.

mgfist 1 day ago

I like it

rullopat 18 hours ago

I understand the need for memory footprint in some situations, but what's the point of seeking performance for a software that mostly calls LLMs and waits?

tjoff 17 hours ago

Before I tried coding agents my guess would have been: none.

But seeing how slow Claude Code and Copilot CLI are, and how much RAM they use, I'm flabbergasted. If you have long-running sessions they can both take tens of gigabytes of RAM and feel quite sluggish.

i_am_a_peasant 17 hours ago

huh. my experience with codex hasn’t been so bad. and tbh why would i discourage anyone from coding? hack away, mr hacker. your solution will either sink or swim

krzyk 16 hours ago

codex is in Rust, not in power- and memory-hungry JS/TS.

i_am_a_peasant 15 hours ago

oh sweet I had no idea. funny that i mostly use it to write rust

dorian-graph 9 hours ago

It was previously JS/TS, but they rewrote it in Rust, sometime in the past 12 months.

manmal 10 hours ago

Check out its app-server, IMO it’s a decent foundation to the codex clients.

crabmusket 15 hours ago

I've been playing with running Claude Code inside a Vagrant VM. I can't be certain it was getting OOM killed when I allowed the VM 4GB of RAM, but when I went to 16 it did seem to be more stable...

yjftsjthsd-h 14 hours ago

> I can't be certain it was getting OOM killed when I allowed the VM 4GB of RAM

If it's actually getting OOM-killed (and not backing off by itself), I'm pretty sure that's logged in dmesg. Or by earlyoom or systemd-oomd, if userspace is in play and getting there first.

crabmusket 6 hours ago

Thanks for the tip, I will probably try shrinking it back to 4 to see, as that seems like it should be enough RAM for anybody (:

Mjarvis 16 hours ago

Yes... exactly. It's frustrating and inefficient.

mpalmer 11 hours ago

The appetite for Rust is the appetite for higher guardrails. Automatic memory management in safe Rust makes it less likely your app bloats even as its source balloons.

The people "writing" agents are not themselves experts in how to write performant code. Claude Code is so massive and ugly it can only be realistically maintained by continuing to throw LLMs at it. But that's not a replacement for good software design.

adabsurdo 16 hours ago

[dead]

mapcars 18 hours ago

I see the spread of Rust as an overall good thing, because it raises the benchmark for how software should feel in terms of performance, stability, and memory footprint.

So even if it doesn't create a tangible advantage in a particular use case, it's still good for the whole industry.

GodelNumbering 17 hours ago

I haven't used Rust extensively, but my feeling is that if you change the design (which inevitably happens in many early-stage projects), the refactoring takes more time due to borrow-checker semantics. Although I am far from a representative sample and could well have been using it wrong.

ijustlovemath 16 hours ago

When you write Rust long enough you settle on certain architectures (message passing, event loops) that go well with the borrow checker, and don't end up thinking about it too much. Plus you can always throw an agent at the first set of errors from the refactor and let the compiler guide the annoying parts.
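Below is a minimal sketch of the message-passing style described above, using only `std::sync::mpsc`. The session history is owned by one thread. Other parts of the program send it messages instead of holding references into it, which avoids most borrow-checker friction. The `Msg` enum and `spawn_session` are illustrative names, not taken from any project in this thread.

```rust
use std::sync::mpsc;
use std::thread;

// Messages other tasks (UI, tool runners) can send to the session owner.
enum Msg {
    Append(String),
    Len(mpsc::Sender<usize>), // request/reply: the reply channel travels with the message
    Shutdown,
}

// The history lives on exactly one thread, so no Arc<Mutex<..>> is needed.
fn spawn_session() -> (mpsc::Sender<Msg>, thread::JoinHandle<Vec<String>>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut history: Vec<String> = Vec::new();
        for msg in rx {
            match msg {
                Msg::Append(s) => history.push(s),
                Msg::Len(reply) => {
                    let _ = reply.send(history.len());
                }
                Msg::Shutdown => break,
            }
        }
        history // hand ownership back when the loop ends
    });
    (tx, handle)
}

fn main() {
    let (tx, handle) = spawn_session();
    tx.send(Msg::Append("user: hi".into())).unwrap();
    tx.send(Msg::Append("assistant: hello".into())).unwrap();
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Msg::Len(reply_tx)).unwrap();
    println!("len = {}", reply_rx.recv().unwrap());
    tx.send(Msg::Shutdown).unwrap();
    println!("{:?}", handle.join().unwrap());
}
```

A refactor of the session logic only touches the owning thread and the `Msg` enum. The borrow checker has little to complain about because nothing is shared.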

bheadmaster 15 hours ago

> When you write Rust long enough you settle on certain architectures (message passing, event loops) that go well with the borrow checker

So basically Go?

flossly 15 hours ago

Go only provides one concurrency paradigm. Rust supports many (if not all).

The type system of Go is very weak. I'd say that'd be my main reason to pass on Go, even when the concurrency paradigm fits the project perfectly.

jen20 11 hours ago

The biggest reason to pass on Go right now (if your software can tolerate a runtime) is the lack of algebraic data types when doing interesting domain modeling. It makes such a huge difference it’s worth tolerating the pain points of Rust (or Swift, or F#) just to have them.

ijustlovemath 12 hours ago

Traits, Enums, and Typestate allow much richer paradigms at much lower cost
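Here is a small sketch that makes the ADT and typestate points concrete. The `ToolResult` and `Request` types are hypothetical. The enum gives an exhaustive `match` over tool outcomes. The typestate parameter turns "send before approval" into a compile-time error rather than a runtime check.

```rust
use std::marker::PhantomData;

// Algebraic data type: a tool result is exactly one of these variants,
// and `match` forces every caller to handle all of them.
enum ToolResult {
    Success { stdout: String },
    Failed { code: i32, stderr: String },
    Denied,
}

fn summarize(r: &ToolResult) -> String {
    match r {
        ToolResult::Success { stdout } => format!("ok ({} bytes)", stdout.len()),
        ToolResult::Failed { code, stderr } => format!("exit {code}: {stderr}"),
        ToolResult::Denied => "denied by user".to_string(),
    }
}

// Typestate markers: zero-sized types that exist only at compile time.
struct Pending;
struct Approved;

struct Request<S> {
    cmd: String,
    _state: PhantomData<S>,
}

impl Request<Pending> {
    fn new(cmd: &str) -> Self {
        Request { cmd: cmd.to_string(), _state: PhantomData }
    }
    // Consuming `self` means the pending request can't be reused after approval.
    fn approve(self) -> Request<Approved> {
        Request { cmd: self.cmd, _state: PhantomData }
    }
}

impl Request<Approved> {
    // `send` only exists on approved requests.
    fn send(&self) -> String {
        format!("running {}", self.cmd)
    }
}

fn main() {
    let req = Request::<Pending>::new("cargo test");
    // req.send(); // would not compile: no method `send` on Request<Pending>
    println!("{}", req.approve().send());
    println!("{}", summarize(&ToolResult::Failed { code: 1, stderr: "boom".into() }));
}
```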

eldenring 17 hours ago

It's just not a thing to consider, and it doesn't happen often.

amelius 17 hours ago

No because it means people will use Rust for the wrong reasons.

Systems programming is only a tiny fraction of code out there.

Approaching every problem as a systems programming problem is a massive waste of resources and intellect.

angusturner 17 hours ago

For small to medium projects, an LLM can write functional (if not well crafted) Rust.

Considering how easy this is now, why choose a heavier, slower and less typesafe language?

amelius 14 hours ago

Ok, so write your app in the garbage collected language, and then tell the LLM to translate it to Rust :)

Wowfunhappy 13 hours ago

I find it kind of shocking that Anthropic doesn't see it this way.

pojzon 9 hours ago

Claude Code has a whole game engine built into it. God knows why.

attentive 3 hours ago

Tell us more.

singpolyma3 16 hours ago

Could choose a similar weight, similar speed, equal or more typesafe language though :)

galangalalgol 13 hours ago

Ada? Other than C and C++, everything else benchmarks 2-4 times slower than Rust for compute-bound tasks, even after JIT warmup. I'm up for Ada though, especially with an LLM, where I don't have to type all that verbose syntax.

singpolyma3 13 hours ago

OCaml? Haskell? Idris?

Lots of options with no jit or warmup

galangalalgol 12 hours ago

I'm not against JIT or warmup, just saying it doesn't actually catch up for compute-bound tasks in my experience. Haskell and OCaml would definitely be next on my list, but they do take a sizable hit in performance compared to Ada or Rust. I certainly wouldn't say they were similar in performance. There is a pretty big cliff between the systems languages and everything else, performance-wise. For a lot of things it doesn't matter, I know, but none of those things are domains I've ever worked in. I've never had a project in my professional career where we didn't descope requirements to fit the available compute.

tcfhgj 17 hours ago

it saves a lot of resources: for instance, my devices would probably use less than half of the memory they use now, and I wouldn't hear the fan.

amelius 14 hours ago

You won't hear the fan because you're still building it.

The resources I was talking about are developers × time.

tcfhgj 14 hours ago

I am talking about using software - if software is used by many people, that's the more relevant resource usage.

lobocinza 7 hours ago

It is a common trend for companies to optimize for visible CapEx at the cost of increased but invisible OpEx for consumers.

gf000 17 hours ago

How is it any faster than something written in say, Java?

tcfhgj 17 hours ago

latency and throughput (when with Java the system is crying for more memory while it's chilling in the Rust case)

gf000 16 hours ago

What's the latency difference between a long running process issuing a network call in Java vs rust? This is such a short time that it is completely overshadowed by noise (OS doing something else, what other software is running etc)

As for throughput: you have 1-2 requests going at a time, the next one waiting for the reply. What throughput are we talking about?

That's like speeding to the post office and expecting your letter to get to the recipient faster.

tcfhgj 16 hours ago

you seem to specifically aim at the current example, but mine wasn't

Anyway, consider how higher memory usage can dramatically affect the system's performance once the system needs to start swapping memory to disk significantly.

hnlmorg 15 hours ago

If you cannot write a simple Java agent without consuming so much RAM that your system is swapping then that really says more about the developer than anything.

Java is used in plenty of embedded systems and other memory constrained environments. Yes, it’s not going to perform well compared with Rust, but that doesn’t mean it’s an Electron-equivalent bloated clusterfuck of an ecosystem that’s going to eat all your system resources.

tcfhgj 14 hours ago

> so much

1) the agent is probably not the only thing running on the system, so more is just worse generally

2) I am fine if a developer needs Rust or similar to write a resource-efficient app. I wonder what the developer could achieve if he put the optimization effort into the Rust app instead.

hnlmorg 13 hours ago

My point is that Java isn’t going to be the application that sends your machine into swap hell.

People are so narrow minded about programming on this forum. They talk as if only Rust fills the void between unsafe C and node.js behemoths. But the reality is there are a plethora of other good languages out there too.

gf000 13 hours ago

Of course, what would be a point of talking about an overly specific statement that has no relevance here?

mejutoco 9 hours ago

> That's like speeding to the post office and expecting your letter to get to the recipient faster.

I mean, the post office is not a magic box. Actual people will take your letter somewhere, sometimes batching sends. So running to the post office might actually get your letter in an earlier batch, same as ordering on amazon or your online supermarket in the morning or in the evening might change the delivery time.

Pedantic, I know, but interesting example.

ink-splatters 16 hours ago

You can tune java runtime in many ways, achieving impressive throughput/latency for your type of workload.

Next to none of them will get you anywhere near the cold start times of a native app, if you're using free Java.

There was GraalVM and its ecosystem, which included Java Native Image: the first thing I’d evaluate if I were thinking about a non-server-side, performant Java application.

But sadly, it has all been swept out of the free tier by Oracle.

flossly 15 hours ago

I use GraalVM and Native Image now and while the project --a small CLI tool-- is tiny (2kLOC with mainly AWS-SDK deps) the compile times are huge (~3 minutes), the OS-dependencies many (so much I use a build container to ease the burden of installing all) and the resulting binary is huge (~60MB).

But then it distributes as one binary and starts in milliseconds.

Rust would have been a better fit (cargo-and-done, smaller binary, quicker to compile); but I wanted to use Kotlin as we use in all other projects.

gf000 13 hours ago

It hasn't been swept away by Oracle, far from it. Its development is just no longer coupled to the OpenJDK release cycle, which benefits both projects.

tornikeo 18 hours ago

Simplest explanation I could come up with: Just for hype and fun.

Rewriting things in rust is "cool". Bun did it, other projects did it. Therefore, writing a coding agent in one should be cool too.

And apparently enough of the HN crowd agrees with it to take the #1 spot on the board.

GodelNumbering 17 hours ago

For the most part, doing things right in the given language matters more than changing the language. A lot of the Rust rewrites I see (in the coding agent space) jump straight to Rust without considering what inefficiencies could be addressed before changing the language.

Having said that, I considered a Go/Rust rewrite of some modules of Dirac (https://github.com/dirac-run/dirac) to support cases where someone wants to run, say, 30 agents. But it quickly became obvious that a) while the Node event loop is a bottleneck, it is not the sole bottleneck, and b) if you have a VS Code extension, you can't get rid of TypeScript entirely, so it just becomes a bilingual project with the maintenance burden that comes with it.

flossly 15 hours ago

Rust is just another language. Sure it's cooler than some langs, to some ppl. Sure.

The author made the choice. Open sourced it (thanks!). So now we all enjoy more options. Saying the author did so because it's "cool" does not sit well with me. It feels like receiving a no-strings-attached gift of significant value and then saying the giver gave it to be seen as cool.

joelthelion 18 hours ago

Opencode can be surprisingly hard on the CPU (could be an issue when coding on battery or a weak remote VM), and uses a lot of RAM. A little competition is always welcome.

wint3rmute 18 hours ago

Even a simple coding agent TUI should respond instantaneously, which I sadly cannot say of TypeScript-based applications like Claude Code or Gemini.

After switching away from GNOME Terminal + Zsh to Ghostty + Nushell, I started to appreciate how instant everything feels. Why not make everything just as fast?

itsdavesanders 17 hours ago

I have to say this is one of my favorite things about local Qwen and Qwen Code: it seems a heck of a lot faster than Claude and feels better to work with.

Problem is it is nowhere near as smart, so what speed I get in conversation gets killed by iteration.

jwxz 17 hours ago

I didn't see anyone mention this, but I think having a single binary is much nicer than having a JS (or Python) program sprawled all over your system.

ink-splatters 16 hours ago

Having a single binary output is a completely different problem, and it's solved for both Python and TypeScript (Bun supports the latter).

crabmusket 15 hours ago

Node and Deno can also bundle apps into a single executable.

flossly 15 hours ago

Over time software grows. Once it's big, rewriting it in another language is hard, and it gets harder as the project grows.

Starting with a resource-saving attitude may be a very good long term strategy.

Also: with Rust there are many features of high-level, modern, type-safe, FP-inspired languages that you do not have to miss.

amelius 12 hours ago

Most FP languages cannot work without GC unless you're willing to give up idiomatic FP programming. There is a reason Haskell has a garbage collector.

flossly 6 hours ago

Hence I used FP-inspired (to point at languages like Rust, Kotlin, Ruby, Swift)

rbalicki 11 hours ago

That's exactly the tradeoff I made with Barnum (https://barnum-circus.github.io/). It's just not important to optimize the performance of the rust side for the reason you stated. So instead, all focus goes into making it easy for an LLM to build a reliable pipeline (from which LLMs are invoked).

throwa356262 18 hours ago

While we are not there yet, people are looking into running agents on the ESP32 and the like.

See projects such as picoclaw, nullclaw and more.

https://github.com/sipeed/picoclaw

https://github.com/nullclaw/nullclaw

krzyk 16 hours ago

e.g. opencode right now uses ~80% of my CPU.

At first I also thought that it would be just call and wait, but a lot of work is done locally (any tool calls).

tacone 13 hours ago

It's also dealing with memory issues (see: Memory Megathread https://github.com/anomalyco/opencode/issues/20695).

And in my experience it's not that much faster to start than more complex software like Visual Studio Code.

faangguyindia 14 hours ago

If you write in Go, you get faster compile time, more likely your code will compile fine after long time.

tcfhgj 17 hours ago

- Reduce the footprint on the planet

- prolonged life of hardware

- less electricity

- less expensive hardware

sdevonoes 17 hours ago

Compared to what LLMs actually consume, your agent makes zero difference

krzyk 16 hours ago

Why would anyone compare against a cloud LLM's power usage when they don't pay for it? Local power consumption is what matters to those people.

afavour 16 hours ago

OP specifically cited “reduce the footprint on the planet”

tcfhgj 17 hours ago

very wrong - especially on the local machine, see https://news.ycombinator.com/item?id=48164613

iddan 17 hours ago

Running many of those at scale.

phplovesong 17 hours ago

I recall the mid-2000s, when I saw many "rewrite in Rails" apps. It's just hype, and it will die out in a few years when something new comes along.

cpa 17 hours ago

[dead]

frio 1 day ago

Thanks, I've been tooling away in my spare time on my own version of this -- both to get a deeper understanding of agents (everyone suggests writing your own) and to help learn Rust. I'd like to retain `pi`'s configurability though, the ability to self-mutate and generate new tools is incredibly useful, particularly because I don't think any of these things should have access to arbitrary code execution through `bash` (of course, if they have access to, say, `edit` and `cargo run` they still have arbitrary code exec, but...) (so I tend to generate tools on the fly when I encounter something the no-bash agent needs to do).

gidellav 1 day ago

I actually thought about this issue, but while Pi can have this script-like environment because it's based on an interpreted language (TypeScript), Rust has its own limitations as a compiled language.

I decided to allow for customization in a different way:

1. The prompt library (~/.config/hypernova/prompts/) acts as a simpler alternative to Skills, with the built-in prompts that should replace superpowers + Claude's frontend-design

2. Compile-time features; things that might make the agent more bloated can be disabled when you decide to compile zerostack

3. Clean code; code that's short and easy to read, you can just throw zerostack on its own source code in order to build a custom fork if your necessity can't be satisfied. Good features could also be adopted by the main version.

4. Permission mode: as you can see in the README, there was a lot of concern around the permission model, and I landed on a 4-mode system that goes from "Restrictive" (no commands) to "YOLO" (whatever the agent wants to do), plus custom regex patterns for allow/ask/deny permissions on `bash` calls. In your case, you just need to run `zerostack -R` to force all tools to ask for permission.
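The allow/ask/deny layering in item 4 could be sketched roughly as below. This is an illustration built on assumptions, not zerostack's code. Only "Restrictive" and "YOLO" are named in the comment, so the two middle modes here (`AskAlways` and `Rules`) are hypothetical. Plain prefix matching stands in for the regex patterns. The precedence (deny beats ask, ask beats allow, and unmatched commands ask) is a guess.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    Allow,
    Ask,
    Deny,
}

// Hypothetical mode names; only Restrictive and YOLO come from the comment.
#[derive(Clone, Copy)]
enum Mode {
    Restrictive,
    AskAlways,
    Rules,
    Yolo,
}

// Prefix matching stands in for regex patterns to keep the sketch std-only.
struct Rule {
    prefix: &'static str,
    decision: Decision,
}

fn check_bash(mode: Mode, rules: &[Rule], cmd: &str) -> Decision {
    match mode {
        Mode::Restrictive => Decision::Deny, // no commands at all
        Mode::AskAlways => Decision::Ask,
        Mode::Yolo => Decision::Allow, // whatever the agent wants
        Mode::Rules => {
            let hits: Vec<Decision> = rules
                .iter()
                .filter(|r| cmd.starts_with(r.prefix))
                .map(|r| r.decision)
                .collect();
            // Assumed precedence: Deny > Ask > Allow; no match falls back to Ask.
            if hits.contains(&Decision::Deny) {
                Decision::Deny
            } else if hits.contains(&Decision::Ask) {
                Decision::Ask
            } else if hits.contains(&Decision::Allow) {
                Decision::Allow
            } else {
                Decision::Ask
            }
        }
    }
}

fn main() {
    let rules = [
        Rule { prefix: "cargo ", decision: Decision::Allow },
        Rule { prefix: "git push", decision: Decision::Ask },
        Rule { prefix: "rm ", decision: Decision::Deny },
    ];
    for cmd in ["cargo test", "rm -rf target", "curl example.com"] {
        println!("{cmd}: {:?}", check_bash(Mode::Rules, &rules, cmd));
    }
}
```

Prefix matching alone would allow `cargo test && rm -rf ~` under the `cargo ` rule. A real checker has to account for shell chaining, which is part of why some commenters prefer dedicated tools over raw `bash`.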

(Also, there is a work-in-progress feature for programmable agents, but that's yet to be announced.)
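Item 2 (compile-time features) maps naturally onto Cargo features and `#[cfg]`. Here is a sketch with a hypothetical `mcp` feature; zerostack's actual feature names aren't given in the thread. The feature would be declared under `[features]` in `Cargo.toml` and enabled with `cargo build --features mcp`. When it's off, the gated code is not compiled into the binary at all.

```rust
// Hypothetical feature `mcp`: compiled in only with `--features mcp`.
#[cfg(feature = "mcp")]
fn mcp_status() -> &'static str {
    "mcp client compiled in"
}

// Fallback used when the feature is disabled; the code above is not in the binary.
#[cfg(not(feature = "mcp"))]
fn mcp_status() -> &'static str {
    "mcp support disabled at compile time"
}

fn main() {
    println!("{}", mcp_status());
}
```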

aerzen 22 hours ago

Ok, what about having tools be discoverable from the environment, similar to how $PATH works in POSIX?

There could be an env var $AGENT_TOOLS, a string of paths delimited by `:` and tools would be discovered as some specific format of file. Maybe a JSON that contains tool name, list of parameters and the command to run it.

This is essentially decoupling tools from the agent, allowing more customization and per-project environments. It does require shipping and installing more binaries, one for each tool probably.
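Here is a sketch of that discovery step. It assumes a hypothetical `$AGENT_TOOLS` variable and a convention of one `<tool>.json` manifest per tool. It uses `std::env::split_paths`, which splits on `:` on Unix (and `;` on Windows), just as with `$PATH`. As with `$PATH`, earlier directories shadow later ones. Parsing the manifest contents is left out to keep the sketch std-only.

```rust
use std::env;
use std::ffi::OsStr;
use std::fs;
use std::path::PathBuf;

// Returns (tool name, manifest path) pairs; the first directory to define a name wins.
fn discover_tools(var: &OsStr) -> Vec<(String, PathBuf)> {
    let mut found: Vec<(String, PathBuf)> = Vec::new();
    for dir in env::split_paths(var) {
        let Ok(entries) = fs::read_dir(&dir) else { continue }; // skip missing dirs
        let mut paths: Vec<PathBuf> = entries.filter_map(|e| e.ok()).map(|e| e.path()).collect();
        paths.sort(); // read_dir order is unspecified; sort for determinism
        for path in paths {
            if path.extension() != Some(OsStr::new("json")) {
                continue;
            }
            let Some(name) = path.file_stem().and_then(|s| s.to_str()) else { continue };
            if !found.iter().any(|(n, _)| n == name) {
                found.push((name.to_string(), path.clone()));
            }
        }
    }
    found
}

fn main() {
    let var = env::var_os("AGENT_TOOLS").unwrap_or_default();
    for (name, path) in discover_tools(&var) {
        println!("{name}\t{}", path.display());
    }
}
```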

threecheese 12 hours ago

The Hermes agent (Python) follows something similar; it defines a HOME dir and enumerates plugins and memory extensions present there.

https://github.com/nousresearch/hermes-agent

Functionally, it fits more in the openclaw space than pi-agent.

zrg 16 hours ago

This is one of the approaches I'm considering for my own, Roder.

The approach is mostly to communicate over JSON-RPC, which has become the standard for MCP, so it's more approachable for agent developers.

Obviously it's very much NOT MCP; it's a low-level, event-based RPC system for registering capabilities and extending low-level primitives of the agent itself, not the model.

gidellav 22 hours ago

I understand the concept, but I don't get what's the advantage over adding in the prompt instructions to use a specific bash command for a specific task, acting as a "custom tool".

frio 6 hours ago

The harness clamps what the agent can do. `bash` allows full code execution; a dedicated `mvn` tool might only allow `mvn compile` but not `mvn spring-boot:run`. You could probably implement this with an `allow` list attached to your `bash` tool, but by doing it this way, you can enhance the outputs or perform mandatory checks too.

For instance, Claude likes to run little Python scripts; reviewing them is tedious. Removing `bash` and adding a `python` tool would allow the harness to pre-review and grep for common harmful patterns, or run the `python` script in a `krunvm` or `muvm` to isolate it, etc. This review/isolation would be handled programmatically as it's part of the harness; leaving the agent to choose what to do as a skill means the agent can conveniently forget to enforce its own checks.
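Here is a sketch of such a dedicated tool, using a hypothetical allowlist of Maven goals. The model only supplies goal names. The harness validates them and builds the command itself, so there is no shell string to smuggle extra commands through. The allowed goals are an assumption; adjust to taste.

```rust
use std::process::Command;

// Hypothetical allowlist: lifecycle phases only, no plugin goals like spring-boot:run.
const ALLOWED_GOALS: &[&str] = &["compile", "test", "verify"];

// Validate model-supplied goals, then build (but don't run) the command.
fn build_mvn(goals: &[&str]) -> Result<Command, String> {
    if goals.is_empty() {
        return Err("no goals given".into());
    }
    for g in goals {
        if !ALLOWED_GOALS.contains(g) {
            return Err(format!("goal `{g}` is not allowed"));
        }
    }
    let mut cmd = Command::new("mvn");
    // Arguments go straight to the process: no shell, so no `;` or `&&` tricks.
    cmd.arg("--batch-mode").args(goals);
    Ok(cmd)
}

fn main() {
    match build_mvn(&["compile"]) {
        Ok(cmd) => println!("would run: {cmd:?}"),
        Err(e) => println!("rejected: {e}"),
    }
    match build_mvn(&["spring-boot:run"]) {
        Ok(cmd) => println!("would run: {cmd:?}"),
        Err(e) => println!("rejected: {e}"),
    }
}
```

The same wrapper is a natural place for the output post-processing and mandatory checks mentioned above.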

aerzen 17 hours ago

Good point. There might be a small advantage if one does not want to give bash access. But the general answer to "how do I add custom tools like we can in pi" is "you don't". Keep it simple.

frio 1 day ago

I've been trying to use `Deno` underneath `Rust` so that the tools can still be written in Typescript and thus self-mutated without the compilation step (but I can still try to do clever things with V8 Isolates or similar). It's been an ugly experiment so far; I'm vaguely thinking a simpler model would be to just define a binary "API" and run tools by exec-ing binaries.
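The "exec a binary" model could look something like the sketch below. The protocol is an assumption, not an existing standard: input goes to stdin, the result comes back on stdout, and a non-zero exit means an error. Any language that can read stdin and write stdout could then implement a tool.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Hypothetical binary "API": input on stdin, result on stdout, stderr on failure.
fn call_tool(bin: &str, args: &[&str], input: &str) -> Result<String, String> {
    let mut child = Command::new(bin)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(|e| format!("spawn {bin}: {e}"))?;
    // Write the input, then drop stdin (end of statement) so the tool sees EOF.
    child
        .stdin
        .take()
        .unwrap()
        .write_all(input.as_bytes())
        .map_err(|e| e.to_string())?;
    let out = child.wait_with_output().map_err(|e| e.to_string())?;
    if out.status.success() {
        Ok(String::from_utf8_lossy(&out.stdout).into_owned())
    } else {
        Err(String::from_utf8_lossy(&out.stderr).into_owned())
    }
}

fn main() {
    // `cat` acts as a trivial "tool" that echoes its input back.
    match call_tool("cat", &[], "{\"query\": \"hello\"}") {
        Ok(out) => println!("tool said: {out}"),
        Err(e) => eprintln!("tool failed: {e}"),
    }
}
```

Writing all input before reading any output can deadlock if a tool produces a lot of output before it finishes reading its input. Feeding stdin from a separate thread avoids that.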

gidellav 1 day ago

I have to be honest and tell you that trying to load such a heavy runtime as a scripting layer is not a great idea. At the same time, I can tell you that I am working on another Rust project where I also needed scripting, and after three attempts I landed on rhai (https://rhai.rs/) (https://rhai.rs/book).

You might find it nice for pretty much all use cases except high-performance scripting (so, if you are not trying to build the entire logic in rhai, you are going to be fine).

frio 1 day ago

Yeah, it's been a bit of a dead end. I didn't want the heavy runtime but felt it was worth disproving after experimenting rather than ruling out off the bat. Even before getting it running, the dependency list alone was pretty discouraging, especially given the storm of supply chain attacks these days.

Rhai looks nice, I'll take a look, thanks! And good luck with Zerostack.

aschar 1 day ago

[dead]

slopinthebag 1 day ago

I was just going to suggest rhai. It's simple enough LLMs can easily write it with a little context, and you control the entire API so you can sandbox effectively without needing to resort to hacks with a JS interpreter etc.

slowhorse 22 hours ago

I agree, V8 and Deno seem very heavy-handed and complex to integrate for scripting capabilities.

Have you considered Lua? It is tailor made for use cases like this. Creating an embedded host in Rust is trivial, the work lies in creating built-in functions for the script runtime so that the user scripts can do useful things to the environment.

BillStrong 1 day ago

Have you thought about Zig? If you limit it to CompTime, isn't that just a scripting language that happens to be compiled to binary?

brabel 17 hours ago

That’s not how it works. Comptime Zig is Zig, not an embedded scripting language. You can’t run comptime code separately, it only runs as part of compiling a Zig program. Think of it like Rust macros.

frio 23 hours ago

Possibly, I'm not really interested in learning Zig though (or learning to embed it in Rust). I'm sure that'd be a cool project for someone else to try :).

jswny 1 day ago

Why not WASM?

frio 1 day ago

Unfamiliarity and I believe it requires a compile step. I’m at least familiar with Typescript and Deno so being able to embed them was an appealing idea :)

kristjansson 22 hours ago

> simpler alternative to Skills

this concerns me. Skills are already just about the simplest possible thing; they're just prompts, in a directory!

lunar_mycroft 22 hours ago

Skills are notably more complex than that. They require metadata (which the model is given and uses to determine whether or not to load the main file), are intended to be loaded via a tool call, contain extra resources (also loaded by tool calls), etc. In contrast, with this system the harness doesn't need a tool to load the stored prompts, the prompts don't need to include metadata to allow for runtime discovery, etc.
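For reference, a skill is roughly a `SKILL.md` file whose YAML frontmatter carries a `name` and a `description`. Only that metadata is placed in the system prompt, and the body is loaded on demand. Below is a naive sketch of the split. It is line-based, not a real YAML parser, and `Skill`, `parse_skill` and `skill_index` are illustrative names.

```rust
struct Skill {
    name: String,
    description: String,
    body: String, // loaded into context only when the model asks for it
}

// Naive frontmatter split: `key: value` lines between `---` fences.
fn parse_skill(text: &str) -> Option<Skill> {
    let rest = text.strip_prefix("---\n")?;
    let (front, body) = rest.split_once("\n---\n")?;
    let mut name = None;
    let mut description = None;
    for line in front.lines() {
        if let Some((k, v)) = line.split_once(':') {
            match k.trim() {
                "name" => name = Some(v.trim().to_string()),
                "description" => description = Some(v.trim().to_string()),
                _ => {}
            }
        }
    }
    Some(Skill { name: name?, description: description?, body: body.to_string() })
}

// Only this index goes into the system prompt, for runtime discovery.
fn skill_index(skills: &[Skill]) -> String {
    skills
        .iter()
        .map(|s| format!("- {}: {}", s.name, s.description))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let text = "---\nname: pdf-forms\ndescription: Fill in PDF forms\n---\nUse the bundled script...\n";
    if let Some(skill) = parse_skill(text) {
        println!("{}", skill_index(&[skill]));
    }
}
```

A plain prompt library skips both the frontmatter and the index. The user picks the prompt, so nothing needs to be discoverable at runtime.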

cobolcomesback 16 hours ago

Runtime discovery is the entire point of skills. Without it, this is just a templating prompt system that the user has to remember to use… except because this one changes your system prompt, it also busts your cache and costs you extra money when you use a prompt.

Skills are already dead-simple and this prompt system doesn’t at all tackle the same problem.

lunar_mycroft 14 hours ago

"{Feature} is the whole point of {more complex technology}" is an objection that can very often be raised. That doesn't mean that giving up features in exchange for simplicity is always the wrong call. And there's also advantages to having the user drive what instructions go into the prompt instead of the harness/model.

cobolcomesback 14 hours ago

This is tangential to the point. It’s often great to have a simpler version of a solution, even if it eschews some features. But this isn’t that. OP claims that the prompt system is an “alternative” to skills, but it isn’t. It isn’t solving the same problem that skills solve at all. It’s like saying that a bicycle is a simpler alternative to a lawnmower because they both have wheels.

Prompts are a feature that is simpler than skills, sure, but they're a different feature entirely.

lunar_mycroft 14 hours ago

It's an alternative in the same way that, e.g., plain markdown is an alternative to HTML, even though plain markdown lacks some of HTML's features. "X is an alternative to Y" in this sense doesn't mean "X has all the same features as Y"; it means "you might reasonably choose to use X instead of Y, depending on your exact use case".

gidellav 22 hours ago

Exactly, this was my thought process when deciding if we should have Skills or not.

In the end, I think that this prompt-only design, with the integrated tools that come with zerostack, is more than enough.

backscratches 22 hours ago

So are these lol

praveer13 1 day ago

I’ve been doing the same thing in zig haha.

throwa356262 1 day ago

"RAM footprint: ~8MB on an empty session, ~12MB when working"

I like this, Claude Code is using multiple gigabytes, which is really annoying on lowend laptops

all2 1 day ago

I'm building an agent framework in golang and it is extremely light weight. Startup time is under 1/2 second, and RAM usage is really low. I have a 12 year old laptop and it happily runs without slowing down.

There's no reason what is essentially a string concat engine should be slow on any hardware, including old hardware.

gidellav 22 hours ago

Isn't 2 second startup time a lot? With zerostack, I managed to get it down to ~90ms

NewJazz 21 hours ago

They said 1/2 as in 0.5 seconds as in 500 ms.

throwa356262 18 hours ago

Sounds interesting, would you like to share any more information about your project?

all2 8 hours ago

Link is here [0]. The idea is to model cognitive states (how to think), and workflows (what to think about) as statecharts. The charts will be defined in YAML (version-able, hot-reloading). Context payloads are defined in an agent YAML file. Think of it as a map, like a drive map for a computer's HDD/SSD. You spec the order of context chunks, what goes into them, and then when the inference payload is built, it uses the context map definition (comprised of the chunks you defined), the agent definition (including model params like context length, temp, etc), cognitive state, and workflow state to build out the inference payload.
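The context-map idea can be sketched as an ordered list of slots that chunks are poured into. This is a guess at its general shape, not maelstrom's implementation. `Chunk` and `build_payload` are hypothetical names, and chunks whose slot isn't in the map are simply dropped.

```rust
// A piece of context tagged with the slot it belongs to (agent definition,
// cognitive state, workflow state and history can all contribute chunks).
struct Chunk {
    slot: &'static str,
    text: String,
}

// Assemble the payload in the order the context map defines.
fn build_payload(order: &[&str], chunks: &[Chunk]) -> String {
    let mut parts: Vec<&str> = Vec::new();
    for slot in order {
        for c in chunks.iter().filter(|c| c.slot == *slot) {
            parts.push(&c.text);
        }
    }
    parts.join("\n\n")
}

fn main() {
    let chunks = vec![
        Chunk { slot: "history", text: "user: fix the bug".to_string() },
        Chunk { slot: "system", text: "You are a reviewer.".to_string() },
        Chunk { slot: "workflow", text: "Step: review".to_string() },
    ];
    println!("{}", build_payload(&["system", "workflow", "history"], &chunks));
}
```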

Agent cognitive states may add chunks to the system prompt. Workflows may add chunks to the system prompt. Tool access may vary by agent/workflow state (policy is last-defined-wins overlays to keep it simple to reason about).
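"Last-defined-wins overlays" for tool access amount to folding layers in order, with later layers overwriting earlier entries. Here is a minimal sketch, assuming each layer is a map from tool name to allowed/denied.

```rust
use std::collections::HashMap;

// Layers are applied in order (e.g. base agent, cognitive state, workflow state);
// a later layer's entry for a tool replaces any earlier one.
fn effective_tools<'a>(layers: &[HashMap<&'a str, bool>]) -> HashMap<&'a str, bool> {
    let mut out = HashMap::new();
    for layer in layers {
        for (tool, allowed) in layer {
            out.insert(*tool, *allowed);
        }
    }
    out
}

fn main() {
    let base = HashMap::from([("read", true), ("edit", true), ("bash", true)]);
    let reviewer = HashMap::from([("edit", false), ("bash", false)]);
    let mut tools: Vec<_> = effective_tools(&[base, reviewer]).into_iter().collect();
    tools.sort();
    println!("{tools:?}");
}
```

Because only the final value per tool matters, reasoning about a state means reading its layers bottom to top.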

Agents may run by themselves or be 'bound' to a workflow. Agents can detach from a workflow before it is finished, and either re-bind, or another agent may bind to the workflow (one implements, another reviews, for example).

Conceptually, this is all very simple, which is why I'm hand rolling it.

The goal is a minimal runtime that can support long-running agents in a 'zero human company' setting.

On top of the runtime will be a minimal change control workflow (if you've spent time in hardware engineering, these are standard processes governed by a company's quality system).

I've yet to wire in the economic pieces (token spend, power consumption, rollups that show performance of various agents based on inputs and outputs).

It is a bit far fetched, but I'd like to get this thing ISO9001 certified, and maybe AS9100 certified.

This is all to scratch my own itch, tbh. Most agentic systems are hard to reason about, bloated, lack visibility in the appropriate places, lack economic data of sufficient granularity, and so on. So I'm building this.

[0] https://github.com/zerohumancompany2/maelstrom-code

rel 1 day ago

I've been trying to migrate over to Zed and think their Agent Client Protocol[1] is pretty neat. I wonder how much memory pressure Claude Code exerts if it goes through that mechanism instead.

1: https://zed.dev/acp

threecheese 12 hours ago

Not answering your question, but I just realized the new Anthropic billing changes are affecting ACP clients like Zed :(

https://zed.dev/blog/anthropic-subscription-changes

messh 1 day ago

The memory footprint is great, it allows finally running these coding agents in extra small instances -- say x1 on shellbox.dev

chrisweekly 1 day ago

Hmm, if they're this small something like smolmachines (like shellbox, but free and local) might be a great fit.

tecoholic 1 day ago

Yes. Just this fact is going to make a lot of people try it out.

rane 18 hours ago

I have 29 Claude Codes open, using 6.3 GiB RSS total

esperent 1 day ago

Are you sure you don't have an LSP plugin or something running?

marknutter 1 day ago

Isn't that because of the context window size?

gidellav 1 day ago

Hi, I'm the developer of zerostack! No, the memory footprint is not because of the context window size: in my benchmarks, with a 128k context loaded, it only went from 8MB (without any chat/context loaded) to 11MB.

The reasons for zerostack's small memory footprint are:

- Rust, and not JS/Python, so no interpreters/VMs on top

- Load-as-needed, so we only allocate things like LLM connectors when needed

- `smallvec` used for most of the array usage in the tool (up to N items are stored on the stack)

- `compactstring` used for most of the string usage in the tool (up to N chars are stored on the stack)

- `opt-level=z` to make LLVM optimize for binary size rather than performance (even so, we still beat opencode in both TTFT and tool-use time)

- heavy usage of [LTO](https://en.wikipedia.org/wiki/Interprocedural_optimization#W...)
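The `smallvec` and inline-string trick in the list above works by storing the first few elements inside the value itself and allocating on the heap only once that space is exceeded. Below is a toy, std-only version to show the idea. `InlineVec` is hypothetical and deliberately simplified: the real `smallvec` crate uses uninitialized storage rather than requiring `Default`, and its API differs.

```rust
// Up to N items live inline (no heap allocation); the (N+1)th push spills to a Vec.
enum InlineVec<T: Copy + Default, const N: usize> {
    Inline { buf: [T; N], len: usize },
    Heap(Vec<T>),
}

impl<T: Copy + Default, const N: usize> InlineVec<T, N> {
    fn new() -> Self {
        InlineVec::Inline { buf: [T::default(); N], len: 0 }
    }

    fn push(&mut self, x: T) {
        if let InlineVec::Inline { buf, len } = self {
            if *len < N {
                buf[*len] = x; // still fits inline: no allocation
                *len += 1;
                return;
            }
            // Inline buffer is full: move its contents to the heap once.
            let spilled = buf[..*len].to_vec();
            *self = InlineVec::Heap(spilled);
        }
        if let InlineVec::Heap(v) = self {
            v.push(x);
        }
    }

    fn as_slice(&self) -> &[T] {
        match self {
            InlineVec::Inline { buf, len } => &buf[..*len],
            InlineVec::Heap(v) => v.as_slice(),
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, InlineVec::Inline { .. })
    }
}

fn main() {
    let mut v: InlineVec<u32, 4> = InlineVec::new();
    for i in 0..4 {
        v.push(i);
    }
    println!("{:?} inline={}", v.as_slice(), v.is_inline());
    v.push(4);
    println!("{:?} inline={}", v.as_slice(), v.is_inline());
}
```

For the many short-lived small collections in an agent (tool arguments, short names), this avoids a heap allocation per collection.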

SatvikBeri 1 day ago

The context window has nothing to do with RAM usage and even if it did, a million tokens of context is maybe 5mb.
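A back-of-the-envelope check of that figure, assuming the common rule of thumb of about 4 characters per token for English text and roughly one byte per character in UTF-8:

```latex
10^{6}\ \text{tokens} \times 4\ \frac{\text{bytes}}{\text{token}} = 4 \times 10^{6}\ \text{bytes} \approx 4\ \text{MB}
```

Per-message metadata, JSON framing and any duplicated copies of the history add overhead on top, but the order of magnitude stays in the megabytes.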

bluegatty 1 days ago [-]

'A million tokens of context' is literally terabytes of KV cache VRAM on very expensive Nvidia silicon, on the model side.

On the agent, yes, the context window does relate to RAM, because the 'entire conversational history' is generally kept in memory. So ballpark 1M 'words' across a bunch of strings. It's not that much.

Claude Code is not inefficient because 'it's not Rust'; it's probably just not very efficiently designed.

Rust does not really bestow magical properties that make memory more efficient.

A bit more, but it's not going to change this situation.

'Doing it in Rust' might yield amazing returns just because the very nature of the activity is 'optimization'.

rixed 24 hours ago [-]

Rust "denialism" is as annoying as rust evangelism.

Of course any seemingly idiomatic rust is going to run circles around TS transpiled into JIT-compiled JS.

bluegatty 20 hours ago [-]

Lamenting any 'not even criticism' of Rust as 'denialism' is just evidence of the insane cult that is Rust.

Rebuilding Claude Code in Rust will make almost no difference in terms of real world performance. V8 is 'relatively fast', and there wouldn't be any noticeable improvements there, and probably not memory footprint either.

The source for Claude Code was leaked and it's a vibe-coded mess; not much thought was given to clean architecture. It's unlikely they've cleaned it up or given any thought to memory consumption. If they did, they'd get most of the way there and likely obviate any real need to 'do it in Rust', unless there are other architectural considerations.

imtringued 10 hours ago [-]

You're the delusional one for bringing up the memory usage of the inference server that clearly isn't running inside the coding agent.

The problem with your comments is that you're showing a fundamental lack of understanding of the difference between managed and unmanaged languages.

The vast majority of GCs are optimized for throughput and allocate big chunks of memory. They also tend to never release it if there was a temporary memory spike. The most advanced GCs also tend to have either read or write barriers, which slow down basic object accesses.

Just in time compilation and managed languages in general need to retain a runtime representation of the source code to perform JIT compilation and then they have to store the compiled code in memory as well.

JavaScript uses references against dynamic objects, which means you have to pay the indirection cost of a pointer but you also need to store type information as well to monomorphize the object literals and classes at runtime and fall back to a regular hashmap when fields are added dynamically.

All of these things will add up and increase the amount of memory the application uses and how slow it runs.

Sure Claude Code has severe architectural issues causing it to leak hundreds of gigabytes of RAM, but if those were not there you could easily build a C++ based alternative that runs circles around a hypothetical JavaScript based Claude Code that got its act together.

bluegatty 49 minutes ago [-]

1) I'm not 'delusional' for bringing up 'What Memory is Used Where' - I'm clarifying for the people who seem a bit confused (see above) as to 'where the context lives' - and trying to provide a simple mental model for that.

That's the opposite of delusional.

It's just information.

Attacking people for anything 'Rust related' however - is the quintessential reason why everyone hates the Rust community.

2) 'The problem with your comment' is that it's presumptive and arrogant - as if I 'don't know the difference between GC and managed languages'.

I've been writing software since 1990.

Embedded (on custom Silicon), UI, SaaS, backend, some embedded work I've done is still in production today from almost 30 years ago.

I've written a scripting language (for production), and a cyclic ref-count GC (didn't make it to production).

Your comments about GC etc. are fine, but they don't really offer any insight into the actual problem.

There's one critical detail, aka 'memory not released after spikes'. Yes, this is observed behaviour, but it's usually accommodated with a little bit of decent Engineering.

If you're going to make the comparative basis an 'Idiomatic Rust' solution (aka good patterns), then we should assume an 'Idiomatic Node' solution for Claude Code.

3) 'The other problem with your comment' is that your conclusion is wrong - by your own hand.

Right here: "Claude Code has severe architectural issues causing it to leak hundreds of gigabytes of RAM," the implication being that Claude Code does not inherently have to 'leak all that RAM' and would run just fine with some basic work.

An 'Idiomatic Node' implementation of Claude Code wouldn't exhibit those problems, and would perform pragmatically just as well as an Idiomatic Rust implementation.

From a memory management standpoint, Rust might use significantly less memory, but a 150MB footprint vs a 350MB footprint for an average session is 'pragmatically immaterial'.

The difference in 'perceived performance' would be negligible, if any.

The 'cost' of writing the 'kind of program that Claude Code is' in a systems-level language would be quite high, for not much benefit.

The 'Rust or C++' solution would not 'run circles' around the 'Node' implementation in anything but some 'performative', inward-looking benchmarks, aka 'the worst kind of Engineering'.

Consider pondering why almost nobody writes such applications in Rust or C++.

regexorcist 16 hours ago [-]

You have a point but it's definitely not TBs for 1M. Should be more like 100G.

vlovich123 1 days ago [-]

It has nothing to do with local RAM usage. But a million tokens of LLM context is decidedly not 5mb.

The rough estimate is 2 * L * H_kv * D * bytes per element

Where:

* L = number of layers
* H_kv = number of KV heads
* D = head dimension
* factor of 2 = keys + values

The dominant factor here is typically 2 * H_kv * D, since it's usually at least 2048 bytes per token.

For Llama 3 8B you're looking at 128 GiB if your context is really 1M (not that that particular model supports a context so big). DeepSeek V4 uses something called sparse attention, so the above calculus is improved: 1M of context would use 5-10 GiB.

But regardless of the details, you’re off by several orders of magnitude.
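The formula above can be checked with a few lines of arithmetic. The sketch assumes Llama 3 8B's published shape (32 layers, 8 KV heads, head dimension 128) and an fp16 cache (2 bytes per element). For contrast, it also estimates the plain-text transcript an agent keeps locally, assuming roughly 4 bytes of text per token, which is a rule of thumb for English rather than an exact figure.

```rust
/// KV-cache bytes per token: 2 (keys and values) * layers * KV heads * head dim * bytes per element.
fn kv_bytes_per_token(layers: u64, kv_heads: u64, head_dim: u64, bytes_per_elem: u64) -> u64 {
    2 * layers * kv_heads * head_dim * bytes_per_elem
}

const GIB: f64 = 1_073_741_824.0; // 2^30 bytes
const MB: f64 = 1_000_000.0;

fn main() {
    // Assumed Llama 3 8B shape: 32 layers, 8 KV heads, head dim 128, fp16 cache.
    let per_token = kv_bytes_per_token(32, 8, 128, 2);
    let tokens: u64 = 1 << 20; // 1,048,576 tokens (~1M)

    // Server side: the KV cache that attention actually works over.
    let kv_total = per_token * tokens;
    // Agent side: the plain-text transcript, assuming ~4 bytes of text per token.
    let text_total = 4 * tokens;

    println!("KV cache per token: {} KiB", per_token / 1024); // 128 KiB
    println!("KV cache for ~1M tokens: {:.0} GiB", kv_total as f64 / GIB); // 128 GiB
    println!("transcript for ~1M tokens: {:.1} MB", text_total as f64 / MB); // 4.2 MB
}
```

Both sides of this subthread are describing real numbers: about 128 GiB of KV cache on the inference server versus about 4 MB of text in the agent's memory.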

tujux 1 days ago [-]

Pretty sure we're talking about the output text, not the tensors.

m00x 21 hours ago [-]

These LLM replies are really getting annoying.

vlovich123 13 hours ago [-]

Mine? I literally wrote what I wrote because “context window” as a term of art refers to the LLM’s context window.

I guess get better at detecting LLMs instead of accusing everything of being an LLM reply?

SwellJoe 1 days ago [-]

The context window is not on your system. It's on the server with the model. There may be some local prompt caching, of some sort, but you're not locally hosting the context unless you're also locally hosting the model.

bluegatty 1 days ago [-]

Chat history is kept locally, generally you have to send the 'whole history' to the model 'each turn'.

SwellJoe 22 hours ago [-]

That's just the plain text (or whatever files), that's not the context the model is directly working with on the server, which is tokenized, embedded, vectorized and has attention run against those vectors. The local history is generally quite small, the context generally quite a bit larger. A text conversation of a few hundred kilobytes in plain text will be gigabytes in context.

bluegatty 20 hours ago [-]

KV for a SOTA model is into terabytes

rixed 23 hours ago [-]

Only "generally"? I'm curious what API has moved away from this protocol, which seems more adapted to conversations with humans than to agentic loops.

_flux 20 hours ago [-]

To me it would certainly make sense if the protocol just said "append this text to context window id/sha256", in particular as the data is cached at the tensor level on the provider side, so they need to do that lookup first anyway. So I would be surprised if they don't have that.

In addition, this protocol could make it more transparent to say "oh, we cannot proceed as we dropped this cache, are you sure you want to proceed and consume a whole lot of expensive uncached tokens?". Oh, maybe that's a reason not to do it..
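The protocol described above can be sketched in a few lines. This is hypothetical (no provider API is implied): the server keeps cached conversation prefixes keyed by a content id, the client sends only the id plus the new text, and an evicted prefix surfaces as an explicit error instead of a silent full-price request. It uses std's `DefaultHasher` as a stand-in for a real content hash such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content id for a conversation prefix (a real protocol would use
/// a collision-resistant hash such as SHA-256).
fn prefix_id(history: &[String]) -> u64 {
    let mut h = DefaultHasher::new();
    history.hash(&mut h);
    h.finish()
}

#[derive(Debug, PartialEq)]
enum AppendError {
    /// The provider evicted this prefix; the client must decide whether to
    /// resend the full history and pay for uncached tokens.
    CacheMiss,
}

/// Hypothetical server-side store of cached conversation prefixes.
struct ContextStore {
    cached: HashMap<u64, Vec<String>>,
}

impl ContextStore {
    fn new() -> Self {
        Self { cached: HashMap::new() }
    }

    /// Uploads a full history once and returns the id of the cached prefix.
    fn upload(&mut self, history: Vec<String>) -> u64 {
        let id = prefix_id(&history);
        self.cached.insert(id, history);
        id
    }

    /// "Append this text to context <id>": only the new text crosses the wire.
    fn append(&mut self, id: u64, text: &str) -> Result<u64, AppendError> {
        let mut history = self.cached.get(&id).ok_or(AppendError::CacheMiss)?.clone();
        history.push(text.to_string());
        Ok(self.upload(history))
    }

    fn evict(&mut self, id: u64) {
        self.cached.remove(&id);
    }
}

fn main() {
    let mut store = ContextStore::new();
    let id0 = store.upload(vec!["system: you are a coding agent".into()]);
    let id1 = store.append(id0, "user: fix the failing test").unwrap();
    store.evict(id1);
    // After eviction the client gets an explicit signal it can show to the user.
    println!("{:?}", store.append(id1, "user: also update the docs"));
}
```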

bluegatty 23 hours ago [-]

With the standard API you pass it all along, but I think there are some odd OpenAI APIs that are different.

arjie 22 hours ago [-]

I had Claude Code build me one of these as well, though I added Dirac's line hashing for edits etc. I also used Rust, and I had this idea that I should use plugins so it can self-edit by implementing things in hooks, but in the end I just have it write exhaust information about improvements into a separate file, and then just update the source code and recompile. The source code is in a fixed place so it can just rewrite and build the agent itself. I use it with DeepSeek 4 Flash running on 2x RTX 6000 Pros, on which I get some 138 tok/s.

To be honest, I just plagiarized Pi, Dirac, OpenCode. Any new tricks in this one that I can steal?

joshka 20 hours ago [-]

Take a look at OpenAI blogs about codex: https://openai.com/index/unrolling-the-codex-agent-loop/ https://openai.com/index/harness-engineering/ https://openai.com/index/unlocking-the-codex-harness/

GodelNumbering 17 hours ago [-]

Creator of Dirac here. Glad to see it mentioned and even more glad that you found it useful.

I am currently in deep refactor mode to introduce modular tooling to Dirac, since the concept of a 'fixed' set of tools is starting to feel antiquated. Adding tools on demand would be super convenient and a likely replacement for MCP (I understand, not for all of its use cases).

karagenit 16 hours ago [-]

Curious how you’re handling prompt caching, as I understand it most LLM providers essentially inject tool definitions in the system prompt, so changing tools dynamically breaks the cache. This has been a big annoyance for me in a separate project; I currently just implemented my own tool-ish system that defines schemas in user messages and instructs the LLM to return matching JSON, but it’s less reliable than using the native tool calling + structured outputs available in the API.

GodelNumbering 6 hours ago [-]

Native tool calling indeed. By modular, I meant the tool defs are loaded dynamically per task and stay the same during the task

gidellav 22 hours ago [-]

Some interesting features I added on top of being lightweight are the prompts library, Git worktrees integration and Ralph Wiggum loops integration.

arjie 21 hours ago [-]

Very cool. Thank you! I will look.

teo-mateo 21 hours ago [-]

Is it public on github?

arjie 11 hours ago [-]

Mine? No. It’s super idiosyncratic and I haven’t validated that it has not leaked secrets into the codebase.

normie3000 21 hours ago [-]

Yes.

wkcheng 24 hours ago [-]

This is nice! I tried it for a bit and it was indeed quite fast. Are you looking for contributors, or are you building this as a personal tool? I ran into some issues when attempting to use different models, though: gpt-5.5 on Azure doesn't work, even with the OpenAI compatible endpoint, because "max_tokens" has been replaced with "max_completion_tokens". And it doesn't appear possible to pass through custom headers, so I wasn't able to specify reasoning_effort for deepseek models.

gidellav 22 hours ago [-]

Yes, I am open for PRs.

What you showed is a clear bug in my codebase; if you can, open a GitHub issue for each of your bugs.

Thanks!

zbyforgotp 21 hours ago [-]

We don’t trust LLM execution, so we add user approvals. But task decomposition calls for co-recursion between code and prompts. This means that the approvals should be invocable at any depth. I think we need some kind of protocol for that (à la the Cubes OS protocols for cut and paste between VMs).

Maybe a workaround could be to use bubblewrap for the scripts that recursively call the LLM (and run the agent in yolo mode inside the wrap).

frabcus 20 hours ago [-]

Well, or not spawn any external commands, and actually have tools made of code written by someone who thought about what the agents at each level should be limited to doing.

zbyforgotp 20 hours ago [-]

In the limit we want the llm to write the code (like in RLMs).

alfiedotwtf 20 hours ago [-]

Or just run agents in a container…

hashmal 20 hours ago [-]

Currently, having an LLM feed on its own output repeatedly is the fastest way to get it to hallucinate.

zbyforgotp 14 hours ago [-]

Too late for fixing it - but of course I meant https://www.qubes-os.org/

agumonkey 19 hours ago [-]

Transactional recursive agents ?

Nothing is committed until the final top-level transaction is accepted.

gidellav 16 hours ago [-]

zerostack has a --sandbox flag that forces bwrap usage on all shell tool calls
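Forcing shell tools through bubblewrap amounts to prefixing each command with a `bwrap` invocation. Below is a minimal sketch, assuming a read-only host root where only the project directory is writable. The `sandboxed_shell` helper and its exact flag set are illustrative and are not taken from zerostack's source. Note that `--unshare-all` also gives the command its own network namespace, so it has no network access unless `--share-net` is added.

```rust
use std::process::Command;

/// Hypothetical helper: wraps a shell command so it runs inside bubblewrap.
/// The host root is mounted read-only and only `workdir` stays writable.
fn sandboxed_shell(cmd: &str, workdir: &str) -> Command {
    let mut c = Command::new("bwrap");
    c.args(["--ro-bind", "/", "/"]) // read-only view of the host
        .args(["--dev", "/dev"]) // minimal /dev
        .args(["--proc", "/proc"]) // fresh /proc
        .args(["--tmpfs", "/tmp"]) // private scratch space
        .args(["--bind", workdir, workdir]) // the project dir stays writable
        .args(["--chdir", workdir])
        .arg("--unshare-all") // new user/pid/net/ipc/uts namespaces
        .arg("--die-with-parent") // don't outlive the agent process
        .args(["sh", "-c", cmd]);
    c
}

fn main() {
    let cmd = sandboxed_shell("cargo test", "/tmp/project");
    // Print the resulting invocation instead of running it.
    println!("{:?}", cmd);
}
```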

360MustangScope 1 days ago [-]

Funny this comes out today. I was just about to start writing one in Rust. It's amazing having opencode slowly leak memory, end up at 6GB on a large project, and get slower and slower.

Will check this out! Seems cool!

gidellav 1 days ago [-]

Yes! This project derived from an OOM killer activation that happened on my old laptop because I had more than 2 opencode instances open together with Firefox...

hiAndrewQuinn 1 days ago [-]

The codebase was small enough that I handed it over to DeepSeek v4 Flash in Pi to skim through for any risky business, and I didn't find anything concerning. Nice work.

koito17 1 days ago [-]

Since the OP stated they used DeepSeek V4 Flash for generating a lot of the code, I decided to check whether there were any outdated dependencies. In my experience, with Rust projects, if you do not instruct models (even Claude 4.7 Opus) to use `cargo add` instead of manually editing the Cargo.toml, you will almost certainly get out-of-date dependencies added to your project.

Manually checking the dependencies used by this project, I was pleased to see they are all the latest version. That doesn't mean there are no issues lurking in transitive dependencies, of course.

As for getting an LLM to review the code, I think we can all get opinionated very fast. For instance, when I was eyeballing the code, some of the enum methods converting to/from strings made me think "this could've been a single #[derive] with strum." That would make the code in provider.rs a lot more concise, at the cost of importing one crate (with no dependencies!)
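For comparison with the strum suggestion: the repetitive to/from-string code can also be generated with a plain `macro_rules!` and no extra dependency, so the two directions can never drift apart. This is a minimal sketch. The `Provider` enum and its variants are hypothetical and are not zerostack's actual `provider.rs`. With strum, the equivalent would be deriving `EnumString` and `Display`.

```rust
use std::fmt;
use std::str::FromStr;

/// Generates an enum plus `as_str`, `Display` and `FromStr` from one list
/// of variant/string pairs.
macro_rules! string_enum {
    ($name:ident { $($variant:ident => $text:literal),+ $(,)? }) => {
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        enum $name {
            $($variant),+
        }

        impl $name {
            fn as_str(self) -> &'static str {
                match self {
                    $($name::$variant => $text),+
                }
            }
        }

        impl fmt::Display for $name {
            fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
                f.write_str(self.as_str())
            }
        }

        impl FromStr for $name {
            type Err = String;

            fn from_str(s: &str) -> Result<Self, Self::Err> {
                match s {
                    $($text => Ok($name::$variant),)+
                    other => Err(format!("unknown {}: {}", stringify!($name), other)),
                }
            }
        }
    };
}

// Hypothetical provider list, for illustration only.
string_enum!(Provider {
    OpenAi => "openai",
    Anthropic => "anthropic",
    DeepSeek => "deepseek",
});

fn main() {
    let p: Provider = "anthropic".parse().unwrap();
    println!("{p} / {:?}", "gemini".parse::<Provider>());
}
```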

Lastly, for fun, I decided to get DeepSeek V4 Pro (with Max thinking) to "audit" the codebase. The output mentioned no obvious signs of hidden telemetry, but it did note that the project sets the panic handler to "abort", which I have strong opinions on... Presumably the OP wanted to avoid linking against libunwind to save a few kilobytes of binary size, but now you have a binary that immediately aborts and doesn't give the user a stacktrace of what just crashed. I would rather have a ~50 KiB larger binary if it means getting useful debug info during a panic. Additionally, if there are async tasks that panic, they can't be recovered to display a generic error message; instead the whole process just aborts.
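The recovery point can be illustrated with `std::panic::catch_unwind`, which turns a panic into an `Err` so the caller can show a generic error message. This only works with the default `panic = "unwind"`. Under `panic = "abort"` the process terminates at the panic instead. Below is a minimal std-only sketch. The `run_task` helper is hypothetical. Async runtimes such as tokio do something equivalent per task and report a panicked task as an error on its join handle.

```rust
use std::panic;

/// Hypothetical helper: runs one unit of work (a stand-in for an agent task)
/// and turns a panic into an error message instead of crashing the process.
/// Only effective with the default `panic = "unwind"`; with `panic = "abort"`
/// the process terminates at the panic and this function never returns.
fn run_task<F>(task: F) -> Result<String, String>
where
    F: FnOnce() -> String + panic::UnwindSafe,
{
    panic::catch_unwind(task).map_err(|payload| {
        // Panic payloads are usually a `&'static str` or a `String`.
        let msg = if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            "unknown panic".to_string()
        };
        format!("task panicked: {msg}")
    })
}

fn main() {
    let ok = run_task(|| "done".to_string());
    let bad = run_task(|| -> String { panic!("cursor out of bounds") });
    println!("{ok:?}");
    println!("{bad:?}");
}
```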

gidellav 1 days ago [-]

Hi, nice comment!

1. I had experience not only with wrong versions selected by the agents, but also weird crates (ex. choosing a crate with 10 github stars when a more complete and more supported one was available), reason why now I always choose the dependencies and then I let the agent work.

2. Yes, some of the provider code could be made using macros, I am just lazy... But thanks for the tip! I will save it for later.

3. No telemetry, and it can be checked thanks to the fact that there are no HTTP calls outside of the MCP implementation (via rmcp) and LLM connectors (via rig)

4. Yes, I set the panic handler to 'abort', thinking that I would get a nice size decrease. I have yet to experience a panic on this project, but I will revert it to the default behavior if the binary size saving is really so small

5. While it is async, the entire project runs on one thread (as expressed in main.rs with `#[tokio::main(flavor = "current_thread")]`), as it allows for a nice ~8MB memory saving (so, 50% off) and no real performance loss, being such a simple tool.

---

P.S. Just switched back to default settings for panic handler

hiAndrewQuinn 1 days ago [-]

Hidden telemetry was my big concern, yes; the abort thing wasn't caught as a security thing by DeepSeek V4 Flash but it was mentioned by Claude 4.7 Opus (I wanted to compare and contrast here), and Flash brought it up later when I asked it about performance tuning.

`cargo add` tip is very helpful, I had a hunch this happened in my own Rust project and I think you just filled in the missing piece for me there.

vlovich123 1 days ago [-]

To me panic=abort is much safer security as it means you’re unlikely to enter weird states due to incorrectly handled unwinding. The only attack vector is a DOS attack which is a short term thing that’s easily rectified.

gidellav 1 days ago [-]

Thanks! Funny enough, a good chunk of the coding was done by DeepSeek v4 Flash, while I hand-wrote some of the TUI logic, as DeepSeek kept failing on certain cursor-moving logic, and I fully managed the memory optimization process (as you can read in another comment I left, it's both a set of compiler optimizations and the use of certain Rust crates to leverage more efficient data structures).

hiAndrewQuinn 1 days ago [-]

Taking notes and comparing this against my own (non coding agent) Rust TUI project, thank you! I'm new to Rust so this is a helpful baseline.

gidellav 1 days ago [-]

No problem, happy to help!

kadoban 1 days ago [-]

> I handed it over to DeepSeek v4 Flash in Pi to skim through for any risky business

Doesn't prompt injection make that a rather flimsy investigation?

wolttam 9 hours ago [-]

The way I see this going is there will be tens of thousands of model harness projects out there, because the tools make it so easy to make a harness that suits your workflows exactly the way you like (as someone who made their own harness).

I also used bwrap for sandboxing. I'm looking at layering slirp4netns, because I found out that models will happily break out of the sandbox via the host network interface.

whazor 17 hours ago [-]

It says inspired by Pi, but I don't see any extension/plugin possibilities. The best feature of Pi is that an extension can hook anywhere and completely change the behavior. It also allows two extensions to stack on the same hook where there are no conflicts.

I believe Pi's extensibility is the most important feature, just as it was for WordPress. WordPress won because anyone could install it and add the plugins they needed. WordPress also has the same hook system, where multiple plugins can build on the same hook.

Companies will want to completely customize their agent harness so it optimally works for their situation.
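The stacking-hook model described above can be sketched as a registry where each event maps to an ordered list of handlers, and each handler receives the previous handler's output. This is a hypothetical Rust illustration. Names like `HookRegistry` and `before_tool_call` are invented, and this is not Pi's actual extension API.

```rust
use std::collections::HashMap;

/// A hook receives the current value (e.g. a tool-call command) and returns a possibly modified one.
type Hook = Box<dyn Fn(String) -> String>;

/// Hypothetical hook registry: several extensions can attach to the same event,
/// and their handlers run in registration order, each seeing the previous output.
#[derive(Default)]
struct HookRegistry {
    hooks: HashMap<&'static str, Vec<Hook>>,
}

impl HookRegistry {
    fn on(&mut self, event: &'static str, hook: impl Fn(String) -> String + 'static) {
        self.hooks.entry(event).or_default().push(Box::new(hook));
    }

    fn run(&self, event: &str, input: String) -> String {
        match self.hooks.get(event) {
            Some(chain) => chain.iter().fold(input, |acc, hook| hook(acc)),
            None => input,
        }
    }
}

fn main() {
    let mut hooks = HookRegistry::default();
    // Extension A: redact a secret from the command text.
    hooks.on("before_tool_call", |cmd| cmd.replace("API_KEY=secret", "API_KEY=***"));
    // Extension B: put a timeout on every shell command.
    hooks.on("before_tool_call", |cmd| format!("timeout 60 {cmd}"));

    let out = hooks.run("before_tool_call", "API_KEY=secret ./deploy.sh".to_string());
    println!("{out}"); // timeout 60 API_KEY=*** ./deploy.sh
}
```

Because handlers compose in order, neither extension needs to know the other exists. Conflicts only arise when two handlers rewrite the same part of the value.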

zrg 16 hours ago [-]

I'm actually very close to being ready to release exactly that, also in Rust. I completely agree with your statement: extensibility is the most important feature.

https://x.com/PandelisZ/status/2055633346831548902

The two things I want to get right before actually releasing it are to properly eval it against other harnesses and make sure it's better.

And the licence. I don't think a GPL licence will yield adoption, so I would like to go with MIT or figure out the right licence.

gidellav 16 hours ago [-]

Check https://news.ycombinator.com/item?id=48164948

krzyk 16 hours ago [-]

The most important feature of Pi is that it is small and has a small system prompt, making it great for local LLMs.

khimaros 1 days ago [-]

i built something with a similar philosophy here: https://github.com/khimaros/airun -- it is intended to be piped and redirected. it discovers skills, AGENTS and prompt templates from Claude Code, Pi.dev, OpenCode and others. no TUI, but does have a basic tool calling loop

$ airun -q -p 'output a shell command for linux to display the current time. output only the command with no other code fencing or prose' | airun -q -s 'review the provided shell command, determine if it is safe, run it only if it is safe, and then summarize the output from the command' --permissions-allow='bash:date *'

gidellav 1 days ago [-]

While I think that the core philosophy is the same, I'd like to ask: why add features like Skills and prompt templates?

I personally decided not to implement Skills and instead use a prompt library approach, where certain .md files are used to fully replace the system prompt, allowing for an approach similar to Skills with ~100 LoC dedicated to this system.

afzalive 1 days ago [-]

Isn't the key thing with skills that the description is used to match them from a prompt that doesn't mention them?

Would a prompt library do that too?

khimaros 12 hours ago [-]

i wanted airun to be drop-in useful in existing Claude/OpenCode/etc projects and skills are common.

c-hendricks 1 days ago [-]

Aren't skills fairly easy to share, and can contain more than one file?

desireco42 1 days ago [-]

Prompts as well... he might be on to something here, can't say as I didn't try it yet

Skills are just prompts

c-hendricks 14 hours ago [-]

Skills are _like_ prompts, yes, they're extra info added to the context. A prompt is just a prompt though, an agent like Claude could use multiple skills in one go, which seems impossible to do with Zerostack.

hedgehog 1 days ago [-]

Most of mine have code in them. That's most of the value.

cobolcomesback 16 hours ago [-]

Skills are not just prompts.. the entire problem that skills solve is runtime discoverability via a skill description. Agents can self-recognize that a skill would be useful in a situation, and then load+use.

Prompts are just text templates entered by the user, and the user must specifically know when to and remember to invoke them. If you’re just using skills as if they are the same as prompts, you’re totally missing out on the entire benefit that skills provide!
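The difference can be sketched in a few lines. With skills, only each skill's name and description go into the system prompt up front, and the full body is loaded when the model decides a skill applies. A prompt library instead swaps the whole system prompt on an explicit user request. The `Skill` struct, file paths and prompt wording below are hypothetical.

```rust
/// Hypothetical skill metadata, as it might be parsed from a skill file's front matter.
struct Skill {
    name: &'static str,
    description: &'static str,
    path: &'static str,
}

/// Builds the short index that is always present in the system prompt.
/// Only names and descriptions are included; bodies stay on disk until needed.
fn skill_index(skills: &[Skill]) -> String {
    let mut out = String::from("Available skills (read the file before using one):\n");
    for s in skills {
        out.push_str(&format!("- {}: {} ({})\n", s.name, s.description, s.path));
    }
    out
}

/// A prompt library, by contrast, replaces the whole system prompt by name.
fn select_prompt<'a>(library: &'a [(&'a str, &'a str)], name: &str) -> Option<&'a str> {
    library.iter().find(|(n, _)| *n == name).map(|(_, body)| *body)
}

fn main() {
    let skills = [
        Skill { name: "fix-ci", description: "diagnose and fix a failing CI build", path: "skills/fix-ci/SKILL.md" },
        Skill { name: "db-query", description: "read-only queries against the dev database", path: "skills/db-query/SKILL.md" },
    ];
    print!("{}", skill_index(&skills));

    let library = [("code", "You are a coding agent..."), ("debug", "You are a debugging agent...")];
    println!("{:?}", select_prompt(&library, "debug"));
}
```

The index is what makes discovery possible: the model sees every description on every turn, so it can pick a skill the user never mentioned.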

tontinton 10 hours ago [-]

Yo that's really similar to my very own https://github.com/tontinton/maki only I'm MIT and you're GPL, cool

halcyonblue 9 hours ago [-]

https://forgecode.dev/ https://github.com/tailcallhq/forgecode is written in Rust too and seems surprisingly capable. How does Zerostack compare to forgecode?

goyozi 22 hours ago [-]

Really neat, I’ll have to try it when I’m at home. Lean, fast tools really make a difference in the coding experience.

I’m curious how the prompts idea performs in practice compared to typical skills and subagents. I frequently combine the two to get otherwise tricky workflows done. Say I have a failing build. I invoke my /fix-ci skill (sometimes in the same context I made the code change in), it launches a subagent to extract an error message / stack traces / relevant logs, and works through the problem. Say an integration test ran into a db query issue. Sometimes the agent itself, sometimes with a slight nudge from me, will load the readonly db access skill and start investigating. If I expect long, deep shenanigans, I’ll often say something like „use a sonnet subagent and instruct it to use the db query skill to debug the behavior we’re seeing”. And it can keep going like that: skills give extra capabilities on the fly, subagents isolate context to prevent bloat. Intuitively, it seems that having the agent run itself via bash with different prompts _might_ come close, but be a bit less streamlined? I’d have to check and see.

gidellav 22 hours ago [-]

Well... for the most part, you use it like skills, but instead of "commands" you can think of "environments": so '/prompt debug', which is one of the integrated prompts, allows for a debug-focused agent, you can then talk to it as a normal agent, and then '/prompt code' to go back to the standard coding agent.

About subagents: as of right now, the entire agent runs on one context buffer, so it doesn't support subagents in order to keep it lean; but there is a great chance that subagents will be added, as explore-heavy tasks often bloat the context window
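The context-bloat argument for subagents can be sketched as follows. The subagent starts from a fresh message list and does its noisy exploration there, and only a short summary is appended to the parent's context. This is a hypothetical illustration. `fake_model` stands in for an LLM call, and nothing here reflects zerostack's or Claude Code's internals.

```rust
/// Stand-in for an LLM call: here it just reports how much context it was given.
fn fake_model(context: &[String]) -> String {
    let bytes: usize = context.iter().map(|m| m.len()).sum();
    format!("summary of {} messages ({} bytes)", context.len(), bytes)
}

/// Runs an exploration task in an isolated context and returns only its summary.
fn run_subagent(task: &str, tool_outputs: &[String]) -> String {
    let mut ctx = vec![format!("task: {task}")];
    ctx.extend(tool_outputs.iter().cloned()); // the noisy part stays in here
    fake_model(&ctx)
}

fn main() {
    let mut parent = vec!["user: why does the integration test fail?".to_string()];
    let logs: Vec<String> = (0..50)
        .map(|i| format!("log line {i}: {}", ".".repeat(200)))
        .collect();

    let summary = run_subagent("extract the failing query from the logs", &logs);
    parent.push(summary);

    // The parent grew by one short message instead of 50 log lines.
    println!("parent has {} messages", parent.len());
}
```

The trade-off is that the parent only knows what the summary says. If the subagent drops a detail, the parent cannot recover it without re-running the exploration.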

post_below 21 hours ago [-]

It sounds like you're saying that /prompt changes the system message part of the session. Doesn't that cause a cache break and result in higher usage/cost?

post_below 19 hours ago [-]

I took a quick look at the source code and it looks like, yes, using /prompt during a session will rebuild the session with a new preamble/system prompt, causing a full cache miss on the next turn.

So in that way it's not like skills at all, neither of those result in paying full read price on the entire session, just the skill prompt itself.

Something else I noticed... In the Anthropic implementation it doesn't seem to be using 'cache_control' in the body. Assuming my understanding is current, without that the Anthropic API won't do any caching at all (unlike most other APIs that do some level of automatic caching without it being requested). So that would result in paying full read price on every turn.

Of course I could be missing something, it was a quick look. Can you clarify?
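The cache-miss point follows from how prefix caching works: a provider can reuse cached work only for the longest prefix that matches a previous request exactly. Below is a minimal sketch that models a request as a list of segments, with the system prompt first, and counts how much of the previous request is reusable. Real providers match on tokens and have their own block sizes and rules, which this ignores.

```rust
/// Number of leading segments shared by two requests: the part a prefix cache could reuse.
fn reusable_prefix(prev: &[&str], next: &[&str]) -> usize {
    prev.iter().zip(next).take_while(|(a, b)| a == b).count()
}

fn main() {
    let prev = ["SYSTEM: coding agent", "user: add a flag", "assistant: done", "user: now test it"];

    // Skill-style: keep the system prompt and append the skill text at the end.
    let with_skill = [prev[0], prev[1], prev[2], prev[3], "skill: how to run the test suite"];

    // /prompt-style: the same history replayed under a new system prompt.
    let swapped = ["SYSTEM: debugging agent", prev[1], prev[2], prev[3]];

    println!("append skill: {} of {} segments reusable", reusable_prefix(&prev, &with_skill), prev.len());
    println!("swap system prompt: {} of {} segments reusable", reusable_prefix(&prev, &swapped), prev.len());
}
```

Appending keeps all four earlier segments cacheable. Changing the very first segment makes nothing reusable, even though the rest of the history is identical.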

GTonehour 17 hours ago [-]

I tried to list the competing open-source AI coding agents to compare their popularity over time — opencode wins for now.

https://www.star-history.com/?repos=anthropics%2Fclaude-code...

nextaccountic 16 hours ago [-]

> Bash execution ... optional sandboxing for isolation

Sandboxing should be the default. Rather than routinely allowing unsandboxed access, one should be able to configure the sandbox to allow exactly what is needed

That's hard. For example, I've been unable to give wayland access to agents inside the sandbox (there's a special flag in bubblewrap to mount /dev/dri in a way you can make use of it, but you also must give access to the wayland socket, and maybe other things). So I think that maybe harnesses should invest in more sandboxing resources
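Sandbox-by-default with an explicit allowlist could look roughly like the sketch below. The policy is sandboxed unless explicitly disabled, and each extra capability is a named mount added to the bubblewrap arguments. The struct and field names are hypothetical. The Wayland entries, which use `--dev-bind` for `/dev/dri` and a plain bind for the compositor socket under the user's runtime directory, are an assumption about what such access needs. As noted above, other pieces may be required too.

```rust
/// Hypothetical sandbox policy: sandboxing is on unless explicitly disabled,
/// and everything beyond the project directory must be allowlisted.
struct SandboxPolicy {
    enabled: bool,
    /// Extra read-write binds (host path, sandbox path).
    binds: Vec<(String, String)>,
    /// Device nodes that need `--dev-bind` rather than a plain bind.
    dev_binds: Vec<String>,
    allow_network: bool,
}

impl Default for SandboxPolicy {
    fn default() -> Self {
        Self { enabled: true, binds: Vec::new(), dev_binds: Vec::new(), allow_network: false }
    }
}

impl SandboxPolicy {
    /// Builds bubblewrap arguments for running `cmd` in `workdir` under this policy.
    /// Returns None when sandboxing is disabled (the caller would run `cmd` directly).
    fn bwrap_args(&self, workdir: &str, cmd: &str) -> Option<Vec<String>> {
        if !self.enabled {
            return None;
        }
        let mut a: Vec<String> = ["--ro-bind", "/", "/", "--dev", "/dev", "--proc", "/proc", "--unshare-all"]
            .iter()
            .map(|s| s.to_string())
            .collect();
        if self.allow_network {
            a.push("--share-net".into()); // opt back into the host network
        }
        for (src, dst) in &self.binds {
            a.extend(["--bind".to_string(), src.clone(), dst.clone()]);
        }
        for dev in &self.dev_binds {
            a.extend(["--dev-bind".to_string(), dev.clone(), dev.clone()]);
        }
        a.extend(["--bind".to_string(), workdir.to_string(), workdir.to_string()]);
        a.extend(["--chdir".to_string(), workdir.to_string()]);
        a.extend(["sh".to_string(), "-c".to_string(), cmd.to_string()]);
        Some(a)
    }
}

fn main() {
    let mut policy = SandboxPolicy::default();
    // Opt-in extras for a GUI test run (assumed paths; adjust to your system).
    policy.dev_binds.push("/dev/dri".into());
    policy.binds.push(("/run/user/1000/wayland-0".into(), "/run/user/1000/wayland-0".into()));
    println!("{:?}", policy.bwrap_args("/home/me/proj", "cargo test"));
}
```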

gidellav 16 hours ago [-]

This is actually a topic of current interest, and I think that I will switch to a sandbox-by-default once the bwrap implementation inside of zerostack is well tested and highly configurable.

sinansaka 20 hours ago [-]

Love it! I think the minimal approach you took is the right path forward. As others mentioned, small harnesses make it possible to run many agents in parallel and in small cloud instances. working on a minimal agent in Go myself for this use case.

martingxx 18 hours ago [-]

I wonder how this compares to tau https://tau-agent.dev/ ?

Both are in Rust and both mention Unix in their descriptions.

mohsen1 24 hours ago [-]

This is much needed!

Compared to Codex CLI, Claude Code is insanely slow.

    $  time claude --version
    2.1.143 (Claude Code)

    ________________________________________________________

    Executed in    4.39 secs      fish           external
    usr time   29.68 millis    0.26 millis   29.41 millis
    sys time   71.30 millis    1.30 millis   70.00 millis

5 seconds to show me the version number!

I'm guessing Claude Code also needs a rewrite in Rust. But from what I saw in the leaked TypeScript code, a line-to-line port will be pretty bad. It requires a new architecture that matches Rust idioms

nomel 23 hours ago [-]

Note that includes network requests to check latest version.

I suspect we'll soon see someone make a persistent Claude shell mode, with the reverse of a !, where you work in shell and send a message to Claude, and Claude sees all the context.

marcosscriven 21 hours ago [-]

What version of time is giving you that kind of output?

pramodbiligiri 17 hours ago [-]

Looks like that time command was invoked from "fish" shell: https://fishshell.com/docs/current/cmds/time.html

zoobab 15 hours ago [-]

I tried to install opencode on my x200 laptop, it would segfault as Bun wants some specific intel processor extensions (SIMD).

Now I tried to install zerostack, but the compilation freezes at a certain package.

Is there a static binary available for linux?

zoobab 9 hours ago [-]

I finally managed to compile it, quite happy with the usage.

Will try to rebuild it with static flag.

ianberdin 10 hours ago [-]

Don’t get me wrong, but 7K LoCs means it is still an early attempt to make a coding agent. It starts easy “ah it can edit and read files!”, but it requires a lot of extra effort to make properly for many edge cases, especially caching, price optimizations, etc.

I’ve been implementing a custom coding agent in https://playcode.io for 3 years already. Far beyond 7K LoCs.

So when you compare to “shitty slow” Claude code - I don’t agree.

gidellav 10 hours ago [-]

Check what tools we already implemented, check your "slow" accusation, check the prompt system, check the provider integration (via Rig, so caching is already enabled), check the MCP support and other integrations that you don't even find on some major agents (git worktrees + loops).

For 3 years, your Lovable clone is something that Claude Code could make in a couple of days, but good luck shitting on other projects I guess.

nopurpose 14 hours ago [-]

How would one create custom tools for it? opencode offers a TS SDK for it, but with Rust it will be something more heavyweight, like a gRPC bridge (similar to how Terraform providers work).

tsiao1999 19 hours ago [-]

I’m also playing around with Rust for building agents—my setup ends up looking a lot like ZeroStack’s approach. If anyone’s curious, my project is here: https://github.com/7df-lab/devo

Fuzzwah 17 hours ago [-]

The screenshots in your readme all 404

Phlogi 22 hours ago [-]

Looks interesting, how would you use skills with that? Would I need to migrate them into prompts? Which I think is not the same.

E.g. how to use official, vendor provided skills with zerostack? https://github.com/elestio/elestio-skill

ffsm8 22 hours ago [-]

Technically, a skill is equivalent to adding

'"The skill description": if this applies, read /path/to/skill/definition.md'

To your agents.md

At least currently skills don't let you set the model (to my knowledge), so that's not a distinction either here (it would be with agent definitions)

inciampati 1 days ago [-]

> Integrated Ralph Wiggum loops: looping capabilities for long-horizon tasks

Imo, this shouldn't be embedded in the executor layer. Orchestration should handle this.

gidellav 1 days ago [-]

I get you, but when I decided to follow a no-skills approach (as in, no agent's Skills used), I had to decide what:

1. Couldn't be built only using prompts

2. Couldn't be built only using MCP servers

3. Would have improved my UX (and, I hope, yours).

From those three conditions, I chose integrated git worktrees and loops
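For readers who haven't met the term: a Ralph Wiggum loop re-runs the agent with the same prompt over and over, relying on files and git history (not the chat context) to carry progress between iterations. Below is a minimal bounded sketch. `run_agent_once` is a hypothetical stand-in for one non-interactive agent run. The `DONE` marker and iteration budget are added assumptions and do not describe zerostack's actual loop mechanism.

```rust
/// Stand-in for one non-interactive agent run with a fresh context.
/// Here it "finishes" on the third attempt to keep the example self-contained.
fn run_agent_once(prompt: &str, iteration: u32) -> String {
    if iteration >= 3 {
        format!("worked on '{prompt}'. DONE")
    } else {
        format!("worked on '{prompt}', more to do")
    }
}

/// Feeds the same prompt to fresh agent runs until the output reports completion
/// or the iteration budget runs out. Returns the number of iterations used.
fn ralph_loop(prompt: &str, max_iterations: u32) -> Option<u32> {
    for i in 1..=max_iterations {
        let output = run_agent_once(prompt, i);
        if output.contains("DONE") {
            return Some(i);
        }
    }
    None
}

fn main() {
    match ralph_loop("make all tests in ./tests pass", 10) {
        Some(n) => println!("finished after {n} iterations"),
        None => println!("gave up"),
    }
}
```

Written this way the loop sits entirely outside the agent, which is the orchestration-layer placement argued for above. Building it in trades that separation for a one-command UX.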

qsera 1 days ago [-]

Is AI the new Waterfall/Agile methodology, with all the lingo/terminology/names that make no damn sense?

Appears so, because I am so turned off by it...

noodletheworld 1 days ago [-]

Are agent harnesses the new web framework?

Everyone wants to write one, building a new one is easy to start with, but tough to get to “prod ready” and the landscape is littered with failed attempts?

Certainly feels like it.

This is really good though; works well and at least has a clearly articulated raison d'être.

spectaclepiece 23 hours ago [-]

The key thing with pi is that it can extend itself. How does that work when it’s written in rust?

nextaccountic 16 hours ago [-]

The usual way to make a Rust program extensible is to embed a wasm interpreter. Then the agent can extend it by writing an extension in Rust or any other language that compiles to wasm. Zed does it for example

adastra22 22 hours ago [-]

That's a bit like saying "the key thing with Lisp is that it can extend itself." Yes, that is a core feature and a lot of people use it for that reason. But not everyone. Others use pi just because it is a small agent harness, but don't need (or don't want) the self-extensibility.

perlgeek 11 hours ago [-]

Are there any pre-built Linux binaries for this? I tried to install it with cargo, but got "feature `edition2024` is required" (even though I'm on the newest cargo available from my current Ubuntu distro).

Also, can I configure zerostack to always require a sandbox? I don't want to accidentally forget to call it with --sandbox.

tedshark 21 hours ago [-]

New to this, but what's the benefit over something like Claude Code?

frabcus 21 hours ago [-]

Make harness independent of model, so when pricing or quality changes you can switch.

Avoid lock in to stack from one provider (things like a harness that only works with models from one provider and so on).
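The "independent of model" idea usually comes down to the harness talking to a provider interface rather than to one vendor's API. A minimal sketch, assuming a synchronous text-in/text-out interface; `ChatProvider`, `HostedStub`, `LocalStub` and `provider_from_config` are hypothetical names, and the stubs just echo instead of making real HTTP calls.

```rust
/// The only thing the rest of the harness depends on.
pub trait ChatProvider {
    fn name(&self) -> &str;
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Stand-in for a hosted API (a real one would make an HTTP request).
pub struct HostedStub;
/// Stand-in for a local model server running on your own hardware.
pub struct LocalStub;

impl ChatProvider for HostedStub {
    fn name(&self) -> &str {
        "hosted"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[hosted] {prompt}"))
    }
}

impl ChatProvider for LocalStub {
    fn name(&self) -> &str {
        "local"
    }
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("[local] {prompt}"))
    }
}

/// Pick a provider by name, e.g. from a config file or CLI flag, so that
/// switching models is a config change rather than a code change.
pub fn provider_from_config(name: &str) -> Option<Box<dyn ChatProvider>> {
    match name {
        "hosted" => Some(Box::new(HostedStub)),
        "local" => Some(Box::new(LocalStub)),
        _ => None,
    }
}
```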

Use local models (a couple of them do work a bit now, if you have 20Gb video RAM), which saves money and is more private, and works offline.

Can improve the harness, fix bugs in it, make it compatible with different systems and techniques.

This game happens every time in new cycles of developer technology. The good bet historically has always been to use open source - there's a reason most developer tooling just before the AI revolution was open source (even things like Java and .NET, which used to be proprietary).

DeathArrow 17 hours ago [-]

>Make harness independent of model

You can use Claude Code with almost any model.

>Use local models (a couple of them do work a bit now, if you have 20Gb video RAM), which saves money and is more private, and works offline.

You can do that with Claude Code.

timwis 21 hours ago [-]

Different harness (pi), but this blog post may partially answer your question: https://mariozechner.at/posts/2025-11-30-pi-coding-agent/

sergiotapia 1 days ago [-]

Given agent harnesses affect so much of the performance of models, it would be great to see some kind of benchmark on how this tool performs compared to claude/codex/opencode/pi etc.

gidellav 1 days ago [-]

Hi! While I didn't try any agent benchmark, I had already thought of this possible issue, and I tried to approach it on two different levels:

1. The tools given to the agent are almost the same as the ones defined in Opencode, except for Skills and Subagents (neither of which is implemented in zerostack)

2. Zerostack is prompt-based: it ships with a set of .md files, stored in ~/.config/zerostack/prompt, that can be selected from the TUI to activate different 'agents'. As you can see from the README, it is designed to include the most important features of superpowers + Claude's front-end design + git worktree support and Ralph Wiggum loops (the last two as integrated features)
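The "agents are just prompt files" approach described above can be sketched as a loader that treats every `*.md` file in a directory (such as `~/.config/zerostack/prompt`) as a selectable agent named after its file stem. This is an illustration under that assumption; `PromptAgent` and `load_prompt_agents` are hypothetical, and zerostack's real loader and file layout may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One selectable "agent": its name (the file stem) and its Markdown prompt.
#[derive(Debug, PartialEq)]
pub struct PromptAgent {
    pub name: String,
    pub prompt: String,
}

/// Load every `*.md` file in `dir` as an agent, sorted by name.
pub fn load_prompt_agents(dir: &Path) -> io::Result<Vec<PromptAgent>> {
    let mut agents = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path: PathBuf = entry?.path();
        let is_md = path.extension().map_or(false, |ext| ext == "md");
        if !is_md || !path.is_file() {
            continue;
        }
        let name = match path.file_stem().and_then(|s| s.to_str()) {
            Some(stem) => stem.to_string(),
            None => continue, // skip non-UTF-8 file names
        };
        let prompt = fs::read_to_string(&path)?;
        agents.push(PromptAgent { name, prompt });
    }
    agents.sort_by(|a, b| a.name.cmp(&b.name));
    Ok(agents)
}
```

A TUI picker would then only need the `name` list, and selecting an entry swaps its `prompt` in as the system prompt.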

esafak 1 days ago [-]

It's been said before, but it is important to prospective users, so it bears repeating: screenshots and benchmarks, please; it helps users decide whether to invest time in it. The ability to transfer settings from other agents would be great too.

gidellav 1 days ago [-]

1. I will add some screenshots tomorrow

2. As said before, there are no benchmarks right now, but it is good enough for me, so I hope it's good enough for y'all :)

3. Transferring settings from other agents is out of scope for a minimalistic coding agent; the idea is that, apart from MCP servers, the rest might just force you to learn how zerostack works, because of design choices such as not having Skills or having certain specialized tools integrated (worktrees and loops).

theusus 1 days ago [-]

I absolutely like this. Pi becomes sluggish after installing a couple of extensions. I myself was trying to port Pi to Rust, but it was consuming too many tokens.

Is there an API like Pi's so that I can create extensions?

esperent 1 days ago [-]

It absolutely doesn't. It must be the extensions you're using.

What I've found is that nearly every extension on the official pi.dev/packages is vibe-coded trash, like for example the most popular subagents extension.

Instead of just giving you a basic subagent, it's a whole kitchen sink of recursion, teams, chains, confusingly named agents like "oracle" etc. Basically feels like someone kept prompting "what else could we add here?".

They're all like that. It's no wonder these slow down pi.

What I've done is just have the agent write my own.

Get a local copy of, e.g., that kitchen-sink subagents extension. Have the agent list all the features, then give it back a much smaller list of the features you want and say "write me a new extension with just these features". Every time, it one-shots it (usually with GPT 5.3), and 20-30 minutes later I have a working, lightweight extension tuned to my exact workflow.

I've done this for I guess about 8 extensions now (subagents, a lightweight typescript LSP, web search, background processes, Claude style hooks, plan mode are the main ones) and it's very fast and snappy.

theusus 24 hours ago [-]

Still they are maintained by those developers. I cannot spend my time developing extensions. I'd rather do that in Rust.

esperent 23 hours ago [-]

Then pi is probably not for you, as doing this is pretty much the whole selling point. You could try oh-my-pi or OpenCode instead.

0xAstro 20 hours ago [-]

These simple harnesses perform the best in my day-to-day experience, but I still can't figure out why that's the case.

jwpapi 20 hours ago [-]

Because they don’t have an incentive to maximize your usage, but rather focus on solving probabilistically solvable problems for you.

Bigger harnesses need to balance upping your token usage and being helpful.

eddy-sekorti 15 hours ago [-]

How is it any faster than something written in any other programming language?

2001zhaozhao 22 hours ago [-]

Hmm, Claude Code and Opencode work fine for me.

It's a bit amusing that coding agents rely on drawing 1000W+ and using 2TB+ of memory in a datacenter to run, yet people really focus on the last few watts and few hundred megabytes of memory on their laptop (which get dwarfed by the energy cost of compiling their code anyways). But I suppose making them a bit faster and lighter wouldn't hurt.

kvdveer 22 hours ago [-]

The data centre runs on a dedicated power line. My laptop runs on battery. Using coding agents currently drains battery quite fast, which is surprising, given that the vast majority of the work does not take place on my laptop.

Making the client side coding agent more efficient isn't about saving the climate. It is about extending the workday (which might actually make the climate worse)

remus 21 hours ago [-]

I think this is overly reductive. For sure the models are behemoths and consume a lot of resources, but the harness can have a big impact on how much the model is used. For example, having a strong set of tools available in the harness means the model can work much more efficiently.

NewJazz 21 hours ago [-]

It is also just an indicator of the planning and polish that a particular harness may have.

huflungdung 22 hours ago [-]

[dead]

teiferer 19 hours ago [-]

Could we finally put the whole "written in pure Rust" thing, as if it were a certificate of quality, to rest? You can write crap in Rust, you can write excellent software in Rust, and both go for all other languages too. From a quality POV, I don't care what language you used for a project. Slop is slop, no matter Rust or JS or C.

born-jre 22 hours ago [-]

Sorry, it looks like we were not able to load the page. Please make sure your network connection works and you are using an up-to-date browser. If the issue persists, please visit our issue tracker to report the problem

Got this on Firefox for iPhone

gidellav 22 hours ago [-]

Retry from Safari, sometimes it works better

slopinthebag 1 days ago [-]

I love these. Coding agents aren't very difficult to build, it's a TUI + tools + getting a nice agent loop working. The hardest part seems to be supporting all of the different providers and model quirks. What is interesting is seeing the experimentation: some provide tons of tools, others provide a single python interpreter and have the agent use tools via sandboxed python scripts, others use minimal tools and lean on bash. Personally I want a harness that gives a ton of control to the user to let them steer the LLM, less agent and more augmentation. Maybe I'll have to build it myself. If anyone has ideas, let me know.
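The "TUI + tools + agent loop" description maps onto a small core loop: ask the model, run whichever tool it requests, append the result to the transcript, and repeat until it gives a final answer. The sketch below leaves out the TUI and the HTTP client; `Reply`, `Model`, `Tool` and `run_agent` are hypothetical names, and the model is a trait so a scripted fake can stand in for an LLM.

```rust
use std::collections::HashMap;

/// What the model asks for on each turn.
#[derive(Debug, Clone)]
pub enum Reply {
    ToolCall { tool: String, input: String },
    Final(String),
}

/// Anything that can produce the next reply from the transcript so far.
/// A real implementation would send `history` to an LLM API.
pub trait Model {
    fn next(&mut self, history: &[String]) -> Reply;
}

/// A tool is just a function from input text to output text here.
pub type Tool = fn(&str) -> String;

/// Core agent loop with a turn budget; returns None if the budget runs out.
pub fn run_agent(
    model: &mut dyn Model,
    tools: &HashMap<&str, Tool>,
    task: &str,
    max_turns: usize,
) -> Option<String> {
    let mut history = vec![format!("user: {task}")];
    for _ in 0..max_turns {
        match model.next(&history) {
            Reply::Final(answer) => return Some(answer),
            Reply::ToolCall { tool, input } => {
                // Unknown tools are reported back to the model, not fatal.
                let output = match tools.get(tool.as_str()) {
                    Some(f) => f(&input),
                    None => format!("error: unknown tool `{tool}`"),
                };
                history.push(format!("tool {tool}({input}) -> {output}"));
            }
        }
    }
    None
}
```

Most of the differences between harnesses mentioned above (many tools vs. a Python interpreter vs. mostly bash) live in what goes into the `tools` map and how results are rendered into `history`.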

inhumantsar 22 hours ago [-]

I'm working on one right now where nearly everything can be expressed as a combination of workflows. There will be some built-in agent types out of the box but all the Lego pieces are there if you want to put together something different.
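One way to read "everything is a combination of workflows" is a single step trait plus a few combinators that compose into larger workflows. This is a guess at the general shape, not inhumantsar's design; `Step`, `FnStep`, `Sequence` and `Retry` are hypothetical, and steps pass plain text for simplicity.

```rust
/// One composable piece of a workflow.
pub trait Step {
    fn run(&self, input: String) -> Result<String, String>;
}

/// Wrap any closure as a step.
pub struct FnStep<F>(pub F);

impl<F: Fn(String) -> Result<String, String>> Step for FnStep<F> {
    fn run(&self, input: String) -> Result<String, String> {
        (self.0)(input)
    }
}

/// Run steps in order, feeding each output into the next; stop at the first error.
pub struct Sequence(pub Vec<Box<dyn Step>>);

impl Step for Sequence {
    fn run(&self, mut input: String) -> Result<String, String> {
        for step in &self.0 {
            input = step.run(input)?;
        }
        Ok(input)
    }
}

/// Retry an inner step up to `attempts` times, returning the last error on failure.
pub struct Retry<S: Step> {
    pub inner: S,
    pub attempts: usize,
}

impl<S: Step> Step for Retry<S> {
    fn run(&self, input: String) -> Result<String, String> {
        let mut last_err = String::from("no attempts made");
        for _ in 0..self.attempts {
            match self.inner.run(input.clone()) {
                Ok(out) => return Ok(out),
                Err(e) => last_err = e,
            }
        }
        Err(last_err)
    }
}
```

Since `Sequence` and `Retry` are themselves steps, built-in agent types could be predefined compositions while users assemble their own from the same pieces.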

michalsustr 22 hours ago [-]

What language are you building this in? I’m interested but trying to stay away from js world for security reasons.

inhumantsar 12 hours ago [-]

The system and plugins are Rust. Workflows can be defined in a plugin with Rust or externally with YAML.

Might add support for custom WASM plugins down the road, but everything shipped with the system will be Rust.

afzalive 1 days ago [-]

Pi.dev is pretty good at giving tons of control to the user and has extensions that you can easily build.

Although people are complaining about its RAM usage in this thread, I haven't bothered to check how much RAM it uses.

slopinthebag 4 hours ago [-]

I refuse to run npm slop on my hardware

usernametaken29 1 days ago [-]

Now make it into an IntelliJ plugin which has proper access to the search index. I’ll pay for it. For Christ's sake, it’s insane JetBrains hasn’t figured this out yet

gidellav 22 hours ago [-]

I am currently deciding on adding ACP support or not (and ACP support should allow connections to JetBrains's IDEs)

upcoming-sesame 18 hours ago [-]

Yes please.

TUIs are cool but sometimes people prefer staying in the IDE

nullorempty 1 days ago [-]

I think this is such an opportunity for JetBrains. I talked to them about this at AWS re:Invent; strangely, they could really see how strong a position they are in, if only they paid attention to the right thing!

usernametaken29 1 days ago [-]

They even have this already, Junie, but of course the plugin version cannot use BYOK….

kirtivr 1 days ago [-]

Jetbrains does not have their own IDE-integrated coding agent?

What do Jetbrains users use then? Amp?

krzyk 15 hours ago [-]

What is the use case for integrating coding agent in IDE?

I run agents outside of my IDE; while they work, I can look at the code they created, or I can use the IDE to do different work.

sgarman 24 hours ago [-]

https://www.jetbrains.com/junie/

usernametaken29 23 hours ago [-]

Junie does not support BYOK inside the IDE

leonsmith 21 hours ago [-]

Has this position recently changed? It states this on the marketing page?

> Use a JetBrains AI subscription or connect your preferred provider with Bring Your Own Key (BYOK).

Ardren 18 hours ago [-]

It seems confusing. My understanding is that the AI Assistant part (i.e. chat) is configurable, but Junie in the IDE is only available via credits through JetBrains.

https://youtrack.jetbrains.com/articles/SUPPORT-A-1833/What-...

(To make it more confusing, Junie CLI seems to say it will work with any provider)

PythonLuvr 17 hours ago [-]

[flagged]

Mashimo 21 hours ago [-]

What does the k stand for? Key?

You can add any OpenAI-compatible API endpoint you want, no?

usernametaken29 17 hours ago [-]

No, you have to buy their subscription within the IDE

Mashimo 14 hours ago [-]

The JetBrains AI Assistant plugins says:

> Choose how AI runs by selecting built-in AI models from top-tier providers, bringing your own API keys or connecting local models.

And the AI Assistant in turn can use Junie.

At least that is what the plugin overview says, I have not tested it.

dtauzell 23 hours ago [-]

Does the IntelliJ mcp server do that? It has find tools

rw_panic0_0 19 hours ago [-]

What does "Unix-inspired" mean here?

deagle50 1 days ago [-]

Looks promising, is OpenAI subscription support planned?

hparadiz 1 days ago [-]

this is what I've been waiting for

a low level language. please no more scripting language TUIs!

nine_k 1 days ago [-]

Rust, a language with affine types, generics, lifetimes, deep static analysis, hygienic macros, etc is not low-level. It's nearly as high-level as Haskell (without HKTs though).

It just does not rely on GC and lets you manage resources efficiently. This efficiency is partly due to its being so high-level.

gidellav 1 days ago [-]

While I agree that it lets you manage resources efficiently, I don't agree that the efficiency derives from it being high-level; from a purely technical standpoint, I could skim 2-3MB off the memory footprint by writing the code in pure C, as there are some unused parts of Rust's std that cannot be removed without recompiling std.

This is obv only a technical talk, as writing an AI TUI in pure C would be rather... ehhh

nine_k 1 days ago [-]

That's why I said "part of its efficiency". Rust can do RAII, can optimize things more aggressively because mutable references never alias in safe code, and because of known lifetimes, it can offer fearless concurrency™. Rust can also support highly optimized data representations (see how Option works, or other ADTs, etc.), which languages like Haskell, to say nothing of Python, cannot offer because of GC and boxing.
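The `Option` point can be checked directly: for types that have an unused bit pattern (a null pointer, zero in a `NonZero*` integer), Rust stores `None` in that pattern, so the `Option` costs no extra space. The standard library documents this guarantee for references, `Box<T>` and the `NonZero*` types; a plain `u32` has no spare pattern, so `Option<u32>` has to be larger. The function names below are just for illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// (inner size, Option size) pairs for types with a "niche":
/// `None` reuses a bit pattern the inner type can never hold.
pub fn niche_sizes() -> [(usize, usize); 3] {
    [
        (size_of::<&u64>(), size_of::<Option<&u64>>()),
        (size_of::<Box<[u8; 64]>>(), size_of::<Option<Box<[u8; 64]>>>()),
        (size_of::<NonZeroU32>(), size_of::<Option<NonZeroU32>>()),
    ]
}

/// A plain u32 uses every bit pattern, so Option<u32> needs a separate
/// discriminant (8 bytes in total on common 64-bit targets).
pub fn no_niche_sizes() -> (usize, usize) {
    (size_of::<u32>(), size_of::<Option<u32>>())
}
```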

Lower-level languages like Zig or even Go, to say nothing of C, lack many of the high-level language features that power this efficiency.

onlyrealcuzzo 1 days ago [-]

Agreed, Rust is way more expressive than people give it credit for.

schaefer 1 days ago [-]

There has been no reason to wait... Codex is written in rust.

-- So is deepseek-tui.

hparadiz 1 days ago [-]

Forgot to add an open source qualifier. I use codex lol

andxor 1 days ago [-]

Codex is also opensource.

hparadiz 1 days ago [-]

I don't really want something owned by a company for my local stuff. I'd prefer it be small and minimalistic. Maybe in the future I'll change my mind and it will be more like a browser but for now I wanna keep it small and local.

gidellav 1 days ago [-]

Thanks! I don't think that the only advantages are being open and lightweight, but you can actually find some more interesting features such as Ollama support, integrated Prompts (in order to compete with superpowers), git worktrees integration, and so on

iknowstuff 1 days ago [-]

Isn’t codex in rust?

rvz 1 days ago [-]

yes.

cyberpunk 20 hours ago [-]

How come the official codex install instructions say use npm install?

(I just rebuilt my sandbox vm a few days ago….)

Or are there two separate codex clients?

https://developers.openai.com/codex/cli

nicoritschel 11 hours ago [-]

The one from npm is signed by OpenAI, which means computer use from the CLI. The brew distribution requires using the Codex app for computer use.

Thanks Apple.

krzyk 15 hours ago [-]

Because people are crazy, usage of npm for installing binaries is quite common unfortunately.

cyberpunk 14 hours ago [-]

So …. do I understand it right? openai, one of the hottest companies on the planet right now, with very deep pockets, distribute their official rust cli via the … public npm repo?

krzyk 14 hours ago [-]

Yes.

There is also homebrew install.

icase 16 hours ago [-]

omfg stop

nobody actually cares about rust, let alone likes it

choopachups 1 days ago [-]

dude, im actually in disbelief how long we put up with the pile of shit that is claude code.

NamlchakKhandro 6 hours ago [-]

No extensions? I think you've missed the point

tencentshill 24 hours ago [-]

This may be the most HN post I have ever seen.

DeathArrow 22 hours ago [-]

IMO, the problem with Claude Code, OpenCode, Pi is the harness quality and convincing the agents to do the exact things you need, to define workflows and make the agents stick to it. I didn't experience performance issues.

For example I have an agent in Claude Code that has strict rules to do something before implementing every phase in the plan. Sometimes it decides not to do it. "But, wait the feature is simple enough so I can proceed straight to implementation..."

Just because this is written in Rust won't solve the biggest issues most users have with coding agents.

bhaak 19 hours ago [-]

But that‘s not an issue with the coding agent. It’s the model that doesn’t follow the instructions.

Given how an LLM works, you can never be sure it will always work. LLMs are not deterministic.

DeathArrow 17 hours ago [-]

Isn't a harness supposed to guide and steer the coding agent?

bhaak 16 hours ago [-]

While the harness can block certain actions (e.g., tool usage), it can’t enforce perfect adherence to instructions because the model itself is probabilistic. The harness can reduce deviations, but it can’t eliminate the fundamental unpredictability of LLMs.
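The distinction here can be made concrete: a rule like "do X before implementing each phase" can be moved out of the prompt and into a harness check that refuses out-of-order tool calls. This does not make the model comply; it only guarantees that write tools fail until the required step has run. The tool names, the phase model and `PhaseGate` itself are hypothetical.

```rust
use std::collections::HashSet;

/// A deterministic gate enforced by the harness rather than the prompt:
/// write tools are refused until `required_step` has run in this phase.
pub struct PhaseGate {
    required_step: &'static str,
    write_tools: HashSet<&'static str>,
    step_done: bool,
}

impl PhaseGate {
    pub fn new(required_step: &'static str, write_tools: &[&'static str]) -> Self {
        Self {
            required_step,
            write_tools: write_tools.iter().copied().collect(),
            step_done: false,
        }
    }

    /// Called by the harness before executing any tool the model requests.
    /// An Err would be sent back to the model as the tool result.
    pub fn check(&mut self, tool: &str) -> Result<(), String> {
        if tool == self.required_step {
            self.step_done = true;
            return Ok(());
        }
        if self.write_tools.contains(tool) && !self.step_done {
            return Err(format!(
                "`{tool}` blocked: run `{}` first in this phase",
                self.required_step
            ));
        }
        Ok(())
    }

    /// Reset at the start of each plan phase.
    pub fn next_phase(&mut self) {
        self.step_done = false;
    }
}
```

The model can still try to skip the step; the difference is that the skip surfaces as a blocked tool call instead of silently succeeding.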

The rules that are fed into the AI are not unbreakable laws to the AI. We should always remember that.

DeathArrow 22 hours ago [-]

How does this do in SWE-Bench Pro and Terminal Bench?

phplovesong 23 hours ago [-]

Does anyone use Claude with custom agents? IIRC they banned that use and only allow Claude's own agent.

shepherdjerred 23 hours ago [-]

You can use Claude with other harnesses at API costs, but you cannot use it with your Claude Code sub. That's changing next month though, I guess https://support.claude.com/en/articles/15036540-use-the-clau...

DeathArrow 21 hours ago [-]

I use Claude Code with GLM 5.1, MiniMax M2.7, Kimi K2.6 and Xiaomi MiMo V2.5 Pro.

rvz 24 hours ago [-]

As you can see, writing a coding agent in a compiled language makes a ton of sense and gives the benefits of running multiple agents efficiently instead of running into leaks and tools consuming gigabytes of RAM.

_user_account 18 hours ago [-]

That makes no sense; coding harnesses are just subprocess wrappers + HTTP calls. What is the benefit if, at the end of the day, it will spawn make, cmake, python, node.js, or whatever the developer is working with? And it comes with the enormous downside of losing native/easy extensibility. JavaScript Object Notation (JSON) is derived from JavaScript, so it parses and dumps seamlessly.
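The "subprocess wrapper" half of that description looks roughly like this in Rust's standard library: spawn the program, wait for it, and capture stdout, stderr and the exit code to hand back to the model. `ToolRun` and `run_tool` are hypothetical names, and a real harness would also need timeouts and sandboxing, which this sketch omits.

```rust
use std::io;
use std::process::Command;

/// Captured result of one tool invocation.
#[derive(Debug)]
pub struct ToolRun {
    pub stdout: String,
    pub stderr: String,
    /// None if the process was terminated by a signal (Unix).
    pub exit_code: Option<i32>,
}

/// Run a program to completion and capture its output.
/// Err means the program could not be spawned at all (e.g. not found).
pub fn run_tool(program: &str, args: &[&str]) -> io::Result<ToolRun> {
    let output = Command::new(program).args(args).output()?;
    Ok(ToolRun {
        stdout: String::from_utf8_lossy(&output.stdout).into_owned(),
        stderr: String::from_utf8_lossy(&output.stderr).into_owned(),
        exit_code: output.status.code(),
    })
}
```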

anuis258 14 hours ago [-]

hmm

joeyguerra 24 hours ago [-]

the war of the coding agents has begun.

kapija 18 hours ago [-]

woo hoo, more ai slop...

obaid 22 hours ago [-]

Worth noting the "Unix-inspired" framing is the HN title, not the README — the project itself pitches "minimalistic" and "optimized for memory footprint." Curious what the author means by Unix-inspired specifically, since a single-binary TUI running a multi-tool agent loop doesn't immediately read as do-one-thing-well-and-compose.

Sim-In-Silico 19 hours ago [-]

[flagged]

sarim 24 hours ago [-]

[flagged]

LuminaNAO 12 hours ago [-]

[dead]

shrmarahul 15 hours ago [-]

[flagged]

kuanghs 20 hours ago [-]

[dead]

edgardurand 1 days ago [-]

[flagged]

phoebe_builds 1 days ago [-]

[flagged]

amys94fr 18 hours ago [-]

[flagged]

artem_am 1 days ago [-]

[flagged]

IndianAISupport 7 hours ago [-]

Another one. Cool, cool.

nimchimpsky 1 days ago [-]

[dead]

andrew_kwak 1 days ago [-]

[flagged]

brcmthrowaway 1 days ago [-]

!RemindMe 6 months

kuberwastaken 18 hours ago [-]

This is awesome! can't wait to see where it goes as it continues development

Always funny how Hacker News works with traction, posted about a rust based TUI agent I'm working on a couple days ago too :P

https://github.com/Kuberwastaken/claurst

zby 19 hours ago [-]

There is also https://github.com/Dicklesworthstone/pi_agent_rust

I vibed a comparison/review of these two systems using my llm wiki: https://zby.github.io/commonplace/work/pi-agent-zerostack-co...

(the prompt is in https://zby.github.io/commonplace/work/pi-agent-zerostack-co...)

cassianoleal 19 hours ago [-]

Your bot seems to think that `pi_agent_rust` is the same as upstream Pi.

zby 19 hours ago [-]

I think I fixed this in a later revision. Does that persist?