People may feel differently about the fee that OpenRouter takes, but I think the service they provide is worth the extra cost.
Having access to dozens of models through a single API key, tracking cost of each request, being able to run the same request on different models and comparing their results next to each other, separating usages through different API keys, adding your own presets, setting your routing rules...
And once you start using an account with multiple users, it's even more useful to have all those features!
Not relying on a subscription and having the right to do exactly what you want with your API key (using it with any tool/harness...) is also a big plus to me.
therealpygon 1 days ago [-]
I agree with you in certain circumstances, but not really for internal user inference. OpenRouter is great if you need to maintain uptime, but for basic usage (chat/coding/self-agents) you can do all of what you mentioned and more with a LiteLLM instance. The number of companies that send a bill is rarely a concern when it comes to “is work getting done”, but I agree with you that minimizing user friction is best.
For general use, I personally don’t see much justification as to why I would want to pay a per-token fee just to not create a few accounts with my trusted providers and add them to an instance for users. It is transparent to users beyond them having a single internal API key (or multiple if you want to track specific app usage) for all the models they have access to, with limits and logging. They wouldn’t even need to know what provider is hosting the model and the underlying provider could be swapped without users knowing.
It is certainly easier to pay a fee per token on a small scale and not have to run an instance, so less technical users could definitely find advantage in just sticking with OpenRouter.
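The internal-gateway setup described above can be sketched in a few lines. This is purely illustrative (not LiteLLM's actual API): each user or app holds one internal key, the gateway maps it to a real provider credential, enforces a budget, and logs usage.

```python
from dataclasses import dataclass, field

@dataclass
class InternalKey:
    provider_key: str      # real upstream credential; users never see it
    budget_usd: float
    spent_usd: float = 0.0
    log: list = field(default_factory=list)

# Hypothetical key table; names and amounts are made up.
KEYS = {"team-research": InternalKey("sk-real-abc", budget_usd=50.0)}

def route(internal_key: str, model: str, cost_usd: float) -> str:
    key = KEYS[internal_key]
    if key.spent_usd + cost_usd > key.budget_usd:
        raise RuntimeError("budget exceeded for " + internal_key)
    key.spent_usd += cost_usd
    key.log.append((model, cost_usd))
    # Users only ever hold the internal key, so the upstream provider
    # (and its credential) can be swapped without them noticing.
    return key.provider_key
```

The point is that limits, logging, and provider swaps all live behind the one internal key the user sees.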
BeetleB 1 days ago [-]
The two things I like about OpenRouter:
1. The LLM provider doesn't know it's you (unless you have personally identifiable information in your queries). If N people are accessing GPT-5.x using OpenRouter, OpenAI can't distinguish the people. It doesn't know if 1 person made all those requests, or N.
It's been forever since I played with LiteLLM. Can I get these with it?
napoleond 1 days ago [-]
> It doesn't know if 1 person made all those requests, or N.
FWIW this is highly unlikely to be true.
It's true that the upstream provider won't know it's _you_ per se, but most LLM providers strongly encourage proxies like OpenRouter to distinguish between downstream clients for security and performance reasons.
Fair point. Would be good to hear from OpenRouter folks on how they handle the safety identifier.
For prompt caching, they already say they permit it, and do not consider it "logging" (i.e. if you have zero retention turned on, it will still go to providers who do prompt caching).
Deathmax 1 days ago [-]
OpenRouter tells you if they submit with your user ID or anonymously if you hover over one of the icons on the provider, eg OpenAI has "OpenRouter submits API requests to this provider with an anonymous user ID.", Azure OpenAI on the other hand has "OpenRouter submits API requests to this provider anonymously.".
BeetleB 1 days ago [-]
But does "anonymous user ID" mean that they make a user ID for you, and it's sticky? If I make a request today and another tomorrow, the same anonymous user ID is sent each time? Or do they keep changing it?
therealpygon 19 hours ago [-]
I believe they are static user IDs that only OpenRouter can tie back to you (that's the anonymous part). A static ID is required for any cached pricing: if the user ID changed with each request, it would be a massive security hole to reuse that cache between requests with different user IDs.
Without caching, a per-request ID (more like a transaction ID) would make sense, as it could still be tied back to a user internally while maintaining external anonymity, but unfortunately I don't believe that is the case.
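One way a gateway could derive a sticky-but-anonymous downstream ID is an HMAC over the account ID. This is a sketch of the general idea, not OpenRouter's actual implementation; the secret value is illustrative.

```python
import hashlib
import hmac

GATEWAY_SECRET = b"known-only-to-the-gateway"  # illustrative value

def anon_user_id(account_id: str) -> str:
    # The same account always maps to the same opaque ID, so any
    # provider-side prompt cache keyed on user ID keeps working,
    # while only the gateway can link the ID back to the account.
    digest = hmac.new(GATEWAY_SECRET, account_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]
```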
therealpygon 19 hours ago [-]
1 - I can’t speak to whether that is the case with OpenRouter. However, I suspect that there is more than enough fingerprint and uniqueness inherent to the requests that an AI could probably do a fairly accurate job of reconstructing “possible” sources, even with such anonymity. The result is the same, all your information is still tied to OpenRouter in order to track the billing. That also ignores that OpenRouter is also privy to all that same information. In the end, it comes down to how much you trust your partners.
As for LiteLLM, the company you would pay for inference is going to know it is “you” — the account — but LiteLLM would also have the same effect of appearing to be a single source to that provider. That said, a uniqueness for a user may be passed (as is often with OpenRouter also) for security. Only you know who the users are, that never has to leave your network if you don’t want.
2 - well, you select the providers, so that's pretty much on you? :-) Basically, you are establishing accounts with the inference providers you trust. Bedrock has ZDR, SOC, HIPAA, etc. available, even for token inference, as an example. Cost is higher without a cache, but you can't have true ZDR and caching (that I know of), because a cache has to be stored between requests. The closest you could get is maybe a secure inference container, but that piles on the cost. Still, there are plenty of providers with ZDR policies.
LiteLLM is effectively just a proxy for whatever supported (or OpenAI, Anthropic, etc compatible api provider) you choose.
instalabsai 1 days ago [-]
One additional major benefit of OpenRouter is that there is no rate limiting. The tight rate limits of the native providers are the primary reason we went with OpenRouter.
BeetleB 1 days ago [-]
I think it's more accurate to say that they switch providers when there is rate limiting.
The underlying provider can still limit rates. What Openrouter provides is automatic switching between providers for the same model.
(I could be wrong.)
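The switching behaviour described above amounts to a simple fallback loop. This is an illustrative sketch with stub providers, not OpenRouter's actual routing logic:

```python
class RateLimited(Exception):
    pass

def complete_with_fallback(prompt, providers):
    # Try each provider hosting the same model in order, moving on
    # whenever one rate-limits; re-raise if every provider is busy.
    last_err = None
    for provider in providers:
        try:
            return provider(prompt)
        except RateLimited as err:
            last_err = err
    raise last_err

# Stub providers standing in for real API calls:
def busy_provider(prompt):
    raise RateLimited("429 Too Many Requests")

def backup_provider(prompt):
    return "response from backup"
```

So the per-provider limits still exist; the router just makes them invisible until every provider serving the model is saturated.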
Computer0 20 hours ago [-]
Beyond that, with some providers like Open AI, API limits are determined via a tiered account system based on your business relationship and spend.
cjonas 16 hours ago [-]
Wouldn't they be using the Azure inference API or AWS bedrock on their own accounts and NOT be going through the openAI/Anthropics servers anyways? I just always assumed this is how the big inference "resellers" (openrouter, cursor, etc) were operating.
wongarsu 1 days ago [-]
A lot of inference providers for open models only accept prepaid payments, and managing multiple of those accounts is kind of cumbersome. I could limit myself to a smaller set of providers, but then I'm probably overpaying by more than the 5.5% fee
If you're only using flagship model providers then openrouter's value add is a lot more limited
rvnx 1 days ago [-]
The main thing about Openrouter is also that they take 100% of the risk in case of overcharges from the models: you have an actual hard cap.
The minus is that context caching only works moderately well at best, rendering the caching savings nearly useless.
largbae 21 hours ago [-]
I haven't noticed any problems with large context requests through OR to e.g. Opus (other than the rate at which my budget gets spent!). Is this a performance thing?
SR2Z 1 days ago [-]
Is there any risk? Don't the model providers also bill by the token?
fuzzy2 1 days ago [-]
The accounting could be asynchronous, so you could overshoot your budget by a few requests before you're blocked.
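A toy model of that asynchronous accounting: assume the balance check sees spend that lags real usage by a fixed number of requests. All figures here are made up for illustration.

```python
def requests_until_blocked(budget_usd, cost_per_request, settle_lag):
    # The balance check sees spend that is `settle_lag` requests
    # stale, so a few extra requests slip through before the block
    # kicks in.
    completed = 0
    settled_spend = 0.0
    while settled_spend < budget_usd:
        completed += 1
        settled = max(0, completed - settle_lag)
        settled_spend = settled * cost_per_request
    return completed
```

With a $10 budget, $1 requests, and a 3-request settling lag, 13 requests complete before the block: an overshoot equal to the lag.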
fg137 1 days ago [-]
> The number of companies that send a bill is rarely a concern
Not true in any non startup where there is an actual finance department
datadrivenangel 1 days ago [-]
LiteLLM had a major security incident recently, and often isn't actually that useful an abstraction...
1 days ago [-]
cobertos 1 days ago [-]
Does OpenRouter perform better than LiteLLM on integration, though? I found using Anthropic's models through a LiteLLM-laundered OpenAI-style API to perform noticeably worse than using Anthropic's API directly, so I've scrapped LiteLLM as an option. It's also just a buggy mess, judging by my attempts to use their MCP server: the errors it puts out are meaningless, and the UI behaves oddly even in the happy path (error messages colored green with "Success:" prepended).
But if OpenRouter does better (even though it's the same sort of API layer) maybe it's worth it?
blazarquasar 13 hours ago [-]
OpenRouter performs much, much better than LiteLLM proxy.
In my experience, if OpenRouter offers a model, the API will be supported. They also often have inference providers available that perform much better than the default provider. Just as an example, Z.ai is sitting at around 10 tokens/s for GLM 5.1, while Friendli is doing 70 tokens/s for the same model through OpenRouter.
The LiteLLM proxy also adds quite a bit of overhead.
I have personally settled on a mix of Bifrost as my router which connects to OpenRouter or some other providers that I deem more privacy friendly.
vidarh 1 days ago [-]
I love Openrouter. The ability to define presets, and the ease of access, is well worth the fee vs. juggling lots of providers separately. I maintain a few subscriptions too - including the most expensive Claude subscription - but Openrouter handles the rest for me.
r0fl 1 days ago [-]
Love openrouter
I can use cheap models without having to have an API key at a bunch of different providers, and can use the expensive models when I'm in a pinch and maxed out on Claude or Codex
well worth the 5% they take
spaniard89277 1 days ago [-]
You can get the same with kilo gateway without the fee.
abdusco 6 hours ago [-]
If you want to try a model quickly, you'll have to pay the original provider at least 5 USD. With OpenRouter, you can pay pennies. If you decide it's not for you, then you don't have to leave the rest of your deposit with the provider.
muyuu 21 hours ago [-]
Have you tried Kilo? I'd like to hear from someone who has tried both to know how do they compare.
1 days ago [-]
_s_a_m_ 22 hours ago [-]
Doesn't Copilot Pro+ do the same?
windexh8er 18 hours ago [-]
Not even remotely close.
sandos 14 hours ago [-]
It does have multiple suppliers of models at least?
_s_a_m_ 11 hours ago [-]
By default it already has all the OpenAI and Anthropic models
pixel_popping 1 days ago [-]
Except you don't have the right to do what you want with the API key (see the waves of bans lately; many SaaS services have closed because of them).
embedding-shape 1 days ago [-]
Unless you provide some more details, at least outline what "do what you want" was in your case, this seems like just straight up FUD.
himata4113 1 days ago [-]
openrouter accepts crypto, so there might have been some money laundering involved: reselling dirty crypto for LLM API access.
if that wasn't the reason, hey, that's actually a great way to launder money (not financial advice).
embedding-shape 1 days ago [-]
So you pay OpenRouter with cryptocurrencies, which they accept as a payment method, and then what, they block your account because the cryptocurrencies you paid with came from some account on the blockchain associated with other stuff?
Or what are you really saying here? I don't understand how that's related to "you don't have the right to do what you want with the API Key", which is the FUD part.
himata4113 1 days ago [-]
You pay openrouter with dirty crypto, then you have a business which simply resells openrouter, giving you clean fiat. I think openrouter specifically only banned those kinds of accounts, based on what I have observed from other comments and research. numlocked in this thread has explicitly said that they don't ban accounts for any of the reasons specified above, which narrows the scope down to some form of broken ToS, specifically around fraud and money laundering.
embedding-shape 1 days ago [-]
And then you go on HN and post "you don't have the right to do what you want"? Yeah, FUD and good riddance if so.
pixel_popping 1 days ago [-]
You are not allowed to resell Openrouter as an API yourself. So, for example, if you make a service that charges per token, you can't use the Openrouter API for it; this is specified in their ToS. So no, you can't do what you want. What FUD?
Quote from their own TOS: access the Site or Service for purposes of reselling API access to AI Models or otherwise developing a competing service;
embedding-shape 1 days ago [-]
Yeah, you're not allowed to do things that are specifically spelled out in the ToS, how is this surprising? Of course you don't get "unlimited access to do whatever you technically can", APIs never worked like that, why would they suddenly work like that?
When you say "you don't have the right to do what you want with the API Key" it makes it sound like specific use cases are disallowed, or something similar. "You don't have the right to go against the ToS, for some reason they block you then!" would have been very different, and of course it's like that.
Bit like complaining that Stripe is preventing you from accepting credit card payments for narcotics. Yes, just because you have an API key doesn't mean somehow you can do whatever you want.
pixel_popping 1 days ago [-]
That's very different from the Stripe example: opening a service like Openrouter isn't illegal, so the restriction is purely Openrouter being opinionated, nothing to do with the law. And my example was not a niche use case but quite a general one, which is to open, say, a service like Opencode Zen and use Openrouter as a backend; this is explicitly forbidden by Openrouter and it isn't against the law.
Are we allowed, yes or no, to make a service that charges per token to end-users, like giving access to Kimi K2.5 to end-users through Openrouter on a pay-per-token basis?
Vinnl 1 days ago [-]
That was a different user who wrote that.
embedding-shape 1 days ago [-]
Yeah, I didn't mean them specifically, more a general "you".
Vinnl 1 days ago [-]
Ah fair enough.
supernes 1 days ago [-]
On the topic of Zed itself as a VSCode replacement - my experience is mixed. I loved it at first, but with time the papercuts add up. The responsiveness difference isn't that big on my system, but Zed's memory usage (with the TS language server in particular) is scandalous. As far as DX goes it's probably at 85% of the level VSCode provides, but in this space QoL features matter a lot. Oh, and it still can't render emojis in buffers on Linux...
extr 1 days ago [-]
I actually find Zed pretty reasonable in terms of memory usage. But yeah, like you say, there are lots of small UX/DX papercuts that are just unfortunate. In some cases I'm not sure it's even Zed's fault, it's just years and years of expecting things to work a certain way because of VS Code and they work differently in Zed.
Eg: Ctrl+P "Open Fol.." in Zed does not surface "Opening a Folder". Zed doesn't call them folders. You have to know that's called "Workspace". And even then, if you type "Open Work..." it doesn't surface! You have to purposefully start with "work..."
veber-alex 22 hours ago [-]
The issues you described show a critical lack of awareness from the Zed developers that people migrate to their IDE mainly from VS Code.
They are blowing their "weirdness budget" on nonsense.
extr 21 hours ago [-]
I don't think it's conscious or even a result of not caring about UX/DX. But I do think you're right - I've noticed the loudest voices in their Issue queue are people wanting things like better vim support, helix keybind support (super niche terminal modal editor), etc. Fine if they want to make that their niche but if you are migrating from VS Code like 99% of people you can't have these kinds of papercuts, people will just uninstall.
fishgoesblub 1 days ago [-]
I've been attempting to use Zed as a VSCode replacement but between the lack of bitmap font support (Criminal in an alleged code editor), and the weird UI choices, it's been hard. I want to love it, but what is "performance" if I have to spend more time working around the UI and lack of features from extensions. Strangest issue I've encountered is the colours being completely wrong when using Wayland.. colours are perfect when ran with Xwayland. I'll give it a plus though for native transparency for the background. Much nicer than having to make the entire window transparent, including the foreground text like with VSCode.
hypercube33 1 hours ago [-]
Question since I'm currently digging through fonts trying to find one my eyes like...what's your preference?
fishgoesblub 1 hours ago [-]
Terminus. I personally love it. There's a TTF version called Terminess that includes the original bitmaps for certain sizes, and uses the more blurry style font for sizes the original didn't have. You can use it with certain programs like VSCode that don't allow you to select bitmap fonts, yet actually support them.
tecoholic 1 days ago [-]
I agree. One of the strangest things I found was the "in-built" support for things like TailwindCSS. The LSP starts showing warnings in all HTML/TSX files and confuses me to no end. I know I can turn it off with a setting, but the choice seems so confusing. Tailwind is just one of thousands of CSS libraries. Why enable it without any mention of it in the codebase?
I have actually ditched PyCharm for the snappiness of Zed. But the paper cuts are really adding up.
avilay 23 hours ago [-]
The point about papercuts adding up so resonates with me! I loved Zed initially and did find it more responsive than VS Code, loved the Zed Agent autocomplete, etc. However, I eventually and reluctantly went back to VS Code. The papercut that finally did it for me was [this open bug](https://github.com/zed-industries/zed/issues/36516) because of which I was not able to step into a packaged library's code when I was debugging my own code, this was in Python.
rzkyif 1 days ago [-]
Same here: I found the multibuffers feature really useful, but the extension system really couldn't hold a candle to VS Code at the time of my testing
Spent a couple of hours trying to make the Svelte extension ignore a particular type of false positive CSS error, failed, and returned to VS Code
Will definitely give it another chance when the extension system is more mature though!
udkl 1 days ago [-]
QoL features is where WebStorm shines! I don't look forward to when I have to open vscode instead sometimes.
Just the floating and ephemeral "Search in files" modal in Jetbrain IDEs would convince me to switch from any other IDE.
fxtentacle 1 days ago [-]
my favorite is Ctrl+Shift+A which lets you search through all available UI actions (hence the A). That's just so helpful if you know the IDE can do something but you forgot where in the menu structure it was. And to top things off, you can also use Ctrl+Shift+A to look up the keyboard shortcuts for every possible action
BoorishBears 1 days ago [-]
I still debate how much productivity I've gained from better AI compared to the loss from switching off WebStorm
But their tab complete situation is abysmal, and Supermaven got macrophaged by Cursor
tuzemec 1 days ago [-]
I have 4-5 typescript projects and one python opened in Zed at any given time (with a bunch of LSPs, ACPs, opened terminals, etc.) and I see around 1.2 - 1.4gb usage.
I opened just one of the typescript projects inside VSCode and I see something like 1gb (combining the helpers usage). I'm not using it actively, so no extra plugins and so on.
That's on mac, so I guess it may vary on other systems.
thejazzman 1 days ago [-]
I think there’s a bug? It used to be memory efficient and now I periodically notice it explodes. Quit and restart fixes it
I don’t have any extensions installed and I’m basically leaving it open, idle, as a note scratch space. I do have projects open with many files but not many actual files are open
Anyway idk
extr 1 days ago [-]
I think you are kidding yourself if you think you are going to get remotely close to the quantity/quality of output of a $100 Max sub using Zed/Openrouter. I easily get $1K+ of usage out of my $100 Max sub. And that's with Opus 4.6 on high thinking.
alexjplant 1 days ago [-]
For personal use I've noticed Claude (via the web-based chat UI) making really bizarre mistakes lately like ignoring input or making completely random assumptions. At work Claude Code has turned into an absolute dog. It fails to follow instructions and builds stuff like a lazy junior developer without any architecture, tests, or verification. This is even with max effort, Opus 4.6, multiple agents, early compaction, etc. I don't know what they did but Anthropic's quality lead has basically evaporated for me. I hope they fix it because I've since adapted my project's Claude artifacts for use with Codex and started using it instead - it feels like Claude Code did earlier this year.
I'd like to give the new GLM models a try for personal stuff.
stldev 21 hours ago [-]
Same, I'm looking hard for an alternative to what I had.
And I'm seeing the same thing in my sphere: everyone is bailing on Anthropic these past few weeks. I figure that's why we're seeing more posts like this.
I hope they're paying attention.
AussieWog93 21 hours ago [-]
I've noticed the same thing, and even done side by side tests where I compare Claude Code with Cursor both running Opus 4.6.
It seems Cursor somehow builds a better contextual description of the workspace, so the model knows what I'm actually trying to achieve.
The problem is that with Cursor I'm paying per-token, so as GP suggested you can easily spend $100+ per month vs $20 on Claude Code.
hypercube33 1 hours ago [-]
I saw this immediately with 4.6 and dumped back to 4.5, because when I actually asked it wtf it was doing, its response was "being lazy".
selcuka 19 hours ago [-]
> At work Claude Code has turned into an absolute dog.
Some of the newer models available on OpenRouter are good, but I agree that none of them are a replacement for Opus 4.6 for coding.
If you're trying to minimize cost then having one of the inexpensive models do exploratory work and simple tasks while going back to Opus for the serious thinking and review is a good hybrid model. Having the $20/month Claude plan available is a good idea even if you're primarily using OpenRouter available models.
I think trying to use anything other than the best available SOTA model for important work is not a good tradeoff, though.
eshack94 19 hours ago [-]
I've been thinking of doing this — using one of the "pretty good but not Opus 4.6-good, YET very cheap" models for the implementation part of more basic code features, AFTER first using Opus 4.6 high for the planning stage.
Do you think this would be a decent approach?
Also, which client would I use for this? OpenCode? I don't think Claude Code supports using other models. Thoughts?
submain 18 hours ago [-]
I have been doing this and the results have been fairly good.
I use claude to build requirements.md -> implementation.md -> todo.md. Then I tell opencode + openrouter to read those files and follow the todo using a cheap (many times free) model.
It works 90% of the time. The other 10% it will get stuck, in which case I revert to claude.
That has allowed me to stay on the $20/month claude subscription as opposed to the $100.
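The split described above can be sketched as a simple escalation loop. The model names and the "returns None when stuck" convention are illustrative, not opencode's actual behaviour:

```python
def tiered_run(todo_items, cheap_model, strong_model):
    # Implement each todo item with the cheap model and escalate to
    # the strong model only when the cheap one gets stuck, mirroring
    # the ~90/10 split described above.
    results = []
    escalations = 0
    for item in todo_items:
        out = cheap_model(item)
        if out is None:          # cheap model got stuck
            out = strong_model(item)
            escalations += 1
        results.append(out)
    return results, escalations

# Stub models standing in for real agent calls:
cheap = lambda item: None if "tricky" in item else "done: " + item
strong = lambda item: "done (opus): " + item
```

If roughly 10% of items escalate, most tokens are billed at the cheap (often free) rate, which is what keeps the $20/month plan viable.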
lelanthran 1 days ago [-]
> I easily get $1K+ of usage out of my $100 max sub. And that's with Opus 4.6 on high thinking.
And people keep claiming the token providers are running inference at a profit.
gruez 1 days ago [-]
>And people keep claiming the token providers are running inference at a profit.
Not everyone gets $1K of usage, and you don't know how fat the per-token margins are. It's like saying the local buffet place is losing money because you eat $100 worth of takeout for $30.
lelanthran 1 days ago [-]
> Not everyone gets $1K of usage, and you don't know how fat the per-token margins are.
Well, we're going to find out sooner rather than later. Right now you don't know how thin (or negative) the margins are, either, after all.
All we know for certain is how much VC cash they got. Revenue, spend, profit, etc calculated according to GAAP are still a secret.
kitsune1 1 days ago [-]
[dead]
manquer 21 hours ago [-]
In addition to the usage distribution aspects others called out:
$1K is not actual cost, just API pricing compared to subscription pricing. It is quite possible that the API has large operating margins and it costs, say, only $100 to deliver $1K worth of API credits.
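The arithmetic behind that claim is simple; the 90% gross margin used here is purely hypothetical, picked only to reproduce the $100-for-$1K figure above:

```python
def delivery_cost(api_list_price_usd, gross_margin):
    # If per-token API pricing carries `gross_margin`, then usage
    # billed at `api_list_price_usd` list price costs the provider
    # only the remainder to actually serve.
    return api_list_price_usd * (1 - gross_margin)
```

At a hypothetical 90% margin, $1,000 of API-list-price usage costs about $100 to deliver, so a $100 subscription "worth $1K" could still break even.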
infecto 22 hours ago [-]
Yes and when we say things like that we are not talking about plans. Running inference at a profit means api token use is run profitably. It’s a huge unknown what’s happening at the plan level, we know there is subsidy happening but in aggregate impossible to know if it’s profitable or not.
Computer0 19 hours ago [-]
The model developers across the board maintain that most/all models are profitable by EOL, and that losses come from R&D/training.
Scene_Cast2 22 hours ago [-]
Out of curiosity, how many tokens are people using? I checked my openrouter activity - I used about 550 million tokens in the last month, 320M with Gemini and 240M with Opus. This cost me $600 in the past 30 days. $200 on Gemini, $400 on Opus.
mabunday 13 hours ago [-]
My Claude Code usage stats after ~3 months of heavy use:
Favorite model: Opus 4.6
Total tokens: 42.6m
Sessions: 420
Longest session: 10d 2h 13m
Active days: 53/95
Longest streak: 16 days
Most active day: Feb 9
Current streak: 4 days
~158x more tokens than Moby-Dick
Monthly breakdown via claude-code-monitor (not sure how accurate this is):
Month Total Tokens Cost (USD)
2026-01 96,166,569 $112.66
2026-02 340,158,917 $393.44
2026-03 2,183,154,148 $3,794.51
2026-04 1,832,917,712 $3,412.72
─────────────────────────────────────
Total 4,452,397,346 $7,713.34
mikeocool 1 days ago [-]
Yeah — I just created an anthropic API key to experiment with pi, and managed to spend $1 in about 30 minutes doing some basic work with Sonnet.
Extrapolating that out, the subscription pricing is HEAVILY subsidized. For similar work in Claude Code, I use a Pro plan for $20/month, and rarely bang up against the limits.
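That extrapolation looks something like the following; the $1 per half hour comes from the comment, while the 4 hours/day and 21 workdays/month figures are assumptions for illustration:

```python
def monthly_api_equivalent(usd_per_half_hour, hours_per_day, workdays_per_month):
    # Naive linear extrapolation of the observed burn rate to a
    # month of working hours at API list prices.
    return usd_per_half_hour * 2 * hours_per_day * workdays_per_month
```

Under those assumptions the same work would cost $168/month at API rates, against a $20/month Pro plan.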
causal 1 days ago [-]
And it scales up - the $200 plan gets you something like 20x what the Pro plan gets you. I've never come close to hitting that limit.
It's obviously capital-subsidized and so I have zero expectation of that lasting, but it's pretty anti-competitive to Cursor and others that rely on API keys.
selcuka 19 hours ago [-]
Ignoring the training costs, the marginal cost for inference is pretty low for providers. They are estimated to break even or better with their $20/month subscriptions.
That being said, they can't stop launching new models, so training is not a one time task. Therefore one might argue that it is part of the marginal cost.
walthamstow 1 days ago [-]
I ran ccusage on my work Max account and I spend what would cost $300 a week if it was billed at API rates.
tern 24 hours ago [-]
According to the meter, I used $15k in tokens with my Max plan (along with $5k of Codex tokens) in the last 30 days. That built an entire working and (lightly) optimized language, parser, compiler, runtime toolchain among other things.
nothinkjustai 1 days ago [-]
Not everyone is just vibecoding everything and relying on agents running sota models to do anything tho.
It's really fantastic. I can't imagine why you'd go through the effort of using Claude Code with other models when pi is a much better harness. There are tons of extensions already available, and it's trivial to prompt an LLM to create any new extension you want. Lacking creativity and want something from another harness?
> Run <other harness> in tmux and interrogate it how feature X works, then build me the equivalent as a pi extension.
Maybe in a few years there will be obvious patterns with harnesses having built really optimal flows, but right now it works so much better to experiment and try new approaches and prompts and flows, and pi is the easiest one to tweak and make it your own.
ElFitz 1 days ago [-]
> but right now it works so much better to experiment and try new approaches and prompts and flows, and pi is the easiest one to tweak and make it your own.
That’s what really appeals to me. I’ve been fighting Claude Code’s attempts to put everything in memory lately (which is fine for personal preferences), when I prefer the repo to contain all the actual knowledge and learnings. Made me realise how these micro-improvements could ultimately, some day, lead to lock-in.
> Run <other harness> in tmux and interrogate it how feature X works, then build me the equivalent as a pi extension.
I had been using Claude Max/Opus with pi and the results have been incredible. Have pi write an AGENTS.md and dip your feet into creating your own skills specific to the project.
With the anthropic billing change (not being able to use the max credits for pi) I think I have to cancel - as I'm whirring through credits now.
Going to move to the $250/mo OpenAI codex plan for now.
chatmasta 21 hours ago [-]
Regardless of which harness you use, asking your agent to self-edit its own .claude (and to put it in the repo itself so you see the changes) is the single biggest-impact change you can make in terms of compounding improvement. Couple this with telling it to create skills for /garden (clean up drift based on what changed this session), /handoff (garden, create any skills needed to resolve friction encountered this session, and write a summary of the session and a note for the next agent), and /takeover (read the latest handoff file). Since doing this I've completely cured my session-abandonment anxiety and can confidently swap to a new session at < 20% context usage without feeling like I'm talking to someone who just woke up from a coma.
aryehof 14 hours ago [-]
Can you provide more details on these skills?
chris_ivester 12 hours ago [-]
[dead]
zweicoder 1 days ago [-]
I was looking into this as well since Claude models are costing too much with the Extra Usage changes.
Is OpenAI codex not also charging by usage instead of subscription when using pi?
ElFitz 1 days ago [-]
pi is what OpenClaw runs on, and so far OpenAI seems committed to it. No telling how long it will last.
dust42 1 days ago [-]
I really love it. The simplicity is key. The first play project I made with it was a public transport map with GTFS data: click on a stop and get the routes and the timetables for that stop and the surrounding ones. I used Qwen3.5-35B on a Mac M1 Max with oMLX. It wrote 98% of the code with very little interaction from me.
Also very useful is the /tree feature to go back in history when the model is on a wrong track or my instructions were not good enough.
I usually work in a two-pass approach: first let the model explore what it needs to fulfill the task and write it into CONTEXT.md (or any other name to your liking). Then restart the session with the CONTEXT.md. That way you are always operating nicely in 5-15k context, i.e. everything is very fast.
Create an account for pi (or use docker) and make sure it can't walk into other directories - it has bash access.
Add the browser-tools to the skills and load them when useful:
https://github.com/badlogic/pi-skills
No need for database MCP, I use postgres and tell it to use psql.
Occasionally I use prettier to remove indentation - the LLM makes a lot fewer edit errors that way. Just add the indentation back before you commit, or tell pi to do it.
WhyNotHugo 1 days ago [-]
Pi is a lot simpler than Claude and a lot more transparent in how it operates.
It's designed to be a small simple core with a rich API which you can use for extensions (providing skills, tools, or just modifying/extending the agent's behaviour).
It's likely that you'll eventually need to find extensions for some extended functionality, but for each feature you can pick the one that fits your need exactly (or just use Pi to hack a new extension).
recursivegirth 1 days ago [-]
I bought a $30 Z.ai Coding Plan sub to go with it. Seven million tokens has only consumed 2% of my weekly usage on the GLM-5.1 model. I am pretty happy.
I am only doing single project workflows, but with Z.ai I feel like it opens a whole new door to parallel workflows without hitting usage limits.
esperent 20 hours ago [-]
Is GLM-5.1 actually good?
I tested one of the other models that everyone is raving about yesterday (Qwen 3.6 plus) and within minutes found myself arguing with it even over a very simple task. After about 30 minutes (in which token usage never went over 50k because it was just me rewinding to give it more and more explicit instructions which it kept ignoring), I reverted everything and did it with Opus in literally about 4 minutes, after intentionally giving Opus a much more vague prompt.
recursivegirth 18 hours ago [-]
I've had a good experience so far. Idk if I would attribute that to Pi or the GLM models. However, it feels nice not being constrained by usage.
ElFitz 23 hours ago [-]
Yeah, that's pretty much the lines I was thinking along.
Perhaps use codex for planning / reviews, but otherwise go with z.ai / minimax for actual implementation. Thanks!
nocobot 1 days ago [-]
I really have been enjoying pi a lot.
At first I thought I was going to build lots of extra plugins and commands, but what ended up working for me is:
- a simple command that pulls context from a Linear issue
- a simple review command
- project-specific skills for common tasks
sonar_un 1 days ago [-]
I use Daniel Meissler’s PAI and it’s been an incredible harness.
Daviey 1 days ago [-]
Reluctantly, I have to say the dev seems to have a stinky attitude.
He went on an "OSS vacation", which is perfectly reasonable and said he'd be back on a certain date. I had a PR open for a trivial fix, someone asked when it would land. I shared he was still away. After his return I politely asked, "@badlogic hey, what can we do to progress this? Thanks x"
I then got what I would consider an abusive reply, because he confused me with someone else. In the meantime he extended his vacation. He didn't even think his shitty attitude was worthy of an apology, given that HE confused me with someone else.
Now he's seemingly marked anything with my name on it as a "clanker", despite all my changes being made by hand.
I've been around open source enough to have a thick skin, but when I'm doing something "for fun" and someone treats you like that, I'd rather avoid it as far as possible. I certainly could not in good faith use this project for anything work related.
embedding-shape 1 days ago [-]
> Honestly, it seems you are grumpy, so it was probably a good idea to extend that vacation. Being rude just creates a more toxic environment for everyone. Maybe extend that break for the rest of the month and come back nicer? Thanks
Honestly, it seems like both of you were feeling a bit "grumpy" at the moment, but sending passive aggressiveness towards the maintainer you are trying to get to merge your code (or not your code, someone else's code?) seems like a very bold strategy regardless.
Daviey 1 days ago [-]
You know, when I wrote that I genuinely meant it, or at least I think I did. It wasn't supposed to be passive aggressive. :(
But that doesn't negate the maintainer talking to people like that (and taking contributions without attribution).. and the net result is I don't want to use the software, and frankly they probably won't miss me.. so the end result is neutral.. I just find it sad.
raincole 1 days ago [-]
> Maybe extend that break for the rest of the month and come back nicer?
Quite sure most (perhaps >99%) adult people would consider this passive aggressive.
But yeah, I agree with you on the rest. Why did Mario assume that bot is you...?
ZeWaka 1 days ago [-]
if a human showed up directly under some bot bullshit pinging me I'd assume they were the bot operator as well
sodacanner 1 days ago [-]
I'm not sure how you'd be able to interpret that as anything other than passive aggressive.
brazukadev 22 hours ago [-]
> You know, when I wrote that I genuinely meant it, or at least I think I did. It wasn't supposed to be passive aggressive. :(
That's a great opportunity for self reflection.
Daviey 22 hours ago [-]
Yeah, I take that. I've thought about it much of this evening.
I think at the time I was frustrated, it felt unfair and I couldn't understand it.
Then I thought, this guy probably does need more time off. Which was a genuine thought.
But that is where I should have stopped. The way I expressed it, whilst a genuine thought, came across as passive aggressive. I am owning that.
brazukadev 1 hours ago [-]
> Then I thought, this guy probably does need more time off. Which was a genuine thought.
The thought was correct, he was probably stressed. You made it worse.
> The way I expressed it, whilst a genuine thought, came across as passive aggressive
The way you expressed it is a technicality; what counts here is your action.
Open source developers owe you nothing, you can always fork and implement your feature by yourself.
Behaving the way you did, treating it as something about you, is very selfish.
crashprone 1 days ago [-]
You seem to have posted your polite question as a reply to the bot comment which talks about PR #1484 and not your PR. I'd say it's pretty obvious why the maintainer thought you were pushing the bot's PR.
As someone else pointed out cooler heads and less passive aggressive responses would've resolved this issue easily.
NwtnsMthd 1 days ago [-]
But hey, the dev was generous enough to give it an MIT license, so you could always just fork it and do what you like ¯\_(ツ)_/¯
bashtoni 1 days ago [-]
After hitting Claude limits today I spent the afternoon using OpenCode + GLM 5.1 via OpenRouter and I was very impressed.
OpenCode picked up my CLAUDE.md files and skills straight away, and I got similar performance to Opus 4.6.
sourcecodeplz 1 days ago [-]
How much did it cost for how long?
bashtoni 1 days ago [-]
~$1/hr over 4 hours.
I'm pretty conservative when it comes to clearing the context, and I also tend to provide the right files to work on (or at least the right starting point).
I had seen prior to using the model that it starts producing much worse results when the context used is larger, so my usage style probably helps getting better results. I work like this with Claude Code anyway, so it wasn't a big change.
Many of us got the annual Lite plan when they had the $28 discount. But even at $120 I think it's a good deal.
WhyNotHugo 21 hours ago [-]
Aliyun had GLM-5, Kimi K2.5 and a few others for ¥40 (~€5) per month. Regrettably, that plan is no longer available for new users, but the new plan is still only ~€25.
Z.ai seems crazy expensive in comparison, although I wonder if inference speeds differ noticeably.
jml78 1 days ago [-]
I am trying to take this in the most charitable way possible: anyone remotely considering that subscription should go on Reddit and see all the people constantly experiencing outages, and insanely slow speeds when it does work.
I have been wanting to subscribe but based on how awful the experience is for most people, I just can’t pull the trigger
BeetleB 1 days ago [-]
At $84, I can understand not taking the risk. But for $28 ... it was worth it.
FWIW, I've never dealt with outages since I signed up over 3 months ago (Lite plan). It is slow - always has been. I can live with that.
At the same time, I'm not using it for work. It's for the occasional project once in a while. So maybe I just haven't hit any limits? I did use it for OpenClaw for 2-3 weeks. Never had connection issues.
You can see the details of their limits on their site. It seems GLM 5.1 has low thresholds, which will get lower starting in May. On Reddit I see some people switching to GLM 5 and claiming they haven't hit limits - the site doesn't indicate the limits for that model.
They also say that those who subscribed before February have different limits - unsure if lower or higher!
GLM-4.7 is still a fairly capable model. Not as good as Opus, but for most personal projects it's been adequate. On Reddit I see plenty of people planning with GLM-5.1 and using 4.7 for implementation.
srslyTrying2hlp 1 days ago [-]
[dead]
cbg0 1 days ago [-]
I don't think there's currently better value than Github's $40 plan which gives you access to GPT5 & Claude variants. It's pay per request so not ideal for back-and-forth but great for building complex features on the cheap compared to paying per token.
Because GH is accessing the API behind the scenes, you should face less degradation when using Sonnet/Opus models compared to a Claude subscription.
Keep a ChatGPT $20 subscription alongside for back-and-forth conversations and you'll get great bang for buck.
rafaelmn 1 days ago [-]
I'm still paying for the $10 GH Copilot plan but I don't use it, because:
- context is aggressively trimmed compared to CC, obviously for cost-saving reasons, so the performance is worse
- the request pricing model forces me to adjust how I work
Just these alone make it not worth saving the $60/month for me.
I like the VSCode integration, and the MCP/LSP usage sometimes surprised me compared to the dumb grep from CC. Ironically, VSCode is becoming my terminal emulator of choice for all the CLI agents - SSH/container access, the automatic port mapping, etc. - it's more convenient than tmux sessions for me. So Copilot would be ideal for me, but it's just tweaked to be a budget/broad-scope tool rather than a tool for professionals who would pay to get work done.
lbreakjai 1 days ago [-]
You can use your GH subscription with a different harness. I'm using opencode with it, it turns GH into a pure token provider. The orchestration (compacting, etc.) is left to the harness.
It turns it into a very good value for money, as far as I'm concerned.
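If anyone wants to try the same, opencode is configured through a small JSON file - from memory it's roughly like this, with the model ID a placeholder (check opencode's provider docs for the exact ID; Copilot is authenticated separately via `opencode auth login`):

```json
{
  "$schema": "https://opencode.ai/config.json",
  "model": "github-copilot/claude-sonnet-4"
}
```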
rafaelmn 1 days ago [-]
But you still get charged per turn, right? I don't like that because it impacts my workflow. When I was last using it I would easily burn through the $10 plan in two days just by iterating on plans interactively.
lbreakjai 1 days ago [-]
Honestly I'm not sure, I'm on my company's plan, I get a progress bar vaguely filling, but no idea of the costs or billing under the hood.
sourcecodeplz 1 days ago [-]
But you still get the reduced context-window.
briHass 1 days ago [-]
Disagree entirely.
GHCP at least is transparent about the pricing: hitting enter on a prompt = one request. CC/Codex use some opaque quota scheme, where you never really know if a request will be 1%, 2%, or 10% of your hourly max, let alone weekly max.
I've never seen much difference with context ostensibly being shorter in GHCP, all of the models (in any provider) lose the thread well before their window is full, and it seems that aggressive autocompaction is a pretty standard way to help with that, and CC/Codex do it frequently.
rafaelmn 1 days ago [-]
>I've never seen much difference with context ostensibly being shorter in GHCP, all of the models (in any provider) lose the thread well before their window is full, and it seems that aggressive autocompaction is a pretty standard way to help with that, and CC/Codex do it frequently.
Then we've had wildly different results. Running CC and GH copilot with Opus 4.6 on same task and the results out of CC were just better, likewise for Codex and GPT 5.4. I have to assume it's the aggressive context compaction/limited context loading because tracking what copilot does it seems to read way less context and then misses out on stuff other agents pick up automatically.
neya 1 days ago [-]
Is your source code worth only $40 for them to train their models on?
This is of course not a problem for business accounts.
We are not allowed to use anything other than our company-provided GHCP credentials due to the data retention clause in our contracts, i.e. they are not allowed to use our data.
cbg0 1 days ago [-]
Considering how much data they already have from everything that's on GitHub, I doubt you would make a dent boycotting their AI product.
spwa4 1 days ago [-]
And don't you think they're going to realize soon that it's also pretty good at "doing penetration testing" for your company when it's already trained on your company's source code?
Google $20/mo plan has great usage for Claude Opus. Last time I used it, around Feb, it felt basically unlimited.
no1youknowz 1 days ago [-]
Agree, that was Feb. Not now, I cancelled mine on the 7th. Claude Opus via Gemini is just a few prompts then it locks you out for another week.
auggierose 1 days ago [-]
So, you basically tried it a century ago...
frenchie4111 1 days ago [-]
Does anyone use Zed with a monorepo?
I am in a situation where every sub-folder has its own language server settings, lint settings, etc. VSCode (and forks) can handle this by creating a workspace, adding each folder to the workspace, and having a separate .vscode per-folder. I haven't figured out how to do the same with Zed.
I would love to stop using VSCode forks
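For reference, the VSCode setup I'm describing is a multi-root `.code-workspace` file, roughly like this (folder names made up) - each listed folder then carries its own `.vscode/settings.json` overriding the workspace-wide settings:

```json
{
  "folders": [
    { "path": "services/api" },
    { "path": "packages/frontend" }
  ],
  "settings": {
    "editor.formatOnSave": true
  }
}
```

I haven't found Zed's equivalent of this per-folder override layer.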
reddec 1 days ago [-]
My 50c: Ollama Cloud at $20. GLM5 and Kimi are really competitive models, Ollama's usage limits are insanely high, there are no limits on where you use it (it has normal APIs), and you get privacy and no logging.
my002 1 days ago [-]
Interesting. I've always been turned off by how vague the descriptions of Ollama's limits are for their paid tiers. What sort of work have you been doing with it?
The worst I saw: multiple parallel agents (opencode & pi coding agents) with Kimi and GLM, almost non-stop development during the work day - 15-20% session consumption (I think it's a 2h bucket) max. Never hit the limit.
In contrast, I consumed the $20 Claude plan in a similar mode after just a few hours of work.
yieldcrv 1 days ago [-]
yeah? why do you like that over using GLM5 in a VPS that charges by token use? $20 still cheaper and seamless to set up? how are the tokens per second?
reddec 20 hours ago [-]
I have roughly 20-40M tokens of usage per day for GLM alone (more if you count other models). Using API pricing from OpenRouter, that means Ollama is more cost-effective for me after a day (a few days if you count caching properly).
For several models like Kimi and GLM they have B300s and performance is really good. At launch I got closer to 90-100 tps. Nowadays it's around 60 tps, stable across most models I used (utility models < 120B are almost instant).
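To make the break-even concrete, here's a rough sketch - the per-million-token price is a made-up placeholder for illustration, not a quoted OpenRouter rate:

```python
# Back-of-envelope: flat monthly fee vs. paying per token at API rates.
# The $0.50/M price and 30M tokens/day are illustrative assumptions.
def monthly_api_cost(tokens_per_day, usd_per_million_tokens, days=30):
    """What the same token volume would cost at per-token API pricing."""
    return tokens_per_day / 1_000_000 * usd_per_million_tokens * days

flat_fee = 20.0                                # flat monthly plan
api_cost = monthly_api_cost(30_000_000, 0.50)  # midpoint of the 20-40M/day range
breakeven_days = flat_fee / (api_cost / 30)    # days until the flat fee wins

print(f"API equivalent: ${api_cost:.0f}/mo; flat fee pays off after ~{breakeven_days:.1f} days")
```

Under these assumptions the flat plan pays for itself in well under two days, which matches my experience.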
slimebot80 17 hours ago [-]
I might be misunderstanding something.
He cites $70 of remaining credit and says that's a good thing because it rolls over.
But spending $70 on an API (he says he still prefers Opus) is far less cost-effective than a Max plan from Anthropic.
The article seems to be nudging us to set up OpenRouter, but the premise isn't fully true. A bit of diversity is excellent, but aren't the costs going to (largely) prohibit it in reality?
kisamoto 16 hours ago [-]
OP here. I do like Opus but I don't default to it for everything. My CC usage is a lot of Haiku/Sonnet and is also very bursty (bursts throughout the month, not a day).
I find that a lot of my Claude usage goes unused and then when I'm coding or leaning on agents I hit a limit and have to wait. I don't like that dynamic. I do have Extra Usage enabled (with a cap) but then I'm spending more than the $100 I already do.
I'm learning that a lot of people seem to consistently stay within limits and that works for them but I was looking for something different for myself.
The real pain is that Anthropic doesn't easily quantify usage (which can now change over the day). How many tokens is it? Minimum? Maximum? I tried to quantify this with OpenTelemetry for a while but have decided to move to this more flexible setup.
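For anyone wanting to try the same quantification, Claude Code exports usage metrics via OpenTelemetry, configured through environment variables. From memory the setup is roughly this - check Anthropic's monitoring docs for the current variable names, and the endpoint here is just a local-collector example:

```shell
# Enable Claude Code's OpenTelemetry export and point it at a local collector
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```

You still have to run your own collector and dashboards, which is part of why I gave up on it.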
delduca 1 days ago [-]
I also dropped Claude Code Max.
I switched to OpenCode Zen + GitHub Copilot. For some reason, Claude Code burns through my quota really quickly.
I dislike how Zen (and many similar cases, not picking on Zen here) reports being not-for-profit or transparent, while the auto-recharge mechanism guarantees they are sitting on a float of at least $5 per account, and presumably an average of at least $10. That's something like 50 cents of interest income per year per account. It's not nothing, and it's hardly egregious fraud, but I feel that if they will do this when it's obvious what they're doing, what other corners might they cut?
Honesty as a marketing strategy is really undervalued in cases like this
pprotas 1 days ago [-]
Yeah man, it's a grand scheme to skim 50 cents off you per year. All combined, that might be just enough to cover their website hosting costs.
vixalien 1 days ago [-]
honestly the issue with Zen is that they collect and might sell your data
woutr_be 1 days ago [-]
How does Claude Code compare to OpenCode Zen? I’m on the $20/month Claude plan, and was considering OpenCode Zen as well.
Due to the quota changes, I actually find myself using Claude less and less
delduca 1 days ago [-]
I mostly use Opus via Copilot with opencode, and I'll tell you, in the past few days, I've had long sessions (almost the whole day) without hitting rate limits. That's very different from Claude Code, which used to rate-limit me before even halfway through the day.
woutr_be 1 days ago [-]
Just cancelled my Claude plan, so that I can switch over when it expires in a week. The usage limits somehow just make me less productive with it.
criley2 1 days ago [-]
I haven't tried $20 claude code recently, but I've used OpenCode Zen primarily so I can play with opensource/chinese models which are very inexpensive. I'd spend $0.50-$1.00 on a single claude opus 4.6 plan mode run, then have a chinese model execute the plan for like $0.10-$0.15 total. I'd keep context short, constantly start new threads, and get laser focused markdown plans and knowledgebase to be token efficient.
If I just let opencode zen run claude opus to plan and execute, I'd spend $20 in like 5 minutes lol
sourcecodeplz 1 days ago [-]
Which chinese models do you use and do you use any for specific tasks?
msh 1 days ago [-]
kimi k2.5 works quite well and is super fast. Much faster than opus but not quite at the same quality level.
criley2 1 days ago [-]
Whenever a new one comes out, there's a good chance they're free for a week on Zen, so I try out any free ones. For example, MiniMax M2.5 and Qwen 3.6+ are free right now.
Personally, I've had a lot of good results in my little personal projects with Kimi K2.5, GLM 5 and 5.1, and MiniMax M2.5.
_pdp_ 1 days ago [-]
Our bank (a major retail bank in the UK) is refusing to do business with OpenRouter, and OpenRouter issued a refund which we did not request. So something is up. There is that.
I might be paranoid, but I feel that access to models will become more constrained in the future as the industry gets more regulated.
chid 1 days ago [-]
I don't quite understand what you mean by something is up. Was the reason around security/telemetry or similar?
_pdp_ 1 days ago [-]
Bank refused to provide reasons - even after a formal complaint was raised with them.
We are not the only ones. I found other people online experiencing the same issue. It is hard to tell how widespread this is, but it is strange to say the least.
mayama 1 days ago [-]
OpenRouter accepts crypto for payments. That should have raised some flags with banks.
siliconc0w 1 days ago [-]
I'm running out of Claude session limits in a single planning + implementation session even when using sonnet for the implementation. This isn't even super complex work - it was refactoring a data model, modifying templates/apis/services, etc. It has also gotten notably more 'lazy' like it updated the data model and not the template until I specifically pointed that out.
My backup has been Opencode + Kimi K2. It's definitely not as strong as even Sonnet but it's pretty fast and is serviceable for basic web app work like the above.
KronisLV 1 days ago [-]
I tried using OpenRouter for the same kind of development I now do with Anthropic's subscription across Sonnet/Gemini/GPT models and it ended up being 2-3x more expensive than the subscription (which I suspect is heavily subsidized).
It's nice that it works for the author, though, and OpenRouter is pretty nice for trying out models or interacting with multiple ones through a unified platform!
WhitneyLand 1 days ago [-]
>>For some reason Zed limits the Gemini 3.1 context to 200k tokens
It’s not just Zed, CoPilot also reduces the capabilities and options available when using models directly.
No thanks; I definitely agree with the OpenRouter approach, or a native harness, to keep full functionality.
frr149 14 hours ago [-]
I see a lot of people trying to run away from Anthropic's "window of doom" affair lately, myself included. What has stopped me so far is the lack of a real alternative to Opus. Not even GPT-5.4 comes close.
hybirdss 1 days ago [-]
The bursty usage pattern is what kills the subscription value. I hit limits during mid-refactor and there's nothing to do but wait. The worst part is knowing the unused hours during the day are just gone.
OpenRouter credit rollover is the real insight — credits that don't expire vs time windows that reset whether you used them or not. I'm surprised Anthropic hasn't offered a token pack option alongside the subscription.
rachel_rig 14 hours ago [-]
[dead]
jusonchan81 19 hours ago [-]
$20 codex has been working great for me and I don’t think I ever hit a limit. It works great because I typically break down the tasks small enough that I can fully review and accept.
I’ve always wondered what’s the business case for spending more as I personally feel I am getting so much done.
simlevesque 1 days ago [-]
I really don't like OpenCode. One thing that really irritated me is that on mouse hover it selects options when you're given a set of choices.
cat_plus_plus 2 hours ago [-]
I just got the MiniMax $200/year token plan. Usually it works fine for daily coding; if it gets stuck I pay for some Claude API calls through the Roo gateway. Unlike other plans, this one officially supports running OpenClaw or other API workflows and doesn't suspend you long-term if you use too many tokens, just rate limits per few-hour window.
bachmeier 1 days ago [-]
I just tried Zed with Gemma 4 to see how it does with local models. Impressive speed and quality for the small model with thinking off (E4B). Very slow for the big model with thinking turned on. We'll see if this is better than my current tools (primary is Codex CLI plus qwen3 coder next) but the first impression is good. Especially nice that it configured all of my ollama models automatically.
candl 1 days ago [-]
Which providers nowadays offer coding plans - no per-token pricing, just an API call limit and a monthly fee? Which ones are affordable?
tiku 1 days ago [-]
I'm using z.ai when I hit my Claude limit after a few questions.. it drops in easily to Claude Code.
mark212 15 hours ago [-]
Bizarre and baffling -- an entire post about AI agents for coding and not a single mention of OpenAI, Codex, or ChatGPT (any model). Not that I'm shilling for them in any way, but the consensus among Twitterati is that Codex is better and it's weird that it's not even mentioned as an option?
pixel_popping 1 days ago [-]
It should be noted about OpenRouter that you aren't allowed to expose access to end users; it has to be for internal usage only. That can be fatal, as they have made waves of account bans lately (without warnings).
numlocked 1 days ago [-]
You are absolutely allowed to expose access to end users, as long as you continue to abide by terms of service. We have hundreds, if not thousands, of apps built on openrouter that in turn have end users of their own. We showcase many of them on our /apps ranking page!
pixel_popping 1 days ago [-]
TOS says: access the Site or Service for purposes of reselling API access to AI Models or otherwise developing a competing service;
So yes obviously you can do what you want as long as you abide by terms of service, but the terms of service does NOT allow you to resell the API.
senko 1 days ago [-]
> you aren't allowed to expose the access to end users, it has to be for internal usage only,
> TOS says: access the Site or Service for purposes of reselling API access to AI Models or otherwise developing a competing service;
I think what you meant is "you aren't allowed to expose the access to the API to end users", which is a fair condition IMHO.
You're still allowed to expose the functionality (ie. build a SaaS or AI assistant powered by OpenRouter API), just don't build a proxy.
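To illustrate the distinction: a SaaS built on OpenRouter keeps the key server-side and exposes only its own product API, never the raw endpoint. A minimal sketch - the model ID and key are placeholders, and OpenRouter's chat endpoint is OpenAI-compatible:

```python
import json

# Construct (but don't send) a server-side OpenRouter request. End users call
# your product API; only your backend ever sees this key.
API_URL = "https://openrouter.ai/api/v1/chat/completions"  # OpenAI-compatible

def build_request(user_prompt, model="anthropic/claude-sonnet-4.5",
                  api_key="sk-or-PLACEHOLDER"):
    """Build the headers and JSON body for a chat completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
    })
    return headers, body

headers, body = build_request("Summarize this support ticket")
```

A proxy, by contrast, would hand the user the raw completion endpoint (or the key itself) and meter tokens - that's the reselling the TOS forbids.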
pixel_popping 1 days ago [-]
To be clear, I like Openrouter and recommend it to many people (I don't aim to "shit on it").
It does talk about a competing service, though. If I build a service that offers all the image-gen models of OpenRouter, and charge the user for it per token, am I allowed?
himata4113 1 days ago [-]
I was actually wondering about this since I've seen like 3 comments talking about the same thing, would it happen to be related to money laundering due to the availability of the crypto payment method?
Deathmax 1 days ago [-]
The comments are all from the same author.
OpenRouter recently started enforcing account-level regional restrictions for providers that enforce it (OpenAI, Anthropic, Google) - ie blocking accounts that look like they are being used by users in China. The regional restriction used to be based on the Cloudflare edge worker IP's geolocation and enforced upstream, so a proxy/server running inside of supported regions would get around the geoblocks, but now OpenRouter are using (unspecified) signals like your billing address to geoblock. People say "banned" because the error message says "Author <provider> is banned", which really should be read as "Unable to use models from provider due to upstream ban".
pixel_popping 1 days ago [-]
Which further strengthens the point that you can't do anything you want with API keys, even if you pay for them.
himata4113 1 days ago [-]
There is a huge gap between "doing whatever you want" and illegal activities, as well as upstream restrictions (out of OpenRouter's control).
pixel_popping 1 days ago [-]
What illegal activity? What another user pointed out about crypto isn't it, I'm talking about the fact that you can't open a service through Openrouter and charge your users per Token (aka "reselling" Openrouter), since when is this illegal?
blastslot 1 days ago [-]
[dead]
Maticslot 1 days ago [-]
[dead]
Computer0 1 days ago [-]
When I use the tool ccusage, it says I use $600 of usage a month for my $100. I don't know that this is a good value proposition for me if I want to stay with the same model, which is half the reason I use Claude Code, personally.
blitzar 1 days ago [-]
> Reallocating $100/Month Claude Code Spend
The new gimped Claude Code limits mean my Claude Code spend over the last month was $131. It cost me $20. I also spent $5 on extra usage, which cost me $5.
While VC's are setting fire to money I am going to warm my hands.
542458 1 days ago [-]
I think it is worth noting that "what they charge for api access" != "marginal cost of inference". So I don't think getting, e.g., $40 of api usage for $20 would be insane. $131 for $20 does probably mean somebody is losing money though.
andai 1 days ago [-]
You mean you were getting more than $130 per $20 before?
85% discount is actually a bit lower than I remember. I think it used to be closer to 90-95%. They're getting stingy ;)
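The percentages check out, for what it's worth - a quick sketch using the numbers from this subthread:

```python
# Effective discount: fraction of notional API value you didn't pay for.
def subscription_discount(api_value_usd, paid_usd):
    return 1 - paid_usd / api_value_usd

now = subscription_discount(131, 20)        # $131 of metered usage on a $20 plan
last_year = subscription_discount(450, 20)  # ~$450/month reported last year

print(f"now: {now:.0%}, last year: {last_year:.0%}")
```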
blitzar 1 days ago [-]
I think it was around $400-$500 last year ($20 a day was fairly common) before they added the 7 day limits (and have since slashed the 4 hour limits).
No parallel running; I would very consistently get tokens for over 3 hours then take a walk around the block and come back and be ready to go again.
vanillameow 1 days ago [-]
I ran this just now and for a small web-app I built I used over $50 in a single day. This was using superpowers plugin and almost exclusively coordinating through Opus. Could I get by with 100$ a month without the subscription? Maybe, but I pay for the convenience of just being able to throw Opus with lavish plugins at it (with 5h limits that are, in my opinion, pretty reasonable). I don't really WANT to have to think about when Haiku or Sonnet are enough.
If anything I would consider switching to OpenAI subscription (if I didn't despise them even more than Anthropic as a company), but converting to API use seems completely infeasible to me. I'd have to severely cut back on my use for not much benefit, other than having maybe an agent thats a little less jank than CC.
sandos 14 hours ago [-]
I don't understand how CC can burn that much money. I've built many webapps using Copilot, and our normal business tokens rarely run out. I would say I've never exceeded 150% of a normal month's tokens.
vanillameow 10 hours ago [-]
I think Opus is just an expensive model on the API, especially without context management. A single message with near-full context (I think this was still on 250k as well) costs like $1 or something like that, iirc.
Imo this is the premium I pay right now to just not have to worry about this.
The project where I burned $50 in a day was using the superpowers plugin (a set of skills that makes Claude meticulously plan out design and implementation, interview for details, use subagents for subtasks and review them independently, etc.) - it burns tokens like crazy, but it has given me super good results for custom software tools for myself.
I would probably change my approach if I a) was creating software for customers where I had to actually worry about the implementation details or b) if I was forced to switch to API and couldn't just throw Opus at a 28-task plan for an hour. But this works for me right now so meh. I feel like I'm in some rare Goldilocks zone where Anthropic is not super ripping me off (I use CC quite heavily and generate real value for myself) but I also don't go crazy if I go 2 days without building the next SaaS startup.
blitzar 1 days ago [-]
Depending on your workflow, in the spirit of reallocating a $100/month subscription, it may be worth dropping to the $20/month plan (or equivalent at other providers) and then paying as you go on the (rare) occasions you "build a small web-app and use over $50 in a single day".
But at that point we are just min/maxing the details, and all I can say is if you are on a $100/$200 a month subscription to any of these services and not using them regularly then you shouldn't be on a $200 subscription any more than you should be on a $700 a month gym membership when you go every 3 months for 15 minutes.
vanillameow 1 days ago [-]
Nah I get you, but for me personally, I do use CC a ton. It's building me so many useful internal tools right now, and deep research is also bootstrapping me into some new hobbies etc. I think I'm kind of in a rare-ish (maybe not so much on HN, but for the general population) spot where I'm not really trying to make some SaaS get quick rich scheme, but just directing CC to make apps that would take me a few days to make in a few hours, smoke test them, and solve a problem I had before. (e.g. photo tagging automation, MCP connections for personal services for documentation of chats or setting up todos, ansible playbooks for VMs, setting up data pipelines from external APIs for home automation...)
I deffo get more perceived value out of it than the $100 I pay.
Could I get MORE value with the same $100? Imo only through OpenAI (no harness lock-in and more lenient limits), but I deeply dislike the way their company is evolving. Admittedly, recent launches from Anthropic like managed agents and Mythos Preview don't make me very hopeful that the individual developer pricing is here to stay, but I'll use what I can get while I can get it.
Could I get my required value with less than $100? Mayyybe I could get by with, like, three Anthropic $20 plans? Or 2x$20 and an OAI $20? But this is so min-maxy that I just don't really want to bother. Pay-by-token would kill my workflow instantly; I'd have to add so many steps for model selection alone. I'll cross that bridge when Anthropic cuts me off.
I agree though most people on the $200 plans are either just not using them or in some deep AI psychosis. I'd like to exclude myself from these groups, but the pipeline to AI psychosis seems very wishy washy to begin with (the thread the other day about iTunes charts being AI dominated had a surprising amount of people defending AI music, imo).
philipp-gayret 1 days ago [-]
I like and do use Zed, but be aware that functionality like Hooks is not supported in their Claude Code integration; as a heavy user of Hooks, I would stick with the terminal.
kisamoto 1 days ago [-]
I'm always interested in how people use tools. I like to have a full editor to review code as a complement to the CLI and as I don't often use hooks the integration is also good enough for me.
1. What do you use the hooks for?
2. Do you use an editor alongside the CLI to review code or only examine the diffs?
yakuphanycl 1 days ago [-]
[flagged]
Computer0 1 days ago [-]
I have had credits on OpenRouter that haven't been deleted since near the project's launch; I believe 365 days is not a hard rule but rather a right they reserve.
numlocked 1 days ago [-]
COO of OpenRouter here. That's right — we haven't done it to date, but we can't have unlimited liabilities stacking up forever. At some point we will start expiring credits from accounts that have seen zero activity in over a year.
blitzar 1 days ago [-]
Maybe a bad suggestion, but could you do an inactivity "fee" of 25%/year (min $5) or something similar? I like the pre-pay system everyone in AI seems to have settled on; it's better than the AWS bills that we all know and love.
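The proposal above is simple arithmetic; a minimal sketch, where the 25%/year rate and $5 floor are this comment's hypothetical numbers, not anything OpenRouter has announced:

```python
def inactivity_fee(balance: float, rate: float = 0.25, minimum: float = 5.0) -> float:
    """Yearly fee on a dormant balance: a percentage with a dollar floor,
    capped so it never exceeds what's left on the account."""
    if balance <= 0:
        return 0.0
    return min(balance, max(balance * rate, minimum))

# A $10 deposit left untouched drains to zero within two fee cycles,
# because the $5 floor dominates at small balances:
balance = 10.0
for _ in range(3):
    balance -= inactivity_fee(balance)
```

With a floor like this, small dormant balances disappear quickly (presumably the point), while larger balances decay geometrically instead.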
ac29 1 days ago [-]
> we can’t have unlimited liabilities stacking up forever
The liabilities are completely offset by prepayments from your customers though. Even better, you can earn interest on the deposits without paying any out.
If you just don't want the liabilities on the books, issue refunds. Expiring credits feels like a cash grab.
theshrike79 1 days ago [-]
It's just basic bookkeeping: storing money across fiscal years is a nightmare to manage. (At least over here; I don't know what country OpenRouter is based in.)
kisamoto 1 days ago [-]
Thank you for taking the time to explain that - makes sense. I lifted what was present in your terms of service as I'd like to understand the minimum time I have.
indigodaddy 1 days ago [-]
What if I deposited $10, and have lots of recent activity on free models and have barely touched the $10 for payg models?
threatofrain 1 days ago [-]
In CA gift cards don’t expire and the industry does fine without having people buy expiring money.
andrewmcwatters 1 days ago [-]
[dead]
hhthrowaway1230 1 days ago [-]
note: doesn't openrouter charge 5.5% fee?
kisamoto 1 days ago [-]
You are absolutely correct, I was not aware of this. I will update the article accordingly and perhaps it's more worthwhile to stay solely on Cursor with the limited models.
Sadly Zed seems to add 10% so it's still more worthwhile to use OpenRouter.
cedws 1 days ago [-]
I feel like a bit of an idiot because I didn’t know this either. I just assumed OR was another startup burning money to provide models at cost.
OpenRouter is a valuable service but I’ll probably try to run my own router going forward.
giancarlostoro 1 days ago [-]
Look again, they don't charge that fee until after "1M requests per month" whatever that means? Oh that's if you bring your own provider keys.
Come on at least write the Hackernews replies yourself.
kisamoto 1 days ago [-]
I did. Perhaps too much consumption of AI responses but articles and engagement are written by me - a human.
cbg0 1 days ago [-]
That's exactly what a clanker would say. ^/s
glitchcrab 1 days ago [-]
Only the opening sentence has an AI smell; the rest is definitely written by a fleshy meatbag
urnfjrkrkn 1 days ago [-]
I would suggest exploring paid plans from the model providers themselves. They are much better value than plans bundled with editors or API-based usage through OpenRouter. And the Chinese companies have versions hosted in Singapore or the US.
Also, ditching Claude Code is a mistake. Claude is quite a capable model and still great value. I would keep it, even if just for code reviews and planning. Anthropic allows Pro plan use in Zed.
heliumtera 1 days ago [-]
I heard you liked men in the middle, so we put a man in the middle of men in the middle.
0xbadcafebee 1 days ago [-]
I just so happen to be doing a price comparison for different cloud LLM providers right now. It turns out some of the cheapest providers with the highest limits are ones you might not have heard of.
OpenCode Go has the simplest plan at the highest rate limits for any subscription plan with multiple model families, and it's $10/month ($5/month for first month). With the cheapest model in the plan (MiniMax M2.5), it is a 13x higher rate than Claude Max, at 1/10th the price. The most expensive model (GLM 5.1) gives you a rate of 880 per 5h, which is more than any other $10 plan. I don't expect this price to last, it makes no sense. OpenCode also has a very generous free tier with higher rates than some paid plans, but the free models do collect data.
The cheapest plan of all is free and unlimited: GitHub Copilot. They offer 3 models for free with (supposedly) no limit: GPT-4o, GPT-4.1, and GPT-5-mini. I would not suggest coding with them, but for really basic stuff, you can't beat free. I would not recommend their paid plans; they actually have the lowest limits of any provider, and the most obtuse per-token pricing. (FYI, GitHub Copilot OAuth is officially supported in OpenCode.)
The next cheapest unlimited plan is BlackBox Pro. Their $10/month Pro plan provides unlimited access to MiniMax M2.5. This model is good enough for coding, and the unlimited number of requests means you can keep churning with subagents long after other providers have hit a limit.
The next cheapest is MiniMax Max, a plan from the makers of MiniMax. For $50/month, you get 15,000 requests per 5-hours to MiniMax M2.7. This is not as cheap as OpenCode Go, which gives you 20,000 requests of MiniMax M2.5 for $10, but you are getting the newer model.
If you don't want to use MiniMax, the next cheapest is Chutes Pro. For $20/month, you get a monthly limit of 5,000 requests.
Note: this calculation is inaccurate, for multiple reasons. For one, it's entirely predicated on working 8 hours a day, 22 days a month; I'll recalculate at some point to find the cheapest option if you wanted to churn 24/7. For another, some providers (coughANTHROPIC) don't actually tell you what their limits are, so we have to guess and use an average. But based on my research, the calculations seem to match up with the per-request API cost reported at OpenRouter. Happy to take suggestions on improvements.
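For what it's worth, the comparison above reduces to a small calculation. A sketch under the stated assumptions (8 hours a day, 22 days a month, and the per-5-hour limits quoted in this comment, which are the commenter's claims rather than verified pricing):

```python
WORK_HOURS_PER_MONTH = 8 * 22  # the comment's working-month assumption

def monthly_requests(per_5h_limit: int) -> int:
    """Scale a per-5-hour request limit to a full working month."""
    return int(per_5h_limit * WORK_HOURS_PER_MONTH / 5)

def cost_per_1k_requests(monthly_price_usd: float, per_5h_limit: int) -> float:
    """Effective price per 1,000 requests if you always max out the limit."""
    return monthly_price_usd / monthly_requests(per_5h_limit) * 1000

# Figures quoted above: OpenCode Go at $10/mo with 880 GLM requests per 5h,
# MiniMax Max at $50/mo with 15,000 requests per 5h.
for name, price, limit in [("OpenCode Go (GLM)", 10, 880),
                           ("MiniMax Max", 50, 15000)]:
    print(f"{name}: ${cost_per_1k_requests(price, limit):.3f} per 1k requests")
```

Note the comparison flips depending on which model's limit you plug in, and it falls apart entirely if your real usage isn't limit-bound.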
mongrelion 23 hours ago [-]
I have been so far happy with the value that Copilot brought but for the past few weeks I have felt the chokehold on the number of requests.
I have had the chance to test the main Chinese models through OpenRouter, but pay-as-you-go is expensive compared to a subscription; at the same time, I don't want to be married to a single provider.
Thanks for bringing OpenCode Go to my attention. Your comparison is the research I didn't know I needed, and I will be cancelling my Copilot subscription to replace it with OpenCode Go right away.
da_ordi_ 1 days ago [-]
Yep, I was comparing opencode go ($10/month) with copilot pro ($10/month) this morning.
opencode go gives about 14x the requests of copilot pro. I was like, there must be something not right.
Then I compared the best model on opencode go, GLM 5.1, with Anthropic Opus 4.6. Yes, Opus is better on most benchmarks, but GLM 5.1 is not too far behind.
g8oz 1 days ago [-]
Just on Zed: its speed and responsiveness are very impressive. Feels as snappy as Notepad++.
BoredPositron 1 days ago [-]
Get a Gemini subscription and pipe the antigravity tokens into claude code. You can have five family accounts on one subscription and every account gets the same amount of tokens. It's the best value there is atm and you get more claude tokens than from anthropic themselves.
faeyanpiraat 1 days ago [-]
Sounds like a good way to get your google account banned
pyinstallwoes 1 days ago [-]
Sorry can you expand on that? I have a Gemini subscription from a Google pro account but never used it much. I can use it with Claude Code?? Hmm. I’ll look it up. Thanks!
phainopepla2 1 days ago [-]
Be aware it's against the terms of service. Google account ban is possible
atlgator 1 days ago [-]
I am very disappointed that Anthropic killed the use of Max subscriptions for OpenClaw, especially when I never hit my usage limits on it. Perhaps I will try this combo as an alternative.
Rather than trying to lie and get people to use your service, be honest what the upsides/downsides are, and only add your spam when it's at least a bit related, otherwise it just comes off as insincere when you're spamming your own platform in unrelated threads.
i_love_retros 1 days ago [-]
I can't believe people are spending $100 a month on this! You're all mad!
grebc 1 days ago [-]
When you consider the cross section of the tech community posting on HN, is it really that surprising?
It’s mad for sure, but I’d bet 99.9% of people spending money on AI aren’t spending their own hard earned sooo… “YOLO it’s a business expense/investment”…
Syzygies 1 days ago [-]
Hearing my "before I die" math and code ambitions, a psychiatrist friend tried to convince me to hire a full-time programming assistant. Then came AI.
Money is relative. I retired at less than the average professor salary (all ages) at a not-rich school. I would have made more in tech. I still have weeks where the market goes up 2000x my AI budget, just the retirement savings from my salary. Anyone who isn't living in a van and eating peanut butter if they must, to save the max toward retirement, isn't recognizing how profoundly our system is rigged to favor saving.
vidarh 1 days ago [-]
$200 for Claude, $50 to Open AI, and maybe $100 for Openrouter, and a second Claude account paid by a client... Likely to increase.
It easily pays for itself 10x over.
kisamoto 1 days ago [-]
I had a similar opinion a couple of years ago, content with more of an autocomplete.
Now I'm happy with agents as the models and harnesses have improved significantly but the token usage comes at a cost.
gozzoo 1 days ago [-]
some are spending 100/day or even 1000/day. they must really be mad :)
i_love_retros 1 days ago [-]
Drunk on perceived power
nubg 1 days ago [-]
Your ignorance is our opportunity :)
dboreham 1 days ago [-]
To get the equivalent of a junior developer that would cost $80,000/yr + benefits?
andrewmcwatters 1 days ago [-]
[dead]
1. The LLM provider doesn't know it's you (unless you have personally identifiable information in your queries). If N people are accessing GPT-5.x using OpenRouter, OpenAI can't distinguish the people. It doesn't know if 1 person made all those requests, or N.
2. The ability to ensure your traffic is routed only to providers that claim not to log your inputs (not even for security purposes): https://openrouter.ai/docs/guides/routing/provider-selection...
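Point 2 is configured per request. A hedged sketch of what that looks like in an OpenRouter chat-completions request body (field names per the provider-routing docs linked above; the model and prompt are placeholders, so verify against the docs before relying on this):

```python
# Request body asking OpenRouter to route only to providers that
# claim not to retain inputs, and to fail rather than fall back.
payload = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "hello"}],
    "provider": {
        "data_collection": "deny",   # skip providers that log/train on inputs
        "allow_fallbacks": False,    # error out instead of using other providers
    },
}
```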
It's been forever since I played with LiteLLM. Can I get these with it?
FWIW this is highly unlikely to be true.
It's true that the upstream provider won't know it's _you_ per se, but most LLM providers strongly encourage proxies like OpenRouter to distinguish between downstream clients for security and performance reasons.
For example:
- https://developers.openai.com/api/docs/guides/safety-best-pr...
- https://developers.openai.com/api/docs/guides/prompt-caching...
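The linked safety-best-practices guide suggests proxies pass a stable, non-identifying end-user ID upstream (OpenAI's `user`/safety-identifier field). A sketch of how a proxy might derive one; the salt value and truncation length here are arbitrary illustrative choices, not anything OpenRouter documents:

```python
import hashlib

def downstream_user_id(internal_username: str, salt: str = "proxy-secret") -> str:
    """Stable per-user token a proxy could forward upstream: lets the
    provider rate-limit and abuse-score individual end users without
    learning who they are (as long as the salt stays private)."""
    digest = hashlib.sha256(f"{salt}:{internal_username}".encode()).hexdigest()
    return digest[:32]
```

This is consistent with the parent's point: the upstream provider can tell requests apart without being able to tie them to a person.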
For prompt caching, they already say they permit it, and do not consider it "logging" (i.e. if you have zero retention turned on, it will still go to providers who do prompt caching).
Without caching, it would make sense for the identifier to be per-request (more like a transaction ID), as it could then be tied internally back to a user while maintaining external anonymity, but unfortunately I don't believe that is the case.
As for LiteLLM, the company you would pay for inference is going to know it is “you” — the account — but LiteLLM would also have the same effect of appearing to be a single source to that provider. That said, a uniqueness for a user may be passed (as is often with OpenRouter also) for security. Only you know who the users are, that never has to leave your network if you don’t want.
2 - well, you select the providers, so that's pretty much on you? :-) basically, you are establishing accounts with the inference providers you trust. Bedrock has ZDR, SOC, HIPAA, etc. available, even for token inference, as an example. Cost is higher without cache, but you can't have true ZDR and cache (that I know of), because a cache has to be stored between requests. The closest you could get is maybe a secure inference container, but that piles on the cost. Still, there are plenty of providers with ZDR policies.
LiteLLM is effectively just a proxy for whatever supported (or OpenAI, Anthropic, etc compatible api provider) you choose.
The underlying provider can still limit rates. What Openrouter provides is automatic switching between providers for the same model.
(I could be wrong.)
If you're only using flagship model providers then openrouter's value add is a lot more limited
The minus is that context caching only works moderately well at best, rendering the savings nearly useless.
Not true in any non startup where there is an actual finance department
But if OpenRouter does better (even though it's the same sort of API layer) maybe it's worth it?
LiteLLM proxy also adds quite some overhead as well.
I have personally settled on a mix of Bifrost as my router which connects to OpenRouter or some other providers that I deem more privacy friendly.
well worth the 5% they take
if that wasn't the reason, hey that's actually a great way to launder money (not financial advice).
Or what are you really saying here? I don't understand how that's related to "you don't have the right to do what you want with the API Key", which is the FUD part.
Quote from their own TOS: access the Site or Service for purposes of reselling API access to AI Models or otherwise developing a competing service;
When you say "you don't have the right to do what you want with the API Key" it makes it sound like specific use cases are disallowed, or something similar. "You don't have the right to go against the ToS, for some reason they block you then!" would have been very different, and of course it's like that.
Bit like complaining that Stripe is preventing you from accepting credit card payments for narcotics. Yes, just because you have an API key doesn't mean somehow you can do whatever you want.
Are we allowed yes or not to make a service that charge per Token to end-users, like giving access to Kimi K2.5 to end-users through Openrouter in a pay per token basis?
Eg: Ctrl+P "Open Fol.." in Zed does not surface "Opening a Folder". Zed doesn't call them folders. You have to know that's called "Workspace". And even then, if you type "Open Work..." it doesn't surface! You have to purposefully start with "work..."
They are blowing their "weirdness budget" on nonsense.
I have actually ditched PyCharm for the snappiness of Zed. But the paper cuts are really adding up.
Spent a couple of hours trying to make the Svelte extension ignore a particular type of false positive CSS error, failed, and returned to VS Code
Will definitely give it another chance when the extension system is more mature though!
Just the floating and ephemeral "Search in files" modal in Jetbrain IDEs would convince me to switch from any other IDE.
But their tab complete situation is abysmal, and Supermaven got macrophaged by Cursor
I opened just one of the TypeScript projects inside VSCode and I see something like 1 GB of memory use (combining the helper processes). I'm not using it actively, so no extra plugins and so on.
That's on mac, so I guess it may vary on other systems.
I don’t have any extensions installed and I’m basically leaving it open, idle, as a note scratch space. I do have projects open with many files but not many actual files are open
Anyway idk
I'd like to give the new GLM models a try for personal stuff.
And I'm seeing the same thing in my sphere- everyone is bailing Anthropic the past few weeks. I figure that's why we're seeing more posts like this.
I hope they're paying attention.
It seems Cursor somehow builds a better contextual description of the workspace, so the model knows what I'm actually trying to achieve.
The problem is that with Cursor I'm paying per-token, so as GP suggested you can easily spend $100+ per month vs $20 on Claude Code.
Could it be related to this?: https://news.ycombinator.com/item?id=47660925
If you're trying to minimize cost then having one of the inexpensive models do exploratory work and simple tasks while going back to Opus for the serious thinking and review is a good hybrid model. Having the $20/month Claude plan available is a good idea even if you're primarily using OpenRouter available models.
I think trying to use anything other than the best available SOTA model for important work is not a good tradeoff, though.
Do you think this would be a decent approach?
Also, which client would I use for this? OpenCode? I don't think Claude Code supports using other models. Thoughts?
I use claude to build requirements.md -> implementation.md -> todo.md. Then I tell opencode + openrouter to read those files and follow the todo using a cheap (many times free) model.
It works 90% of the time. The other 10% it will get stuck, in which case I revert to claude.
That has allowed me to stay on the $20/month claude subscription as opposed to the $100.
And people keep claiming the token providers are running inference at a profit.
Not everyone gets $1K of usage, and you don't know how fat the per-token margins are. It's like saying the local buffet place is losing money because you eat $100 worth of takeout for $30.
Well, we're going to find out sooner rather than later. Right now you don't know how thin (or negative) the margins are, either, after all.
All we know for certain is how much VC cash they got. Revenue, spend, profit, etc calculated according to GAAP are still a secret.
$1K is not actual cost, just API pricing being compared to subscription pricing. It is quite possible that API has a large operating margins, and say costs only $100 to deliver $1K worth of API credits.
Extrapolating that out, the subscription pricing is HEAVILY subsidized. For similar work in Claude Code, I use a Pro plan for $20/month, and rarely bang up against the limits.
It's obviously capital-subsidized and so I have zero expectation of that lasting, but it's pretty anti-competitive to Cursor and others that rely on API keys.
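The parent's point is easy to make concrete. A back-of-envelope sketch where all numbers are the thread's hypotheticals ($1,000 of API-priced usage costing ~$100 to deliver, a $20/month plan):

```python
def implied_margin(api_price: float, delivery_cost: float) -> float:
    """Fraction of API revenue kept as gross margin."""
    return (api_price - delivery_cost) / api_price

def subscription_gross(sub_price: float, usage_at_api_prices: float,
                       cost_ratio: float) -> float:
    """Gross profit on a subscription whose usage would bill
    `usage_at_api_prices` at API rates, if true cost is that times cost_ratio."""
    return sub_price - usage_at_api_prices * cost_ratio

# If API pricing carries a 90% margin, a $20 plan only goes underwater
# once a user's API-equivalent usage exceeds $200/month:
heavy = subscription_gross(20, 1000, 0.1)  # heavy user: negative
light = subscription_gross(20, 150, 0.1)   # lighter user: positive
```

So "heavily subsidized relative to API list prices" and "losing money" are different claims; which one holds depends on a cost ratio nobody outside the labs knows.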
That being said, they can't stop launching new models, so training is not a one time task. Therefore one might argue that it is part of the marginal cost.
Any insights / suggestions / best practices?
> Run <other harness> in tmux and interrogate it how feature X works, then build me the equivalent as a pi extension.
Maybe in a few years there will be obvious patterns with harnesses having built really optimal flows, but right now it works so much better to experiment and try new approaches and prompts and flows, and pi is the easiest one to tweak and make it your own.
That’s what really appeals to me. I’ve been fighting Claude Code’s attempts to put everything in memory lately (which is fine for personal preferences), when I prefer the repo to contain all the actual knowledge and learnings. Made me realise how these micro-improvements could ultimately, some day, lead to lock-in.
> Run <other harness> in tmux and interrogate it how feature X works, then build me the equivalent as a pi extension.
I’ll give it a try!
With the anthropic billing change (not being able to use the max credits for pi) I think I have to cancel - as I'm whirring through credits now.
Going to move to the $250/mo OpenAI codex plan for now.
Is OpenAI codex not also charging by usage instead of subscription when using pi?
No need for database MCP, I use postgres and tell it to use psql.
Occasionally I use prettier to remove indentation - the LLM makes a lot less edit errors that way. Just add the indent back before you commit. Or tell pi to do it.
It's designed to be a small simple core with a rich API which you can use for extensions (providing skills, tools, or just modifying/extending the agent's behaviour).
It's likely that you'll eventually need to find extensions for some extended functionality, but for each feature you can pick the one that fits your need exactly (or just use Pi to hack a new extension).
I am only doing single project workflows, but with Z.ai I feel like it opens a whole new door to parallel workflows without hitting usage limits.
I tested one of the other models that everyone is raving about yesterday (Qwen 3.6 plus) and within minutes found myself arguing with it even over a very simple task. After about 30 minutes (in which token usage never went over 50k because it was just me rewinding to give it more and more explicit instructions which it kept ignoring), I reverted everything and did it with Opus in literally about 4 minutes, after intentionally giving Opus a much more vague prompt.
at first i thought i was going to build lots of extra plugins and commands but what ended up working for me is:
- i have a simple command that pulls context from a linear issue
- simple review command
- project specific skills for common tasks
He went on an "OSS vacation", which is perfectly reasonable and said he'd be back on a certain date. I had a PR open for a trivial fix, someone asked when it would land. I shared he was still away. After his return I politely asked, "@badlogic hey, what can we do to progress this? Thanks x"
I then got what I would consider an abusive reply, because he confused me with someone else. In the meantime he extended his vacation. Didn't even think his shitty attitude was worthy of an apology, that HE confused me with someone else.
https://github.com/badlogic/pi-mono/discussions/1475#discuss...
And another other thing I fixed with no attribution, just landed it himself separately. https://github.com/badlogic/pi-mono/discussions/1080
and
https://github.com/badlogic/pi-mono/issues/1079#event-223896...
Now he's seemingly marked anything with my name on as a "clanker", despite all my changes being by hand.
I've been around open source enough to have a thick skin, but when i'm doing something "for fun" and someone treats you like that, i'd rather avoid it as far as possible. I certainly could not in good faith use this project for anything work related.
Honestly, it seems like both of you were feeling a bit "grumpy" at the moment, but sending passive aggressiveness towards the maintainer you are trying to get to merge your code (or not your code, someone else's code?) seems like a very bold strategy regardless.
But that doesn't negate the maintainer talking to people like that (and taking contributions without attribution).. and the net result is I don't want to use the software, and frankly they probably won't miss me.. so the end result is neutral.. I just find it sad.
Quite sure most (perhaps >99%) adult people would consider this passive aggressive.
But yeah, I agree with you for the rest part. Why did Mario assume that bot is you...?
That's a great opportunity for self reflection.
I think at the time I was frustrated, it felt unfair and I couldn't understand it.
Then I thought, this guy probably does need more time off. Which was a genuine thought.
But that is where I should have stopped. While it was a genuine thought, the way I expressed it was passive aggressive. I am owning that.
The thought was correct, he was probably stressed. You made it worse.
> The way i expressed it, whilst genuine thought, was expressed in a way which was passive aggressive
The way you expressed is a technicality, what counts here was your action.
Open source developers owe you nothing, you can always fork and implement your feature by yourself.
Behaving the way you did, treating it as something about you, is very selfish.
As someone else pointed out cooler heads and less passive aggressive responses would've resolved this issue easily.
OpenCode picked up my CLAUDE.md files and skills straight away, and I got similar performance to Opus 4.6.
I'm pretty conservative when it comes to clearing the context, and I also tend to provide the right files to work on (or at least the right starting point).
I had seen prior to using the model that it starts producing much worse results when the context used is larger, so my usage style probably helps getting better results. I work like this with Claude Code anyway, so it wasn't a big change.
Many of us got the annual Lite plan when they had the $28 discount. But even at $120 I think it's a good deal.
Z.ai seems crazy expensive in comparison, although I wonder if inference speeds have a noticeable difference.
I have been wanting to subscribe but based on how awful the experience is for most people, I just can’t pull the trigger
FWIW, I've never dealt with outages since I signed up over 3 months ago (Lite plan). It is slow - always has been. I can live with that.
At the same time, I'm not using it for work. It's for the occasional project once in a while. So maybe I just haven't hit any limits? I did use it for OpenClaw for 2-3 weeks. Never had connection issues.
Looking at https://docs.z.ai/devpack/faq
you can see the details of their limits. Seems GLM 5.1 has low thresholds, and will get lower starting May. On Reddit I see some people switching to GLM 5 and claiming they haven't hit limits - the site doesn't indicate the limits for that model.
They also say that those who subscribed before February have different limits - unsure if it's lower or higher!
GLM-4.7 is still a fairly capable model. Not as good as Opus, but for most personal projects it's been adequate. I see on Reddit plenty of people plan using GLM-5.1, and use 4.7 for implementation.
Because GH is accessing the API behind the scenes, you should face less degradation when using Sonnet/Opus models compared to a Claude subscription.
Keep a ChatGPT $20 subscription alongside for back-and-forth conversations and you'll get great bang for buck.
I like the VSCode integration, and the MCP/LSP usage sometimes surprised me compared to the dumb grep from CC. Ironically, VSCode is becoming my terminal emulator of choice for all the CLI agents (SSH/container access, automatic port mapping, etc.); it's more convenient than tmux sessions for me. So Copilot would be ideal for me, but it's tweaked to be a budget/broad-scope tool rather than a tool for professionals who would pay to get work done.
It turns it into a very good value for money, as far as I'm concerned.
GHCP at least is transparent about the pricing: hit enter on a prompt = one request. CC/Codex use some opaque quota scheme, where you never really know if a request will be 1%, 2%, or 10% of your hourly max, let alone weekly max.
I've never seen much difference with context ostensibly being shorter in GHCP, all of the models (in any provider) lose the thread well before their window is full, and it seems that aggressive autocompaction is a pretty standard way to help with that, and CC/Codex do it frequently.
Then we've had wildly different results. Running CC and GH copilot with Opus 4.6 on same task and the results out of CC were just better, likewise for Codex and GPT 5.4. I have to assume it's the aggressive context compaction/limited context loading because tracking what copilot does it seems to read way less context and then misses out on stuff other agents pick up automatically.
https://www.techradar.com/pro/bad-news-skeptics-github-says-...
We are not allowed to use anything other than our company-provided GHCP credentials due to the data retention clause in our contracts, i.e. they are not allowed to use our data.
I am in a situation where every sub-folder has its own language server settings, lint settings, etc. VSCode (and forks) can handle this by creating a workspace, adding each folder to the workspace, and having a separate .vscode per-folder. I haven't figured out how to do the same with Zed.
I would love to stop using VSCode forks
The worst I saw: multiple parallel agents (opencode and pi coding agents), with Kimi and GLM, almost non-stop development during the work day, and still only 15-20% session consumption max (I think it's a 2h bucket). Never hit the limit.
In contrast, on the $20 Claude plan a similar mode of working consumed my quota after just a few hours.
For several models like Kimi and GLM they run B300s and performance is really good. At launch I got closer to 90-100 tps. Nowadays it's a stable ~60 tps across most models I used (utility models < 120B are almost instant).
He cites $70 of remaining credit and says that's a good thing because it rolls over.
But spending $70 via the API (he says he still prefers Opus) is far less cost-effective than a Max plan from Anthropic.
The article seems to be nudging us to setup OpenRouter but the premise isn't fully true. A bit of diversity is excellent, but the costs are going to (largely) prohibit it in reality?
I find that a lot of my Claude usage goes unused and then when I'm coding or leaning on agents I hit a limit and have to wait. I don't like that dynamic. I do have Extra Usage enabled (with a cap) but then I'm spending more than the $100 I already do.
I'm learning that a lot of people seem to consistently stay within limits and that works for them but I was looking for something different for myself.
The real pain is that Anthropic don't easily quantify usage (which can now change over the day). How many tokens is it? Minimum? Maximum? I tried to quantify this with OpenTelemetry for a while but have decided to move to this more flexible setup.
I switched to OpenCode Zen + GitHub Copilot. For some reason, Claude Code burns through my quota really quickly.
https://opencode.ai/zen
Honesty as a marketing strategy is really undervalued in cases like this
Due to the quota changes, I actually find myself using Claude less and less
If I just let opencode zen run claude opus to plan and execute, I'd spend $20 in like 5 minutes lol
Personally, I've had a lot of good results in my little personal projects with Kimi K2.5, GLM 5 and 5.1, and MiniMax M2.5.
I might be paranoid, but I feel that access to models will become more constrained in the future as the industry gets more regulated.
We are not the only ones. I found other people online experiencing the same issue. It is hard to tell how widespread this is, but it is strange to say the least.
My backup has been Opencode + Kimi K2. It's definitely not as strong as even Sonnet but it's pretty fast and is serviceable for basic web app work like the above.
It's nice that it works for the author, though, and OpenRouter is pretty nice for trying out models or interacting with multiple ones through a unified platform!
It's not just Zed: Copilot also reduces the capabilities and options available compared to using the models directly.
No thanks, definitely agree with the Open Router approach or native harness to keep full functionality.
OpenRouter credit rollover is the real insight — credits that don't expire vs time windows that reset whether you used them or not. I'm surprised Anthropic hasn't offered a token pack option alongside the subscription.
I’ve always wondered what’s the business case for spending more as I personally feel I am getting so much done.
So yes, obviously you can do what you want as long as you abide by the terms of service, but the terms of service do NOT allow you to resell the API.
> TOS says: access the Site or Service for purposes of reselling API access to AI Models or otherwise developing a competing service;
I think what you meant is "you aren't allowed to expose the access to the API to end users", which is a fair condition IMHO.
You're still allowed to expose the functionality (ie. build a SaaS or AI assistant powered by OpenRouter API), just don't build a proxy.
It does talk about a competing service, though. If I build a service that offers all the image-gen models on OpenRouter and charges users per token, am I allowed?
OpenRouter recently started enforcing account-level regional restrictions for providers that enforce it (OpenAI, Anthropic, Google) - ie blocking accounts that look like they are being used by users in China. The regional restriction used to be based on the Cloudflare edge worker IP's geolocation and enforced upstream, so a proxy/server running inside of supported regions would get around the geoblocks, but now OpenRouter are using (unspecified) signals like your billing address to geoblock. People say "banned" because the error message says "Author <provider> is banned", which really should be read as "Unable to use models from provider due to upstream ban".
The new gimped Claude Code limits mean my Claude Code usage over the last month was worth $131 at API rates. It cost me $20, plus $5 of extra usage.
While VC's are setting fire to money I am going to warm my hands.
85% discount is actually a bit lower than I remember. I think it used to be closer to 90-95%. They're getting stingy ;)
No parallel running; I would very consistently get tokens for over 3 hours then take a walk around the block and come back and be ready to go again.
If anything I would consider switching to OpenAI subscription (if I didn't despise them even more than Anthropic as a company), but converting to API use seems completely infeasible to me. I'd have to severely cut back on my use for not much benefit, other than having maybe an agent thats a little less jank than CC.
Imo this is the premium I pay right now to just not have to worry about this. The project where I burned $50 in a day was using the superpowers plugin (a set of skills that makes Claude meticulously plan out design and implementation, interview for details, use subagents for subtasks, and review them independently). It burns tokens like crazy, but it has given me super good results for custom software tools for myself.
I would probably change my approach if I a) was creating software for customers where I had to actually worry about the implementation details or b) if I was forced to switch to API and couldn't just throw Opus at a 28-task plan for an hour. But this works for me right now so meh. I feel like I'm in some rare Goldilocks zone where Anthropic is not super ripping me off (I use CC quite heavily and generate real value for myself) but I also don't go crazy if I go 2 days without building the next SaaS startup.
But at that point we are just min/maxing the details, and all I can say is if you are on a $100/$200 a month subscription to any of these services and not using them regularly then you shouldn't be on a $200 subscription any more than you should be on a $700 a month gym membership when you go every 3 months for 15 minutes.
I deffo get more perceived value out of it than the 100$ I pay. Could I get MORE value with the same 100$? imo only through OpenAI (no harness lock in and more lenient limits), but I deeply dislike the way their company is evolving. Admittedly, recent launches from Anthropic like managed agents and Mythos Preview don't make me very hopeful the individual developer pricing is here to stay, but I'll use what I can get while I can get it.
Could I get my required value with less than 100$? Mayyybe I could get by with like, three Anthropic 20$ plans? or 2x20$ and an OAI 20$? but this is so min-maxy that I just don't really want to bother. Pay by token would kill my workflow instantly. I'd have to add so many steps for model selection alone. I'll cross that bridge when Anthropic cuts me off.
I agree though most people on the $200 plans are either just not using them or in some deep AI psychosis. I'd like to exclude myself from these groups, but the pipeline to AI psychosis seems very wishy washy to begin with (the thread the other day about iTunes charts being AI dominated had a surprising amount of people defending AI music, imo).
1. What do you use the hooks for?
2. Do you use an editor alongside the CLI to review code or only examine the diffs?
The liabilities are completely offset by prepayments from your customers though. Even better, you can earn interest on the deposits without paying any out.
If you just dont want the liabilities on the books, issue refunds. Expiring credits feels like a cash grab.
Sadly Zed seems to add 10% so it's still more worthwhile to use OpenRouter.
OpenRouter is a valuable service but I’ll probably try to run my own router going forward.
https://openrouter.ai/docs/guides/overview/auth/byok
Also, ditching Claude Code is a mistake. It is a quite capable model and still great value. I would keep it, even if just for code reviews and planning. Anthropic allows Pro plans to be used in Zed.
OpenCode Go has the simplest plan at the highest rate limits of any subscription plan with multiple model families, and it's $10/month ($5/month for the first month). With the cheapest model in the plan (MiniMax M2.5), it is a 13x higher rate than Claude Max, at 1/10th the price. The most expensive model (GLM 5.1) gives you a rate of 880 requests per 5h, which is more than any other $10 plan. I don't expect this price to last; it makes no sense. OpenCode also has a very generous free tier with higher rates than some paid plans, but the free models do collect data.
The cheapest plan of all is free and unlimited - GitHub Copilot. They offer 3 models for free with (supposedly) no limit - GPT-4o, GPT-4.1, and GPT-5-mini. I would not suggest coding with them, but for really basic stuff, you can't get better than free. I would not recommend their paid plans, they actually have the lowest limits of any provider. They also have the most obtuse per-token pricing of any provider. (FYI, GitHub Copilot OAuth is officially supported in OpenCode)
The next cheapest unlimited plan is BlackBox Pro. Their $10/month Pro plan provides unlimited access to MiniMax M2.5. This model is good enough for coding, and the unlimited number of requests means you can keep churning with subagents long after other providers have hit a limit.
The next cheapest is MiniMax Max, a plan from the makers of MiniMax. For $50/month, you get 15,000 requests per 5-hour window to MiniMax M2.7. This is not as cheap as OpenCode Go, which gives you 20,000 requests of MiniMax M2.5 for $10, but you are getting the newer model.
If you don't want to use MiniMax, the next cheapest is Chutes Pro. For $20/month, you get a monthly limit of 5,000 requests.
I'll be adding more of these as I find them to this spreadsheet: https://codeberg.org/mutablecc/calculate-ai-cost/src/branch/...
Note: This calculation is inexact, for multiple reasons. For one, it's entirely predicated on working 8 hours a day, 22 days a month; I'll recalculate at some point to find the cheapest option if you wanted to churn 24/7. For another, some providers (coughANTHROPIC) don't actually tell you what their limits are, so we have to guess and use an average. But based on my research, the calculations seem to match up with the per-request API cost reported at OpenRouter. Happy to take suggestions on improvements.
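The arithmetic behind a comparison like this can be sketched as follows. The prices and per-window limits are the figures quoted in the comments above; the 8h/22d schedule and the 5-hour window are the stated assumptions, and treating the 20,000-request OpenCode Go figure as per-window is my guess, not confirmed.

```python
# Rough sketch of the plan cost-per-request comparison described above.
# Plan numbers are the figures quoted in the thread; the working
# schedule and window size are the comment's stated assumptions.

WORK_HOURS_PER_DAY = 8     # assumption from the comment
WORK_DAYS_PER_MONTH = 22   # assumption from the comment
WINDOW_HOURS = 5           # typical rate-limit bucket

# plan -> (monthly price in USD, requests per 5h window)
plans = {
    "OpenCode Go (MiniMax M2.5)": (10, 20_000),  # figures quoted above
    "MiniMax Max (M2.7)":         (50, 15_000),  # figures quoted above
}

# fraction of 5h windows you can start in a working day (left fractional)
windows_per_day = WORK_HOURS_PER_DAY / WINDOW_HOURS

for name, (price, per_window) in plans.items():
    monthly_requests = per_window * windows_per_day * WORK_DAYS_PER_MONTH
    usd_per_1k = price / (monthly_requests / 1000)
    print(f"{name}: ~{monthly_requests:,.0f} req/mo, ${usd_per_1k:.4f} per 1k")
```

Under these assumptions the $10 plan works out to several times fewer dollars per thousand requests than the $50 one, matching the comment's conclusion that the cheaper plan wins unless you want the newer model.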
I have had the chance to test the main Chinese models through OpenRouter. The pay-as-you-go model is expensive compared to a subscription, but I don't want to be married to a single provider.
Thanks for bringing OpenCode Go to my attention. Your comparison is the research I didn't know I needed, and I will be cancelling my Copilot subscription to replace it with OpenCode Go right away.
OpenCode Go gives about 14x the requests of Copilot Pro. I was like, there must be something not right.
Then I compared the best model on OpenCode Go, GLM 5.1, against Anthropic's Opus 4.6. Yes, Opus is better on most benchmarks, but GLM 5.1 is not too far behind.
Rather than trying to lie to get people to use your service, be honest about what the upsides/downsides are, and only add your plug when it's at least a bit related; otherwise it just comes off as insincere when you're spamming your own platform in unrelated threads.
It’s mad for sure, but I’d bet 99.9% of people spending money on AI aren’t spending their own hard earned sooo… “YOLO it’s a business expense/investment”…
Money is relative. I retired at less than the average professor salary (all ages) at a not-rich school; I would have made more in tech. I still have weeks where the market gains on just my salary-funded retirement savings are 2000x my AI budget. Anyone who isn't living in a van and eating peanut butter, if they must, to save the max toward retirement isn't recognizing how profoundly our system is rigged to favor saving.
It easily pays for itself 10x over.
Now I'm happy with agents as the models and harnesses have improved significantly but the token usage comes at a cost.