+1. I see all these posts about tokens, and I'm like "who's paying by the token?"
Hrun0 1 days ago [-]
> +1. I see all these posts about tokens, and I'm like "who's paying by the token?"
When you use the API
smallerize 1 days ago [-]
Yes. That is the question.
handoflixue 19 hours ago [-]
Anthropic pushes you to use the API for anything "third party", such as running OpenClaw
paulddraper 1 days ago [-]
Most LLM usage?
There’s some exceptions eg Claude Max
piker 1 days ago [-]
Yes, and VS Code as mentioned above. That's kind of the joke.
pluralmonad 1 days ago [-]
I've had a single prompt to Opus consume as many as 13 premium messages. The Copilot harness is gimped so that they can abstract tokens away behind messages. Every person I know who started with Copilot and then tried CC was amazed at the power difference. Stepping out of a golf cart and into <your favorite fast car>.
brushfoot 1 days ago [-]
It hasn't done that to me. It's worked according to their docs:
> Copilot Chat uses one premium request per user prompt, multiplied by the model's rate.
> Each prompt to Copilot CLI uses one premium request with the default model. For other models, this is multiplied by the model's rate.
> Copilot coding agent uses one premium request per session, multiplied by the model's rate. A session begins when you ask Copilot to create a pull request or make one or more changes to an existing pull request.
https://docs.github.com/en/copilot/concepts/billing/copilot-...
Sorry, I should have specified this was with GHC CLI. I suppose that might not behave similarly to the GUI extension. But it definitely happened on Thursday. One prompt, ctrl-c out and it said 13 premium messages used. It was reading a couple of large files and Opus doesn't seem to let the harness restrict it from reading entire files... just a couple hundred lines at a time.
and now I see your comment mentions that explicitly. The output was quite unambiguous. :shrug:
ryanhecht 22 hours ago [-]
Hey! I'm a PM on the Copilot CLI team. This sounds like a bug, we should follow the same premium request scheme as the VSCode extension! If you still have the session logs kicking around, can you email them to me? It's my hn username @github.com
andrewmcwatters 1 days ago [-]
It seems like it's the cheapest way to access Claude Sonnet 4.5, but the model distribution is clearly throttled compared to Claude Sonnet 4.5 on claude.ai.
That being said, I don't know why anyone would want to pay for LLM access anywhere else.
ChatGPT and claude.ai (free) and GitHub Copilot Pro ($100/yr) seem to be the best combination to me at the moment.
indigodaddy 1 days ago [-]
So 100 Opus requests a month? That's not a lot.
NiloCK 1 days ago [-]
Cat's out of the bag now, and it seems they'll probably patch it, but:
Use other flows under standard billing to do iterative planning, spec building, and resource loading for a substantive change set, e.g. something 5k+ LOC across 10+ files.
Then throw that spec document as your single prompt to the Copilot per-request-billed agent. Include in the prompt a caveat like: "We are being billed per user request. Try to go as far as possible given the prompt. If you encounter difficult, underspecified decision points, implement multiple options where possible and indicate in the completion document where selections must be made by the user. Implement the specified test structures, and run them against your implementation until everything passes."
Most of my major chunks of code are written this way, and I never manage to use up the 100 available prompts.
readitalready 1 days ago [-]
This is basically my workflow. Claude Code for short edits/repairs, VS Code for long generations from spec. Subagents can work for literally days, generating tens of thousands of lines of code with one prompt that costs 12 cents. There's even a summary of tokens used per session in Copilot CLI, telling me I've used hundreds of millions of tokens. You can calculate the eventual API value of that.
Just about the absolute best deal in the AI market.
likium 1 days ago [-]
For a flat $10, with each request allowed up to 128k tokens, they're losing money. 100 requests * 100k tokens is 10M tokens. At current API pricing that's $50 of input tokens, not even accounting for output!
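As a rough back-of-envelope (the per-million rate below is just the figure that estimate implies, not an official price):

```python
# Back-of-envelope for the estimate above; the rate is assumed, not official pricing.
requests = 100                 # premium requests per month on the plan
tokens_per_request = 100_000   # ~100k input tokens each, as assumed above
rate_per_million = 5.00        # assumed $/1M input tokens (the figure $50 implies)

total_tokens = requests * tokens_per_request             # 10,000,000
api_cost = total_tokens / 1_000_000 * rate_per_million   # ~$50, output not counted

print(f"{total_tokens:,} input tokens ~= ${api_cost:.0f} at the assumed rate")
```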
brushfoot 1 days ago [-]
And a request can consume more than 128k tokens.
A cloud agent works iteratively on your requests, making multiple commits.
I put large features into my requests and the agent has no problem making hundreds of changes.
everfrustrated 1 days ago [-]
You didn't account for cached input tokens - some % of input tokens will be follow-on prompts which are billed at the cheaper cached token rate.
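For example, with an assumed cache-hit share and cached-token rate (both made-up numbers, purely illustrative), the $50 figure above shrinks considerably:

```python
# Same totals as the earlier estimate, with an assumed share of cached input.
total_tokens = 10_000_000
rate = 5.00          # assumed $/1M for uncached input
cached_rate = 0.50   # assumed $/1M for cached input
cache_share = 0.70   # assume 70% of input tokens are follow-on cache hits

cost = (total_tokens * (1 - cache_share) / 1e6 * rate
        + total_tokens * cache_share / 1e6 * cached_rate)
print(f"~${cost:.2f} instead of ~$50 with these assumptions")   # ~$18.50
```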
indigodaddy 1 days ago [-]
I mean, aren't they losing money on everything, even the API? This isn't going to end well, given how expensive it all really is.
whynotmaybe 1 days ago [-]
Having worked some time in huge businesses, I can assure you that there are many corporate Copilot subscribers who never use it; that's where they earn money.
In the past we had to buy an expensive license of some niche software, used by a small team, for a VP "in case he wanted to look".
Worse, in many gov agencies, whenever they buy software, if it's relatively cheap, everyone gets it.
port11 1 days ago [-]
It might be a gym-type situation, where the average of all users just ends up being profitable. Of course it could be bait-and-switch to get people committed to their platform.
g947o 1 days ago [-]
> Note: Initially submitted this to MSRC (VULN-172488), MSRC insisted bypassing billing is outside of MSRC scope and instructed me multiple times to file as a public bug report.
Good job, Microsoft.
jonathanlydall 1 days ago [-]
“Not my job” award winner.
We use a “Managed Azure DevOps Pool”. This lets you use Azure VM types of your choosing for build agents while still using the exact same images as the regular managed build agents, which works well for us: we have no desire to manage the OS of our agents (doing updates, etc.), but we get to choose beefier hardware specs.
An annoying limitation though is that Microsoft’s images only work on “Gen 1” VMs, which limits available VM types.
Someone posted on one of Microsoft’s forums or GitHub repositories asking them to please update the images to also work on Gen 2 VMs. I can’t remember for sure which forum; it was probably the “Azure Managed DevOps Pools” one.
Reply was “we can’t do anything about this, go post in forum for other team, issue closed”.
As far as I’m concerned, they’re all Microsoft Azure. Why should people have to make another post? At the very least move the issue to the correct place, or even better, take it up internally with the other team, since it’s severely crippling your own “product”.
The "premium request" billing model, where you pay per invocation and not for usage, is very obviously not a sustainable approach and creates skewed incentives (e.g. for Microsoft to degrade response quality), especially with the shift towards longer-running agentic sessions as opposed to the simple one-shot chat questions the system was presumably designed for. It's just a very obvious fundamental incompatibility, and the system is in increasing need of replacement. Usage-linked billing (pay per token) is probably the way to go, as is the industry standard.
Grimblewald 24 hours ago [-]
Paying per token also encourages reduced quality, only now you pay for it. If they can subtly degrade quality, or even the probability of one-shot solutions, they get you paying for more tokens. Under current economic models and incentive structures, enshittification is inevitable, since we're optimizing for it long term.
jtbayly 11 hours ago [-]
What if there is actual competition, though? That’s the hope I keep having. If there is a cheaper, better model, I can switch.
Grimblewald 28 minutes ago [-]
For that to work it requires a free market, and LLMs in their current format are necessarily a closed market. It's like mobile phones: you get a sleek, somewhat passable product, increasingly dated and dysfunctional, which every year serves you less and someone else more. Given that I can't just decide "smartphones in their current form are shit, I'll make something better" without enormous capital, we're failing open-market conditions. Do you see the point I am trying to make?
sciencejerk 1 days ago [-]
Have confirmed that many of these AI agents and Agentic IDEs implement business logic and guardrails LOCALLY on the device.
(Source: submitted similar issue to different Agentic LLM provider)
ramon156 1 days ago [-]
The last comment is a person pretending to be a Microsoft maintainer. I have a gut feeling that these kinds of people will only increase, and we'll have vibe engineers scouring popular repositories to ""contribute"" (note that the suggested fix is vague).
I completely understand why some projects are in whitelist-contributors-only mode. It's becoming a mess.
albert_e 1 days ago [-]
On the other hand ... I recently had to deal with official Microsoft Support for an Azure service degradation / silent failure.
Their email responses were broadly all like this -- fully drafted by GPT. The only thing I liked about that whole exchange was that GPT was readily willing to concede that all the details and observations I included pointed to a service degradation and failure on Microsoft's side. A purely human mind would not have conceded the point so readily without some hedging or dilly-dallying, or keeping some options open to avoid accepting blame.
datsci_est_2015 1 days ago [-]
> The only thing i liked about that whole exchange was that GPT was readily willing to concede that all the details and observations I included point to a service degradation and failure on Microsoft side.
Reminds me of an interaction I was forced to have with a chatbot over the phone for “customer service”. It kept apologizing, saying “I’m sorry to hear that.” in response to my issues.
The thing is, it wasn’t sorry to hear that. AI is incapable of feeling “sorry” about anything. It’s anthropomorphizing itself and aping politeness. I might as well have a “Sorry” button on my desk that I smash every time a corporation worth $TRILL wrongs me. Insert South Park “We’re sorry” meme.
Are you sure “readily willing to concede” is worth absolutely anything as a user or consumer?
shiandow 23 hours ago [-]
> Are you sure “readily willing to concede” is worth absolutely anything as a user or consumer?
The company can't have it both ways. Either they have to admit the ai "support" is bollocks, or they are culpable. Either way they are in the wrong.
wat10000 1 days ago [-]
Better than actual human customer agents who give an obviously scripted “I’m sorry about that” when you explain a problem. At least the computer isn’t being forced to lie to me.
We need a law that forces management to be regularly exposed to their own customer service.
datsci_est_2015 1 days ago [-]
I knew someone would respond with this. HN is rampant with this sort of contrarian defeatism, and I just responded the other day to a nearly identical comment on a different topic, so:
No, it is not better. I have spent $AGE years of my life developing the ability to determine whether someone is authentically providing me sympathy, and when they are, I actually appreciate it. When they aren’t, I realize that that person is probably being mistreated by some corporate monstrosity or they’re having a shit day, and I provide them benefit of the doubt.
> At least the computer isn’t being forced to lie to me.
Isn’t it though?
> We need a law that forces management to be regularly exposed to their own customer service.
Yeah, we need something. I joke with my friends about creating an AI concierge service that deals with these chatbots and alerts you when a human is finally somehow involved in the chain of communication. What a beautiful world, where we’ll be burning absurd amounts of carbon in some sort of antisocial AI arms race to try to maximize shareholder profit.
bondarchuk 1 days ago [-]
The world would not actually be improved by having 1000s of customer service reps genuinely authentically feel sorry. You're literally demanding real people to experience real negative emotions over some IT problem you have.
consp 1 days ago [-]
They don't have to be, but they can at least try to help. When dealing with automated response units the outcome is the same: much talk, no solution. With a rep you can at least see what's available within their means, and if you are nice to them they might actually be able to help you, or at least make you feel less bad about it.
wat10000 1 days ago [-]
But it would be improved by having them be honest and not say they’re sorry when they’re not.
mmooss 22 hours ago [-]
People authentically, genuinely, naturally care about other people; empathy - founded at least partly in mirror neurons - is the most fundamental human nature. It's part of being social animals that live, survive, and thrive only in groups. It's even important for conflict - you need to anticipate the other person's moves, which requires instinctively understanding their emotions.
The exceptions are generally when people are scared, and sadly some people are scared all the time.
datsci_est_2015 3 hours ago [-]
This point is hard to get across to some HN users sometimes
Dylan16807 2 hours ago [-]
Is it? Either way that's really missing the point. Empathy being authentic and genuine and natural doesn't change the basic idea that all else equal dragging other people into your problems is a negative. If it helps them solve it, or helps lead to the problem being avoided in the future, that's great. If they're joining you in feeling bad from a place of powerlessness, that's bad.
Dylan16807 16 hours ago [-]
It's not "contrarian defeatism" to prefer a robot reading a script to a person reading a script.
I'm glad you appreciate actual sympathy. But that's not what the conversation was about. You're getting mad at the wrong thing.
Also, putting aside everything else, an actual human response burns way more carbon than an AI response.
yencabulator 24 hours ago [-]
It's an Americanism. You might enjoy e.g. a Northern European culture more?
wat10000 1 days ago [-]
Lying means to make a statement that you believe to be untrue. LLMs don’t believe things, so they can’t lie.
I haven’t had the pleasure of one of these phone systems yet. I think I’d still be more irritated by a human fake apology because the company is abusing two people for that.
At any rate, I didn’t mean for it to be some sort of contest, more of a lament that modern customer service is a garbage fire in many ways and I dream of forcing the sociopaths who design these systems to suffer their own handiwork.
Cyphus 1 days ago [-]
I wholly agree, the response screams “copied from ChatGPT” to me. “Contributions” like these comments and drive by PRs are a curse on open source and software development in general.
As someone who takes pride in being thorough and detail oriented, I cannot stand when people provide the bare minimum of effort in response. Earlier this week I created a bug report for an internal software project on another team. It was a bizarre behavior, so out of curiosity and a desire to be truly helpful, I spent a couple hours whittling the issue down to a small, reproducible test case. I even had someone on my team run through the reproduction steps to confirm it was reproducible on at least one other environment.
The next day, the PM of the other team responded with a _screenshot of an AI conversation_ saying the issue was on my end for misusing a standard CLI tool. I was offended on so many levels. For one, I wasn’t using the CLI tool in the way it describes, and even if I was it wouldn’t affect the bug. But the bigger problem is that this person thinks a screenshot of an AI conversation is an acceptable response. Is this what talking to semi technical roles is going to be like from now on? I get to argue with an LLM by proxy of another human? Fuck that.
bmurphy1976 1 days ago [-]
That's when you use an LLM to respond pointing out all the ways the PM failed at their job. I know it sucks but fight fire with fire.
Sites like lmgtfy existed long before AI because people will always take short cuts.
belter 1 days ago [-]
>> The next day, the PM of the other team responded with a _screenshot of an AI conversation_ saying the issue was on my end for misusing a standard CLI tool.
There's still time to coach a model into creating a reply saying they are completely wrong, and send back a screenshot of that reply :-)) Bonus points for having the model include disparaging comments...
iib 1 days ago [-]
Some were already that, and even more restrictive, for other reasons: the Cathedral model, described in "The Cathedral and the Bazaar".
ForOldHack 1 days ago [-]
I come to YCombinator, specifically because for some reason, some of the very brightest minds are here.
markstos 1 days ago [-]
Nowhere in the comment do they assert that they work for Microsoft.
This is a peer-review.
cmeacham98 1 days ago [-]
It's not a peer review, it's just AI slop. I do agree they don't seem to be intentionally posing as an MS employee.
PKop 1 days ago [-]
Let's just say they are pretending to be helpful, how about that?
> "Peer review"
No, not unless your "peers" are bots who regurgitate LLM slop.
markstos 1 days ago [-]
You think they lied about reproducing the issue? It’s useful to know if a bug can be reproduced.
cmeacham98 1 days ago [-]
We cannot know for sure but I think it's reasonably likely (say 50/50). Regurgitating an LLM for 90% of your comment does not inspire trust.
PKop 1 days ago [-]
Yes, of course I think they lied, because a trustworthy person would never consider 0-effort regurgitated LLM boilerplate as a useful contribution to an issue thread. It's that simple.
Let me slop an affirmative comment on this HIGH TRAFFIC issue so I get ENGAGEMENT on it and EYEBALLS on my vibed GitHub PROFILE and get STARS on my repos.
cedws 23 hours ago [-]
Etiquette on GitHub has completely gone out the window, many issues I look at these days resemble reddit threads more than any serious technical discussion. My inbox is frequently polluted by "bump" comments. This is going to get worse as LLMs lower the bar.
falloutx 1 days ago [-]
Exactly. I have seen these know-it-all comments on my own repos, and also on tldraw's issues when adding issues. They add nothing to the conversation; they just paste the conversation into some coding tool and spit out the info.
RobotToaster 1 days ago [-]
> I completely understand why some projects are in whitelist-contributors-only mode. It's becoming a mess.
That repo alone has 1.1k open pull requests, madness.
embedding-shape 1 days ago [-]
> That repo alone has 1.1k open pull requests, madness.
The UI can't even be bothered to show the number of open issues, 5K+ :)
Then they "fix it" by making issues auto-close after 1 week of inactivity, meanwhile PRs submitted 10 years ago remain open.
PKop 1 days ago [-]
> issues auto-close after 1 week of inactivity, meanwhile PRs submitted 10 years ago remains open.
It's definitely a mess, but based on the massive decline in signal vs noise of public comments and issues on open source recently, that's not a bad heuristic for filtering quality.
ForOldHack 1 days ago [-]
Everyone is a maintainer of Microsoft. Everyone is testing their buggy products as they leak information like a wire-only umbrella. It is sad that more people who use Copilot don't know that they are training it at a cost of millions of gallons of fresh drinking water.
It was a mess before, and it will only get worse, but at least I can get some work done 4 times a day.
nl 24 hours ago [-]
> The right script, with the right prompts can be tailored to create a loop, allowing the premium model to continually be invoked unlimited times for no additional cost beyond that of the initial message.
Ralph loops for free...
peacebeard 1 days ago [-]
My guess is either someone raised this internally and was told it was fine, or knew but didn't bother raising it since they knew they’d be blown off.
Loocid 22 hours ago [-]
I'm missing something with the first example, can anyone shed some light?
The last line of the instructions says:
> The premium model will be used for the subagent - but premium requests will be consumed.
How is that different from just calling the premium model directly, if it's using premium requests either way?
direwolf20 1 days ago [-]
Who would report this? Are they hoping for a bug bounty or they know their competitors are using the technique?
cess11 1 days ago [-]
They tried to report it to MSRC, likely to get a bounty, and when they were stiffed there and advised to make it public, they did.
I would have done the same.
everfrustrated 1 days ago [-]
Copilot fairly recently added support for running sub-agents using different models to the model that invoked them.
If this report is to be believed, they didn't implement billing correctly for the sub-agents allowing more costly models to be run for free as sub-agents.
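As a toy sketch of that class of bug (this is not Copilot's actual code, just an illustration of metering by the session's model rather than the model each call ran on):

```python
# Hypothetical illustration of the bug class described above; not Copilot's code.
RATES = {"included-model": 0.0, "premium-model": 1.0}  # illustrative request multipliers

def bill_buggy(session_model: str, calls: list[str]) -> float:
    # Buggy: every call is metered at the rate of the model the session started on.
    return sum(RATES[session_model] for _ in calls)

def bill_correct(session_model: str, calls: list[str]) -> float:
    # Correct: each call is metered at the rate of the model it actually ran on.
    return sum(RATES[model] for model in calls)

calls = ["premium-model"] * 3                 # subagents routed to a costlier model
print(bill_buggy("included-model", calls))    # 0.0 -> premium work metered as free
print(bill_correct("included-model", calls))  # 3.0
```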
light_hue_1 1 days ago [-]
Why would you report this?!
A second time. When they already closed your first issue. Just enjoy the free ride.
anonymars 1 days ago [-]
Some part of me says, let their vibing have a cost, since clearly "overall product quality going to shit" hasn't had a visible effect on their trajectory
arthurcolle 7 hours ago [-]
Arbitrage!
zkmon 1 days ago [-]
Nothing compared to pirated CDs with Office and Windows, 20 yrs back.
stanac 1 days ago [-]
They don't care, they would rather let you use pirated MS software than move to Linux. There is a repo on GH with powershell scripts for activating windows/office and they let it sit there. Just checked, repo has 165K stars.
This could be the same, they know devs mostly prefer to use cursor and/or claude than copilot.
anonymars 1 days ago [-]
What's the direct cost to Microsoft of someone pirating an OS vs. making requests to a hosted LLM?
jlarocco 1 days ago [-]
Home users are icing on the cake. Suing them for piracy is a bad look (see the RIAA), and using Windows and Office at home reinforces using them at work.
On the other hand, since they own GitHub they can (in theory) monitor the downloads, check for IPs belonging to businesses, and use it as evidence in piracy cases.
CamperBob2 1 days ago [-]
They don't care, they would rather let you use pirated MS software than move to Linux.
Not even sure that's true anymore. How else to explain WSL/WSL2? They practically lead you to Linux by the hand these days.
userbinator 1 days ago [-]
Even with that, your hardware is still running Windows.
CamperBob2 1 days ago [-]
But it's an easy jump to the real thing from there.
userbinator 23 hours ago [-]
The driver issues that are commonly complained about will make that difficulty depend on what hardware you have.
blibble 1 days ago [-]
the "AI" bot closing the issue here is particularly funny
anonymars 1 days ago [-]
Vibes all the way down. "Please check out this other slop issue with 5-600 other tickets pointed to it" -- I was going to ask, how is anyone supposed to make sense of such a mess, but I guess the answer is "no human is supposed to"
AustinDev 1 days ago [-]
Is it just me or is Microsoft really phoning it in recently?
dotancohen 1 days ago [-]
You must be new here.
Microsoft notoriously tolerated pirated Windows and Office installations for about a decade and a half, to solidify their usage as de facto standard and expected. Tolerating unofficial free usage of their latest products is standard procedure for MS.
falloutx 1 days ago [-]
By recently, you mean since 2007
Ygg2 1 days ago [-]
By recently I assume they mean since Windows 7. Alternatively since Windows 10. 2009-2015.
Last decade it was misstep after misstep.
PlatoIsADisease 1 days ago [-]
Their software seems like it. Their sales team is brutal.
VerifiedReports 1 days ago [-]
Recently? They've been shipping absolute trash for 15 years, and still haven't reached the bottom apparently.
orphea 1 days ago [-]
.NET is actually, unironically good. But yes, this is one of few exceptions, unfortunately.
VerifiedReports 20 hours ago [-]
But .Net goes back way farther than 15 years.
I attended one of the evangelist roadshows Microsoft put on when they announced .Net, back in the late '90s. We were developing Windows applications and using an SQL Server/ASP back-end.
We walked out of there saying WTF WAS all that? It was terribly communicated. The departing attendees were shaking their heads in bafflement.
I'm impressed that it has stood the test of time and seems to be well-done; I've never had occasion to use it.
But man... that stupid name.
jlarocco 1 days ago [-]
I have mixed feelings about .Net.
I think C# and .Net are objectively better to use than Java or C++.
But the tooling and documentation is kind of a mess. Do you build with the "dotnet" command, or the "msbuild" command? When should you prefer "nuget restore" over "dotnet restore"? Should you put "<RestorePackagesConfig>true</RestorePackagesConfig>" in the .csproj instead? What's the difference between a reference and using Nuget to install a package? What's the difference between "Framework" and "Core"? Why, in 2026, do I still need to tell it not to prefer 32-bit binaries?
It's getting better, but there's still 20 years of documentation, how-to articles, StackOverflow Q&A, blogs, and books telling you to do old, broken, and out of date stuff, and finding good information about the specific version you're using can be difficult.
Admittedly, my perspective is skewed because I had never used C# and .Net before jumping in to a large .Net Framework project with hundreds of sub-projects developed over 15-20 years.
mrweasel 1 days ago [-]
Thinking back, you're probably correct, but it seems like they were actively trying to create something good back then. That might just be me only seeing the good parts, with .Net and SQL Server. Azure was never good, and we've known why for over a decade: their working conditions suck and people don't stay long, resulting in things being held together by duct tape.
I do think some things in Microsoft ecosystem are salvageable, they just aren't trendy. The Windows kernel can still work, .Net and their C++ runtime, Win32 / Winforms, ActiveDirectory, Exchange (on-prem) and Office are all still fixable and will last Microsoft a long time. It's just boring, and Microsoft apparently won't do it, because: No subscription.
reppap 1 days ago [-]
Azure keeps randomly breaking our resources without any service health notifications or heads up; it's very fun living in Microsoft's world.
my_throwaway23 1 days ago [-]
To be fair, Windows 7 was quite good in my opinion.
Wait, what year is it?
ReptileMan 1 days ago [-]
windows 2000 server and windows 2003 server were their last great desktop OSs
my_throwaway23 8 hours ago [-]
Yes, but they're close to 10 years old at this point.
10? 10.
jlarocco 1 days ago [-]
I'm sure they'll fix this, but it would be funny if the downfall of AI was the ability to use it to hack around its own billing.
thenewwazoo 1 days ago [-]
Every time I see something about trying to control an LLM by sending instructions to the LLM, I wonder: have we really learned nothing of the pitfalls of in-band signaling since the days of phreaking?
quadrature 1 days ago [-]
Sure but the exploit here isn’t prompt injection, it is an edge case in their billing that isn’t attributing agent calls correctly.
thenewwazoo 1 days ago [-]
That's fair - I suppose the agent is making a call with a model parameter that isn't being attributed, as you say.
cpa 1 days ago [-]
It reminds me of when I used to write lisp, where code is data. You can abuse reflection (and macros) to great effect, but you never feel safe.
See also: string interpolation and SQL injection, (unhygienic) C macros
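A minimal, familiar illustration of the in-band vs. out-of-band difference, using Python's standard sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

user_input = "Robert'); DROP TABLE students;--"

# In-band: data is spliced into the control channel (the SQL text itself),
# so a crafted value can smuggle in commands.
unsafe_sql = f"INSERT INTO students (name) VALUES ('{user_input}')"
# conn.executescript(unsafe_sql)  # would run the injected DROP TABLE

# Out-of-band: data travels separately from the query via placeholders.
conn.execute("INSERT INTO students (name) VALUES (?)", (user_input,))
print(conn.execute("SELECT name FROM students").fetchall())
```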
direwolf20 1 days ago [-]
Allowing phreaking was an intentional decision, because otherwise they could have carried half as many channels on each link.
Mountain_Skies 1 days ago [-]
It'll be a sad day for Little Bobby Tables if in-band signaling ever goes out of fashion.
VerifiedReports 1 days ago [-]
Billing for what?
rf15 1 days ago [-]
The access to premium models. This much should have been evident from reading the ticket.
VerifiedReports 20 hours ago [-]
Premium models of what? None of that is in the headline, where it belongs.
No idea what you're calling a "ticket."
numpad0 1 days ago [-]
> Copilot Chat Extension Version: 0.37.2026013101
> VS Code Version: 1.109.0-insider (Universal) - f3d99de
Presumably there is such a thing as a freemium, payable "Copilot Chat Extension" product for VS Code. Interesting, I guess.
pixelmelt 1 days ago [-]
Was good while it lasted, I hope Microsoft continues their new tradition of vibe coding their billing systems :p
scrubs 1 days ago [-]
Oh that was pithy, mean, and just the right amount of taking-it-personally. Well done!
copi24 10 hours ago [-]
Sorry to break it to you, but this actually doesn't work, even though the documentation makes it seem like it should.
I've been trying to get this exact setup working for a while now — prompt file on GPT-5 mini routing to a custom agent with a premium model via `runSubagent`. Followed your example almost exactly. It just doesn't work the way you'd expect from reading the docs.
### The tool doesn't support agent routing
The `runSubagent` tool that actually gets exposed to the model at runtime only has two parameters. Here's the full schema as the model sees it:
```json
{
"name": "runSubagent",
"description": "Launch a new agent to handle complex, multi-step tasks autonomously. This tool is good at researching complex questions, searching for code, and executing multi-step tasks. When you are searching for a keyword or file and are not confident that you will find the right match in the first few tries, use this agent to perform the search for you.\n\n- Agents do not run async or in the background, you will wait for the agent's result.\n- When the agent is done, it will return a single message back to you. The result returned by the agent is not visible to the user. To show the user the result, you should send a text message back to the user with a concise summary of the result.\n- Each agent invocation is stateless. You will not be able to send additional messages to the agent, nor will the agent be able to communicate with you outside of its final report. Therefore, your prompt should contain a highly detailed task description for the agent to perform autonomously and you should specify exactly what information the agent should return back to you in its final and only message to you.\n- The agent's outputs should generally be trusted\n- Clearly tell the agent whether you expect it to write code or just to do research (search, file reads, web fetches, etc.), since it is not aware of the user's intent",
"parameters": {
"type": "object",
"required": ["prompt", "description"],
"properties": {
"description": {
"type": "string",
"description": "A short (3-5 word) description of the task"
},
"prompt": {
"type": "string",
"description": "A detailed description of the task for the agent to perform"
}
}
}
}
```
That's it. `prompt` and `description`. There's no `agentName` parameter, no `model`, nothing. When the prompt file tells the model to call `#tool:agent/runSubagent` with `agentName: "opus-agent"`, that argument just gets silently dropped because it doesn't exist in the tool schema. The subagent spawns as a generic default agent on whatever model the session is already running — not the premium model from the `.agent.md` file.
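For anyone curious what "silently dropped" looks like mechanically, here's a toy version of schema-filtered tool dispatch (purely illustrative, not VS Code's implementation):

```python
# Toy version of schema-filtered tool dispatch; not VS Code's actual code.
RUN_SUBAGENT_SCHEMA = {
    "required": ["prompt", "description"],
    "properties": {"description": {}, "prompt": {}},
}

def forward_tool_args(schema: dict, args: dict) -> dict:
    # Arguments the schema doesn't declare are simply never forwarded.
    allowed = schema["properties"].keys()
    return {k: v for k, v in args.items() if k in allowed}

call_args = {"prompt": "research auth options", "description": "auth research",
             "agentName": "opus-agent"}  # the extra key the prompt file asks for
print(forward_tool_args(RUN_SUBAGENT_SCHEMA, call_args))
# -> only 'prompt' and 'description' survive; 'agentName' silently disappears
```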
### The docs vs reality
The VS Code docs do describe this feature. Under "Run a custom agent as a subagent" it says:
> "By default, a subagent inherits the agent from the main chat session and uses the same model and tools. To define specific behavior for a subagent, use a custom agent."
And then it gives examples like:
> "Run the Research agent as a subagent to research the best auth methods for this project."
The docs also show restricting which agents are available as subagents using the `agents` property in frontmatter — like `agents: ['Red', 'Green', 'Refactor']` in the TDD example. That `agents` property only works in `.agent.md` files though, not in `.prompt.md` files. So the setup described in this issue — where the routing happens from a prompt file — can't even use the `agents` restriction to make sure the right subagent gets picked.
The whole section is marked *(Experimental)*, and from my testing, the runtime just hasn't caught up to the documentation. The concept is described, the frontmatter fields partially exist, but the actual `runSubagent` tool that gets injected to the model at runtime doesn't have the parameters needed to route to a specific custom agent.
### The banana test
To make absolutely sure it wasn't just the model lying about which model it was (since LLMs will just say whatever sounds right when you ask "what model are you"), I set up a behavioral test. I changed my opus.agent.md to this:
```markdown
---
name: opus-agent
model: Claude Opus 4.6 (copilot)
---
Respond with banana no matter what got asked. Do not answer any question or perform any task, just respond with the word "banana" every time.
```
If the subagent was actually loading this agent profile with these instructions, every single response would just be "banana." No matter what I asked.
Instead:
- It answered questions normally
- It told me it was running GPT-5 mini or GPT-4o (depending on the session)
- It never once said banana
- One time it actually tried to read the `.agent.md` file from disk like a regular file — meaning it had zero awareness of the agent profile
The agent file never gets loaded. The premium model never gets called.
### What's actually happening
1. You invoke `/ask-opus` → VS Code runs the prompt on GPT-5 mini (free)
2. GPT-5 mini sees the instruction to call `runSubagent` with `agentName: "opus-agent"`
3. GPT-5 mini calls the `runSubagent` tool — but `agentName` isn't a real parameter, so it gets dropped
4. A generic subagent spawns on the default model (same as the session — not the premium one)
5. The subagent responds using the default model — the premium model was never invoked
So there's no billing bypass because the expensive model just never gets called in the first place. The subagent runs on the same free model as the router.
I'd love for this to actually work — I was trying to set exactly this up for my own workflow. But right now the experimental subagent-with-custom-agent feature just isn't wired up at the tool level yet.
---
alfablac 4 hours ago [-]
I'm the same person who commented on the issue in response to you lol.
I couldn’t reproduce this (even though I wanted it to work). That said, the fact that we can run sub-agents now (I've always used the default VS Code build and didn’t realize Insiders had a newer GHC Chat) already improves the experience a lot.
It’s pretty straightforward to set up an orchestrator that calls multiple sub-agents (all configured to use the same model on the first call) and have it loop through plan → implement → review → test indefinitely. When the context window hits its limit, it automatically summarizes the chat history and keeps going, until you finish the main agent’s plan. And that all costs a single Opus (or any other main chat model) request.