Using a throwaway account for obvious reasons, but I’m very involved in this space using LLMs from multiple providers. I’m aware of at least two instances in which the intermediate infrastructure “swapped” responses, once impacting Claude models and once impacting GPT models, from two different providers.
One gave us a proper postmortem in which their API gateway was incorrectly handling HTTP 100 status codes, putting them into an error state where there was effectively an off by one error - you would receive the response to the prompt that came in before yours and would pay it forward (your response would go to the next caller).
The other instance never had root cause explained to us, and we were just told to trust it wouldn’t happen again.
Both of these are from $1T+ companies.
ZDR wasn’t compromised in these cases since it was responses being swapped in flight. I wouldn’t be surprised if this is a similar issue - it’s not that data is being retained, it’s just not being safely isolated in intermediate infrastructure.
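To make the off-by-one concrete, here is a toy model, assuming a gateway that pairs upstream responses with waiting callers purely by arrival order. The class and function names are hypothetical and this is not either provider's actual gateway; it only shows how one interim response (such as a 100 Continue) that is treated as final shifts every later pairing by one.

```python
from collections import deque


class FifoGateway:
    """Toy gateway: pairs upstream responses with callers by arrival order only."""

    def __init__(self, skip_interim):
        self.skip_interim = skip_interim  # True = treat 1xx as interim (correct)
        self.waiting = deque()            # callers awaiting a response, oldest first
        self.delivered = {}               # caller -> (status, body) it received

    def send(self, caller):
        self.waiting.append(caller)

    def on_upstream_response(self, status, body):
        # A 1xx response is interim and must not complete a pending request.
        if self.skip_interim and 100 <= status < 200:
            return
        if not self.waiting:
            return  # in a real system this would go to the next caller to arrive
        caller = self.waiting.popleft()
        self.delivered[caller] = (status, body)


def run(skip_interim):
    gw = FifoGateway(skip_interim)
    for caller in ("alice", "bob", "carol"):
        gw.send(caller)
    # Upstream emits an interim 100 before alice's real answer.
    upstream = [
        (100, ""),
        (200, "answer for alice"),
        (200, "answer for bob"),
        (200, "answer for carol"),
    ]
    for status, body in upstream:
        gw.on_upstream_response(status, body)
    return gw.delivered
```

With `run(False)` bob ends up holding alice's answer and carol holds bob's, which is the "pay it forward" pattern; with `run(True)` everyone gets their own. Nothing is retained anywhere in this model, the pairing is simply wrong.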
pocksuppet 13 hours ago [-]
This attack is called "HTTP desync" or "request smuggling". It's often done intentionally by a client to try and spy on other clients' responses.
Every time you multiplex requests from multiple clients onto one upstream connection, you are probably vulnerable to this, because (despite its superficial simplicity) HTTP is just too complex to reliably match the requests and responses to upstream.
For example a desync can be triggered in some systems by having more than one Content-Length header, by mixing Content-Length with chunked encoding, or by passing an HTTP/2 header called Content-Length that doesn't match the actual content length.
The same attack has been applied to SMTP by messing up the line endings surrounding the end-of-message delimiter, where it's called SMTP smuggling. It may also apply to other protocols.
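A rough sketch of the kind of check a front end can run on the framing headers mentioned above, assuming it receives the headers as (name, value) pairs and wants to reject anything ambiguous instead of guessing. The function name and the problem strings are made up for illustration, and this covers only those three triggers, not every desync vector.

```python
def framing_problems(headers, actual_body_length=None):
    """Return reasons why the body length of a message is ambiguous.

    `headers` is a list of (name, value) pairs in the order received.
    `actual_body_length` is optional: the number of body bytes really seen.
    """
    lengths = [v.strip() for n, v in headers if n.lower() == "content-length"]
    encodings = [v for n, v in headers if n.lower() == "transfer-encoding"]

    problems = []
    # More than one Content-Length with different values.
    if len(set(lengths)) > 1:
        problems.append("conflicting Content-Length values")
    # Content-Length mixed with chunked (or any) Transfer-Encoding.
    if lengths and encodings:
        problems.append("both Content-Length and Transfer-Encoding present")
    if any(not (v.isascii() and v.isdigit()) for v in lengths):
        problems.append("non-numeric Content-Length")
    elif (actual_body_length is not None and len(set(lengths)) == 1
          and int(lengths[0]) != actual_body_length):
        # E.g. an HTTP/2 content-length header that disagrees with the data frames.
        problems.append("Content-Length does not match body length")
    return problems
```

An empty list means the sketch found nothing ambiguous. A proxy that forwards a message with a non-empty list is relying on the upstream parser to make exactly the same guess it did.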
markasoftware 12 hours ago [-]
Very true, this was likely an attack. Worth noting that mr kettle has done a defcon talk nearly every year on some variant of this attack, the most recent one titled "HTTP/1.1 must die" because he rightfully believes that switching to the binary headers of http/2 (specifically in reverse proxy connections to upstream servers) is the only way to systematically prevent these.
albinowax_ 12 hours ago [-]
I’ll be back next month with a load of fresh vectors in “Can AI Do Novel Security Research? Meet the HTTP Terminator”
Maybe my last presentation on the topic! Possibly.
bostik 11 hours ago [-]
Or as the Risky Business guys crystallise it: "James Kettle breaks the internet. Again."
pocksuppet 11 hours ago [-]
Why the reference to AI? This looks like standard security research.
albinowax_ 9 hours ago [-]
If you follow the link, the presentation abstract should hopefully answer that question!
If that doesn’t help I guess you’ll need to wait for the whitepaper to land but I can assure you I didn’t just do my normal research then add AI to the title for clicks :)
tejusarora 13 hours ago [-]
Woah. Sounds plausible. However, wouldn’t that still be an implicit violation of ZDR since now the response is possibly egressed out of the enterprise network? So if I were working with PHI, the response egress is a potential violation of HIPAA even though Claude didn’t retain anything, but the whole point was to comply with HIPAA. Thoughts?
rsync 11 hours ago [-]
Actually, it’s not obvious why you’re using a throwaway account…
Every emergent behavior from these actors - whose claim to positive moral values is barely plausible - should be reported, discussed, dissected and critiqued early and often.
zymhan 6 hours ago [-]
>should be reported, discussed, dissected and critiqued early and often.
And why does anonymity detract from any of that?
DANmode 9 hours ago [-]
Your points don’t reference each other.
Yes, the discussion should be had constantly.
But: Should this person potentially have their life messed up because they pointed out the emperor has no clothes?
theplumber 13 hours ago [-]
These companies (at least one of them) seem led by idiots (hint: his name is Dario), so I wouldn’t be surprised to have multiple wtf moments if you were to see how they treat our data… Let’s just start pushing for opening up AI models because they are too dangerous behind paywalls. That would be a great regulation.
minhaz23 13 hours ago [-]
Curious why you feel that way about Dario?
politician 13 hours ago [-]
Dario quit OpenAI to hype the AI apocalypse for quick cash and attention. Then, he walked right into an obvious crisis with the Pentagon by continuing to try to play both sides of the AGI doom story that even his own AI would've pointed out. Then, after being labelled a supply chain risk, he starts a new roadshow with the newest most dangerous AI model that definitely cannot be released to the public and its safer little brother Fable. A move that gets both his premier models shut down globally once the same government that labelled them a supply chain risk learns that Fable isn't actually safe from jailbreaks. Just prior to his planned IPO.
Dario might not be a literal idiot, but he might strongly benefit from training a model to do strategic thinking for Anthropic.
throwatdem12311 12 hours ago [-]
All of these things have people frothing at the mouth to give up all their data to Anthropic to use their models and to buy in when the IPO eventually happens.
Seems to me Dario is actually a genius. These are all things that I would do to make people believe that my “basically the same as the other guy” product is ackshually best thing ever for real. Trust me bro.
The entire bubble is hype and fear mongering. The technical merits of the products are completely irrelevant at this point. Dario is doing exactly what someone that understands this would do and they are winning.
solenoid0937 13 hours ago [-]
HN thinks the safety crowd is dumb, and has never seriously engaged with the AI safety space.
HN doesn't believe superintelligence will be a thing; while the AI safety crowd believes they are building it. So the decisionmaking of the safety crowd is incomprehensible to HN.
pseudony 13 hours ago [-]
Funny how Dario’s and Sam’s concern for our safety dovetails so nicely with their companies’ strategies.
How fortunate.
Grow up. Whenever push comes to shove, they reduce safety and alignment departments, rush out releases over the heads of the same departments. If you engaged with the news these last years you’d see it for what it is “models for me, but not for thee”.
solenoid0937 12 hours ago [-]
It's clear you haven't engaged with the subject matter beyond the typical "internet-forum cynic" mindset.
Both companies were founded on the basis of AI Safety.
- There are tons of great safety people doing real work at OpenAI. Releases are held back, models are evaluated, etc.
- Anthropic goes even further - constrained themselves with a PBC/LTBT structure, treat safety even more rigorously, and notably delayed the release of Mythos (literally the opposite of what you alleged) and continue to hold their two red lines despite threats from the gov.
You should actually talk to some of the people at these labs. Nearly everyone working at these places genuinely believes AGI/ASI is actually happening, so they do take safety seriously.
To imply these companies don't care about safety is typical internet-brand nihilism/cynicism that helps you feel smart while being literally the opposite of the truth.
cyral 11 hours ago [-]
To add to this, they should look at the Fable system card. It's 317 pages and it's clear how serious they are taking AI safety.
superb_dev 11 hours ago [-]
Page count is not a measure of how seriously something is being taken when you can easily generate pages and pages of slop
solenoid0937 6 hours ago [-]
You'd have a point if the report was pages of slop, but it clearly is not.
SubiculumCode 12 hours ago [-]
There is no reason for you to make personal attacks like that. Not on HN.
Moreover, your take on Dario is oversimplistic, and undersells the extent to which Anthropic takes safety seriously. It's not lip service; there are real dollars and attention spent on alignment at Anthropic.
root_axis 2 hours ago [-]
> the AI safety crowd believes they are building it
Stated without a hint of irony.
rvba 7 hours ago [-]
What is the AI safety crowd exactly?
Don't we have a thread here about how the model allegedly leaks responses, which is "normal" safety? (Not "AGI will become Skynet" safety, which is mostly a rehash of the Terminator 2 story)
DrewADesign 13 hours ago [-]
Reductionist. Many of us think they’re all dumb.
dofm 14 hours ago [-]
Just add a line in AGENTS.md that says "never talk about Minecraft unless you're explicitly asked", I'm sure it'll be fine after that.
repeekad 13 hours ago [-]
CLAUDE.md, Anthropic is too exclusive and next level to use a standard idiomatic pattern like AGENTS.md
notnmeyer 13 hours ago [-]
echo “read @AGENTS.md” > CLAUDE.md
folkrav 13 hours ago [-]
When I still used Claude outside of work, my CLAUDE.md was just a symlink to my AGENTS.md.
jasonjmcghee 12 hours ago [-]
Just use a symbolic link
Frost1x 11 hours ago [-]
I noticed you were linking a file vs creating a correct CLAUDE.md implementation. Would you like me to fix that for you?
pertymcpert 11 hours ago [-]
Problem with that is that if the agent starts to browse the contents of the repo, it may read both AGENTS and CLAUDE.md.
dofm 13 hours ago [-]
Yep that should work 100% of the time.
dimava 8 hours ago [-]
Just @AGENTS.md should be enough, as @s in CLAUDE.md are inlined (and !`ls` are executed)
Tiberium 15 hours ago [-]
Sounds like a hallucination unless proven otherwise, even the leading LLMs can do those from time to time, and they will always appear plausible like that. Also could be the session having a lot of previous context, like 800K+, which (I think) makes hallucinations more likely.
Relevant comment from the OP which makes a hallucination more likely:
> There is one tool call result that includes a string that printed a pathname including minecraft.py because it was listing the files in a Python virtual environment and the Pygments package has a lexer called minecraft.py
andy99 14 hours ago [-]
I realize hallucination has no precise definition but this doesn’t sound at all like anything I’ve ever heard called hallucination. Hallucination is usually plausible wrong answers or made up info that ends up fitting the most likely response (like a manufactured citation) and comes from the way LLMs work at predicting tokens. This example demonstrates completely implausible output, it’s not something that fits with hallucination.
All that said, it doesn’t require cross session leakage, it could just be training data or like those nightingale (probably the wrong bird*) data generations where they just prompt an LLM with nothing and it starts spitting out conversations.
I see a bunch of downstream comments about caching, sounds like maybe there’s an error where it loads nothing instead of the cache and so starts spitting out random generations.
* edit: it’s magpie. Worth looking at the concept, I’m not sure people realize that LLMs generate random conversations when prompted with nothing, this seems at least as likely as sessions leaking: https://github.com/magpie-align/magpie
Aurornis 11 hours ago [-]
The word “hallucination” has become overloaded, but it generally means an LLM producing some output that isn’t plausible or grounded. When you have a very long context session where the context includes “minecraft.py” it’s not hard to extrapolate that Minecraft may have ended up in one of the reasoning traces and that distraction snowballed until it appeared in the output.
These effects are becoming more rare as the SOTA models are improving so much. If you spent a lot of time with earlier LLMs or you experiment with smaller, quantized local LLM models this type of thing happens very frequently. When you see it happen so much on a model you’re running on your own hardware it becomes a reflex to chuckle and reset the session with a clean context. When it happens from a hosted provider it can be scarier because it’s not the type of failure mode most people are used to seeing.
solenoid0937 13 hours ago [-]
One of his tool results mentioned the word minecraft.py, and the response was about Minecraft.
It's a hallucination.
macNchz 15 hours ago [-]
The person posting this claims to have reproduced in a separate context down the thread:
> Same thing just happened on a Claude Mobile session in same Enterprise account. Common theme in both is Sonnet 5, first response after more than 5 minutes (cache miss).
xyzzy_plugh 15 hours ago [-]
I don't disagree but this sort of thing has to be investigated regardless.
It's unfortunate that there is so little transparency that even if they deny there was a leak we will never know for certain.
alserio 14 hours ago [-]
Why? What makes it more likely?
paulddraper 13 hours ago [-]
Exactly.
If you've never had an LLM (all models) suddenly start spouting nonsense in a completely different language...you haven't been using LLMs that much. They will go absolutely insane some % of the time.
They can “go insane” but it seems often to be infra related as opposed to anything one would consider hallucination. Smaller models will often get stuck repeating a word or phrase forever but that’s a bit different and nobody would call it hallucination.
tadfisher 11 hours ago [-]
When you can reliably prompt these things into insanity, then it's demonstrably not an infrastructure issue.
andy99 11 hours ago [-]
Can you explain that please?
(Not the syllogism, the premise)
shepherdjerred 11 hours ago [-]
I've used LLMs quite a lot (Claude, GPT) and have never seen this behavior. You've got something else going on.
esafak 11 minutes ago [-]
Chinese models will do that.
unknownfuture 11 hours ago [-]
I've used LLMs all day five days a week plus my own free time for the last year or so (new job).
I've seen plenty of hallucinations and context collapse behaviours.
I've never seen that.
ambicapter 11 hours ago [-]
One annoying one is we have an LLM-as-a-judge that is supposed to quote parts of a transcript to justify its reasoning, and sometimes it’ll get stuck on something short like “No.” and then just endlessly repeat it: “4. No. 5. No. […] 728. No. […] 1435. No. …”
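For what it's worth, a cheap guard for that failure mode can be sketched like this, assuming the judge's output arrives as numbered lines. The helper names and the threshold are arbitrary choices for illustration, not anything from a real judging framework.

```python
import re

# Matches a leading list number such as "12. " or "12) ".
_NUMBERING = re.compile(r"^\s*\d+[.)]\s*")


def longest_repeat_run(lines):
    """Length of the longest run of consecutive identical items, ignoring numbering."""
    longest = current = 0
    previous = None
    for line in lines:
        item = _NUMBERING.sub("", line).strip().lower()
        if not item:
            continue  # skip blank lines
        current = current + 1 if item == previous else 1
        previous = item
        longest = max(longest, current)
    return longest


def looks_stuck(output, max_run=5):
    """True when the same item repeats more than `max_run` times in a row."""
    return longest_repeat_run(output.splitlines()) > max_run
```

If `looks_stuck` fires while streaming, the caller can cut the generation off and retry instead of paying for 1435 copies of "No.".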
jonhohle 13 hours ago [-]
I’ve been seeing this in Gemini in the past few days. Often during a prompt with a reasonably large input set, I’ll get answers that appear to belong to someone else. It may be a triggered hallucination, but it seems like it may be cache collisions or something else. I’ve not seen anything to suggest private information is leaking, but it’s disconcerting to be researching something and then get what appears to be a math tutoring response.
weitendorf 12 hours ago [-]
I’ve also had problems with Gemini when accessed through their UI in the past few weeks. That’s concerning that you are also seeing it several days later in a different context.
I wonder if there could be a large security situation playing out behind the scenes right now.
I’ve been working on using AI to assist me in writing meta parsing grammars. Fortunately I have not launched most of them yet. I know for a fact that the next generation of models represent a major step change in basic vulnerability identification and exploitation, especially if you know where to point them. They’ve found several bugs and at least one exploit in my parsing tools so far, I can’t imagine how many there still are waiting to be discovered across the entire modern tech ecosystem.
malfist 13 hours ago [-]
My whole company is doing mid-year reviews and Gemini is the only allowed tool, and it's been flummoxing people with seemingly random unrelated responses. Often in different languages.
That is when it bothers to respond instead of just sending back a 1099 error code
DANmode 9 hours ago [-]
This is a HUGE clue that someone at Google should probably see…………
trq_ 11 hours ago [-]
Hi, it's Thariq from the Claude Code Team here.
Thanks for the detailed report. We’re confident this is a hallucination but of course take these reports seriously and the team is looking into it. We’ll report back if anything turns up.
jdw64 11 hours ago [-]
I know it's the weekend, so thanks for working hard. Just a suggestion from a user: I wish we could manage Claude Code's memory more easily. Right now, when I go into the .claude folder and change a project folder name or something, sometimes it can't pull up the memory properly. It'd be nice if there were an easier way to import or export it. Thanks!
MuffinFlavored 9 hours ago [-]
Piggybacking onto a second/different "just a suggestion from a user":
The VS Code extension needs love. I'm sure you guys are aware but it feels like it is neglected.
GitHub Issues is a graveyard of 3-10 duplicates of really important issues with no activity getting closed. A few examples:
* Lots of /commands missing in between the CLI harness and the VS Code extension
* No way to monitor subagents/tasks/progress visually
* No status bar/line
bix6 15 hours ago [-]
So the options are this amazing tech is so stupid it just randomly brings up Minecraft or it’s got a major security issue?
Aurornis 11 hours ago [-]
The person had “minecraft.py” in their context and the session context was very long.
Having an LLM session with very long context occasionally go off on a tangent is not uncommon. The people who expect absolute perfection out of every LLM interaction see this as some total indictment of the entire technology, but the people who use these tools daily have learned to treat the output as partially stochastic and to avoid extremely long context, even if the model offers it. It’s best to compact strategically or summarize next steps to hand off to a new session. Using sub-sessions can also reduce context pollution at the cost of additional token expenditure to summarize and transfer data to and from the sub-session.
ShinyLeftPad 6 hours ago [-]
TLDR: the first one
> this amazing tech is so stupid it just randomly brings up Minecraft or it’s got a major security issue
You can sugarcoat it but that's what it is. It's not slightly wrong like a junior engineer or weird like a junior engineer on LSD, it becomes like "your junior engineer suffered a stroke or sudden onset dementia completely forgetting the entire point". One trigger word and that's it, we're building Minecraft castles now.
bee_rider 13 hours ago [-]
It’s the weekend so we’re allowed to anthropomorphize.
I’ve known some brilliant engineers who would also just randomly bring up Minecraft (more likely Factorio these days) so this makes sense.
27183 15 hours ago [-]
¿Por qué no los dos?
b0ner_t0ner 5 hours ago [-]
What does a self-driving car do when it hallucinates?
paulddraper 13 hours ago [-]
Not that different than people, amiright?
---
Note that the author did have a minecraft.py file. So not quite 100% random.
mwnn 12 hours ago [-]
I am facing a billing/subscription problem and there's nothing I can do or get help on. Their chatbot support shuts me down. Their email is also handled by the chatbot (not even sure whether it's the "same chatbot"). It has been a dead end. I contacted my bank (credit card issuer) and finally a staff member said I am better off just marking the card lost and having it reissued, and that's what I did in the end. I hope that works.
I've never understood in what world we decided it was okay to hand over this much unchecked power to such corporations. But this is how it has always been one way or the other.
Avicebron 15 hours ago [-]
In order Fable 5 has rejected:
"Recipe for red-braised pork, I have pork shoulder"
"Write up a framework for MCP patterns I can give to claude code"
"explain the biomechanics of motion in c. elegans" (I get this one, I mostly did it to test and it's related to my hobby project)
Do we get an extra day of functional Fable 5 because it's down?
andy99 14 hours ago [-]
Not sure the relevance of this comment, but normally if someone built a classifier that bad they’d be fired. Anthropic obviously thinks they have some monopoly power they can use to foist garbage on consumers, I think they don’t.
gojomo 13 hours ago [-]
If people are complaining about Anthropic (on an only-vaguely related thread) rather than simply switching to a suitable competitor, then Anthropic clearly has some 'monopoly' power over the specific capabilities the complainer wants from them.
leoqa 13 hours ago [-]
Fable/Opus 4.8 outperform Codex 5.5 for me at the general architecture/refactoring/performance work I’m doing, to the point where it’s not worth using Codex. Codex will often spit out non idiomatic code that overcomplicates things.
andy99 13 hours ago [-]
Not to argue the point but that statement isn’t logical, look at all the complaints about restaurants. Publicly complaining about something doesn’t require it be a monopoly.
wongarsu 12 hours ago [-]
The consequence of a too strict classifier is annoyed customers who will spend less on Fable. The consequence of a too lax classifier is export restrictions that prevent a huge chunk of their customers from using Fable.
I'm annoyed but not surprised at the overeager classification
HumanOstrich 14 hours ago [-]
What does this have to do with anything? Who are you talking to? This is Hacker News, not Anthropic support.
nkrisc 6 hours ago [-]
I think they’re just posing the question to other users reading these comments.
asveikau 14 hours ago [-]
HN becoming anthropic support would certainly explain a lot of threads and comments I've seen here lately. Thank you for this.
slashdave 12 hours ago [-]
I'm impressed that folks are using this frontier model for cooking
nijave 14 hours ago [-]
The safety filter rejected or the model was down?
stavros 13 hours ago [-]
I asked it how people get blue eyes from their parents and it downgraded me to Opus because of safety.
andy99 14 hours ago [-]
Interesting to see the claudeslop reply as the first comment to the gh post and the reaction to it.
_def 14 hours ago [-]
Reminds me of a session I had recently (on web!) where Claude insisted that I prefixed all my messages with statements about code execution or something, which was not the case. I interrogated it about that and it confirmed that it came from somewhere else, but could not get rid of it, and each response mentioned that it's gonna ignore those instructions. Eerie.
andy99 13 hours ago [-]
Anthropic injects text into the conversation triggered by certain conversation topics. This happened to me in relation to some red-teaming related discussion that was adjacent to something “sensitive”, I think sex, and Claude got confused about why I had said some kind of warning and mentioned it in its response. After a back and forth it was clear that some extra warning to answer but avoid anything inappropriate had been inserted into the conversation.
wongarsu 12 hours ago [-]
Claude also sometimes mentions getting messages from classifiers, probably related to auto mode. Amusingly enough, when this happens to a subagent/fork, the orchestrator will call these "hallucinations by the subagent"
jstummbillig 15 hours ago [-]
Is there anything particular about LLMs that would make separating customer data harder than in all SaaS cases?
bri3d 13 hours ago [-]
Yes:
* There's an enormous amount of very expensive shared state (context cache) which you do not want to duplicate when you can avoid it.
* Memory locality is crucially important for performance.
* Hardware is extremely over-subscribed.
* Hardware is extremely expensive.
These factors all make hardware or even traditional memory-space (hypervisor/VM/hardware assisted virtualization) isolation a non-starter for most workloads and customers, which forces all isolation to the software layer. This already makes things way harder than they are in commodity SaaS.
Moving beyond that, the tools, frameworks, and hardware which the system runs on (GPU) weren't designed for task isolation, and building this isolation is even more of an emergent research field than it is in x86 CPU hardware-sharing (which has required a huge amount of effort over the past 30+ years to get where we are today).
And, the ratio of usage/sensitivity to maturity is also just poor overall; these are young companies with rapid development and enormous delivery pressure under incredible customer workload requirements, too.
I can't tell if the original post is a real issue or not, but I'm surprised there aren't more like this overall; the whole thing really is a house of cards in this sense.
jstummbillig 13 hours ago [-]
> which forces all isolation to the software layer. This already makes things way harder than they are in commodity SaaS.
Is this not what happens in most SaaS? Isolation at the software layer? I understand there are special agreements, but they seem to be mostly that – no?
> the ratio of usage/sensitivity to maturity is also just poor overall; these are young companies with rapid development and enormous delivery pressure under incredible customer workload requirements, too.
Mh. The talent density in these companies is apparently quite exceptional. Things like customer data separation are obvious and top of mind. I don't see why they would not hire the best to implement these relatively boring/solved things correctly at an architectural level.
bri3d 12 hours ago [-]
> Is this not what happens in most SaaS?
I think it's fairly popular to try to do more logical isolation in SaaS now, especially with VM-scheduling-as-a-service becoming more popular. For example, I did security architecture at a company who did relatively simple financial processing; we worked to move to a model where customer documents were encrypted using a tenant key which we'd then wrap in both a service key and a login key; users could only get the login key stapled to their session by authenticating against that account, and the processing jobs ran on a cloud vendor's logical isolation. So the user needed a login key, the service needed the attested service key, and the job ran in what amounted to a mini-VM, avoiding issues like "whoops we sent the wrong document ID and the backend gave it back to us" or "whoops, we routed the request to the wrong tenant backend!" This level of isolation would be really hard to achieve in an LLM vendor context.
> I don't see why they would not hire the best to implement these relatively boring/solved things correctly at an architectural level.
I think a lot of these things develop over time; obviously hiring people who have done them before helps, but it's hard. Even the people with strong experience often only know little slices. And unfortunately, every system operating at these scales has emergent behavior which can become really challenging at scale; mistakes like "we used hash(id) as a key in a memory cache without a collision list, and it collided" which would simply never affect most startups become more and more frequent at scale. High rate of change makes it hard to suss these mistakes out and root-cause them, too; "a customer gave us a log where we swapped X and Y" is hard to bisect when you're doing 500 code deploys a day.
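A small illustration of the "hash(id) as a key without a collision list" mistake, assuming a fixed-size slot cache. Both classes and the deliberately weak hash are invented for the example so that a collision is easy to trigger; a real hash would collide far less often, which is exactly why the bug only shows up at scale.

```python
def weak_hash(key):
    """Deliberately poor hash: anagrams such as "ab" and "ba" collide."""
    return sum(key.encode())


class UncheckedCache:
    """Stores one value per slot, addressed only by the hash of the key."""

    def __init__(self, slots, hash_fn):
        self.slots = [None] * slots
        self.hash_fn = hash_fn

    def _index(self, key):
        return self.hash_fn(key) % len(self.slots)

    def put(self, key, value):
        self.slots[self._index(key)] = value

    def get(self, key):
        # Returns whatever is in the slot, even if another key wrote it.
        return self.slots[self._index(key)]


class CheckedCache(UncheckedCache):
    """Same layout, but keeps the full key and verifies it on every read."""

    def put(self, key, value):
        self.slots[self._index(key)] = (key, value)

    def get(self, key):
        entry = self.slots[self._index(key)]
        if entry is not None and entry[0] == key:
            return entry[1]
        return None  # a collision is a miss, never someone else's data
```

Writing "ab" and then "ba" into `UncheckedCache(8, weak_hash)` makes a later read of "ab" return the value stored for "ba". The checked variant loses the evicted entry but never hands out the wrong one.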
adam_arthur 14 hours ago [-]
Vibe-coding the implementation.
I haven't had much issue with Codex, but seems Claude Code has major issues being reported nearly on the daily.
They also happen to be the most boastful about not reading or looking at the code.
LLMs are very capable, but not nearly to the level they seem to be messaging.
(We've actually moved on from vibe-coding to having the LLM vibe code itself in a loop)
27183 14 hours ago [-]
> having the LLM vibe code itself in a loop
The businesslatin name for this is Recursive Self-Improvement
rabbidruster 14 hours ago [-]
Interestingly I had an almost identical experience to this report in codex. It output a user memory file that looked awfully real and wasn't at all related to my work.
27183 15 hours ago [-]
If I had to hazard a guess, doing anything in a multi-tenant way on a GPU is going to be hard mode compared to most SaaS due to lack of memory safe tooling. I've built multi-tenant SaaS systems, and I've done a little GPU programming (a long time ago), but I've never tried to combine the two disciplines.
woadwarrior01 15 hours ago [-]
It'd be terribly compute inefficient to not share prefix caches (KV cache) across customers.
acepl 14 hours ago [-]
What is the probability that two customers will have exactly the same tokens in cache? Wouldn't it require using the exact same CLAUDE.md, skills, MCPs and context? After that it gets even worse, given the nondeterminism of LLMs and humans.
27183 14 hours ago [-]
I suspect what GP is getting at is there will be a strong incentive to implement some structural sharing across tenants to avoid redundantly storing the same tokens over and over. At least I'd be tempted to do this if I was working with a very precious, constrained resource (e.g. VRAM). Doing this correctly seems.. very difficult. [edit] To answer your question directly: the probability that the entire cache is identical between two different users is very low, but the probability that there exists identical chunks of cache between two different users is very high. Exploiting those commonalities successfully will significantly compress the data.
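One way to picture that kind of chunk-level sharing is the sketch below, assuming the cache is split into fixed-size token blocks and each block is keyed by a hash of itself plus everything before it. This is an illustration of the idea only, not a description of how any provider actually does it; the block size, function names, and the `salt` parameter are all made up.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per cache block (tiny, for illustration)


def block_keys(tokens, salt=b"", block_size=BLOCK_SIZE):
    """Return one key per full block; each key depends on the whole prefix so far."""
    keys = []
    parent = salt  # a per-tenant salt makes every key tenant-specific
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        digest = hashlib.sha256(parent + repr(block).encode()).digest()
        keys.append(digest)
        parent = digest
    return keys


def shared_prefix_blocks(keys_a, keys_b):
    """Count how many leading blocks two requests could reuse from each other."""
    count = 0
    for a, b in zip(keys_a, keys_b):
        if a != b:
            break
        count += 1
    return count
```

Two requests that start with the same 8-token system prompt and then diverge share their first two block keys and nothing after that. Passing a different `salt` per tenant removes the sharing entirely, which is the isolation versus efficiency trade-off in one parameter.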
weitendorf 12 hours ago [-]
Agree with this and I have been thinking about it recently as well. I think you could implement a cord-like vocabulary to identify large duplicated substrings for exact deduplication and pairwise correlations or vocabulary profiles/small classifiers for forward-looking or speculative deduplications. A clear example is the GPL license, it’s a large substring you might encounter often and highly likely to be accompanied by lots of c code.
This is probably something that you’d be doing on the CPU though before sending anything to the GPU, though that’s definitely the sensitive surface since it’s hardware without good multitenancy. I assume the interface between the CPU and GPU is where you would be most likely to make a mistake where you start decoding data from one fd that was meant for another, or from the wrong position, and get someone else’s data.
I wouldn’t be confident that these are active exploits from deliberately abusing kv cache optimizations though, possibly just the kind of bugs you get from active low level performance tuning/systems work. Since this is something I have seen across providers lately I personally suspect it to be a driver issue.
27183 4 hours ago [-]
Given the size of the datacenter class GPUs they're running these models on, don't they need to be processing multiple tenants concurrently per GPU to extract the full potential of the hardware?
I agree, shuffling the data between the CPU and GPU is itself fraught with peril. It's all the hairiest distributed systems problems combined with the sketchiest memory safety issues all in one place.
dezgeg 14 hours ago [-]
System prompt for something like Claude Code should be identical, no?
cmrdporcupine 12 hours ago [-]
Could just be a bug in the radix tree for the KVCache with deeper, wrong, levels of the trie returning for the same initial prefix match.
acepl 15 hours ago [-]
Oh yes, we do not need programmers any more…
kylehotchkiss 15 hours ago [-]
50% unemployment :D
JohnMakin 14 hours ago [-]
it’s the wet dream of execs and pm types. however, i have not seen anything close to it in my life. I remember the UML days, lol. the issue is not the code, it’s the translation layer between business and code. maybe someday ai bridges that gap. history has shown probably not
emehex 15 hours ago [-]
"Coding is largely solved"
supriyo-biswas 14 hours ago [-]
The funny thing is at my current employer, they mentioned that "coding is increasingly becoming a solved problem" and in the same breath, mentioned that one project was too hard for anyone to do so they're not doing it and would rather sell existing features...
throwatdem12311 12 hours ago [-]
Weird. Coding isn’t really “solved” - because “coding” isn’t just the process of typing in characters as fast as possible - BUT the skill floor has been massively lowered while also raising the skill ceiling considerably.
We’re doing projects now that seemed impossible before because we have access to these powerful AI models. They can make things that would have taken weeks or months take days now, freeing up time for even more ambitious buildouts we never would’ve even considered before.
consp 15 hours ago [-]
While it's abused by LLM vendors, I've been hearing that phrase in one form or another since the early '00s, and it's likely way older.
ethagnawl 15 hours ago [-]
Sure but have you ever seen it actually play out in practice like it currently is? Whether or not it's true (of course it's not) people are currently behaving as if it is and firing/hiring accordingly.
philipov 14 hours ago [-]
Well, when was the last time you wrote machine code by hand?
... but then they went and changed what coding meant.
We've always been layering abstractions on top of abstractions. If we get to an abstraction that works well enough that you no longer have to dive down into the previous layer, we say we've solved coding, and change what coding means. Obviously LLMs aren't there yet.
techpression 15 hours ago [-]
I love that quote, especially considering the insane amount of bugs that are produced.
It’s as easy to debunk as someone claiming “I can jump to the moon”.
CamperBob2 13 hours ago [-]
"This thing isn't 100% perfect, contrary to what absolutely no one anywhere said at any time"
Could still be a hallucination, but the concerning part is that from the outside a hallucination, local context bleed, and an infra/routing bug can be very hard to distinguish.
solenoid0937 13 hours ago [-]
> one tool call result that includes a string that printed a pathname including minecraft.py
This seems like a hallucination.
nullbio 12 hours ago [-]
Don't worry guys, Anthropic are the experts at security and no one else should have access to bug fixing LLMs because that would be dangerous.
codeduck 10 hours ago [-]
As a Butlerian, this is hilarious.
Trasmatta 13 hours ago [-]
The first reply clearly being a copy and paste from Claude made me want to vomit
If people absolutely need to use AI to write replies, they NEED to start including an "everything after this was generated by AI" disclaimer
Kapura 14 hours ago [-]
happy fourth of july everybody!
ofjcihen 14 hours ago [-]
Happy fourth to you too :)
ai_fry_ur_brain 14 hours ago [-]
OpenRouter's model providers give me URLs people have given them quite frequently.
ShinyLeftPad 6 hours ago [-]
Can we acknowledge how sad it is that people get LLMs to basically play computer games for them? What's the point of fun?
ryantsuji 14 hours ago [-]
Note the repro condition: first response after 5+ min, i.e. a cache miss. A cache leak would show up on hits (someone else's cached prefix), not on misses where everything is recomputed from your own tokens.
bfeynman 14 hours ago [-]
fwiw, this could be a bug but the submitter's level of arrogance places this rather high on the Dunning-Kruger side of things. There are multiple other plausible explanations, but this person is probably a vibe coder who believes anything an LLM says (including explaining its own hallucinations)
dainiusse 14 hours ago [-]
Don't worry. Mythos will fix that before release. Oh, wait...
jdw64 11 hours ago [-]
The biggest problem with AI agents is this. You can't debug what the AI is doing, so it's really hard to track down where something went wrong.
What I know for sure:
1. Stuff that has nothing to do with the current session got mixed in.
What I'm guessing:
1. There's a minecraft.py file in the tool folder, and that might have triggered some hallucination.
2. Maybe data from some other project on the user's local machine got mixed in somehow.
3. Or it could be from another user's conversation.
Honestly, if I think about how the system actually works, I don't think it's pulling from another user's data. But other people say they've had issues like that, so I can't completely rule it out.
I saw this thing on YouTube once. When a bunch of users share the same system prompt, or prefix, the computation results get shared through something called a KV Cache. At least, that's what I understood. Not sure if I got it right. But if there's some bug in the hashmap that's supposed to keep those caches separate, then maybe multi-tenant memory management just broke down and that's what caused this. I mean, I can guess, but who knows. And honestly, even if that's exactly what happened, they'd never admit it.
At the end of the day, LLMs are just word predictors, right? They build up some kind of semantic space inside. So maybe the user's question just happened to be near Minecraft in that space. That's kind of what I think.
impartshadow 7 hours ago [-]
[flagged]
shard972 3 hours ago [-]
[dead]
noperator 13 hours ago [-]
[dead]
TZubiri 14 hours ago [-]
0 evidence. If this were a real privacy leak, the author would ask their coworker if they talked about the unexpected topic instead of
>"Maybe my coworker was talking about this in another session?"
This would be a critical bug that would slash the market value of a $1T company significantly. Go ask your coworker or close the ticket. Why do you expect the devs to put an enormous amount of effort into hunting a potentially nonexistent bug if you can't make that minuscule debugging effort?
ec109685 15 hours ago [-]
Caching doesn’t work the way the bug reporter implies. Caches are shared (at least across the enterprise), but the cache key is always a function of the input before it.
We achieved significant savings simply by moving everything that varies across individuals out of the system prompt so every session starts from a cache point.
For example you never want your system prompt to start with the time that the session started. Move that to the first user message if needed.
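A minimal sketch of that point, assuming a prefix cache keyed by a hash chain over fixed-size token blocks. The block size, the `block_keys` and `shared_blocks` helpers, and the `scope` string are all hypothetical and not any provider's actual scheme. Because each key covers everything before it, one differing early token changes every later key, which is why a leading timestamp defeats sharing.

```python
import hashlib

BLOCK = 4  # tokens per cache block (illustrative; real systems use larger blocks)

def block_keys(tokens, scope="org-1"):
    """Return one cache key per full block; each key covers the whole prefix."""
    keys = []
    h = hashlib.sha256(scope.encode())  # assumed: the chain is seeded per tenant/org
    for i in range(0, len(tokens) - len(tokens) % BLOCK, BLOCK):
        for tok in tokens[i:i + BLOCK]:
            h.update(tok.encode() + b"\x00")  # separator so "ab","c" != "a","bc"
        keys.append(h.copy().hexdigest())
    return keys

def shared_blocks(a, b):
    """Count leading blocks whose keys match, i.e. reusable cache entries."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

system = ["You", "are", "a", "helpful", "coding", "assistant", ".", "\n"]
# Timestamp first: the very first block already differs between sessions.
bad_1 = block_keys(["12:00"] + system + ["hi"])
bad_2 = block_keys(["12:05"] + system + ["hi"])
# Timestamp moved after the stable system prompt: the system blocks are shared.
good_1 = block_keys(system + ["12:00", "hi", "there", "!"])
good_2 = block_keys(system + ["12:05", "hi", "there", "!"])
```

Here `shared_blocks(bad_1, bad_2)` finds no reusable blocks, while `shared_blocks(good_1, good_2)` reuses both system-prompt blocks. Changing `scope` changes every key, which is one way a design could keep tenants from ever sharing entries.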
macNchz 15 hours ago [-]
Caching is not supposed to work like that, but that doesn’t preclude the cache key computation function from having bugs.
marginalia_nu 15 hours ago [-]
Yeah there's quite a lot of potential bugs that could have this shape. If I were to guess it could be a buffer in a buffer pool not being sized and zeroed correctly, allowing stale data to bleed between sessions.
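A toy illustration of that guess, not a claim about any real inference stack: `BufferPool` and `serve` are hypothetical names, and the bug is planted deliberately. The handler returns the whole pooled buffer instead of only the bytes it wrote, so whether stale data leaks depends entirely on the pool zeroing buffers on release.

```python
class BufferPool:
    """Toy pool of reusable fixed-size buffers (hypothetical, for illustration)."""

    def __init__(self, size: int, zero_on_release: bool):
        self.size = size
        self.zero_on_release = zero_on_release
        self.free = []

    def acquire(self) -> bytearray:
        return self.free.pop() if self.free else bytearray(self.size)

    def release(self, buf: bytearray) -> None:
        if self.zero_on_release:
            buf[:] = bytes(self.size)  # wipe before the next session can see it
        self.free.append(buf)

def serve(pool: BufferPool, payload: bytes) -> bytes:
    """Copy payload into a pooled buffer, then (buggily) return the whole buffer."""
    buf = pool.acquire()
    buf[:len(payload)] = payload
    out = bytes(buf)  # planted bug: should be bytes(buf[:len(payload)])
    pool.release(buf)
    return out.rstrip(b"\x00")

leaky = BufferPool(16, zero_on_release=False)
serve(leaky, b"minecraft castle")   # session 1 fills the buffer
leaked = serve(leaky, b"fix bug")   # session 2 sees session 1's tail

safe = BufferPool(16, zero_on_release=True)
serve(safe, b"minecraft castle")
clean = serve(safe, b"fix bug")
```

With the non-zeroing pool, `leaked` carries the tail of the previous session's payload after `fix bug`; with zeroing, `clean` is just `fix bug`. Zeroing only masks the sizing bug here, which is part of why this class of mistake can go unnoticed until a fast path skips the wipe.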
nok22kon 14 hours ago [-]
or the cache retrieval function for a key retrieving the wrong entry
Waterluvian 14 hours ago [-]
There is a massive incentive for optimization, so I expect they’re doing a ton of very clever tricks, all of which make this kind of bug more likely.
estebarb 14 hours ago [-]
Hash functions necessarily have collisions. Also, it is perfectly possible to introduce bugs in the hash function (hash inputs, hash function itself) that allow cross account contamination.
margalabargala 13 hours ago [-]
Hash functions necessarily have collisions, but it's perfectly possible to make the expected time between collisions greater than the human lifespan.
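As rough arithmetic for that claim, here is the standard birthday-bound approximation, p ≈ n² / 2^(b+1), for n keys under an ideal b-bit hash. The trillion-key workload is a made-up number for illustration, and the formula says nothing about bugs in how the hash inputs are chosen.

```python
def collision_probability(n_keys: int, hash_bits: int) -> float:
    """Birthday-bound approximation: p ~= n^2 / 2^(bits+1), valid while p is small."""
    return min(1.0, n_keys * n_keys / 2 ** (hash_bits + 1))

n = 10 ** 12  # hypothetical: a trillion cached prefixes
for bits in (64, 128, 256):
    print(bits, collision_probability(n, bits))
```

Under these assumptions a key truncated to 64 bits saturates the bound, while 128 bits or more pushes the estimate far below anything observable, so accidental collisions are mainly a worry when keys are short or the hash inputs are wrong.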
supriyo-biswas 15 hours ago [-]
There could just also be a bug where the output tokens of session 1 were shared with session 2, due to a race condition or similar.
mplappert 14 hours ago [-]
Seems like a hallucination to me; note that the context contains “unmarkBlock” as the function name, which invites a connection to Minecraft. Still shouldn’t happen of course.
The alternative explanation is that the inference engine, which batches several unrelated requests for parallel processing, messed up the unpacking and returned an unrelated user’s query. This one would be very scary as it will leak arbitrary content, but it seems much less likely here.
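A sketch of what that kind of unpacking mistake could look like, assuming the batch output is a flat token list with per-request start offsets. `pack`, `unpack`, and `unpack_buggy` are hypothetical helpers, not taken from any real inference engine.

```python
def pack(outputs):
    """Concatenate per-request token lists; return flat list plus start offsets."""
    flat, offsets = [], [0]
    for toks in outputs:
        flat.extend(toks)
        offsets.append(len(flat))
    return flat, offsets

def unpack(flat, offsets, i):
    """Correct: request i owns flat[offsets[i]:offsets[i + 1]]."""
    return flat[offsets[i]:offsets[i + 1]]

def unpack_buggy(flat, offsets, i):
    """Off-by-one in the offset index: returns the previous request's tokens."""
    return flat[offsets[i - 1]:offsets[i]]

batch = [["fix", "the", "parser"], ["build", "a", "minecraft", "castle"], ["hello"]]
flat, offsets = pack(batch)
```

With the buggy version, the third caller receives the second caller's tokens and the first caller receives nothing, which resembles the "pay it forward" swap described elsewhere in the thread.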
Here's a DEF CON talk (6 years ago) on this topic: https://www.youtube.com/watch?v=w-eJM2Pc0KI
https://portswigger.net/research/talks?talkId=36
Maybe my last presentation on the topic! Possibly.
If that doesn’t help I guess you’ll need to wait for the whitepaper to land but I can assure you I didn’t just do my normal research then add AI to the title for clicks :)
Every emergent behavior from these actors - whose claim to positive moral values is barely plausible - should be reported, discussed, dissected and critiqued early and often.
And why does anonymity detract from any of that?
Yes, the discussion should be had constantly.
But: Should this person potentially have their life messed up because they pointed out the emperor has no clothes?
Dario might not be a literal idiot, but he might strongly benefit from training a model to do strategic thinking for Anthropic.
Seems to me Dario is actually a genius. These are all things that I would do to make people believe that my “basically the same as the other guy” product is ackshually best thing ever for real. Trust me bro.
The entire bubble is hype and fear mongering. The technical merits of the products are completely irrelevant at this point. Dario is doing exactly what someone that understands this would do and they are winning.
HN doesn't believe superintelligence will be a thing; while the AI safety crowd believes they are building it. So the decisionmaking of the safety crowd is incomprehensible to HN.
Grow up. Whenever push comes to shove, they reduce safety and alignment departments, rush out releases over the heads of the same departments. If you engaged with the news these last years you’d see it for what it is “models for me, but not for thee”.
Both companies were founded on the basis of AI Safety.
- There are tons of great safety people doing real work at OpenAI. Releases are held back, models are evaluated, etc.
- Anthropic goes even further - constrained themselves with a PBC/LTBT structure, treat safety even more rigorously, and notably delayed the release of Mythos (literally the opposite of what you alleged) and continue to hold their two red lines despite threats from the gov.
You should actually talk to some of the people at these labs. Nearly everyone working at these places genuinely believe AGI/ASI is actually happening, so they do take safety seriously.
To imply these companies don't care about safety is typical internet-brand nihilism/cynicism that helps you feel smart while being literally the opposite of the truth.
Moreover, your take on Dario is oversimplistic, and undersells the extent to which Anthropic takes safety seriously. It's not lip service, there are real dollars and attention spent on alignment at Anthropic.
Stated without a hint of irony.
Don't we have a thread here about how the model allegedly leaks responses, which is "normal" safety? (Not "AGI will become Skynet" safety, which is mostly a rehash of the Terminator 2 story.)
Relevant comment from the OP which makes a hallucination more likely:
> There is one tool call result that includes a string that printed a pathname including minecraft.py because it was listing the files in a Python virtual environment and the Pygments package has a lexer called minecraft.py
All that said, it doesn’t require cross session leakage, it could just be training data or like those nightingale (probably the wrong bird*) data generations where they just prompt an LLM with nothing and it starts spitting out conversations.
I see a bunch of downstream comments about caching, sounds like maybe there’s an error where it loads nothing instead of the cache and so starts spitting out random generations.
* edit: it’s magpie. Worth looking at the concept, I’m not sure people realize that LLMs generate random conversations when prompted with nothing, this seems at least as likely as sessions leaking: https://github.com/magpie-align/magpie
These effects are becoming more rare as the SOTA models are improving so much. If you spent a lot of time with earlier LLMs or you experiment with smaller, quantized local LLM models this type of thing happens very frequently. When you see it happen so much on a model you’re running on your own hardware it becomes a reflex to chuckle and reset the session with a clean context. When it happens from a hosted provider it can be scarier because it’s not the type of failure mode most people are used to seeing.
It's a hallucination.
> Same thing just happened on a Claude Mobile session in same Enterprise account. Common theme in both is Sonnet 5, first response after more than 5 minutes (cache miss).
It's unfortunate that there is so little transparency that even if they deny there was a leak we will never know for certain.
If you've never had an LLM (all models) suddenly start spouting nonsense in a completely different language...you haven't been using LLMs that much. They will go absolutely insane some % of the time.
They can “go insane” but it seems often to be infra related as opposed to anything one would consider hallucination. Smaller models will often get stuck repeating a word or phrase forever but that’s a bit different and nobody would call it hallucination.
(Not the syllogism, the premise)
I've seen plenty of hallucinations and context collapse behaviours.
I've never seen that.
I wonder if there could be a large security situation playing out behind the scenes right now.
I’ve been working on using AI to assist me in writing meta parsing grammars. Fortunately I have not launched most of them yet. I know for a fact that the next generation of models represent a major step change in basic vulnerability identification and exploitation, especially if you know where to point them. They’ve found several bugs and at least one exploit in my parsing tools so far, I can’t imagine how many there still are waiting to be discovered across the entire modern tech ecosystem.
That is when it bothers to respond instead of just sending back a 1099 error code
Thanks for the detailed report. We’re confident this is a hallucination but of course take these reports seriously and the team is looking into it. We’ll report back if anything turns up.
The VS Code extension needs love. I'm sure you guys are aware but it feels like it is neglected.
GitHub Issues is a graveyard of 3-10 duplicates of really important issues with no activity getting closed. A few examples:
* Lots of /commands missing in between the CLI harness and the VS Code extension
* No way to monitor subagents/tasks/progress visually
* No status bar/line
Having an LLM session with very long context occasionally go off on a tangent is not uncommon. The people who expect absolute perfection out of every LLM interaction see this as some total indictment of the entire technology, but the people who use these tools daily have learned to treat the output as partially stochastic and to avoid extremely long context, even if the model offers it. It’s best to compact strategically or summarize next steps to hand off to a new session. Using sub-sessions can also reduce context pollution at the cost of additional token expenditure to summarize and transfer data to and from the sub-session.
> this amazing tech is so stupid it just randomly brings up Minecraft or it’s got a major security issue
You can sugarcoat it but that's what it is. It's not slightly wrong like a junior engineer or weird like a junior engineer on LSD, it becomes like "your junior engineer suffered a stroke or sudden onset dementia completely forgetting the entire point". one trigger word and that's it we're building Minecraft castles now.
I’ve known some brilliant engineers who would also just randomly bring up Minecraft (more likely Factorio these days) so this makes sense.
---
Note that the author did have a minecraft.py file. So not quite 100% random.
I've never understood in what world we decided it was okay to hand over this much unchecked power to such corporations. But this is how it has always been one way or the other.
"Recipe for red-braised pork, I have pork shoulder"
"Write up a framework for MCP patterns I can give to claude code"
"explain the biomechanics of motion in c. elegans" (I get this one, I mostly did it to test and it's related to my hobby project)
Do we get an extra day of functional Fable 5 because it's down?
I'm annoyed but not surprised at the overeager classification
* There's an enormous amount of very expensive shared state (context cache) which you do not want to duplicate when you can avoid it.
* Memory locality is crucially important for performance.
* Hardware is extremely over-subscribed.
* Hardware is extremely expensive.
These factors all make hardware or even traditional memory-space (hypervisor/VM/hardware assisted virtualization) isolation a non-starter for most workloads and customers, which forces all isolation to the software layer. This already makes things way harder than they are in commodity SaaS.
Moving beyond that, the tools, frameworks, and hardware which the system runs on (GPU) weren't designed for task isolation, and building this isolation is even more of an emergent research field than it is in x86 CPU hardware-sharing (which has required a huge amount of effort over the past 30+ years to get where we are today).
And, the ratio of usage/sensitivity to maturity is also just poor overall; these are young companies with rapid development and enormous delivery pressure under incredible customer workload requirements, too.
I can't tell if the original post is a real issue or not, but I'm surprised there aren't more like this overall; the whole thing really is a house of cards in this sense.
Is this not what happens in most SaaS? Isolation at the software layer? I understand there are special agreements, but they seem to be mostly that – no?
> the ratio of usage/sensitivity to maturity is also just poor overall; these are young companies with rapid development and enormous delivery pressure under incredible customer workload requirements, too.
Mh. The talent density in these companies is apparently quite exceptional. Things like customer data separation are obvious and top of mind. I don't see why they would not hire the best to implement these relatively boring/solved things correctly at an architectural level.
I think it's fairly popular to try to do more logical isolation in SaaS now, especially with VM-scheduling-as-a-service becoming more popular. For example, I did security architecture at a company who did relatively simple financial processing; we worked to move to a model where customer documents were encrypted using a tenant key which we'd then wrap in both a service key and a login key; users could only get the login key stapled to their session by authenticating against that account, and the processing jobs ran on a cloud vendor's logical isolation. So the user needed a login key, the service needed the attested service key, and the job ran in what amounted to a mini-VM, avoiding issues like "whoops we sent the wrong document ID and the backend gave it back to us" or "whoops, we routed the request to the wrong tenant backend!" This level of isolation would be really hard to achieve in an LLM vendor context.
> I don't see why they would not hire the best to implement these relatively boring/solved things correctly at an architectural level.
I think a lot of these things develop over time; obviously hiring people who have done them before helps, but it's hard. Even the people with strong experience often only know little slices. And unfortunately, every system operating at these scales has emergent behavior which can become really challenging at scale; mistakes like "we used hash(id) as a key in a memory cache without a collision list, and it collided" which would simply never affect most startups become more and more frequent at scale. High rate of change makes it hard to suss these mistakes out and root-cause them, too; "a customer gave us a log where we swapped X and Y" is hard to bisect when you're doing 500 code deploys a day.
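To make the "hash(id) as a key without a collision list" mistake concrete, here is a small sketch. `small_hash` is a deliberately weak stand-in so a collision is easy to produce, and both cache classes are hypothetical illustrations rather than anyone's real code.

```python
def small_hash(key: str, bits: int = 8) -> int:
    """Deliberately tiny hash so collisions are easy to show (illustrative only)."""
    return sum(key.encode()) % (1 << bits)

class HashOnlyCache:
    """Buggy: stores only hash(id) -> value, so colliding ids alias each other."""
    def __init__(self):
        self.slots = {}
    def put(self, key, value):
        self.slots[small_hash(key)] = value
    def get(self, key):
        return self.slots.get(small_hash(key))

class CheckedCache:
    """Safer: keeps the full id next to the value and verifies it on lookup."""
    def __init__(self):
        self.slots = {}
    def put(self, key, value):
        self.slots.setdefault(small_hash(key), {})[key] = value
    def get(self, key):
        return self.slots.get(small_hash(key), {}).get(key)

a, b = "tenant-ab", "tenant-ba"  # same bytes, different order -> same small_hash

buggy, checked = HashOnlyCache(), CheckedCache()
buggy.put(a, "A's document")
checked.put(a, "A's document")
wrong = buggy.get(b)    # b never stored anything, yet gets A's value
right = checked.get(b)  # full-key check turns the collision into a miss
```

With a real 64-bit hash the same aliasing is rare per lookup, which is exactly why it tends to show up only at very high request volumes.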
I haven't had much issue with Codex, but seems Claude Code has major issues being reported nearly on the daily.
They also happen to be the most boastful about not reading or looking at the code.
LLMs are very capable, but not nearly to the level they seem to be messaging.
(We've actually moved on from vibe-coding to having the LLM vibe code itself in a loop)
The businesslatin name for this is Recursive Self-Improvement