Taking the question of whether this would be a useful addition to Node.js core aside, it must be noted that this 19k LoC PR was mostly generated by Claude Code and manually reviewed by the submitter, which in my opinion is against the spirit of the project and directly violates the terms of the Developer's Certificate of Origin set in the project's CONTRIBUTING.md.
Worth noting that mcollina is a member of the Node.js Technical Steering Committee
kartaka83838 22 hours ago [-]
yes this.
if there's anyone i would trust in exploring these avenues, it's him and the maintainers doing god's work in the nodejs repo in these past few years.
everlier 1 days ago [-]
We call it a slip slop at work, it's ok to slip some slop if it's "our" slop :-)
giancarlostoro 1 days ago [-]
> I pointed the AI at the tedious parts, the stuff that makes a 14k-line PR possible but no human wants to hand-write: implementing every fs method variant (sync, callback, promises), wiring up test coverage, and generating docs.
Is it slop if it is carefully calculated? I tire of hearing people use slop to mean anything AI, even when it is carefully reviewed.
grey-area 24 hours ago [-]
Was 14k lines carefully reviewed? Seems unlikely.
joshkel 22 hours ago [-]
Considering the many hundreds of technical comments over at the PR (https://github.com/nodejs/node/pull/61478), the 8 reviewers thanked by name in the article, and the stellar reputations of those involved, seems likely.
grey-area 22 hours ago [-]
My mistake 19k lines. At 2 mins per line that’s (19000*2)/60/7=90 7-hour days to review it all, are you sure it was all read? I mean they couldn’t be bothered to write it, so what are the chances they read it all?
For someone’s website or one business maybe the risk is worth it, for a widely used software project that many others build on it is horrifying to see that much plausible code generated by an LLM.
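For anyone who wants to replay that estimate with their own numbers, here is a minimal sketch of the arithmetic. The 2 minutes per line and the 7-hour day are the guesses from the comment above, not measured values, and `reviewDays` is a name made up for this illustration.

```typescript
// Inputs taken from the estimate above; all three are assumptions, not measurements.
const linesOfCode = 19_000;
const minutesPerLine = 2;
const hoursPerDay = 7;

// Hypothetical helper: converts a per-line review cost into working days.
function reviewDays(lines: number, minutesEach: number, dayLengthHours: number): number {
  const totalHours = (lines * minutesEach) / 60;
  return totalHours / dayLengthHours;
}

const days = reviewDays(linesOfCode, minutesPerLine, hoursPerDay);
console.log(days.toFixed(1)); // 90.5
```

Halving the per-line guess halves the result, so the conclusion depends almost entirely on that one assumption.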
pull_my_finger 20 hours ago [-]
When you review code, do you spend 2 minutes per line? That seems like a huge exaggeration of effort required
grey-area 14 hours ago [-]
Depends - if it is from a human I find I can trust it a lot more. If it is large blobs from LLMs I find it takes more effort. But it was just a guess at an average to give an estimate of the effort required. I’d hope they spent more than 2 mins on some more complex bits.
Are you genuinely confident in a framework project that lands 19kloc generated PRs in one go? I’d worry about hidden security footguns if nothing else and a lot of people use this for their apps. Thankfully I don't use it, but if I did I'd find this really troubling.
It also has security implications - if this is normalised in node.js it would be very easy to slip in deniable exploits into large prs. It is IMO almost impossible to properly review a PR that big for security and correctness.
seattle_spring 19 hours ago [-]
I probably review about 1k LoC worth of PRs / day from my coworkers. It certainly doesn't take me 33 hours (!!) to do so, so I must be one of those rockstar 10x superhero ninja engineers I keep hearing about.
dirkc 13 hours ago [-]
Are your coworkers producing the code using LLMs? And what level of trust do you place in them?
ThunderSizzle 10 hours ago [-]
For half my coworkers, their LLM code is better than their code.
weird-eye-issue 19 hours ago [-]
> I mean they couldn’t be bothered to write it, so what are the chances they read it all?
What kind of logic is this?
grey-area 14 hours ago [-]
It’s much harder to read code carefully than to write it. Particularly code generated by LLMs which is mostly correct but then sometimes awful.
pas 7 hours ago [-]
usually yes, but that's why there are tests, and there's a long road before people start depending on this code (if ever). people will try it, test it, report bugs, etc.
and it's not like super carefully written code is magically perfect. we know that djb can release things that are close to that, but almost nobody is like him at all!
ovflowd 6 hours ago [-]
The PR has been open for 3 months, and all the reviewers involved have actually read the whole code and are experts on the matter.
keeganpoppen 22 hours ago [-]
[flagged]
vinnymac 18 hours ago [-]
I carefully review far more than 14k LoC a week… I’m sure many here do. Certainly the language you write in will greatly bloat those numbers though, and Node in particular can be fairly boilerplate heavy.
conartist6 23 hours ago [-]
Pain is a signal. Even if the trick is not minding, it's still inadvisable to burn your hand on an open flame. The pain is there to help you not get hurt.
I do not think it is wise to brag that your solution to a problem is extremely painful but that you were impervious to all the pain. Others will still feel it. This code takes bandwidth to host and space on devices and for maintainers it permanently doubles the work associated with evolving the filesystem APIs. If someone else comes along with the same kind of thinking they might just double those doubled costs, and someone else might 8x them, all because nobody could feel the pain they were passing on to others
nine_k 22 hours ago [-]
I don't see it to be such a pain.
> Bundle a full application into a Single Executable.
Embed a zip file into the executable, or something. Node has sort of supported this since v25, see --build-sea. Bun and Deno have supported this for longer.
> Run tests without touching the disk.
This must be left to the host system to decide. Maybe I want them to touch the disk and leave traces useful for debugging. I'd go with tmpfile / tmpdir; whoever cares, knows to mount them as tmpfs, which sits in RAM. (Or a ramdisk under Windows.)
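A rough sketch of the tmpdir approach, using only Node's built-in `fs`, `os` and `path` modules. The `withTempDir` helper and its `keep` flag are made up for this example; whether the directory ends up in RAM depends entirely on how the host mounts its temp location.

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: runs `body` against a fresh scratch directory.
// Pass keep = true to leave the files behind for debugging.
function withTempDir<T>(body: (dir: string) => T, keep = false): T {
  const dir = mkdtempSync(join(tmpdir(), "test-"));
  try {
    return body(dir);
  } finally {
    if (!keep) rmSync(dir, { recursive: true, force: true });
  }
}

// Example test body: write a fixture and read it back.
const roundTrip = withTempDir((dir) => {
  const file = join(dir, "fixture.txt");
  writeFileSync(file, "hello");
  return readFileSync(file, "utf8");
});
console.log(roundTrip);
```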
> Sandbox a tenant’s file access. In a multi-tenant platform, you need to confine each tenant to a directory without them escaping
This looks like a wrong tool, again. Run your Node app in a container (like you are already doing), mount every tenant's directory as a separate mount point into your container. (Similar with BSD jails.) This seems like the only problem that is not trivial to solve without a "VFS", but I'm not very certain that such a VFS would be as well-audited as Docker, or nsenter and unshare. The amount of work necessary for implementing that is too much for the niche benefit it would provide.
> Load code generated at runtime.
See tmpfs for a trivial answer. For a less trivial answer, I don't see how Node's code loader is bound to a filesystem if it can import via https. Just use ESM loader hooks and register() your loader, assuming you're running Node ≥ 20.6.
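To make the loader-hooks idea concrete, here is a minimal sketch of a hooks module in the `resolve`/`load` shape that Node documents for `module.register()`. The `virtual:` scheme, the `virtualModules` table and the simplified types are assumptions for illustration; I haven't checked this against every Node version.

```typescript
// Hypothetical table of modules generated at runtime, keyed by specifier.
const virtualModules = new Map<string, string>([
  ["virtual:greeting", "export default 'hello';"],
]);

// Simplified stand-ins for the hook signatures; the real context objects carry more fields.
type NextResolve = (specifier: string, context: unknown) => unknown;
type NextLoad = (url: string, context: unknown) => unknown;

// resolve hook: claim specifiers we know about, delegate everything else.
export function resolve(specifier: string, context: unknown, nextResolve: NextResolve): unknown {
  if (virtualModules.has(specifier)) {
    return { url: specifier, shortCircuit: true };
  }
  return nextResolve(specifier, context);
}

// load hook: serve source text from memory instead of reading a file.
export function load(url: string, context: unknown, nextLoad: NextLoad): unknown {
  const source = virtualModules.get(url);
  if (source !== undefined) {
    return { format: "module", source, shortCircuit: true };
  }
  return nextLoad(url, context);
}

// Registration would happen in a separate entry file, roughly:
//   import { register } from "node:module";
//   register("./hooks.mjs", import.meta.url);
```

After registration, `import greeting from "virtual:greeting"` would be served from the table rather than from disk, assuming the hooks behave as sketched.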
digikata 1 days ago [-]
Large PRs could follow the practices that the Linux kernel dev lists follow. Sometimes large subsystem changes could be carried separately for a while by the submitter for testing and maintenance before being accepted in theory, reviewed, and if ready, then merged.
While the large code changes were maintained, they were often split up into a set of semantically meaningful commits for purposes of review and maintenance.
With AI blowing up the line counts on PRs, it's a skill set that more developers need to mature. It's good for their own review to take the mass changes, ask themselves how would they want to systematically review it in parts, then split the PR up into meaningful commits: e.g. interfaces, docs, subsets of changed implementations, etc.
dakiol 1 days ago [-]
Nobody wants to review AI-generated code (unless we are paid for doing so). Open source is fun, that's why people do it for free... adding AI to the mix is just insulting to some, and boring to others.
Like, why on earth would I spend hours reviewing your PR that you/Claude took 5 minutes to write? I couldn't care less if it improves (best case scenario) my open source codebase, I simply don't enjoy the imbalance.
IgorPartola 22 hours ago [-]
In theory because the code being added is introducing a feature so compelling that it is worth it. In practice, that’s rarely the case.
My personal approach to open source is more or less that when I need a piece of software to exist that does not and there is no good reason to keep it private, it becomes open source. I don’t do it for fun, I do it because I need it and might as well share it. If someone sends me a patch that enhances my use case, I will work with them to incorporate it. If they send me a patch that only benefits them it becomes a calculus of how much effort would it take for me to review it. If the effort is high, my advice is to fork the project or make it easier for me to review. Granted I don’t maintain huge or vital projects, but that’s precisely why: I don’t need yet another programming language or runtime to exist and I wouldn’t want to work on one for fun.
mpyne 21 hours ago [-]
> Like, why on earth would I spend hours reviewing your PR that you/Claude took 5 minutes to write?
If the PR does what it says it does, why does it actually matter if it took 2 weeks or 2 minutes to put together, given that it's the equivalent level of quality on review?
TingPing 19 hours ago [-]
“It works” is the bare minimum. Software is maintained for decades and should have a higher bar of quality.
lmm 15 hours ago [-]
> If the PR does what it says it does, why does it actually matter if it took 2 weeks or 2 minutes to put together, given that it's the equivalent level of quality on review?
You're right that the issue isn't how many minutes it took. The issue is that it's slop. Reviewing thousands of lines of crappy code is unpleasant whether they were autogenerated or painstakingly handcrafted. (Of course, few humans have the patience and resistance to learning to generate the amount of terrible code that AIs do routinely).
hackemmy 22 hours ago [-]
I get the frustration but I think this take only holds if you assume AI generated code is inherently worse. If someone uses Claude to scaffold the boilerplate and then actually goes through it properly, the end result is the same code you would have written by hand, just faster. The real problem is when people submit 14k lines they clearly did not read through. But that is a review process problem, not an AI problem. Bad PRs existed long before AI.
lmm 15 hours ago [-]
Yes and no. Previously when someone submitted a 14k line PR you could be assured that they'd at least put a significant amount of time and effort into it, and the result was usually a certain floor on the quality level. Now that's no longer true.
wobfan 21 hours ago [-]
I resonate with OP a lot, and in my opinion, it's not about the code quality. It's about the effort that was put in, like in each LOC. I can't quite put it in words, but, like, the art comparison works quite well. If someone generates a painting with Gemini, it makes it somewhat heartless. It may still be good and bring the project forward (in case of this PR), but it lost every emotional value.
I would probably never be able to review this kind of code in open source projects without any financial compensation, because of that reason. Not because I don't like LLMs, don't use LLMs, or think their code is of bad quality. But, while without LLMs I know there was a person who sat down and wrote all this in painstaking work, now I know that he or she barely steered a robot that wrote it. It may still be good work, and the steering and prompting is still work and requires skill, but for me I would not feel any emotional value in this code, and it would make it A LOT harder to gather motivation to review it. Interestingly, when I think about it, I realize that I would inherently have motivation to find out how the developer prompted the agent.
Like, you know, when I see a wooden statue of which I know it was designed and carved by someone in months of work, I could appreciate every single edge of the wood much more than if there's a statue that was designed by someone but carved by some kind of wooden CNC machine. It may be the same statue and the same or even better quality, and it was still skillful work, but I lose my connection to it.
Can't quite pinpoint it, but for me, it seems, the human aspect is really important here, at least when it's about passion and motivation.
Maybe that made some sense, idk. I just wrote out of my ass.
tyre 1 days ago [-]
Why do you care how much effort it took the engineer to make it? If there was a huge amount of tedium that they used Claude Code for, then reviewed and cleaned up so that it’s indistinguishable from whatever you’d expect from a human; what’s it to you?
Not everyone has the same motivations. I’ve done open source for fun, I’ve done it to unblock something at work, I’ve done it to fix something that annoys me.
If your project is gaining useful functionality, that seems like a win.
gonzalohm 24 hours ago [-]
Because sometimes programming is an art and we want people to do it as if it was something they cared about.
I play chess and this is a bit like that. Why do I play against humans? Because I want to face another person like me and see what strategies they can come up with.
Of course any chess bot is going to play better, but that's not the point
IgorPartola 22 hours ago [-]
What about the other times?
madeofpalk 22 hours ago [-]
I don't think node virtual filesystems is anything like chess.
UqWBcuFx6NV4r 22 hours ago [-]
[flagged]
wobfan 22 hours ago [-]
TIL that when I do anything that makes society label me as a "developer", I am not allowed to enjoy it, or feel about it in any way, as it's now a job, entirely neutral in nature, and I gotta do it, whether I hate or enjoy it - no attached emotions allowed.
paulryanrogers 21 hours ago [-]
Ignore the mercenaries. Here they are legion.
As for us (aspiring) craftsmen, there are dozens of us! Dozens!
lmm 15 hours ago [-]
> Why do you care how much effort it took the engineer to make it?
Because they're implicitly asking me to put in effort as a reviewer. Pretending that they put more effort in than they have is extremely rude, and intentionally or not, generating a large volume of code amounts to misleading your potential reviewers.
> If there was a huge amount of tedium that they used Claude Code for, then reviewed and cleaned up so that it’s indistinguishable from whatever you’d expect from a human; what’s it to you?
They never do though. These kinds of imaginary good AI-based workflows are a "real communism has never been tried" thing.
> If your project is gaining useful functionality, that seems like a win.
Lines of code impose a maintenance cost, and that goes triple when the code quality is low (as is always the case for actually existing AI-generated code). The cost is probably higher than the benefit.
Gigachad 16 hours ago [-]
I hate being paid to review AI slop.
goalieca 1 days ago [-]
> With AI blowing up the line counts on PRs,
Well, the process you’re describing is mature and intentionally slows things down. The LLM push has almost the opposite philosophy. Everyone talks about going faster and no one believes it is about higher quality.
digikata 1 days ago [-]
Go slow to go fast. Breaking up the PR this way also allows later humans and AI alike to understand the codebase. Slowing down the PR process with standards lets the project move faster overall.
If some bug slips by review, having the PR broken down semantically allows quicker analysis and recovery later, for one. And even if you have AI reviewing new Node.js releases to decide whether to take in the new version, the commit log will be more analyzable by the AI with semantic commits.
Treating the code as throwaway is valid in a few small contexts, but that is not the case for PRs going into maintained projects like Node.js.
tracker1 1 days ago [-]
TBF, most of the AI code I've reviewed isn't significantly different than code I've seen from people... in fact, I've seen significantly worse from real people.
The fact is, it's useful as a tool, but you still should review what's going on/in. That isn't always easy though, and I get that. I've been working on a TS/JS driver for MS-SQL so I can use some features not in other libraries, mostly bridging a Rust driver (first Tiberius, then mssql-client); the clean abstraction made the switch pretty quick... a fairly thorough test suite for Deno/Node/Bun kept the sanity in check. Rust C-style library with FFI access in TS/JS server environment.
My hardest part is actually having to set up a Windows Server to test the passwordless auth path (basically a connection string with integrated Windows auth). I've got about 80 hours of real time into this project so far. And I'll probably be doing 2 followups: one will be a generic ODBC adapter with a similar set of interfaces, and a final third adapter that will provide the same methods, but using the native SQLite underneath and smoothing over the differences.
I'm leveraging using/dispose (async) instead of explicit close/rollback patterns, similar to .Net as well as Dapper-like methods for "Typed" results, though no actual type validation... I'd considered trying to adapt Zod to check at least the first record or all records, and may still add the option.
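For readers who haven't seen the using/dispose pattern, a minimal sketch of the async form follows. `FakeConnection` is a made-up stand-in, not the API of the driver described above, and the example assumes TypeScript 5.2+ and a runtime that provides `Symbol.asyncDispose`.

```typescript
// Hypothetical stand-in for a driver connection; it only records what happens to it.
class FakeConnection {
  private readonly log: string[];

  constructor(log: string[]) {
    this.log = log;
  }

  async query(sql: string): Promise<void> {
    this.log.push(`query: ${sql}`);
  }

  // Called automatically when an `await using` binding goes out of scope.
  async [Symbol.asyncDispose](): Promise<void> {
    this.log.push("closed");
  }
}

async function demo(): Promise<string[]> {
  const log: string[] = [];
  {
    await using conn = new FakeConnection(log);
    await conn.query("SELECT 1");
  } // conn is disposed here, even if query() had thrown
  log.push("after block");
  return log;
}
```

The appeal is that cleanup is tied to scope, so there is no explicit close or rollback call to forget on an error path.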
All said though, I wouldn't have been able to do so much with so relatively little time without the use of AI. You don't have to sacrifice quality to gain efficiency with AI, but you do need to take the time to do it.
dotancohen 1 days ago [-]
> Everyone talks about going faster and no one believes it is about higher quality.
Go Fast And Break Things was considered a virtue in the JavaScript community long before LLMs became widely available.
syrusakbary 1 days ago [-]
Fully disagree with this take. Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.
Side note: the OpenJS executive director mentioned it's ok to use AI assistance on Node.js contributions:
> I checked with legal and the foundation is fine with the DCO on AI-assisted contributions. We’ll work on getting this documented.
I appreciate hearing your point of view on this. In my opinion the future of Open Source and AI assisted coding is a much bigger issue, and different people have different levels of confidence in both positive and negative outcomes of LLM impact on our industry.
It is great to have a legal perspective on compliance of LLM generated code with DCO terms, and I feel safer knowing that at least it doesn't expose Node.js to legal risk. However it doesn't address the well known unresolved ethical concerns over the sourcing of the code produced by LLM tooling.
jaredklewis 24 hours ago [-]
AI coding is great, but iteration speed is absolutely not a desirable trait for a runtime. Stability is everything.
Speed code all your SaaS apps, but slow iteration speeds are better for a runtime because once you add something, you can basically never remove it. You can't iterate. You get literally one shot, and if you add a awkward or trappy API, everyone is now stuck with it forever. And what if this "must have" feature turns out to be kind of a dud, because everyone converged on a much more elegant solution a few years later? Congratulations, we now have to maintain this legacy feature forever and everyone has to migrate their codebase to some new solution.
Much better to let dependencies and competing platforms like bun or deno do all the innovating. Once everyone has tried and refined all the different ways of solving this particular problem, and all the kinks have been worked out, and all the different ways to structure the API have been tried, you can take just the best of the best ideas and add it into the runtime. It was late, but because of that it will be stable and not a train wreck.
But I know what you're thinking. "You can't do that. Just look at what happens to platforms that iterate slowly, like C or C++ or Java. They're toast." Oh wait, never mind, they're among the most popular platforms out there.
syrusakbary 23 hours ago [-]
Since when have we accepted that we can’t go fast and offer stability at the same time?
Time is highly correlated with expertise. When you don’t have expertise, you may go fast at the expense of stability because you lack the experience to make good decisions to really save speed.
This doesn’t hold true for any projects where you rely on experts, good processes and tight timelines (aka: Apollo mission)
jaredklewis 22 hours ago [-]
IME there's a reason it's "move fast and break things" and not "move fast and don't break anything," because if the second was generally possible, we wouldn't even need this little aphorism.
And again, I'm not making a claim that the slow and steady tradeoff is best for all situations. Just that it is a great tradeoff for foundational platforms like a runtime. On a platform like postgresql or the JVM, the time from initial proposal to being released as a stable feature is generally years, and this pace I think has served those platforms well.
But I'm open to updating my priors. Do you think there are foundational platforms out there that iterate quickly and do a good job of it?
dijksterhuis 23 hours ago [-]
it’s a well known truism you can have it cheap, correct or fast.
but you can only have two of them at the same time.
and we’re talking about FOSS here, so cheap kinda has to be one of them.
oystersareyum 23 hours ago [-]
Allowing AI contributions results in lower quality contributions and allows wild things to come in and disrupt it, making it an unreliable dependency. We have seen big tech experience constant outages due to AI contributions as is...
UqWBcuFx6NV4r 22 hours ago [-]
Your comment is why advertisers say that you should repeat your core call to action at least a few times to make it stick.
You’ve read people saying the same thing hundreds of times and have somehow taken that as meaning that it’s credible.
Neither you nor I nor anyone else here knows what the “effects” are, because this is brand new tech, and it’s constantly changing. Yet you’re speaking with absolute confidence.
“Big tech” has downtime all the time, and LLMs did not change that fact. The only difference is that the peanut gallery that is already worked up about AI for philosophical / cultural reasons is suddenly ready to blame AI for every issue under the sun.
You think that you’re making a technical argument but you’re just repeating the same talking points I see teenagers regurgitating on TikTok. There’s nothing intelligent or credible about it.
habinero 16 hours ago [-]
My dude, you're making the classic mistake of assuming because you don't have any first-hand knowledge of problems, other people are equally ignorant.
Don't slap someone else down because you don't know something.
lmm 15 hours ago [-]
> Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.
If and when there is evidence that AI is actually increasing the speed of improvement (and not just churn), it would make sense to permit it. Unless and until such evidence emerges, the risks greatly outweigh the benefits, at least for a foundational codebase like this.
KronisLV 21 hours ago [-]
> Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.
That sort of statement might also be sarcasm in another context: I personally use AI a lot, but also recognize that there are a lot of projects out there that are suffering from low quality slop pull requests, devs that kinda sign out and don't care much about the actual code as long as it appears to be running, alongside most LLMs struggling a lot with longer term maintenance if not carefully managed. So I guess it depends a lot on how AI is used and how much ideological opposition to that there is. In a really testable codebase it could actually work out pretty well, though.
szmarczak 1 days ago [-]
> Not allowing AI assistance on PRs will likely decimate the project in the future, as it will not allow fast iteration speeds compared to other alternatives.
It's not an AI issue. Node.js itself is lots of legacy code and many projects depend on that code. When Deno and Bun were in early development, AI wasn't involved.
Yes, you can speed up the development a bit but it will never reach the quality of newer runtimes.
It's like comparing C to C++. Those languages are from different eras (relative to each other).
athorax 1 days ago [-]
How exactly does it violate the Developer's Certificate of Origin clause?
If submitter picks (a) they assert that they wrote the code themselves and have right to submit it under project's license. If (b) the code was taken from another place with clear license terms compatible with the project's license. If (c) contribution was written by someone else who asserted (a) or (b) and is submitted without changes.
Since LLM generated output is based on public code, but lacks attribution and the license of the original it is not possible to pick (b). (a) and (c) cannot be picked based on the submitter disclaimer in the PR body.
athorax 1 days ago [-]
Not sure if you are intentionally misrepresenting (a), but here is the full text
(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or
duskdozer 24 hours ago [-]
That seems exclusive of LLMs, as the user didn't create the contribution, the LLM did.
Dylan16807 16 hours ago [-]
It's exclusive of code where you wrote 0% of it.
"in part" is a trivial bar to clear.
duskdozer 13 hours ago [-]
I guess as a very strict reading where you take the output and insert a newline somewhere...but that sounds against the intent
paulryanrogers 21 hours ago [-]
Orthogonal to? Irrespective of the use of?
Dylan16807 1 days ago [-]
If there's a "the original" the LLM is copying then there's a problem.
If there isn't, then (b) works fine, the code is taken from the LLM with no preexisting license. And it would be very strange if a mix of (a) and (b) is a problem; almost any (b) code will need some (a) code to adapt it.
lmm 15 hours ago [-]
> the code is taken from the LLM with no preexisting license
That's not good enough to comply with (b). The code must be specifically covered by an open-source license, it's not enough for it to just not have a license.
Dylan16807 15 hours ago [-]
There's a difference between "no license, all rights reserved" and "no license, public domain". Up until recently, you could assume that not having a license meant the former. But treating the latter as the same would just be silly.
As far as I'm concerned, public domain counts as "an appropriate open source license".
lmm 14 hours ago [-]
> As far as I'm concerned, public domain counts as "an appropriate open source license".
For material whose author is known and has explicitly placed it in the public domain, sure. For code that fell off the back of a truck, not so much.
Dylan16807 15 minutes ago [-]
I'm of course assuming the legal status quo holds, where code properly generated by LLM is also explicitly public domain. No shadiness involved.
(There's always a risk of an LLM copying something verbatim by accident, but if the designers are doing their job that chance gets low enough to be acceptable. Human code has that risk too after all. (And for situations that aren't an accident, with the human intentionally using snippets to draw out training text, then if they submit that code in a patch it's just a human violating copyright with extra steps.))
benatkin 1 days ago [-]
To many, it qualifies under either A or B, and therefore C as well. Under A, you can think of the LLM as augmenting your own intelligence. Under B, the license terms of LLM output are essentially that you can do whatever you want with it. The alternative is avoiding use of AI because of copyright or plagiarism concerns.
charcircuit 1 days ago [-]
It would be considered (a) since the author would own the copyright on the code.
lacoolj 1 days ago [-]
Owning copyright of something and writing it are very different things
habinero 16 hours ago [-]
Not in the US. Copyright exists from the moment the work is created.
Whether AI output can fall under copyright at all is still up for debate - with some early rulings indicating that the fact that you prompted the AI does not automatically grant you authorship.
Even if it does, it hasn't been settled yet what the impact of your AI having been trained on copyrighted material is on its output. You can make a not-completely-unreasonable argument that AI inference output is a derivative work of AI training input.
Fact is, the matter isn't settled yet, which means any open-source project should assume the worst possible outcome - which in practice means a massive AI-generated PR like this should be treated like a nuke which could go off at any moment.
phendrenad2 1 days ago [-]
Why write open-source software at all, when the government could outlaw open-source entirely? What if an asteroid destroys Earth and there are no humans left to enjoy your work? At some point, you have to agree that a risk isn't worth worrying about. And your "worst possible outcome" is just the arbitrary outcome that you think has some subjective risk threshold. And it's certainly not one I agree with. Furthermore, calling it a "nuke" is a bad analogy because that implies that it can't be put back in the bottle once opened. In reality, we're dealing with legal definitions, which can be redefined as easily as defined.
habinero 16 hours ago [-]
> And it's certainly not one I agree with
Well, it's a good thing you're not on the hook for defending against it, then.
Like I said in another comment, you don't have a license just because they're cool and look neat. You have them specifically to guard against people like patent trolls, who are trying to wreck your shit and take your lunch money. It's not an abstract risk.
charcircuit 1 days ago [-]
The two main points are that:
1. Copyright cannot be assigned to an AI agent.
2. Copyrighted works require human creativity to be applied in order to be copyrighted.
For point 2 this would apply to times where AI one-shots a generic prompt. But for these large PRs where multiple prompts are used and a human has decided what the design should be and how the API should look you get the human creativity required for copyright.
In regards to being a derivative work I think it would be hard to argue that an LLM is copying or modifying an existing original work. Even if it came up with an exact duplicate of a piece of code it would be hard to prove that it was a copy and not an independent recreation from scratch.
>the worst possible outcome
The worst possible outcome is they get sued and Anthropic defends them from the copyright infringement claim due to Anthropic's indemnity clause when using Claude Code.
monocularvision 1 days ago [-]
That indemnity clause is only for Team, Enterprise and API users. Do you know what was used here?
Also the commercial version is limited to “…Customer and its personnel, successors, and assigns…”. I am very much not a lawyer and couldn’t find definitions of these in the agreement but I am not sure how transferable this indemnity would be to an open source project.
charcircuit 1 days ago [-]
I reviewed it and it looks like personal Claude Code subscriptions are not covered, so it's riskier than I claimed.
UqWBcuFx6NV4r 22 hours ago [-]
This is not how law works. Stop pretending that you’re a lawyer. You do not “always assume the worst”. Stop giving legal advice. You’re very clearly a developer in over his head. Law is not an engineering problem. Legislation is not a technical specification. Christ.
habinero 16 hours ago [-]
No, they're absolutely correct, and they're not saying either of those things. They're pointing out an enormous hidden risk. Yanno, like an engineer is supposed to do.
You don't have a license because it's what all the cool kids are doing, you have one in case shit goes sideways and someone decides to try and ruin your day. You do, in fact, have to assume the worst.
The "nuke" here is some litigious company -- let's call them Patent Troll Rebranded (PTR) -- discovers that the LLM reproduced large amounts of their copyrighted code. Or it claims to have discovered it. They have large amounts of money and lawyers to fight it out in court and you are a relatively shoestring language foundation.
Either you have to unwind years of development to remove the offending code or you're spending six figures or more to defend yourself in court, all because you didn't bother to anticipate things that are anticipatable.
epolanski 1 days ago [-]
Do as I say, not as I do.
On a more serious note, I think that this will be thoroughly reviewed before it gets merged and Node has an entire security team that overviews these.
indutny 1 days ago [-]
As someone who was a part of the aforementioned security team I'm not sure I'd be interested in reviewing such a volume of machine-generated code, expecting a trap at every corner. The implicit assumption that I observed at many OSS projects I've been involved with is that first-time contributions are rarely accepted if they are too large in volume, and the "core contributor" designation exists to signal "I put effort into this code, stand by it, and respect everyone's time in reviewing it". The PR in the post violates this social contract.
epolanski 1 days ago [-]
If you do it for free, you can decide to do what you want. If it's your job, it's a bit different and you may have to do so, especially considering Collina is one of the largest contributors to the project and a member of the technical committee.
exe34 1 days ago [-]
> if it's your job, it's a bit different and you may have to do so
Oh I'd use an llm to generate large amounts of feedback and request changes!
epolanski 1 days ago [-]
Imagine if every profession reasoned like that when doing something they don't enjoy.
kruffalon 1 days ago [-]
What a wonderful world we would have, or possibly at least better than the current shit show :)
int_19h 20 hours ago [-]
We'd have a lot less enshittification all around, I suspect.
epolanski 9 hours ago [-]
Sure thing, your nurse ain't gonna clean your mom, in the restaurant the chef ain't gonna prepare a dish he doesn't like, your accountant ain't gonna file your taxes if you've given him data he doesn't like, etc.
You're paid to do a job, you're either professional or you aren't.
lelandfe 7 hours ago [-]
I’d probably leave the hospital if I heard the doctor filed an LLM generated diagnosis for the nurse to validate
exe34 3 hours ago [-]
So you don't do your job and submit a PR you didn't even read, and I'm supposed to waste my time, which I then have to explain at my next performance review? I didn't sign up to read slop, thanks! If my boss wants me to spend 10x the time on this kind of shit, he has to pick something else that I no longer have to do. My time is not elastic. It can't expand to fit your slop.
exe34 24 hours ago [-]
Imagine fighting fire with fire. You don't have to take shit lying down.
madeofpalk 22 hours ago [-]
> it must be noted that this 19k LoC PR was mostly generated by Claude Code and manually reviewed by the submitter
Who reviewed and approved the PR?
petetnt 20 hours ago [-]
Personally I’d like to thank you for raising the point. It seems that TSC members are willing to ram the PR through regardless, as per jasnell’s LLM analysis, which honestly seems more like a hostile Gish-galloping attempt than an actual honest analysis.
wccrawford 1 days ago [-]
I'm not convinced that allowing Node to import "code generated at runtime" is actually a good thing. I think it should have to go through the hoops to get loaded, for security reasons.
I like the idea of it mocking the file system for tests, but I feel like that should probably be part of the test suite, not Node.
The example towards the end that stores data in a sqlite provider and then saves it as a JSON file is mind-boggling to me. Especially for a system that's supposed to be about not saving to the disk. Perhaps it's just a bad example, but I'm really trying to figure out how this isn't just adding complexity.
I had to laugh, because the post you're replying to STRONGLY reminds me of this story, https://news.ycombinator.com/item?id=31778490 , in which some people on the GNOME project objected to thumbnails in the file-open dialog box because it might be a "Security issue" (even though thumbnails were available in the normal file browser, something those commenters probably should have known about, but didn't, but they just had to chime in anyway).
rendaw 17 hours ago [-]
I can totally believe that, but at the same time I checked that link and didn't see anything about security mentioned.
apatheticonion 21 hours ago [-]
As a user of embedded Node.js - I need the ability to package JavaScript into the binary and feed it to Node.js without writing it to disk.
My current flow is to literally embed the JavaScript in the binary, then on start, write the JavaScript code to `/tmp/{random}` and point Node.js to execute the code at that destination.
A virtualized filesystem also allows for a safer "plugin" story for Node.js - where JavaScript plugins can be prevented from accessing the real filesystem.
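For anyone who hasn't seen it, the write-to-tmp flow described above looks roughly like this. It's only a sketch: `embeddedSource` and `loadViaTempFile` are hypothetical names, and a real embedder would pull the source out of the host binary rather than a string literal.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical stand-in for JavaScript source embedded in a host binary.
const embeddedSource = 'module.exports = { greet: (name) => `hello ${name}` };';

// Write the source to a fresh temp directory, load it, then clean up.
function loadViaTempFile(source) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'embedded-'));
  const file = path.join(dir, 'entry.js');
  try {
    fs.writeFileSync(file, source);
    return require(file);
  } finally {
    // The module stays usable after deletion because it is already loaded.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

const plugin = loadViaTempFile(embeddedSource);
console.log(plugin.greet('node'));
```

The disk round trip and the cleanup step are exactly what an in-memory filesystem would make unnecessary.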
TheRealPomax 1 days ago [-]
But then you go "hang on, doesn't ESM exist?" and you realize that argument 4 isn't even true. You can literally do what this argument says you can't, by creating a blob instead of "writing a temp file" and then importing that using the same dynamic import we've had available since <checks his watch> 2020.
dfabulich 1 days ago [-]
A virtual filesystem makes it possible for the ESM you import to statically import other files in the virtual filesystem, which isn't possible by just dynamically importing a blob. Anything your blob module imports has to be updated to dynamically import its dependencies via blobs.
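A minimal sketch of the in-memory import being discussed, using a `data:` URL instead of a blob (my substitution, since that's the form I'm sure Node accepts). `importFromString` is a hypothetical helper name.

```javascript
// Build a data: URL from in-memory source and load it with dynamic import().
function importFromString(source) {
  const url = 'data:text/javascript,' + encodeURIComponent(source);
  return import(url);
}

importFromString('export const answer = 6 * 7;').then((mod) => {
  console.log(mod.answer);
});
```

As far as I know, a relative specifier such as `./dep.js` inside that source has no location to resolve against, so each dependency would need the same treatment.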
apatheticonion 21 hours ago [-]
Correct. Especially painful if you use Worker threads or .node files
notnullorvoid 1 days ago [-]
There's also a module expression proposal, that would remove the need to use blob imports.
Using Claude for code you use yourself or at your own company internally is one thing, but when you start injecting it into widely-shared projects like this (or, the linux kernel, or Debian, etc) there will always be a lingering feeling of the project being tainted.
Just my opinion, probably not a popular one. But I will be avoiding an upgrade to Node.js after 24.14 for a while if this is becoming an acceptable precedent.
atomicnumber3 20 hours ago [-]
I still think everyone is trying to run away from the copyright problems with AI, and suspect it's going to come back to bite them. Eventually. (No I'm not willing to bet on exactly when because I'm sure it'll be a lot longer than I'd like).
austin-cheney 1 days ago [-]
Most of the 4 justifications mentioned sound like mitigations of otherwise bad design decisions. JavaScript in the browser went down this path for the longest time where new standards were introduced only to solve for stupid people instead of actually introducing new capabilities that were otherwise unachievable.
I do see some original benefits to a VFS though, bad application decisions aside, but they are exceedingly minor.
As an aside I think JavaScript would benefit from an in-memory database. This would be more of a language enhancement than a Node.js enhancement. Imagine the extended application capabilities of an object/array store native to the language that takes queries using JS logic to return one or more objects/records. No SQL language and no third party databases for stuff that you don't want to keep in offline storage on a disk.
dotancohen 1 days ago [-]
> I think JavaScript would benefit from an in-memory database.
That database would probably look a lot like a JSON object. What are you suggesting, that a global JSON object does not solve?
austin-cheney 1 days ago [-]
Whether it is an object, array, something else, or a combination thereof is a design decision. It is not so much about the design of the structure, which should be determined by execution performance considerations, but how information is added, removed and retrieved. Gathering one or more records from a JSON object, or array index, by value of some child property somewhere in a descendant structure of the instance index always feels like a one-off based upon the shape of the data. That could just be a query which is more elegant to read and yet still achieves superior execution performance compared to a bunch of nested loops or string of function array methods.
The more structures you have in a given application and the larger those structures become in their schemas the more valuable a uniform storage and retrieval solution becomes.
Thinking small. In SQL databases a well put together database instance will typically have tables with a single incrementing primary key column and some secondary key columns that point to unique records on other tables. That is the relational part of RDBMS.
It's not about what it looks like. Arrays have fancy functional methods, but not object structures. It's more about whether it executes faster and comprises fewer steps to read/write. A real case in my application is to get all ports associated with unencrypted sockets associated with servers of a given type and sort the output in a manner chosen by the user. The data in this case is in different unrelated objects whose properties point to each other in various ways by identity, because each server and socket uses hashes for unique identifiers.
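To make that case concrete, here is roughly what the hand-written version looks like today. The data shapes (`servers`, `sockets`, `serverId`) are invented for illustration and are not taken from the application described above.

```javascript
// Hypothetical shapes: servers and sockets live in separate maps keyed by hash id.
const servers = new Map([
  ['s1', { type: 'http' }],
  ['s2', { type: 'http' }],
  ['s3', { type: 'dns' }],
]);
const sockets = new Map([
  ['k1', { serverId: 's1', port: 8080, encrypted: false }],
  ['k2', { serverId: 's1', port: 80, encrypted: false }],
  ['k3', { serverId: 's2', port: 443, encrypted: true }],
  ['k4', { serverId: 's3', port: 53, encrypted: false }],
]);

// Ports of unencrypted sockets that belong to servers of the given type,
// sorted with a caller-supplied comparator (ascending by default).
function unencryptedPorts(serverType, compare = (a, b) => a - b) {
  return [...sockets.values()]
    .filter((socket) => !socket.encrypted)
    .filter((socket) => servers.get(socket.serverId)?.type === serverType)
    .map((socket) => socket.port)
    .sort(compare);
}

console.log(unencryptedPorts('http')); // [ 80, 8080 ]
```

Every new question about the data needs another chain like this, shaped around how the objects happen to reference each other, which is the one-off feeling a uniform query layer would presumably remove.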
curtisblaine 1 days ago [-]
sorted maps with log(n) access.
iainmerrick 1 days ago [-]
Why would you want a language enhancement for that, rather than just writing it in JS code? (or perhaps WASM)
Xenoamorphous 1 days ago [-]
Like indexedDB but in Node?
duped 1 days ago [-]
> As an aside I think JavaScript would benefit from an in-memory database.
isn't that just global state, or do you mean you want that to be persistent?
socalgal2 1 days ago [-]
What's special about node.js here? Does golang, C#, python, ruby, java, etc... have a virtual file system?
I get it, I've implemented things for tests, I'm just wondering if this shouldn't be solved at an OS level.
--- update
Let's put this another way, my code does effectively, child_process.spawn('something-that-reads-and-write-a-file')
now I'm back to the same issue. To test I need a virtual file system. Node providing one won't help.
I do think it's more painful to distribute files when you're distributed as a single binary vs scripts, since the latter has to figure out bundling of files anyway.
But still - it does exist
nine_k 23 hours ago [-]
Zip files are created in such a way that they can be a part of an executable file. (This is how self-extracting archives used to work.) Support for reading zip files is lightweight, and is present almost everywhere.
A ZIP fork embedded into the executable should be an obvious read-only VFS implementation. Bring your assets with you, even maybe build them with the standard zip utility.
It should take relatively few LOCs, provided that libzip is already linked into the executable anyway.
benatkin 24 hours ago [-]
Embed is read-only at runtime. This proposed vfs module for Node.js is a full virtual file system.
sauercrowd 4 hours ago [-]
true
benatkin 2 hours ago [-]
It's cool that it fits into golang's readable file system interface so it can be used polymorphically. I don't know if golang has very complete interfaces for a read and write file system that could be used for a full vfs. If it does, that's nice, and a starting point for a similar vfs! I'm also not sure whether it should go into the standard library or not.
paradox460 22 hours ago [-]
In the past, I've implemented this in Ruby and elixir on Linux systems using a ramdisk.
Retr0id 16 hours ago [-]
Python doesn't have a virtual filesystem in general but it is possible to shim the import mechanism.
antonvs 21 hours ago [-]
Java, Haskell, Rust, and several other languages do have solutions for this. Here’s one for Rust: https://docs.rs/vfs/latest/vfs/
ThomIves 52 minutes ago [-]
I loved this notion from just the title, and then eagerly read the post. BRAVO! I am investigating if I can, and how to, exploit this ASAP.
il-b 21 hours ago [-]
> no human wants to hand-write: implementing every fs method variant (sync, callback, promises), wiring up test coverage
That’s so dehumanizing, I would happily write such code.
huksley 1 hours ago [-]
There is already memfs package which implements virtual fs, with other packages as well. What we need is to support import/require working with that vfs.
PaulHoule 1 days ago [-]
Would be nice if node packages could be packed up in ZIP files so to avoid the security/metadata tax for small file access on Windows.
MarleTangible 1 days ago [-]
The number of files in the node modules folder is crazy, any amount of organization that can tame that chaos is welcomed.
koolba 1 days ago [-]
And if you thought malware hiding in a mess of files was bad, just wait till you see it in two layers of container files.
PaulHoule 1 days ago [-]
Or worse yet, the performance load of anti-malware software that has to look inside ZIP files.
Look, most of us realized around 2004 or so that if you had a choice between Norton and the virus you would pick the virus. In the Windows world we standardized around Defender because there is some bound on how much Defender degrades the performance of your machine which was not the case with competitive antivirus software.
I've done a few projects which involved getting container file formats like ZIP and PDF (e.g. you know it's a graph of resources in which some of those resources are containers that contain more resources, right?) and now that I think of it you ought to be able to virus scan ZIP files quickly and intelligently but the whole problem with the antivirus industry is that nobody ever considers the cost.
ronsor 1 days ago [-]
Now we'll have to encrypt the files to prevent the performance hit of antivirus peeking inside.
Oh, wait...
Dangeranger 1 days ago [-]
There are alternative package managers like Yarn that use zip files as a way to store each Node package.[0]
yarn with zero-installs removes an awful lot of pain present in npm and pnpm. It's practically the whole point of yarn berry.
Firstly - with yarn pnp zero-installs, you don't have to run an `install` every time you switch branch, just in case a dep changed. So much dev time is wasted due to this.
Secondly - "it worked on my machine" is eliminated. CI and deploy use the exact same files - this is particularly important for deeply nested range satisfied dependencies.
Thirdly - packages committed to the repo allows for meaningful retrospectives and automated security reviews. When working in ops, packages changing is hell.
All of this is facilitated by the zip files that the comment you replied to was discussing, that you tangented away from.
The graph you have linked is fundamentally odd. Firstly - there is no good explanation of what it is actually showing. I've had Claude spin on it and it reckons it's npm download counts. This leads to it being a completely flawed graph! Yarn berry is typically installed either via corepack or bootstrapped via package.json and the system yarn binary. Yarn even saves itself into your repo. pnpm is never (I believe) bundled with the system node, whereas yarn and npm typically are.
Your graph doesn't show what you claim it does.
PaulHoule 1 days ago [-]
... and of course JAR files in Java are just ZIP files with a little extra metadata and the JVM can unpack them in realtime just fine.
buttsack 1 days ago [-]
When npm decided to have per-project node_modules (rather than shared like ruby and others) and human readable configs and library files I think the goal was to be developer friendly and highly configurable, which it is. And package.json became a lot more than that as a result, it’s been a great system IMO.
Combined with a hackable IDE like Atom (Pulsar) made with the same tech it’s a pretty great dev exp for web devs
PaulHoule 22 hours ago [-]
It’s one thing or another.
Python had shared packages for a long time and those are fine up to a point but circa 2017 I was working at a place where we had data scientists making models using different versions of Tensorflow and stuff and venv’s are essential to that. We were building unusually complex systems and having worse problems than other people but if you do enough development you will have trouble with shared packages.
The node model of looking for packages in the local directory has some appeal and avoids the need for “activation” but I like writing Python-based systems that define one or more command line programs that I can go use in any directory I want. For instance, if I want to publish one of my Vite projects I have a ‘transporter’ written in Python that looks for a Vite project in the current directory and uploads it to S3, updates metadata and invalidates cloudfront and all that. I have to activate it which is a minor hassle but then I can go to different Vite projects and publish them.
skydhash 21 hours ago [-]
I still prefer shared packages because it incentivizes developers to have a stable API. And you always have an option to manipulate the path variables and have projects (java) and virtual env (python). Cargo and NPM always seem to be straight from Alice’s dreams (Lewis Carroll).
fmorel 1 days ago [-]
I remember when Firefox started putting everything into jars for similar reasons.
Would accessing deps directly from a zip really be faster? I'd be a little surprised but not terribly, given that it's readonly on an fs designed for RW. If not, maybe just tar?
pie_flavor 1 days ago [-]
You just cat the exe with the zip file, then it is all loaded into memory at the same time on process init. This is how e.g. LÖVE does game code packaging. (It can't be tar, because this trick only works because the PKZIP descriptor is at the end of the file.)
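A rough sketch of why appending works: a reader locates the end-of-central-directory record by searching backward from the end of the file, so whatever comes before the archive is ignored. The "executable" bytes here are a made-up stand-in, and a real reader would validate more than this (the signature bytes can also appear inside an archive comment, for example).

```javascript
// Signature of the ZIP end-of-central-directory (EOCD) record: "PK\x05\x06".
const EOCD_SIGNATURE = Buffer.from([0x50, 0x4b, 0x05, 0x06]);

// Search from the end of the buffer, since the record sits at the tail of the archive.
function findEndOfCentralDirectory(buffer) {
  const offset = buffer.lastIndexOf(EOCD_SIGNATURE);
  // The fixed part of the record is 22 bytes; reject truncated matches.
  if (offset === -1 || offset + 22 > buffer.length) return null;
  return {
    offset,
    totalEntries: buffer.readUInt16LE(offset + 10),
    centralDirectorySize: buffer.readUInt32LE(offset + 12),
    centralDirectoryOffset: buffer.readUInt32LE(offset + 16),
  };
}

// A stand-in "executable" followed by the smallest valid ZIP: an empty archive
// is just a 22-byte EOCD record with every field after the signature set to zero.
const fakeExecutable = Buffer.from('not really machine code');
const emptyZip = Buffer.concat([EOCD_SIGNATURE, Buffer.alloc(18)]);
const combined = Buffer.concat([fakeExecutable, emptyZip]);

console.log(findEndOfCentralDirectory(combined));
```

tar has no comparable trailer to search for, which fits the point above about why the trick is specific to zip.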
MBCook 1 days ago [-]
It’s insane to me that node works how it does. Zip files make so much more sense, I really liked that about Yarn.
pverheggen 1 days ago [-]
You can always use virtualized Linux to avoid the NTFS penalty (WSL2, VS Code dev containers, etc.)
hrmtst93837 1 days ago [-]
Moving your whole workflow into WSL or nested containers just to dodge NTFS is a band-aid. Then you get flaky file watchers, odd perms, and a dev setup that feels like a workaround piled on top of another workaround. A fast Node VFS would remove a lot of this nonsense.
pverheggen 1 days ago [-]
Oh it's a workaround for sure, didn't mean to suggest otherwise.
sheept 1 days ago [-]
Would it work to run a bundler over your code, so all (static) imports are inlined and tree shaken?
giancarlostoro 1 days ago [-]
> I pointed the AI at the tedious parts, the stuff that makes a 14k-line PR possible but no human wants to hand-write: implementing every fs method variant (sync, callback, promises), wiring up test coverage, and generating docs.
This is the biggest takeaway for me for AI. It's not even that nobody wants to do these things, it's that by the time you finish your tasks, you have no time to do these things, because your manager / scrum master / powers that be want you to work on the next task.
Culonavirus 23 hours ago [-]
That's perfectly understandable. But it has no business being in a large open source project, let alone a world-class one like Node or (god forbid) the Linux kernel. Get that shit the fuck out.
potsandpans 19 hours ago [-]
> Get that shit the fuck out.
No.
Lerc 24 hours ago [-]
I think the insight there is that the increased productivity of AI could be used to add features where the end results are weighing the ability of the AI against the ability of an individual implementing the same thing.
The alternative is that you work on the same number of features and utilize the ability to make those features as robust as you know they could be, but you have other pressing matters to attend to. That's weighing the ability of AI against the ability of neglect.
torginus 1 days ago [-]
Why do people keep reinventing OS features?
There's Docker, OverlayFS, FUSE, ZFS or Btrfs snapshots?
Do you not trust your OS to do this correctly, or do you think you can do better?
A lot of this stuff existed 5, 10, 15 years ago...
Somehow there's been a trend for every effing program to grow and absorb the features and responsibilities of every other program.
Actually, I have a brilliant idea, what if we used nodejs, and added html display capabilities, and browser features? After all Cursor has already proven you can vibecode a browser, why not just do it?
I'm just tired at this point
williamstein 1 days ago [-]
This exact thing solves a huge problem with SEA binaries as he points out in his post. You can include complicated assets easily and skip an ugly unpack step entirely. This is very useful.
ryandrake 1 days ago [-]
One of the worst is media players that all insist on grafting their own "library" on top of my already-working OS filesystem. So I can't just run the media player and play files. No, that would be too simple. I have to first "import" my media into a "library" abstraction and then store that library somewhere else on my filesystem. Terrible!
SAI_Peregrinus 1 days ago [-]
There's a legitimate problem they're trying to solve there: there are several ways to sort media that don't match up well with a hierarchical filesystem¹. They solve it badly. Good players maintain a database for efficient queries of media metadata, and periodically rescan the folders to update it. Shitty media players try to manage the files themselves, and still end up needing to maintain a database. The worst of these use the database to manage the contents of their storage files (or store the files themselves in the database), if something isn't in the database they delete the files. Adobe Lightroom Classic does this, if your database gets corrupted it deletes all your RAW files!
¹E.g. if you've got music, and it's sorted `artist/album/track<n>.extension`, and two artists collaborate on an album, which one gets the album in their folder? What if you want to sort all songs in the display by publication date? Even if they use the files on your filesystem without moving them, some sort of metadata database will be needed for efficient display & search.
mg 1 days ago [-]
> You can’t import or require() a module that only exists in memory.
You can convert it into a data url and import that, can't you?
afavour 1 days ago [-]
What happens to relative imports?
philo23 17 hours ago [-]
Little bit saddened the sqlite provider doesn't use the SQLite archive format under the hood. Seems like it'd be a good fit for what they're trying to achieve + give you an easy way to create/extract the files out of the virtual file system.
The sqlar schema is missing some of the info that's being stored atm, but there's nothing stopping you from adding your own fields/tables on top of the format, if anything the docs encourage it. It is just a sqlite database at the end of the day.
This is because yarn patches fs in order to introduce virtual file path resolution of modules in the yarn cache (which are zips), which is quite brittle and was broken by a seemingly unrelated change in 25.7.
The discussion in issue 62012 is notable - it was suggested yarn just wait for vfs to land. This is interesting to me in two ways: firstly, the node team seems quite happy for non-trivial amounts of the ecosystem to just be broken, and suggests relying on what I'm assuming will be an experimental API when it does land; secondly, it implies a lot of confidence that this feature will land before LTS.
pamcake 6 hours ago [-]
> firstly, the node team seems quite happy for non-trivial amounts of the ecosystem to just be broken
yarn/node relations specifically are... complicated. On display on corepack (yarn project which got bundled into official nodejs distribution) issue tracker.
> secondly, it implies a lot of confidence that this feature will land before LTS.
This confidence is somewhat concerning. Will it get reviewed at all or has the "trust the LLM" mandate arrived at Node too now.
Not spamming, not affiliated, just trying to help others avoid so much needless suffering.
Normal_gaussian 1 days ago [-]
This is quite spammy; you could mitigate it by explaining what you think the "needless suffering" is. Having been using npm, pnpm, and yarn for many years the only benefit I find with pnpm is a little bit of speed when using the cli, but not enough that I notice; I've outlined the major yarn benefit to me 'in a peer comment' (which I didn't realise was you when I answered) https://news.ycombinator.com/item?id=47415660
I expect yarn to have a real competitor sooner rather than later that will replace it; and I do wonder if it is this vfs module that will enable it.
nwienert 13 hours ago [-]
For many years I was using yarn with 0 issue on massive monorepos, and every year I'd hear people hyping pnpm, I'd try and switch, run into multiple bugs often open issues in pnpm itself, yes even without their link strategy, then give up and wait. After about 3 years of this I gave up and never tried again.
zadikian 1 days ago [-]
I just use npm because I like to stay as vanilla as possible. Glad that alternatives exist though.
Normal_gaussian 1 days ago [-]
This can't be overstated. The main benefit with yarn berry (v4+) is being able to commit the dependencies to the repo - I have yarn based tools that I wrote years ago that just work whereas I frequently find npm and python tools are broken due to version changes. However this benefit comes at a setup cost and a lot more on disk complexity - one off tools are just npm and done.
keepamovin 18 hours ago [-]
The way I bundle modules into SEA that need to be imported from disk (that can't be bundled due to node or wasm modules) is to just include them in the assets and do a "write to tmp, import, delete" flow. It works.
Not saying vfs is bad, just it's not impossible in a few lines of code to set up that. My idea for a simple version of a vfs in node is to use a RAM disk/RAMfs - would that work?
gnarbarian 1 days ago [-]
one of the reasons I prefer deno is the availability of indexeddb (and all the other great stuff that comes with it out of the box)
chmod775 20 hours ago [-]
Yes, but no. Node itself merely needs a standardized, pluggable layer of indirection in its file APIs. If someone wants to implement a VFS using that, that's cool.
Basically an "fs-core" that everything ultimately goes through, and which can be switched out/layered with another implementation. Think express-style routing but for the filesystem.
That'll keep things simple in node's codebase while handing more power to users.
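Something like the following is what I have in mind, as a toy. `LayeredFs` and `memoryProvider` are hypothetical names, only `readFileSync` is covered, and none of this is an existing or proposed Node API.

```javascript
const fs = require('node:fs');

// Hypothetical "fs-core": providers are mounted at path prefixes and the
// longest matching prefix handles the call, falling back to the real fs.
class LayeredFs {
  constructor(fallback = fs) {
    this.fallback = fallback;
    this.mounts = [];
  }

  mount(prefix, provider) {
    this.mounts.push({ prefix, provider });
    // Longest prefix first so more specific mounts win.
    this.mounts.sort((a, b) => b.prefix.length - a.prefix.length);
    return this;
  }

  readFileSync(file, options) {
    const hit = this.mounts.find(({ prefix }) => file.startsWith(prefix));
    if (!hit) return this.fallback.readFileSync(file, options);
    return hit.provider.readFileSync(file.slice(hit.prefix.length), options);
  }
}

// A toy in-memory provider backed by a Map of relative path -> contents.
function memoryProvider(files) {
  const store = new Map(Object.entries(files));
  return {
    readFileSync(relativePath) {
      if (!store.has(relativePath)) {
        throw new Error(`ENOENT: no such file, open '${relativePath}'`);
      }
      return store.get(relativePath);
    },
  };
}

const layered = new LayeredFs().mount(
  '/virtual/',
  memoryProvider({ 'hello.txt': 'hi from memory' })
);
console.log(layered.readFileSync('/virtual/hello.txt'));
```

The routing part stays tiny, and a zip-backed or sqlite-backed provider could be swapped in by users without core knowing anything about it.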
mohsen1 1 days ago [-]
Yarn, pnpm, webpack all have solutions for this. Great to see this becoming a standard. I have a project that is severely handicapped due to FS. Running 13k tests takes 40 minutes, where a virtual file system that Node would just work with would cut the run time to 3 minutes. I experimented with some hacks and decided to stay with the slow but native FS solution.
What I really want is a way of swapping FS with VFS in a Node.js program harness. Something like
node --use-vfs --vfs-cache=BIG_JSON_FILE
So basically Node never touches the disk and loads everything from memory
Normal_gaussian 1 days ago [-]
The way to do this today is to do it outside of node. Using an overlay fs with the overlay being a ramfs. You can even chroot into it if you can't scope the paths you need to be just downstream from some directory. Or, just use docker.
mohsen1 1 days ago [-]
making that work cross platform is pure pain
Normal_gaussian 1 days ago [-]
yes and no. Waiting 40mins for every test run is pure pain; platform-specific ramfs-type mounting is quite scriptable. Yes some devs might need to install a dependency, but it's not a complex script.
skydhash 1 days ago [-]
What are the other OS? There's a bunch of solutions described on Wikipedia
This brings back memories of the days when I messed around with Resource Hacker for Win32 EXEs.
I miss those days when you could tweak all kinds of software GUI by yourself. Change icons, menus, shortcut keys, etc.
notnullorvoid 1 days ago [-]
I could see something like this being useful if it could be passed to workers to replace any fs access inside the worker.
butz 1 days ago [-]
How about trying to reduce dependencies? 11ty is going in the correct direction, dropping a significant chunk of various dependencies or replacing them with packages with no dependencies or using platform features that become readily available.
pier25 22 hours ago [-]
Not a priority for the Node team, unfortunately.
adzm 1 days ago [-]
How does electron do this with its packaged files? I suppose it does not work with module resolution?
ozlikethewizard 1 days ago [-]
I'm not convinced this needs to be in core Node, but being able to have serverless functions access a file system without providing storage would definitely have some use cases. Had some fun with video processing recently that this would be perfect for.
themafia 1 days ago [-]
> You can’t import or require() a module that only exists in memory.
Sure you can. Function() exists and require.cache exists. This is _intentionally_ exploitable.
cmrx64 21 hours ago [-]
this is a pretty bad vfs. there are pure “cap manifest” approaches that don’t pull in decades of cruft semantics. don’t build systems that aren’t objectstore native in 2025 (since this work was initiated in december).
gwbas1c 1 days ago [-]
Can you dynamically load code via eval?
(I know, I know, it's ugly and has its own set of problems)
sidewndr46 1 days ago [-]
Don't all projects eventually grow to encompass service discovery?
AndyKelley 15 hours ago [-]
I used Node.js extensively for about a decade, both professionally and as a hobby. I don't buy the problem statement.
These arguments don't even make sense, they look LLM generated. I can't even formulate a disagreement against this nonsense.
minraws 1 days ago [-]
Why is this not a library what is this insanity??
verdverm 1 days ago [-]
Separate from the valid critiques in other comments, Go's io.FS interface is really nice for making these sorts of things. Is there something like this in Node already? (with base implementations like host and in memory)
I would put virtual or filesystem abstractions in a different category than sandboxing, which puts restrictions over the virtual or native implementations.
dabbz 21 hours ago [-]
I mean that's a fair distinction. There's definitely some overlap depending on needs though.
westurner 1 days ago [-]
Is node::vfs the new solution for JupyterLite filesystems?
>Let me be honest: a PR that size would normally take months of full-time work. This one happened because I built it with Claude Code.
The node.js codebase and standard library have a very high standard of quality, hope that doesn't get washed out by sloppy AI-generated code.
OTOH, Matteo is an excellent engineer and the community owes a lot to him. So I guess the code is solid :).
bronlund 1 days ago [-]
Yeah. That’s what we need. More Node.
syrusakbary 1 days ago [-]
[flagged]
szmarczak 1 days ago [-]
HN comments aren't a place to advertise your product.
szszrk 1 days ago [-]
I am not so sure about that. I recall multiple posts that start with most upvoted comments from founders...
Wonder what Dang says about that.
pier25 1 days ago [-]
The Node team has lost the plot IMO.
By far the most critical issue is the overreliance on third party NPM packages for even fundamental needs like connecting to a database.
afavour 1 days ago [-]
What would a Node-native database connection layer look like? What other platforms have that?
Databases are third party tech, I don’t think it’s unreasonable to use a third party NPM module to connect to them.
mike_hearn 1 days ago [-]
Most obviously, Java has JDBC. I think .NET has an equivalent. Drivers are needed but they're often first party, coming directly from the DB vendor itself.
Java also has a JIT compiling JS engine that can be sandboxed and given a VFS:
N.B. there's a NodeJS compatible mode, but you can't use VFS+sandboxing and NodeJS compatibility together because the NodeJS mode actually uses the real NodeJS codebase, just swapping out V8. For combining it all together you'd want something like https://elide.dev which reimplemented some of the Node APIs on top of the JVM, so it's sandboxable and virtualizable.
LunaSea 1 days ago [-]
> Most obviously, Java has JDBC. I think .NET has an equivalent. Drivers are needed but they're often first party, coming directly from the DB vendor itself.
So it's an external dependency that is not part of Java. It doesn't really matter if the code comes from the vendor or not. Especially for OpenSource databases.
zadikian 1 days ago [-]
DBMS vendor providing the client is nice. At least if you're using pg-native in Node, that's just a wrapper around the Postgres-owned libpq, but I've run into small breaking updates before that I don't feel would've happened if Postgres maintained both.
afavour 24 hours ago [-]
But that’s not Node’s fault surely? Shouldn’t Postgres be providing an NPM module given the popularity of Node?
zadikian 24 hours ago [-]
No it's not Node's fault, this isn't their job. I don't blame Postgres either, cause maintaining libpq is fair enough, just would've been extra nice to have an official Node lib too.
mike_hearn 1 days ago [-]
Well in the case of Oracle you can get the language, runtime, DB and driver all from the same organization under unified support contracts.
If you don't value that, why would you want your programming language implementors to also implement database drivers?
zadikian 1 days ago [-]
Well that's only because Oracle happens to own both Java and Oracle DB. Suppose you're not using that DB.
pier25 1 days ago [-]
Bun provides native MySQL, SQLite, and Postgres drivers.
I'm not saying Node should support every db in existence but the ones I listed are critical infrastructure at this point.
When using Postgres in Node you either rely on the old pg which pulls 13 dependencies[1] or postgres[2] which is much better and has zero deps but mostly depends on a single guy.
Node has sqlite, though I have not had any issues using better-sqlite3 and worker processes for long running ops
pier25 1 days ago [-]
Until the day it gets pwned by a malicious actor. Which is something we've seen quite a lot of times on npm deps.
zadikian 24 hours ago [-]
Maybe MySQL and Postgres should make official Node libs then. Bun maintaining this is ok too, but it seems odd given that it means having to keep up with new features in those DBMSes.
pier25 23 hours ago [-]
> but it seems odd given that it means having to keep up with new features in those DBMSes
That would be more useful for the ecosystem than the Node team investing time on a virtual file system.
zadikian 21 hours ago [-]
Hard to compare, but reason #1 of bundling an app is a pretty big deal that can't be solved with just a library.
ksherlock 1 days ago [-]
Perl has DBI. PHP has PDO.
Spivak 1 days ago [-]
Python has DB-API.
nulltrace 22 hours ago [-]
I publish a package with zero deps and people still pull in a pile of transitive stuff from their lockfile. "pg" has 13 dependencies and nobody even blinks. One gets compromised and suddenly every Node backend using Postgres is in scope. Bun shipping native drivers feels like the right call, fewer moving parts.
NoNameProvided 20 hours ago [-]
I understand the general point you're making, but the pg package isn’t a good example. It has 6 deps, not 13, and 5 of those are internal packages from the same monorepo without additional dependencies. There’s only a single external dependency, and that one brings in just one additional package.
In my opinion, the pg repo and packages are an example of how OSS stuff should be maintained. Clean repo, clean code, well-maintained readme, and clearly focus on keeping things simple instead of overcomplicating.
pier25 18 hours ago [-]
You still need to pull 13 extra deps that could be compromised.
beart 1 days ago [-]
Outside of sqlite, what runtimes natively include database drivers?
pier25 1 days ago [-]
Bun, .NET, PHP, Java
Deukhoofd 1 days ago [-]
For .NET only the old legacy .NET Framework, SqlClient was moved to a separate package with the rewrite (from System.Data.SqlClient to Microsoft.Data.SqlClient). They realized that it was a rather bad idea to have that baked in to your main runtime, as it complicates your updates.
pier25 1 days ago [-]
It's still provided by Microsoft. They are responsible for those first party drivers.
LunaSea 1 days ago [-]
For Bun you're thinking of simple key / values, hardly a database. They also have a SQLite driver which is still just a package.
pier25 1 days ago [-]
I think you're confusing the database engine with the driver?
petcat 1 days ago [-]
Are people still building new projects on Node.js? I would have thought the ecosystem was moving to deno or bun now
dzogchen 1 days ago [-]
I don't really understand what the value proposition of Bun and Deno is. And I see huge problems with their governance and long-term sustainability.
Node.js on the other hand is not owned or controlled by one entity. It is not beholden to the whims of investors or a large corporation. I have contributed to Node.js in the past and I was really impressed by its rock-solid governance model and processes. I think this is an under-appreciated feature when evaluating tech options.
packetlost 1 days ago [-]
Deno has some pretty nice unique features like sandboxing that, afaik, don't exist in other runtimes (yet). It's enough of a draw that it's the recommended runtime for projects like yt-dlp: https://github.com/yt-dlp/yt-dlp/issues/14404
> The permission model implements a "seat belt" approach, which prevents trusted code from unintentionally changing files or using resources that access has not explicitly been granted to. It does not provide security guarantees in the presence of malicious code. Malicious code can bypass the permission model and execute arbitrary code without the restrictions imposed by the permission model.
Deno's permissions model is actually a very nice feature. But it is not very granular so I think you end up just allowing everything a lot of the time. I also think sandboxing is a responsibility of the OS. And lastly, a lot of use cases do not really benefit from it (e.g. server applications).
zamadatix 1 days ago [-]
If one gets nothing from them directly, they've at least been a good kick to get several features into Node. It's almost like neovim was to vim, perhaps to a lesser extent.
zadikian 1 days ago [-]
Note that Bun was recently acquired by Anthropic.
gavmor 1 days ago [-]
Faster, no transpilation, dev-ex sugar.
pier25 1 days ago [-]
I agree about the governance and long-term sustainability points but if you don't see any value in Bun or Deno it is probably because (no offense) you are not paying attention.
jitl 1 days ago [-]
loud people on twitter are always switching to the new hotness. i personally can't see myself using bun until its reputation for segfaults goes away after a few more years of stabilizing. deno seems neat and has been around for longer, but its node compatibility story is still evolving; i'm also giving it another year before i try it.
Are you genuinely confident in a framework project that lands 19kloc generated PRs in one go? I’d worry about hidden security footguns if nothing else and a lot of people use this for their apps. Thankfully I don't use it, but if I did I'd find this really troubling.
It also has security implications - if this is normalised in node.js it would be very easy to slip in deniable exploits into large prs. It is IMO almost impossible to properly review a PR that big for security and correctness.
What kind of logic is this?
and it's not like super carefully written code is magically perfect. we know that djb can release things that are close to that, but almost nobody is like him at all!
I do not think it is wise to brag that your solution to a problem is extremely painful but that you were impervious to all the pain. Others will still feel it. This code takes bandwidth to host and space on devices and for maintainers it permanently doubles the work associated with evolving the filesystem APIs. If someone else comes along with the same kind of thinking they might just double those doubled costs, and someone else might 8x them, all because nobody could feel the pain they were passing on to others
> Bundle a full application into a Single Executable.
Embed a zip file into the executable, or something. Node sort of supports this since v25, see --build-sea. Bun and Deno support this for a longer time.
> Run tests without touching the disk.
This must be left to the host system to decide. Maybe I want them to touch the disk and leave traces useful for debugging. I'd go with tmpfile / tmpdir; whoever cares, knows to mount them as tmpfs, which sits in RAM. (Or a ramdisk under Windows.)
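A minimal sketch of the tmpdir approach described above, assuming only Node's built-in `fs`, `os`, and `path` modules. The `withTempDir` helper and the `config.json` file name are hypothetical, made up for illustration; whether the directory ends up in RAM depends on how the host mounts its temp location, which is the point being made.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: run `fn` with a fresh scratch directory, then clean up.
function withTempDir(fn) {
  // mkdtempSync appends random characters to the prefix to get a unique name.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vfs-test-'));
  try {
    return fn(dir);
  } finally {
    // Remove the directory and its contents, even if `fn` threw.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Example test body: write a file, read it back through the real fs API.
const contents = withTempDir((dir) => {
  const file = path.join(dir, 'config.json');
  fs.writeFileSync(file, JSON.stringify({ port: 8080 }));
  return JSON.parse(fs.readFileSync(file, 'utf8'));
});

console.log(contents.port);
```

Skipping the cleanup step (for example behind an environment variable) would leave the traces on disk for debugging, as suggested above.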
> Sandbox a tenant’s file access. In a multi-tenant platform, you need to confine each tenant to a directory without them escaping
This looks like a wrong tool, again. Run your Node app in a container (like you are already doing), mount every tenant's directory as a separate mount point into your container. (Similar with BSD jails.) This seems like the only problem that is not trivial to solve without a "VFS", but I'm not very certain that such a VFS would be as well-audited as Docker, or nsenter and unshare. The amount of work necessary for implementing that is too much for the niche benefit it would provide.
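To illustrate why confinement is the non-trivial case, here is a rough sketch of the naive in-process check, assuming Node's built-in `path` module. The `resolveInside` helper and the tenant paths are hypothetical. It is purely lexical: it does not follow symlinks or guard against races, which is the kind of gap that OS-level mechanisms such as mount namespaces are meant to close.

```javascript
const path = require('node:path');

// Hypothetical helper: resolve `requested` inside `root`, or throw if it escapes.
function resolveInside(root, requested) {
  const base = path.resolve(root);
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);
  // An escaping path produces a relative path that is ".." or starts with
  // "../", or (on Windows, across drives) is still absolute.
  if (rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`path escapes tenant root: ${requested}`);
  }
  return target;
}

const ok = resolveInside('/srv/tenants/a', 'uploads/report.txt');
console.log(ok);

let escaped = false;
try {
  resolveInside('/srv/tenants/a', '../b/secret.txt');
} catch (err) {
  escaped = true; // lexical traversal is caught; a symlink inside the root would not be
}
console.log(escaped);
```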
> Load code generated at runtime.
See tmpfs for a trivial answer. For a less trivial answer, I don't see how Node's code loader is bound to a filesystem. If it can import via https, just use ESM loader hooks and register() your loader, assuming you're running Node ≥ 20.6.
While the large code changes were maintained, they were often split up into a set of semantically meaningful commits for purposes of review and maintenance.
With AI blowing up the line counts on PRs, it's a skill set that more developers need to mature. It's good for their own review to take the mass changes, ask themselves how would they want to systematically review it in parts, then split the PR up into meaningful commits: e.g. interfaces, docs, subsets of changed implementations, etc.
Like, why on earth would I spend hours reviewing your PR that you/Claude took 5 minutes to write? I couldn't care less if it improves (best case scenario) my open source codebase, I simply don't enjoy the imbalance.
My personal approach to open source is more or less that when I need a piece of software to exist that does not and there is no good reason to keep it private, it becomes open source. I don’t do it for fun, I do it because I need it and might as well share it. If someone sends me a patch that enhances my use case, I will work with them to incorporate it. If they send me a patch that only benefits them it becomes a calculus of how much effort would it take for me to review it. If the effort is high, my advice is to fork the project or make it easier for me to review. Granted I don’t maintain huge or vital projects, but that’s precisely why: I don’t need yet another programming language or runtime to exist and I wouldn’t want to work on one for fun.
If the PR does what it says it does, why does it actually matter if it took 2 weeks or 2 minutes to put together, given that it's the equivalent level of quality on review?
You're right that the issue isn't how many minutes it took. The issue is that it's slop. Reviewing thousands of lines of crappy code is unpleasant whether they were autogenerated or painstakingly handcrafted. (Of course, few humans have the patience and resistance to learning to generate the amount of terrible code that AIs do routinely).
I would probably never be able to review this kind of code in open source projects without any financial compensation, because of that reason. Not because I don't like LLMs, don't use LLMs, or think their code is of bad quality. But, while without LLMs I know there was a person who sat down and wrote all this in painstaking work, now I know that he or she barely steered a robot that wrote it. It may still be good work, and the steering and prompting is still work and requires skill, but for me I would not feel any emotional value in this code, and it would make it A LOT harder to gather motivation to review it. Interestingly, when I think about it, I realize that I would inherently have motivation to find out how the developer prompted the agent.
Like, you know, when I see a wooden statue of which I know it was designed and carved by someone in months of work, I could appreciate every single edge of the wood much more than if there's a statue that was designed by someone but carved by some kind of wooden CNC machine. It may be same statue and the same or even better quality, and it was still skillful work, but I lose my connection to it.
Can't quite pinpoint it, but for me, it seems, the human aspect is really important here, at least when it's about passion and motivation.
Maybe that made some sense, idk. I just wrote out of my ass.
Not everyone has the same motivations. I’ve done open source for fun, I’ve done it to unblock something at work, I’ve done it to fix something that annoys me.
If your project is gaining useful functionality, that seems like a win.
Of course any chess bot is going to play better, but that's not the point
As for us (aspiring) craftsmen, there are dozens of us! Dozens!
Because they're implicitly asking me to put in effort as a reviewer. Pretending that they put more effort in than they have is extremely rude, and intentionally or not, generating a large volume of code amounts to misleading your potential reviewers.
> If there was a huge amount of tedium that they used Claude Code for, then reviewed and cleaned up so that it’s indistinguishable from whatever you’d expect from a human; what’s it to you?
They never do though. These kinds of imaginary good AI-based workflows are a "real communism has never been tried" thing.
> If your project is gaining useful functionality, that seems like a win.
Lines of code impose a maintenance cost, and that goes triple when the code quality is low (as is always the case for actually existing AI-generated code). The cost is probably higher than the benefit.
Well, the process you’re describing is mature and intentionally slows things down. The LLM push has almost the opposite philosophy. Everyone talks about going faster and no one believes it is about higher quality.
If there is some bug that slips by review, having the PR broken down semantically allows quicker analysis and recovery later for one case. Even if you have AI reviewing new Node.js releases for if you want to take in the new version - the commit log will be more analyzable by the AI with semantic commits.
Treating the code as throwaway is valid in a few small contexts, but that is not the case for PRs going into maintained projects like Node.js.
The fact is, it's useful as a tool, but you still should review what's going on/in. That isn't always easy though, and I get that. I've been working on a TS/JS driver for MS-SQL so I can use some features not in other libraries, mostly bridging a Rust driver (first Tiberius, then mssql-client), the clean abstraction made the switch pretty quick... a fairly thorough test suite for Deno/Node/Bun kept the sanity in check. Rust C-style library with FFI access in TS/JS server environment.
My hardest part is actually having to set up a Windows Server to test the passwordless auth path (basically a connection string with integrated Windows auth). I've got about 80 hours of real time into this project so far. And I'll probably be doing 2 followups.. one will be a generic ODBC adapter with a similar set of interfaces. And a final third adapter that will provide the same methods, but using the native SQLite underneath but smoothing over the differences.
I'm leveraging using/dispose (async) instead of explicit close/rollback patterns, similar to .Net as well as Dapper-like methods for "Typed" results, though no actual type validation... I'd considered trying to adapt Zod to check at least the first record or all records, and may still add the option.
All said though, I wouldn't have been able to do so much with so relatively little time without the use of AI. You don't have to sacrifice quality to gain efficiency with AI, but you do need to take the time to do it.
Note aside, OpenJS executive director mentioned it's ok to use AI assistance on Node.js contributions:
[1]: https://github.com/nodejs/node/pull/61478#issuecomment-40772...
It is great to have a legal perspective on compliance of LLM generated code with DCO terms, and I feel safer knowing that at least it doesn't expose Node.js to legal risk. However it doesn't address the well known unresolved ethical concerns over the sourcing of the code produced by LLM tooling.
Speed code all your SaaS apps, but slow iteration speeds are better for a runtime because once you add something, you can basically never remove it. You can't iterate. You get literally one shot, and if you add an awkward or trappy API, everyone is now stuck with it forever. And what if this "must have" feature turns out to be kind of a dud, because everyone converged on a much more elegant solution a few years later? Congratulations, we now have to maintain this legacy feature forever and everyone has to migrate their codebase to some new solution.
Much better to let dependencies and competing platforms like bun or deno do all the innovating. Once everyone has tried and refined all the different ways of solving this particular problem, and all the kinks have been worked out, and all the different ways to structure the API have been tried, you can take just the best of the best ideas and add it into the runtime. It was late, but because of that it will be stable and not a train wreck.
But I know what you're thinking. "You can't do that. Just look at what happens to platforms that iterate slowly, like C or C++ or Java. They're toast." Oh wait, never mind, they're among the most popular platforms out there.
Time is highly correlated with expertise. When you don’t have expertise, you may go fast at expense of stability because you lack the experience to make good decisions to really save speed. This doesn’t hold true for any projects where you rely on experts, good processes and tight timelines (aka: Apollo mission)
And again, I'm not making a claim that the slow and steady tradeoff is best for all situations. Just that it is a great tradeoff for foundational platforms like a runtime. On a platform like postgresql or the JVM, the time from initial proposal to being released as a stable feature is generally years, and this pace I think has served those platforms well.
But I'm open to updating my priors. Do you think there are foundational platforms out there that iterate quickly and do a good job of it?
but you can only have two of them at the same time.
and we’re talking about FOSS here, so cheap kinda has to be one of them.
You’ve read people saying the same thing hundreds of times and have somehow taken that as meaning that it’s credible.
Neither you nor I nor anyone else here knows what the “effects” are, because this is brand new tech, and it’s constantly changing. Yet you’re speaking with absolute confidence.
“Big tech” has downtime all the time, and LLMs did not change that fact. The only difference is that the peanut gallery that is already worked up about AI for philosophical / cultural reasons is suddenly ready to blame AI for every issue under the sun.
You think that you’re making a technical argument but you’re just repeating the same talking points I see teenagers regurgitating on TikTok. There’s nothing intelligent or credible about it.
Don't slap someone else down because you don't know something.
If and when there is evidence that AI is actually increasing the speed of improvement (and not just churn), it would make sense to permit it. Unless and until such evidence emerges, the risks greatly outweigh the benefits, at least for a foundational codebase like this.
That sort of statement might also be sarcasm in another context: I personally use AI a lot, but also recognize that there are a lot of projects out there that are suffering from low quality slop pull requests, devs that kinda sign out and don't care much about the actual code as long as it appears to be running, alongside most LLMs struggling a lot with longer term maintenance if not carefully managed. So I guess it depends a lot on how AI is used and how much ideological opposition to that there is. In a really testable codebase it could actually work out pretty well, though.
It's not an AI issue. Node.js itself is lots of legacy code and many projects depend on that code. When Deno and Bun were in early development, AI wasn't involved.
Yes, you can speed up the development a bit but it will never reach the quality of newer runtimes.
It's like comparing C to C++. Those languages are from different eras (relative to each other).
If the submitter picks (a), they assert that they wrote the code themselves and have the right to submit it under the project's license. If (b), the code was taken from another place with clear license terms compatible with the project's license. If (c), the contribution was written by someone else who asserted (a) or (b) and is submitted without changes.
Since LLM generated output is based on public code, but lacks attribution and the license of the original it is not possible to pick (b). (a) and (c) cannot be picked based on the submitter disclaimer in the PR body.
(a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or
"in part" is a trivial bar to clear.
If there isn't, then (b) works fine, the code is taken from the LLM with no preexisting license. And it would be very strange if a mix of (a) and (b) is a problem; almost any (b) code will need some (a) code to adapt it.
That's not good enough to comply with (b). The code must be specifically covered by an open-source license, it's not enough for it to just not have a license.
As far as I'm concerned, public domain counts as "an appropriate open source license".
For material whose author is known and has explicitly placed it in the public domain, sure. For code that fell off the back of a truck, not so much.
(There's always a risk of an LLM copying something verbatim by accident, but if the designers are doing their job that chance gets low enough to be acceptable. Human code has that risk too after all. (And for situations that aren't an accident, with the human intentionally using snippets to draw out training text, then if they submit that code in a patch it's just a human violating copyright with extra steps.))
Source: https://www.copyright.gov/help/faq/faq-general.html
Whether AI output can fall under copyright at all is still up for debate - with some early rulings indicating that the fact that you prompted the AI does not automatically grant you authorship.
Even if it does, it hasn't been settled yet what the impact of your AI having been trained on copyrighted material is on its output. You can make a not-completely-unreasonable argument that AI inference output is a derivative work of AI training input.
Fact is, the matter isn't settled yet, which means any open-source project should assume the worst possible outcome - which in practice means a massive AI-generated PR like this should be treated like a nuke which could go off at any moment.
Well, it's a good thing you're not on the hook for defending against it, then.
Like I said in another comment, you don't have a license just because they're cool and look neat. You have them specifically to guard against people like patent trolls, who are trying to wreck your shit and take your lunch money. It's not an abstract risk.
1. Copyright cannot be assigned to an AI agent.
2. Copyrighted works require human creativity to be applied in order to be copyrighted.
For point 2 this would apply to times where AI one-shots a generic prompt. But for these large PRs where multiple prompts are used and a human has decided what the design should be and how the API should look you get the human creativity required for copyright.
In regards to being a derivative work I think it would be hard to argue that an LLM is copying or modifying an existing original work. Even if it came up with an exact duplicate of a piece of code it would be hard to prove that it was a copy and not an independent recreation from scratch.
>the worst possible outcome
The worst possible outcome is they get sued and Anthropic defends them from the copyright infringement claim due to Anthropic's indemnity clause when using Claude Code.
Also the commercial version is limited to “…Customer and its personnel, successors, and assigns…”. I am very much not a lawyer and couldn’t find definitions of these in the agreement but I am not sure how transferable this indemnity would be to an open source project.
You don't have a license because it's what all the cool kids are doing, you have one in case shit goes sideways and someone decides to try and ruin your day. You do, in fact, have to assume the worst.
The "nuke" here is some litigious company -- let's call them Patent Troll Rebranded (PTR) -- discovers that the LLM reproduced large amounts of their copyrighted code. Or it claims to have discovered it. They have large amounts of money and lawyers to fight it out in court and you are a relatively shoestring language foundation.
Either you have to unwind years of development to remove the offending code or you're spending six figures or more to defend yourself in court, all because you didn't bother to anticipate things that are anticipatable.
On a more serious note, I think that this will be thoroughly reviewed before it gets merged and Node has an entire security team that overviews these.
Oh I'd use an llm to generate large amounts of feedback and request changes!
You're paid to do a job, you're either professional or you aren't.
Who reviewed and approved the PR?
I like the idea of it mocking the file system for tests, but I feel like that should probably be part of the test suite, not Node.
The example towards the end that stores data in a sqlite provider and then saves it as a JSON file is mind-boggling to me. Especially for a system that's supposed to be about not saving to the disk. Perhaps it's just a bad example, but I'm really trying to figure out how this isn't just adding complexity.
I had to laugh, because the post you're replying to STRONGLY reminds me of this story, https://news.ycombinator.com/item?id=31778490 , in which some people on the GNOME project objected to thumbnails in the file-open dialog box because it might be a "Security issue" (even though thumbnails were available in the normal file browser, something those commenters probably should have known about, but didn't, but they just had to chime in anyway).
My current flow is to literally embed the JavaScript in the binary, then on start, write the JavaScript code to `/tmp/{random}` and point Node.js to execute the code at that destination.
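The flow described above can be sketched roughly as follows, assuming Node's built-in `fs`, `os`, `path`, and `child_process` modules. The `embeddedSource` string stands in for whatever is actually embedded in the binary, and `runEmbedded` is a hypothetical helper name; the commenter's real setup is not shown in the thread.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Stand-in for JavaScript that would be embedded in the binary (an assumption).
const embeddedSource = "console.log('hello from embedded code');";

// Hypothetical helper: materialize the source on disk, run it, then clean up.
function runEmbedded(source) {
  // Unique directory under the system temp location, e.g. /tmp/embedded-XXXXXX.
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'embedded-'));
  const entry = path.join(dir, 'main.js');
  try {
    fs.writeFileSync(entry, source);
    // process.execPath is the Node.js binary running this script.
    return execFileSync(process.execPath, [entry], { encoding: 'utf8' });
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

const output = runEmbedded(embeddedSource);
console.log(output.trim());
```

The write-then-execute round trip through the real filesystem is the step an in-process virtual filesystem would presumably remove.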
A virtualized filesystem also allows for a safer "plugin" story for Node.js - where JavaScript plugins can be prevented from accessing the real filesystem.
https://github.com/tc39/proposal-module-expressions
Just my opinion, probably not a popular one. But I will be avoiding an upgrade to Node.js after 24.14 for a while if this is becoming an acceptable precedent.
I do see some original benefits to a VFS though, bad application decisions aside, but they are exceedingly minor.
As an aside I think JavaScript would benefit from an in-memory database. This would be more of language enhancement than a Node.js enhancement. Imagine the extended application capabilities of an object/array store native to the language that takes queries using JS logic to return one or more objects/records. No SQL language and no third party databases for stuff that you don't want to keep in offline storage on a disk.
The more structures you have in a given application and the larger those structures become in their schemas the more valuable a uniform storage and retrieval solution becomes.
It's not about what it looks like. Arrays have fancy functional methods, but not object structures. It's more about whether it executes faster and comprises fewer steps to read/write. A real case in my application is get all ports associated with unencrypted sockets associated with servers of a given type and sort the output in a manner chosen by the user. The data in this case is in different unrelated objects whose properties point to each other in various ways by identity, because each server and socket uses hashes for unique identifiers.
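For reference, this is roughly what that query looks like today with plain objects and array methods. It is only a baseline sketch: the record shapes, identifiers, and the `unencryptedPorts` helper are all hypothetical, and it makes no claim about how a native store would be designed or how it would perform.

```javascript
// Hypothetical records keyed by hash-like identifiers.
const servers = {
  a1f: { type: 'http' },
  b2e: { type: 'dns' },
  c3d: { type: 'http' },
};
const sockets = {
  s01: { server: 'a1f', encrypted: false, port: 8080 },
  s02: { server: 'a1f', encrypted: true, port: 8443 },
  s03: { server: 'b2e', encrypted: false, port: 53 },
  s04: { server: 'c3d', encrypted: false, port: 3000 },
};

// Ports of unencrypted sockets that belong to servers of `serverType`,
// ordered by a caller-supplied comparator (ascending by default).
function unencryptedPorts(serverType, compare = (a, b) => a - b) {
  return Object.values(sockets)
    .filter((s) => !s.encrypted && servers[s.server]?.type === serverType)
    .map((s) => s.port)
    .sort(compare);
}

const ascending = unencryptedPorts('http');
const descending = unencryptedPorts('http', (a, b) => b - a);
console.log(ascending, descending);
```

Every new cross-object question needs another hand-written filter/map chain like this one, which is the repetition a uniform query layer would be meant to absorb.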
isn't that just global state, or do you mean you want that to be persistent?
I get it, I've implemented things for tests, I'm just wondering if this shouldn't be solved at an OS level.
--- update
Let's put this another way, my code does effectively, child_process.spawn('something-that-reads-and-write-a-file')
now I'm back to the same issue. To test I need a virtual file system. Node providing one won't help.
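A small sketch of that situation, assuming Node's built-in modules. The child here is just another Node process writing a hypothetical `out.txt`, standing in for 'something-that-reads-and-write-a-file'. Because the child is a separate OS process, it sees only the real filesystem, so the test falls back to a real temporary directory.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Real scratch directory: an in-process virtual filesystem in the parent
// would not be visible to the child process.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'child-test-'));
let written;
try {
  // Stand-in child: a second Node process that writes a file in its cwd.
  execFileSync(
    process.execPath,
    ['-e', "require('node:fs').writeFileSync('out.txt', 'from child')"],
    { cwd: dir }
  );
  // The parent verifies the result through the same real directory.
  written = fs.readFileSync(path.join(dir, 'out.txt'), 'utf8');
} finally {
  fs.rmSync(dir, { recursive: true, force: true });
}

console.log(written);
```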
I do think it's more painful to distribute files when you're distributed as a single binary vs scripts, since the latter has to figure out bundling of files anyway.
But still - it does exist
A ZIP fork embedded into the executable should be an obvious read-only VFS implementation. Bring your assets with you, even maybe build them with the standard zip utility.
It should take relatively few LOCs, provided that libzip is already linked into the executable anyway.
That’s so dehumanizing, I would happily write such code.
Look, most of us realized around 2004 or so that if you had a choice between Norton and the virus you would pick the virus. In the Windows world we standardized around Defender because there is some bound on how much Defender degrades the performance of your machine which was not the case with competitive antivirus software.
I've done a few projects which involved getting container file formats like ZIP and PDF (e.g. you know it's a graph of resources in which some of those resources are containers that contain more resources, right?) and now that I think of it you ought to be able to virus scan ZIP files quickly and intelligently but the whole problem with the antivirus industry is that nobody ever considers the cost.
Oh, wait...
[0] https://yarnpkg.com/advanced/pnp-spec#zip-access
See https://pnpm.io/motivation
Also, while popularity isn't necessarily a great indicator of quality, a quick comparison shows that the community has decided on pnpm:
https://www.npmcharts.com/compare/pnpm,yarn,npm
Firstly - with yarn pnp zero-installs, you don't have to run an `install` every time you switch branch, just in case a dep changed. So much dev time is wasted due to this.
Secondly - "it worked on my machine" is eliminated. CI and deploy use the exact same files - this is particularly important for deeply nested range satisfied dependencies.
Thirdly - packages committed to the repo allows for meaningful retrospectives and automated security reviews. When working in ops, packages changing is hell.
All of this is facilitated by the zip files that the comment you replied to was discussing, that you tangented away from.
The graph you have linked is fundamentally odd. Firstly - there is no good explanation of what it is actually showing. I've had claude spin on it and it reckons it's npm download counts. This leads to it being a completely flawed graph! Yarn berry is typically installed either via corepack or bootstrapped via package.json and the system yarn binary. Yarn even saves itself into your repo. pnpm is never (I believe) bundled with the system node, whereas yarn and npm typically are.
Your graph doesn't show what you claim it does.
Combined with a hackable IDE like Atom (Pulsar) made with the same tech it’s a pretty great dev exp for web devs
Python had shared packages for a long time and those are fine up to a point but circa 2017 I was working at a place where we had data scientists making models using different versions of Tensorflow and stuff and venv’s are essential to that. We were building unusually complex systems and having worse problems than other people but if you do enough development you will have trouble with shared packages.
The node model of looking for packages in the local directory has some appeal and avoids the need for “activation” but I like writing Python-based systems that define one or more command line programs that I can go use in any directory I want. For instance, if I want to publish one of my Vite projects I have a ‘transporter’ written in Python that looks for a Vite project in the current directory and uploads it to S3, updates metadata and invalidates cloudfront and all that. I have to activate it which is a minor hassle but then I can go to different Vite projects and publish them.
https://web.archive.org/web/20161003115800/https://blog.mozi...
This is the biggest takeaway for me for AI. It's not even that nobody wants to do these things, it's that by the time you finish your tasks, you have no time to do these things, because your manager / scrum master / powers that be want you to work on the next task.
No.
The alternative is that you work on the same number of features and utilize the ability to make those features as robust as you know they could be, but you have other pressing matters to attend to. That's weighing the ability of AI against the ability of neglect.
There's Docker, OverlayFS, FUSE, ZFS or Btrfs snapshots?
Do you not trust your OS to do this correctly, or do you think you can do better?
A lot of this stuff existed 5, 10, 15 years ago...
Somehow there's been a trend for every effing program to grow and absorb the features and responsibilities of every other program.
Actually, I have a brilliant idea, what if we used nodejs, and added html display capabilities, and browser features? After all Cursor has already proven you can vibecode a browser, why not just do it?
I'm just tired at this point
¹E.g. if you've got music, and it's sorted `artist/album/track<n>.extension`, and two artists collaborate on an album, which one gets the album in their folder? What if you want to sort all songs in the display by publication date? Even if they use the files on your filesystem without moving them, some sort of metadata database will be needed for efficient display & search.
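To make the footnote concrete, here is a minimal sketch of such a metadata index. The `Track` shape, the file paths, and the sample entries are all invented for illustration. The files stay wherever they are on disk; the index alone decides how they are grouped and ordered.

```typescript
// Hypothetical metadata record: the file path is just one field, so a
// collaboration can belong to several artists at once.
interface Track {
  path: string;
  title: string;
  artists: string[];
  published: string; // ISO 8601 date, so string order equals date order
}

// Group tracks under every artist credited on them.
function indexByArtist(tracks: Track[]): Map<string, Track[]> {
  const index = new Map<string, Track[]>();
  for (const track of tracks) {
    for (const artist of track.artists) {
      const list = index.get(artist) ?? [];
      list.push(track);
      index.set(artist, list);
    }
  }
  return index;
}

// Sort a copy by publication date without touching the files themselves.
function sortByPublication(tracks: Track[]): Track[] {
  return [...tracks].sort((a, b) => a.published.localeCompare(b.published));
}

// Illustrative library: the second track is a collaboration.
const library: Track[] = [
  { path: "a/solo/track1.flac", title: "Solo", artists: ["A"], published: "2021-05-01" },
  { path: "a/joint/track1.flac", title: "Joint", artists: ["A", "B"], published: "2019-03-10" },
];
const artistIndex = indexByArtist(library);
const chronological = sortByPublication(library);
```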
The sqlar schema is missing some of the info that's being stored atm, but there's nothing stopping you from adding your own fields/tables on top of the format; if anything, the docs encourage it. It is just a sqlite database at the end of the day.
https://www.sqlite.org/sqlar.html
- https://github.com/yarnpkg/berry/issues/7065
- https://github.com/nodejs/node/issues/62012
This is because yarn patches fs in order to introduce virtual file path resolution of modules in the yarn cache (which are zips), which is quite brittle and was broken by a seemingly unrelated change in 25.7.
The discussion in issue 62012 is notable - it was suggested yarn just wait for vfs to land. This is interesting to me in two ways: firstly, the node team seems quite happy for non-trivial amounts of the ecosystem to just be broken, and suggests relying on what I'm assuming will be an experimental API when it does land; secondly, it implies a lot of confidence that this feature will land before LTS.
yarn/node relations specifically are... complicated. On display on corepack (yarn project which got bundled into official nodejs distribution) issue tracker.
> secondly, it implies a lot of confidence that this feature will land before LTS.
This confidence is somewhat concerning. Will it get reviewed at all, or has the "trust the LLM" mandate arrived at Node too now?
Not spamming, not affiliated, just trying to help others avoid so much needless suffering.
I expect yarn to have a real competitor sooner rather than later that will replace it; and I do wonder if it is this vfs module that will enable it.
Not saying vfs is bad, it's just not impossible to set that up in a few lines of code. My idea for a simple version of a vfs in node is to use a RAM disk/RAMfs - would that work?
Basically an "fs-core" that everything ultimately goes through, and which can be switched out/layered with another implementation. Think express-style routing but for the filesystem.
That'll keep things simple in node's codebase while handing more power to users.
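A minimal sketch of what that layering could look like, to make the idea concrete. Everything here (`FsBackend`, `MemoryBackend`, `FsCore`, the `use` method) is hypothetical and is not the API of the Node.js PR or of any existing module; it only covers reads, and prefixes are assumed to end in "/".

```typescript
// Hypothetical "fs-core": every read goes through one small interface.
interface FsBackend {
  readFile(path: string): string | undefined;
}

// Simplest possible backend: files held in a Map, keyed by relative path.
class MemoryBackend implements FsBackend {
  private files: Map<string, string>;
  constructor(files: Map<string, string>) {
    this.files = files;
  }
  readFile(path: string): string | undefined {
    return this.files.get(path);
  }
}

// Router-style core: backends are mounted on path prefixes.
class FsCore implements FsBackend {
  private mounts: { prefix: string; backend: FsBackend }[] = [];

  // Later mounts are consulted first, so they can shadow earlier ones.
  use(prefix: string, backend: FsBackend): this {
    this.mounts.unshift({ prefix, backend });
    return this;
  }

  readFile(path: string): string | undefined {
    for (const { prefix, backend } of this.mounts) {
      if (!path.startsWith(prefix)) continue;
      const hit = backend.readFile(path.slice(prefix.length));
      if (hit !== undefined) return hit; // a miss falls through to the next layer
    }
    return undefined;
  }
}

// Illustrative setup: a base layer plus an overlay mounted on /assets/.
const core = new FsCore()
  .use("/", new MemoryBackend(new Map([["etc/motd", "base"]])))
  .use("/assets/", new MemoryBackend(new Map([["logo.txt", "overlay"]])));
const logo = core.readFile("/assets/logo.txt");
const motd = core.readFile("/etc/motd");
```

Since `FsCore` itself satisfies `FsBackend`, one core can be mounted inside another, which is roughly the "switched out/layered" property described above.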
What I really want is a way of swapping FS with VFS in a Node.js program harness. Something like
So basically Node never touches the disk and loads everything from memory: https://en.wikipedia.org/wiki/List_of_RAM_drive_software
I miss the days when you could tweak all kinds of software GUIs yourself. Change icons, menus, shortcut keys, etc.
Sure you can. Function() exists and require.cache exists. This is _intentionally_ exploitable.
(I know, I know, it's ugly and has its own set of problems)
These arguments don't even make sense, they look LLM generated. I can't even formulate a disagreement against this nonsense.
From https://github.com/jupyterlite/jupyterlite/issues/949#issuec... :
> Ideally, the virtual filesystem of JupyterLite would be shared with the one from the virtual terminal.
emscripten-core/emscripten > "New File System Implementation": https://github.com/emscripten-core/emscripten/issues/15041#i... :
> [ BrowserFS, isomorphic-git/lightningfs, ]
pyodide/pyodide: "Native file system API" #738: https://github.com/pyodide/pyodide/issues/738 re: [Chrome,] Filesystem API :
> jupyterlab-git [should work with the same VFS as Jupyter kernels and Terminals]
pyodide/pyodide: "ENH Add API for mounting native file system" #2987: https://github.com/pyodide/pyodide/pull/2987
The Node.js codebase and standard library have a very high standard of quality, hope that doesn't get washed out by sloppy AI-generated code.
OTOH, Matteo is an excellent engineer and the community owes a lot to him. So I guess the code is solid :).
Wonder what Dang says about that.
By far the most critical issue is the over reliance on third party NPM packages for even fundamental needs like connecting to a database.
Databases are third party tech, I don’t think it’s unreasonable to use a third party NPM module to connect to them.
Java also has a JIT compiling JS engine that can be sandboxed and given a VFS:
https://www.graalvm.org/latest/security-guide/sandboxing/
N.B. there's a NodeJS compatible mode, but you can't use VFS+sandboxing and NodeJS compatibility together because the NodeJS mode actually uses the real NodeJS codebase, just swapping out V8. For combining it all together you'd want something like https://elide.dev which reimplemented some of the Node APIs on top of the JVM, so it's sandboxable and virtualizable.
So it's an external dependency that is not part of Java. It doesn't really matter if the code comes from the vendor or not. Especially for OpenSource databases.
If you don't value that, why would you want your programming language implementors to also implement database drivers?
I'm not saying Node should support every db in existence but the ones I listed are critical infrastructure at this point.
When using Postgres in Node you either rely on the old pg which pulls 13 dependencies[1] or postgres[2] which is much better and has zero deps but mostly depends on a single guy.
[1] https://npmgraph.js.org/?q=pg
[2] https://github.com/porsager/postgres
That would be more useful for the ecosystem than the Node team investing time on a virtual file system.
In my opinion, the pg repo and packages are an example of how OSS stuff should be maintained. Clean repo, clean code, well-maintained readme, and clearly focus on keeping things simple instead of overcomplicating.
Node.js on the other hand is not owned or controlled by one entity. It is not beholden to the whims of investors or a large corporation. I have contributed to Node.js in the past and I was really impressed by its rock-solid governance model and processes. I think this is an under-appreciated feature when evaluating tech options.
> The permission model implements a "seat belt" approach, which prevents trusted code from unintentionally changing files or using resources that access has not explicitly been granted to. It does not provide security guarantees in the presence of malicious code. Malicious code can bypass the permission model and execute arbitrary code without the restrictions imposed by the permission model.
Deno's permissions model is actually a very nice feature. But it is not very granular so I think you end up just allowing everything a lot of the time. I also think sandboxing is a responsibility of the OS. And lastly, a lot of use cases do not really benefit from it (e.g. server applications).
Open 80, closed 492.