fair point. that's the title we used on our X article and i copy-pasted. updated the article's actual title.
a_t48 24 hours ago
I've found that tar processing tends to dominate the time used to do anything with standard OCI layers. I have a more efficient format (that splits apart the layer into metadata+chunks) that I'm open sourcing soon if y'all are interested in using it.
appcypher 23 hours ago
interested. is the split for dedup, parallel pulls, or lazy loading specific files? maybe all.
we've played with some chunking ideas on our end but haven't landed on a format. drop a link when it's out.
a_t48 20 hours ago
All of the above, plus being able to reflink to skip copies of large files, plus not having to round-trip from disk a few times for tar layers, plus a number of other side benefits. Only using lazy loading for BuildKit right now, as it does require FUSE and I want it to be opt-in (in robotics contexts, for instance, you never want to lazy load).
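To make the metadata+chunks idea concrete, here is a minimal sketch of splitting files into a metadata record plus content-addressed chunks, which is what enables the dedup discussed above. It is not a_t48's format (that hasn't been published in this thread); the fixed chunk size, SHA-256 addressing, and the names ChunkStore, FileEntry, split_file and reassemble are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4 * 1024 * 1024  # assumed fixed chunk size, chosen for illustration


@dataclass
class ChunkStore:
    """Hypothetical content-addressed store: identical chunks are kept once."""
    chunks: dict[str, bytes] = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(digest, data)  # an existing chunk is not stored twice
        return digest


@dataclass
class FileEntry:
    """Metadata for one file; its bytes live in the chunk store."""
    path: str
    mode: int
    size: int
    chunk_digests: list[str]


def split_file(path: str, mode: int, data: bytes, store: ChunkStore,
               chunk_size: int = CHUNK_SIZE) -> FileEntry:
    """Cut data into fixed-size chunks, store them, and return the metadata record."""
    digests = [store.put(data[i:i + chunk_size])
               for i in range(0, len(data), chunk_size)]
    return FileEntry(path=path, mode=mode, size=len(data), chunk_digests=digests)


def reassemble(entry: FileEntry, store: ChunkStore) -> bytes:
    """Rebuild a file's contents from its chunk list."""
    return b"".join(store.chunks[d] for d in entry.chunk_digests)
```

For example, with a 4-byte chunk size, a 10-byte file and an 8-byte file made of the same byte share their full chunks, so the store holds 2 chunks for 5 references. Because metadata lives apart from the data, files can be listed or stat'ed without reading any chunks. Production formats typically use content-defined chunking rather than fixed offsets, so that an insertion near the start of a file does not shift every later chunk.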
andix 1 day ago
Isn't it really obvious that a user space fs will always be slow, and especially slow with small files?
I don't know the purpose of microsandbox, but such an article doesn't give me great confidence in exploring it further.
toast0 23 hours ago
> Isn't it really obvious that a user space fs will always be slow, and especially slow with small files?
Small files seem like the perfect case for a user space fs... depending on what you mean by user space fs. If you mean interfacing with a FUSE (or similar) filesystem by using syscalls in your program to context switch into the kernel, then context switch to the userspace FUSE layer, then send it back to the kernel and then back to your program ... that will be especially slow with small files, where the number of data bytes per context switch is small. OTOH, if you mean a user space fs where your program has a built-in filesystem it can access without context switching, then that will be of benefit ... especially if the files are small enough that you can pack multiple files into a single page.
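A minimal sketch of the packing idea: first-fit placement of small files into fixed-size pages, with an index recording where each file lives. The 4096-byte page size, the rejection of files larger than a page, and the function names are assumptions for illustration, not how any particular filesystem lays out data.

```python
PAGE_SIZE = 4096  # assumed page size; common on x86-64, but not universal


def pack_into_pages(files: dict[str, bytes], page_size: int = PAGE_SIZE):
    """First-fit packing of small files into fixed-size pages.

    Returns (pages, index), where index maps path -> (page_no, offset, length).
    Files larger than a page are rejected to keep the sketch simple.
    """
    pages: list[bytearray] = []
    index: dict[str, tuple[int, int, int]] = {}
    for path, data in files.items():
        if len(data) > page_size:
            raise ValueError(f"{path} is larger than a page")
        # Find the first existing page with enough free space.
        for page_no, page in enumerate(pages):
            if len(page) + len(data) <= page_size:
                break
        else:
            # No page had room (or there are no pages yet): open a new one.
            pages.append(bytearray())
            page_no, page = len(pages) - 1, pages[-1]
        index[path] = (page_no, len(page), len(data))
        page.extend(data)
    return pages, index


def read_file(pages, index, path: str) -> bytes:
    """Look up a file's location in the index and slice it out of its page."""
    page_no, offset, length = index[path]
    return bytes(pages[page_no][offset:offset + length])
```

With ten 1,000-byte files, this produces 3 pages instead of the 10 that one-file-per-page storage would use, so reading several neighbouring files can be served by fewer page-sized I/Os.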
andix 7 hours ago
If I got the article right, they were running FUSE inside the VM, and the VM's FUSE was talking to a process on the VM host (probably over a virtual network). That can't be fast, not even theoretically.
Every access to the fs had to go through two processes and two kernels, virtual networking, and probably even ran on two different cores most of the time. That must be slow.
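The argument above is essentially arithmetic: FUSE requests are synchronous, so per-request latency multiplies with the number of requests, no matter how few bytes each file holds. The sketch below is a back-of-the-envelope model only; the hop costs are placeholder parameters, not measurements of microsandbox or any FUSE setup, and the assumption of 4 requests per file (for example LOOKUP, OPEN, READ and RELEASE, ignoring kernel caching) is hypothetical.

```python
def total_latency_us(n_files: int, requests_per_file: int,
                     hop_costs_us: list[float]) -> float:
    """Pure round-trip time when every request synchronously crosses every hop.

    hop_costs_us holds the one-way cost of each hop (placeholders, not measurements).
    Each request crosses every hop twice: once out, once back with the reply.
    """
    round_trip_us = 2 * sum(hop_costs_us)
    return n_files * requests_per_file * round_trip_us


# Hypothetical hops for a guest FUSE client talking to a host-side daemon.
HOPS_US = {
    "guest kernel <-> transport": 1.0,
    "transport <-> host kernel": 1.0,
    "host kernel <-> daemon": 1.0,
}

if __name__ == "__main__":
    t = total_latency_us(100_000, 4, list(HOPS_US.values()))
    print(f"{t / 1e6:.1f} s of round trips")
```

With three hypothetical 1 µs hops, 100,000 small files at 4 requests each cost 2.4 s of round-trip time alone, before any data is copied. The point is the shape of the formula rather than the numbers: every extra hop (a second kernel, a virtual network) raises the cost of every single request.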
somat 22 hours ago
Not obvious to me.
A filesystem is a database; it is a type of key-value storage. There is a hierarchical lookup key that points you to an unstructured block of data.
Many databases (BerkeleyDB, PostgreSQL, SQLite) are then built on this unstructured database. There is absolutely nothing indicating that putting a key-value database with hierarchical keys and unstructured blocks in a single file will be slow.
It could be; naive indexing or rebalancing could be very slow. But it does not have to be. In fact BerkeleyDB is a neat case study here. Superficially it is a ridiculously simple key-value store: why does such a simple thing even need to exist, or have such a long-lived presence? It turns out building the efficient structures needed to work with slow non-volatile storage is non-trivial. Early MySQL used BerkeleyDB as a low-level storage engine. Note that MySQL's main selling point was speed before correctness.
See also: virtual machines, another ubiquitous case of a filesystem in userspace.
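The "filesystem is a key-value store" framing can be sketched directly: a toy store whose keys are full paths. Keeping the keys sorted turns a directory listing into a prefix range scan, the same property ordered structures such as B-trees provide. The PathKV class is hypothetical and keeps everything in memory; it ignores persistence, which is exactly where the non-trivial work described above lives.

```python
import bisect


class PathKV:
    """Toy filesystem-as-key-value-store: keys are full paths, values are bytes.

    Keys are kept sorted, so all paths under a directory form one contiguous
    range and listing a directory is a prefix range scan.
    """

    def __init__(self) -> None:
        self._keys: list[str] = []
        self._data: dict[str, bytes] = {}

    def put(self, path: str, value: bytes) -> None:
        if path not in self._data:
            bisect.insort(self._keys, path)  # keep keys in sorted order
        self._data[path] = value

    def get(self, path: str) -> bytes:
        return self._data[path]

    def listdir(self, directory: str) -> list[str]:
        prefix = directory.rstrip("/") + "/"
        start = bisect.bisect_left(self._keys, prefix)  # first key >= prefix
        children: list[str] = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # left the contiguous range for this directory
            name = key[len(prefix):].split("/", 1)[0]
            if not children or children[-1] != name:
                children.append(name)  # keys under one subdirectory are adjacent
        return children
```

A listing touches only the keys under the requested directory, rather than scanning the whole store.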
andix 22 hours ago
It's something completely different. A database like SQLite runs in the same process as the application.
All the filesystem calls go through the kernel. A userspace file system is another process.
It's not like SQLite, it's more like Postgres. Try sending a few hundred thousand small queries to Postgres, and be surprised how slow it's going to be.
The file system API is not like SQL, which allows complex queries. It's a lot of tiny and simple requests that assume very low latency.
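The in-process vs. out-of-process distinction can be demonstrated with a small timing experiment: the same trivial request handled by a function call in the current process, and as a synchronous round trip over pipes to a child process. This is only an analogy; the echo child stands in for a FUSE daemon, and a real FUSE path (kernel, /dev/fuse, daemon, and in this article's case a VM boundary) has more hops than a pipe. No results are claimed here; run it to see the gap on your own machine.

```python
import subprocess
import sys
import time

# Child process: reads one request line at a time and echoes it back.
ECHO_SERVER = (
    "import sys\n"
    "for line in iter(sys.stdin.buffer.readline, b''):\n"
    "    sys.stdout.buffer.write(line)\n"
    "    sys.stdout.buffer.flush()\n"
)

REQUEST = b"stat\n"


def handle(request: bytes) -> bytes:
    """In-process handler: stands in for a library call such as SQLite."""
    return request


def in_process(n: int) -> float:
    """Seconds for n tiny requests served by a function in this process."""
    start = time.perf_counter()
    for _ in range(n):
        handle(REQUEST)
    return time.perf_counter() - start


def cross_process(n: int) -> float:
    """Seconds for n tiny requests, each a synchronous round trip to a child process."""
    proc = subprocess.Popen([sys.executable, "-c", ECHO_SERVER],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        # Warm-up round trip so interpreter startup is not counted.
        proc.stdin.write(REQUEST)
        proc.stdin.flush()
        proc.stdout.readline()
        start = time.perf_counter()
        for _ in range(n):
            proc.stdin.write(REQUEST)
            proc.stdin.flush()
            if proc.stdout.readline() != REQUEST:
                raise RuntimeError("unexpected reply from child")
        return time.perf_counter() - start
    finally:
        proc.stdin.close()  # child sees EOF and exits
        proc.wait()


if __name__ == "__main__":
    n = 5_000
    print(f"in-process:    {in_process(n):.4f} s for {n} requests")
    print(f"cross-process: {cross_process(n):.4f} s for {n} requests")
```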
densh 23 hours ago
SQLite is essentially a user-space queryable file system, and it can be faster than writing to the file system directly when working with small files.
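A minimal sketch of this point using Python's built-in sqlite3 module: many small "files" stored as rows in one database and written in a single transaction, so there is one commit rather than an open/write/close per file. The table layout and function names are assumptions for illustration. As andix notes in reply, all of this runs inside the calling process.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a single-file store holding many small 'files' as rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB NOT NULL)"
    )
    return conn


def write_many(conn: sqlite3.Connection, files: dict[str, bytes]) -> None:
    # One transaction for the whole batch: a single commit instead of
    # one open/write/close (and possibly fsync) per file.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)", files.items()
        )


def read_file(conn: sqlite3.Connection, path: str) -> bytes:
    """Fetch one file's bytes, mirroring a filesystem's 'not found' error."""
    row = conn.execute("SELECT data FROM files WHERE path = ?", (path,)).fetchone()
    if row is None:
        raise FileNotFoundError(path)
    return row[0]
```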
andix 22 hours ago
SQLite is in-process. A user space file system is another process. Like Postgres if we want to compare fs with dbs. And Postgres is slow for many small queries, like a userspace file system.
dspig 1 day ago
That's such a HN title!
nine_k 1 day ago
And the content. A number of smart design decisions, and analysis of what was wrong with the previous versions.
Also, it's a great illustration of the benefits of the layered, modular design that Linux sports: it allows you to mix and match parts to build what you need.
menno-sh 1 day ago
Sounds like a title you’d read on the satirical version of HN in a TV show like Silicon Valley
nathanmills 1 day ago
[flagged]
moralestapia 1 day ago
>Every file operation inside the VM had to bounce out to the host through FUSE
Lol, yeah that was your mistake. FUSE is a phenomenal idea but anyone who has used it knows how slow it can be.
fh973 24 hours ago
FUSE is not slow. Our distributed file system pipes over 70 GB/a through a single mount point.
throw1234567891 23 hours ago
70GB/s or 70 Gbps?
uneekname 22 hours ago
OP said 70GB/a so I'm gonna assume that's gigabytes per annum /j
himata4113 1 day ago
I learned that first-hand with my agentic workflow: it took 1 hour to compile Rust instead of 6 seconds.
fc417fc802 23 hours ago
I don't think you can blame that on FUSE in general. If not some quirk of your local setup then maybe the particular implementation you were using - what sort of volume was it?
himata4113 22 hours ago
it was dirty pages, kernel was artificially throttling it inside a sandbox.
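For context on "dirty pages": Linux throttles writing processes once dirty page-cache memory exceeds a threshold set by vm.dirty_ratio (or vm.dirty_bytes), and starts background writeback at vm.dirty_background_ratio. Below is a sketch for checking how close a system is, reading /proc/meminfo and /proc/sys/vm. It is Linux-only, and inside a container or VM these files may describe the host or the guest rather than the sandbox's own limits (cgroup v2 reports per-cgroup dirty memory as file_dirty in memory.stat). The function names are hypothetical.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_snapshot(proc: Path = Path("/proc")) -> dict[str, int]:
    """Dirty/Writeback memory (kB) plus the vm.dirty_* ratios (percent). Linux only.

    If vm.dirty_bytes is set, the kernel reports vm.dirty_ratio as 0.
    """
    mem = parse_meminfo((proc / "meminfo").read_text())
    vm = proc / "sys" / "vm"
    return {
        "Dirty_kB": mem.get("Dirty", 0),
        "Writeback_kB": mem.get("Writeback", 0),
        "dirty_ratio_pct": int((vm / "dirty_ratio").read_text()),
        "dirty_background_ratio_pct": int((vm / "dirty_background_ratio").read_text()),
    }
```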
goneri 1 day ago
Especially compared to direct virtio access to physical volume.
nasretdinov 24 hours ago
If your problem could be solved without FUSE, it probably should!
appcypher 22 hours ago
that's the tldr. we used fuse and we learnt we shouldn't for a sandbox filesystem
cassianoleal 1 day ago
Is anyone here using this software? How do you integrate it with your agent workflow? Do you run agents in editor (Zed, VS Code, Cursor, whatever)?
Have you tried the sync feature?
Edit: FML why is this being downvoted? At least have the decency of explaining, I'm happy to adjust my conduct but I can't do so if I don't know what I did wrong.
toksdotdev 23 hours ago
it depends on where you want your agent to live. inside the sandbox: start a sandbox via the CLI and run your agents/do your dev in there. outside the sandbox: configure your harness to use the MCP / skill integration. here's the guide: https://docs.microsandbox.dev/getting-started/agents
if you're building a harness, the SDK provides better integration. let me know if you hit any blockers.
for sync, it's currently in the works.
cassianoleal 20 hours ago
Awesome, thanks!
If I run the agent outside and use the MCP, is it the model's responsibility to actually develop in the sandbox or are there deterministic guardrails against performing activities outside of it?
Enginerrrd 1 day ago
I can only guess at the actions of others, but I would guess it’s because your comment is a tangent and at best only vaguely related to the featured article?
The article is really about solving a particular problem with the backend of their infrastructure. Discussion about VMs, Linux kernel syscalls, file systems (virtual, FUSE, etc) would all be relevant.
Your comment is a question about whether and how people use the software itself, which is pretty unrelated to the article.
It’s a bit like an article about Porsche identifying a particular engineering nuance in their fuel injectors, and how things didn’t work the way they thought at a low level, and how they solved it once they realized it. And then you come in with a comment about what people like to do with their Porsches. Like, sure, it involves the same company but what would that have to do with the underlying article on automotive engineering?
Combine that with a growing disdain for the insistence of certain segments of the tech scene on making everything about agentic workflows (an echo of the constant evangelism of cryptocurrencies or blockchain in the recent past), and you have a recipe for downvotes.
cassianoleal 1 day ago
This is pretty common on this forum though. Many times the comments section becomes mostly about things that are not necessarily directly related to the article but remain related to the bigger thing the article is about.
Oh well. :) Thanks for your insight anyway.
Enginerrrd 1 day ago
You’re not wrong, and this is speculation, but I suspect you’re just missing the subtext I added in my edit: that some people are burnt out on the evangelism of agentic workflows in the same way they were about blockchain or whatnot.
wizzwizz4 1 day ago
People are generally sick of AI, and of people who bring AI up in every single comment thread. The downvoters may not realise that TFA is by an AI company, about an AI product, making your comment tangentially relevant, and not (strictly-speaking) an example of the behaviour they're fed up with.
cassianoleal 1 day ago
Heh yeah it very well could be. I am also fed up with AI everywhere but I don't go out downvoting everyone and everything who mentions it - and definitely not where the whole context is about it.
jmclnx 23 hours ago
The site is impossible for me to read due to the colors. I went to lynx and it looks like it is about a file system in a VM.
toksdotdev 22 hours ago
thanks, that's useful feedback. is this also the case in light mode? i'll take a look and tighten the contrast.
throw1234567891 23 hours ago
Could you not find the reader view button?
PunchyHamster 24 hours ago
You made your filesystem 47x slower by NIHing it
Your coworker/other account messed it up last time you submitted it too: We made our sandbox filesystem 47× faster by deleting it https://news.ycombinator.com/item?id=48195883
> Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize.
You're linkbaiting: the opposite of the guideline.
https://news.ycombinator.com/newsguidelines.html