The memory forking is really interesting. I wonder whether copy-on-write at the VM level, O(1) with respect to machine size, will still scale cost with the number of forks you take, but 320ms median seems good for the branch-and-explore pattern without reprovisioning every time.
One gap I'm noticing in these comments and in the current sandbox landscape is Windows. Every platform mentioned in these comments (E2B, Daytona, Fly Sprites, Sandflare) appears to be Linux-native. That makes sense for coding agents targeting Debian environments, but a real category exists in automating Windows-specific workflows: enterprise software, ERP systems, anything that runs only on Windows.
If anyone wants to run agents on Mac or Linux and needs to access Windows for computer use, Dexbox could be helpful. [github.com/getdexbox/dexbox]
I launched an open source developer tool called Dexbox to run agent workloads by quickly provisioning and running Windows desktops. It's a CLI and MCP experience that's different from Freestyle, but closer to our Windows-specific production infra, Nen. I like Freestyle's cool UI that shows off its unique technical approach and developer friendliness; Nen's a bit closer to that experience.
benswerd 2 hours ago [-]
It's actually almost O(1) with respect to fork count. We have some O(N) behaviors, but I expect to be able to remove those in the next 6 months and get to fully horizontal O(1) forking for any VM at any fork count.
TheTaytay 1 days ago [-]
Wow, forking memory along with disk space this quickly is fascinating! That's something that I haven't seen from your competitors.
If the machine can fork itself, it could allow for some really neat auto-forking workflows where you fuzz the UI testing of a website by forking at every decision point. I forget the name of the recent model that used only video as its latent space to control computers and cars, but they had an impressive demo where they fuzzed a bank interface by doing this, and it ended up with an impressive number of permutations of reachable UI states.
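A sketch of that fork-at-every-decision-point fuzzing idea. Everything here is a toy stand-in: `fork` and `step` are placeholders for a real sandbox fork and a real UI action, and a "state" is just the click path taken so far.

```python
def fuzz_ui(initial, actions, fork, step, max_depth=2):
    # Breadth-first exploration: fork the session at every decision point
    # rather than replaying the whole flow from scratch for each branch.
    seen = {initial}
    frontier = [initial]
    for _ in range(max_depth):
        next_frontier = []
        for state in frontier:
            for action in actions:
                child = step(fork(state), action)
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return seen

# Toy stand-ins: a "UI state" is the tuple of clicks taken so far.
states = fuzz_ui((), ["submit", "cancel"], fork=tuple, step=lambda s, a: s + (a,))
```

With 2 actions and depth 2 this enumerates 7 reachable states; swap the stubs for real fork/interact calls and the same loop enumerates reachable UI permutations.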
benswerd 1 days ago [-]
That’s what I’m hoping for!
BlueRock-Jake 2 hours ago [-]
A ton of people have mentioned this, but what you're doing with memory forking is pretty unique. Most sandboxes seem to just fork the filesystem and call it a day. Forking full VM memory mid-execution takes it to another level entirely. Would be very interested to hear how the implementation looks under the hood, specifically how you handle dirty memory pages across forks without the pause ballooning.
_pdp_ 23 hours ago [-]
Nice work.
However, 50 concurrent VMs is not a lot. Similar limits exist on all cloud providers, except perhaps AWS, where the cost is prohibitive and it is slow.
Earlier this year, we ended up rolling our own. It is nothing special. We keep X number of machines in a warm pool, everything backed by a cluster of Firecracker VMs. There is no boot time that we care about: every new sandbox gets a VM instantaneously as long as the pool is healthy.
kjok 23 hours ago [-]
Thanks for sharing your approach!
> It is nothing special. We keep X number of machines in a warm pool.
I'd love to better understand the unit economics here. Specifically, whether cost is a meaningful factor.
The reason I ask is that many startups we've seen focus heavily on optimizing their technology to reduce cold/boot startup times. As you pointed out, perceived latency can also be improved by maintaining a warm pool of VMs.
Given that, I'm trying to determine whether it's more effective to invest in deeper technical optimizations, or to address the cold start problem by keeping a warm pool.
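One way to frame that trade-off is a back-of-envelope model. All numbers below are made up purely for illustration; the point is that warm-pool overhead is the idle fraction of the pool, which you then weigh against the engineering cost of shaving cold starts.

```python
def warm_pool_overhead_per_hour(pool_size, vm_hourly_cost, utilization):
    # The idle fraction of the warm pool is pure overhead: machines you
    # pay for but that aren't serving a sandbox at any given moment.
    return pool_size * vm_hourly_cost * (1 - utilization)

# Hypothetical numbers: 50 warm VMs at $0.05/hr, 60% average utilization.
overhead = warm_pool_overhead_per_hour(pool_size=50, vm_hourly_cost=0.05, utilization=0.6)
```

Under these made-up numbers the pool costs $1/hour in idle capacity; if that is small relative to revenue per sandbox, the warm pool wins over deep boot-time optimization.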
benswerd 23 hours ago [-]
50 is not heavy; what is heavy is 1000 VMs that can be paused and brought back, 50 at a time, in 1 second.
Though generally, ya, handrolling this stuff can work at the scale of 50 VMs; it becomes a lot harder once you hit hundreds/thousands.
sonink 10 hours ago [-]
Congratulations on the launch !
We run upwards of a thousand sandboxes for coding agents - but these are all standard VMs that we buy off the shelf from Azure, GCP, Akamai and AWS. I am not sure why we should use this instead of the standard VMs. Pricing could be one part, but not sure if the other features resonate.
Forking is interesting, but I would need to know how it works and if it is in the blast radius of the agent execution. If we need to modify the agent to be cognizant of forking, then that is a complexity which could be very expensive to handle in terms of context. If not, then I am not sure what is the use for it.
Sandbox start time at 500ms is definitely interesting, but it's something we are already on track to reproduce with a pooled batch of VMs. So not sure if that in itself is worth paying a premium for.
My two cents on the space is that agents are rapidly becoming capable enough to just use the tooling developed for humans. All clouds provide a CLI which agents can already use to orchestrate - they should just use the VMs designed for humans through the CLI. Our agent can already 'login' to any VM on the cloud and use the shell exactly like a human would. No software harness is required for this capability. The agent working on a VM is indistinguishable from a human.
stingraycharles 1 days ago [-]
I’m super interested since it seems like you have given everything a lot of thought and effort but I am not sure I understand it.
When I’m thinking of sandboxes, I’m thinking of isolated execution environments.
What does forking sandboxes bring me? What do your sandboxes in general bring me?
Please take this in the best possible way: I'm missing a use case example that's not abstract and/or small. What's the end goal here?
benswerd 1 days ago [-]
So isolation is correct. Forking a sandbox gives you multiple exact duplicates of isolated environments.
When your coding agent has 10 ideas for what to do, to evaluate them correctly it needs to be able to evaluate them in isolation.
If you're building a website-testing agent and it's halfway through a site (a form half filled out, a session ongoing, etc.) and it realizes it wants to test 2 things in isolation, forking is the only way.
We also envision this powering the next generation of dev cycles: "AI agent, go try these 10 things and tell me which works best." The AI forks the environment 10 times, gets 10 exact copies, does the thing in each of them, evaluates, then takes the best option.
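That fork/evaluate/pick-best loop is easy to sketch. `fork_vm` and `run_in_vm` below are stubs standing in for a real sandbox SDK (which this thread doesn't specify), just to show the control flow:

```python
def fork_vm(base_state):
    # A real fork would be copy-on-write at the VM level; a shallow dict
    # copy stands in for it here so each idea gets an isolated copy.
    return dict(base_state)

def run_in_vm(vm, idea):
    # Stand-in for "apply one candidate change and score the result".
    vm["applied"] = idea
    return len(idea)  # toy score: longer idea name "wins"

def explore(base_state, ideas):
    # One fork per idea so evaluations can't contaminate each other;
    # keep the highest-scoring fork.
    results = [(run_in_vm(fork_vm(base_state), idea), idea) for idea in ideas]
    return max(results)

best_score, best_idea = explore({"repo": "app"}, ["fix-a", "refactor-bb", "patch-c"])
```

The interesting property the pattern relies on is only that forks are cheap and isolated; the scoring function is where all the real work lives.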
stingraycharles 15 hours ago [-]
You should focus much more on this aspect; it makes so much more sense, but it's a very specific, narrow use case: multiple solution spaces must be explored in parallel, and then reconciled.
I can also see this being more of a framework / library that integrates into existing LLM frameworks than a SaaS; I wouldn’t switch my whole application to a different framework / runtime just for this.
benswerd 15 hours ago [-]
This is a good note. We've never been great at explaining what we're doing and plan to do a lot more work on making it accessible/make sense.
mememememememo 16 hours ago [-]
The other way might be separate testing VMs vs agent VMs, but that would be slower: to "fork", it would need to re-run the test up to that point. It wouldn't need agent context, though.
The forking you provided adds a lot more speed.
benswerd 16 hours ago [-]
That, plus it's not always simple to replicate state. A QA agent in the future could run for hours to trigger an edge case that wouldn't happen again even if all the actions to get there were theoretically retaken.
That can happen via race conditions, edge states, external service bugs.
indigodaddy 1 days ago [-]
Yep, I can see this especially when the agent is spinning up test servers/smoke tests and you don't want those conflicting. How do we reconcile all the potential different git hashes though, upstream etc.? (This might have an easy answer; I'm not super proficient with git, so forgive me.)
benswerd 1 days ago [-]
So we recommend branch per fork, merge what you like.
You have to change the branch on each fork individually currently, and that's unlikely to change in the short term due to the complexity of git internals, but it's not that hard to do yourself: `git checkout -b fork-{whateverDiscriminator}`
chrisweekly 23 hours ago [-]
Have you considered git worktree?
benswerd 22 hours ago [-]
Great for simple things, but git worktrees don't work when you have to fork processes like postgres/complex apps.
ghm2199 18 hours ago [-]
For postgres there are pg containers; we use them in pytest fixtures for thousands of unit tests running concurrently. I imagine you could run them for integration test purposes too. What kind of testing would you run with these that can't be run with pg containers, or isn't covered by conventional testing?
I'll say this is still a quite useful win for browser control use cases and also for debugging their crashes.
vasco 15 hours ago [-]
> and it realizes it wants to test 2 things in isolation, forking is the only way
Why would forking be the only way, when humans don't work like that? You can easily try one thing, undo, try the second thing. Your way is a faster way potentially, but also uses more compute.
benswerd 15 hours ago [-]
This assumes you can retain the same state after an operation.
> "I wonder if this is slow because we have 100k database rows"
> DELETE FROM TABLE;
> "Woah its way faster now"
> "But was it the 100k rows, or was it a specific row?"
That's a great example of where drilling into bugs and recreating exact issues can be a real problem, and where testing the issues themselves can be destructive to the environment, leading to the need for snapshots and forks.
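The pattern being described, run the destructive probe on a fork and keep the live state intact, can be illustrated with sqlite3's online backup standing in for a VM fork (just an analogy, not how Freestyle works):

```python
import sqlite3

def fork_db(src):
    # sqlite3's backup() copies the live database into a fresh one,
    # playing the role of the VM fork in this sketch.
    dst = sqlite3.connect(":memory:")
    src.backup(dst)
    return dst

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE rows_t (id INTEGER)")
live.executemany("INSERT INTO rows_t VALUES (?)", [(i,) for i in range(1000)])

probe = fork_db(live)
probe.execute("DELETE FROM rows_t")  # destructive experiment, on the fork only
```

After the experiment the live database still holds all 1000 rows; the "was it the row count?" hypothesis got tested without destroying the state you spent hours reaching.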
vasco 15 hours ago [-]
Again, that is a problem of approach, not of compute. Compute just makes that faster; it doesn't make it possible. It's like saying the only way to do something is with threads: good for some use cases, bad for others, makes most faster, but it doesn't unlock much.
wsve 1 days ago [-]
Agreed, the thing I'd be most interested in is the isolated execution environment you mentioned. Agents running autopilot are powerful. Agents running unsupervised on a machine with developer permissions and certificates where anything could influence the agent to act on an attacker's behalf is terrifying
benswerd 1 days ago [-]
I recommend running the agent harness outside of the computer. The mental model I like to use is the computer is a tool the agent is using, and anything in the computer is untrusted.
jeremyjh 1 days ago [-]
I would recommend not giving an agent the full run of any computing environment. Do you handle fine-grained internet access controls and credential injection like OpenShell does?
benswerd 1 days ago [-]
I used to believe this, but I think the next generation of agents is much more autonomous and just needs a computer.
The work of a developer is open ended, so we use a computer for it. We don't try to box developers into small granular screwdrivers for each small thing.
That's what's coming to all agents: they might want to run some analysis in Python, generate a website/document in TypeScript, and store data in markdown files or in MongoDB. I expect them to get much more autonomous, and with that to end up just needing computers, like us.
jeremyjh 20 hours ago [-]
The difference is that I am not always legally liable for what a rogue developer does with their computer - if I had no knowledge of what they were up to and had clear policies they violated then I'm probably fine. But I'm definitely always liable for anything an agent I created does with the computer I gave it.
And while they are getting better I see them doing some spectacularly stupid shit sometimes that just about no person would ever do. If you tell an agent to do something and it can't do what it thinks you want in the most straightforward way, there is really no way to put a limit on what it might try to do to fulfill its understanding of its assignment.
croes 1 days ago [-]
The problem is the agent, which should be treated untrusted.
The computer isn’t the problem
benswerd 1 days ago [-]
Kind of. The chat logs of the agent are trustworthy, as should be any telemetry you have on it or coming out of the VM. Its behavior should be treated as probabilistic and therefore untrustworthy.
lll-o-lll 19 hours ago [-]
It’s untrustworthy because its context can be poisoned and then the agent is capable of harm to the extent of whatever the “computer” you give it is capable of.
The mitigation is to keep what it can do to “just the things I want it to do” (e.g. branch protection and the like, whitelisted domains/paths). And to keep all the credentials off its box and inject them inline as needed via a proxy/gateway.
I mean, that’s already something you can do for humans also.
shubhamintech 1 days ago [-]
I think you're one of the very few who actually support eBPF & XDP, which you do need when you're building low-level stuff. Plus the bare metal setup is like out of this world lol.
benswerd 1 days ago [-]
Tx it took a lot of work lol
vimota 1 days ago [-]
This is awesome - the snapshotting especially is critical for long running agents. Since we run agents in a durable execution harness (similar to Temporal / DBOS) we needed a sandboxing approach that would snapshot the state after every execution in order to be able to restore and replay on any failure.
We ended up creating localsandbox [0] with that in mind by using AgentFS for filesystem snapshotting, but our solution is meant for a different use case than Freestyle - simpler FS + code execution for agents all done locally. Since we're not running a full OS it's much less capable but also simpler for lots of use cases where we want the agent execution to happen locally.
The ability to fork is really interesting - the main use case I could imagine is for conversations that the user forks or parallel sub-agents. Have you seen other use cases?
Deterministic testing of edge cases. It can be really hard to recreate weird edge cases of running services, but if you can create them we can snapshot them exactly as they are.
stocktech 1 days ago [-]
I built something like this at work using plain Docker images. Can you help me understand your value prop a little better?
The memory forking seems like a cool technical achievement, but I don't understand how it benefits me as a user. If I'm delegating the whole thing to the AI anyway, I care more about deterministic builds so that the AI can tackle the problem.
benswerd 1 days ago [-]
So first, MicroVM != container, and a container is not a secure isolation system. I would not run untrusted containers on your nodes without extra hardening.
The memory forking was originally invented because for AI app builders and first-response-driven applications it's extremely important that they are instant (the difference between running `bun dev` and the dev server already being running).
However, it's much more generally applicable; Postgres is a great example of this. You can't fork the filesystem under Postgres and get consistency. Same thing with browser state, a weird server state, or anything that lives in memory. Memory forking gives a huge performance boost while snapshotting what's actually going on at one instant.
sabedevops 22 hours ago [-]
What does this protect you from that you’re exposed to by running a well-crafted rootless container on a system with SELinux or similar?
benswerd 22 hours ago [-]
Generally, kernel-level attacks and noisy-neighbor performance impacts on the security side.
On the functional side, without a kernel per guest you can't allow kernel access for things like eBPF, networking, nested virtualization, and lots of other important features.
Theoretically you can get fairly complete security via containers plus a gVisor setup, but at the expense of a ton of syscall performance and disabling lots of features (which is a 100% valid approach for many use cases).
Never tried them, I think the weird thing about VM providers is the difference really all is in the execution. These guys seem great in concept but I don’t know enough about how they properly work.
thepoet 4 hours ago [-]
Hi Ben, one of the founders of InstaVM here. Congrats on the launch!
Would love to give you a demo of InstaVM and trade notes. Let me know abhishek@instavm.io
_jayhack_ 1 days ago [-]
Would love to understand how you compare to other providers like Modal, Daytona, Blaxel, E2B and Vercel. I think most other agent builders will have the same question. Can you provide a feature/performance comparison matrix to make this easier?
benswerd 1 days ago [-]
I'm working on an article deep diving into the differences between all of us. I think the goal of Freestyle is to be the most powerful and most EC2 like of the bunch.
I haven't played around with Blaxel personally yet.
E2B/Vercel are both great hardware virtualized "sandboxes"
Freestyle VMs are built based on feedback from our users that things they expected to be able to do on existing sandboxes didn't work. A good example: Freestyle is the only provider of the above (haven't tested Blaxel) that gives users access to the boot disk, or the ability to reboot a VM.
tomComb 1 days ago [-]
And fly.io sprites
benswerd 1 days ago [-]
Fly.io sprites is the most similar to us of the bunch. They do hardware virtualization as well, have comparable start times and are full Linux. What we call snapshots they call checkpoints.
The big pros of Sprites over us is their advanced networking stack and the Fly.io ecosystem. The big cons are that Sprites are incredibly bare bones — they don't have any templating utilities. I've also heard that Sprites sometimes become unavailable for extended periods of time.
The big pros of Freestyle over Sprites is fork, advanced templating, and IMO a better debugging experience because of our structure.
knowsuchagency 1 days ago [-]
Thanks for the thoughtful response. I'm predominantly a self-hoster, but I think your product makes a lot of sense for a wide variety of users and businesses. I'm excited to try out freestyle!
benswerd 1 days ago [-]
Self hosting can be doable for constant small/medium-size workloads.
Freestyle/other providers will likely provide a better debugging experience, but that's something you can probably get past for a lot of workloads.
The time when you/anyone should think about Freestyle/anyone is when the load spikes/the need to create hundreds of VMs in short spikes shows up, or when you're looking for some of the more complex feature sets any given provider has built out (forks, GPUs, network boundaries, etc).
I also highly recommend self-hosting anything like this outside of your normal VPC. Sandboxes are the biggest possible attack surface, and it is a feature of ours that we're not in your cloud: if we mess up security, your app is still fine.
indigodaddy 1 days ago [-]
This is what I do (my project) for self hosting on a VPS/server:
Obviously your service/approach is different than exe, more like sprites but like you said more targeted/opinionated to AI coding/sandboxing tasks it looks like. Interesting space for sure!
kstenerud 20 hours ago [-]
I built yoloAI, which is a single go binary that runs anywhere on mac or linux, sandboxing your agents in disposable containers or VMs, nested or not.
Your agent never has access to your secrets or even your workdir (only a copy, and only what you specify), and you pull the changes back with a diff/apply workflow, reviewing any changes before they land. You also control network access.
Still WIP, but the core works — three rootfs tiers (minimal Ubuntu, headless Chromium with CDP, Docker-in-VM), OCI image support (pull any Docker image), automatic thermal management (idle VMs pause then snapshot to disk, wake transparently on next API call), per-user bridge networking with L2 isolation, named checkpoints, persistent volumes, and preview URLs with auto-wake.
Fair warning: the website is too technical and the docs are mostly AI-generated, both being actively reworked. But I've been running it daily on a Hetzner server for my AI agents' browser automation, and deploy previews.
I'd love any feedback if you want to go ahead and try it yourself
0123456789ABCDE 1 days ago [-]
Sprites have been weird lately; I think fly.io is having trouble with capacity in various locations.
Is the experience similar? Can I just get a console to one machine, work for a bit, log out, come back later, continue?
How does cost work if I log into a machine and do nothing on it, just hold the connection?
benswerd 1 days ago [-]
This will just work on us.
We do auto suspend depending on your configured timeout. We'll pause your VM and when you come back the processes will be in the exact same state as when you left.
tomComb 24 hours ago [-]
But your pricing page suggests that that is not available without a subscription: in the on-demand pricing section "persistent Snapshots" and "Persistent VM's" have an 'x'.
benswerd 21 hours ago [-]
We do not allow long term persistence for the free tier.
This is purely a defense mechanism: I don't want to guarantee storing the data of an entire VM forever for non-paying users. We have persistence options for them, like sticky persistence, but it doesn't come with the reliability of long-term persistent storage.
tomComb 20 hours ago [-]
But it wouldn’t be non paying customers. That was from the on demand section. I just want to pay for what I use without getting into a subscription.
benswerd 20 hours ago [-]
Ah I see. This is very interesting but not what we're focused on right now. I will keep this in mind for future prioritization.
rsyring 1 days ago [-]
I'd also be interested in a comparison with exe.dev which I'm currently using.
benswerd 1 days ago [-]
Exe.dev is an individual-developer-oriented service. Freestyle is more oriented at platforms building the next exe.dev.
That's why our pricing is usage based and we have a much larger API surface.
MarcelinoGMX3C 1 days ago [-]
The technical challenges in getting memory forking to deliver those sub-second start and fork times are significant. I've seen the pain of trying to achieve that level of state transfer and rapid provisioning. While "EC2-like" gets the point across for many, going bare metal reveals the practical limits of cloud virtualization for high-performance, complex workloads like these. It shows a real understanding of where cloud abstraction helps and where it just adds overhead.
The cost argument for owning the hardware for this specific use case also makes sense, considering the scale these agent environments will demand. Also worth noting, sandboxes are effectively an open attack surface; architecting them not to be in your main VPC is a sound security decision from the start.
cheema33 23 hours ago [-]
I currently use lightweight VMs (Proxmox containers) and git worktrees. I can fork an existing VM in seconds. It is not entirely clear to me what I would gain from using your solution.
benswerd 22 hours ago [-]
Proxmox forking in a few seconds is a miracle!
These are likely only a better value for you at large scale, or if you start wanting to run hundreds.
orliesaurus 7 hours ago [-]
There are many providers popping up every day offering sandboxes. I think Cloudflare is ahead of the game for pricing and performance; that being said, it would be super nice to see a big competitor analysis:
Cloudflare vs e2b vs daytona vs freestyle vs whatever else
n2d4 1 days ago [-]
Cool! I've been using your API for running sandboxed JS. Nice to see you also support VMs now.
> we mean forking the whole memory of it
How does this work? Are you copying the entire snapshot, or is this something fancy like copy-on-write memory? If it's the former, doesn't the fork time depend on the size of the machine?
benswerd 1 days ago [-]
We're using copy on write with the memory itself. Fork time is completely decoupled from the size of the machine.
Creating full snapshots takes a 2-4 second interruption of the VM due to sheer IO, which we didn't want here.
What's especially cool about this approach is that fork time is not only O(1) with respect to machine size, but also O(1) with respect to the number of forks.
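As a toy picture of why layered copy-on-write gives O(1) forks in both dimensions, here is Python's `ChainMap` standing in for page tables. This is purely an illustration of the layering idea, not Freestyle's implementation:

```python
from collections import ChainMap

def fork(mem):
    # O(1) regardless of how many "pages" mem holds or how many forks
    # already exist: a fork is just a fresh private write layer stacked
    # over the shared read-only layers underneath.
    return ChainMap({}, *mem.maps)

base = ChainMap({f"page{i}": 0 for i in range(100_000)})
child = fork(base)
child["page0"] = 1  # copy-on-write: the write lands in the child's layer only
```

Reads fall through to the shared layer until a page is written, so the base and all siblings stay untouched no matter how many forks exist.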
Just want to say that even if alternatives exist (not necessarily exact capabilities obviously), I appreciate what seems to be genuine excitement on your part of having built something cool / best in class.
So best of luck with your vision for it!
brap 15 hours ago [-]
Very nice, congrats!
One thing:
>Freestyle is the only sandbox provider with built-in multi-tenant git hosting — create thousands of repos via API and pair them directly with sandboxes for seamless code management.
Maybe I’m just stupid, but I don’t know what this means. I initially thought I’m your target audience but after failing to understand this part I’m thinking maybe I’m not? I honestly don’t know.
benswerd 15 hours ago [-]
If git isn't for you, we'd still love to support you. We believe that to build the sandboxes for coding agents you also need to provide git repos for them, so we do that as well. You can easily say "give me this VM with these 3 repos and these permissions" with us.
But that said, the sandbox stands on its own without it.
stingraycharles 13 hours ago [-]
It’s difficult to understand the content and what the product actually is, as I’ve also mentioned in another reply. I think the product is probably great, but you need to improve the communication, it’s too abstract.
I don’t know what “give each sandbox a unique git repository” does for me in practice, what problem it solves.
You’re not providing any practical problems your product is intended to solve.
ewidar 14 hours ago [-]
I think you should just explain that part more clearly: why would they want you to host git repos on their behalf.
benswerd 13 hours ago [-]
Freestyle isn't designed for an individual engineer working on their GitHub repos. It's designed for platforms building coding agents that want to take the place of GitHub altogether. Those platforms need some source of truth alongside the VMs, just like how you don't store all of your important documents on your personal computer. That is why we offer git.
nyellin 1 days ago [-]
Is it possible to run a Kubernetes cluster inside one? (E.g. via KIND.)
If so, we'd very much like to test this. We make extensive use of Claude Code web but it can't effectively test our product inside the sandbox without running a K8s cluster
benswerd 23 hours ago [-]
Yes! You can def run something like K3s in these VMs.
skybrian 1 days ago [-]
It doesn't seem very easy to calculate how much it would cost per month to keep a mostly-idle VM running (for example, with a personal web app). The $20/month plan from exe.dev seems more hobbyist-friendly for that. Maybe that's not the intended use, though?
benswerd 1 days ago [-]
We're not going after hobbyists. We're building the platform for companies like exe.dev to build on. That's why it's all usage based.
That said, our $50 a month plan can be used by an individual for their coding agents, but I wouldn't recommend it.
indigodaddy 1 days ago [-]
Ooof, if you are the middleman platform then it's sure gonna get expensive for the end user
rvz 1 days ago [-]
> The $20/month plan from exe.dev seems more hobbyist-friendly for that. Maybe that's not the intended use, though?
And you can go even below that by self-hosting it yourself with a very cheap Hetzner box for $2 or $5.
skybrian 1 days ago [-]
Can you start up multiple VM's easily on a Hetzner box?
umarcyber 1 days ago [-]
Your UI design is really nice.
lukebaze 15 hours ago [-]
The observability point is real but honestly the loop detection problem is more about how you structure your agent than the sandbox. When I've had agents go rogue, the issue was always the outer loop logic, not visibility into the VM. What does your current loop controller look like?
CompuIves 12 hours ago [-]
This is really cool to see, reminds me of the early days of CodeSandbox. Though this API looks _fantastic_. I love that you do VM configuration using `with`.
benswerd 4 hours ago [-]
We read your blogs when building all of this!
jFriedensreich 1 days ago [-]
Non-open-source, non-local SaaS sandboxes are offensive to even try to launch. No one needs this, and the only customers will be vibe coders who just don't know any better. There are teams building actual sandboxes like smolmachines, podman, colima and mre. At least be honest and put the virtualization tech you are using, and the fact that it's closed-source SaaS, on the landing page to save people time.
benswerd 1 days ago [-]
Our users are platforms, and many of the best already build on us.
Self hosting is a valuable feature but our technology is unfriendly to small nodes — it will not work on consumer hardware. Many of the optimizations we spend our time on only seriously kick in above 2TB of storage and above 500GB of RAM.
senko 22 hours ago [-]
> Non-open-source, non-local SaaS sandboxes are offensive to even try to launch. No one needs this, and the only customers will be vibe coders who just don't know any better.
This is simply not true, but also not a very charitable take.
alasano 21 hours ago [-]
Your comment could have been "I prefer these open source alternatives:" but you chose to be a hater.
There's nothing wrong with offering services that people find useful.
brap 15 hours ago [-]
Apparently it’s offensive to even try to make things people want
csomar 14 hours ago [-]
> No one needs this
VMWare was acquired for $69Bn.
k38f 16 hours ago [-]
500ms fork of a running VM with full memory state is the kind of thing I'd assume wasn't possible until I saw it work. What does failure look like — does the fork just not happen, or can you get partial state?
benswerd 16 hours ago [-]
There is no partial state really possible. We can run out of space on a node and just say no. But the nature of memory forking is that if you don't do it literally 100% right, it crashes immediately (I know because it took me a while to get it right).
ianberdin 23 hours ago [-]
Congrats guys!
Would you share some technical details? I bet you have great stories to tell.
So, what is forking? You completely copy the disk, make a RAM snapshot and run it? If CoW, what about RAM? You mentioned 8GB RAM VMs. Sounds impossible to copy 8GB in under 500ms, and the disk too?
benswerd 23 hours ago [-]
So fork time is actually O(1) with VM size; it's 500ms even for 64GB plus disk. We're using some pretty weird CoW techniques to pull it off.
ghm2199 18 hours ago [-]
O(1)? What! What might bring it down to, say, tens of ms? Looks like it's some kind of optimizable wall if it's 500 for everything.
With 10ms, online replication/backup — analogous to Litestream for SQLite, but for in-memory processes — becomes feasible, no?
benswerd 17 hours ago [-]
Our median is actually under 500ms (~320ms); I just didn't want to piss off Hacker News by overstating.
We have another set of optimizations that we believe can take us to ~200ms in the next few months but beyond that we're pretty much completely stuck.
Realistically, other sandboxes will be able to get there before us, because we've chosen to support so much of Linux; if you don't run a full operating system, or don't support custom snapshots, it's much easier.
ianberdin 22 hours ago [-]
Insane. Is it possible to fork to another bare metal machine? Maybe multi-region, like fly.io?
If not, I bet you have huge disks on your machines to store all the snapshots (you said you store them and bill only for disk space).
benswerd 22 hours ago [-]
So forking across nodes at that speed is not possible; we run extremely beefy nodes in order to avoid moving VMs across nodes as much as possible.
We are researching systems for hot-moving VMs across nodes, but it would have very different performance characteristics.
ianberdin 22 hours ago [-]
Yeah, I see.
Is it possible to get a corrupted state? Let's say we had a realtime database actively writing at that moment?
benswerd 22 hours ago [-]
It is impossible.
Our tech is not decades old, so there is a chance we've missed something, but our layer management is atomic, so I'd be shocked if you were able to corrupt state across forks/snapshots.
skybrian 1 days ago [-]
Any ideas for locking down remote access from an untrusted VM? Cloudflare has object-based capabilities and some similar thing might be useful to let a VM make remote requests without giving it API keys. (Keys could be exfiltrated via prompt injection.)
benswerd 1 days ago [-]
So there are 3 solutions to this; Freestyle supports 2 of them:
1. Freestyle supports multiple Linux users. All Linux users on the VM are locked down, so it's safe to have a part of the VM that holds your secret keys/code that the other parts cannot access.
2. A custom proxy that routes the traffic with the keys outside
3. We're working on a secrets api to intercept traffic and inject keys based on specific domains and specific protocols starting with HTTP Headers, HTTP Git Authentication and Postgres. That'll land in a few weeks.
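The key-injection pattern in option 3 can be sketched without any real proxy machinery. The point is that the proxy lives outside the sandbox and is the only thing holding real credentials; the VM never sees them, and unknown destinations get nothing. The `SECRETS` table, header names, and hosts below are illustrative, not Freestyle's actual API:

```python
# Hypothetical domain -> (header name, credential) table held by the
# proxy, outside the sandbox. The agent inside the VM never sees it.
SECRETS = {
    "api.github.com": ("Authorization", "Bearer ghp_real_token"),
    "db.internal": ("X-Api-Key", "s3cret"),
}

def inject(host: str, headers: dict) -> dict:
    """Return a copy of `headers` with the real credential attached
    for allow-listed hosts.

    Unknown hosts pass through untouched, so a prompt-injected agent
    can't coax keys for arbitrary destinations out of the proxy.
    """
    out = dict(headers)  # never mutate the caller's headers
    entry = SECRETS.get(host)
    if entry:
        name, value = entry
        out[name] = value
    return out

print(inject("api.github.com", {"Accept": "application/json"}))
print(inject("evil.example", {"Accept": "application/json"}))
```

The same shape generalizes to the non-HTTP protocols mentioned (git auth, Postgres): match the destination, splice the credential in transit, never hand it to the VM.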
siscia 20 hours ago [-]
It is not clear to me how much CPU I get.
"Unlimited" as in 8vCPU and then I am billed for it on consumption?
benswerd 20 hours ago [-]
Billed for wall time. Whichever plan you're on comes with credits (the hobby plan gets $50 of credits), and beyond that you're billed per-CPU wall time.
bhaktatejas922 14 hours ago [-]
do you think the industry is overfixated on startup times? what are better metrics people building with sandboxes should pay attention to
benswerd 14 hours ago [-]
So first, I don't; I think startup times are fundamentally important. 5s is different from 1s is different from 500ms is different from 200ms, and users notice.
I don't think people run real-world benchmarks on what that cold start really means, though. For example, time to first response from a Next.js app is a very important benchmark for Freestyle and we've spent a lot of time on it. While Daytona sandboxes boot faster than Freestyle ones, our first response is an order of magnitude ahead of theirs.
I think another important one is concurrency: in worst-case scenarios, how many VMs can you get from a provider in a 5-second period?
I also think not enough time is spent on "does it actually work on this VM": stuff like Postgres, Redis, nftables, and complex Linux binaries that are hard to run need to work on these sandboxes, because AI is going to need them, and I don't think there has really been a feature-bench system yet.
Networking/snapshotting/persistence characteristics all also need to come into this.
jbethune 1 days ago [-]
Congratulations on the launch! Will definitely test this out.
lawrencechen 22 hours ago [-]
Can you develop freestyle in freestyle vms?
benswerd 21 hours ago [-]
Yessir, we haven't mastered it yet but we've compiled the kernel with enough flags for stuff like nftables and KVM to make it possible.
jnstrdm05 1 days ago [-]
how many seconds to provision are we talking about here? 1 sec vs 60 is a dealbreaker for me, some clarity on that would be nice.
benswerd 1 days ago [-]
500ms. Less than 1 second. We're aiming to get that down to 200ms in the next 3 months.
rasengan 1 days ago [-]
Interesting!
We're working on a similar solution at UnixShells.com [1]. We built a VMM that forks, and boots, in < 20ms and is live, serving customers! We have a lot of great tools available, MIT-licensed, on our GitHub repo [2] as well!
[1] https://unixshells.com
[2] https://github.com/unixshells
Can your service scale ram? like the way docker desktop does. Manual is fine.
benswerd 1 days ago [-]
yep you can choose ram + disk + cpu size
tomComb 24 hours ago [-]
? You say 'yes' but you seem to be answering a different question. Docker desktop only makes me choose a max ram - it dynamically scales RAM usage. I don't need fully automatic like that, but the ability to vertically scale RAM for an existing instance is really important, particularly given the cost of RAM these days.
benswerd 23 hours ago [-]
Ah, we cannot do this without a restart. Hot-pluggable RAM is something I'm interested in, but it's currently a backburner feature.
csomar 14 hours ago [-]
I was intrigued to try but your web app is so extremely slow, it takes up to 30+ seconds to move from one tab to the next. Not exactly selling your point of being a super fast provisioning service. Another thing I am wondering. You seem to be selling this as VMs configurable from node/bun. Wouldn't a CLI make more sense here?
Another question: How hard do you think it'll be to integrate this with something like Claude Code. ie: /resume in claude code both return your session and wake up your vm. Or even better /resume from freestyle and have your claude code session open where you left it.
benswerd 13 hours ago [-]
I'm not sure what you saw as slow, I'd love to improve it. Do you mean the dashboard?
We're built as an API for platforms to build on, rather than a tool for individual developers; oriented at platforms orchestrating tens of thousands of VMs rather than individuals using a CLI. We do have a CLI, but it's primarily a debugging and testing tool.
Resuming a freestyle VM with claude code in it will just work. You can do that via SSH.
csomar 9 hours ago [-]
> I'm not sure what you saw as slow, I'd love to improve it. Do you mean the dashboard?
Switching tabs in the Dashboard (Domains/Routes/etc.) was basically unusable about 4 hours ago. It's noticeably better now, though there's still some latency (just retested).
Fraaaank 1 days ago [-]
Your pricing page is broken
benswerd 1 days ago [-]
Reviewing this now. Our public pricing at www.freestyle.sh/pricing seems to be working; can you point me in a more specific direction?
messh 24 hours ago [-]
Check out shellbox.dev, you can do pretty much the same, automating it all via ssh
maxmaio 1 days ago [-]
Congrats Ben and Jacob!
esseph 17 hours ago [-]
> In order to make this possible, we’ve moved to our own bare metal racks. Early in our testing we realized that moving VMs across cloud nodes would not have acceptable performance properties. We asked Google Cloud and AWS for a quote on their bare metal nodes and found that the monthly cost was equivalent to the total cost of the hardware so we did that.
Yes! And good on you, well-tuned bare metal performance is hard to beat.
fawabc 24 hours ago [-]
how does this differ from daytona or e2b?
benswerd 23 hours ago [-]
Generally, compared to those two, more powerful. Freestyle VMs are full Debian machines, with support for systemd, Docker-in-Docker, multiple users, hardware virtualization, etc. Daytona and E2B are both great "sandbox" providers, but they don't really feel like VMs; you can't run everything you could on an EC2 instance.
We also support the forking/snapshotting/long-running jobs that they struggle with.
ianberdin 24 hours ago [-]
Also modal.com, I saw a few more as well.
schopra909 1 days ago [-]
Honestly never considered the forking use case; but it makes a ton of sense when explained
Congrats on the launch. This is cool tech
holoduke 23 hours ago [-]
The problem with agents is that they are currently way too expensive. 100 times more expensive, maybe. Another big issue is the lack of interactivity with an agent. Therefore, for now, agentic development is only viable from your own machine, and there isolation is less of an issue and easier to manage.
benatkin 1 days ago [-]
It's hard to tell what this is or how it compares to other things that are out there, but what I latched onto is this:
> Freestyle is the only sandbox provider with built-in multi-tenant git hosting — create thousands of repos via API and pair them directly with sandboxes for seamless code management. On top of that, Freestyle VMs are full Linux virtual machines with nested virtualization, systemd, and a complete networking stack, not containers.
Edit: I realize the Loom is a way to look at it. Loom interrupted me twice and I almost skipped it. However it gave me a better idea of what it does, it "invents" snapshotting and restoring of VMs in a way that appears faster. That actually makes sense and I know it isn't that hard to do with how VMs work and that it greatly benefits from having only part of the VM writable and having little memory used (maybe it has read-only memory too?).
benswerd 1 days ago [-]
So the snapshotting tech is actually 100% independent of Git.
Git is useful for branching vs forking (IE you can't merge two VM forks back together), but all the tech I showed in the Loom exists independently from Git.
The hard part of it was making the VM large and powerful while making snapshotting/forking instant, which required a lot of custom VMM work.
benatkin 23 hours ago [-]
> The hard part of it was making the VM large and powerful while making snapshotting/forking instant, which required a lot of custom VMM work.
I don't find "large and powerful" in reference to a VM to sound compelling. What should be large? The memory? The root disk? As I alluded to in my comment, I'm more curious about what can be made small.
Also I'm skeptical that if I forked a vm running a busy Gas Town that it would be very light or fast in how it forks. A well behaved sqlite I could see, but then I'd wonder why not just fork the storage volume containing the database...
benswerd 22 hours ago [-]
So that's what we did. We've made forking a whole Gas Town performant, in 100s of milliseconds. Try it; you can definitely see it working on the free tier.
With respect to "large and powerful", RAM + disk size is important, but I was more so referring to full Linux power: the ability to run nested virtualization, eBPF, FUSE, and the powerful features of a normal Linux machine instead of a container.
benatkin 22 hours ago [-]
Well, that does sound pretty impressive then. And as a champion of open source, it wouldn't make me feel like I was getting locked in, because I could live with the regular speeds (on a server with KVM or a nested virtualization setup).
dominotw 1 days ago [-]
dumb question. none of these protect you from prompt injection, yes?
benswerd 1 days ago [-]
no, but the goal of these is that if you are faced with prompt injection, the worst-case scenario is the AI uses that computer badly.
dominotw 1 days ago [-]
unless I am misunderstanding, I'm not sure how this computer prevents secrets from my Gmail leaking. that's the worst case.
benswerd 1 days ago [-]
If you put your gmail credentials into a VM that an AI Agent dealing with untrusted prompts has access to they should be treated as leaked and be disabled immediately.
However, if you don't put your administrative credentials inside of the VM and treat it as an unsafe environment you can safely give it minimal permissions to access specific things that it needs and using that access it can perform complex tasks.
dominotw 1 days ago [-]
i am talking about this, not my gmail credentials.
I have so many interesting problems in AI, and sandboxing isn't one of them. It's a pointless exercise, yet disproportionately many people love to do this. Probably because sandboxing doesn't feel as magic as agents themselves, and more like the old times of "traditional" software development.
hobofan 1 days ago [-]
It is a mostly pointless exercise if the goal is trying to contain negative impact of AI agents (e.g. OpenClaw).
It is a very necessary building block for many common features that can be steered in a more deterministic way, e.g. "code interpreter" feature for data analysis or file creation like commonly seen in chat web UIs.
moezd 1 days ago [-]
Believe it or not, once you start working for a regulated industry, it is all you would ever think of. There, people don't care if you are vibing with the latest libraries and harnesses or if it's magic, they care that the entire deployment is in some equivalent of a Faraday cage. Plus, many people just don't appreciate it when their agents go rm -rf / on them.
iterateoften 1 days ago [-]
Yeah, idk, I guess it's interesting if you are an engineer looking for something to do.
But I see multiple sandbox-for-agents products a week. Way too saturated a market.
benswerd 1 days ago [-]
I disagree (as a sandboxing company).
With respect to the market, every single sandbox sucks. I'm not gonna shit-talk competitors, but there is not a good sandboxing platform out there yet, including mine, compared to where we'll be in 6 months.
We've heard that all the platforms have consistent uptime, feature-completeness, networking, and debugging issues. And on our own platform, we're not 1/10th of the way through solving the requests we've gotten.
Next generation of Agents needs computers, and those computers are gonna look really different than "sandboxes" do today.
tcdent 1 days ago [-]
I don't think you're wrong, but if you really want to re-think the approach, building an orchestration layer for Firecracker like every other company in the space is probably not it.
chwzr 7 hours ago [-]
Wonder what you are thinking of then?
Sattyamjjain 3 hours ago [-]
[dead]
philbitt 6 hours ago [-]
[dead]
techpulselab 12 hours ago [-]
[dead]
atlasagentsuite 21 hours ago [-]
[flagged]
benswerd 21 hours ago [-]
Freestyle is really built with this in mind. We propose a primary architecture built around declarative configuration of the VM, with a git repo as the external source of truth.
If the VM crashes/you have another idea/you want to try something else it should be reconstructable from outside of the VM.
However, I think this is potentially unrealistic. While it is the ideal architecture, I hear more and more every day people who just want to have the VMs run for months at a time.
sn9 21 hours ago [-]
What are some examples of this?
benswerd 21 hours ago [-]
CI Builders/QA Agents can do this very well. User session starts, bring VM up with content + dependencies, when session is done throw it away. Keeps it clean, debuggable, fast and cheap.
psychomfa_tiger 16 hours ago [-]
[flagged]
benswerd 16 hours ago [-]
TBH I wouldn't recommend using it for this. I'm a big believer in agent chat running outside of the VM, where you can get much better control over the chat loop. I would treat the VM as a tool the agent is using rather than the agent's environment. Like the agent is a human using a machine and watching it, rather than trying to watch it from inside the machine. Then there are great existing observability tools, my fav is langfuse.
brap 15 hours ago [-]
But doesn’t this defeat the purpose?
I would actually imagine this would be useful for observability, in the sense that you can fork and then kill the loop in the fork, then hop into an interactive session to figure out what it's doing while the loop is still running in the original instance.
benswerd 15 hours ago [-]
I don't believe so. While it is technically easy to fork Claude Code running in these VMs, it's not technically difficult to fork a conversation loop outside of the VM either.
What matters is that its all forked atomically, which can be done with resources outside of the VM as well.
brap 14 hours ago [-]
Fair enough, and I respect you pointing out the alternatives
mt18 9 hours ago [-]
[dead]
areys 14 hours ago [-]
[flagged]
maltyxxx 22 hours ago [-]
[dead]
nightrate_ai 22 hours ago [-]
[dead]
borakostem 1 days ago [-]
[flagged]
benswerd 1 days ago [-]
So this is an ongoing optimization point; no perfect solution exists. Freestyle VMs work with a network namespace and a virtual ethernet cable going into them, so they all think they have the same IP.
This means that while complex protocol connections like remote Postgres can break in the forks, stuff like Websockets just automatically reconnects.
bac2176 9 hours ago [-]
[flagged]
aplomb1026 1 days ago [-]
[dead]
randomtoast 13 hours ago [-]
[dead]
AlexanjaSenke 11 hours ago [-]
[dead]
n1tro_lab 1 days ago [-]
[dead]
johnwhitman 1 days ago [-]
[dead]
tuhgdetzhh 1 days ago [-]
[dead]
aaztehcy 22 hours ago [-]
[flagged]
Shin0221 17 hours ago [-]
The dispatch problem is real. With multiple agents running in parallel, the human ends up being the routing layer — deciding what goes where, when. Most setups solve the execution layer well but leave the intake layer as a manual process. Curious if anyone has found a clean solution for this.
However, 50 concurrent VMs is not a lot. Similar limits exist on all cloud providers, except perhaps AWS, where the cost is prohibitive and it is slow.
Earlier this year, we ended up rolling our own. It is nothing special. We keep X number of machines in a warm pool. Everything is backed by a cluster of Firecracker VMs. There is no boot time that we care about. Every new sandbox gets a VM instantaneously as long as the pool is healthy.
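The warm-pool pattern described above is simple enough to sketch. All names here are illustrative (`boot_vm` stands in for a multi-second real boot); the point is that acquiring a sandbox is just a queue pop, and a background refill keeps the pool at its target size:

```python
import itertools
import queue
import threading
import time

POOL_SIZE = 4
_ids = itertools.count(1)
pool: "queue.Queue[str]" = queue.Queue()

def boot_vm() -> str:
    # Stand-in for a real, slow VM boot.
    time.sleep(0.01)
    return f"vm-{next(_ids)}"

def refill() -> None:
    # Top the pool back up to POOL_SIZE warm instances.
    while pool.qsize() < POOL_SIZE:
        pool.put(boot_vm())

def acquire() -> str:
    vm = pool.get()  # effectively instant while the pool is warm
    # Replace the taken instance in the background.
    threading.Thread(target=refill, daemon=True).start()
    return vm

refill()
print(acquire())
```

The trade-off versus fast cold boots is straightforward: you pay for idle warm capacity, and a demand spike larger than the pool falls back to real boot latency.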
> It is nothing special. We keep X number of machines in a warm pool.
I'd love to better understand the unit economics here. Specifically, whether cost is a meaningful factor.
The reason I ask is that many startups we've seen focus heavily on optimizing their technology to reduce cold/boot startup times. As you pointed out, perceived latency can also be improved by maintaining a warm pool of VMs.
Given that, I'm trying to determine whether it's more effective to invest in deeper technical optimizations, or to address the cold start problem by keeping a warm pool.
Though generally ya, handrolling this stuff can work at the scale of 50 VMs, it becomes a lot harder once you hit hundreds/thousands.
We run upwards of a thousand sandboxes for coding agents, but these are all standard VMs that we buy off the shelf from Azure, GCP, Akamai, and AWS. I am not sure why we should use this instead of the standard VMs. Pricing could be one part, but I'm not sure the other features resonate.
Forking is interesting, but I would need to know how it works and whether it is in the blast radius of the agent execution. If we need to modify the agent to be cognizant of forking, then that is a complexity which could be very expensive to handle in terms of context. If not, then I am not sure what the use for it is.
Sandbox start time at 500ms is definitely interesting. But it's something we are already on track to reproduce with a pooled batch of VMs. So I'm not sure that in itself is worth paying a premium for.
My two cents on the space is that agents are rapidly becoming capable enough to just use the tooling developed for humans. All clouds provide a CLI which agents can already use to orchestrate; they should just use the VMs designed for humans through the CLI. Our agent can already 'login' to any VM in the cloud and use the shell exactly like a human would. No software harness is required for this capability. The agent working on a VM is indistinguishable from a human.
When I’m thinking of sandboxes, I’m thinking of isolated execution environments.
What does forking sandboxes bring me? What do your sandboxes in general bring me?
Please take this in the best possible way: I'm missing a use case example that's not abstract and/or small. What's the end goal here?
When your coding agent has 10 ideas for what to do, to evaluate them correctly it needs to be able to evaluate them in isolation.
If you're building a website-testing agent and you're halfway down a website, with a form half filled out, a session ongoing, etc., and it realizes it wants to test 2 things in isolation, forking is the only way.
We also envision this powering the next generation of dev cycles: "AI agent, go try these 10 things and tell me which works best." The AI forks the environment 10 times, gets 10 exact copies, does the thing in each of them, evaluates it, then takes the best option.
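The branch-and-explore loop described above fits in a few lines. Everything here is a stand-in: `Sandbox`, `fork`, and `score` are hypothetical, not Freestyle's SDK, and a dict copy plays the role of a CoW VM fork. The structure is what matters: each candidate runs in its own copy, and the base environment is never mutated.

```python
class Sandbox:
    """Toy stand-in for a forkable sandbox environment."""

    def __init__(self, state: dict):
        self.state = state

    def fork(self) -> "Sandbox":
        # Real forks are copy-on-write at the VMM level; a shallow
        # dict copy stands in for that here.
        return Sandbox(dict(self.state))

    def apply(self, idea: str) -> "Sandbox":
        self.state["applied"] = idea
        return self

def score(sandbox: Sandbox) -> int:
    # Stand-in evaluator: prefer the longest "idea".
    return len(sandbox.state.get("applied", ""))

base = Sandbox({"repo": "myapp"})
ideas = ["cache", "indexing", "denormalize tables"]

# Fork once per idea, apply each in isolation, keep the best.
candidates = [base.fork().apply(i) for i in ideas]
best = max(candidates, key=score)
print(best.state["applied"])
assert base.state.get("applied") is None  # base env untouched
```

With real VM forks, the loop is identical; only `fork()` and `score()` get expensive.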
I can also see this being more of a framework / library that integrates into existing LLM frameworks than a SaaS; I wouldn’t switch my whole application to a different framework / runtime just for this.
The forking you provided adds a lot more speed.
That can happen via race conditions, edge states, external service bugs.
You currently have to change the branch on each fork individually, and that's unlikely to change in the short term due to the complexity of git internals, but it's not that hard to do yourself: `git checkout -b fork-{whateverDiscriminator}`
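The per-fork branch switch above is one command, so a thin wrapper (illustrative, not a Freestyle API) that each fork runs against its own working copy is all it takes:

```python
import subprocess

def branch_fork(repo_dir: str, discriminator: str) -> str:
    """Create and switch to a fork-specific branch inside `repo_dir`.

    `discriminator` is whatever uniquely names this fork (an ID,
    a timestamp, etc.); the naming scheme is just an example.
    """
    name = f"fork-{discriminator}"
    subprocess.run(
        ["git", "checkout", "-b", name],
        cwd=repo_dir,
        check=True,
        capture_output=True,
    )
    return name
```

Each fork calling this right after it comes up keeps the forks' histories from colliding when they later push.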
I'll say this is still quite a useful win for browser-control use cases and also for debugging their crashes.
Why would forking be the only way, when humans don't work like that? You can easily try one thing, undo, try the second thing. Your way is a faster way potentially, but also uses more compute.
> "I wonder if this is slow because we have 100k database rows"
> DELETE FROM table;
> "Woah, it's way faster now"
> But was it the 100k rows or was it a specific row?
That's a great example of where drilling into bugs and recreating exact issues can be a real problem, and where testing the issues themselves can be destructive to the environment, leading to the need for snapshots and forking.
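The destructive-test workflow can be made concrete with a toy snapshot/restore API (the `Env`, `snapshot`, and `restore` names here are illustrative; a real sandbox snapshots the whole VM, not a Python list):

```python
import copy

class Env:
    """Toy environment with checkpoint/rollback semantics."""

    def __init__(self, rows):
        self.rows = list(rows)
        self._snapshots = []

    def snapshot(self) -> None:
        # A real VM snapshot is CoW; a deep copy stands in here.
        self._snapshots.append(copy.deepcopy(self.rows))

    def restore(self) -> None:
        self.rows = self._snapshots.pop()

env = Env(range(100_000))
env.snapshot()
env.rows.clear()  # destructive experiment: "DELETE FROM table"
is_fast_when_empty = len(env.rows) == 0
env.restore()     # back to the exact pre-experiment state
print(len(env.rows))
```

Fork instead of restore and you can run the destructive experiment and keep the original environment live at the same time.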
The work of a developer is open ended, so we use a computer for it. We don't try to box developers into small granular screwdrivers for each small thing.
Thats whats coming to all agents, they might want to run some analysis with python, want to generate a website/document in typescript, and might want to store data in markdown files or in MongoDB. I expect them to get much more autonomous and with that to end up just needing computers like us.
And while they are getting better I see them doing some spectacularly stupid shit sometimes that just about no person would ever do. If you tell an agent to do something and it can't do what it thinks you want in the most straightforward way, there is really no way to put a limit on what it might try to do to fulfill its understanding of its assignment.
The mitigation is to keep what it can do to “just the things I want it to do” (e.g. branch protection and the like, whitelisted domains/paths). And to keep all the credentials off its box and inject them inline as needed via a proxy/gateway.
I mean, that’s already something you can do for humans also.
We ended up creating localsandbox [0] with that in mind by using AgentFS for filesystem snapshotting, but our solution is meant for a different use case than Freestyle - simpler FS + code execution for agents all done locally. Since we're not running a full OS it's much less capable but also simpler for lots of use cases where we want the agent execution to happen locally.
The ability to fork is really interesting - the main use case I could imagine is for conversations that the user forks or parallel sub-agents. Have you seen other use cases?
[0] https://github.com/coplane/localsandbox
The memory forking seems like a cool technical achievement, but I don't understand how it benefits me as a user. If I'm delegating the whole thing to the AI anyway, I care more about deterministic builds so that the AI can tackle the problem.
The memory forking was originally invented because, for AI app builders and first-response-driven applications, it's extremely important that they are instant (the difference between running `bun dev` and the dev server already being running).
However, it's much more generally applicable; Postgres is a great example. You can't fork the filesystem under Postgres and get consistency. Same thing with browser state, a weird server state, or anything that exists in memory. The memory forking gives a huge performance boost while snapshotting what's actually going on at one instant.
On the functional side without a kernel per guest you can't allow kernel access for stuff like eBPF, networking, nested virtualization and lots of important features.
Here is a good blog from docker explaining how even the best container is not as safe as a MicroVM https://www.docker.com/blog/containers-are-not-vms/
theoretically you can get to fairly complete security via containers + a gVisor setup but at the expense of a ton of syscall performance and disabling lots of features (which is a 100% valid approach for many usecases).
Here is a feature tour of InstaVM https://instavm.io/blog/meet-instavm-infra-for-your-agents We would be publishing on the tech soon.
Would love to give you a demo of InstaVM and trade notes. Let me know abhishek@instavm.io
Daytona runs on Sysbox (https://github.com/nestybox/sysbox) which is VM-like but when you run low level things it has issues.
Modal is the only provider with GPU support.
I haven't played around with Blaxel personally yet.
E2B/Vercel are both great hardware virtualized "sandboxes"
Freestyle VMs are built based on the feedback our users gave us that things they expected to be able to do on existing sandboxes didn't work. A good example: Freestyle is the only provider of the above (I haven't tested Blaxel) that gives users access to the boot disk, or the ability to reboot a VM.
The big pros of Sprites over us is their advanced networking stack and the Fly.io ecosystem. The big cons are that Sprites are incredibly bare bones — they don't have any templating utilities. I've also heard that Sprites sometimes become unavailable for extended periods of time.
The big pros of Freestyle over Sprites is fork, advanced templating, and IMO a better debugging experience because of our structure.
You can handroll a lot with: https://github.com/nestybox/sysbox?tab=readme-ov-file https://gvisor.dev https://github.com/containers/bubblewrap?tab=readme-ov-file
For hardware virtualized machines it much harder but you can do it via: https://github.com/firecracker-microvm/firecracker/ https://github.com/cloud-hypervisor/cloud-hypervisor
Freestyle/other providers will likely provide better debugging experience but thats something you can probably get past for a lot of workloads.
The time when you/anyone should think about Freestyle/anyone is when the load spikes/the need to create hundreds of VMs in short spikes shows up, or when you're looking for some of the more complex feature sets any given provider has built out (forks, GPUs, network boundaries, etc).
I also highly recommend self-hosting anything you do outside of your normal VPC. Sandboxes are the biggest possible attack surface, and it is a feature of us that we're not in your cloud; if we mess up security, your app is still fine.
https://GitHub.com/jgbrwn/vibebin
Also I'm a huge proponent of exe.dev
Obviously your service/approach is different than exe, more like sprites but like you said more targeted/opinionated to AI coding/sandboxing tasks it looks like. Interesting space for sure!
Your agent never has access to your secrets or even your workdir (only a copy, and only what you specify), and you pull the changes back with a diff/apply workflow, reviewing any changes before they land. You also control network access.
Free, open-source, no account needed.
https://github.com/kstenerud/yoloai
Still WIP, but the core works — three rootfs tiers (minimal Ubuntu, headless Chromium with CDP, Docker-in-VM), OCI image support (pull any Docker image), automatic thermal management (idle VMs pause then snapshot to disk, wake transparently on next API call), per-user bridge networking with L2 isolation, named checkpoints, persistent volumes, and preview URLs with auto-wake.
Fair warning: the website is too technical and the docs are mostly AI-generated, both being actively reworked. But I've been running it daily on a Hetzner server for my AI agents' browser automation, and deploy previews.
I'd love any feedback if you want to go ahead and try it yourself
is the experience similar? can I just get a console to one machine, work for a bit, log out, come back later, and continue?
how does cost work if I log into a machine and do nothing on it? just hold the connection.
We do auto suspend depending on your configured timeout. We'll pause your VM and when you come back the processes will be in the exact same state as when you left.
This is purely a defense mechanism, I don't want to guarantee storing the data of an entire VM forever for non paying users. We have persistence options for them like Sticky persistence but it doesn't come with the reliability of long term persistence storage.
Thats why our pricing is usage based and we have a much larger API surface.
The cost argument for owning the hardware for this specific use case also makes sense, considering the scale these agent environments will demand. Also worth noting, sandboxes are effectively an open attack surface; architecting them not to be in your main VPC is a sound security decision from the start.
These are likely only a better value for you at large scale/if you start wanting to run hundreds.
Creating snapshots takes a 2-4 second interruption in the VM due to sheer IO that we didn't want here.
What's especially cool about this approach is that not only is fork time O(1) with respect to machine size, it's also O(1) with respect to the number of forks.
edit: just saw the pr for freestyle. something seems to be blocking, but curious how it compares: https://github.com/computesdk/benchmarks/pull/41
So best of luck with your vision for it!
One thing:
>Freestyle is the only sandbox provider with built-in multi-tenant git hosting — create thousands of repos via API and pair them directly with sandboxes for seamless code management.
Maybe I’m just stupid, but I don’t know what this means. I initially thought I’m your target audience but after failing to understand this part I’m thinking maybe I’m not? I honestly don’t know.
But that said, the sandbox stands on its own without it.
I don’t know what “give each sandbox a unique git repository” does for me in practice, what problem it solves.
You’re not providing any practical problems your product is intended to solve.
If so, we'd very much like to test this. We make extensive use of Claude Code web but it can't effectively test our product inside the sandbox without running a K8s cluster
That said, our $50 a month plan can be used as an individual for your coding agents, but I wouldn't recommend it.
And you can go even below that by self-hosting it yourself with a very cheap Hetzner box for $2 or $5.
Self hosting is a valuable feature but our technology is unfriendly to small nodes — it will not work on consumer hardware. Many of the optimizations we spend our time on only seriously kick in above 2TB of storage and above 500GB of RAM.
This is simply not true, but also not a very charitable take.
There's nothing wrong with offering services that people find useful.
VMWare was acquired for $69Bn.
Like with 10ms then online replication/backup — analogus to litestream for sqlite — but for in memory processes becomes feasible, no?
We have another set of optimizations that we believe can take us to ~200ms in the next few months but beyond that we're pretty much completely stuck.
Realistically other sandboxes will be able to get there before us because we've chosen to support so much of Linux/if you don't run an operating system or don't support custom snapshots that is much easier.
We are researching systems of hot moving VMs across VMs but it would have very different performance characteristics.
Our tech is not decades old so there is a chance we've missed something but our layer management is atomic so I'd be shocked if you'd be able to corrupt state across forks/snapshots.
"Unlimited" as in 8vCPU and then I am billed for it on consumption?
I don't think people run real-world benchmarks on what that cold start really means, though. Time to first response from a Next.js app is a very important benchmark for Freestyle, and we've spent a lot of time on it. While Daytona sandboxes boot faster than Freestyle ones, our first response is an order of magnitude ahead of theirs.
I think another important one is concurrency: in the worst case, how many VMs can you get from a provider in a 5-second period?
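A minimal sketch of that first benchmark, assuming the sandbox exposes an HTTP endpoint — the URL, polling interval, and thresholds are placeholders, not any provider's actual API:

```python
import time
import urllib.request
import urllib.error

def time_to_first_response(url: str, timeout: float = 30.0,
                           interval: float = 0.05) -> float:
    """Poll `url` until the first successful HTTP response arrives;
    return the elapsed wall-clock seconds (cold start + app boot)."""
    start = time.monotonic()
    deadline = start + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                if resp.status < 500:
                    return time.monotonic() - start
        except (urllib.error.URLError, OSError):
            pass  # sandbox or app still booting; keep polling
        time.sleep(interval)
    raise TimeoutError(f"no response from {url} within {timeout}s")

# Usage (placeholder URL):
# print(time_to_first_response("https://my-sandbox.example.dev/"))
```

Run it immediately after requesting the VM to capture the full provision-to-first-byte path, not just the boot time the provider reports.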
I also think not enough time is spent on "does it actually work on this VM?" Stuff like Postgres, Redis, nftables, and complex Linux binaries that are hard to run needs to work on these sandboxes, because AI is going to need them, and I don't think there has really been a feature-bench system yet.
Networking/snapshotting/persistence characteristics all also need to come into this.
We're working on a similar solution at UnixShells.com [1]. We built a VMM that forks and boots in < 20ms, and it's live, serving customers! We also have a lot of great tools available, MIT-licensed, on our GitHub repo [2]!
[1] https://unixshells.com
[2] https://github.com/unixshells
Another question: how hard do you think it'll be to integrate this with something like Claude Code? I.e., /resume in Claude Code both returns your session and wakes up your VM. Or even better, /resume from Freestyle and have your Claude Code session open where you left it.
We're built as an API for platforms to build on rather than a tool for individual developers — oriented at platforms orchestrating tens of thousands of VMs rather than individuals using a CLI. We also have a CLI, but it's primarily a debugging and testing tool.
Resuming a Freestyle VM with Claude Code in it will just work. You can do that via SSH.
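A rough sketch of what that could look like from a script, assuming SSH access to the running VM — the hostname is a placeholder, and the `claude --resume` flag should be verified against your installed CLI version:

```python
import subprocess

def build_resume_cmd(host: str) -> list[str]:
    # -t allocates a TTY so the interactive session works over SSH.
    # `claude --resume` reopens a previous Claude Code conversation;
    # check the exact flag with `claude --help` on your install.
    return ["ssh", "-t", host, "claude", "--resume"]

def resume_in_vm(host: str) -> None:
    """SSH into the (now-awake) VM and reattach to the session."""
    subprocess.run(build_resume_cmd(host), check=True)

# Example (placeholder hostname):
# resume_in_vm("my-sandbox.freestyle.example")
```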
Switching tabs in the Dashboard (Domains/Routes/etc.) was basically unusable about 4 hours ago. It's noticeably better now, though there's still some latency (just retested).
Yes! And good on you, well-tuned bare metal performance is hard to beat.
We also support the forking/snapshotting/long-running jobs that they struggle with.
Congrats on the launch. This is cool tech
> Freestyle is the only sandbox provider with built-in multi-tenant git hosting — create thousands of repos via API and pair them directly with sandboxes for seamless code management. On top of that, Freestyle VMs are full Linux virtual machines with nested virtualization, systemd, and a complete networking stack, not containers.
It makes me think of the git automation around rigs in Gas Town: https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
Edit: I realize the Loom is the way to look at it. The Loom interrupted me twice and I almost skipped it, but it gave me a better idea of what the product does: it "invents" snapshotting and restoring of VMs in a way that appears faster. That actually makes sense, and I know it isn't that hard to do given how VMs work — and that it greatly benefits from having only part of the VM writable and from using little memory (maybe it has read-only memory too?).
Git is useful for branching vs. forking (i.e., you can't merge two VM forks back together), but all the tech I showed in the Loom exists independently of Git.
The hard part of it was making the VM large and powerful while making snapshotting/forking instant, which required a lot of custom VMM work.
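The copy-on-write idea behind instant forking can be illustrated with a toy layer structure — a conceptual sketch, not Freestyle's actual VMM. Each fork shares an immutable parent layer and records only its own writes, so forking costs O(1) regardless of how much state the VM holds:

```python
class CowLayer:
    """Toy copy-on-write layer: a fork shares its parent read-only and
    records only its own writes, so fork() copies nothing."""
    def __init__(self, parent=None):
        self.parent = parent
        self.delta = {}            # this layer's writes only

    def read(self, key):
        if key in self.delta:
            return self.delta[key]
        if self.parent is not None:
            return self.parent.read(key)   # fall through to older layers
        raise KeyError(key)

    def write(self, key, value):
        self.delta[key] = value    # never touches parent layers

    def fork(self):
        return CowLayer(parent=self)       # O(1): no data copied
```

Real VMMs apply the same trick at the page/block level for guest memory and disk, and freeze the parent layer at fork time so siblings can't see each other's writes.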
I don't find "large and powerful" in reference to a VM to sound compelling. What should be large? The memory? The root disk? As I alluded to in my comment, I'm more curious about what can be made small.
Also I'm skeptical that if I forked a vm running a busy Gas Town that it would be very light or fast in how it forks. A well behaved sqlite I could see, but then I'd wonder why not just fork the storage volume containing the database...
With respect to "large and powerful", RAM and disk size are important, but I was more so referring to full Linux power: the ability to run nested virtualization, eBPF, FUSE, and the other powerful features of a normal Linux machine instead of a container.
However, if you don't put your administrative credentials inside the VM and you treat it as an unsafe environment, you can safely give it minimal permissions to access the specific things it needs, and using that access it can perform complex tasks.
https://simonwillison.net/2024/Mar/5/prompt-injection-jailbr...
It is a very necessary building block for many common features that can be steered in a more deterministic way, e.g. "code interpreter" feature for data analysis or file creation like commonly seen in chat web UIs.
But like I see multiple sandbox for agents products a week. Way too saturated of a market
With respect to the market, every single sandbox sucks. I'm not gonna shit talk competitors but there is not a good sandboxing platform out there yet — including me — compared to where we'll be in 6 months.
We've heard that all the platforms have persistent uptime, feature-completeness, networking, and debugging issues. And on our own platform, we're not 1/10th of the way through solving the requests we've gotten.
The next generation of agents needs computers, and those computers are gonna look really different than "sandboxes" do today.
If the VM crashes/you have another idea/you want to try something else it should be reconstructable from outside of the VM.
However, I think this is potentially unrealistic. While it is the ideal architecture, every day I hear from more people who just want to have the VMs run for months at a time.
I would actually imagine this would be useful for observability, in the sense that you can fork, kill the loop in the fork, and hop into an interactive session to figure out what it's doing, while the loop is still running in the original instance.
What matters is that it's all forked atomically, which can be done with resources outside of the VM as well.
This means that while complex protocol connections like remote Postgres can break in the forks, stuff like Websockets just automatically reconnects.
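The reconnect behavior can be sketched as a generic retry loop — a toy illustration, not any particular client library. A fork leaves the old TCP socket dead; protocols that just dial again recover automatically, while a protocol holding server-side session state (like an open Postgres transaction) has nothing to reconnect to:

```python
import time

def with_reconnect(connect, handle, attempts=5,
                   backoff=0.01, max_backoff=1.0):
    """Call `connect()` and hand the connection to `handle`; if the
    connection dies (e.g. the peer was just forked and the old socket
    is gone), back off exponentially and reconnect. Returns True on a
    clean shutdown signaled by `handle`, False if attempts run out."""
    delay = backoff
    for _ in range(attempts):
        try:
            conn = connect()
            delay = backoff            # connected again: reset backoff
            if handle(conn):           # handler returns True to stop
                return True
        except ConnectionError:
            pass                       # dead socket; fall through, retry
        time.sleep(delay)
        delay = min(delay * 2, max_backoff)
    return False
```

WebSocket clients typically have exactly this loop built in, which is why they "just work" across forks.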