Unfortunately it's not safe as the kernel can still write to (what it thinks is) the old filesystem on the device, which will introduce corruption to the new disk image.
However, a fun fact is that you can (do not actually do this!) boot a qemu VM from /dev/sda. You have to use an overlay (e.g. the qemu -drive snapshot=on flag) so that qemu won't write through to /dev/sda. I use this trick in supernested, a script I wrote that runs nested within nested within nested VMs ad infinitum until your hypervisor crashes. http://git.annexia.org/?p=supernested.git;a=blob;f=run-super...
tux3 1 days ago [-]
I used to dual-boot windows, but I was too lazy to actually reboot, so naturally I had Virtualbox just boot the physical Windows partition while Linux was running. Which is totally fine!
It's not a real dual boot if you don't boot both partitions at the same time.
As long as you don't install the guest VBox drivers, that is. Those would make it hang when it boots as the host on physical hardware, since there's no longer someone above to answer the hypercalls.
hdb2 1 days ago [-]
> I had Virtualbox just boot the physical Windows partition while Linux was running. Which is totally fine!
I had no idea that this was possible, and I learned something new today. Thank you!
ahartmetz 1 days ago [-]
I think Windows refused to do that at some point? So I booted the physical Linux partition from Windows if I needed both at the same time. That's on a laptop that otherwise almost always ran Linux.
johnisgood 1 days ago [-]
Yeah. That is a valid use. I mean, this is how I installed Windows to begin with, from Linux via QEMU, onto my other hard drive. I did reboot and test it out, and it worked just fine.
astralbijection 1 days ago [-]
That script sounds extremely unhinged, and I mean it as a compliment :)
Without spoiling too much, the command at the very end of the series does something adjacent to this.
Joker_vD 1 days ago [-]
What if we remount the filesystem(s) at /dev/sda as read-only first? Then make a small ramfs with statically-linked curl in it and exec it. Hmm. Ideally, you'd also want to call reboot(2) after it's done...
astralbijection 1 days ago [-]
All of those things get covered in parts 2, 3, and 4 :)
Joker_vD 1 days ago [-]
There's... no part 2 in the post? And it's the latest blog post on the site, as far as I can see.
Oh, I see, the posts got published in the reversed order.
On the topic itself: wow, what a journey. And I personally fully support "come on, you should totally be able to just dump the system image onto your disk and reboot/exec it!"
duskwuff 1 days ago [-]
One bit of magic you may be interested in is pivot_root, which allows another filesystem to take the place of the root filesystem (e.g. / and /mnt become /old and /). It's usually used during startup, to allow the "real" root filesystem to take the place of the initrd, but could have other uses.
Dylan16807 1 days ago [-]
Last time I tried to use it though I just could not get it to let go of the main filesystem even after repeatedly killing the processes I could and restarting the rest.
Taking control at the initrd stage, as in the second page of the article, is significantly more reliable.
But have busybox in your initrd so you don't have to suffer. It takes up 0.5% of the size of my initrd file.
tremon 1 days ago [-]
You also don't want to do this under any kind of memory pressure, because the kernel will happily drop read-only pages from memory if it thinks they can be re-read from disk when needed.
akdev1l 1 days ago [-]
in most cases you could just drop back into the initramfs that is included in most distros
Or if you have access to the boot command line you can also usually stop the boot process before pivot_root happens (hence you’ll be left running in the initramfs environment)
On Fedora/EL it would be done by putting `rd.break` in the kernel command line
vidarh 1 days ago [-]
The second part in the series deals with that by mounting it read-only from initrd.
depending on the size of your disk image and your uefi+boot partitions it's still possible to safely pull off.
unmount the efi and boot partitions, write your image to the head of the disk, power cycle, then grow the last filesystem from the image to cover the rest of the disk.
you might get lucky and have all three of uefi/boot/swap to work with.
of course with the advent of uefi, you could instead just drop an installer image directly into the efi partition and boot that.
matja 1 days ago [-]
> How do you unmount your OS’s disk while keeping the OS running to be able to overwrite itself?
I went down a similar rabbit-hole myself, with the goal of safely replacing the Linux installation on a disk that a machine is already running from (e.g. replace a VPS's setup image with one of your own) without needing a KVM-style remote access tool to the console.
The problem there is if you directly modify the disk when a filesystem is mounted on that disk then all bets are off in terms of corruption of the filesystem that's already on there and also the filesystem(s) you're writing over the top.
My solution was to kexec into a new kernel+initramfs which has a DHCP client and cURL in it - that effectively stops any filesystem access while the image is being written over the disk, then to just reboot.
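For anyone wondering what the "write the image over the disk" step amounts to once nothing is mounted, here is a minimal Python sketch. It is not the commenter's actual tooling (they mention cURL in an initramfs); `write_image` is a hypothetical helper, and the target here is meant to be a scratch file standing in for the block device.

```python
import os


def write_image(source, target_path, chunk_size=4 * 1024 * 1024):
    """Copy a binary stream onto an existing target in fixed-size chunks.

    `source` is any binary file-like object (an HTTP response, a pipe, ...).
    `target_path` is hypothetical: a scratch file here, a device node such as
    /dev/sda in the scenario described above.
    Returns the number of bytes written.
    """
    written = 0
    # "r+b" requires the target to exist and does not truncate it,
    # which matches how a block device behaves.
    with open(target_path, "r+b") as target:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            target.write(chunk)
            written += len(chunk)
        target.flush()
        os.fsync(target.fileno())  # make sure the data is out before a reboot
    return written
```

The `fsync` before returning is the part that matters if the next step is a reboot: without it, the tail of the image may still be sitting in the page cache.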
codeflo 1 days ago [-]
> My solution was to kexec into a new kernel+initramfs which has a DHCP client and cURL in it - that effectively stops any filesystem access while the image is being written over the disk, then to just reboot.
It's technically not an unmount, but still a pretty strong guarantee that the OS will not corrupt the image being written.
When done, reboot has to be done from the same sysrq handler, of course.
rkeene2 1 days ago [-]
I usually just move all the files to a new directory (/oldroot) and pivot_root -- any open files reference the new paths. Then install into the newly empty root directory of the filesystem, reboot and delete the /oldroot.
matja 1 days ago [-]
That sounds like the best way if keeping the filesystem is an option. In my case I wanted to also change filesystems and apply FDE, which is possible to do if the original filesystem supports online shrinking but many do not.
arboles 1 days ago [-]
Don't you get any errors even if you race immediately to start pivot_root? pivot_root also won't modify all open file descriptors at once. Seems it's not fatal, but have you managed to do this over ssh and not be disconnected?
rkeene2 18 hours ago [-]
I don't know what you mean regarding pivot_root affecting file descriptors because they are not modified, they point to new names because the enclosing directory has been moved/renamed. There is a small race between moving items in the root directory as well as after moving all items and before starting pivot_root, but that race doesn't involve file descriptors but opening at the old paths before the new one is established, though lots of things use openat() these days so it doesn't really even occur in most cases then.
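The point about descriptors surviving a rename of the enclosing directory is easy to check on a scratch directory. This is only a small illustration of that one behaviour on a POSIX filesystem, not of pivot_root itself; the directory names `etc` and `oldroot` are placeholders.

```python
import os
import shutil
import tempfile


def read_after_rename():
    """Open a file, rename its parent directory, then read via the old descriptor."""
    base = tempfile.mkdtemp()
    try:
        old_dir = os.path.join(base, "etc")
        new_dir = os.path.join(base, "oldroot")
        os.mkdir(old_dir)
        old_path = os.path.join(old_dir, "hostname")
        with open(old_path, "w") as f:
            f.write("myhost\n")

        handle = open(old_path)       # descriptor taken before the move
        os.rename(old_dir, new_dir)   # comparable to moving things under /oldroot

        content = handle.read()       # the descriptor refers to the inode, not the name
        old_name_exists = os.path.exists(old_path)
        handle.close()
        return content, old_name_exists
    finally:
        shutil.rmtree(base)
```

The already-open handle keeps working while the old path stops resolving, which is the race window described above: only code that opens by the old name after the move is affected.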
lloydatkinson 1 days ago [-]
The gymnastics VPS providers force people to go through just so they can have some dumb "wizard" with a limited number of OS choices is maddening. Just allow people to upload an ISO!
pzmarzly 1 days ago [-]
You will run into problems if the destination drive has a different sector size than your VM, as the GPT header won't be aligned.
I think it should be possible to make an image with many headers at different locations, so that it works on all types of disks at once, but I don't think any tools do it for you by default.
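To make the alignment problem concrete: the primary GPT header sits at LBA 1, so its byte offset equals the logical sector size (512 or 4096), and it starts with the signature `EFI PART`. A rough sketch of checking which sector size an image was built for follows; `gpt_sector_sizes` is a made-up helper name, and it only looks at the signature, not the CRCs or the partition entries.

```python
GPT_SIGNATURE = b"EFI PART"


def gpt_sector_sizes(image, candidates=(512, 4096)):
    """Return the candidate sector sizes for which `image` has a GPT header at LBA 1.

    `image` is the first few KiB of a disk image as bytes.
    """
    return [
        size
        for size in candidates
        if image[size:size + len(GPT_SIGNATURE)] == GPT_SIGNATURE
    ]
```

An image built with 512-byte sectors returns `[512]`, and a 4Kn drive will look for the header at byte 4096 instead. Even with a header at both offsets, the LBAs stored inside the header and partition entries are counted in sectors, so they would need to be written consistently for each size as well.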
motrm 23 hours ago [-]
Mildly pedantic, and of course ignores how wild this whole thing is, but I don't think this bit is correct:
After waiting for a little while, the program terminated with the following output:
astrid@chungus infra gzip -vc result/nixos.img | ssh root@myhost.example -- bash -c 'gunzip -vc > /dev/sda'
root@myhost.example's password:
77.8% -- replaced with stdout
What happened here?
The 77.8% bit is gunzip -v reporting that it finished decompressing the data to stdout and that the compression ratio was 77.8%... so this invocation may well have succeeded. Assuming, as rwmj points out, nothing else stomped on any of the written blocks.
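For reference, the percentage gzip prints with -v is the space saved, roughly 1 minus compressed size over original size. A quick way to get a comparable number in Python is below; this is an approximation, since the gzip CLI handles header bytes slightly differently, so the figures may not match to the decimal.

```python
import gzip


def space_saved(data):
    """Fraction of space saved by gzip-compressing `data` (0.778 would read as 77.8%)."""
    compressed = gzip.compress(data)
    return 1 - len(compressed) / len(data)
```

So a figure like 77.8% says the image shrank to a bit under a quarter of its size in transit. It says nothing about whether the write on the other end landed intact.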
I do like this idea - with sufficient prep of the system before writing the image, namely stopping as many processes as possible especially those that might do some writing, it's a quick and dirty way to replace a stock OS with a ready-made image. Could perhaps be safer doing it twice, once into a minimal image that does very little beyond network bringup & runs ssh, followed by final OS replacement in a (more) controlled manner.
SamWhited 1 days ago [-]
Reminds me of the first company I worked for out of school.
We had a big drive with the source of truth image used to boot all our machines on it, and we added rsync to the init image. When each machine booted init would rsync everything from the storage box to the local machine. We'd keep the storage machine up to date and when we wanted to update other machines in the fleet we'd just do a reboot and it would sync up the latest files (provisioning for whatever each machine was supposed to do happened later, can't remember how that was handled now). The storage machine was running ZFS so we also took a snapshot before doing any rolling reboots, so if anything did go wrong you could just revert to the previous snapshot and reboot again as long as you didn't break the init image.
Sounds jank saying it out loud, but I don't remember it ever causing us any problems.
M95D 1 days ago [-]
From the article:
> The OS may stop you from unmounting /dev/sda1, but it won’t stop you from writing to /dev/sda1 or /dev/sda even if there’s something mounted!
Not always true. Whether it's allowed depends on a kernel config option: CONFIG_BLK_DEV_WRITE_MOUNTED
Sophira 1 days ago [-]
It's worth noting, though, that that config option was only introduced in kernel version 6.8! Before then the option didn't exist and you could write with impunity to mounted devices (as root, obviously).
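If you want to know how a given kernel was built, the option shows up in the kernel config text like any other. A small sketch for reading it follows; where the config text comes from (for example /boot/config-<release> or /proc/config.gz) varies by distribution and is an assumption here, and `write_mounted_setting` is a hypothetical helper.

```python
def write_mounted_setting(config_text):
    """Return 'y', 'n', or None from kernel config text.

    None means the option does not appear at all, which is what you would
    expect on kernels older than the one that introduced it.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == "# CONFIG_BLK_DEV_WRITE_MOUNTED is not set":
            return "n"
        if line.startswith("CONFIG_BLK_DEV_WRITE_MOUNTED="):
            return line.split("=", 1)[1]
    return None
```

A result of `'y'` or `None` means writes to a mounted device are not blocked by this option, and `'n'` means the kernel was built to refuse them.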
alexellisuk 1 days ago [-]
This reminds me of netbooting workflows from things like MaaS, Tinkerbell, and Dan's old Plunder tool.
They'd netboot.. not mount the disks, then download an ISO/IMG and write it directly to the primary boot disk.
If netbooting is a heavy lift, why not boot into a custom initramfs you built, with e.g. dd/curl installed, and flash the disk that way, without mounting / at all? Then kexec/chroot into it?
I'd much prefer this as a way to provision Raspberry Pis.
> Well, what can we try instead?
> write to the mounted disk anyways. fuck you
Stupid penguin trick I learned: Add a file inside ramdisk (i use /dev/shm) as LVM PV.
pvmove off the hard drive
Boom, now your OS lives entirely in RAM
You can now even replace the hard disk, put a new one and migrate back.
Or migrate to network storage (nbd,iSCSI etc.), re-sequence disks into whatever RAID you need, and migrate back
Need to fix /boot after that tho, and probably make sure to not have power failure in meantime
e12e 1 days ago [-]
Nice series! Really takes me back to the days of Linux 1.x kernel, Lilo and trying to fit a kernel and initrd on a single floppy disk.
So ending up at:
> From a 292MB initramfs, we now have a 6.1MB initramfs, smaller than almost every other distro's initramfs and made entirely to run busybox wget dd.
Is a pretty great achievement today - but way bigger than something that can fit on a floppy.
justsomehnguy 53 minutes ago [-]
Modern "floppy" is just any USB thumbdrive you found lying around. Just for giggles, I checked my 'to go' retailer - there are a whopping 3 variants of 4GB no-name drives for $6. For $12 you would get a Kingston one with 64GB.
"Floppy" image size is not important now nor for the last 15 years.
astralbijection 24 hours ago [-]
To be honest, even this has plenty of room to go down. I get the feeling I could have squeezed a couple more MB off if I had actually cut things off of the default Nixpkgs busybox, and possibly also cut a couple of kernel drivers out.
dizhn 1 days ago [-]
Reminded me of how to install Alpine Linux (which isn't available) on Oracle Cloud over an Ubuntu install. It uses dd and has the advantage of having a console.
I had found it in a github gist when I used it but here's a similar blog post.
Wait hold on, can you not simply just access the underlying volume/block device using an API? The VMs in OCI have a boot volume that is attached, so I reckon it's possible to "mount" this somehow and overwrite it with whatever data you want.
dizhn 1 days ago [-]
I am not sure. Maybe it's a thing about not being able to download the iso (no network on the console?) or not having space for it or something. I wouldn't know about the API thing. I am not a cloud user.
Made me think though.
astralbijection 1 days ago [-]
From what it sounds like, because you have a console and therefore aren't dependent on SSHD not getting overwritten, you can just dd the live running system here?
dizhn 1 days ago [-]
Like I said in the other comment I am not really familiar with the various clouds. I self host. The console is a weird web based thing that isn't the same environment as your VMs (or their hypervisor). It's a barebones shell where you can mount your volumes and such, not actually boot or enter the vm. (Edit: This was incorrect) And if I remember correctly I probably did have to do some mounting there to create device files and such. I didn't really have much use for the console after that either. I will try to find the actual gist I followed.
Why would I not have done the dd bit on the console? I have no idea. Again, possibly because you can't download the iso there?
tosti 1 days ago [-]
If you have a swap partition, swapoff it and install there. Or at least a minimal kernel and initramfs. Set as default in grub and there you go.
Also, I once burned an iso straight from ftp using a fifo. I was low on disk space and really needed that CD. Worked fine because the Internet was already faster than the CDR.
klinch 1 days ago [-]
Sounds cursed. But I'm not judging, given that I use nixos-anywhere[0] on an almost weekly basis.
> "download a pre-prepared disk image directly to your disk"
Well not quite direct; the bits go through your RAM in between.
indigodaddy 1 days ago [-]
NOC techs have been doing these tricks for tens of years
creantum 1 days ago [-]
Just because you can doesn’t mean you should.
anshulbasia27 1 days ago [-]
Happened with me as well
ma2kx 1 days ago [-]
Why not just use netboot?
kotaKat 1 days ago [-]
you may be in a restricted environment with no boot option selections, like on some VPS and dedi server providers.
i've seen similar techniques used to shove windows on "linux" VPS/dedis boxes by booting into rescue mode and then applying a raw Windows boot image that's preconfigured and rebooting back to the Windows install and hoping you stood the image up right.
good ol' days of getting Windows up on Kimsufi boxen.
megous 1 days ago [-]
Instead of applying some sense to the problem, and using a solution that actually allows you to kill all running processes of the original distro at runtime, incl. getting rid of the original init process, to be able to pivot_root somewhere else and umount the original system's filesystems and free the block device for re-installation, this ridiculous approach gets promoted to a front page, lol.
astralbijection 5 hours ago [-]
I did learn about systemctl switch-root after finishing my kexec solution, but I think if you're at the point of reimaging a server over the network, you probably don't care enough to shut it down correctly :)
igtztorrero 1 days ago [-]
Can I run a Windows qcow2 disk image on a Contabo VPS?
poppafuze 1 days ago [-]
Looking forward to seeing a device with a short image that has the string "404" on it.
irishcoffee 1 days ago [-]
I've been dd-ing A/B partitions for embedded yocto distributions for years and years. read-only-rootfs (/var/log is its own writable partition), dd the "other partition", sed fstab, reboot.
The neat part was that the whole process kicked off when you scp'd the rootfs: inotifywait picked up the new file and triggered everything else.
BirAdam 1 days ago [-]
Yeah, make /home, /var/log, and /usr/local rw and everything else ro. Makes a great "immutable" that's not as annoying as truly "immutable" systems.
[0]: https://www.man7.org/linux/man-pages/man8/xfs_freeze.8.html
That's what I was expecting from the article.
Update: It's not obvious, but it turns out that this is a multipart article, and kexec is reserved for part 3: https://astrid.tech/2026/03/24/2/how-to-pass-secrets-between...
https://www.kernel.org/doc/html/latest/admin-guide/sysrq.htm...
QEMU defaults to 512B sectors, which isn't true for many NVMe drives. There are some flags to change that. https://unix.stackexchange.com/a/722450
https://alextsang.net/articles/20191006-063049/index.html
Here's the gist I had used. It's really simple. https://gist.github.com/unixfox/05d661094e646947c4b303f19f9b...
[0] https://github.com/nix-community/nixos-anywhere
now go back to diskette 2...
now please put diskette 15 again....
https://support.tools/dd-over-netcat-clone-drive-remote-back...
But I like the curl approach very much!