Bytecode VMs in surprising places (2024) (dubroy.com)
qubex 17 minutes ago [-]
I’m old enough that I could do more computation on my father’s office’s laserwriter sending it postscript than I could on my Amstrad 1512.
ykl 22 hours ago [-]
A medium-spicy take of mine is that a bytecode VM in a GPU kernel is not as bad of an idea as one might think, and in some cases it can actually be the most reasonable solution. Some fun examples:

1. As mentioned in the post above, the Dolphin emulator famously implements the entire Gamecube/Wii GPU pipeline in a single gigantic ubershader, and this is useful because it avoids shader compilation stalls [1].

2. Blender's Cycles renderer implements its shading graph eval system as a bytecode VM in a GPU kernel [2]. IIRC early versions of Vray GPU did something similar. There are better ways of course, but a VM gets you surprisingly far as a general approach.

3. Finally, a lot of ML frameworks (Tensorflow, PyTorch, etc) by default use the GPU relatively suboptimally (especially without kernel fusion and such). Tensor frameworks can extract a lot more perf out of GPUs using a VM-in-a-giant-kernel approach [3].

If you think abstractly about how a GPU SM actually works (using CUDA terminology here), all threads in a warp must execute in lockstep and the cost of execution divergence across threads in a warp is that you effectively run serially, losing the parallel advantage of the SM. This penalty gets magnified enormously if you are doing memory reads after wherever the execution divergence happens, since you now have multiple slow memory stalls in serial instead of one big memory read at once for all threads. If you're clever about implementing a bytecode VM, you can load as much state as you need upfront into shared memory, and then if your bytecode VM is just looping through executing a bunch of opcodes in a huge switch statement, then at least as far as the SM is concerned, there's no execution divergence! All threads look like they're doing the same thing at the same time; even if within the VM what is happening a lot is just no-ops, at the SM level you're not dealing with serialized memory stalls and serial scheduling and such.

Is it the _best_ most optimal approach imaginable? Almost certainly not! But can it be a _surprisingly good_ and possibly even reasonable approach for some problem domains and specific constraints? Yeah absolutely!

[1] https://dolphin-emu.org/blog/2017/07/30/ubershaders/ [2] https://www.youtube.com/watch?v=etGMk9wYwNs&t=1882s [3] https://hazyresearch.stanford.edu/blog/2025-09-28-tp-llama-m...
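
The lockstep argument above can be illustrated with a small CPU-side simulation. This is only a sketch in plain Python, not GPU code and not how Cycles or Dolphin do it: the opcode set (`PUSH`, `LOAD`, `GT`, `SELECT`, `HALT`) and the `run_warp` helper are hypothetical. Every lane shares one program counter, so dispatch never diverges, and data-dependent choices go through a predicated `SELECT` instead of a branch.

```python
from enum import IntEnum


class Op(IntEnum):
    PUSH = 0    # push the immediate that follows the opcode
    LOAD = 1    # push this lane's own input value
    GT = 2      # pop b, a; push 1 if a > b else 0
    SELECT = 3  # pop else_v, then_v, cond; push then_v if cond else else_v
    HALT = 4


def run_warp(program, lane_inputs):
    """Run one bytecode program over all lanes with a single shared pc."""
    stacks = [[] for _ in lane_inputs]
    pc = 0  # shared by every lane: all lanes decode the same opcode each step
    while True:
        op = program[pc]
        pc += 1
        if op == Op.HALT:
            break
        if op == Op.PUSH:
            imm = program[pc]
            pc += 1
            for stack in stacks:
                stack.append(imm)
        elif op == Op.LOAD:
            for stack, value in zip(stacks, lane_inputs):
                stack.append(value)
        elif op == Op.GT:
            for stack in stacks:
                b = stack.pop()
                a = stack.pop()
                stack.append(1 if a > b else 0)
        elif op == Op.SELECT:
            for stack in stacks:
                else_v = stack.pop()
                then_v = stack.pop()
                cond = stack.pop()
                # Predication: both values were computed, one is kept.
                stack.append(then_v if cond else else_v)
        else:
            raise ValueError(f"unknown opcode {op}")
    return [stack[-1] for stack in stacks]


# max(x, 0) per lane, written without any per-lane branching in the program.
RELU = [Op.LOAD, Op.PUSH, 0, Op.GT, Op.LOAD, Op.PUSH, 0, Op.SELECT, Op.HALT]
results = run_warp(RELU, [-2, 3, 5])
```

Lanes that "lose" the `SELECT` still did the work, which is the no-op cost mentioned above; what they avoid is serialized execution of separate code paths.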

genxy 21 hours ago [-]
Linux running in a shader https://blog.pimaker.at/texts/rvc1/
mastermage 10 hours ago [-]
That's crazy.
superjan 1 days ago [-]
How about the infamous iOS hack with a VM implemented in a JBIG2 PDF? https://projectzero.google/2021/12/a-deep-dive-into-nso-zero...
larodi 9 hours ago [-]
it should, by all means, be in this otherwise excellent, but somewhat incomplete article.
yoz 21 hours ago [-]
This is one of my favourite exploit stories. Incredible work.
magnat 1 days ago [-]
Some other examples:

- ACPI configuration for power management and platform stuff [1]

- Bitcoin transactions [2]

- TrueType fonts [3]

[1] https://wiki.osdev.org/AML

[2] https://en.bitcoin.it/wiki/Script

[3] https://learn.microsoft.com/en-us/typography/opentype/spec/t...

m132 1 days ago [-]
Since ACPI was mentioned, let's not forget about EFI!

https://uefi.org/specs/UEFI/2.10/22_EFI_Byte_Code_Virtual_Ma...

segbrk 1 days ago [-]
Since that page is a little dense, the higher-level version: PCI supports Option ROMs (OpRoms) - plug in a device like a NIC or a GPU, and your BIOS actually loads compiled code from it and executes it on the CPU. In many systems, for example, PXE booting (net booting) is actually a function of the NIC, executing code on the CPU to load an operating system. We're talking actual x86/x86_64 machine code here running in the privileged pre-boot environment. Not portable or secure in any way. OpRoms _may_ now be checked for SecureBoot signatures on systems where that's set up properly at least.

EFI ByteCode (EBC) is meant to help at least the portability side. I'm not sure if anybody is actually delivering devices with EBC OpRoms yet though. I'm also not sure if anybody is looking at using the EBC VM to sandbox untrusted OpRoms.

mjg59 22 hours ago [-]
"Yet"? The only card anyone's ever found that shipped with an EBC option ROM was from about 20 years ago, nobody's migrating to EBC and the general approach is to just emulate the x86 instructions instead. And secure boot has been verifying option ROMs since 2012.
eptcyka 23 hours ago [-]
Does this imply that plugging a NIC into an ARM or PowerPC machine might fail to PXE boot if the manufacturer hasn’t prepped code for those platforms?
p_l 22 hours ago [-]
Not "might" - will.

That's why there were separate "Mac editions" of certain cards (like GPUs) - the Option ROMs were different to support the Mac's frankensteined PPC OpenFirmware-like setup, and later to provide early EFI option roms when most x86-targeting cards were shipping with classic VBIOS.

EDIT: And while there was an x86 emulator on many firmwares, it was often not enough to run everything, and x86 NIC firmware won't work for netbooting a PPC machine

genxy 21 hours ago [-]
The network is the computer
chirsz 1 days ago [-]
SBus peripherals use the Forth language in their PROMs to initialize themselves[1].

[1] https://docs.oracle.com/cd/E19957-01/802-3239-10/sbusandfc.h...

DonHopkins 1 days ago [-]
Good call! (Whether it's a directly threaded, indirectly threaded, subroutine threaded, token threaded, Huffman threaded, or string threaded call.)

https://en.wikipedia.org/wiki/Threaded_code#Token_threading

Mitch Bradley created OpenFirmware. It started at Sun as OpenBoot (informally "SunForth") on the SPARCstation 1 in 1989, was standardized as IEEE 1275-1994, and was renamed OpenFirmware at that time. Its lineage runs back through Mitch's earlier Forthmacs (Bradley Forthware, early 80s), which ran on 68k Macs, Sun-2/3, Atari ST, and Amiga. Mitch credits Henry Laxen and Michael Perry's F83 and Glen Haydon's MVP-Forth as the public-domain ancestors.

The metacompiler can target many platforms, word sizes, CPUs, and threading models, and produce stripped ROMable images. It can build the kernel as direct-threaded (DTC), indirect-threaded (ITC), subroutine-threaded (STC), or token-threaded (TTC), with 16, 32, or 64 bit cells. Shipping kernels are DTC native code with cell-sized xt pointers: 32 bit on the original SPARC and PowerPC machines, 64 bit on modern PPC64, SPARC64, and ARM64 builds.

Peripheral expansion cards ship a separate, portable, variable-byte token format called FCode. The kernel interprets FCode at boot/probe time and recompiles it on the fly into the live native dictionary. After probe, FCode-loaded drivers run as ordinary native Forth words. That two-stage design (fast native runtime, portable FCode transport) is what let Sun ship one card PROM image that worked across CPU generations.
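
The two-stage design described above (portable tokens in, native words out) can be sketched roughly as follows. This is a toy illustration, not Open Firmware's implementation: the token numbers, the `PRIMITIVES` table, and the one-byte literal encoding are all made up, and real FCode has a far larger token set.

```python
# Hypothetical token numbers, NOT the real FCode assignments.
LIT, DUP, ADD, MUL = 0x01, 0x02, 0x03, 0x04


def _dup(stack):
    stack.append(stack[-1])


def _add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def _mul(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


# The host's "native dictionary": each token maps to host code.
PRIMITIVES = {DUP: _dup, ADD: _add, MUL: _mul}


def compile_tokens(tokens: bytes):
    """Stage 1 (probe time): translate the token stream once into host callables."""
    words = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == LIT:
            value = tokens[i]  # one-byte literal follows the token
            i += 1
            words.append(lambda stack, v=value: stack.append(v))
        elif tok in PRIMITIVES:
            words.append(PRIMITIVES[tok])
        else:
            raise ValueError(f"unknown token 0x{tok:02x}")
    return words


def run(words, stack=None):
    """Stage 2 (runtime): no token decoding left, just call the compiled words."""
    stack = [] if stack is None else stack
    for word in words:
        word(stack)
    return stack


# "3 dup * 1 +" as a portable byte string; the same bytes work on any host
# that supplies its own PRIMITIVES table.
driver = compile_tokens(bytes([LIT, 3, DUP, MUL, LIT, 1, ADD]))
final_stack = run(driver)
```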

https://github.com/MitchBradley

https://github.com/MitchBradley/openfirmware

FCode was designed for SBus on the SPARCstation 1, with cross-CPU portability built in. Sun's earlier and contemporary buses were not interchangeable with SBus (Sun-2 used Multibus, Sun-3 used VMEbus, the Sun386i "Roadrunner" used AT-bus), so the cross-architecture payoff arrived later, when IEEE 1275-1994 standardized OpenFirmware and PCI allowed FCode in option ROMs. After that, the same expansion-card PROM image could boot on Sun SPARC, Apple PowerPC Macs, IBM PowerPC servers (CHRP), and the OLPC XO.

Interview with Mitch Bradley (he's like the Woz of Forth):

https://web.archive.org/web/20120118132847/http://howsoftwar...

In parallel with the OpenBoot work, Mitch also developed an extremely portable C-based Forth (the public version is "C Forth 93"). It runs a switch-threaded inner interpreter over packed tokens, with configurable cell width (16, 32, or 64 bit) and configurable token width (pointer-sized by default, 16 bit with the T16 build flag for tight flash budgets), plus a small hand-rolled FFI built around a fixed-arity 12-argument marshalling trampoline driven by a format string. It is now the embedded variant used in OLPC's OpenFirmware and in PlatformIO targets including RP2040, Teensy, ESP32, ESP8266, and STM32:

https://github.com/MitchBradley/cforth

OpenFirmware even has its own song:

https://www.youtube.com/watch?v=b8Wyvb9GotM

More on Mitch, OpenFirmware, and CForth:

https://news.ycombinator.com/item?id=21822840

https://news.ycombinator.com/item?id=33681531

https://news.ycombinator.com/item?id=38689282

yoz 21 hours ago [-]
This is FASCINATING, thank you!
actionfromafar 22 hours ago [-]
Power Macs had an x86 emulator which ran the x86 ROM in PCI cards.
mjg59 22 hours ago [-]
I don't think that's true? Macs were running Open Firmware, they had an expectation of the same Forth code that Suns made use of, and several cards needed to be flashed with Apple firmware to be Mac compatible. Alphas definitely ran x86 video card init code under emulation, though.
BuildTheRobots 4 hours ago [-]
iirc, the OpenFirmware boot image was larger than the equivalent BIOS image - I've got half a memory of resoldering ROM chips on ATi cards so they could be cross-flashed to work in G3/G4 Power Macs.
actionfromafar 20 hours ago [-]
I probably misremembered Alphas.
mjg59 14 hours ago [-]
Very easy things to conflate given we're talking about hardware that's on the order of 30 years old!
anthk 1 days ago [-]
I ran EForth on the Subleq from Howe R.J at https://github.com/howerj/muxleq (the subleq one), first under QuickJS (trivial task, almost a 1:1 map from the C code, made in a hurry) and then under... jsinterp.py from the infamous yt-dlp, but using arrays instead of printing functions. But... if yt-dlp's "mini-JS" implements some captcha input functions... you can add I/O with ease and run EForth with what they call (not me) a "Not totally functional interpreter".

Not totally... until people there run the Rule 110 program, Conway's Life, Subleq+EForth...

DonHopkins 1 days ago [-]
You may need to write a WebGPU shader and run it in a Beowulf Cluster to make that run fast!
anthk 5 hours ago [-]
I ran EForth under Muxleq (multiplexed subleq) on an N270 Atom (32-bit Intel) and it was fast enough. Much slower than GForth or even PFE (which is slow compared to GForth), but usable even for algebra exercises. Rendering a Mandelbrot fractal (ASCII) took half a minute, but it's amazing that a few lines of C let you run a Forth whose input is just numbers. I even have a backup on paper.

https://sites.google.com/view/win32forth/win32forth-readme/m...

I did some syntax changes for floats and that's it.

drob518 1 days ago [-]
On one hand, all these mini interpreters and compilers are cool. I have a soft spot for extensible systems. On the other hand, all these things are a huge security problem. When every subsystem and data format is carrying around its own Turing complete bytecode and JIT, they all need to be secure and bug free for the system to be secure and bug free. And that's far more code surface to keep clean.
petra 1 days ago [-]
Maybe they can compile the bytecode to the x86 subset in this paper, and check if it is secure using their tool:

https://dl.acm.org/doi/pdf/10.1145/2254064.2254111

krab 21 hours ago [-]
> When every subsystem and data format is carrying around its own Turing complete bytecode and JIT

LLMs enter the chat

jaen 1 days ago [-]
References for the Quake virtual machines:

Quake 1 had QuakeC: [1] https://en.wikipedia.org/wiki/QuakeC [2] Hello world in QuakeC - https://www.leonrische.me/pages/quakec_bytecode_hello_world....

Quake 2 moved to native binaries.

Quake 3 had a new VM that enabled compiling regular C using LCC: [1] https://fabiensanglard.net/quake3/qvm.php [2] Spec - https://www.icculus.org/~phaethon/q3mc/q3vm_specs.html

0john 11 hours ago [-]
Lua became pretty popular to use in other games for the purposes of scripting, but no surprises there I guess.

Lesser known: games using Havok Physics may have used Havok's MOPP (a bytecode and interpreter for partitioning and searching the geometry).

https://github.com/niftools/nifxml/wiki/Havok-MOPP-Data-form...

pervasif 1 days ago [-]
These little VMs in applications are everywhere. Apple Mach-O binaries have built in opcodes for binding and rebasing symbols interpreted by (numerous) little VMs in dyld:

https://github.com/apple-oss-distributions/dyld/blob/e9da5ae...

https://github.com/apple-oss-distributions/dyld/blob/e9da5ae...

Their use is less common now since the introduction of the mach-o load command LC_DYLD_CHAINED_FIXUPS, but these opcodes still have to be supported for older binaries. Also, some popular compilers including Zig still emit these opcodes for LC_DYLD_INFO and LC_DYLD_INFO_ONLY.

raddan 1 days ago [-]
I was told by an engineer at Microsoft that Excel's formula interpreter is essentially a kind of bytecode-based stack machine. This came up in the context of a bug I found (while working on a project with Microsoft) that revealed that not only was there a small floating-point bug in some calculations, but (improbably, to me) that Excel preserved this inaccuracy across architectures for decades. So the bytecode interpreter made sense. That said, I've never seen this implementation myself, so it may still be rumor.
pratikdeoghare 1 days ago [-]
There is one in golang regular expressions https://swtch.com/~rsc/regexp/regexp2.html

I guess that is why you say re.Compile.

rhdunn 1 days ago [-]
That goes back to Ken Thompson's NFA regex interpreter from 1968 [1], [2], [3]. Note: that whole regex series by Russ Cox [4] is great.

[1] https://dl.acm.org/doi/10.1145/363347.363387 -- Programming Techniques: Regular expression search algorithm

[2] https://swtch.com/~rsc/regexp/regexp1.html -- Regular Expression Matching Can Be Simple And Fast

[3] https://swtch.com/~rsc/regexp/regexp2.html -- Regular Expression Matching: the Virtual Machine Approach

[4] https://swtch.com/~rsc/regexp/ -- Implementing Regular Expressions
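
For readers who have not seen the VM approach from [3], here is a minimal sketch of the idea: a regex is compiled to a tiny program of `char`, `split`, `jmp`, and `match` instructions, and the matcher advances a set of threads over the input in lockstep. The tuple encoding and the helper names are assumptions made for illustration, and the program for `a+b` is hand-compiled rather than produced by a real compiler.

```python
def add_thread(prog, pc, threads, seen):
    """Follow jmp/split eagerly so the thread list only holds char/match pcs."""
    if pc in seen:
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == "jmp":
        add_thread(prog, op[1], threads, seen)
    elif op[0] == "split":
        add_thread(prog, op[1], threads, seen)
        add_thread(prog, op[2], threads, seen)
    else:
        threads.append(pc)


def full_match(prog, text):
    """Return True if the whole of text matches; runs in O(len(prog) * len(text))."""
    threads = []
    add_thread(prog, 0, threads, set())
    for ch in text:
        next_threads, seen = [], set()
        for pc in threads:
            op = prog[pc]
            # A thread sitting on "match" with input left simply dies here.
            if op[0] == "char" and op[1] == ch:
                add_thread(prog, pc + 1, next_threads, seen)
        threads = next_threads
    return any(prog[pc][0] == "match" for pc in threads)


# Hand-compiled program for the regex a+b
A_PLUS_B = [
    ("char", "a"),    # 0
    ("split", 0, 2),  # 1: loop back for another 'a', or move on
    ("char", "b"),    # 2
    ("match",),       # 3
]
```

Because each program counter appears at most once per step, the thread list never grows beyond the program length, which is what avoids the exponential backtracking discussed in [2].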

kqr 1 days ago [-]
I second the Russ Cox recommendation. I read that ages ago and that was what made me realise some theory could actually be useful in practice.
pjc50 1 days ago [-]
All regular expressions are equivalent to deterministic finite automata https://en.wikipedia.org/wiki/Deterministic_finite_automaton (finally, a use for my CS course); the extent to which that counts as a virtual machine varies. Some of the regex syntaxes extend it in ways which don't fit in a DFA and do count as a VM; Perl-compatible RE used to be popular (e.g. in Exim).
titzer 1 days ago [-]
It's easier to construct NFAs directly from regular expression definitions (rather than DFAs) because implementing the choice operator is easier. We can convert from NFA to DFA with worst-case exponential blowup.
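
The exponential worst case is easy to reproduce. A minimal sketch, assuming an NFA without epsilon moves stored as a dict from `(state, symbol)` to a set of next states; the helper names are made up. The classic bad case is "the n-th symbol from the end is `a`", which needs n+1 NFA states but 2^n DFA states.

```python
def subset_construct(nfa, start, alphabet):
    """Return the set of reachable DFA states (frozensets of NFA states)."""
    start_set = frozenset([start])
    seen = {start_set}
    work = [start_set]
    while work:
        current = work.pop()
        for symbol in alphabet:
            # The DFA successor is the union of all NFA successors.
            nxt = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def nth_from_end_is_a(n):
    """NFA with states 0..n for (a|b)*a(a|b){n-1}; state n would be accepting."""
    nfa = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        nfa[(i, "a")] = {i + 1}
        nfa[(i, "b")] = {i + 1}
    return nfa


# Number of reachable DFA states for n = 1..6
dfa_sizes = [
    len(subset_construct(nth_from_end_is_a(n), 0, "ab")) for n in range(1, 7)
]
```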
sureglymop 1 days ago [-]
Interesting. Not that surprising that it works like this. But isn't it a little surprising that things like regexes, printf syntax and other DSLs aren't mostly handled and parsed at compile time in 2026?
pjc50 1 days ago [-]
Kind of language-dependent since regexes are normally specified as strings and most languages are pretty weak at "run this code at compile time". One of the things Rust users are fond of.

C# is in the middle on this one, where specific features get compile-time support and regex is one of them: https://www.devleader.ca/2026/05/03/c-regex-performance-gene...

I have also built a C# source generator myself (XML parser generator), but the developer experience is a bit of a hill to climb compared to what it could be.

tptacek 1 days ago [-]
More surprising to me than the BPF VM itself is the optimizing compiler for it that lives in libpcap.
dlojudice 1 days ago [-]
Another World (Out of This World) had its own bytecode [1]

[1] https://github.com/fabiensanglard/Another-World-Bytecode-Int...

gmerc 22 hours ago [-]
Many games have. Neverwinter Nights (and descendants like The Witcher), Dragon Age, and Jade Empire implemented their own bytecode scripting languages.

Fun fact, for the console port of Dragon Age: Origins the scripts were cross compiled to cpp.

account42 7 hours ago [-]
Many games have their own scripting system with or without byte code. This is however not the same as Another World where the entire game is implemented on top of a small VM.
mikewarot 21 hours ago [-]
Don't forget Sweet16 by Woz

https://en.wikipedia.org/wiki/SWEET16

ivankelly 1 days ago [-]
Quake had its own VM also
childintime 12 hours ago [-]
I guess all (or most) of these could be replaced by a RISC-V VM, with the necessary domain extensions. A RISC-V VM already competes with WASM for many applications.
sph 12 hours ago [-]
Yes. After implementing an RV64IM simulator over a weekend, my most valuable takeaway is that it is a simple and well-designed enough architecture that it should be the base for any register virtual machine.

Instead of reinventing the wheel, just copy RISC-V. And the bonus is that you get all the existing tooling for free. Seeing a Rust program run on my simulator I wrote in two days is pretty magical.

Right now I’m working on a RISC-V on RISC-V simulator for sandboxing programs. I’m a big fan.
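
To give a feel for how little decoding is involved, here is a minimal sketch that executes three base-ISA instructions (ADDI, ADD, SUB). It is not the simulator described above: it assumes 32-bit registers (RV32I-style) for brevity, has no memory, branches, or M extension, and the `step` and `run` helpers are made-up names.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def step(regs, insn):
    """Decode and execute one 32-bit instruction word, updating regs in place."""
    opcode = insn & 0x7F
    rd = (insn >> 7) & 0x1F
    funct3 = (insn >> 12) & 0x7
    rs1 = (insn >> 15) & 0x1F
    rs2 = (insn >> 20) & 0x1F
    funct7 = insn >> 25

    if opcode == 0x13 and funct3 == 0:  # ADDI (I-type)
        result = regs[rs1] + sign_extend(insn >> 20, 12)
    elif opcode == 0x33 and funct3 == 0 and funct7 == 0x00:  # ADD (R-type)
        result = regs[rs1] + regs[rs2]
    elif opcode == 0x33 and funct3 == 0 and funct7 == 0x20:  # SUB (R-type)
        result = regs[rs1] - regs[rs2]
    else:
        raise NotImplementedError(f"unsupported instruction 0x{insn:08x}")

    if rd != 0:  # x0 is hard-wired to zero
        regs[rd] = result & MASK32


def run(program):
    """Execute a straight-line list of instruction words from a zeroed state."""
    regs = [0] * 32
    for insn in program:
        step(regs, insn)
    return regs


regs = run([
    0x00500093,  # addi x1, x0, 5
    0x00700113,  # addi x2, x0, 7
    0x002081B3,  # add  x3, x1, x2
    0xFFF00213,  # addi x4, x0, -1
])
```

The fixed field positions for `rd`, `rs1`, and `rs2` across instruction formats are a large part of why a decoder this small is possible.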

grishka 23 hours ago [-]
Smithereen, my fediverse server, contains a bytecode VM for the `execute` API method: https://smithereen.software/docs/api/methods/execute

I plan to eventually use it for things like automatic spam filtering as well.

omeid2 1 days ago [-]
This list is entirely incomplete without mentioning Java Card.

There is a tiny Java Bytecode VM in an insanely large list of places, you can find some of them here:

https://github.com/crocs-muni/javacard-curated-list https://en.wikipedia.org/wiki/Java_Card

p_l 22 hours ago [-]
Naughty Dog's Uncharted games for PS3 used bytecode VMs for various graphic tasks - essentially they implemented shaders running on SPUs using their custom bytecoded VM, with compiler written in Scheme.
kurtoid 18 hours ago [-]
LEGO Mindstorms programs run in an on-brick VM, iirc
indrora 4 hours ago [-]
Kinda-Sorta.

If you read Proudfoot's docs [ https://www.mralligator.com/rcx/ ] you'll find that what Lego did was half VM half native half "well, it depends".

There's a BIOS/stdlib, which in turn boots a userspace OS held in RAM ("firmware") that then executes the assembled mini-VM. However, there was nothing keeping people from replacing the in-RAM OS with something else, which led to BrickOS, leJOS, pbForth, ROBOLAB, etc.

I spent many, MANY hours of my youth hacking on the RCX and am damn sad that there isn't currently a good replacement for it.

kazinator 1 days ago [-]
Busicom 141 PF calculator (1971). This was a product built on the Intel 4004 processor. It was not programmed using Intel 4004 machine language directly, but using a more powerful machine language for which the 4004 ran an interpreter included in the image.
twic 1 days ago [-]
The Python pickle format is a bytecode [1], although not a Turing-complete one, I think.

[1] https://formats.kaitai.io/python_pickle/

hansvm 24 hours ago [-]
Pickle is definitely turing-complete. It's a super easy way to RCE your system.
twic 20 hours ago [-]
Where does that come from though? I don't see any flow control or anything else compute-y in the bytecode itself. I know unpickling can run Python code, but i wouldn't say that makes the bytecode itself Turing-complete.
hansvm 5 hours ago [-]
Among other things, a couple big culprits are STACK_GLOBAL, which converts strings on the stack into a Python object, functioning something like

  global_name = pop()
  module_name = pop()
  push(getattr(import_module(module_name), global_name))

And REDUCE, which executes code

  args = pop()
  f = pop()
  push(f(*args))

I think you're right that if you ignore the Python bits it's not a turing-complete stack machine, but I'm not sure ignoring those is fair.
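
A harmless way to watch those two opcodes do their thing, as a minimal sketch: the class name `Demo` is made up, and `len` stands in for whatever callable an attacker would actually pick. `__reduce__`, `pickle.dumps`, `pickle.loads`, and `pickletools.genops` are standard library; with protocol 4 the callable is looked up via STACK_GLOBAL and invoked via REDUCE.

```python
import pickle
import pickletools


class Demo:
    """On unpickling, this object is replaced by the result of len("abc")."""

    def __reduce__(self):
        # (callable, args): the unpickler will call callable(*args).
        return (len, ("abc",))


payload = pickle.dumps(Demo(), protocol=4)

# Loading the bytes calls len("abc"); no Demo instance is ever rebuilt.
result = pickle.loads(payload)

# List the opcode names in the stream without executing anything.
opcode_names = [op.name for op, arg, pos in pickletools.genops(payload)]
```

Swap `len` for something like `os.system` and the same two opcodes run a shell command, which is why unpickling untrusted data is unsafe.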
self_awareness 1 days ago [-]
RarVM was used in a previous version of the format, newest RAR has removed it, and RarV5 doesn't have a VM.
atiedebee 11 hours ago [-]
Another interesting one is the ZPAQ compression program[1]. It is one of the top performers on the large text compression benchmark[2] and uses the bytecode to specify how to model the data.

[1]: https://en.wikipedia.org/wiki/ZPAQ

[2]: https://mattmahoney.net/dc/text.html

ignoramous 1 days ago [-]
TikTok shipping XOR cipher'd bytecode & interp is right up there: https://news.ycombinator.com/item?id=34109771
pjc50 1 days ago [-]
VM for obfuscation is a whole thing. Denuvo has a particularly complicated one https://connorjaydunn.github.io/blog/posts/denuvo-analysis/

Other game examples using VMs not for obfuscation: Z-machine and SCUMM-VM.

anthk 12 hours ago [-]
Also I think X.org (or XFree86) had an x86 mini VM for non-x86 machines with libint10/vm86.

https://raw.githubusercontent.com/XQuartz/xorg-server/refs/h...

https://cgit.freedesktop.org/xorg/xserver/plain/hw/xfree86/i...

anthk 1 days ago [-]
yt-dlp's jsinterp.py

https://jxself.org/compiling-the-trap.shtml

I've got subleq+eforth (https://github.com/howerj/muxleq) running in JS which is dead simple to do. No input but I could output ASCII mapping values to an array.

https://esolangs.org/wiki/Subleq

So, yes. yt-dlp runs proprietary Youtube JS code defying the original purpose.

faangguyindia 1 days ago [-]
Why doesn't YouTube use TLS fingerprinting to block yt-dlp?
pocksuppet 1 days ago [-]
possibly because yt-dlp updates rapidly and would simply switch to the correct fingerprint, but Google-approved clients use many different and uncontrollable fingerprints (as they use OS TLS facilities for example).
wiseowise 1 days ago [-]
Hopefully, an iota of decency.
aa-jv 9 hours ago [-]
Being a Debian Guy, I recently found that RPM's .spec-file macro format was an unnecessarily tedious affront to my dignity, requiring immense ergs of patience to learn and understand - that is, until I discovered that underneath it all is a Lua VM - in which case I rapidly became a .spec aficionado and advocate upon loading up the REPL-like tooling to do some actual work.

It was a delightful, yet bittersweet surprise, to discover my favourite language and VM of choice was the cause of so much frustration - but yet, once I was able to wrangle my .spec files via the REPL, a certain kind of zen state was attained and I was actually able to ship the .spec properly.

I continue to be amazed at just where and when the Lua VM pops up. I've used it myself for many, many wonderful things, and shouldn't be surprised of course .. because Lua is the VM that just keeps on giving. It is out there in so many wonderful places ..

majorbugger 1 days ago [-]
Does it mean we can play Doom on WinRar?