NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
Baby's Second Garbage Collector (matheusmoreira.com)
frutiger 1 days ago [-]
I wanted to read this but I couldn’t because of all the allusions in the article that distracted me from the points the author was trying to make.
allthetime 1 days ago [-]
On mobile all the text is about 3 times bigger than it needs to be as well making for an obnoxious amount of scrolling. Unreadable code examples.
matheusmoreira 1 days ago [-]
Thanks for the feedback. That's due to a viewport configuration meta tag which I added recently in an attempt to make it more responsive in portrait mode. I just reverted it. Should work just like on desktop now.
1 days ago [-]
starkeeper 1 days ago [-]
Yeah it totally grossed me out too, it's quite twisted. Author has serious issues.
ai_critic 22 hours ago [-]
If the prose is too much for you, any number of AI services will pre-chew it and give it to you devoid of voice, character, or offense.
mananaysiempre 1 days ago [-]
TL;DR: Conservative collector. Not where I would have taken things, but valid.

Re forcing a register spill: assuming the GC is invoked via an ABI-compliant function call, you don’t actually need to save all the scalar registers manually, only the callee-save ones (as setjmp does). Alternatively, you can make the compiler do the spilling by inserting a function preserving no registers into the call chain—this is spelled __attribute__((preserve_none)) in sufficiently new Clang or GCC, but you can also accomplish this kind of thing on Watcom with #pragma aux, for example.

Re obtaining the top and bottom of the stack: in addition to __builtin_frame_address, there’s also the age-old option of declaring a local and looking at its address, as long as strict aliasing doesn’t bite you. And if you know you’re running on Linux or something sufficiently compatible (as seen from the reference to auxv), you can treat argv as the bottom of the stack for the main thread, as the startup stack on Linux is (IIRC) argv, then the argument strings, then the environment, then the auxiliary vector.

viega 2 hours ago [-]
Interesting; while I knew about `preserve_all` and `preserve_most` attributes, I didn't have an inkling that they've added `preserve_none`. But that's much cleaner that the traditional `setjmp()` or custom asm (which I've never needed-- `setjmp()` has always seemed to get the job done portably).

As for stack bounds, address of a local is easy enough to make work well, but if you care about scanning thread stacks, I'm not aware of a good way to do that with compiler intrinsics only. If you're willing to assume pthreads, it's doable, but even that requires platform-specific code. The only viable portable approaches for dealing w/ thread stacks I'm aware of aren't awesome:

1. Assume your control of thread creation isn't circumventable, and add your own sentinel at the other end of the stack. 2. Probe the stack for the first access violation.

No option is particularly satisfying there; wonder if I missed some other intrinsic added along the way to solve that problem.

DevelopingElk 24 hours ago [-]
Good? Bad? Doesn't matter as long as you had fun.

Have you tested this GCs performance? Sometimes a baby GC can be fast enough.

mananaysiempre 23 hours ago [-]
Normally the reason to do a conservative GC is because it makes the VM’s implementation (and potentially FFI) simpler, not because it’s faster. Also for fun of course. (There are reasons a fully-conservative design might be faster, like not having to havr a ton of scanning functions if you have a ton of disparate types, but in general my perception is that production GCs are normally moving to at least some degree, which a fully conservative one can’t be.)
matheusmoreira 23 hours ago [-]
Yeah. I implemented the conservative collector because it simplified development of the language's primitive functions. Allowed me to put lisp values on the C stack without worrying about them being garbage collected.

The current version of the collector also compacts the heap and moves values around. All conservatively discovered values get pinned.

mananaysiempre 19 hours ago [-]
One recent interesting partly-conservative, moving design I remember reading about is Whippet[1] by Andy Wingo (Guile maintainer, SpiderMonkey contributor). He’s been partly forced into it, though, by the fact that Guile’s existing external API (unlike, say, Lua’s) exposes the fact that it uses a non-moving collector (by allowing the user to keep hold of raw pointers to Scheme objects), so I’m not sure if this should serve as an inspiration or a cautionary tale.

[1] http://wingolog.org/archives/2023/02/07/whippet-towards-a-ne... and other related posts

matheusmoreira 19 hours ago [-]
I really like his blog! I emailed him when I published my delimited continuations article because it addresses the overlapping native/lisp stacks problem he wrote about. Sadly I don't think he's seen it.

> One way is to inform the garbage collector of the locations of all roots [...] implicitly, in the form of a side table generated by the compiler associating code locations with root locations.

I wonder if there's a way to get GCC to do this.

mananaysiempre 18 hours ago [-]
Getting GCC to do things for you is fraught. Probably still possible; just... fraught.

I believe Clang was intended to be able to do this[1] and I remember seeing that stuff even back when it was a particularly spunky research project. The facility doesn’t seem to have really gone anywhere even if it technically works; I wonder why.

In general, the problem with stack maps of any kind—even the very minimal ones you need for stack unwinding—is that they’re liable to be large, slow to process, or both. Now that I’m thinking about it, I wonder if you could map the stack at runtime using those same unwinding tables. A C++ compiler does have to know where it put things so it can call their destructors, and perhaps you could make your GC root a word-sized but otherwise opaque uncopyable word-sized thingy so the compiler can’t put it in more than one place at once. (I can’t be the first to have the idea, even with how stupid it sounds and with how utterly miserable Itanium ABI unwinding is.)

[1] https://llvm.org/docs/StackMaps.html

LeonTing1010 16 hours ago [-]
[flagged]
matheusmoreira 23 hours ago [-]
Probably bad, right? I did have fun though.

I haven't measured the performance. I would like to. I'm especially interested in comparing it with the current version of the collector. It is now capable of heap compaction which will be the subject of a future article. I'm also interested in knowing how big a problem the conservative part of the collector is. The C stack depth has been minimized significantly since I got rid of the recursive evaluator. I don't think it's a significant problem but I could be wrong.

I need to implement the compiler's profiling functions in lone. Documentation on the matter seems scarce. Not even sure if those instrumentation APIs are stable. I got away with not implementing the ASAN and stack protector since they added flags that make the compiler emit trapping instructions instead of function calls. Profiling is a lot more complex.

PaulHoule 22 hours ago [-]
'Way back when the universe was formed, the first stack frames came into existence. Nobody truly knows who mapped them there, though among the well-learned, whispers of a great kernel are heard, a certain "Linux".'

That was 1970-01-01T00:00Z right?

matheusmoreira 21 hours ago [-]
Unknown to me. My study of kernology is limited to gazing with the naked eye into the higher addresses of 64-bit address space. I have looked there for this great kernel, but have found nothing but the fog of memory protection. I was unable to determine what lurked there and ascribe a date to its conception...

When asked such a question, the adepts of this "Linux" defend a thesis supported by archaeological expeditions which revealed the year of the oldest known mailing list scroll bearing that name: 1991. However, although it is heresy, I do know of the existence of an even older sect of powerful and wise sorcerers whose numerology includes the number you mention. They profess that this "Linux" was not the first. They speak of an age long forgotten where new kernel universes were not only routinely birthed from scratch but also diverged from one another, an age they call the Epoch. They say these galaxies exist out there to this day, unobservable to us, hidden away in corporate mainframes in lands far, far away...

Morpheus_Matrix 20 hours ago [-]
[flagged]
matheusmoreira 20 hours ago [-]
Thanks for the tips! I'll take a look at perf on Linux. If it supports freestanding programs I'll definitely start using it.

> On 32-bit this was a genuine problem but on 64-bit with large virtual address spaces, the chance of an arbitrary integer being a valid heap pointer drops alot. Especially if your allocator isnt using the low addresses. So the false retention problem is probably less bad than you'd expect.

False retention could be a problem in spite of that. Lone has moved to an index-based tagged value representation. Heap values are now indexes into a large contiguous array of objects. Pointers are calculated on the fly from the heap base pointer.

I did this in order to support easy zero copy heap reallocation with mremap. Uncoupling the values from their addresses also paved the way for heap compaction. It did turn the pointers into small integers though which are probably extremely common everywhere. I could devise a scheme to XOR some constant into the value in order to randomize it a bit and just undo the operation before dereferencing the actual pointers. Not sure if it's worth the trouble though. I haven't measured anything so far.

> One thing id be curious about is how your stack depth reduction after removing the recursive evaluator affects pause time. Conservative GC pause time is often dominated by how much stack there is to scan, so getting rid of recursive eval might have already improved your worst-case pauses more than you realize.

Indeed! I expect the C stack to remain shallow throughout the entire program. It was one of the major wins of the new evaluator design.

I just used a debugger to trace garbage collection during lone's execution of my recursive (fibonacci 10) test program. The garbage collector fired four times. The conservative stack scanner worked up from the bottom of the stack to the top. The difference between top and bottom was always 1008 in all four cases. Accesses are 8 byte aligned so that means 126 iterations. It's scanning around 16 cache lines. Added a counter for conservatively discovered values and the results were: 40, 38, 37, 33. So around 26~31% of values scanned were hits. Not too bad I guess?

tug2024 23 hours ago [-]
[dead]
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 20:01:19 GMT+0000 (Coordinated Universal Time) with Vercel.