Black-White Array: fast, ordered, array-based, with O(log N) memory allocations (github.com)
vlowther 6 hours ago [-]
If the performance charts are to be believed, this has uniformly worse performance in fetching and iterating over items than a boring old b-tree, which makes it a total nonstarter for most workloads I care about.

It is also somewhat ironic that one of the key performance callouts is a lack of pointer chasing, while the Go implementation is a slice containing other slices, with no guarantee that they share a backing array, which is just pointer chasing under the hood. I have not examined the code closely, but that is also probably what let them get rid of the black array as a performance optimization.

avadodin 1 day ago [-]
You accidentally the word arrays
barbegal 23 hours ago [-]
The memory overhead is fairly significant: it uses between 1.5 and 3 times the space of the data stored.
akoboldfrying 19 hours ago [-]
Interesting! I read the underlying paper, and it looks to me like a variant of log-structured merge (LSM) trees, with at most 1 segment of each power-of-2 size, and all segments stored in a single array W. The 1-bits in the totalSize variable tell you exactly which of these segments are "active" (in use), which is clever. Note that the half-size B array is only used for temporary working space, so it could even be allocated afresh at the start of each modifying operation and thrown away afterwards.

The claim of O(log n) search time isn't quite true -- it holds only for inputs drawn from a uniformly random distribution (Thm 5, p. 9). Many inputs are like this in practice, but many are not, which will cause degradation to O(log^2 n) time -- though this is still very reasonable, and may still beat or be competitive with other truly O(log n) approaches due to lower constant factors (no pointer chasing).
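To make the O(log^2 n) worst case concrete: without the uniformity assumption, a lookup may need one binary search per active segment, and there can be up to log2(n) segments. A minimal sketch (assuming, for illustration only, that active segments are stored back-to-back in the backing array W, largest first):

```go
package main

import (
	"fmt"
	"sort"
)

// search looks for x in the backing array w. The active segments are the
// power-of-2 sizes given by the set bits of totalSize, assumed here to be
// laid out contiguously from index 0, largest first, each individually
// sorted. Worst case: one binary search per segment, O(log^2 n) overall.
func search(w []int, totalSize int, x int) bool {
	offset := 0
	for bit := 62; bit >= 0; bit-- {
		size := 1 << bit
		if totalSize&size == 0 {
			continue
		}
		seg := w[offset : offset+size]
		i := sort.SearchInts(seg, x)
		if i < len(seg) && seg[i] == x {
			return true
		}
		offset += size
	}
	return false
}

func main() {
	// totalSize = 5 (binary 101): active segments of size 4 and 1.
	w := []int{2, 4, 6, 8, 5}
	fmt.Println(search(w, 5, 6), search(w, 5, 5), search(w, 5, 7)) // true true false
}
```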

One idea I had to speed up searching if you know you're going to be doing a lot of it with no modifications in between: Since the data structure is agnostic to the actual values within each active segment, whenever there are several adjacent active segments, that entire contiguous block of segments can be sorted without upsetting any invariants. From then until the next modification, instead of separate binary searches on each segment, a single binary search on the entire block can be done. This will halve the total number of binary searches needed in the worst case from log2(n) for 111111... to log2(n)/2 for 101010..., at the cost of these extra sorts -- but note that we don't actually need a full sort, just a (multiway) merge, since the individual segments are already in sorted order. Merging s segments can be done in O(n log s) time with a min-heap, which is not asymptotically better than a full sort (s could be as large as log2(n)) but will be faster in practice due to usually being lower, and lower constants.
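The s-way merge with a min-heap could look like the following sketch, using Go's container/heap. The heap holds one cursor per segment; each pop emits the global minimum and advances that segment's cursor, giving O(n log s) total work for n elements across s segments.

```go
package main

import (
	"container/heap"
	"fmt"
)

// item tracks the next unconsumed value of one sorted segment.
type item struct {
	val, seg, idx int
}

type minHeap []item

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].val < h[j].val }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(item)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeSegments merges s already-sorted segments into one sorted slice
// in O(n log s) time, keeping one heap entry per segment.
func mergeSegments(segs [][]int) []int {
	h := &minHeap{}
	for s, seg := range segs {
		if len(seg) > 0 {
			heap.Push(h, item{seg[0], s, 0})
		}
	}
	var out []int
	for h.Len() > 0 {
		it := heap.Pop(h).(item)
		out = append(out, it.val)
		if it.idx+1 < len(segs[it.seg]) {
			heap.Push(h, item{segs[it.seg][it.idx+1], it.seg, it.idx + 1})
		}
	}
	return out
}

func main() {
	fmt.Println(mergeSegments([][]int{{1, 5, 9}, {2, 6}, {3}})) // [1 2 3 5 6 9]
}
```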

In fact, the multiway merge idea could also be used to speed up insertions that trigger more than 1 merge: Instead of a series of s merges, each between a pair of equal-size segments, you could do 1 s-way merge. I would expect this to be faster as it does 1/s as many uncacheable writes (I'm not counting writes to the min-heap itself, which will fit in L1 for all practical n), but only profiling will tell.

Dwedit 23 hours ago [-]
The best optimization is to know how big the array will be before you add any elements at all. Use Reserve. Then you completely avoid any intermediate arrays and copying. And in the case of C++, you avoid mass copy construction for classes without move constructors.
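In Go terms (the language of the implementation discussed above), the equivalent of Reserve is allocating the slice's capacity up front with make. A quick sketch showing that a preallocated slice never reallocates its backing array during the appends:

```go
package main

import "fmt"

func main() {
	const n = 1000

	// With preallocation (Go's analogue of Reserve): one allocation up
	// front, so append never has to grow the backing array or copy
	// existing elements into a new one.
	reserved := make([]int, 0, n)
	capBefore := cap(reserved)
	for i := 0; i < n; i++ {
		reserved = append(reserved, i)
	}
	fmt.Println(capBefore == cap(reserved)) // true: capacity never changed
}
```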