A header-only C vector database library (github.com)
eatonphil 1 days ago [-]
As data stores go, this is basically in-memory only. The save and load process is manually triggered by the user, and the save process isn't crash safe, nor does it do any integrity checks.

I don't think it has any indexes either? So search time scales linearly with the number of entries.
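To make the crash-safety and integrity point concrete, here is a minimal sketch of one common approach: write to a temporary file, then rename it over the target, and store a checksum next to the data. It is not the library's save format. The names `fnv1a`, `save_atomic`, and `load_checked` and the file layout are assumptions made up for illustration. It uses only standard C, so it does not call `fsync`; real durability on POSIX would also need that, and `rename` only replaces an existing file atomically on POSIX-like systems.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* FNV-1a over a byte buffer: a simple integrity check, not cryptographic. */
static uint32_t fnv1a(const void *data, size_t len) {
    const unsigned char *p = data;
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Write count floats plus a checksum to "<path>.tmp", then rename it over
   path, so a crash mid-write leaves the previous file untouched.
   Returns 0 on success, -1 on failure. */
static int save_atomic(const char *path, const float *v, uint32_t count) {
    assert(path != NULL && v != NULL);
    char tmp[512];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp) return -1;

    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    uint32_t sum = fnv1a(v, count * sizeof *v);
    int ok = fwrite(&count, sizeof count, 1, f) == 1
          && fwrite(&sum, sizeof sum, 1, f) == 1
          && fwrite(v, sizeof *v, count, f) == count
          && fflush(f) == 0;
    if (fclose(f) != 0) ok = 0;
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Load into out (capacity cap). Returns the element count, or -1 if the file
   is missing, truncated, larger than cap, or fails the checksum. */
static long load_checked(const char *path, float *out, uint32_t cap) {
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    uint32_t count = 0, sum = 0;
    int ok = fread(&count, sizeof count, 1, f) == 1
          && fread(&sum, sizeof sum, 1, f) == 1
          && count <= cap
          && fread(out, sizeof *out, count, f) == count;
    fclose(f);
    if (!ok || fnv1a(out, count * sizeof *out) != sum) return -1;
    return (long)count;
}
```

A caller would invoke `save_atomic` wherever it currently triggers a save and treat a `-1` from `load_checked` as "file is corrupt or incomplete, fall back to the last good copy".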

andy99 22 hours ago [-]
Does declaring a function as inline do anything for any modern compiler? I understood that this is basically ignored now and the compiler makes its own decisions based on what is fastest.
wasmperson 21 hours ago [-]
The idea that it does nothing is a persistent myth. Both GCC and Clang heed it although neither treats it as a mandate:

https://tartanllama.xyz/posts/inline-hints/

This library seems to have the annotation on every function, though, so it's possible the author is just following a convention of always using it for functions defined in header files (it'd be required if the functions weren't declared `static`).

adrian_b 9 hours ago [-]
"static inline" is not the same as "inline".

In the former case the compiler is allowed to always inline the function.

In the latter case, even when the compiler chooses to inline the function, it also emits code for an independent instance of the function, because the function is public and it may be called from another file.

So "static inline" in the worst case does nothing, but it suggests to the compiler that the function should be inlined everywhere, which it will probably do, unless it decides that the function is too long (or it uses some features forbidden in inlined functions, e.g. variadic arguments, setjmp, alloca, etc.), so the benefits of inlining it may be less than the disadvantages.

When the compiler refuses to follow the suggestion of inlining the function, it can be made to tell the reason, e.g. with "-Winline".

So the compiler does not ignore the suggestion, even if it may choose to not follow it.
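As an illustration of the "static inline" case described above, a header might look like the sketch below. The header name `clamp.h` and the function `clamp_int` are hypothetical, not taken from the library under discussion.

```c
/* clamp.h (hypothetical header): safe to include from any number of .c files */
#ifndef CLAMP_H
#define CLAMP_H

#include <assert.h>

/* Internal linkage: each including translation unit gets its own private
   copy, so there is no duplicate-symbol error at link time, and the compiler
   is free to inline every call and emit no out-of-line copy at all. */
static inline int clamp_int(int x, int lo, int hi) {
    assert(lo <= hi);
    return x < lo ? lo : (x > hi ? hi : x);
}

#endif /* CLAMP_H */
```

Compiling a file that includes this with GCC and "-Winline" should report any call where the inlining suggestion was not followed.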

garaetjjte 4 hours ago [-]
>In the latter case, even when the compiler chooses to inline the function, it also emits code for an independent instance of the function, because the function is public and it may be called from another file.

Not in standard C. An "inline" function provides an implementation that is used only if the compiler decides to inline the call. If it decides not to inline, it emits a call to an external symbol that needs to be defined in a different TU (otherwise you will get errors at link time).
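A minimal sketch of that C99 rule, assuming the file is compiled as C99 or later (not gnu89 mode). The function names `twice` and `call_via_pointer` are made up for illustration. Normally the inline definition would sit in a header and the `extern inline` line in exactly one .c file; both are shown in one file here so the example is self-contained.

```c
#include <assert.h>
#include <limits.h>

/* Would normally live in a header. In C99 and later, this inline definition
   alone does NOT provide an external definition of twice(). */
inline int twice(int x) {
    return 2 * x;
}

/* Would normally live in exactly one .c file. This declaration makes the
   definition above an external one in this translation unit, so calls that
   are not inlined still have a symbol to link against. */
extern inline int twice(int x);

/* Taking the address forces a real symbol reference. Without the
   extern declaration above, this can fail at link time. */
static int call_via_pointer(int x) {
    assert(x >= INT_MIN / 2 && x <= INT_MAX / 2);  /* keep 2 * x in range */
    int (*fp)(int) = twice;
    return fp(x);
}
```

Removing the `extern inline` line and building without optimization is a quick way to see the "undefined reference" error described above.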

adrian_b 3 hours ago [-]
The meaning of "inline" differs between C and C++.

Quote from the gcc manual:

"GCC implements three different semantics of declaring a function inline. One is available with -std=gnu89 or -fgnu89-inline or when gnu_inline attribute is present on all inline declarations, another when -std=c99, -std=gnu99 or an option for a later C version is used (without -fgnu89-inline), and the third is used when compiling C++."

Nevertheless, "static inline" means the same thing in all 3 standards, unlike "inline" alone.

This can be a reason to always prefer "static inline", because then it does not matter whether the program is compiled as C or as C++.

TheNewAndy 18 hours ago [-]
One obvious benefit for a header only library is that it suppresses the warning you get when a static function isn't used.
uecker 16 hours ago [-]
It is not a benefit if you do not get warnings about unused functions. With any proper library, you would also not get warnings for functions that are part of the API that are not used, but you would get warnings about non-exported functions internal to a translation unit that are accidentally not used. This is a good thing.
ddtaylor 21 hours ago [-]
Kind of. At the end of the day the compiler can do almost anything it wants outside of undefined behavior, which isn't much of a guard rail.

In reality header only libraries allow for deep inlining, the compiler may optimize very specifically to your code and usage.

The situation is a bit more exaggerated with C++ because of templates, but there are some remaining gains to be had in C alone.

kazinator 1 days ago [-]
In the world of Kubernetes and languages where a one-liner brings in a graph of 1700 dependencies, and oceans of Yaml, it's suddenly important for a C thing to be one file rather than two.
jasonpeacock 1 days ago [-]
C libraries have advertised "header-only" for a long time. It's because there is no package manager or dependency management, so you're literally copying all your dependencies into your project.

This is also why everyone implements their own (buggy) linked-list implementations, etc.

And header-only is more efficient to include and build with than header+source.

uecker 24 hours ago [-]
I never copied my dependencies into my C project, nor does it usually take more than a couple of seconds to add one.
AlotOfReading 22 hours ago [-]
There's a number of extremely shitty vendor toolchain/IDE combos out there that make adding and managing dependencies unnecessarily painful. Things like only allowing one project to be open at a time, or compiler flags needing to be manually copied to each target.

Now that I'm thinking about it, CMake also isn't particularly good at this the way most people use it.

uecker 17 hours ago [-]
They are certainly bad vendor toolchains, but I want to push back against the idea that this is a general C problem. Even for the worst toolchains I have seen, dropping in a pair of .c/.h files would not have been difficult. So it is still difficult to see how a header-only library makes a lot of sense.
AlotOfReading 16 hours ago [-]
One of the worst I've experienced had a bug where adding too many files would cause intermittent errors. The people affected resorted to header-izing things. Was an off-by-one in how it was constructing arguments to subshells, causing characters to occasionally drop.

But, more commonly I've seen that it's just easier to not need to add C files at all. Add a single include path and you can avoid the annoyances of vendoring dependencies, tracking upstream updates, handling separate linkage, object files, output paths, ABIs, and all the rest. Something like Cargo does all of this for you, which is why people prefer it to calling rustc directly.

uecker 4 hours ago [-]
People certainly sometimes create a horrible mess. I just do not see that this is a good reason to dumb everything down. With a proper .c/.h split there are many advantages, and in the worst case you could still design it in a way that it is possible to "#include" the .c file.

I tried to use cargo in the past and found it very bad compared to apt / apt-get (even when ignoring that it is a supply-chain disaster), essentially the same mess as npm or pip. Some python packages certainly wasted far more time of my life than all dependencies for C projects I ever had to deal with combined.

quotemstr 1 days ago [-]
Writing new C code in 2026 is already an artisanal statement, so why not go all the way in making it?
fonheponho 1 days ago [-]
Exactly; I can't understand this obsession with header-only C "libraries".
hendler 1 days ago [-]
Useful for embedded devices? Crashes, disk updates not important for ephemeral process?
whstl 23 hours ago [-]
I feel like there's two kinds of developers. The ones who shit all over other people's preferences and turn everything into an almost religious discussion, and the ones who prefer to just build stuff.

Get over it. Some people like header only.

gkhartman 22 hours ago [-]
Agreed, once you've spent hrs fighting with C build tools under a deadline, it becomes very easy to see why this is beneficial.
FranklinJabar 20 hours ago [-]
No need to be an asshole; we can all discuss things civilly.
johnisgood 12 hours ago [-]
Only if people provide reasons for why they think it is bad, but with people along the lines of "Header-only? Eww. Sucks." you cannot.

To comment on this, I have a couple of header-only projects I have written. It makes sense in some scenarios. Sometimes I want no external dependencies and a single header file interface.

ddtaylor 21 hours ago [-]
Some people may not have known the difference and probably thought it was more akin to a naming convention.
whstl 21 hours ago [-]
I'm obviously not talking about the people asking "what is it".
FranklinJabar 20 hours ago [-]
It would be a lot better for the community if you directly replied to the objectionable content with a civil response.
Mikhail_Edoshin 1 days ago [-]
Why call it a header? Could be just a source file. Including sources is uncommon, but why not? Solid "amalgamation" builds are a thing too.
Y_Y 18 hours ago [-]
In the early days of CUDA it was pretty common to just #include all your sources, since linking was such a nightmare.
bawolff 23 hours ago [-]
As a non-C programmer, why would "header only" be a good thing?
saidinesh5 18 hours ago [-]
It's not.

It's a tradeoff people make between ease of integration - just download the .h file into your project folder and #include it in your source file instead of worrying about source build system vs target build system, cross compiling headaches etc...

And compilation times: any time you change any of your source files, your compiler also has to recompile your dependencies. (Assuming you haven't used precompiled headers).

atiedebee 1 hours ago [-]
Recompiling the dependencies should only really happen if you change the file with the implementation include (usually done by defining <library>_IMPLEMENTATION or something like that).
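A sketch of that implementation-macro pattern, using a made-up library name (`tinysum`) rather than the one under discussion. In a real project the first line would sit in exactly one .c file just before `#include "tinysum.h"`; the header contents are written out inline here so the example is a single file.

```c
/* In exactly one .c file of the project: request the function bodies. */
#define TINYSUM_IMPLEMENTATION

/* --- tinysum.h (hypothetical single-header library) starts here --- */
#ifndef TINYSUM_H
#define TINYSUM_H
#include <stddef.h>

/* Declarations: cheap to parse, seen by every file that includes the header. */
float tinysum_total(const float *v, size_t n);

#endif /* TINYSUM_H */

#ifdef TINYSUM_IMPLEMENTATION
#include <assert.h>

/* Definitions: compiled only in the one file that defined
   TINYSUM_IMPLEMENTATION, so editing other files does not rebuild them. */
float tinysum_total(const float *v, size_t n) {
    assert(v != NULL || n == 0);
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) total += v[i];
    return total;
}
#endif /* TINYSUM_IMPLEMENTATION */
/* --- tinysum.h ends here --- */
```

Every other file includes the header without the macro and sees only the declaration, which is what keeps rebuilds limited to the one implementation file.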
robotpepi 16 hours ago [-]
I'm completely ignorant about this, but wouldn't it be possible to compile parts of your project separately to improve compilation times? For instance, if you're using OP's vector library, which is self-contained, you could compile that first and just once?
saidinesh5 14 hours ago [-]
Let's say you need to use a function like this, defined in add.h:

    int add(int a, int b){
        // Long logic and then this
        return a+b;
    }
Let's say this is your main.c.

    #include "add.h"

    int main(void) {
      return add(5,6);
    }

The preprocessor just copies the contents of add.h into your main.c whenever you're trying to compile main.c. (let's ignore the concept of precompiled headers for now).

What you can do instead is put just the declaration of add in add.h, which tells the compiler that add takes two integers and returns an integer.

    int add(int a, int b);
You can then put the add function definition in add.c , compile that to an add.o and link it to your main.o at link time to get your final binary - without having to recompile add.o every time you change your main.c.

Precompiled headers: https://maskray.me/blog/2023-07-16-precompiled-headers

yxhuvud 14 hours ago [-]
Unless you have link time optimization you would lose out on optimization and performance.

The whole thing is essentially a workaround for lack of sufficiently good/easy ways to package code in the ways people want to use it.

ddtaylor 21 hours ago [-]
It often also means it was written more correctly. There is a bit of an art to designing a header only library and it can strike a different balance between code size and runtime speed optimization.

In strict terms when you place implementation in a .c file you probably want that code to be shared when different things call it, and the compiler will "link" to that same implementation.

When you have a header only library the compiler is free to optimize in more ways specific to your actual use case.

c45y 23 hours ago [-]
Extremely easy copy paste deployment into projects
colonCapitalDee 21 hours ago [-]
C's package management story is unfriendly to say the least. A header only library simplifies it dramatically, and makes it much more straightforward to integrate dependencies into your application.
johnisgood 11 hours ago [-]
Using your OS' package manager IS C's package management. Is it really that difficult to use apt, pacman, or BSD's "pkg"?
1718627440 3 hours ago [-]
This. Wish I could upvote this 10 times.
kreco 7 hours ago [-]
What if I'm using 10 different OS?

I can still push the file on git and it works everywhere else.

johnisgood 6 hours ago [-]
git is not a package manager. It does not handle many things a package manager does.
whstl 6 hours ago [-]
GP never said it was.

But it does successfully replace the need to use one, with fewer problems in certain situations.

ddtaylor 1 days ago [-]
Would it work to replace the memory store with mmap?
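For reference, a file-backed mapping could look roughly like the sketch below. It assumes a POSIX system, and `mapped_store`, `store_open`, and `store_close` are hypothetical names, not part of the library. Note that mmap alone does not make updates crash safe: a crash between writes can still leave partially updated data in the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical store: a flat array of floats backed by a file mapping. */
typedef struct {
    float *data;   /* points into the mapping */
    size_t count;  /* number of floats */
    int fd;
} mapped_store;

/* Create or open path, size it for count floats, and map it read/write.
   Returns 0 on success, -1 on failure. */
static int store_open(mapped_store *s, const char *path, size_t count) {
    assert(s != NULL && path != NULL && count > 0);
    size_t bytes = count * sizeof(float);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)bytes) != 0) { close(fd); return -1; }
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }
    s->data = p;
    s->count = count;
    s->fd = fd;
    return 0;
}

/* Flush dirty pages to the file, then unmap and close.
   Returns 0 on success, -1 if any step failed. */
static int store_close(mapped_store *s) {
    assert(s != NULL && s->data != NULL);
    size_t bytes = s->count * sizeof(float);
    int rc = msync(s->data, bytes, MS_SYNC);
    if (munmap(s->data, bytes) != 0) rc = -1;
    if (close(s->fd) != 0) rc = -1;
    return rc;
}
```

Writes through `s.data[i]` land in the page cache and reach the file on `msync` or when the kernel flushes them, so reopening the same path sees the earlier values.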
newzino 1 days ago [-]
Brute-force kNN gets a bad reputation, but below ~50K vectors the overhead of building and maintaining an HNSW index often costs more than it saves, especially for infrequent queries. I use sqlite-vec (also flat scan by default) in production with 10K vectors at 384 dimensions and search takes under 5ms.
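For readers unfamiliar with what a flat scan involves, here is a minimal sketch. It is not code from sqlite-vec or from the library in the post: `dist2` and `knn_flat` are made-up names, vectors are assumed to be stored row-major as float32, and k is capped at 16 to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Euclidean distance between two dim-length vectors. */
static float dist2(const float *a, const float *b, size_t dim) {
    float acc = 0.0f;
    for (size_t i = 0; i < dim; i++) {
        float d = a[i] - b[i];
        acc += d * d;
    }
    return acc;
}

/* Flat scan: compare query against all n row-major vectors in data and write
   the indices of the k closest, nearest first, into out_idx.
   Cost per query is O(n * dim) for distances plus at most O(n * k) for
   maintaining the small sorted buffer. */
static void knn_flat(const float *data, size_t n, size_t dim,
                     const float *query, size_t k, size_t *out_idx) {
    assert(k > 0 && k <= n && k <= 16);
    float best[16];  /* distances matching out_idx, ascending */
    size_t filled = 0;
    for (size_t i = 0; i < n; i++) {
        float d = dist2(data + i * dim, query, dim);
        if (filled == k && d >= best[k - 1]) continue;  /* not in the top k */
        size_t pos = filled < k ? filled++ : k - 1;     /* slot to overwrite */
        while (pos > 0 && best[pos - 1] > d) {          /* shift larger right */
            best[pos] = best[pos - 1];
            out_idx[pos] = out_idx[pos - 1];
            pos--;
        }
        best[pos] = d;
        out_idx[pos] = i;
    }
}
```

There is nothing to build or maintain between inserts, which is the trade-off being described: every query pays the full scan, but adding a vector is just an append.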

The low-hanging fruit for this library would be SIMD. At 128d float32, each distance computation touches 512 bytes of data. AVX2 processes 8 floats per instruction, NEON does 4. That's a 4-6x speedup on the hot path without changing the algorithm at all. For a header-only library where simplicity is the point, that seems like the right optimization to reach for before adding approximate indexing.
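A portable sketch of the kind of kernel this refers to. To be clear about what is swapped in: this is plain C with eight independent partial sums, not AVX2 or NEON intrinsics. The loop shape mirrors the 8 float lanes of a 256-bit register, which compilers can often auto-vectorize at higher optimization levels, though that is not guaranteed. The name `dist2_lanes` is made up, and no speedup is claimed for this exact code.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 8  /* one 256-bit AVX2 register holds 8 float32 values */

/* Squared Euclidean distance using LANES independent accumulators, so each
   lane's running sum depends only on its own previous value. */
static float dist2_lanes(const float *a, const float *b, size_t dim) {
    assert(a != NULL && b != NULL);
    float acc[LANES] = {0};
    size_t i = 0;
    for (; i + LANES <= dim; i += LANES) {
        for (size_t l = 0; l < LANES; l++) {
            float d = a[i + l] - b[i + l];
            acc[l] += d * d;
        }
    }
    float total = 0.0f;
    for (size_t l = 0; l < LANES; l++) total += acc[l];  /* horizontal sum */
    for (; i < dim; i++) {  /* scalar tail when dim is not a multiple of LANES */
        float d = a[i] - b[i];
        total += d * d;
    }
    return total;
}
```

Because the partial sums are added in a different order than a single-accumulator loop, results can differ from it in the last few bits for general inputs.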

One gotcha: metadata isn't persisted on save/load. The README mentions the binary format stores vectors and IDs but not metadata. Anyone attaching text chunks to their embeddings for RAG will lose them on reload.
