The Termite bundling is the most interesting part. Packaging embedding and reranking inference alongside the database means no separate model server to manage and no network hop for every vector op.
Curious about resource contention though: if a heavy indexing job saturates Termite, does that affect query latency on the Raft side? And how does Termite handle model cold starts in single-process mode?
On the license: the ELv2 framing is honest and the "can't offer as managed service" carve-out is pretty standard at this point. Won't bother most people reading this.
thefogman 1 day ago [-]
Interesting project.
I’ve got a project right now, separate vector DB, Elasticsearch, graph store, all for an agent system.
When you say Antfly combines all three, what does that actually look like at query time? Can I write one query that does semantic similarity + full-text + graph traversal together, or is it more like three separate indexes that happen to live in the same binary?
Does it ship with a CLI that's actually good? I’m pivoting away from MCP. Like can I pipe stuff in, run queries, manage indexes from the terminal without needing to write a client? That matters more to me than the MCP server honestly.
And re: Termite + single binary, is the idea that I can just run `antfly swarm`, throw docs and images at it, and have a working local RAG setup with no API keys? If so, that might save me a lot of docker-compose work.
Who's actually running this distributed vs. single-node? Curious what the typical user experience looks like.
kingcauchy 1 day ago [-]
Thanks for the awesome questions!!
Exactly the use case I built it for!
I wanted a world where you could build your indexes and the query planner could just be smart enough to use them in a single query. I've not quite nailed down the agentic query planner side 100% (it's getting there), but the JSON query DSL lets you pipeline, join, and fuse full-text, semantic, and graph search, plus reranking and pruning (score/token pruning), all in one query.
The CLI is my primary development tool with antfly, and I am definitely looking for feedback on what people would like to see there. It's a little chonky with the flags; --pruner, for example, requires writing the JSON for the config because I didn't want users to have to memorize 1000 subflags. It's definitely a first-class citizen.
With respect to "Termite + single binary", that's exactly right. Termite handles chunking (including multimodal chunking), embeddings (sparse + dense), reranking, and fused chunking/embedding models, and we're excitedly adding support for a variety of ONNX-based LLM/NER models for data extraction use cases (functiongemma/gliner2/etc.) so you don't have to set up 10 different services for testing vs. deployment.
We run Antfly ourselves for https://platform.searchaf.com (cheeky: search AntFly), an Algolia-style search product, in a distributed setup, and some users run Antfly single-node on large instances (more Postgres-sized datasets with millions of documents vs. large multitenant deploys). But we really wanted to build something with a more seamless experience going back and forth between distributed and single-node than Elasticsearch or Postgres can offer.
Hope that helps! Let me know if I can help you with anything!
wiresurfer 1 day ago [-]
A quick note on platform.searchaf.com: the account creation process hits a snag, with the verify-email links received by email giving a 404. Hope it helps.
On a parallel note, it would be nice to put an architecture diagram in the GitHub repo.
Are there particular aspects of the current implementation which you want to actively improve/rearchitect/change?
I agree with the goals set out for the project and can testify that elasticsearch's DX is pretty annoying.
Having said that, distributed indexing with pluggable ingestion/query custom indexes may be a good goal to aim for.
- Finite State Transducer (FST) or finite-state-automaton based memory-efficient indexes for specific data MIME types
- adding hashing-based semantic search indexes.
And even changing the indexer/reranker implementation would help make things super hackable.
kingcauchy 1 day ago [-]
Oh, thanks for the report on the 404 on the verify link (I abstracted out the OIDC auth for cross-domain login and must have missed a path).
Yes, good call. I started on that on the website with a React Flow-based architecture chart, but it's a bit high level and not consumable directly in GitHub markdown files. I'll work on that!
That's exactly the direction I've been working on: the rerankers, embedders, and chunkers are all pluggable, and the schema design (using JSON Schema for our "schema-ish" approach) allows for fine-grained index backend hints for individual data types, etc. I'll work on getting a good architecture doc up today and tomorrow!
KnowFun 9 hours ago [-]
This is powerful. At KnowFun, we're building a platform to transform multimodal content like videos and articles into interactive learning paths. A core challenge is making this knowledge base searchable and discoverable. A system like Antfly could be game-changing for creating a 'memory' of the content, allowing users to find related concepts across different formats. How extensible is the system for custom content types?
kingcauchy 5 hours ago [-]
I'd be super interested to hear more about what you all do in this space. Currently Antfly (and Termite) doesn't handle custom content types explicitly because we've mostly focused on supporting the "classic" ones (application/pdf, image/png, image/jp2, e.g.), but we've had to build out a lot of the support for these as custom support in the system. For instance, I chose JSON Schema for the schema so users could do exactly what you're suggesting: custom content types indexed differently. The ML side of things also has to know how to support them (i.e. does a PDF get rendered and OCR'd then embedded, or text-extracted with some fallback). Would love to hear about what you all do and the types of media you make searchable today!
The comment on the Pause method indicates that it waits for in-flight Batch operations (by obtaining the lock), but Batch doesn't appear to hold the lock during the batch operation. Am I missing something?
Upon another look, it turns out we were actually missing the pause lock for the backfill operation during a shard split too. I also went ahead and added it to Batch for good measure, although that case should be caught by the manager! Thank you for the report!
kingcauchy 18 hours ago [-]
Nope! Awesome you’re poking around though. I’m currently working on deterministic simulation testing and a feature set to allow pausing of index backfills but it’s not fully implemented yet, stay tuned!
schmichael 23 hours ago [-]
As a longtime Raft user (via hashicorp/raft), I'm curious about your Raft implementation! You mention etcd's Raft library, but it isn't natively Multi-Raft is it? Is your implementation similar to https://tikv.org/deep-dive/scalability/multi-raft/ ? I'd love to hear about your experience implementing and testing it!
carpenterant91 23 hours ago [-]
Awesome question! We'd experimented with https://github.com/lni/dragonboat and hashicorp/raft in the early implementation. The etcd/raft library had been ported to a multi-raft style implementation by CockroachDB way back when, but they went the way of TigerBeetle and coupled their consensus deeply with the KV storage. etcd recently, in v3.6, abstracted out its Raft implementation and exposed a pluggable interface into the transport layer, which meant we could implement our own multi-raft style transport layer with heartbeat and multi-node message buffering on top of HTTP/QUIC.
We implemented chaos testing suites akin to Jepsen to cover as many scenarios as possible and are currently implementing TigerBeetle style simulation tests on top of that for harder to reproduce scenarios!
Fascinating! We settled on QUIC with Protobuf because it was more performant in our testing than gRPC when coupled with the backoff and failure cases (node startup ordering, server/client connections), and to avoid being coupled to gRPC library versions in Go, which has bitten us a number of times with dependency management when you're trying to juggle k8s, etcd, and Google dependencies in the same Go project. Plus, the performance bottleneck in most of the use cases we're specializing in is on the embedding/ML side of things.
I've long wished for QUIC with Nomad! [1] We've always used a weird QUIC-over-TCP multiplexer called yamux. [2]
[1] https://github.com/hashicorp/nomad/issues/23848
[2] https://github.com/hashicorp/yamux (I'm fairly certain libp2p's fork is actually better)
Thanks for the links! I hadn't seen yamux before!
Of course the two most visionary people I worked with at Lytics went and built this. Just in time... this is the vector database I actually need. Termite is the killer feature for me, native ML inference in a single binary means I can stop duct-taping together embedding APIs for my projects. Excited to spend the upcoming weekends hacking on the Antfly ecosystem.
epsniff 22 hours ago [-]
I totally agree. I'm looking forward to what AJ and James build here. And I'm also planning on using it at my current company.
SkyPuncher 1 day ago [-]
Can you help me understand what type of practical features Graph Traversal unlocks?
I've seen it on a few products and it doesn't click with me how people are using it.
kingcauchy 1 day ago [-]
I can't speak for everyone; knowledge graphs are the "new hotness" of the AI space (RAG and MCP are seeing a lull in their hype cycles, I guess). But I've used graphs professionally for a long time to connect relationships that SQL normal forms have trouble expressing non-recursively. For example, I used graphs to define identity relationships between data sources hierarchically, then had another graph relationship on top of that to define connections between those identities: users at one level and organizations at the next. Graphs as indexes let you express arbitrary relationships between data so the database can do more efficient lookups. Some folks now use them to express conceptual relationships between data for AI: if I have a bunch of images stored in Google Drive, I might want to abstract the concept of pets, and pets have a relationship with a human, etc. Then my database query for all pictures related to the dog-pets owned by some human becomes a tractable search instead of a scan of the corpus!
epsniff 1 day ago [-]
One area I've seen knowledge graphs come up is Product Knowledge Graphs (PKGs): a centralized, semantic, highly interconnected data structure that brings together information about products, customers, and their interactions into a single, comprehensive "360-degree" view. Basically, it's the idea of combing through all the data the company has digitally about their customers (CRMs, codebases, ticketing systems, churn management systems, sales calls, ...) and building one giant knowledge graph they can use for a bunch of business intelligence use cases, or to power how they create new features. Then you slap an answer bar or semantic search on top of it, and you have a powerful way of getting insights or doing gap analysis on your product versus your customer needs.
Anyway, that's just one example of why you might want to use a knowledge graph. I'm sure there are literally hundreds more examples.
SkyPuncher 20 hours ago [-]
Okay, but that's just a bunch of SQL queries.
I can't figure out what the graph part of the equation unlocks.
Curious why you decided to go with Go instead of Rust, for instance.
kingcauchy 18 hours ago [-]
Great question! I think the fundamentally hard problem with distributed systems (at least for me!) comes down to the complicated distributed state machines you have to manage, rather than the memory management problems. I think async Rust gets in my way with respect to these problems more than it helps (especially when it comes to Raft or Paxos). That said, with the new async Zig I've been excitedly implementing a swappable backend for the core database that I hope will be a nice marriage of performance and ergonomics.
thedevilslawyer 18 hours ago [-]
The idea is good, but the project isn't open. So I assume a Rust fork will come out under MIT with these ideas, which could become the wider community-adopted version.
kingcauchy 18 hours ago [-]
Possibly. Amazon and Google have also made it possible for smaller startup-based DB companies to go that route with things like Valkey and OpenSearch. LLMs have made it super easy to transpile the ideas into whatever programming language you please, though; you just have to put in the time.
didip 1 day ago [-]
In query_test.go, I don't see how the hybrid search is being exercised.
For fun I am making hybrid search too, and would love to see how you merge the two lists (semantic and keyword) and rerank the importance scores.
kingcauchy 1 day ago [-]
There are some examples in the quickstart on the website, but I'll add an explicit e2e example case for that too; otherwise the tests for that are a little lower level in the code! I'll add the RSF (merging of the two lists) example for that too!! Thanks for the feedback.
I was thinking of creating something similar. Well done!
Aceshootzxx95 13 hours ago [-]
Good points here. Would love to see a follow-up.
jnstrdm05 1 day ago [-]
This looks sick!
Did you build this for yourself?
kingcauchy 1 day ago [-]
I built this for myself because I hated running a large Elasticsearch instance at work and wanted something that would autoscale and allow reindexing data. I also had a lot of experience running a large custom BigTable/Elasticsearch graph database that I thought could be unified into a single database to cut costs. I started adding an embedding index for fun based on some Google papers, and now here we are!
perfmode 1 day ago [-]
What Google papers?
kingcauchy 1 day ago [-]
Not strictly Google; Microsoft/Bing too. Here are the top ones from my notes:
https://arxiv.org/abs/2410.14452 (SPFresh), https://arxiv.org/abs/2111.08566 (SPANN), https://arxiv.org/abs/2405.12497 (RaBitQ), https://arxiv.org/abs/2509.06046 (DiskANN)
I have a variety of blogs that I used too, and reference implementations!
It's a RaBitQ-quantized hierarchical balanced clustering algorithm we use for the vector index, and we use a chunked segment index for the sparse index, if you're curious! Happy to discuss more!
perfmode 1 day ago [-]
Curious if you’re using any SIMD optimizations for numerical calculations.
kingcauchy 1 day ago [-]
Yes, we use SIMD heavily! https://github.com/ajroetker/go-highway I also added SME support on Darwin for most algorithms. We use it in the full-text index, all over the vector indexes, and especially heavily for the ML inference we do in Go.
rigorclaw 1 day ago [-]
[flagged]
kingcauchy 1 day ago [-]
Definitely open to working with you on supporting even better tooling for this, as I imagine many different "styles" of migration will be necessary.
The number one supported migration path for users, though, is one of my personal favorite features of antfly: the linear merge API, which allows you to incrementally reconcile an external pageable datasource with antfly at the pace you want, while also getting the benefit of batching! We support index templates just like ES, and the ability to change your schema and have antfly manage the full-text reindex for you. If you're looking at migrating your embeddings from Elastic or another vector DB, we can support that too! Let us know :)
epsniff 1 day ago [-]
Yeah, that is a pretty sweet feature. So you can keep two databases in sync during your migration until you finish the cutover.