We built a persistent agent memory layer on Elasticsearch with 0.89 recall (elastic.co)

110 points by showmypost 1 days ago | 38 comments

stingraycharles 1 days ago [-]

This is such a basic thing nowadays, and ElasticSearch is massive overkill for it. Something like SQLite or LanceDB or basically any vector database is much more appropriate.

This seems to be coming from the “we must make ElasticSearch AI-compatible” department more than anything.

clintonb 1 days ago [-]

If you already have Elasticsearch, it makes sense to continue utilizing it.

Saying, “just use SQLite” completely dismisses the idea that this is a _shared_ memory across teams. The ability to easily connect to the remote service and have everything “just work” pays dividends when you have dozens or hundreds of users.

appplication 1 days ago [-]

I’m literally laughing at the root comment’s idea of proposing we replace ES with SQLite and imagining how that architecture review would go. Not everyone is doing MB/GB scale workloads.

infinite_spin 1 days ago [-]

that would be a pretty frail architecture too, I think I recall ES even saying not to rely on it for data persistence. Every time I've worked with ES it was always backed by some other database used as a source of truth.

sandeepkd 1 days ago [-]

This is an important bit of information which either gets lost or ignored for convenience at times. The other side of it is that this opens the door to keeping two data stores in sync, which is a much bigger battle for a lot of small companies or teams.

jakevoytko 1 days ago [-]

Nah, "Any other vector DB" starts to fall apart once you need stuff like scripted scoring like OP uses. Then it starts to be a question of, "do you need ANN for performance?" since SQLite only does brute-force vector scoring. And granted, brute-force is performant for far more vectors than most people give it credit for, but it definitely hits a wall well below 1 million if you want it to have webpage-type latency.

Maintaining Elasticsearch isn't free, but picking an underpowered db and having to port to the right one is also quite time consuming.

stuaxo 7 hours ago [-]

Everyone has tools they are most comfortable with.

Postgres and PGVector work for me.

Elasticsearch always wants a lot of resources and takes a long time to populate, and requiring the JVM is yet another thing to add and configure.

infinite_spin 1 days ago [-]

it's also an odd situation to say a tabular database can replace a document store .. sure, it can, but that's not good practice from my point of view

also, I've run ES on an old laptop and it worked really well, so the cost of it can be pretty low if you're still in development

gchamonlive 1 days ago [-]

ElasticSearch is fine. If your dataset isn't too big, you aren't going to hit shard and memory limits, and if you do, chances are you're already in a large enough organisation to have the manpower for the required maintenance. It's not rocket science.

> This seems to be coming from the “we must make ElasticSearch AI-compatible” department more than anything.

I don't see the problem in that. It'd be great to have agentic capabilities embedded into Kibana and ES as long as it's not user hostile.

0xbadcafebee 1 days ago [-]

The design they talk about includes 3 different types of memory. They store those kinds of memory separately, so that if there's 10 users, all 10 access memories that are more general ("what bulbs work with this kind of light fixture"), and user-specific memories are segregated ("sarah has three lightbulbs"). The different memory types are ranked together leading to a different result. So this is a novel design and use of ElasticSearch-specific features

coldtea 23 hours ago [-]

Yeah, it's like this Dropbox service they made a big deal about, when one could just make one of their own with rsync and some bash scripting.

Catloafdev 1 days ago [-]

I agree for casual usage, but this seems targeted towards enterprise setups, where it makes much more sense to use something like ElasticSearch if you're already in the Amazon cloud, and especially if you're using the advanced features it provides like they are.

xor-eax-edx 1 days ago [-]

Would be interesting if one can replace ElasticSearch with something like Typesense here

BiraIgnacio 1 days ago [-]

TIL

- Hybrid recall + reranker: Two searches merged, then re-scored for best matches

- Supersession: Old facts get hidden, new ones take their place

- Decay: Recent or often‑used memories get a score boost

- DLS: Each user only sees their own documents
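A rough sketch of how those four pieces could fit together, for anyone who wants to poke at the idea. This is not the article's implementation: reciprocal rank fusion (RRF) stands in for the "two searches merged" step, an exponential half-life stands in for decay, and the `Memory` fields, `recall_memories`, and the 30-day half-life are all hypothetical names and values chosen for illustration. A real reranker model is left out.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    doc_id: str
    owner: str                # user the memory belongs to (stand-in for DLS)
    age_days: float           # time since the memory was last written or used
    superseded: bool = False  # True once a newer fact has replaced this one


def rrf_merge(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per doc."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def recall_memories(user, lexical_hits, vector_hits, store, half_life_days=30.0):
    """Merge two ranked lists, drop hidden docs, then re-score by recency."""
    fused = rrf_merge([lexical_hits, vector_hits])
    results = []
    for doc_id, score in fused.items():
        mem = store[doc_id]
        # Per-user filter and supersession: skip docs this user must not see.
        if mem.owner != user or mem.superseded:
            continue
        # Assumed decay: the score halves every `half_life_days`.
        decay = 0.5 ** (mem.age_days / half_life_days)
        results.append((doc_id, score * decay))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


store = {
    "a": Memory("a", "sarah", age_days=0),
    "b": Memory("b", "sarah", age_days=300),
    "c": Memory("c", "bob", age_days=0),
    "d": Memory("d", "sarah", age_days=0, superseded=True),
}
ranked = recall_memories("sarah", ["b", "a", "c", "d"], ["a", "d", "b"], store)
print([doc_id for doc_id, _ in ranked])  # ['a', 'b']
```

In Elasticsearch itself the per-user filter would be enforced by the cluster rather than in application code; the loop above only mimics the visible effect.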

0xbadcafebee 1 days ago [-]

Summary of the article (https://pastebin.com/aawJfrF6) since the original one is like reading an academic paper filtered through an LLM that hates human readers.

It seems like a cool approach. Don't know if it's novel but it's much smarter than "shove markdown files into directories".

SwellJoe 1 days ago [-]

> but it's much smarter than "shove markdown files into directories".

Is it, though? I mean, is there evidence "bunch of markdown files" is bad while "database the model has to be instructed how to use" is good? `rg` is fast as hell. Markdown is the LLM's native tongue. It does require maintenance of the Markdown files to keep them current, but maybe explicit management is fine. The models can do the grunt work.

BMDF (Bunch of Markdown Files) can be checked into the git repo, they travel to any developer on the project without any setup or special auth, any agent and any model can read them with no special tools to install, and humans can easily poke around and read them, too. And, they can be part of the PR review process, documenting the code and intentions.

I can't come up with good arguments for why a database or search index would be better than documentation in Markdown for any of my projects.

0xbadcafebee 1 days ago [-]

Because a bunch of markdown files is just RAG, and RAG is unintelligent, so the results are not great. If you want a smarter AI, it needs to have not-dumb memory. That's why this article (and the summary I posted) covers multiple kinds of memory, multiple ways of managing different memories, multiple ways of finding memories, a way to pick the best memory, and a way to manage memories long-term (and among multiple users). Now the memory isn't dumb, so the results are better. (And the article shows you why it's better)

tl;dr https://www.elastic.co/search-labs/blog/agent-memory-elastic...

SwellJoe 23 hours ago [-]

> If you want a smarter AI, it needs to have not-dumb memory.

Who says? According to what metric? How would you prove that assertion?

> Now the memory isn't dumb, so the results are better. (And the article shows you why it's better)

But, it doesn't. It explains what they built, and how it behaves. It does not show why it's better than any other alternative for making models "smarter", somehow.

itissid 1 days ago [-]

I have a request: can this text be even more AI generated?

Cilvic 1 days ago [-]

I wonder if one clear sign is the amount of custom language that makes me ask "is this lingo actually used in that field?", when more often than not it's just the LLM trying to sound hyper-professional.

voidUpdate 1 days ago [-]

For someone who isn't super familiar, what is "R@10", and is 0.89 good? It's impossible to google for

Hexcles 1 days ago [-]

89% chance the thing you want is among the 10 items returned by the system
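For the curious, a minimal sketch of how recall@k is usually computed. When each query has exactly one relevant item, it reduces to the "is it in the top 10" hit rate described above. The function names are illustrative and not taken from the article.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the relevant items that show up in the top-k results."""
    if not relevant:
        raise ValueError("relevant must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


def mean_recall_at_k(runs, k=10):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# One of the two relevant items is in the top 2.
print(recall_at_k(["a", "b", "c"], ["a", "z"], k=2))  # 0.5
```

A reported R@10 of 0.89 would then be `mean_recall_at_k` over the whole benchmark query set.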

voidUpdate 1 days ago [-]

So over 10% of the time, it fails? That's not a great search engine

schmookeeg 1 days ago [-]

It's sort of the whole tension when you query vectorized data/embeddings -- you need to balance precision and recall against the performance you need.

It took me a while to wrap my head around the two terms since they seem similar -- but precision is basically "did I get mostly good results" and recall is "did I get most of the good results" and they're subtly different. :)
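A tiny sketch of the difference, using made-up document IDs: the share of returned results that are good (usually called precision) versus the share of good results that were returned (recall).

```python
def precision_and_recall(retrieved, relevant):
    """Return (precision, recall) for one result set."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # "Did I get mostly good results?"
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # "Did I get most of the good results?"
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# 4 results returned, 2 of them good; 3 good documents exist in total.
precision, recall = precision_and_recall(["a", "b", "c", "d"], ["a", "b", "e"])
print(precision)  # 0.5
```

Here recall comes out at two thirds: returning more results can only raise recall, but it tends to drag precision down.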

Those two terms, though, will unlock as deep a rabbit-hole as you'd like on the subject.

esafak 1 days ago [-]

https://www.evidentlyai.com/ranking-metrics/precision-recall...

"Good" is subjective.

reactordev 1 days ago [-]

I built one into my agent using sqlite…

itissid 1 days ago [-]

Especially for indie users/devs and smaller teams. I built a part of this (the retriever) in < 4 hours, https://github.com/itissid/wiki, for replacing deepwiki.

I think the challenge is to teach people how ranking works more effectively so that they can build it for themselves and host it on their own.

Like the other day someone who has worked in search explained to me why you would care about using a learning-to-rank (LTR) technique to train your own feature weights on your data. My understanding is that weighted features work better (retrieval-wise) on textual data than plain BM25 plus vector-embedding indexing of text chunks with minimal preprocessing. So if you have lots of conversations you can create a ton of features (like attributes of a conversation) from them, and the ones that matter more will get higher weights. And you can use regularization (like L1) to kill unimportant ones.

[EDIT]: IIUC, I think LTR is important because you likely want different features to matter more for different parts of your documents, e.g. what matters for codebase documentation is different from your personal journal.
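A toy sketch of the L1 idea, under heavy assumptions: it is pointwise LTR (logistic regression on relevant/not-relevant labels) trained with proximal gradient descent, whereas production setups typically use pairwise or listwise losses from a dedicated library. The two features and their values are made up: the first lines up with relevance, the second is pure noise.

```python
import math


def soft_threshold(w, t):
    """Proximal step for the L1 penalty: shrink toward zero, clamp small values."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def train_l1_ranker(features, labels, l1=0.05, lr=0.5, epochs=500):
    """Learn per-feature weights; the L1 penalty zeroes out unhelpful features."""
    n, d = len(features), len(features[0])
    weights = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted relevance
            for j in range(d):
                grad[j] += (p - y) * x[j] / n
        # Gradient step on the logistic loss, then the L1 shrinkage step.
        weights = [soft_threshold(w - lr * g, lr * l1) for w, g in zip(weights, grad)]
    return weights


# Hypothetical features per document: [matches_topic, random_flag], coded as +1/-1.
features = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
labels = [1, 1, 0, 0]  # relevance depends only on the first feature
weights = train_l1_ranker(features, labels)
print(weights)
```

On this data the noise feature's weight stays at exactly zero while the informative one grows, which is the "kill unimportant ones" effect.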

reactordev 1 days ago [-]

I don't treat memory like RAG. That's the key. I only track decisions, actions, and outcomes.

itissid 1 days ago [-]

Ah so you extract decisions, actions and outcomes and you index and search over them?

reactordev 1 days ago [-]

Yeah, after I tokenize them and embed them into vector form. Then it’s a simple cosine distance.

The point about memory is sometimes you remember great detail, sometimes you only remember that the memory exists, so having a good tool loop to attempt to recall and try permutations is good.
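For reference, the brute-force version of that lookup is only a few lines. This sketch assumes the embeddings already exist as plain lists of floats (the embedding model is out of scope), and the memory IDs and vectors are invented. Cosine distance is just 1 minus the similarity computed here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, memory_vecs, k=3):
    """Score every stored vector against the query and keep the best k."""
    scored = [(mem_id, cosine_similarity(query_vec, vec))
              for mem_id, vec in memory_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


memory_vecs = {"decision": [1.0, 0.0], "action": [0.0, 1.0], "outcome": [1.0, 1.0]}
best = top_k([1.0, 0.1], memory_vecs, k=2)
print([mem_id for mem_id, _ in best])  # ['decision', 'outcome']
```

Every query touches every stored vector, so cost grows linearly with the number of memories; that is the trade-off against an ANN index.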

marknutter 15 hours ago [-]

At this point building a memory tool for your AI agents is like a rite of passage.

koinedad 1 days ago [-]

I think this is cool and helpful but my biggest complaint is the writing style and word choice just scream LLM

leoprctmp 13 hours ago [-]

try Manticore Search, it's much more lightweight

verdverm 1 days ago [-]

I'm using Typesense to power my take on a md kb, highly recommend this option which positions itself against Elasticsearch and Algolia. Combines vector with bm25 and all the extras you get from a trad search tool like Algolia.

dominotw 1 days ago [-]

is there any proof that all these shenanigans improve agent performance?

anthonypasq 1 days ago [-]

https://cursor.com/blog/semsearch

ozim 1 days ago [-]

Cursor and PI harness are the proof.

tuo-lei 1 days ago [-]

so the 11% miss rate - do users actually notice when the agent drops a memory? like if someone already said they tried X and the agent suggests it again.
