In the event someone is encountering suffix trees for the first time and thinking of using one: the amount of RAM required for suffix trees is obscene.
xen0 10 hours ago [-]
I've never grokkeed suffix trees, but isn't possible for them to be O(n) in space (n total length of all strings)? Is there just an unacceptable constant factor overhead? I can imagine the pointer overhead being painful.
> the human genome can be encoded as a 3GB string constructed out of an alphabet of four characters
> As of 2019, a suffix tree indexing the human genome using state of the art algorithms can easily occupy tens of gigabytes.
vintermann 12 hours ago [-]
Yes, we get the same speed from suffix arrays these days, and much, much less memory usage.
But good luck visualizing what those algorithms do :)
esafak 1 days ago [-]
https://gemini.google/overview/canvas/ is great at these visualizations. I used it recently to help me understand chord progressions in specific songs, and compare them to those of other songs.
Rendered at 20:55:18 GMT+0000 (Coordinated Universal Time) with Vercel.
C++ with output rendered with Graphviz
Also explained by the creator of this: https://www.abahgat.com/project/suffix-tree/
> the human genome can be encoded as a 3GB string constructed out of an alphabet of four characters
> As of 2019, a suffix tree indexing the human genome using state of the art algorithms can easily occupy tens of gigabytes.
But good luck visualizing what those algorithms do :)