SQLite Is a Library of Congress Recommended Storage Format (sqlite.org)
tnelsond4 1 days ago [-]
I'm always inspired by SQLite. Overall I like it, but if you're not doing writes it's really overkill.

So I made a format that will never surpass SQLite, except that it's far lighter and faster and works on zstd-compressed files. It has really small indexes and can contain binaries or text just like SQLite.

The wasm part that decompresses, reads, and searches the databases is only 38kb uncompressed (maybe 16kb gzipped). Compare that to SQLite's 1.2mb of wasm and glue code: it's 3% the size, but searching and loading is much faster. My program isn't really column-based and isn't suitable for managing spreadsheets, but it's great for dictionaries and file archives of images and audio.

I ported the jbig2 decoder as a 17kb wasm module, so I can load monochrome scans that are 8kb per page and still legible.

https://github.com/tnelsond/peakslab

SQLite is very well engineered, PeakSlab is very simple.

sgbeal 1 days ago [-]
> Compare that to SQLite's 1.2mb of wasm and glue code

The current trunk is actually 1.7mb in its canonical unminified form (which includes very nearly as much documentation as JS code), split almost evenly between the WASM and JS pieces :/. Edit: it is 1.2mb in minified form, though.

Disclosure: i'm its maintainer.

Edit: current trunk, for the sake of trivia:

    sqlite3.wasm 896745
    sqlite3.mjs  816270 # unminified w/ docs
    sqlite3.mjs  431388 # unminified w/o docs
    sqlite3.mjs  310975 # minified
smartmic 24 hours ago [-]
Many comments here about your creation, PeakSlab, but not yet any dedicated praise. I didn't know it, but I have to say it is really cool and innovative! The performance of the dictionary is indeed superb and I will definitely bookmark this for future reuse. So, in a nutshell: thanks for sharing!
pjc50 1 days ago [-]
I think actually this competes with the old BerkeleyDB: https://en.wikipedia.org/wiki/Berkeley_DB - which I now see is no longer BSD-licensed, and in any case has been rendered almost extinct by SQLite. It was used for basic on-disk key-value store work.
nostrademons 19 hours ago [-]
It seems more like SSTables, which are widely used by open-source software like LevelDB, HBase, and Cassandra (and Google's BigTable) but AFAIK don't have a standard open-source reader (unless you want to pull the relevant source file out of Cassandra or LevelDB).

https://www.igvita.com/2012/02/06/sstable-and-log-structured...

tnelsond4 17 hours ago [-]
Yes, this does seem almost exactly like what PeakSlab is doing. I'll have to investigate more how SSTables and memtables work to see if there are any improvements or anything to glean from them.

I'm storing null-delimited sorted strings with an index of offsets to where each entry begins, so if I wanted to do memtables for writing I would have to have two memtables and merge them on write? I've been considering this exact approach.
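
Roughly, the read path is a binary search over that offset table; a minimal Python sketch of the idea (the names are made up, the real thing is C/wasm):

    def key_at(blob, offsets, i):
        # read the null-terminated key starting at offsets[i]
        start = offsets[i]
        end = blob.index(b"\x00", start)
        return blob[start:end]

    def lookup(blob, offsets, target):
        # binary search over the sorted, null-delimited keys
        lo, hi = 0, len(offsets) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            key = key_at(blob, offsets, mid)
            if key == target:
                return mid
            elif key < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return None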

tnelsond4 1 days ago [-]
Even BerkeleyDB tries to be mutable. What I'm doing doesn't need the mutability so it's much more similar to dictionary formats (though probably simpler) than it is to a database. Though a lot of people do use full databases for immutable dictionary key-value stuff. I just couldn't get any database to work well enough for a pwa dictionary.
tingletech 19 hours ago [-]
I don't think it has had a BSD license this century; Sleepycat was selling licenses in the 90s before Oracle bought them.
raxxorraxor 23 hours ago [-]
SQLite is simple in its own way and I like the design principle of their SQL dialect.

"Right joins are just left joins in the wrong direction, you don't need that crap"

Of course it can always get simpler or more specialised. I think many apps using databases would run just as well with SQLite. And some would probably run just as well with a text file instead of any db like SQLite.

chungy 22 hours ago [-]
> "Right joins are just left joins in the wrong direction, you don't need that crap"

SQLite has supported all types of joins since version 3.39 in 2022.

tnelsond4 22 hours ago [-]
I must've messed something up, but I remember some joins (was it full outer join?) being unbelievably slow? Was I doing something wrong?
chungy 19 hours ago [-]
Too vague of a question to give you an answer you'll likely find satisfactory :)

You probably just needed to create indexes over your data to speed things up.
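
For example, with a hypothetical customers/orders schema, an index on the join column is usually the difference between a full scan per row and a lookup:

    import sqlite3

    con = sqlite3.connect("app.db")  # hypothetical database
    # without this, the join below scans all of orders for every customers row
    con.execute("CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders(customer_id)")
    rows = con.execute(
        "SELECT c.name, o.total FROM customers c "
        "LEFT JOIN orders o ON o.customer_id = c.id"
    ).fetchall()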

raxxorraxor 22 hours ago [-]
Well, look at that, now it is downhill from here!
luckystarr 23 hours ago [-]
For the love of god, don't do plain text files anymore. In the end you have software that has 20 (or more) individual files, one for each program section, which works fine until you want the files to be consistent. Boom. And then you add a lock to fix it and suddenly your whole program can only run sequentially. And then your customers ask why ingest is so slow. I won't name any names here, but this is a real commercial product.
BenjiWiebe 23 hours ago [-]
We use a cheap invoicing program. It works fine except it gets very slow when dealing with large numbers of invoices. Turns out each invoice (or payment record, or customer record, or whatever) is a separate text file with form-urlencoded data. No indices.
gpvos 1 days ago [-]
A more standard solution would be cdb.[0] Although that doesn't support compressed data.

[0] https://cdb.cr.yp.to/ , https://en.wikipedia.org/wiki/Cdb_(software)

tnelsond4 22 hours ago [-]
I'll definitely have to look at this. I'm using binary search right now because it's fast enough and I understand it better, but I might replace it with hashes at some point.
tracker1 12 hours ago [-]
I've had to deal with image scans and my own approach lately has been to quantize to 16-color (grayscale via pngquant) and oxipng with zopfli set to max. Mostly so that the output can be easily used in a browser UI.

Will definitely have to look into your solution with JBIG2 next time. Though I may stick with png just for compatibility's sake.

tnelsond4 10 hours ago [-]
Yeah, I had Claude generate the jbig2-to-1-bit-png wasm module from the jbig2 source so that there'd be no dependencies on zlib or libpng. Then I aggressively removed functions, going from 97kb down to 17kb. So if it's missing functions you need, you may need to reset the files to the .bak and recompile.
tmountain 22 hours ago [-]
Overkill in what way exactly? The LOC of the project shouldn't have any bearing on most people's usage of it. SQLite is one of the most well-tested and mature projects in the world. What exactly would motivate someone to use PeakSlab instead? What problem are you solving?
tnelsond4 22 hours ago [-]
I'm solving a simpler problem: making cross-platform dictionary progressive web apps with indexes, full-text search, and HTML tags and uppercase letters inserted back into the text on render so they don't interfere with search.

SQLite is 1.2mb in combined wasm and JavaScript and not really designed for my use case, so I would have to add all the things I need anyway, like compression and HTML tag insertion. For my use case, which is just PWAs, SQLite takes too long to load, the files are too big, and the search isn't tailored. So I made something else in 38kb instead.

tmountain 19 hours ago [-]
Got it, thanks.
IshKebab 22 hours ago [-]
Read the comment. He's using it in WASM form and doesn't want users to have to download 1.2MB of SQLite every time they visit the page.
shermantanktop 21 hours ago [-]
Client caches are a thing, so this is most relevant for cold-start customers. In that case PeakSlab’s download size is an advantage.

Fwiw LocalStorage is a SQLite db on most browsers, with a kv api. It'd be interesting to have the actual API available.

tnelsond4 20 hours ago [-]
Even on warm start PeakSlab is twice as fast. It's not just download size, it's execution speed, zero copy, database decompression, etc.

That's why PeakSlab is written in C, because what's faster than casting the whole database to a struct? ;-P

tekne 17 hours ago [-]
I think web SQLite (Web SQL) was originally an (experimental) thing
kev009 18 hours ago [-]
Even if you aren't doing writes, SQLite should be the default option for a file-based format or even durable IPC. And it isn't going to fall away (cf. BerkeleyDB, Tokyo Cabinet). I recently dug into WAL mode and it is pretty incredible for multiple readers: https://github.com/InterNetNews/inn/pull/338.
giza182 1 days ago [-]
Perhaps a dumb question, but how do you get data into it if you're not doing writes?
andrelaszlo 1 days ago [-]
I think it's just immutable once you've generated it. No need to update indexes or check consistency on writes, no need for transactions, etc.
electroly 21 hours ago [-]
I have a system that builds SQLite databases and uploads them to S3. Once they're in S3, they are never changed. The program that builds the databases only does writes, and the program that queries the databases only does reads. It uses a VFS to query the database in-place with HTTP range requests.

This is indeed not an optimal setup. A more careful design from first principles would not require seeking around the file as much as SQLite does, we'd do a better job on reading exactly the correct range of bytes for a given query since we know ahead of time what the access patterns are, and we could do reads in parallel. With SQLite we have to be very careful about the schema design to ensure it won't have to seek too many times to answer a query. But SQLite was expedient, and I'm confident I'll always be able to read the files. That's less certain for a custom file format.
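
Underneath, the VFS boils down to HTTP range requests for database pages; a sketch of the idea (hypothetical URL, assuming 4096-byte pages):

    import urllib.request

    # fetch the second 4096-byte page of the database, the way a range-request VFS would
    req = urllib.request.Request(
        "https://example-bucket.s3.amazonaws.com/data.db",  # hypothetical URL
        headers={"Range": "bytes=4096-8191"},
    )
    page = urllib.request.urlopen(req).read()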

giovannibonetti 19 hours ago [-]
If it's going to be read-only, why not make it a Parquet file instead? It should result in a smaller file size due to columnar compression.

DuckDB has built-in capability to read Parquet files with HTTP range requests.
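
For example (a sketch; the URL is hypothetical):

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    # only the footer and the row groups the query needs are fetched
    rows = con.execute(
        "SELECT * FROM 'https://example.com/data.parquet' LIMIT 10"
    ).fetchall()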

electroly 18 hours ago [-]
For this use case we need the ability to do an indexed query and extract a small number of rows from a large database. It's a traditional row-oriented database workload. I'm sure other solutions would also work, but SQLite's design melds well with the data. The migration from partitioned SQL Server tables to a collection of SQLite database files was straightforward.
pfortuny 1 days ago [-]
Think historical records of, say, share values for past years. You might have a single db for 1900-2000, for instance. Things like that.

Not everything needs to be real-time updated.

tnelsond4 1 days ago [-]
Generate it one time from a source tsv file or folder of media.
shermantanktop 21 hours ago [-]
It’s an RODB. Ship the preindexed data blob.
meindnoch 1 days ago [-]
It is crashing Safari.
tnelsond4 23 hours ago [-]
Works on my wife's old iPhone. I don't have a Mac to test things on.
TheRealPomax 19 hours ago [-]
If you're not modifying data, whatever system is using the data doesn't need a database at all, it just needs a data export.
themafia 16 hours ago [-]
> but if you're not doing writes it's really overkill.

SQLite is not just a B-Tree+ file format manager. It's also a powerful CLI with tons of built-in data manipulation and extraction tools which can all be scripted.

Finally, if your data is _relational_, then a format that preserves those relations, even if they aren't intended to be updated, is exceptionally powerful.

> This is a custom format built to be very fast to load (cast to a c-struct and done) and very fast to search with 3 binary search indexes built in.

What is the endianness of your format?

tnelsond4 15 hours ago [-]
Little endian. x86 and wasm have the same endianness so it's very straightforward. If I wanted to enable support for reading and writing on big-endian systems I could patch a few lines very easily, but it seems to be an unnecessary consideration for most uses.
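
In Python terms, reading a little-endian fixed layout looks something like this (a sketch; the field names are made up, not PeakSlab's actual header):

    import mmap
    import struct

    with open("dict.peak", "rb") as f:  # hypothetical .peak file
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # zero-copy view

    # hypothetical 12-byte header: magic, entry count, index offset (little-endian u32s)
    magic, count, index_off = struct.unpack_from("<3I", buf, 0)
    offsets = struct.unpack_from("<%dI" % count, buf, index_off)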

Instead of multiple columns (there's some of that with secondary and tertiary indexes), it's mostly just key-value.

To make the data more relational I find it's more natural to just have additional "columns" in separate peak files, and when the peak files are queried they can be combined by headword. This is kinda like what Parquet does, putting all of one column together.

I could make PeakSlab have columns and have it query based off specific columns fairly easily, but it was a design decision to make things more tag based and less column based. I like that it's more flexible and less rigid in structure.

SQLite keeps everything relational, and its raw file sizes are 50% larger than a .peak file. Not sure exactly what's to blame, but I'm optimizing for small file sizes.

zoky 1 days ago [-]
something something XKCD competing standards something something
lpln3452 1 days ago [-]
Creating something new for a different use case isn't pointless. It's like comparing inline skates to ice skates.
tnelsond4 1 days ago [-]
Believe me, I tried sticking to SQLite or aard2 or stardict, they just were fundamentally inadequate with no good pwa cross platform tooling.
bbkane 1 days ago [-]
Does this remain true now that SQLite has a WASM build?
tnelsond4 1 days ago [-]
Yes, because originally when I started PeakSlab it used the SQLite wasm build.
keybored 1 days ago [-]
Doesn’t even apply unless someone says that (1) there are too many “standards”, and (2) so we are making this standard (neither apply here). Someone made something.

We should really consider eventually retiring memes because they just end up as thought-terminating cliches.

This is of course referring to xkcd #927. How do I know that?

alexpotato 1 days ago [-]
I have always loved SQLite.

I have also heard that some firms ban its use.

Why?

Because it makes it SO easy to set up a database for your app that you end up with a super critical component of your application that looks exactly like a file. A file that can have any extension. And that file can be copied around to other servers. Even if there is PII in that file. Multiply this times the number of applications in your firm and you can see how this could get a little nuts.

DevOps and DBA teams would prefer that the database be a big, heavy iron thing that is very obviously a database server. And when you connect to it, that's also very obvious etc etc.

I still love SQLite though.

Fwirt 1 days ago [-]
The question is, do the same firms ban Excel? Excel spreadsheets often end up as shadow databases in unlikely places.
croon 1 days ago [-]
This might catch flak, but generalizing, I would assume that the people banning things are the same people who would use Excel for something where a database would be better, and if so, that is the reason Excel isn't banned under the same conditions that would get SQLite banned.
hermitShell 1 days ago [-]
The sane thing would be to ban Excel and promote SQLite. Excel is often used for tabulated text (issue tracking), not calculations. Perfect use case for a relational db.
rswail 1 days ago [-]
Excel has sheets for tables, columns and rows, primary keys (UNIQUE), foreign key references etc if you squint.

It doesn't require you use all of that properly, but it's there.

frollogaston 1 days ago [-]
Excel is made for calculations. But if you make it hard to make a DB, people will abuse Excel as a DB.
TJSomething 1 days ago [-]
I mean, it might have been at first, but Microsoft figured out in 1993 that the majority of users use it for lists without formulas, and they've strategized around that. IMHO, the biggest concession to this was when they added Power Query to core Excel in 2016.
harvie 1 days ago [-]
or reimplement excel with sqlite as a backend :-D

BTW sqlite can run SQL queries on CSV files with a relatively simple one-liner command...
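
Something like this, for instance (one possible invocation; `.import --csv` needs a reasonably recent CLI and takes column names from the file's header row):

    sqlite3 :memory: -cmd '.import --csv people.csv people' \
        'SELECT city, count(*) FROM people GROUP BY city'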

euroderf 21 hours ago [-]
Well heck, can't someone make an SQLite extension that is basically just a simplified Excel?
0123456789ABCDE 1 days ago [-]
and excel has a gui for forms
rantingdemon 1 days ago [-]
Only where VBA is available. Not available for macOS versions, if I'm correct?
nereye 22 hours ago [-]
VBA is just there for backward compatibility.

The modern alternative is to use JavaScript/TypeScript, which makes such solutions cross-platform (including macOS, web, etc.):

https://learn.microsoft.com/en-us/office/dev/add-ins/overvie...

silon42 1 days ago [-]
IMO, almost any Excel more than a month old should become readonly.
irishcoffee 1 days ago [-]
You should consider knock-on effects of this brilliant idea. Now there would be copies of spreadsheets younger than a month that get replicated 47 billion times, exponentially compounding the problem you're trying to solve.

This sounds like how we pass so many stupid laws. Nobody thinks about 2nd order effects.

silon42 5 hours ago [-]
Doesn't this happen anyway with "final2.really.final"?
perching_aix 24 hours ago [-]
So you're saying they should further auto-delete after two or three months?
irishcoffee 24 hours ago [-]
3rd order effect: people copy and paste the old sheet into a new sheet, and now we have a worse exponential. You're not very good at this, huh?
perching_aix 24 hours ago [-]
Which is very annoying and people will complain. People complaining can be then directed towards a better solution. As a bonus, mistakes will also rise, leading to further complaints, especially ones that reach higher. All this making the dogshit practice, and the idiots committing them, infinitely more visible and thus fixable.

The sheer volume of data that needs tending to may even grind certain departments to a halt! What a great opportunity! It'd appear I'm positively stellar at this!

irishcoffee 21 hours ago [-]
Sorry for the snark, that was shitty of me.
perching_aix 16 hours ago [-]
No worries, was a bit of a gamble of a joke from me (sarcasm frequently doesn't translate in text, or can be inopportune), so I tried taking it accordingly.

For clarity, while I did have some rather perverse fun toying with the idea, I do not actually think it should be implemented, or at least certainly not in one fell swoop and as-is. Mostly for the aforelisted reasons. This is what I actually intended to convey.

Though for better or for worse, that doesn't mean I think the notion is completely meritless either, so I might still be deserving of at least some of your snark. But a lot of it was in jest from my side indeed.

This whole keeping an inventory, disabling items, and later taking them out completely is a chore I already do in other contexts. While it does work, it is anxiety inducing, doesn't really scale well, and it's quite miserable to go through with. It's the cost of keeping things organized as far as I'm concerned, with no real way around it in the general case. The best one can do is try and mitigate it, by monitoring for patterns and building out systems, to automate and streamline the tedium away.

I do sincerely not know of other ways to keep things in check though, in lieu of which you do get the makeshift shadow ops with all of its pitfalls. It's kind of just life.

Spooky23 1 days ago [-]
They generally cannot. But they do banish Access.
pasc1878 1 days ago [-]
Now that is different.

Access gets used for a shared DB, and that is quite easy to corrupt. It is much more cost-effective to have that in a proper central database (I suppose SQLite is better here as well).

cwillu 1 days ago [-]
Excel is also a shared DB: it has supported multiple concurrent users accessing and modifying the same spreadsheet for decades.
haspok 22 hours ago [-]
You can enforce classification and privacy labels (or something similar) in Excel and other document files, at least in a closed corporate environment. Azure also supports this. Also, everyone has Office installed in a corporate environment, so anyone can open and work with an Excel file.
jnwatson 22 hours ago [-]
I don't have Office installed, nor do a significant majority of my peers. Given that sqlite is installed by default on Macs, a sqlite file is far more portable than an Excel file.
mcdonje 21 hours ago [-]
PII sniffers are pretty good at dealing with Excel files. Excel is seen more as an analyst tool than a dev tool. Any place that bans Excel needs to either let analysts use some other Turing-complete data tools, like Python or R or something, or they'll have trouble attracting analyst talent. They'll have devs and data entry users and that's it.

The only way that works is if the dev team is large enough to be responsive to business needs, which almost never happens because devs are expensive. The juniors who are tweaking business logic every day are functionally doing a role analysts can do if you just give them a sane API and data tools.

mr_toad 1 days ago [-]
I’ve worked at some organisations that have strict rules (not always strictly followed) about what can go in Excel spreadsheets, and where they have to be stored. The C drive is verboten. Some also have standards about classification and labelling of PII and sensitive data.
forinti 21 hours ago [-]
Don't get me started on Access...
mcdonje 21 hours ago [-]
Man, Access could've been so good if they just made an app around SQLite. Or since it's Microsoft and they need to do everything their own way, it would've been so good if they made a flat file DB à la SQLite, but with T-SQL (or a subset thereof) instead of JET-SQL.

Increase interoperability. Funnel data people from Excel into real DB technologies.

And it would've been even better if they'd done more to blur the lines between spreadsheets and databases, made it seamless to work out of both Excel and Access, added more spreadsheet features to the data views, etc.

DeathArrow 1 days ago [-]
Do companies ban text files? Text files are used to store data.
perching_aix 24 hours ago [-]
That's why you store them on unsaved tabs instead.
yard2010 1 days ago [-]
Do companies ban data centers? It's crazy to send PII to other computers on the line.
altmanaltman 1 days ago [-]
Do companies ban brains? Brains are used to store data.
shermantanktop 21 hours ago [-]
Brains do not exhibit ACID properties…
tehlike 1 days ago [-]
There are interesting uses for sqlite, like this one: https://sqlite.org/sqlar.html
darthwalsh 10 hours ago [-]
A production app with customer data needs a data backup/restore strategy. I'm guessing a random app server writing to a local sqlite file isn't doing that either.
y-curious 20 hours ago [-]
Required reading for “anything can become a mission critical database” conversations:

https://www.reddit.com/r/sysadmin/comments/eaphr8/a_dropbox_...

giancarlostoro 22 hours ago [-]
This is why I put configs like that into AppData or dotfile directories, or the equivalent for MacOS (I forget which one it is inside of the ~/Library directory).
Scribbd 23 hours ago [-]
I recently watched a YT video about this subject: https://www.youtube.com/watch?v=lSVgeMoXJTs

In summary, companies use the bus-metric to see how viable a project is. Bus, as in, how many people can be hit by a bus before there is no one left to maintain the project.

Despite its ubiquity, SQLite is maintained by only 3 people. That bus-metric for SQLite is 3, which is way too low for some companies.

Give the link a watch; it was really interesting.

redeeman 22 minutes ago [-]
and anyone that considers this to be the case for sqlite should probably have their reasoning skills examined.

if the unfortunate bus incident happens to the sqlite developers, there is exactly ZERO chance that it will not be very well maintained, on account of all the users, many of whom already have support contracts going for decades and who would require the same level of support they have already enjoyed.

BenjiWiebe 23 hours ago [-]
At least with SQLite, it is really stable so if development did cease, you'd probably be fine indefinitely.
WorldMaker 20 hours ago [-]
This "shadow IT DBA" issue has always been a classic problem with Access databases, too.
gandutraveler 1 days ago [-]
DevOps and DBAs must hate RAM and caches.
TheRealPomax 19 hours ago [-]
Some firms don't understand how to do data management, and if we draw the venn diagram of those and the ones that ban sqlite, it'd be pretty close to a circle.

Yes, databases could have any extension. No sane dev team would accept code that doesn't use an obvious extension for a sqlite database.

Yes, databases can contain PII but no sane product manager would go "yes, that's a good use of sqlite".

Yes, you can trivially copy database files, but no sane product needs to, in the same way that no sane product should require folks to clone the db just to do some work.

Pretty much every reason a company has for banning sqlite is a red flag for working there.

duped 20 hours ago [-]
> a file that can have any extension

So read the magic number; you shouldn't trust file extensions anyway

> that file can be copied around to other servers

So can spreadsheets

I'm not discounting that having centralized data access is desirable but it doesn't sound like that particular reasoning is well thought out

ai_slop_hater 1 days ago [-]
That's so dumb
slopinthebag 1 days ago [-]
> DevOps and DBA teams

Ah so two teams nobody should listen to.

frollogaston 1 days ago [-]
At least I would take it with a grain of salt when the DBA wants you to depend more on the DBA.
slopinthebag 1 days ago [-]
Same with devops tbh.

"Hey everyone, we need to chose the option that involves us the most and provides us the most job security"

mschuster91 1 days ago [-]
Well... eventually the company learns the lesson the hard way, either because a site goes down or gets 0wned. Then everyone will cry about "how this could happen", and the ops people will tell you in response "we warned you that this would happen, here's the receipts, now GTFO".
shermantanktop 21 hours ago [-]
Preparing to say “I told you so” is a fairly obvious incentive for someone to act like Chicken Little. And of course sometimes they are right, but not always.

Lots of great people in both devops and security. But when teams position themselves as the conscience of the org and the gatekeepers of production, the defensive victim mentality can get pretty strong.

faangguyindia 1 days ago [-]
I went from thinking "SQLite is a toy product, not reliable for real data" to "let's use SQLite for almost everything".

SQLite is very good if you can fit into the single writer, multiple readers pattern; you'll never lose data if you use the correct settings, which takes a minute of Google search to figure out.

Today, most of my apps are simply go binary + SQLite + systemd service file.

I've yet to lose data. Performance is great and plenty for most apps

michaelchisari 1 days ago [-]
The single writer is less of an issue in practice than it's made out to be. Modern NVMe drives are incredible and it's trivial to get 5k writes per second in an optimized WAL setup. Way more than most apps could ever dream of.

And even then, I've used a batch writer pattern to get 180k writes per second on a commodity vps.
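
The batch writer pattern is essentially just grouping rows into one transaction; a minimal sketch (hypothetical table):

    import sqlite3

    con = sqlite3.connect("app.db")
    con.execute("PRAGMA journal_mode=WAL")
    con.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")
    rows = [(i, "event-%d" % i) for i in range(10_000)]
    with con:  # one transaction and one fsync, instead of one per row
        con.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)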

0123456789ABCDE 1 days ago [-]
all* of that + sharding -> https://sqlite.org/lang_attach.html

ex: main.db + fts.db. reading and writing to main.db is always available; updating the fts index can be done without blocking the main database — it only needs to read, the reads can be chunked, and delayed. fts.db keeps the index + a cursor table — an id or last change ts

could also use a shard to handle tables for metrics, or simply move old data out of main.db
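
a sketch of the attach pattern (table names made up):

    import sqlite3

    con = sqlite3.connect("main.db")
    con.execute("ATTACH DATABASE 'fts.db' AS fts")
    # one statement can now join across both files
    rows = con.execute(
        "SELECT e.id, e.body FROM fts.search_index s "
        "JOIN main.entries e ON e.id = s.entry_id WHERE s.term = ?",
        ("sqlite",),
    ).fetchall()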

* some examples:

  conn = sqlite3.connect("data.db")
  conn.execute("PRAGMA journal_mode=WAL")        # concurrent reads (see above)
  conn.execute("PRAGMA synchronous=NORMAL")      # fsync at checkpoint, not every commit
  conn.execute("PRAGMA cache_size=-62500")       # ~61 MB page cache (negative = KB)
  conn.execute("PRAGMA temp_store=MEMORY")       # temp tables and indexes in RAM
  conn.execute("PRAGMA busy_timeout=5000")       # wait 5s on lock instead of failing
edit: orms will obliterate your performance — use raw queries instead. just make sure to run static analysis on your code base to catch sqli bugs.

my replies are being ratelimited, so let me add this

the heavy-duty server other databases have is doing the load-bearing work that folks tend to complain sqlite can't do

the real dbms's are doing mostly the same work that sqlite does, you just don't have to think about it once they're set up. behind that chunky server process the database is still dealing with writing your data to a filesystem, handling transaction locks, etc.

by default sqlite gives you a stable database file: when you see the transaction complete, it means the changes have been committed to storage and cannot be lost if the machine were to crash exactly after that.

you can decide to waive some, or all, of those guarantees in exchange for performance, and this doesn't even have to be an all-or-nothing situation.

hparadiz 1 days ago [-]
Oh fun, something I have some metrics on. I just made this benchmark for every PHP ORM a few weeks ago for fun.

https://the-php-bench.technex.us/

There's a huge performance difference between memory and file storage within sqlite itself. Not even getting into tuning specifics.

Ringz 1 days ago [-]
I usually try to explain it like this: “Single writer” is rarely a real problem, because a writer is not slow. It writes exclusively, but very quickly.

"Batch writer pattern" is a good idea to get rid of expensive commits.

beoberha 20 hours ago [-]
For me, the concern about SQLite has never been if the database engine itself is “reliable for real data”, but that storing data on a single node is not “reliable for real data”. Performance aside, what you are positing is no different than dumping everything to a text file on disk. What happens if that VM dies?
ramblurr 15 hours ago [-]
Postgres, MySQL, all of them, they write to files (binary, not text) on your disk. What happens if your postgres VM dies?

(Hint: whatever your answer is it'll apply to SQLite too)

hellcow 17 hours ago [-]
It's trivial to set up WAL-based streaming backups. Same thing as you'd have with Postgres on GCP. Restore from your latest backup.
aperrien 19 hours ago [-]
If the file is that important, it shouldn't be stored in the VM, but on some sort of more robust storage system.
ohnei 16 hours ago [-]
The standard for pretty much any multiuser app of a reasonable size is a quorum of SQL or NoSQL DBs, preferably as a single source of truth for all retainable state. Personally, I think FoundationDB is the closest attempt at a minimal viable base layer that I've encountered. But being C/C++ based and then owned by Apple makes it not suitable for the role.
ashellunts 21 hours ago [-]
Do you use multiple backend nodes? If yes, how do you access sqlite files from different nodes?
faangguyindia 20 hours ago [-]
I use it for apps which don't need multiple backend nodes.

When i actually have something that requires multi nodes, i just use postgres (with replica) or mongo (with replica).

But it's for those apps which are in autoscaler.

For bulk data refresh I use build artifacts and hot-reload memory-mapped files, checking a manifest on object storage and only fetching an update if it's newer.

I've used this pattern everywhere and never really needed anything more; occasionally I might use Redis if something requires fast shared state across multiple nodes.

xenadu02 9 hours ago [-]
You can always route writes to a writer node with streaming WAL replication to all the reader nodes. Works for some workloads and systems, not for others.

For that matter, if you write your system with the correct abstraction you can switch to Postgres _later_ if it becomes necessary. For every system that really did need to scale, 10,000 are pointlessly overbuilt - worrying about scale when it just didn't matter.

nashashmi 24 hours ago [-]
Taking a minute to appreciate the level of long term thinking required for storing data, to plan for 300-500 years into the future, to be able to withstand all kinds of innovations, and survive basic obsolescence.

What is the longest surviving paper medium?

StilesCrisis 23 hours ago [-]
Dead sea scrolls come to mind, for some values of "survive." And the Book of Kells is in good shape.
IshKebab 22 hours ago [-]
Seems like they're pretty lax about their recommendations tbh. XLS is "preferred".
rmunn 1 days ago [-]
> As of this writing (2018-05-29) ...

So this news is nearly <del>six</del> EIGHT years old. But I didn't happen to know about it until now, so that's not a complaint at all; rather, this is a thank-you for posting it.

(Thanks for the correction. Brief brain malfunction in the math department there).

tehlike 1 days ago [-]
Sir, it's 2026. It's 8 years old.
harrouet 1 days ago [-]
Not if the GP was written 2 years ago :)
rmunn 1 days ago [-]
Corrected; thanks.
mcfedr 17 hours ago [-]
makes sense really, nothing this sensible is gonna happen under the current US administration
frollogaston 1 days ago [-]
Was going to say, was having deja vu reading this
akihitot 1 days ago [-]
For public-sector data preservation, it may be one of the best options.

- The specification is publicly available
- It is widely adopted
- It is likely to remain readable in the future
- It has little dependency on specific operating systems or services
- It carries low patent risk

From the perspective of long-term continuity, avoiding dependence on any particular company or service is extremely important.

Spooky23 1 days ago [-]
Archivists also love formats close to native. SQLite lets the relationships be present in a way that CSV cannot.
b40d-48b2-979e 22 hours ago [-]
Foreign keys are not enforced unless you enable them, and even then only for that connection.
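
i.e., something like this on every connection:

    import sqlite3

    con = sqlite3.connect("archive.db")      # hypothetical file
    con.execute("PRAGMA foreign_keys = ON")  # off by default; per-connection, not persisted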
akihitot 1 days ago [-]
That's certainly true. The ability to define table relationships is a major difference from CSV.
afshinmeh 1 days ago [-]
I love SQLite and thanks for sharing it, but there should be a "(2018)" at the end of the title:

> As of this writing (2018-05-29) the only other recommended storage formats for datasets are XML, JSON, and CSV.

maxloh 1 days ago [-]
FYI, they added a lot more formats to the list after that.

  Preferred
  
  1. Platform-independent, character-based formats are preferred over native or binary formats as long as data is complete, and retains full detail and precision. Preferred formats include well-developed, widely adopted, de facto marketplace standards, e.g.
    a. Formats using well known schemas with public validation tool available
    b. Line-oriented, e.g. TSV, CSV, fixed-width
    c. Platform-independent open formats, e.g. .db, .db3, .sqlite, .sqlite3
  
  2. Any proprietary format that is a de facto standard for a profession or supported by multiple tools (e.g. Excel .xls or .xlsx, Shapefile)
  
  3. Character Encoding, in descending order of preference:
    a. UTF-8, UTF-16 (with BOM),
    b. US-ASCII or ISO 8859-1
    c. Other named encoding
  
  ---
  
  Acceptable
  
  For data (in order of preference):
  
  1. Non-proprietary, publicly documented formats endorsed as standards by a professional community or government agency, e.g. CDF, HDF
  2. Text-based data formats with available schema
  
  For aggregation or transfer:
  
  1. ZIP, RAR, tar, 7z with no encryption, password or other protection mechanisms.
https://www.loc.gov/preservation/resources/rfs/data.html
xxs 1 days ago [-]
.7z being there just discredits the entire process. The underlying compression algorithm is a free-hand one and can be anything[0], or contain bugs and exploits[1]. Personally I use only zstd with .7z which is 'non-standard' by the official (Russian) release.

[0]: https://7-zip.org/7z.html

[1]: CVE-2025-0411

tnelsond4 1 days ago [-]
I love using zstd, it's so fast to decompress. I especially like that the JavaScript decoder is 8kb and still really fast. Though the 25kb wasm decoders are about twice as fast.

What are the advantages or reasons to use zstd in a 7z container versus just .zst?

xxs 24 hours ago [-]
I love zstd as much as the next guy and I do use zstd solo for the most part. I had a talk on it a few years back too (incl. using the lib directly from Java, massively decreasing log storage, and so on).

Why use it with 7-zip, though? 7-zip archives multiple files/directories and supports encryption. It has the UI too. On Windows there is NanaZip, available in the Microsoft Store, which has been graced by corporate for user-install (unlike zstd, which effectively needs WSL), and most folks won't be able to use the command line tool.

Of course using tar with zstd is always an option if you are on linux.

tracker1 14 hours ago [-]
I've used line-delimited, gzipped JSON for archive formats on several projects myself, which is a pretty good option... If I wanted more flexibility, would definitely consider SQLite.
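
The pattern itself is about as simple as archive formats get; a sketch (hypothetical file name):

    import gzip
    import json

    with gzip.open("archive.jsonl.gz", "rt", encoding="utf-8") as f:
        records = [json.loads(line) for line in f]  # one JSON object per line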

In fact, I've worked on several projects where I heavily advocated that even the primary app storage was SQLite, and that archival was simply copying the database after an event. Specifically, elections, petition verification, etc. It's kind of difficult coming up with complex schemas to handle multiple events as well as the state of data at those events... by separating the database itself, using SQLite, that simplifies a lot of things. Though it does practically limit scale a bit. The main thing would be to archive the application and the database after a given event. If the application is containerized, you could create an image of the source, the container and the database after the event.

I think this kind of structure would work well for a lot of things... especially if you're considering data sharding anyway.

tombert 1 days ago [-]
On a recent project I have needed to use exFAT. exFAT is terrible for a number of reasons, but in my case the thing I had to deal with was the lack of journaling, which had the possibility to corrupt files if there were a power interruption or something.

I initially was writing a series of files and doing some quasi-append-only things with new files and compacting the old one to sort of reinvent journaling. What I did more or less worked but it was very ad hoc and bad and was probably hiding a lot of bugs I would eventually have to fix later.

And then I remembered SQLite. I realized that ACID was probably safe enough for my needs, and then all the hard parts I was reinventing were probably faster and less likely to break if I used something thoroughly audited and tested, so I reworked everything I was doing to SQLite and it worked fine.

I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere", but until it does I'm grateful SQLite exists.

topham 1 days ago [-]
The problem with it is you didn't solve your biggest actual problem, you just haven't had a problem bite you in the ass yet so you think your problem is solved.
tombert 1 days ago [-]
I am not sure the problem is actually fully solvable. I think SQLite helps at least a little.
IshKebab 22 hours ago [-]
It's totally solvable and SQLite solves it (or claims to, anyway). The real question is whether it works. To test this sort of thing properly you really need what is now called DST (deterministic simulation testing), and I'm not sure SQLite does that. It is pretty well tested, though, so they've probably done at least some testing of it.
tombert 17 hours ago [-]
I guess some context: I'm not 100% sure it's solvable for the actual domain I'm working on, which is microSD cards; they have a tendency to lie about write success.

I think that is at too low of a level for me to realistically solve, but with SQLite I'm at least doing what little I can; the fact that it's been around for twenty years with extremely thorough testing and frequent updates means that it's more likely to be correct than some ad-hoc thing I come up with. I think I'm pretty clever sometimes and could probably get something as good as SQLite if I really wanted to, but I don't think I'd surpass it, and at that point why not just use SQLite?

IshKebab 17 hours ago [-]
> they have a tendency to lie about write success

As long as they lie in order, or alternatively you have a way of verifying the write (e.g. by reading it back) then you should be able to make it work fairly easily.

If they just completely lie - the data is just cached but never actually written - then you're screwed. There's obviously no way to make a persistent storage device out of something that doesn't persist your data.

tombert 16 hours ago [-]
In my experience it's the latter, as far as I can tell. It has actually written like 99.99% of the time, but for about 1 in 10,000 writes it actually isn't writing.

exFAT has the lovely feature of potentially not only corrupting the file, but also corrupting the metadata for the surrounding system as well. It's terrible.

mmooss 1 days ago [-]
> I wish exFAT would die in a fire and a journaling filesystem would replace it as the "one filesystem you can use everywhere"

Where exactly is everywhere? Win32? All of Linux? BSDs? MacOS? IOS? ...

noirscape 22 hours ago [-]
Everywhere in the sense of "I have a USB stick/SD card, what do I format it to so that every major device I'm using can read it".

In practice, every OS has its preferred system and the rest has varying levels of "I guess this works", with FAT32 and exFAT being the only real cross-platform options.

To wit:

* NTFS is only really properly and fully supported on Windows. Apple mounts it read-only. Linux can certainly mount NTFS and do some basic reads and writes. Unfortunately for whatever reason, the Linux fsck tools for NTFS are absolutely terrible, poorly designed and generally can't fix even the most basic of issues. At the same time, mount refuses to work with a partially corrupted filesystem, so if you're dealing with dirty unmounts (where the worst case usually is some unclosed file handle rather than data loss, but this also happens if you try to mount a suspended Windows partition, which isn't uncommon since Windows hibernates by default and calls it fast boot), that's a boot to Windows just to fix it.

* Apple filesystems basically only work on apple devices. It's technically possible to mount them on Linux, but you end up digging into the guts of a bunch of stuff that Apple usually just masks for you.

* ext4 is only properly read/write under Linux and requires external drivers under Windows (which may not work properly either, as corruption issues are common).

FAT32 is reliable in that any OS can fsck/chkdsk it and properly mount it without needing special drivers, but is hindered by ancient filesize limitations. exFAT, at least for most cases, is the only filesystem you can plug into most devices and expect more or less the same capabilities as FAT32 (read/write support, can fix filesystem corruption.)

Out of the os specific ones, NTFS seems like it has the most potential to be the one filesystem that works everywhere; it's modern, works good-ish on most devices, it's just that the fsck/chkdsk tooling is awful outside of Windows.

ghrl 1 days ago [-]
Something MacOS and Windows support natively would be a good start, it could grow from there.
Ringz 1 days ago [-]
Looking at *all* my external drives now... that would be great.
tombert 1 days ago [-]
Everywhere exFAT is supported now. Windows, Mac, Linux, FreeBSD would be fine.
pbhjpbhj 1 days ago [-]
Presumably Microsoft fear making it easy to swap OSes and access the same data.

"I can use Linux because if I get stuck I can just switch to Windows and still access my data" is a comfort that probably keeps people from even trying Linux (or other OSes)?

Why else would MS not support BTRFS/ZFS/Ext or whatever?

{I'm not saying that I think this works.}

jodrellblank 21 hours ago [-]
Have you seen Linus Torvalds' comments on ZFS from 2020?

https://www.realworldtech.com/forum/?threadid=189711&curpost...

".. there is no way I can merge any of the ZFS efforts until I get an official letter from Oracle that is signed by their main legal counsel or preferably by Larry Ellison himself .. Don't use ZFS. It's that simple. It was always more of a buzzword than anything else, I feel, and the licensing issues just make it a non-starter for me. .. The benchmarks I've seen do not make ZFS look all that great. And as far as I can tell, it has no real maintenance behind it either any more, so from a long-term stability standpoint, why would you ever want to use it in the first place?"

BTRFS: RedHat has removed all support for BTRFS and deprecated it: https://access.redhat.com/solutions/197643

BTRFS, "Linux's perpetually half-finished filesystem" by ArsTechnica: https://arstechnica.com/gadgets/2021/09/examining-btrfs-linu... with many problems still unaddressed in 2021 dating back to 2009.

tombert 17 hours ago [-]
I definitely understand why he doesn't want to merge it in and risk potential litigation from Oracle, but I think he's kind of wrong about the rest of what he says.

I don't know what people on Solaris use, but I'm pretty sure everyone in the Linux and BSD community is running OpenZFS, which does get frequent updates and has been pretty stable as a kernel module for quite awhile. My main server in my house is running a RAID-Z2 on Linux and has been for more than six years, and I haven't really had any issues. I run scrubs regularly and things seem to work just fine.

I do wish that Oracle would give written permission to let Linux include it into the kernel, since I think it would make it easier to run ZFS on root (which I don't bother with, I just use btrfs on root and that's fine for single-drive systems, like a laptop).

iknowstuff 1 days ago [-]
> Why else would MS not support BTRFS/ZFS/Ext or whatever?

You seriously can’t think of another reason? File systems are complex. Maintenance is a huge burden. Getting them wrong is a liability. Reason enough to only support the bare minimum. And then, 99% of their users don’t care about any of those. NTFS is good enough

StilesCrisis 22 hours ago [-]
NTFS is dog slow. Unfortunately it's nowhere near good enough.
tombert 20 hours ago [-]
In my mind, in the year 2026, I don't really see the point in using a non-CoW filesystem; it would be nice if the Windows System Restore tool actually worked, and that could be achieved much simpler if there were filesystem-level snapshots.
testermelon 1 days ago [-]
I'm surprised they included proprietary formats that are de facto standards in a profession or supported by multiple tools (.xls, .xlsx) in the preferred section [1]. I wonder if "well-known enough" is as good as "open" from a preservation standpoint.

[1] https://www.loc.gov/preservation/resources/rfs/data.html

mort96 1 days ago [-]
Especially when Office 365 shows that not even Microsoft is capable of making software which can display Office files anymore... if you have a Word file which was created or has ever been modified by the Word application, working with it through Office 365 in a browser is such a pain. I've literally had images which are impossible to delete or move in the web version, and they will absolutely render in the wrong place.
acdha 24 hours ago [-]
Archivists and librarians have to think in terms of practicality: if many tools exist to read something and it's a mainstream software product, the odds are good that they'll be able to use those files 50 years from now. Not certain, but good, and that matters with limited budget and ability to tell the rest of the world what format to provide things in.

This can require nuance: for example, PDF has profiles because the core format is widely supported but you could do things like embed plugin content from now-defunct vendors and they would only want the former for long-term preservation.

pletnes 1 days ago [-]
You can unzip the xlsx and read the xml inside. It’s not the worst format by far.
perching_aix 24 hours ago [-]
What would you reckon is the worst format? I'm very curious of your standards given this.
ray_v 1 days ago [-]
It's so funny, because I was JUST telling a colleague of mine - another librarian - this exact fact about sqlite!
llagerlof 1 days ago [-]
I used SQLite for a few applications several years ago. One time, the database got corrupted and all the data was lost. That was the day I stopped using SQLite.

Also, the lack of enforced column data types was always a negative for me.

jjice 1 days ago [-]
No matter the medium, backups are a must.
llagerlof 23 hours ago [-]
A hard lesson learned...
benhurmarcel 23 hours ago [-]
For column types there are STRICT tables now (since SQLite 3.37).
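
A sketch (hypothetical table; the bad insert raises instead of silently storing TEXT):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, total REAL) STRICT")
    con.execute("INSERT INTO t VALUES (1, 'oops')")  # raises: cannot store TEXT in REAL column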
llagerlof 23 hours ago [-]
Thank you!
justin66 23 hours ago [-]
> the database got corrupted

What caused that?

llagerlof 23 hours ago [-]
I don't know why that happened, but one fine day I tried to open the file using the vanilla SQLite client, and it didn't open.
Mawr 22 hours ago [-]
I used a hard drive for a few applications several years ago. One time, the drive got corrupted and all the data was lost. That was the day I stopped using hard drives.
b40d-48b2-979e 22 hours ago [-]
This isn't the same and you know it.
danborn26 23 hours ago [-]
It is great to see SQLite getting this level of institutional recognition. The single file format makes archival storage incredibly straightforward compared to traditional database dumps.
lenwood 22 hours ago [-]
Just yesterday it occurred to me that it had been a while since I last saw an SQLite post at the top of HN.

I really like the simplicity and speed of SQLite; I've used it in both personal and professional projects. For day-to-day work I still end up in Excel, not because I like it more (I don't), but because its ubiquity makes it the lowest-friction way to share and explore datasets with less technical stakeholders and execs.

gcardinal 21 hours ago [-]
I'm under no illusion I'll suddenly shatter your world views with this, but in case it's as useful to you as it was to me, you might want to check out Metabase[1].

You can self-host, and if all you care about is showing data in a digestible format to stakeholders, it's really simple. You can of course go overboard and regret all of your life's decisions with it, but I try to abstain.

[1]: https://www.metabase.com/

LoganDark 22 hours ago [-]
I've always been irked by how SQLite relies on text parsing to work. Why do I have to write queries in text rather than expressing them in programmatic logic? I have never used a relational database because of this; I hate SQL and the entire idea of SQL, and I don't want to write it or learn it or use a system that relies on it, even though relational databases can be more performant than pure structured data. It feels like the wrong approach, on the level of PHP. Is there anything I can do about this? I don't want to keep passing up SQLite just because of SQL, but I just can't seem to agree with it. I don't want to build strings or have string parsing anywhere in the stack; it just feels wrong.
StilesCrisis 20 hours ago [-]
A "prepared statement" is a precompiled SQL command, ready for bindings and execution: https://sqlite.org/c3ref/stmt.html

You can't precompile your SQL at build time, unfortunately, but you _can_ precompile all your SQL at the very start of your program and then never touch the parser again. This might be a good middle ground for you. It is infra that you can centralize, write some unit tests against, and then not worry about forever.

It's not common because the SQLite parser is lightning fast and it's so convenient to just write out a new query as you need one, versus having one bucket of all queries. But it's an option!
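
Python's sqlite3 module, for example, gives you part of this for free: each connection keeps a statement cache, so a given SQL string is only prepared once (a sketch; the table is hypothetical):

    import sqlite3

    con = sqlite3.connect("app.db", cached_statements=256)  # cache size is tunable
    GET_USER = "SELECT name FROM users WHERE id = ?"
    for uid in (1, 2, 3):
        row = con.execute(GET_USER, (uid,)).fetchone()  # reuses the compiled statement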

DowsingSpoon 18 hours ago [-]
If you want to avoid string manipulation then you can construct queries with a query builder API like C#'s LINQ. Other languages have similar libraries, e.g., Rust has Diesel.

If your objection is to the SQL language itself then you might find Datalog interesting. Datalog is a logic-based language where you query by writing predicates rather than writing SQL statements. Check out Logica <https://logica.dev>. It's a language in the Datalog family that compiles to SQL.

In both cases, SQL is used only as a low-level IR for interfacing with the database engine.

weird-eye-issue 21 hours ago [-]
I bet you really love LLMs
LoganDark 20 hours ago [-]
I'm torn on LLMs. I've started to use them to accelerate personal development, but they still require a lot of babying and manual assistance. Still, they help a lot.
xiaod 15 hours ago [-]
The operational complexity is worth comparing here. The migration path and schema evolution story often matter more than raw performance numbers for teams choosing between these options.
fpj 23 hours ago [-]
I don't know much about the LoC use case, but my initial reaction to the post is to ask why they are not building a data lake with open formats. I'm sure there are reasons for discarding open-table formats. Claude keeps telling me that the issue is that they don't address preservation properly.
imrozim 19 hours ago [-]
I use PostgreSQL for my startup, but every time I need quick local testing I wish it were as simple as SQLite. No config, it just works.
semiquaver 23 hours ago [-]
It certainly will be in the toolkits of data archeologists hundreds of years from now. Must be a weird feeling to create something so potentially long-lasting.
infogulch 21 hours ago [-]
SQLite is remarkably versatile. Just a couple weeks ago an extension to do cross-process queues, streams, pub/sub etc in SQLite was released:

Show HN: Honker – Postgres NOTIFY/LISTEN Semantics for SQLite | 327 points | 94 comments | https://news.ycombinator.com/item?id=47874647

Live notifications was one of the big missing pieces to implement whole apps on a sqlite backend, and now there's a decent solution.

fragmede 7 hours ago [-]
Yes! Can it replace CSV, please?
amai 18 hours ago [-]
Which version of SQLite?
butterNaN 1 days ago [-]
(US)
GeorgeTirebiter 16 hours ago [-]
Now, if only the LoC would recognize the brilliance of the Fossil SCM ....
ksamantha 1 days ago [-]
[flagged]
cpach 1 days ago [-]
Welcome to Hacker News! Please write in English here. Thank you in advance from a long-time member :)
latexr 1 days ago [-]
Translating the comments and looking at the bio, wouldn’t be surprised if this is a bot.
guelo 1 days ago [-]
I get annoyed at all the other DBs that require their own heavy duty server process when for 90% of my projects there is only one client, my app server. Is there a DB that combines sqlite's embedded simplicity with higher concurrent write throughput?
TeriyakiBomb 24 hours ago [-]
I think the concurrent write thing is not as much of an issue nowadays with the speed of NVMe drives and WAL.
graemep 23 hours ago [-]
Firebird, maybe?