Modern SQLite: Features You Didn't Know It Had (slicker.me)
aaviator42 24 hours ago [-]
SQLite is insanely robust. I have developed websites serving hundreds of thousands of daily users where the storage layer is entirely handled by SQLite, via an abstraction layer I built that gives you a handy key-value interface so I don't have to craft queries when I just need data storage/retrieval: https://github.com/aaviator42/StorX
arunix 9 hours ago [-]
Does StorX have any special handling of concurrent writes, or would the user need to take care of that?
krylon 1 days ago [-]
STRICT tables are something I appreciate very much, even though I cannot recall running into a problem that would have been prevented by its presence in the before-time. But it's good to have all the same.

I don't think I've ever done much with SQLite's JSON functions, but I have on one or two occasions used a constraint to enforce a TEXT column contains valid JSON, which would have been very tedious to do otherwise.
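
A minimal sketch of such a constraint, using Python's bundled sqlite3 and made-up table/column names (assumes a SQLite build with the JSON functions, which standard CPython builds include):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# CHECK (json_valid(...)) rejects any write that is not well-formed JSON.
con.execute(
    "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT CHECK (json_valid(body)))"
)
con.execute("INSERT INTO docs (body) VALUES (?)", ('{"ok": true}',))  # accepted

try:
    con.execute("INSERT INTO docs (body) VALUES (?)", ("not json",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the CHECK constraint fired

stored = con.execute("SELECT count(*) FROM docs").fetchone()[0]
```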

crazygringo 1 days ago [-]
> even though I cannot recall running into a problem that would have been prevented by its presence in the before-time

I very, very much did. I was using a Python package that used a lot of NumPy internally, and sometimes its return values would be Python integers, and sometimes they'd be NumPy integers.

The Python integers would get written to SQLite as SQLite integers; the NumPy integers would get written as SQLite binary blobs, preventing even simple things like comparing values for equality.

Setting to STRICT caused an error whenever my code tried to insert a binary blob into an integer column, so I knew where in the code I needed to explicitly convert the values to Python integers when necessary.

QuadrupleA 24 hours ago [-]
Love SQLite and most of these features.

On the STRICT mode, I've asked this elsewhere and never gotten an answer: does anyone have a loose-typing example application where SQLite's non-strict, different-type-allowed-for-each-row has been a big benefit? I love the simplicity of SQLite's small number of column types, but the any-type-allowed-anywhere design always seemed a little strange.

SQLite 22 hours ago [-]
Flexible typing works really well with JSON, which is also flexibly typed. Are you familiar with the ->> operator that extracts a value from JSON object or array? If jjj is a column that holds a JSON object, then jjj->>'xyz' is the value of the "xyz" field of that object.

I copied the idea for the ->> operator from PostgreSQL. But in PostgreSQL, the ->> operator always returns a text rendering of the value from the JSON, even if the value is really an integer or floating point number. PG is rigidly typed, so that's all it can do. But SQLite is flexibly typed, so the ->> operator can return anything - text, integer, floating-point, NULL - whatever value it finds in the JSON.

lateforwork 24 hours ago [-]
When your application's design changes, you may need to store a slightly different type of data. Relational databases traditionally require explicit schema changes for this, whereas NoSQL databases allow more flexible, schema-less data. SQLite sits somewhere in between: it remains a relational database, but its dynamic typing allows you to store different types of values in a column without immediately migrating data to a new table.

This flexibility is convenient when only one application reads and writes to the table. But if multiple applications access the same tables, the lack of a strictly enforced schema becomes a liability. The same is true when using generic tools to process data in SQLite tables, because such tools don't know what type of data to expect. The column type may be X but the actual data may be of type Y.
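
The "column type X, actual data type Y" situation is easy to reproduce; a sketch with a hypothetical table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (val INTEGER)")  # declared type is only an affinity
con.executemany("INSERT INTO t VALUES (?)", [(1,), ("two",), (3.5,), (None,)])

# Each stored value carries its own type, independent of the column declaration.
types = [row[0] for row in con.execute("SELECT typeof(val) FROM t")]
```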

ncruces 22 hours ago [-]
Not necessarily, but being able to specify types beyond those allowed by STRICT tables is useful.

Ideally, I'd like to be able to specify the stored type (or at least, side step numeric affinity), and give the type a name (for introspection, documentation).

Specifying that a column is a DATETIME, a JSON, or a DECIMAL is useful, IMO.

Alas, neither STRICT nor non-STRICT tables allow this.

irq-1 23 hours ago [-]
> the any-type-allowed-anywhere design always seemed a little strange.

SQLite came from Tcl, which is all strings. https://www.tcl-lang.org/

An example of where this would be a benefit is if you stored date/times in different formats (changing as an app evolved.)

spiffytech 18 hours ago [-]
I remember an ORM that uses fake column types like `boolean` and `datetime` because SQLite doesn't enforce anything. That way the ORM knows how to deserialize the data. Strict mode prohibits this by only accepting column types SQLite recognizes.

My preference would be for SQLite to actually support commonplace data types. But as long as it doesn't, I can see the appeal in using the schema to specify what data you're storing in your database.
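
A sketch of the trade-off described above, with made-up table names: ordinary tables accept any type name as documentation, while STRICT (SQLite 3.37+) rejects names outside its small allowed set:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Ordinary tables accept any type name; it only influences affinity.
con.execute("CREATE TABLE loose (flag BOOLEAN, created DATETIME)")

# STRICT tables allow only INT/INTEGER/REAL/TEXT/BLOB/ANY.
try:
    con.execute("CREATE TABLE tight (flag BOOLEAN) STRICT")
    boolean_allowed = True
except sqlite3.OperationalError:
    boolean_allowed = False  # "unknown datatype" error
```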

kherud 1 days ago [-]
SQLite seems very powerful for building FTS (user enters free text, expects high precision/recall results). Still, I feel like it's non-trivial to get good search quality.

I think the naive approach is to tokenize the input and append "*" for prefix matching. I'm not too experienced and this can probably be improved a lot. There are many settings like different tokenizers, stemming, etc. Additionally, a lot can be built on top like weighting, boosting exact matches, etc.
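
The naive approach sketched above, in Python with invented documents (assumes an FTS5-enabled SQLite build, which standard CPython has; real input would also need quoting of tokens containing FTS5 query syntax):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
con.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("deploy a DaemonSet to every node",), ("full text search in sqlite",)],
)

def search(user_input):
    # Naive approach: split on whitespace and append * for prefix matching.
    query = " ".join(tok + "*" for tok in user_input.split())
    return [r[0] for r in con.execute(
        "SELECT body FROM docs WHERE docs MATCH ?", (query,))]

hits = search("daemo")  # the prefix query reaches inside "DaemonSet"
exact = [r[0] for r in con.execute(
    "SELECT body FROM docs WHERE docs MATCH ?", ("Daemon",))]  # whole-token: no hit
```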

Does anyone know good resources for this to learn and draw inspiration from?

fizx 1 days ago [-]
I mean, you can use SQLite as an index and then rebuild all of Lucene on top of it. It's non-trivial to build search quality on top of actual search libraries too.

O'Reilly's "Relevant Search" isn't the worst here, but you'll be porting/writing a bit yourself.

subhobroto 1 days ago [-]
> Does anyone know good resources for this to learn and draw inspiration from?

Is there a reason why something more custom-built, like ParadeDB Community Edition, won't meet your needs?

I understand you're speaking about SQLite, while ParadeDB is PostgreSQL but as you know, it's non-trivial to get good search quality, so I'm trying to understand your situation and needs.

nikisweeting 1 days ago [-]
Surprised no one has mentioned Turso yet!

They recently landed multi-writer support for their Rust SQLite re-implementation, which addresses the biggest issue I've personally had with using SQLite for high-concurrency applications.

`PRAGMA journal_mode = 'mvcc';`

https://docs.turso.tech/tursodb/concurrent-writes

Very excited to see if SQLite responds by adding native support, I'm hoping competition here will spur improvements on both sides.

ncruces 22 hours ago [-]
Incredible that a database company writes that page and doesn't document the isolation level of the feature.
101008 1 days ago [-]
Not sure if people are interested, but since I use SQLite in a lot of my own projects, I am working on a lightweight monitoring and safety layer for production SQLite. The idea is pretty simple: SQLite is amazing, but once it's running in production you basically have zero observability. If something weird happens (unexpected writes, schema changes, background jobs touching tables, etc.) you only find out after the fact. It tries to solve that without touching application code. It's a Rust agent that runs next to your SQLite file and connects to a server where everything is logged. My current challenge right now is encryption and trust, mostly.

Curious if others here are running SQLite in production and if you would be interested in something like this.

captn3m0 21 hours ago [-]
You can shorten your JSON queries using arrow notation in sqlite.

    SELECT
        settings -> '$.languages' AS languages
    FROM user_settings
    WHERE settings ->> '$.languages' LIKE '%"en"%';
I use them heavily with my jekyll-sqlite projects. See https://github.com/blr-today/website/blob/main/_config.yml#L... for example.
faizshah 1 days ago [-]
There's also spellfix1, which is an extension you can enable to get fuzzy search.

And ON CONFLICT which can help dedupe among other things in a simple and performant way.
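
A sketch of ON CONFLICT used as a dedupe/upsert, with a hypothetical key-value table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT, hits INTEGER DEFAULT 1)")

def put(k, v):
    # Upsert: a duplicate key becomes an UPDATE instead of raising an error.
    con.execute(
        "INSERT INTO kv (k, v) VALUES (?, ?) "
        "ON CONFLICT(k) DO UPDATE SET v = excluded.v, hits = hits + 1",
        (k, v),
    )

put("a", "first")
put("a", "second")  # deduped in place
row = con.execute("SELECT v, hits FROM kv WHERE k = 'a'").fetchone()
```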

somat 1 days ago [-]
I was trying to port a small program I wrote from Postgres to a SQLite backend (mainly to make it easier to install) and was pleased to find out SQLite supports ON CONFLICT. I was less pleased to find out that apparently I abuse CTEs to insert foreign keys all the time, and SQLite was not happy doing that (it doesn't allow data-modifying statements inside a CTE):

    with thing_key as (
        insert into item(key, description)
        values ('thing', 'a thing')
        on conflict do nothing
    )
    insert into user_note(uid, key, note)
    values (123, 'thing', 'I like this thing')
    on conflict (uid, key) do update set note = excluded.note;
momo_dev 1 days ago [-]
The JSON functions are genuinely useful even for simple apps. I use SQLite as a dev database, and being able to query JSON columns without a preprocessing step saves a lot of time. STRICT tables are also great: they caught a bug where I was accidentally inserting the wrong type and it just silently worked in regular mode.
subhobroto 23 hours ago [-]
> caught a bug where I was accidentally inserting the wrong type and it just silently worked in regular mode

Typically one would design their "DTO"s to catch such errors right in the application layer, way before it even made it into the DB.

Different people call this serialization-deserialization layer different names (DTO being the most ubiquitous I think) but in general, one programs the serialization-deserialization layer to catch structural issues (age is "O" instead of 0).

The DB is then left to catch unchanging ground truths and domain consistency issues (a sign-up with a DOB in the future, or a human with no email or address at all when the business needs one).

In your case, it's great the DB caught gaps in the application layer but one would handle it way before it even made it there.

The way I think about DB types are:

1. unexpected gaps/bugs: "What did I miss in my application layer?"

2. expected and unchanging constraints: "What are some unchanging ground truth constraints in my business?", "What are the absolute physical limits of my data?" - while I check for a negative age in my DTOs to provide fast errors to the user, I put these constraints in the DB because it's an unchanging rule of reality.

Crucially, by keeping volatile business rules out of the database and restricting it only to these ground truths, I avoid being dragged down by constant DB migrations in a fast-evolving business.
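
A sketch of the split described above, with invented names: a ground-truth constraint (age cannot be negative) lives in the DB, while volatile rules stay in application code:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE person (
        name  TEXT NOT NULL,
        age   INTEGER CHECK (age >= 0),  -- unchanging ground truth of reality
        email TEXT NOT NULL              -- the business currently requires one
    )
""")
con.execute("INSERT INTO person VALUES ('Ada', 36, 'ada@example.com')")

try:
    con.execute("INSERT INTO person VALUES ('Bob', -1, 'bob@example.com')")
    caught = False
except sqlite3.IntegrityError:
    caught = True  # the DB rejects what the app layer missed
```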

momo_dev 13 hours ago [-]
yeah fair point, validation should ideally happen at the app layer first. in my case it was a quick prototype where i skipped the proper serialization step and STRICT caught it before it became a real problem. definitely not a substitute for proper DTOs in production
malkia 1 days ago [-]
In the past I've used the backup API - https://sqlite.org/backup.html - to load an in-memory copy of the SQLite db alongside the live one. I would do this after certain user actions, and then by doing a diff, I would know what changed... I guess it's a poor way of implementing PostgreSQL events... but it worked!

Granted, it was a small DB (a few megabytes). I also wanted to avoid collecting changes one by one; I simply wanted a diff since last time.
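
A sketch of that snapshot-and-diff pattern via Python's Connection.backup (in-memory databases and made-up table names, to keep it self-contained):

```python
import sqlite3

# "Live" database (a file in the original setup; in-memory here for the sketch).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
live.execute("INSERT INTO items (name) VALUES ('widget')")

# Snapshot the whole database into a second connection via the backup API.
snapshot = sqlite3.connect(":memory:")
live.backup(snapshot)

# Mutate the live DB, then diff the two to learn what changed since the snapshot.
live.execute("INSERT INTO items (name) VALUES ('gadget')")
before = {r[0] for r in snapshot.execute("SELECT name FROM items")}
after = {r[0] for r in live.execute("SELECT name FROM items")}
changed = after - before
```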

kristianp 23 hours ago [-]
There are table-valued functions over JSON as well, as mentioned by [1].

https://sqlite.org/json1.html#table_valued_functions_for_par...

[1] https://news.ycombinator.com/item?id=47618597
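
For instance, json_each explodes a JSON array into rows; a sketch with an invented table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, tags TEXT)")
con.execute("INSERT INTO orders (tags) VALUES (?)", ('["urgent", "gift", "repeat"]',))

# json_each yields one row per array element; join it against the base table.
tags = [r[0] for r in con.execute(
    "SELECT je.value FROM orders, json_each(orders.tags) AS je")]
```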

tombert 1 days ago [-]
For a long time I absolutely hated SQLite because of how terribly it was implemented in Emby: you couldn't load balance an Emby server because it kept a global lock on the database for exactly one process. But at this point I've grown a kind of begrudging respect for it, simply because it is the easiest way to shoehorn something (roughly) like journaling into terrible filesystems like exFAT.

I did this recently for a fork of the main MiSTer executable because of a few disagreements with how Sorg runs the project, and it was actually pretty easy to change out the save file features to use SQLite, and now things are a little more resistant to crashes and sudden power loss than you'd get with the terrible raw-dogged writing that it was doing before.

It's fast, well documented, and easy to use, and yeah it has a lot more features than people realize.

mpyne 24 hours ago [-]
I actually needed that exact window function example earlier this week, when I had to figure out why our shared YNAB budget somehow got out of balance with the bank. I turned to SQLite to load the different CSVs and lay out the bank's view of the world against YNAB's, with running totals.
mastermage 10 hours ago [-]
SQLite is pretty great. I personally would appreciate a bit more expressive types, though.
malkia 18 hours ago [-]
One more "hidden" fact - Windows uses sqlite a lot, for a lot of tables. There is even

    "C:\Windows\System32\winsqlite3.dll" 
and

    "C:\Program Files (x86)\Windows Kits\10\Include\10.0.27975.0\um\winsqlite\winsqlite3.h"
    "C:\Program Files (x86)\Windows Kits\10\Include\10.0.27975.0\um\winsqlite\winsqlite3ext.h"

Well, it's compiled in its own way, which may not be to your liking, but it's there to use :)
FooBarWidget 1 days ago [-]
I've found FTS5 not useful for serious fuzzy or subword full-text search. For example, I have documents saying "DaemonSet", but if the user searches for "Daemon" then there will be no results.
nikisweeting 1 days ago [-]
I have found this as well. FTS5 is convenient to have as an option, but it's not as versatile as Postgres or Sonic or other full-text search solutions.

Does anyone have any other favorite modern bloom-filter-based search solutions that don't need to store copies of all the documents in the search db? Ideally something that can run in WASM too, so we can ship a tiny search index to the browser. I found https://github.com/tinysearch/tinysearch but haven't tried it yet.

andrewstuart 24 hours ago [-]
Disturbing.

I did not know SQLite allows writing data that does not match the column type. Yuck. Now I need to review anything I built and fix it.

I understand why they wouldn’t, but STRICT should be the default.

sgbeal 21 hours ago [-]
> STRICT should be the default.

Somewhat ironically[^1], if STRICT were suddenly made the default, countless applications which work today would fail to work right after that update. SQLite is on billions upon billions of devices, frequently with many installations on any given device, and even a 0.001% regression rate adds up to many, many clients.

One of the reasons people trust SQLite is because, as a project policy, it does not pull the rug out from under them by changing long-held defaults.

[^1]: it's ironic because proponents of Strict tables frequently say that it improves app stability and robustness, whereas activation of Strict tables in apps not designed to handle it will, in fact, make them more fragile. Spoiler alert: most SQLite apps are not designed for Strict tables.

lateforwork 24 hours ago [-]
STRICT has severe limitations, for example it does not have date data type.

Why is it a problem that it allows data that does not match the column type? SQLite is intended for embedded databases, where only your application reads and writes from the tables. In this scenario, as long as you write data that matches the column's data type, data in the table does match the column type.

andrewstuart 23 hours ago [-]
>> Why is it a problem that it allows data that does not match the column type?

“Developers should program it right” is less effective than a system that ensures it must be done right.

Read the comments in this thread for examples of subtle bugs described by developers.

lateforwork 23 hours ago [-]
> “Developers should program it right” is less effective than a system that ensures it must be done right.

You're right, of course. But this must be balanced with the fact that applications evolve, and often need to change the type of data they store. How would you manage that if this is an iOS app? If SQLite didn't allow you to store a different type of value than the column type, you would have to create a new table and migrate data to it, or create a new column and abandon the old one. Your app updates will not feel smooth to users. So it is a tradeoff. The choice SQLite made is pragmatic, even if it makes some of us that are used to the guarantees offered by traditional RDBMSs queasy.

subhobroto 23 hours ago [-]
> Why is it a problem that it allows data that does not match the column type? SQLite is intended for embedded databases

I'm afraid people forget that SQLite is (or was?) designed to be a superior `open()` replacement.

It's great that modern SQLite has all these nice features, but if Dr. Hipp was reading this thread, I would assume he would be having very mixed feelings about the ways people mention using SQLite here.

SQLite 22 hours ago [-]
No, I think that people can use SQLite anyway they want. I'm glad people find it useful.

I do remain perplexed, though, about how people continue to think that rigid typing helps reliability in a scripting language (like SQL or JSON) where all values are subclasses of a single superclass. I have never seen that in my own practice. I don't know of any objective research that supports the idea that rigid typing is helpful in that context. Maybe I missed something...

lateforwork 22 hours ago [-]
> where all values are subclasses of a single superclass

I don't understand this. By values do you mean a row (in database terms)? I don't understand what that has to do with rigid typing.

Lack of rigid typing has two issues, in my opinion: First, when two or more applications have to read data from a single database, lack of an agreed-upon-and-enforced schema is a limitation. Second, when you use generic tools to process data, the tools have no idea what type of data to expect in a column, if they can't rely on the table schema.

subhobroto 21 hours ago [-]
First off, I am so glad the famous "HN conjure" actually worked! My "if Dr. Hipp was reading this thread" was tongue in cheek because on HN it was extremely likely that's precisely what would happen. Thank you for chiming in, Dr. Hipp - this is why I love HN!

So, in case you missed it, you're responding to Dr. Hipp himself :)

> I don't understand what that has to do with rigid typing.

Now I would like to learn a bit from Dr. Hipp himself, so here's my take on it:

Scripting languages (like my fav, Python) have duck or dynamic typing (a variation of what I believe Dr. Hipp, you specifically call manifest typing). Dr. Hipp's take is that the datatype of a value is associated with the value itself, not with the container that holds it (the "column"). (I must say I chose the word "container" here to jibe with Dr. Hipp's "manifest typing". Curious whether he chose that word for the same reason!)

- In Python, everything is fundamentally a `PyObject`.

- In SQLite, every piece of data is (or was?) stored internally as a `sqlite3_value` struct.

As a result, a stack that uses Python and SQLite is extremely dynamic and if implemented correctly, is agnostic of a strict type - it doesn't actually care. The only time it blows up is if the consumer has a bug and fails to account for it.

Hence, because this possibility exists, and because no objective research has proven strict typing improves reliability in scripting environments, it's entirely possible our love for strict types is just mental gymnastics that could have been addressed, equally well, without strict typing.

I can reattempt the "HN conjure" on Wes McKinney and see if a similar reason led him to compromise on dynamic typing (NumPy enforces static typing) in the pandas 1.x DataFrame because, as both of them would likely say, real datasets of significant size rarely have all "valid" data. This design allows pandas to handle invalid and missing fields (even if it affects performance).

A good dynamic design should work with both ("valid" and "invalid") present. For example: layer additional "views" on top of the "real life" database that enforce your business rules while you still get to keep all the real world, messy data.

OTOH, if you don't like that design and absolutely need strict types, use Rust/C++/PostgreSQL/Arrow, etc. They are built from the ground up on strict types.

With this in mind, if you still want to delve into the "Lack of rigid typing has two issues" portion, I am very happy to engage (and hope Dr. Hipp addresses it so I learn and improve!)

The real world is noisy, has surprises in store for us and as much as engineers like us would like to say we understand it, we don't! So instead of being so cocksure about things, we should instead be humble, acknowledge our ignorance and build resilient, well engineered software.

Again, Dr. Hipp, Thank you for chiming in and I would be much obliged to learn more from you.

lateforwork 20 hours ago [-]
Thank you for the great explanation. But SQL isn't as dynamically typed as you suggest. If a column is defined as DECIMAL(8, 2), it would be surprising for some values in that column to be strings. RDBMSs are expected to provide data integrity guarantees, and one of those guarantees is that only values matching the declared column type can be stored.

Relaxing that guarantee has benefits. For example, it can make application evolution easier--being able to store strings in a column originally intended for numbers is convenient. But that convenience can become a liability when multiple applications read from and write to the same database. In those cases, you want applications to adhere to a shared schema contract, and the RDBMS is typically expected to enforce that contract.

It also creates problems for generic tools such as reporting systems, which rely on stable data types--for example, to determine whether a column can be aggregated or how it should be formatted for display.

andrewstuart 22 hours ago [-]
>> but if Dr. Hipp was reading this thread

He is.

subhobroto 20 hours ago [-]
If you reached out and notified him, Thank you. I hope he has time to revisit - had a few more followups. Cheers!
andrewstuart 20 hours ago [-]
No, I did not. I think he's been a regular community member for a long time; he probably just saw it on the front page.
subhobroto 23 hours ago [-]
> I understand why they wouldn’t, but STRICT should be the default.

No wait, what do you mean?

As I mentioned at https://news.ycombinator.com/item?id=47619982 - your application layer should be validating the data on its way in and out. There I mention the two reasons I fall back to the DB.

SQLite 22 hours ago [-]
Checking the datatype is not the same as validating. There is lots of data out there that is invalid, and yet still has the correct type. In fact, that is the common case.

I dare say you will be hard pressed to find a dataset of significant size that doesn't have at least one invalid entry somewhere. Increasingly strict type rules will not fix that.

subhobroto 20 hours ago [-]
Dr. Hipp,

> I dare say you will be hard pressed to find a dataset of significant size that doesn't have at least one invalid entry somewhere

I agree. In my experience, and you've forgotten more than I have learned, the mark of a good data engineer is how they account for invalid entries or whether they get `/dev/null`ed.

> Checking the datatype is not the same as validating. There is lots of data out there that is invalid, and yet still has the correct type. In fact, that is the common case. Increasingly strict type rules will not fix that.

I am having a hard time letting go of this opportunity to learn from you, so in case you have time and read this again - When you say "There is lots of data out there that is invalid, and yet still has the correct type", I read "type" as "shape" or "memory layout" and "invalid" as "semantically wrong".

So, is a good example of this a value of `-1` for a person's age? The database sees a perfectly valid integer (the correct shape), but the business logic knows a person cannot be negative one years old (semantically invalid).

In that case, to be explicit, `0` or `14` is a "valid type" for an age (usually integer), but completely invalid data if it's sitting in an invoicing application for an adult-only business?

Again, thank you for your time and attention, these interactions are very valuable.

PS: I'm reminded of a friend complaining that their perfectly valid email would keep getting rejected by a certain bank. Likely the regex they used to validate email addresses was incomplete.

subhobroto 1 days ago [-]
None of these are news to the HN community. Write-ahead logging and concurrency PRAGMAs have been a given for a decade now. IIRC, FTS5 doesn't often come baked in and you have to compile the SQLite amalgamation to get it. If you do need better typing, you should really use PostgreSQL.

However, I will concede that the article doesn't mention at all (and far fewer are aware) that you can build HA, cross-region replicated SQLite using purely OSS software, provided you architect your software around it. Now that would be a really good `Modern SQLite: Features You Didn't Know It Had` article!

Another interesting discussion point is how far self-hosted PostgreSQL and pgBackRest can get you toward a near-zero-data-loss (low RPO/RTO) setup. It's simply amazing we can self host all this.

sgbeal 1 days ago [-]
> Write-ahead logging and concurrency PRAGMAs have been a given for a decade now.

All of the listed features except for strict tables and generated columns have been in SQLite for 10+ years, and those two are certainly not new. The JSON APIs were not made part of the standard distribution until 3.38 (2022-02) but were added in 3.9 (2015-10) and widely used long before they were upgraded from an optional extension to a core feature.

- Generated columns: 3.31 (2020-01)

- Strict tables: 3.37 (2021-11)

subhobroto 24 hours ago [-]
> All of the listed features except for strict tables and generated columns have been in SQLite for 10+ years, and those two are certainly not new

Correct.

As I mentioned in my GP, these features have been around a long, long time, and in the current age of AI - which will happily tell you these features exist if you even remotely hint at them - I would assume one would really have to go out of their way to be ignorant of them.

It doesn't hurt to have an article like this reiterate all that information though, I just would have loved the same level of effort put into something that's not as easily available yet.

A gap exists, which the article doesn't mention at all, around actually configuring solutions like Litestream, rqlite, and dqlite (not the same thing!) to build HA, cross-region replicated SQLite using purely OSS software, provided you architect your software around it. Now that would be a really good "Modern SQLite: Features You Didn't Know It Had" article!

Another interesting discussion point is how far self-hosted PostgreSQL and pgBackRest can get you toward a near-zero-data-loss (low RPO/RTO) setup. It just blows my mind that we can self host all this on consumer-grade hardware without having to touch a DC or "the cloud" at all.

happytoexplain 1 days ago [-]
There are plenty of people in the HN community who don't know much about SQLite. Tech is a big, huge, enormous, gigantic domain.
123abcdef 1 days ago [-]
I’m afraid you overestimated my knowledge
duskwuff 1 days ago [-]
One handy bit of SQLite's JSON features which isn't called out in this article is the short syntax, e.g.

    SELECT payload->>'$.user.id' AS user_id FROM events ...
This works a little more consistently than JSON_EXTRACT - the -> operator always returns a JSON string; the ->> operator (used here) always returns a SQL scalar (text, number, or null). More details: https://sqlite.org/json1.html#jptr
Animats 1 days ago [-]
Everybody has a JSON extension, and they're all slightly different.

I just got hit badly by Dreamhost, which is still running MySQL 8.0. That version has a "JSON extension" which is almost totally useless. I designed something before discovering how different MySQL 8.4 and MySQL 8.0 are.

MoonWalk 1 days ago [-]
That does seem cool. So there are sufficient functions to iterate through collections that might be stored in a single row of a JSON column?
duskwuff 1 days ago [-]
Yes, using the table-valued JSON functions like JSON_EACH and JSON_TREE (which works recursively). Details: https://sqlite.org/json1.html#table_valued_functions_for_par...
fushihara 19 hours ago [-]
I wish SQLite would add a bool type and proper date/time types. Is there really no plan to add them?

For bool, it could just be an alias of a numeric type. Something equivalent to number check(col = 0 or col = 1) would be perfectly fine.

Date/time handling is pretty weak. Having to store values as GMT text is just inconvenient.

When retrieving values in Node.js, I ended up using new Date(val), which caused the classic bug where a GMT-stored value gets interpreted in the local timezone.

The correct approach was new Date(val + "Z"), but I really don’t want to deal with that kind of hassle anymore.
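
A sketch of both workarounds with a hypothetical table: an integer plus a CHECK as the suggested bool substitute, and timestamps stored with an explicit trailing Z so new Date(...) in Node parses them as UTC:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE task (
        done    INTEGER NOT NULL CHECK (done IN (0, 1)),     -- poor man's bool
        created TEXT NOT NULL
            DEFAULT (strftime('%Y-%m-%dT%H:%M:%SZ', 'now'))  -- explicit UTC marker
    )
""")
con.execute("INSERT INTO task (done) VALUES (0)")
created = con.execute("SELECT created FROM task").fetchone()[0]

try:
    con.execute("INSERT INTO task (done) VALUES (2)")  # not a valid "bool"
    accepted_bad_bool = True
except sqlite3.IntegrityError:
    accepted_bad_bool = False
```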
