Work on SQLite4 has concluded

Work on SQLite4 has concluded (sqlite.org)

378 points by joewalnes 8 years ago | 157 comments

gregmac 8 years ago |

The relevant change:

> This directory contains source code to an experimental "version 4" of SQLite that was being developed between 2012 and 2014.

> All development work on SQLite4 has ended. The experiment has concluded.

> Lessons learned from SQLite4 have been folded into SQLite3 which continues to be actively maintained and developed. This repository exists as an historical record. There are no plans at this time to resume development of SQLite4.

https://sqlite.org/src4/artifact/56683d66cbd41c2e

Scaevolus 8 years ago |

For context, SQLite4 explored reimplementing SQLite using a key-value store on log-structured merge trees, like RocksDB and Cassandra.

I'd be interested to hear why they stopped. Presumably reimplementing SQL on a KV store was seen as not worth it, when applications that are satisfied with an embedded KV store backend (which is much faster and simpler to write!) already have many options.

makmanalp 8 years ago | |

I had the chance to hear Richard Hipp talk about SQLite yesterday! He mentioned that the LSM tree storage engine is available as an extension to sqlite3. More specifically, he said that he didn't really get the performance improvements he had hoped for in insertion-heavy use cases.

I think part of this is because of a fundamental limitation of sqlite that it's an embedded database that has to persist data on disk at all times: The design of LSM trees works well with databases with a resident in-memory component because it's an approximation of just dumping every new thing you see at the end of an unordered in-memory array. This is as opposed to a data structure like a b-tree where you have to /find/ exactly where to put the data first, and then put it there. This finding bit means you're doing a lot of random access in memory, which is thrashing all of your caches (CPU / disk etc). LSM trees avoid this thrashing by just dumping stuff at the end of an array. However this means you have to scan that array to do lookups (as opposed to something easier like binary search). Then as your array gets big, you merge and flush it down to a lower "layer" of the lsm tree which is slightly bigger and sorted. And when that one fills, you flush further. And these merge-flushes are nice big sequential writes so that's nice too.

Anyway, with SQLite, the highest layer of your LSM tree would probably (this is conjecture) have to be on disk because of the way that there is no server component, versus in an in-memory system it'd probably be in your L2/L3 cache or at least your main memory. So this could be one reason why that model didn't work out as well for them.
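To make the append-then-merge idea above concrete, here is a toy sketch in Python. It is not how SQLite4's LSM engine is implemented: the class name `ToyLSM`, the `buffer_limit` knob, and the purely in-memory "levels" are all assumptions made for illustration, and there are no deletes and no disk I/O.

```python
import bisect


class ToyLSM:
    """Toy LSM: an unsorted write buffer plus sorted, progressively larger levels."""

    def __init__(self, buffer_limit=4):
        self.buffer_limit = buffer_limit  # hypothetical tuning knob
        self.buffer = []                  # newest writes, in arrival order
        self.levels = []                  # levels[i]: sorted list of (key, value)

    def put(self, key, value):
        # Appending is O(1): no search for the "right" slot as in a B-tree.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_limit:
            self._flush()

    def _flush(self):
        # dict() keeps the last value written for each key in the buffer.
        run = sorted(dict(self.buffer).items())
        self.buffer = []
        i = 0
        while True:
            if i == len(self.levels):
                self.levels.append([])
            if not self.levels[i]:
                self.levels[i] = run      # empty slot: the sorted run lands here
                return
            merged = dict(self.levels[i])  # older data
            merged.update(run)             # newer data wins
            run = sorted(merged.items())
            self.levels[i] = []            # carry the merged run one level down
            i += 1

    def get(self, key, default=None):
        # The buffer is unsorted, so it must be scanned (newest entry first).
        for k, v in reversed(self.buffer):
            if k == key:
                return v
        # Levels are sorted, so each one can be binary searched.
        # Lower-numbered levels always hold newer data than higher ones.
        for level in self.levels:
            i = bisect.bisect_left(level, (key,))
            if i < len(level) and level[i][0] == key:
                return level[i][1]
        return default


db = ToyLSM(buffer_limit=2)
for k, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    db.put(k, v)
print(db.get("a"), db.get("b"), db.get("d"), db.get("z"))  # 3 2 5 None
```

Note how `get` may have to touch the buffer and every level before giving up, which is the read-side cost that comes up elsewhere in this thread.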

coleifer 8 years ago | | |

I'm jealous you got to hear Dr. Hipp, that sounds cool. Would love to hear more about the circumstances :)

Regarding the LSM engine, you can find all the relevant implementation details here: https://sqlite.org/src4/doc/trunk/www/lsm.wiki#summary

> The in-memory tree is an append-only red-black tree structure used to stage user data that has not yet flushed into the database file by the system. Under normal circumstances, the in-memory tree is not allowed to grow very large.

thesz 8 years ago | | |

I did something like that: an LSM backend for an in-house OO-relational DBMS.

I also hoped for a big win on insertion-heavy loads, and I also didn't succeed. The problem is that every insert statement must read something back from the DB to verify the DB state against the schema for correctness. As reads in an LSM are slower, the net win is either absent or negligible. I wrote "must" in the sentence above, but you can sometimes get away without reading back, just not always. In the end, the worst-case scenario is always "read and write", not just "write".

But!

I devised a scheme to lay out each layer's data so that it is as contiguous as it can be, or at least a very good approximation of that (O(1) "pages" per level). As a result, contiguous reads performed very well and beat the old storage scheme on reads, despite the need for level merging, etc.

katastic 8 years ago | | |

Perhaps I'm misunderstanding you, but sqlite supports ramdisks/"In-memory databases"

https://www.sqlite.org/inmemorydb.html

Perhaps the key (ha!, pun) is that you're talking about using RAM _and_ disk with the RAM being for caching/fast access that eventually hits the disk. Whereas, I think, in this case sqlite is either on the disk, or in RAM. There are no multiple tiers.

Correct me if I'm off here. Thanks.
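For reference, the in-memory mode on the linked page is selected with the special filename `:memory:`. A minimal sketch using Python's standard `sqlite3` module follows; the table `kv` and the helper `in_memory_roundtrip` are made-up names for illustration.

```python
import sqlite3


def in_memory_roundtrip():
    # ":memory:" asks SQLite for a database that lives only in RAM;
    # it disappears when the connection is closed.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
        conn.execute("INSERT INTO kv VALUES (?, ?)", ("greeting", "hello"))
        row = conn.execute(
            "SELECT v FROM kv WHERE k = ?", ("greeting",)
        ).fetchone()
        return row[0]
    finally:
        conn.close()


print(in_memory_roundtrip())  # hello
```

This is the "either on disk or in RAM" choice: the whole database is in one place, with no tiering between the two.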

robotbikes 8 years ago | |

According to a presentation Richard Hipp did at Ohio State, they just didn't get the same level of performance from a log-structured merge tree vs. the b-tree in SQLite 3. If I remember correctly, while log-structured merges were able to accumulate data in memory to consolidate writes, there were still a number of reads required to support joins etc. So they just determined, based upon their testing, that the b-tree based code was faster. I don't recall the exact benchmarks; they are probably on the web somewhere.

sliverstorm 8 years ago | |

Everything I hear about SQLite3 always suggests that, essentially, it is considered "done". It does what it is supposed to do with great performance. There is nothing major left to do. If it doesn't meet your needs, pick a different SQL database.

Which, while a totally alien concept in the modern software world, is actually a pretty cool thought.

(I'm sure under the hood bugs are getting fixed and all)

Someone 8 years ago | | |

Done? Almost every .0 version I checked in https://www.sqlite.org/chronology.html saw some new features.

ianamartin 8 years ago | | |

Well, obviously they should rewrite it in Rust.

coleifer 8 years ago | | |

Check out the changelog...they've added a ton of new stuff. Native json support, completely new full-text search extension, lsm key/value extension, performance improvements. I think they're looking at some changes to locking in the near future as well. Lots of stuff to find if you look.

Oxitendwe 8 years ago | | |

SQLite is one of my favorite pieces of software for this exact reason, that it is actually a more or less "finished" program instead of a mire of shifting design requirements and constant security updates, a comfortably static and unchanging object against the chaotic backdrop of modern software development.

manigandham 8 years ago | |

There is also a project called UnQLite for UnQL (unstructured query language) designed to be an embedded document store, announced in 2011 [1]. Looks like it's now maintained by another company. [2]

1. https://www.infoq.com/news/2011/08/UnQL

2. https://unqlite.org/

tyingq 8 years ago | |

Would have been a neat way to experiment around with putting a SQL front end on various KV interfaces. Redis or etcd, for example.

michaelmior 8 years ago | | |

CockroachDB has a good blog post[0] that describes how they implemented SQL. (CockroachDB is a key-value store.)

[0] https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mappin...

electrum 8 years ago | | |

Presto has connectors for various types of non-SQL databases including Redis: https://prestodb.io/docs/current/connector/redis.html

Presto is a distributed SQL query engine for big data, so basically the complete opposite of SQLite, though it often gets used in federation scenarios, as does SQLite.

An interesting anecdote is that the team working on what would become osquery (https://osquery.io/) asked if they could reuse the SQL parser from Presto. We get that question a lot, and after explaining that the parser is the easy part (semantic analysis and execution is the real work), I determined that what they really wanted were SQLite virtual tables: https://sqlite.org/vtab.html (and those worked out great for them)

nerfhammer 8 years ago | | |

Fairly straightforward to make a mysql storage engine for it, e.g. https://github.com/AALEKH/ReEngine

alphaalpha101 8 years ago | |

Scaevolus, I recognise that name. #0x10c-dev on freenode? rmmh? skybot?

I created a logsql.py plugin for skybot, it just logs to the DB instead of to text files. Is that something you'd be interested in merging back in?

striking 8 years ago | | |

Developer of some optimization mods for Minecraft, one of which became OptiFine.

gtrubetskoy 8 years ago |

I have to say I learned more about databases from just studying SQLite code than from any book on the subject. I've bought a bunch of books on DB's, some very expensive ones, but I wish someone had pointed me to the SQLite source early on.

To internalize it better I invented a "project" for myself - http://thredis.org/ which was (and is, but I'm not maintaining it) a Redis/SQLite hybrid. It was fun to hack on.

Another invaluable source of DB internals information is PostgreSQL. Both projects have amazingly well written and detailed comments.

bane 8 years ago |

SQLite is one of those awesome things that's the exact opposite of magic. It's beautiful, jaw-dropping engineering that exercises so many technical muscles.

The number of oddball, often critical, places where I've found SQLite being used would defy belief. As far as I can tell, the "expected" place for SQLite to work seems to be almost anything that's not your normal DB driving some web-based CRUD app...all kinds of embedded systems, easy to manipulate in-memory scratch pads for bioinformatics, lots of data analysis tools in mobile communications.

It's so good, and so obvious, that I think sometimes it makes other tools that might be simpler fits for many use-cases less likely to be used, like leveldb.

hasenj 8 years ago | |

> your normal DB driving some web-based CRUD app

That can totally be handled with SQLite.

sametmax 8 years ago | | |

I have several CRUD apps running on SQLite. With a moderate write load and good error handling code for concurrent writes, it works very well. Good when the product size is not worth a full-blown Postgres setup.

Daycrawler 8 years ago | | |

Unless you want to scale. I'm all for the rule of least power, but as soon as the app is exposed to multiple users I would ask about and be sure of the expected number of simultaneous users before going with SQLite instead of a client/server RDBMS. Still, the bound is pretty high if you keep your transactions short.

scott_karana 8 years ago | | |

Not if you need to use multiple app servers.

assface 8 years ago |

Richard Hipp has said that they have signed contracts to support SQLite3 for 35 years. SQLite4 is never going to happen.

sharpercoder 8 years ago |

Every time I see something about sqlite, I become sad. It reminds me of the failure of the W3C standards committee to accept it as a web standard. They rejected sqlite because no competing implementation existed. Furthermore, the "public domain" license of the software was also a hurdle, iirc.

dude01 8 years ago | |

We literally lost several years for web app advancement because of that. Reading the decision making, it seemed like overly-legalistic engineers, but I'm open to conspiracy theories that this decision enhanced mobile app store adoption.

smitherfield 8 years ago | | |

No, it was Mozilla who killed it[1] over Apple and Google's strong objections.

[1] For pretty much complete nonsense NIH and standards-lawyering reasons.

triangleman 8 years ago | |

That is strange, for some reason I was under the impression that all browsers natively support sqlite. I even installed a Chrome extension when I wanted a gui to browse sqlite db's. I am disappointed to hear this.

hoodoof 8 years ago |

The biggest thorn I found working with sqlite was the lack of ability to modify columns with ALTER TABLE which was a real pain.

Doesn't look like this is fixed in sqlite4 though...

coleifer 8 years ago | |

There's an easy workaround: rename the table, create new table, insert into/select from.
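A sketch of that workaround with Python's standard `sqlite3` module, assuming a made-up `items` table whose `price` column should change from TEXT to REAL. All table and column names here are hypothetical.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT below fully control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price TEXT)")
conn.executemany("INSERT INTO items (price) VALUES (?)", [("1.50",), ("20",)])

conn.execute("BEGIN")
try:
    # 1. Rename the existing table out of the way.
    conn.execute("ALTER TABLE items RENAME TO items_old")
    # 2. Create the table again with the column definition you wanted.
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, price REAL NOT NULL)")
    # 3. Copy the rows across, converting the column on the way.
    conn.execute(
        "INSERT INTO items (id, price) "
        "SELECT id, CAST(price AS REAL) FROM items_old"
    )
    # 4. Drop the old copy.
    conn.execute("DROP TABLE items_old")
    conn.execute("COMMIT")
except sqlite3.Error:
    conn.execute("ROLLBACK")
    raise

rows = conn.execute("SELECT id, price FROM items ORDER BY id").fetchall()
print(rows)  # [(1, 1.5), (2, 20.0)]
```

Wrapping the steps in one transaction means a failure part-way leaves the original table untouched. If the table is referenced by triggers, views, or foreign keys, check the SQLite ALTER TABLE documentation first, since it describes a different ordering of these steps for that case.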

mst 8 years ago | | |

The migration tools I work with have this already automated and (having been involved in the design of said automation) it's really not that difficult to add.

Other DBMSes often have to do complete table rewrites for certain types of ALTER anyway - so while, yes, it's faintly annoying to have to do that in userspace rather than having the db engine do it for you, it's not nearly as big a disadvantage as it first appears.

kamac 8 years ago | |

Same here. Had to switch to dockerized mariadb for local tests, because migrations wouldn't work.

jey 8 years ago | | |

Wouldn't you want your tests to be run against the same DB family (and version) as production anyway?

contingencies 8 years ago | | |

Why not work around the expectation and simply migrate offline? (eg. dump DB, hack dumpfile/stream, load new CSV?) While you may lose instantaneous constraint validation, it would almost certainly be faster and allow you to work with known and well tested tools. Conforms to the Unix design philosophy: "Store data in flat text files." / "Write programs to handle text streams, because that is a universal interface." http://github.com/globalcitizen/taoup

Since you were nominally optimizing for migration, a zoomed-out perspective is worth noting: upgrading SQLite3 versions is trivial, fast, and rarely required compared with upgrading major RDBMS versions, and multiple versions cohabit much more easily. Any kind of CI/CD process is also going to be orders of magnitude faster and use much less CPU, memory, and disk space, which means smaller build artifacts and thus faster transfer and download.

carry_bit 8 years ago | |

You can add columns at least.

gigatexal 8 years ago | |

Lack of window functions kills its usefulness for me.

contingencies 8 years ago | | |

Official SQLite response http://sqlite.1065341.n5.nabble.com/Window-functions-td77470...

NelsonMinar 8 years ago |

There's an excellent ~80 minute podcast interview with the sqlite author here: https://changelog.com/podcast/201

Afton 8 years ago |

Would love to hear some of the lessons learned...

k__ 8 years ago | |

Their main idea was that B-Trees are slow and LSMs are fast.

This was a partially right assumption, but only for writes.

If you write something in a DB you check some constraints and those checks are reads.

So most DB writes come with a bunch of reads.

The reads were slower with the LSMs, so the B-Trees performed better in "real world" writes (which come with reads) and LSMs only performed better in "artificial" writes (without reads).
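The "writes come with reads" point can be seen from the application side: an INSERT into a table with a UNIQUE constraint cannot succeed or fail until the engine has looked at the existing data. A small sketch with Python's standard `sqlite3` module follows; the `users` table and the `try_insert` helper are hypothetical, and this illustrates the behaviour only, it does not measure anything.

```python
import sqlite3


def try_insert(conn, email):
    """Return True if the row was inserted, False if the UNIQUE check rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        # The engine had to look up the existing value to detect the conflict.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

results = [
    try_insert(conn, e)
    for e in ["a@example.com", "b@example.com", "a@example.com"]
]
print(results)  # [True, True, False]
```

The third insert is rejected only because the storage layer was consulted first, so even a "pure write" workload pays for a lookup on every constrained insert.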

jpetso 8 years ago | |

I found these slides: https://www.slideshare.net/InsightTechnology/dbtstky2017-c23...

shalabhc 8 years ago |

SQLite is great. For an unusual application see actordb.com - a server side database that uses a large number of independent SQLite databases.

bane 8 years ago | |

Woah, that's awesome! Any performance information anywhere?

maxpert 8 years ago |

Interesting. I saw the title and thought to myself, hmmmm... maybe SQLite4 is just around the corner. This is a good case study to show people that sometimes the classic approach works better, and that techniques popularized by NoSQL might only work in limited scenarios. Still makes me wonder if LSM would have been faster for mobile devices, though. I know it might not work well for embedded devices, but with modern mobile devices (1+ GB of RAM) it might have some speed benefits.

Shameless plug https://github.com/maxpert/lsm-windows (I did port the LSM storage to windows).

the_common_man 8 years ago |

Anyone know how sqlite makes money?

chmaynard 8 years ago |

The title of this post, a true statement removed from its context, attracts readers like me because its implied meaning has considerable shock value. Of course, Dr. Hipp didn't help matters by naming his experimental fork "SQLite4".

adekok 8 years ago |

I'm surprised there wasn't more investigation of SQLite and LMDB:

https://github.com/LMDB/sqlightning

The benchmarks there show anything from little or no performance difference up to substantial speed increases.

coleifer 8 years ago |

The source tree for sqlite3 now contains an extension named lsm1 that contains both the standalone lsm kv database as well as a virtual table extension which allows you to use it directly from sqlite3. Some info on python integration can be found here:

http://charlesleifer.com/blog/using-sqlite4-s-lsm-storage-en...

In peewee 3.0a I've also added built-in support for using the lsm1 virtual table if you're interested.

tomphoolery 8 years ago |

Instead of pretending to release a new version, it might be better to just call this fork sqlite-failed.

cyberferret 8 years ago |

Doubt still exists... Does this mean 'concluded' as in "We've finished polishing the pre production code and are close to releasing it" or 'concluded' as in "We have thrown our hands up in the air and won't be working on this thing any more to bring it to production" ??!!??

EDIT Seeing as I am getting slammed by downvotes, my comment here was simply pointing out that the headline I saw on HN could be read in multiple ways. As a long time user of SQLite3, I was initially excited when I read the title as I had thought it meant something good coming from the SQLite team. Turns out not to be. That, to me, still entails doubt.

fgonzag 8 years ago | |

"Lessons learned from SQLite4 have been folded into SQLite3 which continues to be actively maintained and developed. This repository exists as an historical record. There are no plans at this time to resume development of SQLite4."

Right there on the web page. Highlighted in green even.

xelxebar 8 years ago | | |

Come on. Shaming helps nobody. I know I've missed seemingly obvious things before and am sure everyone else here has as well.

cyberferret 8 years ago | | |

Below the fold even, after about 1000 other words of text. Would be better if the title had used 'cancelled' or 'terminated' instead of 'concluded' to explain the situation at the outset?

favorited 8 years ago | |

From the commit:

> This repository exists as an historical record. There are no plans at this time to resume development of SQLite4.

Danihan 8 years ago | |

The answer you seek is in the link, in highlighted green text.

_kst_ 8 years ago | | |

Yes, it is, but I also initially assumed that this was a successor to SQLite3. No, it didn't take long to correct that misunderstanding, but it would be nice if the title were clearer.

exikyut 8 years ago | |

HN, stop being so monumentally stupid. This is a genuine question, and one that I had too.

(To clarify, this is directed at all the downvoters, not the commentator I'm replying to.)

atombender 8 years ago | | |

I didn't downvote, but I don't think it's too much to ask for commenters to read the linked web site carefully before they jump in with a comment. The answer is literally in the link.

Your company desires warranty of title and/or indemnity against claims of copyright infringement. You are using SQLite in a jurisdiction that does not recognize the public domain. You are using SQLite in a jurisdiction that does not recognize the right of an author to dedicate their work to the public domain. You want to hold a tangible legal document as evidence that you have the legal right to use and distribute SQLite. Your legal department tells you that you have to purchase a license.