SuperSQLite: SQLite library for Python (2018) (github.com)

135 points by jkldotio 6 years ago | 75 comments

dest 6 years ago |

Interesting pick from one of the links in the article:

"SQLite has fantastic write performance as well. By default SQLite uses database-level locking (minimal concurrency), and there is an “out of the box” option to enable WAL mode to get fantastic read concurrency — as shown by this test. But lesser known is that there is a branch of SQLite that has page locking, which enables for fantastic concurrent write performance."

https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-q...
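For anyone who wants to try the "out of the box" WAL option mentioned in the quote, here is a minimal sketch using Python's built-in sqlite3 module. The file path and the kv table are made up for illustration; the page-locking branch is not part of stock SQLite and is not shown.

```python
import os
import sqlite3
import tempfile

# Hypothetical database path, used only for this illustration.
db_path = os.path.join(tempfile.mkdtemp(), "example.db")

conn = sqlite3.connect(db_path)
# The pragma returns the journal mode that is now in effect.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", ("greeting", "hello"))
conn.commit()

# A second connection can read while the first one stays open.
reader = sqlite3.connect(db_path)
value = reader.execute("SELECT v FROM kv WHERE k = ?", ("greeting",)).fetchone()[0]
reader.close()
conn.close()
```

WAL mode is stored in the database file, so it only has to be set once per database rather than on every connection.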

patelajay285 6 years ago | |

Author of SuperSQLite here. SQLite is under-utilized in my opinion! Because of the name, people seem to have the misconception that it can't be used for production workloads. It's true that it generally works well for embedded use cases, but it can be used for so much more and has an extremely simple codebase without a lot of bloat.

agumonkey 6 years ago | | |

Some guy on IRC told me they made a world scale DNS server work because of sqlite.

NilsIRL 6 years ago | | |

Would you recommend it on the server side as well?

For example to store user credentials.

marktangotango 6 years ago | |

> When you have a write-heavy workload with multiple servers that need to write concurrently to a shared database (backend to a website), you would probably want to choose something that has a client-server model instead, like PostgreSQL.

It's easy to get really stellar concurrent performance out of SQLite using a many-reader, single-writer model (i.e. many threads, single process). In the testing we did, it easily surpassed Postgres.

Moto7451 6 years ago | | |

I had great success using SQLite as the backing store for fast sharing and reindexing of a domain specific search engine component my company licensed. Before we developed our own algorithms and moved to Elasticsearch I was playing with augmenting the licensed component with the full text search capabilities of SQLite. The whole thing ended up being shuffled around via Gluster so I was able to offload the responsibility for sharing shards to it.

My experience pretty much matched what you describe and it was such a great opportunity to really lean on a fabulous piece of software. Given this was many years ago, before ES was stable and SOLR wasn't working well for us, I think it was the right choice. These days, ES is good enough that if I had to do it all over I'd go straight to it. Nonetheless, I don't think SQLite should be ignored as an option when you have high levels of control over data access patterns.

skrebbel 6 years ago | | |

I really like this idea.

Does anyone use this seriously in production on a typical web service? I wonder how e.g. backups and stuff like that work out in real scenarios.
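On the backup question, one option in the standard library (Python 3.7 and later) is Connection.backup, which wraps SQLite's online backup API and can copy a database while it is in use. A minimal sketch, with hypothetical file names and a made-up users table:

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
live_path = os.path.join(workdir, "live.db")      # hypothetical file names
backup_path = os.path.join(workdir, "backup.db")

live = sqlite3.connect(live_path)
live.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
live.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])
live.commit()

# Copy the live database into the target, 100 pages per step.
dest = sqlite3.connect(backup_path)
live.backup(dest, pages=100)
dest.close()
live.close()

# Verify the copy by reading it back.
check = sqlite3.connect(backup_path)
names = [row[0] for row in check.execute("SELECT name FROM users ORDER BY id")]
check.close()
```

How this fits a real deployment (scheduling, shipping the copy off the machine, retention) is outside what this sketch covers.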

thunderbong 6 years ago | | |

That's really interesting.

I've always been a big fan of SQLite and this is the one challenge I've always faced.

Can you give some more insights as to how you achieved that?

nbevans 6 years ago | |

In case anyone is curious, "BEGIN CONCURRENT" is what is being discussed here (page-level locks). There is also a WAL2 mode, which is basically the WAL mode operating in an A/B hot swap mode to facilitate checkpointing without holding up writes.

SQLite has progressed a lot in the last few years. It is no longer advisable to mock it over SQLITE_BUSY et al; you will come unstuck on the Internet very quickly ;-)
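For context, this is roughly what the SQLITE_BUSY situation looks like on stock SQLite from Python, where BEGIN CONCURRENT is not available. It is only a sketch: the path and table are hypothetical, and the short timeout is chosen so the failure shows up quickly.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "busy.db")  # hypothetical path

# isolation_level=None means autocommit, so BEGIN/COMMIT are written explicitly.
holder = sqlite3.connect(db_path, isolation_level=None)
holder.execute("CREATE TABLE t (x INTEGER)")
holder.execute("BEGIN IMMEDIATE")           # take the write lock
holder.execute("INSERT INTO t VALUES (1)")

# timeout is how long this connection waits on a lock before giving up.
waiter = sqlite3.connect(db_path, timeout=0.1, isolation_level=None)
error_text = ""
try:
    waiter.execute("BEGIN IMMEDIATE")       # second writer: should hit SQLITE_BUSY
    got_lock = True
except sqlite3.OperationalError as exc:
    got_lock = False
    error_text = str(exc)

holder.execute("COMMIT")                    # release the write lock

# With the lock released, the second writer goes through.
waiter.execute("BEGIN IMMEDIATE")
waiter.execute("INSERT INTO t VALUES (2)")
waiter.execute("COMMIT")
rows = waiter.execute("SELECT x FROM t ORDER BY x").fetchall()
waiter.close()
holder.close()
```

A longer timeout lets writers queue up behind each other instead of failing, which is often enough when write transactions are short.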

pdw 6 years ago | |

I knew Oracle offered something like that (implemented by grafting the SQLite frontend/VM on top of Berkeley DB). But it'd be really cool to see page-level locking appear in the standard SQLite version.

alberth 6 years ago | |

Dumb question: if BedRockDB has all of these huge benefits over using stock SQLite, why hasn't SQLite merged in all of the changes? Why does BedRockDB have to exist as a separate fork?

nbevans 6 years ago | | |

I think it is because the changes are, currently, considered esoteric and slightly experimental in the sense they want to reserve the right to make breaking changes to them. SQLite has a relatively slow cadence when it comes to things like this. Other reasons may be that they're still working on the tests to validate and support those features - as well as documentation. I am hopeful BEGIN CONCURRENT and WAL2 will make it into the amalgamation at some future date.

cheez 6 years ago | |

uh.... concurrent writes? southparkejaculation.gif

Unfortunately, I use SQLite through Python so I'm stuck with the system version :-(

_pgmf 6 years ago |

Looks like it just bundles a full build of the sqlite shared library, along with some kind of gross hacks. It bundles pysqlite2, which notably is missing a ton of features and improvements from upstream. I'll just compile my own, thanks...

patelajay285 6 years ago | |

Hi @coleifer, I'll disagree, I'm the author. I don't think they are gross hacks, just a more sensible default configuration. In fact, I used a lot of your advice from your blog post https://charlesleifer.com/blog/going-fast-with-sqlite-and-py... while making this library.

Fair enough if you want to compile your own version, but there are a lot of people and Python devs who don't know how / won't go through the effort and that's what this project is for.

It's also a work in progress and not meant to be released yet. Shoot me an e-mail at ajay@plasticityai.com if you have suggestions for how to improve it.

_pgmf 6 years ago | | |

The advice in my post that you linked can be supplemented by this:

http://charlesleifer.com/blog/compiling-sqlite-for-use-with-...

Describes several handy ways to compile sqlite for use with python (2 or 3), as a system-wide shared library or compiled directly into the python module. This can also be applied to sqlcipher, etc.

If you don't know how to compile sqlite, I'd argue that you have no business trying to use its more advanced features. How can you tune or optimize something you don't understand?

Furthermore, your library is stale. Anyone using it is dependent on you to merge in upstream changes constantly. And based on what I saw, it's already well out of date.

patelajay285 6 years ago |

Hi, I'm the author and founder at Plasticity (YCS17) and built this for some internal use cases at our startup. This is actually a Work-in-Progress and not meant to be released yet, but it seems like someone has found it online. I will re-submit this here when it is ready as a Show HN given the interest, shoot me an e-mail at ajay@plasticityai.com if you would like to be kept up to date!

nabdab 6 years ago |

This looks like a dead project that just bundles together the built-in sqlite3 and another wrapper library, APSW, which itself is more active than this project. Why is this on HN?

stedaniels 6 years ago | |

No commits in 9 months, "looks like a dead project".

patelajay285 6 years ago | |

Hi, I'm the author! It's not a dead project, it's just not released yet, someone seems to have submitted it early :). Will post it back here when it's ready for prime time.

airstrike 6 years ago | |

> Why is this on hn?

Because anyone can submit links?

typon 6 years ago | |

We need comments like yours; let the downvotes do the rest.

tony 6 years ago |

Any features in this library you'd like to see in the standard library's sqlite3 [1]? Maybe a PEP [2, Python Enhancement Proposal] could do it.

[1] https://docs.python.org/3/library/sqlite3.html

[2] https://www.python.org/dev/peps/pep-0001/

mehrdadn 6 years ago | |

> Any features in this library you'd like to see standard library's sqlite3 [1]? Maybe a PEP [2, python enhancement proposal] could do it.

Not sure if it's in this library, but I'd want the ability to open a database that's already loaded in memory as a byte array.
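For readers finding this later: Python 3.11 added Connection.serialize and Connection.deserialize to the standard sqlite3 module, which cover loading a database from bytes already in memory when the underlying SQLite library supports it. A minimal sketch that falls back gracefully on older versions (the table is made up):

```python
import sqlite3

# Build a small in-memory database to act as the source of the byte image.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE t (id INTEGER, label TEXT)")
source.execute("INSERT INTO t VALUES (1, 'a')")
source.commit()

rows = None  # stays None when serialize/deserialize are unavailable
if hasattr(sqlite3.Connection, "serialize"):
    image = source.serialize()            # the whole database as bytes
    loaded = sqlite3.connect(":memory:")
    loaded.deserialize(image)             # open the database from those bytes
    rows = loaded.execute("SELECT id, label FROM t").fetchall()
    loaded.close()
source.close()
```

The bytes could just as well come from a network response or an embedded resource; the database produced by deserialize lives purely in memory.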

nerdponx 6 years ago | |

In the standard library? Probably nothing.

But if someone published an alternative APSW wheel with JSON1, ICU, and FTS5 enabled, I'd be happy.
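To see what a given Python build's SQLite actually ships with, the sketch below lists compile-time flags and probes two of the features directly. It uses the built-in sqlite3 module rather than APSW, and the flag names checked are only the ones mentioned above. A missing flag does not always mean a missing feature (newer SQLite versions include JSON support without the JSON1 flag), which is why the functional probes are there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
library_version = sqlite3.sqlite_version  # version of the linked SQLite library

# Flags the library was compiled with, e.g. "ENABLE_FTS5".
options = {row[0] for row in conn.execute("PRAGMA compile_options")}
flags = {name: name in options
         for name in ("ENABLE_JSON1", "ENABLE_ICU", "ENABLE_FTS5")}


def works(sql):
    """Return True if the statement runs on this build, False otherwise."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.OperationalError:
        return False


probes = {
    "json": works("SELECT json('[1]')"),
    "fts5": works("CREATE VIRTUAL TABLE temp.probe_fts USING fts5(body)"),
}
conn.close()
```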

rogerbinns 6 years ago | | |

I'm the APSW author. The binary builds for Windows are distributed with those extensions all compiled in, although my doc needs some updating. It is also only a single flag for other platforms to include all extensions during compilation. What can I do?

dragonsh 6 years ago | |

Maybe just create a new PEP to replace sqlite3 with it.

johnisgood 6 years ago |

> SQLite is extremely reliable and durable for large amounts of data (up to 140TB). It is considered one of the most well-engineered and well-tested software solutions today, with 711x more test code than implementation code.

I keep seeing this statement. Why is it considered one of the most well-engineered pieces of software?

Gys 6 years ago |

It supports 'Remote Streaming over HTTP' without explaining what that means. Maybe someone here knows?

niea_11 6 years ago | |

From what I saw in the source code, it's a feature that lets sqlite open a database file stored on a web server. There is a class in the code, HTTPVFS [1], that proxies sqlite's filesystem operations to HTTP requests.

[1]:https://github.com/plasticityai/supersqlite/blob/01e54bbb829...

patelajay285 6 years ago | |

I'm the author, yes @niea_11's comment is right. We add an HTTP Virtual File System. So you can stream the SQLite files over a static HTTP server like Amazon S3 using HTTP Range headers to do it efficiently. No dynamic web software needed.
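To make the Range-header idea concrete, here is a rough sketch of how a single database page could map to an HTTP request. This is not SuperSQLite's HTTPVFS code; the function name, URL, and 4096-byte page size are assumptions for illustration, and the request is only built, not sent.

```python
import urllib.request


def page_range_request(url, page_number, page_size=4096):
    """Build a GET request for one database page (pages are numbered from 1)."""
    start = (page_number - 1) * page_size
    end = start + page_size - 1  # the end of an HTTP byte range is inclusive
    return urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})


# Hypothetical URL of a database file on a static file server.
req = page_range_request("https://example.com/data.db", page_number=3)
range_header = req.get_header("Range")
```

Passing the request to urllib.request.urlopen would fetch just those bytes from a server that honors range requests, which is what lets a read-only database be queried without downloading the whole file.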

egorfine 6 years ago | |

Got me, too

ddorian43 6 years ago |

Title is misleading. It needs to add something like `python-library`. I thought they supercharged sqlite itself.

patelajay285 6 years ago | |

Hi I'm the author, see my parent comment in the thread. This is still a WIP. You're right that it's mostly a Python library, but when it's released, it will actually be released with pre-built native static SQLite libraries that can be linked into any C application, so it is actually supercharged SQLite itself :).

jgalt212 6 years ago |

> SQLite is extremely reliable and durable for large amounts of data (up to 140TB).

This has not been our experience. Our experience is that it sort of bogs down around 12GB in file size. Linux box, ext3/4, spinning disk (not SSD).
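As background on the quoted number: the 140TB figure is a format limit rather than a performance claim. As far as I know it comes from multiplying SQLite's largest page size by the page-count ceiling documented at the time (newer releases raise that ceiling). The arithmetic, with those two constants as the stated assumptions:

```python
MAX_PAGE_SIZE = 65536         # bytes; largest page size SQLite supports
MAX_PAGE_COUNT = 2147483646   # assumed page-count ceiling behind the 140TB figure

max_bytes = MAX_PAGE_SIZE * MAX_PAGE_COUNT
max_terabytes = max_bytes / 10**12  # decimal terabytes, roughly 140.7
```

Whether a particular workload stays fast at a given file size depends on schema, indexes, cache settings, and storage, none of which this limit says anything about.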

patelajay285 6 years ago | |

Hi @jgalt, I'm the author of this library, and we've been using it to serve extremely large files over ~130GB at Plasticity (YCS17) for AI model data! Shoot me an email at ajay@plasticityai.com if you'd like to chat, but it should remain extremely responsive even at that file size.

jgalt212 6 years ago | | |

Thanks, I may take you up on that after running some new tests. We had the performance issues years ago (approx. 3-4), so we'll see if time has healed all wounds.

d_burfoot 6 years ago |

I've been toying with the idea of using SQLite as a data exchange format instead of JSON or XML. I can't stand navigating through complex JSON trees, I want to just use SQL queries. Has anyone else tried this?
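A small sketch of what that could look like with Python's built-in sqlite3 module: the producer writes related records into a single .db file, and the consumer answers its question with a join instead of walking a nested tree. The file name, tables, and data are all invented for the example.

```python
import os
import sqlite3
import tempfile

exchange_path = os.path.join(tempfile.mkdtemp(), "orders.db")  # hypothetical file

# Producer side: write the data set into a single file.
out = sqlite3.connect(exchange_path)
out.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
out.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "total REAL)"
)
out.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ada"), (2, "grace")])
out.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (2, 1, 5.5), (3, 2, 7.0)],
)
out.commit()
out.close()

# Consumer side: open the received file and query it with plain SQL.
inp = sqlite3.connect(exchange_path)
totals = inp.execute(
    "SELECT c.name, SUM(o.total) "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
inp.close()
```

The trade-off against JSON is that the file is binary, so it is harder to diff, eyeball, or hand-edit.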

sametmax 6 years ago | |

Works well if your data is big and doesn't have to be read in a web browser.

mtw 6 years ago |

How accurate is this statement? "SQLite is faster than nearly every other database". It links to a page that's decades old.

trollied 6 years ago |

The title would be better as "SuperSQLite: a supercharged SQLite library for Python".