Quack: The DuckDB Client-Server Protocol (duckdb.org)
I'm piping sensor readings into duckdb with a deno server, and couldn't use duckdb -ui to look over the data without shutting down the server. I had no interest in using the server to allow me to look at the contents of the db, so I was just going to live with it for now. This perfectly solves that, along with several other similar kinds of problems I've encountered with duckdb.
duckdb is my favourite technology of 2025/26. It has worked its way into so many of my workflows. It's integral to how I work with LLMs, how I store all kinds of data, analytics, data pipelines... I love it.
Yesterday I pulled a bunch of data from Sentry, multiple log groups on AWS, and Github to figure out when some incidents occurred and how events correlated or caused each other.
Doing that in other tools is perfectly possible and fine, but the overhead of setting up a docker container or understanding requirements for setup or needing an account or whatever bespoke query language makes me lose interest immediately.
With this I only need to know SQL, optionally duckdb -ui, roughly how to ingest the sources correctly so they can be joined easily (in this case just make sure everything is a UTC time series), and I'm mostly off to the races. It works fine.
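The "make sure everything is a UTC time series" step can be sketched in a few lines. This is a minimal illustration only, not the commenter's actual pipeline: the event rows, source labels, and the `to_utc_iso` helper are made up, and it assumes each source emits ISO 8601 timestamps with an explicit offset.

```python
from datetime import datetime, timezone


def to_utc_iso(ts: str) -> str:
    """Parse an ISO 8601 timestamp with an explicit offset and return it in UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Refuse naive timestamps rather than guessing their time zone.
        raise ValueError(f"timestamp has no UTC offset: {ts!r}")
    return dt.astimezone(timezone.utc).isoformat()


# Hypothetical events from three sources, each logged with a different offset.
events = [
    ("sentry", "2026-01-15T10:05:00+01:00", "error spike"),
    ("aws_logs", "2026-01-15T09:03:30+00:00", "deploy started"),
    ("github", "2026-01-15T01:02:00-08:00", "PR merged"),
]

# Once every timestamp is in UTC and in the same format, a plain sort
# (or a SQL join / ORDER BY after ingest) gives one comparable timeline.
timeline = sorted((to_utc_iso(ts), source, msg) for source, ts, msg in events)

for ts, source, msg in timeline:
    print(ts, source, msg)
```

Rows normalized this way can be loaded into one table per source and joined or ordered on the timestamp column without per-source offset handling.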
There are more sophisticated and cool and whatever ways to do this, but with Claude as an assistant you can do this with like 3.5 brain cells and get absolutely incredible results.
DuckDB is awesome partially because of how effortless it is and how little ceremony there is. Like SQLite, but even less friction. Having duckdb -ui as a little work bench is brilliant.
This makes Ducklake much more interesting for my use case. Excited to see where this is going.
I can definitely see exploring this for some homelab use.
I'll try to find answers in the source on GitHub, but a reply would be appreciated. For example:
- When DuckDB bulk-exports a table, does Quack benefit from pre-existing compression/encodings/zero-copy where Arrow requires decode + re-encode?
- The post mentions parallel reads. Is the level of parallelism the same for Arrow and Quack here? Running the high-throughput benchmark at resource saturation with an increasing number of concurrent bulk-read clients would be more transparent.
> Not yet, but we are working on it!
Seems like a niche use case, but it's the one I'm most interested in.
Our lakehouse uses ducklake with postgres as the catalog. Seems like a DuckDB / Quack catalog would be an excellent alternative.
1. No type mismatches for inlining. If you use a non-DuckDB catalog, many types do not have a 1:1 mapping, which introduces additional overhead when operating on those data types.
2. You get the raw performance of DuckDB analytics (and now transactions) over the catalog. DuckDB reading DuckDB is simply faster than any of our Postgres/SQLite scanners.
3. No round-trip for retries. We can easily(tm) run the full retry logic on the DuckDB server side. Right now, these retries trigger multiple round trips for Postgres, making it a performance bottleneck for high-contention workloads.
Disclaimer: I'm a duckdb/ducklake developer.
So you'll be able to test it in a few days.
Because right now, even with Postgres as a catalog, my client needs access to the underlying storage to use Ducklake.
ADBC: https://github.com/gizmodata/adbc-driver-quack
JDBC: https://github.com/gizmodata/quack-jdbc
This is wrong. HTTP is bad for transferring large amounts of data, and it is also bad for streaming.
It is bad for large amounts of data because you get timeout issues on some clients, you hit request/response size limits, etc.
It is obviously bad for streaming as there is no concept of streaming in it.
It is comical to go the path of least resistance so lazy people can put a reverse proxy on top of it. And then say HTTP is the only relevant way to do it in 2026.
The benchmark doesn't seem to mean much as TCP can max out 50GB/s on a single thread. Pretty sure it can do more than that even. So you could be using anything that isn't terrible and you should get max performance out of this.
Also, the protocol is a separate thing from the format. For example, if you are transferring the same mp4 over ftp and over http, you can compare those.
If you are transferring different things over different protocols then the comparison means nothing.
The benchmark graph for bulk transfer should show more granularity so it is possible to understand what percentage of the hardware limit it is reaching. Similar to how BLAS GEMM routines are benchmarked as a % of the theoretical max flops of the hardware.
> 60 million rows (76 GB in CSV format!)
This reads a bit disingenuous.
It is disappointing to see this instead of something like the PostgreSQL protocol with support for a columnar format.
> HTTP also allows the DuckDB-Wasm distribution to speak Quack natively! So DuckDB running in a browser can e.g., directly connect to a DuckDB instance running in an EC2 server using Quack.
And I appreciate that Hannes still appreciates the magic of WASM. [And I keep hearing quark which makes me hungry for tangy creamy German yogurt]
RDBMS have always been multi-user concurrent systems. DuckDB is a very fast local engine that has a multitude of use cases because it is embeddable in other systems.
It’s like saying what does SQLite wanna be? It’s in your phones, your browser, your desktop apps, iot devices and people have extended it in different directions. The only difference here is that this is first party not third party. But to me it’s a very legible move.
I'm not knocking Quack or DuckDB but I'm starting to get a bit confused.
The engine is often not the painful part anymore. The pain is the stuff around it: live DBs, S3 paths, Parquet files, credentials, repeatable runs, exports, validation, and the moment a one-off script quietly becomes infrastructure.
Quack makes the remote/server part cleaner, but the bigger trend seems to be DuckDB becoming the SQL layer inside tools, not necessarily the final user-facing tool.
I can't think of many use cases for this and Arrow Flight, other than moving data around.
This is not commercially a terrible idea. Why keep paying Snowflake for bog-standard SQL query workload when SF makes it easy to migrate to Iceberg & commodity engines like MotherDuck?
I agree they should have also listed the compressed size of the table instead of only mentioning the CSV size. But the compressed dataset is probably not smaller than 1/10 of the CSV size. If that's the case they're transferring ~8GB in 4.6 s on a 2GB/s (15Gbps) connection. Seems pretty close to max.
The size of the dataset should be under 3GB in parquet from what I understand. [0]
So it did 3*8/4.94 ≈ 4.86 Gbps, which is underwhelming in terms of network performance.
It is still not possible to make any conclusions since we don’t know how specifically they encode it or how they are running the query.
I just mean this write-up is useless from an engineering perspective. Also, what it says about HTTP doesn’t make sense.
[0] - https://clickhouse.com/docs/getting-started/example-datasets...
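The two back-of-envelope estimates above can be put side by side. This is only a sketch of the arithmetic using the figures quoted in the thread (roughly 8 GB in 4.6 s, 3 GB in 4.94 s, a 15 Gbps link). How many bytes actually crossed the wire is not known, so neither result is a measurement.

```python
def gbps(size_gb: float, seconds: float) -> float:
    """Throughput in gigabits per second for size_gb gigabytes moved in `seconds`."""
    return size_gb * 8 / seconds


LINK_GBPS = 15.0  # link speed quoted in the thread

# Both payload sizes are guesses made by commenters, not published numbers.
estimates = {
    "CSV/10 guess (~8 GB in 4.6 s)": gbps(8, 4.6),
    "Parquet guess (3 GB in 4.94 s)": gbps(3, 4.94),
}

for label, rate in estimates.items():
    share = rate / LINK_GBPS
    print(f"{label}: {rate:.2f} Gbps, {share:.0%} of a 15 Gbps link")
```

The gap between the two results (roughly 14 Gbps versus roughly 5 Gbps) comes entirely from the assumed payload size, which is why reporting the transferred bytes alongside the timing would settle the question.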
CRDT can absolutely do what you’re asking.
Is the concern that the duckdb messaging is now diluted by it having all these extra features? That you can’t sell it to friends as “this thing” like you can a one use tool like curl? I get that, but I also feel that duckdb is so much bigger than a “do one thing and do it well” tool.
It’s an engine that drives the modern data tool stack. Duckdb’s team has been prescient in that it has made many tasteful bets on what users want: the ability to interop with pandas and polars, the addition of geospatial, the plug-in infra. They’re all optional, but when you need these things, they’re so useful. They’ve also clued me into what the broader data world is thinking about (I didn’t know about sketches and hilbert, but those are so useful in probabilistic large-scale queries and in geospatial queries). And they exist in larger database systems like Redshift too.
So far duckdb’s bets have been tasteful, and mostly ignorable if you don’t happen to use them.
I'm not really sure what Firebird is for at this point in life really. It was pretty exciting when it was open sourced in the early 2000s, before postgres became the mature beast it is, before mysql acquired something as basic as transactions, and before sqlite became the default embedded db. But then it never really went anywhere.
SQLite/DuckDB actually enables a bunch of normally-illegal behavior compared to normal databases. Backups are just a copy & paste of a file; spamming queries willy-nilly becomes cheap; you can version the whole DB in git (you can’t diff it properly, but you can do cross-db queries with SQLite ATTACH); locking concerns go out the window because it’s single-writer anyway.
But if I were actively trying to support multiple users with a single source of truth, I’d probably default to Postgres. If it’s single-user, default to SQLite/DuckDB. If it’s single-user with multiple devices, default to SQLite + replication.
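For readers who have not used it, the cross-db query and file-copy backup mentioned above look roughly like this with Python's built-in `sqlite3` module. This is a minimal sketch: the file names, tables, and rows are made up, and it shows SQLite only (not DuckDB).

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
users_db = workdir / "users.db"    # hypothetical file names
orders_db = workdir / "orders.db"

# Create two independent database files.
con = sqlite3.connect(users_db)
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ada"), (2, "bob")])
con.commit()
con.close()

con = sqlite3.connect(orders_db)
con.execute("CREATE TABLE orders (user_id INTEGER, total REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (1, 20.0), (2, 3.0)])
con.commit()
con.close()

# Cross-database query: open one file, ATTACH the other, join across both.
con = sqlite3.connect(users_db)
con.execute("ATTACH DATABASE ? AS o", (str(orders_db),))
rows = con.execute(
    "SELECT u.name, SUM(ord.total) FROM users AS u "
    "JOIN o.orders AS ord ON ord.user_id = u.id "
    "GROUP BY u.name ORDER BY u.name"
).fetchall()
con.close()

# Backup by plain file copy; only safe while nothing is writing to the file.
backup = workdir / "users.backup.db"
shutil.copy(users_db, backup)
check = sqlite3.connect(backup)
backup_count = check.execute("SELECT COUNT(*) FROM users").fetchone()[0]
check.close()

print(rows, backup_count)
```

The file-copy backup relies on the single-writer situation the comment describes. With a concurrent writer, `sqlite3.Connection.backup` is the safer route.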
Ducklake handles the metadata and storage, but a local duckdb instance connected to it still has to do the compute itself. This lets you federate access to the compute.
Fun for me, I just finished a big streaming implementation doing essentially the same thing in Go-gRPC with arrow table record batches. It was fun though.
https://duckdb.org/quack/faq#what-is-the-relationship-betwee...
Of course, in the future MotherDuck can also support Quack, but this is not the only interesting use case for Quack.
Just making the point that DuckDB is disruptive technology & what it’s most likely to disrupt.