Show HN: Kvass, a personal key-value store(github.com) |
(Just in case you are unaware, kvas/kvass is a traditional north-eastern European drink.)
I've never read it this way but now I can't unsee it.
You could call yeast kvas, but in slavic languages there are usually other nouns used. (drozdze, drozdie, kvasok, kvasnice, закваска , дрожжи). Kvas (the drink) is kvas everywhere.
Prior, I had to deal with ephemeral http servers, which I didn't like from an ergonomic perspective.
Ergonomically, I find redis nice. The problem is that it is in-memory and that encryption is cumbersome. Also, kvass can be used offline, as the kv-store is implemented as a CRDT.
More importantly, it has Firefox and Chrome extensions for auto-filling passwords on the web https://github.com/passff/passff https://github.com/browserpass/browserpass-extension
Honestly a password manager would probably be technically better—or a bunch of flat files lol—but there was a certain charm to having it displayed / function exactly as I like it, and lightning quick with nothing I didn’t need.
An IDE would be another natural place for a lot of my usages, but I kept finding I needed to leave it for a pull request review or Slack conversation or similar, not necessarily programming myself.
https://www.youtube.com/watch?v=ifE7gDiLDbE
The Life of Boris also has a great video on making Kvass:
https://www.youtube.com/watch?v=k1UTJKBMvgc
though I haven't gotten around to trying it, I've only had commercial bottled and canned ones. I imagine if you make it yourself you'll have a slightly more alcoholic outcome.
Especially self-hosting kvass is even simpler than skate, and I had issues linking/syncing skate in the past.
It would probably be a nice weekend project to port the url/qr features to skate.
I'm wondering why you chose to implement your own cryptography routines instead of using something standard like TLS. Apparently your `DecryptData` and `Encrypt` methods are vulnerable to replay attacks due to a lack of (EC)DH-style key exchange.
On the other hand, I didn't anticipate replay attacks in the design and thanks to your comment, I'll keep them in mind should I ever find myself in a scenario where they are undesirable...
It would be better to use an established cryptography system. You could do self-signed certs with TLS, like Syncthing does. Or just use SSH.
This is by no means meant to replace the backend of your app. It's more of an alternative to usb-sticks and google drive.
In that sense it took off more than bitcoin.
can anyone help explain what i'd use this for?
I got it running on a free GCP Compute VM and linked it through to my PC so that the VM hosts the Kvass server and my PC (and in future laptop) set/get stuff on there.
I plan on using Kvass to pass things between my laptop and PC - links, files, images... etc. Will see how that goes - perhaps I don't end up using it at all.
If it seems useful I'll try to hook my web domain in so that I have a more static domain to use it with.
On the other hand, using redis (/skate) for storing files was the inspiration for creating kvass.
For the README, I'd hope to find a bit more information about the way data is stored and transmitted. For example, this seems to just be a SQLite database with values in fields? Is there a separate encryption key for the database itself? Otherwise anyone with access to the file would be able to see all data stored?
The encryption key is only used to encrypt data in transit, but not at rest? And then you're encrypting the full JSON blob instead of only the values? This seems risky to me.
What is the purpose of the ProcessID? It is randomly generated and stored in the database (thus used by all clients too). So, I'm not sure what this is for? I see it's used to resolve conflicts, but these should probably be given out by the server?
Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too.
Finally, I don't understand why you're using plain HTTP (no TLS) for communication b/w client and server. I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data.
This would have been a great use-case for a simple (non-HTTP/JSON) TCP server:
>>> AUTHTOKEN xxx
>>> SET $KEY $LEN $SHA1
>>> <bytes>
<<< OK
>>> AUTHTOKEN xxx
>>> GET $KEY
<<< $LEN $SHA1
<<< <bytes>
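A minimal sketch of how the SET exchange above might be framed, assuming a newline-terminated header followed by exactly `$LEN` raw bytes. The function names `encode_set` and `decode_set` are hypothetical, the auth step is left out, and the SHA-1 here is only an integrity check, not authentication.

```python
import hashlib
from typing import Tuple


def encode_set(key: str, value: bytes) -> bytes:
    """Build a `SET $KEY $LEN $SHA1` header line followed by the raw bytes."""
    if " " in key or "\n" in key:
        raise ValueError("key must not contain spaces or newlines")
    digest = hashlib.sha1(value).hexdigest()
    header = f"SET {key} {len(value)} {digest}\n".encode("ascii")
    return header + value


def decode_set(frame: bytes) -> Tuple[str, bytes]:
    """Parse a SET frame, rejecting it unless length and checksum match."""
    # Split at the first newline only; the body may itself contain newlines.
    header, sep, body = frame.partition(b"\n")
    if not sep:
        raise ValueError("missing header terminator")
    parts = header.decode("ascii").split(" ")
    if len(parts) != 4 or parts[0] != "SET":
        raise ValueError("malformed header")
    _, key, length, digest = parts
    if not length.isdigit() or int(length) != len(body):
        raise ValueError("length mismatch")
    if hashlib.sha1(body).hexdigest() != digest:
        raise ValueError("checksum mismatch")
    return key, body
```

Every field is checked before the payload is handed on, which is the property the comment is after: nothing unvalidated gets unmarshalled.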
Custom protocols have their own security issues, but it can also be easier to see where there are potential issues (like unmarshalling unvalidated blobs). If you wrap something like the above in TLS-PSK, you're set. If you want to use encryption for a session (after you authenticate), that's possible too, but you're at risk of effectively re-creating TLS.
A simple one-paragraph "why" at the top of this project's README wouldn't be amiss.
I didn't see this on the readme.
I'm not interested in bottled kvass, it never tastes like the real thing and you don't get to watch kvass explosions in the bottle as it is being made
This recipe is similar to how I make mine (in Russian): https://www.gastronom.ru/recipe/55100/domashnij-kvas-iz-hleb...
There's a pretty amusing "Life of Boris" video that shows how on YT.
echo "value" > "${HOME}/.db/key"
cat "${HOME}/.db/key" > value
scp -r ...
> this seems to just be a SQLite database with values in fields?
SQLite is used as a storage format ("SQLite competes with fopen()"). The key-value pairs are stored as a modified Append-Only CRDT. The LUB-Operation (to merge two states while syncing) is implemented here: https://github.com/maxmunzel/kvass/blob/e32fdabdc86b039f716c...
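To illustrate what a least-upper-bound merge means for an append-only structure, here is a minimal sketch. It is not kvass's actual code: the `Entry` fields and the `merge` and `lookup` helpers are assumptions made for illustration. The state is a set of entries and merging is a set union, which is commutative, associative, and idempotent.

```python
from typing import FrozenSet, NamedTuple, Optional


class Entry(NamedTuple):
    # Hypothetical fields; the real schema may differ.
    stamp: int   # any totally ordered version stamp
    key: str
    value: str


State = FrozenSet[Entry]


def merge(a: State, b: State) -> State:
    """Least upper bound of two append-only states: their union."""
    return a | b


def lookup(state: State, key: str) -> Optional[str]:
    """Return the value of the entry with the greatest stamp for `key`."""
    candidates = [e for e in state if e.key == key]
    if not candidates:
        return None
    # Tuples compare field by field, starting with the stamp.
    return max(candidates).value
```

Because entries are only ever added, two replicas that exchange states in any order end up with the same set.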
> anyone with access to the file would be able to see all data stored?
Yes, attackers with access to your fs are not part of my attacker model. I rely on disk encryption for that matter.
> Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too.
The sync mechanism is actually pretty solid, as it's based on CRDTs. One of the applications of kvass is central management of config files, so automatic syncing and offline fallback are important.
> What is the purpose of the ProcessID?
The Counter variable is a rudimentary implementation of Lamport clocks. To get a total order from Lamport clocks, you need ordered, distinct process ids. The process ids don't really need to mean anything, and the Lamport clock is itself just a fallback for the case that the wall-clock timestamps collide (see the Max() function), so it's practical to just draw them randomly.
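A minimal sketch of the ordering described above, not the project's actual `Max()` function: stamps are compared by wall-clock timestamp first, then by Lamport counter, then by the random process id. The names `Stamp`, `newer`, `next_counter`, and `new_process_id` are assumptions for illustration.

```python
import random
from typing import NamedTuple


class Stamp(NamedTuple):
    wall_clock: int   # e.g. Unix time of the write
    counter: int      # Lamport counter
    process_id: int   # random, only needs to be distinct per process


def new_process_id() -> int:
    """Draw a random 63-bit id; collisions are very unlikely."""
    return random.getrandbits(63)


def next_counter(local: int, seen: int) -> int:
    """Lamport rule: advance past every counter observed so far."""
    return max(local, seen) + 1


def newer(a: Stamp, b: Stamp) -> Stamp:
    """Pick the later stamp; tuple comparison falls back field by field."""
    return a if a >= b else b
```

The process id only matters when both the timestamp and the counter tie, which is why a random draw is enough.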
> I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data.
Authentication is provided by the GCM mode of AES. As I decrypt (and thereby verify) early, I can assume I'm working on trustworthy payloads. GCM is also non-malleable, unlike for example CBC or CTR.
As suggested by losfair, I'll switch to PSK TLS as soon as it's available or just put HTTPS in front of the end-points. But that's not high-priority right now.
That way my entire working file system is encrypted at rest, in transit, and while stored remotely - entirely with heavily mature off the shelf open source tools.
It’s a bit like selling a car by showing all the different things you can hold in the cup holders.
Why can’t people see a use case for this? It maybe doesn’t compare as unique against the hundred other KV stores but it’s also a toy project and a KV store seems to have an obvious use?
Personally, I’m going to try this out since I was actually looking for a similar KV store. Only because I was looking and HN presented it to me tbh.
My use case is that I have a few Raspberry Pis at home (aka low powered) that I wanted to have a distributed config on. I wanted something easy to manipulate with a command line that was lightweight (eg not redis or consul or a password manager). Since it’s for LAN use (or actually Tailscale) the security wasn’t really important.
The gist is that the original YC debut of Dropbox had a comment that described a pretty technical way to get the same functionality as Dropbox. It's commonly referenced when folks on Hacker News dismiss a product because they can do the same with 10 Unix commands, not realizing they might not be the target customer. Interestingly, I think this situation is the exact opposite: Kvass seems to be more complicated for a non-technical user than file systems, which is what the top-level comment responded with.
That's just repeating the original ignorant Dropbox comment. Over 15 million paying users don't think Dropbox is useless. And hundreds of millions of non-paying users don't either.
I'm in the same boat as you, but there are more kinds of people and situations in the world than just us.
AEAD or gtfo
And syncing between file systems across a network is hard. (Before you say it's easy you can just do X, Y, and Z... remember that infamous Dropbox comment.)
Syncing filesystems across networks with rsync has worked well for years.
If you are considering a personal key value store, you are probably already familiar with web servers and rsync. If not, they are two general purpose tools which are likely to be useful for other projects as well.
I was absent the day of the infamous Dropbox comment.
You're just parroting the original comment, which was proven to be so, so wrong in practice. Most people aren't able to / don't want to duct-tape random systems together like this.
I could snarkily ask you what's the point of Nginx? Why not just run a dial-in BBS? Don’t you have the skills to do that? Why do you need this fancy Nginx and why did anyone bother writing it? That’s what you sound like.
There's value in building something that is integrated.
> Its trivial to set up and operate kvass across multiple devices
> remember the file we stored earlier? Let's get a shareable url for it!
Remember when Dropbox explained itself by telling you you didn't need to carry around USB sticks in your jeans pockets that get washed or lost? I thought that was pretty neat.
Still, using a distributed file system is so much better, as its API is supported by basically everything else (including Dropbox!).
I feel that a key-value store goes against the Unix philosophy and is solving an imaginary problem.
Also not everything has to follow the Unix philosophy. Plenty of very useful things are better off less Unix-y eg ffmpeg. But this doesn’t seem to do a bad job - it’s a very dedicated tool to do one thing, it just doesn’t store everything as files.
https://www.sqlite.org/whentouse.html
"Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic."
If the message is:
Key: Foo
Reference CRDT node ID: 7654321 (the last node that the client knows of that updated the value of ‘Foo’)
Operation: Update
Value: Bar
The ID of this new node: 1122112211
(Omitted for simplicity: Timestamps, hashes, …)
Replaying that message won’t do anything if the target already knows about the existence of that new node.
If the target didn’t know about the node, then I guess you’re helping them sync their own data? Maybe they owe you a thanks? If you knew what each encrypted message contained, you might be able to do some split-state shenanigans; for example: replay the message that sets a “PasswordAuthEnabled” key to “Yes” but deliberately omit the message that changes the “Password” key from its default of “password” to a genuine password. It’s very hard to imagine an actual situation like this occurring, but I guess that’s what makes crypto (and designing secure systems in general) so damn tricky. That and the math. And end users. And…
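The no-op behavior of a replayed message can be sketched as follows. This is an illustration under assumed names (`Node`, `Replica.apply`), not kvass's data model: a replica keeps every node it has seen, keyed by node id, so delivering the same message twice changes nothing.

```python
from typing import Dict, NamedTuple


class Node(NamedTuple):
    node_id: int
    parent_id: int  # id of the node this update supersedes
    key: str
    value: str


class Replica:
    def __init__(self) -> None:
        self.nodes: Dict[int, Node] = {}

    def apply(self, node: Node) -> bool:
        """Store the node; return False if it was already known (a replay)."""
        if node.node_id in self.nodes:
            return False
        self.nodes[node.node_id] = node
        return True
```

Deduplicating by id only covers the exact-duplicate case. It does nothing against the selective-delivery scenario described above, where some messages are replayed and others withheld.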
write '1' @ 0
write '2' @ 0
write '1' @ 0 (replayed through a duplicated packet)
the duplicated write RPC reverts the second write. Duplicated link and rename RPCs are even worse. They added a replay detection cache in the server later to prevent some common error cases, but it fails if the server reboots in the middle.
Anyway, CRDT correctness is hard enough that I'd be reluctant to trust it against an adversary who can inject replays.
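The duplicated-write case from this comment can be reproduced in a few lines. This is a toy model with hypothetical names (`Server`, `run`), assuming the replay cache lives only in memory; it is not a model of any particular server implementation.

```python
from typing import Dict, Set


class Server:
    def __init__(self) -> None:
        self.data: Dict[int, str] = {}
        # Replay cache of request ids; held in memory only.
        self.seen: Set[int] = set()

    def write(self, request_id: int, offset: int, value: str, dedupe: bool) -> None:
        if dedupe and request_id in self.seen:
            return  # duplicate RPC, ignore
        self.seen.add(request_id)
        self.data[offset] = value

    def reboot(self) -> None:
        """Data survives a reboot, the in-memory replay cache does not."""
        self.seen.clear()


def run(dedupe: bool, reboot_before_replay: bool) -> str:
    server = Server()
    server.write(1, 0, "1", dedupe)
    server.write(2, 0, "2", dedupe)
    if reboot_before_replay:
        server.reboot()
    server.write(1, 0, "1", dedupe)  # replayed through a duplicated packet
    return server.data[0]
```

Without the cache the replay reverts the second write. With the cache the replay is dropped, unless a reboot wipes the cache first.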
[0] https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=75...
The only places I see QR codes are on my phone to share the WiFi password, on products to scan for competitions, and from time to time on advertising at bus stops.