Stripe: Open Source (stripe.com)
It looks like a poster child for corporate open source projects done wrong. I'm sure it was developed for internal use, made public with a prominent "Open Source! We're Hiring!"[2] and then mostly forgotten about. Not that maintenance is easy! It's a big commitment, which is why Stripe really shines here: not just for open sourcing their code, but for maintaining their projects continuously.
[0] https://github.com/shutterstock/rickshaw
I don't understand this. You think Netflix's and Facebook's open source projects are lame?
I just want to make a quick comment: Everything Stripe's front-end guys touch is pure and magical perfection. They definitely have a killer front-end team - design, UI/UX, and more.
Edit: Or at least I think that's how it works...
Agreed that multilog is arcane and weird but what are the actual problems with it that unilog solves?
This actually took a fair amount of digging! We've been using some version of unilog for over 4 years now (longer than I've been at Stripe), and we'd mostly forgotten why we switched. What follows is more the result of historical exploration and guesswork than authoritative statement of original truth.
I'm fairly confident that the impetus for unilog was timestamp prefixes for our log lines. We wanted timestamps (so that we weren't dependent on all applications adding them). multilog is capable of writing out timestamps, but it formats them as TAI64N. We wanted something more human-parseable.
Once we had it, we started adding other features. These days, I'd say the most useful thing unilog does for us is buffer log lines in memory. We would occasionally see disk writes on EC2 hang for long enough that the in-kernel (64k) pipe buffer would fill up and cause applications to stall.
The original headline feature of unilog was that it wouldn't block writes if the disk filled up. multilog does - if it can't write a line to disk, it stops ingesting data off of stdin, which eventually causes the application to hang writing to stdout.
unilog sends you an email and starts dropping log lines, which we decided better matched the tradeoffs we wanted to make - losing logs sucks, but not as much as blocking your application until you figure out how to free up disk.
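To illustrate the tradeoff described above (buffer in memory, and drop lines rather than block the application when the disk can't keep up), here is a minimal Python sketch. It is not unilog's actual code; the class name `DroppingLogBuffer`, the `sink` callable, and the `max_lines` limit are all hypothetical names made up for the example.

```python
import queue
import threading


class DroppingLogBuffer:
    """Bounded in-memory buffer that drops lines instead of blocking the writer.

    Hypothetical sketch: `sink` is any callable that persists one line
    (for example, a function that appends to a file on disk).
    """

    def __init__(self, sink, max_lines=10000):
        self._queue = queue.Queue(maxsize=max_lines)
        self._sink = sink
        self.dropped = 0  # rough counter; good enough for alerting
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, line):
        """Queue a line without ever blocking; return False if it was dropped."""
        try:
            self._queue.put_nowait(line)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _drain(self):
        # Background thread: the only place that may stall on a slow disk.
        while True:
            line = self._queue.get()
            if line is None:  # sentinel pushed by close()
                break
            try:
                self._sink(line)
            except OSError:
                # e.g. disk full: count the loss and keep consuming
                self.dropped += 1

    def close(self):
        """Flush whatever is still queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

The application only ever calls `write`, which returns immediately; a hung or full disk costs you log lines (visible in `dropped`) instead of stalling the process.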
I would have to look at my commit history to remember exactly what the differences were, though. I think it could match the various credit card types with fewer digits entered, or was better for CVV validation of Amex cards.
(I'm an engineer at Stripe)
One day, while looking through the source, I realized stripe.js had validation functions built in, and those are what I remember these validations behaving differently from.
With the kind of resources you have, you could fix the few things about Postgres that you don't like.
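For anyone wondering what "matching the card type with fewer digits" and "CVV validation for Amex" look like in practice, here is a rough Python sketch. It is not the code from stripe.js or any other library mentioned here; the function names are made up, and the prefix table is deliberately incomplete (it assumes only the well-known leading digits: 4 for Visa, 34/37 for Amex, 51-55 for Mastercard).

```python
# Hypothetical, incomplete prefix table, for illustration only.
CARD_PREFIXES = {
    "amex": ("34", "37"),
    "visa": ("4",),
    "mastercard": ("51", "52", "53", "54", "55"),
}


def card_type(partial_number):
    """Guess the card type from however many digits have been typed so far."""
    digits = "".join(ch for ch in partial_number if ch.isdigit())
    for name, prefixes in CARD_PREFIXES.items():
        if digits.startswith(prefixes):
            return name
    return None  # not enough digits yet, or an unknown prefix


def valid_cvc(cvc, card):
    """Amex security codes have 4 digits; the other types here have 3."""
    expected = 4 if card == "amex" else 3
    return cvc.isascii() and cvc.isdigit() and len(cvc) == expected
```

With this, `card_type("4")` already resolves to Visa after one digit, while `card_type("3")` stays undecided until a second digit arrives.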
[1] https://github.com/stripe/mosql/blob/master/README.md
[2] https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9....
(ToroDB developer here)
The text is different. The color is different. The animation is different. The code is different. It's a unique implementation.
When you search in MongoDB (or PostgreSQL with a single table with a jsonb column) the executor needs to visit every single document, every single key, to see whether that document matches the query criteria. This means that in the absence of indexes (or the inability to use them, as happens in MongoDB, where they are not used under many circumstances) the whole db would be scanned to answer every single query.
In contrast, ToroDB is effectively "partitioning" the data "by type" (the type being the shape of the documents). See this explanation for more detail: https://github.com/torodb/torodb/wiki/How-ToroDB-stores-json.... To answer a given query, ToroDB only reads those tables with potential documents involved in the result, which usually is a small % of the whole database. And this, of course, leads to significantly less I/O and queries that can run up to orders of magnitude faster.
Edit: my bad, found dang's comment.
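A toy Python sketch of the "partition by shape" idea, to make the I/O argument concrete. This is not ToroDB's actual storage layout (see the wiki link above for that): it only handles flat documents, and `shape_of`, `partition_by_shape` and `find` are hypothetical names invented for the example.

```python
from collections import defaultdict


def shape_of(doc):
    """The 'type' of a flat document: its sorted (key, value type) pairs."""
    return tuple(sorted((key, type(value).__name__) for key, value in doc.items()))


def partition_by_shape(docs):
    """Group documents into one 'table' per distinct shape."""
    tables = defaultdict(list)
    for doc in docs:
        tables[shape_of(doc)].append(doc)
    return tables


def find(tables, field, value):
    """Scan only the tables whose shape contains `field`.

    Returns the matching documents and how many documents were visited.
    """
    scanned = 0
    results = []
    for shape, rows in tables.items():
        if field not in {key for key, _ in shape}:
            continue  # no document in this table can match; skip it entirely
        for row in rows:
            scanned += 1
            if row[field] == value:
                results.append(row)
    return results, scanned
```

A query on a field that only appears in one shape never touches the documents stored under the other shapes, which is where the reduced I/O comes from.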
Regarding the timestamps: we've actually gotten quite used to TAI64N. It's definitely not human-parseable, but it is extremely specific :) We end up just piping logs through tai64nlocal or converting it at the logstash stage.
In contrast, PostgreSQL extensions need C, which doesn't have these advantages and becomes less productive.
But even more importantly: we don't want to tie it, as much as possible, to PostgreSQL. Maybe, tomorrow, ToroDB will run on other RDBMSs :)
If run locally, there's barely any performance hit. And it also enables you to run it remotely.
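For reference, decoding a TAI64N label by hand is not much work. The Python sketch below assumes the usual daemontools convention (an `@`, 16 hex digits of seconds offset by 2^62, then 8 hex digits of nanoseconds, with a fixed 10-second TAI-UTC offset at the Unix epoch) and ignores later leap seconds, so it is an approximation rather than a drop-in replacement for tai64nlocal. The function name `tai64n_to_iso` is made up, and it prints UTC rather than local time.

```python
from datetime import datetime, timezone

# Assumed convention: the label for Unix time 0 is 2**62 + 10.
TAI64_UNIX_EPOCH = 2**62 + 10


def tai64n_to_iso(label):
    """Convert a label like '@400000003b9aca0a1dcd6500' to an ISO-style UTC string."""
    hex_part = label[1:] if label.startswith("@") else label
    if len(hex_part) != 24:
        raise ValueError("expected 24 hex digits after the optional '@'")
    seconds = int(hex_part[:16], 16) - TAI64_UNIX_EPOCH
    nanos = int(hex_part[16:], 16)
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # datetime only carries microseconds, so format the nanoseconds separately.
    return stamp.strftime("%Y-%m-%dT%H:%M:%S") + f".{nanos:09d}Z"
```

Applied to the first whitespace-separated token of each log line, this gives roughly the same readability as piping through tai64nlocal.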