Evernote blog: WhySQL?(blog.evernote.com) |
Evernote blog: WhySQL?(blog.evernote.com) |
Robert Haas, a Postgres committer , occasionally blogs about comparisons : http://rhaas.blogspot.com/search/label/mysql
Theres also http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL
Historically , MySQL has been more widely available on low end web hosting plans, so its what a lot of people first use when they start using databases, and a lot of web apps, such as Wordpress, support it exclusively.
Until a year or two ago, only MySQL had built in (if occasionally fragile) replication which made it popular for that reason alone. Postgres now has robust replication, with new features coming down the pipeline soon : http://www.depesz.com/2011/07/26/waiting-for-9-2-cascading-s....
I prefer Postgres, but oddly enough under Oracle theres been some interesting features added to MySQL, which is good for both.
Also, MySQL got commercial entity behind it in its early days which promoted it a lot. In addition, it worked an all platforms, including windows, while Postgres was there just in last couple of years.
I cringe every time I read that. MySQL replication is many things, but it is not reliable (as anyone who has used it at scale will confirm).
I think the only reason this myth prevails is because hardly anyone ever actually verifies if their master/slave are in sync. A table checksum can be a real eye-opener here, especially on a deployment that's been running for while and undergone schema changes, restarts, network splits, etc.
PostgreSQL's built-in replication is pretty easy to set up[1] and provides a writable master, and a cascade of slaves. Slaves can be synchronous or asynchronous, and the synchronicity can be turned off per transaction.
[1] http://www.depesz.com/2011/01/24/waiting-for-9-1-pg_baseback...
MySQL is good enough so there is no need to migrate applications for a few additional features.
In the Windows world the same is true of SQL Server -- the setup, connectivity, and basic usage is so incredibly easy that it made it the first choice of many teams.
This seems incredible -- that products are chosen on such an irrelevant-in-the-long-term basis -- however it has proven true across almost all of the computing market, even targeting highly skilled developers. PHP has few competitive merits, yet it was the default option for many because it was so easy to make something basic in.
There's a lesson there in that.
I don't think there is anything too incredible in that. If you want to throw together an idea quickly, get it out there and test response then use whatever technology gets the job done quickest. You can always change later.
Why waste huge amounts of time setting up a technically perfect database for a product it turns out no-one wants?
"But we’re relatively satisfied with sharded MySQL storage for Evernote user account metadata, even though that’s not going to win any style points from the cool kids."
Indeed, hipsters beware!
Does it mean they have a database per user? That can't be right is it?
This is really useful for things like sharding, where you can split a database table onto more than one machine, because there will be few queries that will stall fetching data from one machine to another.
See, they haven't run into problems with their setup, as per, MySql 'just works' for them.
What would be interesting and educational (for me anyway) would be a situation where folks that ran into serious problems with their SQL setup despite doing the 'right things' persevered where conventional knowledge would have them switch to a NoSql solution.
tldr; Dog bites man article, would love to hear from someone that actually struggled with a SQL solution and soldiered on.
Storing an entire notebook in a single document would be the most obvious. I use postgres all the time and sql is great, but poo-pooing nosql because it wouldn't work with your relational structure is not the best idea. Also - I have found a hybrid between nosql (mongodb) and sql (postgres) is ideal - who says you need to use a single database?
Traverse all notebook documents and look at each notes date? Good luck with that.
The cost of course being that a change to any note in a notebook yields a save of the entire document. Not a problem in simple cases, but that sort of mass-write-amplification can kill you (talk to Digg about that).
Also - I have found a hybrid between nosql (mongodb) and sql (postgres) is ideal - who says you need to use a single database?
Simplicity. Coherency. Maintainability. And on. Sure, it might make sense, but if you already have you toes in the "SQL" world, it is usually worthwhile to dunk your whole foot in. Many SQL products also offer the document functionality of MongoDB, for instance. SQL Server, as an example, lets you store XML documents to your hearts content, which you can index and intelligently query upon, etc. Your scheme is boundless, and on and on.
Maybe it's the 30 year old in me showing, but I'm sticking with the 'it just works' crowd. Until some other approach provides a staggeringly overwhelming reason to switch. I find scaling up with MySQL to be ridiculously easy, allowing me to focus my time elsewhere. Ram, bandwidth, and fast storage have gotten substantially cheaper in the last few years, making it that much easier and cost effective to throw hardware at scaling up. For 99.9% of the Web, those hardware resources are expanding in value much faster than traffic is increasing.
(It's understood other developers find it just as easy to take a different approach)
Also, the example itself is very weak as bitdiffusion below pointed out.
People sometimes conflate the DB access approach (document vs relational) with the storage approach (transactional vs warehouse). This may be because the NoSQL poster child, MongoDB, at one point defaulted to a non ACID mode of operation. But you can have a relational DB that's not ACID (MySQL 3) and an object DB that is (Couch).
I was surprised so much of the original article focused on ACID as though it were the biggest selling point for an RDBMS. It seems like the biggest win (right now) is the sheer number of things a typical RDMBS does for you -- not just ACID but also data integrity (foreign key constraints), automatic index creation (mostly), and automated schema changes across many records (ALTER TABLE). The cost, of course, is the up-front effort of fitting your data and app to the relational model.
In this model, it's the application's job to anticipate and work around these inconsistencies. Assertions that check for data integrity, if you have them, have to be modified to work around this. For a lot of the web applications we talk about these days, who cares - it's people's lists of friends and TODO notes. It's simple data and some dangling records aren't going to hurt anyone.
In the SQL world, we instead write migration scripts that migrate the structure and data all at once. This is a little more work up front, but as long as you stuck to a mostly normalized form and use great tools (which we now have) this is not a big deal ("MONTHS" to migrate across slaves sounds like they had some very wrong decisions made earlier on). The application can move straight to supporting the new structure and entirely forget that the old one existed. In this world, we can also have really complex relationships between fields, like when we're storing accounting or medical data linked to temporal records of changes. The application can consume this structure without worrying about consistency.
I think the problem is that MySQL isn't really a standard SQL-world db. It has some of the advantages of one, but not all of them and some very annoying gotchas.
Not always. Changing the schema can break an application, in particular when the database supports multiple applications.
> It's simple data and some dangling records aren't going to hurt anyone.
It depends on the application. In healthcare people literally die due to some dangling inconsistent records.
The problem in anticipating and working around these inconsistencies is that the workaround is added ad-hoc in code, rather than through a model defining the data change. You need a model http://chronicdb.com/preparing_schema_changes
Migrations are programs, but your existing live application is also a program. Changing the schema breaks your live application to prepare for the new version of the application.
http://chronicdb.com/blogs/nosql_is_technologically_inferior....
Flexibility of schema definition and flexibility of schema change are two different things. Defining schemas only involves data. But changing schemas involves not just data, but code too.
That isn't a reason to choose NoSQL. That's a reason to-
-doubt MySQL -- many of the purported downsides of SQL solutions (for instance the ridiculous "avoid joins" meme that has zero bearing on any good database product) are actually MySQL problems. Or rather, they were -- the product has made some pretty incredible strides.
-understand and embrace normalization. Most of the "we keep trying to change our schema and SQL is just so restrictive" stories could often be described as "the problems with denormalization".
* The occasional join.
* Complex queries.
* Reporting.
* Schema changes.
* Transactions, with options for trading off strictness and performance.
* Performance problems that are easily addressed using stored procedures.
* Performance tuning with minimal code changes (e.g. adding an index to change the execution plan of a complex query).
* Enforcement of invariants regardless of application code.
Why would you give these up for scalability problems you won't have, with 99.999999% probability?
a) not all code written is for web apps
b) that said recent technology also only fits a niche for modern web apps?
I think you view on what constitutes modern web apps or software in general is biased in favor of some examples who benefit by using mongo/couch/redis whatever. (And are very vocal about it)
[EDIT: spelling, and to add that I have tremendous joy in playing with couch, redis, node lately. But at the same time I try to stay critical: what are the real trade offs when I employ these tools?]
Hmm, let's see. What does a "modern web app" do? Well there is a screen with fields for users to enter something, and there is another screen where things that (other) users have previously entered formatted nicely for display. Behind the scenes, this data is exchanged between machines that do some other processing on it.
They were doing this in the '60s.
Sometimes you just pick the simplest thing that works and that can be a key value store, data structures in memory, or a lucrene interface. It's a better solution and less fragile than trying to force everything into a relational model.
So something I did for fun turned out to be the simplest thing which worked. Only I didn't know that when I did it. I think developers who don't occasionally pick something new for fun on a side project are doomed not to know when that new thing could actually be applicable on a project that matters.
Only ever done this in MySQL, where you really can't alter big, live, tables. Well, facebook made an utility for it: http://www.facebook.com/notes/mysql-at-facebook/online-schem...
The same is true with many technologies and approaches. Projects that consume thousands or tens of thousands of hours, with a toolset chosen because it represented an outset savings of single-digit hours.
Changing in the future is seldom as easy as it seems in those early days.
Yes and no. There are a vastly more hosting options that provide MySQL vs Postgres. So it's not an issue of getting competent with the DB system, it's an issue of using an existing LAMP stack or having to roll your own.
I'm not saying people shouldn't use RDBMS', but there are many reasons for using something else:
- The new databases/stores have different features and use cases. Many include features RDBMS' do not have.
- Having to use table/column's for everything can be quite unnatural and tiresome.
- RDBMS' are battle-tested and their pros & cons are well known. But they might also be based on legacy models and truths that simply no longer holds.
- At least for me, using & learning something new is a big motivation booster :)
Use whatever is right for the job. But one should at least have a general knowledge of whats out there: features, pros & cons etc..True. My view is that the tradeoff comes out overwhelmingly on the RDBMS side most of the time.
"Having to use table/column's for everything can be quite unnatural and tiresome."
Unnatural in the sense that it is different from the first language you learn, yes. What I find tiresome is low-level, record-at-a-time programming when doing what could be expressed concisely as, say, a 5-way join with aggregation.
"RDBMS' are battle-tested and their pros & cons are well known. But they might also be based on legacy models and truths that simply no longer holds."
I really think you need to learn something about RDBMS fundamentals. Set theory certainly "holds" to this day, and is certainly not legacy. The relational model is clumsy for some kinds of data ( (e.g. time series), but saying that the foundation is obsolete is just objectively false.
"At least for me, using & learning something new is a big motivation booster"
No argument there. But now take the next step and do an objective comparison between old and new. I have no problem with NoSQL systems. I even built one. But for nearly all applications, when you account for a wide set of requirements, (not just what is needed to get the MVP up and running), I think an RDBMS is the right tool.
Again, that depends on what you are doing. Converting back and forth between rows/tables and, say, data stored in hashes, is tiresome. It's also unnatural when the data doesn't "fit" in the relational model. Of course it works, but writing all this banal sql is not fun.
"I really think you need to learn something about RDBMS fundamentals...."
Right.. I was talking about the software. Maybe "models" is the wrong word.
"But now take the next step and do an objective comparison between old and new"
Believe it's pretty objective already.
I chuckle to myself when kids who wouldn't know a "legacy" app if it jumped up and bit them on the nose, go through the same thought process over and over again.
Many NoSQL databases have a structure you have to conform to---its just not a relational structure.
However, that doesn't mean it isn't well suited for building a website. You'd probably want to include some add-ons for that though (eg: express).
Note that even for realtime chat-like apps you'll probably need some kind of datastore and in fact that example sounds like a great use case for nosql as a datastore.
Since you asked, my project is over here http://jobstractor.com if you want to take a look. My email is on the site so feel free to drop me a line if you have any questions. I'm planning on writing up my experience and a bit about how I've structured things so some questions might help me think through that.