MongoDB, Better (go.rackspace.com)
I'm optimistic that pg9.4 will be our migration path. But regardless, tokumx has given us the breathing room to defer the decision.
CREATE INDEX ON members ((lower(my_json_data->>'email')));
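To make the idea behind that expression index concrete, here is a minimal Python sketch of what it buys you: the key is computed once per row (the lowercased email pulled out of the JSON), so a case-insensitive lookup becomes a direct key hit instead of a scan. This is only an illustration of the concept, not how Postgres implements it; the sample rows and the helper names (index_key, find_by_email) are made up for the example.

```python
import json
from collections import defaultdict

# Hypothetical rows: (id, my_json_data stored as JSON text).
rows = [
    (1, json.dumps({"email": "Alice@Example.com"})),
    (2, json.dumps({"email": "bob@example.com"})),
    (3, json.dumps({"name": "no email here"})),
]


def index_key(json_text):
    """Mimic lower(my_json_data->>'email'): extract the field, lowercase it."""
    email = json.loads(json_text).get("email")
    return email.lower() if email is not None else None


# Build the "index" once: computed key -> list of row ids.
# Rows without an email are simply skipped in this sketch.
email_index = defaultdict(list)
for row_id, data in rows:
    key = index_key(data)
    if key is not None:
        email_index[key].append(row_id)


def find_by_email(email):
    """Case-insensitive lookup that uses the precomputed key, not a scan."""
    return email_index.get(email.lower(), [])
```

The lookup only benefits if the query applies the same expression as the index, which is why find_by_email lowercases its argument before probing.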
[1] http://www.postgresql.org/docs/9.3/static/functions-json.htm...
(It's not full of vitriol and strawmen like most of the Mongo posts that seem to end up here.)
A message I proxied recently: http://lists.openstack.org/pipermail/openstack-dev/2014-Marc...
Most people in a commercial setting aren't modifying the MongoDB source code. AGPL does not require that any software that communicates with AGPL software be AGPL'd.
But yes, if you're using MongoDB in a commercial endeavor, and you modify the source code, and you're using AGPL version, you do need to share your changes to the MongoDB source code.
It is not clear to me from the message exactly what the problem is. Are we now discouraged from using MongoDB as a database (from startup to university), from writing a MongoDB database driver based on an existing MongoDB language driver such as pymongo, or from writing on top of MongoDB's database driver?
Whereas if mysql shipped with a similar restriction, you could easily flip the connection strings and have it mostly working on postgres or something else.
1. You are locked into Rackspace as your provider. MongoHQ provides multiple cloud providers.
2. They force you to shard. This increases operational and application complexity ... and may not offer any real advantages for the amount of data that you have.
On [1], you can actually keep your app in AWS and connect / migrate your data to OR over Direct Connect.
On [2], you don't have to shard. There is an option to scale vertically when you need to.
You can call RS people who can help guide you making the best decision on [1] or [2] based on your current and future situations.
No, I'm not trolling. I really want to know: https://news.ycombinator.com/item?id=7446919
The website/app is at https://rwt.to , and an example route search is from "Milky Way, Johannesburg" to "O.R. Tambo International Airport".
I should note that I've had a look at geocouch and it didn't fit my use case. I'm not doing trivial 'find my 3 places near [y,x]' queries, but am traversing a pseudo-network of routes to calculate directions. Neo4j also wouldn't have worked in my case. TokuMX is based on MongoDB 2.2 as far as I'm aware, so the same goes for it.
Of all the "We moved from MongoDB to Cassandra/Riak/etc and gained massively!" posts, I've rarely seen - and it's possible that this is selection bias - companies start with the other NoSQL options.
I want to say that, unlike MongoDB, the others actually force you to think about your data and actively decide how you are going to store it. With MongoDB you can pretty much add an index on anything, but with Cassandra (maybe Riak/Dynamo too) you only get one free index before you have to denormalize and write application code to keep your performance.
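To illustrate what "denormalize and write application code" tends to look like, here is a rough Python sketch. Plain dicts stand in for two tables; nothing here is Cassandra's actual API, and the names (users_by_id, users_by_email, save_user) are hypothetical. The point is that the second access path is a separate copy that your own code has to keep in sync on every write.

```python
# In-memory stand-ins for two tables; a real deployment would use the database.
users_by_id = {}     # primary table, keyed by the one "free" index
users_by_email = {}  # denormalized copy, keyed by the second access path


def save_user(user_id, email, name):
    """Write to both tables, cleaning up the stale lookup row if the email changed."""
    old = users_by_id.get(user_id)
    if old is not None and old["email"] != email:
        # The application, not the database, removes the outdated entry.
        users_by_email.pop(old["email"], None)
    record = {"id": user_id, "email": email, "name": name}
    users_by_id[user_id] = record
    users_by_email[email] = dict(record)


def find_by_email(email):
    """Read from the denormalized table instead of scanning the primary one."""
    return users_by_email.get(email)
```

Every extra query pattern means another table like users_by_email and another branch in save_user, which is the modeling work the comment above is describing.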
Then lastly, MongoDB is good enough for most use cases. We didn't see major performance issues until we started constantly writing data to it (high write/low read); basically we were wrestling with lock contention. I'd wager that for a significant number of MongoDB deployments, Mongo is not only easy to use but fast enough too.
So while the other NoSQLs are (probably) more complicated and likely more performant, MongoDB, to me, hits a sweet spot of ease of use and performance that is good enough for most applications out there.
However, considering other "raw" technical aspects like performance, durability and scaling I've never seen anything that has shown MongoDB to be a leader.
I think Trello uses Mongo primarily for production. Technically it's feasible, but I've found it to be more trouble than it's worth to scale: too many machines are required per shard. I'm currently looking into RethinkDB as a replacement though.
They've made a few changes, like no longer hardcoding the maximum number of connections and shards, which helps, but overall the big problems like database-level locking are still serious even a year later.
The reasons for choosing it were very simple - the lead developer was familiar with JSON and liked using it for queries, and he liked the "schema-less" nature of document storage. No consideration was given to performance or scaling issues, it was purely a comfort level decision.
There are many large-scale deployments of MongoDB - a simple Google search will yield you results.
Off the top of my head - FourSquare, Stripe, ServerDensity, eBay (non-site) etc.
MongoDB (the company) also uses it for MMS - their cloud-based monitoring system, which probably handles hundreds of thousands of metrics every second from tens of thousands of hosts.
So yes, there is a lot of FUD about "it doesn't scale" etc.
Most of the FUD seems to originate from people not reading the manual, and completely misconfiguring things, and wondering why it doesn't work.
To be fair, most competing products (Riak, Couch etc.) will scale enough for most people. So this is sort of a red herring. (And by the point that you are as big as FourSquare, the assumption is you'll probably hire engineers who will read the manual =) ).
So the decision boils down to other things - how easy is the query language, do you need GeoJSON support, do you need aggregations, how mature is the overall ecosystem etc.
And that's why people are picking MongoDB - not really the WOAH, LOOK AT THE OPS PER SECOND!.
Most of the FUD comes from the deceptive marketing 10gen used to promote MongoDB. It now has a well-deserved bad reputation that will never go away, no matter how many startups choose it.
Also, I used to work for MapBox, and I know we did one project on mongo which I was not involved in, and afterwards we built everything with CouchDB (which is how I got acquainted with it).
For the geo stuff we actually used a lot of sqlite and to a lesser extent spatialite. We would pre-calculate things and build them into the rendered tiles in mbtiles format, or stream the point/polygon data from the couch database for realtime client-side compositing.
But yeah, routing is pretty high level stuff. I think they are only now putting the finishing touches on their openstreetmap driven routing system many years later.
As long as OpenStack only uses Apache licensed code >>from MongoDB Inc.<< and diligently avoids using any open source contributions from any community contributor to the MongoDB ecosystem, then you remain compliant with your CLA.
I wouldn't have known about the AGPL licensed database and the conflict between MongoDB, Inc. licensed code and community code. I guess every ORM built upon pymongo and used in commercial settings is now in that trouble zone then?
This is a huge bummer. Definitely an alert for those looking to use MongoDB in commercial settings, as the parent said.
Technically, according to the AGPL, the licenses of MongoDB's database drivers (Apache) are incompatible with the AGPL, and the drivers technically should be licensed under the AGPL. Now the parent says that should be fine for official drivers because MongoDB isn't going to sue themselves, but the issue is for community drivers, like the Node driver or the Golang driver. Since the AGPL states that any software built for the exclusive use of the accompanying software must be AGPL, it follows that community drivers should be AGPL as well.
To me that means not only can you not modify the database, but you cannot modify the drivers either. I'm also unsure whether any application that links those drivers must be AGPL as well. And if your web application must be AGPL, the source of whatever service you are providing must be available too. So in a way it doesn't just affect corporations that want to modify Mongo, it affects everyone who wants to use Mongo (with a community driver at least).
IANAL
It's not about 'suing themselves'.
>>> The Affero GPL is designed to close the so-called "application service provider loophole" in the GPL, which lets ASPs use GPL code without distributing their changes back to the open source community. Under the AGPL, if you use code in a web service, you required to open source it.
Could you cite an example please?
For example, there was noise before about how MongoDB was allegedly tweaking benchmarks.
The funny thing was, from what I've read, they've always had a policy of never ever publishing official benchmarks. Their line was, read the manual, and try it with your own data.
Are you aware of any "deceptive marketing"?
Usually first thing you need to do is write a REST layer on top of it, and with CouchDB that part is just done already.
Obviously there's certain kinds of data I wouldn't put in Couch, or any kind of NoSQL database.
You need to know what the right tool for the job is, but I just want to figure out when that tool is Mongo.
It's only popular because there's less of a conceptual gap between mongo and the relational database tools that a lot of people are used to.
CouchDB, on the other hand, requires you to actually learn and use map/reduce, which is a pain for people who don't feel like having to learn something new. But CouchDB is MUCH MUCH better in a lot of ways, and Mongo is pretty much fundamentally flawed in my opinion.
I do wish Rackspace luck with their offering though. I think it was smart of them to create this MongoDB product for one simple reason: a good number of people are already using MongoDB, so it makes sense to help them get the most out of it.
"I've rarely seen - and its possible that this is selection bias - companies start with the other NoSQL options."
It seems like everyone starts with Mongo, because everyone starts with Mongo.
This means that you don't have the deluge of posts from people moving from other databases, because a) there are far fewer of them and b) they chose them for solid technical reasons (not just because everyone does this).
So as for your perception that other NoSQL databases are "probably" more complicated, you should know that complexity is an objective measure. I think that Mongo is definitely a lot more objectively complex than CouchDB and, from what I have read around the subject, than many of the other NoSQL databases.
What Mongo could well be is 'easier', which is relative. It seems like it's more familiar to certain programmers, which is kind of echoed by the fact that there's an incredibly popular object relational mapper (mongoose) that is being used with what is supposedly a non-relational database.
I use those terms in a very specific sense btw, which I documented here - http://daemon.co.za/2014/03/simple-and-easy-vocabulary-to-de...
It's from a very insightful presentation by the creator of the Clojure language, and I only wrote a summary because I got sick of trying to get people to watch an hour-long video before trying to discuss systems on this level.
I've used Mongo in a couple projects where it was a great fit. The scale wasn't huge, but having pre-shaped data for a mostly read scenario was great. I've found that it works really nicely for a lot of situations, and would definitely be a consideration.
I find that document databases work best when your data is read far more than written to, and when you can shape your data structures for simple key reads in most cases combined with indexed searches. I would consider the use of ElasticSearch or RethinkDB in most cases where you might look at MongoDB. It really depends on your needs here.
Riak and Couch offer other advantages, and like anything it really depends. Cassandra is another nice option for larger scalability, but everything has a cost.
Mongo is very reasonable, and to be honest, if you don't need more than a single server for your needs, it's really easy to get up and running quickly, and development tooling is decent enough, and the concepts are pretty easy to get up to speed with.
We had a 5 node cluster in Mongo that we moved to Cassandra last summer. While our experience with Cassandra is by and large much more performant and cost effective than MongoDB, getting set up with Cassandra was not as easy as with MongoDB. With MongoDB you can literally start throwing data in your database, then add an index after the fact. With Cassandra we had to make sure our data was modeled correctly, and decide where we would denormalize. Riak, from what I remember, has a similar data model to Cassandra, and Redis isn't something you just "start up and go" (mainly because it's an in-memory store).
So I know for a fact that Cassandra, Riak, Dynamo, and Redis are far more complex than MongoDB. Cassandra even requires you to run a "repair" command periodically, and that alone makes it more ops work than MongoDB. We can even throw HBase in there too, as it requires ZooKeeper nodes, NameNodes, and all that Hadoop goodness.
Now none of these databases are hard to use, but compared to them, Mongo is a cakewalk. You literally spin it up, throw JSON inside, and get JSON back. There is no query writing, and for most cases there is very little ops management. In most cases if a query is slow, you can fix that by adding an index or moving to SSDs; only once you have exhausted these options do you really have to consider anything else.
FullContact also has a similar story: http://www.fullcontact.com/blog/mongo-to-cassandra-migration... tl;dr Mongo was great for getting the product up and iterating quickly, but then they moved once they thought they needed to. It's my opinion that it's far easier to get started with MongoDB than it is to get started with Postgres/MySQL.
Lastly, damn the technical reasons why it's so popular: Mongo/10gen used to be a huge marketing engine around ~1.6/8. They captured a lot of developer mindshare, and I'd attribute its current popularity to that as well. It wasn't much longer after that when the naysayers and those hurt by the initial hype came out of the woodwork and we got the now infamous "MongoDB is webscale" video.
TLDR, for hosted MongoDB it's pretty awesome (SSD hardware in Direct Connect locations, tuned and managed).
IMO RS's ObjectRocket pages should do a better job showing what you're getting on top of just deploying to a bunch of RS cloud servers with SSDs in them.
I find it helps being able to scale them out separately too.
http://daemon.co.za/2012/05/replacing-couchdb-views-with-ela...
couchdb-lucene seems more tightly integrated with CouchDB to me.
The real problem was that the data was being imported in bulk by the user, from a many-meg-sized CSV. It would grind CouchDB to a halt trying to build views, so having ElasticSearch be a separate process that could work through it made a lot of sense.
It just subscribes to the _changes feed and updates the index, in the same way couch replication works.
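For anyone who hasn't seen the pattern, here is a small Python sketch of a consumer that follows a sequence-numbered change feed and keeps a search index in step with it. It is only loosely modeled on the idea of a changes feed; the record shape, the Indexer class, and the sample data are all assumptions for the example, not the real CouchDB or ElasticSearch APIs.

```python
from collections import defaultdict

# Hypothetical change records, each carrying a monotonically increasing seq.
changes = [
    {"seq": 1, "id": "doc-a", "doc": {"text": "red bus route"}},
    {"seq": 2, "id": "doc-b", "doc": {"text": "blue bus"}},
    {"seq": 3, "id": "doc-a", "deleted": True},
]


class Indexer:
    """Keeps a tiny inverted index up to date from a stream of changes."""

    def __init__(self):
        self.last_seq = 0               # checkpoint: last change applied
        self.terms = defaultdict(set)   # term -> ids of docs containing it
        self.doc_terms = {}             # doc id -> terms currently indexed

    def _remove(self, doc_id):
        # Drop whatever was indexed for the previous revision of this doc.
        for term in self.doc_terms.pop(doc_id, set()):
            self.terms[term].discard(doc_id)

    def apply(self, change):
        if change["seq"] <= self.last_seq:
            return  # already processed, so replaying the feed is harmless
        self._remove(change["id"])
        if not change.get("deleted"):
            words = set(change["doc"]["text"].lower().split())
            self.doc_terms[change["id"]] = words
            for word in words:
                self.terms[word].add(change["id"])
        self.last_seq = change["seq"]

    def search(self, term):
        return sorted(self.terms.get(term.lower(), set()))
```

Because the indexer only remembers last_seq, it can fall behind during a bulk import and catch up later without blocking writes to the database, which is the property the comments above are after.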