Announcing MoSQL

390 points by nelhage 13 years ago | 113 comments

gfodor 13 years ago |

FYI, you can store unstructured data in PostgreSQL (and query it) since the introduction of hstore. So knock one more reason to use MongoDB instead of PostgreSQL off your list. (Disclaimer: the length of my list of reasons to use MongoDB has always been a constant that is less than one.)

http://www.postgresql.org/docs/9.1/static/hstore.html

mrkurt 13 years ago | |

Wow, hstore really isn't a great alternative to an actual document DB. The "better" Postgres option would be a JSON type and functional indexes.

shawn-butler 13 years ago | | |

There is a JSON type but it just validates content.

HSTORE can be fully indexed (GiST and GIN). You just have to roll your own object graphs for nesting, if that's what you need to do.

I swear I have typed this exact same comment previously. Deja vu, maybe.

vog 13 years ago | | |

> JSON type and functional indexes

Those "Indexes on Expressions" are really a great feature that can also be combined with XML (not just JSON) and any other type. I recommend that everyone have a look at them:

http://www.postgresql.org/docs/9.2/static/indexes-expression...
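To make the "indexes on expressions" idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in rather than PostgreSQL (SQLite also supports indexes on expressions), and the table and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("Alice@Example.com",), ("bob@example.com",)],
)

# Index the result of an expression, not the raw column value.
conn.execute("CREATE INDEX idx_users_lower_email ON users (lower(email))")

# A query that filters on the same expression can be served by that index.
query = "SELECT id FROM users WHERE lower(email) = ?"
rows = conn.execute(query, ("alice@example.com",)).fetchall()

# Ask SQLite how it intends to run the query; the last column is the plan text.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("alice@example.com",)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)
print(rows)
print(plan_text)
```

The same pattern applies to any deterministic expression, for instance a function that pulls one field out of a JSON or XML value.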

taligent 13 years ago | |

> So knock one more reason to use MongoDB instead of PostgreSQL off your list.

One of the reasons MongoDB is so popular is that it is a fantastic database for developers. As a Java developer I can deal in my code with sets, hashmaps, embedded structures and have them effectively map 1-1 to the database. It's akin to an object database, meaning you can focus higher up in the stack.

With the SQL ORMs you can't avoid having to deal with the ER model.

gfodor 13 years ago | | |

Of course, the problem with that approach is you don't have anything enforcing any sort of data integrity below the application. In my experience most of the time you actually can put down on paper a schema and a set of rules the data should obey without too much fear of it changing dramatically. The nice thing about hstore is it allows you the flexibility to introduce unstructured data in just the places where a schema is unknowable or not worth the complexity.

MongoDB et al. are basically built around the assumption that a schema is never worth the complexity. It's a bold claim that contradicts many decades' worth of database research.
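The hybrid approach described above (a fixed schema with rules, plus a free-form field only where a schema is not worth it) might look roughly like this. This is only a sketch: it uses SQLite through Python's sqlite3 module and a JSON text column as a stand-in for PostgreSQL and hstore, and the table, columns, and values are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        plan  TEXT NOT NULL CHECK (plan IN ('free', 'paid')),
        extra TEXT NOT NULL DEFAULT '{}'  -- free-form data, stored as JSON text
    )
""")

# Structured columns are validated by the database; "extra" takes anything.
conn.execute(
    "INSERT INTO customers (email, plan, extra) VALUES (?, ?, ?)",
    ("a@example.com", "paid", json.dumps({"referrer": "newsletter"})),
)

# A row that breaks the rules is rejected below the application layer.
try:
    conn.execute(
        "INSERT INTO customers (email, plan) VALUES (?, ?)",
        ("b@example.com", "gold"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row = conn.execute(
    "SELECT extra FROM customers WHERE email = ?", ("a@example.com",)
).fetchone()
extra = json.loads(row[0])
print(rejected, extra)
```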

wheaties 13 years ago | | |

Please don't confuse problems with SQL ORMs with SQL itself. SQL stores are powerful, flexible, and quite easily queryable. MongoDB is only a good database for developers if it solves the problems that you need to solve in a way that causes no impedance mismatch.

And for the record, where I work we use a SQL store, Redis, and MongoDB, each where the use case suits it.

physcab 13 years ago |

This is pretty cool, but I'm struggling to see what the use cases are, at least for analysis. There might be quite a few benefits for running application code that I'm not aware of. With regard to analysis, though, their own example question is "what happened last night?" but then they go on to say that it is a near-real-time data store. Does it matter that it is a real-time mirror, then?

I've always liked the paradigm of doing analysis on "slower" data stores, such as Hadoop+Hive or Vertica if you have the money. Decoupling analysis tools from application tools is both convenient and necessary as your organization and data scales.

dugmartin 13 years ago |

Reading the headline, I thought they were introducing a SQL-like interface to their API, sort of like FQL for Facebook, and I got a little excited. Something like this to get the email addresses of all your active trial subscribers:

SELECT c.email FROM customers c, subscriptions s WHERE c.subscription_id = s.id AND s.status = 'active' AND s.trial_start IS NOT NULL;

(where of course the customers and subscriptions tables would be virtual views on your customers and subscriptions)
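For anyone who wants to try a query of that shape, here is a small self-contained sketch. It runs against an in-memory SQLite database through Python's sqlite3 module, not against any Stripe API, and the two tables and their sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (
        id INTEGER PRIMARY KEY,
        status TEXT,
        trial_start TEXT
    );
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        email TEXT,
        subscription_id INTEGER REFERENCES subscriptions(id)
    );
    INSERT INTO subscriptions VALUES
        (1, 'active',   '2013-01-01'),
        (2, 'active',   NULL),
        (3, 'canceled', '2013-01-05');
    INSERT INTO customers VALUES
        (1, 'a@example.com', 1),
        (2, 'b@example.com', 2),
        (3, 'c@example.com', 3);
""")

# Active subscribers that have a trial start date. String literals use
# single quotes; in standard SQL, double quotes denote identifiers.
emails = [row[0] for row in conn.execute("""
    SELECT c.email
    FROM customers c
    JOIN subscriptions s ON c.subscription_id = s.id
    WHERE s.status = 'active' AND s.trial_start IS NOT NULL
""")]
print(emails)
```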

pc 13 years ago | |

Hm, that could be pretty cool actually. Especially if we also added a REPL for interactive queries at manage.stripe.com.

dugmartin 13 years ago | | |

You're welcome Patrick. I'd recommend looking at Antlr4 to parse the "StripeSQL" commands.

cglace 13 years ago | |

Isn't there a Postgres foreign data wrapper that lets you do this?

jabagonuts 13 years ago |

At what point do you abandon mongodb and just use postgresql?

danielpal 13 years ago | |

At no point. Stripe is doing it right. They are using the right tool for each job: Mongo for storage, speed, etc., and then Postgres for analysis, queries, etc.

This kind of comment shows how little knowledge you have about NoSQL and SQL. It's not SQL vs. NoSQL; it's about using the right technology for the job.

seanwoods 13 years ago | | |

> This kind of comment shows how little knowledge you have about NoSQL and SQL.

The question is perfectly valid. In many scenarios (not necessarily Stripe's), PostgreSQL is fast enough to do the job. Stop putting people down for legitimate engineering questions.

eksith 13 years ago | | |

>This kind of comment shows how little knowledge you have about NoSQL and SQL.

Try not to be condescending and your point will be better received. "Right technology" as I'm sure you're aware, has as much to do with subjectivity as appropriateness. Familiarity, workflow, ease of use (and did I mention familiarity?) cannot be overstated even when the perceived benefits are considered.

Read: religion.

Some of the people who rally against NoSQL may be deriding it from a knee jerk reaction, however others are simply frustrated with developers who, as Ted Dziuba would say, "value technological purity over gettin' shit done".

dennis82 13 years ago | | |

Are you kidding me? There is absolutely NO reason whatsoever to use a NoSQL database for a financial services company. Postgres is more than capable of sustaining the necessary speeds of a startup.

Relational databases were created in the first place to solve these very problems around transactionality and analytics for finance.

This library is a beautiful example of reinventing the wheel, and otherwise creating a patchwork of unnecessary - and ultimately brittle - infrastructure.

lucian1900 13 years ago | | |

The only advantage MongoDB has over Postgres is built-in sharding, and even that is of dubious value.

nodesocket 13 years ago |

10gen also has a nice Python app which syncs to an external target by tailing the MongoDB oplog. The most common target is Solr.

https://github.com/10gen-labs/mongo-connector/tree/master/mo...

Seems to be high quality, and supports replica sets.
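For readers unfamiliar with the oplog-tailing approach, the general idea is to replay a log of insert/update/delete operations against another store. Below is a toy sketch of that idea, not the actual implementation of mongo-connector or MoSQL. The entries are hand-written and heavily simplified (only the "op", "ns", "o", and "o2" fields are modeled, and only "$set" updates are handled), the target is an in-memory SQLite table, and apply_entry is a hypothetical helper.

```python
import json
import sqlite3

# Hand-written, simplified oplog-style entries (assumption: real entries
# carry more fields, and update formats vary between MongoDB versions).
oplog = [
    {"op": "i", "ns": "app.users", "o": {"_id": "u1", "name": "Ada"}},
    {"op": "i", "ns": "app.users", "o": {"_id": "u2", "name": "Bob"}},
    {"op": "u", "ns": "app.users", "o2": {"_id": "u1"},
     "o": {"$set": {"name": "Ada L."}}},
    {"op": "d", "ns": "app.users", "o": {"_id": "u2"}},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, doc TEXT)")


def apply_entry(conn, entry):
    """Replay one simplified oplog entry against the SQL mirror."""
    if entry["ns"] != "app.users":
        return  # only one collection is mirrored in this sketch
    op, o = entry["op"], entry["o"]
    if op == "i":
        conn.execute("INSERT OR REPLACE INTO users VALUES (?, ?)",
                     (o["_id"], json.dumps(o)))
    elif op == "u":
        doc_id = entry["o2"]["_id"]
        row = conn.execute("SELECT doc FROM users WHERE id = ?",
                           (doc_id,)).fetchone()
        if row is None:
            return
        doc = json.loads(row[0])
        doc.update(o.get("$set", {}))  # only "$set" is handled here
        conn.execute("UPDATE users SET doc = ? WHERE id = ?",
                     (json.dumps(doc), doc_id))
    elif op == "d":
        conn.execute("DELETE FROM users WHERE id = ?", (o["_id"],))


for entry in oplog:
    apply_entry(conn, entry)

mirrored = {row[0]: json.loads(row[1])
            for row in conn.execute("SELECT id, doc FROM users")}
print(mirrored)
```

A real tailer would also have to remember its position in the log and cope with restarts, which this sketch leaves out.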

e1ven 13 years ago |

Very neat project. I can see several use cases for this where I work. It'd be nice to have alternative means of searching through data.

I'd also like to mention a project I've been contributing to, Mongolike

[My fork is at https://github.com/e1ven/mongolike , once it's merged upstream, that version will be the preferred one ;) ]

It implements mongo-like structures on TOP of Postgres. This has allowed me to support both Mongo and Postgres for a project I'm working on.

Ensorceled 13 years ago |

Nice. Real businesses need a data warehouse and SQL is the right tool for that job.

I thank them for releasing this.

thesis 13 years ago | |

Maybe I'm misunderstanding your comment but... Real businesses need real solutions for their use cases. SQL is not necessarily the right tool for "that" job.

PommeDeTerre 13 years ago | | |

If such a "solution" involves safely querying, analyzing, storing and manipulating data in any way, SQL and relational databases are usually the best option in practice.

It's much more effective and efficient to use a SQL query than it is to throw together a huge amount of imperative JavaScript code (that's usually very specific to a single NoSQL database, as well) merely to perform the equivalent query.

It's much safer to use a database that offers true support for transactions and constraints, rather than trying to hack together that functionality in some Ruby or PHP data layer code, or relying on some vague promise of "eventual consistency", for instance.

It's much more maintainable, and leads to higher-quality data, to spend some time thinking about a schema, rather than just arbitrarily throwing data into a schema-less system, and then having to deal with the lack of a schema throughout any application code that's ever written.

Aside from an extremely small and limited handful of situations (Google and Facebook, for instance), relational databases are the best tool for the job.

taligent 13 years ago | |

> Real businesses need a data warehouse and SQL is the right tool for that job.

Honestly. I don't think you could be more misinformed if you tried.

Hint: Google "Big Data".

knightni 13 years ago | | |

...data warehouses in general mostly use SQL, and lots of businesses use data warehouses successfully. Teradata, Netezza, Oracle, DB2, etc. I'm not sure why his statement was controversial - SQL's a great language for reporting and analytics.

wging 13 years ago | | |

http://pages.cs.wisc.edu/~jignesh/publ/underattack.pdf

Ingaz 13 years ago |

I thought that the "young" NoSQLs would sooner or later get an SQL interface.

Look at the old NoSQLs: InterSystems Cache got an SQL interface, and GT.M (in the PIP framework) also got SQL.

My impression is that MongoDB looks a lot like MUMPS storage with globals in JSON.

andrewjshults 13 years ago |

Is there currently support for "unrolling" arrays or hashes into tables of their own? If not, would definitely be interested in helping to add that on (we use arrays on documents quite a bit, but have run into a number of situations where a simple SQL query for analysis could have quickly replaced a bunch of mongo scripts).
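One possible shape for that kind of "unrolling" is sketched below: each document becomes a parent row, and each array element becomes a row in a child table keyed by the parent's _id and the element's position. This is only an illustration of the idea, not a MoSQL feature; unroll, the field names, and the sample documents are all hypothetical.

```python
def unroll(docs, array_field):
    """Split documents into parent rows and one child row per array element."""
    parents, children = [], []
    for doc in docs:
        # The parent row keeps every field except the array itself.
        parents.append({k: v for k, v in doc.items() if k != array_field})
        for position, item in enumerate(doc.get(array_field, [])):
            children.append(
                {"parent_id": doc["_id"], "position": position, "value": item}
            )
    return parents, children


docs = [
    {"_id": 1, "name": "post A", "tags": ["mongo", "sql"]},
    {"_id": 2, "name": "post B", "tags": []},
]
posts, post_tags = unroll(docs, "tags")
print(posts)
print(post_tags)
```

With the child rows in a table of their own, a question like "how many posts carry each tag" becomes a plain GROUP BY.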

dcraw 13 years ago | |

I've added that capability to mongo_fdw, which I use for getmetrica.com. I'll be contributing it back soon (after that 9.2 API conversion). Would be happy to talk to you about the wrapper or Metrica. Email's in my profile.

andrewjshults 13 years ago | | |

FYI, the email field from the profile doesn't actually get displayed publicly. Mine is (username) @gmail.com

nelhage 13 years ago | |

There isn't support. It's definitely something I've pondered. If you're interested in adding support, I'd be happy to hear from you at (my username) AT stripe.com.

andrewjshults 13 years ago | | |

Email sent!

ElGatoVolador 13 years ago |

If you need to make a tool (and use twice the amount of storage) to be able to "query your data" in a SQL manner while using NoSQL, it probably means you are using the wrong tool for the job.

j-kidd 13 years ago | |

Actually, it is pretty common to replicate the transactional data into another data store for analytical purposes. However, using PostgreSQL as the OLAP data store may not be the wisest move.

hgimenez 13 years ago |

Author of MoSQL, did you consider just using the MongoDB FDW instead? https://github.com/citusdata/mongo_fdw

nelhage 13 years ago | |

(I wrote MoSQL)

I actually played with mongo_fdw. At this point, it's a really cute hack, and useful for some things, but it doesn't give Postgres enough information and knobs to really let the query planner work effectively, so it ends up being really slow for complex things. I do love the concept, though.

BlackJack 13 years ago | | |

What were your thoughts on MongoConnector? (https://github.com/10gen-labs/mongo-connector/tree/master/mo...)

bryanjos 13 years ago |

I love this idea. I can see myself using MoSQL pretty soon. Does it handle geospatial data? Can it replicate geospatial data from Mongo to a Geometry data type in Postgres?

danso 13 years ago |

Out of curiosity, what is the rest of Stripe's stack like? Ruby, apparently, but I'm assuming they don't use any kind of Mongo ORM at all.

scragg 13 years ago |

Someone should write a client library so you can do ad hoc data aggregation queries without using SQL. You can call it NoMoSQL :)

meaty 13 years ago |

Also useful when MongoDB blows chunks because it was a crap architectural decision and you quickly port your app to raw SQL...

arthulia 13 years ago |

Can't wait for NoMoSQL

Uchikoma 13 years ago |

Waiting for BroSQL.

govindkabra31 13 years ago |

How do you deal with sharded Mongo clusters?

umur 13 years ago | |

(disclosure: I'm one of the founders at Citus Data)

Hey, one way to do that is to use the MongoDB foreign data wrapper, which was also mentioned in some of the earlier threads.

mongo_fdw (https://github.com/citusdata/mongo_fdw) allows you to run SQL on MongoDB on a single node. Citus Data allows you to parallelize your SQL queries across multiple nodes (in this case, multiple MongoDB instances) by just syncing shard metadata. So you would effectively run SQL on a sharded Mongo cluster without moving the data anywhere else.

Another idea could be to use MoSQL to neatly replicate each Mongo instance to a separate PostgreSQL instance, and then use Citus Data to run distributed SQL queries across the resulting PostgreSQL cluster.

dschiptsov 13 years ago |

MongoDB is great for a lot of reasons: record-level locking? Multiple concurrent writes? Append-only journals?

I have read that in version 2.x they announced some features, so is that greatness?