Columnar Storage (the-paper-trail.org)
Current Sensage company blurb about the event-data warehouse: http://sensage.com/content/clustered-columnar-database and http://sensage.com/content/why-columnar%E2%80%A6not-row-base...
Patent work: http://www.patentgenius.com/patent/7024414.html
The core engineering team was the CTO plus 3 engineers. Best engineering experience of my life. I wasn't involved at the lowest DB storage level; the guys who did that did a great job.
Michael Stonebraker, technical advisor to Sensage, learned from the Sensage mistakes and built Vertica.
[1] http://kx.com/
In addition, if you consider the record format of traditional row-oriented databases, you will see that the overhead of storing a single-attribute record is rather high. Since a column-oriented DBMS is all about I/O performance (disk to memory, memory to CPU), such overhead can diminish the advantage.
Thus typical column stores tend to store each column as a single contiguous run of sequential memory. This can be enhanced further by applying dictionary compression, so that only integer values are stored. And modern CPUs are good at processing lots of them.
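To make the dictionary compression idea concrete, here is a minimal sketch in Python. The function name `dictionary_encode` and the sample data are made up for illustration, and real column stores do this at a much lower level, but the principle is the same: the column becomes a contiguous array of small integers plus a lookup table.

```python
from array import array


def dictionary_encode(values):
    """Replace each distinct value with a small integer code (hypothetical helper)."""
    dictionary = {}  # value -> integer code, assigned in order of first appearance
    codes = []
    for v in values:
        # setdefault assigns the next free code only if v has not been seen yet
        codes.append(dictionary.setdefault(v, len(dictionary)))
    # Invert the mapping so that codes can be turned back into values
    decode_table = [None] * len(dictionary)
    for value, code in dictionary.items():
        decode_table[code] = value
    return codes, decode_table


cities = ["Berlin", "Paris", "Berlin", "Berlin", "Paris", "Rome"]
codes, table = dictionary_encode(cities)

# Pack the codes into one contiguous block of unsigned integers
packed = array("I", codes)

# Decoding is a plain array lookup per code
decoded = [table[c] for c in packed]
```

A scan over `packed` touches only integers, which is the kind of work the comment above says modern CPUs handle well.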
See the paper "Column-stores vs. row-stores: how different are they really?" in SIGMOD 2008 for performance comparisons between C-Store and approaches to emulating a column store in row-store databases.
In database systems it is important to distinguish between the logical and physical models.
When you design a relational database, you focus on the correct logical model.
"People can have multiple phone numbers".
"Phone numbers belong to a single phone".
"Mobile phones are possessed by one person. Landlines can be shared".
And so on. You express this logical model to the database, most likely in SQL.
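As a rough sketch of what expressing those rules in SQL might look like, here is one possible schema, run through Python's built-in `sqlite3` module. The table and column names are my own assumptions, not anything from the article, and the rule that a mobile phone has exactly one owner is left unenforced to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- "Phone numbers belong to a single phone": each number appears once.
CREATE TABLE phone (
    id     INTEGER PRIMARY KEY,
    number TEXT NOT NULL UNIQUE,
    kind   TEXT NOT NULL CHECK (kind IN ('mobile', 'landline'))
);
-- "People can have multiple phone numbers" and "landlines can be shared":
-- a many-to-many link between people and phones.
CREATE TABLE person_phone (
    person_id INTEGER NOT NULL REFERENCES person(id),
    phone_id  INTEGER NOT NULL REFERENCES phone(id),
    PRIMARY KEY (person_id, phone_id)
);
""")

conn.execute("INSERT INTO person (id, name) VALUES (1, 'Ann')")
conn.executemany(
    "INSERT INTO phone (id, number, kind) VALUES (?, ?, ?)",
    [(1, "555-0100", "mobile"), (2, "555-0199", "landline")],
)
conn.executemany(
    "INSERT INTO person_phone (person_id, phone_id) VALUES (?, ?)",
    [(1, 1), (1, 2)],
)

# All numbers that belong to Ann
numbers = [
    row[0]
    for row in conn.execute(
        """SELECT phone.number
           FROM phone JOIN person_phone ON phone.id = person_phone.phone_id
           WHERE person_phone.person_id = 1
           ORDER BY phone.number"""
    )
]
```

Nothing here says how the rows are laid out on disk; it only states what must be true of the data.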
Eventually you notice that query X is slow. Your first step is to check that your logical design was sound, because poorly designed schemata are hard for query planners to reason about correctly.
Then you start doing things to the physical representation. You say stuff like:
"I look up by phone numbers a lot."
Or, in SQL terms, you add an index to a column.
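For example, with SQLite (through Python's `sqlite3` module) you can watch the planner change its mind once the index exists. The table, the index name `phone_number_idx`, and the data are invented for this sketch, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phone (id INTEGER PRIMARY KEY, number TEXT NOT NULL, owner TEXT)"
)
conn.executemany(
    "INSERT INTO phone (number, owner) VALUES (?, ?)",
    [(f"555-{i:04d}", f"person{i}") for i in range(1000)],
)


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # the 4th column holds the detail text


query = "SELECT owner FROM phone WHERE number = '555-0042'"

before = plan(query)                      # expect a full table scan
result_before = conn.execute(query).fetchall()

# The physical directive: the logical schema and the query stay the same.
conn.execute("CREATE INDEX phone_number_idx ON phone (number)")

after = plan(query)                       # expect a search using the new index
result_after = conn.execute(query).fetchall()
```

The query returns the same rows before and after; only the way the database finds them changes.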
Similarly, as this article pointed out, there are times when grouping data by column rather than row is advantageous. So then you tell the database to use a columnar store.
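A toy illustration of the difference, in plain Python with made-up data: the same three records stored row by row and column by column. This is only a sketch of the layout idea, not how any particular database implements it.

```python
# Row-oriented: each record keeps all of its attributes together.
rows = [
    {"id": 1, "name": "Ann", "age": 30},
    {"id": 2, "name": "Bob", "age": 40},
    {"id": 3, "name": "Cyd", "age": 50},
]

# Column-oriented: each attribute keeps all of its values together.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate over one attribute has to walk every record in the row layout...
avg_age_rows = sum(row["age"] for row in rows) / len(rows)

# ...but reads a single list in the column layout, never touching the names.
avg_age_cols = sum(columns["age"]) / len(columns["age"])

# Fetching one whole record is the opposite trade-off: one lookup in the row
# layout, one lookup per column in the column layout.
record_from_columns = {key: values[1] for key, values in columns.items()}
```

Both layouts answer the same questions; which one is faster depends on whether queries mostly read whole records or mostly scan a few attributes.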
And so on. Modern RDBMSes all support the same major logical model descriptions, but they can vary widely in the physical directives they accept.