Principles of Sharding for Relational Databases

jchanimal 8 years ago | |

There is a new generation of relational databases that are native to multi-node operation, and don't require sharding. I'm speaking of tech like Google Spanner and my employer, FaunaDB.

Now you don't have to shard. More info on how we accomplish distributed transactions. https://fauna.com/blog/distributed-consistency-at-scale-span...

brianwawok 8 years ago | | |

P.S. Google Spanner and FaunaDB both shard. They can call it something else. But unless every node has all data on it, it is sharded.

freels 8 years ago | | |

It is true that Spanner and FaunaDB partition a cluster's dataset across multiple nodes but it's handled transparently by the database. Whenever I've heard the term "sharding" it's usually in reference to the application-level sharding described in the article.

Partitioning the dataset isn't really novel these days (Cassandra, Riak, Mongo et al do the same of course), but what is a significant difference is that both Spanner and FaunaDB implement ACID transactions distributed across partitions. It no longer matters for application correctness what partition key you choose if you can involve any arbitrary set of records in an single transaction.

ozgune 8 years ago | | |

(Ozgun from Citus Data)

> Whenever I've heard the term "sharding" it's usually in reference to the application-level sharding described in the article.

I wanted to drop a quick clarification note here. In the article, I used the term "sharding" to refer to both application and database level sharding.

For anyone that's looking at sharding as an option for scaling, we're always happy to chat and help point you in the right direction. My email's ozgun @ citusdata.com

If you're looking databases that come with built-in sharding, I'd definitely check out Citus (then again, I'm biased): https://www.citusdata.com/

zaptheimpaler 8 years ago | | |

Abstractions are leaky. As soon as you run into a query that runs fast when the data is on one shard and slow when its not, you now need to know about sharding.

berns 8 years ago | | |

The title tag of Spanner home page reads: Cloud Spanner | Automatic Sharding with Transactional Consistency at Scale

dboreham 8 years ago | | |

_You_ don't have to, but someone does.

novembermike 8 years ago | | |

Right, but it's the default state of the database.

sigstoat 8 years ago | |

http://www.fixstars.com/en/ssd/ ?

if all you need is a lot of data in a single database, there's basically nothing except for money between you and your goal. JBODs full of SSDs coming into a single machine via SAS will get you into petabytes, just with commodity hardware you can order from amazon.

i'm expect IBM could sell you a mainframe that'll do it for whatever capacity you care to name.

user5994461 8 years ago | | |

The thing is that 5TB of company data cannot reasonably be kept in JBOD on the cheapest drives you could find on Amazon.

sigstoat 8 years ago | | |

if you're insisting on using the cheapest drives on amazon, you probably can't fit 5TB of data into a small room worth of computers.

if you're a reasonable person, you buy 6 1TB samsung SSDs and stuff them into a single 2U case and you're done.

user5994461 8 years ago | | |

And you're gonna cry when the RAID0/JBOD fails and you lose all your data.

Let's not pretend there is anything reasonable in this setup.

gjjrfcbugxbhf 8 years ago | | |

Presumably your data is (1) mirrored to another similar server ready to replace in case of failure and (2) also regularly backed up to two off-site locations...

icedchai 8 years ago | | |

Why do you want the cheapest drives? Do you value your application data?

zbobet2012 8 years ago | |

You can _easily_ buy a box with 60+TB of SSD...

http://www.dell.com/en-us/work/shop/povw/poweredge-r930

Some of us do need to shard for sure though (I have multi petabyte data sets).

kakwa_ 8 years ago | | |

Storing 60+TB of data is different than searching and doing complex computation on 60TB of data.

Also, operations on a such huge data set can be really painful. Think how to backup a DB like that safely, or how to update the engine.

Some slides (little old, 2014) about a huge postgres instance serving as a backend for leboncoin.fr (main classified advertising website in France).

https://fr.slideshare.net/jlb666/pgday-fr-2014-presentation-...

Basically, they bought the best hardware money could buy at the time to scale vertically, they, in the end, run in some issues and started thinking about sharding this huge DB.

zbobet2012 8 years ago | | |

Absolutely. Queries on 60TB of data can certainly merit more than one box. Hell, queries on 1TB of data can merit more than one box.

I have a workload that runs close to 1.2million TPS for hours at a time and needs less than 100 millisecond response times at the 99th percentile. That uses more than 1 box and sits (replicated) in RAM.

However, 5TB of data really _isn't_ that much on modern SSD's. You can fit a sizable chunk of that in RAM on a decent server, so you probably _don't_ need more than one box.

I have 5TB of data that needs to sit on an SSD is, to be honest, a really poor performance metric. If you are genuinely specing out hardware and a database a better statement would be:

"I have 5TB of relational data, with a pareto distribution for access, at a peak of 100K TPS". Then we can start talking about what solves the problem.

morgo 8 years ago | |

I'm the original author of the linked post you refer to.

Perhaps the title is click bait, but at the time I was meeting with a lot of users looking for someone else's problems.

5TB could still easily be single server territory. It depends more on the queries.

My point is just that some workloads are better solved with (some) vertical scaling first.

grw_ 8 years ago | |

Sorry if this is obvious, but why not use multiple drives?

jajern 8 years ago | | |

Technically you can. You can run a SSD array in RAID5, 6 or 10, and lose a little bit of performance but get the capacity needed.

metaphorm 8 years ago | | |

because that is nearly a synonym for sharding?

or did you mean just use a large capacity RAID setup? that will probably work fine for a lot of situations but it can expensive and introduce more latency for certain types of operations (but that might not matter, depends on context).

gnaritas 8 years ago | | |

It's a server, you don't put databases on non RAID drives anyway, that'd be silly as the loss of a single drive would lose all your data. RAID setup isn't an option, it's how things are done by default for servers in order to ensure redundancy against a single disk failure.

PaulHoule 8 years ago | |

You get one of these

https://petapixel.com/2015/08/15/samsung-16tb-ssd-is-the-wor...

kabdib 8 years ago | |

Um, we run many databases of 20-30TB, some well over 100TB. We use SQL Server, and it just allocates more files. It's not zero touch, but with the right storage technology it's not bad, either.

qaq 8 years ago | |

did you loose a 0 somewhere? Even something crappy like RDS will be able to help you with 5TB database. you could do more than 10 on i3.16xlarge or if you are ok running on own/leased hardware you could 20-30 for PG or 100+ for commercial DBs.

gnaritas 8 years ago | |

> But if you got 5 TB of data, that needs to be in a SSD drive, then please tell me how I can get that into 1 single physical database.

Drive capacity in a server is not limited to the size of a single drive. You can build a raid array any size you like by simply adding more drives.