Joy and Pain of Using Google BigTable

Joy and Pain of Using Google BigTable (syslog.ravelin.com)

70 points by jonomacd 7 years ago | 20 comments

cortesi 7 years ago |

This matches my experience with BigTable, down to the short-duration failure spikes.

I feel that something should be said on the plus side of the ledger here. I'm the solo founder of a company that indexes huge amounts of fine-grained information. Bigtable is the key technology that let me start my company on my own: it soaks up all the data we can throw at it, with almost zero maintenance. Even within the stable of GCP technologies it stands out as being particularly reliable.

My biggest "problem" with BigTable is the lack of public information on schema design - which in this context is mostly the art of designing key structures to solve specific problems. I've come up with sensible strategies, but much of it was far from obvious. I can't help but feel that there should be a body of prior art I could draw on.
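To make "designing key structures" concrete, here is a minimal sketch of one commonly used pattern: a composite row key whose fields are joined by a delimiter and whose timestamp is reversed, so the newest rows for an entity sort first in a prefix scan. Everything in it (the function names, the `#` separator, the 13-digit millisecond bound) is an assumption made for illustration, not something taken from the article or from any particular production schema.

```python
"""Sketch of a composite Bigtable-style row key (all names hypothetical)."""

MAX_TS_MS = 10**13 - 1  # assumption: millisecond timestamps fit in 13 digits
SEP = "#"               # assumption: "#" never appears inside a key field


def reverse_timestamp(ts_ms: int) -> str:
    """Return a zero-padded (MAX - ts) string so newer events sort first."""
    if not 0 <= ts_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range")
    return f"{MAX_TS_MS - ts_ms:013d}"


def make_row_key(entity_id: str, event_type: str, ts_ms: int) -> bytes:
    """Build entity#type#reversed-timestamp; rows are sorted by these bytes."""
    for field in (entity_id, event_type):
        if SEP in field:
            raise ValueError(f"field {field!r} contains the separator")
    return SEP.join((entity_id, event_type, reverse_timestamp(ts_ms))).encode("utf-8")


def scan_prefix(entity_id: str, event_type: str) -> bytes:
    """Prefix that selects every row for one entity and event type."""
    return f"{entity_id}{SEP}{event_type}{SEP}".encode("utf-8")


if __name__ == "__main__":
    older = make_row_key("c42", "login", 1_000)
    newer = make_row_key("c42", "login", 2_000)
    # Lexicographic order puts the newer event first.
    print(sorted([older, newer]))
```

The design choice here is that the leading field decides which rows sit next to each other, so it should be whatever you most often scan by; the reversed timestamp only matters if "latest first" is the common read.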

rrdharan 7 years ago | |

Disclosure: I work on Google Cloud Bigtable.

You might find this talk from a recent Google Cloud event useful in this regard:

Visualizing Cloud Bigtable Access Patterns at Twitter for Optimizing Analytics (Cloud Next '18) https://www.youtube.com/watch?v=3QHGhnHx5HQ

jonomacd 7 years ago | |

Yeah, it does sound like I am really down on the product, but it does some things very well. If you can work around the very short-lived unavailability, it is fantastic.

yongjik 7 years ago |

Random inside joke I overheard about ten years ago:

"It's called BigTable, not FastTable or AvailableTable!"

...It's probably a bad idea to evaluate 2019's BigTable based on the joke, but my puerile mind still finds it amusing. :)

fastest963 7 years ago |

We use BigTable at 30k writes/sec and 300k reads/sec. Compared to the other managed services (Pub/Sub, Memorystore, etc.), it has been the most stable by far. But we have to scale up our node count at times when we don't think we should have to (based on the perf described in the docs), and we see the latency/errors described in the article. They also added storage caps based on node count last year, which increased our costs dramatically.

The Key Visualizer has been a huge help, but there are still not enough metrics and tooling to understand when things do go wrong or what is happening behind the scenes. Luckily, we have a cache sitting in front of Bigtable for reads, which allows us to absorb most of the described intermittent issues; cost has prevented us from doing any sort of replication.
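For readers wondering how a read cache can absorb short error spikes, here is a rough sketch of one possible shape: a read-through cache with a TTL that falls back to a stale entry when the backing read fails. The `fetch` callable stands in for whatever performs the real Bigtable read, and the class name, TTL, and serve-stale behaviour are all assumptions for illustration, not a description of our actual setup.

```python
import time
from typing import Callable, Dict, Tuple


class ReadThroughCache:
    """Hypothetical in-process read-through cache that serves stale data on errors."""

    def __init__(
        self,
        fetch: Callable[[str], bytes],          # assumed wrapper around the real read
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        # Fresh hit: answer from memory without touching the backing store.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        try:
            value = self._fetch(key)
        except Exception:
            # Backing store is having a bad moment: prefer stale data to an error.
            if entry is not None:
                return entry[1]
            raise
        self._entries[key] = (now, value)
        return value
```

This only helps for keys that have been read before, and it trades freshness for availability, so it suits read paths that can tolerate slightly old values. A real deployment would also need size limits and eviction, which are left out here.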

jonomacd 7 years ago | |

Interestingly, for us scaling up doesn't really solve the short-term unavailability. It seems to be only somewhat related to load: it does seem to hit more often at high-traffic times, but we have also seen it at low-traffic times.

Putting in that cache is a great move. Caching is challenging for us as we get hits over a very wide range of keys.

HenryBemis 7 years ago |

Reading the article, the following quote got my attention: "you should always keep things simple even if your tools allow for more complex patterns".

I follow the "it is perfect when you don't need to remove anything else" rule in most systems/processes/functions/tasks in life (not only IT systems). I am happy to see in this cluttered space called IT there are many more like-minded people who see that too much is TOO much.

SEJeff 7 years ago | |

Obligatory, "Does this function bring you joy?"

codeisawesome 7 years ago | | |

It's a beautiful thing that this comment exists :-D

radsftw694 7 years ago |

As another (former) user of Cloud Bigtable (migrating from Cassandra), we saw almost identical results: great performance when it works, but regular periods of unavailability (this was around 2-3 years ago at this point). Interesting to hear that they still have the same problems. We had a similar experience spending time with the Cloud Bigtable team, but they never really got to the bottom of it.

jonomacd 7 years ago | |

Yeah, it was quite frustrating trying to figure out what was going on. Until replication was released, it was a real non-starter for a lot of use cases. With replication you can combat the problem and it does give you great performance (when it isn't giving you random errors).

puzzle 7 years ago | | |

When I was at Google, you were not supposed to serve end users straight out of BigTable. You had to do extra work: request hedging against multiple replicas (Jeff Dean has mentioned this in public many times, with numbers on long-tail latency impact), some in-memory caching if appropriate, etc. In other words, you had to protect end users from Bigtable: after all, its original target was the web crawling and indexing pipelines. The problem is that, as you both say, it works very well the vast majority of the time, so people tended to get spoiled and/or make assumptions. Which is why Spanner was created.
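For anyone who hasn't seen request hedging, the idea is roughly: send the read to one replica, and if it hasn't answered within a short delay, send the same read to another replica and take whichever succeeds first. Below is a minimal sketch using only the Python standard library. The replica callables, the 50 ms hedge delay, and the function name are assumptions for illustration; this is not how any Google-internal client does it.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Sequence


def hedged_read(
    replicas: Sequence[Callable[[str], bytes]],  # hypothetical per-replica read functions
    key: str,
    hedge_delay: float = 0.05,                   # assumed delay before hedging
    timeout: float = 1.0,
) -> bytes:
    """Return the first successful answer, hedging to the next replica when slow."""
    if not replicas:
        raise ValueError("need at least one replica")
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    deadline = time.monotonic() + timeout
    unsent = list(replicas)
    pending = set()
    last_error = None
    try:
        while unsent or pending:
            if unsent:
                # First pass sends the primary; later passes send a hedge or a failover.
                pending.add(pool.submit(unsent.pop(0), key))
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            # Wait only hedge_delay while there is still another replica to try.
            wait_for = min(hedge_delay, remaining) if unsent else remaining
            done, pending = wait(pending, timeout=wait_for, return_when=FIRST_COMPLETED)
            for future in done:
                try:
                    return future.result()
                except Exception as exc:  # remember the failure, try the others
                    last_error = exc
        if last_error is not None and not pending:
            raise last_error
        raise TimeoutError(f"no replica answered within {timeout}s")
    finally:
        # Do not block on the slow request; its result is simply ignored.
        pool.shutdown(wait=False)
```

Hedging trades a little extra load for a shorter latency tail, so the delay is usually set near a high percentile of normal latency; a value that is too small just doubles traffic. It is also only safe as written for idempotent reads.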

As to those hiccups, unless they last for minutes or hours, in which case you might have a case of data corruption (BT is paranoid and rereads data right after any kind of compaction), most of the time they might be explained by, in approximately increasing order of badness:

- an orderly tablet server restart, e.g. for a binary update or because a Borg machine is undergoing a kernel update

- a tablet server crash: a software crash or a hardware one (this is bad, because there's a timeout that needs to be hit before a new server can take over the shard. The BT paper has details about the recovery protocol.)

- heavy load on the master, while either of the previous two is happening

- I don't think any of the various types of compactions would normally block reads/writes, but with some abnormal traffic patterns you might be able to make the tablet server suffer

- slowness at the lower layer, GFS/Colossus (although BT mitigates this a bit by having two separate log files into which it can write)

- Chubby outage

- power outage affecting a good chunk of or the entire cluster

abalone 7 years ago |

Worth noting his original reason for moving away from DynamoDB is outdated. DynamoDB added an “adaptive capacity” feature to handle hot partitions.[1]

[1] https://aws.amazon.com/blogs/database/how-amazon-dynamodb-ad...

ses1984 7 years ago | |

It's still expensive though, right?

abalone 7 years ago | | |

Compared to what? You'll have to be more specific.

Adaptive capacity definitely targets the primary reason people had to overprovision DynamoDB. It changes the entire calculation and obsoletes all the advice you might have heard based on experiences prior to late 2017.

jonomacd 7 years ago | |

Yes, I have heard this! Great news they have sorted that out. Might be worth us looking at it again, though we are hosted in GCP now, so it is less tempting...

hcnews 7 years ago |

"Unfortunately, we do multiple operations on Bigtable in one request to our api and we rely on strong consistency between those operations."

I feel like "strong consistency" is misused here. Strong consistency is relevant only in a distributed environment. It's usually solved by using Paxos/Raft between the replicas. Bigtable has only had best-effort replication, so I am not sure why it's being mentioned here. I think they are looking for the term serial: that their queries have to be executed in a specific order for a particular user request.

jonomacd 7 years ago | |

Bigtable is a distributed database. It does best-effort replication across regions, hence eventual consistency. But within a region it is still distributed across nodes and provides strong consistency.

draw_down 7 years ago |

I really, really, really hate unexplained problems like the one described here. Not in storage but any facet of computing. It's true that the systems we build and work on are complex, but they are also ultimately deterministic, and there is a reason why something goes wrong like TFA describes. Ideally we would seek to understand our systems before continuing to add features to them, but of course the real world often doesn't work that way.

This would be a super frustrating situation for me, particularly when you're not given the tools you need to diagnose in the first place, and you loop in support but they still can't help you identify what's wrong.

Years ago, I worked on a .NET system that sometimes would respond super slowly and we didn't have a concrete explanation for why. As in TFA, we developed a kind of religion about it. "Oh, it must be JITting", that sort of stuff.