Failure Friday: How We Ensure PagerDuty is Always Reliable (blog.pagerduty.com)
The Simian Army isn't AWS-only. :) Some of it runs on other stacks.
And the best part is, it's open source! So if you wanted to leverage the Simian Army, it wouldn't be that hard to modify it to run on whatever stack you want and then submit the changes back. :)
The other thing we like is the integration with HipChat to deliver alerts into our NOC chat room.
Overall we've been quite impressed... We'll be even more impressed if you folks run into actual trouble and we still get our alerts :)
We do occasionally post about trouble that we've survived. This one, http://blog.pagerduty.com/2012/07/a-utc-leap-second-vs-derec... caused us some mild stress but no missed alerts.
I like that tip on how to simulate a slow network too.
Great post!