Ask HN: Common Patterns for Resiliency in Distributed Systems

2 points by rshetty 8 years ago | 2 comments

I am collecting ideas on building resiliency into distributed systems at scale. I previously wrote an article on this: https://blog.gojekengineering.com/resiliency-in-distributed-systems-efd30f74baf4

The article covers:

1) Timeouts

2) Retries

3) Circuit breakers

4) Fallbacks

5) Resiliency Testing
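
As a rough illustration of how patterns 1 to 4 can compose, here is a minimal Python sketch. It is not code from the article; `CircuitBreaker`, `call_with_retries`, and `with_fallback` are hypothetical names, the thresholds are arbitrary, and timeouts are assumed to be enforced by the wrapped call itself (for example, a client-side request timeout that raises on expiry).

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are rejected immediately."""


class CircuitBreaker:
    """Hypothetical minimal breaker: opens after `max_failures` consecutive
    failures and allows a trial call once `reset_after` seconds have passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, failing fast")
            # Otherwise the breaker is half-open: let one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_with_retries(fn, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry `fn` up to `attempts` times with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # retrying against an open circuit is pointless
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


def with_fallback(fn, fallback):
    """Return fn(), or the fallback value if fn() ultimately fails."""
    try:
        return fn()
    except Exception:
        return fallback
```

A caller would nest these, e.g. `with_fallback(lambda: call_with_retries(lambda: breaker.call(fetch)), cached_value)`, where `fetch` and `cached_value` are placeholders. The retry helper deliberately stops as soon as the breaker opens, so retries do not pile more load onto a dependency that is already failing.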

More patterns I can think of include:

6) Rate limiting and Throttling

7) Bulkheading

8) Queuing to decouple tasks from consumers

9) Monitoring/alerting (Observability?)

10) Redundancies
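
To make patterns 6 and 7 concrete, here is a minimal Python sketch of a token-bucket rate limiter and a semaphore-based bulkhead. Both classes (`TokenBucket`, `Bulkhead`) and their parameters are hypothetical illustrations, not taken from the article, and a production version would need to consider distributed state, fairness, and metrics.

```python
import threading
import time


class TokenBucket:
    """Hypothetical token-bucket limiter: refills `rate` tokens per second,
    holding at most `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()
        self.lock = threading.Lock()

    def allow(self):
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


class Bulkhead:
    """Hypothetical bulkhead: at most `max_concurrent` calls in flight;
    extra callers are rejected immediately instead of queuing."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("bulkhead full")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()
```

Giving each downstream dependency its own `Bulkhead` means a slow dependency can exhaust only its own slots, while the `TokenBucket` caps the request rate and still allows short bursts up to `capacity`.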

Please let me know your experiences with these resiliency patterns. Also, please feel free to pitch in with any other patterns you have encountered that were of immense help.

Thanks for your time :)

anildigital 8 years ago

I have worked with Elixir on a small web app and am exploring Akka these days. I am curious how you achieve the resiliency patterns above compared with what something like Akka offers. Fundamentally these are concepts, but I doubt these concerns can be handled properly without the actor model. I haven't read many comparisons between Kubernetes and fault-tolerance abstractions such as the supervisors that Akka provides. Hopefully you will cover those.

rshetty 8 years ago

It is totally possible to implement these patterns without Akka. The blog post highlights how we implemented some of these patterns.