Ask HN: Common Patterns for Resiliency in Distributed Systems

I am collating ideas on building resiliency into distributed systems at scale. I previously wrote an article on this here: https://blog.gojekengineering.com/resiliency-in-distributed-systems-efd30f74baf4

That article covers:

1) Timeouts
2) Retries
3) Circuit breakers
4) Fallbacks
5) Resiliency testing

More patterns I can think of include:

6) Rate limiting and throttling
7) Bulkheading
8) Queuing to decouple tasks from consumers
9) Monitoring/alerting (observability?)
10) Redundancies

Please share your experiences with these resiliency patterns. Also, please feel free to pitch in any other patterns you have encountered that were of immense help.

Thanks for your time :)
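As a rough illustration of how timeouts (1) and retries (2) fit together, here is a minimal Python sketch. It assumes the wrapped operation enforces its own timeout (for example via a socket or HTTP client timeout) and only retries errors that are plausibly transient. The name `call_with_retries` and its parameters are hypothetical, and the backoff uses exponential growth with "full jitter" so that many clients do not retry in lockstep.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

def call_with_retries(
    op: Callable[[float], T],
    attempts: int = 3,
    timeout_s: float = 1.0,
    base_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(timeout_s) up to `attempts` times, backing off between failures.

    Hypothetical helper: op is assumed to enforce timeout_s itself.
    """
    for attempt in range(attempts):
        try:
            return op(timeout_s)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last transient error
            # Exponential backoff capped at max_delay_s, with full jitter.
            delay = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise ValueError("attempts must be >= 1")

# Usage: an operation that fails twice with a transient error, then succeeds.
calls = []

def flaky(timeout_s: float) -> str:
    calls.append(timeout_s)
    if len(calls) < 3:
        raise ConnectionError("transient")
    return "ok"

print(call_with_retries(flaky, attempts=3, sleep=lambda s: None))  # prints "ok"
```

Errors outside `retry_on` (e.g. a validation error) propagate immediately, since retrying them would only add load. Injecting `sleep` keeps the sketch testable without real delays.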
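To make circuit breakers (3) and fallbacks (4) concrete, here is a minimal single-threaded Python sketch of the usual closed, open, and half-open states. The class and parameter names (`CircuitBreaker`, `failure_threshold`, `reset_timeout_s`) are hypothetical and not from any particular library. It assumes a consecutive-failure threshold and no locking, so it is not safe to share across threads as written.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is short-circuited."""

class CircuitBreaker:
    """Minimal closed -> open -> half-open breaker (a sketch, not production code)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0                      # consecutive failures seen
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"  # cool-down elapsed: let a trial call through
        return "open"

    def call(self, op: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        if self.state == "open":
            if fallback is not None:
                return fallback()  # fail fast with a degraded answer
            raise CircuitOpenError("circuit open; call short-circuited")
        try:
            result = op()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result

# Usage with a fake clock so the state transitions are deterministic.
now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=10.0, clock=lambda: now[0])

def down() -> str:
    raise ConnectionError("dependency down")

for _ in range(2):
    try:
        breaker.call(down)
    except ConnectionError:
        pass
print(breaker.state)                                  # prints "open"
print(breaker.call(down, fallback=lambda: "cached"))  # prints "cached"
now[0] = 10.0
print(breaker.state)                                  # prints "half-open"
print(breaker.call(lambda: "fresh"))                  # prints "fresh"
print(breaker.state)                                  # prints "closed"
```

While the circuit is open, the failing dependency gets time to recover instead of absorbing more traffic, and callers get either the fallback or a fast `CircuitOpenError` rather than waiting on a timeout.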