A summary of how not to measure latency

A summary of how not to measure latency (bravenewgeek.com)

36 points by juanrossi 10 years ago | 3 comments

cortesoft 10 years ago

Some of this is good, but the idea that you can determine your likelihood of experiencing a 99th percentile latency on a webpage by the naive probability calculation shown (1 - 0.99^n, where n is the number of objects requested on a page) is silly. That assumes latency is independently and randomly distributed across all objects and all clients of a page.

That is simply not true. Latency depends heavily on the client making the request and the object being requested. You are going to get clustering, not an even distribution.
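To put rough numbers on that, here is a minimal Python sketch comparing the naive formula with one clustered scenario. The clustered model and its parameters (`slow_client_share`, `p_slow_given_slow_client`) are hypothetical, chosen only so that the overall per-request slow rate is still 1%; they are not taken from the article.

```python
def p_at_least_one_slow(n, p_fast=0.99):
    """Naive model: every request independently has a 1% chance of being slow."""
    return 1 - p_fast ** n


def p_slow_page_clustered(n, slow_client_share=0.05, p_slow_given_slow_client=0.2):
    """Hypothetical clustered model: only a small share of clients ever see
    slow requests, but for those clients each request is slow 20% of the time.
    Overall per-request slow rate is still 0.05 * 0.2 = 1%."""
    return slow_client_share * (1 - (1 - p_slow_given_slow_client) ** n)


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        print(n, round(p_at_least_one_slow(n), 3), round(p_slow_page_clustered(n), 3))
```

With the same 1% per-request rate, the naive model says most 100-object page loads hit a slow request, while the clustered model caps the affected page loads at the 5% share of slow clients. Which is closer to reality depends entirely on how your latency actually clusters.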

JaRail 10 years ago

+1. The majority of those requests on major websites don't matter. They are optional elements/scripts and prefetches for possible actions.

Now, if the point is that something will be delayed, that's true. And it's true that many people don't realize that. There's the classic example where if everyone tries to be five minutes early, your group is still going to be late.

The real lesson is to analyse your critical path to death and ensure it is as resilient as possible. And where you can get it, real data is far more meaningful than conventional load tests.

I also don't see anything in here about how to get meaningful metrics from real users. The W3C Navigation Timing API has really shed a ton of light on things that are commonly forgotten.

jonaf 10 years ago

Doesn't this assume a single-threaded application? The example of a clerk's service time and people waiting in line is oversimplified. Modern systems have maybe 100 clerks per store, and many stores; how do you perform a "Ctrl+Z" test in this case? Even if you had a perfectly divided line of people waiting at each cashier in each store (machine), the worst case would be experienced by the people in line for the store or clerk with a degraded service time. Thus, for accuracy, you would need to measure queue depth at the maximum latency per thread (clerk) and add that latency to each subsequent request until you have served the number of requests in your queue. This kind of math requires constant sampling that would slow down any system so dramatically it would defeat the purpose. I think this becomes even more clear when you consider that most such systems have load balancing strategies that further mitigate queue depths such that they are intentionally distributed based on which backend services have the lowest historical latencies (and yes, I realize these algorithms are likely plagued by the same "omission conspiracy" mentioned -- but they certainly don't uniformly distribute requests).
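For a single clerk, the "add that latency to each subsequent request" idea can be sketched in a few lines of Python. This is an illustration under made-up assumptions (one FIFO worker, a request every 10 ms, 5 ms normal service time, a single 100 ms stall); the function name `fifo_latencies` and all numbers are hypothetical.

```python
def fifo_latencies(arrivals, service_times):
    """Latency (queue wait + service) for each request at one FIFO clerk."""
    latencies = []
    free_at = 0.0  # time at which the clerk finishes the previous request
    for arrive, service in zip(arrivals, service_times):
        start = max(arrive, free_at)  # wait if the clerk is still busy
        free_at = start + service
        latencies.append(free_at - arrive)
    return latencies


# Hypothetical workload: one request every 10 ms, 5 ms service time,
# except request 3, which stalls the clerk for 100 ms.
arrivals = [i * 10.0 for i in range(30)]
service = [5.0] * 30
service[3] = 100.0

lat = fifo_latencies(arrivals, service)
affected = sum(1 for x in lat if x > 5.0)

if __name__ == "__main__":
    print(lat[:8])
    print("requests slower than normal:", affected)
```

One stalled request inflates the latency of every request queued behind it until the backlog drains, which is the effect a measurement that only records the stalled request would miss. Extending this to many clerks and a non-uniform load balancer is where it gets hard.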

In summary, let's focus on the max latency, home in on which backend exhibited said latency, identify the depth of the queue at the time that latency was experienced, and use that information to model the impact on users. From this, I expect you can draw some meaningful percentiles from the latency distribution, without having to measure more data points than is feasible and without degrading latency further in the process.
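A rough Python sketch of that proposal, under loud assumptions: the latencies of the queued requests are modeled as a linear ramp from the observed maximum down to the normal latency, and percentiles use the nearest-rank method. The helper names and the numbers (a 1000-request window, 19 queued requests, 100 ms max, 5 ms normal) are hypothetical.

```python
import math


def modeled_latencies(max_latency, queue_depth, normal_latency):
    """Assumed model: latency ramps down linearly from max_latency toward
    normal_latency across the requests that were queued behind the stall."""
    if queue_depth <= 1:
        return [max_latency]
    step = (max_latency - normal_latency) / queue_depth
    return [max_latency - i * step for i in range(queue_depth)]


def percentile(samples, pct):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical window: 981 requests at the normal 5 ms, plus 19 requests
# modeled from one observed 100 ms maximum and a queue depth of 19.
window = [5.0] * 981 + modeled_latencies(100.0, 19, 5.0)

if __name__ == "__main__":
    for pct in (50, 99, 99.9, 100):
        print(pct, percentile(window, pct))
```

The linear ramp is the weakest part: it only holds if arrivals and service times are steady while the queue drains, so treat the resulting percentiles as an estimate of impact, not a measurement.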

Am I misunderstanding something? I'm no math whiz, this is mostly intuition.