50 Shades of System Calls (sysdig.com)
One thing I find useful sometimes for debugging purposes is to actually see the contents of each system call. Do I get to see that when I click on individual boxes here?
Yes, when you drill down using the mouse you will get the relevant system calls, including the buffers if they are I/O reads and writes.
Also, sysdig and csysdig have pretty advanced system call capture and filtering functionality. See this link for an introduction: https://github.com/draios/sysdig/wiki/Sysdig%20User%20Guide
The latency is the time between stimulation and response, not the overall duration.
In this case the latency of a syscall would be the time between a user program performing a syscall and it starting to operate, not the entire time taken.
Since the app doesn't actually execute the open code itself, from the app's point of view it is latency: the time from stimulation (the call) to response (the return) is the latency of file access. For the system code, latency would be the wait for the disk controller. For the disk controller, latency would be the wait for the heads/platters.
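To make the app's point of view concrete, here is a minimal sketch that times the interval between calling and returning for an open and a read. It assumes Python's `os.open` and `os.read` are thin wrappers over the corresponding syscalls; the helper name `timed_open_read` is made up for illustration, and the numbers include interpreter overhead, so they are only an upper bound on what a tracer like sysdig would report.

```python
import os
import tempfile
import time


def timed_open_read(path, nbytes=4096):
    """Return (open_ns, read_ns): time from call to return, as the app sees it."""
    t0 = time.perf_counter_ns()
    fd = os.open(path, os.O_RDONLY)   # wraps the open syscall
    t1 = time.perf_counter_ns()
    try:
        os.read(fd, nbytes)           # wraps the read syscall
        t2 = time.perf_counter_ns()
    finally:
        os.close(fd)
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    # Create a small scratch file so the example is self-contained.
    fd, path = tempfile.mkstemp()
    os.write(fd, b"x" * 4096)
    os.close(fd)
    try:
        open_ns, read_ns = timed_open_read(path)
        print(f"open: {open_ns} ns, read: {read_ns} ns")
    finally:
        os.unlink(path)
```

Whether you call those two numbers latency or duration is exactly the disagreement in this thread; the measurement itself is the same either way.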
Sadly, misuse of the term is rife in software circles.
Latency is correctly used when talking about how long an ISR takes to start running when provoked by an external stimulus, or how long it takes for a task to be scheduled when made ready.
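A rough user-space proxy for the "how long until a ready task is scheduled" sense of the word is to ask for a short sleep and measure how much later than requested the process actually wakes up. This is only a sketch: the function name `wakeup_overshoot` is hypothetical, and the result mixes timer granularity with scheduler delay, so it is not a precise measurement of ISR or scheduling latency.

```python
import time


def wakeup_overshoot(requested_s=0.01, samples=20):
    """Return per-sample overshoot in seconds: actual sleep minus requested sleep."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_s)                    # task becomes ready after requested_s
        elapsed = time.perf_counter() - start      # when it actually ran again
        overshoots.append(elapsed - requested_s)
    return overshoots


if __name__ == "__main__":
    values = wakeup_overshoot()
    print(f"max overshoot: {max(values) * 1e6:.0f} us")
```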
Latency is a very precise term. A syscall, as its name suggests, is a 'call', and 'calls' are not considered to be instantaneous.
(Edit: instantaneous, not atomic)
Duration is the correct term for syscalls.
I also sometimes feel a bit ... challenged ... by translating the questions I have (e.g. why did this arbitrary program start using a lot of memory and then OOM) into actual sysdig chisel invocations, but I'm learning slowly but surely. This command line spectrogram looks like a really nice addition to the existing toolset!
I wonder if the visual representation should somehow emphasize the slow calls more. The example on the page immediately draws the attention to that cluster of many fast calls, when the interesting part for optimizations is likely in the 100ms and slower range.
A lot of syscalls should be considered atomic with regard to resources. open() has at least two options this applies to. What kind of atomic do you mean?
Was intending to say that calls are never considered to be instantaneous from either user or sys point of view, they always have duration.
Latency on the other hand is almost always about scheduling, whether OS-level scheduling or hardware (ISR) scheduling.
Another case: network latency is the time between a packet being sent and it being received, not the time taken to process that packet. That is a direct analogy to what we're talking about here.
These are all both durations in one context and latencies in another. For the app, syscalls are "how long did I have to wait for that call to give me a result", so a latency, and a duration as well (just because it's a period of time).
This indicates that 'web page' latency has nothing to do with redraw duration, and is more akin to network latency, but for the page as a whole. In comparison to redraw duration (which is entirely client-side), web-page latency is the time to transit the network, i.e. latency.
In all the above cases, 'latency' is the delay between the stimulus and response, not the processing time.
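The distinction can be sketched with a toy worker: the time a request waits before anything starts happening is recorded separately from the time spent processing it. Everything here is illustrative; the function name `measure` and the `work_s` and `busy_s` parameters are assumptions, and the sleeps stand in for a busy server and for real work.

```python
import queue
import threading
import time


def measure(work_s=0.02, busy_s=0.05):
    """Return (latency_s, processing_s) for one request sent to a busy worker."""
    requests = queue.Queue()
    result = {}

    def worker():
        time.sleep(busy_s)                  # worker is busy, so the request waits
        sent_at = requests.get()            # timestamp taken at the stimulus
        started = time.perf_counter()       # the response begins here
        time.sleep(work_s)                  # stand-in for real processing
        finished = time.perf_counter()
        result["latency"] = started - sent_at
        result["processing"] = finished - started

    t = threading.Thread(target=worker)
    t.start()
    requests.put(time.perf_counter())       # stimulus: the request is sent
    t.join()
    return result["latency"], result["processing"]


if __name__ == "__main__":
    latency_s, processing_s = measure()
    print(f"latency: {latency_s * 1e3:.1f} ms, processing: {processing_s * 1e3:.1f} ms")
```

From the caller's side only the sum of the two is visible, which is why the app-level view earlier in the thread treats the whole call as one wait.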
I'll leave you with the top Google result for 'latency':
"Latency is a time interval between the stimulation and response, or, from a more general point of view, as a time delay between the cause and the effect of some physical change in the system being observed."