50 Shades of System Calls

carlsborg 10 years ago |

Brilliant. (I note that the founder of this product co-authored the WinPcap packet capture library on Windows back in the day, and therefore made Ethereal/Wireshark on Windows possible.)

truncate 10 years ago |

Nice work! I sometimes use strace/ltrace, and this program would certainly be a nice addition to my toolbox!

One thing I find useful sometimes for debugging purposes is to actually see the contents of each system call. Do I get to see that when I click on individual boxes here?

degio 10 years ago | |

Blog post author here.

Yes, when you drill down using the mouse you will get the relevant system calls, including the buffers if they are I/O reads and writes.

Also, sysdig and csysdig have pretty advanced system call capture and filtering functionality. See this link for an introduction: https://github.com/draios/sysdig/wiki/Sysdig%20User%20Guide

truncate 10 years ago | | |

Thanks for answering. Small suggestion - it's kind of hard to find the GitHub link. Perhaps make it more visible.

raldu 10 years ago |

It is fun to observe that everyone might be busy tinkering with the mentioned features instead of commenting.

TickleSteve 10 years ago |

This guy talks of 'latencies' when he really means 'durations'.

The latency is the time between stimulation and response, not the overall duration.

In this case the latency of a syscall would be the time between a user program performing a syscall and it starting to operate, not the entire time taken.

viraptor 10 years ago | |

The way people talk about syscalls depends on your context/point of view. For example doing `open()` for the system is duration, because it does the work of opening a file. For the app, it may as well be latency - how long you're stopped before the file is opened.

Since the app doesn't actually execute the open code, it is latency -> from stimulation (called), to response (returned) is the latency of file access. For the system code latency would be waiting for the disk controller. For the disk controller latency would be waiting for the heads/platters.
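A minimal Python sketch of that app-side view, assuming a POSIX system: it measures the wall-clock gap between calling `open()` and getting the file descriptor back. The helper name `timed_open` is made up for illustration, and the number it reports includes interpreter overhead, so treat it as a rough figure only.

```python
import os
import tempfile
import time


def timed_open(path):
    """Return (fd, elapsed_ns) for one os.open() call as the caller sees it."""
    start = time.perf_counter_ns()
    fd = os.open(path, os.O_RDONLY)  # the app is blocked here until the kernel returns
    elapsed_ns = time.perf_counter_ns() - start
    return fd, elapsed_ns


if __name__ == "__main__":
    # Use a temporary file so the example does not depend on any fixed path.
    with tempfile.NamedTemporaryFile() as tmp:
        fd, elapsed_ns = timed_open(tmp.name)
        os.close(fd)
        print(f"open() returned after {elapsed_ns} ns")
```

Whether you call that number a latency or a duration, it is the same interval; only the vantage point changes.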

TickleSteve 10 years ago | | |

I would disagree with that interpretation. I would talk about the duration of the open() call, in the same way as when you're optimising code, you talk about the duration of a function call, not its latency.

Sadly, misuse of the term is rife in software circles.

Latency is correctly used when talking about how long an ISR takes to start running when provoked by an external stimulus, or how long it takes for a task to be scheduled when made ready.
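A rough user-space illustration of the scheduling sense of the word, as a sketch only: ask to sleep for a fixed time and record how much later than requested the process actually resumes. The function name `wakeup_overshoot_ns` is hypothetical, and the overshoot mixes timer granularity with scheduler delay, so it is a proxy rather than a true measurement of either.

```python
import time


def wakeup_overshoot_ns(requested_s=0.001, samples=50):
    """Return, per sample, how many ns past the requested sleep we woke up."""
    requested_ns = int(requested_s * 1e9)
    overshoots = []
    for _ in range(samples):
        start = time.monotonic_ns()
        time.sleep(requested_s)  # task becomes ready again once the timer fires
        elapsed_ns = time.monotonic_ns() - start
        overshoots.append(elapsed_ns - requested_ns)
    return overshoots


if __name__ == "__main__":
    results = wakeup_overshoot_ns()
    print(f"median overshoot: {sorted(results)[len(results) // 2]} ns")
```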

Latency is a very precise term. A syscall, as its name suggests, is a 'call', and 'calls' are not considered to be instantaneous.

(Edit: instantaneous, not atomic)

Duration is the correct term for syscalls.

viraptor 10 years ago | | |

> and 'calls' are not considered to be atomic in any way.

A lot of syscalls should be considered atomic with regard to resources. open() has at least two options this applies to. What kind of atomic do you mean?

TickleSteve 10 years ago | | |

Sorry, didn't mean atomic.

Was intending to say that calls are never considered to be instantaneous from either user or sys point of view, they always have duration.

Latency on the other hand is almost always about scheduling, whether OS-level scheduling or hardware (ISR) scheduling.

Another case: network latency is the time between a packet being sent and it being received, not the time taken to process that packet. That is a direct analogy to what we're talking about here.

viraptor 10 years ago | | |

I still think you need to specify what latency we are talking about. Scheduling latency is about scheduling. Network interface latency is about putting data in the buffer and then on the wire. Network latency is about actually delivering the packet. Web page latency is about time-to-render. The first few pages of Google results about various types of latency always qualify it with some other word, so it's not clear anymore what people mean if they just say "latency".

These are all both durations in one context and latencies in another. For the app, a syscall is "how long did I have to wait for that call to give me a result", so it's a latency, and a duration as well (simply because it's a period of time).

jolynch 10 years ago |

Sysdig makes me giddy like dtrace used to.

I also sometimes feel a bit ... challenged ... by translating the questions I have (e.g. why did this arbitrary program start using a lot of memory and then OOM) into actual sysdig chisel invocations, but I'm learning slowly but surely. This command line spectrogram looks like a really nice addition to the existing toolset!

perlgeek 10 years ago |

Wow, that looks really impressive.

I wonder if the visual representation should somehow emphasize the slow calls more. The example on the page immediately draws the attention to that cluster of many fast calls, when the interesting part for optimizations is likely in the 100ms and slower range.
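One way to picture that suggestion, as a sketch and not a description of how sysdig renders its chart: group call durations into power-of-ten buckets and compare the call count per bucket with the total time spent per bucket. The function names and the sample numbers below are invented for illustration.

```python
from collections import Counter


def decade_bucket(duration_ns):
    """Map a duration in ns to the lower bound of its power-of-ten bucket."""
    digits = len(str(max(int(duration_ns), 1)))
    return 10 ** (digits - 1)


def summarize(durations_ns):
    """Return (call count per bucket, total ns spent per bucket)."""
    counts = Counter()
    total_time = Counter()
    for d in durations_ns:
        bucket = decade_bucket(d)
        counts[bucket] += 1
        total_time[bucket] += d
    return counts, total_time


if __name__ == "__main__":
    # Made-up sample: many fast calls (2 us each) and a few slow ones (150 ms each).
    sample = [2_000] * 1000 + [150_000_000] * 3
    counts, total_time = summarize(sample)
    for bucket in sorted(counts):
        print(f">= {bucket} ns: {counts[bucket]} calls, {total_time[bucket]} ns total")
```

Colouring by count highlights the fast cluster; colouring by total time would highlight the three slow calls, which account for most of the time in this made-up sample.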

SEJeff 10 years ago |

Sysdig is absolutely incredible software, but when are you going to work on getting it upstream in the Linux kernel? That will massively lower the barrier to entry and make sysdig "win" so to speak.