eBPF Is Awesome(filipnikolovski.com) |
I can't get my head around using it that way, and have found it pretty straightforward to just write C programs, compiled with clang `-target bpf`. Until very recently, writing anything interesting this way required you to declare all functions inline, compile into a single ELF .o, and, of course, avoid most loops. But most of the kinds of things you'd write in BPF tend not to be especially loopy (you can factor most algorithmic code out into userland, communicating with BPF using maps).
A big issue for this kind of development is kernel compat; struct layouts can change from release to release, for instance. This isn't a problem for us at Fly, because we just run the same kernel everywhere, but it's a real problem if you're trying to ship a tool for other people's systems. But that's changing with CO-RE; recent kernels can export a simplified symbol table in a BPF-legible format called BTF, and the loader can perform relocations. Facebook has written a bunch of good stuff about this:
https://facebookmicrosites.github.io/bpf/blog/2020/02/20/bcc...
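To make the relocation idea concrete, here is a minimal userland sketch in plain C. It is not libbpf and not real BTF: `task_v1`, `task_v2`, `field_info`, `reloc`, `relocate` and `read_i32` are all hypothetical names made up for illustration. The point is only that the program records which field it wants by name, and a loader step patches in the offset for whatever layout the running kernel actually has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two hypothetical layouts of the "same" struct in two kernel releases. */
struct task_v1 { int32_t pid; int32_t prio; };
struct task_v2 { int64_t flags; int32_t prio; int32_t pid; }; /* pid moved */

/* Stand-in for the type info a kernel would export: field name -> offset. */
struct field_info { const char *name; size_t offset; };

/* One relocation record: the field the program reads, plus the offset
 * slot that the loader fills in at load time. */
struct reloc { const char *field; size_t offset; };

/* "Loader" step: look the field up in the target's type info and patch
 * the offset. Returns 0 on success, -1 if the field does not exist. */
int relocate(struct reloc *r, const struct field_info *info, size_t n)
{
    assert(r != NULL && info != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(info[i].name, r->field) == 0) {
            r->offset = info[i].offset;
            return 0;
        }
    }
    return -1;
}

/* "Program" step: read a 32-bit field through the patched offset
 * instead of a compile-time struct member access. */
int32_t read_i32(const void *obj, const struct reloc *r)
{
    int32_t v;
    memcpy(&v, (const char *)obj + r->offset, sizeof v);
    return v;
}
```

Usage would be: build a `field_info` table per layout with `offsetof`, call `relocate` once against the table for the layout you are running on, then call `read_i32` on objects of that layout. The same `reloc` for `"pid"` then works against both `task_v1` and `task_v2`, and a missing field fails at load time rather than reading garbage.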
It seems somewhat unavoidable, if the goal is to introspect the kernel at a very intimate level ...
https://github.com/weaveworks/tcptracer-bpf/blob/cd53e7c84ba...
This was done (by Kinvolk) for the visualisation tool Weave Scope; also picked up by DataDog https://github.com/DataDog/datadog-process-agent/tree/master...
I'm not sure of the state of them at this point, but it's the same paradigm GP mentioned.
The documentation is also pretty dire, but it's mostly implement-once remember-forever in my experience - it's all there but kernel samples are quite hard to read, and I'd rather not guess based on struct listings (e.g. variable length structs aren't particularly fun when you're fumbling around)
JIT compilation means compiling code on the fly, right when you're about to execute it. When the code is part of a larger program, JITting lets you compile only the parts you actually need and avoid wasting time compiling the rest. It also lets you compile the relevant code paths based on dynamic flow analysis, which often involves interpreting your program the first time you run it and emitting instructions for the next time around (tracing).
If the code unit is small and you know you're going to run all of it, you can compile it in one swoop. If you compile it when you load it into the kernel, as opposed to compiling it lazily right before you run it the first time, I think it's fair to call this ahead-of-time compilation, even if the compilation happens right next to the use site and not as part of the developer toolchain.
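A toy sketch of the distinction, in plain C. Nothing here is the kernel's BPF JIT: the two-opcode bytecode, `struct prog`, `prog_load` and `prog_run` are hypothetical, and "translation" just means turning opcodes into function pointers as a stand-in for emitting native code. What it shows is where the translation cost lands: at load time, or on the first run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy bytecode: each instruction applies an op with an immediate. */
enum op { OP_ADD, OP_MUL };
struct insn { enum op op; long imm; };

typedef long (*step_fn)(long acc, long imm);
static long do_add(long acc, long imm) { return acc + imm; }
static long do_mul(long acc, long imm) { return acc * imm; }

#define MAX_INSNS 16

struct prog {
    const struct insn *code;
    size_t len;
    step_fn steps[MAX_INSNS]; /* translated form */
    bool translated;
    int translations;         /* how many times translation ran */
};

/* Translation pass: a stand-in for generating native code. */
static void translate(struct prog *p)
{
    for (size_t i = 0; i < p->len; i++)
        p->steps[i] = (p->code[i].op == OP_ADD) ? do_add : do_mul;
    p->translated = true;
    p->translations++;
}

/* eager == true: translate now, at load time.
 * eager == false: defer translation until the first run. */
void prog_load(struct prog *p, const struct insn *code, size_t len, bool eager)
{
    assert(len <= MAX_INSNS);
    p->code = code;
    p->len = len;
    p->translated = false;
    p->translations = 0;
    if (eager)
        translate(p);
}

long prog_run(struct prog *p, long input)
{
    if (!p->translated)
        translate(p); /* the lazy variant pays the cost here, once */
    long acc = input;
    for (size_t i = 0; i < p->len; i++)
        acc = p->steps[i](acc, p->code[i].imm);
    return acc;
}
```

Both variants translate exactly once and produce the same results; the only difference is whether the first caller of `prog_run` or the caller of `prog_load` eats the translation time. Whether you call the eager variant "JIT" or "AOT" is the naming question being argued here.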
https://blog.habets.se/2020/11/BPF-the-future-of-configs.htm...
Isn't it actually running a user program in kernel space?
(I've asked this before, but haven't gotten any response, and no clear answer from Google/DDG either).
Can eBPF be used for observability on platforms like Java or .NET Core, or do their platform VMs obfuscate too much, so that monitoring them with eBPF is not feasible?
How does eBPF work wrt OpenTelemetry etc.? Should OpenTelemetry be seen as standardized interfaces to which eBPF reports data?
OpenTelemetry is just a reference API. You could export metrics using eBPF as well. I'm pretty sure Sysdig does this for example.
http://www.brendangregg.com/blog/2014-06-12/java-flame-graph...
Pseudo? This is a nit, but isn't it actually regular C?
Your toolbox can be used to fix things, but eBPF is a factory for making new types of tools and toolboxes.
eBPF can be used to make small programs that run at tracing points, thus making dtrace. But it can also be used to make packet filter decisions (thus altering what happens), and with at least one network card that eBPF program can be pushed to the network card and filter before the packet even hits RAM, much less the CPU!
eBPF can run at socket init time, and set some default TCP tuning parameters.
Another comment in this thread asked if one can write a whole device driver in eBPF. The answer is actually not clear.
eBPF is more similar to "the ability to load kernel modules" than it is "a tracing framework".
Sadly, I don't program in Linux, so I can't use it. :'(
[0]https://github.com/microsoft/perfview/blob/master/documentat...
My general development skill (in Linux or otherwise) has definitely improved since I became a Linux native. But that didn't happen overnight.
Imagine when someone comes up with a revolutionary new paging method, but it causes everyone's eBPF scripts to fail to load and a bunch of tools to break...
I find the Cilium libraries sort of hit-or-miss†, though they mostly work well; again, I just build my BPF programs themselves with Makefiles into .o's, and use Cilium (or, for XDP/TC, iproute2) to load them.
The time to translate the bytecodes is taken ahead of time, when the eBPF is installed, rather than the first time some other system call needs to actually execute the code.
It is not hard to imagine a system removing and installing eBPF bits dynamically according to realtime events, but it is a fair bet that in most uses they are set up at program start time and left in place.
The JIT isn't necessarily turned on so it's probably not a great idea in the first place
That's a fascinating read and an amazing idea. To your knowledge are there any other software ecosystems that track stability in nearly as formalized a way? Has there been investigation into bringing these ideas into other modern languages? (I don't believe Rust has a concept like this, for instance, though it would even further strengthen the language's concept of correctness if it did!)
Something of a funny story that this brings to mind: the taxonomy we have here is actually the interface taxonomy from what was Sun's Platform Software Architecture Review Committee (PSARC), which itself borrowed it from Sun's larger Software Development Framework (SDF). We had to get DTrace reviewed by PSARC, which we weren't necessarily looking forward to -- in part because of big developments like this one. To get past our PSARC review, we adopted several strategies, one of which was to separate out DTrace from its instrumentation providers as separate cases before the committee. When we first presented DTrace to PSARC, committee members wanted to fixate on instrumentation methodology -- and it was very helpful to be able to defer these fixations to later cases (after having let members pontificate and chew up some of the clock, of course). The other technique that we developed (which was devastatingly effective) was to distract the committee with issues that were irrelevant but amenable to debate. When a debate emerged among the committee members (and PSARC being more or less a debating society, this was practically guaranteed), we would effectively feed both sides of the debate -- and in the end, run out the clock on something we didn't care about. All of this worked exceedingly well -- and DTrace itself (one of the largest cases that had ever appeared before PSARC) was approved with essentially no modifications.
Shortly after the DTrace case was approved, we started bringing forward cases on instrumentation providers. With each case, we presented the stability matrix of that particular provider; on the first such case, I remember vividly one committee member asking: "what the hell is this and when do we review it?!" We explained that it was the stability matrix -- as explained at length in the case that they had in fact already approved. They realized in an instant that they had fixated on a dinghy of nomenclature while we had slipped behind them an ocean liner of semantics -- and it was glorious.
It's not exactly the same but this reminds me of the way Matt Stone described his interactions with the MPAA board in This Film Is Not Yet Rated (https://en.wikipedia.org/wiki/This_Film_Is_Not_Yet_Rated).
i.e. they went into the Team America rating negotiation with aggressive material they were prepared to cut, and probably wanted to cut anyway, and let the committee spend all their time on that.
See also (NSFW):
https://youtu.be/SgyG8y1vg1M?t=151
https://lettersofnote.com/2009/09/30/p-s-this-is-my-favorite...