C++ in the Linux Kernel(threatstack.com) |
That would be very weird on Linux. The x86_64 Linux ABI mandates that the first arguments go in registers, afaik (I'm assuming x86_64 here since the post mentions Linux distros, which are overwhelmingly x64). What compiler would default to a pure stack-based calling convention? Certainly not GCC or clang, no?
> and the kernel expected my code to pass arguments in the registers.
so, how is that a problem with C++ and not the compiler defaults ?
> I found the real gold mine of C++ kernel module development knowledge in OSDev.org. They have an entire article on C++ issues. That includes avoiding templates
bullshit it is then. https://www.youtube.com/watch?v=A_saS93Clgk
if templates are good (sometimes better even) on AVR microcontrollers with memory in kilobytes, there's no reason to not use them in a kernel meant to run on large embedded.
Also what's that rant about strings for ? In the end there is zero substance to this article, only very strange rants.
The System V i386 ABI passes parameters through the stack. Perhaps that is what the author is referring to, although I wouldn't be surprised if he mixed it up with the x64 ABI.
When a non-trivially-copyable object is passed by value to a function, you need to ensure that, throughout the lifetime of the copy, its address never changes, because the constructor may have stored the address of one of its fields (for instance, a pointer to a field). The way this is handled, at least in the System V x86_64 ABI, is that the object is created on the caller's stack and a pointer to it is passed in a register, just as if it had been passed by pointer.
I have seen several cases where a header had an "ifdef c++" clause with copy constructors and destructors in it ("It does not add a field, so it should be OK, right?"), which makes the object non-trivially-copyable, leading to clashing calling conventions between the C and C++ code. I am curious whether this may be the issue he encountered.
No. The C/C++ ABI is quite uniform across architectures. The first 1..N (N is ISA dependent) parameters that can fit into a CPU register are passed via registers. The first input parameter that _can't_ fit into a register (e.g. a structure passed by value) is pushed onto the stack, with every other following parameter being pushed onto the stack as well. N+1… parameters are always passed through the stack.
I don't think you should see this article as a criticism of C++. Just a rant on how hard it is to use in the Linux kernel, which is openly against it.
> article:published-time 2016-10-28T11:40:06+00:00
which is well into the era of 64-bit code.
"CppCon 2016: Jason Turner “Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17”"
https://www.youtube.com/watch?v=zBkNBP00wJE
"C++20 For The Commodore 64"
Unless the templates have been externalised (i.e. declared as «extern template …», of course). Even then, a modern compiler+linker combo will optimise most of the unused code away at link time, thus reducing the final binary size. I do understand that LTO might not be available for every embedded platform, though.
P.S. That is exactly the point of the C++ template metaprogramming – the hard lifting is delegated to the compiler, which leads to increased compile times but also to more efficient and very compact runtime code.
Edit: Unless you're doing something rather silly with the templates, but again, that's not a template problem.
No, it doesn't?!? The linked article mentions templates two times (+ 2 mentions of the standard template library), once saying that templates can be used without further setup and the other times recommending that some template based data structures should be implemented. That's pretty far from "avoiding templates".
https://elixir.bootlin.com/linux/latest/source/include/linux...
Call it device_class ffs
#define class Class
#include <some_linux_header.h>
#undef class
/s
"In the dark old days, in the time that most of you hadn't even heard of the word "Linux", the kernel was once modified to be compiled under g++. That lasted for a few revisions. People complained about the performance drop. It turned out that compiling a piece of C code with g++ would give you worse code. It shouldn't have made a difference, but it did. Been there, done that."
at least in general - if it's something that can't be handled by a C shim then you might have an issue.
Just to understand the scope of the work, did you implement any of the following: memory isolation, networking, concurrency via interleaving on single thread, parallelism where n threads can run n processes simultaneously? How long did each take to get done?
In the end, what I had was a kernel which could boot on bare metal (or a VM) and provided a command line interface. It had a virtual file system layer and ext2 file systems, process management (fork, exec sys calls), memory management (paging and process isolation) and device drivers for keyboard and hard disk. The kernel was able to fork and exec a static ELF binary.
I did not get as far as networking and threading. But that was the next step, which could have made it a complete Unix kernel.
I implemented it in bits of assembly (nasm) and C++. So I had to learn the runtime and code generation aspects of C++. Based on that learning I wrote these articles on C++ object models and other internals. https://www.avabodh.com/cxxin/cxx.html
The article is merely a rant because the author doesn't have much of an idea of how C++ actually works. Merely knowing the syntax doesn't make one a "C++ programmer" and this is even more true when you are messing around in the Kernel. The article contains no specifics only general statements making me think this was put up to just be a "hit piece".
> Merely knowing the syntax doesn't make one a "C++ programmer"
Does knowing all the possible abstract layers (uh, it's an ocean) make one a C++ programmer then?
> this is even more true when you are messing around in the Kernel
It's his right to mess around in the kernel and learn things.
The article has zero substance with a generic rant being "i tried to use C++ to write a Kernel Module and ran into problems". There are no specifics w.r.t. C++ nor The Kernel and yet the author blames the C++ Language! Whatever is written up also betrays a certain ignorance of basic C/C++ ABI conventions leading one to surmise that the author is clueless (w.r.t. these two domains). As you can see from other comments in this thread, many others are also of the same opinion while others are guessing all over the map as to what the actual problem might be.
Are people this ignorant when it comes to C/C++ or any systems language? ABI & calling conventions were introduced early in my C & C++ textbooks (age 13 btw, not even close to college years).
(Semi-random tangent: the hardest bug I ever had the pleasure of debugging was when I discovered that the PLT glue code to load an entry into the PLT was unexpectedly clobbering a register that the calling convention said needed to be preserved. Be very, very careful using non-default calling conventions across shared object boundaries!)
Small sample of the remnants of that in the DOS world. https://docs.microsoft.com/en-us/cpp/cpp/argument-passing-an...
Decoupling the compiler's optimizations and the ABI (particularly what constitutes a "move" of a struct) has derailed a few conversations I've been involved with, even with very smart devs (although mainly through interpretation rather than basic misunderstandings like thinking that something actually due to the ABI is an optimization).
I've never messed with calling convention for the sake of performance before so I found this bit interesting. I found more info about it at: https://en.wikipedia.org/wiki/X86_calling_conventions#Borlan...
Does anyone have benchmarks? Assuming I don't care about ABI stability, what's the fastest calling convention?
I'd assume a modern optimizing compiler will, in situations where it's permitted, create completely novel calling conventions depending on the situation. Whole program optimization is one area you might see this.
Whole program optimization gives the compiler some ways around that. I am not sure how much freedom it gives the compiler.
From a technical point of view, I'm not so sure. C++ still interoperates with C more easily than Rust does, if only because you can normally just include the headers and be done with it. (Although, as the article says, there is some cleanup to do.)
Rust isn't being experimented with in the kernel because someone decided we should really add a second language. C++ interop with C doesn't matter when there's no reason to use C++ in the kernel anyway.
Some people (especially game devs) are bizarrely obsessed with 32bit :-/
Here is GCC doing that optimization for a static noinline function: https://godbolt.org/z/cn6Wz9Kvn
Similarly, compilers can also clone functions if it makes sense to propagate constants from call sites into the function. Example: https://godbolt.org/z/59z6xT75n
I'm sure there is more room for improvement. A perfect compiler would always optimize programs as a whole and only regard function boundaries as hints at best. In practice, you have to keep complexity in check somehow.
https://lwn.net/ml/linux-api/20180905165436.GA25206@kroah.co...
And that's for a userspace header.
It's easy to make it compatible by not using fields called "class", or using #ifndef __cplusplus, most C library headers are actually like that. But not the Linux kernel because they refuse it.
That's why I'm saying that the choice is not a technical one.
I doubt that Linux headers give a damn about usability from C++ though.
If there is a difference (either way) I'd expect it to be something you can measure, but not something you would notice in the real world on one computer. (though at google scale it probably shows up)
PLEASE COME FROM HELL
If your Commodore 64 template is dealing with say, foozles that might be 8-byte, 12-byte or 16-byte, the complexity incurred is pretty small, bugs with foozle<16> are likely to be something mere mortals can understand and fix.
On a more complicated system like a cloud Linux setup the template may be for foozles that can be in any colorspace and on a remote machine or locally, and now sometimes the bug with foozle<HSV,remove> involves a fifty line error message because the compiler doesn't realise all that happened is you meant to write foozle<HSV,remote> ...
It's not even as if the C++ committee isn't aware that templates are a problem. Remember template meta-programming wasn't really intended from the outset, and a big part of the point of Concepts was to at last let you write code that compilers can provide readable errors for when it's wrong.
Here is the "System V i386 ABI" mentioned above: https://refspecs.linuxfoundation.org/elf/abi386-4.pdf (from https://refspecs.linuxfoundation.org/). It clearly passes all arguments on the stack, and none on registers ("Function Calling Sequence" starting on page 35). That is the ABI used on 32-bit x86 Linux if you don't specify -mregparm (which the kernel uses); since the author was calling the compiler directly (which was necessary because the kernel makefiles only have rules for building C files, not C++ files), there was a mismatch between the -mregparm used by the kernel and the default ABI used by the C++ compiler, which was fixed by also passing -mregparm to the C++ compiler.
How can it be? What about an architecture without conventional registers? And for example I work on an implementation of C/C++ that logically uses the heap for its ABI.
Especially when it comes to C (less so C++), it is a remarkably adaptable language that has been able to attune to a variety of vastly incompatible hardware architectures, including stack based ones, heap based ones and some esoteric ones as well. Yet, in the case of conventional, register based ISAs, the ABI has been remarkably similar: notwithstanding actual ISA specific register names, registers 0…N (apart from RISC ISAs where storing into/loading from register 0 is a no-op / zero constant) are used as input parameters and register 0 (where available) is used for the function return value (provided it fits); otherwise the result is returned via the stack.
Don't know if you're a non-native speaker, but no, 'quite' usually does mean 'completely'!
As far as I’ve seen, two things make C and C++ specifically problematic on 8-bitters: automatic promotion to int for all expressions, with int required to be at least 16 bits (a language problem); and subpar codegen on accumulator architectures and other things that are not like desktops (a compiler problem).
C++ adds type-safety on top of that for no cost. It's great when your compiler tells you that there is no operator =|(PORTD, PINA). Did you mean |=(PORTD,PIND) or =|(PORTA,PINA).
But usually much worse ASM than what a human would write on such CPUs, because the C compiler is still restricted by artificial high-level concepts like calling conventions, and it needs to wrestle with instruction sets that are not very compiler-friendly and tiny non-orthogonal register sets. C++ just adds a whole level of code obfuscation on top, so it's harder to tweak what code the compiler actually generates.
(non-native speaker here, quite frustrated about the quite different meanings of 'quite')
https://developer.apple.com/documentation/driverkit
https://docs.microsoft.com/en-us/cpp/build/reference/kernel-...
https://fuchsia.dev/fuchsia-src/development/languages/c-cpp/...
The constraints in /kernel like forbidding exceptions are because otherwise they (Microsoft) need to do a bunch of extra work to support your bad idea. But your use of templates has no impact on their work, so knock yourself out adding as much complexity as you like this way.
https://www.parasoft.com/blog/breaking-down-the-autosar-c14-...
But what do they state specifically? Ah, right.
> "The document allows in particular the usage of dynamic memory, exceptions, templates, inheritance and virtual functions."
https://www.autosar.org/fileadmin/user_upload/standards/adap...
Also IO Kit is no more, unless one is running an outdated macOS installation.
And talking about the past, maybe discussing the dynamic dispatch of Objective-C messages in NeXTSTEP drivers with the previous Driver Kit would also be quite interesting regarding "bloat".
https://developer.apple.com/documentation/iokit
Going forward not for long.