Linux kernel coding style

124 points by sthlm 11 years ago | 101 comments

kagia 11 years ago |

I doubt everyone agrees with that coding style; I certainly don't. However, when submitting code to a project, I'd still stick to the prescribed coding style, because I believe consistency in a code base can be just as important as any other measure of readability.

snlacks 11 years ago | |

I find it hard to agree with a lot of this, but it was obviously written by someone who has written a lot of code, thought a lot about how to write code, and read a lot of code, even if we didn't know who it was. There's a lot to learn from reading stuff like this, if you take it all with a grain of salt... or if you're contributing.

stinos 11 years ago | | |

This. It's easier to find lots of people who don't agree with a particular style than people who do (I agree with none but my own, which is far, far away from the kernel's, for instance, and much more readable of course, lol), but worse are projects where styles are mixed. Or where even tabs and spaces are mixed. Don't get me started on that one :] (edit: already happened in other comments, of course)

andmarios 11 years ago | | |

I think it was written by Torvalds and other kernel hackers. It is part of the Linux source code, under the Documentation directory.

notacoward 11 years ago |

Mostly good advice, sometimes even great, but the part about typedefs is total BS. Any non-trivial program will use values that have clearly different meanings but end up being the same C integer type. One's an index, one's a length, one's a repeat count, one's an enumerated value ("enum" was added to the language to support this very usage), and so on. It's stupid that C compilers don't distinguish between any two types that are the same width and signedness; why compound that stupidity? Both humans and static analyzers could tell the difference if you used typedefs, and avoid quite a few bugs as a result. Being able to change one type easily might also make some future maintainer's life much better. There's practically no downside except for having to look up the type in some situations (to see what printf format specifier to use), but that's a trivial problem compared to those that can result from not using a typedef.

Don't want to use typedefs? I think that's a missed opportunity, but OK. Don't use them. OTOH, anyone who tries to pretend that the bad outweighs the good, or discourage others from using them, is an ass. Such advice for the kernel is even hypocritical, when that code uses size_t and off_t and many others quite liberally.
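A minimal sketch of the kind of integer typedefs described above, using hypothetical names (`item_index`, `item_count`, `get_item`). It also shows the limitation being complained about: a plain C typedef is only an alias, so the compiler accepts swapped arguments without a word, and catching the mix-up is left to human readers and static checkers.

```c
#include <stddef.h>

/* Hypothetical typedefs that name the *meaning* of each integer. */
typedef size_t item_index;	/* position within an array */
typedef size_t item_count;	/* number of elements */

/* Returns items[idx], or -1 if idx is out of range. */
static int get_item(const int *items, item_count n, item_index idx)
{
	if (idx >= n)
		return -1;
	return items[idx];
}

static int demo(void)
{
	int items[] = { 10, 20, 30 };
	item_count n = 3;
	item_index last = 2;

	/*
	 * Both typedefs are aliases for size_t, so swapping the last two
	 * arguments still compiles cleanly: 2 is treated as the count and
	 * 3 as the index, and the lookup fails.
	 */
	return get_item(items, last, n);	/* -1: index 3 >= count 2 */
}
```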

cesarb 11 years ago | |

Most of that section is concerned with hiding structs or pointers as typedefs: "In general, a pointer, or a struct that has elements that can reasonably be directly accessed should _never_ be a typedef."

Say you are reading a function, and see a local variable declared: "something_t variable_name;". Is it a struct, a pointer, or a basic type? Now compare with "struct something * variable_name;", which is clearly a pointer. If on the other hand it is "struct something variable_name;", you know that it's a struct allocated on the very small kernel stack (less than 8KiB per thread) - something which wouldn't be as clear if the fact that it's a struct were hidden by a typedef.
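A minimal sketch of the point about declarations, using a made-up `struct something` and `something_t`. `sizeof` on each local shows how much the declaration itself asks for: the pointer version is pointer-sized, while the struct version asks for over 1 KiB, and the typedef'd version asks for just as much without saying so at the declaration.

```c
#include <stddef.h>

/* Hypothetical struct, sized to make the difference obvious. */
struct something {
	char buf[1024];
	int flags;
};

/* A typedef like this hides whether something_t is a struct or a pointer. */
typedef struct something something_t;

/* Declares only a pointer. */
static size_t stack_bytes_for_pointer(void)
{
	struct something *variable_name = NULL;

	return sizeof(variable_name);
}

/* Declares the whole struct: over 1 KiB for one local. */
static size_t stack_bytes_for_struct(void)
{
	struct something variable_name;

	return sizeof(variable_name);
}

/* Same cost as the struct version, but the declaration doesn't show it. */
static size_t stack_bytes_for_typedef(void)
{
	something_t variable_name;

	return sizeof(variable_name);
}
```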

There are three main reasons to use typedefs: to allow for changes to the underlying type; to add new information to the underlying type (which is item (c) in that section); and to hide information. Since the Linux kernel runs in a constrained environment (as I mentioned, the kernel stack is severely limited, among other things), hiding information without a good reason is frowned upon. It's the same reason they use C instead of C++; the C++ language idioms hide more information.

> Both humans and static analyzers could tell the difference if you used typedefs, and avoid quite a few bugs as a result.

The Linux kernel does that! As I mentioned, it's item (c): "when you use sparse to literally create a _new_ type for type-checking." See for instance the declaration of gfp_t:

  typedef unsigned __bitwise__ gfp_t;

The __bitwise__ is for the Sparse static checker. There are other similar typedefs, like __le32 which holds a little-endian value; the Sparse checker will warn you if used incorrectly (without converting to "native" endian).
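A userspace sketch of the same pattern, assuming sparse's `bitwise` attribute. The names `DEMO_BITWISE`, `demo_le32`, `demo_cpu_to_le32()` and `demo_le32_to_cpu()` are hypothetical stand-ins for the kernel's `__bitwise__`, `__le32`, `cpu_to_le32()` and `le32_to_cpu()`. Under sparse (which defines `__CHECKER__`), `demo_le32` becomes a distinct "restricted" type and mixing it with plain integers draws a warning; an ordinary compiler just sees `uint32_t`.

```c
#include <stdint.h>
#include <string.h>

/* Only the checker sees the attribute; normal builds get a plain typedef. */
#ifdef __CHECKER__
#define DEMO_BITWISE __attribute__((bitwise))
#else
#define DEMO_BITWISE
#endif

/* A 32-bit value whose bytes are stored little-endian in memory. */
typedef uint32_t DEMO_BITWISE demo_le32;

/* Converts a native integer to little-endian byte order (any host). */
static demo_le32 demo_cpu_to_le32(uint32_t x)
{
	unsigned char b[4] = {
		(unsigned char)(x & 0xff),
		(unsigned char)((x >> 8) & 0xff),
		(unsigned char)((x >> 16) & 0xff),
		(unsigned char)((x >> 24) & 0xff),
	};
	demo_le32 v;

	memcpy(&v, b, sizeof(v));
	return v;
}

/* Converts back by reading the bytes explicitly, so host order doesn't matter. */
static uint32_t demo_le32_to_cpu(demo_le32 v)
{
	unsigned char b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	       (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```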

notacoward 11 years ago | | |

> it's item (c): "when you use sparse to literally create a _new_ type for type-checking."

The problem is that this is presented as an exception that must be (strongly) justified. I think that using typedefs for integer types should be acceptable by default, and there should be specific rules for when to avoid them. The burden of proof is being put on the wrong side.

Even for structs, the argument for typedefs is stronger than the argument against. Even across a purely internal API, the caller often doesn't need to know whether something is an integer, a pointer to a struct, a pointer to a union, a pointer to another pointer, or whatever. Therefore they shouldn't need to know in order to write a declaration, which will become stale and need to be changed if the API ever changes. This is basic information hiding, as known since the 60s. Exposing too much reduces modularity and future flexibility. I've been working on kernels for longer than Linus, and the principle still applies there.

Again, it comes down to defaults and burden of proof. The rule should be to forego struct typedefs only if every user provably needs to know that it's a struct and what's in it (which is often a sign of a bad API). Even then, adding a typedef hardly hurts; anyone who needs to know that a "foo_t" is a "struct foo" and can't figure it out in seconds shouldn't be programming in the kernel or anywhere else.

DSMan195276 11 years ago | |

You're not understanding their idea behind typedef. IMO, their lines for usage are extremely good when adhered to.

The note about integers is worth complaining about a bit. I agree there is merit to typedef'ing integers in some situations, but the kernel standard agrees with that in those instances (and the example is bad; there are instances of 'flag' typedefs in the kernel). In general, the note about integers is just meant to discourage spamming typedefs everywhere.

More importantly than integers, though, their note about making opaque objects with a typedef is extremely good practice, as it makes it easy to tell whether or not you're expected to access the members directly.

The point of those rules is to allow typedef to be actually useful and communicate some information. If you just allow typedef'ing everything in every situation, then whether or not something is typedef'd becomes useless information to the reader.
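A minimal sketch of the opaque-typedef pattern mentioned above, with a hypothetical `counter_t` API. In a real project the two halves would live in a header and a separate .c file, so callers could only go through the accessor functions; here they share one file purely for illustration.

```c
#include <stdlib.h>

/* ---- what a public header would expose (hypothetical API) ---- */
typedef struct counter counter_t;	/* opaque: members are not visible */

counter_t *counter_new(void);
void counter_inc(counter_t *c);
int counter_get(const counter_t *c);
void counter_free(counter_t *c);

/* ---- what only the implementation file would contain ---- */
struct counter {
	int value;
};

/* Returns a zeroed counter, or NULL if allocation fails. */
counter_t *counter_new(void)
{
	return calloc(1, sizeof(counter_t));
}

void counter_inc(counter_t *c)
{
	c->value++;
}

int counter_get(const counter_t *c)
{
	return c->value;
}

void counter_free(counter_t *c)
{
	free(c);
}
```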

raverbashing 11 years ago | |

"Such advice for the kernel is even hypocritical, when that code uses size_t and off_t and many others quite liberally"

Did you even read their explanation? Apparently not.

This is an acceptable use of typedefs, as explained there, exactly because a size_t varies between architectures.

notacoward 11 years ago | | |

That's just rationalization. It's basically saying that some typedefs are OK because Linus is used to them, but he doesn't want to take the few seconds to figure out any new ones. The cases for typedefs shouldn't be treated as exceptions. The cases against them should.

dllthomas 11 years ago | |

I've had luck using single-element structs to distinguish between types of data when I'm throwing a lot of primitive types around. In my test with gcc, the generated code was identical to using the primitives directly, although the standard doesn't actually guarantee that and it's historically not been the case in some particular compilers (not sure which).
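A minimal sketch of the single-element-struct approach, with hypothetical names (`row_index`, `row_count`, `get_row`). Unlike plain typedefs, these are distinct types, so swapping the arguments is a compile-time error. Whether the generated code matches the bare-integer version is a compiler matter, as noted above, not something the C standard promises.

```c
#include <stddef.h>

/*
 * Hypothetical single-member wrapper structs: same contents as a
 * size_t, but distinct types to the compiler.
 */
typedef struct { size_t v; } row_index;
typedef struct { size_t v; } row_count;

/* Returns rows[i], or -1 if i is out of range. */
static int get_row(const int *rows, row_count n, row_index i)
{
	if (i.v >= n.v)
		return -1;
	return rows[i.v];
}

/*
 * get_row(rows, i, n), with the last two arguments swapped, is
 * rejected by the compiler because the argument types don't match.
 */
```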

bluecalm 11 years ago | |

As a recreational C programmer I have the same impression. I've always used typedefs for structs and enums in my code, and I think it makes it more readable and easier to work on. Reading the kernel style guidelines surprised me, and I'm happy I'm not the only one who disagrees.

coldpie 11 years ago | | |

As a "professional" C/C++ programmer (day job), I don't have strong feelings either way. It is frustrating not knowing what the type of a variable is. Am I being passed a pointer, or an integer, or a floating point value, or a whole struct, or what? This really matters! Digging up the definition isn't difficult, but isn't easy, either. I would lean against using typedefs liberally, but I don't feel strongly about it.

Personally I use typedefs as a shortcut. Rather than type boost::shared_ptr<const MyFavoriteClass> over and over, typedef it to ConstMyFavoriteClassPtr for convenience. Then be consistent with that paradigm through the whole project, so you only have to learn it once to know any given Ptr type.
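In C, a close analogue of that `boost::shared_ptr` shortcut is probably a typedef for a long function-pointer type. A minimal sketch with hypothetical names (`binary_op_fn`, `struct calculator_ops`); like the C++ case, the typedef is just an alias, so `apply_verbose()` and `apply()` accept exactly the same arguments.

```c
/* Without a typedef, every use spells out the whole signature. */
static int apply_verbose(int (*op)(int, int), int a, int b)
{
	return op(a, b);
}

/* Hypothetical shortcut name for "pointer to int f(int, int)". */
typedef int (*binary_op_fn)(int, int);

static int apply(binary_op_fn op, int a, int b)
{
	return op(a, b);
}

/* A struct of callbacks also reads more easily with the shortcut. */
struct calculator_ops {
	binary_op_fn add;
	binary_op_fn mul;
};

static int add_ints(int a, int b)
{
	return a + b;
}

static int mul_ints(int a, int b)
{
	return a * b;
}
```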

maguirre 11 years ago | |

I agree with you. I think typedef'd structs help readability, especially when using function pointers within structures. Would it be out of line to suggest a new naming convention for struct typedefs and pointer typedefs, i.e. _t for typedefs and _tp for typedef'd pointers?

notacoward 11 years ago | | |

While I generally think typedefs for integer/enumerated types and structs/unions are a good idea, I also don't think the arguments I've made apply as much to typedefs for pointers. The difference between an X and a pointer/reference to X is often an explicit part of the contract between modules or functions. If that contract ever changes, the declarations and usage should change in ways beyond replacement of an identifier. That's different than if X itself changes, which usually can and should be transparent. You also get the same type checking for an "X pointer" declaration (avoiding star because of HN mis-formatting issues) as for its "X_ptr" equivalent. Even compilers will flag "pointer to wrong type" errors, even as they remain oblivious to many "wrong integer type" errors. In short, "X_ptr" typedefs don't help anywhere that "X" typedefs don't already.

I'm not going to argue against pointer typedefs, though I personally don't use them. I'm just saying that I can't make a strong argument for them as I believe I can for other cases.

adestefan 11 years ago | | |

The argument is that you don't need to resort to naming conventions since the language already supports differentiating them with the struct and the * markings. It's one of the things I fully support. I hate working on code with a billion typedefs for every struct.

robinhoodexe 11 years ago |

"First off, I'd suggest printing out a copy of the GNU coding standards, and NOT read it. Burn them, it's a great symbolic gesture."

Shots fired.

wsc981 11 years ago | |

I thought it was a funny read (and I certainly don't agree with everything, though some of the snarky remarks gave me a good laugh). I wonder if the style guide was written by Linus, which would make sense to me.

bryal 11 years ago | |

With the target being the (what?) five people who enjoy following the GNU coding standards.

eyko 11 years ago | | |

The kernel coding style wasn't written a week ago. The first time I read it, more than a decade ago, the GNU coding standards did matter, and I remember feeling quite hurt by that (in a good way, since I don't think anybody took it that seriously).

As a matter of fact, the GNU coding standards still matter (to a certain extent) to many of us, and you should be thankful that they do, since they are the basis that provides consistency among GNU (and non-GNU) command-line apps, for example.

The GNU coding standards is an extensive document which talks not only about how to write C code, but also about how to design software consistent with the GNU principles (command-line user interfaces, options, stdin/stdout behaviour, etc.).

Personally, I see the kernel coding style as a whole different thing. It's a short guide on how to write consistent code for the Linux kernel, and full of good opinionated choices, in my opinion. Its scope is very different from that of the GNU coding standards (which, I'd say, are focused on writing userland programs that the user will interact with).

Also, remember that GNU wanted (wants?) to create an OS, not just a kernel, so I guess we can read their guidelines as something similar to Apple's human interface guidelines for devs :)

patrickg 11 years ago |

I am glad that Go has `go fmt`, which settles some of the issues mentioned in the article. Thus there is "one global coding style for Go". Whether one likes it or not is another matter.

Dewie 11 years ago | |

I don't see why there couldn't be a `kernel fmt` tool. In this day and age, we should really be beyond having to worry about things like "hmm, what was the brace style in this project again?" and "should all if/while/for have mandatory braces?"

DSMan195276 11 years ago | | |

The kernel has a perl script called 'checkpatch.pl'[0] which can check if code is formatted correctly. The Kernel coding style isn't actually enforced 100% though, which makes it a bit more iffy. Not all the code in the kernel actually follows the same style (IIRC, there's at least one sub-system that uses a slightly different style, I think 'net' maybe?), and so 'checkpatch' is recommended but may not be the be-all end-all in every situation.

[0] https://github.com/torvalds/linux/blob/master/scripts/checkp...

qznc 11 years ago | | |

That would be "astyle --style=linux" for example.

jackalope 11 years ago |

"Get a decent editor and don't leave whitespace at the end of lines."

Trailing whitespace always raises a huge red flag for me whenever I look at someone's code. It's not just sloppy, it often makes diff output so noisy you can't detect real changes to the code.

Perseids 11 years ago | |

I understand that it would be bad to introduce whitespace-only changes, but why would whitespace at the end of a line that doesn't break the 80-character limit be a problem otherwise? Sure, git colors it red in its diffs, but that is kind of circular reasoning.

E.g. in this diff

  0a1
  > k=2 
  2c3,4
  <     print(i)
  ---
  >     k*=k 
  >     print(k)

you don't even see the spaces after "k=2" and "k*=k".

teacup50 11 years ago | |

Who cares? Get a decent editor that doesn't give a crap if there's invisible whitespace at the end of a line.

edran 11 years ago | | |

As the top commenter said, the problems do not end with choosing a good editor and fixing how it displays whitespace. Git and many other VCSs create noisy diffs whenever a space is added and forgotten, which ultimately complicates the lives of developers who want to review changes. Even if your editor could display diffs cleanly, you would still have a dirty history when using less/more or other tools, for instance (not to mention merge problems).

pshc 11 years ago | | |

Right, but that's only half of the robustness principle.

uniformlyrandom 11 years ago | | |

or run perl -pi -e 's/[ \t]+$//' on the affected files
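The same clean-up as the Perl one-liner, sketched as a small C function with a hypothetical name (`strip_trailing_blanks`). It keeps a trailing newline, so a filter could call it on each line returned by `fgets()`.

```c
#include <string.h>

/*
 * Removes spaces and tabs at the end of a line, in place. A trailing
 * '\n' (if present) is kept.
 */
static void strip_trailing_blanks(char *line)
{
	size_t len = strlen(line);
	int has_newline = len > 0 && line[len - 1] == '\n';
	size_t end = has_newline ? len - 1 : len;

	/* Walk back over blanks that sit just before the end (or newline). */
	while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
		end--;

	if (has_newline)
		line[end++] = '\n';
	line[end] = '\0';
}
```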

JBiserkov 11 years ago |

>Encoding the type of a function into the name (so-called Hungarian notation) is brain damaged - the compiler knows the types anyway and can check those, and it only confuses the programmer. No wonder MicroSoft makes buggy programs.

"Making Wrong Code Look Wrong" by Joel Spolsky is a must-read and contains an explanation of Apps Hungarian (the original, thoughtful one) vs Systems Hungarian http://www.joelonsoftware.com/articles/Wrong.html

humanrebar 11 years ago | |

C is not a strongly typed language and it does not allow function overloading. C projects should allow for some flexibility in naming notations to make up for those language design decisions.

Also, any project that uses int return codes shouldn't be leaning too heavily on type safety.

juliangregorian 11 years ago | | |

This is really interesting. I was about to correct you since I remembered using function overloading, but then I double checked and it was indeed C++. I knew about C++'s name mangling from having coded against it in FFI, but never knew why it existed: the name mangling is what allows C++ to have function overloading. Light bulb!
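Because C has no overloading, its standard library already encodes operand types in function names, which is the gap described above. A small illustration (the wrapper `demo_abs` is only there to exercise the calls):

```c
#include <math.h>
#include <stdlib.h>

/*
 * One "absolute value" function per type, since C cannot overload a
 * single name the way C++ can.
 */
static int demo_abs(void)
{
	int i = abs(-3);		/* int */
	long l = labs(-4L);		/* long */
	long long ll = llabs(-5LL);	/* long long (C99) */
	double d = fabs(-1.5);		/* double */

	return i + (int)l + (int)ll + (int)(d * 2);	/* 3 + 4 + 5 + 3 */
}
```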

cbd1984 11 years ago | |

Right. Use Hungarian to encode information the type system can't represent. This is relevant even in object oriented languages, such as distinguishing between escaped and unescaped strings.
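A minimal sketch of Apps Hungarian in C, following the "us" (unsafe) and "s" (safe) prefixes from Spolsky's article. The compiler sees both as plain `char` strings; only the prefix carries the information. `html_encode()` is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <string.h>

/*
 * "us" prefix = unsafe string (raw user input),
 * "s" prefix  = safe string (HTML-escaped).
 *
 * Returns 0 on success, or -1 if s_out (size bytes) is too small; on
 * failure the contents of s_out are unspecified.
 */
static int html_encode(char *s_out, size_t size, const char *us_in)
{
	size_t used = 0;

	if (size == 0)
		return -1;

	for (; *us_in != '\0'; us_in++) {
		char one[2] = { *us_in, '\0' };
		const char *rep = one;
		size_t n;

		switch (*us_in) {
		case '<':
			rep = "&lt;";
			break;
		case '>':
			rep = "&gt;";
			break;
		case '&':
			rep = "&amp;";
			break;
		case '"':
			rep = "&quot;";
			break;
		}

		n = strlen(rep);
		if (used + n >= size)	/* keep room for the '\0' */
			return -1;
		memcpy(s_out + used, rep, n);
		used += n;
	}
	s_out[used] = '\0';
	return 0;
}
```

With this convention, writing a `us_` variable straight into a page looks wrong at a glance, which is the point of the article.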

kakakiki 11 years ago |

"There are heretic movements that try to make indentations 4 (or even 2!) characters deep, and that is akin to trying to define the value of PI to be 3."

HA!

snlacks 11 years ago | |

I laughed too. He makes a good point about the amount of indentation in code.

As someone who spends most of their time in JavaScript, I see how hard it would be to fit our code to this, and at the same time how much we'd all benefit if we tried to.

I just looked at the random JS file on top of my editor... have some refactoring to do.

kakakiki 11 years ago | | |

I understand your point. I'm rethinking my style of writing too :)

kijin 11 years ago |

> spaces are never used for indentation

If indentation should always use tabs (0x09) and never spaces (0x20), then the whole rant about indentation width is pointless. Any modern editor will allow you to adjust the displayed width of a tab. It's only when you use spaces for indentation that the width becomes a concern.

repsilat 11 years ago | |

There are two counterarguments to this:

- Line length. Some people say lines should be no longer than 78-80 characters, and you can't reasonably enforce a rule like this without answering how "wide" a tab is.

- Alignment. The "right thing to do" is to indent with tabs and align with spaces, but this is difficult for some people, against the religion of others (mixing tabs and spaces!) and insufficient if you want to align text that spans different levels of indentation. If most people use 8-character wide tabs, things will at least look right for them when it inevitably goes wrong.

LnxPrgr3 11 years ago | | |

I've been playing with clang-format for my own projects (installed via homebrew, since Apple doesn't ship it with Xcode). I tell it to enforce limits assuming ts=8 and use tabs for indentation. My editor is configured for ts=4.

Doing that seems to actually mostly work! It's made some weird (and in one case obviously broken) formatting decisions, but otherwise I'm pleased with it.

dllthomas 11 years ago | | |

I almost want a language that demands a visible character where indentation ends and alignment begins...

rossy 11 years ago | |

When you have a limit of 80 characters per line, the indentation width still matters because code that doesn't appear to overflow with 4 space tabs could overflow with 8 space tabs.
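A minimal sketch of that effect: a hypothetical `display_width()` that expands each tab to the next multiple of the tab stop, as editors and terminals usually do, and reports the resulting column count.

```c
#include <stddef.h>

/*
 * Display width of one line when each tab advances to the next
 * multiple of tabstop columns. Stops at '\n' or the end of the string.
 */
static size_t display_width(const char *line, size_t tabstop)
{
	size_t col = 0;

	for (; *line != '\0' && *line != '\n'; line++) {
		if (*line == '\t')
			col += tabstop - col % tabstop;
		else
			col++;
	}
	return col;
}
```

For example, three tabs of indentation followed by 60 characters of code measure 72 columns at ts=4 but 84 at ts=8, so the same line passes or fails an 80-column check depending on the reader's setting.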

raverbashing 11 years ago |

I like it

I really prefer using tabs. Having it displayed as 8 spaces in other languages is not as good as in C

And they get it right about typedefs in C

Dewie 11 years ago | |

Right, having tabs seems better since I can configure my editor to display tabs as 4 spaces, while whoever else can have tabs be displayed as 8 spaces. Having spaces instead, and having to make your tabs output spaces, and perhaps also backspace deleting four spaces (a "tab") in certain contexts, seems pretty complex in comparison.

twic 11 years ago | | |

Agreed. Using spaces to painstakingly emulate tabs, rather than just using tabs, seems absurd to me.

Even better, if you use real tabs, you might be able to use elastic tabstops:

http://nickgravgaard.com/elastictabstops/

dezgeg 11 years ago | | |

In theory, it sounds like a nice idea that with tabs you could choose your own preferred indent width, something other than 8 columns. But in practice this will cause problems, such as code written by you going over the maximum line length when viewed by people with 8-column tabs.

scrollaway 11 years ago | | |

I've gotten into that fight so many times. It baffles me that a majority of programmers out there do not understand that tabs are not just a matter of preference; they are a matter of accessibility. I read better with a 4-char indent, and some people read better with an 8-char indent. Let the user choose, rather than force it with spaces.

oneeyedpigeon 11 years ago | | |

Agreed, although this doesn't sit well with me when combined with the reasons for the recommendation - i.e. 8 is just better, line length should be 80, nesting should be limited to 3. If someone can set their indent level to something other than 8, won't they be more likely to violate other rules? I say this having just realised I have my tab set to 4 spaces ...

Zardoz84 11 years ago | |

2 spaces to indent is enough for anyone.

BugsBunnySan 11 years ago |

It's a very nice coding style. It keeps the code in pieces that are easy to grasp as units, it doesn't waste space and doesn't clutter the code at the same time.

Just take any random function from the kernel sources and ask yourself, what does it do. I think in most cases you'll find it's really obvious...

For me I find the kernel sources one of the most readable and understandable sources I've seen. The structure of them is just so clearly visible from the sources. I think a lot of that has to do with the coding style.

davidw 11 years ago |

I'm a fan of the Tcl/Apache/BSD style. Indeed, Tcl has nice C code:

https://github.com/tcltk/tcl/blob/master/generic/tclCompile....

oneeyedpigeon 11 years ago | |

Nice, but they definitely overcomment IMO, e.g.

https://github.com/tcltk/tcl/blob/master/generic/tclCompile....

jmnicolas 11 years ago | | |

After having maintained several projects with absolutely zero comments, apart from the ones that were copy-pasted from examples found on the web, I can tell you there's no such thing as "overcomment".

thatswrong0 11 years ago |

> Do not unnecessarily use braces where a single statement will do.

    if (condition)
            action();

> and

    if (condition)
            do_this();
    else
            do_that();

The Apple SSL bug (https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-got...) makes me wonder if this is really worth the potential for introducing bugs.
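A simplified sketch of the shape of that bug, with hypothetical stand-in checks rather than Apple's actual code. Because `err` is still 0 when the duplicated `goto fail` runs, the buggy version reports success without ever running the signature check. GCC 6 and later flag this layout with `-Wmisleading-indentation`.

```c
/* Hypothetical checks: 0 means success, nonzero means failure. */
static int check_a(void) { return 0; }
static int check_b(void) { return 0; }
static int check_signature(void) { return -1; }	/* should fail */

static int verify_buggy(void)
{
	int err;

	if ((err = check_a()) != 0)
		goto fail;
	if ((err = check_b()) != 0)
		goto fail;
		goto fail;	/* duplicated: always taken, err is still 0 */
	if ((err = check_signature()) != 0)
		goto fail;
fail:
	return err;
}

static int verify_fixed(void)
{
	int err;

	if ((err = check_a()) != 0)
		goto fail;
	if ((err = check_b()) != 0)
		goto fail;
	if ((err = check_signature()) != 0)
		goto fail;
fail:
	return err;
}
```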

DSMan195276 11 years ago | |

IMO, the bug was a much deeper issue than simply not putting braces on if statements. It doesn't matter if the code becomes this:

    if (condition) {
        goto fail;
    }
        goto fail;

if nobody looks at the commit. Don't get me wrong, though: braces on ifs do help make cleaner patches, so there is a valid reason to require them. You should never rely on them to prevent these types of bugs, though; that's bound to come back and bite you. In general, having a proper system for submitting and approving patches (like the kernel has) will let you avoid errors like this one.

thatswrong0 11 years ago | | |

I certainly don't disagree with you on the importance of process. I was going to mention how this probably would never be an issue for the kernel.

To me, though, requiring braces would make it much easier to spot any such problem at any point in the development process (writing, debugging, reviewing, maintenance) such that the extra line per conditional would be well worth it in all cases, not to mention making edits easier.

RogerL 11 years ago | |

Any decent code analyzer will pick that kind of thing up. I'd say the Apple bug speaks more about their process than their coding standards.

whoisthemachine 11 years ago |

I agree with and already practice many of these conventions (at least the ones that apply to C-like languages in general). It's interesting that I do, and I kind of wonder what led me down that path, since I haven't programmed in C since my college days. I often think that my assembly class from those days pushed me into making my code as vertical as possible rather than the indented-if-statement-curly-brace hell that I often see, since assembly was very readable without having that capability.

geekam 11 years ago |

>> Do not unnecessarily use braces where a single statement will do.

Shouldn't this be changed to "always use braces", given the Apple bug?

dllthomas 11 years ago | |

The "Apple bug" - I assume you mean the duplicated "goto fail" - isn't really a common kind of error. That said, there are others that "braces everywhere" does protect from somewhat. The question is whether the clutter trades off too much in readability. Linus apparently thinks it does.

Nmachine 11 years ago |

I stopped reading pretty early on: ... if you need more than three levels of indentation you're screwed and should rewrite...
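A minimal sketch of the usual way to stay under that limit: early returns and `continue` instead of nested blocks. The function names and task (summing the positive entries of an optional array) are hypothetical; both versions behave the same.

```c
#include <stddef.h>

/* Nested version: the addition ends up indented five tabs deep. */
static int sum_positive_nested(const int *a, size_t n, long *out)
{
	if (a != NULL) {
		if (n > 0) {
			long sum = 0;
			for (size_t i = 0; i < n; i++) {
				if (a[i] > 0) {
					sum += a[i];
				}
			}
			*out = sum;
			return 0;
		}
	}
	return -1;
}

/* Same behaviour with an early return: at most three tabs deep. */
static int sum_positive_flat(const int *a, size_t n, long *out)
{
	long sum = 0;

	if (a == NULL || n == 0)
		return -1;

	for (size_t i = 0; i < n; i++) {
		if (a[i] <= 0)
			continue;
		sum += a[i];
	}
	*out = sum;
	return 0;
}
```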