Linux kernel coding style(kernel.org) |
Linux kernel coding style(kernel.org) |
Don't want to use typedefs? I think that's a missed opportunity, but OK. Don't use them. OTOH, anyone who tries to pretend that the bad outweighs the good, or discourage others from using them, is an ass. Such advice for the kernel is even hypocritical, when that code uses size_t and off_t and many others quite liberally.
Say you are reading a function, and see a local variable declared: "something_t variable_name;". Is it a struct, a pointer, or a basic type? Now compare with "struct something * variable_name;", which is clearly a pointer. If on the other hand it is "struct something variable_name;", you know that it's a struct allocated on the very small kernel stack (less than 8KiB per thread) - something which wouldn't be as clear if the fact that it's a struct were hidden by a typedef.
There are three main reasons to use typedefs: to allow for changes to the underlying type; to add new information to the underlying type (which is item (c) in that section); and to hide information. Since the Linux kernel runs in a constrained environment (as I mentioned, the kernel stack is severely limited, among other things), hiding information without a good reason is frowned upon. It's the same reason they use C instead of C++; the C++ language idioms hide more information.
> Both humans and static analyzers could tell the difference if you used typedefs, and avoid quite a few bugs as a result.
The Linux kernel does that! As I mentioned, it's item (c): "when you use sparse to literally create a _new_ type for type-checking." See for instance the declaration of gfp_t:
typedef unsigned __bitwise__ gfp_t;
The __bitwise__ is for the Sparse static checker. There are other similar typedefs, like __le32 which holds a little-endian value; the Sparse checker will warn you if used incorrectly (without converting to "native" endian).The problem is that this is presented as an exception that must be (strongly) justified. I think that using typedefs for integer types should be acceptable by default, and there should be specific rules for when to avoid them. The burden of proof is being put on the wrong side.
Even for structs, the argument for typedefs is stronger than the argument against. Even across a purely internal API, the caller often doesn't need to know whether something is an integer, a pointer to a struct, a pointer to a union, a pointer to another pointer, or whatever. Therefore they shouldn't need to know in order to write a declaration, which will become stale and need to be changed if the API ever changes. This is basic information hiding, as known since the 60s. Exposing too much reduces modularity and future flexibility. I've been working on kernels for longer than Linus, and the principle still applies there.
Again, it comes down to defaults and burden of proof. The rule should be to forego struct typedefs only if every user provably needs to know that it's a struct and what's in it (which is often a sign of a bad API). Even then, adding a typedef hardly hurts; anyone who needs to know that a "foo_t" is a "struct foo" and can't figure it out in seconds shouldn't be programming in the kernel or anywhere else.
The note about integers is worth complaining a bit about, I agree there is merit to typedef'ing integers in some situations, but the Kernel standard agrees with that in those instances (And the example is bad, there are instances of 'flag' typedefs in the kernel). In general the note about integers is just to discourage spamming typedef's everywhere.
More importantly then integers though, their note about making opaque objects with a typedef is extremely good practice, as it makes it easy to distinguish when it is or isn't expected that you'll be accessing the members directly.
The point of those rules are to allow typedef to actually be useful and communicate some information. If you just allow typedef'ing everything in every situation, then whether or not something is typedef'd becomes useless information to the reader.
Did you even read their explanation. Apparently not.
This is an acceptable use of typedefs, as explained there, exactly because a size_t varies between architectures.
Personally I use typedefs as a shortcut. Rather than type boost::shared_ptr<const MyFavoriteClass> over and over, typedef it to ConstMyFavoriteClassPtr for convenience. Then be consistent with that paradigm through the whole project, so you only have to learn it once to know any given Ptr type.
I'm not going to argue against pointer typedefs, though I personally don't use them. I'm just saying that I can't make a strong argument for them as I believe I can for other cases.
Shots fired.
Matter of fact, the GNU coding standards still matter (to a certain extent) to many of us, and you would be thankful that they did, since it's the basis that provides consistency among GNU (and non GNU) command line apps, for example.
The GNU coding standards is an extensive document which doesn't only talk about how to write C code, but also how to talks about how to design software consistent with the GNU principles (command line user interfaces, options, stdin/stdout behaviour, etc).
Personally I take the kernel coding style as a whole different thing. It's a short guide on how to write consistent code for the linux kernel. And full of good opinionated choices in my opinion. Its scope is very different from that of the GNU coding standards (which, I'd say, is focused towards writing userland programs which the user will interact with).
Also, remember that GNU wanted (wants?) to create an OS, not just a kernel, so I guess we can read their guidelines as something similar to Apple's human interface guidelines for devs :)
[0] https://github.com/torvalds/linux/blob/master/scripts/checkp...
Trailing whitespace always raises a huge red flag for me whenever I look at someone's code. It's not just sloppy, it often makes diff output so noisy you can't detect real changes to the code.
E.g. in this diff
0a1
> k=2
2c3,4
< print(i)
---
> k*=k
> print(k)
you don't even see the spaces after "k=2" and "k*=k"."Making Wrong Code Look Wrong" by Joel Spolsky is a must-read and contains an explanation of Apps Hungarian (the original, thoughtful one) vs Systems Hungarian http://www.joelonsoftware.com/articles/Wrong.html
Also, any project that uses int return codes shouldn't be leaning too heavily on type safety.
HA!
As someone who spends most of their time in JavaScript, I see how hard it would be to fit our code to this, and at the same time how much we'd all benefit if we tried to.
I just looked at the random JS file on top of my editor... have some refactoring to do.
If indentation should always use tabs (0x09) and never spaces (0x20), then the whole rant about indentation width is pointless. Any modern editor will allow you to adjust the displayed width of a tab. It's only when you use spaces for indentation that the width becomes a concern.
- Line length. Some people say lines should be no longer than 78-80 characters, and you can't reasonably enforce a rule like this without answering how "wide" a tab is.
- Alignment. The "right thing to do" is to indent with tabs and align with spaces, but this is difficult for some people, against the religion of others (mixing tabs and spaces!) and insufficient if you want to align text that spans different levels of indentation. If most people use 8-character wide tabs, things will at least look right for them when it inevitably goes wrong.
Doing that seems to actually mostly work! It's made some weird (and in one case obviously broken) formatting decisions, but otherwise I'm pleased with it.
I really prefer using tabs. Having it displayed as 8 spaces in other languages is not as good as in C
And they get it right about typedefs in C
Even better, if you use real tabs, you might be able to use elastic tabstops:
Just take any random function from the kernel sources and ask yourself, what does it do. I think in most cases you'll find it's really obvious...
For me I find the kernel sources one of the most readable and understandable sources I've seen. The structure of them is just so clearly visible from the sources. I think a lot of that has to do with the coding style.
https://github.com/tcltk/tcl/blob/master/generic/tclCompile....
https://github.com/tcltk/tcl/blob/master/generic/tclCompile....
if (condition)
action();
> and if (condition)
do_this();
else
do_that();
The Apple SSL bug (https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-got...) makes me wonder if this is really worth the potential for introducing bugs. if (condition) {
goto fail;
}
goto fail;
if nobody looks at the commit. Don't get me wrong though, braces on if's do help for making cleaner patches, so there is a valid reason to request braces. You should never rely on them to fix these types of bugs though, that's bound to come back and bite you. In general, having a proper system for submitting and approving patches (Like the Kernel has) will allow you to avoid errors like this one.To me, though, requiring braces would make it much easier to spot any such problem at any point in the development process (writing, debugging, reviewing, maintenance) such that the extra line per conditional would be well worth it in all cases, not to mention making edits easier.
Shouldn't this be changed to always use braces? Given the Apple bug?
NO
Because the wrong uses are more numerous than the right ones. It's that simple
Creating typedefs for integers is mostly useless and causes confusion, except in the cases specified.
Of course, if you work with a small project it's easier than with a big project like the kernel.
And of course I admire Linus for cutting through BS and usually avoiding it.
Thanks for enlightening me.
Though I've used that (wrapping around) mostly when writing in Haskell, for some reason. In Java and whatever lines tend to not become too long, perhaps because I tend to use four space indent...
That's because Git is stupidly opinionated about end-of-line whitespace, having been written by .... Linus.
"Comments are good, but there is also a danger of over-commenting. NEVER try to explain HOW your code works in a comment: it's much better to write the code so that the _working_ is obvious, and it's a waste of time to explain badly written code."
It's almost as bad as:
// Increment i
i++;
I think we should be aiming for code that is so beautiful and simple that it doesn't require commenting; comments should be left for exceptional circumstances in which something really can't be clearly expressed in the code. But that's definitely separate from documentation which should be separate, and at a much higher level.http://stackoverflow.com/questions/184618/what-is-the-best-c...
That said, I think the argument does apply in that some pieces of the kernel don't strictly follow the kernel style, and the fact that braces aren't enforced leads to some uglier pieces of code [0] being allowed despite not strictly adhering to the style.
[0] https://github.com/torvalds/linux/blob/master/kernel/groups....
[0] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux....
[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux....
You should instead look into historical repositories like https://archive.org/details/git-history-of-linux, which go further back; however, before Bitkeeper the authorship of each change was not tracked in detail (that historical repository IIRC does not have the detailed Bitkeeper history, but there is another repository somewhere which has it).
There were many edits to it, by several people; its more recent story can be seen on the kernel git repository. Of course, if Linus disagreed with an edit, it wouldn't go in, so in a way it's still his document.
The parent provided a very good example: a structure takes up a lot more space than a single int/pointer type, and passing them by value is usually an unnecessary copy.
and need to be changed if the API ever changes.
If the API changes then changing the declarations is likely to be trivial in comparison to the other changes that would need to be made to all the code using it.
Exposing too much reduces modularity and future flexibility.
...and exposing too little reduces understanding of the details, which I think is far more important especially for a kernel.
Not necessarily. Many structures, especially those used to make up for the lack of tuples/lists in C, are very small. The real difference is between large and small objects. Knowing which is which is part of the essential discipline of being a kernel (or embedded) programmer, and is hardly affected by whether or not typedefs are used.
> changing the declarations is likely to be trivial in comparison
That's generally true of pointer typedefs, which is why I don't particularly care for them and said so in another sub-thread. I think it's much less likely to be true for integer/enum or struct/union typedefs. For example, in the integer/enum case, the most common scenario is a change to a parameter's real type without changing its width or sign. The compiler won't flag that, even though it can cause real problems. Giving the compiler more information should be encouraged, not discouraged, even if there are exceptions either way.
> more important especially for a kernel.
Why do people persist in this belief that a kernel is some mystical realm where software-engineering principles don't apply? Being able to know is not the same as being forced to know. Kernel programmers are already more burdened than others with concerns that they need to think about for every line. Forcing more at them when it's not necessary doesn't help anyone. If you need to know whether something's a pointer to a struct or an array of something even though you never dereference into either (maybe you just pass it back or onward to another function), then somebody's wasting your precious time. Believe me, I know all about the tighter resource constraints for kernel code. OTOH, the people who worked on the AIX and Solaris kernels still knew and applied this stuff. They didn't have the anti-CS attitude that seems rampant among Linux kernel devs, and IMO they were better for that. If an RTOS for tiny devices can have decent modularity - and I've seen some that do - then why can't a full-blown kernel?
Why do you think that these "software-engineering principles" should apply? I think the fact that the Linux kernel works, and it works quite well, is strong enough evidence that they don't matter.
the people who worked on the AIX and Solaris kernels still knew and applied this stuff.
I don't know about AIX, but there's a reason Solaris has been called "Slowlaris"...
If an RTOS for tiny devices can have decent modularity
But is that modularity actually necessary? I've worked with plenty of overly complex applications that were far more inefficient and harder to understand as a whole than they could be, and most of them were the result of dogmatic adherence to principles of modularity, encapsulation, extensibility, etc. (none of which actually improved anything from the point of view of either the users nor the ones trying to figure out how everything works), so maybe that "anti-CS attitude" is a good thing after all...
It's not that it's a "mystical realm where software-engineering principles don't apply", it's that - like embedded - it's a domain where you often face tighter resource constraints. Applying the same engineering principles with different constraints can lead to different trade-offs, and ultimately different best practices.
There have been some good comments in this thread, but the only "noise" is from those who haven't even tried to present an argument one way or the other. I get that some people would draw the lines between good vs. bad use of typedefs differently than I would. I'm OK with that, as long as there's some kind of rational decision process behind it. The problem is that often there doesn't seem to be. Aesthetic concerns or the trivial difficulty of getting from the typedef to the underlying type do not, in my opinion, stand against the proven benefits of modularity or robust type checking.
It really does bother me how much of the coders' and code reviewers' bandwidth in the kernel community is wasted due to these silly formatting issues. In most IDE-using communities these problems were solved a long time ago, by the IDE autoreformatting your code on commit, with no exceptions.
int valueone = 1;
int anothervalue = 2;
float yetmore = 3.;
Aggggh what a waste of time why do people do thisIn your example, it's really easy to run your eyes down a column and see that one of those values is radically different from the others.
As a trivial example:
int robert_age = 32;
int annalouise_age = 25;
int bob_age = 250;
int dorothy_age = 56;
I find easier to read as: int robert_age = 32;
int annalouise_age = 25;
int bob_age = 250;
int dorothy_age = 56;
Coding styles are about readability and usability. The columns metaphor works well for some categories of data - that's why spreadsheets are so popular. int rumpelstiltskin_age = 202;
to your code, I already want to throw you out the window for the work I have to do and the diff I have to ruin to keep your "pretty" formatting. Just don't bother.https://google-styleguide.googlecode.com/svn/trunk/javaguide...
Google Java Style
4.6.3 Horizontal alignment: never required
Terminology Note: Horizontal alignment is the practice of adding a variable number of additional spaces in your code with the goal of making certain tokens appear directly below certain other tokens on previous lines.
This practice is permitted, but is never required by Google Style. It is not even required to maintain horizontal alignment in places where it was already used.
Here is an example without alignment, then using alignment:
private int x; // this is fine
private Color color; // this too
private int x; // permitted, but future edits
private Color color; // may leave it unaligned
Tip: Alignment can aid readability, but it creates problems for future maintenance. Consider a future change that needs to touch just one line. This change may leave the formerly-pleasing formatting mangled, and that is allowed. More often it prompts the coder (perhaps you) to adjust whitespace on nearby lines as well, possibly triggering a cascading series of reformattings. That one-line change now has a "blast radius." This can at worst result in pointless busywork, but at best it still corrupts version history information, slows down reviewers and exacerbates merge conflicts.https://developer.mozilla.org/en-US/docs/Web/CSS/vertical-al...
The vertical-align CSS property specifies the vertical alignment of an inline or table-cell box.
Values (for inline elements)
Most of the values vertically align the element relative to its parent element:
baseline: Aligns the baseline of the element with the baseline of its parent. The baseline of some replaced elements, like <textarea> is not specified by the HTML specification, meaning that their behavior with this keyword may change from one browser to the other.
sub: Aligns the baseline of the element with the subscript-baseline of its parent.
super: Aligns the baseline of the element with the superscript-baseline of its parent.
text-top: Aligns the top of the element with the top of the parent element's font.
text-bottom: Aligns the bottom of the element with the bottom of the parent element's font.
middle: Aligns the middle of the element with the middle of lowercase letters in the parent.
<length>: Aligns the baseline of the element at the given length above the baseline of its parent.
<percentage>: Like <length> values, with the percentage being a percent of the line-height property. (Negative values are allowed for <length> and <percentage>.)
The following two values vertically align the element relative to the entire line rather than relative to its parent:
top: Align the top of the element and its descendants with the top of the entire line.
bottom: Align the bottom of the element and its descendants with the bottom of the entire line. For elements that do not have a baseline, the bottom margin edge is used instead.
If you have, say, 50 lines of assignment and you align all the values to the largest one, adding one forces you to update 50 lines. I've been faced with those very situations and that is when I understood how important it is not to align values like that.
> Formatting change. Use `wdiff` to confirm that changes are just stylistic
The reviewer runs `wdiff` and confirms that the commit is just a formatting change. If the language is not layout-aware, then he will know that none of the changes are "semantic". Now he can look over the changed lines themselves (not necessarily with `diff`; just looking at the changed lines themselves) and see if the change is worth it/in line with the project.
PS: Maybe there should be a "column diff", something that checks that one file uses the same alignment as another file. I'm not able to show it here since HN will truncate spaces between words ( ;) ), but the point is to check if two files uses the same alignment, for example that in
> var v = 12
the next variable declaration, the numbers line up. I don't know if that is worth it, and the check would only be valid for some parts of the files.
You can fantasize about better diffs. Why isn't a diff applied directly to the abstract syntax trees of a language, for example? However, I think part of the robustness of version control systems comes from keeping things simple, namely line-based diffs, and with that comes a preference for keeping line-based diffs short.