`three = 1` in the linux sourcecode(github.com) |
`three = 1` in the linux sourcecode(github.com) |
Variables `three`, `five` and `seven` are better described as `next_power_of_three`, `next_power_of_five` and `next_power_of_seven`. Since the `ext4_list_backups` function should iterate through 1 (= 3^0 = 5^0 = 7^0), 3, 5, 7, 3^2, 5^2, 3^3, 7^2, ... and 1 should not repeat three times, the initial value of `next_power_of_three` (or any of others) should be 1 and those of others should not be 1. The naming is a bit unfortunate (couldn't they be `threes` etc., for example?) but actually makes sense.
https://github.com/torvalds/linux/blob/d158fc7f36a25e19791d2...
/*
* Iterate through the groups which hold BACKUP superblock/GDT copies in an
* ext4 filesystem. The counters should be initialized to 1, 5, and 7 before
* calling this for the first time. In a sparse filesystem it will be the
* sequence of powers of 3, 5, and 7: 1, 3, 5, 7, 9, 25, 27, 49, 81, ...
* For a non-sparse filesystem it will be every group: 1, 2, 3, 4, ...
*/
I'm not sure what's so noteworthy about that. There's ton of arcane code in any kernel, as long as it's commented correctly there's no issue.Good commenting is no substitute for good naming. For a variable containing the number 1, "three" is a shitty name.
The function being called is directly above this declaration. It's a static function not used outside of this file. What are the chances that someone would edit this code without understanding what "three" is used for in this context? Pretty slim I wager.
It's good that people are auditing the linux source code but if you stumble upon some weird looking code (which again, is not the case here in my opinion) the right way to deal with it is not to post it on hacker news. Contact the maintainer (look at the MAINTAINER file at the root of the kernel) if possible with a patch to fix the issue.
If I adjust your comment to "for a variable whose only purpose is to be passed to another function as the parameter called 'three', 'three' is a shitty name", would you endorse that sentiment?
How do you know it is commented correctly? Now you have to understand the code and the comment and mentally confirm them to be in sync. You also have to maintain the comment when you make changes to the code and you have to trust that others are going to do the same.
It would be better if the code itself was expressive enough so that a comment wasn't needed.
See http://www.nongnu.org/ext2-doc/ext2.pdf#page15 for a slightly more detailed explanation.
"The first version of ext2 (revision 0) stores a copy at the start of every block group, along with backups of the group descriptor block(s). Because this can consume a considerable amount of space for large filesystems, later revisions can optionally reduce the number of backup copies by only putting backups in specific groups (this is the sparse superblock feature). The groups chosen are 0, 1 and powers of 3, 5 and 7."
(Edit: Yes, I know that 3^0=1, but the wording "0, 1 and powers of 3, 5" does not imply 3^0. Thanks!)
> -- Phil Karlton
unsigned counter1 = 1;
unsigned counter2 = 5;
unsigned counter3 = 7;
or use an array: unsigned counter[3] = {1,5,7};
No more confusion. That being said, I doubt that the names of these local variables is actually a real source of confusion and is not a problem that needs to be fixed.Trying to get a patch into Linux as a new developer without personally knowing one of the (sub-)lieutenants is a futile exercise.
Find the maintainer of the relevant subsystem, talk to them via mail or IRC about what's bugging you and how to improve it, implement it, send a patch and it will most likely be accepted.
Where I have had difficulties getting something accepted is if your patch might cause problems later or doesn't solve the problem in a way that aligns with the vision of the maintainer. But while that might be annoying as a developer that wants a problem fixed now, I'd say that it's probably for the better of the whole kernel.
You're totally wrong about "personal" part.
Yes, the naming is unfortunate. No, there's probably no way to make it better (and if you think code reviews suck, wait until you see a kernel code review) (Well, this came directly from Linus but there's usually some discussion as well)
But in the end this has shipped and is running. (Well, the double goto from Apple as well, but I doubt that went through a code review)
localised comments may help but naming things properly would help more (and negate a comment being required at all), actually adopting some better practices would be a better fix going forward.
looking at the called function's comment:
The counters should be initialized to 1, 5, and 7 before * calling this for the first time.
so how about making a function with the same name, ending in '_first' (maybe rename the original _next) which initialises these variables internally before calling the version intended to be called multiple times...?
since the parameters are not well described by their names renaming them would help too... it is not a pointer to a 3 a 5 or a 7. it might not even be pointing at multiples... but current_multiple_of_three etc. would be better. there is some reason why these values are significant in this algorithm (which I do not know) and the best names would capture that.
the code is moderately difficult to read in the function itself - the method of swapping around a local copy of a pointer based on the conditions is not very easy to follow. although makes sense here to avoid excessive code... comments would help make it clearer what is going on.
So I think this means three = pow(3, 0);
It doesn't explain why it's not 3,1,7 or 3,5,1, but that's because there is no reason for that.
kernel/sysctl.c
#ifdef CONFIG_LOCKUP_DETECTOR
static int sixty = 60;
#endif
static int __maybe_unused neg_one = -1;
static int zero;
static int __maybe_unused one = 1;
static int __maybe_unused two = 2;
static int __maybe_unused three = 3;
static unsigned long one_ul = 1;
static int one_hundred = 100;
#ifdef CONFIG_PRINTK
static int ten_thousand = 10000;
#endifBesides, it made you read the code and the comments.
"Naming things" includes systems like MAC address allocation, IP address allocation, DNS, URLs for documents, process IDs, name to inode mapping in filesystems, autoincrement primary keys in databases, etc.
Any ideas on what Karlton really meant with the quote?
It seems to me that this quote makes a rather standard usage of the "Zeugma" [1] figure of speech. It consists in putting next to each other two things of very different nature. In that case we have the very technical problem of invalidating caches on the one hand, and the much higher level problem of naming things on the other hand.
That produces a rather stylish quote.
But if Phil Karlton was actually referring to the technical problem of assigning unique identifier as you suggest, the figure of speech is lost and the quote becomes less stylish.
So I prefer tho read "naming things" as the very general problem of naming things.
The non-human-readable things like MACs, IPs and auto-increment keys are easy. But when I have to come up with a short, memorable and non-confusing name for something like a script that can take me longer than writing the code.
>The difficulty only arises if the body of a nested function refers directly (i.e., not via argument passing) to identifiers defined in the environment in which the function is defined, but not in the environment of the function call.
Also, the version I know has much more bite:
>There is also a variation on this that says there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.
That's absolutely horrible: you've now introduced the possibility of accidental out-of-bounds access where there previously was none, not to mention that you just forced the compiler to put those variables on the stack instead of using registers.
Contrived example:
int array(int lol)
{
unsigned counter[3] = {1,5,7};
for (int i = 0; i < lol; i++)
counter[i % 3]++;
return counter[0] + counter[1] + counter[2];
}
int vars(int lol)
{
unsigned counter1 = 1;
unsigned counter2 = 5;
unsigned counter3 = 7;
for (int i = 0; i < lol; i++)
switch(i % 3) {
case 0: counter1++;
case 1: counter2++;
case 2: counter3++;
}
return counter1 + counter2 + counter3;
}
The output is quite different (gcc -c -O3 -std=gnu99): Disassembly of section .text:
0000000000000000 <array>:
0: 85 ff test %edi,%edi
2: c7 44 24 e8 01 00 00 movl $0x1,-0x18(%rsp)
9: 00
a: c7 44 24 ec 05 00 00 movl $0x5,-0x14(%rsp)
11: 00
12: c7 44 24 f0 07 00 00 movl $0x7,-0x10(%rsp)
19: 00
1a: 7e 5f jle 7b <array+0x7b>
1c: 41 b8 01 00 00 00 mov $0x1,%r8d
22: 31 c9 xor %ecx,%ecx
24: be 56 55 55 55 mov $0x55555556,%esi
29: eb 1f jmp 4a <array+0x4a>
2b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
30: 89 c8 mov %ecx,%eax
32: f7 ee imul %esi
34: 89 c8 mov %ecx,%eax
36: c1 f8 1f sar $0x1f,%eax
39: 29 c2 sub %eax,%edx
3b: 8d 04 52 lea (%rdx,%rdx,2),%eax
3e: 89 ca mov %ecx,%edx
40: 29 c2 sub %eax,%edx
42: 48 63 c2 movslq %edx,%rax
45: 44 8b 44 84 e8 mov -0x18(%rsp,%rax,4),%r8d
4a: 89 c8 mov %ecx,%eax
4c: f7 ee imul %esi
4e: 89 c8 mov %ecx,%eax
50: c1 f8 1f sar $0x1f,%eax
53: 29 c2 sub %eax,%edx
55: 8d 04 52 lea (%rdx,%rdx,2),%eax
58: 89 ca mov %ecx,%edx
5a: 83 c1 01 add $0x1,%ecx
5d: 29 c2 sub %eax,%edx
5f: 39 f9 cmp %edi,%ecx
61: 48 63 c2 movslq %edx,%rax
64: 41 8d 50 01 lea 0x1(%r8),%edx
68: 89 54 84 e8 mov %edx,-0x18(%rsp,%rax,4)
6c: 75 c2 jne 30 <array+0x30>
6e: 8b 44 24 ec mov -0x14(%rsp),%eax
72: 03 44 24 e8 add -0x18(%rsp),%eax
76: 03 44 24 f0 add -0x10(%rsp),%eax
7a: c3 retq
7b: b8 0d 00 00 00 mov $0xd,%eax
80: c3 retq
81: 66 66 66 66 66 66 2e data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
88: 0f 1f 84 00 00 00 00
8f: 00
0000000000000090 <vars>:
90: 85 ff test %edi,%edi
92: 7e 52 jle e6 <vars+0x56>
94: 31 c9 xor %ecx,%ecx
96: 41 b8 07 00 00 00 mov $0x7,%r8d
9c: 41 b9 05 00 00 00 mov $0x5,%r9d
a2: 41 bb 01 00 00 00 mov $0x1,%r11d
a8: 41 ba 56 55 55 55 mov $0x55555556,%r10d
ae: 89 c8 mov %ecx,%eax
b0: 89 ce mov %ecx,%esi
b2: 41 f7 ea imul %r10d
b5: c1 fe 1f sar $0x1f,%esi
b8: 89 c8 mov %ecx,%eax
ba: 29 f2 sub %esi,%edx
bc: 8d 14 52 lea (%rdx,%rdx,2),%edx
bf: 29 d0 sub %edx,%eax
c1: 83 f8 01 cmp $0x1,%eax
c4: 74 09 je cf <vars+0x3f>
c6: 83 f8 02 cmp $0x2,%eax
c9: 74 08 je d3 <vars+0x43>
cb: 41 83 c3 01 add $0x1,%r11d
cf: 41 83 c1 01 add $0x1,%r9d
d3: 83 c1 01 add $0x1,%ecx
d6: 41 83 c0 01 add $0x1,%r8d
da: 39 f9 cmp %edi,%ecx
dc: 75 d0 jne ae <vars+0x1e>
de: 45 01 d9 add %r11d,%r9d
e1: 43 8d 04 01 lea (%r9,%r8,1),%eax
e5: c3 retq
e6: b8 0d 00 00 00 mov $0xd,%eax
eb: c3 retq
Arrays mean memory - don't use them unless you actually want the compiler to use memory.For example, how to assign unique MAC numbers to network cards without a centralised database that has to be modified for each card manufactured? Vendors get address space blocks, and probably split these further into blocks for factories and production lines.
Generating unique autoincrement row numbers in a distributed database or process IDs in a cluster is also a complex problem if it has to be faster than any communication between the nodes.
Designing the layout of IP addresses, architecture of DHCP and DNS, and the very idea of URLs are fantastic pieces of Computer Science work. Easy to use, but a hard problem for the original designers!
But of course, I have no idea what Karlton had in mind with the quote. "Naming things" as "assigning unique identifiers" always felt appropriate with cache invalidation, as both are problems that are easy for single cores/machines, but very complex in distributed systems.
On the other hand, I think that this kind of definitions are not useful for C.
I agree in general. However, the Embedded Linux world suffers from severe fragmentation and duplicate and/or incompatible work in part due to this policy (the greater good). The ARM platform has been consolidated a bit due to the involvement of very large players, but on the PowerPC side, things are pretty dim.
The abbreviated quote I was replying to just confuses things by leaving out critical parts.
Using an array here is wrong: it needlessly complicates the code (now in addition to everything else, I have to remember how big the array is and what each (unnamed) index corresponds to), possibly makes the compiler emit worthless loads/stores, and let's me potentially blow up the world if I mess up and access out-of-bounds.
Actually no, GCC can optimize out pointer indirection like that if it ends up inlining the function (which it could in the kernel example, since it's static, relatively short, and has only three callers).
Another contrived example:
static int inlineme(int *lol, int lolol)
{
*lol += lolol;
return *lol * 5;
}
int test(int x)
{
int tmp = 5;
tmp += inlineme(&tmp, x);
return tmp;
}
... all of which compiles to (with -O2): 0000000000000000 <test>:
0: 83 c7 05 add $0x5,%edi
3: 8d 04 bf lea (%rdi,%rdi,4),%eax
6: 01 f8 add %edi,%eax
8: c3 retq
Note that "tmp" doesn't have to be a constant for that to happen.As far as the actual kernel function, GCC ends up emitting: (The only uncompressed kernel binary I had laying around was for my beaglebone black, sorry)
c0136fa0 <verify_reserved_gdb>:
c0136fa0: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr}
c0136fa4: e24dd02c sub sp, sp, #44 ; 0x2c
<snip>
c0136fbc: e3a03001 mov r3, #1
c0136fc0: e58d301c str r3, [sp, #28]
c0136fc4: e3a03005 mov r3, #5
c0136fc8: e58d3020 str r3, [sp, #32]
c0136fcc: e3a03007 mov r3, #7
c0136fd0: e58d3024 str r3, [sp, #36] ; 0x24
c0136fd4: e1a00007 mov r0, r7
c0136fd8: e28d101c add r1, sp, #28
c0136fdc: e28d2020 add r2, sp, #32
c0136fe0: e28d3024 add r3, sp, #36 ; 0x24
c0136fe4: ebffffd4 bl c0136f3c <ext4_list_backups>
... so in this case it does actually pass by reference.That doesn't change my argument: an array is the more confusing way to do this, and in similar situations could prevent the compiler from making the optimization it did in my example.
The goal is to enumerate the powers of 3, 5, and 7, once each. Since power sequences all overlap at x^0 = 1, but we specifically don't want to enumerate 1 three times, we have to give one (or, from an alternative viewpoint, two) of the variables special treatment. Whether you name the variables "three", "five", and "seven", or "next_power_of_three", "next_power_of_five", and "next_power_of_seven", you're doing something strange by starting one of them at 1 and the other two past 1, and that should be commented on. The naming-only solution "powers_of_three_initialized_starting_at_three_to_the_zeroeth", "powers_of_five_initialized_starting_at_five_to_the_first", and "powers_of_seven_initialized_starting_at_seven_to_the_first", is hilariously awful, and still requires a comment to explain why the threes variable is more (or less) special than the other two.
// current power of three, init to 3^0
unsigned three = 1;
The fact that this looks like a bug at first glance is a pretty good indication that there should be some explanation of what exactly it's doing.This brings to mind the Orwell essay on language usage[1]: Break any of these rules sooner than say anything outright barbarous.
IMO having "three = 1" with no immediate context qualifies as barbarous. Yes it would be best to rename the variable, but, failing that, just toss in a freaking comment for common sense's sake.
The code example being discussed has comments sprinkled throughout the functions, and the "good style" / standard also says try to avoid not avoid at all cost. Also, something about blind adherence to rulebooks.
unsigned current_power_of_three = 1;
or something equally verbose. I'd expect such silliness in some enterprisey java code, but I think most people would agree a short explanatory comment is probably the superior choice. /* current power of three, init to 3^0 */ Linux style for comments is the C89 "/* ... */" style.
Don't use C99-style "// ..." comments.
[1] https://www.kernel.org/doc/Documentation/CodingStyle