I am finding it extremely hard to envision a circumstance where this is a bottleneck for anything. Care to clarify the context?
I also find the struct layout really odd; why not just move c?
Your benchmarks are also probably broken; branch predictors use global state so will almost certainly predict fine the way you've used things. You need to repopulate a significantly-sized array each time with randomly chosen values. You can't use the same array because it'll be learnt, and you can't use a short array because it'll be predicted globally.
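To make that concrete, here is a rough sketch of the kind of harness I mean, not a definitive benchmark. Everything in it is an assumption for illustration: the two `select_*` functions stand in for whatever branchy and branchless variants are being compared, `xorshift32` is just a cheap PRNG so the values don't depend on the quality of `rand()`, and the buffer is refilled (untimed) before every timed pass so the predictor never sees the same sequence twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the two variants under test. */
static uint32_t select_branchy(uint32_t x)
{
    if (x & 1u)
        return x * 3u + 1u;
    return x >> 1;
}

static uint32_t select_branchless(uint32_t x)
{
    uint32_t mask = -(x & 1u);            /* all ones if x is odd, else 0 */
    return ((x * 3u + 1u) & mask) | ((x >> 1) & ~mask);
}

/* Small xorshift PRNG; *state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Overwrite the whole buffer with fresh pseudo-random values. */
static void fill_random(uint32_t *buf, size_t n, uint32_t *state)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = xorshift32(state);
}

/* Sanity check: both variants must agree on every input in buf. */
static void check_agreement(const uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        assert(select_branchy(buf[i]) == select_branchless(buf[i]));
}

/* Time `rounds` passes over buf, refilling it (untimed) before each pass.
 * Returns CPU seconds spent in the timed loops. The accumulated result is
 * folded into *sink so the compiler cannot discard the work. */
static double time_rounds(uint32_t (*fn)(uint32_t), uint32_t *buf, size_t n,
                          int rounds, uint32_t *state, uint32_t *sink)
{
    clock_t total = 0;
    for (int r = 0; r < rounds; r++) {
        fill_random(buf, n, state);       /* new data every round */
        clock_t t0 = clock();
        uint32_t acc = 0;
        for (size_t i = 0; i < n; i++)
            acc += fn(buf[i]);
        total += clock() - t0;
        *sink ^= acc;
    }
    return (double)total / CLOCKS_PER_SEC;
}
```

You'd allocate a buffer of something like a million entries, seed the state with any non-zero value, and call `time_rounds` once per variant. Two caveats: the function-pointer call adds overhead to both variants equally, and the compiler may well turn the "branchy" version into a conditional move on its own, so check the generated assembly before drawing conclusions.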
For some additional context, these structs are packed wire-line protocol structs. In reality the padding bytes are full of other fun details.
For me, a wrapper includes the original thing and just wraps stuff around it. How does this work here? Though I suppose that's really a question for the author...
clang warning: flexible array members are a C99 feature [-Wc99-extensions]
is this still the case?
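For anyone who hasn't run into them, this is roughly what a flexible array member looks like. It's only a sketch: `struct packet` and `packet_new` are made-up names, not anything from the article. As far as I know, this is standard in C99 and later, and clang's `-Wc99-extensions` diagnostic shows up when the same code is compiled as C++ or in a pre-C99 mode with pedantic warnings enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length record: a header followed by payload bytes. */
struct packet {
    uint16_t len;
    uint8_t  payload[];   /* flexible array member: must be the last member */
};

/* Allocate a packet with room for `len` payload bytes and copy them in.
 * Returns NULL if allocation fails; the caller frees the result. */
static struct packet *packet_new(const void *data, uint16_t len)
{
    assert(data != NULL || len == 0);
    struct packet *p = malloc(sizeof *p + len);  /* sizeof excludes payload */
    if (p == NULL)
        return NULL;
    p->len = len;
    if (len > 0)
        memcpy(p->payload, data, len);
    return p;
}
```

Header and payload live in one allocation, so a single `free(p)` releases both.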
"Don't."
The end.
(Basically, as the story shows, with this kind of micro-optimization you may or may not beat the compiler but you're almost certainly wasting your time compared with more effective optimization methods, like rethinking the problem.)
Still, a business case might exist when the library is heavily used, or when the compiler isn't able to produce the code you need.
There are also cases, primarily in finance, where single-threaded low latency distinguishes competing groups. Some of those guys count every nanosecond.
The techniques described here (and in other places) are universally applicable.
Try as I might, I could not beat GCC [2], which used non-vectorized code. I chalk it up to not knowing how best to write optimized x86 code anymore (it's been years since I did any real assembly language programming) and I might be hitting some scheduling or pipeline issues, I just don't know.
[1] I described the code years ago here: http://boston.conman.org/2004/06/09.2
[2] I beat clang easily though.
When that isn't the case, it is just wasting money.
> you're almost certainly wasting your time compared with more effective optimization methods
Some rare cases absolutely do exist where specific micro-optimizations such as these may be useful - I'm not arguing that they don't.
Even in those cases, though, you're far more likely to achieve significant performance gains by taking a step back and re-examining your high level goals and your approach to the problem.
I'm not saying that eliminating a branch or two is never going to be useful. I'm just saying you should focus your attention elsewhere first.
But you are right that it probably doesn't matter. The minimal benchmarking I did convinced us that there wouldn't be much of a cost. The code that actually ended up being committed to our project has the branch.