Compilers in OpenBSD (marc.info)
* "gcc 2.5 (at the time) had a few bugs, but not many"
* "schism between gcc 2.8, conservative...and the ``Pentium gcc'' group...[Pentium gcc group was] stretching the optimizer code beyond its limits"
* "These projects eventually merged as gcc 2.95...gcc [now] had bugs"
But what does this historical lesson tell us?
Stallman was conservative, slow-moving and cathedral-like with 2.8. This approach helped keep bugs out of the code.
The "pentium gcc" (Cygnus/egcs) group was quickly responding to marketplace needs. It was more bazaar-like. It committed code more freely than GNU would, and the code, while allowing for new functions, was not always that well architected.
So what was the deal with this schism and the subsequent merger? What happened was that egcs (the Cygnus-oriented one, the "pentium gcc" one) began eclipsing gcc, and toward the end it really was eclipsing it. It was not as solid as gcc, but it had all the new functionality people wanted. All over, people were seriously about to abandon gcc and go with egcs. At this point Stallman threw up his hands and accepted that the egcs approach had won. They merged, and gcc became more liberal about what code it would commit, at the expense of being a solid code base. Just like the OP says.
So now what is different this time around? Why is a compiler which prioritizes stability and correctness over new functionality and optimizations going to win? The latter approach won last time around, so why should the first approach win this time? Especially since in battles between cathedral/waterfall projects and bazaar/agile projects, the bazaar/agile approach seems to come out on top again and again. OpenBSD can afford to go this route if it wants because OpenBSD fills a marginal niche. It might even be interesting to watch OpenBSD go down this road. But for more mainstream OSes like Linux, this approach might not be possible.
And if anyone mentions Apple - Apple is not marginal, but it is a niche. GCC and Linux are in a multitude of environments. A company like Apple with its own ecosystem and only a handful of targets can afford to pick and choose its compiler.
As to the subject at hand: OpenBSD aims for an OS "which prioritizes stability and correctness over new functionality and optimizations". Such an OS will want to use a similar compiler. And yes, I think one can draw an analogy between gcc/egcs and BSD/Linux here.
GCC 4.8.1 was the first compiler with complete C++11 support. Clang with complete C++11 support was released a bit later.
I read it to say that GCC is so open source that it cannot converge on a stable release. Further, there isn't a non-commercial (aka free) incentive for making it stable, so it doesn't converge. Rather, it trundles along from one optimization strategy to the next, constantly in a state of minor bugginess. The economics of 'sold' products uses the loss of revenue as the incentive to maintain quality; without that incentive it's hard.
Google has (had?) a pretty good sized team that did nothing but maintain GCC. I'm sure it cost them easily $1M/year to keep that team going. There is no incentive for them to fund a team like that in a third party such that everyone else benefits from their work. Sure, they offer the changes back into the base product, and somewhere else there is another team working for company Y that is taking those and porting them into their effort. In this article from Marc he mentions himself and 5 other developers who are the "compiler people". 5 developers at $120K each is $0.6M/year before you add insurance and office space.
And those 5 have their lives made more difficult by the dozen or so folks who are committing changes that destabilize parts of the code or require side ports.
It makes me wonder how many people there are like me who would be willing to pay $100/year for a bespoke C compiler that was supported by a single source and stable.
[1] http://pcc.ludd.ltu.se/
[2] http://comments.gmane.org/gmane.os.openbsd.misc/196817
The new de-facto LTS compiler is gcc 4.2.1, the last version released under GPLv2. After gcc switched to GPLv3, Apple and FreeBSD stayed on 4.2.1.
But that is definitely a good place to stop/start (depending on how you look at it)
"An LTS release of an open source compiler."
Surprising that this doesn't already exist--Apple and Red Hat and Ubuntu, etc. must all maintain what is in effect an LTS version of the compilers they ship, in the same way that OpenBSD does.
Clearly OpenBSD developers haven't been involved with the compiler engineering communities, and their wishes have been neglected over many years; this is not news. Why? Because there are no _users_. Bugs don't get fixed by bitching about them: they get fixed when you get involved with upstream and write patches.
GCC can be as "conservative" or "cathedral-like" as it wants: if it does not produce sufficiently optimized code for the big users, it will be thrown out the window. Today, GCC is in active development and has more users than anyone else. The leadership is strong: RMS and many of the GNU people are heavily involved. Those are the facts.
LLVM is the other elephant in the room: from my experience posting patches on their list, they don't give a shit about getting llvm/clang to build linux.git, or many of the projects that currently use GCC. Although it might be technically superior (the code is much more readable and maintainable), the community is much too narrow. Moreover, the leadership is gone: most of the top contributors (Chris Lattner, Evan Cheng, Reid Spencer) seem to have lost interest in the project.
The proprietary compilers like ICC are mostly useful just in research. Sure, they produce highly optimized code, but they're black boxes that cannot be studied or tinkered with. I tried compiling git.git with ICC a few years ago out of curiosity [1]: pages and pages of totally pointless warnings; gcc and clang both clean-compiled git.git at that point.
What the community needs is a compiler project with a strong leadership that cares deeply about its users, not a dead "LTS" project that nobody else gives a shit about: nobody wants to work on a project that's in maintenance-mode. Hardware, programming languages, and compilers evolve constantly, and programmers must learn to cope with these changes.
Fwiw, I'd really like to see what "bugs" this guy is talking about. If they really don't care about hardware, programming language, and compiler technology advancement, why don't they just maintain a port of an older version of GCC? Why bother with new versions at all?
The LLVMLinux project is making good progress towards this goal. http://llvm.linuxfoundation.org/
EDIT: I misunderstood the license file, the majority of the code is non-commercial-use only, only a small part is dual-licensed. Still a cool project, even if it's not open source...
"First, compilers are fragile. While one would like to expect a minimum
level of correctness and trustworthiness from a modern compiler, we
can't, regardless of the compiler we use."
CompCert (had it been truly open source) would have provided that trustworthiness. And it can be ported to new architectures with the confidence that these ports won't silently break by random other changes.
It's some non-commercial non-free license. Only certain files are GPL? I don't think this would be redistributable...
What Miod is really asking for is an open source compiler that puts stability and portability over compiler optimization. If it weren't for optimizations and the endless fiddling that goes with them, gcc would have remained stable. It is already the only option that meets their portability requirements.
> The only way to prove code correct is by reference to some kind of specification.
there are other interesting properties to prove besides "correctness of the entire compiler".
you could prove the entire compiler version n+1 emits the same code as version n, modulo $bugfix.
you could prove that individual optimizations are, on their own, correct with regards to the AST or IR that gcc operates on.
neither of those requires a formalized spec of C. the first would probably make miod pretty happy. sadly one isn't going to get gcc manhandled into a proof assistant, ever.
While one would like to expect a minimum level of correctness and trustworthiness from a modern compiler, we can't, regardless of the compiler we use.
Notice he doesn't say "open source compiler" - he means all compilers will have issues. Much as it sounds like it would be nice to just pay someone for good tools and not have to worry about it, that's not the case (and never has been). The reason I look for open source solutions first is precisely because I know nothing is perfect, but at least when I have the source I know I might have a chance of fixing it, or if need be, pay someone more expert to. Just paying someone upfront for something I can't tinker on in no way guarantees an incentive to make it stable. I know this from experience. OTOH, I'm more than willing to pay (and have) for open source software. I often wonder if something like kickstarter (without as much fanfare or pressure) might be a good way to fund LTS releases of open source software.
Nothing of the sort is true.
There are plenty of open source projects with stable releases, for example Debian, Ubuntu LTS, Firefox ESR.
Some open source projects focus more on stability than others. The same is true for closed source software.
The compilers are basically ABI stable right now.
If your complaint is support, well, yeah, nobody is going to prioritize fixing your bugs over someone else's if you don't pay them.
GCC does converge on stable releases, in the same way debian/ubuntu/everyone else does.
They declare they will not ship until there are fewer than X P1 bugs, X P2 bugs, etc.
These bugs get fixed, and the compiler ships. Non-primary platforms and miscellaneous bugs are simply an artifact of life.
Did you look at ICC / XLC / MSVC? They typically outperform GCC by about 20%, although I haven't checked in a while.
Note: our build system uses three compilers.
I don't see how not having an LTS version supports that argument. Lack of support for previous versions doesn't really imply anything about the stability of the past, present, or future releases. I think they're mostly orthogonal concepts.
The problem is that if you combine all the various flags that affect the compiler, across all the architectures, across all the platforms, in all its variants (cross compiler, native, the many libc and barebones variants), you're looking at too many tests to run, no matter how huge an infrastructure you have to run them.
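To put a rough number on that explosion, here is a back-of-the-envelope sketch. Every axis, name, and count in it is an illustrative assumption, not an actual GCC test plan; the point is only how quickly the product grows once independent on/off flags are multiplied in.

```python
from math import prod

# Illustrative axes only; a real compiler has more of each.
dimensions = {
    "arch": ["x86_64", "arm", "mips", "sparc64", "powerpc"],
    "opt_level": ["-O0", "-O1", "-O2", "-O3", "-Os"],
    "build": ["native", "cross"],
    "libc": ["glibc", "newlib", "freestanding"],
}

# Hypothetical number of independent on/off code-generation flags.
boolean_flags = 20


def matrix_size(dims, n_bool_flags):
    """Distinct configurations: product of axis sizes times 2**flags."""
    return prod(len(values) for values in dims.values()) * 2 ** n_bool_flags


total = matrix_size(dimensions, boolean_flags)
print(f"{total:,} configurations")  # 157,286,400 configurations
```

Even with these modest made-up numbers, running a full test suite once per configuration is out of reach, which is why real test systems sample the matrix instead of enumerating it.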
Another problem is that optimization depends a lot on context. Given the amount (basically infinite) of C code that could surround any other piece of C code and affect the result, it's quite a hard task.
One interesting approach is Csmith (http://embed.cs.utah.edu/csmith/), which generates random C programs and looks for bugs.
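The random-testing idea can be sketched in miniature. This is not Csmith and does not use its interface; it is a toy, assuming a tiny expression language of integers, one variable `x`, and `+ - *`, with a hypothetical `optimize` pass standing in for a compiler optimization. Random programs are generated, run both unoptimized and optimized, and any disagreement is reported.

```python
import random

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def gen_expr(rng, depth):
    """Random expression tree: an int, the variable 'x', or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else rng.randint(-3, 3)
    op = rng.choice(sorted(OPS))
    return (op, gen_expr(rng, depth - 1), gen_expr(rng, depth - 1))


def evaluate(expr, x):
    """Reference semantics: evaluate the tree directly."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr


def optimize(expr):
    """Toy pass (an assumption for illustration): constant folding plus identities."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = optimize(left), optimize(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)       # fold constants
    if op == "*" and (left == 0 or right == 0):
        return 0                          # e * 0 == 0
    if op == "*" and right == 1:
        return left                       # e * 1 == e
    if op == "*" and left == 1:
        return right                      # 1 * e == e
    if op in "+-" and right == 0:
        return left                       # e + 0 == e, e - 0 == e
    if op == "+" and left == 0:
        return right                      # 0 + e == e
    return (op, left, right)


def differential_test(seed, trials=1000, optimizer=optimize):
    """Return (expr, optimized, x) for every random program where results differ."""
    rng = random.Random(seed)
    mismatches = []
    for _ in range(trials):
        expr = gen_expr(rng, depth=4)
        opt = optimizer(expr)
        for x in range(-5, 6):
            if evaluate(expr, x) != evaluate(opt, x):
                mismatches.append((expr, opt, x))
                break
    return mismatches


print(len(differential_test(seed=0)), "mismatches")  # 0 mismatches
```

A real tool has to do far more work than this, notably avoiding undefined behavior in the generated C so that any disagreement between compilers or optimization levels really is a compiler bug.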
As someone who at one point in time maintained such a compiler test system, I'll say that it isn't possible to get all combinations, but you can hit a reasonable percent of them.
A good compiler test run ends up running through millions of tests. It isn't for the faint of heart, but it is perfectly doable.
Add multiple platforms, and it gets more difficult, because each platform has quirks and peculiarities you don't know about until you get a bug report.
Add optimisation of languages that are difficult to automatically reason about, and it gets harder again.
Multiply the optimisation difficulties by multi-platform difficulties and yes: it is hard.
I do wonder if OpenBSD would accept a C compiler built on a small functional language amenable to these kinds of proofs. I suppose fulfilling the portability requirements is priority one, then stability. I just wonder if they would accept something not written in C at all.
"For information regarding the C++11 features available in the experimental C++11 mode, see http://gcc.gnu.org/projects/cxx0x.html"
That link shows that the compiler is feature complete, and refers (indirectly) to http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#s... for library support. That has quite a few 'missing x,y,z' or outright "N" markers.
gcc 4.8.1 is from May 31; Clang/LLVM _claimed_ full C++11 support in June (http://blog.llvm.org/2013/06/llvm-33-released.html)
Does that mean that clang passed the finish line earlier? Maybe, but it could just be that the gcc project looks harder for bugs in their own project, thus placing their finish line farther out.
Frankly, it doesn't really matter who was first. It's way more important to know whether the compilers generate correct code and if they do, that it is efficient.
Not perfect, but pretty good.
Your point is still valid: if you want to compile C99 code, MSVC is not even an option.
If you don't care about sticking to C, you can usually get what you want with a C++ feature anyway.
http://stackoverflow.com/questions/3879636/what-can-be-done-...