C and C++ Aren't Future Proof (blog.regehr.org)
It doesn't matter. This is not how programming works in the real world. In the real world, you write the most correct program you can under time pressure. A new compiler, operating system, or platform arrives that exposes a bug. You fix it and you move on. It doesn't matter if the language is future proof or not. The process is similar for any complex program.
The blog's name is "Embedded in Academia", and this is a perfectly valid viewpoint for someone in academia to take. People in academia should research ways to build more robust tools and languages. But it really is not going to matter in the real world. Languages and platforms will never be future proof, because computing is complex.
His suggestion #3, that the standards should define more of the commonly used behavior and leave less of it undefined, wouldn't even require C programmers to do anything about it themselves.
I've written Windows, Mac, Linux, Xbox, PlayStation, PSP, iOS, and Android code. The memory model is subtly different for each platform. I just don't think you can define certain behaviour and have that work across disparate platforms.
I haven't really written any device drivers or kernel space code but I would imagine it would make the job even more difficult.
Picture a single person or a small team releasing an open-source project: it generates little developer interest, a community fails to form, and the original author(s) move on.
Fast forward 5 years or more. The code's floating around the internet, but nobody's left who understands it well enough to explain why it breaks with a modern toolchain. Requiring people to use a compiler -- and possibly an entire operating system -- of that age will deter people significantly from using that project.
A new compiler, OS or platform will require much less rewriting of a Python program than a C program. Under time pressure, it is much more likely that you will incidentally write future-proof code if you write in Python instead of C.
I can speak as someone who has been programming in C and C++ for over ten years, but only in the last few years became aware of this issue and started taking it seriously. Five years ago I would do things like cast function pointers to void pointers and back, or calculate addresses that were outside the bounds of any allocated object and compare against them, all without even realizing I was doing something wrong.
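For anyone wondering what the compliant versions of those two habits look like, here is a minimal sketch in C++17 (the helper names are hypothetical). ISO C leaves the function-pointer-to-void* round trip undefined and C++ only makes it conditionally supported, whereas converting to another function-pointer type and back is guaranteed to give the original pointer. Likewise, a range check can be done with sizes instead of forming a pointer past the end of the object.

```cpp
#include <cstddef>

// A generic function-pointer type used only for storage; never called as-is.
using generic_fn = void (*)();

int add_one(int x) { return x + 1; }

// Round-trips add_one through another function-pointer type. The standard
// guarantees the original value comes back, and we call it with its real type.
int call_via_generic(int arg) {
    generic_fn stored = reinterpret_cast<generic_fn>(&add_one);
    auto restored = reinterpret_cast<int (*)(int)>(stored);
    return restored(arg);
}

// Checks that [offset, offset + len) lies inside a buffer of `size` elements
// using only size arithmetic: no out-of-bounds pointer is ever formed, and
// `size - offset` cannot wrap because offset <= size is checked first.
bool range_fits(std::size_t offset, std::size_t len, std::size_t size) {
    return offset <= size && len <= size - offset;
}
```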
I don't think this will spell doom-and-gloom for C and C++ though. I think a few things will happen.
First of all, the compiler people are walking a fine line; yes, they are breaking code that relies on undefined behavior, but they often avoid breaking too much. For example, I've had it explained to me that at least for the time being, gcc's LTO avoids breaking any programs that would work when compiled with a traditional linker. In addition, they often provide switches that preserve traditional semantics for non-compliant code that needs it (like -fno-strict-aliasing and -fwrapv).
Secondly, I believe that tooling will get better, and that rather than ignoring the warnings, people will become more aware of this issue, as well as of standard-compliant ways of working around common patterns of undefined behavior. For example, it's often easy to avoid aliasing problems by using memcpy(), and the call can usually be optimized away.
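A minimal sketch of the memcpy() approach, assuming a 32-bit IEEE-754 float (the function name is just for illustration). Reading the float through a reinterpret_cast'ed std::uint32_t pointer would violate strict aliasing; copying the bytes is well-defined, and optimizing compilers typically reduce it to a plain register move.

```cpp
#include <cstdint>
#include <cstring>

// Returns the bit pattern of f without violating strict aliasing.
// *reinterpret_cast<std::uint32_t*>(&f) would be undefined behavior;
// memcpy between trivially copyable objects is not.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

C++20 also offers std::bit_cast for the same job.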
Thirdly, I expect that the standard may begin to define some of this behavior. For example, I think that non-two's-complement systems are exceedingly rare these days; I wouldn't be surprised if a future version of the standard defined unsigned->signed conversions accordingly.
For me, this is perhaps the biggest issue raised in this article: as static and dynamic analysis tools become more ubiquitous, we should be learning to fix the issues they raise, not ignore them.
I remember interviewing a college-hire candidate a while ago (2004 or 2005). I had asked about working with others, and we had gotten to talking about code review. The candidate was passionate about how code review had helped with a group project he worked on, but every single example he gave of a bug found by code review was something that -Wall would have found...
The same applies to static analysis - let the machines do the work that they can do, that leaves the humans to get on with the work that the machines can't do (yet!)
Writing a correct overflow check in C/C++ is not just not straightforward; it is overly complicated even for experienced developers [2]. This is IMHO unacceptable for something that is required so often in a security context.
Therefore, I hope that option 3 proposed by the author (changing the C/C++ standard to define the behavior of at least integer overflow) will be adopted. However, this probably will not happen for a long time, leaving us with security holes all over the net.
[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475
[2] http://stackoverflow.com/questions/3944505/detecting-signed-...
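For reference, a minimal sketch of one portable pre-check for signed addition (the name checked_add is hypothetical). The trick is to test against the limits before adding, so the overflowing addition itself is never evaluated.

```cpp
#include <climits>
#include <optional>

// Returns a + b, or std::nullopt if the sum would overflow int.
// INT_MAX - b (for b > 0) and INT_MIN - b (for b < 0) cannot overflow,
// so no signed overflow ever happens in this function.
std::optional<int> checked_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return std::nullopt;  // would exceed INT_MAX
    if (b < 0 && a < INT_MIN - b) return std::nullopt;  // would go below INT_MIN
    return a + b;
}
```

GCC and Clang also provide __builtin_add_overflow(a, b, &result), which performs the same check as a compiler builtin.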
C++11 added many changes intended for the "do-it-yourself" crowd, like auto, the new function syntax, and lambdas. It didn't add much for the "let the compiler do the work for me" crowd (one notable exception being variadic templates, something that has been in my own XL programming language since 2000). In C++, you are still supposed to do the boring work yourself.
For example, C++11 still lacks anything that would let you build solid reflection and introspection, or write a good garbage collector that doesn't need to scan tons of non-pointers.
If you want to extend C++, it's just too hard. C++11 managed to add complexity to the most inanely complex syntax of all modern programming languages. Building any useful extension on top of C++, like Qt's slots and signals, is exceedingly difficult. By contrast, Lisp has practically no syntactic constructs and is future proof. My own XL has exactly 8 syntactic elements in the parse tree.
So in my opinion, C and C++ are already left behind for a lot of application development these days because they lack a built-in way to evolve. If you are curious, this is a topic I explore more in depth under the "Concept programming" moniker, e.g. http://xlr.sourceforge.net/Concept%20Programming%20Presentat....
These are not problems of a language per se, but the original sins of neo-vaxocentrism and of confusing "I understand how this might work, at some random abstraction layer" with "I can depend on what happens when I do something stupid". Free your mind of these and the rest will follow.
These low-level bit banging errors are vastly less common than shared-memory concurrency issues, which as far as I can tell are endemic to all code that attempts shared-memory concurrency, in any language. If you want to have an axe to grind about languages that aren't future proof, look there.
Talking about C, well... it's unsafe by nature, let's face it.
C has been around since 1972, it is one of the most widely used programming languages of all time, and there are very few computer architectures for which a C compiler does not exist. Many later languages have borrowed directly or indirectly from C, including C#, D, Go, Java, JavaScript, Limbo, LPC, Perl, PHP, Python, and Unix's C shell.
Don't all languages have "don't do that" corners, even if they are just bugs in the current versions of the compilers/interpreters?
C and C++ at least tell you where some of these are, so actually the situation is better?
Laws are meant to be broken, and C/C++ is the wild west in this respect: cowboy programming is welcomed.
And I love it :)
omfg:))
Why is this on HN?
Use the right tool for the job. Sometimes that's C or C++, sometimes it's not.
I'm sure Dr. Regehr is a smart guy, but I don't consider academics good sources of advice on software engineering, for the same reason I don't get sex tips from Catholic priests. Also, John Regehr's CV relates more to static analysis than software engineering anyway.
Like leaving a string literal unterminated. You'd expect a compiler error, right? Nope, undefined.
And why should making an invalid pointer be undefined?
It becomes ridiculous to try to just remember the rules.
unsigned -> signed conversion is already “implementation-defined behavior” (as opposed to “undefined behavior”). The standard does not guarantee how it behaves but forces compilers to make a choice and to stick to it.
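If you want a conversion whose result does not depend on that implementation choice, one option is to do the wraparound by hand. A minimal sketch for 32-bit values (the name to_signed is hypothetical):

```cpp
#include <cstdint>

// Converts a uint32_t to int32_t with two's-complement wraparound,
// without relying on the implementation-defined out-of-range conversion.
std::int32_t to_signed(std::uint32_t u) {
    if (u <= static_cast<std::uint32_t>(INT32_MAX))
        return static_cast<std::int32_t>(u);  // value fits: conversion is exact
    // u is in [2^31, 2^32); the result is u - 2^32. UINT32_MAX - u is at most
    // INT32_MAX, so every step below stays in range.
    return -static_cast<std::int32_t>(UINT32_MAX - u) - 1;
}
```

Since C++20, a plain static_cast<std::int32_t>(u) is itself defined to wrap this way, so the helper mainly matters for earlier standards.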
A different example, of a behavior that is really undefined, would be signed arithmetic overflow:
int detect_max(int x) { return x+1 < x; }
The function above branchlessly detects that its argument is INT_MAX, and returns 1 in this case thanks to 2's complement representation.
Except that it doesn't. The command “gcc -O2” compiles it into “return 0;”. GCC can do this because signed arithmetic overflow is undefined behavior; the compiler is simply taking advantage of undefined behavior in a locally convenient way.
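To make the contrast concrete, a small sketch: the same idiom is valid for unsigned types, whose arithmetic is defined to wrap modulo 2^N, while for int the intent has to be written directly (detect_umax is a hypothetical name; detect_max follows the function above).

```cpp
#include <climits>

// Unsigned arithmetic wraps by definition, so the compiler may not fold
// this comparison to a constant: it is true exactly when x == UINT_MAX.
bool detect_umax(unsigned x) { return x + 1u < x; }

// Signed overflow is undefined, so state the intent without overflowing.
bool detect_max(int x) { return x == INT_MAX; }
```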
Now that two's complement is (almost) everywhere, making it the standard for signed arithmetic overflows is the sort of bold choice I would like to see, but it won't happen (it would break GCC's existing optimization).
What does the STL do about signed overflow? As for out of range pointers, that is an easy one to get with the STL:
vector<int> somevector(100);
somevector[200] = 5;
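For comparison, std::vector does offer a checked accessor: at() validates the index against size() and throws std::out_of_range, while operator[] does no check at all. A minimal sketch (try_store is a hypothetical helper):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Stores value at v[index] via at(), which throws std::out_of_range for a
// bad index instead of silently writing past the buffer as
// somevector[200] = 5 would. Returns false if the index was rejected.
bool try_store(std::vector<int>& v, std::size_t index, int value) {
    try {
        v.at(index) = value;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```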
"These are not problems of a language per se"
Yes they are: the default numeric type is fixed-width, pointers pop up all over the place and pointer dereferences are unchecked by default. Personally, though, I would have chosen (as the article's author did) the more severe deficiencies in the standard, like the lack of any requirement that a function with a non-void return type have a return statement along every control path or the fact that there is no reliable way to signal errors that occur in destructors.
"These low-level bit banging errors are vastly less common"
Not in my experience, and not judging by the number of bug reports and vulnerabilities I have seen that stem from low-level mechanics.
this:
vector<int> somevector(100);
somevector[200] = 5;
Is a C idiom translated by cut-and-paste. The unmotivated poking of arbitrary magic-number offsets into a magic-number sized vector is not proper. It's the kind of thing that sets off alarm bells on even the most casual of review.
This is not a matter of RAII or "smart" pointers (which are not even smart enough to deal with cyclic references unless the programmer explicitly breaks the cycle). It is a matter of a language whose high-level features are constrained by low-level concerns, where the features compose poorly, and where things that are obvious (like requiring that functions with a non-void return type have a return value along every control path) are simply omitted from the standard. It is dizzying to think of how much money has been spent on bugs that were made possible by C or C++; why are we continuing to waste time and money on these languages?
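The cyclic-reference limitation can be seen in a small sketch (the type and function names are illustrative). With shared_ptr in both directions, two nodes keep each other alive after every outside reference is gone; making the back link a weak_ptr breaks the cycle. The helper cleans up the leaked pair itself so the demo does not actually leak.

```cpp
#include <memory>

// Both links own their target: a two-node cycle keeps itself alive.
struct StrongNode {
    std::shared_ptr<StrongNode> next;
    std::shared_ptr<StrongNode> prev;
};

// The back link does not own its target, so there is no ownership cycle.
struct WeakNode {
    std::shared_ptr<WeakNode> next;
    std::weak_ptr<WeakNode> prev;
};

// Links a <-> b inside a scope and reports whether `a` outlived the scope.
template <typename Node>
bool survives_scope() {
    std::weak_ptr<Node> watch;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->prev = a;  // owning or non-owning, depending on Node
        watch = a;
    }
    bool alive = !watch.expired();
    // Break the cycle by hand (the programmer's job) so nothing leaks.
    if (auto a = watch.lock()) a->next.reset();
    return alive;
}
```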
We see this a lot with Ruby on Rails web apps, for instance. Perhaps they're quicker to develop in many cases, and maybe they're slightly less vulnerable to certain problems than C or C++ apps are, but they are much less efficient at runtime.
This inefficiency becomes visible when more hardware, or more powerful hardware, is needed to run such web apps. This inefficiency further becomes evident when users (it's worse when they're highly-paid employees) have to literally sit and wait for the web app to do its work. Over time, these costs can add up significantly.
It can be even worse for applications that are running on millions of systems. Even slight performance decreases can sum together to be very costly at such a scale.
C and C++ are still unmatched when it comes to producing efficient applications, both in terms of CPU usage and memory usage. Languages like Go and Rust may get close, but that'll be far in the future, if ever. I think it's safe to say that scripting languages like Perl, Ruby and Python will, in general, never be as efficient as C or C++.
Maybe you see C and C++ as a "waste of time and money", but they bring significant cost reductions for many of their users. That's why they're still being used today, and why they'll be used for a long time to come.
Citation needed. This is a wishful thinking idea that ignores tons of pragmatics in application development.
I'll consider it once the Mozilla Foundation has written any application of note in it.
Could they have used C/C++ for better performance? Yeah, but Java was GoodEnough, and for most high-performance web apps it is GoodEnough, the cost being some more money for hardware (for Ruby and Python the cost is a lot higher, and sometimes even expensive hardware can't solve the problems):
http://c2.com/cgi/wiki?GoodEnough
Just like C++ became GoodEnough and people switched to it from C for high performance apps, just like C became GoodEnough and we switched from assembly to C.
Then again, I also understand that because the language changes and evolves, maintaining more code in the docs means more effort. However, this is one of the few things that has bugged me as an end-user. Of course I believe this can be expected to change for the better in the future as the language matures and stabilizes, but I figured I'd voice it, given this perfect opportunity. :)
The fundamental "problem" we're having/facing with C and C++ is the investments we've put in. Lots of "infrastructure" in modern-day computing relies on C and C++ and will do so for ages. We can't just drop the projects and switch to something else (say Rust or maybe Go), but we can stop creating new C and C++ codebases to alleviate the problem in the future.
And I have better things to do than port them, because C++ doesn't bother me.
Otherwise known as the "just do it right" argument. This is an argument that goes all the way back to the days of writing everything in assembly language, and it was just as wrong then as it is today. If only a restricted subset of a language can ensure that basic issues do not become serious problems, then the language should be restricted to that subset.
"not having the compiler make up semantics for broken code or putting in checks everywhere"
Really? I would rather have the compiler put in run time checks whenever it cannot infer that no input will cause the program's behavior to be undefined. Thus, the compiler might insert a check here:
for(i = 0; i < input.length(); i++)
some_vector[i]++;
but not here:
for(i = 0; i < min(input.length(), some_vector.length()); i++)
some_vector[i]++;
nor here:
if(input.length() > some_vector.length()) {
throw some_exception();
}
for(i = 0; i < input.length(); i++)
some_vector[i]++;
At the very least, requiring bounds checks on array access would create a definition for out-of-bounds pointers: program termination (or perhaps an exception being thrown). A reasonably good compiler can detect when a bounds check is unnecessary and can remove the bounds check as an optimization. Why shouldn't this be something that compilers do? Out-of-bounds array access is never a good thing. (Oh, wait, you might be dereferencing some arbitrary pointer that you got by some means other than allocating memory with "new". OK, fine, but that is what type systems are for; this sort of separation is not unheard of, I see it in Lisp with SBCL's FFI.)
"The unmotivated poking of arbitrary magic-number offsets into a magic-number sized vector is not proper. It's the kind of thing that sets off alarm bells on even the most casual of review."
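The third snippet above, the one that needs no per-iteration check, can be written as runnable C++ roughly like this. It is a sketch: increment_prefix is a hypothetical name, and std::length_error stands in for some_exception. A compiler that hoists a bounds check out of a loop is doing essentially this transformation for you.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Validates the whole range once, so the unchecked operator[] in the loop
// is safe: every i is below input.size(), which is <= some_vector.size().
void increment_prefix(std::vector<int>& some_vector,
                      const std::vector<int>& input) {
    if (input.size() > some_vector.size())
        throw std::length_error("input is longer than some_vector");
    for (std::size_t i = 0; i < input.size(); ++i)
        some_vector[i]++;
}
```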
Perhaps so, but then the answer is not simply "just use the STL." As with most things C++, it requires a long list of things to make code work right, and even people who have been writing C++ code for many years are sometimes surprised to discover that something they thought was fine is actually bad. C++ makes it pretty easy for programmers to do the wrong thing and needlessly difficult to do the right thing, which is why years of expertise are needed to write remotely reliable C++ code.
That said, I don't understand why you still keep putting up awful mostly-C code as if any trained C++ programmer wouldn't yell at you for doing it wrong, even before they saw the part with the error.
for(i = 0; i < input.length(); i++)
Where did you learn this? Don't do this. Everyone else knows not to do this.
for(i = 0; i < min(input.length(), some_vector.length()); i++)
This is actually worse, though it does have the virtue of probably working. If you want a run-time check, use at(), or better still, use an iterator already.
C++ has all sorts of issues. It's too hard to learn, it's missing some very useful features, and it has a number of rough edges that you have to learn your way around. But the things being complained about in the OP and by you are not real problems for anything but beginners. There just aren't that many naked array accesses or pointer math operations going on in an ordinary C++ application written in non-C style.
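A rough sketch of the iterator style being recommended here (the function names are hypothetical): when the loop is driven by the container itself, there is no index to get out of range in the first place.

```cpp
#include <algorithm>
#include <vector>

// Range-based for: no index at all, so nothing can go out of range.
void increment_all(std::vector<int>& v) {
    for (int& x : v) ++x;
}

// The same with a standard algorithm over the container's own iterator pair.
void increment_all_algo(std::vector<int>& v) {
    std::for_each(v.begin(), v.end(), [](int& x) { ++x; });
}
```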
Even modern C++ has very unsafe parts.
Maybe, maybe not. Can you make a better example than arbitrary magic numbers used as pointers?
- IOC: low overhead, only for integer overflows
- KCC: high overhead, for all kinds of undefined behavior, limited standard library support (and source-level only)
- Valgrind: medium overhead, for various memory errors, works on binaries, may fail to detect undefined behaviors that have been made undetectable by compilation.
You may also find:
- various memory-safe C compilers. There are plenty; I had better let you do the googling. Medium overhead, generally better than Valgrind at being sound (since they work at source level), unless they trade soundness for efficiency: http://research.microsoft.com/pubs/101450/baggy-usenix2009.p... . May require all source code to be available.
- Frama-C's value analysis, a static analyzer that can be used as a C interpreter. This is what I work on. Limitations comparable to KCC, quite a bit faster (but still high overhead), some slightly different design choices. I do not have a good single write-up for this use, but some details are available at these URLs:
http://blog.frama-c.com/public/csmith.pdf
http://blog.frama-c.com/index.php?post/2011/08/29/CompCert-g...
http://embed.cs.utah.edu/ioc/ http://code.google.com/p/c-semantics/
Haven't used either in anger though.
You could also have the compiler insert checks. Obviously this isn't desirable for a lot of C projects by default, but (other than in places like kernel development etc.) it could be a nice debugging aid. I don't know of any good tools for doing this comprehensively.
Another behaviour he mentions is not properly returning at the end of a non-void function. This is again technically equivalent to the halting problem, but it is sidestepped by the good practice of making every code path (even potentially dead ones) end in a return statement (or throw an exception, etc.). Go takes this approach, if I remember correctly.
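A minimal sketch of that practice in C++ (the enum and function are made up for illustration). Every path ends in a return or a throw, so control can never flow off the end of the non-void function, which is undefined behavior in C++; GCC and Clang's -Wreturn-type warning can flag functions that miss a path.

```cpp
#include <stdexcept>

enum class Color { Red, Green, Blue };

// Every control path returns or throws.
int channel_index(Color c) {
    switch (c) {
        case Color::Red:   return 0;
        case Color::Green: return 1;
        case Color::Blue:  return 2;
    }
    // "Dead" for valid enumerators, but an out-of-range value cast to Color
    // can reach this point; reporting it beats falling off the end.
    throw std::invalid_argument("unknown Color");
}
```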
That's generally how compilers take advantage.
The C ABI is the operating system ABI, and it is only ubiquitous in operating systems written in C. Other operating systems use whatever ABI the system offers.
A good post describing how these optimizations come about is http://www.airs.com/blog/archives/120
More options to warn about uses of or disable these optimizations would be welcome in compilers.
I own a pneumatic nail gun that has two rather rudimentary safety features: a trigger lock and a switch near the end of the 'barrel' that prevents it from being activated without the barrel pressing against something (in normal use, the wood you're nailing). It's rudimentary, and accidents still happen with nail guns. I sometimes purposely circumvent the safety measures to get something done, e.g. when I'm shooting nails at a weird angle. It would be possible to think of many more safety features: allowing the gun to be operated only when activated with two hands (preventing one from shooting into one's hand), having all sorts of electronics that detect the surface being shot into, etc. Not a single one of those guns would sell, because they cripple the way you work too much to be convenient.
This is a false dichotomy. Safety checks can be disabled if they become a performance problem in languages like Lisp. Safety checks can often be removed by a good compiler when the compiler can infer that the check will always be satisfied.
C++, however, provides nothing by default -- as opposed to being safe by default, and allowing programmers to be unsafe if they explicitly request that.
There are times, however, when an experienced, knowledgeable user does need the power these features provide. Amazing things can be accomplished that couldn't otherwise be done when using a language like Java, C#, Ruby, Python, Perl, Go, Haskell, or Scheme.
The compilers are now starting to rewrite the original code fairly radically, in ways the author would not recognize, simply because some undefined behaviour exists within the code. You need to be increasingly language-lawyerly to avoid the compiler outsmarting you, almost as if it were a hostile opponent.
The read of an uninitialized variable in the article was a good example.
The problem is that programmers have a mental model of how the C they write turns into machine code, and that model is increasingly out of date in the search for more performance. The compiler is becoming less predictable, in precisely the way that we argued against "sufficiently smart compilers" in the past for languages at a higher level than C: that you wouldn't be able to predict when the smart compiler was smart enough to optimize your high-level construct. Now you're increasingly unable to predict what the compiler will turn your code into, unless you have a deeper understanding of the rules.
Same with other high-level VM based languages like Python...
For an impression, see this excellent blog series on how a certain garbage collector sweep issue was solved: http://blog.cloudera.com/blog/2011/02/avoiding-full-gcs-in-h...
There are undefined sequences even in Python, where Jython and CPython produce different output for the same program.
One of the major design decisions for C/C++ is that you don't pay for what you don't use. This is what makes them so flexible and performant across a wide range of systems and applications, but also leaves these safety choices up to the user. Some languages make that tradeoff, but it's not always the right decision.
For example, in the Pascal family of languages, you can always disable bounds checking or do pointer arithmetic if you really want to, but that should only be done if there is really the need to do so.
A problem with many C and C++ developers is that they suffer from premature optimization, thinking that we are still targeting PDP-11 like environments.
My core point is that the OP has a theory about there being a school of C programmers that intentionally or unintentionally invoke undefined behavior and expect the compiler to do the right thing. He's doing a pretty good job of backing it up, although I'm not sure I understand what exactly he's proposing to do about it.
... And then he just kind of throws C++ in for the ride, presumably on the argument that C++ is just like C with even more cases of undefined behavior. But that's not correct, because he's making both a technical and a cultural argument. While C++ is technologically (mostly) a superset of C, the culture is completely different, to the extent that Linus famously argued that the main advantage of using C is that it keeps all the C++ programmers out. http://article.gmane.org/gmane.comp.version-control.git/5791...
Across the widely fragmented C++ user base, there are multiple popular methods of development that encourage true high-level development, where you are encouraged to target your code to the abstract, portable machine that the standard describes rather than your personal guess of how the compiler should work, and to avoid doing things that require inordinate care to get right.
Again, C++ is full of practical problems, but the kind of undefined behavior cases the OP worries about don't really rank among them.
For the same reason we can't ever completely prevent traffic accidents by requiring higher skilled drivers. We can prevent traffic accidents by building cars, lanes, junctions and roads in such way which minimizes the damage caused by a human error.
I'd rather use a hammer that refuses to strike my finger even if I try to make it, than one I can smash my fingers with by accident. I am sure you would too.
int foo = 1;
int bar;
System.out.println(foo + bar); // compile error: variable bar might not have been initialized