C considered dangerous (lwn.net)
I'm confused by this. The third argument provides the destination length, so what good would a "maximum destination length" do? I guess he must mean that because the length is often computed, you'd need a fourth argument to ensure the length isn't greater than some sane upper bound. But you can easily fix that using an if statement around the memcpy.
Maybe memcpy_oobp (out of bounds protection) signature could be:
memcpy_oobp(void* dst, size_t dst_size, void* src, size_t src_size);
Then again, I guess you could just as well do: memcpy(dst, src, min(dst_size, src_size));
But having to explicitly specify both destination and source sizes might have prevented a lot of buffer overwrite bugs.
A good way to prevent this is to have a buffer abstraction, where the size is a property of the type, e.g.,
typedef struct {
size_t bytes_used;
size_t capacity;
void *data;
} buf_t;
int buf_init(buf_t *buf);
void buf_cleanup(buf_t *buf);
void buf_copy(buf_t *dst, buf_t *src);
/* ... */
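A minimal sketch of how those three declarations might be filled in. The fixed starting capacity (BUF_DEFAULT_CAPACITY) and the choice to truncate silently in buf_copy are assumptions for illustration; the comment above only gives the struct and the prototypes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the struct in the comment above. */
typedef struct {
    size_t bytes_used;
    size_t capacity;
    void *data;
} buf_t;

/* Hypothetical starting capacity; the original does not specify one. */
enum { BUF_DEFAULT_CAPACITY = 64 };

/* Returns 0 on success, -1 if the allocation fails. */
int buf_init(buf_t *buf)
{
    assert(buf != NULL);
    buf->data = malloc(BUF_DEFAULT_CAPACITY);
    buf->bytes_used = 0;
    buf->capacity = buf->data ? BUF_DEFAULT_CAPACITY : 0;
    return buf->data ? 0 : -1;
}

/* Frees the storage and resets the fields so a stale buf_t is harmless. */
void buf_cleanup(buf_t *buf)
{
    assert(buf != NULL);
    free(buf->data);
    buf->data = NULL;
    buf->bytes_used = 0;
    buf->capacity = 0;
}

/* Copies at most dst->capacity bytes. Truncating silently is an assumption;
   returning an error or growing dst would be equally valid designs. */
void buf_copy(buf_t *dst, buf_t *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = src->bytes_used < dst->capacity ? src->bytes_used : dst->capacity;
    if (n > 0)
        memmove(dst->data, src->data, n); /* memmove tolerates dst == src */
    dst->bytes_used = n;
}
```

The point is that the caller never passes a length: buf_copy reads both sizes from the structs, so a miscomputed third argument cannot happen.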
Of course, it doesn't prevent people from using memcpy directly.
memcpy_s(void *dest, size_t destSize, const void *src, size_t count);
which is effectively equivalent to your memcpy_oobp function.
However, the Microsoft function also returns an error code which must be checked (because count might be larger than destSize), thus providing another way for the programmer to screw up. I'm not sure if this is better or worse than just copying the min() as in your second example. It probably depends on the situation.
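To make the trade-off concrete, here is a rough sketch of the two styles side by side. Both helpers (copy_checked and copy_truncating) are hypothetical names made up for this example; neither is Microsoft's memcpy_s nor claims to match its exact behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: refuses the copy and returns -1 if count does not
   fit in the destination. The caller has to check the return value. */
int copy_checked(void *dst, size_t dst_size, const void *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    if (count > dst_size)
        return -1;
    memcpy(dst, src, count);
    return 0;
}

/* Hypothetical helper: copies min(dst_size, src_size) bytes and returns
   how many bytes were copied. Never overruns, but may silently truncate. */
size_t copy_truncating(void *dst, size_t dst_size, const void *src, size_t src_size)
{
    assert(dst != NULL && src != NULL);
    size_t n = dst_size < src_size ? dst_size : src_size;
    memcpy(dst, src, n);
    return n;
}
```

The checked version turns an overrun into an error the caller can forget to handle; the truncating version cannot overrun but can hide data loss. Which failure mode is less bad depends on the caller.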
And yes, having something like "if (strlcat(buffer, src, sizeof(buffer)) >= sizeof(buffer)) { abort(); }" is much better than a buffer overrun. But security does not always seem to be a real concern, compared to politics.
C is dangerous partly because of swaths of undefined behaviour and loose typing. Eliminating much of the undefined behaviour, either by defining the behaviour or by forcing the compiler to refuse to compile it, could be of some help. There are still classes of undefined behaviour that cannot be worked around, but narrowing them down to a minimal set would make them easier to deal with. Strong typing would help build programs that won't compile unless they are correct, at least in terms of the types of values.
C is dangerous partly because of the stupid standard library, which isn't necessarily a core language problem, as other libraries can be used. The standard library should be replaced with any of the sane libraries that different projects have written for themselves to avoid using libc. It's perfectly possible not to have minefields like memcpy() or strcpy(), or functions like strtok(), which introduces the nice invisible access to internal static storage (fixed by a re-entrant variant like strtok_r()), or strtol(), which requires you to do multiple checks to determine how the function actually failed. The problem here is that if there are X standards, adding one to replace them all will make it X+1 standards.
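For anyone who hasn't run into the "multiple checks" problem with strtol(), a small sketch of what a careful caller ends up writing. parse_long is a hypothetical wrapper name; the strtol(), errno and ERANGE behaviour it relies on is standard C.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns 0 and stores the value on success,
   returns -1 on any of the three distinct failure modes. */
int parse_long(const char *text, long *out)
{
    assert(text != NULL && out != NULL);
    char *end = NULL;
    errno = 0;                       /* strtol only sets errno, never clears it */
    long value = strtol(text, &end, 10);
    if (end == text)                 /* check 1: no digits were consumed */
        return -1;
    if (errno == ERANGE)             /* check 2: value did not fit in a long */
        return -1;
    if (*end != '\0')                /* check 3: trailing junk after the number */
        return -1;
    *out = value;
    return 0;
}
```

Skipping any one of the three checks still compiles and mostly works, which is exactly the kind of API the comment is complaining about.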
Yet, good programmers already avoid 99% of the problems by manually policing themselves. For them, C is simple, productive, and manageable in a lot more cases and domains than it is for the less experienced programmers.
I really wish Bell Labs had been allowed to sell UNIX.
Even if the complete userspace of AIX, HP-UX, *BSD, GNU/Linux, OS X, iOS, Solaris, ... gets rewritten in something else, there will always be the kernel written in C.
Hence why improving C's lack of safety is so important to get a proper IT stack.
I've always felt that C is near the sweet spot. I'd rather see a minimal change to C that broke backwards compatibility (because it has to) and fixed the top ten simple problems.
Kernel drivers and embedded system bare metal firmware.
The problem with C is that in any bigger project something always slips through even the best programmers, reviewers, static analysis and unit tests. And that something can lead to disastrous crashes and security vulnerabilities.
You can still overwrite memory but it suddenly became much less likely.
That's also true of all the other languages.
C is a very useful language and one you basically have to know if you're interested in low level software but it's very, very far from flawless.
If you look at many high profile software vulnerabilities of late (heartbleed, goto fail, etc...) many can be traced to the lack of safety and/or bad ergonomics of the C language.
We need to grow up as an industry and accept that using a seatbelt doesn't mean that you're a bad driver. Shit happens.
This doesn't actually refute the assertion that C is dangerous :)
Control and increased safety are not mutually exclusive. I'll take safe-by-default, unsafe-when-asked any day. It's not 1972 anymore.
Just imagine how many millions the IT industry and PhD research have spent developing solutions that would improve C's safety, many of them largely ignored by most C developers.
I'd wager it'd be much better to just specify that abort() gets called in the "overflow" case. (Given that overflow is basically never what you want anyway.)
Yeah, it'll crash, but at least it won't be surprising/undefined behavior.
Yeah, bare metal systems often don't allocate at all. Although one sin they often do commit is using same buffer for multiple purposes. What could go wrong...
Perhaps even more common is allocating a buffer on stack and writing past bounds somehow. Also DMA to/from stack is usually not a great idea...
Above things sound dumb, but can easily happen when you build your abstraction layers and use them carelessly.
wait what oh my god
That said, sometimes I'm shocked what kind of disasters get past the analyzers.
Stakes are higher than ever. It's not just about functional correctness and avoiding crashes anymore. Your code needs to be secure against outside world malicious actions. Getting rid of counterintuitive security vulnerabilities is very, very hard.
Sadly, we are a very, very tiny percentage, as proven by Herb Sutter's question to the audience at CppCon (1% of the audience answered positively) and by the frequent CVE updates.
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=memory+corr...
Good C-compilers will most of the time take care of the superscalar CPU friendliness. When they don't, you can always drop down to the assembler level, and it'll mesh well with C.
Likewise most static languages defer to the compiler for CPU-specific performance optimisations and will permit foreign native calls into C or ASM where necessary. So I don't see how this is an argument in C's favour.
You often also need correct alignment. Cache-line or page. Your unboxed access across two pages can cause two TLB misses, L1 misses etc. Not to mention two page faults.
Sometimes you need to ensure two (or more) buffers are NOT aligned in a particular way to avoid interfering with CPU caching mechanisms.
C wins them all in implicit conversions and opportunities for memory corruption.
Their major sin was to be tied to commercial OSes, instead of one with source code available for a symbolic price to universities.
Of course you can shoot yourself in the foot with stuff like metatables in Lua and Python metaclasses and whatnot. Then again, you should see some of the C macro messes around...
Anyway, I don't like it when people defend C with the age-old argument that it just requires a clever, disciplined programmer who never makes mistakes. Either such programmers don't exist or they're very rare.
Fewer defects, or just different (arguably less severe) defects? It's great that you're sure, but evidence would be even better.
Scripting languages do have their pitfalls. Lua and Python can have type mismatches and even typos causing misbehavior, things that usually aren't issues with C.
However, you do need significantly less code than in C.
C and C++ = Logic Errors + Memory Corruption + UB
From this point of view,
Σ Logic Errors < Σ (Logic Errors + Memory Corruption + UB)
#include <stdlib.h>
void *aligned_alloc(size_t alignment, size_t size);
which works like malloc() but lets you specify the required alignment.
Something like this for example:
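/* needs <stdint.h> for uintptr_t and <stdlib.h> for malloc */
char* aligned_buf;
char* buf;
size_t max_align_offset = ((size_t)1 << align) - 1;
buf = malloc(length_needed + max_align_offset);
/* bitwise AND is not defined on pointers, so round up via uintptr_t */
aligned_buf = (char*)(((uintptr_t)buf + max_align_offset) & ~(uintptr_t)max_align_offset);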
In the example, if align==8, you have 256 byte alignment. If it's 12, 4kB alignment.
It was just an example to show one way how C can control alignment.
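For comparison with the manual pointer arithmetic, a small sketch using C11 aligned_alloc() instead. alloc_aligned is a hypothetical wrapper name; it rounds the size up to a multiple of the alignment, since C11 as originally published required that and some implementations still expect it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper around C11 aligned_alloc(). The alignment must be a
   power of two that the implementation supports. Returns NULL on failure.
   Release the result with free(). */
void *alloc_aligned(size_t alignment, size_t size)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Round size up to the next multiple of alignment. */
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    if (rounded < size)              /* the rounding wrapped around */
        return NULL;
    return aligned_alloc(alignment, rounded);
}
```

Unlike the malloc() version, the returned pointer is the one you pass to free(), so there is no need to keep the unaligned original around.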