Git's list of banned C functions (github.com)
/joke
It should be strncpy(a,b,(size_t)-1)!
- strcpy: no bounds check
- strcat: no bounds check
- strncpy: does not nul-terminate on overflow
- strncat: no major issues, probably to force usage of strlcat
- sprintf: no bounds check
- vsprintf: no bounds check
- gmtime: returns static memory
- localtime: returns static memory
- ctime: no bounds check
- ctime_r: no bounds check
- asctime: returns static memory
- asctime_r: no bounds check
The str functions all have safer alternatives. The time functions have reentrant alternatives, and/or alternatives that provide a bounds check. (~~Technically~~ Optionally, C11 has strcpy_s and strcat_s, which fail explicitly on truncation. So if C11 is acceptable for you, that might be a reasonable option, provided you always handle the failure case. Apparently, though, they are not usually implemented outside of the Microsoft CRT.)
edit: Updated notes regarding C11.
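To make the strncpy entry in the list concrete, here is a minimal sketch of the pitfall next to a bounded copy built on snprintf. The helper names copy_checked and strncpy_leaves_terminator are made up for illustration; they are not libc or Git functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copies src into dst (capacity dst_size), always NUL-terminating.
 * Returns 0 if everything fit, -1 if the result was truncated.
 * copy_checked is a hypothetical helper name. */
static int copy_checked(char *dst, size_t dst_size, const char *src)
{
    int needed;

    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return -1;
    needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;   /* truncated, or snprintf reported an error */
    return 0;
}

/* Demonstrates the strncpy pitfall with a 4-byte destination.
 * Returns 1 if the destination ended up NUL-terminated, 0 if not. */
static int strncpy_leaves_terminator(const char *src)
{
    char dst[4];

    assert(src != NULL);
    memset(dst, 'x', sizeof dst);
    strncpy(dst, src, sizeof dst);   /* no '\0' if src has 4+ characters */
    return memchr(dst, '\0', sizeof dst) != NULL;
}
```

With a source of four or more characters the second function returns 0: the destination is full and unterminated, which is exactly the case a reader has to hunt for.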
But strings in BASIC are so simple. They just work. I decided when designing D that it wouldn't be good unless string handling was as easy as in BASIC.
"Theoretically" is the word you're looking for: they're part of the optional Annex K so technically you can't rely on them being available in a portable program.
And they're basically not implemented by anyone but microsoft (which created them and lobbied for their inclusion).
C is semantically so poor, I find it hard to understand why people use it for new projects today. C++ is over complicated but at least you can find a good subset of it.
Strings in C are more like a lie. You get a pointer to a character and the hope there is a null somewhere before you hit a memory protection wall. Or a buffer for something completely unrelated to your string.
And that's with ASCII, where a character fits inside a byte. Don't even think about UTF-8 or any other variable-length character representation.
In fairness, the moment you realize ASCII strings are a tiny subset of what a string can be, you also understand why strings are actually very complicated.
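A small sketch of why byte length and character count diverge under UTF-8. It assumes the input is already valid UTF-8, and utf8_codepoints is a hypothetical helper, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts code points in a NUL-terminated UTF-8 string by skipping
 * continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8;
 * utf8_codepoints is a hypothetical helper name. */
static size_t utf8_codepoints(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For a string holding "h", a two-byte "é", then "llo", strlen reports 6 bytes while this reports 5 code points. Even that is not the number of user-visible characters once combining marks are involved.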
One of the big problems with C programmers is they often neglect to check for and handle those failure cases. Did you know that printf() can fail, and has a return value that you can check for error? (Not you, personally, but the "HN reader" you) Do you check for this error in your code? Many of the string functions will return special values on error, but I frequently see code that never checks. Unfortunately, there isn't a great way to audit your code for ignored return values with the compiler, as far as I know. GCC has -Wunused-result, but it only outputs a warning if the offending function is attributed with "warn_unused_result".
I'm not a huge fan of using return values for error checking, but we have the C library that we have.
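For what it's worth, the attribute mentioned above can be applied to your own functions. A minimal sketch, assuming GCC or Clang; MUST_CHECK and write_line are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>

/* GCC and Clang extension: callers that drop the return value get a
 * -Wunused-result warning. MUST_CHECK is a hypothetical macro name. */
#if defined(__GNUC__)
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define MUST_CHECK
#endif

/* Writes msg plus a newline to out.
 * Returns 0 on success, -1 if fprintf reported a failure. */
MUST_CHECK static int write_line(FILE *out, const char *msg)
{
    assert(out != NULL && msg != NULL);
    if (fprintf(out, "%s\n", msg) < 0)
        return -1;
    return 0;
}
```

A bare `write_line(stdout, "hi");` statement then produces a warning, while `if (write_line(stdout, "hi") != 0) { ... }` does not. This only covers functions you annotate yourself, which is the limitation described above.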
But also, "strings" and "time" are actually very complex concepts, and these functions operate on often outdated assumptions about those underlying abstractions.
C99 came so very very close with VLAs. You can declare a function like:
int main(int argc, char *argv[argc]) { ... }
But C99 requires the compiler to discard the type annotations and treat the declaration as equivalent to: int main(int argc, char **argv) { ... }
Imagine a world where the C string functions were declared as:
char *strndup(s, n)
const char s[n];
size_t n;
{
/* now we can do sizeof(s) and bounds checking! */
}
(You'd have to use K&R style declarations to get around the fact that the pointer argument comes before the length argument, alas.)
Edit: and then C11 made VLA support optional, since the feature didn't get used much, because the feature was only half-baked to begin with... sigh.
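There is one corner of C99 where the length does survive: a pointer to a variable-length array. A minimal sketch, assuming a compiler with VLA support (optional since C11); fill_buffer is a hypothetical example function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C99 pointer-to-VLA parameter: a parameter written `char s[n]` is
 * adjusted to `char *`, but `char (*buf)[n]` keeps the length in the
 * type, so `sizeof *buf` evaluates to n at run time.
 * fill_buffer is a hypothetical example. */
static size_t fill_buffer(size_t n, char (*buf)[n], char value)
{
    assert(buf != NULL);
    memset(*buf, value, sizeof *buf);
    return sizeof *buf;
}
```

A caller with `char a[10];` writes `fill_buffer(sizeof a, &a, 'z')`. The catch is that the compiler is not required to diagnose a mismatch between n and the real array size, so this documents intent more than it enforces it.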
Even in safer languages such as Rust, there are often questions as to why certain string operations are either impossible or need to be quite complicated for a rather simple operation, and these are then met with responses such as “Did you know that the length of a string can grow from a capitalization operation depending on locale settings of environment variables?”
P.s.: In fact, I would argue that strings are not necessarily all that complicated, but simply that many assume that they are simpler than they are, and that code that handles them is thus written on such assumptions that the length of a string remain the same after capitalization, or that the result not be under influence of environment variables.
I remember thinking about setting the high bit to denote the end of string to save space.
Nowadays the binary for "hello world" might be as big as a whole operating system of the past.
(though honestly I can't recall the size of the OS on a boot floppy, but the original floppies were 160k)
Funny mind thing to forget to increment counters each year.
It has nothing to do with null termination.
And that uninitialized memory is not self-describing in any way in the C language. Which is that way in machine language also.
This is a problem you have to bootstrap yourself somehow if you are to have any higher level language.
The machine just gives you a way to carve out blocks of memory that don't know their own type or size. C doesn't improve on that, but it is not the root cause of the situation. Without C, you still have to somehow go from that chaos to order.
Copying two null terminated strings into an existing null-terminated string can be perfectly safe without any size parameters.
void replace_str(char *dest_str, const char *src_left, const char *src_right);
If dest_str is a string of 17 characters, we know we have 18 bytes in which to catenate src_left and src_right. This is not very useful though.
Now what might be a bit more useful would be if dest_str had two sizes: the length of string currently stored in it, and the size of the underlying storage. This particular operation would ignore the former, and use the latter. It could replace a string of three characters with a 27 character one.
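The two-size idea can be sketched like this. The struct and function names are hypothetical, and the sketch assumes left and right do not overlap the destination storage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string type that tracks both the current length and
 * the capacity of the underlying storage. */
struct sized_str {
    char  *data;
    size_t len;   /* bytes currently used, excluding the '\0' */
    size_t cap;   /* total bytes available in data, including '\0' */
};

/* Replaces dest's contents with left followed by right, using the
 * capacity rather than the current length as the limit.
 * Returns 0 on success, -1 (dest unchanged) if it would not fit. */
static int sized_str_replace(struct sized_str *dest,
                             const char *left, const char *right)
{
    size_t l, r;

    assert(dest != NULL && left != NULL && right != NULL);
    l = strlen(left);
    r = strlen(right);
    if (l >= dest->cap || r >= dest->cap - l)
        return -1;   /* need l + r + 1 bytes in total */
    memcpy(dest->data, left, l);
    memcpy(dest->data + l, right, r);
    dest->data[l + r] = '\0';
    dest->len = l + r;
    return 0;
}
```

With 8 bytes of storage holding "abc", replacing it with "hello" and "!!" succeeds (7 characters plus the terminator), while "hello" and "!!!" is rejected without touching the buffer.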
"What? You mean I can type an arbitrary string and it works? I don't need to worry about terminators or the amount of memory I've allocated? You can concatenate two strings with +?!? What is this magic?"
I still love C, but I'd do my best not to have to write anything serious with it again.
Compared to the alternative (straight assembler) at the time as a systems programming language, C is a massive step up.
Also, the UNIX way was independent processes, so the APIs did not need to be thread safe, as there was no threading in the target architectures.
Now given the massive amount of existing C out there from the time of such architectures, you either have to move the API and language on to make it incompatible with existing code, or support the old baggage. The language has kept compatibility, and in this case, the github peeps have deprecated APIs using macros, so it's a reasonable approach.
An alternative approach would be to move the language on, but by its nature it won't be compatible with C, so you give it a new name. You call it things like go, or rust, or swift. These are all C with the dangerous bits removed. It'll be interesting in 40 years time to see if people are having the same conversation about these languages - 'OMG, how did people write stuff in rust? It can't cope with [insert feature of distributed quantum computing]. It's really scary'
I've been coding in JS on a daily basis for more than 10 years and today I learned there is a `with` statement in JS.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
Edit: well, seems like it's been deprecated/forbidden since ES5 (2009), so it makes sense I've never seen it.
Also, I just want to remind you that JS isn't just React. There are plenty of libraries written in C that introduce breaking changes over the course of 3 years. Nothing will stop people from finding ways to complain about JS though, I know. The hate-boner is very real.
I think stuff has kinda gotten better, but while Unicode had emoji to kinda save the day, dates never had this moment and we're still suffering through major messes on a daily basis because of it.
C's string manipulation functions are a regular source of the worst vulnerabilities in software.
Even if they're in the same category of legacy cruft, they're not even remotely in the same magnitude of consequences.
It's absolutely true that decades ago the C community was complacent, but it's not true now. Source: I taught secure coding in C/C++ in the 00s.
On BSDs and macOS you're always SOL because the syscall api isn't stable and only the C wrappers are.
It's easy to survive: just don't crash. :)
And, functions aside, it's trivial to write a C program that bombs out without calling any functions at all, safe or otherwise.
It's a language from a different era, for sure. Back then no one had the computing power to build Rust. And remember that before C, they were writing Unix in assembly language. So sprintf() was a big step up!
Why can't we just have some nice structures instead?
struct memory {
size_t size;
unsigned char *address;
};
enum text_encoding { TEXT_ENCODING_UTF8, /* ... */ };
struct text {
enum text_encoding encoding;
struct memory bytes;
};
All I/O functions should use structures like these. This alone would probably prevent an incredible amount of problems. Every high-level language implements strings like this under the hood. Only reason C can't do it is the enormous amount of legacy code already in existence...
1. https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...
Recall that during the rise of C, people were writing machine code on punch cards. Assembly -> Machine code has far more footbullets than C, it is a tradeoff between hand holding and tiny fast code.
Wow, this blew up.
To all the people popping off about how great other languages are, tell me: when will we see the Unreal Engine written in Python, or Pascal, or Algol, or Rust, or Go... the next big step is WebASM (or .cu), and that's way more footbullet-y than C. And what is the native language all of your sub-30 year old interpreted languages were written in? Thank you!
I know C/C#/Python/Rust/Javascript.
After a decade of using C I am still not totally sure if I didn't dangle a pointer somewhere in precisely the wrong way to create havoc. And yeah, that means I have to get better, etc. But that is not the point. The point is that even with a lot of experience in the language you can still easily shoot yourself in the foot and not even notice it.
Meanwhile, after a month of using Rust I felt confident that I didn't shoot myself in the foot, because I know what the compiler guarantees (e.g. ownership). While in C shooting myself in the foot happens quite often, in Rust I would have to specifically find a way to shoot myself in the foot without the compiler yelling at me, and quite frankly I haven't found such a way yet.
Javascript is odd, because the typesystem has quite a few footguns in it. This is why such things like Elm or Typescript exist: to avoid these footguns.
I don't want to take away from the accomplishments of C, and I still like the language, but to claim it is equally likely in all languages to shoot yourself in the foot is not true.
The “C competing with assembly” meme was very specific to microcomputer game and operating system development, not more general microcomputer application development, and not to minicomputer or mainframe development.
Or Fortran, Algol, Lisp, Cobol, Basic, Pascal, ...
He had some bug where in one place it returned to the start of the string, executed it, and kept going. The end result just happened to be a nop. Had been like that in production for a couple of years.
The reason why C won had little to do with its advantages as a language over the competitors. It just happened to be the systems language for Unix, which was the winner in the early OS wars on microcomputers (for unrelated reasons). Once it became so established, there was a positive feedback loop: you would write portable code in C, because you knew that it was the fastest language that most platforms out there would support. And then any new platform would offer a C compiler, because they wanted to be able to run all the existing C code out there. And so, here we are.
Those of us who have always known about less dangerous 'system' languages (Pascal probably being the most popular) lament the fact that so much code got written in C instead.
It wasn't inevitable. It was preventable! It just didn't happen that way for reasons which are largely historical.
I don't work for the Rust Evangelism Strike Force, my main project is written in (as little) C (as possible), but I beg anyone who has a choice: use something else! Rust is... fine, Zig is promising. Ada still works!
Writing out the set {Python, Pascal, Algol, Rust, Go} tempts me to say uncharitable things about your understanding of the profession, but I accept you were just being snarky so I'll just gesture in the direction of how $redacted that is.
Why would a huge C++ (not C, btw) codebase with roots going back to the 90s be rewritten in any other language?
And in fact how is the language Unreal Engine written in relevant to C having footguns?
Go (golang)
It is not that there is anything intrinsically wrong with these functions. You can technically use all of them and I have been using all of them, safely, for decades.
The issue is they are huge traps to the point that in a larger piece of software one can say "well, it's just not worth it".
You can go much, much, much further than that.
In a couple of embedded projects I worked on, some of the rules were:
* dynamic allocation after application has started is banned -- any heap buffers and data structures must be allocated at the start of the application and after that any allocation is a compile time error,
* any constructs that would prevent statically calculating stack usage were banned (for example any form of recursion except when exact recursion depth is ensured statically),
* any locks were banned,
* absolutely every data structure must have size ensured, in a simple way, beyond any reasonable doubt,
etc.
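The "no allocation after startup" rule usually ends up looking something like the fixed pool below. This is only a sketch: the names, slot count and payload size are made up, and it is neither thread-safe nor interrupt-safe as written.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-capacity pool: all storage is reserved at compile time, so
 * nothing touches the heap after startup. Names and sizes are
 * hypothetical, chosen only for illustration. */
#define POOL_SLOTS 4

struct message {
    unsigned char payload[32];
    size_t        length;
};

static struct message pool[POOL_SLOTS];
static unsigned char  pool_used[POOL_SLOTS];

/* Returns a free slot, or NULL when the pool is exhausted. */
static struct message *pool_acquire(void)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            pool[i].length = 0;
            return &pool[i];
        }
    }
    return NULL;
}

/* Hands a slot obtained from pool_acquire back to the pool. */
static void pool_release(struct message *m)
{
    assert(m >= pool && m < pool + POOL_SLOTS);
    pool_used[m - pool] = 0;
}
```

Worst-case memory use is visible in the map file, and exhaustion is an explicit NULL the caller has to handle rather than fragmentation that shows up weeks later.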
> The ctime_r() and asctime_r() functions are reentrant, but have no check that the buffer we pass in is long enough (the manpage says it "should have room for at least 26 bytes"). Since this is such an easy-to-get-wrong interface, and since we have the much safer strftime() as well as its more convenient strbuf_addftime() wrapper, let's ban both of those.
(https://github.com/git/git/commit/91aef030152d121f6b4bc3b933...)
> The traditional gmtime(), localtime(), ctime(), and asctime() functions return pointers to shared storage. This means they're not thread-safe, and they also run the risk of somebody holding onto the result across multiple calls (where each call invalidates the previous result). All callers should be using their reentrant counterparts.
(https://github.com/git/git/commit/1fbfdf556f2abc708183caca53...)
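What the two commit messages recommend amounts to something like this sketch: a reentrant conversion into caller-owned storage, followed by strftime with an explicit buffer size. It assumes a POSIX system for gmtime_r; format_utc is a hypothetical helper, not Git's strbuf_addftime().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for gmtime_r */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Formats t as UTC "YYYY-MM-DD HH:MM:SS" into out (capacity out_size).
 * Returns the number of bytes written (excluding the '\0'), or 0 if
 * the conversion failed or the buffer was too small.
 * format_utc is a hypothetical helper name. */
static size_t format_utc(time_t t, char *out, size_t out_size)
{
    struct tm tm;

    assert(out != NULL);
    if (gmtime_r(&t, &tm) == NULL)   /* caller-owned struct, no shared state */
        return 0;
    return strftime(out, out_size, "%Y-%m-%d %H:%M:%S", &tm);
}
```

Unlike ctime_r() or asctime_r(), the buffer size is passed in, so a buffer that is too small gives a 0 return instead of an overflow.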
If someone wants some fun, try this:
1. Slurp up all the FOSS projects that extend back to 90s or early 2000s.
2. Filter by starting at earliest snapshot and finding occurrences of strcpy and friends who don't have the "n" in the middle.
3. For those occurrences, see which ones were "fixed" by changing them to strncpy and friends in a later commit somewhere.
4. See if you can isolate that part of the code that has the strncpy/etc. and run gcc on it. Gcc-- for certain cases (string literals, I think)-- can report a warning if "n" has been set to a value that could cause an overflow.
I'm going to speculate that there was a period where C programmers were furiously committing a large number of errors to their codebases because the "n" stands for "safety."
If you are doing something like `sprintf(buffer, "%f, %f", a, b)`, yes it is tricky to choose the size of buffer frugally, but if you replace that by `ftoa` and constructing the string by hand, you are likely to introduce more bugs.
Edit: as pointed out in another post, you can do git blame to see the rationale for each ban, quite interesting.
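For the `"%f, %f"` case, one common way to avoid guessing the buffer size is to let snprintf measure first and allocate afterwards. A sketch, with format_pair as a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats two doubles as "a, b" into a freshly allocated string that
 * is sized by a first, measuring snprintf call. Returns NULL on
 * failure; the caller frees the result.
 * format_pair is a hypothetical helper name. */
static char *format_pair(double a, double b)
{
    int needed = snprintf(NULL, 0, "%f, %f", a, b);
    char *out;

    if (needed < 0)
        return NULL;
    out = malloc((size_t)needed + 1);
    if (out == NULL)
        return NULL;
    if (snprintf(out, (size_t)needed + 1, "%f, %f", a, b) != needed) {
        free(out);
        return NULL;
    }
    return out;
}
```

This costs a second formatting pass and a heap allocation, so it does not suit every code base, but it removes the need to reason about how wide a `%f` can get.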
std::string needs some tweaks, but it can mostly be treated as a built in and it wipes out a huge set of C string issues.
However, I look at old books on C, and then I look at this list, and I wonder if it would not have been helpful to, after mentioning that a function was banned, suggest what the replacement is, even as a comment.
It's likely that the authors of this list didn't think the comments would be worthwhile for the audience (git developers).
- strlcpy() if you really just need a truncated but
NUL-terminated string (we provide a compat version, so
it's always available)
- xsnprintf() if you're sure that what you're copying
should fit
- strbuf or xstrfmt() if you need to handle
arbitrary-length heap-allocated strings
"#pragma GCC poison printf sprintf fprintf
Turns out you just can't use them when you contribute code to the Git project. That makes sense, and seems reasonable.
Edit: wait, I can't use strcpy?! Screw that, then I'm not open sourcing my AGI!
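The compile-time ban can be sketched with plain macros, as an alternative to `#pragma GCC poison`. This follows the macro approach mentioned elsewhere in the thread; the exact contents of Git's header may differ, and set_name is a hypothetical caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>   /* include system headers before banning anything */

/* Any use of a banned function expands to an undeclared identifier
 * whose name explains the compile error. */
#define BANNED(func) sorry_##func##_is_a_banned_function

#undef strcpy
#define strcpy(dst, src) BANNED(strcpy)
#undef sprintf
#define sprintf(...) BANNED(sprintf)

/* Allowed replacement: bounded and always NUL-terminated when
 * dst_size > 0. Returns 0 on success, -1 if src did not fit.
 * set_name is a hypothetical caller showing the pattern. */
static int set_name(char *dst, size_t dst_size, const char *src)
{
    int n;

    assert(dst != NULL && src != NULL);
    /* strcpy(dst, src); would fail to compile at this point */
    n = snprintf(dst, dst_size, "%s", src);
    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

The macros have to come after the system headers, otherwise the declarations inside those headers get rewritten too.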
https://github.com/git/git/blob/master/object-file.c#L1293
And currently used here (at least):
While I think such rules are a good idea it only makes sense if it is done consistently and depends on how religiously the tooling (duct-tape and "process") enforces them (even so, you're still only one `#ifdef` away from undoing that "safety"). Having GCC[1] now support static analysis is a killer feature for this type of problem.
On the other end of the spectrum we have Huawei which instead of linting their code is finding creative ways to trick auditing tools and hide such warnings from auditors:
[0] https://news.ycombinator.com/item?id=22712338
[1] https://developers.redhat.com/blog/2021/01/28/static-analysi...
The strncpy() function is less horrible than strcpy(), but
is still pretty easy to misuse because of its funny
termination semantics. Namely, that if it truncates it omits
the NUL terminator, and you must remember to add it
yourself. Even if you use it correctly, it's sometimes hard
for a reader to verify this without hunting through the
code. If you're thinking about using it, consider instead:
- strlcpy() if you really just need a truncated but
NUL-terminated string (we provide a compat version, so
it's always available)
- xsnprintf() if you're sure that what you're copying
should fit
- strbuf or xstrfmt() if you need to handle
arbitrary-length heap-allocated strings
I just did a search on the keywords 'banned' and 'strncpy' [2]
[0] https://lore.kernel.org/git/20180724092828.GD3288@sigill.int...
[1] https://lore.kernel.org/git/20190103044941.GA20047@sigill.in...
[2] https://lore.kernel.org/git/20190102093846.6664-1-e@80x24.or...
https://github.com/git/git/commits/master/banned.h
(Git development is done by emailing patches. Those patches include the git commit message, which we can see just by looking at the history of the file. Sometimes there's additional discussion on the ML, but the most important details are in the commit message because the git development team is very disciplined about that.)
https://github.com/git/git/commit/1fbfdf556f2abc708183caca53...
https://github.com/git/git/commit/91aef030152d121f6b4bc3b933...
It would be good to know what the commonly-accepted alternatives are.
For example: https://lgtm.com/rules/2154840805/
Much like with all other forms of effective censorship, I see this as a quick short-term "fix" with hidden long-term costs[1]. IMHO this sort of anti-thinking just leads to even worse, more dogmatic and cargo-cult, programmers who know less and less about the basics and then go on to make even more subtle errors.
Somehow the collective software industry has managed to propagate the notion that people are incapable of doing even basic arithmetic. Yet they think people are capable of creating complex systems with even more subtle behaviour? The justification would normally be because it's not directly affecting security. WTF. It's beyond stupid.
The only C function I think should be truly banned is gets(), because it is actually impossible to calculate what size of buffer it needs. That is not true of any of the others on this list.
[1] By short and long, I mean decades vs centuries.
Static analysis would probably be more robust, but way more involved.
C#
But please, nothing about using unsafe.
strcpy() was replaced with a safer strncpy() and in turn has been replaced with strlcpy().
The list is a ban of the less safe versions, where more modern alternatives exist.
http://the-flat-trantor-society.blogspot.com/2012/03/no-strn...
C has unsafe basic functions because the programs written then were much simpler, and this sufficed. There's decades of PL research resulting in new languages that give better guarantees than C, allowing you to worry less about wrestling with the language and more on your business logic.
> C programmers don’t trust anything, and they’re better programmers for it.
By that dime, frontend JS programmers trust things even less than C programmers, and they're even better programmers for it. /s (in reality, FE JS devs mainly wish that browser environments were more consistent and predictable, and would disagree that they are better developers because of it).
"../../../../../../../../../../../../../../../../../../../../etc/shadow" is not a file someone would ever reasonably want to access. But is there an easy way to look for nonsense paths without potentially limiting functionality, or writing more code than you wanted to? Nope.
The same footgun exists in all languages; C's design just has a hair trigger.
BTW, in macOS there are "secure bookmarks" (see NSURL docs) that are effectively capability tokens: when user drags a file, or selects it in an Open File dialog (which runs isolated from the app), the kernel creates an app-specific token that grants access to that file to the app, so it can access it beyond its sandbox.
Unfortunately, it is riddled with sharp knives that can cut you, open flames that can burn you, gas that can smother you, water that can drown you and food that can make you sick if you prepare it incorrectly.
Some react to this potential safety threat by banning the use of knives, stoves, sinks, and food from a kitchen.
Fortunately most attempts at safety just require having a microwave to prepare the frozen pizza or Uber Eats delivery.
I've seen the same question posted way too often by beginners to C. "I've created a char*. Why am I getting <random fault> when I try to write to it?"
This in turn stems from the dedicated CPU support for working with zt strings that traces all the way back to PDP-11. So what C does here is exactly what it has always been doing - it provides a thin wrapper of the existing hardware functionality.
The variadic arguments are of the same nature - they basically allow for manual call stack parsing, again something that is a level down from the application code.
It's also easy to see how an API like sprintf and scanf came about - someone's just got tired of writing a bunch of boilerplate code to print a float with N decimals aligned to the left with a plus sign. So they threw together a function call "spec" (the format string), added a call stack parsing support (va_args) and - voila - a beautifully concise print/scan interface. It is a very clever construct, you've gotta give it that.
The flip side is that it required people to pay close attention to how they use it, which wasn't that bold of a requirement back then. But as time went on, the average skill of C programmers went down, their use of the language did too, so more and more people started to step on the same rakes.
So, here we are. Zero-terminated strings are forbidden and va_args calls are nothing short of the magic.
If they had used a {pointer, size} pair instead, it would have avoided all of these string problems, most buffer overflows, even the GTA Online loading problem that was on HN recently.
These days (ptr,size) is probably 16 bytes -- longer than almost all words in the English language (the scrabble SOWPODS maxes out at 15). A pointer alone is 8B. Back at the dawn of C in 1970, memory was 7..8 orders of magnitude more expensive than today (about 1 cent per bit in 1970 USD). (Today, cache memory can be almost as precious, but I agree that the benefits of bounded buffers probably outweigh their costs.)
8B pointers today are considered memory-costly enough "in the large" that even with dozens of GiB machines common, Intel introduced an x32 mode to go back to 32-bit addressing aka 4B pointers. [1] There are obviously more pointers than just char* in most programs, but even so.
Anyway, trade offs are just something people should bear in mind when opining on the "how it should be"s and "What kind of wacky drugs were the designers of language XYZ on?!!?".
[1] https://stackoverflow.com/questions/9233306/32-bit-pointers-...
Pascal, to save one byte, limited strings to length 255. Bad decision.
I think the sentinel character was the best choice in hindsight and at the time in that regard.
But I wish the xxx_s versions and strdup would have made it into the standard like 30 years ago.
Except when you have these rules in Java, the ironic counter-point is "if you are doing this much memory control yourself, you should just use C or C++ or something".
I'll keep your comment in mind next time I see that rebuttal. Thank you.
There is a bunch of misconceptions about Java. Java is actually very performant and memory allocation is generally cheaper than in C (except for inability to have good use of stack in Java). What's slow about Java is all the shit that has been implemented on top of it, but that's another story for another time.
For example, allocation in Java is basically incrementing the pointer. And deallocation for most objects is basically forgetting the object exists.
No, you don't want to "limit the use of new", that's wrong approach.
What you want is to have objects that are either permanent or last very short amount of time.
The worst types of objects are ones that have kind of intermediate lifetime ie if they are allowed to mature from eden. These cost a lot to collect.
The objects that have very short lifetime are extremely cheap to collect.
So if your function takes arguments, creates couple of intermediate objects and then never returns them (for example they were just necessary for inner working of the function) and your function does not call a lot of other heavy stuff, then it is very likely the cost of those temporary objects will be very low. Also, they tend to be allocated very close to each other and so pretty well cached.
- ternary operator("?") was strictly forbidden. One had to use full "if () {..}else {..}" syntax with comments inside each branch even if the branch was empty
- a dynamic array written in an abstract way, when used and implemented specifically for current project had to become a constant static one, with values precalculated and copy/pasted to current project source. This was a fun one to do maintenance work years later.
- magic numbers inside code were forbidden. All numbers had to be defined in a specific header, with an explanation of why that number has said value.
- no variable parameters. All functions to have fixed parameters
- use of macros kept to a minimum. Code reviews sometimes wasted 50% of their time on macros that were not already "classic" from the project's point of view
- operator overload strictly forbidden. Also overloading functions was forbidden too.
The argument was to allocate memory freely and let it pool memory as necessary. Fair enough, it was simpler and fit the standard expectation of development.
The issue is that if you talk with the allocator team they complain of not being able to fix performance issues fast enough due to allocations firing off left and right in the middle of a request.
I never realized that my view of C programming is heavily influenced by MISRA until your comment.
I know game engine programming follows a similar, perhaps unspoken, convention.
Also doesn’t the OS lie? I thought the memory wasn’t really physically assigned until first use.
In both cases the project size is small enough, or the scrutiny is high enough that the ad-hoc allocator doesn't develop. The environment is also simple enough that the memory cheats you're thinking of don't exist (or you can squash them by touching all allocated memory up front).
The goal of these rules is to improve reliability and timeliness of your application. If you intend on working around those rules to do what the rules explicitly forbid then either you or the rules are wrong.
You could maybe call filescope buffers with a size counter a dynamic memory allocation. I.e. for storing RS232 or CAN messages. Since they shrink and grow.
The important thing is that you want to know that flooding one buffer won't flood another, which malloc could result in if it was used for unrelated buffers.
That depends on the OS. Linux lies (overcommits), Windows doesn't. In embedded it's more typical to have a special OS like VxWorks or FreeRTOS that don't lie to you, or to have no OS at all (like basically every arduino project)
How do you ensure that?
(It would have to be in the .c files, not the headers, might not be so clean)
https://github.com/git/git/commit/c8af66ab8ad7cd78557f0f9f5e...
It actually gives examples and a lengthy explanation and reasoning behind the ban.
A fun exercise you can do is put a "%s" in the format string, omit the string argument and see what happens to the stack.
I'd say the usual trap is rather the size of the target buffer, because that requires bigger static analysis guns. (I'm ignoring things like "%n", because then you're playing with fire already.)
char buf[2];
sprintf(buf, "%d", n);
This will happily write to buf[2] and beyond if n is negative or greater than 9.
If you're thinking about using it, consider instead:
- strlcpy() if you really just need a truncated but
NUL-terminated string (we provide a compat version, so
it's always available)
- xsnprintf() if you're sure that what you're copying
should fit
- strbuf or xstrfmt() if you need to handle
arbitrary-length heap-allocated strings
snprintf or nul-plus-strncat do what you want, but snprintf has portability problems on overflow. Most projects I've been on rely on strlcpy (with a polyfill implementation where not available).
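For the two-byte buffer example above, a common fix is to size the buffer from the type rather than from the values you expect. A sketch, where INT_STR_SIZE and format_int are hypothetical names:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Upper bound on the bytes needed to print an int in decimal: every
 * 3 bits give at most one digit, plus one for rounding, one for the
 * sign and one for the terminator.
 * INT_STR_SIZE is a hypothetical macro name. */
#define INT_STR_SIZE (sizeof(int) * CHAR_BIT / 3 + 3)

/* Formats n into buf, which must have room for INT_STR_SIZE bytes.
 * Returns the number of characters written, or -1 on failure. */
static int format_int(char *buf, int n)
{
    int written;

    assert(buf != NULL);
    written = snprintf(buf, INT_STR_SIZE, "%d", n);
    if (written < 0 || (size_t)written >= INT_STR_SIZE)
        return -1;
    return written;
}
```

Declaring `char buf[INT_STR_SIZE];` at the call site means INT_MIN fits as well as 7 does, and snprintf still acts as a backstop if the bound is ever wrong.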
It may actually be a bug that I got the warning, because the range of each input was checked, and I think the compiler is supposed to be smart enough to remember that.
> we provide a compat version, so it's always available
Furthermore, imagine "src" has 1Mb characters but we only want to copy the first 3 chars. The git implementation would traverse the entire 1Mb to find the length first, but a proper implementation only needs to look at the first 3 chars. So, they banned strncpy and provided a worse solution to that.
[1]: https://github.com/git/git/blob/master/compat/strlcpy.c
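Worth noting that strlcpy is specified to return strlen(src), so any conforming version has to walk the whole source. A copy that never reads past what it might write has to give up that return value. A sketch, with copy_prefix as a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Copies at most dst_size - 1 bytes of src and always NUL-terminates.
 * Reads at most dst_size bytes of src, at the cost of not reporting
 * the full source length the way strlcpy does.
 * Returns 0 if all of src fit, -1 if it was truncated.
 * copy_prefix is a hypothetical helper name. */
static int copy_prefix(char *dst, size_t dst_size, const char *src)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return -1;
    for (i = 0; i + 1 < dst_size && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return src[i] == '\0' ? 0 : -1;
}
```

Copying into a 4-byte destination looks at no more than 4 bytes of the source, however long the source is.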
(strcpy is just banned because there's no bounds check, and they want to force use of strlcpy instead).
See https://developers.redhat.com/blog/2019/08/12/efficient-stri...
> C designer Dennis Ritchie chose to follow the convention of null-termination, already established in BCPL, to avoid the limitation on the length of a string and because maintaining the count seemed, in his experience, less convenient than using a terminator.[1][2]
* https://en.wikipedia.org/wiki/Null-terminated_string#History
Ritchie et al. had experience with the B language:
> In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count and strings are terminated by a special character, which B spelled *e. This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.
I was also saying that JS dates _can_ generate ISOStrings. But good luck doing any serious manipulation without issues. Hell, there isn't even a `strftime` equivalent for JS dates! And so much stuff ends up going through locales that you can't rely on it for machine transformations.
I would be careful about ascribing the quadratic perf discussion to be a C thing though.... I find loads of "accidentally quadratic" stuff in loads of languages all the time. People are really bad about this (lots of confusion between "this is built-in to the data structure" and "this is cheap").
Anyways, yeah. Strings are uniquely awful. Other C APIs suffer from issues, but I find those issues are on par with those of other languages. Granted, it's sometimes _because of C_ that other languages suffer from the issues (by relying on C layers for the logic).
Also registers! Especially in syscall interface, consider eg:
int renameat(int olddirfd,char* oldpath,int newdirfd,char* newpath); /* first example I found that had 2 paths */
If you have registers edx,ecx,esi,edi,ebx available, nul-terminated strings make this fit into: edx olddirfd,ecx oldpath,
esi newdirfd,edi newpath
If you need separate length fields, there simply aren't enough registers: edx olddirfd,ecx oldpath.ptr,esi oldpath.len,
edi newdirfd,ebx newpath.ptr,??? newpath.len
People keep repeating this. What about embedded systems? For instance, I have to know how an object is structured and how it is allocated, exactly and without surprises. The behavior has to be predictable and as simple and fast as possible. You can (likely) achieve that with C.
Agreed. I want to make something like this on top of Linux one day. I discarded the entire libc and started from scratch with freestanding C and nothing but the Linux system call interface. Turns out the Linux system call interface is so much nicer.
https://github.com/matheusmoreira/liblinux/blob/master/examp...
Sure, but parent wasn't saying "it was not possible", they said "It was too expensive".
And sure enough, the market drifted to the cheaper solution: you could run slightly more applications if your OS and applications were all written in C than if they were written in Pascal, Modula, etc.
Big C projects work well when they are carefully maintained (like Git).
It's a constantly shifting subset, though. Moving slowly is a feature of C for some.
It makes more sense if you think of it as an email message justifying why the project maintainer should accept that change, because that's what they were before git even existed. Still today, unless you're one of the Linux kernel subsystem maintainers, you have to convert your changes to emails with git-format-patch/git-send-email and send them to the right mailing list. Even the Linux kernel subsystem maintainers keep writing commits in that style out of habit (and because Linus will rant at them if they don't).
I agree.
size_t strlcpy(char *dst, const char *src, size_t dstsize)
{
size_t len = strlen(src);
if(dstsize)
*((char*)mempcpy(dst, src, min(len, dstsize-1))) = 0;
return len;
}
*((char*)mempcpy(dst, src, min(len, dstsize-1))) = 0;
can be replaced by
((char*)memcpy(dst, src, min(len, dstsize-1)))[min(len, dstsize-1)] = 0;
if you don't have mempcpy.
On Linux, memory allocation is basically assigning a range of addresses that may or may not be backed by pages in physical memory.
This allows doing a lot of interesting and useful stuff.
If you really want the memory for some reason (for example you need to guarantee your operation finishes without running out of memory), you need to touch the pages or force them in some other way (for example using mlock()).
It is just that developers are mostly oblivious how memory management works on Linux and then are surprised that it doesn't exactly do what they want.
Most people I work with can't tell how much memory is available on a Linux box if their life depended on it.
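A rough sketch of the "touch the pages or mlock()" advice, assuming POSIX mlock()/munlock() and sysconf(). The names alloc_resident and free_resident are made up for illustration. Note that mlock() can fail (for example when RLIMIT_MEMLOCK is low), so the sketch reports that instead of treating it as fatal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: allocate size bytes and try to make them resident.
 * *locked reports whether mlock() succeeded; if it did not, the pages
 * have only been touched, not pinned. */
void *alloc_resident(size_t size, bool *locked)
{
    assert(size > 0 && locked != NULL);
    void *p = malloc(size);
    if (p == NULL)
        return NULL;

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;                /* assumption: fall back to a common size */

    /* Write one byte per page (plus the last byte) so the kernel has to
     * back the whole range; volatile keeps the stores from being elided. */
    volatile unsigned char *bytes = p;
    for (size_t off = 0; off < size; off += (size_t)page)
        bytes[off] = 0;
    bytes[size - 1] = 0;

    /* Ask the kernel to keep those pages in RAM. */
    *locked = (mlock(p, size) == 0);
    return p;
}

/* Undo alloc_resident(). */
void free_resident(void *p, size_t size, bool locked)
{
    if (p == NULL)
        return;
    if (locked)
        munlock(p, size);
    free(p);
}
```

Touching alone does not stop the pages from being swapped out later; only the lock does that, which is why the two are reported separately.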
Turns out, creating then destroying every single missile/enemy was extremely costly
https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Function-Attribut...
There's options other than printf too.
A mostly unrelated stackoverflow post I found[0] states that an empty standard string in Java occupies 40 bytes due to the normal object overhead and overhead related to the internal byte array for the char storage. Obviously what you gain in return is convenience in programming as well as runtime-enforced safety from buffer overflows. Whether this is worth it depends on what you're doing.
In general, you're definitely absorbing overhead with any managed lang, although it need not be hidden. The specifics should be documented somewhere for whatever platform you're using, and most GCs are pretty tuneable nowadays.
[0] https://stackoverflow.com/questions/56827569/what-is-an-over...
If you don't know about the Compiler Explorer, definitely check it out: https://godbolt.org
Oh absolutely, but it's a pretty reasonable expectation that any contemporary language should handle that complexity for you. The entire job of a language is to make the fundamental concepts easier to work with.
If yes, then you can do #include "sds.c" in some random source file. In fact, that's what so-called header-only libraries in C implicitly do. shudder
Using defer to unlock locks can lead to some fun deadlocks if you don't realize the issue with the scope, and it's completely unintuitive to someone with experience with other implementations of similar concepts.
the whole var/:=/= assignment combined with the error handling style and the shorthand is another one
Don't understand your second point.
select {
case <-ch1:
case <-ch2:
}
On the other hand, Moore’s law and all that. Computers will get faster over time so at some point we might not care. And the opposite of my question is also true: if you could switch to a faster OS written in assembly, would you (assuming all functionality stays the same), knowing certain classes of bugs are more likely?
It seems to me that the cost of these kinds of bugs is amortized such that it is cheaper to use C than to switch. Expressed in those terms, we will only switch to a different language for all our systems stuff when the cost of the rewrite and the cost of the performance penalty are clearly and significantly less than the cost of the bugs we are likely to experience.
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1967.htm#:~...
> Microsoft Visual Studio implements an early version of the APIs. However, the implementation is incomplete and conforms neither to C11 nor to the original TR 24731-1. For example, it doesn't provide the set_constraint_handler_s function but instead defines a _invalid_parameter_handler _set_invalid_parameter_handler(_invalid_parameter_handler) function with similar behavior but a slightly different and incompatible signature. It also doesn't define the abort_handler_s and ignore_handler_s functions, the memset_s function (which isn't part of the TR), or the RSIZE_MAX macro. The Microsoft implementation also doesn't treat overlapping source and destination sequences as runtime-constraint violations and instead has undefined behavior in such cases.
> As a result of the numerous deviations from the specification the Microsoft implementation cannot be considered conforming or portable.
As a bonus this crashes icc if you do it.
I thought this was a pretty funny thing but unfortunately when I tried this on ICC it seemed to compile just fine.
Though I am amused by one thing: the VLA version generates worse code on all compilers I've tried. Seems to validate the common refrain that VLAs tend to break optimizations. (Surely it's worse when you have an on-stack VLA though.)
> "if the destination string size dest_size is too small, the invalid parameter handler is invoked"
> "The invalid parameter handler dispatch function calls the currently assigned invalid parameter handler. By default, the invalid parameter calls _invoke_watson, which causes the application to close and generate a mini-dump."
https://docs.microsoft.com/en-us/cpp/c-runtime-library/refer...
Also known as "why does my code that parses floats fail in Turkey?"
Also also known as the discrepancy between a string's length-as-in-bytes, its length-as-in-code-points, and its length-as-in-how-humans-count-glyphs.
Strings are hard.
Edit to respond to your addendum:
> P.s.: In fact, I would argue that strings are not necessarily all that complicated, but simply that many assume that they are simpler than they are, and that code that handles them is thus written on such assumptions that the length of a string remain the same after capitalization, or that the result not be under influence of environment variables.
I don't think I agree with that, though we may just be disagreeing on semantics. I think the big mistake many of us make is confusing two different abstractions for the same one. We've got this high level abstraction for "text" that includes issues like locale and encoding and several other things. And then we've got this low level abstraction for "text" that is just a blob of bytes. And we often mix the abstractions because it often turns out okay anyway. Otherwise we have to confront demons like "a UTF-8 string containing 10 characters can be anywhere between 10 and 40 bytes long".
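To make the bytes-versus-code-points gap concrete, here is a small sketch. It assumes the input is already valid UTF-8 and does no validation; utf8_code_points is a made-up name, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts code points in a nul-terminated string that is assumed to be
 * valid UTF-8. Continuation bytes have the bit pattern 10xxxxxx, so
 * every other byte starts a new code point. No validation is performed. */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For "na\xC3\xAFve" strlen() reports 6 while this reports 5. It still says nothing about how many glyphs a human would count, since combining marks are separate code points.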
I am quite certain that I have produced code that lowercases or uppercases and then checks for “i” in them, that I now realize would fail under Turkish locale settings as under that “i” does not uppercase to “I”, as one might expect.
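One way to sidestep the Turkish "i" problem for machine-readable text is to avoid the locale-aware functions entirely. A minimal sketch, assuming an ASCII-compatible execution character set; both function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Maps 'A'..'Z' to 'a'..'z' and leaves every other byte alone,
 * regardless of what setlocale() has been called with. */
char ascii_tolower(char c)
{
    return (c >= 'A' && c <= 'Z') ? (char)(c - 'A' + 'a') : c;
}

/* Case-insensitive comparison for ASCII keywords, header names, etc. */
bool ascii_equal_ignore_case(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;
    for (; a[i] != '\0' && b[i] != '\0'; i++) {
        if (ascii_tolower(a[i]) != ascii_tolower(b[i]))
            return false;
    }
    return a[i] == b[i];   /* both must end at the same position */
}
```

This is only appropriate for identifiers and protocol text. Text meant for humans genuinely needs the locale-aware rules.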
Because you, or someone, called
fuck_my_program();
which is defined in "idiot.h" as #define fuck_my_program() setlocale(LC_ALL, "")
and the project is missing: #define setlocale(x, y) BANNED(setlocale)
Hope that helps!
One project where I worked with these rules was on Verix OS.
The rules are intended more to reduce application complexity and unpredictability, which typically helps reliability regardless of the setting.
Linux does not lie about what available memory is. Rather, it is most developers that do not understand how memory is managed on Linux.
What you probably mean is that you don't get physical memory when you run malloc().
That's because when you allocate memory on Linux you allocate virtual address space rather than physical memory.
Basically, you get a bunch of addresses that may or may not be backed up by physical pages.
If you want physical memory you just need to use mlock() along with malloc() and you are all fine.
I'm sure there's a lot of important things that rely on COBOL, but by most definitions of "critical", I think this is way off the mark.
The OS kernel for nearly every PC and server on earth is written in C.
Almost every electronic device on earth complicated enough to require software is probably running at least some firmware written in C.
I think those both outnumber ATMs by a hefty margin.
There's a QNX user process that's always present, called "proc", which handles pathnames and the "resource managers", programs which respond to path names. But that's in user space, and has all the tools of a user-space program.
When handling strings in C, it's useful to use the string functions from glib or pull in one of the specifically safe string handling libraries and not use any C stdlib functions for strings at all.
There are a number of C strings libraries safer to use than the standard library, and many of them are simpler, more feature-rich, or both.
* https://github.com/intel/safestringlib (MIT licensed)
* https://github.com/rurban/safeclib (MITish)
* https://github.com/mpedrero/safeString (MIT licensed)
* https://github.com/antirez/sds (BSD 2-clause, and gives you dynamic strings)
* https://github.com/maxim2266/str (BSD 3-clause)
* https://github.com/xyproto/egcc (GPL 2.0, includes GC on strings)
* https://github.com/composer927/stringstruct (GPL 3.0)
* https://github.com/c-factory/strings (MIT licensed)
* https://github.com/cavaliercoder/c-stringbuilder (MIT licensed, does dynamic)
If one does use the C standard library directly for handling strings, the advisories from CERT, NASA, GitHub, and others should be welcome advice (CERT's advice, BTW, includes recommending a safer strings library right off).
The major differences in my head are:
- Garbage Collection
- Interfaces
There are some other more minor things like defer and the built-in generic types but many C programmers already had macros to implement similar things.
So sure, Go doesn't solve all of the problem spaces that C does. But if you are in a problem space where you don't need C (or another low-level systems language) Go may be a middle ground to move C programmers to.
So currently the issue is you can't write baremetal stuff in Go, for example the linux kernel or Arduinos. You can't get rid of GC if you have realtime requirements.
Go is 'close enough' to C in many areas like writing server-side code and CLI tools: its GC pauses are short, and even though C could produce an executable of a few KB rather than an MB, that has no practical relevance in those areas.
If I needed a C alternative, rust gives me all (most of) the power of C, plus safety, without all the drawbacks of a heavy runtime.
If I am okay with paying the garbage collection tax, etc, then I would use scala or kotlin. If I'm using a higher level language, then I expect higher level features like exceptions, generics, reflection, etc.
I'm sure Go is a great language with its own use cases, but it's not a C replacement.
Most of the C lovers that I have spoken with find that Rust is far too complicated with templates and lifetimes. Similar arguments can be provided for Scala and Kotlin with their more functional syntax.
The thing that C/Go authors brought in straight from the seventies: utter garbage naming of identifiers.
But then again, the name of the language itself should be warning enough.
Because that is effort every person who uses the file has to do over and over again, whereas maintaining the file is effort that has to be done once by one person.
https://github.com/eamodio/vscode-gitlens/tree/v11.2.1#curre... (screenshot)
> Current Line Blame: Adds an unobtrusive, customizable, and themable, blame annotation at the end of the current line
You can’t fuck up String("Hello ") + String("world") but you can definitely fuck up strcat(buf, "Hello "); strcat(buf, "world");.
But anyway, we were talking about the real world software on microcomputers, not just standards in the abstract. With that in mind, I think TP/BP is a better example of Pascal in the wild than anything Wirth ever made.
I'm not sure if they take bug reports if you're not a customer, but this one goes back at least 8 years.
Was it intended for fixed length records?
You can also use it to overwrite part of an existing string, but I think that’s a side effect of the above.
struct dir
{
char name[14];
int inode;
};
Adding a NUL byte might waste a full byte that could otherwise be used---remember, back when C was first developed, 10M disks were large and very expensive.
As you say, it does in fact obviate some errors. A value judgement as to which behaviors are more or less safe may be subjective, but the intent is not.
To be clear, strncpy does not guarantee NUL termination. It takes a C string as the source argument, but it doesn't write out a C string; it writes out a very esoteric data structure that is unfortunately easily confused with a C string.
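A sketch of what that "esoteric data structure" looks like in practice, loosely modelled on the 14-byte directory name field mentioned earlier in the thread. The struct and function names here are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NAME_LEN 14

struct dir_entry {
    char name[NAME_LEN];   /* NUL-padded, NOT necessarily NUL-terminated */
    int inode;
};

/* Store a name in the fixed-width field: shorter names are padded with
 * NULs, names of NAME_LEN bytes or more fill the field with no terminator. */
void dir_set_name(struct dir_entry *d, const char *name)
{
    assert(d != NULL && name != NULL);
    strncpy(d->name, name, NAME_LEN);
}

/* Convert the field back to a real C string; out needs NAME_LEN + 1 bytes. */
void dir_get_name(const struct dir_entry *d, char out[NAME_LEN + 1])
{
    assert(d != NULL && out != NULL);
    const char *end = memchr(d->name, '\0', NAME_LEN);
    const size_t n = end ? (size_t)(end - d->name) : NAME_LEN;
    memcpy(out, d->name, n);
    out[n] = '\0';
}
```

Used this way strncpy does exactly what it was built for. The trouble starts when the field is later passed to something that expects a terminator.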
By contrast, strlcpy was intended to be a safer string copy routine: https://www.usenix.org/legacy/event/usenix99/full_papers/mil... In particular, it was designed to be what people seem to think strncpy is. Its return value semantics are controversial, though mostly only among the glibc crowd as every other Unix libc, including musl and Solaris, now provides it. But the semantics were designed based on experience in fixing old C code, and observations about how developers tend to write C code, not based on prescriptive theories about how people should manipulate C strings in C code.
Anyway, things have changed a lot, and I recently worked on my first ever web app with native ES6 - no transpiling to ES5! It was... not nearly as bad as it used to be! Modules are a thing, and the language has evolved with things like async/await, evolved for the better, I think. The standard library is still horribly anaemic though - the number of "helper" functions needed is ridiculous.
But still, I would no longer classify myself as a hater. Progress at last :)
I do like Typescript though, as it adds some really nice ergonomics.
Assorted musing : Rust, OCaml under C disguise.
snprintf(buf, sizeof(buf), "%s", string);
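The one-liner above silently truncates. If you care whether the string fit, the return value tells you. A minimal sketch with a made-up wrapper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: copies src into dst via snprintf and reports
 * whether the whole string fit. dst is nul-terminated when dstsize > 0. */
bool copy_with_snprintf(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);
    const int needed = snprintf(dst, dstsize, "%s", src);
    /* needed is the length the full output would have had,
     * or a negative value on error. */
    return needed >= 0 && (size_t)needed < dstsize;
}
```

Because the result is an int, this inherits snprintf's limits for very long inputs, so it is a convenience for ordinary buffers rather than a general replacement.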
strlcpy is on track for future standardization in POSIX, for Issue 8, but even as a de facto standard, it exists in libc on *BSD, macOS, Android, Solaris, QNX, and even Linux using musl.
https://www.austingroupbugs.net/view.php?id=986#c5050
But you're correct in that it is not a replacement for strncpy because no code should be using strncpy.
I see this `strlcpy` recommended everywhere.
By that logic, a non-insignificant amount of (good) comments in code could be removed and people asked to "git blame the code and check out the commit that made it for the documentation". Of course this could be done, but it sounds ridiculous even typing it out.
If you're truly clueless as to what could be substituted for these commands, then you don't understand why they're banned. So our first step? Figure out why they're banned. And how would we sanely approach this? Probably by checking the commit message for _why that code is there in the first place_. That's a very safe, sane, and not-at-all backwards assumption. After you understand why it's there, a quick google search might help out if the commit message didn't already include information on alternatives.
Lastly, yeah, I totally agree a large amount of GOOD comments should be relegated to the git commits if all they're doing is adding additional context around a complex piece of logic. Comments do not exist to edify a code base in any way other than context. They're too easy to let become stale, whereas a git commit will always reference exactly the code you're blaming.
So, I have to really disagree that it's ridiculous or in any way absurd. In fact, I think a lot of code suffers from NOT using git as a way to extend context around a code base. It's SUPER easy with most development environments to select a block of text and blame it. It's so easy that it's almost always my go-to to increase my context of what's been happening around a particular part of the code base.
Yes, exactly. You want to understand how a codebase changed and evolved over time? Git is your friend. If you want the facts of the code today? The source code is your friend. That's why the way the Linux and Git projects store history in their Git repositories makes sense. See also https://news.ycombinator.com/item?id=26348965
Try navigating the Git codebase with a git-blame sidebar (probably VS Code has that somewhere) so you can see the history of the source files. If you wonder why something is what it is, you can checkout the commit that last modified it. Or go even further backwards and figure out in the context it was first added. If you truly want to understand a change, a git repository with well written git messages is a pleasure to understand and dig into.
Because there is no way for a commit message to become outdated or detached from what it talks about, both of which are very much issues with comments.
> why make it impossible to update if there are other suggested alternatives that are available since whenever the commit was made?
Because that doesn't really matter.
Discovering why the thing is banned you only have to do once, if you care. If you're just modifying something quickly and minor in Git, you might not even care why.
with (Math) with (console) log(PI)
It reminds me a bit of Clojure's doto.
with (myShape.graphics) {
beginPath();
setFill(0xFF0000);
moveTo(6, 9);
// ...
}
git blame -w -M
This will ignore whitespace and detect moved or copied lines.
Technically it would be better, especially from a multi-threading point of view. The locale stuff was designed in the 1980's, before multi-threading was a mainstream technique.
Say you have a multi-threaded global server which has to localize something in the context of a session, to the locale of the user making the request.
Still, for thread support, you don't necessarily need a cluttering argument. The locale can be made into a thread-specific variable. In Lisp I would almost certainly prefer for the locale to be a dynamic variable. (It would be pretty silly to be passing an argument to influence whether the decimal point is a comma, while the radix of integers is being controlled by *print-base*.)
What you want is for the locale stuff to be broken out into a completely separate library: a whole separate set of loc_* functions: loc_strtod, loc_printf, and so on.
I don't think having to pass a locale argument would be that big of a problem - you could always have wrapper functions for the C locale, although they should be implemented directly for performance.
> What you want is for the locale stuff to be broken out into a complete separate library: a whole separate set of loc_* functions: loc_strtod, loc_printf, and so on.
Yes, that would be ideal.
100% agree. Though I don't mind if comments also leave historical information about the code. Can't be too much -- there is a delicate balance.
Do note, however, that you said it yourself: If you want the facts of the code today, go to the source code. In my opinion, the "facts of the code of git" are that functions X,Y,Z are "banned", but the code does not tell me why, or what to use instead. It just bans them. I would expect to see something in the code, not (just) in a git commit. It's also not that I can't google these functions (a couple of minutes will answer these questions), or that I should be experienced enough to know why they're evil, it's that it's IMO a reasonable, developer-friendly and good thing to do.
If it uses a struct with length of string and pointer to a c-style string, even the conversion can be elided (at the price of some inflexibility/unnecessary copying while in use)
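Something like the following is what I take "a struct with length of string and pointer to a c-style string" to mean. This is only a sketch with invented names (str_view, sv_*), not the API of any of the libraries mentioned in this thread:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view: the bytes are borrowed, not owned. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* Wrap an existing C string; the only place strlen() is paid for. */
struct str_view sv_from_cstr(const char *s)
{
    assert(s != NULL);
    struct str_view v = { s, strlen(s) };
    return v;
}

/* Equality is a length check plus memcmp; no scanning for terminators. */
bool sv_equal(struct str_view a, struct str_view b)
{
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}

/* Because ptr still points at nul-terminated storage, the view can be
 * handed to C APIs without conversion, provided it covers the whole string. */
const char *sv_as_cstr(struct str_view v)
{
    return v.ptr;
}
```

The inflexibility shows up as soon as you want a substring: a view into the middle of a buffer is no longer nul-terminated, so sv_as_cstr stops being valid and you are back to copying.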
In short, it's not possible to write a nice string library in C because C simply doesn't support objects, and by extension doesn't support libraries.
Strings are a perfect example of an "object" in what is later known as object oriented programming. C doesn't have objects, it's the last mainstream language that's simply not object oriented at all, and that prevents from making things like a nice string library.
If you're curious, the closest thing you will see in the C world is something like GTK, a massive C library including string and much more (it's more known as a GUI toolkit but there are many lower level building blocks). It's an absolute nightmare to use because everything is some abuse of void pointers and structs.
Take another look at https://developer.gnome.org/glib/stable/glib-Strings.html#g-... . That’s all C, baby, and could be replicated in a completely independent strings-only library built on the standard library if you wished. The reasons no such library exists are ecological, not technical.
GLib and GTK are closely aligned parts of GNOME so they are easy to get mixed up.
There were a few big libraries in the ecosystem if I remember well, GTK, glib and another two. They're from the same origin and often mixed together.
It's been almost a decade since I dabbled into this stuff day-to-day. I think being forced to use glib is the turning point in a developer's life where you realize you simply have to move on to a more usable language.
Let's say, something that's easier to use and doesn't have all the footguns of the char arrays.
The library you link doesn't come anywhere close to that. It's 99% like the standard library and it has the exact same issues.
In B, there was only one data type: machine word. The actual meaning was determined by the operators used on it. Thus, given x, (x + 1) would be integer addition, but *x would dereference it as a pointer (to another word). There was no need to distinguish between integer and pointer arithmetic, because their semantics were the same - pointers were not memory addresses of bytes, but of words, and thus (x + 1) would also mean "the next element after x", if x is actually a pointer.
When it came to arrays, B didn't have them as a type at all. It did have array declarations - but what they did was allocate the memory, and give you a variable of the usual word type pointing at that memory (which could be reassigned!). Thus, arrays "decayed" to pointers, but in a broader sense than they do in C.
This all works fine on machine where everything is a word, and only words are addressable. But C needed to run on byte-addressable architectures, hence why it needed different types, and specifically pointer types to allow for pointer arithmetic - as something like (p + 1) needs to shift the address by more than 1 byte, depending on the type of p. But they still tried to preserve the original B behavior of being able to treat arrays as pointers seamlessly, hence the decay semantics.
BTW, this ancestry explains some other idiosyncrasies of C. For example, the fact that the array/pointer indexing operator can have its operands ordered either way - both a[42] and 42[a] are equally valid - is also straight from B. A more obvious example, the reason why C originally allowed you to omit variable types altogether, and assumed int in that case, is because int is basically the "word type" of B, and thus C code written in this manner very much resembles B. And then there's "auto" which was needed in B to declare locals because there was no type, but became redundant (and yet preserved) in C.
https://en.wikipedia.org/wiki/B_(programming_language)#Examp...
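For anyone who has not seen the reversed indexing form actually compile, a tiny sketch (the function name is made up):

```c
#include <assert.h>
#include <stddef.h>

/* a[i] is defined as *(a + i), and addition commutes, so i[a] names
 * the same element. This returns it using the reversed spelling. */
int element_reversed(const int *a, size_t i)
{
    assert(a != NULL);
    return i[a];
}
```

Given int data[3] = {10, 20, 30}, element_reversed(data, 2) and data[2] are the same value.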
#include <stdio.h>
void foo(int len, const char (*str)[len]){
printf("%zu\n", sizeof(*str));
printf("%.*s", len, *str);
}
int main(void){
// note: not nul-terminated
const char text[] = {'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!', '\n'};
// prints 13, then 'hello world!'
foo(sizeof(text), &text);
return 0;
}
It's unfortunate that the resulting VLA-enhanced function is no longer compatible with the original:
/* original function */
void foo(size_t len, const char *str);
/* compatible signature but str[] decays to sizeless pointer */
void foo(size_t len, const char str[len]);
/* allowed but signature is no longer compatible with original */
void foo_improved(size_t len, const char (*str)[len]);
/* (this is how the non-VLA caller would see the signature) */
void foo_improved(size_t len, const char (*str)[]);
So what your example does show is that existing compilers already support this concept (no need for fancy dependent types) but the C99 standard explicitly prohibits compilers from acting on the VLA information contained within the const char str[len] declaration.
I think many of the "safe C" variants get tripped up by starting with fat pointers (length + pointer as an atomic value) and then have trouble (rightly so!) when trying to squeeze them through the C standard ABI; it's a square peg in round hole sort of situation.
The key observation from WalterBright's post is that the C standard ABI already has a way to pass fat pointers, using a pair of arguments (size_t and char *) in an ad-hoc manner.
It's the not-useful-but-legal C99 VLA declaration in the function prototype that could, if one is willing to violate the C99 spec, allow a compiler to automatically derive a fat pointer inside the body of a function in a manner that is backwards-compatible with the C standard ABI.
char *strndup(size_t n; const char s[n], size_t n) {
char buf[n]; /* alloc a temporary VLA */
assert(sizeof(buf) == n); /* yep! */
assert(sizeof(s) == n); /* nope, sizeof(s) == sizeof(const char *) */
}
So there's absolutely no reason (other than being in violation of the C99 specification) for the compiler to refuse to let you make the assertion that sizeof(s) == n.
And given the prototype for this VLA-enhanced strndup(), a smart C compiler could catch errors like this:
char * bugged_func() {
char buf[20];
/* do stuff with buf, e.g. snprintf() into it */
return strndup(buf, 30); /* error: 30 > sizeof(buf) */
}
Since of course within a function the C type system is already tracking the size of an array -- so no additional type information is required, and certainly not dependent types!
https://www.kernel.org/doc/Documentation/vm/overcommit-accou...
Get over it. What you call "allocation" is two distinct operations on Linux, one of which is called malloc in standard c library, unfortunately, and that's where your confusion comes from.
I think you know these things and it's mostly just a semantics argument, but this is the widely agreed definition.
Also in many systems the C library is linked dynamically and shared among all programs so even though a program is compiled it still relies on the underlying system to provide the function.
Finally, I'm certain that if a C standard removes something, it'll be treated as the equivalent to that standard not existing. C programmers are already a conservative bunch without such changes.
Removing strcpy would make the Python transition look easy.
And yes: it should all still compile, but none of that prohibits the compiler from issuing flashing red/yellow warning messages to your terminal for using footgun functions, preferably with uncomfortable audible notifications too.
All of this is silly though, because even in a strict C89 environment you can still have your own safe wrappers over the unsafe functions. I find that very little of modern programming has a hard dependency on ultramodern compiler features (e.g. you can theoretically build React/Redux using only ES3 (1998ish) if you like. Generics using type-erasure can be implemented with macros. Etc.).
Also, C89 conformance doesn’t mean much: you can have a conforming C89 system that doesn’t even have a heap - nor a stack for autos! (IBM Z/series uses a linked-list for call-frames, crazy stuff!)
When updating existing code C89 (maybe K&R) might be what's used so minor code changes won't undo that.
I tend to write most of my code in something higher-level than C and only resort to C or assembly in performance-critical sections as found with a profiler. Plenty of general-purpose languages have memory-safe strings built into the language, and honestly I keep hoping the Cisco/Intel safestrings library or something like SDS gets the standard library blessing one day.
As long as it is done like in recent versions of Visual C++ where i can disable that useless compiler output pollution with a #define, usually with a snide remark about Visual C++ right above it.
The reality of C is that if we deprecated every objectionable function in the stdlib we wouldn't have anything left.
I think you mean there are very few functions that cannot possibly be used correctly (namely gets). Most C functions are dangerous - can lead to crashes and security vulnerabilities if used incorrectly - but that's just an expected consequence of using a language with no provisions for memory-safety.
> The reality of C is that if we deprecated every objectionable function in the stdlib we wouldn't have anything left.
Somewhat ironically, malloc is actually perfectly safe[0] - using the return value has some issues, but calling it is always[0] fine.
0: Assuming the OS-level memory allocator is sanely configured WRT overcommit, anyway.
It's nearly the exact same reasoning as "we're not going to break older websites"
Runtime backwards compatibility is similarly extensive on platforms that care about it. You can still take a DOS app written in ANSI C89 the year that standard was released, and run it on (32-bit) Windows 10, and it'll work exactly the same. In fact, you can do this with apps all the way back to DOS 1.0.
So a severely memory limited architecture of the 70s led to blending of data with control - which is never a safe idea, see naked SQL. We now perpetuate this madness of nul-terminated strings on architectures that have 4 to 6 orders of magnitude more memory than the original PDP-11.
It's also highly inefficient, because the length of a string is a fundamental property that must be recomputed frequently if not cached.
Bottom line, unless you work on non-security sensitive embedded systems like microwave ovens or mice, there is absolutely no place for nul-terminated strings in today's computing.
It is by far my favorite language, because it is filled with elegant solutions to hard language problems.
As a perfectionist, there are very few things I would change about it. People rave about Rust these days, but I rave about D in return.
Just wanted to say thanks (and that I bought a D hoodie).
Would you mind sharing the things that you look for, from the obvious to the subtle? I would love to see some rejected push requests if possible. If I were writing C under your direction, what would you drill into me?
Thank you, it is an honour to address you here.
2. be aware of all the C string functions that do strlen. Only do strlen once. Then use memcmp, memcpy, memchr.
3. assign strlen result to a const variable.
4. for performance, use a temporary array on the stack rather than malloc. Have it fail over to malloc if it isn't long enough. You'd be amazed how this speeds things up. Use a shorter array length for debug builds, so your tests are sure to trip the fail over.
5. remove all hard-coded string length maximums
6. make sure size_t is used for all string lengths
7. disassemble the string handling code you're proud of after it compiles. You'll learn a lot about how to write better string code that way
8. I've found subtle errors in online documentation of the string functions. Never use them. Use the C Standard. Especially for the `n` string functions.
9. If you're doing 32 bit code and dealing with user input, be wary of length overflows.
10. check again to ensure your created string is 0 terminated
11. check again to ensure adding the terminating 0 does not overflow the buffer
12. don't forget to check for a NULL pointer
13. ensure all variables are initialized before using them
14. minimize the lifetime of each variable
15. do not recycle variables - give each temporary its own name. Try to make these temporaries const, refactor if that'll enable it to be const.
16. watch out for `char` being either signed or unsigned
17. I structure loops so the condition is <. Avoid using <=, as odds are high that'll result in a fencepost error
That's all off the top of my head. Hope it's useful for you!
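Several of the tips above (one strlen per string, const size_t lengths, a stack buffer that fails over to malloc, overflow and NULL checks, explicit termination) can be shown together in a small sketch. The function name `concat_equals` and the buffer sizes are made up for illustration and are not from the thread; treat it as one possible reading of the advice rather than the commenter's own code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Tip 4: a shorter stack buffer in debug builds, so tests are sure to
 * exercise the malloc fail-over path. Sizes are arbitrary assumptions. */
#ifdef NDEBUG
enum { STACK_BUF_LEN = 256 };
#else
enum { STACK_BUF_LEN = 8 };
#endif

/* Hypothetical helper: returns 1 if a followed by b equals expected,
 * 0 if not, and -1 on bad input or allocation failure. */
int concat_equals(const char *a, const char *b, const char *expected)
{
    if (a == NULL || b == NULL || expected == NULL)   /* tip 12 */
        return -1;

    /* Tips 2, 3, 6: call strlen once per string, keep it in a const size_t. */
    const size_t alen = strlen(a);
    const size_t blen = strlen(b);
    const size_t elen = strlen(expected);

    /* Tip 9: refuse lengths whose sum plus the terminator would wrap. */
    if (alen > SIZE_MAX - 1 - blen)
        return -1;
    const size_t total = alen + blen;

    /* Tip 4: scratch space on the stack, malloc only when it is too small. */
    char stackbuf[STACK_BUF_LEN];
    const int needs_heap = total + 1 > sizeof stackbuf;
    char *const heapbuf = needs_heap ? malloc(total + 1) : NULL;
    if (needs_heap && heapbuf == NULL)
        return -1;
    char *const buf = needs_heap ? heapbuf : stackbuf;

    /* Tip 2: lengths are known, so memcpy and memcmp do the work. */
    memcpy(buf, a, alen);
    memcpy(buf + alen, b, blen);
    buf[total] = '\0';   /* tips 10, 11: space for it was reserved above */
    assert(strlen(buf) <= total);

    const int equal = (total == elen) && memcmp(buf, expected, elen) == 0;
    free(heapbuf);       /* free(NULL) is a no-op */
    return equal;
}
```

With the debug-sized buffer, a call such as `concat_equals("hello, ", "world!!", "hello, world!!")` takes the malloc path while `concat_equals("foo", "bar", "foobar")` stays on the stack, so both branches get tested.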
I was working on a RISC processor and somebody started using various std lib functions like memcpy from a linux tool chain. I got a bug report - it crashed on certain alignments. Made sense - this processor could only copy words on word alignment etc.
So I wrote a test program for memcpy. Copy 0-128 bytes from a source buffer from offsets 0-128 to a destination buffer at offset 0-128, all combinations of that. Faulted on an alignment issue in code that tried to save cycles by doing register-sized load and store without checking alignment. That was easy! Fixed it. Ran again. Faulted again - different issue, different place.
Before I was done, I had to fix 11 alignment issues. A total fail for whoever wrote that memcpy implementation.
What was the lesson? Well, writing exhaustive tests is a good one. Not blindly trusting std intrinsic libraries is another.
But the one I took with me was, why the hell isn't there an instruction in every processor to efficiently copy from arbitrary source to arbitrary destination with maximum bus efficiency? Why was this a software issue at all! I've been facing code issues like this for decades, and it seems like it will never end.
</rant>
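The exhaustive test described in that story can be sketched roughly as below. The harness name `check_copy_exhaustively`, the smaller 0-31 offset range, and the deliberately broken `sloppy_copy` are assumptions added for illustration; the original test program is not shown in the thread. On hardware that tolerates unaligned access this only detects wrong results, not alignment faults.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed ranges: offsets 0..31, lengths 0..32, plus guard bytes. */
enum { MAX_OFF = 32, MAX_LEN = 33, GUARD = 8,
       BUF_LEN = MAX_OFF + MAX_LEN + GUARD };

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Runs every (source offset, destination offset, length) combination.
 * Returns 0 if all copies are exact, 1 on the first mismatch. */
int check_copy_exhaustively(copy_fn copy)
{
    assert(copy != NULL);
    unsigned char src[BUF_LEN];
    unsigned char dst[BUF_LEN];

    for (size_t i = 0; i < BUF_LEN; i++)
        src[i] = (unsigned char)(i * 7 + 1);

    for (size_t so = 0; so < MAX_OFF; so++) {
        for (size_t dof = 0; dof < MAX_OFF; dof++) {
            for (size_t len = 0; len < MAX_LEN; len++) {
                memset(dst, 0xAA, sizeof dst);
                copy(dst + dof, src + so, len);

                for (size_t i = 0; i < BUF_LEN; i++) {
                    const int inside = i >= dof && i < dof + len;
                    const unsigned char want =
                        inside ? src[so + (i - dof)] : 0xAA;
                    if (dst[i] != want)
                        return 1;   /* wrong byte or write outside range */
                }
            }
        }
    }
    return 0;
}

/* Hypothetical buggy copy: rounds the length up to a multiple of 4,
 * the kind of shortcut a word-at-a-time implementation might take. */
void *sloppy_copy(void *dst, const void *src, size_t n)
{
    const size_t rounded = (n + 3) & ~(size_t)3;
    return memcpy(dst, src, rounded);
}
```

Passing `memcpy` should report success, while `sloppy_copy` is caught as soon as a length that is not a multiple of four overwrites the guard bytes.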
What will be the alternative for strncpy/strncat? I thought they're a safer strcpy/strcat but now I need something to replace them.
I assume snprintf for sprintf, vsnprintf for vsprintf.
No idea what to do with gmtime/localtime/ctime/ctime_r/asctime/asctime_r, any alternatives for them too?
char buffer[2000];
strncpy(buffer, "hello", sizeof buffer);
writes "hello" and 1995 zero bytes to the buffer.
Thank you, have a great weekend!
Naturally, our teacher wisely pushed hard on figuring what you could out on paper first.
Better comparison would be between C and Turbo Pascal strings in DOS times. TP strings were limited to 255 characters but they were almost as fast as C strings, in some operations (like checking length) they were faster, and you had to work very hard to create a memory leak or security problem using them.
I've learnt Pascal before C and the whole mess with arrays/strings/pointers was shocking to me.
Turbo Pascal wasn't released until 1983, if the wiki is to be believed.
There are ways around this that are varying degrees of acceptable. Versioning libc itself is outside the scope of the language, since it really depends on how the system linkers and loaders are implemented.
Calculating mandelbrot fractals to measure speed might be a nice exercise in which Rust or Zig can compete with C. But in a real software implementation, when you need to open a file you still have to call the OS function fopen(). Whatever thing File::open (Rust) is doing before calling fopen() is overhead.
How can you avoid that overhead? Write in C (at your own risk).
This is why we keep seeing the “X is faster than C” articles: if you use the standard C library in a sort of not great way (sscanf) vs a more intelligent version of the code in another language you will get faster than C results. But on the whole doing less work is always faster. Not doing bounds checking on an array will always be faster than doing bounds checking. How could it not be? No amount of computer science can make bounds checking take negative time.
I am not saying C is magically faster. I am saying that by letting you not do critical safety checks it will be faster. Rust has a similar capability for some things but if your goal is to write unsafe Rust for the sake of performance, then is it worth the switch?
"Underspecifying and overspecifying.
Ninja executes commands in parallel, so it requires the user to provide enough information to get that correct. But at the other extreme, it also doesn't mandate that it has a complete picture of the build. You can see one discussion of this dynamic in this bug in particular (search for "evmar" to see my comments). You must often compromise between correctness and convenience or performance and you should be intentional when you choose a point along that continuum. I find some programmers are inflexible when considering this dynamic, where it's somehow obvious that one of those concerns dominates, but in my experience the interplay is pretty subtle; for example, a tool that trades off correctness for convenience might overall produce a more correct ecosystem than a more correct but less convenient alternative, if programmers end up avoiding the latter. (That could be one reason Haskell isn't more successful. Now that I work in programming languages I see this dynamic play out regularly.)"
Ok, so maybe rather than have this file we should run “git log | grep BANNED” and build a list of functions from that? Or maybe we could change all error messages to be “go look at the commit history to work out why this happened”.
No? Maybe putting context in source files (or better yet, an error message!) rather than in a side channel like the commit message has value when it comes to understanding and updating, and it won’t be lost under the weight of future commits.
That's why it makes sense to describe the background and reasoning behind a change in a Git commit, instead of inside your source files as comments.
They are suggesting adding a more informative error, which may include a subset of that background and reasoning. An error message that points you to the functions you should use instead is infinitely more informative than one that says “this is banned. Bye.”
The commit message from 2020 with suggested alternatives might very well go stale. Does the author go and force a noop commit so they can document new best practice in a new commit message?
What if they think of another reason why one of the same functions should be disabled?
The only issue is that C99 insists that the first dimension of an array argument must decay to a pointer, discarding the associated type information of that array's dimension.
Also commits shouldn't be changed so if you want to improve the doc and provide more details, well you can't.
Granted these concerns are probably less likely to apply to this particular file.
Perhaps they just felt that anyone contributing to git would already know why not to use those/what to use instead (but then there would be no need to ban them).
Might work in practice for a long time, but Git is a version control system, not a documentation system.
> Naturally, our teacher wisely pushed hard on figuring what
> you could out on paper first.
Specifically in the case of the Commodores (I grew up on a C128) I find this observation backwards. Sure, if you only had three machines for twenty students then time on the machine was valuable. But on those machines there was so much to explore with poke (and peek to know what to put back). From changing the colours of the display to changing the behaviour of the interpreter.
I think that I discovered the for loop at eight years old just to poke faster!
> I thought they're a safer strcpy/strcat
Let's look at the documentation for strncpy, from the C Standard:
"The strncpy function copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1."
There's a subtle gotcha there. It may not result in a 0 terminated string!
"If the array pointed to by s2 is a string that is shorter than n characters, null characters are appended to the copy in the array pointed to by s1, until n characters in all have been written."
A performance problem if you're using a large buffer.
Yeah, always prefer snprintf.
The time functions? I'm just very careful using them.
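The two strncpy behaviours quoted from the Standard, and the snprintf alternative, can be checked with a short sketch. The helper names `strncpy_gotchas` and `copy_checked` are invented for illustration; only strncpy, snprintf, memset and memchr are real library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if both quoted strncpy behaviours are observed. */
int strncpy_gotchas(void)
{
    /* Gotcha 1: source longer than n, so no terminator is written. */
    char small[4];
    memset(small, 'x', sizeof small);
    strncpy(small, "hello", sizeof small);
    const int unterminated = memchr(small, '\0', sizeof small) == NULL;

    /* Gotcha 2: source shorter than n, so the rest is filled with nuls. */
    char big[64];
    memset(big, 'x', sizeof big);
    strncpy(big, "hi", sizeof big);
    int padded = 1;
    for (size_t i = 2; i < sizeof big; i++) {
        if (big[i] != '\0')
            padded = 0;
    }

    return unterminated && padded;
}

/* Hypothetical helper built on snprintf: always terminates dst and
 * reports truncation. Returns 1 if src fit entirely, 0 otherwise. */
int copy_checked(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    const int needed = snprintf(dst, cap, "%s", src);
    return needed >= 0 && (size_t)needed < cap;
}
```

snprintf returns the length the full output would have had, which is what lets the caller tell a truncated copy from a complete one.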
But the various rewrites and rephrases? Nope. If you absolutely, positively want to get it right, refer to the C Standard.
printf is particularly troublesome. The interactions between argument types and the various formatting flags is not at all simple.
Other sources of error are the 0 handling of `n` functions, and behavior when a NaN is seen.
Many of his other points are solved by Turbo Pascal and Delphi/Object Pascal.
But of course nowadays there are better languages for real world programming. It's just a shame that there's nothing as simple and elegant for teaching programming ().
() lisp is even more elegant, but it has a lot of gotchas and it's so far from mainstream that using it for teaching isn't a good idea IMHO
double f(double *xs, int n){
return xs[g()];
}

double f(xs, n)
double xs[n];
size_t n;
{
size_t _tmp0 = g(); /* temporary var created by compiler */
assert(_tmp0 < n); /* bounds check inserted by compiler */
return xs[_tmp0];
}
The key to making this possible is telling the compiler about the relationship between double* xs and size_t n; once the compiler has the knowledge that the type of xs is double [n] (array of double with first dimension n) it would be able to automatically insert dynamic bounds checks.
Yes. What you can't do is associate bounds information with some specific pointer to an array, but this will work, for instance:
int *x = malloc(2 * sizeof(int));
x[1]; //ok
x[2]; //runtime error
Why stop there? Don't use C. Use Rust!
When it is possible, I certainly agree that Rust is nicer.
sds strings contain their lengths, so operating on them you don't have to rely on null termination, which (to my knowledge as a lower-midlevel C programmer) is the most prevalent reason why people take issue with C strings.
If you mean that they're not really "strings" but byte arrays I would say that I agree, but to all intents and purposes that's what the C ecosystem considers as strings.
Keeping an API which is very similar to the standard library is also a plus, as it doesn't force developers to change the way they reason about the code.
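To make the idea of a string that carries its own length concrete, here is a minimal sketch. The `lstr` type and its functions are hypothetical and are not the actual SDS layout or API; the choice to keep a trailing nul as well (so the data can still be handed to ordinary C functions) is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying string. */
struct lstr {
    size_t len;    /* number of bytes in data, terminator not counted */
    char data[];   /* flexible array member: len bytes plus one nul */
};

/* Largest payload whose allocation size cannot overflow size_t. */
static size_t lstr_max_len(void)
{
    return SIZE_MAX - sizeof(struct lstr) - 1;
}

/* Creates a string from len bytes, which may contain embedded nuls.
 * Returns NULL on bad input or allocation failure. */
struct lstr *lstr_new(const char *bytes, size_t len)
{
    if ((bytes == NULL && len != 0) || len > lstr_max_len())
        return NULL;
    struct lstr *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    if (len != 0)
        memcpy(s->data, bytes, len);
    s->data[len] = '\0';
    return s;
}

/* Concatenates without scanning for a terminator: both lengths are known. */
struct lstr *lstr_cat(const struct lstr *a, const struct lstr *b)
{
    if (a == NULL || b == NULL)
        return NULL;
    assert(a->data[a->len] == '\0' && b->data[b->len] == '\0');
    if (b->len > lstr_max_len() || a->len > lstr_max_len() - b->len)
        return NULL;
    struct lstr *s = malloc(sizeof *s + a->len + b->len + 1);
    if (s == NULL)
        return NULL;
    s->len = a->len + b->len;
    memcpy(s->data, a->data, a->len);
    memcpy(s->data + a->len, b->data, b->len);
    s->data[s->len] = '\0';
    return s;
}
```

Because the length is stored, a nul byte in the middle of the data does not cut the string short, and the result is released with a plain `free`.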
Wait, haven't I seen that idea somewhere else...?
> If you mean that they're not really "strings" but byte arrays I would say that I agree, but to all intents and purposes that's what the C ecosystem considers as strings.
Aha, strings as byte arrays but with a built-in length marker.
But yeah, Pascal is sooo outmoded and inferior to C...
Sigh.
In fact there are several libraries for string-like objects; the main barrier to use them is that none of them is standard. You can at least acknowledge that before talking about nice-ness, which is a whole other point.
The compiler is trying to help you write better code - suppressing warnings should not be taken lightly.
As i wrote in another comment, something that may lead to issues isn't the same as something that will always lead to issues - e.g. if i check a string's length or actually calculate and allocate the necessary memory before calling strcpy it is perfectly fine and safe to use it, but Visual C++ doesn't know about that, it complains like some stupid greenhorn that read somewhere "never use gotos" and then is surprised when he sees some Linux kernel code with gotos everywhere for cleanup, thinking that those people writing the kernel do not know what they're doing.
Uh, you're not a hardware designer and it shows. What if there's a page fault during the copy, do you handle it in the CPU? That said, have a look at the RISC-V vector instructions (not yet stable AFAIK) and ARM's SVE2: both should allow a very efficient memcpy (among other things) much more easily than with current SIMD ISAs.
Page fault is irrelevant. It already can happen in block copy instructions.
As for block copy instructions, AFAIK there's no such thing in RISC-V, for example.
It was a limitation, because they chose a byte length (to save space). So strings up to 255 characters only. It was decades before folks were comfortable with 32-bit length fields. And that still limited you to 4GB strings. In the bad old days, memory usage was king.
Such a system would effectively remove that feature. Yes, you could disable range checks when indexing into a string, but you still would have to figure out how many length bytes there are. That would only be a little bit faster than a full range check.
Because of that, I don’t see how that would have been useful at the time.
In hindsight, I think the complexity is worth the safety, but I could see why it felt more elegant to use null-terminated strings at the time.
Human concepts are inherently messy. "Elegant" solutions just shove the mess down the road.
The problem is the null termination, which is not general to arrays (though it is sometimes used with arrays of pointers).
Sure 16 exabytes sounds like a lot today, but so did 4 billion ip addresses. Differently bad is not better.
Null is always 1 byte minimum so at best you save size_t-1 bytes per string. Ignoring clever structures like LEB128 varint length.
This is a classic case of "simple is actually complex". How many billions of dollars have null-terminated strings cost? Hope that 3 bytes of overhead per string saved is worth it.
No matter how you slice it, null termination was a mistake.
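For readers who have not met the LEB128 varint mentioned above: it stores an unsigned integer seven bits per byte, low bits first, with the top bit of each byte meaning "more bytes follow". Used as a length prefix, any string up to 127 bytes costs a single byte of overhead, the same as a terminator. The function names below are chosen for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes value as unsigned LEB128 into out (capacity cap bytes).
 * Returns the number of bytes written, or 0 if out is too small. */
size_t uleb128_encode(uint64_t value, unsigned char *out, size_t cap)
{
    assert(out != NULL);
    size_t n = 0;
    do {
        if (n == cap)
            return 0;
        unsigned char byte = (unsigned char)(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;          /* continuation bit */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Decodes an unsigned LEB128 value from at most len bytes.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or does not fit in 64 bits. */
size_t uleb128_decode(const unsigned char *in, size_t len, uint64_t *value)
{
    assert(in != NULL && value != NULL);
    uint64_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len; i++) {
        const uint64_t chunk = in[i] & 0x7f;
        if (shift >= 64 || (shift == 63 && chunk > 1))
            return 0;              /* would overflow 64 bits */
        result |= chunk << shift;
        if ((in[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
        shift += 7;
    }
    return 0;                      /* ran out of input mid-value */
}
```

The trade-off is the one raised earlier in the thread: code reading the string has to decode a variable number of length bytes before it can index into the data.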
cat_pascal_strings(pascalstr *uninited_memory,
pascalstr *left,
pascalstr *right);
how big is uninited_memory? Can left and right fit into it?
You need to design language constructs around Pascal strings to make them actually safe. Such as, oh, make it impossible to have an uninitialized such object. The object has to know both its allocation size and the actual size of the string stored in it.
What is unsafe is constructing new objects in an anonymous block of memory that knows nothing about its size.
C programs run aground there not just with strings!
struct foo *ptr = malloc(sizeof ptr); // should be sizeof *ptr!!
if (ptr) {
    ptr->name = name;
    ptr->frobosity = fr;
}
Oops! The wrong size was allocated: only the size of a pointer, typically 4 or 8 bytes nowadays, but the structure is 48 bytes wide.
"struct foo" itself isn't inferior to a Pascal RECORD; the problem is coming from the wild and loose allocation side of things.
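For comparison, a corrected version of that allocation might look like the sketch below. The members of `struct foo` are guessed for illustration, since the thread never shows the real definition, and `foo_new` is a made-up wrapper.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout: the real struct foo is not shown in the thread. */
struct foo {
    const char *name;
    double frobosity;
    char other_fields[32];
};

/* The bug only matters because the object is bigger than a pointer. */
static_assert(sizeof(struct foo) > sizeof(struct foo *),
              "struct foo is expected to be larger than a pointer");

/* Allocates and initializes one struct foo; returns NULL on failure. */
struct foo *foo_new(const char *name, double fr)
{
    /* sizeof *ptr is the size of the pointed-to object, and it stays
     * correct even if the type of ptr is changed later. */
    struct foo *ptr = malloc(sizeof *ptr);
    if (ptr) {
        ptr->name = name;
        ptr->frobosity = fr;
    }
    return ptr;
}
```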
Working with strings in Pascal is relatively safe, but painfully limiting. It's a dead end. You can't build anything on top of it. Can you imagine trying to make a run-time for a high level language in Pascal? You need to be in the driver's seat regarding how strings work.
Like I get why it happened. It is just crazy how long it has stuck around.
Strings as implemented in e.g. Borland Pascal were better. But then, the length-prefixed implementation had its own downsides. For example, it had to decide how many bits to use for length. 16-bit Pascal would generally use a single byte, and in BP at least, you could even access it as a character via S[0]. Thus, strings were limited to 255 characters (256 bytes including the length byte) - and because this was baked into the ABI, it wasn't something that could be easily changed later.
Hence when Delphi decided to fix it, they basically had to introduce a whole new string type, leaving the old one as is. And then they added a bunch of compiler switches so that "string" could be an alias for the new type or the old, as needed in that particular code file.
> None of BCPL, B, or C supports character data strongly in the language; each treats strings much like vectors of integers and supplements general rules by a few conventions. In both BCPL and B a string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count and strings are terminated by a special character, which B spelled `*e'. This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.
[…]
> C treats strings as arrays of characters conventionally terminated by a marker. Aside from one special rule about initialization by string literals, the semantics of strings are fully subsumed by more general rules governing all arrays, and as a result the language is simpler to describe and to translate than one incorporating the string as a unique data type. Some costs accrue from its approach: certain string operations are more expensive than in other designs because application code or a library routine must occasionally search for the end of a string, because few built-in operations are available, and because the burden of storage management for strings falls more heavily on the user. Nevertheless, C's approach to strings works well.
* https://www.bell-labs.com/usr/dmr/www/chist.html
He mentions Algol 68 and Pascal [Jensen 74].
I personally don't think that the qualitative pros/cons of the chosen approach or alternatives that we're discussing today, 30-ish years later, would be all that new to the designers of C in 1993. The difference is that we've had 30-ish years to watch those decisions play out over millions of lines of code in software running at scales and levels of complexity that programmers in 1993 could only dream of.
Also, software security was barely an issue in 1993. Today, it's a massive issue.
That was him reflecting on things in 1993, but the C team designed things in ~1970. That was basically the Stone or Iron Age of computing.
(OK, it's hard to compare; Code Complete and other much later stuff might be just as good. Too many decades between when I read them to say for sure.)
You mean like the strings in Delphi? Yeah, I can, since I use them daily. Strings in Delphi nowadays are actually more like classes in Java than old Pascal strings. Then, depending on your intent, you either get them as arrays or as old strings after the linker goes over your code. Best of both worlds, and on top of it, if you really want, you can definitely shoot yourself in the leg with unsafe operations. So in the end it's the best of both worlds and the worst of a third. Though with the third one you really need to go out of your way to have it as bad as C strings are.
I doubt string representation is really the blocker here since C-strings are now pretty much just used by some but not all C programmers. QString and GString and C++ std::string and Rust strings and Go strings and Java strings and so on are not null terminated
Better yet, how about Modula-2? I can't help but think that the programming language landscape would be much better if that language occupied the niche that C does today.
This is why whenever I use sizeof, I pass a type, not a variable.
That being said. Length as a first parameter and the rest of the arguments being the variadic bit is also quite normal.
But sure, on modern 64 bit systems just using a 64 bit integer makes much more sense. On a small embedded 8 bit or 16 bit microcontroller it might make sense.
Having truly unbounded integers was rather fun. Of course performance was abysmal.
But if you keep up the good work you will one day go from
extern void *lecturer;
to
static const lecturer;
volatile unsigned short lecturer;
delete from schema.hr.employee
where employee.employee_type = 'Lecturer'
having rownum = cast(dbms_random.value(1,count(*)) as int)
Most Deans' computers have it mapped to alt-delete. They don't even know what it does-- it's just called the "reduce budget function". Which is really unfortunate because when they hit ctrl-alt-delete on a frozen system, but miss the ctrl key by accident, some poor lecturer gets fired and at the end of the semester the Dean says "Huh, wonder where that budget surplus came from.".
Once an entire physics department was disbanded when their Dean's keyboard had a broken ctrl key.
Then on the monday, it was makefiles if i remember correctly, then open(), read(), close() and write(). Then linking (and new libc functions, like strcat) . A day to consolidate everything, including bash and git (a new small project every hour for 24 hours, you could of course wait until the end of the day to execute each of them). And then some recursivity and the 8 queen problem. Then a small weekend project, a sudoku solver (the hard part was to work with people you never met before tbh).
The 3rd week was more of the same: basic struct/enums exercises, then linked list the next day, maybe static and other keyword in-between. I used the Btree day to understand how linked list worked (and understand how did pointer incrementation and casting really work), and i don't remember the last day (i was probably still on linked lists). Then a big, 5-day project, and either you're in, or you're out.
I assure you, strings were not the hardest part. Not having any leaks was.
And then people rhetorically ask themselves why students coming from economically disadvantaged households are under-represented in this industry (one of the best paying industries in this time and age). Stuff like that has got to change.
Medicine is still better paid and better paid universally. Silicon valley is really the outlier here, most of Europe and the world programmers don't get paid that much in comparison.
In the end, it was mostly those that didn't get discouraged and socialized with the other students that would remain in the end.
I myself did not have any programming experience before going through that ordeal.
Those who have been subjected to such programs can also probably agree that the filtering of the first semester (and there is a filter, but again we think it's a fair one not dependent on prior programming experience or other such privilege) ends up normalizing everyone, for the benefit of everyone. For the people who started at 0, they're now Somewhere nearby everyone else, ready for the next (harder) material, and for people who started with some "advantages" they've discovered they... are also now Somewhere, not Somewhere Else ahead of everyone like they might have been at the very start. In these sorts of programs, people with prior experience find that they couldn't sleep through their classes and get A's like they might have pulled off in high school, their advantages were not actually that significant after all, and indeed some from-nothings can and do perform better than they.
For anyone who just wants access to the software industry's wealth, I'd encourage them to ignore college entirely. There may be a case-by-case basis to consider college, especially if you need economic relief now in the form of scholarships/grants/loans only accessible through the traditional college protocol, but in general, avoid.
(If you want something besides just access to the wealth, you have more considerations to make.)
I think the filter is more effective for finding those who can quickly adapt, learn, and grok a methodical mindset. Not necessary characteristics to be a programmer, but necessary characteristics to excel at programming.
If that sounds evil, imagine the grief, wasted money, time, frustration, and stress of letting people get 3-4 years into computer science and then dropping out because it's fucking hard.
So my second hardest classes were freshman year. 3rd year (micro-architecture and assembler) finally bested them.
Their parents can't afford a laptop? They can't afford an Internet connection? The kids don't have a good place to learn in their house? They don't have time?
Is programming affected more than other subjects like math, English/grammar, science, etc?
Also, I'd say "not having segfaults" is the hardest thing to get right when you're going through that.
I wouldn't want to use it my day job, but I'm glad that it was taught in university just to give the impression that string manipulation is not quite as straightforward as it's made to appear in other languages.
The early days of Swift also reminded me of this problem – strings get even more challenging when you begin to deal with unicode characters, etc.
Unfortunately most of those developers don't care much about efficiency, and Python is inefficient out of the box compared to other high-level languages like Java [1] or C#. OO Java courses circulating in academia lack modern functional, and to be frank educational, concepts and must be refreshed first.
I personally would recommend to start with Java and Maven because it's still faster than C# [2], open source, and has a proven track record in regards of stability and backwards compatibility. Plus quickly introduce Spring Framework and Lombok to reduce boiler plate code.
For advanced systems programming I suggest looking into Rust instead of C/C++.
And last but not least the use of IDE's should be encouraged and properly introduced, so aspiring developers are not overwhelmed by them and learn how to use them properly (e.g. refactoring, linting, dead code detection, ...). I recommend Eclipse with Darkest Theme DevStyle Plugin [3] for a modern look.
[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
[2] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
[3] https://marketplace.eclipse.org/content/darkest-dark-theme-d...
I also like the newfound interest in some FP languages, I for example had a mandatory Haskell course in first year — we did not take Monads in this course yet, but I think it is a great introduction for students for a different take from the more imperative world.
We've started from 0 with no assumption of any computer knowledge and first 2 years most courses were using Delphi (console only, no GUI stuff, basically it could just as well have been Turbo Pascal, some Linux enthusiasts used FPC instead of Delphi and it worked).
We all complained that we wanted C++ then, but I learnt to appreciate Pascal later. After the first few months we knew every nook and cranny, and there were very few corner cases and gotchas. So basically we focused on algorithms and not on avoiding the traps the language set for us.
Most people had no programming experience and after a few weeks they wrote correct programs no problem.
I doubt this would happen if we started with C++ as most people wanted, and I think it's better than Python as a starting language because it teaches about static typing and difference between compile- and run-time.
Sadly it's a dead language now.
I understand the predilection for Python but there are some parts of Python that are just... odd.
I used to think that everyone should be taught python first, because it lets you focus on the meat of computer science - algorithms, data manipulation, actually _doing_ something - but after helping my girlfriend out with some comp sci 101-104 projects, I really think Go, Java, or Rust should be everyone's first language. It's hard for someone new to the field to understand the nuances that come with python's corner cutting. You can work yourself into some weird corners because of how permissive the language is, where in a (strongly) typed language, the compiler just says no.
I always feel a little iffy when people talk about Python like it's a language ideally suited to beginners.
Dynamic typing puts so much power in your hands to create expressive structures. But it requires discipline to use properly. It's a great trade off for me but I don't think it would be for beginners.
Could you share an example?
In Python, the weird stuff is generally easy to avoid/ignore until it's actually needed.
We started talking and basically we discovered that what he was teaching was closely related to what I do for work, so he asked me to become a "mentor", meaning a professional who helps students with their thesis.
In the meantime I went to talk during his class about product management as an engineer where basically I said "I'm an engineer like you, go and talk to customers, it's part of the job", plus extreme programming stuff etc...
After that there was a position open and this professor recommended me because I told him it was one of my goals to be a teacher as well.
And then from there I met the head of dept. He was happy with me being versatile, I usually handle C, database design or java.
But the usual stuff is go to the university you like and look for open positions.
I need to get confirmed every semester and apply again. Usually this job is done by people with a main job and sometimes it happens you don't have time in a semester.
20 years ago I was in the exact situation of one of your students, i.e. I was put in front with the C language in the first semester of the first year. I barely, barely passed, failed with glory a similar course in the second semester which I only passed (with an A, to put it in US university terms) a couple of years later after I had managed to learn Python by myself in the meantime.
Because if you get the basics of programming out of the way with something like Python, you can fully concentrate in the second course on low-level hardware stuff, how to use memory etc., which is really important for my students, them being Electrotechnical engineers.
Nowadays I find it extremely strange to think of bits and bytes when being confronted with strings.
That is something I have a hard time conveying as a teacher. My problem is that I have done this so long that I have no idea what there is not to understand about loops ... it's such a simple thing. But my (undergrad biology) students regularly have a hard time grokking the concept no matter what explanation I use.
int i = 0; //a
while(i < 10) //b
{
printf("%d\n", i); //c
++i; //d
}
and introduce for loops as a special case of the while loop:
//a //b //d
for(int i = 0; i < 10; ++i)
{
printf("%d\n",i); //c
}
Then outline situations when you would use a for loop over a while loop, fixed number of repetitions, use with arrays etc.
I've mentored quite a few first semester students (in my spare time, to help. Not as a job) and there is no way some of them would've passed without serious help.
At some point I used to think privately that CS should have a programming test as an admission exam, because these students did drag everyone down. If medicine and law have admission restrictions, why not CS too?
But I have changed my opinion because I think everyone deserves a real opportunity, and our school system does not provide a level playing field sadly. (Also the medicine & law admission criteria are GPA based and that is the last thing I'd want for CS.)
Anyway the real filter was always maths.
Or do you mean value vs reference semantics? In that case, I think C pointers are simpler as a fundamental concept, and slices are best defined in terms of pointer + size.
if word == "this" or "that":
    ...
Not an error, always runs. Very mysterious to a beginner. (Shared with C/C++.) Another one:
counter = "0"
for thing in things:
    if matches(thing):
        counter += 1
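Both pitfalls above can be reproduced in a few lines. This is a minimal sketch, not the original poster's code: the names `naive_check`, `intended_check` and `count_matches` are made up for illustration, and the exact wording of the TypeError message varies between Python versions.

```python
def naive_check(word):
    # Parsed as (word == "this") or ("that"). A non-empty string is truthy,
    # so the condition holds for every word.
    return bool(word == "this" or "that")


def intended_check(word):
    # What the beginner most likely meant.
    return word in ("this", "that")


def count_matches(things, matches):
    counter = "0"  # the overzealous quoting: a str, not an int
    for thing in things:
        if matches(thing):
            counter += 1  # the TypeError is raised here, not at the init
    return counter


print(naive_check("anything"))
print(intended_check("anything"))
try:
    count_matches(["a", "b"], lambda thing: thing == "b")
except TypeError as exc:
    print("TypeError:", exc)
```

Note that `count_matches` only fails once some item actually matches. With no matches it quietly returns the string "0", which is how the bug can hide until the code is tested properly.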
The error is in the init, by someone who is overzealous with their quoting, but it is reported as a runtime error on the attempted increment, which throws a TypeError and helpfully tells them "must be str, not int". Of course I know exactly why it's reporting the problem there and why it's giving that error, but it's a bit confusing to the newbie programmer, and it doesn't even turn up until they actually test their code effectively, which they are also still just learning how to do.
The job role of 'professor' may be able to get tenure (I think these roles usually do), but 'lecturer' really means 'full-time temporary teacher, with a contract for a specified amount of time'.
Them: "Hello Professor"
Me: "Technically I'm not a professor."
Them: "Okay, we'll just call you Doctor."
Me: "Yeah, about that... not a doctor either."
Them: "So why are we paying you?"
Me: "Technically, you're paying the school. And the school is paying me... very little"
Them: "Answer the question"
Me: "Because I know stuff that you don't."
Mostly they still just call me professor and I feel awkward every time.
Professor(Professor&&) = delete;
But I agree there is a risk of academic instructors going way overboard in practice, e.g. by flagging actually useful minor standard conformance violations (like zero length arrays or properly #ifdef'd code assuming bitfield order).
Or imagine math was taught by giving kids all the axioms and requiring them to derive the other rules needed to solve tasks as needed :)
Kids from well off families would be ok - it would just be considered another random thing you have to teach your kids to help them make it.
But other kids would suffer and think "English and math is not for me".
By that same argument, schools should not teach anything that is not widely known by 100% of the parents of each kid. Otherwise, it would be discrimination to those kids whose parents cannot help. I disagree very strongly with this principle.
I have two kids, and the best things that they learn in school are precisely those that I'm unable to teach them. For a start: mastery of the language, since I'm not a native speaker of the language spoken where we live. I would be frankly enraged if the school lowered its language standards to accommodate the needs of my kids, who do not speak it at home!
Not at all what I mean. I mean schools (at least primary schools) should be designed for the top 80% or 90%, not for the top 10% or 20%. You can never get to 100%, but giving up from the start and going for 20% makes no sense.
You should expect people taking math at university to be able to solve linear equations, and explaining it there is a waste of time. But you shouldn't expect kids in primary school to be able to do the same, and it is your responsibility to prepare them in case they want to pursue an academic career.
If public schools teach linear equations it's ok to assume that knowledge at university.
If they don't - it's not.
It should be the same with teaching programming; anything else is just funding rich people's kids' education with everybody's taxes.
The whole point of common public low-level education is to maximize the number of people participating in the economy. It's much better if everybody can read and write. Whole industries are impossible without this. And so is democracy.
It's the same with basic programming and math literacy. It benefits the whole society if the vast majority of people have it.
If you "weed out" 60% or 80% of population just because they happen to be born in the wrong environment or went to the wrong school - you lose massive amounts of money and economic/scientific potential. Then you have to import these people from countries which don't fuck their own citizens in such a way.
He fears that while a professor might imagine they're weeding out people who lack 'dedication' or 'aptitude', they're actually weeding out people who didn't grow up with a PC at home.
ArrayList etc do have add(), and they implement it by re-allocating the backing array once capacity is exceeded.
In practice, you'd use the ArrayList anyway. I don't think it's worthwhile comparing Go and Java "language-only", because the standard library is as much a part of the language definition as the fundamental syntax; indeed, what goes where is largely arbitrary. E.g. maps in Go are fundamental, but the primary reason is that they couldn't be implemented as a library in Go with proper type safety, due to the lack of generics.
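For anyone who hasn't seen it, the "re-allocate the backing array once capacity is exceeded" idea can be sketched as follows. This is an illustration only, written in Python rather than Java. `GrowableArray`, its `reallocations` counter and the doubling policy are assumptions made for the sketch, not how any particular ArrayList is implemented.

```python
class GrowableArray:
    """Hypothetical array-backed list; not Java's actual ArrayList code."""

    def __init__(self, capacity=4):
        self._capacity = capacity
        self._size = 0
        self._backing = [None] * capacity  # stands in for a fixed-size array
        self.reallocations = 0             # how many times we had to grow

    def add(self, item):
        # Grow only when the backing array is completely full.
        if self._size == self._capacity:
            self._grow()
        self._backing[self._size] = item
        self._size += 1

    def _grow(self):
        # Assumed policy: double the capacity (at least 1). Real
        # implementations choose their own growth factor.
        new_capacity = max(1, self._capacity * 2)
        new_backing = [None] * new_capacity
        for i in range(self._size):
            new_backing[i] = self._backing[i]  # copy the old elements over
        self._backing = new_backing
        self._capacity = new_capacity
        self.reallocations += 1

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        return self._backing[index]


numbers = GrowableArray(capacity=4)
for n in range(10):
    numbers.add(n)
print(len(numbers), numbers.reallocations)
```

Because the capacity grows geometrically, most calls to `add` just write into a free slot, and the copying cost is spread over many insertions.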
* As a potentially amusing aside, a different course in a different degree program had a professor rage-quit after his first semester because he didn't want to deal with children -- he had a policy of giving 0s on papers with no name or class info on them, and enough students ("children") failed to do that correctly but complained hard enough to overturn the policy and get a resubmit.
BTW "no child left behind" isn't practical, there are people who can't learn basic stuff no matter how hard you try. But "fewer than X% of kids left behind" is practical, for some low value of X.
1) 'Steve' (his first name)
2) 'Mr. Wolfman' (his last name)
3) 'Darth Wolfman' (funny, obviously not meant to be taken seriously, option)
Guess what the class overwhelmingly voted for? :)
"God-Boss."
(Pace Steven Brust.)
Yes! There are millions of kids in the US whose parents can't afford a cheap $300 laptop. The federal government pays for school lunches because there are so many kids who wouldn't even be getting decent food otherwise.
> They can't afford an Internet connection?
See above. Also, there are many places in the US where getting broadband service is very difficult. Including places just an hour outside of Washington, DC. My parents were only able to get conventional broadband service a few years ago. Prior to that they paid exorbitant fees for satellite internet service with a 500 MB per month cap.
> The kids don't have a good place to learn in their house?
Imagine being a kid with 3 siblings and your parent(s) living in a studio apartment. Or a kid that doesn't have a stable "home" at all.
> They don't have time?
That can be an issue too, depending on age. A teenager may be working outside of school hours to help take care of the family's financial needs.
All of the above, and it's surprising this isn't obvious. It may be hard to notice or internalize if you've never seen it and only know privilege, but possession of all or even some of those things is not a guarantee for everyone. Believe it or not, there are some who don't come home to a computer, caring (or even existent!) parents, stable meals, or free time.
Maybe I didn't use the right words to formulate my question.
> Their parents can't afford a laptop?
Holy crap, the amount of privilege shown off in just two sentences is absolutely astounding.
This may come as a shock to you, but a very significant number of people don't have a couple hundred dollars to buy a low-end used laptop. 40% of Americans would struggle to come up with $400 for an emergency expense [0], let alone save $400 for a laptop.
[0] https://www.cnbc.com/2019/07/20/heres-why-so-many-americans-...
It actually doesn't say that, it says they don't have $400 in cash equivalents but may be able to produce it by selling "assets". So a person who keeps all their savings in CDs or investments also counts, although only for expenses you can't put on credit cards.
They often take payday loans, mortgaging their next minimum-wage paycheck; since the next paycheck minus payment no longer covers their regular living expenses, they take another predatory loan or pawn another heirloom. 80% of people who take a payday loan have to renew it because they can't repay it. I have a deep personal dislike for Dave Ramsey, but he does a good job of explaining how even minor emergency expenses can lead to a cycle of debt and further despair. (https://www.daveramsey.com/blog/get-out-payday-loan-trap)
There is so much more instability and precarity in this country than most PMC people can imagine.
Figure 12 on page 21 of the underlying report - https://www.federalreserve.gov/publications/files/2017-repor... .
https://www.statista.com/statistics/756054/united-states-adu...
Conversely, programming was not available in my fairly middle-class school. In terms of money, we only have to look to the laptops schools are providing to students (or not depending on government funding) to see how many children don't have access to a laptop. A good place to learn can also be hard to find for large families in small houses which is sadly all too common for low income households.
Probably a bit more, as it is common to learn other subjects by a book, but learning programming without a computer ... sounds hard.
But it requires you to refresh your knowledge constantly so from this point of view it's similar
A few things though:
1. A psychiatrist has to commute 1 to 2 hours per day. So that salary is not for 8 hours per day, but 9 hours at minimum. To put their salary on an 8 hour basis, it needs to be multiplied by 8/9, or by as little as 8/10 for the longer commute.
2. The psychiatrist has to be on location. The cost associated with that is hard to quantify, but it is there. For example, I always sleep during the afternoon for 20 minutes, a psychiatrist can't do that. Also, I can take a break whenever I want, a psychiatrist can be on call for 24 hours straight in severe cases. Let's suppose this gives a cost of 1/16 as a multiplier (half an hour of extra work per day).
So the combined multiplier for a psychiatrist is at best 8/9.5 = 16/19 (8 paid hours out of 8 + 1 + 0.5), which turns a 5000 EUR salary into roughly 4200 EUR. This can be amazing or not so much, depending on your own personal preference. My personal multiplier is 0.8 on top of all of this, so for me a 5000 EUR salary is worth 3360 EUR if it's working as a psychiatrist.
As a developer I experience something different, which is:
1. I do not have to commute, I can if I want to, but don't have to.
2. I do not have to be on location, nor do I have a strict schedule for going client after client. I can take random breaks during the day if it helps me be more productive.
So a developer's salary of 2600 EUR is much more like an actual 2600 EUR in that sense. Moreover, my personal multiplier for being a developer is 1. There are some things I dislike and some things I absolutely love about being a dev (e.g. being a true netizen in the sense that you can randomly interact with APIs if you want to).
To conclude: the absolute values are far apart, but the relative values might not be. It differs on a person by person basis, and I haven't discussed the whole picture of course (e.g. needing to stay sharp as a dev, I don't know how that works for psychiatrists).
[1] https://www.monsterboard.nl/vacatures/zoeken/?q=Psychiater&w...
[2] https://www.glassdoor.nl/Salarissen/junior-web-developer-sal...
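The arithmetic in the comparison above can be checked with a few lines. This is only a sketch under the comment's own assumptions (8 paid hours, a 1 hour commute, half an hour of on-location overhead, and personal multipliers of 0.8 and 1). The helper names `time_multiplier` and `effective_salary` are made up for illustration.

```python
from fractions import Fraction

WORK_HOURS = Fraction(8)            # paid hours per day
COMMUTE_HOURS = Fraction(1)         # lower bound of the 1 to 2 hour commute
ON_LOCATION_HOURS = Fraction(1, 2)  # "half an hour of extra work per day"


def time_multiplier(paid, *unpaid):
    """Share of the total daily time commitment that is actually paid."""
    return paid / (paid + sum(unpaid))


def effective_salary(nominal, multiplier, preference=1.0):
    """Nominal salary scaled by the time multiplier and a personal factor."""
    return float(nominal * multiplier) * preference


psychiatrist_multiplier = time_multiplier(WORK_HOURS, COMMUTE_HOURS, ON_LOCATION_HOURS)
psychiatrist_adjusted = effective_salary(5000, psychiatrist_multiplier)
psychiatrist_personal = effective_salary(5000, psychiatrist_multiplier, preference=0.8)
developer_personal = effective_salary(2600, Fraction(1), preference=1.0)

print(psychiatrist_multiplier)       # 16/19
print(round(psychiatrist_adjusted))  # 4211
print(round(psychiatrist_personal))  # 3368
print(round(developer_personal))     # 2600
```

The comment rounds the intermediate figure down to 4200 EUR before applying the 0.8 factor, which gives its 3360 EUR. Without that intermediate rounding the result is about 3368 EUR, so the conclusion does not change.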
When people on HN discuss salaries, or I see a job posting from a Silicon Valley company, I can't help thinking that we don't even pay our CTO that much. Frequently you could get two developers for the same price here in Denmark.
Think about what your CTO could do in that setting and realize that he's probably worth more to FAANG shareholders than to you hence the salary differential.
For the record, I do not work at a FAANG.
Interesting. I'm in South Africa, right now. The largest offer for a senior C# dev *right now* on www.pnet.co.za is R960k/a.
Twelve years ago, the GP that I was dating, who worked in a *state hospital* (i.e. not making as much as she could have in private practice) was making more than that.
I don't believe that doctors' salaries over the last 12 years have effectively been lowered. OTOH, if you know of places where they are offering more than R1.8m/a for senior developers, then by all means give me their contact details.