Git's list of banned C functions (github.com)

876 points by muds 5 years ago | 613 comments

paultopia 5 years ago |

It's really wild, as a person coming from other languages who has written maybe ten lines of C in his life, that the functions that seem to be massive footguns in C are, like, "format a string" or "get time in GMT." That's... really scary.

jchw 5 years ago | |

Unfortunately, much of the pain with C surrounds dealing with strings. It’s been a bit of a theme on Hacker News for the past few days, but it’s actually a pretty good spotlight on something I feel is not always appreciated - strings in C are actually hard, and even the most safe standard functions like strlcpy and strlcat are still only good if truncation is a safe option in a given circumstance (it isn’t always.)

(~~Technically~~ Optionally, C11 has strcpy_s and strcat_s which fail explicitly on truncation. So if C11 is acceptable for you, that might be a reasonable option, provided you always handle the failure case. Apparently, though, they are not usually implemented outside of Microsoft CRT.)

edit: Updated notes regarding C11.
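To make the truncate-versus-fail distinction concrete, here is a minimal sketch of a copy that fails explicitly instead of truncating. `copy_or_fail` is a hypothetical helper, not a standard or Annex K function; it relies only on the standard `strlen` and `memcpy`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes).
 * Returns 0 on success, -1 if src plus its terminator would not fit.
 * On failure dst is set to the empty string rather than a truncated copy. */
int copy_or_fail(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || dst_size == 0 || src == NULL)
        return -1;

    size_t len = strlen(src);
    if (len >= dst_size) {      /* no room for the terminating NUL */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);  /* +1 copies the terminator too */
    return 0;
}
```

As the comment above says, this is only an improvement if every caller actually checks the return value.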

WalterBright 5 years ago | | |

Whenever I review C code, I first look at the string function uses. Almost always I'll find a bug. It's usually an off by one error dealing with the terminating 0. It's also always a tangled bit of code, and slow due to repeatedly running strlen.

But strings in BASIC are so simple. They just work. I decided when designing D that it wouldn't be good unless string handling was as easy as in BASIC.
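A sketch of the "repeatedly running strlen" problem and one way around it: calling strcat() in a loop rescans the destination from the start on every append, while tracking the write position avoids that. `join_words` is a hypothetical helper written for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join `count` words with single spaces into dst.
 * Tracks the write position instead of calling strcat(), which would
 * rescan dst from the start on every append.
 * Returns 0 on success, -1 if the result (with terminator) does not fit;
 * on failure dst holds the words that did fit and is still terminated. */
int join_words(char *dst, size_t dst_size, const char *const *words, size_t count)
{
    size_t used = 0;                 /* bytes written so far, excluding NUL */

    if (dst_size == 0)
        return -1;
    dst[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(words[i]);
        size_t sep = (i > 0) ? 1 : 0;

        /* Need room for the separator, the word, and the terminator. */
        if (len + sep >= dst_size - used)
            return -1;

        if (sep)
            dst[used++] = ' ';
        memcpy(dst + used, words[i], len);
        used += len;
        dst[used] = '\0';
    }
    return 0;
}
```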

InvOfSmallC 5 years ago | | |

I teach at university as an external lecturer. Teaching strings in C is the hardest thing I have to do every time. The university decided to teach C to first-year students without previous experience. My feedback was to do a precourse in Python to let them relax a bit with programming as a concept and then teach C in a second course.

masklinn 5 years ago | | |

> Technically C11 has strcpy_s and strcat_s

"Theoretically" is the word you're looking for: they're part of the optional Annex K so technically you can't rely on them being available in a portable program.

And they're basically not implemented by anyone but Microsoft (which created them and lobbied for their inclusion).

brmgb 5 years ago | | |

The issue is pretending that C even has strings as a semantic concept. It just doesn't. C has sugar to obtain a contiguous block of memory storing a set number of bytes and to initialize them with values you can understand as the string you want. Then you are passing a memory address around and hoping the magic value byte is where it should be.

C is semantically so poor, I find it hard to understand why people use it for new projects today. C++ is over complicated but at least you can find a good subset of it.

rbanffy 5 years ago | | |

> strings in C are actually hard,

Strings in C are more like a lie. You get a pointer to a character and the hope there is a null somewhere before you hit a memory protection wall. Or a buffer for something completely unrelated to your string.

And that's with ASCII, where a character fits inside a byte. Don't even think about UTF-8 or any other variable-length character representation.

In fairness, the moment you realize ASCII strings are a tiny subset of what a string can be, you also understand why strings are actually very complicated.

mxcrossb 5 years ago | | |

What I don’t understand is why C programmers use the built in strings. It’s like rolling your own sorting algorithm every time you need it. Surely someone could write a better string library in C that hides the complexity. The real problem is that C programmers are apparently allergic to using other people’s code.

swlkr 5 years ago | | |

I'm partial to https://github.com/antirez/sds these days

macjohnmcc 5 years ago | | |

strcpy is a coding challenge where I work for interviews. I typically ask them to write it as the standard version and ask them why they might not want to use it to see if they are aware of the risks. After that I ask them to modify the code to be buffer safe. And for those claiming C++ knowledge ask them to make it work for wchar_t as well to see if they can write a template. Some people really struggle with this.

lenkite 5 years ago | | |

If only C had followed the Pascal way to have the size with a string - so much human suffering could have been avoided!

ryandrake 5 years ago | | |

> ~~Technically~~ Optionally, C11 has strcpy_s and strcat_s which fail explicitly on truncation. So if C11 is acceptable for you, that might be a reasonable option, provided you always handle the failure case.

One of the big problems with C programmers is they often neglect to check for and handle those failure cases. Did you know that printf() can fail, and has a return value that you can check for error? (Not you, personally, but the "HN reader" you) Do you check for this error in your code? Many of the string functions will return special values on error, but I frequently see code that never checks. Unfortunately, there isn't a great way to audit your code for ignored return values with the compiler, as far as I know. GCC has -Wunused-result, but it only outputs a warning if the offending function is attributed with "warn_unused_result".

I'm not a huge fan of using return values for error checking, but we have the C library that we have.
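For illustration, a small sketch of both points: checking printf()'s return value, and opting a function into the warning via the GCC/Clang `warn_unused_result` attribute. `MUST_CHECK` and `print_line` are hypothetical names made up for this example.

```c
#include <stdio.h>

/* GCC/Clang extension; expands to nothing on other compilers. */
#if defined(__GNUC__)
#define MUST_CHECK __attribute__((warn_unused_result))
#else
#define MUST_CHECK
#endif

/* Hypothetical wrapper: print a line and report failure.
 * printf() returns the number of characters written, or a negative
 * value on an output error. */
MUST_CHECK int print_line(const char *text)
{
    int n = printf("%s\n", text);
    return (n < 0) ? -1 : 0;
}
```

With this attribute in place, a bare `print_line("x");` statement should draw a -Wunused-result warning from GCC and Clang, while `if (print_line("x") != 0) { ... }` compiles quietly.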

loeg 5 years ago | | |

Truncation, even if it is wrong in an application logic sense, is strictly superior to UB (and in practice, buffer overruns, which can be exploitable). That's the main benefit of strlcpy/strlcat. It is certainly possible to construct a security bug through truncation! But it is much more common to have security bugs from uncontrolled buffer overruns.

liuliu 5 years ago | | |

Yeah. I just avoid str manipulations in general in C and when I have to, fuzz it ... (but still, the perf cliff is definitely new to learn in the past few days).

munchbunny 5 years ago | |

The decision to make C strings null terminated with implied length instead of length + blob continues to trip us up, 30+ years later. There's a good reason the "safe" versions of those functions all take length parameters. But way back when this approach was chosen, I don't think the state of the art could fully predict this outcome.

But also, "strings" and "time" are actually very complex concepts, and these functions operate on often outdated assumptions about those underlying abstractions.

kiwidrew 5 years ago | | |

I would argue that C's fundamental mistake (well, more like limitation due to hardware of the time) was allowing arrays to decay to pointers; arrays hold valuable type information (the length!) that is lost once converted to a pointer.

C99 came so very very close with VLAs. You can declare a function like:

  int main(int argc, char *argv[argc]) { ... }

But C99 requires the compiler to discard the type annotations and treat the declaration as equivalent to:

  int main(int argc, char **argv) { ... }

Imagine a world where the C string functions were declared as:

  char *strndup(s, n)
    size_t n;          /* declared first so it is in scope for s[n] */
    const char s[n];
  {
    /* now we can do sizeof(s) and bounds checking! */
  }

(You'd have to use K&R style declarations to get around the fact that the pointer argument comes before the length argument, alas.)

Edit: and then C11 made VLA support optional, since the feature didn't get used much, because the feature was only half-baked to begin with... sigh.
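A short sketch of the decay being described, in today's C: the array's size is visible where the array is declared, but only a pointer reaches the callee. The function names are made up for the example.

```c
#include <stddef.h>

/* Inside the callee only a pointer arrives, so sizeof reports the
 * pointer size, not the size of the caller's array. */
size_t size_seen_by_callee(const char *s)
{
    return sizeof(s);   /* size of a pointer, e.g. 8 on typical 64-bit targets */
}

/* In the scope where the array is declared, its full size is known. */
size_t size_seen_by_owner(void)
{
    char buf[100];
    return sizeof(buf); /* 100 */
}
```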

retrac 5 years ago | | |

For reasons that were never clearly articulated, the prefix approach was considered odd, backwards, and to have numerous downsides, at least where I learned C. In hindsight, I can only cringe at that attitude. Strings as added in later Pascal, about 40 years ago now, were memory safe in a way that C strings still are not.

Blikkentrekker 5 years ago | | |

> But also, "strings" and "time" are actually very complex concepts, and these functions operate on often outdated assumptions about those underlying abstractions.

Even in safer languages such as Rust, there are often questions as to why certain string operations are either impossible, or need to be quite complicated for a rather simple operation, and these are then met with responses such as “Did you know that the length of a string can grow from a capitalization operation depending on locale settings of environment variables?”

P.s.: In fact, I would argue that strings are not necessarily all that complicated, but simply that many assume that they are simpler than they are, and that code that handles them is thus written on such assumptions that the length of a string remain the same after capitalization, or that the result not be under influence of environment variables.

m463 5 years ago | | |

I think 30-40 years ago it was perfectly appropriate to null-terminate strings. Every byte actually counted.

I remember thinking about setting the high bit to denote the end of string to save space.

Nowadays the binary for "hello world" might be as big as a whole operating system of the past.

(though honestly I can't recall the size of the OS on a boot floppy, but the original floppies were 160k)

jrimbault 5 years ago | | |

30+ years -> 50+ years

Funny mind thing to forget to increment counters each year.

kazinator 5 years ago | | |

The reason that the safe functions take length parameters is that they produce a new object in uninitialized memory, a pointer to which is specified by the caller.

It has nothing to do with null termination.

And that uninitialized memory is not self-describing in any way in the C language. Which is that way in machine language also.

This is a problem you have to bootstrap yourself somehow if you are to have any higher level language.

The machine just gives you a way to carve out blocks of memory that don't know their own type or size. C doesn't improve on that, but it is not the root cause of the situation. Without C, you still have to somehow go from that chaos to order.

Copying two null terminated strings into an existing null-terminated string can be perfectly safe without any size parameters.

   void replace_str(char *dest_str, const char *src_left, const char *src_right);

If dest_str is a string of 17 characters, we know we have 18 bytes in which to catenate src_left and src_right.

This is not very useful though.

Now what might be a bit more useful would be if dest_str had two sizes: the length of string currently stored in it, and the size of the underlying storage. This particular operation would ignore the former, and use the latter. It could replace a string of three characters with a 27 character one.
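A minimal sketch of that last idea, assuming a hypothetical `struct sized_str` that carries both the current length and the storage size. The replace operation checks only the capacity, and `left` and `right` must not point into the destination's own storage.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string type carrying both sizes described above. */
struct sized_str {
    char  *data;     /* storage, kept NUL-terminated for C interop */
    size_t len;      /* length of the string currently stored */
    size_t cap;      /* size of the underlying storage in bytes */
};

/* Replace the contents of dest with left followed by right.
 * Only dest->cap matters for the bounds check; the old dest->len is ignored.
 * Returns 0 on success, -1 (dest unchanged) if the result does not fit. */
int sized_str_replace(struct sized_str *dest, const char *left, const char *right)
{
    size_t l = strlen(left);
    size_t r = strlen(right);

    /* Need l + r + 1 <= cap, written so the arithmetic cannot wrap. */
    if (dest->cap == 0 || l >= dest->cap || r >= dest->cap - l)
        return -1;

    memcpy(dest->data, left, l);
    memcpy(dest->data + l, right, r);
    dest->data[l + r] = '\0';
    dest->len = l + r;
    return 0;
}
```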

coliveira 5 years ago | | |

Null terminated strings are remnants of an era when computers had little memory available. So, at the time it seemed smart to discard the length field and use a single byte-sized terminator (null). If you are writing an operating system for a machine with little memory to spare, this seems like a good decision. Of course things are very different now when memory is not a problem and the goal is safety.

frob 5 years ago | |

As someone who learned C as their first language, strings in every single language after that have felt like cheating.

"What? You mean I can type an arbitrary string and it works? I don't need to worry about terminators or the amount of memory I've allocated? You can concatenate two strings with +?!? What is this magic?"

unbalancedevh 5 years ago | | |

It always makes me wonder if there's some hidden overhead that I'm absorbing. When I program in C I feel like I know a lot better what the generated instructions will be. Using higher-level languages for embedded programming where resources are tight makes me uncomfortable.

macintux 5 years ago | | |

Yeah, every time I decide to play with C for nostalgia's sake, I immediately get hung up on just how painful everything is, especially strings.

I still love C, but I'd do my best not to have to write anything serious with it again.

cesaref 5 years ago | |

I think the key is to understand the historical context of C, what it was competing with, and what concerns people writing C had.

Compared to the alternative (straight assembler) at the time as a systems programming language, C is a massive step up.

Also, the UNIX way was independent processes, so the APIs did not need to be thread safe, as there was no threading in the target architectures.

Now given the massive amount of existing C out there from the time of such architectures, you either have to move the API and language on to make it incompatible with existing code, or support the old baggage. The language has kept compatibility, and in this case, the github peeps have deprecated APIs using macros, so it's a reasonable approach.

An alternative approach would be to move the language on, but by its nature it won't be compatible with C, so you give it a new name. You call it things like go, or rust, or swift. These are all C with the dangerous bits removed. It'll be interesting in 40 years time to see if people are having the same conversation about these languages - 'OMG, how did people write stuff in rust? It can't cope with [insert feature of distributed quantum computing]. It's really scary'

johnnycerberus 5 years ago | | |

I wouldn't say that Go is an alternative approach. I mean, what's the difference between Go and Java AOT with Graal? But Rust is truly an alternative to C/C++.

da39a3ee 5 years ago | | |

This is git, not github.

cperciva 5 years ago | |

A better way of looking at it is that functions which expose very simple operations were among the first ones to be placed into the standard library -- and consequently are the least well thought out.

IgorPartola 5 years ago | |

This is a lot like how in JavaScript you have footguns like the with statement or in Python 2 where you have Unicode issues, etc. I am sure we could define a new C standard that excludes these functions as obsolete, but the linked header file is a pretty sensible interim solution. C is an old language and it’s kind of amazing that code written 30 years ago can still by and large be compiled by a modern compiler. Ever try to run 3 year old React projects using today’s React? :)

ggregoire 5 years ago | | |

> in JavaScript you have footguns like the with statement

I've been coding in JS on a daily basis for more than 10 years and today I learned there is a `with` statement in JS.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

Edit: well, seems like it's been deprecated/forbidden since ES5 (2009), so it makes sense I've never seen it.

viklove 5 years ago | | |

It amuses me that HN hates JS so much, that even a topic about problems with C turns into a JS-bashing thread.

Also, I just want to remind you that JS isn't just React. There are plenty of libraries written in C that introduce breaking changes over the course of 3 years. Nothing will stop people from finding ways to complain about JS though, I know. The hate-boner is very real.

detaro 5 years ago | | |

Because individual libraries choosing to change quickly is comparable to language stability how? The relevant comparison would be "run a 3y old react app (or a 20 year old website using JS) in a modern browser or interpreter"

rtpg 5 years ago | |

The string stuff is kind of the original sin, but to be honest almost all programming environments have massive footguns when it comes to times/dates. Python's datetime story is _extremely_ painful to deal with. Try doing .... I dunno, anything apart from getting the current time and doing an ISO format of a Javascript Date object.

I think stuff has kinda gotten better, but while Unicode had emoji to kinda save the day, dates never had this moment and we're still suffering through major messes on a daily basis because of it.

AaronFriel 5 years ago | | |

Python's dates are very unlikely to cause quadratic or exponential performance dips, segfaults, or remote code execution vulnerabilities. (And JS now has Date#toISOString, since ES5.)

C's string manipulation functions are a regular source of the worst vulnerabilities in software.

Even if they're in the same category of legacy cruft, they're not even remotely in the same magnitude of consequences.

ironmagma 5 years ago | |

Yeah, there is a culture of complacency in C probably owing to the enormous historical baggage of legacy code that has to be supported and the blurred line between stdlib and system call.

freedomben 5 years ago | | |

I disagree completely. Devs who use C are the least complacent about security in my experience. The problems are from previous eras before they knew about many of these things. A ton of people in modern languages couldn't name a single dangerous function, though they do exist in every language. You'd be amazed at how many race condition vulns result from TOCTOU errors just in authentication, or checking for the existence of a file before opening it, etc.

It's absolutely true that decades ago the C community was complacent, but it's not true now. Source: I taught secure coding in C/C++ in the 00s.

dangerbird2 5 years ago | | |

It's not really complacency: it's that the standard library is intentionally minimalistic to maintain portability and backwards compatibility. If you want sensible string handling, it's usually best to use a high level utility library like GLib (https://developer.gnome.org/glib/stable/) or Apache Portable Runtime (http://apr.apache.org/), or roll your own safe string type (preferably non-null terminating).

dangerbird2 5 years ago | | |

The C standard library doesn't really relate directly to system calls (at least in modern OSes). In particular, the stdio.h functions are buffered by default, while their system call analogues are not. For Unixes, system call wrappers are typically found in <unistd.h>, not the "official" C standard library.

Spivak 5 years ago | | |

I mean on Linux you're not encumbered by this because the syscall api is stable but in practice most GNU/Linux distros assume glibc. You can't correctly resolve a hostname on Linux without farming out to glibc -- hell even the kernel punts to userspace for dns names but you can technically ignore it if you want.

On BSDs and macOS you're always SOL because the syscall api isn't stable and only the C wrappers are.

beej71 5 years ago | |

While it's true that there are a lot of unsafe functions in C, it's not really a mistake. C is a fundamentally unforgiving language. You just have to accept the fact you're driving a naked supercar with no seatbelts.

It's easy to survive: just don't crash. :)

And, functions aside, it's trivial to write a C program that bombs out without calling any functions at all, safe or otherwise.

It's a language from a different era, for sure. Back then no one had the computing power to build Rust. And remember that before C, they were writing Unix in assembly language. So sprintf() was a big step up!

matheusmoreira 5 years ago | |

Yeah, because of NUL-terminated strings. They cause so many problems it's not even funny. Even something simple like computing the length of the string is a linear time operation that risks overflowing the buffer. People attempted to fix these problems by creating variations of those functions with added length parameters, thereby negating nearly all benefits of NUL-terminated strings.

Why can't we just have some nice structures instead?

  struct memory {
      size_t size;
      unsigned char *address;
  };

  enum text_encoding { TEXT_ENCODING_UTF8, /* ... */ };

  struct text {
      enum text_encoding encoding;
      struct memory bytes;
  };

All I/O functions should use structures like these. This alone would probably prevent an incredible amount of problems. Every high-level language implements strings like this under the hood. Only reason C can't do it is the enormous amount of legacy code already in existence...
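As a rough sketch of how operations on such a structure might look, here are two hypothetical helpers built on the `struct memory` above. Neither exists in any standard library; they only illustrate that the bounds check and the length lookup become trivial once the size travels with the pointer.

```c
#include <stddef.h>
#include <string.h>

struct memory {
    size_t size;
    unsigned char *address;
};

/* Hypothetical helper: copy src into dst only if it fits.
 * Returns 0 on success, -1 if dst is too small (nothing is written). */
int memory_copy(struct memory dst, struct memory src)
{
    if (src.size > dst.size)
        return -1;
    memcpy(dst.address, src.address, src.size);
    return 0;
}

/* Length is a field read, not a linear scan for a terminator. */
size_t memory_length(struct memory m)
{
    return m.size;
}
```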

guerrilla 5 years ago | | |

That would be nice. You hit on the other hell with C strings: modern encodings where wchar_t and mb* are useless and replacements essentially don't exist yet with char8_t, char32_t etc. Then there's the locale chaotic nonsense [1]. A new libc starting fresh would be nice.

1. https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...

SavantIdiot 5 years ago | |

If you list the languages you use, I'd be happy to point out the "footguns" in each of them. For all the warts on C, there really is no language that can compete for what it has accomplished over ~50 years.

Recall that during the rise of C, people were writing machine code on punch cards. Assembly -> Machine code has far more footbullets than C, it is a tradeoff between hand holding and tiny fast code.

Wow, this blew up.

To all the people popping off about how great other languages are, tell me: when will we see the Unreal Engine written in Python, or Pascal, or Algol, or Rust, or Go... the next big step is WebASM (or .cu), and that's way more footbullet-y than C. And what is the native language all of your sub-30 year old interpreted languages were written in? Thank you!

atoav 5 years ago | | |

Yeah there are footguns in every language. But this is not a boolean question about the presence of footguns, this is about how much one has to know to be able to handle a language safely.

I know C/C#/Python/Rust/Javascript.

After a decade of using C I am still not totally sure if I didn't dangle a pointer somewhere in precisely the wrong way to create havoc. And yeah, that means I have to get better, etc. But that is not the point. The point is that even with a lot of experience in the language you can still easily shoot yourself in the foot and not even notice it.

Meanwhile after a month of using Rust I felt confident that I didn't shoot myself in the foot, because I know what the compiler guarantees (e.g. ownership). While in C shooting myself in the foot happens quite often, in Rust I would have to specifically find a way to shoot myself in the foot without the compiler yelling at me, and quite frankly I haven't found such a way yet.

Javascript is odd, because the typesystem has quite a few footguns in it. This is why such things like Elm or Typescript exist: to avoid these footguns.

I don't want to take away from the accomplishments of C, and I still like the language, but to claim it is equally likely in all languages to shoot yourself in the foot is not true.

eschaton 5 years ago | | |

This is a grossly inaccurate description of computing at the time of the rise of C. C was competing with Pascal/Modula, BLISS, PL/I, BCPL, and so on, not assembly on punched cards.

The “C competing with assembly” meme was very specific to microcomputer game and operating system development, not more general microcomputer application development, and not to minicomputer or mainframe development.

cygx 5 years ago | | |

> Recall that during the rise of C, people were writing machine code on punch cards.

Or Fortran, Algol, Lisp, Cobol, Basic, Pascal, ...

Gibbon1 5 years ago | | |

My favorite assembly foot gun was a guy I worked with had a cute routine. You had a call to the routine, followed by a null terminated string after that. The routine would spit the string to the terminal. And then return to the location after the string.

He had some bug where in one place it returned to the start of the string, executed it, and kept going. The end result just happened to be a nop. Had been like that in production for a couple of years.

int_19h 5 years ago | | |

Consider the fact that Simula-67, which predated C by 3 years, had classes and objects very similar to what Java offers (and then some - e.g. coroutines), and a built-in string processing library that used object-oriented syntax.

The reason why C won had little to do with its advantages as a language over the competitors. It just happened to be the systems language for Unix, which was the winner in the early OS wars on microcomputers (for unrelated reasons). Once it became so established, there was a positive feedback loop: you would write portable code in C, because you knew that it was the fastest language that most platforms out there would support. And then any new platform would offer a C compiler, because they wanted to be able to run all the existing C code out there. And so, here we are.

samatman 5 years ago | | |

Your edit really isn't helping your case.

Those of us who have always known about less dangerous 'system' languages (Pascal probably being the most popular) lament the fact that so much code got written in C instead.

It wasn't inevitable. It was preventable! It just didn't happen that way for reasons which are largely historical.

I don't work for the Rust Evangelism Strike Force, my main project is written in (as little) C (as possible), but I beg anyone who has a choice: use something else! Rust is... fine, Zig is promising. Ada still works!

Writing out the set {Python, Pascal, Algol, Rust, Go} tempts me to say uncharitable things about your understanding of the profession, but I accept you were just being snarky so I'll just gesture in the direction of how $redacted that is.

badsectoracula 5 years ago | | |

> when will we see the Unreal Engine written in...

Why would a huge C++ (not C, btw) codebase with roots going back to the 90s be rewritten in any other language?

And in fact how is the language Unreal Engine written in relevant to C having footguns?

maerF0x0 5 years ago | | |

Not that I don't believe there are any, but I'd love to hear your perspective...

Go (golang)

rodgerd 5 years ago | | |

There's far more critical code in the world running on COBOL and s3[79]0 assembler. COBOL is vastly more important than C.

lmilcin 5 years ago |

To respond to some of the comments.

It is not that there is anything intrinsically wrong with these functions. You can technically use all of them and I have been using all of them, safely, for decades.

The issue is they are huge traps to the point that in a larger piece of software one can say "well, it's just not worth it".

You can go much, much, much further than that.

In a couple of embedded projects I worked on, some of the rules were:

* dynamic allocation after application has started is banned -- any heap buffers and data structures must be allocated at the start of the application and after that any allocation is a compile time error,

* any constructs that would prevent statically calculating stack usage were banned (for example any form of recursion except when exact recursion depth is ensured statically),

* any locks were banned,

* absolutely every data structure must have size ensured, in a simple way, beyond any reasonable doubt,

etc.

drfuchs 5 years ago |

It would be nice if the error messages generated would suggest replacement functions that they deem appropriate. I see that I'm not supposed to use gmtime, localtime, ctime, ctime_r, asctime, and asctime_r; but what do they think I should use?

cle 5 years ago | |

From the commit messages

> The ctime_r() and asctime_r() functions are reentrant, but have no check that the buffer we pass in is long enough (the manpage says it "should have room for at least 26 bytes"). Since this is such an easy-to-get-wrong interface, and since we have the much safer strftime() as well as its more convenient strbuf_addftime() wrapper, let's ban both of those.

(https://github.com/git/git/commit/91aef030152d121f6b4bc3b933...)

> The traditional gmtime(), localtime(), ctime(), and asctime() functions return pointers to shared storage. This means they're not thread-safe, and they also run the risk of somebody holding onto the result across multiple calls (where each call invalidates the previous result). All callers should be using their reentrant counterparts.

(https://github.com/git/git/commit/1fbfdf556f2abc708183caca53...)
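A minimal sketch of the pattern those commit messages point toward, using only standard/POSIX calls: gmtime_r() writes into caller-owned storage, and strftime() takes the buffer size. `format_utc` is a hypothetical helper for illustration and is not Git's strbuf_addftime().

```c
#define _POSIX_C_SOURCE 200809L  /* for gmtime_r() on POSIX systems */
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format t as UTC "YYYY-MM-DD HH:MM:SS" into buf.
 * Returns 0 on success, -1 if conversion fails or buf is too small. */
int format_utc(time_t t, char *buf, size_t buf_size)
{
    struct tm tm;

    /* gmtime_r() fills a caller-provided struct instead of returning a
     * pointer to shared static storage. */
    if (gmtime_r(&t, &tm) == NULL)
        return -1;

    /* strftime() is told the buffer size and returns 0 if the result
     * (including the terminator) does not fit. */
    if (strftime(buf, buf_size, "%Y-%m-%d %H:%M:%S", &tm) == 0)
        return -1;
    return 0;
}
```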

drfuchs 5 years ago | | |

Yes, but every hapless user shouldn't have to go searching through a bunch of commit messages to find the suggested replacement. Bad UX.

tinus_hn 5 years ago | | |

Strangely there is no mention of strtok which has a similar issue.

jancsika 5 years ago |

I love seeing "strncpy" right after "strcpy."

If someone wants some fun, try this:

1. Slurp up all the FOSS projects that extend back to 90s or early 2000s.

2. Filter by starting at earliest snapshot and finding occurrences of strcpy and friends who don't have the "n" in the middle.

3. For those occurrences, see which ones were "fixed" by changing them to strncpy and friends in a later commit somewhere.

4. See if you can isolate that part of the code that has the strncpy/etc. and run gcc on it. Gcc-- for certain cases (string literals, I think)-- can report a warning if "n" has been set to a value that could cause an overflow.

I'm going to speculate that there was a period where C programmers were furiously committing a large number of errors to their codebases because the "n" stands for "safety."
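One reason the "n" is not the same as safety: strncpy() does not write a terminator when the source fills the whole destination. The sketch below shows that case and the usual manual fix; both function names are made up for the example, and the fix still truncates silently.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if dst ends up NUL-terminated within dst_size bytes after a
 * plain strncpy(), 0 otherwise. */
int strncpy_terminates(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size);
    return memchr(dst, '\0', dst_size) != NULL;
}

/* The usual manual fix: copy at most dst_size - 1 bytes, then terminate. */
void strncpy_terminated(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return;
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With a 4-byte buffer, copying "abc" leaves a terminated string, while copying "abcd" fills all four bytes and leaves none for the NUL.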

gilbetron 5 years ago | |

Meh, most of us understood the sharp edges of strings pretty well. Before, we'd check the len of strings before strcpy, strncpy let us do it without doing that, and just slap a 0 in if needed. Safe? No. Better? A bit. Do I ever want to do string manipulation again with C? Nope.

tomjakubowski 5 years ago | | |

Understanding the sharp edges is one thing. Being able to avoid them in practice is another. The history of memory safety problems in C string handling, especially involving strcpy/strncpy, strongly suggests to me that they're unavoidable even for C programmers who are skilled, knowledgeable, and experienced.

commandlinefan 5 years ago | |

Ok, memcpy(dst, src, strlen(src)) it is then!

yetihehe 5 years ago | | |

Yay for errors, it should be memcpy(dst, src, strlen(src)+1). Strlen doesn't count last 0. If your dst is not zeroed already you will have unterminated string.

captainmuon 5 years ago |

It would be interesting to see the rationale behind these bans, and what the suggested alternatives are. Some are obvious, like `strcpy`, but I can't remember what the problem with `sprintf` or the time functions is.

If you are doing something like `sprintf(buffer, "%f, %f", a, b)`, yes it is tricky to choose the size of buffer frugally, but if you replace that by `ftoa` and constructing the string by hand, you are likely to introduce more bugs.

Edit: as pointed out in another post, you can do git blame to see the rationale for each ban, quite interesting.

StillBored 5 years ago |

These functions are one of the many reasons why I tend to have a C with some C++ classes dialect I use in my own projects.

std::string needs some tweaks, but it can mostly be treated as a built in and it wipes out a huge set of C string issues.

at_a_remove 5 years ago |

I have only ever dabbled in C, just to look at other people's code and occasionally when I really needed speed, so I am at what I would call a "Pretty Pathetic" level, able to recognize that I am looking at C.

However, I look at old books on C, and then I look at this list, and I wonder if it would not have been helpful to, after mentioning that a function was banned, suggest what the replacement is, even as a comment.

syncsynchalt 5 years ago | |

You're not wrong. But a seasoned C developer looks at this list and nods along. (I'm a little out of practice, but I have war stories for most of these).

It's likely that the authors of this list didn't think the comments would be worthwhile for the audience (git developers).

attractivechaos 5 years ago |

I wonder how they copy strings with strcpy and strncpy both banned. strlcpy? But it is not conforming to major standards. Or just memcpy with extra code?

dgentile 5 years ago | |

Edited: Looks like they have safe alternatives: "

  - strlcpy() if you really just need a truncated but
    NUL-terminated string (we provide a compat version, so
    it's always available)

  - xsnprintf() if you're sure that what you're copying
    should fit

  - strbuf or xstrfmt() if you need to handle
    arbitrary-length heap-allocated strings
"

shadowgovt 5 years ago |

To its credit, it's convenient that the C pre-processor is so powerful that it facilitates baking a "C the good parts" concept directly into the compilation process.

1337_d00dZ 5 years ago |

In compilers that implement GCC extensions (such as Clang), you can use the "poison" directive to achieve the same effect (but with a better error message) [0]:

#pragma GCC poison printf sprintf fprintf

[0] https://gcc.gnu.org/onlinedocs/gcc-3.2/cpp/Pragmas.html

TheRealSteel 5 years ago |

I'm an idiot, I read the headline and thought these were banned from Git entirely. As in, you couldn't commit them to any repo using Git, at all. Thought that seemed a bit harsh.

Turns out you just can't use them when you contribute code to the Git project. That makes sense, and seems reasonable.

edgyquant 5 years ago | |

Critiquing poor code practices is beyond the scope of git at this time

TheRealSteel 5 years ago | | |

Should be easy to implement, will have a pull request ready tomorrow.

Edit: wait, I can't use strcpy?! Screw that, then I'm not open sourcing my AGI!

EdSchouten 5 years ago |

Funnily enough, strtok() is not listed :)

raegis 5 years ago | |

This one has my vote for the weirdest library function ever.

rightbyte 5 years ago | | |

The storing of state between calls is beautiful in all its wickedness.
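The hidden state is easy to see when two strings are tokenized in an interleaved way. A small sketch, with a made-up function name, assuming nothing beyond standard `strtok`:

```c
#include <string.h>

/* Returns the token produced by strtok(NULL, ...) after a second,
 * unrelated string has been tokenized in between. */
const char *interleaved_token(void)
{
    static char first[] = "one,two";
    static char second[] = "red,green";

    strtok(first, ",");   /* yields "one" and remembers a position in first */
    strtok(second, ",");  /* yields "red" and overwrites the saved position */

    /* This continues in second, so it yields "green" rather than "two". */
    return strtok(NULL, ",");
}
```

POSIX `strtok_r` sidesteps this by making the caller hold the saved position.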

moomin 5 years ago |

They should probably add sscanf.

ed25519FUUU 5 years ago | |

First thing I looked for. It looks like it was used here:

https://github.com/git/git/blob/master/object-file.c#L1293

And currently used here (at least):

https://github.com/git/git/blob/master/refs.c#L1235

DyslexicAtheist 5 years ago |

Some functions are missing which would normally cause a warning with most linters and static security analysis tools (e.g. the atoX family, mktemp, etc ...). Problem is most people I know don't run external linters (maintaining good linting rules is hard to scale in larger projects and in my >3 decades of writing C only a few companies[0] I've seen managed the linting rules as part of their "definition of done").

While I think such rules are a good idea it only makes sense if it is done consistently and depends on how religiously the tooling (duct-tape and "process") enforces them (even so, you're still only one `#ifdef` away from undoing that "safety"). Having GCC[1] now support static analysis is a killer feature for this type of problem.

On the other end of the spectrum we have Huawei, which instead of linting their code is finding creative ways to trick auditing tools and hide such warnings from auditors [2].

[0] https://news.ycombinator.com/item?id=22712338

[1] https://developers.redhat.com/blog/2021/01/28/static-analysi...

[2] https://grsecurity.net/huawei_and_security_analysis

abetusk 5 years ago |

The Git Mailing List Archive on lore.kernel.org (found in the README from the git mirror on GitHub) has more context [0] [1] [2]. From Jeff King on 2018-07-24:

  The strncpy() function is less horrible than strcpy(), but
  is still pretty easy to misuse because of its funny
  termination semantics. Namely, that if it truncates it omits
  the NUL terminator, and you must remember to add it
  yourself. Even if you use it correctly, it's sometimes hard
  for a reader to verify this without hunting through the
  code. If you're thinking about using it, consider instead:

    - strlcpy() if you really just need a truncated but
      NUL-terminated string (we provide a compat version, so
      it's always available)

    - xsnprintf() if you're sure that what you're copying
      should fit

    - strbuf or xstrfmt() if you need to handle
      arbitrary-length heap-allocated strings
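To illustrate the "should fit" case in the list above: the sketch below is not Git's `xsnprintf()`. It uses a made-up name, `format_fits`, and reports truncation to the caller instead of treating it as a fatal bug, which is what I recall Git's version doing.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not Git's xsnprintf(): format into dst and report
 * whether the whole result fit. */
bool format_fits(char *dst, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int len = vsnprintf(dst, size, fmt, ap);
    va_end(ap);

    /* vsnprintf returns the untruncated length, so len >= size means
     * characters were dropped; a negative value signals an error. */
    return len >= 0 && (size_t)len < size;
}
```

A caller that is "sure it fits" would treat a false return as a programming error rather than carry on with a silently shortened string.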

I just did a search on the keywords 'banned' and 'strncpy' [3].

[0] https://lore.kernel.org/git/20180724092828.GD3288@sigill.int...

[1] https://lore.kernel.org/git/20190103044941.GA20047@sigill.in...

[2] https://lore.kernel.org/git/20190102093846.6664-1-e@80x24.or...

[3] https://lore.kernel.org/git/?q=banned+strncpy

js2 5 years ago | |

Psst:

https://github.com/git/git/commits/master/banned.h

(Git development is done by emailing patches. Those patches include the git commit message, which we can see just by looking at the history of the file. Sometimes there's additional discussion on the ML, but the most important details are in the commit message because the git development team is very disciplined about that.)

abetusk 5 years ago | | |

Ha, yep, whoops

Luyt 5 years ago |

It would be great if the BANNED() macro could suggest the correct function to use.

sedatk 5 years ago | |

The right function may change based on the use case, that's why they may not have wanted to suggest an alternative outright.

tinus_hn 5 years ago | |

You could send a pull request, it doesn’t seem too complicated to implement

zX41ZdbW 5 years ago |

A similar list from the ClickHouse repository: https://github.com/ClickHouse/ClickHouse/blob/master/base/ha...

zbendefy 5 years ago |

Are there some details on what's wrong with these?

bvaldivielso 5 years ago | |

The commit messages that added them explain the reasoning

ufo 5 years ago | | |

I wish they had put that in comments instead of in the commit messages. It's not the first time that I've seen this particular list of banned functions being shared online and every time it happens someone has to explain that the most interesting info is hidden in the commit messages.

alexchamberlain 5 years ago | |

All the string functions have buffer overrun vulnerabilities if not used carefully. I'm not sure about the time functions though.

edflsafoiewq 5 years ago | | |

The time functions are either non-reentrant, or, for the _r versions, have the same problem with buffer overruns.

https://github.com/git/git/commit/1fbfdf556f2abc708183caca53...

https://github.com/git/git/commit/91aef030152d121f6b4bc3b933...
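For comparison, a common way to avoid both problems is POSIX `gmtime_r` combined with `strftime`, which takes the buffer size. This is only a sketch with a made-up helper name, `format_utc`, and it is not taken from the Git source.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for gmtime_r on strict compilers */
#endif
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: write a UTC timestamp such as "1970-01-01 00:00:00"
 * into out. Returns the number of characters written, or 0 on failure. */
size_t format_utc(time_t t, char *out, size_t size)
{
    struct tm tm;

    /* gmtime_r fills the caller's struct instead of shared static storage. */
    if (gmtime_r(&t, &tm) == NULL)
        return 0;

    /* strftime takes the buffer size, so it cannot overrun out. */
    return strftime(out, size, "%Y-%m-%d %H:%M:%S", &tm);
}
```

When the buffer is too small, `strftime` returns 0 and the contents of `out` should not be used.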

trilinearnz 5 years ago | | |

Very much this. I frequently write small games in C, and I have been bitten many times by baffling behaviour because a string somewhere was copied into an array that was too short! Apart from that, I love the simplicity of the language and the stdlib, and it's definitely my preferred hobby programming environment.

It would be good to know what the commonly-accepted alternatives are.

csours 5 years ago | |

I'm pretty sure you could google each of these with the word 'dangerous'

For example: https://lgtm.com/rules/2154840805/

userbinator 5 years ago |

At least they didn't ban memcpy()...

Much like with all other forms of effective censorship, I see this as a quick short-term "fix" with hidden long-term costs[1]. IMHO this sort of anti-thinking just leads to even worse, more dogmatic and cargo-cult, programmers who know less and less about the basics and then go on to make even more subtle errors.

Somehow the collective software industry has managed to propagate the notion that people are incapable of doing even basic arithmetic. Yet they think people are capable of creating complex systems with even more subtle behaviour? The justification would normally be because it's not directly affecting security. WTF. It's beyond stupid.

The only C function I think should be truly banned is gets(), because it is actually impossible to calculate what size of buffer it needs. That is not true of any of the others on this list.

[1] By short and long, I mean decades vs centuries.

synergy20 5 years ago |

OK, so no strncpy, strncat etc, what are the alternatives used in git then? I'm a long-time C coder but I do not know what will be used to replace strncpy/strncat and all those gmtime/localtime/ctime/asctime.

bvaldivielso 5 years ago |

Ah this is a very good idea. I guess you still have to make sure that all your translation units include this header, which isn't completely foolproof.

Static analysis would probably be more robust, but way more involved.

radus 5 years ago | |

Best of both worlds: use static analysis to ensure the header is included?

koenigdavidmj 5 years ago | |

gcc has a -include option, so this can be done once in the Makefile and get the benefit everywhere (unless you’re being clever).

Athos_vk 5 years ago | |

I remember visual studio having an option to force include a file, surely something like that would exist for other toolchains

kccqzy 5 years ago | |

You don't need fancy static analysis. You can find out whether the banned functions are called just by inspecting the compiled object file. Add it to the build step and done.

snvsn 5 years ago |

Previous discussion: https://news.ycombinator.com/item?id=20792938

shaggie76 5 years ago |

Our forbidden functions header is similar; it's got about 30 functions including most of the str* family to enforce the use of our safer versions.

robinduckett 5 years ago |

Is there no linting software that can catch these kinds of issues? Like using strlen with sscanf like I've been hearing about lately?

whydoyoucare 5 years ago |

I am so thankful git isn't forcefully including this header in every C language project and that we have a choice when using git! :-)

matt-attack 5 years ago |

I used C many years ago so I’m quite out of it. What are the replacements for these? I would have thought these were all necessary.

malkia 5 years ago |

Let's use a loophole ;) - (strcpy)(a,b)
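The loophole works because the ban is a function-like macro. A reduced sketch of the pattern, as I understand it from banned.h, with a made-up `sneaky_copy` function to show the bypass:

```c
#include <string.h>

/* Same pattern as Git's banned.h, reduced to one function: any call written
 * as strcpy(...) expands to an undeclared identifier and fails to compile. */
#define BANNED(func) sorry_##func##_is_a_banned_function
#undef strcpy
#define strcpy(x, y) BANNED(strcpy)

/* A function-like macro only expands when its name is directly followed by
 * '(' so the parenthesized name still reaches the real library function. */
char *sneaky_copy(char *dst, const char *src)
{
    return (strcpy)(dst, src);
}
```

So the header catches accidents, not determined contributors; code review has to do the rest.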

totorovirus 5 years ago |

Now I am getting really curious whether other companies with supposedly strong engineering knew about sscanf issues.

ape4 5 years ago |

Just replace strcpy(a,b) with strcpyn(a,b,INT_MAX)

/joke

fatnoah 5 years ago | |

I'm pretty sure I've seen similar logic in my life.

beeforpork 5 years ago | |

Been a while, eh?

It should be strncpy(a,b,(size_t)-1)!

guerrilla 5 years ago | | |

SIZE_MAX does exist.

maxk42 5 years ago |

What would be helpful is an explanation of how each function ends up being misused so people can learn from this.

petters 5 years ago | |

Git blame is helpful here. See e.g. https://github.com/git/git/commit/1b11b64b815db62f93a04242e4...

jsmith45 5 years ago | |

View the git history for the file. Each commit that adds functions has a detailed explanation of what is wrong with the functions.

Animats 5 years ago |

About 20 years too late. Those should have been moved to a "deprecated" header file decades ago.

amir734jj 5 years ago |

Maybe instead of just writing a banned message, it should give the name of the alternative function to use.

lerax 5 years ago |

Yes, this is right. Any decent C programmer knows that these functions are cursed.

xvilka 5 years ago |

I hope to see it rewritten in a safer language one day.

qbasic_forever 5 years ago | |

There's a nice Go implementation of git: https://github.com/go-git/go-git

jll29 5 years ago |

gets() and scanf() should be on that list due to potential buffer overflow.
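For the scanf family, the overflow comes from a bare `%s`, and a field width is the usual fix. A sketch using `sscanf` so that it does not depend on stdin; `read_word` is a made-up name:

```c
#include <stdio.h>

/* Hypothetical helper: copy the first whitespace-delimited word of input
 * into a 16-byte buffer. Returns 1 if a word was read, otherwise 0 or EOF. */
int read_word(const char *input, char word[16])
{
    /* "%15s" stores at most 15 characters plus the terminating NUL; a bare
     * "%s" would keep writing for as long as the input word continues. */
    return sscanf(input, "%15s", word);
}
```

`gets()` has no equivalent fix, since it has no way to be told the buffer size at all.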

anovikov 5 years ago |

why is strncpy banned? what's wrong about it?

kgrimes2 5 years ago |

Can a C guru provide a TL;DR of why these are bad?

syncsynchalt 5 years ago | |

    - strcpy: no bounds check
    - strcat: no bounds check
    - strncpy: does not nul-terminate on overflow
    - strncat: no major issues, probably to force usage of strlcat
    - sprintf: no bounds check
    - vsprintf: no bounds check
    - gmtime: returns static memory
    - localtime: returns static memory
    - ctime: no bounds check
    - ctime_r: no bounds check
    - asctime: returns static memory
    - asctime_r: no bounds check

The str functions all have safer alternatives. The time functions have reentrant alternatives, and/or alternatives that provide a bounds check.
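To make the strncpy entry concrete, here is a sketch of the manual termination it requires. `copy_truncating` is a made-up helper, not something from Git, which would use strlcpy() or strbuf instead.

```c
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity size), always leaving a
 * NUL-terminated string. Returns 1 if src was truncated, 0 otherwise. */
int copy_truncating(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return 1;

    /* strncpy fills at most size - 1 bytes here; when src is that long or
     * longer it writes no terminator of its own. */
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';  /* the step that is easy to forget */

    return strlen(src) >= size;
}
```

Every call site has to repeat that last assignment correctly, which is the argument for banning the function rather than trusting each caller.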

sys_64738 5 years ago |

getc?

syncsynchalt 5 years ago | |

`gets` would be the ultimate banned C function, I suspect nobody thought it was worth spelling out though.

sys_64738 5 years ago |

scanf?

known 5 years ago |

Just banning is not fair; Include the alternatives;

    /* mempcpy() is a GNU extension; min() is assumed to be defined elsewhere. */
    size_t strlcpy(char *dst, const char *src, size_t dstsize)
    {
        size_t len = strlen(src);
        if (dstsize)
            *((char *)mempcpy(dst, src, min(len, dstsize - 1))) = 0;
        return len;
    }

  - strlcpy() if you really just need a truncated but
    NUL-terminated string (we provide a compat version, so
    it's always available)

  - xsnprintf() if you're sure that what you're copying
    should fit

  - strbuf or xstrfmt() if you need to handle
    arbitrary-length heap-allocated strings