Neverflow: C macros that guard against buffer overflows (github.com)

120 points by sertraline 3 years ago | 143 comments

eqvinox 3 years ago |

The problem with C and buffer overflows isn't that you can't guard against them, or that there is no existing, reusable code to do so — it's that none of this functionality is standardized. Adding another one to the existing 41383 ways of doing this is in fact the exact opposite of what's needed. Ideally C needs one way of doing this, and that would be described in the standard.

But that's not how C "rolls", and we'll never get that. So I guess we now have 41384 ways to do buffer overflow guards.

ActorNightly 3 years ago | |

There is value in actually understanding what someone is doing in regards to protecting against buffer overflows, instead of relying on well established patterns.

hinkley 3 years ago | | |

Not when I’m trying to orchestrate third party libraries.

6D794163636F756 3 years ago | |

C never has just one way to do something. myArr[5] == 5[myArr] == (insert pointer arithmetic that I won't write here without a compiler check). I think that part of C's beauty is that it gives you freedom. Freedom to shoot yourself in the foot, freedom to write hyper efficient code, and freedom to choose another tool.

I agree that this will never be implemented as a standard, but I think that's a good thing. Higher level languages push against their boundaries non stop. Java has libraries and frameworks that fundamentally change the syntax and functionality of the language. C knows what it is. If you want something that it can't do it promises that you can either build it yourself or switch to a different tool.

All of this to say, C has a single suggested way of doing this: using a different language. That's part of why we built them

adastra22 3 years ago | | |

Those are syntactic sugar for the same thing though. Array[5] is just shorthand for *(Array + 5), which is why 5[Array] also works (because addition is commutative).

Note that C does have strong conventions, such as that strings are terminated by a zero byte. Nothing in the language demands that, it’s just a convention! C could adopt better conventions.

shrimp_emoji 3 years ago | | |

Checked arithmetic has been implemented in the standard (C23) with `stdckdint.h`, so give it 50 more years!

JohnFen 3 years ago | |

> Ideally C needs one way of doing this, and that would be described in the standard.

I'm really glad that C doesn't do this, personally. It would reduce one of the main advantages of the language.

nix0n 3 years ago | |

> existing, reusable code to do so

Is there a library that you recommend for this?

augustk 3 years ago |

Even without array bounds checking, a bit of discipline and smart conventions will go a long way toward reducing errors:

1. Define a macro function for retrieving the length of an array:

  #define LEN(arr) (sizeof (arr) / sizeof (arr)[0])

2. Don't introduce macro constants for array lengths; hard code the length in the declaration and use LEN to retrieve it. Example:

  int a[100];
  ...
  for (i = 0; i < LEN(a); i++) {
     ...
  }

3. Define a macro function for dynamic array allocation:

  #define NEW_ARRAY(ptr, n) \
     do { \
        (ptr) = malloc((n) * sizeof (ptr)[0]); \
        if ((ptr) == NULL) { \
           fprintf(stderr, "Memory allocation failed: %s\n", strerror(errno)); \
           exit(EXIT_FAILURE); \
        } \
     } while (0)  /* one statement, so it is safe after an unbraced if/else */

4. When you create a function with an array argument, also add an argument for the array length.

5. Use a convention for naming the length of array pointer targets, for instance by adding the suffix `Len'. Example:

  int *b, bLen = 100;
  ...
  NEW_ARRAY(b, bLen);  /* nice to know that b and bLen belong together */
  ...
  SomeFunction(b, bLen, ...);
  ...
  for (i = 0; i < bLen; i++) {
    ...
  }

6. Define your own safe wrappers around unsafe standard library functions or use someone else's code that does that.

hgs3 3 years ago |

C23 improved struct compatibility so you might be able to leverage that to craft macros that better emulate slices. [1]

There is an RFC proposal for the Clang frontend for adding bounds checking reminiscent of Microsoft's SAL. [2]

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3003.pdf

[2] https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-...

uecker 3 years ago | |

You may be interested in this: https://github.com/uecker/noplate.git

kazinator 3 years ago |

The following is error prone: it can be mistakenly applied to a pointer:

#define LEN(NAME) (sizeof *NAME / sizeof(*NAME)[0])

I think gcc has a warning for this pattern now: when the size of a pointer is divided by the size of its referent type.

More importantly, it has an odd extra level of indirection. The traditional definition is:

#define LEN(ARRAY) (sizeof ARRAY / sizeof (ARRAY)[0])

This means that to use LEN on an array, we have to take the address:

   char *array[5];
   LEN(&array);  // -> 5

If we use

   LEN(array);

which is an easy mistake, we get:

    sizeof *array / sizeof (*array)[0]

which is

    sizeof (char *) / sizeof (char)

which is

    sizeof (char *)

which is likely 4 or 8.

I do see that LEN is supposed to be (only) used in conjunction with ARR:

    #define ARR(TYPE, NAME, COUNT) TYPE(*NAME)[COUNT]

but that isn't enforced. An idea would be to add some "secret" prefix or suffix to NAME like blah_ ## NAME, so that name cannot be referenced without going through the macros; i.e. if we define ARR(int, foo, 42) then there is no declared identifier foo; it actually declares blah_foo, and LEN(foo) knows about that, also adding the prefix. Thus mistakenly using LEN(foo) on something not declared with ARR will likely be a reference to an undeclared identifier.

skullchap 3 years ago | |

It's so funny, but I actually had this in 0.0.1 for the exact same reason. I removed it in 0.0.2 today after complaints that it complicates things and is a bit confusing. It made it harder to pass VLAs to functions. Maybe if I find a better way I will bring name mangling back, but for now being able to pass arrays to functions and keep the same flexibility is more important, IMO.

kazinator 3 years ago |

The expansion of the AT macro seems a bit bloated:

  #define AT(NAME, IDX)                                         \
    ((typeof(&(*NAME)[0]))                                      \
    ((ASSERT(((size_t)IDX) * sizeof(*NAME)[0] < sizeof *NAME,   \
    "Buffer Overflow. Index [%lu] is out of range [0-%lu]",     \
    ((size_t)IDX), ((sizeof *NAME / sizeof(*NAME)[0]) - 1))),   \
    ((uchar *)*NAME) + ((size_t)IDX) * sizeof(*NAME)[0]))

Some of this might be pushed into non-inlined run-time support function. That could be static and defined in the header, to keep it header-only, but ideally there would be a .c file so it's defined only once.

When you factor in the definition of ASSERT, and the ERRLOG macro that it is using, it's a lot of cruft for just one array access.

Some compile-time options (via preprocessor macros) to control the bloat would be useful; e.g. a way of compiling it so that AT will just predictably crash, without a detailed error message with __FILE__ and __LINE__ and all. Basically just the check, with a branch to some code that calls abort() if it's out of bounds.

nerpderp82 3 years ago | |

Benchmark it after -O3, does it really matter?

pjmlp 3 years ago |

Interesting idea, although given the demotion of VLAs into an optional feature in C11, it isn't necessarily portable.

Also doesn't cover all the string and memory buffer manipulations.

SAL and Frama-C are the bare minimum for security in C code.

e4m2 3 years ago | |

Frama-C as a bare minimum is a pipe dream.

It's a nice thought, don't get me wrong, but it's hard enough to convince people to add `-fsanitize=...` to their compiler flags. An entire separate static analysis tool with its own learning curve (and its own set of idiosyncrasies) doesn't really qualify for "bare minimum" IMO.

pjmlp 3 years ago | | |

Thankfully the ongoing cybersecurity laws will change that mindset.

uecker 3 years ago | |

We will make VM-types, i.e. pointers to VLAs, mandatory in C23.

kovac 3 years ago | |

What is SAL?

hgs3 3 years ago | | |

Source-code annotation language (SAL) [1].

[1] https://learn.microsoft.com/en-us/cpp/code-quality/understan...

pjmlp 3 years ago | | |

Besides the sibling comment, SAL was born out of the security efforts to fix Windows XP that ended up with the release of Windows XP SP2.

inetknght 3 years ago |

Why use C and keep reinventing things that C++ provides?

dang 3 years ago |

Related ongoing thread:

Modern C (2019) - https://news.ycombinator.com/item?id=36167820 - June 2023 (19 comments)

JonChesterfield 3 years ago |

Runtime bounds check tied to fprintf and abort via macros. Allocation by calloc.

mtlmtlmtlmtl 3 years ago | |

The calloc part is one of the most common blind spots I see among C programmers.

I try to avoid the malloc(n * sizeof (...)) pattern as much as possible. Sure there are lots of cases where it can never overflow, and you might save a bit of overhead from the zeroing and overflow checking, but most of that overhead might also be imaginary depending on allocator internals, and even kernel internals. It's the sort of thing it only makes sense to optimise when you've already squeezed out every bit of performance. And by then you've probably minimised dynamic allocation as much as possible anyway.

It's also very easy to think something like "well, n is passed in as a parameter, but it's a static function, and I know all the callers. So it's fine".

But now every caller in the future has to be aware of this possibility.

lelanthran 3 years ago | | |

> But now every caller in the future has to be aware of this possibility.

Can you clarify: what possibility should you be aware of with malloc that you don't need to be aware of with calloc?

cornstalks 3 years ago |

This evaluates macro parameters multiple times, so if the parameters have side effects or evaluate inconsistently this won't work. For example:

    size_t SomeIndex() {
      static size_t example_index = 0;
      return example_index++ % 2;
    }

    int main() {
      NEW(int, arr, 1);
      // This buffer overflow is not detected:
      *AT(arr, SomeIndex()) = 42;
      return 0;
    }

frabert 3 years ago |

Never heard of a serious buffer overflow caused by _constant_ indices. Does it work with AT(arr, i), or only with AT(arr, 10)?

oleganza 3 years ago | |

"'Brother,' says he, 'greetings. Didn't I see you in Southern Missouri last summer selling colored sand at half-a-dollar a teaspoonful to put into lamps to keep the oil from exploding?'

"'Oil,' says I, 'never explodes. It's the gas that forms that explodes.' But I shakes hands with him, anyway.

...

"'Listen,' says I. 'I instruct her to keep her lamp clean and well filled. If she does that it can't burst. And with the sand in it she knows it can't, and she don't worry.

— O. Henry, The Man Higher Up

CyberDildonics 3 years ago | | |

Did you mean to reply somewhere else? This thread is about bounds checking arrays in the C programming language.

heylemao 3 years ago | |

Yeap, that's the whole point of it

frabert 3 years ago | | |

Huh I misinterpreted the error messages in the example, I thought those were compiler output. This is quite cool then.

EDIT: although, it seems like this loses much of its power once you start passing these buffers around to functions that do not use these macros.

uecker 3 years ago |

See also here for my experiments, but it relies on UBSan for bounds checking: https://github.com/uecker/noplate.git

norir 3 years ago |

The best way to deal with this kind of thing is to write a small language that transpiles to the subset of c that you are using.

kazinator 3 years ago |

Here is a different take on it. We can use #define to inform the header about the properties of certain symbols.

Here is my oob.c program. I will show the output, and then the content of "oob.h".

  #include <stdlib.h>
  #include <stdio.h>
  #include "oob.h"

  int oob_fail(const char *file, int line)
  {
    fprintf(stderr, "%s:%d:out of bounds array access\n", file, line);
    abort();
  }

  /*
   * Declare properties of array type x
   */
  #define ARRAY_ELTYPE_x int    /* element type is int */
  #define ARRAY_SIZE_x 7        /* number of elements is 7 */

  /*
   * Ensure array type x is fully declared at file scope
   */
  ARRAY_FULLTYPE(x);

  /*
   * Inform the OOB module that the identifiers p and a are
   * used as variables related to type x: either pointers
   * to it or values.
   */
  #define ARRAY_TYPEOF_p x
  #define ARRAY_TYPEOF_a x

  int get_elem(ARRAY_TYPE(x) *p, int i)
  {
     return APREF(p, i);
  }

  int main(void)
  {
     ARRAY_TYPE(x) a = ARRAY_INIT(1, 2, 3);

     for (size_t i = 0; i <= ARRAY_SIZEOF(a); i++)
        printf("a[%zu] == %d\n", i, get_elem(&a, i));

     return 0;
  }

Output:

  $ ./oob
  a[0] == 1
  a[1] == 2
  a[2] == 3
  a[3] == 0
  a[4] == 0
  a[5] == 0
  a[6] == 0
  oob.c:31:out of bounds array access
  Aborted (core dumped)

The content of "oob.h"

  #ifndef OOB_H_435E_FDE9
  #define OOB_H_435E_FDE9

  int oob_fail(const char *file, int line);

  #define OOB_PREFIX oob_ident_
  #define OOB_XCAT(X, Y) X ## Y
  #define OOB_CAT(X, Y) OOB_XCAT(X, Y)

  #define ARRAY_ELTYPE(T) OOB_CAT(ARRAY_ELTYPE_, T)
  #define ARRAY_SIZE(T) OOB_CAT(ARRAY_SIZE_, T)
  #define ARRAY_TAG(T) OOB_CAT(ARRAY_TAG_, T)

  #define ARRAY_FULLTYPE(T)                                                     \
    struct ARRAY_TAG(T) {                                                       \
      ARRAY_ELTYPE(T) a[ARRAY_SIZE(T)];                                         \
    }

  #define ARRAY_TYPE(T) struct ARRAY_TAG(T)

  #define ARRAY_TYPEOF(V) OOB_CAT(ARRAY_TYPEOF_, V)
  #define ARRAY_SIZEOF(V) ARRAY_SIZE(ARRAY_TYPEOF(V))

  #define ARRAY_INIT(...) { { __VA_ARGS__ } }

  #define AREF(ARRAY, I)                                                        \
    (((size_t) (I) >= ARRAY_SIZEOF(ARRAY))                                      \
     ? oob_fail(__FILE__, __LINE__), (ARRAY).a[0]                               \
     : (ARRAY).a[I])

  #define APREF(PARRAY, I)                                                      \
    (((size_t) (I) >= ARRAY_SIZEOF(PARRAY))                                     \
     ? oob_fail(__FILE__, __LINE__), (PARRAY)->a[0]                             \
     : (PARRAY)->a[I])

  #endif

Preprocessor invoked on oob.c (snipped down to the relevant part after the run-time support function oob_fail):

  struct ARRAY_TAG_x { int a[7]; };


  int get_elem(struct ARRAY_TAG_x *p, int i)
  {
     return (((size_t) (i) >= 7) ? oob_fail("oob.c", 31), (p)->a[0] : (p)->a[i]);
  }

  int main(void)
  {
     struct ARRAY_TAG_x a = { { 1, 2, 3 } };

     for (size_t i = 0; i <= 7; i++)
        printf("a[%zu] == %d\n", i, get_elem(&a, i));

     return 0;
  }

It's clean enough to be readable (except, of course, code dense with AREF or APREF calls will be a mess). Uses arrays wrapped in structs, so you can pass arrays by value.

You have to make a list of your variables that are involved and write some #define lines for them.

Same for the array types.