Heap memory corruption in GitHub's Markdown table parsing extension

Heap memory corruption in GitHub's Markdown table parsing extension (github.com)

83 points by hyfen 4 years ago | 42 comments

esprehn 4 years ago |

This seems like a good opportunity to use wasm on the server to sandbox the processing of user provided content. Of course they could also try rewriting in a safer language, but given that this already exists and handles all their content, wasm might be a simple defense in depth protection.

staticassertion 4 years ago | |

What Dropbox did for this sort of thing is ideal. You spawn a child process that has two file handles piped to/from the parent - stdin, stdout.

That child process does the scary stuff - parsing. Parsing requires zero system calls. Talking to the parent requires only read and write, not open, so the child can only read from and write to those two file descriptors.

And exit.

That's it. Seccomp v1 is trivial to apply, gives 4 system calls, and makes the process virtually useless to an attacker. If you want to get fancy and allow for multithreading you can use seccomp v2 and create your threadpool before you drop privileges, and probably add futex and mmap.

You pay a latency cost but the security win is huge.

the_duke 4 years ago | | |

That's a lot of complicated, non-portable steps, with many subtle semantics that can easily be implemented incorrectly.

Running the code in a Wasm sandbox sounds a whole lot easier and less error prone. You do have to trust the Wasm engine, but nothing else. And you don't need in-depth knowledge of OS security mechanisms.

nwmcsween 4 years ago | | |

Welcome to 1996 with ucspi

pjmlp 4 years ago | |

WASM doesn't protect against heap corruption, because bounds checking doesn't apply inside a linear memory segment.

olliej 4 years ago | | |

I think the point was that you can’t corrupt the containing process, and wasm separates code from data (Harvard arch?) so you don’t get arbitrary code exec. Of course if you process output of the wasm in a trusted environment the compromised wasm could generate something that compromises the host, but the same applies to using separate processes and IPC

pjmlp 4 years ago |

Another integer overflow bites the dust.

tines 4 years ago |

I am a C++ fanatic (template metaprogramming is a beautiful thing), but I've come to believe that software that handles untrusted user input should never be written in C or C++. It's too difficult to write correct software by hand; memory-safe languages are really the only way.

bob1029 4 years ago | |

Does there actually exist any practical way to ensure user input does not cause mischief when authoring C/C++ programs at scale? Are memory-safe languages the only answer?

tialaramex 4 years ago | | |

Much worse than that: even memory-safe languages like (safe) Rust, and the inevitable suggestions of AUTOSAR and so on, aren't the answer. To properly answer your demand for a "practical way to ensure user input does not cause mischief" you want a drastically less capable language which cannot even in principle express the programs that should not exist. That's exactly what WUFFS is for.

https://github.com/google/wuffs

This sort of bug can't happen in WUFFS because you can't express the idea "corrupt the heap memory" even if you desperately wanted to. The tell-tale sign of such languages is that they are not general purpose languages, because those are able to express a wide variety of stupid things you don't want to do.

pjmlp 4 years ago | | |

Yes, the security standards like MISRA and AUTOSAR basically castrate C and C++ into subsets similar to those languages.

endorphine 4 years ago | |

Is this a vulnerability that would be impossible in, let's say, Rust?

steveklabnik 4 years ago | | |

This seems to be the patch: https://github.com/github/cmark-gfm/commit/cf7577d2f74289cb8...

Integer overflow can happen in Rust, but it's well-defined, not undefined. This helps.

Bounds checking is part of indexing, and so even if an index overflows, the check should happen, and panic.

"impossible" is a strong word, but it would be significantly less likely in Rust. If you did the same thing as you did in C, with unsafe, then it could happen. But there's not a lot of reason to 99.9999% of the time, as it's the more difficult and less ergonomic option.
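To illustrate the two points above (defined wraparound and bounds checks on indexing), here is a minimal sketch. The helpers `cell_at` and `cell_or_panic` are hypothetical names made up for this example and have nothing to do with the cmark-gfm code.

```rust
use std::panic;

/// Hypothetical lookup that reports an out-of-range column as `None`.
fn cell_at(row: &[i32], column: u16) -> Option<i32> {
    row.get(usize::from(column)).copied()
}

/// Plain indexing: the bounds check is built in and panics on failure.
fn cell_or_panic(row: &[i32], column: u16) -> i32 {
    row[usize::from(column)]
}

fn main() {
    let row = [10, 20, 30];

    // Wraparound is defined behaviour when requested explicitly:
    // 65535 + 2 wraps to 1 in u16 arithmetic.
    let wrapped = u16::MAX.wrapping_add(2);
    assert_eq!(wrapped, 1);
    assert_eq!(cell_at(&row, wrapped), Some(20));

    // An index past the end is rejected instead of reading adjacent memory.
    assert_eq!(cell_at(&row, 40_000), None);

    // The same out-of-range index with `[]` panics rather than reading
    // past the slice. (The panic message is printed to stderr.)
    let result = panic::catch_unwind(|| cell_or_panic(&[10, 20, 30], 40_000));
    assert!(result.is_err());
}
```

A wrapped index can still select the wrong cell, which is a logic bug, but it cannot reach memory outside the slice.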

roblabla 4 years ago | | |

Yes. Heap Memory Corruption is a type of memory safety issue that's impossible in Safe Rust. (As usual, this depends on any unsafe code and the compiler being bug-free, but that's supposed to be much easier to prove since the "scope" of things to check for correctness is much reduced).

pornel 4 years ago | | |

Rust programs don't call `malloc` directly, so the problem of overflow in malloc size calculation is mitigated by never needing to write such code. (Rust programs use something like Vec, which is a safe abstraction that reliably (re)allocates as much as required.)

Rust's lack of implicit numeric conversions pushes authors towards using usize (size_t) for everything. So in Rust you'd be more likely to have a denial of service due to supporting 2^64 columns. If you tried to carelessly use u16 for the number of columns, you'd more likely have an application level bug like incorrect page rendering, or in the worst case a panic (equivalent of an uncaught C++ exception, which may be a program-stopping bug, but not a vulnerability).
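A rough sketch of what that looks like in practice, assuming a toy row splitter (`split_row` and `narrow_column_count` are hypothetical helpers for illustration, not real GFM table parsing): the `Vec` sizes itself from the data, and narrowing the count to `u16` has to be spelled out explicitly.

```rust
use std::convert::TryFrom; // already in the prelude from the 2021 edition on

/// Toy splitter: the Vec grows to fit however many cells the line has,
/// so there is no hand-written allocation size to get wrong.
fn split_row(line: &str) -> Vec<&str> {
    line.split('|').map(str::trim).collect()
}

/// Narrowing a usize count to u16 must be explicit; `try_from`
/// reports failure instead of silently truncating.
fn narrow_column_count(count: usize) -> Option<u16> {
    u16::try_from(count).ok()
}

fn main() {
    let cells = split_row("a | b | c");
    assert_eq!(cells, vec!["a", "b", "c"]);
    assert_eq!(narrow_column_count(cells.len()), Some(3));

    // A count that does not fit in u16 is rejected by `try_from`...
    let huge = 70_000usize;
    assert_eq!(narrow_column_count(huge), None);

    // ...while a careless `as` cast truncates (70000 - 65536 = 4464).
    // That is an application-level bug, not memory corruption.
    assert_eq!(huge as u16, 4464);
}
```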

olliej 4 years ago | | |

Unexpected overflow faults by default in most modern safe languages (Rust, Swift, presumably Go?); they generally use different operators or functions for when overflow is OK.
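For Rust specifically, the explicit variants look like the sketch below; the `grow_*` function names are made up for this example. One caveat: with plain `+`, Rust panics on overflow in debug builds, while release builds wrap by default unless overflow checks are enabled, so the explicit methods are the way to get the same behaviour in every build.

```rust
/// Overflow is an error the caller must handle.
fn grow_checked(n: u16, by: u16) -> Option<u16> {
    n.checked_add(by)
}

/// Overflow is acceptable and wraps around modulo 2^16.
fn grow_wrapping(n: u16, by: u16) -> u16 {
    n.wrapping_add(by)
}

/// Overflow clamps to the largest representable value.
fn grow_saturating(n: u16, by: u16) -> u16 {
    n.saturating_add(by)
}

fn main() {
    assert_eq!(grow_checked(10, 5), Some(15));
    assert_eq!(grow_checked(u16::MAX, 1), None);
    assert_eq!(grow_wrapping(u16::MAX, 1), 0);
    assert_eq!(grow_saturating(u16::MAX, 1), u16::MAX);

    // `overflowing_add` returns the wrapped value plus an overflow flag.
    assert_eq!(u16::MAX.overflowing_add(1), (0, true));
}
```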