Bootstrapping from Hex to Bison to GCC (github.com)

103 points by z29LiTp5qUC30n 5 years ago | 19 comments

Nice work, Fossy and co.

I believe this is the dependency chain your live-bootstrap works through: https://github.com/oriansj/talk-notes/raw/master/live-bootst...

userbinator 5 years ago

See also Bellard's TCCBOOT, which is based on the far simpler TCC instead of GCC: https://bellard.org/tcc/tccboot.html

donio 5 years ago

That's a different kind of bootstrapping, though. The OP is about building up to a toolchain from almost nothing, using only source code and relying only on some tiny binaries that could be hand-assembled or verified. It also uses a version of TinyCC along the way before it arrives at GCC.

fosslinux 5 years ago

I do plan to integrate this, or something very similar, into live-bootstrap. This is essentially what we would need to compile the Linux kernel within the bootstrap, which has not been done yet. However, tccboot (or a derivative of it) would need to be compiled from source within live-bootstrap, which is not simple.

markjenkinswpg 5 years ago

One nice thing about the work showcased here is that it bootstraps TCC on its journey to GCC. At some point we could see attempts to bootstrap tccboot as a kind of "escape pod" from other systems, and after booting tccboot, have the bootstrapping work documented here continue from TCC onward.

XorNot 5 years ago

This is cool as heck. Outside of architectural attacks, this seems like a practical response to Reflections on Trusting Trust (http://users.ece.cmu.edu/~ganger/712.fall02/papers/p761-thom...).

While we can certainly debate whether it's practical for anyone to actually audit all that source code (no, it is not), proving that a 356-byte codestream isn't malicious seems like a good foundation to argue from.

recuter 5 years ago

suspicious squints

Perhaps this bit is key, as you could cross-reference the two:

> Furthermore, having an alternative bootstrap automation tool allows people to have greater trust in the bootstrap procedure.

Interesting thought exercise.

Edit: Avoid this subject unless you want to be nerd-sniped and spiral into paranoia.
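One way to picture cross-referencing the two: run two independent bootstrap procedures and compare what they produce, file by file. The sketch below is a minimal Python illustration, assuming both runs leave their final artifacts in separate directories and that the builds are reproducible (bit-for-bit identical from the same sources); `digest_tree` and `cross_check` are hypothetical helpers, not part of live-bootstrap or any other bootstrap tool.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def digest_tree(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 hex digest of its contents."""
    # Refuse to treat a missing directory as an empty (and therefore "matching") tree.
    if not root.is_dir():
        raise NotADirectoryError(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def cross_check(tree_a: Path, tree_b: Path) -> list[str]:
    """Return paths present in only one tree, or present in both with different contents."""
    a, b = digest_tree(tree_a), digest_tree(tree_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

For example, `cross_check(Path("bootstrap-a/out"), Path("bootstrap-b/out"))` (hypothetical paths) returning an empty list would mean both runs produced identical files. Any difference only flags something to investigate; without reproducible builds, timestamps and similar noise would also show up as mismatches.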

siraben 5 years ago

See also blynn-compiler[0], made by the same contributors, which bootstraps a Haskell compiler from C (which in turn is bootstrapped from hex).

[0] https://github.com/oriansj/blynn-compiler

Aissen 5 years ago

See this doc for how the full process works: https://github.com/fosslinux/live-bootstrap/blob/master/part...

In particular, this is the very first step: https://github.com/oriansj/stage0-posix/blob/master/x86/hex0... (or its hand-edited binary version?)

Edit: this is how it's "assembled":

    sed 's/[;#].*$//g' $input_file | xxd -r -p > $output_file

See: https://github.com/oriansj/bootstrap-seeds/blob/master/READM...
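For readers who don't speak sed and xxd: the pipeline deletes everything from the first `;` or `#` to the end of each line, then `xxd -r -p` turns the remaining hex digit pairs back into raw bytes. Below is a rough Python equivalent, as a sketch only: it assumes well-formed input (hex digits, whitespace and comments, nothing else), and `hex0_assemble` is a hypothetical name, not something from stage0-posix. Running something like this over a hex0 source and comparing the result with the committed seed binary is one way to check that the two correspond.

```python
import re


def hex0_assemble(source: str) -> bytes:
    """Approximate `sed 's/[;#].*$//g' | xxd -r -p` for well-formed hex0 input."""
    # Like the sed step: drop everything from the first ';' or '#' to end of line.
    lines = (re.sub(r"[;#].*$", "", line) for line in source.splitlines())
    # Keep only the hex digits, discarding all whitespace (including newlines).
    digits = "".join("".join(lines).split())
    # Like `xxd -r -p`: every two hex digits become one byte.
    # Unlike xxd, this raises ValueError on stray characters or an odd digit count.
    return bytes.fromhex(digits)


example = """
# ELF magic number, then class/data/version bytes
7F 45 4C 46 ; 0x7F 'E' 'L' 'F'
01 01 01    ; 32-bit, little-endian, version 1
"""
print(hex0_assemble(example))  # b'\x7fELF\x01\x01\x01'
```

The strict `ValueError` is a deliberate choice for a verification aid: silently skipping junk would make it easier for a mismatch between source and seed to go unnoticed.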

fjfaase 5 years ago

I wonder if Brainfuck could be used for https://github.com/oriansj/stage0-posix ? It would not surprise me if no other language has as many interpreters written in as many different programming languages. It is even possible to write a Brainfuck interpreter in Brainfuck, which can be verified. There is also a Brainfuck interpreter written in x86-64: https://github.com/316k/brainfuck-x86-64 . It is a little larger than hex0_x86.hex0, but not so much larger that it would be hard to verify.
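Part of why Brainfuck interpreters are so plentiful is that the language has only eight commands operating on a tape of byte cells. Here is a minimal interpreter sketch in Python; it assumes 8-bit wrapping cells, a fixed 30,000-cell tape, and that `,` stores 0 once input runs out. BF dialects differ on exactly these details, and `run_bf` is a hypothetical name.

```python
def run_bf(program: str, stdin: bytes = b"", tape_size: int = 30000) -> bytes:
    """Interpret a Brainfuck program and return the bytes it writes with '.'."""
    # Pre-match brackets so each loop jump is a single dictionary lookup.
    jump, open_brackets = {}, []
    for pos, ch in enumerate(program):
        if ch == "[":
            open_brackets.append(pos)
        elif ch == "]":
            if not open_brackets:
                raise ValueError(f"unmatched ']' at {pos}")
            start = open_brackets.pop()
            jump[start], jump[pos] = pos, start
    if open_brackets:
        raise ValueError(f"unmatched '[' at {open_brackets[-1]}")

    tape = bytearray(tape_size)  # all cells start at 0
    ptr = pc = in_pos = 0
    out = bytearray()
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
            if ptr >= tape_size:
                raise IndexError("tape pointer moved past the last cell")
        elif ch == "<":
            ptr -= 1
            if ptr < 0:
                raise IndexError("tape pointer moved left of cell 0")
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256  # 8-bit wraparound
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            # Assumed EOF convention: store 0 once the input is exhausted.
            tape[ptr] = stdin[in_pos] if in_pos < len(stdin) else 0
            in_pos += 1
        elif ch == "[" and tape[ptr] == 0:
            pc = jump[pc]  # skip past the matching ']'
        elif ch == "]" and tape[ptr] != 0:
            pc = jump[pc]  # return to the matching '['
        # Any other character is a comment and is ignored.
        pc += 1
    return bytes(out)
```

For instance, `run_bf(",[.,]", b"hi")` echoes its input and returns `b"hi"`; with a "leave the cell unchanged on EOF" convention instead, that same program would loop forever, which is one reason the EOF choice matters.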

kragen 5 years ago

Bootfuckbrainstrapping (?) is definitely an interesting idea; I explored it in some depth in https://dercuano.github.io/notes/uvc-archiving.html#addtoc_2 and https://dercuano.github.io/notes/self-compiler-bootstrapping..., and I concluded that although BF is inspiring, it would be easy to do better. Also, although the interpreter you point at compiles to 11331 bytes on my machine (5992 stripped, 5912 after objcopy -S -R .note.gnu.build-id bf bf.small), Brian Raiter wrote a 199-byte BF interpreter for i386 Linux last millennium: http://www.muppetlabs.com/~breadbox/software/tiny/bf.asm.txt

fjfaase 5 years ago

I have written a BrainFuck generator for writing a BrainFuck program which could replace the hex0 seeds. See https://www.iwriteiam.nl/BFgen.html

choeger 5 years ago

So, how about the kernel, eh? ;)

fosslinux 5 years ago

The kernel is still an unsolved problem, unfortunately. We need an appropriate, small seed kernel to be able to run everything up to the point where we can recompile Linux (which has also proved difficult, and is why it is not in the repo yet) :(