SSHD: Random boot time relinking, OpenBSD (undeadly.org)
Now the tables have turned, and legitimate software has to become somehow polymorphic to thwart attacks by malware.
I'm curious because years ago academia strongly pushed the FG-ASLR story, then OpenBSD did kernel relinking, but I haven't heard any industry reports on how effective this is.
Not a rhetorical question.
(FG-)ASLR is more of a "target the exploit rather than the vulnerability" style of mitigation.
FG-ASLR helps because, even when you know where .text is, there are now N possible randomized locations for the piece of code your exploit leverages, so if you pick one and exploit M machines that way, only M/N of the exploits will succeed (where you got lucky).
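To make the M/N arithmetic concrete, here is a rough simulation. It assumes each machine picks one of N layouts uniformly and independently, and that the attacker hard-codes a single guess; real layouts are not necessarily uniform, so treat the numbers as illustrative only.

```python
import random

def simulate_campaign(machines: int, layouts: int, seed: int = 0) -> int:
    """Count how many of `machines` exploits land when the attacker bets on one layout."""
    rng = random.Random(seed)
    guessed = 0  # the single layout the exploit was built for
    hits = 0
    for _ in range(machines):
        actual = rng.randrange(layouts)  # each machine randomizes independently
        if actual == guessed:
            hits += 1
    return hits

M, N = 100_000, 64
hits = simulate_campaign(M, N)
print(f"expected about {M / N:.0f} hits, simulated {hits}")
```

With N = 1 (no randomization) every attempt lands; as N grows the hit count shrinks in proportion.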
Ultimately it is obfuscation, but with enough entropy it is very effective. It can't fix or prevent the underlying bug, but it makes it more work to turn that bug into code execution consistently enough for it to be useful.
I wonder if it is possible to make a relinker which only requires binary output -- so it could be easily incorporated into existing systems.
One way I can think of is to keep relocation/original object information in the debug sections, so that one can reconstruct the original object files and re-link them. But I am guessing this will not work with LTO... Or maybe we can just make a bunch of debug sections and store the input object/library files verbatim -- this will at least double the binary size, but will allow for easier relinking.
https://research.facebook.com/publications/bolt-a-practical-...
https://groups.google.com/g/llvm-dev/c/ef3mKzAdJ7U/m/1shV64B...
At least someone finally understands that static, fully predictable, reproducible builds are only a convenience feature for the attacker's side.
So it looks like they are going to move on from CVS eventually.
AFAIK, they still interface with CVS directly, but I assume the expectation is to eventually transition to got.
[0]: https://marc.info/?l=openbsd-tech&m=167388832715992&w=2
Makefile.relink:
  cc -o sshd `echo ${OBJS} | tr ' ' '\n' | sort -R` ${LDADD}
  ./sshd -V && install -o root -g wheel -m ${BINMODE} sshd /usr/sbin/sshd
https://github.com/openbsd/src/commit/898412097f87ba70d4012f...
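For anyone who does not read shell pipelines fluently, the `sort -R` step amounts to shuffling the object list before handing it to the linker. A minimal Python sketch of the same idea follows; the object and library names are placeholders for illustration, not the actual contents of `${OBJS}` or `${LDADD}`, and the function only builds the command line without running it.

```python
import random
import shlex

def relink_command(objs, ldadd, output="sshd", rng=None):
    """Build a `cc` command line with the object files in random order."""
    rng = rng or random.SystemRandom()  # OS entropy, so the order differs every run
    shuffled = list(objs)               # copy, leave the caller's list untouched
    rng.shuffle(shuffled)
    return ["cc", "-o", output, *shuffled, *ldadd]

# Placeholder inputs, assumed for the example only.
objs = ["sshd.o", "auth.o", "session.o", "monitor.o"]
cmd = relink_command(objs, ["-lcrypto", "-lutil"])
print(shlex.join(cmd))
```

The libraries stay at the end in their original order; only the object files are permuted.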
Sometimes it can be beneficial to optimize the link so most of the main thread stays in cache. Obviously this only really matters for CPU-intensive programs.
Fully reproducible builds provide great assurance against supply chain attacks. But 100% reproducibility is in some cases a bit too much. What matters is whether the artifact can be easily proven to be functionally identical to the canonical one.
So I am 100% for a fully predictable sshd random-relink kit, producing unpredictable sshd binaries, but only as long as there are instructions for checking that the sshd binary that allegedly came from it indeed could have come from it, and was not quietly replaced by some malicious entity.
You can easily verify the integrity of the object files that are used in the random relinking - they are included in the binary distribution, and are necessary to perform the relinking.
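Checking the object files could look roughly like this. The sketch assumes you have a trusted manifest mapping file names to SHA-256 digests (where that manifest comes from is out of scope here); the file names and the demo directory are made up.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_file(path):
    """Hash a file in chunks so large objects do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_objects(obj_dir, manifest):
    """Return the names of objects that are missing or do not match the manifest."""
    bad = []
    for name, expected in manifest.items():
        p = Path(obj_dir) / name
        if not p.is_file() or sha256_file(p) != expected:
            bad.append(name)
    return sorted(bad)

# Demo with throwaway files standing in for the shipped objects.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "a.o").write_bytes(b"object a")
    (Path(d) / "b.o").write_bytes(b"object b")
    manifest = {name: sha256_file(Path(d) / name) for name in ("a.o", "b.o")}
    clean = verify_objects(d, manifest)
    (Path(d) / "b.o").write_bytes(b"tampered")
    tampered = verify_objects(d, manifest)
print(clean, tampered)
```

Note this only covers the inputs to the relink, not the linked binary itself.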
The debate of static vs dynamic linking is still going on, and a very strong argument against static linking has always been that upgrading vulnerable libraries is made difficult. But think of it: package managers already hold the meta-data of what links to what; object files can be distributed just as easily as shared objects; the last necessary step is to move the actual linking step from the kernel to the package manager.
I guess the holy grail would be to combine this with hot patching (https://en.wikipedia.org/wiki/Patch_(computing)#HOT-PATCHING), and relink the kernel every now and then while it is running (currently, a system under attack would have to be rebooted every now and then, and that’s undesirable). That would face ‘a few’ technical hurdles, though.
Maybe take a known vulnerable exec, create a fuzzing attacker and run it both ways seeing how long it takes to get lucky a few times. The more secure version should take longer.
I have to admit I am guilty of this as well, but any maintained OpenBSD setup should have an uptime of no more than 6 months, and a well-maintained OpenBSD setup will have less than that, as security patches are applied.
Having said that, one of the things I like about OpenBSD is that if you want to go dark and have an ultra-stable system (no updates ever), all the pieces are there for you. (You will want to have the source; I would also make sure I have the ports tree for that release and a copy of the ports dist files.)
This does not replace classic ASLR: OpenBSD 5.7 activated position-independent static binaries (Static-PIE) by default.
https://en.wikipedia.org/wiki/Address_space_layout_randomiza...
One thing I'm still not sure about is whether the kernel could theoretically do the same reordering at load time using relocatable symbols.
In theory all functions, or more realistically groups of functions spanning page-size increments, could be dynamically located. The obvious way to achieve that would be to have multiple .text sections within a main executable or library. But off-hand I don't know if that's actually supported by ELF, or if so whether the standard tool chains and environments could easily support it.
Any linker that could not handle multiple identically named sections is simply buggy. That said, it is normal for a linker to prefer to output only a single section of each name, but it is not difficult to get a linker to output multiple .text equivalent sections, especially if you make them have distinct names.
However, sections are not really what you want. PT_LOAD segments are, since those represent regions that get mmap'd contiguously. One can certainly put different executable sections into different segments.
I'm not 100% certain about how it works on OpenBSD, but on Linux, neither the kernel loader nor the loader embedded in the dynamic linker randomizes the segments independently. The problem is that for dynamically linked code, the .text needs to be able to reference the GOT and PLT via relative addresses, so those segments must be loaded at a known distance relative to the code. For simple static PIE executables this should not be needed [1]; however, if you start introducing multiple chunks of code loaded at random addresses again, then you need to reintroduce similar concepts, as you cannot reference code in those other randomly placed chunks with a relative address.
Assuming things are at all similar in OpenBSD, to do what you are proposing, you would need to mark groups of segments that need to be loaded relative to each other, allowing other segment groups to be randomized with respect to each other. For code in one group to access globals or functions from the other, the linker would generate a GOT and PLT per group, similar to how dynamic linking works, but with simplifications since you know all the code that will be present, so you don't need to worry about interposing, etc. In theory each GOT could get away with having as few as one entry per other segment group. [2]
Of course you would need code to initialize these GOT values. Realistically the static ELF loader would need to be augmented to provide the program with information about where it placed each segment group. Then the static PIE libc could include code that reads these offsets, and uses them to initialize the GOTs. If using the one entry per segment group approach and you place the GOTs at, say, the very start of each segment group, with the entries in segment group order, this would make for really simple initialization code. Of course, a more complicated relocation engine like a hyper stripped down dynamic linker would also be possible.
Footnotes:
[1]: Apparently on Linux even static PIE executables have some amount of runtime relocation code that is needed (I'm not really sure what/why).
[2]: This is because the linker would know exact offsets of functions and variables within each segment group, so the code can simply load the other segment's pointer into a register using relative addressing, and do the load/store/jump with that register plus the already known displacement into that segment group.
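The one-entry-per-segment-group GOT idea above can be modelled in a few lines. This is a toy simulation, not loader code: the "loader" just picks page-aligned bases, the group sizes and the symbol `do_auth` are invented, and addresses are plain integers.

```python
import random

PAGE = 0x1000

def load_groups(sizes, rng):
    """Stand-in for the loader: give each group a page-aligned, non-overlapping base."""
    order = list(range(len(sizes)))
    rng.shuffle(order)                  # groups land in arbitrary relative order
    cursor = 0x10000000
    bases = [0] * len(sizes)
    for g in order:
        cursor += rng.randrange(1, 256) * PAGE         # random gap before the group
        bases[g] = cursor
        cursor += (sizes[g] + PAGE - 1) // PAGE * PAGE  # skip past the group
    return bases

def init_gots(bases):
    """Each group's GOT holds one entry per group: the base where that group landed."""
    return [list(bases) for _ in bases]

def resolve(gots, from_group, to_group, offset):
    """Cross-group reference: GOT entry plus the offset the linker already knows."""
    return gots[from_group][to_group] + offset

rng = random.Random(1)
sizes = [0x3400, 0x1200, 0x800]         # made-up group sizes
bases = load_groups(sizes, rng)
gots = init_gots(bases)
symbols = {"do_auth": (1, 0x240)}       # hypothetical symbol: (group, offset in group)
group, off = symbols["do_auth"]
addr = resolve(gots, 0, group, off)
print(hex(addr))
```

The initialization really is just copying the table of bases into each group, which is the "really simple initialization code" case; anything per-symbol would need a proper relocation pass.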
I don't understand the full logic here. Yes I can authenticate the object files. But how would you discern, after a possible intrusion, an "sshd" binary that is indeed a random combination of these objects, from a trojaned "sshd" binary?
There may be a code size cost on some architectures: since the call destination can be relocated far from the call site, the assembler will need to allocate enough space to reach the call target instead of using a small PC-relative relocation.
The weak link in reproducibility is that you currently have no trivial way of recreating the same random order of the linked object files.
Currently the random relinking is implemented literally through a call to "| sort -R" (-R for random order) on the list of object files, passed as arguments to the linker. I suppose if sort -R took a seed argument that was saved somewhere safe (chmod 400), the link order could still be reproduced, and the resulting executable checksummed against the state of the system.
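The seeded variant might look like the sketch below. It is written in Python rather than as a patch to sort, and the seed handling is an assumption: the seed would have to be stored where an intruder cannot read it, since anyone holding it can recompute the layout.

```python
import random
import secrets

def link_order(objs, seed):
    """Return the object files in an order determined only by the seed."""
    ordered = sorted(objs)                # canonical start, so input order is irrelevant
    random.Random(seed).shuffle(ordered)  # same seed gives the same permutation
    return ordered

seed = secrets.randbits(128)              # fresh seed per relink, to be saved privately
objs = ["sshd.o", "auth.o", "session.o", "monitor.o"]  # placeholder names
first = link_order(objs, seed)
again = link_order(reversed(objs), seed)
print(first == again)
```

Python's `random.Random` is a Mersenne Twister, which is fine for showing reproducibility but is not a cryptographic generator; a real tool would probably want to derive the permutation from a CSPRNG keyed by the seed.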
Yet another solution is to re-sort the executable into a stable order and compare the hash of that.
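A toy version of the stable-order comparison: hash each chunk, sort the digests, then hash the sorted list, so the result does not depend on link order. The big assumption is that the per-object chunks can be recovered from the binary byte-for-byte; in a real linked executable the resolved relocations differ between layouts, so they would have to be undone or masked first.

```python
import hashlib

def canonical_fingerprint(chunks):
    """Order-independent fingerprint over a collection of byte chunks."""
    digests = sorted(hashlib.sha256(c).digest() for c in chunks)
    h = hashlib.sha256()
    for d in digests:
        h.update(d)
    return h.hexdigest()

# Made-up chunks standing in for the code contributed by each object file.
reference = [b"code of a.o", b"code of b.o", b"code of c.o"]
relinked = [reference[2], reference[0], reference[1]]    # same chunks, new order
trojaned = [reference[2], b"evil payload", reference[1]]

print(canonical_fingerprint(relinked) == canonical_fingerprint(reference))
print(canonical_fingerprint(trojaned) == canonical_fingerprint(reference))
```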