Dllicious – shared object usage analysis on Linux (wiki.alopex.li)
You would need two distinct, long-lived, dynamically linked executables, both hot at the same time, and both with libc (or another common library) in the hot path at the same time.
To elaborate,
- If two hot processes are the same executable, static linking would yield the same results: They will be mmap()ed to the same physical addresses (and I'm assuming physical indexing or equivalent here, otherwise ASLR really negates any dynamic linking advantage wrt caches).
- If the processes are not hot or not long-lived, everything will be dominated (by orders of magnitude) by media access, context switches, and virtual page allocation.
- Sure, we could imagine a non-hot process thrashing the cache for a hot one, and doing so less with dynamic linking. But again, the context switch is the bigger concern; even a complete cache flush would make little difference here.
- And I'm not even going to touch on the various cache-timing vuln mitigations out there.
That being said, I am a staunch proponent of dynamic linking being the general case, for the memory and storage savings. Security is a common argument too, and although I'm not expert enough to have an opinion on this one, I do love the traditional distro approach to packages.
Like, for example, bash and a process you are executing in parallel in a loop?
They also didn't look at DLLs used by other DLLs.
This is like a back-of-the-envelope calculation. A quick first pass to get an order of magnitude.
If so, then the data is too wrong to be of any value.
However, ldd also lists indirect dependencies, so I think this was already accounted for (though that might have been by accident):
  ~> readelf -d /bin/bash | grep NEEDED
    0x0000000000000001 (NEEDED) Shared library: [libreadline.so.8]
    0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
  ~> ldd /bin/bash
    linux-vdso.so.1 (0x00007fff15191000)
    libreadline.so.8 => /lib64/libreadline.so.8 (0x00007f2a24818000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f2a2460e000)
    libtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007f2a245db000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f2a249b1000)
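The direct vs. indirect distinction can be checked mechanically. A minimal sketch, assuming the readelf and ldd output has already been captured as text (the strings below are copied from the commands above): anything ldd resolves with `=>` that is not a direct NEEDED entry was pulled in indirectly.

```python
import re

# Output copied from the readelf/ldd commands above.
READELF_OUT = """\
 0x0000000000000001 (NEEDED) Shared library: [libreadline.so.8]
 0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
"""

LDD_OUT = """\
\tlinux-vdso.so.1 (0x00007fff15191000)
\tlibreadline.so.8 => /lib64/libreadline.so.8 (0x00007f2a24818000)
\tlibc.so.6 => /lib64/libc.so.6 (0x00007f2a2460e000)
\tlibtinfo.so.6 => /lib64/libtinfo.so.6 (0x00007f2a245db000)
\t/lib64/ld-linux-x86-64.so.2 (0x00007f2a249b1000)
"""


def direct_needed(readelf_text: str) -> set[str]:
    """Sonames listed in DT_NEEDED entries (direct dependencies only)."""
    return set(re.findall(r"\(NEEDED\)\s+Shared library: \[([^\]]+)\]", readelf_text))


def ldd_libraries(ldd_text: str) -> set[str]:
    """Sonames ldd resolved via 'name => path' lines (direct + indirect)."""
    return set(re.findall(r"^\s*(\S+)\s+=>", ldd_text, flags=re.MULTILINE))


def indirect_deps(readelf_text: str, ldd_text: str) -> set[str]:
    """Libraries loaded only because some other dependency needs them."""
    return ldd_libraries(ldd_text) - direct_needed(readelf_text)


print(sorted(indirect_deps(READELF_OUT, LDD_OUT)))  # ['libtinfo.so.6']
```

The vDSO and the dynamic loader itself have no `=>` in ldd's output, so this sketch deliberately leaves them out of the count.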
It also seems like you're scanning a bunch of Flatpak apps and Steam apps, which provide an environment for common libraries but expect each program to statically link anything not shipped within those common environments.
Both of these skew the graph of most used libraries, and the former skews the additional space required for static linking a little.
It's also worth mentioning that inlining, dead-code stripping, and not needing symbol names, version info, or dependency info make static binaries significantly smaller than the size of the program plus its .so files.
I'm a huge proponent of shared libraries, but unfortunately these numbers wouldn't really mean much in reality. For me, though, the main benefit of shared libraries is being able to patch and upgrade them individually, regardless of how many apps actually use them, or how. This has been a godsend when it comes to adding functionality and making my computer behave the way I want it to.
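To make the skew concrete, here is the simplest possible model of the extra space static linking would need. It is a minimal sketch, not the article's method: it assumes every executable that uses a library embeds a full copy, so the cost per library is (users - 1) * size. The user counts are the libc.so.6 and libz.so.1 figures from the list below; the sizes are made-up placeholders. Given inlining and dead-code stripping, treat the result as an upper bound.

```python
# Naive upper bound on the extra disk space static linking would cost.
# ASSUMPTION: each static user embeds the *whole* library (no dead-code
# stripping, no inlining), so real static binaries would be smaller.

# User counts from the soname list in this thread; sizes (bytes) are
# HYPOTHETICAL placeholders, not measured values.
LIBS = {
    "libc.so.6": {"users": 49344, "size": 2_000_000},
    "libz.so.1": {"users": 17287, "size": 100_000},
}


def extra_bytes_if_static(users: int, size: int) -> int:
    """One shared copy exists today; static linking adds users - 1 more copies."""
    return max(users - 1, 0) * size


def total_extra(libs: dict) -> int:
    """Sum the naive per-library overhead across all libraries."""
    return sum(extra_bytes_if_static(v["users"], v["size"]) for v in libs.values())


print(f"{total_extra(LIBS) / 1e9:.1f} GB")
```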
10093 libcap.so.2
49344 libc.so.6
27555 libdl.so.2
13296 libffi.so.8
16112 libgcc_s.so.1
10086 libgcrypt.so.20
11352 libglib-2.0.so.0
10128 libgpg-error.so.0
13015 liblzma.so.5
22664 libm.so.6
11337 libpcre.so.1
29942 libpthread.so.0
11036 libresolv.so.2
19510 librt.so.1
17287 libz.so.1
11362 libzstd.so.1
Though some usages are not the same as others, even if they have the same ABI, so this is probably hurting performance a bit. :P
Even then, I would expect NixOS to do some unsharing of shared libraries, so it would probably be best to use the full path to determine how much each shared object is actually shared.
For example, I have 87 different `libzstd.so.1` in my `/nix/store` right now.
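To see why the soname alone hides that duplication, compare the two ways of counting. A minimal sketch with hypothetical `/nix/store` paths (the store hashes are invented): grouping by basename merges every copy into one entry, while grouping by full path shows how many distinct files actually exist.

```python
from collections import Counter
from pathlib import PurePosixPath

# HYPOTHETICAL resolved library paths, one per (executable, dependency) pair.
# The store hashes are invented for illustration.
RESOLVED = [
    "/nix/store/aaaa-zstd-1.5.5/lib/libzstd.so.1",
    "/nix/store/aaaa-zstd-1.5.5/lib/libzstd.so.1",
    "/nix/store/bbbb-zstd-1.5.5/lib/libzstd.so.1",
    "/nix/store/cccc-zstd-1.5.2/lib/libzstd.so.1",
]

# Counting by soname (basename) lumps every copy together.
by_soname = Counter(PurePosixPath(p).name for p in RESOLVED)

# Counting by full path shows how often each *file* is actually shared.
by_path = Counter(RESOLVED)

print(by_soname["libzstd.so.1"], "uses of one soname")  # 4 uses of one soname
print(len(by_path), "distinct files")                   # 3 distinct files
```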
49344 libc.so.6
29942 libpthread.so.0
27555 libdl.so.2
22664 libm.so.6
19510 librt.so.1
17287 libz.so.1
16112 libgcc_s.so.1
13296 libffi.so.8
13015 liblzma.so.5
11362 libzstd.so.1
11352 libglib-2.0.so.0
11337 libpcre.so.1
11036 libresolv.so.2
10128 libgpg-error.so.0
10093 libcap.so.2
10086 libgcrypt.so.20
Yeah, no. Every sane compiler developer would always prefer DLLs over static linking because of modularity. It makes the compiler much smaller if you can defer the combination of modules to a linker and thus have proper separate compilation.
Unfortunately, separate compilation with a C-like ABI only works well with a relatively limited monomorphic language. All other languages would have to leave quite a few optimizations on the table or make some suboptimal choices for uniform object representation (compare OCaml's ABI with Rust or Haskell).
Consequently, a modern language would face the tremendous (but, I think, interesting) task of extending the C ABI in a way that allows both these optimizations and separate compilation. Understandably, developers then play down the importance of separate compilation and choose a global approach.
> So huzzah, you now have some real data for your next Internet Argument
You don't need real data for internet arguments. Make something up and sound convincing, it works at least 60% of the time according to studies by a guy whose wiki blog I read once.
There is no relationship between whether static or dynamic linking is used and modularity.
When a program is decomposed in modules, those are compiled separately and the complete executable program is made by linking, either statically or dynamically.
The static vs. dynamic option does not influence the semantics of the program regarding modularity.
Dynamic linking offers a few extra features, e.g. delaying the linking of a library to some time after a program starts and choosing one between more libraries at that time, but exactly the same functionality can be implemented in a statically linked program (using pointers to functions), with no difference in behavior (but with different costs in memory space and execution time; which costs are larger will be different for each particular case).
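The function-pointer alternative can be pictured with a small sketch. It is written in Python, where a dict of callables plays the role of a C table of function pointers; the checksum backends and the selection by name are hypothetical. Every implementation is present in the program from the start, and one is picked once at startup, mirroring what choosing which library to dlopen() would do.

```python
import zlib
from typing import Callable


# Two interchangeable "backends" built into the same program.
# In C these would be functions in one statically linked binary.
# Names are hypothetical.
def checksum_crc32(data: bytes) -> int:
    return zlib.crc32(data)


def checksum_adler32(data: bytes) -> int:
    return zlib.adler32(data)


# The "function pointer table": like picking a library to dlopen() at
# startup, except every option is already linked in.
BACKENDS: dict[str, Callable[[bytes], int]] = {
    "crc32": checksum_crc32,
    "adler32": checksum_adler32,
}


def select_backend(name: str) -> Callable[[bytes], int]:
    """Resolve the implementation once, e.g. from a config value at startup."""
    try:
        return BACKENDS[name]
    except KeyError:
        raise ValueError(f"unknown backend: {name!r}") from None


checksum = select_backend("adler32")  # chosen at run time, not at link time
print(checksum(b"hello"))
```

The trade-off the comment mentions shows up directly: every backend occupies space in the binary even if unused, but the call goes through a plain table lookup instead of the dynamic loader.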
Actually a compiler that targets only static linkers will be slightly smaller, because many of the standards for dynamic linking, e.g. the UNIX SVR4/ELF ABI, require the compiler to emit additional instructions and data structures whenever external variables or functions are accessed, in comparison with the case when only static linking is used.
Dynamic linking is an additional complication for the compiler, not a simplification. Proper separate compilation of modules is easiest with static linking alone: the compiler just has to emit the appropriate relocation and linking data (which, for the traditional UNIX compilers that generated an assembly program rather than an ELF object file, was actually the assembler's job), in addition to what it already does when compiling a monolithic program.
No, they're not, at least not in languages with nontrivial features and optimizations. Consider the identity function id = \x.x in some module A and its usage A.id 42 in some module B. Unless you commit to a uniform object representation, which rules out several optimizations, there's no way to compile A separately from its use in B (because A.id has to be specialized). That also rules out creating dynamic libraries, because you would expect the dynamic library compiled from A to be usable instead of A (together with the necessary type interface data). Similar problems occur with polymorphic data structures.
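A toy model makes the dependency visible. In this sketch (all module names, call sites, and the `fn<type>` naming are hypothetical, and real compilers are far more involved), a compiler that specializes A.id per argument type must see every call site in every client module before it can decide what code A contains. Adding a new client changes A's output, so A cannot be shipped as a fixed, precompiled library.

```python
# Toy model of monomorphization: which specialized copies of a generic
# function must exist is decided by the *clients*, not by the defining module.

# HYPOTHETICAL call sites: (client module, generic function, argument type).
CALL_SITES = [
    ("B", "A.id", "int"),  # A.id 42
    ("C", "A.id", "str"),  # A.id "x"
    ("B", "A.id", "int"),  # repeated use, same instance
]


def instances_for(module: str, call_sites) -> set[str]:
    """Specialized symbols that `module` must contain, given all call sites."""
    return {
        f"{fn}<{ty}>"
        for _client, fn, ty in call_sites
        if fn.startswith(module + ".")
    }


before = instances_for("A", CALL_SITES)
# A new client D uses A.id at a new type: A's compiled contents change.
after = instances_for("A", CALL_SITES + [("D", "A.id", "float")])

print(sorted(before))  # ['A.id<int>', 'A.id<str>']
print(sorted(after))   # ['A.id<float>', 'A.id<int>', 'A.id<str>']
```

With a uniform (boxed) representation, a single A.id operating on pointers would serve every client, which is exactly the optimization trade-off described above.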
Separate compilation, modularity, and dynamic linking are all aspects of the same problem.
I think that's why Rust uses a global compilation approach and if I am not completely mistaken, only Swift tries to have dynamic linking with polymorphism.
If you're trying to be cross-platform, you have the major issue that dynamic libraries have very different models on different platforms. Windows DLLs require all of the exported (and imported) names to be explicitly identified at compile time; Mach-O and ELF files don't require that. On the other hand, ELF files have symbol visibility, which can be toggled to do something similar, but the standard expectations in C are completely different: every symbol is capable of being exported (or overridden, thanks to symbol preemption!). (I don't have much familiarity with Mach-O's two-level namespace system, so I won't talk about it further.)
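Symbol preemption can be illustrated with a deliberately simplified model of ELF's default behavior: the dynamic linker searches the global scope in load order, executable first, and the first object exporting a symbol wins, so a library's call to its own exported function can be redirected elsewhere. Symbols with hidden visibility are bound inside their own object and take no part. This toy sketch ignores symbol versioning, LD_PRELOAD, -Bsymbolic, and dlopen() local scopes; the object and symbol names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SharedObject:
    name: str
    exported: set[str] = field(default_factory=set)  # default visibility
    hidden: set[str] = field(default_factory=set)    # visibility("hidden")


def resolve(symbol: str, requester: SharedObject, load_order: list[SharedObject]) -> str:
    """Return the object whose definition `requester` ends up using (simplified)."""
    # Hidden symbols are bound inside their own object at static link time.
    if symbol in requester.hidden:
        return requester.name
    # Otherwise search the global scope in load order; first definition wins.
    for obj in load_order:
        if symbol in obj.exported:
            return obj.name
    raise LookupError(f"undefined symbol: {symbol}")


exe = SharedObject("app", exported={"log_message"})
libfoo = SharedObject("libfoo.so", exported={"log_message", "foo_init"},
                      hidden={"foo_helper"})
order = [exe, libfoo]

# libfoo's own call to log_message is preempted by the executable's definition.
print(resolve("log_message", libfoo, order))  # app
print(resolve("foo_helper", libfoo, order))   # libfoo.so
```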
There are other issues. DSOs make it hard to use linker features such as arranging things in a section to build a distributed static list (e.g., for registering reflection data). Even doing static constructors in dynamic libraries is hard. Arranging for single static addresses (as C/C++ require) can be challenging. On Windows, memory from malloc in one DLL can't be freed by free in another... sometimes (when the two use different C runtimes). It also requires defining a stable ABI, since you have no reasonable expectation that the other side of the dynamic library is using the same compiler version you are. In general, crossing the dynamic library boundary is like doing FFI, given how little control you have over the process.
No, static linking is much easier. You don't have to hold in your head so much more crazy semantics with dynamic linking.