C++ Headers are Expensive

C++ Headers are Expensive (virtuallyrandom.com)

107 points by kbwt 7 years ago | 109 comments

AndyKelley 7 years ago |

In the Zig stage1 compiler (written in C++), I tried to limit all the C++ headers to as few files as possible. Not counting vendored dependencies, the compiler builds in 24 seconds using a single core on my laptop. It's because of tricks like this:

    /*
     * The point of this file is to contain all the LLVM C++ API interaction so that:
     * 1. The compile time of other files is kept under control.
     * 2. Provide a C interface to the LLVM functions we need for self-hosting purposes.
     * 3. Prevent C++ from infecting the rest of the project.
     */


    // copied from include/llvm/ADT/Triple.h

    enum ZigLLVM_ArchType {
        ZigLLVM_UnknownArch,
    
        ZigLLVM_arm,            // ARM (little endian): arm, armv.*, xscale
        ZigLLVM_armeb,          // ARM (big endian): armeb
        ZigLLVM_aarch64,        // AArch64 (little endian): aarch64
    ...

and then in the .cpp file:

    static_assert((Triple::ArchType)ZigLLVM_UnknownArch == Triple::UnknownArch, "");
    static_assert((Triple::ArchType)ZigLLVM_arm == Triple::arm, "");
    static_assert((Triple::ArchType)ZigLLVM_armeb == Triple::armeb, "");
    static_assert((Triple::ArchType)ZigLLVM_aarch64 == Triple::aarch64, "");
    static_assert((Triple::ArchType)ZigLLVM_aarch64_be == Triple::aarch64_be, "");
    static_assert((Triple::ArchType)ZigLLVM_arc == Triple::arc, "");
    ...

I found it more convenient to redefine the enum and then static assert all the values are the same, which has to be updated with every LLVM upgrade, than to use the actual enum, which would include a bunch of other C++ headers.

The file that has to use C++ headers takes about 3x as long to compile as Zig's ir.cpp file, which is nearing 30,000 lines of code but only depends on C-style header files.

ArthurBrussee 7 years ago | |

What a world... Thanks for working on Zig, can't wait to see what comes of it. Anything to get back some languages that bring some joy into programming!

pjmlp 7 years ago | |

Any plans to actually bootstrap the compiler?

AndyKelley 7 years ago | | |

https://github.com/ziglang/zig/issues/853

beached_whale 7 years ago |

You can find out where your time is going, at least with clang, by adding -ftime-report to your compiler command line. Headers often take a long time because, with everything visible, the compiler can do a better job of optimizing and inlining. Just timing your compiles is like trying to find things in the dark: you know the wall is there, but what are you stepping on? :) It's good to know what is taking a long time, but the cost may not be the header itself so much as the extra work the compiler can now do to (potentially) give better output.

fouronnes3 7 years ago | |

I've been working with -ftime-report, but unfortunately it reports times per cpp file. I'm looking for a way to get a summary across an entire CMake build. Right now, reading 100+ -ftime-report outputs is not really useful, although deep down I know it's all template instantiation anyway.

beached_whale 7 years ago | | |

When I look, most of the time has gone to inlining and optimization. But I only look sometimes, and the sample size is me.

nanolith 7 years ago |

I recommend three things for wrangling compile times in C++: precompiled headers, using forward headers when possible (e.g. iosfwd and friends), and implementing an aggressive compiler firewall strategy when not.

The compiler firewall strategy works fairly well in C++11 and even better in C++14. Create a public interface with minimal dependencies, and encapsulate the details for this interface in a pImpl (pointer to implementation). The latter can be defined in implementation source files, and it can use unique_ptr for simple resource management. C++14 added the missing make_unique, which eases the pImpl pattern.

That being said, compile times in C++ are going to typically be terrible if you are used to compiling in C, Go, and other languages known for fast compilation times. A build system with accurate dependency tracking and on-demand compilation (e.g. a directory watcher or, if you prefer IDEs, continuous compilation in the background) will eliminate a lot of this pain.

AdieuToLogic 7 years ago |

If C++ compile time is a concern and/or impediment to productivity, I recommend the seminal work regarding this topic by Lakos:

Large-Scale C++ Software Design[0]

The techniques set forth therein are founded in real-world experience and can significantly address large-scale system build times. Granted, the book is dated and likely not entirely applicable to modern C++, yet remains the best resource regarding insulating modules/subsystems and optimizing compilation times IMHO.

0 - https://www.pearson.com/us/higher-education/program/Lakos-La...

de_watcher 7 years ago | |

If it's the book I'm thinking of, it already seemed very dated to me 10 years ago. It has too many limitations, and there are some weird rules about boundaries between elements of the architecture.

kazinator 7 years ago |

Speaking about GNU C++ (and C), the headers are getting cheaper all the time compared to the brutally slow compilation.

Recently, after a ten-year absence from ccache, I was playing with it again.

The speed-up you obtain from ccache today is quite a bit more than a decade ago; I was amazed.

ccache does not cache the result of preprocessing. Each time you build an object, ccache passes it through the preprocessor to obtain the token-level translation unit which is then hashed to see if there is a hit (ready made .o file can be retrieved) or miss (preprocessed translation unit can be compiled).

There is now more than a 10 fold difference between preprocessing, hashing and retrieving a .o file from the cache, versus doing the compile job. I just did a timing on one program: 750 milliseconds to rebuild with ccache (so everything is preprocessed and ready-made .o files are pulled out and linked). Without ccache 18.2 seconds. 24X difference! So approximately speaking, preprocessing is less than 1/24th of the cost.

Ancient wisdom about C used to be that more than 50% of the compilation time is spent on preprocessing. That's the environment from which came the motivations for devices like precompiled headers, #pragma once and having compilers recognize the #ifndef HEADER_H trick to avoid reading files.

Nowadays, those things hardly matter.

Nowadays when you're building code, the rate at which .o's "pop out" of the build subjectively appears no faster than two decades ago, even though the memories, L1 and L2 cache sizes, CPU clock speeds, and disk spaces are vastly greater. Since not a lot of development has gone into preprocessing, it has more or less sped up with the hardware, but overall compilation hasn't.

Some of that compilation laggardness is probably due to the fact that some of the algorithms have tough asymptotic complexity. Just extending the scope of some of the algorithms to do a slightly better job causes the time to rise dramatically. However, even compiling with -O0 (optimization off), though faster, is still shockingly slow, given the hardware. If I build that 18.2 second program with -O0, it still takes 6 seconds: an 8X difference compared to preprocessing and linking cached .o files in 750 ms. A far cry from the ancient wisdom that character and token level processing of the source dominates the compile time.

RcouF1uZ4gsC 7 years ago |

> The test was done with the source code and includes on a regular hard drive, not an SSD.

In my opinion, this makes any conclusion dubious. If you really care about compile times in C++, step 0 is to make sure you have an adequate machine (at least a quad-core CPU, lots of RAM, and an SSD). If the choice is between spending programmer time trying to optimize compile times and spending a couple hundred dollars on an SSD, then 99% of the time the SSD will be the correct solution.

lbrandy 7 years ago |

All of msvc, gcc, clang, and the isocpp committee have active work ongoing for C++ modules.

We'll have them Soon™.

Valmar 7 years ago | |

Who knows whether they'll see much use, due to C++ needing to keep backwards compatibility for older projects that demand older versions of C++.

It probably partially depends on whether old-style headers can be used simultaneously with new-style modules.

_0w8t 7 years ago |

Opera contributed the jumbo build feature to Chromium. The idea is to feed the compiler not the individual sources, but a file that includes many sources. This way common headers are compiled only once. The compilation time saving can be up to factor of 2 or more on a laptop.

The drawback is that sources from the jumbo can not be compiled in parallel. So if one has access to extremely parallel compilation farm, like developers at Google, it will slow down things.

maccard 7 years ago | |

> The drawback is that sources from the jumbo can not be compiled in parallel. So if one has access to extremely parallel compilation farm, like developers at Google, it will slow down things.

Generally the way this works is that rather than combining everything into one jumbo file, you combine the sources into multiple files, which you can then compile in parallel. UE4 supports it (disclosure: I work for them), and it works by including 20 files at a time and compiling the larger files normally.
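A rough sketch of that batching idea, assuming a hypothetical helper `make_unity_files` (this is not UE4's or Chromium's actual build code, only an illustration of grouping sources into several combined files). Each returned string is the text of one generated file that a build system could hand to the compiler as an independent, parallelizable job.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split `sources` into groups of at most `batch_size`
// and return the text of one generated unity file per group.
std::vector<std::string> make_unity_files(const std::vector<std::string>& sources,
                                          std::size_t batch_size) {
    std::vector<std::string> units;
    if (batch_size == 0) {
        return units;  // nothing sensible to generate
    }
    for (std::size_t i = 0; i < sources.size(); i += batch_size) {
        const std::size_t end = std::min(i + batch_size, sources.size());
        std::string text = "// generated unity file\n";
        for (std::size_t j = i; j < end; ++j) {
            // Each source is pulled in textually, so headers shared by the
            // group are parsed once per unity file instead of once per source.
            text += "#include \"" + sources[j] + "\"\n";
        }
        units.push_back(std::move(text));
    }
    return units;
}
```

With five sources and a batch size of 2, this yields three unity files, the last holding a single `#include`. A smaller batch size gives more parallelism and cheaper incremental rebuilds; a larger one shares more header parsing per compile.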

There is also a productivity slowdown, where a change to any one of those files causes all the other files in the same combined file to be recompiled, so you can pull the files you are working on out of the combined file and compile them individually.

> The compilation time saving can be up to factor of 2 or more on a laptop.

The compilation time savings are orders of magnitude in my experience, even on a high end desktop. That's for a full build. For an incremental build, there is a penalty (see above for workarounds).

speps 7 years ago | |

This is also called "unity builds" and is used by Unreal Engine 4[1] and can definitely be used in parallel (eg. IncrediBuild).

[1] https://api.unrealengine.com/INT/Programming/UnrealBuildSyst...

mcv 7 years ago |

This reminds me of my very first job after university. We used Visual C++, with some homebrew framework with one gigantic header file that tied everything together. That header file contained thousands or possibly tens of thousands of const uints, defining all sorts of labels, identifiers and whatever. And that header file was included absolutely everywhere, so every object file got those tens of thousands of const uints taking up space.

Compilation at the time took over 2 hours.

At some point I wrote a macro that replaced all those automatically generated const uints with #defines, and that cut compilation time to half an hour. It was quickly declared the biggest productivity boost by the project lead.

fizwhiz 7 years ago |

Isn't this the reason precompiled headers are a thing?

timvisee 7 years ago |

I would love to see the times of this on a Linux system (preferably on the same hardware).

    $ for i in `seq 3`; do gcc-6 -x c-header /dev/null -o x.h.gch; sha256sum x.h.gch; done
    98d8093503565836ba6f35b7adf90330d63d9d1c76dfb8e3ad1aeb2d933d1a45 x.h.gch
    17e5de099860d94aaa468c5ad103b3f0dd5e663f6cdbd01b4f12cf210023e71c x.h.gch
    3cc2f1c0a517b5fedbbd49bb3a34084d9aa1428f33f3c30278a8c61f9ed9ba88 x.h.gch