Cost of enum-to-string: C++26 reflection vs. the old ways (vittorioromeo.com)
So speaking of old ways, I'm not a C++ dev, but a while ago saw someone comment that they still organize their C++ projects using tips from John Lakos' Large-scale C++ software design from 1997, and that their compile times are incredibly fast. So I decided to find a digital copy on the high seas and read it out of historical curiosity. While I didn't finish it, one wild thing stood out to me: he advised for using redundant external include guards around every include, e.g.
#ifndef INCLUDED_MATH
#include <math>
#define INCLUDED_MATH
#endif
The reason being that (in 1997) every include required the preprocessor to open the file just to check for an include guard, and then read it all the way to the end to find the closing #endif, potentially causing O(N^2) disk read overhead (if anyone feels like verifying this, it's explained on pages 85 to 87).
Again, that was in 1997. I have no idea what mitigations for this problem exist in compilers by now, but I hope at least a few, right?
This conclusion is making me wonder if following that advice still would have a positive impact on compile times today after all though. Surely not, right? Can anyone more knowledgeable about this comment on that?
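For anyone who hasn't seen the two guard styles side by side, here is a minimal sketch collapsed into a single file. The macro names INCLUDED_CMATH and INCLUDED_GEOMETRY and the hypotenuse function are made up for illustration, and <cmath> stands in for the header in the quoted example.

```cpp
// Redundant *external* guard, as described in the 1997 book: the including
// file checks the macro itself, so the preprocessor has no reason to open
// the header again once the macro is defined.
#ifndef INCLUDED_CMATH
#include <cmath>
#define INCLUDED_CMATH
#endif

// A second inclusion attempt: the whole block is skipped because the macro
// is already defined, without looking at <cmath> at all.
#ifndef INCLUDED_CMATH
#include <cmath>
#define INCLUDED_CMATH
#endif

// Ordinary *internal* guard. In a real project this part would live inside
// a header such as geometry.h (hypothetical name).
#ifndef INCLUDED_GEOMETRY
#define INCLUDED_GEOMETRY
inline double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}
#endif
```

With the internal style alone, a compiler that does not remember guarded headers has to reopen and scan the file on every #include; the external style avoids that at the cost of three extra lines per include.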
You can also use `#pragma once` which works everywhere, is nicer, and technically needs less work by the compiler, but compilers have optimized for include guards since a long time ago.
Some random measurements I found: https://github.com/Return-To-The-Roots/s25client/issues/1073
> at least for gcc and Visual Studio using #pragma once has a significant impact. The fact is, the compiler does not need to continue parsing the whole file when reaching a #pragma once. otherwise the compiler always needs to do it even if the include guard afterwards will avoid double processing of the content afterwards.
As written, the explanations for these optimizations suggest that both "pragma once" and the include guard optimization still require opening and closing the file each time an include is encountered, even if you bail after parsing the first line. Is that overhead zero? Or are the optimizations explained poorly and is repeatedly opening/closing the file also avoided?
Either way, do you know what causes the slowdown as a result of including <meta>?
I'm going to experiment with other compilers and figure out how they handle it.
I'll just point out that Lakos updated his work with a new edition in 2019:
Large-Scale C++ Volume I: Process and Architecture
and there's scattered evidence that Volume II might be published in Feb. 2027 [1]
Large-Scale C++ Volume II: Design and Implementation
[1]: https://www.amazon.co.uk/Large-Scale-Implementation-Addison-...
Regardless, I don't think things are going to differ much with Clang. Without PCH/modules, standard header inclusion is still the "slow part" of C++ compilation, regardless of the compiler used and the standard library used (libstdc++ vs libc++). `#include` is fundamentally the same on any modern compiler.
Because the reflection feature itself seems quite fast on GCC (compared to the cost of the header), I predict the results will be similar on Clang as well.
Promises and claims have been made for longer than that on how Modules would have improved compilation times and made everyone's lives easier. In 2026, I still have to see any real evidence of that, especially when PCH + unity builds are much easier to use (except on damn Bazel, which supports neither) and deliver great results.
If after 6+ years of development Modules are still so far behind, it is fair to question if the problem is with the design/implementability of the feature itself.
struct MyStruct {
int val = 42;
string name = "my name";
};
into {
"val": 42, // if JSON had integers, and comments of course
"name": "my name",
}
is incredibly powerful. If reflection supported attributes (I can't believe it shipped without, honestly), then you could also mark members as [[ignore]] and skip them.
Almost all the Java web frameworks are giant balls of reflection. Name a function the right way or add the right magic annotation and the framework will autowire it correctly.
It's a pretty powerful tool. (IDK if C++'s reflection is as capable, but this is what was enabled by java's reflection).
My favorite thing is that I will get to remove and replace most of the cryptic template recursion stuff I have with "template for" and maybe a bit of reflection. Debugging the unrolled stuff will be a joy in comparison.
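For context, this is roughly the kind of pre-C++26 machinery that comment is talking about: a minimal C++17 sketch that visits every element of a tuple, where each "iteration" sees a different type. The names for_each_impl, for_each_element and describe are made up for illustration; the `template for` version is not shown here because it needs a C++26 compiler.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Helper: expands the index pack so that f is called once per tuple element.
template <typename Tuple, typename F, std::size_t... Is>
void for_each_impl(const Tuple& t, F&& f, std::index_sequence<Is...>) {
    (f(std::get<Is>(t)), ...);  // C++17 fold over the comma operator
}

// Entry point: builds the index sequence 0..N-1 for an N-element tuple.
template <typename... Ts, typename F>
void for_each_element(const std::tuple<Ts...>& t, F&& f) {
    for_each_impl(t, f, std::index_sequence_for<Ts...>{});
}

// Example use: the generic lambda is instantiated once per element type.
inline std::string describe(const std::tuple<int, long, std::string>& t) {
    std::string out;
    for_each_element(t, [&out](const auto& x) {
        using X = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<X, std::string>) {
            out += x;                  // already a string
        } else {
            out += std::to_string(x);  // int and long go through to_string
        }
        out += ';';
    });
    return out;
}
```

Stepping through this in a debugger means stepping through the pack expansion and the helper layers, which is the part an expansion statement is expected to flatten.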
It would be cool if the stated goal of C++29 was compile times.
For many useful use cases, you don't need C++26 reflection at all. E.g. https://www.linkedin.com/posts/vittorioromeo_cpp-gamedev-ref...
But interestingly the code can be improved. The issue is that meta::info[1] is a pure compile time object so in the original code we need to statically unroll the loop of the vector that contains it so that we can splice it in in the loop body. But if we convert it to our own objects, then we can use a plain for loop.
template<class T>
constexpr static inline auto reflect_type = ^^T; // not really necessary
template <typename T>
    requires std::is_enum_v<T>
constexpr std::string_view to_enum_string(T val)
{
    // Helper string type (std::string_view itself was rejected by GCC here)
    struct my_string_view { const char * ptr; size_t sz = strlen(ptr); };
    static constexpr auto meta = std::define_static_array(
        std::meta::enumerators_of(reflect_type<T>)
        | std::ranges::views::transform(
            [](auto e) {
                return std::pair{my_string_view{define_static_string(std::meta::identifier_of(e))}, extract<T>(e)};
            }));
    for (auto [name, value] : meta)
    {
        // Convert the helper type back to std::string_view explicitly
        if (val == value) { return std::string_view{name.ptr, name.sz}; }
    }
    return "<unknown>";
}
This actually generates less code bloat because, if the array is large, it will use a plain loop instead of always unrolling. The meta array can also now be used as a lookup table for dense enums, which I don't think is doable with the original version. Supposedly GCC should be able to convert an if chain into a switch statement, but it doesn't seem to trigger here [edit: scratch that: GCC does the switch conversion for the original version].
define_static{_array,_string} still feel like unnecessary magic, but hopefully they are only transient and we will be able to use std::vectors directly. Also, somehow GCC doesn't let me use std::string_view and I had to introduce a helper string type.
edit: I literally learned everything I know about static reflection in the last 24 hours. It is complicated, but not that complicated.
[1] Not sure why, I suspect they want to avoid being constrained by ABI.
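To make the "lookup table for dense enums" remark concrete, here is a minimal sketch without reflection. The Color enum and the hand-written color_names table are assumptions for illustration; in the reflection version the table would be the generated meta array rather than something typed by hand.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

enum class Color { red, green, blue };  // dense: values 0, 1, 2

// Hand-written table, indexed by the enumerator's underlying value.
inline constexpr std::array<std::string_view, 3> color_names{"red", "green", "blue"};

constexpr std::string_view to_enum_string(Color c) {
    // A negative value wraps to a huge index and fails the bounds check.
    const auto i = static_cast<std::size_t>(c);
    return i < color_names.size() ? color_names[i] : std::string_view{"<unknown>"};
}

static_assert(to_enum_string(Color::green) == "green");
static_assert(to_enum_string(static_cast<Color>(7)) == "<unknown>");
```

The lookup is a bounds check plus an index, with no loop or comparison chain, but it only works when the enumerator values are contiguous and start at zero.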
I program mostly in C, if I need 'meta' programming I just write another C program that processes C source code (I've written a simple C parser), then in my build script I build in two stages, build meta program, run it, build rest of program.
Simple, effective, debuggable (the meta program is just normal C), infinite capabilities - can nest this to arbitrary depths, need meta-meta programming? Make a program that generates a meta program.
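A rough sketch of what the generator half of such a two-stage build could look like, written in C++ here rather than C to match the other snippets in the thread. It skips the parsing step entirely and assumes the enumerator names have already been extracted; generate_enum_to_string is a made-up name, not anyone's actual tool.

```cpp
#include <string>
#include <vector>

// Stage one of the build: produce C source text for a function
//   const char *<enum_name>_to_string(enum <enum_name> v)
// from a list of enumerator names. The build script would write the
// returned text to a file and compile it in stage two.
std::string generate_enum_to_string(const std::string& enum_name,
                                    const std::vector<std::string>& enumerators) {
    std::string out;
    out += "const char *" + enum_name + "_to_string(enum " + enum_name + " v) {\n";
    out += "    switch (v) {\n";
    for (const auto& e : enumerators) {
        out += "    case " + e + ": return \"" + e + "\";\n";
    }
    out += "    }\n";
    out += "    return \"<unknown>\";\n";
    out += "}\n";
    return out;
}
```

The output is ordinary source text, so it can be read, diffed and stepped through like any hand-written file, which is the debuggability argument made above.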
For example, what does https://miguelmartin.com/blog/nim2-review#implementing-a-sim... look like with C++26's std::meta::info?
My guess is: libclang is more suited for this situation if you care about compile times, even if Python is used.
I'm now trying to migrate from msbuild to cmake+sccache+PCH for std libraries while also trimming unnecessary includes to reduce suffering in the future - if not for me then at least for future developers. So I would say compile time is important for development. It causes other limitations too: bugfixes get squashed together into one huge commit to avoid recompiles, which messes up git history, and context switching gets slower when developing several features in parallel.
I'm sure you wouldn't say "it doesn't matter how long it takes to compile" if it took days. So where do you draw the line? Regardless, it matters.
EDIT: and based on these compilation time results, this would be a major setback for building the engine, which already takes an eternity.
Once you have that in place, you can easily detect duplicates, etc...
Of course, there are major limitations, as it's all a big hack: https://github.com/ZXShady/enchantum/blob/main/docs/limitati...
Similarly interesting is Boost.PFR, which gives you reflection superpowers since C++14: https://github.com/boostorg/pfr
That's the essence of C++: you're basically trading ergonomics for compile times.
C++ build times are a hard pill to swallow when migrating from C. This is just another reason we'll probably stick to writing C at the company where I work. It's like asking someone to give up instant compilation for cleaner, easier-to-read apps?
Also now that we have cleanup handlers in c (destructors) even less of a reason to move...
We've come full circle huh?
Why do you need this, logging? In that case I would rather reflect the logging statement to print any variable name, or hell, just write out the string.
If saving for db, maybe store as string, there's more incentive for an enum in the db, if that's a string you might as well. At any rate it doesn't seem a great idea to depend on a variable name, imagine changing a variable name and stuff breaks.
When I saw the 'no boilerplate' example, the very first thought that came to my mind:
This is the ugliest, most cryptic and confusing piece of code I've ever seen. Calling this 'no boilerplate' is an insult to the word 'boilerplate'.
Yeah, I can parse it for a minute or two and I mostly get it.
But if given the choice, I'd choose the C-macro implementation (which is 30+ years old) over this, every time. Or the good old switch case where I understand what's going on.
I understand that reflection is a powerful capability for C++, but the template-meta-cryptic-insanity is just too much to invite me back to this version of the language.
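For reference, the "good old switch case" mentioned here looks something like the sketch below. Color is a placeholder enum, and the function has to be kept in sync with the enum by hand, which is the maintenance cost the reflection and macro versions are trying to remove.

```cpp
#include <string_view>

enum class Color { red, green, blue };

// One case per enumerator, written and maintained by hand.
constexpr std::string_view to_enum_string(Color c) {
    switch (c) {
        case Color::red:   return "red";
        case Color::green: return "green";
        case Color::blue:  return "blue";
    }
    return "<unknown>";  // reached only for values outside the enumerators
}

static_assert(to_enum_string(Color::blue) == "blue");
```

Leaving out a default label is deliberate: GCC and Clang can then warn about a missing case (-Wswitch) when a new enumerator is added.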
I played around with cppfront over Christmas and it was a lot more ergonomic than my distant memories of C++11, which I don't even have negative memories of per se.
Why? The implementation is not pretty, but you only need to write it once and then it works for all enums. The actual usage is trivial, it's just a function call.
The C macro version is horrendous in comparison. Why would I want to declare my enums like that just because I might want to print them?
Seeing this argumentation is so tiresome, because it feels like there is a lack of self-awareness regarding what is "familiar" and what isn't, which is subconsciously translated to "ugly" and "bad".
And template for but I assume that's like inline for like in zig.
Not familiar with Zig but AFAICT `inline for` is about instructing the compiler to unroll the loop, whereas `template for` means it can be evaluated at compile time and each loop iteration can have a different type for the iteration variable. It's a bit crazy but necessary for reflection to work usefully in the way the language sets it up.
See wg21.link/P3491
But there is also good news: with JIT-like components for compile-time evaluation in progress, and the likes of CLion having the beginnings of a compile-time debugger, in combination with concepts, there is a chance some help is available and on the way.
However right now you have to rely on compiler errors and static_asserts which is not ideal of course.
In practice, I haven't really needed to ever debug `consteval` functions -- it's quite easy to get the right behavior down thanks to `static_assert`-based testing and thanks to the fact that they do not depend on external state (simpler).
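A small sketch of what `static_assert`-based testing looks like in practice. The count_char function is a made-up example, and it is marked `constexpr` rather than `consteval` so that it also compiles as C++17; the testing pattern is the same either way.

```cpp
#include <cstddef>
#include <string_view>

// Counts occurrences of ch in s; usable at compile time.
constexpr std::size_t count_char(std::string_view s, char ch) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ch) ++n;
    }
    return n;
}

// The "unit tests": a failing case is a compile error at this exact line.
static_assert(count_char("", 'a') == 0);
static_assert(count_char("banana", 'a') == 3);
static_assert(count_char("banana", 'z') == 0);
```

Because the function depends only on its arguments, every test is a one-liner with no setup, and a regression shows up on the next build rather than at run time.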
For one thing they are required to disallow all undefined behavior for compile time execution, and some forms of UB only occur when the code is run.
I never felt the need for them when doing TDD.
Casey has been talking about this some time ago: https://www.youtube.com/watch?v=UzD_Ze6zFKA
Also, John Carmack's perspective: https://www.youtube.com/shorts/PRE51epznT8
Without taking a stance on whether in-language meta programming facilities are good or bad, it’s not hard to find examples of cases where people find it useful to have them.
C++ metaprogramming is bad, but the problem there is the C++ part, not the metaprogramming-in-the-language part.
But you're probably not doing a ton of metaprogramming all the time like you should be, and would with a language that allows it.
The lack of metaprogramming is also why C is so slow compared to C++
Two-stage compilation is just a bonus on top: you add a sequential dependency in your build graph and if you have enough of these parsing programs you are going to wait till they are all built before your build can go wide.
1. https://matklad.github.io/2025/04/19/things-zig-comptime-won...
A for loop executed during comptime is just
const stuff = comptime stuff: {
for (0..8) |i| {
// etc, build up some stuff
}
break :stuff some_stuff;
};
The difference is that a comptime block won't leave behind runnable 'residue', only whatever data is constructed for later. An inline for might not leave behind an unrolled loop either, but it can.
Typically, I am given an ancient code base that is full of bad decisions, hard-to-read code and no tests in sight. Sometimes there are assertions, if I am lucky. It's impractical to create a reliable test suite, or rewrite everything from scratch.
Here, I heavily rely on a debugger just to make sense of the code. Sure, I'd wish that all of this code would just be sparkling clean, easy to read, free of UB, etc. But that's not the reality I work in, and a good debugger is my number one tool for getting the job done.
And don't even get me started on dealing with closed source implementations where all you could read is disassembly.
My understanding is that this is an optimization that has been available for a very long time now.
The only issue is if a file is referred through multiple names (because of hard links, symlinks, mounts). That might cause the file to be opened again, and can actually break pragma once.
Yes, originally they only supported runtime reflection.
Nowadays they have compile time tooling as well, via plugins, annotation processors, and code generators.
Which is exactly how you can have Spring-like frameworks that do all the AOP magic at compile time, for native code with GraalVM or OpenJ9, like Quarkus or Micronaut.
I find this to be very powerful, and also very unintuitive/undiscoverable at the same time.
Most frameworks in Java are very similar. The ones that aren't are effectively doing what "expressjs" does in terms of setup, which is still pretty discoverable.
Most java frameworks rely on annotations rather than naming schemes which makes everything a lot easier to grok.
The module story is just insane. How was it possible to get such a big feature into the standard without any working reference implementation? Isn't this the requirement for standard proposals to get accepted? If I compare this with how they treated JeanHeyd and his #embed proposal, the difference is staggering. To me it seems like a few powerful committee members wanted to get modules into C++20 at any cost. This was just irresponsible.
Maybe you forget Hacker News of 10 years ago, but in 2015-2016, everyone was complaining C++ doesn't have modules and how awful it must be because they're not modules. Now that C++ has modules, they're complaining about how it has modules.
People are not complaining about the fact that C++ has modules, but about their usability and effectiveness. The compile time benefits seem modest and I have seen reports that it breaks Intellisense. (Maybe that's not true anymore?)
As Vittorio said, if it takes compiler vendors so long to implement them properly, maybe the design wasn't that good after all?
My point was: if you add such a big feature, shouldn't the standard require a sufficiently complete implementation? Otherwise, how can they assess whether the proposal actually works in practice and lives up to its promises?
It is no different from any other language that compiles via C or C++ code generation, it got sold a bit differently due to his former position at WG21.
But I do think it is different than other "compile to C++" languages, because it seems to be more of a personal case study for Sutter to figure out various reflection and metaprogramming features, and then "backport" those worked out ideas to regular C++ via proposals. And the latter don't have to match the CPP2 syntax at all.
In multiple examples he's given in talks the resulting "regular" C++ code is easier to read, mainly because the metaprogramming deals with so much boilerplate.
TypeScript is a linter, nothing else: type annotations for JavaScript. The two features that aren't present in JavaScript, enums and namespaces, are considered design mistakes, and the team vowed to focus only on being a linter and a polyfill for older runtimes, when possible (some JS features require runtime support).
While Kotlin emits JVM bytecode, many language constructs, like coroutines, make it a one-way street: it is easy to call Java from Kotlin, but the other way around requires boilerplate code that manipulates the additional classes generated by the Kotlin compiler for its semantics.
It seems that this is being worked on, and eventually the `define_static_array` won't be needed anymore
A quick compiling C++ project is most likely extremely conservative in its use of C++ (vs C) features.
My entire VRSFML codebase compiles from scratch in ~4s and I liberally use C++ features, I just avoid the Standard Library most of the time.
Templates are not inherently slow, people just don't know how to use them and don't know how to control instantiation.
Most people still think that templates have to go in header files, which is also just plainly false.
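A minimal sketch of one way to keep a template out of the header: declare it in the header, define it in a single .cpp file, and explicitly instantiate it there for the types you need. The file names and the clamp_to function are hypothetical, and both "files" are shown in one listing for brevity.

```cpp
// ---- clamp.hpp (hypothetical header): declaration only ----
template <typename T>
T clamp_to(T value, T lo, T hi);

// ---- clamp.cpp (hypothetical source file): definition ----
template <typename T>
T clamp_to(T value, T lo, T hi) {
    return value < lo ? lo : (hi < value ? hi : value);
}

// Explicit instantiation definitions: the only places where the template
// body is compiled. Other translation units just link against these symbols.
template int clamp_to<int>(int, int, int);
template double clamp_to<double>(double, double, double);
```

The trade-off is that only the listed types are available: calling clamp_to with, say, long from another translation unit compiles but fails at link time until a matching instantiation is added to clamp.cpp.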
#define E_LIST(X) \
X(V0) X(V1) X(V2) X(V3)
DEFINE_ENUM(E, E_LIST)
That's not how I want to declare my enums...
The underlying machinery implementation is going to be much uglier and complex, though.
That looks much nicer indeed, but I still vastly prefer the other solutions, simply because I can just declare regular enums.
#define MY_ENUM(x) x(MY_A) x(MY_B) x(MY_C)
enum my_enum { MY_ENUM(ENUM_ENTRIES) };
static const char *my_enum_names[] = { MY_ENUM(ENUM_NAMES) };
but one could also make it even more compact if one cared.
Yes, xmacros have the best compile times, but you can't possibly argue that they are elegant to use compared to the alternatives.
Like, yeah, what you say about TS and Kotlin is true about TS and Kotlin. But since you're not explaining what cpp2 does or plans to do differently, and why it matters, I'm not sure where you're going with that. It's probably obvious but I'm not getting it.
The metaphor Sutter was going for, as I see it, is that TS and Kotlin both added missing features to their host language. Most importantly reflection and decorators in TS, which are now becoming a standard in JS as well[0]. cpp2 mainly focuses on experimenting with reflection and metaprogramming as well, adding features currently missing in C++ by being a compiles-to-C++ language. Sutter has written C++ proposals that would give C++ similar reflection and metaprogramming capabilities based on what he discovered by working on cpp2. That's pretty comparable if you ask me.
So it is as it is, plenty of software in C++ isn't going to be rewritten into something else.
Maybe someone can do a Claude rewrite from LLVM into something else. /s
In a lot of languages, you achieve the same with 1 line of code. It's not about familiarity, it's about the fact that it's a long and convoluted incantation to get the name of an enum.
Why do I have to be familiar with all those weird symbols just to do a trivial thing ?
Update:
Zig:
const Color = enum { red, green, blue };
const name = @tagName(Color.red); // "red"
Rust:
#[derive(Display)]
enum Color { Red, Green, Blue }
let name = Color::Red.to_string(); // "Red"
Clojure:
(name :red) => "red"
As far as I understand you would have to mess with individual parser tokens in Rust instead of high-level structures like "enum" (C++ reflection). It would be much, much uglier to implement anything like "to_enum_string" in Rust as you would have to re-implement parts of the compiler to get the "enum" concept out of a list of tokens.
enum Color { red, green, blue };
auto name = to_enum_string(Color::red); // "red"
You do see a lot of macro use to deal with this, but that is just primitive, non-typesafe metaprogramming, and it gets unwieldy enough that in practice, you see people add an extra pointer. This is why it gets slower.
99% of code in the wild is comically inefficient and is doing the wrong thing, using way too generic data structures and algorithms for very concrete problems. C++ templates may be one way to make comically slow code faster by spending a lot of compile time. But it's often much quicker to just write straightforward concrete code that the compiler can easily optimize.
IMO C++ makes for slow programs for the sole fact that it compiles so slow (if you use its modern features), so you have much less time to actually iterate and improve.
(The link above shows ImGui generation, but the same exact logic can be applied for serialization to JSON/YAML/whatever.)
> The magic sauce? Boost.PFR! An incredibly clever library that enables reflections on aggregates, even in C++17.
That's not vanilla C++!
That is, if you are worried about doing this by hand, reflection is not the answer; something like protobuf, where your data structures are generated, is the answer.
In practice both clang and VS have had some form of module support for quite a while, but the final standard ended up being different from either implementation (shaped by their experience, and certainly with inevitable last minute inventions).
I wonder if for some features the committee should vote for general guidelines, then delegate a third party (one or more implementors) to come up with both an implementation and standardese, with the understanding that it will be fast-tracked in without too much bike-shedding.
I have heard rumors that certain people in the Visual Studio team have exaggerated the state of their modules implementation to speedrun the standardization process. I have no idea if that is really true, but it would explain a lot of things...
I'm not the only one who is asking these questions:
> I don’t know if they exaggerated their claims at the time, or if they didn’t properly fund the Visual Studio team since or what, but you can’t tell me 8 years wasn’t enough to make syntax highlighting work with modules. And if it is, then maybe there was something deeply wrong in their proposal and the committee should have asked to see the receipts before voting yes.
Fair enough.
C++ templates _are_ slow to compile. They require running something like a dynamically typed VM in the compiler.
This is my `sf::base::Optional<T>` template class, a lightweight replacement for `std::optional` with same semantics: https://github.com/vittorioromeo/VRSFML/blob/master/include/...
This is what ClangBuildAnalyzer reports:
**** Template sets that took longest to instantiate:
833 ms: sf::base::Optional<$> (911 times, avg 0 ms)
Each individual instantiation of this class is sub 1ms.
Including the header itself takes 3ms.
I'm sure I can optimize it even further if I wanted to.
---
Now to refute your other incorrect claims:
> The point of templates is generic programming, reusable components.
That's ONE use case. A more general use case is just reducing code repetition in a type-safe manner, which is extremely useful even within the same translation unit. Another use case is metaprogramming. And I'm sure I can come up with more. Templates are a versatile tool.
> And if you have to "selectively pick TUs where they're instantiated", you're basically admitting that you have to invest effort to reduce compile times.
...well, yeah? Of course you have to put in effort to reduce compile times. That doesn't undermine my point at all.
C++ templates are not slow to compile.
As I wrote elsewhere, 1 second is a timespan where we could aim to compile 1 MLOC of code on a single core.
> A more general use case is just reducing code repetition in a type-safe manner
As I said -- code reuse. And interestingly your Optional.hpp is a header...
You won't have to care about ^^ and [:X:] if you just want to consume reflection-based utils, which was the whole point of my comment.
> Why do I have to be familiar with all those weird symbols just to do a trivial thing ?
And my answer demonstrates that you do not have to.
It should be a goal to keep rebuild times around 1 second (often not quite possible, but 3-5 seconds, even for full rebuilds, is often realistic). I edit, compile, run, edit, compile, run. Editing and running can often take as little as 1-3 seconds, and I sometimes do it dozens of times working in a row, working on a single improvement. That's why there is a 1 second rebuild time goal.
In practice I often work on codebases I don't fully control, but when the build times are excessively high, I will complain and try to improve. Build times longer than 10-15 seconds break the flow, they are a significant productivity hit. But they are quite common with C++ codebases (it can also be bad with C codebases by the way, but C++ is typically much worse because of templates and metaprogramming which is very slow).
> Compilation times don't even measure.
You must be joking. Do you even program?
1 second, seriously? Even the Linux kernel is based on C, and it doesn't even have compilation times approaching that.
I guess I also work on a lot of big data projects, where getting results will take... 48 hours or so, so anything shorter than that is basically some sort of unit test or dry run... so in that context, compilation times do not even register on the things slowing me down.
Yes, seriously, have you ever written a project from scratch? A simple .c file with a thousand lines in it should easily build and start within 100ms. A compiler should be able to do basic parsing and codegen at 1M lines per core.
If your runs take 48h, of course you need a strategy to avoid noticing bugs only after dozens of hours running. You can't tell me that it is efficient to make changes and to wait for minutes or even hours before noticing that your code wasn't even syntactically valid, or maybe it did compile but your code had a small oversight and you need to start over building.
The Linux kernel is a HUGE project, one of the biggest around. Yes, a full rebuild takes a long time, depending on configuration. Incremental rebuilds do not, though.
I'm actually working on a Linux kernel module (distributed filesystem client), it's on the order of 40 KLOC. I can do a full rebuild in 10/15 seconds (debug/release), and that includes calling into the kernel's infrastructure and doing a lot of stuff that shouldn't have to be done. An incremental rebuild after changing a single .c file is about 3 seconds. Restarting the module (swapping for the newly built one) takes less than 10 seconds also. And this can be already a stressful bottleneck depending on the task. Say you're improving logging in a particular section of code, this can easily require 5-10 attempts.
I'm working on Desktop GUIs (2D/3D) too. You need a quick turnaround time as much as possible. Many changes are trivial but you want to do many small incremental improvements, recompile, run and test (manually), often with a breakpoint on the code you're currently working on.
The projects I'm working on are written in C or conservative C++, and most have from thousands to hundreds of thousands lines of code. They can be built from scratch in a short amount of time (< 10s for the smaller ones). And all of them do incremental builds in <= 10 seconds except when maybe changing the most central headers which essentially means a full rebuild.
You can also design a C/C++ codebase to always do a full rebuild, compiling everything as a single unit. That can be faster than trying to do incremental builds, for codebases of considerable size. Try out the popular raddebugger project, a complete build after checkout is about 3 seconds. It's ~300 KLOC I think.
A guiding principle of C++ is that if something can be implemented cleanly and efficiently in a library, the language should not be extended to support the use case.
Now boost.pfr is exceedingly clever, but relying on speculative pack expansions or using stateful metaprogramming hacks is not something I would call clean and efficient, so proper reflection is warranted.
I do worry about the compile time impact though.
PFR has given us reflection since C++14.
I also don't think the Standard Library is particularly well-defined nor well-implemented, as demonstrated by the atrocious compilation times.
If you think that's uninteresting, that's an aesthetic preference, not a technical argument.
But let's set that aside, because it's also irrelevant to the compile-time claim.
The point of the example wasn't "look at this fascinating class," it was "here is a real template, used 911 times across the codebase, in a public header -- exactly the scenario you said would be slow -- and it costs under 1ms per instantiation."
You can swap `Optional` for any non-trivial template of similar complexity and the numbers will look similar.
On your 1 MLOC/sec benchmark: that's a fair reference point for C-like code, but it's not the right yardstick for template instantiation, which is doing semantic work (overload resolution, SFINAE, constraint checking) that a C compiler simply isn't.
Comparing them is comparing different jobs.
The honest question is whether template compilation is slow relative to what it's actually doing, and in well-structured code, it isn't.
And yes, `Optional.hpp` is a header -- that's the whole point of the demonstration. I'm not claiming you should hide every template in a .cpp file. I'm claiming that even templates in headers, instantiated hundreds of times, are cheap when written with compile times in mind.
The "put templates in .cpp where it makes sense" advice is for the specific cases, not a blanket rule.
Then again - "where does that `to_enum_string` come from exactly?".
How many libraries do you read the source code after installing them with the package manager?
#include "to_enum_string.h"
You don't have to understand it to use it. Even then, it's not that hard to understand, it just looks unfamiliar.
First of all, the only correct way to use package managers is with validated internal repos. Don't vibe install; that goes for node, and it goes for C++ as well.
Second this thread was all about how code lands in one's computer.