Learning that you can use unions in C for grouping things into namespaces (utcc.utoronto.ca)

167 points by deafcalculus 4 years ago | 147 comments

Anonymous nested structs are also quite useful for creating struct fields with explicit offsets:

    #include <stdio.h>
    #include <stddef.h>  // offsetof
    #include <stdint.h>
    
    #define YDUMMY(suffix, size) char dummy##suffix[size]
    #define XDUMMY(suffix, size) YDUMMY(suffix, size)
    #define PAD(size) XDUMMY(__COUNTER__, size)
    
    struct ExplicitLayoutStruct {
        union {
            struct __attribute__((packed)) { PAD(3); uint32_t foo; };
            struct __attribute__((packed)) { PAD(5); uint16_t bar; };
            struct __attribute__((packed)) { PAD(13); uint64_t baz; };
        };
    };
    
    int main(void) {
        // offset foo = 3
        // offset bar = 5
        // offset baz = 13
        // offsetof yields a size_t, so print it with %zu
        printf("offset foo = %zu\n", offsetof(struct ExplicitLayoutStruct, foo));
        printf("offset bar = %zu\n", offsetof(struct ExplicitLayoutStruct, bar));
        printf("offset baz = %zu\n", offsetof(struct ExplicitLayoutStruct, baz));
        return 0;
    }

WalterBright 4 years ago | |

Anytime macros are used for metaprogramming, it's time to reach for a more powerful language.

tpoacher 4 years ago | | |

I hear your quote, but it's only a quote.

A macro is effectively preprocessing facilitated by the language. You could always preprocess externally if you wanted, and there's nothing stopping you from doing that in the "Powerful Language (TM)" either.

Now whether people use macros and preprocessing usefully is another question, but not one to which the answer is "abolish macros for more language features". When used correctly, macros ARE power.

10000truths 4 years ago | | |

Macros are useful, as long as they're used sparingly. I think that in this case, it's used well - the struct is still perfectly readable, and the sole purpose of it is to make it so that you don't have to manually name the dummy fields. But you could totally just write out dummy1, dummy2, dummy3 etc. yourself if you want to get rid of the macros.

throwaway17_17 4 years ago | | |

I want to be clear about your meaning, because I don’t know if I’m reading your comment correctly. Are you referring explicitly to syntax based, preprocessor macros? Or does your comment extend to other metaprogramming techniques? I am inclined to think you mean the first considering the amount of emphasis on generic programming in D? Just curious.

Spivak 4 years ago | | |

Doesn’t work in a lot of cases, unfortunately. If you’re writing a library designed to be consumed by other languages, you’re stuck writing C ABI-compatible code. That code can be written in other languages that can “extern” their functions, but it limits what’s possible in those libraries.

cryptonector 4 years ago | | |

You might as well have written that "any time you're reaching for C, it's time to reach for a more powerful language".

But if -sadly- you must use C, metaprogramming using macros is not a terrible thing.

gumby 4 years ago | |

One of the very few things from C that I miss in C++ is anonymous structs and enums. I really don’t understand why they are not allowed.

That is, C style enums don’t have to have a name but “type safe” (enum class) ones do. One classic use is to name an otherwise boolean option in a function signature; there’s typically no need to otherwise name it.

C++ incompatibly requires a name for all struct and class declarations, again a waste when you will only have a single object of a given type.

WalterBright 4 years ago | | |

> I really don’t understand why they are not allowed.

I don't, either. Such were in D from 2000 or so.

I also don't understand why `class` in C++ sits in the tag name space. I wrote Bjarne in the 1980s asking him to remove it from the tag name space, as the tag name space is an abomination. He replied that there was too much water under that bridge.

D doesn't have the tag name space, and in 20 years not a single person has asked for it.

This did cause some trouble for me with ImportC to support things like:

    struct S { ... };
    int S;

but I found a way. Although such code is an abomination. I've only seen it in the wild in system .h files.

cjaybo 4 years ago | | |

> C++ incompatibly requires a name for all struct and class declarations

You're right about "enum class", but anonymous classes and structs are perfectly valid in C++:

https://godbolt.org/z/7MbcqhnoK

jcelerier 4 years ago | | |

... ? That's definitely not true, both anonymous structs and enums work fine in c++.

https://wandbox.org/permlink/ICaQJXCaVOt9mXdP

midjji 4 years ago | | |

Use an enum in a namespace, or an anonymous namespace.

oshiar53-0 4 years ago | |

IMO you can still be explicit about field offsets by writing the struct in a usual way, and using static assertions to ensure offsets match the intended layout.
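A minimal sketch of that approach, assuming C11. The `DeviceRegs` struct, its field names, and its offsets are invented for illustration; padding is written out as an ordinary member and the intended layout is checked at compile time.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical register block; names and offsets are assumptions. */
struct DeviceRegs {
    uint32_t control;      /* intended offset 0 */
    uint8_t  reserved[4];  /* explicit padding, written as a normal member */
    uint64_t address;      /* intended offset 8 */
    uint16_t status;       /* intended offset 16 */
};

/* Compilation fails if the compiler lays the struct out differently. */
static_assert(offsetof(struct DeviceRegs, control) == 0,  "control offset");
static_assert(offsetof(struct DeviceRegs, address) == 8,  "address offset");
static_assert(offsetof(struct DeviceRegs, status)  == 16, "status offset");

/* Small runtime helper so the offset can also be inspected in a debugger or test. */
size_t status_offset(void) {
    return offsetof(struct DeviceRegs, status);
}
```

Unlike the packed-union version, this cannot express overlapping fields, but it needs no compiler-specific attributes.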

nyanpasu64 4 years ago | |

Do foo and bar deliberately overlap?

10000truths 4 years ago | | |

Yes, I was looking to demonstrate the flexibility of the approach by including overlapping fields.

midjji 4 years ago | |

There are two kinds of undefined behaviour being invoked in using this. It's a horrible idea and a horrible code smell; get rid of it if you ever see something like this.

10000truths 4 years ago | | |

I don't see any undefined behavior here. As I mentioned below, gcc explicitly documents type punning via unions as being well defined. But yes, this is compiler specific and is not guaranteed to work elsewhere.

flohofwoe 4 years ago |

I'm using anonymous nested structs extensively for grouping related items, but I consider the extra field name a feature, not something that should be hidden:

https://github.com/floooh/sokol-samples/blob/bfb30ea00b5948f...

(also note the 'inplace initialization' which follows the state struct definition using C99's designated initialization)
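Roughly what that pattern looks like, as a sketch rather than the code behind the link. The `state` layout and its values are made up for illustration.

```c
#include <assert.h>

/* Hypothetical application state: each group is a nested struct with an
   unnamed type but a visible field name (window, clear_color). */
static struct {
    struct { int width, height; } window;
    struct { float r, g, b; } clear_color;
} state = {
    /* In-place initialization with C99 designated initializers. */
    .window = { .width = 640, .height = 480 },
    .clear_color = { .r = 0.25f, .g = 0.5f, .b = 0.75f },
};

/* Callers always spell out the group name: state.window.width. */
int window_area(void) {
    return state.window.width * state.window.height;
}
```

The group name stays visible at every use site, which is the point being made: `state.window.width` rather than a bare `state.width`.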

kevin_thibedeau 4 years ago |

The result is uglier and less maintainable than a pair of macros. Or just stop trying to hide syntax. This is ultimately on the same level as typedefing pointers.

remram 4 years ago |

The first example seems wrong, instead of `struct sub { ... };` what is meant is `struct { ... } sub;`
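To illustrate the difference (a sketch with made-up names, not the article's code): a tagged `struct sub { ... };` written inside another struct only names a type, and compilers typically warn that the declaration declares nothing, whereas `struct { ... } sub;` adds a member called `sub`.

```c
#include <assert.h>

/* Not what was meant (shown as a comment only):
 *     struct outer { int a; struct sub { int x, y; }; };
 * That names a type `struct sub` but adds no member called `sub`. */

struct outer {
    int a;
    struct { int x, y; } sub;  /* a member named `sub` with an unnamed struct type */
};

/* Members are reached through the field name: o->sub.x, o->sub.y. */
int outer_sum(const struct outer *o) {
    return o->a + o->sub.x + o->sub.y;
}
```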

siebenmann 4 years ago | |

You're right; thanks for noticing and I've updated the first example. My C is a bit rusty these days and I didn't check it with a compiler the way I should have.

(I'm the author of the linked-to article.)

sesuximo 4 years ago |

Doesn’t matter for C, but in C++ this could make your constexpr functions UB since you can only use one member of a union in constexpr contexts (the “active” member).

ferdek 4 years ago | |

In other words: please always be wary of differences in C and C++, for instance type punning [0].

[0] https://stackoverflow.com/a/25672839

pjmlp 4 years ago | |

Triggering UB is a compiler error in constexpr code.

https://shafik.github.io/c++/undefined%20behavior/2019/05/11...

sesuximo 4 years ago | | |

True, you’ll hopefully get a compiler error.

pjmlp 4 years ago | |

In C++ we have had namespaces for 30 years now, no need for such tricks.

tialaramex 4 years ago | | |

Hmm. How do C++ namespaces help with the structure naming problem in this example? They seem completely orthogonal.

C++ namespaces are a way to avoid library A's symbol "cow" clashing with library B's symbol "cow" without everything being named library_a_cow and library_b_cow all over the place which is annoying. I agree C would be nicer with such a namespace feature.

However this technique is about what happens when you realise your structure members x and y should be inside a sub-structure position, and you want both:

    d = calculate_distance(s.x, s.y); // Old code

and

    d = calculate_distance(s.position.x, s.position.y); // New

... to work while you transition to this naming.
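A minimal sketch of how both spellings can coexist, assuming C11 anonymous structs. The `sprite` struct is hypothetical, and the body of `calculate_distance` is a stand-in, since the comment above does not define it.

```c
#include <assert.h>

/* Hypothetical struct whose x and y are moving into a `position` sub-structure. */
struct sprite {
    int id;
    union {
        struct { double x, y; };           /* old names: s.x, s.y */
        struct { double x, y; } position;  /* new names: s.position.x, s.position.y */
    };
};

/* Stand-in body (an assumption): Manhattan distance from the origin. */
double calculate_distance(double x, double y) {
    double ax = x < 0 ? -x : x;
    double ay = y < 0 ? -y : y;
    return ax + ay;
}
```

Both structs inside the union have identical member types in the same order, so `s.x` and `s.position.x` name the same storage. Once all callers use the new names, the anonymous struct and the union can be dropped.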

comex 4 years ago | | |

C++ namespaces are unrelated to this. They don’t accomplish the same thing.

midjji 4 years ago | |

Constexpr unions are the sane/safe way to use them. It's great, because if you access a member which isn't the last one written, constexpr will explicitly prevent it at compile time. Whereas all other examples here are explicitly undefined behaviour!

bruce343434 4 years ago |

Imo this is not “perverse”. In my vector library I alias a vec3 as float x,y,z and float[3] using this technique.
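Something along these lines, presumably (a sketch in C11; the type and function names are assumptions, not the commenter's library). The static assertions check that the named fields line up with the array elements. This is C only; whether the same code is well defined in C++ is disputed elsewhere in this thread.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

/* Hypothetical vec3: the same three floats, reachable by name or by index. */
typedef union vec3 {
    struct { float x, y, z; };  /* named access: p.x, p.y, p.z */
    float v[3];                 /* indexed access: p.v[i] */
} vec3;

/* Fail the build if the compiler inserts padding between the named fields. */
static_assert(sizeof(vec3) == 3 * sizeof(float), "unexpected padding in vec3");
static_assert(offsetof(vec3, y) == 1 * sizeof(float), "y does not alias v[1]");
static_assert(offsetof(vec3, z) == 2 * sizeof(float), "z does not alias v[2]");

/* Loops can use the array view without caring about field names. */
float vec3_sum(const vec3 *a) {
    float s = 0.0f;
    for (int i = 0; i < 3; i++) {
        s += a->v[i];
    }
    return s;
}
```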

midjji 4 years ago | |

This is also known as the most common invocation of undefined behaviour in game programming. If you do this, write to y, then read from [1], you are invoking undefined behaviour. Compilers doing different things here between Windows, Linux, and Mac, and between compiler versions, is a common cause of "why isn't my game working right on XXX, it works fine on YYY" questions.

genocidicbunny 4 years ago | | |

Type punning is not undefined, it's implementation defined in C. In practice, every major C compiler will be fine with type punning, though it may disable some optimizations.

The story is different in C++, but in practice many compilers support it the same as in C. Especially for games, where VC++ (PC, Xbox) and Clang (PS4/PS5) are the most commonly used compilers, it also works as expected. The trick is to only use type punning for trivial structs that don't invoke complications like con/de-structors or operators. The GP's example of a Vec3 struct that puns float x,y,z with float[3] is a very common one in games.

saagarjha 4 years ago | | |

I don’t see the undefined behavior here?

PaulHoule 4 years ago |

The C programming language, brought to you by Cthulhu.

You don't need eval(), you've got strcpy()!

rightbyte 4 years ago |

I don't regard this as a "perverse" hack. If I ever do embedded memory mapped stuff in C11 this is way too tempting.

midjji 4 years ago | |

You are practically guaranteed to invoke undefined behaviour if you do. Just use a map on a std::array of e.g. std::byte

dexterhaslem 4 years ago | | |

they said C11 tho

Subsentient 4 years ago |

Bleurgh. I have a deep soft spot for C, and I'm known to get twisted pleasure from using obscure language features in new ways to annoy people, but this is a level of abuse that even I can't get behind. If you need namespacing, use C++. As much as I love C, it's terrible for large projects.

vbezhenar 4 years ago | |

The Linux kernel is a large project and clearly C is sufficient for it, given that migrating to C++ would probably be very easy (not using all C++ features, just selected ones), yet it did not happen.

I think that C++ is better than C, but C is not that bad, even for large projects.

dkersten 4 years ago | | |

> The Linux kernel is a large project and clearly C is sufficient for it

Sure, and operating systems have been written in assembly too. The question is whether it would be better than just sufficient if Linux were written in C++, today (i.e. C++17 or 20, not something old). Switching now probably wouldn't be feasible (even ignoring technical reasons, the kernel developer community is familiar with the C codebase and code standards and bought into it), but if Linux were started today, would it be a better choice?

Maybe the answer is still no and C would still be chosen, but the choice today is very different than it was when Linux was started. Of course, maybe Rust or something would be chosen today instead.

adtac 4 years ago | | |

the kernel has to live with the choices it made in the 90s, you don't

dathinab 4 years ago | | |

And the Kernel devs would probably get really annoyed if you try to push this kind of name-spacing.

> C++ would probably be very easy

Not necessarily. Besides some small(?) problems due to C++ allowing "more magic optimizations" than C, they would have to switch to a subset of C++, and you would need to communicate to all contributors that a lot of C++ things are not allowed. It might be easier to simply not use C++. I mean, if it were that easy, the kernel likely would have switched.

AlotOfReading 4 years ago | | |

A big issue with introducing C++ into a codebase is that it's incredibly hard to stick to a particular subset or standard. There's always a well-justified argument for the next standard or "just this one additional feature". Eventually you end up with the whole kitchen sink, regardless of where you started.

I've had far more success hard-firewalling C++ into its own box where programmers can use whatever they can get running than trying to limit people to subsets.

kktkti9 4 years ago | |

People will make a mess of a large project regardless of the language.

midjji 4 years ago |

This is probably a terrible idea. Remember that if you have written one member of a union, all other members remain public, yet accessing any of them in any way is undefined behaviour. This is made way worse by most compilers mostly choosing to let you do what you think it will do. They just don't guarantee they always will, or in all cases.

drfuchs 4 years ago | |

I believe you are mistaken. The C11 standard, section 6.5.2.3 "Structure and union members" pgf 6, says "One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members." And that seems to be what's being used here.
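For reference, the guarantee quoted above covers code like the following sketch, where every struct in the union starts with the same member. The shape types and `shape_area` are hypothetical names invented for illustration.

```c
#include <assert.h>

enum shape_kind { SHAPE_CIRCLE, SHAPE_RECT };

/* Both structs begin with `enum shape_kind kind`: a common initial sequence. */
struct circle { enum shape_kind kind; double radius; };
struct rect   { enum shape_kind kind; double width, height; };

union shape {
    struct circle circle;
    struct rect   rect;
};

double shape_area(const union shape *s) {
    /* `kind` may be inspected through either member, whichever was stored,
       because it lies in the common initial sequence. */
    switch (s->circle.kind) {
    case SHAPE_CIRCLE:
        return 3.14159 * s->circle.radius * s->circle.radius;
    case SHAPE_RECT:
        return s->rect.width * s->rect.height;
    }
    return 0.0;  /* unknown kind */
}
```

Here a `rect` is stored and its `kind` is read through the `circle` member, which is the inspection the quoted paragraph permits.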

midjji 4 years ago | | |

No: from https://en.cppreference.com/w/cpp/language/union.

The union is only as big as necessary to hold its largest data member. The other data members are allocated in the same bytes as part of that largest member. The details of that allocation are implementation-defined but all non-static data members will have the same address (since C++14). It's undefined behavior to read from the member of the union that wasn't most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.

What 6.5.2.3 simplifies is the use of unions of this type:

    struct A { int type; DataA a; };
    struct B { int type; DataB b; };
    union U { A a; B b; };

    U u;
    switch (u.a.type) ...

It's not what is being used here.

std::variant is designed to deprecate all legitimate uses of union

adamnemecek 4 years ago |

Don't actually do this.

sp332 4 years ago | |

The Linux kernel is using this for bounds checking. https://news.ycombinator.com/item?id=28015263
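A rough sketch of how grouping members this way can give a bounded region for `memset`/`memcpy`. `MEMBER_GROUP` is a hypothetical macro written for this example and is not the kernel's definition; the `packet` struct is likewise invented.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro (an assumption, not kernel code): declares the members
   twice inside a union, once anonymously so existing names keep working,
   and once as a named sub-struct whose size can be taken. */
#define MEMBER_GROUP(name, ...)        \
    union {                            \
        struct { __VA_ARGS__ };        \
        struct { __VA_ARGS__ } name;   \
    }

struct packet {
    int id;
    MEMBER_GROUP(addrs,
        unsigned char src[6];
        unsigned char dst[6];
    );
    int length;
};

/* Clears both addresses in one call. sizeof(p->addrs) covers exactly the
   grouped members, so the write stays inside a single named object. */
void packet_clear_addrs(struct packet *p) {
    memset(&p->addrs, 0, sizeof(p->addrs));
}
```

Existing code can keep using `p->src` and `p->dst`, while the bulk operation targets `p->addrs`, an object whose size the compiler knows.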

ufo 4 years ago | | |

Like the parent poster, when I read the article I assumed that there was no conceivable reason to ever use this feature in a real C program. Let me just say that I'm pleasantly surprised to be proven wrong!