Carbon’s most exciting feature is its calling convention (foonathan.net)
No thanks google. I've been saying that for years now. You jumped the shark
This was not an overview of the language, so this comment seems a little uncharitable to the author.
I don't care if this sounds uncharitable. I see the pitfalls, and I've seen so many people get sucked into prior pitfalls.
It is interesting to contemplate the most-ambiguous and least-comprehensible/googleable name one might be able to give to a piece of software. "the"? "Biden"? "Russia"? "water"? "air"? "dog"? "person"? "!"? "?"? " "?
Google managers: "Whatever floats your boat. How about Go, Dart and Carbon?"
Lua and PHP have nice searchability. At least, it's fairly easy to weed out results on the Portuguese moon and the Philippine peso.
I also had to write some Go, that was actually pretty bad in terms of searchability.
https://www.irishrail.ie/en-ie/about-us/iarnrod-eireann-serv...
For example, try putting `puts("hello");` at the beginning of `add`. Now neither GCC nor Clang performs the optimization. Why? Because `puts` could theoretically modify the value behind the reference, so the value loaded is not necessarily the same as the value at the beginning of the function, which makes things more complicated, so both compilers give up.
As another example, GCC and Clang both perform the optimization within a translation unit, but if the function definition and the call are in different translation units, GCC doesn't perform the optimization even with LTO, and Clang doesn't perform it with ThinLTO (but does perform it with full LTO). Meanwhile, many projects don't compile with any form of LTO, which is a reasonable decision to improve compilation speed and predictability.
Neither compiler is smart enough to perform the optimization for virtual calls in almost any situation.
It seems odd to break with C++ semantics for small parameters only; I'd expect it to either break for all parameters or no parameters.
EDIT: looks like Carbon is not doing this optimization for small values either. Its docs on parameters say, "This binding will be implemented using a pointer, unless it is legal to copy and copying is cheaper". And the OP link says, "the compiler is allowed to convert that to a T [copy] under the as-if rule." That seems like no extra optimization is enabled other than changing the function calling convention. And the "as-if" rule would require looking at the function body to check if the value could be changed (by things like puts()). I hope my understanding is wrong because this does not make sense. The word "legal" is ambiguous, so hopefully they mean something else!
You write your code assuming such an optimisation will happen, and for some reason, the compiler decides not to apply the optimisation. Perhaps the wind was blowing in the wrong direction, or it was in a bad mood, or you forgot to specify -fvisibility=hidden.
The exciting part here is that it happens by default, and the compiler is required to do it. I don't have to think about it, and the ABI automatically does the thing best for performance.
It's nothing really out of the ordinary ... also that particular optimization is meager in practice (on the order of 1%?). I know every improvement adds up, but the author talks about it as if it was a game changer. I like the enthusiasm, though :D.
Given the major selling point of C/C++ is performance, that's not an insignificant amount.
Honestly the only thing that really stood out as nice is the default pass by word-sized value, in the context of templates - it’s a thing that is achievable in C++, but requires a bunch of obnoxious additional templates that aren’t even part of the standard library so everyone ends up reimplementing the same cruft. Happily I believe there’s a proposal to add this exact functionality.
I also loathe their desire to listen to the BNF maximalists' insistence on not having any “ambiguity” from <>s. I’m sorry, it’s clearly parseable, and <>s have been the standard token for decades. Switching to []s doesn’t make it less conceptually ambiguous; if anything it makes it more ambiguous to a human reader. The only people who don’t want <>s are PLT academics obsessed with forcing their dragon book idea of what a grammar should be. You can’t argue you’re doing it because the ambiguity in a grammar or lexer is bad, because then you would also drop infix operators.
Then in carbon the more reasonable adoption of pascal’s : notation for typing a variable or parameter removes the most common case of the supposedly terrible ambiguity anyway.
Odd that you put ambiguity in quotes. I guess it's not real ambiguity then.
Ivory tower academics, as you see them, actually care about stuff because it affects you. I've heard too many times the knee-jerk response of "it's just theory" as if that actually meant anything. Try implementing it yourself and you'd start to understand that little bits of crap typically don't add, they multiply (Edit: and I'm hurting from exactly this right now), so don't diss theory until you've (ironically) had some practice with it.
Not if it's in a different compilation unit, it doesn't. Or if it's just not inlining the function for some reason, then it probably also doesn't. Which is why Carbon's restriction here is so useful and practical.
I don't get that: can you express in C++ a code that "take the address of a parameter via any mechanism other than explicitly taking the address"?
Even C++ struggles to be compatible with C++. I expect Carbon to have at least as hard a time with it, especially given the "bazaar" that is the C++ build and packaging ecosystem.
I also expect that Carbon will have a hard time tracking C++ as it evolves. It could freeze C++ feature development? Or it could target an old and aging C++ standard?
Point is, I'm not taking C++ compatibility as a given. It seems very hard and may require a lot of big tradeoffs.
It's a very realistic take on C++ interop, but still well beyond anything offered by other languages.
The real killer is that the lack of any interaction with C++ means that you can’t do any real incremental adoption of one of those safe languages in any big security critical projects. Saying that the solution to C++ is to just not use it ignores the reality that C++ exists, and large projects in C++ exist. It doesn’t matter if you don’t like C++.
The final problem with the safe languages - with the exception of swift - is that they are all hell bent on not providing even just basic ABI stability. It doesn’t matter how “safe” your language is if the first thing you have to do is make a pure C (even losing potential for automatic lifetime management the C++ would allow) interface.
So I can have two libraries both written in rust, and the entire safety of it is contingent on each library talking to the other through a C API.
The reverse is true. Anything passed by value can be treated as const reference by the compiler if the compiler knows enough about access and lifetime. The compiler must be able to determine that the parameter is neither deallocated nor modified while the function is using it. Rust compilers should be able to do that.
ftp://ftp.linux-mips.org//pub/linux/mips/doc/NUBI/MD00438-2C-NUBIDESC-SPC-00.20.pdf
Section 3.4, page 21:"Arguments: [...]
Derived types (structures etc) and non-standard scalar types are passed in a register if and only if their memory- stored image is register-size aligned and fits into a register. [...]
All other arguments are passed by reference. The callee must copy the argument if it writes it or takes its address."
As an example, look at this issue from the mold linker: https://github.com/rui314/mold/issues/584
They are looking at creating an embeddable custom dynamic linker in order to greatly reduce loading times, by fixing locations as the file gets loaded in.
The equivalent in Carbon would (for example) be to create a new type of exception handling that changes how we think about exceptions, performance and their cost. Perhaps something like the static exceptions that Herb Sutter has been working on. Instead, they are not adding exceptions at all.
You will have to excuse my disappointment.
That is all.
If you've got 1.6 billion empty tuples in variable A, 1.4 billion in variable B and 1.8 billion in variable C, C++ can't see a way to do that on a 32-bit operating system. It needs to give each empty tuple an address, so it must think of 4.8 billion distinct integers between 0 and 2^32 and it can't do that, so your program won't work.
Carbon is still far from finished, but if objects needn't have addresses it can do the same as Rust here, and cheerfully handle A, B and C as merely counters, counting 1.6 billion, 1.4 billion and 1.8 billion respectively is fine. Empty tuples are indistinguishable, so I needn't worry about giving you back the "wrong" empty tuple when you remove one, I can just give you a fresh one each time and decrement the counter.
I think this boils down to: Carbon defaults to passing parameters that fit in a single register by value, and all others by const reference. This affects a few things you might take for granted in C++, like whether you can take a reference to a parameter.
The opening example is showing two samples of Carbon and the equivalent C++ code, noting that the "undecorated" parameter `p : Point` is equivalent to `const Point& p` (pass by const reference to a struct) and `x : i32` is equivalent to `std::int32_t x` (pass by value).
I give it a -1 on innovation, it's so obviously stuck in a cul-de-sac in the design space that I see no hope for it. "We'll get around to memory safety eventually". Riiight.
That said, I'm glad they at least made the leap to, "Hey, we could actually use something other than C++?"
The goal of Carbon is explicitly _not_ to be a new general purpose programming language. In the docs, the authors even advise people to use Rust instead for greenfield projects. The goal of Carbon is to support migrating existing C++ codebases to some new safer language.
There already were several options instead of using C++, but some minds are wired in that way after many years of training.
And I stand by my original assessment that the design is very obviously lacking in long-term experience in another language.
It usually boils down to someone assuming that a particular collection or algorithm won't be on the critical path, and using a lazy O(N^2) solution. Then the codebase grows, use cases shift, someone puts another O(N^2) algorithm around an existing O(N^2) and the whole thing explodes.
Adding proper caching and switching to N*log(N) algorithms usually brings a DRAMATIC improvement (like 100x faster), while trying to squeeze out every unnecessary copy will only squeeze out about 10% of complexity.
Unclear from the repo who is sponsoring it.
- don’t listen to me, it’s Google.
I hope this doesn't take off, with my sincere apologies to the ones who have been working hard on this. The last thing the C/C++ ecosystem needs is becoming de facto owned by a private company.
Are they trolling Go here?
I find the tone strange. I understand that the author doesn't believe the statement. But I've never met anyone for whom performance matters and believes this. So, who's the audience here?
A C++ replacement that is just as difficult to bind/link/work with in other languages as C++ itself just seems like a missed mark, despite the fact I know this is not on their priority list at all.
I hope my read of this is wrong, and it will be easy to bind and link to from other languages after all.
Carbon might be shy about its footprint.
GC and ARC are for people that need shared ownership, which is pretty much always a bad idea.
type Point is record
x, y, z : Interfaces.Integer_64;
end record;
procedure Print(p : Point);
Is `p` passed by reference or value? The compiler chooses what it thinks is best: all parameters are considered `const` unless they're `out` parameters. There are some rules for classes (tagged types), uncopyable (limited) objects, and `aliased` parameters, which are always passed by reference.
I can't get a pointer type (access) out of the parameter to the function, since the accessibility rules prevent it:
-- "constant" since we don't know if it is writable
type Point_Access is access constant Point;
Last_Printed : Point_Access := null;
procedure Print (P : Point) is
begin
   -- P'Access is sort of like C++ std::addressof(P) to get a "pointer"
   -- There's also P'Address to get the actual address, but then requires conversion to a pointer-like "access" type to be used.
   --
   -- Compiler Error: "non-local pointer cannot point to local object" since Point_Access type is declared at a higher level
   Last_Printed := P'Access;
   -- If we really, really want to do it, "I'm smarter than the compiler", we can force it...
   -- Think of "Unrestricted" and "Unchecked" as grep-able warning signs of "this is potentially very dangerous"
   Last_Printed := P'Unrestricted_Access;
   -- ...
end Print;
What about making and then trying to use a local pointer-like type? This doesn't work because you can only create pointer-like accesses to types which have been marked as `aliased`, since you don't know if there's a location you can point to which has the value.
procedure Print (P : Point) is
type Local_Access is access constant Point;
-- Compiler Error: prefix of "Access" attribute must be aliased
Ptr_Like : Local_Access := P'Access;
-- Similar, "I am smarter than compiler" trick works here too...
Ptr_Like : Local_Access := P'Unrestricted_Access;
You can allow passing any arbitrary pointer into a function by providing `access`, but you're not allowed to store it, since you don't know which flavor of the pointer type it could be, e.g. if it points to something on the stack, or on the heap:
type Point_Access is access constant Point;
Last_Printed : Point_Access := null;
-- Allow printing any pointer-like (access) to a point.
procedure Print (P : access constant Point) is
begin
-- Compile Error: implicit conversion of anonymous access parameter not allowed
Last_Printed := P;
-- If we really, really want to do this, we can force it with a cast...
Last_Printed := Point_Access (P);
-- ...
end Print;
I'd only worry about compiling it yourself if you intend to participate in Carbon's development in the near future.
The standard explicitly states that a parameter taken by value and initialized from a prvalue of the same type may elide the move constructor (which all implementations do).
You mean where Google has majority control over what does and does not go into the repo?
There are a lot of 32-bit microcontrollers, most notably STM32.
Admittedly, those probably aren't running on 32-bit systems, nowadays ...
If you have those, I don’t see any way to discriminate between them. If so, why would you ever want to have more than one of a given type? Is there some template code that might accidentally try to create them?
struct Marker {};
struct Foo {
Marker marker;
int64_t number;
};
If you write the equivalent in C with GCC extensions, or Rust, sizeof(Foo) would be 8, the same as sizeof(int64_t); `marker` doesn't take up any extra space. In C++, however, sizeof(Foo) is 16, because `marker` must take up at least 1 byte to have a unique address, which gets expanded to 8 bytes due to alignment.
Now, as of C++20, you can reduce sizeof(Foo) to 8 by tagging `marker` as [[no_unique_address]]. However, this has drawbacks. First of all, it's easy to get situations like this in highly generic code, so it's hard to predict where [[no_unique_address]] needs to be applied (and applying it everywhere would be verbose).
Second of all, [[no_unique_address]] is dangerous, because it doesn't just allow empty fields to be omitted, it also allows nonempty fields to have trailing padding bytes reused for other fields. Normally that's okay, but if you have any code that performs memcpy or memset or similar based on the size of a type, such as:
struct Foo {
Foo(const Foo &other) {
memcpy(this, &other, sizeof(Foo));
}
// …some fields here…
};
…then if that code writes to a [[no_unique_address]] field, it can overwrite adjacent fields, since sizeof(Foo) includes any trailing padding bytes!
With only three exceptions, I can't think of a case where by default one would not want that. In the case you describe you would want all those objects to have unique addresses. If you wanted them to overlap, you should go to the effort of making that happen the way you want it to. How could the compiler guess on its own?
The exceptions BTW are union elements, the first class/struct element, and base object addresses (in `class foo : bar ...`, when you make a foo, the address of its bar is the same as the address of the foo itself).
I certainly don't. If you want unique addresses for indistinguishable objects I guess C++ is the perfect language for you, but what are you expecting to do with these addresses?
https://en.cppreference.com/w/cpp/language/attributes/no_uni...
int main(void) {
int x = 0;
int arr[1];
int *p = arr + 1;
*p = 42;
return x;
}
On a lot of systems (e.g., https://godbolt.org/z/jYqM8TT3Y), it just so happens that `x` is right above `arr` on the stack, so that code will return 42. But that code is absolutely UB.
The more general name for this concept is "pointer provenance". Basically, you can't pull pointer values out of thin air; you have to derive them from operations rooted at taking the address of something within the same allocation.
However in practice you can do this, and walk the stack to find the parameters or what have you, and then you’ve got a pointer to a parameter without the compiler being aware of it. But this is all explicitly UB, so it’s ok for the compiler to be unaware of it, and it’s free to do whatever codegen it wants given the assumption that UB can never happen.
This is undefined behavior. Because it is UB, the compiler is allowed to assume it cannot happen. Therefore I have the address of a parameter, and can pass that to a closure or whatever, and the parameter has escaped, but the compiler doesn't know.
More importantly by definition the compiler does not need to know, because a program that does that is no longer well defined.
https://sciencetrends.com/the-charge-of-a-carbon-ion/
https://aip.scitation.org/doi/10.1063/1.4966987
"Up to C⁴⁺ ions are observed."
And I guess you could also write the C²⁺ ion as "C⁺⁺" if you wanted. :-)
But you are going to need some minimal buy-in from Microsoft first, if not Sony and Nintendo too.
The great thing about carbon is that you can incrementally shift a codebase over from c++, and there are no problems interacting with existing c++ APIs, so the required amount of buy-in is pretty much just "yes, you can use a 3rd party compiler" and maybe some improvements to the debugger.
You actually have to wrap all the c++ functions in c functions, and then call those c functions from rust. Which requires either making manual wrappers, or automated wrapping tools that handle the specific idiosyncrasies of the library you are calling.
Which is a massive barrier; Gamedev standardised on c++ and there are so many c++ libraries.
In comparison, Carbon is designed from the ground up to automatically have bi-directional inter-op with c++. It's what the language is designed to do.
Things must be initialized in Rust. Things must have a single identifiable owner. Things really want to be immutable.
Sure, you can duck these in Rust with lots of "unsafe". But, if you do that, are you really gaining anything from using Rust?
Triangle meshes can be stored quite efficiently and a lot of the modern geometry is tessellated, procedural or replicated.
There are far better optimizations for gamedev like using SOA instead of AOS.
void f(int i) { &i; /* disallowed */ }
I think that's a trivial case and is hopefully obviously easy to detect, and I know we're considering relevance to escaping, which this doesn't do, but it seems that taking the address is explicitly disallowed regardless of use. Anyway, given the many references to references, let's do
void g(int &i) { &i; /* disallowed once more */ }
To me this seems no harder to detect, but this does seem like a case where banning taking the address /could/ impact things. But I can't see what the win is.
I'm assuming it's a different TU or something else that prevents interprocedural optimization. Could you give some examples where this allows performance wins?
If in boo.cpp you call your g() function which is in foo.cpp, how does the compiler know if it's taking the address or not? It's in a different compilation unit. All the compiler knows at that point is there's a function 'g' that takes an int by reference. It has no idea what 'g' does with it, it doesn't even know where 'g' is - that won't be known until link time. So the compiler is forced to be conservative and allow for 'g' to do anything that C++ allows - which includes stashing the reference somewhere or const_cast'ing away a const& into a mutable reference.
This is where Carbon's win comes in. Since that behavior just isn't allowed, the compiler doesn't have to be conservative. It doesn't need to somehow see into different compilation units to perform helpful optimizations (critically for const unique_ptr& or const shared_ptr&)
There is of course no 'registry of allowed names that have aged-out' but there are probably other creative names one could come up with.
Because a parser is never really ambiguous is it? Nobody writes a parser that has a random decision on which rule to take.
It's still OK to talk about the grammar as being ambiguous even when a purported implementation of that grammar is deterministic (because it ultimately does not implement the actual grammar).
I think parser generators were a mistake.
It's also well understood that for (2) this theoretical ambiguity is not actually a real problem for humans - the things for which reading is actually important - because otherwise they would not allow infix operators, which are ambiguous according to their definition, but much like languages that they have chosen to define as ambiguous, infix is not.
In that theoretical model of grammars we define the power of different classes of parsers to go something like LL < LALR < LR < GLR. This is how the mathematical theory of grammars goes, and everything beyond LL is basically required to be based on some PDA. The ambiguities in the PDA's state transitions are what "prove" a grammar to be ambiguous. Further LL parsers are considered the least powerful, and incapable of parsing the majority of grammars, especially those that belong to "real" languages.
The problem is that this model of grammars is arbitrarily restricted, and many (I would guess most real world) parsers for "real" languages are not any of these classes, which is certainly odd given the completeness indicated by theory. That's because this restricted definition of what a parser is says that most real world programming languages require an LALR parser, and in reality the big ones likely need LR or GLR.
In the real world, what is used could probably be classed as LL(infinity), which is impossible according to the theoretical parser classes in use. The reason that we use these impossible LL(infinity) parsers is that they are human writable, human understandable, and also vastly faster than the state-machine LALR parsers (think bison and yacc IIRC), even more so than LR, and GLR is slower again (even if you can get away with a tomita parser, if you could find a generator for one). Additionally all the non-LL parsers can only be reasonably implemented via state machine, and beyond trivial examples, a human is not going to be able to write the state transition tables, let alone modify them in future.
That means using a tool to generate the parser for you, and this really is the heart of the theoretical definition being used, to be a "real" parser, you need to be able to make a tool that will read a BNF grammar, and have it produce a program that will be able to take some text, and then, accumulating no state beyond a stack and its current state, produce an AST.
I have never been shown any justification for why this makes a grammar more understandable; it is simply stated as fact that being easier for a machine to parse means being easier for a human. The counter to that is that most people seem to find infix much easier to understand than the non-ambiguous pre- and post-fix notations. You could argue that's familiarity, but I would say "so what?" That people can find the "ambiguous" grammar easier to understand than the non-ambiguous one indicates that the claim of "easier for machine == easier for people" does not stand up.
Anyway that giant wall of text is my position, and that is why I don't accept "<> for generics is ambiguous" as the sole justification for a change from the root language this is meant to be a replacement for. If they were creating their own language then sure, they could go ham and make the least ambiguous language that they liked (but lisp I think has technically already done that), but that is not what Carbon is doing.
Finally (and so the Great Wall of text continues) on your "ivory tower academic" accusation, I disagree. Academic CS is hugely valuable, and having been in CS academia for many years I feel I am allowed to express my disagreement with some of it.
If credential waving is needed, for academics: my masters is in programming language theory, and I was the TA for among other things my university's courses on programming language theory, parsing, etc. In industry I've worked on clang for many years, and for parsing specifically I wrote the javascript parser in javascriptcore.
I understand parsing, and I understand the PLT subfield of CS.
My frustration with academic PLT is more that by confining the definitions of what is a valid grammar, or what is a real parser, we don't see any real research into those aspects of PLT, and in my experience it seems like an area where we just treat the dragon book as being the definition of everything that matters. Then we turn around and say anything that doesn't fit neatly in that definition must be bad, and can be ignored.
This is not true, infix operators can be easily encoded in BNF as a series of rules, each rule recognizing expressions unique to it as well as all the expressions of higher precedence. The following for example is a grammar that parses the four arithmetic operations with the usual precedence.
Expr ::= Addition
Addition ::= Term ((PLUS | MINUS) Term)*
Term ::= Prim ((MUL | DIV) Prim)*
Prim ::= NUM | OPEN_PAREN Expr CLOSE_PAREN
>The problem is that this model of grammars is arbitrarily restricted
It is restricted, but not arbitrarily so. There is a very natural constraint that explains why Parsing Theory doesn't like ad-hoc parsers in the way you hate so much: the parser shouldn't depend on the "content" of the stream. The easiest way to see what I mean by this is to demonstrate how one violation of it works: C++'s ambiguous "X Y(Z);" is either a function prototype or a variable declaration, depending on whether Z is a valid type name. This is extremely ugly to me; a parser shouldn't accumulate type information while it's parsing, even if it's as trivial as a set of type names. This is an entirely different concern that should be left to an entirely different module of the compiler. C++'s syntax is full to the brim of such ugly corners; the standard doesn't even bother to specify a binding BNF on the implementations!
Compiler Theory settled on an interface design where the parser must be able to output a concrete syntax tree from a stream without relying on global ad-hoc state like a table of special names or whatever. Your framing of it as "accumulating no state beyond a stack and its current state" is unfair; it would be like framing the point of local variables as just a limiting thing that we invented arbitrarily. Yes, that's the entire point. No, it's not arbitrary; it's intentionally limiting, because the resulting design is easier to understand and simply more elegant.
>lisp I think has technically already done that
There is a middle ground between lisp's austere and misguided "F*k You, Why Should Parsing Be Hard Just To Please Your Eyes?" syntactical philosophy, and between C++'s and Perl's "Lol, Grammars Are For Weaklings". Complex syntax is good, it automates the tedious tree-layout that is the text of lisp programs, you don't need to repetitively state the precedence and hierarchy of code elements, the parser infers some of them for you. However, as in all "The Machine Infers Things" situations, we have to be careful with the rules of inference. We don't want the machine to infer counter-intuitive things, or fail to infer entirely. CFGs formalism is an elegant tradeoff.
>we don't see any real research into those aspects of PLT
It might be the other way around: allowing arbitrary ad-hoc state in the parser results in too ugly a formalism or no interesting formalism at all, therefore people settled on disallowing it. It's like how theoretical mechanics famously hates friction. It isn't arbitrary or born of prejudice; it's just that friction is too ugly and hard to yield an elegant and concise theory, therefore people just pretend that it doesn't exist.
On the surface, it doesn't seem hard to try and formalize some of the real-world parser hacks into the grammar formalism. For example, C++'s "this should be parsed as X unless the name is in a certain set then it should be parsed as Y" doesn't appear too wild a constraint type to encode in a formal grammar, my suspicion is that if you try to do it you will either discover that (a) it doesn't actually change any of the implication of traditional theory, and therefore it is just a "sugar" over traditional constraints : cool, but it technically doesn't tell us anything new (b) it significantly changes the implications and guarantees of theory, to the point that it's not an elegant or beautiful theory anymore. You can introduce control flow into prolog, but why bother ? it wouldn't be prolog anymore.
I don't want to imply that Academia is all-seeing or all-wise, but it is a very powerful novelty-rewarding machine, if there is Anything theoretically interesting to be gained from formalizing parsing hacks, I find it very hard to believe it hasn't been studied in 65+ years of grinding on parser theory and formal grammars.
>I would guess most real world parsers for "real" languages are not any of these classes
I find this too strong a claim. I would bet money it's false for anything but C, C++, Perl, Ruby, and let's say PHP just to be safe. Mention any language that isn't on this blacklist, and I bet I can find BNF-like description of the grammar that tells you perfectly well how to obtain a CST from text, which means they are CFG.
However, the purpose of unsafe is to mark code that does something which a human needs to check for correctness. We're not saying "This isn't OK" but "The compiler can't check this is OK, so a person needs to do so". If you write C++ today, all of your code is like that. I'd be astonished if more than a tiny proportion of the actual game code in a modern video game needed that treatment in Rust.
For example, Rust's approach to late initialization is std::mem::MaybeUninit<T>, a wrapper type which says to the compiler hey, I am not initialized yet, it's OK to write a T value into me, but you can't read my value until somebody says they're done initializing me. The "say you're done initializing" part is indeed unsafe, but that's a small part of the program. That intern writing a zone preview gadget? They don't need to be writing unsafe initialization code, when they try to access preview_zone.orb_color the compiler tells them this is MaybeUninit and so they can't read it. "Huh, apparently orb_color is MaybeUninit ?" "Oh, just show all the orbs in preview as orange, it'll be fine". You just avoided Undefined Behaviour and possibly a trip to the land of "But it works in debug builds".
> But, if you do that, are you really gaining anything from using Rust?
In each new release of Bevy, they've found soundness bugs with the borrow checker—seems like they're gaining something.
Ironically, Google (googletest, gRPC, etc.) has been more aggressive than most in dropping support for older C++ standards.
If the likes of Intel, AMD, ARM, IBM, Embarcadero, ... don't step up their ISO C++ support upstream, clang will stay mostly C++17 with some bits and pieces from C++20.
It is already ironic that, after so many years of being mocked for its non-standard extensions, VC++ has the best ISO C++20 support.
To my mind, an object is a value with a unique identity. How else would you define it?
And if you want a billion empty values, then yes, those could all be implemented by the same object. It's pretty easy to implement by hand, even though it would be nice for a compiler to do it automatically (like how Java normally gives you the same Integer object if you box the same int value in two different places, even though it will give you a different Integer of the same value if you explicitly ask for it with new).
But it doesn't have any bearing on the problem I was talking about: if you were to instantiate your empty class, it has non-zero size. C++ just doesn't have ZSTs.
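For contrast, here is a small Rust sketch of what having ZSTs looks like in practice. `Empty` is a hypothetical stand-in for an empty C++ class; the sizes are what Rust reports for types with no data.

```rust
use std::mem::size_of;

// A hypothetical empty type, standing in for an empty C++ class.
struct Empty;

fn main() {
    println!("size_of::<()>()            = {}", size_of::<()>());
    println!("size_of::<Empty>()         = {}", size_of::<Empty>());
    println!("size_of::<[Empty; 1000]>() = {}", size_of::<[Empty; 1000]>());

    // A Vec of a zero-sized type stores no element data; it only tracks a length.
    let v: Vec<Empty> = (0..1000).map(|_| Empty).collect();
    println!("len = {}", v.len());
}
```

In the equivalent C++, `sizeof` of an empty class is at least 1, so an array of a thousand of them occupies at least a thousand bytes.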
Now, related to ZSTs, I think the main reason why C++ doesn't have this is that C++ really doesn't have any good support for constant values. Sure, you can mark something `const` but that rarely means too much - specifically, it can never add any new semantics to a type, it only removes some options.
One consequence of this is your example - a `const vector<T>` can't be a simple counter of how many T are in the vector even if the T type has a single possible value: the language can't really use the fact that the array is `const` to change its layout.
An even worse consequence is that a `const vector<T> * const` (a const pointer to a const vector of T) is not covariant (it can't be initialized with a const pointer to a const vector of a subtype of T), even though it should be: the language just won't use the fact that it is `const` in this way.
This is classically done via memory synchronization mechanisms/atomics.
You do need to synchronize concurrent access to the same object, and it's not just concurrent writes, it's concurrent accesses if at least one of them is a write.
It's just that there is a single owner, and the other threads have a non-owning view only.
You need strict control on the thread lifetime, which is enforced through the same single ownership system, in order to enforce this.
As written p is a one-past-the-end pointer into object arr, but the address one past the end of arr may well be the address of x. If pointers are just addresses, these pointers are the same... right?
Neither C nor C++ currently actually explain how this works for their "abstract machine" in the standards documents. The reality is that your C++ compilers (and any non-toy C compilers) have pointer provenance, because it's a nightmare to optimise C programs without it. But since it isn't documented anywhere (my understanding is that C23 might fix this for C by taking a TS, and an equivalent fix via P2318 could land in C++26), it's difficult to say whether any given behaviour you find is a bug.
While one-past-the-end pointers are allowed to exist, they are not allowed to be dereferenced.
> these pointers are the same... right
The entire point of provenance is that even though their numerical values are the same, they are not the same.
> Neither C nor C++ currently actually explain how this works for their "abstract machine" in the standards documents.
While it isn't mentioned explicitly, it can be inferred from other things that the standard does say. The compiler authors didn't just make it up.
They didn't "just make it up" but well, here's (a draft of) TS 6010 explaining where it comes from, alas it's not "inferred from other things that the standard does say" but rather riffing on a phrase from a discussion about a defect report...
"In a committee discussion from 2004 concerning DR260, WG14 confirmed the concept of provenance of pointers, introduced as means to track and distinguish pointer values that represent storage instances with same address but non-overlapping lifetimes. Implementations started to use that concept, in optimisations relying on provenance-based alias analysis, without it ever being clearly or formally defined, and without it being integrated consistently with the rest of the C standard."
TS 6010 will, some day, actually define how this works. Well, it will define how it should work, and assuming compiler vendors can be bothered to implement TS 6010 then it becomes how it actually works.
In TS 6010 (which again, is not how your C or C++ compiler works today, and in the best case won't be how your C++ compiler is required to work until at least 2027 or so) the rules go roughly like this:
* If you've got an actual pointer to a living object via some legitimate means, e.g. you used the & operator in C, that works
* If you try to make pointers from somewhere else, e.g. doing arithmetic on pointers that point to a different object, this only works if you've previously done some operation which might cause non-pointer stuff to be aware of this pointer, e.g. you cast a pointer to an integral type or you type-punned a pointer and then looked at the bytes
* However, the compiler is obliged to give you the benefit of the doubt about pointer types, if it's possible you knew how to make a Doodad* with this address in it, then the fact that you also knew how to make a Foo* with the same address doesn't matter, your program is allowed to make a Doodad* not a Foo* if it wants
Thus, your example up thread is still Undefined Behaviour under TS 6010, because you've got no reason to believe the memory layout is the way it actually was. But if you use some type punning hack to get the address of x into that pointer instead, TS 6010 says that works and is not Undefined Behaviour.
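As a loose analogy only (this is Rust, not C, and not TS 6010 itself): Rust's `as` casts between pointers and integers follow a similar "exposed provenance" idea, where casting a pointer to an integer is the operation that lets a later integer-to-pointer cast legitimately refer to the same object. The function name below is made up for illustration.

```rust
// Round-trips a pointer through an integer. The pointer-to-integer cast is
// the step that "exposes" the pointer; the integer-to-pointer cast may then
// pick that provenance back up.
fn read_via_integer_round_trip(x: &i32) -> i32 {
    let p: *const i32 = x;
    let addr = p as usize; // exposes the provenance of `p`
    let q = addr as *const i32; // rebuilt from a bare address
    // SAFETY: `q` has the address of `*x`, which is alive for this call,
    // and that object's provenance was exposed by the cast above.
    unsafe { *q }
}

fn main() {
    let x = 5;
    println!("{}", read_via_integer_round_trip(&x));
}
```

What this does not license is the original example: computing a one-past-the-end pointer from `arr` and hoping it can be used to access a neighbouring variable.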
I think it's more that interop with C++ is hard and ultimately a lot less valuable than interop with C, which is what the vast majority of languages use for FFI.
> It doesn’t matter how “safe” your language is if the first thing you have to do is make a pure C (even losing potential for automatic lifetime management the C++ would allow) interface.
I don't see how that leads to "it doesn't matter"
Whether that's true in practice...? I guess we'll need a few more years to tell.
Interop with C is trivial, and that is why everyone has that - all you have to do is expose the primitive types (e.g. ints of varying sizes, floats of various sizes), pointers for anything heap allocated or with non-trivial copy/destroy, and pure structs.
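A rough sketch of how little a C-level boundary needs, written in Rust. `Vec2` and `vec2_dot` are hypothetical names; the only pieces involved are a plain struct with C layout, primitive floats, and raw pointers.

```rust
// Hypothetical "pure struct": #[repr(C)] asks for C layout rules.
#[repr(C)]
pub struct Vec2 {
    pub x: f64,
    pub y: f64,
}

// `extern "C"` selects the platform's C calling convention. A real export
// would also need an unmangled symbol name, which is omitted here.
pub extern "C" fn vec2_dot(a: *const Vec2, b: *const Vec2) -> f64 {
    // Raw pointers carry no lifetime or validity guarantees across the boundary.
    if a.is_null() || b.is_null() {
        return 0.0;
    }
    // SAFETY: the caller promises both pointers refer to live Vec2 values.
    let (a, b) = unsafe { (&*a, &*b) };
    a.x * b.x + a.y * b.y
}

fn main() {
    let a = Vec2 { x: 1.0, y: 2.0 };
    let b = Vec2 { x: 3.0, y: 4.0 };
    println!("{}", vec2_dot(&a, &b));
}
```

Note what is lost at this boundary: the function signature says nothing about ownership or lifetimes, which is the safety cost the surrounding comments are arguing about.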
Interop with C++ is hard, but you can at least start off without dealing with exposing template types and functions, and then C++ interop reduces to the above C functionality with some fancy-pants layout for vtables.
Supporting interop at a C API level is fine as your FFI to other languages as a library, but dropping your safety guarantees as a library talking to other code written in your own language is counterproductive. Because the first thing anyone needs to do is that "rare" FFI work of recreating a memory safe wrapper around a system library.
> I don't see how that leads to "it doesn't matter"
Ok fair cop, that was a little strong, or at least insufficiently qualified.
I'll try again:
The safety of a new language doesn't matter to existing software, if there is not a reasonable adoption path for that software.
Obviously, being able to implement new code in a memory safe language that has modern features and a trivial path to calling C code is valuable (as opposed to the historical "safe" language Java, for which talking to non-Java code, even plain C, is absolute misery).
But the largest sources of memory safety related security bugs nowadays are large existing codebases. Some, like Linux, are plain C, so using C as your FFI isn't adding any more safety issues than already exist, and the existing code is all by definition just C types that can be easily accessed, copied, etc., as they're all just a trivial bunch of bytes.
But some of the largest software in use is C++. Most obviously browser engines which by design are required to load arbitrary data from arbitrary locations, then intentionally give that arbitrary data a full near arbitrary execution Turing machine, but there are a bunch of other very large, and critical, projects.
Not having an interop story with C++ means that if any of these projects wish to adopt some new safe language, they are stuck with no transition path. Even Mozilla, who created Rust, have only had partial success adopting Rust in Gecko, for example in their layout engine, where the "transition" was to rewrite the entire engine and swap the engines wholesale once it was done.
For other libraries it means a large API performance regression because you have to add a pile of marshaling logic to wrap the mandatory C interface. Obviously this marshaling also impacts projects with C APIs but we're going to pretend for now that there's no heap allocation being introduced.
It's not that it's inherently bad, it's just insanely difficult to do and probably isn't worth the pain. Carbon's approach to solving this problem includes embedding a custom C++ compiler as part of its toolchain, and at this point that's just an idea; who knows if they will be able to actually do it.
> they are all hell bent on not providing even just basic ABI stability
Right, the famous stable C++ ABI
I suspect you're thinking of the case where you make an ABI change without changing the API, which is indeed an easy thing to do. You have to make sure that you only add members to the end of objects; however, that exact constraint also applies to C.
C++ ABI doesn't even exist, there are several competing standards (much fewer now than there used to be) and nothing is guaranteed even between different compiler versions, let alone different compilers. Not that long ago we used to have two implementations of std::string in gcc you had to choose from and if the library you're linking with chose another one, you were out of luck.
Carbon isn’t looking to do this either. The “solution” they’ve come up with is that you just recompile everything and get things on the same ABI, which is not surprising considering how Google operates internally.
Likewise JNI has always had support for COM like models, and the C API is written in such way that the struct layouts can be used as C++ classes instead, with implicit this on method calls.
But I don't think the goal is inherently wrong, nor do I think it's niche - there is a lot of C++ in the world, and much of it would benefit from being memory safe. A lot of the largest projects and certainly many of the most heavily used software projects are written in C++. I think you'd agree that having them no longer be trapped in a memory unsafety tarpit would be a Good Thing.
Let's imagine someone came out with a magic C++ compiler that had 100% source and ABI compatibility and had no performance costs, but was completely memory safe - remember "magic". You'd obviously expect, if not demand, that every C++ project switch to that magic compiler. So let's imagine we're on a scale from 1 to 10, where 1 is the memory safety of current compilers and 10 is our magic clearly impossible one. Obviously every step we can go towards 10 is great, but is going to come with trade offs in adoption cost. So on another 1 to 10 scale where 1 is rewriting in some other random language (rust, etc) and 10 is your current compiler, then you want the new language to be as close to 10 as possible. One thing you could do is build all your code with ASAN or a similar mechanism, that would have low adoption cost, a dramatic improvement in runtime safety, but a huge performance cost (time for another 1 to 10 scale?).
Carbon appears to be an attempt to design a language with a reasonably high score in all of the scales - no 10s, but mostly 7s or something. I personally think that there are things we could do to C++ that would make a meaningful improvement to safety without having the solution be "rewrite in other language", but I won't fault a project saying "we'll support incremental adoption" as their adoption path in exchange for stronger safety.
fn main() {
    let vs = [(), (), (), (), ()];
    for v in vs {
        println!("{:?}", v);
    }
}
This loop will print "()" five times, because `v` will iterate through all five elements of `vs`. Are these values identical or not? I don't know, and it doesn't matter, does it? But I think of them as different values: they are different members of `vs`.
But imagine the following program:
fn main() {
    let mut vs = [(1,), (1,), (1,), (1,), (1,)];
    vs[0].0 = 2;
    for v in vs {
        println!("{:?}", v);
    }
}
Here we can see that identity is in fact important: `vs[0].0 = 2` only modifies one of the objects, even if all of them initially had the same value.
By the way, note that your example should be completely equivalent to the following C++ program:
#include <iostream>
#include <tuple>
#include <vector>

int main() {
    using namespace std;
    auto vs = vector<tuple<>>{tuple<>(), tuple<>(), tuple<>(), tuple<>(), tuple<>()};
    for (auto v : vs) {
        cout << v; // imagine C++ actually had an implementation of operator<< for tuples...
    }
    return 0;
}
What allows you to say that they have different identities? They are zero-sized types. Literally zero. `()` is a type and `()` is its only possible value. log2(1) = 0 bits. If you look into the machine code you will not find anything that you can call an object. The very existence of `()` is a shared dream of the programmer and the compiler, and `()` ceases to exist after compilation.
> But imagine the following program
In this program you are using types of size > 0 bits, which allows more than one value of that type. But even then I wouldn't bet that they are different objects only because you changed one. If you didn't, it would be completely logical to replace them all with just one value in memory, while pretending that there are many copies of it.
In this particular case I don't believe rustc would manage to do it even if we drop mutation from the example. And I can't think of a case when it will manage. But I wouldn't bet that such case doesn't exist.
> By the way, note that your example should be completely equivalent to the following C++ program
Hmm... And if we write in C++ the equivalent of something like:
fn main() {
    let vs = [(), (), (), (), ()];
    for v in vs.iter() {
        println!("{:?} {:?}", v, v as *const ());
    }
}
Will we get an output like this:
() 0x7ffdae96bec8
() 0x7ffdae96bec8
() 0x7ffdae96bec8
() 0x7ffdae96bec8
() 0x7ffdae96bec8
Will all addresses be equal? If they are, then the proposition "C++ also pays a price for insisting not only that objects have addresses, but those addresses are distinct" is false.
Then C and C++ are de facto controlled by private companies: Google+Apple do pretty much all the clang development, MS does MSVC. If they choose not to implement a feature that’s approved, or implement one that isn’t, that is the de facto standard.
You can argue it would require all three to agree on something, but that’s still essentially making WG21 somewhat irrelevant.
Why not? Just curious.
"We are planning to create an open source foundation and transfer all Carbon-related rights to it; our goal is for the foundation setup to be similar to other open source projects, such as LLVM or Kubernetes."
Well, they’re from practical experience, aren’t they?
Then you say "I guess I don't understand why we use languages for writing grammars that let you express ambiguity in the first place" without any suggestions, or even if this is possible, or if it is, whether the resulting language might be too constrained to be useful (interesting question though. Edit: even context-free grammars have ambiguity, eg. the regexp "aa" is legal but ambiguous for sentence "aaa").
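To make "a CFG can be ambiguous" concrete, here is a small sketch using a standard textbook grammar, S -> S S | 'a' (my choice of illustration, not the example from the edit above). It counts how many distinct parse trees the grammar assigns to a string of n 'a's; any count above 1 means the grammar is ambiguous for that string.

```rust
// Number of distinct parse trees for a string of `n` 'a's under the
// illustrative grammar  S -> S S | 'a'.
fn parse_tree_count(n: usize) -> u64 {
    if n == 0 {
        return 0; // the grammar cannot derive the empty string
    }
    // trees[k] = number of parse trees for a string of k 'a's
    let mut trees = vec![0u64; n + 1];
    trees[1] = 1; // S -> 'a'
    for len in 2..=n {
        // S -> S S: the left S covers `k` characters, the right S covers the rest.
        let total: u64 = (1..len).map(|k| trees[k] * trees[len - k]).sum();
        trees[len] = total;
    }
    trees[n]
}

fn main() {
    for n in 1..=5 {
        println!("{} 'a's -> {} parse tree(s)", n, parse_tree_count(n));
    }
}
```

For "aaa" this gives 2 trees, (a a) a and a (a a), which is exactly the kind of choice a grammar formalism either has to forbid or resolve by some extra rule.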
Then "I think parser generators were a mistake" which is surreal. If you've ever had the tedious misfortune to write one manually, you know how much faster it is to have the computer do that work.
So I am thrown a bit here.
Do you really want your language to be defined by a particular imperative parser implementation?
I'm asking because I do have a practical problem: my team has a hand-crafted parser for a language. It should follow a spec written in EBNF. That spec could be ambiguous; it isn't, because an automated tool checks that the EBNF grammar is not ambiguous.
I find the ability to document the grammar using something that everybody understands (BNF and variants) to be very useful. Yet your comment seems to imply that, since it allows ambiguous grammars, we shouldn't be using it.
EDIT: our grammar is implemented by three different parsers, written in two programming languages.
Syntactic analogy with non-zero sized types. If you want a special case that says "there is a single object of the zero-sized type" that's ok, but it's a special case. All other types have a difference between object identity and value equality.
> Will all addresses be equal?
No, because C++ doesn't optimize for ZSTs, and it doesn't modify semantics for const. I agree that C++ pays a price for these two things, but I don't think it's because it "insists all objects have different addresses", I believe that is just a consequence of not giving special semantics to const beyond disallowing writes.
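For the Rust side of that question, a quick sketch (the helper name is made up): since array elements sit at base + i * size_of::<T>(), every element of an array of a zero-sized type has the same address, while elements of a one-byte type do not.

```rust
use std::ptr;

// Returns true if the first and last elements of a non-empty slice
// live at the same address.
fn first_and_last_share_address<T>(items: &[T]) -> bool {
    ptr::eq(&items[0], &items[items.len() - 1])
}

fn main() {
    let units = [(), (), (), (), ()];
    let ones = [(1u8,), (1u8,), (1u8,), (1u8,), (1u8,)];
    println!("units share an address: {}", first_and_last_share_address(&units));
    println!("ones share an address:  {}", first_and_last_share_address(&ones));
}
```

So in Rust the five `()` values really are indistinguishable by address, which is the behaviour the C++ `vector<tuple<>>` version does not have.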
I can then go back to my library, and make a bunch of ABI safe changes, just as I would have to do in C, and compile my library again, with yet another compiler. At that point the program that was using the older version of my library, would continue to work with the new build of my library without needing to be recompiled.
This is because C++ is ABI stable.
If I am in C++, and I make an ABI breaking change to something that is used as a base class elsewhere, then I have broken the ABI.
But this problem also exists in C, for exactly the same reason. For example, let's make a silly example in C.
If I am a C library, and my header declares a struct
struct AwesomeThing {
intptr_t foo;
};
void doSomething(struct AwesomeThing* thing) {
thing->foo = 0;
}
And then someone uses my library in their code:
struct ImAwesomeToo {
    struct AwesomeThing theAwesomest;
    int* thisFieldIsGreat;
};
int doSomethingElse() {
    struct ImAwesomeToo thing;
    thing.thisFieldIsGreat = malloc(sizeof(int));
    doSomething(&thing.theAwesomest);
    free(thing.thisFieldIsGreat);
    return 0;
}
Obviously this is a somewhat silly example, but now say I make a change to my library:
struct AwesomeThing {
    intptr_t foo;
    double bar;
};
void doSomething(struct AwesomeThing* thing) {
    thing->bar = 42;
}
By the standard rules I haven't "broken" ABI, but now that call to free() is going to cause problems, because the client still reserves space for the old, smaller struct and the write to bar lands on thisFieldIsGreat.
That's the fragile base class problem.
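The overlap can be checked without triggering any undefined behaviour by comparing layouts. This is a sketch in Rust using `#[repr(C)]` to get C layout rules; the `V1`/`V2` suffixes are my own labels for "before" and "after" the library change, and the offsets assume a mainstream 64-bit target.

```rust
use std::mem::size_of;

// Layouts mirroring the C structs above; #[repr(C)] requests C layout rules.
#[repr(C)]
struct AwesomeThingV1 {
    foo: isize,
}

#[repr(C)]
struct AwesomeThingV2 {
    foo: isize,
    bar: f64,
}

// What the client compiled against the old header still believes.
#[repr(C)]
struct ImAwesomeTooV1 {
    the_awesomest: AwesomeThingV1,
    this_field_is_great: *mut i32,
}

// Byte offset of a field, computed from addresses (no memory is accessed).
fn offset_between<A, B>(base: &A, field: &B) -> usize {
    field as *const B as usize - base as *const A as usize
}

// Where the new library writes `bar`, relative to the start of the struct.
fn bar_offset_in_new_library() -> usize {
    let t = AwesomeThingV2 { foo: 0, bar: 0.0 };
    offset_between(&t, &t.bar)
}

// Where the old client keeps its pointer, relative to the same start.
fn pointer_offset_in_old_client() -> usize {
    let c = ImAwesomeTooV1 {
        the_awesomest: AwesomeThingV1 { foo: 0 },
        this_field_is_great: std::ptr::null_mut(),
    };
    offset_between(&c, &c.this_field_is_great)
}

fn main() {
    println!("old size: {}", size_of::<AwesomeThingV1>());
    println!("new size: {}", size_of::<AwesomeThingV2>());
    println!("library writes bar at offset {}", bar_offset_in_new_library());
    println!("client keeps its pointer at offset {}", pointer_offset_in_old_client());
}
```

When the two offsets coincide, the library's store to `bar` overwrites the client's pointer, and the later `free()` receives garbage.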
If you are making an API that will have ABI stability requirements you have to expend quite a bit of effort designing the API so it's not only pleasant to use, but also can be evolved without breaking the ABI. As with C APIs, the people who make C++ APIs, know how to make them robust as well.
That said you could have a C or C++ ABI that is ABI stable, it just hurts performance, and obviously adopting that would break ABI :D
Anyway, the problem with what rust and co are saying, is that the same source can result in different ABIs from one compiler version to the next, or for one API to compile to a different ABI depending on what the rest of the code in the project is doing.
That means the OS can't use the safe language as an actual OS API, which I just think is wasting an opportunity.
Swift manages, and ostensibly performs the same kind of optimizations within a module, which is most of what you want. That said because it's the system API all objects are refcounted, and the refcount is atomic - I was writing a raytracer in it (this is how I learn programming languages) and the refcount overhead was annoying to deal with.
Obviously there are trade offs in all the choices, but I still feel like rust, etc could do more. Rust has rich support for annotations, so an "abi_stable" annotation seems like it would be perfectly reasonable - it would be in keeping with rust's general theme of the default behaviour not having any implicit performance costs, but would be easy and very clear when you were making something into an API.
It seems too optimistic for me. I use Gentoo, and I believe that the only reason I have no such troubles is Gentoo with all its maintainers who do a great work of testing different combinations of libraries. And of course `ebuild` that can rebuild libc.so without breaking the system. I tried it once when I just started with linux, and I failed. Never tried to do it again myself, I let `ebuild` do it. It is not so easy, you know, to change version of a library which is a dependency of every binary in the system. And it is a C library. I wouldn't even try to do it, if it was a C++ library. Though who knows, two decades ago when I started with linux I probably was dumb enough to try.
> I can take any C++ compiler, and I can compile any C++ library, and then I can take some other compiler and compile some other piece of C++ that calls my library, and it will Just Work.
This is most definitely not true. It just happens that clang and gcc are mostly compatible, but MSVC and clang/gcc is not! For example, vtables are implemented differently and therefore virtual method calls can crash (unless you follow certain rules, e.g. no virtual destructors, no overloaded virtual methods, etc.)
With 32-bit code, compilers won't even use the same calling convention by default... (look up 'thiscall')
EDIT: strange to see that someone downvoted...
That sounds like what you're saying is that gcc or clang fail to conform to the platform ABI, which is a compiler bug. If a compiler wishes to ignore platform ABI, it doesn't make the ABI not a thing.
But if you are comparing ABI compatibility to MSVC, you are comparing the ABI used by the code generated by clang or gcc to what is by definition the windows platform ABI. If the code generated by a compiler targeting a given platform, does not match the ABI used by that platform, the problem is the compiler is generating broken code.
> vtables are implemented differently and therefore virtual method calls can crash
No, vtables are implemented incorrectly by the compiler. If the vtable for a type laid down by gcc or clang crashes when it is passed to an OS function, that means the compiler is broken, as the compiler has decided to ignore the platform ABI. Again, if I write a compiler that chooses to use the wrong ABI when interacting with the host OS, I don't get to claim that it is the language's fault, or the OS, or the ABI. Similarly I don't get to compile code for linux that makes windows system calls, and then complain that there isn't a windows ABI.
Here is the thing: The ABI for vtables is specified for every non-trivial platform. The ABI for argument parameter ordering is specified for every platform. The ABI for struct layout and struct padding is specified for every platform.
"thiscall" is the ABI for member functions on windows, it's not some magical non-abi thing, it is by definition _the_ abi. There needs to be a name for it, because documentation at the very least has to be able to distinguish it from "cdecl". Importantly, claiming that it's a sign that C++ doesn't have an ABI, or that ABI isn't stable, is simply incorrect. The fact that it exists as a name we can reference is an indication that the ABI matters enough that it is specified. Claiming it's evidence of a lack of ABI is like claiming that C doesn't have an ABI because gcc has stdcall and cdecl. And thiscall is useful: MS choosing it as the ABI for member functions on i386 is a reasonable performance win that gcc chose not to take, in favor of using cdecl everywhere, even if cdecl was the slower choice.
There is generally a pile of sadness when dealing with 32bit C++ as the various platform ABIs came into existence when C++ was new, and so the ABI for it on any given platform just happened to be whatever happened to be used by the first releases of the primary C++ compilers for those platforms. That's also why those ABIs - for windows, linux, macOS, etc - tended to not be super well designed or systematically created. Hence even when trying to match ABIs it was easy to hit edge cases where things went horribly wrong between completely different compilers. However even then, generation to generation of each of those compilers maintained ABI stability with themselves at least. Mercifully in the consumer 64-bit every compiler seems to have essentially gravitated to the itanium ABI, which was actually thought out and designed rather than evolved while the language was being invented.
So continuing to claim that there is not an ABI, or that that ABI is not stable, or whatever other claim you wish to make, does not make it become true just because you don't like C++. It also does not become true because you are a fan of a language that doesn't want to provide a stable ABI. The historical problems of gcc vs msvc were a product of how the respective ABIs were developed, but on architectures that are more modern it should not be a problem, and I'm sure that if you find places where the compilers differ from the host OS on a more modern platform the developers are much less likely to ignore the issue vs. i386 where they are stuck with whatever their exact ABI was in the early 90s.
In the end, to be very clear, C++ has an ABI, it is stable - a fact demonstrated by Windows, XNU, Qt, etc all existing, and continuing to successfully exist. Technically the C++ standard library demonstrates this as well but given the various implementations of the standard library are now maintained largely by their respective compiler projects, that seems like cheating.
Windows is a different platform, so comparing it to linux isn't relevant, any more than saying I can't run the code I compiled for sparc on a ppc Mac.
Yes please. The code defines completely and unambiguously how the language is parsed.
If instead you choose to specify your language using a class of grammar which does not permit ambiguity, then you don't have to resolve it because it never existed.
That's the point.
The suggestion is to either specify your language using a formal grammar which does not permit ambiguity, or to specify your language imperatively, using a reference parser.
I've written many parsers, both using parser generators and manually. I'd choose to write one manually. In fact, I'm currently looking at a project at work right now to take a generated parser and to re-write it manually because it's easier to work with.
That's a pretty mainstream opinion amongst professionals in the industry - not sure why you think it's surreal or why it's throwing you.
I added an edit after which you may not have seen:
even context-free grammars have ambiguity, eg. the regexp "aa" is legal but ambiguous for sentence "aaa"
So what grammar can you suggest that's even weaker than context-free (to make expressions of ambiguity impossible) and still useful?
> I'd choose to write one manually
Matter of taste I guess.
It doesn’t need to be weak. For example use a Parsing Expression Grammar.
First off, I work mainly in C++ (Windows, Linux, macOS and some embedded) and I don't hate the language. I really wished that C++ had a well-specified ABI, but the sad reality is that it doesn't.
> if the code generated by a compiler targeting a given platform, does not match the ABI used by that platform, the problem is the compiler is generating broken code.
The compiler only has to correctly implement the C++ standard. The standard does not specify things like calling conventions, struct padding, implementation of virtual functions, so the compiler is free to do anything it wants. (Whether that is a good idea is another topic.)
> No, vtables are implemented incorrectly by the compiler.
The C++ standard does not even mandate that virtual functions are implemented with vtables.
> The ABI for vtables is specified for every non-trivial platform. The ABI for argument parameter ordering is specified for every platform. The ABI for struct layout and struct padding is specified for every platform.
AFAIK, Microsoft does not officially specify a C++ ABI at all (correct me if I'm wrong!) The only thing that comes close is COM - which can be implemented with a restricted subset of C++, but is really language-agnostic.
> Importantly, claiming that it's a sign that C++ doesn't have an ABI, or that ABI isn't stable, is simply incorrect. The fact that it exists as a name we can reference is an indication that the ABI matters enough that it is specified.
'thiscall' is not part of the C++ standard, it is a calling convention invented by Microsoft. The C++ standard does not talk about calling conventions at all.
> Mercifully in the consumer 64-bit every compiler seems to have essentially gravitated to the itanium ABI
This is true on Linux, but Microsoft specifies its own 64-bit calling convention (https://docs.microsoft.com/en-us/cpp/build/x64-calling-conve...) and leaves other parts of the C++ ABI unspecified.
Generally, you are right that some platforms, like Linux, have a de-facto C++ ABI, but that is not the same as saying that C++ itself has a well-defined ABI - which is simply not true.
Your initial claim was that binaries compiled with different compilers are always compatible (apart from possible library ABI mismatches). We both may wish this were true, but it is false in the general case (and specifically on Windows).
"A PEG parser generator will resolve unintended ambiguities earliest-match-first, which may be arbitrary and lead to surprising parses"
You're misunderstanding the rhetoric there - they're unambiguous by construction because they deterministically resolve earliest-match-first - a well-defined rule.
It's impossible to write an ambiguous PEG. Any grammar you write in a PEG is never ambiguous. A grammar you write using a CFG can be ambiguous.
As your own link says "Unlike CFGs, PEGs cannot be ambiguous".
Or read Ford's original paper https://bford.info/pub/lang/peg.pdf - they solve "the ambiguity problem by not introducing ambiguity in the first place".
(I wrote a much-cited thesis on PEGs and issues such as ambiguity in precedence parsing - I'm not just talking off the top of my head here.)
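A tiny sketch of what "deterministically resolve earliest-match-first" means, using a made-up helper that implements PEG-style ordered choice over literal alternatives only (a real PEG engine also handles sequences, repetition, and predicates). It also shows the "surprising parse" side of the quoted sentence: in `A <- "a" / "ab"` the second alternative can never win.

```rust
// Ordered choice over literal alternatives, as in a PEG rule
// `A <- alt1 / alt2 / ...`: the first alternative that matches wins and
// later ones are never tried. Returns the number of bytes consumed.
fn ordered_choice(input: &str, alternatives: &[&str]) -> Option<usize> {
    alternatives
        .iter()
        .find(|alt| input.starts_with(**alt))
        .map(|alt| alt.len())
}

fn main() {
    // A <- "a" / "ab": on "ab" only "a" is consumed; "ab" is unreachable.
    println!("{:?}", ordered_choice("ab", &["a", "ab"]));
    // A <- "ab" / "a": reordering changes the result, but it is still unique.
    println!("{:?}", ordered_choice("ab", &["ab", "a"]));
}
```

Either way there is exactly one result for a given grammar and input, which is the sense in which a PEG cannot be ambiguous, even when the result is not the one the author intended.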