"Most of the design for implementing the UNIX system for System/370 was done in 1979, and coding was completed in 1980. The first production system, an IBM 3033AP, was installed at the Bell Laboratories facility at Indian Hill in early 1981."
https://web.archive.org/web/20240930232326/https://www.bell-...
I sometimes wonder if that compiler has survived anywhere.
I wonder if anybody still has a copy of Oracle v2 or v3?
Oldest I've ever seen on abandonware sites is Oracle 5.1 for DOS
> The mainframes at the time didn't have C compilers
Here's a 1975 Bell Labs memo mentioning that C compilers at the time existed for three machines [0] – PDP-11 UNIX, Honeywell 6000 GCOS, and "OS/370" (which is a bit of a misnomer, I think it actually means OS/VS2 – it mentions TSO on page 15, which rules out OS/VS1)
That said, I totally believe Oracle didn't know about the Bell Labs C compiler, and Bell Labs probably wouldn't share it if they did, and who knows if it had been kept up to date with newer versions of C, etc...
SAS paid Lattice to port their C compiler to MVS and CMS circa 1983/1984, so probably around the same time Oracle was porting Oracle to IBM mainframes – because I take it they also didn't know about or couldn't get access to the Bell Labs compiler
Whereas, Eric Schmidt succeeded in getting Bell Labs to hand over their mainframe C compiler, which was used by the Princeton Unix port, which went on to evolve into Amdahl UTS. So definitely Princeton/Amdahl had a mainframe C compiler long before SAS/Lattice/Oracle did... but maybe they didn't know about it or have access to it either. And even though the original Bell Labs C compiler was for MVS (aka OS/VS2 Release 2–or its predecessor SVS aka OS/VS2 Release 1), its Amdahl descendant may have produced output for Unix only
I assume whatever C compiler AT&T's TSS-based Unix port (UNIX/370) used was also a descendant of the Bell Labs 370 C compiler. But again, it probably produced code only for Unix not for MVS, and probably wasn't available outside of AT&T either
[0] https://archive.org/details/ThePortableCLibrary_May75/page/n...
if (argc<4) {
error("Arg count");
exit(1);
}
I remember a few buddies using a similar pattern in ASM that just added n NOPs into the code to allow patching, thus eliminating a possible recompilation.
Boy it took a lot of code to get a window behaving back in the day... And this is a much more modern B/C; it's actually ANSI C but the API is thick.
I did really enjoy the UX of macOS 6 and its terse look, if you can call it that [3].
[1] https://www.gryphel.com/c/minivmac/start.html
[2] https://archive.org/details/think_c_5
[3] https://miro.medium.com/v2/resize:fit:1024/format:webp/0*S57...
Your System 6.0.8 is from April 1991, so TCL was well established by then and the C/C++ version in THINK C 5 even used proper C++ features instead of the hand-rolled "OOP in C" (nested structs with function pointers) used by TCL in THINK C 4.
I used TCL for smaller projects, mostly with THINK Pascal which was a bit more natural using Object Pascal, and helped other people use it and transition their own programs that previously used the Toolbox directly, but my more serious programs used MacApp which was released for Object Pascal in 1985, and for C++ in 1991.
waste() /* waste space */
{
waste(waste(waste),waste(waste),waste(waste));
waste(waste(waste),waste(waste),waste(waste));
waste(waste(waste),waste(waste),waste(waste));
waste(waste(waste),waste(waste),waste(waste));
waste(waste(waste),waste(waste),waste(waste));
waste(waste(waste),waste(waste),waste(waste));
waste(waste(waste),waste(waste),waste(waste));
waste(waste(waste),waste(waste),waste(waste));
}
tree() {
extern symbol, block, csym[], ctyp, isn,
peeksym, opdope[], build, error, cp[], cmst[],
space, ospace, cval, ossiz, exit, errflush, cmsiz;
auto op[], opst[20], pp[], prst[20], andflg, o, p, ps, os;
...
Looks like "extern" is used to bring global symbols into function scope. Everything looks to be "int" by default. Some array declarations are specifying a size, others are not. Are the "sizeless" arrays meant to be used as pointers only?
Cool kids may talk about memory safety but ultimately someone had to take care of it, either in their code or abstracted out of it.
If anything the cool kids are rediscovering what we lost in systems programming safety due to the wide adoption of C, and its influence in the industry, because the cool kids from 1980's decided memory safety wasn't something worth caring about.
"A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980 language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."
-- C.A.R. Hoare's "The 1980 ACM Turing Award Lecture"
Guess what programming language he is referring to by "1980 language designers and users have not learned this lesson".
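For anyone who hasn't seen what that kind of checking looks like when you have to do it by hand in C, here is a minimal sketch. The `checked_array` type and the `ca_*` functions are hypothetical names made up for this illustration; they are not from Hoare's compiler or from any library. Every access is tested against both the lower and the upper declared bound, as the quote describes.

```c
#include <stdbool.h>

/* Hypothetical fixed-capacity storage for this sketch. */
#define CHECKED_CAP 16

/* An int array with declared inclusive bounds lo..hi, ALGOL style. */
struct checked_array {
    int lo;
    int hi;
    int data[CHECKED_CAP];
};

/* Set the declared bounds; fails if the range is empty or exceeds capacity. */
static bool ca_init(struct checked_array *a, int lo, int hi)
{
    if (hi < lo || (long long)hi - (long long)lo + 1 > CHECKED_CAP)
        return false;
    a->lo = lo;
    a->hi = hi;
    for (int i = 0; i < CHECKED_CAP; i++)
        a->data[i] = 0;
    return true;
}

/* Store a value; the subscript is checked against both bounds first. */
static bool ca_set(struct checked_array *a, int index, int value)
{
    if (index < a->lo || index > a->hi)
        return false;               /* refuse instead of corrupting memory */
    a->data[index - a->lo] = value;
    return true;
}

/* Load a value; the subscript is checked against both bounds first. */
static bool ca_get(const struct checked_array *a, int index, int *out)
{
    if (index < a->lo || index > a->hi)
        return false;
    *out = a->data[index - a->lo];
    return true;
}
```

With bounds declared as 1..10, `ca_set(&a, 0, 7)` and `ca_set(&a, 11, 7)` both return false rather than writing outside the array. The cost is one pair of comparisons per access, which is the overhead Hoare's customers chose to keep.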
IMO it is young people that have trouble understanding.
The same mistakes are made over and over, lessons learned long ago are ignored in the present
It is easier to write than read, easier to talk than listen, easier to build new than expand the old
commit d82b11e4a46307f1f1415024f33263e819c222b8
Author: Brian Kernighan <bwk@research.att.com>
Date: Fri Apr 1 02:03:04 1988 -0500
last-minute fix: convert to ANSI C
R=dmr
DELTA=3 (2 added, 0 deleted, 1 changed)
:100644 100644 8626b30633 a689d3644e M src/pkg/debug/macho/testdata/hello.c
commit 0744ac969119db8a0ad3253951d375eb77cfce9e
Author: Brian Kernighan <research!bwk>
Date: Fri Apr 1 02:02:04 1988 -0500
convert to Draft-Proposed ANSI C
R=dmr
DELTA=5 (2 added, 0 deleted, 3 changed)
:100644 100644 2264d04fbe 8626b30633 M src/pkg/debug/macho/testdata/hello.c
commit 0bb0b61d6a85b2a1a33dcbc418089656f2754d32
Author: Brian Kernighan <bwk>
Date: Sun Jan 20 01:02:03 1974 -0400
convert to C
R=dmr
DELTA=6 (0 added, 3 deleted, 3 changed)
:100644 000000 05c4140424 0000000000 D src/pkg/debug/macho/testdata/hello.b
:000000 100644 0000000000 2264d04fbe A src/pkg/debug/macho/testdata/hello.c
commit 7d7c6a97f815e9279d08cfaea7d5efb5e90695a8
Author: Brian Kernighan <bwk>
Date: Tue Jul 18 19:05:45 1972 -0500
hello, world
R=ken
DELTA=7 (7 added, 0 deleted, 0 changed)
:000000 100644 0000000000 05c4140424 A src/pkg/debug/macho/testdata/hello.b
main(argc, argv)
int argv[];
This is a culture shock. Did the PDP-11 not distinguish between `char` and `int`?
And yes, the PDP-11 did have byte addressing while the PDP-7 on which Unix was originally created was a word addressed machine.
In "ancient"/K&R C, types weren't specified with the parameters, but on the following lines afterwards. GCC would still compile code like this, if passed the -traditional flag, until ... some point in the last decade or so. Still, this style was deprecated with ANSI C/C89, so it had a good run.
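To make the difference concrete, here is a small sketch. The old-style form is shown only inside a comment, because C23 removed old-style definitions and newer compilers may reject them. `count_nonempty` is a made-up function name for illustration.

```c
#include <stddef.h>

/*
 * Old-style (K&R) definition, for comparison only. The parameter types
 * come after the parameter list, on their own lines:
 *
 *     count_nonempty(argc, argv)
 *     int argc;
 *     char *argv[];
 *     {
 *         ...
 *     }
 */

/* Prototype-style definition: the types sit inside the parentheses,
 * so the compiler can check every call against them. */
static int count_nonempty(int argc, char *argv[])
{
    int n = 0;
    for (int i = 0; i < argc; i++) {
        if (argv[i] != NULL && argv[i][0] != '\0')
            n++;                    /* count arguments that are not "" */
    }
    return n;
}
```

Calling `count_nonempty` with the three strings "prog", "" and "x" returns 2. With the prototype form, passing the arguments in the wrong order is a diagnosed error rather than silent breakage.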
main(argc, argv)
char argv[][];
Which sadly is no longer valid in C.
B was bootstrapped in BCPL, then rewritten in B to be self-hosting. But the transition from B to NB (New B) to C was continuous evolution. Thompson or Ritchie would add a feature to the compiler, compile a new compiler, then change the compiler source to use the new feature. If you did not have a sufficiently new B/NB/C compiler you could not compile the compiler and there was no path maintained to deal with that. You went down the hall and asked someone else to give you the newer compiler.
There also wasn't a definitive point where NB became C... they just decided it had changed enough and called it C.
Also helpful: C history https://en.wikipedia.org/wiki/C_language#History
From Wikipedia, early Unix was developed on the PDP-11 (16-bit).
signed 16-bit ints, 8-bit chars, arrays of those previous types.
identifiers were limited in length? (I'm seeing 8 chars, lowercase, as the longest)
octal numeric constants, was hexadecimal used?
there was only a line editor available (vi was 1976)
did the file system support directories at that point?
no C preprocessor, no header files. (1973)
no make/makefiles (1976)
was there a std library used with the linker or an archive of object files that was the 'standard' library?
Bourne shell wasn't around (1979), so wikipedia seems to point to the Thompson shell - https://en.wikipedia.org/wiki/Thompson_shell
was there a debugger or was printf the only tool?
https://web.archive.org/web/20250130134200/https://www.bell-...
See also this comment https://news.ycombinator.com/item?id=43462794
And don't answer "to waste space of course" please. :)
Even today's machines often have a limit as to the offset that can be included in an instruction, so a compiler will have to use different machine instructions if a branch or load/store needs a larger offset. That would be another thing that this function might be useful to test. Actually that seems more likely.
It might be instructive to compare the binary size of this function to the offset length allowed in various PDP-11 machine instructions
" A second, less noticeable, but astonishing peculiarity is the space allocation: temporary storage is allocated that deliberately overwrites the beginning of the program, smashing its initialization code to save space. The two compilers differ in the details in how they cope with this. In the earlier one, the start is found by naming a function; in the later, the start is simply taken to be 0. This indicates that the first compiler was written before we had a machine with memory mapping, so the origin of the program was not at location 0, whereas by the time of the second, we had a PDP-11 that did provide mapping. (See the Unix History paper). In one of the files (prestruct-c/c10.c) the kludgery is especially evident. "
a better way to think of extern is, "this symbol is not declared/defined/allocated here, it is declared/defined/allocated someplace else"
"this is its type so your code can reference it properly, and the linker will match up your references with the declared/defined/allocated storage later"
(i'm using reference in the generic english sense, not pointer or anything. it's "that which can give you not only an r-value but an l-value")
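A tiny sketch of that split between "declared here" and "allocated someplace else". To keep it in one file, the definition sits at the bottom; in a real program it would usually live in a different .c file and the linker would do the matching. `hit_count` and `record_hit` are hypothetical names.

```c
/* Declaration only: gives the name and type, allocates nothing. */
extern int hit_count;

/* This code can use hit_count even though no storage for it has been
 * defined yet at this point in the file. */
static int record_hit(void)
{
    return ++hit_count;
}

/* Definition: this is where the storage actually lives. */
int hit_count = 0;
```

Two calls to `record_hit()` return 1 and then 2, and `hit_count` ends up as 2, so both the r-value and the l-value use go through the one defined object.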
C and its contemporaries introduced automatic (in modern terms, local or stack-allocated) values, often with lexically scoped lifetimes. extern means something outside this file declares the storage for it, and register means the compiler should keep the value in a register.
However, auto has always been the default and thus redundant, and almost no one explicitly specified it, so it was little used in the wild. So the C23 committee adopted auto to mean the same as in C++: automatically infer the type of the declaration.
You can see some of B's legacy in the design of C. Making everything int by default harkens back to B's lack of types because everything was a machine word you could interpret however you wanted.
Also with original C's function declarations which don't really make sense. The prototype only declares the name and the local function definition then defines (between the closing paren and the opening brace) the list of parameters and their types. There was no attempt whatsoever to have the compiler verify you passed the correct number or types of parameters.
You can think of deduction as crap type inference.
Modern C still accepts extern declarations inside a function like this (minus the implicit int), but it's considered bad practice because it makes the code less readable. You can of course also put them at global scope (e.g. at top of the source file), but better to put them into a header file, with your code organized into modules of paired .h definition and .c implementation files.
https://www.nokia.com/bell-labs/about/dennis-m-ritchie/bintr...
Have a look at the early history of C document on DMR's site, it mentions that the initial syntax for pointers was that form.
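One place where that old meaning survives in today's C is the parameter list: `int v[]` as a parameter is adjusted to `int *v`. A minimal sketch (the function names are made up for this example):

```c
#include <stddef.h>

/* Written with brackets, but the parameter really is a pointer. */
static long sum_brackets(int v[], size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The two spellings produce the same function type, so this
 * initialisation compiles without a cast. */
static long (*const as_pointer_form)(int *, size_t) = sum_brackets;
```

Given `int data[] = {1, 2, 3, 4}`, `sum_brackets(data, 4)` is 10, and calling through `as_pointer_form(data + 1, 3)` gives 9, since the function only ever receives a pointer and a count.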
Reality is not simple. Every language that’s used for real work has to deal with reality. It’s about how the language helps you manage complexity, not how complex the language is.
Maybe Forth gets a pass but there’s good reason why it’s effectively used in very limited circumstances.
The rest of the complexity stems from the language being a thin layer over a von Neumann abstract machine. You can mess up your memory freely, and the language doesn’t guarantee anything.
Representing computation as words of a fixed bit length, in random access memory, is not simple (see The Art of Computer Programming). And the extent to which other languages simplify is creating simpler memory models.
C forces programs to be simple, because C doesn't offer ways to build powerful abstractions. And as an occasional C programmer, I enjoy that about it. But I don't think it's simple, certainly not from an implementer's perspective.
It uses a different memory model than current hardware, which is optimized for C. While I don't know what goes on under SBCL's hood, the simpler Lisps I'm familiar with usually have a chunk of space for cons cells and a chunk of "vector" space kinda like a heap.
Lisp follows s-expression rules... except when it doesn't. Special forms, macros, and fexprs can basically do anything, and it's up to the programmer to know when sexpr syntax applies and when it doesn't.
Lisp offers simple primitives, but often also very complex functionality as part of the language. Just look at all the crazy stuff that's available in the COMMON-LISP package, for instance. This isn't really all that different than most high level languages, but no one would consider those "simple" either.
Lisp has a habit of using "unusual" practices. Consider Scheme's continuations and use of recursion, for example. Some of those - like first-class functions - have worked their way into modern languages, but imagine how they would have seemed to a Pascal programmer in 1990.
Finally, Lisp's compiler is way out there. Being able to recompile individual functions during execution is just plain nuts (in a good way). But it's also the reason you have EVAL-WHEN.
All that said, I haven't investigated microcontroller Lisps. There may be one or more of those that would qualify as "simple."
But also C itself is a very simple language. I do not mean C++, but pure C. I would probably start with this. Yes, you will crash on runtime errors, but besides that it's a very, very simple language, which will give you a good understanding of memory allocation, pointers etc.
Best programming joke. Teacher said when your code becomes "recalcitrant", and we had no idea what he meant. This was in the bottom floor of the library, so on break, we went upstairs and used the dictionary. Recalcitrant means not obeying authority. We laughed out loud, and then went silent. Oops.
The instructor was a commentator on the cryptic-C challenges, and would often say... "That will not do what you think it will do" and then go on and explain why. Wow. We learned a lot about the pre-processor, and more about how to write clean and useful code.
It's still a tad more complicated than it needs to be - e.g. you could drop non-0-based arrays, and perhaps sets and even enums.
C is simple for some use cases, and not for others.
Syntactically, yes. Semantically, no.
There are languages with tons of "features" with far, far less semantic overhead than C.
https://blog.regehr.org/archives/767
FWIW, writing programs in C has been my day job for a long time.
Turing Tarpits like Brainfuck or the Binary Lambda Calculus are a more extreme demonstration of the distinction, they can be very tiny languages but are extremely difficult to actually use for anything non-trivial.
I think difficulty follows a "bathtub" curve when plotted against language size. The smallest languages are really hard to use, as more features get added to a language it gets easier to use, up to a point where it becomes difficult to keep track of all the things the language does and it starts getting more difficult again.
I would say Zig is the spiritual follower from the first two, while Go follows up the Oberon and Limbo heritage.
That's because computers are very complex with tons of nuance.
I remember a Computerphile video where prof. Brailsford said something along the lines of "nobody knew who wrote the first C compiler, everybody just kinda had it and passed it around the office" which I think is funny. There's some sort of analogy to life and things emerging from the primordial soup there, if you squint hard enough.
the page that's referenced from GitHub doesn't describe that
http://cm.bell-labs.co/who/dmr/primevalC.html
however there probably was a running c compiler (written in assembly) and an assembler and a linker available, hand bootstrapped from machine code, then assembler, linker, then B, NB and then C...
We can't tell but that would make sense...
Then they tweaked the compiler and called it NB (New B), then eventually tweaked it enough they decided to call it C.
The compiler continuously evolved by compiling new versions of itself through the B -> New B -> C transition. There was no clean cutoff to say "ah this was the first C compiler written in New B".
You can see evidence of this in the "pre-struct" version of the compiler after Ritchie had added structure support but before the compiler itself actually used structs. They compiled that version of the compiler then modified the compiler source to use structs, thus all older versions of the compiler could no longer compile the compiler: https://web.archive.org/web/20140708222735/http://thechangel...
Primeval C: https://web.archive.org/web/20140910102704/http://cm.bell-la...
A modern bootstrapping compiler usually keeps around one or more "simplified" versions of the compiler's source. The simplest one either starts with C or assembly. Phase 0 is compiled or assembled then is used to compile Phase 1, which is used to compile Phase 2.
(Technically if you parsed through all the backup tapes and restored the right versions of old compilers and compiler source you'd have the bootstrap chain for C but no one bothered to do that until decades later).
https://www.opengroup.org/openbrand/register/index2.html
That would include a C compiler, but yours is probably on tape somewhere.
Linux has been on this list, courtesy of two Chinese companies.
A Lisp compiler today should by default evaluate every top level form that it compiles, unless the program opts out of it.
I made the decision in TXR Lisp and it's so much nicer that way.
There are fewer surprises and less need for boilerplate for compile time evaluation control. The most you usually have to do is tell the compiler not to run that form which starts your program: for instance (compile-only (main)). In a big program with many files that could well be the one and only piece of evaluation control for the file compiler.
The downside of evaluating everything is that these definitions sit in the compiler's environment. This pollution would have been a big deal when the entire machine is running a single Lisp image. Today I can spin up a process for the compiling. All those definitions that are not relevant to the compile job go away when that exits. My compiler uses a fraction of the memory of something like GCC, so I don't have to worry that these definitions are taking up space during compilation; i.e. that things which could be written to the object file and then discarded from memory are not being discarded.
Note how when eval-when is used, it's the club sandwich 99% of the time: all three toppings, :compile-toplevel, :load-toplevel, :execute are present. The ergonomics are not very good. There are situations in which it would make sense to only use some of these but they rarely come up.
There was a lot of self-modification going on in those days. Old machine language stuff had very limited resources, so we often modified code, or reused code space.
edit: http://cm.bell-labs.co/who/dmr/primevalC.html (linked from another comment) has the answer:
> A second, less noticeable, but astonishing peculiarity is the space allocation: temporary storage is allocated that deliberately overwrites the beginning of the program, smashing its initialization code to save space. The two compilers differ in the details in how they cope with this. In the earlier one, the start is found by naming a function; in the later, the start is simply taken to be 0. This indicates that the first compiler was written before we had a machine with memory mapping, so the origin of the program was not at location 0, whereas by the time of the second, we had a PDP-11 that did provide mapping. (See the Unix History paper). In one of the files (prestruct-c/c10.c) the kludgery is especially evident.
So I guess it has to be a function in order to be placed in front of main() so the buffer can overflow into the no longer needed code at the start of it.
This can be a strength, to be fair - the human mind really does tend to get stuck in a rut based on familiarity, and someone new to the domain can spot solutions that others haven't because of that. But more often, it turns into futile attempts to solve problems while forgetting the lessons of the past.
Also, notice how the functions call each other from wherever, even from different files, without need of any forward declarations, it simply works, which, as I have been repeatedly told, is not something a single-pass compiler can implement :)
Regarding functions, it only works if the function returns int and you match the types correctly for any call that doesn't have the prototype in scope. I believe this to be one of the relic B compatibility features, BTW, since that's exactly how it also worked in B (except that int is the only type there so you only had to match the argument count correctly).
I was refuting the idea that they sat down and wrote the C compiler in B, then rewrote the compiler in C and compiled it with the B-compiled C compiler. You and the parent might not have meant it that way but I wanted to clarify because in modern terms that is what many people will assume.
What else is there? Pointless distinction between the declaration syntax of functions and procedures?
[1] for certain definitions of 'really'
That is just an illusion to trip unsuspecting programmers who have false mental models. Pointers are not addresses, and pointer arithmetic is rife with pitfalls. There is the whole pointer provenance thing, but that's more like the tip of the iceberg.
That is really the problem with C; it feels like you can do all sorts of stuff, but in reality you are just invoking nasal demons. The real rules on what you can and can not do are far more intricate and arcane, and nothing about them is very obvious on the surface level.
Until WG14 makes everything you love about C "undefined behavior" in the name of performance.
What do you mean?
I just looked up WG14 and I cannot see what you mean
A link perhaps? Am I going to have to "pin" my C compiler version?
For some of these people WG14 (the C language sub-committee of SC22, the programming language sub-committee of JTC1, the Joint Technical Committee between ISO and the IEC) is the problem because somehow they've taken this wonderful language where you just write stuff and it definitely works and does what you meant and turned it into something awful.
This doesn't make a whole lot of sense, but hey, they wrote nonsense and they're angry that it didn't work, do we expect high quality arguments from people who mumble nonsense and make wild gestures on the street because they've imagined they are wizards? We do not.
There are others who blame the compiler vendors, this at least makes a little more sense, the people who write Clang are literally responsible for how your nonsense C is translated into machine code which does... something. They probably couldn't have read your mind and ensured the machine code did what you wanted, especially because your nonsense doesn't mean that, but you can make an argument that they might do a better job of communicating the problem (C is pretty hostile to this, and C programmers no less so)
For a long time I thought the best idea was to give these people what they ostensibly "want" a language where it does something very specific, as a result it's slow and clunky and maybe after you've spent so much effort to produce a bigger, slower version of the software a friend wrote in Python so easily these C programmers will snap out of it.
But then I read some essays by C programmers who had genuinely set out on this path and realised to their horror that their fellow C programmers don't actually agree what their C programs mean, the ambiguity isn't some conspiracy by WG14 or the compiler vendors, it's their reality, they are bad at writing software. The whole point of software is that we need to explain exactly what the machine is supposed to do, when we write ambiguous programs we are doing a bad job of that.
BTW: The right place to complain if you disagree would be the compiler vendors. In particular the Clang side pushes very much for keeping C and C++ aligned, because they have a shared C/C++ FE. So if you want something else, please file or comment on bugs in their bug tracker. Similar for other compilers.
To the best of my knowledge, there was no case where "auto" wasn't redundant. See e.g. https://stackoverflow.com/a/2192761
This makes me feel better about repurposing it, but I still hate the shitty use it's been put to.
Actually I did use C compilers, with K&R C subset for home computers, where auto mattered.
Naturally they are long gone, this was in the early 1990's.
People can skip the usual lifecycle and feedback for an idea by jumping directly to the committee stage.
The idea that C does not offer ways to build powerful abstractions is also wrong in my opinion. It basically allows the same abstractions as other languages, but it does not provide as much syntactic sugar. Whether this syntactic sugar really helps or whether it obscures semantics is up for debate. In my opinion (having programmed a lot more C++ in the past), it does not and C is better for building complex applications than C++. I build very complex applications in C myself and some of the most successful software projects were built using C. I find it easier to understand complex applications written in C than in other languages, and I also find it easier to refactor C code which is messed up compared to untangling the mess you can create with other languages. I admit that some people might find it helpful to have the syntactic sugar as help for building abstractions. In C you need to know how to build abstractions yourself based on training or experience.
I see a lot of negativity towards C in recent years, which goes against clear evidence, e.g. "you can not build abstractions" or "all C programs segfault all the time" when in reality most of the programs I rely on on a daily basis and which in my experience never crash are written in C.
The preprocessor is a classic example of simplicity in the wrong direction: it's simple to implement, and pretty simple to describe, but when actually using it you have to deal with complexity like multiple evaluation of macro arguments.
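For anyone who hasn't been bitten by it, the multiple-evaluation problem looks roughly like this. It's a minimal sketch; `SQUARE`, `next_value` and the demo functions are names invented for the example.

```c
/* Counts how many times next_value() has run. */
static int calls = 0;

/* A function with a side effect: bumps the counter and returns it. */
static int next_value(void)
{
    calls++;
    return calls;
}

/* Function-like macro: the argument text is pasted in twice. */
#define SQUARE(x) ((x) * (x))

/* Ordinary function: the argument is evaluated exactly once. */
static int square_fn(int x)
{
    return x * x;
}

/* Expands to ((next_value()) * (next_value())): two calls. */
static int macro_demo(void)
{
    calls = 0;
    return SQUARE(next_value());
}

/* One call to next_value(), then the result is squared. */
static int function_demo(void)
{
    calls = 0;
    return square_fn(next_value());
}
```

`macro_demo()` returns 2 (1 times 2, in whichever order the two calls happen) and leaves `calls` at 2, while `function_demo()` returns 1 and leaves `calls` at 1. Nothing in the call site tells you which of the two behaviours you are getting.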
The semantics are a disaster ("undefined behavior").
This is damning with faint praise. Perl is undecidable to parse! Even if C isn't as bad as Perl, it's still bad enough that there's an entire Wikipedia article devoted to how bad it is: https://en.wikipedia.org/wiki/Lexer_hack
Doesn't sound as much of a problem with the language as it is with the design of earlier compilers.
Walter Bright (who, among other things, has been employed to work on a C preprocessor) seems to disagree that the C preprocessor is simple to implement: https://news.ycombinator.com/item?id=20890749
> The preprocessor is fiendishly tricky to write. [...] I had to scrap mine and reimplement it 3 times.
I have seen other people in the general "C implementer/standards community" complain about it as well.
I'm curious, what language do you know of with a more complex macro system than the whole C preprocessor?
EDIT: To be clear to prospective downvoters, I'm not just throwing these languages out because they're hype or whatever. They all have a grammar that's much simpler to parse. Notably, you can construct a parse tree without a semantic analyser which is capable of running in lockstep with the parser to provide semantic information to the parser. You can just write a parser which makes a parse tree.
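For contrast, this is roughly the situation a C parser is in. In the sketch below (names are made up for illustration), the tokens `T * p` are a pointer declaration in one function and a multiplication in the other, and the only way to tell is to know what `T` currently names.

```c
typedef int T;

/* Here T names a type, so `T * p;` declares p as a pointer to int. */
static int as_declaration(void)
{
    int value = 5;
    T * p;
    p = &value;
    return *p;
}

/* Here a local variable called T hides the typedef, so `T * p`
 * is an ordinary multiplication of two ints. */
static int as_expression(void)
{
    int T = 6, p = 7;
    int product;
    product = T * p;
    return product;
}
```

`as_declaration()` returns 5 and `as_expression()` returns 42. The parser has to consult the symbol table at each `T` to pick the right parse, which is the feedback loop the lexer hack article describes.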
When people say that C is a simple language, my interpretation is that they mean it is easy to interpret what a C program does at a low level, not that it is simple to write.
C ain't simple, it's an organically complex language that just happens to be small enough that you can fit a compiler into the RAM of a PDP-11.
Features only get added when there is a champion to push them forward across all hurdles (candidate) and they are voted in by their peers (election); at the end of a government cycle (ISO revision), the compiler users rejoice at the new set of features.
The C preprocessor is not text substitution.
It is not easy to describe what C does at a low level. There are simple, easy to describe and wrong models of what C does "at a low level". C's semantics are defined by a very difficult to understand standards document, and if you use one of those simple and enticing mental models, you will end up with incorrect C code which works until you try a different compiler or enable optimisations.
The preprocessor has some weird behavior, but it is also not very complicated.
And I would argue that the abstract machine model of C is still relatively simple. There are certainly simpler languages in this regard, but they give up one of the key powers of C, i.e. that you can manipulate the representation of objects on a byte level.
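As a small illustration of what byte-level access means in practice, here is a sketch using only `memcpy` and `unsigned char`, both of which the standard allows for inspecting object representations. The helper names are hypothetical, and it assumes 8-bit bytes so that a `uint32_t` occupies four of them.

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a uint32_t into a byte buffer.
 * `out` must have room for sizeof(uint32_t) bytes. */
static void bytes_of_u32(uint32_t value, unsigned char *out)
{
    memcpy(out, &value, sizeof value);
}

/* Rebuild a uint32_t from bytes previously produced by bytes_of_u32. */
static uint32_t u32_from_bytes(const unsigned char *in)
{
    uint32_t value;
    memcpy(&value, in, sizeof value);
    return value;
}

/* Returns 1 if the least significant byte is stored first.
 * Reading any object through unsigned char * is permitted. */
static int is_little_endian(void)
{
    uint32_t probe = 1;
    const unsigned char *p = (const unsigned char *)&probe;
    return p[0] == 1;
}
```

Passing 0x01020304 through `bytes_of_u32` yields the bytes 01, 02, 03, 04 in an order that depends on the machine, and `u32_from_bytes` gives the original value back either way.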