The earliest versions of the first C compiler known to exist

The earliest versions of the first C compiler known to exist (github.com)

375 points by diginova 1 year ago | 173 comments

tanelpoder 1 year ago |

The first publicly available version of Oracle Database (v2 released in 1979) was written in assembly for PDP-11. Then Oracle rewrote v3 in C (1983) for portability across platforms. The mainframes at the time didn't have C compilers, so instead of writing a mainframe-specific database product in a different language (COBOL?), they just wrote a C compiler for mainframes too.

chasil 1 year ago | |

UNIX was ported to the System/370 in 1980, but it ran on top of TSS, which I understand was an obscure product.

"Most of the design for implementing the UNIX system for System/370 was done in 1979, and coding was completed in 1980. The first production system, an IBM 3033AP, was installed at the Bell Laboratories facility at Indian Hill in early 1981."

https://web.archive.org/web/20240930232326/https://www.bell-...

jdougan 1 year ago | | |

Interesting. Summer 84/85 (maybe 85/86) I used a port of PCC to System/360 (done, I believe, by Scott Kristjanson) on the University of British Columbia mainframes (Amdahls running MTS). I was working on mail software, so I had to deal with EBCDIC/ASCII issues, which was no fun.

I sometimes wonder if that compiler has survived anywhere.

skissane 1 year ago | |

> The first publicly available version of Oracle Database (v2 released in 1979) was written in assembly for PDP-11.

I wonder if anybody still has a copy of Oracle v2 or v3?

Oldest I've ever seen on abandonware sites is Oracle 5.1 for DOS

> The mainframes at the time didn't have C compilers

Here's a 1975 Bell Labs memo mentioning that C compilers at the time existed for three machines [0] – PDP-11 UNIX, Honeywell 6000 GCOS, and "OS/370" (which is a bit of a misnomer, I think it actually means OS/VS2 – it mentions TSO on page 15, which rules out OS/VS1)

That said, I totally believe Oracle didn't know about the Bell Labs C compiler, and Bell Labs probably wouldn't share it if they did, and who knows if it had been kept up to date with newer versions of C, etc...

SAS paid Lattice to port their C compiler to MVS and CMS circa 1983/1984, so probably around the same time Oracle was porting Oracle to IBM mainframes – because I take it they also didn't know about or couldn't get access to the Bell Labs compiler

Whereas, Eric Schmidt succeeded in getting Bell Labs to hand over their mainframe C compiler, which was used by the Princeton Unix port, which went on to evolve into Amdahl UTS. So definitely Princeton/Amdahl had a mainframe C compiler long before SAS/Lattice/Oracle did... but maybe they didn't know about it or have access to it either. And even though the original Bell Labs C compiler was for MVS (aka OS/VS2 Release 2–or its predecessor SVS aka OS/VS2 Release 1), its Amdahl descendant may have produced output for Unix only

I assume whatever C compiler AT&T's TSS-based Unix port (UNIX/370) used was also a descendant of the Bell Labs 370 C compiler. But again, it probably produced code only for Unix not for MVS, and probably wasn't available outside of AT&T either

[0] https://archive.org/details/ThePortableCLibrary_May75/page/n...

ggm 1 year ago | |

I very much doubt anyone from the time wants to talk about it, but there is substantial bad blood about Oracle and Ingres. I believe not all of this story is in the public domain, nor capable of being discussed without lawyers.

dboreham 1 year ago | |

Writing something that large in assembly is pretty crazy, even in 1979!

acchow 1 year ago | | |

Keep in mind, Oracle was designed to run with 128KB of RAM (no swapping). So it was really tens of thousands of lines, not millions.

saghm 1 year ago | | |

Was it actually that uncommon back then? My understanding is that there were other things (including Unix itself, since it predated C and was only rewritten in it later) written in assembly initially back in the 70s. Maybe Oracle is much larger compared to other things done this way than I realize, or maybe the veneration of Unix history has just been part of my awareness for too long, but for some reason hearing that this happened with Oracle doesn't seem to hit as hard for me as it seems for you. It's possible I've become so accustomed to something historically significant that I fail to be impressed by a similar feat, but I genuinely thought that assembly was just the language used for low-level stuff for a long time (not that I'm saying there weren't other systems languages besides C, but my recollection is having read that for a while some people were skeptical of the idea of using any high-level language in the place of assembly for systems programming).

ChrisMarshallNY 1 year ago |

This is my favorite function :): https://github.com/mortdeus/legacy-cc/blob/936e12cfc756773cb...

arp242 1 year ago | |

Gotta love the user-friendliness of these old Unix tools:

  if (argc<4) {
      error("Arg count");
      exit(1);
  }

Rendello 1 year ago | | |

SQLite error messages are similarly spartan. I wrote a SQLite extension recently and didn't find it difficult to have detailed/dynamic error messages, so it may have just been a preference of the author.

Amlal 1 year ago | |

Ah, yes, was that because of a lack of inline assembly? I feel like these could be replaced by 'nop' operations.

johnisgood 1 year ago | |

What is the point of it?

aap_ 1 year ago | | |

It's an awkward way to reserve memory. The important detail here is that both compiler phases do this, and the way the programs are linked guarantees that the reserved region has the same address in both phases. Therefore an expression tree involving pointers can be passed to the second phase very succinctly. Not pretty, no, but hardware limitations force you to come up with strange solutions sometimes.
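For comparison, here is a minimal sketch of how a fixed region might be reserved in present-day C, using a static array rather than a function body full of dummy calls. It is only an analogue, not the original compiler's code: `region`, `region_alloc`, `region_reset` and the size `REGION_WORDS` are hypothetical names chosen for illustration, and nothing here reproduces the trick of making two separately linked programs agree on the region's address.

```c
#include <stddef.h>

/* Hypothetical size, chosen only for illustration. */
#define REGION_WORDS 512

/* Storage reserved at link time; the array plays the role that the
   body of waste() is described as playing in the old compiler. */
static int region[REGION_WORDS];

/* Index of the first unused word in the region. */
static size_t next_free = 0;

/* Hand out n consecutive words, or NULL if the region is exhausted. */
int *region_alloc(size_t n)
{
    if (n > REGION_WORDS - next_free)
        return NULL;
    int *p = &region[next_free];
    next_free += n;
    return p;
}

/* Release everything at once, e.g. between two expressions. */
void region_reset(void)
{
    next_free = 0;
}
```

A caller would take nodes with `region_alloc` while building one expression tree and call `region_reset` before starting the next one.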

fxtentacle 1 year ago | | |

It's an obscure way to statically allocate memory for the ospace pointer.

rasjani 1 year ago | | |

Without actually knowing, I'd guess that it would generate bytecode that could be modified later by patching the resulting binary?

I remember a few buddies using a similar pattern in ASM that just added n NOPs into the code to allow patching, thus eliminating a possible recompilation.

agumonkey 1 year ago | | |

warm up the stack ? (no idea to be honest)

tanelpoder 1 year ago | | |

The C alternative for the hardware "halt and catch fire" instruction?

keyle 1 year ago |

Aside: I was playing with Think C [2] yesterday and macOS 6.0.8 (emulated with Mini vMac [1]).

Boy it took a lot of code to get a window behaving back in the day... And this is a much more modern B/C; it's actually ANSI C but the API is thick.

I did really enjoy the UX of macOS 6 and its terse look, if you can call it that [3].

[1] https://www.gryphel.com/c/minivmac/start.html

[2] https://archive.org/details/think_c_5

[3] https://miro.medium.com/v2/resize:fit:1024/format:webp/0*S57...

brucehoult 1 year ago | |

It's much less of your own code if you use TCL (THINK Class Library), which shipped with THINK C 4.0 (and THINK Pascal) in mid 1989.

Your System 6.0.8 is from April 1991, so TCL was well established by then and the C/C++ version in THINK C 5 even used proper C++ features instead of the hand-rolled "OOP in C" (nested structs with function pointers) used by TCL in THINK C 4.
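As a rough illustration of the "nested structs with function pointers" style of OOP in C mentioned above, here is a minimal sketch. It is not taken from the THINK Class Library; `Shape`, `Rect`, `rect_init` and `shape_area` are hypothetical names, and the real library's layout may well differ.

```c
/* Base "class": its one virtual method is stored as a function pointer. */
struct Shape {
    int (*area)(const struct Shape *self);
};

/* "Subclass": the base struct is nested as the first member, so a
   pointer to a Rect can also be treated as a pointer to a Shape. */
struct Rect {
    struct Shape base;
    int w;
    int h;
};

/* Rect's implementation of the area method. */
static int rect_area(const struct Shape *self)
{
    const struct Rect *r = (const struct Rect *)self;
    return r->w * r->h;
}

/* "Constructor": installs the method pointer and sets the fields. */
void rect_init(struct Rect *r, int w, int h)
{
    r->base.area = rect_area;
    r->w = w;
    r->h = h;
}

/* Dynamic dispatch: the caller only needs to know about Shape. */
int shape_area(const struct Shape *s)
{
    return s->area(s);
}
```

After `rect_init(&r, 3, 4)`, a call to `shape_area(&r.base)` dispatches through the pointer to `rect_area`.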

I used TCL for smaller projects, mostly with THINK Pascal which was a bit more natural using Object Pascal, and helped other people use it and transition their own programs that previously used the Toolbox directly, but my more serious programs used MacApp which was released for Object Pascal in 1985, and for C++ in 1991.

keyle 1 year ago | | |

Thanks for this. I was using think C 3.X last night unaware that there is a 5.0. I figured it out as I typed and googled this morning. I will have to revisit the 5.0, and pick up a digitised book.

dark-star 1 year ago |

My favorite function, which some might say even made it into Windows ;-)

    waste()  /* waste space */
    {
     waste(waste(waste),waste(waste),waste(waste));
     waste(waste(waste),waste(waste),waste(waste));
     waste(waste(waste),waste(waste),waste(waste));
     waste(waste(waste),waste(waste),waste(waste));
     waste(waste(waste),waste(waste),waste(waste));
     waste(waste(waste),waste(waste),waste(waste));
     waste(waste(waste),waste(waste),waste(waste));
     waste(waste(waste),waste(waste),waste(waste));
    }

bluetomcat 1 year ago |

Interesting usage of "extern" and "auto". Quite different from contemporary C:

    tree() {
        extern symbol, block, csym[], ctyp, isn,
        peeksym, opdope[], build, error, cp[], cmst[],
        space, ospace, cval, ossiz, exit, errflush, cmsiz;

        auto op[], opst[20], pp[], prst[20], andflg, o, p, ps, os;
        ...

Looks like "extern" is used to bring global symbols into function scope. Everything looks to be "int" by default. Some array declarations are specifying a size, others are not. Are the "sizeless" arrays meant to be used as pointers only?
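For what it's worth, block-scope `extern` declarations and the `auto` storage class still exist in current C, they are just rarely written out. A minimal sketch, with hypothetical names `counter` and `bump`, and with the types spelled out because implicit `int` is no longer allowed:

```c
/* A global defined at file scope. */
int counter = 0;

int bump(void)
{
    /* Block-scope declaration that refers to the global above,
       much like the extern list at the top of tree(). */
    extern int counter;

    /* auto is the default storage class for locals, so it is
       normally omitted. */
    auto int step = 1;

    counter += step;
    return counter;
}
```

This only shows the modern spelling; it does not settle how the early compiler treated the arrays declared without a size.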

ricardo81 1 year ago |

Reminds me of the humility every programmer should have, basically we're standing on the shoulders of giants and abstraction for the most part. 80+ years of computer science.

Cool kids may talk about memory safety but ultimately someone had to take care of it, either in their code or abstracted out of it.

pjmlp 1 year ago | |

Memory safety predates C by a decade, in languages like JOVIAL (1958), ESPOL/NEWP (1961) and PL/I (1964). It continued in C's own decade outside Bell Labs, with PL/S (1970), PL.8 (1970), Mesa (1976) and Modula-2 (1978).

If anything the cool kids are rediscovering what we lost in systems programming safety due to the wide adoption of C, and its influence in the industry, because the cool kids from the 1980s decided memory safety wasn't something worth caring about.

"A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to--they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980 language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."

-- C.A.R. Hoare's "The 1980 ACM Turing Award Lecture"

Guess what programming language he is referring to by "1980 language designers and users have not learned this lesson".
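The check described in the quote, every subscript tested against both declared bounds at run time, can be sketched in C as follows. This is only an illustration of the idea, not how any of the languages above implemented it; `IntArray` and `checked_get` are hypothetical names.

```c
#include <stdbool.h>

/* An array that carries its declared lower and upper bounds. */
struct IntArray {
    int lower;   /* lowest valid subscript */
    int upper;   /* highest valid subscript */
    int *data;   /* storage for upper - lower + 1 elements */
};

/* Read element i into *out. Returns false, leaving *out untouched,
   when i falls outside the declared bounds. */
bool checked_get(const struct IntArray *a, int i, int *out)
{
    if (i < a->lower || i > a->upper)
        return false;
    *out = a->data[i - a->lower];
    return true;
}
```

With bounds 1 to 5, subscripts 0 and 6 are rejected instead of reading past either end of the storage.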

estebank 1 year ago | |

The "cool kids talking about memory safety" are indeed standing on the shoulders of giants, to allow for others to stand even taller.

wang_li 1 year ago | |

Big non sequitur, but your comment triggered a peeve of mine that I find it ironic when people talk like oldsters can't understand technology.

worik 1 year ago | | |

> ...people talk like oldsters can't understand technology

IMO it is young people that have trouble understanding.

The same mistakes are made over and over, lessons learned long ago are ignored in the present

It is easier to write than read, easier to talk than listen, build new than expand the old

90s_dev 1 year ago |

The thing I always loved about C was its simplicity, but in practice it's actually very complex with tons of nuance. Are there any low level languages like C that actually are simple, through and through? I looked into Zig and it seems to approach that simplicity, but I have reservations that I can't quite put my finger on...

FeistySkink 1 year ago |

Missed opportunity not calling it LegaC.

smackay 1 year ago |

1972 is the answer to the question on the lips of everybody too busy to look at the source files.

jeff_carr 1 year ago | |

The first 4 commits in Go are:

commit d82b11e4a46307f1f1415024f33263e819c222b8
Author: Brian Kernighan <bwk@research.att.com>
Date: Fri Apr 1 02:03:04 1988 -0500

    last-minute fix: convert to ANSI C
    
    R=dmr
    DELTA=3  (2 added, 0 deleted, 1 changed)

:100644 100644 8626b30633 a689d3644e M src/pkg/debug/macho/testdata/hello.c

commit 0744ac969119db8a0ad3253951d375eb77cfce9e
Author: Brian Kernighan <research!bwk>
Date: Fri Apr 1 02:02:04 1988 -0500

    convert to Draft-Proposed ANSI C
    
    R=dmr
    DELTA=5  (2 added, 0 deleted, 3 changed)

:100644 100644 2264d04fbe 8626b30633 M src/pkg/debug/macho/testdata/hello.c

commit 0bb0b61d6a85b2a1a33dcbc418089656f2754d32
Author: Brian Kernighan <bwk>
Date: Sun Jan 20 01:02:03 1974 -0400

    convert to C
    
    R=dmr
    DELTA=6  (0 added, 3 deleted, 3 changed)

:100644 000000 05c4140424 0000000000 D src/pkg/debug/macho/testdata/hello.b
:000000 100644 0000000000 2264d04fbe A src/pkg/debug/macho/testdata/hello.c

commit 7d7c6a97f815e9279d08cfaea7d5efb5e90695a8
Author: Brian Kernighan <bwk>
Date: Tue Jul 18 19:05:45 1972 -0500

    hello, world
    
    R=ken
    DELTA=7  (7 added, 0 deleted, 0 changed)

:000000 100644 0000000000 05c4140424 A src/pkg/debug/macho/testdata/hello.b

deweywsu 1 year ago |

Am I interpreting this repo correctly? The first C compiler was written in...C?

Joker_vD 1 year ago |

Funnily enough, it is emphatically not a single-pass compiler.

dbrower 1 year ago | |

I don’t think anybody thinks or thought it was.

int_19h 1 year ago | | |

I thought it would be, given that C is designed in such a way that a single pass ought to be sufficient. Single-pass compilers were not uncommon in that era.

diginova 1 year ago |

Also read how a compiler can be written in the same language - https://en.wikipedia.org/wiki/Bootstrapping_%28compilers%29

https://stackoverflow.com/a/18247926/15566831

robertkoss 1 year ago |

As someone who has no touchpoints with lower languages at all, can you explain to me why those files are called c01, c02 etc.?

tempodox 1 year ago |

Highly interesting!

  main(argc, argv)
  int argv[];

This is a culture shock. Did the PDP-11 not distinguish between `char` and `int`?

jecel 1 year ago | |

Here the `int` is being used in place of `char *`, not `char`.

And yes, the PDP-11 did have byte addressing while the PDP-7 on which Unix was originally created was a word addressed machine.

tempodox 1 year ago | | |

I see, now it makes sense.

spijdar 1 year ago | |

Of course it did -- this was one of the distinguishing features (byte addressing) of the PDP-11 vs the original machine that ran UNIX, the PDP-7, after all ;-)

In "ancient"/K&R C, types weren't specified with the parameters, but on the following lines afterwards. GCC would still compile code like this, if passed the -traditional flag, until ... some point in the last decade or so. Still, this style was deprecated with ANSI C/C89, so it had a good run.

Maken 1 year ago | |

I find even more interesting that in a later version this appears:

  main(argc, argv)
  char argv[][];

Which sadly is no longer valid in C.

ModernMech 1 year ago |

I thought the first C compiler was written in B.

9rx 1 year ago | |

If we had the full change history you would see that it is written in B. New features were added and changes were iteratively made along the way, but it is the same codebase. Nowadays we'd pick some change point and call it B v2, but back then they named that point C.

xenadu02 1 year ago | | |

That's not quite correct. See my comment here: https://news.ycombinator.com/item?id=43465698

B was bootstrapped in BCPL, then rewritten in B to be self-hosting. But the transition from B to NB (New B) to C was continuous evolution. Thompson or Ritchie would add a feature to the compiler, compile a new compiler, then change the compiler source to use the new feature. If you did not have a sufficiently new B/NB/C compiler you could not compile the compiler and there was no path maintained to deal with that. You went down the hall and asked someone else to give you the newer compiler.

There also wasn't a definitive point where NB became C... they just decided it had changed enough and called it C.

indoordin0saur 1 year ago | |

Yes. I'm not an expert in compilers, but how is the first C compiler also written in C? How did they compile the compiler?

ModernMech 1 year ago | | |

There's a thread here which talked about it: https://news.ycombinator.com/item?id=26721305

ZhiqiangWang 1 year ago |

Can't stop thinking about Ken Thompson Hack. This should be a clean one ...

aap_ 1 year ago |

Probably one of my favorite pieces of software of all time. Learned so much from this!

gus_massa 1 year ago | |

Do you remember any interesting anecdote you can share?

aap_ 1 year ago | | |

Anecdote probably not. But I learned how a compiler works from it and reconstructed the B compiler based on it (found here: https://github.com/aap/b, warning: repo is messy, will clean up more soon hopefully).

canucker2016 1 year ago |

Can people who have used/were around at this time (early 1970s) give a description of the typical dev environment?

Also helpful: C history https://en.wikipedia.org/wiki/C_language#History

From Wikipedia, early Unix was developed on the PDP-11 (16-bit).

signed 16-bit ints, 8-bit chars, arrays of those previous types.

identifiers were limited in length? (I'm seeing 8 chars, lowercase, as the longest)

octal numeric constants, was hexadecimal used?

there was only a line editor available (vi was 1976)

did the file system support directories at that point?

no C preprocessor, no header files. (1973)

no make/makefiles (1976)

was there a std library used with the linker or an archive of object files that was the 'standard' library?

Bourne shell wasn't around (1979), so wikipedia seems to point to the Thompson shell - https://en.wikipedia.org/wiki/Thompson_shell

was there a debugger or was printf the only tool?

int_19h 1 year ago | |

I'm not sure about max identifier length in general, but identifiers exported across translation units (i.e. non-static in modern C) were limited to 6 significant chars as late as ISO C90, although I don't think there were still any compilers around at the time that actually made use of this limit.
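To make the six-character limit concrete, here is a small sketch that tests whether two external names would be indistinguishable to a linker that keeps only the first six characters and ignores case, which C90 permitted an implementation to do. `externals_collide` and `SIGNIFICANT` are hypothetical names invented for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Minimum number of significant characters C90 guarantees for
   identifiers with external linkage. */
#define SIGNIFICANT 6

/* Returns 1 if a and b match in their first SIGNIFICANT characters,
   ignoring case; returns 0 otherwise. */
int externals_collide(const char *a, const char *b)
{
    for (size_t i = 0; i < SIGNIFICANT; i++) {
        int ca = tolower((unsigned char)a[i]);
        int cb = tolower((unsigned char)b[i]);
        if (ca != cb)
            return 0;   /* differ within the significant prefix */
        if (ca == '\0')
            return 1;   /* both names ended at the same point */
    }
    return 1;           /* identical up to the limit */
}
```

Under that rule `errflush` and a hypothetical `errflu` would name the same symbol, while `cmst` and `cmsiz` remain distinct because they differ at the fourth character.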

andromaton 1 year ago |

Somebody check for Trojan horses. (Ref to Ken Thompson)

sbassi 1 year ago |

which compiler is used to compile the first compiler?

sc68cal 1 year ago | |

With BCPL

https://web.archive.org/web/20250130134200/https://www.bell-...

See also this comment https://news.ycombinator.com/item?id=43462794

ramon156 1 year ago |

Love how unserious some of the code comments are. Makes you feel less noob for a second :')

higgins 1 year ago |

disappointed this didn't link to some analysis of clay tablets