Linux on a Commodore 64(github.com) |
https://www.folklore.org/StoryView.py?story=Saving_Lives.txt
To explain for the uninitiated how rare this bit of hardware is: the REUs available for the C64 back in the day were 256 kB and 512 kB. These are most commonly built as replicas, as there are schematics available for them. Sometime in the late 90s there was also an "expansion" for the C64 that contained a completely new CPU (SuperCPU, 65816) that was code compatible with the original, and I believe this device could accommodate up to 16 MB.
Later reimplementations based purely on FPGAs popped up, including an REU with 16 MB. The original SuperCPU schematic was lost to time. Allegedly FPGA-based expansions are available to buy for a few hundred EUR now, but I don't know anyone who attempted to buy one or has one.
So, although it is a neat trick (still a cool tech achievement), saying it runs on a C64 is akin to saying I got Doom 3 running on a 386, but my 386 is actually a PCI card in a modern PC...
If I can't pull it off on my C64 with hardware available back in the day (or hardware one could realistically have built back in the day), I'm not sure saying "runs on c64" is correct.
Coming back to the subject of an REU, why has no one published a schematic for one yet? There are cheap SRAM chips floating around on eBay. It should be trivial to put one together. Unfortunately it isn't, because the original (SuperCPU) had two components that need a beefy FPGA to emulate: the SuperCPU itself and its DMA controller, which was a custom ASIC, I believe.
Perhaps as cheap(er) FPGAs or microcontrollers with FPGA-like functionality become available, someone will create an open source "super cpu". As of yet, everyone I've ever heard of using these uses emulation. Nothing wrong with that, but I get the most out of my "retro hobby" by running original hardware. Emulation is very useful for dev, but for general use it's a bit "meh" for me.
A 16MB REU could absolutely have been built in the 80s. It would have been absolutely astronomically expensive, but there’s no technical reason it could not be done. You seem to be confusing the SuperCPU with a plain REU expansion — the REU is just a bunch of RAM and an ASIC that talks to the 64 and allows it to store or retrieve banks of RAM (because obviously a 6502 cannot address more than 64K so instead you have to tell it to swap out system RAM) — there is no CPU on it.
The SuperCPU (65816) can indeed address up to 16MB directly and that is a different thing. The project in the OP runs on a stock C64 on the stock C64 CPU, it just needs a mountain of RAM that would have cost the equivalent of a house back then ;)
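To make the "store or retrieve banks of RAM" idea concrete, here is a rough host-side simulation in C. It is only a sketch: it does not touch the real REU registers, the names reu_stash and reu_fetch are made up for illustration, and the 8-bank size (512 KB) is just an assumption chosen to match a 1750-sized unit. The point is that the CPU only ever sees its own 64K, and the expansion RAM is reached by copying blocks in and out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BANK_SIZE 0x10000u   /* 64 KB: everything a 6502 can address */
#define REU_BANKS 8u         /* assumption: 8 x 64 KB = 512 KB */

uint8_t c64_ram[BANK_SIZE];              /* simulated system RAM */
uint8_t reu_ram[REU_BANKS * BANK_SIZE];  /* simulated expansion RAM */

/* "Stash": copy len bytes from system RAM into a bank of expansion RAM. */
void reu_stash(uint16_t c64_addr, uint8_t bank, uint16_t reu_addr, uint16_t len)
{
    assert(bank < REU_BANKS);
    assert((uint32_t)c64_addr + len <= BANK_SIZE);
    assert((uint32_t)reu_addr + len <= BANK_SIZE);
    memcpy(&reu_ram[(uint32_t)bank * BANK_SIZE + reu_addr],
           &c64_ram[c64_addr], len);
}

/* "Fetch": copy len bytes from a bank of expansion RAM back into system RAM. */
void reu_fetch(uint16_t c64_addr, uint8_t bank, uint16_t reu_addr, uint16_t len)
{
    assert(bank < REU_BANKS);
    assert((uint32_t)c64_addr + len <= BANK_SIZE);
    assert((uint32_t)reu_addr + len <= BANK_SIZE);
    memcpy(&c64_ram[c64_addr],
           &reu_ram[(uint32_t)bank * BANK_SIZE + reu_addr], len);
}
```

On real hardware the copy is done by the DMA chip rather than by the CPU, which is what makes the whole approach bearable at 1 MHz.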
Or is that how far your purism in anti-emulation goes? (;
Now I want one of those. I guess these days you could easily fit a 386, 486, Pentium and who knows what other SoCs on a single PCIe card, passively cooled…
I've tried the "Kung Fu Flash": it's a software-defined cartridge that is cheap (just a single STM32) and can do pretty much everything. I bought it because I'm trying to duplicate the developer experience I see on "8-bit show and tell". It can emulate the "super snapshot", but not the REU. It's a really nice way to quickly try a lot of C-64 software and games.
https://8bithardware.wixsite.com/website/post/kung-fu-flash
https://github.com/KimJorgensen/KungFuFlash
I also have an SD2IEC: what I've learned is that it would have been useful to get a variant with an extra DIN socket. It's nice but I was never a fan of C-64's DOS and this reinforces it. To mount a D64 disk image you have to: OPEN1,8,15,"CD:MYIMAGE.D64":CLOSE1... yuck..
JiffyDOS (replacement ROM for the C-64) improves this (it's faster and includes a permanent DOS wedge), I bought one- it's on the way. I'm curious to try it with the real 1541 drive.
What got me started on this recently is the "Penultimate +2" cartridge for the VIC-20:
https://www.youtube.com/watch?v=eNGyneXHKJQ
In this case, I basically bought a VIC-20 just to try out the cartridge.
Only slightly noticeable waiting times when I accessed some sites, but it worked and the application works too.
I think you'd want to aim somewhere around the Pentium 4 / Athlon XP era. The docs say it doesn't support the original Pentium, so I suppose you could go back as far as the Pentium II if you really want to suffer.
Still impressive of course, but semantics matter :)
I’d be interested in watching a time-lapse video of that on real hardware, if someone has a couple of months/years to spare. ;)
-- Commodore sold a RAM Expansion Unit named "1764" to bring the C64 to 256 KB of RAM;
-- it was possible to use the REU for the C128 named "1750" to bring the C64 to 512 KB of RAM;
-- and it is possible to expand on that to have a 2MB REU for the C64 - see https://www.neperos.com/article/rlut8ce90fbb7701
You can have two megabytes on the C64, pretty "legally".
As far as I know, we still don't have an open source equivalent of that dma chip.
(The max addressable memory with a C64 REU is 16 Megabytes.)
Start there and you’ll find a rabbit hole of reasonable depth.
On the VIC-20, you even get a few colors!
I don’t think you’ll manage with a recent linux kernel. Heck, even 2.6-era stuff won’t fit easily.
As others have mentioned, a 6502 is very poorly suited to C-style code, but a Z80 should be somewhat better with that.
https://en.wikipedia.org/wiki/KERNAL
> The KERNAL was known as kernel[6] inside of Commodore since the PET days, but in 1980 Robert Russell misspelled the word as kernal in his notebooks. When Commodore technical writers Neil Harris and Andy Finkel collected Russell's notes and used them as the basis for the VIC-20 programmer's manual, the misspelling followed them along and stuck.[7]
> According to early Commodore myth, and reported by writer/programmer Jim Butterfield among others, the "word" KERNAL is an acronym (or, more likely, a backronym) standing for Keyboard Entry Read, Network, And Link, which in fact makes good sense considering its role. Berkeley Softworks later used it when naming the core routines of its GUI OS for 8-bit home computers: the GEOS KERNAL.
I had a 6502 machine language book of his as a kid. I figured out in my head what I thought I wanted to do with the various instructions, then wrote out on graph paper the (decimal) number for the op or its args, then transcribed the whole affair into memory manually via POKEs. Good times.
You need to both decrypt and decode, all at above the framerate of the video. I doubt that will be doable on any older hardware, unless said hardware has dedicated components for those functions.
If I had to implement this on an old CPU I would likely be passing network, video and encryption off to co-processors and the older chip will effectively only be running control information.
But at that point, why not just use a modern low-power chip?
Linux Mint Vanessa, running the Mate DE.
The program is a C program with a single Makefile. My workflow was (and still is, even on my desktop) using a Vim with three vertical splits:
1. A LHS split which is a terminal to run make and execute the program for testing
2. A RHS split with the program source code (single file program).
3. A middle split with the test input file and test output file (in horizontal splits).
Although it is just a single file, on my other C projects I've used the same laptop, with the same 3-vert-split Vim, with multiple tabs, so up to maybe 16-20 source files open at a time for a single project.
Building C projects is very fast, even on the Core 2 Duo/2GB RAM setup. Running a similar workflow but in VSCode on my desktop is less snappy than Vim on the laptop.
I haven't tried doing a Go project on that laptop yet with VSCode, but I am tempted to see what happens :-)
But now I can tell why your experience was good. Mate is a phenomenal desktop environment certainly, so 2GB is probably more than enough for a daily driver.
I know from experience if you block the bottom intake vents on a VIC 20, so it can't convect properly, it will eventually start acting funky.
On a related note, I understand the 64's original power bricks are considered timebombs, they might also not appreciate being left on for weeks at a time.
Yes. These early computers found their way into various embedded control applications too, and I suspect there's quite a few C64s still in operation that way; they would've been replaced long ago if they weren't stable. An article occasionally appears when someone discovers this:
If you block all air circulation, sure you might eventually end up with problems, but it takes quite a bit.
For 6502, to get the optimum assembly you'd have to structure your data in structure-of-arrays instead of arrays-of-structures and use indices instead of pointers as much as possible (at least when amount of Ball objects would be < 256).
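A small C sketch of the two layouts, for anyone who hasn't seen the distinction. The field names and NUM_BALLS are assumptions for illustration, not taken from the code under discussion. With the structure-of-arrays form and an 8-bit index, each access can map onto the 6502's indexed addressing (something like LDA ball_x,X) instead of pointer arithmetic through zero page.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BALLS 8  /* assumption: fewer than 256, so a one-byte index is enough */

/* Array-of-structures: element i lives at base + i * sizeof(struct Ball). */
struct Ball {
    uint8_t x, y;
    int8_t dx, dy;
};
struct Ball balls_aos[NUM_BALLS];

/* Structure-of-arrays: one array per field, all sharing the same index. */
uint8_t ball_x[NUM_BALLS];
uint8_t ball_y[NUM_BALLS];
int8_t ball_dx[NUM_BALLS];
int8_t ball_dy[NUM_BALLS];

/* Pointer-based update, the style that is awkward on the 6502. */
void move_aos(struct Ball *b)
{
    b->x = (uint8_t)(b->x + b->dx);
    b->y = (uint8_t)(b->y + b->dy);
}

/* Index-based update on the structure-of-arrays layout. */
void move_soa(uint8_t i)
{
    assert(i < NUM_BALLS);
    ball_x[i] = (uint8_t)(ball_x[i] + ball_dx[i]);
    ball_y[i] = (uint8_t)(ball_y[i] + ball_dy[i]);
}
```

Both functions compute the same thing; only the memory layout and the way an element is named differ.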
On line 14, it uses Y, then decrements it to 0, uses it, increments it, uses it, decrements it, uses it, then increments it again... why not perform the indirect loads on lines 18 and 26 without the Y index and eliminate lines 16, 21, and 25?
here's my pseudocode:
rc2 <= base of struct
rc4 = rc2 + 4 // addr of dx
rc5 = rc2 + 0 // addr of x
rc6 = *(&rc2+4)
rc4 = *(&rc4+1) // get low byte
rc5 = rc6 + *(&rc2) // add high byte
rc4 = rc4 + *(&rc2+1) // add low byte
rc2 = rc5 // store high result
*(&rc2+1) = rc4 // store low result
I believe it could have done more to do the work in place, but my batt is about to die :(
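For reference, the in-place idea can be written out in C roughly like this. It is a sketch under assumptions: the offsets (x at 0, dx at 4) come from the pseudocode above, the function name is made up, and the byte order is little-endian as on the 6502, which may not match the low/high labels in the pseudocode.

```c
#include <assert.h>
#include <stdint.h>

enum { X_OFF = 0, DX_OFF = 4 };  /* assumed field offsets inside the struct */

/* x += dx, one byte at a time, writing the result straight back into x. */
void add_dx_in_place(uint8_t *ball)
{
    assert(ball != 0);

    /* Low bytes first, like CLC / ADC on the 6502. */
    uint16_t lo = (uint16_t)(ball[X_OFF] + ball[DX_OFF]);
    uint8_t carry = (uint8_t)(lo >> 8);
    ball[X_OFF] = (uint8_t)lo;

    /* High bytes plus the carry out of the low-byte addition. */
    ball[X_OFF + 1] = (uint8_t)(ball[X_OFF + 1] + ball[DX_OFF + 1] + carry);
}
```

No temporaries beyond the carry are needed, which is roughly what a hand-written version would do with the carry flag.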
Now the 64C was released in 1986, four years after the 64 and its faulty power supplies came out. I don't know whether Commodore had decisively fixed the flawed PSUs by that time, but I know for sure that my second PSU lasted for the lifetime of that device too.
Instead of passing in a pointer to two separate functions, I'd write a single UpdateBalls procedure that operated on global data. This data is going to be core to my game logic and physics, so I'd put it all on the ZP. As you suggested, "structure-of-arrays". I'd choose a fixed number of balls so I don't need an argument; maybe I'd set my loop to iterate backwards so I get a free zero check with the decrement, maybe I'd unroll the loop ("dead" balls can be placed off-screen with a dx/dy of 0). I'd probably decide that I don't need 16-bit precision for the deltas (how fast could the balls move, really?), and a 16-8 addition is going to be quicker than a 16-16 one.
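Roughly what I mean, as a C sketch. The array names and NUM_BALLS are assumptions for illustration; zero-page placement is toolchain-specific, so it is left out here; and the loop is not unrolled. It shows the fixed ball count, the global structure-of-arrays data, the backwards loop, and the 16-bit position plus 8-bit delta.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BALLS 4  /* fixed count, so UpdateBalls needs no argument */

/* Global structure-of-arrays state (would go in zero page on a real target). */
uint16_t ball_x[NUM_BALLS];
uint16_t ball_y[NUM_BALLS];
int8_t ball_dx[NUM_BALLS];  /* 8-bit signed deltas: a 16 + 8 addition */
int8_t ball_dy[NUM_BALLS];

void UpdateBalls(void)
{
    uint8_t i = NUM_BALLS;
    assert(NUM_BALLS > 0 && NUM_BALLS < 256);

    /* Count down so the decrement doubles as the loop's zero test. */
    do {
        --i;
        ball_x[i] = (uint16_t)(ball_x[i] + ball_dx[i]);
        ball_y[i] = (uint16_t)(ball_y[i] + ball_dy[i]);
    } while (i != 0);
}
```

A "dead" ball is just one whose dx and dy are 0, so it passes through the loop unchanged.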
The compiler isn't going to make these optimizations; that's not a slight against the compiler. In fact, I just checked - the output [0] when I write my C code this way is pretty close to what I'd hand-write. It's roughly a third the number of instructions and - I'm not going to cycle count, so this is a stab in the dark - would take maybe an order of magnitude fewer cycles to run. semu wasn't written with performance on the 6502 in mind, it's not going to have taken considerations like this, so it's going to inevitably be slow when compiled.
Now that this has come up again as the stock reason "you can't do C well on the 6502", replacing the stack, the zero page, and the register set, I'm probably going to reprioritize it and put the register allocator on pause.