Clocking a 6502 simulator to 15GHz

Clocking a 6502 simulator to 15GHz (scarybeastsecurity.blogspot.com)

176 points by scarybeast 6 years ago | 74 comments

kabdib 6 years ago |

I was told by Leonard Tramiel (who was my manager at Atari for a while) that the world record for a production 6502 was 25 MHz. This was demonstrated one Friday evening, some time after the beer fridge had been opened in one of the labs.

I don't know if they applied any kind of external cooling, or what the benchmark was. Probably it was "keep cranking up the clock until pins stop wiggling or smoke comes out." Not very scientific, but quite entertaining.

JoachimS 6 years ago | |

One of my old companies (InformASic) developed a VPN solution for serial communication. The product, LinkShield (later renamed and spun off to form CrypTango), was implemented as a small ASIC. The main CPU was a 6502 clone with memory protection. We clocked it at 33 MHz, but usually ran them at 25 MHz in the products. That 6502 clone was cycle-correct; that is, the number of cycles required for an instruction was the same as for the original MOS 6502.

Nowadays you can quite easily do a 6502 implementation in an FPGA running at 100 MHz, especially if you allow the design to use more cycles for some instructions.

Sadly the product never took off and the companies folded. I have some chips somewhere. Googling at least revealed a picture of the product:

https://www.google.com/imgres?imgurl=https%3A%2F%2Ffarm3.sta...

bemmu 6 years ago | |

From the Commodore book, about the early 80s:

"We actually made a couple of really hot processors for a chess tournament for somebody. He literally water-cooled it, and he ran it at something like eight megahertz. It was just ridiculous how fast he ran it."

Earlier it was explained that some processors coming off the production line could run faster than others, and they could test for it to pick the best ones for such purposes. They didn't end up increasing the clock speed for released computers, as other components could not keep up.

tyingq 6 years ago | |

There's an FPGA 65C02 core running at ~73 MHz. https://github.com/MorrisMA/MAM65C02-Processor-Core

duskwuff 6 years ago | | |

18 MHz, actually -- the FPGA clock speed is 73 MHz, but it executes the equivalent of one 6502 clock cycle in four of its clocks.

That being said, this was implemented on a budget-line FPGA from 2006 (XC3S50A - a small Xilinx Spartan-3A). A modern performance-line FPGA would probably hit a couple hundred MHz easily.
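As a quick check of the 73 MHz versus 18 MHz arithmetic, here is a minimal sketch assuming exactly four FPGA clocks per 6502 cycle, as stated above. The function name is made up for illustration.

```python
def effective_6502_mhz(fpga_clock_mhz, fpga_clocks_per_cpu_cycle):
    """Equivalent 6502 clock rate when each 6502 cycle takes several FPGA clocks."""
    return fpga_clock_mhz / fpga_clocks_per_cpu_cycle


# 73 MHz FPGA clock, four FPGA clocks per 6502 cycle (figures from the comment).
print(effective_6502_mhz(73, 4))  # 18.25
```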

cmrdporcupine 6 years ago | | |

A new stock 65C02 from WDC can do 20 MHz. So this FPGA version at 18 MHz doesn't sound any better. Though I'm sure on a modern FPGA one can do more than that.

QuadrupleA 6 years ago | |

Can anyone with an electronics background explain why it's so hard to clock a 6502 higher than a handful of MHz, when modern chips can do 1000x that? Is it just larger transistor scale leading to excess capacitance / slower switching?

raverbashing 6 years ago | | |

One reason is just that: transistor size and switching speed. Though the technology of the 6502 could probably go to 50 MHz? 100 MHz? Not sure. Would it be equivalent to the 74HC TTL line? Again, not sure.

But the main (basic) reason is that the internal logic blocks don't worry too much about processing and arrival times beyond the speed at which they need to operate. What's simultaneous at 1 MHz might not be so simultaneous at 10 MHz or 100 MHz.

Another (advanced) reason why overclocking it might be hard is EM interference inside and outside the chip.

MagerValp 6 years ago | | |

The 6502 has very limited pipelining, and every CPU cycle is tied to a memory access with no support for wait states or stalls. At 1 MHz it can work with really slow memory (roughly 500 ns), but at 10 MHz it needs ~60 ns, and at 20 MHz something like ~20 ns. The architecture simply wasn't designed for anything above single-digit MHz clock speeds.
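To put rough numbers on that, here is a small sketch that computes the clock period and a half-cycle memory budget. The half-cycle rule is a simplifying assumption, not a datasheet figure; real budgets depend on address setup and data hold times, which is presumably why the figures quoted above do not scale exactly with frequency. Function names are hypothetical.

```python
def cycle_time_ns(clock_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def half_cycle_ns(clock_mhz):
    """Crude memory access budget: assume memory must answer within half a cycle."""
    return cycle_time_ns(clock_mhz) / 2


for mhz in (1, 10, 20):
    print(f"{mhz} MHz: cycle {cycle_time_ns(mhz):.0f} ns, "
          f"half cycle {half_cycle_ns(mhz):.0f} ns")
```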

paulmd 6 years ago | |

"until pins stop wiggling"?

variaga 6 years ago | | |

The pins don't physically wiggle. "pins wiggling" is a common metaphor for "the voltage level on a pin is changing".

As a signal driver is toggled at increasing frequencies ('cranking up the clock'), the signal amplitude (voltage difference between the 'high' and 'low' period) starts to drop. At a high enough frequency, the signal will be indistinguishable from noise and 'stops wiggling'.

mmastrac 6 years ago | | |

Wiggling a pin, i.e. toggling it. Basically means "activity on the pins".

https://www.cypress.com/blog/technical/more-pdl-examples-wig...

metaphor 6 years ago | | |

Euphemism for when you don't see output pins transition state on a measuring instrument, e.g. oscope.

justwalt 6 years ago | | |

I think it should be “start wiggling” as that makes more sense.

xentripetal 6 years ago | | |

Maybe the pins got so hot they melted into the board?

Cthulhu_ 6 years ago | |

I love reading the stories about overclocking attempts (actually that scene doesn't seem to be much of a thing anymore?), people bolting pipes on top of CPUs and filling them with liquid nitrogen, nearly supercooling the CPUs and breaking speed records.

segfaultbuserr 6 years ago |

It's an interesting article, but...

Better title: Clocking a 6502 Simulator to 15 GHz. There are multiple efforts to recreate the physical 6502 CPU on modern hardware; this is not one of them and should not be confused with that.

arriu 6 years ago | |

I was a bit confused and expected to see some elaborate liquid cooling nonsense to get the poor chip up to 15 GHz.

dang 6 years ago | |

Ok, we've put a simulator in the title above.

JshWright 6 years ago |

I realize I'm late to the party, but I've really been enjoying Ben Eater's series on building a simple computer with a 6502.

https://www.youtube.com/playlist?list=PLowKtXNTBypFbtuVMUVXN...

louwrentius 6 years ago | |

Yes, it's awesome.

halotrope 6 years ago |

After stumbling on Ben Eater's “Hello world from scratch” [1] I went out and bought the CPU, some parts, and breadboards. The chip is only a few dollars. It is highly recommended if you want to dive down into computers and digital logic from first principles. Also great fun to get a break from all the screens and layers upon layers of software that I have to deal with daily.

1. https://youtu.be/LnzuMJLZRdU

dodo6502 6 years ago | |

Kind of plugging my own project here, but I too am a software developer that found great joy from breaking away from all the layers of abstraction and working directly with the hardware. I created a portable game system with the 6502:

http://www.dodolabs.io/

halotrope 6 years ago | | |

This is awesome!

__s 6 years ago |

For more "very fast simple CPU" architecture, see 50,000,000,000 Instructions Per Second: Design and Implementation of a 256-Core BrainFuck Computer: https://people.csail.mit.edu/wjun/papers/sigtbd16.pdf

russellbeattie 6 years ago |

Huh... I hadn't considered it before, but Bender's brain could actually be a 6502, just being run at an insanely high clock speed. A few petahertz should be able to handle the AI involved, no?

Planck time is like 10^-43 seconds, so there's lots of room to divvy up a second for more processing power given advanced technologies...

gregoryl 6 years ago | |

If the hardware is advanced enough to do that, the AI software is similarly advanced, and a basic 6502 can produce a Bender-like AI without breaking a sweat!

hvidgaard 6 years ago | | |

More advanced software would in all likelihood require significant calculations, rendering a basic 6502 useless.

londons_explore 6 years ago |

It would be interesting to compare this project to simply converting 6502 assembly into LLVM IR, and letting Clang's optimization passes work their magic.

Obviously self-modifying code would be hard to handle, but every other case ought to work, and the auto-vectorization ought to do amazing things to some loop-heavy code.

zentiggr 6 years ago |

GEOS would have been much more responsive...

jsd1982 6 years ago |

To solve the FF page wrapping problem, I wonder if it would work to double-map each 6502 page to x64 host pages side by side. I assume the word read at FF would straddle the two mapped pages effectively reading the second byte at 00. You'd have to map to host page boundaries of course and probably offset all reads/writes to the end of the host page at $3F00.
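For readers unfamiliar with the wrap in question: on a 6502, a 16-bit pointer fetched from zero-page address $FF takes its high byte from $00, not from $0100. The sketch below illustrates only that behaviour, assuming this zero-page case is the one being discussed. It uses a flat bytearray as a stand-in for 6502 memory, the function names are hypothetical, and it shows neither the double-mapping idea nor the article's actual code.

```python
def read_pointer_naive(mem, addr):
    """Plain little-endian 16-bit read; crosses into the next page at $xxFF."""
    return mem[addr] | (mem[addr + 1] << 8)


def read_zero_page_pointer(mem, zp_addr):
    """6502-style zero-page pointer read: the high-byte address wraps to $00."""
    lo = mem[zp_addr & 0xFF]
    hi = mem[(zp_addr + 1) & 0xFF]
    return lo | (hi << 8)


mem = bytearray(0x10000)  # 64 KiB address space
mem[0x00FF] = 0x34        # low byte of the pointer
mem[0x0000] = 0x12        # high byte a real 6502 would fetch
mem[0x0100] = 0x56        # high byte a naive host word read would fetch

print(hex(read_zero_page_pointer(mem, 0xFF)))  # 0x1234
print(hex(read_pointer_naive(mem, 0xFF)))      # 0x5634
```

A host-side 16-bit load behaves like the naive version, which is why an emulator has to special-case the address or arrange its memory mapping so that the byte after $FF is a copy of $00.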

userbinator 6 years ago |

I believe VMware without hardware support for virtualisation also falls back to "binary translation" and similarly gets tripped by SMC - I don't recall the details right now but one of the ways to detect it was to modify an instruction in an obscure way that the developers had forgotten about.

PaulHoule 6 years ago |

I want to see a 6502-alike clocked to 15GHz with an exotic semiconductor such as GaAs, InP, SiGe, etc.

undersuit 6 years ago | |

Probably wouldn't be able to see it with current fabrication technologies and historic transistor counts.

PaulHoule 6 years ago | | |

Exotic materials, other than maybe SiGe, use fabrication techniques less advanced than Si, and the transistor counts are much lower.

The Department of Defense funded an SBIR grant in the late 1990s to produce an InP-based microprocessor; given the limits of the time, it would have been closer to a 6502 than a Pentium. There has not been word of such a thing since, which leads me to conclude that the topic is classified.

The worst limitation a 6502-era chip has is that it has no instruction cache so instruction reads are fighting with data for memory bandwidth. You might even consider a Harvard architecture where the instructions go on a different bus. Without an I-Cache there is no point in pipelining, but there is a lot of pressure to implement CISCy instructions such as the string copy operation from the 8086 line.

The other issue is that there is no DRAM replacement with exotic materials, and all the difficulties with interconnect latency get a lot worse than they already are. It's clearer how to make SRAM, so having somewhere between 64 KB and 1 MB of SRAM on die seems likely for an exotic material CPU.

Of course, armchair CPU designers are more likely to make progress with transition triggered architectures and FPGAs in 2020.

tasty_freeze 6 years ago |

Now someone needs to write an x86 emulator in 6502 asm and boot Windows.

orionblastar 6 years ago |

The Mega65 runs a 6502 at 50 MHz, compatible with the Commodore 65 plus C64 mode. http://mega65.org

fortran77 6 years ago |

I'm not 100% sure where the 15 GHz equivalent speed calculation comes from.

segfaultbuserr 6 years ago | |

The author showed it at the end of the article. It's the "effective speed" reported by some benchmark programs (including calling subroutines, running for loops, iterating on a string, etc). These are simple and trivial programs and can be highly optimized in a simulator on modern x86_64. Real-world programs, like games, are slower, as acknowledged in the article.
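One plausible way to read "effective speed" is emulated 6502 cycles retired per second of host wall-clock time. The sketch below shows that ratio under that assumption; the function name and the sample numbers are invented for illustration and are not measurements from the article.

```python
def effective_clock_hz(emulated_cycles, host_seconds):
    """Emulated 6502 cycles completed per second of host wall-clock time."""
    return emulated_cycles / host_seconds


# Made-up example: 30 billion emulated cycles finished in 2 seconds of host time.
print(effective_clock_hz(30_000_000_000, 2.0) / 1e9)  # 15.0 (GHz)
```

Under this reading, a benchmark that the simulator can optimize heavily retires many emulated cycles per host second, while code that constantly touches emulated hardware registers retires far fewer.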

scarybeast 6 years ago | | |

A lot of BBC BASIC programs, doing real work (e.g. Mandelbrot drawing etc.), should have a shot at 10GHz. Games are slower because they are hammering hardware registers external to the JIT (sound, graphics, keyboard polling, timing, etc.)

My laptop is an ancient 5th gen i5 with 2 keys having fallen off, so games are down in the 2GHz - 3GHz range for me. (Perhaps the missing keys make all the difference.)

RoutinePlayer 6 years ago |

X86 ... not x64

ajross 6 years ago | |

The architecture never had a good name. AMD originally called it "x86-64" (but not AFAIK "AMD64", even though lots of other people did), but "x86_64" is most common in the open source world (I guess because the underscore makes it legal as a C symbol). "x64" is what Sun and Microsoft decided to use. Intel has called it "ia32e", "EM64T" and "Intel 64" at various times.

I think this article gets a pass.