The 6502 instruction set as a database (gitlab.com)
https://lite.datasette.io/?sql=https://gist.github.com/simon...
Here are the 65 hardcoded opcodes: https://lite.datasette.io/?sql=https://gist.github.com/simon...
And the 64 instructions: https://lite.datasette.io/?sql=https://gist.github.com/simon...
Here's datasette lite running off that file:
https://lite.datasette.io/?parquet=https%3A%2F%2Fcsvbase.com...
Really nice when webby stuff works together
Instead, I hatched a plan:
1. Collect sources, and encode the raw data in a machine-readable form
2. Study those sources, and encode my understanding as assertions,
sanity-checks and validations of that data
3. Synthesise that data according to my understanding, and verify it
against the sources available
I'm going to be thinking about this as the Discovery Manifesto... or maybe the autodidactaliser.
I've found it useful to re-express foreign knowledge in a familiar setting. The magic is possibly that the learning seems fun because the familiar tool is fairly easy to use and the new information is expressed in the familiar way.
It's also good because it's not cheap dopamine, like watching YouTube videos: "I watched a video on the 6502, so I know how it works now?" YouTube videos do have their place, but not at the expense of more in-depth thinking.
This is the first time I've seen an instruction set expressed as a relational database, which I imagine is a very portable way to describe a machine. It might be worth collecting other machine specs in the same format and then creating a portable assembler that uses the specific DBs.
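To make the "assembler driven by a DB" idea concrete, here is a minimal sketch in Python with sqlite3. The table layout, column names, and the `make_isa_db` and `assemble` helpers are all hypothetical, not the linked project's schema, and the little-endian operand encoding is hard-coded for the 6502; a truly portable version would need to read that from the database too.

```python
import sqlite3

def make_isa_db():
    """Build a tiny in-memory ISA table (hypothetical schema, four 6502 opcodes)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE opcodes ("
        " opcode INTEGER PRIMARY KEY,"
        " mnemonic TEXT NOT NULL,"
        " mode TEXT NOT NULL,"
        " operand_bytes INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO opcodes VALUES (?, ?, ?, ?)",
        [
            (0xA9, "LDA", "immediate", 1),
            (0xAD, "LDA", "absolute", 2),
            (0x8D, "STA", "absolute", 2),
            (0xEA, "NOP", "implied", 0),
        ],
    )
    return conn

def assemble(conn, mnemonic, mode, operand=0):
    """Encode one instruction by looking up (mnemonic, mode) in the table."""
    row = conn.execute(
        "SELECT opcode, operand_bytes FROM opcodes"
        " WHERE mnemonic = ? AND mode = ?",
        (mnemonic, mode),
    ).fetchone()
    if row is None:
        raise ValueError(f"no opcode for {mnemonic} in {mode} mode")
    opcode, operand_bytes = row
    # Assumption: operands are little-endian, which holds for the 6502.
    return bytes([opcode]) + operand.to_bytes(operand_bytes, "little")
```

The assembler itself knows nothing about the 6502 beyond the byte order; swapping in another machine's table would change what it emits.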
> A Language for Rapid Processor Specification
From a SLEIGH description, the assembler, disassembler, and even decompiler can be synthesized.
It's a DSL not a database schema, but fundamentally it's the same idea.
Here's their definition of the 6502: https://github.com/NationalSecurityAgency/ghidra/blob/cae919...
- emulators/simulators/FPGA code
- books, data sheets, OCR'd PDFs of books and data sheets, text files copy/pasted from PDFs or retyped from books and data sheets
Code is likely to be heavily tested, but it makes extracting high-level information about the instruction set very difficult.
Data is easy to analyse and synthesise, but since it's described in prose there's no easy way to test or validate it - if somebody in 1984 made a typo that a particular instruction took 3 cycles instead of 2, and that error was copy/pasted and made its way into half the "6502 instruction set" websites online, how would you know? How would you detect it?
Using SQL to enforce constraints and validation gives me confidence that there aren't a bunch of typos and copy/paste errors in this data. In addition, being able to express special cases like "read-modify-write instructions applying to the accumulator do not pay the three cycle penalty" in code rather than in prose makes it more likely they will be applied correctly. Lastly, since the result is an SQL database, it can be pretty easily formatted to resemble any book or data sheet you like for simplified visual verification against book/data sheet sources.
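As a rough illustration of what "SQL enforcing constraints" can look like, here is a small Python/sqlite3 sketch. The schema, the `MODE_BYTES` table, and the helper names are made up for this example and are not taken from the project; only a handful of well-known opcodes are included. It shows two layers: per-row `CHECK` constraints that reject impossible values outright, and a cross-row validation that flags rows whose length disagrees with their addressing mode.

```python
import sqlite3

# Hypothetical schema: CHECK constraints reject out-of-range values on insert.
SCHEMA = """
CREATE TABLE opcodes (
    opcode   INTEGER PRIMARY KEY CHECK (opcode BETWEEN 0 AND 255),
    mnemonic TEXT    NOT NULL CHECK (length(mnemonic) = 3),
    mode     TEXT    NOT NULL,
    bytes    INTEGER NOT NULL CHECK (bytes BETWEEN 1 AND 3),
    cycles   INTEGER NOT NULL CHECK (cycles BETWEEN 2 AND 7)
);
"""

# Expected instruction length for each addressing mode used below.
MODE_BYTES = {"accumulator": 1, "immediate": 2, "zeropage": 2, "absolute": 3}

ROWS = [
    (0xA9, "LDA", "immediate", 2, 2),
    (0xA5, "LDA", "zeropage", 2, 3),
    (0xAD, "LDA", "absolute", 3, 4),
    (0x0A, "ASL", "accumulator", 1, 2),
    (0x06, "ASL", "zeropage", 2, 5),
]

def build_db(rows=ROWS):
    """Create the in-memory table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO opcodes VALUES (?, ?, ?, ?, ?)", rows)
    return conn

def try_insert(conn, row):
    """Return True if the row satisfies every constraint, False otherwise."""
    try:
        conn.execute("INSERT INTO opcodes VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

def length_mismatches(conn):
    """List opcodes whose byte count disagrees with their addressing mode."""
    query = "SELECT opcode, mode, bytes FROM opcodes ORDER BY opcode"
    return [op for op, mode, n in conn.execute(query) if MODE_BYTES.get(mode) != n]
```

A typo such as a 30-cycle instruction or a duplicated opcode byte fails at insert time, while a plausible-looking but wrong value (a 3-byte zero-page instruction, say) only shows up in the cross-check.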
https://lite.datasette.io/?sql=https://gist.github.com/simon...
And I use opcode references [1] very often (sometimes daily, depending on the project). I even wrote my own disassemblers. But I mostly use opcode references for manual cross checking, so maybe I'm not a target of this project?
[1] My favorite one for x64 is https://ref.x86asm.net/coder64.html
The db also includes modern variants of the 6502.
In terms of bytes that the original CPU officially recognised as instructions, it was more like ~150 (working from old memories, I may be off by one or a few there). Some of the other ~106 did something unofficially, and a number were valid instructions on later versions of the design.
That ~150 were grouped into 56 instructions, many with multiple addressing modes (so "load A immediate", "load A direct", "load A indexed", etc, were different opcodes but considered the same instruction).
Because register use was far from orthogonal (one accumulator, two index registers, and a flags register), the instructions for each register were considered different (LDA, LDX, and LDY for load, for instance). In other instruction sets, for chips with multiple general-purpose registers, they might be considered the same instruction affecting a different register. Considering them the same instruction wouldn't reduce the opcode count, though, just the instruction group count.
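The opcode versus instruction distinction can be shown with a toy example in Python. The six load opcodes below are real 6502 encodings, but the grouping helpers and the idea of keying on the first two letters of the mnemonic are purely illustrative assumptions.

```python
from collections import defaultdict

# A small sample of 6502 load opcodes: byte -> (mnemonic, addressing mode).
OPCODES = {
    0xA9: ("LDA", "immediate"),
    0xA5: ("LDA", "zeropage"),
    0xAD: ("LDA", "absolute"),
    0xA2: ("LDX", "immediate"),
    0xA6: ("LDX", "zeropage"),
    0xA0: ("LDY", "immediate"),
}

def group_by_mnemonic(opcodes):
    """Group the way the 6502 documentation does: one instruction per mnemonic."""
    groups = defaultdict(list)
    for _byte, (mnemonic, mode) in sorted(opcodes.items()):
        groups[mnemonic].append(mode)
    return dict(groups)

def group_by_operation(opcodes):
    """Illustrative alternative: treat LDA/LDX/LDY as one 'LD' operation
    with the register as a parameter (first two letters = operation)."""
    groups = defaultdict(list)
    for _byte, (mnemonic, mode) in sorted(opcodes.items()):
        groups[mnemonic[:2]].append((mnemonic[2], mode))
    return dict(groups)
```

With this sample there are six opcodes either way; grouping by mnemonic gives three instructions, and grouping by operation gives one, so only the group count changes.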
(Apologies for failing to keep my inner pendant properly inner!)
I assume you mean pedant.
I was referencing Simonw's post:
> Here are the 65 hardcoded opcodes
I really must stop writing things on the phone. I'm bad enough with a decent keyboard; sometimes I make no sense at all via phone input.