BBC BASIC raytracer in 432 characters(mastodon.me.uk) |
BBC BASIC raytracer in 432 characters(mastodon.me.uk) |
https://bbcmic.ro/#%7B%22v%22%3A1%2C%22program%22%3A%22MODE1...
It would look even better with bidirectional error diffusion, but that requires reading back memory which I don't know how to do.
https://chat.openai.com/share/6e754fa1-531d-432e-a767-8c8132...
> VDU5: Sets the output to graphics.
Send a command to the visual display unit to turn off the flashing cursor.
> B=0: Initializes variable B to 0. This variable will be used for brightness calculations.
B is used to propagate quantization error from one pixel to the next - specifically, if we wanted one pixel to be orange, but the closest color we could use was red, then B contains "more yellow needed". In the next pixel, this will be added in to give this pixel more chance of being yellow.
> recursion
There is no recursion here. It's an infinite loop that exits when the raytraced ray fails to hit the floor or a sphere.
> The 4M and 4N are for scaling the coordinates to the screen resolution.
No - this is because early computers didn't have enough memory to store the whole screen image. Therefore there were 'device pixels' and 'logical pixels'. In this mode, each row of pixels is output 4 times into the screen, making each pixel into a group of 16 pixels (4x4).
Drawing commands in BBC basic are done in device pixels, but there is no point in figuring out a color for every device pixel if you can only store 1 in 16 of them..
Didn't check the inner loop since I didn't write that code - but I suspect it has similar accuracy issues.
Also tried having it generate a JavaScript version, but the output doesn't look right: https://chat.openai.com/share/5d76e32b-81a1-4b4a-a808-7682a8... I'd be inclined to suspect the lack of line numbers, as a first guess towards the problem.
Intel's optimization manual has suggestions for fast versions at 15.12.3 (recommended by @James https://stackoverflow.com/a/2637823/10981777) Don't look especially long to implement for 22-bit approximation.
https://software.intel.com/content/www/us/en/develop/downloa...
Also, what's the SQRD? Not finding that referred to anywhere.
Do enjoy the insane things ppl managed to cram into tweets to the bbcmicrobot over the years. Anyone who's followed any part of the demoscene especially the classic smaller payload demos.. 4k etc.....this kinda thing is mindblowing and totally in the same spirit.
i'm running this on a ryzen 5 3500u at 2400 megahertz. the acorn electron which supposedly takes 8 hours and 40 minutes is a 1 megahertz 6502 when running from ram, roughly, 262144 instructions per second. at 2 ipc one core of the ryzen should be about 4800 mips. if the emulation slowdown is 10× (10 host instructions to emulate one guest instruction), which is typical for naïve interpretation, it should be about 1000× faster on this laptop as on the original hardware
(possibly the basic is in rom and so it's closer to 524288 instructions per second)
emulation through qemu-like dynamic code generation should cut the 10× slowdown to 3×, so it should be 3000× faster than the original hardware
where did that factor of 1000 go? not the javascript engine, surely?
incidentally there is a rocket-ship button underneath the output screen image labeled 'send to beebjit' which runs the program in about a second
Incredible raytraced demo in 256 bytes.
https://www.youtube.com/watch?v=36BPql6Nl_U
Incredible raytraced demo in 128 bytes. (An underwater journey through a Menger-sponge fractal)
then i looked and rrrola's 4is256 https://www.pouet.net/prod.php?which=29286 is a working tetris in under 256 bytes, and it's better than mine because it has colors and scoring. i could blame a little bit of inflation on arm being 32-bit rather than 16-bit, but not 4×
In something like python, an analogous thing is turtle graphics. Is there something more similar (but still very basic) in python for something like this? A mini graphical language?
This is a small set of primitives around a tkinter canvas. I suppose one could expose a small set of primitives around a pygame too.
I have to think hard about what exactly I would want.
You don’t have a lot of memory to play with though as you only have 32k to start with and the frame buffer will use 20k of that, so I’m not sure how good a look up table you could make for SQR.
https://www.rustexplorer.com/b#Zm4gbWFpbigpIHsKICAgIC8vIHJvd...
If you just want to see the result, type MODE 1 and then press return, then type *LOAD SCREEN and press return again.
IF V < 0 then P = (Y + 2)/V : V = -V*((INT(X - U*P) + INT(Z - W*P) AND 1)/2 + .3) + .2
then the (palette index of?) the color of the sky is computed from the square root of the y-coordinate of the direction vector as GCOL 0, 3 - (48*SQR V) DIV 16
plus a fixed dithering tableE.g. see:
W=1/SQR(U*U+V*V+1)
This makes a sphere. It gets multiplied by two coordinates to make spheres at different places, U and V.if you change the initialization of i from sgn u to 1.2 * sgn u you will get an image with the spheres a little further apart
Consider that each chip of a home computer system (CPU, 1..2 IO/timer chips, audio, video, ...) needs to do 'emulation work' for each 1 or 2 MHz clock cycle which can add up to quite a number of host computer instructions (dozens to hundreds).
If each chip emulator just takes around 10..20 host system clock cycles to emulate one emulator clock cycle, then you are already looking at around 100 host system clock cycles per emulated clock cycle for the entire home computer (in reality it's probably worse).
Such 'vertically sliced' emulation code is also full of conditional branches which put a heavy burden on the host CPU branch predictor.
...and that's how a theoretical 1000x speedup suddenly becomes a 10x speedup, it's not the CPU emulation (this is usually cheap) but the rest of the emulated system which can be expensive.
Different emulators use all sorts of tricks and shortcuts, but usually with tradeoffs like less precise emulation, or less 'compartmentalized' code.
PS: case in point this is just the top-level per-cycle function in my C64 emulator, which in turn calls per-cycle-functions in each of the chip emulators (which may each be just as much code):
https://github.com/floooh/chips/blob/9a7f6d659b5d1bbf72bc8d0...
I'm trying to strike a balance between 'purity' (e.g. an entire emulated system can be 'plugged together' by wiring together chip emulators, just like one would build a real 8-bit computer on a breadboard), while still being fast enough to comfortably run in realtime even in browsers (https://floooh.github.io/tiny8bit/c64.html).
It's definitely possible to implement a faster cycle-correct C64 emulator with a less 'pure' architecture, but it's quite hard to do meaningful optimizations while keeping that same 'virtual breadboard' architecture.
...considering all the code that runs per cycle it's actually amazing how fast modern CPUs are :)
The Firefox profiler suggests it's spending most of its time doing nothing, waiting for the next Window.requestAnimationFrame.
10 REMĀĘĆĞ$Č*Ēĉ!ăě-ĕ'ď
rem dithering table in comment above consulted by GCOL line?
rem raytracer from https://bbcmic.ro/?t=9ctpk by https://mastodon.me.uk/@coprolite9000
rem deobfuscated and possibly broken by kragen javier sitaker (untested)
20 MODE 1
VDU 5
30 FOR N = 8 TO 247 : rem loop over rows (n) and columns (m) of pixels
FOR M = 0 TO 319
40 X = 0 : rem three-dimensional vector
Y = -.1
Z = 3
U = (M - 159.5)/160 : rem xform screen coords to direction vector
V = (N - 127.5)/160
W = 1/SQR(U*U + V*V + 1) : rem normalize direction vector
U = U * W
V = V * W
I = SGN U : rem derive coordinate of sphere center (i, i, 1)?
G = 1
50 E = X - I : rem beginning of do-while loop
F = Y - I
P = U*E + V*F - W*Z
D = P*P - E*E - F*F - Z*Z + 1
IF D <= 0 then goto 60
T = -P - SQR D
IF T <= 0 then goto 60
X = X + T*U
Y = Y + T*V
Z = Z - T*W
E = X - I
F = Y - I
G = Z
P = 2*(U*E + V*F - W*G)
U = U - P*E
V = V - P*F
W = W + P*G
I = -I : rem if we are reflecting then check other sphere now
GOTO 50 : rem end of do-while loop, we have our escape from the scene geometry
rem color according to y-component of direction vector to get sky gradient
rem if instead y component is negative, make a checkerboard of pieces of sky
60 IF V < 0 then P = (Y + 2)/V : V = -V*((INT(X - U*P) + INT(Z - W*P) AND 1)/2 + .3) + .2
70 GCOL 0, 3 - (48*SQR V + ?(PAGE + 5 + M MOD 4 + N MOD 4*4)/3) DIV 16
80 PLOT 69, 4*M, 4*N
NEXT,
i'm interested to hear what i got wrongsee also https://news.ycombinator.com/item?id=39026517 and https://news.ycombinator.com/item?id=39026495
I'd maybe phrase this a bit differently – in both cases it uses the v coordinate to calculate the shade but whereas the sky is a simple gradient, the plane has a "ground fog" type effect that can be adjusted with those +.3 and +.2 magic constants.
>50% right... 50% trash...
That's not bad at all for ChatGPT. I guess it will be a long time before a computer can really accurately describe what a complicated uncommented algorithm is doing.
At the start of the code is a comment "REM xxxxxx". It stripped this comment out thinking it was junk.
But actually, the code later uses the "PAGE" command to directly read RAM to get the contents of this comment, which it then uses as a lookup table for dithering.
When I fed it your updated version that didn't include the REM, farther down the page, it still didn't quite work. Although I can tell it's not too badly broken. There's some visual structure in the output, just not the expected image.
What a nightmare to read though if you don't know what it's supposed to be written like. Is it a: "FORM=" or a "FOR M=" ? Is it: "S GNU:G", "SG NU:G", "SGN U:G"? SQRD totally looks like some special kind of square root implementation.
// sphere_center = (E, F, Z)
E = X - I
F = Y - I
// solving the ray-sphere quadratic eqn...
P = U * E + V * F - W * Z
// discriminant
D = P * P - E * E - F * F - Z * Z + 1
// if solution (ie. intersection) exists
IF D > 0
// intersect_point = ray_orig + T * ray_dir
// this is the closer pt, -P + SQR D is the other
T = -P - SQR DIn most home computer BASIC interpreters, the source code (including line numbers) never exists as text in memory or on tape/disk, only as binary tokens. The only place where the original input text exists is a small line buffer which gets translated into binary token form when pressing Enter/Return.
The LIST command translates the token form back into human readable text.
And if you use 'AUTO' you also never type any line numbers... so are they actually part of the source code? Is there even 'source code' when the original source is never stored as text anywhere? Is the 'source code size' then the size of the original text data (that actually never exists a whole), or the size of binary token representation in memory and on tape/disk? :)
5 REM MY COOL PROGRAM
10 PRINT “HELLO”
20 GOTO 10
Then later I write 10 PRINT “DIE BART DIE”
How does it know to replace the right line?so, in the space of a single 16-bit thumb or rv32c instruction, you can fit 1.8–2.8 instructions, and that makes a big difference
i compiled some code for rv32ec. here are 8 instructions of it occupying 22 bytes. what would this look like in a stack bytecode?
38: 00158693 addi a3,a1,1 a1 1 + 3 bytecodes
3c: 43d8 lw a4,4(a5) a5 4 + @ 4
3e: 08e6fa63 bgeu a3,a4,d2 <.L15> < if <.L15> 3 bytecode bytes including jump target
42: 439c lw a5,0(a5) a5 @ 2
44: 058a slli a1,a1,0x2 a1 2 lshift 3
46: 7179 addi sp,sp,-48 literal 48 stackframe 3
48: 97ae add a5,a5,a1 + 1
4a: 0007a303 lw t1,0(a5) @ 1
we can see that in a stack instruction set this would use about 20
bytes, barely less, because the instructions being smaller is offset
by their being more numerous.
on the other hand, some of the code is occupied with doing
things like allocating a stack frame, which usually isn’t necessary on
a stack machinebut as far as i can tell on the x87 (which i have never programmed, i'm just going by p. a-1 (176/258) et seq. of http://bitsavers.trailing-edge.com/components/intel/80386/23...) all the instructions are at least two bytes, so i don't see where you get any extra code density
for what it's worth, the subroutine that the above was taken from compiles to 62 instructions and 156 bytes for rv32ec, 61 instructions and 189 bytes for amd64, and 52 instructions and 138 bytes for arm cortex-m4. i'll be compiling it for my own stack-based virtual machine this year but i don't have even a prototype compiler backend yet
https://www.usenix.org/legacy/events%2Fvee05%2Ffull_papers/p... [2005]
(Hey, I seem to remember tha an Anton Ertl posts to comp.compilers.)
Spoiler: they claim that with their sophisticated translation from stack to register code, they eliminated 47% of the instructions, and the resulting code is still around 25% larger than the byte code. The size advantage goes to stack-based byte code, but it may not necessarily be as large as you might think.
So more or less in line with your findings or intuition?
i thought they had to be, running on processors with that little memory. though later on i learned about forth, which is surprisingly efficient for an interpreted language
a more likely explanation is that sophie wilson was just a better hacker than bill gates and paul allen
I loved it at the time, and the more I think back on it the more impressed I am. Like it had a very decent suite of floating-point routines, which if I remember right were very performant. In a 32KB ROM!
Because BBC Basic had a built-in assembler it was pretty uncommon for BBC programs to inline machine code as raw data (unlike some other computers from BITD).