Bresenham's Circle Drawing Algorithm (2021) (funloop.org)

107 points by RojerGS 1 year ago | 45 comments

corysama 1 year ago |

Do note that Bresenham’s family of algorithms (and much of the venerable Computer Graphics Principles and Practice) are from a bygone era where computers executed 1 instruction per cycle without pipelining or prediction.

These days processors prefer to draw shapes with a coarse-grained pass that conservatively selects tiles of interest, then brute-force evaluate each pixel in each tile independently of all the others.

Instead of minimizing total work, the goal is to maximize pipelined parallelism while skipping over unnecessary work in large blocks.
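A minimal sketch of that two-pass idea, applied to a circle outline. Everything here is an assumption made for illustration: the 8-pixel `TILE` size, the function names, and the half-pixel band used to decide whether a pixel is "on" the circle. Real hardware evaluates the per-pixel step in parallel; the plain Python loops only show the structure.

```python
import math

TILE = 8  # hypothetical tile size in pixels


def tile_may_touch_ring(tx, ty, cx, cy, r):
    """Coarse pass: conservatively test whether the band |dist - r| <= 0.5
    can intersect tile (tx, ty). May say yes for an empty tile, never no
    for a tile that contains an outline pixel."""
    x0, y0 = tx * TILE, ty * TILE
    x1, y1 = x0 + TILE - 1, y0 + TILE - 1
    # Nearest point of the tile rectangle to the circle center.
    nx = min(max(cx, x0), x1)
    ny = min(max(cy, y0), y1)
    near = math.hypot(nx - cx, ny - cy)
    # Farthest corner of the tile rectangle from the circle center.
    far = math.hypot(max(abs(x0 - cx), abs(x1 - cx)),
                     max(abs(y0 - cy), abs(y1 - cy)))
    return near <= r + 0.5 and far >= r - 0.5


def draw_circle_tiled(width, height, cx, cy, r):
    """Fine pass: evaluate every pixel of each surviving tile independently."""
    pixels = set()
    for ty in range((height + TILE - 1) // TILE):
        for tx in range((width + TILE - 1) // TILE):
            if not tile_may_touch_ring(tx, ty, cx, cy, r):
                continue  # skip a whole block of unnecessary work
            for y in range(ty * TILE, min((ty + 1) * TILE, height)):
                for x in range(tx * TILE, min((tx + 1) * TILE, width)):
                    if abs(math.hypot(x - cx, y - cy) - r) <= 0.5:
                        pixels.add((x, y))
    return pixels
```

No pixel depends on any other pixel's result, which is what makes the inner loops trivially parallel, at the cost of more total arithmetic than an incremental method.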

Someone 1 year ago | |

> a bygone era where computers executed 1 instruction per cycle without pipelining or prediction

1 instruction per cycle? What luxury bygone era did you grow up in?

Wikipedia tells me the algorithm is from 1962 on an IBM 1401 (https://en.wikipedia.org/wiki/Bresenham's_line_algorithm#His...

That definitely didn’t have many single cycle instructions. Skimming https://ibm-1401.info/A24-6447-0_1401_1460_Instruction_and_T..., I couldn’t find any.

Certainly, in the era of 8-bit CPUs like the Z80 and 6502, programmers would have waxed lyrical about “1 instruction per cycle”.

Actually, did any CPU ever “execute 1 instruction per cycle without pipelining or prediction” (or, slightly looser “had fixed time instructions without pipelining or prediction”)?

RISC introduced fixed time instructions, but also pipelining.

chillingeffect 1 year ago | | |

The ADSP-218x family of Harvard-architecture DSPs. Each 25 ns clock cycle = 1 MAC, 1 data fetch, and 1 instruction fetch. All instructions are 1 cycle long. Plus many other gizmos, like reversible address bit ranges for in-place DFT and a separate register bank for IRQ handlers. And all laid out by hand.

kragen 1 year ago | | |

no, you're right, you almost always need pipelining to get one instruction per clock cycle

but there are a lot of cpus out there—maybe the majority now—that are pipelined microarchitectures that get one instruction per cycle without much or any prediction. avrs, most of the cortex-m* (all?), most modern implementations of old slow isas like the z80 and 8051, etc. big processors like your laptop cpu and cellphone cpu are of course superscalar, but they are a tiny minority of all processors. even inside the cellphone case, they're outnumbered by in-order scalar microcontrollers

without prediction, of course, you have a pipeline bubble every time you have a branch, so you never quite hit 1 ipc with scalar execution. but it's usually pretty close, and even with prediction, sometimes you miss. and usually if you have branch prediction, you also have a cache, because ain't nobody got time to share one triflin memory bus between instructions and data

so pipelining gets you from, say, 3 clocks per instruction down to 1.2 or so. then prediction gets you from 1.2 down to, say, 1.02

buescher 1 year ago | | |

I remember the bragging point on the RTX2000 in the very late eighties was "a MIP per megahertz".

fasa99 1 year ago | | |

"oh these new-fangled kids what with their superscalar processors. 100 instructions in a cycle, phooey! Back in my day, it was dang gummed 100 cycles for each instruction, and gosh darnit we liked it! Now that was just for the add instruction, a division, well some say they never figured out how many cycles that was because it took too long. I had an onion in my belt which was the style at the time"

owisd 1 year ago | | |

Probably meant 'cycle' in the sense of instruction cycle, rather than clock cycle.

chiph 1 year ago | |

I think it depends. I had Dr. Bresenham as my graphics instructor (he taught for a while after he retired from IBM). The class used Borland Turbo Pascal, and for its time it was fast. Not as fast as raw assembly, but faster than Borland's Turbo C, which had just come out.

So far as 1 instruction per cycle - Wikipedia says the 80286 (the top dog PC processor of the time) could execute 0.21 "typical" instructions per clock. Optimized code was up to 0.5 instructions per clock. And that agrees with my memories.

Today, I would try and use parallelism if I could. With lots of conditions though - is the process being applied to the image trivially parallelizable, will most/all of it fit in cache, etc. Trying to parallelize Bresenham's algorithms though would be futile - when drawing circles you can reflect it into the different quadrants (big savings) but the algorithm itself is going to be pretty serial because it has to keep up with the error coefficient as it draws.
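For reference, a sketch of both points: the eight-way reflection and the serial dependency. This is the common integer midpoint formulation, which is an assumption here and may differ in detail from the article's derivation; `circle_points` is a name chosen for illustration.

```python
def circle_points(cx, cy, r):
    """Walk one octant of an integer-radius circle and mirror each point
    into the other seven octants."""
    points = set()
    x, y = 0, r
    d = 1 - r  # decision variable: each step's value depends on the previous one
    while x <= y:
        # Reflect (x, y) and its diagonal mirror (y, x) into all quadrants.
        for px, py in ((x, y), (y, x)):
            points.update({
                (cx + px, cy + py), (cx - px, cy + py),
                (cx + px, cy - py), (cx - px, cy - py),
            })
        if d < 0:
            d += 2 * x + 3            # midpoint inside the circle: keep y
        else:
            d += 2 * (x - y) + 5      # midpoint outside or on it: step y inward
            y -= 1
        x += 1
    return points
```

The reflection is free parallel work, but `d` is a loop-carried value, so the octant walk itself proceeds one step at a time.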

ack_complete 1 year ago | | |

It's actually not difficult to vectorize Bresenham algorithms, at least the line algorithm. You just have to preroll the algorithm for each of the lanes and then adjust the steps and error factors so each lane steps 4-8 pixels ahead at a time interleaved. I've done this for the floor type 1 render in Ogg Vorbis, which is defined in terms of a Bresenham-like algorithm.

eru 1 year ago | |

It's also from an era when floats were rather expensive.

mabster 1 year ago | | |

I still picture them as expensive. Things like trig functions are still very expensive.

_0ffh 1 year ago | |

Nah, while cycles/instruction were indeed fixed in those days (and for some time yet to come), it was not necessarily 1 cycle but rather depended on the instruction.

tengwar2 1 year ago | | |

Indeed. I used this algorithm on OS/2 1.0 as part of a GUI (OS/2 did not have a GUI until 1.1). That was on an 80386. MASM came with a nice ring-bound A6 book which summarised the instruction set, including timings. I seem to remember that 3 cycles was normal for a short instruction, but many were considerably longer.

userbinator 1 year ago | |

In particular, his line-drawing algorithm is easily beaten by fixed-point methods, which also have the advantage of a very short and non-branchy inner loop:

https://news.ycombinator.com/item?id=9954975
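A generic fixed-point line sketch, for readers who have not seen the technique. It is not taken from the linked comment; the 16.16 format, the `FRAC_BITS` constant, and the function name are assumptions made for illustration.

```python
FRAC_BITS = 16  # assumed 16.16 fixed-point format


def line_fixed_point(x0, y0, x1, y1):
    """Draw a line by adding a constant fixed-point increment per step.
    The loop body has no data-dependent branch."""
    dx, dy = x1 - x0, y1 - y0
    steps = max(abs(dx), abs(dy))
    if steps == 0:
        return [(x0, y0)]
    half = 1 << (FRAC_BITS - 1)          # bias so the shift rounds to nearest
    # One division per axis up front; the major axis increment is exactly +/-1.0.
    x_step = (dx << FRAC_BITS) // steps
    y_step = (dy << FRAC_BITS) // steps
    x = (x0 << FRAC_BITS) + half
    y = (y0 << FRAC_BITS) + half
    points = []
    for _ in range(steps + 1):
        points.append((x >> FRAC_BITS, y >> FRAC_BITS))
        x += x_step
        y += y_step
    return points
```

With 16 fractional bits the accumulated truncation error stays under half a pixel for lines up to roughly 32768 steps long; longer lines would need more fractional bits.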

mysterydip 1 year ago | |

"yes, but" there are still places that can take advantage of such algorithms today, namely microcontrollers. I think some scripting languages may even apply here, although many interpreters do some level of compilation/optimization instead of serial execution.

amelius 1 year ago | |

Sounds like your algorithm is from a bygone era where power was not important ;)

corysama 1 year ago | | |

If you can get the work done fast and hurry the processor back into a low-power sleep state ASAP, it can actually be power efficient too!

phire 1 year ago | |

1 instruction per cycle? No, that's only possible with pipelining.

We are talking about instructions which took 8-12 cycles to complete.

kragen 1 year ago | | |

usually this is correct, but there are some exceptions. most instructions on a non-pipelined but synchronous stack machine like the mup21 take a single cycle, for example

even with a register file, it isn't really inherent that you need to decode inputs, do alu operations, and write outputs in separate clock cycles; you can do all of that in combinational logic except writing the outputs, and you can even decode which register to write the output to. it just means your max clock rate is in the toilet

for that kind of thing a harvard architecture is pretty useful; it allows you to read an instruction in instruction memory at the same time you're reading or writing data in data memory, instead of in two separate cycles

brcmthrowaway 1 year ago | |

Got an example?

possiblywrong 1 year ago |

> Note that if F(x,y)=0, then the point (x,y) is exactly on the circle. If F(x,y)>0, then the point is outside of the circle, and if F(x,y)<0 then the point is inside of it. In other words, given any point (x,y), F(x,y) is the distance from the true circle line [my emphasis].

This last is not quite true. The exact distance from the circle, call it G(x,y), is the corresponding difference of square roots, i.e.,

  import math
  def G(x, y, r):
    # Signed distance from (x, y) to a circle of radius r centered at the origin.
    return math.sqrt(x * x + y * y) - math.sqrt(r * r)

and G(x,y) isn't just the square root of F(x,y), and indeed doesn't behave monotonically with respect to F(x,y).

It's an interesting property of Bresenham's algorithm, one that I've never seen even stated, let alone proved, in the literature, that this doesn't matter: the algorithm is indeed exact in the sense that it always chooses the next point based on which is truly closest to the circle... despite using an error function that is only an approximation.
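A numerical spot check of that claim, not a proof. It assumes an integer radius and a circle centered at the origin, and it walks one octant comparing the candidate chosen by |F| with the one chosen by the true distance |G|. The function names are chosen for illustration.

```python
import math


def choices_agree(r):
    """Return True if, at every step of the octant walk, minimizing |F|
    picks the same pixel as minimizing the true distance |G|."""
    def F(p):
        return p[0] * p[0] + p[1] * p[1] - r * r

    def G(p):
        return math.hypot(p[0], p[1]) - r

    x, y = 0, r
    while x < y:
        # Candidates for the next column: stay on row y, or drop to row y - 1.
        candidates = ((x + 1, y), (x + 1, y - 1))
        by_f = min(candidates, key=lambda p: abs(F(p)))
        by_g = min(candidates, key=lambda p: abs(G(p)))
        if by_f != by_g:
            return False
        x, y = by_f
    return True


# Check a range of radii; any disagreement would show up as False.
all_agree = all(choices_agree(r) for r in range(1, 200))
```

The check uses floating point for G, so it only says something about radii small enough that rounding cannot flip a comparison.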

bsenftner 1 year ago |

My computer graphics professor back in the early 80's was Prof. Glen Bresenham, but not The Bresenham. It was a lot of fun being at SIGGRAPH back then and watching people freak upon reading his name badge. He'd let them believe for a bit, and then explain it's not him. Al Acorn was one that freaked, and that was fun.

kragen 1 year ago | |

alcorn?

bsenftner 1 year ago | | |

Allan Alcorn, the American engineer and computer scientist best known for creating Pong, one of the first video games.

KingOfCoders 1 year ago |

What I found most amazing about Bresenham during my Amiga days, is how you can use it to zoom and shrink bitmaps.
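The comment does not say how this was done, so the following is only one plausible reading: resample a row of pixels with the same integer error accumulator the line algorithm uses, with no multiplies or divides in the loop. `scale_row` is a name made up for this sketch, and it assumes a non-empty source row.

```python
def scale_row(src, dst_len):
    """Nearest-neighbour resample of one row to dst_len pixels using only
    integer adds and compares (works for both zooming and shrinking)."""
    src_len = len(src)
    out = []
    si = 0    # current source index
    err = 0   # accumulated error, in units of 1/dst_len of a source pixel
    for _ in range(dst_len):
        out.append(src[si])
        err += src_len
        # Advance the source index once per whole source pixel accumulated.
        while err >= dst_len:
            err -= dst_len
            si += 1
    return out
```

For example, `scale_row("abc", 6)` doubles every pixel and `scale_row([0, 1, 2, 3, 4, 5], 3)` keeps every second one. Applying the same stepping to rows as well as columns scales a whole bitmap.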

nikolay 1 year ago |

This is easy; anti-aliasing is tougher. I used this algorithm in the '90s to come up with one for drawing ellipses. I've never had to do a disc or filled ellipse with or without outline, but that would be interesting, too. Line size > 1 would be interesting as well.

seanhunter 1 year ago |

Feels like simply using the parametric form of the circle equation would be way easier

For t in 0 to 2 pi in whatever small steps you want, draw_pixel(x=a+r*cos(t), y=b+r*sin(t)) where a and b are the x and y coordinates you want for the centre of the circle and r is the radius.

The derivation of this form is pretty simple if you know basic trig. https://publish.obsidian.md/uncarved/3+Resources/Public/Unit...
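A sketch of the parametric approach, collecting pixels in a set instead of calling a real `draw_pixel`. The default step count is an assumed heuristic (about two samples per pixel of circumference), not something stated in the comment.

```python
import math


def circle_parametric(a, b, r, steps=None):
    """Sample the circle at evenly spaced angles and round to pixel centers."""
    if steps is None:
        # Assumed heuristic: roughly two samples per pixel of circumference.
        steps = max(8, int(4 * math.pi * r))
    points = set()
    for i in range(steps):
        t = 2 * math.pi * i / steps
        points.add((round(a + r * math.cos(t)), round(b + r * math.sin(t))))
    return points
```

The trade-off is in the step count: too few samples leave gaps in the outline, too many revisit the same pixel, and every sample costs a sine and a cosine.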

ericyd 1 year ago |

A cool write up with an approachable explanation of algorithm optimization! I wonder how long it would take me to arrive at this algorithm left to my own devices...