QOI – The “Quite OK Image Format” for fast, lossless image compression

QOI – The “Quite OK Image Format” for fast, lossless image compression(github.com)

200 points by JeanMo 4 years ago | 103 comments

sreekotay 4 years ago |

If QOI is interesting because of speed, you might take a look at fpng, a recent/actively developed png reader/writer that is achieving comparable speed/compression to QOI, while staying png compliant.

https://github.com/richgel999/fpng

Disclaimer: have not actively tried either.

jws 4 years ago | |

I find it interesting that QOI avoids any kind of Huffman style coding.

Huffman encoding lets you store frequently used values in fewer bits than rarely occurring values, but the cost of a naïve implementation is a branch on every encoded bit. You can mitigate this by making a state machine keyed by "accumulated prefix bits" and as many bits as you want to process in a whack, but these tables will blow out your L1 data cache and trash a lot of your L2 cache as well.¹

The "opcode" strategy in QOI is going to give you branches, but they appear nearly perfectly predictable for common image types², so that helps. It has a table of recent colors, but that is only a few cache lines.

In all, it seems a better fit for the deep pipelines and wildly varying access speeds across cache and memory layers which we find today.

␄

¹ I don't think it ever made it into a paper, but in the mid-80s, when the best our Vax ethernet adapters could do was ~3Mbps I was getting about 10Mbps of decompressed 12 bit monochrome imagery out of a ~1.3MIP computer using this technique.

² I also wouldn't be surprised if this statement is false. It just seems that for continuous tone images one of RGBA, DIFF, or LUMA is going to win for any given region of a scan line.

chrismorgan 4 years ago | | |

Meta: ␄ (https://en.wikipedia.org/wiki/End-of-Transmission_character) isn’t the right control character when footnotes follow; ␃ (https://en.wikipedia.org/wiki/End-of-Text_character) is a better fit, and ␌ (https://en.wikipedia.org/wiki/Form_feed) would be a decent choice too.

(I write comments with footnotes in the same style as you, but use “—⁂—” as the separator, via Compose+h+r (name from the HTML tag horizontal rule). Good fun being able to use Compose+E+O+T, Compose+E+T+X and Compose+F+F in this comment; I added the full set to my .XCompose years ago.)

adgjlsfhk1 4 years ago | | |

One thing to note is that QOI composes really nicely with high quality general-purpose compressors like LZ4 and ZSTD. LZ4 gives a roughly 5% size reduction with negligible speed impact, and ZSTD gives a 20% size reduction with moderate speed impact (https://github.com/nigeltao/qoi2-bikeshed/issues/25).

ErikCorry 4 years ago | | |

I would think that you could use a hybrid approach where you have a table that is perhaps 9 or 10 bits wide and covers the shorter codes, which will by definition be the more common ones. It should be small enough to fit in the cache. Then do something slower for the very long codes. This way you avoid difficult branches most of the time.
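
A minimal sketch of that hybrid idea, not taken from any real decoder: a primary lookup table indexed by the next `k` bits resolves every code of length `k` or less in one step, and longer codes fall back to a slow bit-by-bit search. The names `build_table` and `decode_prefix`, the toy code book, and the use of a `'0'`/`'1'` string as the bit stream are all assumptions made for readability.

```python
def build_table(codes, k):
    """Primary table: every k-bit window that starts with a short code maps to (symbol, length)."""
    table = [None] * (1 << k)
    for sym, code in codes.items():
        if len(code) <= k:
            pad = k - len(code)
            base = int(code, 2) << pad
            # All windows sharing this prefix decode to the same symbol.
            for suffix in range(1 << pad):
                table[base + suffix] = (sym, len(code))
    return table


def decode_prefix(bits, codes, k, count):
    """Decode `count` symbols from a '0'/'1' string using a k-bit table plus a slow path."""
    table = build_table(codes, k)
    slow = {code: sym for sym, code in codes.items() if len(code) > k}
    out = []
    pos = 0
    while len(out) < count:
        # Peek k bits, padding with zeros near the end of the stream.
        window = bits[pos:pos + k].ljust(k, "0")
        hit = table[int(window, 2)]
        if hit is not None:
            sym, length = hit          # fast path: one table lookup
            pos += length
        else:
            end = pos + k + 1          # slow path: grow the prefix bit by bit
            while bits[pos:end] not in slow:
                if end > len(bits):
                    raise ValueError("truncated or invalid bit stream")
                end += 1
            sym = slow[bits[pos:end]]
            pos = end
        out.append(sym)
    return out


# Toy prefix code: short codes for the common symbols, long codes for rare ones.
CODES = {"a": "0", "b": "10", "c": "110", "d": "1110", "e": "1111"}
encoded = "".join(CODES[s] for s in "abcade")
print(decode_prefix(encoded, CODES, k=2, count=6))
```

With `k=2` only `a` and `b` take the fast path here; a real decoder would pick `k` around 9 or 10 as suggested above, so the table has 512 or 1024 entries.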

ErikCorry 4 years ago | |

Funnily enough it's just a few days since I did some similar code to support writing PNGs from a small embedded device. In this case the full deflate algorithm seemed like overkill in memory and CPU requirement, and most of the images were probably going to be served over a LAN anyway. https://twitter.com/toitpkg/status/1471986776357097475

https://github.com/toitlang/toit/commit/65c6c1bd7138f9ebced4... It's not as highly optimized as this effort though, and it just uses the standard huffman table that is built into deflate, rather than a static-but-custom one.

cornstalks 4 years ago |

A couple previous interesting discussions from this past month:

- "The QOI File Format Specification" 214 points | 3 days ago | 54 comments: https://news.ycombinator.com/item?id=29625084

- "QOI: Lossless Image Compression in O(n) Time" 1057 points | 29 days ago | 293 comments: https://news.ycombinator.com/item?id=29328750

FullyFunctional 4 years ago |

Since we are rehashing this for the 3rd (4th?) time, I'll repeat my (and apparently many others') key critique: there is no thought at all given to enabling parallel decoding, be it thread-parallel or SIMD (or both). That makes it very much a past-millennium-style format that will age very poorly.

At the very least, break it into chunks and add an offset directory header. I'm sure one could do something much better, but it's a start.
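
One way the "chunks plus offset directory" suggestion could look, as a rough sketch only. The container layout (a little-endian chunk count followed by `count + 1` offsets), the function names, and the use of `zlib` as a stand-in for the per-chunk codec are all assumptions; nothing like this exists in QOI itself.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def pack_chunks(raw, chunk_size):
    """Compress fixed-size slices independently and prepend an offset directory."""
    parts = [zlib.compress(raw[i:i + chunk_size])
             for i in range(0, len(raw), chunk_size)]
    offsets = [0]
    for part in parts:
        offsets.append(offsets[-1] + len(part))
    # Hypothetical header: u32 chunk count, then count + 1 u32 offsets.
    header = struct.pack("<I", len(parts))
    header += struct.pack(f"<{len(offsets)}I", *offsets)
    return header + b"".join(parts)


def unpack_chunks(blob, workers=4):
    """Decode every chunk independently; the directory tells each worker where to start."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offsets = struct.unpack_from(f"<{count + 1}I", blob, 4)
    base = 4 + 4 * (count + 1)

    def decode_one(i):
        return zlib.decompress(blob[base + offsets[i]:base + offsets[i + 1]])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(decode_one, range(count)))


pixels = bytes(range(256)) * 64
blob = pack_chunks(pixels, chunk_size=4096)
print(unpack_chunks(blob) == pixels)
```

The point is only that no chunk depends on decoder state from another chunk, so the work can be split across threads (or machines) without scanning the stream first.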

EDIT: typo

throwamon 4 years ago |

Didn't HN disallow recent reposts? This (or its spec) was already posted 3 days ago (twice) and then 2 days ago...

https://news.ycombinator.com/item?id=29625084

https://news.ycombinator.com/item?id=29631717

https://news.ycombinator.com/item?id=29643370

versteegen 4 years ago | |

It's because this link (https://github.com/phoboslab/qoi) has been approved by mods for reposting: it appears on the "pool" list [1] [2]. Which is a bit odd because as you point out, a different link [3] for the same project already received lots of attention.

[1] https://news.ycombinator.com/pool

[2] https://news.ycombinator.com/item?id=26998308

[3] https://news.ycombinator.com/item?id=29625084

corysama 4 years ago |

I think QOI inspired the creation of https://github.com/richgel999/fpng which creates standard PNGs and compares itself directly to QOI.

phoboslab 4 years ago |

Don't expect too much of QOI.

I wanted a simple format that allows you to load/save images quickly, without dealing with the complexity of JPEG or PNG. Even BMP, TIFF and other "legacy" formats are way more complicated to handle when you start looking into it. So that's what QOI aims to replace.

There's a lot of research for a successor format ongoing. Block based encoding, conversion to YUV, more OP-types etc. have all shown improved numbers. Better support for metadata, different bit-depths and allowing restarts (multithreading) is also high on the list of things to implement.

But QOI will stay as it is. It's the lowest of all hanging fruits that's not rotten on the ground.

hulitu 4 years ago | |

> I wanted a simple format that allows you to load/save images quickly, without dealing with the complexity of JPEG or PNG. Even BMP, TIFF and other "legacy" formats are way more complicated to handle when you start looking into it. So that's what QOI aims to replace.

XPM ? compressed with gzip ?

abainbridge 4 years ago | | |

XPM and gzip are still not that simple. QOI is much simpler.

PostThisTooFast 4 years ago | |

What are we to make of that warning? What shortcomings do you think people will find?

flohofwoe 4 years ago | | |

It's simply better suited for some types of images than others (e.g. the resulting size is sometimes bigger than expected). The main advantage is the very simple encoder and decoder with a specification that fits on a single page (and which still yields surprisingly good results for many image types):

https://qoiformat.org/qoi-specification.pdf

causality0 4 years ago | | |

Mostly the fact people have found and will find shortcomings that won't be fixed because the project is done, like everything being big-endian.
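
For reference, the big-endian part is easy to see in the 14-byte header from the spec: the magic `qoif`, then width and height as 32-bit big-endian integers, then one byte each for channels and colorspace. A small sketch (the helper names are mine, not from the reference implementation):

```python
import struct

QOI_MAGIC = b"qoif"
HEADER_FORMAT = ">4sIIBB"  # ">" selects big-endian with no padding


def qoi_header(width, height, channels=4, colorspace=0):
    """Build the 14-byte QOI header."""
    return struct.pack(HEADER_FORMAT, QOI_MAGIC, width, height, channels, colorspace)


def parse_qoi_header(blob):
    """Return (width, height, channels, colorspace) or raise on a bad magic."""
    magic, width, height, channels, colorspace = struct.unpack_from(HEADER_FORMAT, blob, 0)
    if magic != QOI_MAGIC:
        raise ValueError("not a QOI file")
    return width, height, channels, colorspace


header = qoi_header(256, 16)
print(header.hex(" "))
print(parse_qoi_header(header))
```

On a little-endian machine a C decoder has to byte-swap the two size fields, which is the sort of thing the complaint is about.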

petitg1987 4 years ago |

I just implemented this format in my game engine and the performance is crazy: image loading is 3.2 times faster (compared to PNG) and generating a game screenshot is 40 times faster!

junon 4 years ago | |

And the size difference? Mapping in a raw pixel data binary is infinitely faster than any image encoding, but takes up the most space of course.

petitg1987 4 years ago | | |

The generated screenshots are smaller (by about 5%). However, the resource images in QOI format that I load are on average a little bigger (about 5%, and sometimes up to 35%). I guess it is not the perfect solution for AAA games, which already use more than 30 GB nowadays.

zigzag312 4 years ago |

Is there any open source audio compression format like that? Lossless and very fast. I haven't found any yet.

EDIT: I'm thinking about a format that would be suitable as a replacement for uncompressed WAV files in DAWs. Rendered tracks often have large sections of silence and uncompressed WAVs have always seemed wasteful to me.

LeoPanthera 4 years ago | |

FLAC is always lossless, but has a variable compression ratio so you can trade compression for speed.

Using the command line "flac" tool, "flac -0" is the fastest, "flac -8" is the slowest, but produces the smallest files.

In my experience, 0-2 all produce roughly equivalent sized files, as do 4-8.

makapuf 4 years ago | | |

I tried passing stereo wavs in 2 x 16bits (4bytes) as rgba for qoi but I haven't been very successful.
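
For anyone who wants to try the same experiment, this is roughly what that packing looks like, assuming little-endian 16-bit interleaved stereo PCM (the usual WAV layout). The function name is made up for illustration.

```python
import struct


def stereo16_to_rgba(pcm):
    """Treat each 4-byte stereo frame (int16 left, int16 right) as one RGBA pixel."""
    usable = len(pcm) - len(pcm) % 4  # ignore a trailing partial frame
    return [tuple(pcm[i:i + 4]) for i in range(0, usable, 4)]


# Two frames: (left=255, right=-1) and (left=256, right=0).
pcm = struct.pack("<4h", 255, -1, 256, 0)
for pixel in stereo16_to_rgba(pcm):
    print(pixel)
```

Each sample's low byte and high byte land in separate channels (R/G for left, B/A for right), so QOI only ever sees them as unrelated 8-bit values. That may be part of why the results were disappointing, but I have not measured it.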

wombatmobile 4 years ago | |

I'd also like to know what's the best (or any) lossless audio compression process/tools.

My application is to send audio (podcast recordings) to a remote audio engineer friend who will do the post processing, then round trip it to me to complete the editing.

Wav is so big it makes a 1 hr podcast a difficult proposition.

MP3 is unsuitable because the compression introduces too many artefacts and the quality suffers unacceptably.

What do other people do in these circumstances?

phonon 4 years ago | | |

1 hour of CD quality mono FLAC encoded is about 100-150 MB. Is that small enough?
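
A quick back-of-the-envelope check of that estimate, assuming "CD quality" means 44.1 kHz at 16 bits per sample and taking the 100-150 MB range above at face value:

```python
def pcm_bytes(seconds, sample_rate=44_100, bits=16, channels=1):
    """Size of uncompressed PCM audio in bytes."""
    return seconds * sample_rate * (bits // 8) * channels


raw = pcm_bytes(3600)  # one hour, mono
print(f"uncompressed: {raw / 1e6:.1f} MB")
for flac_mb in (100, 150):
    print(f"{flac_mb} MB FLAC is {flac_mb * 1e6 / raw:.0%} of the WAV size")
```

So the uncompressed mono WAV is about 317.5 MB, and the quoted FLAC sizes correspond to roughly a third to a half of that.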

meltedcapacitor 4 years ago |

It is nice, but it's a pity it does not have a "turn right" opcode: start going left, on the turn opcode, continue decoding pixels after turning 90 degrees to the right, until you hit a previously decoded pixel or the wall of the bounding box defined after the first two turns, in which case you turn automatically. The file ends when there's nowhere to turn.

This would eliminate the need for a header (bloat!) as the end of file is clearly defined, the size is defined after decoding the top and right line (second turn), and it's not so sensitive to orientation (a pathological image can compress very differently in portrait vs landscape in line oriented formats). Color profile can be specified in the spec.

Also allows skipping altogether some image-wide bands or columns that are of the background color (defined by the first pixel) as you do not need to walk over all the pixels.

adgjlsfhk1 4 years ago | |

Writing an encoder for that sounds like a nightmare though. Also the speed would suck since you would have unpredictable memory accesses.

meltedcapacitor 4 years ago | | |

An encoder just walking a regular spiral (no uniform bands detection) is not hard. The band thing is an accidental artefact of the idea but plain run length encoding probably already captures most of the effect so no imperative to actually implement it.

Speed, yes, it is a fair objection, until hardware adopts spiral encoding :-)
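
The regular-spiral walk really is short. A sketch of just the pixel visiting order, assuming a clockwise spiral that starts at the top-left corner and a known width and height (so without the turn opcode or the header-free size discovery described above):

```python
def spiral_order(width, height):
    """Yield (x, y) coordinates in a clockwise spiral from the top-left corner."""
    left, top, right, bottom = 0, 0, width - 1, height - 1
    while left <= right and top <= bottom:
        for x in range(left, right + 1):          # top edge, left to right
            yield (x, top)
        for y in range(top + 1, bottom + 1):      # right edge, downwards
            yield (right, y)
        if top < bottom:
            for x in range(right - 1, left - 1, -1):  # bottom edge, right to left
                yield (x, bottom)
        if left < right:
            for y in range(bottom - 1, top, -1):      # left edge, upwards
                yield (left, y)
        left, top, right, bottom = left + 1, top + 1, right - 1, bottom - 1


print(list(spiral_order(3, 2)))
```

An encoder would feed pixels to the usual per-pixel ops in this order instead of row by row; the memory access pattern is what hurts.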

booi 4 years ago |

Seems like they benchmarked it against libpng, which shows anywhere from 3-5x faster decompression and 30-50x faster compression. That's pretty impressive, and even though libpng isn't the most performant of the PNG libraries, it's by far the most common.

I think the rust png library is ~4x faster than libpng which could erase the decompression advantage but that 50x faster compression speed is extremely impressive.

Can anybody tell if there are any significant feature differences that might explain the gap (color space, pixel formats, etc.)?

sakras 4 years ago | |

I think fundamentally it’s faster just because it’s dead simple. It’s just a mash of RLE, dictionary encoding, and delta encoding, and it does it all in a single pass. PNG has to break things into chunks, apply a filter, deflate, etc.
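
To make "RLE, dictionary and delta in a single pass" concrete, here is a cut-down sketch of the chunk stream. The opcode values and the hash come from the spec, but this is not the reference encoder: it leaves out QOI_OP_LUMA, the header and the end marker, and the function names are mine.

```python
def qoi_hash(px):
    r, g, b, a = px
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64


def wrap_diff(new, old):
    """Signed difference with 8-bit wraparound, in the range -128..127."""
    return (new - old + 128) % 256 - 128


def encode(pixels):
    out = bytearray()
    index = [(0, 0, 0, 0)] * 64          # dictionary of recently seen colors
    prev = (0, 0, 0, 255)
    run = 0
    for px in pixels:
        if px == prev:                    # RLE: count repeats, max 62 per chunk
            run += 1
            if run == 62:
                out.append(0xC0 | (run - 1))
                run = 0
            continue
        if run:
            out.append(0xC0 | (run - 1))
            run = 0
        h = qoi_hash(px)
        dr, dg, db = (wrap_diff(px[i], prev[i]) for i in range(3))
        if index[h] == px:                # dictionary hit: QOI_OP_INDEX
            out.append(h)
        elif px[3] == prev[3] and all(-2 <= d <= 1 for d in (dr, dg, db)):
            out.append(0x40 | ((dr + 2) << 4) | ((dg + 2) << 2) | (db + 2))  # QOI_OP_DIFF
        elif px[3] == prev[3]:
            out += bytes((0xFE, *px[:3]))  # QOI_OP_RGB
        else:
            out += bytes((0xFF, *px))      # QOI_OP_RGBA
        index[h] = px
        prev = px
    if run:
        out.append(0xC0 | (run - 1))
    return bytes(out)


def decode(data, count):
    out, index, px, i = [], [(0, 0, 0, 0)] * 64, (0, 0, 0, 255), 0
    while len(out) < count:
        b = data[i]
        i += 1
        if b == 0xFE:
            px = (data[i], data[i + 1], data[i + 2], px[3])
            i += 3
        elif b == 0xFF:
            px = tuple(data[i:i + 4])
            i += 4
        elif b >> 6 == 0:
            px = index[b]
        elif b >> 6 == 1:
            px = ((px[0] + ((b >> 4) & 3) - 2) % 256,
                  (px[1] + ((b >> 2) & 3) - 2) % 256,
                  (px[2] + (b & 3) - 2) % 256, px[3])
        elif b >> 6 == 3:
            out.extend([px] * (b & 0x3F))  # run - 1 copies; the last one is appended below
        else:
            raise ValueError("QOI_OP_LUMA is not handled in this sketch")
        index[qoi_hash(px)] = px
        out.append(px)
    return out


flat = [(10, 20, 30, 255)] * 100
print(len(encode(flat)), "bytes for", len(flat), "identical pixels")
```

Every pixel is looked at once and the only state is the previous pixel, a run counter and the 64-entry table.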

pornel 4 years ago | | |

Filters are a form of delta encoding, and are optional for PNG encoders. Deflate is a form of dictionary encoding with RLE. There's no "breaking into chunks" in PNG: PNG can encode the entire image as a single IDAT chunk (and chunks themselves are so trivial they have no impact on speed).

You can choose not to do filtering when encoding PNG. Fast deflate settings are literally RLE-only, and you can see elsewhere in this thread people have developed specialized encoders that ignore most deflate features.

The only misfeature PNG has that slows down encoding is CRC. Decoders don't have to check the CRC, but encoders need to put one in to be spec-compliant.
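
For context, the CRC in question is the CRC-32 over each chunk's type and data fields, stored big-endian after the data. A small illustration of the chunk layout (the helper name is mine):

```python
import struct
import zlib


def png_chunk(chunk_type, data):
    """Serialize one PNG chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


# The empty IEND chunk that terminates every PNG file.
print(png_chunk(b"IEND", b"").hex(" "))
```

An encoder has to run this over the whole compressed image data in IDAT, which is the extra pass being complained about.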

aspyct 4 years ago |

I would love to have something to compress the raw files from my camera. They're huge, I have to keep a ton of them, and I also need to transmit them over the internet for my backup.

I tried a few standard compression formats, with very little luck.

Canon has devised a very smart (slightly lossy) compression format for newer cameras, but there's no converter that I know of for my old camera files.

So, unless I shell out large amounts of money for a new camera, I'm stuck sending twice the data over the internet. Talk about pollution...

rocqua 4 years ago | |

There is the option of converting to DNG files, which allow for really good lossless compression. This does come at the cost of changing the file format, and risks losing metadata. That's why I personally decided to just buy more storage instead.

Come to think of it, have you tried running a modern compression algorithm on the data? I don't think I did. Could be cool if combined with ZFS or similar to get the compression done transparently.

aspyct 4 years ago | | |

Converting the CR2 (Canon) to DNG tends to double or even triple its size, but I haven't tried compressing it afterwards.

I should, as you suggest, test a more exhaustive list of formats, who knows...

jaxrtech 4 years ago |

At a previous job I was looking at different binary parsing methods. This project looks quite interesting: it has binary format descriptions in YAML that can then be compiled into a parser in your language of choice.

https://formats.kaitai.io/png/

jqpabc123 4 years ago |

Interesting format. It would be much more interesting if browsers supported it.

dnautics 4 years ago | |

Not sure what you're expecting given how old it is. Why not write a polyfill as an exercise for yourself? Convert it to PNG, then set it as a data URL on an image tag.

Here, look: some people ported it to iOS in one hour of faffing around on Twitch: https://www.twitch.tv/videos/1241476768?tt_medium=mobile_web...

ReactiveJelly 4 years ago | |

It's always gonna be chicken-and-egg for this, and browsers won't spend the time sandboxing and supporting a codec until it's already popular.

So this will probably see a JS / Webasm shim, and if that proves popular, Blink and Gecko will consider it.

The day might come soon when browsers just greenlight a webasm interface for codecs. "We'll put packets in through this function, and take frames out through this function, like ffmpeg. Other than that, you're running in a sandbox with X MB of RAM, Y seconds of CPU per frame, and no I/O. Anything you can accomplish within that, with user opt-in, is valid."

flohofwoe 4 years ago | |

Here you go ;)

https://floooh.github.io/qoiview/qoiview.html

A QOI decoder should fit into a few hundred bytes of WASM at most, maybe a few kilobytes for a "proper" polyfill.