https://github.com/richgel999/fpng
Disclaimer: have not actively tried either.
Huffman encoding lets you store frequently used values in fewer bits than rarely occurring values, but the cost of a naïve implementation is a branch on every encoded bit. You can mitigate this by making a state machine keyed by "accumulated prefix bits" and as many bits as you want to process in a whack, these tables will blow out your L1 data cache and trash a lot of your L2 cache as well.¹
The "opcode" strategy in QOI is going to give you branches, but they appear nearly perfectly predictable for common image types², so that helps. It has a table of recent colors, but that is only of a few cache lines.
In all, it seems a better fit for the deep pipelines and wildly varying access speeds across cache and memory layers which we find today.
␄
¹ I don't think it ever made it into a paper, but in the mid-80s, when the best our Vax ethernet adapters could do was ~3Mbps I was getting about 10Mbps of decompressed 12 bit monochrome imagery out of a ~1.3MIP computer using this technique.
² I also wouldn't be surprised if this statement is false. It just seems that for continuous tone images one of RGBA, DIFF, or LUMA is going to win for any given region of a scan line.
(I write comments with footnotes in the same style as you, but use “—⁂—” as the separator, via Compose+h+r (name from the HTML tag horizontal rule). Good fun being able to use Compose+E+O+T, Compose+E+T+X and Compose+F+F in this comment; I added the full set to my .XCompose years ago.)
https://github.com/toitlang/toit/commit/65c6c1bd7138f9ebced4... It's not as highly optimized as this effort though, and it just uses the standard huffman table that is built into deflate, rather than a static-but-custom one.
- "The QOI File Format Specification" 214 points | 3 days ago | 54 comments: https://news.ycombinator.com/item?id=29625084
- "QOI: Lossless Image Compression in O(n) Time" 1057 points | 29 days ago | 293 comments: https://news.ycombinator.com/item?id=29328750
At the very least, break it into chunks and add an offset directory header. I'm sure one could do something much better, but it's a start.
EDIT: typo
https://news.ycombinator.com/item?id=29625084
[1] https://news.ycombinator.com/pool
I wanted a simple format that allows you to load/save images quickly, without dealing with the complexity of JPEG or PNG. Even BMP, TIFF and other "legacy" formats are way more complicated to handle when you start looking into it. So that's what QOI aims to replace.
There's a lot of research for a successor format ongoing. Block based encoding, conversion to YUV, more OP-types etc. have all shown improved numbers. Better support for metadata, different bit-depths and allowing restarts (multithreading) is also high on the list of things to implement.
But QOI will stay as it is. It's the lowest of all hanging fruits that's not rotten on the ground.
XPM ? compressed with gzip ?
EDIT: I'm thinking about a format that would be suitable as a replacement for uncompressed WAV files in DAWs. Rendered tracks often have large sections of silence and uncompressed WAVs have always seemed wasteful to me.
Using the command line "flac" tool, "flac -0" is the fastest, "flac -8" is the slowest, but produces the smallest files.
In my experience, 0-2 all produce roughly equivalent sized files, as do 4-8.
My application is to send audio (podcast recordings) to a remote audio engineer friend who will do the post processing, then round trip it to me to complete the editing.
Wav is so big it makes a 1 hr podcast a difficult proposition.
MP3 is unsuitable because compression introduces too many artefacts the quality suffers unacceptably.
What do other people do in this circumstances?
This would eliminate the need for a header (bloat!) as the end of file is clearly defined, the size is defined after decoding the top and right line (second turn), and it's not so sensitive to orientation (a pathological image can compress very differently in portrait vs landscape in line oriented formats). Color profile can be specified in the spec.
Also allows skipping altogether some image-wide bands or columns that are of the background color (defined by the first pixel) as you do not need to walk over all the pixels.
Speed, yes, it is a fair objection, until hardware adopts spiral encoding :-)
I think the rust png library is ~4x faster than libpng which could erase the decompression advantage but that 50x faster compression speed is extremely impressive.
Can anybody tell if there's any significant feature differentials that might explain the difference (color space, pixel formats, .. etc)?
You can choose not to do filtering when encoding PNG. Fast deflate settings are literally RLE-only, and you can see elsewhere in this thread people have developed specialized encoders that ignore most deflate features.
The only misfeature PNG has that slows down encoding is CRC. Decoders don't have to check the CRC, but encoders need to put one in to be spec-compliant.
I tried a few standard compression format, with very little luck.
Canon has devised a very smart (slightly lossy) compression format for newer cameras, but there's no converter that I know of for my old camera files.
So, unless I shell out large amounts of money for a new camera, I'm stuck sending twice the data over the internet. Talk about pollution...
Come to think of it, have you tried running a modern compression algorithm on the data? I don't think I did. Could be cool if combined with ZFS or similar to get the compression done transparently.
I should, as you suggest, test a more exhaustive list of formats, who knows...
Here look some people adapted to ios in one hour faffing around on twitch: https://www.twitch.tv/videos/1241476768?tt_medium=mobile_web...
So this will probably see a JS / Webasm shim, and if that proves popular, Blink and Gecko will consider it.
The day might come soon when browsers just greenlight a webasm interface for codecs. "We'll put packets in through this function, and take frames out through this function, like ffmpeg. Other than that, you're running in a sandbox with X MB of RAM, Y seconds of CPU per frame, and no I/O. Anything you can accomplish within that, with user opt-in, is valid."
https://floooh.github.io/qoiview/qoiview.html
A QOI decoder should fit into a few hundred bytes of WASM at most, maybe a few kilobytes for a "proper" polyfill.
Seriously, who?
This project is interesting because of how well it does compared to other systems of much higher complexity and without optimizing the implementation to high heaven. We can all learn something from that.
Of all the artifacts in our industry, few things live longer than formats. Eg. we are still unpacking tar files (Tape ARchieve), transmitted over IPv4, decoded by machines running x86 processors (and others, sure). All of these formats couldn't possible anticipate the evolution that follow nor predicted the explosive popularity they would have. And all of these (the latter two notably) have overheads that have real material costs. IPv6 fixed all the misaligned fields, but IPv4 is still dominant. Ironically, RISC-V didn't learn from x86 but added variable length instructions making decoding harder to scale than necessary.
I'm not sure what positive lessons you think we should learn from QOI. It's not hard to come up with simple formats. It's much harder coming up with a format that learns from past failures and avoids future pitfalls.
(but seriously, MODs can encode hours of audio into kilobytes, the downside is of course that they require a special authoring process which seems to be a bit of a lost art today)
For example, here's JPEG XL's Huffman decoder: https://github.com/libjxl/libjxl/blob/1c67f4eff0464e6241f365.... The branch uses a second table if the code was too long.
However I e.g. wonder from reading the concrete spec why you e.g. cannot differentially change the alpha channel leading me to the question what happens if images have different alpha levels.
Usually it's a good idea anyway to read file headers byte by byte instead of mapping a struct over it to avoid alignment, padding and endianness issues.
ZStandard is a very good compressor, with an especially fast decompressor. Maybe someone should try using this instead of zlib in an audio format (FLAC, WavPack, ...)
FLAC is extremely good at compression audio, has very fast encode and uber fast decode. It also doesn't use zlib...
A top level thread scans the opcodes only to solve this, with no decoding and no writing, thus progressing faster in the stream than the child chunked decoding threads it progressively spawns.
Not as quick as a format with a chunk table, but faster than naive single core.
QOI is simple enough to remix the file format as much as you want in your own encoder/decoder (that's actually the USP), it's not meant as a standardized image exchange format, just something that lives in your own asset pipeline.
Another stranger mistake is the use of generic compression a part of the format, its much better to leave that up to the filesystem or stream. I dont really understand why this was ever a good idea, but it certainly hasn't been one for decades.
The more recent blob formats people have developed aren't much better, they still confuse the specification of a single blob with the specification of the container format, and try to be convenient and fully generic at once. If people actually wanted a fully generic dataformat and accepted that this requires a programming language, just let that be python, instead of inventing some shitty new one...
Do you mean APNG? In all fairness, that is not even in the specification although there is discussion to add it. https://github.com/w3c/PNG-spec/issues/26
The one that was specified was MNG but it was a different format and practically no one used it since it was not parsable as a PNG.
It used by some videogames as assets, for instance storing what amounts as a thumbnail as a shown image, but leaving meshes and textures in the other internal datablocks.
Variable length ISAs are characterized by not being able to tell the beginning of an instruction without knowing the entrypoint. This applies to RISC-V with compressed instructions. Finding the boundaries is akin to a prefix scan and has a cost roughly linear in the scan length, but IMO the biggest loss is that you can’t begin predecode at I$ fill time.
That helps enough to overcome the increased code size?
I really wouldn't say they learned nothing from x86, though. You only have to look at 2 bits, and if you can get your users to put in the slightest effort then compilers can be told not to use C.
Creating formats and specs that are "future proof" is a noble goal. Criticizing QOI for not being able to be well parallelized inside the decode function, that seems more like a demand for a premature optimization to me...
What? Faster encoding and decoding is one of the primary reasons for the format. Yet, QOI decoders are currently an order of magnitude slower than SSDs available today and even worse compared to DRAM! Now seems like the perfect time to look at possible optimizations to close that gap.
Interesting to contrast with Arm which upon defining Aarch64 did _away_ with variable length instructions and thus also page crossing ones. Maybe they knew something.
However, frankly, if you're working professionally with audio like that, the best solution is simply to have sufficient disk space available to work with raw audio.
Use FLAC to compress the final product, when you are done.
The issue is audio is typically mixed close to maximum so any processing steps can easily lead to clipping. One solution is to use float or larger integers internally during each processing step and normalize/convert back to 24-bit integer to write to disk. Another (better imo) option would be to do all intermediate steps and disk saves in a floating point format and only normalize/quantize for output once.
I haven't worked with professional audio in over 25 years (before everything went fully digital) but I would be surprised if floating point formats were not an option for encoding and intermediate workflows. Many quantization steps seems like a bad idea.
For bouncing tracks to disk, uncompressed 32-bit floating point formats are avaliable, but I am not aware of any fast losslessly compressed 32-bit floating point format.
EDIT: Floating point is useful while you are working to avoid any accidental clipping. As an intermediate format, like a ProRes for video. FLAC is great as a final format.
The simplest possible mitigation would have been to disallow an instruction from spanning a 64-byte boundary. It would have almost no impact on instruction density, but it would have saved a lot of headaches for implementations.
> The simplest possible mitigation would have been to disallow an instruction from spanning a 64-byte boundary.
Sure, that sounds good. But before this you hadn't even mentioned any problems with split instructions that need to be mitigated.
(You did mention decoding without a known entry point, but a rule like that doesn't guarantee you can find the start of an instruction. And if it would help to know that a block of 64 bytes probably starts with an aligned instruction, that seems like something you could work out with compiler writers even without a spec.)
Implementing this would require instruction fetch to take an exception on line-crossing instructions (which must be illegal) and a change to the assembler to insert a 16-bit padding nop or expand a compressed instruction to maintain the alignment. There is nothing needed from the compiler (or linker AFAICT). JITs will have to be aware though.
Part of why I built my current machine with 64GB of ram