https://en.m.wikipedia.org/wiki/Hutter_Prize
A lossless compression contest to encourage research in AI. It's lossless, I think just to standardize scoring, but I always thought a lossy version would be better for AI -- our memories are definitely lossy!
Almost. Compression and AI both revolve around information processing, but their core objectives diverge. Compression is focused on efficient representation, while AI is built for flexibility and the ability to navigate the unpredictable aspects of real-world data.
Compression learns a representation from the same data it encodes, like "testing on the training set". AI models have different training and test data. There are no surprises in compression.
I like your sentiment, it is technically inspiring.
“On the other hand, I can generate endless amounts of Harlan Coben miniseries… :-P”
[1] https://en.m.wikipedia.org/wiki/Sloot_Digital_Coding_System
Guy named Borges already patented that, I'm afraid.
From the description, it looks like it's only being tested with 128x128 frames, which implies that the speed is very low.
It can? Maybe I'm misunderstanding the graphs but it doesn't look like it to me?
Many older/commercial video codecs optimized for PSNR, which results in the output being blurry and textureless because that's the best way to minimize rate for the same PSNR.
Even with that, showing H.265 having lower PSNR than H.264 is odd --- it's the former which has often looked blurrier to me.
I guess where this sort of generative video "compression" is headed is that the video would be the prompt, and you'd need a 100GB decoder (model) to render it.
No doubt one could fit a prompt to generate a movie similar to something specific in a floppy size ("dude gets stuck on mars, grows potatoes in his own shit"). However, 1MB is only enough to hold the words of a book, and one could imagine 100's of movie adaptations (i.e. visualizing the "prompt") of any given book that would all be radically different, so it seems a prompt of this size would only be enough to generate one of these "prompt movie adaptations".
As a casual non-scholar, non-AI person trying to parse this though, it's infuriatingly convoluted. I was expecting a table of "given source file X, we got file size Y with quality loss Z", but while quality (SSIM/LPIPS) is compared to standard codecs like H.264, for the life of me I can't find any measure of how efficient the compression is here.
Applying AI to image compression has been tried before though, with distinctly mediocre results: some may recall the Xerox debacle about 10 years, when it turned out copiers were helpfully "optimizing" images by replacing digits with others in invoices, architectural drawings, etc.
https://www.theverge.com/2013/8/6/4594482/xerox-copiers-rand...
This is not even AI. JBIG2 allows a reuse of once-decoded image patches because it's quite reasonable for bi-level images like fax documents. It is true that similar glyphs may be incorrectly groupped into the same patch, but such error is not specific to patch-based compression methods (quantization can often lead to the same result). The actual culprit was Xerox's bad implementation of JBIG2 that incorrectly merged too many glyphs into the same patch.
Furthermore, it's pretty common in compression research to focus on the size/quality trade-off, and leave optimization of compute for real-world implementations.
Which have examples in it.
Unfortunately it only contains still images with teeny thumbnails: https://arxiv.org/html/2402.08934v1/x2.png
Is this more extreme than youtube ?
compression is most of the times about finding the minimal grammar that unfolds to the same original material.
interestingly Fabrice Bellard somehow found a way to use transformers for compression without loss, and beats xz by significant margin. https://bellard.org/nncp/nncp_v2.1.pdf. it uses "deterministic mode of PyTorch" to make sure both directions work alike which I guess means - it saves the random toss throughout compression, for the decompression to use. note: this paper is still on my to-read list.
But you're right the predictor does need to be reproducible - the output must be exactly the same to match encoder and decoder behavior. While I don't think this is big focus right now for many, I don't think there's a fundamental reason why it couldn't be, though probably at the cost of some performance.
Gonna be able to skip holodeck episodes in real life soon
Gwern posts about this when people say something like that on here, but I'll do it instead. Lossless encoding is just lossy encoding + error correction of some sort.
For example, to memorize "Doggo woofs at kity", we first get the concept of "dog barks at cat", it compresses well because intuitively, we know that dogs bark and cats are common targets. That's our lossy compression and we could stop there but it is only part of the story. It is not a "dog" but a "doggo", and it goes well with the familiar tone, a good compression algorithm will take only a few bits for that. Then there is the typo "kity" vs "kitty", it will take a bit of extra space, but again, a good algorithm will recognize the common typos and compress even that. So it means the entire process to lossless matters, lossy is just stopping halfway.
And if there is pure random noise remaining, there is nothing you can do, but all algorithms are on an equal footing here. But the key is to make what the algorithm consider as uncompressible noise as small as possible.
There are some people working hard to provide the means to perform deterministic AI computations like these, but that will come with some performance losses, so I would guess that most AIs will continue to be (slightly) non-deterministic.
At best we might consider it a new type of lossy (or... replacey?) compression. Of course if storage / RAM / bandwidth keeps increasing, this is quite likely the least energy efficient technique available.
It checks a reference video against an encoded video and returns a score representing how close the encoded video appears to the original from a human perspective.
also see https://jon-cld.s3.amazonaws.com/test/ahall_of_fshame_SSIMUL... which is an ab comparison of a lot of images where it gives 2 versions, one preferred by ssimulacra, the other preferred by vmaf
TID2013 for example is an image dataset with many artifacts completely unrelated to compression and scaling.
- Additive Gaussian noise - Additive noise in color components is more intensive than additive noise in the luminance component - Spatially correlated noise - Masked noise - High frequency noise - Impulse noise - Quantization noise - Gaussian blur - Image denoising - JPEG compression - JPEG2000 compression - JPEG transmission errors - JPEG2000 transmission errors - Non eccentricity pattern noise - Local block-wise distortions of different intensity - Mean shift (intensity shift) - Contrast change - Change of color saturation - Multiplicative Gaussian noise - Comfort noise - Lossy compression of noisy images - Image color quantization with dither - Chromatic aberrations - Sparse sampling and reconstruction
Doing better on TID2013 is not really an indication of doing better on a video compression and scaling dataset (or being more useful for making decisions for video compression and streaming).