Biggest image in the smallest space

Biggest image in the smallest space (bamsoftware.com)

359 points by fekberg 10 years ago | 103 comments

michaelmior 10 years ago |

Actually, it decompresses to a 5.8MB PNG. However, many graphics programs may choose to use three bytes per pixel when rendering the image, and because its dimensions are so large, that representation would take up 141GB of RAM.
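A quick sanity check of the 141GB figure, as a sketch. The 225,000 x 225,000 dimensions are taken from the linked article rather than from this thread, and the function name is purely illustrative:

```python
def raw_buffer_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of an uncompressed pixel buffer, ignoring any row padding."""
    return width * height * bytes_per_pixel


# Dimensions as reported by the linked article (assumption, not from the thread).
size = raw_buffer_bytes(225_000, 225_000)
print(size)                      # 151875000000 bytes
print(round(size / 2**30, 1))    # about 141.4 when counted in units of 2**30 bytes
```

So the "141GB" is really 141 GiB; in decimal gigabytes it is closer to 152.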

userbinator 10 years ago | |

Better graphics programs will not attempt to put the whole image into RAM, but only decompress the pieces needed for processing it.

I remember working with multi-megapixel images on systems with far less than 1MB of RAM, many years ago. Perhaps this is a good example of how more hardware resources can lead to them being wasted: RAM has grown so much that most images fit completely in it, which has led programmers to assume they can do this for all images without a second thought, when often all that's needed is a tiny subset of the data.

Even if the image data is compressed, there's absolutely no need to keep all of it in memory - just decompress incrementally into a small, fixed-size buffer until you get to the "plaintext" position desired, ignoring everything before that. The fact that it's compressed also means that, with suitable algorithms, you can skip over huge spans at once - this is particularly easy to do with RLE and LZ - and the compression ratio actually boosts the speed of seeking to a specific position.
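A minimal sketch of the "small, fixed-size buffer" idea, assuming a plain zlib stream rather than a real image format. The function name and the 4096-byte chunk size are made up for illustration. Note that it still decompresses everything before the requested offset and just throws it away; the smarter skipping over RLE/LZ runs mentioned above is not attempted here:

```python
import zlib


def read_range(compressed: bytes, offset: int, length: int, chunk: int = 4096) -> bytes:
    """Return plaintext[offset:offset+length] without holding the whole plaintext."""
    dec = zlib.decompressobj()
    pending = compressed          # compressed input not yet consumed
    pos = 0                       # plaintext position of the start of the next block
    end = offset + length
    out = bytearray()
    while pos < end and not dec.eof:
        # Produce at most `chunk` bytes of plaintext per call.
        block = dec.decompress(pending, chunk)
        pending = dec.unconsumed_tail
        if not block and not pending:
            break                 # truncated stream: nothing more can be produced
        # Keep only the part of this block that overlaps the requested range.
        lo = max(offset - pos, 0)
        hi = min(end - pos, len(block))
        if lo < hi:
            out += block[lo:hi]
        pos += len(block)
    return bytes(out)
```

Peak memory is roughly the compressed input plus one chunk plus the requested range, regardless of how large the stream expands.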

Currently, (hopefully...) no application is attempting to read entire video files into memory before processing them, but I wonder if that might change in the future as RAM becomes even bigger, and we'll start to get "video decompression bombs" instead?

dahart 10 years ago | | |

This! Command line programs have no excuse, they should never need to decompress the entire file to memory. GUI image editors and web browsers probably generally do need to, but there definitely are such options for dealing with more pixels than you can display.

Anyway, do you know some of these "better" graphics programs that actually behave this way, especially command line processing? I am interested in finding more of them.

EDIT: Okay, I have to add & admit that by "no excuse", I actually mean somewhat the opposite. ;) I mean that it's possible to do streaming image processing on compressed formats, not that it's trivial to do or as easy as decompressing the file in a single call. I just wish that programs would handle very large images more often, and it sucks when they don't even though I know it's possible. Especially programs intended for dealing with large images, like Hugin. Now, I know it's a PITA to tile & stream compressed formats because I've done it, but I'm sure I've written image I/O that decompresses the entire file to RAM 100x more frequently than anything that tiles and/or streams, because I've only handled tiling or streaming myself once, and it was harder. :P

theoh 10 years ago | | |

What you describe sounds a bit like demand paging of a memory-mapped file. The problem with implementing it for a 2d image is that a given rectangular region doesn't map to a contiguous area in memory. It's easy to construct a long thin image that would cause problems for a line-based demand paging strategy. For example, ten pixels high by a billion pixels wide.

Edit: skipping sections of the line to get to the region of interest is fine, I suppose, but what's really needed is a hierarchical quadtree-like organization of the storage, surely...

jerf 10 years ago | |

One of the rules of secure programming is that any program that is used in an even remotely security-sensitive context, and anything displaying a Portable Network Graphic is likely to be used in such a context, must be able to specify resource usage limits. In this case that could be dimensions or a limit on the total RAM allowed to be used. Limits need not be hard, either, but could produce a query, for instance, the way very long-running scripts in the browser ask you if they should continue.

Now, go find an API/library for dealing with PNGs that allows you to pass in such a limit, let alone pass in a callback for dealing with violations. Go ahead. I'll wait.

(The Internet being what it is, if there is one, someone will pop up in a reply in five minutes citing it. If so, my compliments to the authors! But I think we can all agree that in general image APIs do not offer this control. In fact, in general, if you submit a patch to allow it, it would probably be rejected from most projects as unnecessarily complicating the API.)

This is the sort of thing that I mean when I say that we are so utterly buried by insecure coding practices that we can't hardly even perceive it around us. I should add this as another example in http://www.jerf.org/iri/post/2942 .
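To make the "limit plus callback" idea concrete, here is a rough sketch that reads only the PNG header (the 8-byte signature followed by the IHDR chunk, which carries width and height as big-endian 32-bit integers) and applies a pixel budget before any decoding happens. The function names, the `max_pixels` parameter, and the callback shape are all hypothetical; this is not the API of any existing library:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes):
    """Return (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    return struct.unpack(">II", header[16:24])


def within_limit(header: bytes, max_pixels: int, on_violation=None) -> bool:
    """True if decoding may proceed; on_violation(width, height) can override a refusal."""
    width, height = png_dimensions(header)
    if width * height <= max_pixels:
        return True
    if on_violation is not None:
        # The callback plays the role of the "should this continue?" prompt.
        return bool(on_violation(width, height))
    return False
```

The header is attacker-controlled, so this only bounds what a decoder would try to allocate; it says nothing about whether the rest of the file is well-formed.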

userbinator 10 years ago | | |

> Now, go find an API/library for dealing with PNGs that allows you to pass in such a limit

The article itself links to http://libpng.sourceforge.net/decompression_bombs.html

> These new libpng versions do not impose any arbitrary limits, on the memory consumption and number of ancillary chunks, but they do allow applications to do so via the png_set_chunk_malloc_max() and png_set_chunk_cache_max() functions, respectively.

pierrec 10 years ago | | |

You're overdoing it a bit. I believe the most popular API/library for server-side manipulation of images is ImageMagick, and it has a few options for specifying limits that will easily protect against decompression bombs.

That being said, even with these limits, it's undeniable that something like ImageMagick still has a very large attack surface (especially since it uses many third-party libraries), so it should run in its own heavily unprivileged or sandboxed process.

phkahler 10 years ago | | |

Ultimately it's going to require a malloc to get the space for all those pixels. That is where things should fail. If not, how is one to specify what the image size limits should be? Ever try to open the Blue Marble images from NASA? In a web browser? Back in 2001?

ctdonath 10 years ago | | |

I'm inclined to agree. As legitimate image sizes increase, there's more need to sanely limit the resources thrown at such images.

While on vacation last week, I finally grokked that my relatively cheap Nikon camera is producing 6000x4000 images...that's about 100MB uncompressed. As a mobile app developer, I'm becoming painfully aware how images breaking the 25MB uncompressed line are breaking apps, with some still-in-use 256MB RAM iOS devices crashing when memory fills under normal usage plus a few instances of such large images (1-2 vacation photos can easily overwhelm available memory).

silon7 10 years ago | | |

ulimit may help. Perhaps it could be applied (maybe it already is) to Chrome's per-tab processes.

I'm not sure of a good Windows equivalent: Microsoft has deprecated Windows System Resource Manager, and I don't know of a good replacement (short of moving fully to Linux).

wiredfool 10 years ago | | |

Limits, yes. Callback, maybe not so much.

https://github.com/python-pillow/Pillow/blob/master/PIL/PngI...

fla 10 years ago | |

And almost every program that tries to display it.

wiredfool 10 years ago | |

Some image programs will allocate space based on the metadata in the file; the image data itself isn't required. So even if the image data is corrupted by a byte or two, or missing entirely, there's nothing stopping the reported size from being in the gigapixel range.

DanBC 10 years ago |

That's impressive. Here are some other compression curiosities.

http://www.maximumcompression.com/compression_fun.php

A 24 byte file that uncompresses to 5 MB; another file with good compression under RAR but almost no compression under ZIP; and a compressed file that decompresses to itself.

0x0 10 years ago |

That's neat, but I still think the self-reproducing r.zip from "zip files all the way down" is the best compression trick I've seen:

http://research.swtch.com/zip

dmit 10 years ago | |

There's also dynamic generation of output by specifying a custom filter in a RAR archive that is executed during decompression: http://blog.cmpxchg8b.com/2012/09/fun-with-constrained-progr...

__mp 10 years ago |

Photoshop was able to show it: http://i.imgur.com/7EdBySv.png (MacBook Pro, 16GB RAM)

userbinator 10 years ago | |

Photoshop is an example of a graphics program that doesn't attempt to read the entire image into memory.

How much RAM did it actually use?

__mp 10 years ago | | |

Difficult to say. I don't completely understand the activity monitor RAM column: http://i.imgur.com/QS3NPQQ.png Looking at the activity monitor details we see that it uses something in the order of 2.54 GB of real memory. I suspect the rest is mostly compression.

anilgulecha 10 years ago | |

Tells you why it's a solid image application. Kudos to them.

semi-extrinsic 10 years ago |

If you follow the "related reading" link on the bottom of TFA, you come to a page by Glenn Randers-Pehrson discussing how libpng deals with decompression bombs. On the bottom of that page you find the following curious note; anyone know what to make of it?

""" [Note for any DHS people who have stumbled upon this site, be aware that this is a cybersecurity issue, not a physical security issue. Feel free to contact me at <glennrp at users.sourceforge.net> to discuss it.] """

saalweachter 10 years ago | |

He's presumably had problems with people confusing "decompression bombs" with the blowy-up kind and sending him panicky e-mails.

cperciva 10 years ago | | |

Another possibly apocryphal case of linguistic collisions resulting in governmental interest: When the MIT Media Lab started doing work on intelligent kitchen counters, they found that a lot of shadowy government agencies wanted to talk to them about their research into "counter intelligence".

semi-extrinsic 10 years ago | | |

Ah, of course. That didn't even cross my mind, for some reason; bomb in this context was so obviously not a physical device.

fennecfoxen 10 years ago | |

What to make of it? Seems clear enough; he's (half-jokingly?) afraid that someone in the federal government will see the page and think "oh no! bombs! explosions! TERRORISM!", so he's spelling out that this is only a computer analogy.

nerdy 10 years ago | |

I think it is related to the word "bomb" existing on the page

octatoan 10 years ago | |

DHS = Department of Homeland Security?

wiredfool 10 years ago |

PNGs also have optional compressed text metadata chunks, and it's possible to sneak a decompression bomb into one of those as well. You can get about a factor of 1000 in the compression -- 1MB of 'a' winds up being about 1040 bytes. You can have multiple iTXt chunks, and it appears that the chunk size is only limited to 2^31-1.

See https://github.com/python-pillow/Pillow/blob/master/Tests/ch... for a quick way to generate some of these.
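For a rough feel of the numbers, here is a sketch that builds a single compressed text chunk by hand. It uses zTXt (the simpler sibling of iTXt) and follows the standard PNG chunk layout: 4-byte length, 4-byte type, data, CRC-32 over type plus data. The helper name and keyword are made up, and this is not the linked Pillow test code:

```python
import struct
import zlib


def ztxt_chunk(keyword: bytes, text: bytes) -> bytes:
    """Build one zTXt chunk: keyword, NUL, compression method 0, deflated text."""
    body = keyword + b"\x00" + b"\x00" + zlib.compress(text, 9)
    crc = zlib.crc32(b"zTXt" + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + b"zTXt" + body + struct.pack(">I", crc)


payload = b"a" * (1 << 20)            # 1 MiB of 'a'
chunk = ztxt_chunk(b"Comment", payload)
print(len(chunk), round(len(payload) / len(chunk)))
```

A reader that inflates such chunks eagerly pays the full 1 MiB per chunk, so a cap on decompressed text size matters as much as a cap on pixels.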

andersthue 10 years ago |

Reminds me of how you could crash a Fido node by sending it some big empty files, so that when they got automatically unzipped they filled up the hard drive :)

fizgig 10 years ago | |

I think this kind of thing was common even a few years ago in DoS'ing mail gateways that uncompressed and scanned various archive formats. Things like really huge files when uncompressed or ridiculously deep nested directory structures.

I think most software these days is immune to such tricks, or at least has tunables to reduce the chance of such tricks causing harm.

digi_owl 10 years ago | | |

Zip bombs, a relative of the fork bomb.

https://en.wikipedia.org/wiki/Zip_bomb

The billion laughs XML attack is also lovely in its simplicity.

https://en.wikipedia.org/wiki/Billion_laughs

dingaling 10 years ago | | |

There was also the trick of infinitely recursive zips that kept decompressing to a copy of themselves.

Zip-bombing was such a problem for our corporate network in the late 1990s that inbound e-mail attachments were deliberately discarded for a while. Chaos ensued.

eli_gottlieb 10 years ago |

http://jeremykun.com/2012/04/21/kolmogorov-complexity-a-prim...

http://c2.com/cgi/wiki?KolmogorovComplexity

Here be rabbit-hole.

inglor 10 years ago |

This does wonders when used in favicons :D

raffomania 10 years ago | |

I just tried it on a locally served page, and my browser handles it quite well (although it won't really display it).

inglor 10 years ago | | |

Only on Firefox and Chrome, since they fixed it: https://github.com/benjamingr/favicon-bug

feld 10 years ago | |

You just made my stomach turn at the thought

raffomania 10 years ago |

Fun fact: When trying to upload this as a profile picture (on a site I host myself), Chromium crashes.

dahart 10 years ago |

Having dealt with and printed a lot of very large images, e.g., 60k x 60k pixels, I have been on the lookout for image processing software that never decompresses the entire image into ram, but instead works on blocks or scan lines or blocks of scan lines, but stays in constant memory and streams to and from disk. For example, the ImageMagick fork GraphicsMagick does a much better job of this than ImageMagick. What other software is out there that can handle these kinds of images?

phkahler 10 years ago | |

The key is not to store it in raster form in RAM. Either use tiles (like GIMP) or, as I prefer, Z-ordering. Then a user can zoom in and pan around easily - you let the system swap and it won't be bad at all. If they zoom out though, you probably want to store MIP maps of it.

Swap works well for this as long as your data has good locality. Huge raster images don't.

But no, I'm not aware of any software that handles stuff like that well - except GIMP's tiling, but that's not going to help when zoomed out.
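For readers unfamiliar with the term, Z-ordering here presumably means a Morton-order layout: the storage index of a pixel is formed by interleaving the bits of its x and y coordinates, so pixels that are close in 2D tend to be close in memory. A small sketch, with the function name and the 16-bit default chosen only for illustration:

```python
def morton_index(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y (x in even positions, y in odd)."""
    index = 0
    for i in range(bits):
        index |= ((x >> i) & 1) << (2 * i)
        index |= ((y >> i) & 1) << (2 * i + 1)
    return index


# The first 2x2 block occupies indices 0..3, the first 4x4 block 0..15, and so on.
for y in range(2):
    for x in range(2):
        print((x, y), morton_index(x, y))
```

Because every aligned 2^k x 2^k square maps to one contiguous run of indices, a zoomed-in viewport touches a handful of pages instead of one page per scanline.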

dahart 10 years ago | | |

What does Z-ordering mean in this context?

I definitely want to avoid swap at all costs and find things that are designed to tile & stream instead. The difference between GraphicsMagick resizing an image by streaming and ImageMagick resizing an image that hits swap is orders of magnitude - seconds versus hours.

lcrs 10 years ago | |

Nuke works in scanlines like this, and can process a whole tree of operations only loading the input lines necessary for the current output row. The SDK docs explain the architecture somewhat: https://www.thefoundry.co.uk/products/nuke/developers/90/ndk...

AndrewStephens 10 years ago |

I used to work on a scanning SMTP/HTTP proxy and even back then it wasn't unknown for people to send crafted decompression bombs to attempt to crash the services. We handled it by estimating the total uncompressed size upfront (including sub archives) and throwing out anything with a suspiciously large compression ratio.
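A sketch of that kind of up-front check for zip archives, using the sizes declared in the archive's central directory. The function names and thresholds are invented for illustration, nested archives are not followed, and the declared sizes are attacker-controlled, so this is a first filter rather than a guarantee:

```python
import io
import zipfile


def declared_sizes(archive_bytes: bytes):
    """Return (total compressed, total uncompressed) sizes as declared in the archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        infos = zf.infolist()
        packed = sum(info.compress_size for info in infos)
        unpacked = sum(info.file_size for info in infos)
    return packed, unpacked


def looks_like_bomb(archive_bytes: bytes, max_ratio: int = 100,
                    max_total: int = 100 * 1024 * 1024) -> bool:
    """Flag archives whose declared expansion is too large or too lopsided."""
    packed, unpacked = declared_sizes(archive_bytes)
    if unpacked > max_total:
        return True
    return unpacked > max_ratio * max(packed, 1)
```

Pairing this with a hard cap during the actual extraction covers archives whose headers lie about their contents.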

I imagine that .pdf files are another avenue for mischief. They contain lots of chunks which may be compressed in varying ways.

tetrep 10 years ago |

Neat. I needed to make very large PNG bombs recently and toyed with the idea of doing it "manually." In the end I decided to take the lazy route and use libpng[1].

[1]: https://bitbucket.org/tetrep/pngbomb/src/03dfc95065d78562c15...

x0 10 years ago | |

This works wonderfully! With an image size 123456x123456, I made this happen: http://i.imgur.com/2Dgrazj.png

I killed it at about 25GB memory usage, who knows how high it would have climbed otherwise.

JosephRedfern 10 years ago |

That's cool. Presumably the same "attack" could be applied to any file format that uses DEFLATE.

From a legal standpoint, I'd be wary about following through with the author's suggestion of "Upload as your profile picture to some online service, try to crash their image processing scripts" without permission. Sounds like a good way of getting into trouble.

cnvogel 10 years ago | |

Yes, but on the other hand it's a good reminder for everyone processing user-provided files to sanity check them, or convert them to a canonical format, in a sandboxed and resource-limited process.
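One way to get the "resource-limited process" part on POSIX systems is to apply a ulimit-style cap in the child before it starts. A minimal sketch, assuming Linux semantics for RLIMIT_AS (the `resource` module does not exist on Windows, and other platforms may enforce the limit differently); the helper name is hypothetical and a real setup would add a proper sandbox on top:

```python
import resource
import subprocess
import sys


def run_limited(args, max_bytes: int, timeout: float = 30.0):
    """Run a command with its address space capped at max_bytes."""
    def apply_limits():
        # Runs in the child between fork and exec.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(args, preexec_fn=apply_limits,
                          timeout=timeout, capture_output=True)


# A child that tries to allocate about 4 GB under a 1 GiB cap should fail.
result = run_limited([sys.executable, "-c", "x = bytearray(4 * 10**9)"], 1 << 30)
print(result.returncode)
```

The parent only sees a failed child process, so an oversized image costs one conversion job instead of the whole server.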

atom_enger 10 years ago | |

What about responsibly disclosing the bug you found with steps to reproduce, the impact and the solution? As long as you only timed out the backend without entirely crashing it, I can't imagine any sane company would prosecute you for trying to improve their service with this level of detail.

JosephRedfern 10 years ago | | |

How do you know that you're only going to time out the backend without entirely crashing it, without actually attempting it? It's a kinda Schrödinger's cat scenario.

It's all good and well saying that you had good intentions, but if you can't prove it, and they didn't invite you to test it (via a responsible disclosure policy), then I would steer clear.

While I wouldn't personally attempt to prosecute anyone for responsibly disclosing a bug to me, that doesn't mean that BigCorp™ wouldn't.

logicallee 10 years ago |

>The image is almost entirely zeroes, with a secret message in the center.

I'm too pressed for time to check; did anyone look? What is it?

sgdread 10 years ago | |

It is "SORRY, OUR PRINCESS IS IN ANOTHER PIXMAP"

tiler 10 years ago |

I realize that this is beside the point, but going on the title alone we could write a script that generates an 'infinite' (max out available memory) sized image.

javajosh 10 years ago |

Everyone's focusing on this being a PNG problem, but actually if my server unzips a 420 byte file into a 5MB file of any kind, I'd say that's the first red flag. Assuming some sort of streaming decompression, you could write an output filter that shuts off the decompressor once the output exceeds X times the size of the input. A reasonable factor would be 10, which in this case would have halted bzip2 decompression at about 4kB.

This would probably be a trivial patch to bzip2. But I like the idea in general of passing a "max output/input ratio" to any process or function that might yield far more output than input.
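Something along these lines can already be done from the outside with Python's bz2 module, without patching bzip2 itself. A sketch, with the function name and defaults invented for illustration; it handles a single bzip2 stream only:

```python
import bz2


def bounded_bunzip2(compressed: bytes, max_ratio: int = 10, chunk: int = 65536) -> bytes:
    """Decompress one bzip2 stream, aborting once output exceeds max_ratio x input."""
    limit = len(compressed) * max_ratio
    dec = bz2.BZ2Decompressor()
    out = bytearray()
    data = compressed
    while not dec.eof:
        # Ask for at most `chunk` bytes; leftover input is buffered inside `dec`.
        block = dec.decompress(data, max_length=chunk)
        data = b""
        if not block and not dec.eof:
            raise ValueError("truncated bzip2 stream")
        out += block
        if len(out) > limit:
            raise ValueError("output exceeds %d times the input size" % max_ratio)
    return bytes(out)
```

With the 420-byte file from the article and the default ratio, this would give up after the first chunk instead of writing out 5.8MB.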

ctdonath 10 years ago | |

The real problem is image handling libraries that blindly render images into too-large objects where unnecessary. While full-res uncompressed images are very convenient under the hood, the image library should inherently handle anything "too big" gracefully. Instead we're often prone to apps crashing when someone feeds in a ridiculously large image.

A 420B > 5MB expansion should not be a "red flag" because there is nothing about it (including the subsequent attempt to process a 141GB uncompressed image) which cannot be handled appropriately in software. Flagging such ratio limits is arbitrary, and setting an arbitrary limit is usually a sign the software is incorrect, not the data.

ctdonath 10 years ago |

Looks handy for large image processing tests, thanks.

atom_enger 10 years ago |

I'm trying to run the program and create my own image, and I have a few questions. What did you use for secret.png? Any old PNG?

Are you using PIL or pillow?

pvdebbe 10 years ago |

Cool, but most web sites wouldn't allow you to upload a 5MB picture as a profile picture. Or do they, these days?

andrewstuart 10 years ago |

Is there a way to check for decompression bombs? I'd like my software to be able to unzip zip files safely.

MatthewWilkes 10 years ago | |

A python example:

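    import zlib

    def decompress(data, maxsize=262144):
        """Inflate a zlib stream, refusing to produce more than maxsize bytes."""
        dec = zlib.decompressobj()
        # The second argument caps the output; any input left over lands
        # in dec.unconsumed_tail.
        out = dec.decompress(data, maxsize)
        # eof stays False if the stream did not finish, which also covers the
        # case where all input was consumed but output was still pending.
        if dec.unconsumed_tail or not dec.eof:
            raise ValueError("Possible zip bomb (or truncated stream)")
        return out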

ZenoArrow 10 years ago | |

Monitor zip files as they decompress. Halt the decompression process if the size ratio between the decompressed output and the zip file exceeds a fixed threshold (for example, something like 10:1).

Ambroos 10 years ago | | |

If you do that, pick something a little more extreme. When using BEM, for example, your CSS becomes pretty repetitive and you easily get better than a 10:1 ratio with gzip.

wiredfool 10 years ago | |

You can do it in zlib -- there's one call that effectively does the whole thing, and one that fills a buffer. With the latter you can check how much input has been consumed; if there's more left, then you know the output is getting large. It's up to the friendly programmer to decide when large is too large.

bluedino 10 years ago | |

Sandbox them. We once created a single-core VM with 1024MB of RAM and a 6GB disk, and built a tiny API around image decompression and scaling. Never had any issues with it, but it was a simple way of preventing things from filling up the regular web servers.

wglb 10 years ago | |

Yes. A method whose only purpose is to answer the question 'is this file larger than <parameter>'. If it is, don't go further.

ak2196 10 years ago |

It's probably using middle-out.

TurplePurtle 10 years ago |

I wonder what the ratio would look like if the equivalent was done with a JPEG instead of a PNG.

mridulmalpani 10 years ago |

Has anybody tried to upload it to Facebook as a profile picture?

MrKristopher 10 years ago | |

"Your photo couldn't be uploaded due to restrictions on image dimensions. Photos should be less than 30,000 pixels in any dimension, and less than 41,000,000 pixels in total size."

hnpc123 10 years ago |

The title was changed and is now more opaque and less descriptive.

fekberg 10 years ago | |

Yeah, I agree. The original title was a lot more descriptive.

dang 10 years ago | | |

The original title is the one the author gave it. The HN guidelines ask you not to change that unless it is linkbait or misleading.

_hhff 10 years ago |

righto pied piper

hadeharian 10 years ago |

This is a very easy form of attack in security circles.