"I think you will all appreciate this person's commenting style"

"I think you will all appreciate this person's commenting style"(jwz.livejournal.com)

516 points by ahalan 13 years ago | 88 comments

saurik 13 years ago |

PSD was never intended to be a data interchange format: it is the serialization format of a single program that has more individual, unrelated features that actual people rely on than almost any other piece of software, and it has maintained striking amounts of backwards compatibility and almost unbroken forwards compatibility over more than two decades of existence. This product's "file format" needs to be critiqued in that context, along with similar mega-programs like Office.

I am thereby having a difficult time fathoming why anyone would think that a PSD file is thereby going to be some well-organized file format that they should easily be able to parse from their own application is just naively wishful thinking: even other products from Adobe have limitations while opening these files; to truly manipulate these files you really need to be highly-compatible with Photoshop's particular editing model (hence the conceptual difference between these two classes of file format).

jacobolus 13 years ago | |

Further points:

1) The specs are now much more publicly accessible than they used to be, and frankly the spec does a fairly reasonable job describing a tricky format relatively compactly. It requires a fair bit of knowledge of Photoshop to read and understand, but it’s mostly fairly explicit. Much better than many other proprietary document formats.

2) For someone with relatively extensive knowledge of Photoshop, the format is fairly comprehensible, albeit complicated. The biggest part of the problem here is, as you say, that Photoshop just has a ton of features to support, so becoming enough of a Photoshop expert to understand it all is a difficult undertaking by itself.

3) The code this comment is taken from only interacts with a small fraction of PSD features, and is frankly pretty awful code: hacky, ad-hoc, not modular at all, etc.

All that said, if someone were to redesign the PSD format today, I’m sure it would be organized quite a bit differently and would have much better reuse of a smaller number of features. (The same goes for Photoshop itself.)

sabret00the 13 years ago | | |

So, just out of curiosity, why doesn't Adobe embark on those two projects? We're entering a new era of desktop software; surely that should be a catalyst for redesigning both from scratch?

CoolGuySteve 13 years ago | |

This is true, but there are container formats just as old, like .mov, that are quite nice to work with. (While you're still sniggering, keep in mind that .mov has a lot in common with MPEG-4.)

Whenever I need to write a binary serialization format, I usually copy .mov's tree-of-structs format. It's ridiculously fast, extensible, and keeps people away from C++'s terrible stream operators/Java's BinaryReaderWhateverFactoryErrorProneOneIntAtATimeReader.
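For anyone wondering what the ".mov tree of structs" looks like: a QuickTime file is a sequence of "atoms", each an 8-byte header (a 32-bit big-endian size that includes the header itself, then a four-character type) followed by a payload, and some atom types simply contain more atoms. Below is a minimal sketch of a walker over an in-memory buffer. The function names and the short list of container types are assumptions for illustration, and the 64-bit and "runs to end of file" size cases are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode a 32-bit big-endian integer (QuickTime stores atom sizes this way). */
static uint32_t atom_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumption for this sketch: only these types hold child atoms.
   The real set of container atoms is larger. */
static int atom_is_container(const uint8_t *type) {
    return memcmp(type, "moov", 4) == 0 || memcmp(type, "trak", 4) == 0 ||
           memcmp(type, "mdia", 4) == 0;
}

/* Walk the atoms in buf[0..len), calling visit (if non-NULL) on each one.
   Each atom is: size (u32 BE, counting this 8-byte header), type[4], payload.
   Returns the number of atoms seen, or -1 if a size is out of range.
   Not handled: size == 1 (64-bit size follows) and size == 0 (runs to EOF). */
static int walk_atoms(const uint8_t *buf, size_t len, int depth,
                      void (*visit)(const uint8_t *type, uint32_t size, int depth)) {
    int count = 0;
    size_t pos = 0;
    while (pos + 8 <= len) {
        uint32_t size = atom_be32(buf + pos);
        const uint8_t *type = buf + pos + 4;
        if (size < 8 || size > len - pos) return -1;
        if (visit) visit(type, size, depth);
        count++;
        if (atom_is_container(type)) {          /* recurse into the children */
            int sub = walk_atoms(buf + pos + 8, size - 8, depth + 1, visit);
            if (sub < 0) return -1;
            count += sub;
        }
        pos += size;                            /* unknown atoms are skipped whole */
    }
    return count;
}
```

The property being praised shows up in the last line of the loop: a reader that doesn't recognize an atom type can still skip it using the size field, so new atom types don't break old readers.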

matt4711 13 years ago | | |

Do you have a description of the ".mov tree of structs format"?

yason 13 years ago | |

If a wise programmer decided he needed a serialization format, would he deliberately include in that format all the crap so vividly pointed to by the article?

No.

He will think of the "serialization format" as an interchange format between two different instances of his program. One process first writes the data file and another process later will read it. He also knows that sooner or later the "serialization format" needs to talk with different versions of his program, not just different running instances.
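One common way to plan for that (a sketch of general practice, not anything from the linked code) is to begin the file with a magic number and a version, and have the reader refuse versions newer than it understands. The "MYFT" magic, the 8-byte header layout, and all names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MYFMT_MAGIC     "MYFT"   /* hypothetical 4-byte magic number */
#define MYFMT_MAX_MAJOR 2u       /* newest major version this reader understands */

enum myfmt_status { MYFMT_OK, MYFMT_NOT_OURS, MYFMT_TOO_NEW, MYFMT_TRUNCATED };

/* Hypothetical header: magic[4], major (u16 BE), minor (u16 BE).
   Convention assumed here: a minor bump only adds data older readers can skip;
   a major bump means the layout changed incompatibly. */
static enum myfmt_status myfmt_check_header(const uint8_t *buf, size_t len,
                                            unsigned *out_major, unsigned *out_minor) {
    if (len < 8) return MYFMT_TRUNCATED;
    if (memcmp(buf, MYFMT_MAGIC, 4) != 0) return MYFMT_NOT_OURS;
    *out_major = ((unsigned)buf[4] << 8) | buf[5];
    *out_minor = ((unsigned)buf[6] << 8) | buf[7];
    if (*out_major > MYFMT_MAX_MAJOR) return MYFMT_TOO_NEW;
    return MYFMT_OK;
}
```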

AFAIK, the Word .doc also started (and unfortunately continued) as basically a not-so-designed memory dump of the in-memory OLE data model. It's a format that, more often than not, has infamously stumped even its own implementation. (Over time, OpenOffice has rescued quite a lot of Office users' .doc files.)

lifeisstillgood 13 years ago | | |

The overriding aim of most formats was to load into memory efficiently: fast load times were the key winner in the 80s and 90s. So you did not want a simple serialisation, because that meant slow, CPU-intensive saves and loads. But if you slammed the data in pretty much as it would be in memory, you would win. The downside is that if you changed the in-memory representation of the running program, you had to change the file format.
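To make the tradeoff concrete, here is a sketch of the "slam it in as it is in memory" approach in C: the record is saved and loaded with a single `fwrite`/`fread` and no conversion work at all, but the file now bakes in the compiler's padding and the machine's byte order. The `layer_rec` struct and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory record. Written as-is, its file layout is whatever the
   compiler chose: field sizes, padding (typically 3 bytes after `visible`),
   and the host machine's byte order. */
struct layer_rec {
    uint8_t  visible;
    uint32_t opacity;
    int32_t  left, top, right, bottom;
};

/* Save: one fwrite, no per-field encoding work at all. */
static int save_rec(FILE *f, const struct layer_rec *r) {
    return fwrite(r, sizeof *r, 1, f) == 1 ? 0 : -1;
}

/* Load: equally cheap, but only correct if the reader was compiled with the
   same struct definition, padding rules, and endianness as the writer. */
static int load_rec(FILE *f, struct layer_rec *r) {
    return fread(r, sizeof *r, 1, f) == 1 ? 0 : -1;
}
```

Widening `opacity`, reordering fields, or moving to a compiler with different padding rules silently changes the file format, which is the downside described above.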

And .mov would have no such concerns: its prime use case is storing data in serialised chunks anyway. It was already serialised, so it could use very dumb stores.

saurik 13 years ago | | |

You are making a general argument for why serialization formats should not exist. Fine, but in reality, and for any number of reasons, they do: they are easier at first, they are actually often somewhat easier over time, the pain that does occur is often easily amortized over time, they are fast to load (no transformations), they are fast to edit (you can often treat them as some insane memory page container and do internal allocation for updates, leaving old content behind until it is recycled), and by their nature they can handle the random, seemingly unrelated garbage that these mega-programs end up being popular for.

They aren't even always considered non-ideal: I have seen many an argument from people who use Smalltalk that the ideal transfer format is to literally serialize part of the running program state and call it a "document", including whatever code might be required to operate the more epic parts of the document. (If you think about it, this is actually fairly similar to the various file formats that involve OLE, as you end up with the identifier of some code the user hopefully has installed, attached to a block of data that the code can hopefully use to reinstate itself.)

So, given that it is a tradeoff, and given that it was often a necessary one for file formats where you want or need to be able to edit files that both contain numerous nearly unrelated features (OLE would be the most beautiful example of this in the Word container format) and may be larger in total than the RAM available to the entire computer, it simply seems silly to complain about this: man up, import the data, make your own format for saving your files, and stop complaining that someone in 1990 made something that over 22 years has become slightly difficult to understand without that historical context.

SideburnsOfDoom 13 years ago | | |

> AFAIK, the Word .doc also started (and unfortunately continued) as basically a not-so-designed memory dump

This may be true, but it's not the whole story. It's the reason why the MS Office team bit the bullet and replaced .doc with .docx about 5 years ago: http://en.wikipedia.org/wiki/Office_Open_XML

Docx is basically XML in a zip file. It's a beast and has lots of compromises for backward compatibility, but as a design starting point, "zipped XML" is far far better than a binary dump of the in-memory data.

Maakuth 13 years ago | | |

It's possible that the format was very reasonable at first, but the surrounding platform changed completely during development. New layers of specification were then added in whatever form seemed best on that platform at that time. Wasn't Photoshop originally an app for the m68k Macintosh? Different field sizes surely made more sense in that world than in ours, and performance tradeoffs could have had some say too.

taejo 13 years ago | | |

Word dates back to 1983, while OLE was only introduced in 1990 (but otherwise I think you are correct)

yuhong 13 years ago | | |

Not to mention security bugs too.

huhtenberg 13 years ago | |

When a file uses both little-endian and big-endian serialization, at times within the same logical structure, employs several different ways to store an array, and does other things of similar nature, then it is a genuine clusterfuck regardless of whether it is reflective of "Photoshop particular editing model."
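For anyone who hasn't had the pleasure: with mixed byte orders, every multi-byte field needs the reader to know which order that particular field was written in. A minimal sketch with explicit decoding helpers; the helper names and the mixed record are hypothetical, not taken from the PSD spec. Decoding byte by byte like this works the same regardless of the host's own endianness.

```c
#include <stdint.h>

/* Decode a 32-bit unsigned integer stored most-significant byte first. */
static uint32_t get_u32_be(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode a 32-bit unsigned integer stored least-significant byte first. */
static uint32_t get_u32_le(const uint8_t *p) {
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* Hypothetical record whose two fields use different byte orders. */
struct mixed_rec { uint32_t length; uint32_t flags; };

static struct mixed_rec read_mixed(const uint8_t *p) {
    struct mixed_rec r;
    r.length = get_u32_be(p);       /* big-endian field */
    r.flags  = get_u32_le(p + 4);   /* little-endian field in the same record */
    return r;
}
```

The same four bytes decode to wildly different values depending on which helper the reader picks, so a single wrong guess corrupts everything after it.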

Camillo 13 years ago | |

> I am thereby having a difficult time fathoming why anyone would think that a PSD file is thereby going to be some well-organized file format that they should easily be able to parse from their own application is just naively wishful thinking

I like how you embodied your point in the unsyntax of that very sentence. ;)

saurik 13 years ago | | |

Two hours later, I reread what I wrote and thought "man, this comment's upvotes-to-correct-grammar ratio is remarkably high" (this was after I had already noticed, an hour in, another serious typo in the first few words, which I still had time to fix). If it makes any difference to you: I write most of these comments on my iPhone, so I often can't even see a whole line at once. ;P

SideburnsOfDoom 13 years ago | |

> PSD was never intended to be a data interchange format

And that's the basic design flaw: it is a data interchange format despite not being designed as one, hence the terrible job that it apparently does at it. The people who wrote it didn't recognise that they were going to be filling that need. There's a lesson in there somewhere.

TeMPOraL 13 years ago | | |

> There's a lesson in there somewhere.

Greetings. I have arrived from the future to spare mankind more years of pain by stating clearly here that the lesson is not "serialize your data to XML".

wmf 13 years ago | |

Perhaps some people care more about what would be convenient to them than whatever laziness or lock-in Adobe intended. Or maybe they believe that Photoshop might have better interchange with future versions of itself if its file format were sane.

shrughes 13 years ago | | |

What decade was Photoshop first written in and how long did it take to open a file?

JoeAltmaier 13 years ago | |

Um, they could have had a process for how to extend the format. Even GIF has a definition for how to add new pieces. PSD apparently had a process: fly by the seat of your pants?
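A sketch of the kind of GIF mechanism being referred to, assuming the GIF89a extension layout: every extension block starts with the introducer byte 0x21 and a label byte, followed by data sub-blocks that are each prefixed with a length byte and ended by a zero-length block. Because of that uniform framing, a decoder can skip an extension whose label it doesn't recognize. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Skip one GIF89a extension block starting at buf[pos], which must be the
   0x21 introducer. Layout: 0x21, label byte, then data sub-blocks, each a
   length byte (1..255) followed by that many bytes, ended by a 0 length byte.
   Returns the offset just past the block, or 0 if the block is malformed. */
static size_t gif_skip_extension(const uint8_t *buf, size_t len, size_t pos) {
    if (pos + 2 > len || buf[pos] != 0x21) return 0;
    pos += 2;                      /* introducer + label; the label need not be understood */
    for (;;) {
        if (pos >= len) return 0;  /* ran out before the terminator */
        uint8_t n = buf[pos++];
        if (n == 0) return pos;    /* block terminator */
        if (n > len - pos) return 0;
        pos += n;                  /* skip this sub-block's data */
    }
}
```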

Now, Word docs are in an XML format. Pretty extensible.

ibotty 13 years ago | |

I'm not sure I get your point about the similarity to (early) Office data formats. These are very badly supported now, which is a stark contrast to your assertion that PSD is backwards and forwards compatible to a very large degree.

Care to elaborate?

gjm11 13 years ago |

This has been on HN before (http://news.ycombinator.com/item?id=575122), but that was years ago. I mention this just in case others are having the same feeling of deja vu as me.

rjzzleep 13 years ago | |

Well, I'm glad it was posted again.

Tobu 13 years ago | |

I kept looking for a twist — jwz's “It seems very familiar to me” indicated one — but it's just a repost then.

greggman 13 years ago |

I'm pretty sure the PSD format's chunks are based on the IFF spec from 1985.

http://www.martinreddy.net/gfx/2d/IFF.txt

Things were padded to 4-byte boundaries because the 68000 processor would crash if you read an unaligned 32-bit value. So the length of the actual data was what you found in the size field of each chunk, but each chunk was padded. That way you didn't have to work around the 68000's quirks and read a byte at a time.

I wrote a PSD reader in '93. It wasn't that hard, and it still works today. Maybe I chose an easy subset: it only reads the composite result (merged layers) that gets saved when you choose to save backwards-compatible files in Photoshop.

http://elibs.svn.sourceforge.net/viewvc/elibs/trunk/elibs/li...
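Reading only the merged image, as described above, is mostly a matter of skipping everything else. A sketch, assuming a version-1 PSD (not the large-document PSB variant) already loaded into memory and following the section order in Adobe's published spec: a 26-byte header, three sections that each start with a 4-byte big-endian length, and then the merged image data, which begins with a 2-byte compression code. The names are hypothetical, this is not the linked reader's code, and decoding the pixel data itself is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct psd_info {
    unsigned channels, depth, mode;
    uint32_t width, height;
    size_t image_data;      /* offset of the merged image's compression field */
    unsigned compression;   /* 0 = raw, 1 = RLE, others exist */
};

static uint32_t psd_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static unsigned psd_be16(const uint8_t *p) {
    return ((unsigned)p[0] << 8) | (unsigned)p[1];
}

/* Returns 0 on success, -1 if buf is not a version-1 PSD or is truncated. */
static int psd_find_composite(const uint8_t *buf, size_t len, struct psd_info *out) {
    /* Header: '8BPS', version, 6 reserved bytes, channels, height, width,
       depth, color mode; all big-endian, 26 bytes in total. */
    if (len < 26 || memcmp(buf, "8BPS", 4) != 0 || psd_be16(buf + 4) != 1) return -1;
    out->channels = psd_be16(buf + 12);
    out->height   = psd_be32(buf + 14);
    out->width    = psd_be32(buf + 18);
    out->depth    = psd_be16(buf + 22);
    out->mode     = psd_be16(buf + 24);
    size_t pos = 26;
    /* Skip color mode data, image resources, and layer/mask info:
       each is a 4-byte big-endian length followed by that many bytes. */
    for (int i = 0; i < 3; i++) {
        if (len - pos < 4) return -1;
        uint32_t n = psd_be32(buf + pos);
        pos += 4;
        if (n > len - pos) return -1;
        pos += n;
    }
    if (len - pos < 2) return -1;
    out->image_data = pos;
    out->compression = psd_be16(buf + pos);
    return 0;
}
```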

to3m 13 years ago | |

I wrote one a few years ago as well. It read layers, the summary image, and some layer metadata that I needed (blend mode, layer name, visibility flag, etc.). There's documentation for the format on the Adobe site, I think (wherever it came from at the time, autumn 2007, no fax was required), so it was actually fairly straightforward. An artist made me a bunch of PSD files containing the stuff they wanted to use, and I sat there comparing the results of my code to what Photoshop did.

The only oddity I can recall is that Photoshop does something odd with the alpha channel (I think it was the alpha channel?) by sometimes storing it with the summary image rather than with the layer it belongs to. (Don't ask me for more details than that; I don't remember.) I thought at the time that this looked like somebody's attempt to make newer data work tolerably with older revisions. That part WAS annoying, because the documentation didn't mention it, and it took about a week before somebody managed to create a Photoshop file arranged this way.

The file format overall bore many of the hallmarks of one that had grown rather than being planned, but it looks like they'd started to clamp down on things at some point because the newer data chunks looked a lot better-designed than the old ones. These things happen. It could be worse. BMP is worse. TGA is worse. They aren't even chunk-based.

beagle3 13 years ago | |

> Things were padded to 4 byte boundaries because the 68000 processor would crash if you read an unaligned 32bit value.

It is actually padded to 2-byte boundaries. The 68000 had an external 16-bit data interface. That's the only thing I would fix about IFF if I redid it today. (And I would add a 64-bit length extension, and a "reserved chunk ID" designation, e.g. anything that starts with a '$' must be registered in some central registry.)
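Going by the 2-byte correction above, here is a sketch of walking a run of IFF chunks (for example, the contents of a FORM after its 4-byte type ID). The size field excludes the padding, so the reader adds one byte after any odd-sized chunk. The function name and the buffer-based interface are assumptions for illustration; the 64-bit and '$' ideas above are proposals, not part of IFF, and are not modeled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t iff_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Find the first chunk with the given 4-character ID in buf[0..len).
   Each chunk is: ID[4], size (u32 BE, excluding header and pad), data, and one
   pad byte if size is odd, so every chunk starts on a 2-byte boundary.
   Returns a pointer to the chunk's data and stores its size, or NULL. */
static const uint8_t *iff_find_chunk(const uint8_t *buf, size_t len,
                                     const char id[4], uint32_t *size_out) {
    size_t pos = 0;
    while (pos + 8 <= len) {
        uint32_t size = iff_be32(buf + pos + 4);
        if (size > len - pos - 8) return NULL;      /* truncated chunk */
        if (memcmp(buf + pos, id, 4) == 0) {
            *size_out = size;
            return buf + pos + 8;
        }
        pos += 8 + (size_t)size + (size & 1);       /* skip data plus pad byte */
    }
    return NULL;
}
```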

runn1ng 13 years ago |

John Nack replied to this 3 years ago on his blog.

http://blogs.adobe.com/jnack/2009/05/some_thoughts_about_the...

hcarvalhoalves 13 years ago |

I appreciate the first code comment more after the introduction:

    if(sign!='8BIM') break; // sanity check

"Sanity check" as in "let's make sure it's really a PSD before we go insane".

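For context on that sanity check: in PSD's image resources section, every block starts with the signature 8BIM, followed by a 2-byte ID, a Pascal-string name padded to an even length, a 4-byte big-endian data size, and the data itself, also padded to an even length. Below is a sketch of a loop over those blocks, assuming the section's contents are already in memory. A multi-character constant like '8BIM' has an implementation-defined value in C, so this version compares bytes instead, and unlike the quoted line it reports a bad signature as an error rather than just stopping. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count the image resource blocks in buf[0..len), the contents of an Image
   Resources section without its 4-byte length. Each block is: '8BIM',
   ID (u16 BE), name as a Pascal string padded to an even length,
   data size (u32 BE), then the data padded to an even length.
   Returns the count, or -1 if a block is malformed. */
static int psd_count_resources(const uint8_t *buf, size_t len) {
    size_t pos = 0;
    int count = 0;
    while (pos < len) {
        /* The "sanity check": compare the signature bytes directly. */
        if (len - pos < 6 || memcmp(buf + pos, "8BIM", 4) != 0) return -1;
        pos += 6;                                   /* signature + resource ID */
        if (pos >= len) return -1;
        size_t name_len = 1 + (size_t)buf[pos];     /* length byte + characters */
        pos += name_len + (name_len & 1);           /* pad name to an even size */
        if (pos > len || len - pos < 4) return -1;
        uint32_t size = ((uint32_t)buf[pos] << 24) | ((uint32_t)buf[pos + 1] << 16) |
                        ((uint32_t)buf[pos + 2] << 8) | (uint32_t)buf[pos + 3];
        pos += 4;
        if (size > len - pos) return -1;
        pos += size + (size & 1);                   /* pad data to an even size */
        count++;
    }
    return count;
}
```
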
drivingmenuts 13 years ago |

So, I guess embedding a PSD in a DOC file is like putting a Bag of Holding in a Portable Hole?

bitwize 13 years ago |

And yet to be considered a non-toy image editor, you must support 100% of this format perfectly.

simula67 13 years ago |

Many more for your viewing pleasure : http://stackoverflow.com/questions/184618/what-is-the-best-c...

mmariani 13 years ago | |

Thanks for this link! It's filled with great laughs, like this one:

  #define TRUE FALSE //Happy debugging suckers

I can only imagine what the guy who wrote it must've been through... :-P

PS: I wish Jeff hadn't shut down the thread.

henrik_w 13 years ago | |

Absolutely hilarious, lots of gems there, like:

  try {
  } finally { // should never happen
  }

felipc 13 years ago |

One of my favorite blog posts from Joel Spolsky talks about this, basically explaining how these formats come to be. For mega-programs like these, the source code is the de facto file spec: www.joelonsoftware.com/items/2008/02/19.html

smosher 13 years ago |

This reminded me of just how nice the Doom WAD format is: http://doomwiki.org/wiki/WAD

When a friend complained that he had a hard time figuring out which maps were present in a given WAD, I enjoyed myself while writing a utility to organize them into directories with map numbers. I kept thinking: this is how you serialize data. Looking back on the code now, it's still easy to understand.
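For comparison, the WAD layout linked above is small enough to sketch almost entirely: a 12-byte header ("IWAD" or "PWAD", then the lump count and directory offset as little-endian 32-bit integers) and a directory of 16-byte entries (offset, size, 8-byte NUL-padded name). Maps are marked by empty lumps named like E1M1 or MAP01. A rough sketch of the map-listing part of such a utility, assuming the whole file is in memory; it is not the commenter's code, and the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t wad_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 1 if an 8-byte, NUL-padded lump name looks like a map marker:
   "ExMy" (Doom 1 style) or "MAPxx" (Doom 2 style). */
static int is_map_marker(const uint8_t name[8]) {
    if (name[0] == 'E' && name[2] == 'M' && name[1] >= '1' && name[1] <= '9' &&
        name[3] >= '1' && name[3] <= '9' && name[4] == '\0')
        return 1;
    return memcmp(name, "MAP", 3) == 0 && name[3] >= '0' && name[3] <= '9' &&
           name[4] >= '0' && name[4] <= '9' && name[5] == '\0';
}

/* Copy the names of map markers in a WAD (loaded into buf) into out[][9],
   up to max entries. Returns how many were found, or -1 if malformed. */
static int wad_list_maps(const uint8_t *buf, size_t len, char out[][9], int max) {
    if (len < 12 || (memcmp(buf, "IWAD", 4) != 0 && memcmp(buf, "PWAD", 4) != 0))
        return -1;
    uint32_t numlumps = wad_le32(buf + 4), dir = wad_le32(buf + 8);
    if (dir > len || numlumps > (len - dir) / 16) return -1;
    int found = 0;
    for (uint32_t i = 0; i < numlumps && found < max; i++) {
        const uint8_t *entry = buf + dir + 16 * (size_t)i;  /* filepos, size, name[8] */
        if (is_map_marker(entry + 8)) {
            memcpy(out[found], entry + 8, 8);
            out[found][8] = '\0';
            found++;
        }
    }
    return found;
}
```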

brendandahl 13 years ago |

If he thinks PSD is bad, he should try PDF, which is really about 30 inconsistent formats all packaged into one inconsistent format.

dschiptsov 13 years ago |

This is a much better reason to hire a person than 10 resumes. :)

flebron 13 years ago |

I like the 'sanity check' at the bottom. :)

new299 13 years ago | |

should clearly just return false. ^^

drp4929 13 years ago |

Is this a comment or a rant?

opminion 13 years ago | |

Rant, of course, as in http://news.ycombinator.com/item?id=4134426 Rants can be very informative.

bvdbijl 13 years ago | |

Both!

mbetter 13 years ago | |

Is this a comment or a question?

unix-dude 13 years ago |

lol'd hard.

joshka 13 years ago |

Whilst I enjoy jwz's writings, please follow the Hacker News guidelines, which can be found at http://ycombinator.com/newsguidelines.html

In particular: Please submit the original source. If a blog post reports on something they found on another site, submit the latter. The original source is https://code.google.com/p/xee/source/browse/XeePhotoshopLoad...

Also: Please use the original title, unless it is misleading or linkbait.

danilocampos 13 years ago | |

Hmm. Here's something else I found at those guidelines you linked, there:

> Please don't submit comments complaining that a submission is inappropriate for the site. If you think something is spam or offtopic, flag it by going to its page and clicking on the "flag" link. (Not all users will see this; there is a karma threshold.) If you flag something, please don't also comment that you did.

bigiain 13 years ago | |

In this case, I suspect jwz's commentary and re-post of it is as much "the story" as the rant in the original source.

pseut 13 years ago | |

Later on the list of guidelines:

"Don't abuse the text field in the submission form to add commentary to links. The text field is for starting discussions. If you're submitting a link, put it in the url field. If you want to add initial commentary on the link, write a blog post about it and submit that instead."

jleader 13 years ago | |

Well, from an archival point of view, I expect jwz's blog post will last for some time (I believe he understands the value of durable URLs). On the other hand, either Google Code or the Xee project has managed to break the direct link that was submitted to HN last time (less than 4 years ago).

So that Google Code link is the original source today, but it might not be tomorrow.