"I think you will all appreciate this person's commenting style"(jwz.livejournal.com) |
"I think you will all appreciate this person's commenting style"(jwz.livejournal.com) |
I therefore have a difficult time fathoming why anyone would think that a PSD file is going to be some well-organized file format that they should easily be able to parse from their own application; that is just naively wishful thinking. Even other products from Adobe have limitations when opening these files; to truly manipulate them you really need to be highly compatible with Photoshop's particular editing model (hence the conceptual difference between these two classes of file format).
1) The specs are now much more publicly accessible than they used to be, and frankly the spec does a fairly reasonable job describing a tricky format relatively compactly. It requires a fair bit of knowledge of Photoshop to read and understand, but it’s mostly fairly explicit. Much better than many other proprietary document formats.
2) For someone with relatively extensive knowledge of Photoshop, the format is fairly comprehensible, albeit complicated. The biggest part of the problem here is, as you say, that Photoshop just has a ton of features to support, so that becoming enough of a Photoshop expert to understand it all is a difficult undertaking by itself.
3) The code this comment is taken from only interacts with a small fraction of PSD features, and is frankly pretty awful code: hacky, ad-hoc, not modular at all, etc.
All that said, if someone were to redesign the PSD format today, I’m sure it would be organized quite a bit differently, and would have much better re-use of a smaller number of features. (The same goes for Photoshop itself.)
Whenever I need to write a binary serialization format, I usually copy .mov's tree-of-structs format. It's ridiculously fast, extensible, and keeps people away from C++'s terrible stream operators/Java's BinaryReaderWhateverFactoryErrorProneOneIntAtATimeReader.
No.
He will think of the "serialization format" as an interchange format between two different instances of his program. One process first writes the data file and another process later will read it. He also knows that sooner or later the "serialization format" needs to talk with different versions of his program, not just different running instances.
AFAIK, the Word .doc also started (and unfortunately continued) as basically a not-so-designed memory dump of the in-memory OLE data model. It's a format that more often than not has infamously stumped its own implementation as well. (Over time, OpenOffice has saved quite a lot of .doc files of Office users.)
And .mov would have no such concerns - its prime use case is to store data in serialised chunks anyway - it was already serialised, so it could use very dumb stores.
They aren't even always considered the non-ideal: I have seen many an argument from people who use Smalltalk that the ideal transfer format is to literally serialize part of the running program state and call it a "document", including whatever code might be required to operate the more epic parts of the document. (If you think about it, this is actually fairly similar to the various file formats that involve OLE, as you end up having the identifier of some code the user hopefully has installed attached to a block of data that that code hopefully can reinstate itself using.)
So, given that it is a tradeoff, and given that it was often a necessary one for file formats where you want or need to be able to edit files that both contain numerous nearly-unrelated features (OLE would be the most beautiful example of this in the Word container format) where the entire contents may be larger than the RAM available to the entire computer, it simply seems silly to complain about this: man up, import the data, make your own format for saving your files, and stop complaining that someone in 1990 made something that over 22 years has become slightly difficult to understand without that historical context.
This may be true, but it's not the whole story. It's the reason why the MS Office team bit the bullet and replaced .doc with .docx about 5 years ago: http://en.wikipedia.org/wiki/Office_Open_XML
Docx is basically XML in a zip file. It's a beast and has lots of compromises for backward compatibility, but as a design starting point, "zipped XML" is far far better than a binary dump of the in-memory data.
I like how you embodied your point in the unsyntax of that very sentence. ;)
And that's the basic design flaw - it is a data interchange format despite not being designed as one, hence the terrible job that it apparently does at it. The people who wrote it didn't recognise that they were going to be filling that need. There's a lesson in there somewhere.
Greetings. I have arrived from the future to spare mankind more years of pain by stating it clear here that the lesson is not "serialize your data to XML".
Now, Word docs are in XML format. Pretty extensible.
care to elaborate?
http://www.martinreddy.net/gfx/2d/IFF.txt
Things were padded to 4-byte boundaries because the 68000 processor would crash if you read an unaligned 32-bit value. So the length of the actual data was what you find in the size field of each chunk, but each chunk is padded. That way you didn't have to work around the 68000 quirks and read a byte at a time.
I wrote a PSD reader in '93. It wasn't that hard and still works today. Maybe I chose an easy subset. It only reads the original result (merged layers) that gets saved when you choose to save backwards-compatible files in Photoshop.
http://elibs.svn.sourceforge.net/viewvc/elibs/trunk/elibs/li...
The only oddity I can recall is that Photoshop does something odd with the alpha channel - I think it was the alpha channel? - by sometimes storing it with the summary image rather than the layer to which it's related. (Don't ask me for more details than that - I don't remember.) I thought at the time that this looked like somebody's attempt to make newer data work tolerably with older revisions. That part WAS annoying, because the documentation didn't mention it, and it took about a week before somebody managed to create a Photoshop file that was arranged this way.
The file format overall bore many of the hallmarks of one that had grown rather than being planned, but it looks like they'd started to clamp down on things at some point because the newer data chunks looked a lot better-designed than the old ones. These things happen. It could be worse. BMP is worse. TGA is worse. They aren't even chunk-based.
It is actually padded to 2-byte boundaries. The 68000 had an external 16 bit data interface. That's the only thing I would fix about IFF if I redid it today. (And I would add a 64-bit length extension, and a "reserved chunkid" designation, e.g. anything that starts with a '$' must be registered in some central registry)
http://blogs.adobe.com/jnack/2009/05/some_thoughts_about_the...
if(sign!='8BIM') break; // sanity check
"Sanity check" as in "let's make sure it's really a PSD before we go insane". #define TRUE FALSE //Happy debugging suckers
I imagine what the guy who wrote it must've been through... :-P
PS: I wish Jeff hadn't shut down the thread.
try {
} finally { // should never happen
}
When a friend complained that he had a hard time figuring out which maps were present in a given WAD, I enjoyed myself while writing a utility to organize them into directories with map numbers. I kept thinking: this is how you serialize data. Looking back on the code now, it's still easy to understand.
In particular: Please submit the original source. If a blog post reports on something they found on another site, submit the latter. The original source is https://code.google.com/p/xee/source/browse/XeePhotoshopLoad...
Also: Please use the original title, unless it is misleading or linkbait.
> Please don't submit comments complaining that a submission is inappropriate for the site. If you think something is spam or offtopic, flag it by going to its page and clicking on the "flag" link. (Not all users will see this; there is a karma threshold.) If you flag something, please don't also comment that you did.
"Don't abuse the text field in the submission form to add commentary to links. The text field is for starting discussions. If you're submitting a link, put it in the url field. If you want to add initial commentary on the link, write a blog post about it and submit that instead."
So that Google code link is the original source today, but it might not be tomorrow.
Old formats like PSD are better viewed as archaeological artefacts than as exemplars of some elegant ideal.
Then it wouldn't be a new, "clean" format. Then it'd be the old format with additions. Which it already is. Your post makes no sense to me.
PS: I fixed the typo in "lesson"
This argument applies all the more the bigger the markup-to-data ratio is.
Nearly everything inherits from a basic struct that is 8 bytes per atom: { length of self + children, quasi-human readable 4 char code describing contents }
Practically speaking, in C/C++, you can stride by length and switch() on the ftype, using it to cast the read-in data to whatever class/struct you desire.
All of this while being so brutally dumb that you can rewrite it over and over again in about 10 lines of code in most languages.
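For what it's worth, the stride-by-length walk described above can be sketched in C roughly like this. It assumes big-endian 32-bit sizes that count the 8-byte header plus everything nested inside, and it treats any size below 8 as malformed, so special size values that a real .mov parser may need to handle are out of scope. The names `read_be32`, `walk_atoms`, and `atom_fn` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value one byte at a time, so neither host
   endianness nor alignment matters. */
static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical callback: receives the 4-char type code (not NUL-terminated),
   the payload that follows the 8-byte header, and the payload size. */
typedef void (*atom_fn)(const char type[4], const uint8_t *payload,
                        size_t payload_len, void *user);

/* Walk the sibling atoms stored in buf[0..len).
   Returns the number of atoms visited, or -1 if the data is malformed. */
static int walk_atoms(const uint8_t *buf, size_t len, atom_fn fn, void *user) {
    size_t off = 0;
    int count = 0;
    while (len - off >= 8) {
        uint32_t size = read_be32(buf + off);   /* header + payload */
        if (size < 8 || size > len - off)
            return -1;                          /* too small or runs past the end */
        if (fn)
            fn((const char *)(buf + off + 4), buf + off + 8, size - 8, user);
        off += size;                            /* stride to the next sibling */
        count++;
    }
    return off == len ? count : -1;             /* leftover bytes are an error */
}
```

To descend into the tree, the callback can switch on the type code and call `walk_atoms` again on the payload of any atom it knows to be a container.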
I suspect that's where it originated.
The file is a single atom that has other atoms (and random parameters and such) in its data field. You end up with a big tree of atoms which can be parsed as needed. Super simple format -- like the parent, I use atom trees all the time for serialization.
IFF is: struct chunk { char tag[4]; int32 length; byte data[length]; byte padding[(2-(length%2))%2]; }
The padding is to two bytes; the tag uses ASCII exclusively with no spaces (33-127), although every format I remember uses upper case + digits. The length does not include the tag, the length field, or the padding. Microsoft, in a typical "we don't care" move, adopted the spec except that they specified little endian, whereas IFF is originally big endian.
The entire file must be one complete chunk, and is thus limited to 2GB (signed integer length).
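A rough C sketch of the chunk layout described above, in case it helps: one function writes a chunk (tag, big-endian length, data, pad byte when the length is odd) and another skips from one chunk header to the next. The function names are hypothetical, and the sketch assumes the length field counts only the data bytes, as stated above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Pad bytes needed after `length` data bytes: 0 if even, 1 if odd. */
static size_t iff_pad(uint32_t length) {
    return length & 1u;
}

/* Write one chunk into out[0..cap). Returns the number of bytes written
   (header + data + padding), or 0 if the buffer is too small. */
static size_t iff_write_chunk(uint8_t *out, size_t cap, const char tag[4],
                              const uint8_t *data, uint32_t length) {
    if (cap < 8 || length > cap - 8 || iff_pad(length) > cap - 8 - length)
        return 0;
    memcpy(out, tag, 4);
    out[4] = (uint8_t)(length >> 24);           /* big-endian length field */
    out[5] = (uint8_t)(length >> 16);
    out[6] = (uint8_t)(length >> 8);
    out[7] = (uint8_t)length;
    if (length)
        memcpy(out + 8, data, length);
    if (iff_pad(length))
        out[8 + length] = 0;                    /* pad byte, not counted in length */
    return 8 + (size_t)length + iff_pad(length);
}

/* Given the offset of a chunk header inside buf[0..len), return the offset
   just past that chunk (the next sibling), or 0 if the chunk is truncated. */
static size_t iff_next_chunk(const uint8_t *buf, size_t len, size_t off) {
    if (off > len || len - off < 8)
        return 0;
    uint32_t length = ((uint32_t)buf[off + 4] << 24) |
                      ((uint32_t)buf[off + 5] << 16) |
                      ((uint32_t)buf[off + 6] << 8)  |
                       (uint32_t)buf[off + 7];
    size_t avail = len - off - 8;
    if (length > avail || iff_pad(length) > avail - length)
        return 0;
    return off + 8 + length + iff_pad(length);
}
```

Writing a 3-byte chunk this way takes 12 bytes: 8 for the header, 3 for the data, and 1 pad byte, while the length field still says 3.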
This format has been around (and at some point, dominated image storage with its "ILBM" chunks, as well as other domains) since 1985 at least. https://en.wikipedia.org/wiki/Interchange_File_Format
See:
http://en.wikipedia.org/wiki/Interchange_File_Format
http://en.wikipedia.org/wiki/Resource_Interchange_File_Format
(Under absolutely no circumstances should anyone actually do this)