Preview in macOS Big Sur is destroying PDFs

Preview in macOS Big Sur is destroying PDFs (annoying.technology)

359 points by matrixagent 5 years ago | 317 comments

mulmen 5 years ago |

I have learned to be scared of my MacBook. Seemingly safe behavior can cause permanent damage. It does completely unexpected things, apparently by design.

I do not put my pictures in the ~/Pictures directory for fear of what the newest app will do to “improve” them for me. I fully expect it to apply lossy compression to my files without asking. This is after Photos or whatever it was called at the time mangled the dates on a bunch of my vacation photos to 10 years before the actual trip.

Oh and have fun when your photos are automatically uploaded to iCloud to save space locally then silently deleted from iCloud to... save space? My sister lost her first year of baby pictures to that one.

Same with ~/Music after iTunes wiped out a bunch of carefully curated metadata. Yes, I did want that album art.

I fat-fingered some key combination in Messages recently and got a prompt confirming I wanted to delete the entire conversation history. I consider myself lucky it bothered to ask.

I can add “view a PDF” to the list of things likely to leave me holding the bag.

cle 5 years ago | |

I have run into that Messages fat-finger-delete multiple times, it is infuriating! I still don't know what the key combination is, but IIRC the confirmation defaults to "Delete" when enter or space are pressed, which are...quite common when sending short messages.

tekacs 5 years ago | | |

They've finally removed the keyboard shortcut for this in Big Sur. :)

It was Cmd + Delete/Backspace before.

mulmen 5 years ago | | |

After my latest iCloud password change Messages has also been giving me the beachball of doom when images are received. I'm terrified of what that implies. I'm looking forward to the announcement of the exploit where a carefully crafted image owns MacBooks.

FireBeyond 5 years ago | |

Multiple people are complaining that Big Sur is blowing out their speakers on the laptops.

I submitted something yesterday where Big Sur completely breaks DSC for all non-Apple monitors (and in some cases, even those).

Oof.

p1necone 5 years ago | |

The more I learn about macs the more I think the "it just works" crowd really mean "I will sacrifice my system 'working' sometimes in exchange for having zero configuration options".

iansinnott 5 years ago | | |

In the past it really did just work and macs were configurable. The trend of limited configuration is more recent (and yes, it's terrible).

netflixandkill 5 years ago | | |

For a decade or so it pretty much did just work. Alas, nothing gold (or metallic shades of white in this case) can stay.

norswap 5 years ago | |

Apple, who sold hardware because they had the best software, now sells software because they have the best hardware.

Funny how things change.

dkonofalski 5 years ago | |

Those things are literally not possible to happen without your intervention... ಠ_ಠ

kalleboo 5 years ago | | |

Yeah, macOS does not touch files in Pictures or Music, only files that you've explicitly imported into the Photos or Music/iTunes apps. And it definitely doesn't silently delete photos from iCloud, if that was common it would be a major bug that would make the news.

robertoandred 5 years ago | |

Do you have proof of iCloud deleting photos?

meibo 5 years ago | | |

I assume these are a result of bad UX, at least in my personal experience.

iPhones used to (and maybe still do; I haven't had the pleasure in a year now) bother you quite heavily if you're at your iCloud storage cap to either upgrade or clean it out. It's not a stretch that some users might not think long enough about the consequences.

Yetanfou 5 years ago | |

Install 'Linux' [1] on the machine then? That is, assuming you're using a model which is supported by some form of Linux. That way you get to use the hardware without being bitten by the software. Linux distributions are not perfect either but they offer fewer such 'surprises'. Keep MacOS around for those times you need to run software which is only supported there but do your main work in Linux.

[1] where 'Linux' stands for any supported Linux distribution

xxpor 5 years ago |

Why anyone treats PDF as anything but a write-once format is beyond me. It's so finicky that I'm not shocked bugs like this happen. The only programs I'd be reasonably sure wouldn't screw it up are Acrobat itself, and pdflatex and friends.

I think we need a multi-image container format. It could be something that's literally a bunch of jpgs/pngs/pick your poison in a tar container, and given a new extension. OSes would open it and present it as a gallery in order. There's no value in a non-ocr'd PDF existing. For OCR'd text that gets more complicated, but it feels like we should be able to come up with a common denominator that doesn't have the legacy of a binary format derived from postscript in the early 90s.
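To make the idea concrete, here is a minimal sketch of such a container using nothing but tar. The zero-padded name prefix, the function names, and the fake image bytes are all assumptions for illustration, not an existing format.

```python
import io
import tarfile


def build_gallery(pages):
    """Pack (name, bytes) pairs into an in-memory tar archive, in page order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for index, (name, data) in enumerate(pages, start=1):
            # Hypothetical convention: a zero-padded prefix so that sorting
            # member names reproduces the page order.
            info = tarfile.TarInfo(name=f"{index:04d}-{name}")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_gallery(archive_bytes):
    """Return (name, bytes) pairs sorted by member name, i.e. in page order."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        members = sorted(tar.getmembers(), key=lambda m: m.name)
        return [(m.name, tar.extractfile(m).read()) for m in members]


# Placeholder bytes stand in for real image data.
pages = [("cover.png", b"fake png bytes"), ("scan.jpg", b"fake jpg bytes")]
archive = build_gallery(pages)
for name, data in read_gallery(archive):
    print(name, len(data))
```

A viewer would only need to sort the members and hand each one to its normal image decoder. Searchable text is the part this sketch does not address.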

crazygringo 5 years ago |

I work with a ton of PDFs between my Mac and iPad, and it mostly works but there are still just way too many bugs.

It's a lot of little things, like in Catalina where opening up the sidebar for annotations (comments) seemingly randomized their order. (Big Sur, fortunately, fixed it to be page-order again.)

Or how printing a PDF from a website (in Catalina, also seemingly fixed in Big Sur) would look right on the page... but if you copied and pasted the text from the PDF to somewhere else, something like 10% of the glyphs were scrambled ("lik3 thZs"), like some sort of character table corruption.

Or reading a PDF with Books on my iPad, maybe 10% of the time bookmarking a page... doesn't bookmark it. Or removing a bookmark... doesn't remove it. Or a handful of highlights you just made have inexplicably disappeared the next time you open the file.

Or whenever you open the PDF in Books it remembers which page you were on. Except sometimes it doesn't, so you can't really rely on that for saving your place.

Or in Books, if you select some text to copy but accidentally hit the adjacent "select all" in the pop-up menu, and you're dealing with a 400-page PDF, it just locks up and you have to restart it.

Or in Preview if you want to convert a PDF to black-and-white, there's an option for it but your PDF will balloon in filesize to 10x larger or something.

I mean, I could go on and on. It's weird, because Preview is an incredible app, really. But it really is like they build it and then never bother to test if basic workflows reliably work.

avalys 5 years ago |

This is a clickbait, sensationalist headline. “Saving a PDF with Preview in Big Sur can corrupt OCR text added by a third-party program” is more accurate.

lilyball 5 years ago |

I find posts like this completely pointless when they include no details at all. This is just "there's an incompatibility between third-party software and a version of macOS that the third-party software says they don't support yet, so I'm going to publicly criticize Apple".

If you're not going to do the work to figure out what the corruption is, at least include the two PDFs so other people can look at them and see what happened.

dewey 5 years ago | |

There’s a list of blog posts about the same problem linked in the article including a radar from 2016 (https://openradar.appspot.com/29786282) on Apple’s bug tracker. It’s not exactly an obscure bug that nobody knows how to reproduce.

lilyball 5 years ago | | |

A radar from 2016 is not useful, that describes an old bug. Just because the symptoms look like something we’ve seen before doesn’t mean it’s the same underlying issue.

matrixagent 5 years ago | |

> If you're not going to do the work to figure out what the corruption is …

I'm sorry, but last time I checked neither Apple nor ABBYY pay my salary. I really don't understand these takes. If Apple or ABBYY want my PDFs, they should be able to find my email address rather easily. Your tl;dr version of the post is completely unfair. I publicly criticize Apple because they are breaking something that potentially affects a lot of people who are unlikely to even know about it, and they are doing it for at least the second time now. If you don't think that's worthy of criticism, I don't know what is.

I also love how so many people assume I didn't already talk to support and file radars. I guess you had better luck in the past than me, but I can assure you, these options aren't always as useful as you might think they are.

JumpCrisscross 5 years ago | | |

> neither Apple nor ABBYY pay my salary

This is a fair bar for conversation, in person or online. One can be more demanding of a public write-up.

ztravis 5 years ago |

My guess is that the output PDF is still valid, but that an embedded (subset) font has had its `ToUnicode` map stripped, so that there's no link between the character codes used in the text elements and the "actual" characters they represent (there are also other ways this corruption could happen, but dropping or mangling the `ToUnicode` map seems like a likely cause).
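A toy model of that guess, not real PDF parsing: a subset font numbers its glyphs in order of first use, and a code-to-character map (standing in for `ToUnicode`) is the only way back to the text. The fallback used when the map is missing is an assumption made up for this sketch; real extractors guess differently.

```python
def subset_font(text):
    """Assign codes 1, 2, 3... to characters in order of first use.

    Returns the encoded text and the code -> character map that plays
    the role of a ToUnicode map in this toy model.
    """
    code_for = {}
    codes = []
    for ch in text:
        if ch not in code_for:
            code_for[ch] = len(code_for) + 1
        codes.append(code_for[ch])
    to_unicode = {code: ch for ch, code in code_for.items()}
    return codes, to_unicode


def extract_text(codes, to_unicode=None):
    """Turn codes back into text; without a map, guess from the raw codes."""
    if to_unicode is not None:
        return "".join(to_unicode[c] for c in codes)
    # Hypothetical fallback: treat each code as an offset into printable ASCII.
    return "".join(chr(0x20 + c) for c in codes)


codes, to_unicode = subset_font("der Hund und der Mond")
print(extract_text(codes, to_unicode))  # round-trips with the map
print(extract_text(codes))              # same length, consistent garbage without it
```

The glyphs still draw correctly because rendering only needs the codes, which would explain why the damage only shows up on copy/paste or search.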

duskwuff 5 years ago | |

This is almost certainly it. I've seen similar issues with copy/paste from poorly constructed PDFs, often ones generated by "print to PDF" features.

arthur2e5 5 years ago | | |

Very old LaTeX PDFs tend to have this issue too. Chances are pretty slim for profs to edit PDFs with Preview, I think…

lrossi 5 years ago | |

I agree. If you look closely, you can see certain patterns repeating, they’re just not English letters. But it definitely looks like natural language, and not a random binary dump.

Marioheld 5 years ago | | |

Also look at the spaces. The length of the words is the same in both texts. So the content is still present, just the characters got shifted.

zepto 5 years ago |

They are using software unsupported by the vendor and blaming Apple for the outcome.

“ABBYY says they don’t support Big Sur yet, that’s fine. But Apple didn’t tell me that I can’t upgrade to Big Sur when I use ABBYY. I’d be a lot less angry if there was a changelog or release notes from Apple where it says there is a known problem with OCR’ed PDFs in Preview. Their software is broken, they need to tell me. I don’t care if it only worked because they had workarounds for super shitty PDFs that ABBYY possibly produces, I just need my OS to keep working for me.”

userbinator 5 years ago |

I remember many years ago distributing PDFs as part of course material, that Adobe's official reader would open just fine, but Mac's built-in one wouldn't (and simply fail with a useless "an error occurred" message.) Only a small subset of the class was using Macs and the built-in reader, so it took a while to discover. The problem eventually turned out to be some oddity in the way it treats whitespace[1], that Adobe and a few other readers were perfectly fine with, but not Preview.

[1] PDF is one of the strangest file formats I've worked with. It is a bizarre mix of binary and text, and some of the other design decisions are also perplexing.

rubyn00bie 5 years ago | |

> PDF is one of the strangest file formats I've worked with.

Do you by chance have a "definitely strangest" file formats? Just curious if something out there is vastly weirder, or more perplexing, than PDFs?

agersant 5 years ago | | |

I haven't worked with it myself but I heard Photoshop's PSD format is a good candidate.

maximilianburke 5 years ago | | |

Yes, Adobe's PSD is definitely more weird and perplexing than PDF.

unfocused 5 years ago |

I think the HN crowd has forgotten that the entire legal system uses PDFs, and in addition uses the redaction features of the likes of Adobe Acrobat, as well as others trying to squeeze in like Foxit.

Redaction is huge in governments that have gone digital. Gone are the days where you print the paper, black it out, and then photocopy it.

I have worked with PDFs for a long time, and if you ever wanted compatibility, you had to use Adobe Pro, since there were so many bad PDFs with weird embedded stuff that only Adobe could read properly...because it was initially created in Adobe sigh

All other products try to catch up, but they can't clean up the mess that Adobe has left behind.

mhh__ 5 years ago |

Preview seems like a good example of something that's worth open sourcing. Not only will people end up doing work for you, you get eyes on the code and more direct issue tracking.

Consumers get a product and they still have to go on Mac to use it.

bigbubba 5 years ago | |

I've been looking for a FOSS desktop agnostic universal file previewer or thumbnail generator for a while now; if anybody has suggestions I'd love to hear them. Ffmpegthumbnailer for video thumbnails or imagemagick for image thumbnails are fine, but what about previews for things like ebooks or PDFs? Something that provides a one-stop-shop for as many common filetypes as possible is what I'm looking for.

My current solution is controlling a floating mpv window to open image, video or audio files as they are selected. This works well for A/V but not so well with other sorts of documents.

hydrox24 5 years ago | | |

> but what about previews for things like ebooks or PDFs

MuPDF is a great FOSS application and my go-to PDF reader. It lacks fancy annotation, and doesn't even have great text selection and copy/paste, but it is really fast, and has fast search, manipulation, etc.

https://mupdf.com/

duskwuff 5 years ago | |

For what it's worth, Preview is a relatively thin shell around Apple's own PDFKit:

https://developer.apple.com/documentation/pdfkit

Whether that could itself be open-sourced is an interesting question. (My concern would be that parts of it might be covered by Adobe NDAs.)

tonyedgecombe 5 years ago | | |

My understanding is that it is Apple’s own code, they didn’t license it from Adobe.

arvindamirtaa 5 years ago | |

>Consumers get a product and they still have to go on Mac to use it.

There will be ports to windows and linux in under a month.

fastball 5 years ago |

From what I can tell, there is no reason you can't just run the PDF through ABBYY FineReader again and get the exact same OCR you got the first time, so I think "irreversible" is a bit over-the-top.

Is it as easy as CMD+Z? No. Is it data you can never get back? Also no.

matrixagent 5 years ago | |

In theory that is probably true – in my actual scenario I can't run them through ABBYY again because of the limitations of the bundled version. It only accepts PDFs coming from the scanner software, so running these through ABBYY again would give me an error message. I'd have to buy the full version to be able to try out that workaround.

non-nil 5 years ago | | |

On a totally not entirely unrelated note, I have found ExifTool[0] to be quite useful for many tasks. Especially in combination with a bash alias or simple Automator action, to be used in the services menu, or as a droplet or folder action.

[0] https://exiftool.org/TagNames/PDF.html

cprecioso 5 years ago |

This happened to me in Catalina as well. This summer I was preparing the paper proceedings for a conference, which were made with InDesign. I had to remove a couple of pages from the output, did so with Preview, and from then on, the text was garbled on copy-pasting. Had to switch to using Acrobat for that step.

juskrey 5 years ago |

Preview for PDF manipulation was a nice try at first, until I realized I suddenly had unexpected problems with produced docs, trouble with drag-and-drop, overwritten files, etc.

Now I am using PDFGenius and never looking back.

e40 5 years ago |

Let's be real. Every single macOS release, until it reaches x.y.4 or x.y.5 is just in beta and you are the tester.

I upgraded to Catalina when it hit 10.15.6, and I watched for the year since the release all the comments and posts about the horrible things it was doing to their computer, files, apps, etc.

Apple supports the latest 2 versions of macOS. Always be on the "previous" one is my advice. Since my family and friends started following it, they are much happier and more productive.

Let the masses beta test.

fastball 5 years ago | |

Is that not like, every piece of software ever?

I don't know very many pieces of widely used / actively developed software that stayed static on X.0.0 for more than a couple weeks after release or so.

e40 5 years ago | | |

No, it's not. I knew I'd get downvotes. Don't mind. I don't say this about macOS lightly. I've been using it since 10.0.

krull10 5 years ago | |

Seconded for macOS. I usually update when the next major release is about to be announced. By 11 months they usually finally have the bugs worked out. It isn’t always necessary for every yearly release, but once you’ve been burned a few times you learn it’s better to wait for several point releases...

ehutch79 5 years ago |

Apple has a lot of shit they need to fix in macOS and the accompanying apps.

That said, the author of this article is clearly an ass, and i have a hard time being sympathetic.

Assuming the pdf is actually in spec, which it's probably not, this shouldn't be happening. That said, if the 3rd party app vendor says the pdfs they generate are broken in big sur, that should tell you, they may be broken other places as well, and it's probably not apple's issue.

matrixagent 5 years ago | |

Could you explain why or how exactly I'm an ass?

ehutch79 5 years ago | | |

To quote:

"""But Apple didn’t tell me that I can’t upgrade to Big Sur when I use ABBYY"""

cosmotic 5 years ago |

The text corruption doesn't appear to be random. The same word gets converted to the same corruption. It's more likely an encoding/decoding bug.

dev_tty01 5 years ago |

Preview used to be solid, but it has been increasingly fragile in recent years. I found PDF Expert to be a great replacement. I have no affiliation.

nerpderp82 5 years ago |

> You have to completely close the file and reopen it, only then will you realize that it has been destroyed.

Someone 5 years ago |

At first glance, it’s a substitution cipher. Every ‘a’ becomes a filled square, every ‘b’ a ‘p’, every ‘c’ a ‘(’, every ‘d’ a ‘)’, etc.

However, there are exceptions, for example the first ‘b’ on line 10. It becomes an ‘ä’ on line 21. I guess that’s because that is bold text, and thus a different font.
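If anyone has the before and after text, a rough way to test this is to check whether the two strings are consistent with a one-to-one character map. This is only a sketch and the sample strings are made up; if the per-font guess is right, each font run would have to be checked separately.

```python
def substitution_map(original, garbled):
    """Return the original -> garbled character map if the two strings are
    consistent with a one-to-one substitution, otherwise None."""
    if len(original) != len(garbled):
        return None
    forward, backward = {}, {}
    for o, g in zip(original, garbled):
        # setdefault records the first pairing and returns it on later hits,
        # so any mismatch means the mapping is not consistent.
        if forward.setdefault(o, g) != g or backward.setdefault(g, o) != o:
            return None
    return forward


# Made-up sample in the spirit of the screenshot: a -> filled square, b -> p, c -> (
print(substitution_map("abba cab", "■pp■ (■p"))
# A 'b' that maps to two different symbols (e.g. regular vs. bold) breaks the map.
print(substitution_map("abba", "■pä■"))
```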

rubatuga 5 years ago |

Once again, the Hacker News comments prove to be more useful and insightful than the article itself.

kekeblom 5 years ago |

I had an issue recently where the form contents filled and saved with Preview.app would not show up in acrobat reader. I've encountered this in two cases so far, with two completely different documents.

qwerty456127 5 years ago |

I have encountered too many PDFs (mostly digital originals rather than OCRed scans) corrupted this way during the recent months. Now I see why...

skissane 5 years ago |

I hate Preview's PDF editing features, I wish there was a way to turn them off.

I'm the kind of person who tends to randomly click on things as I read them. In other PDF readers, this is quite harmless. In Preview, it starts editing the PDF. 99.9% of the time I have zero interest in editing or annotating the PDF I am reading. And then when I quit it asks me if I want to save a copy. I never wanted to change it to begin with!

(Maybe it is time I found another PDF reader...)

jordache 5 years ago |

Anyone else not able to see sufficient detail in the tiny screenshots? What was the difference?

sp332 5 years ago | |

The difference to look for is between the top half on the right vs the bottom half on the right. The text has been scrambled into random symbols.

Here's a direct link to the 2,240x939 image: https://annoying.technology/media/previeweatingpdfs.png

r00fus 5 years ago | |

There is a more detailed image link in the doc.

lisper 5 years ago |

Using Apple devices in general seems like a total crap shoot to me nowadays because of the impossibility of down-grading the OS. Every "upgrade" comes with a considerable risk that something that had been working will stop working, and if that happens, you are pretty much SOL.

fastball 5 years ago | |

What? You can definitely downgrade to an earlier MacOS.

It's not a one-click downgrade like the upgrade is, but I don't know of any OS with that feature.

lisper 5 years ago | | |

> You can definitely downgrade to an earlier MacOS.

Sometimes you can, sometimes you can't. Going from Mavericks to Yosemite for example is one-way because it includes a non-backwards-compatible firmware update. Going to Catalina is also one-way because it changes the file system from HFS+ to APFS.

And iOS is famously non-downgradable.

rbanffy 5 years ago | | |

> What? You can definitely downgrade to an earlier MacOS.

Unless they got a brand-new M1-based Mac. Macs usually don't install versions of macOS prior to their launches.

0000011111 5 years ago |

Use "Adobe Acrobat Reader DC" for pdf work on macOS v11.1

tonyedgecombe 5 years ago | |

I tried that and it was less reliable than Preview.

ProAm 5 years ago | | |

In what ways?

nt2h9uh238h 5 years ago |

Is this German?

matrixagent 5 years ago | |

Yes.

anonuser123456 5 years ago |

Time machine?

dewey 5 years ago | |

Backups are always great, but if something is broken silently behind your back and you only realize in a few years that your archived documents are not searchable any more that makes it harder to recover.

beamatronic 5 years ago |

Preview should not change the file on disk. I would expect it to open the original file as read-only.

blacksmith_tb 5 years ago | |

Yes, the author says it's "the result after modifying (removed a blank page) and saving that same PDF in Preview." So it's not enough to just view the file in Preview.app I take it, but you need to save it out (which still shouldn't strip anything extra, obviously, but is not what I thought was being claimed).

birdyrooster 5 years ago | | |

So you are saying that they overwrote their file and are upset that the file they overwrote is different from the new file? This is insanity. Clearly a bug in ABBYY that it can’t read a PDF saved in the standard spec.

PDF is not a bitmap, it’s a script like HTML or JS. People understand browser incompatibility but somehow this is unconscionable.

throwaway744678 5 years ago | |

I understand it does not: the issue occurs when the user removes another (blank) page, then saves the file.

MrBuddyCasino 5 years ago | |

> In the lower half is the result after modifying (removed a blank page) and saving that same PDF in Preview.

I don't think this means Preview changes the files just by opening them.

YetAnotherNick 5 years ago |

PDFs are not intended to be modified. Preview and other readers use hacks to do the work. In general don't modify the PDF and if you really want to do it buy Acrobat reader.

tonyedgecombe 5 years ago | |

The PDF file format has a mechanism in it for modifying documents.
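That mechanism is the incremental update: changes are appended after the original bytes along with a new cross-reference section and trailer, so each revision ends at its own `%%EOF` marker. A rough sketch of spotting those revisions follows, using made-up placeholder bytes rather than a real PDF. Treat it as a heuristic only, since linearized files can also contain more than one marker.

```python
def split_revisions(pdf_bytes):
    """Return the byte prefixes that end at each %%EOF marker.

    For an incrementally updated file, each prefix is a candidate earlier
    revision. This is a heuristic sketch, not a PDF parser.
    """
    marker = b"%%EOF"
    revisions = []
    start = 0
    while True:
        pos = pdf_bytes.find(marker, start)
        if pos == -1:
            break
        end = pos + len(marker)
        revisions.append(pdf_bytes[:end])
        start = end
    return revisions


# Placeholder bytes: an "original" body plus one appended "update".
original = b"%PDF-1.7\n...original objects...\n%%EOF"
updated = original + b"\n...appended objects, new xref and trailer...\n%%EOF"
print(len(split_revisions(updated)))
```

An editor that rewrites the whole file on save instead of appending would leave a single revision, which is one way to tell the two approaches apart.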

sn41 5 years ago |

There was something in macos Catalina that broke mupdf on my macbook pro. The view would occupy the lower left corner of the window, and something was clipping the view to the lower quadrant.

I tried installing from source, changing the gl library etc. But it was the same.

Am done with Apple for now. M1 is a bit tempting, but I guess I will wait for the technology to mature, buy a Macbook Air, and run Linux on it.

ehutch79 5 years ago | |

Why would installing from source change things? Without finding/fixing the bug, you're just using the same compiled code as before.

sn41 5 years ago | | |

I was trying to avoid library incompatibilities. I pulled everything from the repository and recompiled with the latest libraries. I also tried a couple of different libraries. I gave up after a week or so. (What I did not do was to compile the libraries from source as well.)

I really like mupdf so it was a big nuisance for me to lose that.