Apple File System Reference [pdf](developer.apple.com) |
Apple File System Reference [pdf](developer.apple.com) |
[1] https://developer.apple.com/library/archive/technotes/tn/tn1...
Kudos to Apple for providing the information. It's a hell of a lot better than reverse engineering the thing (see how many years it took to get NTFS down...)
Or does it matter, performance-wise? People that know more can chime in.
"This document is for developers of software that interacts with the file system directly, without using any frameworks or the operating system—for example, a disk recovery utility or an implementation of Apple File System on another platform."
"You cannot enable Fast Directory Sizing on directories containing files or other directories directly; you must instead first create a new directory, enable fast directory sizing on it, and then move the contents of the existing directory to the new directory."
but there was never any documentation on how to do this, and no Apple engineer would say. The most common internet theory seemed to be that this feature was purely automatic, and all mentions (like this) in the docs were just incredibly misleading.
Now it seems we have an answer, in this flag: "INODE_MAINTAIN_DIR_STATS: The inode tracks the size of all of its children."
HFS+ had no knowledge about Fusion drives, the caching was handled entirely at block-level by the lower CoreStorage layer (although later versions did add some flags so CoreStorage could pin metadata/swap blocks to the SSD).
Now what I'm really interested to see is if they open-source the filesystem driver along with the macOS 10.14 code drop. HFS+ (and its utilities) has always been open-source, last year APFS was not.
Having had to replace failed HDDs in Fusion Drive iMacs at work, it's certainly no fun. For all new Mac purchases I ensure they are SSD only now.
It certainly is better to have the filesystem aware of the Fusion situation, but...measurably, significantly better? Would the experience have been significantly worse without it? 10.13 betas allowed APFS use on Fusion drives, presumably without any Fusion-awareness in the FS.
I'm surprised, but happy to see they did it.
Those counters always are 64 bits, and won’t overflow in normal use (for example, the text says: ”if you created 1,000,000 transactions per second, it would take more than 5,000 centuries to exhaust the available transaction identifiers.”), but I can see people making ‘interesting’ disk images, for example ones where writing to a specific directory is impossible or, depending on how the implementation handles it, even panics the OS.
I fear this is too little, too late to have iDefrag make a comeback. I understand defragmenting an SSD typically does more harm than good [edit: and I only defrag spinning drives), but nothing touched it for effectiveness on spinning drives.
https://coriolis-systems.com/iDefrag/
https://coriolis-systems.com/blog/2017/9/what-works-macos-10...
I’d also hoped that a next generation file system from Apple would have had more to say on this topic, but it seems like features that promote their iOS device agenda took front seat over less “sexy” features like data integrity.
In the days before iOS devices dominated OS level decision making at Apple there was an assumption that Apple might adopt ZFS as their next generation file system, which is apparently much better in this regard. There’s various evidence of a cancelled MacOS ZFS project scattered throughout past MacOS releases.
> https://en.wikipedia.org/wiki/Data_degradation
> https://arstechnica.com/gadgets/2016/06/zfs-the-other-new-ap...
For a long time Apple has had an HFS+ driver baked into the firmware. The way APFS is implemented with EFI jumpstart, they've got much less filesystem code in firmware.
It glosses over and assumes knowledge of XDR from an external source. That is documented here: https://tools.ietf.org/html/rfc1014.html
I only have spinning rust in my (older) NAS or as backup drives.
I have a multi-HDD system running to 6TB of storage, I can't run to the £ of an all-SSD system at the moment, but that's on the cards for the future, so it's spinning rust for me, for now.
It's actually quite a clever spec, because it takes advantage of existing efforts to read fragmented files to perform the majority of the de-fragmentation process.
I'm not sure if this spec applies to APFS or for SSDs. (With SSDs you're generally better off not defragmenting most of the time, because the performance penalty is far lower, but the write amplification has consequences.)
——————
When a file is opened on an HFS+ volume, the following conditions are tested:
If the file is less than 20 MB in size
If the file is not already busy
If the file is not read-only
If the file has more than eight extents
If the system has been up for at least 3 mins
If all of the above conditions are satisfied, the file is relocated—it is defragmented on-the-fly.http://osxbook.com/software/hfsdebug/fragmentation.html
——————
Defragging has been snake oil for more than a decade, anything that hastens it’s demise is a good thing.
ZFS in particular completely falls off a cliff somewhere between 80% and 90%, due to the copy on write nature of ZFS always allocating and freeing small bits of space. That creates the little gaps all over the FS which murder performance when the big gaps run out.
It’s not technically the little holes or even CoW that is the problem. It simply switched from a “first fit” to a “best fit” algorithm as it got full which was quite expensive to do the search for.
When ZFS was opensourced under the CDDL, lots of people complained that they should have chosen a clearer, more permissive opensource license. Other people said it was fine, because the license was good enough and Sun is full of good people. The way everything played out, its clear the first group's concerns were valid.
Its a huge shame. ZFS is a fantastic piece of engineering. It was ahead of its time in lots of ways. It would take years for btrfs to become usable and for apfs to appear on the scene. If not for the weird licensing decision, zfs would almost certainly have landed in the linux and macos kernels. We almost had an ubiquitous, standard, cross platform filesystem.
For more history about Sun and Oracle, this talk by Bryan Cantrill is a great watch: https://www.youtube.com/watch?v=-zRN7XLCRhc
For example, data painstakingly entered by the user a character at a time with a keyboard might deserve more redundancy than for example a movie downloaded by iTunes.
Also, if I pull the drive and move it to another machine, again it’s kind of nice if the data integrity features are tied to the drive format rather than higher level software. I don’t think it’s too unreasonable to expect the file system to make sensible guarantees that the sequence of bytes I record today will remain the same until I next interact with them.
I’m not sure how appropriate comparisons with IP error correction is either; it’s a markedly different class of problem really (you are not dealing with long term storage issues at all).
Defragging an SSD is often a good thing? Why? It would seem to greatly increase wear for no benefit.
But I don’t see how this would ever realistically happen.
That said, your hard drive already does block level checksumming so doing it at the FS layer is mostly redundant unless the errors are being introduced in your SATA controller or on the PCI bus.
If a bit flip occurs during the path to storing data, that could get persisted. That's a moment in time, though. Maybe you'll notice the document you just wrote seems corrupted, or just has a typo.
But if you write successfully to disk, you are trusting that data to stay there long-term. If years later your drive corrupts a bit, you may have a very hard time noticing. Bad RAM manifests as computer instability and you can just replace RAM without data loss, as nobody is permanently storing data in RAM
Because the data spends so much longer on disk than in RAM, the chance of a bit flip affecting stored data.
[1] https://support.apple.com/en-us/HT1375 (last updated 2010) [2] http://osxbook.com/software/hfsdebug/fragmentation.html (from 2004)
The red slivers towards the lower middle are where the reported fragmented files are located, and the lower right is the fragmented files listed by number of fragments descending.
The anonymised .mkv files are training videos with multiple language and subtitle streams. They're exported to a scratch drive and then copied to the the drive they currently reside on.
After a full, offline defrag (including a b-tree rebuild) the legend for defragmentation is a neatly arranged list of that blue/grey colour, no red to be seen.
I know from personal experience how powerful this placebo effect can be.
Perhaps. I can certainly tell the difference performance when an overnight defrag run has finished, and I have no beef with you to prove or disprove a point.
If you're ever in North Cornwall, UK: drop me a line and I'll show you a before and after over a mug of coffee/tea/etc.