NixOS on Btrfs+tmpfs (cnx.srht.site)
That doesn't sound right. Noatime turns off recording of the last access time, not modification.
This noatime thing is an old wives' tale that needs to die.
AFAIK, most "modern" filesystems (XFS, BTRFS, etc.) default to relatime.
relatime maintains atime but without the overhead.
EDIT TO ADD:
Actually, I've just done a bit of searching... relatime has been the kernel mount default since 2.6.30! [1]
[1] https://kernelnewbies.org/Linux_2_6_30 (scroll to 1.11. Filesystems performance improvements)
The cost of atime is an extra write every time you read something.
Relatime changes this to one atime update per day (by default), low enough that it usually doesn't matter.
However, that update per day may have significant impact when you are using Copy-on-Write filesystems (btrfs, zfs). Each time the atime field is updated you are creating a new metadata block for that file. Old blocks can be reclaimed by the garbage collector (at an extra cost), but not if they exist in some snapshot.
All of this means that if you use btrfs/zfs, have lots of small files, and take snapshots at least once per day, there's a noticeable performance difference between relatime and noatime.
I've been using noatime everywhere for several years and I've never noticed any downside. This is definitely my recommended solution.
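For reference, the relatime rule discussed above can be sketched roughly as follows. This is a simplified Python model of the decision, not kernel code: the function name `relatime_needs_update` and the plain-number timestamps are illustrative assumptions. Besides the once-per-day refresh, the model also updates atime whenever it is not newer than mtime or ctime, which is the "has this file changed since it was last read" case.

```python
DAY_SECONDS = 24 * 60 * 60


def relatime_needs_update(atime, mtime, ctime, now):
    """Return True if a read at time `now` should rewrite atime (toy model)."""
    # The file was modified after (or at) the last recorded access.
    if mtime >= atime:
        return True
    # The inode metadata changed after (or at) the last recorded access.
    if ctime >= atime:
        return True
    # Otherwise refresh only if the recorded atime is at least a day old.
    return now - atime >= DAY_SECONDS
```

With noatime none of these branches apply, so on a copy-on-write filesystem a plain read never dirties a metadata block.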
I would not recommend doing that. It might work for now, but there's a high risk of the disk being seen as "empty" (since it has no partition table) by some tool (or even parts of the motherboard firmware), which could lead to data loss. Having an MBR, either the traditional MBR or the "protective MBR" used by GPT, prevents that, since tools which do not understand that particular partition scheme or filesystem would then treat the disk as containing data of an unknown type, instead of being completely empty; and the cost is just a couple of megabytes of wasted disk space, which is a trivial amount at current disk sizes (and btrfs itself probably "wastes" more than that in space reserved for its data structures). Nowadays, I always use GPT, both because of its extra resilience (GPT has a backup copy at the end of the disk) and the MBR limits (both on partition size and the number of possible partition types).
https://openzfs.github.io/openzfs-docs/Getting%20Started/Nix...
[0] I lost 2 root filesystems to btrfs, probably because it couldn't handle space exhaustion. I'm paranoid now.
I use nixos with zfs on /home, /nix and /persist. Everything else is tmpfs, including /etc. Mostly you can configure applications to read config from /persist, but when not, a bind mount from /etc/whatever to /persist/whatever works pretty well.
I will never use a computer any other way again.
Isn't this useless? My understanding is that compression is only done at file write time. When you "btrfs send" a snapshot, the data is streamed over without recompression, so there's no point in setting up a higher compression level in the backup disk.
- Fill your root partition as root with "dd if=/dev/urandom of=./blabla bs=3M"
- rm blabla && sync (we don't want to be unfair to such a fragile system)
- Reboot and end up with unbootable /
It's a mess; for a filesystem, I would declare it alpha stage.
Ideally reproduce it with a mainline kernel or the most recent stable. And post the details on the linux-btrfs list. They are responsive to bug reports.
If you depend on LTS kernels, then it's OK to also include in the report that the problem happens with kernel X but not kernel Y. Upstream only backports a subset of fixes and features to stable. Maybe the fix was missed and should have gone to stable, or maybe it was too hard to backport.
These are general rules for kernel development, it's not btrfs specific. You'll find XFS and i915 devs asking for testing with more recent kernels too.
But in any case, problems won't get fixed without a report on an appropriate list.
The mistake, however, is that even though it isn't practical to make theoretical guarantees that the filesystem won't end up full and broken, it is very possible to make such a thing happen only in exceedingly unlikely cases. One runaway dd isn't that...
Every time I read about it, someone is losing data.
Thank god Ubuntu makes zfs very easy to use. No reason to even consider touching btrfs.
https://bugs.launchpad.net/ubuntu/+source/zfs-linux/+bug/190...
I've used noatime by default, except for a few cases where I know atime is used, in professional settings for probably two decades. Hopefully you know what kind of application you are running. There are many parameters in a system and this is just one of them.
The only times I've seen atime used have been for two queues, and only for the case of "has this file changed since it was last read". And that is precisely what relatime is for; the daily update is just an optional extra.
And to other comments:
It was a DC-Harddisk, and NO, not even root should be capable of destroying the filesystem by simply writing to it; it's not 1970 anymore.
Calculating the "to reserve" Metadata-block should be rather trivial since it's ONE big file. And it's not dd that is the problem, it's btrfs that cannot handle a process that writes ONE BIG file.
Make an openSUSE or Fedora VM (I tested just these two), fill it, and see it not boot anymore... it is trivial.
Most distros require a read-write /sysroot, and expect the ability to write to disk. If they can't write, various services will fail and that can prevent startup from proceeding further. But without any logs, we have no idea what you're actually experiencing.
You are saying it won't boot, but that's not at all a case of a broken file system. It's an expected consequence of the file system being full. Since the example's clear intent was to make the file system completely full, it's a setup that prevents the file system from accepting further writes.
Overwriting file systems are expected to run into this problem less, but aren't immune to it. If the data write requirement is new writes, rather than overwriting, it'll fail whether ext4, XFS or Btrfs. If the requirement involves overwriting, it's expected overwriting file systems will succeed where COW file systems simply can't. It might be a valid argument in favor of non-root users being disallowed to use the last 1-5% of free space on any file system.
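The idea of keeping non-root users out of the last few percent of free space can be sketched as a simple admission check. This is a toy Python model, not how any particular filesystem implements it: the function `write_allowed` and its parameters (including the 5% default) are hypothetical names and values chosen for illustration.

```python
def write_allowed(total_bytes, free_bytes, request_bytes, is_root, reserve_percent=5):
    """Toy admission check: only root may dip into the reserved space."""
    # Integer arithmetic avoids floating-point rounding on the reserve size.
    reserve = total_bytes * reserve_percent // 100
    # Non-root writers see the free space minus the reserve, never below zero.
    usable = free_bytes if is_root else max(free_bytes - reserve, 0)
    return request_bytes <= usable
```

In this model a full-looking filesystem still leaves root some room for the writes needed to clean up, for example the metadata a delete has to write first.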
Please read my comment in full, especially this point:
- rm blabla && sync
- reboot
It's not dd, it's one process run by root that fills the filesystem with one big file. That's like the first thing I would test to see if it can destroy my filesystem.
It's really the filesystem's responsibility. If it needs to reserve 30%, so be it; if it needs more because I wrote billions of files, so be it (even if it says "sorry, I told you I had 50GB free, but because you wrote so many small files it's now just 45GB"; after all, it can only make an estimate). But it's the filesystem's job to tell me roughly how much free space I have, and to stop writing before it really/internally can't take any more. And NOT to kill itself because I allocated 100% of it; there's just no excuse. That's the filesystem's responsibility.
PS: The clever ZFS survives that "unlikely" test easily.
For example, say you try to delete a file, which is part of one of multiple identical snapshots, so deleting the file doesn't free up any space, but does require extra metadata to be written (since a new directory entry will be needed that shows the file is deleted in this snapshot only).
The same operation could be done for millions of files, eating up all the reserved space. End result: full disk and unusable filesystem, even for deletes.
The alternative is not to allow file deletes to use reserved space. But now when you have a full disk, some things become 'undeletable', since the only way to free space is to delete all copies of the file, but it isn't permitted to delete any one copy of the file since the intermediate state would use more disk space.
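The snapshot scenario above can be illustrated with a toy model. This is only a sketch under simplifying assumptions (one metadata block per recorded delete, data reclaimed only when the last snapshot reference goes away); `ToyCowFs` and its methods are hypothetical names, and nothing here mirrors the real btrfs or ZFS allocators.

```python
class ToyCowFs:
    """Toy copy-on-write filesystem that counts free blocks only."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks
        self.files = {}  # name -> (data blocks, set of snapshots referencing it)

    def add_file(self, name, blocks, snapshots):
        if blocks > self.free_blocks:
            raise OSError("ENOSPC")
        self.free_blocks -= blocks
        self.files[name] = (blocks, set(snapshots))

    def delete(self, name, snapshot):
        # Recording the deletion costs one new metadata block up front.
        if self.free_blocks < 1:
            raise OSError("ENOSPC")
        self.free_blocks -= 1
        blocks, refs = self.files[name]
        refs.discard(snapshot)
        if not refs:
            # Last reference gone: only now are the data blocks reclaimed.
            self.free_blocks += blocks
            del self.files[name]
```

For example, with 12 free blocks and a 10-block file shared by snapshots "a" and "b", deleting it from "a" leaves 1 free block instead of gaining any; only the second delete gets the 10 blocks back. Start with no free blocks at all and even the first delete fails.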
Btrfs won't issue the writes for super block update until the device says the current metadata transaction is successfully on stable media.
It is possible the filesystem is completely consistent (can be mounted, btrfsck finds no error), and yet not bootable due to the interruption of updates. Software updates are one transaction in user space but not atomic unless expressly designed for it. From the fs point of view, a software update might be broken up into dozens of fs transactions.
It's also possible the device lies about writes being on stable media. If the fs writes some metadata, does flush/FUA, then the super block write, and flush/FUA, the device should only write the super block after the prior write is on stable media. If it says the first flush succeeded but that write is still happening, and the super block write goes to stable media before all the metadata writes get to stable media and there's a crash or power failure, then you can in fact have a broken filesystem. The super points to tree roots that don't exist. This is definitely a device flaw, not an fs flaw.
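The ordering described above (metadata, flush, super block, flush) and the way a lying device breaks it can be shown with a small simulation. This is a toy Python model, not btrfs code: `ToyDevice`, `commit`, and `is_consistent` are hypothetical names, and the "device" is just a dictionary plus a list standing in for a volatile write cache.

```python
class ToyDevice:
    """Toy block device with a volatile write cache (illustrative only)."""

    def __init__(self, honors_flush=True):
        self.honors_flush = honors_flush
        self.cache = []    # writes acknowledged but not yet on stable media
        self.stable = {}   # contents of stable media

    def write(self, key, value):
        self.cache.append((key, value))

    def flush(self):
        # An honest device drains its cache before reporting success.
        # A lying device reports success and leaves the cache untouched.
        if self.honors_flush:
            self.stable.update(self.cache)
            self.cache.clear()

    def power_loss(self, survivors):
        # Only cached writes whose keys are in `survivors` reach the media.
        for key, value in self.cache:
            if key in survivors:
                self.stable[key] = value
        self.cache.clear()


def commit(dev, generation):
    """Write the new tree root, flush, then point the super block at it."""
    dev.write(("root", generation), "tree data")
    dev.flush()
    dev.write("super", generation)
    dev.flush()


def is_consistent(dev):
    """The super block must point at a tree root that is actually on media."""
    gen = dev.stable.get("super")
    return gen is None or ("root", gen) in dev.stable
```

With `ToyDevice(honors_flush=True)`, the state stays consistent no matter when power is lost, because the super block can never reach the media before its tree root. With `honors_flush=False`, a power loss in which only the "super" write survives leaves a super block pointing at a root that does not exist.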
Btrfs super blocks contain 3 backup roots. So it's possible to revert to an older and hopefully correct metadata generation (seconds to a couple minutes ago). But this has limited recovery potential. It's also completely thwarted right now if you use any discard mount option on an SSD because discard will ask the device to garbage collect recently freed metadata blocks. So the backup root trees pointed to by the super may already be zeros when they're needed.
But any need for backup roots already means some kind of device (firmware) flaw.
I've heard about this, but my understanding was that when this happens, performance becomes extremely poor. While that may be quite bad, it's still worlds apart from losing data.
There's also the fact that the user may have partitioned the drive in such a way as to prevent it from ever filling up. Even root can't fill the partition beyond its size. Here, you have to go out of your way to make sure the partition doesn't fill up, or else you have a bad time. Shit happens, so this does look like a FS bug to me, much more than PEBCAK.
Except if you keep overwriting a flash-based storage system, at some point that flash storage gets destroyed (wear level). You can absolutely achieve such by having a near full filesystem on flash. Mechanical harddrives or partitions don't suffer from this issue.
Perhaps the issue occurs more quickly on btrfs, that I don't know, but it could happen on any filesystem. On the other hand, you should have backups. Personally, I use ZFS on two of my machines, with snapshot feature.
Wearing out is yet a different thing. I've had this happen on an SD card. It would refuse to write anything new, although it reported being mostly free. But the stuff that was already on it was readable.
I've had SD cards that got full. They didn't lose any data, and once I'd moved the things off them, they became usable again.
Granted, this was with a digital camera, so it was using FAT32 at the time, no fancy FS.
Exactly, a swimming pool should never explode if you overfill it; however, it's the user's responsibility to turn the water off to prevent "data/water-loss".
That's why we made filesystems: to preserve and organize data and to tell the user/system when they can't take any more.
In the ordinary cases, btrfs behavior when full is the same as other filesystems. It gets full, you can delete files. Keep in mind that deleting files on any COW filesystem is a write that consumes free space before space is freed by the delete operation. There is reserve space for this. If you hit an edge case (which would be a bug), there are currently no known data-eating bugs. But it's not always obvious to the user that their data hasn't been eaten if the filesystem won't mount. As in, this is indistinguishable from data loss. Nevertheless it's a serious bug, so if you have a reproducer it needs to be reported.
But hey, how about a quota behind the scenes?... you know, like ZFS? AFS? ReFS?... you know, so the filesystem tells the user "sorry, can't take any more" before it really can't take any more? That would be some crazy enterprise-level stuff...
You know, a filesystem that immediately stops writing and instead cares more about the data that's already on the platter?
BTW: It was a DC-Harddisk
"Can't boot" is also vague. I had data loss with XFS in 2002 or so (didn't have backups) and couldn't mount the filesystem anymore. Thanks to help on IRC from devs I got almost all data back. I've been able to recover data from a dying Deathstar, too. And then there's the RAID5 write hole (can be mitigated), and RAID5 issues on btrfs (which are well known). For all we know you were using RAID5 *shrug*. Anyway, did you file a bug report, did you contact the devs?
Not true with ZFS and XFS; you are trying to defend an ill-designed filesystem... in typical Linux fashion ;)
It's S#it but at least we "invented" it.
Exactly that's NOT the case:
>>-Reboot and end up with unbootable /
You can test it for yourself; it has "worked" reliably for more than 8 years.
As much effort as you've spent complaining about this problem in this thread you could have provided a proper bug report in the proper venue. It's not going to get fixed by complaining on HN.
So I have to ask if you just want to complain about it or if you want it fixed? Both can be true and legit. But it still requires a report in the proper venue with enough detail for someone else to reproduce.
You didn't even try... a loop device, really? And hey, I use zfs; I don't trust btrfs, which has had problems (data loss) for 10 years. No thanks... hell, even SLES recommends XFS for data partitions.
>But it still requires a report in the proper venue with enough detail for someone else to reproduce.
Hey, how about you? Since you seem to care... install a VM, ~10 minutes? Fill it, ~10 minutes? There you go.