OpenZFS 2.0 (github.com)
https://arstechnica.com/gadgets/2020/01/linus-torvalds-zfs-s...
So cold data (cold write, cold/hot read) will take less and less space over time while still having the same read performance.
(It would also be a performance nightmare - you'd have a permanent indirection table you'd need to use for _everything_, and if you've ever seen how ZFS dedup performs with its indirection table not on dedicated SSDs, you can understand why this is terrible.)
* https://openzfs.github.io/openzfs-docs/Basic%20Concepts/dRAI...
Would be great for home use, where I have a lot of drives that I collected over the years that are not the same size.
EDIT: The more I read into this, the more it still seems to assume that all drives must be the same size.
That way, if one disk fails, the reserved space is used to write the data necessary to keep the array consistent. Because the free space is distributed randomly across the array, the write performance of a single drive doesn't become a bottleneck.
This is unrelated to the ability to remove drives from a pool (which is difficult to support in ZFS due to design constraints)
dRAID, Finally![0]
One thing I am wondering about is this:
> Redacted zfs send/receive - Redacted streams allow users to send subsets of their data to a target system. This allows users to save space by not replicating unimportant data within a given dataset or to selectively exclude sensitive information. #7958
Let’s say I have a dataset tank/music-video-project-2020-12 or something, and it is about 40 GB, and I want to send a snapshot of it to a remote machine over an unreliable connection. Can I use the redacted send/recv functionality to send the dataset in chunks and end up with a perfect copy of it that I can then send incremental snapshots to?
> Redacted send/receive is a three-stage process. First, a clone (or clones) is made of the snapshot to be sent to the target. In this clone (or clones), all unnecessary or unwanted data is removed or modified. This clone is then snapshotted to create the "redaction snapshot" (or snapshots).
Think of it like a selective sync in Dropbox or SyncThing at the FS level.
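For anyone wondering what the three stages look like in practice, here is a minimal sketch that only builds the command lines (it does not run them). The dataset, clone, and bookmark names are hypothetical, and the `zfs redact` / `zfs send --redact` argument order is my reading of the OpenZFS 2.0 man pages, so check `man zfs-redact` before relying on it. Note that this selects what gets sent; it is not a way to split a stream into chunks.

```python
from typing import List


def redacted_send_commands(snapshot: str, clone: str, bookmark: str) -> List[List[str]]:
    """Build (but do not run) the commands for a redacted send.

    snapshot: the snapshot to send, e.g. "tank/project@2020-12" (hypothetical)
    clone:    name for the scratch clone in which unwanted data is removed
    bookmark: short name for the redaction bookmark
    """
    redaction_snapshot = f"{clone}@redacted"
    return [
        # Stage 1a: clone the snapshot that will be sent.
        ["zfs", "clone", snapshot, clone],
        # (Here you would delete or modify the unwanted files inside the clone.)
        # Stage 1b: snapshot the clone to create the "redaction snapshot".
        ["zfs", "snapshot", redaction_snapshot],
        # Stage 2: record which blocks differ as a redaction bookmark.
        ["zfs", "redact", snapshot, bookmark, redaction_snapshot],
        # Stage 3: send the original snapshot minus the redacted blocks.
        ["zfs", "send", "--redact", bookmark, snapshot],
    ]


if __name__ == "__main__":
    for cmd in redacted_send_commands("tank/project@2020-12", "tank/project-redact", "book1"):
        print(" ".join(cmd))
```

The last command writes the stream to stdout, so in real use it would be piped to `zfs receive` on the target.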
That's not to say rsync doesn't work. It does. But it doesn't scale well, and the data integrity guarantees aren't there.
btrfs seems like the main alternative if you want native kernel support, but when I checked a couple years ago there seemed to be a lot of concerns about the stability. Is that still the case?
For me, that gives a unicorn 100% of the time (tried across several minutes), instead of showing the developer profile.
Anyone else seeing that?
Many thanks to the various OpenZFS contributors.
I've seen people use it as a rootfs on RPis, and have personally run it on Pis for brief occasions without encountering any RAM problems.
(Sorry if noise; I'm just trying to get an idea of how relevant this 2.0 release is to me.)
Previously it was called ZFS on Linux, but now ZFS development is unified on the "OpenZFS" codebase shared both between Linux and FreeBSD as much of the development effort for ZFS in general ended up there.
I realized how bad the performance was when it took about 2 hours to delete 1000 files.
Deduplication is the process for removing redundant data at the block level, reducing the total amount of data stored. If a file system has the dedup property enabled, duplicate data blocks are removed synchronously. The result is that only unique data is stored and common components are shared among files.
Deduplicating data is a very resource-intensive operation. It is generally recommended that you have at least 1.25 GiB of RAM per 1 TiB of storage when you enable deduplication. Calculating the exact requirement depends heavily on the type of data stored in the pool.
Enabling deduplication on an improperly-designed system can result in performance issues (slow IO and administrative operations). It can potentially lead to problems importing a pool due to memory exhaustion. Deduplication can consume significant processing power (CPU) and memory as well as generate additional disk IO.
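To put the quoted rule of thumb into numbers, here is a rough sketch of the arithmetic. The 1.25 GiB per TiB figure is the guideline quoted above; the pool sizes are made up for illustration, and as the quote says, the real requirement depends heavily on the data.

```python
# Rule of thumb from the quoted documentation:
# at least 1.25 GiB of RAM per 1 TiB of storage with dedup enabled.
GIB_RAM_PER_TIB = 1.25


def dedup_ram_gib(pool_tib: float) -> float:
    """Return the minimum RAM (in GiB) the rule of thumb suggests for a pool."""
    if pool_tib < 0:
        raise ValueError("pool size must be non-negative")
    return pool_tib * GIB_RAM_PER_TIB


if __name__ == "__main__":
    for size_tib in (4, 16, 40):  # hypothetical pool sizes in TiB
        print(f"{size_tib} TiB pool: at least {dedup_ram_gib(size_tib):.2f} GiB RAM")
```

This is a floor, not a sizing tool: it says nothing about the RAM the rest of the system needs.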
ZFS also has a huge legacy. Right now the license (probably) prevents you from legally shipping a compiled ZFS module with the Linux kernel, and just solving that seems insurmountable. It's also supported on Illumos and FreeBSD, so trying to refactor it to use the Linux page cache would risk introducing bugs on those platforms.
[1] https://lore.kernel.org/linux-btrfs/20200627032414.GX10769@h...
[2] https://www.man7.org/linux/man-pages/man8/mkfs.btrfs.8.html#...
[1] https://lore.kernel.org/linux-btrfs/
[2] https://lore.kernel.org/linux-btrfs/CAD7Y51i=mTDnEWEJtSnUsq=...
[3] https://lore.kernel.org/linux-btrfs/CAMXR++KUj2L7qpR7QZeiM2T...
(But as others have pointed out, there are options for using zfs on linux, too)
1. It often happens that the main repo offers a new kernel, but the corresponding module is not ready on obs yet. This means upgrading to the latest rolling release cannot just happen at any time, but requires careful planning. This is a big inconvenience.
2. In the past dracut sometimes just failed to pick up the module for the initrd, causing a boot failure at the next system start. I could not figure out why, however this never happened with the first class supported ext/xfs.
3. The distro's boot/rescue media do not contain the driver. This means a third-party boot medium is required to go into a broken system, and repairing it when chroot is involved is now much more complicated because of the different distro.
A friend did a video based on my blog: https://www.youtube.com/watch?v=PILrUcXYwmc
Or you can use the latest Ubuntu that is shipped with ZFS.
For the most part, yes. Occasionally a kernel developer who seems to be bitter about a company that doesn't exist any more tries to break compat with ZFS, but it's generally smooth sailing on Fedora, Debian, and CentOS, with dkms handling the building of modules seamlessly.
Do we have encryption yet?
Use Btrfs, trust me, it's stable now... well, the commands are terrible compared to ZFS. All my servers are FreeBSD, but on the laptop and on one workstation I've had openSUSE Tumbleweed for about 2 years and it works great.
Really? I don’t think so, I find btrfs usage extremely straightforward and easy to grok. ZFS on the other hand has all that confusing lingo about vdevs, etc...
I get that this is subjective but I disagree.
what does that mean?
I switched my freebsd box over to debian about two years ago. No complaints so far :)
[1] https://lore.kernel.org/linux-btrfs/20200627032414.GX10769@h...
[2] https://lore.kernel.org/linux-btrfs/20200627030614.GW10769@h...
[3] https://lore.kernel.org/linux-btrfs/20200520013255.GD10769@h...
Compression settings are set at a per dataset level, so applying this to only some files in a dataset isn't practical.
I think Canonical's plan is to have ZFS as an experimental option only for the desktop version of Ubuntu until the next LTS release.
fileSystems."/zfs/media" =
  { device = "tank/media";
    fsType = "zfs";
  };
in my hardware-configuration.nix. tank/media is defined as using a legacy mount-point or whatever the ZFS terminology is. Done.
ETA: I mean, I had to do all the gruntwork to get the pool built, yeah. But once it was defined, getting it mounted and all the kernel bits and bobs set was trivial like that.
boot.supportedFilesystems = [ "zfs" ];
That both installs the necessary kernel modules and adds zpool(1) / zfs(1) to $PATH.
ZFS on the other hand has just two commands for common administration tasks: zpool and zfs. zpool controls pool-level operations, mainly ones that deal with the storage layer; zfs controls the logical file systems and volumes that are contained within a pool. The zpool and zfs commands have been meticulously crafted to not expose much of the underlying software architecture and focus only on what administrators want, and all of it is clearly documented.
There are actually a few other commands that come with ZFS if you really want or need to deal with low-level and difficult details, commands like zdb, zinject, zstreamdump. You almost never need any of them.
So I guess that the GP considers /usr/sbin/{zfs,zpool} more intuitive than /usr/sbin/btrfs.
It has nothing to do with /usr/sbin/x
>what does that mean?
Not functional but logical (for me)
As an example, you're running low on space and need to find out which datasets (subvolumes) are using the most space. How do you do that? With ZFS it's a single command which runs in a few milliseconds. With Btrfs...
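For the ZFS side of that example, a minimal sketch, assuming the single command meant is something like `zfs list -H -p -o name,used` (`-H` drops headers and tab-separates fields, `-p` prints exact byte counts). The helper names and the sample dataset names are hypothetical.

```python
import subprocess
from typing import List, Tuple


def parse_zfs_list(output: str) -> List[Tuple[str, int]]:
    """Parse `zfs list -H -p -o name,used` output, largest consumers first."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, used = line.split("\t")
        rows.append((name, int(used)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def largest_datasets(limit: int = 10) -> List[Tuple[str, int]]:
    """Run zfs list on this machine (requires ZFS) and return the top consumers."""
    result = subprocess.run(
        ["zfs", "list", "-H", "-p", "-o", "name,used"],
        check=True, capture_output=True, text=True,
    )
    return parse_zfs_list(result.stdout)[:limit]


if __name__ == "__main__":
    sample = "tank\t300\ntank/media\t200\ntank/home\t250\n"  # made-up numbers
    for name, used in parse_zfs_list(sample):
        print(f"{used:>8}  {name}")
```

Keep in mind that `used` on a parent dataset includes its children, which is why the pool root sorts first in the sample.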
I may have tried it far enough back that I pretty much immediately encountered packages I wished it had and tried (and failed) to package it myself, though, and got the experience mixed up…
There is an array of flaws with the tools but, despite that, they are unbelievably powerful and you can do things in NixOS you can't dream of doing elsewhere, and it makes things like using OpenZFS or whatever pretty easy and simple. And it makes some things far more difficult than that nearly trivial. But only once you know what you're doing. But that's just the reality: it's an extremely powerful tool that has many rough edges. Saying it's "the easiest distro to use" is a complete joke, and I wish fellow NixOS users didn't have some weird propensity to practically lie about how good it is on that front. I say this as someone who has been a NixOS developer and user for like, ~7 years and who apparently(?!) has over 1,000 commits to the tree now, too. Trying to actually sit down at a terminal and sell unconvinced people on it opened my eyes quite a bit. It's good, but lying about what it is and isn't is a good way to burn people's faith.
What was worrying was that the XFS dataloss was due to an action totally out of our hands: a power outage at a substation which took out a whole area of the city. The whole datacentre lost power, and the XFS filesystems on some massive storage arrays were completely hosed. Just from power loss. It took days to put it all back from tape backups. XFS has long been known to have problems with unclean shutdowns, but total loss from a power outage is about as bad as it gets.
These were all on Gentoo, so with relatively recent tools and vanilla kernels.
The only filesystems that I never had problems with were ext4 and reiserfs.
That was exactly the FS that ate my data back in ~2005. Never had problems with XFS or ZFS. With Btrfs, well, I've only been using it regularly for 2 years so I can't say much, but I think Red Hat chose XFS for a reason.
Replacing a core system component with an out-of-repo version is always going to hurt, yes.
> I switched to btrfs; it just working is worth the few extra warts over ZFS.
I'm not sure I'd call "catastrophic failure and data loss" a "wart". In all my years of distro hopping, I've had 3 root filesystems become unbootable: 1 F2FS system early on, which I actually did manage to fsck out of, and 2 on an openSUSE Tumbleweed system using Btrfs as root.
How long ago was that? and have you been using other fully checksummed filesystems (like ZFS) on that hardware since then? I'm asking because if you're using btrfs without any raid features (or with simple RAID modes like 1/0) for the past several years and it breaks, if you dig deep enough into the problem, often the hardware is found to be at fault.
And ext4 or xfs either don't find corruption at all (if it's data corruption), or have better error recovery if the FS's own metadata got trashed (which is a strong argument in favor of them, I agree, but I wouldn't trust such a filesystem anyway and would restore from backups right away).
Edit: it's a strong argument for storing data on them which is checksummed by some higher component in your software stack, like the database. Otherwise, you're just asking for silent data bitrot.
That's not really good enough though. Next gen file systems are supposed to be resilient even if hardware fails. That's the whole point of raiding and checksumming. ZFS was very much intended to be resilient when faced with bad hardware. Heck, even in the 90s this was a known problem hence chkdsk on DOS marking bad sectors to somewhat mitigate data corruption on FAT file systems. If Btrfs only works when hardware is behaving then that is absolutely a problem with Btrfs.
As for my experience with ZFS, it's kept consistency when disks have died. It's worked flawlessly when SATA controllers have died (one motherboard would randomly drop HDDs when the controllers experienced high IOPS -- which would be enough to trash any normal file system but ZFS survived it with literally no data loss). Not to mention frequent unscheduled power cuts, kernel panics (unrelated to ZFS), and so on and so forth. I'm sure it's possible to trash a ZFS volume but it's stood strong on some pretty dubious hardware configurations for me and where most other file systems would have failed.
Also, I'm going to somewhat mirror sibling comments: Even if the hardware is faulty, that should produce a filesystem with explicit checksum errors, not an unreadable filesystem. There is certainly an upper limit to what it could catch, but you'll have to forgive my skepticism that only one of the 2 filesystems on the system was affected and only after months of use, and then the corruption was so complete that it couldn't even tell me what was wrong and try to fix it.
Well, with ZFS I've had hardware break and still not experienced any data loss. I've had cables come loose multiple times, I've had several disks die[1], I've had unstable SATA controllers (hello JMicron) and plenty of unexpected power losses and hard resets.
Yet ZFS has sailed through it all with my data intact. Sure ZFS ain't bulletproof. It can get messed up. But for the most part it takes a lot of beating without a dent.
[1]: As a matter of fact, I just finished resilvering a RAID-Z1 pool in my NAS after a WD Red 3TB died after almost 7 years of 24/7 operation (barring a few accidental power outages).
https://gist.github.com/xenophonf/76fd44ae24772e457cb63d00c0...
`apt-get update && apt-get dist-upgrade -y` works as expected. I plan to switch to a similar config on my Lenovo laptop when I upgrade it to the next Ubuntu LTS release.
As someone using new kernel versions as they are released, I'm not willing to use a filesystem that may break with a kernel update. It also seems OpenZFS only supports up to kernel 5.6, according to the GitHub release. I'm on 5.9, so it's not even an option.
https://wiki.archlinux.org/index.php/ZFS
I would need a package that depends on zfs and provides linux-kernel at an appropriate version. Can't have something so critical break because of an upgrade, and I don't want to pin it and forget to upgrade it (also fairly anti-arch).
There have been a couple of cases where I had to wait a week or two for compatibility fixes to get merged into zfs git, but otherwise staying up to date has not been a problem.
NixOS is easy to use as long as what you're trying to do is contained within the configuration; then setup is pretty much just editing that configuration file (which is essentially just a series of dictionaries and lists). I wouldn't, for example, have a problem with installing NixOS for my grandparents.
Here's example config when that's true: https://github.com/areina/nixos-config/blob/master/thinkpad-...
If you want to do something that's not covered then you'll have to learn Nix, and that part is indeed hard, because it's like configuring linux through use of saltstack/chef/puppet/ansible through a functional language (which many people don't have experience with), but as you said it pays off.
I think the hardest part is the paradigm shift where everything you do is no longer imperative but declarative. It also doesn't help that documentation is always behind what nixpkgs can do and Nix functionality would cover multiple books.
That said, I generally agree with you in that do one thing and do it well is a laudable design goal. However, I also am very excited about encrypted ZFS for one main reason: backups.
Okay two. Snapshots and backups!
ZFS is absolutely amazing to use as a home NAS that does daily (or more) snapshots and then nightly differential syncs to a second location. In the past I had to run all my own infrastructure to do this, as the data was in the clear.
Now my ZFS nerd friend and I can simply swap backup space and have "zero knowledge" of the others' files, while retaining the amazing features of ZFS snapshots+zfs send/receive.
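A minimal sketch of what such a swap could look like, assuming the source dataset uses native encryption and is sent with `zfs send --raw` (blocks go over the wire still encrypted, so the receiving side never needs the key). The snapshot names, SSH target, and remote dataset below are hypothetical; the function only builds the pipeline string and does not run anything.

```python
import shlex


def raw_incremental_backup(dataset: str, prev_snap: str, new_snap: str,
                           ssh_target: str, remote_dataset: str) -> str:
    """Build a shell pipeline for an encrypted (raw) incremental send over SSH."""
    send = ["zfs", "send", "--raw", "-i",
            f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # -u: do not mount the received dataset on the backup host.
    recv = ["zfs", "receive", "-u", remote_dataset]
    # The remote command is quoted once more so ssh passes it through intact.
    remote_cmd = " ".join(shlex.quote(arg) for arg in recv)
    return (" ".join(shlex.quote(arg) for arg in send)
            + " | ssh " + shlex.quote(ssh_target) + " " + shlex.quote(remote_cmd))


if __name__ == "__main__":
    print(raw_incremental_backup("tank/home", "monday", "tuesday",
                                 "friend@nas.example.org", "backup/friend/home"))
```

The first transfer would be a full `zfs send --raw` of the initial snapshot; the incremental form above only works once both sides share `prev_snap`.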
This also tickles the "create an encrypted ZFS backups as a service" service itch for me, but then I realize I'd be creating it for all 13 potential users of the service. That said, I'm sure rsync.net will offer this functionality shortly - which would make them a viable backup target for me.
It's just that the majority of users never have reason to see more than tiny signs of the layers hidden behind (mostly) 2 command line tools, and for various reasons those layers are compiled into one module.
But the clean layered design is how LustreZFS happened :)
I also recall someone working years ago on a way to push snapshots to S3 or similar, but I never heard if that idea got off the ground (downside is of course the snapshots need to be recovered before they can be mounted, but the dollar cost would be rock bottom).
What would be more interesting is a backup application for Desktop Linux that assumes a ZFS root; all the problems that plague Desktop applications (that seem to keep them in eternal beta or wither away) disappear. It needs to switch on and push snapshots. It needs to allow the remote file system to be mounted (to browse the snapshots for selective recovery). It needs a disaster recovery process to recover an entire system from a remote snapshot.
You can pipe zfs send to gof3r.
This is why I really wish btrfs would get native encryption, but maybe my info is out of date.
In my case, because it's what Ubuntu "supports" for bootable root crypto ZFS, and I wanted to try it.
I've run ZFS on top of LUKS for my backup storage servers for probably over a decade now, and it works fine. But it wasn't really an option for my workstation.
That said, I'm not really sure what benefit I'm gaining from ZFS on my desktop. I've got the snapshots, which are definitely nice. I've used them a couple of times to go back in time. In theory I can go back if I wedge the system through package installs or an OS upgrade, but I've not done that (yet). It does slow down package installs because of taking snapshots, but that's ok generally.
And in my experience LUKS works great.
At some level, they must understand that both XFS and LVM are over 25 years old, and when compared with e.g. ZFS, are completely outclassed. Their current efforts developing Stratis, which is an attempt to provide more ZFS-like functionality by extending XFS, adding LVM thin pools, and managing it all with an unholy complex combination of daemons, D-BUS and Python looks like a logical progression based upon what they have to hand in house, but a strategic mistake when it can never approach ZFS in functionality or reliability simply because these technologies can only be extended so far because of fundamental design limitations. I'll be morbidly interested to see what they can stretch XFS to do. But I won't be using it myself.
What I find really surprising here is that Linux in general, and RedHat in particular, don't have a competitive filesystem to offer. There is absolutely nothing which matches ZFS.
Not sure if you would risk your customers data just because of that. I never had any problems with XFS.
>At some level, they must understand that both XFS and LVM are over 25 years old
As a user of ZFS (on FreeBSD) myself: ZFS is not much younger (2006).
>and RedHat in particular, don't have a competitive filesystem to offer.
That I really don't understand either. Maybe they think for "small" stuff HW-RAID or LVM is good enough and everything bigger is Ceph or Gluster anyway.
However, XFS isn't perfect. As I wrote in a separate reply in this thread, my team in a previous position suffered catastrophic dataloss when a power cut took out some massive storage arrays. XFS does not handle power loss gracefully, and in two cases, the whole storage array was unrecoverable and required restoring from tape.
I use ZFS on FreeBSD (and Linux) too, and while it dates back to 2006 and was designed around ~2000, LVM and XFS date back at least a decade prior to that. They are a generation apart, and ZFS builds upon the knowledge of that previous generation, and its successes and its flaws.
Regarding competitive stuff, that's a mystery to me as well. My organisation went with some proprietary IBM storage array kit, but it was a real pain. Required hand compiling kernel modules against the RHEL kernel. And it still resulted in the above dataloss issues.
It's a very cleanly layered system, it just doesn't bother end user with details (as implementor, you can play with them, thus LustreZFS): there's separate SPA (block), DMU (OSD) and ZPL (FS) & ZVOL (emulated block device) layers.
Compression and encryption are integrated at DMU level because that's a logical place for them.
NFS actually calls OS nfs server.
Interesting; it'd been claimed to me before that ZFS had its own NFS server (or I guess the OpenSolaris NFS server) included but that nobody used it because it was old/buggy. A quick glance at https://github.com/openzfs/openzfs/ (the old archived version based on illumos, if I read correctly) implies that this might have been true at one point, but indeed https://github.com/openzfs/zfs doesn't seem to do its own NFS so it's not true now if it ever was. Thanks for correcting my understanding.
I believe that the modularity will only prove itself when external (as in, from unrelated people) projects become established and we see how well the original project maintains compatibility.
(I must say I wonder how big the intersection of people-who-like-ZFS and people-who-like-systemd is; they seemed to originate from very different cliques but there's no reason people who like one would dislike the other…)
As for external clients for the layers, LustreZFS is a separate project, though it started with a certain overlap with the ZFS devs. However, the general division of labour between layers is pretty strict (except for the now extinct FreeBSD TRIM support); it's just that there isn't any work being done to use it outside of OpenZFS.
The boundaries are pretty clear, it's just that ZPL and ZVOL build on all of them. Linux has /some/ related features, but nothing at feature parity: SPA roughly corresponds to the MD/DM subsystem assuming certain plugins are in use, and DMU is very roughly the equivalent of the OSD subsystem, but that one supports only SCSI OSD, which has incompatible assumptions etc. In fact, an OSD implementation on top of DMU should be pretty simple (the main differences are due to DMU being a bit explicit about redundancy features, iirc).