YANT (Yet Another NAS Thread): ZFS Config Edition

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
So, as noted here, I have come into possession of a Supermicro 4U 24-bay unit. The final upgrade build was:
  • Supermicro X10SRH-CF
  • Xeon E5-2660 v4
  • 4x64GB LRDIMMs
  • Backplane upgraded to a BPN-SAS3-846EL1
  • 15x HGST 10TB Data Center drives (14 for RAID10 or the ZFS equivalent and 1 spare)
I have completed building it and making sure everything works, and am now in the "what the hell do I want to go with" phase. Currently testing out TrueNAS Scale. Herein lies the issue.

For reference, this unit is replacing an xpenology build on old desktop hardware. The use case is mainly an SMB file dump sorted into folders with user permissions at the folder level. In addition, I plan to use VMs from time to time, the primary VM being a light Ubuntu desktop build for handling torrents and such (one of the few machines in the house capable of using the 2Gb Internet connection). I do plan to use VMs a bit more eventually, but nothing really mission critical. A bit of homelab/learning, and fun crap like Windows 3.x/9x VMs and the like.

I've read the Ars ZFS 101, and I have done a fair bit of Googling, but I cannot seem to get anything remotely definitive on the need (or not) for L2ARC, ZIL/SLOG, or a special/metadata vdev. The closest thing to definitive I can find on SLOG is that it's really only useful if you're working with lots of synchronous writes, which I don't believe I will be. I have gobs of RAM, so it seems a dedicated L2ARC cache vdev isn't needed and won't provide any real performance gain, but I'm not 100% on that. That leaves special/metadata. The only thing that seems definitive there is that if it goes, the whole pool goes. But there's apparently some sort of "double write" penalty if you use the pool itself for metadata, which is why a mirrored special vdev is supposedly a safe performance choice. I am hoping folks here can clarify and explain the necessity (or lack thereof) of these vdevs for me so I can stop banging my head against Google.
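For concreteness, if I did decide to bolt any of these on after the fact, my understanding is the commands would look roughly like this (the pool name and device paths are just placeholders):

  # SLOG: only matters for sync writes; mirrored so a dead SSD can't eat in-flight sync data
  zpool add tank log mirror /dev/disk/by-id/ssd-a /dev/disk/by-id/ssd-b

  # L2ARC: read cache only, losing it is harmless, so a single device is fine
  zpool add tank cache /dev/disk/by-id/ssd-c

  # special/metadata: MUST be redundant, because if this vdev dies the whole pool goes with it
  zpool add tank special mirror /dev/disk/by-id/ssd-d /dev/disk/by-id/ssd-e

From what I've read, cache and log devices can be pulled back out later with zpool remove if they turn out to be pointless, while a special vdev generally can't be removed once added (at least not if raidz is involved), which is part of why I'm hesitant on that one.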
 

Paladin

Ars Legatus Legionis
32,552
Subscriptor
Personally, I have run a couple of TrueNAS setups with a single SSD each for an L2ARC and a ZIL. Works well, and I can tell the difference in some use cases with them there versus not there. The nice part is they don't have to be big. You can get a couple of old datacenter Intel SSDs cheap on eBay or something and give them a shot to see if they help or not. If you don't make good use of them in general IO work, you can repurpose them as a mirrored pair, either as boot drives for the whole machine or as a high-performance volume for the virtual machine boot drives, where you want the most performance for the random IO of OS and application updates and the like.
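If you want numbers before spending anything, a rough check of how well the existing RAM-only ARC is doing looks something like this on Scale (this is the Linux OpenZFS stats path; on Core the same counters live under sysctl):

  # Overall ARC hit rate since boot - if this is already in the high 90s,
  # an L2ARC probably won't buy you much
  awk '/^hits / {h=$3} /^misses / {m=$3} END {printf "ARC hit rate: %.1f%%\n", h*100/(h+m)}' \
      /proc/spl/kstat/zfs/arcstats

arc_summary gives a much more detailed breakdown of the same counters if it's installed on your build.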
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
Will you add SSDs?
I have consumer grade 250G and 500G SATA SSDs available if I wanted them, and the boot drives are 500G SATA SSDs in a mirror, but I had no intention of spending the $$$ to build an SSD pool for general use.
For a home NAS, it's not really necessary. Nice to have, nice to play with and get familiar with, but you probably won't notice any performance difference using a cache vdev or the like, even with light virtual machine usage running on it. You'll notice more of a performance difference going from HDD to SSD than you will from adding cache drives.
This is what it seems like is being said overall with the exception of the special/metadata vdev. Sadly I lack the $$$ necessary to build a 70TB mirror using SSD. Perhaps if I win the big lottery some day. :)
Personally, I have run a couple of TrueNAS setups with a single SSD each for an L2ARC and a ZIL. Works well, and I can tell the difference in some use cases with them there versus not there. The nice part is they don't have to be big. You can get a couple of old datacenter Intel SSDs cheap on eBay or something and give them a shot to see if they help or not. If you don't make good use of them in general IO work, you can repurpose them as a mirrored pair, either as boot drives for the whole machine or as a high-performance volume for the virtual machine boot drives, where you want the most performance for the random IO of OS and application updates and the like.
I already have the system booting from 500G SATA SSDs in mirror, so at this point it's solely about whether I'd see any meaningful benefit for the pool of spinning rust from using any of the extraneous vdevs or not. That said, the idea of having a pool dedicated to the VMs themselves is an interesting proposition, and may be something I look at later if/when I get serious about homelab stuff.
 

UserIDAlreadyInUse

Ars Praefectus
3,602
Subscriptor
500GB boot volume is a bit large for a NAS... would you consider using the rust disks for at-rest data and carving off a 400GB-ish volume on the SSD mirror for the virtual machines? You'll notice a significant performance boost running the VMs on SSD vs HDD. The boot volume should be fairly quiet (other than logging) once the OS is up and running, so sharing it with a VM volume wouldn't impact performance of the VMs any, and you probably won't notice too much of a performance hit on the at-rest data using dedicated volumes on the rust drives if you're mostly using it for backups and streaming.
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
500GB boot volume is a bit large for a NAS... would you consider using the rust disks for at-rest data and carving off a 400GB-ish volume on the SSD mirror for the virtual machines? You'll notice a significant performance boost running the VMs on SSD vs HDD. The boot volume should be fairly quiet (other than logging) once the OS is up and running, and you probably won't notice too much of a performance hit on the at-rest data on the rust drives if you're mostly using it for backups and streaming.
I agree, but I went into it a bit blind, assuming I'd be able to use that space in a fashion that doesn't appear to actually be allowed. I'd need to look into the install process; I've only done it twice now and don't recall seeing anything during the process where I could set the partition size. A little Googling suggests this is an intentional design choice, and to say that trying to use the boot_pool disks for other stuff as well is frowned upon would be a severe understatement. That said, there are guides on how to do it; this one seems workable and I can give it a shot. Being generous and making the boot_pool 64GB, I should be able to leave ~400GB for other use.
 

steelghost

Ars Praefectus
4,975
Subscriptor++
What's your PCIe slot situation on that board? Seems like putting a pair of NVME drives in there (via adaptor cards) as a mirrored pool for your VMs would be doable.

Has the benefit of not needing to stray into unsupported install configurations, and isn't super expensive, but neither is the cost zero. You could even do a single NVME SSD and then just do regular snapshots onto the main array...
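The single-NVME-plus-snapshots idea is really just scheduled ZFS replication, something like the below (pool and dataset names are only examples, and TrueNAS has built-in snapshot/replication tasks that do the same thing from the UI):

  # One-time seed of the VM dataset onto the big array
  zfs snapshot nvme/vms@base
  zfs send nvme/vms@base | zfs receive tank/vms-backup

  # Then on a schedule: snapshot and send only the changes since last time
  zfs snapshot nvme/vms@daily-1
  zfs send -i @base nvme/vms@daily-1 | zfs receive -F tank/vms-backup

Lose the NVME drive and the worst case is rolling the VMs back to whatever the last snapshot on the array was.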
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
What's your PCIe slot situation on that board? Seems like putting a pair of NVME drives in there (via adaptor cards) as a mirrored pool for your VMs would be doable.

Has the benefit of not needing to stray into unsupported install configurations, and isn't super expensive, but neither is the cost zero. You could even do a single NVME SSD and then just do regular snapshots onto the main array...
Currently not using any of the PCIe slots, but I will be using one for a 10Gb NIC, haven't decided if I am going to bother with a GPU or not. I had started looking at cards for m.2 drives and saw they generally aren't cheap, unless I get one that utilizes bifurcation (which this mobo does, but from what I've read it seems their explanation in the manual of how that happens is inaccurate).
 

koala

Ars Tribunus Angusticlavius
7,579
TrueNAS needing a dedicated system drive... well, I understand their reasons, but for me (I am not their target audience) it has always been a big obstacle.

I dunno, 15x 10TB drives + 256GB of RAM sounds like the scale where adding SSDs tends to make sense. But really, I don't know what you will be using that beast for. I've been running a Proliant Mini with 4GB of RAM and 2x HDDs and it's plenty good for ZFS filesharing; it lacks a bit of oomph, so I run VMs elsewhere, but I don't know what your target is. It sounds like you don't really need to add more oomph through SSDs, but I'm having a hard time visualizing your scenario.
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
TrueNAS needing a dedicated system drive... well, I understand their reasons, but for me (I am not their target audience) it has always been a big obstacle.

I dunno, 15x 10TB drives + 256GB of RAM sounds like the scale where adding SSDs tends to make sense. But really, I don't know what you will be using that beast for. I've been running a Proliant Mini with 4GB of RAM and 2x HDDs and it's plenty good for ZFS filesharing; it lacks a bit of oomph, so I run VMs elsewhere, but I don't know what your target is. It sounds like you don't really need to add more oomph through SSDs, but I'm having a hard time visualizing your scenario.
100% agreed on the obstacle, and it seems to bug a lot of people, but they don't seem interested in budging on it.

I stated the majority of my use case in the first post, but I did forget to mention that I will eventually also add A/V streaming to it, mainly for when I'm not at home. I'm so used to thinking within the constraints I normally have that I'm also only now realizing I'll be able to stop using separate PCs for things like dedicated ARK and Valheim servers. I'll be able to set those up on this system as well. My current NAS cannot do that.

The system is a bit overbuilt, but it was amazingly cheap, all things considered. That mobo bundled with a lesser CPU was going for ~$250 everywhere I looked, then I found the board alone for $145 and the CPU for $10; similar E5 v4 CPUs with lower core counts were also $10, so it seemed a no-brainer to get the 2660 if I was gonna pay that much regardless. RAM was a similar deal. 64GB LRDIMMs were cheaper than 2x32 or 4x16 of the same type (I was originally looking at just meeting the stated 64GB minimum for ZFS, possibly going for 128), so I figured I'd make the most of it and ensure I had plenty of memory for things now instead of being held up by need later. The HDDs were a great deal from goHardDrive: $70 per drive, refurbed/certified/renewed with a 5-year warranty. I spent and got more than I originally intended, but I think I got a pretty good amount of hardware for the cost, and it ensures I have room for growth. Only whole-assing here.
 

Paladin

Ars Legatus Legionis
32,552
Subscriptor
They used to let you run from a USB drive; I still have a couple systems like that. You can/could even do mirrored boot drives with USB flash drives or SD cards. They basically got tired of the support work of keeping the system slimmed down and optimized for a tiny install footprint, low-IO logging, etc., and of the poor reliability of the trash flash drives people would use. Running your NAS from a 4 or 8GB USB thumbdrive you got in a box of Cap'n Crunch 7 years ago usually ends up being problematic. ;)

So they moved to requiring a real drive and I guess they never bothered to work out doing a full ZFS boot setup... but I think they might have in the newer TrueNAS Scale based on Linux. Maybe?

Ah looks like a 'probably'. I haven't done it myself yet but I imagine it should be fine.


I assume that means you can just tell the TrueNAS Scale installer to eat all your disks at once and boot off of the large pool and still have the rest for storage.
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
I assume that means you can just tell the TrueNAS Scale installer to eat all your disks at once and boot off of the large pool and still have the rest for storage.
I could have it use all of the drives, yes, but then none of that storage would be available for anything else. The boot_pool always uses all of the drives it is given, and none of that is available for anything else. That's the obstacle koala was talking about. I went into this not knowing that which is why I have 2x500G SSDs. I made a poor assumption that I'd be able to use the remaining disk space.
 

koala

Ars Tribunus Angusticlavius
7,579
Yes. I think there's no way to distinguish a real drive from a USB flash drive. In fact, I think you can still install even to twin USB flash drives.

I understand why they don't want to do it: their target market works that way (get two nice SSDs as OS disks; it's the proper way). But it's a bit of a pain at the low end.

(For instance, I think running TrueNAS Scale "in the cloud", or rather on a cheap dedicated server at Hetzner with HDDs, would be a supernice solution for a lot of small orgs. But requiring one or two dedicated drives for the OS... reduces the value proposition significantly, even if it's the "right" thing to do.)

Proxmox can do "everything in ZFS, on two mirrored drives". So while it's not a NAS product, I'm using it as a NAS. Kinda like how TrueNAS Scale is a NAS that can also do other things, just in the other direction.
 

koala

Ars Tribunus Angusticlavius
7,579
I stated the majority of my use case in the first post, but I did forget to mention that I will eventually also add A/V streaming to it, mainly for when I'm not at home. I'm so used to thinking within the constraints I normally have that I'm also only now realizing I'll be able to stop using separate PCs for things like dedicated ARK and Valheim servers. I'll be able to set those up on this system as well. My current NAS cannot do that.

The system is a bit overbuilt, but it was amazingly cheap, all things considered. That mobo bundled with a lesser CPU was going for ~$250 everywhere I looked, then I found the board alone for $145 and the CPU for $10; similar E5 v4 CPUs with lower core counts were also $10, so it seemed a no-brainer to get the 2660 if I was gonna pay that much regardless. RAM was a similar deal. 64GB LRDIMMs were cheaper than 2x32 or 4x16 of the same type (I was originally looking at just meeting the stated 64GB minimum for ZFS, possibly going for 128), so I figured I'd make the most of it and ensure I had plenty of memory for things now instead of being held up by need later. The HDDs were a great deal from goHardDrive: $70 per drive, refurbed/certified/renewed with a 5-year warranty. I spent and got more than I originally intended, but I think I got a pretty good amount of hardware for the cost, and it ensures I have room for growth. Only whole-assing here.
Yeah, I'm jealous.

I think HDDs are fine for file sharing and streaming (unless you want many simultaneous streams). I would consider using SSDs for VMs; I was running Proxmox on HDDs, and swapping to SSDs really did seem to make a big difference for me.

I'm using 4GB of RAM in my Proliant Minis for ZFS/Proxmox, and while perhaps that's exaggeratedly low, I suspect you could do with much less than 256GB of RAM for file sharing.

In fact, if you can do it, I'd seriously consider one Proxmox system for virtualization- perhaps with SSD storage and lots of RAM, and another for TrueNAS Scale, with all the HDDs and less RAM.

I played around with TrueNAS applications, and I was quite impressed, though. Installing Nextcloud is really simple and nice. It's all Helm charts under the hood, so it's a nice thing to build on top of. If I didn't have my existing Proxmox/LXC/Ansible/Puppet setup with tons of code done, I would consider doing everything on top of TrueNAS instead of focusing on Proxmox.

(The only thing I really prefer about Proxmox is LXC. But a modern, nice, K8S-based setup has its appeal too...)
 

Paladin

Ars Legatus Legionis
32,552
Subscriptor
I could have it use all of the drives, yes, but then none of that storage would be available for anything else. The boot_pool always uses all of the drives it is given, and none of that is available for anything else. That's the obstacle koala was talking about. I went into this not knowing that which is why I have 2x500G SSDs. I made a poor assumption that I'd be able to use the remaining disk space.
Ugh, I can't believe they still have not implemented that, though I am sure they have a philosophical reason for it like they worry an update or something might break production data storage or whatever.

It is technically possible; there's no reason it can't work. They just haven't bothered to put support for it in the install script, etc. They fiddle with the install scripts between releases, but the way to do it has remained mostly the same. You can edit the script to tell it the maximum size of partition to make and then simply go back after install and partition the rest of your disk and join it to whatever pool you want to create.

https://gist.github.com/gangefors/2029e26501601a99c501599f5b100aa6


Not a supported option but for people who have limited hardware arrangements it can be useful.
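Roughly, the after-install half of that boils down to something like this (device names and partition numbers are just examples; check lsblk on your actual system first):

  # Claim the unused space on each boot SSD as a new ZFS-type partition
  sgdisk -n 0:0:0 -t 0:BF01 /dev/sda
  sgdisk -n 0:0:0 -t 0:BF01 /dev/sdb
  partprobe

  # Then build a mirrored pool out of the two new partitions
  zpool create ssd-extra mirror /dev/sda4 /dev/sdb4

Again, not supported, and a future update or reinstall that touches the boot drive partitioning could ruin your day, so keep anything on that pool backed up.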

EDIT: Ars unfurls the whole post but not the comments. The comments have a slightly more direct method that is reliable on new releases from what I read.
 

koala

Ars Tribunus Angusticlavius
7,579
Yeah, I suppose you can make it work, and it's probably a good idea in some scenarios, but I think on every update you're rolling the dice.

Really it's a bit unfortunate that there are so few "supported" root-and-everything-on-ZFS options. I think it's:
  • Proxmox
  • NixOS
  • FreeBSD
  • Ubuntu 22.04 desktop
  • Perhaps Ubuntu 24.04 server?
  • ...
I understand why EL will not touch ZFS (and from time to time I should check out Stratis; perhaps it's good?), and Debian won't either. The other option is to ignore that the "wrong" BTRFS redundancy options exist and use the "right" ones...

At the moment I run two "NAS" boxes using Proxmox on ZFS, plus my production LXC/VM host. I feel it's ugly, but I think it's the way to go.
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
You can edit the script to tell it the maximum size of partition to make and then simply go back after install and partition the rest of your disk and join it to whatever pool you want to create.
Yeah, I linked to something similar, though a few years older, that does this. If I go that route, the newer one you've linked is what I'll try.

Yeah, I suppose you can make it work, and it's probably a good idea in some scenarios, but I think on every update you're rolling the dice.

Really it's a bit unfortunate that there are so few "supported" root-and-everything-on-ZFS options. I think it's:
  • Proxmox
  • NixOS
  • FreeBSD
  • Ubuntu 22.04 desktop
  • Perhaps Ubuntu 24.04 server?
  • ...
I understand why EL will not touch ZFS (and from time to time I should check out Stratis; perhaps it's good?), and Debian won't either. The other option is to ignore that the "wrong" BTRFS redundancy options exist and use the "right" ones...

At the moment I run two "NAS" boxes using Proxmox on ZFS, plus my production LXC/VM host. I feel it's ugly, but I think it's the way to go.

I will say I am not married to the use of ZFS. It just seems to be the current sweetheart file system (and many of the NAS OSes/software are BSD-based and using ZFS). I'm using Btrfs on my Synology rig. I did find myself researching why the "self healing" seems to be failing (a couple of unimportant files have failed the data scrubs over time), and I was reading things like Synology's implementation not being right, among other stuff. It's running on RAID5, which apparently has a write hole issue or something, but I have no idea if that's part of it or not. With this new build that's a non-issue anyway since I am going RAID10.

I am considering just using Ubuntu Server + Btrfs and then setting up the shares, etc. my damn self vs. needing something like TrueNAS. It would require me to step up even more and learn Linux stuff, but that's a good thing. Proxmox is on my list, as is UNRAID, Casa OS, OpenMediaVault, possibly XigmaNAS, and maybe others I have not yet considered.

Speaking of Btrfs, what do you mean by "The other option is to ignore that the "wrong" BTRFS redundancy options exist and use the "right" ones..."?
 

koala

Ars Tribunus Angusticlavius
7,579
Apparently, some redundancy configurations in BTRFS are large footguns, while others are perfectly safe.

I think RAID10 is "safe".

I think that the fact that some configurations are dangerous would bother me even if I was running a safe configuration.

...

iSCSI is a bit scary to me, and having a nice web UI is nice, so I think there are plenty of reasons to like TrueNAS. But setting up Samba is simpler than the other stuff. You can play with it in a VM.
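Done by hand it's not much more than this (the share name, path, group, and user are just examples):

  # Minimal hand-rolled SMB share on an Ubuntu-ish server
  apt install samba
  # then append a share definition to /etc/samba/smb.conf, e.g.:
  #
  # [dump]
  #     path = /tank/dump
  #     valid users = @family
  #     read only = no
  #
  smbpasswd -a alice        # Samba keeps its own password database, separate from /etc/passwd
  systemctl restart smbd

That plus ordinary Unix permissions on the folders covers the "SMB dump sorted into folders" use case.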

If you really don't need ZFS, Proxmox loses most of its charm as a NAS (but it wins if you want to run VMs).
 

w00key

Ars Praefectus
5,907
Subscriptor
I feel like Synology's approach of mdadm for RAID, then throwing BTRFS over it, is a bit wrong: scrub errors can't be fixed, as the fs doesn't even see the two raw devices.

And the mdadm layer just sees two different copies of the data and has no idea which one is right.
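Roughly the difference between their layering and doing it natively, as I understand it (device names are just placeholders):

  # Synology-style layering: btrfs sits on a single md device with the 'single' data profile,
  # so a scrub can detect a bad checksum but has no second copy to repair from
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  mkfs.btrfs /dev/md0

  # Native btrfs raid1: btrfs owns both devices and rewrites a bad copy from the good one
  mkfs.btrfs -d raid1 -m raid1 /dev/sda /dev/sdb
  btrfs scrub start /mnt/pool    # once mounted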


TrueNAS Scale seems to be the future for a Linux-based NAS, but it's still a bit "beta"? I didn't enjoy the FreeBSD shell on FreeNAS; everything is slightly off, and with jails, if things broke I had no idea how to fix them other than trashing the plugin and configuring it again. Maybe it's better now, but it was wonky. I can wrangle a Linux-based userland and Docker a lot better.
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
I feel like Synology's approach of mdadm for RAID, then throwing BTRFS over it, is a bit wrong: scrub errors can't be fixed, as the fs doesn't even see the two raw devices.

And the mdadm layer just sees two different copies of the data and has no idea which one is right.
I believe this was what I'd read regarding Btrfs on DSM after the fact, which was a real head-scratcher since that's basically removing the point of having/using Btrfs in the first place, and is more or less exactly the trouble I've had: I get errors regarding CRC mismatch and it wants a backup source to restore from.

TrueNAS Scale seems to be the future for a Linux-based NAS, but it's still a bit "beta"? I didn't enjoy the FreeBSD shell on FreeNAS; everything is slightly off, and with jails, if things broke I had no idea how to fix them other than trashing the plugin and configuring it again. Maybe it's better now, but it was wonky. I can wrangle a Linux-based userland and Docker a lot better.
It's definitely interesting. So far it's not too bad. I flubbed the first attempt at creating a VM because I thought I was telling it to allocate 8192MB of RAM, but it interpreted it as 8MiB and I didn't see that, so Ubuntu wouldn't install because "8MB? What is this, a 486?" Then I was having trouble uploading the install ISO. Once I got that right, it was figuring out why the Ubuntu desktop was weird when I remoted in with xrdp, then I was struggling with getting the SMB share connected. I finally got it working in Windows (I stopped trying to use the built-in admin user and created a local user that matched my username and password in Windows, and it connected right up). In the Ubuntu VM I think I needed to wrangle with setting up a network bridge. At this point I just need to play with the partitioning hack and see what I think, then I think I'ma move on to testing something else. I may come back and revisit setting up SLOG and/or special/metadata, we'll see.
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
Apparently, some redundancy configurations in BTRFS are large footguns, while others are perfectly safe.

I think RAID10 is "safe".

I think that the fact that some configurations are dangerous would bother me even if I was running a safe configuration.
I'll have to look into that. I'd only seen mention of some "write hole" issue with RAID5 and 6 but didn't dig to find out more.
iSCSI is a bit scary to me, and having a nice web UI is nice, so I think there's plenty of reasons to like TrueNAS. But setting up Samba is simpler than other stuff. You can do play in a VM.

If you really don't need ZFS, Proxmox loses most of its charm as a NAS (but it wins if you want to run VMs).
iSCSI is actually very interesting to me and has been since I learned about it. I don't know that I actually have a serious use case for it, but being able to boot a machine from a network location just seems really cool. The web UI stuff is alright, I'm used to it from using DSM. Need is definitely a strong word wrt ZFS. I just want a little extra security for my data on top of having gone with registered ECC. ZFS gets a little wild with how stuff works.
 

koala

Ars Tribunus Angusticlavius
7,579
https://btrfs.readthedocs.io/en/latest/Status.html ; RAID56 is marked as "unstable".

I think bitrot protection is important for a NAS, and there are not that many options out there! If I'm building a NAS, it's disturbing to me not to add some. (Although, for instance, for the moment I do not do ECC.)

edit:
The RAID56 feature provides striping and parity over several devices, same as the traditional RAID5/6. There are some implementation and design deficiencies that make it unreliable for some corner cases and the feature should not be used in production, only for evaluation or testing. The power failure safety for metadata with RAID56 is not 100%.
 

malor

Ars Legatus Legionis
16,093
Ugh, I can't believe they still have not implemented that, though I am sure they have a philosophical reason for it like they worry an update or something might break production data storage or whatever.
ZFS really likes to use whole disks. It wants to be in control from the metal all the way up.

If you want to put VMs in your boot pool, you can create a child dataset, if ZFS storage is acceptable. (If you want the storage to be, say, ext4, that's much harder.) You can put a quota on the dataset so that the root volume never fills. And since the boot volume is generally the root filesystem, you can mount the dataset at any path you like.
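A minimal sketch, assuming the boot pool is named boot-pool the way TrueNAS Scale names it (the thread's boot_pool), with the size and path as examples:

  # Carve a capped dataset out of the boot pool for VM storage
  zfs create -o quota=400G -o mountpoint=/mnt/vmstore boot-pool/vms

Bear in mind the TrueNAS middleware doesn't expect extra datasets on its boot pool, so this lands in the same unsupported territory as the partitioning trick.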
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
https://btrfs.readthedocs.io/en/latest/Status.html ; RAID56 is marked as "unstable".

I think bitrot protection is important for a NAS, and there are not that many options out there! If I'm building a NAS, it's disturbing to me not to add some. (Although, for instance, for the moment I do not do ECC.)

edit:

Hrm, reading the part about RAID1/10 not being optimal is a little off-putting, but not game-breaking. I know ZFS seems to have stuff in place for more even spreading of writes, but in the end it too is not really RAID10. The RAID5/6 stuff is definitely problematic. It'd be nice if these file systems could work without needing direct hardware access so real RAID could still be used. That, or properly implement the RAID stuff. Of course then one could retort that I should learn to code and do it myself, to which my response would be "sure, lemme just jack myself in Matrix style and learn everything real fast." :p
 

koala

Ars Tribunus Angusticlavius
7,579
That's likely just "oh, we want to do more optimizations, but it's fine". But you know, if RAID56 is marked as non-production, that doesn't give me confidence in the other modes. I think they are quite likely safe, though; many serious organizations are leaning into BTRFS in the "safe" configurations.

I kinda prefer software RAID, so I have to worry less about hardware models.
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
That's likely just "oh, we want to do more optimizations, but it's fine". But you know, if RAID56 is marked as non-production, that doesn't give me confidence in the other modes. I think they are quite likely safe, though; many serious organizations are leaning into BTRFS in the "safe" configurations.
Yeah, the Btrfs doc basically says that for RAID10. Not sure what, if anything, there is for ZFS when it comes to making sure data is written equally.
I kinda prefer software RAID, so I have to worry less about hardware models.
Software RAID is fine for the most part; it's just that Btrfs and ZFS in particular seem kinda lazy with RAID10, namely the 0 part of it. They aren't really striping evenly across the mirrors, so there's not only a loss of performance, but with SSDs you may get higher wear on some drives. It's probably minimal, but it's a thing to consider.
 

teubbist

Ars Scholae Palatinae
823
I will say I am not married to the use of ZFS. It just seems to be the current sweetheart file system (and many of the NAS OSes/software are BSD-based and using ZFS).
It's the sweetheart because it's a relatively simple solution to data integrity for the "low end" bulk storage niche.

BTRFS's failure to fully close the gap is odd, but considering Stratis was also touted to solve the simple-interface problem and is still nowhere near a useful state for that purpose, it suggests this is a problem not enough people care about enough to come up with a native in-kernel solution. Maybe bcachefs will save us, even if, amusingly, its EC mode isn't production-ready either, just like BTRFS's.

At the higher end you're either solving integrity further up the stack (i.e. CEPH and similar distributed storage) or using T10-PI. The latter is done either in hardware or emulated using dm-integrity, where your component count is high enough that the performance hit isn't critical.

I guess depending on your performance needs, and how important sticking to pure native Linux kernel features is to you, you could look at LVM's integrity RAID. All modern distributions' LVM tooling should support it now. It doesn't fully play nice with some other LVM features (writecache being the big one, IIRC), though.
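For reference, it's roughly this (device names and sizes are just examples, and it needs a reasonably recent lvm2):

  # raid1 LV with a dm-integrity layer under each leg, so a corrupted read
  # is detected and corrected from the other mirror leg
  pvcreate /dev/sda /dev/sdb
  vgcreate vg_data /dev/sda /dev/sdb
  lvcreate --type raid1 -m 1 --raidintegrity y -L 500G -n lv_data vg_data
  mkfs.ext4 /dev/vg_data/lv_data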
 

koala

Ars Tribunus Angusticlavius
7,579
I'm kinda curious about the status of Stratis and bcachefs. bcachefs in principle looks like a very interesting option for people running desktops who want cheap mass storage. For example, for my gaming desktops (which now run Linux), which have small SSDs. I like playing Flight Simulator on them, but with the small SSDs, installing it really is a conscious choice :D
 

teubbist

Ars Scholae Palatinae
823
I'm kinda curious about the status of Stratis and bcachefs.

Unless something recent happened, Stratis only allows you to assign block devices to a pool, encrypt it, create filesystems, and enable read caching. RAID, dm-integrity, etc. still require you to do the legwork yourself and then present the final block device to Stratis.

I know it's RH's play for storage management, but it's got a very long way to go, unless the "hybrid-cloud" wording in the most recent relaunch in EL9.3 is a subtle hint that it's never going to get its previously claimed headline features. You're currently better off just using the LVM tooling to achieve it all within a single framework if your needs are complex.
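Concretely, "doing the legwork yourself" looks something like this (device and pool names are just examples):

  # Build the redundancy underneath, then hand the resulting block device to Stratis
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  stratis pool create mypool /dev/md0
  stratis filesystem create mypool data
  mkdir -p /mnt/data
  mount /dev/stratis/mypool/data /mnt/data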

While I appreciate Kent pushing things forward for caching filesystems on Linux, I'm sitting out on bcachefs until it's been in the kernel for 2+ years. bcache had some real rough spots and I'm not keen on repeating that cycle.
 

Cool Modine

Ars Tribunus Angusticlavius
8,539
Subscriptor
ZFS really likes to use whole disks.
Software has no dreams, no fears, and no desires. Yes, partitioning disks is bad because you murder seek time/latency/IOPS. But flash gives no fucks. The only real problem is keeping track of your configuration and not screwing things up, e.g. when replacing a failed drive.
Hrm, reading the part about RAID1/10 not being optimal is a little off-putting, but not game-breaking. I know ZFS seems to have stuff in place for more even spreading of writes, but in the end it too is not really RAID10. The RAID5/6 stuff is definitely problematic. It'd be nice if these file systems could work without needing direct hardware access so real RAID could still be used. That, or properly implement the RAID stuff. Of course then one could retort that I should learn to code and do it myself, to which my response would be "sure, lemme just jack myself in Matrix style and learn everything real fast." :p
It's 2024. RAID is stupid. Literally. You want the journaling on writes and the checksums of data. ZFS is much more "proper" than the obsolete RAID+filesystem paradigm because it's designed to not destroy your data!
Software RAID is fine for the most part; it's just that Btrfs and ZFS in particular seem kinda lazy with RAID10, namely the 0 part of it. They aren't really striping evenly across the mirrors, so there's not only a loss of performance, but with SSDs you may get higher wear on some drives. It's probably minimal, but it's a thing to consider.
You really don’t know what you’re talking about here.
BTRFS's failure to fully close the gap is odd,
The project started in what, 2007? 17 years without getting core features working? That shit is the Tesla full self driving of file systems.


Personally, I don’t see much value in the NAS OSes for anyone who’s even a modest power user. Setting up the storage pools and Samba is basically a one-shot deal. I spend a lot more time directly working with VMs and containers than with disks and file systems and shares. And for the VM side of the discussion, Proxmox is really nice. It’s one of the few pieces of software that I consider to make my life easier rather than harder.
 

Cool Modine

Ars Tribunus Angusticlavius
8,539
Subscriptor
ZFS really likes to use whole disks.
Software has no dreams, no fears, and no desires. Yes, partitioning disks is bad because you murder seek time/latency/IOPS. But flash gives no fucks. The only real problem is keeping track of your configuration and not screwing things up, e.g. when replacing a failed drive.
Hrm, reading the part about RAID1/10 not being optimal is a little off-putting, but not game-breaking. I know ZFS seems to have stuff in place for more even spreading of writes, but in the end it too is not really RAID10. The RAID5/6 stuff is definitely problematic. It'd be nice if these file systems could work without needing direct hardware access so real RAID could still be used. That, or properly implement the RAID stuff. Of course then one could retort that I should learn to code and do it myself, to which my response would be "sure, lemme just jack myself in Matrix style and learn everything real fast." :p
It's 2024. RAID is stupid. Literally. You want the journaling on writes and the checksums of data. ZFS is much more "proper" than the obsolete RAID+filesystem paradigm because it's designed to not destroy your data!
Software RAID is fine for the most part; it's just that Btrfs and ZFS in particular seem kinda lazy with RAID10, namely the 0 part of it. They aren't really striping evenly across the mirrors, so there's not only a loss of performance, but with SSDs you may get higher wear on some drives. It's probably minimal, but it's a thing to consider.
I don't think you understand how these systems really work, and the huge limitations of traditional RAID.
BTRFS's failure to fully close the gap is odd,
The project started in what, 2007? 17 years without getting core features working? That shit is the Tesla full self driving of file systems.


Personally, I don’t see much value in the NAS OSes for anyone who’s even a modest power user.
 

malor

Ars Legatus Legionis
16,093
Software has no dreams, no fears, and no desires.
No, but its creators do. Getting pedantic about crap like that is silly. You're just wasting everyone's time.

ZFS is fundamentally designed to control whole disks. It doesn't matter what kind of disks you use, the design doesn't change. If you don't use whole disks with it, at least with Linux, you can have weird issues. ZFS is more or less one giant layering violation as far as Linux is concerned, a whole separate storage stack from the metal up.

I have not used it with the BSDs, so I can't comment on how it works on that side of the fence.

You can certainly argue that ZFS' design is bad on SSDs, but that's still the design.
 

koala

Ars Tribunus Angusticlavius
7,579
Personally, I don’t see much value in the NAS OSes for anyone who’s even a modest power user.
My test run of TrueNAS Scale was fairly promising. Their bet on K8S + Helm to deliver apps is quite interesting; getting Nextcloud running with a few clicks was pretty nice.

I've settled on Proxmox because I am extremely picky. But I could see a lot of people being very happy with TrueNAS. In fact, if they relaxed the OS drive limitation it could go further: you can get a 2x 4TB Hetzner server in an auction for 30€/month. If Hetzner provided TrueNAS Scale as an OS option, you could get an automatically updated Nextcloud without touching a console. And that would have capacity to spare to run a couple of other apps. That is starting to sound very interesting in many scenarios.
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
Yes, partitioning disks is bad because you murder seek time/latency/IOPS.
Short stroking has entered the chat
It's 2024. RAID is stupid. Literally. You want the journaling on writes and the checksums of data. ZFS is much more "proper" than the obsolete RAID+filesystem paradigm because it's designed to not destroy your data!

You really don’t know what you’re talking about here. / I don’t think you understand how these systems really work, and the huge limitations of traditional RAID.
No, RAID is not stupid. Not in the slightest. It did/does have risks, and those risks were always clear and understood. RAID levels beyond 0, along with nested RAID and battery-backed cache, were attempts at reducing the risk of data loss. The power of more current CPUs and the gobs of RAM available are what allow software RAID to be viable now, and thus we get systems like ZFS and Btrfs, which in theory take the good parts of RAID and improve on them with all the other things you mention. I may not have intimate knowledge of these things, but I've read plenty, and what I said seems to remain true unless you've got definitive information you'd like to provide and actually contribute to the discussion with (instead of condescension). Btrfs/ZFS, as far as I understand it, DO NOT STRIPE. They distribute, which is not the same and does not ensure even writes across the RAID. If that is the case (which documentation, etc. seem to support), then there is indeed a loss of performance and, for NAND, a potential for uneven wear.