YANT (Yet Another NAS Thread): ZFS Config Edition

w00key

Ars Praefectus
5,907
Subscriptor
Btrfs/ZFS, as far as I understand it, DO NOT STRIPE. They distribute, which is not the same and does not ensure even writes across the RAID.
It tries to fill vdevs up evenly, so in a clean mirrored pool filling up to full, it does distribute over the two vdevs, same as striping.

Now if you have super weird access patterns that hammer sections of the data file small enough that they don't span both drives, then yes, it could write to only one.

In reality, you rarely rewrite existing data unless you're running a database like Postgres, and Postgres runs just fine on ZFS (with compression + recordsize=16kB); distribute and stripe have no performance difference.

Postgres WAL files are append-only and don't hammer a single 16 kB record. Data files are randomly accessed and written, and again are much larger than 16 kB, so stripe or distribute, they both use all available disks.
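
For anyone who wants to try that Postgres setup, the dataset creation looks roughly like this; the pool and dataset names are just placeholders, and lz4 is one reasonable compression choice among several:

Code:
# placeholder names: "tank" pool, "pgdata" dataset for Postgres data files
zfs create -o recordsize=16K -o compression=lz4 tank/pgdata
# confirm the properties actually took
zfs get recordsize,compression tank/pgdata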
 

malor

Ars Legatus Legionis
16,093
Btrfs/ZFS, as far as I understand it, DO NOT STRIPE. They distribute, which is not the same
As I understand it, at least with mirrors, ZFS guarantees at least X copies of the data, where X is your mirror width, with each copy on a different spindle. The disks may not be precisely identical, but each takes about the same amount of write traffic.

I'm not sure how that works with raidz vdevs. I'm under the impression that the disks will take about the same amount of overall traffic, but it's also not precisely identical. I gather you can end up with kind of weird results if you add more drives to a mirrored pool, though. It seems like resizing ZFS pools is kind of a bad idea; my impression is that it's better to back up and recreate a larger pool, at least if balanced traffic is important to you.

I am not, however, a ZFS expert. I'm pretty comfortable using it in basic, straightforward ways, but haven't really tried anything complex with it, just raidz and mirrors.
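
If you want to sanity-check how evenly the traffic actually lands, zpool can break the numbers out per vdev and per disk. Something like this (pool name is just an example) shows ops and bandwidth for each device, refreshed every five seconds:

Code:
zpool iostat -v tank 5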
 

w00key

Ars Praefectus
5,907
Subscriptor
As I understand it, at least with mirrors, ZFS guarantees at least X copies of the data, where X is your mirror width, with each copy on a different spindle. The disks may not be precisely identical, but each takes about the same amount of write traffic.
The question is about a pool with multiple mirrors, each being a vdev: say an sda+sdb mirror and an sdc+sdd mirror, with both mirror vdevs added to one pool. The pool distributes over both vdevs, but it isn't a strict stripe (first 128k to sda/b, then the next 128k to sdc/d); it allocates at the record level to keep the disk usage % of both vdevs balanced.
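
For concreteness, that layout would be created with something like this (pool and device names are just placeholders):

Code:
# one pool built from two 2-way mirror vdevs; ZFS spreads records across them
zpool create tank mirror sda sdb mirror sdc sdd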

ZFS is copy-on-write, so new data is just appended across both pairs and old, unreferenced data is garbage collected later. That may mean more deletes on one pair than the other and, very strictly speaking, an unbalanced workload.

In reality, if you issue a million random writes, the odds of ending up even 45/55 unbalanced are lower than winning the lottery. It just doesn't happen. You'd need a very odd access pattern that allocates a file, writes 16 kB, skips 16 kB, writes 16 kB, and so on, to hammer the second vdev with recordsize=16K.
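
And if you ever suspect the vdevs really are drifting apart, the per-vdev allocation is easy to eyeball (again, pool name is just an example):

Code:
# shows size / alloc / free per vdev as well as for the whole pool
zpool list -v tank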
 

Red_Chaos1

Ars Tribunus Angusticlavius
7,875
The question is about a pool with multiple mirrors, each being a vdev: say an sda+sdb mirror and an sdc+sdd mirror, with both mirror vdevs added to one pool. The pool distributes over both vdevs, but it isn't a strict stripe (first 128k to sda/b, then the next 128k to sdc/d); it allocates at the record level to keep the disk usage % of both vdevs balanced.

ZFS is copy-on-write, so new data is just appended across both pairs and old, unreferenced data is garbage collected later. That may mean more deletes on one pair than the other and, very strictly speaking, an unbalanced workload.

In reality, if you issue a million random writes, the odds of ending up even 45/55 unbalanced are lower than winning the lottery. It just doesn't happen. You'd need a very odd access pattern that allocates a file, writes 16 kB, skips 16 kB, writes 16 kB, and so on, to hammer the second vdev with recordsize=16K.
Yeah, to be fair, the point I'm making is probably a wee bit pedantic. If this is how ZFS does it, and it's been like this for as long as ZFS has been around, then the performance probably isn't that bad. No idea when it comes to Btrfs. Probably about the same.