ECC memory for backup/NAS required?

eisa01

I have been under the impression that you really should have ECC memory if you build a dedicated NAS/backup mini PC.

However, in a Hacker News comment thread, people pointed to this piece: https://www.klennet.com/notes/2022-11-25-zfs-and-ecc-rant.aspx

But it unhelpfully doesn't tell you what the right method is, i.e., one where ECC RAM is only a convenience.
Whatever method you use to prevent tampering with your backups (e.g., when a cryptovirus silently corrupts your data) should catch the RAM problem too. If it does not, you need to change the method rather than rely on ECC RAM.

If your backup and recovery strategy relies on ZFS using ECC RAM, you should rework your backup strategy. After you do not need ECC RAM, you can buy it as a matter of convenience to save time troubleshooting and/or restoring from the backup.
 
I suppose you could checksum every file after it's been backed up and compare it against the original copy. Pretty much every backup tool, for both end users and the enterprise, has checksum validation, but I don't remember any of them re-checking against the original copy. Instead, they use stored checksums, which could have been corrupted in memory before being recorded. So they're great at finding physical corruption from bitrot in your storage media, but less so at catching logical corruption in what was written in the first place. And of course the original file has to still be available for the comparison; you may have backed it up and then deleted or purged it!
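A minimal Python sketch of that re-check against the original, assuming both the original and the backup are plain directory trees you can read (it won't help with image-based backup formats). The function names are hypothetical, not from any particular tool. It re-reads and re-hashes both sides rather than trusting a stored checksum. Two caveats: the check itself still runs through the same RAM, and a re-read right after copying may be served from the OS page cache instead of the disk, so it's better run later or from a different machine.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(original: Path, backup: Path) -> list[str]:
    """Re-hash every file under `original` and its counterpart under `backup`.

    Returns relative paths that are missing from the backup or whose
    contents differ. Files that exist only in the backup are ignored.
    """
    problems = []
    for src in sorted(p for p in original.rglob("*") if p.is_file()):
        rel = src.relative_to(original)
        dst = backup / rel
        if not dst.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(dst):
            problems.append(f"mismatch: {rel.as_posix()}")
    return problems
```

An empty list means every original file has a byte-identical counterpart in the backup.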

You could also take each backup three separate times to three separate destinations, then checksum them before restoring. If there are any discrepancies, two of the three will likely agree.
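Picking the good copy out of three is just a majority vote over their checksums. A sketch in Python, assuming you've already computed one digest per destination (e.g., SHA-256 of each copy); the function names are illustrative only.

```python
from collections import Counter
from typing import Optional


def majority_checksum(digests: list[str]) -> Optional[str]:
    """Return the digest shared by a strict majority of copies, else None."""
    if not digests:
        return None
    value, count = Counter(digests).most_common(1)[0]
    return value if count * 2 > len(digests) else None


def suspect_copies(digests: list[str]) -> list[int]:
    """Indices of copies that disagree with the majority (all of them if there is none)."""
    winner = majority_checksum(digests)
    return [i for i, d in enumerate(digests) if d != winner]
```

With three copies, a single bad one is outvoted; if all three differ, there is no majority and every copy is suspect.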

If your data isn't mission-critical, none of the above applies and it doesn't much matter. If it is, just buy ECC.
 

teubbist

ECC RAM is a convenience in the same way that checksumming your data, or using a redundant form of RAID, is when you already have a 3-2-1 backup strategy: it warns you if the RAM is going bad (i.e., bit flips being detected during a scrub) and minimizes downtime (by correcting those bit flips until you replace the RAM).
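To illustrate how an error-correcting code can both detect and fix a flipped bit, here is a toy Hamming(7,4) code in Python. It only illustrates the principle: real ECC memory uses a wider SEC-DED (single-error-correct, double-error-detect) code computed by the memory controller, and plain Hamming(7,4) as written here would miscorrect a double-bit flip.

```python
def encode(data: list[int]) -> list[int]:
    """Hamming(7,4): 4 data bits -> 7-bit codeword, positions 1..7 = p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(code: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, 1-based error position or 0)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # parity over positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # parity over positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # parity over positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # nonzero syndrome = position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1  # correct it
    return [c[2], c[4], c[5], c[6]], syndrome
```

A nonzero syndrome is the "bit flip detected" event that ECC hardware counts and logs; the correction is what keeps the system running until the DIMM is replaced.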

But no, it's not strictly needed for a NAS and the failure cases it covers are fairly small. The incremental cost of ECC in a DIY NAS is minimal, so it's something I've always opted for, but you can reach similar reliability if you use non-OC RAM from a known vendor and stick to JEDEC frequencies. Running quality RAM in spec isn't that failure prone.
 

Andrewcw

I have been under the impression that you really should have ECC memory if you build a dedicated NAS/backup mini PC.

However, in a Hacker News comment thread, people pointed to this piece: https://www.klennet.com/notes/2022-11-25-zfs-and-ecc-rant.aspx

But it unhelpfully doesn't tell you what the right method is, i.e., one where ECC RAM is only a convenience.
It uses the meme way of answering. The top line is the truth: "Is it required? No." The next line is whether you should use it, and it leaves the answer ambiguous because there is no right answer. The rest explains why ECC RAM is the least of your worries when a ZFS array fails. It's just saying that having ECC RAM, as a convenience, makes it easier to rule out bad RAM while troubleshooting a recurring ZFS array failure.

The part you quoted basically says you had better have some secondary validation that your backup matches the data you think you just backed up. For example, when I copy files to USB drives, I have a program that does an MD5 hash check on the source and re-verifies the target data afterwards. If they don't match, something could be wrong: memory corruption, malware, a bad cable.
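A rough Python equivalent of that copy-then-verify workflow, using only the standard library (`hashlib`, `shutil`); `copy_and_verify` and `md5_of` are hypothetical names, not a specific tool. MD5 is fine for catching accidental corruption; swap in `hashlib.sha256` if you also care about deliberate tampering, since MD5 collisions are practical to produce.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> str:
    """Hash the source, copy it, then re-read the destination and compare."""
    before = md5_of(src)    # hash of the source before copying
    shutil.copy2(src, dst)  # copy contents and metadata
    after = md5_of(dst)     # independent re-read of the target
    if after != before:
        raise OSError(f"verification failed: {src} -> {dst}")
    return after
```

A mismatch only tells you something went wrong between the two reads; it can't tell you whether RAM, the cable, or the drive was at fault.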
 
Sure, but maybe you resilver or replicate your backups elsewhere, and then bad RAM could hit you. It's one of those things that's really rare and only matters if data integrity is extremely important to you. For home users building their own NAS to hold pirated movies and pics of their kids: get ECC if you can and it isn't a bunch more money. Otherwise, don't worry about it.
 

eisa01

Thanks for all the explanations! Any tool that would do checksums before and after copy?
But no, it's not strictly needed for a NAS and the failure cases it covers are fairly small. The incremental cost of ECC in a DIY NAS is minimal, so it's something I've always opted for, but you can reach similar reliability if you use non-OC RAM from a known vendor and stick to JEDEC frequencies. Running quality RAM in spec isn't that failure prone.
Unfortunately, the cost is not minimal if you want something small and flash-based for an apartment where hard drive noise is a concern, and I'm not even sure how I'd feasibly get hold of something like this: https://www.solid-run.com/industrial-computers/bedrock-v3000-basic/
 

teubbist

That machine is expensive because it's a niche product, not because of the DDR5 ECC. Small, industrial, DIN-rail mounting, passive cooling, embedded: each of those is a multiplier on the price.

And you're trying to build an AFA (all-flash array, yet another niche in the home market), which is going to blow the cost out of the water unless you want only minimal storage. 8+ TB of NVMe still isn't cheap.

That said, DDR5 being relatively new does mean there is a higher premium for ECC at the DIMM level, around 50% depending on capacity and speed. But factored against the total cost of a NAS, it trends toward incremental.
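The arithmetic behind "incremental" is simple: the ECC premium applies only to the DIMMs, so its share of the whole build shrinks as the rest of the build (drives, board, chassis) grows. A sketch with made-up, purely hypothetical prices in no particular currency:

```python
def ecc_share_of_build(dimm_price: float, ecc_premium: float, build_total: float) -> float:
    """Extra cost of ECC as a fraction of the non-ECC build total.

    dimm_price:  cost of the equivalent non-ECC DIMMs
    ecc_premium: relative markup for ECC (0.5 means 50%)
    build_total: cost of the whole build with non-ECC DIMMs
    """
    extra = dimm_price * ecc_premium  # the premium applies only to the memory
    return extra / build_total


# Hypothetical: 100 worth of DIMMs, 50% ECC premium, 1000 total build.
print(ecc_share_of_build(100, 0.5, 1000))  # → 0.05, i.e. about 5% more overall
```

With an all-flash build, `build_total` is dominated by the NVMe drives, which pushes the ECC share down further.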
 

steelghost

Any tool that would do checksums before and after copy?
Macrium Reflect is the backup tool I use, and it certainly has an option for post-backup checksum validation. It doesn't just copy files, though; it uses its own compressed (and optionally encrypted) file format.

When I built a NAS a few years back, I used the then-slightly-out-of-date quad-core i3-8100: more than enough oomph for NAS stuff, with the benefit of supporting ECC memory on the right board. It was inexpensive on the used market as well. Couple that with a Supermicro ITX board and some lightly used low-profile ECC DIMMs, and the overall BOM for the core system wasn't too painful.

Is it dramatically better than a Synology of equivalent price? Debatable, but it does have the benefit that if the hardware dies, I can boot TrueNAS on more or less anything and still access my data, rather than having to buy another Synology or use separate third party software to read the SHR format.

These days I am not sure the i3 parts still support ECC, you might end up "needing" to step up to a Xeon part to get ECC compatibility.
 

Xelas

I'm probably late diving into this, but I have had two different RAM sticks go bad on me.

One happened about 10 years ago. A non-ECC 8GB SODIMM in my work laptop went bad out of the blue after working fine for a couple of years. I noticed sudden stability issues (the system would randomly bluescreen) and only figured out the cause after it had corrupted the Windows 7 install, and I was finding borked document files for years afterwards. A Memtest86 quick test did not find the issue, but a full stress test did.

My MO for all of my important stuff was to save locally and also immediately save copies to the department/office file servers, but that bad RAM stick destroyed many files and documents because they were being silently corrupted in the process of being saved and backed up. That was really the scary part. I also had to blow away and reinstall Windows entirely, as many of the settings files and probably the registry were bad, and installing "on top" of the old install still left some broken stuff. That was less of an issue, though; Windows 7 needed to be blown away and refreshed every 3-4 years anyway as cruft built up.

Another was an ECC RAM stick I bought for my home lab server that was defective out of the box. I caught ECC errors logged in the IPMI error logs. Seeing that was actually great, as it gave me the peace of mind knowing that the ECC reporting actually works and possibly prevented me from dealing with file corruption on that server later. Newegg didn't suck back then and they overnighted me an exchange.
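On Linux, corrected and uncorrected memory errors are also exposed by the kernel's EDAC subsystem (tools such as rasdaemon's `ras-mc-ctl` report them too), which is handy on boards without IPMI. A minimal sketch that reads the per-memory-controller `ce_count` and `ue_count` files under `/sys/devices/system/edac/mc`, assuming an EDAC driver for your platform is loaded; without one, the directory is simply absent and the function returns an empty dict. The function name is hypothetical.

```python
from pathlib import Path


def edac_error_counts(root: str = "/sys/devices/system/edac/mc") -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} from Linux EDAC sysfs counters."""
    counts: dict[str, tuple[int, int]] = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # no EDAC driver loaded, or not Linux
    for mc in sorted(base.glob("mc[0-9]*")):
        ce_file, ue_file = mc / "ce_count", mc / "ue_count"
        if not (mc.is_dir() and ce_file.is_file() and ue_file.is_file()):
            continue  # skip entries that don't expose both counters
        counts[mc.name] = (int(ce_file.read_text().strip()), int(ue_file.read_text().strip()))
    return counts
```

A corrected count that keeps climbing is the same early warning the IPMI log gave here: the DIMM is failing, but ECC is still covering for it.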

If the NAS will be used to host irrecoverable and important data or files, then why not spend a few more bucks for the little bit of extra security that ECC provides?
 

w00key

What's the most economical solution for the holy trinity of

  • ECC
  • iGPU (Intel preferred for QuickSync)
  • NAS-ish chassis


Synology does ECC XOR iGPU: either a low-end Intel chip with an iGPU but no ECC, or an embedded Zen 1 chip with ECC.

AMD does ECC but has a worse iGPU; people are complaining about the decoder/encoder in the Ryzen 7000 series, and the APUs with more GPU don't do ECC.

Intel can do all three with W680 but the board is like 500 euros.

The last HP MicroServer does ECC and is NAS-like, but has no iGPU.


Meh. Any ideas?