Hey C&MT, it's been a while for me.
I'm putting together a passively-cooled Qotom system (Atom C3758R) to become a new OPNsense router... More accurately, it will probably run Proxmox, with the router virtualized and getting NICs passed through to it. This is for ease of backup, and so I can run some other utility VMs on the same system (pihole, unifi controller, etc) since I don't need 8 cores of C3758 to route 1Gb at home.
I have 2x32GB of ECC DDR4 - yes, overkill for a router, but see above re: virtualization/etc.
Because I'm not a savage, I'm running it through bench testing before doing more with it. Started with the FOSS memtest86+; it made it through a whole "pass" and a bit more, with zero errors. MT86+ did not seem to be reading any ECC information from the board though.
Grabbed Passmark's free edition of Memtest, which does support ECC. Both sticks together were throwing ECC errors less than 2 hours in. Let's do the reductive testing dance... Tested them one by one in the first slot.
One stick made it through 4 whole rounds of Passmark Memtest (8.5 hours) and had one single ECC error during that run.
The other stick throws multiple ECC errors within the first 30-60 minutes; I have it sitting to complete a full run now and will check it in 9 hours to see a real total.
But these are "only" correctable ECC errors... should I care? I feel like the stick with a bunch of them probably should be RMA'd? The one that made it 8.5 hours with a single error, I could potentially believe that's a fluke / legit random bit flip.
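Side note for after the bench testing: once the box is running Proxmox (or any Linux), correctable errors can also be watched over time from the OS instead of only in Memtest. A rough sketch, assuming the kernel's EDAC driver actually supports this board and exposes its counters under /sys/devices/system/edac/mc (not verified on this Qotom unit); the function name `read_edac_counts` is made up for illustration:

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce": int|None, "ue": int|None}} per memory controller.

    "ce" is the correctable-error count, "ue" the uncorrectable-error count.
    A value of None means the counter file was not present.
    """
    counts = {}
    # Each memory controller shows up as mc0, mc1, ... when EDAC is loaded.
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for key, fname in (("ce", "ce_count"), ("ue", "ue_count")):
            counter = mc / fname
            entry[key] = int(counter.read_text().strip()) if counter.is_file() else None
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    results = read_edac_counts()
    if not results:
        print("No EDAC memory controllers found (driver not loaded or unsupported).")
    for name, c in results.items():
        print(f"{name}: correctable={c['ce']} uncorrectable={c['ue']}")
```

Running this from cron and logging the output would show whether the correctable count keeps climbing on one stick or stays flat, which is the same question the Memtest runs are trying to answer.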
TL;DR: Should I RMA the "more bad" stick with a lot of ECC errors, or should I RMA them both? Or don't worry about it at all, because ECC is doing its job?