Home Rack Colo Project - ETA To Complete June '23

Placeholder for my insane WIP project.

After 3 years as a sysadmin and 13 years in OEM building/desktop support (mostly Microsoft/PC stuff), I've decided to move away from "consumer"-looking cases and go big, with less fuss when moving things around. I also have at least 2 years of real VMware ESXi administration and another 6 years managing Horizon VMs.

Design concepts:
  1. Primary server will run VMware ESXi 7 (maybe 8?) with vSphere Essentials for Veeam backups.
  2. File storage will be managed through TrueNAS running as a VM.
  3. 1x Windows Server 2016 for basic management and possible AD testing with test VMs.

Design:
Rack design (top to bottom):
1x StarTech 12U 4-post open frame + 1U PS w/8 outlets: https://www.amazon.com/dp/B09QMDHRSL
1x 2U shelf, 16" or 20": https://www.amazon.com/dp/B008X3JHJQ (positions 12U & 11U, top)
1x 4U Rosewill RSV-R4000U: https://www.amazon.com/dp/B09HLCNKM3 (positions 10U through 7U, middle)
1x 4U Rosewill RSV-L4000U: https://www.amazon.com/dp/B096T3J8CQ (positions 6U through 3U, middle)
1x 2U Tripp Lite SMART1500LCD: https://www.amazon.com/dp/B000DZRY9C (positions 2U & 1U, bottom)
2x iStarUSA 26" rail kits: https://www.bhphotovideo.com/c/product/834849-REG/iStarUSA_TC_RAIL_26_Sliding_Rail_Kit_26.html

The 2U shelf will hold the home network devices (cable modem, router, switches, security hub), with the 1U PS at the opposite end in position 11U.
The stand-alone dedicated gaming system will take the first 4U at the top with a Zen 4 build - *future post*
The ESXi server will be the bottom 4U. System design as follows:
AMD Ryzen 7 5800X w/64GB DDR4-3200 ECC UDIMMs on an ASRock Rack X570D4U
2x SK hynix P31 1TB or 2TB in RAID1
4x Seagate Exos or IronWolf 14TB (16TB?) in RAID10 (ZFS mirrors) via TrueNAS
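For anyone picturing the layout, the RAID10 part is just two striped ZFS mirrors. A rough sketch at the command line (pool name and device names are placeholders, and TrueNAS would normally build this through its UI):
```
# Sketch only: "tank" and da0-da3 are placeholders.
# Two mirror vdevs striped together is the ZFS equivalent of RAID10.
zpool create tank mirror da0 da1 mirror da2 da3
zpool status tank    # shows vdevs mirror-0 and mirror-1
zpool list tank      # usable space is roughly half the raw 4x 14TB
```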

The UPS will provide power backup to the primary server and to the 1U PS for the network devices.

Give me your feedback on how crazy and silly this design will become by next year!
 
4x Seagate Exos or IronWolf 14TB (16TB?) in RAID10 (ZFS mirrors) via TrueNAS
Anything other than RAID-Z1 seems wasteful to me. Presumably you have a backup, and that covers the R part of RAID. I keep a single parity drive simply for convenience in case one drive goes down. It's actually faster to restore a backup using zfs send/recv on a 10GbE network than it is to order a new drive, have it arrive in the mail, and then rebuild the array, especially once you are talking about modern high-capacity drives.
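For what it's worth, the restore path I'm describing is just a snapshot plus send/recv. A rough sketch, with hypothetical pool/dataset names and a hypothetical host "nas":
```
# On the backup box, push the latest snapshot back to the rebuilt pool.
# "backup/data", "tank/data" and "nas" are placeholders.
zfs snapshot -r backup/data@latest
zfs send -R backup/data@latest | ssh nas zfs recv -F tank/data
```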

I have a similar setup: five 14TB Exos drives with a single parity drive. RAID10 means you are losing half the capacity for no real reason, IMHO. I'm even virtualizing TrueNAS in ESXi, lol. Using a Dell R710, though, because RAM and not CPU is the most important part of my workloads.

FYI, I've been tracking these for replacements and it looks like 16TB is now cheaper than 14TB.
 

continuum

Ars Legatus Legionis
94,897
Moderator
Given the risk of an uncorrectable bit error with such large volume sizes, we normally prefer two-drive parity... but yeah, with just four disks that is a 50% capacity hit. It depends on what the goal is here: if it's uptime, then I would still do it. But if the goal is merely a volume larger than a single disk can provide and uptime is not a key goal.....
 

Xelas

Ars Praefectus
5,444
Subscriptor++
At these huge drive capacities, RAID rebuild (resilver) times are considerable even with the optimizations that have been done to speed them up. I replaced an 18TB drive in my 6 x RAIDZ2 array, and it took almost 24 hours. If another drive so much as burps during a resilver, you stand a chance of losing the whole pool if you only have RAIDZ1. A resilver is also very hard on the drives, since they run continuously during the rebuild, and if you access the array during a resilver there is a LOT of thrashing. The cost of that extra drive is negligible, IMHO, compared with the other hardware costs and the hassle of restoring data. It's $300 for an 18TB drive that will function for, say, 5 years. That's $5/month.
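For reference, the replacement itself is a one-liner and the resilver progress is visible the whole time (hypothetical pool/device names):
```
# Swap the failed disk for the new one and watch the resilver estimate.
zpool replace tank da2 da6
zpool status -v tank    # shows "resilver in progress" with progress and an ETA
```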

Conventional wisdom states that RAID10 (striped mirrors) is supposedly faster than RAIDZ1 or Z2 (since it's pure striping and mirroring with no parity), but I don't really see a practical impact. Any VMs should be hosted on SSDs anyway (IMHO), and the spinners should be used for bulk storage. RAIDZ2 will allow any 2 drives to fail, while RAID10 might get hosed depending on which 2 drives fail. RAIDZ2 can also be expanded later with more vdevs (see the sketch below). I'd go with RAIDZ2.
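A rough sketch of what I mean by expanding with more vdevs (placeholder names; starting with four disks, you'd add a second full-width RAIDZ2 later):
```
# Initial 4-disk RAIDZ2 vdev, then a later expansion by adding a second RAIDZ2 vdev.
zpool create tank raidz2 da0 da1 da2 da3
zpool add tank raidz2 da4 da5 da6 da7   # capacity grows by the new vdev's usable space
```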

You might regret the lack of PCI slots in a micro-ATX board. Your drive count obviously maxes out your SATA headers, so you might find yourself needing to buy an HBA card. Add in another card later for, say, a 10Gb NIC and you're out of space.

You are also not taking advantage of the M.2 slot on the board. I'd use that for the ESXi host, leave the SSDs for VMs, and use the spinning rust for bulk/local backups.

My server sucks down ~130W, and it has had a huge impact on my power bill. I don't have AC (don't really need it in my area), so this doesn't take into account the secondary cost of running AC to manage the heat. The extra power draw pushes my electrical usage into a "high usage" tier, so the real per-kWh rate it takes to run the server is fairly steep. Obviously YMMV and your electric costs are probably different, but don't forget to take this into account. I did the math (napkin-level, but close enough), and the increase in my electric bill actually adds up to more than my ongoing hardware purchase/replacement costs.

I run ESXi on bare metal from an SSD. ESXi runs some VMs on other SSDs, and one of my VMs is TrueNAS. I have 2 x pools in TrueNAS running on 2 x LSI controllers that are passed through to the TrueNAS VM, so TrueNAS is managing the pools directly. The TrueNAS VM is the first VM to boot, so it's ready by the time the other VMs boot up and need to see the mapped drives they then use for bulk storage (mostly multimedia and files). My ESXi config is backed up, and I save an image of the boot drive to a USB drive before every ESXi patch or update (which I do about every 6 months). I found this arrangement to be simpler and more cost-efficient than trying to optimize the ZFS pool to host VMs directly - I screwed around with ZIL and SLOG drives and it really wasn't worth it. Not saying that this is the "right" way, but throwing it out there for consideration. I've had this setup running for almost 10 years now with no data loss; it survived several drive failures and dozens of ESXi updates/upgrades (ESXi 5.0 --> ESXi 7), I expanded my drive capacity almost ten-fold, and it's been 100% rock solid.
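In case it's useful to anyone copying this: the config backup step can be done from the ESXi shell right before patching. A sketch (the second command prints a download URL for a configBundle tarball):
```
# Flush any pending config changes, then generate the backup bundle.
vim-cmd hostsvc/firmware/sync_config
vim-cmd hostsvc/firmware/backup_config   # prints the URL of the configBundle .tgz to download
```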
 

caustic meatloaf

Ars Legatus Legionis
13,857
Subscriptor
The only thing I can think of is whether there's a need for that much storage space. I mean, 4x 16TB in a RAID10 is still something like 32TB usable (about 29 TiB). That's a LOT of storage.

IMHO, it's a lab setup at home - is there anything specifically so huge that it needs that much space? And with a RAID10, no hot spare?

If you don't have an active plan for that much storage (like, I guess, replicating very large databases at home or something), I'd suggest exploring SSDs in a RAID6 or something, or splitting your storage into "fast" and "slow" tiers. Personally, I'd rather have a hot spare available (see the sketch below), and I'd rather go with higher bandwidth and lower latency on storage. It's like night and day when hosting VMs.
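A hot spare is cheap to bolt on, too; rough sketch with placeholder pool/device names:
```
# Add a hot spare so a rebuild can start immediately when a disk faults.
zpool add tank spare da4
zpool status tank    # the disk now appears under "spares"
```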

As an anecdote - when we migrated our work ESX hosts from using a NetApp with 3.5" spinning disks to a Pure SAN array using Intel SSDs, the latency went from 200+ milliseconds to less than 2 milliseconds.
 
As an anecdote - when we migrated our work ESX hosts from using a NetApp with 3.5" spinning disks to a Pure SAN array using Intel SSDs, the latency went from 200+ milliseconds to less than 2 milliseconds.


If you didn't put an extra 0 in there, and that isn't the highest latency seen for a single IO (a somewhat worthless metric) but actually a meaningful measure, the issue wasn't exactly spinning disk. The issue was way under-speccing your equipment. Well, even 20ms at the 95th percentile is under-specced, but in the realm of "I've seen it before, but it *really* needed fixing".
 
Thank you for the input.
Let me state that the primary storage and VMs will be on the RAID1 NVMe drives. I will have enough space to host ESXi, TrueNAS, a Windows Server, 2 Windows 10/11 clients, and whatever else for testing.
I already have a 6-year-old mini-ITX TrueNAS box and have felt the burn of ZFS expansion limitations and costs, so I am taking a bigger step; after I retire it, it becomes more of an emergency backup unit. With the bigger drives having a better cost ratio now, I'd rather have a better outlook on the budget by buying 2 HDDs instead of 4 HDDs to expand (a rough sketch of the options is below). Worst case, I give up on this route, drop into Unraid, and forget this dream project.
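For context, the expansion options I'm weighing look roughly like this (a sketch only, with placeholder pool/device names):
```
# Option 1: swap each disk for a bigger one; the pool grows once all members are replaced.
zpool set autoexpand=on tank
zpool replace tank da0 da4       # repeat for every disk in the vdev
# Option 2: buy a pair of drives and add another mirror vdev to a mirrored pool.
zpool add tank mirror da5 da6
```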

I'm really wanting to expand my VMware skillset on the side while working in the goliath enterprise arena. I also want to retire my crappy standalone homelab system made of scrap parts running Hyper-V on Win2016. I'm going to fork out ~$550 for the VMware Standard license and utilize Veeam, possibly to a cloud repo if I can ask the wife for more budget. I'm hoping my work covers some of it as a training expense, but that is wishful thinking since it's still in the planning stage at this time.

PS: I hate LONG resilvering. A past work scenario was an 8-bay Synology NAS expanding from a 4x 8TB RAID5 to 8x 8TB. It took almost the entire work week to complete the expansion/rebuild while the unit was still acting as the on-prem Veeam repo replicating to the cloud.
 
Just an update on my design. I'm dumping the low-key stuff and setting my eyes on last-gen Epyc 7002 gear. eBay filth, and I'll be dipping into it once tax hell is over: a 7302P w/Supermicro H12SSL-i (est. $720 for CPU+MB). I'll start the memory at 128GB (8x 16GB DIMMs). I'll still dabble more on the MB choice since some report TPM and BMC fan oddities. 128 PCIe lanes!!!

I reconsidered the RAID configuration and will cry in agony over the long build time to get Z2 configured.
 
What's the price gap between the Zen 2 and Zen 3 stuff (especially now that Zen 4 is out)?

The 7302P is 16 cores, so I'm not sure how that compares to the desktop stuff, but if you want all the PCIe lanes then EPYC definitely is the way to go.
If you check eBay, it is a BIG gap. The seller I have been monitoring for this combo has averaged $720 to $780. The 7302P can be had for almost $250 used, while new ones still average in the $900s. The Zen 3 counterpart, the Epyc 7313P, is almost $900 used if you're lucky, up to $1,300 new. I plan to fit in at least 2 sub-200W GPUs.

My budget won't allow a pricey server CPU, as I want to keep this server build as close to $2K as possible. The 16TB-18TB HDD prices fluctuated at year-end, and hopefully they drop another $20. Wishful thinking considering the forecast and the recent loss reports from the storage OEMs; the 50% drop in demand was a big eyesore.
 

Xelas

Ars Praefectus
5,444
Subscriptor++
What's the price gap between the Zen 2 and Zen 3 stuff (especially now that Zen 4 is out)?

The 7302P is 16 cores, so I'm not sure how that compares to the desktop stuff, but if you want all the PCIe lanes then EPYC definitely is the way to go.

Who can pass up a motherboard with 5 x PCIe 4.0 x16 slots!? I don't need this and I'm not sure I can afford this, but DAMN is this a good base for a home server build. My only concerns would be power use, efficiency, and heat.
 
Who can pass up a motherboard with 5 x PCIe 4.0 x16 slots!? I don't need this and I'm not sure I can afford this, but DAMN is this a good base for a home server build. My only concerns would be power use, efficiency, and heat.
Well, a 7003 Epyc would be more efficient, but my outright silly brain thinks the cost at a consumer level won't work out nicely for newish enterprise-grade hardware. That's why many hunt for used and retiring DC equipment, if they're ever lucky enough to catch big discount deals.
 
Soo uhh... I did a naughty and went tiny-big: AMD Ryzen 9 5900X w/128GB on an ASRock Rack X570D4U-2T2L. ESXi 8 is loaded, TrueNAS Core is going, and it sees 4 lovely Seagate Exos X18 16TB HDDs. Fighting IOMMU passthrough in VMware was a needle-in-a-haystack affair, and I am happy we have such a big group of people attempting and sharing solutions for this platform.

Now, if you want the reason why I decided not to go for the Epyc 7002, it was Zenbleed. Sure, it would be even more dirt cheap and an easier platform, but I'd rather remediate issues on a newer architecture. I've added pics of the near-finalized setup. Stress testing and whatnot for about a week before I put the RAIDZ2 into sync. Still working on some bugs with TrueNAS Core, as the Reporting page doesn't give me details such as temps/I/O even though they show in the BIOS. I haven't tested the shell approach the TrueNAS community posted about (rough sketch below).
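The shell route is presumably just smartmontools under the hood; something like this from the TrueNAS Core shell (device names are placeholders for however the controller enumerates the disks):
```
# Print the SMART temperature attribute for each data disk.
for d in da0 da1 da2 da3; do
  printf '%s: ' "$d"
  smartctl -A /dev/"$d" | grep -i temperature
done
```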

VMware licensing: I did find a possible cost solution for a vSphere license, so fingers crossed that it is legit.

Most updates are referenced here: ServeTheHome - My ESXi Build
UPDATE: 11/21/2023 - Added 2x Noctua 80mm PWM fans because the X570 chipset was overheating. Screwed them onto the Dynatron rad. Forgot to take a pic before I shoved the server back into the rack.
 

Xelas

Ars Praefectus
5,444
Subscriptor++
My biggest concern with this build is the fact that it's based on an AMD Ryzen platform, which is NOT one of the platforms officially supported by VMware. There are issues with passthrough that can be worked around, but I'd be paranoid about some future ESXi update/patch potentially breaking that workaround or introducing other issues.

I have a really old ESXi box running 4-5 VMs, including a TrueNAS VM with a couple of SAS cards in passthrough (Haswell-era Xeon E3-1245 v3 on a Supermicro X10SL7 board). It's getting really dated, the 32GB RAM limit is becoming an issue, and the CPU is no longer supported by VMware, but I did get about 10 years out of the system, 100% trouble-free and without any workarounds, because it is a well-supported platform. When I plan my upgrade, I'll likely stay with Intel just due to that fact. As much as I would have loved to jump to AMD, I don't want that sword of Damocles of suddenly needing hours on a weekend to figure out how to make an ESXi update work because yet another workaround/bandaid broke.
 
My biggest concern with this build is the fact that it's based on an AMD Ryzen platform, which is NOT one of the platforms officially supported by VMware. There are issues with passthrough that can be worked around, but I'd be paranoid about some future ESXi update/patch potentially breaking that workaround or introducing other issues.

I have a really old ESXi box running 4-5 VMs, including a TrueNAS VM with a couple of SAS cards in passthrough (Haswell-era Xeon E3-1245 v3 on a Supermicro X10SL7 board). It's getting really dated, the 32GB RAM limit is becoming an issue, and the CPU is no longer supported by VMware, but I did get about 10 years out of the system, 100% trouble-free and without any workarounds, because it is a well-supported platform. When I plan my upgrade, I'll likely stay with Intel just due to that fact. As much as I would have loved to jump to AMD, I don't want that sword of Damocles of suddenly needing hours on a weekend to figure out how to make an ESXi update work because yet another workaround/bandaid broke.
That's why I wait at least 30-60 days after VMware releases a patch. The Microsoft October 2023 updates reportedly broke a couple of Windows Servers on AMD ESXi platforms. So far, no hiccups and almost 30 days of zero downtime. SMART finally worked; it took a little longer to build report data, and it improved after my roughly 7.5-day TrueNAS HDD burn-in finished. Passthrough has been stable since the vdev/pool build. I haven't had any issues with the SMB share on the LAN/Wi-Fi APs, and I can easily stream an 8GB+ MP4 or MKV on my Android with VLC.

I will have time this holiday to put in the vSphere license, kick on the VCSA VM, and get vTPM going to test a Win11 VM. I will also spin up 2 Ubuntu VMs. My main Win10 console VM has been rock solid, and I will have my Win2022 VMs up later if time allows.
 

waqar

Ars Praefectus
4,216
Subscriptor
With storage, having a lot of capacity is all good, but I would say faster rather than bigger is always the way to go with virtualisation.
In my current lab I've got a few templates in various OSes: for instance, a Windows Server 2022 template, a few PowerShell scripts that will render out a quick AD domain with a CA, and a few scripts to populate the AD with accounts etc. for testing. Others with Debian, Rocky, Ubuntu, etc.
With automation, speed is the real win.
Spinning rust, even commercial/enterprise drives, is not as fast as consumer PCIe NVMe.
Fast networking using bonded links, lots of memory, and more, cheaper nodes rather than one monolithic point of failure.
Currently running ESXi 8 on a pair of HP ProDesk 600 G3 SFFs, with 64GB of DDR4 RAM and an i5 processor in each.
They are managed by VCSA 8.
Licenses are VMUG ones, a couple of hundred bucks a year. With that you get all the stuff you'll want to play with, with licenses that last the length of the membership (1 year).
In the process of putting some SFP+ Broadcom 10GbE cards into them; they've got a couple of ports, so one for storage and one for vMotion.
Best toy I recently got was an Asustor Flashstor 12, an all-flash NAS. It's not production, but it lifts for a homelab.
It's running TrueNAS Scale.
It's running RAIDZ1, 7 wide, with about 10TB available, and it's got dedup on as well. Virtualisation is one of the best use cases for it. As it's all flash, I've had no issues so far, and I haven't even leveraged the 10GbE bits yet. It will clone out a 2022 server template comfortably under a minute; I expect that might improve once it's all 10GbE (ratio check sketch below).
10TB is a lot of VMs if you go by minimum spec for the bulk of system roles and take actual live data out of the equation.
If you are using this as a test bed, there won't be any real data to speak of.
The minimum spec for a lot of stuff is tiny for storage.
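If you want to see what the dedup and compression actually buy you, the ratios are quick to check from the Scale shell (a sketch; pool/dataset names are placeholders):
```
# Pool-wide dedup ratio and per-dataset compression/space figures.
zpool get dedupratio flash
zfs get compressratio,used,logicalused flash/vmstore
```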
 