one DC of 3 failed, should I try to resuscitate it or is it easier to build a new one?

chalex

Ars Legatus Legionis
11,286
Subscriptor++
I inherited an AD environment with 3 DCs. One of them has recently failed (we actually didn't even notice for a long time). The machine is dead and won't boot. Everything in the domain is still working just fine though. AFAIK the DCs don't have any other functions on them (we have separate DHCP servers, etc).

Should I try to somehow recover/restore this system? Or is it easier to just build a new Windows Server and join it as the third DC? Or should I just leave things as is and just stay running with 2 DCs?

I'm somewhat unfamiliar with Windows AD admin details but the good news is that the DCs are Windows Server 2019 which is quite new and the overall configuration is simple and minimal, as far as I know.

I hope I'm asking the right questions. Let me know if there is something else I need to consider.
 
Regardless of what you decide, you need to clean up the server metadata of the server which was forcibly removed. See this guide: https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/deploy/ad-ds-metadata-cleanup

Also make sure your FSMO roles weren't on that domain controller. Run netdom query fsmo and it will tell you. If any were on the dead DC, seize them onto another domain controller.

Once that's done, run dcdiag /e /q /c from a domain controller and make sure there aren't any further errors.

As for whether you should add a third domain controller, it depends on your environment. Would all three domain controllers be physically located in the same datacenter / building?
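
The FSMO check described above can be automated once you have the text output of netdom query fsmo. The following is only a minimal sketch in Python: the sample output and the hostnames (dc1, dc2, dc3, example.local) are made up for illustration, and the function name roles_on is hypothetical. It assumes each role line has the role name and the holder's FQDN separated by two or more spaces.

```python
import re

# Hypothetical sample of `netdom query fsmo` output; hostnames are made up.
SAMPLE = """\
Schema master               dc1.example.local
Domain naming master        dc1.example.local
PDC                         dc2.example.local
RID pool manager            dc2.example.local
Infrastructure master       dc2.example.local
The command completed successfully.
"""


def roles_on(dead_dc, netdom_output):
    """Return the FSMO role names held by the DC whose short name is dead_dc."""
    roles = []
    for line in netdom_output.splitlines():
        # Assumption: role name and holder are separated by 2+ spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) != 2:
            continue  # skips blank lines and the trailing status message
        role, holder = parts
        if holder.split(".")[0].lower() == dead_dc.lower():
            roles.append(role)
    return roles


if __name__ == "__main__":
    stranded = roles_on("dc3", SAMPLE)
    print(stranded or "No FSMO roles on the dead DC")
```

An empty list means no roles need to be seized; anything else is the list of roles to move to a surviving DC.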
 

chalex

netdom query fsmo shows 5 items and the first two are dc1 and the last 3 are dc2. So that looks fine.

I ran dcdiag yesterday and saw a bunch of replication errors about dc3, which is how I discovered one of the DCs was down.

The three DCs here are distributed across different networks/systems in a way that is probably overkill. So maybe I'll just do the metadata cleanup and stay with 2 DCs.

What happens if I don't do the cleanup? DC3 has already been down for several months and seems like nothing is really affected by that.
 
If you skip the cleanup, the other domain controllers will fruitlessly try to replicate with the unavailable DC, as indicated by dcdiag. Your users may experience longer logon times because they could try to authenticate against the DC that's no longer there. Also, if the DC was a DNS server, anything trying to use that will have to wait for a timeout when trying to use that DC's IP.
 

chalex

OK, so I'm convinced that I should do the cleanup and then we'll have 2 working ones, and then I can decide whether we really need a third one or not.

If we do decide to build a fresh third one, would you consider that "easy"? Just build a new Windows Server 2019 machine and run some command to "join" it as a new DC?
 
Yes, the process is very straightforward. Install Windows Server, install the Active Directory Domain Services server role, restart, and then promote it to a domain controller via Server Manager. Just be aware that Windows Server 2019 drops support for SYSVOL replication via FRS (File Replication Service) and requires that your domain is using DFS-R for SYSVOL replication before it can act as a domain controller. What are your current Domain and Forest Functional Levels?
 

chalex

The domain and forest functional levels both say "Windows Server 2012 R2"

The full dcdiag output after removing the third DC (obfuscated domain name word)

C:\Windows\system32>dcdiag /e /q /c
[PVE-DC01] No security related replication errors were found on this DC! To target the connection to a specific source DC use /ReplSource:<DC>.
         There are warning or error events within the last 24 hours after the SYSVOL has been shared. Failing SYSVOL replication problems may cause Group Policy problems.
         ......................... PVE-DC01 failed test DFSREvent
         An error event occurred. EventID: 0xC0000583
            Time Generated: 10/10/2023 11:55:49
            Event String:
            Active Directory Domain Services failed to construct a mutual authentication service principal name (SPN) for the following directory service.
         ......................... PVE-DC01 failed test KccEvent
         ** Did not run Outbound Secure Channels test because /testdomain: was not entered
[PVE-DC02] No security related replication errors were found on this DC! To target the connection to a specific source DC use /ReplSource:<DC>.
         The event log DFS Replication on server pve-dc02.obfuscated.local could not be queried, error 0x6ba "The RPC server is unavailable."
         ......................... PVE-DC02 failed test DFSREvent
         The event log Directory Service on server pve-dc02.obfuscated.local could not be queried, error 0x6ba "The RPC server is unavailable."
         ......................... PVE-DC02 failed test KccEvent
         ** Did not run Outbound Secure Channels test because /testdomain: was not entered
         The event log System on server pve-dc02.obfuscated.local could not be queried, error 0x6ba "The RPC server is unavailable."
         ......................... PVE-DC02 failed test SystemLog

Test results for domain controllers:
   DC: pve-dc01.obfuscated.local
   Domain: obfuscated.local
      TEST: Delegations (Del)
         Error: DNS server: ssf-dc02.obfuscated.local. IP:<Unavailable> [Missing glue A record]
   DC: pve-dc02.obfuscated.local
   Domain: obfuscated.local
      TEST: Delegations (Del)
         Error: DNS server: ssf-dc02.obfuscated.local. IP:<Unavailable> [Missing glue A record]

Summary of DNS test results:
                              Auth Basc Forw Del  Dyn  RReg Ext
   _________________________________________________________________
   Domain: obfuscated.local
      pve-dc01                PASS WARN PASS FAIL PASS PASS n/a
      pve-dc02                PASS WARN PASS FAIL PASS PASS n/a
......................... obfuscated.local failed test DNS
***ERROR: There is an inconsistency in the DS, suggest you run dcdiag in a few moments, perhaps on a different DC.
......................... obfuscated.local failed test Intersite
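
When dcdiag output is this long, it can help to pull out just the "failed test" lines before reading the details. Here is a minimal Python sketch that does so; the sample lines are copied from the output above, while the function name failed_tests and the regular expression are illustrative assumptions rather than anything dcdiag provides.

```python
import re

# "failed test" lines copied from the dcdiag output in the post above.
SAMPLE = """\
......................... PVE-DC01 failed test DFSREvent
......................... PVE-DC01 failed test KccEvent
......................... PVE-DC02 failed test SystemLog
......................... obfuscated.local failed test DNS
"""

# A run of dots, then the target (a DC or the domain), then the test name.
FAILED = re.compile(r"\.{5,}\s+(\S+) failed test (\S+)")


def failed_tests(dcdiag_output):
    """Map each target (DC or domain) to the list of tests it failed."""
    failures = {}
    for target, test in FAILED.findall(dcdiag_output):
        failures.setdefault(target, []).append(test)
    return failures


if __name__ == "__main__":
    for target, tests in failed_tests(SAMPLE).items():
        print(target, "->", ", ".join(tests))
```

Because the pattern does not depend on line breaks, it also works on output that has been pasted as a single long line.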
 

oikjn

Ars Scholae Palatinae
969
Subscriptor++
How big is your environment, both physically and logically? At a minimum you should have two DCs, but unless you have multiple sites or have many hundreds of users or some other strange use case, you should easily be able to stick with 2 DCs. Also, you say "died" like it is a physical computer. Virtualize them so you don't have to deal with hardware specific problems again.
 
Virtualize them so you don't have to deal with hardware specific problems again.
I believe best practice is still at least one physical DC (usually the primary). All others can be virtualized. I think the belief (not sure how true it is anymore) is that the host server may not come up if it can't contact any DCs, which is possible if all are virtualized and, say, the HV service is broken or something. I've worked where everything (including DCs) is virtualized and also where only the DCs are physical. I never noticed much of a difference either way, as long as the infrastructure is properly maintained.
 
Ideally you have DCs spread across different virtual clusters and ideally at different sites so nothing would take them down all at once. That said, not having a DC up isn’t exactly an issue that keeps a hypervisor down, but you do need to document how to bring up a DC in a totally down situation so you aren’t learning that on the fly.
 

Andrewcw

Ars Legatus Legionis
18,129
Subscriptor
You can virtualize it all, of course, on different physical hosts. I remember back in the day having a single DC on a 2000 network. It would spend an extra 5-10 minutes just frozen there, trying in vain to find another DC. That might be part of why a physical DC was best practice when virtualization was newish: with VMs just sitting there, you start to scratch your head over whether it's the guest or the host that's freezing.

The secondary controller on my network is a VM with 1 GB of RAM whose sole purpose is to keep the primary happy when it needs rebooting.
 

chalex

OK, so today's dcdiag after I cleaned some stuff up (old DNS entries, etc) is much shorter and I think it means everything is happy.

C:\Windows\system32>dcdiag /e /q /c
[PVE-DC01] No security related replication errors were found on this DC! To target the connection to a specific source DC use /ReplSource:<DC>.
         ** Did not run Outbound Secure Channels test because /testdomain: was not entered
[PVE-DC02] No security related replication errors were found on this DC! To target the connection to a specific source DC use /ReplSource:<DC>.
         ** Did not run Outbound Secure Channels test because /testdomain: was not entered

Which means I can now go back to troubleshooting my original issue! I'll follow up in another thread.
 

stevenkan

Ars Legatus Legionis
15,662
What is there to tell? It serves its purpose as a secondary DC that handles AD. Other than crying in pain when it needs to do an update, given how resource-starved it is.

The OP inherited a setup with 3 DCs but never got into why they had 3 or why only 2 were in operation.
I'm curious how you configured a VM to run a DC in only 1 GB of RAM.
 

stevenkan

You can configure your VM to be anything. But it's Server 2016: 512 MB of RAM minimum. It has double the minimum!
Ah, my thinking is stuck in the 90s where "server" meant >>>> "workstation." I keep forgetting that a DC doesn't need to do much, especially for a small number of users.

This little <$200 Intel NUC would probably do the job, if the OS has the appropriate drivers. For one of my remote sites I thought about spinning up a VM like you did, but I'd want to have it on a separate network segment from the PC that's there, and I don't want to rely on this other box for DC functionality.
 

stevenkan

Server 2022 installed with no additional drivers required. So I have a DC for <$200 hardware cost.
 

dodexahedron

Ars Praefectus
3,356
Subscriptor++
This, re: the metadata cleanup, FSMO check, and dcdiag advice given earlier in the thread.
Also note that, if it had certain other server roles, like DHCP, you will need to manually de-authorize it as a DHCP server or it will show up as an available option in the DHCP MMC snap-in when switching servers. Doesn't really seem to affect anything, but it's just another bit of cleanup to do.

DNS also usually ends up needing some manual cleanup, as the dead DC won't have been removed as a name server for anything except your domain zone. This is a job for PowerShell if you have more than a small handful of zones (especially reverse zones). You can spend an hour manually clicking through all the zones in MMC, or you can write a 3-line script that removes the entry from all zones. There may be other lingering records in DNS, too, so just scour for its names and addresses across all zones, especially for SRV records, which can cause very noticeable delays for users.
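
To illustrate the "scour for its names and addresses" step, here is a rough Python sketch. It is not the PowerShell script mentioned above and it does not talk to a DNS server: it assumes you have already exported your records into (zone, name, type, data) rows by whatever means you prefer. The Record type, the stale_records function, and all hostnames and addresses are hypothetical.

```python
from typing import NamedTuple


class Record(NamedTuple):
    zone: str
    name: str   # owner name within the zone
    rtype: str  # A, NS, SRV, PTR, ...
    data: str   # record data as text


# Hypothetical export; dc3 / 10.0.0.13 stand in for the dead DC.
RECORDS = [
    Record("example.local", "@", "NS", "dc1.example.local."),
    Record("example.local", "@", "NS", "dc3.example.local."),
    Record("example.local", "dc3", "A", "10.0.0.13"),
    Record("example.local", "_ldap._tcp", "SRV", "0 100 389 dc3.example.local."),
    Record("example.local", "_ldap._tcp", "SRV", "0 100 389 dc2.example.local."),
    Record("0.0.10.in-addr.arpa", "13", "PTR", "dc3.example.local."),
]


def stale_records(records, dead_names, dead_ips):
    """Return records whose owner name or data mentions the dead DC."""
    names = {n.lower().rstrip(".") for n in dead_names}
    ips = set(dead_ips)
    hits = []
    for rec in records:
        owner = rec.name.lower().rstrip(".")
        # Split the data so SRV targets ("0 100 389 host.") are matched too.
        tokens = {t.rstrip(".") for t in rec.data.lower().split()}
        if owner in names or tokens & names or tokens & ips:
            hits.append(rec)
    return hits


if __name__ == "__main__":
    for rec in stale_records(RECORDS, ["dc3", "dc3.example.local"], ["10.0.0.13"]):
        print(rec.zone, rec.name, rec.rtype, rec.data)
```

The output is only a review list; actually deleting the records is left to your DNS tooling.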
 

dodexahedron

In my experience (local government), proper licensing for the hardware has always been more of a challenge than finding hardware to act as that extra physical DC.
Plus like.... Consumer hardware with zero redundancy and also now a dead product line (Intel has dropped NUC) as a domain controller?
No thanks.

I mean there's being thrifty, and then there's being thrifty.