How to hot-swap hardware that does not support hot swap?

I have an I/O board with some hardware attached to it. The problem is that if the attached hardware intermittently loses power and then comes back, the whole system has to be rebooted before the host recognizes it again. It is a Linux system, by the way.

Is there a way to reinitialize the hardware without rebooting the entire system? Could I just restart some systemd daemon, use some interface to trigger a Function Level Reset (FLR) on the hardware's PCIe device, or cycle the device through its power states (D0 to D3cold and back to D0) to cut power to the slot, forcing a reset, and then restart the driver? How does one administer such things in a Linux environment? Does anyone know?
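For reference, no systemd daemon is involved here; the kernel exposes per-device resets through sysfs. Below is a minimal Python sketch, assuming root privileges and a kernel that provides the `reset` attribute for the device. Writing `1` to `/sys/bus/pci/devices/<BDF>/reset` asks the kernel to reset that one function with whichever method it supports (such as FLR, a power-management reset, or a bus reset), saving and restoring its config space around the reset. On newer kernels (5.15 and later) the `reset_method` attribute lists those methods. Actually cutting slot power is a separate mechanism that only works on hotplug-capable slots exposing `/sys/bus/pci/slots/<slot>/power`. The address `0000:03:00.0` used below is a hypothetical placeholder; the real one comes from `lspci -D`.

```python
from pathlib import Path


def pci_device_dir(bdf: str, sysfs_root: Path = Path("/sys")) -> Path:
    """Return the sysfs directory of a PCI function, e.g. /sys/bus/pci/devices/0000:03:00.0."""
    return sysfs_root / "bus" / "pci" / "devices" / bdf


def supported_reset_methods(bdf: str, sysfs_root: Path = Path("/sys")) -> list:
    """List the reset methods the kernel reports for this function.

    `reset_method` only exists on newer kernels, so an empty list means
    "attribute missing", not necessarily "no reset possible".
    """
    attr = pci_device_dir(bdf, sysfs_root) / "reset_method"
    if not attr.exists():
        return []
    return attr.read_text().split()


def reset_pci_function(bdf: str, sysfs_root: Path = Path("/sys")) -> None:
    """Ask the kernel to reset a single PCI function (requires root).

    The kernel chooses the reset method itself; this just triggers it.
    """
    attr = pci_device_dir(bdf, sysfs_root) / "reset"
    if not attr.exists():
        raise FileNotFoundError(f"{attr} not found: no reset method available for {bdf}")
    attr.write_text("1")
```

Usage, as root: `print(supported_reset_methods("0000:03:00.0"))`, then `reset_pci_function("0000:03:00.0")`. Whether a function reset is enough for the board to re-detect the external hardware depends on the board and its driver; unbinding the driver before the reset may be the safer order.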

I see that there are commands such as rmmod and modprobe that can unload and reload the driver, but I'm not sure whether doing so properly resets the underlying hardware.
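On that point: `rmmod`/`modprobe -r` unloads the module for every device it drives, and reloading it only reinitializes the hardware to the extent the driver's probe routine does so; it is not a guaranteed hardware reset. A narrower variant is to unbind and rebind just this one device through the driver's sysfs `unbind` and `bind` files. The sketch below assumes root and a device that currently has a driver bound; the address passed in is whatever `lspci -D` reports.

```python
from pathlib import Path
from typing import Optional


def current_driver(bdf: str, sysfs_root: Path = Path("/sys")) -> Optional[str]:
    """Return the name of the driver bound to a PCI function, or None if unbound."""
    link = sysfs_root / "bus" / "pci" / "devices" / bdf / "driver"
    if not link.exists():
        return None
    # `driver` is a symlink to .../bus/pci/drivers/<name>; the target's basename is the name.
    return link.resolve().name


def rebind_driver(bdf: str, sysfs_root: Path = Path("/sys")) -> str:
    """Unbind and rebind one PCI function's driver (requires root).

    Unlike unloading the module, this only touches this device, but it still
    relies on the driver's probe routine to reinitialize the hardware.
    """
    driver = current_driver(bdf, sysfs_root)
    if driver is None:
        raise RuntimeError(f"{bdf} has no bound driver")
    driver_dir = sysfs_root / "bus" / "pci" / "drivers" / driver
    (driver_dir / "unbind").write_text(bdf)  # detaches the driver from this device only
    (driver_dir / "bind").write_text(bdf)    # runs the driver's probe again
    return driver
```

The module-wide equivalent is `modprobe -r <module>` followed by `modprobe <module>`, which also affects any other devices using that driver.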

I also found this method, although I don't know whether it is still up to date in 2024:

 

Paladin

You can also damage the device while removing or installing it. The PCIe connector is not meant for hot swapping, so you can get arcing or shorts unless you are very careful to remove it precisely.

The voltage is not that high, so it is probably OK, but there is a risk. I would just reboot. If it happens often enough to be a problem, fix whatever causes the power loss. Check for physical issues: if it is a physical problem, such as the card drooping out of the slot, not being properly seated, or a damaged slot, it could eventually become a fire hazard.
 
There are I/O cards that extend whole PCIe slots out to external ports, such as these:



My intention is not to physically hot-swap external PCIe devices. What I have found is that when power to those devices intermittently fails, the host system is unable to reattach them. Hence my wording: temporarily cutting power to such a device is essentially the same as hot-swapping it.

I want this to be handled automatically without needing physical presence near the machine.
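One way to automate this without anyone at the machine is a small watchdog that polls a health check and, on failure, does a software "unplug" of the device (writing `1` to `/sys/bus/pci/devices/<BDF>/remove`) followed by a bus rescan (writing `1` to `/sys/bus/pci/rescan`). This is a minimal sketch, assuming root. The health check is entirely hypothetical and has to be supplied for the actual hardware (for example, checking that a device node exists or that a query to the external hardware succeeds). Whether remove plus rescan brings the device back depends on the board; it is not guaranteed.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def remove_and_rescan(bdf: str, sysfs_root: Path = Path("/sys")) -> None:
    """Software hot-unplug of one PCI function, then re-enumerate the bus (requires root)."""
    (sysfs_root / "bus" / "pci" / "devices" / bdf / "remove").write_text("1")
    (sysfs_root / "bus" / "pci" / "rescan").write_text("1")


def watchdog(
    is_healthy: Callable[[], bool],
    recover: Callable[[], None],
    poll_seconds: float = 5.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `is_healthy` and call `recover` whenever it returns False.

    Runs forever unless `max_checks` is given; returns how many recoveries ran.
    """
    recoveries = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if not is_healthy():
            recover()
            recoveries += 1
        sleep(poll_seconds)  # avoid hammering the device with back-to-back resets
    return recoveries
```

Usage might look like `watchdog(lambda: Path("/dev/hypothetical_dev0").exists(), lambda: remove_and_rescan("0000:03:00.0"))`, where both the device node and the PCI address are placeholders. Run as a systemd service, it would survive reboots and restart on failure.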
 

teubbist

Paladin said: "You can also damage the device while removing or installing it. The PCIe connector is not meant for hot swapping, so you can get arcing or shorts unless you are very careful to remove it precisely."
AFAIK this isn't completely true. The PCIe connector is hot-swap capable, using a system similar to SATA's: longer ground pins and a slightly recessed presence-detect (enable) pin.

This Stack Exchange post covers most of it.

But in practice it's correct, in that most devices aren't designed to support hot swapping. Outside of a server chassis with PCIe card carriers that use a latch or lever mechanism to ensure cards are inserted evenly, I wouldn't want to rely on plugging in something like an x16 card by hand and getting the contact insertion 100% correct.
 

Paladin

Yeah, I would still recommend looking for a solution to the actual problem first. It could be as simple as moving the card to a different slot, or checking whether it needs a support bracket to stay fully inserted. Or it could be the board not supplying enough power consistently (motherboard or power supply issues, a missing secondary power input for the card, etc.). Eventually, a software workaround that reloads the card will just result in the card dying from repeated power cycling.
 

Lord Evermore

I may be wrong, but my reading of this is that it's not an issue with the PCIe slot or motherboard. The I/O board is installed in the PCIe slot, and some external hardware with its own power supply is connected to it. If that external hardware loses power (for whatever reason, this happens regularly), the I/O card, and thus the OS, no longer recognizes the external hardware until the I/O card itself is also reset, which requires rebooting the computer. OP is looking for a way to reinitialize the I/O board via software.

If the I/O board itself is no longer recognized by the system because the external hardware reset, that's a pretty shitty I/O board design, but it wouldn't indicate a fault with the mainboard or the card's power circuitry. If the I/O board is still recognized by the OS and only the external hardware stops working, then there ought to be a software command that can be sent to the I/O board to tell it to reinitialize, without rebooting the whole system, hot-plugging the card, or reinitializing the PCIe subsystem.
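To tell those two cases apart on the affected machine, a rough check is possible from sysfs. This is a minimal sketch, assuming the board's PCI address is known (the one below is a placeholder) and relying on the usual PCIe behavior that config reads to a function that no longer responds return all ones, so a vendor ID of `0xFFFF` suggests the board itself has dropped off. It is a heuristic, not a definitive diagnosis.

```python
from pathlib import Path


def board_status(bdf: str, sysfs_root: Path = Path("/sys")) -> str:
    """Classify a PCI function as 'not enumerated', 'not responding', 'no driver' or 'ok'."""
    dev = sysfs_root / "bus" / "pci" / "devices" / bdf
    if not dev.exists():
        return "not enumerated"  # the kernel no longer lists the board at all
    # The first two bytes of config space are the vendor ID (little-endian).
    with open(dev / "config", "rb") as f:
        head = f.read(2)
    if len(head) < 2 or int.from_bytes(head, "little") == 0xFFFF:
        return "not responding"  # enumerated earlier, but reads now come back as all ones
    if not (dev / "driver").exists():
        return "no driver"  # board answers, but nothing is bound to it
    return "ok"  # board looks fine; the problem is more likely downstream of it
```

An `ok` result points at the second case described above (board fine, external hardware lost), where a driver-level reinitialization is the natural place to look; `not enumerated` or `not responding` points at the board itself.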