Proxmox Networking issues - migrated VM

IncrHulk

Ars Praefectus
3,460
Subscriptor++
EDIT: I FOUND THE ISSUE(S). I will post the fix in a new comment after I walk the dog; she is insisting loudly that she needs a walk.

Where to start? I got a Proxmox system up and running and painlessly migrated my Windows 2019 VM off my ESXi host. I added a second and third node and joined them in a cluster, and everything worked seamlessly. Until today.

Config Details (to keep things a little clearer)
Node 1: PVE01
Node 2: PVE02
Node 3: PVE03

I decided I'd add a second SSD to PVE01. I'd purposefully left it out as part of learning Proxmox. I wanted to migrate VMs off PVE01 to PVE02 and PVE03 to test being able to briefly bring a node offline for hardware upgrades. Okay, I migrated my Home Assistant VM off PVE01 to PVE02. The VM powered on successfully after the migration, and Bob's your uncle.
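For reference, this is roughly what that migration looks like from the CLI instead of the GUI; the VM ID 105 below is just a placeholder:

# live-migrate a running VM to another node in the cluster
qm migrate 105 pve02 --online
# confirm it landed and is running on the target node
qm list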

I then went to migrate the Windows 2019 VM to PVE03. The VM moved over to the new node, but as soon as it powered on, I lost network connectivity to PVE03.

  • The cluster could not communicate with it. I could, however, ping the VM via IP and FQDN. I could not ping PVE03 via IP (I did not test via FQDN).
  • I connected via the local console and confirmed the VM was running using qm list, then issued a qm shutdown (see the sketch after this list).
  • I tried pinging the gateway from PVE03's console and it failed. I could ping PVE03's assigned IP and loopback.
  • I rebooted PVE03 and it lost connectivity again almost immediately, basically as soon as the VM finished its boot-up cycle.
  • I tried editing /etc/pve/qemu-server/pve03.conf and got an access denied error when I tried to write the file out from nano, even though I was in the console as root.
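For completeness, these are roughly the commands I was running at the console; the VM ID 105 and the addresses are placeholders, not necessarily my real ones:

qm list               # confirm which VMs the node thinks are running
qm shutdown 105       # ask the guest to shut down cleanly
ping 10.0.10.254      # the gateway: no replies
ping 10.0.10.3        # the node's own address: replies fine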
Thinking maybe the VM had been corrupted during the migration between nodes, I re-imported the VM from my ESXi box to PVE02. As soon as the import completed and I powered up the VM, I lost connectivity to PVE02. Exact same symptoms.
  • Logged into the console and confirmed the VM was running and shut it down.
  • I still could not ping the host or ping the gateway from the host.
  • Rebooted PVE02 and regained connectivity to the host via the GUI.
  • I removed the NIC, then changed it from the VMXNET3 model to E1000 and powered the VM on.
  • Lost connectivity again.
Eventually I narrowed it down: as soon as I assign the VLAN tag for the VLAN the VM is supposed to be on, it kills the network on the host. If I removed the NIC from the VM and powered it on, all was OK. I'm now stuck at this point. DNS (the Windows VM is a DC) is back living on my ESXi server for now.
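For context, the VLAN tag ends up as a property on the VM's NIC line in /etc/pve/qemu-server/<vmid>.conf. The MAC address and VLAN number below are made up, but the line looks roughly like this:

net0: e1000=BC:24:11:12:34:56,bridge=vmbr0,tag=20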

Does anyone have any suggestions what might be wrong?
 

IncrHulk

Ars Praefectus
3,460
Subscriptor++
As I had narrowed it down to what appeared to be a networking issue, I verified my switch configs. UniFi had pushed out an update last night, so I rolled back the configuration. This did not fix the issue. I went back and reviewed /etc/network/interfaces on PVE01.
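A quick way to compare that file across nodes, rather than eyeballing it, is something like the following; it assumes root SSH between the nodes, with pve02 as the example target:

# compare the local interfaces file against the copy on another node
diff /etc/network/interfaces <(ssh root@pve02 cat /etc/network/interfaces)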

That all looked good, so I connected to PVE02 and reviewed /etc/network/interfaces. Wait a minute...

I had missed configuring vmbr0 fully. The configuration on PVE02 and PVE03 was missing:

bridge-vlan-aware yes
bridge-vids 2-4094

For reference, the full /etc/network/interfaces as it stood:

auto lo
iface lo inet loopback

iface eno2 inet manual

auto vmbr0
iface vmbr0 inet static
        bridge-ports eno2
        bridge-stp off
        bridge-fd 0

auto vmbr0.2
iface vmbr0.2 inet static
        address 10.0.10.3/24
        gateway 10.0.10.254

iface wlo1 inet manual
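And for anyone who hits the same thing: this is roughly what the vmbr0 stanza should end up looking like once the VLAN-aware options are added (keeping eno2 as the bridge port, per the file above). ifreload -a from ifupdown2 should apply it without a reboot, though a reboot obviously works too.

auto vmbr0
iface vmbr0 inet static
        bridge-ports eno2
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

# apply the updated network configuration (ifupdown2)
ifreload -a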