Before I knew about Docker:
I’ve seen GRUB errors on Proxmox boot drives after unscheduled shutdowns (power outages). The problem and solution are described in this thread/post:
This happened on desktop-class hardware, with both DRAM-less and higher-end SSDs. Since the 720d is enterprise-level hardware, it probably has write caching with some form of power-loss protection that prevents this.
The only way I’ve been able to prevent this is to issue a shutdown command as soon as a power outage occurs. I do this by monitoring the battery voltage on the inverter/UPS: as soon as it dips below the float voltage (which indicates a power outage, since the inverter is no longer charging the batteries), a Node-RED flow triggers a shutdown of the cluster through Proxmox’s HTTP API.
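A minimal Python sketch of that trigger, for illustration only (the real thing is a Node-RED flow). It assumes a 48 V battery bank with a 54.0 V float voltage, a hypothetical 0.3 V margin against sensor noise, a hypothetical node name `pve1` and host, and a placeholder Proxmox API token. It only builds the request: `POST /api2/json/nodes/{node}/status` with `command=shutdown`, authenticated with a `PVEAPIToken` header.

```python
import urllib.parse
import urllib.request

# Hypothetical values: adjust for your own battery bank and cluster.
FLOAT_VOLTAGE = 54.0  # assumed float voltage of a 48 V lead-acid bank
PVE_HOST = "pve1.example.lan"  # hypothetical Proxmox host
API_TOKEN = "root@pam!nodered=00000000-0000-0000-0000-000000000000"  # placeholder


def is_power_outage(voltage: float, float_voltage: float = FLOAT_VOLTAGE,
                    margin: float = 0.3) -> bool:
    """Below float (minus a hypothetical noise margin) means the batteries aren't being charged."""
    return voltage < float_voltage - margin


def build_shutdown_request(node: str, host: str = PVE_HOST,
                           token: str = API_TOKEN) -> urllib.request.Request:
    """Build, but do not send, a node shutdown call to the Proxmox API."""
    url = f"https://{host}:8006/api2/json/nodes/{urllib.parse.quote(node)}/status"
    body = urllib.parse.urlencode({"command": "shutdown"}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"PVEAPIToken={token}"},
    )


if __name__ == "__main__":
    if is_power_outage(52.8):
        req = build_shutdown_request("pve1")
        print(req.get_method(), req.full_url)
        # To actually send it: urllib.request.urlopen(req, context=ssl_ctx),
        # where ssl_ctx trusts the cluster's (often self-signed) certificate.
```

One request per node is needed; with API tokens no CSRF ticket is required.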
I’ll try and summarize:
- MQTT flow starting with battery status: logs the battery voltages reported by Tasmota into InfluxDB.
- MQTT flow starting with pve netwatch: updates the global arrays ‘pve_nodes-offline’ and ‘pve_nodes-online’ to record whether each node is online or offline, as reported by a virtualized router.
- MQTT flow starting with deb netwatch: watches my VMs. If one goes offline and no shutdown is planned or in progress, it gets started or restarted, depending on whether it crashed or hung. VM status is pulled from Proxmox’s metrics, which are sent directly to InfluxDB; the MQTT trigger comes from another virtualized router.
- Flow that runs every 30 seconds: gets the battery voltages from InfluxDB and processes them in the battery monitor node:
  - if there’s a power outage, immediately shut down the workstation cluster
  - also during an outage, start logging battery voltages to Telegram (all of the link-out nodes in this flow go to Telegram)
  - if the battery voltages fall too low, trigger a shutdown of the other cluster
  - five minutes after the shutdown signal, turn off all the sockets in the “PDU” (a bunch of Tasmota smart plugs)
  - after power returns and has stayed on for 10 minutes, trigger start-up by turning the “PDU” back on (the machines then boot thanks to the BIOS setting to power on after AC power loss)
This was one of my earliest Node-RED flows, so it’s probably in need of a lot of refinement.
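The two netwatch flows in the list above can be reduced to two small functions. This Python sketch assumes the netwatch message only says whether a host is up or down, and it interprets “crashed” as Proxmox reporting the VM as `stopped` (so it is started) and “hung” as Proxmox still reporting it as `running` while netwatch says it is unreachable (so it is hard-reset). The returned strings are the suffixes of Proxmox’s `/nodes/{node}/qemu/{vmid}/status/start` and `.../status/reset` API endpoints; the function names are hypothetical.

```python
from typing import List, Optional


def update_node_lists(online: List[str], offline: List[str], node: str, is_up: bool) -> None:
    """Move `node` into the online or offline list (mirrors the two global arrays)."""
    for nodes in (online, offline):
        if node in nodes:
            nodes.remove(node)
    (online if is_up else offline).append(node)


def vm_watchdog_action(netwatch_up: bool, pve_status: str,
                       shutdown_planned: bool) -> Optional[str]:
    """Decide what to do with a VM; None means leave it alone."""
    if netwatch_up or shutdown_planned:
        return None
    if pve_status == "stopped":  # assumed crash: the VM process is gone, so start it
        return "status/start"
    if pve_status == "running":  # assumed hang: process alive but unreachable, so reset it
        return "status/reset"
    return None
```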
edit: a partial clip of the staggered start-up sequence:
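A staggered start-up like this can be sketched as a list of delayed Tasmota power commands. This Python sketch assumes the plugs are controlled over MQTT with Tasmota’s `cmnd/<topic>/POWER` command; the plug topics and the 30-second gap are hypothetical. The `publish` callback is left to the caller, e.g. `lambda t, p: client.publish(t, p)` with a paho-mqtt client.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical plug topics, in the order the machines should come up.
PLUG_TOPICS = ["pdu_switch", "pdu_nas", "pdu_pve1", "pdu_pve2"]
STAGGER_S = 30  # assumed gap between sockets


def startup_schedule(topics: List[str], stagger_s: float = STAGGER_S) -> List[Tuple[float, str, str]]:
    """Return (offset_seconds, mqtt_topic, payload) tuples, one socket per step."""
    return [(i * stagger_s, f"cmnd/{t}/POWER", "ON") for i, t in enumerate(topics)]


def run_schedule(schedule: List[Tuple[float, str, str]],
                 publish: Callable[[str, str], None],
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Publish each command once its offset from the start has elapsed."""
    elapsed = 0.0
    for offset, topic, payload in schedule:
        sleep(offset - elapsed)
        elapsed = offset
        publish(topic, payload)
```

Powering the network gear and storage first gives the Proxmox nodes something to reach when their BIOS brings them up.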