On 28/06/2020 10:46, Simon Wydooghe via arch-devops wrote:
> Quoting Sven-Hendrik Haase via arch-devops (2020-06-27 20:26:33)
>> Yeah, well, I dunno. That's gonna get really embarrassing if we don't notice for too long at some point.
>
> I'm not familiar with your RAID and monitoring setup, so apologies if this info is not applicable, redundant or useless. If you're fans of Prometheus, node_exporter exposes mdadm metrics by default about the state of all disks, so you could easily alert on the failed state, assuming we're talking about an mdadm RAID here.
We use btrfs RAID, but we are interested in switching to Prometheus; it's currently just a manpower issue. So if you are interested in contributing knowledge, Ansible roles, or Prometheus Alertmanager rules, we would certainly welcome that.

Greetings,
Jelle
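
For reference, a minimal sketch of the kind of alerting rule suggested above for mdadm arrays. It assumes node_exporter 1.0 or later, where the mdadm collector exposes node_md_disks with a "state" label. The group name, alert name, 5m hold time, and severity label are illustrative choices, not an existing Arch configuration. It does not cover btrfs RAID, which would need different metrics.

```yaml
groups:
  - name: mdadm            # hypothetical group name
    rules:
      - alert: MdRaidDiskFailed   # hypothetical alert name
        # Fires when any md array reports one or more failed member disks.
        expr: node_md_disks{state="failed"} > 0
        for: 5m            # assumed hold time to avoid flapping
        labels:
          severity: critical
        annotations:
          summary: "RAID array {{ $labels.device }} on {{ $labels.instance }} has failed disks"
```

A rule file like this would be listed under rule_files in prometheus.yml, with Alertmanager handling the notification routing.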