As you have probably read, we're getting a new batch of servers quite soon. I want to change our monitoring solution on those new servers: we currently use munin, which only has static graphs and no built-in notifications. I think notifications would be helpful so that we're the first to know when a service is down or a server runs out of memory or storage.

The alternatives that were mentioned are these:

1) munin for graphs, timed ansible for notifications
2) influxdb for storage, collectd for stats collection, graphite for graphs, timed ansible for notifications
3) zabbix (does everything)
4) cacti (does everything with plugins)
5) prometheus (does everything)

My thoughts:

1) Basically the same as what we have now: static daily/weekly/monthly graphs only. On top of that, we'd use a systemd timer unit to run ansible, I suppose. Munin is generally fairly easy to set up and maintain.
2) Four different programs, but each takes care of one specific task. I'm not sure whether this counts as complex or as KISS. I really have no idea about this stack, though, as I have not worked with any element of it.
3) No idea about this one, but at a glance it looked rather complex and kind of icky to use.
4) I used this one once. It's a heavy-duty PHP-based monitoring tool that I probably wouldn't recommend. It was a bitch to set up and use when I tried it some years back. It might have improved since.
5) This is the newcomer. Seems to be doing everything we'd need but I haven't checked it out in detail yet. Configuration format seems fairly easy and graphs are pretty. Might be worthwhile.
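
For options 1 and 2, the notification part is something we'd have to script ourselves and run from a timer. To make that concrete, here is a rough sketch of the kind of check I have in mind. It is plain Python rather than ansible, purely to illustrate the logic; the 90% threshold and the function names are made up for this example, and a real version would send a mail or similar instead of printing.

```python
import shutil

# Hypothetical threshold: alert when a filesystem is more than 90% full.
DISK_USAGE_LIMIT = 0.90


def usage_fraction(used: int, total: int) -> float:
    """Return used/total as a fraction; 0.0 if the total is zero."""
    return used / total if total > 0 else 0.0


def check_disk(path: str = "/", limit: float = DISK_USAGE_LIMIT) -> list[str]:
    """Return a list of alert messages for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    fraction = usage_fraction(usage.used, usage.total)
    if fraction > limit:
        return [f"{path} is {fraction:.0%} full (limit {limit:.0%})"]
    return []


if __name__ == "__main__":
    # Placeholder for a real notification: just print any alerts.
    for message in check_disk("/"):
        print(message)
```

Something like this would run every few minutes from the timer. Options 3 to 5 would presumably replace it with their own alerting rules.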

Would be interested in hearing your thoughts. If you have something to add, just add numbered bullet points. If nothing comes of this, we'll probably go back to munin.

Sven