Failed jobs monitor

This monitor checks whether any new jobs have failed. It triggers an alert as soon as it detects that the job at the top of a queue's failed jobs list has not triggered an alert before.

For this monitor to work, your queues must keep at least one failed job. Set the removeOnFail option to false (the default) or to a number greater than or equal to 1.
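As a rough illustration of which removeOnFail settings satisfy this requirement, here is a hypothetical TypeScript helper, keepsFailedJobs. It is a sketch, not part of any library, and it only models the boolean and number forms described above. In BullMQ, for example, the option can be passed per job, as in queue.add(name, data, { removeOnFail: 10 }), or through the queue's defaultJobOptions.

```typescript
// Hypothetical helper (not a library API): returns true when a removeOnFail
// setting leaves at least one failed job for the monitor to inspect.
// Assumes removeOnFail is a boolean, a number, or omitted.
type RemoveOnFail = boolean | number | undefined;

function keepsFailedJobs(removeOnFail: RemoveOnFail): boolean {
  // Omitted or false: failed jobs are never removed (the default).
  if (removeOnFail === undefined || removeOnFail === false) {
    return true;
  }
  // true: failed jobs are removed right away, so none are left.
  if (removeOnFail === true) {
    return false;
  }
  // A number: up to that many of the most recent failed jobs are kept.
  return removeOnFail >= 1;
}

console.log(keepsFailedJobs(undefined)); // true
console.log(keepsFailedJobs(false)); // true
console.log(keepsFailedJobs(10)); // true
console.log(keepsFailedJobs(0)); // false
console.log(keepsFailedJobs(true)); // false
```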

Because it is not uncommon for several jobs to fail together, the monitor does not trigger new alerts until the currently triggered alert has been acknowledged.

Alerts are triggered per queue, so several alerts belonging to different queues can be in the "triggered" status at the same time.
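The alerting rules above (alert on a failed job that has not alerted before, suppress further alerts until acknowledgment, track everything per queue) can be modeled as a small state machine. The TypeScript sketch below is illustrative only: the class and method names (FailedJobsMonitor, check, acknowledge) are hypothetical, and it assumes that a failure seen while an alert is still open will trigger once that alert is acknowledged. The actual monitor may handle that case differently.

```typescript
// Hypothetical model of the per-queue alert rules; not the monitor's real code.
type AlertStatus = "triggered" | "acknowledged";

interface Alert {
  queue: string;
  jobId: string;
  status: AlertStatus;
}

class FailedJobsMonitor {
  // Id of the last failed job that produced an alert, per queue.
  private lastAlertedJobId = new Map<string, string>();
  // The alert still in "triggered" status, per queue.
  private openAlert = new Map<string, Alert>();

  // Called with the id of the job at the top of a queue's failed list,
  // or undefined when the queue keeps no failed jobs.
  check(queue: string, topFailedJobId: string | undefined): Alert | undefined {
    if (topFailedJobId === undefined) {
      return undefined; // nothing to compare against
    }
    if (this.lastAlertedJobId.get(queue) === topFailedJobId) {
      return undefined; // this failure has already triggered an alert
    }
    if (this.openAlert.has(queue)) {
      return undefined; // suppressed: this queue has an unacknowledged alert
    }
    const alert: Alert = { queue, jobId: topFailedJobId, status: "triggered" };
    this.openAlert.set(queue, alert);
    this.lastAlertedJobId.set(queue, topFailedJobId);
    return alert;
  }

  // Acknowledging re-enables alerts for that queue only.
  acknowledge(queue: string): void {
    const alert = this.openAlert.get(queue);
    if (alert !== undefined) {
      alert.status = "acknowledged";
      this.openAlert.delete(queue);
    }
  }
}

const monitor = new FailedJobsMonitor();
console.log(monitor.check("emails", "42")?.status); // triggered
console.log(monitor.check("emails", "43")); // undefined (suppressed)
console.log(monitor.check("payments", "7")?.status); // triggered
monitor.acknowledge("emails");
console.log(monitor.check("emails", "43")?.status); // triggered
```

Keeping the open alert in a map keyed by queue name is what allows the "emails" and "payments" alerts to be triggered at the same time while each queue is suppressed independently.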
