Incident Metrics

Incident metrics help you understand the health and effectiveness of your services, environments, functionalities, and incident response teams. They show how quickly your team responds to incidents and, in turn, how much trust you are building with your users. If you are not paying attention to the relevant metrics, you can lose valuable time investing in the wrong projects and procedures. FireHydrant provides the information you need to make informed business decisions about reliability.

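The following definitions are built from common incident milestones, which are defined in this article. Each "mean time" metric is the average, across incidents, of the per-incident duration shown beneath it.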

  • MTTD : Mean Time to Detection
    time of detection - time of incident start

  • MTTA : Mean Time to Acknowledgment
    time of acknowledgment - time of incident start

  • MTTM : Mean Time to Mitigation
    time of mitigation - time of incident start

  • MTTR : Mean Time to Resolution
    time of resolution - time of incident start

  • Healthiness : (MTTM * number of incidents) / length of time window

For example, if an incident for a given service started at noon, was mitigated at 1 PM, and was resolved at 2 PM, healthiness for that service over the window from noon to 2 PM would be (1 hour * 1 incident) / 2 hours = 50%.
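To make the arithmetic concrete, here is a minimal Python sketch of the mean-time formulas and the healthiness calculation, assuming each incident is stored as a hypothetical `Incident` record with one timestamp per milestone. The `Incident` class, its field names, and the `mean_time_to` and `healthiness` helpers are illustrative assumptions, not FireHydrant's API or implementation. The detection and acknowledgment times in the usage lines are made-up values, because the example above only gives start, mitigation, and resolution times.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record: one timestamp per incident milestone.
    started_at: datetime
    detected_at: datetime
    acknowledged_at: datetime
    mitigated_at: datetime
    resolved_at: datetime


def mean_time_to(incidents: list[Incident], milestone: str) -> timedelta:
    """Average of (milestone time - incident start) across all incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum(
        (getattr(incident, milestone) - incident.started_at for incident in incidents),
        timedelta(),
    )
    return total / len(incidents)


def healthiness(incidents: list[Incident], window: timedelta) -> float:
    """(MTTM * number of incidents) / length of the time window."""
    mttm = mean_time_to(incidents, "mitigated_at")
    return (mttm * len(incidents)) / window


# The worked example: started at noon, mitigated at 1 PM, resolved at 2 PM.
noon = datetime(2024, 1, 1, 12, 0)
incident = Incident(
    started_at=noon,
    detected_at=noon + timedelta(minutes=5),       # made-up illustrative value
    acknowledged_at=noon + timedelta(minutes=10),  # made-up illustrative value
    mitigated_at=noon + timedelta(hours=1),
    resolved_at=noon + timedelta(hours=2),
)
print(mean_time_to([incident], "mitigated_at"))     # 1:00:00
print(healthiness([incident], timedelta(hours=2)))  # 0.5
```

Because MTTM multiplied by the number of incidents equals the total time to mitigation, the healthiness formula is effectively the fraction of the window spent before mitigation.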

  • Impact : Within a given date range, the durations of all incidents are added together to calculate the total time a service, functionality, or environment was degraded.
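The impact calculation can be sketched in Python as a sum of incident durations within a date range. This is a minimal sketch under two stated assumptions: each incident is a hypothetical `(started_at, resolved_at)` pair, and the whole span from start to resolution counts as degraded time. The `impact` helper and the clipping of incidents to the range boundaries are illustrative choices, not FireHydrant's implementation.

```python
from datetime import datetime, timedelta


def impact(
    incidents: list[tuple[datetime, datetime]],
    range_start: datetime,
    range_end: datetime,
) -> timedelta:
    """Sum of degraded time that falls inside [range_start, range_end).

    Each incident is a hypothetical (started_at, resolved_at) pair; treating
    the whole start-to-resolution span as degraded is an assumption.
    """
    total = timedelta()
    for started_at, resolved_at in incidents:
        # Clip the incident to the date range so time outside it is not counted.
        overlap_start = max(started_at, range_start)
        overlap_end = min(resolved_at, range_end)
        if overlap_end > overlap_start:
            total += overlap_end - overlap_start
    return total


day = datetime(2024, 1, 1)
incidents = [
    (day.replace(hour=12), day.replace(hour=14)),              # 2 hours
    (day.replace(hour=22), day + timedelta(days=1, hours=1)),  # 2 of its 3 hours fall inside the day
]
print(impact(incidents, day, day + timedelta(days=1)))  # 4:00:00
```

Overlapping incidents are summed rather than merged here, matching the plain "added together" wording; merging overlapping intervals first would instead give wall-clock degraded time.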

Last updated on 2/9/2024