ITIL Incident Management
Incident Management is usually the first IT Infrastructure Library (ITIL®) process targeted for implementation or improvement among organizations seeking to adopt ITIL best practices. The reasons for this are simple: Improved Consumerization and Service Value Realization. Incident Management is the day-to-day process utilized by the organization through engagement with the service desk or self-help technology for rapid service restoration.
The high performance of this process is critical to the organization and to the users of impacted services. Without it, chaotic behavior is experienced, impacting user performance, organizational performance and overall economic value for both the customer and the supplier of the service. Incident Management itself should support the business strategy, and the business strategy should enable the means by which Incident management is performed to obtain value.
In this guide, we'll look at ITIL's Incident Management system in detail. Beginning with a definition and objective statement for the process, we'll look at how ITIL defines the process flow, understand how the support team works together to resolve IT incidents, and learn how the process's success within an enterprise can be measured using key performance indicators (KPIs). Finally, we'll examine how new integrated service management software facilitates automation and helps organizations establish a consolidated service desk and resolve incidents more efficiently.
What Is Incident Management in ITIL?
In ITIL, the term "incident" is used to describe an unplanned interruption or reduction in the quality of an IT service, which can be tremendously costly for large organizations. The primary objective of the Incident Management process is to return service to users as quickly as possible when interruptions occur.
Along with basic request fulfillment, Incident Management is one of the most important processes that IT organizations manage each day. While the request fulfillment process is used to address standard user requests like changing a password, Incident Management addresses genuine service outages with the goal of resolving the outage and returning service to users as quickly as possible.
In the five-stage service lifecycle model used in ITIL, Incident Management falls under "Service Operation." This is the fourth stage of the service lifecycle and the one where a service is already in operation by the organization. The process helps ensure that an organization can extract the maximum value from the services and applications that it supports by working to ensure performance, availability, and user access to the service.
What Are the Incident Management Process and Workflows?
Incident Management is the process that IT organizations follow to manage the lifecycle of incidents that are reported. That process consists of several steps, often known as sub-processes, that must all be carried out to ensure that incidents are adequately resolved and documented. Below, we describe each of the sub-processes and what they achieve for the organization.
Incident Management Support - The goal of Incident Management support is to provide and maintain the tools, processes, skills, and rules needed for effective and efficient handling of incidents. This process helps to ensure that service desk agents or technicians have adequate education and training to respond to and resolve incidents that occur within the IT organization. This process also maintains the rules and workflows for processing and resolving incidents, ensuring that technicians always know what the next step is to ensure an incident is resolved.
Incident Logging and Categorization - The objective of this sub-process is to record and prioritize incident reports with the appropriate diligence to facilitate a swift and effective resolution. Organizations often have limited resources for resolving incidents and other IT issues, and the effective prioritization of inbound incident reports is a crucial step in ensuring that labor is allocated appropriately towards the highest-priority incidents. IT organizations need to be proficient at determining the scope and severity of a reported incident and prioritizing it accordingly. Incident logging and categorization is often automated, such as when an IT operations monitoring solution creates an incident in response to a performance or availability event.
Immediate Incident Resolution by 1st-Level Support - When a user reports an incident to the service desk for the first time, they will typically report the issue to a 1st-level service technician. The ideal outcome is that the 1st-level technician can address the incident and restore the IT service on the first call and within a target resolution time set by the IT organization. When an incident cannot be resolved within the target time, or if a greater degree of technically specialized knowledge is required to resolve the incident, an escalation occurs and a 2nd-level support technician can take over the incident.
Incident Resolution by 2nd-Level Support - Once an incident has been escalated beyond a first-call resolution by 1st-level support, a 2nd-level support technician can take over the incident and begin searching for a workaround to restore service as quickly as possible. At this level, the technician has the flexibility to involve support groups or third-party suppliers in the resolution of the incident. If the incident is due to a malfunctioning application, for example, the 2nd-level technician may contact the company that developed the application for additional guidance in resolving the incident. If there is no way to address the root cause of the incident, the 2nd-Level Support technician can create a Problem Record and transfer the incident to the Problem Management process/team.
Handling of Major Incidents - Earlier, we mentioned the importance of prioritizing incidents according to their urgency so that resources can be deployed most efficiently. Major incidents are the highest-priority IT incidents that an organization can recognize. They constitute serious interruptions or threats to business activities and need to be resolved with the utmost urgency to prevent financial losses or other critical consequences. Major incidents are escalated rapidly through 1st-level and 2nd-level support personnel and can involve third-party suppliers if the incident is not resolved quickly. Again, if a correction of the root cause is impossible, the incident is transferred to Problem Management.
Incident Monitoring and Escalation - IT organizations following ITIL best practices will establish and maintain a system for monitoring the status and escalations of each IT incident that is reported. IT managers that deal with Incident Management should be able to track the number of incidents currently reported and see their status in the Incident Management process. Service level agreements are breached when the Incident Management team takes too long to respond to incidents, and service outages lead to business interruptions. Incident monitoring is used to ensure that Incident Management tickets are being resolved and moved through the process in a timely fashion, such that service levels are maintained for the organization.
Incident Closure and Evaluation - Once an incident has been effectively resolved, the incident record is submitted to a final quality control step. This sub-process confirms that the incident has been resolved and that the lifecycle of the incident has been documented in sufficient detail. The findings from the incident report can be used by the organization in the future, including as an input for the Knowledge Management process. Incident Closure and Evaluation helps to ensure that the organization tracks all important information about an incident and that it can learn from the incident after resolving it.
Proactive User Information - Incident Management reports are usually submitted through the organization's service desk, which acts as a single point of contact for IT resources within the organization. The service desk team can also use this communication portal to proactively inform users about known issues and service outages within the organization. This sub-process helps to distribute information throughout the organization and cut down on the number of requests and inquiries on the service desk by providing up-to-date information about service outages within the organization.
Incident Management Reporting - This sub-process works to capture information from the Incident Management process and supply it to the other Service Management processes, ensuring that the organization has an opportunity to improve its performance based on data from past incidents.
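The escalation path described in the sub-processes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of ITIL itself: the `Incident` class, the `Level` names, the `next_step` function, the 30-minute 1st-level target, and the choice to escalate major incidents immediately are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Level(Enum):
    FIRST_LEVEL = "1st-level support"
    SECOND_LEVEL = "2nd-level support"
    PROBLEM_MANAGEMENT = "Problem Management"
    CLOSED = "closed"


# Hypothetical target resolution time for 1st-level support.
FIRST_LEVEL_TARGET = timedelta(minutes=30)


@dataclass
class Incident:
    summary: str
    level: Level = Level.FIRST_LEVEL
    is_major: bool = False


def next_step(incident, elapsed, resolved=False,
              needs_specialist=False, root_cause_fixable=True):
    """Return the level that should own the incident next."""
    if resolved:
        return Level.CLOSED
    if incident.level is Level.FIRST_LEVEL:
        # Escalate when the target time is exceeded, when specialized
        # knowledge is required, or (assumption) for any major incident.
        if (incident.is_major or needs_specialist
                or elapsed > FIRST_LEVEL_TARGET):
            return Level.SECOND_LEVEL
        return Level.FIRST_LEVEL
    if incident.level is Level.SECOND_LEVEL and not root_cause_fixable:
        # The root cause cannot be addressed: hand over to Problem Management.
        return Level.PROBLEM_MANAGEMENT
    return incident.level
```

For example, an unresolved incident that has sat with 1st-level support for 45 minutes would move to `Level.SECOND_LEVEL`, while one resolved at any level would move to `Level.CLOSED`.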
How Do Organizations Measure Success in Incident Management?
Measuring the success of processes across the ITIL service lifecycle is the key to continual service improvement. Organizations should decide on metrics that will be used to monitor the performance of each process and report accurately on those metrics to help identify the best opportunities for improvement. Below, we've listed five of the most significant KPIs that organizations can measure to ensure their Incident Management process is performing up to par.
Status of Incidents - Organizations can use software to track the status of incidents that are currently being managed as part of the Incident Management process. A look at the status of all open incidents in real time can reveal where the largest backlogs are forming and how the organization can best commit resources to improve flow and shorten resolution times. For example, if a lot of incidents are getting stuck at 2nd-Level Support without being resolved, the company could pursue several potential solutions:
- Add more 2nd-level support staff to expedite handling of incidents.
- Add more training for 2nd-level support staff to increase efficiency of incident resolution.
- Add more training for 1st-level support staff to reduce escalations.
- Engage 3rd-level support that can help manage the backlog of incidents of a specific type (for example, if there is a backlog of incidents for a malfunctioning printer, contact the manufacturer to help resolve issues).
First Call Resolution - The first call resolution rate tells us how often incidents are resolved by 1st-level technical support staff on the first call. Timely resolutions are the result of effectively trained staff with sufficient experience and access to resources and knowledge.
Average Cost per Incident/Incident Resolution Effort - Organizations can choose to measure either the average cost per incident managed or the average effort spent to resolve each incident. Organizations would like to minimize these costs while meeting service level agreements and maintaining customer satisfaction. IT investment that leads to enhanced business up-time should generate a positive return on investment.
Average Initial Response Time - This KPI measures the average time between when a user reports an incident and when the service desk responds to the incident. If the service desk can resolve incidents quickly, but it takes three hours to get a response, the organization might consider adding more 1st-level service technicians to reduce the response time and correspondingly increase service availability.
Number of Repeated Incidents - Repeated or re-opened incidents are bad news for your organization. They can mean that support technicians have not identified the root cause of an issue, and therefore it keeps happening. Perhaps the IT staff knows how to resolve the issue and the users could actually do it themselves, but there are no resources available to facilitate self-service. Repeated incidents can be avoided by finding the root cause of an issue and proactively communicating with users to help them resolve the issue without reporting it to IT.
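Three of the KPIs above reduce to simple arithmetic over incident records. The sketch below shows one possible way to compute them; the record fields (`reported`, `first_response`, `resolved_on_first_call`, `reopened`) and the sample data are assumptions made up for this example, not the schema of any particular tool.

```python
from datetime import datetime, timedelta

# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {"reported": datetime(2024, 1, 8, 9, 0),
     "first_response": datetime(2024, 1, 8, 9, 10),
     "resolved_on_first_call": True, "reopened": False},
    {"reported": datetime(2024, 1, 8, 10, 0),
     "first_response": datetime(2024, 1, 8, 10, 20),
     "resolved_on_first_call": True, "reopened": False},
    {"reported": datetime(2024, 1, 8, 11, 0),
     "first_response": datetime(2024, 1, 8, 11, 30),
     "resolved_on_first_call": False, "reopened": True},
    {"reported": datetime(2024, 1, 8, 13, 0),
     "first_response": datetime(2024, 1, 8, 14, 0),
     "resolved_on_first_call": False, "reopened": False},
]


def first_call_resolution_rate(records):
    """Share of incidents resolved by 1st-level support on the first call."""
    return sum(r["resolved_on_first_call"] for r in records) / len(records)


def average_initial_response(records):
    """Mean time between the user's report and the first response."""
    total = sum((r["first_response"] - r["reported"] for r in records),
                timedelta())
    return total / len(records)


def repeated_incident_count(records):
    """Number of incidents that were re-opened after being resolved."""
    return sum(r["reopened"] for r in records)
```

With the sample data, the first call resolution rate is 0.5, the average initial response time is 30 minutes, and one incident counts as repeated.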
Incident Management Roles and Responsibilities
Well-defined roles and responsibilities are critical to the effective execution of the Incident Management process. The Incident Management team comprises the following roles:
Incident Manager
The Incident Manager has primary responsibility for driving and continually improving the Incident Management process. In small- to mid-size organizations, this role is commonly assigned to the Service Desk Manager; in larger organizations, this may be a separately defined role. Key responsibilities include team leadership, reporting key performance indicators (KPIs) back to management, direct management of first and second line support, managing the Incident Management system, and enforcing the Incident Management process workflow.
First Line Support
First Line Service Desk Technicians are the single point of contact for end users seeking information and reporting service disruptions. They are primarily responsible for the initial support and classification of Incidents and the immediate attempt to restore a failed service as quickly as possible. If they are unable to resolve the Incident, the First Line Service Desk Technician will route the Incident to appropriate support personnel, monitor activity and keep users up to date on the status of their Incident.
Second Line Support
Second Line Support Technicians typically have more advanced knowledge than First Line Service Desk Technicians. They may become responsible for Incidents that First Line Support is unable to resolve. These technicians may interact with third party experts from software or hardware vendors to help restore normal service as quickly as possible.
Incident Management Key Performance Indicators (KPIs)
Measurements are important across all stages of the ITIL lifecycle. Each process has metrics that should be monitored and reported to effectively evaluate the overall performance. Continual Service Improvement necessitates that the performance of each process be measured to identify areas needing improvement.
Typical Incident Management metrics include:
- Total Incidents reported (per category, priority, person, organizational unit, etc.)
- Status of Incidents
- Time between Incident creation and resolution
- Incidents and SLA (reached, breached)
- Average cost per Incident
- Reopen rate
- Incidents handled without escalation
- First call resolution
- Configuration Items experiencing recurring Incidents
- Incidents by time of day
KPIs should be related to Critical Success Factors (CSF) and CSFs should be related to objectives. This relationship helps with decision support for maintaining current state and improving to desired state. Although each organization is different, relevant reports for users, staff and management will help support important decisions that can be used to improve both the processes and the business as a whole.
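Several of the metrics listed above (totals per category, SLA reached versus breached, and incidents by time of day) can be derived from the same set of records. The following is a minimal sketch, assuming a single 4-hour resolution target and hypothetical `category`, `created`, and `resolved` fields; real SLAs usually vary by priority.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical resolution target; real SLAs usually depend on priority.
SLA_TARGET = timedelta(hours=4)

# Illustrative records only; field names are assumptions for this example.
incidents = [
    {"category": "network", "created": datetime(2024, 1, 8, 9, 15),
     "resolved": datetime(2024, 1, 8, 11, 0)},
    {"category": "email", "created": datetime(2024, 1, 8, 9, 40),
     "resolved": datetime(2024, 1, 8, 15, 0)},
    {"category": "network", "created": datetime(2024, 1, 8, 14, 5),
     "resolved": datetime(2024, 1, 8, 16, 0)},
]


def summarize(records, sla_target=SLA_TARGET):
    """Aggregate a few common Incident Management metrics."""
    # An incident breaches the SLA when creation-to-resolution time
    # exceeds the target.
    breached = sum((r["resolved"] - r["created"]) > sla_target
                   for r in records)
    return {
        "total_by_category": Counter(r["category"] for r in records),
        "by_hour_of_day": Counter(r["created"].hour for r in records),
        "sla_reached": len(records) - breached,
        "sla_breached": breached,
    }
```

Calling `summarize(incidents)` on the sample data reports two network incidents and one email incident, two incidents created in the 9 o'clock hour, and one SLA breach (the email incident, which took more than four hours to resolve).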
Best Practices for Implementing Incident Management
Adopting the ITIL framework within a business can be a daunting task. As with any ITIL process, Incident Management implementation requires support from the business. Of particular importance is gaining buy-in from executives and upper management. Before beginning the adoption process, it’s important to have at least one person dedicated to the overall project management and orchestration of adherence to best practices for Incident Management. It is also extremely helpful to have an IT service management (ITSM) tool in place that will support your current state processes and desired future state processes, as well as a Service Desk acting as the primary interface with the IT department.
1) Understand the current Incident Management process
Occasionally an organization does not have a consistent process for handling incidents, or they have a less sophisticated one in place. Either way, it is important to map the existing process as well as possible in an effort to understand what the existing Service Desk process offers.
2) Identify long-term Incident Management process vision
It is also important to understand what the organization expects from the Incident Management process. The expectation may be based on generic Incident Management templates included with the ITSM tool or a more custom process based on the organization’s specific needs.
3) Conduct a gap analysis
Next, identify what must be adjusted between the organization’s current Incident Management process and its long-term vision for Incident Management. This will arm you with valuable information about the effort, time, money and resources necessary to achieve your Incident Management objectives and your overall service goals.
4) Create an implementation road map
Adopting any ITIL process takes time, and you will need a road map to help set expectations for management. Use that road map to describe the activities, timeframe and efforts necessary to deliver. This road map should include quick wins, tool implementation, process changes, people and organization enablement, communication plans and overall governance changes.
5) Begin project implementation
It’s time for implementation to begin. Create a project plan that defines the actions or tasks, responsibilities and timeline for completion of all tasks. Communicate the successes along the way as you achieve each milestone, demonstrating your progress towards your ultimate implementation goal.
Feature Checklist for Incident Management Software
For IT organizations evaluating Incident Management software and/or IT service management suites that offer Incident Management capabilities, it is important to understand the types of features required to support key processes. At a minimum, Incident Management software should provide the following capabilities:
- Create, modify, resolve, and close incident records
- Generate unique record numbers associated with each incident record
- Link incidents to problem records, knowledge articles, known workarounds, and requests for change
- Link configuration management data to incident records
- Notify incident owners when the associated problem is resolved
- Automatically record historical data in an audit log
- Configurable incident categorization
- Incident search and reporting capabilities
- Route incidents based on resource availability, time-zones, sites, etc.
- Prioritize, assign, and escalate incidents based on categorization; escalate based on priority or other categorization
- Integrate with event monitoring solutions with the ability to automatically create, update, and close incidents
- Flexible field configurations, including free text, drop down, date/time, attachments, and screen captures
- Link incidents to customer data
- Utilize knowledge base solutions/scripts for diagnosis and resolution
- Assign incidents or associated tasks to external service providers
- Assign incidents to multiple assignees
- Create a problem or request for change from an incident record
- Automated incident alerts (to IT staff and/or end-user) based on deadlines, SLAs, closure, and other activity
- Link incident records to SLAs
- Collect feedback from end-users via a customer satisfaction survey
- Initiate an incident on behalf of someone else
- Stop-the-clock functionality to pause the SLA while an incident is on hold
- Differentiate between an incident and a service request
- Reactivate resolved incidents
- Automatically determine priority based on impact and urgency
- Integrate with Telephony/ACD system to pre-populate customer information based on caller ID
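Two of the capabilities in this checklist, automatic prioritization from impact and urgency and pausing the SLA clock while an incident is on hold, are easy to illustrate. The sketch below is an assumption-laden example rather than the behavior of any specific product: the 3x3 `PRIORITY_MATRIX`, its priority values, and the `priority` and `sla_elapsed` functions are hypothetical, and hold periods are assumed not to overlap.

```python
from datetime import datetime, timedelta

# Hypothetical matrix: (impact, urgency) -> priority, where 1 is highest.
PRIORITY_MATRIX = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 5,
}


def priority(impact, urgency):
    """Look up the priority for a given impact and urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]


def sla_elapsed(opened, now, holds):
    """Time counted against the SLA, excluding on-hold periods.

    `holds` is a list of (start, end) datetimes, assumed non-overlapping.
    """
    on_hold = sum(
        # Clip each hold period to the window between `opened` and `now`.
        (min(end, now) - max(start, opened)
         for start, end in holds
         if start < now and end > opened),
        timedelta(),
    )
    return (now - opened) - on_hold
```

For instance, an incident opened at 9:00, checked at 12:00, and held from 10:00 to 11:00 while waiting on the user has two hours counted against its SLA instead of three.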
*This content originally appeared on Cherwell.com, prior to the acquisition by Ivanti.