Incident Management vs. Problem Management – Why It’s Critical You Understand the Difference
*This post originally appeared on the Cherwell blog, prior to the acquisition by Ivanti.
Why Is It Important to Know the Difference Between Problems and Incidents?
On the surface, it may seem like an “incident” and a “problem” are the same thing. Either word can be used in layman's terms to describe a situation that is having a negative impact on the business. But in IT, the two terms are different and need to be addressed and managed accordingly with different goals in mind.
At its most basic definition, an incident is a singular, independent event. Incidents are often something users would file an IT help desk ticket for and expect to be resolved quickly. A problem is the root cause of incidents and Problem Management tries to prevent incidents from occurring. Problems can often result in many incidents.
Think about a manager running a fleet of vehicles. One vehicle may experience a flat tire that needs to be changed quickly to get the truck back on the road. This event is an incident in that it’s isolated and only impacts that one truck. Incident Management is used in this case: The tire is changed to get the truck back into operation as quickly as possible.
Flat tires may move from being an incident to requiring Problem Management if they recur more often than they reasonably should. In this case, the trucking company would investigate further in an attempt to identify the root cause of the excess flat tires. It may be that those particular tires are under a recall or that the tire maintenance schedule is not being followed correctly, causing incidents to occur often. By identifying this underlying cause, the company can take action to prevent future related incidents.
These basic principles are used by IT to appropriately address and resolve incidents and problems.
It’s important for business owners and managers outside of IT to understand the difference between an incident and a problem. While the terms may seem interchangeable, communicating clearly using the technical language of IT support will help reduce confusion and frustration. If you tell IT support you have an incident when in reality it’s a further-reaching problem, the underlying root cause could be left unaddressed, causing future headaches. Understanding the difference can help the organization reach an appropriate resolution faster.
This article dives deeper into IT Incident Management and IT Problem Management, which are both ITIL processes commonly used at organizations across industries.
What Is Incident Management?
First, let’s look at IT Incident Management. Its goal is to restore service operations as quickly as possible and minimize the impact of an outage or service degradation. It is the IT support desk focused on troubleshooting individual tickets—sometimes with a workaround rather than a true fix. The activities associated with Incident Management primarily deal with recording the details of the incident, classifying the incident, investigating the incident, and ultimately resolving the incident.
The thinking and process behind effective Incident Management appear in many places outside of IT. I recently had a bout with back pain. While frustrating, my experience helps illustrate how Incident Management, when performed well, functions more like a well-run doctor’s office than a “take two of these and call me in the morning” approach. On my first visit to the orthopedist, I was required to fill out forms that gave context about my overall health and to describe my symptoms clearly. My doctor used that information, in addition to an X-ray, to make a diagnosis and prescribe a treatment plan. The incident (in this case my back pain) was thoroughly documented and investigated, and an effective plan was put into place to resolve the issue quickly.
Inside business environments, many incidents are IT-related and need to be addressed by the appropriate parties. Whether an IT organization aligns to ITIL or not, there is almost always a role or function responsible for the management of incidents—whether it’s a group of two or a group of 200. The objectives and key performance indicators (KPIs) for Incident Management are relatively straightforward:
- Resolve the incident as quickly as possible
- Be conscious of the priority of the incident
- Be conscious of the cost of the resolution
- Assess the users’ level of satisfaction throughout the process
- Measure results with discrete metrics such as First Contact Resolution, Cost Per Contact, and Customer Satisfaction
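As a rough illustration of the metrics named above, the following Python sketch computes them from a handful of ticket records. The `Ticket` fields, the 1-to-5 survey scale, and the exact formulas are assumptions made for this example; organizations define these KPIs in different ways.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    contacts: int      # contacts needed before the incident was resolved
    cost: float        # total handling cost attributed to the ticket
    satisfaction: int  # survey score, assumed to run from 1 (poor) to 5 (excellent)


def incident_kpis(tickets):
    """Compute three common service desk metrics (assumed definitions)."""
    total_contacts = sum(t.contacts for t in tickets)
    return {
        # Share of tickets resolved on the first contact
        "first_contact_resolution": sum(t.contacts == 1 for t in tickets) / len(tickets),
        # Total handling cost divided by the total number of contacts
        "cost_per_contact": sum(t.cost for t in tickets) / total_contacts,
        # Mean survey score across all tickets
        "customer_satisfaction": sum(t.satisfaction for t in tickets) / len(tickets),
    }


tickets = [
    Ticket(contacts=1, cost=20.0, satisfaction=5),
    Ticket(contacts=3, cost=60.0, satisfaction=3),
    Ticket(contacts=1, cost=10.0, satisfaction=4),
    Ticket(contacts=1, cost=30.0, satisfaction=4),
]
kpis = incident_kpis(tickets)
print(kpis)
```

Tracking these numbers over time, rather than as one-off snapshots, is what makes them useful for spotting trends.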
If an incident does not appear to be isolated, IT teams may need to move into Problem Management.
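One simple way to spot incidents that are not isolated is to count how often the same kind of incident hits the same asset. The sketch below is a minimal example of that idea; the record fields (`category`, `asset`) and the threshold of three are hypothetical choices, not an ITIL rule.

```python
from collections import Counter


def problem_candidates(incidents, threshold=3):
    """Return (category, asset) pairs that appear in at least `threshold` incidents."""
    counts = Counter((i["category"], i["asset"]) for i in incidents)
    return sorted(key for key, n in counts.items() if n >= threshold)


incidents = [
    {"category": "flat tire", "asset": "truck-7"},
    {"category": "flat tire", "asset": "truck-7"},
    {"category": "dead battery", "asset": "truck-2"},
    {"category": "flat tire", "asset": "truck-7"},
    {"category": "flat tire", "asset": "truck-4"},
]
candidates = problem_candidates(incidents)
print(candidates)
```

Anything the function returns is only a candidate for a problem record; someone still has to investigate whether a shared root cause exists.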
What Is Problem Management?
The goal of Problem Management is to minimize the adverse impact of incidents and problems caused by errors in the infrastructure, and to prevent the recurrence of incidents related to those errors. The activities associated with Problem Management primarily deal with identifying why the incident occurred in the first place, and identifying and documenting known errors.
Unlike Incident Management, Problem Management often has no role or function responsible for it (this goes beyond the IT support desk, which is focused on Incident Management). Nor is there usually a solid understanding of its objectives and key performance indicators. Companies must take a conscious extra step to implement Problem Management, assign resources to the task, and define the expected outcomes and KPIs that best fit their organization.
Let’s go back to my back issue to understand how Problem Management, when performed well, functions like a comprehensive treatment. While my doctor provided some immediate relief (i.e., addressing the incident), he mentioned that if the treatment plan wasn’t working and I continued to experience pain, we might be dealing with something more significant, which an MRI and further analysis would be able to determine.
Note that this doesn’t negate the doctor’s initial work. He couldn’t provide an immediate resolution, but he could offer a workaround (medication and exercise, while limiting travel) that he had identified and documented after seeing complaints like mine in the past. He didn’t recommend surgery at my first visit, understanding that surgery is not only costly but also inappropriate until the root cause is determined.
Understanding the difference between Incident and Problem Management is merely the first step. The doctor’s office analogy is one of many to help you understand that Incident Management deals with an individual incident as quickly as possible, and that Problem Management deals with why the incident (or multiple similar incidents) occurred, and seeks to either eliminate the root cause or build an effective, easily-deployable workaround.
What Does Fixing an Incident Require?
As stated above, every organization must have at least a few individuals or a team dedicated to Incident Management and resolution (most likely an IT support desk or the team that handles IT support tickets). Without dedicated owners, incidents may not be resolved quickly, effectively, or consistently. Beyond having a team in place, there are a few key factors to successful Incident Management, particularly when addressing IT-related and operational incidents. For Incident Management to be effective, the following need to be in place:
- Continuous development of problem and error control
- A tiered support structure, where the team understands Tier 1 and 2 escalations
- A Continual Service Improvement program that measures efficiency and effectiveness through KPIs aligned to organizational goals and objectives
- Clear and documented roles and responsibilities within IT in terms of desired outcomes
Furthermore, IT must have robust Incident Management software at its disposal that includes:
- Integration of the IT service desk software and the IT asset management repository. This provides IT support with context regarding the assets and services the user leverages, negating the need to fill out forms.
- A knowledge base within the ITSM tool that helps spread, scale, and standardize symptomatology. This enables IT support to work more quickly and maintain consistency across the team.
- The view of an IT service map provided by the ITSM solution’s configuration management database (CMDB). This helps IT understand what’s happening at the service level and better isolate troublesome configuration items that impact availability and performance.
These tools and processes will make it easier for IT or the service desk to collect the information needed, with the appropriate context, to fully understand the incident and its impact. That leads us into the second phase of fixing an incident: categorization and prioritization. Not all incidents will have the same impact on an organization, and the ones causing the largest or most consequential disruptions need to be addressed first (for example, an inaccessible customer portal that ties to company SLAs takes precedence over an in-house printer not working). Understanding the incident and its impact will help IT teams assign proper resources and priority.
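Prioritization is often expressed as a matrix of impact and urgency. The sketch below assumes a three-level scale for each and adds them together to get a priority from 1 (most urgent) to 5 (least urgent); the scale, the formula, and the ratings given to the two example incidents are illustrative assumptions, and your organization may weight them differently.

```python
# Assumed three-level scales; a lower number means more severe.
LEVELS = {"high": 1, "medium": 2, "low": 3}


def priority(impact, urgency):
    """Combine impact and urgency into a priority from 1 (highest) to 5 (lowest)."""
    return LEVELS[impact] + LEVELS[urgency] - 1


queue = [
    {"summary": "In-house printer not working", "impact": "low", "urgency": "low"},
    {"summary": "Customer portal inaccessible", "impact": "high", "urgency": "high"},
]

# Work the queue from the highest priority (lowest number) down.
ordered = sorted(queue, key=lambda i: priority(i["impact"], i["urgency"]))
for incident in ordered:
    print(priority(incident["impact"], incident["urgency"]), incident["summary"])
```

With these ratings, the customer portal outage sorts ahead of the printer, matching the ordering described above.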
Once IT believes the incident has been resolved or they provide a workaround, they should check with end users to ensure the solution is functioning as intended and that no more user pain points persist.
What Does Fixing a Problem Require?
The integration of change, assets, and knowledge adds value to the Incident Management process, and therefore the organization. So why then do we see such a major drop-off when it comes to the Problem Management process? According to the HDI Practices and Salary Report 2015, only 44 percent of IT organizations had adopted the Problem Management process, and only 22 percent of those organizations had a dedicated problem manager.
I believe the low adoption rate of Problem Management can most often be attributed to a lack of understanding of why Problem Management is important to the organization, which affects the alignment of roles and responsibilities associated with the process. There also tends to be an over-reliance on technology, which creates problem records and assigns ownership, but cannot by itself encourage individuals to determine root cause, identify workarounds, and recommend resolution approaches.
And therein lies the problem with Problem Management.
To build successful Problem Management processes, IT must first determine why the process is important to them (reducing future incidents, minimizing downtime, improving infrastructure, etc.), and assign the roles and resources accordingly. At a minimum, IT leaders must apply the same rigor they apply to Incident Management.
A Problem Management leader must ensure that:
- Problems and errors are regularly (and properly) classified and identified
- Workarounds are documented and communicated to the incident management function
- The problem management process has well-defined and relevant KPIs (as determined by the organization and its goals for problem management)
- Roles and responsibilities are clear and documented in terms of desired outcomes
IT must also ensure it has the proper enabling ITSM solution (or more targeted Problem Management software) that performs the following functions:
- Cross-reference the details of the incident against both the knowledge base and the known-error database, making it easier to link incident records to problem records
- Make it easy to assign ownership of problem records to individuals or functional groups
- Make it easy to quickly promote problems to Request for Change (RFC), complete with all the necessary context and documentation
- Provide a full and rich dashboard that intuitively organizes critical problem management metrics into a single panel
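To make the first capability in this list more concrete, here is a minimal sketch of cross-referencing an incident description against a known-error database using keyword overlap. The record layout, the `KE-` identifiers, and the matching rule are all hypothetical; real ITSM tools use their own, typically richer, search and linking features.

```python
def match_known_errors(incident_text, known_errors, min_overlap=2):
    """Return (id, workaround) pairs for known errors sharing enough keywords."""
    words = set(incident_text.lower().split())
    matches = []
    for err in known_errors:
        overlap = words & err["keywords"]
        if len(overlap) >= min_overlap:
            matches.append((len(overlap), err["id"], err["workaround"]))
    # Strongest keyword overlap first
    matches.sort(reverse=True)
    return [(err_id, workaround) for _, err_id, workaround in matches]


known_errors = [
    {"id": "KE-001", "keywords": {"vpn", "timeout", "disconnect"},
     "workaround": "Switch to the backup gateway"},
    {"id": "KE-002", "keywords": {"printer", "queue", "stuck"},
     "workaround": "Restart the print spooler"},
]

linked = match_known_errors("VPN timeout after login", known_errors)
print(linked)
```

A match gives the support desk a documented workaround to apply right away and gives the problem owner one more incident to link to the existing problem record.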
As Orr pointed out, Problem Management should not only be reactive. While identifying the root cause of the most recent and pressing incidents may take top priority, Problem Management teams should also be vigilantly searching for problems or infrastructure weaknesses that have not yet caused incidents (but are likely to eventually). This process may be more manual, but it is an important component of true Problem Management.
Problems are “fixed” when they have an implemented solution or a well-documented and communicated workaround, and subsequently no longer cause incidents.
Why It’s Important to Know the Difference Between the Two
While Incident Management and Problem Management are similar (so similar in fact that many people new to ITIL have difficulty separating the two), the main difference lies in the ultimate end goal. It’s important to remember that Incident Management’s goal is to quickly and effectively resolve an incident while minimizing negative impact. From there, support teams may move into Problem Management, with the aim of preventing similar incidents from recurring by addressing the underlying root cause.
For business owners and managers, understanding the difference between an IT incident and problem can help them effectively communicate with IT support and establish realistic expectations regarding outcomes.
Implementing effective Incident Management and Problem Management can be complicated, especially if your organization is new to ITIL. The key is to focus on your desired outcomes and find the processes that work best for your company and team.