Root Cause Analysis FAQs

by Kevin McManus, Chief Excellence Officer and Systems Guy, Great Systems

How Do You Find Root Causes?

I believe that in order to improve ANY process, you have to find and minimize the root causes of process waste. Most organizations would say that they know this, and they could easily show you how much time and money they invest in an attempt to do this. What they might struggle to do, however, is show you how effective their current approach to root cause analysis is.


The following questions represent those that I have been asked to comment on in recent months.

When should a formalized “root cause analysis” be conducted on an equipment failure event?

Before we can effectively answer this question, we need to determine what is meant by the term ‘formalized.’ Most people consider root cause analysis to be formal in nature if some tool, or set of tools, is used to help guide people towards the root causes of a problem (as opposed to simply standing around and relying on opinion). My personal definition goes a little further, however. To me, an RCA process is not formal unless facts are used to validate the root causes that are selected and, ideally, to help guide the problem solvers towards a comprehensive set of possible root causes to begin with. Often, the list of possible root causes itself is much less than adequate because the belief systems, opinions, or knowledge base of the problem solver do not include a sound list of human engineering, communication, procedure, training, work direction, or management system root cause possibilities. If your list of possible root causes is too narrowly defined, you won’t even begin to consider certain problem roots, whether or not you are using a formal analysis tool.

Ideally, all equipment problems should be analyzed. Unfortunately, few organizations, if any, have the resources to pursue this ideal in the short term. As a result, equipment problems must be attacked from two perspectives. First, all major problems should be formally analyzed. The organization must determine what ‘major’ means by setting acceptable downtime duration or cost thresholds. Second, all significant downtime problems (again, management must define the significance threshold) should be captured in a database for subsequent Pareto analysis. Formal root cause analysis is then used on the biggest bar of the Pareto chart, just as it was for the singular equipment problem.
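The two-pronged approach above can be sketched in a few lines of Python. This is only a minimal illustration, not part of TapRooT® or any particular tool: the downtime records, the category names, and the `MAJOR_HOURS` threshold are all hypothetical, and your own management team would need to set the real thresholds.

```python
# Hypothetical downtime log: one (problem category, downtime hours) pair per event.
DOWNTIME_LOG = [
    ("conveyor jam", 3.0),
    ("filler fault", 1.5),
    ("conveyor jam", 4.5),
    ("labeler misfeed", 2.0),
    ("filler fault", 1.0),
]

# Assumed threshold: any single event at least this long counts as 'major'.
MAJOR_HOURS = 4.0


def major_events(log, threshold_hours):
    """Return the single events that warrant a formal analysis on their own."""
    return [event for event in log if event[1] >= threshold_hours]


def pareto_table(log):
    """Total downtime per category, largest first, with cumulative percent."""
    totals = {}
    for category, hours in log:
        totals[category] = totals.get(category, 0.0) + hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0.0
    for category, hours in ranked:
        running += hours
        rows.append((category, hours, round(100 * running / grand_total, 1)))
    return rows


if __name__ == "__main__":
    print("Analyze individually:", major_events(DOWNTIME_LOG, MAJOR_HOURS))
    for category, hours, cumulative_pct in pareto_table(DOWNTIME_LOG):
        print(f"{category:16} {hours:5.1f} h  {cumulative_pct:5.1f}% cumulative")
```

The first row of the table is the biggest bar of the Pareto chart, which is the category to take through formal root cause analysis next.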

As the formality and design soundness of your root cause analysis process increase, your problems should decrease over time. Conversely, if you choose to rely on a primarily opinion-based root cause tool such as fishbone diagrams or the five whys, you will compromise the results of your root cause analysis efforts, whether you use these tools in a formal manner or not. In too many organizations, problems keep coming back because the problems were not analyzed using a sound RCA process, facts were not used to validate the root causes that were selected, and a less than adequate list of possible causes was considered in the first place.

What is the main mistake people make when they attempt to find the root cause of a problem?

The most common mistake people make when they search for the root cause of a problem is assuming that the human errors or equipment failures they find are the problem’s actual root cause or causes. Think about it – what does the media say the main cause of airline accidents is? You probably guessed it – pilot error. Instead of looking for the systemic design or execution flaws that increased the likelihood of the error or equipment problem, we blame the person, or fix the equipment, and think that our problems are solved. Unfortunately, the problems often come back – again and again. In turn, the time we invested in our root cause analysis efforts is often wasted, and frustration builds when the same old problems come back to haunt us.

From a TapRooT® root cause analysis perspective, pilot error is what we call a causal factor. Causal factors are the starting point for root cause analysis, not the end point. A small percentage of teams do search for the system problems that led to a human error or equipment failure, but most do not. In TapRooT®, however, each causal factor is taken through the root cause tree of more than 100 possible root causes. Facts from the problem investigation effort, along with questions from the TapRooT® dictionary, are used to help find the actual systemic problems, which are the true root causes of the problem.



How many people are normally involved in a root cause analysis investigation?

The number of people involved in an investigation depends primarily on two factors – the magnitude of the problem being investigated and the investigation quality level desired. Most of the time required to complete an investigation (usually at least 75%) is spent collecting information and other data.

The person or persons who were injured, made an error, or discovered an equipment problem are normally questioned first, along with any witnesses. Subject matter experts (SMEs) are then questioned. I have found that the number of subject matter experts you choose to involve has the greatest effect on both the total number of people involved AND the quality of the investigation itself.

High levels of investigator skill (for example, the ability to ask a lot of great questions while also keeping a very open mind) can reduce the number of SMEs involved to some degree, but the same rule of thumb still applies – more human input, better results. Additionally, the amount of non-interview data that needs to be collected will affect the number of people involved.

The final factor that should be considered in determining how many people to involve is the time allowed to complete the investigation.  If an analysis and a report need to be completed quickly, you will need to throw more resources at the investigation.  If you have the luxury of taking six months to analyze a relatively minor problem, one skilled person could possibly handle it.

What roles do people take during a root cause analysis investigation?

The roles that are played during an investigation are closely linked to the (1) types of information that need to be collected (interviews, document collection and review, site visit and analysis, and video review) and (2) the steps of the investigation process itself.  One person, of course, would serve as the lead of the investigation effort.  Others might serve as interviewers and data collectors.  Some people will be expected to do little more than answer questions or provide documentation.

If you use a team approach to collecting investigative information and developing corrective actions, you will also need a skilled team facilitator to avoid wasting valuable group time.  Some people, for example, are very skilled at managing a project on their own, but they have not ever had the chance to learn the skills required to manage group dynamics or group conflict. In turn, a lot of time can be wasted and the group can lose confidence in the investigation effort.

How long will a root cause analysis investigation normally take?

Three primary factors affect the amount of time required to complete an investigation – the magnitude of the problem being investigated, the skills of the investigator, and the time allowed for completing the investigation. The worst-case scenario would involve a person with poor investigation skills trying to solve a big problem in a short amount of time.

TapRooT® instructors often say that it can take as much as a week to thoroughly investigate a lost time incident or significant environmental spill. Major problems, such as refinery fires or fatalities, could take weeks or months. Smaller investigations, such as analyzing why a person put the wrong label on a box of product, can be completed in a few hours. One key factor that drives your ability to do good root cause analysis is being able to collect a broad-based set of facts in a short amount of time (before the evidence cools off or changes too much). Interview findings provide only a portion (less than half) of the facts needed to find the systemic root causes of a given problem – site assessment, document analysis, and video review (if available) are also key. You can cut the investigation cycle time down by skimping on the facts, but you will also miss root causes.

Most importantly, to answer this question effectively, you have to consider the amount of resources that can be devoted to the investigation in a given period of time. For example, if it takes 40 man-hours to complete an investigation, and the people involved can only devote a total of ten hours a week to the investigation effort, it will take four weeks to complete the investigation if they use this time wisely.
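The arithmetic in that example is simple enough to put in a small helper. This is just a rough sketch under the same assumption made above, namely that the available hours are used wisely; the function name and the second set of numbers are hypothetical.

```python
import math


def weeks_to_complete(total_effort_hours, team_hours_per_week):
    """Calendar weeks needed, rounding a partial final week up to a full week."""
    if team_hours_per_week <= 0:
        raise ValueError("team_hours_per_week must be greater than zero")
    return math.ceil(total_effort_hours / team_hours_per_week)


if __name__ == "__main__":
    # The example from the text: 40 hours of effort at ten hours a week.
    print(weeks_to_complete(40, 10))
    # Hypothetical variation: a slightly larger effort spills into a fifth week.
    print(weeks_to_complete(45, 10))
```

Running the numbers this way before you commit to a report date makes it obvious when a deadline only works if more people, or more hours per week, are assigned.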

Who should be involved in an investigation (level of organization)?

The answer to this question depends largely on (1) the culture of the organization involved and (2) the type of involvement you are talking about. Ideally, you should involve a cross section of people from different departments and different levels in the organization. Involving this mix of people will give you much better results because you will draw on a variety of perspectives and skill / knowledge levels.

Unfortunately, too many organizations have cultural issues that make it difficult to involve this mix of people.  I have seen companies where front line people are rarely involved because management does not value their opinion.  I have seen others where little but conflict is the result when certain groups are brought together in a room. Finally, the lack of a skilled facilitator makes it tough to get good group results in general when a mix of departments is brought together.

Culture aside, there is little excuse for not involving this mix of people from an interview perspective. You might not be able to have them work together in a room to build a SnapCharT® or develop corrective actions, but you should always be able to ask them questions, even if you have to do it one-on-one or with a union representative present.

If you are interested in learning more about the TapRooT® root cause analysis process, please send me an e-mail at

Do I need to bother the operators and maintenance people, or can we do the investigation using only our process documentation?

After reading my response to the “Who should I involve?” question above, you probably can guess what my response here will be.  Documentation review is only one piece of the investigation puzzle – you have to talk to people also! Talking to people helps you understand how the documentation was completed, interpreted, and acted upon.  It helps you clarify how well people understood the documentation, and what they were thinking as they used or completed it.  Talking to people also helps you learn more about the things that can’t be captured on paper, such as smells, feelings, sounds, perceptions, and tastes.  Finally, as you know, people don’t write down everything they should.

Also, don’t see it as bothering people. Most people want to contribute their thoughts if these thoughts are requested in a non-threatening manner and will not be used against them.  You need to schedule your interviews at a time that is convenient for the person being interviewed, but it has been my experience that people will be more put off if they are not asked than if they are.

Keep improving! – Kevin McManus, Chief Excellence Officer and Systems Guy, Great Systems