Why Our Process Improvement Fixes Fail

By Kevin McManus, Chief Excellence Officer, Great Systems

There is a psychological pattern that causes many of our process improvement fixes to fail. Over the past forty-plus years, I have seen tens of thousands of corrective and preventive actions. After a while, a pattern began to emerge. I saw the same pattern with attempts to address ‘unsafe act’ and ‘unsafe condition’ audit findings.

It wasn’t just the ‘weak fix’ pattern many people see in their organizations. You know what I mean by weak fixes. Think of your favorite retraining, procedure-expanding, and punishment-focused changes. Instead, it was a pattern of a less visible nature.

In a nutshell, here is what I saw:

Our fixes tend to be weak, and in turn fail, when we blame people and equipment for our problems.

The root causes we select set the stage for weak versus strong corrective actions. Those root causes are a result of the root cause questions we ask, how we collect and analyze evidence, and the design of the root cause analysis process itself.

As a saying often attributed to Henry Ford goes, “If you always do what you’ve always done, you will always get what you’ve always got.”

What Causes Your Process Improvement Fixes to Fail?

When a person attempts to write a corrective action to address a human error, they tend to gravitate towards the recommendation of a relatively weak fix. For example, if we want to write a corrective action to address the problem of people not wearing the right work gloves, what do we often recommend?

Most people say we should make sure gloves are available. Plus, we should remind the employee of the requirement, and need, to wear gloves. Some might even want to punish the non-glove wearers. How effective are these fixes? What is the probability that people will always wear the correct gloves in the future?

We don’t address the work system gaps or weaknesses that fail to prevent errors.

This, I believe, is the core reason corrective and preventive actions fail. Our Western culture conditions us to blame people, the weather, and equipment.

We fail to see the ever-present connection between human error rates and weak work system design. Weak designs equate to high error rates for any human! All too often we expect micro-level error rates – rates below one-half of a percent! You cannot sustain such low rates without exquisite work system design.

More Reasons Why Our Process Improvement Fixes Fail

We don’t know our real error rates, costs, and risk levels. Many more errors occur daily on the job than we capture. Also, risk levels are often higher than we expect due to team member, skill level, and work setting changes. We won’t invest significant money if the problem is not large enough.

However, how well does our investigation and analysis capture the problem’s true magnitude? How effective are we at risk level estimation? When we underestimate risk, we often under-design our safeguards. Risk level should drive safeguard design and use.

We fail to evaluate and improve our existing safeguards. Most people use the same safeguards every day to minimize errors. It does not seem to matter if the risk level for a day’s work fluctuates or not. In other cases, some work teams only have ineffective safeguards, such as weak supervision, poorly written work instructions, and no formal training to rely on.

Does Your Root Cause Analysis Process Contribute to Fix Failure?

Too many organizations rely on weak fixes that fail. Examples include reminders, discipline, and retraining. What is the case in your company? Why does this happen? Often, this is a result of the root cause analysis approach we use to find root causes.

The design of traditional approaches such as the 5 Why technique or fishbone analysis allows us to view human error and equipment failures as root causes. My experience has taught me that this is a bad thing to do. It is an analysis process error to blame problems on people and equipment. The better option is to use a root cause analysis approach that looks for the systemic reasons a person makes mistakes.

Additionally, most process problems have a path to failure. However, we often fail to hunt for that path. In other words, multiple errors and failures across the process or value stream build on each other. Some call this the path of causation. An effective root cause analysis process looks at every error and failure along that path, because fixing any one of them could help prevent the problem or make it less severe.

Two Final Reasons Why Our Process Improvement Fixes Fail

We often underestimate what it takes to create and sustain human behavior change. Lecturing people for an hour won’t make them change. You can threaten them or be nice to them, but the outcome is the same. Preventive actions in the form of work system change are necessary to drive behavior change that lasts.

We fail to make distinctions between corrective (short-term) and preventive (longer-term) fixes. In many cases, the two types of fixes for a given problem are different. The preventive action is not necessarily an extension of the corrective action. Instead, it is often a higher order form of work system improvement. For example, instead of improving how we replace a part each time it breaks, we switch to a more reliable part.

THAT’S ENOUGH GRIPING … NOW, LET’S LOOK AT HOW TO FIX THINGS!

#1: Take Advantage of the Corrective and Preventive Action Difference!

I did a check on the two terms from a Wikipedia dictionary perspective, but the posted definitions were a tad too textbook-like. Instead, I will use the definitions and examples different CAPA Managers have taught me over the years. In short, the two types of fixes are very different. You need both fix types to SUSTAIN micro-level error rates!

Corrective actions are like a band-aid. They show that we took action to address a problem’s causes. On the other hand, preventive actions go after the source of non-conformity. When we write an effective preventive action, we change how the work is done. Instead of improving one instruction set, we improve the process and criteria for instruction design.

In an effective CAPA system, we write both types of actions to address problems. Corrective actions address short-term, immediate concerns. They help mitigate immediate risk. Preventive actions focus on longer-term improvements that prevent future errors and failures. However, I find that most organizations fail to teach people how to make clear distinctions between these two types of improvement.

#2: Focus on System Gaps for Better Corrective and Preventive Action Alternatives

When a person uses a work systems gap or weakness as their initial corrective action reference point, they are more likely to recommend a systematic fix. This sounds simple. However, before you toss my observation aside, look at the collective nature of the last fifty or so corrective actions you have written or reviewed.

It seems simple, really. If we think a person is to blame, we try to fix the person. It is literally how parents often raise their children. When equipment fails, we repair the equipment. It looks like we took action. However, how well do our actions prevent future errors and failures? Why did the errors and failures occur? If we can find and fix the causes of the errors and failures, our fixes are much more likely to last.

How relatively strong, or weak, are the fixes you propose and install? Where would they fall on the hierarchy of controls? What percent of the time do you attempt to change people instead of systems? What percentage of your fixes are preventive, versus corrective, in nature? How often do your fixes change how work is done?
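One way to answer these questions with numbers is to tally a list of past fixes. The Python sketch below is only an illustration: the record fields (`target`, `kind`, `changes_work`) and the four sample fixes are assumptions made up for this example, not data from any real CAPA system.

```python
from collections import Counter

# Illustrative records only; field names and values are assumptions.
fixes = [
    {"text": "Retrain operator on glove policy", "target": "person",
     "kind": "corrective", "changes_work": False},
    {"text": "Remind crew of glove rule at shift start", "target": "person",
     "kind": "corrective", "changes_work": False},
    {"text": "Add glove dispenser at machine entry", "target": "system",
     "kind": "corrective", "changes_work": True},
    {"text": "Redesign the work instruction template", "target": "system",
     "kind": "preventive", "changes_work": True},
]


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def fix_mix_summary(fix_list):
    """Summarize how often fixes target people, are preventive, or change the work."""
    total = len(fix_list)
    targets = Counter(f["target"] for f in fix_list)
    kinds = Counter(f["kind"] for f in fix_list)
    changed = sum(1 for f in fix_list if f["changes_work"])
    return {
        "pct_person_focused": percent(targets["person"], total),
        "pct_preventive": percent(kinds["preventive"], total),
        "pct_changes_work": percent(changed, total),
    }


summary = fix_mix_summary(fixes)
print(summary)
```

Run against your own last fifty or so fixes, a tally like this shows at a glance how much of the mix tries to change people rather than systems.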

#3: Change Work Systems – How Work is Done – to Minimize Human Error

All people make mistakes. If we want our folks to produce error-free work, we must design work systems that discourage, versus encourage, human error. Our space programs and nuclear power generation companies get this. For example, they rely heavily on well-designed checklists to guide people in daily job performance.

Even people with PhD degrees and genius-level IQs don’t rely primarily, let alone solely, on memory. Think Neil Armstrong and his use of instructions in-hand as an astronaut. What percentage of the time do you count on memory to help minimize errors? Is it possible that the use of well-designed checklists could significantly improve performance?

For years, I saw human error as a root cause until I started teaching the TapRooT® root cause analysis approach as a contract trainer. The design of this approach forces the user to look for the systemic causes of human error and equipment failure.

In other words, human error is rarely, if ever, a root cause with this process. What percentage of your root causes are human errors? Is it possible that a different root cause analysis approach could lead you to better, work system focused fixes?

#4: Start with the Five Core Daily Risk Minimization Safeguards

Most work teams use five core safeguards every day to achieve effective work process performance. For example, all work teams have some level of supervision and training. Both approaches may need improvement, but we spend daily time and money to provide them. Also, the work environment typically contains some type of passive safeguards, such as barricades, to help prevent errors.

More day-to-day variation occurs with the other two core safeguards – work instructions and rules. Some people work from memory every day. Others have detailed in-hand instructions. Who do you think makes more errors? Similarly, variation exists in terms of how work team leaders use positive and negative enforcement to ensure people follow the daily rules.

Most medium to large organizations have some form of these core safeguards in place. Smaller teams may not. In either case, leaders rarely measure safeguard effectiveness. How can they do this? It’s simple. For starters, ask the user!!

Here is a summary of some additional best practices I share to help people write more effective corrective and preventive actions:

#5: Review your most recent 50 or so root causes and corrective actions.

What patterns, biases, and gaps do you see? Do you have favorite root causes and fixes? Are there certain types of root causes and fixes you don’t see often enough on the list? How can you change your corrective action writing process to help prevent this issue?

#6: Require the effective completion of a corrective action writing class.

This type of class does not have to be fancy, expensive, or even live. However, it should include corrective action writing practice where the instructor and/or peers provide feedback on the work. Also, support this practice with examples of both well-written and poorly written fixes. This helps define the post-class corrective action expectations.

#7: Evaluate ‘fix mix’ strength against potential problem severity to ensure a match exists.

All too often, we base our mix of corrective actions on what happens versus what could happen. The difference between a near miss and a fatality is often not that much. We should not dilute our mix of fixes just because we got lucky. Instead, always ask ‘What could have happened?’. Then, you can focus on future problem prevention.

#8: Set a creative, engineered fix minimum threshold.

One way to do this is to always expect to have at least one engineered fix of some type in the mix of changes you recommend. Another way to encourage engineered fixes is to work your way down the hierarchy of controls when you look for solutions. Can you eliminate or minimize the potential for a problem? How might you better protect the customer or the patient from potential hazards or errors? Can we better ‘guard’ the customer or employee from the potential problem?

#9: Check for evidence of corrective action effectiveness in your ‘high risk’ behavior observation trends.

The best way to tell how well your fixes work is to capture and trend daily ‘high risk behaviors’ at the process level. Is there a decrease in the error or failure rates?
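As a rough illustration of that check, the Python sketch below compares the pooled error rate before and after a fix goes in. The weekly counts, the `fix_week` parameter, and the function names are all hypothetical; a real trend analysis would also need enough observations to rule out normal week-to-week variation.

```python
def pooled_rate(rows):
    """Total errors divided by total opportunities; None if nothing was observed."""
    errors = sum(e for _, e, _ in rows)
    opportunities = sum(n for _, _, n in rows)
    return errors / opportunities if opportunities else None


def before_and_after(observations, fix_week):
    """Split (week, errors, opportunities) rows at fix_week and rate each side."""
    before = [row for row in observations if row[0] < fix_week]
    after = [row for row in observations if row[0] >= fix_week]
    return pooled_rate(before), pooled_rate(after)


# Made-up observation data: (week, high-risk behaviors seen, tasks observed).
observations = [
    (1, 6, 200), (2, 5, 180), (3, 7, 220), (4, 6, 200),  # before the fix
    (5, 3, 200), (6, 2, 200), (7, 2, 200), (8, 1, 200),  # after the fix
]

rate_before, rate_after = before_and_after(observations, fix_week=5)
improved = rate_after < rate_before
print(f"before: {rate_before:.1%}, after: {rate_after:.1%}, improved: {improved}")
```

With these sample numbers the rate falls from 3.0% to 1.0%. The point is to count opportunities as well as errors, so the comparison is a rate and not just a raw count.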

#10: Reduce the holes in your current safeguards before you add new ones (more cheese).

Too few companies measure safeguard effectiveness. In turn, they don’t know the relative degree to which a safeguard is effective. They assume that the existing safeguards work somewhat, but that more are needed to prevent the errors they see.

Opportunity exists for safeguard improvement when you use best practices as a guide. Don’t spend MORE time and money to prevent errors. Instead, spend your current time and money investments for this purpose DIFFERENTLY. Smaller holes in the cheese win the game, not more cheese slices.
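A little arithmetic shows why hole size can matter more than slice count. The Python sketch below assumes that each safeguard fails independently and that the failure chances can be multiplied together. Both are simplifying assumptions, and the hole sizes (0.3 and 0.15) are made-up numbers chosen only for illustration.

```python
from math import prod


def pass_through_probability(hole_sizes):
    """Chance an error slips past every safeguard, assuming independent layers."""
    return prod(hole_sizes)


# Hypothetical chance that each of five safeguards misses a given error.
current = [0.3] * 5

# Option A: add a sixth safeguard of the same quality (more cheese).
more_cheese = current + [0.3]

# Option B: keep five safeguards but halve each one's miss rate (smaller holes).
smaller_holes = [0.15] * 5

p_current = pass_through_probability(current)
p_more_cheese = pass_through_probability(more_cheese)
p_smaller_holes = pass_through_probability(smaller_holes)

print(f"current:       {p_current:.6f}")
print(f"more cheese:   {p_more_cheese:.6f}")
print(f"smaller holes: {p_smaller_holes:.6f}")
```

Under these assumed numbers, the extra slice cuts the pass-through chance by a factor of about 3, while halving the holes in the existing five slices cuts it by a factor of 32. The exact result depends on the numbers you plug in, which is one more reason to measure how well each safeguard works.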

#11: Always question the potential impact of a potential corrective or preventive action.

It is even more important to assess the relative impact of the corrective action MIX that you recommend for implementation. Will this mix of fixes significantly reduce the risk associated with future human errors or equipment failures of this type?

There should be synergy between the changes you propose. Higher quality instructions lead to more effective training, either on screen or on the job. Job prep processes are requisite for excellence, but effective leaders must guide their execution. The work environment should encourage rule compliance, not discourage it, by its design.

#12: System-Focused Root Causes Lead to Better Process Improvement Fixes

If we continue to try to write corrective and preventive actions to address human error and equipment failures directly, we will continue to write relatively weak fixes. That is the psychology of fixes that fail. Additionally, we should expect our preventive actions to address higher level work system gaps and result in more change.

The best fix you can personally make is to reject human error and component failures as root causes. Instead, always search for the systemic reasons humans do things they themselves really don’t intend to do. Try to understand the error!

How often do your fixes fail? Is it possible that a root cause analysis process shift, along with a psychological shift, could lead you towards a more error-free workplace?

Learn More About the TapRooT® Root Cause Analysis Process

If you want to learn more about the TapRooT® root cause analysis process, you can always visit the TapRooT® website. Some favorite quick access links of mine include:

3-day VIRTUAL TapRooT® Root Cause Analysis Workshop
I still teach 10-15 virtual 3-day courses a year for System Improvements as a contract instructor. If you would like me to teach this course in your company, one option is to contact me directly. Also, you can request to have me as your instructor when you book a virtual course through the System Improvements website.

2023 TapRooT® Summit
I will be at this annual event once again in 2023. This time, I will present my new ‘Measurement, Trending, and Predictive Analytics – How the Best use Data to Improve’ workshop as a 2-day pre-Summit learning event. It combines the content of Mark Paradies’ ‘Advanced Trending Techniques’ course from past Summits, my own Vital Signs, Scorecards, and Goals book, and my daily best practices research.

Connect with Kevin to Share Your Comments and Questions

If you found value in this article, you might also like my ‘Real Life Work’ weekly podcast. Plus, you might like my ‘Error Proof’ book, 2-day ‘Mistake Proofing and Corrective Action Writing’ workshop, and/or the workbook for that workshop.

FOLLOW these links to buy these books NOW on Amazon.com.

‘Error Proof – How to Stop Daily Goofs for Good’ book
I hope to have the audio edition of this book ready to drop early next year. Plus, I will use that opportunity to fix a few of the small typos in that first edition (SMH!)

‘Mistake Proofing and Corrective Action Writing Best Practices’ workbook
In the past, you could only get this workbook by attending one of my 2-day Mistake Proofing workshops. As I become more virtual to help minimize my carbon footprint, I am making my content available online.

Keep improving!

Kevin McManus, Chief Excellence Officer, Great Systems
