Roots of Reliability-Centered Maintenance

Tuesday, February 11th, 2014

Last month, I discussed the pioneering WWII-era work of the eminent British scientist C.H. Waddington, who discovered that the scheduled preventive maintenance (PM) being performed on RAF B-24 bombers was actually doing more harm than good, and that drastically cutting back on such PM resulted in spectacular improvement in dispatch reliability of those aircraft. Two decades later, a pair of brilliant American engineers at United Airlines—Stan Nowlan and Howard Heap—independently rediscovered the utter wrongheadedness of traditional scheduled PM, and took things to the next level by formulating a rigorous engineering methodology for creating an optimal maintenance program to maximize safety and dispatch reliability while minimizing cost and downtime. Their approach became known as “Reliability-Centered Maintenance” (RCM), and revolutionized the way maintenance is done in the airline industry, military aviation, high-end bizjets, space flight, and numerous non-aviation applications from nuclear power plants to auto factories.

RCM wear-out curve

The traditional approach to PM assumes that most components start out reliable, and then at some point start becoming unreliable as they age.

The “useful life” fallacy

Nowlan and Heap showed the fallacy of two fundamental principles underlying traditional scheduled PM:

  • Components start off being reliable, but their reliability deteriorates with age.
  • The useful life of components can be established statistically, so components can be retired or overhauled before they fail.

It turns out that both of these principles are wrong. To quote Nowlan and Heap:

“One of the underlying assumptions of maintenance theory has always been that there is a fundamental cause-and-effect relationship between scheduled maintenance and operating reliability. This assumption was based on the intuitive belief that because mechanical parts wear out, the reliability of any equipment is directly related to operating age. It therefore followed that the more frequently equipment was overhauled, the better protected it was against the likelihood of failure. The only problem was in determining what age limit was necessary to assure reliable operation.

“In the case of aircraft it was also commonly assumed that all reliability problems were directly related to operating safety. Over the years, however, it was found that many types of failures could not be prevented no matter how intensive the maintenance activities. [Aircraft] designers were able to cope with this problem, not by preventing failures, but by preventing such failures from affecting safety. In most aircraft essential functions are protected by redundancy features which ensure that, in the event of a failure, the necessary function will still be available from some other source.

RCM six curves

RCM researchers found that only 2% of aircraft components have failures that are predominantly age-related (curve B), and that 68% have failures that are primarily infant mortality (curve F).

“Despite the time-honored belief that reliability was directly related to the intervals between scheduled overhauls, searching studies based on actuarial analysis of failure data suggested that the traditional hard-time policies were, apart from their expense, ineffective in controlling failure rates. This was not because the intervals were not short enough, and surely not because the tear down inspections were not sufficiently thorough. Rather, it was because, contrary to expectations, for many items the likelihood of failure did not in fact increase with increasing age. Consequently a maintenance policy based exclusively on some maximum operating age would, no matter what the age limit, have little or no effect on the failure rate.”

[F. Stanley Nowlan and Howard F. Heap, “Reliability-Centered Maintenance,” 1978, DoD Report Number AD-A066579.]
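To see why an age limit can have “little or no effect on the failure rate,” here is a minimal Python sketch. It is not Nowlan and Heap's actuarial method. It assumes component life follows a Weibull distribution (my modeling choice, not something the report prescribes), and the shape and scale values are purely illustrative. The function name failures_per_hour is hypothetical. It computes the long-run failure rate when every unit is replaced at failure or at a fixed age limit, whichever comes first.

```python
import math

def survival(t, shape, scale):
    """Weibull survival function R(t) = exp(-(t/scale)**shape) (assumed model)."""
    return math.exp(-((t / scale) ** shape))

def failures_per_hour(age_limit, shape, scale, steps=20000):
    """Long-run in-service failures per operating hour when each unit is
    replaced at failure or at age_limit, whichever comes first.

    Standard renewal argument: P(fail before limit) / mean time in service,
    where mean time in service is the integral of R(t) from 0 to age_limit.
    """
    dt = age_limit / steps
    # Midpoint rule for the integral of the survival function.
    mean_time_in_service = dt * sum(
        survival((i + 0.5) * dt, shape, scale) for i in range(steps)
    )
    prob_fail_before_limit = 1.0 - survival(age_limit, shape, scale)
    return prob_fail_before_limit / mean_time_in_service

# Illustrative parameters only (not taken from the report):
# shape < 1 -> infant mortality, shape = 1 -> random, shape > 1 -> wear-out.
for label, shape in [("infant mortality", 0.5), ("random", 1.0), ("wear-out", 3.0)]:
    short = failures_per_hour(500, shape, scale=1000)
    long_ = failures_per_hour(5000, shape, scale=1000)
    print(f"{label:17s} limit=500h: {short:.6f}/h   limit=5000h: {long_:.6f}/h")
```

Under these assumptions, tightening the age limit only lowers the failure rate in the wear-out case. With a constant hazard it changes nothing, and with infant mortality it makes things worse, because each replacement puts a fresh unit back into its riskiest period.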

Winning the war by picking our battles

Another traditional maintenance fallacy was the intuitive notion that aircraft component failures are dangerous and need to be prevented through PM. A major focus of RCM was to identify the ways that various components fail, and then evaluate the frequency and consequences of those failures. This is known as “Failure Modes and Effects Analysis” (FMEA). Researchers found that while certain failure modes have serious consequences that can compromise safety (e.g., a cracked wing spar), the overwhelming majority of component failures have no safety impact and have consequences that are quite acceptable (e.g., a failed #2 comm radio or #3 hydraulic pump). Under the RCM philosophy, it makes no sense whatsoever to perform PM on components whose failure has acceptable consequences; the optimal maintenance approach for such components is simply to leave them alone, wait until they fail, and then replace or repair them when they do. This strategy is known as “run to failure” and is a major tenet of RCM.
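The sorting logic described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the actual RCM decision diagram: a real analysis also weighs failure frequency and operational and economic consequences. The FailureMode class, its affects_safety flag, and the strategy labels are all hypothetical names I made up for the example.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    component: str
    description: str
    affects_safety: bool  # hypothetical flag assigned by the analyst

def maintenance_strategy(fm: FailureMode) -> str:
    """Simplified rule: only failures that compromise safety justify
    preventive effort; everything else is left alone until it fails."""
    if fm.affects_safety:
        return "protect against failure"
    return "run to failure"

# Examples taken from the text above.
failure_modes = [
    FailureMode("wing spar", "cracked", affects_safety=True),
    FailureMode("#2 comm radio", "failed", affects_safety=False),
    FailureMode("#3 hydraulic pump", "failed", affects_safety=False),
]

for fm in failure_modes:
    print(f"{fm.component} ({fm.description}): {maintenance_strategy(fm)}")
```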

A maintenance revolution…

Jet airliner

The 747, DC-10 and L-1011 were the first airliners that had RCM-based maintenance programs.

As a direct result of this research, airline maintenance practices changed radically. RCM-inspired maintenance programs were developed for the Boeing 747, Douglas DC-10 and Lockheed L-1011, and for all subsequent airliners. The contrast with the traditional (pre-RCM) maintenance programs for the Boeing 707 and 727 and Douglas DC-8 was astonishing. The vast majority of component TBOs and life limits were abandoned in favor of an on-condition approach based on monitoring the actual condition of engines and other components and keeping them in service until their condition demonstrably deteriorated to an unacceptable degree. For example, the DC-8 had 339 components with TBOs or life limits, whereas the DC-10 had only seven, and none of them were engines. (Research showed clearly that overhauling engines at a specific TBO didn’t make them safer, and actually did the opposite.) In addition, the amount of scheduled maintenance was drastically reduced. For example, the DC-8 maintenance program required 4,000,000 labor hours of major structural inspections during the aircraft’s first 20,000 hours in service, while the 747 maintenance program called for only 66,000 labor hours, a reduction of nearly two orders of magnitude.
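For anyone who wants to check the “nearly two orders of magnitude” arithmetic, here is a quick Python sketch using only the figures quoted above. The variable names are mine.

```python
import math

# Figures quoted in the paragraph above.
dc8_inspection_hours = 4_000_000   # DC-8 structural inspection labor hours
b747_inspection_hours = 66_000     # 747 structural inspection labor hours
dc8_limited_components = 339       # DC-8 components with TBOs or life limits
dc10_limited_components = 7        # DC-10 components with TBOs or life limits

labor_ratio = dc8_inspection_hours / b747_inspection_hours
orders_of_magnitude = math.log10(labor_ratio)
component_ratio = dc8_limited_components / dc10_limited_components

print(f"Labor-hour reduction factor: {labor_ratio:.1f}x")
print(f"Orders of magnitude: {orders_of_magnitude:.2f}")
print(f"Life-limited component reduction factor: {component_ratio:.1f}x")
```

The labor-hour ratio comes out at roughly 60 to 1, or a bit under 1.8 orders of magnitude, and the count of life-limited components drops by a factor of about 48.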

Greybeard AMTs.

Owner-flown GA, particularly piston GA, is the only remaining segment of aviation that does things the bad old-fashioned way.

Of course, these changes saved the airlines a king’s ransom in reduced maintenance costs and scheduled downtime. At the same time, the airplanes had far fewer maintenance squawks and much better dispatch reliability. (This was the same phenomenon that the RAF experienced during WWII when they followed Waddington’s advice to slash scheduled PM.)

…that hasn’t yet reached piston GA

Today, there’s only one segment of aviation that has NOT adopted the enlightened RCM approach to maintenance, and still does scheduled PM the bad old-fashioned way. Sadly, that segment is owner-flown GA—particularly piston GA—at the bottom of the aviation food chain where a lot of us hang out. I’ll offer some thoughts about that next month.