Roots of Reliability-Centered Maintenance

Tuesday, February 11th, 2014

Last month, I discussed the pioneering WWII-era work of the eminent British scientist C.H. Waddington, who discovered that the scheduled preventive maintenance (PM) being performed on RAF B-24 bombers was actually doing more harm than good, and that drastically cutting back on such PM resulted in spectacular improvement in dispatch reliability of those aircraft. Two decades later, a pair of brilliant American engineers at United Airlines—Stan Nowlan and Howard Heap—independently rediscovered the utter wrongheadedness of traditional scheduled PM, and took things to the next level by formulating a rigorous engineering methodology for creating an optimal maintenance program to maximize safety and dispatch reliability while minimizing cost and downtime. Their approach became known as “Reliability-Centered Maintenance” (RCM), and revolutionized the way maintenance is done in the airline industry, military aviation, high-end bizjets, space flight, and numerous non-aviation applications from nuclear power plants to auto factories.

The “useful life” fallacy

Nowlan and Heap showed the fallacy of two fundamental principles underlying traditional scheduled PM:

  • Components start off being reliable, but their reliability deteriorates with age.
  • The useful life of components can be established statistically, so components can be retired or overhauled before they fail.

It turns out that both of these principles are wrong. To quote Nowlan and Heap:

“One of the underlying assumptions of maintenance theory has always been that there is a fundamental cause-and-effect relationship between scheduled maintenance and operating reliability. This assumption was based on the intuitive belief that because mechanical parts wear out, the reliability of any equipment is directly related to operating age. It therefore followed that the more frequently equipment was overhauled, the better protected it was against the likelihood of failure. The only problem was in determining what age limit was necessary to assure reliable operation.

“In the case of aircraft it was also commonly assumed that all reliability problems were directly related to operating safety. Over the years, however, it was found that many types of failures could not be prevented no matter how intensive the maintenance activities. [Aircraft] designers were able to cope with this problem, not by preventing failures, but by preventing such failures from affecting safety. In most aircraft essential functions are protected by redundancy features which ensure that, in the event of a failure, the necessary function will still be available from some other source.

“Despite the time-honored belief that reliability was directly related to the intervals between scheduled overhauls, searching studies based on actuarial analysis of failure data suggested that the traditional hard-time policies were, apart from their expense, ineffective in controlling failure rates. This was not because the intervals were not short enough, and surely not because the tear down inspections were not sufficiently thorough. Rather, it was because, contrary to expectations, for many items the likelihood of failure did not in fact increase with increasing age. Consequently a maintenance policy based exclusively on some maximum operating age would, no matter what the age limit, have little or no effect on the failure rate.”
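To make that last point concrete, here is a rough sketch of my own (it is not taken from Nowlan and Heap's report). It assumes part lifetimes follow a Weibull distribution, which is purely an illustrative modeling choice, and it computes the long-run number of in-service failures per operating hour when every part is replaced at failure or at a fixed age limit, whichever comes first. The function names, the 1,000-hour scale, and the shape values are all hypothetical.

```python
import math

def survival(t, shape, scale):
    """Weibull probability that a part is still working at age t (hours)."""
    return math.exp(-((t / scale) ** shape))

def failures_per_hour(age_limit, shape, scale, steps=10_000):
    """Long-run in-service failures per operating hour when every part is
    replaced at failure or at age_limit, whichever comes first."""
    dt = age_limit / steps
    # Trapezoid rule: expected hours a part actually serves before removal.
    area = 0.5 * (survival(0.0, shape, scale) + survival(age_limit, shape, scale))
    for i in range(1, steps):
        area += survival(i * dt, shape, scale)
    expected_hours = area * dt
    # Probability the part fails in service before reaching the age limit.
    prob_fails_first = 1.0 - survival(age_limit, shape, scale)
    return prob_fails_first / expected_hours

# Hypothetical parts: shape < 1 is infant mortality, 1 is random, > 1 is wear-out.
for label, shape in [("infant mortality", 0.5), ("random", 1.0), ("wear-out", 3.0)]:
    rates = [failures_per_hour(limit, shape, 1000.0) for limit in (200.0, 1000.0, 2000.0)]
    print(label, ["%.5f" % r for r in rates])
```

Under these assumptions, the "random" part fails at the same rate no matter which age limit is chosen, which is exactly the situation the quote describes. Only the wear-out part benefits from a shorter limit, and the infant-mortality part actually fails more often the sooner it is swapped for a fresh one.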

Winning the war by picking our battles

Another traditional maintenance fallacy was the intuitive notion that aircraft component failures are dangerous and need to be prevented through PM. A major focus of RCM was to identify the ways that various components fail, and then evaluate the frequency and consequences of those failures. This is known as “Failure Modes and Effects Analysis” (FMEA). Researchers found that while certain failure modes have serious consequences that can compromise safety (e.g., a cracked wing spar), the overwhelming majority of component failures have no safety impact and have consequences that are quite acceptable (e.g., a failed #2 comm radio or #3 hydraulic pump). Under the RCM philosophy, it makes no sense whatsoever to perform PM on components whose failure has acceptable consequences; the optimal maintenance approach for such components is simply to leave them alone, wait until they fail, and then replace or repair them when they do. This strategy is known as “run to failure” and is a major tenet of RCM.
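The triage described above can be sketched in a few lines. This is a deliberately oversimplified illustration of the idea, not the actual RCM decision logic. The `FailureMode` fields, the `maintenance_strategy` function, and the wording of the returned strategies are all my own hypothetical names, and the yes/no judgments on each example are assumptions based on the examples in the paragraph.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailureMode:
    item: str
    mode: str
    affects_safety: bool
    consequences_acceptable: bool

def maintenance_strategy(fm: FailureMode) -> str:
    """Pick a strategy from the consequences of failure, not from the part's age."""
    if fm.affects_safety or not fm.consequences_acceptable:
        # Serious consequences: worth a proactive task such as inspection
        # or condition monitoring.
        return "proactive task"
    # Acceptable consequences: leave it alone and fix it when it fails.
    return "run to failure"

# Examples from the text, with hypothetical classifications.
modes = [
    FailureMode("wing spar", "crack", affects_safety=True, consequences_acceptable=False),
    FailureMode("#2 comm radio", "stops working", affects_safety=False, consequences_acceptable=True),
    FailureMode("#3 hydraulic pump", "stops working", affects_safety=False, consequences_acceptable=True),
]
for fm in modes:
    print(f"{fm.item}: {maintenance_strategy(fm)}")
```

The point of the sketch is that the part's age never appears in the decision; only the consequences of the failure do.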

A maintenance revolution…

As a direct result of this research, airline maintenance practices changed radically. RCM-inspired maintenance programs were developed for the Boeing 747, Douglas DC-10 and Lockheed L-1011, and for all subsequent airliners. The contrast with the traditional (pre-RCM) maintenance programs for the Boeing 707 and 727 and Douglas DC-8 was astonishing. The vast majority of component TBOs (times between overhauls) and life limits were abandoned in favor of an on-condition approach: monitoring the actual condition of engines and other components and keeping them in service until their condition demonstrably deteriorated to an unacceptable degree. For example, the DC-8 had 339 components with TBOs or life limits, whereas the DC-10 had only seven, and none of them were engines. (Research showed clearly that overhauling engines at a specific TBO didn’t make them safer, and actually did the opposite.) In addition, the amount of scheduled maintenance was drastically reduced. For example, the DC-8 maintenance program required 4,000,000 labor hours of major structural inspections during the aircraft’s first 20,000 hours in service, while the 747 maintenance program called for only 66,000 labor hours, a roughly 60-fold reduction.

Of course, these changes saved the airlines a king’s ransom in reduced maintenance costs and scheduled downtime. At the same time, the airplanes had far fewer maintenance squawks and much better dispatch reliability. (This was the same phenomenon that the RAF experienced during WWII when they followed Waddington’s advice to slash scheduled PM.)

…that hasn’t yet reached piston GA

Today, there’s only one segment of aviation that has NOT adopted the enlightened RCM approach to maintenance, and still does scheduled PM the bad old-fashioned way. Sadly, that segment is owner-flown GA—particularly piston GA—at the bottom of the aviation food chain where a lot of us hang out. I’ll offer some thoughts about that next month.

The Waddington Effect

Tuesday, January 14th, 2014

In 1943, a British scientist named Conrad Hal (C.H.) Waddington made a remarkable discovery about aircraft maintenance. He was a most unlikely person to make this discovery, because he wasn’t an aeronautical engineer or an aircraft mechanic or even a pilot. Actually, he was a gifted developmental biologist, paleontologist, geneticist, embryologist, philosopher, poet and painter who wasn’t particularly interested in aviation. But like many other British scientists at that time, his career was interrupted by the outbreak of the Second World War and he found himself pressed into service with the Royal Air Force (RAF).

Waddington wound up reporting to the RAF Coastal Command, heading up a group of fellow scientists in the Coastal Command Operational Research Section.  Its job was to advise the British military on how it could more effectively combat the threat from German submarines.  In that capacity, Waddington and his colleagues developed a series of astonishing recommendations that defied military conventional wisdom of the time.

For example, the bombers used to hunt and kill U-boats were mostly painted black in order to make them difficult to see.  But Waddington’s group ran a series of experiments that proved that bombers painted white were not spotted by the U-boats until they were 20% closer, resulting in a 30% increase in successful sinkings. Waddington’s group also recommended that the depth charges dropped by the bombers be set to explode at a depth of 25 feet instead of 100 feet.  This recommendation—initially resisted strongly by RAF commanders—ultimately resulted in a sevenfold increase in the number of U-boats destroyed.

Waddington subsequently turned his attention to the problem of “force readiness” of the bombers.  The Coastal Command’s B-24 “Liberator” bombers were spending an inordinate amount of time in the maintenance shop instead of hunting U-boats.  In July 1943, the two British Liberator squadrons located at Ballykelly, Northern Ireland, consisted of 40 aircraft, but at any given time only about 20 were flight-ready.  The other aircraft were down for any number of reasons, but mostly undergoing or awaiting maintenance—either scheduled or unscheduled—or waiting for replacement parts.

At that time, conventional wisdom held that if more preventive maintenance were performed on each aircraft, fewer problems would arise and more incipient problems would be caught and fixed—and thus fleet readiness would surely improve. It turned out that conventional wisdom was wrong. It would take C.H. Waddington and his Operational Research team to prove just how wrong.

Waddington and his team started gathering data about the scheduled and unscheduled maintenance of these aircraft, and began crunching and analyzing the numbers.  When he plotted the number of unscheduled aircraft repairs as a function of flight time, Waddington discovered something both unexpected and significant: The number of unscheduled repairs spiked sharply right after each aircraft underwent its regular 50-hour scheduled maintenance, and then declined steadily over time until the next scheduled 50-hour maintenance, at which time they spiked up once again.

When Waddington examined the plot of this repair data, he concluded that the scheduled maintenance (in Waddington’s own words) “tends to INCREASE breakdowns, and this can only be because it is doing positive harm by disturbing a relatively satisfactory state of affairs. There is no sign that the rate of breakdowns is starting to increase again after 40-50 flying hours when the aircraft is coming due for its next scheduled maintenance.” In other words, the observed pattern of unscheduled repairs demonstrated that the scheduled preventive maintenance was actually doing more harm than good, and that the 50-hour preventive maintenance interval was inappropriately short.
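For anyone who wants to look for the same pattern in their own maintenance logs, here is a minimal sketch of the kind of tally involved. It assumes scheduled maintenance falls exactly on multiples of the interval on the flight-hour clock, which real logs will not do, and the repair times below are numbers I made up purely for illustration. They are not Waddington's data, and the function name is hypothetical.

```python
from collections import Counter

def repairs_by_hours_since_service(repair_hours, interval=50.0, bin_width=10.0):
    """Count unscheduled repairs per bin of flight hours since the most
    recent scheduled maintenance (assumed at every multiple of interval)."""
    n_bins = int(interval // bin_width)
    counts = Counter()
    for h in repair_hours:
        since_service = h % interval
        # Clamp to the last bin to guard against floating-point edge cases.
        counts[min(int(since_service // bin_width), n_bins - 1)] += 1
    return [counts[i] for i in range(n_bins)]

# Made-up flight-hour readings at which unscheduled repairs were logged.
repair_hours = [1, 3, 4, 8, 12, 15, 22, 31, 44, 51, 52, 55, 57, 63, 66, 74,
                88, 101, 102, 104, 109, 111, 118, 127, 135]

for i, n in enumerate(repairs_by_hours_since_service(repair_hours)):
    print(f"{i * 10:2d}-{i * 10 + 10:2d} h after service: {'#' * n}")
```

If the bars are tallest right after each service and shrink from there, with no upturn toward the end of the interval, that is the signature Waddington described.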

The solution proposed by Waddington’s team (and ultimately accepted by the RAF commanders over the howls of the maintenance personnel) was to increase the time interval between scheduled maintenance cycles, and to eliminate all preventive maintenance tasks that could not be shown to be beneficial. Once these recommendations were implemented, the number of effective flying hours of the RAF Coastal Command bomber fleet increased by 60 percent!

Fast forward two decades to the 1960s, when a pair of gifted scientists at United Airlines (aeronautical engineer Stanley Nowlan and mathematician Howard Heap) independently rediscovered these principles. Their pioneering research on optimizing maintenance revolutionized the way it is done in air transport, military aviation, high-end bizjets and many non-aviation industrial applications. They were almost certainly unaware of the work of C.H. Waddington and his colleagues in Britain in the 1940s, because that work remained classified until 1973, when Waddington’s meticulously kept diary of his wartime research activities was declassified and published.

Next time, I’ll discuss the fascinating work of Nowlan and Heap on what came to be known as “Reliability-Centered Maintenance.” But for now, I will leave you with the major takeaway from Waddington’s research during World War II: Maintenance isn’t an inherently good thing (like exercise); it’s a necessary evil (like surgery). We have to do it from time to time, but we sure don’t want to do more than absolutely necessary to keep our aircraft safe and reliable. Doing more maintenance than necessary actually degrades safety and reliability.