Archive for the ‘Mike Busch’ Category

How Do Piston Aircraft Engines Fail?

Wednesday, April 9th, 2014

Last month, I tried to make the case that piston aircraft engines should be overhauled strictly on-condition, not at some fixed TBO. If we’re going to do that, we need to understand how these engines fail and how we can protect ourselves against such failures. The RCM way of doing that is called Failure Modes and Effects Analysis (FMEA), and involves examining each critical component of these engines and looking at how they fail, what consequences those failures have, and what practical and cost-efficient maintenance actions we can take to prevent or mitigate those failures. Here’s my quick back-of-the-envelope attempt at doing that…

Crankshaft

There’s no more serious failure mode than crankshaft failure. If it fails, the engine quits.

Yet crankshafts are rarely replaced at overhaul. Lycoming did a study that showed their crankshafts often remain in service for more than 14,000 hours (that’s 7+ TBOs) and 50 years. Continental hasn’t published any data on this, but their crankshafts probably have similar longevity.

Crankshafts fail in three ways: (1) infant-mortality failures due to improper materials or manufacture; (2) failures following unreported prop strikes; and (3) failures secondary to oil starvation and/or bearing failure.

Over the past 15 years, we’ve seen a rash of infant-mortality failures of crankshafts. Both Continental and Lycoming have had major recalls of crankshafts that were either forged from bad steel or were damaged during manufacture. These failures invariably occurred within the first 200 hours after the new crankshaft entered service. If the crankshaft survived its first 200 hours, we can be confident that it was manufactured correctly and should perform reliably for numerous TBOs.

Unreported prop strikes seem to be getting rare because owners and mechanics are becoming smarter about the high risk of operating an engine after a prop strike. There’s now an AD mandating a post-prop-strike engine teardown for Lycoming engines, and a strongly worded service bulletin for Continental engines. Insurance will always pay for the teardown and any necessary repairs, so it’s a no-brainer.

That leaves failures due to oil starvation and/or bearing failure. I’ll address that shortly.

Crankcase

Crankcases are also rarely replaced at major overhaul. They are typically repaired as necessary, align-bored to restore critical fits and limits, and often provide reliable service for many TBOs. If the case remains in service long enough, it will eventually crack. The good news is that case cracks propagate slowly enough that a detailed visual inspection once a year is sufficient to detect such cracks before they pose a threat to safety. Engine failures caused by case cracks are extremely rare—so rare that I don’t think I ever remember hearing or reading about one.

Camshaft and Lifters

The cam/lifter interface endures more pressure and friction than any other moving parts in the engine. The cam lobes and lifter faces must be hard and smooth in order to function and survive. Even tiny corrosion pits (caused by disuse or acid buildup in the oil) can lead to rapid destruction (spalling) of the surfaces and dictate the need for a premature engine teardown. Cam and lifter spalling is the number one reason that engines fail to make TBO, and it’s becoming an epidemic in the owner-flown fleet where aircraft tend to fly irregularly and sit unflown for weeks at a time.

The good news is that cam and lifter problems almost never cause catastrophic engine failures. Even with a badly spalled cam lobe (like the one pictured at right), the engine continues to run and make good power. Typically, a problem like this is discovered at a routine oil change when the oil filter is cut open and found to contain a substantial quantity of ferrous metal, or else a cylinder is removed for some reason and the worn cam lobe can be inspected visually.

If the engine is flown regularly, the cam and lifters can remain in pristine condition for thousands of hours. At overhaul, the cam and lifters are often replaced with new ones, although a reground cam and reground lifters are sometimes used and can be just as reliable.

Gears

The engine has lots of gears: crankshaft and camshaft gears, oil pump gears, accessory drive gears for fuel pump, magnetos, prop governor, and sometimes alternator. These gears are made of case-hardened steel and typically have a very long useful life. They are not usually replaced at overhaul unless obvious damage is found. Engine gears rarely cause catastrophic engine failures.

Oil Pump

Failure of the oil pump is rarely responsible for catastrophic engine failures. If oil pressure is lost, the engine will seize quickly. But the oil pump is dead-simple, consisting of two steel gears inside a close-tolerance aluminum housing, and usually operates trouble free. The pump housing can get scored if a chunk of metal passes through the oil pump—although the oil pickup tube has a suction screen to make sure that doesn’t happen—but even if the pump housing is damaged, the pump normally has ample output to maintain adequate oil pressure in flight, and the problem is mainly noticeable during idle and taxi. If the pump output seems deficient at idle, the oil pump housing can be removed and replaced without tearing down the engine.

Bearings

Bearing failure is responsible for a significant number of catastrophic engine failures. Under normal circumstances, bearings have a long useful life. They are always replaced at major overhaul, but it’s not unusual for bearings removed at overhaul to be in pristine condition with little detectable wear.

Bearings fail prematurely for three reasons: (1) they become contaminated with metal from some other failure; (2) they become oil-starved when oil pressure is lost; or (3) main bearings become oil-starved because they shift in their crankcase supports to the point where their oil supply holes become misaligned (as with the “spun bearing” pictured at right).

Contamination failures can generally be prevented by using a full-flow oil filter and inspecting the filter for metal at every oil change. So long as the filter is changed before its filtering capacity is exceeded, metal particles will be caught by the filter and won’t get into the engine’s oil galleries and contaminate the bearings. If a significant quantity of metal is found in the filter, the aircraft should be grounded until the source of the metal is found and corrected.

Oil-starvation failures are fairly rare. Pilots tend to be well-trained to respond to decreasing oil pressure by reducing power and landing at the first opportunity. Bearings will continue to function properly at partial power even with fairly low oil pressure.

Spun bearings are usually infant-mortality failures that occur either shortly after an engine is overhauled (due to an assembly error) or shortly after cylinder replacement (due to lack of preload on the through bolts). Failures occasionally occur after a long period of crankcase fretting, but such fretting is usually detectable through oil filter inspection and oil analysis. They can also occur after extreme unpreheated cold starts, but that is quite rare.

Connecting Rods

Connecting rod failure is responsible for a significant number of catastrophic engine failures. When a rod fails in flight, it often punches a hole in the crankcase (“thrown rod”) and causes loss of engine oil and subsequent oil starvation. Rod failures have also been known to cause camshaft breakage. The result is invariably a rapid and often total loss of engine power.

Connecting rods usually have a long useful life and are not normally replaced at overhaul. (Rod bearings, like all bearings, are always replaced at overhaul.) Many rod failures are infant-mortality failures caused by improper tightening of the rod cap bolts during engine assembly. Rod failures can also be caused by the failure of the rod bearings, often due to oil starvation. Such failures are usually random failures unrelated to time since overhaul.

Pistons and Rings

Piston and ring failures usually cause only partial power loss, but in rare cases can cause complete power loss. Piston and ring failures are of two types: (1) infant-mortality failures due to improper manufacture or assembly; and (2) heat-distress failures caused by pre-ignition or destructive detonation events. Heat-distress failures can be caused by contaminated fuel (e.g., 100LL laced with Jet A), or by improper engine operation. They are generally unrelated to hours or years since overhaul. A digital engine monitor can alert the pilot to pre-ignition or destructive detonation events in time for the pilot to take corrective action before heat-distress damage is done.

Cylinders

Cylinder failures usually cause only partial power loss, but occasionally can cause complete power loss. A cylinder consists of a forged steel barrel mated to an aluminum alloy head casting. Cylinder barrels typically wear slowly, and excessive wear is detected at annual inspection by means of compression tests and borescope inspections. Cylinder heads can suffer fatigue failures, and occasionally the head can separate from the barrel. As dramatic as it sounds, a head separation causes only a partial loss of power; a six-cylinder engine with a head-to-barrel separation can still make better than 80% power. Cylinder failures can be infant-mortality failures (due to improper manufacture) or age-related failures (especially if the cylinder head remains in service for more than two or three TBOs). Nowadays, most major overhauls include new cylinders, so age-related cylinder failures have become quite rare.

Valves and Valve Guides

It is quite common for exhaust valves and valve guides to develop problems well short of TBO. Actual valve failures are becoming much less common nowadays because incipient problems can usually be detected by means of borescope inspections and digital engine monitor surveillance. Even if a valve fails completely, the result is usually only partial power loss and an on-airport emergency landing.

Rocker Arms and Pushrods

Rocker arms and pushrods (which operate the valves) typically have a long useful life and are not normally replaced at overhaul. (Rocker bushings, like all bearings, are always replaced at overhaul.) Rocker arm failure is quite rare. Pushrod failures are caused by stuck valves, and can almost always be avoided through regular borescope inspections. Even when they happen, such failures usually result in only partial power loss.

Magnetos and Other Ignition Components

Magneto failure is uncomfortably commonplace. Mags are full of plastic components that are less than robust; plastic is used because it’s non-conductive. Fortunately, our aircraft engines are equipped with dual magnetos for redundancy, and the probability of both magnetos failing simultaneously is extremely remote. Mag checks during preflight runup can detect gross ignition system failures, but in-flight mag checks are far better at detecting subtle or incipient failures. Digital engine monitors can reliably detect ignition system malfunctions in real time if the pilot is trained to interpret the data. Magnetos should religiously be disassembled, inspected and serviced every 500 hours; doing so drastically reduces the likelihood of an in-flight magneto failure.

The Bottom Line

The bottom-end components of our piston aircraft engines (crankcase, crankshaft, camshaft, bearings, gears, oil pump, etc.) are very robust. They normally exhibit long useful lives that are many multiples of published TBOs. Most of these bottom-end components (with the notable exception of bearings) are routinely reused at major overhaul and not replaced on a routine basis. When these items do fail prematurely, the failures are mostly infant-mortality failures that occur shortly after the engine is built, rebuilt or overhauled, or they are random failures unrelated to hours or years in service. The vast majority of random failures can be detected long before they get bad enough to cause an in-flight engine failure simply by means of routine oil-filter inspection and laboratory oil analysis.

The top-end components (pistons, cylinders, valves, etc.) are considerably less robust. It is not at all unusual for top-end components to fail prior to TBO. However, most of these failures can be prevented by regular borescope inspections and by use of modern digital engine monitors. Even when they happen, top-end failures usually result in only partial power loss and a successful on-airport landing, and they usually can be resolved without having to remove the engine from the aircraft and send it to an engine shop. Most top-end failures are infant-mortality or random failures that do not correlate with time since overhaul.

The bottom line is that a detailed FMEA of piston aircraft engines strongly suggests that the traditional practice of fixed-interval engine overhaul or replacement is unwarranted and counterproductive. A conscientiously applied program of condition monitoring that includes regular oil filter inspection, oil analysis, borescope inspections and digital engine monitor data analysis can yield improved reliability and much reduced expense and downtime.
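For readers who like to see an FMEA laid out as data, here is a minimal sketch in Python of how part of the component-by-component rundown above might be tabulated. The class name, field names, and the coarse "total / partial / minor" labels are my own simplifications for illustration, not part of any formal RCM worksheet, and the table covers only a handful of the components discussed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    component: str
    typical_effect: str   # coarse label (an assumption): "total", "partial" or "minor"
    monitoring: tuple     # condition-monitoring methods named in the text above


# A partial table condensed from the discussion above (illustrative, not exhaustive).
FMEA = [
    FailureMode("bearings", "total", ("oil filter inspection", "oil analysis")),
    FailureMode("cam and lifters", "minor", ("oil filter inspection",)),
    FailureMode("pistons and rings", "partial", ("digital engine monitor",)),
    FailureMode("cylinders", "partial", ("compression test", "borescope inspection")),
    FailureMode("valves and guides", "partial",
                ("borescope inspection", "digital engine monitor")),
]


def components_monitored_by(method):
    """Return the components whose failures the given method helps catch."""
    return [fm.component for fm in FMEA if method in fm.monitoring]


print(components_monitored_by("borescope inspection"))
print(components_monitored_by("oil filter inspection"))
```

Laying it out this way makes the point of the post easy to see: each monitoring method covers several failure modes at once, which is why a small set of routine checks can stand in for a fixed-interval overhaul.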

Do Piston Engine TBOs Make Sense?

Thursday, March 13th, 2014

Last month, I discussed the pioneering work on Reliability-Centered Maintenance (RCM) done by United Airlines scientists Stan Nowlan and Howard Heap in the 1960s, and I bemoaned the fact that RCM has not trickled down the aviation food chain to piston GA. Even in the 21st century, maintenance of piston aircraft remains largely time-based rather than condition-based.

Most owners of piston GA aircraft dutifully overhaul their engines at TBO, overhaul their propellers every 5 to 7 years, and replace their alternators and vacuum pumps every 500 hours just as Continental, Lycoming, Hartzell, McCauley, HET and Parker Aerospace call for. Many Bonanza and Baron owners have their wing bolts pulled every five years, and most Cirrus owners have their batteries replaced every two years for no good reason (other than that it’s in the manufacturer’s maintenance manual).

Despite an overwhelming body of scientific research demonstrating that this sort of 1950s-vintage time-based preventive maintenance is counterproductive, worthless, unnecessary, wasteful and incredibly costly, we’re still doing it. Why?

Mostly, I think, because of fear of litigation. The manufacturers are afraid to change anything for fear of being sued (because if they change anything, that could be construed to mean that what they were doing before was wrong). Our shops and mechanics are afraid to deviate from what the manufacturers recommend for fear of being sued (because they deviated from manufacturers’ guidance).

Let’s face it: Neither the manufacturers nor the maintainers have any real incentive to change. The cost of doing all this counterproductive, worthless, unnecessary and wasteful preventive maintenance (that actually doesn’t prevent anything) is not coming out of their pockets. Actually, it’s going into their pockets.

If we’re going to drag piston GA maintenance kicking and screaming into the 21st century (or at least out of the 1950s and into the 1960s), it’s going to have to be aircraft owners who force the change. Owners are the ones with the incentive to change the way things are being done. Owners are the ones who can exert power over the manufacturers and maintainers by voting with their feet and their credit cards.

For this to happen, owners of piston GA aircraft need to understand the right way to do maintenance—the RCM way. Then they need to direct their shops and mechanics to maintain their aircraft that way, or take their maintenance business to someone who will. This means that owners need both knowledge and courage. Providing aircraft owners both of these things is precisely why I’m contributing to this AOPA Opinion Leaders Blog.

When are piston aircraft engines most likely to hurt you?

Fifty years ago, RCM researchers proved conclusively that overhauling turbine engines at a fixed TBO is counterproductive, and that engine overhauls should be done strictly on-condition. But how can we be sure that this also applies to piston aircraft engines?

In a perfect world, Continental and Lycoming would study this issue and publish their findings. But for reasons mentioned earlier, this ain’t gonna happen. Continental and Lycoming have consistently refused to release any data on engine failure history of their engines, and likewise have consistently refused to explain how they arrive at the TBOs that they publish. For years, one aggressive plaintiff lawyer after another has tried to compel Continental and Lycoming to answer these questions in court. All have failed miserably.

So if we’re going to get answers to these critical questions, we’re going to have to rely on engine failure data that we can get our hands on. The most obvious source of such data is the NTSB accident database. That’s precisely what brilliant mechanical engineer Nathan T. Ulrich Ph.D. of Lee NH did in 2007. (Dr. Ulrich also was a US Coast Guard Auxiliary pilot who was unhappy that USCGA policy forbade him from flying volunteer search-and-rescue missions if his Bonanza’s engine was past TBO.)

Dr. Ulrich analyzed five years’ worth of NTSB accident data for the period 2001-2005 inclusive, examining all accidents involving small piston-powered airplanes (under 12,500 lbs. gross weight) for which the NTSB identified “engine failure” as either the probable cause or a contributing factor. From this population of accidents, Dr. Ulrich eliminated those involving air-race and agricultural-application aircraft. Then he analyzed the relationship between the frequency of engine-failure accidents and the number of hours on the engine since it was last built, rebuilt or overhauled. He did a similar analysis based on the calendar age of the engine since it was last built, rebuilt or overhauled. The following histograms show the results of his study:

Ulrich study (hours)

Ulrich study (years)
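The kind of tally behind histograms like these can be sketched in a few lines of Python. This is only an illustration of the binning idea, not Dr. Ulrich's actual method or data: the function name, the 500-hour bin width, and the sample hours below are all placeholders I made up.

```python
from collections import Counter


def failures_by_bin(hours_since_overhaul, bin_width=500):
    """Count engine-failure accidents per bin of hours since build/overhaul.

    bin_width is an assumed value; the study's real bin size is not given here.
    Returns a dict mapping each bin's starting hour to the number of accidents.
    """
    counts = Counter((int(h) // bin_width) * bin_width for h in hours_since_overhaul)
    return dict(sorted(counts.items()))


# Placeholder numbers only, NOT taken from the NTSB database or the study.
sample_hours = [12, 80, 150, 430, 620, 1100, 1900]
print(failures_by_bin(sample_hours))
# -> {0: 4, 500: 1, 1000: 1, 1500: 1}
```

The same function works for the calendar-age histogram if you feed it years since overhaul and pick a bin width of, say, one or two years.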

If these histograms have a vaguely familiar look, it might be because they look an awful lot like the histograms generated by British scientist C.H. Waddington in 1943.

Now, we have to be careful about how we interpret Dr. Ulrich’s findings. Ulrich would be the first to agree that NTSB accident data can’t tell us much about the risk of engine failures beyond TBO, simply because most piston aircraft engines are voluntarily euthanized at or near TBO. So it shouldn’t be surprising that we don’t see very many engine failure accidents involving engines significantly past TBO, since there are so few of them flying. (The engines on my Cessna 310 are at more than 205% of TBO, but there just aren’t a lot of RCM true believers like me in the piston GA community…yet.)

What Dr. Ulrich’s research demonstrates unequivocally is the striking and disturbing frequency of “infant-mortality” engine-failure accidents during the first few years and first few hundred hours after an engine is built, rebuilt or overhauled. Ulrich’s findings make it indisputably clear that by far the most likely time for you to fall out of the sky due to a catastrophic engine failure is when the engine is young, not when it’s old.

(The next most likely time for you to fall out of the sky is shortly after invasive engine maintenance in the field, particularly cylinder replacement, but that’s a subject for a future blog post…stay tuned!)

So… Is there a good reason to overhaul your engine at TBO?

It doesn’t take a rocket scientist (or a Ph.D. in mechanical engineering) to figure out what all this means. If your engine reaches TBO and still gives every indication of being healthy (good performance, not making metal, healthy-looking oil analysis and borescope results, etc.), overhauling it will clearly degrade safety, not improve it. That’s simply because it will convert your low-risk old engine into a high-risk young engine. I don’t know about you, but that certainly strikes me as a remarkably dumb thing to do.

So why is overhauling on-condition such a tough sell to our mechanics and the engine manufacturers? The counter-argument goes something like this: “Since we have so little data about the reliability of past-TBO engines (because most engines are arbitrarily euthanized at TBO), how can we be sure that it’s safe to operate them beyond TBO?” RCM researchers refer to this as “the Resnikoff Conundrum” (after mathematician H.L. Resnikoff).

To me, it looks an awful lot like the same circular argument that was used for decades to justify arbitrarily euthanizing airline pilots at age 60, despite the fact that aeromedical experts were unanimous that this policy made no sense whatsoever. Think about it…

Roots of Reliability-Centered Maintenance

Tuesday, February 11th, 2014

Last month, I discussed the pioneering WWII-era work of the eminent British scientist C.H. Waddington, who discovered that the scheduled preventive maintenance (PM) being performed on RAF B-24 bombers was actually doing more harm than good, and that drastically cutting back on such PM resulted in spectacular improvement in dispatch reliability of those aircraft. Two decades later, a pair of brilliant American engineers at United Airlines—Stan Nowlan and Howard Heap—independently rediscovered the utter wrongheadedness of traditional scheduled PM, and took things to the next level by formulating a rigorous engineering methodology for creating an optimal maintenance program to maximize safety and dispatch reliability while minimizing cost and downtime. Their approach became known as “Reliability-Centered Maintenance” (RCM), and revolutionized the way maintenance is done in the airline industry, military aviation, high-end bizjets, space flight, and numerous non-aviation applications from nuclear power plants to auto factories.

RCM wear-out curve

The traditional approach to PM assumes that most components start out reliable, and then at some point start becoming unreliable as they age.

The “useful life” fallacy

Nowlan and Heap showed the fallacy of two fundamental principles underlying traditional scheduled PM:

  • Components start off being reliable, but their reliability deteriorates with age.
  • The useful life of components can be established statistically, so components can be retired or overhauled before they fail.

It turns out that both of these principles are wrong. To quote Nowlan and Heap:

“One of the underlying assumptions of maintenance theory has always been that there is a fundamental cause-and-effect relationship between scheduled maintenance and operating reliability. This assumption was based on the intuitive belief that because mechanical parts wear out, the reliability of any equipment is directly related to operating age. It therefore followed that the more frequently equipment was overhauled, the better protected it was against the likelihood of failure. The only problem was in determining what age limit was necessary to assure reliable operation.

“In the case of aircraft it was also commonly assumed that all reliability problems were directly related to operating safety. Over the years, however, it was found that many types of failures could not be prevented no matter how intensive the maintenance activities. [Aircraft] designers were able to cope with this problem, not by preventing failures, but by preventing such failures from affecting safety. In most aircraft essential functions are protected by redundancy features which ensure that, in the event of a failure, the necessary function will still be available from some other source.

RCM six curves

RCM researchers found that only 2% of aircraft components have failures that are predominantly age-related (curve B), and that 68% have failures that are primarily infant mortality (curve F).

“Despite the time-honored belief that reliability was directly related to the intervals between scheduled overhauls, searching studies based on actuarial analysis of failure data suggested that the traditional hard-time policies were, apart from their expense, ineffective in controlling failure rates. This was not because the intervals were not short enough, and surely not because the tear down inspections were not sufficiently thorough. Rather, it was because, contrary to expectations, for many items the likelihood of failure did not in fact increase with increasing age. Consequently a maintenance policy based exclusively on some maximum operating age would, no matter what the age limit, have little or no effect on the failure rate.”

[F. Stanley Nowlan and Howard F. Heap, “Reliability-Centered Maintenance” 1978, DoD Report Number AD-A066579.]

Winning the war by picking our battles

Another traditional maintenance fallacy was the intuitive notion that aircraft component failures are dangerous and need to be prevented through PM. A major focus of RCM was to identify the ways that various components fail, and then evaluate the frequency and consequences of those failures. This is known as “Failure Modes and Effects Analysis” (FMEA). Researchers found that while certain failure modes have serious consequences that can compromise safety (e.g., a cracked wing spar), the overwhelming majority of component failures have no safety impact and have consequences that are quite acceptable (e.g., a failed #2 comm radio or #3 hydraulic pump). Under the RCM philosophy, it makes no sense whatsoever to perform PM on components whose failure has acceptable consequences; the optimal maintenance approach for such components is simply to leave them alone, wait until they fail, and then replace or repair them when they do. This strategy is known as “run to failure” and is a major tenet of RCM.

A maintenance revolution…

The 747, DC-10 and L-1011 were the first airliners that had RCM-based maintenance programs.

As a direct result of this research, airline maintenance practices changed radically. RCM-inspired maintenance programs were developed for the Boeing 747, Douglas DC-10 and Lockheed L-1011, and for all subsequent airliners. The contrast with the traditional (pre-RCM) maintenance programs for the Boeing 707 and 727 and Douglas DC-8 was astonishing. The vast majority of component TBOs and life limits were abandoned in favor of an on-condition approach based on monitoring the actual condition of engines and other components and keeping them in service until their condition demonstrably deteriorated to an unacceptable degree. For example, the DC-8 had 339 components with TBOs or life limits, whereas the DC-10 had only seven, and none of them were engines. (Research showed clearly that overhauling engines at a specific TBO didn’t make them safer, and actually did the opposite.) In addition, the amount of scheduled maintenance was drastically reduced. For example, the DC-8 maintenance program required 4,000,000 labor hours of major structural inspections during the aircraft’s first 20,000 hours in service, while the 747 maintenance program called for only 66,000 labor hours, a reduction of nearly two orders of magnitude.
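If you want to check the arithmetic behind those comparisons, here is a quick Python sketch. The variable names are mine; the figures are the ones quoted in the paragraph above.

```python
import math

# Major structural inspection labor hours over the first 20,000 hours in service.
dc8_labor_hours = 4_000_000
b747_labor_hours = 66_000

labor_ratio = dc8_labor_hours / b747_labor_hours   # about 60.6
orders_of_magnitude = math.log10(labor_ratio)      # about 1.78

# Components with a TBO or life limit.
dc8_limited_components = 339
dc10_limited_components = 7
component_ratio = dc8_limited_components / dc10_limited_components  # about 48.4

print(f"Labor-hour reduction: {labor_ratio:.1f}x "
      f"({orders_of_magnitude:.2f} orders of magnitude)")
print(f"Life-limited component reduction: {component_ratio:.1f}x")
```

So the labor-hour figure works out to roughly a 60-fold reduction, and the count of life-limited components dropped by a factor of about 48.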

Owner-flown GA, particularly piston GA, is the only remaining segment of aviation that does things the bad old-fashioned way.

Of course, these changes saved the airlines a king’s ransom in reduced maintenance costs and scheduled downtime. At the same time, the airplanes had far fewer maintenance squawks and much better dispatch reliability. (This was the same phenomenon that the RAF experienced during WWII when they followed Waddington’s advice to slash scheduled PM.)

…that hasn’t yet reached piston GA

Today, there’s only one segment of aviation that has NOT adopted the enlightened RCM approach to maintenance, and still does scheduled PM the bad old-fashioned way. Sadly, that segment is owner-flown GA—particularly piston GA—at the bottom of the aviation food chain where a lot of us hang out. I’ll offer some thoughts about that next month.

The Waddington Effect

Tuesday, January 14th, 2014

C.H. Waddington (1905-1975)

In 1943, a British scientist named Conrad Hal (C.H.) Waddington made a remarkable discovery about aircraft maintenance.  He was a most unlikely person to make this discovery, because he wasn’t an aeronautical engineer or an aircraft mechanic or even a pilot.  Actually, he was a gifted developmental biologist, paleontologist, geneticist, embryologist, philosopher, poet and painter who wasn’t particularly interested in aviation.  But like many other British scientists at that time, his career was interrupted by the outbreak of the Second World War and he found himself pressed into service with the Royal Air Force (RAF).

Waddington wound up reporting to the RAF Coastal Command, heading up a group of fellow scientists in the Coastal Command Operational Research Section.  Its job was to advise the British military on how it could more effectively combat the threat from German submarines.  In that capacity, Waddington and his colleagues developed a series of astonishing recommendations that defied military conventional wisdom of the time.

For example, the bombers used to hunt and kill U-boats were mostly painted black in order to make them difficult to see.  But Waddington’s group ran a series of experiments that proved that bombers painted white were not spotted by the U-boats until they were 20% closer, resulting in a 30% increase in successful sinkings. Waddington’s group also recommended that the depth charges dropped by the bombers be set to explode at a depth of 25 feet instead of 100 feet.  This recommendation—initially resisted strongly by RAF commanders—ultimately resulted in a sevenfold increase in the number of U-boats destroyed.

Consolidated B-24 “Liberator” bomber

Waddington subsequently turned his attention to the problem of “force readiness” of the bombers.  The Coastal Command’s B-24 “Liberator” bombers were spending an inordinate amount of time in the maintenance shop instead of hunting U-boats.  In July 1943, the two British Liberator squadrons located at Ballykelly, Northern Ireland, consisted of 40 aircraft, but at any given time only about 20 were flight-ready.  The other aircraft were down for any number of reasons, but mostly undergoing or awaiting maintenance—either scheduled or unscheduled—or waiting for replacement parts.

At that time, conventional wisdom held that if more preventive maintenance were performed on each aircraft, fewer problems would arise and more incipient problems would be caught and fixed—and thus fleet readiness would surely improve. It turned out that conventional wisdom was wrong. It would take C.H. Waddington and his Operational Research team to prove just how wrong.

Waddington and his team started gathering data about the scheduled and unscheduled maintenance of these aircraft, and began crunching and analyzing the numbers.  When he plotted the number of unscheduled aircraft repairs as a function of flight time, Waddington discovered something both unexpected and significant: The number of unscheduled repairs spiked sharply right after each aircraft underwent its regular 50-hour scheduled maintenance, and then declined steadily over time until the next scheduled 50-hour maintenance, at which time they spiked up once again.

Waddington Effect graph

When Waddington examined the plot of this repair data, he concluded that the scheduled maintenance (in Waddington’s own words) “tends to INCREASE breakdowns, and this can only be because it is doing positive harm by disturbing a relatively satisfactory state of affairs. There is no sign that the rate of breakdowns is starting to increase again after 40-50 flying hours when the aircraft is coming due for its next scheduled maintenance.” In other words, the observed pattern of unscheduled repairs demonstrated that the scheduled preventive maintenance was actually doing more harm than good, and that the 50-hour preventive maintenance interval was inappropriately short.
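To see how a plot like Waddington's can be built from a repair log, here is a small Python sketch. It is my own illustration, not a reconstruction of his analysis: it assumes scheduled maintenance lands exactly on every multiple of the interval, the function name and 10-hour bin width are hypothetical, and the sample log is invented placeholder data.

```python
from collections import Counter


def repairs_by_hours_since_service(repair_hours, interval=50, bin_width=10):
    """Tally unscheduled repairs by flight hours elapsed since the last service.

    repair_hours: airframe flight hours at which each unscheduled repair occurred.
    Assumes (for illustration) that scheduled maintenance happens exactly at
    every multiple of `interval` flight hours.
    """
    counts = Counter()
    for hours in repair_hours:
        since_service = hours % interval              # fold onto one maintenance cycle
        counts[int(since_service // bin_width) * bin_width] += 1
    # Include empty bins so the shape of the curve is visible.
    return {start: counts.get(start, 0) for start in range(0, interval, bin_width)}


# Placeholder log, NOT Waddington's data.
sample_log = [2, 5, 14, 51, 53, 68, 101, 127, 149]
print(repairs_by_hours_since_service(sample_log))
# -> {0: 5, 10: 2, 20: 1, 30: 0, 40: 1}
```

A tally that is front-loaded into the first bin after each service, as in this made-up example, is the signature Waddington described: the repairs cluster right after maintenance rather than building up as the next service comes due.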

The solution proposed by Waddington’s team—and ultimately accepted by the RAF commanders over the howls of the maintenance personnel—was to increase the time interval between scheduled maintenance cycles, and to eliminate all preventive maintenance tasks that couldn’t be demonstrably proven to be beneficial. Once these recommendations were implemented, the number of effective flying hours of the RAF Coastal Command bomber fleet increased by 60 percent!

Fast forward two decades to the 1960s, when a pair of gifted scientists who worked for United Airlines—aeronautical engineer Stanley Nowlan and mathematician Howard Heap—independently rediscovered these principles in their pioneering research on optimizing maintenance that revolutionized the way maintenance is done in air transport, military aviation, high-end bizjets and many non-aviation industrial applications.  They were almost certainly unaware of the work of C.H. Waddington and his colleagues in Britain in the 1940s because that work remained classified until 1973, when Waddington’s meticulously-kept diary of his wartime research activities was declassified and published.

Next time, I’ll discuss the fascinating work of Nowlan and Heap on what came to be known as “Reliability Centered Maintenance.” But for now, I will leave you with the major takeaway from Waddington’s research during World War II: Maintenance isn’t an inherently good thing (like exercise); it’s a necessary evil (like surgery). We have to do it from time to time, but we sure don’t want to do more than absolutely necessary to keep our aircraft safe and reliable. Doing more maintenance than necessary actually degrades safety and reliability.