Within the space of a few months in 1999, NASA’s Mars Climate Orbiter (MCO) and Mars Polar Lander (MPL) were both lost in the final stages of their trips to the red planet. The circumstances leading to the loss of the MCO help illustrate the risks of reuse without testing. The reasons for the MPL’s loss are perhaps less subtle, but nevertheless provide an example of how simple errors can combine to cause a disastrous outcome, and of the importance of keeping an end-to-end systems engineering perspective.
The MPL was intended to use a combination of atmospheric braking, a parachute and finally descent rockets to land near the south pole of Mars. It entered Mars’ atmosphere as planned but was never heard from again. While there is no data to confirm the actual reason why the probe was lost, the accepted view is that the descent engines were shut down prematurely and that the probe fell the final 40 meters or so, hitting the Martian ground with enough speed to destroy it. The MPL was designed to land on three legs, which were folded during flight and unfurled using springs just before landing. The control software was designed to monitor sensors on the feet of each of the legs and shut down the descent engines as soon as contact was made with the Martian surface. It was well known that the jolt of the legs unfolding during descent could often cause the sensors to give a transient touchdown signal. The MPL control software was supposed to ignore these transient signals, but unfortunately this requirement was not accurately implemented in the code.
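The requirement to ignore transient signals can be made concrete with a small sketch. This is not the MPL flight software: the class name `TouchdownMonitor`, the two-consecutive-readings threshold and the explicit `enable_sensing` step are all assumptions chosen for illustration. It shows one plausible way to treat a touchdown indication as real only if it persists, and to discard anything seen before touchdown sensing is switched on.

```python
from dataclasses import dataclass, field


@dataclass
class TouchdownMonitor:
    """Hypothetical touchdown monitor for a three-legged lander (illustration only)."""

    # Assumed threshold: a leg must report contact on this many
    # consecutive samples before the signal is trusted.
    required_consecutive: int = 2
    sensing_enabled: bool = False
    _counts: list = field(default_factory=lambda: [0, 0, 0])

    def enable_sensing(self) -> None:
        """Start trusting the sensors, discarding any earlier history."""
        self._counts = [0, 0, 0]
        self.sensing_enabled = True

    def update(self, readings) -> bool:
        """Feed one sample per leg; return True if the engines should shut down."""
        if not self.sensing_enabled:
            # Readings taken before sensing is enabled (for example during
            # leg deployment) are ignored entirely.
            return False
        for leg, contact in enumerate(readings):
            # A single sample without contact resets that leg's count,
            # so a one-sample transient never reaches the threshold.
            self._counts[leg] = self._counts[leg] + 1 if contact else 0
        return any(count >= self.required_consecutive for count in self._counts)


monitor = TouchdownMonitor()
print(monitor.update((True, False, False)))   # jolt from leg deployment, sensing off
monitor.enable_sensing()
print(monitor.update((True, False, False)))   # isolated transient
print(monitor.update((False, False, False)))  # transient has cleared
print(monitor.update((False, True, False)))   # first sample of real contact
print(monitor.update((False, True, False)))   # contact persists
```

Only the last call reports a touchdown. The point of the sketch is that both rules (persistence, and a clean start when sensing is enabled) are explicit in the code and can therefore be tested, rather than being assumed.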
During testing, the procedure to unfold the lander’s legs was rehearsed but no phantom touchdown signals were detected. In fact, no signals at all were being reported, neither the transients when the legs were unfurled nor the simulated touchdown on the surface, because of a wiring problem. The problem was detected when the sensors failed to register contact with the simulated Martian surface. The wiring was corrected, the test that the sensors could detect touchdown was repeated, and the problem was declared fixed.
However, a repeat of the full test, including unfurling the lander’s legs, was not conducted, and the potential impact of phantom signals on the controlling software, and the knock-on effect on the descent engines, went unnoticed.
The decision not to repeat the full landing test seems to have been taken to avoid the risk of damaging the lander during a further disassembly and redeployment cycle. Moreover, the need to handle transient signals in the software was “so well-known across the project that it was taken for granted”. Indeed, the possibility that a transient signal could cause the control software to fail was thought so remote that it was not examined in the inquiry following the loss. It was only during testing of the MPL’s successor (a mission that was eventually cancelled) that a similar problem was spotted, leading the investigators to reconsider this as a probable cause of the MPL loss. This illustrates a number of lessons for our work:
Nothing involving software is so “well known” that it doesn’t need to be proven.
Fixing one thing can reveal another previously concealed defect, so as long as you are fixing bugs, there is a reason to run a regression test. This is a very good reason to invest in automating regression testing (an option not available to the legs of the MPL).
The lack of involvement of key system engineering staff in all stages, including testing, is identified in several reports as a contributing factor to the loss of the MPL. In our terminology, system engineering can most readily be equated with technology architecture. The MPL experience demonstrates the key position the architect holds in ensuring that the end-to-end solution is understood, including the interdependencies between components, and that testing is executed in a way that prevents any of those dependencies from being masked.
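The regression and masking lessons above can be sketched in a few lines. Everything here is hypothetical: `LatchingController` is an invented stand-in for a controller that remembers a transient signal, and the event names are made up for the example. The sketch shows how a narrow retest of the fixed component can pass while a replay of the full sequence, including leg deployment, exposes the defect.

```python
class LatchingController:
    """Hypothetical controller with a latching defect (illustration only)."""

    def __init__(self):
        self.latched = False
        self.sensing_enabled = False
        self.engines_on = True

    def step(self, contact: bool) -> None:
        if contact:
            # Defect: any contact signal is remembered forever,
            # even one seen before sensing was enabled.
            self.latched = True
        if self.sensing_enabled and self.latched:
            self.engines_on = False


def run_descent(events):
    """Replay events; return the index at which the engines shut down, or None."""
    controller = LatchingController()
    for index, event in enumerate(events):
        if event == "enable_sensing":
            controller.sensing_enabled = True
            controller.step(contact=False)
        else:
            controller.step(contact=event in ("leg_deploy_transient", "touchdown"))
        if not controller.engines_on:
            return index
    return None


def check_touchdown_only() -> bool:
    """Narrow retest: only confirms that a touchdown is detected."""
    events = ["enable_sensing", "descend", "touchdown"]
    return run_descent(events) == events.index("touchdown")


def check_full_sequence() -> bool:
    """Full regression: replays leg deployment before the final descent."""
    events = ["leg_deploy_transient", "descend", "enable_sensing", "descend", "touchdown"]
    return run_descent(events) == events.index("touchdown")


print("touchdown-only retest passes:", check_touchdown_only())
print("full-sequence regression passes:", check_full_sequence())
```

In this sketch the narrow retest passes and the full-sequence check fails, because the engines cut out as soon as sensing is enabled. An automated suite that always replays the whole sequence costs nothing extra to rerun after each fix, which is the advantage software has over hardware such as the lander’s legs.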
NASA has followed these mishaps on Mars with a series of triumphs, including the Mars Odyssey orbiter, the Spirit and Opportunity rovers and, most recently, the audacious Curiosity rover.
[1] The Failures of the Mars Climate Orbiter and Mars Polar Lander: A Perspective from the People Involved, American Astronautical Society, 2001
[2] Mars Program Independent Assessment Team Summary Report, March 14, 2000; and The Role of Software in Spacecraft Accidents, Nancy G. Leveson