Part 1 – When to Test “All Up”
The testing of one of the most complex machines ever assembled, the Saturn V rocket and Apollo spacecraft, was originally planned to involve a large number of test launches. The plan was to start with a “live” first stage and dummy upper components, then progressively add live components with each successive test flight.
However, George Mueller, NASA’s visionary associate administrator, proposed an alternative, much more aggressive approach. Why build a representative “dummy” payload when you could use the real thing? And if you were flying the real thing, why not test it as a bonus? This strategy, known as “All Up Testing”, led to the Saturn V carrying astronauts around the moon on only its third flight, rather than the tenth or later as the original plan had suggested. This not only saved millions of dollars and years of development effort but also probably enabled America to win the “space race” to the moon.
Functional testing of computer systems is typically incremental: it starts with individual components, then groups of components (assemblies), then entire products and finally the end-to-end solution. This often delays testing of the end-to-end Execution Architecture until late in the project, which means we find relatively simple problems (such as firewall rule changes, WAN latency, workstation compatibility and IO throughput) only late in the day, when they are embarrassing and expensive to fix.
Obviously you can’t test the entire solution until the application components are complete, but there is no reason not to test the majority of the execution and operations architecture as soon as possible and at the same time (“all up”). These architectures are often based on proven “off the shelf” components and don’t have the complex inter-dependencies that warrant incremental testing. Example approaches include:
Put the monitoring, database and backup software on the server estate as soon as possible, and use a “vanilla” version of a packaged application (or even a benchmarking application such as Swingbench) to explore the transactional characteristics of the solution without waiting for the real application to be finished.
Test connectivity and basic data transfer mechanisms (for example, FTP) with “live” sources and destinations, even if there are no application interfaces available.
Obviously this is not a substitute for testing with the full application, but it means that later stages of testing can concentrate on the application rather than the base architecture.
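The connectivity check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a complete test harness: the `Endpoint` type, the function names and the placeholder hosts and ports are all assumptions made for the example, and opening a TCP connection only shows that routing and firewall rules allow the traffic, not that a real data transfer would succeed.

```python
import socket
from typing import Dict, Iterable, NamedTuple


class Endpoint(NamedTuple):
    """A network destination the solution must be able to reach."""
    name: str
    host: str
    port: int


def check_endpoint(endpoint: Endpoint, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection((endpoint.host, endpoint.port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def run_smoke_test(endpoints: Iterable[Endpoint]) -> Dict[str, bool]:
    """Check every endpoint and report reachability by name."""
    return {endpoint.name: check_endpoint(endpoint) for endpoint in endpoints}


if __name__ == "__main__":
    # Placeholder endpoints (assumptions): replace with the real hosts and ports.
    endpoints = [
        Endpoint("ftp-source", "127.0.0.1", 21),
        Endpoint("database", "127.0.0.1", 1521),
    ]
    for name, reachable in run_smoke_test(endpoints).items():
        print(f"{name}: {'reachable' if reachable else 'NOT reachable'}")
```

Running a script like this from each server and workstation in the estate, well before the application exists, is one way to surface missing firewall rules early.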
Part 2 – Reuse Does Not Mean “Don’t Test”
Reuse is an important part of building systems; we have neither the time nor the patience to constantly re-invent wheels. However, care is needed when reusing code to understand the pre-conditions and assumptions on which the code depends, as illustrated in the following examples:
Ariane 5
In 1996, the maiden flight of the European Ariane 5 launcher ended in failure when the rocket veered off course and exploded. The failure was traced to a software bug: an unhandled exception, generated by the conversion of a 64-bit floating-point value to a 16-bit signed integer, caused both the primary and backup guidance systems to fail, leaving the rocket unguided and out of control. The code in question had been reused from the earlier (very successful) Ariane 4 rocket and failed because the value being measured (horizontal velocity) was much higher for the new, larger Ariane 5 than for its predecessor. The irony of this situation is compounded by the fact that the velocity monitoring functionality was not actually required in Ariane 5. It was included in a block of code with other functions which were needed, presumably based on the view that, unless proven necessary, it was not wise to make changes in software which had worked well previously.
This issue also demonstrates two other interesting principles: the inability of High Availability hardware to deal with systems software failure (the same defect caused both the primary and backup guidance computers to fail) and the danger of carrying unneeded code around with you.
On a positive note, Ariane 5 has gone on to be an extremely reliable vehicle despite the inauspicious start, and is still in service a decade and a half later.
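The class of defect behind the Ariane 5 failure, a narrowing conversion whose input range was assumed rather than tested, can be illustrated with a short Python sketch. This is not the flight software: the function names and the sample values below are hypothetical, chosen only to show how a value that always fitted in the old operating envelope can overflow in the new one.

```python
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> int:
    """Convert a float to a 16-bit signed integer, raising if it does not fit."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return result


def to_int16_saturating(value: float) -> int:
    """Convert a float to a 16-bit signed integer, clamping to the valid range."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


if __name__ == "__main__":
    # Hypothetical readings, not real flight data.
    old_envelope_reading = 20000.0  # fits comfortably in 16 bits
    new_envelope_reading = 40000.0  # exceeds INT16_MAX

    print(to_int16_checked(old_envelope_reading))
    print(to_int16_saturating(new_envelope_reading))
    try:
        to_int16_checked(new_envelope_reading)
    except OverflowError as exc:
        print(f"caught: {exc}")
```

Whether to raise or to saturate is a design decision that depends on the system. The point for reuse is that a test exercising the conversion with the new system's expected range of values would expose the hidden assumption before launch.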
Mars Climate Orbiter
In 1999, NASA’s Mars Climate Orbiter was lost due to a navigation error partially attributable to a mismatch in units of measure (metric vs. imperial) between the software on the spacecraft and the software used to control the mission back on Earth. The issue was introduced as part of the modification of code being reused from a previous successful NASA mission (the Mars Global Surveyor). One of the development teams summarized the problem: “As [bad] luck would have it, the 4.45 conversion factor, although correctly included in the MGS equation by the previous development team, was not immediately identifiable by inspection (being buried in the equation) or commented in the code in an obvious way. Thus…the new thruster equation was inserted in place of the MGS equation—without the conversion factor.”
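One way to avoid a conversion factor being “buried in the equation” is to give it a name and to make the unit part of the interface. The sketch below is illustrative only and assumes the quantity involved is thruster impulse, converted from pound-force seconds to newton-seconds; the `Impulse` class and its method names are hypothetical. The constant is the standard value of one pound-force in newtons, which rounds to the 4.45 quoted above.

```python
from dataclasses import dataclass
from typing import Iterable

# 1 pound-force expressed in newtons, so 1 lbf*s = 4.4482216152605 N*s.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse stored internally in SI units (newton-seconds)."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(value)

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # The conversion is explicit and named, not hidden inside an equation.
        return cls(value * LBF_S_TO_N_S)


def total_impulse(impulses: Iterable[Impulse]) -> Impulse:
    """Sum impulses; all values are already in newton-seconds."""
    return Impulse(sum(i.newton_seconds for i in impulses))


if __name__ == "__main__":
    burns = [
        Impulse.from_pound_force_seconds(10.0),  # imperial source
        Impulse.from_newton_seconds(5.0),        # metric source
    ]
    print(f"{total_impulse(burns).newton_seconds:.3f} N*s")
```

Because callers must choose a constructor that names the unit, a replacement equation cannot silently drop the factor, and a unit test comparing known values in both units would catch a mismatch.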
Both cases above are great examples of why reuse doesn’t remove the need for testing—since for software, like many things in life, previous performance alone is not always an indication of future success!
1. Stages to Saturn: http://history.nasa.gov/SP-4206/ch12.htm
2. Ariane 5 Flight 501 Failure: Report by the Inquiry Board, 1996
3. The Failures of the Mars Climate Orbiter and Mars Polar Lander: A Perspective from the People Involved: American Astronautical Society, 2001