Many years ago, I worked as a co-op student at a company that was trying to reinvent the semiconductor fab line. Even then, fab lines were expensive, and a large part of the cost was in building an enormous "clean room" in which the air was filtered to remove dust particles and other contaminants, to keep them off the wafers. Someone had the bright idea that clean room costs could be reduced dramatically if wafers were moved around in "clean tunnels" that were closed off from the outside world. Filtering the smaller volume of air in these tunnels would be much cheaper than filtering the air in a giant room.
The wafers were moved between processing stations by motorized carts on rails. Turntables at bends and 'T' junctions allowed carts to switch to a different tunnel. A cart would drive onto the turntable and stop while it rotated. Then it would drive down the new tunnel. Software in a centralized computer would track the carts in tunnels and command each turntable to rotate when appropriate.
The mechanical engineers working on the project set up a dummy set of tunnels with junctions, bends, and equipment stations. There was no software to control the turntables, so they operated on a fixed program. Carts ran all day (and perhaps all night) long.
Along the way some changes were made to the system. One change reduced the number of turntables: curved tracks were constructed at bends; the carts would simply go around the curve rather than wait for the computer to command a turntable. This cut hardware costs, too; networking cards were still expensive, and installation of power and data cables cost money.
Obviously it cost this company quite a bit of money to build a full-scale mockup. One day I heard one of the engineers explain why, and his answer sticks with me to this day: "If you don't build a prototype, you ship the prototype."
Everyone understands that prototypes of mechanical systems are a necessity - fit and finish can only be determined by assembling all of the parts, and certification is given to systems, not components. But how does this apply to software?
Every significant software program has some uncertainty - something new that must be developed or invented. It may use data structures of unknown efficiency or new optimization algorithms that have never been run on a large-scale system. It may need to serve millions of users. Any of these things can cause problems in a production system. If the rest of the system is designed at the same time, it may have faulty assumptions built into it. The result is a product that does not function as intended in the real world. Costly rework may be necessary.
Often the core idea of a software system is relatively straightforward and can be described in a concise way. It's the input data preparation, error checking, and output generation that require all of the code. This means you can get a quick picture of how your idea will work in the real world by giving it test loads, test data sets, or synthetic examples.
Test loading is best for real-time or server-based systems. What is in the data is not as important as whether it is handled in a timely manner. Write some small programs to send data to your prototype. Turn up the rate until the prototype begins failing. Understand how it fails before going further. Does it lose data? Can it queue requests until traffic dies down? Does it crash? Is the failure point realistic?
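Here is a minimal sketch of that kind of load test, simulated in-process rather than over a network so it is self-contained; the function name `offered_load_test` and the simple bounded-queue model of the prototype are my own invention, not a real load-testing tool.

```python
def offered_load_test(process_rate, queue_limit, rates, duration=10):
    """For each offered rate (requests/sec), simulate `duration` seconds of
    traffic against a prototype that serves `process_rate` requests/sec
    through a bounded input queue. Returns (rate, dropped) pairs so you can
    see exactly where failure begins and what it looks like."""
    results = []
    for rate in rates:
        backlog = 0
        dropped = 0
        for _ in range(duration):                    # one iteration = one second
            backlog += rate                          # requests arriving this second
            backlog = max(0, backlog - process_rate) # requests the prototype serves
            if backlog > queue_limit:                # queue overflows: data is lost
                dropped += backlog - queue_limit
                backlog = queue_limit
        results.append((rate, dropped))
    return results

# Ramp the offered load past the prototype's capacity (100 req/s here)
# and watch where requests start being dropped.
for rate, dropped in offered_load_test(100, 50, rates=[50, 100, 150, 200]):
    print(f"{rate} req/s -> {dropped} requests dropped")
```

The interesting output is not the exact numbers but the shape: below capacity nothing is lost, and above it losses grow every second - the queue only delays the failure, it does not prevent it.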
Test data sets are best for numeric or other streaming data algorithms, for example, digital signal processing. Write a program that qualifies the output of the prototype, then modify the sample data sets (e.g. by adding noise) until the prototype begins failing. Understand how it fails before going further. Is the degradation graceful? Are the modified data sets (not just the clean input) comparable to what you will see in the real world?
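This loop can be sketched in a few lines. The harness below is hypothetical (the names `degrade_until_failure`, `smooth`, and `quality` are invented for illustration): it adds increasing Gaussian noise to a clean signal, qualifies the prototype's output against the known-good answer, and reports the noise level at which quality first drops below tolerance.

```python
import math
import random

def degrade_until_failure(algorithm, qualify, clean_input, tolerance):
    """Add increasing amounts of noise to a clean data set and report the
    noise level at which the prototype's output stops qualifying."""
    random.seed(42)                       # reproducible degradation runs
    noise = 0.0
    while noise < 10.0:
        noisy = [x + random.gauss(0, noise) for x in clean_input]
        if qualify(algorithm(noisy)) < tolerance:
            return noise                  # first noise level that breaks it
        noise += 0.1
    return None                           # never failed in the tested range

# A toy "prototype": a 3-point moving average, qualified by how closely its
# output tracks the known clean signal (a pure sine wave).
clean = [math.sin(2 * math.pi * i / 50) for i in range(200)]

def smooth(xs):
    return [(xs[i - 1] + xs[i] + xs[i + 1]) / 3 for i in range(1, len(xs) - 1)]

def quality(out):
    err = sum((a - b) ** 2 for a, b in zip(out, clean[1:-1])) / len(out)
    return 1.0 / (1.0 + err)              # 1.0 = perfect, falls toward 0

level = degrade_until_failure(smooth, quality, clean, tolerance=0.9)
print(f"prototype first fails at noise level {level:.1f}")
```

Finding that level is only the start; the point of the exercise is to then look at the failing outputs and decide whether the degradation is graceful and whether that much noise is realistic.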
Synthetic examples are best for complex transformations such as optimization. Include regular examples as well as quasi-random data. Understand how run time and transform quality vary before going further. Is runtime reasonable for regular data as well as for random data? How does it scale for very large data sets? Plot the run time to ensure it is not exponential; otherwise your system will slow considerably as soon as the real-world input data becomes larger than your test data.
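One way to check scaling without flaky wall-clock timings is to count operations at doubling input sizes and look at the growth ratios. This sketch uses insertion sort as a stand-in prototype (my choice, purely for illustration) because it behaves very differently on regular and random data: the doubling ratio settles near 2 for already-ordered input (linear) and near 4 for random input (quadratic), while an exponential algorithm's ratio would keep growing.

```python
import random

def insertion_sort_cost(xs):
    """Count comparisons made by insertion sort - a deterministic proxy
    for run time, standing in for profiling a real prototype."""
    xs = list(xs)
    comparisons = 0
    for i in range(1, len(xs)):
        j = i
        while j > 0:
            comparisons += 1
            if xs[j - 1] <= xs[j]:
                break
            xs[j - 1], xs[j] = xs[j], xs[j - 1]
            j -= 1
    return comparisons

random.seed(1)
sizes = [250, 500, 1000, 2000]
regular = [insertion_sort_cost(range(n)) for n in sizes]  # ordered "regular" data
messy = [insertion_sort_cost([random.random() for _ in range(n)])
         for n in sizes]                                  # quasi-random data

# Doubling ratios: ~2 means linear growth, ~4 means quadratic,
# a ratio that itself grows with n means exponential trouble.
regular_ratios = [b / a for a, b in zip(regular, regular[1:])]
messy_ratios = [b / a for a, b in zip(messy, messy[1:])]
print("regular:", regular_ratios)
print("random: ", messy_ratios)
```

Plotting the raw counts against input size makes the same point visually, and extrapolating the ratios tells you what happens when real-world inputs outgrow your test data.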
I once wrote a fast geometric pattern matcher that worked extremely well when extracting regular (repeated pattern) data, decimating it quickly until only fragments were left. Run time exploded when the matcher was given a large random data set. I spent much more development time dealing with the random input than with the regular patterns that I expected would dominate typical input data!
Identify the key uncertainties in your software project and write prototype code, sans error-checking or reporting. Run it with test input data that is carefully constructed to be error-free and has important characteristics of real input data - size, relationships, whatever. Make sure the core idea scales up enough to run your enterprise. In the Internet era, you never know how quickly you will need to scale up. You can rent servers in the Cloud quickly, but you can't rewrite your software nearly as fast.
I returned to school before the semiconductor equipment manufacturer finished its "clean tunnels" project, so I don't know how far product development got. Clean rooms are still massive to this day, so I doubt the company sold anything. But I'm confident they found their problems early, simply because they went looking for problems early.
"If you don't build a prototype, you ship the prototype." Don't ship prototype software!