Reuse Is Good
In the commercial world, rewriting code is never a goal. Time is money, and new code increases the cost of a product. Whenever possible, businesses will try to reuse code unless there is a compelling reason to rewrite it: new hardware, newly designed algorithms, high bug rates, or code that has aged poorly (see When It's Time to Throw a Program Away).
Thus the goal whenever possible is to reuse existing code. When you're writing code for the next release of a product already on the market, this is simple - the existing release provides the code base. But if the product is new, where do you begin? Economics tells you to start with code that you already have, using it in another product.
Sometimes reuse opportunities are obvious - library routines for previous versions of hardware or software can be adapted, or better yet used as is. These are routines written for general use, typically documented and published.
Other times you'll remember that a previous product has code to do something very similar or identical to what you need to do now. If you didn't write it, how can you be sure that it will work in the new product?
Reuse Requires Retesting
To reuse existing code, you want some confidence that it will work. That means testing. Ideally, there will be standalone test code somewhere. An assurance that "it works in our product" does not mean that the code was tested enough to reuse. In fact, it is quite possible that the code has bugs that the existing product simply hasn't hit yet. If the code isn't thoroughly tested, how can you rely on it in your new product?
Even if existing code works in a previous product, you need at least some testing within your new product. Someone in your product group will have to learn about it, then validate its operation. You want this time to be as short as possible:
- If the code is the responsibility of another group, you are duplicating their efforts by running additional tests on it.
- If the code does not work in the new product, or requires adaptation to work in the new product, you need to know quickly for scheduling purposes.
Standalone Test Programs Pay Off
If you already have a standalone test program for a module or subsystem, you're well on the way to reuse:
- The standalone test program can be examined for completeness;
- Standalone testing implies standalone usability;
- Standalone test drivers are easy to extend if the code requires enhancement;
- If the code is fully tested now, you can verify that modifications do not impact existing users;
- Standalone test programs provide full use examples, with all setup present.
First of all, if code is not fully tested, its reliability is questionable. You want code that works now and will continue to work even if you need to make some changes. I've used code that was deemed "correct by construction," needing no testing, and yet silently corrupted data as it passed through (converting one shape into another due to a cut-and-paste error).
Second, a standalone test program shows you all of the dependencies of the code. The test program can't be linked and run without its dependencies, so you will have 100% confidence that you know all of the dependencies. There is nothing like pulling in one tiny bit of code and finding that half a library comes along with it. I should know - one library I defined had that requirement, much to the chagrin of users. It was a lesson learned; see Writing Code Layer by Layer.
Third, standalone test programs written in the style I described in Yes, You Can Test Every Line of Code are easily extensible. You can always add more tests, and the old tests will still be run. Test programs at this level of code run quickly because they do not require full application setup.
Fourth, with all-conditions test coverage you can verify that any necessary modifications do not harm existing users of the code. You can verify that the code still meets its old requirements while also meeting your new ones. Your changes can then become part of its master code base - not a branch. Bugs fixed in the original code will immediately benefit your product.
Finally, standalone test programs by necessity provide a multitude of usage examples. Every test uses some part of the code, and all-statements or better test coverage (see Levels of Software Testing) provides you with examples of how to call every routine.
Even basic white box testing gives you usage examples for the public Application Programming Interfaces (APIs). If you need to use a function not yet in the APIs, you (or the original authors) can add a public interface and define some tests specifically for that code.
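As an illustration of how a standalone test driver doubles as a set of usage examples, here is a minimal sketch in C. It is not taken from any of the products discussed here: the routine starts_with(), the CHECK macro, and the driver run_starts_with_tests() are all hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical module routine under test: returns 1 if s begins with
 * prefix, 0 otherwise. Both arguments must be non-NULL. */
int starts_with(const char *s, const char *prefix)
{
    assert(s != NULL && prefix != NULL);  /* precondition */
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Failure counter shared by all checks in this driver. */
static int failures = 0;

/* Record and report a failed check without stopping the run, so one
 * failure does not hide the results of later tests. */
#define CHECK(cond)                                                 \
    do {                                                            \
        if (!(cond)) {                                              \
            ++failures;                                             \
            printf("FAIL %s:%d: %s\n", __FILE__, __LINE__, #cond);  \
        }                                                           \
    } while (0)

/* Each check is also a usage example: all of the setup is visible.
 * Returns the number of failed checks. */
int run_starts_with_tests(void)
{
    failures = 0;
    CHECK(starts_with("Quercus agrifolia", "Quercus") == 1);
    CHECK(starts_with("Quercus", "Quercus agrifolia") == 0); /* prefix longer than s */
    CHECK(starts_with("Quercus", "quercus") == 0);           /* case-sensitive */
    CHECK(starts_with("anything", "") == 1);                 /* empty prefix always matches */
    return failures;
}
```

A standalone test program built on this sketch would need only a main() that calls run_starts_with_tests() and returns nonzero when the count is nonzero. New tests are added as further CHECK lines, and the old ones keep running.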
Example: A Species Card Printer
Recently I wrote a species card printer for the local chapter of the California Native Plant Society. Information about plants we've displayed over the last 30 years is stored in a Microsoft Excel spreadsheet that is then printed for plant collectors and "keyers" - trained botanists able to precisely identify an unknown specimen.
With this year's publication of the new edition of The Jepson Manual of Higher Plants of California, about 30% of all plant taxa in the state got new names - moved into different families, genera, species, subspecies, or varieties. This meant we would have to lay out, type, and laminate nearly 500 new cards in short order without making any mistakes. With the addition of common name changes, about half of the 1500 cards would need updates.
We decided that the best route was to reprint all of the cards rather than miss any of the changes. This also allowed us to define a consistent style for the cards, which had been typed by hand over the past 19 years. Nearly all of the information had to be added to the spreadsheet anyway, so the simplest way to guarantee accuracy was to add all of the missing fields, cross-check them, and write a program to print them from the spreadsheet.
Considerable effort had gone into the formatting of the spreadsheet, so converting to a database such as MySQL was not an option. Instead the spreadsheet data was exported to a comma-separated values (CSV) file. Experiments with HTML style sheets showed a way to print the card data without a mail merge program.
I wrote code to read the CSV file and to generate the HTML for each species card. To save paper, three species cards were printed per page. The HTML was visually checked in Internet Explorer and Mozilla Firefox; differences were resolved by adjusting the HTML template used for each page.
The code to read a CSV file was an obvious candidate for export to a library source file; it should come in handy some day. (I gave up trying to find a good implementation in C on the Internet.) It was written for reusability from the start and has an all-conditions standalone test program.
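To give a feel for what the core of such a reusable CSV routine might look like, here is a minimal sketch in C. It is not the code described above. The function name csv_split_line() and its interface are assumptions made for illustration. It splits one line in place, handles quoted fields and doubled quotes, and assumes the caller has already stripped the trailing newline. Quoted fields that span several lines are not handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine: split one CSV line in place.
 * fields[i] is set to point at the i-th unquoted, NUL-terminated field
 * inside line. Returns the number of fields, or -1 if there are more
 * than max_fields fields, a quoted field is unterminated, or text
 * follows a closing quote. On error the buffer may be partly modified. */
int csv_split_line(char *line, char *fields[], int max_fields)
{
    char *src = line;  /* next character to read */
    char *dst = line;  /* next position to write; never ahead of src */
    int count = 0;

    assert(line != NULL && fields != NULL);

    for (;;) {
        if (count >= max_fields)
            return -1;
        fields[count++] = dst;

        if (*src == '"') {                  /* quoted field */
            ++src;
            for (;;) {
                if (*src == '\0')
                    return -1;              /* unterminated quote */
                if (*src == '"') {
                    if (src[1] == '"') {    /* "" means a literal quote */
                        *dst++ = '"';
                        src += 2;
                        continue;
                    }
                    ++src;                  /* closing quote */
                    break;
                }
                *dst++ = *src++;
            }
        } else {                            /* unquoted field */
            while (*src != ',' && *src != '\0')
                *dst++ = *src++;
        }

        if (*src == ',') {                  /* another field follows */
            ++src;
            *dst++ = '\0';
        } else if (*src == '\0') {          /* end of line */
            *dst = '\0';
            return count;
        } else {
            return -1;                      /* text after a closing quote */
        }
    }
}
```

Because the routine takes a plain buffer and has no dependency on the application, a standalone test program can exercise every branch directly: empty fields, a trailing comma, embedded commas and quotes, and each of the error returns.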
The code to extract and print information about each plant looked pretty specialized and thus not useful elsewhere; I exported it to a separate module primarily to be able to test it. With 1,500 species cards printed on 500 pages, visual debugging of the full results was definitely not an effective use of my time. The routines didn't seem to be significant, yet they still total 645 lines (plus headers) at the time of this writing.
The main driver program has 783 lines in it, mostly HTML style sheets (so that no separate style sheet file is needed - the program is fully self-contained) and comments. It calls the CSV code, then loops through the resulting array printing species cards. The species card printing routine is quite specialized and changed multiple times as we debated format changes, so it was left inside. Testing for this routine used conventional methods - single stepping and visual analysis of the output.
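The driver's loop over the cards, three to a page, could be sketched roughly as follows. This is an illustration only, not the actual driver. The names pages_needed() and print_cards(), the CSS class names, and the single-string card content are all assumptions. The sketch also assumes the card text is already safe to embed in HTML, and that the generated document's embedded style sheet contains a rule such as .page { page-break-after: always; } so that each group prints on its own sheet.

```c
#include <assert.h>
#include <stdio.h>

#define CARDS_PER_PAGE 3

/* Number of printed pages needed for n_cards cards, rounding up. */
size_t pages_needed(size_t n_cards)
{
    return (n_cards + CARDS_PER_PAGE - 1) / CARDS_PER_PAGE;
}

/* Hypothetical card loop: write each card as a <div>, wrapping every
 * group of CARDS_PER_PAGE cards in a "page" <div>. The last page may
 * hold fewer cards. Returns the number of pages written. */
size_t print_cards(FILE *out, const char *names[], size_t n_cards)
{
    size_t i;
    size_t pages = 0;

    assert(out != NULL);

    for (i = 0; i < n_cards; ++i) {
        if (i % CARDS_PER_PAGE == 0) {          /* first card on a page */
            fputs("<div class=\"page\">\n", out);
            ++pages;
        }
        fprintf(out, "  <div class=\"card\">%s</div>\n", names[i]);
        if (i % CARDS_PER_PAGE == CARDS_PER_PAGE - 1 || i == n_cards - 1)
            fputs("</div>\n", out);             /* page full, or last card */
    }
    return pages;
}
```

With three cards per page, pages_needed(1500) gives the 500 pages mentioned above. Returning the page count from print_cards() gives a test program something to check without reading the HTML by eye.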
As time went on and more changes were made, the species card printing routine grew more complex. Code was first exported into static routines, and then these static routines were exported to the plant extraction module for testing.
I figured that once I got approval of the species card layout, I would be done. I was wrong. Genera are grouped into families, and we need to print new family cards. Not only that, but there are two sets of species cards, and the second set has different requirements. Lo and behold, all of the code necessary to print this information has already been exported and tested. Other than writing new top-level drivers (based on the original species card driver), I don't need to do any more work to generate the extra two sets of cards.
Even for simple programs, export everything and test it. Don't hide it inside the application-level module. You never know when you will need the code you just wrote; it might even be the very next week.