Everyone expects new code to have a higher bug rate. Software goes through "alpha" and "beta" releases before being sold to the general public; even early production releases have significantly higher patch rates than older code. Customers accept this in return for being the first to see a new system. But that doesn't mean it's a good thing!
Typically the release date of a first production version is based on reaching a targeted bug rate. The number of bugs per customer will drop over time until it reaches an acceptable threshold. Normally the bug rate will decrease smoothly and rapidly at first. This is a good thing, of course, but it is the result of a simple truth: most bugs in new code are trivial.
From a software developer's standpoint, there are a number of reasons why new code has a higher error rate:
- The requirements for the software release may still be in flux. Changing requirements means changing code. This takes time, and there is a temptation to do the job too quickly.
- The limits of the input data might not be known yet. There might be more data than originally expected, or its numeric range may be larger. As a result memory limits may be exceeded or overflows may occur.
- The application might be doing something that has not previously been possible. This is typical for Research and Development (R&D). The best ways of analyzing or optimizing the target data might not be known yet.
- Code development tends to get ahead of testing. Everyone wants to get the application into the hands of early adopters; their feedback can guide feature development or demonstrate the level of demand for a new product.
Forgetfulness is a major cause of bugs in new code. These are the "head-slapping" bugs - the ones for which a fix is obvious as soon as the developer is able to reproduce the problem (sometimes as soon as the error message is seen). People forget things - it is human nature. Reducing the initial bug rate can be as simple as mitigating forgetfulness.
I've written over 1,200,000 lines of code, and fixed or extended hundreds of thousands of lines of existing code. The most common errors are:
- Function inputs are not checked. At the very least, there should be assertions at the head of a routine checking that the parameters are within the expected limits. These can usually be turned off at run time (or stripped out by the compiler in release mode), so their cost in production should not be significant.
- Parameters for function calls, including C++ constructors, are wrong (out of order, for example). Pop-up function call help within a syntax-aware editor or integrated development environment (IDE) is very helpful here. Better yet, when there are a lot of parameters, create a parameter block (data structure or class) and pass that in. Values are assigned to the parameter block by name, not by position. You can also reuse the parameter block for multiple calls, modifying a minimal set of values each time.
- Variables are undefined or stale when they are used. Ensure that all variables are set for all execution paths unless a value is intended to be reused from a previous iteration (in which case the reuse should be commented). Compiler warnings here tend to be conservative, reporting more possible errors than exist in reality, but you should take them seriously. Even if you believe that a variable is assigned on every execution path, give it a dummy value at the start of the function if there is any chance that it could be read before a "real" value is assigned to it.
- Code that is cut and pasted is not properly customized in all destinations, especially if it is copied to different functions. Put a marker in each destination as a reminder to complete the customization process. You're done only when you have modified each one and removed the marker.
- Not all return paths from a function clean up completely. This is a frequent cause of local memory leaks (e.g. allocated memory not intended to be returned to the caller). You might also forget to close a file or release a lock. The C++ std::unique_ptr template can help here (it replaces the older std::auto_ptr, which was deprecated in C++11 and removed in C++17); it deletes the object it owns whenever it goes out of scope (e.g. at function return). You can of course write your own local resource release objects.
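Several of the points above can be combined in one short routine. The following is only a sketch, assuming C++11 or later; the names ScaleParams and scale_prefix are made up for illustration. It shows assertions at the head of the routine, a parameter block filled in by name, and a scratch buffer that is released on every return path.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical parameter block: callers set fields by name, not by position.
struct ScaleParams {
    std::size_t input_size = 0;   // number of samples in the input
    std::size_t output_size = 0;  // number of scaled samples to return
    double gain = 1.0;            // factor applied to every sample
};

// Scales the input and copies the first output_size samples to the output.
// Returns false if the request cannot be satisfied.
bool scale_prefix(const std::vector<double>& input,
                  const ScaleParams& p,
                  std::vector<double>& output)
{
    // Input checks at the head of the routine. These are debug-only:
    // they are compiled out when NDEBUG is defined.
    assert(p.input_size == input.size());
    assert(p.gain > 0.0);

    output.clear();  // the output has a defined state on every path
    if (p.output_size > p.input_size) return false;

    // Local scratch memory, owned by a unique_ptr so that it is freed
    // on every return path, including the early return in the loop.
    std::unique_ptr<double[]> scratch(new double[p.input_size]);

    for (std::size_t i = 0; i < p.input_size; ++i) {
        if (!std::isfinite(input[i])) return false;  // scratch is still freed
        scratch[i] = input[i] * p.gain;
    }

    output.assign(scratch.get(), scratch.get() + p.output_size);
    return true;
}
```

A caller would write `ScaleParams p; p.input_size = data.size(); p.output_size = 2; p.gain = 2.0;` and then pass `p` in. Because each value is assigned by name, two values of the same type cannot be silently swapped, and the same block can be reused for the next call after changing a single field.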
As a general rule, you can minimize "head-slapping" bugs using the following method:
- Write a shell of the new routine, with the parameters and a high-level flow present.
- List all low-level requirements in each portion of the flow, immediately while they are still fresh.
- In each block, add notes on what functionality and cleanup are needed.
Once the shell and its notes are completed, you can begin filling in the details. Add initialization and cleanup to the various code blocks first. The idea is to get the simple things (like cleanup) out of the way before focusing on the hard parts. Thinking about the hard parts will tend to make you forget the simple things.
This strategy is kind of like writing pseudocode, but here you are not creating a separate maintainable artifact. In particular, whenever you complete a task within the routine, you can remove the note that reminds you to do it. When all of the notes are gone, you should be done.
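As an illustration, here is roughly what such a shell might look like partway through the process. This is a sketch only: count_lines is a made-up routine, and the NOTE comments stand in for whatever reminder style you prefer. Setup and cleanup have already been filled in and their notes removed; the notes that remain cover the harder part.

```cpp
#include <cstdio>
#include <string>

// Hypothetical routine: returns the number of lines in a file, or -1 on failure.
// The shell compiles and runs at this stage, but the main loop is still notes,
// so it reports 0 lines for any file it can open.
long count_lines(const std::string& path)
{
    // --- Setup (complete; its notes have been removed) ---
    if (path.empty()) return -1;
    std::FILE* fp = std::fopen(path.c_str(), "r");
    if (fp == nullptr) return -1;  // nothing to clean up yet

    long lines = 0;  // result; has a value on every path below

    // --- Main loop (still to be written) ---
    // NOTE: read the file in chunks and count '\n' characters
    // NOTE: count a final line that has no trailing newline
    // NOTE: on a read error, set lines = -1 and fall through to cleanup

    // --- Cleanup (complete; its note has been removed) ---
    std::fclose(fp);
    return lines;
}
```

Because the file is closed in exactly one place and every remaining note falls through to it, filling in the main loop later cannot introduce a forgotten close.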
Wading through a flood of silly little bugs when you first test new code is very frustrating - they get in the way of evaluating your work. With a little care, you can greatly reduce their number.