Memory Optimization 101

Too Much Memory?

If your software runs but uses too much memory, you need to optimize it. Optimization is a special form of refactoring: don't change the visible functionality, but improve the code in some way. Here you are trying to turn a big program into a smaller one.

Once upon a time in a land far, far away, memory was free. Every program had all of the space it needed, and every user was happy. The end.

This little fairy tale is almost true - memory costs have continued to drop exponentially, and the amount of memory in the largest computer systems has continued to grow. However, there are always programs that bump up against practical limits on memory usage.

To minimize cost, embedded systems often have very little memory. Engineering design software usually runs on top-of-the-line hardware, but design of complex systems can require every byte of memory in the computer. Data analytics software typically works on huge unstructured data sets.

Tricks of the Trade

When I started writing software, even the biggest computer systems had memory measured in kilobytes or megabytes. Reducing memory consumption has been critical for most of my career, and here are the tricks I've applied to reduce memory usage in applications:

Recompute results rather than store them;
Eliminate rarely used data fields and data structures;
Fold sparse data structures and lookup tables;
Use custom memory management techniques; and
Spill data to mass storage.

When you compute results that are not going to change over time, you will often store them rather than recompute them. If there are many objects and many fields in each object, the stored results can consume a great deal of memory. Evaluate each field to determine its cost of computation and its frequency of use; the fields with the smallest product of these two numbers (i.e. the lowest total runtime cost of recomputation) should be removed and their values recomputed each time they are used. This trades runtime for memory usage, and the appropriate fields must be determined experimentally.
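
The ranking described above can be sketched in a few lines of Python. This is only an illustration: the field names, costs, and use counts are hypothetical stand-ins for numbers you would measure in your own program.

```python
from dataclasses import dataclass


@dataclass
class FieldProfile:
    name: str
    cost_per_compute: float  # hypothetical measured cost of one recomputation
    uses_per_run: int        # hypothetical measured number of reads per run

    @property
    def recompute_cost(self) -> float:
        # Total runtime added if the field is recomputed on every use.
        return self.cost_per_compute * self.uses_per_run


def recompute_candidates(profiles, keep_stored):
    """Return the names of fields to recompute, cheapest total cost first.

    keep_stored is the number of most expensive fields that stay stored.
    """
    ranked = sorted(profiles, key=lambda p: p.recompute_cost)
    cutoff = max(len(ranked) - keep_stored, 0)
    return [p.name for p in ranked[:cutoff]]


# Hypothetical measurements for three cached fields.
profiles = [
    FieldProfile("bounding_box", cost_per_compute=2.0, uses_per_run=10),
    FieldProfile("path_length", cost_per_compute=50.0, uses_per_run=1000),
    FieldProfile("checksum", cost_per_compute=5.0, uses_per_run=2),
]
candidates = recompute_candidates(profiles, keep_stored=1)
print(candidates)  # ['checksum', 'bounding_box']
```

The ranking only suggests where to start; the effect on runtime still has to be confirmed by measurement.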

One of the quickest ways to increase memory usage wastefully is to define data structures that have every field that might ever be used. In long-running programs, it is unlikely that all of these fields will be used simultaneously. At any one time, most of these fields will simply take up space. Grouping related fields into auxiliary data structures that are allocated only when needed is more efficient; the pointer overhead in the parent record is more than made up for by the reduced average record size.
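
As a minimal sketch of this grouping, assume a hypothetical `Net` record whose routing fields are needed only during one phase of the program. The class and field names are invented for illustration.

```python
class RoutingInfo:
    """Hypothetical group of fields needed only while a net is being routed."""
    __slots__ = ("layer", "detour_cost", "via_count")

    def __init__(self):
        self.layer = 0
        self.detour_cost = 0.0
        self.via_count = 0


class Net:
    """Parent record: always-needed fields plus one reference to the group."""
    __slots__ = ("name", "pin_count", "_routing")

    def __init__(self, name, pin_count):
        self.name = name
        self.pin_count = pin_count
        self._routing = None  # costs one reference until the group is needed

    @property
    def routing(self):
        # Allocate the auxiliary record on first use only.
        if self._routing is None:
            self._routing = RoutingInfo()
        return self._routing

    def release_routing(self):
        # Drop the auxiliary record once the phase that needs it is over.
        self._routing = None


nets = [Net(f"n{i}", pin_count=2) for i in range(1000)]
nets[7].routing.via_count = 3
with_aux = sum(1 for n in nets if n._routing is not None)
print(with_aux)  # 1
```

Only the one net that was actually routed carries the extra three fields; the other 999 pay for a single empty reference.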

Large arrays with few entries are also candidates for memory reduction. Sparse array and hashing techniques can greatly reduce the amount of memory used for fast lookup systems without degrading runtime significantly. Fields within data structures can also be folded (made to share the same storage) if you know that they will never be used simultaneously.
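
A hash-backed sparse array might look like the sketch below. The `SparseArray` class is a hypothetical example, not a library type; it stores only the entries that differ from a default value.

```python
class SparseArray:
    """Fixed-length array that stores only non-default entries in a dict."""

    def __init__(self, length, default=0):
        self.length = length
        self.default = default
        self._entries = {}

    def _check(self, index):
        if not 0 <= index < self.length:
            raise IndexError(index)

    def __getitem__(self, index):
        self._check(index)
        return self._entries.get(index, self.default)

    def __setitem__(self, index, value):
        self._check(index)
        if value == self.default:
            # Writing the default frees the slot instead of storing it.
            self._entries.pop(index, None)
        else:
            self._entries[index] = value

    def stored(self):
        """Number of entries that actually occupy memory."""
        return len(self._entries)


table = SparseArray(1_000_000)
table[42] = 7
table[999_999] = 1
table[42] = 0          # back to the default, so the entry is dropped
print(table[999_999])  # 1
print(table.stored())  # 1
```

Each stored entry costs more than a slot in a dense array, so this pays off only when the array is mostly empty.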

Memory allocation overhead is a widely overlooked avenue for reduction of memory use. When you allocate X bytes of memory, the underlying library often uses more. For example, a pointer to an earlier block may be stored in an address just prior to the address returned to your program. The block size may be rounded up as well, perhaps to the next multiple of 8 or 16 bytes. If you have many records of the same size, consider writing your own memory manager (or license mine).
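
To get a feel for the numbers, here is a rough calculation. The 8-byte header and 16-byte rounding are assumptions chosen for illustration; real allocators differ, and the pooled figure assumes the records can be packed back to back with no per-record padding.

```python
def block_footprint(requested, header=8, alignment=16):
    """Bytes consumed by one allocation under the assumed allocator model."""
    total = requested + header
    # Round up to the next multiple of `alignment` using integer arithmetic.
    return -(-total // alignment) * alignment


RECORDS = 1_000_000
RECORD_SIZE = 20  # hypothetical record size in bytes

per_record = block_footprint(RECORD_SIZE)          # 20 + 8 = 28, rounded to 32
general = RECORDS * per_record                     # one allocation per record
pooled = block_footprint(RECORDS * RECORD_SIZE)    # one allocation for all records

print(per_record)        # 32
print(general)           # 32000000
print(pooled)            # 20000016
print(general - pooled)  # 11999984
```

Under these assumptions, allocating the records one at a time wastes about 12 MB that a single pooled block would not.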

In long-running programs that use large block sizes, memory fragmentation also becomes a problem. Blocks are allocated and released in an order that appears random to the memory manager. Although memory managers try to coalesce adjacent freed blocks, this may not be enough. You may need a memory manager that actively works to reduce fragmentation. My frame-based memory allocator uses advanced management techniques to reduce fragmentation with a minimum of overhead.
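
The effect is easy to reproduce with a toy model. The sketch below treats memory as eight equal slots, which is a simplification for illustration only and not a description of any particular memory manager.

```python
def largest_free_run(slots):
    """Length of the longest run of consecutive free (False) slots."""
    best = run = 0
    for used in slots:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


# Eight slots, completely filled by four 2-slot blocks.
slots = [True] * 8

# Release the first and third blocks; the freed regions are not adjacent,
# so they cannot be coalesced.
slots[0:2] = [False, False]
slots[4:6] = [False, False]

total_free = slots.count(False)
largest = largest_free_run(slots)
print(total_free)  # 4
print(largest)     # 2
```

Half of the memory is free, yet a request for three contiguous slots cannot be satisfied.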

Finally, if there is too much data to fit in memory without page faulting, you should consider spilling some of it to mass storage. Even the most advanced page management scheme will not know where your application plans to read data next, so thrashing (constant page faulting) is likely. When you know that something will not be needed immediately, write it to disk and reread it only when it is needed.
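
A minimal sketch of explicit spilling follows, using only the Python standard library. The `SpillStore` class and its method names are hypothetical; a production version would need error handling and a policy for choosing what to spill.

```python
import os
import pickle
import tempfile


class SpillStore:
    """Keeps objects in memory until the caller spills them to disk."""

    def __init__(self):
        self._dir = tempfile.TemporaryDirectory()
        self._resident = {}  # key -> object held in memory
        self._spilled = {}   # key -> path of the file holding the object
        self._next_id = 0    # unique file names, even after reloads

    def put(self, key, value):
        self._resident[key] = value

    def spill(self, key):
        """Write an object to disk and drop the in-memory copy."""
        value = self._resident.pop(key)
        path = os.path.join(self._dir.name, f"{self._next_id}.pkl")
        self._next_id += 1
        with open(path, "wb") as f:
            pickle.dump(value, f)
        self._spilled[key] = path

    def get(self, key):
        """Return an object, rereading it from disk if it was spilled."""
        if key in self._spilled:
            path = self._spilled.pop(key)
            with open(path, "rb") as f:
                self._resident[key] = pickle.load(f)
            os.remove(path)
        return self._resident[key]

    def close(self):
        self._dir.cleanup()


store = SpillStore()
store.put("stage1", list(range(5)))
store.spill("stage1")                 # not needed for a while
resident_after_spill = "stage1" in store._resident
restored = store.get("stage1")        # reread only when needed
store.close()
print(resident_after_spill)  # False
print(restored)              # [0, 1, 2, 3, 4]
```

The application decides when to call `spill`, which is exactly the knowledge a general-purpose paging scheme lacks.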

Conclusions

Of course, some of these methods increase runtime, so they must be used judiciously. "Make it right, then make it small." It is even possible that a program could have both runtime and memory usage problems, so the best solution must be determined experimentally.

Memory optimization takes effort, and it often increases the complexity of the code. It is important not to optimize too early, especially in a research and development environment where the final form of the program is not fully known. You don't want to waste time on code that already fits into memory, and you need to minimize the number of bugs in order to evaluate its function.

Chapman Consulting

Software Development Done Right.
