What Can Be Done about Software Reliability?

Summary:

When an error is found in an application during development, the automated error prevention method helps you correlate that error to a specific point in the development process, and allows you to modify your processes to remove it, and more important, to prevent it from happening again. Preventing errors, rather than chasing them, dramatically improves software reliability. This way, you can stay competitive and not risk your valuable reputation on unforeseen bugs.

Ask any developer in your organization what he or she is doing at any point during the workday and most likely you'll hear the answer "debugging code"; that is, trying to make a piece of code work the way it was originally intended. Software is the only industry I know of where so much time is spent trying to figure out why something is broken and how it can be fixed. Other industries make their products right the first time; why can't the software industry do the same?

Software is a relatively new industry, less than fifty years old, and it has spent much of that time trying to figure out how to create reliable software applications with minimum errors. While much wonderful work has been done in a relatively short time, software remains the weakest link in our advanced technological society. The much-touted "Software Reliability" campaigns recently espoused by industry leaders are misnamed: no one is really trying to make software completely reliable, because there is a widespread belief within the computing industry that nothing can be done. Despite recent studies indicating that software errors cost U.S. businesses $60 billion a year, bad software is still accepted and used, largely without question, by businesses and consumers alike.

Many industry experts and leaders claim to hold the answer to better software. Their proposals can generally be divided into two categories: process improvement and development dynamics. Process-improvement measures generally focus on human quality-control methods, that is, how to improve a behavioral process so that a procedure is performed exactly the same way each time, regardless of who is performing it. Well-known process-improvement methods include ISO 9001 and SEI-CMM. While popular and effective for configuring human interactions and behaviors, quality-control methods such as these leave much to be desired in the software arena. These approaches are costly to implement, they require enormous amounts of human labor to maintain and verify, and they create vast amounts of documentation simply to certify that a company follows a written procedure for writing software.

When it comes right down to it, the only notable result from development certification is lost time and money. Such processes do not guarantee that your products are error free, only that the human elements involved in the software development lifecycle are regulated in the hopes of reducing inefficiency. Your employees follow a process, but so what? Has that process really improved your software applications?

Development dynamics focus on improving development group behavior. Waterfall and iterative development were early and popular attempts to bring order to the actual process of software development. Much software created today is still developed using one of these two methods. More recent examples of development dynamics include eXtreme Programming and Agile methods. When using these methods, development groups speed up software production by shortening the development lifecycle. This is done through many small application iterations that focus only on well-defined customer specifications. These more recent approaches rest on the idea that development groups operate better when working in small, focused teams where coworker oversight and testing are integrated into the production process.

While they do keep developers focused on customer requirements and working efficiently in shortened time-to-market windows, such methods generally address only programmer behavior. They tend to focus on the question "Under what conditions do programmers work efficiently and quickly?" Unfortunately, error detection is still a large factor in such models. Testing is conducted to look for bugs in order to fix them, but nothing is done to alter the process that allowed those bugs to be created in the first place.

Neither traditional process improvement nor development dynamics focuses on error prevention. They are a step in the right direction, certainly, but they do not address fixing the actual software development lifecycle when a development error or application bug is found. What really needs to be addressed in the software industry is the question "How can software be better manufactured?" The only answer is: through error prevention.

Manufacturers of traditional consumer goods learned long ago how to prevent production errors by correlating functionality and quality problems to the production line. Errors were fixed in the assembly process, not after the product was finished. The Japanese used this approach to corner the global market for economy cars in the 1970s and 1980s (the Germans and Swedes did the same for luxury autos), and it is a lesson that American industries, while slow to learn, have started to take to heart.

Error prevention is very different from error detection, which is the process of finding and fixing errors after an application is built. Unfortunately, detection leaves the flawed process that generated those errors uncorrected. This is how the software industry currently deals with bugs: by treating the "symptom" (bugs) and not the "disease" (the development process). Correlating errors to the exact point in the software development process that spawns them, and fixing that part of the process, reduces the need to debug applications after the fact and produces a dramatic increase in product quality.

Error prevention methodology is the best way to improve software quality and reliability. It goes beyond traditional testing practices, combining intelligence, tools, techniques, and services to automatically prevent errors in software. What makes an automated error prevention method effective is taking the information gained from software measurement, monitoring, and testing, and using it to progressively improve the development process.

A good automated error prevention method adapts to any software lifecycle and follows five simple steps:

  1. Identify an error
  2. Find the cause of an error
  3. Locate the point in production that created the error
  4. Implement preventative practices to ensure that errors do not recur
  5. Monitor the process
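
The five steps above can be sketched as a small defect log. This is only an illustration under stated assumptions, not a prescribed tool: the class names, the stage labels ("coding", "requirements"), and the sample defects are all hypothetical. Each recorded defect carries its cause and the lifecycle stage that introduced it (steps 1 to 3), a preventive practice is attached to a cause (step 4), and two simple reports support monitoring (step 5).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    """One recorded error: what it was, why it happened, where it was introduced."""
    summary: str        # step 1: the error that was identified
    root_cause: str     # step 2: the cause of the error
    origin_stage: str   # step 3: hypothetical lifecycle stage label


class PreventionLog:
    """Hypothetical log that ties defects back to the process that produced them."""

    def __init__(self):
        self.defects = []
        self.practices = {}  # root cause -> preventive practice (step 4)

    def record(self, defect):
        self.defects.append(defect)

    def add_practice(self, root_cause, practice):
        self.practices[root_cause] = practice

    def stage_counts(self):
        # Step 5: which stage of the process introduces the most errors?
        return Counter(d.origin_stage for d in self.defects)

    def unaddressed_causes(self):
        # Step 5: root causes that still have no preventive practice.
        seen = {d.root_cause for d in self.defects}
        return sorted(seen - set(self.practices))


log = PreventionLog()
log.record(Defect("crash on empty input", "missing null check", "coding"))
log.record(Defect("wrong tax rate", "ambiguous requirement", "requirements"))
log.record(Defect("crash on missing field", "missing null check", "coding"))
log.add_practice("missing null check",
                 "coding standard: validate inputs at API boundaries")

print(log.stage_counts().most_common(1))  # [('coding', 2)]
print(log.unaddressed_causes())           # ['ambiguous requirement']
```

In this sketch the report points at the process rather than at individual bugs: the coding stage produced the most defects, and one cause still lacks a preventive practice.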

An automated error prevention method should use well-known techniques in the software industry. These include coding standards, unit testing, load testing, and functional testing, as well as less frequently used techniques such as firewalls and defensive programming. Applying these techniques places a transparent layer on top of key development processes, allowing error prevention and monitoring to be integrated seamlessly into the full production lifecycle of any software development project. When an error is found in an application during development, the automated error prevention method helps you correlate that error to a specific point in the development process and allows you to modify your processes to remove it and, more importantly, to prevent it from happening again.
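
As a rough illustration of two of the techniques named above, defensive programming and unit testing, the following sketch uses Python's standard unittest module. The function average_response_ms and its validation rules are hypothetical examples, not taken from any particular product. The function rejects bad input at its boundary, and the tests pin that behavior down so the same error cannot quietly return.

```python
import unittest


def average_response_ms(samples):
    """Return the mean response time, rejecting invalid input early.

    Defensive programming: fail loudly at the boundary instead of letting
    bad data travel deeper into the application.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    for s in samples:
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(s, bool) or not isinstance(s, (int, float)) or s < 0:
            raise ValueError(f"invalid sample: {s!r}")
    return sum(samples) / len(samples)


class AverageResponseTest(unittest.TestCase):
    """Unit tests that keep previously seen errors from recurring."""

    def test_typical_values(self):
        self.assertEqual(average_response_ms([100, 200, 300]), 200)

    def test_rejects_empty_input(self):
        with self.assertRaises(ValueError):
            average_response_ms([])

    def test_rejects_negative_sample(self):
        with self.assertRaises(ValueError):
            average_response_ms([100, -5])
```

Saved to a file, the tests can be run with `python -m unittest <file>`. Running them on every build is one way to turn a past error into a standing check on the process.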

Preventing errors rather than chasing them dramatically improves software reliability, allowing you to stay competitive and not risk your valuable reputation on unforeseen bugs.

CMCrossroads is a TechWell community.