Establishing Unit Test Criteria

It is time for a new build. What should be included in it? Obviously, it should include the latest and greatest versions of each module. Right?

The phrase "latest and greatest" betrays an assumption that the latest version of something is automatically the best. The latest version adds features, corrects problems, and, in short, improves on the versions that preceded it. How could it be anything other than the greatest?

Things are often not as great as they seem to be, unfortunately. Those new features may be incompatible with other existing functionality and things that users previously relied upon may have disappeared. The new features may impair usability, particularly for novice users. There are also bugs that invariably crop up in all of that new and changed code.

So, how can we determine when the latest actually is greatest? How can we know when the code really is ready to be included in the next build? Many development groups solve this problem by establishing promotion criteria. Promotion criteria are policies about how a particular module's readiness for inclusion in a build is determined.

Unit Testing Standards
Though there are many different things that could be included in your promotion criteria, unit testing is the foundation on which they are all built. Almost every organization assumes that software developers are performing appropriate unit tests. Unfortunately, different individuals tend to have radically different ideas about what is an appropriate amount of testing.

A good practice is to require that the developer document the tests he or she will run, and have peers review those tests to ensure appropriate coverage. If automated testing is used, then the developer can simply create the test scripts for the automated tool, and submit those scripts for review.
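For example, a reviewable test script might look like the following. This is a minimal sketch using Python's built-in unittest framework; the parse_quantity function and its behavior are hypothetical stand-ins for the module under test.

    import unittest
    from order_entry import parse_quantity  # hypothetical module under test

    class ParseQuantityTests(unittest.TestCase):
        """Unit tests for parse_quantity, submitted for peer review.

        Each test's docstring states the condition it covers, so a
        reviewer can judge whether the coverage is appropriate.
        """

        def test_accepts_plain_integer(self):
            """A well-formed numeric string is converted to an int."""
            self.assertEqual(parse_quantity("12"), 12)

        def test_rejects_non_numeric_input(self):
            """Non-numeric input raises ValueError rather than crashing."""
            with self.assertRaises(ValueError):
                parse_quantity("twelve")

    if __name__ == "__main__":
        unittest.main()

Because the scripts themselves are the documentation, the reviewer sees exactly what will be executed, not a separate description that may drift out of date.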

Of course, establishing group standards on what should be included in the unit tests is also a must. Coming to an agreement as a development group about what testing should be done will take some time and deliberation, but the time spent on this will be paid back many times over in builds that go right! Let's look at some examples of unit test expectations.

Functionality
Each module must be tested to ensure it satisfies its design and actually does what it should do correctly. What inputs must it handle? What things must it do? What services will it provide? What outputs should it produce? What data must it manage, and what must it do with that data? We must ensure that the module actually does what it was intended to do.
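As a sketch of what such functional checks might look like, consider the following tests; the ShoppingCart class is hypothetical, standing in for whatever module is under test.

    import unittest
    from cart import ShoppingCart  # hypothetical module under test

    class CartFunctionalityTests(unittest.TestCase):
        def test_add_item_produces_correct_total(self):
            """The module produces the expected output for valid input."""
            cart = ShoppingCart()
            cart.add_item("widget", price=2.50, quantity=4)
            self.assertEqual(cart.total(), 10.00)

        def test_items_are_retained(self):
            """The module manages its data: items added remain available."""
            cart = ShoppingCart()
            cart.add_item("widget", price=2.50, quantity=1)
            self.assertIn("widget", cart.item_names())
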
Negative Testing
Then there's error handling. Does the module do the right thing when things go wrong? What happens when it is presented with invalid inputs? What if they are poorly formed or out of sequence? How about non-numeric data when numbers are expected? Data overflow? Underflow? What if it gets an error status back from the database or the network interface? What does it do then? The module must handle all error conditions correctly before it is considered to be complete.
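A few negative tests for the same hypothetical cart might look like this; the specific error behavior asserted here is an assumption, not a prescribed API.

    import unittest
    from unittest.mock import Mock
    from cart import ShoppingCart  # hypothetical module under test

    class CartNegativeTests(unittest.TestCase):
        def test_rejects_non_numeric_quantity(self):
            """Non-numeric data where a number is expected raises an error."""
            cart = ShoppingCart()
            with self.assertRaises(TypeError):
                cart.add_item("widget", price=2.50, quantity="many")

        def test_rejects_negative_price(self):
            """Invalid (out-of-range) input is refused, not silently accepted."""
            cart = ShoppingCart()
            with self.assertRaises(ValueError):
                cart.add_item("widget", price=-1.00, quantity=1)

        def test_handles_database_error_status(self):
            """An error status from a dependency is handled, not ignored."""
            failing_db = Mock()
            failing_db.save.side_effect = IOError("connection lost")
            cart = ShoppingCart(database=failing_db)
            cart.add_item("widget", price=2.50, quantity=1)
            self.assertFalse(cart.checkout())  # assumed to report failure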

Coverage
We all know that exhaustive testing of software is not a reasonable goal. There are simply too many combinations of inputs, too many orders in which events can happen, and too many different ways things can go wrong to be able to completely test everything in even the most trivial program.

Code and path coverage, though, are an achievable goal for unit testing. In fact, unit testing is the only time that complete coverage of code and paths is a reasonable goal.

Code Coverage
It’s reasonable to require that every line of code be executed during unit test (and coverage analyzers exist to help ensure that this is done). Some code (especially error handlers) cannot be tested without taking extraordinary steps (such as writing a harness that passes bad data, or poking error codes into memory), but these steps are not only appropriate, they are critical to assuring that the program will handle the situation as it was supposed to.
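For instance, with coverage.py (one such analyzer for Python), a promotion script could run the unit tests under the analyzer and refuse to proceed if any line went unexecuted. This is a sketch; the tests directory name and the every-line gate are illustrative.

    import unittest
    import coverage  # coverage.py, a line-coverage analyzer

    cov = coverage.Coverage()
    cov.start()

    # Run the module's unit tests while the analyzer is recording.
    suite = unittest.defaultTestLoader.discover("tests")
    unittest.TextTestRunner().run(suite)

    cov.stop()
    percent = cov.report()  # prints a per-file summary; returns total percent
    if percent < 100:
        raise SystemExit("Unit tests did not execute every line of code.")
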
Path Coverage
One step beyond mere code coverage, testing of every path in the code is reasonable. For example, we can ensure that both branches of every IF have been traversed, and make sure that all branches of every CASE have been executed. We can also ensure correct initiation and termination of every loop.
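As a sketch, consider a small hypothetical function with one IF and one loop, and the tests needed to cover its paths: the branch where the loop is skipped entirely, the loop run exactly once, and the loop run several times.

    import unittest

    def average_or_zero(values):
        """Hypothetical unit under test: one branch plus one loop."""
        if not values:            # branch taken for empty input
            return 0
        total = 0
        for v in values:          # loop initiation and termination
            total += v
        return total / len(values)

    class PathCoverageTests(unittest.TestCase):
        def test_empty_input_branch(self):
            """Covers the IF branch taken when the list is empty."""
            self.assertEqual(average_or_zero([]), 0)

        def test_single_item_loop(self):
            """Covers the loop with exactly one iteration."""
            self.assertEqual(average_or_zero([4]), 4)

        def test_multi_item_loop(self):
            """Covers the loop with several iterations."""
            self.assertEqual(average_or_zero([2, 4, 6]), 4)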

Regression Testing
Doing all of this testing is fine on freshly minted code, but how much testing should we expect when there have only been small changes? How much regression testing should be done at the unit level? This is where things often go wrong. It is easy to reason that it was only a small change, so it doesn't make sense to "waste" a lot of time testing it.

It is true that we cannot do full testing on every module every time a single line is changed. But at the same time, these "small" changes often have the potential for significant unexpected side effects. The best way to make appropriate judgments about regression testing is to do a combination of risk-based testing and regional impact testing.

Risk-Based Testing
Risk-based testing involves choosing tests based on the risk of defects. There are two dimensions to risk: probability and impact. Probability is a judgment about how likely it is that the change will cause certain kinds of problems. We should test for those problems that we are most likely to experience. Part of this judgment is a matter of examining the nature of the code that was changed, and part of it is experience with prior similar changes.

Impact is a judgment about how bad it would be if certain parts of the program failed, regardless of how unlikely that may be. Those things that are high impact should also be tested. For example, functions that are central to the program's reason for being must always work. Things that impinge on security or safety always have a high impact.
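One simple way to make these two judgments concrete is to score each candidate regression test on both dimensions and run the riskiest tests first. The test names and scores below are purely illustrative.

    # Risk-based test selection sketch: risk = probability x impact.
    candidate_tests = [
        {"name": "test_payment_processing", "probability": 2, "impact": 5},
        {"name": "test_report_formatting",  "probability": 4, "impact": 1},
        {"name": "test_login_security",     "probability": 1, "impact": 5},
    ]

    def risk(test):
        """Combine the two dimensions into a single priority score."""
        return test["probability"] * test["impact"]

    for test in sorted(candidate_tests, key=risk, reverse=True):
        print(test["name"], "risk score:", risk(test))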

Regional Impact Testing
Regional impact testing involves actually looking at the code and focusing testing in the region of code that has changed. For example:

  • A developer should probably do full coverage testing of every line of code that was added or changed in the module.
  • Likewise, he or she should test all paths that have been affected by those changes.
  • In addition, developers should do some less stringent level of testing in the parts of the code that are related to the changes. For example, if I changed code that sets a parameter's value, I will want to test a few places where that parameter is used as well, as the sketch below illustrates.
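A minimal sketch of that last point, with hypothetical code: the setter below was just changed, so it gets full coverage of its new logic, and a place that consumes the parameter is checked as well.

    import unittest

    _timeout = 30  # hypothetical module-level parameter

    def set_timeout(seconds):
        """The changed code: now clamps the value to a valid range."""
        global _timeout
        _timeout = max(1, min(seconds, 300))

    def connection_string():
        """Unchanged code that consumes the parameter set above."""
        return "connect(timeout=%d)" % _timeout

    class RegionalImpactTests(unittest.TestCase):
        def test_setter_clamps_high_value_and_consumer_sees_it(self):
            """Covers the new upper clamp and one place the value is used."""
            set_timeout(9999)
            self.assertEqual(connection_string(), "connect(timeout=300)")

        def test_setter_clamps_low_value_and_consumer_sees_it(self):
            """Covers the other side of the new clamping logic."""
            set_timeout(0)
            self.assertEqual(connection_string(), "connect(timeout=1)")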

Objective Evidence of Unit Testing
Planning to do all of those tests is a good thing, but there must also be some objective way to verify that the testing was actually done. What evidence should be captured and saved by the developer to show that the tests were run, and that they produced the expected results? How can we know that all of the testing that we decided was called for has been done and done well?
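One lightweight form of evidence is a machine-readable test report archived with the build. For example, if pytest is the test runner in use, a promotion script can invoke it so that results are written to a JUnit-style XML file; the directory and file names here are illustrative.

    import pytest

    # Run the unit tests and write a JUnit-style XML report that can be
    # archived as objective evidence that the tests ran and passed.
    exit_code = pytest.main(["tests/", "--junitxml=unit_test_report.xml"])
    raise SystemExit(exit_code)

A coverage report produced the same way (see the coverage sketch above) makes a natural companion piece of evidence.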

Obviously, the burden of proof that we place on the developer must be reasonable and commensurate with the risk of tests being inadvertently skipped. We don't want to impose undue busywork; we are merely trying to ensure that when the developer says a module is ready for promotion, we all agree on what "ready" means.
 
