Software Triage

Managing Software Defects
Member Submitted
Summary:

At some point in the software development lifecycle, regardless of which model we use, we have to make some tough decisions. Which defects do we fix? Which should we let go? How do we decide? Triage is one way!

Triage. If you're a fan of the TV show M*A*S*H then you're probably familiar with the term "Triage." It’s also a concept we can apply to software testing.

According to Wikipedia:

"Triage is a system used by medical or emergency personnel to ration limited medical resources when the number of injured needing care exceeds the resources available to perform care so as to treat the greatest number of patients possible."

Triage is actually a French word meaning "Sorting." In medical triage, patients, on a battlefield or at the scene of an emergency, are evaluated to determine which are in need of immediate care, and which can wait. At times, doctors or medics may decide the severely or critically injured should not receive immediate care because they are not likely to survive and will more than likely tie up scarce resources that could be used to save others.

So how does that apply to software testing? Let's change the definition a bit.

"Triage is a system used by software development teams to ration limited technical resources when the number of defects needing resolution exceeds the resources available to correct and verify them so as to resolve the greatest number of defects possible."

Ah–that’s better.

If there is a concept that testers and test managers are acutely familiar with, it's "limited resources." Unfortunately, we can't fix and retest everything in the limited amount of time we have or with current resources. We just ask for more time or more people, right? Riiiiight!

So, how can we make the best use of the limited time and people we have? Triage!

Severity and Priority
Successful Triage requires two related yet distinct concepts: Severity and Priority. With the Triage system, each defect is assigned both a Severity and a Priority. Many defect tracking systems use one or both of these concepts, but the terms are sometimes used interchangeably. They are really two separate concepts. Let's take a closer look.

Severity is used to define the impact that a defect has on the user of the application, or customer. Impact is probably a better term. We assign Severity levels to defects to define the seriousness of the problem.

So how many levels do you need? It's like the story of Goldilocks and the Three Bears–6 is too many, 3 is too few.

Personally, I like to use an even number of Severity levels. With an odd number of levels, like the typical 1-5, too many defects tend to get put on the fence or in the middle (severity=3). With an even number (like 1-4), you indirectly force a decision. No fence-sitting. Can you have too many levels? Absolutely! Too few? Of course. If you have too many levels, managing defects becomes a nightmare. Too few and you may not be able to fully define the impact of the defect. I worked with one system that had something like 15 Severity levels (3 levels of critical, 3 levels of severe, and so on). After a while Severity just became meaningless.

I try to avoid using just a number to define Severity levels. I typically include a brief description with the number (1-Critical, 2-Severe, 3-Cosmetic, etc.). Numerical rankings alone can be confusing (is a 1 the most severe, or a 4?). One major test tool vendor uses 5 as the most severe and 1 as the least; another vendor uses the complete opposite. Regardless of how you rank them, it's also important to define and document the criteria for assigning each severity level to a defect. Severity levels are initially assigned by a tester when a defect is logged; however, they should be reviewed and adjusted as needed. More on that later.
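To make the "number plus description" idea concrete, here is a minimal sketch in Python. It assumes four levels with 1 as the most severe. The name of level 3 and all of the criteria text are placeholders of my own, not a standard, and your team should document its own.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Four severity levels; 1 is the most severe (an assumed convention)."""
    CRITICAL = 1
    SEVERE = 2
    MINOR = 3      # hypothetical level name, added to get an even count
    COSMETIC = 4

    @property
    def label(self) -> str:
        # Pair the number with its description, e.g. "1-Critical".
        return f"{self.value}-{self.name.capitalize()}"


# Placeholder criteria; every team should write and document its own.
CRITERIA = {
    Severity.CRITICAL: "System crash or data loss; no workaround.",
    Severity.SEVERE: "Major function fails; workaround is painful.",
    Severity.MINOR: "Function works, but with an inconvenience.",
    Severity.COSMETIC: "Typos, alignment, and other visual issues.",
}

for level in Severity:
    print(f"{level.label}: {CRITERIA[level]}")
```

Showing the label everywhere instead of the bare number means nobody has to remember whether 1 or 4 is the bad end of the scale.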

Many defect tracking systems use only Severity, and many use Priority when they really mean Severity. The really good ones use both and allow you to customize them. Priority is a completely different concept from Severity, and an effective triage system needs both.

From a software triage perspective, Priority is used to rank the order in which defects are to be resolved. A defect may be Critical in terms of Severity, but the amount of time it would take to resolve it and the resources it would consume make it impractical to resolve now. Some critical defects may be assigned a very low priority, while other, less critical ones may be assigned a higher priority and move to the front of the line. For example, a typo may not be critical, but it makes you look bad. It's also an easy fix, so it may be assigned a higher Priority and move up the line.

Priority levels, like severity levels, need to be defined and documented. I like to use 3 priority levels (1-High, 2-Medium and 3-Low) because it tends to be simple and understandable to most of the team. 1-High is the highest priority and represents the front of the line while 3-Low represents those defects we can "live with for now" or the back of the line. These are the issues that typically end up in release notes and are deferred to future releases.
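As a rough sketch of how the two fields work together, the Python below keeps Severity and Priority as separate values on each defect and orders the work queue by Priority. The `Defect` record, its field names, the sample defects, and the choice to break ties by severity are all assumptions for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Priority(IntEnum):
    HIGH = 1     # front of the line
    MEDIUM = 2
    LOW = 3      # "live with for now"


@dataclass
class Defect:
    defect_id: str     # hypothetical identifier format
    summary: str
    severity: int      # 1 = most severe (assumed convention)
    priority: Priority


def resolution_order(defects: List[Defect]) -> List[Defect]:
    # Priority decides the order; severity only breaks ties (my assumption).
    return sorted(defects, key=lambda d: (d.priority, d.severity))


def release_note_candidates(defects: List[Defect]) -> List[Defect]:
    # 3-Low defects are the ones that tend to be deferred and documented.
    return [d for d in defects if d.priority is Priority.LOW]


backlog = [
    Defect("D-1", "Crash after obscure key sequence", 1, Priority.LOW),
    Defect("D-2", "Typo on login page", 4, Priority.HIGH),
    Defect("D-3", "Report totals are wrong", 2, Priority.HIGH),
]

for d in resolution_order(backlog):
    print(d.defect_id, d.priority.name, d.summary)
```

In this sample the typo (D-2) is worked before the crash (D-1), which is exactly the kind of result you get once Severity and Priority stop being treated as the same thing.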

Priority determines the order in which defects are resolved. In most cases, Priority is assigned based on two primary factors: likelihood of occurrence, and impact on resources (time needed to resolve the defect and availability of resources such as developers, testers, etc.). A defect may cause the system to crash, but only after an obscure set of keystrokes or user actions. If it also requires a detailed fix or major redesign, you may want to hold off on fixing it right now. Critical? Yes! High Priority? Maybe. Maybe not.
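The two factors can be turned into a starting suggestion for the Triage Team to argue about. The sketch below is only one possible rule of thumb, written in Python; the function name and the mapping itself are my assumptions, and the team still makes the final call.

```python
def suggest_priority(likely_to_occur: bool, costly_to_fix: bool) -> str:
    """Suggest a starting Priority from the two factors (hypothetical rule)."""
    if likely_to_occur and not costly_to_fix:
        # Users will hit it and the fix is cheap: front of the line.
        return "1-High"
    if not likely_to_occur and costly_to_fix:
        # Rare and expensive: a candidate to defer for now.
        return "3-Low"
    # Mixed cases need discussion, so start them in the middle.
    return "2-Medium"


# The obscure-keystroke crash that needs a major redesign:
print(suggest_priority(likely_to_occur=False, costly_to_fix=True))
```

Note that Severity is deliberately not an input here. A crash is still Critical in Severity even when this rule suggests a low starting Priority.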

Priority levels are also helpful in determining "Readiness for Release" or defining the "Quality Bar" (but that’s another paper).

Triage Team
The Triage Team reviews, evaluates, prioritizes, and in some cases assigns all defects. Membership on the Triage Team may vary depending on organizational needs. At a minimum, I recommend the following required members:

Project Manager. The project manager serves as the process owner. Project managers are typically in the best position to evaluate impacts to the schedule and resources. As such, they are also in the best position to mediate disputes and make quick decisions if needed. They typically have "veto" authority.

Product Management/Business Analysis Team Lead. Product managers, or business analysts, represent the customer. They are typically the best resource to evaluate a defect from a customer perspective.

Development Team Lead. The development team lead is usually in the best position to evaluate a defect from a technical perspective, and can best assess how long it will take to resolve a defect and the impact on development resources for both current development efforts and resolution of defects.

Test Team Lead. The role of the test team lead is to evaluate the system and defects from a Quality perspective. As a Test or Quality Assurance Team Lead, I usually like to lead the Triage meeting, since I am typically both the owner of "software quality" and the "Defect Manager."

In addition to their role as meeting leader, test leads are in the best position to evaluate the testing impact of a defect, including schedule impacts, when and where to put the defect resolution (which release/environment), and the potential impact of the defect resolution on other parts of the system (regression). They also provide feedback on the impact on test schedules and resources.

Other team members may be invited as needed. These other team members may include system architects, database administrators, business analysts, subject matter experts, individual developers or testers, technical writers, change management, or others. These team members may be able to provide greater insight on issues such as customer expectations, specific development issues, build and migration impacts, or test processes.

Sometimes it is best to include the rest of the team as "silent members" so that they can hear and contribute to the discussions. They also provide immediate feedback when needed. I like inviting them and having them sit on the perimeter of the room rather than at the "big table." The entire team is always encouraged to attend as workload and schedules permit. I'm the first to admit that I don’t have all the answers. But I know people who do! In the long run, it may save time later in the project. Of course, the more people involved, the greater the chance of getting stuck on a single issue or going off on a tangent.

Triage Goals
Regardless of how you structure the process, the goal of Triage remains the same: evaluate, prioritize and assign the resolution of defects. At a minimum, you want to validate defect severities, make changes as needed, prioritize resolution of the defects, and assign resources.

Triage Process
Your process may vary, but I try to keep Triage focused on the following:

Defect Review, Assessment and Assignment. Triage Teams typically review and make an initial assessment of all new and/or rejected defects. They review each defect to validate the severity, clarify any issues surrounding the defect and then prioritize them for resolution. It is typically during this process that we set the defect’s Priority in the defect tracking system and assign it to a team member for resolution or further investigation. I like to have the Defect Tracking System open and projected on a wall for all to see, making changes as we go.

Any rejected defects are also reviewed to validate the reason for the rejection and to determine their disposition. In some cases a defect may be deferred to a future release, or even closed outright if the team decides so after further investigation.

Investigation. In some cases, defects may be assigned to a team member for further review, investigation, or to coordinate with other team members. The reasons vary, but typically these defects need review or coordination with customers, or other team members to fully assess them for the level of effort involved with resolving and retesting them. We usually revisit them at future Triage meetings once all the pertinent questions have been answered.
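The review and investigation steps above can be sketched as a small piece of Python. This is an illustration only: the status names, the `questions_answered` flag, and the function names are hypothetical and do not come from any particular defect tracking system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Defect:
    defect_id: str
    status: str                 # hypothetical: "new", "rejected", "investigating", "assigned"
    severity: int               # 1 = most severe (assumed convention)
    priority: Optional[int] = None
    assignee: Optional[str] = None
    questions_answered: bool = False


def triage_agenda(defects: List[Defect]) -> List[Defect]:
    """Pick what the Triage Team looks at in the next meeting."""
    agenda = []
    for d in defects:
        if d.status in ("new", "rejected"):
            agenda.append(d)
        elif d.status == "investigating" and d.questions_answered:
            # Revisit only once the open questions have been answered.
            agenda.append(d)
    return agenda


def record_decision(defect: Defect, severity: int, priority: int,
                    assignee: str, investigate: bool = False) -> Defect:
    """Store the team's decision: validated severity, priority, and owner."""
    defect.severity = severity
    defect.priority = priority
    defect.assignee = assignee
    defect.status = "investigating" if investigate else "assigned"
    defect.questions_answered = False
    return defect


backlog = [
    Defect("D-1", "new", severity=1),
    Defect("D-2", "assigned", severity=2, priority=1, assignee="dev-a"),
    Defect("D-3", "rejected", severity=3),
    Defect("D-4", "investigating", severity=2, questions_answered=True),
    Defect("D-5", "investigating", severity=2),
]

agenda = triage_agenda(backlog)
print([d.defect_id for d in agenda])

# The team downgrades D-1 and hands it to a developer for resolution.
record_decision(backlog[0], severity=2, priority=1, assignee="dev-b")
```

Deferring or closing a rejected defect would be two more status values in the same scheme; I left them out to keep the sketch short.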

Triage Frequency
A question I am often asked is "How often should the Triage Team meet?" My less than concrete answer is typically: "It depends." It depends on the impact on schedules of team members and their availability. It depends on project schedules. It depends on where we are in the development lifecycle. It depends on how many defects there are. Sorry, that’s the best answer I can give you: "It depends."

Given that completely vague answer, a triage team typically meets often in the early days or weeks of the development process, and less frequently later in the process. You may meet daily at first, then 2-3 times a week, and finally once a week.

The primary driver should be the number of defects. The more defects you have, the more frequently you should meet. If you track your defect discovery rate (number of defects found per day), you will usually find there are more defects following each new build or release or test cycle. Plan accordingly. Sometimes you will not have enough time. Sometimes you will break early.
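If your defect tracking system can export the date each defect was logged, the discovery rate is easy to compute. The Python sketch below does that and maps the average to a meeting cadence. The thresholds (10 and 3 defects per day) and the sample dates are made up for illustration; tune them to your own project.

```python
from collections import Counter
from datetime import date
from typing import Dict, List


def defects_per_day(found_on: List[date]) -> Dict[date, int]:
    """Count how many defects were logged on each day."""
    return dict(sorted(Counter(found_on).items()))


def average_per_day(found_on: List[date], days_in_window: int) -> float:
    """Average discovery rate over a window of working days."""
    if days_in_window <= 0:
        raise ValueError("days_in_window must be positive")
    return len(found_on) / days_in_window


def suggested_meetings_per_week(avg_per_day: float) -> int:
    # Hypothetical thresholds: daily, then about 3 times a week, then weekly.
    if avg_per_day >= 10:
        return 5
    if avg_per_day >= 3:
        return 3
    return 1


# Sample data: a spike right after a new build, then a taper.
log = ([date(2024, 3, 4)] * 12
       + [date(2024, 3, 5)] * 6
       + [date(2024, 3, 6)] * 2)

rate = average_per_day(log, days_in_window=5)
print(defects_per_day(log))
print(rate, suggested_meetings_per_week(rate))
```

The per-day counts are the more useful output in practice, because they show the spike after each build, which is when you want the extra meetings on the calendar.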

Whether you call it Defect Review or Triage, the process, formal or informal, is a necessary part of any software development project. At some point in the process you have to review and prioritize defect resolution. Project success depends on effective defect management and making tough decisions. This is the best way I have found to do it. It's not pretty, but it's necessary. Why not make it easy? By the way–I've found that pizza or doughnuts encourage attendance. Anyone have the number to Krispy Kreme?

CMCrossroads is a TechWell community.