You don't wait until the day before a software release to test the product. Testing software is a complex process, involving systematic investigation and sustained observation. In this week's column, James Bach argues that evaluating testers is similarly complex. And it shouldn't be put off until the night before the tester's performance review.
I was at the Sixth Software Test Managers Roundtable meeting recently, discussing ways to measure software testers. This is a difficult problem. Count bug reports? Nearly meaningless. Even if it meant something, it would have terrible side effects once testers suspected they were being measured that way. It's easy to increase the number of bugs you report without increasing the quality of your work. Count test cases? Voilà, you'll get more test cases, but you won't necessarily get more or better testing. What incentive would there be to do testing that isn't easily reduced to test cases, if only test cases are rewarded? Why create complex test cases when it's easier to create a large number of simple ones?
Partway through the meeting, it dawned on me that measuring testers is like measuring software. We can test for problems or experience what the product can do, but no one knows how to quantify the quality of software, in all its many dimensions, in any meaningful way. Even so, that doesn't stop us from making useful assessments of software quality. Maybe we can apply the same ideas to assessing the quality of testers.
Here are some ideas about that:
To test something, I have to know something about what it can do.
I used to think of testers in terms of a set of requirements that all testers should meet. But then I discovered I was missing out on other things testers might have to offer, while blaming them for not meeting my Apollonian ideal of a software testing professional. These days, when I watch testers and coach them, I look for any special talents or ambitions they may have, and I think about how the project could benefit from them. In other words, I don't have highly specific requirements. I use general requirements that are rooted in the mission the team must fulfill, and take into account the talents already present on the team. If I already have an automation expert, I may not need another one. If I already have a great bug writer who can review and edit the work of the others, I might not need everyone to be great at writing up bugs.
"Expected results" are not always easy to define.
Let's say two testers test the same thing, and they both find the same two bugs. One tester does that work in half the time of the other tester. Who is the better tester? Without more information, I couldn't say. Maybe the tester who took longer was doing more careful testing. Or maybe the tester who finished sooner was more productive. Even if I sit there and observe each one, it may not be easy to tell which is the better tester. I'm not sure what my expectation should be. What I do instead is to make my observations and do my best to make sense of them, weaving them into a coherent picture of how each tester performs. "Making sense of observations" is a much richer concept (and, I think, more useful) than "comparing to expected results."
When I find a problem, I suspend judgment and investigate before making a report.
When I see a product fail, especially if it's a dramatic failure, I've learned to pause and consider my data. Is it reliable? Might there be problems in the test platform or setup that could cause something that looks like a product failure, even though it isn't? When the product is a tester, this pause to consider is even more important, because the "product" is its own programmer. I may see behavior that looks like poor performance, when in fact the tester is doing what he thought I wanted him to do.
Sometimes, when one problem is fixed, more are created.
Whenever testers try to improve one aspect of their work, other aspects may temporarily suffer. For instance, doing more and better bug investigation for some problems may increase the chance that other problems will be missed entirely. This performance fluctuation is a normal part of self-improvement, but it can take a test manager by surprise. Just remember that testing, like any software program, is an interconnected set of activities. Any part of it may affect any other part. Overall improvement is an unfolding process that doesn't always unfold in a straight line.
Something may work well in one environment, and crash in another.
A tester may perform well with one technology, or with one group of people, yet flounder with another. This can lead those of us who spend a long time in one company to have an inflated view of our general expertise. Watch out for this. An antidote may be to attend a testing conference once in a while, or to participate in a tester discussion group, either live or online.
Problems and capabilities are not necessarily obvious and visible.
As with testing a software product, I won't know much just by dabbling with the user interface or watching a canned demonstration. I know that to test a product I must test it systematically, and the same goes when I'm evaluating a tester. This means sustained observation in a variety of circumstances. I learned long ago that I can't judge a tester from a job interview alone. All I can do is make an educated guess. Where I really learn about a tester is when I'm testing the same thing he's testing, working right next to him.
Testers are not mere software products, but I find that the parallel between complex humans and complex software helps me let go of the desire for simple measures that will tell me how good a tester is. When I manage testers, I collect information every day. I collect it from a variety of sources: bug reports, documentation, first-hand observation, or second-hand reports, to name some. About once a week, I take mental stock of what I think I know about each tester I'm working with, triage the "bugs" I think I see, and find something that's good or recently improved about each tester's work. It's a continuous process, just like real testing, not something that works as well when pushed to the last minute before writing a performance review.