Avoiding the Organizational Death Spiral

Summary:

The death spiral supersedes the death march in that the death march is a singular event, whereas the death spiral is systemic. It is the result of organizational dysfunction where teams march toward deadline after deadline without reflecting on or questioning if there is a better way to deliver software. There is! Take these positive steps.

Ants communicate via smell. They march along in a vast line, each ant following the pheromones released by the ant in front of it. This works well until a log falls across the marching column. The group in front of the log marches on, oblivious to what has transpired, while the group behind the log has lost the pheromone trail. The ant now at the head of the group left behind searches for the scent so it can continue its journey, and the ants behind the new leader follow without question. The new leader eventually picks up what it believes is the missing scent, but it’s actually the scent of the last ant in its own column. The column begins moving in a spiral, with the first ant following the last. They continue to march on in ignorant bliss, unaware that they are marching in a spiral, until eventually they die.

This sad story from the insect world unfortunately holds true in our modern corporate culture as well. We are all too familiar with the infamous death march associated with development projects in which teams strive to meet an unrealistic and unforgiving deadline. Wikipedia defines a death march as

“a project where the members feel it is destined to fail, or requires a stretch of unsustainable overwork. The general feel of the project reflects that of an actual death march because the members of the project are forced to continue the project by their superiors against their better judgment.”

The term was popularized by Edward Yourdon’s book Death March, and such projects routinely run afoul of the premise of Fred Brooks’s The Mythical Man-Month: adding manpower to a late software project makes it later.

The death spiral supersedes the death march in that the death march is a singular event, whereas the death spiral is systemic. It is the result of organizational dysfunction where groups of people march toward deadline after deadline without reflecting on or questioning whether there is a better way to deliver software. Even though the organization is “busy” doing stuff, it is unaware of the negative impact this has on its people and its value generation. Furthermore, because the organization struggles to meet its current commitments, it does not have the bandwidth to address environmental changes (such as changes in the market it competes in) and make the necessary adjustments to remain viable. A death spiral culture is one that has lost its ability to adapt and survive and is thus in danger of marching into oblivion.

How does an organization break the unproductive cycle imposed by its own dysfunction? Ultimately, the company needs to take an objective view of how it operates to meet its commitments so that it can break away to a new level of performance without killing its employees. This article addresses one of the main reasons organizations fall victim to the death spiral, additional factors that can cause a death spiral culture, and techniques for confronting this behavior.

Not Knowing Your Organizational Capacity

Many organizations do not take the time to understand their own capacity to deliver value. It’s kind of like the kid whose eyes are too big for her stomach. She eats as much as she can and eventually is full to the point of nausea, but her parents keep goading her to eat more.

The overutilization of organizational resources can take many forms. Two of the most common are overloaded projects and too many projects in progress. Organizations tend to load a single project with scope well beyond what the team or teams can complete within the project’s duration. This results in more overtime to finish the work in the time remaining, thus inducing another death march.

In order to adequately prevent a project death march, an organization needs to take a hard look at how much capacity the team actually has to complete work. It is only then that more prudent planning can occur that reduces the probability of a death march and ultimately leads to greater sustainable productivity in the long term.

The software creation process is highly complex and highly variable. The only way to effectively address variability in a system is to allow for some amount of excess capacity so that variation in work can be absorbed. Yet many resource management approaches still assume that 100 percent utilization of a resource, in this case a person, is optimal. In the realm of knowledge work, this line of thinking is outdated, a relic of the industrial revolution. Because knowledge work is inherently complex, some degree of slack is required to absorb the unknowns that have yet to be exposed.
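To make the slack argument concrete, the sketch below borrows the textbook M/M/1 queue (one worker, random arrivals, random task sizes), a standard queueing-theory model that the article itself does not cite. Under that model’s assumptions, the average time a task waits before anyone starts on it is ρ/(1 − ρ) times the average task duration, where ρ is utilization, so waits climb steeply as utilization approaches 100 percent. The function name is hypothetical.

```python
# Minimal sketch using the standard M/M/1 queueing model (an assumption,
# not the article's method) to show why 100 percent utilization backfires.


def expected_wait_in_queue(utilization: float) -> float:
    """Average queue wait, in multiples of the average task duration.

    For an M/M/1 queue: Wq / service_time = rho / (1 - rho),
    where rho is utilization and must satisfy 0 <= rho < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("M/M/1 is only stable for utilization in [0, 1)")
    return utilization / (1 - utilization)


for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    wait = expected_wait_in_queue(rho)
    print(f"utilization {rho:.0%}: wait is about {wait:.0f}x the task duration")
```

Real teams are not M/M/1 queues, so read this as the shape of the curve (waits multiplying as utilization nears 100 percent) rather than as a forecast.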

Think of a team member in your organization as a freeway. What happens when more and more cars are added to the freeway? Do things move faster or slower? Beyond a certain point, adding more cars actually slows the entire flow of traffic. Why? Because there is no slack left to absorb the variation in speed that occurs in a complex system. If one driver unconsciously slows down to view a billboard, pick his nose, or text his BFF, the driver right behind him is affected too, because there is no slack that lets him keep his speed without colliding with the rear of the nose-picker.

So not only do organizations make a mistake when utilizing all of their team members at 100 percent, but they often exacerbate the problem by loading them beyond 100 percent, many times without even knowing it. This is one of many variables that create a death spiral culture. Ultimately, organizations need to become more aware of the root causes of the death spiral culture so that they can proactively take steps to improve the delivery of value and not go in circles sniffing one another’s pheromones.

Understanding the capacity for work is a major issue that many organizations face. Departments, divisions, and entire companies have a natural tendency to overstuff software releases, similar to a man with a size fifteen foot attempting to fit into a size thirteen shoe. He may eventually squeeze into the shoe, but it won’t be pretty.

Overloading release after release beyond the limits of an organization’s capacity will increase the stress on teams and heighten the likelihood of employee burnout. As a result, organizations will face an ever-growing number of quality issues surfacing in production. In contrast, a key agile principle is to work at a sustainable pace. Over time, an organization that does so becomes more productive thanks to faster throughput, higher quality, and less rework. More adept organizations will embrace this principle to minimize stress and groom their employee bases for the future.

The challenge many organizations face is how to measure their capacity. Agile provides a way to size requirements using a relative estimation technique called story pointing. By sizing each item in a release and knowing each team’s average velocity, the organization has the tools it needs: an average of the teams’ average velocities provides a simple figure against which to forecast future release dates. This figure is often used to judge the likelihood of meeting a pre-established release date, and the result is often a very focused conversation among the people responsible for the organization’s release execution.

For example, if the total number of story points (SPs) slated for a release is three hundred and there are only six sprints in the release, we need to determine whether the collective capacity of the teams working on the release can realistically complete the work in the defined time frame. Let’s assume there are three teams. Team A has an average velocity of twenty-five SPs, Team B’s is forty SPs, and Team C’s is fifty-five SPs. We know that SPs are local to each team, so we need to normalize the values across all three teams. The easiest way to do this is to average the average velocities; in this case, the average velocity of all three teams is forty SPs. We can then determine how many sprints are needed to complete three hundred SPs of work: three hundred divided by forty is seven and a half, which we round up to eight. The current release is therefore overloaded: it would require two additional sprints (eight instead of six) to complete three hundred SPs of work, or it needs to be descoped to 240 SPs (six sprints at forty SPs each).
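The arithmetic in this example can be checked with a short script. It is a minimal sketch that mirrors the article’s method as written (average the teams’ average velocities, divide the release scope by that figure, round up); the function name `release_forecast` is hypothetical.

```python
import math
from statistics import mean


def release_forecast(total_points, team_velocities, planned_sprints):
    """Release-capacity check using the article's arithmetic.

    The per-sprint figure is the average of the teams' average velocities,
    which the article uses to normalize team-local story points.
    """
    normalized_velocity = mean(team_velocities)
    sprints_needed = math.ceil(total_points / normalized_velocity)  # round up
    extra_sprints = max(0, sprints_needed - planned_sprints)
    # Scope that fits if the release date is held fixed instead.
    scope_that_fits = planned_sprints * normalized_velocity
    return normalized_velocity, sprints_needed, extra_sprints, scope_that_fits


# The article's example: 300 SPs, six sprints, teams at 25, 40, and 55 SPs.
velocity, needed, extra, fits = release_forecast(300, [25, 40, 55], 6)
print(f"{velocity:g} SPs/sprint, {needed} sprints needed, "
      f"{extra} extra, {fits:g} SPs fit")
# 40 SPs/sprint, 8 sprints needed, 2 extra, 240 SPs fit
```

The method treats the averaged velocity as the release’s total throughput per sprint. If the teams’ points were on a common scale and all three worked the release backlog in parallel, summing their velocities (120 SPs per sprint here) would be the more usual capacity figure, so it is worth confirming which assumption matches how the work is actually split.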

No Prioritization Process

A common issue with many organizations is that they are not disciplined about prioritizing which projects should be done and in what order they should proceed through the development process. The assumption is that all projects are important and all must be started as soon as possible. There is no consideration that the organization is constrained by its capacity to do the work and that not all work is equal.

Prioritization, whether at the project or story level, needs the right level of rigor to maximize the delivery of value. There are multiple ways to prioritize work. A common approach that has emerged over the years is the “weighted shortest job first” (WSJF) method, which weighs the cost of delay (business value, time criticality, and risk reduction) against the effort, or job size, so that the jobs delivering the most value per unit of effort come first.
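A minimal sketch of weighted shortest job first, using the common formulation in which cost of delay is the sum of business value, time criticality, and risk reduction, and the WSJF score is cost of delay divided by job size. The backlog items and their relative scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    business_value: int
    time_criticality: int
    risk_reduction: int
    job_size: int  # relative effort estimate, used as a proxy for duration

    @property
    def cost_of_delay(self) -> int:
        return self.business_value + self.time_criticality + self.risk_reduction

    @property
    def wsjf(self) -> float:
        return self.cost_of_delay / self.job_size


def prioritize(jobs):
    """Order jobs by WSJF, highest first: most cost of delay per unit of size."""
    return sorted(jobs, key=lambda job: job.wsjf, reverse=True)


# Hypothetical backlog with relative scores.
backlog = [
    Job("A", business_value=8, time_criticality=5, risk_reduction=3, job_size=5),
    Job("B", business_value=5, time_criticality=8, risk_reduction=2, job_size=2),
    Job("C", business_value=13, time_criticality=3, risk_reduction=5, job_size=13),
]
for job in prioritize(backlog):
    print(job.name, round(job.wsjf, 2))
```

Because size sits in the denominator, a small job with a lower cost of delay (B) can outrank a larger job with a higher one (C).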

Prioritization choices should be made with thought rather than emotion or the force of one person’s will. Regardless of the approach, an organization needs to follow it consistently so that the right things are started at the right time.

Too Many Projects in Flight

Multitasking is a myth. If you are skeptical about that statement, take time to observe people on your way to and from work. Notice how many are texting while driving and how well they are driving their cars, and it becomes apparent that humans are really only good at one thing at a time.

Organizations, like people, assume they are great at multitasking. Toyota realized long ago, through the Toyota Production System, that limiting the amount of work in progress (WIP) increases throughput. Many organizations initiate as many projects as possible for various reasons, one of which is the false assumption that starting work equals making progress. In reality, this slows everyone down: most people end up assigned to multiple projects, so each individual project takes longer to complete.
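The cost of juggling projects shows up even without any context-switching penalty. This minimal sketch compares three hypothetical projects of equal size worked one after another versus in round-robin slices; the project sizes are made up, and the zero switching cost is an assumption that flatters multitasking.

```python
def completion_times_sequential(work):
    """Finish each project before starting the next."""
    times, clock = [], 0
    for units in work:
        clock += units
        times.append(clock)
    return times


def completion_times_round_robin(work):
    """Rotate one unit of effort at a time across all open projects.

    Context-switching cost is assumed to be zero.
    """
    remaining = list(work)
    times = [0] * len(work)
    clock = 0
    while any(remaining):
        for i in range(len(remaining)):
            if remaining[i] > 0:
                clock += 1
                remaining[i] -= 1
                if remaining[i] == 0:
                    times[i] = clock  # project i finishes now
    return times


work = [3, 3, 3]  # three hypothetical projects, 3 units of effort each
sequential = completion_times_sequential(work)
interleaved = completion_times_round_robin(work)
print("sequential: ", sequential, sum(sequential) / len(sequential))
print("interleaved:", interleaved, sum(interleaved) / len(interleaved))
```

Interleaving finishes nothing earlier and delays every project except the last, raising the average completion time from 6 to 8 units; any real switching overhead would widen the gap.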

Instead, an organization should determine the right number of projects to have in progress concurrently in order to maximize throughput. This requires that the organization understand each state a project moves through in its lifecycle, the average cycle time per state, and the average total cycle time from inception to deployment. The next step is to establish a WIP limit for each state and then observe. See where work begins to queue up from one state to the next and where some groups are starved for work. Adjust the WIP limits to even the flow through the system. An organization’s mantra should be “Stop starting and start finishing.”
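Little’s Law, a standard queueing result that the article does not name, ties these quantities together: for a stable system, average cycle time equals average WIP divided by average throughput. The sketch below applies it to a hypothetical portfolio and adds a simple check for the queued and starved states described above; the function names, state names, and numbers are illustrative assumptions.

```python
def average_cycle_time(avg_wip: float, throughput_per_month: float) -> float:
    """Little's Law: average cycle time = average WIP / average throughput."""
    return avg_wip / throughput_per_month


def review_states(states):
    """Flag states over their WIP limit (work queuing up) and empty states
    (the group is starved for work)."""
    queued = [name for name, wip, limit in states if wip > limit]
    starved = [name for name, wip, _limit in states if wip == 0]
    return queued, starved


# Hypothetical portfolio: 12 projects in flight, 2 finished per month.
print(average_cycle_time(12, 2))  # 6.0 months from inception to deployment
# WIP capped at 6, assuming throughput does not drop.
print(average_cycle_time(6, 2))   # 3.0 months

# Hypothetical lifecycle states as (name, current WIP, WIP limit).
states = [
    ("analysis", 5, 3),
    ("development", 4, 4),
    ("testing", 0, 2),
    ("deployment", 1, 1),
]
print(review_states(states))  # (['analysis'], ['testing'])
```

Halving WIP halves cycle time only if throughput holds steady; the article’s argument is that throughput tends to improve as fewer people are split across projects, which would shorten cycle time further.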

Large Queues

Excess inventory costs money, whether it is physical inventory sitting in a warehouse or virtual inventory collecting cyber dust in the world of bits and bytes. “Just in time” inventory management in the manufacturing world reduces the cost of inventory by holding only as much as is needed to keep work flowing through a factory.

You manage queue size by migrating from a “push” culture to a “pull” culture, where new work starts only when there is capacity for it: the team “pulls” work in. Large queues lead to death marches because work is pushed onto teams with no regard for their available capacity. This slows things down instead of speeding them up because there is too much work in process. The lean concept of using a pull system combined with WIP limits, observation, and fine-tuning eventually evens the flow of work through the software factory. This allows teams to work at a sustainable pace throughout the project lifecycle instead of marching toward their ultimate demise.
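A minimal sketch of the pull idea: a hypothetical `PullStage` class (not from any Kanban tool) that takes new work from a backlog only while it is under its WIP limit, so finishing an item is what creates room to start the next one.

```python
from collections import deque


class PullStage:
    """A workflow stage that pulls new work only when under its WIP limit."""

    def __init__(self, name: str, wip_limit: int):
        self.name = name
        self.wip_limit = wip_limit
        self.in_progress: list[str] = []

    def pull(self, backlog: deque) -> bool:
        """Take the next backlog item if capacity allows; report success."""
        if backlog and len(self.in_progress) < self.wip_limit:
            self.in_progress.append(backlog.popleft())
            return True
        return False

    def finish(self, item: str) -> None:
        """Complete an item, freeing capacity for the next pull."""
        self.in_progress.remove(item)


backlog = deque(["feature-1", "feature-2", "feature-3", "feature-4"])
dev = PullStage("development", wip_limit=2)
while dev.pull(backlog):  # pulls until the WIP limit is reached
    pass
print(dev.in_progress, list(backlog))

dev.finish("feature-1")   # finishing creates capacity...
dev.pull(backlog)         # ...so the next item can be pulled
print(dev.in_progress, list(backlog))
```

The backlog can grow as large as it likes without pushing extra work onto the stage; items wait in the queue until there is room for them.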

Large Batch Transfers

Traditional software development methods involve large batch transfers as the software creation process moves from phase to phase (e.g., requirements, design, development). From a lean perspective, handoffs in general are a source of waste, and large batch transfers only exacerbate this. What appears to be an efficient use of economies of scale has proven, in both manufacturing and software development, to be incredibly inefficient. Moving smaller batches of work through a system yields greater throughput and better quality, because an error that is detected is easier to resolve when fewer variables are involved in tracing its root cause.

Organizations should move small batches of work from one phase to the next, starting at inception, to maximize throughput and minimize waste. This approach goes hand in hand with limiting WIP at each project state, because WIP limits also cap the size of batch transfers.
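To illustrate the batch-size claim, here is a minimal simulation under deliberately simple, hypothetical assumptions: ten items, three phases, one time unit per item per phase, each phase working on one batch at a time, and a batch handed off only when the whole batch is finished. It reports when the first and last items emerge from the final phase.

```python
def delivery_times(num_items: int, num_phases: int, batch_size: int):
    """Return (first_delivery, last_delivery) when work moves in batches.

    Assumes every item takes one time unit in every phase, each phase works
    on one batch at a time, and a batch moves on only when all of it is done.
    """
    sizes = [batch_size] * (num_items // batch_size)
    if num_items % batch_size:
        sizes.append(num_items % batch_size)  # final partial batch

    phase_free_at = [0] * num_phases  # when each phase finishes its last batch
    deliveries = []
    for size in sizes:
        ready_at = 0  # when this batch becomes available to the current phase
        for phase in range(num_phases):
            start = max(ready_at, phase_free_at[phase])
            phase_free_at[phase] = start + size
            ready_at = phase_free_at[phase]
        deliveries.append(ready_at)
    return deliveries[0], deliveries[-1]


# Ten items through three phases (e.g., requirements, design, development).
for batch in (10, 5, 1):
    first, last = delivery_times(10, 3, batch)
    print(f"batch size {batch:2}: first item done at t={first}, last at t={last}")
```

Under these assumptions, moving all ten items as one batch delivers nothing until t=30, while single-item batches deliver the first item at t=3 and the last at t=12. The earlier first delivery is also what makes earlier feedback possible.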

Delaying Feedback

One of the biggest causes of the death spiral is delaying feedback on the software under development. Traditional software development methods postpone feedback and introduce risk by delaying the start of development. A key agile principle is that working software is the primary measure of progress. Yet traditional approaches defer the point of software creation until all requirements and design are complete.

Once actual development and testing are underway, the true progress and condition of the software become known. Unfortunately, with traditional methods this information becomes available late in the project’s lifecycle, and normally the time remaining is not long enough to get things back on track when derailments occur. This results in undue stress on teams as more resources are thrown at the problem or the deadline is extended. By working in small batches and bringing capabilities to a level of “doneness” in order to receive feedback, teams are given the opportunity to inspect, adapt, and evolve the software to best meet customers’ needs. It is better to be two weeks wrong than two months wrong, and shorter cycles, small batches, and customer feedback mitigate this risk.

Improving Organizational Throughput

Many of us have experienced the occasional software release death march, and a select number of us have endured the larger organizational death spiral. However, there is light at the end of the tunnel if the organization is willing to adhere to a set of simple principles: Understand your organizational capacity and don’t exceed it, prioritize your work, limit your work in process, work in smaller batches, reduce queue sizes, and create frequent, regular customer feedback loops. I hope the concepts covered in this article help your organization take positive steps toward breaking historical patterns of dysfunctional organizational behavior.


CMCrossroads is a TechWell community.
