Fixing a Broken Deployment Process

Summary:
When you have hundreds of applications performing various functions across several environments, it's tough to deploy all the code when it needs to go out. Here are some steps to help your own team develop the internal tooling it needs to deploy thousands of applications, if necessary, in a reliable, efficient manner.

At my company, we have hundreds of applications that need to be frequently deployed. These applications perform various functions, including data analytics, data normalization, and presenting our findings to customers in a concise, human-readable format.

These hundreds of applications are run in a variety of environments. We have an environment specifically dedicated to development, another meant for rigorous testing, one for verifying that everything looks okay before releasing our code out into the world, and, finally, production. 

Deploying that many applications to that many environments is a challenge. My team was tasked with solving it by building internal tooling that lets application development teams deploy their applications in a reliable, efficient manner.

There are a number of steps that I found valuable in helping my team achieve this goal. I think they’re lessons I can take with me anywhere and apply to deployment problems across other teams and companies, so I hope you can also use these lessons to fix deployment problems specific to your organization.

Lesson #1: Find the Bottleneck

If you know your company’s deployment process is broken but don’t know where, then you can’t really fix it. Look at how your development teams work and ask yourself where the problems are. Which part takes teams the longest? Or which part of the process has an ever-growing backlog because the team can’t keep up with the work coming in?

That's your bottleneck. That's where you start.

My team and I took a look at the current software delivery process. There was one guy organizing deployments for all the application teams in the engineering organization: one guy, twenty teams, and six environments that these applications needed to be deployed into. And each app had a different configuration per environment, all set by hand.

This resulted in chaos and bugs. Imagine having to keep track of 120 different configurations while hand-modifying those that need to change. If you make a mistake and the application doesn’t work right, you have no idea how to find the problem, as you aren’t the developer of the application. You’re just setting configuration based on what the development team told you to do.

Because of this buggy, manual process, deployments to production happened only every two weeks, if we were lucky. Ideally, developers should be able to deploy whenever they finish a feature that provides value. That could be on a weekly or even daily basis. Limiting developers to this two-week cadence for production deployments created a bottleneck, which resulted in larger changes to applications and many more applications breaking once they actually made it to production, which is the problem we were trying to avoid in the first place.

At this point, we had a good sense of where we needed to focus. It made sense to take all those application configurations, figure out what the similarities are, define a standard format, and automatically parse through them. In this way, we automatically configured the applications in each environment instead of having some unfortunate individual do this by hand. Automating our application configuration process both accelerated our deployments and reduced the number of bugs in the deployment process.
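
To make the idea concrete, here is a minimal sketch of what a standard format and its parser could look like. It is not our actual tooling: the JSON layout, the key names (defaults, environments, replicas, log_level), and the resolve_config function are all assumptions made up for illustration.

```python
import json

# Hypothetical standard format: shared defaults plus per-environment overrides.
APP_CONFIG = """
{
  "app": "hello-world",
  "defaults": {"replicas": 1, "log_level": "info"},
  "environments": {
    "development": {"log_level": "debug"},
    "production": {"replicas": 4}
  }
}
"""


def resolve_config(raw, environment):
    """Merge an app's defaults with the overrides for one environment."""
    parsed = json.loads(raw)
    if environment not in parsed["environments"]:
        raise KeyError(
            f"{parsed['app']}: no configuration for environment '{environment}'"
        )
    resolved = dict(parsed["defaults"])
    # Environment-specific values win over the shared defaults.
    resolved.update(parsed["environments"][environment])
    resolved["app"] = parsed["app"]
    resolved["environment"] = environment
    return resolved


if __name__ == "__main__":
    for env in ("development", "production"):
        print(resolve_config(APP_CONFIG, env))
```

The point of a layout like this is that a person only writes down what differs per environment, and the tool works out the full configuration every time instead of someone editing it by hand.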

A useful tool that I wish I had known about before I started this whole deployment-fixing adventure is the cumulative flow diagram. It’s a visual tool that shows work items along the entire software delivery cycle, and it’s great for highlighting the bottlenecks in a process. I’ll likely use a cumulative flow diagram the next time I feel like teams are getting stuck and can’t pinpoint where their bottlenecks are.
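
The data behind such a diagram is simple: for each day, count how many work items sit in each stage, then stack those counts over time. A band that keeps widening suggests work is piling up there. The sketch below computes those counts from made-up snapshots. The stage names, the sample data, and both functions are hypothetical, and a real version would pull the snapshots from your issue tracker.

```python
from collections import Counter

# Hypothetical stages, ordered from the start to the end of delivery.
STAGES = ["backlog", "development", "awaiting_deploy", "production"]

# Hypothetical daily snapshots: the stage each work item was in on that day.
SNAPSHOTS = {
    "day 1": ["backlog", "backlog", "development", "awaiting_deploy"],
    "day 2": ["backlog", "development", "awaiting_deploy", "awaiting_deploy"],
    "day 3": ["development", "awaiting_deploy", "awaiting_deploy", "awaiting_deploy"],
}


def flow_counts(snapshots):
    """Return, per day, how many items sit in each stage (one band per stage)."""
    counts = {}
    for day, items in snapshots.items():
        tally = Counter(items)
        counts[day] = {stage: tally[stage] for stage in STAGES}
    return counts


def widest_growth(counts):
    """Name the unfinished stage whose band grew most from first to last day."""
    days = list(counts)
    first, last = counts[days[0]], counts[days[-1]]
    in_progress = STAGES[:-1]  # the final stage is supposed to grow
    return max(in_progress, key=lambda stage: last[stage] - first[stage])


if __name__ == "__main__":
    counts = flow_counts(SNAPSHOTS)
    for day, per_stage in counts.items():
        print(day, per_stage)
    print("likely bottleneck:", widest_growth(counts))
```

With this sample data, the awaiting_deploy band grows from one item to three while the others shrink or hold steady, which is the kind of pattern we were living with.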

Lesson #2: Don’t Get in the Way of Development

While you will need some support from your application development teams to automate deployment processes, remember that your development teams are still implementing their own code and have deadlines to meet. Asking these teams to modify their applications or standardize their application configurations to support your automation needs may seem like a great idea, but it will ultimately slow them down—exactly the opposite of what you’re trying to do.

If you want your new deployment automation to be used, you’re going to have to go the extra mile in helping your development teams prepare their application for this process.

In our case, the work involved gathering application configurations, standardizing their format, and parsing this information via an automated deployment tool. And by “gathering application configurations,” I mean that we had to move over other teams’ applications for them.

You may be thinking something along the lines of "That must have taken forever." It did take our team some time. But we focused on this project and divided the work among team members, so several people could work on it at once, and we regrouped constantly to connect our pieces together. Plus, it took a lot less time than teaching all the dev teams the process we’d created and then having them add their own configurations.

Once finished, our tooling then had the ability to automatically deploy applications to an environment. We proved that it worked by deploying a simple "Hello World" application into each environment, later using it as part of our training process.

Lesson #3: Automate Testing (with Helpful Output!)

All these lessons are valuable to me, but this one is my favorite. Regardless of how well your automated deployment process works, developers will make mistakes when attempting to deploy their applications. Your process should test what is being deployed to make sure deployments work properly and provide useful diagnostics and error messages if a problem occurs.

For our situation, we needed a way to ensure that our standardized configuration format was used correctly and, if an application configuration was broken, to let development teams know what was wrong. Just putting a definition of our format in a README file or another document wasn’t sufficient.

Instead, we automated the testing of any changes made to our configuration file format. Our tests checked each configuration for the development teams: Do the keys being used make sense? Do the values entered fall into the correct format? Is the configuration flexible enough to take some oddball use case that likely got created when there wasn't a process at all?

These tests run automatically any time someone tries to make a change to our standard, and they require teams to fix issues before deploying their apps with our deployment tool. This allows our format to stay clean and be used properly by our tooling.
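
As a rough sketch of that kind of check, the function below validates one configuration and returns every problem it finds as a readable message rather than stopping at the first failure. The schema, the allowed log levels, and the name validate_config are assumptions for illustration, not our real format.

```python
# Hypothetical schema: required keys and the type each value must have.
SCHEMA = {"app": str, "replicas": int, "log_level": str}
ALLOWED_LOG_LEVELS = {"debug", "info", "warning", "error"}


def validate_config(config):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    # Do the keys being used make sense?
    for key in sorted(set(config) - set(SCHEMA)):
        errors.append(f"unknown key '{key}'; expected one of {sorted(SCHEMA)}")

    # Are the required keys present, with values of the right type?
    for key, expected_type in SCHEMA.items():
        if key not in config:
            errors.append(f"missing required key '{key}'")
            continue
        value = config[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(
                f"'{key}' must be {expected_type.__name__}, "
                f"got {type(value).__name__}: {value!r}"
            )

    # Do the values fall into the allowed set?
    level = config.get("log_level")
    if isinstance(level, str) and level not in ALLOWED_LOG_LEVELS:
        errors.append(
            f"'log_level' must be one of {sorted(ALLOWED_LOG_LEVELS)}, got {level!r}"
        )
    return errors


if __name__ == "__main__":
    broken = {"app": "hello-world", "replicas": "two", "loglevel": "info"}
    for problem in validate_config(broken):
        print("config error:", problem)
```

Reporting all the problems at once, each naming the offending key and value, is what turns a failed check into something a developer can fix without asking us.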

Lesson #4: Educate the Organization

Getting teams to adopt your new process involves more than just setting up some tests. Your tests may prevent people from adding changes when they make mistakes, but without communicating how to do things correctly, you’re not going to make a whole lot of progress in the way of deploying applications efficiently. The best thing to do is reach out to the developers you’re providing the tooling to and see how you can help.

Start with a team you have a good rapport with. Work directly with them and use them as a way to test out your tooling and gather feedback. By having initially copied over the first configurations, you’ve given developers a template off of which to base new apps that they add to your tooling: Take a look at Team Athena's configuration. Copy and paste. Change around the values. Now it's yours.

Document any questions they have with answers in the README, and make sure to focus on those areas with the next team you onboard.

Communicate that things are happening! Use your company chat tool of choice and set up calendar events that people can keep an eye on. Do whatever you can to make sunsetting the old process a visible (but welcome!) change.

Finally, educate the new hires, existing teams, and managers. I scheduled a number of meetings with various development teams and volunteered to give biweekly presentations to new employees. In each of these gatherings, I presented what deploying used to look like, the problems with it, and how the new method works.

People were a lot less confused, and they were much more likely to come back with questions if they needed further help rather than sitting in frustration. Developers also started adding new applications without my team being involved, becoming self-sustaining with their deployments.

Your New and Improved Process

If your new tooling improves the current deployment process and people are adopting it, it is a success.

You’ll have the occasional skeptic, but think of that conversation as a way of showing off your great work. One of my team members had written a script to automate metrics gathering around our new deployment tool. We also got fantastic and thorough feedback from that one guy originally in charge of deploying all the applications in the company. Those wins spoke volumes about what we had achieved and how it was helping the organization.
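
I won't reproduce that script here, but a metric as simple as deployments per week goes a long way in those conversations. The sketch below shows one way it could be computed. The log records, the dates, and the function name are invented for illustration and say nothing about which metrics my teammate's script actually gathered.

```python
from collections import Counter
from datetime import date

# Hypothetical deployment log: (application, environment, date deployed).
DEPLOYMENTS = [
    ("hello-world", "production", date(2024, 1, 2)),
    ("hello-world", "production", date(2024, 1, 4)),
    ("reports", "production", date(2024, 1, 9)),
    ("reports", "development", date(2024, 1, 9)),
]


def deployments_per_week(deployments, environment="production"):
    """Count deployments to one environment, keyed by (ISO year, ISO week)."""
    weekly = Counter()
    for _app, env, day in deployments:
        if env == environment:
            iso = day.isocalendar()
            weekly[(iso[0], iso[1])] += 1
    return dict(weekly)


if __name__ == "__main__":
    for (year, week), count in sorted(deployments_per_week(DEPLOYMENTS).items()):
        print(f"{year} week {week}: {count} production deployment(s)")
```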

Best of luck on your journey to change deployments at your company for the better. I hope these lessons help you get there.

CMCrossroads is a TechWell community.
