The nature of software delivery has evolved significantly over the years. As a result, the way application teams work has changed, and so have the tools they need to deliver software. Integrating application performance monitoring into the software development lifecycle means issues affecting performance can be fixed before applications are deployed.
Two trends in software development are driving the need to integrate application performance monitoring into the development process. The first is that application development and its lifecycle have changed in two significant ways. Development has evolved from a single team building a monolithic application to a much more granular, component-based (mainly API-driven) effort in which reuse of components from external parties is more prevalent. It has also moved from a managed, centrally hosted, and controlled infrastructure to a more dispersed and fluid cloud- or software-as-a-service-oriented architecture targeting global customers.
The second trend is the “consumerization” of user expectations. Expectations for application performance are often attributed to the consumerization of IT. Essentially, because everyone uses web applications on a variety of devices to shop, watch videos, and communicate with friends, their experience as consumers of these technologies colors their expectations. Users expect the same kind of performance they get on Facebook or Amazon when they are transferring money at the bank, filing an insurance claim, or looking up medical records at a hospital. Additionally, the overall pace of the business cycle drives application delivery to be faster, have more features sooner, and perform at the expected levels with fewer resources.
Application development methodologies have evolved to manage these changes, moving from serial development of features, assembled and released once or twice a year, to more agile techniques that allow smaller blocks of work to be completed and deployed rapidly without waiting for other items. The goal now is continuous delivery at all points of the lifecycle.
End-user expectations show no sign of relaxing; if anything, they are becoming more demanding as devices proliferate. Users expect applications to work well regardless of the device.
In this context, application development and delivery teams are also evolving toward a more fluid model of working—hence the DevOps movement. With very short cycles to develop, deliver, maintain, fix, and deliver again, new ways of working together that save time must be found.
Integrating performance monitoring into this rapid development and delivery cycle helps DevOps teams work together to deliver high-performing applications quickly. Just as development methods have changed, so has application performance monitoring. Previously, developers moved through three separate steps: develop, test, and then monitor in production, and that production environment was mostly an application and infrastructure under their control. Monitoring now has to ensure performance for internal and external applications and end-users across an application and infrastructure that is at least partially out of the team's control. The only way to regain control is to monitor performance.
In the current environment, developers can and should start application performance monitoring in test and then move to a production environment, cycling back as needed.
Applying Performance Monitoring in the Development Lifecycle
First, developers can inject performance monitoring while building functional components, baselining the performance of each component before assembling the components into the completed application. This provides a reference point to measure against the performance of the application as a whole.
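As an illustration of what this could look like in practice, the sketch below records a component-level timing baseline alongside ordinary unit tests. The component, regression threshold, and baseline file are hypothetical, not anything prescribed by the article.

```python
# component_perf_test.py -- minimal sketch of component-level performance
# baselining; the component, 20% threshold, and baseline file are hypothetical.
import json
import statistics
import time
from pathlib import Path

BASELINE_FILE = Path("baselines/price_lookup.json")  # hypothetical path


def price_lookup(sku: str) -> float:
    """Stand-in for the real component under test."""
    time.sleep(0.005)  # simulate work
    return 9.99


def measure(component, *args, runs: int = 50) -> float:
    """Return the median wall-clock time (seconds) over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        component(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def test_price_lookup_within_baseline():
    median = measure(price_lookup, "SKU-123")
    if BASELINE_FILE.exists():
        baseline = json.loads(BASELINE_FILE.read_text())["median_seconds"]
        # Fail the test if the component is more than 20% slower than baseline.
        assert median <= baseline * 1.2, f"regression: {median:.4f}s vs {baseline:.4f}s"
    else:
        # First run: record the baseline for future comparisons.
        BASELINE_FILE.parent.mkdir(parents=True, exist_ok=True)
        BASELINE_FILE.write_text(json.dumps({"median_seconds": median}))
```

Run with a test runner such as pytest, this kind of check fails the build when a component's median latency drifts noticeably above its recorded baseline.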
Next, when the application is built, baseline the performance of the end-to-end application in a staging environment. This provides a measurement to compare against as the environment changes and the application moves to production; without this baseline, there is no way to determine whether performance has changed when moving to production.
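A rough sketch of recording such an end-to-end baseline against a handful of representative transactions might look like the following; the staging URLs, run count, and output file are illustrative assumptions.

```python
# staging_baseline.py -- sketch of recording an end-to-end baseline in staging.
# The transaction URLs, run count, and output path are illustrative assumptions.
import json
import statistics
import time
import urllib.request

TRANSACTIONS = {
    "home": "https://staging.example.com/",
    "search": "https://staging.example.com/search?q=demo",
    "checkout": "https://staging.example.com/checkout",
}


def time_request(url: str) -> float:
    """Time a single request to the given URL, reading the full response."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()
    return time.perf_counter() - start


def record_baseline(runs: int = 20) -> dict:
    """Record median and p95 response times per transaction."""
    baseline = {}
    for name, url in TRANSACTIONS.items():
        samples = sorted(time_request(url) for _ in range(runs))
        baseline[name] = {
            "median_s": statistics.median(samples),
            "p95_s": samples[int(0.95 * (len(samples) - 1))],
        }
    return baseline


if __name__ == "__main__":
    with open("staging_baseline.json", "w") as f:
        json.dump(record_baseline(), f, indent=2)
```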
Finally, monitor in production—preferably from wherever users are coming from—internal, external, and applicable geographies.
Now, because the performance has been baselined for both the components and the application in a test environment, the development team understands what the performance should be, which aids them when problems occur. Given that production is a different environment where changes are made that are not under the developer’s control, there will be issues, but with the information already gathered, it is easier to locate the root cause. The operations team finds the issues, but the developers have to fix them; integrating performance monitoring throughout the process reduces mean time to repair.
There are other considerations when implementing this strategy. At a high level, these include:
- How fast do components or applications need to be? What is the performance of competitors in this space?
- What kind of volume and growth need to be supported with this application?
- If an API is provided, what is its performance? Does an SLA (service-level agreement) need to be provided?
- If an API is being consumed, is it the best-performing one for the application? Does it perform consistently? (A sketch of this kind of check follows the list.)
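To make the API-related considerations concrete, a simple scheduled check could compare an API's response time with its SLA target. The endpoint and the 500 ms target below are assumptions for illustration only.

```python
# api_sla_check.py -- sketch of checking a consumed API against an SLA target.
# The endpoint and the 500 ms target are illustrative assumptions.
import time
import urllib.request

API_URL = "https://api.example.com/v1/quotes"  # hypothetical endpoint
SLA_SECONDS = 0.5                              # hypothetical SLA: 500 ms


def check_api_sla(url: str = API_URL, sla: float = SLA_SECONDS) -> bool:
    """Return True if the API responds successfully within the SLA."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=10) as resp:
        ok = resp.status == 200
    elapsed = time.perf_counter() - start
    within_sla = ok and elapsed <= sla
    print(f"{url}: {elapsed * 1000:.0f} ms (SLA {sla * 1000:.0f} ms) -> "
          f"{'OK' if within_sla else 'VIOLATION'}")
    return within_sla


if __name__ == "__main__":
    check_api_sla()
```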
There are various best practices to consider. When using a component architecture, component encapsulation requires component-based monitoring in addition to application monitoring from the end-user perspective. The more granular level of detail this provides, the more useful it will be for troubleshooting. Detecting changes in performance at a detailed level alleviates the need to spend lots of time looking for the root cause.
Monitoring requires something to monitor and a reference against which to compare the results. The minimum requirement is one SLA or performance threshold per component. If software components are too numerous or proliferate quickly, it may become impossible to set thresholds by hand, and a more advanced automatic technique may be required. Consider using automated baselining and automatic anomaly detection in QA environments to implement this successfully.
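One simple way to approach automated baselining is to derive each component's threshold from its own recent history and flag measurements that deviate too far from it. The sketch below uses a mean-plus-standard-deviation rule; the window size and sensitivity are arbitrary choices, not values from any particular product.

```python
# anomaly_detection.py -- sketch of automated baselining: each component's
# threshold is derived from its own recent history instead of set by hand.
import statistics
from collections import defaultdict, deque

WINDOW = 100        # how many recent samples form the baseline
SIGMA_FACTOR = 3.0  # flag anything more than 3 standard deviations above mean

history = defaultdict(lambda: deque(maxlen=WINDOW))


def is_anomalous(component: str, response_time: float) -> bool:
    """Return True if this measurement deviates from the component's baseline."""
    samples = history[component]
    anomalous = False
    if len(samples) >= 10:  # need some history before judging
        mean = statistics.mean(samples)
        stdev = statistics.pstdev(samples) or 1e-9
        anomalous = response_time > mean + SIGMA_FACTOR * stdev
    samples.append(response_time)
    return anomalous
```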
Due to component proliferation and the syndication of third-party services, synthetic monitoring tools specifically able to test component performance (such as APIs) are required. Because access to third-party component systems may not be possible, other approaches, such as agent-based monitoring or log collection, cannot be used. It is also important to look from the user's point of view, meaning that when using synthetic monitoring, it is critical to use real browsers and generate traffic from the geographic locations of end-users.
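As one possible shape for such a check, the sketch below drives a real browser and times a full page load. It assumes the open-source Playwright library and a placeholder URL; geographic coverage would come from running it on probes located in the relevant end-user regions.

```python
# synthetic_browser_check.py -- sketch of a synthetic check using a real browser.
# Assumes the Playwright package is installed; the URL is a placeholder.
# Geographic coverage comes from running this from probes in end-user regions.
import time

from playwright.sync_api import sync_playwright

URL = "https://www.example.com/"


def timed_page_load(url: str = URL) -> float:
    """Load the page in a real headless browser and return elapsed seconds."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        start = time.perf_counter()
        page.goto(url, wait_until="load")
        elapsed = time.perf_counter() - start
        browser.close()
    return elapsed


if __name__ == "__main__":
    print(f"Full page load: {timed_page_load():.2f} s")
```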
A mix of monitoring techniques may be necessary to provide all the information needed. Agent-based monitoring provides an end-user perspective for live traffic but cannot detect problems occurring in areas without live traffic. Continuous synthetic monitoring detects performance issues as they happen so that they can be fixed prior to impacting live traffic.
Monitoring is all about understanding what is going on that is different from normal.
By bringing performance monitoring across more of the application development lifecycle, changes that affect performance can be caught faster, issues can be fixed before applications are deployed, and the move toward continuous delivery is better supported.
User Comments
Really excellent points.
It is true that development is increasingly API-based. Microservices are one trend that highlights this. That has implications for Agile processes, which generally presume that development should be feature-oriented, with stories consisting of end-to-end features; but when a server API must be used by multiple clients - some of which have yet to be identified - development needs to be more oriented around those APIs.
You are also right that testing the performance of those APIs is essential. API-level testing often reveals inadequate performance for many popular character-oriented protocols. Increasingly, organizations are going back to binary protocols (e.g., Google's "protocol buffers", which is essentially a return to IIOP/CORBA), because the performance is so vastly superior. Performance testing at the API level will reveal this.
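As a rough illustration of the payload difference, here is a standard-library comparison of a JSON encoding versus a fixed binary layout for the same record (not protocol buffers themselves, and the record is made up):

```python
# payload_size_demo.py -- rough illustration of text vs. binary encoding overhead
# using only the standard library (not protocol buffers themselves).
import json
import struct

record = {"user_id": 123456, "balance_cents": 9999999, "active": True}

# Text protocol: a JSON string encoded as UTF-8
json_bytes = json.dumps(record).encode("utf-8")

# Binary protocol: fixed layout -- unsigned int, unsigned long long, bool
binary_bytes = struct.pack("<IQ?", record["user_id"],
                           record["balance_cents"], record["active"])

print(f"JSON payload:   {len(json_bytes)} bytes")
print(f"Binary payload: {len(binary_bytes)} bytes")
```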
The author mentions "injecting performance monitoring while building functional components" - I think with the intention of measuring performance at the component level. I have not done that, but it might be a good idea. It would mean, however, that one has to have a performance framework in place very early, and developers need to use it the way that they use unit test frameworks. An alternative might be to use a BDD/ATDD approach, whereby feature-level performance is tested. Again, I have not tried that - my experience is in end-to-end performance testing, so I am intrigued by the author's suggestion.
The author mentions baselining performance in staging. Indeed, but I suggest not waiting for staging. If you can, create a separate performance testing environment - perhaps in a cloud - so that you can run end-to-end performance tests nightly at scale, without interfering with the staging environment. Staging will give you the most accurate results, but you will discover most performance issues if you test at scale early, and if you have developer access to the environment (not usually the case with staging), then you can diagnose things more easily.
The author recommends a monitor based approach. I agree with this. Traditional performance testing is done using special tools that generate URL load and measure results by monitoring the protocol traffic or the end-to-end response time. With a monitoring approach, one still needs load generation, but performance is measured using monitoring tools. These tools can be left in place as part of the application. The challenge is that these tools require a good amount of design (to define the monitors), so there is some work that needs to occur early to create that.
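For example, a bare-bones load generator might look like the sketch below, with the actual measurement left to the monitoring tools; the URL, concurrency, and request count are made up:

```python
# simple_load_gen.py -- rough sketch of load generation to drive the system while
# monitoring tools (not this script) do the actual performance measurement.
# The URL, concurrency, and request count are illustrative assumptions.
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "https://staging.example.com/checkout"
CONCURRENCY = 20
REQUESTS = 1000


def hit(url: str) -> int:
    """Issue one request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        resp.read()
        return resp.status


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        statuses = list(pool.map(hit, [URL] * REQUESTS))
    print(f"Completed {len(statuses)} requests, "
          f"{statuses.count(200)} returned HTTP 200")
```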
One issue that will come up in an Agile environment is who is responsible for monitoring the performance. I have found that it is important to have a technical lead who can focus on end-to-end system issues, including performance. It is a lot to expect of a team that everyone should be worrying about every non-functional requirement. Agile promotes the idea that everyone does everything, but in practice I feel that needs to be moderated with common sense. If no one focuses on performance, it becomes a "tragedy of the commons". So have a tech lead. But the author is right that developers should think about performance; the tech lead can lead discussions about performance considerations, and be on point to look over the nightly performance tests with a fine-tooth comb. The tech lead will also want to perform special white-box stress tests to pin down issues and discover the cause of performance symptoms.
Excellent article!
Thanks Clifford! I appreciate the validation