In his CM: the Next Generation series, Joe Farah gives us a glimpse into the issues that CM experts will need to tackle and master, based upon industry trends and future technology challenges.
The technology view is to be expected. Vendors move at different speeds and have differing visions. But on the user side, it's a bit more difficult to understand. One would expect that if you could find tools that would lower your total cost of operation and free up resources for your core business needs, the path would be obvious. However, a significant portion of new CM installations continue to be on the low-end side of things. The key reason, in our opinion: perceived complexity. CM vendors may be responsible for this perception, especially because there are a number of complex tools out there - the total cost of ownership isn't what is advertised, or the initial outlay to reach the advertised targets is far too great and incurs significant risk. That being said, there are also a number of tools out there that reduce complexity significantly, including those which target the ALM and Enterprise CM markets.
I've been heavily involved in the CM industry for over a quarter century. The first generation of CM tools were version control and build support tools, with features such as baseline reproducibility and delta/difference and merge capabilities. That finally gave way to a second generation, starting in the 1990s for the most part. Change Management was recognized, at least by a critical mass, as a necessary component. Distributed development was also recognized. Other major advances included workspace support, scalability, cross-platform support, graphical user interfaces, directory revisioning, branch and label management, the addition of problem/issue tracking, context-based views, and the introduction of process support. However, with this significant second generation expansion came significant overhead.
Supporting scalability, distributed development, label management and process became not just a full-time job, but a task for a CM team to take on. The tools that were supposed to be simplifying development management and improving quality came at quite a cost to the project. Hence today's perception: only some companies can afford to "do it right". And that's where many high end tools stand today. They are second generation tools requiring significant infrastructure to support.
At the same time, there are a number of tools which, though they may not provide all of the capabilities required to count them as full second generation tools, have side-stepped the infrastructure and overhead. They have good technical solutions which are scalable, and perhaps even support some form of distributed development. Typically they fall short on the process and CM maturity side. Maybe they lack directory revisioning or change packaging, or rely on heavy infrastructure add-ons to provide them. Perhaps they focus on the version and build management space but fail to address process. There are some good ideas out there: generic process engines; highly-scalable solutions; small footprint solutions; better branching/stream support. But a new feature or two does not push one solution ahead of the pack, and does not constitute the next generation of CM.
The Third Generation of CM
The next generation of CM tools will move the solution towards the Enterprise CM and ALM markets. They will provide a full range of second generation features, but will also bring a significant improvement in five key areas:
· CM maturity
· Reliability/availability
· Administration and infrastructure
· Process support
· Ease of use
These cover the broad spectrum of CM technology growth. All are important. All can be advanced significantly. In fact, in each area there are tools which provide full next generation capabilities, but the number of tools advanced in all areas rapidly approaches zero. Although just one or two third generation CM tools are available now, we should expect to see an increasing number by the end of the decade.
What does a third generation tool have to do? For starters, it has to reduce the CM administration team significantly. If not, the tool will continue to cater only to the few who can afford it. It has to reduce the risk of introduction. Consider a tool which requires you to invest heavily just to assess the level of return it's going to give you, versus a tool which, for the cost of your investigation, is fully deployed and delivering an even greater return. Which one do you think will win out? Marketing aside, that's easy.
What about CM Maturity? Countless resources are spent evolving, and training users on, branching and labeling strategies, synchronizing distributed development and putting together version description documents or traceability reports. There is a lot of room for improvement there. Reliability and availability not only affect end user productivity, but can make a significant difference in the case of a disaster, be it a disk crash or a terrorist attack. Process evolution will apply not only to the basic CM model, but will expand that model into the areas of requirements management, work breakdown structures, test case management and test run management. Process specification must also be simplified.
With all of these advancements, there has to be a focus on ease of use. Whereas the advances of second generation CM naturally compounded ease of use issues because of the wider scope, third generation CM advances should generally work in the opposite direction. The tendency is to reduce administration and increase CM automation. These goals in themselves will improve ease of use. However, the vastly advanced standards of the overall computer industry have set the bar high for ease of use.
So let’s look at each of these areas in more detail.
CM Maturity
Since branching and labeling were first facilitated, strategies have evolved which sometimes tend to make your CM architecture resemble a plate of spaghetti. Why is this? The key reason is that branching is an overloaded concept. It's not just used for parallel development. It's used to collect files into a change, to collect files into a promotion level, to help identify active work before it's checked in, and to create patch and variant builds. The list goes on. The key point is that the basic real-world objects of CM often lack the underlying data capabilities, so branches, a powerful but overloaded mechanism, are pressed into service instead.
With clearer identification of the most frequent branching patterns, a shift is coming. The "main trunk" model of CM will give way to stream-based models for both support and development streams. While there may be the odd sub-pattern attached to the stream-based model, for the most part, the drive to reduce complexity, and the realization that complexity can be reduced dramatically, will drive the effort to clean up branching.
This will require new first order items. Rather than having a baseline as a labeled branch, a baseline will be a first order item. We see this in more than one tool already. Similarly, build objects will exist which allow you to collect changes that are to be applied to a baseline to create a patch or variant load - perhaps a customization for a particular customer.
Rather than branching and labeling to collect the files of a change, you'll see more and more tools introducing change packages. Many will get it wrong, and some already have - using the task or problem/issue as the "change" object. This won't work, primarily because a task or problem often requires several changes over time to implement. And it's often desirable to check in the stability-affecting portions of a new feature well in advance of the rest of the functionality, so that re-stabilization may occur early in the cycle. As well, a problem may apply to multiple streams, whereas a change may have to be implemented separately for each stream. A change is different from a task or an issue/problem.
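To make the distinction concrete, here is a minimal sketch in Python of changes, tasks and problems as separate first order objects; the class and field names are purely illustrative and not drawn from any particular tool. One problem can be traced to a separate change in each affected stream, and one task can be split across an early stability-affecting change and a later one.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Problem:                     # an issue/problem report (first order object)
    id: str
    title: str

@dataclass
class Task:                        # a unit of planned work (first order object)
    id: str
    title: str

@dataclass
class Change:                      # a change package: the actual unit of check-in
    id: str
    stream: str                    # the stream this change applies to
    file_revisions: List[str] = field(default_factory=list)
    traces_to: List[str] = field(default_factory=list)   # task/problem ids

# One problem, fixed separately in two streams, each fix as its own change:
p1 = Problem("P-101", "Crash on startup")
fix_dev  = Change("C-501", stream="dev",  file_revisions=["main.c@12"],  traces_to=[p1.id])
fix_rel2 = Change("C-502", stream="rel2", file_revisions=["main.c@8.1"], traces_to=[p1.id])

# A task whose stability-affecting portion is checked in early as its own change:
t1 = Task("T-202", "New report generator")
early = Change("C-503", stream="dev", file_revisions=["schema.sql@4"], traces_to=[t1.id])
rest  = Change("C-504", stream="dev", file_revisions=["report.c@1", "report.h@1"], traces_to=[t1.id])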
One of the key aspects of a change object that will become more and more visible is its status, or promotion level. In a third generation system, you don't promote file revisions at all; the change status is the promotion level. There's really no need to talk about promoting file revisions and no need to label them. They're creations of the change. A third generation system will look at the changes that have been promoted and tell you what your new baseline is, or allow you to look at several baselines simultaneously in a given stream, each for a different promotion level. You won't have to create parallel branches as you might today because the tool lacks the ability to give you a change-based view of the system. Unfortunately, this will be a much more difficult task for most tools, which grew up on file-revision-based views instead of change-based views.
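A rough sketch of what change-based promotion might look like, with illustrative promotion levels and function names rather than any specific tool's API: the promotion level lives on the change, and a baseline view at a given level is simply derived from the changes whose status has reached that level.

# Promotion levels in ascending order (illustrative).
LEVELS = ["open", "checked_in", "integration", "verified", "production"]

def rank(level):
    return LEVELS.index(level)

def derive_view(baseline, changes, stream, level):
    """Return {file: revision} for a stream at a promotion level.

    Start from the previous baseline and apply every change in the stream
    whose status has reached the requested level. No labels, no promotion
    branches: promoting the change is the promotion.
    """
    view = dict(baseline)                      # {file: revision}
    for chg in changes:
        if chg["stream"] == stream and rank(chg["status"]) >= rank(level):
            view.update(chg["revisions"])      # the change's file revisions win
    return view

baseline = {"main.c": "11", "report.c": "3"}
changes = [
    {"id": "C-501", "stream": "dev", "status": "verified",
     "revisions": {"main.c": "12"}},
    {"id": "C-504", "stream": "dev", "status": "checked_in",
     "revisions": {"report.c": "4"}},
]

print(derive_view(baseline, changes, "dev", "verified"))     # {'main.c': '12', 'report.c': '3'}
print(derive_view(baseline, changes, "dev", "checked_in"))   # both changes applied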
All in all, the introduction of more first order objects and the elimination of branch-based promotion will significantly reduce the need to branch, label and merge. Change-based promotion will also make it easier to pull changes out of a build, simply by demoting the change status. Configuration Management will tend to be the task the tool does. Humans will focus on Change Management and the tool will use that input to automate the CM function almost entirely.
The integration of related management capabilities (requirements, test cases/runs, activities/tasks/features, etc.) into the CM environment will allow traceability and reporting to advance significantly. Some second generation tools have a number of integration capabilities bolted on. This will not be sufficient from a third generation perspective. You must be able to navigate your traceability links quickly, using a point-and-click capability. Reports and views will be integrated across applications. When you compare one deliverable to another, the tool won't just give you the source code/file differences. It will tell you the set of changes applied, problems/issues resolved, features implemented, requirements addressed, test runs performed, and so on.
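As an illustration of that kind of comparison, the following sketch (with a hypothetical traceability data model) diffs the change sets applied to two builds and then follows the traceability links outward to problems, features, requirements and test runs.

def compare_deliverables(build_a, build_b, traceability):
    """Report what separates two builds in change-based terms.

    build_a, build_b: sets of change ids applied to each build.
    traceability: {change_id: {"problems": [...], "features": [...],
                               "requirements": [...], "test_runs": [...]}}
    """
    new_changes = build_b - build_a
    report = {"changes": sorted(new_changes), "problems": set(),
              "features": set(), "requirements": set(), "test_runs": set()}
    for chg in new_changes:
        links = traceability.get(chg, {})
        for key in ("problems", "features", "requirements", "test_runs"):
            report[key].update(links.get(key, []))
    return report

trace = {
    "C-501": {"problems": ["P-101"], "test_runs": ["TR-9"]},
    "C-504": {"features": ["F-7"], "requirements": ["R-33"]},
}
print(compare_deliverables({"C-400"}, {"C-400", "C-501", "C-504"}, trace))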
As IDE standards come into play more, there will be a greater level of integration. File-based integration schemes will give way to change-based schemes, with to-do lists eventually showing assigned tasks and problems/issues. The CM tool will begin to look more like a service serving the IDEs in some shops. A combination of IDE integrations and CM tool advances will result in better active workspace feedback. You won't need to look at your workspace to see its state - icons, colors and other annotations will allow you to see at a glance whether your workspace differs from your context view, whether files are missing, and other useful information, including the normal checked-out indicators.
Beyond the 3rd generation, don't be surprised to see the CM tool evolving into the IDE. When that happens, the CM tool will also start to be used to analyze dependencies and to help layer the system, resulting in the production of more re-usable APIs. Also, we'll finally start to see CM tools that can easily integrate with non-file-based objects that need revision control. Typically a plug-in will be required to interface with a particular proprietary-format object. But in the more advanced solutions, the same plug-in will make the proprietary objects visible to the host operating system as files.
Other fourth generation advances will include:
· Promotable directory structure changes, where the structural changes are easily queried and applied to your view
· Dynamic variant capabilities, where repeatable variants may be applied across multiple streams and/or builds
· Advanced product/sub-product management, so that a maze of shared sub-products may be readily managed across multiple products
· Automatic updates, so that the CM tool can inspect the workspace and create (and optionally check in) a change package complete with structural and code changes (a sketch of this idea follows the list)
· Data revisioning, so that data other than files may be revision controlled in an efficient manner
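As a rough sketch of the automatic-update idea above: the directory walk, the artifact filter and the change-package fields below are assumptions, but the essence is diffing the workspace against the user's context view and proposing a change package that covers both code and structural differences.

import hashlib, os

def file_digest(path):
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read()).hexdigest()

def infer_change_package(workspace_root, repo_view, ignore=(".o", ".tmp")):
    """Diff a workspace against the user's context view and propose a change package.

    repo_view: {relative_path: sha1_digest} as recorded in the repository.
    Returns added/modified/removed file lists; a real tool would also record
    directory (structural) changes and optionally check the package in.
    """
    seen = set()
    added, modified = [], []
    for dirpath, _, files in os.walk(workspace_root):
        for name in files:
            if name.endswith(ignore):
                continue                                   # skip build artifacts
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, workspace_root)
            seen.add(rel)
            if rel not in repo_view:
                added.append(rel)
            elif file_digest(full) != repo_view[rel]:
                modified.append(rel)
    removed = [p for p in repo_view if p not in seen]
    return {"added": added, "modified": modified, "removed": removed}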
Reliability and Availability
Having a central CM repository off line becomes pretty costly when dozens or hundreds of users are sitting around idle. Unlike first generation CM systems, second generation systems tend to have some form of centralized repository. This holds key meta-data for the CM function, along with the source code. Take your central repository out of action, or let it deliver poor performance, and the feedback, and the cost, will be plentiful.
Reliability and availability of the CM tool and repository is of critical importance. Even having to restore from disk backup can be costly. So in third generation systems you'll see a lot more redundancy. Repository or disk mirroring becomes a more important part of your strategy.
As well, live backups (i.e., no down time) are necessary. For larger shops, it's necessary to improve backup capabilities. Neuma's CM+ has an interesting feature that allows you to migrate most of the repository to a logically read-only medium which only ever has to be backed up once. This reduces the size of nightly backups while new data continues to be written to the normal read/write repository store.
Initial attempts at disaster recovery will make use of replication strategies, especially where this doubles as the strategy for handling development at multiple sites. This is OK, as long as all of the ALM data is being recovered.
Third generation administration will also aid in the traceability task, from a transactional perspective. This will help to satisfy SOX requirements, addressing the traceability from a repository perspective. Development traceability will remain a CM function.
What lies beyond the 3rd generation? Well, we'll start to see hot-standby disaster recovery capabilities, so that clients can switch from one central repository to another without missing a beat. We'll also see CM repository checkpointing and recovery capabilities without loss of information. Even in a multiply replicated environment this will add value, as it will permit the repository to be rolled back to a given point, offending transactions edited, and then rolled forward automatically, with the effect that offending actions, whether accidental or intentional, will no longer pose a significant threat.
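A simplified sketch of the roll-back/edit/roll-forward idea; the transaction format here is hypothetical. The repository state is reconstructed from a checkpoint by replaying the transaction log with the offending transactions dropped or amended.

import copy

def apply_txn(state, txn):
    """Apply one logged transaction to the repository state (toy model)."""
    if txn["op"] == "set":
        state[txn["key"]] = txn["value"]
    elif txn["op"] == "delete":
        state.pop(txn["key"], None)
    return state

def roll_forward(checkpoint, log, skip_ids=frozenset(), amend=None):
    """Rebuild state from a checkpoint, omitting or amending offending transactions."""
    state = copy.deepcopy(checkpoint)
    for txn in log:
        if txn["id"] in skip_ids:
            continue                         # offending transaction dropped
        txn = (amend or (lambda t: t))(txn)  # or edited before replay
        apply_txn(state, txn)
    return state

checkpoint = {"a.c": "rev3"}
log = [
    {"id": 1, "op": "set", "key": "a.c", "value": "rev4"},
    {"id": 2, "op": "delete", "key": "a.c"},          # accidental or malicious delete
    {"id": 3, "op": "set", "key": "b.c", "value": "rev1"},
]
print(roll_forward(checkpoint, log, skip_ids={2}))    # {'a.c': 'rev4', 'b.c': 'rev1'}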
Security levels will be improved. It will be easier to specify roles, not only by user, but also by associating teams with specific products to which they have access. This will simplify the complex world of permissions and will enable ITAR segregation requirements, such as those in the Joint Strike Fighter project, to be more easily met. Encryption will be built into CM tools to more easily protect data without the complexities inherent when protected data must be moved off to a separate repository.
Long projects, such as the JSF, will force CM vendors to clearly demonstrate the longevity of their solutions across a background of changing operating systems, hardware platforms and data architectures. A major project will want to know that the tool will be able to support it for 20 to 30 years. This will give a boost to many open source solutions. However, because of the complexities of CM, vendor solutions will continue to hold a significant edge if they can demonstrate longevity.
Administration
CM tools have a bad record when it comes to administration. Not all tools, but the industry as a whole. Even though the CM function is critical, it must stop consuming more and more resources as projects scale up in size and more capabilities are brought on-line. Take, for example, CM across multiple sites. Some solutions allow this by restricting functionality and ramping up an administrative multiple-site team. While some projects can afford this (most can't), it is a sign that there will be human error and lower reliability.
A third generation CM tool will reduce administration so that installation, server management, backups, upgrades, multiple site management and general availability will be a part time task for a single person, rather than a full time job for a team. This is when a tool will be given a near-zero administration rating.
More significantly, the third generation tool will be rolled out rapidly, easily and with little or no risk. This will include data conversion. Tools that don't meet this mark will rapidly fall off the evaluation short-list. The industry still accepts significant effort and roll-out times for backbone CM tools. This will not be the case much longer.
Less significant, but related, are the infrastructure costs. If heavy-duty hardware and databases continue to make a solution a "big-IT" solution, then that solution will be shunned more and more. Customers will start demanding that scalability be shown or proven up front. Bigger, faster computers at lower costs will help, but ultimately will give the small footprint solutions that much more of an advantage. If a solution has a large footprint, there are issues related to backups, re-targeting of platforms, and upgrade effort that will hang a millstone around the solution's neck. There will be a number of small footprint solutions that do a better job at a smaller cost. "Big-IT" doesn't just cost in hardware, software and administrative resources. It typically requires significantly more training and expensive consulting, while slowing down roll-out. Some vendors may be able to survive with big-IT solutions, but their survival will likely be more akin to that of the mainframe industry than the wireless industry.
Small footprint solutions will be important. Small footprint does not mean less functionality or more administration. In fact, some of the most promising products out there are small footprint solutions. They will often be very scalable and portable. This is important. Architectures are changing. Five years ago, Linux had an outside chance of succeeding in some business circles. Today, the Mac architecture is moving to Intel. Tomorrow - who knows? That's where small footprint solutions have an important advantage. Small footprint is not quite a third generation requirement, but likely will become one as the benefits in realizing a third generation capability become apparent.
Portable scripting languages are important. If your CM tool requires VB scripting, you're taking a gamble. If you have to script differently for each architecture, you no longer have portability between architectures. And speaking of portability, a third generation solution will port between big- and little-endian platforms, between 32- and 64-bit platforms, and between Unix, Linux and Windows architectures. An even better solution will allow complete interoperability among all such platforms, with no conversion needed to rehost a database, and with full support for simultaneous clients and servers across architectures.
Scalability issues will be addressed sufficiently so that issues of partitioning data for performance reasons will disappear. Data may still be partitioned because of security concerns, or along product boundaries, etc. The net result will be a lower administrative overhead.
Administration of multiple sites will be much simpler as the growing network bandwidth will eliminate the need to physically partition data for the sake of distributed development. Instead, either full or partial replication of data at one or more sites will support multiple site development. Synchronization will be done in real time, rather than occasionally, so that each site independently sees all the data it needs to see, up to date. Network outages, though much less frequent than over the past two decades, will be dealt with automatically by the tool, with restrictions as necessary for prolonged outages.
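A minimal sketch, under assumed names, of what real-time replication with automatic outage handling might look like: transactions are forwarded to a peer site as they are committed, queued while the peer is unreachable, and flushed when the link comes back.

from collections import deque

class SiteReplicator:
    """Toy model of real-time replication to one peer site."""

    def __init__(self, send):
        self.send = send            # callable that delivers a txn to the peer
        self.backlog = deque()      # transactions queued while the peer is unreachable

    def commit(self, txn):
        # The local commit would happen here; then replicate immediately.
        self.backlog.append(txn)
        self.flush()

    def flush(self):
        while self.backlog:
            txn = self.backlog[0]
            try:
                self.send(txn)      # real time while the network is up
            except ConnectionError:
                return              # outage: keep queueing, retry later
            self.backlog.popleft()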
Finally, third generation CM tools that offer multiple site development will not force a whole new process on their users. Users will be able to go from one site to another and continue working without having to go through unnatural synchronization processes.
Beyond the third generation, besides a small footprint, we'll see administration beginning to drop from near-zero admin levels to exception handling only - and due to the low frequency of exceptions, the expertise will either be encapsulated within the tool or available over the network directly from the vendor. No more multi-week administration training. Scalability will grow so that virtually any site can be hosted on a single server, relieving the advanced planning function considerably.
Process
Third generation CM/ALM will make great strides on the process side. Integration of related applications will result in an end-to-end management capability with the introduction of preliminary dashboards (or equivalent), easy traceability, and rapid response. The piecework integration of second generation systems will give way to the seamless integration mentality. Single user interface, role-based process, less training, common repository - these will all be part and parcel of the seamless integration thrust. The visible results will be easy data navigation, from requirements through to test results; from line of code to problem reports or requirements; from features to the change packages that implemented them and the builds in which they first appeared.
Process specification will also have significant advances. We'll see workflow specified by modifying live workflow diagrams. Workflow diagrams will move from one workflow per object class, to multiple partially shared workflows, reflecting the processes for specific types of objects. For example, a documentation activity will follow a different workflow than a test run activity. Data schema changes will not only be done interactively, but without any server or client down time. Configuration of the related forms, displays and reports will be straightforward.
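One way to picture "multiple partially shared workflows" is as data: a common core shared across object classes, with type-specific states and transitions layered on top. The workflow and state names below are purely illustrative.

# A workflow as data: {state: [allowed next states]}.
CORE_WORKFLOW = {
    "defined":     ["assigned"],
    "assigned":    ["in_progress"],
    "in_progress": ["completed"],
    "completed":   [],
}

# Type-specific workflows share the core but add their own states/transitions.
WORKFLOWS = {
    "documentation_activity": {**CORE_WORKFLOW,
                               "completed": ["reviewed"], "reviewed": []},
    "test_run_activity":      {**CORE_WORKFLOW,
                               "completed": ["passed", "failed"],
                               "passed": [], "failed": ["assigned"]},
}

def can_transition(obj_type, current, target):
    return target in WORKFLOWS[obj_type].get(current, [])

print(can_transition("test_run_activity", "completed", "failed"))        # True
print(can_transition("documentation_activity", "completed", "failed"))   # False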
Third generation process will see a more role-based approach, with the project team organization chart becoming part of the project data managed by the CM tool. This will permit reporting and metrics to be done on a departmental basis. It will also identify email addresses that the process can use when it's time to send out messages. Users will be assigned a set of roles which define their access rights to the repository data, the process applicable to their roles, and the user interface that they see.
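A sketch of how roles might tie together data access, process and user interface; the role names and fields are assumptions, not any particular tool's schema.

ROLES = {
    "developer": {
        "products":  ["widget"],                 # repository data the role may access
        "processes": ["implement_change", "fix_problem"],
        "ui":        "developer_dashboard",
    },
    "product_manager": {
        "products":  ["widget", "gadget"],
        "processes": ["approve_feature", "schedule_release"],
        "ui":        "management_dashboard",
    },
}

USERS = {"alice": ["developer"], "bob": ["developer", "product_manager"]}

def permissions(user):
    """Union of the products, processes and UIs granted by a user's roles."""
    perms = {"products": set(), "processes": set(), "uis": set()}
    for role in USERS.get(user, []):
        spec = ROLES[role]
        perms["products"].update(spec["products"])
        perms["processes"].update(spec["processes"])
        perms["uis"].add(spec["ui"])
    return perms

print(permissions("bob"))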
Beyond the third generation, all process and data schema changes will be part of the repository data, tracked like any other management data. We'll even see multiple revisions of process workflow, simultaneously active to reflect the various processes in effect for different development streams and/or for different products. Data schema changes will automatically be reflected on forms, displays, reports, etc. and new traceability links will be automatically available for navigating the process data network.
The ALM solution will expand beyond the third generation to handle request tracking and customer relationship management so that the product manager has full traceability to all input to product decisions. Project management will be much more tightly integrated with the ALM solution, including time sheet tracking, so that actual project metrics can be reflected directly from the time sheets, as well as from process steps such as checking in code and entering test results. For some projects, there will also be expansion into the ITIL side of things.
Ease of Use
Ease of use issues in third generation CM tools focus on three primary roles: the end user (e.g., the developer), the configuration manager, and the tool administrator. Whereas in the past the CM manager and administrator roles were not distinct, the evolution of the functionality will help to clearly separate these roles, and then to simplify them.
The biggest push forward will be moving the end-user to prefer the change-based model over the file-based model. The ability to reduce keystrokes by using a change-based model will be the key impetus for this. As well, the capability of CM tools to infer a change package definition (apart from traceability data) from changes that have been made to the workspace will cater to even the most resistant of users. Adding a new directory of files to the CM tool will be a simple matter, without the usual steps of checking in all of the new files. Instead, the directory root will be used, with a filter, to automatically attach the new files at the appropriate place in the design hierarchy.
Still, because the move is from a very simple file-based check-in/check-out model with a pile of rules and exceptions to a slightly more complex change-based model with few rules and exceptions, there will be resistance. The CM tools that can demonstrate a clear return to the developer, and that can ease the transition to change-based operation, will be the winners. Unfortunately, many of the add-on change package solutions have given other solutions a bad name. They instantly conjure up images of multiple tools and databases, restrictions and complexity. The third generation CM tool will have change packaging as a natural, core component, not as an optional add-on.
Workspace synchronization will be more easily automated, and workspaces will be given higher visibility. Active workspace feedback will allow the user to compare any CM repository view with a workspace visually, identifying missing files and files that differ. Workspace-wide comparison and merge operations will support detailed query and synchronization processes.
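At its core, active workspace feedback reduces to a comparison like the following sketch, where the per-file status values would drive the icons and colors mentioned earlier; the hashing scheme and status names are illustrative.

import hashlib, os

def status_of(path, repo_digest):
    if not os.path.exists(path):
        return "missing"
    with open(path, "rb") as fh:
        local = hashlib.sha1(fh.read()).hexdigest()
    return "differs" if local != repo_digest else "up-to-date"

def workspace_status(workspace_root, repo_view, checked_out=frozenset()):
    """Per-file status suitable for icon/color annotation in a tree view.

    repo_view: {relative_path: sha1_digest} for the user's context view.
    """
    report = {}
    for rel, digest in repo_view.items():
        state = status_of(os.path.join(workspace_root, rel), digest)
        if rel in checked_out:
            state += " (checked out)"
        report[rel] = state
    return report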
The process will be supported at the user interface with the introduction of in-boxes/to-do-lists. Traceability will be established as the contents of these to-do lists are used to initiate events, such as fixing a problem or implementing an activity. The set of to-do lists will be established based on a user's role and visual indications will identify those lists in which work is queued.
Stream-based development models will become more prevalent, not because of the branching simplifications, but because they will allow users to easily filter the data they wish to see. For example, one would switch views from one stream to another to look at pending work. The age of configuration view specifications will disappear as the CM tool will use the product, stream and possibly promotion level to establish the views, not only of source code, but of all pertinent data for the user. Especially at the management level, where it is often difficult to use the CM tool to garner required data, the automatic view specifications will permit simpler data filtering and the ability to drill down from metrics to specifics for a particular stream.
Other views will be supported by specifying a specific build identifier, a baseline identifier, or perhaps even a change package, where the view would revert to the context used initially to establish the change package, whether or not it is still in progress.
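The view context described here could be thought of as a small record that the tool resolves automatically, rather than a hand-written configuration specification; the field names and the precedence shown are assumptions.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ViewContext:
    product: str
    stream: Optional[str] = None           # e.g. "dev" or "rel2"
    promotion_level: Optional[str] = None  # e.g. "verified"
    build_id: Optional[str] = None         # pin the view to a specific build
    baseline_id: Optional[str] = None      # ... or to a baseline
    change_id: Optional[str] = None        # ... or to a change package's original context

def describe(ctx: ViewContext) -> str:
    """Summarize which anchor wins when resolving the view (toy precedence)."""
    for label, value in (("change", ctx.change_id), ("build", ctx.build_id),
                         ("baseline", ctx.baseline_id)):
        if value:
            return f"{ctx.product}: view pinned to {label} {value}"
    return f"{ctx.product}: live view of {ctx.stream} at {ctx.promotion_level or 'latest'}"

print(describe(ViewContext("widget", stream="dev", promotion_level="verified")))
print(describe(ViewContext("widget", change_id="C-503")))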
Interactive graphs, tables, and diagrams will be used to provide more natural ways to query data and to configure processes. Drill-down operations and data hyperlink traversal will be a rapid and natural means of resolving queries, especially at product management and CCB meetings.
Usability will be governed overall by the ability of the customer to configure the solution to specific needs. The more extensive and easier this capability is, the more likely that usability will become less and less of an issue. If this can be done iteratively and without down time, all the better.
Beyond the third generation we'll see the virtual file system model (currently used only by CC?) finely tuned so that the CM services can be applied across a wider spectrum of users, including legal, accounting and business development personnel. This will be accompanied by special-purpose interfaces which are integrated into the virtual file system, so that users can do most of their operations directly from their familiar desktops.
The object-oriented approach to user interfaces will continue to expand, but in an ALM environment, there will be a more pronounced focus on cross-object operations. For example, dragging a problem report onto a change package will add the problem as a traceability link. Dragging a file-system folder onto a directory in the CM source tree might trigger a bulk load operation for that folder. Such cross-object type operations will have to be easily customizable.
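Customizable cross-object operations might boil down to a dispatch table keyed on the (dragged type, target type) pair, as in this sketch with illustrative handler names; a site could register its own handlers to customize the behavior.

def link_problem_to_change(problem, change):
    change.setdefault("traces_to", []).append(problem["id"])
    return f"linked {problem['id']} to {change['id']}"

def bulk_load_folder(folder, directory):
    return f"bulk load of {folder} into {directory['path']} queued"

# (dragged object type, drop target type) -> action; sites can add their own.
DROP_HANDLERS = {
    ("problem", "change"):    link_problem_to_change,
    ("folder",  "directory"): bulk_load_folder,
}

def on_drop(dragged_type, dragged, target_type, target):
    handler = DROP_HANDLERS.get((dragged_type, target_type))
    return handler(dragged, target) if handler else "no action defined"

chg = {"id": "C-501"}
print(on_drop("problem", {"id": "P-101"}, "change", chg))   # linked P-101 to C-501
print(chg["traces_to"])                                     # ['P-101']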
Dashboards will grow in complexity, from their infancy in third generation systems, to role-based control centers, whether for a developer, showing all current changes, to-do lists, and past history, or for a product manager trying to assess the state of the product in each of multiple streams. Source tree hierarchies will be complemented with other hierarchical data, and non-hierarchical data relationships so that traceability can be seen simply by expanding the data tree.
Quo Vadis?
So where are we headed? Is this an accurate picture I've painted? To be sure, it is incomplete. Some may say that these targets for a third generation system are too advanced. It is true that many of today's tools will never make it to a third generation architecture, but other tools are well on their way. Neuma, for example, is already focused on delivering a fourth generation product.
Many will say that ALM is too wide a scope for smaller companies. However, the focus in defining third and fourth generation systems is precisely to make these systems applicable to smaller and larger shops. Less administration, smaller footprint, ease-of-use, lower risk, CM automation. The next question has to be whether or not the prices will also make these systems affordable.
How soon will we get there? Not many third generation systems are currently available. When's the best time to move forward? Should we wait until there is a wider selection? This is no different than any other product decision (e.g., OS, telecom, etc.), except that CM and ALM are backbone applications. Look at the vendor's upgrade policy. Sometimes upgrades are included as part of the annual maintenance. But also look at their track record in moving the product forward. Big IT solutions typically move forward more slowly, though this is not always the case. As well, smaller firms often have the ability to move the product ahead quickly (look at Accurev, for example), though quality has to be explored carefully - so use references.
Third generation systems will likely include significant architectural capabilities not present in second generation systems. This will allow them to move forward more quickly. But more than that, architecture will help determine whether or not the solution will be supported in 20 years' time. I don't hold out much hope for some of today's market leaders, though some of these solutions may, through acquisitions, be morphed into newer architectures.
Cost
Cost is a key concern. From my perspective, low administration, high reliability and increased automation all reduce support requirements. If a CM vendor is a product-based company, this is good. Prices will come down because the existing vendor support structure will be able to support more and more customers. If a CM vendor is a services-based company, this can be bad. Services will be required less frequently. However, if the vendor is smart, this can be turned around into: "we can do a lot more for you with your existing services budget".
If ease-of-use has been properly addressed, training requirements should begin to disappear for end-users, or at least be replaced by a few hours of interactive computer-based training. That will improve productivity for anyone introducing a third generation solution. Administrative training should be far less in a near-zero admin tool, and possibly will be outsourced to the vendor, administering from a remote location. CM automation will ease the burden on CM managers and make decision making easier and clearer for all.
So what should we expect to see from a cost perspective for a third generation tool?
· Low risk deployment in one day to one week, including loading of pilot project data
· Cost per user for a full ALM solution ranging from $1000 to $2500
· Negligible additional infrastructure costs
· Training for users in the order of 1/2 day to 1 day
· Training for administrators in the order of 1 day
· Training for configuration managers in the order of 2 to 4 days
· Training for customization and process tools in the order of 2 to 5 days
· Consulting and customization service requirements in the order of 2 to 10 days
This compares very favorably with second generation tools which cover a smaller portion of the ALM solution. The long roll-out path is replaced with a rapid deployment, along with iterative improvement, using the solution to help with the process definition and customization.
One of the biggest problems with today's second generation solutions is the risk. You pay out your money, roll out the solution and see if it works. Even if it doesn't, you pay consulting fees to correct it. You may come to a point where you think you're getting less than you expected, but your budget is eaten away and your credibility is on the line, so there's no going back. Rapid roll-out eliminates such a risk. Because of the rapid process, the vendor is more likely to allow you to complete your evaluation prior to full payment. You're likely to consume fewer resources and, if necessary, there's still time to switch to an alternate solution.
Of course training is one of the big expenses, especially when lost productivity during the training is factored in. So reduced training times are important. And a reduction in the need for customization services will not only be a cost saver, but a confidence builder. That's why it is important to check references with project architectures similar to your own.
That's my perspective on things. I'd like to hear from you: what do you agree with and what do you disagree with? In fact, your voice is important to the whole community, so consider posting your views in the general discussion forum.