How do you build? How do you select the source code files to include in a build? How do
you identify the revisions, or versions, of the files to include in a build? Do you build from the tip revisions in your version control system, or do you build by selecting a specific revision of every source code file? Do members of the development team specify the changes to include in each build, or do you sweep in all changes implemented at the time of the build? When do you build from revisions in the mainline of the version control system? When do you build from revisions in a branch?
Deciding what revisions to include in a build is one of the most important aspects of
building a software system. Equally important is deciding when to build from the mainline and when to build from a branch. The outcome of the build, and its usefulness, depends on the revisions of source code files included in the build. Building from the tip revisions in the version control system may be a common practice, but it is not the only way, or even the most effective way, to build. You need to examine how you introduce changes to your code base to decide if, or when, building from the tip revisions makes sense for your project.
What Does It Mean to Place New or Modified Files Under Version Control?
At one time, version control systems were used exclusively as electronic libraries to
store source code files that were complete, tested, and fully working. Adding new
or modified files to the version control system implied that a coarse-grained development
task from the project schedule was complete. The source code files that implemented that task were validated and approved for inclusion in the code base. Work in progress was not placed under version control because the act of versioning files meant that the work was complete.
Today, project teams that practice an agile, or even an iterative, style of software
development are likely to use a software configuration management (SCM) tool to
store both completed work and work in progress. Some SCM tools even distinguish
between versioning changes in a developer's workspace and releasing (or promoting)
those versioned changes so that they can be used by others. Even if an SCM tool
does not offer a built-in capability that separates versioning a file from releasing that file, it is still possible to use the tool effectively to version work in progress.
Why Does It Make Sense to Version Work in Progress?
Clearly, an SCM tool needs to be used to store new and modified source code files that
implement a new feature or fix a defect. Once these changes are saved in the SCM tool, they can be retrieved for inclusion in a build. However, it is equally important to use an SCM tool to save work in progress towards implementing a new feature or fixing a defect.
While these changes may not always be ready to build, they are ready to save and version because they represent incremental progress towards completing a development task.
There are good reasons for using an SCM tool to version work in progress. One important reason is to allow each developer to roll back the contents of the developer's workspace to a known stable state. If a developer versions changes frequently, such as after making and testing each fine-grained change, rolling back the workspace to a known stable state entails refreshing the workspace with the tip revision of each modified file. Fortunately, the tip revisions probably contain work completed within the last few hours. But if changes from a developer's workspace are versioned infrequently, such as only after a coarse-grained task from the project plan is complete, restoring the workspace to a stable state may require significant rework.
Imagine working for a few days on a complex development task without versioning any changes. Now imagine that a sequence of changes made in the developer's workspace causes the code to break. Attempts to fix the code make it more unstable. The most reasonable way to recover is to refresh the workspace with known stable code. But the known stable code does not include any changes made in the developer's workspace since the start of the development task. After refreshing the workspace with the same revisions used at the start of the task, the developer must begin the coding task again.
An equally important reason for versioning work in progress is to recover from a disk
failure without losing much work. As in the scenario described above, frequent versioning of work in progress means a developer never risks losing more than a few hours of work in the event of a failure that wipes out the current contents of the local file system. Of course, no version history of work in progress means redoing all the lost work.
Aside from recovering from a coding mishap or a hardware failure, frequent versioning of
work in progress provides configuration management benefits. Versioning work in
progress, i.e., versioning new and modified files after completing each fine-grained change, makes it possible to associate each revision of a file with one and only one change. If that change ever needs to be examined further, less research is needed to identify which lines of code were added or modified to implement the change. The differences between two revisions of each file are the lines of code that implement just that fine-grained change. If the change needs to be removed, there is no need to study each line in the files to identify just the lines that implement the change.
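The point about differences between two revisions can be illustrated with a minimal sketch. It assumes two consecutive revisions of one file are available as lists of lines; the revision contents and the helper name `changed_lines` are hypothetical and are not taken from any particular SCM tool. The sketch uses Python's standard `difflib` module.

```python
import difflib

# Hypothetical contents of two consecutive revisions of the same file.
# Only one fine-grained change separates them.
previous_revision = [
    "def total(prices):",
    "    subtotal = sum(prices)",
    "    return subtotal",
]
tip_revision = [
    "def total(prices):",
    "    subtotal = sum(prices)",
    "    return round(subtotal, 2)",
]


def changed_lines(old, new):
    """Return only the removed ('- ') and added ('+ ') lines between two revisions."""
    return [
        line
        for line in difflib.ndiff(old, new)
        if line.startswith(("- ", "+ "))
    ]


if __name__ == "__main__":
    for line in changed_lines(previous_revision, tip_revision):
        print(line)
```

When each revision carries exactly one change, the output lists just the lines that implement that change, which is what makes the change easy to examine or remove later.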
Clearly, versioning work in progress can be beneficial to both individual developers and
to the entire development team. However, versioning work in progress should never be used as an excuse to delay the integration of code. A long period of work in progress between the start and end of a development task may mean the task is not properly sized. Ideally, work in progress should be measured in hours, not in days or weeks. For example, versioning work in progress every one to two hours, then releasing completed work once a day, achieves a good balance between versioning frequently and releasing only stable code. If work in progress accumulates for more than a day or two, it may be difficult to integrate that work when it is complete.
How Can Work in Progress Be Distinguished from Code That Is Ready to Build?
There is nothing wrong with using an SCM tool to store work in progress that is not ready to build, provided there is a way to prevent that code from being swept into a build. While the version control capabilities of an SCM tool are ideally suited for versioning work in progress, development team members must be able to use the SCM tool to distinguish work in progress from code that completes a development task. Code that completes a development task is:
- new and/or modified code that implements all or part of a new feature,
- new and/or modified code that fixes a defect, or
- refactored code that improves the implementation of an existing feature.
Code that completes a development task is code that is ready to share with other
developers. Code that completes a development task is also ready to build. Ready to build means the file revisions that satisfy a development task can be included in an integration build. An integration build pulls together all completed development tasks at the time of the build. The integration build validates that separate development tasks, completed in the workspaces of different developers, will compile and pass regression tests.
Work in progress that is accidentally included in an integration build may destabilize
the build and render it useless. For this reason, a mechanism for identification and a technique for isolation are needed to distinguish work in progress from code that is ready to build. Change packages are the ideal mechanism to identify new and modified source code files that are ready to build. Branching in a traditional SCM tool is an effective technique to isolate work in progress from code that is ready to build.
Modern SCM tools implement change packages so that each change can be abstracted from the file revisions that implement the change. While developers make changes by
modifying files, they commit these changes to the SCM tool using change packages. Depending on how an SCM tool implements change packages, each change package either contains a status attribute or is associated with an issue in an issue tracking system that contains a status attribute. The status of the change package, or associated issue, can be set to indicate that the change is in progress or complete. As long as the status is set to in progress, the developer can use the same change package repeatedly to commit work in progress towards the change. When the change is complete, the developer changes the status of the change package, or associated issue, to complete. This signals that the file revisions in the change package can be used by others and included in the next integration build.
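The status-driven life cycle described above can be sketched as a small data structure. This is an illustration only, not the data model of any specific SCM tool: the class `ChangePackage`, the `Status` values, and the function `ready_to_build` are hypothetical names, and the sketch assumes the status lives on the change package itself rather than on an associated issue.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"


@dataclass
class ChangePackage:
    """Hypothetical change package: one change, many file revisions."""
    package_id: str
    summary: str
    status: Status = Status.IN_PROGRESS
    # Maps a file path to the revisions committed under this package.
    revisions: dict = field(default_factory=dict)

    def commit(self, path, revision):
        """Record work in progress; allowed only while the change is open."""
        if self.status is Status.COMPLETE:
            raise ValueError(f"{self.package_id} is already complete")
        self.revisions.setdefault(path, []).append(revision)

    def complete(self):
        """Signal that the revisions may be used by others and built."""
        self.status = Status.COMPLETE


def ready_to_build(packages):
    """Select only the change packages whose status is complete."""
    return [p for p in packages if p.status is Status.COMPLETE]


if __name__ == "__main__":
    fix = ChangePackage("CP-101", "Fix rounding defect")
    fix.commit("src/total.py", "1.5")   # work in progress
    fix.commit("src/total.py", "1.6")   # more work in progress
    print(ready_to_build([fix]))        # nothing is ready yet
    fix.complete()
    print([p.package_id for p in ready_to_build([fix])])
```

The same package accepts repeated commits while it is in progress, and the integration build only ever sees packages that have been marked complete.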
Even though change packages identify when a change is complete, they do not provide the isolation needed when different changes that involve the same files are made at
the same time. Unless each developer waits until a change is complete before
starting work on another change that involves the same files, there is the risk that one developer will work from another developer's work in progress. In this case, branching can be used to isolate the work of different developers. Work in progress is committed to the branch. When the work is complete, it is merged from the branch to the parent codeline from which the branch was created.
What Do the Tip Revisions Represent?
When a software development team uses an SCM tool to version work in progress, the tip revisions of a codeline do not always represent code that is ready to build. Instead, the tip revisions may represent work in progress or they may represent completed, stable code.
The branching strategy used by the project helps to determine the meaning of the
tip revisions in each codeline. If the project uses a mainline, the mainline serves as a home codeline for the duration of the project. Branches and codelines are created from the mainline to support maintenance and new development. Changes made in each branch or codeline eventually merge to the mainline.
The project may use the mainline to develop code for each release, but branch only to
maintain a shipped release and to isolate those new development tasks that are
potentially disruptive. Figure 1 illustrates this branching strategy. The project branches a release line from the mainline to support production fixes. After each production fix is complete and tested, the completed change is merged to the mainline. At any time, the tip revisions in the release line could contain work in progress or completed defect fixes that are ready to build. Similarly, the project creates a task branch from the mainline to work on each potentially disruptive change. The task branch is retired when the task is complete and the changes are ready to be merged to the mainline. At any
time, the tip revisions in the task branch could contain work in progress or completed
changes that are ready to build. Like the release line and the task branches, the mainline is updated frequently with changes. At any time, the tip revisions in the mainline could contain work in progress or completed code that is ready to build. In this model, the tip revisions in the release line, in each task branch, and in the mainline are not guaranteed to represent stable code.
Alternatively, the project may use the mainline as an integration codeline. In this case, all development takes place in long lived development codelines that are branched from the mainline. Figure 2 illustrates this branching strategy. The mainline is updated when completed changes are merged from a development codeline. At any time, the tip revisions in a development codeline could contain work in progress or completed changes that are ready to build. Unlike the development codelines, the mainline is kept
stable because it is used to integrate changes that were built in a development codeline. The tip revisions in the mainline contain stable code that is ready to build.
When Does It Make Sense to Build from the Tip Revisions?
Building from the tip revisions makes sense when the tip revisions contain code that is
complete, tested, and ready to build. The tip revisions meet these conditions when the mainline, or any other codeline, is used to integrate stable changes from other branches.
However, building from the tip revisions may be impractical if the tip revisions contain
work in progress. A build from the tip revisions may not work as desired. In some cases, a build created from the tip revisions may not compile. When the mainline is used to develop code, the tip revisions may not always contain code that is complete and ready to build. Since release lines and task branches are also used to develop code, the tip revisions in these codelines may not always be ready to build.
Building from Change Packages
Rather than building from the tip revisions, a more effective approach is to build using
completed change packages. Building by change packages means abstracting the
build input from file revisions to the changes implemented by these revisions.
Completed change packages alone do not comprise the input to an integration build. Instead, the contents of completed change packages need to be added to a known baseline to produce a new candidate baseline. The new candidate baseline contains the
file revisions that will be used to attempt the integration build.
The known baseline for each integration build should be the file revisions included in
the last integration build. Depending on the SCM tool, this baseline may be identified by a label or an immutable snapshot.
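The step of adding completed change packages to a known baseline can be sketched as follows. The sketch makes simplifying assumptions that are not stated in the text: a baseline is modeled as a mapping from file path to revision, each completed change package is reduced to the final revision of each file it touches, and packages are applied in the order they were completed, so a later package wins when two packages touch the same file. The function name `candidate_baseline` and all paths and revision numbers are hypothetical.

```python
def candidate_baseline(known_baseline, completed_packages):
    """Overlay completed change packages on the last integration build.

    known_baseline: dict mapping file path -> revision used in the last build.
    completed_packages: list of dicts, each mapping file path -> revision,
        ordered by completion (an assumption of this sketch).
    Returns a new dict; the known baseline itself is left unchanged,
    mirroring a label or immutable snapshot.
    """
    candidate = dict(known_baseline)
    for package in completed_packages:
        candidate.update(package)
    return candidate


if __name__ == "__main__":
    last_build = {"src/app.c": "1.4", "src/util.c": "1.2"}
    completed = [
        {"src/util.c": "1.3"},                      # defect fix
        {"src/report.c": "1.1", "src/app.c": "1.5"},  # new feature
    ]
    for path, revision in sorted(candidate_baseline(last_build, completed).items()):
        print(path, revision)
```

If the integration build succeeds, the candidate becomes the known baseline for the next build. A real tool would also need to handle conflicting packages, which this sketch ignores.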
Using Change Packages to Implement Task-Based Development
Building from completed change packages is the consequence of practicing task-based
development. Task-based development means developers use a change package to
commit every code change to version control. Every change is committed to complete a development task or to save work in progress towards a development task.
The benefit of task-based development over traditional file-based development is that all
changes to the code base can be described in terms of development tasks. Each
build can be described by the completed development tasks in the build.
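As a small illustration of describing a build by its tasks rather than by its file revisions, the sketch below formats a build summary from the completed tasks it contains. The function name `describe_build`, the task identifiers, and the summaries are hypothetical.

```python
def describe_build(build_id, completed_tasks):
    """Return a task-level description of a build.

    completed_tasks: dict mapping a task identifier to a one-line summary.
    """
    lines = [f"Build {build_id} contains {len(completed_tasks)} completed task(s):"]
    for task_id, summary in sorted(completed_tasks.items()):
        lines.append(f"  {task_id}: {summary}")
    return lines


if __name__ == "__main__":
    tasks = {
        "T-102": "Add tax rate to order total",
        "T-101": "Fix rounding defect in invoice",
    }
    print("\n".join(describe_build("2024.1", tasks)))
```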
Conclusion
Building from the tip revisions means nothing if you cannot identify with absolute certainty the enhancements and fixes included in the build. An effective way to identify the changes in a build is to build from completed change packages. To build from completed change packages, every change must be committed to the SCM repository using change packages. Rather than adding overhead that slows down the development process, change packages serve as an abstraction mechanism so that the SCM tool can be used to manage not just file revisions, but the file revisions that implement changes.