A Tale of Two Trees

Our discussion of source control must begin by defining the basic terms and describing the basic operations. Let's start by defining two important terms: repository and working folder.

An SCM tool provides a place to store your source code. We call this place a repository. The repository exists on a server machine and is shared by everyone on your team.

Each individual developer does her work in a working folder, which is located on a desktop machine and accessed using a client.


Each of these things is basically a hierarchy of folders. A specific file in the repository is described by its path, just like we describe a specific file on the file system of your local machine. In Vault and SourceSafe, a repository path starts with a dollar sign. For example, the path for a file might look like this:

$/trunk/src/myLibrary/hello.cs
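To make the "hierarchy of folders" idea concrete, here is a minimal sketch that breaks a repository path into its pieces. The function name `split_repo_path` is hypothetical and is not part of Vault or SourceSafe; the only thing taken from the text is that a path starts with a dollar sign.

```python
def split_repo_path(path: str) -> list[str]:
    """Split a '$/...' repository path into its folder and file names."""
    if path != "$" and not path.startswith("$/"):
        raise ValueError(f"not a repository path: {path!r}")
    # Drop the '$' root marker, then split on '/' and ignore empty pieces.
    return [part for part in path[1:].split("/") if part]


parts = split_repo_path("$/trunk/src/myLibrary/hello.cs")
print(parts)  # the folders from the root down, followed by the file name
```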

The workflow of a developer is an infinite loop which looks something like this:

  • Copy the contents of the repository into a working folder.
  • Make changes to the code in the working folder.
  • Update the repository to incorporate those changes.
  • Repeat.
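The loop above can be sketched with a toy model in which both the repository and the working folder are plain dictionaries. This is only an illustration of the idea, not how any real SCM tool stores its data; the names `get_latest` and `update_repository` are made up for this example.

```python
# Toy model: a repository and a working folder are both just
# dictionaries mapping a path to file contents.
repository = {"$/trunk/src/myLibrary/hello.cs": "// version 1"}


def get_latest(repo: dict) -> dict:
    """Step 1: copy the repository contents into a fresh working folder."""
    return dict(repo)


def update_repository(repo: dict, working_folder: dict) -> None:
    """Step 3: fold the working folder's changes back into the repository."""
    repo.update(working_folder)


working_folder = get_latest(repository)

# Step 2: edit the private copy. The repository is untouched so far.
working_folder["$/trunk/src/myLibrary/hello.cs"] = "// version 2"
untouched = repository["$/trunk/src/myLibrary/hello.cs"]

# Step 3: only now does the rest of the team see the change.
update_repository(repository, working_folder)
```

The point of the sketch is the gap between step 2 and step 3: changes made in the working folder stay private until the repository is updated.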

I've omitted certain details like staff meetings and vacations, but this loop essentially describes the life of a developer who is working with an SCM tool. The repository is the official place where all completed work is stored. A task is not considered to be completed until the repository contains the result of that task.

Let's imagine for a moment what life would be like without this distinction between working folder and repository. In a single-person team, the situation could be described as tolerable. However, for any plurality of developers, things can get very messy.

I've seen people try it. They store their code on a file server. Everyone uses Windows file sharing and edits the source files in place. When somebody wants to edit main.cpp, they shout across the hall and ask if anybody else is using that file. Their Ethernet is saturated most of the time because the developers are actually compiling on their network drives. When we sell our source control tool to someone in this situation, I feel like an ER doctor. I go home that night with a feeling of true contentment, because I know that I have saved a life.

Best Practice: Don't Break the Tree

The benefit of working folders is mostly lost if the contents of the repository become "broken." At all times, the contents of the repository should be in a state which allows everyone on the team to continue to work. If a developer checks in some code which won't build or won't pass the test suite, the entire team grinds to a halt.

Many teams have some sort of social penalty which is applied to developers who break the tree. I'm not talking about anything severe, just a little incentive to remind developers to be careful. For example, require the guilty party to put a dollar in a glass jar. (Use the money to take the team to see a movie after the product is shipped.) Another idea is to require the guilty developer to make the coffee every morning. The point is to make the developer feel embarrassed, but not punished.

With an SCM tool, working on a multi-person team is much simpler. Each developer has a working folder which is a private workspace. He can make changes to his working folder without adversely affecting the rest of the team.

Terminology note: Not all SCM tools use the exact terms I am using here. Many systems use the word "directory" instead of "folder." Some SCM tools, including SourceSafe, use the word "database" instead of "repository." In the context of Vault, these two words have a different meaning. Vault allows multiple repositories to exist within a single SQL database. For this reason, I use the word "database" only when I am referring to the SQL database.

In and Out

The repository exists on a server machine which is far away from the desktop machine containing the working folder where the developer does her work. The word "far" in the previous sentence is intended to mean anything from a few centimeters to thousands of kilometers. The physical distance doesn't really matter. The SCM tool provides the ability to communicate between the client and the server over TCP/IP, whether the network is a local Ethernet or an Internet connection to another continent.

Because of this separation between working folder and repository, the most frequently used features of an SCM tool are the ones which help us move things back and forth between them. Let's define some terms:

Add: A repository starts out completely empty, so we need to "Add" things to it. Using the "Add Files" command in Vault you can specify files or folders on your desktop machine which will be added to the repository.

Get: When we copy things from the repository to the working folder, we call that operation "Get." Note that this operation is usually used when retrieving files that we do not intend to edit. The files in the working folder will be read-only.

Checkout: When we want to retrieve files for the purpose of modifying them, we call that operation "Checkout." Those files will be marked writable in our working folder. The SCM server will keep a record of our intent.

Checkin: When we send changes back to the repository, we call that operation "Checkin." Our working files will be marked back to read-only and the SCM server will update the repository to contain new versions of the changed files.
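The four operations can be sketched as a small toy class. This is an assumption-laden illustration, not the implementation of Vault or SourceSafe: the class `ToyRepository`, its method signatures, and the rule that a checkin must be preceded by a checkout from the same user are all invented for the example. A working folder is modeled as a dictionary whose entries carry a `writable` flag.

```python
class ToyRepository:
    """A toy in-memory model of Add, Get, Checkout and Checkin."""

    def __init__(self):
        self.files = {}        # path -> latest contents
        self.checked_out = {}  # path -> user who declared intent to edit

    def add(self, path, contents):
        """Add: put a new file into the (initially empty) repository."""
        self.files[path] = contents

    def get(self, path, working_folder):
        """Get: copy a file to the working folder, marked read-only."""
        working_folder[path] = {"contents": self.files[path], "writable": False}

    def checkout(self, path, user, working_folder):
        """Checkout: copy a file, mark it writable, and record the intent."""
        self.checked_out[path] = user
        working_folder[path] = {"contents": self.files[path], "writable": True}

    def checkin(self, path, user, working_folder):
        """Checkin: send the change back and return the file to read-only."""
        # Assumption for this sketch: only the user who checked out may check in.
        if self.checked_out.get(path) != user:
            raise PermissionError(f"{path} is not checked out by {user}")
        self.files[path] = working_folder[path]["contents"]
        working_folder[path]["writable"] = False
        del self.checked_out[path]


repo = ToyRepository()
folder = {}
repo.add("$/trunk/src/main.cpp", "int main() { return 0; }")

repo.get("$/trunk/src/main.cpp", folder)
read_only_after_get = not folder["$/trunk/src/main.cpp"]["writable"]

repo.checkout("$/trunk/src/main.cpp", "alice", folder)
folder["$/trunk/src/main.cpp"]["contents"] = "int main() { return 1; }"
repo.checkin("$/trunk/src/main.cpp", "alice", folder)
```

Notice that the server-side record in `checked_out` is what lets the tool answer the "is anybody else using that file?" question without anyone shouting across the hall.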

Note that these definitions are merely starting points. The descriptions above correspond to the behavior of SourceSafe and Vault (with its default settings). However, we will see later that other tools (such as CVS) work somewhat differently, and Vault can optionally be configured in a mode which matches the behavior of CVS.

Terminology note: Some SCM tools use these words a bit differently. Vault and SourceSafe use the word "checkout" as a command which specifically communicates the intent to edit a file. For CVS, the "checkout" command is used to retrieve files from the repository regardless of whether the user intends to edit the files or not. Some SCM tools use the word "commit" instead of the word "checkin". Actually, Vault uses either of these terms, for reasons that will be explained in a later chapter.

H.G. Wells Would Be Proud

Your repository is more than just an archive of the current version of your code. Actually, it is an archive of every version of your code. Your repository contains history. It contains every version of every file that has ever been checked in to the repository. For this reason, I like to think of a source control tool as a time machine.

The ability to travel back in time can be extremely useful for a software project. Suppose we need the ability to retrieve a copy of our source code exactly as it looked on April 28th, 2002. An SCM tool makes this kind of thing easy to do.

An even more common case is the situation where a piece of code looks goofy and nobody can figure out why. It's handy to be able to look back at the history and understand when and why a certain change happened.

Over time, the complete history of a repository can become large and overwhelming, so SCM tools provide ways to cope. For example, Vault provides a History Explorer which allows the history entries to be queried, searched, and sorted.

Perhaps more importantly, most SCM tools provide a feature called a "label" or a "tag." A label is basically a way to mark a specific instant in the history of the repository with a meaningful name. The label makes it easy to later retrieve a snapshot of exactly what the repository contained at that instant.
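Here is a minimal sketch of the time machine idea, covering both retrieval by date and retrieval by label. It is a toy model under stated assumptions, not how any real SCM tool stores history: the class `ToyHistory` and its methods are hypothetical, each checkin stores a full snapshot, and checkins are assumed to arrive in chronological order.

```python
from datetime import datetime


class ToyHistory:
    """A toy history: every checkin keeps a full snapshot of the tree."""

    def __init__(self):
        self.versions = []  # (timestamp, snapshot) pairs, oldest first
        self.labels = {}    # label name -> index into self.versions

    def checkin(self, when, snapshot):
        """Record a new version; copy the dict so later edits cannot alter it."""
        self.versions.append((when, dict(snapshot)))

    def label(self, name):
        """Attach a meaningful name to the most recent version."""
        if not self.versions:
            raise ValueError("nothing has been checked in yet")
        self.labels[name] = len(self.versions) - 1

    def as_of(self, when):
        """Return the newest snapshot at or before 'when', or None."""
        result = None
        for timestamp, snapshot in self.versions:
            if timestamp <= when:
                result = snapshot
        return result

    def by_label(self, name):
        """Return the snapshot that was current when the label was applied."""
        return self.versions[self.labels[name]][1]


history = ToyHistory()
history.checkin(datetime(2002, 4, 1), {"hello.cs": "v1"})
history.label("Release 1.0")
history.checkin(datetime(2002, 5, 15), {"hello.cs": "v2"})

on_april_28 = history.as_of(datetime(2002, 4, 28))
release = history.by_label("Release 1.0")
```

A label saves you from having to remember the date: asking for "Release 1.0" and asking for April 28th, 2002 return the same snapshot in this example.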

Looking Ahead

This chapter merely scratches the surface of what an SCM tool can provide, making brief mention of two primary benefits:

  • Working folders provide developers with a private workspace which is distinct from the main repository.
  • Repository history provides a complete archive of every change and why it was made.

In the next chapter, I'll be going into much greater detail on the topic of checkins.

About the author

Eric Sink is a software developer at SourceGear, which makes source control (aka "version control," "SCM") tools for Windows developers. He founded the AbiWord project and was responsible for much of the original design and implementation. Prior to SourceGear, he was the Project Lead for the browser team at Spyglass (now OpenTV), which built the original versions of the browser you now know as "Internet Explorer." Eric received his B.S. in Computer Science from the University of Illinois at Urbana-Champaign. The title on Eric's business card says "Software Craftsman." You can reach Eric at [email protected]. This series of articles by Eric Sink is part of his online book, Source Control HOWTO, a best practices guide on source control, version control, and configuration management. You can find it online at http://software.ericsink.com/scm/source_control.html

CMCrossroads is a TechWell community.