Introduction to Subversion

Summary:

This is the first of a four-part series on Subversion (SVN for short). It covers the history of Subversion and gives a basic overview of its most important features. (Part 2 will cover Subversion installation and look at the most popular clients. Part 3 will give you a guided tour of the most frequent Subversion use cases. Finally, Part 4 will give some hints on migrating from other revision control systems, or RCSs, to Subversion.)

Brief SVN Background

Since the 1980s, when Dick Grune created the set of scripts called CVS, the configuration management domain has come a long way. CVS became the de facto standard for community development as well as for many companies. Even though it has undergone many improvements and enhancements over time, it still suffers from limitations caused by its 20-year-old design. This was the status quo until CollabNet, Inc. came up with the idea of a completely redesigned, open source replacement for CVS that would preserve the CVS development model but fix its most obvious flaws. Initial design work on Subversion started in early 2000, the project became self-hosting in mid-2001, and the stable version 1.0 was released in February 2004. Since then, the project has reached version 1.3.0, the current release at the time of writing.

Over one million downloads, adoption by many community projects (the Apache Software Foundation being among the biggest), and the appearance of many third-party clients, integrations, and tools make a convincing case that if the declared project goal (an improved replacement for CVS) has not been met yet, it is only a matter of time. These facts should suggest to anyone in the CM field that knowledge of Subversion is a “must have” for those who do not want to be left behind by a trend that appears to gain momentum almost daily.

For Everyday Users: Business as Usual

From the perspective of an ordinary user's daily work, Subversion does not differ very much from CVS. There is a server that stores all the data. Individual users “check out” data to their local machines into a so-called “working copy”. The working copy is a directory structure that contains the versioned data plus special “.svn” folders holding extra information for Subversion, such as which server the data were taken from, which revision, and so on. Unlike CVS, Subversion does not store user credentials in the working copy.

The user works normally with the files on his/her local file system and, when finished, “commits” the changes to the server. The server does not track who has which files in their working copy.

New Concepts in SVN

Now let's look at the new concepts behind SVN. The one many people find hardest to grasp is that SVN does not version single files, but rather the whole file system (the directory tree, if you wish): the repository. Therefore we don't speak about versions 1.1, 1.2, ... of individual files, but about versions 1, 2, 3, ... of the whole repository (with a particular file being modified in, say, versions 1 and 3). Consequently, all modifications of the directory structure, such as deletions, renames, and copies, are versioned as well. This makes it trivial to find out who created, deleted, or copied a particular file or folder, a task that can be a nightmare with some other systems.
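To make repository-wide revision numbers concrete, here is a minimal Python sketch. It is a toy model under simplifying assumptions (the whole tree is a flat dictionary of paths, and each commit stores a full snapshot); it is not how Subversion actually stores data, and the class and method names are hypothetical. It reproduces the example above, a file modified in revisions 1 and 3 of the repository, and shows a deletion being versioned like any other change.

```python
from typing import Dict, List

Tree = Dict[str, str]  # path -> file content; directories are implied by paths


class ToyRepository:
    """Toy model of repository-wide versioning (not Subversion's real format)."""

    def __init__(self) -> None:
        self.revisions: List[Tree] = [{}]  # revision 0: the empty repository

    @property
    def youngest(self) -> int:
        return len(self.revisions) - 1

    def commit(self, new_tree: Tree) -> int:
        # The new number belongs to the whole tree, not to any single file.
        self.revisions.append(dict(new_tree))
        return self.youngest

    def cat(self, path: str, rev: int) -> str:
        return self.revisions[rev][path]

    def revisions_changing(self, path: str) -> List[int]:
        # A path "changes in revision N" if it was added, modified or
        # deleted between revision N-1 and revision N.
        return [rev for rev in range(1, len(self.revisions))
                if self.revisions[rev - 1].get(path) != self.revisions[rev].get(path)]


repo = ToyRepository()
repo.commit({"README": "v1", "main.c": "int main() {}"})                 # r1
repo.commit({"README": "v1", "main.c": "int main() {}", "util.c": ""})  # r2
repo.commit({"README": "v2", "main.c": "int main() {}", "util.c": ""})  # r3
repo.commit({"README": "v2", "main.c": "int main() {}"})                 # r4: util.c deleted

print(repo.youngest)                      # 4
print(repo.revisions_changing("README"))  # [1, 3]
print(repo.revisions_changing("util.c"))  # [2, 4]
print(repo.cat("README", 2))              # v1
```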

So if every change to any resource in the repository increases the overall repository version, what is the basic unit, the operation that increments the version number by one? The answer is simple: a commit. Subversion uses the concept of “atomic commits”: however many files and folders within one repository a change in your working copy involves, it can be committed to the repository as one atomic change. Either the whole commit succeeds, creating a new repository revision, or it fails and the repository remains untouched.

The concept of atomic commits has two main advantages. First, changes to multiple mutually dependent resources can be made in one atomic step, so users can keep the repository content consistent at all times. Second, all changes forming one logical set can be grouped into one commit, making it easier to audit WHY (why was a particular change made?) or, conversely, WHAT (which changes were made to implement a feature or fix a bug?). In Subversion terminology, the change made by one commit is called a “revision”. We speak about “committing revision 123” or about “revision 123 of the repository”. Revisions are numbered with plain integers, starting at 0 (the newly created, empty repository) and incrementing by 1 with every commit.
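The all-or-nothing behavior of atomic commits can be sketched in the same toy style. In this hypothetical Python model, every changed path is checked for being out of date before anything is written, and the new revision is published in a single step at the end. The assumption that the whole working copy shares one base revision is a simplification of the sketch; real Subversion tracks the base revision of each item separately.

```python
from typing import Dict, List, Optional


class OutOfDateError(Exception):
    """Raised when a change was made against a stale copy of a path."""


class AtomicRepo:
    """Toy model of atomic commits (not Subversion's real implementation)."""

    def __init__(self) -> None:
        self.revisions: List[Dict[str, str]] = [{}]  # revision 0: empty tree
        self.log: List[str] = [""]                   # one log message per revision

    def commit(self, base_rev: int, changes: Dict[str, Optional[str]],
               message: str) -> int:
        head = self.revisions[-1]
        base = self.revisions[base_rev]
        # Phase 1: check EVERY changed path before modifying anything.
        for path in changes:
            if head.get(path) != base.get(path):
                raise OutOfDateError(f"{path} changed since r{base_rev}")
        # Phase 2: build the complete new tree, then publish it in one step.
        new_tree = dict(head)
        for path, content in changes.items():
            if content is None:
                new_tree.pop(path, None)  # None means "delete this path"
            else:
                new_tree[path] = content
        self.revisions.append(new_tree)
        self.log.append(message)
        return len(self.revisions) - 1


repo = AtomicRepo()
r1 = repo.commit(0, {"api.h": "int f(void);",
                     "api.c": "int f(void) { return 0; }"}, "Add f()")
# Header and source change together: one logical change, one revision.
r2 = repo.commit(r1, {"api.h": "int f(int);",
                      "api.c": "int f(int x) { return x; }"},
                 "Change the signature of f()")
try:
    # Based on r1, but api.c changed in r2, so the WHOLE commit is rejected,
    # including the otherwise harmless new file.
    repo.commit(r1, {"api.c": "int f(void) { return 1; }",
                     "new.c": "/* new */"}, "Stale change")
except OutOfDateError as err:
    print("rejected:", err)

print(len(repo.revisions) - 1)        # 2
print("new.c" in repo.revisions[-1])  # False
```

The log message stored with each revision is what makes the WHY audit possible: one message describes the whole logical change.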

What About Branching and Tagging?

By now it should be clear that Subversion is basically a versioned file system that keeps track of all changes and lets you see the whole history at any time. Subversion keeps things simple. But now you might ask: if this is all, what about branching and tagging? These are must-have features for any CM system!

The answer, shocking though it may seem at first, is that there is no special support for branching and tagging in SVN. Once you get past the initial astonishment, you soon discover that SVN’s directory copy operation, which takes constant time and constant disk space regardless of the size of the copied tree, is an ideal candidate to replace the explicit branching and tagging of CVS.

This is where the magic trio “trunk”, “branches”, and “tags”, which you see in most SVN repository structures, comes into play. “Trunk” is the directory where the main development takes place. “Branches” holds copies of “trunk”; this is where branched development takes place. Finally, the “tags” folder holds copies of “trunk” that serve as tags. Since SVN tracks copy operations, it is easy to find out when a particular tag was created, or to merge changes between trunk and branches. Note also that since a revision number fully defines the repository state, the revision number alone can serve as a kind of tag, for example: “the build was created from revision 234”.
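The following toy Python model illustrates why a directory copy is cheap and why it works as a branch or tag. The names CopyRepo and copied_from are hypothetical, and the structure is only an assumption for illustration; Subversion's real repository layer is more elaborate. A copy merely adds a new path that refers to the existing, immutable subtree and records where it came from, so no file data is duplicated. The last line shows a plain revision number acting as a tag.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Subtree = Tuple[Tuple[str, str], ...]  # immutable (file name, content) pairs


@dataclass
class CopyRepo:
    """Toy model of cheap copies (not Subversion's real storage layer)."""

    # Each revision maps a directory path to an immutable subtree.
    revisions: List[Dict[str, Subtree]] = field(default_factory=lambda: [{}])
    # Copy history: destination path -> (source path, source revision).
    copied_from: Dict[str, Tuple[str, int]] = field(default_factory=dict)

    def commit(self, tree: Dict[str, Subtree]) -> int:
        self.revisions.append(tree)
        return len(self.revisions) - 1

    def copy(self, src: str, src_rev: int, dst: str) -> int:
        tree = dict(self.revisions[-1])
        # Point the new name at the SAME subtree object: no file data is
        # duplicated, however large the copied directory is.
        tree[dst] = self.revisions[src_rev][src]
        self.copied_from[dst] = (src, src_rev)  # the copy itself is recorded
        return self.commit(tree)


repo = CopyRepo()
r1 = repo.commit({"trunk": (("main.c", "v1"),)})
r2 = repo.copy("trunk", r1, "tags/release-1.0")    # a "tag"
r3 = repo.copy("trunk", r1, "branches/maint-1.0")  # a "branch"
r4 = repo.commit({**repo.revisions[-1], "trunk": (("main.c", "v2"),)})

print(repo.revisions[r4]["tags/release-1.0"] is repo.revisions[r1]["trunk"])  # True
print(repo.copied_from["tags/release-1.0"])  # ('trunk', 1)
print(repo.revisions[r1]["trunk"])           # (('main.c', 'v1'),)
```

Later work on trunk (r4) does not disturb the tag, because the tag still points at the subtree that existed in revision 1.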

A Few Practical Basics

Now let's move from abstract concepts to some practical information. SVN supports three repository access protocols: direct file access (file://), its own proprietary protocol (svn://), and WebDAV/DeltaV over HTTP (http:// or https://).

The file protocol is useful for handling a local repository without any network overhead (for example, versioning your own files locally). The proprietary SVN protocol is easy to set up: just launch the svnserve command-line server. The most widely used option, the HTTP-based WebDAV/DeltaV protocol, uses the Apache HTTP Server (with the mod_dav_svn module) as its host platform. It is preferred because of its unproblematic network setup (HTTP gets through most firewalls), its security (when configured as HTTPS), and the robustness provided by the Apache HTTP Server. The SVN and HTTP protocols support user authentication with path-specific read/write access control.
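The protocol is selected by the scheme of the repository URL. The short Python sketch below maps each scheme to the access method described above and shows which network port each one uses by default (3690 for svnserve, 80 and 443 for HTTP and HTTPS). The helper functions are hypothetical, not part of any Subversion API, and svn.example.com and the repository paths are placeholders.

```python
from typing import Optional
from urllib.parse import urlparse

# The three access methods described above, keyed by URL scheme.
ACCESS_METHODS = {
    "file": "direct local access (no server, no network)",
    "svn": "Subversion's own protocol, served by svnserve",
    "http": "WebDAV/DeltaV through Apache HTTP Server (mod_dav_svn)",
    "https": "WebDAV/DeltaV through Apache HTTP Server, encrypted",
}

# Default TCP ports; svnserve listens on 3690 unless configured otherwise.
DEFAULT_PORTS = {"svn": 3690, "http": 80, "https": 443}


def access_method(repo_url: str) -> str:
    scheme = urlparse(repo_url).scheme  # urlparse lower-cases the scheme
    if scheme not in ACCESS_METHODS:
        raise ValueError(f"unsupported scheme in {repo_url!r}")
    return ACCESS_METHODS[scheme]


def port_for(repo_url: str) -> Optional[int]:
    parsed = urlparse(repo_url)
    if parsed.scheme == "file":
        return None  # local access: nothing crosses the network
    # An explicit port in the URL wins; otherwise use the scheme's default.
    return parsed.port or DEFAULT_PORTS[parsed.scheme]


for url in ("file:///var/svn/repos/project/trunk",
            "svn://svn.example.com/project/trunk",
            "https://svn.example.com/repos/project/trunk"):
    print(url, "->", access_method(url), "| port:", port_for(url))
```

Only the http and https URLs use the standard web ports, which is one reason the text gives for preferring WebDAV/DeltaV behind firewalls.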

The SVN distribution (both client and server) is available for various systems, including Windows, Linux, BSD, Solaris, and Mac OS. There is a choice between two repository storage implementations, one based on the plain file system (FSFS) and one based on Berkeley DB; the file-system implementation is the more reliable and is therefore recommended. The file-system-based repository storage is binary compatible across different systems.

Finally, good news for non-English speakers: Subversion is localized into many languages and is open to contributions of new localizations.

Stay Tuned For More

This concludes Part 1 of the series. The following parts will cover Subversion installation and the most popular clients (Part 2), give you a guided tour of the most frequent Subversion use cases (Part 3), and finally offer some hints on migrating from another RCS to Subversion (Part 4).

Links

Subversion Home - http://subversion.tigris.org/
SVN Book (complete Subversion description and reference from SVN authors) - http://svnbook.red-bean.com/


About the author

Michal Dobisek is a Software Architect at Polarion Software GmbH (www.polarion.com). He has experience with CVS, Perforce, and Subversion, including two years of using, administering, and tweaking Subversion. He holds a Master's degree in Cybernetics from the Gerstner Laboratory of the Czech Technical University in Prague.

CMCrossroads is a TechWell community.