A Primer on Database Version Control and Why It Matters

[article]
Summary:

While source code version control has been a staple of basic software configuration management (SCM) in most development projects for decades, databases have been largely ignored. Implementing SCM principles in database development gives objects protection and enables the automation of database deployments. Software can help to facilitate these processes, but database SCM practices should be instilled from the top of the development organization in order to ensure optimal benefits.

Software configuration management (SCM) has been providing value to development organizations for many years. By implementing a version control system (VCS), source code is protected from unintended changes and loss, an audit trail is captured automatically by the tool itself, and large teams are coordinated through the intuitive check-out and check-in mechanism. Why then is the database, without which most applications are useless, ignored in today’s SCM?

The prevailing concern is that databases don’t lend themselves to traditional version control methodologies whose focus has historically been solely on file-based artifacts, primarily source code. Version control systems were developed in the early 1970s to address a gap in source code management on the UNIX platform. At that time, relational databases had just been invented.

By the 1990s, relational databases had become dominant in enterprise-grade systems, but they still did not fit into the file-based paradigm, which continues in VCSs today, since they consist of objects like tables, views, and procedures, rather than files. While this obvious mismatch creates inherent difficulties, the benefits almost always far outweigh the challenges.

First, all of the basic benefits conferred by SCM are applied equally to the database including code protection, audit history, and developer coordination. While those are all significant and reasons enough to implement database SCM in their own right, substantial additional value can be realized by integrating a release management solution.

Database deployments are a major pain point in any software release. They are often manual, resource intensive, high-risk affairs. Sometimes developers supply SQL scripts to a database administrator (DBA) who executes them in the target environment. In other organizations, the DBA receives only a list of database changes and must manually write SQL code to apply them at release time. In any case, the processes are very manual with low repeatability. Developers sometimes supply the wrong scripts. The DBAs can break the release in a number of ways, including executing scripts out of order, executing incorrect scripts, or forgetting scripts entirely.

Testing a release in a QA environment is useless when the same changes can’t be reliably pushed to production in the same exact way. Due to these challenges, some database deployments take teams of release managers and DBAs a whole day or more to accomplish, making them a sinkhole for productivity. Without a reliably repeatable deployment process, production database deployments can introduce defects that bring down mission-critical systems and put data at risk, which can prove very costly. Database release automation addresses these costs and risks by integrating with SCM software to facilitate the deployment of archived database code, which minimizes the manual effort involved thereby increasing quality and decreasing cost and risk.

The problems and risks discussed so far are plentiful and  costly while the benefits are heavy on cost savings and efficiency, so it seems like a no-brainer, but the practical application of database SCM and release management is still forthcoming. How does one build such a solution? What are the integral processes? Are there any caveats or hidden downsides? Has database SCM and release management been implemented successfully in the real world?

The good news is that true database version control is starting to emerge as a competency with supporting tools and processes. First, a database SCM solution starts with an SCM tool or version control system (VCS). In the interest of incurring a manageable change in the organization and utilizing existing investments, many times it is both practical and beneficial to manage database changes with the VCS already being used to manage source code. If one is not being used yet, that is the first and most important step toward achieving solid database SCM. VCS software ranges from robust, but expensive enterprise offerings to scaled-down and yet widely adopted free open source tools. One should be selected based on cost, features, scalability, and user experience.

As mentioned, database objects are not files and therefore do not readily fit into the traditional file-based paradigm common to all current VCSs. One option, which probably has the most widespread use today, is to version control the SQL script that creates or modifies the object. There are some issues inherent to that methodology based on the fact that an action is being versioned rather than an object. It is most easily recognized when discussing objects that cannot be dropped and replaced for each change like tables and data. Deploying any stored procedure script to a given database will simply replace the previous version. However, tables cannot be replaced since they can be changed incrementally.

For instance, the first revision of a table script might create it while the second adds a column. Deploying the second script without first executing the first will result in a SQL error since the table doesn’t exist yet. Also, a rollback of the second revision is difficult to automate since simply redeploying the previous version attempts to recreate the table, again, causing an error. Although the version control of SQL scripts isn’t an ideal solution, it is a great initial step and, for some smaller shops with less complexity in their database design and release process, can even be an acceptable long-term plan.

SQL scripts must be checked in or committed to version control as a database developer or DBA changes the objects in the development environment. This is the most important process in the workflow and serves as the basis for everything else. If a developer forgets to check in a script or accidentally checks in the wrong one, the integrity of the VCS repository is compromised, which can cause a variety of downstream issues including auditing and deployments. Of course, strict adherence to the process will only go so far and is never a replacement for thorough quality assurance (QA).

With exceptional automated database unit testing, some of those lapses can even be caught as soon as they are committed, but that falls outside the scope of this article. The hope is that even if the incorrect script was committed, it will cause test case failures down the line in QA, which can be remediated well before they affect production. Of course, as long as the release process dictates that database deployments must be integrated with SCM, a script that was never checked in at all can’t be pushed out, offering some measure of protection against that type of human error.

Deployment automation can take many different forms, but always carries some similar characteristics. A script or some other executable fetches database object code from the VCS repository based on a set of revisions that are included in the release and executes them against a target database. Many of the aforementioned drawbacks associated with the version control of SQL scripts can be mitigated by a DBA or database release manager with knowledge of the database whose job it is to create a master script for each deployment that calls each SQL script individually in the proper order based on dependencies and other considerations. This person must work closely with developers to identify potential problems and verify that the master script communicates their intent. The same master script should be executed in each environment, including production in order to produce the type of repeatability that will ensure the highest software quality possible.

A newer option to reach the market is implementing one of the advanced software products that are specifically designed to version control databases and automate database deployments. These emerging technologies usually accomplish this by working natively in the database to read object metadata and execute SQL changes. By adjusting SCM to speak the database’s language rather than trying to force the database into the traditional file-based paradigm, they are able to automatically perform many of the manual tasks that were still present even in the automated solutions described above as well as offer additional value through features only enabled by their database-native nature.

Whether or not they are right for your organization depends fully on your needs, based on the complexity of your databases, the complexity of your processes, and the ability of the product to decrease cost throughout the database development lifecycle weighed against the investment. In fact, that goes for any potential solution. Regardless of the solution you use to version control your database and automate deployments—be it scripts, a homegrown application, or a commercial off the shelf product—process is always the most important factor of success.

Without the proper processes and buy-in from team members, the full value to your organization cannot be realized. That said, with all of the new enabling technologies, proven solutions, and thought leaders, it’s the perfect time to visit or revisit your database SCM and release management strategies. The impact to your organization may be more than you think.

About the author

CMCrossroads is a TechWell community.

Through conferences, training, consulting, and online resources, TechWell helps you develop and deliver great software every day.