Pushing Back against Database Drift

You might know the phenomenon of database drift as version creep—or simply that sinking feeling when the production system isn’t in the state you expect it to be. Your deployment fails, and nobody knows why. It could be much less serious than that and involve only some minor differences that you need to take a few minutes to understand. Or it could be much more. Either way, drift is that distance between what’s actually running on the production server and what was in the last sanctioned release.

It’s also like a code smell. If you’re having drift problems, the odds are that there’s a technical, process, or change management issue somewhere farther up the chain.

Interestingly, drift really only comes into the equation when you’re actually adopting good application lifecycle management (ALM) practice. It’s less of a problem if your database lifecycle management process is either extremely mature (for example, database changes are completely integrated into automated continuous delivery) or extremely immature (where each deployment is sufficiently manual that it’s indistinguishable from drift anyway). Those who are somewhere in between are the most likely to encounter drift.

Why Is Drift an Issue?

Because SQL changes typically modify the state of existing objects rather than being built and deployed from scratch the way application code is, you’re rarely deploying something brand-new over the top of the last version. A database upgrade script run as part of a release expects the target schema to be in a certain state, and it will fail or produce unexpected results if it isn’t. A column’s data type might have changed, or an object may have been renamed, for example. In that case, a deployment script written against the latest development version of the database, which hasn’t had those changes made, just isn’t going to work.
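As a minimal illustration (the table and column names here are hypothetical), consider an upgrade script written against the schema as it exists in development:

    -- Upgrade script written against the expected schema: widen a phone
    -- number column and index it.
    ALTER TABLE dbo.Customers
        ALTER COLUMN PhoneNumber varchar(30) NOT NULL;

    CREATE INDEX IX_Customers_PhoneNumber
        ON dbo.Customers (PhoneNumber);

    -- If someone has already renamed PhoneNumber to Phone directly in
    -- production, both statements fail with an "invalid column name"
    -- error and the release halts partway through.

The script itself is perfectly valid; it simply assumes a production state that drift has quietly invalidated.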

At its worst, drift causes data loss when upgrade scripts fail. At its best, it causes them to fail nondestructively, and it only costs you extra time. Where it’s possible and sensible, changes are wrapped in transactions, so you’re more likely to encounter the wasted-time scenario than a serious business continuity problem or a nasty production outage. If your organization has heavy compliance requirements, however, drift also undermines change tracking and your audit trail.
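In SQL Server, for instance, most schema changes can be wrapped in an explicit transaction so that a failure partway through rolls everything back rather than leaving the database half-upgraded. A minimal sketch, with hypothetical object names:

    -- Any runtime error aborts the batch and rolls back the open transaction.
    SET XACT_ABORT ON;
    BEGIN TRANSACTION;

        ALTER TABLE dbo.Orders ADD DispatchedAt datetime2 NULL;

        -- Backfill the new column; dynamic SQL so the batch compiles even
        -- though the column does not exist until the ALTER has run.
        EXEC ('UPDATE dbo.Orders SET DispatchedAt = CreatedAt;');

    COMMIT TRANSACTION;
    -- If either statement fails, nothing is committed: the deployment costs
    -- you time, but not data.

Not every change can be handled this way (large data migrations and some engine-specific operations can’t be rolled back cheaply), which is exactly why drift against an unknown production state is so risky.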

The technical reasons for drift are rooted in that mismatch between database code and application code and in the need to change the state of objects, respect dependencies, and preserve data. They can, to an extent, be mitigated with automatic schema deployment and synchronization tools, with methodologies that sidestep the problem by treating the database as a set of source-controlled scripts (for example, Microsoft’s database projects in SQL Server), and potentially with extensive use of object-relational mapping. Not all of these approaches suit every development organization.

If you’ve already got a set of production databases, switching your entire development methodology may not be desirable. Tools exist that will automatically generate a change script to bring a target database schema up to date, and some tooling can create deployment scripts directly from the latest version in source control.

Can We Automate Drift Away?

It stands to reason that if an upgrade script is automatically created against the current state of production, the current state of production stops being something we have to think about.

But automation tooling is limited in its actual understanding of data. For example, renaming a table or splitting one column into two could be treated as a drop-and-create, leading to data loss. Or consider the rollback situation: What do we do with data accumulated after a schema change if we need to revert to a previous version? These issues mean that perfect fire-and-forget automation is not always possible.
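To make the rename case concrete (object names hypothetical), a schema-compare tool that only sees the before and after states has no way to know that two differently named columns hold the same data:

    -- What a naive compare tool may generate when a column has been renamed:
    ALTER TABLE dbo.Customers DROP COLUMN FullName;          -- existing data is lost
    ALTER TABLE dbo.Customers ADD DisplayName nvarchar(200) NULL;

    -- What a human-written migration does instead, preserving the data:
    EXEC sp_rename 'dbo.Customers.FullName', 'DisplayName', 'COLUMN';

Better tools let you record refactorings or hand-edit the generated script, but that is precisely the point: a person still has to supply the intent.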

Including specific migration scripts in the deployment helps with repeatability, but it's incredibly difficult to construct an automation system in which the actual state of production does not matter. For a simple database, you could potentially find a way to drop and recreate it with each deployment, or for a new project you could switch methodologies to working on offline scripts. But for most current systems, we can broadly rule out automating drift away.
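One common way to get that repeatability is a journal of applied migrations, so that rerunning a deployment is harmless. Migration frameworks provide this out of the box, but a hand-rolled sketch (with hypothetical names) looks something like this:

    -- One-off setup: record which migration scripts have been applied.
    IF OBJECT_ID('dbo.SchemaMigrations') IS NULL
        CREATE TABLE dbo.SchemaMigrations (
            MigrationId varchar(100) NOT NULL PRIMARY KEY,
            AppliedAt   datetime2    NOT NULL DEFAULT SYSUTCDATETIME()
        );

    -- Each migration script guards itself against being run twice.
    IF NOT EXISTS (SELECT 1 FROM dbo.SchemaMigrations
                   WHERE MigrationId = '0042_add_dispatchedat')
    BEGIN
        ALTER TABLE dbo.Orders ADD DispatchedAt datetime2 NULL;
        INSERT INTO dbo.SchemaMigrations (MigrationId)
        VALUES ('0042_add_dispatchedat');
    END;

Note that this still assumes nobody has changed dbo.Orders outside the migration chain, which brings us back to the organizational side of the problem.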

Can We Fix the Organization?

This leaves us with fixing the process and workflow problems that cause drift to appear. At the intersection of dev and ops, where we’re servicing both the demands of the business to deliver improvements quickly and the needs to ensure continuity and reliability of infrastructure, there are several families of problems that drift can indicate.

Specialized conditions in production

You may have invested in specialized production systems that are not feasible or cost-effective to maintain in development. Perhaps it’s a database running on particular Oracle hardware, or maybe huge volumes of production data that can’t be recreated in test environments and require specialized server configuration.

There’s no solution here beyond “Don’t do that,” and that’s profoundly fatuous in the real world. If your drift is a function of lacking advanced production hardware in your development and staging environments, then drift as a warning sign has performed the amazing deductive feat of diagnosing that you don’t have an infinite budget.

All we can really do is ensure that testing is as realistic and robust as possible and that the special conditions are well documented and well known as part of the deployment process. 

Cumbersome release processes versus short deadlines

Sometimes, making a small change to a live system just feels simpler than going through the whole release process. This is probably a sign that something, somewhere, is broken. If you’re running continuous delivery for database changes, it’s unlikely to happen. But if releases require extensive sign-off and communication isn’t good, or if different groups are separately responsible for application and database changes, it’s easy to see how this could happen. Schema change monitoring to detect drift will help here, but it’s the messier, interpersonal end of change management that really straightens this one out.

Are dev and database administration teams talking? Is the release process as streamlined as it can sensibly be? At the other end, are permissions correctly set so that rogue changes can’t be made without someone knowing? The structural problem is that a business’s demands and its processes for delivering them may be misaligned. Solving this for dev teams can mean the awkward kind of managing upward and ensuring people understand the consequences of what they’re asking for.
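On the technical side, the “someone knowing” part is relatively cheap to get. In SQL Server, for example, a database-level DDL trigger can log every schema change and who made it; a sketch, with hypothetical names:

    -- A log table plus a DDL trigger that records every schema change and its author.
    CREATE TABLE dbo.SchemaChangeLog (
        EventTime datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
        LoginName sysname       NOT NULL,
        EventType nvarchar(100) NOT NULL,
        EventXml  xml           NOT NULL
    );
    GO

    CREATE TRIGGER trg_LogSchemaChanges
    ON DATABASE
    FOR DDL_DATABASE_LEVEL_EVENTS
    AS
    BEGIN
        DECLARE @e xml = EVENTDATA();
        INSERT INTO dbo.SchemaChangeLog (LoginName, EventType, EventXml)
        VALUES (ORIGINAL_LOGIN(),
                @e.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(100)'),
                @e);
    END;

Commercial drift-detection tools do much the same thing with more polish, but even this much gives the release conversation a factual starting point.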

Tight coupling with business process

This is related to but not quite the same as the release process problem. Organizations grow over time, and codebases grow with them. They accrete dependencies and build data structures that come to mirror the business processes that mandate them.

Fighting this is hard, as is getting the time allocated to refactor systems that stakeholders perceive to work. But if, say, your production database is also used for reporting by business intelligence, and without much abstraction in between, then every deployment has a massive change management overhead. This, in turn, could mean little manual tweaks to deployments or small occasional changes to service other business requirements.

The specific case can be addressed with data warehousing systems and more abstraction, but the core problem of accidental complexity and a lot of business logic in the database is again organizational. If the accumulated business dependencies on a given database effectively mandate drift, you can’t code your way out of it.
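To illustrate the abstraction half of that, even a thin layer of views between the base tables and the reporting consumers decouples refactoring from every downstream query (names hypothetical):

    -- Reporting reads from a stable view, not from the base tables directly,
    -- so the tables underneath can change without breaking BI queries.
    CREATE VIEW reporting.CustomerOrders
    AS
    SELECT c.CustomerId,
           c.DisplayName AS CustomerName,
           o.OrderId,
           o.CreatedAt,
           o.TotalAmount
    FROM dbo.Customers AS c
    JOIN dbo.Orders    AS o
        ON o.CustomerId = c.CustomerId;

It helps with the specific case, but it doesn’t change who is asking for what, or how quickly.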

What Do We Do?

Solving the bigger problems is all about communicating and change management. Business stakeholders need to have their expectations managed and understand that asking for quick fixes may incubate trouble for the future. Technical teams need to be given time to explore more robust solutions and options like continuous delivery that can make them more responsive in the long run. That can be a hard sell.

Within technical teams, one of the most important pieces of communication is between database administrators and application developers. Often they’re the same person or team (such as in small organizations), but where they’re more siloed, it’s easy for their competing responsibilities to lead to deadlock. Talking to each other and, where possible, working on the same teams is the most basic step to solving the problem. It can also be worthwhile to look into the potential of dashboards to support greater shared visibility of releases and infrastructure.

Database drift will never be your biggest change management headache, but it’s a useful example for surfacing others. Detecting drift and analyzing its root causes can be a handy opener to conversations around those problems, and to solving them for the future.
