Migrating Data Successfully

Summary:

Businesses invest a lot of money (usually more than budgeted), time, and other resources to migrate their legacy data. The reasons for making such an investment range from upgrading to the latest version of an enterprise resource planning (ERP) system to making old data compatible with new information systems. The success of these data migration projects depends on a number of critical factors. This article looks at a few of them.

Consider Data Migration as a Separate Project
In most organizations, data migration occurs either as part of the customization of an enterprise resource planning (ERP) package or during implementation of a new system. There are a few pitfalls to this approach. Usually the cost and time estimates for a project cover both customization and implementation, with data migration folded into the latter. In most cases these estimates, along with the risks associated with data migration, are too low. Customization and implementation either take precedence over the data migration or run in parallel with it. Yet in most cases, a majority of the problems at the end of a project are due to poor data migration tools, techniques, or utilities. The newly implemented systems or customized code may work well independently, but several problems surface with migrated data.

To resolve this issue, treat the data migration separately from customization and implementation. Data migrations are most often performed by conversion utilities that are developed in-house for this specific task. Since conversion utilities are pieces of software, it seems logical to treat the development and testing of that software as a separate project. This allows for independent design, development, and testing of a utility before executing it on the existing system. Conversion, although a one-off process, has a huge impact on customization or implementation.

Review the Quality of the Data
Quality of data can pose serious issues for data migration projects. Poor quality can lead to extreme delays and also cause projects to fail. In any given organization, data is created or manipulated as a result of people, process, and technology. Lack of user knowledge, absence of stable and robust processes, and missing relationship linkages can lead to poor quality data in any system. The most common issues that emerge with data are:

  1. Incomplete Data--Data can be missing partially or completely. For example, if a record has six fields and some of them are empty, the data is deemed to be incomplete. Such data records cause problems during migration unless the utility is designed to handle these scenarios.
  2. Duplicate Data--Multiple instances of the same data are a big problem during data migration. A conversion is unlikely to ignore duplicate data records on its own. Because the data format often differs from one duplicate record to the next, even though the information is the same, it is difficult to identify the duplicates and filter them out.
  3. Data Non-conformity--This refers to information stored in non-standard formats such as free text fields.
  4. Inconsistent Data--When merging various systems, the data can lack consistency and represent wrong information.
  5. Inaccurate Data--Data deteriorates over time, which can cause a lot of difficulties during migration.
  6. Data Integrity--Missing relationship linkages can drastically degrade the quality of data and pose problems during migration.

Prior to developing a conversion utility for data migration, it is worthwhile to research the type of data presented in the system. This is similar to the requirements gathering phase of any project. Listing all the types of data that need to be converted reduces the risk of errors.
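As a rough illustration of this kind of research, the sketch below counts three of the issue types listed above (incomplete, duplicate, and integrity problems) in a small set of records. The function name `profile_records`, the field names, and the sample customers are all hypothetical; a real legacy system would need its own rules for what counts as "required" or "the same record".

```python
from collections import Counter


def _text(value):
    """Normalise a field to lower-case text with single spaces (None becomes '')."""
    return " ".join(("" if value is None else str(value)).lower().split())


def profile_records(records, required_fields, key_fields, parent_ids, parent_ref):
    """Count incomplete, duplicate, and orphaned records in a list of dicts."""
    report = Counter()
    seen = set()
    for rec in records:
        # Incomplete: a required field is missing or blank.
        if any(_text(rec.get(f)) == "" for f in required_fields):
            report["incomplete"] += 1
        # Duplicate: same key once case and spacing differences are ignored.
        key = tuple(_text(rec.get(f)) for f in key_fields)
        if key in seen:
            report["duplicate"] += 1
        seen.add(key)
        # Integrity: the record points at a parent that does not exist.
        if rec.get(parent_ref) not in parent_ids:
            report["orphaned"] += 1
    return dict(report)


# Hypothetical sample data standing in for a legacy customer table.
customers = [
    {"name": "Acme Ltd", "city": "Leeds", "region_id": 1},
    {"name": "  acme  ltd", "city": "Leeds", "region_id": 1},  # duplicate, different format
    {"name": "Bolt Co", "city": "", "region_id": 1},           # incomplete
    {"name": "Crane Inc", "city": "York", "region_id": 9},     # unknown region
]
report = profile_records(customers, ["name", "city"], ["name"], {1, 2}, "region_id")
print(report)
```

Counts like these only indicate where to look; deciding how the utility should treat each case remains a requirements question.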

Create a Test Database
The successful delivery of any development project relies on how effectively the user requirements are translated into a working application. Similarly, a successful data migration should convert all the data correctly, and all business processes should operate on the converted data without any errors. In order to achieve this, we should have a correct and realistic idea of the data and range of inconsistencies with which we are dealing. This is best achieved using a snapshot or a scaled-down version of the actual database. As the utility is developed, it can be executed repeatedly on this test database to achieve the correct results.

Using a test database serves two purposes: The development team gets a real picture of what is expected of the conversion routine, and the operational system is not subjected to any conversion until the data migration utility is stable and yields expected results. This ensures that we do not corrupt the real system data or introduce inconsistencies into it.
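One way to picture the snapshot idea is with SQLite, chosen here purely for illustration because Python's standard library can copy a whole database through `Connection.backup`. The `customer` table, its contents, and the `snapshot` helper are assumptions made for this sketch; an enterprise database would rely on its own export or cloning tools.

```python
import sqlite3


def snapshot(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole source database into a separate in-memory test database."""
    test_db = sqlite3.connect(":memory:")
    source.backup(test_db)  # standard-library online backup API
    return test_db


# Stand-in for the operational system (hypothetical table and data).
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
live.executemany(
    "INSERT INTO customer (name) VALUES (?)",
    [("  acme ltd ",), ("Bolt Co",)],
)
live.commit()

test_db = snapshot(live)

# A trial conversion step runs against the copy only.
test_db.execute("UPDATE customer SET name = TRIM(name)")
test_db.commit()

live_names = [row[0] for row in live.execute("SELECT name FROM customer ORDER BY id")]
test_names = [row[0] for row in test_db.execute("SELECT name FROM customer ORDER BY id")]
print(live_names)
print(test_names)
```

The trial conversion changes only the copy, so a faulty run can simply be thrown away and the snapshot taken again.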

Use Iterative Development
A conversion utility is designed to cater to a particular customization and implementation project. It is difficult to get the complete conversion working correctly on the first attempt. When data is converted using a standard utility, often data records will be converted incorrectly or only partially. Listing all the possible problems in data records of the existing system is a difficult task. The best way to deal with the missing pieces or defects is to use an iterative approach. Once the conversion utility is developed, it can be executed on the test database.

Closely examine the resultant data, and then run critical business processes on this data. The defects found should be due to missing data, incorrectly converted data, or duplicate data. Resolve your problems and repeat the conversion. After a few iterations, a stable and relatively bug-free utility will be developed. The number of iterations will depend on the nature of defects and inconsistencies found after the test migrations. The extent and severity of the defects will depend not only on the size and complexity of the converted data but also on the quality of the data that is subject to conversion.
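The convert, examine, fix, and repeat cycle can be sketched as a small loop. Everything below is hypothetical: `convert_v1` and `convert_v2` stand for two successive versions of a conversion utility, and `expected` is a hand-checked result for a handful of test records. The defect categories are the three named above.

```python
import copy


def find_defects(expected, converted):
    """Classify defects as duplicate, missing, or incorrect records."""
    defects = []
    seen = {}
    for row in converted:
        if row["id"] in seen:
            defects.append(("duplicate", row["id"]))
        seen[row["id"]] = row["name"]
    for rec_id, name in expected.items():
        if rec_id not in seen:
            defects.append(("missing", rec_id))
        elif seen[rec_id] != name:
            defects.append(("incorrect", rec_id))
    return defects


def convert_v1(rows):
    # First attempt: drops rows with a blank name and leaves padding in place.
    return [{"id": r["id"], "name": r["name"]} for r in rows if r["name"].strip()]


def convert_v2(rows):
    # Second attempt: keeps every row, trims names, and fills blanks with a placeholder.
    return [{"id": r["id"], "name": r["name"].strip() or "UNKNOWN"} for r in rows]


# Hypothetical test records and the hand-checked result they should convert to.
test_rows = [{"id": 1, "name": " Acme "}, {"id": 2, "name": "  "}, {"id": 3, "name": "Bolt"}]
expected = {1: "Acme", 2: "UNKNOWN", 3: "Bolt"}

history = []
for attempt in (convert_v1, convert_v2):
    converted = attempt(copy.deepcopy(test_rows))  # fresh copy of the test data each time
    defects = find_defects(expected, converted)
    history.append((attempt.__name__, defects))
    if not defects:
        break  # stable on the test data; stop iterating

for name, defects in history:
    print(name, defects)
```

In practice each iteration involves people examining the output and fixing the utility; the loop only shows the shape of the process and the value of rerunning against an untouched copy of the test data.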

Have a Comprehensive Test Plan
As with any other application development effort, it is important to have a comprehensive test plan to achieve the required level of quality for a data migration tool. The main point to keep in mind is that the test plan for a conversion routine should focus on data integrity in addition to the usual test standards. There should be adequate room for regression testing that stresses critical business processes. These regression tests should be executed repeatedly at the end of each development iteration and after bug fixes.

The test plan should accept new and converted data while executing the critical business processes to ensure that users can work with both types of data after the data migration is completed. Use of a wide range of data will uncover many errors and help build a robust and effective conversion utility. Using a copy of the actual working database will give the test and development teams a truer picture of problems and their solutions.
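A regression check along these lines might look like the following sketch, which runs one stand-in business process over both newly entered and converted records. The `invoice_total` function, the order layout, and the sample cases are assumptions made for illustration, not part of any particular system.

```python
from decimal import Decimal


def invoice_total(order):
    """Stand-in for a critical business process: total an order's line items."""
    return sum(
        (Decimal(line["qty"]) * Decimal(line["price"]) for line in order["lines"]),
        Decimal("0"),
    )


def run_regression(cases):
    """Return the labels of cases that fail or give an unexpected total."""
    failures = []
    for label, order, expected in cases:
        try:
            ok = invoice_total(order) == Decimal(expected)
        except (KeyError, TypeError, ValueError, ArithmeticError):
            ok = False  # the process could not handle this record at all
        if not ok:
            failures.append(label)
    return failures


# Hypothetical cases: one record created in the new system, two converted ones.
cases = [
    ("new order", {"lines": [{"qty": 2, "price": "9.50"}]}, "19.00"),
    ("converted order", {"lines": [{"qty": "3", "price": "4.00"}]}, "12.00"),
    ("converted order, bad price", {"lines": [{"qty": "1", "price": "4,00"}]}, "4.00"),
]
failures = run_regression(cases)
print(failures)
```

Rerunning the same cases after every iteration and bug fix shows whether a change to the utility has broken data that previously converted cleanly.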

CMCrossroads is a TechWell community.
