How to Plan a Data Migration Project
Sometimes it’s just time to move on. In the world of data, if you want to break up with your old software, you’re going to need a plan to migrate your data. In basic terms, data migration is the transfer of data from one system to another. Typically, you migrate data during an upgrade of existing hardware or when you move data to a new system altogether. For example, a company might be moving data from on-premises systems to the cloud, or it might be upgrading a database to the latest version.
Commonly, an ETL (Extract, Transform, and Load) tool is used to move the data; however, finding a good ETL tool is only part of the picture. The migration plan will determine the ultimate success of your project.
Step 1: First things first — understand your goals and define a measure of success
While understanding your goals may seem obvious, it’s easy to miss the nuances of the requirements for the project if you don’t document and review them with critical stakeholders. Miss this crucial step, and you may end up solving the wrong problem. Gather the business requirements, write them down, and review them with stakeholders. Once you have a clear understanding of the business requirements, you can start to consider the solutions that make sense. As a part of this process, you should define what success looks like. What will be your measures of success? Do your systems need to remain up during the migration? What percentage of your data should be migrated successfully?
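A measure of success is easier to review with stakeholders when it is written down as something you can check. The following is a minimal sketch, assuming success is defined as the percentage of source rows confirmed in the target; the names `MigrationResult`, `success_rate`, and `meets_target`, and the 99.5% threshold, are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class MigrationResult:
    source_rows: int     # rows read from the old system
    migrated_rows: int   # rows confirmed in the new system


def success_rate(result: MigrationResult) -> float:
    """Percentage of source rows that arrived in the target."""
    if result.source_rows == 0:
        return 100.0  # nothing to migrate counts as complete
    return 100.0 * result.migrated_rows / result.source_rows


def meets_target(result: MigrationResult, required_pct: float) -> bool:
    """True when the run meets the threshold agreed with stakeholders."""
    return success_rate(result) >= required_pct


# Hypothetical run: 10,000 source rows, 9,990 confirmed in the target.
run = MigrationResult(source_rows=10_000, migrated_rows=9_990)
print(f"{success_rate(run):.2f}% migrated")
print("target met:", meets_target(run, required_pct=99.5))
```

Your own definition may also include uptime or latency limits; the point is that the threshold is agreed before the migration starts, not after.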
Step 2: Know what you have — and what you don’t
After you have a clear sense of your goals, it’s time to assess what you have and what it will take to meet the project requirements. You need a full inventory of all your data assets and the applications associated with them. This includes an understanding of the upstream and downstream applications that may be affected by your proposed change. This analysis is sometimes called a gap analysis: it measures the gap between what you have and what you need.
Step 3: Data profiling — know your data
Before you migrate your data, it’s essential that you understand its current state. If your existing data is in bad shape, moving it to a new system isn’t going to improve it. Profiling your data will help you understand whether there are blank or null values, whether values are unique or duplicated, and whether the data patterns and values fall within the ranges you expect.
Understanding these aspects of your data will help to determine what sort of cleansing you’ll need to do as a part of the migration process.
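The profiling checks described above (missing values, duplicates, and values outside an expected range) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, assuming the data has already been read into a list of dictionaries; the function `profile_column` and the sample `customers` records are hypothetical. A dedicated profiling or ETL tool would normally do this at scale.

```python
from collections import Counter


def profile_column(rows, column, min_value=None, max_value=None):
    """Summarize one column: missing, distinct, duplicate, out-of-range counts."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing values.
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    # Every occurrence beyond the first of a value counts as a duplicate.
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    out_of_range = 0
    for v in present:
        too_low = min_value is not None and v < min_value
        too_high = max_value is not None and v > max_value
        if too_low or too_high:
            out_of_range += 1
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "duplicates": duplicates,
        "out_of_range": out_of_range,
    }


# Hypothetical sample records standing in for the source system.
customers = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},
    {"id": 3, "email": "a@example.com", "age": 212},
    {"id": 4, "email": None, "age": None},
]

email_profile = profile_column(customers, "email")
age_profile = profile_column(customers, "age", min_value=0, max_value=120)
print(email_profile)
print(age_profile)
```

In this sample, the email column has two missing values and one duplicate, and the age column has one value outside the expected range, which is the kind of finding that feeds the cleansing rules for the migration.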
Step 4: Detailed planning
Once you understand your goals, and you’ve analyzed the current state of your assets and your data, you need to get planning. For each phase of the project, you need to strategize:
- Schema matching or mapping. The high-level organization of the data in a data store is called a schema. When you move from one system to another, you need to map the schema of the source data to a schema in the target system.
- Data mapping. This is like schema mapping, but more granular: it works at the level of individual fields. When you map data, you’ll need to consider each source field’s data type and the type the target field expects. You will likely need to transform data types and cleanse the data as a part of this process.
- Recovery plan. Each stage of the data migration should include recovery options. Plan how you can roll back changes during extraction, transformation, and loading. You may choose to use a staging area or move data in batches to ensure that you can safely recover your data.
- Test planning. As a part of your project planning, you should have a test plan in place. Consider what steps will need to be tested before you can safely move data. Will you use a staging area or a test environment before you move data? How will you verify that the data quality is acceptable? What amount of error is acceptable? Consider resiliency — how much downtime or latency is acceptable?
- Go live plan. This is a high-level plan that encompasses the steps to go live: ensuring that the systems are prepared, that personnel are available and trained to perform the migration, that affected stakeholders have been notified, and that a time window for the move has been scheduled.
- Security planning. You need to plan how you will ensure the security of your data while it is being migrated. Additionally, just as with traditional on-prem solutions, you should consider how applications secure objects and data. For instance, how can you ensure data can be secured at the appropriate granularity (row or column level)? Do you need to mask data or ensure that PII (Personally Identifiable Information) is removed? Consider the end-to-end handling of the data: whether you need encryption in motion and at rest, which systems can access the data during transit, and how long it should be securely retained.
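Several of the planning items above (data mapping, batch loading with a recovery path, and masking PII) can be sketched together. The following is a minimal illustration, not a definitive implementation: an in-memory SQLite database stands in for the target system, and the table, the field names, `FIELD_MAP`, and the helper functions are all hypothetical. The truncated hash is only a placeholder for whatever masking policy your security plan requires.

```python
import hashlib
import sqlite3
from datetime import datetime


def parse_date(value):
    """Convert a 'DD/MM/YYYY' string to ISO 'YYYY-MM-DD'."""
    return datetime.strptime(value, "%d/%m/%Y").date().isoformat()


def mask_email(value):
    """Replace an email with a one-way hash so the raw address is not loaded."""
    return hashlib.sha256(value.lower().encode("utf-8")).hexdigest()[:16]


# Hypothetical data map: source field -> (target column, type transformation).
FIELD_MAP = {
    "cust_no": ("customer_id", int),
    "signup": ("signup_date", parse_date),
    "email": ("email_hash", mask_email),
}


def transform(row):
    """Apply the data map to one source row."""
    return {target: convert(row[source])
            for source, (target, convert) in FIELD_MAP.items()}


def load_in_batches(conn, rows, batch_size=2):
    """Load rows batch by batch; a failing batch is rolled back as a whole."""
    loaded, failed = 0, []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        try:
            with conn:  # commits on success, rolls back on any exception
                conn.executemany(
                    "INSERT INTO customers (customer_id, signup_date, email_hash) "
                    "VALUES (:customer_id, :signup_date, :email_hash)",
                    [transform(r) for r in batch],
                )
            loaded += len(batch)
        except (ValueError, KeyError, sqlite3.Error) as exc:
            failed.append((start, str(exc)))  # record the batch offset for retry
    return loaded, failed


# In-memory SQLite stands in for the target system (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, signup_date TEXT, email_hash TEXT)"
)

source_rows = [
    {"cust_no": "101", "signup": "03/01/2021", "email": "Ann@example.com"},
    {"cust_no": "102", "signup": "15/02/2021", "email": "bob@example.com"},
    {"cust_no": "103", "signup": "20/03/2021", "email": "cy@example.com"},
    {"cust_no": "101", "signup": "21/03/2021", "email": "dup@example.com"},
]

loaded, failed = load_in_batches(conn, source_rows)
row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print("loaded:", loaded, "failed batches:", failed, "rows in target:", row_count)
```

In this sample, the second batch contains a duplicate key, so the whole batch is rolled back and the target is left with only the two rows from the first batch. Loading in batches keeps each failure small and recoverable: the recorded offset tells you which batch to fix and retry without redoing the batches that already succeeded.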