
Why Data Debt is a Powerful Metric for Proving Data Management and Governance


“Data debt” is a term based on the concept of “technical debt,” which comes from the Agile software development world. Technical debt refers to the cost incurred by deferring a software feature, or by choosing a quick, easy solution over a more thoughtful one that would take longer (or be more difficult) to achieve.

As a concept and a metric, data debt has a similar meaning. It can reveal the huge costs of delaying “doing the right things” with data and information, particularly to numbers-focused senior leaders in your organization.

Data debt is not a magic bullet to be used, for example, by the data governance team to get things to always go a desired way. It is also not a magical ROI for data governance. However, it can be powerful as a way to track progress, support decision-making and show the tangible value of data governance and data management.

Defining Data Debt

Data debt has two definitions and uses:

  • As a metric, it’s an unofficial measurement depicting what an organization “borrows” when it chooses to not pay for something that is currently needed or will be in the future. This debt could typically be avoided by funding and executing basic data governance and data management activities. But that’s a perfect world, right?

Consider this example: You have a new data field or table that isn’t recorded in the glossary, and no one is 100% certain of its meaning. Each time someone wants to use that data, they spend time looking it up, and their labor is a cost. Since you know you’re incurring this cost every time, you’ve created a debt. The data debt metric is the anticipated amount of time that will be wasted multiplied by a rate for the cost of that time. If you had taken the time to document the new field or table, you would have saved your organization countless wasted hours.

Data management people have known for years that enormous costs are incurred the longer you delay even the simplest and most basic levels of data management. Data debt now provides an actual number and rationale for that discussion.
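To make the metric concrete, here is a minimal Python sketch of the calculation described above: anticipated wasted time multiplied by a cost rate. The function name, its parameters and the sample figures are hypothetical and chosen only for illustration; your organization would substitute its own estimates.

```python
def data_debt(hours_per_lookup, lookups_per_year, hourly_rate, years=1):
    """Estimate data debt as anticipated wasted time times a cost rate.

    All inputs are estimates supplied by the caller; nothing here is
    measured automatically.
    """
    wasted_hours = hours_per_lookup * lookups_per_year * years
    return wasted_hours * hourly_rate


# Hypothetical figures: half an hour per lookup, 200 lookups a year,
# and a labor rate of $80 per hour.
debt = data_debt(hours_per_lookup=0.5, lookups_per_year=200, hourly_rate=80.0)
print(f"Estimated data debt: ${debt:,.2f}")
```

With these assumed inputs, the undocumented field costs 100 hours a year. The value of the number lies less in its precision than in giving the funding discussion something specific to react to.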

  • As a message, data debt can be a concise, relevant metaphor for data governance and for the business and IT areas to communicate and prioritize decisions around data-intensive efforts. “Should we fund ‘ABC’ or not … and why?” “Are we going to increase our data debt by deferring a decision about ‘XYZ’ and having to pay for it later to make this data more leverageable?”

Here’s an example: During a planning meeting for a large application that’s under development, it becomes clear that the current design calls for a new item master (i.e., a record that lists key information about an item). The entire team knows this will be a duplicate item master and that it will cause synchronization issues in the future, as well as potential errors in reports and data analysis.

In the best-case scenario, the team has a process to elevate concerns to ensure someone knows about this duplication of master data. If they are lucky, the program manager will authorize resources to address this data issue. But, sadly, the typical reaction is often “The deadline is important, so we will have to fix it later.”

Heads up! In this scenario, you have just signed the promissory note. But if there were a data debt policy, or even if data debt had been discussed conceptually, there would likely be no issue. The conversation would go something like this: “Does it increase data debt?” “Yes.” “Then we will not do it.”

Data Debt in Action

As you can see with these examples, we incur debt when we choose to manage data casually. Or sometimes, “stuff happens” and we consciously accrue and acknowledge the debt. But when we use data debt — either as a metric or message — it can be quite effective.

Consider these applications for using data debt in your organization:

  • As a governor for analytics projects
  • As a way to value data assets (or liabilities)
  • To sustain enterprise information management initiatives

Here’s an example of data debt in action using simple numbers: Your IT budget is $100. You track both purchases and any data debt you’re racking up. At some point in the future, you spend $10 of your $100 budget (or 10% of your IT spend) to fix the data debt you accumulated. Now extrapolate this to a large company that spends $1 billion a year on IT. At the same 10% rate, it would cost the organization $100 million to remediate this data debt!
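The extrapolation can be checked with a few lines of Python. This is only a sketch of the arithmetic in the example; the function name is hypothetical, and the 10% share is the illustrative figure from the text, not a benchmark.

```python
def remediation_cost(it_budget, debt_percent):
    """Cost of paying off data debt, given as a percentage of IT spend."""
    return it_budget * debt_percent / 100


# The simple case from the text: $10 of a $100 budget is 10%.
small_company = remediation_cost(it_budget=100, debt_percent=10)

# The same 10% applied to a company spending $1 billion a year on IT.
large_company = remediation_cost(it_budget=1_000_000_000, debt_percent=10)

print(f"Small budget: ${small_company:,.0f}")
print(f"Large budget: ${large_company:,.0f}")
```

The percentage is the only assumption that carries over between the two cases, so that is the figure worth tracking and challenging in your own organization.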

Data Debt Quadrant

Let’s leverage another aspect of the technical debt concept, its quadrant framework*, to demonstrate how organizations accumulate data debt:

  1. Ignorance debt. For expediency’s sake and without realizing the full extent of the cost, we do something that will be expensive to re-do later (e.g., standalone and redundant master data). We recklessly make a decision about data and do so without acknowledging the impact of data debt. Moving away from the “ignorance” level of managing data debt will require some education and a good bit of sponsorship.
  2. Selfish debt. We know full well this is not the best way, but politics, ignorance or other attitudes instill a “ready, fire, aim” approach. We know the cost and do it anyway and make no allowances for debt remediation.
  3. Immature debt. We learn our lessons from a bad project and end up knowing the cost of our mistakes. The path forward here is reinforced data governance. (Side note: Many of the companies First San Francisco Partners works with are in this part of the quadrant. They’re making prudent decisions, but are just now grappling with the long-term cost of prior decisions and events.)
  4. Acknowledged debt. We know the cost of accruing data debt, but it’s our best choice right now, and we formulate a plan to lower the debt later. Remember, data debt is not a magic bullet that forces you to always do data management; it is also a means for making prudent business decisions.
*Software development thought leader Martin Fowler’s technical debt quadrant concept inspired this data debt example.

The data debt quadrant demonstrates how organizations accumulate data debt, either through ignorance or other means.
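One way to read the quadrant is as two yes/no questions: was the decision deliberate, and was it prudent? The Python sketch below encodes that reading. The two axis names are borrowed from the technical debt quadrant, and the mapping of each pair of answers to a debt type is an assumption based on the four descriptions above, not an official definition.

```python
from enum import Enum


class DebtType(Enum):
    IGNORANCE = "Ignorance debt"
    SELFISH = "Selfish debt"
    IMMATURE = "Immature debt"
    ACKNOWLEDGED = "Acknowledged debt"


# Assumed mapping of (deliberate, prudent) answers to a quadrant.
QUADRANT = {
    (False, False): DebtType.IGNORANCE,    # cost not realized, no plan
    (True, False): DebtType.SELFISH,       # cost known, done anyway
    (False, True): DebtType.IMMATURE,      # cost learned after the fact
    (True, True): DebtType.ACKNOWLEDGED,   # cost known, payoff planned
}


def classify(deliberate: bool, prudent: bool) -> DebtType:
    """Return the quadrant for a data decision under the assumed mapping."""
    return QUADRANT[(deliberate, prudent)]


print(classify(deliberate=True, prudent=True).value)
```

Tagging each known debt item with its quadrant makes it easier to see whether the organization mostly needs education, sponsorship or stronger governance.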

Managing Data Debt

Here are scenarios of how companies might manage this debt using the data debt quadrant. For the moment, let’s set aside judgment about the risk of doing so; postponing GDPR compliance, for example, could be an expensive proposition for a company!

  1. Ignorance debt. This is the data debt scenario where a department or development team sees the need for a file or data store and creates it without any consideration of the ramifications for data assets.
  2. Selfish debt. A data scientist doesn’t want to wait for consumer data to be cleaned up, and neither does the marketing person he’s working with. They recognize there is a margin of error with the approach to move forward, and that they are assuming some risk and potential reduced value of the marketing campaign. If an organization is experiencing this scenario, some sort of training is required to support a process to analyze the difference between proceeding as planned vs. taking the time to clean up the data sources.
  3. Immature debt. A department wants to add a new BI/Analytics application, but is told they must defer the decision until next year. They decide to include both the application cost and the “cost of not doing anything” as two separate line items on next year’s budget. They do this to signal to leadership that the company is spending money dealing with the existing BI application.
  4. Acknowledged debt. A small company decides to postpone its investment in expensive data lineage tools and do data lineage manually. This means that, for a period of time, they are willing to assume some risk around being compliant with the General Data Protection Regulation (GDPR). They estimate the data debt and set up an allowance to pay it off and reconcile the noted risk next year.

What To Do About Data Debt

Whether deliberate or inadvertent, reckless or prudent, the mismanagement of data creates debt for an organization. Like all debts, it must be paid eventually, either slowly over time (and with interest) or in one big chunk.

Your data management program needs to address data debt. This could include setting a policy for how much debt is tolerable, deciding how to pay off data debt and determining how best to educate the organization about this powerful tool.
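A tolerance policy could be as simple as a ceiling on total data debt, expressed as a share of the IT budget. The Python sketch below shows one possible check. The DebtItem class, the within_tolerance function, the item names and the 10% ceiling are all hypothetical; a real policy would define its own register and thresholds.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    name: str
    estimated_cost: float  # estimated cost to remediate, in dollars


def within_tolerance(items, it_budget, max_debt_share):
    """Return (ok, total) where ok is True if total debt fits the policy."""
    total = sum(item.estimated_cost for item in items)
    return total <= it_budget * max_debt_share, total


# Hypothetical register, checked against a ceiling of 10% of a $100 budget.
register = [
    DebtItem("Duplicate item master", 6.0),
    DebtItem("Undocumented data field", 3.0),
]
ok, total = within_tolerance(register, it_budget=100, max_debt_share=0.10)
print(f"Total debt: ${total:.2f}, within policy: {ok}")
```

A check like this turns “Does it increase data debt?” into a question with a recorded answer, which is what the planning-meeting conversation needs.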

Are you ready to tackle data debt? Does the concept resonate with you? Share your thoughts in the comments section.

Article contributed by John Ladley. He is a business technology thought leader and recognized authority in all aspects of Enterprise Information Management. He has 30 years’ experience in planning, project management, improving IT organizations and successful implementation of information systems. John is widely published, co-authoring a well-known data warehouse methodology and a trademarked process for data strategy planning. His books, “Making EIM Work for Business – A Guide to Understanding Information as an Asset” and “Data Governance – How to Design, Deploy and Sustain an Effective Data Governance Program,” are recognized as authoritative sources in the EIM field.
