Mastering and Managing Data Understanding


Our first two articles in this series introduced the essential capabilities that contribute to the success of analytics-based initiatives: business alignment (determining context and value); data understanding (seeking to better understand data assets and manage accordingly); data quality (defining accuracy for data’s use); data-centric processes (increasing understanding as new data is created, used, managed and measured); and data-centric resources (embedding data-oriented knowledge and skills).

To review:

  1. Business alignment: Determine context and value of using information.
  2. Data understanding: Seek to better understand data assets and manage accordingly.
  3. Data quality: Define accuracy for the purpose for which data is being used.
  4. Data-centric processes: Increase understanding as new data is created, used, managed and measured as part of operational processes.
  5. Data-centric resources: Embed data-oriented knowledge and skills throughout the staff.

In this article, I will cover the “data understanding” capability. Sometimes this area is called “data inventory” or “data landscape.” Regardless of the label, once you have business alignment and a clear view of how you will use your data, then it is time to understand the specific data needs. This capability also establishes the framework for how you will gather, manage and use metadata. You need to understand not only where data is (inventory) but also its current usage (sources).

BIR℠ seeds data understanding

If you reflect back to the business alignment article, we developed solid alignment to business needs and produced a catalog of business information requirements℠ (BIR). We have, in effect, seeded the data understanding capability we need to develop. Once there is an articulation of the information requirements, then it is possible to break them down into the specific data needs.

Data understanding is the knowledge that you have about the data, the needs that the data will satisfy, its content and location. To be clear, it is much more than current location and a definition of what a data element means in situ within an application or database.

There is no single tool or artifact for data understanding. Organizations express it through business glossaries, data dictionaries, models and other forms of metadata, wherever information about the data is stored.

Collecting all of this information can be difficult. We strongly recommend you use a process that generates your data capability metadata as you lay out your requirements and delivery architectures. This should not be an “after the project” process.

The role of data characteristics

At First San Francisco Partners, we use this standard set* of characteristics to get us started toward the details of data understanding:

  • Granularity: The level of detail required to support the requirement (applies mostly to measurements and dimensions).
  • Fact Volatility: The frequency at which new data is added or updated for usage.
  • Dimensional Complexity: The relative nature of how extensive and/or abstracted the various reporting dimensions are.
  • Dimensional Volatility: The frequency at which dimensions change, are added to or updated for BI uses.
  • Algorithm Complexity: Statement of the relative sophistication of the algorithm that is applied as part of this metric or BIR.
  • Style of Use: Classification of how the requirement is used. Is this requirement being used operationally or managerially, or is it analytical in nature? (They are not mutually exclusive.)
  • Historicity: The extent to which historical reporting requirements are necessary as part of this BIR or metric.
  • Latency: The time between when the data is available and when it is required to be placed into the framework (often called velocity).
  • Cross Functionality/Distribution: The extent to which the information will be used across an enterprise or external to the enterprise.
  • Size: Relative amount of logical data required to meet all granularity and dimensional requirements.
  • Sensitivity: Is the BIR regulated or subject to privacy oversight, or does it otherwise present a level of risk that needs to be addressed?
  • Source Complexity: Defines relative complexity of gathering and moving data into the information framework.
  • Content Variety: The variety of organization and “at rest” formats for data — digital media, document, email, discrete field, etc.
  • Frequency: How often the information is accessed to produce a particular measure or requirement; differs from periodicity, e.g., daily activity is accessed weekly.
  • Response Time: The speed required for the enterprise to react to the metric (or BIR fact) and take action, e.g., with customer or other touch point.
  • Follow-up Time: The time desired to allocate to responding to a metric or stimulus, i.e., you can respond quickly but will only be able to spend an hour on the event.
  • Data Quality: Degree of usefulness, accuracy or effectiveness of source data for this BIR or metric.
  • Availability: Description of when the requirement needs to be available to be used.
  • Persistence: The extent to which the data set supportive of this metric or BIR remains stable or needs to remain stable.
  • Access Type: Mode of data access, e.g., will the information be accessed directly, made available on mobile devices, used in an advanced scientific fashion or simply rolled off onto reports?

Each BIR is evaluated across each characteristic. Besides revealing obvious architecture patterns, you will find this exercise draws out a deep understanding of the data.
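As an illustration only, the evaluation described above could be recorded in a simple scorecard. The sketch below is not First San Francisco Partners' tooling: the characteristic names come from the list above, but the 1-to-5 rating scale, the class name BIR, the helper high_demand and the two sample requirements are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Characteristic names taken from the list above.
CHARACTERISTICS = (
    "Granularity", "Fact Volatility", "Dimensional Complexity",
    "Dimensional Volatility", "Algorithm Complexity", "Style of Use",
    "Historicity", "Latency", "Cross Functionality/Distribution", "Size",
    "Sensitivity", "Source Complexity", "Content Variety", "Frequency",
    "Response Time", "Follow-up Time", "Data Quality", "Availability",
    "Persistence", "Access Type",
)


@dataclass
class BIR:
    """One business information requirement and its ratings (hypothetical)."""
    name: str
    ratings: dict[str, int] = field(default_factory=dict)

    def rate(self, characteristic: str, score: int) -> None:
        # Assumed scale: 1 = least demanding, 5 = most demanding.
        if characteristic not in CHARACTERISTICS:
            raise ValueError(f"Unknown characteristic: {characteristic}")
        if not 1 <= score <= 5:
            raise ValueError("Score must be between 1 and 5")
        self.ratings[characteristic] = score

    def unrated(self) -> list[str]:
        # Characteristics that still need a deliberate decision.
        return [c for c in CHARACTERISTICS if c not in self.ratings]


def high_demand(birs: list[BIR], characteristic: str, threshold: int = 4) -> list[str]:
    """Names of BIRs rated at or above the threshold for one characteristic."""
    return [b.name for b in birs if b.ratings.get(characteristic, 0) >= threshold]


# Hypothetical sample requirements.
churn = BIR("Customer churn rate")
churn.rate("Latency", 2)
churn.rate("Historicity", 5)

fraud = BIR("Real-time fraud alerts")
fraud.rate("Latency", 5)
fraud.rate("Sensitivity", 5)

print(high_demand([churn, fraud], "Latency"))  # ['Real-time fraud alerts']
print(len(churn.unrated()))                    # 18
```

Grouping requirements by the characteristics on which they rate as demanding is one simple way to surface the architecture patterns mentioned above, and the unrated list shows which characteristics still need to be discussed with business users.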

Note that we have not really “filled in” the more detailed aspects of metadata. We are not declaring field types, lengths, ranges or rules. This still needs to be done. But we have started at a business view and engaged business users deeply at this point.

Again, these are characteristics that we have evolved over many years. Your organization may need to alter one of these to be more relevant. You may even have another (and we would love to see it), so feel free to add it in. At minimum, consider each one of these deliberately and in the context of your EIM needs.

The detailed aspects, such as sources, physical characteristics, movement, ranges, rules and standard values, can all come along now without a lot of debate, since we have established and agreed upon a large set of essential characteristics.
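To make the detailed aspects concrete, here is a minimal sketch of a data dictionary entry that keeps its business context attached. Every name in it (DataElement, supports_bir, the sample elements and their values) is hypothetical and chosen for illustration; it is one possible shape for such a record, not a prescribed format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataElement:
    """A hypothetical data dictionary entry tied to a business requirement."""
    name: str
    definition: str
    supports_bir: str          # the requirement that gives this element context
    source_system: str
    data_type: str
    max_length: Optional[int] = None
    valid_range: Optional[tuple] = None   # (low, high), inclusive
    standard_values: frozenset = frozenset()
    steward: str = ""          # who is accountable for questions about the data

    def is_valid(self, value) -> bool:
        # Apply only the rules that have been declared for this element.
        if self.max_length is not None and len(str(value)) > self.max_length:
            return False
        if self.valid_range is not None:
            low, high = self.valid_range
            if not (low <= value <= high):
                return False
        if self.standard_values and value not in self.standard_values:
            return False
        return True


# Hypothetical entries.
region_code = DataElement(
    name="region_code",
    definition="Sales region in which the order was booked",
    supports_bir="Quarterly revenue by region",
    source_system="order_entry",
    data_type="string",
    max_length=2,
    standard_values=frozenset({"NA", "EU", "AP"}),
    steward="Sales Operations",
)

discount_pct = DataElement(
    name="discount_pct",
    definition="Discount applied to the order, as a percentage",
    supports_bir="Quarterly revenue by region",
    source_system="order_entry",
    data_type="integer",
    valid_range=(0, 100),
)

print(region_code.is_valid("EU"))    # True
print(region_code.is_valid("EMEA"))  # False
print(discount_pct.is_valid(120))    # False
```

The point of the supports_bir and steward fields is that lengths, ranges and standard values are recorded alongside the requirement they serve and the person accountable for them, rather than in isolation.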

We cannot emphasize this enough: You do not want to start with field lengths and a definition of a data element without any context. This is a time-consuming and ineffective approach. This might be part of the process to gather existing glossary items and develop a sense of source and the variety of meaning.

But this will not support effective analytics and data usage in the long term. Instead, work into the detailed metadata aspects from a position of understanding the context, i.e., a thorough, efficient understanding of how the data is to be used.

This exercise also supports the stewardship process, as it is also critical to understand who is accountable for that data in case there are questions or points of clarification needed. While we are not covering data governance in this series, remember it is just as important to the success of any analytic effort as the data understanding or technology. Very often, in successful data-driven organizations, data governance functions actually oversee the processes and methods we just covered.

Bottom line

The data understanding capability is a true manifestation of operational management of data as an asset. Besides the mechanical aspects of making metadata, you are delving into understanding how data is to be used. This results in more effective deployment of solutions.

Candidly, the techniques we are covering in this series of articles evolved from our experiences of being called in to fix many failed BI and analytics programs. We still see too many “analytics” projects that are just simple BI. We see heavy investing in new technologies when better business alignment could have fixed the problem.

It never hurts to review and remember that techniques support your process — and these are the goals you want to accomplish:

  • Prioritize BI/analytics efforts to meet business needs.
  • Reach consensus on business vocabulary and definitions.
  • Craft solutions that meet all customer needs and deliver value.
  • Manage costs and risks embedded in current reporting and BI approaches.
  • Partner to expedite “time-to-market” (learn and share).

Do not lose sight of the ultimate goal: to align your business to appropriate BI and analytical solutions. By using a data understanding framework, you will know your data inventory, its sources and uses and be better positioned for success.

*BIR℠ characteristics are property of First San Francisco Partners, copyright 2012. Duplication or use without permission is prohibited.
Originally published on our channel “Turning Data Into Insight” by John Ladley, July 28, 2016

John Ladley is a business technology thought leader and recognized authority in all aspects of enterprise information management. He has 30 years’ experience in planning, project management, improving IT organizations and successful implementation of information systems and is a widely published author.
