Data Ethics

Bias in Data Handling: A Primer

By Deborah Henderson

Bias shapes and colors every aspect of our lives. We don’t like to think we are biased, because we would rather be known for our fact-based decision-making and well-informed perspectives. Bias is alive and well in the world of data, too. And though I have read others’ thoughts on this topic, I feel compelled to share mine, too.

No one in an organization wants to believe they could fall into the trap of biased handling of data — particularly advanced analytics and reporting teams who would surely leave biased practices at the door. And besides, aren’t our workplaces filled with “good people,” so we don’t have to really be concerned about this?

We do have to care about bias related to data-handling. Why are we susceptible to bias in the workplace? It’s because we want our departments and organizations to be successful … and we want to be successful, too. We want our decisions to be sound ones that are based on unbiased approaches. Because of this, companies — and we as individuals — usually don’t self-monitor explicitly for bias.

But bias is always in play. It is part of the personal, social, corporate and cultural perspectives we bring to our work. This is why data-related risks caused by bias need to be assessed and controlled, and we need every business and IT stakeholder to acknowledge this. We should discuss and establish boundaries and expectations, and offer regular training to reinforce awareness of the effects of biased handling of data and the ways to avoid bias-based risks.

Analysts and others in the organization transform data into information by contextualizing and interpreting it. What’s fair to do, and what isn’t? Unfortunately, it would be hard to get people to agree on standards. But that doesn’t mean we shouldn’t try.

Bias in Data Collection

In my career, I’ve seen specific examples of bias in advanced analytics practices. A common one could be called “hunch and search.” In order to get a desired result, the data used as the input to analysis might be pre-selected by a biased source. Analysts may have a hunch, and to satisfy it, they search for the data to tell that story. Sometimes this is intentional; other times it’s not.

Data sampling is another area prone to bias. Sampling in itself is, of course, not biased; however, it is virtually impossible for humans to sample without some sort of bias. The best way to avoid this type of bias is to use statistical packages to select samples and adequate sample sizes. But this effort will only be as bias-free as the human-supplied parameters and the full dataset from which the sample is drawn.
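To make the sampling point concrete, here is a minimal sketch using only Python’s standard library. It is an illustration, not a prescribed method: the function names, the seed, the customer ID list and the default parameters (95% confidence, 5% margin of error, an assumed proportion of 0.5) are all hypothetical choices, and the size calculation uses the common Cochran formula with a finite population correction. Note that every one of those parameters is still supplied by a human.

```python
import math
import random


def required_sample_size(population_size, z=1.96, margin=0.05, p=0.5):
    """Estimate a sample size with Cochran's formula plus finite population correction.

    z, margin and p are human-supplied assumptions (illustrative defaults only).
    """
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


def draw_sample(population, sample_size, seed=2024):
    """Draw a simple random sample without replacement.

    A fixed seed lets someone else repeat and audit the exact same draw.
    """
    if sample_size > len(population):
        raise ValueError("sample_size exceeds population size")
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical sampling frame: 1,000 customer IDs.
customer_ids = list(range(1, 1001))
size = required_sample_size(len(customer_ids))
sample = draw_sample(customer_ids, size)
```

The software removes the analyst’s hand from choosing individual records, but if the list of customer IDs itself leaves out part of the real population, the sample inherits that bias.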

Biased Use of Data

Next is biased use. In this type, the dataset is unbiased but it is used to support a chosen conclusion. Is a company truly doing well or poorly? In order to give a positive impression, the corporate balance sheet that is presented to the public and shareholders could omit worrisome debt, which is relabeled as “off-balance-sheet transactions” or not reported at all. (Enron comes to mind for me.)

In its extreme form (like in our global political environment), purposeful misinterpretation of data could be called propaganda. And if both sides are doing this, they’ll likely find that the same data can be used to support more than one argument. For example, this can be done by putting the emphasis on one narrow story from the data that is statistically unsupported.

We also see bias in the graphs and charts we produce, where the scaling can be chosen so that different stories are told. I’m reminded of a recent visit to my bank, where I saw a poster on the wall from a well-known investment research firm. It showed the financial markets going back 100 years and the massive increase in the value of stocks over time. I noticed that the crash of 2008, which devastated many companies and individuals, was scaled quite small on the illustration. It gave the impression that nothing should ever be done to adjust equities portfolios in a market downturn, because the market will always continue to rise regardless of crashes. That (of course) is the message the investment department wants to send, so that you keep your money with them. But if you weren’t alive 100 years ago, your investment needs today would certainly be more nuanced.

My message here is that making good data visualizations is difficult. Often there are many options on how to visualize data and what message to send, as well as a requirement to understand how the visualization might be received by the intended audience.
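The scaling effect in the poster anecdote comes down to simple arithmetic. The sketch below uses made-up numbers (an index falling from 100 to 50, and two hypothetical axis ranges) to show how much of a linear chart’s height the same drop occupies. The function name and all values are assumptions for illustration, not figures from the poster.

```python
def visual_drop_fraction(peak, trough, axis_min, axis_max):
    """Return the share of a linear vertical axis that a peak-to-trough drop occupies."""
    if axis_max <= axis_min:
        raise ValueError("axis_max must be greater than axis_min")
    return (peak - trough) / (axis_max - axis_min)


# Hypothetical crash: an index falls from 100 to 50, a 50% loss.
peak, trough = 100, 50

# Short-horizon chart: the axis tops out just above the peak.
short_chart = visual_drop_fraction(peak, trough, axis_min=0, axis_max=120)

# Long-horizon chart: later growth stretches the axis to 2,000.
long_chart = visual_drop_fraction(peak, trough, axis_min=0, axis_max=2000)

print(f"Short chart: drop fills {short_chart:.1%} of the axis")
print(f"Long chart:  drop fills {long_chart:.1%} of the axis")
```

The loss to an investor is identical in both cases; only the axis range changes how alarming it looks.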

Biased Perspectives

Last on my biased handling of data list is context and culture. I mentioned this before, but it bears repeating. We are all susceptible to bias, many forms of which we are not fully aware. These biases could be cultural or context-based, so we must remember to check our analytical culture or context for any perspectives we have not considered.

Parting Thoughts on Biased Handling of Data

To keep bias at bay in your organization, I urge you to:

  • Think of your advanced analytics practices as scientific research, even if your business is not in the science sector. Scientists are charged with an unbiased approach to testing their hypotheses.
  • Develop policies and procedures that establish a framework for the ethical handling of data. In the advanced analytics requirements analysis and design phases, review and control project assumptions, context, dataset profiles and usage of analytic outcomes.
  • Become sensitive to data visualization messages and audiences’ information needs. If you are a charting wizard, be careful with statistics! Gear the visualization to the intended audience (e.g., Marketing), but also be sensitive to the ultimate consumer. They may not have the same sophistication in seeing the intended story, or worse, they may become skeptical of the whole thing.
  • Train employees on awareness of bias, and link bias to corporate reputation. Build a bias-sensitive corporate culture. Ultimately, we all want to work in ethical environments.
  • Remember that your company’s reputation is on the line every day. Biased data-handling, in a worst-case scenario, could generate unwanted media attention and even legal exposure for your company.
  • Care about your reputation, too. You could be the first line of defense in refusing to add bias to the data you work with. It’s okay to take a stand. And others may follow.

Note: In this article, Deborah refers to her ideas on bias first published in the DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK), Second Edition, 2017, Chapter 3 – Data Governance – Data Handling Ethics.