The recent United States presidential election was hard fought on both sides and generated a lot of emotion. It seems that divisions remain, but one thing both sides can agree on is the surprising lack of accuracy in the predictions about who would win and by how much.
This seems to have dealt a considerable blow to the reputation of data science. The perception of failure is now widespread, and there can be little doubt that it is influencing more general attitudes toward the field.
Could it be that data science has suffered such a loss of prestige that decision-makers will start to question the value of funding it, along with the more general data management activities that support it?
Before we go further, we should first acknowledge that the failure of data science has been real. Even Cambridge Analytica, the British firm that the Trump campaign hired to do much of its data science, thought that the Republican candidate had only a 30% chance of winning. In other words, even they were only slightly less wrong than the rest of the pollsters.
However, that was before the election. Now, afterwards and despite the shock of all the incorrect predictions, there has been a good deal of hype about the merits of the data science on the winning side and its inadequacies on the losing side. Such claims should be treated with skepticism. Let’s act like data scientists for a moment and use a little formal logic to show how these claims are invalid.
Consider the syllogism:
- Premise 1: If our campaign’s data science is good, then we will win the election.
- Premise 2: We won the election.
- Conclusion: Therefore, our campaign’s data science is good.
This is an example of the formal logical fallacy of affirming the consequent. It is not a valid argument: even if both premises are true, the conclusion does not follow, because the campaign could have won for reasons that have nothing to do with its data science. The same reasoning shows that the losing side did not necessarily have bad data science merely because it lost. So we should be very wary of any claims that someone has a form of data science that the winning campaign used to win, or that the losing side refused to use and therefore lost.
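For readers who like to check such things mechanically, here is a minimal sketch that enumerates the truth table for the argument. The names `implies` and `counterexamples` are made up for this illustration, with `p` standing for "our data science is good" and `q` for "we won the election". An argument form is valid only if no assignment makes every premise true and the conclusion false.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: 'if p then q' is false only when p is true and q is false."""
    return (not p) or q


def counterexamples(premises, conclusion):
    """Return every (p, q) assignment where all premises hold but the conclusion fails."""
    return [
        (p, q)
        for p, q in product([True, False], repeat=2)
        if all(premise(p, q) for premise in premises) and not conclusion(p, q)
    ]


# Affirming the consequent: from "if p then q" and "q", conclude "p".
affirming_consequent = counterexamples(
    premises=[lambda p, q: implies(p, q), lambda p, q: q],
    conclusion=lambda p, q: p,
)

# Modus ponens, for contrast: from "if p then q" and "p", conclude "q".
modus_ponens = counterexamples(
    premises=[lambda p, q: implies(p, q), lambda p, q: p],
    conclusion=lambda p, q: q,
)

print(affirming_consequent)  # [(False, True)]
print(modus_ponens)          # []
```

The single counterexample, `(False, True)`, is the case where the data science was not good and the campaign won anyway. Modus ponens has no counterexample, which is what makes it a valid form.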
What Does It All Mean for Data Science?
But this still does not explain why data science got things so badly wrong. Now, perhaps we are being unfair to the field and expecting from it things that it cannot deliver. So what are the benefits that data science is supposed to provide? While it is difficult to pin this down exactly, fundamentally data science seems to try to:
- Discover data that can predict things (“predictors”)
- Formulate hypotheses and use data to test them
However, there appears to be a more popular expectation that data scientists simply assemble data, devise models, and make predictions from them. This is hubris. How could a data scientist, having no knowledge other than what they get from the data, actually know how the world from which this data was taken really works? There will always be vastly more variables at play than can ever be captured as reliable, usable data for a data scientist to work on. Something is missing from this picture: the business subject matter expert.
A Personal Anecdote
Business subject matter experts come in all shapes and sizes and can turn up unexpectedly. Last August, in the heat of summer and at a low moment in the Republican campaign, I went to look at some land for sale in rural Pennsylvania. There I met a realtor with more than 30 years of experience in Pike County, an area bordering New Jersey. She was an avid Hillary supporter and an activist for the Democratic Party.
For about half an hour we pleasantly bantered about the latest Trump scandal. She offered an animated explanation of how the county demographics were changing from Republican to Democrat, after which I said, “So you must be happy. Your candidate is pretty sure to win in November.” But she quietly shook her head and said, “I actually don’t think so. There are people out there who nobody knows exist, who have never voted before, and believe me they are going to come out of the woodwork for Trump.”
Perhaps a straw in the wind, but ultimately a far better analysis than anything the data scientists were able to come up with. So where does this leave us with data science?
It seems to me that data science is at its best when it supplies inputs to business subject matter experts (SMEs), who can assimilate what the data science provides into their overall understanding of their domain of expertise. As SMEs, they will be aware of a broad range of the variables that exist in the domain, many of which will never be represented in the data, and they can weigh what the data science provides against this background. The notion that data scientists can unlock understanding based on data alone seems naïve.
Perhaps it is all best summed up by an FBI Special Agent I met last year who told me, “You can learn a lot from data, but I prefer to get confessions.”
He had a point.