INSIGHT: Big Data tells us what we want to hear

In the era of big data, the prevailing theory is that the more data we have, the more we can learn. What if that is wrong?

And because all data becomes pre-existing the moment its collection is designed, it will always be tainted by that original design, which presupposes both the business process and which questions about that process are meaningful.

When business analysts perform their task, they draw inferences between data points where no actual data exists. When those same analysts become dissatisfied with their own theories of inference, they should seek new data to fill in the gaps.

Instead, most analysts resort to layering one inference on top of another. All visibility into the underlying assumptions is lost, and a data derivative is created. Adding more data does not necessarily support the analysis of new theories; often it merely reinforces existing ones.

Data derivatives are reinforced over time because they represent highly complex layers of logical argument. But, once again, logic is only an interpretation of facts.

So, what are we to do?

Over the past thirty-six months, Gartner has watched many data miners, data analysts, senior systems analysts and business intelligence professionals change their titles to “data scientist”.

It is not unusual for professionals to follow market hype and boost their prestige and incomes by doing so. But what business strategists need is “real” data science.

Real data science is the practice of building out competing interpretations of data: many multi-layered analytic theories, each intentionally challenging the inferences used by the others. True data science compares these theories along at least two axes.

First, how easy is it to trace the data actually used back to its originating business process? How many jumps or hops did it take to create the data? The number of assumptions in use, and the complexity of the inferences between data points, give some idea of how reliable the data is.

Second, how far removed from the physical process world is the data point? Meters that record electrical pulses are quite accurate; however, the manner in which the readings are recorded, the record layout, even the decision to record a “meaningful change” in the electrostatic state of a device, is a form of bias.

We record what we want to hear. Data science quantifies these distances and constructs models that test these assumptions in perhaps fifty, perhaps a thousand different ways. Then data science tests the veracity of those models.

Now, don’t take this as instructions on how to pursue data science yourself or how to identify a data scientist; I said at least two axes.

But the next time you make a decision based on data, remember that it only tells you what the process designer thought of at the time of deployment; literally everything else is only theory.

You need data science to identify where inferences are becoming extreme (and becoming derivative). Then either obtain integral data to replace those assumptions with facts, or have your data science team build multiple models that constantly challenge those over-burdened assumptions (derivatives) with competing, inference-laden theories.

And that is how you can start to actually listen to the data.

By Mark Beyer - VP distinguished analyst, Gartner
