INSIGHT: Why Big Data isn’t suited to analytics

"Big Data is not intrinsically suited to analytics. I’m sorry, but it’s true."

Distributed storage and massive scale can make data hard to find. Lack of normalisation and structure means an analyst often has a lot of hamster-wheeling to do just to know what the data is — much less combine it with other sources.

Add to this the challenge of simultaneous batch and streaming processing and a legacy of marketing analysts who are whizzes with Adobe and Google Analytics but not SQL, much less statistical programming languages such as SAS and R.

Advanced analytics for marketing is not synonymous with Big Data, and vice versa. Adoption of pure Big Data analytics, such as MapReduce/NoSQL engines, remains below 5 percent across most industries and company sizes, according to Gartner’s 2014 survey of analytics spending intentions.

A recent Gartner survey found that 80 percent of analytics use cases still require a traditional data warehouse.

Early exceptions are seen in media, services and communications industries, which have been aggressive in building out marketing analytics teams and staffing up centres of excellence, according to Gartner’s Survey of Data-Driven Marketing, 2015.

Many analytics techniques can and do make use of Big Data stores, generally by first transforming the data into structured or semistructured formats. These include:

• Data mining and predictive analytics

• Text and speech analytics

• Video analytics

• Social media and sentiment analysis

• Location and sensor analytics

• Machine learning
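As a rough illustration of that transformation step, the sketch below flattens newline-delimited JSON events into rows with a fixed set of columns. The event layout, the field names and the `flatten` and `to_rows` helpers are all assumptions made for this example, not a description of any particular platform.

```python
import json

# Hypothetical raw event log: one JSON document per line, with nested and
# optional fields, as it might sit in a distributed file store.
RAW_EVENTS = """\
{"user": {"id": 17, "region": "NZ"}, "event": "page_view", "props": {"channel": "email"}}
{"user": {"id": 42}, "event": "purchase", "props": {"channel": "search", "value": 120.5}}
{"user": {"id": 17, "region": "NZ"}, "event": "purchase", "props": {"value": 35.0}}
"""

# Assumed target schema for the structured version of the data.
COLUMNS = ["user_id", "region", "event", "channel", "value"]


def flatten(record):
    """Map one nested record onto the fixed column set, using None for gaps."""
    user = record.get("user", {})
    props = record.get("props", {})
    return {
        "user_id": user.get("id"),
        "region": user.get("region"),
        "event": record.get("event"),
        "channel": props.get("channel"),
        "value": props.get("value"),
    }


def to_rows(raw_text):
    """Parse newline-delimited JSON and return a list of flat rows."""
    return [flatten(json.loads(line)) for line in raw_text.splitlines() if line.strip()]


rows = to_rows(RAW_EVENTS)
for row in rows:
    print([row[col] for col in COLUMNS])
```

Once every record shares the same columns, with missing values made explicit, the data can be loaded into a table and combined with other sources.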

Because data in Hadoop is often organised in unstructured files rather than structured tables, Hadoop originally did not work with SQL. And NoSQL databases offered little support for ad hoc queries from analysts.

Even basic questions could (and can) require programming skills. Open source and commercial markets have been working feverishly to fill the gap.

Apache Hadoop provides SQL capabilities via Apache Hive, which is based on MapReduce. Vendors including Cloudera, with Impala, and Hortonworks, among others, also provide ways to access Big Data stores for analysis.

Pivotal offers a SQL-on-Hadoop engine called HAWQ to analyse data lakes. Other BI vendors have developed platforms with Big Data support, including Alteryx, Qlik and Splunk (with Hunk).
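To make the gap concrete, the sketch below answers one basic question, revenue by channel, in two ways: first with hand-written map and reduce steps, then with a single SQL query. Python's built-in `sqlite3` module is used purely as a stand-in for a SQL layer; it is not Hive or Impala, and the `EVENTS` data and function names are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical event rows: (channel, revenue).
EVENTS = [
    ("email", 35.0),
    ("search", 120.5),
    ("email", 15.0),
    ("social", 0.0),
    ("search", 79.5),
]


def revenue_by_channel_programmatic(events):
    """Hand-written map and reduce steps, in the style of a MapReduce job."""
    # Map: emit (key, value) pairs.
    mapped = ((channel, revenue) for channel, revenue in events)
    # Reduce: sum the values for each key.
    totals = defaultdict(float)
    for channel, revenue in mapped:
        totals[channel] += revenue
    return dict(totals)


def revenue_by_channel_sql(events):
    """The same question as one declarative query over a table."""
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a SQL layer
    try:
        conn.execute("CREATE TABLE events (channel TEXT, revenue REAL)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", events)
        cursor = conn.execute(
            "SELECT channel, SUM(revenue) FROM events GROUP BY channel"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


programmatic = revenue_by_channel_programmatic(EVENTS)
via_sql = revenue_by_channel_sql(EVENTS)
print(programmatic)
print(via_sql)
```

Both routes give the same totals. The difference is that the second needs only a query an analyst can write, which is the appeal of putting a SQL layer over a Big Data store.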

Machine learning. An area of growing interest and relevance to marketers is machine learning. This is defined as the use of software to find high-order interactions and patterns within large amounts of data in ways that surpass human capabilities.

Big Data stores can provide a wealth of historical data that has simply been too voluminous and unstructured for human analysts to interrogate.

And the volume and complexity of data and interactions on social networks, across marketing and advertising channels, from mobile apps and sensors, such as in-store beacons – all are ripe for scrutiny by smart machines.

Frameworks for machine learning, including the widely used open source library MLlib, run on in-memory processing engines such as Spark. Companies such as FICO and Microsoft offer machine learning through SaaS.
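A toy example may help show what an "interaction" means here. The sketch below is plain Python, not MLlib, and the visit data is invented so that neither channel nor device alone says anything about conversion, while the combination of the two does. A learning algorithm searches for patterns of this kind across far more dimensions than an analyst could check by hand.

```python
from collections import defaultdict

# Hypothetical visits: (channel, device, converted). The numbers are invented
# so that neither attribute alone separates converters from non-converters.
VISITS = (
    [("email", "mobile", 1)] * 8 + [("email", "mobile", 0)] * 2
    + [("email", "desktop", 1)] * 2 + [("email", "desktop", 0)] * 8
    + [("search", "mobile", 1)] * 2 + [("search", "mobile", 0)] * 8
    + [("search", "desktop", 1)] * 8 + [("search", "desktop", 0)] * 2
)


def conversion_rates(visits, key):
    """Return the conversion rate for each group produced by key(visit)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [conversions, visits]
    for visit in visits:
        group = key(visit)
        counts[group][0] += visit[2]
        counts[group][1] += 1
    return {group: conv / total for group, (conv, total) in counts.items()}


by_channel = conversion_rates(VISITS, key=lambda v: v[0])
by_device = conversion_rates(VISITS, key=lambda v: v[1])
by_pair = conversion_rates(VISITS, key=lambda v: (v[0], v[1]))

print(by_channel)  # identical rates for every channel
print(by_device)   # identical rates for every device
print(by_pair)     # rates differ sharply once the two are combined
```

With two attributes the pattern is easy to find by grouping. With hundreds of attributes the number of combinations grows too fast for manual inspection, which is where automated methods earn their place.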

Open source tools. Clearly, open source permeates Big Data storage and processing. The same is true of analytics.

Commercial statistical programming languages such as SAS and SPSS have long been used by advanced marketing analysts, and are of course still relevant.

Most analysts are also aware of open source tools widely used in the context of Big Data. These include R, Python and Weka (now Pentaho Data Mining). They provide a powerful, ever-evolving set of methods and an active global community of users who are continually adding features to support marketers’ needs.

Lower cost, certain features, and continual updates make open source analytics tools an attractive complement to commercial software in some situations.

For example, Audi USA’s digital marketing agency AKQA developed a number of R models to deliver more personalised images and content on the website for returning visitors, as well as to suggest options for the car configurator based on the users’ previous behaviour.

Talent gaps. Advanced analytics talent is in short supply everywhere, and the “data science gap” is particularly acute in marketing.

Gartner’s March 2014 survey of data-driven marketers found that 54 percent of organisations got Big Data analysts from internal development, 32 percent relied on consultants, and only 13 percent were able to bring on outside talent.

Gartner clients report real problems with both training and retention.

The good news is that skills are becoming more common as analysts enrol in coursework or teach themselves how to fish.

Big Data infrastructure and statistical languages are not easy to master, but they are relatively easy for an experienced analyst to start using. Online learning modules and communities such as Stack Overflow abound.

As skills improve at the same time that traditional marketing analytics tools add more capabilities in familiar interfaces, we expect the pain to subside.

Martin Kihn - Research Analyst, Gartner
