Business Intelligence Ain’t Over Until Exploratory Data Analysis Sings
Updated · Apr 09, 2012
Business intelligence has taken a step forward in maturity over the last few years, as statistical packages have become more closely tied to analytics. SAS has for years distinguished itself with its statistics-focused business intelligence solution; but when IBM acquired SPSS, the grand-daddy of statistical packages, the importance of more rigorous analysis of company and customer data seemed both confirmed and more obvious.
Moreover, over the years, data miners have begun to draw on the insights of university researchers about things like “data mining bias” and Bayesian statistics – and the most in-depth, competitive-advantage-determining analyses have benefited as a result.
So it would seem that data miners, business analysts and IT are on a nice query-technology glide path. Statistics completes the analytics toolkit by covering the extreme of certainty and analytical complexity, while traditional analytics tools cover the rest of the spectrum, from situations where shallow and imprecise analysis is good enough on up. And statistical techniques gradually filter down, as the technology evolves, to the “unwashed masses” of end users.
And yet there is a glaring gap in this picture – or at least a gap that should be glaring. This gap might be summed up as Alice in Wonderland’s “verdict first, then the trial.” Both the business and the researcher start with their own narrow picture of what the customer or research subject should look like, and the analytics and statistics that accompany such hypotheses are designed to home in on a solution rather than expand in response to unexpected data. Thus, the business or researcher is likely to miss key customer insights, psychological and otherwise.
Pile on top of this the “not invented here” syndrome characteristic of most enterprises, and the “confirmation bias” that recent research has shown to be prevalent among individuals and organizations, and you have a real analytical problem on your hands.
In our excitement about the real advances in our ability to understand the customer via social media, we often fail to notice how the recent popularity of “qualitative methods” in psychology has exposed, to those willing to see, just how much insight into customer psychology, sociology and behavior traditional statistics fails to capture. In the world of business, as I can personally attest, the same type of problem exists.
For more than a decade, I have run total cost-of-ownership (TCO) studies, particularly on SMB use of databases. I discovered early on that open-ended interviews with relatively few sys admins were far more effective at capturing the real costs of databases than broader, inflexible rate-it-from-one-to-five surveys of CIOs. Moreover, if I simply gave the interviewee room to tell a story from his or her own point of view, the respondent would consistently come up with an insight of extraordinary value. One such insight, for example, is that SMBs often care less about technology that saves operational costs than about technology that saves an office head time, by requiring him or her to do nothing more than press a button while shutting off the lights on Saturday night.
EDA: Getting Preliminary Data Analysis Right
The key to success for my “surveys” was that they were designed to be:
- Open-ended (They could go in a new direction during the interview, leaving space for whatever the interviewer might have left out.)
- Interviewee-driven (They started by letting the interviewee tell a story as he or she saw it.)
- Flexible in the kind of data collected (Typically an IT organization did not know the overall cost of database administration for the organization, and in a survey would have guessed, badly; but it almost invariably knew how many database instances each administrator handled. See the sketch after this list.)
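To make that last point concrete, here is a minimal sketch, with entirely hypothetical figures, of how the one number an interviewee reliably knows (database instances per administrator) can be turned into the administration-cost estimate the organization itself could not report directly:

```python
# Minimal sketch with hypothetical figures: back into a per-instance
# administration cost from the one number interviewees reliably know.

instances_per_admin = 40       # reported directly by the interviewee (hypothetical)
annual_admin_cost = 120_000    # assumed fully loaded yearly cost of one DBA
db_instances_in_org = 250      # hypothetical count of database instances in the org

cost_per_instance = annual_admin_cost / instances_per_admin
estimated_admin_tco = cost_per_instance * db_instances_in_org

print(f"Administration cost per instance: ${cost_per_instance:,.0f}/year")
print(f"Estimated org-wide database administration cost: ${estimated_admin_tco:,.0f}/year")
```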
As it turns out, there is a comparable statistical approach for the data analysis side of things. It’s called exploratory data analysis, or EDA.
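In practice, EDA means letting the data suggest structure before any hypothesis is imposed on it. A minimal sketch in Python, assuming a pandas/matplotlib toolchain and a hypothetical CSV extract of interview or survey data, might look like this:

```python
# Minimal EDA sketch: summarize and visualize the data before modeling it.
# The file name and its columns are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("database_tco_interviews.csv")

# Let the data speak first: counts, distributions, and pairwise relationships.
print(df.describe(include="all"))            # summary statistics for every column
print(df.select_dtypes(include="number").corr())  # correlations among numeric columns

# Visual exploration surfaces skew, outliers and surprises a fixed survey would miss.
df.hist(figsize=(10, 6))
df.plot(kind="box", subplots=True, figsize=(10, 4), sharey=False)
plt.tight_layout()
plt.show()
```

The aim, as with the open-ended interview, is to let the unexpected show up before a confirmatory model narrows the question.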
Wayne Kernochan has been an IT industry analyst and author for over 15 years. Over that period he has focused on the most important information-related technologies, as well as ways to measure their effectiveness. He has also done extensive research on SMB, Big Data, BI, database, development tool and data virtualization solutions. Wayne is a regular speaker at webinars and writes for many publications.