Getting Started with Predictive Analytics in 5 Easy Steps
Updated · Feb 08, 2016
If you want to know what's likely to happen in your business, you need predictive analytics.
Predictive analytics is distinct from descriptive analytics, which shows what happened and when, and diagnostic analytics, which deals with why events happened in the past.
Predictive analytics is used for business purposes as diverse as predicting the likelihood of customers making insurance claims or defaulting on loans, estimating which candidates may stay in a call center job the longest, working out which transactions are likely to be fraudulent or establishing which spare parts a service engineer should bring to a customer site based on the original conversation with a customer. Often the “answers” are provided in terms of a probability score.
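To make the idea of a probability score concrete, here is a minimal Python sketch of how one common type of model, logistic regression, turns an applicant's attributes into a number between 0 and 1. The feature names (`debt_to_income`, `missed_payments`) and the coefficients are hypothetical and chosen only for illustration. A real model would learn its coefficients from historical data.

```python
import math

# Hypothetical coefficients for illustration only; a real model would
# learn these from historical loan data.
WEIGHTS = {"debt_to_income": 3.0, "missed_payments": 0.8}
BIAS = -4.0

def default_probability(applicant: dict[str, float]) -> float:
    """Combine features into a linear score, then squash it into (0, 1) with the logistic function."""
    score = BIAS + sum(w * applicant[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-score))

low_risk = default_probability({"debt_to_income": 0.2, "missed_payments": 0})
high_risk = default_probability({"debt_to_income": 0.9, "missed_payments": 3})
print(f"low risk: {low_risk:.2f}, high risk: {high_risk:.2f}")
```

The output is a likelihood, not a yes/no answer. Deciding where to draw the line between "approve" and "decline" remains a business decision.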
Of these three types of analytics, predictive analytics is arguably the hardest, because developing predictive models to apply to your data is a specialist skill. For early adopters of predictive analytics, that meant employing a data scientist or hiring some outside help to get started.
Predictive Analytics Getting Easier
But that's not necessarily the case anymore. Predictive analytics is becoming more accessible to mainstream users.
Some predictive analytics software, such as SAP's InfiniteInsight, aims to do specialist work for you by running a series of algorithms against your data and finding the one which describes it with the highest accuracy. Other solutions provide modeling tools for users who may only have a computer science or undergraduate statistics background. There are even marketplaces for pre-built predictive applications, such as Alteryx Analytics Gallery.
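The "try several algorithms and keep the most accurate" approach can be sketched in a few lines of Python. This is a toy illustration under stated assumptions and does not describe how InfiniteInsight or any other product works internally. The three candidate learners are deliberately simple: always predict the most common outcome, a single-feature threshold rule, and nearest neighbor. Each candidate is judged on holdout data that it was not trained on.

```python
from collections import Counter
from typing import Callable

Row = tuple[list[float], int]          # (features, 0/1 label)
Model = Callable[[list[float]], int]   # a fitted model maps features to a label

def fit_majority(train: list[Row]) -> Model:
    # Always predict the most common label in the training data.
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def fit_stump(train: list[Row]) -> Model:
    # Try every (feature, threshold) pair and keep the one with the best
    # training accuracy. Predicts 1 when the feature exceeds the threshold.
    best, best_acc = (0, 0.0), -1.0
    for f in range(len(train[0][0])):
        for t in sorted({x[f] for x, _ in train}):
            acc = sum((x[f] > t) == bool(y) for x, y in train) / len(train)
            if acc > best_acc:
                best, best_acc = (f, t), acc
    f, t = best
    return lambda x: int(x[f] > t)

def fit_nearest_neighbor(train: list[Row]) -> Model:
    # Predict the label of the closest training row (squared Euclidean distance).
    def predict(x: list[float]) -> int:
        return min(train, key=lambda r: sum((a - b) ** 2 for a, b in zip(x, r[0])))[1]
    return predict

CANDIDATES = {"majority": fit_majority, "stump": fit_stump, "1-nn": fit_nearest_neighbor}

def select_model(train: list[Row], holdout: list[Row]) -> tuple[str, float]:
    # Fit every candidate, score it on unseen rows, and return the most accurate one.
    scores = {}
    for name, fit in CANDIDATES.items():
        model = fit(train)
        scores[name] = sum(model(x) == y for x, y in holdout) / len(holdout)
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Judging candidates on a holdout set rather than on the training data matters. A nearest-neighbor model, for example, typically scores perfectly on its own training rows by construction.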
“You can do predictive analytics without a data scientist — within limits. But not to the extent that you can do it as well without a data scientist as with one,” said Mike Gualtieri, a principal analyst at Forrester Research.
If you are thinking of dipping your toes in the predictive analytics waters, five key steps will get you started:
- Pinning down what you want to predict
- Choosing the right predictive analytics software
- Finding the right data
- Preparing data and deriving a predictive analytics model
- Putting processes in place for using a predictive analytics model
Pin Down What You Want to Predict
It sounds obvious, but you can't hope to find answers until you have posed questions. So when you get started in predictive analytics, it's important to define exactly what you want to find out.
A typical question might be “What is the probability that a loan applicant will be able to repay the loan?”, “In which sales channel are eventual purchasers most likely to be contacted?” or “At what time of day are sales calls most likely to lead to a sale?”
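Before any modeling, a question like the loan example has to be turned into something measurable in historical data, usually a 0/1 target for each past case. The sketch below assumes a hypothetical `Loan` record with a due date and the date on which the loan was paid in full. The rule for what counts as "repaid" (in full, by the due date) is an illustrative choice that business staff would need to confirm.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Loan:  # hypothetical record layout, for illustration only
    applicant_id: str
    due_date: date
    paid_in_full_on: Optional[date]  # None if not fully repaid

def label_repayment(loans: list[Loan], as_of: date) -> dict[str, int]:
    """Turn 'will this applicant repay?' into a 0/1 target for each historical loan."""
    labels = {}
    for loan in loans:
        paid_on = loan.paid_in_full_on
        if paid_on is not None and paid_on <= loan.due_date:
            labels[loan.applicant_id] = 1   # repaid in full on time
        elif loan.due_date <= as_of:
            labels[loan.applicant_id] = 0   # due date passed without timely full repayment
        # Otherwise the loan is still open and its outcome is unknown, so it is left out.
    return labels
```

Loans whose outcome is not yet known are excluded rather than guessed, because a model can only learn from cases where the answer is already known.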
One challenge is that it is impossible to know whether you have the right data to answer all your questions before you start a predictive analytics project, Gualtieri said.
“You need to generate a list of questions, but it's not a sure thing that you will get answers to all of them,” he said. “So you need to think like a venture capitalist, come up with a dozen questions and hope that two or three will get answered.”
From an organizational perspective, it's also important to establish that there's a demand for the answers. There is no point carrying out a predictive analysis project if no one in the organization will do anything with your insights. That means questions should be generated by business staff, not data scientists.
“The biggest mistake companies make is trying to hire a data scientist that understands their business. You don't need that; it's a waste of time,” Gualtieri said. “A data scientist who understands the business will try to decide what to predict, but you don't need a data scientist for that. Business people can do it.”
Choose the Right Predictive Analytics Software
Reviewing individual predictive analytics packages is beyond the scope of this article, but several leading predictive analytics software vendors are featured in more detail in our predictive analytics buying guide.
Another option is open source analytics software. Check out this guide to open source predictive analytics tools.
Find the Right Data
Predictive analytics – in fact all analytics – is about getting insights from data, so you'll need to have some idea of the type of data that's necessary to answer the question you're interested in.
The more potentially relevant data you can get hold of, the better. That's because data that turns out to be irrelevant can be ignored, but relevant data that is missing will lead to less accurate forecasts.
There's also the question of data granularity: at what level of detail do you need the data? As a general rule, you should use data of the same granularity as the question you want answered. For example, if you want to make predictions at the monthly level, you need at least monthly data, and if you want to make hour-by-hour predictions, you need hourly data.
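Rolling detailed data up to a coarser level is straightforward, but the reverse is not: monthly totals cannot be split back into hours without extra assumptions. Here is a minimal Python sketch, assuming hourly values (for example, sales) keyed by timestamp:

```python
from collections import defaultdict
from datetime import datetime

def to_monthly(hourly: dict[datetime, float]) -> dict[tuple[int, int], float]:
    """Roll hourly values up to (year, month) totals."""
    monthly: dict[tuple[int, int], float] = defaultdict(float)
    for ts, value in hourly.items():
        monthly[(ts.year, ts.month)] += value
    return dict(monthly)

hourly_sales = {
    datetime(2016, 1, 31, 23): 2.0,
    datetime(2016, 2, 1, 0): 3.0,
    datetime(2016, 2, 1, 1): 4.0,
}
print(to_monthly(hourly_sales))
```

This is why it pays to collect data at the finest granularity you may need. Coarser views can always be derived from it later.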
In some circumstances, you will also need to generate data before you can start answering questions. For example, if your sales force always makes sales calls between 9 a.m. and 11 a.m. on weekdays, you won't be able to use predictive analytics to predict how likely a sales call on a Sunday evening is to end in a sale, because you won't have the necessary data to analyze.
To find out the best time to make sales calls, you would first have to get your sales team to start making calls at different times of the day and on different days of the week to generate sufficient data.
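One way to spot this kind of gap is to count calls per day-and-hour slot and list the slots that have too few calls to support a prediction. In the Python sketch below, the `(day, hour, made_sale)` record format and the threshold of 30 calls per slot are assumptions for illustration, not a statistical rule.

```python
from collections import defaultdict

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def conversion_by_slot(calls, min_calls=30):
    """calls: iterable of (day, hour, made_sale) tuples, e.g. ("Mon", 9, True).

    Returns (rates, untested): the conversion rate for every slot with at least
    min_calls calls, and the list of slots with too little data to predict from.
    """
    totals = defaultdict(lambda: [0, 0])  # (day, hour) -> [calls, sales]
    for day, hour, made_sale in calls:
        totals[(day, hour)][0] += 1
        totals[(day, hour)][1] += int(made_sale)
    rates = {slot: sales / n for slot, (n, sales) in totals.items() if n >= min_calls}
    untested = [(d, h) for d in DAYS for h in range(24)
                if totals.get((d, h), [0, 0])[0] < min_calls]
    return rates, untested
```

If the sales team only ever calls on weekday mornings, almost every slot ends up in `untested`. That list shows exactly where new calls are needed before the question can be answered.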
Prepare Data and Derive a Predictive Analytics Model
Most data scientists will tell you that most of the time in a predictive analytics project is spent accessing the data, and then preparing and cleaning it for analysis.
But Gualtieri said data collection and preparation are becoming less of an issue because companies are increasingly creating data lakes in Hadoop – making the data easy to access in one place – and because data preparation tools are becoming more effective.
Typical preparation tasks include identifying and removing data that does not carry (significant) information needed to answer the question at hand, removing outliers and dealing with missing data. Predictive analytics software packages vary in the extent to which they provide tools to make these tasks easy, or in some cases carry them out automatically.
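The three preparation tasks above can be sketched in Python using only the standard library. The specific choices here are assumptions, not the only reasonable ones. Outliers are detected with Tukey's fences (values more than 1.5 interquartile ranges beyond the quartiles), and missing values are filled with the median. Only one numeric column is cleaned, to keep the sketch short. Commercial packages may use different rules or apply them automatically.

```python
import statistics

def prepare(rows: list[dict], keep: list[str], column: str) -> list[dict]:
    """Clean `column` (which must be listed in `keep`); missing values are None."""
    # 1. Keep only the columns judged relevant to the question.
    rows = [{k: r.get(k) for k in keep} for r in rows]
    observed = [r[column] for r in rows if r[column] is not None]
    # 2. Remove outliers using Tukey's fences (1.5 x interquartile range).
    q1, _, q3 = statistics.quantiles(observed, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    rows = [r for r in rows if r[column] is None or low <= r[column] <= high]
    # 3. Fill the remaining gaps with the median of the values that survived.
    median = statistics.median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: median if r[column] is None else r[column]} for r in rows]
```

Removing outliers before computing the fill value keeps a single extreme value from distorting the numbers used to fill the gaps.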
And check out this recent Enterprise Apps Today article for more tips on data preparation for predictive analytics.
Put Processes in Place for Using a Predictive Analytics Model
Once you have come up with a predictive analytics model and prepared your data, you need to apply the model to the data to find answers to your questions.
But that's only part of the story, because the answers that predictive analytics yields only create business value if your organization has a process in place to use them. One of the most important steps you can take is ensuring that your business modifies its behavior to profit from (or minimize the loss from) the outcomes that predictive analytics predicts.
This could involve anything from sharing the predictions with other business departments to building the model into other systems.
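As an illustration of turning predictions into action, suppose a fraud model attaches a probability to each transaction. A simple rule is to send a transaction for manual review only when the expected loss it could prevent (probability × amount) exceeds the cost of the review. The Python sketch below assumes, for simplicity, that a review always catches fraud. The `Transaction` fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Transaction:  # hypothetical fields, for illustration only
    txn_id: str
    amount: float
    fraud_probability: float  # score produced by the predictive model

def review_queue(txns: list[Transaction], review_cost: float) -> list[str]:
    # Expected loss avoided by a review = P(fraud) x amount (assumes reviews catch all fraud).
    flagged = [t for t in txns if t.fraud_probability * t.amount > review_cost]
    # Highest expected loss first, so reviewers see the most valuable cases first.
    flagged.sort(key=lambda t: t.fraud_probability * t.amount, reverse=True)
    return [t.txn_id for t in flagged]
```

A small, high-probability transaction may not be worth a reviewer's time, while a large one with a modest probability may be. A rule like this makes that trade-off explicit.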
“Predictive analytics has limited value unless the exposed insights can be deployed directly into software applications and business processes,” Gualtieri said in a report. “API calls, Web services and predictive model markup language (PMMLs) are some of the methods that companies are using to seamlessly integrate predictions into their business.”
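Here is a minimal Python sketch of the "API call" route. The model is wrapped in a function that accepts JSON and returns JSON, and a small web service built on the standard library's `http.server` module exposes it to other applications. The scoring function uses hypothetical coefficients and stands in for a real trained model. PMML, the other option mentioned, takes a different approach: it exports the model itself in a standard XML format that other tools can load. PMML is not shown here.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

def score(features: dict) -> float:
    # Stand-in for a trained model; the coefficients are hypothetical.
    z = -4.0 + 3.0 * features["debt_to_income"] + 0.8 * features["missed_payments"]
    return 1.0 / (1.0 + math.exp(-z))

def handle_request(body: bytes) -> bytes:
    """JSON in, JSON out: the contract that other applications would call."""
    features = json.loads(body)
    return json.dumps({"default_probability": round(score(features), 4)}).encode()

class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload, status = handle_request(self.rfile.read(length)), 200
        except (ValueError, KeyError, TypeError):
            payload, status = b'{"error": "bad request"}', 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(port: int = 8000) -> None:
    # Not called here; run serve() to accept POST requests on localhost:port.
    HTTPServer(("127.0.0.1", port), PredictionHandler).serve_forever()
```

Keeping the scoring logic in `handle_request`, separate from the HTTP plumbing, makes it easy to test and to move behind a different web framework. `http.server` is meant for development and is not recommended for production use.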
Paul Rubens has been covering enterprise technology for over 20 years. In that time he has written for leading UK and international publications including The Economist, The Times, Financial Times, the BBC, Computing and ServerWatch.
Paul Ferrill has been writing about computers and network technology for over 15 years. He holds a BS and an MS in Electrical Engineering. He is a regular contributor to the computer trade press and specializes in complex data analysis and storage. He has written hundreds of articles and two books for various outlets over the years. His articles have appeared in Enterprise Apps Today and InfoWorld, Network World, PC Magazine, Forbes, and many other publications.