Will the Real Predictive Analytics Please Stand Up

by Scott Hornbuckle and Nameeta Raj

As an entrepreneur who focuses on using leading-edge technology to improve my clients’ businesses, I often encounter people and companies using buzzwords carelessly, with little to no substance behind their claims. Predictive analytics, big data, IoT, and the like are all the rage, but what is real, and what is just marketing fluff?

Let’s take predictive analytics as an example. Wikipedia defines predictive analytics as:

“Predictive analytics encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.”

By this definition, predictive analytics combines statistical algorithms with machine learning and data mining to predict a future event based on patterns discovered in historical data. Here’s the key: for a solution to be considered predictive analytics, it must include all of these components. Frequently, when meeting with prospective clients, we are told that they are already using predictive analytics. When we probe a bit deeper, we discover that the client has created a spreadsheet that uses a simple linear regression equation, or that they are using the linear algorithm included in a SQL database. While this is all well and good, that’s just statistics, not predictive analytics.

Let’s go over an example from the office products industry. We developed a solution called SuppliesIQ to help printer/copier dealers reduce the cost of toner wasted when cartridges are changed out before they are empty. SuppliesIQ uses a time series modelling technique to ensure just-in-time (JIT) delivery of cartridges. SuppliesIQ is highly dynamic and chooses the best-fitting model from a wide range of seasonal and non-seasonal time series models, not for each device, but for each cartridge within the device. An autoregressive integrated moving average (ARIMA) model forms the base model for SuppliesIQ. The models are created with the help of IBM’s Predictive Maintenance and Quality platform, which enables switching between ARIMA and exponential smoothing models to find the best-fitting model for the toner cartridge. Historical data permits the model to identify quarterly, monthly, and weekly seasonality and adjust the predictions accordingly.
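The idea of choosing the best-fitting model per cartridge can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SuppliesIQ implementation and not the IBM platform's API: it swaps ARIMA for two much simpler candidates (simple exponential smoothing and a seasonal-naive forecast), scores each on a held-out week, and keeps the one with the lower mean absolute error. The function names and the usage data are hypothetical.

```python
from statistics import mean

def ses_forecast(history, horizon, alpha=0.3):
    """Simple exponential smoothing: flat forecast at the last smoothed level."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon

def seasonal_naive_forecast(history, horizon, period=7):
    """Repeat the most recent full season (here, the last 7 days)."""
    last_season = history[-period:]
    return [last_season[i % period] for i in range(horizon)]

def mean_abs_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def pick_best_model(daily_usage, holdout=7):
    """Fit each candidate on the early data, score it on the held-out tail."""
    train, test = daily_usage[:-holdout], daily_usage[-holdout:]
    candidates = {
        "exponential_smoothing": ses_forecast,
        "seasonal_naive": seasonal_naive_forecast,
    }
    scores = {
        name: mean_abs_error(test, forecast(train, holdout))
        for name, forecast in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical daily toner usage for one cartridge (percentage points),
# Monday first, with no printing on weekends.
week = [3.0, 3.0, 3.0, 3.0, 3.0, 0.0, 0.0]
daily_usage = week * 4

best_model, scores = pick_best_model(daily_usage)
print(best_model, scores)
```

On this strongly weekly data the seasonal candidate wins; on a cartridge with steady usage the smoothing model could win instead, which is the point of selecting per cartridge rather than per device.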

The graph below compares SuppliesIQ with a basic linear regression model available in the market. The orange line represents the actual toner levels, the blue line represents the toner levels predicted by SuppliesIQ, and the green line represents the estimated empty date according to the linear algorithm. The SuppliesIQ model accurately captures the straightforward weekly seasonality: its curve is relatively flat on weekends.

Figure 1: SuppliesIQ vs Linear Regression Prediction

This cartridge ran empty on 11/14/2017. The linear regression model predicted an empty date 6 days after the cartridge actually ran empty, whereas the SuppliesIQ model predicted one a day after. Because the latest cycle was short, the linear regression model could not fully adapt to the increased printing behavior.
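To make the weekend plateau concrete, here is a minimal Python sketch that contrasts a straight-line fit with a projection based on average usage per day of the week. It is an assumption-laden toy, not either of the models in the figure: the toner levels are invented, the weekday profile is a plain average, and the sketch does not model the change in printing behavior that caused the gap described above.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def linear_empty_day(levels):
    """Day index at which the fitted straight line crosses zero."""
    slope, intercept = linear_fit(list(range(len(levels))), levels)
    return None if slope >= 0 else -intercept / slope

def weekday_profile(levels, period=7):
    """Average drop in toner level for each position in the weekly cycle."""
    drops = [[] for _ in range(period)]
    for day in range(1, len(levels)):
        drops[day % period].append(levels[day - 1] - levels[day])
    return [sum(d) / len(d) if d else 0.0 for d in drops]

def seasonal_empty_day(levels, period=7, max_days=3650):
    """Step forward day by day, subtracting that weekday's average usage."""
    profile = weekday_profile(levels, period)
    if sum(profile) <= 0:
        return None
    level, day = levels[-1], len(levels) - 1
    while level > 0 and day < max_days:
        day += 1
        level -= profile[day % period]
    return day

# Hypothetical end-of-day toner levels (%), day 0 is a Monday:
# 3 points used per weekday, nothing on weekends, three weeks of history.
levels, level = [], 100.0
for day in range(21):
    level -= 3.0 if day % 7 < 5 else 0.0
    levels.append(level)

profile = weekday_profile(levels)
print("weekday profile:", profile)
print("linear estimate:", linear_empty_day(levels))
print("weekday-aware estimate:", seasonal_empty_day(levels))
```

On this idealized data the two empty-day estimates land close together. The difference is in the shape: the weekday profile shows zero usage on days 5 and 6, so its projection stays flat over weekends, while the straight line keeps declining every day.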

Figure 2: Six-month Toner Cycles

So, what should you look for when determining whether or not a solution truly uses predictive analytics? Here are three key things:

1.     More than just statistics: Advanced statistics are a key component of any predictive analytics solution. However, if one is simply using an out-of-the-box linear algorithm from a tool like Excel, SQL, etc., I wouldn’t consider this to be predictive analytics. Such a tool can make fairly rudimentary predictions, but that is not the same thing.

2.     It’s dynamic and adjusts to changes in environment: This is one of the key components that separates true predictive analytics from the posers. Business environments are continuously in flux due to business cycles, seasonality, scaling up/down, etc. An example of this is a school. If a printer is low on toner, when should the cartridge be shipped? Well, this depends on the context of the device. If it’s May, and the school is getting ready to dismiss students for break, the toner cartridge may be able to last until the start of the next term. A static model wouldn’t take this usage change into account. True predictive analytics looks at how each cartridge in each device is used and adapts to the user behavior. We will explore this topic further in a future blog post.

3.     The model gets better over time: Machine learning is a key component in predictive analytics. Using machine learning enables the model to improve over time automatically. Static regression models, by contrast, must be updated manually and are applied broadly. This is where the benefit of true predictive analytics quickly shows. Predictive analytics takes into account the device’s history and the accuracy of predictions made in the past, and adjusts accordingly. This dynamic improvement is essential in the rapidly evolving business environment we all work in.
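The third point, a model that uses the accuracy of its past predictions to adjust itself, can be illustrated with a very small Python sketch. It is not the machine learning used in SuppliesIQ. It assumes a simple error-correction update (equivalent to exponential smoothing) on a hypothetical daily usage rate, and compares it with a static estimate after printing volume doubles. The class name, learning rate, and data are all made up for illustration.

```python
class AdaptiveUsageModel:
    """Tracks a daily usage rate and corrects it using each forecast error."""

    def __init__(self, initial_rate, learning_rate=0.3):
        self.rate = initial_rate
        self.learning_rate = learning_rate
        self.errors = []  # absolute forecast error recorded for each day

    def predict(self):
        return self.rate

    def update(self, observed):
        error = observed - self.rate
        self.errors.append(abs(error))
        # Move the estimate a fraction of the way toward what was observed.
        self.rate += self.learning_rate * error

# Both models start from the same hypothetical estimate: 2 points per day.
static_rate = 2.0
adaptive = AdaptiveUsageModel(initial_rate=2.0)

# Hypothetical scenario: usage doubles to 4 points per day for ten days.
observed_usage = [4.0] * 10

static_errors = []
for observed in observed_usage:
    static_errors.append(abs(observed - static_rate))  # never updated
    adaptive.update(observed)

print("static total error:  ", sum(static_errors))
print("adaptive total error:", round(sum(adaptive.errors), 2))
print("adaptive rate now:   ", round(adaptive.predict(), 2))
```

The static estimate is wrong by the same amount every day until someone updates it by hand, while the adaptive estimate shrinks its error after every observation.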

In conclusion, there are a lot of companies claiming to offer predictive analytics. The technology is powerful and can enable companies to dramatically improve their businesses and evolve their business models. However, the technology is complex, and the skill sets required to use it are in short supply. When you are looking to employ this technology, use the tips above to separate the real from the rest.