It's football finals time in Australia — a great season for sports lovers and an interesting period for stats nerds.
During this time, many predictions based on small and unique data sets emerge. A common example is: "Since 1952, the Cats have not lost a grand final when leading at half-time," implying that if the Cats are leading at half-time, they should win. But sometimes, they don't.
Similar types of predictions appear in more serious contexts, such as the US election, where predictors use criteria like the candidate's voice depth or height. These predictions have about as much statistical rigour as an octopus predicting the World Cup winner.
This phenomenon sheds light on how we sometimes analyse our data, either manually or through statistical forecasts.
We might use BI tools or Excel to drill down into a unique, small data set. For instance, we might discover that the average volume at the $1.99 price point was 10,000 units, but overlook the rest of the relevant data, placing too much weight on that small set.
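As a minimal sketch of why that $1.99 average deserves caution (all figures here are hypothetical, chosen so the promotional mean comes out at 10,000 units): with only a handful of observations, the mean hides an enormous spread.

```python
import statistics

# Hypothetical sales history: (price, units_sold) observations.
# Only three promotions ever ran at the $1.99 price point.
history = [
    (2.49, 6200), (2.49, 5900), (2.49, 6100), (2.49, 6000),
    (2.49, 5800), (2.49, 6300), (2.49, 6050), (2.49, 5950),
    (1.99, 14000), (1.99, 9500), (1.99, 6500),
]

promo = [units for price, units in history if price == 1.99]
baseline = [units for price, units in history if price == 2.49]

# The headline number: "average volume at $1.99 was 10,000 units".
print(f"$1.99 mean: {statistics.mean(promo):.0f} units over {len(promo)} promotions")
# But the three promotions ranged from 6,500 to 14,000 units.
print(f"$1.99 spread: {min(promo)}-{max(promo)} units")
print(f"$2.49 mean: {statistics.mean(baseline):.0f} units over {len(baseline)} weeks")
```

One of the three promotions sold fewer units than a typical full-price week, so the 10,000-unit average says very little on its own.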
This is an example of "overfitting," a statistical concept where a model mistakes random noise in its training data for a genuine pattern. This challenge often arises in Machine Learning, especially with limited data, such as in the Consumer Goods sector, where product history might span only 3–4 years.
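Overfitting is easy to demonstrate with a few years of data. The sketch below (hypothetical sales figures) fits four yearly observations exactly with a cubic — the interpolating polynomial — so the model has zero error on the data it has seen, yet produces an implausible jump the moment it extrapolates.

```python
years = [1, 2, 3, 4]
sales = [100, 95, 180, 360]  # hypothetical noisy history, roughly trending up

def lagrange(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Perfect fit on the history it has seen...
print([round(lagrange(years, sales, y)) for y in years])  # [100, 95, 180, 360]

# ...but the extrapolation nearly doubles year-4 sales.
print(round(lagrange(years, sales, 5)))  # 640
```

A simpler model with some error on the historical data would almost certainly extrapolate more sensibly — which is the heart of the overfitting trade-off.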
Consider a product that underperformed in its first two years but saw a sales surge in the third and fourth years due to a strong marketing campaign. An unchecked ML approach might predict continued doubling of sales, leading to an overly optimistic forecast for the fifth year. A sales manager would likely see a 100% growth prediction as unrealistic and lose confidence in the numbers.
The key is to have a tool that blends powerful modelling with the ability to adjust how the data is used. This balance helps create models that pass the "reasonability test," where forecasts are believable to stakeholders like sales managers.
To ensure model quality, use validation tools and techniques such as holdout testing, cross-validation against historical data, and backtesting forecasts against known outcomes.
There is no single measure or approach for validating a statistical model, and a subjective element exists in ensuring a model passes the reasonability test. The validation approach depends on the model's intended use. Short-term models might prioritise fit and accuracy, while long-term models might focus on handling long-term trends.
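One simple check among many is a holdout test: withhold the most recent period, forecast it from the earlier history, and measure the error. The sketch below (hypothetical figures, and a hypothetical 20% tolerance) shows the naive trend model from the earlier example failing exactly this kind of test.

```python
# Hypothetical five-year history; hold out year 5 as the "unseen" actual.
sales = [100, 105, 220, 440, 430]
train, actual = sales[:-1], sales[-1]

# Naive trend forecast from training data: repeat the last growth rate.
forecast = train[-1] * (train[-1] / train[-2])  # 880

# Compare against the held-out actual.
error = abs(forecast - actual) / actual
print(f"holdout error: {error:.0%}")

# The tolerance is a judgment call tied to how the forecast will be used.
if error > 0.20:
    print("model fails the reasonability test on held-out data")
```

A short-term model might be judged on exactly this kind of one-step error, while a long-term model would be backtested over several periods instead.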
CauSelf integrates these flexible capabilities, empowering you to take ownership of model development. It helps you develop insights, build confidence in the models, and adapt quickly as more data becomes available — so your forecasts pass the reasonability test every time.
Book a free demo →