
Beware the Dodgy Prediction

Tim Williamson


It's football finals time in Australia – a great season for sports lovers and an interesting period for stats nerds.


During this time, many predictions based on small and unique data sets emerge. A common example is: "Since 1952, the Cats have not lost a grand final when leading at half-time," implying that if the Cats are leading at half-time, they should win. But sometimes, they don't.

Similar predictions appear in more serious contexts, such as US presidential elections, where pundits rely on criteria like a candidate's voice depth or height. These predictions carry about as much statistical rigor as an octopus picking the World Cup winner.


What's the Relevance to Business?

This phenomenon sheds light on how we sometimes analyze our data, either manually or through statistical forecasts.


Manual Analysis

We might use BI tools or Excel to drill down into a small, one-off data set. For instance, we might find that average volume at the $1.99 price point was 10,000 units, and place too much weight on that handful of observations while overlooking other relevant data.
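As a rough illustration, the snippet below (using pandas; the data and column names are invented) shows how easy it is to anchor on an average drawn from just a few promoted weeks:

```python
# A minimal sketch of drilling into a small price-point slice; all figures are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "week":   range(1, 11),
    "price":  [2.49, 2.49, 1.99, 2.49, 1.99, 2.49, 2.49, 1.99, 2.49, 2.49],
    "volume": [7200, 6900, 10500, 7100, 9600, 7000, 6800, 9900, 7300, 7050],
})

# Drill down to the $1.99 weeks only.
promo = sales[sales["price"] == 1.99]
print(f"$1.99 weeks: n={len(promo)}, average volume={promo['volume'].mean():.0f}")

# With only three observations, that ~10,000-unit average says little about the
# next promotion, yet it is tempting to treat it as the expected uplift.
```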


Overfitting

This is an example of "overfitting", where a statistical model learns patterns from random noise rather than genuine signal. The challenge often arises in Machine Learning, especially with limited data, such as in the Consumer Goods sector, where a product's history might span only 3-4 years.

Consider a product that underperformed in its first two years but saw a sales surge in the third and fourth years due to a strong marketing campaign. An unchecked ML approach might predict continued doubling of sales, leading to an overly optimistic forecast for the fifth year. A sales manager would likely see a 100% growth prediction as unrealistic and lose confidence in the numbers.
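As a rough sketch of that risk (not the Cauself method, and with invented figures), the snippet below contrasts a naive growth-rate extrapolation with a damped alternative:

```python
# Hypothetical yearly sales: two flat years, then a campaign-driven surge.
yearly_sales = [1000, 1100, 2200, 4300]

# Naive approach: project the most recent year-on-year growth straight forward.
recent_growth = yearly_sales[-1] / yearly_sales[-2]       # roughly 1.95, i.e. near-doubling
naive_year5 = yearly_sales[-1] * recent_growth
print(f"Naive year-5 forecast: {naive_year5:.0f} units")  # about 8,400 units

# Damped alternative: shrink the growth factor toward 1 before projecting,
# so one exceptional year does not dominate the forecast.
damping = 0.3                                             # hypothetical damping weight
damped_growth = 1 + damping * (recent_growth - 1)
print(f"Damped year-5 forecast: {yearly_sales[-1] * damped_growth:.0f} units")
```

A sales manager is far more likely to accept the damped figure, which still reflects the surge without assuming it simply repeats.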


Balancing Automation and Manual Intervention

The key is to have a tool that blends powerful modeling with the ability to adjust how the data is used. This balance helps create models that pass the "reasonability test", where forecasts are believable to stakeholders such as sales managers. Another important consideration is ensuring the design of the model recognises how and why the model will be used.


Assessing Model Quality

To ensure model quality, use tools and techniques such as:


  • Fit and accuracy measures: Back-fitted MAPE, accuracy histograms, and absolute errors (see the sketch after this list).

  • Diagrams: Scatter diagrams showing price-volume effects.

  • Data slicing: Analyze how the model reacts to specific conditions.

  • Test data sets: Retain the last six weeks of historical data for testing model performance.

  • Long-term testing: Extend the horizon to 104 weeks with test data to check for unrealistic trends.
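As a minimal sketch of the hold-out idea (the weekly series, the six-week split and the placeholder forecast are all assumptions, not a real model):

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error over paired actuals and forecasts."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actuals, forecasts)) / len(actuals)

# Invented weekly volumes; the last six weeks are retained as test data.
history = [980, 1010, 1050, 990, 1020, 1080, 1100, 1060, 1030, 1090, 1120, 1150]
train, test = history[:-6], history[-6:]

# Placeholder forecast: carry the last training value forward.
forecast = [train[-1]] * len(test)

print(f"Hold-out MAPE over the last six weeks: {mape(test, forecast):.1f}%")

# The same measure can be computed over the back-fitted training period, and the
# horizon extended (e.g. to 104 weeks) to check the projected trend stays plausible.
```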


There is no single measure or approach for validating a statistical model, and a subjective element exists in ensuring a model passes the reasonability test. The validation approach depends on the model's intended use. Short-term models might prioritize fit and accuracy, while long-term models might focus on handling long-term trends.


Cauself Solution

Our Cauself solution integrates these flexible capabilities, empowering you to take ownership of model development. It helps you develop insights, build confidence in the models, and adapt quickly as more data becomes available.


Want to learn more? Visit www.cauself.com.
