It's football finals time in Australia — a great season for sports lovers and an interesting period for stats nerds.
During this time, many predictions based on small and unique data sets emerge. A common example is: "Since 1952, the Cats have not lost a grand final when leading at half-time," implying that if the Cats are leading at half-time, they should win. But sometimes, they don't.
Similar types of predictions appear in more serious contexts, such as the US election, where predictors use criteria like the candidate's voice depth or height. These predictions have about as much statistical rigour as an octopus predicting the World Cup winner.
This phenomenon sheds light on how we sometimes analyse our data, either manually or through statistical forecasts.
We might use BI tools or Excel to drill down into a unique, small data set. For instance, we might discover that the average volume at the $1.99 price point was 10,000 units, but overlook the rest of the relevant data, placing too much weight on that small set.
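As a minimal sketch of why that $1.99 average deserves caution (all figures here are hypothetical, chosen so the promotional mean comes out at 10,000 units): with only a handful of observations, the mean hides an enormous spread.

```python
import statistics

# Hypothetical sales history: (price, units_sold) observations.
# Only three promotions ever ran at the $1.99 price point.
history = [
    (2.49, 6200), (2.49, 5900), (2.49, 6100), (2.49, 6000),
    (2.49, 5800), (2.49, 6300), (2.49, 6050), (2.49, 5950),
    (1.99, 14000), (1.99, 9500), (1.99, 6500),
]

promo = [units for price, units in history if price == 1.99]
baseline = [units for price, units in history if price == 2.49]

# The headline number: "average volume at $1.99 was 10,000 units".
print(f"$1.99 mean: {statistics.mean(promo):.0f} units over {len(promo)} promotions")
# But the three promotions ranged from 6,500 to 14,000 units.
print(f"$1.99 spread: {min(promo)}-{max(promo)} units")
print(f"$2.49 mean: {statistics.mean(baseline):.0f} units over {len(baseline)} weeks")
```

One of the three promotions sold fewer units than a typical full-price week, so the 10,000-unit average says very little on its own.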
This is an example of "overfitting," a statistical concept where a model mistakes random noise in its training data for a genuine pattern. This challenge often arises in Machine Learning, especially with limited data, such as in the Consumer Goods sector, where product history might span only 3–4 years.
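Overfitting is easy to demonstrate with a few years of data. The sketch below (hypothetical sales figures) fits four yearly observations exactly with a cubic — the interpolating polynomial — so the model has zero error on the data it has seen, yet produces an implausible jump the moment it extrapolates.

```python
years = [1, 2, 3, 4]
sales = [100, 95, 180, 360]  # hypothetical noisy history, roughly trending up

def lagrange(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Perfect fit on the history it has seen...
print([round(lagrange(years, sales, y)) for y in years])  # [100, 95, 180, 360]

# ...but the extrapolation nearly doubles year-4 sales.
print(round(lagrange(years, sales, 5)))  # 640
```

A simpler model with some error on the historical data would almost certainly extrapolate more sensibly — which is the heart of the overfitting trade-off.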
Consider a product that underperformed in its first two years but saw a sales surge in the third and fourth years due to a strong marketing campaign. An unchecked ML approach might predict continued doubling of sales, leading to an overly optimistic forecast for the fifth year. A sales manager would likely see a 100% growth prediction as unrealistic and lose confidence in the numbers.
The key is to have a tool that blends powerful modelling with the ability to adjust how the data is used. This balance helps create models that pass the "reasonability test," where forecasts are believable to stakeholders like sales managers.
To ensure model quality, use validation tools and techniques such as holdout testing, cross-validation against historical data, and backtesting forecasts against known outcomes.
There is no single measure or approach for validating a statistical model, and a subjective element exists in ensuring a model passes the reasonability test. The validation approach depends on the model's intended use. Short-term models might prioritise fit and accuracy, while long-term models might focus on handling long-term trends.
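One simple check among many is a holdout test: withhold the most recent period, forecast it from the earlier history, and measure the error. The sketch below (hypothetical figures, and a hypothetical 20% tolerance) shows the naive trend model from the earlier example failing exactly this kind of test.

```python
# Hypothetical five-year history; hold out year 5 as the "unseen" actual.
sales = [100, 105, 220, 440, 430]
train, actual = sales[:-1], sales[-1]

# Naive trend forecast from training data: repeat the last growth rate.
forecast = train[-1] * (train[-1] / train[-2])  # 880

# Compare against the held-out actual.
error = abs(forecast - actual) / actual
print(f"holdout error: {error:.0%}")

# The tolerance is a judgment call tied to how the forecast will be used.
if error > 0.20:
    print("model fails the reasonability test on held-out data")
```

A short-term model might be judged on exactly this kind of one-step error, while a long-term model would be backtested over several periods instead.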
CauSelf integrates these flexible capabilities, empowering you to take ownership of model development. It helps you develop insights, build confidence in the models, and adapt quickly as more data becomes available — so your forecasts pass the reasonability test every time.
Book a free demo →