Model Evaluation With Proper Scoring Rules: A No-Math Introduction
Too Long; Didn't Read
Proper scoring rules provide a framework for evaluating probabilistic forecasts. Calibration tells us whether our predictions are statistically consistent with the observed events or values. Sharpness captures how concentrated the predictive distribution is, without reference to the actual outcomes. We also want our evaluation technique to be immune to ‘hedging’: betting on both sides of an argument, or on both teams in a competition. If a metric or score can be ‘hacked’ in this way, reported performance may not reflect true forecast quality.
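As a minimal sketch of why proper scoring rules resist hedging (a simulation of my own, not from the article), the snippet below uses the Brier score, a well-known proper scoring rule for binary events. Under an assumed true event probability of 0.7, honestly reporting that probability achieves a lower average score than hedging toward 50/50:

```python
import numpy as np

def brier_score(forecast_prob, outcome):
    """Squared error between the forecast probability and the 0/1 outcome."""
    return (forecast_prob - outcome) ** 2

rng = np.random.default_rng(0)
true_p = 0.7                              # assumed true event probability
outcomes = rng.random(100_000) < true_p   # simulated binary outcomes

honest = brier_score(true_p, outcomes).mean()  # report the true probability
hedged = brier_score(0.5, outcomes).mean()     # hedge toward 50/50

# Because the Brier score is proper, honesty wins on average
assert honest < hedged
```

Running the same comparison with an improper score, such as rewarding only whether the more-likely side occurred, would show the opposite: hedged or extreme forecasts can match or beat honest ones, which is exactly the failure mode the article warns about.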