How to score 50% predictions
Mar. 19th, 2018 09:39 am
I've recently been experimenting with removing stress from my daily[1] todo list by listing things I hoped to do but putting a likelihood on them, like "90% foo, bar, blah; 75% other thing" etc. I know this seems overcomplicated, but I find failing things I'd planned to do REALLY REALLY kills my motivation, so it's worth arranging things such that even if they go better or less well than expected, they fall into the broad range of "what I planned for". And it also means that I'm more pushed to put small, comparatively important things first, rather than starting with the difficult things and never getting to anything else.
I don't know if I will keep it up, but even just trying it raised several interesting questions.
Slatestarcodex sometimes posts predictions like this, usually for an upcoming year, to test where he's being honest with what he expects and where he isn't (usually about external factual things like politics, but some about himself). A question arises: how to score this? Especially the 50% ones.
You can cobble together some score which is maximised when 90% of the 90% predictions are true. I think there's some particular Bayesian probability thing that measures, given those expectations, how unlikely a particular outcome is (which equates to 'how wrong you are', which you try to minimise).
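As a rough sketch of what such a score might look like: the log score (the average negative log of the probability you gave to what actually happened) is one standard choice, and a per-percentage table of how often things came true sits nicely alongside it. The function names and the sample todo list below are made up for illustration, and it assumes every stated probability is strictly between 0 and 1.

```python
import math
from collections import defaultdict

def log_score(predictions):
    """Mean negative log-probability assigned to what actually happened.

    predictions: list of (stated_probability, came_true) pairs,
    with every probability strictly between 0 and 1 (assumption).
    Lower is better.
    """
    total = 0.0
    for p, came_true in predictions:
        # Probability that was effectively assigned to the real outcome.
        p_outcome = p if came_true else 1.0 - p
        total += -math.log(p_outcome)
    return total / len(predictions)

def calibration_table(predictions):
    """Fraction of predictions that came true, grouped by stated probability."""
    buckets = defaultdict(list)
    for p, came_true in predictions:
        buckets[p].append(came_true)
    return {p: sum(hits) / len(hits) for p, hits in buckets.items()}

# Hypothetical todo list: (stated probability, whether it got done)
todo = [(0.9, True), (0.9, True), (0.9, False), (0.75, True), (0.75, False)]
print(calibration_table(todo))
print(round(log_score(todo), 3))
```

The table tells you whether roughly 90% of the 90% items happened; the log score gives a single "how wrong was I" number to minimise.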
But the bit that gets confusing is, how to rate 50% predictions? For my system, I'm predicting what *will* get done, so I feel like it's clear if I'm over-estimating or under-estimating. But Scott had the problem that it seemed arbitrary whether he said "X will do Y" or "X will do not-Y", so the 50% predictions should come out looking random even if they're comically bad, which logically makes them impossible to score. And yet, you feel that if they're really bad, you should be able to recognise that in a systematic way. Maybe they should all be stated relative to the status quo? Or something else?
[1] Freudiano 'faily' :)
no subject
Date: 2018-03-19 11:34 am (UTC)
I'm pretty sure somebody argued in the follow-up comments (without quite the rigour of a Proper Proof but still pretty convincingly as I recall) that there was a fundamental difficulty along the lines of, if you're trying to get your accumulated score over a great many guesses as close as possible to the centre point of 'neither systematically under- nor overestimating', there's no way a scoring system can dis-incentivise the system-gaming technique of tracking your current running total and deliberately erring on the side of whatever will move it closer to the middle.
Unfortunately there seems to have been a rare archiving cockup on Mono in that particular subdirectory, so I can't go back and dig out the argument to see if it had any holes in it, or whether it made an assumption about the type of scoring system that need not hold.
(Perhaps one could define a score with no centre point, i.e. the system awards penalty points with the same sign regardless of which way you err and doesn't track under- vs over-estimates anyway; or perhaps one could randomise the sense of each prediction, and treat some 75% predictions of X as 25% predictions of not-X, or some such. But there are probably still secondary system-gamings possible, such as skewing which kinds of event you even try to predict, going for mostly almost-sure things or mostly 50%ish things...)
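To make the first of those ideas concrete, here's a minimal sketch using a squared-error penalty (usually called the Brier score). That's just one possible sign-free score, not necessarily what anyone in the original thread had in mind, and the names are invented for the example. It shows that restating "75% X" as "25% not-X" leaves the penalty unchanged, and that a 50% prediction gets charged 0.25 whichever way it turns out, so a single one on its own tells you nothing.

```python
def brier_penalty(p, came_true):
    """Squared error between the stated probability and the 0/1 outcome.

    Never negative, so over- and under-confidence are penalised
    with the same sign and there is no centre point to steer towards.
    """
    outcome = 1.0 if came_true else 0.0
    return (p - outcome) ** 2

def flip(p, came_true):
    """Restate 'p chance of X' as '(1 - p) chance of not-X'."""
    return 1.0 - p, not came_true

# Flipping the sense of a prediction leaves its penalty unchanged.
for p, came_true in [(0.75, True), (0.75, False), (0.9, False)]:
    original = brier_penalty(p, came_true)
    flipped = brier_penalty(*flip(p, came_true))
    assert abs(original - flipped) < 1e-12

# A 50% prediction is charged the same whichever way it turns out.
print(brier_penalty(0.5, True), brier_penalty(0.5, False))  # 0.25 0.25
```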
Anyway, it certainly does seem to me that a necessary robustness property for any scoring system of this type is that it should reward you for not cheating, i.e. for making a good-faith effort to estimate the probability of each predicted event as accurately as you can, independently of what other events might have already come up and what your current score might be. And whether or not the not-quite-proof I mention was sound, it seems clear that such a robustness property is at the very least difficult to achieve...
no subject
Date: 2018-03-19 12:10 pm (UTC)