How to score 50% predictions
Mar. 19th, 2018 09:39 am
I've recently been experimenting with removing stress from my daily[1] todo list by listing things I hoped to do but putting a likelihood on them, like "90% foo, bar, blah; 75% other thing" etc. I know this seems overcomplicated, but I find failing things I'd planned to do REALLY REALLY kills my motivation, so it's worth arranging things such that even if they go better or less well than expected, they fall into the broad range of "what I planned for". And it also means that I'm more pushed to put small, comparatively important things first, rather than starting with the difficult things and never getting to anything else.
I don't know if I will keep it up, but even just trying it raised several interesting questions.
Slatestarcodex sometimes posts predictions like this, usually for an upcoming year, to test where he's being honest with what he expects and where he isn't (usually about external factual things like politics, but some about himself). A question arises: how to score this? Especially the 50% ones.
You can cobble together some score which is maximised when 90% of the 90% predictions are true. I think there's some particular Bayesian probability thing that measures, given those expectations, how unlikely a particular outcome is (which equates to 'how wrong you are', which you try to minimise).
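One candidate for the "probability thing" is a proper scoring rule such as the log score or the Brier score. Here is a rough sketch, assuming each prediction is just a stated probability plus whether it came true; the function names and the example todo list are made up for illustration.

```python
import math

def log_score(p, happened):
    """Log of the probability you gave to what actually happened (0 is best)."""
    return math.log(p if happened else 1.0 - p)

def brier_score(p, happened):
    """Squared gap between the stated probability and the 0/1 outcome (0 is best)."""
    return (p - (1.0 if happened else 0.0)) ** 2

def expected_log_score(stated, true_freq):
    """Average log score if you say `stated` but things happen `true_freq` of the time."""
    return true_freq * math.log(stated) + (1.0 - true_freq) * math.log(1.0 - stated)

def mean_log_score(predictions):
    """Average log score over a list of (probability, happened) pairs."""
    return sum(log_score(p, h) for p, h in predictions) / len(predictions)

# Hypothetical todo list: (stated probability, did it get done?)
todo = [(0.9, True), (0.9, True), (0.9, False), (0.75, True), (0.75, False)]
print("mean log score:", mean_log_score(todo))

# If things really happen 90% of the time, saying 90% beats saying 80% or 99%.
for stated in (0.8, 0.9, 0.99):
    print(stated, expected_log_score(stated, 0.9))
```

The last loop is the "maximised when 90% of the 90% predictions are true" property: under these rules your expected score is best when the number you state matches the real frequency.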
But the bit that gets confusing is, how to rate 50% predictions? For my system, I'm predicting what *will* get done, so I feel like it's clear if I'm over-estimating or under-estimating. But Scott had the problem that it seemed arbitrary whether he said "X will do Y" or "X will do not-Y", so the 50% predictions should come out random even if they're comically bad, which logically makes them impossible to score. And yet, you feel that if they're really bad, you should be able to recognise that in a systematic way. Maybe they should all be stated relative to the status quo? Or something else?
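The 50% problem can be made concrete with a small sketch, assuming the log score as the rule (my choice for illustration, not necessarily what Scott uses). A 50% prediction scores the same whichever way it turns out, and the "how many of my 50% predictions came true" count depends entirely on which way round each statement happened to be phrased. All names here are hypothetical.

```python
import math

def log_score(p, happened):
    """Log of the probability given to what actually happened."""
    return math.log(p if happened else 1.0 - p)

def hit_rate(outcomes):
    """Fraction of predictions that came true."""
    return sum(outcomes) / len(outcomes)

# A 50% prediction gets the same score whether or not it comes true.
print(log_score(0.5, True), log_score(0.5, False))

# Ten 50% predictions, phrased so that every single one failed.
outcomes = [False] * 10
print(hit_rate(outcomes))  # 0.0

# Restate every other one as "X will do not-Y": same beliefs, same world.
rephrased = [(not o) if i % 2 == 0 else o for i, o in enumerate(outcomes)]
print(hit_rate(rephrased))  # 0.5

# The total log score cannot tell the two phrasings apart.
total_before = sum(log_score(0.5, o) for o in outcomes)
total_after = sum(log_score(0.5, o) for o in rephrased)
print(math.isclose(total_before, total_after))  # True
```

So a 0% hit rate at 50% only looks "comically bad" if the direction of each statement carries meaning, which is why anchoring on something like the status quo is tempting.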
[1] Freudiano 'faily' :)
no subject
Date: 2018-03-20 11:36 am (UTC)
Imagine Alice plans to go swimming at random, but averaging one day in five. Eve can secretly take notes on Alice and make predictions, and predicting 20% each day will do fine: the proper scoring rules I've discussed create no perverse incentives for Eve. However, if Alice herself starts predicting she'll go swimming with 20% probability each day, then each day there's a perverse incentive not to go swimming, because it makes the predictions better. To a certain extent you can get around this by batching things: e.g. "61% chance of 6 or more swims in a 31-day period" or maybe even "binomial distribution, n=31, p=0.2". However, aspirational stretch goals that can't be made to go above 50% with any amount of aggregating may be an unavoidable perverse incentive trap.
Possibly you want to be specifically avoiding pushing yourself too hard, in which case the aspirational stretch goals problem isn't a problem.
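The batched figure in the comment ("61% chance of 6 or more swims in a 31-day period") can be checked with a short sketch, assuming each day is an independent 20% chance as the binomial framing implies. The function name is made up for illustration.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of at least k successes in n tries."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance of 6 or more swims in 31 days at 20% per day.
print(round(prob_at_least(6, 31, 0.2), 2))  # 0.61
```

The same function shows how the threshold moves the batched probability: asking for fewer swims pushes it up, asking for more pushes it down, which is the knob you'd turn to get a batched goal above 50%.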