Newcomb's paradox
Apr. 15th, 2009 11:11 pm
Links
Newcomb's paradox on Wikipedia
Newcomb's paradox on Overcoming Bias blog
Newcomb's paradox on Scott Aaronson's blog and lectures
I first came across it via Overcoming Bias and discussed it with a few people, but recently saw it again in one of the transcriptions of Scott Aaronson's philosophy/quantum/computing lectures.
Newcomb's paradox
In brief, Newcomb's paradox goes like this. Suppose you're a professor, and a grad student (or, in some versions, a superintelligent alien) comes to you and demonstrates the following experiment. She chooses a volunteer, examines them, then takes two boxes, puts £1000 in box A and either £1000000 or nothing in box B (see below for how she decides). She brings the boxes into the room, explains the set-up to the volunteer, and says they may either take the mystery box B alone (in which case they get either a lot or nothing) or take both boxes (in which case they get at least £1000).
She even lets them see the £1000000 beforehand so they know it exists, and lets them peek into box A to confirm it really does contain the money, though the contents of box B stay secret until afterwards.
| Row | Choice | In box A | In box B | Total obtained |
|-----|--------|----------|----------|----------------|
| 1 | B only | £1000 | £0 | £0 |
| 2 | Both | £1000 | £0 | £1000 |
| 3 | B only | £1000 | £1000000 | £1000000 |
| 4 | Both | £1000 | £1000000 | £1001000 |
"What's the catch," the volunteer asks. "Ah," begins the experimenter. "I have previously examined you, and worked out which choice you're going to make. If you were going to choose both boxes, I put nothing in box B. Only if you were going to take box B only, did I choose to put £1000000 in it.
"Hm", says the volunteer. "What do I do?"
A few caveats
"What if the volunteer would change their mind when they discovered the reasoning, or is going to choose based on a coin toss?" "Then I didn't accept them as a volunteer."
"How do I know it works?" You can't be sure, but she performs the experiment lots of times and is always right, so you are convinced. (Some examples ask you to presume as part of the conditions that she can, or take it on trust, but I think "having seen it work" makes it most convincing and concrete.)
"Ah, but I don't care about £1000, and certainly not if I've got £1000000, so I don't care," Well, ask what you would do if the numbers were a bit different. Can you pretend there's no combination where you'd risk something to get the little one, yet risk more to get the big one?
"How did they know what they'd pick when this experiment was performed the first time?" It doesn't really matter, just assume that you have seen it working with pretty-perfect prediction.
What would you do? An enumeration of the two obvious arguments.
11: "Why should you take both boxes?" Duh! Because whatever's in either of the boxes, you get all of it. And if that means I fail to get the million, then it's already too late to change that, isn't it?
22: "Why should you take box B?" Duh! Because you've just seen 50 people do the experiment, and all the ones who took both got £1000 and all the ones who took B got £1000000. Follow what works, even if you can't justify it with maths.
That's why it's a paradox: if you squint long enough, both answers seem perfectly reasonable.
I know this seems a little convoluted, but I've tried to make it comprehensible, if terse, even to people (like me, in general) without much training in philosophy or much of an opinion of it, and hopefully to get it to the point where at least asking the question makes sense.
Wait, if we use Bayesian reasoning, I bet the arguments will instantly become transparent and non-controversial. Right?
11: As above. Look at the table and enumerate the possibilities. Whatever is in box B, choosing both boxes always gives a bigger payoff.
22: Ah, no, you're cheating. Based on the previous evidence, you must assume a priori that you are in row 2 or row 3 (the rows where the prediction matches the choice). After that, the choice is easy: row 3 gives more money. (See below for more "so, there's a 2/3 chance I'm in this universe..." type reasoning.)
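To see how the two arguments come apart, here's a minimal numerical sketch. It assumes (my assumption, not part of the original set-up) that the predictor is simply right with some probability p, whichever choice you make, and that you condition on that when choosing:

```python
# Expected payoffs in Newcomb's problem, assuming the predictor is right
# with probability p regardless of which choice you make.

SMALL = 1_000       # always in box A
BIG = 1_000_000     # in box B only if "B only" was predicted

def expected_payoff(choice: str, p: float) -> float:
    """Expected winnings for 'one-box' (B only) or 'two-box' (both), given accuracy p."""
    if choice == "one-box":
        # Box B is full only if the predictor correctly foresaw one-boxing.
        return p * BIG
    if choice == "two-box":
        # You always keep box A; box B is full only if the predictor was wrong.
        return SMALL + (1 - p) * BIG
    raise ValueError(f"unknown choice: {choice}")

for p in (0.5, 0.9, 0.99, 1.0):
    print(f"p={p}: one-box {expected_payoff('one-box', p):>9,.0f}, "
          f"two-box {expected_payoff('two-box', p):>9,.0f}")
```

With p = 0.5 (no predictive power at all) two-boxing wins by exactly £1000, which is argument 11 in disguise; once p is anywhere near 1, the one-box expectation dominates, which is argument 22. The disagreement is really over whether conditioning on p is still legitimate once the boxes are already on the table.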
Can we put this on a more rational footing? How does she predict what's going to happen?
In fact, there are several ways.
1. You can do it if you postulate time travel, or determinism plus a copy-and-teleport machine, but those are not very realistic things to postulate, whether or not they would be physically possible.
2. A super-intelligent alien scans your brain and models it in a computer.
3. They give you a short-term-memory-impairing drug and try the experiment out several times beforehand, while you remain the same person with the same experiences but have no memory of the trials.
4. They discover that 94% of men choose one way and 94% of women choose the other. (The experiment is double-blind, run by a technician who doesn't know the expected results; the grad student peeks at the statistics, tells you there's a 94% correlation with gender but not which way round it is, then invites you to participate yourself.)
Further arguments
33: Aha! In method 3, you don't know which run you're in: one of the trial runs, or the final experiment. The only consistent strategy that gets the big money is to assume you're more likely to be in a trial run, and hence to take only box B.
44: Aha! If so, then (*invokes Greg Egan*) the same reasoning applies to method 2. That suggests you don't know whether you're you, or the simulation of you!
55: Nope. Not so. What about method 4? Surely you can't claim that your consciousness might be either (a) you or (b) "the statistical correlation between gender and box choice"?
Which leaves us back where we started. (But remind me to come back to the "am I equally likely to be me, or some other human, or a simulation of a human?" question.)
Free will
"What does it have to do with free will?" Well, the experiment is completely (sort-of) practical to do. In theory. And so you'd think it should be also actually possible to choose which to take. And yet it doesn't seem to be, and the answer seems to depend maybe on whether you believe in something you can call "free will".
In fact, people divide between "take A and B", "take B only", and "the problem is stupid, I won't consider it". In general, I think that last answer is unfairly overlooked. In this case, though, if I'd seen the experiment work out as described, I'd agree to take only box B, even if I couldn't explain the mathematics behind it. However, I also definitely feel I should be able to justify one case or the other.
Informally, it seems most people eventually take B, but I don't know how important that is.
Apocrypha
Links to the prisoner's dilemma, links to the doomsday paradox, etc., etc.
no subject
Date: 2009-04-15 11:01 pm (UTC)
And if I don't get the huge sum, I hope I get some satisfaction out of proving the alien wrong about me. But as what I'd choose is something I've stated before and now written down, the super-intelligent alien should know and be prepared to give me the huge sum.
I'm ready to volunteer now.
no subject
Date: 2009-04-15 11:50 pm (UTC)
Reading the expanded thingies, answer 22 seems the right one; granting the premise that the tester is able to predetermine my response, only rows 2 and 3 of the table apply, and the rational answer is B.
In fact, I don't even need to be convinced that the tester can predict responses with perfect accuracy: the difference between £1000000 and £1000 is so large that even if I think they only get it right 50.1% of the time, it's still most rational to take just box B. If I take just B there's a 0.501 chance I'll get £1000000 and a 0.499 chance I'll get nothing (mean gain £501000), whereas if I take both there's a 0.499 chance I'll get £1001000 and a 0.501 chance I'll get £1000 (mean gain £500000).
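A quick sketch of that arithmetic, generalised to an arbitrary accuracy p (the stakes and the 50.1% figure are the commenter's; treating the predictor as right with probability p either way is the same modelling assumption they make):

```python
# Expected winnings as a function of the predictor's accuracy p.

A, B = 1_000, 1_000_000  # guaranteed box A, possible box B

def ev_one_box(p: float) -> float:
    # Box B was filled iff the predictor (correctly) foresaw one-boxing.
    return p * B

def ev_two_box(p: float) -> float:
    # You always get box A; box B is full only when the predictor was wrong.
    return A + (1 - p) * B

# Break-even accuracy: p*B = A + (1-p)*B  =>  p = (A + B) / (2*B)
p_break_even = (A + B) / (2 * B)

print(ev_one_box(0.501), ev_two_box(0.501))  # 501000.0 500000.0, matching the comment
print(p_break_even)                          # 0.5005, so 50.1% accuracy is already enough
```

So on this model the predictor only has to beat a coin flip by 0.05 percentage points before one-boxing has the higher expectation.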
no subject
Date: 2009-04-15 11:57 pm (UTC)
In any event, there is at this point either £0 or £1000000 in box B and a guaranteed £1000 in box A, so the only sensible thing to do is take both boxes.
What am I missing? Is it just the annoyance that someone worked out that you're not stupid to begin with? That wouldn't annoy me ;-)
no subject
Date: 2009-04-16 12:20 am (UTC)
In the coin-flipping example, am I assuming that Omega will only ask me once? Does Omega's prediction include the result of the coin toss?
no subject
Date: 2009-04-16 06:00 am (UTC)
Don't the conditions of the test mean that no rational human would be accepted? After all, the sensible thing to do before the reasoning is explained is to take both boxes, and after hearing the reasoning to take just box B. That's the whole point. So there is no paradox.
If they only accept volunteers who will not change their mind, then that's just like asking the volunteer to make their choice in advance of hearing the reasoning.
no subject
Date: 2009-04-16 08:48 am (UTC)
(Is this actually intended to be an analogy to religious belief of the 'pie in the sky when you die' variety? Because it works very well as such.
If you don't believe the 'super-intelligent alien' (i.e. God), you take both boxes and get the £1000 reward (you don't have to do awkward things for them in this life), but it turns out you have no afterlife / a bad afterlife (the £0 in the other box).
Whereas if you trust the 'super-intelligent alien', you don't get the £1000 of having an easier time in this world due to not having to pay attention to them, but you do get the £1000000 of their help / the afterlife, which is so much more that supposedly you don't miss the £1000 which was definitely in the first box but which you passed up to get the greater reward.)
no subject
Date: 2009-04-16 10:10 am (UTC)
My previous favourite answer was out of Hofstadter's article on the subject, which was that one-boxing gives you a choice between two desirable outcomes – either you get the big cheque, or you get to catch the predictor out in a mistake and show that it was fallible after all. Several people in the comments here have reinvented that one, which is encouraging (Hofstadter asked the question of lots of his friends and was surprised nobody used that reasoning); the bet-half-a-million idea is a particularly nice tweak on the same thing.
But to some extent all of those answers are sort-of-cheating, in that they're dodging the question. The answers that say "proving the predictor wrong is also desirable" or "I start by making side bets" are essentially adjusting the outcome grid into something they can work with more easily, and hence avoiding the real question of "yes, but what would you do if the outcome grid weren't adjusted?". Same goes for answers like "depends how rich I was already" (if I considered £1000 a negligible sum then I might be more inclined to grandstand on the chance of the big cheque by one-boxing, whereas if I was at serious risk of starvation or eviction and £1000 was enough to save me then that might well bias me in favour of minimising my risk by two-boxing). I think that to answer in the spirit of the problem, one has to assume that what's in the boxes is not a thousand and a potential million pounds, but a thousand and a potential million units of pure linearly-valued utility.
As far as my actual personal answer goes, I incline towards "problem is stupid". I'm generally not a fan of using unrealistic hypothetical situations as a means of self-discovery, because (a) they tend to have the problem that your armchair speculation about what you'd do doesn't match what you'd actually do when the emotional stress was real rather than imagined, and (b) they also tend not to tell you what you really wanted to know, in that what they really tell you is which of your decision-making rules of thumb broke down first in the face of being stretched beyond its warranty limit, rather than which one is in some meaningful sense more fundamental to your nature.
So my position for the moment is that my brain's decision-making methodology is currently not equipped to cope with situations involving an almost-perfect Newcomb-predicting entity, and since there is neither such a thing nor a realistic prospect of such a thing I have not yet had reason to consider this an important flaw in that methodology. If one comes along, I may reconsider – but, as you suggest in your post, the right response will probably depend a lot on what I know about how the predictor works.
(In the even more unlikely situation that such a predictor existed but all I knew about it was that it had a track record established by unimpeachable empirical research, I suppose I'd have to consider all the possible mechanisms by which it might work and think about which was most likely to be the real answer. Which is one of those "pull a prior out of your arse" situations, and no matter how much you dress it up with Bayesian-sounding language the reality is that one just has to take a flying guess and be prepared to accept it as the fortunes of war if one guesses wrong.)
no subject
Date: 2009-04-16 10:51 am (UTC)
However, having suspected something of the sort when the boxes were explained (because there has to be a purpose to the experiment, which is additional info), but before the alien explained how it decided whether to fill box B, I might well be the sort of person who would choose box B only.
How do I know which sort of person I am? The only way for me to tell is to actually go ahead and make the choice. Though I'm willing to concede the possibility that a good enough personality questionnaire might make a better prediction about me than I could make myself before actually doing it, and so I'm unlikely to bet against that possibility just to gain 0.1% more.
no subject
Date: 2009-04-16 11:13 am (UTC)
Anyway, while doing so I came to the conclusion that this and the prisoner's dilemma can be considered restatements of the same underlying problem. In the cold light of day and wakefulness I'm no longer quite so certain about that as I was last night, but I think it still mostly holds up.
In both cases there are four outcomes, determined by choices made by you and by another party. Moreover, at the point of your decision, the other party's choice will not be affected by your action (either because it's already been made, in the case of Newcomb's paradox, or because they aren't aware of your choice until the end of the experiment, in the prisoner's dilemma). And at that point, the rational short-term decision that maximises your gain is always obvious (taking both boxes always nets more money in the Newcomb's paradox case, since the position of the money is already set; ratting out your partner always nets you a lower sentence in the prisoner's dilemma).
And, of course, the interesting part of the question comes in once you consider some extra condition - the ability of the tester to predict your decision in the Newcomb's Paradox, the possibility of repeated tests in the prisoner's dilemma. And I think *these* can be cast as the same thing, since in the prisoner's dilemma the idea is that your partner's actions will be predicated on your past/probable behaviour.
For example, consider a case of the prisoner's dilemma where both parties can perfectly predict what their partner's decision will be. This is actually relatively straightforward to implement in real life: allow both parties to see what their opposite number has chosen, allow each to change their minds as many times as they like, and don't settle on any choice until both sides have agreed they are happy with it. In that case, it seems self-evident that both sides will settle on neither ratting out the other (since it's not in the interest of either to settle on a decision while their partner is ratting on them), even if this is the first test they've undertaken and there's no repetition. So, just like with Newcomb's paradox, once perfect prediction enters the equation, the rational choice becomes to take the option that seems irrational at the point of use (not ratting out your partner; taking box B only), because the extra information available to the other party means that they knew this would be your choice, and hence the prizes/penalties available have been adjusted to make it more favourable to you.
Oof, OK, really hope all of that makes sense now...
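To make that analogy concrete, here's a small illustrative sketch (the prisoner's-dilemma payoffs below are invented for illustration; only the Newcomb figures come from the post): in both games, once you assume the other party's move perfectly mirrors yours, the "irrational-looking" option comes out ahead.

```python
# Two payoff tables, indexed by (your move, other party's move).
# Newcomb: the other party's "move" is the prediction the boxes were filled under.
newcomb = {
    ("one-box", "predicted one-box"): 1_000_000,
    ("one-box", "predicted two-box"): 0,
    ("two-box", "predicted one-box"): 1_001_000,
    ("two-box", "predicted two-box"): 1_000,
}

# Prisoner's dilemma: payoffs here are utility, higher is better;
# the exact numbers are made up, only their ordering matters.
dilemma = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

def best_move_if_mirrored(table, mirror):
    """Best move assuming the other party's move is perfectly correlated with yours."""
    return max(mirror, key=lambda my_move: table[(my_move, mirror[my_move])])

print(best_move_if_mirrored(newcomb, {"one-box": "predicted one-box",
                                      "two-box": "predicted two-box"}))  # one-box
print(best_move_if_mirrored(dilemma, {"cooperate": "cooperate",
                                      "defect": "defect"}))              # cooperate
```

Without the mirroring assumption, row-by-row dominance picks two-box and defect in both tables, which is the point of the comment above.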
no subject
Date: 2009-04-16 11:22 am (UTC)
So the volunteer should decide on a coin toss, just to prove that the alien is indeed fallible.
no subject
Date: 2009-04-16 01:16 pm (UTC)
I'd take box B, and I do sort of justify that with reverse causality. The people who reason "The boxes have already been filled; I may as well just take everything that's on the table" are effectively making box B be empty and doing themselves out of £1m, assuming the alien can correctly predict they'll reason like that (and I think questioning too strongly the alien's predictive abilities on grounds of practical realism is sidestepping the point of the thought experiment).
And by reasoning the way I do, I effectively make the alien put £1m in box B.
What I can't do is reason my way, make the alien put £1m in box B, and then quickly-before-the-change-has-time-to-percolate-back-to-when-the-boxes-were-filled change my mind and grab both boxes and get £1001k. If the alien foresees the first bit of reasoning, he also foresees the second bit; if reverse causality applies at all (even if as a mental model for understanding the experiment and not in reality) then it applies to all your decisions.
no subject
Date: 2009-04-16 03:07 pm (UTC)
Let A be the amount of money in box A (that is, the amount that's reliably there but you can decide whether or not to take), and B be the amount in box B (that may or may not be there but if it is you will get it no matter what you choose).
For a start, let's fix B at its original £1m, and vary A.
Suppose A=0. It seems fairly clear to me that one-boxing is the sensible answer, just on the grounds that you don't know for absolute certain that reverse causality doesn't exist (even if it exists with a very low probability, or even a perhaps-not-actually-impossible event of probability zero!) and there's no better reason to choose between the two options anyway. If anyone can justify two-boxing when A=0 I'd be interested to hear the argument :-)
Now suppose A=1p. I think I would have to say that I still one-box (even though I'm undecided in the original case), because really the 1p would matter to me so little that it might as well be zero.
At the other extreme, suppose A and B are both £1m. I can't see any reason to one-box this time: if you one-box, you get either £1m or nothing, and if you two-box you get either £1m or £2m. No matter what you think the probabilities are in each case, the expected gain for two-boxing is equal or greater. No-brainer.
Now suppose A = £999999.99. Two-box again for me, no question. The 1p difference is unimportant to me, as before.
All of which tells me that somewhere in between these extremes is the critical value of A which would cause me to be perfectly undecided between one- and two-boxing – which puts the whole question on a quantitative footing, and suggests that one can be more subtle than dividing sentiences into "one-boxer" and "two-boxer".
Allowing B to vary as well as A is the point where one really gets into the nonlinear utility of money: I currently feel as if I'd be much more likely to two-box with A=£1bn and B=£1tn than I would with A=£1000 and B=£1m. Even a billion is more money than I can imagine being able to spend, so who cares if it isn't the trillion it might have been? (Though, of course, after I two-boxed and "only" got my billion, a few years later when I'd become accustomed to the resulting lifestyle I'd be kicking myself for not having held out for the extra £999bn :-) And conversely, if you head in the other direction, I'm pretty sure I'd one-box with A=£10 and B=£10000, because £10 is pretty negligible but £10000 would definitely come in handy.
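For what it's worth, under a simple expected-value model of the same set-up (this is my framing, not the commenter's: it assumes you assign some probability p to the predictor being right, and that money is valued linearly), the critical value of A mentioned above has a closed form: indifference occurs at A = (2p - 1) * B.

```python
# Critical value of A at which one- and two-boxing have equal expectation,
# assuming the predictor is right with probability p and utility is linear
# in money (both are modelling assumptions; the nonlinear-utility point in
# the comment above is exactly what this simple model leaves out).

def critical_A(B: float, p: float) -> float:
    """Solve p*B = A + (1-p)*B for A."""
    return (2 * p - 1) * B

for p in (0.5, 0.9, 0.99, 0.999):
    print(f"p = {p}: indifferent when A = £{critical_A(1_000_000, p):,.0f}")
# p = 0.5 gives A = 0 (a predictor no better than chance: any guaranteed A
# favours two-boxing), while p near 1 pushes the critical A up towards B.
```

That matches the commenter's intuition that the crossover lies strictly between A = 0 and A = B, and makes the "how much do you trust the predictor" question explicit.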