Problem: You have 13 randomly-selected playing cards from a deck. The probability of the first card drawn from a full deck matching one of yours is 1/4 (25%). What is the probability of the second card matching one, if the first card doesn't? And the 3rd, 4th, etc.?
      – Lee J Haywood, 2008-09-15 at 21:28:33   (86 comments)

On 2008-09-15 at 21:32:23, Lee J Haywood wrote...
This deceptively simple problem has had me thinking about it all day. I started with a Monte Carlo simulation, which tells me roughly the decimal probabilities for each card drawn from the full deck, but I've not managed to work out the corresponding fractions from first principles (or guesswork) - even with a smaller number of cards, e.g. 3 and 12 instead of 13 and 52.
On 2008-09-15 at 22:13:07, Lee J Haywood wrote...
In the case of 3 random cards and a full deck containing 12 cards, the probabilities are 55/220, 45/220, 36/220, 28/220, 21/220, 15/220, 10/220, 6/220, 3/220 and 1/220 - no idea why though. (-: (55/220 = 3/12 = 1/4).
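Those fractions can be checked mechanically. Here is a short sketch (added for illustration; not Lee's actual code) that reproduces them with exact arithmetic - and also hints at where they come from: the numerators turn out to be binomial coefficients C(12-k, 2), over C(12, 3) = 220.

```python
from fractions import Fraction
from math import comb

# m = 3 of our cards in a full deck of n = 12 (Lee's small example).
n, m = 12, 3
total = comb(n, m)                      # 220 equally likely target layouts

for k in range(1, n - m + 2):           # a match is guaranteed by draw 10
    # Product form: the first k-1 draws miss, then draw k hits.
    p = Fraction(m, n - k + 1)
    for i in range(1, k):
        p *= Fraction(n - m - i + 1, n - i + 1)
    # Closed form: the other m-1 targets sit among the n-k later positions.
    assert p == Fraction(comb(n - k, m - 1), total)
    print(f"{p * total}/{total}")       # 55/220, 45/220, 36/220, ...
```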
On 2008-09-15 at 23:13:01, Beeba wrote...
wouldn't it be 25% for the first card, 25.5% for the second, 26% for the third, etc.? i'm just doing the simple math here, i must be missing what makes this so complicated.
On 2008-09-15 at 23:33:26, Lee J Haywood wrote...
It's not that easy, although similar to what I started out thinking. The probabilities go down - not up, if you expect them to add up to 100%. According to my Monte Carlo simulation, the probability for the second card is about 0.191, the third is 0.145, etc. - they're part of a series. If you keep a running total at each stage, it's 0.4546 by the second card and 0.618 by the third. I'm homing in on the fractions by starting with smaller deck sizes, 1:2, 1:3, 2:3, 2:4, 2:5, 3:4, 3:5, etc. and seeing the patterns. Say you have 2 cards and the full deck has 6, then the probabilities work out as 1/3, 4/15, 1/5, 2/15 and 1/15, which is actually 5/15, 4/15, 3/15, 2/15 and 1/15. I'm not clear as to why this is, though. Anyway, I'm late for bed - I'll carry on tomorrow.
On 2008-09-16 at 03:00:12, Beeba wrote...
that was so far over my head, lee.
On 2008-09-16 at 17:20:51, Lee J Haywood wrote...
Okay, so I've figured it all out and put the results here... http://pr0gramm3rs.googlepages.com/cards.txt Determining where the fractions come from is left as an exercise for the reader... but as a clue, for the second card it's (52-13)/52 × 13/(52-1) - the probability that the first card didn't match, multiplied by the probability that the 13 cards you still have will match one of the remaining 51 cards.
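Lee's clue extends to every card: multiply the running probability that no card has matched yet by the chance that the next one does. A sketch with exact fractions (the variable names are mine, not from either program in the thread):

```python
from fractions import Fraction

n, m = 52, 13                      # full deck size, cards held
no_match_yet = Fraction(1)         # P(no match in the draws so far)

for k in range(1, n - m + 2):      # draw 40 is the last possible first match
    p_k = no_match_yet * Fraction(m, n - k + 1)
    print(k, p_k, float(p_k))      # 1/4, then 13/68, 247/1700, ...
    # Draw k missed: one fewer non-target among one fewer remaining card.
    no_match_yet *= Fraction(n - m - (k - 1), n - (k - 1))
```

The printed fractions agree with Lee's table, and they sum to exactly 1 over all 40 possible first-match positions.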
On 2008-09-17 at 02:26:49, BorgClown wrote...
@Lee J Haywood: A card match is matching the number or the suit? I ask because 13 cards make a suit, so it looks like it's suit-based.
On 2008-09-17 at 03:20:38, Beeba wrote...
i assumed a match was a suit and number. the 13 cards are selected from a different deck. what i don't understand is why the odds for the 2nd card and so on should be any more complicated to calculate than the first, why doesn't the simple formula apply? the odds for the first card are 25%, derived from 13/52. why would the formula for the 2nd card not be 13/51, and so on? all lee's numbers just confuse the hell out of me.
On 2008-09-17 at 04:20:59, BorgClown wrote...
@Beeba: Right, missed the 'full deck' part.
On 2008-09-17 at 04:26:04, BorgClown wrote...
@Beeba: I think the complex part is calculating the condition that the previous card(s) didn't match. That means tying one probability to another, decreasing one. Once you draw 39 cards that didn't match, any one of the remaining 13 will match, that's why Lee's numbers stop at 40.
On 2008-09-17 at 04:27:57, BorgClown wrote...
@Beeba: I don't get the second set of numbers though.
On 2008-09-17 at 06:29:04, Beeba wrote...
@BorgClown: i get why the numbers stop at 40. what i don't get is tying one probability to another, it seems like an error or misunderstanding of probability to me (not that i necessarily know what i'm talking about).
On 2008-09-17 at 06:31:05, Beeba wrote...
so here is my thinking. if you flip a coin 10 times, the odds that it will land on heads every time is very low. however, if it lands on heads 9 times, the odds that it will land on heads the 10th time is STILL only 50:50. the probability of a larger pattern of events does not affect the isolated probability of those events. am i making sense?
On 2008-09-17 at 06:36:39, Beeba wrote...
and yes the probability that you would draw 40 non-matching cards is extremely low. however, it is equal to the probability that you would draw 39 non-matching cards followed by 1 match, or 30 non-matches followed by 10 matches, or any possible combination you can conceive of. so the assumption that the probability of a drawn card being a non-match is altered by the fact that previous drawn cards were non-matches in any way beyond the simple elimination of the previous card seems, to me, in error.
On 2008-09-17 at 07:01:40, BorgClown wrote...
I've narrowed it down to this:
1: You take a set A of 13 cards out of deck A. That's 13 times a random card. 13 × 1/52 = 13/52.
2: You deal a card at a time from deck B. Each card has 13/52 of matching any card of set A.
3: With each card you deal from deck B it's more and more probable it will match, until 40, where it's a certainty.
4: Taking these, I managed to work out this formula: P(N) = 13 / (52-N+1)
On 2008-09-17 at 07:07:26, BorgClown wrote...
What <tt>P(N) = 13 / (52-N+1)</tt> means is that the standard probability of a random card (1/52) has 13 chances of matching the set of 13 cards (13/52). And it has more and more chances the more cards you draw (52-N+1). I'm figuring out where the +1 comes from, but I think the formula is correct. To test it I made a Python program which doesn't correlate with Lee's program, so I'm eager to discuss this further.
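One way to reconcile the two sets of numbers (an editorial sketch, not code from either participant): P(N) = 13/(52-N+1) is the probability that card N matches *given* that no earlier card did - which is also Beeba's 25%, 25.5%, ... series - while Lee's figures multiply in the chance of even reaching card N without a match. Computed side by side:

```python
from fractions import Fraction

n, m = 52, 13
no_match_yet = Fraction(1)          # P(no match among the first N-1 draws)

for N in range(1, 6):
    conditional = Fraction(m, n - N + 1)        # 13/(52-N+1): match now, given no match yet
    unconditional = no_match_yet * conditional  # Lee's per-card figures: 1/4, 13/68, ...
    print(N, float(conditional), float(unconditional))
    no_match_yet *= 1 - conditional
```

For N=2 the conditional value is 13/51 ≈ 0.2549 and the unconditional one is 13/68 ≈ 0.1912 - both "correct", for different questions.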
On 2008-09-17 at 07:15:48, Beeba wrote...
@BorgClown: i'm not great with math, just trying to examine the problem logically, but i think your formula basically fits my understanding that the probability increases with each card. it's pretty simple, right? but lee was saying that it isn't that simple, and that the probability actually decreases rather than increases, then increases again? i don't know, i am terribly confused by his posts.
On 2008-09-17 at 07:23:13, BorgClown wrote...
@Beeba: According to his number listings, it only increases, and it does so incredibly fast - the third draw already has a 59% probability of matching. It should increase steadily.
On 2008-09-17 at 07:27:15, Beeba wrote...
@BorgClown: lee said "According to my Monte Carlo simulation, the probability for the second card is about 0.191, the third is 0.145". he must have changed it. but i see no reason why the third draw should have a 59% chance, if that isn't an error as i described in my recent posts i would really like to hear an explanation for why it is not.
On 2008-09-17 at 07:30:43, BorgClown wrote...
@Beeba: Remember that the first has 25% probability. It isn't random, it really has a 13/52 = 1/4 probability.
On 2008-09-17 at 07:35:45, BorgClown wrote...
I also don't consider myself good at probability, so maybe we are set for a math whopping in a few hours.
On 2008-09-17 at 07:38:28, Beeba wrote...
@BorgClown: right, and by my calculation the second card should have about a 25.5% chance, the third card should have a 26% chance, etc. and increasing very gradually until it nears the end. the third draw should not have a 59% chance, that probability should not be reached until the 33rd-34th card is drawn.
On 2008-09-17 at 07:41:47, Beeba wrote...
as far as i can tell lee's math seems to be based on the assumption that with every consecutive non-matching card that is drawn, the probability for the following card to continue in that pattern decreases significantly. i'm pretty sure that premise is false.
On 2008-09-17 at 07:44:13, BorgClown wrote...
@Beeba: Same as mine, card 31 has 59.1%. These problems are tricky, because ignoring one factor can throw you way off. Let me reread the comments before commenting further, but so far I agree with your calculations, which you have held from the beginning.
On 2008-09-17 at 07:55:39, Beeba wrote...
i think the factors involved are simple, and that lee is reading into factors which are not real. returning to my coin-flip analogy, if you flipped a coin 4 times and it landed heads all 4 times (HHHH), you would naturally think that is an 'unlikely' outcome. but then, how likely is the outcome THTH? or TTHH? or THHT? the answer: they are all equally likely. no one outcome is more likely than any other. the only reason we react the way we do to HHHH and not to HTHT is that our brains' subjective pattern recognition is fixating on the former and tricking us. this same principle should hold true for any number of coin flips (or card selection in this case).
On 2008-09-17 at 07:56:46, BorgClown wrote...
@Beeba: It looks like we don't calculate the 'previous one(s) didn't match' condition. But we really do, because otherwise all cards would have a 25% probability of matching. This is a perfect pretext to write another small program to simulate the actual experiment 100 times.
On 2008-09-17 at 07:57:42, BorgClown wrote...
@Beeba: I still agree with you. It's a pity I have to resort to brute force to prove our points. It makes me feel so MENSA-unworthy.
On 2008-09-17 at 07:58:46, BorgClown wrote...
@Beeba: Just to make it clear, you're right in the pattern. Previous random events do not influence future random events, even if we tend to think otherwise.
On 2008-09-17 at 08:01:12, Beeba wrote...
@BorgClown: the only reason it doesn't remain 25% is because we are dealing with a finite number of cards and with each card drawn the odds must be recalculated to reflect the new number of remaining cards.
On 2008-09-17 at 08:01:46, Beeba wrote...
that is quite different from calculating the probability of a perceived overall pattern.
On 2008-09-17 at 08:03:54, Beeba wrote...
if lee can demonstrate that we are wrong it will totally blow my mind. c'mon lee let's hear it. and if you could try to dumb it down and explain your equations in plain english that would be most helpful.
On 2008-09-17 at 08:07:45, BorgClown wrote...
@Beeba: You got me thinking. Our probabilities increase because we are diminishing the deck, but we aren't really taking into account the non-matches. Our calculations have to be adjusted to include the probability of mismatching all the previous deals.
On 2008-09-17 at 08:10:42, Beeba wrote...
@BorgClown: that is what i was trying to address with my coin-flip analogy, the previous non-matches should be irrelevant and have no effect on the probability of the next card drawn being a match. if we draw a matching card the only way it affects probability is that the following card would have a 12/n chance (n being the number of cards remaining from the full deck) rather than 13/n.
On 2008-09-17 at 08:12:56, BorgClown wrote...
@Beeba: But the question asks for a specific pattern, for card 3, for example, it has to be 0,0,1. Calculating the probability of a pattern is harder than calculating a single card.
On 2008-09-17 at 08:15:16, BorgClown wrote...
@Beeba: Gah, I really have to sleep. When I wake up I'll think more about this, and at noon I'll have the time to make that 1000-run simulation. Brute force FTW!
On 2008-09-17 at 08:24:32, Beeba wrote...
@BorgClown: hmmm. it's true that in this case 0, 0, 0 is more likely than 0, 0, 1, which is far more likely than 1, 1, 1. but i still think that that is only due to the sum of their parts, and that to look at the pattern as a whole as an additional force of probability is a mistake. if you do that you are essentially taking the same probability in a different context and then adding it to the equation on top of itself, accounting for the same factors multiple times.
On 2008-09-17 at 11:23:14, Lee J Haywood wrote...
Perhaps the explanation wasn't really good enough - I had to cram it into a topic title - but it relates to a standard card game played for money in pubs, and my aunt had posed the question. I figured it would be a simple summation or multiplication, but the Monte Carlo simulation told me otherwise and I had to figure out what was going on with simpler examples than the larger 13:52 decks.
On 2008-09-17 at 11:35:52, Lee J Haywood wrote...
The issue is that once you've matched one of your 13 cards, you stop. I'll use the 2:7 deck sizes to illustrate what's actually going on, and label the 2 cards as AB and the full deck of 7 as ABCDEFG. For the first card, the probabilities are split AB:CDEFG, so a match is 2/7 (~29%). For the second card, you only calculate when the first card didn't match, i.e. 5/7ths of the time. Let's say that the first card was C (a mismatch), then the remaining choices are AB:DEFG - so a match is 2/6, and you multiply it with 5/7 to get 5/21. For the 3rd card, say the 2nd card was D so we have AB:EFG left and the chances of a match are 2/5 and overall (5/7)×(4/6)×(2/5) - 4/6 is the probability that the second card didn't match - i.e. 4/21. These examples only give the probabilities of a particular card matching, but the chances of having had a card match by the 3rd are (6/21)+(5/21)+(4/21). Oh, and of course when you draw the 6th card you're absolutely guaranteed to get a match.
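The 2:7 walk-through can be brute-force checked by enumerating all 7! equally likely deals (a quick sketch added for illustration; the card labels follow Lee's example):

```python
from fractions import Fraction
from itertools import permutations

targets = set("AB")                     # our 2 cards
deck = "ABCDEFG"                        # the full 7-card deck

counts = [0] * (len(deck) + 1)
for order in permutations(deck):        # 7! = 5040 equally likely deals
    # Position (1-based) of the first card that matches one of ours.
    first = next(i for i, c in enumerate(order, 1) if c in targets)
    counts[first] += 1

total = sum(counts)
for k, c in enumerate(counts[1:], 1):
    print(k, Fraction(c, total))        # 6/21 (i.e. 2/7), 5/21, 4/21, ...
```

The counts reproduce the 6/21, 5/21, 4/21, ... series exactly, with a zero at position 7 - the first match can never come later than draw 6.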
On 2008-09-17 at 11:46:13, Lee J Haywood wrote...
Part of the confusion in wording the question is that I really wanted the probabilities that you'd had any matches by a given number of cards. But to work that out, you have to look at the distribution of matches first and then add them up... my percentages indicate the summations, so it's the final column that shows the distributions requested by the topic's wording. http://pr0gramm3rs.googlepages.com/cards.txt
On 2008-09-17 at 20:43:36, BorgClown wrote...
@Lee J Haywood: I get your explanation, but amusingly, I'm stuck at figuring out why my own simulation comes up with only about 25% for the first card. I'm learning Python, so it might be a gotcha in the language that I'm not getting.
On 2008-09-17 at 20:46:20, BorgClown wrote...
@Lee J Haywood: I also read the Monte Carlo article on Wikipedia. I wasn't aware that probabilistic approaches were called the "Monte Carlo method". It's funny how it's used in more scenarios than we could guess.
On 2008-09-17 at 20:46:21, Lee J Haywood wrote...
What's wrong with getting 25% for the first card? Isn't that the right answer? (-:
On 2008-09-17 at 20:47:13, BorgClown wrote...
@Lee J Haywood: Oh, sorry. It has to be 25.0%, I get 21.25%.
On 2008-09-17 at 20:49:38, BorgClown wrote...
@Lee J Haywood: I form a random set of 13 cards, and reshuffle the second deck each run. The cards are drawn until a match is found, then the number of draws is accumulated. The percentage is calculated at the end.
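BorgClown's actual script isn't shown in the thread, but the procedure he describes might look roughly like this sketch (the names and run count here are mine):

```python
import random

RUNS = 100_000
targets = set(random.sample(range(52), 13))   # the 13 held cards
deck = list(range(52))

draws_histogram = [0] * 41                    # first match lands on draw 1..40
for _ in range(RUNS):
    random.shuffle(deck)                      # reshuffle the same list each run
    for i, card in enumerate(deck, 1):
        if card in targets:                   # stop at the first match
            draws_histogram[i] += 1
            break

print(100.0 * draws_histogram[1] / RUNS)      # should hover near 25.0
print(100.0 * draws_histogram[2] / RUNS)      # ... and near 19.1
```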
On 2008-09-17 at 20:50:23, Lee J Haywood wrote...
We got to do Monte Carlo simulations at university, although I didn't know the origin of the term. The idea is to solve problems with a large number of variables, not just probabilistic ones. We had to use purchase histories to run simulations against various stock levels in order to maximise profit - taking into account the shelf space, number of days it takes to have new stock delivered, cost of storage in the warehouse, etc., trying out millions of combinations to work out what to re-order and when.
On 2008-09-17 at 20:53:48, Lee J Haywood wrote...
Take a look at my code - I started in JavaScript but re-wrote in C to speed it up. Also, I wrote it to solve a different problem initially, hence the need for the FIRST_CARD define... http://pr0gramm3rs.googlepages.com/cards.c.txt
On 2008-09-17 at 20:56:15, BorgClown wrote...
I recall someone calling it that, but here Monte Carlo is a well-known maker of family board games - maybe that's why we tend to call these just 'simulations'. But now I'm going to call them Monte Carlo simulations just to see their faces.
On 2008-09-17 at 21:02:39, BorgClown wrote...
@Lee J Haywood: Thanks for the source. I'm not a C programmer, and it looks big compared to my 36-line program, but it will help me see why I'm getting that result.
On 2008-09-17 at 21:41:10, BorgClown wrote...
@Lee J Haywood: Would you care to give a look at my program? It's expected that we get different probabilities because I am not just looking at the distribution, but the first one should be 25% for me too.
On 2008-09-17 at 21:41:51, Lee J Haywood wrote...
Okie dokie.
On 2008-09-17 at 21:43:09, BorgClown wrote...
I'll send it by email...
On 2008-09-17 at 21:51:27, BorgClown wrote...
@Lee J Haywood: Oops, mangled my comment. I meant that I'm not accumulating probabilities. They should be the same if I accumulate them, but the first one has to be the same for both of us.
On 2008-09-17 at 21:52:04, Lee J Haywood wrote...
I assume you've tried it with a larger number of games simulated?
On 2008-09-17 at 21:53:49, BorgClown wrote...
The biggest was 100 million.
On 2008-09-17 at 21:55:50, BorgClown wrote...
I also checked the random number generation docs. The generator is reseeded automatically with the machine time when the program starts.
On 2008-09-17 at 22:08:50, Lee J Haywood wrote...
With a million runs I got 249673 - pretty close. Because there are quite a lot of cards, you need a lot of runs to get an even distribution and avoid random clumping. Also, you have 1.0 instead of 100.0.
On 2008-09-17 at 22:11:05, Lee J Haywood wrote...
In fact, the other numbers come out correct as well - 190764 for 13/68 (0.1911765), 145522 for 247/1700 (0.14529)...
On 2008-09-17 at 22:13:00, BorgClown wrote...
@Lee J Haywood: The 1.0 is for Python to do float division, because if you don't put a float it defaults to integer division, go figure. But you're right anyway, because the % sign implies a percent, not a probability, I've changed it.
On 2008-09-17 at 22:15:24, BorgClown wrote...
Well, I'm relieved you got that result. I'm running it more times to see how it affects the result. I was uneasy because no single run gave me 25% or greater - all were around 21-22%.
On 2008-09-17 at 22:20:09, BorgClown wrote...
@Lee J Haywood: How many times did you run your simulation? Being in C, it surely ran much faster than Python.
On 2008-09-17 at 22:24:33, Lee J Haywood wrote...
So long as you have enough games in each run you should only have to run it once, unless you change the code. Getting the stats for a larger number of games gives you more decimal places, but I found that you need about a million games for the 13:52 case and 10 million doesn't really add much even though you have to wait 10 times as long for the result. Unfortunately, getting enough decimal places to guess at the fractions would have required 100 million or more simulations, and that's just too long - it's only now that I've got the fractions that I can see for sure how (in)accurate the simulations were.
On 2008-09-17 at 22:27:30, Lee J Haywood wrote...
Originally I wrote the Monte Carlo simulation to answer a different question about the same game. To win the jackpot, you have to eliminate all 13 cards within the first 35 drawn by the quiz master from the full deck - so I worked out what the chances are of someone winning within a certain number of cards, for various numbers of players. This is the second table on my statistics page. I didn't work out the maths / fractions for this, though.
On 2008-09-17 at 22:36:18, BorgClown wrote...
Great. I ran it again with 10 million: 19.2%. I know that more effort gives diminishing returns, but a 6% error, wow. Remind me to take these simulations with a grain of salt.
On 2008-09-17 at 22:36:43, BorgClown wrote...
That was a great and deceptively simple problem, like the three doors one.
On 2008-09-17 at 22:37:41, Lee J Haywood wrote...
That shouldn't happen - even with one million games it should always give a close answer.
On 2008-09-17 at 22:38:54, Lee J Haywood wrote...
Three doors? Did I point you to the 8 ball problem already? http://firstclassthoughts.co.uk/puzzle/the_8_ball_problem_iq_test.html
On 2008-09-17 at 22:40:26, BorgClown wrote...
The three doors problem: http://www.straightdope.com/columns/read/916/on-lets-make-a-deal-you-pick-door-1-monty-opens-door-2-no-prize-do-you-stay-with-door-1-or-switch-to-3
On 2008-09-17 at 22:42:02, BorgClown wrote...
Maybe it's a problem with the random number generator itself. Although it gives closer results when I use 1 million.
On 2008-09-17 at 22:43:16, Lee J Haywood wrote...
Oh, right - that 3 door problem. Yesterday, I came across this great page/site... http://www.airplaneonatreadmill.com/
On 2008-09-17 at 22:45:53, BorgClown wrote...
Oh, and the 8 ball problem is a classic. It got me the first time too, we tend to get in binary search mode.
On 2008-09-17 at 22:47:22, Lee J Haywood wrote...
Actually, I tried the 8 ball problem on my mother and aunt and they did it using a binary search - so it's not true that it only stumps programmers.
On 2008-09-17 at 22:48:35, BorgClown wrote...
The plane on the treadmill was tested on Mythbusters. They did it questionably, but you can see that the plane couldn't care less if the floor was moving or not, it just strode forward. I was one of the people who thought that it wouldn't fly because there would be no air flowing beneath the wings.
On 2008-09-17 at 22:50:13, BorgClown wrote...
Hey, at least they didn't weigh each pair individually. It's so weird to have such a restrictive use of a mechanical scale.
On 2008-09-17 at 22:51:18, Lee J Haywood wrote...
I haven't seen the Mythbusters episode, but basically the conclusion should be that the wheels just spin and don't provide any propulsion anyway - it's the motion of the wings relative to the air that's important, and since the engines provide thrust via their contact with the air, the aeroplane will take off provided it moves relative to the air/ground.
On 2008-09-17 at 22:51:31, Lee J Haywood wrote...
Perhaps you're being charged for each use of the scales. (-:
On 2008-09-17 at 22:54:42, Lee J Haywood wrote...
My 10 million run came back - 2116192, which is way out as you say. There must be some quirk with Python.
On 2008-09-17 at 23:02:14, Beeba wrote...
i get it now, lee. you're right, the wording threw me off.
On 2008-09-17 at 23:02:53, Lee J Haywood wrote...
@BorgClown: Hmm, there is a flaw of sorts in your version - you re-create <tt>DeckB</tt> for every game before the shuffle. It makes more sense to create it once and keep shuffling the already-shuffled deck, which prevents problems with any bias in the random number generator.
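In Python terms, the fix Lee suggests amounts to something like this (a sketch; the surrounding simulation code is omitted, and the variable name echoes the <tt>DeckB</tt> mentioned above):

```python
import random

deck_b = list(range(52))        # build the deck once, outside the game loop
for game in range(1000):
    random.shuffle(deck_b)      # shuffle in place; the order carries over
                                # between games instead of resetting to sorted
    # ... deal from deck_b and count matches here ...
```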
On 2008-09-17 at 23:04:43, BorgClown wrote...
@Lee J Haywood: Would you believe me that I tried to do that, but I kept losing the DeckB list at the second shuffle? I'll look into it now.
On 2008-09-17 at 23:05:25, BorgClown wrote...
@Beeba: Yay for mind-twisting time-wasters!
On 2008-09-17 at 23:12:30, BorgClown wrote...
The results are better reshuffling the already-shuffled deck. Maybe the problem lies within the random.shuffle function - I didn't read the docs about how, or how many times, it shuffles.
On 2008-09-17 at 23:18:36, Lee J Haywood wrote...
Yes, I just got the results back and that does fix it for me too - although it doesn't make sense that it should. Perhaps you used xrange instead of range originally, to create the deck? Anyway, I'm going to bed now...
On 2008-09-17 at 23:20:57, BorgClown wrote...
The documentation says: "Note that for even rather small len(x), the total number of permutations of x is larger than the period of most random number generators; this implies that most permutations of a long sequence can never be generated" - Sheesh, these days you can't trust nothing =)
On 2008-09-17 at 23:22:36, BorgClown wrote...
It was my mistake giving it an ordered deck to shuffle each time.
On 2008-09-17 at 23:25:42, BorgClown wrote...
@Lee J Haywood: Missed your last comment. Sleep well, see you later.
On 2008-09-17 at 23:26:05, Lee J Haywood wrote...
It still shouldn't matter, if the random numbers are always progressing - why should 10 million iterations be worse than one million, unless the random numbers have a period bias (unlikely)? Perhaps the problem lies with the shuffle function then, since I wrote my own? Goodnight...