Convincing myself about the Monty Hall problem

Like many others, I’ve never felt that the solution to the Monty Hall problem was intuitive, despite the fact that explanations of the correct solution are everywhere. I am not alone. Famously, columnist Marilyn vos Savant got droves of mail from people trying to school her after she had published the correct solution.

The problem goes like this: You are a contestant on a game show (based on a real game show hosted by Monty Hall, hence the name). The host presents you with three doors, one of which contains a prize — say, a goat — and the others are empty. After you’ve made your choice, the host opens one of the doors, showing that it is empty. You are now asked whether you would like to stick to your initial choice, or switch to the other door. The right thing to do is to switch, which gives you 2/3 probability of winning the goat. This can be demonstrated in a few different ways.

A goat is a great prize. Image: Casey Goat by Pete Markham (CC BY-SA 2.0)

So I sat down to do 20 physical Monty Hall simulations on paper. I shuffled three cards with the options, picked one, and then, playing the role of the host, took away one losing option, and noted down if switching or holding on to the first choice would have been the right thing to do. The results came out 13 out of 20 (65%) wins for the switching strategy, and 7 out of 20 (35%) for the holding strategy. Of course, the Monty Hall Truthers out there must question whether this demonstration in fact happened — it’s too perfect, isn’t it?

The outcome of the simulation is less important than the feeling that came over me as I was running it, though. As I was taking on the role of the host and preparing to take away one of the losing options, it started feeling self-evident that the important thing is whether the first choice is right. If the first choice is right, holding is the right strategy. If the first choice is wrong, switching is the right option. And the first choice, clearly, is only right 1/3 of the time.

In this case, it was helpful to take the game show host’s perspective. Selvin (1975) discussed the solution to the problem in The American Statistician, and included a quote from Monty Hall himself:

Monty Hall wrote and expressed that he was not ”a student of statistics problems” but ”the big hole in your argument is that once the first box is seen to be empty, the contestant cannot exchange his box.” He continues to say, ”Oh, and incidentally, after one [box] is seen to be empty, his chances are no longer 50/50 but remain what they were in the first place, one out of three. It just seems to the contestant that one box having been eliminated, he stands a better chance. Not so.” I could not have said it better myself.

A generalised problem

Now, imagine the same problem with a number d number of doors, w number of prizes and o number of losing doors that are opened after the first choice is made. We assume that the losing doors are opened at random, and that switching entails picking one of the remaining doors at random. What is the probability of winning with the switching strategy?

The probability of picking the a door with or without a prize is:

\Pr(\text{pick right first}) = \frac{w}{d}

\Pr(\text{pick wrong first}) = 1 - \frac{w}{d}

If we picked a right door first, we have w – 1 winning options left out of d – o – 1 doors after the host opens o doors:

\Pr(\text{win\textbar right first}) = \frac{w - 1}{d - o - 1}

If we picked the wrong door first, we have all the winning options left:

\Pr(\text{win\textbar wrong first}) = \frac{w}{d - o - 1}

Putting it all together:

\Pr(\text{win\textbar switch}) = \Pr(\text{pick right first}) \cdot \Pr(\text{win\textbar right first}) + \\   + \Pr(\text{pick wrong first}) \cdot \Pr(\text{win\textbar wrong first}) = \\  = \frac{w}{d} \frac{w - 1}{d - o - 1} + (1 - \frac{w}{d}) \frac{w}{d - o - 1}

As before, for the hold strategy, the probability of winning is the probability of getting it right the first time:

\Pr(\text{win\textbar hold}) = \frac{w}{d}

With the original Monty Hall problem, w = 1, d = 3 and o = 1, and therefore

\Pr(\text{win\textbar switch}) = \frac{1}{3} \cdot 0 + \frac{2}{3} \cdot 1

Selvin (1975) also present a generalisation due to Ferguson, where there are n options and p doors that are opened after the choice. That is, w = 1, d = 3 and o = 1. Therefore,

\Pr(\text{win\textbar switch}) = \frac{1}{n} \cdot 0 + (1 - \frac{1}{n}) \frac{1}{n - p - 1} =  \frac{n - 1}{n(n - p - 1)}

which is Ferguson’s formula.

Finally, in Marilyn vos Savant’s column, she used this thought experiment to illustrate why switching is the right thing to do:

Here’s a good way to visualize what happened. Suppose there are a million doors, and you pick door #1. Then the host, who knows what’s behind the doors and will always avoid the one with the prize, opens them all except door #777,777. You’d switch to that door pretty fast, wouldn’t you?

That is, w = 1 still, d = 106 and o = 106 – 2.

\Pr(\text{win\textbar switch}) = 1 - \frac{1}{10^6}

It turns out that the solution to the generalised problem is that it is always better to switch, as long as there is a prize, and as long as the host opens any doors. One can also generalise it to choosing sets of more than one door. This makes some intuitive sense: as long as the host takes opens some doors, taking away losing options, switching should enrich for prizes.

Some code

To be frank, I’m not sure I have convinced myself of the solution to the generalised problem yet. However, using the code below, I did try the calculation for all combinations of total number of doors, prizes and doors opened up to 100, and in all cases, switching wins. That inspires some confidence, should I end up on a small ruminant game show.

The code below first defines a wrapper around R’s sampling function, which has a very annoying alternative behaviour when fed a vector of length one, to be able to build a computational version of my physical simulation. Finally, we have a function for the above formulae. (See whole thing on GitHub if you are interested.)

## Wrap sample into a function that avoids the "convenience"
## behaviour that happens when the length of x is one

sample_safer <- function(to_sample, n) {
  assert_that(n <= length(to_sample))
  if (length(to_sample) == 1)
  else {
    return(sample(to_sample, n))

## Simulate a generalised Monty Hall situation with
## w prizes, d doors and o doors that are opened.

sim_choice <- function(w, d, o) {
  ## There has to be less prizes than unopened doors
  assert_that(w < d - o) 
  wins <- rep(1, w)
  losses <- rep(0, d - w)
  doors <- c(wins, losses)
  ## Pick a door
  choice <- sample_safer(1:d, 1)
  ## Doors that can be opened
  to_open_from <- which(doors == 0)
  ## Chosen door can't be opened
  to_open_from <- to_open_from[to_open_from != choice]
  ## Doors to open
  to_open <- sample_safer(to_open_from, o)
  ## Switch to one of the remaining doors
  possible_switches <- setdiff(1:d, c(to_open, choice))
  choice_after_switch <- sample_safer(possible_switches , 1)
  result_hold <- doors[choice]
  result_switch <- doors[choice_after_switch]

## Formulas for probabilities

mh_formula <- function(w, d, o) {
  ## There has to be less prizes than unopened doors
  assert_that(w < d - o) 
  p_win_switch <- w/d * (w - 1)/(d - o - 1) +
                     (1 - w/d) * w / (d - o - 1) 
  p_win_hold <- w/d

## Standard Monty Hall

mh <- replicate(1000, sim_choice(1, 3, 1))
> mh_formula(1, 3, 1)
[1] 0.3333333 0.6666667
> rowSums(mh)/ncol(mh)
[1] 0.347 0.653

The Monty Hall problem problem

Guest & Martin (2020) use this simple problem as their illustration for computational model building: two 12 inch pizzas for the same price as one 18 inch pizza is not a good deal, because the 18 inch pizza contains more food. Apparently this is counter-intuitive to many people who have intuitions about inches and pizzas.

They call the risk of having inconsistencies in our scientific understanding because we cannot intuitively grasp the implications of our models ”The pizza problem”, arguing that it can be ameliorated by computational modelling, which forces you to spell out implicit assumptions and also makes you actually run the numbers. Having a formal model of areas of circles doesn’t help much, unless you plug in the numbers.

The Monty Hall problem problem is the pizza problem with a vengeance; not only is it hard to intuitively grasp what is going on in the problem, but even when presented with compelling evidence, the mental resistance might still remain and lead people to write angry letters and tweets.


Guest, O, & Martin, AE (2020). How computational modeling can force theory building in psychological science. Preprint.

Selvin, S (1975) On the Monty Hall problem. The American Statistician 29:3 p.134.

Theory in genetics

A couple of years ago, Brian Charlesworth published this essay about the value of theory in Heredity. He liked the same Sturtevant & Beadle quote that I liked.

Two outstanding geneticists, Alfred Sturtevant and George Beadle, started their splendid 1939 textbook of genetics (Sturtevant and Beadle 1939) with the remark ‘Genetics is a quantitative subject. It deals with ratios, and with the geometrical relationships of chromosomes. Unlike most sciences that are based largely on mathematical techniques, it makes use of its own system of units. Physics, chemistry, astronomy, and physiology all deal with atoms, molecules, electrons, centimeters, seconds, grams—their measuring systems are all reducible to these common units. Genetics has none of these as a recognizable component in its fundamental units, yet it is a mathematically formulated subject that is logically complete and self contained’.

This statement may surprise the large number of contemporary workers in genetics, who use high-tech methods to analyse the functions of genes by means of qualitative experiments, and think in terms of the molecular mechanisms underlying the cellular or developmental processes, in which they are interested. However, for those who work on transmission genetics, analyse the genetics of complex traits, or study genetic aspects of evolution, the core importance of mathematical approaches is obvious.

Maybe this comes a surprise to some molecularly minded biologists; I doubt those working adjacent to a field called ”biophysics” or trying to understand what on Earth a ”t-distributed stochastic neighbor embedding” does to turn single-cell sequences into colourful blobs will have missed that there are quantitative aspects to genetics.

Anyways, Sturtevant & Beadle (and Charlesworth) are thinking of another kind of quantitation: they don’t just mean that maths is useful to geneticists, but of genetics as a particular kind of abstract science with its own concepts. It’s the distinction between viewing genetics as chemistry and genetics as symbols. In this vein, Charlesworth makes the distinction between statistical estimation and mathematical modelling in genetics, and goes on to give examples of the latter by an anecdotal history models of genetic variation, eventually going deeper into linkage disequilibrium. It’s a fun read, but it doesn’t really live up to the title by spelling out actual arguments for mathematical models, other than the observation that they have been useful in population genetics.

The hypothetical recurring reader will know this blog’s position on theory in genetics: it is useful, not just for theoreticians. Consequently, I agree with Charlesworth that formal modelling in genetics is a good thing, and that there is (and ought to be more of) constructive interplay between data and theory. I like that he suggests that mathematical models don’t even have to be that sophisticated to be useful; even if you’re not a mathematician, you can sometimes improve your understanding by doing some sums. He then takes that back a little by telling a joke about how John Maynard Smith’s paper on hitch-hiking was so difficult that only two researchers in the country could be smart enough to understand it. The point still stands. I would add that this applies to even simpler models than I suspect that Charlesworth had in mind. Speaking from experience, a few pseudo-random draws from a binomial distribution can sometimes clear your head about a genetic phenomenon, and while this probably won’t amount to any great advances in the field, it might save you days of fruitless faffing.

As it happens, I also recently read this paper (Robinaugh et al. 2020) about the value of formal theory in psychology, and in many ways, it makes explicit some things that Charlesworth’s essay doesn’t spell out, but I think implies: We want our scientific theories to explain ”robust, generalisable features of the world” and represent the components of the world that give rise to those phenomena. Formal models, expressed in precise languages like maths and computational models are preferable to verbal models, that express the structure of a theory in words, because these precise languages make it easier to deduce what behaviour of the target system that the model implies. Charlesworth and Robinaugh et al. don’t perfectly agree. For one thing, Robinaugh et al. seem to suggest that a good formal model should be able to generate fake data that can be compared to empirical data summaries and give explanations of computational models, while Charlesworth seems to view simulation as an approximation one sometimes has to resort to.

However, something that occurred to me while reading Charlesworth’s essay was the negative framing of why theory is useful. This is how Charlesworth recommends mathematical modelling in population genetic theory, by approvingly repeating this James Crow quote:

I hope to have provided evidence that the mathematical modelling of population genetic processes is crucial for a proper understanding of how evolution works, although there is of course much scope for intuition and verbal arguments when carefully handled (The Genetical Theory of Natural Selection is full of examples of these). There are many situations in which biological complexity means that detailed population genetic models are intractable, and where we have to resort to computer simulations, or approximate representations of the evolutionary process such as game theory to produce useful results, but these are based on the same underlying principles. Over the past 20 years or so, the field has moved steadily away from modelling evolutionary processes to developing statistical tools for estimating relevant parameters from large datasets (see Walsh and Lynch 2017 for a comprehensive review). Nonetheless, there is still plenty of work to be done on improving our understanding of the properties of the basic processes of evolution.

The late, greatly loved, James Crow used to say that he had no objection to graduate students in his department not taking his course on population genetics, but that he would like them to sign a statement that they would not make any pronouncements about evolution. There are still many papers published with confused ideas about evolution, suggesting that we need a ‘Crow’s Law’, requiring authors who discuss evolution to have acquired a knowledge of basic population genetics.

This is one of the things I prefer about Robinaugh et al.’s account: To them, theory is not mainly about clearing up confusion and wrongness, but about developing ideas by checking their consistency with data, and exploring how they can be modified to be less wrong. And when we follow Charlesworth’s anecdotal history of linked selection, it can be read as sketching a similar path. It’s not a story about some people knowing ”basic population genetics” and being in the right, and others now knowing it and being confused (even if that surely happens also); it’s about a refinement of models in the face of data — and probably vice versa.

If you listen to someone talking about music theory, or literary theory, they will often defend themselves against the charge that theory drains their domain of the joy and creativity. Instead, they will argue that theory helps you appreciate the richness of music, and gives you tools to invent new and interesting music. You stay ignorant of theory at your own peril, not because you risk doing things wrong, but because you risk doing uninteresting rehashes, not even knowing what you’re missing. Or something like that. Adam Neely (”Why you should learn music theory”, YouTube video) said it better. Now, the analogy is not perfect, because the relationship between empirical data and theory in genetics is such that the theory really does try to say true or false things about the genetics in a way that music theory (at least as practiced by music theory YouTubers) does not. I still think there is something to be said for theory as a tool for creativity and enjoyment in genetics.


Charlesworth, B. (2019). In defence of doing sums in genetics. Heredity, 123(1), 44-49.

Robinaugh, D., Haslbeck, J., Ryan, O., Fried, E. I., & Waldorp, L. (2020). Invisible hands and fine calipers: A call to use formal theory as a toolkit for theory construction. Paper has since been published in a journal, but I read the preprint.