Simulating Probability Problem

Frequentist Interpretation & Monty-Hall Problem

Mathematics

R Programming

Probability Theory

Author

Abhirup Moitra

Published

November 26, 2023

Simulating The Frequentist Interpretation

The frequentist interpretation of conditional probability based on a large number $n$ of repetitions of an experiment is $P(A|B) \approx n_{AB}/n_{B}$, where $n_{AB}$ is the number of times that $A\cap B$ occurs and $n_B$ is the number of times that $B$ occurs. Let’s try this out by simulation, and verify the results. So let’s simulate $n$ families, each with two children.

Code

n <- 10^5
child1 <- sample(2,n,replace=TRUE)
child2 <- sample(2,n,replace=TRUE)

Here child1 is a vector of length $n$, where each element is a $1$ or a $2$. Letting $1$ stand for “girl” and $2$ stand for “boy”, this vector represents the gender of the elder child in each of the $n$ families. Similarly, child2 represents the gender of the younger child in each family. Alternatively, we could have used

Code

sample(c("girl","boy"),n,replace=TRUE)

but it is more convenient working with numerical values. Let $A$ be the event that both children are girls and $B$ the event that the elder is a girl. Following the frequentist interpretation, we count the number of repetitions where $B$ occurred and name it n.b, and we also count the number of repetitions where $A \cap B$ occurred and name it n.ab. Finally, we divide n.ab by n.b to approximate $P(A|B)$.

Code

n.b <- sum(child1==1)
n.ab <- sum(child1==1 & child2==1)
n.ab/n.b

[1] 0.5015086

The ampersand & is an elementwise AND, so n.ab is the number of families where both the first child and the second child are girls. When we ran this code, we got $0.50$, confirming our answer $P(\text{both girls|elder is a girl}) = 1/2$.

Now let $A$ be the event that both children are girls and $B$ the event that at least one of the children is a girl. Then $A\cap B$ is the same, but n.b needs to count the number of families where at least one child is a girl. This is accomplished with the elementwise OR operator | (this is not a conditioning bar; it is an inclusive OR, returning TRUE if at least one element is TRUE).

Code

n.b <- sum(child1==1 | child2==1)
n.ab <- sum(child1==1 & child2==1)
n.ab/n.b

[1] 0.3338121

For us, the result was $0.33$, confirming that $P(\text{both girls|at least one girl}) = 1/3$.

Monty-Hall Simulation

An interesting application of conditional probability is the Monty-Hall problem, whose solution intrigued many, including career mathematicians (life is hard!). There is a lot of trivia around this problem in the internet, a good place to start would be the Wikipedia article here: https://en.wikipedia.org/wiki/Monty_Hall_problem

1 Monty-Hall Three doors

Run a simulation to check that the probablility of winning increases to $2/3$ if we switch doors at step two in the regular 3-door Monty-Hall problem.

Set up the experiment two functions monty_3doors_noswitch() and monty_3doors_switch() (these functions will have no input values):

Code

monty_3doors_noswitch <- function(){

  doors = c(1,2,3) # had 3 doors
  
ch1 = sample(doors, 1) # 1st choice: pick one door at random
crct1 = sample(doors, 1) # the correct box
  
# Assume When making the second choice, 
#you stick with the original choice and you are RIGHT
  
# let X be a binary variable that takes the value 1 
#if the choice is correct and 0 if incorrect
x = 0
x = ifelse(ch1 == crct1,1,0) # you stick with the original choice and you are RIGHT
x
}

no_of_Successes = replicate(100000, monty_3doors_noswitch())
prob = sum(no_of_Successes)/100000
probability = prob
probability

[1] 0.3338

Code

monty_3doors_switch <- function(){
  
  doors = c(1,2,3) # had 3 doors
  
ch = sample(doors, 1) # 1st choice: pick one door at random
crct = sample(doors, 1) # the correct box
  
# When making the second choice, 
#you stick with the original choice and you are WRONG.
  
# let X be a binary variable that takes the value 1 
#if the choice is correct and 0 if incorrect
x = 0
x = ifelse(ch != crct,1,0) # you stick with the original choice and you are WRONG.
x
  
}
 
no_of_Successes = replicate(100000, monty_3doors_switch())
prob = sum(no_of_Successes)/100000
probabilty = prob
prob

[1] 0.66804

Use your two functions and the replicate function to compute the empirical probablility of winning for the two experiments. Compare your answers with the actual theoretical predictions.

No switch:The theoretical probability of success is 1/3 = 0.3333
Therefore, approximately the same results from the simulation = 0.3323

Switch: The theoretical probability of success is 2/3 = 0.6667.
Therefore, approximately the same results from the simulation = 0.6673.

2 Monty-Hall with Ten doors.

Repeat the Monty Hall experiment now with 10 doors. Recall the game is as follows:

Step 1: you choose one door at random.
Step 2: Monty opens 8 (out of 9 doors) that do not have the prize.
Step 3: you either switch or don’t switch.

Set up the experiment two functions monty_10doors_noswitch() and monty_10doors_switch() (these functions will have no input values):

Code

monty_10doors_noswitch <- function(){
  Prize_Door_Number<- sample(1:10, size=1000, replace=T)
choose_random_door_No <- sample(1:10, size=1000, replace=T)
  
Winner<- rep(0,1000)

#In case of no switch prize win when Prize_Door_Number same as 
#choosen_random_door_No
Winner[Prize_Door_Number==choose_random_door_No]=1
return(paste("probability of winning for Not switching Door = ",
              sum(Winner)/1000))
}

monty_10doors_noswitch()

[1] "probability of winning for Not switching Door =  0.079"

Code

#________________________________________________________________________
monty_10doors_switch <- function(){
  Prize_Door_Number<- sample(1:10, size=1000, replace=T)
choose_random_door_No <- sample(1:10, size=1000, replace=T)
  
Winner<- rep(0,1000)

#In case of switch prize win when Prize_Door_Number is not same as 
#choosen_random_door_No

Winner[Prize_Door_Number!=choose_random_door_No]=1
return(paste("probability of winning for switching Door = ",
              sum(Winner)/1000))
}

monty_10doors_switch()

[1] "probability of winning for switching Door =  0.897"

Use your two functions and the replicate function to compute the empirical probablility of winning for the two experiments. Compare your answers with the actual theoretical predictions.

3 Monty-Hall 10-doors (modified).

Consider the following modified Monty-Hall game with 10 doors.

Step 1: you choose one door at random.
Step 2: Monty opens 7 (out of 9 doors) that do not have the prize.
Step 3: you either stick with your original choice, or choose between one of the two unopened doors.

Set up the experiment two functions monty_10doors_mod_noswitch() and monty_10doors_mod_switch() (these functions will have no input values):

Code

monty_10doors_mod_noswitch <- function(){
  Prize_Door_Number<- sample(1:10, size=1000, replace=T)
choose_random_door_No <- sample(1:10, size=1000, replace=T)
  
Winner<- rep(0,1000)

#In case of no switch prize win when 
#Prize_Door_Number same as choosen_random_door_No
Winner[Prize_Door_Number==choose_random_door_No]=1
return(paste("probability of winning for Not switching Door = ",
              sum(Winner)/1000))
}

monty_10doors_mod_noswitch()

[1] "probability of winning for Not switching Door =  0.113"

Code

#_______________________________________________________________

monty_10doors_mod_switch <- function(){
  Prize_Door_Number<- sample(1:10, size=1000, replace=T)
choose_random_door_No <- sample(1:10, size=1000, replace=T)
  
Winner<- rep(0,1000)

#In case of switch prize win when Prize_Door_Number is not same as 
#choosen_random_door_No

Winner[Prize_Door_Number!=choose_random_door_No]=1
return(paste("probability of winning for switching Door = ",
              sum(Winner)/1000))
}

monty_10doors_mod_switch()

[1] "probability of winning for switching Door =  0.881"

Use your two functions and the replicate function to compute the empirical probablility of winning for the two experiments. The computation of the theoretical probability in this case might not be completely obvious, however, use your empirical probability to make a guess.

Let’s Play

Many long, bitter debates about the Monty Hall problem could have been averted by trying it out with a simulation. To study how well the never-switch strategy performs, let’s generate 105 runs of the Monty Hall game. To simplify notation, assume the contestant always chooses door 1. Then we can generate a vector specifying which door has the car for each repetition:

Code

n <- 10^5 
cardoor <- sample(3,n,replace=TRUE)

At this point we could generate the vector specifying which doors Monty opens, but that’s unnecessary since the never-switch strategy succeeds if and only if door 1 has the car! So the fraction of times when the never-switch strategy succeeds is sum(cardoor==1)/n, which was $0.334$ in our simulation. This is very close to $1/3$. What if we want to play the Monty Hall game interactively? We can do this by programming a function. Entering the following code in R defines a function called monty, which can then be invoked by entering the command monty() any time you feel like playing the game!

Code

monty <- function() {
doors <- 1:3
# randomly pick where the car is

cardoor <- sample(doors,1)
# prompt player

print("Monty Hall says ‘Pick a door, any door!’")

# receive the player’s choice of door (should be 1,2, or 3)

chosen <- scan(what = integer(), nlines = 1, quiet = TRUE)

# pick Monty’s door (can’t be the player’s door or the car door)

if (chosen != cardoor) montydoor <- doors[-c(chosen, cardoor)]
else montydoor <- sample(doors[-chosen],1)

# find out whether the player wants to switch doors
print(paste("Monty opens door ", montydoor, "!", sep=""))

print("Would you like to switch (y/n)?")

reply <- scan(what = character(), nlines = 1, quiet = TRUE)

# interpret what player wrote as "yes" if it starts with "y"
if (substr(reply,1,1) == "y") chosen <- doors[-c(chosen,montydoor)]

# announce the result of the game!

if (chosen == cardoor) print("You won!")
else print("You lost!")
}

# To Execute this code declare monty() then press ctrl+enter from keyboard

# monty()

The print command prints its argument to the screen. We combine this with paste since print("Monty opens door montydoor") would literally print “Monty opens door montydoor”. The scan command interactively requests input from the user; we use what = integer() when we want the user to enter an integer and what = character() when we want the user to enter text. Using substr(reply,1,1) extracts the first character of reply, in case the user replies with “yes” or “yep” or “yeah!” rather than with “y”.

--- title: "Simulating Probability Problem" subtitle: "Frequentist Interpretation & Monty-Hall Problem" author: "Abhirup Moitra" date: 2023-11-26 format: html: code-fold: true code-tools: true editor: visual categories : [Mathematics, R Programming, Probability Theory] image: gqb5oK.gif --- # Simulating The Frequentist Interpretation The frequentist interpretation of conditional probability based on a large number $n$ of repetitions of an experiment is $P(A|B) \approx n_{AB}/n_{B}$, where $n_{AB}$ is the number of times that $A\cap B$ occurs and $n_B$ is the number of times that $B$ occurs. Let's try this out by simulation, and verify the results. So let's simulate $n$ families, each with two children. ```{r} n <- 10^5 child1 <- sample(2,n,replace=TRUE) child2 <- sample(2,n,replace=TRUE) ``` Here `child1` is a vector of length $n$, where each element is a $1$ or a $2$. Letting $1$ stand for "girl" and $2$ stand for "boy", this vector represents the gender of the elder child in each of the $n$ families. Similarly, `child2` represents the gender of the younger child in each family. Alternatively, we could have used ```{r,eval=FALSE} sample(c("girl","boy"),n,replace=TRUE) ``` but it is more convenient working with numerical values. Let $A$ be the event that both children are girls and $B$ the event that the elder is a girl. Following the frequentist interpretation, we count the number of repetitions where $B$ occurred and name it `n.b`, and we also count the number of repetitions where $A \cap B$ occurred and name it `n.ab`. Finally, we divide `n.ab` by `n.b` to approximate $P(A|B)$. ```{r} n.b <- sum(child1==1) n.ab <- sum(child1==1 & child2==1) n.ab/n.b ``` The ampersand & is an elementwise AND, so `n.ab` is the number of families where both the first child and the second child are girls. When we ran this code, we got $0.50$, confirming our answer $P(\text{both girls|elder is a girl}) = 1/2$. Now let $A$ be the event that both children are girls and $B$ the event that at least one of the children is a girl. Then $A\cap B$ is the same, but `n.b` needs to count the number of families where at least one child is a girl. This is accomplished with the elementwise OR operator \| (this is not a conditioning bar; it is an inclusive OR, returning TRUE if at least one element is TRUE). ```{r} n.b <- sum(child1==1 | child2==1) n.ab <- sum(child1==1 & child2==1) n.ab/n.b ``` For us, the result was $0.33$, confirming that $P(\text{both girls|at least one girl}) = 1/3$. # **Monty-Hall Simulation** An interesting application of conditional probability is the Monty-Hall problem, whose solution intrigued many, including career mathematicians (life is hard!). There is a lot of trivia around this problem in the internet, a good place to start would be the Wikipedia article here: <https://en.wikipedia.org/wiki/Monty_Hall_problem> ## **1 Monty-Hall Three doors** Run a simulation to check that the probablility of winning increases to $2/3$ if we switch doors at step two in the regular 3-door Monty-Hall problem. Set up the experiment two functions `monty_3doors_noswitch()` and `monty_3doors_switch()` (these functions will have no input values): ```{r} monty_3doors_noswitch <- function(){ doors = c(1,2,3) # had 3 doors ch1 = sample(doors, 1) # 1st choice: pick one door at random crct1 = sample(doors, 1) # the correct box # Assume When making the second choice, #you stick with the original choice and you are RIGHT # let X be a binary variable that takes the value 1 #if the choice is correct and 0 if incorrect x = 0 x = ifelse(ch1 == crct1,1,0) # you stick with the original choice and you are RIGHT x } no_of_Successes = replicate(100000, monty_3doors_noswitch()) prob = sum(no_of_Successes)/100000 probability = prob probability ``` ```{r} monty_3doors_switch <- function(){ doors = c(1,2,3) # had 3 doors ch = sample(doors, 1) # 1st choice: pick one door at random crct = sample(doors, 1) # the correct box # When making the second choice, #you stick with the original choice and you are WRONG. # let X be a binary variable that takes the value 1 #if the choice is correct and 0 if incorrect x = 0 x = ifelse(ch != crct,1,0) # you stick with the original choice and you are WRONG. x } no_of_Successes = replicate(100000, monty_3doors_switch()) prob = sum(no_of_Successes)/100000 probabilty = prob prob ``` Use your two functions and the replicate function to compute the empirical probablility of winning for the two experiments. Compare your answers with the actual theoretical predictions. ``` No switch:The theoretical probability of success is 1/3 = 0.3333 Therefore, approximately the same results from the simulation = 0.3323 Switch: The theoretical probability of success is 2/3 = 0.6667. Therefore, approximately the same results from the simulation = 0.6673. ``` # **2 Monty-Hall with Ten doors.** Repeat the Monty Hall experiment now with 10 doors. Recall the game is as follows: - Step 1: you choose one door at random. - Step 2: Monty opens 8 (out of 9 doors) that do not have the prize. - Step 3: you either switch or don't switch. Set up the experiment two functions `monty_10doors_noswitch()` and `monty_10doors_switch()` (these functions will have no input values): ```{r} monty_10doors_noswitch <- function(){ Prize_Door_Number<- sample(1:10, size=1000, replace=T) choose_random_door_No <- sample(1:10, size=1000, replace=T) Winner<- rep(0,1000) #In case of no switch prize win when Prize_Door_Number same as #choosen_random_door_No Winner[Prize_Door_Number==choose_random_door_No]=1 return(paste("probability of winning for Not switching Door = ", sum(Winner)/1000)) } monty_10doors_noswitch() #________________________________________________________________________ monty_10doors_switch <- function(){ Prize_Door_Number<- sample(1:10, size=1000, replace=T) choose_random_door_No <- sample(1:10, size=1000, replace=T) Winner<- rep(0,1000) #In case of switch prize win when Prize_Door_Number is not same as #choosen_random_door_No Winner[Prize_Door_Number!=choose_random_door_No]=1 return(paste("probability of winning for switching Door = ", sum(Winner)/1000)) } monty_10doors_switch() ``` Use your two functions and the replicate function to compute the empirical probablility of winning for the two experiments. Compare your answers with the actual theoretical predictions. # **3 Monty-Hall 10-doors (modified).** Consider the following modified Monty-Hall game with 10 doors. - Step 1: you choose one door at random. - Step 2: Monty opens 7 (out of 9 doors) that do not have the prize. - Step 3: you either stick with your original choice, or choose between one of the two unopened doors. Set up the experiment two functions `monty_10doors_mod_noswitch()` and `monty_10doors_mod_switch()` (these functions will have no input values): ```{r} monty_10doors_mod_noswitch <- function(){ Prize_Door_Number<- sample(1:10, size=1000, replace=T) choose_random_door_No <- sample(1:10, size=1000, replace=T) Winner<- rep(0,1000) #In case of no switch prize win when #Prize_Door_Number same as choosen_random_door_No Winner[Prize_Door_Number==choose_random_door_No]=1 return(paste("probability of winning for Not switching Door = ", sum(Winner)/1000)) } monty_10doors_mod_noswitch() #_______________________________________________________________ monty_10doors_mod_switch <- function(){ Prize_Door_Number<- sample(1:10, size=1000, replace=T) choose_random_door_No <- sample(1:10, size=1000, replace=T) Winner<- rep(0,1000) #In case of switch prize win when Prize_Door_Number is not same as #choosen_random_door_No Winner[Prize_Door_Number!=choose_random_door_No]=1 return(paste("probability of winning for switching Door = ", sum(Winner)/1000)) } monty_10doors_mod_switch() ``` Use your two functions and the replicate function to compute the empirical probablility of winning for the two experiments. The computation of the theoretical probability in this case might not be completely obvious, however, use your empirical probability to make a guess. ## Let's Play Many long, bitter debates about the Monty Hall problem could have been averted by trying it out with a simulation. To study how well the never-switch strategy performs, let's generate 105 runs of the Monty Hall game. To simplify notation, assume the contestant always chooses door 1. Then we can generate a vector specifying which door has the car for each repetition: ```{r} n <- 10^5 cardoor <- sample(3,n,replace=TRUE) ``` At this point we could generate the vector specifying which doors Monty opens, but that's unnecessary since the never-switch strategy succeeds if and only if door 1 has the car! So the fraction of times when the never-switch strategy succeeds is `sum(cardoor==1)/n`, which was $0.334$ in our simulation. This is very close to $1/3$. What if we want to play the Monty Hall game interactively? We can do this by programming a function. Entering the following code in R defines a function called monty, which can then be invoked by entering the command `monty()` any time you feel like playing the game! ```{r} monty <- function() { doors <- 1:3 # randomly pick where the car is cardoor <- sample(doors,1) # prompt player print("Monty Hall says ‘Pick a door, any door!’") # receive the player’s choice of door (should be 1,2, or 3) chosen <- scan(what = integer(), nlines = 1, quiet = TRUE) # pick Monty’s door (can’t be the player’s door or the car door) if (chosen != cardoor) montydoor <- doors[-c(chosen, cardoor)] else montydoor <- sample(doors[-chosen],1) # find out whether the player wants to switch doors print(paste("Monty opens door ", montydoor, "!", sep="")) print("Would you like to switch (y/n)?") reply <- scan(what = character(), nlines = 1, quiet = TRUE) # interpret what player wrote as "yes" if it starts with "y" if (substr(reply,1,1) == "y") chosen <- doors[-c(chosen,montydoor)] # announce the result of the game! if (chosen == cardoor) print("You won!") else print("You lost!") } # To Execute this code declare monty() then press ctrl+enter from keyboard # monty() ``` The print command prints its argument to the screen. We combine this with paste since `print("Monty opens door montydoor")` would literally print "Monty opens door montydoor". The scan command interactively requests input from the user; we use `what = integer()` when we want the user to enter an integer and `what = character()` when we want the user to enter text. Using `substr(reply,1,1)` extracts the first character of reply, in case the user replies with "yes" or "yep" or "yeah!" rather than with "y". ![](thanks.jpg){fig-align="center" width="335"}