Chapter 13 Other Distributions and Simulations
Figure 13.1: A Statypus Professor Giving a Lecture
Platypuses have two layers of fur. The second layer serves as a sort of “dry suit” allowing it to trap a layer of air near its skin which insulates it and also allows it to remain buoyant.203
This Chapter has yet to be written. Please reach out to the author if you have any immediate questions.
FILL INTRODUCTION
New R Functions Used
All functions listed below have a help file built into RStudio. Most of these functions have options which are not fully covered or not mentioned in this chapter. To access more information about other functionality and uses for these functions, use the ? command. E.g. To see the help file for replicate, you can run ?replicate or ?replicate() in either an R Script or in your Console.
We will see the following functions in Chapter 13.
dgeom(): Density function for the geometric distribution.pgeom(): Distribution function for the geometric distribution.qgeom(): Quantile function for the geometric distribution.rgeom(): Random number generation for the geometric distribution.ppois(): Distribution function for the Poisson distribution.qpois(): Quantile function for the Poisson distribution.rpois(): Random number generation for the Poisson distribution.replicate(): Allows for repeated evaluation of an expression (which will usually involve random number generation).anyDuplicated(): Determines which elements of a vector or data frame are duplicates of elements with smaller subscripts.cumsum(): Returns a vector whose elements are the cumulative sums of the elements of the argument.rle(): Run Length Encoding: Computes the lengths and values of runs of equal values in a vector.
The functions cumsum() and rle() are only used in the challenging bonus problems and do not have any explanation here. Use ?cumsum or ?rle in your RStudio Console for help with these.
13.1 The Geometric Distribution
13.1.1 dgeom
The syntax of dgeom() is
dgeom(x, prob) #This code will not run unless the necessary values are inputted.where the arguments are:
x: Quantity to be evaluated forprob: Probability of success on each trial.
13.1.2 pgeom
The syntax of pgeom() is
pgeom( q, prob, lower.tail ) #This code will not run unless the necessary values are inputted.where the arguments are:
q: Quantile to be evaluated forprob: Probability of success on each trial.lower.tail: IfTRUE(default), probabilities are \(P(X \leq q)\), otherwise, \(P(X > q)\).
13.1.3 qgeom
The syntax of qgeom() is
qgeom( p, prob, lower.tail ) #This code will not run unless the necessary values are inputted.where the arguments are:
x: Quantity to be evaluated forprob: Probability of success on each trial.lower.tail: IfTRUE, the result isqsuch that \({\text{p} = P(X \leq \text{q} )}\) and ifFALSE, the result isqsuch that \({\text{p} = P(X > \text{q})}\).lower.tailis set toTRUEby default.
13.1.4 Random Geometric Quantities
The syntax of rgeom() is
rgeom(n, prob) #This code will not run unless the necessary values are inputted.where the arguments are:
n: Number of observations.prob: Probability of success on each trial.
Example 13.1 Three way tool finally right
Socks <- rgeom(100,1/3)
head(Socks)## [1] 0 0 2 3 3 1
table(Socks)## Socks
## 0 1 2 3 4 5 6 8 9 10 11 18
## 31 23 17 9 4 5 4 2 1 1 2 1
Example 13.2 Number of heads before first tails
set.seed(03271985)
FirstT <- rgeom(100,1/2)
table(FirstT)## FirstT
## 0 1 2 3 4 5 6 13
## 47 24 13 8 4 2 1 1
Example 13.3 First time rolling a 7204
rgeom(100,1/6)## [1] 1 0 4 2 3 4 2 11 2 14 23 4 5 2 12 4 6 4 1 3 1 3 0 0 12
## [26] 9 4 0 1 7 1 11 1 5 3 0 2 1 2 0 1 1 12 5 1 6 6 3 17 4
## [51] 0 0 0 7 9 15 2 4 4 3 3 1 2 2 0 6 1 2 2 8 16 5 1 4 2
## [76] 0 8 9 3 0 10 20 1 1 8 1 2 3 3 0 5 0 13 2 0 1 2 6 3 7
13.2 Poisson Distribution
13.2.1 dpois
The syntax of dpois() is
dgeom( x, lambda ) #This code will not run unless the necessary values are inputted.where the arguments are:
x: Quantity to be evaluated for.lambda: Value of the mean, \(\lambda\), for the distribution.
13.2.2 ppois
The syntax of ppois() is
pgeom( q, lamdba, lower.tail ) #This code will not run unless the necessary values are inputted.where the arguments are:
q: Quantile to be evaluated for.lambda: Value of the mean, \(\lambda\), for the distribution.lower.tail: IfTRUE(default), probabilities are \(P(X \leq q)\), otherwise, \(P(X > q)\).
13.2.3 qpois
The syntax of qpois() is
qpois( p, lambda, lower.tail ) #This code will not run unless the necessary values are inputted.where the arguments are:
x: Quantity to be evaluated forlambda: Value of the mean, \(\lambda\), for the distribution.lower.tail: IfTRUE, the result isqsuch that \({\text{p} = P(X \leq \text{q} )}\) and ifFALSE, the result isqsuch that \({\text{p} = P(X > \text{q})}\).lower.tailis set toTRUEby default.
13.3 Simulations
This section uses simulation as the center of a study of probability. Simulations can allow a study of probability questions that are not easily answered with abstract techniques. Simulation also plays a fundamental role in modern statistical methods such as resampling, bootstrapping, non-parametric statistics, and genetic algorithms.
We first used the R function sample() when finding a SRS in Chapter 2, but we now wish to leverage the function even further.
Rewrite the following
For an experiment with a finite sample space \(S =\{x_1, x_2, \ldots, x_n\}\), sample() can simulate one or many trials of the experiment. Essentially, sample treats \(S\) as a bag of outcomes, reaches into the bag, and picks one.
The syntax of sample is
sample( x , size , replace , prob )where the arguments are:
x: The vector of elements from which you are sampling.size: The number of samples you wish to take.replace: Whether you are sampling with replacement or not. Sampling without replacement means that sample will not pick the same value twice, and this is the default behavior. Passreplace = TRUEto sample if you wish to sample with replacement.prob: A vector of probabilities or weights associated withx. It should be a vector of non-negative numbers of the same length asx.
If the sum of prob is not 1, it will be normalized. That is, each value in prob will be taken to be out of the sum of prob. For example, if the values in prob were 0.2, 0.3, and 0.4, then the actual probabilities would be 0.2/0.9 = 2/9, 0.3/0.9 = 1/3, and 0.4/0.9 = 4/9. If prob is not provided, then each element of x is considered to be equally likely.
replicate(n , expr)repeats the expression stored in expr a series of n times and stores the resulting values as a vector.
For complicated simulations, we strongly recommend that you follow the workflow as presented above; namely,
Write repeatable code that performs the experiment a single time.
Replicate the experiment a small number of times and check the results.
replicate( 100 , {
EXPERIMENT GOES HERE
} )
- Replicate the experiment a large number of times and store the result.
event <- replicate( 10000 , {
EXPERIMENT GOES HERE
} )
- Finally, estimate the probability using
mean.
mean( event )
It is much easier to trouble-shoot your code this way, as you can test each line of your simulation separately.
Example 13.4 (First Look at M and Ms) According to Rick Wicklin205, the proportion of M&M’s of various colors produced in the New Jersey M&M factory are as shown in the table below.
Figure 13.2: M&M Candies, Photo by Evan Amos
\[\begin{array}{|l|c|} \hline \text{Color} & \text{Percentage}\\ \hline \text{Blue} & 25.0\\ \text{Orange} & 25.0\\ \text{Green} & 12.5\\ \text{Yellow} & 12.5\\ \text{Red} & 12.5\\ \text{Brown} & 12.5\\ \hline \end{array}\]
mm.colors <- c( "Blue", "Orange", "Green", "Yellow", "Red", "Brown" )
mm.probs <- c( 25, 25, 12.5, 12.5, 12.5, 12.5 )
bag <- sample(
x = mm.colors,
size = 35,
replace = TRUE,
prob = mm.probs
)
sum( bag == "Blue" ) # counts the number of Blue M&M's## [1] 9
Exercises
Exercise 13.1 Find the “odds” of Patricia Demauro’s accomplishment.
Exercise 13.2 Assume we roll a standard 6 sided die and a fair 12 sided die. Estimate the probability that the value on the 6 sided die is greater than that on the 12 sided one.
Exercise 13.3 Estimate the probability that the sum on two dice is greater than the sum on three dice.
Exercise 13.4 Watch the video below and show via simulation that the dice ranking shown in the video is correct.
Challenge Problems:
Exercise 13.5 Estimate the probability of having at least 6 successive Heads or Tails if you flipped a coin 100 times. What about 7 successive tosses of Heads or Tails? [Hint: Look up rle() in RStudio.]
Exercise 13.6 Refer to https://en.wikipedia.org/wiki/Craps#Pass_line and estimate the probability of winning a Pass line bet. [Warning: This will require more functions than given here.]
[Warning: The following problem(s) are even harder and will likely require a “loop” of some sort.]
“Platypus facts file”. Australian Platypus Conservancy.↩︎
See https://casino.betmgm.com/en/blog/luckiest-craps-players-in-the-world/ for more information.↩︎
C Purtill, “A Statistician Got Curious about M&M Colors and Went on an Endearingly Geeky Quest for Answers,” Quartz, March 15, 2017, https://qz.com/918008/the-color-distribution-of-mms-as-determined-by-a-phd-in-statistics/.↩︎