Based on Chapter 7 of ModernDive. Code for Quiz 11.
Section 7.2.4 of ModernDive with different sample sizes and repetitions.
Requires the tidyverse and moderndive packages.
Modify the code for comparing different sample sizes from the virtual bowl.
Segment 1: sample size = 30
1. a. Take 1200 samples of size 30 instead of 1000 replicates of size 25 from the bowl dataset. Assign the output to virtual_samples_30.
virtual_samples_30 <- bowl %>%
  rep_sample_n(size = 30, reps = 1200)
b. Start with virtual_samples_30, THEN group by replicate, THEN create red equal to the sum of all the red balls, THEN create prop_red equal to variable red / 30. Assign the output to virtual_prop_red_30.
c. Plot the distribution of virtual_prop_red_30 via a histogram. Use labs to set the x-axis label and the title.
ggplot(virtual_prop_red_30, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 30 balls that were red", title = "30")
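The histogram code above assumes virtual_prop_red_30 already exists, but the code that builds it is not shown here. A minimal sketch of that step, following the group-and-summarize pattern used in ModernDive Chapter 7 and assuming the bowl data has a color column with the value "red". It would need to run before the plot.

```r
library(dplyr)
library(moderndive)  # provides the bowl data and rep_sample_n()

# Recreate the samples so this block runs on its own
virtual_samples_30 <- bowl %>%
  rep_sample_n(size = 30, reps = 1200)

# One row per replicate: count of red balls, then proportion red
virtual_prop_red_30 <- virtual_samples_30 %>%
  group_by(replicate) %>%
  summarize(red = sum(color == "red")) %>%  # assumes color == "red" marks red balls
  mutate(prop_red = red / 30)
```

The result should have 1200 rows, one per replicate, with columns replicate, red, and prop_red.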

Segment 2: sample size = 55
2. a. Take 1200 samples of size 55 instead of 1000 replicates of size 50. Assign the output to virtual_samples_55.
virtual_samples_55 <- bowl %>%
  rep_sample_n(size = 55, reps = 1200)
b. Start with virtual_samples_55, THEN group by replicate, THEN create red equal to the sum of all the red balls, THEN create prop_red equal to variable red / 55. Assign the output to virtual_prop_red_55.
c. Plot the distribution of virtual_prop_red_55 via a histogram. Use labs to set the x-axis label and the title.
ggplot(virtual_prop_red_55, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 55 balls that were red", title = "55")
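As in Segment 1, the plot relies on virtual_prop_red_55, whose code is not shown here. A sketch of that step under the same assumption (a color column in bowl with the value "red"); only the divisor changes to match the sample size.

```r
library(dplyr)
library(moderndive)  # provides the bowl data and rep_sample_n()

# Recreate the samples so this block runs on its own
virtual_samples_55 <- bowl %>%
  rep_sample_n(size = 55, reps = 1200)

# Divide by 55 here, since each replicate holds 55 balls
virtual_prop_red_55 <- virtual_samples_55 %>%
  group_by(replicate) %>%
  summarize(red = sum(color == "red")) %>%
  mutate(prop_red = red / 55)
```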

Segment 3: sample size = 120
3. a. Take 1200 samples of size 120 instead of 1000 replicates of size 100. Assign the output to virtual_samples_120.
virtual_samples_120 <- bowl %>%
  rep_sample_n(size = 120, reps = 1200)
b. Start with virtual_samples_120, THEN group by replicate, THEN create red equal to the sum of all the red balls, THEN create prop_red equal to variable red / 120. Assign the output to virtual_prop_red_120.
c. Plot the distribution of virtual_prop_red_120 via a histogram.
ggplot(virtual_prop_red_120, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
  labs(x = "Proportion of 120 balls that were red", title = "120")
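The same missing step for the largest sample size, sketched under the same assumption about the color column. With 120 balls per replicate, the proportions should cluster more tightly than in the earlier segments.

```r
library(dplyr)
library(moderndive)  # provides the bowl data and rep_sample_n()

# Recreate the samples so this block runs on its own
virtual_samples_120 <- bowl %>%
  rep_sample_n(size = 120, reps = 1200)

# Divide by 120 here, since each replicate holds 120 balls
virtual_prop_red_120 <- virtual_samples_120 %>%
  group_by(replicate) %>%
  summarize(red = sum(color == "red")) %>%
  mutate(prop_red = red / 120)
```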

Calculate the standard deviation of prop_red for each of your three sets of 1200 values:
n = 30
n = 55
n = 120
The distribution with sample size n = 120 has the smallest standard deviation (spread) around the estimated proportion of red balls.
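The comparison can be reproduced with a short sketch. The helper sd_prop_red below is hypothetical (it is not part of the quiz or of moderndive): it redraws the samples for a given size and returns the standard deviation of prop_red, again assuming bowl has a color column with the value "red". The exact values vary from run to run because the samples are random, so none are quoted here.

```r
library(dplyr)
library(moderndive)  # provides the bowl data and rep_sample_n()

# Hypothetical helper: sd of the proportion red across `reps` samples of `size` balls
sd_prop_red <- function(size, reps = 1200) {
  bowl %>%
    rep_sample_n(size = size, reps = reps) %>%
    group_by(replicate) %>%
    summarize(red = sum(color == "red")) %>%
    mutate(prop_red = red / size) %>%
    summarize(sd = sd(prop_red)) %>%
    pull(sd)
}

# One standard deviation per sample size: n = 30, n = 55, n = 120
sds <- sapply(c(30, 55, 120), sd_prop_red)
sds
```

If the three virtual_prop_red objects already exist, calling summarize(sd = sd(prop_red)) on each one gives the same kind of result without redrawing the samples.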