Based on Chapter 7 of ModernDive. Code for Quiz 11.
Section 7.2.4 of ModernDive, with different sample sizes and repetitions.
Packages used: tidyverse and moderndive.
Modify the code for comparing different sample sizes from the virtual bowl.
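A minimal setup sketch, assuming both packages are already installed (for example with install.packages()). The glimpse() call is only an illustrative check that the bowl data frame shipped with moderndive is available; it is not part of the quiz.

```r
# Load the packages used throughout this quiz.
library(tidyverse)   # dplyr verbs, the %>% pipe, and ggplot2
library(moderndive)  # provides the bowl dataset and rep_sample_n()

# Illustrative check: bowl has one row per ball, with columns ball_ID and color.
glimpse(bowl)
```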
Segment 1: sample size = 30
1. a. Take 1200 samples of size 30 instead of 1000 replicates of size 25 from the bowl dataset. Assign the output to virtual_samples_30.
virtual_samples_30 <- bowl %>%
rep_sample_n(size = 30, reps = 1200)
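1. b. Compute the resulting 1200 replicates of the proportion red:
start with virtual_samples_30 THEN
group by replicate THEN
create the variable red equal to the sum of all the red balls, and
create the variable prop_red equal to variable red / 30.
Assign the output to virtual_prop_red_30.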
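The computation of the 1200 proportions for samples of size 30 can be sketched as follows. This is one plausible dplyr translation of the steps, not necessarily the quiz's reference answer. set.seed(76) is an assumption added only to make the sketch reproducible, and the test color == "red" relies on the color column of bowl.

```r
library(tidyverse)
library(moderndive)

set.seed(76)  # assumed seed, only so that the sketch is reproducible

# 1200 virtual samples of 30 balls each
virtual_samples_30 <- bowl %>%
  rep_sample_n(size = 30, reps = 1200)

# One row per replicate, with the count and proportion of red balls
virtual_prop_red_30 <- virtual_samples_30 %>%
  group_by(replicate) %>%
  summarize(red = sum(color == "red")) %>%
  mutate(prop_red = red / 30)

virtual_prop_red_30
```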
1. c. Plot the distribution of virtual_prop_red_30 via a histogram. Use labs to label the x-axis and add a title.
ggplot(virtual_prop_red_30, aes(x = prop_red)) +
geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
labs(x = "Proportion of 30 balls that were red", title = "30")
Segment 2: sample size = 55
2. a. Take 1200 samples of size 55 instead of 1000 replicates of size 50. Assign the output to virtual_samples_55.
virtual_samples_55 <- bowl %>%
rep_sample_n(size = 55, reps = 1200)
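2. b. Compute the resulting 1200 replicates of the proportion red:
start with virtual_samples_55 THEN
group by replicate THEN
create the variable red equal to the sum of all the red balls, and
create the variable prop_red equal to variable red / 55.
Assign the output to virtual_prop_red_55.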
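The same computation for samples of size 55, again as a sketch rather than the quiz's reference answer. The seed is an assumption for reproducibility only.

```r
library(tidyverse)
library(moderndive)

set.seed(76)  # assumed seed, only so that the sketch is reproducible

# 1200 virtual samples of 55 balls each
virtual_samples_55 <- bowl %>%
  rep_sample_n(size = 55, reps = 1200)

# Count and proportion of red balls per replicate
virtual_prop_red_55 <- virtual_samples_55 %>%
  group_by(replicate) %>%
  summarize(red = sum(color == "red")) %>%
  mutate(prop_red = red / 55)

virtual_prop_red_55
```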
2. c. Plot the distribution of virtual_prop_red_55 via a histogram. Use labs to label the x-axis and add a title.
ggplot(virtual_prop_red_55, aes(x = prop_red)) +
geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
labs(x = "Proportion of 55 balls that were red", title = "55")
Segment 3: sample size = 120
3. a. Take 1200 samples of size 120 instead of 1000 replicates of size 100. Assign the output to virtual_samples_120.
virtual_samples_120 <- bowl %>%
rep_sample_n(size = 120, reps = 1200)
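3. b. Compute the resulting 1200 replicates of the proportion red:
start with virtual_samples_120 THEN
group by replicate THEN
create the variable red equal to the sum of all the red balls, and
create the variable prop_red equal to variable red / 120.
Assign the output to virtual_prop_red_120.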
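The same computation for samples of size 120, again as a sketch rather than the quiz's reference answer. The seed is an assumption for reproducibility only.

```r
library(tidyverse)
library(moderndive)

set.seed(76)  # assumed seed, only so that the sketch is reproducible

# 1200 virtual samples of 120 balls each
virtual_samples_120 <- bowl %>%
  rep_sample_n(size = 120, reps = 1200)

# Count and proportion of red balls per replicate
virtual_prop_red_120 <- virtual_samples_120 %>%
  group_by(replicate) %>%
  summarize(red = sum(color == "red")) %>%
  mutate(prop_red = red / 120)

virtual_prop_red_120
```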
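3. c. Plot the distribution of virtual_prop_red_120 via a histogram. Use labs to label the x-axis and add a title.
ggplot(virtual_prop_red_120, aes(x = prop_red)) +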
geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
labs(x = "Proportion of 120 balls that were red", title = "120")
Calculate the standard deviation of each of your three sets of 1200 values of prop_red.
n = 30
n = 55
n = 120
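The three standard deviations could be computed along these lines. prop_red_for_size() is a hypothetical helper introduced only to keep the sketch self-contained (it is not part of the quiz or of moderndive), and the seed is an assumption. No numeric results are shown because they depend on the random samples drawn.

```r
library(tidyverse)
library(moderndive)

set.seed(76)  # assumed seed, only so that the sketch is reproducible

# Hypothetical helper (not from the quiz): draw `reps` samples of `n_balls`
# balls from bowl and return the proportion of red balls in each sample.
prop_red_for_size <- function(n_balls, reps = 1200) {
  bowl %>%
    rep_sample_n(size = n_balls, reps = reps) %>%
    group_by(replicate) %>%
    summarize(red = sum(color == "red")) %>%
    mutate(prop_red = red / n_balls)
}

virtual_prop_red_30  <- prop_red_for_size(30)
virtual_prop_red_55  <- prop_red_for_size(55)
virtual_prop_red_120 <- prop_red_for_size(120)

# Standard deviation of prop_red for each sample size
sd_30  <- virtual_prop_red_30  %>% summarize(sd = sd(prop_red))
sd_55  <- virtual_prop_red_55  %>% summarize(sd = sd(prop_red))
sd_120 <- virtual_prop_red_120 %>% summarize(sd = sd(prop_red))

sd_30
sd_55
sd_120
```

Each summarize() call returns a one-row tibble, so the three values can be compared directly by printing them.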
The distribution with sample size n = 120 has the smallest standard deviation (spread) around the estimated proportion of red balls.
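A rough way to see why the spread shrinks with n, offered as a sketch rather than part of the quiz: if the draws are treated as independent with true red proportion p, the standard error of the sample proportion falls with the square root of n. The value p = 0.375 assumes the bowl holds 900 red balls out of 2400, as described in ModernDive. The formula ignores the finite-population correction for sampling without replacement, so simulated standard deviations should come out slightly smaller than these figures.

```latex
\[
\mathrm{SE}(\hat{p}) \approx \sqrt{\frac{p(1-p)}{n}}, \qquad p = 0.375
\]
\[
n = 30:\ \sqrt{\tfrac{0.375 \times 0.625}{30}} \approx 0.088, \qquad
n = 55:\ \sqrt{\tfrac{0.375 \times 0.625}{55}} \approx 0.065, \qquad
n = 120:\ \sqrt{\tfrac{0.375 \times 0.625}{120}} \approx 0.044
\]
```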