Based on Chapter 8 of ModernDive. Code for Quiz 12.
Load the R packages we will use
library(tidyverse)
library(moderndive)
library(infer)
library(fivethirtyeight)
Take a random sample of 100 rows from congress_age and assign it to congress_age_100
set.seed(123)  # make the random sample reproducible
congress_age_100 <- congress_age %>%
  rep_sample_n(size = 100)  # one sample of 100 rows, drawn without replacement
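For intuition, here is a base-R sketch of the same sampling step: drawing one sample of 100 from a population without replacement. It does not use congress_age. It uses a hypothetical simulated population (`population_ages`, assumed for illustration only). `rep_sample_n()` samples without replacement by default, and `sample(..., replace = FALSE)` mirrors that here.

```r
# Hypothetical stand-in population: 20,000 simulated ages (NOT congress_age)
set.seed(123)
population_ages <- round(rnorm(20000, mean = 50, sd = 10), 1)

# One sample of size 100 drawn WITHOUT replacement, the base-R analogue
# of rep_sample_n(size = 100)
sample_ages <- sample(population_ages, size = 100, replace = FALSE)

length(sample_ages)  # 100
```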
congress_age is the population and congress_age_100 is the sample.

1. Use specify to indicate the variable from congress_age_100 that you are interested in
congress_age_100 %>%
  specify(response = age)
Response: age (numeric)
# A tibble: 100 × 1
age
<dbl>
1 53.1
2 54.9
3 65.3
4 60.1
5 43.8
6 57.9
7 55.3
8 46
9 42.1
10 37
# … with 90 more rows
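2. generate 1000 bootstrap replicates of your sample of 100
congress_age_100 %>%
  specify(response = age) %>%
  generate(reps = 1000, type = "bootstrap")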
Response: age (numeric)
# A tibble: 100,000 × 2
# Groups: replicate [1,000]
replicate age
<int> <dbl>
1 1 42.1
2 1 71.2
3 1 45.6
4 1 39.6
5 1 56.8
6 1 71.6
7 1 60.5
8 1 56.4
9 1 43.3
10 1 53.1
# … with 99,990 more rows
The output has 100,000 rows: 1,000 replicates × 100 resampled ages each.
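The `generate(reps = 1000, type = "bootstrap")` step resamples the 100 observed ages with replacement, 1,000 times. The base-R sketch below shows where the 100,000 rows come from. It uses a hypothetical simulated sample (`sample_ages`) in place of congress_age_100. It illustrates the idea only and is not infer's implementation.

```r
# Hypothetical stand-in sample of 100 simulated ages (NOT congress_age_100)
set.seed(123)
sample_ages <- round(rnorm(100, mean = 50, sd = 10), 1)

# Each bootstrap replicate resamples the 100 values WITH replacement.
# replicate() returns a 100 x 1000 matrix: one column per replicate.
boot_samples <- replicate(1000, sample(sample_ages, size = length(sample_ages), replace = TRUE))

dim(boot_samples)     # 100 rows, 1000 columns
length(boot_samples)  # 100 * 1000 = 100000 values in total
```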
3. calculate the mean for each replicate and assign the result to bootstrap_distribution_mean_age
bootstrap_distribution_mean_age <- congress_age_100 %>%
  specify(response = age) %>%                    # variable of interest
  generate(reps = 1000, type = "bootstrap") %>%  # 1,000 resamples with replacement
  calculate(stat = "mean")                       # one mean per replicate
bootstrap_distribution_mean_age
Response: age (numeric)
# A tibble: 1,000 × 2
replicate stat
<int> <dbl>
1 1 53.6
2 2 53.2
3 3 52.8
4 4 51.5
5 5 53.0
6 6 54.2
7 7 52.0
8 8 52.8
9 9 53.8
10 10 52.4
# … with 990 more rows
bootstrap_distribution_mean_age contains 1,000 means, one for each bootstrap replicate.
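In base R, the generate-then-calculate pipeline collapses into a single `replicate()` call: resample, take the mean, repeat 1,000 times. This is a sketch on a hypothetical simulated sample (`sample_ages`, `boot_means` are illustrative names). It is not congress_age_100.

```r
# Hypothetical stand-in sample of 100 simulated ages (NOT congress_age_100)
set.seed(123)
sample_ages <- round(rnorm(100, mean = 50, sd = 10), 1)

# For each of 1,000 replicates: resample 100 values with replacement,
# then take the mean, giving one bootstrap mean per replicate
boot_means <- replicate(1000, mean(sample(sample_ages, replace = TRUE)))

length(boot_means)  # 1000
summary(boot_means)
```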
4. visualize the bootstrap distribution
visualize(bootstrap_distribution_mean_age)
Construct a 95% confidence interval from the bootstrap distribution using the percentile method and assign it to congress_ci_percentile
congress_ci_percentile <- bootstrap_distribution_mean_age %>%
  get_confidence_interval(type = "percentile", level = 0.95)  # middle 95% of the bootstrap means
congress_ci_percentile
# A tibble: 1 × 2
lower_ci upper_ci
<dbl> <dbl>
1 51.5 55.2
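The percentile method takes the middle 95% of the bootstrap distribution, that is, its 2.5th and 97.5th percentiles. The base-R sketch below uses `quantile()` on hypothetical simulated data. `get_confidence_interval(type = "percentile")` rests on the same idea, but its exact quantile settings are not assumed here, so treat this as an approximation.

```r
# Hypothetical stand-in sample and its bootstrap means (NOT congress data)
set.seed(123)
sample_ages <- round(rnorm(100, mean = 50, sd = 10), 1)
boot_means <- replicate(1000, mean(sample(sample_ages, replace = TRUE)))

# 95% percentile interval: the 2.5th and 97.5th percentiles of the bootstrap means
ci_percentile <- quantile(boot_means, probs = c(0.025, 0.975))
round(ci_percentile, 1)
```

Changing `probs` to `c(0.05, 0.95)` would give a 90% interval in the same way.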
Compute the observed mean age of the sample and assign it to obs_mean_age
obs_mean_age <- congress_age_100 %>%
  specify(response = age) %>%
  calculate(stat = "mean") %>%
  pull()  # extract the mean as a single number
obs_mean_age
[1] 53.36
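A bootstrap distribution is centered close to the observed sample mean, because every resample is drawn from that same sample. That is why a line at the observed mean sits near the middle of the bootstrap histogram. The base-R check below uses hypothetical simulated data (`sample_ages`, `obs_mean`, `boot_means` are illustrative names).

```r
# Hypothetical stand-in sample of 100 simulated ages (NOT congress_age_100)
set.seed(123)
sample_ages <- round(rnorm(100, mean = 50, sd = 10), 1)

obs_mean <- mean(sample_ages)  # point estimate, analogous to obs_mean_age
boot_means <- replicate(1000, mean(sample(sample_ages, replace = TRUE)))

# The center of the bootstrap distribution should be very close to obs_mean
c(observed = obs_mean, bootstrap_center = mean(boot_means))
```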
Add the observed mean, obs_mean_age, to your visualization and color it “hotpink”.
visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 2)
Compute the population mean age of congress_age and assign it to pop_mean_age
[1] 53.31373
Add the population mean, pop_mean_age, to the plot and color it “purple”.
visualize(bootstrap_distribution_mean_age) +
  shade_confidence_interval(endpoints = congress_ci_percentile) +
  geom_vline(xintercept = obs_mean_age, color = "hotpink", size = 2) +
  geom_vline(xintercept = pop_mean_age, color = "purple", size = 3)
Is the population mean inside the 95% confidence interval constructed from the bootstrap distribution? Yes: 53.31 lies between 51.5 and 55.2.
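Checking whether an interval captures the population mean takes two comparisons. The sketch below runs the whole pipeline on a hypothetical simulated population (`population_ages`, assumed for illustration). Its result therefore says nothing about congress_age.

```r
# Hypothetical stand-in population (NOT congress_age) and its true mean
set.seed(123)
population_ages <- round(rnorm(20000, mean = 50, sd = 10), 1)
pop_mean <- mean(population_ages)

# Sample 100, bootstrap the mean, build a 95% percentile interval
sample_ages <- sample(population_ages, size = 100)
boot_means <- replicate(1000, mean(sample(sample_ages, replace = TRUE)))
ci <- quantile(boot_means, probs = c(0.025, 0.975))

# TRUE if the population mean falls inside the interval
captured <- pop_mean >= ci[[1]] && pop_mean <= ci[[2]]
captured
```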
Change set.seed(123) to set.seed(4346) and rerun all the code.
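Changing the seed draws a different sample of 100, so the interval shifts and may or may not contain the population mean. The sketch below repeats the pipeline for seeds 123 and 4346, then for 100 seeds. It runs on a hypothetical simulated population, and `ci_for_seed()` is a helper introduced only for illustration. Across many seeds, the share of intervals that capture the population mean should land in the neighborhood of 95%. The seeds here will not reproduce the congress_age results.

```r
# Hypothetical simulated population (NOT congress_age), fixed across runs
set.seed(1)
population_ages <- round(rnorm(20000, mean = 50, sd = 10), 1)
pop_mean <- mean(population_ages)

# Hypothetical helper: sample, bootstrap, and build a 95% percentile CI for one seed
ci_for_seed <- function(seed, n = 100, reps = 1000) {
  set.seed(seed)
  sample_ages <- sample(population_ages, size = n)
  boot_means <- replicate(reps, mean(sample(sample_ages, replace = TRUE)))
  ci <- quantile(boot_means, probs = c(0.025, 0.975), names = FALSE)
  data.frame(seed = seed, lower_ci = ci[1], upper_ci = ci[2],
             captured = pop_mean >= ci[1] && pop_mean <= ci[2])
}

# The two seeds from the quiz give two different intervals
results <- rbind(ci_for_seed(123), ci_for_seed(4346))
results

# Proportion of 100 intervals (seeds 1 to 100) that capture the population mean
coverage <- mean(sapply(1:100, function(s) ci_for_seed(s)$captured))
coverage
```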