Calculate bootstrap confidence intervals using various methods.
Usage
int_pctl(.data, ...)
# S3 method for bootstraps
int_pctl(.data, statistics, alpha = 0.05, ...)
int_t(.data, ...)
# S3 method for bootstraps
int_t(.data, statistics, alpha = 0.05, ...)
int_bca(.data, ...)
# S3 method for bootstraps
int_bca(.data, statistics, alpha = 0.05, .fn, ...)
Arguments
- .data
A data frame containing the bootstrap resamples created using bootstraps(). For t- and BCa-intervals, the apparent argument should be set to TRUE. Even if the apparent argument is set to TRUE for the percentile method, the apparent data is never used in calculating the percentile confidence interval.
- ...
Arguments to pass to .fn (int_bca() only).
- statistics
An unquoted column name or dplyr selector that identifies a single column in the data set containing the individual bootstrap estimates. This must be a list column of tidy tibbles (with columns term and estimate). For t-intervals, a standard tidy column containing the standard error (usually called std.err) is also required. See the examples below.
- alpha
Level of significance.
- .fn
A function to calculate the statistic of interest. The function should take an rsplit as its first argument, and the ... are required in its signature.
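The requirements on statistics, .fn, and ... can be illustrated with a small sketch. The function name cor_est, the column name stats, and the choice of mtcars variables are assumptions made for illustration only; the sketch relies on the documented behaviour that int_bca() forwards ... to .fn.

```r
library(rsample)
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical statistic: correlation between mpg and wt in mtcars.
# The rsplit comes first and `...` appears in the signature, as .fn requires.
cor_est <- function(split, method = "pearson", ...) {
  dat <- analysis(split)
  tibble(
    term = "corr",
    estimate = cor(dat$mpg, dat$wt, method = method)
  )
}

set.seed(2718)
cor_rs <-
  bootstraps(mtcars, times = 1000, apparent = TRUE) %>%
  mutate(stats = map(splits, cor_est, method = "spearman"))

# One element of the list column: a one-row tibble with term and estimate
cor_rs$stats[[1]]

# `method` is passed through `...` so that .fn is re-run the same way
bca_res <- int_bca(cor_rs, stats, .fn = cor_est, method = "spearman")
bca_res
```

The same method value is given both when building the stats column and in the int_bca() call, so the re-computed statistics match the bootstrap estimates.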
Value
Each function returns a tibble with columns .lower, .estimate, .upper, .alpha, .method, and term. .method is the type of interval (e.g. "percentile", "student-t", or "BCa"). term is the name of the estimate. Note that the .estimate returned from int_pctl() is the mean of the estimates from the bootstrap resamples and not the estimate from the apparent model.
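The note about .estimate can be checked directly. The following is a minimal sketch, assuming a hypothetical statistic function mean_mpg and a list column named stats; it compares the .estimate from int_pctl() with the mean of the bootstrap estimates and with the estimate on the original data.

```r
library(rsample)
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical statistic: the mean of mpg in the analysis set
mean_mpg <- function(split, ...) {
  tibble(term = "mean_mpg", estimate = mean(analysis(split)$mpg))
}

set.seed(123)
rs <-
  bootstraps(mtcars, times = 1000, apparent = TRUE) %>%
  mutate(stats = map(splits, mean_mpg))

ci <- int_pctl(rs, stats)

# Estimates from the bootstrap resamples only (the apparent row is dropped)
boot_estimates <- map_dbl(rs$stats[rs$id != "Apparent"], "estimate")

# Estimate from the original data, i.e. the apparent sample
apparent_estimate <- mean(mtcars$mpg)

c(
  int_pctl = ci$.estimate,
  bootstrap_mean = mean(boot_estimates),
  apparent = apparent_estimate
)
```

The first two values should agree, while the apparent estimate will generally differ slightly from them.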
Details
Percentile intervals are the standard method of obtaining confidence intervals but require thousands of resamples to be accurate. T-intervals may need fewer resamples but require a corresponding variance estimate. Bias-corrected and accelerated (BCa) intervals require the original function that was used to create the statistics of interest and are computationally taxing.
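To make the difference between the first two methods concrete, here is a base R sketch that bootstraps the mean of a simulated sample by hand. It is an illustration of the general percentile and bootstrap-t ideas under simple assumptions (simulated exponential data, the usual standard error of a mean), not a copy of the package's internal code.

```r
# Base R only: bootstrap the mean of a small, skewed sample by hand.
set.seed(1)
x <- rexp(50, rate = 1)  # illustrative data, not from the package
alpha <- 0.05
n_boot <- 2000

# Estimate and standard error on the original ("apparent") data
theta_hat <- mean(x)
se_hat <- sd(x) / sqrt(length(x))

# One estimate and one standard error per bootstrap resample
boot <- replicate(n_boot, {
  xb <- sample(x, replace = TRUE)
  c(estimate = mean(xb), std.err = sd(xb) / sqrt(length(xb)))
})
theta_b <- boot["estimate", ]
se_b <- boot["std.err", ]

# Percentile interval: quantiles of the bootstrap estimates themselves
pctl <- quantile(theta_b, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

# Bootstrap-t interval: quantiles of the studentized estimates, rescaled
# by the standard error from the original data
z <- (theta_b - theta_hat) / se_b
z_q <- quantile(z, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
student_t <- sort(theta_hat - z_q * se_hat)

rbind(percentile = pctl, `student-t` = student_t)
```

The percentile interval needs only the estimates, whereas the bootstrap-t interval also needs a standard error for every resample and for the original data, which is why the variance estimate and apparent = TRUE matter for int_t().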
References
https://rsample.tidymodels.org/articles/Applications/Intervals.html
Davison, A., & Hinkley, D. (1997). Bootstrap Methods and their Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843
Examples
# \donttest{
library(rsample)
library(broom)
library(dplyr)
library(purrr)
library(tibble)
lm_est <- function(split, ...) {
  lm(mpg ~ disp + hp, data = analysis(split)) %>%
    tidy()
}
set.seed(52156)
car_rs <-
  bootstraps(mtcars, 500, apparent = TRUE) %>%
  mutate(results = map(splits, lm_est))
int_pctl(car_rs, results)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms: `(Intercept)`, `disp`, `hp`.
#> # A tibble: 3 × 6
#>   term         .lower .estimate   .upper .alpha .method
#>   <chr>         <dbl>     <dbl>    <dbl>  <dbl> <chr>
#> 1 (Intercept) 27.5      30.7    33.6       0.05 percentile
#> 2 disp        -0.0440   -0.0300 -0.0162    0.05 percentile
#> 3 hp          -0.0572   -0.0260 -0.00840   0.05 percentile
int_t(car_rs, results)
#> # A tibble: 3 × 6
#>   term         .lower .estimate   .upper .alpha .method
#>   <chr>         <dbl>     <dbl>    <dbl>  <dbl> <chr>
#> 1 (Intercept) 28.1      30.7    34.6       0.05 student-t
#> 2 disp        -0.0446   -0.0300 -0.0170    0.05 student-t
#> 3 hp          -0.0449   -0.0260 -0.00337   0.05 student-t
int_bca(car_rs, results, .fn = lm_est)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms: `(Intercept)`, `disp`, `hp`.
#> # A tibble: 3 × 6
#>   term         .lower .estimate   .upper .alpha .method
#>   <chr>         <dbl>     <dbl>    <dbl>  <dbl> <chr>
#> 1 (Intercept) 27.7      30.7    33.7       0.05 BCa
#> 2 disp        -0.0446   -0.0300 -0.0172    0.05 BCa
#> 3 hp          -0.0576   -0.0260 -0.00843   0.05 BCa
# putting results into a tidy format
rank_corr <- function(split) {
  dat <- analysis(split)
  tibble(
    term = "corr",
    estimate = cor(dat$sqft, dat$price, method = "spearman"),
    # don't know the analytical std.err so no t-intervals
    std.err = NA_real_
  )
}
set.seed(69325)
data(Sacramento, package = "modeldata")
bootstraps(Sacramento, 1000, apparent = TRUE) %>%
  mutate(correlations = map(splits, rank_corr)) %>%
  int_pctl(correlations)
#> # A tibble: 1 × 6
#>   term  .lower .estimate .upper .alpha .method
#>   <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>
#> 1 corr   0.737     0.768  0.796   0.05 percentile
# }
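As a further illustration of the alpha argument, the sketch below computes 95% and 90% percentile intervals from the same resamples. The statistic function median_mpg and the column name stats are assumptions chosen for this example.

```r
library(rsample)
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical statistic: the median of mpg in the analysis set
median_mpg <- function(split, ...) {
  tibble(term = "median_mpg", estimate = median(analysis(split)$mpg))
}

set.seed(4242)
mpg_rs <-
  bootstraps(mtcars, times = 1000) %>%
  mutate(stats = map(splits, median_mpg))

ci_95 <- int_pctl(mpg_rs, stats)                # default alpha = 0.05
ci_90 <- int_pctl(mpg_rs, stats, alpha = 0.10)  # 90% interval

bind_rows(ci_95, ci_90)
```

A larger alpha gives a narrower interval from the same bootstrap estimates, and the .alpha column records which level each row used.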