nested_cv can be used to take the results of one resampling procedure and conduct further resamples within each split. Any type of resampling used in rsample can be used.
nested_cv(data, outside, inside)
data | A data frame. |
---|---|
outside | The initial resampling specification. This can be an already created object or an expression of a new object (see the examples below). If the latter is used, the |
inside | An expression for the type of resampling to be conducted within the initial procedure. |
A tibble with the nested_cv class and any other classes that the outer resampling process normally contains. The results include a column for the outer data split objects, one or more id columns, and a column of nested tibbles called inner_resamples with the additional resamples.
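As a rough illustration of the structure described above, the sketch below builds a small nested resample and inspects its class and columns. It assumes the rsample package is installed; the object names `res` and `inner_first` are hypothetical and chosen only for this example.

```r
library(rsample)

set.seed(123)
res <- nested_cv(mtcars,
                 outside = vfold_cv(v = 3),
                 inside = bootstraps(times = 5))

# The result carries the nested_cv class in addition to the outer classes.
inherits(res, "nested_cv")

# Columns: the outer splits, the id column(s), and the nested inner resamples.
names(res)

# Each element of inner_resamples is itself a tibble of resamples,
# here with one row per bootstrap sample.
inner_first <- res$inner_resamples[[1]]
nrow(inner_first)
names(inner_first)
```

Each row of the outer tibble therefore pairs one outer split with its own set of inner resamples.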
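It is a bad idea to use bootstrapping as the outer resampling procedure, because the inner resample might then have the same data point in both the analysis and assessment sets (see the example below).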
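The mechanism behind this caution can be sketched in base R, without rsample. This is a simplified stand-in for what the package does, not its actual implementation: the outer bootstrap is simulated with `sample()`, and the inner 3-fold split is a random fold assignment. All variable names are hypothetical.

```r
set.seed(42)

n <- nrow(mtcars)

# Outer bootstrap: draw n row indices with replacement, so some
# original rows appear more than once.
boot_idx <- sample(n, size = n, replace = TRUE)

# Inner 3-fold CV on the bootstrap sample: assign each drawn row to a fold.
fold <- sample(rep(1:3, length.out = n))

# For fold 1, compare the original row indices held out for assessment
# with those kept for analysis.
assess_rows <- boot_idx[fold == 1]
analysis_rows <- boot_idx[fold != 1]

# Original rows that appear on both sides of the inner split.
leaked <- intersect(assess_rows, analysis_rows)
length(leaked)
```

Whenever `leaked` is non-empty, copies of the same original row sit in both the analysis and assessment sets of the inner split, which makes the inner performance estimate optimistic.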
    ## Using expressions for the resampling procedures:
    nested_cv(mtcars, outside = vfold_cv(v = 3), inside = bootstraps(times = 5))
    #> # Nested resampling:
    #> #  outer: 3-fold cross-validation
    #> #  inner: Bootstrap sampling
    #> # A tibble: 3 x 3
    #>   splits          id    inner_resamples
    #>   <list>          <chr> <list>
    #> 1 <split [21/11]> Fold1 <tibble [5 × 2]>
    #> 2 <split [21/11]> Fold2 <tibble [5 × 2]>
    #> 3 <split [22/10]> Fold3 <tibble [5 × 2]>

    ## Using an existing object:
    folds <- vfold_cv(mtcars)
    nested_cv(mtcars, folds, inside = bootstraps(times = 5))
    #> # Nested resampling:
    #> #  outer: `folds`
    #> #  inner: Bootstrap sampling
    #> # A tibble: 10 x 3
    #>    splits         id     inner_resamples
    #>    <list>         <chr>  <list>
    #>  1 <split [28/4]> Fold01 <tibble [5 × 2]>
    #>  2 <split [28/4]> Fold02 <tibble [5 × 2]>
    #>  3 <split [29/3]> Fold03 <tibble [5 × 2]>
    #>  4 <split [29/3]> Fold04 <tibble [5 × 2]>
    #>  5 <split [29/3]> Fold05 <tibble [5 × 2]>
    #>  6 <split [29/3]> Fold06 <tibble [5 × 2]>
    #>  7 <split [29/3]> Fold07 <tibble [5 × 2]>
    #>  8 <split [29/3]> Fold08 <tibble [5 × 2]>
    #>  9 <split [29/3]> Fold09 <tibble [5 × 2]>
    #> 10 <split [29/3]> Fold10 <tibble [5 × 2]>

    ## The dangers of outer bootstraps:
    set.seed(2222)
    bad_idea <- nested_cv(mtcars,
                          outside = bootstraps(times = 5),
                          inside = vfold_cv(v = 3))
    #> Warning: Using bootstrapping as the outer resample is dangerous since the inner resample might have the same data point in both the analysis and assessment set.

    first_outer_split <- bad_idea$splits[[1]]
    outer_analysis <- as.data.frame(first_outer_split)
    sum(grepl("Volvo 142E", rownames(outer_analysis)))
    #> [1] 0

    ## For the 3-fold CV used inside of each bootstrap, how are the replicated
    ## `Volvo 142E` data partitioned?
    first_inner_split <- bad_idea$inner_resamples[[1]]$splits[[1]]
    inner_analysis <- as.data.frame(first_inner_split)
    inner_assess <- as.data.frame(first_inner_split, data = "assessment")
    sum(grepl("Volvo 142E", rownames(inner_analysis)))
    #> [1] 0
    sum(grepl("Volvo 142E", rownames(inner_assess)))
    #> [1] 0
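One way the nested structure might be used is sketched below: for each outer split, a model is fit on the analysis set of every inner resample and scored on the matching assessment set. This assumes rsample's `analysis()` and `assessment()` accessors. The helper `rmse_for_split`, the model formula `mpg ~ wt`, and the choice of RMSE are hypothetical and serve only as an illustration.

```r
library(rsample)

set.seed(123)
res <- nested_cv(mtcars,
                 outside = vfold_cv(v = 3),
                 inside = bootstraps(times = 5))

# Hypothetical helper: fit a linear model on the analysis set of a split
# and return the RMSE on its assessment set.
rmse_for_split <- function(split) {
  fit <- lm(mpg ~ wt, data = analysis(split))
  held_out <- assessment(split)
  sqrt(mean((held_out$mpg - predict(fit, newdata = held_out))^2))
}

# For each outer split, average the RMSE over its inner resamples.
inner_rmse <- vapply(
  res$inner_resamples,
  function(inner) mean(vapply(inner$splits, rmse_for_split, numeric(1))),
  numeric(1)
)

inner_rmse
```

In a tuning workflow, the inner summaries would typically drive the choice of model settings, and the outer splits would then give the final performance estimate.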