Skip to content

Group Monte Carlo cross-validation creates splits of the data based on some grouping variable (which may have more than a single row associated with it). One resample of Monte Carlo cross-validation takes a random sample (without replacement) of groups in the original data set to be used for analysis. All other data points are added to the assessment set. A common use of this kind of resampling is when you have repeated measures of the same subject.

Usage

group_mc_cv(data, group, prop = 3/4, times = 25, ...)

Arguments

data

A data frame.

group

A variable in data (single character or name) used for grouping observations with the same value to either the analysis or assessment set within a fold.

prop

The proportion of data to be retained for modeling/analysis.

times

The number of times to repeat the sampling.

...

Not currently used.

Value

A tibble with classes group_mc_cv, rset, tbl_df, tbl, and data.frame. The results include a column for the data split objects and an identification variable.

Examples

data(ames, package = "modeldata")

set.seed(123)
group_mc_cv(ames, group = Neighborhood, times = 5)
#> # Group Monte Carlo cross-validation (0.75/0.25) with 5 resamples  
#> # A tibble: 5 × 2
#>   splits             id       
#>   <list>             <chr>    
#> 1 <split [2395/535]> Resample1
#> 2 <split [2236/694]> Resample2
#> 3 <split [2168/762]> Resample3
#> 4 <split [2331/599]> Resample4
#> 5 <split [2115/815]> Resample5