Overview
The rsample package provides functions to create different types of resamples and corresponding classes for their analysis. The goal is to have a modular set of methods that can be used for:
- resampling for estimating the sampling distribution of a statistic
- estimating model performance using a holdout set
The scope of rsample is to provide the basic building blocks for creating and analyzing resamples of a data set, but this package does not include code for modeling or calculating statistics. The Working with Resample Sets vignette gives a demonstration of how rsample tools can be used when building models.
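The two uses listed above can be sketched briefly. This is a minimal illustration, not a recommended workflow: it assumes the built-in `mtcars` data, and the statistic (a mean) and the model (`lm()`) come from base R, since rsample itself only supplies the resampling pieces (`bootstraps()`, `analysis()`, `initial_split()`, `training()`, `testing()`). The choice of `mpg ~ wt` and of 200 resamples is arbitrary.

```r
library(rsample)

set.seed(123)

# 1. Sampling distribution of a statistic: bootstrap the mean of mpg.
#    rsample creates the resamples; the statistic is computed with base R.
boots <- bootstraps(mtcars, times = 200)
boot_means <- vapply(
  boots$splits,
  function(split) mean(analysis(split)$mpg),
  numeric(1)
)
# Percentile interval from the bootstrap distribution
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))

# 2. Model performance on a holdout set: split once, fit on the training
#    portion, and measure error on the testing portion.
split <- initial_split(mtcars, prop = 0.75)
train <- training(split)
test  <- testing(split)

fit  <- lm(mpg ~ wt, data = train)  # illustrative model, not part of rsample
rmse <- sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
```

Here `boot_ci` holds an approximate 95% percentile interval for the mean, and `rmse` is the holdout error of the illustrative model.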
Note that resampled data sets created by rsample are directly accessible in a resampling object but add little memory overhead. Since the original data is not modified, R does not make an automatic copy.
For example, creating 50 bootstraps of a data set does not create an object that is 50-fold larger in memory:
library(rsample)
library(mlbench)
data(LetterRecognition)
lobstr::obj_size(LetterRecognition)
#> 2,644,640 B
set.seed(35222)
boots <- bootstraps(LetterRecognition, times = 50)
lobstr::obj_size(boots)
#> 6,686,776 B
# Object size per resample
lobstr::obj_size(boots)/nrow(boots)
#> 133,735.5 B
# Fold increase is <<< 50
as.numeric(lobstr::obj_size(boots)/lobstr::obj_size(LetterRecognition))
#> [1] 2.528426
Created on 2022-02-28 by the reprex package (v2.0.1)
The memory used by 50 bootstrap samples is less than 3 times that of the original data set.
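As a rough sketch of what "directly accessible" means in practice, individual resamples can be pulled out of the `splits` column and turned into data frames on demand with `analysis()` and `assessment()`. The example assumes the built-in `mtcars` data, and the `row_id` column is a hypothetical helper added only to track which original rows land in each part.

```r
library(rsample)

set.seed(35222)

# Hypothetical helper column so original rows can be tracked after resampling
dat <- mtcars
dat$row_id <- seq_len(nrow(dat))

boots <- bootstraps(dat, times = 5)

# Each element of the splits column is one resample
first_split <- boots$splits[[1]]

# Data frames are materialized only when requested
in_bag  <- analysis(first_split)    # rows drawn with replacement
out_bag <- assessment(first_split)  # rows not drawn in this resample

# A bootstrap analysis set has as many rows as the original data
nrow(in_bag) == nrow(dat)

# The two sets share no original rows and together cover all of them
length(intersect(in_bag$row_id, out_bag$row_id)) == 0
setequal(union(in_bag$row_id, out_bag$row_id), dat$row_id)
```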
Installation
To install rsample from CRAN, use:
install.packages("rsample")
To install the development version from GitHub, use:
# install.packages("pak")
pak::pak("tidymodels/rsample")
Contributing
This project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
For questions and discussions about tidymodels packages, modeling, and machine learning, please post on Posit Community.
If you think you have encountered a bug, please submit an issue.
Either way, learn how to create and share a reprex (a minimal, reproducible example) so that you can communicate clearly about your code.
Check out further details on contributing guidelines for tidymodels packages and how to get help.