tidylearn provides a unified, tidyverse-compatible interface to R's machine learning
ecosystem. It wraps proven packages such as glmnet, randomForest, xgboost,
e1071, cluster, and dbscan, so you get the reliability of established
implementations with the convenience of a consistent, tidy API.
**What tidylearn does:**

- Provides a single, consistent interface (`tl_model()`) to 20+ ML algorithms

**What tidylearn is NOT:**

- A reimplementation of those algorithms: the raw fitted object from the underlying package is always available (`model$fit`)

The core of tidylearn is the `tl_model()` function, which
dispatches to the appropriate underlying package based on the method you
specify. The wrapped packages include stats, glmnet, randomForest,
xgboost, gbm, e1071, nnet, rpart, cluster, and dbscan.
``` r
# Classification with logistic regression
model_logistic <- tl_model(iris, Species ~ ., method = "logistic")
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

print(model_logistic)
#> tidylearn Model
#> ===============
#> Paradigm: supervised
#> Method: logistic
#> Task: Classification
#> Formula: Species ~ .
#>
#> Training observations: 150
```

``` r
# Principal Component Analysis
model_pca <- tl_model(iris[, 1:4], method = "pca")

print(model_pca)
#> tidylearn Model
#> ===============
#> Paradigm: unsupervised
#> Method: pca
#> Technique: pca
#>
#> Training observations: 150
```

``` r
# Transform data
transformed <- predict(model_pca)
head(transformed)
#> # A tibble: 6 × 5
#>   .obs_id   PC1    PC2     PC3      PC4
#>   <chr>   <dbl>  <dbl>   <dbl>    <dbl>
#> 1 1       -2.26 -0.478  0.127   0.0241
#> 2 2       -2.07  0.672  0.234   0.103
#> 3 3       -2.36  0.341 -0.0441  0.0283
#> 4 4       -2.29  0.595 -0.0910 -0.0657
#> 5 5       -2.38 -0.645 -0.0157 -0.0358
#> 6 6       -2.07 -1.48  -0.0269  0.00659
```

``` r
# K-means clustering
model_kmeans <- tl_model(iris[, 1:4], method = "kmeans", k = 3)

print(model_kmeans)
#> tidylearn Model
#> ===============
#> Paradigm: unsupervised
#> Method: kmeans
#> Technique: kmeans
#>
#> Training observations: 150
```

tidylearn provides comprehensive preprocessing functions:
``` r
# Simple random split
split <- tl_split(iris, prop = 0.7, seed = 123)

# Train model
model_train <- tl_model(split$train, Species ~ ., method = "logistic")
#> Warning: glm.fit: algorithm did not converge
#> Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

# Test predictions
predictions_test <- predict(model_train, new_data = split$test)
head(predictions_test)
#> # A tibble: 6 × 1
#>      .pred
#>      <dbl>
#> 1 2.22e-16
#> 2 2.22e-16
#> 3 2.22e-16
#> 4 2.22e-16
#> 5 2.22e-16
#> 6 2.22e-16
```

``` r
# Stratified split (maintains class proportions)
split_strat <- tl_split(iris, prop = 0.7, stratify = "Species", seed = 123)

# Check proportions are maintained
prop.table(table(split_strat$train$Species))
#>
#>     setosa versicolor  virginica
#>  0.3333333  0.3333333  0.3333333
prop.table(table(split_strat$test$Species))
#>
#>     setosa versicolor  virginica
#>  0.3333333  0.3333333  0.3333333
prop.table(table(iris$Species))
#>
#>     setosa versicolor  virginica
#>  0.3333333  0.3333333  0.3333333
```

tidylearn provides a unified interface to these established R packages:
Supervised methods:

| Method | Underlying Package | Function Called |
|---|---|---|
| `"linear"` | stats | `lm()` |
| `"polynomial"` | stats | `lm()` with `poly()` |
| `"logistic"` | stats | `glm(..., family = binomial)` |
| `"ridge"`, `"lasso"`, `"elastic_net"` | glmnet | `glmnet()` |
| `"tree"` | rpart | `rpart()` |
| `"forest"` | randomForest | `randomForest()` |
| `"boost"` | gbm | `gbm()` |
| `"xgboost"` | xgboost | `xgb.train()` |
| `"svm"` | e1071 | `svm()` |
| `"nn"` | nnet | `nnet()` |
| `"deep"` | keras | `keras_model_sequential()` |
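Switching estimators is just a change of the method string. The sketch below is hypothetical: it assumes `tl_model()` accepts a regression formula the same way as the classification calls shown earlier, and that `$fit` holds the raw glmnet object as documented.

``` r
# Hypothetical sketch: fit a lasso regression on mtcars.
# method = "lasso" dispatches to glmnet::glmnet() under the hood.
model_lasso <- tl_model(mtcars, mpg ~ ., method = "lasso")

# Package-specific tools still work on the untouched glmnet object,
# e.g. the coefficient path across the lambda sequence:
coef(model_lasso$fit)
```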
Unsupervised methods:

| Method | Underlying Package | Function Called |
|---|---|---|
| `"pca"` | stats | `prcomp()` |
| `"mds"` | stats, MASS, smacof | `cmdscale()`, `isoMDS()`, etc. |
| `"kmeans"` | stats | `kmeans()` |
| `"pam"` | cluster | `pam()` |
| `"clara"` | cluster | `clara()` |
| `"hclust"` | stats | `hclust()` |
| `"dbscan"` | dbscan | `dbscan()` |
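Unsupervised methods follow the same pattern as the k-means example above. A hypothetical sketch, assuming `method = "hclust"` needs no extra required arguments:

``` r
# Hypothetical sketch: hierarchical clustering on the iris measurements.
# method = "hclust" dispatches to stats::hclust().
model_hc <- tl_model(iris[, 1:4], method = "hclust")

# The raw hclust object supports the usual base-R tools,
# e.g. cutting the dendrogram into 3 clusters:
cutree(model_hc$fit, k = 3)
```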
You always have access to the raw model from the underlying package
via `$fit`:
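For instance, a forest model fitted through tidylearn still exposes everything randomForest computes. A sketch (the model-fitting call mirrors the examples in this document; `randomForest::importance()` is the standard randomForest accessor):

``` r
# Fit through tidylearn, then drop down to the raw randomForest object.
model_rf <- tl_model(iris, Species ~ ., method = "forest")

# $fit is the untouched randomForest model, so package-specific
# functions such as variable importance work directly on it:
randomForest::importance(model_rf$fit)
```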
Now that you understand the basics, explore:

- `tl_auto_ml()`

In summary, tidylearn is a wrapper package that provides:

- A unified interface (`tl_model()`) that dispatches to proven packages like glmnet, randomForest, xgboost, e1071, and others
- Direct access to the raw fitted object via `model$fit` for package-specific functionality

The underlying algorithms are unchanged; tidylearn simply makes them easier to use together.
``` r
# Quick example combining everything
data_split <- tl_split(iris, prop = 0.7, stratify = "Species", seed = 42)
data_prep <- tl_prepare_data(data_split$train, Species ~ ., scale_method = "standardize")
#> Scaling numeric features using method: standardize

model_final <- tl_model(data_prep$data, Species ~ ., method = "forest")
test_preds <- predict(model_final, new_data = data_split$test)

print(model_final)
#> tidylearn Model
#> ===============
#> Paradigm: supervised
#> Method: forest
#> Task: Classification
#> Formula: Species ~ .
#>
#> Training observations: 105
```