---
title: "10. Scaling Up with Parallel Processing"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{10. Scaling Up with Parallel Processing}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Introduction

The Double Super Learner is a powerful statistical framework, but it is computationally demanding. With 5 base algorithms and 10-fold cross-validation, `SuperSurv` has to fit at least 5 × 10 = 50 separate machine learning models for the Event ensemble alone, plus a further set for the Censoring ensemble.
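
The arithmetic above can be checked with a short sketch. The helper `count_cv_fits()` is hypothetical (it is not part of `SuperSurv`), and it counts only the cross-validated fits; any additional refits on the full data are deliberately ignored here.

```{r count-fits, eval=FALSE}
# Hypothetical helper, not part of SuperSurv:
# one fit per learner per cross-validation fold.
count_cv_fits <- function(n_learners, n_folds) {
  n_learners * n_folds
}

# 5 base algorithms with 10-fold cross-validation for the Event ensemble
event_fits <- count_cv_fits(n_learners = 5, n_folds = 10)
event_fits
#> [1] 50
```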



By default, R fits these models sequentially, one after the other. `SuperSurv` also supports parallel processing through the `future` and `future.apply` packages. This lets you distribute the cross-validation folds across multiple CPU cores, which can substantially reduce computation time.

## 1. Prerequisites

To use parallel processing, install the `future` and `future.apply` packages.

```{r eval=FALSE}
install.packages(c("future", "future.apply"))
```

## 2. Setting Up the Parallel Environment

`SuperSurv` relies on you to define your parallel "plan" before running the function. This gives you complete control over how many resources the package is allowed to consume.

```{r parallel-setup, message=FALSE, warning=FALSE, eval=FALSE}
library(SuperSurv)
library(future)
library(survival)

data("metabric", package = "SuperSurv")

# 1. Define the parallel plan
# 'multisession' opens background R sessions. 
# We tell it to use 4 CPU cores (workers).
plan(multisession, workers = 4)
```
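
The plan above hard-codes four workers. If you would rather size the plan to the machine, one possible approach is sketched below using `availableCores()` and `nbrOfWorkers()` from the `future` package. Holding one core back via `omit = 1` is a common convention assumed here, not a `SuperSurv` requirement.

```{r choose-workers, eval=FALSE}
library(future)

# availableCores() reports how many cores this R session may use.
# omit = 1 keeps one core free for the main session (an assumed convention);
# the result is never smaller than 1.
n_workers <- availableCores(omit = 1)

plan(multisession, workers = n_workers)

# Confirm how many workers the current plan provides
nbrOfWorkers()
```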

## 3. Running SuperSurv in Parallel

Once the `plan` is set, simply add `parallel = TRUE` to your `SuperSurv` call. The internal cross-validation loop then uses the workers defined by your plan and fits the folds concurrently.

```{r run-parallel, eval=FALSE}
X <- metabric[, grep("^x", names(metabric))]
new.times <- seq(50, 200, by = 25)

# 2. Run the model with parallel = TRUE
fit_parallel <- SuperSurv(
  time = metabric$duration,
  event = metabric$event,
  X = X,
  newX = X,
  new.times = new.times,
  event.library = c("surv.coxph", "surv.weibull", "surv.rfsrc"),
  cens.library = c("surv.coxph"),
  parallel = TRUE,     # <--- The magic argument
  nFolds = 5
)
```
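
To give a feel for what distributing folds means, here is a minimal, generic sketch built on `future.apply::future_lapply()`. It is not the `SuperSurv` implementation: the `lung` data from `survival`, the fold assignment, and the single Cox model are all stand-ins chosen for illustration.

```{r fold-sketch, eval=FALSE}
library(future)
library(future.apply)
library(survival)

plan(multisession, workers = 2)

# Illustrative data and fold assignment (not SuperSurv internals)
dat <- na.omit(lung[, c("time", "status", "age", "sex")])
n_folds <- 5
set.seed(1)
fold_id <- sample(rep(seq_len(n_folds), length.out = nrow(dat)))

# Each fold is an independent task, so future_lapply() can hand
# different folds to different workers.
cv_concordance <- future_lapply(seq_len(n_folds), function(k) {
  train <- dat[fold_id != k, ]
  test  <- dat[fold_id == k, ]
  fit <- coxph(Surv(time, status) ~ age + sex, data = train)
  test$lp <- predict(fit, newdata = test, type = "lp")
  # reverse = TRUE because a higher risk score implies shorter survival
  concordance(Surv(time, status) ~ lp, data = test, reverse = TRUE)$concordance
}, future.seed = TRUE, future.packages = "survival")

# One held-out concordance value per fold
unlist(cv_concordance)

plan(sequential)
```

Because no fold depends on another, the tasks can run in any order and on any worker, which is what makes cross-validation a natural fit for this kind of parallelism.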

## 4. Closing the Environment

It is good practice to shut down the background workers and return to standard, sequential processing once your intensive models have finished fitting. This frees the memory held by the worker sessions.

```{r close-parallel, eval=FALSE}
# 3. Return to sequential processing
plan(sequential)
```
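
If the fitting code runs inside a function, the reset can be made automatic. The sketch below relies on `plan()` returning the previous plan, so it can be restored with `on.exit()` even when an error occurs. The wrapper `with_workers()` is a hypothetical helper written for this example, not a `SuperSurv` or `future` function.

```{r scoped-plan, eval=FALSE}
library(future)

# Hypothetical wrapper: run `task` under a temporary multisession plan.
with_workers <- function(task, workers = 2) {
  # plan() returns the plan that was active before the change
  old_plan <- plan(multisession, workers = workers)
  # Restore the previous plan on exit, even if task() fails
  on.exit(plan(old_plan), add = TRUE)
  task()
}

# Inside the wrapper the temporary plan is active
workers_inside <- with_workers(function() nbrOfWorkers())

# Afterwards the original plan is back in place
workers_after <- nbrOfWorkers()
```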

## A Note on Mathematical Reproducibility

In standard parallel processing, each worker draws random numbers on its own, so anything that relies on randomness (cross-validation splits, Random Forests) can produce results that change slightly every time you run the code.

`SuperSurv` handles this safely under the hood. When `parallel = TRUE`, the package automatically sets `future.seed = TRUE`, which gives every task its own reproducible random number stream. Provided you call `set.seed()` before fitting, your parallelized ensemble yields the same reproducible results as your sequential ensemble, only faster.
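
The effect of `future.seed = TRUE` can be seen in isolation with a small `future.apply` sketch. It illustrates the general mechanism only and says nothing about `SuperSurv` internals; the helper `draw()` is made up for this example.

```{r seed-demo, eval=FALSE}
library(future)
library(future.apply)

# Hypothetical helper: four independent random draws, one per task
draw <- function() {
  future_sapply(1:4, function(i) rnorm(1), future.seed = TRUE)
}

# Same seed, sequential backend
plan(sequential)
set.seed(42)
seq_draws <- draw()

# Same seed, two background workers
plan(multisession, workers = 2)
set.seed(42)
par_draws <- draw()

plan(sequential)

identical(seq_draws, par_draws)
#> [1] TRUE
```

The per-task random number streams are derived from the seed before any work is dispatched, so the draws do not depend on which worker runs which task.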
