---
title: "Migrating from rMIDAS to rMIDAS2"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Migrating from rMIDAS to rMIDAS2}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

This vignette accompanies the deprecation of **rMIDAS**. Existing
projects can keep using **rMIDAS**, but new development should move to
[**rMIDAS2**](https://CRAN.R-project.org/package=rMIDAS2).
The source repository for the successor package is
<https://github.com/MIDASverse/rMIDAS2>.

## Why rMIDAS2?

**rMIDAS2** re-implements the MIDAS multiple imputation algorithm with
several improvements over rMIDAS:

| | rMIDAS | rMIDAS2 |
|---|---|---|
| **Backend** | TensorFlow (Python, via `reticulate`) | PyTorch (Python, via local HTTP API) |
| **Runtime R dependency on `reticulate`** | Yes | No |
| **Preprocessing** | Manual (`convert()`) | Automatic |
| **Python versions** | 3.6--3.10 | 3.9+ |
| **TensorFlow required** | Yes (< 2.12) | No |

The API is deliberately simpler: most pipelines that required four
function calls in rMIDAS need just one or two in rMIDAS2.

## Installation

```{r}
# Remove rMIDAS (optional -- it can coexist)
# remove.packages("rMIDAS")

# Install rMIDAS2
install.packages("rMIDAS2")

# One-time Python backend setup
library(rMIDAS2)
install_backend()
```

## Side-by-side comparison

### 1. Setup

**rMIDAS** depended on a `reticulate` Python environment with
TensorFlow, set up on first load or by hand:

```{r}
# --- rMIDAS ---
library(rMIDAS)
# Python environment configured automatically on first load,
# or manually via set_python_env()
```

**rMIDAS2** uses a standalone Python server -- no `reticulate` needed at
runtime:

```{r}
# --- rMIDAS2 ---
library(rMIDAS2)
install_backend()        # one-time setup
# The server starts automatically when you call any imputation function
```

### 2. Data preparation

**rMIDAS** required explicit preprocessing with `convert()`, where you
had to specify which columns were binary and which were categorical:

```{r}
# --- rMIDAS ---
data(adult)
adult_conv <- convert(adult,
                      bin_cols = c("income"),
                      cat_cols = c("workclass", "marital_status"),
                      minmax_scale = TRUE)
```

**rMIDAS2** detects column types automatically -- just pass your data
frame directly:

```{r}
# --- rMIDAS2 ---
# No convert() step needed. Pass raw data to midas() or midas_fit().
```
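
When retiring an old `convert()` call, it can help to confirm that the
columns you used to list in `bin_cols` and `cat_cols` look the way you
expect in the raw data. The sketch below uses base R only.
`guess_col_type()` is a hypothetical helper with a rough rule of our own;
it is not rMIDAS2's detection logic, and the toy data frame is made up.

```{r}
# Hypothetical helper: rough column-type guess based on observed values.
guess_col_type <- function(col) {
  n_levels <- length(unique(col[!is.na(col)]))
  if (n_levels == 2) {
    "binary"
  } else if (is.factor(col) || is.character(col)) {
    "categorical"
  } else {
    "numeric"
  }
}

# Made-up data with missing values, loosely shaped like `adult`.
toy <- data.frame(
  age       = c(25, 38, NA, 52),
  income    = c("<=50K", ">50K", "<=50K", NA),
  workclass = c("Private", "State-gov", NA, "Self-emp"),
  stringsAsFactors = FALSE
)

# age is guessed as numeric, income as binary, workclass as categorical.
col_types <- vapply(toy, guess_col_type, character(1))
col_types
```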

### 3. Training

**rMIDAS** used `train()`:

```{r}
# --- rMIDAS ---
mid <- train(adult_conv,
             training_epochs = 20L,
             layer_structure = c(256, 256, 256),
             input_drop      = 0.8,
             learn_rate      = 0.0004,
             seed            = 89L)
```

**rMIDAS2** uses `midas_fit()` (or the all-in-one `midas()`):

```{r}
# --- rMIDAS2 ---
fit <- midas_fit(adult,
                 epochs        = 20L,
                 hidden_layers = c(256L, 128L, 64L),
                 corrupt_rate  = 0.8,
                 lr            = 0.001,
                 seed          = 89L)
```

**Parameter name changes:**

| rMIDAS (`train()`) | rMIDAS2 (`midas_fit()`) | Notes |
|---|---|---|
| `training_epochs` | `epochs` | |
| `layer_structure` | `hidden_layers` | Default changed from 256-256-256 to 256-128-64 |
| `input_drop` | `corrupt_rate` | |
| `learn_rate` | `lr` | Default changed from 0.0004 to 0.001 |
| `dropout_level` | `dropout_prob` | |
| `train_batch` | `batch_size` | Default changed from 16 to 64 |
| `cont_adj` | `num_adj` | |
| `softmax_adj` | `cat_adj` | |
| `binary_adj` | `bin_adj` | |
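
If you have many old `train()` calls, the renames in the table above can
be applied mechanically. The following is a minimal sketch in base R;
`translate_train_args()` is a hypothetical helper, not part of either
package, and it only renames arguments.

```{r}
# Old-to-new argument names, copied from the table above.
train_to_midas_fit <- c(
  training_epochs = "epochs",
  layer_structure = "hidden_layers",
  input_drop      = "corrupt_rate",
  learn_rate      = "lr",
  dropout_level   = "dropout_prob",
  train_batch     = "batch_size",
  cont_adj        = "num_adj",
  softmax_adj     = "cat_adj",
  binary_adj      = "bin_adj"
)

# Hypothetical helper: rename the arguments of an old train() call.
# Names that are not in the table (e.g. seed) pass through unchanged.
translate_train_args <- function(old_args) {
  old_names <- names(old_args)
  renamed <- old_names %in% names(train_to_midas_fit)
  names(old_args)[renamed] <- unname(train_to_midas_fit[old_names[renamed]])
  old_args
}

old_args <- list(training_epochs = 20L, learn_rate = 0.0004, seed = 89L)
new_args <- translate_train_args(old_args)
names(new_args)
#> [1] "epochs" "lr"     "seed"
```

The renamed list could then be passed on with something like
`do.call(midas_fit, c(list(adult), new_args))`. Values are passed
through untouched, and any argument you never set will pick up the new
defaults noted in the table.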

### 4. Generating imputations

**rMIDAS** used `complete()`:

```{r}
# --- rMIDAS ---
imps <- complete(mid, m = 10)
# Returns a list of 10 data.frames
head(imps[[1]])
```

**rMIDAS2** uses `midas_transform()`:

```{r}
# --- rMIDAS2 ---
imps <- midas_transform(fit, m = 10)
# Returns a list of 10 data.frames
head(imps[[1]])
```

Or skip `midas_fit()` + `midas_transform()` entirely and use the
all-in-one `midas()`:

```{r}
# --- rMIDAS2 (all-in-one) ---
result <- midas(adult, m = 10, epochs = 20L)
head(result$imputations[[1]])
```

### 5. Rubin's rules regression

The `combine()` interface has changed.

**rMIDAS** took a formula and a list of completed data frames:

```{r}
# --- rMIDAS ---
combine("income ~ age + hours_per_week", imps)
```

**rMIDAS2** takes a model ID and an outcome variable name.
If `ind_vars` is omitted, all other columns are used as predictors:

```{r}
# --- rMIDAS2 ---
combine(fit, y = "income")

# Specify predictors explicitly:
combine(fit, y = "income", ind_vars = c("age", "hours_per_week"))
```

The output format is the same: a data frame with columns `term`,
`estimate`, `std.error`, `statistic`, `df`, and `p.value`.
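
To see what the pooled columns mean, here is a base R sketch of the
standard Rubin's rules calculation for a single coefficient.
`pool_rubin()` is a hypothetical helper, the inputs are made-up numbers,
and we are not claiming that `combine()` is implemented this way
internally.

```{r}
# Hypothetical helper: pool one coefficient across m completed datasets.
pool_rubin <- function(estimates, std_errors) {
  m <- length(estimates)
  q_bar <- mean(estimates)            # pooled point estimate
  w <- mean(std_errors^2)             # within-imputation variance
  b <- var(estimates)                 # between-imputation variance
  total <- w + (1 + 1 / m) * b        # total variance
  # Classical Rubin degrees of freedom (infinite when b is zero).
  df <- (m - 1) * (1 + w / ((1 + 1 / m) * b))^2
  data.frame(estimate = q_bar, std.error = sqrt(total), df = df)
}

# Toy inputs: the same slope estimated on m = 3 completed datasets.
pooled <- pool_rubin(estimates  = c(1.0, 1.2, 1.4),
                     std_errors = c(0.5, 0.5, 0.5))
pooled
```

The pooled standard error is larger than any single-dataset standard
error whenever the estimates disagree across imputations, which is how
the uncertainty due to missing data enters the final inference.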

### 6. Overimputation diagnostic

**rMIDAS** required re-specifying the data and column types:

```{r}
# --- rMIDAS ---
overimpute(adult,
           binary_columns  = c("income"),
           softmax_columns = c("workclass", "marital_status"),
           training_epochs = 20L,
           spikein = 0.3)
```

**rMIDAS2** runs overimputation on an already-fitted model:

```{r}
# --- rMIDAS2 ---
diag <- overimpute(fit, mask_frac = 0.1)
diag$mean_rmse
diag$rmse     # per-column RMSE
```
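
The idea behind the diagnostic can be sketched in base R: hide a
fraction of the observed values, impute them, and compare against the
truth. The example below substitutes a plain column mean for the MIDAS
model so that it runs without the backend. The data are made up, and
whether rMIDAS2 computes its RMSE in exactly this way is an assumption.

```{r}
set.seed(1)
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)  # fully observed toy column
mask_frac <- 0.3

# Hide a random 30% of the observed values.
held_out <- sample(seq_along(x), size = round(mask_frac * length(x)))
x_masked <- x
x_masked[held_out] <- NA

# Stand-in imputer (NOT MIDAS): fill with the mean of the remaining values.
x_imputed <- x_masked
x_imputed[held_out] <- mean(x_masked, na.rm = TRUE)

# Error on the held-out cells only, where the truth is known.
rmse <- sqrt(mean((x_imputed[held_out] - x[held_out])^2))
rmse
```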

### 7. Mean imputation (new in rMIDAS2)

rMIDAS2 adds `imp_mean()`, which computes the element-wise mean
across all imputations -- useful as a quick single point estimate:

```{r}
# --- rMIDAS2 only ---
mean_df <- imp_mean(fit)
head(mean_df)
```
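
For numeric columns, an element-wise mean over a list of completed data
frames can be reproduced in base R, which is handy for checking results
by hand. The list below is made-up toy data, and how `imp_mean()` treats
categorical columns is not covered by this sketch.

```{r}
# Three toy "imputations" of the same two-row data set.
imps_toy <- list(
  data.frame(age = c(30, 41), hours = c(40, 35)),
  data.frame(age = c(32, 41), hours = c(38, 35)),
  data.frame(age = c(34, 41), hours = c(42, 35))
)

# Sum the data frames cell by cell, then divide by the number of imputations.
mean_toy <- Reduce(`+`, imps_toy) / length(imps_toy)
mean_toy
#>   age hours
#> 1  32    40
#> 2  41    35
```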

### 8. Cleanup

**rMIDAS2** runs a background Python server that should be stopped when
you are done:

```{r}
# --- rMIDAS2 ---
stop_server()
```
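
In scripts, a failed imputation can skip the cleanup line. One way to
guard against that is base R's `on.exit()`. The sketch below shows the
pattern with stand-in functions so that it runs without the backend;
`with_cleanup()` is a hypothetical wrapper, not an rMIDAS2 function.

```{r}
# Hypothetical wrapper: run work() and always call cleanup() afterwards,
# even if work() stops with an error.
with_cleanup <- function(work, cleanup) {
  on.exit(cleanup(), add = TRUE)
  work()
}

# Toy demonstration with stand-in functions (no server involved).
server_running <- TRUE
out <- try(
  with_cleanup(
    work    = function() stop("imputation failed"),
    cleanup = function() server_running <<- FALSE
  ),
  silent = TRUE
)
server_running
#> [1] FALSE
```

With rMIDAS2 loaded, the same pattern would read
`with_cleanup(function() midas(adult, m = 5), stop_server)`.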

## Complete migration example

Below is a full rMIDAS pipeline and its rMIDAS2 equivalent.

### rMIDAS (old)

```{r}
library(rMIDAS)

data(adult)
adult <- adult[1:1000, ]

# 1. Preprocess
adult_conv <- convert(adult,
                      bin_cols  = c("income"),
                      cat_cols  = c("workclass", "marital_status"),
                      minmax_scale = TRUE)

# 2. Train
mid <- train(adult_conv, training_epochs = 20L, seed = 89L)

# 3. Generate imputations
imps <- complete(mid, m = 5)

# 4. Analyse
combine("income ~ age + hours_per_week", imps)
```

### rMIDAS2 (new)

```{r}
library(rMIDAS2)

data(adult)
adult <- adult[1:1000, ]

# 1. Fit and impute (no preprocessing needed)
result <- midas(adult, m = 5, epochs = 20L, seed = 89L)

# 2. Analyse
combine(result, y = "income", ind_vars = c("age", "hours_per_week"))

# 3. Clean up
stop_server()
```

## Quick-reference cheat sheet

| Task | rMIDAS | rMIDAS2 |
|---|---|---|
| Install Python env | Automatic / `set_python_env()` | `install_backend()` |
| Preprocess data | `convert(data, bin_cols, cat_cols)` | *Not needed* |
| Train model | `train(data, training_epochs, ...)` | `midas_fit(data, epochs, ...)` |
| Generate imputations | `complete(model, m)` | `midas_transform(model, m)` |
| Train + impute (one step) | *Not available* | `midas(data, m, epochs, ...)` |
| Mean imputation | *Not available* | `imp_mean(model)` |
| Rubin's rules | `combine(formula, df_list)` | `combine(model, y, ind_vars)` |
| Overimputation | `overimpute(data, ...)` | `overimpute(model, mask_frac)` |
| Shutdown | *Not needed* | `stop_server()` |
