---
title: "Model Selection for Masked Series Systems via Likelihood Ratio Tests"
output:
    rmarkdown::html_vignette:
        toc: true
vignette: >
  %\VignetteIndexEntry{Model Selection for Masked Series Systems via Likelihood Ratio Tests}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
header-includes:
- \renewcommand{\v}[1]{\boldsymbol{#1}}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  echo = TRUE,
  comment = "#>",
  fig.align = "center")

# Set to TRUE to regenerate long-running simulation results
run_long <- FALSE

library(maskedcauses)
old_opts <- options(digits = 4)
```


Introduction {#introduction}
============================

When analyzing masked series system data, a fundamental question precedes
parameter estimation: which parametric family should we use for the component
lifetimes? The `maskedcauses` package provides three models of increasing
generality --- exponential, homogeneous Weibull, and heterogeneous Weibull ---
each nested in the next. This hierarchy forms a natural scaffold for model
selection via the likelihood ratio test (LRT).

We develop a two-step testing cascade that moves top-down from the most
general model, simplifying only when the data cannot distinguish between
levels. We demonstrate the cascade on worked examples and then assess the
power of the shape heterogeneity test through Monte Carlo simulation.


The Weibull Nesting Chain {#nesting-chain}
==========================================

## Three levels of complexity

The three models form a strict nesting chain:
$$
\underbrace{\text{Exponential}}_{\lambda_1, \ldots, \lambda_m}
\;\subset\;
\underbrace{\text{Hom.\ Weibull}}_{k,\, \beta_1, \ldots, \beta_m}
\;\subset\;
\underbrace{\text{Het.\ Weibull}}_{k_1, \beta_1, \ldots, k_m, \beta_m}
$$

| Model | Constructor | Parameters | Count |
|-------|-------------|------------|:-----:|
| Exponential | `exp_series_md_c1_c2_c3()` | $\lambda_1, \ldots, \lambda_m$ | $m$ |
| Hom. Weibull | `wei_series_homogeneous_md_c1_c2_c3()` | $k, \beta_1, \ldots, \beta_m$ | $m + 1$ |
| Het. Weibull | `wei_series_md_c1_c2_c3()` | $k_1, \beta_1, \ldots, k_m, \beta_m$ | $2m$ |


## The two LRT steps

Each nesting defines a hypothesis test:

**Step 1: Shape heterogeneity.** Does the data support different shapes for
different components?
$$
H_0: k_1 = k_2 = \cdots = k_m \quad \text{(homogeneous)}
\quad\text{vs}\quad
H_1: \text{shapes unrestricted (heterogeneous)}
$$
The LRT statistic is
$\Lambda = -2[\ell(\hat\theta_{\text{hom}}) - \ell(\hat\theta_{\text{het}})]$,
which under $H_0$ follows $\chi^2_{m-1}$ asymptotically.
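
Both steps reduce to the same computation: a difference of maximized
log-likelihoods referred to a $\chi^2$ distribution. A minimal base-R sketch
(the helper name `lrt_pvalue` is ours, not part of the `maskedcauses` API; the
log-likelihood values below are illustrative):

```{r lrt-helper}
# LRT statistic and asymptotic p-value for a pair of nested fits.
#   ll0: maximized log-likelihood of the restricted (null) model
#   ll1: maximized log-likelihood of the full model
#   df:  difference in the number of free parameters
lrt_pvalue <- function(ll0, ll1, df) {
  stat <- -2 * (ll0 - ll1)
  c(stat = stat, p = pchisq(stat, df = df, lower.tail = FALSE))
}

# Illustrative step-1 test with m = 5 components, so df = m - 1 = 4
lrt_pvalue(ll0 = -1520.3, ll1 = -1512.9, df = 4)
```

Here $\Lambda = 14.8$ on 4 degrees of freedom, so the homogeneous restriction
would be rejected at the 5% level.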

**Step 2: Aging.** If the homogeneous model is retained, does the common shape
differ from 1 (exponential)?
$$
H_0: k = 1 \quad \text{(exponential)}
\quad\text{vs}\quad
H_1: k \neq 1 \quad \text{(homogeneous Weibull)}
$$
The LRT statistic follows $\chi^2_1$ under $H_0$.


## Physical interpretation

The nesting chain reflects increasing physical realism:

- **Exponential $\to$ Homogeneous Weibull** tests whether components *age*
  ($k > 1$, increasing hazard) or exhibit infant mortality ($k < 1$, decreasing
  hazard). Rejecting $k = 1$ means the constant-hazard assumption is inadequate.

- **Homogeneous $\to$ Heterogeneous Weibull** tests whether components have
  *different aging patterns*. Rejecting equal shapes indicates at least one
  component has a fundamentally different failure mechanism --- for example, an
  electronic component with infant mortality alongside a bearing with wear-out.


## Top-down testing cascade

We recommend a **top-down** approach: start with the most general model and
simplify until a restriction is rejected:

1. Fit all three models to the data.
2. Test heterogeneous vs homogeneous. If we cannot reject $H_0$, adopt the
   homogeneous model.
3. Test homogeneous vs exponential. If we cannot reject $H_0$, adopt the
   exponential model.

This approach controls the risk of prematurely simplifying the model.
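
The decision logic of steps 2 and 3 can be sketched as a small helper that
takes the three maximized log-likelihoods and returns the selected model (the
function name and the example log-likelihood values are ours, for
illustration only):

```{r cascade-sketch}
# Top-down cascade: test het vs hom first, then hom vs exp.
#   ll_exp, ll_hom, ll_het: maximized log-likelihoods of the three fits
#   m: number of components; alpha: significance level
select_model <- function(ll_exp, ll_hom, ll_het, m, alpha = 0.05) {
  # Step 1: heterogeneous vs homogeneous (df = 2m - (m + 1) = m - 1)
  p1 <- pchisq(-2 * (ll_hom - ll_het), df = m - 1, lower.tail = FALSE)
  if (p1 < alpha) return("heterogeneous")
  # Step 2: homogeneous vs exponential (df = (m + 1) - m = 1)
  p2 <- pchisq(-2 * (ll_exp - ll_hom), df = 1, lower.tail = FALSE)
  if (p2 < alpha) return("homogeneous")
  "exponential"
}

select_model(ll_exp = -1540.1, ll_hom = -1524.6, ll_het = -1522.8, m = 5)
```

With these illustrative values the cascade retains the homogeneous model:
step 1 fails to reject (small gain from the extra shapes), while step 2
strongly rejects the exponential restriction.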


Worked Example: Homogeneous True Model {#example-hom}
=====================================================

We generate data from a 5-component homogeneous Weibull system --- the "middle"
model in the chain --- and show that the LRT correctly identifies it.

## Setup and data generation

```{r hom-setup}
m <- 5
theta_hom <- c(k = 1.5, beta1 = 300, beta2 = 400, beta3 = 500,
               beta4 = 600, beta5 = 700)

model_hom <- wei_series_homogeneous_md_c1_c2_c3()

# Right-censoring time at the 80th percentile of the system lifetime
beta_sys <- wei_series_system_scale(theta_hom[1], theta_hom[-1])
tau <- qweibull(0.8, shape = theta_hom[1], scale = beta_sys)

cat("System scale:", round(beta_sys, 1), "\n")
cat("Right-censoring time (q = 0.8):", round(tau, 1), "\n")
```

```{r hom-generate}
set.seed(2024)
gen_hom <- rdata(model_hom)

df <- gen_hom(theta_hom, n = 300, p = 0.2,
              observe = observe_right_censor(tau = tau))

cat("Observation types:\n")
print(table(df$omega))
```

## Fitting all three models

```{r hom-fit-all, warning=FALSE}
# Exponential (m params)
model_exp <- exp_series_md_c1_c2_c3()
fit_e <- fit(model_exp)(df, par = rep(0.002, m), method = "Nelder-Mead")

# Homogeneous Weibull (m+1 params)
fit_h <- fit(model_hom)(df, par = c(1, rep(500, m)),
             method = "Nelder-Mead", control = list(maxit = 5000))

# Heterogeneous Weibull (2m params) — warm-start from hom fit
model_het <- wei_series_md_c1_c2_c3()
het_start <- as.numeric(rbind(rep(fit_h$par[1], m), fit_h$par[-1]))
fit_w <- fit(model_het)(df, par = het_start,
             method = "Nelder-Mead", control = list(maxit = 10000))
```

## Likelihood ratio tests

```{r hom-lrt}
np_exp <- m
np_hom <- m + 1
np_het <- 2 * m
n_obs <- nrow(df)

# LRT: heterogeneous vs homogeneous
LRT_shape <- -2 * (fit_h$loglik - fit_w$loglik)
df_shape <- np_het - np_hom   # m - 1 = 4
p_shape <- pchisq(LRT_shape, df = df_shape, lower.tail = FALSE)

# LRT: homogeneous vs exponential
LRT_aging <- -2 * (fit_e$loglik - fit_h$loglik)
df_aging <- np_hom - np_exp   # 1
p_aging <- pchisq(LRT_aging, df = df_aging, lower.tail = FALSE)

# AIC and BIC
aic_val <- function(ll, k) -2 * ll + 2 * k
bic_val <- function(ll, k, n) -2 * ll + log(n) * k

comparison <- data.frame(
  Model = c("Exponential", "Hom. Weibull", "Het. Weibull"),
  Params = c(np_exp, np_hom, np_het),
  LogLik = c(fit_e$loglik, fit_h$loglik, fit_w$loglik),
  AIC = c(aic_val(fit_e$loglik, np_exp),
          aic_val(fit_h$loglik, np_hom),
          aic_val(fit_w$loglik, np_het)),
  BIC = c(bic_val(fit_e$loglik, np_exp, n_obs),
          bic_val(fit_h$loglik, np_hom, n_obs),
          bic_val(fit_w$loglik, np_het, n_obs))
)

knitr::kable(comparison, digits = 2,
             caption = "Model comparison (true model: homogeneous Weibull)",
             col.names = c("Model", "# Params", "Log-Lik", "AIC", "BIC"))
```

```{r hom-lrt-results}
cat("Step 1: Het vs Hom (shape heterogeneity)\n")
cat(sprintf("  LRT = %.2f, df = %d, p = %.4f\n", LRT_shape, df_shape, p_shape))
if (p_shape < 0.05) {
  cat("  Decision: Reject homogeneous -> adopt heterogeneous model\n\n")
} else {
  cat("  Decision: Fail to reject -> retain homogeneous model\n\n")
}

cat("Step 2: Hom vs Exp (aging)\n")
cat(sprintf("  LRT = %.2f, df = %d, p = %.4f\n", LRT_aging, df_aging, p_aging))
if (p_aging < 0.05) {
  cat("  Decision: Reject exponential -> adopt homogeneous model\n")
} else {
  cat("  Decision: Fail to reject -> retain exponential model\n")
}
```

Since the true model is homogeneous Weibull with $k = 1.5$, we expect:

1. The heterogeneous vs homogeneous test to **fail to reject** --- the additional
   shape parameters do not significantly improve the fit.
2. The homogeneous vs exponential test to **reject** --- the data shows clear
   evidence of $k \neq 1$.

The cascade correctly identifies the homogeneous model. Both AIC and BIC
should agree, with the homogeneous model having the smallest values.


Worked Example: Heterogeneous True Model {#example-het}
=======================================================

We now generate data from a system where components have meaningfully different
shapes, and show that the LRT detects this.

```{r het-setup}
# Shapes span DFR to strong IFR
theta_het_true <- c(0.8, 300,    # DFR (electronics)
                    1.0, 400,    # CFR (random)
                    1.5, 500,    # mild IFR (seals)
                    2.0, 600,    # moderate IFR (bearings)
                    2.5, 700)    # strong IFR (wear-out)

model_het <- wei_series_md_c1_c2_c3()
gen_het <- rdata(model_het)

set.seed(7231)
df_het <- gen_het(theta_het_true, n = 300, p = 0.2,
                  observe = observe_right_censor(tau = tau))

cat("Observation types:\n")
print(table(df_het$omega))
```

```{r het-fit-all, warning=FALSE}
fit_e2 <- fit(model_exp)(df_het, par = rep(0.002, m), method = "Nelder-Mead")
fit_h2 <- fit(model_hom)(df_het, par = c(1, rep(500, m)),
              method = "Nelder-Mead", control = list(maxit = 5000))
het_start2 <- as.numeric(rbind(rep(fit_h2$par[1], m), fit_h2$par[-1]))
fit_w2 <- fit(model_het)(df_het, par = het_start2,
              method = "Nelder-Mead", control = list(maxit = 10000))
```

```{r het-lrt}
# LRT: heterogeneous vs homogeneous
LRT_shape2 <- -2 * (fit_h2$loglik - fit_w2$loglik)
p_shape2 <- pchisq(LRT_shape2, df = m - 1, lower.tail = FALSE)

# LRT: homogeneous vs exponential
LRT_aging2 <- -2 * (fit_e2$loglik - fit_h2$loglik)
p_aging2 <- pchisq(LRT_aging2, df = 1, lower.tail = FALSE)

comparison2 <- data.frame(
  Model = c("Exponential", "Hom. Weibull", "Het. Weibull"),
  Params = c(np_exp, np_hom, np_het),
  LogLik = c(fit_e2$loglik, fit_h2$loglik, fit_w2$loglik),
  AIC = c(aic_val(fit_e2$loglik, np_exp),
          aic_val(fit_h2$loglik, np_hom),
          aic_val(fit_w2$loglik, np_het)),
  BIC = c(bic_val(fit_e2$loglik, np_exp, nrow(df_het)),
          bic_val(fit_h2$loglik, np_hom, nrow(df_het)),
          bic_val(fit_w2$loglik, np_het, nrow(df_het)))
)

knitr::kable(comparison2, digits = 2,
             caption = "Model comparison (true model: heterogeneous Weibull)",
             col.names = c("Model", "# Params", "Log-Lik", "AIC", "BIC"))

cat(sprintf("\nStep 1: Het vs Hom — LRT = %.2f, df = %d, p = %.4f\n",
            LRT_shape2, m - 1, p_shape2))
cat(sprintf("Step 2: Hom vs Exp — LRT = %.2f, df = %d, p = %.4f\n",
            LRT_aging2, 1, p_aging2))
```

With true shapes spanning 0.8 to 2.5, the cascade correctly **rejects** the
homogeneous constraint in step 1 --- the shapes are genuinely different ---
and stops there, adopting the heterogeneous model.

Note that the homogeneous vs exponential test (step 2) may *fail to reject*
even though component aging is present. This is because the pooled shape
estimate averages across DFR ($k < 1$) and IFR ($k > 1$) components, yielding
$\hat{k} \approx 1$ --- which is close to exponential. The top-down cascade
avoids this trap by testing heterogeneity *first*.


Monte Carlo Study: Power of the Heterogeneity Test {#power-study}
=================================================================

The worked examples show the cascade on individual datasets. We now assess the
*statistical power* of the shape heterogeneity test (step 1) as a function of
how different the component shapes actually are.

## Design

We parameterize shape heterogeneity by the **coefficient of variation (CV)** of
the shape parameters. All $m = 5$ shapes are set to
$k_j = \bar{k}(1 + \text{cv} \cdot z_j)$ where $\bar{k} = 1.5$ and
$z_1, \ldots, z_m$ are fixed offsets with mean 0 and standard deviation 1.

| CV | Meaning |
|----|---------|
| 0% | All shapes equal --- homogeneous model is correct |
| 5--10% | Mild heterogeneity --- hard to detect |
| 20--30% | Moderate heterogeneity --- detectable with adequate $n$ |
| 50% | Strong heterogeneity --- reliably detected |

Other settings: $n = 500$ observations, $p = 0.2$ (masking probability),
right-censoring at the 80th percentile of the system lifetime under the
homogeneous baseline, $\alpha = 0.05$.
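
The shape construction is easy to verify on its own, independent of the
simulation chunks below. At CV = 20% the five shapes spread symmetrically
around $\bar{k} = 1.5$:

```{r shape-cv-check}
base_k <- 1.5
offsets <- c(-2, -1, 0, 1, 2)
offsets <- offsets / sd(offsets)          # rescale to mean 0, sd 1
shapes <- base_k * (1 + 0.20 * offsets)   # k_j at CV = 20%

round(shapes, 3)
c(mean = mean(shapes), cv = sd(shapes) / mean(shapes))  # recovers 1.5 and 0.20
```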

```{r load-precomputed-mc, include=FALSE, eval=!run_long}
mc_data <- readRDS("precomputed_model_selection.rds")
```

```{r mc-setup, eval=run_long}
set.seed(42)

m <- 5
base_k <- 1.5
base_scales <- c(300, 400, 500, 600, 700)
cv_levels <- c(0, 0.05, 0.10, 0.20, 0.30, 0.50)
n_reps <- 200
n_obs <- 500
p_mask <- 0.2
alpha <- 0.05

# Fixed offsets: mean 0, sd 1
offsets <- c(-2, -1, 0, 1, 2)
offsets <- offsets / sd(offsets)

# Right-censoring time (based on homogeneous baseline at CV = 0)
beta_sys <- wei_series_system_scale(base_k, base_scales)
tau <- qweibull(0.8, shape = base_k, scale = beta_sys)

# Models and solvers
model_exp <- exp_series_md_c1_c2_c3()
model_hom <- wei_series_homogeneous_md_c1_c2_c3()
model_het <- wei_series_md_c1_c2_c3()

solver_exp <- fit(model_exp)
solver_hom <- fit(model_hom)
solver_het <- fit(model_het)

gen_het <- rdata(model_het)

np_exp <- m
np_hom <- m + 1
np_het <- 2 * m
```

```{r mc-run, eval=run_long, cache=TRUE, warning=FALSE, message=FALSE}
results_list <- vector("list", length(cv_levels) * n_reps)
idx <- 1

for (cv in cv_levels) {
  shapes <- base_k * (1 + cv * offsets)
  theta <- as.numeric(rbind(shapes, base_scales))

  for (r in seq_len(n_reps)) {
    df_r <- gen_het(theta, n = n_obs, p = p_mask,
                    observe = observe_right_censor(tau = tau))

    row <- list(cv = cv, rep = r,
                reject_het_vs_hom = NA, reject_hom_vs_exp = NA,
                ll_exp = NA, ll_hom = NA, ll_het = NA,
                aic_exp = NA, aic_hom = NA, aic_het = NA,
                bic_exp = NA, bic_hom = NA, bic_het = NA,
                converged_all = FALSE)

    tryCatch({
      fe <- solver_exp(df_r, par = rep(0.002, m), method = "Nelder-Mead")

      # Two-phase for hom: Nelder-Mead warm-up -> L-BFGS-B refinement
      fh_nm <- solver_hom(df_r, par = c(1, rep(500, m)),
                           method = "Nelder-Mead",
                           control = list(maxit = 5000))
      fh <- solver_hom(df_r, par = fh_nm$par,
                        method = "L-BFGS-B", lower = rep(1e-6, np_hom))

      # Two-phase for het: warm-start from hom, Nelder-Mead -> L-BFGS-B
      het_start <- as.numeric(rbind(rep(fh$par[1], m), fh$par[-1]))
      fw_nm <- solver_het(df_r, par = het_start,
                           method = "Nelder-Mead",
                           control = list(maxit = 10000))
      fw <- solver_het(df_r, par = fw_nm$par,
                        method = "L-BFGS-B", lower = rep(1e-6, np_het))

      if (fe$converged && fh$converged && fw$converged) {
        row$ll_exp <- fe$loglik
        row$ll_hom <- fh$loglik
        row$ll_het <- fw$loglik

        row$aic_exp <- -2 * fe$loglik + 2 * np_exp
        row$aic_hom <- -2 * fh$loglik + 2 * np_hom
        row$aic_het <- -2 * fw$loglik + 2 * np_het

        row$bic_exp <- -2 * fe$loglik + log(n_obs) * np_exp
        row$bic_hom <- -2 * fh$loglik + log(n_obs) * np_hom
        row$bic_het <- -2 * fw$loglik + log(n_obs) * np_het

        LRT_het <- -2 * (fh$loglik - fw$loglik)
        LRT_hom <- -2 * (fe$loglik - fh$loglik)

        row$reject_het_vs_hom <- pchisq(LRT_het, df = m - 1,
                                          lower.tail = FALSE) < alpha
        row$reject_hom_vs_exp <- pchisq(LRT_hom, df = 1,
                                          lower.tail = FALSE) < alpha
        row$converged_all <- TRUE
      }
    }, error = function(e) NULL)

    results_list[[idx]] <- row
    idx <- idx + 1
  }

  cat(sprintf("CV = %.0f%% complete\n", cv * 100))
}

power_study <- do.call(rbind, lapply(results_list, as.data.frame))
```

```{r mc-save, eval=run_long, include=FALSE}
mc_data <- list(
  power_study = power_study,
  config = list(m = m, n = n_obs, p = p_mask, tau = tau,
                n_reps = n_reps, alpha = alpha,
                base_k = base_k, base_scales = base_scales,
                offsets = offsets)
)
saveRDS(mc_data, "precomputed_model_selection.rds")
```


## Rejection rates

```{r mc-rejection-rates}
ps <- mc_data$power_study
ps_conv <- ps[ps$converged_all, ]

rej_summary <- do.call(rbind, lapply(split(ps_conv, ps_conv$cv), function(d) {
  data.frame(
    CV_pct = d$cv[1] * 100,
    n_converged = nrow(d),
    reject_het_vs_hom = mean(d$reject_het_vs_hom),
    reject_hom_vs_exp = mean(d$reject_hom_vs_exp)
  )
}))

knitr::kable(rej_summary, digits = 3, row.names = FALSE,
             caption = sprintf(
               "Rejection rates at alpha = %.2f (n = %d, m = %d)",
               mc_data$config$alpha, mc_data$config$n, mc_data$config$m),
             col.names = c("Shape CV (%)", "Converged",
                          "Reject Hom (Het vs Hom)",
                          "Reject Exp (Hom vs Exp)"))
```

## Power curve

```{r mc-power-curve, fig.width=7, fig.height=5}
plot(rej_summary$CV_pct, rej_summary$reject_het_vs_hom,
     type = "b", pch = 19, col = "steelblue", lwd = 2,
     xlab = "Coefficient of Variation of Shapes (%)",
     ylab = "Rejection Rate",
     ylim = c(0, 1),
     main = "Power of the LRT Cascade")
lines(rej_summary$CV_pct, rej_summary$reject_hom_vs_exp,
      type = "b", pch = 17, col = "firebrick", lwd = 2, lty = 2)
abline(h = mc_data$config$alpha, lty = 3, col = "gray50")
legend("right",
       c("Het vs Hom (shape heterogeneity)",
         "Hom vs Exp (aging)",
         expression(alpha == 0.05)),
       col = c("steelblue", "firebrick", "gray50"),
       lty = c(1, 2, 3), pch = c(19, 17, NA), lwd = 2, cex = 0.85)
grid()
```

At CV = 0% (no heterogeneity), the rejection rate of the shape heterogeneity
test should be near the nominal level $\alpha = 0.05$ --- the Type I error rate.
As CV increases, the power rises, approaching 1 by CV = 50%.

The aging test (hom vs exp) maintains high rejection rates across all CV levels,
because the baseline shape $\bar{k} = 1.5$ is far from 1 regardless of how much
the individual shapes vary around it.


## AIC/BIC model selection accuracy

We also evaluate how often each information criterion selects the correct model.
At CV = 0, the correct model is homogeneous Weibull. At CV > 0, it is
heterogeneous Weibull.

```{r mc-aic-accuracy}
aic_correct <- do.call(rbind, lapply(split(ps_conv, ps_conv$cv), function(d) {
  cv <- d$cv[1]

  aic_choice <- ifelse(d$aic_het < d$aic_hom & d$aic_het < d$aic_exp, "het",
                ifelse(d$aic_hom < d$aic_exp, "hom", "exp"))
  bic_choice <- ifelse(d$bic_het < d$bic_hom & d$bic_het < d$bic_exp, "het",
                ifelse(d$bic_hom < d$bic_exp, "hom", "exp"))

  correct_model <- if (cv == 0) "hom" else "het"

  data.frame(
    CV_pct = cv * 100,
    AIC_correct = mean(aic_choice == correct_model),
    BIC_correct = mean(bic_choice == correct_model)
  )
}))

knitr::kable(aic_correct, digits = 3, row.names = FALSE,
             caption = "Proportion selecting the correct model",
             col.names = c("Shape CV (%)", "AIC Correct", "BIC Correct"))
```


Practical Guidelines {#guidelines}
===================================

Based on the Monte Carlo study and general theory, we offer the following
recommendations for practitioners:

| Shape CV | LRT Power | Recommended Approach |
|----------|-----------|---------------------|
| 0% (identical) | $\sim\alpha$ (nominal) | Homogeneous Weibull --- no benefit to extra parameters |
| $< 10\%$ | Low ($< 20\%$) | Prefer homogeneous --- heterogeneity is undetectable |
| 10--25% | Moderate | Run the cascade; decision depends on $n$ and censoring |
| $> 25\%$ | High ($> 80\%$) | Heterogeneous Weibull --- shape differences are reliably detected |

**The aging test is almost always decisive.** When the true common shape is far
from 1, even moderate sample sizes ($n \geq 200$) overwhelmingly reject the
exponential model. The choice between exponential and Weibull is rarely
ambiguous.

**The heterogeneity test is the crux.** Power depends strongly on how different
the component shapes are. With $n = 500$ and $m = 5$, the test has good power
for CV $\geq 25\%$ but is nearly blind to CV $< 10\%$. Practitioners should
consider:

- Increasing sample size if moderate heterogeneity is suspected
- Reducing censoring to provide more information per observation
- Using AIC/BIC as complementary criteria --- BIC tends to favor simpler models
  while AIC is more sensitive to heterogeneity
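
The differing penalties explain this tendency. For the heterogeneous vs
homogeneous comparison with $m = 5$, the extra $m - 1 = 4$ parameters cost
$2 \times 4 = 8$ deviance units under AIC but $\log(n) \times 4$ under BIC
--- roughly three times as much at $n = 500$:

```{r penalty-compare}
m <- 5
n <- 500
extra <- m - 1    # extra parameters in the heterogeneous model

c(aic_penalty = 2 * extra,        # 8
  bic_penalty = log(n) * extra)   # about 24.9
```

So the heterogeneous fit must improve the log-likelihood by about 12.4 units
(half the BIC penalty) before BIC prefers it, versus only 4 for AIC.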

For extensive sensitivity analysis varying $n$, $m$, $p$, and censoring
schemes, see the companion paper on model selection for masked series systems.

```{r cleanup, include=FALSE}
options(old_opts)
```
