---
title: "Model Comparison"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Model Comparison}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  dpi = 150
)

# Use ragg for better font rendering if available
if (requireNamespace("ragg", quietly = TRUE)) {
  knitr::opts_chunk$set(dev = "ragg_png")
}

old_opts <- options(width = 180)
```

Model selection is a fundamental problem in applied statistics. Given a set of candidate models, it is important to determine which specification best balances goodness-of-fit against parsimony. Traditional approaches rely on information criteria (AIC, BIC), discrimination metrics (C-statistic), and hypothesis tests for nested models.

The `compfit()` function synthesizes these metrics into a weighted Composite Model Score (CMS) to facilitate systematic comparison between models. Like other functions in this package, it follows the standard `summata` input paradigm, with an additional parameter to allow for multiple models:

```{r, eval = FALSE}
compfit(data, outcome, model_list, model_type, ...)
```

where `data` is the dataset, `outcome` is the common endpoint variable, `model_list` is a named list of predictor vectors (each defining one candidate model), and `model_type` is the type of model to fit. This vignette demonstrates the function's capabilities using the included sample dataset.

---

# Preliminaries

The examples in this vignette use the `clintrial` dataset included with `summata`:

```{r setup}
library(summata)

data(clintrial)
data(clintrial_labels)
```

---

# Quality Metrics and the Composite Model Score

The `compfit()` function evaluates models using several quality metrics, then combines them into a single Composite Model Score (CMS) ranging from 0 to 100 for rapid comparison. The metrics available depend on the model type.

## Available Metrics

| Metric | Used by | Interpretation | Better |
|:-------|:--------|:---------------|:-------|
| AIC | All | Information criterion balancing fit and complexity | Lower |
| BIC | Fallback | Information criterion with stronger complexity penalty | Lower |
| Concordance | GLM, Cox, GLMER, Cox ME | Discrimination ability (0.5 = random, 1.0 = perfect) | Higher |
| Pseudo-*R*² | GLM, Cox ME | Proportion of variation explained (McFadden's) | Higher |
| *R*² | LM | Proportion of variance explained | Higher |
| Brier Score | GLM | Prediction accuracy for binary outcomes (0 = perfect) | Lower |
| Global *p* | Cox | Omnibus likelihood ratio test *p*-value | Lower |
| RMSE | LM | Root mean squared error of residuals | Lower |
| Marginal *R*² | LMER, GLMER | Variance explained by fixed effects alone | Higher |
| Conditional *R*² | LMER | Variance explained by fixed + random effects | Higher |
| ICC | LMER, GLMER, Cox ME | Intraclass correlation; proportion of variance from clustering | Moderate |
| Convergence | All | Whether the model converged successfully | Yes |
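
As a concrete illustration, two of the GLM metrics can be reproduced by hand for a simple logistic model. The following is a sketch, assuming `surgery` is coded 0/1; the package's internal computations may differ in detail:

```{r, eval = FALSE}
fit <- glm(surgery ~ age + sex, data = clintrial, family = binomial)

# McFadden's pseudo-R^2: 1 - logLik(model) / logLik(intercept-only model)
null_fit <- update(fit, . ~ 1)
1 - as.numeric(logLik(fit)) / as.numeric(logLik(null_fit))

# Brier score: mean squared difference between the predicted probability
# and the observed 0/1 outcome (0 = perfect)
mean((fitted(fit) - clintrial$surgery)^2)
```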

## CMS Interpretation

| Range | Interpretation |
|:------|:---------------|
| 85–100 | Excellent |
| 75–84 | Very Good |
| 65–74 | Good |
| 55–64 | Fair |
| < 55 | Poor |

## Default CMS Weights by Model Type

The score components and their default weights vary by model type.

| Component | LM | GLM | Cox | LMER | GLMER | Cox ME |
|:----------|:--:|:---:|:---:|:----:|:-----:|:------:|
| Convergence | 15% | 15% | 15% | 20% | 15% | 15% |
| AIC | 25% | 25% | 30% | 25% | 25% | 30% |
| *R*² | 45% | | | | | |
| Pseudo-*R*² | | 15% | | | | 10% |
| Concordance | | 40% | 40% | | 30% | 35% |
| Brier Score | | 5% | | | | |
| Global *p* | | | 15% | | | |
| RMSE | 15% | | | | | |
| Marginal *R*² | | | | 25% | 15% | |
| Conditional *R*² | | | | 15% | | |
| ICC | | | | 15% | 15% | 10% |
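
The CMS is then a weighted sum of these components. The sketch below illustrates the arithmetic with the default GLM weights; the sub-scores are hypothetical values on the 0–100 scale, and the rescaling of raw metrics to sub-scores (performed internally by `compfit()`) is assumed rather than shown:

```{r, eval = FALSE}
# Default GLM weights from the table above
weights <- c(convergence = 0.15, aic = 0.25, pseudo_r2 = 0.15,
             concordance = 0.40, brier = 0.05)

# Hypothetical 0-100 sub-scores for one model
sub_scores <- c(convergence = 100, aic = 70, pseudo_r2 = 55,
                concordance = 82, brier = 90)

sum(weights * sub_scores)  # 78.05 -- "Very Good" on the scale above
```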

---

# Basic Usage

## **Example 1:** Nested Model Comparison

Compare models with increasing complexity:

```{r}
example1 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Demographics" = c("age", "sex"),
    "Plus Stage" = c("age", "sex", "stage", "ecog"),
    "Full Model" = c("age", "sex", "stage", "ecog", "treatment", "smoking")
  ),
  model_type = "glm"
)

example1
```

## **Example 2:** Linear Regression Models

For continuous outcomes, use `model_type = "lm"`:

```{r}
example2 <- compfit(
  data = clintrial,
  outcome = "los_days",
  model_list = list(
    "Simple" = c("age", "sex"),
    "Disease" = c("age", "sex", "stage", "ecog"),
    "Treatment" = c("age", "sex", "stage", "ecog", "surgery", "treatment")
  ),
  model_type = "lm"
)

example2
```

## **Example 3:** Cox Regression Models

For time-to-event outcomes, use `model_type = "coxph"`:

```{r}
example3 <- compfit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  model_list = list(
    "Unadjusted" = c("treatment"),
    "Demographics" = c("treatment", "age", "sex"),
    "Full" = c("treatment", "age", "sex", "stage", "ecog")
  ),
  model_type = "coxph"
)

example3
```

## **Example 4:** Count Models

For count outcomes, use `model_type = "glm"` with a count model family (e.g., `family = "poisson"`):

```{r}
example4 <- compfit(
  data = clintrial,
  outcome = "fu_count",
  model_list = list(
    "Minimal" = c("age", "treatment"),
    "Clinical" = c("age", "treatment", "stage", "ecog"),
    "Full" = c("age", "treatment", "stage", "ecog", "surgery", "diabetes")
  ),
  model_type = "glm",
  family = "poisson",
  labels = clintrial_labels
)

example4
```

---

# Interaction Testing

A key application of `compfit()` is testing whether interaction terms improve model fit.

## **Example 5:** Single Interaction

Compare a main-effects model to one with an interaction term:

```{r}
example5 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Main Effects" = c("age", "treatment", "sex"),
    "With Interaction" = c("age", "treatment", "sex")
  ),
  interactions_list = list(
    NULL,
    c("sex:treatment")
  ),
  model_type = "glm"
)

example5
```
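
Because the two specifications are nested, the CMS comparison can be cross-checked with a formal likelihood ratio test on the stored fits. A sketch, assuming the stored objects are standard `glm` fits (see Example 8):

```{r, eval = FALSE}
models5 <- attr(example5, "models")
anova(models5[["Main Effects"]], models5[["With Interaction"]], test = "LRT")
```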

## **Example 6:** Multiple Interactions

Compare different interaction hypotheses:

```{r}
example6 <- compfit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  model_list = list(
    "Main Effects" = c("age", "treatment", "sex", "stage"),
    "Age × Treatment" = c("age", "treatment", "sex", "stage"),
    "Sex × Treatment" = c("age", "treatment", "sex", "stage"),
    "Both" = c("age", "treatment", "sex", "stage")
  ),
  interactions_list = list(
    NULL,
    c("age:treatment"),
    c("sex:treatment"),
    c("age:treatment", "sex:treatment")
  ),
  model_type = "coxph"
)

example6
```

---

# Accessing Detailed Results

## **Example 7:** Coefficient Comparison

Setting `include_coefficients = TRUE` generates a table comparing coefficients across models:

```{r}
example7 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Model A" = c("age", "sex"),
    "Model B" = c("age", "sex", "stage"),
    "Model C" = c("age", "sex", "stage", "treatment")
  ),
  model_type = "glm",
  include_coefficients = TRUE,
  labels = clintrial_labels
)

# Main comparison
example7

# Coefficient table
coef_table <- attr(example7, "coefficients")
coef_table
```

## **Example 8:** Fitted Model Objects

The underlying model objects are stored as attributes for further analysis:

```{r}
models <- attr(example7, "models")
names(models)

# Examine a specific model
summary(models[["Model C"]])
```
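
Because the full fitted objects are available, standard generics can be applied across all candidates at once. A sketch, again assuming standard `glm` fits:

```{r, eval = FALSE}
# AIC for every candidate model
sapply(models, AIC)
```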

## **Example 9:** Recommended Model

The best-performing model is identified automatically based on the Composite Model Score (CMS):

```{r}
example9 <- compfit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  model_list = list(
    "Minimal" = c("treatment"),
    "Standard" = c("treatment", "age", "sex", "stage"),
    "Extended" = c("treatment", "age", "sex", "stage", "ecog", "grade")
  ),
  model_type = "coxph"
)

recommended <- attr(example9, "best_model")
cat("Recommended model:", recommended, "\n")
```

---

# Custom CMS Weights

## **Example 10:** Custom Weights

Modify the CMS scoring weights to emphasize specific metrics; the weights below sum to 1, as the defaults in the table above do:

```{r}
example10 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Simple" = c("age", "sex"),
    "Standard" = c("age", "sex", "stage"),
    "Full" = c("age", "sex", "stage", "treatment", "ecog")
  ),
  model_type = "glm",
  scoring_weights = list(
    convergence = 0.05,
    aic = 0.20,
    concordance = 0.60,
    pseudo_r2 = 0.10,
    brier = 0.05
  )
)

example10
```

---

# Application Scenarios

## **Scenario 1:** Confounder Assessment

Evaluate the impact of covariate adjustment on effect estimates:

```{r}
scenario1 <- compfit(
  data = clintrial,
  outcome = "Surv(os_months, os_status)",
  model_list = list(
    "Crude" = c("treatment"),
    "Age-Sex Adjusted" = c("treatment", "age", "sex"),
    "Fully Adjusted" = c("treatment", "age", "sex", "stage", "ecog")
  ),
  model_type = "coxph",
  include_coefficients = TRUE,
  labels = clintrial_labels
)

scenario1

# Compare treatment effect across models
attr(scenario1, "coefficients")
```

## **Scenario 2:** Variable Selection Validation

Compare automated versus theory-driven selection:

```{r}
# Identify candidates via screening
screening <- uniscreen(
  data = clintrial,
  outcome = "surgery",
  predictors = c("age", "sex", "bmi", "smoking", "diabetes",
                 "hypertension", "stage", "ecog", "treatment"),
  model_type = "glm",
  p_threshold = 0.10
)

# Extract significant predictors
sig_vars <- attr(screening, "significant")

scenario2 <- compfit(
  data = clintrial,
  outcome = "surgery",
  model_list = list(
    "Theory-Driven" = c("age", "sex", "stage", "treatment"),
    "Data-Driven" = sig_vars,
    "Combined" = unique(c("age", "sex", "stage", "treatment", sig_vars))
  ),
  model_type = "glm"
)

scenario2
```

## **Scenario 3:** Parsimony Assessment

Test whether additional predictors meaningfully improve fit:

```{r}
scenario3 <- compfit(
  data = clintrial,
  outcome = "los_days",
  model_list = list(
    "3 Predictors" = c("age", "surgery", "ecog"),
    "5 Predictors" = c("age", "surgery", "ecog", "stage", "treatment"),
    "8 Predictors" = c("age", "surgery", "ecog", "stage", "treatment",
                       "sex", "smoking", "diabetes")
  ),
  model_type = "lm",
  labels = clintrial_labels
)

scenario3
```

---

# Exporting Results

Comparison tables can be exported to various formats:

```{r, eval = FALSE}
# Main comparison table
table2docx(
  table = example1,
  file = file.path(tempdir(), "Model_Comparison.docx"),
  caption = "Table 3. Model Comparison Results"
)

# Coefficient table
table2docx(
  table = attr(example7, "coefficients"),
  file = file.path(tempdir(), "Coefficient_Comparison.docx"),
  caption = "Table S1. Coefficient Estimates Across Models"
)

# PDF with landscape orientation for wide tables
table2pdf(
  table = example1,
  file = file.path(tempdir(), "Model_Comparison.pdf"),
  caption = "Model Comparison",
  orientation = "landscape"
)
```

---

# Best Practices

## When to Compare Models

1. **Covariate selection**: Determine appropriate adjustment level
2. **Interaction testing**: Evaluate effect modification
3. **Parsimony**: Balance complexity against fit
4. **Sensitivity analysis**: Compare different specifications

## Interpreting Results

1. **Multiple metrics**: The Composite Model Score (CMS) provides a summary, but examine individual metrics
2. **Practical significance**: Statistical improvement may not translate to meaningful differences
3. **Sample size**: Complex models require larger samples
4. **Convergence**: Non-converged models should be interpreted cautiously

## Limitations

1. The Composite Model Score (CMS) is a heuristic, not a formal statistical test
2. Comparisons assume models are fit on identical observations (a quick check is sketched after this list)
3. Information criteria are most meaningful for nested models
4. Small score differences may not be practically significant
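
When predictors differ in missingness, the identical-observations assumption in point 2 can fail silently. A quick check against the stored fits (a sketch, assuming the fitted objects respond to `stats::nobs()`, as `lm` and `glm` objects do):

```{r, eval = FALSE}
# All candidate models should report the same number of observations
sapply(attr(example7, "models"), nobs)
```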

---

# Common Issues

## Non-Convergence

Check convergence status and consider simplifying non-converging models:

```{r, eval = FALSE}
# Check convergence status (`comparison` denotes any compfit() result)
comparison[, .(Model, Converged)]
```

When encountering non-converging models, consider reducing complexity, examining predictor correlations, and checking for perfect separation (primarily in logistic models).

## Differing Sample Sizes

Models with missing data may have different sample sizes, which complicates comparison:

```{r, eval = FALSE}
# Check sample sizes
comparison[, .(Model, N, Events)]

# Use complete cases for fair comparison (`relevant_vars` is a placeholder
# for the character vector of all columns used by any candidate model)
complete_data <- na.omit(data[, relevant_vars, with = FALSE])
```

## Interpreting Close Scores

When scores are similar, prefer parsimony and consider domain knowledge:

```{r, eval = FALSE}
# Examine individual metrics
comparison[, .(Model, `Composite Model Score (CMS)`, AIC, Concordance)]
```

```{r, include = FALSE}
options(old_opts)
```

---

# Further Reading

- [Descriptive Tables](descriptive_tables.html): `desctable()` for baseline characteristics
- [Regression Modeling](regression_modeling.html): `fit()`, `uniscreen()`, and `fullfit()`
- [Table Export](table_export.html): Export to PDF, Word, and other formats
- [Forest Plots](forest_plots.html): Visualization of regression results
- [Multivariate Regression](multivariate_regression.html): `multifit()` for multi-outcome analysis
- [Advanced Workflows](advanced_workflows.html): Interactions and mixed-effects models
