---
title: "mmbcv: Bias-corrected sandwich variance for clustered multistate Cox models"
output:
  rmarkdown::html_vignette:
    css: vignette.css
bibliography: references.bib
link-citations: true
vignette: >
  %\VignetteIndexEntry{mmbcv: Bias-corrected sandwich variance for clustered multistate Cox models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  message  = FALSE,
  warning  = FALSE
)
has_survival <- requireNamespace("survival", quietly = TRUE)
```

## Overview

The **mmbcv** package computes cluster-robust and bias-corrected sandwich variance estimators for multi-state Cox models fit in counting-process form. Wang, Turner, and Li [@wang2023] developed a class of bias corrections for clustered marginal Cox models; this package extends their approach to multi-state settings.

The main user-facing function is **`MMBCV()`**, which takes

- a fitted multistate Cox model object (`fit`),
- the original counting-process dataset (`data`),
- the start and stop time variables,
- the cluster identifier,
- the subject identifier,
- the event/state identifier,

and returns a collection of robust and bias-corrected variance estimators.

For background on fitting multistate Cox models using `survival::coxph()` with
list-formulas, see the survival vignette *Multi-state models and competing risks*:

- `vignette("compete", package = "survival")`
- https://CRAN.R-project.org/package=survival

## Statistical background

Suppose the data arise from a clustered multi-state survival process, and let
\(\beta\) denote the full regression coefficient vector. Let \(U_i(\hat\beta)\)
denote the cluster-level score contribution for cluster \(i\), and let \(V_m\)
denote the model-based variance estimator. Following Wang et al. [@wang2023],
the baseline robust sandwich variance estimator is

\[
\widehat{\mathrm{Var}}_{\mathrm{robust}}(\hat\beta)
=
V_m
\left(
\sum_{i=1}^n
U_i(\hat\beta) U_i(\hat\beta)^\top
\right)
V_m.
\]

In **mmbcv**, this estimator is returned as `robust`. When the number of clusters
is small, the usual robust sandwich variance may underestimate the true variance,
which motivates finite-sample bias corrections.
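To make the sandwich product concrete, here is a minimal base-R sketch; `Vm` and the cluster scores `U` below are hypothetical stand-ins, not quantities extracted from a fitted model:

```{r}
# Hypothetical model-based variance Vm (p x p) and cluster score vectors U_i.
p  <- 2
Vm <- matrix(c(0.04, 0.01, 0.01, 0.05), p, p)
U  <- list(c(0.3, -0.1), c(-0.2, 0.4), c(0.1, 0.2))

meat   <- Reduce(`+`, lapply(U, tcrossprod))  # sum_i U_i U_i^T
robust <- Vm %*% meat %*% Vm                  # sandwich: bread, meat, bread
robust
```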

Following Wang et al. [@wang2023], a convenient way to write many of these estimators is

\[
\widehat{\mathrm{Var}}(\hat\beta)
=
V_m
\left(
\sum_{i=1}^n
C_i \, U_i^\star(\hat\beta)\,
U_i^\star(\hat\beta)^\top C_i^\top
\right)
V_m,
\]

where \(U_i^\star(\hat\beta)\) is either the original cluster score \(U_i(\hat\beta)\)
or a corrected score based on martingale residuals (MR), and \(C_i\) is a cluster-specific correction matrix.

### MR correction

The martingale-residual correction replaces the original cluster score \(U_i(\hat\beta)\)
with a bias-corrected version, denoted here by \(U_i^{\mathrm{MR}}(\hat\beta)\). The resulting variance estimator is

\[
\widehat{\mathrm{Var}}_{\mathrm{MR}}(\hat\beta)
=
V_m
\left(
\sum_{i=1}^n
U_i^{\mathrm{MR}}(\hat\beta)
U_i^{\mathrm{MR}}(\hat\beta)^\top
\right)
V_m.
\]

This is returned by **mmbcv** as `varMR`.

### KC, FG, and MD corrections

Wang et al. [@wang2023] summarize three multiplicative small-sample corrections using the same
general sandwich form with different choices of \(C_i\). Throughout, \(\Omega_i\) denotes the
contribution of cluster \(i\) to the observed information matrix, so that \(\Omega_i V_m\) acts
as a cluster-level leverage.

For the **Kauermann-Carroll (KC)** correction [@kauermann2001; @wang2023],

\[
C^{KC}_i = \left(I_p - \Omega_i V_m \right)^{-1/2},
\]

and the corresponding estimator is

\[
\widehat{\mathrm{Var}}_{\mathrm{KC}}(\hat\beta)
=
V_m
\left(
\sum_{i=1}^n
C^{KC}_i U_i(\hat\beta) U_i(\hat\beta)^\top C^{KC \top}_i
\right)
V_m.
\]

This is returned as `varKC`.

For the **Fay-Graubard (FG)** correction [@fay2001; @wang2023],

\[
C^{FG}_i
=
\mathrm{diag}
\left\{
\left(1 - \min\{\xi, [\Omega_i V_m]_{jj}\}\right)^{-1/2}
\right\},
\]

where \(\xi=0.75\) is a truncation constant. The resulting
estimator is

\[
\widehat{\mathrm{Var}}_{\mathrm{FG}}(\hat\beta)
=
V_m
\left(
\sum_{i=1}^n
C^{FG}_i U_i(\hat\beta) U_i(\hat\beta)^\top C^{FG \top}_i
\right)
V_m.
\]

This is returned as `varFG`.

For the **Mancl-DeRouen (MD)** correction [@mancl2001; @wang2023],

\[
C^{MD}_i = \left(I_p - \Omega_i V_m \right)^{-1},
\]

and

\[
\widehat{\mathrm{Var}}_{\mathrm{MD}}(\hat\beta)
=
V_m
\left(
\sum_{i=1}^n
C^{MD}_i U_i(\hat\beta) U_i(\hat\beta)^\top C^{MD \top}_i
\right)
V_m.
\]

This is returned as `varMD`.
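As a concrete illustration, the MD and FG correction matrices for a single cluster can be formed as below. `Omega` and `Vm` are hypothetical stand-ins; the KC correction additionally requires a matrix inverse square root and is omitted from this sketch. This is not mmbcv's internal code:

```{r}
# Hypothetical cluster information contribution Omega_i and variance Vm.
p     <- 2
Vm    <- matrix(c(0.04, 0.01, 0.01, 0.05), p, p)
Omega <- matrix(c(2.0, 0.5, 0.5, 1.5), p, p)
H     <- Omega %*% Vm                          # plays the role of Omega_i V_m

C_MD <- solve(diag(p) - H)                     # MD: (I - Omega_i V_m)^{-1}
xi   <- 0.75                                   # FG truncation constant
C_FG <- diag((1 - pmin(xi, diag(H)))^(-1/2))   # FG: diagonal correction
```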

### MBN correction

The **Morel-Bokossa-Neerchal (MBN)** correction [@morel2003; @wang2023] is written
as a linear combination of the robust sandwich variance and the model-based variance:

\[
\widehat{\mathrm{Var}}_{\mathrm{MBN}}(\hat\beta)
=
c_1 \widehat{\mathrm{Var}}_{\mathrm{robust}}(\hat\beta)
+
c_2 \hat{\phi} V_m,
\]

where \(c_1\), \(c_2\), and \(\hat{\phi}\) are defined as in Wang et al.
[@wang2023]. This is returned as `varMBN`.
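Structurally, the MBN estimator is just a weighted combination of two matrices. In the sketch below, `c1`, `c2`, and `phi` are placeholder values only, since their actual definitions are given in Wang et al. [@wang2023]:

```{r}
# Placeholder scalars and matrices; only the combination itself is shown.
c1 <- 1.1; c2 <- 0.05; phi <- 1.2
Vrobust <- matrix(c(0.0020, 0.0004, 0.0004, 0.0030), 2, 2)
Vm      <- matrix(c(0.0400, 0.0100, 0.0100, 0.0500), 2, 2)

VMBN <- c1 * Vrobust + c2 * phi * Vm
VMBN
```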

### Hybrid MR estimators

Wang et al. [@wang2023] further propose hybrid estimators that combine the MR correction with
KC, FG, MD, and MBN. These are obtained by replacing \(U_i(\hat\beta)\)
with \(U_i^{\mathrm{MR}}(\hat\beta)\) in the KC, FG, and MD formulas, and by
replacing the robust variance with the MR variance in the MBN formula.

The resulting estimators are:

\[
\widehat{\mathrm{Var}}_{\mathrm{KCMR}}(\hat\beta)
=
V_m
\left(
\sum_{i=1}^n
C^{KC}_i U_i^{\mathrm{MR}}(\hat\beta)
U_i^{\mathrm{MR}}(\hat\beta)^\top C_i^{KC \top}
\right)
V_m,
\]

\[
\widehat{\mathrm{Var}}_{\mathrm{FGMR}}(\hat\beta)
=
V_m
\left(
\sum_{i=1}^n
C^{FG}_i U_i^{\mathrm{MR}}(\hat\beta)
U_i^{\mathrm{MR}}(\hat\beta)^\top C_i^{FG \top}
\right)
V_m,
\]

\[
\widehat{\mathrm{Var}}_{\mathrm{MDMR}}(\hat\beta)
=
V_m
\left(
\sum_{i=1}^n
C^{MD}_i U_i^{\mathrm{MR}}(\hat\beta)
U_i^{\mathrm{MR}}(\hat\beta)^\top C_i^{MD \top}
\right)
V_m,
\]

and

\[
\widehat{\mathrm{Var}}_{\mathrm{MBNMR}}(\hat\beta)
=
c_1 \widehat{\mathrm{Var}}_{\mathrm{MR}}(\hat\beta)
+
c_2 \hat{\phi}_{\mathrm{MR}} V_m.
\]

These are returned as `varKCMR`, `varFGMR`, `varMDMR`, and `varMBNMR`,
respectively.

In practice, all outputs returned by `MMBCV()` are variance-covariance matrices for
\(\hat\beta\). Standard errors from any estimator \(V\) can be obtained via

\[
\mathrm{SE}(\hat\beta_j) = \sqrt{V_{jj}}.
\]
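As a self-contained illustration (with a made-up coefficient vector and variance matrix, not output from `MMBCV()`), standard errors and 95% Wald intervals follow directly:

```{r}
# Hypothetical estimates and variance-covariance matrix for two coefficients.
beta_hat <- c(Z = 0.50, X = -0.20)
V <- matrix(c(0.040, 0.008, 0.008, 0.025), 2, 2,
            dimnames = list(names(beta_hat), names(beta_hat)))

se    <- sqrt(diag(V))
lower <- beta_hat - qnorm(0.975) * se
upper <- beta_hat + qnorm(0.975) * se
cbind(estimate = beta_hat, se = se, lower = lower, upper = upper)
```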


## Example data

This package ships with a small simulated clustered multistate dataset `msdat3`.

```{r}
library(mmbcv)

data("msdat3")
dim(msdat3)
length(unique(msdat3$id))
length(unique(msdat3$clus_id))
table(msdat3$event)
```

## Variables

`msdat3` is in long (counting-process) format with one row per subject-interval:

-   **`id`**: subject id
-   **`clus_id`**: cluster id
-   **`Z`**: cluster-level treatment assignment (0/1)
-   **`X`**: individual-level baseline covariate
-   **`from`, `to`**: start/end states for the interval
-   **`Tstart`, `Tstop`**: interval start/stop times
-   **`event`**: end-state at `Tstop` (e.g., `censor`, `S1`, `S2`, `S3`, `D`)

The state labels are:

-   **`(s0)`**: initial state
-   **`S1`**: first intermediate state
-   **`S2`**: second intermediate state
-   **`S3`**: third intermediate state
-   **`D`**: absorbing state
-   **`censor`**: right-censoring at the end of the interval

A quick look at the observed transitions:

```{r}
with(msdat3, table(from, to))
```

## Fit a multistate Cox model (survival)

The **`MMBCV()`** function uses a fitted model object as input. In many workflows, this model is fit with `survival::coxph()` using list-formulas.

```{r, eval = has_survival}
library(survival)

fit <- coxph(
  list(
    Surv(Tstart, Tstop, event) ~ 1,
    state("(s0)"):state("S1") + state("S1"):state("S2") + state("S2"):state("S3") ~
      Z + X,
    state("(s0)"):state("D") + state("S1"):state("D") + state("S2"):state("D") +
      state("S3"):state("D") ~ (Z + X)/common
  ),
  data = msdat3,
  id = id,
  ties = "breslow",
  timefix = FALSE
)

fit
```

If the **`survival`** package is not installed, install it separately in an interactive R session using
`install.packages("survival")`.



## Compute variance estimators with **`MMBCV`**

Once you have fit the multi-state Cox model with the original dataset, compute the variance estimators with **`MMBCV()`**:

```{r, eval = has_survival}
out <- MMBCV(
  fit, msdat3,
  StartTime = Tstart,
  StopTime  = Tstop,
  ClusterID = clus_id,
  SubjectID = id,
  Event     = event,
  tie       = "breslow",
  details   = FALSE
)

names(out)
```

## Interpreting the output

`MMBCV()` returns a list of variance–covariance matrices for the fitted regression
coefficients. Standard errors are obtained from any such matrix `V` as `sqrt(diag(V))`,
and these in turn yield Wald tests and confidence intervals. The output includes:

- `robust`: standard cluster-robust sandwich variance estimator 
- `varMR`: martingale-residual (MR) bias-corrected variance estimator 
- `varMD`: Mancl-DeRouen (MD) bias-corrected variance estimator 
- `varMDMR`: hybrid MD + MR bias-corrected variance estimator 
- `varKC`: Kauermann-Carroll (KC) bias-corrected variance estimator 
- `varKCMR`: hybrid KC + MR bias-corrected variance estimator 
- `varFG`: Fay-Graubard (FG) bias-corrected variance estimator 
- `varFGMR`: hybrid FG + MR bias-corrected variance estimator 
- `varMBN`: Morel-Bokossa-Neerchal (MBN) bias-corrected variance estimator 
- `varMBNMR`: hybrid MBN + MR bias-corrected variance estimator


Standard errors can be computed from any returned variance-covariance matrix with `sqrt(diag(...))`.

```{r, eval = has_survival}
Vlist <- out[c("robust","varMR","varMD","varMDMR","varFG","varFGMR","varKC","varKCMR","varMBN","varMBNMR")]
SE <- sapply(Vlist, function(V) sqrt(diag(V)))
SE
```

## Efron ties

If you fit the multi-state model in `coxph()` with Efron tie handling
(`ties = "efron"`), use Efron ties in the variance calculation as well: refit the
model with `ties = "efron"` and pass `tie = "efron"` to `MMBCV()`:

```{r, eval = has_survival}
fit_efron <- update(fit, ties = "efron")  # refit the model with Efron ties

out_efron <- MMBCV(
  fit_efron, msdat3,
  StartTime = Tstart,
  StopTime  = Tstop,
  ClusterID = clus_id,
  SubjectID = id,
  Event     = event,
  tie       = "efron",
  details   = FALSE
)

sqrt(diag(out_efron$robust))
```

## Notes

-   **`MMBCV()`** assumes the fit object contains at least `cmap`, `rmap`, `states`, and `coefficients` (as in multistate fits produced by survival workflows).

-   The data passed to **`MMBCV()`** must match the data used to fit the model in `coxph()` (the same counting-process variables, ids, and state labels).

## Practical remarks

The robust estimator is the natural baseline. The bias-corrected estimators are most useful when the number of clusters is not large.

Wang et al. [@wang2023] report that, in their simulations for clustered time-to-event outcomes, the MD estimator performed well when cluster-size variation was modest, while KCMR performed best under larger cluster-size variation. They also note that MR alone could be unstable in some scenarios. For that reason, it is useful to compare `robust`, `varMD`, and `varKCMR` in practice.

## References
