---
title: "Subgroup Analysis with Multivariate Binary Outcomes in Multilevel Data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Subgroup Analysis with Multivariate Binary Outcomes in Multilevel Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

```{r, echo=FALSE}
biblio <- RefManageR::ReadBib("../inst/REFERENCES.bib", check = "error")
```

## Introduction

When data are clustered (e.g., students within schools, patients within hospitals), observations are not independent. Ignoring this clustering can lead to:

- Underestimated standard errors
- Inflated Type I error rates
- Invalid inference
- Under- or overpowered decisions
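
The magnitude of this problem can be gauged with the classic design effect, $\text{DEFF} = 1 + (m - 1)\,\text{ICC}$, where $m$ is the cluster size and ICC the intraclass correlation. A minimal sketch (the cluster size and ICC below are illustrative values, not estimates from any data):

```r
# Design effect: how much the variance of a naive (independence-assuming)
# estimate is understated when observations within clusters are correlated.
m   <- 15    # observations per cluster (illustrative)
icc <- 0.1   # intraclass correlation (illustrative)
deff <- 1 + (m - 1) * icc
deff  # naive variances should be inflated by roughly this factor
```

Even a modest ICC of 0.1 with 15 observations per cluster more than doubles the true sampling variance relative to the naive calculation.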

The `bglmm()` function extends `bglm()` to handle clustered data by incorporating **random effects**.

## When to Use Multilevel Models

Use `bglmm()` when:

- Observations are grouped into clusters (schools, hospitals, families, etc.)
- Cluster variation affects outcomes
- Clustering is theoretically meaningful

Use `bglm()` when:

- Data are not clustered, **and**
- Clustering is not theoretically meaningful

## Example: Educational Intervention Study

We'll analyze a study comparing two teaching methods across 20 schools on reading and math ability, with treatment effects allowed to vary with students' baseline ability.

### Generate Example Data

```{r setup}
library(bmco)
set.seed(2020)

n_schools <- 20 
n_per_school <- 15
n_total <- n_schools * n_per_school

# Generate random intercepts
school_intercepts_math <- rnorm(n_schools, mean = 0, sd = 1.50)
school_intercepts_reading <- rnorm(n_schools, mean = 0, sd = 1.50)

# Cluster randomization: whole schools receive the same method.
school_method <- rep(c("traditional", "new_method"), each = n_schools/2)

study_data <- data.frame(
  school_id = factor(rep(1:n_schools, each = n_per_school)),
  method = rep(school_method, each = n_per_school),
  baseline_ability = rnorm(n_total, mean = 50, sd = 10),
  stringsAsFactors = FALSE
)

# Standardize covariate
study_data$baseline_ability_std <- scale(study_data$baseline_ability)[, 1]

study_data$math_pass <- NA
study_data$reading_pass <- NA

p_math <- p_reading <- rep(NA, nrow(study_data))

for (i in 1:nrow(study_data)) {
  school <- as.numeric(study_data$school_id[i])
  is_new <- ifelse(study_data$method[i] == "new_method", 1, 0)

  p_math[i]    <- plogis(-0.50 + 0.75 * is_new + 0.10 * study_data$baseline_ability_std[i] + 
                           0.20 * is_new * study_data$baseline_ability_std[i] + 
                           school_intercepts_math[school])
  p_reading[i] <- plogis(-0.50 + 0.80 * is_new + 0.05 * study_data$baseline_ability_std[i] + 
                           0.15 * is_new * study_data$baseline_ability_std[i] + 
                           school_intercepts_reading[school])
  
  study_data$math_pass[i]    <- rbinom(1, 1, p_math[i])
  study_data$reading_pass[i] <- rbinom(1, 1, p_reading[i])
}

```

### Fit Multilevel Model

```{r fit_bglmm, results='hide', message=FALSE, warning=FALSE, eval=TRUE}
set.seed(2025)
result <- bglmm(
  data = study_data,
  grp = "method",
  grp_a = "traditional",
  grp_b = "new_method",
  id_var = "school_id",
  x_var = "baseline_ability",
  y_vars = c("math_pass", "reading_pass"),
  x_method = "Empirical",
  x_def = c(-Inf, Inf),
  fixed = c("method", "baseline_ability", "method_baseline_ability"),
  random = c("Intercept"),
  test = "right_sided",
  rule = "All",
  n_burn = 200,
  n_it = 500,
  n_thin = 1
)
```

```{r show_results, eval=TRUE}
print(result)
summary(result)
```

### Interpretation

The posterior probability indicates the probability that the new teaching method improves **both** math and reading outcomes (rule = "All"), accounting for:

1. **Fixed effects**: Overall method effect, baseline ability effect, and their interaction
2. **Random effects**: School-specific deviations from the overall pattern
3. **Clustering**: Within-school correlation of student outcomes
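
One way to think about the conjunctive rule: the joint posterior is reduced to a single probability, the share of posterior draws in which every outcome-specific treatment difference is positive. A minimal sketch on simulated draws (the `delta` matrix here is illustrative, standing in for the posterior draws of the treatment differences, not actual `bglmm()` internals):

```r
# Sketch of the "All" rule: count posterior draws where BOTH treatment
# differences favor the new method. The draws are simulated for illustration.
set.seed(1)
delta <- cbind(math    = rnorm(1000, mean = 0.10, sd = 0.05),
               reading = rnorm(1000, mean = 0.12, sd = 0.06))

p_all <- mean(delta[, "math"] > 0 & delta[, "reading"] > 0)
p_all  # posterior probability under the conjunctive rule
```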

## Specifying Population of Interest

Like `bglm()`, you can define subpopulations:

### Specific Ability Level

```{r value_method, results='hide', message=FALSE, warning=FALSE, eval = FALSE}
set.seed(2027)
result_value <- bglmm(
  # ... other arguments ...
  x_method = "Value",
  x_def = 50  # Average ability
)
```

```{r show_value, eval = FALSE}
cat("Effect for students with baseline ability = 50:\n")
cat("  Treatment difference:", result_value$delta$mean_delta, "\n")
cat("  Posterior probability:", result_value$delta$pop, "\n")
```

### Ability Range (Empirical)

```{r empirical_method, results='hide', message=FALSE, warning=FALSE, eval = FALSE}
set.seed(2028)
result_empirical <- bglmm(
  # ... other arguments ...
  x_method = "Empirical",
  x_def = c(40,60) # Ability range 40-60
)
```

### Ability Range (Analytical)

```{r analytical_method, results='hide', message=FALSE, warning=FALSE, eval = FALSE}
set.seed(2029)
result_analytical <- bglmm(
  # ... other arguments ...
  x_method = "Analytical",
  x_def = c(40, 60)
)
```

## Decision Rules with Multilevel Data

### All Rule (Conjunctive)

Both outcomes must favor the new method:

```{r rule_all, eval = FALSE}
result_all <- bglmm(
  data = study_data,
  # ... other arguments ...
  rule = "All"  # P(new_method better on BOTH outcomes)
)
```

### Any Rule (Disjunctive)

At least one outcome must favor the new method:

```{r rule_any, eval = FALSE}
result_any <- bglmm(
  data = study_data,
  # ... other arguments ...
  rule = "Any"  # P(new_method better on AT LEAST ONE outcome)
)
```

### Compensatory Rule

Weighted combination (e.g., math weighted 60%, reading 40%):

```{r rule_comp, eval = FALSE}
result_comp <- bglmm(
  data = study_data,
  # ... other arguments ...
  rule = "Comp",
  w = c(0.6, 0.4)  # Weights for math and reading
)
```
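
Conceptually, the compensatory rule evaluates a weighted sum of the outcome-specific differences in each posterior draw, so a strong effect on one outcome can offset a weak or slightly negative effect on the other. A sketch on simulated draws (the `delta` matrix is illustrative, not `bglmm()` internals):

```r
# Sketch of the compensatory rule: weight the per-draw treatment differences
# and test the combination against zero. Draws are simulated for illustration.
set.seed(1)
delta <- cbind(math    = rnorm(1000, mean = 0.10, sd = 0.05),
               reading = rnorm(1000, mean = -0.02, sd = 0.06))
w <- c(0.6, 0.4)  # weights for math and reading

delta_comp <- as.vector(delta %*% w)  # weighted combination per draw
p_comp <- mean(delta_comp > 0)        # P(weighted effect > 0)
p_comp
```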

## Specifying Prior Distributions

`bglmm()` uses **weakly informative priors** by default. You can customize these priors for:

- Fixed effects regression coefficients (`b_mu0`, `b_sigma0`)
- Random effects means (`g_mu0`, `g_sigma0`) 
- Random effects covariance (`nu0`, `tau0`) 

### Fixed Effects Priors (bglm and bglmm)

Fixed effects have a **multivariate normal prior**:

$$\beta \sim N(\mu_0, \Sigma_0)$$

#### Default Priors

```{r default_priors, eval=TRUE}
# Default
p_fixed <- 2
b_mu0 <- rep(0, p_fixed)                 # Zero prior mean
b_sigma0 <- solve(diag(1e1, p_fixed))    # Small prior precision (large prior variance on the log-odds scale). 
                                         # Weakly informative. 
                                         # solve() generates precision matrix needed as input
```

Where `p_fixed` is the number of fixed effects. In our example, we use:

- Covariate effect(s)
- Interaction term(s)

#### Custom Fixed Effects Priors

```{r custom_prior_1, results='hide', message=FALSE, warning=FALSE, eval=TRUE}
# Assume treatment improves outcomes (positive coefficient on group)

custom_b_mu0 <- c(0, 0.5)                 # Expect positive group effect 
custom_b_sigma0 <- solve(diag(c(1e1, 5))) # More certainty on group effect
```

### Random Effects Priors

#### Population-Level Random Effects

Random effects means have a **multivariate normal prior**:

$$\gamma \sim N(\mu_g, \Sigma_g)$$

**Default**:
```{r random_default, eval=TRUE}
p_random <- 2
g_mu0 <- rep(0, p_random)               # Prior mean
g_sigma0 <- solve(diag(1e1, p_random))  # Weakly informative prior variance; 
                                        # solve() transforms variance matrix to precision matrix needed as input
```

Where `p_random` is the number of random effects. In our example, we use:

- Intercept
- Group effect

#### Random Effects Covariance

The covariance matrix has an **inverse-Wishart prior**:

$$\Sigma \sim \text{InvWishart}(\nu_0, \mathbf{T}_0)$$

**Default**:
```{r iwishart_default, eval=TRUE}
nu0 <- p_random                # Degrees of freedom (dimension)
tau0 <- diag(1e-1, p_random)   # Scale matrix
```

**Important**:

- `nu0` must be ≥ `p_random`
- `tau0` must be **positive definite** (never use `diag(0, p_random)`)
- Smaller `nu0` = less informative
- Larger `tau0` diagonal values = expect more between-cluster variation
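
These conditions are easy to verify before fitting. The sketch below uses base R's `chol()`, which errors for matrices that are not positive definite, so it catches the forbidden `diag(0, p_random)` choice:

```r
# Sanity-check inverse-Wishart hyperparameters before passing them to bglmm()
p_random <- 2
nu0  <- p_random
tau0 <- diag(1e-1, p_random)

stopifnot(nu0 >= p_random)  # enough degrees of freedom

# chol() succeeds only for positive definite matrices
pd_ok <- !inherits(try(chol(tau0), silent = TRUE), "try-error")
stopifnot(pd_ok)

# The forbidden choice fails the same check:
bad <- inherits(try(chol(diag(0, p_random)), silent = TRUE), "try-error")
bad  # TRUE: diag(0, p_random) is singular, not a valid scale matrix
```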


### Prior Sensitivity Analysis

Compare results under different priors:

```{r prior_sensitivity, results='hide', message=FALSE, warning=FALSE, eval = FALSE}
# -----------------------------------------------------------------
# Three priors that pull in clearly different directions.
# -----------------------------------------------------------------

# 1. Weakly informative (neutral starting point)
#    Diffuse on fixed effects, diffuse IW for random effects.
result_weak <- bglmm(
  # ... other arguments ...
  b_mu0    = rep(0, p_fixed),
  b_sigma0 = solve(diag(10, p_fixed)),
  g_mu0    = rep(0, p_random),
  g_sigma0 = solve(diag(10, p_random)),
  nu0  = p_random,
  tau0 = diag(0.1, p_random)
)

# 2. Skeptical prior
#    Tight variance so the prior resists positive or negative evidence. 
#    Also a small tau0, implying homogeneity between schools.
result_skeptical <- bglmm(
  # ... other arguments ...
  b_mu0    = rep(0, p_fixed),               
  b_sigma0 = solve(diag(c(1, 1), p_fixed)),  # Tight — hard(er) to overcome
  g_mu0    = rep(0, p_random),
  g_sigma0 = solve(diag(c(1, 1), p_random)), # Tight — hard(er) to overcome
  nu0  = p_random + 4,                 # More informative IW
  tau0 = diag(0.05, p_random)          # Expects small school variation
)

```

**Interpretation**: 

- If results are **similar** → data dominate, prior has little influence
- If results **differ substantially** → prior is influential

### Guidelines for Choosing Priors

1. **Default priors**: Use for exploratory analysis or when no prior information exists

2. **Informative priors**: Use when:
   - You have strong theoretical expectations
   - Previous studies provide evidence
   - You want to regularize extreme estimates
   - Sample size is very small

3. **Skeptical priors**: Use for:
   - Conservative hypothesis testing
   - Replication studies (favor null)
   - High-stakes decisions requiring strong evidence

4. **Prior sensitivity**: Check:
  
```{r prior_check, eval = FALSE}
   # Run with default priors
   # Run with informative priors
   # Compare posterior probabilities and effect sizes
   # If similar → robust; if different → prior-dependent
```

### Common Mistakes to Avoid

- **Don't**: Use `tau0 = diag(0, p)` (singular matrix)
- **Do**: Use `tau0 = diag(1e-1, p)` or `diag(0.5, p)`

- **Don't**: Set `nu0 < p` (too few degrees of freedom)
- **Do**: Use `nu0 >= p`

- **Don't**: Use extremely small prior variances without justification
- **Do**: Use weakly informative priors unless you have strong evidence

- **Don't**: Ignore prior sensitivity when sample size is small
- **Do**: Report results under multiple priors for transparency

### Further Reading

For Bayesian logistic regression priors:
- Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. *Annals of Applied Statistics, 2*(4), 1360-1383. \doi{10.1214/08-AOAS191}

For inverse-Wishart priors:
- Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. *Bayesian Analysis, 1*(3), 515-534. \doi{10.1214/06-BA117A}

## MCMC Diagnostics

The function internally checks the multivariate potential scale reduction factor (MPSRF) and warns if it exceeds 1.1.
Additional MCMC diagnostics are returned when `return_diagnostics = TRUE`.
When `return_diagnostic_plots = TRUE`, `bglmm()` returns diagnostic plots as well.

```{r diagnose, results='hide', message=FALSE, warning=FALSE, eval=TRUE}
set.seed(2030)
result_diag <- bglmm(
  data = study_data,
  grp = "method",
  grp_a = "traditional",
  grp_b = "new_method",
  id_var = "school_id",
  x_var = "baseline_ability",
  y_vars = c("math_pass", "reading_pass"),
  x_method = "Value",
  x_def = 50,
  n_burn = 200,
  n_it = 500,
  n_thin = 1,
  start = c(0.5, 1),
  return_diagnostics = TRUE
)
```

Key diagnostics:

- **MPSRF** (Multivariate PSRF): Ideally < 1.1
- **Effective sample size**: Should be > 100 per parameter
- **Rhat**: Ideally < 1.1 for all parameters

```{r show_diag, eval=TRUE}
# Check convergence
cat("Random effects convergence:\n")
print(result_diag$diags$g$convergence)

cat("\nRandom effects covariance convergence:\n")
print(result_diag$diags$tau$convergence)

```
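
The same quantities can also be computed by hand from raw MCMC draws with the widely used `coda` package, which implements the Gelman-Rubin and Brooks-Gelman diagnostics referenced below. This sketch uses simulated, well-mixed chains standing in for real `bglmm()` output, and assumes `coda` is installed:

```r
# Sketch: MPSRF and effective sample size via coda on two simulated chains.
library(coda)
set.seed(1)
make_chain <- function(n = 500) {
  mcmc(matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("b1", "b2"))))
}
chains <- mcmc.list(make_chain(), make_chain())

gelman.diag(chains)$mpsrf   # multivariate PSRF, ideally < 1.1
effectiveSize(chains)       # effective sample size per parameter, ideally > 100
```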

Available plots: 

- **Autocorrelation**: Equals 1 at lag 0 by definition and should drop quickly toward 0 within a few lags.
- **Traceplot**: Chains should fluctuate randomly around a stable mean, with no clear trends, drifts, or long flat stretches; this indicates good mixing.
- **Density**: The posterior density of each parameter (regression coefficients and variance parameters) is preferably smooth, unimodal (unless theory suggests otherwise), and without irregular spikes.

```{r plot_convergence, eval = FALSE}
plot(result_diag)                                    # All
plot(result_diag, type = "trace")                    # Trace only
plot(result_diag, type = "density")                  # Density only

plot(result_diag, which = "fixed")                   # Fixed effects
plot(result_diag, which = "random")                  # Random effects
plot(result_diag, which = "variance")                # Variance components
plot(result_diag, which = "all")                     # All

# Combine type + which
plot(result_diag, type = "trace", which = "all")          # Trace for all parameters
plot(result_diag, type = "density", which = "variance")   # Variance densities
```

If convergence issues occur:

1. Increase `n_burn` (more warmup)
2. Increase `n_it` (more samples)
3. Check for very few clusters 
4. Check for very small cluster sizes 
5. Check range of covariate and consider standardizing
6. Check for problems and warnings in data (perfect separation, low variation, etc.)

If autocorrelation is high, the effective sample size is typically much lower than `n_it`. In that case:

1. Consider thinning
2. Consider increasing `n_it`

For more information on MCMC diagnostics, consult

- Brooks, S. P. & Gelman, A. (1998). General methods for monitoring convergence of iterative simulations. _Journal of Computational and Graphical Statistics, 7_(4), 434–455. \doi{10.1080/10618600.1998.10474787}

- Gelman, A. & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. _Statistical Science, 7_(4), 457–472. \doi{10.1214/ss/1177011136}

## Extracting Posterior Samples

```{r samples, results='hide', message=FALSE, warning=FALSE, eval = FALSE}
set.seed(2031)
result_samples <- bglmm(
  # ... other arguments ...
  return_samples = TRUE
)
```

```{r use_samples, eval = FALSE}
# Treatment effect samples
str(result_samples$samples$delta)

# Compute custom summaries
cat("Treatment effect quantiles:\n")
print(apply(result_samples$samples$delta, 2, quantile, probs = c(0.025, 0.5, 0.975)))

# Probability that effect on math > 0.1
cat("\nP(delta_math > 0.1):", mean(result_samples$samples$delta[, 1] > 0.1), "\n")
```

## Data Requirements

For reliable multilevel modeling:

1. **Number of clusters**: More clusters are better; the function warns when there are very few (J < 5)
   - More clusters → Better random effect estimates

2. **Cluster sizes**: 
   - Very small clusters provide little information about random effects
   - Unbalanced cluster sizes are acceptable

3. **Group balance within clusters**: Ideally, each cluster has both treatment groups
   - If some clusters have only one group, the model can still fit but with a warning

4. **Missing data**: Handled automatically
   - Rows with missing outcomes or covariates are removed per cluster
   - Complete-case analysis within each cluster

## Common Issues and Solutions

### Warning: "Very few clusters (J < 5)"

**Problem**: Too few clusters might hinder reliable random effect estimation

**Solutions**:
- Collect more clusters if possible
- Use `bglm()` instead (ignores clustering), but only if appropriate


### Warning: "MCMC chains may not have converged"

**Problem**: MPSRF > 1.1

**Solutions**:
1. Increase burn-in: `n_burn = 2000` or higher
2. Increase iterations: `n_it = 5000` or higher
3. Reduce thinning: `n_thin = 1` (keeps more samples)
4. Check data quality (outliers, sufficient variation)

### Slow computation

**Problem**: Large number of clusters or long chains

**Solutions**:
- Start with small `n_it` to verify model runs
- Increase `n_thin` to reduce stored samples: `n_thin = 10`
- Run overnight for production analyses
- Typical times: 5-10 minutes for 10 clusters, 10k iterations

## Summary

`bglmm()` extends Bayesian multivariate analysis to clustered data by:

1. **Accounting for clustering** via random effects
2. **Allowing for subgroup effects** via logistic regression
3. **Flexible population definitions** (Value, Empirical, Analytical)
4. **Multivariate decision rules** (All, Any, Compensatory)

Choose your analysis:

- **`bmvb()`**: Full sample, no clustering
- **`bglm()`**: Subgroup analysis, no clustering
- **`bglmm()`**: Subgroup analysis + clustering

All three functions support multivariate binary outcomes and provide posterior probabilities for treatment effects.

## References

For more details on the statistical methodology, or when using the `bglmm()` function, please cite:

`r capture.output(print(biblio["Kavelaars2023"], .opts = list(style = "markdown", bib.style = "authoryear", sorting = "ydnt")))`
