---
title: "Chapter 13: Hierarchical Linear Models"
author: "Kjell Nygren"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: REFERENCES.bib
reference-section-title: References
vignette: >
  %\VignetteIndexEntry{Chapter 13: Hierarchical Linear Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(glmbayes)
```

# Chapter 13: Hierarchical Linear Models

This chapter focuses on **hierarchical linear models**—Gaussian models with random effects—and related use cases that rely on **`rlmb`** for block Gibbs sampling. The function **`rlmb`** is a minimal Bayesian simulation engine for linear (Gaussian) models: it takes a response vector, design matrix, and prior specification, and returns draws from the posterior. It is intended for settings where you need **repeated draws with updated conditioning** inside a Gibbs sampler (e.g., alternating between blocks for coefficients and dispersion, or for population and unit-level parameters).

- **Use `lmb` or `glmb`** when you want a single fit or repeated standalone fits: formula interface, full object, `summary`, `predict`, etc.
- **Use `rlmb`** when you are **building your own MCMC** for linear models, such as a two-block Gibbs sampler: you alternate between blocks, each time calling `rlmb` with the current values of the other block. The low-overhead, matrix-in/matrix-out interface is then a better fit. For hierarchical **generalized** linear models (e.g., Poisson), use **`rglmb`**—see Chapter 14.

We illustrate two use cases for **`rlmb`** in the context of **linear (Gaussian)** models: (1) estimating **dispersion and regression coefficients** when the prior is non-conjugate, and (2) **hierarchical (random effects)** models. All examples use data already available in the package.

## 1. Use Case 1: Dispersion and Regression Coefficients (Gaussian)

In the Gaussian linear model \(y = X\beta + \varepsilon\) with \(\varepsilon \sim \text{N}(0, \phi I)\), the likelihood depends on the residual variance \(\phi\). If the prior on \((\beta, \phi)\) is not conjugate (e.g., you want independent priors on \(\beta\) and \(\phi\)), you can still sample from the posterior by a **two-block Gibbs sampler**:

1. **Block 1:** Draw \(\beta \mid \phi, y\) — Gaussian posterior given fixed \(\phi\).
2. **Block 2:** Draw \(\phi \mid \beta, y\) — Gamma posterior given fixed \(\beta\).

**`rlmb`** implements both conditionals: use **`dNormal(..., dispersion = phi)`** for the \(\beta\)-step and **`dGamma(shape, rate, beta = beta_current)`** for the \(\phi\)-step; then take the returned **`dispersion`** as the new \(\phi\).
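To see why both blocks are direct draws, consider an independent prior \(\beta \sim \text{N}(\mu_0, V_0)\) and a Gamma prior on the precision \(1/\phi \sim \text{Gamma}(a, b)\) (a common setup; the exact `dGamma` parameterization in the package may differ). The two full conditionals are then the standard results

\[
\beta \mid \phi, y \sim \text{N}\!\left(M\left(V_0^{-1}\mu_0 + \phi^{-1}X^\top y\right),\; M\right),
\qquad M = \left(V_0^{-1} + \phi^{-1}X^\top X\right)^{-1},
\]
\[
\phi^{-1} \mid \beta, y \sim \text{Gamma}\!\left(a + \tfrac{n}{2},\; b + \tfrac{1}{2}(y - X\beta)^\top(y - X\beta)\right),
\]

so each block samples from a known distribution and no Metropolis step is needed.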

We use the plant weight data from [@Dobson1990] (Page 9): weight by treatment group (Ctl vs Trt). The code below sets the prior from **`Prior_Setup`**, then runs a short burn-in and stores draws from the Gibbs sampler. Posterior means are compared to the conjugate **`lmb(..., dIndependent_Normal_Gamma(...))`** fit.

```{r dobson-setup}
## Dobson (1990) "An Introduction to Generalized Linear Models", Page 9: Plant Weight Data
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)

ps <- Prior_Setup(weight ~ group)
x    <- ps$x          # design matrix
y    <- ps$y          # response vector
mu   <- ps$mu         # prior mean for beta
V    <- ps$Sigma      # prior covariance for beta
shape <- ps$shape     # gamma prior shape
rate  <- ps$rate      # gamma prior rate
rate_dg <- if (!is.null(ps$rate_gamma)) ps$rate_gamma else rate  # gamma-specific rate if supplied
disp_ML <- ps$dispersion  # ML dispersion estimate (starting value for phi)
```

```{r dobson-gibbs, eval = TRUE}
set.seed(180)
dispersion_current <- disp_ML

## Burn-in
n_burn  <- 1000
for (i in seq_len(n_burn)) {
  out_beta <- rlmb(n = 1, y = y, x = x, pfamily = dNormal(mu = mu, Sigma = V, dispersion = dispersion_current))
  out_phi  <- rlmb(n = 1, y = y, x = x, pfamily = dGamma(shape = shape, rate = rate_dg, beta = out_beta$coefficients[1, ]))
  dispersion_current <- out_phi$dispersion
}

## Sampling
n_sim   <- 1000
beta_out <- matrix(0, nrow = n_sim, ncol = 2)
disp_out <- numeric(n_sim)
for (i in seq_len(n_sim)) {
  out_beta <- rlmb(n = 1, y = y, x = x, pfamily = dNormal(mu = mu, Sigma = V, dispersion = dispersion_current))
  out_phi  <- rlmb(n = 1, y = y, x = x, pfamily = dGamma(shape = shape, rate = rate_dg, beta = out_beta$coefficients[1, ]))
  dispersion_current    <- out_phi$dispersion
  beta_out[i, ]         <- out_beta$coefficients[1, ]
  disp_out[i]           <- out_phi$dispersion
}
```

```{r dobson-compare}
lmb_D9 <- lmb(weight ~ group, dIndependent_Normal_Gamma(mu, V, shape = ps$shape_ING, rate = ps$rate))
print(lmb_D9)

## Compare Gibbs vs lmb: means and SDs
coef_names <- colnames(ps$x)
tbl <- data.frame(
  Parameter  = c(coef_names, "dispersion"),
  Gibbs_mean = c(colMeans(beta_out), mean(disp_out)),
  Gibbs_SD   = c(apply(beta_out, 2, sd), sd(disp_out)),
  lmb_mean   = c(colMeans(lmb_D9$coefficients), mean(lmb_D9$dispersion)),
  lmb_SD     = c(apply(lmb_D9$coefficients, 2, sd), sd(lmb_D9$dispersion))
)
knitr::kable(tbl, digits = 4, caption = "Dobson plant weight: Gibbs sampler vs conjugate lmb fit")
```

The Gibbs posterior means and standard deviations for \(\beta\) and \(\phi\) should be close to the conjugate **`lmb`** fit. In practice, use longer burn-in and more stored draws.
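To make the conditional structure concrete independently of the package, the same two-block scheme can be sketched in plain NumPy. This is an illustrative re-implementation with assumed weak priors (\(\beta \sim \text{N}(0, 100 I)\), \(1/\phi \sim \text{Gamma}(0.01, 0.01)\)), not the `rlmb` internals:

```python
import numpy as np

rng = np.random.default_rng(180)

# Dobson (1990) plant weight data, as in the chunk above
ctl = np.array([4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14])
trt = np.array([4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69])
y = np.concatenate([ctl, trt])
n = y.size
X = np.column_stack([np.ones(n), np.r_[np.zeros(10), np.ones(10)]])

# Assumed weak independent priors: beta ~ N(mu0, V0), 1/phi ~ Gamma(a, b)
mu0 = np.zeros(2)
V0_inv = np.eye(2) / 100.0
a, b = 0.01, 0.01

XtX, Xty = X.T @ X, X.T @ y
tau = 1.0                      # precision 1/phi, arbitrary starting value
draws_beta, draws_phi = [], []
for it in range(3000):
    # Block 1: beta | phi is Gaussian
    M = np.linalg.inv(V0_inv + tau * XtX)
    m = M @ (V0_inv @ mu0 + tau * Xty)
    beta = rng.multivariate_normal(m, M)
    # Block 2: 1/phi | beta is Gamma(a + n/2, rate = b + RSS/2)
    rss = np.sum((y - X @ beta) ** 2)
    tau = rng.gamma(a + n / 2.0, 1.0 / (b + rss / 2.0))  # numpy gamma takes scale
    if it >= 1000:             # discard burn-in
        draws_beta.append(beta)
        draws_phi.append(1.0 / tau)

beta_mean = np.mean(draws_beta, axis=0)
phi_mean = float(np.mean(draws_phi))
# With these weak priors the posterior means land near the OLS fit
```

Because the priors here are nearly flat, the posterior means are close to the classical least-squares estimates for the intercept and treatment effect.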

**Gamma regression.** The same idea—alternating draws of coefficients given dispersion and dispersion given coefficients—can be applied to **gamma regression** when the dispersion (shape) of the Gamma response is unknown and must be estimated. The conditioning steps are then implemented with **`rglmb`** (family `Gamma`) and an appropriate prior for the dispersion block. The package does not currently provide a ready-made example for gamma regression in this setting; typical textbook datasets often imply large weights that make the Gibbs steps costly. A future vignette or example may use an alternate dataset once a suitable one is identified.


## 2. Use Case 2: Hierarchical (Random Effects) Models

### 2.1 Eight Schools with dNormal_Gamma Prior (Gibbs-suitable)

The **Eight Schools** example from [@Gelman2013, Ch. 5] has \(J = 8\) schools; each school \(j\) has an observed effect estimate \(y_j\) with known sampling variance \(\sigma_j^2\). The hierarchical model has a population mean \(\mu\) and variance \(\sigma_\theta^2\), and school-specific effects \(\theta_j\). With the **`dNormal_Gamma`** prior on \((\mu, \sigma_\theta^2)\), the Gibbs sampler is efficient and suitable for repeated use inside a loop because the envelope is built once per call without the minimization overhead of **`dIndependent_Normal_Gamma`**.

1. **Population block:** Draw \((\mu, \sigma_\theta^2) \mid \theta\) using **`rlmb`** with **`dNormal_Gamma(mu_mu, sigma_mu/disp_ML, shape, rate)`**.
2. **School block:** For each \(j\), draw \(\theta_j \mid \mu, \sigma_\theta^2, y_j\) using **`rlmb`** with **`dNormal(mu, Sigma = sigma_theta^2, dispersion = sigma_j^2)`**.
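In model terms, \(y_j \mid \theta_j \sim \text{N}(\theta_j, \sigma_j^2)\) and \(\theta_j \mid \mu, \sigma_\theta^2 \sim \text{N}(\mu, \sigma_\theta^2)\), so the school-block conditional is the precision-weighted normal

\[
\theta_j \mid \mu, \sigma_\theta^2, y_j \;\sim\;
\text{N}\!\left(\frac{y_j/\sigma_j^2 + \mu/\sigma_\theta^2}{1/\sigma_j^2 + 1/\sigma_\theta^2},\;
\frac{1}{1/\sigma_j^2 + 1/\sigma_\theta^2}\right),
\]

which is the conditional that the school-block `dNormal(...)` call targets.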

```{r schools-data}
school  <- c("A", "B", "C", "D", "E", "F", "G", "H")
estimate <- c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16)
sd_obs  <- c(14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6)
J       <- length(school)
sigma_y_sq <- sd_obs^2   # known sampling variances for each school

mu_mu     <- mean(estimate)
sigma_mu  <- var(estimate)
n_prior   <- 0.5
disp_ML   <- var(estimate)
shape     <- n_prior / 2
rate      <- disp_ML * shape
```

The **`dNormal_Gamma`** Gibbs chunk below is **not evaluated** when the vignette is
built (`eval = FALSE`).
Stored draws are loaded in **`chapter13-load-schools-gibbs`** from
**`inst/extdata/Chapter13_Eight_Schools_two_gibbs.rds`**, produced by
**`data-raw/make_Chapter13_Eight_Schools_gibbs_output.R`**.

```{r chapter13-load-schools-gibbs}
ch13_path <- system.file(
  "extdata", "Chapter13_Eight_Schools_two_gibbs.rds",
  package = "glmbayes"
)
stopifnot(nzchar(ch13_path), file.exists(ch13_path))
ch13 <- readRDS(ch13_path)
ng <- ch13$normal_gamma
ind <- ch13$indep_norm_gamma
stopifnot(
  ncol(ng$theta_out) == J,
  nrow(ng$theta_out) == ng$n_sim,
  ncol(ind$theta_out) == J,
  nrow(ind$theta_out) == ind$n_sim
)
theta_out_ng <- ng$theta_out
mu_out_ng <- ng$mu_out
sigma_theta_out_ng <- ng$sigma_theta_out
n_burn_ng <- ng$n_burn
n_sim_ng <- ng$n_sim
theta_out <- ind$theta_out
mu_out <- ind$mu_out
sigma_theta_out <- ind$sigma_theta_out
iters_out1 <- ind$iters_out1
iters_out2 <- ind$iters_out2
n_burn_schools <- ind$n_burn
n_sim_schools <- ind$n_sim
```

```{r schools-ng-gibbs, eval = FALSE}
set.seed(101)
x_one <- matrix(1, nrow = J, ncol = 1)  # J x 1 intercept-only design
theta_ng <- estimate

n_burn_ng <- 1000
n_sim_ng  <- 1000
theta_out_ng <- matrix(0, nrow = n_sim_ng, ncol = J)
mu_out_ng    <- numeric(n_sim_ng)
sigma_theta_out_ng <- numeric(n_sim_ng)

for (k in seq_len(n_burn_ng)) {
  out_pop <- rlmb(1, y = theta_ng, x = x_one,
                  pfamily = dNormal_Gamma(mu_mu, sigma_mu / disp_ML, shape = shape, rate = rate))
  mu_theta     <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta_ng[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                        pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
}

for (k in seq_len(n_sim_ng)) {
  out_pop <- rlmb(1, y = theta_ng, x = x_one,
                  pfamily = dNormal_Gamma(mu_mu, sigma_mu / disp_ML, shape = shape, rate = rate))
  mu_theta     <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta_ng[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                        pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  theta_out_ng[k, ] <- theta_ng
  mu_out_ng[k]      <- mu_theta
  sigma_theta_out_ng[k] <- sqrt(sigma_theta_sq)
}
```

```{r schools-ng-summary}
colMeans(theta_out_ng)           # posterior means of the school effects
mean(mu_out_ng)                  # posterior mean of the population mean
mean(sigma_theta_out_ng)         # posterior mean of the population SD
sqrt(diag(var(theta_out_ng)))    # posterior SDs of the school effects
```

The **`dNormal_Gamma`** prior is well-suited for Gibbs sampling: each call avoids the RSS/UB2 minimization used by **`dIndependent_Normal_Gamma`**, so run time stays manageable.
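As a package-free illustration of the same two blocks, here is a NumPy sketch with a conjugate Normal--Gamma population prior. The hyperparameters mirror the setup chunk above, but the prior precision factor `k0` and the exact parameterization are assumptions, since `dNormal_Gamma` is package-specific:

```python
import numpy as np

rng = np.random.default_rng(101)

# Eight Schools estimates and known sampling SDs, as in the data chunk above
y = np.array([28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16])
s2 = np.array([14.9, 10.2, 16.3, 11.0, 9.4, 11.4, 10.4, 17.6]) ** 2
J = y.size

# Assumed Normal-Gamma population prior (k0 is an illustrative choice):
# tau_theta ~ Gamma(a0, rate = b0), mu | tau_theta ~ N(m0, 1 / (k0 * tau_theta))
m0, k0 = y.mean(), 1.0
a0 = 0.25
b0 = a0 * y.var(ddof=1)

theta = y.copy()
draws_mu, draws_theta = [], []
for it in range(3000):
    # Population block: conjugate Normal-Gamma update given current theta
    tbar = theta.mean()
    kn = k0 + J
    mn = (k0 * m0 + J * tbar) / kn
    an = a0 + J / 2.0
    bn = (b0 + 0.5 * np.sum((theta - tbar) ** 2)
          + k0 * J * (tbar - m0) ** 2 / (2.0 * kn))
    tau_theta = rng.gamma(an, 1.0 / bn)        # numpy gamma takes scale = 1/rate
    mu = rng.normal(mn, 1.0 / np.sqrt(kn * tau_theta))
    # School block: precision-weighted normal draw for each theta_j
    prec = 1.0 / s2 + tau_theta
    theta = rng.normal((y / s2 + mu * tau_theta) / prec, 1.0 / np.sqrt(prec))
    if it >= 1000:                             # discard burn-in
        draws_mu.append(mu)
        draws_theta.append(theta)

mu_mean = float(np.mean(draws_mu))
theta_mean = np.mean(draws_theta, axis=0)
# theta_mean exhibits shrinkage: its spread is narrower than that of y
```

Because the population block is fully conjugate here, both blocks are direct draws, which is exactly the property that makes the `dNormal_Gamma` prior attractive inside a long Gibbs run.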


### 2.2 Eight Schools with dIndependent_Normal_Gamma Prior

The same hierarchical model can be fit with **`dIndependent_Normal_Gamma`**, which
assumes prior independence between coefficients and precision.
This prior triggers repeated envelope minimization (RSS and UB2) on each
population-block draw, making it slower and less suitable for long Gibbs runs.

The Gibbs chunks below are **not evaluated** at vignette build time (`eval = FALSE`);
results for this block are loaded from the same RDS as in §2.1
(**`chapter13-load-schools-gibbs`**).

1. **Population block:** Draw \((\mu, \sigma_\theta^2) \mid \theta\) using **`rlmb`** with **`dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape, rate)`**.
2. **School block:** Same as above—draw \(\theta_j \mid \mu, \sigma_\theta^2, y_j\) using **`dNormal(...)`**.

```{r schools-indep-setup, eval = FALSE}
theta <- estimate
n_burn_schools <- 1000
n_sim_schools  <- 1000
theta_out <- matrix(0, nrow = n_sim_schools, ncol = J)
mu_out    <- numeric(n_sim_schools)
sigma_theta_out <- numeric(n_sim_schools)
iters_out1 <- numeric(n_burn_schools)
iters_out2 <- numeric(n_sim_schools)
```

```{r schools-burnin, eval = FALSE}
set.seed(102)
x_one <- matrix(1, nrow = J, ncol = 1)  # J x 1 intercept-only design

for (k in seq_len(n_burn_schools)) {
  out_pop <- rlmb(1, y = theta, x = x_one,
                  pfamily = dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape = shape, rate = rate))
  mu_theta     <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                     pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  iters_out1[k] <- out_pop$iters
}
```

```{r schools-iters-burnin, eval = TRUE}
## Mean draws per acceptance (population block) after burn-in
## Lower = better. In demo Ex_07_Schools.R this was ~6.01; compare to check for regression.
mean(iters_out1)
```

```{r schools-gibbs, eval = FALSE}
for (k in seq_len(n_sim_schools)) {
  out_pop <- rlmb(1, y = theta, x = x_one,
                  pfamily = dIndependent_Normal_Gamma(mu_mu, sigma_mu, shape = shape, rate = rate))
  mu_theta     <- out_pop$coefficients[1, 1]
  sigma_theta_sq <- out_pop$dispersion
  for (j in seq_len(J)) {
    theta[j] <- rlmb(1, y = estimate[j], x = as.matrix(1),
                     pfamily = dNormal(mu_theta, Sigma = sigma_theta_sq, dispersion = sigma_y_sq[j]))$coefficients[1, 1]
  }
  theta_out[k, ] <- theta
  mu_out[k]      <- mu_theta
  sigma_theta_out[k] <- sqrt(sigma_theta_sq)
  iters_out2[k]  <- out_pop$iters
}
```

```{r schools-iters-main, eval = TRUE}
## Mean draws per acceptance (population block) during main run
mean(iters_out2)
```

```{r schools-summary, eval = TRUE}
colMeans(theta_out)           # posterior means of the school effects
mean(mu_out)                  # posterior mean of the population mean
mean(sigma_theta_out)         # posterior mean of the population SD
sqrt(diag(var(theta_out)))    # posterior SDs of the school effects
```

Posterior means of \(\theta_j\) show shrinkage toward the population mean compared to the raw estimates.
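Conditional on \((\mu, \sigma_\theta^2)\), the amount of shrinkage has a closed form:

\[
\operatorname{E}[\theta_j \mid \mu, \sigma_\theta^2, y_j] = (1 - B_j)\, y_j + B_j\, \mu,
\qquad B_j = \frac{\sigma_j^2}{\sigma_j^2 + \sigma_\theta^2},
\]

so schools with larger sampling variance \(\sigma_j^2\) (here, schools C and H) are pulled more strongly toward the population mean.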

Block Gibbs sampling with **`rlmb`** for the population and school blocks follows the same conditioning logic as in other **glmbayes** linear-model examples; the independent Normal--Gamma population step is related to the joint envelope construction in [@glmbayesIndNormGammaVignette].


## 3. Summary

| Use case | Function | Block 1 | Block 2 | Data used |
|----------|----------|---------|---------|-----------|
| Dispersion + \(\beta\) (Gaussian) | **rlmb** | \(\beta \mid \phi\): `dNormal(..., dispersion = phi)` | \(\phi \mid \beta\): `dGamma(shape, rate, beta = beta)` | Dobson plant weight |
| Hierarchical (Eight Schools, Gibbs-suitable) | **rlmb** | \((\mu, \sigma_\theta^2) \mid \theta\): `dNormal_Gamma(mu, Sigma_0, shape, rate)` | \(\theta_j \mid \mu, \sigma_\theta^2\): `dNormal(..., dispersion = sigma_j^2)` | Eight Schools |
| Hierarchical (Eight Schools) | **rlmb** | \((\mu, \sigma_\theta^2) \mid \theta\): `dIndependent_Normal_Gamma` | \(\theta_j \mid \mu, \sigma_\theta^2\): `dNormal(..., dispersion = sigma_j^2)` | Eight Schools |

The same dispersion-and-coefficients idea extends to **gamma regression** when dispersion must be estimated; a worked example may be added when a suitable dataset is available.

For hierarchical generalized linear models (e.g., Poisson), see Chapter 14. For prior setup and conjugate fits, see **`?Prior_Setup`**, **`?lmb`**, and **`?glmb`**.
