---
title: "Chapter 14: Hierarchical Generalized Linear Models"
author: "Kjell Nygren"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: REFERENCES.bib
reference-section-title: References
vignette: >
  %\VignetteIndexEntry{Chapter 14: Hierarchical Generalized Linear Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(glmbayes)
library(coda)
```

# Chapter 14: Hierarchical Generalized Linear Models

Chapter 13 focused on **hierarchical linear models**: the Eight Schools example, with normally distributed group-level effects and known sampling variances. This chapter extends that framework to **hierarchical generalized linear models** [@GelmanHill2007; @Gelman2013], specifically Poisson regression with observation-level random effects.

We use the **BikeSharing** dataset and **`rglmb`** to implement a two-block Gibbs sampler. The Poisson likelihood is non-conjugate with the normal prior on the linear predictor, so each observation-level update requires an accept--reject step [@RobertCasella2004] via **`rglmb`**. For a full demo, see **`demo("Ex_09_BikeSharingPoisson")`**.

## 1. Model Structure

Let $y_i$ denote the count (hourly bike rentals) for observation $i$, and $x_i$ the covariate vector. The hierarchical model is:

1. **Observation level:** $y_i \sim \text{Poisson}(\exp(\theta_i))$, with $\theta_i$ a latent log-rate.
2. **Population level:** $\theta_i \sim N(x_i^\top \beta, \sigma_\theta^2)$, so covariates predict the random effects.

The two-block Gibbs sampler alternates between two updates, whose full conditionals are written out after the list:

1. **Block 1 (population):** Draw $(\beta, \sigma_\theta^2) \mid \theta$ — conjugate Normal-Gamma given $\theta$, implemented with **`rglmb`** (family `gaussian()`).
2. **Block 2 (observations):** For each $i$, draw $\theta_i \mid \beta, \sigma_\theta^2, y_i$ — Poisson likelihood with normal prior, implemented with **`rglmb`** (family `poisson()`).
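
Concretely, the two full conditionals are

$$
p(\beta, \sigma_\theta^2 \mid \theta) \propto \left[ \prod_{i=1}^{n} N(\theta_i \mid x_i^\top \beta, \sigma_\theta^2) \right] p(\beta, \sigma_\theta^2),
\qquad
p(\theta_i \mid \beta, \sigma_\theta^2, y_i) \propto \exp\!\left(y_i \theta_i - e^{\theta_i}\right) \exp\!\left(-\frac{(\theta_i - x_i^\top \beta)^2}{2\sigma_\theta^2}\right).
$$

The first conditional is a standard conjugate Gaussian-regression update for $(\beta, \sigma_\theta^2)$ with $\theta$ as the response; the second combines a Poisson log-likelihood kernel with a normal prior and has no closed form, which is exactly where the accept--reject step in **`rglmb`** enters.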

With $n$ observations, Block 2 requires $n$ calls to **`rglmb`** per iteration. To keep run time manageable for the vignette, we use a 1% random subset of the data for training and reserve the rest for out-of-sample prediction. The chain used in Sections 4–5 is **precomputed** with the same settings as **`demo("Ex_09_BikeSharingPoisson")`** (`n_burn = 200`, `n_sim = 1000`) and stored as **`inst/extdata/BikeSharing_ch14_gibbs.rds`**. Section 3 still shows the full Gibbs code for reference; it is not re-run when this vignette is built.

## 2. Data and Setup

```{r bikesharing-setup}
data("BikeSharing")

# Center continuous predictors
cont_vars <- c("temp", "atemp", "hum", "windspeed", "hr_sin", "hr_cos", "mon_sin", "mon_cos")
BikeSharing_c <- BikeSharing
BikeSharing_c[cont_vars] <- scale(BikeSharing[cont_vars], center = TRUE, scale = FALSE)

# Full model formula (defined for reference; the reduced model `form2` below is what we fit)
form <- cnt ~ part_of_day + quarter + holiday + workingday + weathersit +
  hr_sin + hr_cos + mon_sin + mon_cos + temp + atemp + hum + windspeed

# Reduced model formula (used throughout the vignette)
form2 <- cnt ~ part_of_day + quarter + holiday + workingday + weathersit +
  hr_sin + hr_cos + mon_sin + mon_cos

# Train/test split: indices bundled with precomputed Gibbs output (matches demo, set.seed(42))
pct_train <- 0.01
n <- nrow(BikeSharing_c)
ch14_path <- system.file("extdata", "BikeSharing_ch14_gibbs.rds", package = "glmbayes")
stopifnot(nzchar(ch14_path), file.exists(ch14_path))
ch14_saved <- readRDS(ch14_path)
stopifnot(length(ch14_saved$idx_train) == round(pct_train * n))
idx_train <- ch14_saved$idx_train
idx_test  <- setdiff(seq_len(n), idx_train)

Bike_train <- BikeSharing_c[idx_train, ]
Bike_test  <- BikeSharing_c[idx_test, ]

X_train <- model.matrix(form2, data = Bike_train)
X_test  <- model.matrix(form2, data = Bike_test)
y_train <- Bike_train$cnt
y_test  <- Bike_test$cnt
n_train <- length(y_train)
n_test  <- length(y_test)
p       <- ncol(X_train)

# Initial theta and prior for population block
theta <- log(y_train + 0.5)
data_pop <- data.frame(theta = theta, Bike_train)
form_pop <- theta ~ part_of_day + quarter + holiday + workingday + weathersit +
  hr_sin + hr_cos + mon_sin + mon_cos
ps_pop <- Prior_Setup(form_pop, family = gaussian(), data = data_pop)
```
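
A quick check of the resulting dimensions; `n_train` is also the number of Block 2 **`rglmb`** calls per Gibbs iteration:

```{r bikesharing-dims}
c(n = n, n_train = n_train, n_test = n_test, p = p)
```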

## 3. Two-Block Gibbs Sampler

The following chunk is **not evaluated** when the vignette is built (`eval = FALSE`). For a live run, execute it in your session or use **`demo("Ex_09_BikeSharingPoisson")`** (expect a long runtime).

```{r bikesharing-gibbs, eval = FALSE}
n_burn <- 200
n_sim  <- 1000

# Storage for post-burn-in draws
beta_out   <- matrix(0, nrow = n_sim, ncol = p)
sigma_out  <- numeric(n_sim)  # sigma_theta draws, stored on the SD scale
theta_out  <- matrix(0, nrow = n_sim, ncol = n_train)

set.seed(123)

# Burn-in: run both blocks without storing draws
burn_time <- system.time({
  for (k in seq_len(n_burn)) {
    # Block 1: draw (beta, sigma_theta^2) | theta -- conjugate Normal-Gamma
    out_pop <- rglmb(1, theta, X_train, family = gaussian(),
      pfamily = dNormal_Gamma(ps_pop$mu, Sigma_0 = ps_pop$Sigma_0,
        ps_pop$shape, ps_pop$rate))
    beta   <- as.vector(out_pop$coefficients[1, ])
    sigma_theta_sq <- out_pop$dispersion[1]
    mu_all <- as.vector(X_train %*% beta)
    # Block 2: draw each theta_i | beta, sigma_theta^2, y_i -- accept-reject
    for (i in seq_len(n_train)) {
      theta[i] <- rglmb(1, y_train[i], matrix(1, 1, 1), family = poisson(),
        pfamily = dNormal(mu = mu_all[i], Sigma = sigma_theta_sq))$coefficients[1, 1]
    }
  }
})

burn_time

# Main simulation: the same two blocks, now storing the draws
sim_time <- system.time({
  for (k in seq_len(n_sim)) {
    # Block 1: draw (beta, sigma_theta^2) | theta -- conjugate Normal-Gamma
    out_pop <- rglmb(1, theta, X_train, family = gaussian(),
      pfamily = dNormal_Gamma(ps_pop$mu, Sigma_0 = ps_pop$Sigma_0,
        ps_pop$shape, ps_pop$rate))
    beta   <- as.vector(out_pop$coefficients[1, ])
    sigma_theta_sq <- out_pop$dispersion[1]
    mu_all <- as.vector(X_train %*% beta)
    # Block 2: draw each theta_i | beta, sigma_theta^2, y_i -- accept-reject
    for (i in seq_len(n_train)) {
      theta[i] <- rglmb(1, y_train[i], matrix(1, 1, 1), family = poisson(),
        pfamily = dNormal(mu = mu_all[i], Sigma = sigma_theta_sq))$coefficients[1, 1]
    }
    beta_out[k, ]  <- beta
    sigma_out[k]   <- sqrt(sigma_theta_sq)  # store on the SD scale
    theta_out[k, ] <- theta
  }
})

sim_time
```
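
If you do run the sampler yourself, bundle the results so the rest of the vignette can read them back. A minimal sketch (not evaluated; the field names match what Sections 2 and 4 read from the `.rds` file, and the `mcmc_main` construction assumes the column layout described in Section 4):

```{r bikesharing-save, eval = FALSE}
# Assemble the coda object (coefficient columns named after X_train, plus sigma_theta)
mcmc_main <- coda::mcmc(cbind(beta_out, sigma_theta = sigma_out))
colnames(mcmc_main) <- c(colnames(X_train), "sigma_theta")

# Save everything the vignette reads back from inst/extdata
saveRDS(
  list(idx_train = idx_train, beta_out = beta_out, sigma_out = sigma_out,
       n_burn = n_burn, n_sim = n_sim, mcmc_main = mcmc_main),
  file = "inst/extdata/BikeSharing_ch14_gibbs.rds"
)
```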

Now load the precomputed draws shipped with the package (generated with the same settings via **`demo("Ex_09_BikeSharingPoisson")`** and saved as sketched above).

```{r bikesharing-gibbs-loaded}
beta_out  <- ch14_saved$beta_out
sigma_out <- ch14_saved$sigma_out
n_burn    <- ch14_saved$n_burn
n_sim     <- ch14_saved$n_sim
mcmc_main <- ch14_saved$mcmc_main
stopifnot(nrow(beta_out) == n_sim, length(sigma_out) == n_sim, ncol(beta_out) == p)
```

## 4. CODA Diagnostics

We summarize the main parameters ($\beta$ and $\sigma_\theta$) using the **coda** package [@PlummerEtAl2006]; the random effects $\theta$ are excluded from the summary. The `mcmc_main` object comes from the saved chain: its columns are the coefficient names of `X_train` plus `sigma_theta`.

```{r bikesharing-coda}
summary(mcmc_main)

es <- coda::effectiveSize(mcmc_main)
knitr::kable(
  data.frame(parameter = names(es), effective_size = as.numeric(es)),
  row.names = FALSE,
  digits = 4,
  caption = "Effective sample size (coda::effectiveSize)"
)

# autocorr() returns a lags x nvar x nvar array; the diagonal of the lag-1
# slice holds each parameter's autocorrelation with its own lagged values
ac1 <- coda::autocorr(mcmc_main, lags = 1)
ac1_mat <- drop(ac1)
own_ac1 <- diag(ac1_mat)
names(own_ac1) <- colnames(mcmc_main)
knitr::kable(
  data.frame(parameter = names(own_ac1), lag_1_autocorr = as.numeric(own_ac1)),
  row.names = FALSE,
  digits = 4,
  caption = "Lag-1 autocorrelation (diagonal of coda::autocorr, lag = 1)"
)
```
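
Trace plots give a quick visual check on mixing. Here, for the first coefficient and $\sigma_\theta$ (the last column of `mcmc_main`), indexed by position to avoid relying on column names:

```{r bikesharing-trace, fig.width = 7, fig.height = 4}
# Trace plots for the first coefficient and sigma_theta (last column)
coda::traceplot(mcmc_main[, c(1, ncol(mcmc_main))])
```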

## 5. Out-of-Sample Prediction

We compare two prediction strategies on the held-out test set:

- **Option A (conditional):** $\hat{y}_i = \exp(x_i^\top \bar{\beta})$, where $\bar{\beta}$ is the posterior mean. No random-effect uncertainty.
- **Option B (posterior predictive):** Draw $\theta_{\text{test}} \sim N(X_{\text{test}} \beta^{(s)}, \sigma^{(s)2})$, then $y \sim \text{Poisson}(\exp(\theta_{\text{test}}))$ for each posterior sample $s$; report the mean over samples.

Under the model, $E[y_i \mid \beta, \sigma_\theta^2] = \exp(x_i^\top \beta + \sigma_\theta^2/2)$ (the lognormal mean), so Option A systematically understates the predictive mean. Option B propagates both the random-effect variance and the posterior uncertainty in $(\beta, \sigma_\theta)$.

```{r bikesharing-pred}
beta_mean <- colMeans(beta_out)
sigma_mean <- mean(sigma_out)

# Option A: conditional mean exp(x' beta-bar), ignoring random-effect variance
y_pred_cond <- drop(exp(X_test %*% beta_mean))
mae_cond  <- mean(abs(y_test - y_pred_cond))
rmse_cond <- sqrt(mean((y_test - y_pred_cond)^2))

# Option B: posterior predictive mean over resampled posterior draws
set.seed(101)  # keep the predictive resampling reproducible across builds
n_pred <- 500
y_pred_samples <- matrix(0, nrow = n_pred, ncol = n_test)
for (s in seq_len(n_pred)) {
  idx_s <- sample(n_sim, 1)  # pick a posterior draw at random
  beta_s  <- beta_out[idx_s, ]
  sigma_s <- sigma_out[idx_s]
  theta_test <- rnorm(n_test, mean = X_test %*% beta_s, sd = sigma_s)
  y_pred_samples[s, ] <- rpois(n_test, lambda = exp(theta_test))
}
y_pred_mean <- colMeans(y_pred_samples)
mae_pp  <- mean(abs(y_test - y_pred_mean))
rmse_pp <- sqrt(mean((y_test - y_pred_mean)^2))

cat("Option A (conditional): MAE =", round(mae_cond, 2), " RMSE =", round(rmse_cond, 2), "\n")
cat("Option B (post. pred.): MAE =", round(mae_pp, 2), " RMSE =", round(rmse_pp, 2), "\n")
```
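
Option B also yields a full predictive distribution per observation, not just a point prediction. As an illustration (the 90% level is an arbitrary choice), per-observation predictive intervals and their empirical coverage on the test set:

```{r bikesharing-pred-intervals}
# 90% posterior predictive intervals from the sampled predictive draws
pi_lo <- apply(y_pred_samples, 2, quantile, probs = 0.05)
pi_hi <- apply(y_pred_samples, 2, quantile, probs = 0.95)
coverage <- mean(y_test >= pi_lo & y_test <= pi_hi)
cat("Empirical coverage of 90% predictive intervals:", round(coverage, 3), "\n")
```

With only 500 draws per observation the interval endpoints are somewhat noisy; increase `n_pred` for smoother quantiles.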

## 6. Summary

| Component | Implementation |
|-----------|----------------|
| **Block 1 (population)** | `rglmb(1, theta, X, family = gaussian(), pfamily = dNormal_Gamma(...))` |
| **Block 2 (observations)** | For each $i$: `rglmb(1, y[i], matrix(1,1,1), family = poisson(), pfamily = dNormal(mu_i, sigma^2))` |
| **Prior for $\beta, \sigma^2$** | From `Prior_Setup` on $\theta \sim X\beta$ (Gaussian) |
| **Data** | BikeSharing (1% train, 99% test for vignette) |
| **Vignette chain** | Precomputed `BikeSharing_ch14_gibbs.rds` (200 burn-in, 1000 stored iterations) |

For longer runs and full diagnostics, see **`demo("Ex_09_BikeSharingPoisson")`**. For hierarchical linear models, see Chapter 13.
