Chapter 14: Hierarchical Generalized Linear Models

Kjell Nygren

2026-04-30

library(glmbayes)
library(coda)

Chapter 13 focused on hierarchical linear models: the Eight Schools example with normally distributed group-level effects and known sampling variances. This chapter extends that framework to hierarchical generalized linear models (Gelman and Hill 2007; Gelman et al. 2013), specifically to Poisson regression with observation-level random effects.

We use the BikeSharing dataset and rglmb to implement a two-block Gibbs sampler. The Poisson likelihood is non-conjugate with the normal prior on the linear predictor, so each observation-level update requires an accept–reject step (Robert and Casella 2004) via rglmb. For a full demo, see demo("Ex_09_BikeSharingPoisson").

1. Model Structure

Let \(y_i\) denote the count (hourly bike rentals) for observation \(i\), and \(x_i\) the covariate vector. The hierarchical model is:

  1. Observation level: \(y_i \sim \text{Poisson}(\exp(\theta_i))\), with \(\theta_i\) a latent log-rate.
  2. Population level: \(\theta_i \sim N(x_i^\top \beta, \sigma_\theta^2)\), so covariates predict the random effects.
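
Multiplying the two levels (together with a prior \(p(\beta, \sigma_\theta^2)\)) gives the joint posterior

\[
p(\beta, \sigma_\theta^2, \theta \mid y) \;\propto\; p(\beta, \sigma_\theta^2) \prod_{i=1}^{n} \text{Poisson}\left(y_i \mid e^{\theta_i}\right) \, N\left(\theta_i \mid x_i^\top \beta, \sigma_\theta^2\right).
\]

Conditioning on \(\theta\) reduces the \((\beta, \sigma_\theta^2)\) update to an ordinary Gaussian regression of \(\theta\) on the covariates; conditioning on \((\beta, \sigma_\theta^2)\) makes the \(\theta_i\) conditionally independent, one small Poisson problem per observation.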

The two-block Gibbs sampler alternates:

  1. Block 1 (population): Draw \((\beta, \sigma_\theta^2) \mid \theta\) — a conjugate Normal-Gamma update given \(\theta\), implemented with rglmb (family gaussian()).
  2. Block 2 (observations): For each \(i\), draw \(\theta_i \mid \beta, \sigma_\theta^2, y_i\) — a Poisson likelihood with a normal prior, implemented with rglmb (family poisson()); the full conditional is shown below.
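
Block 2 is the non-conjugate step. Up to a constant, the full conditional for each latent log-rate is

\[
p(\theta_i \mid \beta, \sigma_\theta^2, y_i) \;\propto\; \exp\left\{ y_i \theta_i - e^{\theta_i} \right\} \, \exp\left\{ -\frac{(\theta_i - x_i^\top \beta)^2}{2 \sigma_\theta^2} \right\},
\]

a Poisson likelihood kernel times a normal prior. This density matches no standard family, which is why each \(\theta_i\) update goes through the accept–reject step inside rglmb.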

With \(n\) observations, Block 2 requires \(n\) calls to rglmb per iteration. To keep run time manageable for the vignette, we use a 1% random subset of the data for training and reserve the rest for out-of-sample prediction. The chain used in Sections 4–5 is precomputed with the same settings as demo("Ex_09_BikeSharingPoisson") (n_burn = 200, n_sim = 1000) and stored as inst/extdata/BikeSharing_ch14_gibbs.rds. Section 3 still shows the full Gibbs code for reference; it is not re-run when this vignette is built.

2. Data and Setup

data("BikeSharing")

# Center continuous predictors
cont_vars <- c("temp", "atemp", "hum", "windspeed", "hr_sin", "hr_cos", "mon_sin", "mon_cos")
BikeSharing_c <- BikeSharing
BikeSharing_c[cont_vars] <- scale(BikeSharing[cont_vars], center = TRUE, scale = FALSE)


# Formula (full model with all variables; the sampler below uses form2)
form <- cnt ~ part_of_day + quarter + holiday + workingday + weathersit +
  hr_sin + hr_cos + mon_sin + mon_cos + temp + atemp + hum + windspeed

# Formula (limited model used by this vignette's sampler)
form2 <- cnt ~ part_of_day + quarter + holiday + workingday + weathersit +
  hr_sin + hr_cos + mon_sin + mon_cos

# Train/test split: indices bundled with precomputed Gibbs output (matches demo, set.seed(42))
pct_train <- 0.01
n <- nrow(BikeSharing_c)
ch14_path <- system.file("extdata", "BikeSharing_ch14_gibbs.rds", package = "glmbayes")
stopifnot(nzchar(ch14_path), file.exists(ch14_path))
ch14_saved <- readRDS(ch14_path)
stopifnot(length(ch14_saved$idx_train) == round(pct_train * n))
idx_train <- ch14_saved$idx_train
idx_test  <- setdiff(seq_len(n), idx_train)

Bike_train <- BikeSharing_c[idx_train, ]
Bike_test  <- BikeSharing_c[idx_test, ]

X_train <- model.matrix(form2, data = Bike_train)
X_test  <- model.matrix(form2, data = Bike_test)
y_train <- Bike_train$cnt
y_test  <- Bike_test$cnt
n_train <- length(y_train)
n_test  <- length(y_test)
p       <- ncol(X_train)

# Initialize theta on the log scale (the 0.5 offset guards against log(0)
# for hours with zero rentals), then set up the prior for the population block
theta <- log(y_train + 0.5)
data_pop <- data.frame(theta = theta, Bike_train)
form_pop <- theta ~ part_of_day + quarter + holiday + workingday + weathersit +
  hr_sin + hr_cos + mon_sin + mon_cos
ps_pop <- Prior_Setup(form_pop, family = gaussian(), data = data_pop)
#> Using default pwt = 0.05 (high-d default).
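
Before handing these pieces to the sampler, it can help to confirm the prior object carries the components the population block consumes. A quick check (assuming, as the $-accesses in Section 3 suggest, that Prior_Setup returns a plain list):

# Inspect the prior components later passed to dNormal_Gamma
str(ps_pop[c("mu", "Sigma_0", "shape", "rate")])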

3. Two-Block Gibbs Sampler

The following chunk is not evaluated when the vignette is built (eval = FALSE). For a live run, execute it in your session or use demo("Ex_09_BikeSharingPoisson") (expect a long runtime).

n_burn <- 200
n_sim  <- 1000

# Storage for the retained draws
beta_out   <- matrix(0, nrow = n_sim, ncol = p)
sigma_out  <- numeric(n_sim)
theta_out  <- matrix(0, nrow = n_sim, ncol = n_train)

set.seed(123)

# Burn-in: run the two-block sweep without storing draws
burn_time <- system.time({
  for (k in seq_len(n_burn)) {
    # Block 1: conjugate Normal-Gamma draw of (beta, sigma_theta^2) given theta
    out_pop <- rglmb(1, theta, X_train, family = gaussian(),
      pfamily = dNormal_Gamma(ps_pop$mu, Sigma_0 = ps_pop$Sigma_0,
        ps_pop$shape, ps_pop$rate))
    beta   <- as.vector(out_pop$coefficients[1, ])
    sigma_theta_sq <- out_pop$dispersion[1]
    mu_all <- as.vector(X_train %*% beta)
    # Block 2: accept-reject draw of each theta_i given (beta, sigma_theta^2)
    for (i in seq_len(n_train)) {
      theta[i] <- rglmb(1, y_train[i], matrix(1, 1, 1), family = poisson(),
        pfamily = dNormal(mu = mu_all[i], Sigma = sigma_theta_sq))$coefficients[1, 1]
    }
  }
})

burn_time

# Main run: same two-block sweep, now storing the draws
sim_time <- system.time({
  for (k in seq_len(n_sim)) {
    # Block 1: population-level draw (as in burn-in)
    out_pop <- rglmb(1, theta, X_train, family = gaussian(),
      pfamily = dNormal_Gamma(ps_pop$mu, Sigma_0 = ps_pop$Sigma_0,
        ps_pop$shape, ps_pop$rate))
    beta   <- as.vector(out_pop$coefficients[1, ])
    sigma_theta_sq <- out_pop$dispersion[1]
    mu_all <- as.vector(X_train %*% beta)
    # Block 2: observation-level draws (as in burn-in)
    for (i in seq_len(n_train)) {
      theta[i] <- rglmb(1, y_train[i], matrix(1, 1, 1), family = poisson(),
        pfamily = dNormal(mu = mu_all[i], Sigma = sigma_theta_sq))$coefficients[1, 1]
    }
    # Store this iteration's draws (sigma on the standard-deviation scale)
    beta_out[k, ]  <- beta
    sigma_out[k]   <- sqrt(sigma_theta_sq)
    theta_out[k, ] <- theta
  }
})


sim_time

Load the precomputed draws. To regenerate them, run demo("Ex_09_BikeSharingPoisson") and save the results with saveRDS() to inst/extdata/BikeSharing_ch14_gibbs.rds, as sketched after the next chunk.

beta_out  <- ch14_saved$beta_out
sigma_out <- ch14_saved$sigma_out
n_burn    <- ch14_saved$n_burn
n_sim     <- ch14_saved$n_sim
mcmc_main <- ch14_saved$mcmc_main
stopifnot(nrow(beta_out) == n_sim, length(sigma_out) == n_sim, ncol(beta_out) == p)
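
For reference, a sketch of how the saved bundle can be regenerated after a live run of Section 3 (the field names mirror the loads above; mcmc_main is assembled as shown in Section 4):

# Not run: bundle the objects this vignette expects and save them
saveRDS(
  list(idx_train = idx_train, beta_out = beta_out, sigma_out = sigma_out,
       n_burn = n_burn, n_sim = n_sim, mcmc_main = mcmc_main),
  file = "inst/extdata/BikeSharing_ch14_gibbs.rds"
)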

4. CODA Diagnostics

We summarize the main parameters (\(\beta\) and \(\sigma_\theta\)) using the coda package (Plummer et al. 2006); the random effects \(\theta\) are excluded from the summary. The mcmc_main object comes from the saved chain, with one column per coefficient of X_train plus sigma_theta.
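
For completeness, a sketch of how a chain like mcmc_main can be assembled from the raw draws (the saved object already contains it, so this is not run here):

# Bind the coefficient draws and sigma_theta, label the columns to match
# X_train, and wrap the matrix as a coda mcmc object
draws <- cbind(beta_out, sigma_out)
colnames(draws) <- c(colnames(X_train), "sigma_theta")
mcmc_main <- coda::mcmc(draws)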

summary(mcmc_main)
#> 
#> Iterations = 1:1000
#> Thinning interval = 1 
#> Number of chains = 1 
#> Sample size per chain = 1000 
#> 
#> 1. Empirical mean and standard deviation for each variable,
#>    plus standard error of the mean:
#> 
#>                          Mean      SD Naive SE Time-series SE
#> (Intercept)           3.51912 0.26805 0.008477       0.008445
#> part_of_dayMorning    2.32069 0.27789 0.008788       0.009502
#> part_of_dayAfternoon  1.79354 0.36705 0.011607       0.012202
#> part_of_dayEvening    1.38643 0.28336 0.008961       0.009674
#> quarterQ2            -0.10774 0.30643 0.009690       0.010874
#> quarterQ3            -0.07867 0.36867 0.011658       0.012734
#> quarterQ4             0.45117 0.26950 0.008522       0.009739
#> holiday              -0.05947 0.40337 0.012756       0.012170
#> workingday           -0.10572 0.11632 0.003678       0.003678
#> weathersit           -0.16806 0.08173 0.002584       0.002923
#> hr_sin               -0.71063 0.18303 0.005788       0.006263
#> hr_cos               -0.06914 0.18005 0.005694       0.005848
#> mon_sin              -0.18727 0.18224 0.005763       0.006030
#> mon_cos              -0.51362 0.19698 0.006229       0.006721
#> sigma_theta           0.67173 0.03751 0.001186       0.001338
#> 
#> 2. Quantiles for each variable:
#> 
#>                          2.5%     25%      50%      75%     97.5%
#> (Intercept)           2.98100  3.3490  3.51721  3.69824  4.048539
#> part_of_dayMorning    1.79388  2.1413  2.31747  2.50564  2.864538
#> part_of_dayAfternoon  1.09263  1.5512  1.79171  2.03654  2.535714
#> part_of_dayEvening    0.83131  1.1894  1.38755  1.57537  1.920033
#> quarterQ2            -0.68878 -0.3203 -0.10473  0.11176  0.474540
#> quarterQ3            -0.86117 -0.3138 -0.07295  0.15504  0.646679
#> quarterQ4            -0.06511  0.2687  0.45146  0.62428  0.979505
#> holiday              -0.87759 -0.3393 -0.07459  0.22403  0.729777
#> workingday           -0.34146 -0.1813 -0.10715 -0.02728  0.120492
#> weathersit           -0.31819 -0.2261 -0.16709 -0.11521 -0.005618
#> hr_sin               -1.06890 -0.8303 -0.71183 -0.58576 -0.364450
#> hr_cos               -0.43660 -0.1952 -0.06709  0.05361  0.274436
#> mon_sin              -0.55833 -0.3092 -0.17928 -0.06931  0.167212
#> mon_cos              -0.88295 -0.6536 -0.51927 -0.38338 -0.126460
#> sigma_theta           0.60361  0.6465  0.66966  0.69657  0.748600

es <- coda::effectiveSize(mcmc_main)
knitr::kable(
  data.frame(parameter = names(es), effective_size = as.numeric(es)),
  row.names = FALSE,
  digits = 4,
  caption = "Effective sample size (coda::effectiveSize)"
)
Effective sample size (coda::effectiveSize)

parameter             effective_size
--------------------  --------------
(Intercept)                1007.6068
part_of_dayMorning          855.2566
part_of_dayAfternoon        904.8935
part_of_dayEvening          857.9380
quarterQ2                   794.0825
quarterQ3                   838.1824
quarterQ4                   765.8225
holiday                    1098.5595
workingday                 1000.0000
weathersit                  781.9330
hr_sin                      854.0864
hr_cos                      947.7249
mon_sin                     913.3215
mon_cos                     859.0090
sigma_theta                 785.7880

# Lag-1 autocorrelations: diagonal of the cross-correlation array from
# coda::autocorr (the argument is named lags; avoid relying on partial matching)
ac1 <- coda::autocorr(mcmc_main, lags = 1)
ac1_mat <- drop(ac1)
own_ac1 <- diag(ac1_mat)
names(own_ac1) <- colnames(mcmc_main)
knitr::kable(
  data.frame(parameter = names(own_ac1), lag_1_autocorr = as.numeric(own_ac1)),
  row.names = FALSE,
  digits = 4,
  caption = "Lag-1 autocorrelation (diagonal of coda::autocorr, lags = 1)"
)
Lag-1 autocorrelation (diagonal of coda::autocorr, lags = 1)

parameter             lag_1_autocorr
--------------------  --------------
(Intercept)                   0.0697
part_of_dayMorning            0.0775
part_of_dayAfternoon          0.0494
part_of_dayEvening            0.0760
quarterQ2                     0.1143
quarterQ3                     0.0875
quarterQ4                     0.0491
holiday                       0.0590
workingday                    0.0134
weathersit                    0.0754
hr_sin                        0.0782
hr_cos                        0.0872
mon_sin                       0.0448
mon_cos                       0.0753
sigma_theta                   0.1195
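
The effective sample sizes sit close to the nominal 1,000 stored draws, and every lag-1 autocorrelation is below 0.12, so the two-block sampler mixes well for the population parameters. On a live chain, two further standard coda checks are worth running (not evaluated here):

# Geweke diagnostic: z-scores comparing early and late segments of the chain
coda::geweke.diag(mcmc_main)

# Trace and density plots for all retained parameters
plot(mcmc_main)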

5. Out-of-Sample Prediction

We compare two prediction strategies on the held-out test set:

beta_mean <- colMeans(beta_out)
sigma_mean <- mean(sigma_out)

# Option A: plug-in conditional mean exp(X %*% posterior-mean beta);
# ignores the observation-level random effect
y_pred_cond <- exp(X_test %*% beta_mean)
mae_cond  <- mean(abs(y_test - y_pred_cond))
rmse_cond <- sqrt(mean((y_test - y_pred_cond)^2))

# Option B: posterior predictive mean, propagating uncertainty in beta,
# sigma_theta, the new random effects, and the Poisson sampling step
n_pred <- 500
y_pred_samples <- matrix(0, nrow = n_pred, ncol = n_test)
for (s in seq_len(n_pred)) {
  idx_s <- sample(n_sim, 1)            # pick a posterior draw at random
  beta_s  <- beta_out[idx_s, ]
  sigma_s <- sigma_out[idx_s]
  # Simulate new observation-level effects, then new counts
  theta_test <- rnorm(n_test, mean = X_test %*% beta_s, sd = sigma_s)
  y_pred_samples[s, ] <- rpois(n_test, lambda = exp(theta_test))
}
y_pred_mean <- colMeans(y_pred_samples)
mae_pp  <- mean(abs(y_test - y_pred_mean))
rmse_pp <- sqrt(mean((y_test - y_pred_mean)^2))

cat("Option A (conditional): MAE =", round(mae_cond, 2), " RMSE =", round(rmse_cond, 2), "\n")
#> Option A (conditional): MAE = 87.27  RMSE = 136.55
cat("Option B (post. pred.): MAE =", round(mae_pp, 2), " RMSE =", round(rmse_pp, 2), "\n")
#> Option B (post. pred.): MAE = 96.16  RMSE = 137.24
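
The two options differ for a structural reason: Option A plugs posterior means into \(\exp(\cdot)\) and ignores the observation-level variance, while Option B averages over it. Since \(E[e^{\theta}] = \exp(\mu + \sigma^2/2)\) for \(\theta \sim N(\mu, \sigma^2)\), an analytic variance correction to the plug-in prediction is also possible. A sketch (not run here; whether it improves MAE/RMSE on this split is an empirical question):

# Lognormal-mean correction: inflate the plug-in rate by exp(sigma^2 / 2)
y_pred_adj <- exp(X_test %*% beta_mean + sigma_mean^2 / 2)
mae_adj    <- mean(abs(y_test - y_pred_adj))
rmse_adj   <- sqrt(mean((y_test - y_pred_adj)^2))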

6. Summary

Component                      Implementation
-----------------------------  ----------------------------------------------------------------------
Block 1 (population)           rglmb(1, theta, X, family = gaussian(), pfamily = dNormal_Gamma(...))
Block 2 (observations)         For each \(i\): rglmb(1, y[i], matrix(1,1,1), family = poisson(), pfamily = dNormal(mu_i, sigma^2))
Prior for \(\beta, \sigma^2\)  From Prior_Setup on \(\theta \sim X\beta\) (Gaussian)
Data                           BikeSharing (1% train, 99% test for the vignette)
Vignette chain                 Precomputed BikeSharing_ch14_gibbs.rds (200 burn-in, 1000 stored iterations)

For longer runs and full diagnostics, see demo("Ex_09_BikeSharingPoisson"). For hierarchical linear models, see Chapter 13.

References

Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. CRC Press.
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Plummer, Martyn, Nicky Best, Kate Cowles, and Karen Vines. 2006. “CODA: Convergence Diagnosis and Output Analysis for MCMC.” R News 6 (1): 7–11. https://CRAN.R-project.org/doc/Rnews/.
Robert, Christian P., and George Casella. 2004. Monte Carlo Statistical Methods. 2nd ed. Springer.