Chapter 14: Hierarchical Generalized Linear Models

Kjell Nygren

2026-04-30

library(glmbayes)
library(coda)

Chapter 13 focused on hierarchical linear models: the Eight Schools example with normally distributed group-level effects and known sampling variances. This chapter extends that framework to hierarchical generalized linear models (Gelman and Hill 2007; Gelman et al. 2013), specifically to Poisson regression with observation-level random effects.

We use the BikeSharing dataset and rglmb to implement a two-block Gibbs sampler. The Poisson likelihood is non-conjugate with the normal prior on the linear predictor, so each observation-level update requires an accept–reject step (Robert and Casella 2004) via rglmb. For a full demo, see demo("Ex_09_BikeSharingPoisson").

1. Model Structure

Let \(y_i\) denote the count (hourly bike rentals) for observation \(i\), and \(x_i\) the covariate vector. The hierarchical model is:

  1. Observation level: \(y_i \sim \text{Poisson}(\exp(\theta_i))\), with \(\theta_i\) a latent log-rate.
  2. Population level: \(\theta_i \sim N(x_i^\top \beta, \sigma_\theta^2)\), so covariates predict the random effects.
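
Multiplying the two levels (together with a prior \(p(\beta, \sigma_\theta^2)\)) gives the joint posterior

\[
p(\beta, \sigma_\theta^2, \theta \mid y) \;\propto\; p(\beta, \sigma_\theta^2) \prod_{i=1}^{n} \text{Poisson}\left(y_i \mid e^{\theta_i}\right) \, N\left(\theta_i \mid x_i^\top \beta, \sigma_\theta^2\right).
\]

Conditioning on \(\theta\) reduces the \((\beta, \sigma_\theta^2)\) update to an ordinary Gaussian regression of \(\theta\) on the covariates; conditioning on \((\beta, \sigma_\theta^2)\) makes the \(\theta_i\) conditionally independent, one small Poisson problem per observation.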

The two-block Gibbs sampler alternates:

  1. Block 1 (population): Draw \((\beta, \sigma_\theta^2) \mid \theta\) — a conjugate Normal-Gamma update given \(\theta\), implemented with rglmb (family gaussian()).
  2. Block 2 (observations): For each \(i\), draw \(\theta_i \mid \beta, \sigma_\theta^2, y_i\) — a Poisson likelihood with a normal prior, implemented with rglmb (family poisson()); the full conditional is shown below.
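
Block 2 is the non-conjugate step. Up to a constant, the full conditional for each latent log-rate is

\[
p(\theta_i \mid \beta, \sigma_\theta^2, y_i) \;\propto\; \exp\left\{ y_i \theta_i - e^{\theta_i} \right\} \, \exp\left\{ -\frac{(\theta_i - x_i^\top \beta)^2}{2 \sigma_\theta^2} \right\},
\]

a Poisson likelihood kernel times a normal prior. This density matches no standard family, which is why each \(\theta_i\) update goes through the accept–reject step inside rglmb.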

With \(n\) observations, Block 2 requires \(n\) calls to rglmb per iteration. To keep run time manageable for the vignette, we use a 1% random subset of the data for training and reserve the rest for out-of-sample prediction. The chain used in Sections 4–5 is precomputed with the same settings as demo("Ex_09_BikeSharingPoisson") (n_burn = 200, n_sim = 1000) and stored as inst/extdata/BikeSharing_ch14_gibbs.rds. Section 3 still shows the full Gibbs code for reference; it is not re-run when this vignette is built.

2. Data and Setup

data("BikeSharing")

# Center continuous predictors
cont_vars <- c("temp", "atemp", "hum", "windspeed", "hr_sin", "hr_cos", "mon_sin", "mon_cos")
BikeSharing_c <- BikeSharing
BikeSharing_c[cont_vars] <- scale(BikeSharing[cont_vars], center = TRUE, scale = FALSE)


# Formula (full model with all variables; the sampler below uses form2)
form <- cnt ~ part_of_day + quarter + holiday + workingday + weathersit +
  hr_sin + hr_cos + mon_sin + mon_cos + temp + atemp + hum + windspeed

# Formula (limited model used by this vignette's sampler)
form2 <- cnt ~ part_of_day + quarter + holiday + workingday + weathersit +
  hr_sin + hr_cos + mon_sin + mon_cos

# Train/test split: indices bundled with precomputed Gibbs output (matches demo, set.seed(42))
pct_train <- 0.01
n <- nrow(BikeSharing_c)
ch14_path <- system.file("extdata", "BikeSharing_ch14_gibbs.rds", package = "glmbayes")
stopifnot(nzchar(ch14_path), file.exists(ch14_path))
ch14_saved <- readRDS(ch14_path)
stopifnot(length(ch14_saved$idx_train) == round(pct_train * n))
idx_train <- ch14_saved$idx_train
idx_test  <- setdiff(seq_len(n), idx_train)

Bike_train <- BikeSharing_c[idx_train, ]
Bike_test  <- BikeSharing_c[idx_test, ]

X_train <- model.matrix(form2, data = Bike_train)
X_test  <- model.matrix(form2, data = Bike_test)
y_train <- Bike_train$cnt
y_test  <- Bike_test$cnt
n_train <- length(y_train)
n_test  <- length(y_test)
p       <- ncol(X_train)

# Initialize theta on the log scale (the 0.5 offset guards against log(0)
# for hours with zero rentals), then set up the prior for the population block
theta <- log(y_train + 0.5)
data_pop <- data.frame(theta = theta, Bike_train)
form_pop <- theta ~ part_of_day + quarter + holiday + workingday + weathersit +
  hr_sin + hr_cos + mon_sin + mon_cos
ps_pop <- Prior_Setup(form_pop, family = gaussian(), data = data_pop)
#> Using default pwt = 0.05 (high-d default).
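
Before handing these pieces to the sampler, it can help to confirm the prior object carries the components the population block consumes. A quick check (assuming, as the $-accesses in Section 3 suggest, that Prior_Setup returns a plain list):

# Inspect the prior components later passed to dNormal_Gamma
str(ps_pop[c("mu", "Sigma_0", "shape", "rate")])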

3. Two-Block Gibbs Sampler

The following chunk is not evaluated when the vignette is built (eval = FALSE). For a live run, execute it in your session or use demo("Ex_09_BikeSharingPoisson") (expect a long runtime).

n_burn <- 200
n_sim  <- 1000

# Storage for the retained draws
beta_out   <- matrix(0, nrow = n_sim, ncol = p)
sigma_out  <- numeric(n_sim)
theta_out  <- matrix(0, nrow = n_sim, ncol = n_train)

set.seed(123)

# Burn-in: run the two-block sweep without storing draws
burn_time <- system.time({
  for (k in seq_len(n_burn)) {
    # Block 1: conjugate Normal-Gamma draw of (beta, sigma_theta^2) given theta
    out_pop <- rglmb(1, theta, X_train, family = gaussian(),
      pfamily = dNormal_Gamma(ps_pop$mu, Sigma_0 = ps_pop$Sigma_0,
        ps_pop$shape, ps_pop$rate))
    beta   <- as.vector(out_pop$coefficients[1, ])
    sigma_theta_sq <- out_pop$dispersion[1]
    mu_all <- as.vector(X_train %*% beta)
    # Block 2: accept-reject draw of each theta_i given (beta, sigma_theta^2)
    for (i in seq_len(n_train)) {
      theta[i] <- rglmb(1, y_train[i], matrix(1, 1, 1), family = poisson(),
        pfamily = dNormal(mu = mu_all[i], Sigma = sigma_theta_sq))$coefficients[1, 1]
    }
  }
})

burn_time

# Main run: same two-block sweep, now storing the draws
sim_time <- system.time({
  for (k in seq_len(n_sim)) {
    # Block 1: population-level draw (as in burn-in)
    out_pop <- rglmb(1, theta, X_train, family = gaussian(),
      pfamily = dNormal_Gamma(ps_pop$mu, Sigma_0 = ps_pop$Sigma_0,
        ps_pop$shape, ps_pop$rate))
    beta   <- as.vector(out_pop$coefficients[1, ])
    sigma_theta_sq <- out_pop$dispersion[1]
    mu_all <- as.vector(X_train %*% beta)
    # Block 2: observation-level draws (as in burn-in)
    for (i in seq_len(n_train)) {
      theta[i] <- rglmb(1, y_train[i], matrix(1, 1, 1), family = poisson(),
        pfamily = dNormal(mu = mu_all[i], Sigma = sigma_theta_sq))$coefficients[1, 1]
    }
    # Store this iteration's draws (sigma on the standard-deviation scale)
    beta_out[k, ]  <- beta
    sigma_out[k]   <- sqrt(sigma_theta_sq)
    theta_out[k, ] <- theta
  }
})


sim_time

Load the precomputed draws. To regenerate them, run demo("Ex_09_BikeSharingPoisson") and save the results with saveRDS() to inst/extdata/BikeSharing_ch14_gibbs.rds, as sketched after the next chunk.

beta_out  <- ch14_saved$beta_out
sigma_out <- ch14_saved$sigma_out
n_burn    <- ch14_saved$n_burn
n_sim     <- ch14_saved$n_sim
mcmc_main <- ch14_saved$mcmc_main
stopifnot(nrow(beta_out) == n_sim, length(sigma_out) == n_sim, ncol(beta_out) == p)
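
For reference, a sketch of how the saved bundle can be regenerated after a live run of Section 3 (the field names mirror the loads above; mcmc_main is assembled as shown in Section 4):

# Not run: bundle the objects this vignette expects and save them
saveRDS(
  list(idx_train = idx_train, beta_out = beta_out, sigma_out = sigma_out,
       n_burn = n_burn, n_sim = n_sim, mcmc_main = mcmc_main),
  file = "inst/extdata/BikeSharing_ch14_gibbs.rds"
)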

4. CODA Diagnostics

We summarize the main parameters (\(\beta\) and \(\sigma_\theta\)) using the coda package (Plummer et al. 2006); the random effects \(\theta\) are excluded from the summary. The mcmc_main object comes from the saved chain, with one column per coefficient of X_train plus sigma_theta.
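
For completeness, a sketch of how a chain like mcmc_main can be assembled from the raw draws (the saved object already contains it, so this is not run here):

# Bind the coefficient draws and sigma_theta, label the columns to match
# X_train, and wrap the matrix as a coda mcmc object
draws <- cbind(beta_out, sigma_out)
colnames(draws) <- c(colnames(X_train), "sigma_theta")
mcmc_main <- coda::mcmc(draws)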

summary(mcmc_main)
#> 
#> Iterations = 1:1000
#> Thinning interval = 1 
#> Number of chains = 1 
#> Sample size per chain = 1000 
#> 
#> 1. Empirical mean and standard deviation for each variable,
#>    plus standard error of the mean:
#> 
#>                          Mean      SD Naive SE Time-series SE
#> (Intercept)           3.51912 0.26805 0.008477       0.008445
#> part_of_dayMorning    2.32069 0.27789 0.008788       0.009502
#> part_of_dayAfternoon  1.79354 0.36705 0.011607       0.012202
#> part_of_dayEvening    1.38643 0.28336 0.008961       0.009674
#> quarterQ2            -0.10774 0.30643 0.009690       0.010874
#> quarterQ3            -0.07867 0.36867 0.011658       0.012734
#> quarterQ4             0.45117 0.26950 0.008522       0.009739
#> holiday              -0.05947 0.40337 0.012756       0.012170
#> workingday           -0.10572 0.11632 0.003678       0.003678
#> weathersit           -0.16806 0.08173 0.002584       0.002923
#> hr_sin               -0.71063 0.18303 0.005788       0.006263
#> hr_cos               -0.06914 0.18005 0.005694       0.005848
#> mon_sin              -0.18727 0.18224 0.005763       0.006030
#> mon_cos              -0.51362 0.19698 0.006229       0.006721
#> sigma_theta           0.67173 0.03751 0.001186       0.001338
#> 
#> 2. Quantiles for each variable:
#> 
#>                          2.5%     25%      50%      75%     97.5%
#> (Intercept)           2.98100  3.3490  3.51721  3.69824  4.048539
#> part_of_dayMorning    1.79388  2.1413  2.31747  2.50564  2.864538
#> part_of_dayAfternoon  1.09263  1.5512  1.79171  2.03654  2.535714
#> part_of_dayEvening    0.83131  1.1894  1.38755  1.57537  1.920033
#> quarterQ2            -0.68878 -0.3203 -0.10473  0.11176  0.474540
#> quarterQ3            -0.86117 -0.3138 -0.07295  0.15504  0.646679
#> quarterQ4            -0.06511  0.2687  0.45146  0.62428  0.979505
#> holiday              -0.87759 -0.3393 -0.07459  0.22403  0.729777
#> workingday           -0.34146 -0.1813 -0.10715 -0.02728  0.120492
#> weathersit           -0.31819 -0.2261 -0.16709 -0.11521 -0.005618
#> hr_sin               -1.06890 -0.8303 -0.71183 -0.58576 -0.364450
#> hr_cos               -0.43660 -0.1952 -0.06709  0.05361  0.274436
#> mon_sin              -0.55833 -0.3092 -0.17928 -0.06931  0.167212
#> mon_cos              -0.88295 -0.6536 -0.51927 -0.38338 -0.126460
#> sigma_theta           0.60361  0.6465  0.66966  0.69657  0.748600

es <- coda::effectiveSize(mcmc_main)
knitr::kable(
  data.frame(parameter = names(es), effective_size = as.numeric(es)),
  row.names = FALSE,
  digits = 4,
  caption = "Effective sample size (coda::effectiveSize)"
)
Effective sample size (coda::effectiveSize)

parameter             effective_size
--------------------  --------------
(Intercept)                1007.6068
part_of_dayMorning          855.2566
part_of_dayAfternoon        904.8935
part_of_dayEvening          857.9380
quarterQ2                   794.0825
quarterQ3                   838.1824
quarterQ4                   765.8225
holiday                    1098.5595
workingday                 1000.0000
weathersit                  781.9330
hr_sin                      854.0864
hr_cos                      947.7249
mon_sin                     913.3215
mon_cos                     859.0090
sigma_theta                 785.7880

# Lag-1 autocorrelations: diagonal of the cross-correlation array from
# coda::autocorr (the argument is named lags; avoid relying on partial matching)
ac1 <- coda::autocorr(mcmc_main, lags = 1)
ac1_mat <- drop(ac1)
own_ac1 <- diag(ac1_mat)
names(own_ac1) <- colnames(mcmc_main)
knitr::kable(
  data.frame(parameter = names(own_ac1), lag_1_autocorr = as.numeric(own_ac1)),
  row.names = FALSE,
  digits = 4,
  caption = "Lag-1 autocorrelation (diagonal of coda::autocorr, lags = 1)"
)
Lag-1 autocorrelation (diagonal of coda::autocorr, lags = 1)

parameter             lag_1_autocorr
--------------------  --------------
(Intercept)                   0.0697
part_of_dayMorning            0.0775
part_of_dayAfternoon          0.0494
part_of_dayEvening            0.0760
quarterQ2                     0.1143
quarterQ3                     0.0875
quarterQ4                     0.0491
holiday                       0.0590
workingday                    0.0134
weathersit                    0.0754
hr_sin                        0.0782
hr_cos                        0.0872
mon_sin                       0.0448
mon_cos                       0.0753
sigma_theta                   0.1195
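
The effective sample sizes sit close to the nominal 1,000 stored draws, and every lag-1 autocorrelation is below 0.12, so the two-block sampler mixes well for the population parameters. On a live chain, two further standard coda checks are worth running (not evaluated here):

# Geweke diagnostic: z-scores comparing early and late segments of the chain
coda::geweke.diag(mcmc_main)

# Trace and density plots for all retained parameters
plot(mcmc_main)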

5. Out-of-Sample Prediction

We compare two prediction strategies on the held-out test set:

beta_mean <- colMeans(beta_out)
sigma_mean <- mean(sigma_out)

# Option A: plug-in conditional mean exp(X %*% posterior-mean beta);
# ignores the observation-level random effect
y_pred_cond <- exp(X_test %*% beta_mean)
mae_cond  <- mean(abs(y_test - y_pred_cond))
rmse_cond <- sqrt(mean((y_test - y_pred_cond)^2))

# Option B: posterior predictive mean, propagating uncertainty in beta,
# sigma_theta, the new random effects, and the Poisson sampling step
n_pred <- 500
y_pred_samples <- matrix(0, nrow = n_pred, ncol = n_test)
for (s in seq_len(n_pred)) {
  idx_s <- sample(n_sim, 1)            # pick a posterior draw at random
  beta_s  <- beta_out[idx_s, ]
  sigma_s <- sigma_out[idx_s]
  # Simulate new observation-level effects, then new counts
  theta_test <- rnorm(n_test, mean = X_test %*% beta_s, sd = sigma_s)
  y_pred_samples[s, ] <- rpois(n_test, lambda = exp(theta_test))
}
y_pred_mean <- colMeans(y_pred_samples)
mae_pp  <- mean(abs(y_test - y_pred_mean))
rmse_pp <- sqrt(mean((y_test - y_pred_mean)^2))

cat("Option A (conditional): MAE =", round(mae_cond, 2), " RMSE =", round(rmse_cond, 2), "\n")
#> Option A (conditional): MAE = 87.27  RMSE = 136.55
cat("Option B (post. pred.): MAE =", round(mae_pp, 2), " RMSE =", round(rmse_pp, 2), "\n")
#> Option B (post. pred.): MAE = 96.16  RMSE = 137.24
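
The two options differ for a structural reason: Option A plugs posterior means into \(\exp(\cdot)\) and ignores the observation-level variance, while Option B averages over it. Since \(E[e^{\theta}] = \exp(\mu + \sigma^2/2)\) for \(\theta \sim N(\mu, \sigma^2)\), an analytic variance correction to the plug-in prediction is also possible. A sketch (not run here; whether it improves MAE/RMSE on this split is an empirical question):

# Lognormal-mean correction: inflate the plug-in rate by exp(sigma^2 / 2)
y_pred_adj <- exp(X_test %*% beta_mean + sigma_mean^2 / 2)
mae_adj    <- mean(abs(y_test - y_pred_adj))
rmse_adj   <- sqrt(mean((y_test - y_pred_adj)^2))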

6. Summary

Component                      Implementation
-----------------------------  ----------------------------------------------------------------------
Block 1 (population)           rglmb(1, theta, X, family = gaussian(), pfamily = dNormal_Gamma(...))
Block 2 (observations)         For each \(i\): rglmb(1, y[i], matrix(1,1,1), family = poisson(), pfamily = dNormal(mu_i, sigma^2))
Prior for \(\beta, \sigma^2\)  From Prior_Setup on \(\theta \sim X\beta\) (Gaussian)
Data                           BikeSharing (1% train, 99% test for the vignette)
Vignette chain                 Precomputed BikeSharing_ch14_gibbs.rds (200 burn-in, 1000 stored iterations)

For longer runs and full diagnostics, see demo("Ex_09_BikeSharingPoisson"). For hierarchical linear models, see Chapter 13.

References

Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. CRC Press.
Gelman, Andrew, and Jennifer Hill. 2007. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Plummer, Martyn, Nicky Best, Kate Cowles, and Karen Vines. 2006. “CODA: Convergence Diagnosis and Output Analysis for MCMC.” R News 6 (1): 7–11. https://CRAN.R-project.org/doc/Rnews/.
Robert, Christian P., and George Casella. 2004. Monte Carlo Statistical Methods. 2nd ed. Springer.