The lmb function brings the familiar interface of R’s lm into a Bayesian framework. It preserves the core outputs and methods you know from lm—such as coefficient tables, fitted values, and residual summaries—while adding Bayesian priors and posterior summaries. In this chapter, we fit the classical linear model side-by-side with its Bayesian counterpart, demonstrating how their results align under common default settings. Standard GLM references include McCullagh and Nelder (1989) and Venables and Ripley (2002); the Bayesian linear-model material aligns with Gelman et al. (2013) and Agresti (2015).
For consistency with our Binomial and Count vignettes, we use the default priors generated by Prior_Setup() and fix the dispersion at its maximum-likelihood estimate. Holding dispersion constant ensures that the Bayesian point estimates closely match the classical ones, isolating the effect of the prior. We defer a deeper dive into alternative prior choices—including priors on dispersion parameters—to a later chapter.
In a standard linear regression model, the response is modeled as
\[ Y_i = x_i^\top \beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\ \sigma^2), \]
where:

- \(x_i\) is the vector of predictors for observation \(i\),
- \(\beta\) is the vector of regression coefficients,
- \(\sigma^2\) is the error variance (dispersion), treated as fixed in this chapter.
The fitted values are
\[ \hat{\mu}_i = x_i^\top \beta. \]
Up to an additive constant, the log‑likelihood for \(\beta\) is
\[ \ell(\beta) = -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^\top \beta)^2. \]
In matrix notation:
\[ \ell(\beta) = -\frac{1}{2\sigma^2} (y - X\beta)^\top (y - X\beta). \]
This is the same criterion minimized by ordinary least squares, so the classical estimate of \(\beta\) is the familiar OLS solution.
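As a quick check, the closed-form OLS solution can be computed directly in base R and compared with lm(); this is a minimal sketch using simulated data (the data values here are illustrative, not from the vignette).

```r
## Minimal check that lm() reproduces the closed-form OLS solution
## (X'X)^{-1} X'y, using simulated data.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
X <- cbind(1, x)                           # design matrix with intercept
beta_ols <- solve(t(X) %*% X, t(X) %*% y)  # closed-form OLS estimate
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(beta_ols))  # TRUE
```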
To form a Bayesian linear model, we place a multivariate Normal prior on the coefficients:
\[ \beta \sim N(\mu_0,\ \Sigma_0). \]
The default prior from Prior_Setup() is a Zellner‑type
\(g\)-prior, which scales \(\Sigma_0\) relative to the information in
the design matrix.
With the default setting \(\mathrm{pwt} =
0.01\), the prior is weakly informative and exerts only mild
shrinkage toward \(\mu_0\).
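Under our reading of this setup (an assumption about the internals, not a documented API: \(\Sigma_0 = g\,\hat\sigma^2 (X^\top X)^{-1}\) with \(g = (1-\mathrm{pwt})/\mathrm{pwt}\)), the prior covariance for the plant-weight design can be sketched in base R and compared with the prior standard deviations and correlation that Prior_Setup() prints:

```r
## Sketch of the Zellner-type prior covariance, assuming (our reading)
## Sigma0 = g * sigma2_hat * (X'X)^{-1} with g = (1 - pwt)/pwt.
pwt <- 0.01
g <- (1 - pwt) / pwt                  # 99
sigma2_hat <- 0.4853407               # dispersion value reported by Prior_Setup()
group <- gl(2, 10, labels = c("Ctl", "Trt"))
X <- model.matrix(~ group)            # 20 x 2 design: intercept + treatment dummy
Sigma0 <- g * sigma2_hat * solve(crossprod(X))
round(sqrt(diag(Sigma0)), 3)          # prior SDs: ~2.192 (Intercept), ~3.100 (groupTrt)
round(cov2cor(Sigma0), 4)             # off-diagonal correlation ~ -0.7071
```

These values reproduce the Prior.SD column and the prior correlation matrix shown later in the printed Prior_Setup() output.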
Combining the likelihood and the Normal prior yields a Normal posterior distribution:
\[ \beta \mid y \sim N(\mu_{\text{post}},\ \Sigma_{\text{post}}), \]
where
\[ \Sigma_{\text{post}} = \left( \frac{1}{\sigma^2} X^\top X + \Sigma_0^{-1} \right)^{-1}, \]
\[ \mu_{\text{post}} = \Sigma_{\text{post}} \left( \frac{1}{\sigma^2} X^\top y + \Sigma_0^{-1} \mu_0 \right). \]
Because both the likelihood and prior are quadratic in \(\beta\), the posterior is available in
closed form.
This conjugate structure explains why the Bayesian estimates closely
track the classical OLS estimates when the prior is weak and \(\sigma^2\) is fixed at the classical
estimate.
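To see the conjugate update in action, the sketch below computes \(\Sigma_{\text{post}}\) and \(\mu_{\text{post}}\) directly from the formulas above. The data values are the Plant Weight numbers from R's ?lm example (Dobson 1990); the prior pieces follow our reading of the default setup (intercept prior mean at the overall response mean, slope prior mean at zero, Zellner-type covariance) and are an assumption, not the package internals.

```r
## Closed-form conjugate Normal update; plant-weight data as in R's
## ?lm example (Dobson 1990). The prior construction below is our
## reading of the default setup, not the package internals.
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
y <- c(ctl, trt)
group <- gl(2, 10, labels = c("Ctl", "Trt"))
X <- model.matrix(~ group)
sigma2 <- 0.4853407                          # dispersion fixed at its reported ML value
Sigma0 <- 99 * sigma2 * solve(crossprod(X))  # Zellner-type prior covariance (g = 99)
mu0 <- c(mean(y), 0)                         # prior mean: overall mean, zero effect
Sigma_post <- solve(crossprod(X) / sigma2 + solve(Sigma0))
mu_post <- Sigma_post %*% (crossprod(X, y) / sigma2 + solve(Sigma0) %*% mu0)
round(c(mu_post), 4)                         # ~5.0301 and ~-0.3673
```

Because \(\Sigma_0 \propto \sigma^2 (X^\top X)^{-1}\), the posterior mean reduces to \(\tfrac{g}{g+1}\hat\beta + \tfrac{1}{g+1}\mu_0\), regardless of the exact \(\sigma^2\) value.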
In this chapter, we treat \(\sigma^2\) as known, fixing it at the classical maximum-likelihood estimate. This ensures that any difference between the classical and Bayesian point estimates reflects the influence of the prior rather than how the dispersion is handled. Models that estimate \(\sigma^2\) or place a prior on it are covered in Chapter 11.
We illustrate our models using the Plant Weight dataset from Dobson (1990), which measures the dry weight of plants grown under control (“Ctl”) and treatment (“Trt”) conditions. We combine the two groups into a single response vector and create a factor variable for group membership.
In this section, we fit both the classical and Bayesian linear models to the plant data. First, we run lm() to establish a baseline. Then, we use Prior_Setup() to define our Bayesian priors and call lmb() with those settings.
This simple ordinary least squares regression of plant weight on group membership will serve as our point of comparison.
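The data construction and baseline fit can be written as follows; the weight values are those from the Plant Weight example on R's ?lm help page (Dobson 1990), which we take to be the dataset described above.

```r
## Plant Weight data from R's ?lm example (Dobson 1990) and the
## baseline OLS fit used for comparison throughout this vignette.
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))  # factor for group membership
weight <- c(ctl, trt)                             # combined response vector
lm.D9 <- lm(weight ~ group)
coef(lm.D9)   # (Intercept) ~5.032, groupTrt ~-0.371
```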
Here we generate default Gaussian priors and plug them into the Bayesian model.
Prior_Setup() returns the prior mean vector (mu), covariance matrix (V), and the maximum-likelihood estimate of dispersion (disp_ML) for a fixed-dispersion Gaussian model.
We now use the print method for Prior_Setup objects to inspect the default prior specification.
ps
#>
#> Call: Prior_Setup(formula = weight ~ group, family = gaussian())
#>
#> Setting up a Zellner g-type prior:
#> pwt = 0.01
#> g = (1 - pwt)/pwt = 99
#>
#> Note: n_prior was computed as (pwt / (1 - pwt)) * n_likelihood:
#> n_prior = 0.202
#> n_likelihood = 20
#>
#> Prior Setup Summary
#> ====================
#>
#> Prior Estimates with 95% Confidence Intervals
#> Prior.Mean Prior.SD CI.Lower CI.Upper pwt
#> (Intercept) 4.8465 2.192002 0.550255 9.142745 0.01
#> groupTrt 0.0000 3.099959 -6.075808 6.075808 0.01
#>
#> Prior Correlation Matrix
#> (Intercept) groupTrt
#> (Intercept) 1.0000 -0.7071
#> groupTrt -0.7071 1.0000
#>
#> Conditional Dispersion (Gaussian family): 0.4853
#>
#> Gamma Prior on Residual Precision:
#> shape = 0.601
#> rate = 0.2917
#> Expected precision (inverse variance) = 2.06041e+00 which implies 1/Expected precision = 4.85341e-01
#>
#> Applicable to gaussian models with compound pfamilies (e.g., dNormal_Gamma, dIndependent_Normal_Gamma),
#> as well as for Gamma regression, quasipoisson, and quasibinomial models.

The printed output begins by echoing the original function call, confirming the model formula and the specified likelihood family. Next, the setup of a Zellner g-type prior is displayed, where pwt = 0.01 indicates that the prior contributes just 1% of the total information, while the likelihood contributes the remaining 99%. This reflects a weakly informative prior, allowing the data to dominate the posterior. The derived value of g = 99 quantifies the weight of the likelihood relative to the prior (more precisely, how much the prior variance-covariance matrix is scaled up relative to the likelihood variance-covariance matrix).
The note on n_prior shows how this pseudo-sample size was computed from pwt and the effective sample size of the likelihood (n_likelihood = 20). The summary section includes prior means, standard deviations, and confidence intervals for each coefficient. The prior correlation matrix reveals the structure of dependencies induced by the design matrix—here, a strong negative correlation between the intercept and treatment effect. In addition, the conditional dispersion (0.485) reflects the residual variance under the Gaussian model, consistent with the classical GLM fit.
Finally, the printout also contains the shape and rate parameters used to specify priors for the dispersion. We discuss these extensively in a separate vignette and will not cover them here.
In our next vignette, we discuss the default prior in more detail. We also show how users can use the options in the Prior_Setup function to obtain different g-prior specifications that rescale the prior variance-covariance matrix, as well as options for alternative prior mean specifications. In a later vignette, we dig deeper into the use of different pwt specifications for different dimensions and the use of more informative priors.
We show how to call the lmb function under three different prior specifications. The first (a dNormal prior) treats the dispersion as a fixed and known constant; that specification is the focus of most of this vignette. The other two are alternative priors that also provide a prior specification for the dispersion. Those priors are covered in greater detail in later chapters but are introduced here so that users are aware of the potential need to estimate the dispersion. Many GLMs (such as binomial and Poisson models) have no dispersion parameter, so most of our discussion in this chapter focuses on models with a fixed dispersion parameter.
lmb.D9 <- lmb(weight ~ group, dNormal(mu = ps$mu, Sigma = ps$Sigma, dispersion = ps$dispersion))
## lmb.D9_v2 <- lmb(weight ~ group, dNormal_Gamma(mu = ps$mu, Sigma_0 = ps$Sigma_0, shape = ps$shape, rate = ps$rate))
## lmb.D9_v3 <- lmb(weight ~ group, dIndependent_Normal_Gamma(mu = ps$mu, Sigma = ps$Sigma, shape = ps$shape_ING, rate = ps$rate))
summary(lmb.D9)
#> Call
#> lmb(formula = weight ~ group, pfamily = dNormal(mu = ps$mu, Sigma = ps$Sigma,
#> dispersion = ps$dispersion))
#>
#> Expected Residuals:
#> Min 1Q Median 3Q Max
#> -1.07277554 -0.50499644 0.06000356 0.25500356 1.36722446
#>
#> Prior and Maximum Likelihood Estimates with Standard Deviations
#>
#> Null Mode Prior Mean Prior.sd Max Like. Like.sd
#> (Intercept) 4.846 4.846 2.192 5.032 0.220
#> groupTrt 0.000 0.000 3.100 -0.371 0.311
#>
#> Bayesian Estimates Based on 1000 iid draws
#>
#> Post.Mode Post.Mean Post.Sd MC Error Pr(Null_tail) SE(tail)
#> (Intercept) 5.030145 5.047217 0.213189 0.006742 0.173000 0.012
#> groupTrt -0.367290 -0.384442 0.306728 0.009700 0.098000 0.009
#> Pr(Prior_tail)
#> (Intercept) 0.173
#> groupTrt 0.098 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Directional Tail Summaries:
#>
#> Metric vs Null vs Prior
#> Mahalanobis Distance 1.2418 1.2418
#> Tail Probability 0.0950 0.0950
#> [Tail probabilities are P(delta^T * Z <= 0) in whitened space]
#>
#>
#> Distribution Percentiles
#>
#> 1.0% 2.5% 5.0% Median 95.0% 97.5% 99.0%
#> (Intercept) 4.5600 4.6315 4.7035 5.0476 5.4054 5.4604 5.519
#> groupTrt -1.1135 -0.9703 -0.8824 -0.3952 0.1225 0.2317 0.369
#>
#> Effective Number of Parameters: 1.922843
#> Expected Residual Deviance: 19.9135
#> DIC: 44.1358
#>
#> Expected Mean dispersion: 0.4853407
#> Sq.root of Expected Mean dispersion: 0.696664
#>
#> Mean Likelihood Subgradient Candidates Per iid sample: 1

With both models fitted, we’re ready to explore their summaries and compare classical versus Bayesian outputs in the next section.
Before we dive into each summary, it’s helpful to see the big picture. In the classical lm summary, you get three blocks—residual diagnostics, coefficient tests, and global fit measures—that tell you how well an OLS model is behaving. In the Bayesian lmb summary, you see additional blocks that compare your specified priors to the data (via ML estimates), then deliver full posterior summaries, effective‐parameter counts, and information criteria. In Sections 4.1 and 4.2 we walk through each of these blocks side by side, highlighting where the Bayesian output enriches or mirrors the classical view.
The built-in summary() method for an lm object organizes its output into three clear pieces:

- A quick look at the residuals
- A coefficient table with t-tests
- Overall fit statistics (RSE, \(R^2\), F-test)

We’ll step through each of these in turn, starting with the residual summary.
summary(lm.D9)
#>
#> Call:
#> lm(formula = weight ~ group)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.0710 -0.4938 0.0685 0.2462 1.3690
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 5.0320 0.2202 22.850 9.55e-15 ***
#> groupTrt -0.3710 0.3114 -1.191 0.249
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.6964 on 18 degrees of freedom
#> Multiple R-squared: 0.07308, Adjusted R-squared: 0.02158
#> F-statistic: 1.419 on 1 and 18 DF,  p-value: 0.249

summary(lm.D9) begins with a five-number summary of the residuals (Min, 1Q, Median, 3Q, Max). This display highlights skew, spread, and potential outliers in the model’s error distribution.
Coefficients:

|             | Estimate | Std. Error | t value | Pr(>\|t\|)   |
|-------------|----------|------------|---------|--------------|
| (Intercept) | 5.0320   | 0.2202     | 22.850  | 9.55e-15 *** |
| groupTrt    | –0.3710  | 0.3114     | –1.191  | 0.249        |
The columns report:

- Estimate: the fitted coefficient value
- Std. Error: the estimated standard error of that coefficient
- t value: the ratio of the estimate to its standard error
- Pr(>|t|): the two-sided p-value for the test of \(H_0:\beta = 0\)
The t-test assesses whether each predictor’s effect differs from zero after accounting for sampling variability. A small p-value indicates that the null hypothesis \(\beta = 0\) is unlikely given the data.
In this example, the intercept is highly significant (\(p<10^{-14}\)), whereas the treatment effect (groupTrt) yields \(p=0.249\), so there is insufficient evidence to reject \(H_{0}:\beta_{\text{groupTrt}}=0\) at conventional levels.
Below the coefficient table, summary(lm.D9) prints four key measures of fit:

Residual standard error: 0.6964 on 18 DF
Multiple R-squared: 0.07308
Adjusted R-squared: 0.02158
F-statistic: 1.419 on 1 and 18 DF, p-value: 0.249
These metrics gauge how well the model captures variation in the data and whether the added predictor meaningfully improves over an intercept-only model.
- Residual standard error (RSE): the estimated standard deviation of the residuals.
- Multiple R-squared (\(R^2\)): the fraction of total variability in \(y\) explained by the model.
- Adjusted R-squared (\(\text{adj}\,R^2\)): adjusts \(R^2\) for the number of predictors, penalizing model complexity.
- F-statistic: tests the null hypothesis that all slopes are zero.
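These quantities can all be recomputed by hand from the residual and total sums of squares; the sketch below does so using the Plant Weight data from R's ?lm example (Dobson 1990).

```r
## Recomputing RSE, R-squared, adjusted R-squared, and the F-statistic
## from first principles; data as in R's ?lm example (Dobson 1990).
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
weight <- c(ctl, trt)
group <- gl(2, 10, labels = c("Ctl", "Trt"))
fit <- lm(weight ~ group)
rss <- sum(resid(fit)^2)                 # residual sum of squares
tss <- sum((weight - mean(weight))^2)    # total sum of squares
rse <- sqrt(rss / df.residual(fit))      # residual standard error (18 df)
r2 <- 1 - rss / tss                      # multiple R-squared
r2_adj <- 1 - (1 - r2) * 19 / 18         # adjusted R-squared (n-1 = 19, n-p = 18)
f <- ((tss - rss) / 1) / (rss / 18)      # F-statistic on 1 and 18 df
round(c(RSE = rse, R2 = r2, adjR2 = r2_adj, F = f), 4)
```

The results match the values printed by summary(lm.D9): RSE ≈ 0.6964, \(R^2\) ≈ 0.0731, adjusted \(R^2\) ≈ 0.0216, F ≈ 1.419.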
When you call summary() on an lmb object, the first thing you see is how your prior specification lines up against a classical fit under fixed dispersion. This side-by-side display makes it easy to gauge how much your prior is pulling the estimates. Only after that does the function report full posterior summaries (moments, quantiles, Monte Carlo errors) and Bayesian diagnostics (pD, DIC, etc.). We begin by examining the prior vs. ML block.
summary(lmb.D9)
#> Call
#> lmb(formula = weight ~ group, pfamily = dNormal(mu = ps$mu, Sigma = ps$Sigma,
#> dispersion = ps$dispersion))
#>
#> Expected Residuals:
#> Min 1Q Median 3Q Max
#> -1.07277554 -0.50499644 0.06000356 0.25500356 1.36722446
#>
#> Prior and Maximum Likelihood Estimates with Standard Deviations
#>
#> Null Mode Prior Mean Prior.sd Max Like. Like.sd
#> (Intercept) 4.846 4.846 2.192 5.032 0.220
#> groupTrt 0.000 0.000 3.100 -0.371 0.311
#>
#> Bayesian Estimates Based on 1000 iid draws
#>
#> Post.Mode Post.Mean Post.Sd MC Error Pr(Null_tail) SE(tail)
#> (Intercept) 5.030145 5.047217 0.213189 0.006742 0.173000 0.012
#> groupTrt -0.367290 -0.384442 0.306728 0.009700 0.098000 0.009
#> Pr(Prior_tail)
#> (Intercept) 0.173
#> groupTrt 0.098 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Directional Tail Summaries:
#>
#> Metric vs Null vs Prior
#> Mahalanobis Distance 1.2418 1.2418
#> Tail Probability 0.0950 0.0950
#> [Tail probabilities are P(delta^T * Z <= 0) in whitened space]
#>
#>
#> Distribution Percentiles
#>
#> 1.0% 2.5% 5.0% Median 95.0% 97.5% 99.0%
#> (Intercept) 4.5600 4.6315 4.7035 5.0476 5.4054 5.4604 5.519
#> groupTrt -1.1135 -0.9703 -0.8824 -0.3952 0.1225 0.2317 0.369
#>
#> Effective Number of Parameters: 1.922843
#> Expected Residual Deviance: 19.9135
#> DIC: 44.1358
#>
#> Expected Mean dispersion: 0.4853407
#> Sq.root of Expected Mean dispersion: 0.696664
#>
#> Mean Likelihood Subgradient Candidates Per iid sample: 1

Before diving into the posterior, summary(lmb.D9) reports the prior means (and variances) you supplied alongside the maximum-likelihood estimates under a fixed-dispersion Gaussian model. Seeing these side by side helps you gauge how much influence your prior exerts relative to the data.
sumlmb<-summary(lmb.D9)
sumlmb$coefficients1
#> Null Mode Prior Mean Prior.sd Max Like. Like.sd
#> (Intercept) 4.8465 4.8465 2.192002 5.032 0.2202177
#> groupTrt      0.0000     0.0000 3.099959 -0.371 0.3114349

The default prior specification from Prior_Setup() is designed to align closely with the classical fit while still imposing mild regularization: the prior mean for the intercept is centered at the overall mean of the response, the slope coefficient is centered at zero, and the g-prior covariance is scaled so that the prior carries only a small fraction (pwt) of the total information.
The main table lists posterior mode and mean for each coefficient, plus the posterior standard deviation, Monte Carlo standard error, and the tail probability for a zero‐effect test. These values integrate your prior and the likelihood to give a full distributional summary.
sumlmb<-summary(lmb.D9)
sumlmb$coefficients
#> Post.Mode Post.Mean Post.Sd MC Error Pr(Null_tail)
#> (Intercept) 5.030145 5.0472173 0.2131894 0.006741640 0.173
#> groupTrt -0.367290 -0.3844418 0.3067282 0.009699596 0.098
#> SE(tail) Pr(Prior_tail)
#> (Intercept) 0.011961229 0.173
#> groupTrt    0.009401915          0.098

When a g-prior is used, the posterior mean simplifies to a weighted average of the prior mean and the maximum-likelihood estimate:
\[ \text{Post.Mean} = \mathrm{pwt}\times\text{Prior Mean} + (1-\mathrm{pwt})\times\text{MLE}. \]
In other words, pwt directly tunes how much “pull” your
prior has versus the data in each dimension.
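A quick arithmetic check of this identity, using the prior means and MLEs from the tables above with pwt = 0.01:

```r
## Arithmetic check of the weighted-average identity, using the values
## printed in the summary tables above (pwt = 0.01).
pwt <- 0.01
prior_mean <- c(4.8465, 0)           # prior means for (Intercept), groupTrt
mle <- c(5.032, -0.371)              # maximum-likelihood estimates
pwt * prior_mean + (1 - pwt) * mle   # 5.030145 and -0.367290
```

These match the reported posterior modes (5.030145, -0.367290); the simulated posterior means differ from them only by Monte Carlo error.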
The Pr(tail) column gives the posterior “reverse‐tail”
probability relative to the prior mean: if the posterior mean is below
the prior mean, it’s
\[
P(\beta > \text{prior mean}\mid\text{data})
\]
and if the posterior mean is above the prior mean, it’s
\[
P(\beta < \text{prior mean}\mid\text{data}).
\]
For models with a zero prior mean and roughly symmetric posteriors,
Pr(tail) will be close to one-half of the classical
two-sided p-value—and thus roughly equal to the one-sided
p-value—offering a Bayesian analogue to hypothesis testing. In essence,
a small Pr(tail) value signifies strong evidence against
the prior mean, functioning as a form of “rejection” of the prior
hypothesis.
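For comparison, the classical one-sided p-value for groupTrt can be computed from the t-statistic reported by summary(lm.D9):

```r
## Classical one-sided p-value for groupTrt, for comparison with the
## Pr(Null_tail) value of 0.098 in the Bayesian summary.
t_stat <- -1.191                     # t value from summary(lm.D9)
p_one_sided <- pt(t_stat, df = 18)   # lower-tail probability under t(18)
round(p_one_sided, 4)                # ~0.1245, half the two-sided p of 0.249
```

The Bayesian tail probability (0.098) is in the same ballpark as this one-sided p-value, as the discussion above suggests.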
The Standardized Mahalanobis Distance quantifies how far the prior point estimate lies from the posterior mean, accounting for posterior covariance. Building on this, the Directional Tail Probability assesses how likely it is—under the posterior distribution—to observe coefficient values that fall opposite the prior, relative to the posterior mean. This directional measure generalizes the concept of reverse-tail probabilities from univariate to multivariate settings. Small values suggest that the posterior density places little mass in the direction of the prior, and can thus be interpreted as a form of prior rejection.
Below the moments, you get a grid of key quantiles (1%, 2.5%, 5%, median, 95%, 97.5%, 99%) for each parameter. Percentiles let you quickly read off credible‐interval bounds and assess skew or heavy tails in the posterior.
sumlmb<-summary(lmb.D9)
sumlmb$Percentiles
#> 1.0% 2.5% 5.0% Median 95.0% 97.5%
#> (Intercept) 4.559959 4.6315282 4.7034515 5.0476302 5.405354 5.4604074
#> groupTrt -1.113454 -0.9702691 -0.8823886 -0.3952454 0.122497 0.2317131
#> 99.0%
#> (Intercept) 5.5189205
#> groupTrt     0.3686194

Interpretation for groupTrt:

- The 95% credible interval (2.5%–97.5%) is approximately [–0.970, 0.232], which spans zero and indicates uncertainty about the direction of the treatment effect.
- The 90% credible interval (5%–95%) is approximately [–0.882, 0.122], also including zero and reinforcing that a null or small positive effect remains plausible.
- The posterior distribution is slightly negatively skewed: the lower tail (1.0% = –1.113) extends farther below the median (–0.395) than the upper tail (99.0% = 0.369) extends above it.
- Although the median effect is –0.395 (suggesting a small negative treatment effect), the credible intervals remind us that the data do not provide strong evidence to rule out no effect or even a slight positive effect.
For Gaussian models, summary(lmb.D9) also prints the
posterior mean of the dispersion parameter. This aligns with the
classical residual variance estimate but incorporates prior
uncertainty.
Finally, taking the square root of the expected mean dispersion yields a Bayesian estimate of the residual standard deviation. This makes it directly comparable to the classical residual standard error.
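Concretely, applying the square root to the expected mean dispersion printed in the summary:

```r
## Square root of the expected mean dispersion from the summary output,
## for comparison with the classical residual standard error (0.6964).
sqrt(0.4853407)   # ~0.696664
```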
The tables below map vignette sections to chapters in Agresti (2015).
| Vignette Section | Textbook Chapter & Section | Notes (provisional) |
|---|---|---|
| 3.1.1 Calling the classical lm function | 3.4 Example Normal Linear Model Inference | Formal definition of \(y = X\beta + \varepsilon\); shows OLS as the MLE under Normal errors (see derivation in Example 3.1). |
| 3.2.1 Setting the prior | 10.2 Bayesian Linear Models | Construction of conjugate Normal priors and Zellner’s g-prior \(V = g (X^{T} X)^{-1}\), with \(g = (1 - \mathrm{pwt})/\mathrm{pwt}\); see Example 10.2. |
| 3.2.2 Calling the lmb function | 10.2 Bayesian Linear Models | Shows closed-form posterior updates under conjugate Normal priors: derivation of the posterior mean and variance for \(\beta\) in the linear model. |
| Vignette Section | Textbook Chapter & Section | Notes |
|---|---|---|
| 4.1.1 Residual summary | 3.4 Example Normal Linear Model Inference | Discussion of raw, standardized, and studentized residuals; graphical diagnostics (residual vs fitted, Q-Q plots); measures of leverage and influence. |
| 4.1.2 Coefficient Estimates and Hypothesis Tests | 3.2 Significance Tests for Normal Linear Models | Formal derivation of t-tests and confidence intervals for slopes; interpretation of estimates, standard errors, t-statistics, two-sided p-values, and significance codes. |
| 4.1.3 Model fit statistics | 3.2 Significance Tests for Normal Linear Models | Definition and interpretation of residual standard error; \(R^2\) and adjusted \(R^2\); ANOVA F-test for overall model significance. |
| 4.2.1 Prior and maximum-likelihood estimates | 10.1 The Bayesian Approach to Statistical Inference | Presentation of conjugate Normal priors and Zellner’s g-prior; overlay of prior and sampling distributions (MLEs); role of \(g\) in shrinkage. |
| 4.2.2 Posterior estimates | 10.2 Bayesian Linear Models | Closed-form posterior mode, mean, and variance under conjugacy; Monte Carlo approximation for non-conjugate settings; interpretation of “Pr(tail)” statements. |
| 4.2.3 Distribution percentiles | 10.2 Bayesian Linear Models | Definition of equal-tailed and HPD intervals; extraction of key posterior quantiles (2.5%, 97.5%, etc.); assessing skew and tail behavior. |
| 4.2.4 Effective number of parameters (pD) | 10.3 Bayesian Generalized Linear Models | Introduction of deviance-based effective number of parameters (\(p_D\)); interpretation of \(p_D\) as a penalty/shrinkage measure in model comparison. |
| 4.2.5 Expected residual deviance (\(\bar{D}\)) | 10.3 Bayesian Generalized Linear Models | Definition of deviance for GLMs; calculation of posterior mean deviance; use of \(\bar{D}\) as a Bayesian fit statistic. |
| 4.2.6 Deviance Information Criterion (DIC) | 10.3 Bayesian Generalized Linear Models | Derivation of \(\text{DIC} = \bar{D} + p_D\); guidelines for using DIC to compare Bayesian models under a common prior. |
| 4.2.7 Expected mean dispersion | 10.2 Bayesian Linear Models | Bayesian estimation of the Gaussian error variance \((\sigma^2)\) via conjugate priors; posterior mean of \(\sigma^2\). |
| 4.2.8 Square-root of expected mean dispersion | 10.2 Bayesian Linear Models | Transformation of posterior \(\sigma^2\) into \(\sigma\) (residual standard-error analogue); direct comparison to classical RSE. |