Chapter A02: Overview of Estimation Procedures

Introduction

This vignette describes the computational architecture used by the glmbayes package to generate independent and identically distributed (iid) samples from Bayesian generalized linear models (GLMs). The central idea is that posterior simulation is handled by a small set of low-level simulation functions (“simfunctions”) that implement either conjugate sampling formulas or envelope-based accept-reject methods. These simfunctions are called by higher-level modeling functions such as glmb() and lmb(), which provide formula interfaces, model frames, and S3 methods.

The simfunctions implement both classical conjugate Bayesian updating and the likelihood-subgradient envelope methods of (Nygren and Nygren 2006). Companion vignettes document coefficient envelopes (Nygren 2025a), Gamma-family dispersion envelopes (Nygren 2025b), and joint envelopes for Gaussian regression with independent Normal–Gamma priors (Nygren 2025c).

This chapter begins by describing the role of rglmb(), rlmb()and the pfamily system, which together determine which simfunction is used for a given model. Later sections describe the simfunctions themselves, the mapping between likelihood families and prior families, and the C++ routines used for envelope construction and sampling.

1. rglmb/rlmb, pfamilies, and the Simulation Architecture

The functions rglmb() and rlmb()are the minimalistic engines for Bayesian GLM simulation. They are called internally by glmb() and lmb() respectively, which provides formula-based interfaces similar to glm() and lm() respectively. In contrast, rglmb() and rlmb()works directly with numeric inputs and avoids the overhead of model frames, formulas, and method dispatches. Their main purpose is to generate iid posterior draws given:

a response vector y
a design matrix x
a likelihood family (e.g., gaussian(), poisson(), binomial())
a prior family (pfamily)
optional weights, offsets, and envelope controls

Unlike glmb()and lmb(),rglmb()andrlmb()` do not compute fitted values, residuals, or diagnostics. They simply dispatche to the correct simfunctions and return the resulting posterior draws. The lower overhead associated with these functions make them a good choice when calling the estimation procedures repeatedly (say as part of Gibbs sampling procedures).

1.1 How glmb and lmb use rglmb and rlmb respectively

The functions glmb() and lmb() are the high-level modeling interface. They:

parse formulas
construct model frames
handle missing data
attach S3 methods for printing, summarizing, predicting, and extracting diagnostics

Once the model frames are constructed, glmb() and lmb()call rglmb() and rlmb()respectively to generate the posterior samples.

Thus:

glmb() –> rglmb() –> simfunction –> posterior draws

and

lmb() –> rlmb() –> simfunction –> posterior draws

respectively.

This separation mirrors the classical distinction between glm()/ lm()and the underlying IRLS routines.

1.2 pfamilies: the Bayesian analogue of family()

In classical GLMs, the family object determines the likelihood and link. In glmbayes, the pfamily object plays the same role for priors. Each pfamily object:

names the prior family (pfamily$pfamily)
stores the prior hyperparameters (pfamily$prior_list)
records the likelihood families it supports (pfamily$okfamilies)
provides a function for the allowable links (pfamily$plinks)
embeds the simulation function to call (pfamily$simfun)

This design mirrors the structure of glm():
family determines the likelihood, and pfamily determines the prior and the sampling method. The combination (family, pfamily) uniquely determines which simfunction is used.

To utilize pfamilies when calling the higher level glmb() and lmb() or the lower level rglmb() and rlmb() functions, users must provide the hyperparameters needed to populate the pfamily$prior_list. These hyperparameters vary across pfamilies largely based on whether the prior specified sets a prior for regression coefficients, dispersion parameters, or both jointly.

The pfamilies currently implemented in glmbayes differ primarily in whether they place priors on regression coefficients, dispersion parameters, or both. The table below summarizes the hyperparameters associated with each pfamily.

pfamily name Prior components in prior_list

dNormal mu: prior mean vector for coefficients Sigma: prior covariance matrix dispersion: fixed dispersion (if applicable)

dGamma shape: shape parameter for Gamma prior on precision rate: rate parameter for Gamma prior on precision beta: regression coefficients (used when updating dispersion separately) max_disp_perc: percentile used to truncate the posterior dispersion distribution disp_lower, disp_upper: optional user-specified truncation bounds

dNormal_Gamma mu: prior mean vector for coefficients Sigma_0: prior covariance on precision-weighted scale shape: shape parameter for Gamma prior on precision rate: rate parameter for Gamma prior on precision (prior_list still uses component name Sigma.)

dIndependent_Normal_Gamma mu: prior mean vector for coefficients Sigma: prior covariance matrix shape: shape parameter for Gamma prior on precision rate: rate parameter for Gamma prior on precision max_disp_perc: percentile used to truncate the posterior dispersion distribution disp_lower, disp_upper: optional user-specified truncation bounds

These pfamilies determine which simfunction is used. For example:

dNormal selects rNormal_reg
dGamma selects rGamma_reg
dNormal_Gamma selects rNormalGamma_reg
dIndependent_Normal_Gamma selects rindepNormalGamma_reg

Because each pfamily embeds its own simulation function, rglmb() and rlmb() do not need to contain any special-case logic. Once the user supplies a pfamily object, the correct simfunction is already encoded inside it. This design mirrors the classical GLM architecture, where the family object determines the likelihood and link, but extends it to the Bayesian setting by allowing the prior to determine the sampling algorithm.

As setting the prior parameters required for each pfamily can be a bit involved, the package provides the Prior_Setup() function which be used to extract default prior settings for these parameters while also allowing tailoring of the prior specification. Some of our earlier vignettes already covered this in details so it won’t be discussed further here.

1.3 The uniform simfunction interface

All simfunctions in simfunctions.R share the same signature:

simfun(n, y, x, prior_list, offset = NULL, weights = 1, family = gaussian(), Gridtype = 2, n_envopt = NULL, use_parallel = TRUE, use_opencl = FALSE, verbose = FALSE)

This uniform interface allows rglmb() and rlmb() to dispatch to any simfunction without special-case logic. It also allows users to change priors or likelihoods without altering their workflow, and it makes the simfunctions suitable for use inside Gibbs samplers or other custom algorithms.

1.4 How rglmb and rlmb select the simfunction

The internal logic of rglmb() is:

The user supplies a likelihood family and a prior pfamily.
Prior_Setup() constructs a prior_list containing the hyperparameters.
rglmb() extracts from the pfamily object:
- the allowed likelihood families (okfamilies)
- the allowed links (plinks)
- the prior hyperparameters (prior_list)
- the simulation function (simfun)
It checks that the requested likelihood family and link are compatible with the pfamily.
It constructs a list of arguments to pass to the simfunction.
It calls the simfunction, which performs the actual sampling.
It attaches metadata (the pfamily, the simfunction call, and the arguments) and returns the result.

The simfunction performs the actual sampling, either by conjugate formulas or by envelope-based accept-reject methods.

1.5 The four core simfunctions

The pfamily system dispatches to one of four simfunctions:

rNormal_reg
rNormalGamma_reg
rindepNormalGamma_reg
rGamma_reg

These functions implement the sampling algorithms described in Appendices A2, A5, A6, and A7. The next section provides tables showing how likelihood families and pfamilies map to these simfunctions.

1.6 Inspecting the Simulation Function Used: the `simfunction()` Generic

The glmbayes package provides an S3 interface for inspecting which low-level simulation function (“simfunction”) was used during a call to rglmb() or rlmb(). This interface is accessed through the exported generic:

simfunction(object, ...)

When applied to an rglmb or rlmb result, this function returns an object of class "simfunction" containing:

the name of the simfunction used (e.g., rNormal_reg)
the call used to invoke the simfunction
the arguments passed to it, including the prior_list and family objects

This provides a transparent view of the internal mechanics of the simulation step, which is useful for debugging, reproducibility, and understanding how pfamilies dispatch to specific simfunctions.

For example:

fit <- rglmb(n = 10, y = y, x = X,
             family = gaussian(),
             pfamily = dNormal(mu, Sigma))

simfunction(fit)

prints the simfunction name, the call, and all arguments used during sampling.

The print.simfunction() method formats this information in a readable way, including a recursive display of the prior hyperparameters stored in prior_list. This interface is intentionally lightweight: it does not re-run the simfunction or modify the model object. It simply reports the simulation metadata recorded during the original call.

1.7 Summary

The combination of glmb()/lmb(), rglmb()/rlmb(), the pfamily system, and the simfunctions forms a flexible and extensible framework for Bayesian GLM simulation. The high-level modeling functions handle formula parsing and model construction, while the low-level simfunctions implement efficient posterior sampling using either conjugate formulas or envelope-based methods.

The next section provides detailed mapping tables showing how families and pfamilies determine the appropriate simfunction and sampling method.

2. Mapping Families and Pfamilies to Simulation Functions

2.1 Mappings to Simulation Functions

The table below summarizes how likelihood families, link functions, and pfamilies map to the simulation functions in simfunctions.R. The tables also indicates whether the sampling is conjugate or envelope-based.

Likelihood Family	Link	Pfamily	Simfunction	Method
Gaussian	identity	dNormal	rNormal_reg	Conjugate MVN [C++]
Gaussian	identity	dGamma	rGamma_reg	Conjugate Gamma [R]
Gaussian	identity	dNormal_Gamma	rNormalGamma_reg	Conjugate Normal-Gamma [R]
Gaussian	identity	dIndependent_Normal_Gamma	rindepNormalGamma_reg	Envelope-based joint sampler [C++]
Poisson	log	dNormal	rNormal_reg	Envelope-based [C++]
Quasi-Poisson	log	dNormal	rNormal_reg	Envelope-based [C++]
Binomial	logit, probit, cloglog	dNormal	rNormal_reg	Envelope-based [C++]
Quasi-Binomial	logit, probit, cloglog	dNormal	rNormal_reg	Envelope-based [C++]
Gamma	log	dNormal	rNormal_reg	Envelope-based [C++]
Gamma	log	dGamma	rGamma_reg	Envelope-based [C++]

2.2 Mapping Simulation Functions to Primary C++ Routines

The table below lists the simulation functions that call C++ code, the likelihood families for which the C++ path is used, and the specific C++ source files and function names invoked. Only the primary simulation routines are shown here; envelope construction routines are documented separately in Appendices A5, A6, and A7.

Simfunction	Family	C++ File	C++ Function(s) Called
rNormal_reg	Gaussian	rNormalReg.cpp	rNormalReg_cpp_export
rNormal_reg	All non-Gaussian	rNormalGLM.cpp	rNormalGLM_cpp_export
rindepNormalGamma_reg	Gaussian	rIndepNormalGammaReg.cpp \| rIndepNormalGammaReg_std_cpp_export \| \| rIndepNormalGammaReg_std_parallel_cpp_export
rGamma_reg	Gaussian	(none)	(uses R-level rgamma or ctrgamma)
rGamma_reg	Gamma	(none)	(uses R-level ctrgamma)

Notes: - rNormal_reg uses different C++ backends depending on whether the likelihood is Gaussian or non-Gaussian. - rindepNormalGamma_reg always uses C++ for the joint sampler. - rGamma_reg does not call a dedicated C++ simulation routine, but it does use the R-level ctrgamma function for truncated Gamma proposals. - Envelope construction routines (EnvelopeBuild, EnvelopeDispersionBuild_cpp, etc.) are documented separately and are not included in this table.

3. Conjugate vs. Envelope-Based Sampling

3.1 Conjugate Cases

Several combinations of likelihood family and prior lead to closed-form posterior distributions:

Gaussian likelihood with Normal prior on beta
Gaussian likelihood with Normal-Gamma prior on (beta, tau)
Gaussian likelihood with Gamma prior on tau
Gamma likelihood with Gamma prior on tau (conjugate path)

In these cases, the simfunctions use direct sampling from multivariate normal or gamma distributions. No envelope construction is required.

3.2 Envelope-Based Cases

When the posterior is not available in closed form, the simfunctions use the likelihood subgradient approach of Nygren and Nygren (2006). These cases include:

All non-Gaussian GLMs with Normal priors
Gaussian regression with independent Normal-Gamma priors
Gamma regression with Gamma prior on precision (non-conjugate path)

The envelope-based samplers rely on:

Likelihood subgradient densities (JASA 2006)
Mixture envelopes (Appendix A5)
Dispersion envelopes (Appendix A6)
Joint envelopes for (beta,phi) (Appendix A7)

These methods guarantee iid sampling even in high dimensions and for non-conjugate models.

4. Theoretical Foundations and Cross-References

The simulation functions are grounded in the following theoretical results:

(Nygren and Nygren 2006): likelihood-subgradient densities and bounds on acceptance rates.
(Nygren 2025a): implementation-oriented companion (this series, Chapter A05) for likelihood-subgradient samplers and the .rNormalGLM_cpp pipeline.
(Nygren 2025b): accept–reject sampling for dispersion in Gamma regression (rGamma_reg).
(Nygren 2025c): joint sampling for Gaussian regression with independent Normal–Gamma priors (rindepNormalGamma_reg).

Together these provide the mathematical and implementation justification for the envelope-based methods in the simfunctions.

5. Overview of Each Simulation Function

As noted above, the function signatures for the simulation functions are the same. However, the functions differ in terms of the specific familes/links supported and in terms of the set of prior parameters required as part of the prior_list argument. Here we provide a summary view of the families/links supported by each function and then a more detailed overview of each function.

Simfunction Supported Families and Links

rNormal_reg gaussian(identity) poisson(log) quasipoisson(log) binomial(logit, probit, cloglog) quasibinomial(logit, probit, cloglog) Gamma(log)

rNormalGamma_reg gaussian(identity)

rindepNormalGamma_reg gaussian(identity)

rGamma_reg gaussian(identity) Gamma(log)

5.1 rNormal_reg

Purpose:
Generates iid samples of regression coefficients beta for GLMs with Normal priors.

Function signature: rNormal_reg(n, y, x, prior_list, offset = NULL, weights = 1, family = gaussian(), Gridtype = 2, n_envopt = NULL, use_parallel = TRUE, use_opencl = FALSE, verbose = FALSE)

Required prior_list components (from dNormal): - mu: prior mean vector for coefficients - Sigma: prior covariance matrix - dispersion: fixed dispersion (if applicable)

Conjugate case:
Gaussian likelihood with Normal prior.
Sampling uses direct multivariate normal draws via .rNormalReg_cpp.

Envelope-based case:
All non-Gaussian GLMs.
Uses .rNormalGLM_cpp and EnvelopeBuild to construct a likelihood-subgradient envelope.

Implementation sketch:
1. Compute posterior mode using optim (non-Gaussian) or lm.fit (Gaussian).
2. Standardize model using glmb_Standardize_Model.
3. Build envelope using EnvelopeBuild.
4. Generate iid samples using accept-reject sampling.

References:
(Nygren and Nygren 2006); (Nygren 2025a).

5.2 rNormalGamma_reg

Purpose:
Conjugate Normal-Gamma regression for Gaussian likelihoods.

Function signature: rNormalGamma_reg(n, y, x, prior_list, offset = NULL, weights = 1, family = gaussian(), Gridtype = 2, n_envopt = NULL, use_parallel = TRUE, use_opencl = FALSE, verbose = FALSE)

Required prior_list components (from dNormal_Gamma): - mu: prior mean vector for coefficients - Sigma: prior covariance matrix - shape: shape parameter for Gamma prior on precision - rate: rate parameter for Gamma prior on precision

Method:
Closed-form posterior for (beta, tau).
Draw tau from Gamma, then draw beta from conditional Normal.

Implementation sketch:
1. Fit weighted least squares to obtain Btilde, IR, and S.
2. Compute posterior Gamma parameters for tau.
3. Draw tau and then beta = Btilde + IR %% rnorm sqrt(dispersion).

References:
Conjugate Normal–Gamma regression (Raiffa and Schlaifer 1961; O’Hagan and Forster 2004); see also (Lindley and Smith 1972).

5.3 rindepNormalGamma_reg

Purpose:
Joint sampling of (beta, phi) for Gaussian regression with independent Normal-Gamma priors.

Function signature: rindepNormalGamma_reg(n, y, x, prior_list, offset = NULL, weights = 1, family = gaussian(), Gridtype = 2, n_envopt = NULL, use_parallel = TRUE, use_opencl = FALSE, verbose = FALSE)

Required prior_list components (from dIndependent_Normal_Gamma): - mu: prior mean vector for coefficients - Sigma: prior covariance matrix - shape: shape parameter for Gamma prior on precision - rate: rate parameter for Gamma prior on precision - max_disp_perc: percentile used to truncate dispersion support - disp_lower, disp_upper: optional user-specified truncation bounds

Method:
Envelope-based joint sampler using a shared envelope over beta and dispersion bounds.

Backend:
- .rIndepNormalGammaReg_std_cpp
- EnvelopeDispersionBuild_cpp
- EnvelopeBuild

Implementation sketch:
1. Iteratively anchor dispersion using repeated Gaussian fits.
2. Standardize model and compute posterior mode.
3. Build coefficient envelope (EnvelopeBuild).
4. Build dispersion envelope (EnvelopeDispersionBuild_cpp).
5. Sample jointly from the standardized model.
6. Transform back to original scale.

References:
(Nygren 2025c); (Griffin and Brown 2010).

5.4 rGamma_reg

Purpose:
Simulates dispersion parameters for Gaussian and Gamma families.

Function signature: rGamma_reg(n, y, x, prior_list, offset = NULL, weights = 1, family = gaussian(), Gridtype = 2, n_envopt = NULL, use_parallel = TRUE, use_opencl = FALSE, verbose = FALSE)

Required prior_list components (from dGamma): - shape: shape parameter for Gamma prior on precision - rate: rate parameter for Gamma prior on precision - beta: regression coefficients (used when updating dispersion separately) - max_disp_perc: percentile used to truncate dispersion support - disp_lower, disp_upper: optional user-specified truncation bounds

Gaussian case:
Conjugate Gamma posterior for precision tau.
Uses direct sampling or truncated sampling via ctrgamma.

Gamma family case:
Non-conjugate.
Uses the A6 envelope-based sampler with a tangent envelope for the precision.

Backend:
- ctrgamma
- A6 routines
- log-space utilities logdiffexp

Implementation sketch:
1. Compute posterior mode for precision.
2. Build Gamma surrogate and tangency point.
3. Construct bounding function log_h.
4. Accept-reject sampling for precision.
5. Transform to dispersion.

References:
(Nygren 2025b).

6. Use in Gibbs Samplers and Custom Algorithms

The simfunctions are designed to be lightweight and fast. They avoid the overhead of:

Model frames
Formulas
S3 objects
Diagnostic structures

This makes them ideal for:

Gibbs sampling
Blocked MCMC
Variational approximations
Simulation studies
Algorithm prototyping

They return only the essential components needed for further computation: coefficients, dispersion draws, posterior modes, envelopes, and iteration counts.

7. Source Code and File Structure

The core simulation routines are implemented in:

simfunctions.R

This file contains the four primary simfunctions (rNormal_reg, rNormalGamma_reg, rindepNormalGamma_reg, rGamma_reg) and the R-level logic that interfaces with the C/C++ backends.

7.1 C and C++ Backends Used by the Simfunctions

Several simfunctions rely on optimized C/C++ routines for envelope construction, posterior mode evaluation, and sampling. These routines are located in the following source files:

Coefficient samplers (Normal priors): - rNormalReg.cpp
Backend for Gaussian–Normal conjugate sampling. - rNormalGLM.cpp
Backend for non-Gaussian Normal-prior envelope sampling.

Joint Normal–Gamma samplers: - rIndepNormalGammaReg.cpp
Standard joint sampler for independent Normal–Gamma priors. - rIndepNormalGammaReg.cpp
Parallelized version of the joint sampler.

Envelope construction routines: - EnvelopeBuild_c.cpp
Coefficient envelopes (Appendix A5). - EnvelopeDispersionBuild_cpp.cpp
Dispersion envelopes for Gamma-family models (Appendix A7).

Model standardization: - glmb_Standardize_Model.R
R-level preprocessing used by all envelope-based samplers.

7.2 Utility Functions

The simfunctions also rely on several numerically robust utility routines:

ctrgamma
Truncated Gamma sampler used in rGamma_reg.
logdiffexp
Stable computation of log(exp(a) - exp(b)).
p_inv_gamma, q_inv_gamma, r_invgamma
Inverse-Gamma CDF, quantile, and random-generation utilities.

These utilities support the envelope-based samplers, truncated distributions, and dispersion updates used throughout the package.

8. Summary

The simulation functions described in this chapter form the computational backbone of the glmbayes package. They implement both classical conjugate sampling and modern envelope-based iid sampling. They are the recommended interface for custom samplers, Gibbs sampling, and research applications requiring direct access to the underlying algorithms.

For further details, see the JASA paper and Appendices A5, A6, and A7.

Chapter A02: Overview of Estimation Procedures

Kjell Nygren

2026-04-30

Chapter A02: Overview of Estimation Procedures

Introduction

1. rglmb/rlmb, pfamilies, and the Simulation Architecture

1.1 How glmb and lmb use rglmb and rlmb respectively

1.2 pfamilies: the Bayesian analogue of family()

1.3 The uniform simfunction interface

1.4 How rglmb and rlmb select the simfunction

1.5 The four core simfunctions

1.6 Inspecting the Simulation Function Used: the `simfunction()` Generic

1.7 Summary

2. Mapping Families and Pfamilies to Simulation Functions

2.1 Mappings to Simulation Functions

2.2 Mapping Simulation Functions to Primary C++ Routines

3. Conjugate vs. Envelope-Based Sampling

3.1 Conjugate Cases

3.2 Envelope-Based Cases

4. Theoretical Foundations and Cross-References

5. Overview of Each Simulation Function

5.1 rNormal_reg

5.2 rNormalGamma_reg

5.3 rindepNormalGamma_reg

5.4 rGamma_reg

6. Use in Gibbs Samplers and Custom Algorithms

7. Source Code and File Structure

7.1 C and C++ Backends Used by the Simfunctions

7.2 Utility Functions

8. Summary

References

Chapter A02: Overview of Estimation Procedures

Kjell Nygren

2026-04-30

Chapter A02: Overview of Estimation Procedures

Introduction

1. rglmb/rlmb, pfamilies, and the Simulation Architecture

1.1 How glmb and lmb use rglmb and rlmb respectively

1.2 pfamilies: the Bayesian analogue of family()

1.3 The uniform simfunction interface

1.4 How rglmb and rlmb select the simfunction

1.5 The four core simfunctions

1.6 Inspecting the Simulation Function Used: the simfunction() Generic

1.7 Summary

2. Mapping Families and Pfamilies to Simulation Functions

2.1 Mappings to Simulation Functions

2.2 Mapping Simulation Functions to Primary C++ Routines

3. Conjugate vs. Envelope-Based Sampling

3.1 Conjugate Cases

3.2 Envelope-Based Cases

4. Theoretical Foundations and Cross-References

5. Overview of Each Simulation Function

5.1 rNormal_reg

5.2 rNormalGamma_reg

5.3 rindepNormalGamma_reg

5.4 rGamma_reg

6. Use in Gibbs Samplers and Custom Algorithms

7. Source Code and File Structure

7.1 C and C++ Backends Used by the Simfunctions

7.2 Utility Functions

8. Summary

References

1.6 Inspecting the Simulation Function Used: the `simfunction()` Generic