Gamma regression models are used when the response variable is
strictly positive and right‑skewed.
They form a standard class of GLMs for such responses (Nelder and Wedderburn 1972; McCullagh and Nelder 1989; Agresti 2015); typical applications include insurance claim severities and other positive cost or duration outcomes.
In classical statistics, these models are fit using
glm(…, family = Gamma(link = …)).
In glmbayes, the Bayesian analogue is
glmb(…, family = Gamma(link = …), pfamily = dNormal(…))
or, when dispersion is unknown and modeled,
glmb(…, family = Gamma(link = …), pfamily = dNormal_Gamma(…))
or related pfamilies.
This chapter presents the fixed-dispersion Gamma model with a log link, its Bayesian formulation in glmbayes, and a worked example using the carinsca data.
Gamma regression models strictly positive, right-skewed
responses.
In a Gamma GLM with fixed dispersion, the response satisfies
\[ Y_i > 0, \qquad Y_i \sim \text{Gamma}(\text{mean}=\mu_i,\ \text{dispersion}=\phi), \]
where the variance is
\[ \mathrm{Var}(Y_i) = \phi\,\mu_i^2. \]
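To make this parameterization concrete, here is a small simulation sketch (an illustration only, using base R's shape–scale form with shape \(= 1/\phi\) and scale \(= \phi\mu\), which gives mean \(\mu\) and variance \(\phi\mu^2\)):

```r
set.seed(1)
phi <- 0.5   # dispersion
mu  <- 3     # mean
y   <- rgamma(1e5, shape = 1 / phi, scale = phi * mu)
c(sample_mean = mean(y), sample_var = var(y))
## Theoretical values: mean = 3, variance = phi * mu^2 = 4.5
```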
The mean is linked to a linear predictor through
\[ \eta_i = x_i^\top \beta, \qquad \mu_i = g^{-1}(\eta_i). \]
In this chapter we focus exclusively on the log
link, which is the standard choice for Gamma GLMs and the only
link supported by glmb() when dispersion is fixed.
Let \(w_i\) denote observation
weights (e.g., claim counts or exposures).
Under the log link,
\[ \mu_i = e^{\eta_i}, \]
and the Gamma log-likelihood (up to constants) is
\[ \ell(\beta) = \sum_{i=1}^n w_i\left[ -\frac{1}{\phi}\,\eta_i - \frac{1}{\phi}\,y_i e^{-\eta_i} \right]. \]
Terms not involving \(\beta\) have
been absorbed into the constant.
This form is used by both glm() and the Bayesian functions
glmb() and rglmb() when dispersion is
fixed.
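As a concrete illustration (a minimal sketch, not the package's internal implementation), this weighted log-likelihood can be coded directly:

```r
## Weighted Gamma log-likelihood (up to constants) under the log link,
## with fixed dispersion phi; X is the design matrix, w the weights.
gamma_loglik <- function(beta, X, y, w, phi) {
  eta <- drop(X %*% beta)
  sum(w * (-eta / phi - y * exp(-eta) / phi))
}
```

Maximizing this function over \(\beta\) reproduces the coefficient estimates from glm(), since the dispersion enters only as a constant scale factor.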
The Gamma distribution belongs to the exponential family (McCullagh and Nelder 1989; Agresti 2015) with canonical parameter
\[ \theta_i = -\frac{1}{\mu_i}, \]
and cumulant function
\[ b(\theta_i) = -\log(-\theta_i). \]
The variance function is \(V(\mu_i) = \mu_i^2\), so that
\[ \mathrm{Var}(Y_i) = \phi\,\mu_i^2. \]
Under the log link, the canonical parameter is a
monotone transformation of the linear predictor, and the resulting
log-likelihood is log-concave in \(\eta_i\).
This property is essential for stable envelope construction and
accept–reject sampling in glmbayes.
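To see this concretely, differentiate a single observation's contribution twice with respect to \(\eta_i\):
\[ \frac{\partial^2}{\partial \eta_i^2}\left[-\frac{1}{\phi}\,\eta_i - \frac{1}{\phi}\,y_i e^{-\eta_i}\right] = -\frac{y_i}{\phi}\,e^{-\eta_i} < 0, \]
since \(y_i > 0\) and \(\phi > 0\). Each term is therefore strictly concave in \(\eta_i\), and concavity is preserved under the linear map \(\eta = X\beta\).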
The log link is defined by
\[ g(\mu_i) = \log(\mu_i), \qquad \mu_i = e^{\eta_i}. \]
It is the preferred link for Gamma GLMs because it guarantees \(\mu_i > 0\) for any \(\eta_i\), yields multiplicative covariate effects, and preserves log-concavity of the likelihood; it is also the only link supported by glmb() when dispersion is fixed. Other links (inverse, identity) do not preserve log-concavity and are
therefore not used in this chapter.
Models with estimated dispersion or non-log
links are covered in Chapter 11.
With the log link, the weighted log-likelihood is
\[ \ell(\beta) = \sum_{i=1}^n w_i\left[ -\frac{1}{\phi}\,\eta_i - \frac{1}{\phi}\,y_i e^{-\eta_i} \right]. \]
A multivariate normal prior is specified as
\[ \log p(\beta) = -\tfrac{1}{2}(\beta - \mu_0)^\top \Sigma_0^{-1}(\beta - \mu_0) + \text{const}. \]
Combining likelihood and prior yields the log-posterior:
\[ \log p(\beta \mid y) = \sum_{i=1}^n w_i\left[ -\frac{1}{\phi}\,\eta_i - \frac{1}{\phi}\,y_i e^{-\eta_i} \right] - \tfrac{1}{2}(\beta - \mu_0)^\top \Sigma_0^{-1}(\beta - \mu_0) + \text{const}. \]
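As an illustration (a minimal sketch of this log-posterior, not the sampler's internal code), assuming fixed \(\phi\) and prior parameters mu0 and Sigma0:

```r
## Log-posterior (up to a constant) for the Gamma log-link model
## with a fixed-dispersion normal prior N(mu0, Sigma0) on beta.
log_post <- function(beta, X, y, w, phi, mu0, Sigma0) {
  eta      <- drop(X %*% beta)
  loglik   <- sum(w * (-eta / phi - y * exp(-eta) / phi))
  d        <- beta - mu0
  logprior <- -0.5 * drop(t(d) %*% solve(Sigma0, d))
  loglik + logprior
}
```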
Because both terms are concave in \(\beta\), the posterior is log-concave, enabling efficient iid sampling via the envelope-based accept–reject sampler implemented in glmbayes.

## 3. Link Functions for Gamma GLMs
The link function relates the mean \(\mu\) to the linear predictor \(\eta = X\beta\):
\[ g(\mu) = \eta. \]
The Gamma family in base R supports:
| Link | Formula | Notes |
|---|---|---|
| inverse | \(\eta = 1/\mu\) | canonical link |
| identity | \(\eta = \mu\) | must ensure \(\mu > 0\) |
| log | \(\eta = \log(\mu)\) | most common; ensures \(\mu > 0\) |
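These links can be inspected directly from the family object in base R:

```r
fam <- Gamma(link = "log")
fam$linkfun(2)    # g(mu)       = log(mu)
fam$linkinv(0.7)  # g^{-1}(eta) = exp(eta)
```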
In practice, the log link is usually preferred because it keeps the mean positive and yields multiplicative covariate effects. At present, due to the need for log-concavity, only the log link is implemented in glmb().
We now consider a Gamma regression example based on the carinsca
dataset.
The goal is to model average claim cost as a function of rating
variables Merit and Class, using claim counts as weights.
We begin by preparing the data and setting appropriate factor levels and contrasts.
data(carinsca)
carinsca$Merit <- ordered(carinsca$Merit)
carinsca$Class <- factor(carinsca$Class)
oldopt <- options(contrasts = c("contr.treatment", "contr.treatment"))
Claims <- carinsca$Claims
Insured <- carinsca$Insured
Merit <- carinsca$Merit
Class <- carinsca$Class
Cost <- carinsca$Cost

The response Cost/Claims represents the average claim
cost (severity) per claim.
The weights Claims reflect the number of claims on which each average is
based, improving efficiency and aligning with standard GLM practice for
rate or average models.
We fit a classical Gamma GLM with a log link. As glm()
only crudely estimates the dispersion, we use
MASS::gamma.dispersion() (Venables
and Ripley 2002) and pass the estimate to
summary().
out <- glm(Cost/Claims ~ Merit + Class,
family = Gamma(link = "log"),
weights = Claims,
x = TRUE)
## Estimate the dispersion using MLE
disp <- MASS::gamma.dispersion(out)
summary(out, dispersion = disp)
#>
#> Call:
#> glm(formula = Cost/Claims ~ Merit + Class, family = Gamma(link = "log"),
#> weights = Claims, x = TRUE)
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -1.17456 0.01195 -98.286 < 2e-16 ***
#> Merit1 -0.06867 0.02008 -3.420 0.000626 ***
#> Merit2 -0.07025 0.02238 -3.138 0.001700 **
#> Merit3 -0.05673 0.01254 -4.523 6.08e-06 ***
#> Class2 0.08274 0.02031 4.073 4.64e-05 ***
#> Class3 0.01583 0.01410 1.123 0.261529
#> Class4 0.15981 0.01494 10.698 < 2e-16 ***
#> Class5 -0.08142 0.03005 -2.709 0.006744 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for Gamma family taken to be 7.840919)
#>
#> Null deviance: 1556.0 on 19 degrees of freedom
#> Residual deviance: 156.9 on 12 degrees of freedom
#> AIC: -2999165
#>
#> Number of Fisher Scoring iterations: 4

The summary shows that all Merit levels and most Class levels have statistically significant effects (Class3 does not), with the dispersion estimated at about 7.84.
Under the log link, exp(\(\beta_{j}\)) can be interpreted as a multiplicative factor on the expected average claim cost for a one‑unit change in the corresponding covariate (or a change in factor level).
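For example, the implied multiplicative factors from the classical fit can be computed directly:

```r
exp(coef(out))  # multiplicative effects on expected average claim cost
```

Here exp(0.15981) ≈ 1.17 for Class4, i.e., roughly 17% higher expected average claim cost than the baseline class.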
To fit the Bayesian analogue, we proceed in two steps: we first fix the dispersion at its maximum likelihood estimate, and then construct a prior for the coefficients. The gamma.dispersion() helper estimates the dispersion parameter from a fitted classical Gamma GLM:
## better than the crude estimate from the summary function
disp <- MASS::gamma.dispersion(out)
disp
#> [1] 7.840919

This value will be treated as known in our Bayesian model, so that the Bayesian and classical fits are directly comparable in terms of how they treat \(\phi\).
Next we use Prior_Setup() to construct a Zellner‑type
g‑prior (Griffin and Brown 2010) for the
coefficients, aligned with the Gamma log‑link model and weighting
structure:
ps <- Prior_Setup(Cost/Claims ~ Merit + Class,
family = Gamma(link = "log"),
weights = Claims)
#> Using default pwt = 0.01 (low-d default).
mu <- ps$mu
V <- ps$Sigma

The setup object supplies a prior mean vector (mu) and a prior covariance matrix (Sigma), which we extract for use in the prior family below.
With the default pwt = 0.01, the prior is weakly informative, and the posterior estimates typically remain close to the MLEs.
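The implied prior standard deviations can be inspected directly, and should match the Prior.sd column in the Bayesian summary shown below:

```r
sqrt(diag(V))  # prior standard deviations implied by Sigma
```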
We now call glmb(), specifying the same model and likelihood as in the classical fit, but adding a dNormal prior family that fixes dispersion at the value estimated above:
out_glmb <- glmb(Cost/Claims ~ Merit + Class,
family = Gamma(link = "log"),
pfamily = dNormal(mu = mu, Sigma = V, dispersion = disp),
weights = Claims)

The arguments mirror the glm() call, with pfamily providing the prior: dNormal(mu, Sigma, dispersion) places a multivariate normal prior on the coefficients and holds the dispersion fixed at disp.
Bayesian summary:
summary(out_glmb)
#> Call
#> glmb(formula = Cost/Claims ~ Merit + Class, family = Gamma(link = "log"),
#> pfamily = dNormal(mu = mu, Sigma = V, dispersion = disp),
#> weights = Claims)
#>
#> Expected Residuals:
#> Min 1Q Median 3Q Max
#> -5.9889656 -2.1503982 -0.3464031 2.2958381 6.3430236
#>
#> Prior and Maximum Likelihood Estimates with Standard Deviations
#>
#> Null Mode Prior Mean Prior.sd Max Like. Like.sd
#> (Intercept) -1.18283 -1.20215 0.15462 -1.17456 0.016
#> Merit1 0.00000 0.00000 0.25980 -0.06867 0.026
#> Merit2 0.00000 0.00000 0.28961 -0.07025 0.029
#> Merit3 0.00000 0.00000 0.16226 -0.05673 0.016
#> Class2 0.00000 0.00000 0.26281 0.08274 0.026
#> Class3 0.00000 0.00000 0.18245 0.01583 0.018
#> Class4 0.00000 0.00000 0.19328 0.15981 0.019
#> Class5 0.00000 0.00000 0.38885 -0.08142 0.039
#>
#> Bayesian Estimates Based on 1000 iid draws
#>
#> Post.Mode Post.Mean Post.Sd MC Error Pr(Null_tail) SE(tail)
#> (Intercept) -1.1747274 -1.1746675 0.0117659 0.0003721 0.2420000 0.003
#> Merit1 -0.0682525 -0.0678516 0.0206808 0.0006540 0.0000000 0.000
#> Merit2 -0.0698241 -0.0698158 0.0225002 0.0007115 0.0010000 0.001
#> Merit3 -0.0563841 -0.0561201 0.0126068 0.0003987 0.0000000 0.000
#> Class2 0.0822404 0.0820272 0.0198569 0.0006279 0.0000000 0.000
#> Class3 0.0157394 0.0155515 0.0137728 0.0004355 0.1340000 0.011
#> Class4 0.1588636 0.1585185 0.0147610 0.0004668 0.0000000 0.000
#> Class5 -0.0809389 -0.0809062 0.0299447 0.0009469 0.0050000 0.002
#> Pr(Prior_tail)
#> (Intercept) 0.009 **
#> Merit1 <2e-16 ***
#> Merit2 0.001 **
#> Merit3 <2e-16 ***
#> Class2 <2e-16 ***
#> Class3 0.134
#> Class4 <2e-16 ***
#> Class5 0.005 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Directional Tail Summaries:
#>
#> Metric vs Null vs Prior
#> Mahalanobis Distance 16.6510 15.8459
#> Tail Probability 0.0000 0.0000
#> [Tail probabilities are P(delta^T * Z <= 0) in whitened space]
#>
#>
#> Distribution Percentiles
#>
#> 1.0% 2.5% 5.0% Median 95.0% 97.5% 99.0%
#> (Intercept) -1.201518 -1.196129 -1.193432 -1.174957 -1.155021 -1.149703 -1.146
#> Merit1 -0.121085 -0.109264 -0.102227 -0.067554 -0.035026 -0.029120 -0.023
#> Merit2 -0.118013 -0.113105 -0.106580 -0.070020 -0.033930 -0.026077 -0.019
#> Merit3 -0.085389 -0.082037 -0.078022 -0.055310 -0.035998 -0.032365 -0.028
#> Class2 0.037135 0.043259 0.048983 0.082879 0.114347 0.120577 0.128
#> Class3 -0.015756 -0.011230 -0.007122 0.015346 0.037787 0.041287 0.049
#> Class4 0.124130 0.129555 0.135374 0.158407 0.183404 0.187520 0.193
#> Class5 -0.147714 -0.137897 -0.130838 -0.081565 -0.029834 -0.019777 -0.011
#>
#> Effective Number of Parameters: 8.057135
#> Expected Residual Deviance: 220.2001
#> DIC: -107.1082
#>
#> Expected Mean dispersion: 7.840919
#> Sq.root of Expected Mean dispersion: 2.800164
#>
#> Mean Likelihood Subgradient Candidates Per iid sample: 2.562
options(oldopt)

The Bayesian summary reports posterior modes, means, and standard deviations alongside the prior and maximum likelihood estimates, together with Monte Carlo errors, tail probabilities, distribution percentiles, DIC, and sampler diagnostics.
Because dispersion is fixed at the same value in both models, differences between out and out_glmb reflect primarily the influence of the prior on \(\beta\), not differences in \(\phi\).
Under the log link, coefficient interpretations remain multiplicative, just as in the classical fit: exponentiating a posterior summary of \(\beta_j\) gives the factor by which the expected average claim cost changes.
With a weak prior, posterior means in out_glmb will be close to the MLEs in out, but with slightly more regularization and smaller posterior standard deviations.
By fixing dispersion at disp from the classical model, the Bayesian and classical fits share the same assumed level of variability.
This allows a direct comparison of the classical and Bayesian coefficient estimates on the same dispersion scale, as sketched below.
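For instance, a side-by-side sketch of the estimates (assuming coef() extracts posterior means from a glmb fit; if the extractor differs, the posterior draws can be averaged instead):

```r
## Illustrative comparison of MLEs and Bayesian posterior means
cbind(MLE = coef(out), Bayes = coef(out_glmb))
```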
If desired, a more advanced model can estimate dispersion by switching to a pfamily such as dNormal_Gamma, but that is covered in more detail in the dispersion‑focused vignettes.
The Bayesian summary of out_glmb also reports the effective number of parameters, the expected residual deviance, and the DIC.
These can be compared to AIC and deviance from the classical model, especially when considering alternative specifications (for example, reduced models, different covariate sets, or alternative link functions).
Gamma regression provides a flexible framework for modeling positive,
skewed response variables (McCullagh and Nelder
1989; Gelman et al. 2013).
In this chapter, we have formulated the fixed-dispersion Gamma GLM with a log link, fit classical and Bayesian versions to the carinsca data, and compared their estimates and fit summaries. In later vignettes, we extend these ideas to models with estimated dispersion (for example, via the dNormal_Gamma pfamily) and to alternative link functions.