Chapter A07: Accept–Reject Sampling for Gaussian Regression Models with Independent Normal–Gamma Priors

Kjell Nygren

2026-04-30

Envelope construction with independent Normal–Gamma prior (Gaussian regression)

This vignette documents the envelope construction when the prior for \(\beta\) is independent of the data dispersion. The regression coefficients have a fixed Normal prior with covariance \(\Sigma_0\), and the data precision \(\tau\) has its own Gamma prior.

1. Model Setup (Gaussian Regression with Independent Normal–Gamma Prior)

We consider the weighted Gaussian regression model

\[ y_i \mid \beta, \phi \;\sim\; N(x_i^\top\beta,\; \phi / w_i), \qquad i = 1,\ldots,n, \]

with observation weights \(w_i > 0\).
Let the precision be

\[ \tau = \frac{1}{\phi}. \]

Define the diagonal weight matrix

\[ W = \mathrm{diag}(w_1,\ldots,w_n), \qquad n_w = \sum_{i=1}^n w_i. \]


1.1 Weighted log‑likelihood

Define the weighted residual sum of squares

\[ \mathrm{RSS}(\beta) = \sum_{i=1}^n w_i (y_i - x_i^\top\beta)^2. \]

The weighted Gaussian log‑likelihood (up to constants) is

\[ \ell(\beta,\phi) = -\frac{1}{2\phi}\,\mathrm{RSS}(\beta) \;-\; \frac{n_w}{2}\log\phi \;+\; \text{const}. \]

In precision form \(\tau = 1/\phi\):

\[ \ell(\beta,\tau) = -\frac{\tau}{2}\,\mathrm{RSS}(\beta) \;+\; \frac{n_w}{2}\log\tau \;+\; \text{const}. \]

For each fixed \(\tau\) this expression is concave in \(\beta\), and for each fixed \(\beta\) it is concave in \(\tau\).


1.2 Independent Normal–Gamma prior

Chapter A07 uses the independent Normal–Gamma prior:

\[ \beta \sim N(\mu_0,\Sigma_0), \qquad \tau \sim \text{Gamma}(a_0,b_0), \]

with log‑prior

\[ \log p(\beta,\tau) = -\tfrac12(\beta-\mu_0)^\top\Sigma_0^{-1}(\beta-\mu_0) + (a_0 - 1)\log\tau - b_0\tau + \text{const}. \]

For numerical stability and envelope construction, the Gamma prior on \(\tau\) is truncated to

\[ \tau \in [\tau_{\min}, \tau_{\max}], \]

equivalently

\[ \phi \in [\phi_{\min}, \phi_{\max}], \qquad \phi = \frac{1}{\tau}. \]

The truncated prior density is

\[ p_{\text{trunc}}(\tau) = \frac{ b_0^{a_0}\,\tau^{a_0 - 1} e^{-b_0\tau} / \Gamma(a_0) }{ F_{\Gamma}(\tau_{\max};a_0,b_0) - F_{\Gamma}(\tau_{\min};a_0,b_0) }, \qquad \tau_{\min} \le \tau \le \tau_{\max}. \]
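For concreteness, a minimal R sketch of this truncated log‑density (the function name and vectorization are ours; only base‑R dgamma/pgamma are used):

    # Log-density of a Gamma(a0, b0) prior truncated to [tau_min, tau_max],
    # normalized by the Gamma CDF mass on the truncation interval.
    log_dtrunc_gamma <- function(tau, a0, b0, tau_min, tau_max) {
      log_mass <- log(pgamma(tau_max, shape = a0, rate = b0) -
                      pgamma(tau_min, shape = a0, rate = b0))
      ifelse(tau >= tau_min & tau <= tau_max,
             dgamma(tau, shape = a0, rate = b0, log = TRUE) - log_mass,
             -Inf)
    }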


2. Full Joint Log‑Posterior (Gaussian Case, Weighted)

Combining the weighted log‑likelihood and the independent Normal–Gamma prior gives

\[ \log p(\beta,\tau \mid y) = \ell(\beta,\tau) -\tfrac12(\beta-\mu_0)^\top\Sigma_0^{-1}(\beta-\mu_0) + (a_0 - 1)\log\tau - b_0\tau + \text{const}. \]

Substituting the likelihood expression:

\[ \log p(\beta,\tau \mid y) = -\frac{\tau}{2}\,\mathrm{RSS}(\beta) \;+\; \frac{n_w}{2}\log\tau -\tfrac12(\beta-\mu_0)^\top\Sigma_0^{-1}(\beta-\mu_0) + (a_0 - 1)\log\tau - b_0\tau + \text{const}. \]

Collecting terms in \(\tau\):

\[ \log p(\beta,\tau \mid y) = -\left( \frac{\mathrm{RSS}(\beta)}{2} + b_0 \right)\tau \;+\; \left( a_0 - 1 + \frac{n_w}{2} \right)\log\tau -\tfrac12(\beta-\mu_0)^\top\Sigma_0^{-1}(\beta-\mu_0) + \text{const}. \]

This joint log‑posterior is concave in \(\beta\) for each fixed \(\tau\) and concave in \(\tau\) for each fixed \(\beta\). This conditional concavity is the key property enabling the envelope‑based accept–reject sampler developed in the remainder of Chapter A07.
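As a concreteness check, the collected form above can be evaluated directly; the following minimal R sketch does so up to the additive constant (function and argument names are ours):

    # Joint log-posterior in (beta, tau), up to an additive constant,
    # following the collected form above.
    log_post <- function(beta, tau, y, X, w, mu0, Sigma0_inv, a0, b0) {
      rss <- sum(w * (y - drop(X %*% beta))^2)   # weighted RSS(beta)
      nw  <- sum(w)                              # n_w = sum of the weights
      -(rss / 2 + b0) * tau +
        (a0 - 1 + nw / 2) * log(tau) -
        0.5 * drop(t(beta - mu0) %*% Sigma0_inv %*% (beta - mu0))
    }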

3. General Sampling Process

  1. Fit a classical model and extract an estimate for the dispersion
    • Use a standard regression fit (e.g., lm()) to obtain initial coefficient estimates and the residual sum of squares (RSS).
    • This provides a first estimate of the dispersion parameter.
  2. Fit Bayesian models with fixed but gradually refined dispersion parameters
    • Each refinement updates the surrogate Gamma posterior for the precision,

\[ \mathrm{shape}_{2} = \mathrm{Shape} + \tfrac{n_{w}}{2}, \qquad \mathrm{rate}_{2} = \mathrm{Rate} + \tfrac{\mathrm{RSS}_{\text{post}}}{2}, \]

where
\[ \mathrm{RSS}_{\text{post}} = \mathbb{E}\!\left[\sum_{i=1}^n w_i (y_i - x_i^\top\beta)^2 \,\middle|\, y,\phi\right] = \sum_{i=1}^n w_i\,(y_i - x_i^\top m_\beta)^2 \;+\; \mathrm{tr}\!\bigl(X^\top W X\,\Sigma_\beta\bigr), \]

with \(m_\beta = \mathbb{E}[\beta \mid y,\phi]\) and \(\Sigma_\beta = \mathrm{Var}(\beta \mid y,\phi)\).
    • The calibrated dispersion anchor is

\[ d^{*}_{1} = \frac{\operatorname{rate}_{2}}{\operatorname{shape}_{2} - 1}. \]

Here \(\mathrm{shape}_2\) and \(\mathrm{rate}_2\) use the weighted likelihood via \(n_{w}\) and the weighted RSS.
Thus \(d^{*}_{1}\) is the mean of the (inverse‑Gamma) posterior of the weighted dispersion. (A short R sketch of steps 1–2 follows this list.)
  3. Find the posterior mode for the model with calibrated dispersion \(d^{*}_{1}\)
    • With the refined dispersion, locate the posterior mode of the regression coefficients.
    • This mode serves as the anchor point for subsequent standardization.
  4. Standardize the model
    • Re‑express the model using the calibrated dispersion and posterior mode.
    • Apply a linear transformation and, as needed, shift parts of the prior into the likelihood so that the standardized model has:
      • a diagonal posterior precision matrix, and
      • a standard multivariate normal prior (which by definition has zero mean and identity covariance).
    • This ensures numerical stability and simplifies envelope construction.
  5. Build an envelope based on this standardized model (treating dispersion as fixed)
    • Call EnvelopeBuild() to construct an envelope over the regression coefficients.
    • At this stage, dispersion is treated as fixed at \(d^{*}_{1}\).
  6. Build an envelope that accounts for the prior on dispersion as well
    • Keep the number of grid components and their gradient vectors fixed across dispersion values.
      This ensures the conditional proposal density for the regression coefficients is always a truncated normal with
      • mean vector given by the negative gradient \(-\bar{c}_j\), and
      • bounds determined by \(\exp(\text{loglt}_{j,r})\) and \(\exp(\text{logrt}_{j,r})\).
    • Construct a Gamma proposal for the dispersion (precision \(\tau\)) with updated parameters:
      • shape parameter

\[ \mathrm{shape}_{3} = \mathrm{Shape} + \tfrac{n_{w}}{2} - \mathrm{lmc}_2\, d^{*}_{2}, \]

where \(\tfrac{n_{w}}{2}\) is the likelihood contribution to the posterior shape and \(\mathrm{lmc}_2\, d^{*}_{2} = \mathrm{lm\_log2}\) lowers the shape so that the proposal dominates the posterior;
      • rate parameter

\[ \mathrm{rate}_{3} = \mathrm{Rate} + \tfrac{\mathrm{RSS\_Min}}{2}, \]

where \(\mathrm{RSS\_Min}\) is the global minimum residual sum of squares across faces and dispersion values, used as a conservative bound on the residual term that varies across tangency points.
    • Construct the per‑face subgradient selection probabilities (PLSD) for each grid component \(j\) as follows:

\[ \text{PLSD}_j = \frac{ \left[ \prod_{r=1}^p \left(\exp(\text{logrt}_{j,r}) - \exp(\text{loglt}_{j,r})\right) \right] \cdot \exp\left( \mathrm{lg\_prob\_factor2}_{j} \;+\;\tfrac{1}{2}\bar{c}_j^{\top}\bar{c}_j \right) }{ \sum_{k=1}^{K} \left[ \prod_{r=1}^p \left(\exp(\text{logrt}_{k,r}) - \exp(\text{loglt}_{k,r})\right) \right] \cdot \exp\left( \mathrm{lg\_prob\_factor2}_{k} \;+\;\tfrac{1}{2}\bar{c}_k^{\top}\bar{c}_k \right) } \]

This expression mirrors the mixture weights \(\tilde{p}_j\) in Claim 1 of Nygren and Nygren (2006).
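The following R sketch illustrates steps 1–2 under simplifying assumptions: the classical weighted RSS is used as a plug‑in stand‑in for \(\mathrm{RSS}_{\text{post}}\), and a0, b0 denote the prior Gamma parameters (variable names are ours):

    # Step 1: classical weighted fit and residual sum of squares.
    fit <- lm(y ~ X - 1, weights = w)   # X assumed to be a model matrix
    rss <- sum(w * residuals(fit)^2)    # classical weighted RSS
    nw  <- sum(w)

    # Step 2: surrogate Gamma posterior for the precision and the anchor d*_1.
    # (The implementation uses RSS_post, which adds tr(X'WX Sigma_beta);
    #  the classical RSS is a plug-in stand-in here.)
    shape2  <- a0 + nw / 2
    rate2   <- b0 + rss / 2
    d_star1 <- rate2 / (shape2 - 1)     # mean of the inverse-Gamma posterior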

4. Main Proposition and Supporting Claims

The central goal of this section is to prepare for an accept–reject sampling algorithm for the joint posterior \(\pi(\beta,d)\).
In such algorithms, we need two ingredients:

  1. Proposal distributions — simple, tractable distributions from which we can easily generate candidate values for \((\beta,d)\).
  2. Correction terms — adjustments that ensure the proposals form valid upper bounds on the true posterior, so that the accept–reject step is mathematically justified.

To make this work, we will express the log‑posterior in a form that separates these two roles:
- the proposal part, which generates candidates, and
- the correction part, which determines the acceptance probability.

Before we can state the main proposition that formalizes this decomposition, we need to introduce the constants and auxiliary quantities that appear in both the proposals and the corrections.

By laying out these constants first, we ensure that every symbol used in the proposition and its supporting claims is defined explicitly and consistently. The table below serves as a reference point for all subsequent derivations.

Constant Definition Description
Posterior Gamma parameters \(\text{shape}_2 = \text{Shape} + \tfrac{n_{w}}{2}\), \(\text{rate}_2 = \text{Rate} + \mathrm{RSS}_{\text{post}}/2\) Posterior shape and rate parameters.
Envelope dispersion anchor \(d^{*}_{1} = \dfrac{\text{rate}_2}{\text{shape}_2 - 1}\) Dispersion used for baseline envelope.
Dispersion bounds \(\begin{aligned} & \text{low} \\[4pt] & \text{upp} \end{aligned}\) Lower and upper bounds for dispersion.
Tangency offset vector \(B(d) = P\mu + \dfrac{1}{d}\,X^\top W(\alpha - y)\) Offset vector used in the inverse map \(c^{-1}(\bar c, d)\).
Tangency slope components \(\begin{aligned} Q &= X^\top X \\ A(d) &= Q + dP \\ r &= X^\top(y-\alpha) \\ V(d) &= \bar c_j - P\,A(d)^{-1}(r + d\,\bar c_j) \end{aligned}\) Components for quad–linear slope term.
Tangency inverse map \(c^{-1}(\bar c, d) = A(d)^{-1}(\bar c - B(d))\) Maps gradient vector \(\bar c\) and dispersion \(d\) to tangency point \(\theta\).
Tangency map \(\theta(d) = c^{-1}(\bar c_j, d)\) Tangency point in coefficient space at dispersion \(d\).
Tangency face energy \(g_{1j}(d) = -\tfrac12\,\theta(d)^\top P\,\theta(d) + \bar c_j^\top \theta(d)\) Quadratic–linear face energy at dispersion \(d\).
Baseline face constant \(g_{1j}(d^{*}_{1})\) Face‑specific constant obtained by evaluating \(g_{1j}(d)\) at the anchor \(d^{*}_{1}\).
Derivative of face energy \(\begin{aligned} & g'_{1j}(d) = V(d)^\top A(d)^{-1}\,\bar c_j \\[6pt] & \qquad -\, \Big( V(d)^\top A(d)^{-1} P\,A(d)^{-1} \\[2pt] & \qquad\qquad\qquad\;\;\times \big(r + d\,\bar c_j\big) \Big) \end{aligned}\) Derivative of \(g_{1j}(d)\) with respect to dispersion.
Derivative at the anchor \(\begin{aligned} & g'_{1j}(d^{*}_{1}) = V(d^{*}_{1})^\top A(d^{*}_{1})^{-1}\,\bar c_j \\[6pt] & \qquad -\, \Big( V(d^{*}_{1})^\top A(d^{*}_{1})^{-1} P\,A(d^{*}_{1})^{-1} \\[2pt] & \qquad\qquad\qquad\;\;\times \big(r + d^{*}_{1}\,\bar c_j\big) \Big) \end{aligned}\) Value of the derivative at the envelope anchor \(d^{*}_{1}\).
Mean quad–linear slope \(\mathrm{m}_{g'_{1}} = \displaystyle \operatorname*{mean}\limits_{j}\!\big(g'_{1j}(d^{*}_{1})\big)\) Average derivative \(g'_{1j}(d^{*}_{1})\) across faces.
Supporting line for face \(j\) \(g_{2j}(d) = g_{1j}(d^{*}_{1}) + (d - d^{*}_{1})\,g'_{1j}(d^{*}_{1})\) Linear supporting line of \(g_{1j}(d)\) at the anchor \(d^{*}_{1}\).
Extrapolated face constants \(\begin{aligned} & g_{2j}(\text{upp}) = g_{1j}(d^{*}_{1}) \\[2pt] & \qquad\qquad\;\; +\, (\text{upp}-d^{*}_{1})\, g'_{1j}(d^{*}_{1}) \\[6pt] & g_{2j}(\text{low}) = g_{1j}(d^{*}_{1}) \\[2pt] & \qquad\qquad\;\; +\, (\text{low}-d^{*}_{1})\, g'_{1j}(d^{*}_{1}) \end{aligned}\) Linear extrapolations to dispersion bounds.
Endpoint maxima \(\begin{aligned} \mathrm{max\_upp} & = \max_j\!\big(g_{2j}(\text{upp})\big) \\[8pt] \mathrm{max\_low} & = \max_j\!\big(g_{2j}(\text{low})\big) \end{aligned}\) Maxima at upper and lower dispersion bounds.
Mean lower-bound maximum \(\begin{aligned} &\mathrm{max\_low\_mean} \\[4pt] &= \mathrm{max\_upp} \\[4pt] &\quad - \mathrm{m}_{g'_{1}}\,(\text{upp}-\text{low}) \end{aligned}\) Linearized lower-bound maximum.
Global line parameters \(\text{lmc}_2 = \dfrac{\mathrm{max\_upp} - \mathrm{max\_low\_mean}}{\text{upp}-\text{low}}\), \(\text{lmc}_1 = \mathrm{max\_low\_mean} - \text{lmc}_2\,\text{low}\) Slope and intercept of global affine bound.
Log–linear anchor \(d^{*}_{2} = \dfrac{\text{upp}-\text{low}}{\log(\text{upp}/\text{low})}\) Anchor point for log–tilt.
Log–tilt coefficients \(\begin{aligned} & \mathrm{lm\_log2} = \\[4pt] & \text{lmc}_2\,d^{*}_{2} \\[4pt] & \mathrm{lm\_log1} = \\[4pt] & \text{lmc}_1 + \text{lmc}_2 d^{*}_{2} - \text{lmc}_2\log(d^{*}_{2}) \\[4pt] & \mathrm{max\_LL\_log\_disp} =\\[4pt] & \mathrm{lm\_log1} + \mathrm{lm\_log2}\log(\text{upp}) \end{aligned}\) Coefficients for log–tilt bounding function.
Face-specific RSS \(\mathrm{RSS}_j(d) = \sum_{i=1}^n w_i\,(y_i - x_i^\top c^{-1}(\bar c_j,d))^2\) Residual sum of squares for face \(j\) at dispersion \(d\).
Global minimum RSS \(\mathrm{RSS\_Min} = \min_{j}\;\min_{d\in[\text{low},\,\text{upp}]}\;\mathrm{RSS}_j(d)\) Global minimum RSS across all faces and dispersion values.
UB2 term \(\mathrm{UB2}_j(d) = \dfrac{1}{2d}\big(\mathrm{RSS}_j(d) - \mathrm{RSS\_Min}\big)\) Nonnegative UB2 bound for face \(j\).
Per-face UB2 minimum \(\mathrm{UB2\_Min}_j = \min_{d\in[\text{low},\,\text{upp}]}\mathrm{UB2}_j(d)\) Minimum UB2 value for face \(j\).
Per‑face shift (UB3A) \(\begin{aligned} & \mathrm{lg\_prob\_factor1}_{j} = \\[4pt] & \max\Big\{\, g_{2j}(\text{upp}) - \mathrm{max\_upp}, \\[-2pt] & \qquad\;\; g_{2j}(\text{low}) - \mathrm{max\_low} \Big\} \end{aligned}\) Raw per‑face shift used in UB3A construction.
Per‑face shift (PLSD) \(\begin{aligned} & \mathrm{lg\_prob\_factor2}_{j} = \\[4pt] & \mathrm{lg\_prob\_factor1}_{j} \;-\; \mathrm{UB2\_Min}_{j} \end{aligned}\) UB2‑adjusted shift used in PLSD mixture weights.
Global affine bound \(g3_j\) \(\displaystyle g3_{j}(d) = \mathrm{lg\_prob\_factor1}_{j} + \mathrm{lmc}_1 + \mathrm{lmc}_2\, d\) Global affine upper bound for the quadratic–linear face energy.
Gamma proposal parameters \(\text{shape}_3 = \text{shape}_2 - \mathrm{lm\_log2}\), \(\text{rate}_3 = \text{Rate} + \mathrm{RSS\_Min}/2\) Adjusted Gamma proposal parameters.

Remark 4.1.1
The envelope uses the dispersion variable \(d \equiv \phi = 1/\tau\), where \(\tau\) is the precision appearing in Sections 1.1–1.2. When the user does not supply bounds, the default implementation truncates the surrogate posterior
\[ \tau \mid y \sim \Gamma(\text{shape}_2,\text{rate}_2) \] by selecting central quantiles at level \(\mathrm{max\_disp\_perc}\) (default \(0.99\)): \[ \tau_{\min} = \mathrm{qgamma}(1-\mathrm{max\_disp\_perc},\,\text{shape}_2,\,\text{rate}_2), \qquad \tau_{\max} = \mathrm{qgamma}(\mathrm{max\_disp\_perc},\,\text{shape}_2,\,\text{rate}_2). \] The dispersion bounds used in the envelope are then \[ \text{low} = \frac{1}{\tau_{\max}}, \qquad \text{upp} = \frac{1}{\tau_{\min}}. \]

These bounds exclude only the far tails of the surrogate Gamma posterior for \(\tau\) while still restricting \(d\) enough for the global log‑tilt bound to dominate the linear supporting line \(g_{2j}(d)\) across the entire interval.
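A minimal R sketch of this default truncation, reusing shape2 and rate2 from the sketch in Section 3 (variable names are ours):

    # Default dispersion bounds: central quantiles of the surrogate Gamma
    # posterior for tau, mapped to the dispersion scale d = 1/tau.
    max_disp_perc <- 0.99
    tau_min <- qgamma(1 - max_disp_perc, shape = shape2, rate = rate2)
    tau_max <- qgamma(max_disp_perc,     shape = shape2, rate = rate2)
    low <- 1 / tau_max   # lower dispersion bound
    upp <- 1 / tau_min   # upper dispersion bound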

Remark 4.1.2
By construction, the global affine bound satisfies \[ \mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{low} = \mathrm{max\_low}, \qquad \mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{upp} = \mathrm{max\_upp}. \]

Proof of Remark 4.1.2

From the definitions of the global affine bound, \[ \mathrm{lmc}_2 = \frac{\mathrm{max\_upp} - \mathrm{max\_low}}{\text{upp} - \text{low}}, \qquad \mathrm{lmc}_1 = \mathrm{max\_low} - \mathrm{lmc}_2\,\text{low}, \] we verify the endpoint identities as follows.

At \(d = \text{low}\), \[ \mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{low} = \big(\mathrm{max\_low} - \mathrm{lmc}_2\,\text{low}\big) + \mathrm{lmc}_2\,\text{low} = \mathrm{max\_low}. \]

At \(d = \text{upp}\), \[ \begin{aligned} \mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{upp} &= \mathrm{max\_low} - \mathrm{lmc}_2\,\text{low} + \mathrm{lmc}_2\,\text{upp} \\ &= \mathrm{max\_low} + \mathrm{lmc}_2(\text{upp} - \text{low}) \\ &= \mathrm{max\_low} + \frac{\mathrm{max\_upp} - \mathrm{max\_low}}{\text{upp} - \text{low}}(\text{upp} - \text{low}) \\ &= \mathrm{max\_upp}. \end{aligned} \]

Thus the global affine function \(d \mapsto \mathrm{lmc}_1 + \mathrm{lmc}_2 d\) interpolates the endpoint maxima: \[ \mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{low} = \mathrm{max\_low}, \qquad \mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{upp} = \mathrm{max\_upp}. \]

Remark 4.1.3

A requirement for the above to be a valid sampler is that \(\mathrm{lm\_log2}<\text{shape}_2\). We impose the stronger requirement that \(\mathrm{lm\_log2} \leq \tfrac{n_{w}}{2}\) in our implementation and adjust the global bound as needed while still maintaining validity of the sampler (see the implementation for details).

4.2 Proposal distributions

In the accept–reject scheme, proposals are the distributions we can sample from directly to generate candidate values. We use two proposals:

  1. A Gamma-based proposal for dispersion \(d\) (implemented as a truncated inverse‑Gamma in \(d\), equivalently a truncated Gamma in \(1/d\)).
  2. A mixture of truncated normal proposals for \(\beta\), sampled in two steps:
    • First, sample a face index \(j\) using the per‑face mixture weights \(\text{PLSD}_j\).
    • Then, conditional on \(j\), sample \(\beta\) from the face‑specific truncated normal.

These proposals are constructed to be tractable for sampling and to support tight bounds via the correction terms.


Gamma proposal in dispersion \(d\) (with truncation)

For dispersion \(d \in [\,\text{low},\,\text{upp}\,]\), define the truncated inverse‑Gamma proposal density (equivalently, a truncated Gamma in \(1/d\)):

\[ \begin{aligned} \log q_\Gamma^{\text{trunc}}(d) &= \Big(\text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}\Big)\,\log\!\Big(\text{Rate} + \tfrac{\mathrm{RSS\_Min}}{2}\Big) \\ &\quad - \log \Gamma\!\Big(\text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}\Big) \\[6pt] &\quad - \Big(\text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2} + 1\Big)\,\log d - \frac{\text{Rate} + \tfrac{\mathrm{RSS\_Min}}{2}}{d} \\[6pt] &\quad - \log\!\Bigg( F_\Gamma\!\Big(\tfrac{1}{\text{low}};\,\text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2},\, \text{Rate} + \tfrac{\mathrm{RSS\_Min}}{2}\Big) \\[6pt] &\quad -\; F_\Gamma\!\Big(\tfrac{1}{\text{upp}};\,\text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2},\, \text{Rate} + \tfrac{\mathrm{RSS\_Min}}{2}\Big) \Bigg), \end{aligned} \]

where \(F_\Gamma(\cdot;\,\text{shape},\,\text{rate})\) denotes the Gamma CDF in the shape–rate parameterization, evaluated at the precision \(1/d\).
Sampling proceeds by drawing \(u \sim \text{Uniform}(0,1)\) on the truncated CDF interval and inverting to obtain \(1/d\), then mapping to \(d\).
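A minimal R sketch of this inverse‑CDF draw (function name ours; \(\mathrm{shape}_3\) and \(\mathrm{rate}_3\) as in the constants table of Section 4):

    # Draw n dispersions from the truncated inverse-Gamma proposal:
    # tau = 1/d follows Gamma(shape3, rate3) restricted to [1/upp, 1/low].
    r_disp_proposal <- function(n, shape3, rate3, low, upp) {
      F_lo <- pgamma(1 / upp, shape = shape3, rate = rate3)
      F_hi <- pgamma(1 / low, shape = shape3, rate = rate3)
      u    <- runif(n, min = F_lo, max = F_hi)      # uniform on truncated CDF range
      1 / qgamma(u, shape = shape3, rate = rate3)   # invert, map tau back to d
    }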


Mixture of truncated normals for \(\beta\) (two-step sampling)

We construct a mixture over faces, each with a truncated normal component. Sampling is performed in two steps:

  1. Sample the face index \(j\) (mixture weights):

\[ \begin{aligned} \log \mathrm{PLSD}_j &= \mathrm{lg\_prob\_factor2}_j + \tfrac{1}{2}\,\bar c_j^\top \bar c_j + \sum_{r=1}^p \log\!\big(e^{\mathrm{logrt}_{j,r}} - e^{\mathrm{loglt}_{j,r}}\big) \\[6pt] &\quad - \log\!\left( \sum_{k=1}^K \exp\!\Big( \mathrm{lg\_prob\_factor2}_k + \tfrac{1}{2}\,\bar c_k^\top \bar c_k \Big) \prod_{r=1}^p \big(e^{\mathrm{logrt}_{k,r}} - e^{\mathrm{loglt}_{k,r}}\big) \right). \end{aligned} \]

  • Exponentiating and normalizing across \(j=1,\dots,K\) yields a valid categorical distribution for the face index.
  2. Sample \(\beta\) conditional on \(j\) (truncated normal): \[ \log q_{\text{TN}}(\beta\mid j) = -\tfrac{1}{2}\,\beta^\top \beta - \bar c_j^\top \beta - \tfrac{1}{2}\,\bar c_j^\top \bar c_j + \tfrac{p}{2}\log(2\pi) - \sum_{r=1}^p \log\!\big(e^{\text{logrt}_{j,r}} - e^{\text{loglt}_{j,r}}\big). \]
    • This corresponds to a normal with mean \(-\bar c_j\) and identity covariance, truncated coordinate‑wise to intervals \((e^{\text{loglt}_{j,r}},\,e^{\text{logrt}_{j,r}})\).
    • Sampling can be performed via independent truncated univariate normals in each coordinate, with acceptance on the rectangle.

These proposals provide efficient candidate generation: \(d\) from the truncated inverse‑Gamma, and \(\beta\) from a face‑indexed mixture of truncated normals. In the accept–reject algorithm, the corresponding correction terms will ensure that the proposals upper‑bound the target, yielding valid acceptance probabilities.
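A minimal R sketch of the two‑step draw (names ours: PLSD is the weight vector, cbar the \(K \times p\) matrix of gradient vectors, loglt and logrt the \(K \times p\) log‑bound matrices):

    # Step 1: face index from the PLSD weights.  Step 2: coordinate-wise
    # normal draws with mean -cbar[j, ] and unit variance, truncated to
    # (exp(loglt[j, r]), exp(logrt[j, r])) via CDF inversion.
    r_beta_proposal <- function(PLSD, cbar, loglt, logrt) {
      j  <- sample(length(PLSD), size = 1, prob = PLSD)
      mu <- -cbar[j, ]
      lo <- exp(loglt[j, ]); hi <- exp(logrt[j, ])
      u  <- runif(length(mu), pnorm(lo, mean = mu), pnorm(hi, mean = mu))
      list(j = j, beta = qnorm(u, mean = mu))
    }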

4.3 Correction terms

In the accept–reject framework, the proposal distributions generate candidate values, while the correction terms ensure that the proposals form valid upper bounds on the true posterior.
These corrections are carefully constructed so that:

  • They are always non‑negative (or non‑positive in the case of test1),
  • They vanish or simplify at key anchor points, and
  • They guarantee that the acceptance probability is well‑defined.

Together, they measure the “gap” between the proposal approximation and the exact log‑posterior.


Term Definition Description
\(\text{test1}_{j}(\beta,d)\) \[\begin{aligned} & LL(\beta,d) - \\[4pt] & \Big( LL(\bar c^{-1}(\bar c_j,d),d) - \bar c_j^{\top}(\beta - \bar c^{-1}(\bar c_j,d)) \Big) \\[6pt] &= \Bigg[ - \tfrac{1}{2d}\sum_{i=1}^n w_i\,(y_i - x_i^\top \beta)^2 - \tfrac{1}{2}\beta^\top P \beta \Bigg] \\[6pt] &\quad - \Bigg[ - \tfrac{1}{2d}\sum_{i=1}^n w_i\,(y_i - x_i^\top \bar c^{-1}(\bar c_j,d))^2 \\[4pt] & -\tfrac{1}{2}\big(\bar c^{-1}(\bar c_j,d)\big)^\top P\,\bar c^{-1}(\bar c_j,d) \\[4pt] & - \bar c_j^{\top}\big(\beta - \bar c^{-1}(\bar c_j,d)\big) \Bigg] \end{aligned}\] Difference between the log‑likelihood and its linearization at the tangency point. By concavity, this term is always \(\leq 0\).
\(\text{RSS}_{j}(d)\) \(\sum_{i=1}^n w_i\,(y_i - x_i^\top \bar c^{-1}(\bar c_j,d))^2\) Residual sum of squares for face \(j\) at dispersion \(d\).
\(\mathrm{RSS\_Min}\) \[\begin{aligned} & \min_{j}\;\min\limits_{d \in [d_{\mathrm{low}},\, d_{\mathrm{upp}}]} \\[4pt] & \sum_{i=1}^n w_i \,\Big(y_i - x_i^\top \bar c^{-1}(\bar c_j,d)\Big)^2 \end{aligned}\] Global minimum residual sum of squares across all faces \(j\) and dispersion values \(d\).
\(\text{UB2}_{j}(d)\) \(\tfrac{1}{2d}\Big(\sum_{i=1}^n w_i\,(y_i - x_i^\top \bar c^{-1}(\bar c_j,d))^2 - \mathrm{RSS\_Min}\Big)\) Bound relative to the global minimum residual sum of squares. Always \(\geq 0\).
\(\mathrm{UB2\_Min}_{j}\) \[\begin{aligned} & \min\limits_{d \in [d_{\mathrm{low}},\, d_{\mathrm{upp}}]} \;\text{UB2}_{j}(d) \end{aligned}\] Minimum UB2 value across the dispersion range.
\(\text{UB2A}_{j}(d)\) \(\text{UB2}_{j}(d) - \mathrm{UB2\_Min}_{j}\) Non‑negative UB2 adjustment.
\(\text{UB3A}_j(d)\) \[\begin{aligned} &\mathrm{lg\_prob\_factor1\_j} + \text{lmc}_1 + \text{lmc}_2\, d \\ &- \Big(-\tfrac{1}{2}(\bar c^{-1}(\bar c_j,d))^\top P\,\bar c^{-1}(\bar c_j,d) \\[4pt] & + \bar c_j^\top \bar c^{-1}(\bar c_j,d)\Big) \end{aligned}\] Bound from the quadratic–linear face energy. Constructed so the global line dominates the exact quadratic–linear term.
\(\text{UB3B}(d)\) \[\begin{aligned} &(\mathrm{max\_upp} - \mathrm{max\_LL\_log\_disp}) \\ &+ (\mathrm{lm\_log1} + \mathrm{lm\_log2} \cdot \log d) \\[4pt] & - (\text{lmc}_1 + \text{lmc}_2 \cdot d) \end{aligned}\] Bound from the dispersion log–tilt construction. Ensures the global log–linear approximation dominates the true log‑dispersion curve.

Here, the standardized log‑likelihood is

\[ LL(\beta,d) \;=\; -\tfrac{n_{w}}{2}\log d \;-\; \tfrac{1}{2d}\sum_{i=1}^n w_i\,(y_i - x_i^\top \beta)^2 \;-\; \tfrac{1}{2}\,\beta^\top P\,\beta. \]


Interpretation.
- test1 measures how far the log‑likelihood lies below its tangent plane.
- UB2 enforces that the residual sum of squares at the tangency point cannot beat the ML fit.
- UB3A ensures that the quadratic–linear face energy is bounded above by a global line.
- UB3B ensures that the log‑dispersion contribution is bounded above by a log–linear tilt.

Together, these correction terms guarantee that the proposals dominate the target posterior, making the accept–reject step valid.

4.4 Proposition: Log‑posterior decomposition in dispersion form

Proposition For face index \(j\), dispersion \(d\), and tangency point \(\theta(d) = c^{-1}(\bar c_j,d)\),
the joint log‑posterior can be written as \[ \begin{aligned} \log \pi(\beta,d) &= \underbrace{\log q_\Gamma(d)}_{\text{Gamma proposal}} + \underbrace{\log q_{\mathrm{TN}}(\beta\mid j)}_{\text{Truncated normal proposal}} + \underbrace{\log \mathrm{PLSD}_j}_{\text{Per‑face mixture weight}} \\[6pt] &\quad + \underbrace{ \mathrm{test1}(\beta,d) - \mathrm{UB2A}_j(d) - \mathrm{UB3A}_j(d) - \mathrm{UB3B}(d) }_{\text{Correction block}} + \text{const}. \end{aligned} \]

Moreover, the correction terms satisfy the following sign properties

  • \(\mathrm{test1}(\beta,d) \le 0\)
    (concavity of the standardized log‑likelihood and its supporting hyperplane at
    \(c^{-1}(\bar c_j,d)\)).

  • \(\mathrm{UB2A}_j(d) \ge 0\)
    (since \(\mathrm{UB2}_j(d)\) is defined relative to the global minimum
\(\mathrm{RSS}_{\mathrm{Min}}\), and \(\mathrm{UB2A}_j(d) = \mathrm{UB2}_j(d) - \mathrm{UB2\_Min}_j\)).

  • \(\mathrm{UB3A}_j(d) \ge 0\)
    (quadratic–linear face energy bound dominates the exact term).

  • \(\mathrm{UB3B}(d) \ge 0\)
    (dispersion log–tilt bound dominates the exact log‑dispersion).

and we further have the following

  • \(\min_{d\in[\text{low},\,\text{upp}]}\;\mathrm{RSS}_j(d)= \mathrm{RSS}_j(\text{low})\), which implies \(\mathrm{RSS}_{\mathrm{Min}}= \min_{j} \mathrm{RSS}_j(\text{low})\)

  • \(\mathrm{UB2\_Min}_j=\min\big[\mathrm{UB2}_j(\text{low}),\,\mathrm{UB2}_j(\text{upp})\big]\)

which enables efficient implementation of the minimization.


Explanation

  • What this does:
    It separates the log‑posterior into three parts:

    1. proposal distributions you can sample from directly (Gamma in \(d\), truncated normal in \(\beta\)),
    2. a correction block ensuring the proposals form valid upper bounds, and
    3. a parameter‑free constant.
  • Why the signs matter:
    The correction block is \[ \mathrm{test1}(\beta,d) - \mathrm{UB2A}_j(d) - \mathrm{UB3A}_j(d) - \mathrm{UB3B}(d), \] where \(\mathrm{test1}\le 0\) and each \(\mathrm{UB}\ge 0\).
    This guarantees that, when combined with the proposal terms, we obtain a valid envelope for accept–reject sampling: proposals generate candidates; the correction block calibrates acceptance.

  • How it’s used:
    In the sampler, candidates are drawn from the proposal distributions and accepted with probability determined by the correction block.
    The constant term does not affect acceptance and simply keeps the identity exact.
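To make the acceptance step concrete, a minimal R sketch (function name ours; the four correction terms are computed as in Section 4.3 for the candidate \((\beta,d)\) and its face \(j\)):

    # Accept-reject step implied by the decomposition: the candidate is kept
    # with log-probability equal to the correction block, which is <= 0
    # by Claims 2-5.
    accept_candidate <- function(test1, UB2A_j, UB3A_j, UB3B) {
      log_accept <- test1 - UB2A_j - UB3A_j - UB3B
      log(runif(1)) < log_accept   # TRUE: keep (beta, d) as a posterior draw
    }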


Proof of Proposition

We verify the decomposition and the sign properties by appealing to Claims 1–5.

  1. Algebraic decomposition (Claim 1):
    Claim 1 establishes, via the sequence of substitutions (1a)–(5d), that for each face \(j\) the chain of equalities transforms \[ \log q_\Gamma(d) + \log q_{\mathrm{TN}}(\beta\mid j) + \log \mathrm{PLSD}_j + \big[ \mathrm{test1}(\beta,d) - \mathrm{UB2A}_j(d) - \mathrm{UB3A}_j(d) - \mathrm{UB3B}(d) \big] + \text{const} \] into the standardized form with prior Gamma, prior multivariate normal, and the standardized log‑likelihood, absorbing all parameter‑free remnants into a global constant.

  2. Concavity gap sign (Claim 2):
    Claim 2 shows \(\mathrm{test1}(\beta,d) \le 0\) because the standardized log‑likelihood lies below its supporting hyperplane at the tangency point \(c^{-1}(\bar c_j,d)\).

  3. Residual sum‑of‑squares bound (Claim 3):
    Claim 3 shows \(\mathrm{UB2}_j(d) \ge 0\) because
    \[ \mathrm{UB2}_j(d) = \frac{1}{2d}\big(\mathrm{RSS}_j(d) - \mathrm{RSS}_{\mathrm{Min}}\big) \ge 0, \] and therefore
    \[ \mathrm{UB2A}_j(d) = \mathrm{UB2}_j(d) - \mathrm{UB2\_Min}_j \ge 0. \]

  4. Quadratic–linear face bound (Claim 4):
    Claim 4 shows that \(\mathrm{UB3A}_j(d) \ge 0\).

Define three functions \(g1_{j}(\cdot)\), \(g2_{j}(\cdot)\), and \(g3_{j}(\cdot)\) by

\[ g1_{j}(d)= -\tfrac{1}{2}(\bar c^{-1}(\bar c_j,d))^\top P\,\bar c^{-1}(\bar c_j,d) + \bar c_j^\top \bar c^{-1}(\bar c_j,d) \]

\[ g2_{j}(d)=g1_{j}(d^{*}_{1})\;+\;g'_{1j}(d^{*}_{1})\, (d - d^{*}_{1}). \]

\[ g3_{j}(d) = \mathrm{lg\_prob\_factor1\_j} + \text{lmc}_1 + \text{lmc}_2\, d \]

The key step is to compare:

  • \(g1_j(d)\): exact quadratic–linear face energy,
  • \(g2_j(d)\): its supporting line at the tangency point,
  • \(g3_j(d)\): the global affine bound constructed from \(\mathrm{lmc}_1\), \(\mathrm{lmc}_2\), and the per‑face adjustment \(\mathrm{lg\_prob\_factor1}_j\).
    Concavity ensures \(g1_j(d) \le g2_j(d)\).
    Endpoint calibration ensures \(g3_j(d) \ge g2_j(d)\).
    Thus \(g3_j(d) \ge g1_j(d)\), and
    \[ \mathrm{UB3A}_j(d) = g3_j(d) - g1_j(d) \ge 0. \]
  5. Dispersion tilt bound (Claim 5):
    Claim 5 shows that \(\mathrm{UB3B}(d) \ge 0\).
    The comparison is between:

    • the linear function \(\mathrm{lmc}_1 + \mathrm{lmc}_2 d\), and
    • the concave log‑function \(\mathrm{lm\_log1} + \mathrm{lm\_log2}\log d\).
      The log‑function is calibrated to coincide with the linear function at both \(\text{low}\) and \(\text{upp}\); by concavity it therefore dominates the linear function everywhere on the interval.
  6. Property of \(\mathrm{RSS}_j\) (Claim 6):

    \(\min_{d\in[\text{low},\,\text{upp}]}\;\mathrm{RSS}_j(d)= \mathrm{RSS}_j(\text{low})\)

  7. Property of \(\mathrm{UB2}_j\) (Claim 7):

\(\mathrm{UB2\_Min}_j=\min\big[\mathrm{UB2}_j(\text{low}),\,\mathrm{UB2}_j(\text{upp})\big]\)


Combining these results: Claim 1 validates the exact algebraic split, while Claims 2–5 confirm that \[ \mathrm{test1}(\beta,d) \le 0,\quad \mathrm{UB2A}_j(d) \ge 0,\quad \mathrm{UB3A}_j(d) \ge 0,\quad \mathrm{UB3B}(d) \ge 0. \] Thus the correction block
\[ \mathrm{test1} - \mathrm{UB2A}_j - \mathrm{UB3A}_j - \mathrm{UB3B} \]

is non‑positive and has the intended signs, while Claims 6–7 enable us to readily find the needed minimizing constants for the implementation.

Therefore, the proposition holds and provides a valid foundation for the accept–reject sampler.

5. Supporting Claims

In this section we provide detailed statements and proofs of the claims that underpin the main proposition. Claim 1 shows that the decomposition is algebraically valid, while Claims 2–5 validate the sign properties that guarantee correctness of the accept–reject scheme. Claims 6 and 7 establish properties of \(\mathrm{RSS}_j(\cdot)\) and \(\mathrm{UB2}_j(\cdot)\) that allow us to easily calculate \(\mathrm{RSS\_Min}\) and \(\mathrm{UB2\_Min}_j\).

5.1 Claim 1

Claim 1.
The joint log‑posterior admits the decomposition
\[ \log \pi(\beta,d) = \log q_{\Gamma}(d) + \log q_{\mathrm{TN}}(\beta\mid j) + \log \mathrm{PLSD}_j + \big[ \mathrm{test1}(\beta,d) - \mathrm{UB2A}_j(d) - \mathrm{UB3A}_j(d) - \mathrm{UB3B}(d) \big] + \mathrm{const}, \] where
\[ \mathrm{UB2A}_j(d) = \mathrm{UB2}_j(d) - \mathrm{UB2\_Min}_j, \qquad \log \mathrm{PLSD}_{j,2} = \log \mathrm{PLSD}_j - \mathrm{UB2\_Min}_j. \]

Proof of Claim 1.
Define the following.


(1a)

\[ \begin{aligned} \log q_{\Gamma,2}(d) &= \log q_{\Gamma}(d) - \mathrm{lm\_log2}\,\log d \\[6pt] &= (\text{Shape} + n_{w}/2)\,\log \tfrac{1}{d} - (\text{Rate} + \mathrm{RSS}_{\mathrm{Min}}/2)\,\tfrac{1}{d} \\[6pt] &\quad + (\text{Shape} + n_{w}/2 - \mathrm{lm\_log2})\, \log(\text{Rate} + \mathrm{RSS}_{\mathrm{Min}}/2) - \log \Gamma(\text{Shape} + n_{w}/2 - \mathrm{lm\_log2}) \\[6pt] &\quad - \log\!\Bigg( F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{lower}}}; \text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}, \text{Rate} + \tfrac{\mathrm{RSS}_{\mathrm{Min}}}{2} \Big) \\[-2pt] &\qquad\qquad\qquad -\; F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{upper}}}; \text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}, \text{Rate} + \tfrac{\mathrm{RSS}_{\mathrm{Min}}}{2} \Big) \Bigg). \end{aligned} \]


(1b)

\[ \begin{aligned} \mathrm{UB3A}_{j,2}(d) &= \mathrm{UB3A}_j(d) - (\mathrm{lmc}_1 + \mathrm{lmc}_2\,d) \\[6pt] &= \widetilde{\lg\_prob\_factor1}_j - \Big( -\tfrac12\big(c^{-1}(\bar c_j,d)\big)^\top P\,c^{-1}(\bar c_j,d) + \bar c_j^\top c^{-1}(\bar c_j,d) \Big), \end{aligned} \] where
\[ \widetilde{\lg\_prob\_factor1}_j = \mathrm{lg\_prob\_factor1}_j - \mathrm{UB2\_Min}_j = \mathrm{lg\_prob\_factor2}_j. \]


(1c)

\[ \begin{aligned} \mathrm{UB3B}_{2}(d) &= \mathrm{UB3B}(d) - \mathrm{lm\_log2}\log d + (\mathrm{lmc}_1 + \mathrm{lmc}_2 d) \\[6pt] &= \big(\mathrm{max\_upp} - \mathrm{max\_LL\_log\_disp}\big) + \mathrm{lm\_log1}. \end{aligned} \]


(2a)

\[ \begin{aligned} \mathrm{test1}_{2}(\beta,d) &= \mathrm{test1}(\beta,d) + \Big[ -\tfrac12\big(c^{-1}(\bar c_j,d)\big)^\top P\,c^{-1}(\bar c_j,d) + \bar c_j^\top c^{-1}(\bar c_j,d) \Big] \\[6pt] &= \Big[ -\tfrac{1}{2d}\sum_{i=1}^n w_i (y_i - x_i^\top \beta)^2 -\tfrac12\,\beta^\top P\beta \Big] \\[4pt] &\quad - \Big[ -\tfrac{1}{2d}\sum_{i=1}^n w_i (y_i - x_i^\top c^{-1}(\bar c_j,d))^2 - \bar c_j^\top \beta \Big]. \end{aligned} \]


(2b)

\[ \begin{aligned} \log \mathrm{PLSD}_{j,2} &= \log \mathrm{PLSD}_j - \mathrm{UB2\_Min}_j \\[6pt] &= \tfrac12\,\bar c_j^\top \bar c_j + \sum_{r=1}^p \log\!\big(e^{\mathrm{logrt}_{j,r}} - e^{\mathrm{loglt}_{j,r}}\big) \\[4pt] &\quad - \log\!\left( \sum_{k=1}^K \exp\!\Big( \widetilde{\lg\_prob\_factor2}_k + \tfrac12\,\bar c_k^\top \bar c_k \Big) \prod_{r=1}^p \big(e^{\mathrm{logrt}_{k,r}} - e^{\mathrm{loglt}_{k,r}}\big) \right). \end{aligned} \]


(3a)

\[ \begin{aligned} \log q_{\Gamma,3}(d) &= \log q_{\Gamma,2}(d) + (\mathrm{RSS}_{\mathrm{Min}}/2)\,\tfrac{1}{d} \\[6pt] &= (\text{Shape} + n_{w}/2)\,\log \tfrac{1}{d} - \text{Rate}\,\tfrac{1}{d} \\[6pt] &\quad + (\text{Shape} + n_{w}/2 - \mathrm{lm\_log2})\, \log(\text{Rate} + \mathrm{RSS}_{\mathrm{Min}}/2) - \log \Gamma(\text{Shape} + n_{w}/2 - \mathrm{lm\_log2}) \\[6pt] &\quad - \log\!\Bigg( F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{lower}}}; \text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}, \text{Rate} + \tfrac{\mathrm{RSS}_{\mathrm{Min}}}{2} \Big) \\[-2pt] &\qquad\qquad\qquad -\; F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{upper}}}; \text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}, \text{Rate} + \tfrac{\mathrm{RSS}_{\mathrm{Min}}}{2} \Big) \Bigg). \end{aligned} \]


(3b)

\[ \begin{aligned} \mathrm{test1}_{3}(\beta,d) &= \mathrm{test1}_{2}(\beta,d) - \tfrac{1}{2d}\sum_{i=1}^n w_i\,(y_i - x_i^\top c^{-1}(\bar c_j,d))^2 \\[6pt] &= \Big[ -\tfrac{1}{2d}\sum_{i=1}^n w_i (y_i - x_i^\top \beta)^2 -\tfrac12\,\beta^\top P\beta \Big] + \bar c_j^\top \beta. \end{aligned} \]


(4a)

\[ \begin{aligned} \log q_{\Gamma,4}(d) &= \log q_{\Gamma,3}(d) - (n_{w}/2)\,\log \tfrac{1}{d} \\[6pt] &= \text{Shape}\,\log \tfrac{1}{d} - \text{Rate}\,\tfrac{1}{d} \\[6pt] &\quad + (\text{Shape} + n_{w}/2 - \mathrm{lm\_log2})\, \log(\text{Rate} + \mathrm{RSS}_{\mathrm{Min}}/2) - \log \Gamma(\text{Shape} + n_{w}/2 - \mathrm{lm\_log2}) \\[6pt] &\quad - \log\!\Bigg( F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{lower}}}; \text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}, \text{Rate} + \tfrac{\mathrm{RSS}_{\mathrm{Min}}}{2} \Big) \\[-2pt] &\qquad\qquad\qquad -\; F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{upper}}}; \text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}, \text{Rate} + \tfrac{\mathrm{RSS}_{\mathrm{Min}}}{2} \Big) \Bigg). \end{aligned} \]


(4b)

\[ \begin{aligned} \log q_{\mathrm{TN},2}(\beta\mid j) &= \log q_{\mathrm{TN}}(\beta\mid j) + \bar c_j^\top \beta + \tfrac12\,\bar c_j^\top \bar c_j \\[6pt] &= -\tfrac12\,\beta^\top \beta + \tfrac{p}{2}\log(2\pi) - \sum_{r=1}^p \log\!\big(e^{\mathrm{logrt}_{j,r}} - e^{\mathrm{loglt}_{j,r}}\big). \end{aligned} \]


(4c)

\[ \begin{aligned} \log \mathrm{PLSD}_{j,3} &= \log \mathrm{PLSD}_{j,2} - \tfrac12\,\bar c_j^\top \bar c_j \\[6pt] &= \sum_{r=1}^p \log\!\big(e^{\mathrm{logrt}_{j,r}} - e^{\mathrm{loglt}_{j,r}}\big) \\[4pt] &\quad - \log\!\left( \sum_{k=1}^K \exp\!\Big( \widetilde{\lg\_prob\_factor2}_k + \tfrac12\,\bar c_k^\top \bar c_k \Big) \prod_{r=1}^p \big(e^{\mathrm{logrt}_{k,r}} - e^{\mathrm{loglt}_{k,r}}\big) \right). \end{aligned} \]


(4d)

\[ \begin{aligned} \mathrm{test1}_{4}(\beta,d) &= \mathrm{test1}_{3}(\beta,d) - \bar c_j^\top \beta + (n_{w}/2)\,\log \tfrac{1}{d} \\[6pt] &= \Big[ -\tfrac{1}{2d}\sum_{i=1}^n w_i (y_i - x_i^\top \beta)^2 -\tfrac12\,\beta^\top P\beta + (n_{w}/2)\log \tfrac{1}{d} \Big] \\[4pt] &= LL(\beta,d). \end{aligned} \]


(5a)

\[ \begin{aligned} \log q_{\Gamma,5}(d) &= \log q_{\Gamma,4}(d) + \big[\text{Shape}\log(\text{Rate}) - \log\Gamma(\text{Shape})\big] \\[6pt] &\quad - \Big[ (\text{Shape} + n_{w}/2 - \mathrm{lm\_log2}) \log(\text{Rate} + \mathrm{RSS}_{\mathrm{Min}}/2) - \log\Gamma(\text{Shape} + n_{w}/2 - \mathrm{lm\_log2}) \Big] \\[6pt] &\quad - \log\!\Bigg( F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{lower}}}; \text{Shape},\text{Rate} \Big) - F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{upper}}}; \text{Shape},\text{Rate} \Big) \Bigg) \\[6pt] &\quad + \log\!\Bigg( F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{lower}}}; \text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}, \text{Rate} + \tfrac{\mathrm{RSS}_{\mathrm{Min}}}{2} \Big) \\[-2pt] &\qquad\qquad\qquad - F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{upper}}}; \text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}, \text{Rate} + \tfrac{\mathrm{RSS}_{\mathrm{Min}}}{2} \Big) \Bigg) \\[6pt] &= \text{Shape}\,\log\tfrac{1}{d} - \text{Rate}\,\tfrac{1}{d} + \text{Shape}\log(\text{Rate}) - \log\Gamma(\text{Shape}) \\[4pt] &\quad - \log\!\Bigg( F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{lower}}}; \text{Shape},\text{Rate} \Big) - F_{\Gamma}\!\Big( \tfrac{1}{\mathrm{disp}_{\mathrm{upper}}}; \text{Shape},\text{Rate} \Big) \Bigg). \end{aligned} \]


(5b)

\[ \begin{aligned} \log q_{\mathrm{TN},3}(\beta\mid j) &= \log q_{\mathrm{TN},2}(\beta\mid j) + \sum_{r=1}^p \log\!\big(e^{\mathrm{logrt}_{j,r}} - e^{\mathrm{loglt}_{j,r}}\big) \\[6pt] &= -\tfrac12\,\beta^\top \beta + \tfrac{p}{2}\log(2\pi). \end{aligned} \]


(5c)

\[ \begin{aligned} \log \mathrm{PLSD}_{j,4} &= \log \mathrm{PLSD}_{j,3} - \sum_{r=1}^p \log\!\big(e^{\mathrm{logrt}_{j,r}} - e^{\mathrm{loglt}_{j,r}}\big) + C_{\mathrm{PLSD}} \\[6pt] &= 0. \end{aligned} \]


(5d)

\[ \begin{aligned} C_{\mathrm{final}} &= -\,\mathrm{UB3B}_{2}(d) + \text{const} + \text{Shape}\log(\text{Rate}) - \log\Gamma(\text{Shape}) \\[4pt] &\quad - \Big[ (\text{Shape} + n_{w}/2 - \mathrm{lm\_log2}) \log(\text{Rate} + \mathrm{RSS}_{\mathrm{Min}}/2) - \log\Gamma(\text{Shape} + n_{w}/2 - \mathrm{lm\_log2}) \Big] \\[4pt] &\quad + \log\!\Big( F_{\Gamma}(\tfrac{1}{\mathrm{disp}_{\mathrm{lower}}}; \text{Shape},\text{Rate}) - F_{\Gamma}(\tfrac{1}{\mathrm{disp}_{\mathrm{upper}}}; \text{Shape},\text{Rate}) \Big) \\[4pt] &\quad - \log\!\Big( F_{\Gamma}(\tfrac{1}{\mathrm{disp}_{\mathrm{lower}}}; \text{Shape} + n_{w}/2 - \mathrm{lm\_log2}, \text{Rate} + \mathrm{RSS}_{\mathrm{Min}}/2) \\[-2pt] &\qquad\qquad\qquad - F_{\Gamma}(\tfrac{1}{\mathrm{disp}_{\mathrm{upper}}}; \text{Shape} + n_{w}/2 - \mathrm{lm\_log2}, \text{Rate} + \mathrm{RSS}_{\mathrm{Min}}/2) \Big). \end{aligned} \]


Then for each face j:

\[ \begin{aligned} & \underbrace{\log q_{\Gamma}(d)}_{\text{Gamma proposal}} + \underbrace{\log q_{\mathrm{TN}}(\beta\mid j)}_{\text{Truncated normal proposal}} + \underbrace{\log \mathrm{PLSD}_j}_{\text{Per‑face mixture weight}} \\[6pt] & + \underbrace{\mathrm{test1}(\beta,d) - \mathrm{UB2A}_j(d) - \mathrm{UB3A}_j(d) - \mathrm{UB3B}(d)}_{\text{corrections}} + \mathrm{const} \\[6pt] &= \underbrace{\log q_{\Gamma,2}(d)}_{\text{Interim Gamma Term}} + \underbrace{\log q_{\mathrm{TN}}(\beta\mid j)}_{\text{Truncated normal proposal}} + \underbrace{\log \mathrm{PLSD}_j}_{\text{Per‑face mixture weight}} \\[6pt] & + \underbrace{\mathrm{test1}(\beta,d) - \mathrm{UB2A}_j(d) - \mathrm{UB3A}_{j,2}(d) - \mathrm{UB3B}_{2}(d)}_{\text{Interim correction terms}} + \mathrm{const} \\[6pt] &= \underbrace{\log q_{\Gamma,2}(d)}_{\text{Interim Gamma Term}} + \underbrace{\log q_{\mathrm{TN}}(\beta\mid j)}_{\text{Truncated normal proposal}} + \underbrace{\log \mathrm{PLSD}_{j,2}}_{\text{Interim Per‑face mixture weight}} \\[6pt] & + \underbrace{\mathrm{test1}_{2}(\beta,d) - \mathrm{UB2A}_j(d) - \mathrm{UB3B}_{2}(d)}_{\text{Interim correction Terms}} + \mathrm{const} \\[6pt] &= \underbrace{\log q_{\Gamma,3}(d)}_{\text{Interim Gamma Term}} + \underbrace{\log q_{\mathrm{TN}}(\beta\mid j)}_{\text{Truncated normal proposal}} + \underbrace{\log \mathrm{PLSD}_{j,2}}_{\text{Interim Per‑face mixture weight}} + \underbrace{\mathrm{test1}_{3}(\beta,d) - \mathrm{UB3B}_{2}(d)}_{\text{Interim correction Terms}} + \mathrm{const} \\[6pt] &= \underbrace{\log q_{\Gamma,4}(d)}_{\text{Interim Gamma Term}} + \underbrace{\log q_{\mathrm{TN},2}(\beta\mid j)}_{\text{Interim Normal term}} + \underbrace{\log \mathrm{PLSD}_{j,3}}_{\text{Interim Per‑face mixture weight}} + \underbrace{\mathrm{test1}_{4}(\beta,d) - \mathrm{UB3B}_{2}(d)}_{\text{Interim correction Terms}} + \mathrm{const} \\[6pt] &= \underbrace{\log q_{\Gamma,5}(d)}_{\text{Prior Gamma Distribution Term}} + \underbrace{\log q_{\mathrm{TN},3}(\beta\mid j)}_{\text{Prior Multivariate Normal Term}} + \underbrace{LL(\beta,d)}_{\text{Standardized Log-Likelihood Term}} + C_{\mathrm{final}}. \end{aligned} \]

Narrative explanation.
In the above chain of equalities, each step systematically replaces interim definitions with their standardized counterparts, while absorbing constants into a global term:

  • The first equality follows from substituting the adjusted definitions in (1a), (1b), and (1c), which restructure the Gamma proposal and the UB3A/UB3B corrections.

  • The second equality uses (2a) and (2b) to absorb the quadratic–linear adjustment into \(\mathrm{test1}_{2}\) and to shift the per‑face mixture weight into \(\mathrm{PLSD}_{j,2}\).
    At this stage, the UB2 term has already been centered, so \(\mathrm{UB2}_j(d)\) is replaced by
    \[ \mathrm{UB2A}_j(d)=\mathrm{UB2}_j(d)-\mathrm{UB2\_Min}_j, \] with \(\mathrm{UB2\_Min}_j\) absorbed into the mixture weight.

  • The third equality applies (3a) and (3b), re‑expressing the Gamma proposal and \(\mathrm{test1}\) by incorporating the residual‑sum‑of‑squares adjustment.

  • The fourth equality follows from (4a)–(4d), where the dispersion‑dependent log term is absorbed into \(\mathrm{test1}_{4}\), the truncated normal proposal is simplified to \(\log q_{\mathrm{TN},2}\), and the mixture weight is reduced to \(\mathrm{PLSD}_{j,3}\).

  • Finally, the fifth equality applies the adjustments in (5a)–(5d):

    • (5a) replaces the interim Gamma normalization with the true prior Gamma distribution in dispersion form.
    • (5b) removes the truncation‑width adjustment, leaving the standard prior multivariate normal distribution term.
    • (5c) shows that the per‑face mixture weight cancels to zero once the truncation widths and denominator are absorbed.
    • (5d) consolidates all remaining parameter‑free terms, including the UB3B correction, into a single adjusted constant \(C_{\text{final}}\).

Thus, the final decomposition is: \[ \log \pi(\beta,d) = \underbrace{\log q_{\Gamma,5}(d)}_{\text{Prior Gamma Distribution Term}} + \underbrace{\log q_{\mathrm{TN},3}(\beta\mid j)}_{\text{Prior Multivariate Normal Term}} + \underbrace{LL(\beta,d)}_{\text{Standardized Log-Likelihood Term}} + C_{\text{final}}. \]

This shows that after all substitutions, the surviving terms are exactly the prior Gamma distribution in \(d\), the prior multivariate normal distribution in \(\beta\), and the standardized log‑likelihood contribution. Everything else has been absorbed into a global constant. In other words, the decomposition now cleanly separates into the canonical prior and likelihood components, with no extraneous dependence on interim proposals or mixture weights.

Starting from the standardized log‑posterior, we separate the dispersion‑dependent Gamma component, the truncated normal component in \(\beta\), and the per‑face mixture weight. The residual terms are collected into the correction block
\[ \mathrm{test1} - \mathrm{UB2A}_j - \mathrm{UB3A}_j - \mathrm{UB3B}, \] which is an algebraic rearrangement, so the decomposition holds exactly.


5.2 Claim 2

Claim 2.
\(\text{test1}(\beta,d) \le 0\).

Proof of Claim 2.
By the definition of the inverse map \(c^{-1}(\cdot)\), the vector \(\bar c_j\) is a subgradient of the negative standardized log‑likelihood at the tangency point \(c^{-1}(\bar c_j,d)\). From the supporting‑hyperplane property of concave functions, it follows that for every \((\beta,d)\),

\[ LL(\beta,d) \;\le\; LL\!\big(c^{-1}(\bar c_j,d),d\big) \;-\; \bar c_j^{\top}\big(\beta - c^{-1}(\bar c_j,d)\big). \]

By definition, \(\text{test1}(\beta,d)\) is exactly the difference between the standardized log‑likelihood \(LL(\beta,d)\) and this linearization at the tangency point. Hence,

\[ \text{test1}(\beta,d) \le 0. \]

This completes the proof of Claim 2.


5.3 Claim 3

Claim 3.
\(\mathrm{UB2A}_j(d) \ge 0.\)

Proof of Claim 3.
For every face \(j\) and dispersion \(d\), \[ \mathrm{UB2}_j(d) = \frac{\mathrm{RSS}_j(d) - \mathrm{RSS\_Min}}{2d} \ge 0, \] because \(\mathrm{RSS\_Min}\) is the global minimum of \(\mathrm{RSS}_j(d)\) across all faces and all dispersion values.

Since \[ \mathrm{UB2A}_j(d) = \mathrm{UB2}_j(d) - \mathrm{UB2\_Min}_j, \] and \(\mathrm{UB2\_Min}_j = \min_d \mathrm{UB2}_j(d)\), it follows that \[ \mathrm{UB2A}_j(d) \ge 0. \]

This completes the proof of Claim 3.

Here \(\mathrm{RSS}_j(d)\) and \(\mathrm{RSS\_Min}\) use the weighted residuals
\[ \sum_i w_i (y_i - x_i^\top\theta)^2. \]


5.4 Claims 4 and 5

Before proceeding with the proofs, we first introduce some additional functions and state several supporting remarks.

For each \(j\), define functions \(g1_{j}(\cdot)\), \(g2_{j}(\cdot)\), and \(g3_{j}(\cdot)\)

by \[ g1_{j}(d)= -\tfrac{1}{2}(\bar c^{-1}(\bar c_j,d))^\top P\,\bar c^{-1}(\bar c_j,d) + \bar c_j^\top \bar c^{-1}(\bar c_j,d) \]

\[ g2_{j}(d)=g1_{j}(d^{*}_{1})\;+\;g'_{1j}(d^{*}_{1})\, (d - d^{*}_{1}). \]

\[ g3_{j}(d) = \mathrm{lg\_prob\_factor1\_j} + \text{lmc}_1 + \text{lmc}_2\, d \]

Remark 5.4.1.
For each face \(j\), the derivative of \(g1_{j}(\cdot)\) with respect to \(d\) admits the equivalent forms \[ \frac{d}{dd}\,g1_{j}(d) = V(d)^\top A(d)^{-1} V(d) = V(d)^\top A(d)^{-1}\bar c_j - V(d)^\top A(d)^{-1} P\,A(d)^{-1}\big(r + d\,\bar c_j\big). \]

Proof of Remark 5.4.1.
Recall \[ g1_{j}(d) \;=\; -\tfrac{1}{2}\,\theta(d)^\top P\,\theta(d) \;+\; \bar c_j^\top \theta(d), \quad \theta(d) \;=\; A(d)^{-1}\big(r + d\,\bar c_j\big), \quad A(d) = Q + dP, \] and define \(V(d) := \bar c_j - P\,\theta(d)\).

Differentiate using the chain rule: \[ \frac{d}{dd}\,g1_{j}(d) = \big(\bar c_j^\top - \theta(d)^\top P\big)\,\frac{d\theta}{dd} = V(d)^\top \frac{d\theta}{dd}. \] With \(A'(d)=P\) and \(\theta(d) = A(d)^{-1}(r + d\,\bar c_j)\), \[ \frac{d\theta}{dd} = -A(d)^{-1} P\,\theta(d) + A(d)^{-1}\bar c_j = A(d)^{-1} V(d). \] Substituting back yields the quadratic form \[ \frac{d}{dd}\,g1_{j}(d) = V(d)^\top A(d)^{-1} V(d). \] Expanding \(V(d) = \bar c_j - P\,\theta(d)\) and \(\theta(d) = A(d)^{-1}(r + d\,\bar c_j)\) gives the implementation‑aligned form \[ \frac{d}{dd}\,g1_{j}(d) = V(d)^\top A(d)^{-1}\bar c_j - V(d)^\top A(d)^{-1} P\,A(d)^{-1}\big(r + d\,\bar c_j\big). \] Evaluating at \(d=d^{*}_{1}\) matches the table entry: \[ g1_{j}'(d^{*}_{1}) = V(d^{*}_{1})^\top A(d^{*}_{1})^{-1}\,\bar c_j - V(d^{*}_{1})^\top A(d^{*}_{1})^{-1} P\,A(d^{*}_{1})^{-1}\big(r + d^{*}_{1}\,\bar c_j\big) \]

This completes the proof of Remark 5.4.1.
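The identity is easy to check numerically; the following R sketch (with illustrative inputs of our choosing) compares \(V(d)^\top A(d)^{-1} V(d)\) against a central finite difference of \(g1_{j}\):

    # g1_j(d) and its derivative per Remark 5.4.1, with A(d) = Q + d P,
    # theta(d) = A(d)^{-1} (r + d cbar), V(d) = cbar - P theta(d).
    g1 <- function(d, cbar, P, Q, r) {
      theta <- solve(Q + d * P, r + d * cbar)
      -0.5 * drop(t(theta) %*% P %*% theta) + drop(crossprod(cbar, theta))
    }
    g1_prime <- function(d, cbar, P, Q, r) {
      theta <- solve(Q + d * P, r + d * cbar)
      V <- cbar - P %*% theta
      drop(t(V) %*% solve(Q + d * P, V))   # V' A(d)^{-1} V
    }
    set.seed(1)
    P <- diag(2); Q <- crossprod(matrix(rnorm(8), 4, 2))
    cbar <- rnorm(2); r <- rnorm(2); d <- 0.7; h <- 1e-6
    (g1(d + h, cbar, P, Q, r) - g1(d - h, cbar, P, Q, r)) / (2 * h)  # finite diff
    g1_prime(d, cbar, P, Q, r)                                       # closed form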

Remark 5.4.2.
The derivative of the function \(g1_{j}(\cdot)\) with respect to \(d\) is non‑negative.

Proof of Remark 5.4.2.
From Remark 5.4.1, the derivative admits the quadratic form \[ \frac{d}{dd}\,g1_{j}(d) \;=\; V(d)^\top A(d)^{-1} V(d), \quad A(d) = Q + dP, \quad V(d) = \bar c_j - P\,\theta(d). \]

Since \(Q \succ 0\) and \(P \succ 0\), we have \(A(d) = Q + dP \succ 0\) for all \(d \in [\text{low}, \text{upp}]\), hence \(A(d)^{-1} \succ 0\). Therefore, for any \(d\) in this interval, \[ V(d)^\top A(d)^{-1} V(d) \;\ge\; 0. \]

This completes the proof of Remark 5.4.2.

Remark 5.4.3.
The function \(g1_{j}(\cdot)\) is concave, and therefore \[ g1_{j}(d) \;\le\; g1_{j}(d^{*}_{1}) + g'_{1j}(d^{*}_{1})\, (d - d^{*}_{1}). \]

Proof of Remark 5.4.3.
Recall \[ g1_{j}(d) \;=\; -\tfrac{1}{2}\,\theta(d)^\top P\,\theta(d) \;+\; \bar c_j^\top \theta(d), \quad \theta(d) \;=\; A(d)^{-1}\big(r + d\,\bar c_j\big), \quad A(d) = Q + dP, \] and define \(V(d) := \bar c_j - P\,\theta(d)\).

From Remark 5.4.1, \[ \frac{d}{dd}\,g1_{j}(d) \;=\; V(d)^\top A(d)^{-1} V(d). \] Differentiating once more using \(V'(d) = -P\,A(d)^{-1}V(d)\) and \((A^{-1})'(d) = -A(d)^{-1} P\, A(d)^{-1}\), we obtain \[ \frac{d^2}{dd^2}\,g1_{j}(d) = -\,3\,V(d)^\top A(d)^{-1} P\,A(d)^{-1} V(d) \;\le\; 0, \] since \(A(d)^{-1} \succ 0\) and \(P \succ 0\). Hence \(g1_{j}\) is concave on \([\text{low}, \text{upp}]\).

By the supporting line inequality for concave functions, for any \(d\), \[ g1_{j}(d) \;\le\; g1_{j}(d^{*}_{1}) + g1_{j}'(d^{*}_{1})\,(d - d^{*}_{1}) = g1_{j}(d^{*}_{1}) + g'_{1j}(d^{*}_{1})\, (d - d^{*}_{1}), \] which is the stated result.

This completes the proof of Remark 5.4.3.

Remark 5.4.4.
\[ \text{lmc}_2 \;\text{and}\; \mathrm{lm\_log2} \;\text{are both non‑negative.} \] Proof of Remark 5.4.4.
From the constants table, \[ \mathrm{max\_low\_mean} \;=\; \mathrm{max\_upp} - \mathrm{m}_{g'_{1}}\,(\text{upp} - \text{low}), \] and \[ \text{lmc}_2 \;=\; \frac{\mathrm{max\_upp} - \mathrm{max\_low\_mean}}{\text{upp} - \text{low}} \;=\; \mathrm{m}_{g'_{1}}. \] Since each per‑face slope satisfies \(g'_{1j}(d^{*}_{1}) \ge 0\) (Remark 5.4.2), their mean \(\mathrm{m}_{g'_{1}} \ge 0\). Hence \(\text{lmc}_2 \ge 0\).

Moreover, \[ \mathrm{lm\_log2} \;=\; \text{lmc}_2 \cdot d^{*}_{2}, \qquad d^{*}_{2} \;=\; \frac{\text{upp} - \text{low}}{\log(\tfrac{\text{upp}}{\text{low}})}. \]

With \(\text{upp} > \text{low} > 0\), we have \(\log(\tfrac{\text{upp}}{\text{low}}) > 0\), so \(d^{*}_{2} > 0\). Since \(\text{lmc}_2 \ge 0\) and \(d^{*}_{2} > 0\), it follows that \(\mathrm{lm\_log2} \ge 0\). This completes the proof of Remark 5.4.4.

Claim 4.
\(\text{UB3A}_j(d) \ge 0\).

Proof of Claim 4.
Recall the definitions \[ g1_{j}(d) = -\tfrac{1}{2}\big(\bar c^{-1}(\bar c_j,d)\big)^\top P\,\bar c^{-1}(\bar c_j,d) + \bar c_j^\top \bar c^{-1}(\bar c_j,d), \]

\[ g2_{j}(d)= g1_{j}(d^{*}_{1})\;+\;g'_{1j}(d^{*}_{1})\, (d - d^{*}_{1}). \]

\[ g3_{j}(d) = \mathrm{lg\_prob\_factor1\_j} + \text{lmc}_1 + \text{lmc}_2\,d. \]

Step 1. By Remark 5.4.3, \(g1_{j}\) is concave in \(d\). Therefore, its supporting line at \(d^{*}_{1}\) dominates: \[ g1_{j}(d) \;\le\; g2_{j}(d), \qquad d \in [\text{low}, \text{upp}]. \]


Step 2. Endpoint dominance of \(g3_j\) over \(g2_j\).
At the dispersion endpoints, \[ g2_j(\text{low}) = \mathrm{thetabarconst\_low\_apprx}[j], \qquad g2_j(\text{upp}) = \mathrm{thetabarconst\_upp\_apprx}[j]. \]

By definition of the per‑face UB3A shift, \[ \mathrm{lg\_prob\_factor1}_j = \max\Big\{ g2_j(\text{upp}) - \mathrm{max\_upp},\; g2_j(\text{low}) - \mathrm{max\_low} \Big\}, \] we have \[ \mathrm{lg\_prob\_factor1}_j \;\ge\; g2_j(\text{low}) - \mathrm{max\_low}, \qquad \mathrm{lg\_prob\_factor1}_j \;\ge\; g2_j(\text{upp}) - \mathrm{max\_upp}. \]

Using \[ g3_j(d) = \mathrm{lg\_prob\_factor1}_j + \mathrm{lmc}_1 + \mathrm{lmc}_2\, d, \] together with the identities \[ \mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{low} = \mathrm{max\_low}, \qquad \mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{upp} = \mathrm{max\_upp}, \] we obtain \[ g3_j(\text{low}) = \mathrm{lg\_prob\_factor1}_j + \mathrm{max\_low} \;\ge\; g2_j(\text{low}), \] \[ g3_j(\text{upp}) = \mathrm{lg\_prob\_factor1}_j + \mathrm{max\_upp} \;\ge\; g2_j(\text{upp}). \]

Thus \(g3_j\) dominates \(g2_j\) at both endpoints.


Step 3. Since \(g2_{j}\) is affine in \(d\), endpoint dominance implies \[ g3_{j}(d) \;\ge\; g2_{j}(d), \qquad d \in [\text{low}, \text{upp}]. \]


Step 4. Combining Steps 1 and 3, \[ g3_{j}(d) \;\ge\; g2_{j}(d) \;\ge\; g1_{j}(d), \qquad d \in [\text{low}, \text{upp}]. \]

Therefore, \[ \text{UB3A}_j(d) = g3_{j}(d) - g1_{j}(d) \;\ge\; 0. \]

This completes the proof of Claim 4.

Claim 5.
\(\text{UB3B}(d) \ge 0 \quad \text{for all } d \in [\text{low},\,\text{upp}]\).

Proof of Claim 5.

Define functions \(f1(\cdot)\) and \(f2(\cdot)\)

by

\[ \begin{aligned} f1(d) &= \big(\mathrm{max\_upp} - \mathrm{max\_LL\_log\_disp}\big) + \big(\mathrm{lm\_log1} + \mathrm{lm\_log2} \cdot \log d \big) \\[6pt] &= \big(\mathrm{max\_upp} - (\mathrm{lm\_log1} + \mathrm{lm\_log2} \cdot \log \text{upp}) \big) + \big(\mathrm{lm\_log1} + \mathrm{lm\_log2} \cdot \log d \big) \\[6pt] &= \mathrm{max\_upp} - \mathrm{lm\_log2}\,\big(\log(\text{upp}) - \log d\big) \end{aligned} \]

and

\[ \begin{aligned} f2(d) &= \text{lmc}_1 + \text{lmc}_2 \cdot d \\[6pt] &= \mathrm{max\_low\_mean} - \text{lmc}_2 \cdot \text{low} + \text{lmc}_2 \cdot d \end{aligned} \]

so \(\text{UB3B}(d) =f1(d)-f2(d)\).

When \(d=\text{upp}\), we have

\[ \begin{aligned} \text{UB3B}(\text{upp}) &= f1(\text{upp}) - f2(\text{upp}) \\[6pt] &= \big(\mathrm{max\_upp} - \big(\mathrm{max\_low\_mean} + \text{lmc}_2 \cdot (\text{upp} - \text{low})\big)\big) \\[6pt] &= \big(\mathrm{max\_upp} - \big(\mathrm{max\_low\_mean} + \tfrac{(\mathrm{max\_upp} - \mathrm{max\_low\_mean})}{(\text{upp} - \text{low})} \cdot (\text{upp} - \text{low})\big)\big) \\[6pt] &= \big(\mathrm{max\_upp} - \mathrm{max\_upp}\big) \\[6pt] &= 0 \end{aligned} \]

so \(f1(\text{upp})=f2(\text{upp})\).

Likewise, when \(d=\text{low}\), we have

\[ \begin{aligned} \text{UB3B}(\text{low}) &= f1(\text{low}) - f2(\text{low}) \\[6pt] &= \big(\mathrm{max\_upp} - \big(\mathrm{lm\_log2} \cdot (\log(\text{upp}) - \log(\text{low}))\big)\big) \\[6pt] &\quad - \big(\mathrm{max\_low\_mean} + \text{lmc}_2 \cdot (\text{low} - \text{low})\big) \\[6pt] &= \big(\mathrm{max\_upp} - \big(\text{lmc}_2 \cdot d^{*}_{2} \cdot (\log(\text{upp}) - \log(\text{low}))\big)\big) \\[6pt] &\quad - \mathrm{max\_low\_mean} \\[6pt] &= \big(\mathrm{max\_upp} - \tfrac{(\mathrm{max\_upp} - \mathrm{max\_low\_mean})}{(\text{upp} - \text{low})} \cdot \tfrac{(\text{upp} - \text{low})}{\log(\tfrac{\text{upp}}{\text{low}})} \cdot \log(\tfrac{\text{upp}}{\text{low}})\big) \\[6pt] &\quad - \mathrm{max\_low\_mean} \\[6pt] &= \big(\mathrm{max\_upp} - (\mathrm{max\_upp} - \mathrm{max\_low\_mean})\big) \\[6pt] &\quad - \mathrm{max\_low\_mean} \\[6pt] &= 0 \end{aligned} \]

so \(f1(\text{low})=f2(\text{low})\).

From concavity of the logarithm and the result of Remark 5.4.4 that \(\mathrm{lm\_log2} \ge 0\), it follows that \(f1(\cdot)\) is a concave function.

Now, pick any \(t \in [0,1]\). It then follows that

\[\begin{aligned} f1\big(t \cdot \text{low} + (1-t) \cdot \text{upp}\big) & \geq t \cdot f1(\text{low}) + (1-t) \cdot f1(\text{upp}) \\ & = t \cdot f2(\text{low}) + (1-t) \cdot f2(\text{upp}) \\ & = f2\big(t \cdot \text{low} + (1-t) \cdot \text{upp}\big). \end{aligned}\]

Thus, for all \(d \in [\text{low}, \text{upp}]\), \[ \text{UB3B}(d) = f1(d) - f2(d) \;\ge\; 0, \] which completes the proof.
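A small numerical illustration in R (all values are ours, chosen only so that \(\text{upp} > \text{low} > 0\) and \(\mathrm{max\_upp} > \mathrm{max\_low\_mean}\)):

    # f1 equals f2 at both endpoints and dominates it in between.
    low <- 0.5; upp <- 2.0
    max_upp <- 1.3; max_low_mean <- 0.9
    lmc2 <- (max_upp - max_low_mean) / (upp - low)
    lmc1 <- max_low_mean - lmc2 * low
    d_star2 <- (upp - low) / log(upp / low)
    lm_log2 <- lmc2 * d_star2
    d  <- seq(low, upp, length.out = 5)
    f1 <- max_upp - lm_log2 * (log(upp) - log(d))
    f2 <- lmc1 + lmc2 * d
    f1 - f2   # zero at the endpoints, nonnegative inside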

5.5 Claims 6 and 7

This section collects the notation and identities used to analyze the residual sum of squares along a face \(j\) as a function of the dispersion \(d\), and to derive lower bounds and endpoint properties for the UB2 bound. We first introduce the notation and a set of remarks that support Claim 6 (about \(\mathrm{RSS}_j(\cdot)\)), then state and prove Claim 6. We then introduce additional remarks for the UB2 analysis and use them to state and prove Claim 7.

5.5.1 Notation and setup

We work with the following global objects:

  • \(X \in \mathbb{R}^{n \times p}\): design matrix, with rows \(x_i^\top\).
  • \(W = \operatorname{diag}(w_1,\ldots,w_n)\): weight matrix.
  • \(y \in \mathbb{R}^n\): response vector.
  • \(\alpha \in \mathbb{R}^n\): offset vector.

The residual sum of squares is \[ \mathrm{RSS}(\beta) = \sum_{i=1}^n w_i \bigl(y_i - x_i^\top \beta\bigr)^2. \]

Let \[ Q \;=\; X^{T} W X, \] and let \(\hat{\beta}\) be the weighted least squares (maximum likelihood) estimate: \[ \hat{\beta} \;=\; \arg\min_{\beta} \mathrm{RSS}(\beta), \qquad \mathrm{RSS}_{ML} \;=\; \mathrm{RSS}(\hat{\beta}). \]

For each face \(j\), let \(\bar{c}_j\) denote the gradient value associated with that face, and let \[ \beta_j(d) = c^{-1}(\bar{c}_j, d) \] be the corresponding tangency point at dispersion \(d\). The face‑specific residual sum of squares is \[ \mathrm{RSS}_j(d) \;=\; \sum_{i=1}^n w_i \bigl(y_i - x_i^\top \beta_j(d)\bigr)^2. \]

For each face \(j\), define the facewise minimum \[ \mathrm{RSS}_{\min,j} \;=\; \min_{d \in [\mathrm{low},\,\mathrm{upp}]} \mathrm{RSS}_j(d), \] and the global lower bound \[ \mathrm{RSS}_{\mathrm{Min}} \;=\; \min_j \mathrm{RSS}_{\min,j}. \]

The UB2 bound for face \(j\) at dispersion \(d\) is defined by \[ \mathrm{UB2}_j(d) \;=\; \frac{1}{2d}\bigl(\mathrm{RSS}_j(d) - \mathrm{RSS}_{\mathrm{Min}}\bigr). \]

For the prior, let \(P \succ 0\) denote the prior precision matrix and \(\mu\) the prior mean. We will use the implementation‑style notation \[ \mathrm{base\_A} \;=\; X^{T} W X \;=\; Q, \qquad \mathrm{base\_B0} \;=\; X^{T} W(\alpha - y). \] For dispersion \(d\), define \[ A(d) \;=\; P \;+\; \frac{\mathrm{base\_A}}{d}, \qquad B_0(d) \;=\; \frac{\mathrm{base\_B0}}{d} \;+\; P\mu, \] so that the tangency point satisfies \[ \beta_j(d) \;=\; A(d)^{-1}\,\bigl(\bar{c}_j - B_0(d)\bigr). \]

It is convenient to reparameterize dispersion via \[ t \;=\; \frac{1}{d}, \qquad t \in [t_{\min},\,t_{\max}] \;=\; [1/\mathrm{upp},\,1/\mathrm{low}], \] and write \(\tilde{\beta}_j(t)\) for \(\beta_j(d)\) at \(t = 1/d\). We then define \[ A(t) \;=\; P + tQ, \qquad \tilde{M}(t) \;=\; A(t)^{-1} Q\, A(t)^{-1}. \]

Finally, define the face‑specific residual at \(\hat{\beta}\) by \[ r^{*}_j \;=\; \bar{c}_j \;-\; P\mu \;-\; P\hat{\beta}. \]
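
Continuing the sketch above, \(\hat{\beta}\) and \(r^{*}_j\) follow directly from the weighted least squares normal equations; the face gradient \(\bar{c}_j\) remains a hypothetical placeholder.

```r
## Continuing the sketch: the WLS estimate (offset absorbed), RSS_ML,
## and the face-specific residual r*_j at beta_hat.
beta_hat <- drop(solve(base_A, crossprod(X, w * (y - alpha))))
RSS_ML   <- sum(w * (y - alpha - drop(X %*% beta_hat))^2)

r_star <- function(cbar_j) drop(cbar_j - P %*% mu - P %*% beta_hat)
```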


5.5.2 Remarks for Claim 6

We now collect a set of remarks that will be used in the proof of Claim 6.

Remark 5.5.1 (Exact RSS decomposition).
For all \(\beta\), \[ \mathrm{RSS}(\beta) \;=\; \mathrm{RSS}_{ML} \;+\; (\beta - \hat{\beta})^{T} Q (\beta - \hat{\beta}). \]

Proof. This is the standard quadratic expansion of the weighted least squares criterion around its minimizer \(\hat{\beta}\). Since \(\nabla \mathrm{RSS}(\hat{\beta}) = 0\) and \(\nabla^2 \mathrm{RSS}(\beta) = 2Q\) is constant, the Taylor expansion is exact. \(\square\)

Remark 5.5.2 (Facewise RSS via \(r^{*}_j\) and \(\tilde{M}\)).
For each face \(j\) and \(t = 1/d\), \[ \tilde{\beta}_j(t) - \hat{\beta} \;=\; A(t)^{-1} r^{*}_j, \] and hence \[ \mathrm{RSS}_j(d) \;=\; \mathrm{RSS}_{ML} \;+\; (r^{*}_j)^{T} \tilde{M}(1/d)\, r^{*}_j. \]

Proof. From the definition of the tangency point in implementation notation, \[ \beta_j(d) = A(d)^{-1}\,\bigl(\bar{c}_j - B_0(d)\bigr), \] with \(A(d) = P + \mathrm{base\_A}/d\) and \(B_0(d) = \mathrm{base\_B0}/d + P\mu.\) Rewriting in terms of \(t = 1/d\) with \(A(t) = P + tQ\) and \(B_0(t) = t\,\mathrm{base\_B0} + P\mu\), we obtain \[ \tilde{\beta}_j(t) = A(t)^{-1}\,\bigl(\bar{c}_j - B_0(t)\bigr). \] Subtracting \(\hat{\beta}\) and regrouping terms yields \[ \tilde{\beta}_j(t) - \hat{\beta} = A(t)^{-1}\Bigl(\bar{c}_j - P\mu - P\hat{\beta} - t\bigl(\mathrm{base\_B0} + Q\hat{\beta}\bigr)\Bigr). \] The weighted least squares normal equations \(X^{T}W(y - \alpha - X\hat{\beta}) = 0\) read \(\mathrm{base\_B0} + Q\hat{\beta} = 0\), so the \(t\)-term vanishes and \[ \tilde{\beta}_j(t) - \hat{\beta} = A(t)^{-1}\,\bigl(\bar{c}_j - P\mu - P\hat{\beta}\bigr) = A(t)^{-1} r^{*}_j. \] Substituting this into the exact decomposition of Remark 5.5.1 with \(\beta = \beta_j(d)\) and \(t = 1/d\), and using \(A(t)^{-1} Q\, A(t)^{-1} = \tilde{M}(t)\), gives \[ \mathrm{RSS}_j(d) = \mathrm{RSS}_{ML} + (\beta_j(d) - \hat{\beta})^{T} Q (\beta_j(d) - \hat{\beta}) = \mathrm{RSS}_{ML} + (r^{*}_j)^{T} \tilde{M}(1/d)\, r^{*}_j. \] \(\square\)
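
The identity in Remark 5.5.2 is easy to confirm numerically. Continuing the sketch above, the direct computation of \(\mathrm{RSS}_j(d)\) at the tangency point is compared against the decomposition \(\mathrm{RSS}_{ML} + (r^{*}_j)^{T}\tilde{M}(1/d)\,r^{*}_j\).

```r
## Numeric check of Remark 5.5.2 (continuing the sketch above).
M_tilde <- function(t) {                  # A(t)^{-1} Q A(t)^{-1}
  A_inv <- solve(P + t * base_A)
  A_inv %*% base_A %*% A_inv
}
cbar_j <- rep(0.1, p); d <- 2
b  <- beta_j(cbar_j, d)
rj <- r_star(cbar_j)
rss_direct <- sum(w * (y - alpha - drop(X %*% b))^2)
rss_decomp <- RSS_ML + drop(t(rj) %*% M_tilde(1 / d) %*% rj)
all.equal(rss_direct, rss_decomp)         # TRUE up to rounding
```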

Remark 5.5.3 (Monotonicity in \(d\) and endpoint minimum).
For each face \(j\) and all \(d \in [\mathrm{low},\,\mathrm{upp}]\), \[ (r^{*}_j)^{T} \tilde{M}(1/d)\, r^{*}_j \;\ge\; (r^{*}_j)^{T} \tilde{M}(1/\mathrm{low})\, r^{*}_j, \] and hence \[ \min_{d \in [\mathrm{low},\,\mathrm{upp}]} \mathrm{RSS}_j(d) \;=\; \mathrm{RSS}_j(\mathrm{low}). \]

Proof. Fix \(r = r^{*}_j\) and define \(h(t) = r^{T}\tilde{M}(t)\,r = r^{T} A(t)^{-1} Q\, A(t)^{-1} r\) for \(t > 0\). Since \(\frac{d}{dt}A(t)^{-1} = -A(t)^{-1} Q\, A(t)^{-1}\), the product rule gives \[ h'(t) = -2\, r^{T} A(t)^{-1} Q\, A(t)^{-1} Q\, A(t)^{-1} r = -2\, u^{T} Q\, A(t)^{-1} Q\, u, \qquad u = A(t)^{-1} r. \] Because \(A(t)^{-1} \succ 0\), the matrix \(Q\,A(t)^{-1}Q\) is positive semidefinite, so \(h'(t) \le 0\) and \(h\) is non‑increasing in \(t\). As \(d\) increases over \([\mathrm{low},\,\mathrm{upp}]\), \(t = 1/d\) decreases, so \(h(1/d)\) is minimized at the largest admissible \(t\), namely \(t = 1/\mathrm{low}\). Thus \[ (r^{*}_j)^{T} \tilde{M}(1/d)\, r^{*}_j \;\ge\; (r^{*}_j)^{T} \tilde{M}(1/\mathrm{low})\, r^{*}_j \] for all \(d \in [\mathrm{low},\,\mathrm{upp}]\). Combining this with Remark 5.5.2 gives \[ \mathrm{RSS}_j(d) \;\ge\; \mathrm{RSS}_j(\mathrm{low}), \] so the minimum over \(d\) is attained at \(d = \mathrm{low}\). \(\square\)
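
Remark 5.5.3 can likewise be checked on a grid. Continuing the sketch above (with hypothetical dispersion bounds), \(\mathrm{RSS}_j(d)\) should be non-decreasing in \(d\), with its minimum at \(d = \mathrm{low}\).

```r
## Numeric check of Remark 5.5.3 (continuing the sketch above).
low <- 0.5; upp <- 4                      # hypothetical dispersion bounds
d_grid <- seq(low, upp, length.out = 100)
rss_j  <- sapply(d_grid, function(d) {
  b <- beta_j(cbar_j, d)
  sum(w * (y - alpha - drop(X %*% b))^2)
})
stopifnot(all(diff(rss_j) >= -1e-10))     # non-decreasing in d
which.min(rss_j) == 1                     # minimum attained at d = low
```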


5.5.3 Claim 6: RSS decomposition and lower bound

Claim 6.
For each face \(j\) and dispersion \(d \in [\mathrm{low},\,\mathrm{upp}]\), \[ \mathrm{RSS}_j(d) \;=\; \mathrm{RSS}_{ML} \;+\; (\beta_j(d) - \hat{\beta})^{T} Q (\beta_j(d) - \hat{\beta}), \] and the minimum over \(d\) is attained at the lower endpoint: \[ \min_{d \in [\mathrm{low},\,\mathrm{upp}]} \mathrm{RSS}_j(d) \;=\; \mathrm{RSS}_j(\mathrm{low}). \]

Proof of Claim 6.
The identity follows immediately from Remark 5.5.1 by substituting \(\beta = \beta_j(d)\). The endpoint property follows from Remark 5.5.3, which shows that \(\mathrm{RSS}_j(d)\) is minimized at \(d=\mathrm{low}\). \(\square\)


5.5.4 Remarks for Claim 7

We now collect additional remarks used in the analysis of the UB2 bound and in the proof of Claim 7.

For the derivative analysis, introduce the standardized matrices \[ K \;=\; Q^{-1/2} P Q^{-1/2}, \qquad v_j \;=\; Q^{-1/2} r^{*}_j, \] and the reparameterized function \[ \tilde{\mathrm{UB2}}_j(t) \;:=\; \mathrm{UB2}_j(1/t), \qquad t \in [1/\mathrm{upp},\,1/\mathrm{low}]. \] Let \[ \Delta \;=\; \mathrm{RSS}_{\mathrm{Min}} - \mathrm{RSS}_{ML} \;\ge\; 0. \]

Remark 5.5.4 (UB2 in \(t\)–form).
Using Remark 5.5.2 and the definitions above, we can write \[ \tilde{\mathrm{UB2}}_j(t) \;=\; \frac{t}{2}\bigl(g_j(t) - \Delta\bigr), \] where \[ g_j(t) = v_j^{T}(K+tI)^{-2}v_j. \]

Proof. From Remark 5.5.2 and the definition of \(\mathrm{UB2}_j(d)\), \[ \mathrm{UB2}_j(d) = \frac{1}{2d} \Bigl( \mathrm{RSS}_{ML} + (r^{*}_j)^{T} \tilde{M}(1/d)\, r^{*}_j - \mathrm{RSS}_{\mathrm{Min}} \Bigr). \] Writing \(t = 1/d\) and factoring \(A(t) = P + tQ = Q^{1/2}(K + tI)\,Q^{1/2}\), we get \[ \tilde{M}(t) = A(t)^{-1} Q\, A(t)^{-1} = Q^{-1/2}(K + tI)^{-2}Q^{-1/2}, \] so that \[ (r^{*}_j)^{T} \tilde{M}(t)\, r^{*}_j = v_j^{T}(K + tI)^{-2}v_j = g_j(t), \] and hence \[ \tilde{\mathrm{UB2}}_j(t) = \frac{t}{2}\bigl(g_j(t) - \Delta\bigr). \] \(\square\)
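
Continuing the sketch above, the standardized quantities \(K\), \(v_j\), \(g_j(t)\) and \(\tilde{\mathrm{UB2}}_j(t)\) can be formed explicitly. Since only one face is simulated here, \(\Delta\) is replaced by the single-face stand-in \(g_j(1/\mathrm{low})\); in Chapter A07 proper, \(\Delta\) involves the minimum over all faces.

```r
## Standardized quantities for the UB2 analysis (continuing the sketch).
Q_eig   <- eigen(base_A, symmetric = TRUE)
Q_isqrt <- Q_eig$vectors %*% diag(1 / sqrt(Q_eig$values)) %*% t(Q_eig$vectors)
K   <- Q_isqrt %*% P %*% Q_isqrt          # K = Q^{-1/2} P Q^{-1/2}
v_j <- drop(Q_isqrt %*% r_star(cbar_j))   # v_j = Q^{-1/2} r*_j

g_j <- function(t) {                      # g_j(t) = v^T (K + tI)^{-2} v
  B <- solve(K + t * diag(p))
  drop(t(v_j) %*% B %*% B %*% v_j)
}
Delta <- g_j(1 / low)                     # single-face stand-in for Delta
ub2_t <- function(t) (t / 2) * (g_j(t) - Delta)
```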

Remark 5.5.5 (Derivatives of \(g_j\) and \(\tilde{\mathrm{UB2}}_j\)).
For all \(t > 0\), \[ g_j'(t) = -2\, v_j^{T}(K+tI)^{-3} v_j, \qquad g_j''(t) = 6\, v_j^{T}(K+tI)^{-4} v_j, \] and \[ \tilde{\mathrm{UB2}}_j'(t) = \frac{1}{2}\bigl(g_j(t) - \Delta\bigr) + \frac{t}{2}\,g_j'(t), \qquad \tilde{\mathrm{UB2}}_j''(t) = g_j'(t) + \frac{t}{2}\,g_j''(t). \]

Proof. Differentiating \((K+tI)^{-2}\) yields \(\frac{d}{dt}(K+tI)^{-2} = -2(K+tI)^{-3}\), and differentiating once more gives \(\frac{d^2}{dt^2}(K+tI)^{-2} = 6(K+tI)^{-4}\). Applying these formulas inside the quadratic form \(v_j^{T}(\cdot)v_j\) gives the derivatives of \(g_j(t)\), and differentiating the expression in Remark 5.5.4 yields the formulas for \(\tilde{\mathrm{UB2}}_j'(t)\) and \(\tilde{\mathrm{UB2}}_j''(t)\). \(\square\)
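
A finite-difference comparison gives a quick check of the first derivative formula; the evaluation point \(t_0\) below is arbitrary.

```r
## Finite-difference check of g_j'(t) from Remark 5.5.5 (continuing the sketch).
gp_j <- function(t) {                     # g_j'(t) = -2 v^T (K + tI)^{-3} v
  B <- solve(K + t * diag(p))
  -2 * drop(t(v_j) %*% B %*% B %*% B %*% v_j)
}
t0 <- 0.7; h <- 1e-5
c(analytic = gp_j(t0),
  numeric  = (g_j(t0 + h) - g_j(t0 - h)) / (2 * h))  # should agree closely
```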

Remark 5.5.6 (Eigenvalue weighted‑average identity).
Let \(\mu_i = \lambda_i + t\) denote the eigenvalues of \(K + tI\), where \(\lambda_i\) are the eigenvalues of \(K\). Then \[ \frac{ v_j^{T}(K+tI)^{-2}v_j }{ v_j^{T}(K+tI)^{-3}v_j } \;=\; \sum_i \rho_i \mu_i, \] where \(\rho_i \ge 0\) and \(\sum_i \rho_i = 1\). In particular, the ratio is a weighted average of the \(\mu_i\) and therefore lies in the interval \([\min_i \mu_i,\,\max_i \mu_i]\).

Proof. Let \(\{e_i\}\) be an orthonormal eigenbasis of \(K+tI\) and set \(w_i = (e_i^{T} v_j)^2 \ge 0\), the squared coordinates of \(v_j\) (these spectral weights are distinct from the observation weights \(w_1,\ldots,w_n\)); then \(\sum_i w_i = \|v_j\|^2\). In this basis, \[ v_j^{T}(K+tI)^{-2}v_j = \sum_i w_i \mu_i^{-2}, \qquad v_j^{T}(K+tI)^{-3}v_j = \sum_i w_i \mu_i^{-3}. \] Hence \[ \frac{ v_j^{T}(K+tI)^{-2}v_j }{ v_j^{T}(K+tI)^{-3}v_j } = \frac{\sum_i w_i \mu_i^{-2}}{\sum_i w_i \mu_i^{-3}} = \frac{\sum_i (w_i \mu_i^{-3})\,\mu_i}{\sum_i w_i \mu_i^{-3}} = \sum_i \rho_i \mu_i, \] with \(\rho_i = \frac{w_i \mu_i^{-3}}{\sum_k w_k \mu_k^{-3}} \ge 0\) and \(\sum_i \rho_i = 1\). \(\square\)
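
Continuing the sketch, the weighted-average identity pins the ratio between the extreme eigenvalues of \(K + tI\), which is easily verified numerically.

```r
## Numeric check of Remark 5.5.6 (continuing the sketch above).
t0  <- 0.7
B   <- solve(K + t0 * diag(p))
num <- drop(t(v_j) %*% B %*% B %*% v_j)        # v^T (K + tI)^{-2} v
den <- drop(t(v_j) %*% B %*% B %*% B %*% v_j)  # v^T (K + tI)^{-3} v
mu_ev <- eigen(K, symmetric = TRUE)$values + t0
c(lower = min(mu_ev), ratio = num / den, upper = max(mu_ev))
```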

Remark 5.5.7 (Critical points of \(\tilde{\mathrm{UB2}}_j\)). Suppose \(\Delta > 0\) and \(r^{*}_j \neq 0\). If \(t^{*} > 0\) is a critical point of \(\tilde{\mathrm{UB2}}_j(t)\), i.e. \(\tilde{\mathrm{UB2}}_j'(t^{*}) = 0\), then: \[ t^{*} = \frac{ v_j^{T}(K + t^{*}I)^{-2}v_j - \Delta }{ 2\, v_j^{T}(K + t^{*}I)^{-3}v_j }, \] and \(t^{*}\) lies strictly below the smallest eigenvalue of \(K\): \[ t^{*} I \;\prec\; K, \qquad t^{*} < \lambda_{\min}(K). \]

Proof. From Remark 5.5.4 and Remark 5.5.5, \[ \tilde{\mathrm{UB2}}_j'(t) = \frac{1}{2}\bigl(g_j(t) - \Delta\bigr) + \frac{t}{2}\,g_j'(t), \] so \(\tilde{\mathrm{UB2}}_j'(t^{*}) = 0\) implies \[ g_j(t^{*}) - \Delta = -\,t^{*} g_j'(t^{*}) = 2 t^{*}\, v_j^{T}(K + t^{*}I)^{-3} v_j, \] and hence \[ t^{*} = \frac{ g_j(t^{*}) - \Delta }{ 2\, v_j^{T}(K + t^{*}I)^{-3} v_j } = \frac{ v_j^{T}(K + t^{*}I)^{-2}v_j - \Delta }{ 2\, v_j^{T}(K + t^{*}I)^{-3}v_j }. \]

By Remark 5.5.6, we can write \[ \frac{ v_j^{T}(K+t^{*}I)^{-2}v_j }{ v_j^{T}(K+t^{*}I)^{-3}v_j } = \sum_i \rho_i \mu_i, \qquad \mu_i = \lambda_i + t^{*}, \] where \(\lambda_i\) are the eigenvalues of \(K\) and the \(\rho_i\) form a probability vector. Since \(\Delta > 0\), the implicit equation for \(t^{*}\) gives \[ \sum_i \rho_i \mu_i \;=\; 2t^{*} + \frac{\Delta}{v_j^{T}(K+t^{*}I)^{-3}v_j} \;>\; 2t^{*}. \] Substituting \(\mu_i = \lambda_i + t^{*}\) yields \[ \sum_i \rho_i \lambda_i \;>\; t^{*}. \] Since the \(\rho_i\) form a probability vector with \(\lambda_{\min}(K) \le \sum_i \rho_i \lambda_i\), this places the critical point below the spectrum of \(K\): \(t^{*} < \lambda_{\min}(K)\), and hence \(t^{*} I \prec K\). \(\square\)


Remark 5.5.8 (Inflection points of \(\tilde{\mathrm{UB2}}_j\)). Suppose \(r^{*}_j \neq 0\). If \(t^{*} > 0\) is an inflection point of \(\tilde{\mathrm{UB2}}_j(t)\), i.e. \(\tilde{\mathrm{UB2}}_j''(t^{*}) = 0\), then \[ t^{*} = \frac{2}{3}\sum_i \alpha_i \mu_i, \] for some weights \(\alpha_i \ge 0\) with \(\sum_i \alpha_i = 1\), where \(\mu_i = \lambda_i + t^{*}\) are the eigenvalues of \(K + t^{*}I\). In particular, \[ t^{*} \;\ge\; 2\,\lambda_{\min}(K). \]

Proof. From Remark 5.5.5, \[ \tilde{\mathrm{UB2}}_j''(t) = g_j'(t) + \frac{t}{2}\,g_j''(t) = -2\, v_j^{T}(K+tI)^{-3} v_j + 3t\, v_j^{T}(K+tI)^{-4} v_j. \] Writing this in the eigenbasis of \(K+tI\), let \(\mu_i = \lambda_i + t\) denote the eigenvalues of \(K+tI\) and write \[ v_j^{T}(K+tI)^{-3} v_j = \sum_i w_i \mu_i^{-3}, \qquad v_j^{T}(K+tI)^{-4} v_j = \sum_i w_i \mu_i^{-4}, \] for some weights \(w_i \ge 0\) (not all zero). Then \[ \tilde{\mathrm{UB2}}_j''(t) = \sum_i w_i\Bigl(-2\,\mu_i^{-3} + 3t\,\mu_i^{-4}\Bigr) = \sum_i w_i \mu_i^{-4}\,\bigl(-2\mu_i + 3t\bigr). \] At an inflection point \(t^{*}\), we have \(\tilde{\mathrm{UB2}}_j''(t^{*}) = 0\), so \[ 0 = \sum_i w_i \mu_i^{-4}\,\bigl(-2\mu_i + 3t^{*}\bigr). \] Define normalized weights \[ \alpha_i = \frac{w_i \mu_i^{-4}}{\sum_k w_k \mu_k^{-4}}, \qquad \alpha_i \ge 0,\quad \sum_i \alpha_i = 1. \] Dividing by \(\sum_k w_k \mu_k^{-4} > 0\) gives \[ 0 = \sum_i \alpha_i\,\bigl(-2\mu_i + 3t^{*}\bigr) = -2\sum_i \alpha_i \mu_i + 3t^{*}, \] so \[ 3t^{*} = 2\sum_i \alpha_i \mu_i \quad\Longrightarrow\quad t^{*} = \frac{2}{3}\sum_i \alpha_i \mu_i. \] In particular, \[ \sum_i \alpha_i \mu_i \;\ge\; \min_i \mu_i = \lambda_{\min}(K) + t^{*}, \] so \[ t^{*} = \frac{2}{3}\sum_i \alpha_i \mu_i \;\ge\; \frac{2}{3}\bigl(\lambda_{\min}(K) + t^{*}\bigr). \] Rearranging, \[ t^{*} - \frac{2}{3}t^{*} \;\ge\; \frac{2}{3}\lambda_{\min}(K) \quad\Longrightarrow\quad \frac{1}{3}t^{*} \;\ge\; \frac{2}{3}\lambda_{\min}(K) \quad\Longrightarrow\quad t^{*} \;\ge\; 2\,\lambda_{\min}(K), \] which proves the claimed lower bound. \(\square\)
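
The eigen-expansion of \(\tilde{\mathrm{UB2}}_j''\) used in the proof also gives a direct numerical illustration of the bound: continuing the sketch above, any sign change of the second derivative should occur at or above \(2\,\lambda_{\min}(K)\).

```r
## Numeric illustration of Remark 5.5.8 (continuing the sketch above).
K_eig  <- eigen(K, symmetric = TRUE)
lam    <- K_eig$values
w_spec <- drop(t(K_eig$vectors) %*% v_j)^2     # spectral weights of v_j
ub2_dd <- function(t) sum(w_spec * (lam + t)^(-4) * (t - 2 * lam))
t_grid <- seq(0.01, 10, length.out = 500)
sgn    <- sign(sapply(t_grid, ub2_dd))
first_change <- t_grid[which(diff(sgn) != 0)][1]  # NA if no sign change
c(first_inflection = first_change, bound = 2 * min(lam))
```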

5.5.5 Claim 7: UB2 derivatives and endpoint minimization

We now state the main result about the shape of \(\tilde{\mathrm{UB2}}_j(t)\) and the location of its minima.

Claim 7.
Suppose \(\Delta > 0\) and \(r^{*}_j \neq 0\). Then:

  1. Any critical point \(t^{*} > 0\) of \(\tilde{\mathrm{UB2}}_j(t)\) satisfies \[ t^{*} = \frac{ v_j^{T}(K + t^{*}I)^{-2}v_j - \Delta }{ 2\, v_j^{T}(K + t^{*}I)^{-3}v_j }, \] and lies strictly below the smallest eigenvalue of \(K\): \[ t^{*} I \;\prec\; K, \qquad t^{*} < \lambda_{\min}(K). \]

  2. Any inflection point \(t^{*} > 0\) (where \(\tilde{\mathrm{UB2}}_j''(t^{*}) = 0\)) satisfies \[ 2\,\lambda_{\min}(K) \;\le\; t^{*}, \] i.e. \(t^{*}\) is at least twice the smallest eigenvalue of \(K\).

  3. Consequently, every critical point lies strictly below every inflection point and falls in the region where \(\tilde{\mathrm{UB2}}_j\) is strictly concave. Over any interval \(t \in [1/\mathrm{upp},\,1/\mathrm{low}]\), the minimum of \(\tilde{\mathrm{UB2}}_j(t)\) must therefore occur at an endpoint. Equivalently, the minimum of \(\mathrm{UB2}_j(d)\) over \(d \in [\mathrm{low},\,\mathrm{upp}]\) occurs at \(d = \mathrm{low}\) or \(d = \mathrm{upp}\).

Proof of Claim 7.
Part 1 follows directly from Remark 5.5.7: at any critical point \(t^{*}\) of \(\tilde{\mathrm{UB2}}_j(t)\) with \(\Delta > 0\) and \(r^{*}_j \neq 0\), we have the stated implicit equation and the strict inequality \(t^{*} < \lambda_{\min}(K)\).

Part 2 follows from Remark 5.5.8: at any inflection point \(t^{*}\), \(\tilde{\mathrm{UB2}}_j''(t^{*}) = 0\) implies \(t^{*} \ge 2\,\lambda_{\min}(K)\).

For part 3, combine these bounds with the eigen‑expansion of \(\tilde{\mathrm{UB2}}_j''\) from the proof of Remark 5.5.8: \[ \tilde{\mathrm{UB2}}_j''(t) = \sum_i w_i \mu_i^{-4}\,\bigl(t - 2\lambda_i\bigr), \] which is strictly negative whenever \(t < 2\,\lambda_{\min}(K)\). By part 1, every critical point satisfies \(t^{*} < \lambda_{\min}(K) < 2\,\lambda_{\min}(K)\) (note \(\lambda_{\min}(K) > 0\) because \(P \succ 0\)), so \(\tilde{\mathrm{UB2}}_j''(t^{*}) < 0\) and every interior critical point is a strict local maximum; by part 2, it lies strictly below every inflection point. On the compact interval \([1/\mathrm{upp},\,1/\mathrm{low}]\), the minimum of the smooth function \(\tilde{\mathrm{UB2}}_j\) is attained either at an interior critical point or at an endpoint; interior critical points are strict local maxima, so the minimum occurs at an endpoint. The equivalent statement for \(\mathrm{UB2}_j(d)\) follows from the reparameterization \(t = 1/d\). \(\square\)
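
Finally, Claim 7's endpoint property can be observed directly. Continuing the sketch above (one simulated face, stand-in \(\Delta\)), the minimum of \(\mathrm{UB2}_j(d)\) over a grid on \([\mathrm{low}, \mathrm{upp}]\) falls on an endpoint.

```r
## Numeric check of Claim 7, part 3 (continuing the sketch above).
ub2_d  <- function(d) ub2_t(1 / d)        # UB2_j(d) = UB2~_j(1/d)
d_grid <- seq(low, upp, length.out = 200)
vals   <- sapply(d_grid, ub2_d)
which.min(vals) %in% c(1, length(d_grid)) # TRUE: minimum at an endpoint
```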

References

Nygren, K. N., and L. M. Nygren. 2006. “Likelihood Subgradient Densities.” Journal of the American Statistical Association 101 (475): 1144–56. https://doi.org/10.1198/016214506000000357.