4. Main Proposition and Supporting Claims
The central goal of this section is to prepare for an
accept–reject sampling algorithm for the joint
posterior \(\pi(\beta,d)\).
In such algorithms, we need two ingredients:
- Proposal distributions — simple, tractable
distributions from which we can easily generate candidate values for
\((\beta,d)\).
- Correction terms — adjustments that ensure the proposals form valid upper bounds on the true posterior, so that the accept–reject step is mathematically justified.
To make this work, we will express the log‑posterior in a form that
separates these two roles:
- the proposal part, which generates candidates,
and
- the correction part, which determines the acceptance
probability.
Before we can state the main proposition that formalizes this decomposition, we need to introduce the constants and auxiliary quantities that appear in both the proposals and the corrections. These constants:
- Define the dispersion bounds, tangency maps, and slopes that
structure the proposals.
- Encode bounding line parameters and extrapolated constants that
guarantee the correction terms are non‑negative.
- Provide a bridge between the mathematical formulation and the implementation, since many of these constants correspond directly to terms in the code.
By laying out these constants first, we ensure that every symbol used in the proposition and its supporting claims is defined explicitly and consistently. The table below serves as a reference point for all subsequent derivations.
| Constant | Definition | Description |
|---|---|---|
| Posterior Gamma parameters | \(\text{shape}_2 = \text{Shape} + \tfrac{n_{w}}{2}\); \(\text{rate}_2 = \text{Rate} + \mathrm{RSS}_{\text{post}}/2\) | Posterior shape and rate parameters. |
| Envelope dispersion anchor | \(d^{*}_{1} = \dfrac{\text{rate}_2}{\text{shape}_2 - 1}\) | Dispersion used for baseline envelope. |
| Dispersion bounds | \(\text{low},\ \text{upp}\) | Lower and upper bounds for dispersion (see Remark 4.1.1 for the default construction). |
| Tangency offset vector | \(B(d) = P\mu + \dfrac{1}{d}\,X^\top W(\alpha - y)\) | Offset vector used in the inverse map \(c^{-1}(\bar c, d)\). |
| Tangency slope components | \(\begin{aligned} Q &= X^\top X \\ A(d) &= Q + dP \\ r &= X^\top(y-\alpha) \\ V(d) &= \bar c_j - P\,A(d)^{-1}(r + d\,\bar c_j) \end{aligned}\) | Components for quad–linear slope term. |
| Tangency inverse map | \(c^{-1}(\bar c, d) = A(d)^{-1}(\bar c - B(d))\) | Maps gradient vector \(\bar c\) and dispersion \(d\) to tangency point \(\theta\). |
| Tangency map | \(\theta(d) = c^{-1}(\bar c_j, d)\) | Tangency point in coefficient space at dispersion \(d\). |
| Tangency face energy | \(g_{1j}(d) = -\tfrac12\,\theta(d)^\top P\,\theta(d) + \bar c_j^\top \theta(d)\) | Quadratic–linear face energy at dispersion \(d\). |
| Baseline face constant | \(g_{1j}(d^{*}_{1})\) | Face‑specific constant obtained by evaluating \(g_{1j}(d)\) at the anchor \(d^{*}_{1}\). |
| Derivative of face energy | \(\begin{aligned} & g'_{1j}(d) = V(d)^\top A(d)^{-1}\,\bar c_j \\[6pt] & \qquad -\, \Big( V(d)^\top A(d)^{-1} P\,A(d)^{-1} \\[2pt] & \qquad\qquad\qquad\;\;\times \big(r + d\,\bar c_j\big) \Big) \end{aligned}\) | Derivative of \(g_{1j}(d)\) with respect to dispersion. |
| Derivative at the anchor | \(\begin{aligned} & g'_{1j}(d^{*}_{1}) = V(d^{*}_{1})^\top A(d^{*}_{1})^{-1}\,\bar c_j \\[6pt] & \qquad -\, \Big( V(d^{*}_{1})^\top A(d^{*}_{1})^{-1} P\,A(d^{*}_{1})^{-1} \\[2pt] & \qquad\qquad\qquad\;\;\times \big(r + d^{*}_{1}\,\bar c_j\big) \Big) \end{aligned}\) | Value of the derivative at the envelope anchor \(d^{*}_{1}\). |
| Mean quad–linear slope | \(\mathrm{m}_{g'_{1}} = \displaystyle \operatorname*{mean}\limits_{j}\!\big(g'_{1j}(d^{*}_{1})\big)\) | Average derivative \(g'_{1j}(d^{*}_{1})\) across faces. |
| Supporting line for face \(j\) | \(g_{2j}(d) = g_{1j}(d^{*}_{1}) + (d - d^{*}_{1})\,g'_{1j}(d^{*}_{1})\) | Linear supporting line of \(g_{1j}(d)\) at the anchor \(d^{*}_{1}\). |
| Extrapolated face constants | \(\begin{aligned} & g_{2j}(\text{upp}) = g_{1j}(d^{*}_{1}) \\[2pt] & \qquad\qquad\;\; +\, (\text{upp}-d^{*}_{1})\, g'_{1j}(d^{*}_{1}) \\[6pt] & g_{2j}(\text{low}) = g_{1j}(d^{*}_{1}) \\[2pt] & \qquad\qquad\;\; +\, (\text{low}-d^{*}_{1})\, g'_{1j}(d^{*}_{1}) \end{aligned}\) | Linear extrapolations to dispersion bounds. |
| Endpoint maxima | \(\begin{aligned} \mathrm{max\_upp} & = \max_j\!\big(g_{2j}(\text{upp})\big) \\[8pt] \mathrm{max\_low} & = \max_j\!\big(g_{2j}(\text{low})\big) \end{aligned}\) | Maxima at upper and lower dispersion bounds. |
| Mean lower-bound maximum | \(\begin{aligned} &\mathrm{max\_low\_mean} \\[4pt] &= \mathrm{max\_upp} \\[4pt] &\quad - \mathrm{m}_{g'_{1}}\,(\text{upp}-\text{low}) \end{aligned}\) | Linearized lower-bound maximum. |
| Global line parameters | \(\text{lmc}_2 = \dfrac{\mathrm{max\_upp} - \mathrm{max\_low\_mean}}{\text{upp}-\text{low}}\); \(\text{lmc}_1 = \mathrm{max\_low\_mean} - \text{lmc}_2\,\text{low}\) | Slope and intercept of global affine bound. |
| Log–linear anchor | \(d^{*}_{2} = \dfrac{\text{upp}-\text{low}}{\log(\text{upp}/\text{low})}\) | Anchor point for log–tilt. |
| Log–tilt coefficients | \(\begin{aligned} \mathrm{lm\_log2} &= \text{lmc}_2\,d^{*}_{2} \\[4pt] \mathrm{lm\_log1} &= \text{lmc}_1 + \text{lmc}_2\,\text{upp} - \mathrm{lm\_log2}\log(\text{upp}) \\[4pt] \mathrm{max\_LL\_log\_disp} &= \mathrm{lm\_log1} + \mathrm{lm\_log2}\log(\text{upp}) \end{aligned}\) | Coefficients for the log–tilt bounding function, calibrated so that \(\mathrm{lm\_log1} + \mathrm{lm\_log2}\log d\) agrees with \(\text{lmc}_1 + \text{lmc}_2\,d\) at both dispersion bounds. |
| Face-specific RSS | \(\mathrm{RSS}_j(d) = \sum_{i=1}^n w_i\,(y_i - x_i^\top c^{-1}(\bar c_j,d))^2\) | Residual sum of squares for face \(j\) at dispersion \(d\). |
| Global minimum RSS | \(\mathrm{RSS\_Min} = \min_{j}\;\min_{d\in[\text{low},\,\text{upp}]}\;\mathrm{RSS}_j(d)\) | Global minimum RSS across all faces and dispersion values. |
| UB2 term | \(\mathrm{UB2}_j(d) = \dfrac{1}{2d}\big(\mathrm{RSS}_j(d) - \mathrm{RSS\_Min}\big)\) | Nonnegative UB2 bound for face \(j\). |
| Per-face UB2 minimum | \(\mathrm{UB2\_Min}_j = \min_{d\in[\text{low},\,\text{upp}]}\mathrm{UB2}_j(d)\) | Minimum UB2 value for face \(j\). |
| Per‑face shift (UB3A) | \(\begin{aligned} & \mathrm{lg\_prob\_factor1}_{j} = \\[4pt] & \max\Big\{\, g_{2j}(\text{upp}) - \mathrm{max\_upp}, \\[-2pt] & \qquad\;\; g_{2j}(\text{low}) - \mathrm{max\_low} \Big\} \end{aligned}\) | Raw per‑face shift used in UB3A construction. |
| Per‑face shift (PLSD) | \(\begin{aligned} & \mathrm{lg\_prob\_factor2}_{j} = \\[4pt] & \mathrm{lg\_prob\_factor1}_{j} \;-\; \mathrm{UB2\_Min}_{j} \end{aligned}\) | UB2‑adjusted shift used in PLSD mixture weights. |
| Global affine bound \(g3_j\) | \(\displaystyle g3_{j}(d) = \mathrm{lg\_prob\_factor1}_{j} + \mathrm{lmc}_1 + \mathrm{lmc}_2\, d\) | Global affine upper bound for the quadratic–linear face energy. |
| Gamma proposal parameters | \(\text{shape}_3 = \text{shape}_2 - \mathrm{lm\_log2}\); \(\text{rate}_3 = \text{Rate} + \mathrm{RSS\_Min}/2\) | Adjusted Gamma proposal parameters. |
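To connect the table to code, here is a hypothetical Python sketch (with made‑up data) of the tangency map \(c^{-1}(\bar c_j, d)\), the face energy \(g_{1j}(d)\), and the anchor \(d^{*}_{1}\); the names `X`, `y`, `w`, `P`, `mu`, `alpha`, `cbar_j`, `shape2`, `rate2` are stand‑ins for the quantities defined elsewhere in the paper, not the actual implementation.

```python
import numpy as np

# Made-up stand-ins for the model quantities.
rng = np.random.default_rng(3)
n, p = 15, 2
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = np.ones(n)            # diagonal of the weight matrix W
P = np.eye(p)             # prior precision
mu = np.zeros(p)          # prior mean
alpha = np.zeros(n)       # offset vector
cbar_j = np.array([0.5, -0.3])   # gradient vector for one face

Q = X.T @ X                      # tangency slope component Q
r = X.T @ (y - alpha)            # component r (enters the derivative formulas)

def theta_of_d(d):
    """Tangency inverse map c^{-1}(cbar_j, d) = A(d)^{-1}(cbar_j - B(d))."""
    A = Q + d * P
    B = P @ mu + (1.0 / d) * X.T @ (w * (alpha - y))
    return np.linalg.solve(A, cbar_j - B)

def g1(d):
    """Tangency face energy g_{1j}(d) = -1/2 theta' P theta + cbar_j' theta."""
    th = theta_of_d(d)
    return -0.5 * th @ P @ th + cbar_j @ th

shape2, rate2 = 4.0, 3.0               # stand-ins for the posterior Gamma parameters
dstar1 = rate2 / (shape2 - 1.0)        # envelope dispersion anchor d*_1
energy = g1(dstar1)                    # baseline face constant g_{1j}(d*_1)
```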
Remark 4.1.1
The envelope uses the dispersion variable \(d
\equiv \phi = 1/\tau\), where \(\tau\) is the precision appearing in
Sections 1.1–1.2. When the user does not supply bounds, the default
implementation truncates the surrogate posterior
\[
\tau \mid y \sim \Gamma(\text{shape}_2,\text{rate}_2)
\] by selecting the \(1-\mathrm{max\_disp\_perc}\) and \(\mathrm{max\_disp\_perc}\) quantiles (default \(\mathrm{max\_disp\_perc}=0.99\)): \[
\tau_{\min} =
qgamma(1-\mathrm{max\_disp\_perc},\text{shape}_2,\text{rate}_2),
\qquad
\tau_{\max} =
qgamma(\mathrm{max\_disp\_perc},\text{shape}_2,\text{rate}_2).
\] The dispersion bounds used in the envelope are then \[
\text{low} = \frac{1}{\tau_{\max}},
\qquad
\text{upp} = \frac{1}{\tau_{\min}}.
\]
These bounds exclude only the far tails of the surrogate Gamma posterior for \(\tau\) while still restricting \(d\) enough for the global log‑tilt bound to dominate the linear supporting line \(g_{2j}(d)\) across the entire interval.
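A minimal sketch of this default bound construction, assuming SciPy's Gamma quantile function plays the role of `qgamma` (SciPy parameterizes the Gamma by scale \(=1/\text{rate}\)); the parameter values are illustrative only.

```python
from scipy.stats import gamma

def dispersion_bounds(shape2, rate2, max_disp_perc=0.99):
    """Truncate the surrogate Gamma posterior for tau and invert to d = 1/tau."""
    # Lower and upper quantiles of tau ~ Gamma(shape2, rate2).
    tau_min = gamma.ppf(1.0 - max_disp_perc, shape2, scale=1.0 / rate2)
    tau_max = gamma.ppf(max_disp_perc, shape2, scale=1.0 / rate2)
    # Dispersion bounds: low = 1/tau_max, upp = 1/tau_min.
    return 1.0 / tau_max, 1.0 / tau_min

low, upp = dispersion_bounds(shape2=5.0, rate2=2.0)
```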
Remark 4.1.2
By construction, the global affine bound satisfies \[
\mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{low} = \mathrm{max\_low\_mean},
\qquad
\mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{upp} = \mathrm{max\_upp}.
\]
Proof of Remark 4.1.2
From the definitions of the global line parameters, \[ \mathrm{lmc}_2 = \frac{\mathrm{max\_upp} - \mathrm{max\_low\_mean}}{\text{upp} - \text{low}}, \qquad \mathrm{lmc}_1 = \mathrm{max\_low\_mean} - \mathrm{lmc}_2\,\text{low}, \] we verify the endpoint identities as follows.
At \(d = \text{low}\), \[ \mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{low} = \big(\mathrm{max\_low\_mean} - \mathrm{lmc}_2\,\text{low}\big) + \mathrm{lmc}_2\,\text{low} = \mathrm{max\_low\_mean}. \]
At \(d = \text{upp}\), \[ \begin{aligned} \mathrm{lmc}_1 + \mathrm{lmc}_2\,\text{upp} &= \mathrm{max\_low\_mean} - \mathrm{lmc}_2\,\text{low} + \mathrm{lmc}_2\,\text{upp} \\ &= \mathrm{max\_low\_mean} + \mathrm{lmc}_2(\text{upp} - \text{low}) \\ &= \mathrm{max\_low\_mean} + \frac{\mathrm{max\_upp} - \mathrm{max\_low\_mean}}{\text{upp} - \text{low}}(\text{upp} - \text{low}) \\ &= \mathrm{max\_upp}. \end{aligned} \]
Thus the global affine function \(d \mapsto \mathrm{lmc}_1 + \mathrm{lmc}_2\,d\) interpolates \(\mathrm{max\_low\_mean}\) at \(\text{low}\) and \(\mathrm{max\_upp}\) at \(\text{upp}\).
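The interpolation argument can also be confirmed numerically; the endpoint values below are arbitrary stand‑ins for the endpoint maxima.

```python
# With slope (v_upp - v_low)/(upp - low) and intercept v_low - slope*low,
# the affine map reproduces the two endpoint values exactly.
low, upp = 0.5, 3.0
v_low, v_upp = -1.2, 2.7          # illustrative endpoint values
lmc2 = (v_upp - v_low) / (upp - low)
lmc1 = v_low - lmc2 * low
at_low = lmc1 + lmc2 * low        # recovers v_low
at_upp = lmc1 + lmc2 * upp        # recovers v_upp
```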
Remark 4.1.3
For the envelope to define a valid sampler we require \(\mathrm{lm\_log2} < \text{shape}_2\), so that the adjusted shape \(\text{shape}_3 = \text{shape}_2 - \mathrm{lm\_log2}\) remains positive. In our implementation we impose the stronger requirement \(\mathrm{lm\_log2} \leq \tfrac{n_{w}}{2}\) and adjust the global bound as needed while still maintaining validity of the sampler (see the implementation for details).
4.2 Proposal distributions
In the accept–reject scheme, proposals are the distributions we can sample from directly to generate candidate values. We use two proposals:
- A Gamma-based proposal for dispersion \(d\) (implemented as a truncated
inverse‑Gamma in \(d\), equivalently a
truncated Gamma in \(1/d\)).
- A mixture of truncated normal proposals for \(\beta\), sampled in two steps:
- First, sample a face index \(j\)
using the per‑face mixture weights \(\text{PLSD}_j\).
- Then, conditional on \(j\), sample \(\beta\) from the face‑specific truncated normal.
These proposals are constructed to be tractable for sampling and to support tight bounds via the correction terms.
Gamma proposal in dispersion \(d\) (with truncation)
For dispersion \(d \in [\,\text{low},\,\text{upp}\,]\), define the truncated inverse‑Gamma proposal density (equivalently, a truncated Gamma in \(1/d\)):
\[ \begin{aligned} \log q_\Gamma^{\text{trunc}}(d) &= \Big(\text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}\Big)\,\log\!\Big(\text{Rate} + \tfrac{\mathrm{RSS\_Min}}{2}\Big) \\ &\quad - \log \Gamma\!\Big(\text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2}\Big) \\[6pt] &\quad - \Big(\text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2} + 1\Big)\,\log d - \frac{\text{Rate} + \tfrac{\mathrm{RSS\_Min}}{2}}{d} \\[6pt] &\quad - \log\!\Bigg( F_\Gamma\!\Big(\tfrac{1}{\text{low}};\,\text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2},\, \text{Rate} + \tfrac{\mathrm{RSS\_Min}}{2}\Big) \\[6pt] &\quad -\; F_\Gamma\!\Big(\tfrac{1}{\text{upp}};\,\text{Shape} + \tfrac{n_{w}}{2} - \mathrm{lm\_log2},\, \text{Rate} + \tfrac{\mathrm{RSS\_Min}}{2}\Big) \Bigg), \end{aligned} \]
where \(F_\Gamma(\cdot;\,\text{shape},\,\text{rate})\) denotes the Gamma CDF in the shape–rate parameterization, evaluated at \(1/d\).
Sampling proceeds by drawing \(u \sim
\text{Uniform}(0,1)\) on the truncated CDF interval and inverting
to obtain \(1/d\), then mapping to
\(d\).
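A sketch of this inverse‑CDF draw, assuming SciPy is available; the values of \(\text{shape}_3\), \(\text{rate}_3\), and the bounds below are illustrative only.

```python
import numpy as np
from scipy.stats import gamma

def sample_dispersion(shape3, rate3, low, upp, rng):
    """Draw d from the truncated inverse-Gamma by sampling tau = 1/d
    from a Gamma(shape3, rate3) truncated to [1/upp, 1/low]."""
    dist = gamma(shape3, scale=1.0 / rate3)
    # Uniform draw on the truncated CDF range for tau = 1/d.
    u = rng.uniform(dist.cdf(1.0 / upp), dist.cdf(1.0 / low))
    tau = dist.ppf(u)        # invert the CDF
    return 1.0 / tau         # map back to dispersion

rng = np.random.default_rng(0)
d = sample_dispersion(shape3=4.5, rate3=3.0, low=0.2, upp=2.0, rng=rng)
```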
Mixture of truncated normals for \(\beta\) (two-step sampling)
We construct a mixture over faces, each with a truncated normal component. Sampling is performed in two steps:
- Sample the face index \(j\) (mixture weights):
\[ \begin{aligned} \log \mathrm{PLSD}_j &= \mathrm{lg\_prob\_factor2}_j + \tfrac{1}{2}\,\bar c_j^\top \bar c_j + \sum_{r=1}^p \log\!\big(e^{\mathrm{logrt}_{j,r}} - e^{\mathrm{loglt}_{j,r}}\big) \\[6pt] &\quad - \log\!\left( \sum_{k=1}^K \exp\!\Big( \mathrm{lg\_prob\_factor2}_k + \tfrac{1}{2}\,\bar c_k^\top \bar c_k \Big) \prod_{r=1}^p \big(e^{\mathrm{logrt}_{k,r}} - e^{\mathrm{loglt}_{k,r}}\big) \right). \end{aligned} \]
- Exponentiating and normalizing across \(j=1,\dots,K\) yields a valid categorical distribution for the face index.
- Sample \(\beta\)
conditional on \(j\) (truncated
normal): \[
\log q_{\text{TN}}(\beta\mid j)
= -\tfrac{1}{2}\,\beta^\top \beta
- \bar c_j^\top \beta
- \tfrac{1}{2}\,\bar c_j^\top \bar c_j
+ \tfrac{p}{2}\log(2\pi)
- \sum_{r=1}^p \log\!\big(e^{\text{logrt}_{j,r}} -
e^{\text{loglt}_{j,r}}\big).
\]
- This corresponds to a normal with mean \(-\bar c_j\) and identity covariance,
truncated coordinate‑wise to intervals \((e^{\text{loglt}_{j,r}},\,e^{\text{logrt}_{j,r}})\).
- Sampling can be performed via independent truncated univariate normals in each coordinate, with acceptance on the rectangle.
These proposals provide efficient candidate generation: \(d\) from the truncated inverse‑Gamma, and \(\beta\) from a face‑indexed mixture of truncated normals. In the accept–reject algorithm, the corresponding correction terms will ensure that the proposals upper‑bound the target, yielding valid acceptance probabilities.
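The two‑step sampling above can be sketched as follows; the per‑face inputs (unnormalized log weights standing in for \(\log\mathrm{PLSD}_j\), the \(\bar c_j\) vectors, and the truncation intervals) are made up for illustration.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_beta(lw, cbar, lt, rt, rng):
    # Step 1: face index from the normalized categorical
    # (subtract the max before exponentiating, for numerical stability).
    w = np.exp(lw - lw.max())
    j = rng.choice(len(lw), p=w / w.sum())
    # Step 2: coordinate-wise truncated normal with mean -cbar_j, unit variance.
    mean = -cbar[j]
    a, b = lt[j] - mean, rt[j] - mean   # scipy's truncnorm takes standardized bounds
    beta = truncnorm.rvs(a, b, loc=mean, scale=1.0, random_state=rng)
    return j, beta

rng = np.random.default_rng(1)
lw = np.array([-1.0, -0.5, -2.0])                        # unnormalized log weights
cbar = np.array([[0.3, -0.2], [0.1, 0.4], [-0.5, 0.0]])  # per-face gradient vectors
lt = np.full((3, 2), -1.0)                               # lower truncation bounds
rt = np.full((3, 2), 1.0)                                # upper truncation bounds
j, beta = sample_beta(lw, cbar, lt, rt, rng)
```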
4.3 Correction terms
In the accept–reject framework, the proposal
distributions generate candidate values, while the
correction terms ensure that the proposals form valid
upper bounds on the true posterior.
These corrections are carefully constructed so that:
- They are always non‑negative (or non‑positive in the case of
test1),
- They vanish or simplify at key anchor points, and
- They guarantee that the acceptance probability is well‑defined.
Together, they measure the “gap” between the proposal approximation and the exact log‑posterior.
| Term | Definition | Description |
|---|---|---|
| \(\text{test1}_{j}(\beta,d)\) | \[\begin{aligned} & LL(\beta,d) - \\[4pt] & \Big( LL(c^{-1}(\bar c_j,d),d) - \bar c_j^{\top}(\beta - c^{-1}(\bar c_j,d)) \Big) \\[6pt] &= \Bigg[ - \tfrac{1}{2d}\sum_{i=1}^n w_i\,(y_i - x_i^\top \beta)^2 - \tfrac{1}{2}\beta^\top P \beta \Bigg] \\[6pt] &\quad - \Bigg[ - \tfrac{1}{2d}\sum_{i=1}^n w_i\,(y_i - x_i^\top c^{-1}(\bar c_j,d))^2 \\[4pt] & -\tfrac{1}{2}\big(c^{-1}(\bar c_j,d)\big)^\top P\,c^{-1}(\bar c_j,d) \\[4pt] & - \bar c_j^{\top}\big(\beta - c^{-1}(\bar c_j,d)\big) \Bigg] \end{aligned}\] | Difference between the log‑likelihood and its linearization at the tangency point. By concavity, this term is always \(\leq 0\). |
| \(\text{RSS}_{j}(d)\) | \(\sum_{i=1}^n w_i\,(y_i - x_i^\top c^{-1}(\bar c_j,d))^2\) | Residual sum of squares for face \(j\) at dispersion \(d\). |
| \(\mathrm{RSS\_Min}\) | \[\begin{aligned} & \min_{j}\;\min\limits_{d \in [\text{low},\, \text{upp}]} \\[4pt] & \sum_{i=1}^n w_i \,\Big(y_i - x_i^\top c^{-1}(\bar c_j,d)\Big)^2 \end{aligned}\] | Global minimum residual sum of squares across all faces \(j\) and dispersion values \(d\). |
| \(\text{UB2}_{j}(d)\) | \(\tfrac{1}{2d}\Big(\sum_{i=1}^n w_i\,(y_i - x_i^\top c^{-1}(\bar c_j,d))^2 - \mathrm{RSS\_Min}\Big)\) | Bound relative to the global minimum residual sum of squares. Always \(\geq 0\). |
| \(\mathrm{UB2\_Min}_{j}\) | \(\min\limits_{d \in [\text{low},\, \text{upp}]} \text{UB2}_{j}(d)\) | Minimum UB2 value across the dispersion range. |
| \(\text{UB2A}_{j}(d)\) | \(\text{UB2}_{j}(d) - \mathrm{UB2\_Min}_{j}\) | Non‑negative UB2 adjustment. |
| \(\text{UB3A}_j(d)\) | \[\begin{aligned} &\mathrm{lg\_prob\_factor1}_{j} + \text{lmc}_1 + \text{lmc}_2\, d \\ &- \Big(-\tfrac{1}{2}\,c^{-1}(\bar c_j,d)^\top P\,c^{-1}(\bar c_j,d) \\[4pt] & + \bar c_j^\top c^{-1}(\bar c_j,d)\Big) \end{aligned}\] | Bound from the quadratic–linear face energy. Constructed so the global line dominates the exact quadratic–linear term. |
| \(\text{UB3B}(d)\) | \[\begin{aligned} &(\mathrm{max\_upp} - \mathrm{max\_LL\_log\_disp}) \\ &+ (\mathrm{lm\_log1} + \mathrm{lm\_log2} \cdot \log d) \\[4pt] & - (\text{lmc}_1 + \text{lmc}_2 \cdot d) \end{aligned}\] | Bound from the dispersion log–tilt construction. Ensures the global log–linear approximation dominates the true log‑dispersion curve. |
Here, the standardized log‑likelihood is
\[ LL(\beta,d) \;=\; -\tfrac{n_{w}}{2}\log d \;-\; \tfrac{1}{2d}\sum_{i=1}^n w_i\,(y_i - x_i^\top \beta)^2 \;-\; \tfrac{1}{2}\,\beta^\top P\,\beta. \]
Interpretation.
- test1 measures how far the log‑likelihood lies below its
tangent plane.
- UB2 enforces that the residual sum of squares at the
tangency point cannot beat the ML fit.
- UB3A ensures that the quadratic–linear face energy is
bounded above by a global line.
- UB3B ensures that the log‑dispersion contribution is
bounded above by a log–linear tilt.
Together, these correction terms guarantee that the proposals dominate the target posterior, making the accept–reject step valid.
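As a toy numerical illustration of the first correction term, the following sketch evaluates the concavity gap behind test1 for made‑up data; by concavity of the standardized log‑likelihood in \(\beta\) (at fixed \(d\)), the gap is always \(\le 0\).

```python
import numpy as np

# Made-up data; the -(n_w/2) log d term is constant in beta and cancels in the gap.
rng = np.random.default_rng(2)
n, p, d = 20, 3, 0.7
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(0.5, 1.5, size=n)
P = np.eye(p)

def LL(beta):
    """Standardized log-likelihood in beta at fixed dispersion d (up to constants)."""
    resid = y - X @ beta
    return -0.5 / d * np.sum(w * resid**2) - 0.5 * beta @ P @ beta

def grad_LL(beta):
    return (1.0 / d) * X.T @ (w * (y - X @ beta)) - P @ beta

theta = rng.normal(size=p)   # stand-in for the tangency point
g = grad_LL(theta)
beta = rng.normal(size=p)
# Concavity gap: function value minus its tangent plane at theta.
test1 = LL(beta) - (LL(theta) + g @ (beta - theta))
```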
4.4 Proposition: Log‑posterior decomposition in dispersion form
Proposition. For face index \(j\), dispersion \(d\), and tangency point \(\theta(d) = c^{-1}(\bar c_j,d)\),
the joint log‑posterior can be written as \[
\begin{aligned}
\log \pi(\beta,d)
&= \underbrace{\log q_\Gamma(d)}_{\text{Gamma proposal}}
+ \underbrace{\log q_{\mathrm{TN}}(\beta\mid j)}_{\text{Truncated normal
proposal}}
+ \underbrace{\log \mathrm{PLSD}_j}_{\text{Per‑face mixture weight}}
\\[6pt]
&\quad
+ \underbrace{
\mathrm{test1}(\beta,d)
- \mathrm{UB2A}_j(d)
- \mathrm{UB3A}_j(d)
- \mathrm{UB3B}(d)
}_{\text{Correction block}}
+ \text{const}.
\end{aligned}
\]
Moreover, the correction terms satisfy the following sign properties:
- \(\mathrm{test1}(\beta,d) \le 0\) (concavity of the standardized log‑likelihood relative to its supporting hyperplane at \(c^{-1}(\bar c_j,d)\));
- \(\mathrm{UB2A}_j(d) \ge 0\) (since \(\mathrm{UB2}_j(d)\) is defined relative to the global minimum \(\mathrm{RSS\_Min}\), and \(\mathrm{UB2A}_j(d) = \mathrm{UB2}_j(d) - \mathrm{UB2\_Min}_j\));
- \(\mathrm{UB3A}_j(d) \ge 0\) (the quadratic–linear face energy bound dominates the exact term);
- \(\mathrm{UB3B}(d) \ge 0\) (the dispersion log–tilt bound dominates the exact log‑dispersion term).
Furthermore, the following identities hold and enable efficient implementation of the minimizations:
- \(\min_{d\in[\text{low},\,\text{upp}]}\mathrm{RSS}_j(d) = \mathrm{RSS}_j(\text{low})\), which implies \(\mathrm{RSS\_Min} = \min_{j}\,\mathrm{RSS}_j(\text{low})\);
- \(\mathrm{UB2\_Min}_j = \min\big[\mathrm{UB2}_j(\text{low}),\, \mathrm{UB2}_j(\text{upp})\big]\).
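The two identities above reduce the minimizations to endpoint evaluations, which can be sketched as follows; `rss_faces` and `ub2_j` are hypothetical callables standing in for \(\mathrm{RSS}_j(\cdot)\) and \(\mathrm{UB2}_j(\cdot)\).

```python
def rss_min(rss_faces, low):
    # RSS_j is minimized at d = low, so only the lower endpoint is evaluated.
    return min(rss(low) for rss in rss_faces)

def ub2_min(ub2_j, low, upp):
    # The per-face UB2 minimum is attained at an endpoint of [low, upp].
    return min(ub2_j(low), ub2_j(upp))

# Toy usage with made-up face functions:
faces = [lambda d: 2.0 + d, lambda d: 1.0 + 2.0 * d]
rmin = rss_min(faces, low=0.5)                   # min(2.5, 2.0)
umin = ub2_min(lambda d: 1.0 / d, low=0.5, upp=2.0)  # min(2.0, 0.5)
```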
Explanation
What this does: the proposition separates the log‑posterior into three parts:
- proposal distributions you can sample from directly (Gamma in \(d\), truncated normal mixture in \(\beta\)),
- a correction block ensuring the proposals form valid upper bounds, and
- a parameter‑free constant.
Why the signs matter: the correction block is \[ \mathrm{test1}(\beta,d) - \mathrm{UB2A}_j(d) - \mathrm{UB3A}_j(d) - \mathrm{UB3B}(d), \] where \(\mathrm{test1}\le 0\) and each \(\mathrm{UB}\) term is \(\ge 0\), so the block is always \(\le 0\). This guarantees that, when combined with the proposal terms, we obtain a valid envelope for accept–reject sampling: the proposals generate candidates, and the correction block calibrates acceptance.
How it’s used: in the sampler, candidates are drawn from the proposal distributions and accepted with probability determined by the correction block. The constant term does not affect acceptance and simply keeps the identity exact.
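A minimal, hypothetical sketch of that acceptance step; `propose` and `correction` are placeholder hooks, not the actual implementation.

```python
import math
import random

def accept_reject(propose, correction, rng, max_iter=10_000):
    """Rejection loop: draw a candidate from the proposals and accept it with
    probability exp(correction), valid because the correction block is <= 0."""
    for _ in range(max_iter):
        beta, d, j = propose(rng)
        log_acc = correction(beta, d, j)       # test1 - UB2A - UB3A - UB3B
        if rng.random() < math.exp(log_acc):   # accept with prob exp(log_acc) <= 1
            return beta, d
    raise RuntimeError("no acceptance within max_iter draws")

# Trivial stand-ins (a correction of 0 always accepts):
rng = random.Random(0)
beta, d = accept_reject(lambda r: (0.0, 1.0, 0), lambda b, dd, jj: 0.0, rng)
```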
Proof of Proposition
We verify the decomposition and the sign properties by appealing to Claims 1–7.
Algebraic decomposition (Claim 1):
Claim 1 establishes, via the sequence of substitutions (1a)–(5d), that for each face \(j\) the chain of equalities transforms \[ \log q_\Gamma(d) + \log q_{\mathrm{TN}}(\beta\mid j) + \log \mathrm{PLSD}_j + \big[ \mathrm{test1}(\beta,d) - \mathrm{UB2A}_j(d) - \mathrm{UB3A}_j(d) - \mathrm{UB3B}(d) \big] + \text{const} \] into the standardized form with prior Gamma, prior multivariate normal, and the standardized log‑likelihood, absorbing all parameter‑free remnants into a global constant.
Concavity gap sign (Claim 2):
Claim 2 shows \(\mathrm{test1}(\beta,d) \le 0\) because the standardized log‑likelihood lies below its supporting hyperplane at the tangency point \(c^{-1}(\bar c_j,d)\).
Residual sum‑of‑squares bound (Claim 3):
Claim 3 shows \(\mathrm{UB2}_j(d) \ge 0\) because
\[ \mathrm{UB2}_j(d) = \frac{1}{2d}\big(\mathrm{RSS}_j(d) - \mathrm{RSS\_Min}\big) \ge 0, \] and, since \(\mathrm{UB2\_Min}_j\) is the minimum of \(\mathrm{UB2}_j\) over the dispersion interval,
\[ \mathrm{UB2A}_j(d) = \mathrm{UB2}_j(d) - \mathrm{UB2\_Min}_j \ge 0. \]
Quadratic–linear face bound (Claim 4):
Claim 4 shows that \(\mathrm{UB3A}_j(d) \ge 0\).
Define three functions by
\[ g_{1j}(d)= -\tfrac{1}{2}\,c^{-1}(\bar c_j,d)^\top P\,c^{-1}(\bar c_j,d) + \bar c_j^\top c^{-1}(\bar c_j,d), \]
\[ g_{2j}(d)=g_{1j}(d^{*}_{1})\;+\;g'_{1j}(d^{*}_{1})\,(d - d^{*}_{1}), \]
\[ g_{3j}(d) = \mathrm{lg\_prob\_factor1}_{j} + \text{lmc}_1 + \text{lmc}_2\, d. \]
The key step is to compare:
- \(g_{1j}(d)\): the exact quadratic–linear face energy,
- \(g_{2j}(d)\): its supporting line at the anchor \(d^{*}_{1}\), and
- \(g_{3j}(d)\): the global affine bound constructed from \(\mathrm{lmc}_1\), \(\mathrm{lmc}_2\), and the per‑face adjustment \(\mathrm{lg\_prob\_factor1}_{j}\).
Concavity ensures \(g_{1j}(d) \le g_{2j}(d)\).
Endpoint calibration ensures \(g_{3j}(d) \ge g_{2j}(d)\).
Thus \(g_{3j}(d) \ge g_{1j}(d)\), and
\[ \mathrm{UB3A}_j(d) = g_{3j}(d) - g_{1j}(d) \ge 0. \]
Dispersion tilt bound (Claim 5):
Claim 5 shows that \(\mathrm{UB3B}(d) \ge 0\).
The comparison is between:
- the linear function \(\mathrm{lmc}_1 + \mathrm{lmc}_2\,d\), and
- the concave log‑function \(\mathrm{lm\_log1} + \mathrm{lm\_log2}\log d\).
The log‑function is calibrated at the endpoints of the dispersion interval so that it lies above the linear function at both \(\text{low}\) and \(\text{upp}\); by concavity, it therefore dominates the linear function on the entire interval.
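This endpoint‑calibration mechanism can be illustrated numerically; the line coefficients below are arbitrary illustrative values, not outputs of the construction.

```python
import numpy as np

low, upp = 0.3, 2.5
lmc1, lmc2 = 0.4, 1.1                          # illustrative line coefficients
dstar2 = (upp - low) / np.log(upp / low)       # log-linear anchor
lm_log2 = lmc2 * dstar2                        # slope in log d
lm_log1 = lmc1 + lmc2 * upp - lm_log2 * np.log(upp)  # match the line at d = upp

d = np.linspace(low, upp, 201)
line = lmc1 + lmc2 * d
logtilt = lm_log1 + lm_log2 * np.log(d)
# The concave log-tilt equals the line at both endpoints and lies above it inside.
gap = logtilt - line
```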
Property of \(\mathrm{RSS}_j\) (Claim 6):
Claim 6 shows \(\min_{d\in[\text{low},\,\text{upp}]}\mathrm{RSS}_j(d) = \mathrm{RSS}_j(\text{low})\), i.e. the residual sum of squares for each face is minimized at the lower dispersion bound.
Property of \(\mathrm{UB2}_j\) (Claim 7):
Claim 7 shows \(\mathrm{UB2\_Min}_j = \min\big[\mathrm{UB2}_j(\text{low}),\, \mathrm{UB2}_j(\text{upp})\big]\), i.e. the per‑face UB2 minimum is attained at an endpoint of the dispersion interval.
Combining these results: Claim 1 validates the exact algebraic split,
while Claims 2–5 confirm that \[
\mathrm{test1}(\beta,d) \le 0,\quad
\mathrm{UB2A}_j(d) \ge 0,\quad
\mathrm{UB3A}_j(d) \ge 0,\quad
\mathrm{UB3B}(d) \ge 0.
\] Thus the correction block
\[
\mathrm{test1}
- \mathrm{UB2A}_j
- \mathrm{UB3A}_j
- \mathrm{UB3B}
\]
has the intended signs (it is always \(\le 0\)), while Claims 6–7 let us compute the required minimizing constants from endpoint evaluations alone.
Therefore, the proposition holds and provides a valid foundation for the accept–reject sampler.