Chapter A09: Parallel Sampling Implementation using RcppParallel

1. Introduction

The glmbayes package uses parallel sampling for many envelope-based samplers because generating iid draws can be computationally expensive and acceptance rates may be low. By default, use_parallel = TRUE, so large fits often run on multiple cores via RcppParallel.

However, RcppParallel-based sampling cannot be interrupted. Once the main parallel loop starts, the R session is blocked until it completes. Users may experience the sampler as “freezing” during long runs. To address this, the package runs pilots that estimate how long sampling will take and, in interactive sessions, asks users to opt in or out before the non-interruptible phase begins.

This chapter describes the parallel sampling implementation, the pilot logic, and the interactive safeguards. Parallel envelope construction (e.g., in EnvelopeDispersionBuild) uses a similar pilot pattern and is noted briefly. GPU acceleration via OpenCL is covered in Chapter A10.

2. Where Parallel Sampling Is Used

Parallel sampling is used in two main C++ paths:

Entry point	When	Notes
`rNormalGLM_cpp`	Non-Gaussian GLMs (Poisson, binomial, Gamma, etc.) with `use_parallel = TRUE` and `n > 1`	Called by `rNormal_reg`
`rIndepNormalGammaReg_cpp`	Gaussian with independent Normal–Gamma prior, `use_parallel = TRUE` and `n > 1`	Uses `rIndepNormalGammaReg_std_parallel_cpp`

Serial fallback: When use_parallel = FALSE or n == 1, the serial path is used instead. The serial path includes Rcpp::checkUserInterrupt() in its loop, so it can be interrupted (e.g., with Ctrl+C).

3. Why Parallel Sampling Cannot Be Interrupted

RcppParallel uses Intel TBB (Threading Building Blocks) to distribute work across multiple threads. The R main thread is blocked while RcppParallel::parallelFor() runs. R’s R_CheckUserInterrupt() is not safe to call from worker threads, and there is no supported way to cancel a running TBB task from R. As a result:

During the parallel phase, Ctrl+C and similar interrupts do not reliably stop the run.
The session may appear to hang until the run completes.

The pilot phase mitigates this by estimating runtime before the full parallel run and giving users a chance to decline.

4. Pilot-Based Time Estimation and Opt-In

The pilot flow consists of:

Pilot run – A small serial or parallel batch (e.g., 1 draw or a few draws) to measure per-draw or per-observation time.
Calibration run – A larger parallel batch (typically up to ~1% of n, capped so it takes at most ~5 minutes) to refine the time estimate.
Time estimate – Extrapolation to the full sample size n.
Interactive safeguard – If the estimate exceeds 5 minutes, the user is prompted in interactive sessions.
Non-interactive behavior – In non-interactive sessions (e.g., R CMD check, batch jobs), the run proceeds automatically without prompting.

Threshold: 300 seconds (5 minutes). Above this, the user is asked whether to continue.

Prompt: "Estimated simulation exceeds 5 minutes. Continue? [y/N]: " (or similar).

Responses: y, yes, 1, or continue → proceed; n, no, 2, or empty → stop with an informative error.

5. rNormalGLM Pilot Logic (Non-Gaussian GLMs)

For non-Gaussian GLMs (Poisson, binomial, Gamma, etc.), the rNormalGLM C++ path uses run_rcppparallel_pilot():

5.1 Single-Draw Test

A single draw is generated in serial mode to measure per-draw time and to detect problems early.

5.2 max_draws Cap and Zero Accepts

If the pilot hits an internal max_draws cap with zero accepted draws, this indicates the envelope may be insufficiently tight. The code warns that:

The envelope may be too loose for the posterior.
The full run has no cap and cannot be interrupted.
If acceptance remains zero, the simulation may appear to hang indefinitely.

Recommended actions include: - Set use_opencl = TRUE or increase the requested number of draws (which affects n_envopt and may tighten the envelope). - Try a different Gridtype to force a tighter envelope. - Strengthen the prior.

An interactive prompt then asks: "Enter 1 to continue full run, 2 to stop and return partial results: ". If the user chooses 2, partial test results are returned and the full run is aborted.

5.3 Calibration Run and Refined Estimate

The calibration batch size is:

\[ m_{\text{stage}} = \min\left( \lceil 0.01 \cdot n \rceil,\; \lfloor 300000 / \text{est\_per\_draw\_ms} \rfloor \right), \]

where 300000 ms \(\approx\) 5 minutes. The calibration run uses parallelFor(0, m_stage, worker) to measure per-draw time empirically.

From the calibration:

per_candidate_sec = calibration elapsed time / total candidates used
est_per_draw_sec = per_candidate_sec × E_draws (expected candidates per accepted draw)
est_total_sec = est_per_draw_sec × n

5.4 Five-Minute Safeguard

If est_total_sec > 300:

Interactive: Prompt "Do you want to continue? [y/N]: "
Non-interactive: Proceed automatically with a note.

5.5 Diagnostics (verbose = TRUE)

With verbose = TRUE, the pilot reports:

Estimated simulation time for n draws
Note that the phase uses RcppParallel and cannot be safely interrupted
Calibration elapsed time and number of draws
avg_candidates_per_draw (empirical) vs E_draws
per_candidate_sec and est_per_draw_sec

6. rIndepNormalGammaReg Pilot Logic

For Gaussian regression with independent Normal–Gamma priors, the pilot logic is similar in spirit but tailored to joint \((\beta, \phi)\) sampling:

6.1 Single-Draw Test

One observation is generated in serial mode to measure per-observation time.

6.2 Calibration Run

\[ m_{\text{stage}} = \min\left( \lceil 0.01 \cdot n \rceil,\; \lfloor 300000 / \text{per\_obs\_ms\_serial} \rfloor \right). \]

The calibration run is parallel: parallelFor(0, m_stage, worker).

6.3 Time Estimate

per_obs_sec = calibration elapsed time / m_stage
est_total_sec = per_obs_sec × n

6.4 Five-Minute Safeguard

Same as rNormalGLM: if est_total_sec > 300, prompt in interactive sessions.

7. EnvelopeDispersionBuild Pilot (Envelope Construction)

The RSS and UB2 steps in EnvelopeDispersionBuild use closed-form bounds (bound_rss_over_dispersion and bound_ub2_over_dispersion) and no longer perform RSS/UB2 minimization or pilot timing blocks.

7.1 Five-Minute Safeguard

Since minimization/pilots are disabled, the time-estimate safeguard is not used.

8. Choosing Serial vs Parallel

Scenario	Recommendation
Debugging or small runs	`use_parallel = FALSE` – serial path is interruptible
Large production runs	`use_parallel = TRUE` – faster; rely on the pilot and prompt
Scripts / non-interactive	No prompt; runs proceed automatically

To disable parallel sampling:

glmb(formula, data, family, pfamily, n = 5000, use_parallel = FALSE)
rglmb(n = 5000, y, x, family, pfamily, use_parallel = FALSE)

Serial sampling is slower but can be interrupted. Parallel sampling is faster but non-interruptible once the main loop starts.

9. Technical Details

9.1 RcppParallel Worker Pattern

The parallel samplers use the standard RcppParallel Worker pattern:

A struct inheriting from RcppParallel::Worker
operator()(size_t begin, size_t end) processes the range [begin, end)
Shared outputs use RcppParallel::RVector or RcppParallel::RMatrix for thread-safe writes
Each thread has its own RNG state to avoid contention

9.2 Thread Safety

RNG: Thread-local state; no sharing across workers
Callbacks: The negative log-posterior and gradient functions (f2, f3) are called from workers; a mutex protects R callback invocations where needed
Output: Write-only ranges per worker; no overlapping writes

9.3 Dependencies

RcppParallel – Provides TBB and the parallelFor interface
TBB – Bundled with RcppParallel; no separate installation required

10. Troubleshooting

Symptom	Likely cause	Suggestion
Sampler appears to hang	Long parallel run	Use `verbose = TRUE` to see pilot estimates before the main run
Zero accepts in pilot	Envelope too loose	Try `use_opencl = TRUE`, larger `n`, different `Gridtype`, or stronger prior
Prompt never appears	Non-interactive session	Expected; runs proceed automatically in batch/CI
Want to interrupt a run	Parallel phase has started	Cannot interrupt; use `use_parallel = FALSE` for interruptible runs

11. Cross-References

Chapter A05 – Simulation pipeline (optimization, standardization, envelope, sampling)
Chapter A08 – Envelope-related functions
Chapter A10 – OpenCL acceleration for envelope construction
?glmb, ?rglmb, ?lmb, ?rlmb – use_parallel argument
NEWS.md – Parallel sampling and pilot-related changes

Chapter A09: Parallel Sampling Implementation using RcppParallel

Kjell Nygren

2026-04-30

Chapter A09: Parallel Sampling Implementation using RcppParallel

1. Introduction

2. Where Parallel Sampling Is Used

3. Why Parallel Sampling Cannot Be Interrupted

4. Pilot-Based Time Estimation and Opt-In

5. rNormalGLM Pilot Logic (Non-Gaussian GLMs)

5.1 Single-Draw Test

5.2 max_draws Cap and Zero Accepts

5.3 Calibration Run and Refined Estimate

5.4 Five-Minute Safeguard

5.5 Diagnostics (verbose = TRUE)

6. rIndepNormalGammaReg Pilot Logic

6.1 Single-Draw Test

6.2 Calibration Run

6.3 Time Estimate

6.4 Five-Minute Safeguard

7. EnvelopeDispersionBuild Pilot (Envelope Construction)

7.1 Five-Minute Safeguard

8. Choosing Serial vs Parallel

9. Technical Details

9.1 RcppParallel Worker Pattern

9.2 Thread Safety

9.3 Dependencies

10. Troubleshooting

11. Cross-References