Chapter A09: Parallel Sampling Implementation using RcppParallel

Kjell Nygren

2026-04-30

library(glmbayes)

Chapter A09: Parallel Sampling Implementation using RcppParallel

1. Introduction

The glmbayes package uses parallel sampling for many envelope-based samplers because generating iid draws can be computationally expensive and acceptance rates may be low. By default, use_parallel = TRUE, so large fits often run on multiple cores via RcppParallel.

However, RcppParallel-based sampling cannot be interrupted. Once the main parallel loop starts, the R session is blocked until it completes. Users may experience the sampler as “freezing” during long runs. To address this, the package runs pilots that estimate how long sampling will take and, in interactive sessions, asks users to opt in or out before the non-interruptible phase begins.

This chapter describes the parallel sampling implementation, the pilot logic, and the interactive safeguards. Parallel envelope construction (e.g., in EnvelopeDispersionBuild) uses a similar pilot pattern and is noted briefly. GPU acceleration via OpenCL is covered in Chapter A10.

2. Where Parallel Sampling Is Used

Parallel sampling is used in two main C++ paths:

Entry point When Notes
rNormalGLM_cpp Non-Gaussian GLMs (Poisson, binomial, Gamma, etc.) with use_parallel = TRUE and n > 1 Called by rNormal_reg
rIndepNormalGammaReg_cpp Gaussian with independent Normal–Gamma prior, use_parallel = TRUE and n > 1 Uses rIndepNormalGammaReg_std_parallel_cpp

Serial fallback: When use_parallel = FALSE or n == 1, the serial path is used instead. The serial path includes Rcpp::checkUserInterrupt() in its loop, so it can be interrupted (e.g., with Ctrl+C).

3. Why Parallel Sampling Cannot Be Interrupted

RcppParallel uses Intel TBB (Threading Building Blocks) to distribute work across multiple threads. The R main thread is blocked while RcppParallel::parallelFor() runs. R’s R_CheckUserInterrupt() is not safe to call from worker threads, and there is no supported way to cancel a running TBB task from R. As a result:

The pilot phase mitigates this by estimating runtime before the full parallel run and giving users a chance to decline.

4. Pilot-Based Time Estimation and Opt-In

The pilot flow consists of:

  1. Pilot run – A small serial or parallel batch (e.g., 1 draw or a few draws) to measure per-draw or per-observation time.
  2. Calibration run – A larger parallel batch (typically up to ~1% of n, capped so it takes at most ~5 minutes) to refine the time estimate.
  3. Time estimate – Extrapolation to the full sample size n.
  4. Interactive safeguard – If the estimate exceeds 5 minutes, the user is prompted in interactive sessions.
  5. Non-interactive behavior – In non-interactive sessions (e.g., R CMD check, batch jobs), the run proceeds automatically without prompting.

Threshold: 300 seconds (5 minutes). Above this, the user is asked whether to continue.

Prompt: "Estimated simulation exceeds 5 minutes. Continue? [y/N]: " (or similar).

Responses: y, yes, 1, or continue → proceed; n, no, 2, or empty → stop with an informative error.

5. rNormalGLM Pilot Logic (Non-Gaussian GLMs)

For non-Gaussian GLMs (Poisson, binomial, Gamma, etc.), the rNormalGLM C++ path uses run_rcppparallel_pilot():

5.1 Single-Draw Test

A single draw is generated in serial mode to measure per-draw time and to detect problems early.

5.2 max_draws Cap and Zero Accepts

If the pilot hits an internal max_draws cap with zero accepted draws, this indicates the envelope may be insufficiently tight. The code warns that:

  • The envelope may be too loose for the posterior.
  • The full run has no cap and cannot be interrupted.
  • If acceptance remains zero, the simulation may appear to hang indefinitely.

Recommended actions include: - Set use_opencl = TRUE or increase the requested number of draws (which affects n_envopt and may tighten the envelope). - Try a different Gridtype to force a tighter envelope. - Strengthen the prior.

An interactive prompt then asks: "Enter 1 to continue full run, 2 to stop and return partial results: ". If the user chooses 2, partial test results are returned and the full run is aborted.

5.3 Calibration Run and Refined Estimate

The calibration batch size is:

\[ m_{\text{stage}} = \min\left( \lceil 0.01 \cdot n \rceil,\; \lfloor 300000 / \text{est\_per\_draw\_ms} \rfloor \right), \]

where 300000 ms \(\approx\) 5 minutes. The calibration run uses parallelFor(0, m_stage, worker) to measure per-draw time empirically.

From the calibration:

  • per_candidate_sec = calibration elapsed time / total candidates used
  • est_per_draw_sec = per_candidate_sec × E_draws (expected candidates per accepted draw)
  • est_total_sec = est_per_draw_sec × n

5.4 Five-Minute Safeguard

If est_total_sec > 300:

  • Interactive: Prompt "Do you want to continue? [y/N]: "
  • Non-interactive: Proceed automatically with a note.

5.5 Diagnostics (verbose = TRUE)

With verbose = TRUE, the pilot reports:

  • Estimated simulation time for n draws
  • Note that the phase uses RcppParallel and cannot be safely interrupted
  • Calibration elapsed time and number of draws
  • avg_candidates_per_draw (empirical) vs E_draws
  • per_candidate_sec and est_per_draw_sec

6. rIndepNormalGammaReg Pilot Logic

For Gaussian regression with independent Normal–Gamma priors, the pilot logic is similar in spirit but tailored to joint \((\beta, \phi)\) sampling:

6.1 Single-Draw Test

One observation is generated in serial mode to measure per-observation time.

6.2 Calibration Run

\[ m_{\text{stage}} = \min\left( \lceil 0.01 \cdot n \rceil,\; \lfloor 300000 / \text{per\_obs\_ms\_serial} \rfloor \right). \]

The calibration run is parallel: parallelFor(0, m_stage, worker).

6.3 Time Estimate

  • per_obs_sec = calibration elapsed time / m_stage
  • est_total_sec = per_obs_sec × n

6.4 Five-Minute Safeguard

Same as rNormalGLM: if est_total_sec > 300, prompt in interactive sessions.

7. EnvelopeDispersionBuild Pilot (Envelope Construction)

The RSS and UB2 steps in EnvelopeDispersionBuild use closed-form bounds (bound_rss_over_dispersion and bound_ub2_over_dispersion) and no longer perform RSS/UB2 minimization or pilot timing blocks.

7.1 Five-Minute Safeguard

Since minimization/pilots are disabled, the time-estimate safeguard is not used.

8. Choosing Serial vs Parallel

Scenario Recommendation
Debugging or small runs use_parallel = FALSE – serial path is interruptible
Large production runs use_parallel = TRUE – faster; rely on the pilot and prompt
Scripts / non-interactive No prompt; runs proceed automatically

To disable parallel sampling:

glmb(formula, data, family, pfamily, n = 5000, use_parallel = FALSE)
rglmb(n = 5000, y, x, family, pfamily, use_parallel = FALSE)

Serial sampling is slower but can be interrupted. Parallel sampling is faster but non-interruptible once the main loop starts.

9. Technical Details

9.1 RcppParallel Worker Pattern

The parallel samplers use the standard RcppParallel Worker pattern:

  • A struct inheriting from RcppParallel::Worker
  • operator()(size_t begin, size_t end) processes the range [begin, end)
  • Shared outputs use RcppParallel::RVector or RcppParallel::RMatrix for thread-safe writes
  • Each thread has its own RNG state to avoid contention

9.2 Thread Safety

  • RNG: Thread-local state; no sharing across workers
  • Callbacks: The negative log-posterior and gradient functions (f2, f3) are called from workers; a mutex protects R callback invocations where needed
  • Output: Write-only ranges per worker; no overlapping writes

9.3 Dependencies

  • RcppParallel – Provides TBB and the parallelFor interface
  • TBB – Bundled with RcppParallel; no separate installation required

10. Troubleshooting

Symptom Likely cause Suggestion
Sampler appears to hang Long parallel run Use verbose = TRUE to see pilot estimates before the main run
Zero accepts in pilot Envelope too loose Try use_opencl = TRUE, larger n, different Gridtype, or stronger prior
Prompt never appears Non-interactive session Expected; runs proceed automatically in batch/CI
Want to interrupt a run Parallel phase has started Cannot interrupt; use use_parallel = FALSE for interruptible runs

11. Cross-References