Statistical methods for analyzing binary replicates, i.e., noisy binary measurements of latent binary states. This package implements the methods described in:
Royer-Carenzi, M., Lorenzo, H., & Pudlo, P. (in press). Reconciling Binary Replicates: Beyond the Average. Statistics in Medicine.
The package provides scoring functions to estimate the probability that an individual is in the positive state, given noisy replicated measurements:
| Method | Function | Requirements |
|---|---|---|
| Average-based | average_scoring() |
None |
| Median-based | median_scoring() |
None |
| MAP (EM algorithm) | MAP_scoring() |
Fitted EM model |
| Likelihood-based | likelihood_scoring() |
Known parameters |
| Bayesian | bayesian_scoring() |
Fitted Bayesian model |
Additional features: - Classification with
inconclusive decisions (classify_with_scores()) -
Prevalence estimation from scores or Bayesian posterior
- Credible intervals for model parameters -
Cross-validation for model assessment
(cvEM())
For each individual \(i\), we observe \(n_i\) binary replicates. These are noisy measurements of a true latent state \(T_i \in \{0, 1\}\):
\[ T_i \mid \theta \sim \text{Bernoulli}(\theta) \]
\[ S_i \mid T_i, p, q \sim \text{Binomial}\big(n_i,\; T_i(1-q) + (1-T_i)p\big) \]
where: - \(\theta \in (0, 1)\) is the prevalence (probability that \(T_i = 1\)) - \(p \in (0, 1/2)\) is the false positive rate (probability of observing 1 when \(T_i = 0\)) - \(q \in (0, 1/2)\) is the false negative rate (probability of observing 0 when \(T_i = 1\))
The goal is to estimate the probability \(\mathbb{P}(T_i = 1 \mid S_i = s_i)\) for each individual, which is given by:
\[ \mathbb{P}(T_i = 1 \mid S_i = s_i) = \frac{\theta \cdot (1-q)^{s_i} q^{n_i - s_i}}{\theta \cdot (1-q)^{s_i} q^{n_i - s_i} + (1-\theta) \cdot p^{s_i} (1-p)^{n_i - s_i}} \]
The package depends on rstan for Bayesian inference.
Install it first by following the guide at:
https://mc-stan.org/install/
# install.packages("devtools")
devtools::install_github("pierrepudlo/BinaryReplicates")library(BinaryReplicates)
# Load example data
data("periodontal")
ni <- periodontal$ni
si <- periodontal$si
# --- Fast approach: EM algorithm ---
fit_em <- EMFit(ni, si)
fit_em$parameters_hat
scores_MAP <- MAP_scoring(ni, si, fit_em)
# --- Full Bayesian approach ---
fit_bayes <- BayesianFit(ni, si, chains = 4, iter = 5000)
scores_Bayes <- bayesian_scoring(ni, si, fit_bayes)
# Classify individuals (0.5 = inconclusive)
classes_MAP <- classify_with_scores(scores_MAP, vL = 0.4, vU = 0.6)
classes_Bayes <- classify_with_scores(scores_Bayes, vL = 0.4, vU = 0.6)
# Compare classifications
table(MAP = classes_MAP, Bayesian = classes_Bayes)theta <- 0.4
p <- q <- 0.22
n <- 50
ni <- sample(2:6, n, replace = TRUE)
ti <- rbinom(n, 1, theta)
si <- rbinom(n, ni, ti * (1 - q) + (1 - ti) * p)These methods require no model fitting:
# Average-based scores
Y_A <- average_scoring(ni, si)
theta_A <- prevalence_estimate(Y_A)
# Median-based scores
Y_M <- median_scoring(ni, si)
theta_M <- prevalence_estimate(Y_M)The EM algorithm estimates model parameters without full Bayesian inference:
fit_em <- EMFit(ni, si)
fit_em$parameters_hat
# MAP scores use the estimated parameters
Y_MAP <- MAP_scoring(ni, si, fit_em)
theta_MAP <- prevalence_estimate(Y_MAP)
# Classification with thresholds
T_MAP <- classify_with_scores(Y_MAP, vL = 0.4, vU = 0.6)For full posterior inference, use Stan via
BayesianFit():
# Fit the Bayesian model (uses Stan MCMC)
fit <- BayesianFit(ni, si, chains = 4, iter = 5000)
print(fit, pars = c("theta", "p", "q"))
# Credible intervals
credint(fit, level = 0.90)
# Bayesian scores and prevalence
Y_B <- bayesian_scoring(ni, si, fit)
theta_B <- bayesian_prevalence_estimate(fit)
# Classification
T_B <- classify_with_scores(Y_B, vL = 0.4, vU = 0.6)To use multiple CPU cores for faster sampling:
options(mc.cores = parallel::detectCores())When the true parameters are known (e.g., in simulations):
Y_L <- likelihood_scoring(ni, si, list(theta = theta, p = p, q = q))
T_L <- classify_with_scores(Y_L, vL = 0.4, vU = 0.6)Compute predictive Bayesian scores for new observations:
new_ni <- rep(10, 11)
new_si <- 0:10
new_scores <- predict_scores(new_ni, new_si, fit)Assess model performance with cross-validation:
cv_result <- cvEM(ni, si, N_cv = 10)
cv_classified <- classify_with_scores_cvEM(cv_result, ti = ti, vL = 0.4)
cv_classified$risk # Empirical riskFor more details, see the package vignette:
vignette("introduction", package = "BinaryReplicates")If you use this package, please cite:
Royer-Carenzi, M., Lorenzo, H., & Pudlo, P. (in press).
Reconciling Binary Replicates: Beyond the Average.
Statistics in Medicine.
GPL (>= 3)