This vignette explores the functional form sensitivity problem in difference-in-differences with binary outcomes, following Roth & Sant’Anna (2023).
Suppose two groups have baseline (pre-period) outcome probabilities of 0.30 (control) and 0.25 (treated).
Both groups experience the same additive increase in probability: +0.10.
On the probability scale, the change is 0.10 for both groups, so parallel trends holds.
On the logit scale:
- Control: logit(0.40) - logit(0.30) = 0.442
- Treated: logit(0.35) - logit(0.25) = 0.480
These changes are not equal. A researcher testing parallel trends on the log-odds scale would (correctly) reject it, even though the underlying time trend is identical for both groups on the probability scale.
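The calculation above is one instance of a general formula. For any strictly increasing transformation h of the outcome probability (identity, logit, probit), write p_{g,t} for the probability of group g in period t. The DiD on scale h, and its first-order approximation when both groups share the same small additive change Δ, are:

```latex
\mathrm{DiD}_h = \bigl[h(p_{T,\mathrm{post}}) - h(p_{T,\mathrm{pre}})\bigr]
               - \bigl[h(p_{C,\mathrm{post}}) - h(p_{C,\mathrm{pre}})\bigr]

\mathrm{DiD}_h \approx \Delta\,\bigl[h'(p_{T,\mathrm{pre}}) - h'(p_{C,\mathrm{pre}})\bigr],
\qquad h'(p) = \frac{1}{p(1-p)} \quad \text{for } h = \operatorname{logit}
```

On the probability scale h'(p) = 1, so both the approximation and the exact DiD are zero. On the logit scale the DiD is nonzero whenever the baselines differ, and it grows as either baseline approaches 0 or 1 because 1/(p(1 - p)) diverges there. With Δ = 0.10 the approximation is rough (about 0.057 versus the exact 0.038 in this example), but it captures the sign and the mechanism.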
# Demonstrate scale sensitivity
p_ctrl_pre <- 0.30; p_ctrl_post <- 0.40
p_treat_pre <- 0.25; p_treat_post <- 0.35
cat("=== Probability Scale ===\n")
#> === Probability Scale ===
cat("Control change: ", round(p_ctrl_post - p_ctrl_pre, 4), "\n")
#> Control change: 0.1
cat("Treated change: ", round(p_treat_post - p_treat_pre, 4), "\n")
#> Treated change: 0.1
cat("DiD (prob): ", round((p_treat_post - p_treat_pre) - (p_ctrl_post - p_ctrl_pre), 4), "\n\n")
#> DiD (prob): 0
cat("=== Log-Odds (Logit) Scale ===\n")
#> === Log-Odds (Logit) Scale ===
cat("Control change: ", round(qlogis(p_ctrl_post) - qlogis(p_ctrl_pre), 4), "\n")
#> Control change: 0.4418
cat("Treated change: ", round(qlogis(p_treat_post) - qlogis(p_treat_pre), 4), "\n")
#> Treated change: 0.4796
cat("DiD (logit): ", round((qlogis(p_treat_post) - qlogis(p_treat_pre)) -
(qlogis(p_ctrl_post) - qlogis(p_ctrl_pre)), 4), "\n\n")
#> DiD (logit): 0.0377
cat("=== Probit Scale ===\n")
#> === Probit Scale ===
cat("Control change: ", round(qnorm(p_ctrl_post) - qnorm(p_ctrl_pre), 4), "\n")
#> Control change: 0.2711
cat("Treated change: ", round(qnorm(p_treat_post) - qnorm(p_treat_pre), 4), "\n")
#> Treated change: 0.2892
cat("DiD (probit): ", round((qnorm(p_treat_post) - qnorm(p_treat_pre)) -
(qnorm(p_ctrl_post) - qnorm(p_ctrl_pre)), 4), "\n")
#> DiD (probit): 0.0181
Key takeaway: The same underlying DGP yields a zero DiD on the probability scale but nonzero DiDs on the logit and probit scales, so pre-trends tests can also reach different conclusions depending on the scale chosen. No scale is correct a priori; whether parallel trends holds on a given scale is determined by the DGP.
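The three computations above repeat one formula with a different transformation each time. A small helper makes the comparison reusable; did_on_scale() is a hypothetical name for illustration (not part of NonlinearDiD), and the log scale (log risk ratio) is added only for contrast.

```r
# Hypothetical helper (not part of NonlinearDiD): 2x2 DiD computed after
# transforming each group-period probability with the function `f`.
did_on_scale <- function(p_ctrl_pre, p_ctrl_post, p_treat_pre, p_treat_post, f) {
  (f(p_treat_post) - f(p_treat_pre)) - (f(p_ctrl_post) - f(p_ctrl_pre))
}

# Candidate scales for the parallel trends assumption
scales <- list(
  probability = identity,
  logit       = qlogis,  # log-odds
  probit      = qnorm,
  log         = log      # log risk ratio
)

# Same toy probabilities as above: control 0.30 -> 0.40, treated 0.25 -> 0.35
did_by_scale <- vapply(
  scales,
  function(f) did_on_scale(0.30, 0.40, 0.25, 0.35, f),
  numeric(1)
)
round(did_by_scale, 4)
```

The probability-scale entry is zero up to floating-point error, the logit and probit entries match the 0.0377 and 0.0181 printed above, and the log scale gives about 0.049.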
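Scale sensitivity is most severe when baseline probabilities are close to 0 or 1, where the logit and probit transformations are most curved, and when the two groups start from different baseline levels (if both groups shared the same baseline and the same additive change, their changes would coincide on every scale). The code below fixes the additive change at 0.10, places the treated baseline 0.05 below the control baseline, and varies the control baseline: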
# Show severity across baseline probability values
baseline_probs <- seq(0.05, 0.45, by = 0.05)
delta_p <- 0.10 # same additive change for both groups
severity_df <- do.call(rbind, lapply(baseline_probs, function(p0) {
p1 <- p0 + delta_p
# Parallel in prob => same change
# Treated baseline is 0.05 below control (floored at 0.02, so the first row's gap is 0.03)
p0_treat <- max(p0 - 0.05, 0.02)
p1_treat <- p0_treat + delta_p
logit_did <- (qlogis(p1_treat) - qlogis(p0_treat)) -
(qlogis(p1) - qlogis(p0))
data.frame(
baseline_ctrl = p0,
baseline_treat = p0_treat,
logit_did = logit_did
)
}))
cat("Logit-scale DiD when true probability DiD = 0:\n")
#> Logit-scale DiD when true probability DiD = 0:
print(severity_df, digits = 3, row.names = FALSE)
#> baseline_ctrl baseline_treat logit_did
#> 0.05 0.02 0.68955
#> 0.10 0.05 0.39891
#> 0.15 0.10 0.17494
#> 0.20 0.15 0.09699
#> 0.25 0.20 0.05942
#> 0.30 0.25 0.03774
#> 0.35 0.30 0.02346
#> 0.40 0.35 0.01290
#> 0.45 0.40 0.00412
cat("\nLarger deviations at low baseline probabilities.\n")
#>
#> Larger deviations at low baseline probabilities.
We now simulate data where parallel trends holds on the logit scale (the correct scale for this DGP) and compare estimates from:
1. Linear DiD (the standard Callaway & Sant'Anna (2021) approach)
2. Logit DiD (NonlinearDiD)
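Before running the simulation, a population-level 2x2 sketch in base R shows what each of the two estimators imputes for the treated group's untreated post-period outcome. The cell probabilities are illustrative assumptions (not output of sim_binary_panel()): untreated outcomes follow parallel trends in log-odds, and treatment adds 0.5 to the treated group's post-period log-odds. Linear DiD adds the control group's change in probability to the treated baseline; logit DiD adds the control group's change in log-odds.

```r
# Illustrative 2x2 population (assumed values, not from sim_binary_panel()).
base_logit  <- qlogis(0.20)  # control group, pre-period
group_shift <- -0.30         # treated vs control, log-odds
time_shift  <-  0.40         # common time trend, log-odds
effect      <-  0.50         # treatment effect, log-odds

# Observed cell probabilities
p_c0 <- plogis(base_logit)
p_c1 <- plogis(base_logit + time_shift)
p_t0 <- plogis(base_logit + group_shift)
p_t1 <- plogis(base_logit + group_shift + time_shift + effect)

# True ATT on the probability scale: observed minus untreated counterfactual
p_t1_untreated <- plogis(base_logit + group_shift + time_shift)
att_true <- p_t1 - p_t1_untreated

# Linear DiD: counterfactual = treated pre + control change in probability
att_linear <- p_t1 - (p_t0 + (p_c1 - p_c0))

# Logit DiD: counterfactual = treated pre + control change in log-odds
att_logit <- p_t1 - plogis(qlogis(p_t0) + (qlogis(p_c1) - qlogis(p_c0)))

round(c(true = att_true, linear = att_linear, logit = att_logit), 4)
```

Under these assumed values, logit DiD recovers the true probability-scale ATT (about 0.096) exactly, while linear DiD understates it (about 0.085) because its imputed counterfactual relies on parallel trends in probabilities, which does not hold here.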
# DGP: parallel trends on logit scale
dat <- sim_binary_panel(
n = 1000,
nperiods = 8,
prop_treated = 0.5,
n_cohorts = 3,
true_att = c(0.20, 0.35, 0.25),
base_prob = 0.20, # low baseline: nonlinearity matters most
unit_fe_sd = 0.5,
seed = 42
)
cat("Baseline outcome rate (untreated, pre-period):",
round(mean(dat$y[dat$D == 0 & dat$period == 1]), 3), "\n")
#> Baseline outcome rate (untreated, pre-period): 0.202
cat("True ATTs (avg):", round(mean(c(0.20, 0.35, 0.25)), 3), "\n\n")
#> True ATTs (avg): 0.267
# Logit DiD
res_logit <- nonlinear_attgt(
dat, "y", "period", "id", "g",
outcome_model = "logit",
control_group = "nevertreated"
)
# Linear DiD
res_linear <- nonlinear_attgt(
dat, "y", "period", "id", "g",
outcome_model = "linear",
control_group = "nevertreated"
)
# Aggregate
agg_logit <- nonlinear_aggte(res_logit, type = "dynamic")
agg_linear <- nonlinear_aggte(res_linear, type = "dynamic")
cat("=== Overall ATT ===\n")
#> === Overall ATT ===
cat("Linear DiD: ", round(agg_linear$overall_att, 4), "\n")
#> Linear DiD: 0.046
cat("Logit DiD: ", round(agg_logit$overall_att, 4), "\n")
#> Logit DiD: 0.046
cat("True ATT: ", round(mean(c(0.20, 0.35, 0.25)), 4), "\n")
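#> True ATT: 0.2667
In this run the linear and logit estimators agree (0.046), and both sit well below the average of true_att. The call to sim_binary_panel() does not state which scale true_att is measured on; if it is a log-odds effect, the two numbers are not directly comparable.
A critical practical insight: even if the DGP satisfies parallel trends on some scale (so the identifying assumption holds there), a pre-trends test conducted on a different scale may falsely reject.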
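A back-of-the-envelope power calculation shows how a functional-form mismatch turns into a rejection as samples grow. The sketch below assumes two pre-periods in which untreated outcomes follow parallel trends in log-odds (illustrative values, not taken from the simulation above), computes the resulting linear-scale placebo DiD, and approximates the rejection rate of a two-sided 5% z-test for several numbers of units per group. For simplicity it treats the four group-period cells as independent Bernoulli samples, which ignores within-unit correlation in a real panel.

```r
# Illustrative pre-periods with parallel trends in log-odds (assumed values).
base_logit <- qlogis(0.20)  # control, first pre-period
gap        <- -1.0          # treated vs control, log-odds
trend      <-  0.4          # common trend, log-odds

p_c0 <- plogis(base_logit)
p_c1 <- plogis(base_logit + trend)
p_t0 <- plogis(base_logit + gap)
p_t1 <- plogis(base_logit + gap + trend)

# Linear-scale placebo DiD: nonzero even though log-odds trends are parallel
bias <- (p_t1 - p_t0) - (p_c1 - p_c0)

# Approximate rejection rate of a two-sided z-test with n units per group,
# treating the four cells as independent samples (a simplification).
reject_rate <- function(n, alpha = 0.05) {
  se   <- sqrt((p_c0 * (1 - p_c0) + p_c1 * (1 - p_c1) +
                p_t0 * (1 - p_t0) + p_t1 * (1 - p_t1)) / n)
  crit <- qnorm(1 - alpha / 2)
  z    <- bias / se
  # P(|Z + z| > crit) for Z ~ N(0, 1)
  pnorm(-crit - z) + pnorm(-crit + z)
}

n_units <- c(250, 1000, 4000)
data.frame(n = n_units, reject = round(reject_rate(n_units), 3))
```

Under these assumptions the placebo DiD is about -0.035, and the approximate rejection rate rises from roughly 12% at 250 units per group to above 80% at 4,000, even though parallel trends holds exactly in log-odds.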
# Test on logit scale
pt_logit <- nonlinear_pretest(res_logit, plot = FALSE)
cat("Pre-trends test (logit scale):\n")
#> Pre-trends test (logit scale):
cat(" Joint p-value:", round(pt_logit$joint_pval, 4), "\n\n")
#> Joint p-value: 0.2518
# Test on linear scale
pt_linear <- nonlinear_pretest(res_linear, plot = FALSE)
cat("Pre-trends test (linear scale):\n")
#> Pre-trends test (linear scale):
cat(" Joint p-value:", round(pt_linear$joint_pval, 4), "\n\n")
#> Joint p-value: 0.2518
cat("Note: If true DGP is logit-scale parallel trends, the linear-scale\n")
#> Note: If true DGP is logit-scale parallel trends, the linear-scale
cat("pre-trends test may spuriously reject due to functional form.\n")
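#> pre-trends test may spuriously reject due to functional form.
In this run both tests return the same joint p-value (0.2518), so neither rejects at conventional levels. A spurious rejection is a risk rather than a certainty; it becomes more likely with larger samples and with more extreme baseline rates.
Based on Roth & Sant'Anna (2023) and the evidence above: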
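Think about your DGP first. If your outcome is binary and baseline rates are near 0 or 1 (for example, below about 15% or above 85%), the choice of scale matters most, so consider a nonlinear model and justify the scale on which you assume parallel trends.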
Report the scale of your parallel trends assumption. “We assume parallel trends in log-odds” is a substantively different claim from “parallel trends in probabilities.”
Use doubly-robust estimation (doubly_robust = TRUE), which remains consistent if either the outcome model or the propensity score model is correctly specified. It does not, however, resolve the choice of scale on which parallel trends is assumed.
Consider the odds-ratio DiD for binary outcomes when you want an estimate that does not depend on the reference group or period.
Use nonlinear_bounds() to report the range of ATTs consistent with the data under minimal assumptions.
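For the odds-ratio DiD recommended above, one common definition (assumed here) is the ratio of the treated group's post/pre odds ratio to the control group's. Applied to the toy probabilities from the start of this vignette, it equals the exponentiated logit-scale DiD, which the sketch below checks in base R.

```r
# Toy probabilities from the scale-sensitivity example
p_ctrl_pre  <- 0.30; p_ctrl_post  <- 0.40
p_treat_pre <- 0.25; p_treat_post <- 0.35

odds <- function(p) p / (1 - p)

# Ratio of odds ratios: (treated post/pre odds) / (control post/pre odds)
or_did <- (odds(p_treat_post) / odds(p_treat_pre)) /
          (odds(p_ctrl_post) / odds(p_ctrl_pre))

# Equivalent: exponentiated DiD on the log-odds scale
logit_did <- (qlogis(p_treat_post) - qlogis(p_treat_pre)) -
             (qlogis(p_ctrl_post) - qlogis(p_ctrl_pre))

c(or_did = or_did, exp_logit_did = exp(logit_did))
```

Both quantities equal 27/26 (about 1.038). An odds-ratio DiD of 1 corresponds to parallel trends in log-odds, so probability-scale parallel trends in this example shows up as an odds-ratio DiD slightly above 1.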
Roth, J., & Sant’Anna, P. H. C. (2023). When is parallel trends sensitive to functional form? Econometrica, 91(2), 737-747.
Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. The Econometrics Journal, 26(3), C31-C66.