---
title: "Policy Learning with Decision-Theoretic Bounds"
author: "Deniz Akdemir"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Policy Learning with Decision-Theoretic Bounds}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5
)
```

## Introduction

This vignette demonstrates how to use `causaldef` for **safe policy learning**:
making treatment decisions with quantified guarantees even when unobserved
confounding exists.

The key insight is the **policy regret transfer bound**:

$$\text{Regret}_{do}(\pi) \leq \text{Regret}_{obs}(\pi) + M \cdot \delta$$

where:

- $\text{Regret}_{do}(\pi)$ = regret under the true interventional distribution
- $\text{Regret}_{obs}(\pi)$ = regret observed in data
- $M$ = utility range (maximum minus minimum possible outcome)
- $\delta$ = Le Cam deficiency (quantifies confounding)
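
As a quick numeric illustration of the bound, the sketch below plugs in hypothetical values. The helper names `regret_transfer_bound()` and `max_tolerable_delta()` are made up for this example and are not part of `causaldef`; the second simply inverts the inequality to ask how large $\delta$ may be before a chosen regret budget is exceeded.

```{r bound-arithmetic}
# Hypothetical helpers for illustration only (not causaldef functions).

# Upper bound on interventional regret: Regret_obs + M * delta
regret_transfer_bound <- function(obs_regret, M, delta) {
  stopifnot(M >= 0, delta >= 0, delta <= 1)
  obs_regret + M * delta
}

# Largest deficiency for which the bound stays within a regret budget
max_tolerable_delta <- function(budget, obs_regret, M) {
  stopifnot(M > 0)
  pmax(0, (budget - obs_regret) / M)
}

# Assumed inputs: 0-100 outcome scale, 5 points of observed regret,
# and a deficiency of 0.04
regret_transfer_bound(obs_regret = 5, M = 100, delta = 0.04)  # 5 + 100 * 0.04
max_tolerable_delta(budget = 12, obs_regret = 5, M = 100)     # (12 - 5) / 100
```

Inverting the bound this way turns a tolerable regret level into a target for the deficiency estimate, which is how the thresholds later in this vignette can be read.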

## The Safety Floor Concept

`policy_regret_bound()` reports two complementary quantities:

- **Transfer penalty** $M\cdot\delta$: additive worst-case regret inflation term, and
- **Minimax safety floor** $(M/2)\cdot\delta$: irreducible worst-case regret when $\delta>0$.

If $\delta>0$, no algorithm can guarantee zero worst-case regret without stronger assumptions or randomized data.
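
To get a feel for the scale of these two quantities, the following base-R sketch tabulates them over a few hypothetical deficiency values for an outcome on a 0 to 100 scale. It only evaluates the two formulas above and does not call `policy_regret_bound()`; the object names are chosen for this example.

```{r floor-grid}
# Hypothetical deficiency values; M_example is the utility range (0-100 outcome)
M_example <- 100
delta_grid <- c(0, 0.01, 0.05, 0.10, 0.20)

floor_table <- data.frame(
  delta = delta_grid,
  transfer_penalty = M_example * delta_grid,      # M * delta
  minimax_floor = 0.5 * M_example * delta_grid    # (M / 2) * delta
)

print(floor_table)
```

Both columns grow linearly in $\delta$, and the floor is always half of the penalty.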

### Implications for AI/ML Safety

1. **No algorithm can beat the safety floor**: Collecting more observational
   data does not help while unmeasured confounding remains
2. **Deficiency is the price of observational learning**: To eliminate the
   safety floor, you need randomized experiments
3. **Confidence intervals aren't enough**: Standard ML uncertainty quantification
   captures sampling variability, not confounding bias
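
The first and third points can be seen in a small simulation. The sketch below uses an assumed data-generating process (a constant true effect of 10 and a single unmeasured confounder `U`), not the one used later in this vignette, and compares the naive difference in means across sample sizes.

```{r bias-persists}
set.seed(42)

true_effect <- 10  # assumed constant treatment effect

naive_estimate <- function(n) {
  U <- rnorm(n)                        # unmeasured confounder
  A <- rbinom(n, 1, plogis(U))         # treatment probability depends on U
  Y <- 50 + true_effect * A + 5 * U + rnorm(n, sd = 5)
  tt <- t.test(Y[A == 1], Y[A == 0])   # Welch interval for the mean difference
  data.frame(
    n = n,
    estimate = unname(tt$estimate[1] - tt$estimate[2]),
    ci_lower = tt$conf.int[1],
    ci_upper = tt$conf.int[2]
  )
}

sim <- do.call(rbind, lapply(c(1000L, 10000L, 100000L), naive_estimate))
sim$covers_truth <- sim$ci_lower <= true_effect & sim$ci_upper >= true_effect
print(sim)
```

The interval narrows as `n` grows, but in this setup it narrows around a value above the true effect, so a tighter interval does not mean a more trustworthy estimate.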

## Practical Workflow

### Step 1: Define the Causal Problem

```{r define-problem}
library(causaldef)
set.seed(123)

# Simulate a treatment decision problem
n <- 1000

# Covariates
age <- runif(n, 30, 70)
severity <- rbeta(n, 2, 5) * 10

# Confounded treatment assignment: depends on age, severity, and unmeasured U
U <- rnorm(n)  # Unmeasured health status (affects both treatment and outcome)
ps_true <- plogis(-1 + 0.02 * age + 0.1 * severity + 0.5 * U)
A <- rbinom(n, 1, ps_true)

# Outcome: recovery score (0-100)
# True effect is heterogeneous
tau_true <- 10 + 0.2 * (age - 50)  # Older patients benefit more
Y <- 50 + tau_true * A - 0.3 * severity + 5 * U + rnorm(n, sd = 5)

# Clip to valid range
Y <- pmin(100, pmax(0, Y))

df <- data.frame(
  age = age,
  severity = severity,
  A = A,
  Y = Y
)
```

### Step 2: Estimate Deficiency

```{r estimate-deficiency}
spec <- causal_spec(
  data = df,
  treatment = "A",
  outcome = "Y",
  covariates = c("age", "severity")
)

# Estimate deficiency with multiple methods
def_results <- estimate_deficiency(
  spec,
  methods = c("unadjusted", "iptw", "aipw"),
  n_boot = 100
)

print(def_results)
```

### Step 3: Visualize Deficiency

```{r plot-deficiency, eval=FALSE}
plot(def_results, type = "bar")
```

### Step 4: Compute Policy Regret Bounds

```{r policy-bounds}
# Define utility range (outcome is 0-100)
utility_range <- c(0, 100)

# Suppose our policy has an observed regret of 5 points (5% of the 0-100 range)
obs_regret <- 5

# Compute bound
bounds <- policy_regret_bound(
  deficiency = def_results,
  utility_range = utility_range,
  obs_regret = obs_regret
)

print(bounds)
```

### Step 5: Visualize the Safety Floor

```{r plot-safety, eval=FALSE}
# Show how safety floor varies with deficiency
plot(bounds, type = "safety_curve")
```

## Interpreting the Results

### The Safety Floor Report

```{r interpret}
cat("=== Policy Deployment Decision ===\n\n")

delta_best <- min(def_results$estimates)
M <- diff(utility_range)
transfer_penalty <- M * delta_best
minimax_floor <- 0.5 * M * delta_best

cat(sprintf("Best achievable deficiency: %.3f\n", delta_best))
cat(sprintf("Transfer penalty (M*delta): %.1f points\n", transfer_penalty))
cat(sprintf("Minimax safety floor (M/2*delta): %.1f points\n", minimax_floor))
cat(sprintf("Observed regret: %.1f points\n", obs_regret))

if (!is.null(bounds$regret_bound)) {
  cat(sprintf("Worst-case regret: %.1f points\n", bounds$regret_bound))
}

cat("\n")

# Decision thresholds
if (delta_best < 0.05) {
  cat("✓ EXCELLENT: Deficiency < 5%. High confidence in policy.\n")
} else if (delta_best < 0.10) {
  cat("⚠ MODERATE: Deficiency 5-10%. Proceed with monitoring.\n")
} else {
  cat("✗ CAUTION: Deficiency >= 10%. Consider RCT before deployment.\n")
}
```

## Sensitivity Analysis with Confounding Frontiers

What if there's additional unmeasured confounding?

```{r confounding-frontier}
# Map the confounding frontier
frontier <- confounding_frontier(
  spec,
  alpha_range = c(-2, 2),
  gamma_range = c(-2, 2),
  grid_size = 30
)

# Find the safe region
safe_region <- subset(frontier$grid, delta < 0.1)
cat(sprintf(
  "Safe operating region covers %.1f%% of confounding space\n",
  100 * nrow(safe_region) / nrow(frontier$grid)
))
```

### Visualize the Frontier

```{r plot-frontier, eval=FALSE}
plot(frontier, type = "heatmap", threshold = c(0.05, 0.1, 0.2))
```

## Policy Learning with grf (Optional)

If you have the `grf` package installed, you can use causal forests for 
heterogeneous treatment effect estimation with deficiency bounds:

```{r grf-example, eval=FALSE}
# Estimate deficiency using causal forests
if (requireNamespace("grf", quietly = TRUE)) {
  def_grf <- estimate_deficiency(
    spec,
    methods = c("aipw", "grf"),
    n_boot = 50
  )
  
  print(def_grf)
  
  # Get individual treatment effect predictions
  kernel_grf <- def_grf$kernel$grf
  if (!is.null(kernel_grf$tau_hat)) {
    cat("\nHeterogeneous Effects Detected:\n")
    cat(sprintf("ATE from forest: %.2f\n", kernel_grf$ate))
    cat(sprintf("CATE range: [%.2f, %.2f]\n", 
                min(kernel_grf$tau_hat), 
                max(kernel_grf$tau_hat)))
  }
}
```

## Best Practices for Safe Deployment

### Pre-Deployment Checklist

| Condition | Assessment | Recommended action |
|-----------|------------|--------------------|
| $\delta < 0.05$ | Excellent | Deploy with confidence |
| $0.05 \le \delta < 0.10$ | Moderate | Deploy with active monitoring |
| $\delta \ge 0.10$ | Concerning | Consider pilot RCT |
| Negative control (NC) diagnostic falsified | Any $\delta$ | Do not deploy without more data |
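
The three deficiency tiers can be encoded in a small vectorized helper, for example to label several estimates at once. `deficiency_tier()` is a hypothetical name introduced here, not a `causaldef` function. It uses the same cut points as the decision code in "The Safety Floor Report", treating each tier as closed on the left, and the input values are illustrative.

```{r tier-helper}
# Hypothetical helper (not part of causaldef): map deficiency to a tier.
deficiency_tier <- function(delta) {
  stopifnot(is.numeric(delta), all(delta >= 0, na.rm = TRUE))
  cut(
    delta,
    breaks = c(0, 0.05, 0.10, Inf),
    labels = c("excellent", "moderate", "concerning"),
    right = FALSE  # intervals are [0, 0.05), [0.05, 0.10), [0.10, Inf)
  )
}

# Illustrative values, named by estimation method
example_deltas <- c(unadjusted = 0.18, iptw = 0.07, aipw = 0.03)
data.frame(delta = example_deltas, tier = deficiency_tier(example_deltas))
```

The negative control row of the checklist is a separate yes/no check and is deliberately left out of this helper.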

### Monitoring in Production

```{r monitoring, eval=FALSE}
# Example: Re-estimate deficiency on new data
new_data <- ...  # Your production data

new_spec <- causal_spec(
  new_data,
  treatment = "A",
  outcome = "Y",
  covariates = c("age", "severity")
)

# Quick check
def_monitor <- estimate_deficiency(
  new_spec,
  methods = "iptw",
  n_boot = 50
)

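# Alert if deficiency rose by more than 50% over the baseline IPTW estimate
delta_baseline <- def_results$estimates["iptw"]
if (def_monitor$estimates["iptw"] > 1.5 * delta_baseline) {
  warning("Deficiency increased by more than 50% over baseline; check for distribution shift.")
}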
```

## Mathematical Details

### Policy Regret Transfer (Manuscript)

For any policy $\pi$ and bounded utility function $u \in [0, M]$:

$$\mathbb{E}_{P^{do}}\left[\max_a u(a, X) - u(\pi(X), X)\right] \leq 
\mathbb{E}_{P^{obs}}\left[\max_a u(a, X) - u(\pi(X), X)\right] + M\delta$$

**Proof sketch**: The deficiency $\delta$ bounds the total variation distance
between the (simulated) observational and target interventional laws. Since utility is bounded by $M$,
the maximum discrepancy in expected utility is at most $M$ times the total variation gap.
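
The inequality used in the proof sketch can be checked numerically on discrete distributions. The sketch below draws random probability vectors `p` and `q` and a random function `f` with values in $[0, M]$ (all synthetic, chosen only for illustration) and verifies $|\mathbb{E}_p f - \mathbb{E}_q f| \le M \cdot \mathrm{TV}(p, q)$ in each draw.

```{r tv-check}
set.seed(1)

M_check <- 100   # utility range
K <- 6           # number of support points

check_once <- function() {
  p <- prop.table(rexp(K))        # random probability vector
  q <- prop.table(rexp(K))        # second random probability vector
  f <- runif(K, 0, M_check)       # bounded function, e.g. a per-context regret
  tv <- 0.5 * sum(abs(p - q))     # total variation distance
  gap <- abs(sum(f * p) - sum(f * q))
  gap <= M_check * tv + 1e-12     # small tolerance for floating point
}

all(replicate(1000, check_once()))
```

Because the bound holds for every function with values in $[0, M]$, the check returns `TRUE` for every draw.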

### Why This Matters

Traditional ML focuses on:

- **Prediction error**: How well does my model predict $Y$?
- **Generalization**: Does performance hold on new data?

But for causal policy learning, we need:

- **Interventional validity**: Does my policy work when *deployed*?
- **Confounding robustness**: How much could unmeasured bias hurt me?

The safety floor answers these questions with formal guarantees.

## Summary

| Concept | Definition | Function or field |
|---------|------------|-------------------|
| Transfer penalty | $M\delta$: additive regret inflation term | `$transfer_penalty` |
| Minimax safety floor | $(M/2)\delta$: irreducible worst-case regret | `$minimax_floor` |
| Regret bound | observed regret + transfer penalty | `$regret_bound` |
| Deficiency | Information gap between obs and do | `estimate_deficiency()` |
| Confounding Frontier | Deficiency as function of $(\alpha, \gamma)$ | `confounding_frontier()` |

Use these tools to make **safe, accountable decisions** from observational data.

## References

1. Akdemir, D. (2026). Constraints on Causal Inference as Experiment Comparison. 
   DOI: 10.5281/zenodo.18367347. See `thm:policy_regret` (Policy Regret Transfer) and
   `thm:safety_floor` (Minimax Safety Floor).

2. Athey, S., & Wager, S. (2021). Policy learning with observational data. 
   *Econometrica*, 89(1), 133-161.

3. Kallus, N., & Zhou, A. (2020). Confounding-robust policy evaluation in
   infinite-horizon reinforcement learning. *NeurIPS*.
