---
title: "Custom Derivatives for Maximum Likelihood Estimation"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Custom Derivatives for Maximum Likelihood Estimation}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

Maximum likelihood estimation (MLE) for survival models requires computing derivatives of the log-likelihood function. The **score function** (gradient) is needed for optimization, and the **Hessian matrix** (second derivatives) is needed for computing standard errors and confidence intervals via the observed Fisher information.

The `flexhaz` package uses a simple two-tier fallback for each derivative:

| Derivative | Custom function (if supplied) | Numerical fallback |
|-----------|-------------------------------|--------------------|
| Score     | `score_fn(df, par, ...)` | [`numDeriv`](https://CRAN.R-project.org/package=numDeriv)`::grad()` |
| Hessian   | `hess_fn(df, par, ...)`  | [`numDeriv`](https://CRAN.R-project.org/package=numDeriv)`::hessian()` |

**You decide how to compute derivatives.** Hand-derive them, use an AD library, or let the package fall back to numerical methods. The package accepts functions; it doesn't care how they were produced.

This vignette shows how to provide custom derivatives using the Weibull distribution as a case study.

## Setup

```{r setup, message=FALSE, warning=FALSE}
library(flexhaz)
```

## The Weibull Distribution

The Weibull distribution is widely used in survival analysis. Its hazard function is:

$$h(t; k, \sigma) = \frac{k}{\sigma}\left(\frac{t}{\sigma}\right)^{k-1}$$

The cumulative hazard is:

$$H(t; k, \sigma) = \left(\frac{t}{\sigma}\right)^k$$

For uncensored data, the log-likelihood is:

$$\ell(k, \sigma) = n\log k + (k-1)\sum_i \log t_i - nk\log\sigma - \sum_i \left(\frac{t_i}{\sigma}\right)^k$$
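
With right-censoring indicators $\delta_i$ (1 = event, 0 = censored), this generalizes to

$$\ell(k, \sigma) = \sum_i \delta_i \left[\log k + (k-1)\log t_i - k\log\sigma\right] - \sum_i \left(\frac{t_i}{\sigma}\right)^k$$

which reduces to the uncensored form when all $\delta_i = 1$; this is the form the code in this vignette implements via the `delta` column.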

## Simulating Data

```{r simulate-data}
set.seed(42)
true_k <- 2
true_sigma <- 3
n <- 100

u <- runif(n)
times <- true_sigma * (-log(u))^(1/true_k)
df <- data.frame(t = times, delta = rep(1, n))

cat("Sample size:", n, "\n")
cat("True parameters: k =", true_k, ", sigma =", true_sigma, "\n")
```
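
The sampler uses inverse transform sampling: the Weibull survival function is $S(t) = \exp\{-(t/\sigma)^k\}$, so setting $S(t) = u$ with $u \sim \mathrm{Uniform}(0,1)$ and solving for $t$ gives

$$t = \sigma(-\log u)^{1/k}$$

which is exactly the expression used to generate `times` above.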

## Approach 1: Minimal — Just the Hazard Rate

The simplest approach provides only the hazard function. Everything else is computed numerically:

```{r minimal-approach}
weibull_minimal <- dfr_dist(
    rate = function(t, par, ...) {
        k <- par[1]
        sigma <- par[2]
        (k / sigma) * (t / sigma)^(k - 1)
    }
)

ll <- loglik(weibull_minimal)
s <- score(weibull_minimal)
H <- hess_loglik(weibull_minimal)

test_par <- c(1.8, 2.8)

cat("Log-likelihood:", ll(df, par = test_par), "\n")
cat("Score (numerical):", s(df, par = test_par), "\n")
cat("Hessian (numerical):\n")
print(round(H(df, par = test_par), 4))
```

**Pros:** No math required beyond the hazard function.

**Cons:** Slower (numerical integration for H(t), finite differences for derivatives).
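
Concretely, when only the hazard rate is supplied, the numerical fallback must recover the cumulative hazard by quadrature,

$$H(t; \theta) = \int_0^t h(u; \theta)\,du,$$

and then obtain the score and Hessian by finite differences of the resulting log-likelihood, which multiplies the cost per evaluation.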

## Approach 2: Analytical Score + Analytical Hessian

For the Weibull, we can derive the exact gradient and Hessian analytically:

**Score:**

$$\frac{\partial \ell}{\partial k} = \frac{n}{k} + \sum_i \log t_i - n\log\sigma - \sum_i \left(\frac{t_i}{\sigma}\right)^k \log\left(\frac{t_i}{\sigma}\right)$$

$$\frac{\partial \ell}{\partial \sigma} = \frac{k}{\sigma}\left[\sum_i \left(\frac{t_i}{\sigma}\right)^k - n\right]$$
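
**Hessian:**

$$\frac{\partial^2 \ell}{\partial k^2} = -\frac{n}{k^2} - \sum_i \left(\frac{t_i}{\sigma}\right)^k \log^2\left(\frac{t_i}{\sigma}\right)$$

$$\frac{\partial^2 \ell}{\partial k\,\partial\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma}\sum_i \left(\frac{t_i}{\sigma}\right)^k + \frac{k}{\sigma}\sum_i \left(\frac{t_i}{\sigma}\right)^k \log\left(\frac{t_i}{\sigma}\right)$$

$$\frac{\partial^2 \ell}{\partial \sigma^2} = \frac{nk}{\sigma^2} - \frac{k(k+1)}{\sigma^2}\sum_i \left(\frac{t_i}{\sigma}\right)^k$$

(In the code below, $n$ is replaced by the event count so that right-censored observations are handled as well.)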

```{r analytical-approach}
weibull_full <- dfr_dist(
    rate = function(t, par, ...) {
        k <- par[1]
        sigma <- par[2]
        (k / sigma) * (t / sigma)^(k - 1)
    },
    cum_haz_rate = function(t, par, ...) {
        k <- par[1]
        sigma <- par[2]
        (t / sigma)^k
    },
    score_fn = function(df, par, ...) {
        k <- par[1]
        sigma <- par[2]
        t <- df$t
        delta <- if ("delta" %in% names(df)) df$delta else rep(1, nrow(df))

        n_events <- sum(delta == 1)
        t_ratio <- t / sigma
        log_t_ratio <- log(t_ratio)

        dk <- n_events / k + sum(delta * log_t_ratio) -
            sum(t_ratio^k * log_t_ratio)
        dsigma <- -n_events * k / sigma +
            (k / sigma) * sum(t_ratio^k)

        c(dk, dsigma)
    },
    hess_fn = function(df, par, ...) {
        k <- par[1]
        sigma <- par[2]
        t <- df$t
        delta <- if ("delta" %in% names(df)) df$delta else rep(1, nrow(df))

        n_events <- sum(delta == 1)
        t_ratio <- t / sigma
        log_t_ratio <- log(t_ratio)
        t_ratio_k <- t_ratio^k

        d2k <- -n_events / k^2 - sum(t_ratio_k * log_t_ratio^2)
        d2k_sigma <- -n_events / sigma +
            (1 / sigma) * sum(t_ratio_k) +
            (k / sigma) * sum(t_ratio_k * log_t_ratio)
        d2sigma <- n_events * k / sigma^2 -
            k * (k + 1) / sigma^2 * sum(t_ratio_k)

        matrix(c(d2k, d2k_sigma, d2k_sigma, d2sigma), nrow = 2)
    }
)

s_full <- score(weibull_full)
H_full <- hess_loglik(weibull_full)

cat("Score (analytical):", s_full(df, par = test_par), "\n")
cat("Hessian (analytical):\n")
print(round(H_full(df, par = test_par), 4))
```
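
Hand-derived formulas are easy to get wrong, so it is worth cross-checking them against finite differences before relying on them. The sketch below performs that check outside of R, in self-contained Python using only the standard library; the event times and tolerance are arbitrary illustrative choices, and the dataset is uncensored so the simpler formulas apply.

```python
import math

# Small illustrative dataset of uncensored event times.
times = [0.8, 1.5, 2.1, 3.0, 4.2]
n = len(times)

def loglik(k, sigma):
    # Weibull log-likelihood for uncensored data.
    return (n * math.log(k)
            + (k - 1) * sum(math.log(t) for t in times)
            - n * k * math.log(sigma)
            - sum((t / sigma) ** k for t in times))

def score(k, sigma):
    # Analytical gradient, matching the score formulas above.
    log_ratio = [math.log(t / sigma) for t in times]
    ratio_k = [(t / sigma) ** k for t in times]
    dk = n / k + sum(log_ratio) - sum(r * l for r, l in zip(ratio_k, log_ratio))
    dsigma = (k / sigma) * (sum(ratio_k) - n)
    return dk, dsigma

def numeric_grad(k, sigma, h=1e-6):
    # Central finite differences of the log-likelihood.
    dk = (loglik(k + h, sigma) - loglik(k - h, sigma)) / (2 * h)
    ds = (loglik(k, sigma + h) - loglik(k, sigma - h)) / (2 * h)
    return dk, ds

ok = all(abs(a - b) < 1e-5
         for a, b in zip(score(1.8, 2.8), numeric_grad(1.8, 2.8)))
print(ok)  # True: analytic and numeric gradients agree
```

The same idea applies to the Hessian (difference the analytical score), and within R you can compare `score_fn` output directly against `numDeriv::grad()` on the log-likelihood.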

## Approach 3: Provide Score Only, Let [numDeriv](https://CRAN.R-project.org/package=numDeriv) Handle the Hessian

If deriving the Hessian is tedious but the score is straightforward, provide just `score_fn`:

```{r score-only}
weibull_score_only <- dfr_dist(
    rate = function(t, par, ...) {
        k <- par[1]
        sigma <- par[2]
        (k / sigma) * (t / sigma)^(k - 1)
    },
    cum_haz_rate = function(t, par, ...) {
        k <- par[1]
        sigma <- par[2]
        (t / sigma)^k
    },
    score_fn = function(df, par, ...) {
        k <- par[1]
        sigma <- par[2]
        t <- df$t
        delta <- if ("delta" %in% names(df)) df$delta else rep(1, nrow(df))

        n_events <- sum(delta == 1)
        t_ratio <- t / sigma
        log_t_ratio <- log(t_ratio)

        dk <- n_events / k + sum(delta * log_t_ratio) -
            sum(t_ratio^k * log_t_ratio)
        dsigma <- -n_events * k / sigma +
            (k / sigma) * sum(t_ratio^k)

        c(dk, dsigma)
    }
    # No hess_fn -> numDeriv::hessian fallback
)

H_mixed <- hess_loglik(weibull_score_only)
cat("Hessian (numerical fallback):\n")
print(round(H_mixed(df, par = test_par), 4))
```

This gives you fast optimization (analytical gradient) with a correct, if slower, numerical Hessian computed once at the end.

## Using Pre-Built Constructors

The package provides helper constructors with analytical derivatives already implemented:

```{r constructors, warning=FALSE}
# These include score_fn and hess_fn (where available)
weib <- dfr_weibull(shape = 2, scale = 3)
exp_d <- dfr_exponential(lambda = 0.5)

# Fit the model
solver <- fit(weib)
result <- solver(df, par = c(1.5, 2.5))

cat("Fitted parameters:\n")
print(coef(result))
cat("\nTrue parameters: k =", true_k, ", sigma =", true_sigma, "\n")
cat("\n95% Confidence intervals:\n")
print(confint(result))
```

## Computing Standard Errors

With the Hessian, standard errors come from the observed Fisher information:
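
In symbols, with $\hat\theta = (\hat k, \hat\sigma)$ the MLE:

$$\mathcal{I}(\hat\theta) = -\nabla^2 \ell(\hat\theta), \qquad \widehat{\mathrm{se}}(\hat\theta_j) = \sqrt{\left[\mathcal{I}(\hat\theta)^{-1}\right]_{jj}}$$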

```{r standard-errors}
# Hessian at MLE
mle_par <- coef(result)
hess_mle <- H_full(df, par = mle_par)

# Observed Fisher information = -Hessian
obs_info <- -hess_mle

# Standard errors = sqrt(diag(inverse Fisher info))
se <- sqrt(diag(solve(obs_info)))

cat("Standard errors:\n")
cat("  SE(k) =", round(se[1], 4), "\n")
cat("  SE(sigma) =", round(se[2], 4), "\n")
```

## Distribution Reference

| Distribution | Constructor | Score | Hessian |
|-------------|------------|-------|---------|
| Exponential | `dfr_exponential()` | Analytical | Analytical |
| Weibull | `dfr_weibull()` | Analytical | Analytical |
| Gompertz | `dfr_gompertz()` | Analytical | [numDeriv](https://CRAN.R-project.org/package=numDeriv) |
| Log-logistic | `dfr_loglogistic()` | Analytical | [numDeriv](https://CRAN.R-project.org/package=numDeriv) |

For Gompertz and log-logistic, you can supply your own `hess_fn` if needed.

## Next Steps

- **`vignette("flexhaz-package")`** - Package introduction and quick start
- **`vignette("reliability_engineering")`** - Real-world applications
- **`vignette("failure_rate")`** - Mathematical foundations
- **`vignette("custom_distributions")`** - The three-level optimization paradigm
