---
title: "NeStage Functions: A User Guide"
subtitle: "Estimating Effective Population Size in Stage-Structured Populations"
author: "Raymond L. Tremblay"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{NeStage Functions: A User Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.width = 7,
  fig.height = 5,
  warning   = FALSE,
  message   = FALSE
)

library(NeStage)

```{r packages, include = FALSE}
library(expm)     # matrix exponentiation (%^%)
library(ggplot2)
library(dplyr)
library(tidyr)
library(purrr)    # map_dfr for looping over populations
library(knitr)
```

---

# Introduction

## What is effective population size and why does it matter?

The **effective population size** ($N_e$) is the size of an ideal Wright-Fisher
population that would experience the same rate of genetic drift — the random
change in allele frequencies from one generation to the next — as the actual
population being studied (Wright 1931). It is almost always smaller than the
census population size $N$, sometimes dramatically so.

$N_e$ matters for conservation because:

- When $N_e$ is small, genetic drift dominates over natural selection, accelerating
  the loss of genetic variation and the fixation of deleterious alleles
  (Frankham 1995).
- The short-term conservation threshold of $N_e \geq 50$ (Franklin 1980)
  is needed to avoid inbreeding depression in the near term. For small,
  range-restricted species like *Lepanthes*, this is the operationally
  relevant benchmark — a threshold of $N_e \geq 5{,}000$ (Lande 1995)
  is ecologically unattainable for any known population.
- Empirical surveys across 102 wildlife species found a median $N_e/N \approx 0.10$
  — meaning that on average only 10% of census individuals contribute
  genetically to the next generation (Frankham 1995).

## Why stage-structured models?

Many perennial plants, including orchids, do not have a simple annual census
that maps onto a generation. Instead, individuals pass through distinct
life-history **stages** — seedlings, juveniles, non-reproductive adults,
reproductive adults — each with different survival and reproductive rates.
Using a simple $N_e$ formula designed for annual species would ignore this
structure and produce misleading estimates.

**NeStage** implements the variance effective population size for
stage-structured populations derived by Yonezawa et al. (2000). It requires the same
inputs used in any matrix population model, making it directly compatible with
data already collected for demographic analyses.

## The three NeStage functions

| Function | Use when | Reproduction |
|---|---|---|
| `Ne_clonal_Y2000()` | All stages reproduce clonally | Fully clonal ($d_i = 1$) |
| `Ne_sexual_Y2000()` | All stages reproduce sexually | Fully sexual ($d_i = 0$) |
| `Ne_mixed_Y2000()` | Stages differ in clonal fraction | Mixed ($0 \leq d_i \leq 1$) |

This vignette focuses on `Ne_sexual_Y2000()` applied to three Puerto Rican
*Lepanthes* orchid species (Tremblay & Ackerman 2001), which reproduce exclusively by
seed (sexually). The other two functions are described briefly in Section 6.

## MatU and MatF: the standard input format

NeStage follows the **MatU / MatF** convention now standard in population
biology (Caswell 2001; Salguero-Gómez et al. 2015):

- **MatU** — the survival-transition matrix. Element $u_{ji}$ is the
  probability that an individual in stage $i$ at time $t$ survives and is
  found in stage $j$ at time $t+1$. Column sums $\leq 1$.
- **MatF** — the fecundity matrix. Element $f_{ji}$ is the per-capita
  production of stage-$j$ offspring by a stage-$i$ individual per time step.
  In most models only the first row is non-zero (all offspring enter stage 1).

The full projection matrix is $\mathbf{A} = \mathbf{MatU} + \mathbf{MatF}$.

> **Tip:** If you use the `{Rcompadre}` or `{Rage}` packages, your matrices
> are already in MatU/MatF format and can be passed directly to NeStage.

---

# The study system: *Lepanthes* orchids in Puerto Rico

*Lepanthes* is one of the most species-rich orchid genera in the Neotropics,
with over 600 species (Luer 1986). Three Puerto Rican species were studied by
Tremblay & Ackerman (2001):

| Species | Habitat | Populations | Census N (adults) |
|---|---|---|---|
| *L. rupestris* | Lithophyte, riverbeds | 7 | 62–102 |
| *L. rubripetala* | Epiphyte, rare | 6 | 27–101 |
| *L. eltoroensis* | Epiphyte, mountain ridges | 4 | 14–51 |

All three species are obligate outcrossers pollinated by fungus gnats, with
no vegetative reproduction. Populations are small, patchily distributed, and
show high variance in reproductive output — conditions expected to produce
small $N_e$ values.

## The six-stage life cycle of *L. rupestris*

*L. rupestris* has the most complex life cycle of the three species, with
**six stages** (Figure 1 of Tremblay & Ackerman (2001)):

| Stage | Description |
|---|---|
| 1 | Recruits (new seedlings entering the population) |
| 2 | Seedlings (established, not yet juvenile) |
| 3 | Juveniles |
| 4 | Non-reproductive adults |
| 5 | Reproductive adults with one active shoot |
| 6 | Reproductive adults with two or more active shoots |

*L. rubripetala* and *L. eltoroensis* have **five stages** (stage 6 is absent;
all reproductive adults are pooled into stage 5).

> **Note on time step:** The matrices in Tremblay & Ackerman (2001) are **monthly**
> transition probabilities, estimated from individuals tagged and tracked
> monthly for 16–34 months. Section 4 explores the consequences of
> converting these to annual matrices before computing $N_e$.

---

# Setting up the matrices

## *L. rupestris* population 1 — monthly matrices

The monthly MatU and MatF matrices for population 1 of *L. rupestris*
are taken directly from the Appendix of Tremblay & Ackerman (2001).

```{r Lrup1_monthly}
# Monthly MatU (survival-transition) — L. rupestris population 1
# Rows = destination stage, columns = origin stage
# 6 x 6 matrix, column sums <= 1
MatU_Lrup1_mo <- matrix(c(
  0,     0,     0,     0,     0,     0,
  0.738, 0.738, 0,     0,     0,     0,
  0,     0,     0.515, 0,     0.076, 0.013,
  0,     0.038, 0,     0.777, 0,     0,
  0,     0.002, 0.368, 0.011, 0.730, 0.171,
  0,     0,     0.037, 0,     0.169, 0.790
), nrow = 6, byrow = TRUE,
dimnames = list(
  paste0("stage_", 1:6),
  paste0("stage_", 1:6)
))

# Monthly MatF (fecundity) — L. rupestris population 1
# Only stages 5 and 6 produce offspring (row 1 only)
MatF_Lrup1_mo <- matrix(c(
  0, 0, 0, 0, 0.120, 0.414,
  0, 0, 0, 0, 0,     0,
  0, 0, 0, 0, 0,     0,
  0, 0, 0, 0, 0,     0,
  0, 0, 0, 0, 0,     0,
  0, 0, 0, 0, 0,     0
), nrow = 6, byrow = TRUE,
dimnames = list(
  paste0("stage_", 1:6),
  paste0("stage_", 1:6)
))  # MatF contains fecundity only (row 1): stages 5 and 6 produce offspring

# Verify column sums of MatU are <= 1
colSums(MatU_Lrup1_mo)
```

> **Check:** All column sums of MatU should be $\leq 1$. Values $< 1$
> represent stage-specific mortality. Here stage 1 has no self-loop
> (column sum = 0.738) meaning recruits either move to stage 2 or die.

## Stage frequency vector D

The stage frequency vector $D$ gives the proportion of the population in
each stage. From Table 1 of Tremblay & Ackerman (2001), population 1 of *L. rupestris*
has 44 seedlings, 72 juveniles, and 97 adults. Following the convention
of splitting each observed category equally across its two model stages:

```{r D_Lrup1}
# Raw counts: 44 seedlings, 72 juveniles, 97 adults
# Split equally: stages 1-2 get 44/2 each, stages 3-4 get 72/2 each,
#                stages 5-6 get 97/2 each
N_Lrup1 <- c(22, 22, 36, 36, 48.5, 48.5)  # total = 213
D_Lrup1 <- N_Lrup1 / sum(N_Lrup1)

round(D_Lrup1, 4)
sum(D_Lrup1)  # must equal 1
```

---

# Monthly vs annual matrices: does the time step matter?

This is a question with real practical consequences. Field ecologists often
collect monthly demographic data but $N_e$ theory assumes a **yearly**
time step (or at least a consistent biological time unit tied to the
generation). What happens when we compute $N_e$ from monthly vs annual matrices?

## Converting monthly to annual matrices

For a linear matrix population model, the annual projection matrix is simply
the monthly matrix raised to the 12th power:

$$\mathbf{A}_{annual} = \mathbf{A}_{monthly}^{12}$$

We can decompose this into MatU and MatF components:

```{r monthly_to_annual}
library(expm)

# Annual projection matrix
A_Lrup1_mo  <- MatU_Lrup1_mo + MatF_Lrup1_mo
A_Lrup1_ann <- A_Lrup1_mo %^% 12

# Decompose back into MatU_annual and MatF_annual
# MatU_annual: survival component raised to the 12th power
MatU_Lrup1_ann <- MatU_Lrup1_mo %^% 12

# MatF_annual: annual fecundity = sum over 12 months of
# (probability of surviving to month k) x (fecundity at month k)
# F_ann = sum_{k=0}^{11} MatU^k %*% MatF_monthly
MatF_Lrup1_ann <- matrix(0, 6, 6)
MatU_k <- diag(6)  # start at MatU^0 = identity
for (k in 0:11) {
  MatF_Lrup1_ann <- MatF_Lrup1_ann + MatU_k %*% MatF_Lrup1_mo
  MatU_k         <- MatU_k %*% MatU_Lrup1_mo
}
# Result: only row 1 is non-zero (offspring only enter stage 1)

# Quick check: the residual measures second-order effects (offspring
# reproducing within the same year). For monthly matrices with low
# fecundity (f5=0.120, f6=0.414 per month) this is negligible.
# Round values below 1e-6 to zero for a clean decomposition.
MatU_Lrup1_ann[MatU_Lrup1_ann < 1e-6] <- 0
MatF_Lrup1_ann[MatF_Lrup1_ann < 1e-6] <- 0
round(max(abs(A_Lrup1_ann - (MatU_Lrup1_ann + MatF_Lrup1_ann))), 8)
```

> **How the conversion works:** $\mathbf{MatU}^{12}$ gives the probability of
> surviving and transitioning across 12 monthly steps. $\mathbf{MatF}_{annual}$
> accumulates fecundity contributions in each month, weighted by the
> probability of having survived to that month. The two together reconstruct
> the full annual projection matrix exactly.

## Comparing Ne/N from monthly vs annual matrices

```{r compare_monthly_annual}
# --- Monthly ---
Ne_mo <- Ne_sexual_Y2000(
  T_mat      = MatU_Lrup1_mo,
  F_vec      = MatF_Lrup1_mo[1, ],
  D          = D_Lrup1,
  population = "L. rupestris pop 1 (monthly)"
)

# --- Annual ---
Ne_ann <- Ne_sexual_Y2000(
  T_mat      = MatU_Lrup1_ann,
  F_vec      = MatF_Lrup1_ann[1, ],
  D          = D_Lrup1,
  population = "L. rupestris pop 1 (annual)"
)

# Summary comparison table
comparison <- data.frame(
  Time_step = c("Monthly", "Annual"),
  NeN       = round(c(Ne_mo$NeN,  Ne_ann$NeN),  3),
  L         = round(c(Ne_mo$L,    Ne_ann$L),     2),
  V         = round(c(Ne_mo$V,    Ne_ann$V),     4),
  Min_N     = c(Ne_mo$Min_N, Ne_ann$Min_N)
)

knitr::kable(comparison,
             col.names = c("Time step", "Ne/N", "L (time units)",
                           "V", "Min N (Ne≥50)"),
             caption = "Ne/N estimates from monthly vs annual matrices,
                        L. rupestris population 1.")
```

## Interpreting the difference

The two approaches give different $N_e/N$ estimates because the time step
affects both $V$ (the variance in gene frequency change per time unit) and
$L$ (the generation time measured in time units):

- With **monthly** matrices, $L$ is measured in **months**. A generation of
  ~13 years corresponds to $L \approx 156$ months. The variance $V$ is very
  small per month because survival and reproduction are nearly constant
  month-to-month.

- With **annual** matrices, $L$ is measured in **years**, giving $L \approx 13$.
  The variance $V$ is larger per year because it integrates across 12 months
  of variation.

The key biological insight is that **$N_e/N$ should be similar between the
two approaches** when $V \times L$ is consistent — this ratio is what appears
in the denominator of Equation 6 of Yonezawa et al. (2000). Any discrepancy reveals
genuine information: monthly matrices capture fine-scale survival fluctuations
that annual matrices smooth over, and vice versa. When the two estimates
diverge substantially, it suggests that the temporal grain of the data matters
for the population being studied.

> **Recommendation:** If your field data were collected monthly, use monthly
> matrices. Do not convert to annual before computing $N_e$, as this
> introduces approximation error. Report the time unit of your matrices
> alongside your $N_e/N$ estimate.

---

# Running Ne_sexual_Y2000: a complete worked example

## All inputs explained

```{r inputs_explained, eval = FALSE}
Ne_sexual_Y2000(
  T_mat      = MatU,        # MatU: survival/transition matrix (s x s)
                             # Column sums must be <= 1
  F_vec      = MatF[1, ],   # First row of MatF: per-capita offspring
                             # production for each stage
  D          = D,           # Stage frequency vector, length s, sums to 1
                             # Use observed proportions or dominant eigenvector
  Vk_over_k  = rep(1, s),  # Variance-to-mean ratio of sexual reproductive
                             # output per stage. Default = 1 (Poisson).
                             # Set > 1 if some individuals dominate reproduction.
  a          = 0,           # Deviation from Hardy-Weinberg proportions.
                             # Default = 0 (random mating).
  L          = NULL,        # Generation time. If NULL, computed from T_mat
                             # and F_vec automatically.
  Ne_target  = 50,        # Ne viability threshold. Set to your study system:
                             #   50  = avoid inbreeding depression (Franklin 1980)
                             #   500 = maintain quantitative variation (Franklin 1980)
                             #   5000 = long-term evolutionary potential (Lande 1995)
  census_N   = 40,        # Your actual (or expected) census population size.
                             # Reports Ne_at_census = NeN * census_N directly,
                             # so you can see the Ne your population achieves NOW.
  population = "my pop"     # Label for output
)
```

## Full output for *L. rupestris* population 1

```{r full_output_Lrup1}
Ne_Lrup1 <- Ne_sexual_Y2000(
  T_mat      = MatU_Lrup1_mo,
  F_vec      = MatF_Lrup1_mo[1, ],
  D          = D_Lrup1,
  Ne_target  = 50,         # short-term inbreeding threshold
  census_N   = 40,         # realistic large population of Lepanthes
  population = "L. rupestris population 1 (monthly)"
)

print(Ne_Lrup1)
```

## Reading the output

The output has four sections:

**Stage survival** reports the annual survival rates and the two key
summary statistics: $\bar{u}$ (population-mean annual survival) and
$\bar{u^2}$ (stage-weighted mean of squared survivals). The difference
$\bar{u} - \bar{u^2}$ measures the **between-stage variance in survival**
— stages with very different survival rates contribute more to genetic drift.

**Variance decomposition** breaks $V$ into its components:

- $V_{\text{term1}} = 2(1+a)(\bar{u} - \bar{u^2})$ — between-stage
  survival variance. This is the only term that enters the purely clonal
  model (Eq. 10 of Yonezawa et al. (2000)).
- $V_{\text{term2}} = (1-\bar{u}) \cdot Avr(S)$ — sexual reproductive
  variance. Larger when fecundity is unequal across stages.

**Generation time** $L$ is the mean age of reproduction, computed following
Orive (1993), using the dominant eigenvector of $\mathbf{A}$ as the
stable-stage distribution.

**Effective size ratios** give the key conservation outputs:
- $N_e/N$ — the fraction of census individuals contributing to the effective
  population. Values below 0.10 are below the empirical median across 102
  wildlife species (Frankham 1995).
- Minimum census size $N$ required to achieve $N_e \geq 50$ (Franklin 1980).

---

# All populations of all three species

## Entering all matrices

```{r all_matrices}
# ============================================================
# L. rupestris — 7 populations, 6-stage model
# Monthly MatU and MatF from Tremblay & Ackerman (2001) Appendix
# ============================================================

# Helper: build MatF from fecundity values for stages 5 and 6
make_MatF_6 <- function(f5, f6) {
  F <- matrix(0, 6, 6)
  F[1, 5] <- f5
  F[1, 6] <- f6
  F  # MatF contains fecundity only — no diagonal terms
}

# Helper: build MatU for L. rupestris (6-stage, shared structure)
# Fixed positions: row1=0, row2 col1=0.738 col2=0.738, row4 col2=0.038,
#                  row5 col2=0.002
# Free parameters (13 values, in row order):
#   row3: s33, s35, s36
#   row4: s43, s44
#   row5: s53, s54, s55, s56
#   row6: s63, s64, s65, s66
make_MatU_Lrup <- function(s33, s35, s36,
                            s43, s44,
                            s53, s54, s55, s56,
                            s63, s64, s65, s66) {
  matrix(c(
    0,     0,     0,    0,    0,    0,
    0.738, 0.738, 0,    0,    0,    0,
    0,     0,     s33,  0,    s35,  s36,
    0,     0.038, s43,  s44,  0,    0,
    0,     0.002, s53,  s54,  s55,  s56,
    0,     0,     s63,  s64,  s65,  s66
  ), nrow = 6, byrow = TRUE,
  dimnames = list(paste0("stage_", 1:6), paste0("stage_", 1:6)))
}

# --- L. rupestris population 1 (already defined above) ---
MatU_Lrup <- list()
MatF_Lrup <- list()
MatU_Lrup[[1]] <- MatU_Lrup1_mo
MatF_Lrup[[1]] <- MatF_Lrup1_mo

# --- L. rupestris population 2 ---
MatU_Lrup[[2]] <- make_MatU_Lrup(
  0.466, 0.126, 0.028,   # row 3: s33, s35, s36
  0,     0.777,           # row 4: s43, s44
  0.491, 0.011, 0.750, 0.191,  # row 5
  0.022, 0,     0.112, 0.781   # row 6
)
MatF_Lrup[[2]] <- make_MatF_6(0.086, 0.273)

# --- L. rupestris population 3 ---
MatU_Lrup[[3]] <- make_MatU_Lrup(
  0.653, 0.100, 0.035,
  0,     0.777,
  0.307, 0.011, 0.794, 0.338,
  0.009, 0,     0.101, 0.627
)
MatF_Lrup[[3]] <- make_MatF_6(0.076, 0.204)

# --- L. rupestris population 4 ---
MatU_Lrup[[4]] <- make_MatU_Lrup(
  0.600, 0.101, 0.018,
  0,     0.777,
  0.310, 0.011, 0.740, 0.225,
  0.033, 0,     0.150, 0.744
)
MatF_Lrup[[4]] <- make_MatF_6(0.076, 0.204)

# --- L. rupestris population 5 ---
MatU_Lrup[[5]] <- make_MatU_Lrup(
  0.596, 0.153, 0.018,
  0,     0.777,
  0.348, 0.011, 0.728, 0.286,
  0.000, 0,     0.082, 0.688
)
MatF_Lrup[[5]] <- make_MatF_6(0.041, 0.165)

# --- L. rupestris population 6 ---
MatU_Lrup[[6]] <- make_MatU_Lrup(
  0.528, 0.133, 0.020,
  0,     0.777,
  0.375, 0.011, 0.695, 0.218,
  0.040, 0,     0.140, 0.749
)
MatF_Lrup[[6]] <- make_MatF_6(0.066, 0.315)

# --- L. rupestris population 7 ---
MatU_Lrup[[7]] <- make_MatU_Lrup(
  0.635, 0.224, 0.058,
  0,     0.777,
  0.333, 0.011, 0.695, 0.299,
  0.018, 0,     0.070, 0.643
)
MatF_Lrup[[7]] <- make_MatF_6(0.119, 0.507)

# Stage frequencies for L. rupestris (Table 1, split equally across stage pairs)
# Seedlings / Juveniles / Adults from Table 1
N_Lrup_raw <- list(
  c(44, 72, 97),   # pop 1
  c(40, 135, 86),  # pop 2
  c(66, 74, 95),   # pop 3
  c(14, 8,  74),   # pop 4
  c(28, 6,  62),   # pop 5
  c(66, 33, 98),   # pop 6
  c(107, 39, 102)  # pop 7
)

D_Lrup <- lapply(N_Lrup_raw, function(n) {
  expanded <- c(n[1]/2, n[1]/2, n[2]/2, n[2]/2, n[3]/2, n[3]/2)
  expanded / sum(expanded)
})

# ============================================================
# L. rubripetala — 6 populations, 5-stage model
# ============================================================
# make_MatU_Lrub: 5-stage MatU for L. rubripetala and L. eltoroensis
# Fixed elements: row2 col1=0.667, col2=0.667; row4 col2=0.037, col4=0.812
# Free parameters:
#   s33 = stage3 -> stage3 (stasis)
#   s35 = stage5 -> stage3 (retrogression from repro to juvenile)
#   s53 = stage3 -> stage5 (growth to reproductive)
#   s54 = stage4 -> stage5 (growth from non-repro adult)
#   s55 = stage5 -> stage5 (stasis in reproductive stage)
make_MatU_Lrub <- function(s33, s35,
                             s53, s54, s55) {
  matrix(c(
    0,     0,     0,     0,     0,
    0.667, 0.667, 0,     0,     0,
    0,     0,     s33,   0,     s35,
    0,     0.037, 0,     0.812, 0,
    0,     0,     s53,   s54,   s55
  ), nrow = 5, byrow = TRUE,
  dimnames = list(paste0("stage_", 1:5), paste0("stage_", 1:5)))
}

make_MatF_5 <- function(f5) {
  F <- matrix(0, 5, 5)
  F[1, 5] <- f5
  F  # MatF contains fecundity only — no diagonal terms
}

MatU_Lrub <- list()
MatF_Lrub <- list()

MatU_Lrub[[1]] <- make_MatU_Lrub(0.825, 0.227, 0.137, 0.010, 0.739)
MatU_Lrub[[2]] <- make_MatU_Lrub(0.546, 0.183, 0.426, 0.010, 0.793)
MatU_Lrub[[3]] <- make_MatU_Lrub(0.606, 0.137, 0.337, 0.010, 0.848)
MatU_Lrub[[4]] <- make_MatU_Lrub(0.743, 0.180, 0.176, 0.010, 0.768)
MatU_Lrub[[5]] <- make_MatU_Lrub(0.726, 0.250, 0.237, 0.010, 0.739)
MatU_Lrub[[6]] <- make_MatU_Lrub(0.801, 0.179, 0.131, 0.010, 0.784)

MatF_Lrub[[1]] <- make_MatF_5(0.043)
MatF_Lrub[[2]] <- make_MatF_5(0.102)
MatF_Lrub[[3]] <- make_MatF_5(0.108)
MatF_Lrub[[4]] <- make_MatF_5(0.067)
MatF_Lrub[[5]] <- make_MatF_5(0.067)
MatF_Lrub[[6]] <- make_MatF_5(0.159)

# Stage frequencies (Table 1): seedlings / juveniles / adults (one repro stage)
# For 5-stage model: stages 1-2 = seedlings/2, stages 3-4 = juveniles/2,
#                    stage 5 = adults
N_Lrub_raw <- list(
  c(9,  44, 101),  # pop 1
  c(0,  0,  27),   # pop 2 — no seedlings/juveniles observed
  c(5,  29, 63),   # pop 3
  c(37, 49, 41),   # pop 4
  c(52, 4,  46),   # pop 5
  c(24, 10, 29)    # pop 6
)

# For populations with missing stage observations, derive D from the
# stable stage distribution (dominant eigenvector of A = MatU + MatF)
  # — only used for L. rubripetala pop 2 where no seedlings or juveniles
  # were observed, so observed D cannot be constructed.
# This is standard practice when observed counts are incomplete.
D_Lrub <- lapply(seq_along(N_Lrub_raw), function(i) {
  n <- N_Lrub_raw[[i]]
  if (all(n[1:2] == 0)) {
    # No seedlings/juveniles observed — use stable stage distribution
    A <- MatU_Lrub[[i]] + MatF_Lrub[[i]]
    ev <- Re(eigen(A)$vectors[, 1])
    ev <- abs(ev)
    ev / sum(ev)
  } else {
    expanded <- c(n[1]/2, n[1]/2, n[2]/2, n[2]/2, n[3])
    expanded / sum(expanded)
  }
})

# ============================================================
# L. eltoroensis — 4 populations, 5-stage model (same structure as L. rubripetala)
# ============================================================
MatU_Lelt <- list()
MatF_Lelt <- list()

MatU_Lelt[[1]] <- matrix(c(
    0,     0,     0,     0,     0,
    0.667, 0.667, 0,     0,     0,
    0,     0,     0.719, 0,     0.274,
    0,     0.037, 0,     0.958, 0,
    0,     0,     0.258, 0.021, 0.720
  ), nrow=5, byrow=TRUE, dimnames=list(paste0("stage_",1:5), paste0("stage_",1:5)))
MatU_Lelt[[2]] <- matrix(c(
    0,     0,     0,     0,     0,
    0.667, 0.667, 0,     0,     0,
    0,     0,     0.696, 0,     0.255,
    0,     0.037, 0,     0.958, 0,
    0,     0,     0.304, 0.021, 0.742
  ), nrow=5, byrow=TRUE, dimnames=list(paste0("stage_",1:5), paste0("stage_",1:5)))
MatU_Lelt[[3]] <- matrix(c(
    0,     0,     0,     0,     0,
    0.667, 0.667, 0,     0,     0,
    0,     0,     0.759, 0,     0.406,
    0,     0.037, 0,     0.958, 0,
    0,     0,     0.222, 0.021, 0.589
  ), nrow=5, byrow=TRUE, dimnames=list(paste0("stage_",1:5), paste0("stage_",1:5)))
# Population 4: substitute L. rubripetala seedling transitions (as in paper)
MatU_Lelt[[4]] <- matrix(c(
    0,     0,     0,     0,     0,
    0.667, 0.667, 0,     0,     0,
    0,     0,     0.719, 0,     0.274,
    0,     0.037, 0,     0.958, 0,
    0,     0,     0.258, 0.021, 0.720
  ), nrow=5, byrow=TRUE, dimnames=list(paste0("stage_",1:5), paste0("stage_",1:5)))

MatF_Lelt[[1]] <- make_MatF_5(0.011)
MatF_Lelt[[2]] <- make_MatF_5(0.016)
MatF_Lelt[[3]] <- make_MatF_5(0.024)
MatF_Lelt[[4]] <- make_MatF_5(0.011)  # use pop 1 fecundity as substitute

# Stage frequencies (Table 1): juveniles and adults only (no seedlings observed)
N_Lelt_raw <- list(
  c(7,  42),   # pop 1: juveniles, adults
  c(8,  51),   # pop 2
  c(7,  43),   # pop 3
  c(5,  14)    # pop 4
)

# L. eltoroensis: no seedlings observed in any population.
# Stages 1-2 assigned 1% of total N each (small but non-zero);
# stages 3-4 split juveniles equally; stage 5 = adults.
# This preserves the observed juvenile:adult ratio while keeping
# D biologically grounded rather than using the eigenvector,
# which distorts Ne/N when lambda departs from 1.
D_Lelt <- lapply(seq_along(N_Lelt_raw), function(i) {
  n    <- N_Lelt_raw[[i]]
  tot  <- sum(n)
  expanded <- c(0.01*tot, 0.01*tot,  # stages 1-2: small non-zero
                n[1]/2, n[1]/2,       # stages 3-4: juveniles split
                n[2])                 # stage 5: adults
  expanded / sum(expanded)
})
```

## Computing Ne/N for all 17 populations

```{r compute_all}
# census_N = 40: a large Lepanthes population (Tremblay & Ackerman 2001)
run_Ne <- function(MatU_list, MatF_list, D_list, species_name,
                   census_N = 40, Ne_target = 50) {
  purrr::map_dfr(seq_along(MatU_list), function(i) {
    r <- Ne_sexual_Y2000(
      T_mat      = MatU_list[[i]],
      F_vec      = MatF_list[[i]][1, ],
      D          = D_list[[i]],
      Ne_target  = Ne_target,
      census_N   = census_N,
      population = paste(species_name, "pop", i)
    )
    data.frame(
      Species      = species_name,
      Population   = i,
      NeN          = round(r$NeN,          3),
      Ne_at_census = round(r$Ne_at_census,  1),
      Min_N        = r$Min_N,
      L_months     = round(r$L,             1),
      V            = round(r$V,             4)
    )
  })
}

results <- bind_rows(
  run_Ne(MatU_Lrup, MatF_Lrup, D_Lrup, "L. rupestris"),
  run_Ne(MatU_Lrub, MatF_Lrub, D_Lrub, "L. rubripetala"),
  run_Ne(MatU_Lelt, MatF_Lelt, D_Lelt, "L. eltoroensis")
)

# Add italic species column for kable display
results_display <- results %>%
  mutate(Species = paste0("*", Species, "*"))

knitr::kable(results_display,
             col.names = c("Species", "Population", "Ne/N", "Ne at N=40",
                           "Min N (Ne≥50)", "L (months)", "V"),
             escape  = FALSE,
             caption = "Ne/N estimates (monthly matrices) for all 17 populations
                        of three Lepanthes species. Ne at N=40 = estimated Ne
                        for a census population of 40 individuals (a large
                        Lepanthes population). Min N = minimum census size
                        required to achieve Ne >= 50 (Franklin 1980).
                        V = total variance in gene frequency change per time
                        step (V = V_term1 + V_term2 of Yonezawa et al. 2000);
                        larger V means stronger genetic drift per generation.")
```

## Comparing to Orive (1993) estimates

Tremblay & Ackerman (2001) used Orive's (1993) coalescence model (Table 7 of the paper).
The table below computes the Ne/N range per species directly from the NeStage results
and compares it to Orive's estimates:

```{r orive_comparison}
species_summary <- results %>%
  group_by(Species) %>%
  summarise(
    N_pops   = n(),
    NeN_min  = round(min(NeN),      3),
    NeN_max  = round(max(NeN),      3),
    NeN_mean = round(mean(NeN),     3),
    L_mean   = round(mean(L_months),1),
    .groups  = "drop"
  ) %>%
  mutate(
    NeN_range_NeStage = paste0(NeN_min, "–", NeN_max),
    NeN_range_Orive   = c("0.227–0.360",
                           "0.339–0.445",
                           "0.453–0.473"),
    Species = paste0("*", Species, "*")
  ) %>%
  select(Species, N_pops, NeN_range_NeStage, NeN_range_Orive, NeN_mean, L_mean)

knitr::kable(species_summary,
             col.names = c("Species", "Populations",
                           "Ne/N range (NeStage)", "Ne/N range (Orive 1993)",
                           "Mean Ne/N", "Mean L (months)"),
             escape  = FALSE,
             caption = "Species-level summary of Ne/N estimates from NeStage
                        (Yonezawa 2000 variance Ne, monthly matrices) compared
                        to the coalescence Ne of Orive (1993) reported in
                        Tremblay & Ackerman (2001) Table 7.")
```

The two approaches measure subtly different quantities:

- **Orive (1993)** computes the **inbreeding effective size** based on
  coalescence time — the expected time for two gene copies to share a common
  ancestor. It captures the effects of both genetic drift and the specific
  mating structure of the population.
- **Yonezawa (2000)** computes the **variance effective size** based on the
  variance in gene frequency change per generation. It focuses on how unequal
  lifetime reproductive contributions reduce the effective contribution of
  census individuals to the gene pool.

Both measure genetic drift but from different angles. Agreement between the
two is reassuring; divergence indicates that mating structure (Orive) or
reproductive variance (Yonezawa) is driving the discrepancy.

---

# Summary and conservation implications

```{r conservation_plot}
# Summary plot: Ne/N by species and population
ggplot(results_display, aes(x = factor(Population), y = NeN,
                     color = Species, group = Species)) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0.10, linetype = "dashed",
             color = "grey40", linewidth = 0.7) +
  annotate("text", x = 0.7, y = 0.105,
           label = "Frankham (1995)\nmedian Ne/N = 0.10",
           hjust = 0, size = 3, color = "grey40") +
  scale_color_manual(
    values = c("#2166ac", "#1b7837", "#d6604d"),
    labels = c(
      expression(italic("L. eltoroensis")),
      expression(italic("L. rubripetala")),
      expression(italic("L. rupestris"))
    )
  ) +
  labs(
    title    = expression(paste(N[e]/N, " across populations of three ",
                                italic("Lepanthes"), " species")),
    subtitle = "Monthly transition matrices | Yonezawa (2000) variance Ne",
    x        = "Population",
    y        = expression(N[e]/N),
    color    = "Species",
    caption  = "Dashed line = empirical median across 102 wildlife species (Frankham 1995)."
  ) +
  theme_classic(base_size = 12) +
  theme(
    plot.title    = element_text(face = "bold"),
    legend.text   = element_text(face = "italic"),
    panel.grid.major.y = element_line(color = "grey92", linewidth = 0.4)
  )
```

```{r conservation_table}
# How many populations have Ne < 50 (Wright drift threshold)?
results_display %>%
  mutate(
    Drift_risk = ifelse(Ne_at_census < 50, "HIGH (Ne < 50)", "lower")
  ) %>%
  select(Species, Population, NeN, Ne_at_census, Drift_risk) %>%
  knitr::kable(
    col.names = c("Species", "Pop", "Ne/N", "Estimated Ne", "Drift risk"),
    escape    = FALSE,
    caption   = "Estimated Ne assuming a census size of N = 40, representative
                 of a large Lepanthes population (Tremblay & Ackerman 2001).
                 Ne < 50 indicates populations where genetic drift dominates
                 over natural selection (Wright 1931; Franklin 1980)."
  )
```

## Key conservation messages

1. **All three species have very small effective population sizes.** Even with
   the most optimistic Ne/N estimates, census populations of 50–100 individuals
   — typical for these species — yield Ne well below 50.

2. **The minimum census size for long-term conservation ($N_e \geq 5{,}000$)
   is far larger than any observed population.** This confirms the finding of
   Tremblay & Ackerman (2001) that these orchid populations face substantial genetic
   vulnerability.

3. **Ne/N varies substantially among populations and species.** *L. rupestris*
   shows greater among-population variation than the other two species,
   likely reflecting habitat heterogeneity along the Luquillo river system.

4. **Ne/N estimates from monthly vs annual matrices may differ.** When
   reporting Ne/N, always state the time step of your matrices. Monthly
   matrices are preferred for species with fine-scale demographic variation.

---

# Brief overview of the other two functions

## Ne_clonal_Y2000

Use this function when **all reproduction is clonal** ($d_i = 1$ for all
stages). This implements Equations 10–11 of Yonezawa et al. (2000):

$$N_e = \frac{N}{(1 - \bar{u^2}) \cdot L}, \quad
N_y = \frac{N}{1 - \bar{u^2}}$$

The clonal model is simpler than the general Equation 6 because sexual
reproductive variance does not enter. It is the appropriate model for
populations maintained entirely by vegetative reproduction (e.g., many
ferns, some grasses, clonal shrubs).

```{r clonal_example, eval = FALSE}
Ne_clonal <- Ne_clonal_Y2000(
  T_mat      = MatU,
  F_vec      = MatF[1, ],
  D          = D,
  show_Ny    = TRUE,       # also compute annual effective size
  population = "fully clonal population"
)
```

## Ne_mixed_Y2000

Use this function when **stages differ in their mix of sexual and clonal
reproduction** ($0 \leq d_i \leq 1$). This implements the full Equation 6
of Yonezawa et al. (2000), including the clonal reproductive variance term
$Avr(A\delta)$.

An important property: under Poisson defaults ($V_c/\bar{c} = V_k/\bar{k} = 1$,
$a = 0$), the clonal fraction $d_i$ has **no effect** on $N_e/N$. The clonal
fraction only influences $N_e$ when clonal output is more variable than
sexual output ($V_c/\bar{c} > V_k/\bar{k}$), which is biologically
realistic for many clonal plants.

```{r mixed_example, eval = FALSE}
Ne_mixed <- Ne_mixed_Y2000(
  T_mat      = MatU,
  F_vec      = MatF[1, ],
  D          = D,
  d          = c(0.0, 0.0, 0.5),  # stage 3 is 50% clonal
  Vc_over_c  = c(1, 1, 2),        # stage 3 has super-Poisson clonal variance
  population = "mixed orchid"
)
```

---

# Session information

```{r session_info}
sessionInfo()
```

---

# References

- Caswell H. (2001). *Matrix Population Models*, 2nd ed. Sinauer Associates.
- Frankham R. (1995). Effective population size/adult population size ratios in wildlife: a review. *Genetical Research* **66**: 95–107.
- Lande R. (1995). Mutation and conservation. *Conservation Biology* **9**: 782–791.
- Luer C.A. (1986). Icones Pleurothallidinarum I. *Monograph in Systematic Botany* **15**.
- Orive M.E. (1993). Effective population size in organisms with complex life histories. *Theoretical Population Biology* **44**: 316–340.
- Salguero-Gómez R. et al. (2015). The COMPADRE Plant Matrix Database. *Journal of Ecology* **103**: 202–218.
- Tremblay R.L. & Ackerman J.D. (2001). Gene flow and effective population size in *Lepanthes* (Orchidaceae): a case for genetic drift. *Biological Journal of the Linnean Society* **72**: 47–62.
- Wright S. (1931). Evolution in Mendelian populations. *Genetics* **16**: 97–159.
- Yonezawa K., Ishida T. & Iwata H. (2000). Formulation and estimation of the effective size of stage-structured populations. *Evolution* **54**: 2007–2013.
