---
title: "How forrest works"
format: html
vignette: >
  %\VignetteIndexEntry{How forrest works}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
knitr:
  opts_chunk:
    collapse: true
    comment: "#>"
    fig.width: 7
    fig.height: 4
    out.width: "100%"
---

```{r setup}
#| include: false
library(forrest)
```

This vignette walks through the internals of `forrest()` step by step, using
concrete data at each stage so you can see exactly what gets built before
anything is drawn.

---

## Design principles

`forrest` follows three principles:

1. **One function, all use cases.** `forrest()` covers regression tables,
   meta-analyses, subgroup analyses, dose-response patterns, and multi-model
   comparisons through a uniform column-name-based interface.

2. **Data and structure are separate.** Users supply tidy data (one row = one
   estimate). Visual structure — section headers, indentation, spacers — is
   derived from grouping columns via `section` / `subsection`, not from
   manually inserted NA rows in the data.

3. **Base graphics with a single dependency.** All drawing uses base R
   `graphics` functions. The only external dependency is
   [tinyplot](https://github.com/grantmcdermott/tinyplot), used solely to
   initialise the plot region.

---

## Source files

| File | Purpose |
|------|---------|
| `R/forrest.R` | Exported `forrest()` — validation, section expansion, drawing pipeline |
| `R/save.R` | Exported `save_forrest()` — device dispatch for PDF/PNG/SVG/TIFF |
| `R/utils.R` | Internal helpers: `build_sections()`, `compute_dodge_groups()`, `group_colors()`, `group_shapes()`, `check_col()`, `%||%` |
| `R/draw.R` | Internal drawing helpers: `draw_diamond()`, `draw_text_panel()` |
| `R/theme.R` | Theme infrastructure: `.theme_defaults`, `.themes`, `resolve_theme()` |

---

## Starting data

We will use a small but representative data set throughout. Seven studies are
grouped into three geographic regions. No row is a pooled estimate, so `is_sum`
is `FALSE` throughout.

```{r starting-data}
meta <- data.frame(
  study  = c(
    "Chen (2016)", "Ibrahim (2022)",
    "Bauer (2015)", "Evans (2018)", "Garcia (2020)", "Jensen (2023)",
    "Fuentes (2019)"
  ),
  region = c(
    "Asia",   "Asia",
    "Europe", "Europe", "Europe", "Europe",
    "Latin America"
  ),
  or     = c(1.081, 1.092, 1.095, 1.057, 1.086, 1.070, 1.116),
  lower  = c(1.038, 1.052, 1.058, 1.019, 1.050, 1.036, 1.063),
  upper  = c(1.126, 1.134, 1.134, 1.096, 1.123, 1.105, 1.171),
  weight = c(2065,  1736,  816,   1041,  1479,  918,   567),
  is_sum = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  or_text = sprintf("%.2f (%.2f\u2013%.2f)",
                    c(1.081, 1.092, 1.095, 1.057, 1.086, 1.070, 1.116),
                    c(1.038, 1.052, 1.058, 1.019, 1.050, 1.036, 1.063),
                    c(1.126, 1.134, 1.134, 1.096, 1.123, 1.105, 1.171))
)
meta
```

Without any structural arguments, all seven rows are drawn as plain study rows:

```{r plain}
forrest(
  meta,
  estimate  = "or",
  lower     = "lower",
  upper     = "upper",
  label     = "study",
  weight    = "weight",
  log_scale = TRUE,
  ref_line  = 1,
  xlab      = "OR (95% CI)"
)
```

---

## Step 1b — Section expansion via build_sections()

`build_sections()` is the function that converts the tidy data into the
display-ready expanded frame. Calling it directly shows what `forrest()`
sees before drawing.

```{r build-sections-call}
# build_sections() is an internal function; access via :::
expanded <- forrest:::build_sections(
  df             = meta,
  estimate       = "or",
  lower          = "lower",
  upper          = "upper",
  label          = "study",
  is_summary     = "is_sum",
  weight         = "weight",
  section        = "region",
  subsection     = NULL,
  section_indent = TRUE,
  section_spacer = TRUE,
  cols           = "or_text",
  section_cols   = NULL
)
```

The result is a list with four elements. `$df` is the expanded data frame:

```{r expanded-df}
expanded$df[, c("study", "region", "or", "is_sum", "or_text")]
```

The three flag vectors identify which rows are structural:

```{r expanded-flags}
data.frame(
  study                = expanded$df$study,
  is_section_header    = expanded$is_section_header,
  is_subsection_header = expanded$is_subsection_header,
  is_spacer            = expanded$is_spacer
)
```

Key observations:

- Rows 1 (`"Asia"`), 5 (`"Europe"`), and 11 (`"Latin America"`) are section
  header rows: `is_section_header = TRUE`, `or = NA`.
- Data rows within each section are indented by two leading spaces.
- A blank spacer row (`study = ""`) follows each section.
- `or_text` is `""` for all structural rows.

Passing `section = "region"` to `forrest()` triggers this expansion
automatically:

```{r section-plot}
#| fig-height: 7
forrest(
  meta,
  estimate  = "or",
  lower     = "lower",
  upper     = "upper",
  label     = "study",
  section   = "region",
  weight    = "weight",
  log_scale = TRUE,
  ref_line  = 1,
  xlab      = "OR (95% CI)"
)
```
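
The expansion described above can be approximated in a few lines of base R.
This is a simplified sketch, not the code inside `build_sections()`: the helper
name `expand_sections_sketch()` and the toy data are made up for illustration,
and it handles only one grouping level and character labels.

```{r sketch-expand-sections}
# Hypothetical helper; it only mimics the idea behind build_sections().
expand_sections_sketch <- function(d, label, section,
                                   indent = TRUE, spacer = TRUE) {
  # One all-NA row with the same columns, used for headers and spacers
  blank <- d[NA_integer_, , drop = FALSE]
  rownames(blank) <- NULL

  pieces <- list()
  kind   <- character()
  for (s in unique(d[[section]])) {
    hdr <- blank
    hdr[[label]] <- s                      # header row carries the section name
    body <- d[d[[section]] == s, , drop = FALSE]
    if (indent) body[[label]] <- paste0("  ", body[[label]])
    pieces <- c(pieces, list(hdr, body))
    kind   <- c(kind, "header", rep("data", nrow(body)))
    if (spacer) {
      sp <- blank
      sp[[label]] <- ""                    # spacer row has an empty label
      pieces <- c(pieces, list(sp))
      kind   <- c(kind, "spacer")
    }
  }
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  list(df = out,
       is_section_header = kind == "header",
       is_spacer         = kind == "spacer")
}

toy_studies <- data.frame(
  study  = c("A", "B", "C"),
  region = c("North", "North", "South"),
  or     = c(1.10, 1.20, 0.90)
)
toy_res <- expand_sections_sketch(toy_studies, "study", "region")
toy_res$df
which(toy_res$is_section_header)
```

The input data stay tidy; header and spacer rows exist only in the expanded
copy, which is the separation of data and structure described under the design
principles.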

---

## Subsection expansion

With both `section` and `subsection`, `build_sections()` inserts two levels of
headers. Here each region contains studies of two designs, cohort and
case-control.

```{r subsection-data}
meta2 <- data.frame(
  region = c("Europe", "Europe", "Europe", "Europe", "Asia", "Asia"),
  design = c("Cohort", "Cohort", "Case-control", "Case-control",
             "Cohort", "Case-control"),
  study  = c("Bauer (2015)", "Evans (2018)",
             "Garcia (2020)", "Jensen (2023)",
             "Chen (2016)", "Ibrahim (2022)"),
  or     = c(1.095, 1.057, 1.086, 1.070, 1.081, 1.092),
  lower  = c(1.058, 1.019, 1.050, 1.036, 1.038, 1.052),
  upper  = c(1.134, 1.096, 1.123, 1.105, 1.126, 1.134)
)
```

```{r subsection-expanded}
exp2 <- forrest:::build_sections(
  df             = meta2,
  estimate       = "or",
  lower          = "lower",
  upper          = "upper",
  label          = "study",
  is_summary     = NULL,
  weight         = NULL,
  section        = "region",
  subsection     = "design",
  section_indent = TRUE,
  section_spacer = TRUE
)

data.frame(
  study                = exp2$df$study,
  is_section_header    = exp2$is_section_header,
  is_subsection_header = exp2$is_subsection_header,
  is_spacer            = exp2$is_spacer
)
```

```{r subsection-plot}
#| fig-height: 7
forrest(
  meta2,
  estimate   = "or",
  lower      = "lower",
  upper      = "upper",
  label      = "study",
  section    = "region",
  subsection = "design",
  log_scale  = TRUE,
  ref_line   = 1,
  xlab       = "OR (95% CI)"
)
```

---

## Step 3 — Row type classification

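After section expansion, `forrest()` classifies every row into one of four
types: structural rows (headers and spacers), summary rows, reference-category
rows, and ordinary study rows, which are the only rows drawn with a CI line.
`is_bold` is a separate styling flag for headers with a non-empty label. Using
the first expanded frame: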

```{r row-types}
df  <- expanded$df
est <- as.numeric(df$or)
is_sum    <- as.logical(df$is_sum)
is_struct <- expanded$is_section_header |
             expanded$is_subsection_header |
             expanded$is_spacer
is_ref    <- is.na(est) & !is_sum & !is_struct
is_bold   <- (expanded$is_section_header |
              expanded$is_subsection_header) &
             nchar(trimws(df$study)) > 0L

data.frame(
  study      = df$study,
  is_sum     = is_sum,
  is_struct  = is_struct,
  is_ref     = is_ref,
  is_bold    = is_bold,
  CI_drawn   = !is_sum & !is_struct & !is_ref & !is.na(est)
)
```

The `is_ref` column would be `TRUE` for a reference-category row, that is, a
user-supplied row whose estimate is `NA` and that is not structural. This data
set contains none.
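
To see the flags interact on a case that does include a reference row, the
sketch below collapses them into a single label per row. The toy vectors and
the type names (`"structural"`, `"summary"`, `"reference"`, `"study"`) are
chosen here for illustration; `forrest()` itself works with the logical flags
shown above.

```{r sketch-row-types}
# Toy input: header, reference row, two studies, pooled row, spacer
toy_est    <- c(NA,    NA,    1.21,  1.45,  1.30,  NA)
toy_sum    <- c(FALSE, FALSE, FALSE, FALSE, TRUE,  FALSE)
toy_struct <- c(TRUE,  FALSE, FALSE, FALSE, FALSE, TRUE)

# Structural wins first, then summary, then "NA estimate" means reference
toy_type <- ifelse(toy_struct, "structural",
            ifelse(toy_sum, "summary",
            ifelse(is.na(toy_est), "reference", "study")))

data.frame(est = toy_est, type = toy_type)
```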

---

## Step 8 — Dodge layout

`compute_dodge_groups()` assigns visual group IDs. Consecutive rows with the
same label form one group; structural rows are always singletons.

For a non-dodged layout, each row maps to one y slot:

```{r dodge-no-dodge}
lbl       <- as.character(expanded$df$study)
group_ids <- forrest:::compute_dodge_groups(lbl, is_struct)
n_vis     <- max(group_ids)
# y slot for each row (top = n_vis, bottom = 1)
row_y     <- (n_vis + 1L) - group_ids

data.frame(study = lbl, group_id = group_ids, y = row_y)
```
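
The grouping rule stated above (consecutive equal labels share a group,
structural rows stand alone) can be written as a cumulative sum over "a new
group starts here" flags. This is a sketch under that reading of the rule;
`dodge_groups_sketch()` is a hypothetical name and may differ from how
`compute_dodge_groups()` is implemented.

```{r sketch-dodge-groups}
dodge_groups_sketch <- function(label, is_struct) {
  n <- length(label)
  if (n == 0L) return(integer())
  # A row opens a new group if its label differs from the previous row,
  # or if either it or the previous row is structural.
  starts_new <- c(TRUE,
                  label[-1L] != label[-n] | is_struct[-1L] | is_struct[-n])
  cumsum(starts_new)
}

toy_lbl    <- c("Hdr", "A", "A", "B", "B", "")
toy_is_hdr <- c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
dodge_groups_sketch(toy_lbl, toy_is_hdr)
```

The two `"A"` rows share one ID and the two `"B"` rows share the next, while
the header and the spacer each get their own.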

For a dodged layout with two series per label, consecutive rows sharing a label
form one group and are spread around the group centre:

```{r dodge-example-data}
dodge_ex <- data.frame(
  label    = rep(c("Asia", "Europe"), each = 2),
  method   = rep(c("Cohort", "Case-control"), 2),
  or       = c(1.08, 1.05, 1.09, 1.07),
  lower    = c(1.04, 1.01, 1.05, 1.03),
  upper    = c(1.13, 1.09, 1.14, 1.11)
)
```

```{r dodge-example-groups}
lbl2 <- as.character(dodge_ex$label)
grp2 <- forrest:::compute_dodge_groups(lbl2, rep(FALSE, nrow(dodge_ex)))

dodge_amt <- 0.25
n_vis2    <- max(grp2)
grp_cy    <- (n_vis2 + 1L) - seq_len(n_vis2)

row_y2 <- numeric(nrow(dodge_ex))
for (g in seq_len(n_vis2)) {
  idx     <- which(grp2 == g)
  k       <- length(idx)
  offsets <- seq(-(k - 1L) / 2, (k - 1L) / 2, length.out = k) * dodge_amt
  row_y2[idx] <- grp_cy[g] + offsets
}

data.frame(
  label    = lbl2,
  method   = dodge_ex$method,
  group_id = grp2,
  y        = row_y2
)
```

The two "Asia" rows are offset symmetrically around y = 2 (the group centre),
and the two "Europe" rows around y = 1:

```{r dodge-plot}
forrest(
  dodge_ex,
  estimate  = "or",
  lower     = "lower",
  upper     = "upper",
  label     = "label",
  group     = "method",
  dodge     = TRUE,
  log_scale = TRUE,
  ref_line  = 1,
  xlab      = "OR (95% CI)"
)
```
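
The offset expression from the loop above generalises to any group size. The
snippet below evaluates it for groups of one, two, and three rows, assuming the
same dodge amount of 0.25; `toy_offsets()` is a throwaway helper defined only
for this illustration.

```{r sketch-dodge-offsets}
toy_offsets <- function(k, dodge_amt = 0.25) {
  # k evenly spaced positions centred on zero, scaled by the dodge amount
  seq(-(k - 1) / 2, (k - 1) / 2, length.out = k) * dodge_amt
}

toy_offsets(1)
toy_offsets(2)
toy_offsets(3)
```

A single row stays on the group centre, and offsets always sum to zero, so the
group as a whole remains centred on its y slot.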

---

## Colour assignment

`group_colors()` maps unique levels to the Okabe-Ito palette (skipping index 1,
which is black):

```{r group-colors}
forrest:::group_colors(c("Asia", "Europe", "Latin America"))
```

When `group` is supplied, each row's colour comes from this map:

```{r color-assignment}
grp     <- c("Asia", "Asia", "Europe", "Europe", "Latin America")
col_map <- forrest:::group_colors(grp)
col_vec <- unname(col_map[grp])
data.frame(grp, colour = col_vec)
```
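
A comparable map can be built directly from `grDevices::palette.colors()`
(available in R 4.0.0 and later). This is a sketch of the idea only: it assumes
the palette is taken from that function with index 1 dropped, and it does not
try to reproduce what `group_colors()` does when there are more levels than
colours.

```{r sketch-group-colours}
toy_oi     <- grDevices::palette.colors(palette = "Okabe-Ito")
toy_groups <- c("Asia", "Asia", "Europe")
toy_levels <- unique(toy_groups)

# Drop palette entry 1, then take one colour per level in order of appearance
toy_map <- setNames(unname(toy_oi[-1L][seq_along(toy_levels)]), toy_levels)
toy_map

# Look up each row's colour by its group
unname(toy_map[toy_groups])
```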

---

## Section-level text column annotations

`section_cols` lets specific `cols` columns show a section-level value in the
header row rather than `""`. The value comes from the first non-NA entry of the
named data column within each section.

```{r section-cols-data}
meta$k_text <- c("k = 2", "k = 2",
                 "k = 4", "k = 4", "k = 4", "k = 4",
                 "k = 1")

exp_sc <- forrest:::build_sections(
  df             = meta,
  estimate       = "or",
  lower          = "lower",
  upper          = "upper",
  label          = "study",
  is_summary     = "is_sum",
  weight         = "weight",
  section        = "region",
  section_cols   = c(k_text = "k_text"),
  cols           = c("or_text", "k_text"),
  section_spacer = FALSE,
  section_indent = FALSE
)

exp_sc$df[, c("study", "or_text", "k_text")]
```

Header rows have `""` in `or_text` (a row-level column) and the section value
in `k_text` (declared in `section_cols`). Data rows keep their original values.

```{r section-cols-plot}
#| fig-height: 7
#| fig-width: 10
forrest(
  meta,
  estimate     = "or",
  lower        = "lower",
  upper        = "upper",
  label        = "study",
  section      = "region",
  section_cols = c("k" = "k_text"),
  weight       = "weight",
  log_scale    = TRUE,
  ref_line     = 1,
  header       = "Study",
  cols         = c("OR (95% CI)" = "or_text", "k" = "k_text"),
  widths       = c(3.5, 3.5, 2.2, 1.0),
  xlab         = "OR (95% CI)"
)
```
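
The "first non-NA entry within each section" rule is easy to reproduce on its
own. The sketch below uses a toy data frame and a hypothetical helper,
`first_non_na()`; it illustrates the rule as described, not the internal code.

```{r sketch-section-cols}
toy_sc <- data.frame(
  region = c("Asia", "Asia", "Europe"),
  k_text = c(NA, "k = 2", "k = 1")
)

first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) > 0L) x[1L] else NA_character_
}

# One header value per section; note that split() orders groups alphabetically
toy_header_vals <- vapply(split(toy_sc$k_text, toy_sc$region),
                          first_non_na, character(1))
toy_header_vals
```

Because only the first non-NA value is used, repeating the same value on every
row of a section (as `k_text` does above) and supplying it once are equivalent.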

---

## Reference-category rows

A row whose estimate is `NA` and that was **not** inserted by
`build_sections()` is treated as a reference category. No point or CI is drawn,
its label is rendered in regular (non-bold) font, and `ref_label = TRUE`
appends `" (Ref.)"` to the label automatically.

```{r ref-category-data}
dose <- data.frame(
  quartile = c("Q1", "Q2", "Q3", "Q4"),
  or       = c(NA,   1.21, 1.45, 1.82),
  lower    = c(NA,   1.08, 1.28, 1.60),
  upper    = c(NA,   1.36, 1.65, 2.07)
)
dose
```

With `ref_label = TRUE`, the Q1 row's label gets `" (Ref.)"` appended and no
CI is drawn:

```{r ref-category-plot}
forrest(
  dose,
  estimate  = "or",
  lower     = "lower",
  upper     = "upper",
  label     = "quartile",
  ref_label = TRUE,
  log_scale = TRUE,
  ref_line  = 1,
  xlab      = "OR (95% CI)"
)
```
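
The label handling amounts to a conditional paste. A minimal sketch on toy
data, assuming no structural rows are present (in the real pipeline those are
excluded first):

```{r sketch-ref-label}
toy_dose <- data.frame(
  quartile = c("Q1", "Q2", "Q3"),
  or       = c(NA, 1.21, 1.45)
)

# Append the suffix only where the estimate is missing
toy_labels <- ifelse(is.na(toy_dose$or),
                     paste0(toy_dose$quartile, " (Ref.)"),
                     toy_dose$quartile)
toy_labels
```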

---

## Summary (diamond) rows

Rows with `is_summary = TRUE` are drawn as filled diamonds by `draw_diamond()`.
The diamond's left and right tips are at `lo[i]` and `hi[i]` (the CI bounds),
its horizontal centre is at `est[i]`, and its half-height is `0.38 * cex`.
The diamond is clipped to `xlim` if the CI extends beyond the axis.

```{r diamond-data}
with_pool <- rbind(
  meta[, c("study", "region", "or", "lower", "upper", "is_sum")],
  data.frame(
    study  = "Pooled", region = "Overall",
    or     = 1.082, lower = 1.058, upper = 1.107,
    is_sum = TRUE
  )
)
```

```{r diamond-plot}
#| fig-height: 8
forrest(
  with_pool,
  estimate   = "or",
  lower      = "lower",
  upper      = "upper",
  label      = "study",
  section    = "region",
  is_summary = "is_sum",
  log_scale  = TRUE,
  ref_line   = 1,
  xlab       = "OR (95% CI)"
)
```
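
The diamond geometry described above reduces to four vertices. The helper below
is a hypothetical sketch, not `draw_diamond()` itself: it returns coordinates
rather than drawing, and it approximates clipping by clamping the x values to
`xlim`, which is a simplification.

```{r sketch-diamond}
diamond_xy_sketch <- function(est, lo, hi, y, cex = 1,
                              xlim = c(-Inf, Inf)) {
  h <- 0.38 * cex                          # half-height from the text above
  x <- c(lo, est, hi, est)                 # left tip, top, right tip, bottom
  x <- pmin(pmax(x, xlim[1L]), xlim[2L])   # crude stand-in for clipping
  list(x = x, y = c(y, y + h, y, y - h))
}

toy_diamond <- diamond_xy_sketch(est = 1.082, lo = 1.058, hi = 1.107, y = 1)
toy_diamond
```

The resulting `x` and `y` vectors have the shape expected by
`graphics::polygon()`, so they could be passed to it on an open plot.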

---

## Theme system

`resolve_theme()` merges user overrides with `.theme_defaults`. All six theme
keys and their defaults:

```{r theme-defaults}
forrest:::.theme_defaults
```

Built-in themes are stored as partial override lists:

```{r theme-list}
forrest:::.themes
```

A custom theme overrides only the keys you supply:

```{r custom-theme}
#| fig-height: 3.5
dat <- data.frame(
  label    = c("A", "B", "C"),
  estimate = c(0.2, -0.1, 0.4),
  lower    = c(0.0, -0.3, 0.2),
  upper    = c(0.4,  0.1, 0.6)
)

forrest(
  dat,
  estimate = "estimate",
  lower    = "lower",
  upper    = "upper",
  label    = "label",
  theme    = list(ref_col = "#e63946", ref_lty = 1L,
                  grid_col = "#eeeeee", stripe_col = "#fafafa"),
  stripe   = TRUE,
  xlab     = "Coefficient (95% CI)"
)
```
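
Merging a partial list over defaults is a standard pattern in R, and
`utils::modifyList()` does most of the work. The sketch below shows the
pattern with made-up default values and a hypothetical `toy_resolve()`;
whether `resolve_theme()` also rejects unknown keys, as this sketch does, is an
assumption.

```{r sketch-theme-merge}
# Illustrative defaults only; see forrest:::.theme_defaults for the real ones
toy_defaults <- list(ref_col = "grey40", ref_lty = 2L, grid_col = "grey90")

toy_resolve <- function(overrides = list(), defaults = toy_defaults) {
  unknown <- setdiff(names(overrides), names(defaults))
  if (length(unknown) > 0L) {
    stop("Unknown theme key(s): ", paste(unknown, collapse = ", "))
  }
  # Keys present in overrides replace defaults; all others are kept
  utils::modifyList(defaults, overrides)
}

toy_theme <- toy_resolve(list(ref_col = "#e63946"))
toy_theme
```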