---
title: "Advanced features"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Advanced features}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
```

```{r setup}
library(ggplot2)
library(marimekko)

titanic <- as.data.frame(Titanic)
```

This vignette covers the advanced features of `marimekko` beyond the
basics shown in `vignette("getting-started")`.

## Basic marimekko plot

```{r basic}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived)
```

## Pearson residuals

Pearson residuals measure how much each cell deviates from the
independence assumption. Positive residuals indicate more observations
than expected; negative residuals indicate fewer.
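
For a two-way table, the Pearson residual of a cell is
`(observed - expected) / sqrt(expected)`, where the expected count under
independence is the row total times the column total divided by the
grand total. The base R sketch below works through that arithmetic for
`Class` by `Survived`. The variable names are illustrative, and it is an
assumption that `.residuals` matches this calculation exactly.

```{r residuals-by-hand}
# Observed counts for Class x Survived, summed over Sex and Age
observed <- xtabs(Freq ~ Class + Survived, data = as.data.frame(Titanic))

# Expected counts under independence:
# row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson residual for each cell
pearson <- (observed - expected) / sqrt(expected)
round(pearson, 2)
```

The same values are available from `chisq.test(observed)$residuals`,
which is a convenient cross-check.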

Residuals are automatically computed and exposed as the
`.residuals` computed variable, which you can map to an aesthetic via
`after_stat()`:

```{r residuals}
ggplot(titanic) +
  geom_marimekko(
    aes(
      fill = Survived, weight = Freq,
      alpha = after_stat(abs(.residuals))
    ),
    formula = ~ Class | Survived
  ) +
  scale_alpha_continuous(range = c(0.3, 1), guide = "none") +
  labs(title = "Residual shading: stronger opacity = larger deviation")
```

You can also print the residuals as text labels on each tile:

```{r residuals-colour}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  geom_marimekko_text(aes(
    label = after_stat(round(.residuals, 1))
  ), colour = "white", size = 3) +
  labs(title = "Pearson residuals as labels")
```

## Three-variable nested mosaic

`geom_marimekko()` supports multi-variable formulas. A three-variable
formula (`~ X | Y | Z`) partitions the plot in alternating directions
(horizontal, vertical, horizontal):

- First split: horizontal by `X` (column widths proportional to `X`)
- Second split: vertical by `Y` within each column
- Third split: horizontal by `Z` within each cell

```{r multi}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived | Sex
  ) +
  labs(title = "Nested mosaic: Class > Survived > Sex")
```

This produces a richer view than faceting because all three variables
share a single coordinate space, making relative proportions directly
comparable.
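
To make the alternating splits concrete, the base R sketch below
computes the proportions that would drive each split for
`~ Class | Survived | Sex`. It ignores gaps, uses illustrative variable
names, and assumes that each split is sized by the conditional
proportion within its parent; the package's internal layout code may
differ in detail.

```{r multi-by-hand}
counts <- xtabs(Freq ~ Class + Survived + Sex, data = as.data.frame(Titanic))

# First split (horizontal): column widths = share of each Class
widths <- prop.table(margin.table(counts, 1))

# Second split (vertical): heights within each column = P(Survived | Class)
heights <- prop.table(margin.table(counts, c(1, 2)), margin = 1)

# Third split (horizontal): widths within each cell = P(Sex | Class, Survived)
inner <- prop.table(counts, margin = c(1, 2))

round(widths, 3)
round(heights, 3)
```

Each set of proportions sums to 1 within its parent, which is why every
level of the hierarchy fills its parent tile completely.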

## Y-axis labels

By default `geom_marimekko()` labels both axes automatically. The
y-axis shows proportions from 0 to 1, while the x-axis displays
category labels at each column's midpoint.

## Data extraction with fortify

`fortify_marimekko()` returns computed tile positions as a plain data
frame without creating a plot. It accepts the same formula syntax as
`geom_marimekko()`:

```{r fortify}
tiles <- fortify_marimekko(titanic,
  formula = ~ Class | Survived, weight = Freq
)
head(tiles)
```

Multi-variable formulas work too:

```{r fortify-3var}
tiles_3 <- fortify_marimekko(titanic,
  formula = ~ Class | Survived | Sex, weight = Freq
)
head(tiles_3)
```

The returned columns are:

| Column | Description |
|--------|-------------|
| Formula variables | One column per formula variable (e.g. `Class`, `Survived`) |
| `fill` | The fill variable value |
| `xmin`, `xmax` | Horizontal extent of the tile |
| `ymin`, `ymax` | Vertical extent of the tile |
| `x`, `y` | Tile center coordinates |
| `weight` | Aggregated count |
| `.proportion` | Conditional proportion within the parent tile |
| `.marginal` | Proportion of the grand total |
| `.residuals` | Pearson residual |
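
The difference between `.proportion` and `.marginal` is easiest to see
by computing both by hand. The base R sketch below does this for
`~ Class | Survived`, assuming the "parent tile" of each cell is its
`Class` column. The data frame `cells` and its column names are
illustrative and are not the output of `fortify_marimekko()`.

```{r proportions-by-hand}
# One row per Class x Survived cell, with the aggregated count
cells <- aggregate(Freq ~ Class + Survived,
  data = as.data.frame(Titanic), FUN = sum
)

# Share of the grand total (what the table calls `.marginal`)
cells$marginal <- cells$Freq / sum(cells$Freq)

# Share within the parent Class column (what the table calls `.proportion`)
cells$proportion <- cells$Freq / ave(cells$Freq, cells$Class, FUN = sum)

cells
```

The marginal shares sum to 1 over the whole table, while the
conditional proportions sum to 1 within each `Class`.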

## Extending with custom ggplot2 layers

The companion layers `geom_marimekko_text()` and
`geom_marimekko_label()` automatically read tile positions from a
preceding `geom_marimekko()` layer. You only need to specify the
`label` aesthetic:

```{r companion-layers}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  geom_marimekko_text(aes(
    label = after_stat(paste(Class, Survived, weight, sep = "\n"))
  ), colour = "white", size = 2.5)
```

For more control, use `fortify_marimekko()` to pre-compute tiles and
pass them as `data` to any standard ggplot2 geom. This lets you
summarize, filter, or transform the tile data before plotting:

```{r fortify-custom}
tiles <- fortify_marimekko(titanic,
  formula = ~ Class | Survived, weight = Freq
)

# Highlight cells with significant residuals
tiles$significant <- abs(tiles$.residuals) > 2

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  geom_label(
    data = tiles[tiles$significant, ],
    aes(x = x, y = y, label = paste0("r=", round(.residuals, 1))),
    fill = "yellow", size = 3, fontface = "bold"
  ) +
  labs(title = "Significant deviations from independence (|r| > 2)")
```

Because `fortify_marimekko()` returns a plain data frame, you can use
any ggplot2 geom -- `geom_segment()`, `geom_curve()`, `geom_tile()`,
`ggrepel::geom_label_repel()`, etc.

## Extending with `StatMarimekkoTiles`

The exported `StatMarimekkoTiles` ggproto object lets you pair
marimekko tile positions with **any** geom. While the convenience
wrappers `geom_marimekko_text()` and `geom_marimekko_label()` cover
the most common case (text overlays), `StatMarimekkoTiles` gives you
full control by plugging directly into `ggplot2::layer()`.

### How it works

`StatMarimekkoTiles` does not compute tile positions itself — it reads
them from a preceding `geom_marimekko()` layer via an internal shared
environment. This means:

1. A `geom_marimekko()` layer **must** appear before any layer that
   uses `StatMarimekkoTiles`.
2. The stat returns one row per tile with columns `xmin`, `xmax`,
   `ymin`, `ymax`, `x`, `y` (centre), `weight`, `fill`,
   `.proportion`, `.residuals`, and `.tooltip`.
3. You can reference any of these columns in `aes()` via
   `after_stat()`.

### Example: bubble overlay

Map point size to `weight` to show tile counts as bubbles:

```{r stat-tiles-bubble}
ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived, alpha = 0.4
  ) +
  layer(
    stat = StatMarimekkoTiles,
    geom = GeomPoint,
    mapping = aes(size = after_stat(weight)),
    data = titanic,
    position = "identity",
    show.legend = FALSE,
    inherit.aes = FALSE,
    params = list(colour = "white", alpha = 0.7)
  ) +
  scale_size_area(max_size = 12) +
  labs(title = "Bubble overlay via StatMarimekkoTiles")
```

### Example: residual markers

Colour and size encode deviation from independence:

```{r stat-tiles-residuals}
ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  layer(
    stat = StatMarimekkoTiles,
    geom = GeomPoint,
    mapping = aes(
      size = after_stat(abs(.residuals)),
      colour = after_stat(ifelse(.residuals > 0, "over", "under"))
    ),
    data = titanic,
    position = "identity",
    show.legend = TRUE,
    inherit.aes = FALSE,
    params = list(alpha = 0.8)
  ) +
  scale_colour_manual(
    values = c(over = "tomato", under = "steelblue"),
    name = "Deviation"
  ) +
  scale_size_continuous(range = c(1, 8), name = "|Residual|") +
  labs(title = "Residual markers via StatMarimekkoTiles")
```

### Example: rectangle outlines

Use `GeomRect` to draw highlighted borders around specific tiles
(e.g. tiles with large residuals):

```{r stat-tiles-rect}
ggplot(titanic) +
  geom_marimekko(
    aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  layer(
    stat = StatMarimekkoTiles,
    geom = GeomRect,
    mapping = aes(
      linewidth = after_stat(ifelse(abs(.residuals) > 2, 1.5, 0))
    ),
    data = titanic,
    position = "identity",
    show.legend = FALSE,
    inherit.aes = FALSE,
    params = list(colour = "red", fill = NA)
  ) +
  labs(title = "Highlight tiles with |residual| > 2")
```

### `StatMarimekkoTiles` vs `fortify_marimekko()`

Both give access to the same computed tile data, but they serve
different purposes:

| | `StatMarimekkoTiles` | `fortify_marimekko()` |
|---|---|---|
| **When** | At render time (reactive) | Before plotting (static) |
| **Input** | Reads from a `geom_marimekko()` layer | Standalone function call |
| **Use case** | Adding companion layers on the same plot | Pre-processing, filtering, or using tile data outside ggplot2 |
| **Faceting** | Automatically panel-aware | Manual panel handling |

Use `StatMarimekkoTiles` when you want to add layers that stay in sync
with `geom_marimekko()` parameters. Use `fortify_marimekko()` when you
need to transform or subset the tile data before passing it to a geom.

## Combining layers

Because `marimekko` produces standard ggplot2 layers, you can freely
combine multiple features:

```{r combined}
ggplot(titanic) +
  geom_marimekko(
    aes(
      fill = Survived, weight = Freq,
      alpha = after_stat(abs(.residuals))
    ),
    formula = ~ Class | Survived,
    show_percentages = TRUE
  ) +
  geom_marimekko_text(aes(label = after_stat(weight)),
    colour = "white", size = 3.5
  ) +
  scale_alpha_continuous(range = c(0.4, 1), guide = "none") +
  theme_marimekko() +
  labs(
    title = "Full-featured mosaic plot",
    subtitle = "Residual shading + counts + marginal %"
  )
```

## Independent x/y gaps

By default, `gap` controls both horizontal (between columns) and
vertical (between segments) spacing. Use `gap_x` and `gap_y` to set
them independently:

```{r gap-xy}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived, gap_x = 0.04, gap_y = 0
  ) +
  labs(title = "Wide column gaps, no vertical gaps")
```

```{r gap-xy2}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived, gap_x = 0, gap_y = 0.03
  ) +
  labs(title = "No column gaps, visible vertical gaps")
```

## Colour palette

`marimekko` ships with a Marimekko-inspired colour palette. Apply it
with `theme_marimekko()` or with
`scale_fill_manual(palette = marimekko_pal)`:

```{r palette}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  ) +
  theme_marimekko() +
  labs(title = "Earthy Nordic palette")
```

By default, tile borders match the fill colour (borders blend in).
Set `colour` explicitly to restore visible borders:

```{r colour-override}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived, colour = "white"
  ) +
  theme_marimekko() +
  labs(title = "White borders with marimekko palette")
```

## Plotly interactivity

`marimekko` plots work with `plotly::ggplotly()` out of the box:

```{r plotly, eval = FALSE}
library(plotly)

p <- ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived
  )
ggplotly(p)
```

## In-aesthetic expressions

Unlike some mosaic packages, `marimekko` supports arbitrary R
expressions, both in formulas and inside `aes()`. The example below
uses `factor()` directly in the formula:

```{r in-aes}
# Expressions work in formulas
ggplot(mtcars) +
  geom_marimekko(formula = ~ factor(cyl) | factor(gear)) +
  labs(
    y = "Gears", fill = "Gears",
    title = "factor() inside formula works"
  )
```

## Namespace-qualified usage

`marimekko` works correctly when called with `::` notation
(e.g., `marimekko::geom_marimekko()`) without requiring
`library(marimekko)`. This makes it safe to use inside other packages
via `Imports` rather than `Depends`.
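
As a minimal sketch of that pattern, a helper in a downstream package
might look like the following. The function name `plot_survival()` is
hypothetical, and the column names assume a data frame shaped like
`as.data.frame(Titanic)`.

```{r namespace-qualified}
# Hypothetical helper: every call is namespace-qualified, so neither
# ggplot2 nor marimekko has to be attached with library()
plot_survival <- function(data) {
  ggplot2::ggplot(data) +
    marimekko::geom_marimekko(
      ggplot2::aes(fill = Survived, weight = Freq),
      formula = ~ Class | Survived
    )
}
```

Calling `plot_survival(as.data.frame(Titanic))` should then give the
same plot as the basic example at the top of this vignette.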

## Summary of parameters

| Parameter | Used in | Description |
|-----------|---------|-------------|
| `formula` | `geom_marimekko()`, `fortify_marimekko()` | Formula specifying variable hierarchy (`~ a \| b \| c`) |
| `gap` | `geom_marimekko()`, `fortify_marimekko()` | Spacing between tiles (fraction of plot area) |
| `gap_x` | `geom_marimekko()`, `fortify_marimekko()` | Horizontal gap (overrides `gap` for x) |
| `gap_y` | `geom_marimekko()`, `fortify_marimekko()` | Vertical gap (overrides `gap` for y) |
| `standardize` | `fortify_marimekko()` | Equal-width columns (spine plot) |
| `colour` | `geom_marimekko()` | Tile border colour. Default `NULL` (matches fill) |
| `show_percentages` | `geom_marimekko()` | Append marginal % to x-axis labels |
