---
title: "Getting started with marimekko"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with marimekko}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
```

## What is a marimekko plot?

A marimekko (or mosaic) plot is a two-dimensional visualization of a
contingency table. Each column represents a category of one variable, and
the segments within each column represent categories of a second variable:

- **Column widths** are proportional to the marginal counts of the
  x variable.
- **Segment heights** within each column are proportional to the
  conditional counts of the fill variable given x.
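
To make the two proportions concrete, here is a minimal base R sketch that
computes them for the built-in `Titanic` table. It does not use `marimekko`
at all, and the names `tab`, `col_widths`, and `seg_heights` are
illustrative only, not part of the package.

```{r proportions-sketch}
# Collapse the 4-way Titanic table to Class x Survived counts
tab <- margin.table(Titanic, c(1, 4))

# Column widths: marginal share of each Class
col_widths <- rowSums(tab) / sum(tab)

# Segment heights: share of each Survived level within a Class
seg_heights <- prop.table(tab, margin = 1)

round(col_widths, 3)
round(seg_heights, 3)
```

`col_widths` sums to 1 across classes, and each row of `seg_heights` sums
to 1 within its class.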

The `marimekko` package provides this as a native ggplot2 layer, so you
can combine it with any other ggplot2 functionality (facets, themes,
annotations, etc.).

## Installation

```{r, eval = FALSE}
# From CRAN
install.packages("marimekko")

# From GitHub (when published)
devtools::install_github("gogonzo/marimekko")
```

## Your first marimekko plot

The built-in `Titanic` dataset records survival counts by class, sex, and
age. Let's visualize survival by passenger class.

```{r basic}
library(ggplot2)
library(marimekko)

titanic <- as.data.frame(Titanic)

ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  labs(title = "Titanic survival by class")
```

Two components are at work:

1. **`geom_marimekko()`** computes tile positions from your data. The
   `formula` names the variables: in `~ Class | Survived`, `Class` sets the
   columns and `Survived` the segments. `fill` sets the segment colours,
   and `weight` provides the counts. Axis labels are added automatically.
2. Standard ggplot2 functions (`labs()`, `theme()`, etc.) work as usual.

## Aesthetics

`geom_marimekko()` understands these aesthetics and parameters:

| Parameter / Aesthetic | Required | Description |
|-----------------------|----------|-------------|
| `formula`             | yes      | Formula specifying variables, e.g. `~ X \| Y` |
| `fill`                | no       | Categorical variable for segment colours (defaults to last formula variable) |
| `weight`              | no       | Numeric weight/count (default 1) |

If your data already has one row per observation (no aggregation needed),
omit `weight`:

```{r unweighted}
ggplot(mtcars) +
  geom_marimekko(aes(fill = factor(gear)),
    formula = ~ cyl | gear
  )
```
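
The relationship between weighted and unweighted data can be checked in
base R. The sketch below assumes that `weight` simply sums counts per
cell, which is the usual meaning of a frequency weight; it does not
inspect the package internals, and `titanic_df`, `weighted`, `expanded`,
and `unweighted` are illustrative names.

```{r weight-sketch}
titanic_df <- as.data.frame(Titanic)

# Aggregated data: sum Freq within each Class x Survived cell
weighted <- xtabs(Freq ~ Class + Survived, data = titanic_df)

# Expand to one row per passenger by repeating each row Freq times
expanded <- titanic_df[rep(seq_len(nrow(titanic_df)), titanic_df$Freq), ]

# Unweighted data: count rows in each cell
unweighted <- table(Class = expanded$Class, Survived = expanded$Survived)

all(weighted == unweighted)
```

Both tables contain the same cell counts, so either data layout should
lead to the same mosaic.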

## Gap control

The `gap` parameter controls the spacing between tiles, expressed as a
fraction of the plot area. The default is `0.01`.

```{r gap}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived, gap = 0.03
  ) +
  labs(title = "Wider gaps (gap = 0.03)")
```

Set `gap = 0` for a seamless mosaic:

```{r no-gap}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived, gap = 0
  ) +
  labs(title = "No gaps")
```

## Marginal percentages

`geom_marimekko()` can append marginal percentages to the x-axis
labels via the `show_percentages` parameter:

```{r pct}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq),
    formula = ~ Class | Survived,
    show_percentages = TRUE
  )
```
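
If you want to check the percentages shown on the axis, the marginal
shares of `Class` can be computed directly in base R. This is only a
sketch: rounding to whole percentages is an assumption here, and the
package may round or format its labels differently.

```{r pct-sketch}
# Marginal counts of Class, summed over Sex, Age, and Survived
class_counts <- margin.table(Titanic, 1)

# Marginal percentages, rounded to whole numbers
class_pct <- round(100 * class_counts / sum(class_counts))
class_pct
```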

## Adding text labels

Use `geom_marimekko_text()` (or `geom_marimekko_label()` for a boxed
version) to place labels at tile centers. Tile positions are read
automatically from the preceding `geom_marimekko()` layer, so only
the `label` aesthetic is needed. Reference computed variables via
`after_stat()`:

- `weight` -- the aggregated count for the tile
- `cond_prop` / `.proportion` -- the conditional proportion within the parent
- `.residuals` -- Pearson residual
- Original variable columns (e.g. `Class`, `Survived`)

```{r text-labels}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  geom_marimekko_text(aes(label = after_stat(weight)), colour = "white") +
  labs(title = "Counts inside tiles")
```

For percentage labels, format `cond_prop` inside `after_stat()`:

```{r pct-labels}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  geom_marimekko_text(aes(
    label = after_stat(paste0(round(cond_prop * 100), "%"))
  ), colour = "white", size = 3)
```
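
For reference, the quantities behind the conditional proportion and the
Pearson residual can be reproduced in base R. The sketch below assumes the
textbook definition of a Pearson residual under independence,
`(observed - expected) / sqrt(expected)`; whether `.residuals` is computed
exactly this way is not confirmed here. The names `tab`, `cond_prop`,
`expected`, and `pearson` are illustrative.

```{r residuals-sketch}
# Class x Survived counts
tab <- margin.table(Titanic, c(1, 4))

# Conditional proportion of each Survived level within a Class
cond_prop <- prop.table(tab, margin = 1)

# Expected counts if Class and Survived were independent
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson residuals: positive means more observations than expected
pearson <- (tab - expected) / sqrt(expected)
round(pearson, 2)
```

The same residuals are returned by `chisq.test(tab)$residuals`.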

## Theming

`theme_marimekko()` provides a clean, minimal theme that removes
distracting x-axis gridlines:

```{r theme}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  theme_marimekko() +
  labs(title = "With theme_marimekko()")
```

Since it builds on `theme_minimal()`, you can override any element:

```{r theme-custom}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  theme_marimekko() +
  theme(legend.position = "bottom")
```

## Faceting

`geom_marimekko()` supports ggplot2 faceting. Each panel gets its own
independently proportioned mosaic:

```{r facet}
ggplot(titanic) +
  geom_marimekko(aes(fill = Survived, weight = Freq), formula = ~ Class | Survived) +
  facet_wrap(~Sex) +
  labs(title = "Survival by class, faceted by sex")
```
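
To see why the panels differ, compare the class shares within each sex.
This base R sketch is independent of the package, and the names
`class_by_sex` and `widths_by_sex` are illustrative.

```{r facet-sketch}
# Class x Sex counts, summed over Age and Survived
class_by_sex <- margin.table(Titanic, c(1, 2))

# Share of each Class within a Sex: one set of column widths per panel
widths_by_sex <- prop.table(class_by_sex, margin = 2)
round(widths_by_sex, 3)
```

Each column of `widths_by_sex` sums to 1. Crew members make up a much
larger share of men than of women, so the `Crew` column should be far
wider in the male panel.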

## Next steps

See `vignette("advanced-features")` for spine plots, Pearson residuals,
three-variable mosaics, and programmatic data extraction
with `fortify_marimekko()`.
