---
title: "Other Examples of Nested Logit Models"
author: "Michael Friendly"
date: "`r Sys.Date()`"
package: nestedLogit
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 1
    number_sections: true
fig_caption: yes
bibliography: ["references.bib", "packages.bib"]
csl: apa.csl
vignette: >
  %\VignetteIndexEntry{Other Examples of Nested Logit Models}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  message   = FALSE,
  warning   = FALSE,
  fig.align = "center",
  fig.height = 5,
  fig.width  = 7,
  fig.path   = "fig/",
  dev        = "png",
  comment    = "#>"
)

# packages to be cited in packages.bib
.to.cite <- c("nestedLogit", "nnet", "car", "ggplot2", "AER", "dplyr",
               "mlogit", "vcd", "MASS")
```

# Introduction

This document explores datasets beyond those already packaged with `nestedLogit`
(`Womenlf`, `GSS`, `gators`, `HealthInsurance`) that could profit from treatment
as nested logit models, either because:

- the response categories have a natural **hierarchical tree structure** and the
  independence-of-irrelevant-alternatives (IIA) assumption of plain multinomial
  logit is implausible across nests, or
  
- the response is **ordinal** and **continuation logits** (a special case of
  nested dichotomies) are a natural model, and the **proportional odds** model
  (which assumes equal slopes for the predictors) might not be likely to hold.

From a scan of suitable datasets in available packages, the best candidates identified for this vignette  are:

| Dataset | Package | Response levels | Structure | N |
|---------|---------|-----------------|-----------|---|
| `Chile` | carData | Abstain/No/Undec/Yes | two competing trees | 2700 |
| `TravelMode` | AER | air/train/bus/car | IIA violation | 210 |
| `Fishing` | mlogit | beach/pier/boat/charter | shore vs. vessel | 1182 |
| `Arthritis` | vcd | None/Some/Marked | ordinal continuation | 84 |
| `survey$Smoke` | MASS | Never/Occas/Regul/Heavy | ordinal continuation | 237 |
| `pneumo` | VGAM | normal/mild/severe | ordinal continuation | 8 |

Polytomous response data comes in different forms (e.g., wide vs. long)
so this vignette also provides an opportunity to illustrate transformations
among these to make the data suitable for analysis and for plotting.

## Load packages
Load the packages we'll use here:
```{r load-packages}
library(nestedLogit)
library(nnet)       # for: multinom()
library(car)        # for: Anova()
library(ggplot2)
```

---

# Chilean plebiscite voting intent (`carData::Chile`) {#chile}

## Data

The September, 1988 Chilean national plebiscite asked citizens whether the military government of
Augusto Pinochet should remain in power for another eight years. Six months before that,
the independent research center FLASCO/Chile conducted a national survey of 27000 randomly
selected voters. The main question concerned their intention to vote (`vote`) in the plebiscite, which was
recorded as:

- **Yes** (support Pinochet)
- **No** (oppose Pinochet)
- **Undec** (undecided)
- **Abstain** (abstain / not vote)

[The levels of `Chile$vote` are actually just the first letters: "Y", "N", "U", "A". For ease of interpretation, these
are recoded in a step below.]

<!-- DONE: Recoded vote factor with longer labels (Yes/No/Undec/Abstain) ordered by median statusquo (No < Abstain < Undec < Yes) in the chile-drop chunk. -->

`Chile` is in `carData`, which is already a Suggests: dependency of `nestedLogit`, so
no extra package is needed. One main predictor is a standardized scale of support for the
`statusquo`, where high values represent general support for the policies of the military regime.
Other predictors are: `sex`, `age`, `education` (a factor) and `income` of the respondents.

```{r chile-data}
data("Chile", package = "carData")
str(Chile)
```



The dataset contains missing values on a number of predictors, but there were 168 cases who did not answer the
voting intention question. These are dropped here, so that all models use the same cases.
```{r chile-drop}
colSums(is.na(Chile))

# Drop the 168 rows with missing vote so all models use the same cases
chile <- Chile[!is.na(Chile$vote), ]

# Recode vote with longer labels, ordered by median statusquo (No < Abstain < Undec < Yes)
chile$vote <- factor(chile$vote,
                     levels = c("N", "A", "U", "Y"),
                     labels = c("No", "Abstain", "Undec", "Yes"))
nrow(chile)
table(chile$vote)
```

The strong predictor `statusquo` is a scale measuring attitude toward the
political status quo (higher = more supportive of Pinochet's regime).
<!-- DONE: vote levels now ordered by median statusquo: No (-1.07) < Abstain (-0.26) < Undec (0.08) < Yes (1.18) -->


```{r chile-eda}
# statusquo distribution by vote
boxplot(statusquo ~ vote, data = chile,
        xlab = "Vote", ylab = "Attitude toward status quo",
        main  = "Chile 1988: status quo attitude by voting intention")
```

With this reordering of the response levels, a clear pattern emerges in the boxplot. But, also note how the shapes
of the distributions change, and the presence of outliers. But here, the goal is to explain voting intention using
the other predictors.

## Two competing tree structures

The four `vote` categories may seem to be unordered in the general case. However, we can think more productively about
the underlying process for a respondent that leads to their choice. 
A key theoretical question is the **ordering of the decision process**, which gives rise to
two different sets of nested comparisons.

### Engagement-first tree

The voter first decides whether to participate meaningfully (cast a Yes or No
vote) vs. disengage (Abstain or remain Undecided). Engagement is then governed
by demographic/civic variables; direction by political attitude.

```
            vote
           /    \
       engaged  disengaged
       /   \     /      \
     Yes    No Abstain  Undec
```

We specify this as three nested dichotomies:
```{r chile-dichots-eng}
dichots_eng <- logits(
  engage    = dichotomy(engaged    = c("Yes", "No"),
                        disengaged = c("Abstain", "Undec")),
  direction = dichotomy("Yes", "No"),
  disengage = dichotomy("Abstain", "Undec")
)

# print method
dichots_eng

# print as an ASCII tree
as.tree(dichots_eng, response = "vote")
```


### Direction-first tree

The voter first forms a political opinion (pro or anti status quo) and only
then decides whether to act on it. `statusquo` should dominate the direction
split; demographics should govern the two engagement splits.

```
          vote
         /    \
       pro    anti
      /   \   /      \
    Yes  Undec No  Abstain
```

These dichotomies are specified as follows:

```{r chile-dichots-dir}
dichots_dir <- logits(
  direction  = dichotomy(pro  = c("Yes", "Undec"),
                         anti = c("No", "Abstain")),
  engage_pro = dichotomy("Yes", "Undec"),
  engage_ant = dichotomy("No", "Abstain")
)
dichots_dir
```

## Fitting the models

For comparison, we can fit the standard multinomial logistic regression models (using `"N"` as the baseline category)
and the two nested dichotomy models. I use the main effects of the principal predictors (ignoring `region` and `population`)
in all three models.

```{r chile-fit}
mod.formula <- vote ~ statusquo + age + sex + education + income

# Multinomial logit baseline (reference = "No")
chile2 <- within(chile, vote <- relevel(vote, ref = "No"))
chile_multi <- multinom(mod.formula, data = chile2, trace = FALSE)

# Nested logit: engagement-first tree
chile_eng <- nestedLogit(mod.formula, dichotomies = dichots_eng, data = chile)

# Nested logit: direction-first tree
chile_dir <- nestedLogit(mod.formula, dichotomies = dichots_dir, data = chile)
```

Details of this analysis are provided by `summary()`, which provides tests of coefficients as well as overall measures of fit
(residual deviance) for each dichotomy as well as for the combined models comprising all four vote choices.
This output is suppressed here
```{r chile-summaries, eval=FALSE}
summary(chile_eng)
summary(chile_dir)
```

### Omnibus tests

`car::Anova()` gives $G^2$ tests for each term in the submodels for the dichotomies as well as tests for these terms in the combined model.
<!-- DONE: Side-by-side Anova display noted as a potential future package enhancement (e.g., a method that cbinds the Anova tables for two nested models). -->

```{r chile-anova}
Anova(chile_eng)
Anova(chile_dir)
```

## Which tree fits better?

Both nested models and the multinomial model have identical parameter counts (5 predictors × 3 sub-models
= 15 slopes plus 3 intercepts), so AIC is a fair comparison.

```{r chile-aic}
data.frame(
  model   = c("Multinomial", "Engagement tree", "Direction tree"),
  logLik  = c(logLik(chile_multi), logLik(chile_eng), logLik(chile_dir)),
  AIC     = c(AIC(chile_multi),    AIC(chile_eng),    AIC(chile_dir))
)
```

The direction-first tree should fit better (as it does) if `statusquo` is primarily a
predictor of political opinion (Y/N vs. U/A) rather than of engagement. We can
check this by inspecting the `statusquo` coefficients in each sub-model of the
engagement tree: they should be large in `direction` and small in `engage` and
`disengage`.

## Predicted probabilities

To visualize these models, it is useful to calculate the predicted probabilities of `vote` choice over a range of predictor values,
holding constant (averaging over) the predictor we don't want to display in a given plot
```{r chile-newdata}
new_chile <- data.frame(
  statusquo = seq(-1.5, 1.5, by = 0.25),
  age       = median(chile$age,    na.rm = TRUE),
  sex       = factor("F",  levels = levels(chile$sex)),
  education = factor("S",  levels = levels(chile$education)),
  income    = median(chile$income, na.rm = TRUE)
)
```

We can get the predicted probabilities for these separate models for the dichotomies using `predict.nestedLogit()`
and turning these into data frames for plotting with `ggplot2`.

```{r chile-predict}
pred_eng <- as.data.frame(predict(chile_eng, newdata = new_chile))
pred_dir <- as.data.frame(predict(chile_dir, newdata = new_chile))
```

<!-- DONE: Added geom_ribbon() using se.p for 95% pointwise confidence bands (clamped to [0, 1]). -->

```{r chile-plot, fig.height=5, fig.width=9}
# as.data.frame() returns long format: one row per (observation × response level)
# with predictor columns, plus 'response', 'p', 'se.p', 'logit', 'se.logit'
pred_long <- rbind(
  transform(pred_eng, model = "Engagement-first tree"),
  transform(pred_dir, model = "Direction-first tree")
)

ggplot(pred_long, aes(x = statusquo, y = p, colour = response, fill = response)) +
  geom_ribbon(aes(ymin = pmax(0, p - 1.96 * se.p),
                  ymax = pmin(1, p + 1.96 * se.p)),
              alpha = 0.15, colour = NA) +
  geom_line(linewidth = 1.2) +
  geom_point(color = "blacK") +
  facet_wrap(~ model) +
  labs(x      = "Attitude toward status quo",
       y      = "Predicted probability",
       colour = "Vote", fill = "Vote",
       title  = "Chile 1988: predicted vote probabilities by tree structure") +
  theme_bw()
```

This is substantively interesting, because the two different structures for the dichotomy comparisons
give quite different views of the predictions from the models.

### Plotting logits

Using the `plot.nestedLogit()` method, we can also show these on the logit scale, using the `scale = "logit"` argument.
The `others` argument allows us to condition on (average over) values of the variables not shown in this plot.
There are other niceties, not shown here, such as using direct labels on the curves (`label = TRUE`, `label.x`) rather than a legend.

<!-- DONE: plot() with scale="logit" now works as of commit b523bc7. -->


```{r chile-plot-logit, fig.height=5, fig.width=6}
# Logit-scale plot using the new scale= argument
plot(chile_eng, "statusquo",
     others = list(age       = median(chile$age, na.rm = TRUE),
                   sex       = "F",
                   education = "S",
                   income    = median(chile$income, na.rm = TRUE)),
     scale  = "logit",
     xlab   = "Attitude toward status quo",
     main   = "Engagement-first tree")
```

---

# Travel mode choice (`AER::TravelMode`) {#travel}

The dataset `AER::TravelMode` is a classic in econometrics, originally from @Greene:2003.
It relates to the mode of travel chosen by individuals for travel between
Sydney and Melbourne, Australia.

* Do they choose to go by "car", "air", "train", or "bus"?
* How do variables like cost, waiting time, household income influence that choice?

```{r travel-pkg, include=FALSE}
have_AER <- requireNamespace("AER",   quietly = TRUE)
have_dplyr <- requireNamespace("dplyr", quietly = TRUE)
```

This section requires the `AER` and `dplyr` packages.
<!-- DONE: Added AER to Suggests: in DESCRIPTION. -->

```{r travel-eval, eval=have_AER && have_dplyr}
library(AER)
library(dplyr)
data("TravelMode", package = "AER")
str(TravelMode)
```

## Data structure

`TravelMode` is in **long format**: 840 rows = 210 individuals × 4 travel
alternatives (air, train, bus, car). Variables such as `wait` and `gcost` are
*alternative-specific* — they differ across modes for the same individual.

`nestedLogit()` requires **one row per individual** with the chosen mode as the
response. Only **individual-specific** predictors (`income`, `size`) work here
directly; alternative-specific costs would require a different approach.

```{r travel-reshape, eval=have_AER && have_dplyr}
tm <- TravelMode |>
  filter(choice == "yes") |>
  select(mode, income, size) |>
  mutate(mode = relevel(factor(mode), ref = "car"))

table(tm$mode)
```

## Nesting structure

The canonical nested structure separates *private* (car) from *public* transit,
then within public separates *air* from *ground*, then *train* from *bus*.
This reflects the IIA problem: adding a near-identical alternative (e.g., a
second bus service) should not equally reduce the probability of choosing car.

```{r travel-dichots, eval=have_AER && have_dplyr}
travel_dichots <- logits(
  pvt_pub  = dichotomy("car", public = c("air", "train", "bus")),
  air_grnd = dichotomy("air", ground = c("train", "bus")),
  tr_bus   = dichotomy("train", "bus")
)
travel_dichots
```

## Fitting

```{r travel-fit, eval=have_AER && have_dplyr}
tm_multi  <- multinom(mode ~ income + size, data = tm, trace = FALSE)
tm_nested <- nestedLogit(mode ~ income + size,
                         dichotomies = travel_dichots,
                         data = tm)
summary(tm_nested)
Anova(tm_nested)
```

## Predicted probabilities

```{r travel-predict, eval=have_AER && have_dplyr}
new_tm <- data.frame(income = seq(10, 60, by = 10), size = 1)

# as.data.frame() returns long format; reshape to wide for comparison
pred_tm_nested_long <- as.data.frame(predict(tm_nested, newdata = new_tm))
nested_wide <- reshape(
  pred_tm_nested_long[, c("income", "size", "response", "p")],
  idvar     = c("income", "size"),
  timevar   = "response",
  direction = "wide"
)
names(nested_wide) <- sub("^p\\.", "", names(nested_wide))

pred_tm_multi <- as.data.frame(predict(tm_multi, newdata = new_tm,
                                       type = "probs"))

cat("Nested logit:\n")
cbind(income = new_tm$income, round(nested_wide[, levels(tm$mode)], 3))

cat("Multinomial logit:\n")
cbind(income = new_tm$income, round(pred_tm_multi[, levels(tm$mode)], 3))
```

## Model comparison

```{r travel-aic, eval=have_AER && have_dplyr}
data.frame(
  model  = c("Multinomial", "Nested"),
  logLik = c(logLik(tm_multi), logLik(tm_nested)),
  AIC    = c(AIC(tm_multi),    AIC(tm_nested))
)
```

---

# Fishing mode choice (`mlogit::Fishing`) {#fishing}

Another classic econometric dataset amenable to nested dichotomies treatment concerns the choice by
recreational anglers among different modes of fishing: from a beach or pier, or on a boat, that
could be private or a charter. 

<!-- DONE: Fishing section developed below. `nestedLogit()` can only use the individual-specific
predictor `income`; the choice-specific `price.*` / `catch.*` variables require a full conditional
logit / nested logit framework such as `mlogit`. This section shows the simple nested-dichotomy
fit and notes the limitation. -->

The dataset is from @HerrgesKling:1999; see also @CameronTrivedi:2005 for an econometric treatment
that also considers the cost of fishing by these modes and also the expected catch
associated with each. Models for such response-specific predictors are interesting,
but outside the scope of our nested logit models.


```{r fishing-pkg, include=FALSE}
have_mlogit <- requireNamespace("mlogit", quietly = TRUE)
```

This section requires the `mlogit` package for the dataset.

```{r fishing-load, eval=have_mlogit}
data("Fishing", package = "mlogit")
str(Fishing)
```

## Overview

1182 US recreational anglers each chose one of four fishing modes:
`beach`, `pier`, `boat`, `charter`. The natural nesting separates
*shore-based* from *vessel-based* alternatives:

```
       fishing
      /        \
   shore      vessel
  /     \    /      \
beach  pier boat  charter
```

Shore-based modes share unobserved attributes (no boat required, lower cost
and commitment) that vessel-based modes do not, so IIA is plausible within
but not across groups.

Unlike `AER::TravelMode`, `mlogit::Fishing` is **already in wide format** — one row per
individual — with `mode` indicating the chosen alternative and `income` as the only
individual-specific predictor. The `price.*` and `catch.*` variables are
*alternative-specific* (one column per mode) and cannot be used directly with
`nestedLogit()`, which expects predictors that apply to the individual, not to each
alternative. A full conditional-logit model (e.g., via `mlogit::mlogit()`) would be
needed to exploit those variables.

```{r fishing-dichots, eval=have_mlogit}
fishing_dichots <- logits(
  shore_vessel = dichotomy(shore  = c("beach", "pier"),
                           vessel = c("boat",  "charter")),
  shore_type   = dichotomy("beach", "pier"),
  vessel_type  = dichotomy("boat",  "charter")
)
fishing_dichots
as.tree(fishing_dichots, response = "mode")
```

Fit the model and show the summaries for the dichotomies:
```{r fishing-fit, eval=have_mlogit}
fish_nested <- nestedLogit(mode ~ income,
                           dichotomies = fishing_dichots,
                           data = Fishing)
summary(fish_nested)
Anova(fish_nested)
```

## Plots
Again, the `plot.nestedLogit()` gives an interpretable plot of the response probabilities:
```{r fishing-plot, eval=have_mlogit}
plot(fish_nested, x.var = "income",
     xlab  = "Monthly income (USD)",
     main  = "Fishing mode choice vs. income",
     label=TRUE, label.col="black",
     cex.lab = 1.2)
```

* Fishing from the beach is a low-probability choice, which doesn't vary much with income.
* Fishing from a pier is next overall, followed by charter fishing. Both of these decline with income.
* The tendency to fish from a private boat (yours or friend's) increases dramatically with income.

---

# Arthritis treatment outcome (`vcd::Arthritis`) {#arthritis}

```{r arthritis-pkg, include=FALSE}
have_vcd <- requireNamespace("vcd", quietly = TRUE)
```

<!-- The dataset `vcd::Arthritis` is most commonly analysed  -->

```{r arthritis-load, eval=have_vcd}
data("Arthritis", package = "vcd")
str(Arthritis)
table(Arthritis$Improved)
```

## Overview

84 patients in a double-blind clinical trial received either an active treatment
or a placebo (`Treatment`). They are classified by `Sex` and `Age`.
The outcome `Improved` is *ordinal*, with levels "None", "Some" or "Marked" improvement.

Simpler analyses of these data consider only the dichotomy ("Improved") between 
"None" and the other categories.

```
      Improved
     /        \
  None      Any improvement
            /             \
         Some           Marked
```

Continuation logits are natural here: dichotomy 1 asks whether the patient
improved at all; dichotomy 2 asks, given improvement, whether it was marked
rather than merely some. Treated patients should have higher probabilities at
both transitions.

```{r arthritis-dichots, eval=have_vcd}
arth_dichots <- continuationLogits(c("None", "Some", "Marked"))
arth_dichots
```

```{r arthritis-fit, eval=have_vcd}
arth_nested <- nestedLogit(Improved ~ Treatment + Sex + Age,
                           dichotomies = arth_dichots,
                           data = Arthritis)
summary(arth_nested)
Anova(arth_nested)
```

## Plotting
```{r arthritis-plot, eval=have_vcd}
plot(arth_nested, x.var = "Age",
     others = list(Treatment = "Treated", Sex = "Female"),
     xlab   = "Age",
     main   = "Arthritis: Treatment = Treated, Sex = Female")
```

---

# Smoking status (`MASS::survey`) {#smoking}

```{r survey-load}
data("survey", package = "MASS")
# Drop NAs on Smoke
sv <- survey[!is.na(survey$Smoke), ]
table(sv$Smoke)
```

## Overview

237 Australian university students. `Smoke` is ordinal:
`Never` < `Occas` < `Regul` < `Heavy`.

Continuation logits:
- Dichotomy 1: Never vs. ever smoked
- Dichotomy 2: Occasional vs. regular+
- Dichotomy 3: Regular vs. heavy

```{r smoke-dichots}
smoke_dichots <- continuationLogits(c("Never", "Occas", "Regul", "Heavy"))
smoke_dichots
```

```{r smoke-fit}
smoke_nested <- nestedLogit(Smoke ~ Sex + Age + Exer,
                            dichotomies = smoke_dichots,
                            data = sv)
summary(smoke_nested)
Anova(smoke_nested)
```

```{r smoke-plot}
plot(smoke_nested, x.var = "Age",
     others = list(Sex = "Female", Exer = "Some"),
     xlab   = "Age",
     main   = "Smoking: Female students, some exercise")
```

**Note**: With only 237 observations and modest predictors, results are
illustrative rather than definitive.

---

# Ordinal examples — brief sketches

## Pneumoconiosis severity (`VGAM::pneumo`)

```{r pneumo-pkg, include=FALSE}
have_VGAM <- requireNamespace("VGAM", quietly = TRUE)
```

```{r pneumo-show, eval=have_VGAM}
data("pneumo", package = "VGAM")
pneumo
```

8 grouped observations on coal miners. Response: `normal` / `mild` / `severe`.
Predictor: `duration` (years of dust exposure).  Very small data but the story
is clean — continuation logits on an ordinal severity scale vs. a single
continuous dose predictor.

```{r pneumo-fit, eval=have_VGAM}
# pneumo is aggregated; nestedLogit() needs individual-level data or
# frequency weights — requires expansion or a weighted interface.
# Placeholder: would use continuationLogits(c("normal", "mild", "severe"))
```

## Mental health status (Agresti)

Mental health status (`Well`, `Mild`, `Moderate`, `Impaired`) vs. SES and
life events — Table 9.1 in @Agresti:2002.
May be available via `vcdExtra::Mental` or must be entered from the book.
Classic illustration of continuation logits for an ordinal psychiatric outcome.

---

# Session info

```{r session-info}
sessionInfo()
```

```{r write-bib, echo=FALSE}
pkgs <- unique(c(.to.cite, .packages()))
knitr::write_bib(pkgs, file = here::here("vignettes", "packages.bib"))
```

## References