---
title: "Getting Started with assemblykor"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with assemblykor}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)
```

## Overview

`assemblykor` provides seven built-in datasets from the Korean National Assembly
for teaching quantitative methods in political science:

- **`legislators`**: 947 MP records (20th-22nd assemblies)
- **`bills`**: 60,925 legislative bills
- **`wealth`**: 2,928 legislator-year asset declarations
- **`seminars`**: 5,962 legislator-year seminar activity records
- **`speeches`**: 15,843 committee speech records (22nd, Science & ICT)
- **`votes`**: 8,050 plenary vote tallies (20th-22nd)
- **`roll_calls`**: 383,739 member-level roll call votes (22nd)

```{r load}
library(assemblykor)
```

## 1. Exploring legislator data

```{r legislators}
data(legislators)
str(legislators)
```

### Gender composition by assembly

```{r gender-table}
gender_tab <- table(legislators$assembly, legislators$gender)
gender_tab
prop.table(gender_tab, margin = 1)
```

### Legislative productivity by seniority

```{r seniority-plot}
boxplot(n_bills_lead ~ seniority, data = legislators,
        xlab = "Terms served", ylab = "Bills proposed (as lead)",
        main = "Seniority and Legislative Productivity",
        col = "lightblue")
```

Senior legislators produce more bills, but with high variance.

## 2. Bill outcomes

```{r bills}
data(bills)

# Top 5 outcomes
outcome_counts <- sort(table(bills$result), decreasing = TRUE)
barplot(outcome_counts[1:5], las = 2, col = "steelblue",
        main = "Most Common Bill Outcomes")
```

Most bills expire at the end of the assembly term (임기만료폐기).
Only a small fraction pass in their original form (원안가결).

### Bills per month

```{r bills-timeline}
bills$month <- format(bills$propose_date, "%Y-%m")
monthly <- aggregate(bill_id ~ month, data = bills, FUN = length)
names(monthly) <- c("month", "count")
monthly <- monthly[order(monthly$month), ]

plot(seq_len(nrow(monthly)), monthly$count, type = "l",
     xlab = "Month (index)", ylab = "Bills proposed",
     main = "Monthly Bill Proposals (20th-22nd Assembly)")
```

## 3. Wealth panel

The `wealth` dataset is a legislator-year panel ideal for practicing
fixed-effects regression.

```{r wealth}
data(wealth)

# Distribution of net worth
hist(wealth$net_worth / 1e6, breaks = 50, col = "coral",
     main = "Legislator Net Worth Distribution",
     xlab = "Net Worth (billion KRW)")
```

### Real estate concentration

```{r re-share}
wealth$re_share <- ifelse(wealth$total_assets > 0,
                          wealth$real_estate / wealth$total_assets, NA)

boxplot(re_share ~ year, data = wealth,
        xlab = "Year", ylab = "Real estate / total assets",
        main = "Real Estate as Share of Legislator Wealth",
        col = "lightyellow")
```

Korean legislators hold a large share of their wealth in real estate,
reflecting broader patterns in Korean household wealth.

## 4. Policy seminars and cross-party cooperation

```{r seminars}
data(seminars)

# Governing vs opposition party
gov_means <- tapply(seminars$cross_party_ratio,
                    seminars$is_governing, mean, na.rm = TRUE)
barplot(gov_means, names.arg = c("Opposition", "Governing"),
        ylab = "Cross-party ratio", col = c("dodgerblue", "tomato"),
        main = "Cross-Party Seminar Collaboration")
```

Governing-party legislators tend to have lower cross-party collaboration
in policy seminars, a pattern consistent with the "closing ranks" hypothesis.

## 5. Joining datasets

All datasets share the `member_id` and/or `assembly` columns:

```{r join, message=FALSE}
library(dplyr)

# Merge legislators with wealth
leg_wealth <- legislators %>%
  inner_join(wealth, by = "member_id", relationship = "many-to-many")

# Productivity vs wealth
leg_wealth %>%
  group_by(district_type) %>%
  summarise(
    n = n(),
    median_net_worth = median(net_worth / 1e6, na.rm = TRUE),
    median_bills = median(n_bills_lead, na.rm = TRUE)
  )
```

## 6. Plenary votes

```{r votes}
data(votes)

# Yes-vote share distribution
votes$yes_rate <- votes$yes / votes$voted
hist(votes$yes_rate, breaks = 40, col = "lightgreen",
     main = "Distribution of Yes-Vote Share",
     xlab = "Proportion yes")
```

Most bills pass with near-unanimous support. The left tail reveals
contested legislation where party discipline breaks down.

## 7. Roll call analysis

```{r roll-calls, message = FALSE}
data(roll_calls)
library(dplyr)

# Party discipline: how often do members vote with their party majority?
party_votes <- roll_calls %>%
  group_by(bill_id, party) %>%
  mutate(party_majority = names(which.max(table(vote)))) %>%
  ungroup() %>%
  mutate(with_party = vote == party_majority)

party_votes %>%
  group_by(party) %>%
  summarise(
    n_members = n_distinct(member_id),
    discipline = mean(with_party, na.rm = TRUE)
  ) %>%
  filter(n_members >= 5) %>%
  arrange(desc(discipline))
```

## 8. Speech patterns

```{r speeches}
data(speeches)

# Who speaks most in committee?
leg_speeches <- speeches[speeches$role == "legislator", ]
speaker_counts <- sort(table(leg_speeches$speaker_name), decreasing = TRUE)
barplot(speaker_counts[1:10], las = 2, col = "plum",
        main = "Top 10 Most Active Speakers (Sci & ICT Committee)")
```

## Next steps

For text analysis, download the bill propose-reason texts:

```r
texts <- get_bill_texts()
```

For network analysis, download the full co-sponsorship records:

```r
proposers <- get_proposers()
```

See `vignette("codebook")` for the full data dictionary, or
`?get_bill_texts` and `?get_proposers` for download function details.
