---
title: "widr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with widr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE)
```

**widr** provides direct API access to the [World Inequality Database](https://wid.world) (WID) from R. It offers validated variable codes, structured downloads as standard data frames, and helpers for currency conversion, inequality measurement, and plotting. widr is an independent implementation, unaffiliated with the World Inequality Lab (WIL) or the Paris School of Economics. The data are sourced from WID and maintained by WIL.

## Installation

```{r install}
install.packages("widr")

# Development version
remotes::install_github("cherylisabella/widr")
```

## Variable codes

WID variables follow a four-part grammar:

```
<type:1> <concept:5-6> [<age:3>] [<pop:1>]
```

| Component | Width | Example | Meaning |
|-----------|-------|---------|---------|
| `type` | 1 letter | `s` | share |
| `concept` | 5-6 letters | `ptinc` | pre-tax national income |
| `age` | 3 digits | `992` | adults 20+ |
| `pop` | 1 letter | `j` | equal-split between spouses |

`sptinc992j` denotes the **share** of **pre-tax national income** for **equal-split adults aged 20+**. 
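
To make the grammar concrete, the following base-R sketch splits a code into its four components with a regular expression. The helper name `parse_wid_code()` is hypothetical and not part of widr; use `wid_decode()` for real work. The sketch checks only the shape of the code, not whether each component exists in the WID dictionary.

```{r sketch-parse-code}
# Hypothetical helper for illustration only; not part of widr.
parse_wid_code <- function(code) {
  # type: 1 letter, concept: 5-6 letters, age: 3 digits (optional), pop: 1 letter (optional)
  pattern <- "^([a-z])([a-z]{5,6})([0-9]{3})?([a-z])?$"
  m <- regmatches(code, regexec(pattern, code))[[1]]
  if (length(m) == 0) stop("not a well-formed WID code: ", code)
  list(type = m[2], concept = m[3], age = m[4], pop = m[5])
}

parse_wid_code("sptinc992j")
```

A pattern like this cannot tell a six-letter concept from a five-letter concept followed by a population letter when the age code is missing, which is one reason to prefer lookups against the bundled reference tables.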

The full catalogue is available in the [WID codes dictionary](https://wid.world/codes-dictionary/); widr bundles it as six searchable reference tables.

```{r codes}
wid_search("national income")                           # keyword search across concepts
wid_decode("sptinc992j")                                # parse into components
wid_encode("s", "ptinc", age = "992", pop = "j")       # build from components
wid_is_valid(series_type = "s", concept = "ptinc")      # non-throwing validation
```

The six reference tables (`wid_series_types`, `wid_concepts`, `wid_ages`, `wid_pop_types`, `wid_percentiles`, `wid_countries`) are lazy-loaded and compiled from the codes dictionary by an independent script.

## Downloading data

`download_wid()` returns a `wid_df`, a classed `data.frame` fully compatible with dplyr, ggplot2, and base R. Supply at least one of `indicators` or `areas`; the other filters default to `"all"`, except `ages` (default `"992"`) and `pop` (default `"j"`).

```{r download}
library(widr)

# Top 1% pre-tax income share, United States, 2000-2022
top1 <- download_wid(
  indicators = "sptinc992j",
  areas      = "US",
  perc       = "p99p100",
  years      = 2000:2022
)

top1
#> <wid_df>  23 rows | 1 countries | 1 variables
#>   country   variable percentile year  value age pop
#> 1      US sptinc992j  p99p100   2000  0.168 992   j
#> ...
```

Data is retrieved from the WID webservice at `https://rfap9nitz6.execute-api.eu-west-1.amazonaws.com/prod`.

### Multiple countries and percentiles

```{r multi}
shares <- download_wid(
  indicators = "sptinc992j",
  areas      = c("US", "FR", "DE", "CN"),
  perc       = c("p90p100", "p99p100"),
  years      = 1980:2022
)
```

### Excluding interpolated points

Many series are linearly interpolated between survey years. Pass `include_extrapolations = FALSE` to retain only directly observed data points:

```{r extrap}
download_wid("sptinc992j", areas = "MZ", include_extrapolations = FALSE)
```
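
As a rough illustration of what linear interpolation between survey years means, the sketch below fills the gap between two observations with base R's `approx()` and then drops the filled points again. The numbers are toy values made up for this example, and the `interpolated` flag column is an assumption of the sketch; it does not describe how WID or widr mark such points.

```{r sketch-interpolation}
# Toy values, made up for illustration; not WID data.
observed <- data.frame(year = c(2000, 2005), value = c(0.10, 0.15))

# Fill every year from 2000 to 2005 by linear interpolation.
filled <- as.data.frame(approx(observed$year, observed$value, xout = 2000:2005))
names(filled) <- c("year", "value")

# Flag the points that were not directly observed (hypothetical column).
filled$interpolated <- !filled$year %in% observed$year

# Keeping only observed points recovers the two original rows.
filled[!filled$interpolated, ]
```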

### Source metadata

`metadata = TRUE` attaches source and methodological documentation as an attribute; the shape of the data frame is unchanged:

```{r meta}
result <- download_wid("sptinc992j", areas = "US", metadata = TRUE)
attr(result, "wid_meta")
#>     variable country      source method quality    imputation
#> 1 sptinc992j      US Tax records    DFL    high adjusted surveys
```
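
The base-R sketch below shows why attaching an attribute leaves the shape untouched: attributes sit alongside the data rather than inside it. It uses a toy data frame with made-up values, not a real `wid_df`, and the contents of the attribute are placeholders.

```{r sketch-meta-attr}
# Toy data frame with made-up values; not WID data.
df <- data.frame(year = 2000:2002, value = c(0.1, 0.2, 0.3))
shape_before <- dim(df)

# Attach placeholder metadata as an attribute.
attr(df, "wid_meta") <- data.frame(variable = "sptinc992j", country = "US")

identical(dim(df), shape_before)  # rows and columns are unchanged
names(df)                         # no new columns were added
```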

### Key parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `indicators` | `"all"` | Variable codes |
| `areas` | `"all"` | ISO-2 country / region codes |
| `years` | `"all"` | Integer vector or `"all"` |
| `perc` | `"all"` | Percentile codes, e.g. `"p99p100"` |
| `ages` | `"992"` | Three-digit age code |
| `pop` | `"j"` | Population unit |
| `metadata` | `FALSE` | Attach source info as `attr(., "wid_meta")` |
| `include_extrapolations` | `TRUE` | Include interpolated points |
| `cache` | `TRUE` | Cache responses to disk |
| `verbose` | `FALSE` | Print progress messages |

## Tidyverse integration

`wid_df` is a plain `data.frame` subclass; dplyr verbs and ggplot2 work without any unwrapping:

```{r tidy-pipe}
library(dplyr)
library(ggplot2)

top1 |>
  wid_tidy(country_names = FALSE) |>
  filter(year >= 1990) |>
  ggplot(aes(year, value)) +
  geom_line(colour = "#58a6ff", linewidth = 0.9) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Top 1% pre-tax income share - United States",
       x = NULL, y = NULL) +
  theme_minimal()
```

`wid_tidy()` coerces `year` to integer and `value` to double, and optionally appends `indicator`, `series_type`, `type_label`, and `country_name` columns.
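
The sketch below illustrates, in base R only, what a `data.frame` subclass and a type coercion look like. The class name `toy_wid_df` and the values are invented for this example; widr's own constructor and the internals of `wid_tidy()` are not shown in this vignette and may differ.

```{r sketch-s3-subclass}
# Hypothetical class and made-up values, for illustration only.
toy <- structure(
  data.frame(country = c("US", "FR"), year = c("2000", "2000"),
             value = c(0.17, 0.11)),
  class = c("toy_wid_df", "data.frame")
)

inherits(toy, "data.frame")       # data.frame methods still apply

toy$year <- as.integer(toy$year)  # coerce year from character to integer
toy[toy$value > 0.15, ]           # ordinary data.frame subsetting works
```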

## Reusable query objects

`wid_query()` builds a query; `wid_filter()` updates it; `wid_fetch()` executes it. This is useful when iterating over parameter combinations or embedding queries in analysis pipelines:

```{r query}
q <- wid_query(indicators = "sptinc992j", areas = c("US", "FR"), cache = FALSE)
q <- wid_filter(q, years = 2010:2022)
wid_fetch(q)
```
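
The build, update, execute pattern can be sketched with plain lists. The helpers `new_query()`, `update_query()`, and `run_query()` are hypothetical names for this illustration, not widr functions, and the `fetch` argument stands in for whatever performs the download.

```{r sketch-query-object}
# Hypothetical helpers for illustration only; not part of widr.
new_query <- function(...) structure(list(...), class = "toy_query")

update_query <- function(q, ...) {
  # modifyList() overrides existing fields and adds new ones
  structure(modifyList(unclass(q), list(...)), class = "toy_query")
}

run_query <- function(q, fetch) do.call(fetch, unclass(q))

q0 <- new_query(indicators = "sptinc992j", areas = c("US", "FR"))
q1 <- update_query(q0, years = 2010:2022)   # q0 itself is left untouched

# A stand-in fetch function that just reports which parameters it received.
run_query(q1, fetch = function(...) names(list(...)))
```

Because each update returns a new object, one base query can be reused for several variations without side effects.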

## Caching

All responses are cached to disk by default, keyed to the exact query parameters and persisting across sessions:

```{r cache}
wid_cache_list()    # list cached queries
wid_cache_clear()   # remove all
```
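
A minimal sketch of parameter-keyed caching, assuming the key is derived from the sorted parameter names and values. The helpers `cache_key()` and `cached_fetch()` are hypothetical, the cache lives in a temporary directory (so, unlike widr's cache, it does not persist across sessions), and widr's actual key format and storage location are not documented here.

```{r sketch-cache}
# Hypothetical helpers for illustration only; not part of widr.
cache_dir <- file.path(tempdir(), "toy-wid-cache")
dir.create(cache_dir, showWarnings = FALSE)

cache_key <- function(params) {
  params <- params[order(names(params))]  # make the key order-independent
  values <- vapply(params, function(v) paste(v, collapse = "+"), character(1))
  key <- paste(names(params), values, sep = "=", collapse = "&")
  gsub("[^A-Za-z0-9=&+_-]", "_", key)     # keep the key safe as a file name
}

cached_fetch <- function(params, fetch) {
  path <- file.path(cache_dir, paste0(cache_key(params), ".rds"))
  if (file.exists(path)) return(readRDS(path))  # cache hit
  result <- fetch(params)                       # cache miss: do the work
  saveRDS(result, path)
  result
}
```

Calling `cached_fetch()` twice with the same parameters, in any order, invokes `fetch` only once.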

## Currency conversion

Monetary series (types `a`, `m`, `t`) are in local currency at the prior year's prices. `wid_convert()` fetches the appropriate WID exchange-rate series and divides in one step. Dimensionless series (types `s`, `g`, etc.) pass through unchanged with a message.

```{r convert}
# Bottom 50% average income, four countries - convert to 2022 USD PPP
download_wid("aptinc992j", areas = c("US", "FR", "CN", "IN"), perc = "p0p50") |>
  wid_convert(target = "ppp", base_year = "2022")
```

Supported targets: `"lcu"` (no conversion), `"usd"`, `"eur"`, `"gbp"`, `"ppp"`, `"yppp"`.
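
The arithmetic behind a conversion is a join followed by a division. The sketch below uses made-up incomes and exchange rates (not WID data), and the column names `value_lcu` and `lcu_per_usd` are assumptions of this example rather than widr's output format.

```{r sketch-convert}
# Toy numbers, made up for illustration; not WID data.
incomes <- data.frame(country = c("FR", "IN"), value_lcu = c(20000, 150000))
rates   <- data.frame(country = c("FR", "IN"), lcu_per_usd = c(0.8, 20))

# Match each income to its country's rate, then divide.
converted <- merge(incomes, rates, by = "country")
converted$value_usd <- converted$value_lcu / converted$lcu_per_usd
converted
```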

## Inequality measures

These operate on data already in memory; no additional API calls are needed.

### Gini coefficient

Requires a share (`s`) series with contiguous `pXpY` codes covering the full distribution:

```{r gini}
dist <- download_wid("sptinc992j", areas = c("US", "FR"), perc = "all",
                     years = 1990:2022)
wid_gini(dist)
#>   country year  gini
#> 1      FR 1990 0.411
#> 2      US 1990 0.453
```
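
One common way to obtain a Gini coefficient from bracket shares is to build the Lorenz curve and integrate it with the trapezoid rule. The sketch below does this in base R; `gini_from_shares()` is a hypothetical helper, the shares are toy values, and this is not necessarily the method `wid_gini()` uses. Because the trapezoid rule treats income as evenly spread within each bracket, the result is a lower bound that tightens as brackets get finer.

```{r sketch-gini}
# Hypothetical helper for illustration only; not part of widr.
gini_from_shares <- function(perc, share) {
  bounds <- regmatches(perc, regexec("^p([0-9.]+)p([0-9.]+)$", perc))
  lower <- vapply(bounds, function(b) as.numeric(b[2]), numeric(1)) / 100
  upper <- vapply(bounds, function(b) as.numeric(b[3]), numeric(1)) / 100

  ord <- order(lower)
  lower <- lower[ord]; upper <- upper[ord]; share <- share[ord]

  # Brackets must be contiguous and cover the full distribution.
  n <- length(upper)
  stopifnot(lower[1] == 0, upper[n] == 1, all(lower[-1] == upper[-n]))

  p <- c(0, upper)          # cumulative population share
  L <- c(0, cumsum(share))  # cumulative income share (Lorenz curve)
  area <- sum(diff(p) * (L[-length(L)] + L[-1]) / 2)
  1 - 2 * area
}

# Toy shares: bottom 50%, middle 40%, top 10%.
gini_from_shares(c("p0p50", "p50p90", "p90p100"), c(0.2, 0.4, 0.4))  # 0.42
```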

### Top fractile share

```{r top-share}
wid_top_share(dist, top = 0.01)   # top 1%
wid_top_share(dist, top = 0.10)   # top 10%
```
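
When the bracket boundaries line up with the requested cutoff, a top share is simply the sum of the bracket shares above it. The sketch below assumes exactly that; `top_share_from_brackets()` is a hypothetical helper with toy values, and `wid_top_share()` may handle cutoffs that fall inside a bracket differently.

```{r sketch-top-share}
# Hypothetical helper for illustration only; not part of widr.
top_share_from_brackets <- function(perc, share, top) {
  lower  <- as.numeric(sub("^p([0-9.]+)p[0-9.]+$", "\\1", perc))
  cutoff <- 100 * (1 - top)
  # Small tolerance guards against floating-point noise in the cutoff.
  sum(share[lower >= cutoff - 1e-9])
}

perc  <- c("p0p50", "p50p90", "p90p99", "p99p100")
share <- c(0.20, 0.40, 0.25, 0.15)  # toy values

top_share_from_brackets(perc, share, top = 0.10)  # p90p99 + p99p100
top_share_from_brackets(perc, share, top = 0.01)  # p99p100 only
```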

### Percentile ratio

Requires a threshold (`t`) series:

```{r perc-ratio}
thresh <- download_wid("tptinc992j", areas = "US", perc = "all")
wid_percentile_ratio(thresh)                                          # P90/P10
wid_percentile_ratio(thresh, numerator = "p90", denominator = "p50") # P90/P50
```
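
A percentile ratio divides one income threshold by another. The sketch below shows the arithmetic on a named vector of toy thresholds; `threshold_ratio()` is a hypothetical helper, and the values are made up rather than taken from WID.

```{r sketch-percentile-ratio}
# Hypothetical helper for illustration only; not part of widr.
threshold_ratio <- function(thresholds, numerator, denominator) {
  unname(thresholds[numerator] / thresholds[denominator])
}

# Toy thresholds: the income level at each percentile.
thresholds <- c(p10 = 8000, p50 = 30000, p90 = 96000)

threshold_ratio(thresholds, "p90", "p10")  # 12
threshold_ratio(thresholds, "p90", "p50")  # 3.2
```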

## Plotting

All plot functions return `ggplot` objects and accept additional layers:

```{r plot}
# Time series - one line per country; facet = TRUE for separate panels
wid_plot_timeseries(shares,
  country_labels = c(US = "United States", FR = "France",
                     DE = "Germany",       CN = "China"))

# Cross-country bar chart for a single year
wid_plot_compare(shares, year = 2020)

# Lorenz curve
wid_plot_lorenz(dist, country = "US")
```

## Example

```{r full-example}
library(widr); library(dplyr); library(ggplot2)

download_wid(
  indicators = "aptinc992j",
  areas      = c("US", "FR", "CN", "IN"),
  perc       = "p0p50",
  years      = 1990:2022
) |>
  wid_convert(target = "ppp", base_year = "2022") |>
  wid_tidy(country_names = TRUE) |>
  ggplot(aes(year, value, colour = country_name)) +
  geom_line(linewidth = 0.8) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title    = "Bottom 50% average pre-tax income",
       subtitle = "2022 USD PPP · equal-split adults 20+",
       x = NULL, y = NULL, colour = NULL)
```

## Quick reference

| Function | Purpose |
|---|---|
| `download_wid()` | Download data; returns a `wid_df` |
| `wid_decode()` / `wid_encode()` | Parse or build variable codes |
| `wid_validate()` / `wid_is_valid()` | Validate code components |
| `wid_search()` | Keyword search across reference tables |
| `wid_tidy()` | Decode columns, coerce types |
| `wid_convert()` | Currency conversion |
| `wid_metadata()` | Retrieve source information |
| `wid_gini()` | Gini coefficient |
| `wid_top_share()` | Top fractile income / wealth share |
| `wid_percentile_ratio()` | Percentile ratio (e.g. P90/P10) |
| `wid_plot_timeseries()` | Time-series line chart |
| `wid_plot_compare()` | Cross-country bar / point chart |
| `wid_plot_lorenz()` | Lorenz curve |
| `wid_query()` / `wid_filter()` / `wid_fetch()` | Reusable query objects |
| `wid_set_key()` | Set API key |
| `wid_cache_list()` / `wid_cache_clear()` | Cache management |

Full code dictionary: `vignette("code-dictionary")` · [wid.world/codes-dictionary](https://wid.world/codes-dictionary/)
