---
title: "Tracking state GDP components with IBGE data"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Tracking state GDP components with IBGE data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

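This vignette demonstrates how to query **IBGE aggregate tables** that
serve as short-term tracking indicators for **state-level GDP components**,
particularly in services, retail, manufacturing, and construction. Most
examples use Pernambuco (`N3[26]`) or the Recife Metropolitan Area
(`N7[2601]`); the construction series is national.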

The workflow is always the same:

1. **Inspect metadata** with `ibge_metadata()` to discover available
   variables, classifications, and categories.
2. **Fetch data** with `ibge_variables()`, specifying aggregate, variable,
   classification, localities, and periods.
3. **Post-process** the `value` column with `parse_ibge_value()` and convert
   period codes to proper dates.

> **Note on `value`**: the IBGE API may return special symbols (`"-"`,
> `".."`, `"..."`, `"X"`) instead of numbers. Always use
> `parse_ibge_value()` to convert reliably.
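
As a rough illustration of what such a conversion involves, the base-R sketch
below maps the four symbols to `NA` before calling `as.numeric()`. The function
name `parse_value_sketch()` is hypothetical, and treating every symbol as
missing is an assumption made here; `parse_ibge_value()` itself may handle
them differently.

```{r}
# Hypothetical stand-in for illustration only; not the ibger implementation
parse_value_sketch <- function(x) {
  # Symbols listed in the note above; all are treated as missing (assumption)
  special <- c("-", "..", "...", "X")
  x[x %in% special] <- NA_character_
  as.numeric(x)
}

# Made-up input mixing numbers and special symbols
parse_value_sketch(c("0.52", "-", "..", "...", "X", "1.10"))
```

Calling `as.numeric()` directly on the raw column would also produce `NA` for
these symbols, but with a coercion warning and no control over how each symbol
is treated.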

## Setup

```{r}
library(ibger)
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)
library(stringr)
```

## Helper: convert period codes to dates

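IBGE returns periods as six-character codes whose meaning depends on the
table's frequency: `"202501"` is January 2025 in a monthly table but the
first quarter of 2025 in a quarterly one. The code alone does not say
which, so we need frequency-specific converters: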

```{r}
# Monthly periods: "202501" -> 2025-01-01
period_to_monthly <- function(x) ym(x)

# Quarterly periods: "202501" -> 2025-01-01 (first day of the quarter)
# The last two digits are the quarter number (01-04), not a month,
# so quarter q is mapped to its first month, 3 * q - 2
period_to_quarterly <- function(x) {
  yr <- as.integer(substr(x, 1, 4))
  qt <- as.integer(substr(x, 5, 6))
  make_date(yr, qt * 3 - 2, 1)
}
```
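
The quarter arithmetic can be checked without any IBGE call. This base-R
sketch, using made-up 2025 codes, maps each quarter number to the first month
of that quarter (`3 * q - 2`) and builds the corresponding date.

```{r}
# Illustrative quarterly codes for 2025: "202501" to "202504"
codes <- sprintf("2025%02d", 1:4)

# Split each code into year and quarter number
yr <- as.integer(substr(codes, 1, 4))
qt <- as.integer(substr(codes, 5, 6))

# First month of quarter q is 3 * q - 2, i.e. 1, 4, 7, 10
first_month <- qt * 3 - 2

# Zero-pad the month so the string is an unambiguous ISO date
quarter_start <- as.Date(sprintf("%d-%02d-01", yr, first_month))

data.frame(code = codes, quarter_start = quarter_start)
```

Each quarter is dated at its first day, so quarterly and monthly series can
share one date axis in a plot.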

---

## 1) IPCA (7060) — Health insurance

The IPCA (consumer price index) aggregate 7060 is the main source for
inflation tracking. Here we compare the general index against the health
insurance sub-item for the Recife Metropolitan Area.

### 1.1 Discovering the right IDs

```{r}
meta_7060 <- ibge_metadata(7060)

# Find classification categories matching "Plano" (health plan) or "Índice" (index)
unnest(meta_7060$classifications, categories) |>
  filter(str_detect(category_name, "Plano|Índice")) |>
  select(id, category_id, category_name, category_level)

# Available variables
meta_7060$variables
```

Reading the output:

- `id` is the **classification ID** (e.g. `"315"`).
- `category_id` is the **category ID** within that classification
  (e.g. `"7169"` for *General index*).
- In `ibge_variables()`, pass
  `classification = list("315" = c("7169", "7695"))` to request both
  categories simultaneously.
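
The named-list shape passed to `classification` can also be derived from a
table of IDs rather than typed by hand. The sketch below uses a hand-written
data frame as a stand-in for the filtered metadata (the column names follow
the output described above; the real object is whatever `ibge_metadata()`
returns) and groups category IDs by classification ID with base R's `split()`.

```{r}
# Hand-written stand-in for the filtered metadata rows (illustration only)
selected <- data.frame(
  id          = c("315", "315"),
  category_id = c("7169", "7695"),
  stringsAsFactors = FALSE
)

# Group category IDs by classification ID: one list element per classification
classification <- split(selected$category_id, selected$id)

str(classification)
```

The result has one element named `315` holding both category IDs, which is the
same structure as `list("315" = c("7169", "7695"))`.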

### 1.2 Fetching the data

```{r}
ipca_health <- ibge_variables(
  aggregate = 7060,
  variable = 63,                          # IPCA - Monthly variation
  periods = -12,                          # Last 12 months
  classification = list(
    "315" = c("7169", "7695")             # General index + Health insurance
  ),
  localities = "N7[2601]"                 # Recife Metropolitan Area
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_monthly(period)
  ) |>
  select(period, classification_315, locality_name, value)
```

### 1.3 Wide format for inspection

```{r}
ipca_health |>
  pivot_wider(
    id_cols    = c(period, locality_name),
    names_from = classification_315,
    values_from = value
  ) |>
  arrange(desc(period))
```

### 1.4 Plot

```{r}
ipca_health |>
  ggplot(aes(period, value, color = classification_315)) +
  geom_line() +
  geom_point() +
  labs(
    title = "IPCA — Health insurance vs General index",
    subtitle = "Recife Metropolitan Area, monthly variation (%)",
    x = NULL, y = "Monthly variation (%)", color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

---

## 2) IPCA (7060) — Vehicle insurance

The logic is the same as in section 1; only the category in classification
`315` changes. The lookup reuses `meta_7060` from section 1.1.

```{r}
# Find category IDs matching "Seguro" (insurance) or "Índice" (index)
unnest(meta_7060$classifications, categories) |>
  filter(str_detect(category_name, "Seguro|Índice")) |>
  select(id, category_id, category_name)
```

```{r}
ipca_vehicle_ins <- ibge_variables(
  aggregate = 7060,
  variable = 63,
  periods = -12,
  classification = list("315" = c("7169", "7643")),  # General + Vehicle insurance
  localities = "N7[2601]"
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_monthly(period)
  ) |>
  select(period, classification_315, locality_name, value)
```

```{r}
ipca_vehicle_ins |>
  ggplot(aes(period, value, color = classification_315)) +
  geom_line() +
  geom_point() +
  labs(
    title = "IPCA — Vehicle insurance vs General index",
    subtitle = "Recife Metropolitan Area, monthly variation (%)",
    x = NULL, y = "Monthly variation (%)", color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

---

## 3) PMS (8693) — Transportation and postal services

The Monthly Survey of Services (PMS) aggregate 8693 is a proxy for
service-sector activity. Here we filter by:

- **Index type** (classification `11046`): revenue vs volume indices
- **Activity group** (classification `12355`): transportation, storage
  and postal services

```{r}
meta_8693 <- ibge_metadata(8693)

# Browse classifications and categories
unnest(meta_8693$classifications, categories)
meta_8693$variables
```

```{r}
pms_transport <- ibge_variables(
  aggregate = 8693,
  variable = 7167,                          # Index number (2022 = 100)
  periods = -12,
  classification = list(
    "11046" = "all",                        # All index types (revenue + volume)
    "12355" = "106876"                      # Transportation/postal services
  ),
  localities = "N3[26]"                     # Pernambuco
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_monthly(period)
  ) |>
  select(period, classification_11046, locality_name, value)
```

```{r}
pms_transport |>
  ggplot(aes(period, value, color = classification_11046)) +
  geom_line() +
  geom_point() +
  labs(
    title = "PMS — Index numbers (2022 = 100)",
    subtitle = "Transportation, storage and postal services (Pernambuco)",
    x = NULL, y = "Index (2022 = 100)", color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

---

## 4) PNAD Contínua (5434) — Accommodation and food services

The Continuous PNAD aggregate 5434 provides quarterly employment data
(persons aged 14+ employed) by activity group.

```{r}
meta_5434 <- ibge_metadata(5434)
unnest(meta_5434$classifications, categories)
meta_5434$variables
```

```{r}
pnad_accommodation <- ibge_variables(
  aggregate = 5434,
  variable = 4090,                          # Employed persons (thousands)
  periods = -12,                            # Last 12 quarters
  classification = list("888" = "56623"),   # Accommodation and food services
  localities = "N3[26]"                     # Pernambuco
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_quarterly(period)
  ) |>
  select(period, classification_888, locality_name, value)
```

```{r}
pnad_accommodation |>
  ggplot(aes(period, value)) +
  geom_line() +
  geom_point() +
  labs(
    title = "PNAD Contínua — Employed persons (14+)",
    subtitle = "Accommodation and food services (Pernambuco, thousands)",
    x = NULL, y = "Employed (thousands)"
  ) +
  theme_minimal()
```

---

## 5) PMS (8693) — Professional and administrative services

Same aggregate as section 3, switching only the activity category in
classification `12355`:

```{r}
pms_professional <- ibge_variables(
  aggregate = 8693,
  variable = 7167,
  periods = -12,
  classification = list(
    "11046" = "all",
    "12355" = "31399"                       # Professional/administrative services
  ),
  localities = "N3[26]"
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_monthly(period)
  ) |>
  select(period, classification_11046, locality_name, value)
```

```{r}
pms_professional |>
  ggplot(aes(period, value, color = classification_11046)) +
  geom_line() +
  geom_point() +
  labs(
    title = "PMS — Index numbers (2022 = 100)",
    subtitle = "Professional and administrative services (Pernambuco)",
    x = NULL, y = "Index (2022 = 100)", color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

---

## 6) PNAD Contínua (5434) — Domestic services

Same aggregate as section 4, switching only the activity category in
classification `888`:

```{r}
pnad_domestic <- ibge_variables(
  aggregate = 5434,
  variable = 4090,
  periods = -12,
  classification = list("888" = "56628"),   # Domestic services
  localities = "N3[26]"
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_quarterly(period)
  ) |>
  select(period, classification_888, locality_name, value)
```

```{r}
pnad_domestic |>
  ggplot(aes(period, value)) +
  geom_line() +
  geom_point() +
  labs(
    title = "PNAD Contínua — Employed persons (14+)",
    subtitle = "Domestic services (Pernambuco, thousands)",
    x = NULL, y = "Employed (thousands)"
  ) +
  theme_minimal()
```

---

## 7) PIM-PF (8888) — Industrial production (selected CNAE sectors)

The PIM-PF (Monthly Industrial Survey — Physical Production) aggregate
8888 covers manufacturing output. Classification `544` filters by
industrial activity (CNAE sections).

```{r}
meta_8888 <- ibge_metadata(8888)
unnest(meta_8888$classifications, categories)
meta_8888$variables
```

```{r}
pim_selected <- ibge_variables(
  aggregate = 8888,
  variable = 12606,                         # Index number (2022 = 100)
  periods = -12,
  classification = list(
    "544" = c("129318", "129338")           # Beverages; Motor vehicles
  ),
  localities = "N3[26]"
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_monthly(period)
  ) |>
  select(period, classification_544, locality_name, value)
```

```{r}
pim_selected |>
  ggplot(aes(period, value, color = classification_544)) +
  geom_line() +
  geom_point() +
  labs(
    title = "PIM-PF — Index numbers (2022 = 100)",
    subtitle = "Beverages and Motor vehicles (Pernambuco)",
    x = NULL, y = "Index (2022 = 100)", color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

---

## 8) Construction (8886) — Typical construction inputs

Aggregate 8886 tracks the physical production of typical construction
inputs. Unlike the previous sections, the call below uses no classification
filter and requests the national series (`N1`) rather than Pernambuco.

```{r}
meta_8886 <- ibge_metadata(8886)
meta_8886$variables
```

```{r}
construction <- ibge_variables(
  aggregate = 8886,
  variable = 12606,                         # Index number (2022 = 100)
  periods = -12,
  localities = "N1"                         # Brazil
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_monthly(period)
  ) |>
  select(period, locality_name, value)
```

```{r}
construction |>
  ggplot(aes(period, value)) +
  geom_line() +
  geom_point() +
  labs(
    title = "Construction — Typical inputs (physical production)",
    subtitle = "Brazil, index number (2022 = 100)",
    x = NULL, y = "Index (2022 = 100)"
  ) +
  theme_minimal()
```

---

## 9) PMC (8884 / 8757 / 8880) — Retail trade indices

The Monthly Retail Trade Survey (PMC) publishes volume and revenue
indices across different retail segments. The three aggregates below
follow the same pattern: classification `11046` selects the index type
(volume vs nominal revenue). The category ID for the volume index differs
between aggregates, so check each table's metadata before reusing an ID.

### 9.1 Vehicles, motorcycles, parts and accessories (8884)

```{r}
meta_8884 <- ibge_metadata(8884)
unnest(meta_8884$classifications, categories)
meta_8884$variables
```

```{r}
pmc_vehicles <- ibge_variables(
  aggregate = 8884,
  variable = 7169,                          # Index number (2022 = 100)
  periods = -12,
  classification = list("11046" = "56738"), # Volume index
  localities = "N3[26]"
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_monthly(period)
  ) |>
  select(period, classification_11046, locality_name, value)
```

```{r}
pmc_vehicles |>
  ggplot(aes(period, value)) +
  geom_line() +
  geom_point() +
  labs(
    title = "PMC — Sales volume index (2022 = 100)",
    subtitle = "Vehicles, motorcycles, parts and accessories (Pernambuco)",
    x = NULL, y = "Index (2022 = 100)"
  ) +
  theme_minimal()
```

### 9.2 Construction materials (8757)

```{r}
pmc_construction <- ibge_variables(
  aggregate = 8757,
  variable = 7169,
  periods = -12,
  classification = list("11046" = "56732"), # Volume index: construction materials
  localities = "N3[26]"
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_monthly(period)
  ) |>
  select(period, classification_11046, locality_name, value)
```

```{r}
pmc_construction |>
  ggplot(aes(period, value)) +
  geom_line() +
  geom_point() +
  labs(
    title = "PMC — Sales volume index (2022 = 100)",
    subtitle = "Construction materials (Pernambuco)",
    x = NULL, y = "Index (2022 = 100)"
  ) +
  theme_minimal()
```

### 9.3 Retail trade (8880)

```{r}
pmc_retail <- ibge_variables(
  aggregate = 8880,
  variable = 7169,
  periods = -12,
  classification = list("11046" = "56734"), # Volume index: retail trade
  localities = "N3[26]"
) |>
  mutate(
    value  = parse_ibge_value(value),
    period = period_to_monthly(period)
  ) |>
  select(period, classification_11046, locality_name, value)
```

```{r}
pmc_retail |>
  ggplot(aes(period, value)) +
  geom_line() +
  geom_point() +
  labs(
    title = "PMC — Sales volume index (2022 = 100)",
    subtitle = "Retail trade (Pernambuco)",
    x = NULL, y = "Index (2022 = 100)"
  ) +
  theme_minimal()
```
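
Because the three PMC calls above differ only in aggregate and category, their
arguments can be kept in a small table and expanded into one argument list per
series. The sketch below only builds those lists; the object names
(`pmc_specs`, `pmc_args`) are illustrative, the category IDs are passed as
strings by assumption, and the actual fetch is left commented out because it
needs network access.

```{r}
# Aggregate / category pairs copied from sections 9.1 to 9.3
pmc_specs <- data.frame(
  series    = c("vehicles", "construction_materials", "retail"),
  aggregate = c(8884, 8757, 8880),
  category  = c("56738", "56732", "56734"),
  stringsAsFactors = FALSE
)

# One argument list per series, using the argument names from the calls above
pmc_args <- Map(
  function(aggregate, category) {
    list(
      aggregate      = aggregate,
      variable       = 7169,
      periods        = -12,
      classification = list("11046" = category),
      localities     = "N3[26]"
    )
  },
  pmc_specs$aggregate,
  pmc_specs$category
)
names(pmc_args) <- pmc_specs$series

# Not run here (requires the IBGE API):
# pmc_raw <- lapply(pmc_args, function(args) do.call(ibge_variables, args))
# pmc_all <- dplyr::bind_rows(pmc_raw, .id = "series")
```

Adding another retail segment then means adding one row to `pmc_specs` rather
than copying a whole chunk.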

---

## Next steps

1. **Save the series** in a standardised format (e.g. `arrow::write_parquet()`
   or a database) for reproducible dashboards.
2. Build a **state GDP tracking dashboard** with normalisation (base 100),
   smoothing (moving averages), and variation indicators (month-over-month,
   year-over-year).
3. Wrap each block (IPCA, PMS, PNAD, PIM-PF, PMC) into a dedicated function
   to reduce repetition in production code.
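
The indicators named in step 2 can be sketched in base R. The example below
runs on a short synthetic series (the values are invented, not IBGE data) and
computes a rebased index, the month-over-month change, and a trailing
three-month moving average; the column names are illustrative.

```{r}
# Synthetic index values for illustration only (not IBGE data)
idx <- data.frame(
  period = seq(as.Date("2024-01-01"), by = "month", length.out = 6),
  value  = c(100, 102, 101, 104, 106, 105)
)

# Rebase so that the first observation equals 100
idx$rebased <- idx$value / idx$value[1] * 100

# Month-over-month change in percent (NA for the first row)
idx$mom_pct <- c(NA, diff(idx$value) / head(idx$value, -1) * 100)

# Trailing 3-month moving average (NA until three observations exist).
# stats::filter() is called with its namespace because dplyr::filter()
# masks it once dplyr is attached.
idx$ma3 <- as.numeric(stats::filter(idx$value, rep(1 / 3, 3), sides = 1))

idx
```

A year-over-year change follows the same pattern with a lag of 12 months
(4 for quarterly series), so it needs at least 13 monthly observations rather
than the 12 requested in the examples above.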
