---
title: "Getting started with BEMPdata"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with BEMPdata}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  eval     = FALSE
)
```

## Overview

The **BEMPdata** package provides access to the Bangladesh Environmental
Mobility Panel (BEMP), a household panel survey on environmental migration
along the Jamuna River in Bangladesh (2021–2024). The data cover 1,691
households in 20 survey datasets from 14 rounds (4 annual in-person waves and
10 bi-monthly phone waves), totaling 24,279 completed surveys.

Data are hosted on [Zenodo](https://doi.org/10.5281/zenodo.18229498) and
downloaded on demand. Files are cached locally after the first download, so
subsequent calls read from the local cache instead of downloading again.

## Installation

```{r install}
# Install from GitHub (requires the remotes package;
# run install.packages("remotes") first if it is missing)
remotes::install_github("janfreihardt/BEMPdata")
```

## Wave structure

The package includes a built-in overview of all 20 wave datasets:

```{r wave-overview}
library(BEMPdata)

wave_overview

# In-person waves only
wave_overview[wave_overview$type == "in-person", ]
```

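Wave identifiers follow the pattern `w{round}[_M|_N|_V]`, where `{round}` is
the survey round (1 to 14) and the optional suffix identifies the questionnaire: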

| Suffix | Meaning |
|--------|---------|
| *(none)* | Main household questionnaire |
| `_M`   | Migrant questionnaire |
| `_N`   | Non-migrant questionnaire |
| `_V`   | Village profile questionnaire |
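
As an illustration of this naming scheme, the hypothetical helper below (not part of BEMPdata) splits identifiers such as `w6_M` into their round number and questionnaire type with a base R regular expression. It only checks the pattern; it does not verify that a wave actually exists in `wave_overview`.

```r
# Hypothetical helper (not part of BEMPdata): parse identifiers of the
# form w{round}[_M|_N|_V] into round number and questionnaire type.
parse_wave_id <- function(id) {
  pattern <- "^w([0-9]+)(_([MNV]))?$"
  ok <- grepl(pattern, id)
  if (!all(ok)) {
    stop("Invalid wave identifier(s): ", paste(id[!ok], collapse = ", "))
  }
  # Third capture group is the suffix letter; "" when there is no suffix
  suffix <- sub(pattern, "\\3", id)
  types <- c(M = "migrant", N = "non-migrant", V = "village profile")
  questionnaire <- unname(types[suffix])
  questionnaire[suffix == ""] <- "main household"
  data.frame(
    wave          = id,
    round         = as.integer(sub(pattern, "\\1", id)),
    questionnaire = questionnaire,
    stringsAsFactors = FALSE
  )
}

parse_wave_id(c("w1", "w6_M", "w14_N"))
```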

## Downloading wave data

Use `get_wave()` to download and load a wave. The first call downloads the
full CSV archive (~6 MB) from Zenodo; all subsequent calls use the local cache.

```{r get-wave}
# Baseline in-person wave (2021)
w1 <- get_wave("w1")
head(w1)

# Wave 6, migrant questionnaire
w6_migrant <- get_wave("w6_M")

# Wave 14, non-migrant questionnaire, in Stata format (with value labels)
w14_nm <- get_wave("w14_N", format = "dta")
```
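
The download-once behaviour can be pictured with the generic sketch below. It is not BEMPdata's implementation: the function name `cached_fetch()`, the cache directory and the file naming are assumptions for illustration. The idea is simply to look for a local copy before downloading, so only the first call hits the network.

```r
# Generic download-once cache (hypothetical; not BEMPdata's internal code).
# `fetch` defaults to utils::download.file but can be swapped for testing.
cached_fetch <- function(url, cache_dir, fetch = utils::download.file) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  dest <- file.path(cache_dir, basename(url))
  if (!file.exists(dest)) {
    # Only reached when no cached copy exists yet
    fetch(url, dest, mode = "wb")
  }
  dest
}

# Offline demonstration: a stub fetcher writes a dummy file instead of downloading
n_downloads <- 0
stub_fetch <- function(url, destfile, mode) {
  n_downloads <<- n_downloads + 1
  writeLines("dummy", destfile)
}
cache <- file.path(tempdir(), "demo-cache")
cached_fetch("https://example.org/archive.zip", cache, fetch = stub_fetch)
cached_fetch("https://example.org/archive.zip", cache, fetch = stub_fetch)
n_downloads  # the second call reused the cached file
```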

## Working with codebooks

### Look up a variable by keyword

```{r lookup}
# Find all variables related to income
lookup_variable("income")

# Search only in variable labels
lookup_variable("migrat", fields = "label")

# Use a regular expression
lookup_variable("flood|erosion")
```

### Get the full codebook for a wave

```{r get-codebook}
# Codebook for the baseline wave
cb_w1 <- get_codebook("w1")
names(cb_w1)

# Merged codebook across all waves
cb_all <- get_codebook("all")
nrow(cb_all)
```

The pre-built `codebook` object ships with the package and is available
immediately without downloading:

```{r bundled-codebook}
# Available offline
head(codebook[, c("wave", "variable_name", "variable_label", "block")])
```
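
Because the bundled `codebook` is an ordinary data frame, it can also be filtered directly with base R. The self-contained sketch below uses a toy stand-in with the same four columns shown above; its rows, variable names and block codes are invented. On real data, replace `toy_codebook` with `codebook`. The helper `search_labels()` is hypothetical and is not guaranteed to behave like `lookup_variable()`.

```r
# Toy stand-in with the same columns as the bundled `codebook`
# (all rows are invented for illustration).
toy_codebook <- data.frame(
  wave           = c("w1", "w1", "w6_M"),
  variable_name  = c("w1_a1", "w1_b2", "w6_M_c3"),
  variable_label = c("Household income", "Flood damage", "Reason for migration"),
  block          = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive regex search in variable labels
search_labels <- function(cb, pattern) {
  hits <- grepl(pattern, cb$variable_label, ignore.case = TRUE)
  cb[hits, c("wave", "variable_name", "variable_label")]
}

search_labels(toy_codebook, "flood|erosion")

# Number of documented variables per wave
table(toy_codebook$wave)
```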

## Cache management

```{r cache}
# Check what is cached and how much space it uses
bemp_cache_info()

# Clear the cache (will prompt for confirmation)
bemp_cache_clear()
```

## Linking waves

The panel respondent code, stored in the registration block of each wave,
identifies the same respondent across waves and therefore serves as the join
key. Here is a minimal example of merging two waves:

```{r merge}
library(dplyr)

w1  <- get_wave("w1")
w6n <- get_wave("w6_N")

# Identify the respondent code columns
lookup_variable("respondent code", fields = "label")

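# Merge on respondent code. `by` must name a column present in both data
# frames; if the code has a different name in each wave, map the names,
# e.g. by = c("w1_reg1" = "<respondent code column in w6_N>")
panel <- inner_join(w1, w6n, by = "w1_reg1", suffix = c("_w1", "_w6n"))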
```
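
To see what the join does, here is a self-contained sketch on toy data using base R `merge()`. It assumes, for illustration only, that the respondent code is stored under a wave-specific column name in each wave; `w6_N_reg1` and `hh_size` are hypothetical names and the values are invented.

```r
# Toy data: respondent codes under wave-specific column names (hypothetical)
w1_toy  <- data.frame(w1_reg1   = c(101, 102, 103), hh_size = c(5, 4, 6))
w6n_toy <- data.frame(w6_N_reg1 = c(102, 103, 104), hh_size = c(4, 7, 3))

# Inner join: keep only respondents observed in both waves.
# by.x / by.y map the differently named key columns; shared non-key
# columns (hh_size) get the wave suffixes.
panel_toy <- merge(
  w1_toy, w6n_toy,
  by.x = "w1_reg1", by.y = "w6_N_reg1",
  suffixes = c("_w1", "_w6n")
)
panel_toy

# Respondents from the first wave without a match in the second
unmatched <- setdiff(w1_toy$w1_reg1, w6n_toy$w6_N_reg1)
unmatched
```

An inner join silently drops unmatched respondents, so checking them with `setdiff()` before analysis makes the sample change explicit.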

## Citation

If you use this package or dataset, please cite:

**R package:**

> Freihardt, J. (2026). *BEMPdata: R package for the Bangladesh Environmental
> Mobility Panel*. Zenodo. <https://doi.org/10.5281/zenodo.18775710>

**Dataset:**

> Freihardt, J. et al. (2026). *The Bangladesh Environmental Mobility Panel
> (BEMP): Panel data on (im)mobility, socio-economic, and political impacts
> of riverbank erosion and flooding in Bangladesh* [Dataset]. Zenodo.
> <https://doi.org/10.5281/zenodo.18229498>

**Data descriptor:**

> Freihardt, J. et al. (*forthcoming*). Bangladesh Environmental Mobility
> Panel (BEMP). *[Journal]*. DOI: [to be added]
