Package {datazoom.social}


Title: Simplify Access to Brazilian Social Data
Version: 0.1.0
Description: Provides tools for downloading and processing microdata from the PNAD Contínua (PNADC, Continuous National Household Sample Survey), a rotating panel survey published quarterly by IBGE (Brazilian Institute of Geography and Statistics). Includes panel identification algorithms for linking individuals across survey waves.
License: MIT + file LICENSE
URL: https://datazoom.com.br/en/
Imports: arrow, data.table, dplyr, magrittr, PNADcIBGE, purrr, readr, rlang, stringr, tidyr
Encoding: UTF-8
RoxygenNote: 7.3.3
Suggests: knitr, rmarkdown
VignetteBuilder: knitr
Depends: R (≥ 4.1.0)
BugReports: https://github.com/datazoompuc/datazoom.social/issues
LazyData: true
NeedsCompilation: no
Packaged: 2026-05-04 14:18:36 UTC; Bernardo
Author: Laura Tavares Regadas [aut, cre], DataZoom (PUC-Rio) [fnd], Igor Rigolon Veiga [aut], Arthur Lins de Vasconcellos [aut], Giulia Toscano Imbuzeiro [aut], Guilherme Jardim [aut], Pablo Chaves [aut], Breno Avidos [aut], Bernardo Sieira [aut]
Maintainer: Laura Tavares Regadas <lauratregadas@gmail.com>
Repository: CRAN
Date/Publication: 2026-05-07 16:01:37 UTC

Build PNADc Panel

Description

This function builds a panel dataset from PNADC data by adding identifiers that link households and individuals across survey waves.

Usage

build_pnadc_panel(dat, panel)

Arguments

dat

Data frame with PNADC data, sorted into a single panel.

panel

A character with the type of panel identification. Use "none" for no paneling, "basic" for basic paneling, and "advanced" for advanced paneling.

Value

A modified dataset with added identifiers for household (id_dom) and individual (id_ind or id_rs) based on the chosen panel algorithm.

Examples


# Example usage:

panel_data <- build_pnadc_panel(dat = pnad_sample, panel = "basic")
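
As a rough illustration of what the added identifiers represent, the sketch below builds household and individual keys on a toy data frame from the columns listed as required for the "basic" algorithm. It assumes the general idea is to combine sampling-unit, household, and sex/birth-date fields; it is not the package's actual implementation. The toy values and the helper make_ids are hypothetical.

```r
# Hypothetical toy data: two people in one household, one person in another.
toy <- data.frame(
  UPA    = c(110000016, 110000016, 110000017),
  V1008  = c(1, 1, 2),
  V1014  = c(6, 6, 6),
  V2007  = c(1, 2, 1),
  V20082 = c(1980, 1985, 1990),
  V20081 = c(5, 7, 1),
  V2008  = c(12, 3, 28)
)

# Hypothetical helper, not part of datazoom.social.
make_ids <- function(d) {
  # Household key: sampling unit + household number + V1014.
  d$id_dom <- paste(d$UPA, d$V1008, d$V1014, sep = "_")
  # Individual key: household key + sex + full date of birth.
  d$id_ind <- paste(d$id_dom, d$V2007, d$V20082, d$V20081, d$V2008, sep = "_")
  d
}

toy_ids <- make_ids(toy)
```

Under these assumptions, the first two rows share one id_dom while all three rows receive distinct id_ind values.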


Load Continuous PNAD Data

Description

This function downloads PNADC data for the requested years and quarters, applies the chosen panel identification algorithm, and saves the resulting files to save_to.

Usage

load_pnadc(
  save_to,
  years,
  quarters = 1:4,
  panel = "advanced",
  raw_data = FALSE,
  save_options = c(TRUE, TRUE),
  vars = NULL
)

Arguments

save_to

A character with the directory in which to save the downloaded files.

years

A numeric vector indicating the years for which data will be loaded, in the format YYYY. Can be a single year or any vector of years, such as 2016:2018.

quarters

The quarters within those years to be downloaded. Can be a numeric vector or a list of vectors, for different quarters per year.
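
The two accepted shapes of quarters can be pictured with the sketch below. It assumes that a list is matched to years by position (first list element for the first year, and so on) and that a plain vector applies to every year; the helper expand_periods is hypothetical and only illustrates that reading.

```r
# Hypothetical helper, not part of datazoom.social.
expand_periods <- function(years, quarters) {
  # A plain numeric vector is reused for every requested year.
  if (!is.list(quarters)) {
    quarters <- rep(list(quarters), length(years))
  }
  # A list is assumed to supply one vector of quarters per year.
  stopifnot(length(quarters) == length(years))
  do.call(
    rbind,
    Map(function(y, q) data.frame(year = y, quarter = q), years, quarters)
  )
}

# All four quarters of 2016, but only the first two of 2017.
periods <- expand_periods(c(2016, 2017), list(1:4, 1:2))
```

Here periods has one row per year-quarter pair that would be requested under this interpretation.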

panel

A character choosing the panel algorithm to apply ("none", "basic", or "advanced"). For details, check vignette("BUILD_PNADC_PANEL").

raw_data

A logical setting the return of raw (TRUE) or processed (FALSE) variables.

save_options

A logical vector of length 2. The first element controls whether quarterly files are saved; the second sets the format in which all files are saved (TRUE for .csv, FALSE for .parquet). Panel files are always saved. There are four possible combinations:

  • c(TRUE, TRUE): saves quarterly and panel files in .csv format. This is the default.

  • c(TRUE, FALSE): saves quarterly and panel files in .parquet format.

  • c(FALSE, TRUE): does not save quarterly files; panel files are saved in .csv format.

  • c(FALSE, FALSE): does not save quarterly files; panel files are saved in .parquet format.
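
The four combinations above reduce to two independent switches, which the following sketch decodes. The helper describe_save_options is hypothetical and only restates the list above; it does not reproduce how load_pnadc handles the argument internally.

```r
# Hypothetical helper, not part of datazoom.social.
describe_save_options <- function(save_options = c(TRUE, TRUE)) {
  stopifnot(
    is.logical(save_options),
    length(save_options) == 2,
    !anyNA(save_options)
  )
  list(
    # First element: are quarterly files written at all?
    save_quarterly = save_options[1],
    # Second element: file format used for every file that is written.
    format = if (save_options[2]) ".csv" else ".parquet"
  )
}

opts <- describe_save_options(c(FALSE, FALSE))
```

With c(FALSE, FALSE), opts$save_quarterly is FALSE and opts$format is ".parquet", matching the last bullet.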

vars

A character vector of additional variable names to be downloaded, following the same convention as the vars parameter in get_pnadc. Each name must match a column in the PNADC microdata exactly as published by IBGE (e.g. "VD4019", "V2009").

Note that get_pnadc always returns a set of structural columns regardless of this argument. These include survey design weights (V1027, V1028, V1028001, V1028200, posest, posest_sxi), deflator variables (Habitual, Efetivo), and identifiers such as UF, Estrato, V1029, V1033, and ID_DOMICILIO, totalling around 233 columns. The vars argument adds to those columns; it does not restrict them. Use NULL (the default) to download all available microdata columns.

If panel is not "none", any columns required by the panel identification algorithm that are missing from vars will be added automatically and a warning will list the columns that were added. The required columns per algorithm are:

  • "basic": UPA, V1008, V1014, V2007, V20082, V20081, V2008.

  • "advanced": all of the above, plus V2003.

Note that several of these (UPA, V1008, V1014) are part of the structural columns always returned by get_pnadc, so in practice only V2007, V20082, V20081, V2008 (and V2003 for "advanced") are likely to be auto-added.
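
To anticipate which columns would be flagged, one can compare a vars vector against the lists above. The sketch below is a plain set difference under that assumption; required_cols and missing_panel_vars are hypothetical names, and the actual warning may omit columns that get_pnadc already returns as structural.

```r
# Required columns per algorithm, copied from the lists above.
required_cols <- list(
  basic    = c("UPA", "V1008", "V1014", "V2007", "V20082", "V20081", "V2008"),
  advanced = c("UPA", "V1008", "V1014", "V2007", "V20082", "V20081", "V2008",
               "V2003")
)

# Hypothetical helper, not part of datazoom.social.
missing_panel_vars <- function(vars, panel) {
  # Nothing to add when no panel is built or when all columns are downloaded.
  if (panel == "none" || is.null(vars)) {
    return(character(0))
  }
  setdiff(required_cols[[panel]], vars)
}

to_add <- missing_panel_vars(c("VD4019", "V2009", "V2007"), panel = "advanced")
```

In this example to_add contains every required column except V2007, which the user already requested.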

Value

A message indicating the successful save of panel files.

Examples


### DO NOT RUN ###
load_pnadc(
  save_to = tempdir(),
  years = 2016,
  quarters = 1:4,
  panel = "advanced",
  raw_data = FALSE,
  save_options = c(FALSE, FALSE)
)


Simulated PNAD sample dataset

Description

A small simulated dataset inspired by microdata from the Brazilian Continuous National Household Sample Survey (PNAD Contínua), included for examples, tests, and documentation in the datazoom.social package.

Usage

pnad_sample

Format

A data.table and data.frame with 31 rows and 23 variables:

V1

Record identifier.

Ano

Survey year.

Trimestre

Survey quarter.

UF

Federative unit code.

UPA

Primary sampling unit identifier.

V1008

Household serial identifier.

V1014

Panel number (rotation group) to which the household belongs, following the IBGE convention for V1014.

V1016

Household interview number within the panel (1 to 5), following the IBGE convention for V1016.

V20082

Year of birth.

V20081

Month of birth.

V2008

Day of birth or age-related auxiliary code, as provided in the simulated data.

V2007

Sex code.

V2009

Age in years.

VD3004

Educational attainment code.

VD4001

Labor force status code.

VD4002

Employment status code.

VD4005

Employment position or job category code.

VD4009

Usual hours worked category or related labor variable code.

VD4019

Monthly labor income.

V4010

Main job identifier or occupation-related code.

V4012

Economic activity or occupation grouping code.

V4013

Time in job or age-related auxiliary labor code.

V4022

Household or person weight, as represented in the simulated data.

Details

This dataset does not contain real PNAD observations. It was created only for demonstration purposes and includes a small subset of variables from the original files distributed by IBGE. Its reduced size makes package examples faster and lighter, while preserving a structure similar to that of the original survey data.

The purpose of pnad_sample is to provide a lightweight object that mimics part of the structure of PNAD Contínua microdata, allowing users to run examples without downloading or processing the full original files.

Variable names follow the naming convention used in the original survey microdata, but the values in this dataset are simulated and should not be used for substantive empirical analysis.

Source

Inspired by the structure of PNAD Contínua microdata produced by the Brazilian Institute of Geography and Statistics (IBGE). https://www.ibge.gov.br