---
title: "Data Frames"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data Frames}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

Data frames are the workhorse of data analysis in R. `h5lite` stores them in HDF5 as **Compound Datasets**, which let each column have its own data type (e.g., integer, float, string) within a single dataset, much like a SQL table.

This vignette explains how `h5lite` handles data frames, including custom column types and row names.

```{r setup}
library(h5lite)
file <- tempfile(fileext = ".h5")
```

## Basic Usage

Writing a data frame is as simple as writing any other object. `h5lite` automatically maps each column to its appropriate HDF5 type.

```{r}
# Create a standard data frame
df <- data.frame(
  id = 1:5,
  group = c("A", "A", "B", "B", "C"),
  score = c(10.5, 9.2, 8.4, 7.1, 6.0),
  passed = c(TRUE, TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Write to HDF5
h5_write(df, file, "study_data/results")

# Fetch the column names
h5_names(file, "study_data/results")

# Read back
df_in <- h5_read(file, "study_data/results")

head(df_in)
```
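To see how faithfully a table survives the round trip, it can help to put the column classes of two data frames side by side. The chunk below is a minimal sketch in base R. `compare_columns()` is a hypothetical helper introduced here, not part of `h5lite`, and it assumes nothing about which R types `h5_read()` returns.

```{r}
# Hypothetical helper (not part of h5lite): tabulate the first class of each
# column in two data frames so that type changes are easy to spot.
compare_columns <- function(original, restored) {
  cols <- union(names(original), names(restored))
  # First class of a column, or NA when the column is absent
  first_class <- function(d, nm) {
    if (nm %in% names(d)) class(d[[nm]])[1] else NA_character_
  }
  out <- data.frame(
    column   = cols,
    original = vapply(cols, function(nm) first_class(original, nm),
                      character(1), USE.NAMES = FALSE),
    restored = vapply(cols, function(nm) first_class(restored, nm),
                      character(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
  # A column counts as unchanged only if it exists on both sides with one class
  out$same <- !is.na(out$original) & !is.na(out$restored) &
    out$original == out$restored
  out
}

# Toy illustration: 'n' is integer on one side and double on the other
a <- data.frame(n = 1:3, label = c("x", "y", "z"), stringsAsFactors = FALSE)
b <- data.frame(n = c(1, 2, 3), label = c("x", "y", "z"), stringsAsFactors = FALSE)
compare_columns(a, b)
```

With the objects created above, `compare_columns(df, df_in)` gives the same kind of table for the HDF5 round trip.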

## Customizing Column Types

Use the `as` argument to control the storage type of specific columns. Pass a named character vector whose names are column names and whose values are the desired storage types.

This is particularly useful for reducing file size, for example by storing small integers as `int8` or single characters as `ascii[1]`.

```{r}
df_small <- data.frame(
  id   = 1:10,
  code = rep("A", 10)
)

# Force 'id' to be uint16 and 'code' to be an ascii string
h5_write(df_small, file, "custom_df", 
         as = c(id = "uint16", code = "ascii[]"))
```
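When a table has many integer columns, picking a type by hand for each one gets tedious. The chunk below is a rough sketch of how the choice could be automated from the value range. `smallest_int_type()` is a hypothetical helper, not part of `h5lite`, and the type strings it returns are assumed to follow the naming pattern of the `int8` and `uint16` strings used in this vignette. It refuses missing values, because how `NA` interacts with narrow integer types is not covered here.

```{r}
# Hypothetical helper (not part of h5lite): return the narrowest integer type
# string whose range covers every value in x. Type names are an assumption
# based on the "int8" / "uint16" pattern shown in this vignette.
smallest_int_type <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0, !anyNA(x), all(x == round(x)))
  lo <- min(x)
  hi <- max(x)
  if (lo >= 0) {
    # Non-negative values fit in an unsigned type
    if (hi <= 255) "uint8"
    else if (hi <= 65535) "uint16"
    else if (hi <= 4294967295) "uint32"
    else stop("values exceed the 32-bit unsigned range")
  } else {
    # Negative values require a signed type
    if (lo >= -128 && hi <= 127) "int8"
    else if (lo >= -32768 && hi <= 32767) "int16"
    else if (lo >= -2147483648 && hi <= 2147483647) "int32"
    else stop("values exceed the 32-bit signed range")
  }
}

smallest_int_type(1:10)
smallest_int_type(c(-5L, 100L))
smallest_int_type(c(0L, 70000L))
```

The result can then be dropped into the `as` vector, for example `as = c(id = smallest_int_type(df_small$id))`.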

## Row Names

Standard HDF5 Compound Datasets do not have a concept of "row names". However, `h5lite` preserves them using **Dimension Scales**.

When you write a data frame with row names, `h5lite` creates a separate dataset (usually named `_rownames`) and links it to the main table. When reading, `h5lite` automatically restores these as the `row.names` of the data frame.

```{r}
mtcars_subset <- head(mtcars, 3)

h5_write(mtcars_subset, file, "cars")

h5_str(file)

# Read back
result <- h5_read(file, "cars")
print(row.names(result))
```
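Not every data frame has row names worth preserving: most carry only the automatic `1..n` sequence. The chunk below is a small base R sketch for telling the two cases apart. `has_explicit_row_names()` is a hypothetical helper, not part of `h5lite`, and whether `h5lite` itself skips the row-names dataset for automatic row names is not stated here.

```{r}
# Hypothetical helper (not part of h5lite): TRUE when a data frame carries
# explicit row names rather than the automatic 1..n sequence.
has_explicit_row_names <- function(d) {
  stopifnot(is.data.frame(d))
  # .row_names_info() returns a negative row count for automatic row names
  .row_names_info(d) > 0
}

has_explicit_row_names(head(mtcars, 3))      # rows named by car model
has_explicit_row_names(data.frame(x = 1:3))  # automatic row names
```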


```{r, include=FALSE}
unlink(file)
```
