---
title: "Attributes In-Depth"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Attributes In-Depth}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

In HDF5, **attributes** are small pieces of metadata attached to groups or datasets. They are best used to store descriptive information (units, timestamps, descriptions, or experimental parameters) separately from the main data array.

This vignette covers how to write, read, and manage these attributes using `h5lite`, as well as important limitations regarding their structure.

```{r setup}
library(h5lite)
file <- tempfile(fileext = ".h5")
```

## Writing Attributes

There are two ways to write attributes in `h5lite`: explicitly (targeting an object) or implicitly (saving R attributes).

### 1. Explicit Writing

You can write an attribute to any existing group or dataset using the `attr` argument in `h5_write()`. This is useful for adding metadata after the data has been saved.

```{r}
# First, write a dataset
h5_write(1:10, file, "measurements/temperature")

# Now, attach attributes to it
h5_write(I("Celsius"),    file, "measurements/temperature", attr = "units")
h5_write(I("2023-10-27"), file, "measurements/temperature", attr = "date")
h5_write(I(0.1),          file, "measurements/temperature", attr = "precision")
```

*Note: If the attribute already exists, it will be overwritten.*
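
To see the overwrite behavior in isolation, the sketch below writes the same attribute twice and reads it back. It works on its own temporary file (`overwrite_file` and the `demo/values` path are illustrative names) so the vignette's main file is left untouched.

```{r}
library(h5lite)

# Separate scratch file, independent of the vignette's main file
overwrite_file <- tempfile(fileext = ".h5")
h5_write(1:3, overwrite_file, "demo/values")

# Write the attribute, then write it again under the same name
h5_write(I("Celsius"), overwrite_file, "demo/values", attr = "units")
h5_write(I("Kelvin"),  overwrite_file, "demo/values", attr = "units")

# A single 'units' attribute remains, holding the second value
unit_names <- h5_attr_names(overwrite_file, "demo/values")
unit_value <- h5_read(overwrite_file, "demo/values", attr = "units")
unit_names
unit_value

unlink(overwrite_file)
```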

### 2. Implicit Writing (R Attributes)

`h5lite` automatically preserves custom R attributes attached to your objects. When you write an R object, any attributes (except for standard internal ones like `dim`, `names`, or `class`) are written as HDF5 attributes.

```{r}
# Create a vector with custom R attributes
data <- rnorm(5)
attr(data, "description") <- I("Randomized control group")
attr(data, "valid")       <- I(TRUE)

# Write the object
h5_write(data, file, "experiment/control")

# Check the file - the attributes are there
h5_attr_names(file, "experiment/control")

h5_str(file)
```

## Reading Attributes

### 1. Accessing Specific Attributes

If you only need a specific piece of metadata without reading the full dataset, you can use `h5_read(..., attr = "name")`.

```{r}
# Read just the 'units' attribute
units <- h5_read(file, "measurements/temperature", attr = "units")
print(units)
```
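
When an attribute may or may not be present, one option is to check `h5_attr_names()` before reading. The helper below, `read_attr_or()`, is a hypothetical convenience function and not part of `h5lite`. It is a minimal sketch that avoids relying on how `h5_read()` responds to a missing attribute.

```{r}
library(h5lite)

# Hypothetical helper (not part of h5lite): return `default` when the
# attribute is not listed, instead of attempting the read.
read_attr_or <- function(file, name, attr, default = NULL) {
  if (attr %in% h5_attr_names(file, name)) {
    h5_read(file, name, attr = attr)
  } else {
    default
  }
}

lookup_file <- tempfile(fileext = ".h5")
h5_write(1:3, lookup_file, "demo/values")
h5_write(I("Celsius"), lookup_file, "demo/values", attr = "units")

# 'units' exists; 'operator' was never written, so the default is returned
found_value   <- read_attr_or(lookup_file, "demo/values", "units", default = NA)
missing_value <- read_attr_or(lookup_file, "demo/values", "operator", default = NA)

unlink(lookup_file)
```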

### 2. Reading with the Dataset

When you read a dataset, `h5lite` automatically reads all attached attributes and re-attaches them to the resulting R object.

```{r}
# Read the full dataset
temps <- h5_read(file, "measurements/temperature")

# The attributes are available in R
attributes(temps)

str(temps)
```
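
The round trip can also be sketched end to end on a scratch file: a custom R attribute is written implicitly with the data and comes back attached to the object. The file name, path, and attribute name below are illustrative. If only the bare values are needed afterwards, base R's `as.vector()` drops the attributes.

```{r}
library(h5lite)

roundtrip_file <- tempfile(fileext = ".h5")

# A numeric vector carrying one custom R attribute
reading <- c(20.5, 21, 19.75)
attr(reading, "sensor") <- I("probe-A")

h5_write(reading, roundtrip_file, "lab/reading")
restored <- h5_read(roundtrip_file, "lab/reading")

# The custom attribute travels with the data
sensor_label <- attr(restored, "sensor")

# as.vector() removes attributes from an atomic vector
bare_values <- as.vector(restored)

unlink(roundtrip_file)
```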

## Managing Attributes

### Listing Attributes

Use `h5_attr_names()` to list the names of all attributes attached to a specific object.

```{r}
h5_attr_names(file, "measurements/temperature")
```
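
Combining `h5_attr_names()` with `h5_read(..., attr = )` gives a simple way to collect all of an object's attributes without reading the dataset itself. `read_all_attrs()` below is a hypothetical helper written for this example, not an `h5lite` function.

```{r}
library(h5lite)

# Hypothetical helper (not part of h5lite): read every attribute on an
# object, one at a time, into a named list.
read_all_attrs <- function(file, name) {
  attr_names <- h5_attr_names(file, name)
  values <- lapply(attr_names, function(a) h5_read(file, name, attr = a))
  setNames(values, attr_names)
}

listing_file <- tempfile(fileext = ".h5")
h5_write(1:3, listing_file, "demo/values")
h5_write(I("Celsius"), listing_file, "demo/values", attr = "units")
h5_write(I(0.5),       listing_file, "demo/values", attr = "precision")

all_attrs <- read_all_attrs(listing_file, "demo/values")
str(all_attrs)

unlink(listing_file)
```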

### Deleting Attributes

You can remove a specific attribute using `h5_delete()`.

```{r}
# Delete the 'precision' attribute
h5_delete(file, "measurements/temperature", attr = "precision")

# Verify removal
h5_attr_names(file, "measurements/temperature")
```
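
In scripts that may run more than once, it can help to delete an attribute only when it is actually listed. The following is a minimal sketch. `delete_attr_if_present()` is a hypothetical helper, and the guard is there because this vignette does not describe what `h5_delete()` does when the attribute is absent.

```{r}
library(h5lite)

# Hypothetical helper (not part of h5lite): delete an attribute only if it
# is currently listed; returns TRUE when something was removed.
delete_attr_if_present <- function(file, name, attr) {
  present <- attr %in% h5_attr_names(file, name)
  if (present) h5_delete(file, name, attr = attr)
  present
}

cleanup_file <- tempfile(fileext = ".h5")
h5_write(1:3, cleanup_file, "demo/values")
h5_write(I("Celsius"), cleanup_file, "demo/values", attr = "units")

# The first call removes the attribute; the second finds nothing to remove
removed_first  <- delete_attr_if_present(cleanup_file, "demo/values", "units")
removed_second <- delete_attr_if_present(cleanup_file, "demo/values", "units")
remaining      <- h5_attr_names(cleanup_file, "demo/values")

unlink(cleanup_file)
```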

## Important Limitations

While attributes are powerful for storing metadata, they are fundamentally simpler structures than HDF5 Datasets. HDF5 enforces specific constraints that affect how `h5lite` can store complex R objects as attributes.

### 1. No Dimension Scales (Loss of Names)

HDF5 **Dimension Scales** (the mechanism `h5lite` uses to store `names`, `dimnames`, and `row.names`) can only be attached to **Datasets**. They cannot be attached to attributes.

This means that if you write a named vector, matrix, or array as an attribute, **the names will be lost**.

```{r}
# A vector with names
named_vec <- c(a = 1, b = 2, c = 3)

# Write as a standard Dataset -> Names are preserved
h5_write(named_vec, file, "my_dataset")
h5_names(file, "my_dataset")

# Write as an Attribute -> Names are LOST
h5_write(named_vec, file, "measurements/temperature", attr = "meta_vec")
h5_names(file, "measurements/temperature", attr = "meta_vec")
```
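
If the names matter, one possible workaround is to store them in a second attribute and reassemble the vector after reading. This is a sketch of a convention, not an `h5lite` feature: the `weights_names` attribute name and the scratch file are assumptions made for the example.

```{r}
library(h5lite)

names_file <- tempfile(fileext = ".h5")
h5_write(1:3, names_file, "demo/values")

weights <- c(a = 1, b = 2, c = 3)

# Store the values and their names as two separate attributes.
# The "_names" suffix is just a convention chosen for this example.
h5_write(unname(weights), names_file, "demo/values", attr = "weights")
h5_write(names(weights),  names_file, "demo/values", attr = "weights_names")

# Reassemble the named vector after reading
stored_values <- h5_read(names_file, "demo/values", attr = "weights")
stored_names  <- h5_read(names_file, "demo/values", attr = "weights_names")
rebuilt <- setNames(as.vector(stored_values), as.vector(stored_names))
rebuilt

unlink(names_file)
```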

**Exception: Data Frames**

There is one major exception: `data.frame` objects.

Because HDF5 stores data frames as **Compound Types**, the column names are baked into the type definition itself, not stored as side-loaded metadata. Therefore, **column names are preserved** even when writing a data frame as an attribute. However, `row.names` (which rely on dimension scales) will still be lost.

```{r}
# A data frame with metadata
df <- data.frame(
  id = 1:3, 
  status = c("ok", "fail", "ok")
)

# Write as attribute
h5_write(df, file, "measurements/temperature", attr = "log")

# Column names survive!
h5_names(file, "measurements/temperature", attr = "log")
```
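
Since column names survive but `row.names` do not, a simple workaround is to copy the row names into an ordinary column before writing and move them back after reading. The sketch below uses only base R around `h5_write()` and `h5_read()`. The `row_id` column name and the scratch file are arbitrary choices for this example.

```{r}
library(h5lite)

rows_file <- tempfile(fileext = ".h5")
h5_write(1:3, rows_file, "demo/values")

samples <- data.frame(
  value = c(1.5, 2.5),
  row.names = c("s1", "s2")
)

# Copy the row names into a regular column before writing.
# `row_id` is an arbitrary column name chosen for this example.
to_store <- samples
to_store$row_id <- rownames(samples)
rownames(to_store) <- NULL
h5_write(to_store, rows_file, "demo/values", attr = "samples")

# After reading, move the column back into the row names
restored_df <- h5_read(rows_file, "demo/values", attr = "samples")
rownames(restored_df) <- as.character(restored_df$row_id)
restored_df$row_id <- NULL
restored_df

unlink(rows_file)
```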

### 2. No Attributes on Attributes (Nesting)

In HDF5, you cannot attach attributes to other attributes. The hierarchy is strictly one level deep: groups and datasets can have attributes, but attributes cannot have attributes of their own.

Consequently, you cannot treat an attribute as a "Group" or folder to store other items. If you need a hierarchical structure for your metadata, you should create a Group (e.g., `/metadata`) and store your metadata as Datasets inside it, rather than attaching them as attributes to another object.
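
A minimal sketch of that layout follows. The `metadata/...` paths and the values are illustrative, not a layout that `h5lite` requires. Each metadata item is an ordinary dataset, so it can sit at any depth.

```{r}
library(h5lite)

meta_file <- tempfile(fileext = ".h5")
h5_write(1:3, meta_file, "measurements/temperature")

# Nested metadata stored as datasets under a dedicated group.
# The "metadata/..." paths are illustrative, not a required layout.
h5_write(I("probe-A"),    meta_file, "metadata/instrument/model")
h5_write(I("2023-10-27"), meta_file, "metadata/instrument/calibrated")
h5_write(I("Lab 3"),      meta_file, "metadata/site/room")

# Each item is read back by its full path
instrument_model <- h5_read(meta_file, "metadata/instrument/model")

# Show the resulting hierarchy
h5_str(meta_file)

unlink(meta_file)
```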

## Controlling Attribute Types

Attributes in HDF5 are typed just like datasets. `h5lite` allows you to control the storage type of attributes using the `as` argument in `h5_write()` or `h5_read()`.

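To target an attribute of the object being written or read, name the corresponding element of the `as` vector after that attribute and prefix it with `@` (for example, `"@description"`).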

### Customizing Storage Type

```{r}
# Write the control data again, storing 'description' as a fixed-length string
h5_write(data, file, "experiment/control", as = c("@description" = "ascii[]"))

# Store an attribute as a `uint8` instead of the default `int32`
h5_write(I(42), file, "measurements/temperature", attr = "sensor_id", as = "uint8")
```

### Customizing Read Type

You can also coerce attributes when reading them.

```{r}
# Force the 'valid' attribute to be read as logical, even if stored as integer
meta <- h5_read(file, "experiment/control", attr = "valid", as = "logical")
```

## Special Note: Dimensions

You might notice that standard R attributes like `dim` are not visible in `h5_attr_names()`.

This is because `h5lite` handles structural attributes implicitly. The dimensions of a dataset or attribute are stored in its HDF5 Dataspace, not as a separate attribute. `h5lite` automatically restores the `dim` attribute on the R object when reading, ensuring matrices and arrays retain their shape.
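
As a small illustration, the sketch below stores a matrix as an attribute on a scratch dataset and reads it back. The file, path, and attribute names are made up for the example.

```{r}
library(h5lite)

shape_file <- tempfile(fileext = ".h5")
h5_write(1:3, shape_file, "demo/values")

# A 2 x 3 matrix stored as an attribute
calibration <- matrix(1:6, nrow = 2)
h5_write(calibration, shape_file, "demo/values", attr = "calibration")

# 'dim' is not listed among the object's attributes...
shape_attr_names <- h5_attr_names(shape_file, "demo/values")

# ...but the shape comes back with the data
restored_matrix <- h5_read(shape_file, "demo/values", attr = "calibration")
restored_shape  <- dim(restored_matrix)

unlink(shape_file)
```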

```{r, include=FALSE}
unlink(file)
```
