---
title: "Matrices and Arrays"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Matrices and Arrays}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

HDF5 is an excellent format for storing large, multi-dimensional numerical arrays. `h5lite` simplifies the process of reading and writing matrices and arrays by handling the complex memory layout differences between R and HDF5 automatically.

This vignette covers writing matrices, preserving dimension names (`dimnames`), and understanding how `h5lite` manages dimension ordering.

```{r setup}
library(h5lite)
file <- tempfile(fileext = ".h5")
```

## Writing Matrices

In R, matrices are simply 2-dimensional arrays. You can write them directly using `h5_write()`. `h5lite` preserves the dimensions exactly as they appear in R.

```{r}
# Create a 3x4 matrix
mat <- matrix(1:12, nrow = 3, ncol = 4)

# Write to file
h5_write(mat, file, "linear_algebra/mat_a")

# Read back
mat_in <- h5_read(file, "linear_algebra/mat_a")

# Verify
all.equal(mat, mat_in)
```

## Writing N-Dimensional Arrays

The same logic applies to arrays with 3 or more dimensions.

```{r}
# Create a 3D array (e.g., spatial data over time: x, y, time)
vol <- array(runif(24), dim = c(4, 3, 2))

h5_write(vol, file, "spatial/volume")

# Check dimensions without reading the full data
h5_dim(file, "spatial/volume")
```

## Dimension Names (dimnames)

R objects often carry metadata in the form of `dimnames` (row names, column names, etc.). HDF5 does not have a native "row name" concept for numerical arrays, but it supports **Dimension Scales**.

`h5lite` automatically converts R `dimnames` into HDF5 Dimension Scales. This allows your row and column names to survive the round-trip to disk and back.

```{r}
# Create a matrix with row and column names
data <- matrix(rnorm(6), nrow = 2)
rownames(data) <- c("Sample_A", "Sample_B")
colnames(data) <- c("Gene_1", "Gene_2", "Gene_3")

h5_write(data, file, "genetics/expression")

# Read back
data_in <- h5_read(file, "genetics/expression")

print(data_in)
```

> **Technical Note:** In the HDF5 file, the names are stored as separate datasets (e.g., `_rownames`, `_colnames`) and linked to the main dataset using HDF5 Dimension Scale attributes.

## Dimension Ordering (Row-Major vs. Column-Major)

One of the most confusing aspects of HDF5 for R users is dimension ordering.

* **R** is **Column-Major**: The first dimension varies fastest.
* **HDF5** (and C/C++/Python) is **Row-Major**: The last dimension varies fastest.
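The difference is easy to see in base R, which always fills and flattens matrices in column-major order:

```{r}
m <- matrix(1:6, nrow = 2, ncol = 3)

# Column-major (R): the first index varies fastest
as.vector(m)    # 1 2 3 4 5 6

# Row-major (C / HDF5): the last index varies fastest,
# which is the order a transpose produces
as.vector(t(m)) # 1 3 5 2 4 6
```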

### How h5lite handles it

To ensure that a `3x4` matrix in R looks like a `3x4` dataset in HDF5 tools (like `h5dump` or `HDFView`), `h5lite` **rearranges** the data during read/write operations.

1.  **Writing:** `h5lite` converts R's column-major memory layout to HDF5's row-major layout.
2.  **Reading:** `h5lite` converts the data back to column-major for R.

This ensures that **indexing is preserved**. `x[2, 1]` in R refers to the exact same value after reading it back from HDF5.
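The rearrangement can be sketched in base R with `aperm()`: reversing the dimension order and flattening yields the row-major stream written to disk, and applying the inverse permutation on read restores the original layout. (This is an illustration of the idea, not the package's actual implementation.)

```{r}
# A 3x4 matrix in R (stored column-major in memory)
x <- matrix(1:12, nrow = 3, ncol = 4)

# Writing: permute to reversed dimension order, then flatten.
# This is the row-major (C-order) stream that lands on disk.
on_disk <- as.vector(aperm(x, c(2, 1)))

# Reading: interpret the stream as the reversed shape (4x3),
# then permute the dimensions back to 3x4.
x_back <- aperm(array(on_disk, dim = c(4, 3)), c(2, 1))

# Indexing is preserved: x[2, 1] is the same value before and after
identical(x, x_back)
```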

### Interoperability with Python

Because `h5lite` writes the data in C order (row-major), matching the HDF5 convention, files created with `h5lite` can be read directly from Python with `h5py` (and from there loaded into NumPy or pandas).

* **R:** Shape is `(3, 4)`
* **Python:** Shape is `(3, 4)`

*Note: Some other R packages create HDF5 files by swapping the dimensions (writing a 3x4 matrix as 4x3) to avoid the cost of transposing data. `h5lite` prioritizes correctness and interoperability over raw write speed.*

## Compression and Chunking

Matrices and arrays benefit significantly from compression. When you enable compression, `h5lite` automatically "chunks" the dataset (breaks it into smaller tiles), since HDF5 can only apply compression filters to chunked datasets.

```{r}
# Large matrix of zeros (highly compressible)
sparse_mat <- matrix(0, nrow = 1000, ncol = 1000)
sparse_mat[1:10, 1:10] <- 1

# Write with default compression (zlib level 5)
h5_write(sparse_mat, file, "compressed/matrix")

# Write with high compression (zlib level 9)
h5_write(sparse_mat, file, "compressed/matrix_max", compress = "gzip-9")
```

## Partial I/O

`h5lite` is designed for simplicity and currently reads/writes full datasets at once. It does **not** support partial I/O (hyperslabs), such as reading only rows 1-10 of a 1,000,000 row matrix.

If you need to read specific subsets of data that are too large to fit in memory, you should consider using the `rhdf5` or `hdf5r` packages.
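As a sketch of what that looks like with `rhdf5` (not evaluated here: `rhdf5` is a Bioconductor package, and this assumes a dataset at the path used above), the `index` argument of `rhdf5::h5read()` selects a hyperslab so only the requested elements are read from disk:

```{r, eval = FALSE}
# Not run: requires the Bioconductor package rhdf5
library(rhdf5)

# Read rows 1-10 and all columns of a large matrix
subset <- h5read(file, "compressed/matrix", index = list(1:10, NULL))
dim(subset)
```

Note that `rhdf5` applies its own dimension-ordering convention, so the dimensions of a file written by another tool may appear reversed.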

```{r, include=FALSE}
unlink(file)
```

