---
title: "Zero-Copy Julia Arrays in R with jlview"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Zero-Copy Julia Arrays in R with jlview}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    eval = FALSE,
    purl = FALSE
)
```

## Introduction

When working with Julia arrays from R via JuliaCall, every transfer copies data.
For a 50,000 x 25,000 Float64 matrix (~9.3 GB), that means allocating 9.3 GB on
the R side and spending seconds on the copy. If you are iterating on exploratory
analysis or building a pipeline that shuttles arrays back and forth, those copies
add up fast.

**jlview** eliminates that overhead using R's ALTREP (Alternative
Representations) framework. Instead of copying, `jlview()` returns a lightweight
R vector whose data pointer points directly into Julia's memory. R operations
like `sum()`, subsetting, and `colMeans()` read from Julia's buffer with zero
additional allocation.

| | Latency | R Memory |
|---|---|---|
| **jlview (zero-copy)** | 38 ms | 0 MB |
| **copy (collect)** | 2.7 s | 9.3 GB |
| **Improvement** | **72x faster** | **100% less** |

*Benchmark: 50K x 25K Float64 matrix (9.3 GB)*

## Getting Started

Install jlview from GitHub:

```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("tanaylab/jlview")
```

Before using jlview, initialize the Julia runtime via JuliaCall:

```{r}
library(jlview)
JuliaCall::julia_setup()
```

The `julia_setup()` call is required once per R session. jlview will
automatically load its Julia-side support module when you first call `jlview()`.

## Dense Arrays

### Vectors

Create a Julia vector and wrap it in an ALTREP view:

```{r}
JuliaCall::julia_command("v = randn(100_000)")
x <- jlview(JuliaCall::julia_eval("v"))

length(x) # 100000
sum(x) # computed directly from Julia memory
x[1:5] # subsetting works as usual
```

### Matrices

Two-dimensional Julia arrays become R matrices with proper dimensions:

```{r}
JuliaCall::julia_command("M = randn(1000, 500)")
m <- jlview(JuliaCall::julia_eval("M"))

dim(m) # [1] 1000  500
m[1:3, 1:3] # subset rows and columns
colSums(m) # column sums, no copy
```

### Verifying Zero-Copy

You can confirm that no R-side allocation occurred by checking `is.altrep()`:

```{r}
.Internal(inspect(x))
# Should show ALTREP wrapper, not a materialized REALSXP
```

## Type Handling

jlview supports the following Julia element types:

| Julia type | R type | Strategy |
|---|---|---|
| `Float64` | `numeric` | Direct zero-copy |
| `Int32` | `integer` | Direct zero-copy |
| `Float32` | `numeric` | Convert to Float64 in Julia, then zero-copy |
| `Int64` | `numeric` | Convert to Float64 in Julia, then zero-copy |
| `Int16` | `integer` | Convert to Int32 in Julia, then zero-copy |
| `UInt8` | `integer` | Convert to Int32 in Julia, then zero-copy |
| `Bool` | `logical` | Full copy (layout incompatible) |
| `String[]` | `character` | Full copy (layout incompatible) |

The conversion strategy is deliberate. Types like Float32 and Int64 do not have
a direct R counterpart with matching memory layout. jlview converts them once
on the Julia side into a layout-compatible type (Float64 or Int32), pins the
converted array, and then creates a zero-copy view of that. The one-time
conversion cost is small compared to copying across runtimes.

For Bool and String arrays, the memory layouts are fundamentally incompatible
(Julia Bool is 1 byte, R logical is 4 bytes; Julia strings are GC-managed
objects). These fall back to JuliaCall's standard copy path, and `jlview()` will
emit a warning.

## Named Arrays

Julia's NamedArrays package provides named dimensions. jlview has dedicated
functions that preserve these names without triggering ALTREP materialization.

### Named Vectors

```{r}
JuliaCall::julia_command("using NamedArrays")
JuliaCall::julia_command('nv = NamedArray([10.0, 20.0, 30.0], (["a", "b", "c"],))')
x <- jlview_named_vector(JuliaCall::julia_eval("nv"))

names(x) # [1] "a" "b" "c"
x["b"] # 20, still zero-copy for the data
```

### Named Matrices

```{r}
JuliaCall::julia_command('nm = NamedArray(randn(3, 2), (["r1","r2","r3"], ["c1","c2"]))')
m <- jlview_named_matrix(JuliaCall::julia_eval("nm"))

rownames(m) # [1] "r1" "r2" "r3"
colnames(m) # [1] "c1" "c2"
m["r1", "c2"]
```

Names are attached atomically during ALTREP construction. This is important
because setting `names()` or `dimnames()` on an existing ALTREP vector would
normally trigger materialization (a full copy), defeating the purpose. By passing
names through `jlview(..., names = ...)` or `jlview(..., dimnames = ...)`, the
names are set on the ALTREP object before R ever inspects the data.

## Sparse Matrices

Julia's `SparseMatrixCSC` maps naturally to R's `dgCMatrix` from the Matrix
package. `jlview_sparse()` constructs a dgCMatrix where the nonzero values
(`x` slot) are backed by a zero-copy ALTREP view of Julia's `nzval` array.

```{r}
JuliaCall::julia_command("using SparseArrays")
JuliaCall::julia_command("sp = sprand(Float64, 10000, 5000, 0.01)")
s <- jlview_sparse(JuliaCall::julia_eval("sp"))

class(s) # [1] "dgCMatrix"
dim(s) # [1] 10000  5000
Matrix::nnzero(s)
```

The row indices (`i` slot) and column pointers (`p` slot) require a 1-to-0 index
shift (Julia is 1-based, dgCMatrix is 0-based). These are copied and shifted in
Julia before being returned to R as plain integer vectors.

## Memory Management

jlview pins Julia arrays in a global dictionary to prevent Julia's garbage
collector from reclaiming them while R holds a reference. This means Julia memory
is held as long as the R ALTREP object exists.

### Three-Layer Defense

1. **Pinning dictionary** -- Each array is stored in `JlviewSupport.PINNED` with
   a unique ID. The C finalizer on the R ALTREP object calls `unpin()` when R
   garbage-collects the wrapper.

2. **GC pressure tracking** -- jlview tracks total pinned bytes and reports them
   to R via `R_AdjustExternalMemory()`. When pinned memory exceeds a threshold
   (default 10 GB), jlview forces an R `gc()` to reclaim stale ALTREP objects.

3. **Explicit release** -- For tight control, call `jlview_release()` to
   immediately unpin the array without waiting for R's GC.

### Explicit Release

```{r}
m <- jlview(JuliaCall::julia_eval("randn(10000, 1000)"))
# ... use m ...
jlview_release(m)
# m is now invalid; accessing it will error
```

### Scoped Release

`with_jlview()` guarantees release even if an error occurs:

```{r}
result <- with_jlview(JuliaCall::julia_eval("randn(100000)"), {
    c(mean(.x), sd(.x))
})
# .x is automatically released here, result is a plain R vector
```

### Tuning GC Pressure

```{r}
# Check current state
jlview_gc_pressure()
# $pinned_bytes
# [1] 80000000
# $threshold
# [1] 10737418240

# Lower the threshold to 500 MB
jlview_set_gc_threshold(500e6)
```

## Copy-on-Write Semantics

jlview objects follow R's standard copy-on-write (COW) semantics. Read
operations (subsetting, aggregation, printing) are zero-copy. Write operations
trigger materialization: R allocates a fresh buffer, copies the data from Julia,
and the ALTREP wrapper is replaced by a standard R vector.

```{r}
x <- jlview(JuliaCall::julia_eval("collect(1.0:5.0)"))
y <- x # y and x share Julia memory, no copy
sum(y) # zero-copy read

y[1] <- 999.0 # WRITE: triggers materialization
# y is now a standard R numeric vector (copy of Julia data, modified)
# x still points to Julia memory, unchanged
```

This is identical to how R treats any shared vector -- jlview does not introduce
new semantics. The only difference is that before materialization, the backing
store is Julia memory instead of R memory.

## Serialization

jlview objects can be saved with `saveRDS()` and restored with `readRDS()`.
On save, the data is materialized into a standard R vector (since Julia memory
cannot be serialized). On load, you get back a regular R vector.

```{r}
x <- jlview(JuliaCall::julia_eval("randn(1000)"))
saveRDS(x, "my_vector.rds")

# In a new session (no Julia needed):
y <- readRDS("my_vector.rds")
class(y) # "numeric" -- a plain R vector
```

This means serialization always works correctly, but the zero-copy property
is not preserved across save/load cycles.

## Known Limitations

- **`NA_integer_` collision** -- R uses `INT_MIN` (-2147483648) to represent
  `NA_integer_`. If a Julia Int32 array contains this exact value, R will
  interpret it as NA. There is no workaround short of avoiding this sentinel
  value in Julia integer arrays.

- **Int64 precision loss** -- Julia Int64 values outside the range
  +/-(2^53 - 1) lose precision when converted to Float64. jlview emits a
  warning if this is detected, but the conversion still proceeds.

- **Bool and String always copy** -- Julia's `Bool` (1 byte) is incompatible
  with R's `logical` (4 bytes), and Julia strings are GC-managed objects with
  no contiguous memory layout that R can point to. These types always fall back
  to a full copy via JuliaCall.

- **Write-back not supported** -- Modifications to jlview objects do not
  propagate back to Julia. Writes trigger R's copy-on-write, producing an
  independent R vector.

- **Single-session lifetime** -- jlview objects are tied to the Julia runtime
  in the current R session. They cannot be shared across processes or serialized
  without materialization.
