---
title: "Spatial Data Workflows with sf"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Spatial Data Workflows with sf}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  fig.alt = "Spatial map of Bayesian surprise values for an sf object."
)
```

```{r setup}
library(bayesiansurpriser)
library(sf)
library(ggplot2)
```

## Overview

The `bayesiansurpriser` package integrates directly with the `sf` package for spatial data. This vignette demonstrates workflows for computing, inspecting, and mapping Bayesian surprise over spatial units.

## Basic sf Workflow

### Loading Spatial Data

```{r load-data}
# Load the North Carolina SIDS dataset shipped with sf
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Examine the data
head(nc[, c("NAME", "SID74", "BIR74", "SID79", "BIR79")])
```

### Computing Surprise

The `surprise()` function automatically detects sf objects and uses the appropriate method:

```{r compute}
result <- surprise(nc, observed = SID74, expected = BIR74)

# Result is an sf object with surprise columns added
class(result)
names(result)[grepl("surprise", names(result))]
```

### Convenience Function: st_surprise

For explicit sf workflows, use `st_surprise()`:

```{r st-surprise}
result2 <- st_surprise(nc, observed = SID74, expected = BIR74)

# Identical results
all.equal(result$surprise, result2$surprise)
```

## Accessing Results

### Extracting Surprise Values

```{r extract}
# Get surprise values
surprise_vals <- get_surprise(result, "surprise")
head(surprise_vals)

# Get signed surprise values
signed_vals <- get_surprise(result, "signed")
head(signed_vals)
```

### Accessing the Model Space

```{r model-space}
mspace <- get_model_space(result)
print(mspace)
```

### Working with the sf Object

The result is a full sf object, so all sf operations work:

```{r sf-ops}
# Filter high-surprise regions
high_surprise <- result[result$surprise > median(result$surprise), ]
cat("Regions with above-median surprise:", nrow(high_surprise), "\n")

# Compute centroids; project to a planar CRS first, since st_centroid
# warns that centroids are not exact for longitude/latitude data
centroids <- st_centroid(st_transform(result, 32119))  # NAD83 / NC State Plane (m)
```
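Because the result keeps its geometry, spatial predicates apply directly as well. As a sketch (using only the `surprise` column computed above plus sf's `st_touches()`), here is how to find the counties adjacent to the single highest-surprise county:

```{r sf-neighbors}
# Find counties sharing a boundary with the highest-surprise county
top <- result[which.max(result$surprise), ]
touching <- lengths(st_touches(result, top)) > 0
result$NAME[touching]
```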

## Visualization

### ggplot2 Integration

```{r ggplot-basic}
ggplot(result) +
  geom_sf(aes(fill = surprise)) +
  scale_fill_surprise() +
  labs(title = "Bayesian Surprise: NC SIDS 1974")
```

```{r ggplot-signed}
ggplot(result) +
  geom_sf(aes(fill = signed_surprise)) +
  scale_fill_surprise_diverging() +
  labs(title = "Signed Surprise",
       subtitle = "Red = over-representation, Blue = under-representation")
```
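ggplot2 is not required: since the result is an sf object, sf's built-in plot method gives a quick choropleth of any surprise column with no extra setup.

```{r base-plot}
# Quick look using sf's base plot method
plot(result["surprise"], main = "Bayesian Surprise: NC SIDS 1974")
```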

## Comparing Time Periods

```{r compare}
# Compute surprise for two time periods
result_74 <- surprise(nc, observed = SID74, expected = BIR74)
result_79 <- surprise(nc, observed = SID79, expected = BIR79)

# Compare
nc$surprise_74 <- result_74$surprise
nc$surprise_79 <- result_79$surprise
nc$surprise_change <- nc$surprise_79 - nc$surprise_74

# Plot change
ggplot(nc) +
  geom_sf(aes(fill = surprise_change)) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       name = "Change in\nSurprise") +
  labs(title = "Change in Surprise: 1974 to 1979")
```

## Advanced: Custom Model Spaces

```{r custom}
# Create sophisticated model space
custom_space <- model_space(
  bs_model_uniform(),
  bs_model_baserate(nc$BIR74),
  bs_model_gaussian(),
  bs_model_funnel(nc$BIR74),
  prior = c(0.1, 0.3, 0.2, 0.4)
)

result_custom <- surprise(nc, observed = SID74, expected = BIR74,
                          models = custom_space)

# If needed, explicitly update the model space's posterior against the observed data
updated_space <- bayesian_update(custom_space, nc$SID74)
cat("Prior:", custom_space$prior, "\n")
cat("Posterior:", updated_space$posterior, "\n")
```

## Integration with dplyr

```{r dplyr, eval=FALSE}
library(dplyr)

# Pipe-friendly workflow
nc %>%
  st_surprise(observed = SID74, expected = BIR74) %>%
  filter(surprise > 0.5) %>%
  select(NAME, surprise, signed_surprise, geometry)
```
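Standard dplyr summaries work on the result too. As an illustrative sketch (assuming the `surprise` column added by `st_surprise()`; the tertile breaks are arbitrary), counties can be binned and counted by surprise level:

```{r dplyr-summary, eval=FALSE}
nc %>%
  st_surprise(observed = SID74, expected = BIR74) %>%
  mutate(level = cut(surprise,
                     breaks = quantile(surprise, c(0, 1/3, 2/3, 1)),
                     include.lowest = TRUE,
                     labels = c("low", "medium", "high"))) %>%
  st_drop_geometry() %>%  # drop geometry before a plain tabular summary
  count(level)
```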

## Tips for Large Datasets

1. **Pre-compute expected values**: if expected values are expensive to derive, compute them once and store them as a column rather than recomputing per call.
2. **Use appropriate models**: for very large datasets, the Sampled model with a smaller `sample_frac` can be substantially faster.
3. **Parallel processing**: the package is compatible with future/furrr for parallel computation across subsets.

```{r large-data, eval=FALSE}
# For large spatial datasets
result <- surprise(
  large_sf,
  observed = cases,
  expected = population,
  models = model_space(
    bs_model_uniform(),
    bs_model_baserate(large_sf$population)
  )
)
```
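Tip 3 can be sketched with future/furrr by splitting the data into row chunks and computing surprise per chunk in parallel. The chunking strategy below is illustrative, and it is only valid when surprise is computed independently per row; if the model space is fit against the whole dataset, chunked results will differ from a single global call.

```{r parallel, eval=FALSE}
library(future)
library(furrr)

plan(multisession, workers = 4)

# Split rows into 4 chunks, compute surprise per chunk in parallel,
# then bind the resulting sf objects back together
chunks <- split(large_sf, cut(seq_len(nrow(large_sf)), 4, labels = FALSE))
results <- future_map(chunks,
                      ~ surprise(.x, observed = cases, expected = population))
result <- do.call(rbind, results)
```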
