---
title: "Complete Workflow: From Data to Results"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Complete Workflow: From Data to Results}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 5
)
```

## From Raw Data to Publication-Ready Results

This guide walks through a complete analysis workflow using taxdiv --- from
preparing your data to interpreting and exporting results. By the end, you
will have computed all 14 diversity indices, generated diagnostic plots, and
saved results to a file.

```{r setup}
library(taxdiv)
```

## Step 1: Prepare Your Data

taxdiv needs two inputs:

1. **Community data** --- a named numeric vector (single site) or a data frame
   with a `Site` column (multiple sites)
2. **Taxonomic tree** --- a data frame with `Species` and at least one
   higher-level column (Genus, Family, Order, etc.)

### Single community (named vector)

```{r single_site}
community <- c(
  Quercus_coccifera    = 25,
  Quercus_infectoria   = 18,
  Pinus_brutia         = 30,
  Pinus_nigra          = 12,
  Juniperus_excelsa    = 8,
  Juniperus_oxycedrus  = 6,
  Arbutus_andrachne    = 15,
  Styrax_officinalis   = 4,
  Cercis_siliquastrum  = 3,
  Olea_europaea        = 10
)
```

### Taxonomic tree

Use `build_tax_tree()` to create a properly structured tree. Column order
defines the hierarchy --- species-level first, then progressively higher ranks:

```{r tree}
tax_tree <- build_tax_tree(
  species = names(community),
  Genus   = c("Quercus", "Quercus", "Pinus", "Pinus",
              "Juniperus", "Juniperus", "Arbutus", "Styrax",
              "Cercis", "Olea"),
  Family  = c("Fagaceae", "Fagaceae", "Pinaceae", "Pinaceae",
              "Cupressaceae", "Cupressaceae", "Ericaceae", "Styracaceae",
              "Fabaceae", "Oleaceae"),
  Order   = c("Fagales", "Fagales", "Pinales", "Pinales",
              "Pinales", "Pinales", "Ericales", "Ericales",
              "Fabales", "Lamiales")
)

tax_tree
```

### Multiple sites (data frame)

For multi-site analysis, your data should look like this:

```{r multi_site}
multi_data <- data.frame(
  Site = rep(c("Forest_A", "Forest_B"), each = 5),
  Species = c("Quercus_coccifera", "Pinus_brutia", "Juniperus_excelsa",
              "Arbutus_andrachne", "Olea_europaea",
              "Quercus_coccifera", "Quercus_infectoria", "Pinus_brutia",
              "Pinus_nigra", "Cercis_siliquastrum"),
  Abundance = c(25, 30, 8, 15, 10,
                40, 20, 15, 10, 5),
  Genus = c("Quercus", "Pinus", "Juniperus", "Arbutus", "Olea",
            "Quercus", "Quercus", "Pinus", "Pinus", "Cercis"),
  Family = c("Fagaceae", "Pinaceae", "Cupressaceae", "Ericaceae", "Oleaceae",
             "Fagaceae", "Fagaceae", "Pinaceae", "Pinaceae", "Fabaceae"),
  Order = c("Fagales", "Pinales", "Pinales", "Ericales", "Lamiales",
            "Fagales", "Fagales", "Pinales", "Pinales", "Fabales")
)

head(multi_data)
```

## Step 2: Compute All Indices at Once

### Single community --- `compare_indices()`

```{r single_compare}
results <- compare_indices(community, tax_tree)
results
```

This returns all 14 indices: Shannon, Simpson, Delta, Delta*, AvTD, VarTD,
and the 8 Ozkan pTO indices (uTO, TO, uTO+, TO+, and their max variants).

### Multiple sites --- `batch_analysis()`

```{r batch}
batch_results <- batch_analysis(multi_data)
batch_results
```

`batch_analysis()` automatically detects the `Site`, `Species`, and
`Abundance` columns and computes all indices for each site. You can also
use `summary()` to see statistics across sites:

```{r batch_summary}
summary(batch_results)
```

## Step 3: Run the Full Ozkan Pipeline

For deeper analysis, the full three-run pipeline reveals species-level
contributions to diversity:

```{r full_pipeline}
full <- ozkan_pto_full(community, tax_tree, n_iter = 101, seed = 42)

cat("Run 1 (deterministic):\n")
cat("  uTO+ =", round(full$run1$uTO_plus, 4), "\n")
cat("  TO+  =", round(full$run1$TO_plus, 4), "\n\n")

cat("Run 2 (stochastic maximum):\n")
cat("  uTO+ =", round(full$run2$uTO_plus_max, 4), "\n")
cat("  TO+  =", round(full$run2$TO_plus_max, 4), "\n\n")

cat("Run 3 (sensitivity maximum):\n")
cat("  uTO+ =", round(full$run3$uTO_plus_max, 4), "\n")
cat("  TO+  =", round(full$run3$TO_plus_max, 4), "\n")
```

## Step 4: Visualize Results

### Compare communities side by side

```{r compare_plot, fig.width=10, fig.height=6, fig.alt="Bar chart comparing diversity indices between intact and degraded forest communities"}
degraded <- c(
  Quercus_coccifera   = 40,
  Pinus_brutia        = 35,
  Juniperus_oxycedrus = 10
)
tax_degraded <- tax_tree[tax_tree$Species %in% names(degraded), ]

communities <- list(
  "Intact Forest" = community,
  "Degraded"      = degraded
)

comp <- compare_indices(communities, tax_tree, plot = TRUE)
comp$plot
```

### Inspect the taxonomic tree

```{r tree_plot, fig.width=9, fig.height=5.5, fig.alt="Dendrogram showing community taxonomic structure colored by family"}
plot_taxonomic_tree(tax_tree, community = community,
                    color_by = "Family",
                    title = "Community Taxonomic Structure")
```

### Rarefaction curve

```{r rarefaction, fig.width=8, fig.height=5, fig.alt="Shannon rarefaction curve with bootstrap confidence interval"}
rare <- rarefaction_taxonomic(community, tax_tree,
                               index = "shannon",
                               steps = 10, n_boot = 50, seed = 42)
plot_rarefaction(rare)
```

## Step 5: Export Results

### Save to CSV

```{r export_csv, eval=FALSE}
# Single community
df <- as.data.frame(t(compare_indices(community, tax_tree)$table))
write.csv(df, "results.csv", row.names = FALSE)

# Multi-site batch
write.csv(as.data.frame(batch_results$results),
          "batch_results.csv", row.names = FALSE)
```

### Save to Excel

```{r export_xlsx, eval=FALSE}
# Requires writexl package
# install.packages("writexl")
writexl::write_xlsx(as.data.frame(batch_results$results),
                    "batch_results.xlsx")
```

## Quick Reference: Which Function to Use?

| I want to... | Function |
|--------------|----------|
| Compute Shannon/Simpson | `shannon()`, `simpson()` |
| Compute all 14 indices for one community | `compare_indices()` |
| Compute all indices for many sites | `batch_analysis()` |
| Run the full Ozkan pipeline (Run 1+2+3) | `ozkan_pto_full()` |
| Test if AvTD/VarTD is significant | `simulate_td()` + `plot_funnel()` |
| Check if sampling effort is sufficient | `rarefaction_taxonomic()` + `plot_rarefaction()` |
| Compare two or more communities visually | `compare_indices(plot=TRUE)` or `plot_radar()` |
| See species contributions | `plot_bubble()` |
| View the taxonomic hierarchy | `plot_taxonomic_tree()` |
| Create a taxonomic tree from species data | `build_tax_tree()` |
