---
title: "Getting Started with SQIpro: Comprehensive Soil Quality Index"
author: "Your Name"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Getting Started with SQIpro}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
bibliography: references.bib
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>",
  fig.width = 7, fig.height = 5,
  warning = FALSE, message = FALSE
)
library(SQIpro)
```

# Introduction

Soil quality is defined as *"the capacity of a specific kind of soil to
function within natural or managed ecosystem boundaries, to sustain plant
and animal productivity, maintain or enhance water and air quality, and
support human health and habitation"* [@doran1994].

The **Soil Quality Index (SQI)** provides a single numeric score (0–1)
that integrates multiple soil physical, chemical, and biological
indicators. `SQIpro` implements six established methods for computing SQI,
a flexible variable scoring framework, and publication-quality
visualisation tools — all in a single CRAN-compliant R package.

## Key Concepts

| Term | Definition |
|------|-----------|
| **Minimum Data Set (MDS)** | The smallest subset of soil variables that adequately characterises soil quality [@andrews2004] |
| **Scoring function** | A transformation that converts raw variable values to a 0–1 score |
| **SQI** | Weighted or unweighted combination of variable scores |
| **Optimum** | The value or range of a variable associated with best soil function |

---

# Installation

```{r install, eval=FALSE}
# From CRAN (once published)
install.packages("SQIpro")

# Development version from GitHub
# install.packages("remotes")
remotes::install_github("yourname/SQIpro")
```

---

# The Example Dataset

`SQIpro` ships with `soil_data`, a hypothetical dataset of 100 soil
samples across 5 land-use systems and 2 depths, containing 12 soil
indicators.

```{r data}
data(soil_data)
dim(soil_data)
head(soil_data)
```

```{r data-summary}
table(soil_data$LandUse, soil_data$Depth)
```

---

# Step 1: Validate Your Data

Always validate before analysis:

```{r validate}
result <- validate_data(
  soil_data,
  group_cols = c("LandUse", "Depth")
)
result$n_per_group
```

---

# Step 2: Define Variable Configuration

This is the most important step. Each variable receives a scoring type:

- **`"more"`** — higher is better (e.g., organic carbon, microbial biomass)
- **`"less"`** — lower is better (e.g., bulk density, electrical conductivity)
- **`"opt"`** — a specific optimum range (e.g., pH 6.0–7.0)
- **`"trap"`** — trapezoidal with explicit zero boundaries

```{r config}
cfg <- make_config(
  variable    = c("pH",   "EC",   "BD",   "OC",   "MBC",
                  "PMN",  "Clay", "WHC",  "DEH",  "AP",  "TN"),
  type        = c("opt",  "less", "less", "more", "more",
                  "more", "opt",  "more", "more", "more","more"),
  opt_low     = c(6.0,   NA,     NA,     NA,     NA,
                  NA,     20,     NA,     NA,     NA,    NA),
  opt_high    = c(7.0,   NA,     NA,     NA,     NA,
                  NA,     35,     NA,     NA,     NA,    NA),
  description = c(
    "Soil pH (H2O 1:2.5)",
    "Electrical Conductivity (dS/m)",
    "Bulk Density (g/cm3)",
    "Organic Carbon (%)",
    "Microbial Biomass Carbon (mg/kg)",
    "Potentially Mineralizable N (mg/kg)",
    "Clay content (%)",
    "Water Holding Capacity (%)",
    "Dehydrogenase Activity (ug TPF/g/day)",
    "Available Phosphorus (mg/kg)",
    "Total Nitrogen (%)"
  )
)
print(cfg)
```

### Verify scoring curves

Before proceeding, always visualise your scoring curves to confirm
biological plausibility:

```{r scoring-curves, fig.height=7}
plot_scoring_curves(soil_data, cfg,
                    group_cols = c("LandUse", "Depth"))
```

---

# Step 3: Score All Variables

```{r score}
scored <- score_all(soil_data, cfg,
                    group_cols = c("LandUse", "Depth"))
head(scored[, c("LandUse", "Depth", "pH", "OC", "MBC")])
```

---

# Step 4: Select the Minimum Data Set (MDS)

```{r mds}
mds <- select_mds(scored,
                  group_cols     = c("LandUse", "Depth"),
                  load_threshold = 0.6)
mds$mds_vars
```

---

# Step 5: Compute SQI Using All Six Methods

## Linear Scoring

Equal-weight additive index [@doran1994]:

```{r linear}
res_lin <- sqi_linear(scored, cfg,
                      group_cols = c("LandUse", "Depth"),
                      mds_vars   = mds$mds_vars)
print(res_lin)
```

## Regression-Based

Variables weighted by stepwise regression coefficients [@masto2008]:

```{r regression}
res_reg <- sqi_regression(scored, cfg,
                           dep_var    = "OC",
                           group_cols = c("LandUse", "Depth"),
                           mds_vars   = mds$mds_vars)
print(res_reg)
```

## PCA-Based

Variables weighted by variance explained [@andrews2004; @bastida2008]:

```{r pca}
res_pca <- sqi_pca(scored, cfg,
                   group_cols = c("LandUse", "Depth"),
                   mds        = mds)
print(res_pca)
```

## Fuzzy Logic

Arithmetic or geometric mean of fuzzy membership scores:

```{r fuzzy}
res_fuz <- sqi_fuzzy(scored, cfg,
                     group_cols = c("LandUse", "Depth"),
                     mds_vars   = mds$mds_vars,
                     operator   = "mean")
print(res_fuz)
```

## Entropy Weighting

Objective weights derived from Shannon entropy [@shannon1948]:

```{r entropy}
res_ent <- sqi_entropy(scored, cfg,
                       group_cols = c("LandUse", "Depth"),
                       mds_vars   = mds$mds_vars)
print(res_ent)
cat("\nEntropy weights:\n")
print(attr(res_ent, "entropy_weights"))
```

## TOPSIS

Multi-criteria ranking by ideal solution distance [@hwang1981]:

```{r topsis}
res_top <- sqi_topsis(scored, cfg,
                      group_cols = c("LandUse", "Depth"),
                      mds_vars   = mds$mds_vars)
print(res_top)
```

## Compare All Methods

```{r compare}
comparison <- sqi_compare(scored, cfg,
                           group_cols = c("LandUse", "Depth"),
                           dep_var    = "OC",
                           mds        = mds)
print(comparison)
```

---

# Step 6: Visualisation

## Score Heatmap

```{r heatmap}
plot_scores(scored, cfg,
            group_cols = c("LandUse", "Depth"),
            group_by   = "LandUse",
            facet_by   = "Depth")
```

## SQI Bar Chart

```{r bar}
plot_sqi(res_lin,
         sqi_col   = "SQI_linear",
         group_col = "LandUse",
         fill_col  = "Depth")
```

## Radar Profile

```{r radar, fig.height=6, eval=requireNamespace("fmsb", quietly=TRUE)}
# Requires the 'fmsb' package: install.packages("fmsb")
plot_radar(scored, cfg,
           group_col  = "LandUse",
           group_cols = c("LandUse", "Depth"))
```

## PCA Biplot

```{r biplot}
plot_pca_biplot(mds, scored, group_col = "LandUse")
```

---

# Step 7: Statistical Analysis

## ANOVA

```{r anova}
# Compute per-observation index
scored$SQI_obs <- rowMeans(scored[, mds$mds_vars], na.rm = TRUE)

aov_res <- sqi_anova(scored,
                     sqi_col   = "SQI_obs",
                     group_col = "LandUse")
cat("ANOVA significant:", aov_res$significant, "\n")
print(aov_res$compact_letters)
```

## Sensitivity Analysis

```{r sensitivity}
sa <- sqi_sensitivity(scored, cfg,
                       group_cols = c("LandUse", "Depth"),
                       method     = "linear",
                       mds_vars   = mds$mds_vars)
print(sa)
```

## Sensitivity Tornado Chart

```{r tornado}
plot_sensitivity(sa)
```

---

# Recommended Workflow Diagram

```
raw data
   │
   ▼
validate_data()          ← check structure, missing values, groups
   │
   ▼
make_config()            ← define scoring type per variable
   │
   ▼
plot_scoring_curves()    ← verify biological plausibility
   │
   ▼
score_all()              ← transform all variables to 0–1
   │
   ▼
select_mds()             ← PCA-based minimum data set selection
   │
   ├──► sqi_linear()
   ├──► sqi_regression()
   ├──► sqi_pca()
   ├──► sqi_fuzzy()
   ├──► sqi_entropy()
   └──► sqi_topsis()
             │
             ▼
        sqi_compare()    ← unified results table + ranking
             │
             ▼
      Visualisation      ← plot_sqi(), plot_scores(), plot_radar()
             │
             ▼
      Statistics         ← sqi_anova(), sqi_sensitivity()
```

---

# References

```{r bib, echo=FALSE}
```
