---
title: "Constrained Linear Genomic Selection Indices"
author: "Zankrut Goyani"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Constrained Linear Genomic Selection Indices}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

This vignette demonstrates the methodology behind **Constrained Linear Genomic Selection Indices (LGSI)** based on Chapter 6 of the book *Linear Selection Indices in Modern Plant Breeding*. The objective of constrained selection indices is to maximize genetic advance for specific traits while imposing defined restrictions on other traits.

In a genomic selection context, these indices use Genomic Estimated Breeding Values (GEBVs). The core indices covered here include:

1. **Restricted Linear Genomic Selection Index (RLGSI)**: Aims to improve specific traits while leaving others fixed (null restriction).
2. **Predetermined Proportional Gain Linear Genomic Selection Index (PPG-LGSI)**: Aims to achieve specific, predetermined proportions of change in the population mean of designated traits.
3. **Combined RLGSI (CRLGSI)**: Extends RLGSI to use phenotypic and GEBV information jointly.
4. **Combined PPG-LGSI (CPPG-LGSI)**: Extends PPG-LGSI to use phenotypic and GEBV information jointly.

## Environment Setup and Data Preparation

We will utilize synthetic phenotypic and genotypic maize data provided in the `selection.index` package to illustrate these methods.

### 1. Estimating GEBVs

Before applying these indices, we first estimate genomic breeding values. We use Ridge Regression (`lm.ridge`) from the `MASS` package as a rapid and simple method to simulate this step linking marker data to phenotype data.

```{r setup, message=FALSE, warning=FALSE}
library(selection.index)
library(MASS) # For Ridge regression

# Load the maize datasets
data("maize_pheno", package = "selection.index")
data("maize_geno", package = "selection.index")

# Extract only the traits we care about to prevent missing value logic failures
traits <- c("Yield", "PlantHeight", "DaysToMaturity")
maize_pheno_clean <- na.omit(maize_pheno[, c("Genotype", "Block", traits)])

# Calculate mean performance for phenotypic data
# Providing only the traits to `data` to accurately generate the table
pheno_means_full <- mean_performance(
  data = maize_pheno_clean[, traits],
  genotypes = maize_pheno_clean$Genotype,
  replications = maize_pheno_clean$Block,
  method = "Mean"
)

# Remove the trailing 9 rows composed of summary text-stats automatically generated
pheno_means <- pheno_means_full[1:(nrow(pheno_means_full) - 9), ]

# Extract numerical phenotype matrix
Y <- as.matrix(pheno_means[, traits])
mode(Y) <- "numeric"
rownames(Y) <- pheno_means$Genotypes

# Format genotype matrix
X <- as.matrix(maize_geno)

# Ensure matching dimensions and align the matrices
common_genotypes <- intersect(rownames(Y), rownames(X))

# Filter and sort matrices to match genotypes sequence exactly
Y_filtered <- Y[common_genotypes, ]
X_filtered <- X[common_genotypes, ]

# Calculate GEBVs using a simple Ridge Regression model per trait
gebvs_sim <- matrix(0, nrow = nrow(Y_filtered), ncol = ncol(Y_filtered))
colnames(gebvs_sim) <- colnames(Y_filtered)
rownames(gebvs_sim) <- rownames(Y_filtered)
lambda_ridge <- 0.1

for (j in seq_len(ncol(Y_filtered))) {
  # Fit ridge regression to simulate marker effects
  model_ridge <- lm.ridge(Y_filtered[, j] ~ X_filtered, lambda = lambda_ridge)
  beta_ridge <- coef(model_ridge)[-1]

  # Predict GEBVs
  gebvs_sim[, j] <- X_filtered %*% matrix(beta_ridge, ncol = 1)
}

# Define Covariance Matrices
Gamma <- cov(gebvs_sim) # Genomic Covariance Matrix estimated from simulated GEBVs
P_matrix <- cov(Y_filtered) # Phenotypic Covariance Matrix
C_matrix <- Gamma # Genetic Covariance (Assuming Gamma captures genetic additive cov)

# Define vector of economic weights for Yield, PlantHeight, DaysToMaturity
w <- matrix(c(5, -0.1, -0.1), ncol = 1)
```

## 6.1 The Restricted Linear Genomic Selection Index (RLGSI)

The RLGSI is applied in a testing population when only GEBVs are available. The goal is to maximize the response for some traits while restricting the expected generic advance of other traits to exactly zero.

### Mathematical Definition
The expected genetic advance for RLGSI is restricted by:
$Cov(I_{RG}, \mathbf{U}'\mathbf{g}) = \mathbf{U}'\boldsymbol{\Gamma}\boldsymbol{\beta}_{RG} = \mathbf{0}$, where $\mathbf{U}'$ is a matrix of 1s and 0s identifying the restricted traits.

Following the maximization step of the selection response, the index coefficients $\boldsymbol{\beta}_{RG}$ optimally resolve as:
$$ \boldsymbol{\beta}_{RG} = \mathbf{K}_{G}\mathbf{w} $$
where $\mathbf{K}_{G} = [\mathbf{I} - \mathbf{Q}_{G}]$ and $\mathbf{Q}_{G} = \mathbf{U} (\mathbf{U}' \boldsymbol{\Gamma} \mathbf{U})^{-1} \mathbf{U}' \boldsymbol{\Gamma}$.

### RLGSI Implementation Example

Suppose we want to maximize the gain using our three traits, but we want to hold **Plant Height** and **Days To Maturity** completely fixed ($0$ gain).

```{r rlgsi_example}
# Define the restriction matrix U
# We restrict traits 2 (PlantHeight) and 3 (DaysToMaturity).
# Trait 1 (Yield) is left unrestricted.
U_rlgsi <- matrix(c(
  0, 0, # Yield unrestricted
  1, 0, # PlantHeight restricted
  0, 1 # DaysToMaturity restricted
), nrow = 3, byrow = TRUE)

# Calculate RLGSI
rlgsi_res <- rlgsi(Gamma = Gamma, wmat = w, U = U_rlgsi)

# View the resulting index coefficients and Expected Genetic Gain
cat("RLGSI Coefficients (beta_RG):\n")
print(rlgsi_res$b)

cat("\nExpected Genetic Gain per Trait:\n")
print(rlgsi_res$Summary[, "Delta_H", drop = FALSE])
```

Notice in the output that the Expected Genetic Gain (`Delta_H`) for traits 2 and 3 evaluates to a precision limit of $0$, fulfilling the null-gain restriction.

## 6.2 The Predetermined Proportional Gain Linear Genomic Selection Index (PPG-LGSI)

Instead of forcing a trait to remain fixed at $0$, we might want a trait's mean value to undergo a highly specific, predefined amount of selection shift.

### Mathematical Definition
Let $\mathbf{d}' = [d_1, d_2, \ldots, d_r]$ be the proportional gain values requested by the breeder.
Similar to the RLGSI, the coefficient array evaluates identically with a targeted constant proportionality factor restricting expected proportional shifts:
$$ \boldsymbol{\beta}_{PG} = \mathbf{K}_{P}\mathbf{w} $$

### PPG-LGSI Implementation Example

Suppose we want to target **Yield** to proportionally increase by $7.0$ units, and identically target **Plant Height** to slightly decrease by relatively $-3.0$ proportion targets.

```{r ppglgsi_example}
# Define the restriction matrix U for PPG
# Restrict traits 1 (Yield) and 2 (PlantHeight) to predetermined scalars
U_ppg <- matrix(c(
  1, 0, # Yield restricted
  0, 1, # PlantHeight restricted
  0, 0 # DaysToMaturity unrestricted
), nrow = 3, byrow = TRUE)

# Define the predetermined values vector 'd'
d_vec <- matrix(c(7.0, -3.0), ncol = 1)

# Calculate PPG-LGSI
ppg_res <- ppg_lgsi(Gamma = Gamma, d = d_vec, wmat = w, U = U_ppg)

cat("PPG-LGSI Coefficients (beta_PG):\n")
print(ppg_res$b)
```

## 6.3 Combined Restricted Linear Genomic Selection Index (CRLGSI)

The CRLGSI extends the basic RLGSI by integrating *both* phenotypic information and genomic breeding values structurally as blocks in a combined index $I_C$. It is mainly applied to a training population where joint data is uniformly available.

Its mathematical structure models parallel coefficient distributions across phenotypic and genomic parameters uniformly:
$$ \boldsymbol{\beta}_{CR} = \mathbf{K}_{C}\boldsymbol{\beta}_C $$

For an index combining phenotypes and GEBVs, restrictions must concurrently be assigned to both empirical dimensions globally.

```{r crlgsi_example}
# 1. Provide Combined Matrices
# The combined Phenotypic-Genomic covariance matrix T_C
T_C <- rbind(
  cbind(P_matrix, Gamma),
  cbind(Gamma, Gamma)
)

# The combined Genetic-Genomic covariance matrix Psi_C
Psi_C <- rbind(
  cbind(C_matrix, Gamma),
  cbind(Gamma, Gamma)
)

# 2. Define Restriction Matrix U_C
# Note: For combined indices of `t` traits, U_C spans exactly 2t rows.
# Restricting Trait 1 (Yield) on both empirical and genomic parameters:
U_crlgsi <- matrix(0, nrow = 6, ncol = 2)
U_crlgsi[1, 1] <- 1 # Trait 1 phenotypic component
U_crlgsi[4, 2] <- 1 # Trait 1 genomic component

# 3. Calculate CRLGSI
crlgsi_res <- crlgsi(T_C = T_C, Psi_C = Psi_C, wmat = w, U = U_crlgsi)

cat("CRLGSI Combined Coefficients (beta_CR):\n")
print(crlgsi_res$b)
```

## 6.4 Combined Predetermined Proportional Gain Linear Genomic Selection Index (CPPG-LGSI)

The CPPG-LGSI combines the dynamic proportional gain vectors to the $T_C$ and $\Psi_C$ data blocks identically leveraging the `cppg_lgsi()` logic.

### CPPG-LGSI Implementation Example

Suppose we arbitrarily evaluate **Yield** to predetermined restriction constants.

```{r cppglgsi_example}
# Target Yield using predetermined combined proportions d
d_combined <- matrix(c(7.0, 3.5), ncol = 1)

# Calculate CPPG-LGSI dynamically
cppg_res <- cppg_lgsi(T_C = T_C, Psi_C = Psi_C, d = d_combined, wmat = w, U = U_crlgsi)

cat("CPPG-LGSI Coefficients (beta_CP):\n")
print(cppg_res$b)
```

## Summary Comparison

Below we extract the standard metrics associated with each index model (i.e Selection Response $R$):

```{r summary_comparison}
comparison_df <- data.frame(
  Index = c("RLGSI", "PPG-LGSI", "CRLGSI", "CPPG-LGSI"),
  Selection_Response = c(
    rlgsi_res$R,
    ppg_res$R,
    crlgsi_res$R,
    cppg_res$R
  ),
  Overall_Genetic_Advance = c(
    rlgsi_res$GA,
    ppg_res$GA,
    crlgsi_res$GA,
    cppg_res$GA
  )
)

print(comparison_df)
```