| Type: | Package |
| Title: | Structured Screen-and-Select Variable Selection in Linear, Generalized Linear, and Survival Models |
| Version: | 1.0 |
| Date: | 2025-12-30 |
| Maintainer: | Nilotpal Sanyal <nsanyal@utep.edu> |
| Description: | Performs variable selection using the structured screen-and-select (S3VS) framework in linear models, generalized linear models with binary data, and survival models such as the Cox model and accelerated failure time (AFT) model. |
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
| Imports: | glmnet, ncvreg, survival, mombf, future.apply, eha, pec, aftgee, afthd |
| Suggests: | rjags, knitr, doParallel |
| VignetteBuilder: | knitr |
| NeedsCompilation: | no |
| Packaged: | 2026-01-09 13:00:43 UTC; nsanyal |
| Author: | Nilotpal Sanyal [aut, cre], Padmore N. Prempeh [aut] |
| Repository: | CRAN |
| Date/Publication: | 2026-01-13 18:00:14 UTC |
Structured Screen-and-Select Variable Selection in Linear, Generalized Linear, and Survival Models
Description
Performs variable selection using the structured screen-and-select (S3VS) framework in linear models, generalized linear models with binary data, and survival models such as the Cox model and accelerated failure time (AFT) model.
Details
The S3VS package implements the Structured Screen-and-Select Variable Selection (S3VS) framework for linear models, generalized linear models with binary responses, and survival models (Cox proportional hazards and accelerated failure time models).
The central entry point is S3VS, which dispatches to a family-specific routine via the argument family:
- S3VS_LM for linear models,
- S3VS_GLM for generalized linear models with binary outcomes,
- S3VS_SURV for survival models.
The S3VS workflow proceeds through the following steps, each handled by helper functions:
- Stopping rule check: looprun determines whether the iterative screen-and-select process should continue.
- Leading variable identification: get_leadvars identifies leading variables; family-specific versions are get_leadvars_LM, get_leadvars_GLM, and get_leadvars_SURV.
- Leading set identification: get_leadsets identifies the leading set for each leading variable.
- Selection within leading sets: VS_method performs selection within leading sets; family-specific methods include VS_method_LM, VS_method_GLM, and VS_method_SURV, and bridge_aft implements BRIDGE specifically for AFT models.
- Aggregation of selected variables: select_vars retains promising variables as selected from an iteration.
- Aggregation of non-selected variables (optional): remove_vars removes variables deemed uninformative from future iterations (if no variable is selected in the current iteration by select_vars).
- Response update (optional): update_y enables iterative response updates; family-specific variants include update_y_LM and update_y_GLM.
Together, these functions form a structured, iterative pipeline for efficient variable screening and selection in high-dimensional regression and survival analysis.
- Prediction: pred_S3VS produces predictions using variables selected by S3VS, calling pred_S3VS_LM, pred_S3VS_GLM, or pred_S3VS_SURV as appropriate.
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
Maintainer: Nilotpal Sanyal <nsanyal@utep.edu>
Structured Screen-and-Select Variable Selection in Linear, Generalized Linear, and Survival Models
Description
S3VS is the main function that performs variable selection based on the structured screen-and-select framework in linear, generalized linear, and survival models.
Usage
S3VS(
y,
X,
family = c("normal", "binomial", "survival"),
cor_xy = NULL,
surv_model = c("COX", "AFT"),
method_xy = c("topk", "fixedthresh", "percthresh"),
param_xy,
method_xx = c("topk", "fixedthresh", "percthresh"),
param_xx,
vsel_method = NULL,
alpha = 0.5,
method_sel = c("conservative", "liberal"),
method_rem = c("conservative_begin", "conservative_end", "liberal"),
sel_regout = FALSE,
rem_regout = FALSE,
update_y_thresh = 0.5,
m = 100,
nskip = 3,
verbose = FALSE,
seed = NULL,
parallel = FALSE
)
Arguments
y |
Response. If |
X |
Design matrix of predictors. Can be a base matrix or something |
family |
Model family; one of |
cor_xy |
Optional numeric vector of precomputed marginal correlations between |
surv_model |
Character string specifying the survival model (for |
method_xy |
Rule for screening some predictors as "leading variables" based on their association with the response; one of
|
param_xy |
Tuning parameter for |
method_xx |
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of |
param_xx |
Tuning parameter for |
vsel_method |
Character string specifying the variable selection method to be used within each leading set. Available options depend on the model type:
|
alpha |
Only used when |
method_sel |
Policy for aggregating predictors selected across leading sets in an iteration; one of |
method_rem |
Policy for excluding predictors when no selections are made in an iteration; one of |
sel_regout |
Logical (GLM only). If |
rem_regout |
Logical (for LM and GLM only). If |
update_y_thresh |
Numeric scalar threshold controlling how the working response |
m |
Integer. Maximum number of S3VS iterations to perform. Defaults to |
nskip |
Integer. Maximum number of iterations in which no new predictors are selected before the algorithm stops. Defaults to |
verbose |
Logical. If |
seed |
If supplied, sets the random seed via |
parallel |
Logical. If |
Details
Model
For a continuous response, S3VS considers the linear model (LM)
\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}
For a binary response, S3VS considers the generalized linear model (GLM)
g\!\left( E\!\left( \boldsymbol{y} \mid \boldsymbol{X} \right) \right)
= \boldsymbol{X}\boldsymbol{\beta}
For a survival type response, S3VS considers two choices of models–the Cox model
\lambda(t\mid \boldsymbol{x}_i) = \lambda_0(t) \exp(\boldsymbol{x}_i^T \boldsymbol{\beta})
and the AFT model
\log(\boldsymbol{T}) = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}
S3VS algorithm
The general form of the S3VS algorithm consists of the following steps, repeated iteratively until convergence:
- Determination of leading variables: 'Leading variables' are determined based on the association of the predictors with the response, following one of three rules. The rule is fixed by the arguments method_xy and param_xy.
- Determination of leading sets: For each leading variable, a group of related predictors, called the 'leading set', is determined based on the association of all candidate predictors with the leading variable, following one of three rules. The rule is fixed by the arguments method_xx and param_xx.
- Variable selection: Within each leading set, small to moderate-dimensional variable selection is performed using a method fixed by vsel_method.
- Aggregation of selected/not-selected variables: Variables selected/not-selected in different leading sets are aggregated using several possible rules, fixed by method_sel and method_rem.
- Update of the response and/or the set of covariates: At the end of each iteration, the response and the predictors may optionally be updated, as controlled by the arguments sel_regout, rem_regout, and update_y_thresh.
The convergence criterion is determined jointly by the arguments m and nskip. For more details of the individual steps, see the manual pages of the functions linked below.
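The first two steps of the algorithm can be illustrated with a minimal base-R sketch. It assumes that the association measure is the absolute Pearson correlation and that the "topk" rule keeps the k largest values; the helpers toy_leadvars_topk and toy_leadset_topk are hypothetical names introduced here for illustration and are not the package's get_leadvars or get_leadsets implementations.

```r
# Illustrative only: these helpers are hypothetical and not part of S3VS.
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(n)

# Step 1 (assumed form): rank predictors by |cor(x_j, y)| and keep the top k.
toy_leadvars_topk <- function(y, X, k) {
  assoc <- abs(drop(cor(X, y)))
  names(assoc) <- colnames(X)
  names(sort(assoc, decreasing = TRUE))[seq_len(k)]
}

# Step 2 (assumed form): for one leading variable, keep the k predictors
# with the largest |cor| with it (the leading variable itself included).
toy_leadset_topk <- function(x_lead, X, k) {
  assoc <- abs(drop(cor(X, x_lead)))
  names(assoc) <- colnames(X)
  names(sort(assoc, decreasing = TRUE))[seq_len(k)]
}

lead <- toy_leadvars_topk(y, X, k = 1)
lead_set <- toy_leadset_topk(X[, lead], X, k = 3)
lead
lead_set
```

Because a variable has correlation 1 with itself, the leading variable always belongs to its own leading set under this assumed rule. Variable selection (step 3) would then be run on X[, lead_set] only, which keeps each selection problem small.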
Value
A list with the following components:
selected |
A character vector of predictor names that were selected across all iterations. |
selected_iterwise |
A list recording the predictors selected at each iteration, in the order they were considered. |
runtime |
Runtime in seconds. |
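As a small sketch of how the returned list might be inspected, the code below builds a hypothetical result object with the three documented components (the values are made up purely for illustration and do not come from an actual S3VS run) and summarizes it with base R.

```r
# Hypothetical result with the documented components; values are invented
# for illustration and are not output from S3VS.
res <- list(
  selected = c("V1", "V2"),
  selected_iterwise = list(c("V1"), character(0), c("V2")),
  runtime = 1.2
)

# Number of predictors selected in each iteration
n_per_iter <- lengths(res$selected_iterwise)

# Iteration at which each selected predictor first appears
first_iter <- vapply(res$selected, function(v) {
  which(vapply(res$selected_iterwise, function(s) v %in% s, logical(1)))[1]
}, integer(1))

n_per_iter
first_iter
```

Iterations with a zero count are those in which no predictor was selected.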
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
get_leadvars, get_leadsets, VS_method, select_vars, remove_vars, update_y
Examples
### [1] For linear model
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
# Run S3VS for LM
res_lm <- S3VS(y = y, X = X, family = "normal",
method_xy = "topk", param_xy = list(k=1),
method_xx = "topk", param_xx = list(k=3),
vsel_method = "LASSO", method_sel = "conservative",
method_rem = "conservative_begin", rem_regout = FALSE,
m = 100, nskip = 3, verbose = TRUE, seed = 123)
# View selected predictors
res_lm$selected
### [2] For generalized linear model
# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)
# Run S3VS for GLM (logistic)
res_glm <- S3VS(y = y, X = X, family = "binomial",
method_xy = "topk", param_xy = list(k = 1),
method_xx = "topk", param_xx = list(k = 3),
vsel_method = "LASSO",
method_sel = "conservative", method_rem = "conservative_begin",
sel_regout = FALSE, rem_regout = FALSE,
m = 100, nskip = 3, verbose = TRUE, seed = 123)
# View selected predictors
res_glm$selected
### [3] For survival model
# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)
# Run S3VS for survival model (Cox)
res_surv <- S3VS(y = y_surv, X = X, family = "survival",
surv_model = "COX",
method_xy = "topk", param_xy = list(k = 1),
method_xx = "topk", param_xx = list(k = 3),
vsel_method = "COXGLMNET",
method_sel = "conservative", method_rem = "conservative_begin",
sel_regout = FALSE, rem_regout = FALSE,
m = 100, nskip = 3, verbose = TRUE, seed = 123)
# View selected predictors
res_surv$selected
Structured Screen-and-Select Variable Selection in Generalized Linear Models
Description
S3VS_GLM performs variable selection based on the structured screen-and-select framework in generalized linear models.
Usage
S3VS_GLM(y, X,
method_xy = c("topk", "fixedetasqthresh", "percetasqthresh"), param_xy,
method_xx = c("topk", "fixedcorthresh", "perccorthresh"), param_xx,
vsel_method = c("NLP", "LASSO", "ENET", "SCAD", "MCP"),
alpha = 0.5,
method_sel = c("conservative", "liberal"),
method_rem = c("conservative_begin", "conservative_end", "liberal"),
sel_regout = FALSE, rem_regout = FALSE, update_y_thresh = NULL,
m = 100, nskip = 3, verbose = FALSE, seed = NULL, parallel = FALSE)
Arguments
y |
Response. A numeric/integer/logical vector with values in {0,1}. |
X |
Design matrix of predictors. Can be a base matrix or something |
method_xy |
Rule for screening some predictors as "leading variables" based on their association with the response; one of
|
param_xy |
Tuning parameter for |
method_xx |
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of |
param_xx |
Tuning parameter for |
vsel_method |
Character string specifying the variable selection method to be used within each leading set. Available options are |
alpha |
Only used when |
method_sel |
Policy for aggregating predictors selected across leading sets in an iteration; one of |
method_rem |
Policy for excluding predictors when no selections are made in an iteration; one of |
sel_regout |
Logical. If |
rem_regout |
Logical. If |
update_y_thresh |
Numeric scalar threshold controlling how the working response |
m |
Integer. Maximum number of S3VS iterations to perform. Defaults to |
nskip |
Integer. Maximum number of iterations in which no new predictors are selected before the algorithm stops. Defaults to |
verbose |
Logical. If |
seed |
If supplied, sets the random seed via |
parallel |
Logical. If |
Details
For a binary response, S3VS considers the generalized linear model (GLM)
g\!\left( E\!\left( \boldsymbol{y} \mid \boldsymbol{X} \right) \right)
= \boldsymbol{X}\boldsymbol{\beta}
For the S3VS algorithm, see the manual of the top-level function S3VS.
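The link function g is not spelled out above. Assuming it is the logit link, which is what the simulated example on this page uses to generate the data, the relation between the linear predictor and the success probability can be checked in base R as follows. This is a sketch of the model equation only, not of the package internals.

```r
# Assumption: g is the logit link, g(mu) = log(mu / (1 - mu)).
eta <- c(-2, 0, 1.5)                   # example linear predictor values
prob_manual  <- 1 / (1 + exp(-eta))    # inverse logit written out
prob_builtin <- plogis(eta)            # base R (stats) inverse logit
eta_back     <- qlogis(prob_builtin)   # logit link recovers the linear predictor

all.equal(prob_manual, prob_builtin)
all.equal(eta_back, eta)
```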
Value
A list with the following components:
selected |
A character vector of predictor names that were selected across all iterations. |
selected_iterwise |
A list recording the predictors selected at each iteration, in the order they were considered. |
runtime |
Runtime in seconds. |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
get_leadvars_GLM, get_leadsets, VS_method_GLM, select_vars, remove_vars, update_y_GLM
Examples
# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)
# Run S3VS for GLM (logistic)
res_glm <- S3VS_GLM(y = y, X = X,
method_xy = "topk", param_xy = list(k = 1),
method_xx = "topk", param_xx = list(k = 3),
vsel_method = "LASSO",
method_sel = "conservative", method_rem = "conservative_begin",
sel_regout = FALSE, rem_regout = FALSE,
m = 100, nskip = 3, verbose = TRUE, seed = 123)
# View selected predictors
res_glm$selected
Structured Screen-and-Select Variable Selection in Linear Models
Description
S3VS_LM performs variable selection based on the structured screen-and-select framework in linear models.
Usage
S3VS_LM(y, X, cor_xy = NULL,
method_xy = c("topk", "fixedcorthresh", "perccorthresh"), param_xy,
method_xx = c("topk", "fixedcorthresh", "perccorthresh"), param_xx,
vsel_method = c("NLP", "LASSO", "ENET", "SCAD", "MCP"),
alpha = 0.5,
method_sel = c("conservative", "liberal"),
method_rem = c("conservative_begin", "conservative_end", "liberal"),
rem_regout = FALSE,
m = 100, nskip = 3, verbose = FALSE, seed = NULL)
Arguments
y |
Response. A numeric vector. |
X |
Design matrix of predictors. Can be a base matrix or something |
cor_xy |
Optional numeric vector of precomputed marginal correlations between |
method_xy |
Rule for screening some predictors as 'leading variables' based on their association with the response; one of
|
param_xy |
Tuning parameter for |
method_xx |
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of |
param_xx |
Tuning parameter for |
vsel_method |
Character string specifying the variable selection method to be used within each leading set. Available options are |
alpha |
Only used when |
method_sel |
Policy for aggregating predictors selected across leading sets in an iteration; one of |
method_rem |
Policy for excluding predictors when no selections are made in an iteration; one of |
rem_regout |
Logical. If |
m |
Integer. Maximum number of S3VS iterations to perform. Defaults to |
nskip |
Integer. Maximum number of iterations in which no new predictors are selected before the algorithm stops. Defaults to |
verbose |
Logical. If |
seed |
If supplied, sets the random seed via |
Details
For a continuous response, S3VS considers the linear model (LM)
\boldsymbol{y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}
For the S3VS algorithm, see the manual of the top-level function S3VS.
Value
A list with the following components:
selected |
A character vector of predictor names that were selected across all iterations. |
selected_iterwise |
A list recording the predictors selected at each iteration, in the order they were considered. |
runtime |
Runtime in seconds. |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
get_leadvars_LM, get_leadsets, VS_method_LM, select_vars, remove_vars, update_y_LM
Examples
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
# Run S3VS for LM
res_lm <- S3VS_LM(y = y, X = X,
method_xy = "topk", param_xy = list(k=1),
method_xx = "topk", param_xx = list(k=3),
vsel_method = "LASSO", method_sel = "conservative",
method_rem = "conservative_begin", rem_regout = FALSE,
m = 100, nskip = 3, verbose = TRUE, seed = 123)
# View selected predictors
res_lm$selected
Structured Screen-and-Select Variable Selection in Survival Models
Description
S3VS_SURV performs variable selection based on the structured screen-and-select framework in survival models.
Usage
S3VS_SURV(y, X, surv_model = c("COX", "AFT"),
method_xy = c("topk", "fixedmuthresh", "percmuthresh"), param_xy,
method_xx = c("topk", "fixedcorthresh", "perccorthresh"), param_xx,
vsel_method = c("LASSO", "ENET", "AFTGEE", "BRIDGE", "PVAFT"),
alpha = 0.5,
method_sel = c("conservative", "liberal"),
method_rem = c("conservative_begin", "conservative_end", "liberal"),
m = 100, nskip = 3, verbose = FALSE, seed = NULL, parallel = FALSE)
Arguments
y |
Response. A list with components |
X |
Design matrix of predictors. Can be a base matrix or something |
surv_model |
Character string specifying the survival model. Must be explicitly provided; there is no default. Values are |
method_xy |
Rule for screening some predictors as "leading variables" based on their association with the response; one of
|
param_xy |
Tuning parameter for |
method_xx |
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of |
param_xx |
Tuning parameter for |
vsel_method |
Character string specifying the variable selection method to be used within each leading set. Available options are |
alpha |
Only used when |
method_sel |
Policy for aggregating predictors selected across leading sets in an iteration; one of |
method_rem |
Policy for excluding predictors when no selections are made in an iteration; one of |
m |
Integer. Maximum number of S3VS iterations to perform. Defaults to |
nskip |
Integer. Maximum number of iterations in which no new predictors are selected before the algorithm stops. Defaults to |
verbose |
Logical. If |
seed |
If supplied, sets the random seed via |
parallel |
Logical. If |
Details
For a survival type response, S3VS considers two choices of models–the Cox model
\lambda(t\mid \boldsymbol{x}_i) = \lambda_0(t) \exp(\boldsymbol{x}_i^T \boldsymbol{\beta})
and the AFT model
\log(\boldsymbol{T}) = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}
For the S3VS algorithm, see the manual of the top-level function S3VS.
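As a short worked note (not taken from the package source), the simulated example on this page corresponds to the Cox model above with a constant baseline hazard. The symbol \lambda_0 for that constant is introduced here; in the example code, base_rate plays the role of \lambda_0 and eta the role of \boldsymbol{x}_i^T \boldsymbol{\beta}.

```latex
\lambda_0(t) \equiv \lambda_0
\;\Longrightarrow\;
\lambda(t \mid \boldsymbol{x}_i) = \lambda_0 \exp(\boldsymbol{x}_i^T \boldsymbol{\beta}),
\qquad
T_i \mid \boldsymbol{x}_i \sim \mathrm{Exponential}\!\left(\text{rate} = \lambda_0 \exp(\boldsymbol{x}_i^T \boldsymbol{\beta})\right)
```

A constant hazard characterizes the exponential distribution, which is why event times drawn with rexp at rate base_rate * exp(eta) satisfy the proportional hazards form.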
Value
A list with the following components:
selected |
A character vector of predictor names that were selected across all iterations. |
selected_iterwise |
A list recording the predictors selected at each iteration, in the order they were considered. |
runtime |
Runtime in seconds. |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
get_leadvars_SURV, get_leadsets, VS_method_SURV, select_vars, remove_vars
Examples
# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)
# Run S3VS for survival model (Cox)
res_surv <- S3VS(y = y_surv, X = X, family = "survival",
surv_model = "COX",
method_xy = "topk", param_xy = list(k = 1),
method_xx = "topk", param_xx = list(k = 3),
vsel_method = "COXGLMNET",
method_sel = "conservative", method_rem = "conservative_begin",
sel_regout = FALSE, rem_regout = FALSE,
m = 100, nskip = 3, verbose = TRUE, seed = 123)
# View selected predictors
res_surv$selected
Variable Selection in Leading Sets under the S3VS Framework
Description
VS_method applies the chosen variable-selection algorithm to each leading set produced by S3VS at every iteration.
Usage
VS_method(y, X, family, surv_model = NULL, vsel_method, alpha = 0.5,
p_thresh = 0.1, gamma = 0.9, verbose = FALSE)
Arguments
y |
Response. If |
X |
Predictor matrix. Can be a base matrix or something |
family |
Model family; one of |
surv_model |
Character string specifying the survival model ( |
vsel_method |
Character string indicating the variable-selection engine used inside |
alpha |
Only used when |
p_thresh |
Only used for |
gamma |
Only used for |
verbose |
If |
Details
Details to come...
Value
A list containing:
sel |
Character vector with names of the selected predictors. |
nosel |
Character vector with names of the predictors not selected. |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
VS_method_LM, VS_method_GLM, VS_method_SURV
Examples
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
# Run VS_method
VS_method(y, X, family = "normal", vsel_method = "NLP", verbose = FALSE)
Variable Selection in Leading Sets for Generalized Linear Models under the S3VS Framework
Description
VS_method_GLM applies the chosen variable-selection algorithm for generalized linear models to each leading set produced by S3VS at every iteration.
Usage
VS_method_GLM(y, X, vsel_method, alpha = 0.5, verbose = FALSE,
parallel = FALSE, ncores = NULL)
Arguments
y |
Response. A numeric/integer/logical vector with values in {0,1}. |
X |
Predictor matrix. Can be a base matrix or something |
vsel_method |
Character string indicating the variable-selection engine used at each iteration. Available options are |
alpha |
Only used when |
verbose |
If |
parallel |
Logical. If |
ncores |
Integer; number of CPU cores to use when |
Value
A list containing:
sel |
Character vector with names of the selected predictors. |
nosel |
Character vector with names of the predictors not selected. |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
modelSelection, cv.glmnet, cv.ncvreg
Examples
# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)
# Run VS_method
VS_method_GLM(y, X, vsel_method = "LASSO", verbose = FALSE)
Variable Selection in Leading Sets for Linear Models under the S3VS Framework
Description
VS_method_LM applies the chosen variable-selection algorithm for linear models to each leading set produced by S3VS at every iteration.
Usage
VS_method_LM(y, X, vsel_method, alpha = 0.5, verbose = FALSE)
Arguments
y |
Response. A numeric vector. |
X |
Predictor matrix. Can be a base matrix or something |
vsel_method |
Character string indicating the variable-selection engine used at each iteration. Available options are |
alpha |
Only used when |
verbose |
If |
Value
A list containing:
sel |
Character vector with names of the selected predictors. |
nosel |
Character vector with names of the predictors not selected. |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
modelSelection, cv.glmnet, cv.ncvreg
Examples
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
# Run VS_method
VS_method_LM(y, X, vsel_method = "NLP", verbose = FALSE)
Variable Selection in Leading Sets for Survival Models under the S3VS Framework
Description
VS_method_SURV applies the chosen variable-selection algorithm for survival models to each leading set produced by S3VS at every iteration.
Usage
VS_method_SURV(y, X, surv_model, vsel_method, alpha = 0.5,
p_thresh = 0.1, gamma = 0.9, verbose = FALSE, ...)
Arguments
y |
Response. A list with components |
X |
Predictor matrix. Can be a base matrix or something |
surv_model |
Character string specifying the survival model. Must be explicitly provided; there is no default. Values are |
vsel_method |
Character string indicating the variable-selection engine used at each iteration. Available options are |
alpha |
Only used when |
p_thresh |
Only used with |
gamma |
Only used with |
verbose |
If |
... |
Other arguments to be passed inside |
Value
A list containing:
sel |
Character vector with names of the selected predictors. |
nosel |
Character vector with names of the predictors not selected. |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
cv.glmnet, aftreg, aftgee, bridge_aft, pvaft
Examples
# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)
# Run VS_method
VS_method_SURV(y_surv, X, surv_model = "COX", vsel_method = "COXGLMNET", verbose = FALSE)
Bridge-Penalized AFT Regression via Iteratively Reweighted LASSO
Description
bridge_aft fits an accelerated failure time (AFT) model using an iterative reweighted LASSO scheme to approximate a bridge (L_\gamma) penalty on the regression coefficients.
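The general idea of approximating a bridge penalty by reweighted LASSO fits can be sketched as follows. This is the standard local linear approximation and may differ in detail from what bridge_aft does internally; the loss L, the tuning parameter \lambda, the weights w_j^{(k)}, the iteration index k, and the small constant \varepsilon are notation assumed here for illustration.

```latex
P_\gamma(\boldsymbol{\beta}) = \lambda \sum_{j=1}^{p} |\beta_j|^{\gamma}, \qquad 0 < \gamma < 1,
\\
|\beta_j|^{\gamma} \approx |\beta_j^{(k)}|^{\gamma}
  + \gamma\,|\beta_j^{(k)}|^{\gamma - 1}\left(|\beta_j| - |\beta_j^{(k)}|\right),
\\
\boldsymbol{\beta}^{(k+1)} = \arg\min_{\boldsymbol{\beta}} \;
  L(\boldsymbol{\beta}) + \lambda \sum_{j=1}^{p} w_j^{(k)}\,|\beta_j|,
\qquad
w_j^{(k)} = \gamma \left(|\beta_j^{(k)}| + \varepsilon\right)^{\gamma - 1}.
```

Each update is a weighted LASSO problem. Coefficients that are currently small receive large weights and are pushed toward zero, while large coefficients are penalized less. The constant \varepsilon > 0 keeps the weights finite when a coefficient is exactly zero.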
Usage
bridge_aft(y, X, gamma = 0.5, alpha = 1, max_iter = 100, tol = 1e-05)
Arguments
y |
Response; a list of two elements |
X |
Predictor matrix. Can be a base matrix or something |
gamma |
Bridge penalty exponent |
alpha |
Elastic-net mixing parameter passed to |
max_iter |
Maximum number of outer reweighting iterations (default |
tol |
Convergence tolerance on the |
Value
A list with components:
beta |
Numeric vector of estimated coefficients of length |
gamma |
The bridge exponent used in the fit. |
iterations |
Number of outer reweighting iterations performed. |
Author(s)
Padmore Prempeh <pprempeh@albany.edu>, Nilotpal Sanyal <nsanyal@utep.edu>
References
Jian Huang and Shuangge Ma. Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis, 16(2):176-195, 2010.
See Also
glmnet, cv.glmnet, Surv, survfit
Examples
set.seed(1)
n <- 50
p <- 10
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(runif(10, -1.5, 1.5), rep(0, p - 10))
linpred <- as.vector(X %*% beta_true)
## Generate log-normal AFT survival times (no censoring in this simple example)
sigma <- 0.6
logT <- linpred + rnorm(n, sd = sigma)
time <- exp(logT)
delta <- rep(1, n) # all events (censoring ignored by current implementation)
y_surv <- list(time = time, status = delta)
fit <- bridge_aft(y_surv, X, gamma = 0.5, alpha = 1, max_iter = 50, tol = 1e-5)
str(fit)
fit$beta[1:10]
Identify Leading Sets of Covariates via Inter-Predictor Associations
Description
get_leadsets identifies, for a specified leading variable, a set of associated predictors, the leading set, based on inter-predictor associations (absolute value of the correlation coefficient).
Usage
get_leadsets(x_lead, X, method = c("topk", "fixedthresh", "percthresh"), param)
Arguments
x_lead |
Vector with values of the leading variable |
X |
Predictor matrix. Must contain the leading variable. Can be a base matrix or something |
method |
Rule for constructing, for each leading variable, the set of associated predictors (the "leading set") using inter-predictor association (absolute value of the correlation coefficient); one of |
param |
Tuning parameter for |
Value
A character vector containing the names of the predictors.
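A threshold-type rule can be sketched in base R as below. This assumes that such a rule keeps every predictor whose absolute correlation with the leading variable is at least a given cutoff; the helper toy_leadset_thresh and its cutoff argument are hypothetical and do not reproduce the exact semantics of the package's "fixedthresh" or "percthresh" options.

```r
# Illustrative only: toy_leadset_thresh is a hypothetical helper, not part of S3VS.
set.seed(123)
n <- 100
p <- 20
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
X[, 2] <- X[, 1] + 0.3 * rnorm(n)   # make V2 strongly correlated with V1

# Keep predictors whose |cor| with the leading variable is at least `cutoff`
toy_leadset_thresh <- function(x_lead, X, cutoff) {
  assoc <- abs(drop(cor(X, x_lead)))
  colnames(X)[assoc >= cutoff]
}

set_v1 <- toy_leadset_thresh(X[, "V1"], X, cutoff = 0.8)
set_v1
```

Unlike a top-k rule, the size of the resulting set is not fixed in advance: it depends on how many predictors are strongly correlated with the leading variable.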
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
Examples
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
leadvars <- get_leadvars_LM(y = y, X = X, method = "topk", param = list(k=2))
get_leadsets(X[,leadvars[1]], X, method = "percthresh", param = list(thresh = 0.2))
Screening Predictors As 'Leading Variables' By Evaluating Predictor-Response Associations
Description
get_leadvars screens some predictors as "leading variables" based on predictor-response associations in linear, generalized linear, and survival models.
Usage
get_leadvars(y, X, family = c("normal","binomial","survival"),
surv_model = c("AFT", "COX"),
method = c("topk", "fixedthresh", "percthresh"), param,
varsselected = NULL, varsleft = colnames(X), parallel = FALSE)
Arguments
y |
Response. If |
X |
Predictor matrix. Can be a base matrix or something |
family |
Model family; one of |
surv_model |
Character string specifying the survival model ( |
method |
Screening rule, one of |
param |
Tuning parameter for |
varsselected |
Used only when |
varsleft |
Used only when |
parallel |
Logical. If |
Value
A character vector containing the names of the leading variables.
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
get_leadvars_LM, get_leadvars_GLM, get_leadvars_SURV
Examples
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
# Select leading variables
leadvars <- get_leadvars(y = y, X = X, family = "normal",
method = "topk", param = list(k=2))
leadvars
Screening Predictors As 'Leading Variables' By Evaluating Predictor-Response Associations In Generalized Linear Models
Description
get_leadvars_GLM screens some predictors as "leading variables" based on predictor-response associations in generalized linear models.
Usage
get_leadvars_GLM(y, X, method = c("topk", "fixedetasqthresh", "percetasqthresh"), param)
Arguments
y |
Response. A numeric/integer/logical vector with values in {0,1}. |
X |
Predictor matrix. Can be a base matrix or something |
method |
Screening rule, one of |
param |
Tuning parameter for |
Value
A character vector containing the names of the leading variables.
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
Examples
# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)
# Select leading variables
leadvars <- get_leadvars_GLM(y = y, X = X, method = "topk", param = list(k=2))
leadvars
Screening Predictors As 'Leading Variables' By Evaluating Predictor-Response Associations In Linear Models
Description
get_leadvars_LM screens some predictors as "leading variables" based on predictor-response associations in linear models.
Usage
get_leadvars_LM(y, X, method = c("topk", "fixedcorthresh", "perccorthresh"), param)
Arguments
y |
Response. A numeric vector. |
X |
Predictor matrix. Can be a base matrix or something |
method |
Screening rule, one of |
param |
Tuning parameter for |
Value
A character vector containing the names of the leading variables.
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
Examples
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
# Select leading variables
leadvars <- get_leadvars_LM(y = y, X = X, method = "topk", param = list(k=2))
leadvars
Screening Predictors As "Leading Variables" By Evaluating Predictor-Response Associations In Survival Models
Description
get_leadvars_SURV screens some predictors as "leading variables" based on predictor-response associations in survival models.
Usage
get_leadvars_SURV(y, X, surv_model = c("AFT", "COX"),
method = c("topk", "fixedmuthresh", "percmuthresh"), param,
varsselected = NULL, varsleft = colnames(X), parallel = FALSE)
Arguments
y |
Response. A list with components |
X |
Predictor matrix. Can be a base matrix or something |
surv_model |
Character string specifying the survival model. Must be explicitly provided; there is no default. Values are |
method |
Screening rule, one of |
param |
Tuning parameter for |
varsselected |
A character vector containing the predictors that were already selected in previous iterations. The association measure, conditional utility, is computed controlling for these predictors. |
varsleft |
A character vector containing the predictors that are neither selected, nor removed from consideration in previous iterations. Leading predictors are chosen from these predictors. |
parallel |
Logical. If |
Value
A character vector containing the names of the leading variables.
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
Examples
# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)
# Select leading variables
leadvars <- get_leadvars_SURV(y = y_surv, X = X, surv_model = "COX",
method = "topk", param = list(k=2),
varsselected = NULL, varsleft = colnames(X))
leadvars
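As a rough illustration of utility-based screening in the Cox setting, the sketch below ranks predictors by the gain in partial log-likelihood from a single-predictor Cox fit. This is an assumed stand-in for the package's utility measure, which may be defined differently. It ignores varsselected (no conditioning) and uses only survival::coxph; the object names and the reduced p are illustrative.

```r
# Illustrative sketch only; this is not the get_leadvars_SURV source.
library(survival)

set.seed(123)
n <- 100
p <- 20  # smaller than in the example above, to keep the sketch quick
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[, 1] + 0.5 * X[, 2]
T_event <- rexp(n, rate = 0.05 * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)

# Assumed marginal utility: partial log-likelihood gain over the null model
utility <- vapply(colnames(X), function(v) {
  fit <- coxph(Surv(time, status) ~ X[, v])
  diff(fit$loglik)  # fitted minus null partial log-likelihood
}, numeric(1))

# "topk"-style rule with k = 2
lead_topk <- names(sort(utility, decreasing = TRUE))[1:2]
```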
Decide Whether to Run Another S3VS Iteration
Description
looprun evaluates simple stopping criteria for the S3VS procedure and returns an indicator of whether one more iteration should be executed.
Usage
looprun(varsselected, varsleft, max_nocollect, m, nskip)
Arguments
varsselected |
Character vector with names of predictors selected so far. Only its length is used; |
varsleft |
Character vector with names of candidate predictors that remain available for selection in future iterations. Only its length is used; |
max_nocollect |
Integer count of iterations up to now in which no new predictors were selected. |
m |
Maximum allowed number of selected predictors (target cap for |
nskip |
Maximum allowed number of "no-collection" iterations before stopping. |
Details
An additional S3VS iteration is recommended iff all three conditions hold:
length(varsselected) < m,
length(varsleft) > 0,
max_nocollect < nskip.
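The stopping rule above can be written out directly. The function below is a hypothetical re-implementation for illustration, named looprun_sketch to avoid confusion with the package function; it simply encodes the three conditions.

```r
# Hypothetical re-implementation for illustration; not the package source.
looprun_sketch <- function(varsselected, varsleft, max_nocollect, m, nskip) {
  as.integer(
    length(varsselected) < m &&   # cap on selected predictors not yet reached
      length(varsleft) > 0 &&     # candidates remain
      max_nocollect < nskip       # not too many empty iterations
  )
}

# Three selected (< 10), twenty left, no empty iterations yet: continue
looprun_sketch(c("x1", "x2", "x3"), paste0("x", 4:23), 0, 10, 2)   # 1

# No candidates left: stop
looprun_sketch(c("x1", "x2", "x3"), character(0), 0, 10, 2)        # 0
```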
Value
1 if another iteration should run, 0 otherwise.
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
Examples
looprun(varsselected = c("x1","x2","x3"),
varsleft = paste0("x", 4:23),
max_nocollect = 0,
m = 10,
nskip = 2)
Prediction Using S3VS-Selected Predictors
Description
pred_S3VS performs prediction using predictors selected by S3VS in linear, generalized linear, and survival models.
Usage
pred_S3VS(y, X, family, surv_model = NULL, method)
Arguments
y |
Response. If |
X |
Predictor matrix. This should include predictors selected by S3VS. Can be a base matrix or something |
family |
Model family; one of |
surv_model |
Character string specifying the survival model ( |
method |
Character string indicating the prediction method used. Allowed values depend on |
Value
A list containing:
y.pred |
Predicted response |
coef |
Coefficient estimates of the predictors used for prediction |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
pred_S3VS_LM, pred_S3VS_GLM, pred_S3VS_SURV
Examples
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
# Run S3VS for LM
res_lm <- S3VS(y = y, X = X, family = "normal",
method_xy = "topk", param_xy = list(k=1),
method_xx = "topk", param_xx = list(k=3),
vsel_method = "LASSO", method_sel = "conservative",
method_rem = "conservative_begin", rem_regout = FALSE,
m = 100, nskip = 3, verbose = TRUE, seed = 123)
pred_lm <- pred_S3VS(y = y, X = X[,res_lm$selected], family = "normal", method = "LASSO")
Prediction Using S3VS-Selected Predictors in Generalized Linear Models
Description
pred_S3VS_GLM performs prediction using predictors selected by S3VS in generalized linear models.
Usage
pred_S3VS_GLM(y, X, method = c("NLP", "LASSO"))
Arguments
y |
Response. A numeric/integer/logical vector with values in {0,1}. |
X |
Predictor matrix. This should include predictors selected by S3VS. Can be a base matrix or something |
method |
Character string indicating the prediction method used. Available options are |
Value
A list containing:
y.pred |
Predicted response |
coef |
Coefficient estimates of the predictors used for prediction |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
Examples
# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)
# Predict
pred_glm <- pred_S3VS_GLM(y = y, X = X[,1:3], method = "LASSO")
pred_glm
Prediction Using S3VS-Selected Predictors in Linear Models
Description
pred_S3VS_LM performs prediction using predictors selected by S3VS in linear models.
Usage
pred_S3VS_LM(y, X, method)
Arguments
y |
Response. A numeric vector. |
X |
Predictor matrix. This should include predictors selected by S3VS. Can be a base matrix or something |
method |
Character string indicating the prediction method used. Available options are |
Value
A list containing:
y.pred |
Predicted response |
coef |
Coefficient estimates of the predictors used for prediction |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
Examples
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
# Run S3VS for LM
res_lm <- S3VS(y = y, X = X, family = "normal",
method_xy = "topk", param_xy = list(k=1),
method_xx = "topk", param_xx = list(k=3),
vsel_method = "LASSO", method_sel = "conservative",
method_rem = "conservative_begin", rem_regout = FALSE,
m = 100, nskip = 3, verbose = TRUE, seed = 123)
pred_lm <- pred_S3VS_LM(y = y, X = X[,res_lm$selected], method = "LASSO")
pred_lm
Predicted Survival Probabilities Using S3VS-Selected Predictors in Survival Models
Description
pred_S3VS_SURV returns predicted survival probabilities using predictors selected by S3VS in survival models.
Usage
pred_S3VS_SURV(y, X, surv_model = c("AFT", "COX"), method = c("AFTREG", "AFTGEE"), times)
Arguments
y |
Response. A list with components |
X |
Predictor matrix. This should include predictors selected by S3VS. Can be a base matrix or something |
surv_model |
Character string specifying the survival model. Must be explicitly provided; there is no default. Values are |
method |
Character string indicating the prediction method used. Available options are |
times |
Vector of time points where predicted survival probabilities will be computed. |
Value
A list containing:
y.pred |
Predicted response |
coef |
Coefficient estimates of the predictors used for prediction |
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
cv.glmnet, coxph, aftreg, aftgee
Examples
# Simulate survival data (Cox)
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
base_rate <- 0.05
T_event <- rexp(n, rate = base_rate * exp(eta))
C <- rexp(n, rate = 0.03)
time <- pmin(T_event, C)
status <- as.integer(T_event <= C)
y_surv <- list(time = time, status = status)
# Run S3VS for survival models
res_surv <- S3VS(y = y_surv, X = X, family = "survival",
surv_model = "COX", vsel_method = "COXGLMNET",
method_xy = "topk", param_xy = list(k = 1),
method_xx = "topk", param_xx = list(k = 3),
method_sel = "conservative", method_rem = "conservative_begin",
sel_regout = FALSE, rem_regout = FALSE,
m = 100, nskip = 3, verbose = TRUE, seed = 123)
pred_surv <- pred_S3VS_SURV(y = y_surv, X = X[,res_surv$selected],
surv_model = "COX", method = "COXGLMNET")
pred_surv
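To make the role of times concrete, the following sketch computes predicted survival probabilities at chosen time points from an ordinary Cox fit using the survival package. It is an illustration under assumptions, not the pred_S3VS_SURV implementation: the predictors V1 and V2 stand in for an S3VS selection, and the time points are arbitrary.

```r
# Illustrative sketch only; this is not the pred_S3VS_SURV source.
library(survival)

set.seed(123)
n <- 100
dat <- data.frame(V1 = rnorm(n), V2 = rnorm(n))
eta <- dat$V1 + 0.5 * dat$V2
T_event <- rexp(n, rate = 0.05 * exp(eta))
C <- rexp(n, rate = 0.03)
dat$time <- pmin(T_event, C)
dat$status <- as.integer(T_event <= C)

# Fit a Cox model on the (assumed) selected predictors
fit <- coxph(Surv(time, status) ~ V1 + V2, data = dat)

# Predicted survival probabilities for the first three subjects
times <- c(5, 10, 20)
sf <- survfit(fit, newdata = dat[1:3, c("V1", "V2")])
surv_prob <- summary(sf, times = times, extend = TRUE)$surv

coef(fit)   # coefficient estimates
surv_prob   # survival probabilities at the requested times
```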
Aggregate Not-Selected Predictors for Removal Across Multiple Leading Sets
Description
remove_vars combines lists of predictors that were not selected from multiple leading sets into a single set to remove, using either a liberal (union) rule or a conservative (progressive intersection) rule.
Usage
remove_vars(listnotselect,
method = c("conservative_begin", "conservative_end", "liberal"))
Arguments
listnotselect |
A |
method |
Aggregation rule; one of
|
Details
The liberal rule favors inclusiveness (drop all predictors that were not selected in an iteration), whereas the conservative rule favors stability across earlier/later leading sets (drop only predictors consistently absent in earlier/later leading sets).
Value
Character vector with the names of predictors that have not been selected up to the current S3VS iteration and are to be removed from all future iterations.
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
Examples
# Predictors not selected from each of three leading sets
listnotselect <- list(
c("V1","V2","V23"),
c("V4","V2","V23"),
c("V4","V5","V23")
)
remove_vars(listnotselect, method="liberal")
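One way to picture the difference between the rules is with base-R set operations. The sketch below is an interpretation, not the package code: it takes the liberal rule as a plain union and reads "progressive intersection" as intersections accumulated from the first leading set onward (for conservative_begin) or from the last leading set backward (for conservative_end). How the package chooses among the accumulated sets is not shown here. The object name not_selected is illustrative.

```r
# Illustrative sketch only; the accumulation order is an assumption.
not_selected <- list(
  c("V1", "V2", "V23"),
  c("V4", "V2", "V23"),
  c("V4", "V5", "V23")
)

# Liberal: anything not selected in at least one leading set
liberal_rm <- Reduce(union, not_selected)

# Intersections accumulated from the first leading set onward
from_begin <- Reduce(intersect, not_selected, accumulate = TRUE)

# Intersections accumulated from the last leading set backward
from_end <- Reduce(intersect, not_selected, accumulate = TRUE, right = TRUE)

liberal_rm        # "V1" "V2" "V23" "V4" "V5"
from_begin[[2]]   # "V2" "V23"  (sets 1 and 2)
from_end[[2]]     # "V4" "V23"  (sets 2 and 3)
```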
Aggregate Selected Predictors Across Multiple Leading Sets
Description
select_vars combines variable selections obtained from multiple leading sets into a single set, using either a liberal (union) or conservative (progressive intersection) rule.
Usage
select_vars(listselect, method = c("conservative", "liberal"))
Arguments
listselect |
A |
method |
Aggregation rule. One of
|
Details
The liberal rule favors inclusiveness, while the conservative rule favors stability.
Value
Vector with names of the retained predictors (considered selected in the current iteration of S3VS); if no predictors are retained, character(0).
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
Examples
listselect <- list(
c("V1","V2","V23"),
c("V4","V2","V23"),
c("V4","V5","V23")
)
select_vars(listselect, method="conservative")
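For intuition, the two aggregation rules can be approximated with base-R set operations. This is a simplified sketch: it treats the conservative rule as a plain intersection over all leading sets, which may differ from the package's progressive intersection when some intersection is empty. The names liberal_sel and conservative_sel are illustrative.

```r
# Illustrative sketch only; not the select_vars source.
listselect <- list(
  c("V1", "V2", "V23"),
  c("V4", "V2", "V23"),
  c("V4", "V5", "V23")
)

# Liberal: selected in at least one leading set
liberal_sel <- Reduce(union, listselect)

# Conservative (simplified): selected in every leading set
conservative_sel <- Reduce(intersect, listselect)

liberal_sel        # "V1" "V2" "V23" "V4" "V5"
conservative_sel   # "V23"
```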
Update Response Accounting for Selected Predictors
Description
update_y updates the response accounting for the selected predictors in linear models, and selected or removed predictors in generalized linear models.
Usage
update_y(y, X, family, vars, update_y_thresh = NULL)
Arguments
y |
Response. If |
family |
Model family; one of |
X |
Predictor matrix. Can be a base matrix or something |
vars |
Character vector containing the names of predictors that need to be accounted for. They must appear in |
update_y_thresh |
Numeric scalar threshold used only |
Value
Returns the updated response vector.
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
See Also
Examples
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
update_y(y = y, X = X, family = "normal", vars = c("V1","V4"))
Update Response Accounting for Selected Predictors in Generalized Linear Models
Description
update_y_GLM updates the response accounting for the selected predictors in generalized linear models.
Usage
update_y_GLM(y, X, vars, update_y_thresh)
Arguments
y |
Response. A numeric/integer/logical vector with values in {0,1}. |
X |
Predictor matrix. Can be a base matrix or something |
vars |
Character vector containing the names of predictors that need to be accounted for. They must appear in |
update_y_thresh |
Numeric scalar threshold. When |
Value
Returns the updated (binary) response vector.
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
Examples
# Simulate binary data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
eta <- X[,1] + 0.5 * X[,2]
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, size = 1, prob = prob)
update_y(family = "binomial", y = y, X = X, vars = c("V1","V4"), update_y_thresh = 0.8)
Update Response Accounting for Selected Predictors in Linear Models
Description
update_y_LM updates the response accounting for the selected predictors in linear models.
Usage
update_y_LM(y, X, vars)
Arguments
y |
Response. A numeric vector of length |
X |
Predictor matrix. Can be a base matrix or something |
vars |
Character vector containing the names of predictors that need to be accounted for. They must appear in |
Value
Returns the updated response vector.
Author(s)
Nilotpal Sanyal <nsanyal@utep.edu>, Padmore N. Prempeh <pprempeh@albany.edu>
Examples
# Simulate continuous data
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[,1] + 0.5 * X[,2] + rnorm(n)
update_y(family = "normal", y = y, X = X, vars = c("V1","V4"))
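A common way to account for already-selected predictors in a linear model is to replace the response with the residuals from regressing it on those predictors. The sketch below illustrates that idea with base R. It is an assumption about what "updating the response" means here, not the update_y_LM source, and the object names are illustrative.

```r
# Illustrative sketch only; residualization is an assumed mechanism.
set.seed(123)
n <- 100
p <- 150
X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("V", 1:p)
y <- X[, 1] + 0.5 * X[, 2] + rnorm(n)

vars <- c("V1", "V4")

# Regress y on the accounted-for predictors and keep the residuals
fit <- lm(y ~ X[, vars, drop = FALSE])
y_updated <- unname(residuals(fit))

# The updated response is uncorrelated with the predictors it accounts for
cor(y_updated, X[, "V1"])
cor(y_updated, X[, "V2"])   # association with V2 is still present
```

Screening on the residualized response then measures what the remaining predictors explain beyond those in vars.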