Fits hierarchical regularized regression model that enables the incorporation of external data for predictor variables. Both the predictor variables and external data can be regularized by the most common penalties (lasso, ridge, elastic net). Solutions are computed across a two-dimensional grid of penalties (a separate penalty path is computed for the predictors and external variables). Currently support regularized linear and logistic regression, future extensions to other outcomes (i.e. Cox regression) will be implemented in the next major update.
Usage
xrnet(
x,
y,
external = NULL,
unpen = NULL,
family = c("gaussian", "binomial"),
penalty_main = define_penalty(),
penalty_external = define_penalty(),
weights = NULL,
standardize = c(TRUE, TRUE),
intercept = c(TRUE, FALSE),
control = list()
)
Arguments
- x
predictor design matrix of dimension \(n x p\), matrix options include:
matrix
big.matrix
filebacked.big.matrix
sparse matrix (dgCMatrix)
- y
outcome vector of length \(n\)
- external
(optional) external data design matrix of dimension \(p x q\), matrix options include:
matrix
sparse matrix (dgCMatrix)
- unpen
(optional) unpenalized predictor design matrix, matrix options include:
matrix
- family
error distribution for outcome variable, options include:
"gaussian"
"binomial"
- penalty_main
specifies regularization object for x. See
define_penalty
for more details.- penalty_external
specifies regularization object for external. See
define_penalty
for more details.- weights
optional vector of observation-specific weights. Default is 1 for all observations.
- standardize
indicates whether x and/or external should be standardized. Default is c(TRUE, TRUE).
- intercept
indicates whether an intercept term is included for x and/or external. Default is c(TRUE, FALSE).
- control
specifies xrnet control object. See
xrnet_control
for more details.
Value
A list of class xrnet
with components:
- beta0
matrix of first-level intercepts indexed by penalty values
- betas
3-dimensional array of first-level penalized coefficients indexed by penalty values
- gammas
3-dimensional array of first-level non-penalized coefficients indexed by penalty values
- alpha0
matrix of second-level intercepts indexed by penalty values
- alphas
3-dimensional array of second-level external data coefficients indexed by penalty values
- penalty
vector of first-level penalty values
- penalty_ext
vector of second-level penalty values
- family
error distribution for outcome variable
- num_passes
total number of passes over the data in the coordinate descent algorithm
- status
error status for xrnet fitting
0 = OK
1 = Error/Warning
- error_msg
description of error
Details
This function extends the coordinate descent algorithm of the
R package glmnet
to allow the type of regularization (i.e. ridge,
lasso) to be feature-specific. This extension is used to enable fitting
hierarchical regularized regression models, where external information for
the predictors can be included in the external=
argument. In addition,
elements of the R package biglasso
are utilized to enable the use of
standard R matrices, memory-mapped matrices from the bigmemory
package, or sparse matrices from the Matrix
package.
References
Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. URL http://www.jstatsoft.org/v33/i01/.
Zeng, Y., and Breheny, P. (2017). The biglasso Package: A Memory- and Computation-Efficient Solver for Lasso Model Fitting with Big Data in R. arXiv preprint arXiv:1701.05936. URL https://arxiv.org/abs/1701.05936.
Michael J. Kane, John Emerson, Stephen Weston (2013). Scalable Strategies for Computing with Massive Data. Journal of Statistical Software, 55(14), 1-19. URL http://www.jstatsoft.org/v55/i14/.
Examples
### hierarchical regularized linear regression ###
data(GaussianExample)
## define penalty for predictors and external variables
## default is ridge for predictors and lasso for external
## see define_penalty() function for more details
penMain <- define_penalty(0, num_penalty = 20)
penExt <- define_penalty(1, num_penalty = 20)
## fit model with defined regularization
fit_xrnet <- xrnet(
x = x_linear,
y = y_linear,
external = ext_linear,
family = "gaussian",
penalty_main = penMain,
penalty_external = penExt
)