% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/glmnet_details.R
\name{glmnet-details}
\alias{glmnet-details}
\title{Technical aspects of the glmnet model}
\description{
glmnet is a popular statistical model for regularized generalized linear
models. These notes reflect common questions about this particular model.
}
\section{tidymodels and glmnet}{
The implementation of the glmnet package has some nice features. For
example, one of the main tuning parameters, the regularization penalty,
does not need to be specified when fitting the model. The package fits
the model over a whole set of penalty values, called the regularization
path. These values depend on the data set and the value of \code{alpha},
the mixture parameter between a pure ridge model (\code{alpha = 0}) and a
pure lasso model (\code{alpha = 1}). When predicting, predictions can be made
for any number of penalty values simultaneously, even those that are not
exactly on the regularization path. For those, the model interpolates
between the closest path values to produce a prediction. The \code{lambda}
argument to the \code{glmnet()} function is used to specify the path.
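
As a minimal sketch of this behavior, the code below calls glmnet directly
on \code{mtcars}. The object names \code{path_fit} and \code{path_preds} and the
penalty values are illustrative choices. The path is chosen automatically
when \code{lambda} is omitted, and predictions at several penalties are
requested at once through the \code{s} argument of \code{predict()}:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(glmnet)

x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

# lambda is not supplied, so glmnet() chooses the path from the data and alpha
path_fit <- glmnet(x, y, alpha = 1)
length(path_fit$lambda)

# predict at two penalties at once; values that are not on the path are
# interpolated from the neighboring path values
path_preds <- predict(path_fit, newx = x[1:3, ], s = c(0.123, 1))
dim(path_preds)
}\if{html}{\out{</div>}}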

In the discussion below, \code{linear_reg()} is used. The information applies
to all parsnip models that have a \code{"glmnet"} engine.
\subsection{Fitting and predicting using parsnip}{

Recall that tidymodels uses standardized parameter names across models
chosen to be low on jargon. The argument \code{penalty} is the equivalent of
what glmnet calls the \code{lambda} value and \code{mixture} is the same as their
\code{alpha} value.
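
A small sketch of the name mapping: \code{translate()} prints the call
template that parsnip generates for the engine, where the \code{mixture}
value should appear under glmnet's name, \code{alpha}. The values 0.1 and 0.5
and the name \code{spec} are arbitrary choices for illustration.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(tidymodels)

spec <- 
  linear_reg(penalty = 0.1, mixture = 0.5) \%>\% 
  set_engine("glmnet")

# print the glmnet::glmnet() call template that parsnip generates
translate(spec)
}\if{html}{\out{</div>}}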

In tidymodels, our \code{predict()} methods are defined to make one
prediction at a time. For this model, that means predictions are for a
single penalty value. For this reason, models that have glmnet engines
require the user to always specify a single penalty value when the model
is defined. For example, for linear regression:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{linear_reg(penalty = 1) \%>\% set_engine("glmnet")
}\if{html}{\out{</div>}}

When the \code{predict()} method is called, it automatically uses the penalty
that was given when the model was defined. For example:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(tidymodels)

fit <- 
  linear_reg(penalty = 1) \%>\% 
  set_engine("glmnet") \%>\% 
  fit(mpg ~ ., data = mtcars)

# predict at penalty = 1
predict(fit, mtcars[1:3,])
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## # A tibble: 3 x 1
##   .pred
##   <dbl>
## 1  22.2
## 2  21.5
## 3  24.9
}\if{html}{\out{</div>}}

However, predictions for any set of penalty values can be made
simultaneously using the \code{multi_predict()} method:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predict at c(0.00, 0.01)
multi_predict(fit, mtcars[1:3,], penalty = c(0.00, 0.01))
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## # A tibble: 3 x 1
##   .pred           
##   <list>          
## 1 <tibble [2 x 2]>
## 2 <tibble [2 x 2]>
## 3 <tibble [2 x 2]>
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# unnested:
multi_predict(fit, mtcars[1:3,], penalty = c(0.00, 0.01)) \%>\% 
  add_rowindex() \%>\% 
  unnest(cols = ".pred")
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## # A tibble: 6 x 3
##   penalty .pred  .row
##     <dbl> <dbl> <int>
## 1    0     22.6     1
## 2    0.01  22.5     1
## 3    0     22.1     2
## 4    0.01  22.1     2
## 5    0     26.3     3
## 6    0.01  26.3     3
}\if{html}{\out{</div>}}
\subsection{Where did \code{lambda} go?}{

It may appear odd that the \code{lambda} value does not get used in the fit:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{linear_reg(penalty = 1) \%>\% 
  set_engine("glmnet") \%>\% 
  translate()
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## Linear Regression Model Specification (regression)
## 
## Main Arguments:
##   penalty = 1
## 
## Computational engine: glmnet 
## 
## Model fit template:
## glmnet::glmnet(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
##     family = "gaussian")
}\if{html}{\out{</div>}}

Internally, the value of \code{penalty = 1} is saved in the parsnip object
and no value is set for \code{lambda}. This enables the full path to be fit
by \code{glmnet()}. See the section below about setting the path.
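
One way to check this on a fitted object is sketched below. It assumes
that the requested penalty is kept in the \code{spec} element of the parsnip
fit (an internal detail that may change between versions) and that the
underlying glmnet object is in the \code{fit} element. \code{glmnet_fit} is an
illustrative name.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(tidymodels)

glmnet_fit <- 
  linear_reg(penalty = 1) \%>\% 
  set_engine("glmnet") \%>\% 
  fit(mpg ~ ., data = mtcars)

# the penalty kept by parsnip (eval_tidy() also handles a stored quosure)
rlang::eval_tidy(glmnet_fit$spec$args$penalty)

# the path that glmnet() chose on its own contains many penalty values
length(glmnet_fit$fit$lambda)
}\if{html}{\out{</div>}}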
}

}

\subsection{How do I set the regularization path?}{

Regardless of what value you use for \code{penalty}, the full coefficient
path is used when \code{\link[glmnet:glmnet]{glmnet::glmnet()}} is called.

What if you want to manually set this path? Normally, you would pass a
vector to \code{lambda} in \code{\link[glmnet:glmnet]{glmnet::glmnet()}}.

parsnip models that use a \code{glmnet} engine can use a special optional
argument called \code{path_values}. This is \emph{not} an argument to
\code{\link[glmnet:glmnet]{glmnet::glmnet()}}; it is used by parsnip to
independently set the path.

For example, we have found that if you want a pure ridge regression
model (i.e., \code{mixture = 0}), you can get the \emph{wrong coefficients} if the
path does not contain zero (see \href{https://github.com/tidymodels/parsnip/issues/431#issuecomment-782883848}{issue #431}).

If we want to use our own path, the argument is passed as an
engine-specific option:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{coef_path_values <- c(0, 10^seq(-5, 1, length.out = 7))

fit_ridge <- 
  linear_reg(penalty = 1, mixture = 0) \%>\% 
  set_engine("glmnet", path_values = coef_path_values) \%>\% 
  fit(mpg ~ ., data = mtcars)

all.equal(sort(fit_ridge$fit$lambda), coef_path_values)
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## [1] TRUE
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# predict at penalty = 1
predict(fit_ridge, mtcars[1:3,])
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## # A tibble: 3 x 1
##   .pred
##   <dbl>
## 1  22.1
## 2  21.8
## 3  26.6
}\if{html}{\out{</div>}}
}

\subsection{Tidying the model object}{

\code{\link[broom:reexports]{broom::tidy()}} is a function that gives a summary of
the object as a tibble.

\strong{tl;dr} \code{tidy()} on a \code{glmnet} model produced by parsnip gives the
coefficients for the value given by \code{penalty}.

When parsnip makes a model, it gives it an extra class. When the \code{tidy()}
method is used on this object, it produces coefficients for the penalty
that was originally requested:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{tidy(fit)
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## # A tibble: 11 x 3
##   term        estimate penalty
##   <chr>          <dbl>   <dbl>
## 1 (Intercept)  35.3          1
## 2 cyl          -0.872        1
## 3 disp          0            1
## 4 hp           -0.0101       1
## 5 drat          0            1
## 6 wt           -2.59         1
## # i 5 more rows
}\if{html}{\out{</div>}}

Note that there is a \code{tidy()} method for \code{glmnet} objects in the \code{broom}
package. If this is used directly on the underlying \code{glmnet} object, it
returns \emph{all of the coefficients on the path}:

\if{html}{\out{<div class="sourceCode r">}}\preformatted{# Use the basic tidy() method for glmnet
all_tidy_coefs <- broom:::tidy.glmnet(fit$fit)
all_tidy_coefs
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## # A tibble: 640 x 5
##   term         step estimate lambda dev.ratio
##   <chr>       <dbl>    <dbl>  <dbl>     <dbl>
## 1 (Intercept)     1     20.1   5.15     0    
## 2 (Intercept)     2     21.6   4.69     0.129
## 3 (Intercept)     3     23.2   4.27     0.248
## 4 (Intercept)     4     24.7   3.89     0.347
## 5 (Intercept)     5     26.0   3.55     0.429
## 6 (Intercept)     6     27.2   3.23     0.497
## # i 634 more rows
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode r">}}\preformatted{length(unique(all_tidy_coefs$lambda))
}\if{html}{\out{</div>}}

\if{html}{\out{<div class="sourceCode">}}\preformatted{## [1] 79
}\if{html}{\out{</div>}}

This can be nice for plots but it might not contain the penalty value
that you are interested in.
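
A short sketch of both points, refitting the same model under the
illustrative name \code{glmnet_fit} (\code{path_coefs} is also illustrative). The
first check asks whether the requested penalty of 1 is one of the path
values; the plot then draws the coefficient paths against the penalty.

\if{html}{\out{<div class="sourceCode r">}}\preformatted{library(tidymodels)

glmnet_fit <- 
  linear_reg(penalty = 1) \%>\% 
  set_engine("glmnet") \%>\% 
  fit(mpg ~ ., data = mtcars)

# coefficients for every penalty on the path (broom method for glmnet objects)
path_coefs <- tidy(glmnet_fit$fit)

# is the requested penalty (1) exactly one of the path values?
any(dplyr::near(path_coefs$lambda, 1))

# coefficient paths, excluding the intercept
path_coefs \%>\% 
  filter(term != "(Intercept)") \%>\% 
  ggplot(aes(x = lambda, y = estimate, color = term)) + 
  geom_line() + 
  scale_x_log10()
}\if{html}{\out{</div>}}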
}
}

\keyword{internal}
