### abstract ###
We consider the least-square linear regression problem with regularization by the  SYMBOL -norm, a problem usually referred to as the Lasso
In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso
For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i e , variable selection)
For a specific rate decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability
We show that this property implies that if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection
This novel variable selection algorithm, referred to as the Bolasso, is compared favorably to other linear regression methods on synthetic data and datasets from the UCI machine learning repository
### introduction ###
Regularization by the  SYMBOL -norm has attracted a lot of interest in recent years in machine learning, statistics and signal processing
In the context of least-square linear regression, the problem is usually referred to as the  Lasso ~ CITATION
Much of the early effort has been dedicated to algorithms to solve the optimization problem efficiently
In particular, the  Lars  algorithm of~\singleemcite{lars} allows to find the entire regularization path (i e , the set of solutions for all values of the regularization parameters) at the cost of a single matrix inversion
Moreover, a well-known justification of the  regularization by the  SYMBOL -norm is that it leads to  sparse  solutions, i e , loading vectors with many zeros, and thus performs model selection
Recent works  CITATION  have looked precisely at the model consistency of the Lasso, i e , if we know that the data were generated from a sparse loading vector, does the Lasso actually recover it when the number of observed data points grows
In the case of a fixed number of covariates, the Lasso does recover the sparsity pattern if and only if a certain simple condition on the generating covariance matrices is verified~ CITATION
In particular, in low correlation settings, the Lasso is indeed consistent
However, in presence of strong correlations, the Lasso cannot be consistent, shedding light on potential problems of such procedures for variable selection
Adaptive versions where data-dependent weights are added to the  SYMBOL -norm  then allow to keep the consistency in all situations~ CITATION
In this paper, we first derive a detailed asymptotic analysis of sparsity pattern selection of the Lasso estimation procedure, that extends previous analysis~ CITATION , by focusing on a specific decay of the regularization parameter
We show that when the decay is proportional to  SYMBOL , where  SYMBOL  is the number of observations, then the Lasso will select all the variables that should enter the model (the  relevant  variables) with probability tending to one exponentially fast with~ SYMBOL , while it selects all other variables (the  irrelevant  variables) with strictly positive probability
If several datasets generated from the same distribution were available, then the latter property  would suggest to consider the intersection of the supports of the Lasso estimates for each dataset: all relevant variables would always be selected for all datasets, while irrelevant variables would enter the models randomly, and intersecting the supports from sufficiently many different datasets would simply eliminate them
However, in practice, only one dataset is given; but resampling methods such as the  bootstrap  are exactly dedicated to mimic the availability of several datasets by resampling from the same unique dataset~ CITATION
In this paper, we show that when using the bootstrap and intersecting the supports, we actually get a consistent model estimate, without the consistency condition required by the regular Lasso
We refer to this new procedure as the  Bolasso  ( bo otstrap-enhanced  l east  a b s olute  s hrinkage  o perator)
Finally, our Bolasso framework could be seen as a voting scheme applied to the supports of the bootstrap Lasso estimates; however, our procedure may rather be considered as a consensus combination scheme, as we keep the (largest) subset of variables on which  all  regressors agree in terms of variable selection, which is in our case provably consistent and also allows to get rid of a potential additional hyperparameter
The paper is organized as follows: in \mysec{analysis}, we present the asymptotic analysis of model selection for the Lasso; in \mysec{bootstrap}, we describe the Bolasso algorithm as well as its proof of model consistency, while in \mysec{simulations}, we illustrate our results on synthetic data, where the true sparse generating model is known, and data from the UCI machine learning repository
Sketches of proofs can be found in Appendix~A \paragraph{Notations} For a vector  SYMBOL , we let denote  SYMBOL  the  SYMBOL -norm,  SYMBOL  the  SYMBOL -norm and  SYMBOL  the  SYMBOL -norm
For    SYMBOL ,  SYMBOL  denotes the sign of  SYMBOL , defined as  SYMBOL  if  SYMBOL ,  SYMBOL  if  SYMBOL , and  SYMBOL  if  SYMBOL
For a vector  SYMBOL ,  SYMBOL  denotes the the vector of signs of elements of  SYMBOL
Moreover, given a vector  SYMBOL  and a subset  SYMBOL  of  SYMBOL ,  SYMBOL  denotes the vector in  SYMBOL  of elements of  SYMBOL  indexed by  SYMBOL
Similarly, for a matrix  SYMBOL ,  SYMBOL  denotes the submatrix of   SYMBOL  composed of elements of  SYMBOL  whose rows are in  SYMBOL  and columns are in  SYMBOL
