Ridge regression and Lasso

Ridge regression

L(w, b) = \sum_i [y_i - (w \cdot x_i + b)]^2 + \lambda \|w\|^2

Linear regression is a good choice when we have lots of training data. Minimizing L with a higher value of λ drives the weights w toward smaller values; in other words, increasing λ places less weight on fitting the training data. In practice, we vary λ and choose the value for which the test error is minimal, as sketched below.
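Here is a minimal sketch of that λ sweep, assuming scikit-learn is available (its Ridge estimator uses `alpha` for λ); the synthetic data and the grid of λ values are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Illustrative synthetic data: y is linear in X plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = rng.normal(size=10)
y = X @ true_w + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Sweep lambda over several orders of magnitude and keep the value
# that minimizes the held-out (test) error.
lambdas = np.logspace(-3, 3, 13)
errors = []
for lam in lambdas:
    model = Ridge(alpha=lam).fit(X_train, y_train)
    errors.append(mean_squared_error(y_test, model.predict(X_test)))

best_lambda = lambdas[int(np.argmin(errors))]
print(f"best lambda: {best_lambda:.3g}")
```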

As with ordinary linear regression, Ridge regression has a closed-form solution:

w = (X^T X + \lambda I)^{-1} X^T y
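A minimal NumPy sketch of this closed form follows; the intercept b is omitted for brevity (centering X and y beforehand handles it), and the example data is illustrative.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for w."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solving the linear system is more stable than forming an explicit inverse.
    return np.linalg.solve(A, X.T @ y)

# Example usage on random data:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=100)
w = ridge_closed_form(X, y, lam=1.0)
print(w)
```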

Lasso

The Lasso loss is similar to Ridge regression, but penalizes the L1 norm of w instead of the squared L2 norm:

L(w, b) = \sum_i [y_i - (w \cdot x_i + b)]^2 + \lambda \|w\|_1

Lasso has some advantages over Ridge; in particular, it produces a sparse weight vector w (many elements are exactly zero), which effectively performs feature selection.
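A minimal sketch of this sparsity effect, again assuming scikit-learn; the data, the number of informative features, and the `alpha` value are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data where only 3 of 20 features actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20)
true_w[:3] = [4.0, -2.0, 3.0]
y = X @ true_w + rng.normal(scale=0.5, size=200)

# Lasso tends to zero out the uninformative weights;
# Ridge shrinks them but typically keeps all of them nonzero.
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)
print("nonzero Lasso weights:", np.count_nonzero(lasso.coef_))
print("nonzero Ridge weights:", np.count_nonzero(ridge.coef_))
```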

Notebooks