# Linear regression

Minimize the square loss function:

$L(w, b) = \sum_i [y_i - (w \cdot x_i + b)]^2$

Predictor/independent variable (x): used to predict the response variable. Response variable (y): the variable that we want to predict.

In one dimension, the problem becomes fitting a straight line of the form: $\hat{y} = wx +b$, where $w$ is the slope and $b$ is the intercept. If we were given a bunch of training data points $(x_1, y_1),$ $(x_2, y_2),$ $...,$ $(x_n, y_n)$, our task is to find a line ($w$ and $b$) for which the square error is minimum.

We can find the minimum of the square loss function and obtain solution for $w$ and $b$:

$\frac{dL}{dw} = \frac{dL}{db} = 0$ $\Rightarrow \sum_i^n 2 (y_i - (w x_i + b)) = 0$ $\Rightarrow \sum_i^n y_i = w \sum_i^n x_i + nb$ $\Rightarrow b = \frac{1}{n} \sum_i^n y_i - \frac{w}{n} \sum_i^n x_i = \bar{y} -w \bar{x}$

We can solve for $w$ by setting $\frac{dL}{dw} = 0$:

$w = \frac{\sum_i^n (y_i - \bar{y})(x_i - \bar{x})}{\sum_i^n (x_i - \bar{x})^2}$

The above method can be extended for more than one predictor variable. In that case,

$\hat{y} = w^{(1)} x^{(1)} + w^{(2)} x^{(2)} + w^{(k)} x^{(k)} + b = w \cdot x + b$

We can incorporate $b$ in $w$s by assuming an extra predictor variable:

$\hat{y} = w \cdot x + b = \tilde{w} \cdot \tilde{x}$

where $\tilde{x} = (1, x)$ and $\tilde{w} = (b, w)$.

Our variables can be written as matrices:

$X = \begin{pmatrix} \leftarrow & \tilde{x_1} & \rightarrow \\ \leftarrow & \tilde{x_2} & \rightarrow \\ \leftarrow & ... & \rightarrow \\ \leftarrow & \tilde{x_n} & \rightarrow \end{pmatrix}$, $y = \begin{pmatrix} y_1 \\ y_2 \\ ... \\ y_n \end{pmatrix}$

The loss function is minimized at: $\tilde{w} = (X^TX)^{-1}(X^Ty)$.

Scaling of data is not important when we have multiple variable in case of linear regression.