# Optimization

Optimization involves minimizing or maximizing a function:

$\min\limits_x f(x)$
note

If we need to maximize $f(x)$, we can minimize $-f(x)$ instead.

Most problems of interest, involve some constrains on what $x$ values are possible: $x \in \Omega$. While optimization of arbitrary functions over arbitrary domains remains a difficult task, such problems can become somewhat simpler when $f(x)$ and $\Omega$ are convex.

### Convexity​

For simplicity, we will assume $\Omega$ is a subset of $\R^n$.

Definition 1

A set $\Omega \subseteq \R^n$ is convex if for all $x_1, x_2 \in \Omega$ and $t \in [0, 1]$,

$(1 - t)x_1 + tx_2 \in \Omega$
Definition 2

Let $\Omega \subseteq \R^n$ be convex. A function $f : \Omega \to \R$ is convex if for all $x_1, x_2 \in \Omega$ and $t \in [0, 1]$,

$f((1 - t)x_1 + tx_2) \le (1 - t)f(x_1) + tf(x_2)$

Some important examples of convex functions include affine functions, and $\ell_p$ norms for $p \ge 2$. More generally, any norm in vector space is convex by the triangle inequality and homogeneity. We say that $f$ is strictly convex if the inequality above is replaced by a strict inequality. We can further restrict to a smaller, better behaved class of functions called strongly convex functions.

Definition 3

Let $\Omega \subseteq \R^n$ be convex. A function $f : \Omega \to \R$ is strongly convex with parameter $\lambda > 0$ if $g(x) := f(x) - \frac{\lambda} {2}||x||_2^2$ is convex.

If $f$ is twice-differentiable, then $\lambda$-strong convexity is equivalent to the condition that for all $x, y \in \Omega$,

$f(y) \ge f(x) + \langle f(x), y-x \rangle + \frac{\lambda}{2}||x||_2^2$
Lemma 4

Suppose $f : \R^n \to \R$ is a differentiable, convex function. If $x^*$ is a global minimizer of $f$ then $\nabla f(x^*) = 0$.

Suppose further that $f$ is twice-differentiable. If $x^*$ is a global minimizer of $f$, then $\nabla f(x) = 0$ and $\nabla^2 f(x^*)$ is positive semi-definite. Conversely, if $\nabla f(y) = 0$ and $\nabla^2 f(y)$ is positive-definite, then $y$ is a global minimizer of $f$.

Points where $\nabla f(x) = 0$ are referred to as critical points.

Definition 5

A function $f : \Omega \to \R$ is L-Lipschitz if for all $x_1, x_2 \in \Omega$,

$|f(x_1) - f(x_2)| \le L||x_1 - x_2||$

If $f$ is assumed to be differentiable, then for all $x$, $||\nabla f(x)||_2 \le L$. If the gradient of $f$ is Lipschitz, then we say that the function is smooth.

Definition 6

A function $f: \Omega \to \R$ is $\beta$-smooth if for all $u, v \in \Omega$, we have

$||\nabla f(u) - \nabla f(v)||_2 \le \beta||u - v||$

Note that the $\beta$-smooth condition implies that for all $x, y$:

$f(y) \le f(x) + \langle \nabla f(x), y - x\rangle + \frac{\beta}{2}||y - x||_2^2$