

Optimization involves minimizing or maximizing a function:

$$\min_x f(x)$$

If we need to maximize $f(x)$, we can minimize $-f(x)$ instead.
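As a quick numerical sanity check, here is a minimal Python sketch (the concave quadratic $f(x) = -(x-3)^2$ is an assumed example): maximizing $f$ over a grid and minimizing $-f$ over the same grid recover the same point.

```python
import numpy as np

# Assumed example: f(x) = -(x - 3)^2 is maximized at x = 3.
f = lambda x: -(x - 3.0) ** 2
xs = np.linspace(0.0, 6.0, 601)   # grid with step 0.01

x_max = xs[np.argmax(f(xs))]      # direct maximization of f
x_min = xs[np.argmin(-f(xs))]     # minimization of -f instead
print(x_max, x_min)               # both equal 3.0
```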

Most problems of interest involve some constraints on what $x$ values are possible: $x \in \Omega$. While optimization of arbitrary functions over arbitrary domains remains a difficult task, such problems become somewhat simpler when $f(x)$ and $\Omega$ are convex.


For simplicity, we will assume $\Omega$ is a subset of $\mathbb{R}^n$.

Definition 1

A set $\Omega \subseteq \mathbb{R}^n$ is convex if for all $x_1, x_2 \in \Omega$ and $t \in [0, 1]$,

$$(1 - t)x_1 + tx_2 \in \Omega$$
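Definition 1 can be checked numerically on a concrete set. The sketch below (the Euclidean unit ball in $\mathbb{R}^3$ is an assumed example) verifies that random convex combinations of points in the ball stay in the ball.

```python
import numpy as np

# Assumed example set: the Euclidean unit ball in R^3.
def in_unit_ball(x):
    return np.linalg.norm(x) <= 1.0

rng = np.random.default_rng(0)
for _ in range(1000):
    # rejection-sample two points from the ball
    x1 = rng.uniform(-1, 1, size=3)
    x2 = rng.uniform(-1, 1, size=3)
    if not (in_unit_ball(x1) and in_unit_ball(x2)):
        continue
    t = rng.uniform()
    # the convex combination must also lie in the ball
    assert in_unit_ball((1 - t) * x1 + t * x2)
```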
Definition 2

Let $\Omega \subseteq \mathbb{R}^n$ be convex. A function $f : \Omega \to \mathbb{R}$ is convex if for all $x_1, x_2 \in \Omega$ and $t \in [0, 1]$,

$$f((1 - t)x_1 + tx_2) \le (1 - t)f(x_1) + tf(x_2)$$
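The chord condition in Definition 2 is easy to probe numerically. Below is a small Python sketch (the choice $f(x) = x^2$ is an assumed example) checking that the chord lies above the graph at random pairs of points.

```python
import numpy as np

# Assumed example: f(x) = x^2 is convex on R.
f = lambda x: x ** 2

rng = np.random.default_rng(1)
for _ in range(1000):
    x1, x2 = rng.uniform(-5, 5, size=2)
    t = rng.uniform()
    lhs = f((1 - t) * x1 + t * x2)          # function at the combination
    rhs = (1 - t) * f(x1) + t * f(x2)       # chord value
    assert lhs <= rhs + 1e-12
```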

Some important examples of convex functions include affine functions and $\ell_p$ norms for $p \ge 1$. More generally, any norm on a vector space is convex by the triangle inequality and homogeneity. We say that $f$ is strictly convex if the inequality above is replaced by a strict inequality. We can further restrict to a smaller, better-behaved class of functions called strongly convex functions.

Definition 3

Let $\Omega \subseteq \mathbb{R}^n$ be convex. A function $f : \Omega \to \mathbb{R}$ is strongly convex with parameter $\lambda > 0$ if $g(x) := f(x) - \frac{\lambda}{2}\|x\|_2^2$ is convex.

If $f$ is twice-differentiable, then $\lambda$-strong convexity is equivalent to the condition that for all $x, y \in \Omega$,

$$f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{\lambda}{2}\|y - x\|_2^2$$
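This second-order lower bound can be verified on a simple example. The sketch below (with the assumed choice $f(x) = x^2$, which satisfies the condition with $\lambda = 2$, in fact with equality) checks it at random pairs of points.

```python
import numpy as np

# Assumed example: f(x) = x^2 is 2-strongly convex, since
# f(y) = f(x) + 2x(y - x) + (y - x)^2 exactly.
f = lambda x: x ** 2
grad = lambda x: 2 * x
lam = 2.0

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    lower = f(x) + grad(x) * (y - x) + (lam / 2) * (y - x) ** 2
    assert f(y) >= lower - 1e-9
```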
Lemma 4

Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a differentiable, convex function. If $x^*$ is a global minimizer of $f$, then $\nabla f(x^*) = 0$.

Suppose further that $f$ is twice-differentiable. If $x^*$ is a global minimizer of $f$, then $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semi-definite. Conversely, if $\nabla f(y) = 0$ and $\nabla^2 f(y)$ is positive-definite, then $y$ is a global minimizer of $f$.

Points where $\nabla f(x) = 0$ are referred to as critical points.
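Lemma 4 can be illustrated on a convex quadratic. The sketch below uses the assumed example $f(x) = \frac{1}{2}x^\top A x - b^\top x$ with $A$ positive-definite: its critical point $x^* = A^{-1}b$ has zero gradient, a positive-definite Hessian ($\nabla^2 f = A$), and is the global minimizer.

```python
import numpy as np

# Assumed example: convex quadratic f(x) = 0.5 x^T A x - b^T x
# with A symmetric positive-definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

x_star = np.linalg.solve(A, b)              # solves grad(x) = 0
assert np.allclose(grad(x_star), 0.0)       # x* is a critical point

# Hessian of f is A; positive-definite since all eigenvalues are > 0
assert np.all(np.linalg.eigvalsh(A) > 0)

# f(x*) is no larger than f at random nearby points
rng = np.random.default_rng(3)
for _ in range(100):
    y = x_star + rng.normal(size=2)
    assert f(x_star) <= f(y) + 1e-12
```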

Definition 5

A function $f : \Omega \to \mathbb{R}$ is $L$-Lipschitz if for all $x_1, x_2 \in \Omega$,

$$|f(x_1) - f(x_2)| \le L\|x_1 - x_2\|$$
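As a concrete check of the Lipschitz condition, the sketch below uses the assumed example $f(x) = \sin(x)$, which is $1$-Lipschitz since $|f'(x)| = |\cos(x)| \le 1$.

```python
import numpy as np

# Assumed example: sin is 1-Lipschitz on R.
L = 1.0
rng = np.random.default_rng(4)
for _ in range(1000):
    x1, x2 = rng.uniform(-10, 10, size=2)
    assert abs(np.sin(x1) - np.sin(x2)) <= L * abs(x1 - x2) + 1e-12
```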

If $f$ is assumed to be differentiable and Lipschitz with respect to the $\ell_2$ norm, then for all $x$, $\|\nabla f(x)\|_2 \le L$. If the gradient of $f$ is Lipschitz, then we say that the function is smooth.

Definition 6

A function $f : \Omega \to \mathbb{R}$ is $\beta$-smooth if for all $u, v \in \Omega$, we have

$$\|\nabla f(u) - \nabla f(v)\|_2 \le \beta\|u - v\|_2$$

Note that the $\beta$-smooth condition implies that for all $x, y$:

$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{\beta}{2}\|y - x\|_2^2$$
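This quadratic upper bound can be checked numerically on a quadratic function, where the smoothness constant is explicit. The sketch below uses the assumed example $f(x) = \frac{1}{2}x^\top A x$, whose gradient $Ax$ is Lipschitz with $\beta$ equal to the largest eigenvalue of $A$.

```python
import numpy as np

# Assumed example: f(x) = 0.5 x^T A x is beta-smooth with
# beta = largest eigenvalue of A.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
beta = np.linalg.eigvalsh(A).max()

rng = np.random.default_rng(5)
for _ in range(1000):
    x = rng.normal(size=2)
    y = rng.normal(size=2)
    # quadratic upper bound implied by beta-smoothness
    bound = f(x) + grad(x) @ (y - x) + (beta / 2) * np.dot(y - x, y - x)
    assert f(y) <= bound + 1e-9
```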