# 9.9. Differentiability and Convex Functions

## 9.9.1. First Order Conditions

Let us look at the special case of real-valued functions over $$\RR^n$$ which are differentiable.

Theorem 9.98 (First order characterization of convexity)

Let $$f : \RR^n \to \RR$$ be a real-valued function which is differentiable at every point of its domain $$\dom f$$, which is open.

Then $$f$$ is convex if and only if $$\dom f$$ is convex and

$f(\by) \geq f(\bx) + \nabla f(\bx)^T (\by - \bx) \tag{9.4}$

holds true for all $$\bx, \by \in \dom f$$.

Proof. To prove (9.4), we first show that a differentiable real function $$f: \RR \to \RR$$ is convex if and only if

$f(y) \geq f(x) + f'(x)(y - x)$

holds true for all $$x, y \in \dom f$$.

Assume that $$f$$ is convex. Hence, $$\dom f$$ is convex too.

1. Let $$x,y \in \dom f$$.

2. Since $$\dom f$$ is convex, $$(1-t) x + t y = x + t(y-x) \in \dom f$$ for all $$t \in [0, 1]$$.

3. By convexity of $$f$$, we have:

$f(x + t(y-x)) \leq (1-t) f(x) + t f(y).$
4. Rearranging this as $$f(x + t(y-x)) - f(x) \leq t (f(y) - f(x))$$ and dividing both sides by $$t \in (0, 1]$$, we obtain:

$f(y) \geq f(x) + \frac{f(x + t(y-x)) - f(x)}{t}.$
5. Taking the limit as $$t \to 0^+$$, we obtain:

$f(y) \geq f(x) + f'(x)(y-x).$

For the converse, assume that $$\dom f$$ is convex and

$f(y) \geq f(x) + f'(x)(y - x)$

holds true for all $$x, y \in \dom f$$.

1. Recall that in $$\RR$$ the only convex sets are intervals. Thus, $$\dom f$$ is an open interval.

2. Choose any $$x, y \in \dom f$$ such that $$x \neq y$$.

3. Choose $$t \in [0,1]$$.

4. Let $$z = t x + (1-t)y$$.

5. By hypothesis, we have:

$f(x) \geq f(z) + f'(z) (x - z)$

and

$f(y) \geq f(z) + f'(z) (y - z).$
6. Multiplying the first inequality by $$t$$ and the second by $$(1-t)$$ and adding them yields:

$t f(x) + (1-t) f(y) \geq f(z) + f'(z)\left(t(x - z) + (1-t)(y - z)\right) = f(z) = f(tx + (1-t)y),$

since $$t(x - z) + (1-t)(y - z) = t x + (1-t) y - z = 0$$.
7. Thus, $$f$$ is convex.
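
Before moving to $$\RR^n$$, the scalar argument can be sanity-checked numerically. The sketch below (the helper `tangent_ok` is illustrative, not part of the proof) verifies steps 5 and 6 for the convex function $$f(x) = e^x$$, where $$f'(z) = e^z$$:

```python
import math
import random

def tangent_ok(x, y, t):
    """Check the tangent inequalities at z = t*x + (1-t)*y and their
    t/(1-t) combination, for f(x) = exp(x)."""
    z = t * x + (1 - t) * y
    fz = dfz = math.exp(z)  # f(z) and f'(z) coincide for exp
    return (math.exp(x) >= fz + dfz * (x - z) - 1e-12
            and math.exp(y) >= fz + dfz * (y - z) - 1e-12
            and t * math.exp(x) + (1 - t) * math.exp(y) >= fz - 1e-12)

random.seed(1)
assert all(tangent_ok(random.uniform(-3, 3), random.uniform(-3, 3),
                      random.random()) for _ in range(1000))
```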

We now prove for the general case with $$f : \RR^n \to \RR$$. Recall from Theorem 9.70 that for any $$\bx, \by \in \dom f$$ the restriction of $$f$$ on the line passing through $$\bx$$ and $$\by$$ is given by:

$g(t) = f(t\by + (1-t) \bx) = f(\bx + t(\by - \bx)).$

Note that, by chain rule (Example 5.11):

$g'(t) = \nabla f(t\by + (1-t) \bx)^T (\by - \bx)$

Assume $$f$$ is convex.

1. Let $$\bx, \by \in \dom f$$ such that $$\bx \neq \by$$.

2. Let $$g$$ be the restriction of $$f$$ on the line passing through $$\bx, \by$$ as described above.

3. Due to Theorem 9.70, $$g$$ is convex.

4. By the argument for real functions above:

$g(t') \geq g(t) + g'(t)(t' - t)$

holds true for all $$t, t' \in \dom g$$.

5. In particular, with $$t'=1$$ and $$t=0$$, we have:

$g(1) \geq g(0) + g'(0).$
6. But $$g'(0) = \nabla f(\bx)^T (\by - \bx)$$.

7. Also, $$g(1) = f(\by)$$ and $$g(0) = f(\bx)$$.

8. Thus, we get:

$f(\by) \geq f(\bx) + \nabla f(\bx)^T (\by - \bx)$

as desired.

For the converse, assume that this inequality holds for all $$\bx, \by \in \dom f$$ and $$\dom f$$ is convex.

1. Pick some $$\bx, \by \in \dom f$$ with $$\bx \neq \by$$.

2. Let $$g$$ be the restriction of $$f$$ on the line passing through $$\bx, \by$$ as described above.

3. Pick $$t_1, t_2 \in \dom g$$.

4. Then, $$\bz_1 = t_1\by + (1-t_1) \bx$$ and $$\bz_2 = t_2\by + (1-t_2) \bx$$ are in $$\dom f$$.

5. Consider $$g(t_1) = f(t_1\by + (1-t_1) \bx) = f(\bz_1)$$ and $$g(t_2) = f(t_2\by + (1-t_2) \bx) = f(\bz_2)$$.

6. Note that $$g'(t_2) = \nabla f(t_2\by + (1-t_2) \bx)^T (\by - \bx) = \nabla f(\bz_2)^T (\by - \bx)$$.

7. By hypothesis, we have:

$f(\bz_1) \geq f(\bz_2) + \nabla f(\bz_2)^T (\bz_1 - \bz_2).$
8. But $$\bz_1 - \bz_2 = (t_1 - t_2) (\by - \bx)$$.

9. Thus, we get:

$g(t_1) \geq g(t_2) + g'(t_2)(t_1 - t_2).$
10. This holds for every $$t_1, t_2 \in \dom g$$.

11. But then, $$g$$ is convex by the previous argument for real functions.

12. Since this holds for every restriction of $$f$$ to a line passing through its domain, $$f$$ is convex by Theorem 9.70.
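
As a quick numeric illustration of the first-order condition (9.4), consider the convex function $$f(\bx) = \lVert \bx \rVert^2$$, whose gradient is $$2\bx$$. The helper names below are illustrative:

```python
import random

def f(x):
    # f(x) = ||x||^2
    return sum(t * t for t in x)

def grad_f(x):
    # gradient of ||x||^2 is 2x
    return [2.0 * t for t in x]

def first_order_ok(x, y, tol=1e-12):
    # checks f(y) >= f(x) + grad f(x)^T (y - x)
    inner = sum(g * (b - a) for g, a, b in zip(grad_f(x), x, y))
    return f(y) >= f(x) + inner - tol

random.seed(0)
assert all(first_order_ok([random.uniform(-5, 5) for _ in range(3)],
                          [random.uniform(-5, 5) for _ in range(3)])
           for _ in range(1000))
```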

## 9.9.2. Second Order Conditions

For functions which are twice differentiable, convexity can be expressed in terms of the positive-semidefiniteness of their Hessian matrices.

We start with a result on convexity of real functions on open intervals.

Theorem 9.99 (Convexity characterization for twice differentiable real functions on open intervals)

Let $$f : \RR \to \RR$$ be twice continuously differentiable on an open interval $$(\alpha, \beta)$$; i.e., the second derivative $$f''$$ exists and is continuous at every point of the open interval $$(\alpha, \beta)$$.

Then, $$f$$ is convex if and only if its second derivative $$f''$$ is non-negative for every $$x \in (\alpha, \beta)$$:

$f''(x) \geq 0 \quad \Forall x \in (\alpha, \beta).$

Proof. Assume that $$f''$$ is nonnegative on $$(\alpha, \beta)$$.

1. Then, $$f'$$ is nondecreasing on $$(\alpha, \beta)$$.

2. For any $$x, y \in (\alpha, \beta)$$ with $$x < y$$ and $$r \in (0,1)$$, let $$z = (1-r)x + r y$$.

3. We have $$z \in (x,y)$$; i.e., $$x < z < y$$. Since $$f'$$ is nondecreasing, $$f'(t) \leq f'(z)$$ for $$t \in [x, z]$$ and $$f'(t) \geq f'(z)$$ for $$t \in [z, y]$$. Consequently,

$\begin{split} & f(z) - f(x) = \int_x^z f'(t) dt \leq f'(z) (z - x);\\ &f(y) - f(z) = \int_z^y f'(t) dt \geq f'(z) (y - z). \end{split}$
4. Since $$z-x = r(y - x)$$ and $$y -z = (1-r)(y - x)$$, we have

$\begin{split} f(z) \leq f(x) + r f'(z) (y -x);\\ f(z) \leq f(y) - (1-r) f'(z) (y -x ). \end{split}$

We wish to eliminate $$f'(z)$$ from these inequalities.

5. Multiplying the two inequalities by $$(1-r)$$ and $$r$$ respectively, and adding them together, we obtain:

$(1-r)f(z) + r f(z) \leq (1-r)f(x) + r f(y).$
6. But $$(1-r)f(z) + r f(z) = f(z) = f((1-r)x + r y)$$.

7. Thus, $$f((1-r)x + r y) \leq (1-r)f(x) + r f(y)$$.

8. This inequality is valid for the case where $$x > y$$ also.

9. Thus, $$f$$ is convex over $$(\alpha, \beta)$$.

For the converse, assume that $$f''$$ is not non-negative on $$(\alpha, \beta)$$.

1. Then, since $$f''$$ is continuous on $$(\alpha, \beta)$$, $$f''$$ is negative on some subinterval $$(\alpha', \beta')$$.

2. Choose $$x, y$$ such that $$\alpha' < x < y < \beta'$$. Choose some $$r \in (0,1)$$.

3. Following an argument parallel to above, we have

$f((1-r)x + r y) > (1-r) f(x) + r f(y).$
4. Thus, there exist $$x, y \in (\alpha, \beta)$$ where the inequality (9.1) is not valid.

5. Consequently, $$f$$ is non-convex.
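
A small numeric sketch of the forward direction: $$f(x) = x^4$$ has $$f''(x) = 12 x^2 \geq 0$$, so the chord inequality of convexity should hold on any interval. The helper `chord_ok` is illustrative:

```python
import random

def f(x):
    return x ** 4  # f''(x) = 12 x**2 >= 0 everywhere

def chord_ok(x, y, r, tol=1e-12):
    # checks f((1-r)x + r y) <= (1-r) f(x) + r f(y)
    return f((1 - r) * x + r * y) <= (1 - r) * f(x) + r * f(y) + tol

random.seed(2)
assert all(chord_ok(random.uniform(-2, 2), random.uniform(-2, 2),
                    random.random()) for _ in range(1000))
```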

We continue further with real valued functions over $$\RR^n$$ which are twice differentiable.

Theorem 9.100 (Second order characterization of convexity in Euclidean spaces)

Let $$f : \RR^n \to \RR$$ be twice continuously differentiable; i.e., its Hessian or second derivative $$\nabla^2 f$$ exists and is continuous at every point of $$\dom f$$, which is open.

Then, $$f$$ is convex if and only if $$\dom f$$ is convex and its Hessian is positive semidefinite for every $$\bx \in \dom f$$:

$\nabla^2 f(\bx) \succeq \ZERO \quad \Forall \bx \in \dom f.$

Proof. The convexity of $$f$$ on its domain $$C = \dom f$$ is equivalent to the convexity of the restriction of $$f$$ to each line segment in $$C$$ due to Theorem 9.70.

We first note that if $$f$$ is convex then $$C$$ is convex and if $$C$$ is not convex, then $$f$$ is not convex. So, for the rest of the argument, we shall assume that $$C$$ is convex.

Consequently, for any $$\by \in C$$ and a nonzero $$\bz \in \RR^n$$ the intersection of the line $$\{ \bx = \by + t \bz \ST t \in \RR\}$$ and $$C$$ is an open line segment as $$C$$ is open and convex.

1. Let $$\by \in C$$.

2. Let $$\bz \in \RR^n$$ be an arbitrary (nonzero) direction.

3. Let $$L = \{ \bx = \by + t \bz \ST t \in \RR\}$$ be a line passing through $$\by$$ in the direction $$\bz$$.

4. Consider the open real interval $$S = \{t \ST \by + t \bz \in C\}$$. Since $$L \cap C$$ is an open line segment in $$\RR^n$$, $$S$$ is indeed an open interval in $$\RR$$.

5. Consider the parameterized restriction of $$f$$ on the open interval $$S$$ as:

$g(t) = f(\by + t \bz), \Forall t \in S.$
6. A simple calculation shows that

$g''(t) = \langle \bz, \nabla^2 f(\bx) \bz \rangle$

where $$\bx = \by + t \bz$$.

7. By Theorem 9.99, $$g$$ is convex for each $$\by \in C$$ and nonzero $$\bz \in \RR^n$$ if and only if $$\langle \bz, \nabla^2 f(\bx) \bz \rangle \geq 0$$ for every $$\bz \in \RR^n$$ and $$\bx \in C$$.

8. Thus, $$f$$ is convex if and only if $$\nabla^2 f(\bx) \succeq \ZERO \quad \Forall \bx \in C$$.

For real functions, the Hessian is simply the second derivative $$f''$$.
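
The second-order test can be probed numerically: estimate the Hessian of a convex function by central differences and check that quadratic forms $$\bz^T \nabla^2 f(\bx) \bz$$ are nonnegative. This is a hedged sketch, not part of the proof; the function $$f(x, y) = x^4 + y^4 + (x+y)^2$$ and the helper `hessian` are illustrative choices:

```python
import random

def f(v):
    x, y = v
    return x ** 4 + y ** 4 + (x + y) ** 2  # convex on R^2

def hessian(f, v, h=1e-4):
    # central-difference estimate of the Hessian of f at point v
    n = len(v)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vpp = list(v); vpp[i] += h; vpp[j] += h
            vpm = list(v); vpm[i] += h; vpm[j] -= h
            vmp = list(v); vmp[i] -= h; vmp[j] += h
            vmm = list(v); vmm[i] -= h; vmm[j] -= h
            H[i][j] = (f(vpp) - f(vpm) - f(vmp) + f(vmm)) / (4 * h * h)
    return H

random.seed(3)
for _ in range(200):
    p = [random.uniform(-2, 2), random.uniform(-2, 2)]
    H = hessian(f, p)
    z = [random.uniform(-1, 1), random.uniform(-1, 1)]
    quad = sum(z[i] * H[i][j] * z[j] for i in range(2) for j in range(2))
    assert quad >= -1e-5  # z^T H z >= 0 up to discretization error
```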

Corollary 9.6 (Second order characterization of concavity)

Let $$f : \RR^n \to \RR$$ be twice continuously differentiable; i.e., its Hessian or second derivative $$\nabla^2 f$$ exists and is continuous at every point of $$\dom f$$, which is open.

Then, $$f$$ is concave if and only if $$\dom f$$ is convex and its Hessian is negative semidefinite for every $$\bx \in \dom f$$:

$\nabla^2 f(\bx) \preceq \ZERO \quad \Forall \bx \in \dom f.$

Example 9.40 (Convexity of a quadratic function)

Let $$\bP \in \SS^n$$ be a symmetric matrix. Let $$\bq \in \RR^n$$ and $$r \in \RR$$. Consider the quadratic functional $$f: \RR^n \to \RR$$ given as:

$f(\bx) = \frac{1}{2} \bx^T \bP \bx + \bq^T \bx + r.$

As shown in Example 5.13, the Hessian of $$f$$ is:

$\nabla^2 f (\bx) = \bP \quad \Forall \bx \in \RR^n.$

Thus, $$f$$ is convex if and only if $$\bP \succeq \ZERO$$ (i.e., it is positive semidefinite).

In fact, $$f$$ is strictly convex if and only if $$\bP \succ \ZERO$$.
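
A hedged numeric check of this example: with $$\bP = \bA^T \bA$$ (always positive semidefinite), the quadratic should satisfy the chord inequality. The matrix `A`, vector `q`, and scalar `r` below are arbitrary sample data:

```python
import random

random.seed(4)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
# P = A^T A is positive semidefinite by construction
P = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]
q = [random.uniform(-1, 1) for _ in range(n)]
r = random.uniform(-1, 1)

def f(x):
    # f(x) = 0.5 x^T P x + q^T x + r
    quad = sum(x[i] * P[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad + sum(qi * xi for qi, xi in zip(q, x)) + r

def chord_ok(x, y, t, tol=1e-9):
    z = [t * a + (1 - t) * b for a, b in zip(x, y)]
    return f(z) <= t * f(x) + (1 - t) * f(y) + tol

for _ in range(1000):
    x = [random.uniform(-3, 3) for _ in range(n)]
    y = [random.uniform(-3, 3) for _ in range(n)]
    assert chord_ok(x, y, random.random())
```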

Example 9.41 (Identity is convex and concave)

Let $$f : \RR \to \RR$$ be:

$f(x) = x.$

We have $$f'(x) = 1$$ and $$f''(x) = 0$$.

$$f$$ is both convex and concave.

Example 9.42 (Exponential is convex)

Let $$f : \RR \to \RR$$ be:

$f(x) = e^{ax}$

with $$\dom f = \RR$$.

We have $$f'(x) = a e^{ax}$$ and $$f''(x) = a^2 e^{ax}$$.

For any $$x \in \RR$$ and any nonzero $$a$$, $$a^2 e^{ax} > 0$$. Thus, $$f$$ is strictly convex for $$a \neq 0$$; for $$a = 0$$, $$f$$ is constant and hence still convex.

Example 9.43 (Powers)

Let $$f : \RR \to \RR$$ be:

$f(x) = x^a$

with $$\dom f = \RR_{++}$$.

Now, $$f'(x) = a x^{a-1}$$ and $$f''(x) = a (a - 1) x^{a-2}$$.

1. We have $$x > 0$$.

2. For $$a \geq 1$$, $$f''(x) \geq 0$$. $$f$$ is convex for $$a \geq 1$$.

3. For $$a \leq 0$$, $$a (a -1) \geq 0$$. Thus, $$f''(x) \geq 0$$. $$f$$ is convex for $$a \leq 0$$.

4. For $$0 \leq a \leq 1$$, $$a (a-1) \leq 0$$. Thus, $$f''(x) \leq 0$$. $$f$$ is concave for $$0 \leq a \leq 1$$.

Example 9.44 (Reciprocal powers)

Let $$f : \RR \to \RR$$ be:

$f(x) = \frac{1}{x^r} = x^{-r}.$

with $$\dom f = \RR_{++}$$.

Now, $$f'(x) = (-r) x^{-r-1}$$ and $$f''(x) = (-r)(-r - 1) x^{-r-2} = r(r+1) x^{-(r+2)}$$.

1. We have $$x > 0$$.

2. For $$r \geq 0$$, $$f''(x) \geq 0$$. $$f$$ is convex for $$r \geq 0$$.

Example 9.45 (Logarithm is concave)

Let $$f : \RR \to \RR$$ be:

$f(x) = \ln x.$

with $$\dom f = \RR_{++}$$.

Now, $$f'(x) = \frac{1}{x}$$ and $$f''(x) = \frac{-1}{x^2}$$.

1. $$f''(x) < 0$$ for all $$x > 0$$.

2. Thus, $$f$$ is concave for all $$x > 0$$.

Example 9.46 (Negative entropy is convex)

Let $$f : \RR \to \RR$$ be:

$f(x) = x \ln x.$

with $$\dom f = \RR_{++}$$.

Now, $$f'(x) = \ln x + 1$$ and $$f''(x) = \frac{1}{x}$$.

1. $$f''(x) > 0$$ for all $$x > 0$$.

2. Thus, $$f$$ is convex for all $$x > 0$$.
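
The second-derivative signs claimed in the scalar examples above can be spot-checked with a central-difference estimate (the helper `d2` is illustrative):

```python
import math

def d2(f, x, h=1e-5):
    # central-difference estimate of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

for x in [0.5, 1.0, 2.0, 5.0]:
    assert d2(math.exp, x) > 0                   # e^x: convex
    assert d2(lambda t: t ** 3, x) > 0           # x^a, a = 3 >= 1: convex
    assert d2(lambda t: t ** 0.5, x) < 0         # x^a, 0 < a < 1: concave
    assert d2(lambda t: 1.0 / t, x) > 0          # x^{-r}, r = 1: convex
    assert d2(math.log, x) < 0                   # ln x: concave
    assert d2(lambda t: t * math.log(t), x) > 0  # x ln x: convex
```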

Example 9.47 (Quadratic over linear form is convex)

Let $$f : \RR \times \RR \to \RR$$ be given by:

$f(x, y) = \frac{x^2}{y}$

with $$\dom f = \{ (x, y) \ST y > 0\}$$.

From Example 5.16, the Hessian is:

$\begin{split} \nabla^2 f(x, y) = \frac{2}{y^3} \begin{bmatrix} y^2 & - x y\\ - x y & x^2 \end{bmatrix} = \frac{2}{y^3} \begin{bmatrix} y\\ - x \end{bmatrix} \begin{bmatrix} y\\ - x \end{bmatrix}^T . \end{split}$

Recall that for any $$\bx \in \RR^n$$, the matrix $$\bx \bx^T$$ is positive semi-definite. Hence,

$\begin{split} \begin{bmatrix} y\\ - x \end{bmatrix} \begin{bmatrix} y\\ - x \end{bmatrix}^T \end{split}$

is positive semi-definite.

For $$y > 0$$, $$\frac{2}{y^3} > 0$$. Combining:

$\nabla^2 f(x, y) \succeq \ZERO.$

Thus, $$f$$ is convex.
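
The rank-one structure of this Hessian can be checked numerically: for a direction $$\bv = (v_1, v_2)$$, the quadratic form is $$\bv^T \nabla^2 f(x,y) \, \bv = \frac{2}{y^3} (y v_1 - x v_2)^2 \geq 0$$ when $$y > 0$$. The helpers below are illustrative; the finite-difference version computes the second derivative of $$t \mapsto f((x,y) + t\bv)$$:

```python
import random

def f(x, y):
    return x * x / y  # quadratic over linear

def quad_form_fd(x, y, v1, v2, h=1e-4):
    # central-difference second derivative along direction v = (v1, v2);
    # equals v^T (nabla^2 f) v
    g = lambda t: f(x + t * v1, y + t * v2)
    return (g(h) - 2 * g(0.0) + g(-h)) / (h * h)

def quad_form_closed(x, y, v1, v2):
    # closed form from the rank-one factorization of the Hessian
    return (2.0 / y ** 3) * (y * v1 - x * v2) ** 2

random.seed(5)
for _ in range(200):
    x, y = random.uniform(-3, 3), random.uniform(0.5, 3)  # y > 0
    v1, v2 = random.uniform(-1, 1), random.uniform(-1, 1)
    qc = quad_form_closed(x, y, v1, v2)
    assert abs(quad_form_fd(x, y, v1, v2) - qc) < 1e-4 * (1 + qc)
    assert qc >= 0.0
```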

Example 9.48 (Log sum exponential is convex)

Let $$f : \RR^n \to \RR$$ be given by:

$f(\bx) = \ln \left ( \sum_{i=1}^n e^{x_i} \right )$

with $$\dom f = \RR^n$$.

From Example 5.14, we have

$\nabla^2 f(\bx) = \frac{1}{(\bone^T \bz)^2} \left ((\bone^T \bz) \Diag (\bz) - \bz \bz^T \right )$

where

$\begin{split} \bz = \begin{bmatrix} e^{x_1} \\ \vdots \\ e^{x_n} \end{bmatrix}. \end{split}$

To show that $$\nabla^2 f(\bx)$$ is p.s.d., it suffices to show that $$(\bone^T \bz) \Diag (\bz) - \bz \bz^T$$ is p.s.d.

Now, for any $$\bv \in \RR^n$$:

$\begin{split} &\bv^T \left ( (\bone^T \bz) \Diag (\bz) - \bz \bz^T \right ) \bv\\ &= (\bone^T \bz) (\bv^T \Diag (\bz) \bv) - \bv^T \bz \bz^T \bv \\ &= (\bone^T \bz) (\bv^T \Diag (\bz) \bv) - (\bv^T \bz)^2 \\ &= \left (\sum_{i=1}^n z_i \right ) \left (\sum_{i=1}^n v_i^2 z_i \right) - \left (\sum_{i=1}^n v_i z_i \right )^2. \end{split}$

If we define vectors $$\ba$$ and $$\bb$$ with $$a_i = v_i \sqrt{z_i}$$ and $$b_i = \sqrt{z_i}$$, then by the Cauchy-Schwarz inequality, we have:

$(\ba^T \ba)(\bb^T \bb) \geq (\ba^T \bb)^2 \iff (\ba^T \ba)(\bb^T \bb) - (\ba^T \bb)^2 \geq 0.$

But this is exactly the expression above. Thus, $$\nabla^2 f(\bx) \succeq \ZERO$$.

Hence, $$f$$ is convex.
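
The Cauchy-Schwarz step can be verified numerically: for $$z_i = e^{x_i}$$, the quantity $$(\bone^T \bz)(\bv^T \Diag(\bz) \bv) - (\bv^T \bz)^2$$ should be nonnegative. The helper `gap` is illustrative:

```python
import math
import random

def gap(x, v):
    """(1^T z)(v^T Diag(z) v) - (v^T z)^2 for z_i = exp(x_i)."""
    z = [math.exp(t) for t in x]
    return (sum(z) * sum(vi * vi * zi for vi, zi in zip(v, z))
            - sum(vi * zi for vi, zi in zip(v, z)) ** 2)

random.seed(6)
assert all(gap([random.uniform(-3, 3) for _ in range(4)],
               [random.uniform(-2, 2) for _ in range(4)]) >= -1e-9
           for _ in range(1000))
```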

Example 9.49 (Log determinant function is concave)

Let $$f : \SS^n \to \RR$$ be:

$f(\bX) = \log \det \bX$

with $$\dom f = \SS^n_{++}$$ (the set of symmetric positive definite matrices).

Let any line in $$\SS^n$$ be given by:

$\bX = \bZ + t \bV$

where $$\bZ, \bV \in \SS^n$$.

Consider the restriction of $$f$$ on a line:

$g(t) = \log \det (\bZ + t \bV)$

to the interval of values where $$\bZ + t \bV \succ \ZERO$$ (since $$\dom f = \SS^n_{++}$$). In other words,

$\dom g = \{t \in \RR \ST \bZ + t \bV \succ \ZERO \}.$

Without any loss of generality, we can assume that $$t=0 \in \dom g$$; i.e. $$\bZ \succ \ZERO$$.

Recall that:

1. $$\det (AB) = \det(A) \det(B)$$ for square matrices.

2. $$\det (A) = \prod_{i=1}^n \lambda_i$$ for symmetric matrices with $$\lambda_i$$ being their eigenvalues.

3. If $$\lambda_i$$ are eigenvalues of $$A$$, then the eigenvalues of $$I + t A$$ are $$1 + t \lambda_i$$.

Now

$\begin{split} g(t) &= \log \det (\bZ + t \bV) \\ &= \log \det (\bZ^{\frac{1}{2}} (\bZ^{\frac{1}{2}} + t \bZ^{-\frac{1}{2}} \bV) )\\ &= \log \det (\bZ^{\frac{1}{2}} (I + t \bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}) \bZ^{\frac{1}{2}})\\ &= \log \det(\bZ^{\frac{1}{2}}) + \log \det (I + t \bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}) + \log \det(\bZ^{\frac{1}{2}})\\ &= \log \det(\bZ) + \log \det (I + t \bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}). \end{split}$
1. Let $$\lambda_i$$ be the eigenvalues of $$\bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}$$.

2. Then, $$1 + t \lambda_i$$ are the eigenvalues of $$I + t\bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}$$.

3. Thus, $$\log \det (I + t\bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}) = \sum_{i=1}^n \log (1 + t\lambda_i)$$.

Thus,

$g(t) = \sum_{i=1}^n \log (1 + t\lambda_i) + \log \det(\bZ).$

Note that $$\log \det(\bZ)$$ doesn’t depend on $$t$$. Similarly, $$\lambda_i$$ only depend on $$\bZ$$ and $$\bV$$, hence they don’t depend on $$t$$.

Differentiating $$g$$ w.r.t. $$t$$, we get:

$g'(t) = \sum_{i=1}^n \frac{\lambda_i}{1 + t \lambda_i}.$

Differentiating again, we get:

$g''(t) = -\sum_{i=1}^n \frac{\lambda_i^2}{(1 + t \lambda_i)^2}.$

Since $$g''(t) \leq 0$$, every such restriction $$g$$ is concave. Hence, $$f$$ is concave.
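
The sign of $$g''$$ can be spot-checked on a small instance. The sketch below uses arbitrary sample matrices: a $$2 \times 2$$ positive definite $$\bZ$$ and a symmetric direction $$\bV$$, with the determinant computed directly:

```python
import math

Z = [[2.0, 0.5], [0.5, 1.0]]   # symmetric positive definite (det = 1.75)
V = [[1.0, 0.3], [0.3, -0.5]]  # arbitrary symmetric direction

def g(t):
    # g(t) = log det(Z + t V) for the 2x2 case
    a = Z[0][0] + t * V[0][0]
    b = Z[0][1] + t * V[0][1]
    d = Z[1][1] + t * V[1][1]
    return math.log(a * d - b * b)

# central-difference estimate of g''(0); should be <= 0 by concavity
h = 1e-4
g2 = (g(h) - 2 * g(0.0) + g(-h)) / (h * h)
assert g2 < 0.0
```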