7.9. Differentiability and Convex Functions#

7.9.1. First Order Conditions#

Let us look at the special case of real valued functions over \(\RR^n\) which are differentiable.

Theorem 7.98 (First order characterization of convexity)

Let \(f : \RR^n \to \RR\) be a real valued function which is differentiable at each point in \(\dom f\) which is open.

Then \(f\) is convex if and only if \(\dom f\) is convex and

(7.4)#\[f(\by) \geq f(\bx) + \nabla f(\bx)^T (\by - \bx)\]

holds true for all \(\bx, \by \in \dom f\).

Proof. To prove (7.4), we first show that a differentiable real function \(f: \RR \to \RR\) is convex if and only if

\[ f(y) \geq f(x) + f'(x)(y - x) \]

holds true for all \(x, y \in \dom f\).

Assume that \(f\) is convex. Hence, \(\dom f\) is convex too.

  1. Let \(x,y \in \dom f\).

  2. Since \(\dom f\) is convex, hence \( (1-t) x + t y = x + t(y-x) \in \dom f\) for all \(t \in [0, 1]\).

  3. By convexity of \(f\), we have:

    \[ f(x + t(y-x)) \leq (1-t) f(x) + t f(y). \]
  4. If we divide by \(t\) on both sides, we obtain:

    \[ f(y) \geq f(x) + \frac{f(x + t(y-x)) - f(x)}{t}. \]
  5. Taking the limit as \(t \to 0^+\), we obtain:

    \[ f(y) \geq f(x) + f'(x)(y-x). \]

For the converse, assume that \(\dom f\) is convex and

\[ f(y) \geq f(x) + f'(x)(y - x) \]

holds true for all \(x, y \in \dom f\).

  1. Recall that in \(\RR\) the only convex sets are intervals. Thus, \(\dom f\) is an open interval.

  2. Choose any \(x, y \in \dom f\) such that \(x \neq y\).

  3. Choose \(t \in [0,1]\).

  4. Let \(z = t x + (1-t)y\).

  5. By hypothesis, we have:

    \[ f(x) \geq f(z) + f'(z) (x - z) \]

    and

    \[ f(y) \geq f(z) + f'(z) (y - z). \]
  6. Multiplying the first inequality with \(t\) and second with \((1-t)\) and adding them yields:

    \[ t f(x) + (1-t) f(y) \geq f(z) = f(tx + (1-t)y). \]
  7. Thus, \(f\) is convex.

We now prove for the general case with \(f : \RR^n \to \RR\). Recall from Theorem 7.70 that for any \(\bx, \by \in \dom f\) the restriction of \(f\) on the line passing through \(\bx\) and \(\by\) is given by:

\[ g(t) = f(t\by + (1-t) \bx) = f(\bx + t(\by - \bx)). \]

Note that, by chain rule (Example 5.11):

\[ g'(t) = \nabla f(t\by + (1-t) \bx)^T (\by - \bx) \]

Assume \(f\) is convex.

  1. Let \(\bx, \by \in \dom f\) such that \(\bx \neq \by\).

  2. Let \(g\) be the restriction of \(f\) on the line passing through \(\bx, \by\) as described above.

  3. Due to Theorem 7.70, \(g\) is convex.

  4. By the argument for real functions above:

    \[ g(t') \geq g(t) + g'(t)(t' - t) \]

    holds true for all \(t, t' \in \dom g\).

  5. In particular, with \(t'=1\) and \(t=0\), we have:

    \[ g(1) \geq g(0) + g'(0). \]
  6. But \(g'(0) = \nabla f(\bx)^T (\by - \bx)\).

  7. Also, \(g(1) = f(\by)\) and \(g(0) = f(\bx)\).

  8. Thus, we get:

    \[ f(\by) \geq f(\bx) + \nabla f(\bx)^T (\by - \bx) \]

    as desired.

For the converse, assume that this inequality holds for all \(\bx, \by \in \dom f\) and \(\dom f\) is convex.

  1. Pick some \(\bx, \by \in \dom f\) with \(\bx \neq \by\).

  2. Let \(g\) be the restriction of \(f\) on the line passing through \(\bx, \by\) as described above.

  3. Pick \(t_1, t_2 \in \dom g\).

  4. Then, \(\bz_1 = t_1\by + (1-t_1) \bx\) and \(\bz_2 = t_2\by + (1-t_2) \bx\) are in \(\dom f\).

  5. Consider \(g(t_1) = f(t_1\by + (1-t_1) \bx) = f(\bz_1)\) and \(g(t_2) = f(t_2\by + (1-2_1) \bx) = f(\bz_2)\).

  6. Note that \(g'(t_2) = \nabla f(t_2\by + (1-t_2) \bx)^T (\by - \bx) = \nabla f(\bz_2)^T (\by - \bx)\).

  7. By hypothesis, we have:

    \[ f(\bz_1) \geq f(\bz_2) + \nabla f(\bz_2)^T (\bz_1 - \bz_2). \]
  8. But \(\bz_1 - \bz_2 = (t_1 - t_2) (\by - \bx)\).

  9. Thus, we get:

    \[ g'(t_1) \geq g'(t_2) + g'(t_2)(t_1 - t_2). \]
  10. This holds for every \(t_1, t_2 \in \dom g\).

  11. But then, \(g\) is convex by previous argument for real functions.

  12. Since this is valid for every restriction of \(f\) to a line passing through its domain, hence by Theorem 7.70 \(f\) is convex.

7.9.2. Second Order Conditions#

For functions which are twice differentiable, convexity can be expressed in terms of the positive-semidefiniteness of their Hessian matrices.

We start with a result on convexity of real functions on open intervals.

Theorem 7.99 (Convexity characterization for twice differentiable real functions on open intervals)

Let \(f : \RR \to \RR\) be twice continuously differentiable on an open interval \((\alpha, \beta)\); i.e., second derivative \(f''\) exists and is continuous at every point the open interval \((\alpha, \beta)\).

Then, \(f\) is convex if and only if its second derivative \(f''\) is non-negative for every \(x \in (\alpha, \beta)\):

\[ f''(x) \geq 0 \quad \Forall x \in (\alpha, \beta). \]

Proof. Assume that \(f''\) is nonnegative on \((\alpha, \beta)\).

  1. Then, \(f'\) is nondecreasing on \((\alpha, \beta)\).

  2. For any \(x, y \in (\alpha, \beta)\) with \(x < y\) and \(r \in (0,1)\), let \(z = (1-r)x + r y\).

  3. We have \(z \in (x,y)\); i.e. \(x < z < y\). Consequently,

    \[\begin{split} & f(z) - f(x) = \int_x^z f'(t) dt \leq f'(z) (z - x);\\ &f(y) - f(z) = \int_z^y f'(t) dt \geq f'(z) (y - z). \end{split}\]
  4. Since \(z-x = r(y - x)\) and \(y -z = (1-r)(y - x)\), we have

    \[\begin{split} f(z) \leq f(x) + r f'(z) (y -x);\\ f(z) \leq f(y) - (1-r) f'(z) (y -x ). \end{split}\]

    We wish to eliminate \(f'(z)\) from these inequalities.

  5. Multiplying the two inequalities by \((1-r)\) and \(r\) respectively, and adding them together, we obtain:

    \[ (1-r)f(z) + r f(z) \leq (1-r)f(x) + r f(y). \]
  6. But \((1-r)f(z) + r f(z) = f(z) = f((1-r)x + r y)\).

  7. Thus, \(f((1-r)x + r y) \leq (1-r)f(x) + r f(y)\).

  8. This inequality is valid for the case where \(x > y\) also.

  9. Thus, \(f\) is convex over \((\alpha, \beta)\).

For the converse, assume that \(f''\) is not non-negative on \((\alpha, \beta)\).

  1. Then, since \(f''\) is continuous in \((\alpha, \beta)\), hence \(f''\) is negative in some subinterval \((\alpha', \beta')\).

  2. Choose \(x, y\) such that \(\alpha' < x < y < \beta'\). Choose some \(r \in (0,1)\).

  3. Following an argument parallel to above, we have

    \[ f((1-r)x + r y) > (1-r) f(x) + r f(y). \]
  4. Thus, there exist \(x, y \in (\alpha, \beta)\) where the inequality (7.1) is not valid.

  5. Consequently, \(f\) is non-convex.

We continue further with real valued functions over \(\RR^n\) which are twice differentiable.

Theorem 7.100 (Second order characterization of convexity in Euclidean spaces)

Let \(f : \RR^n \to \RR\) be twice continuously differentiable; i.e., its Hessian or second derivative \(\nabla^2 f\) exists at every point in \(\dom f\) which is open.

Then, \(f\) is convex if and only if \(\dom f\) is convex and its Hessian is positive semidefinite for every \(\bx \in \dom f\):

\[ \nabla^2 f(\bx) \succeq \ZERO \quad \Forall \bx \in \dom f. \]

Proof. The convexity of \(f\) on its domain \(C = \dom f\) is equivalent to the convexity of the restriction of \(f\) to each line segment in \(C\) due to Theorem 7.70.

We first note that if \(f\) is convex then \(C\) is convex and if \(C\) is not convex, then \(f\) is not convex. So, for the rest of the argument, we shall assume that \(C\) is convex.

Consequently, for any \(\by \in C\) and a nonzero \(\bz \in \RR^n\) the intersection of the line \(\{ \bx = \by + t \bz \ST t \in \RR\}\) and \(C\) is an open line segment as \(C\) is open and convex.

  1. Let \(\by \in C\).

  2. Let \(\bz \in \RR^n\) be an arbitrary (nonzero) direction.

  3. Let \(L = \{ \bx = \by + t \bz \ST t \in \RR\}\) be a line passing through \(\by\) in the direction \(\bz\).

  4. Consider the open real interval \(S = \{t \ST \by + t \bz \in C\}\). Since \(L \cap C\) is an open line segment in \(\RR^n\), hence \(S\) is indeed an open interval in \(\RR\).

  5. Consider the parameterized restriction of \(f\) on the open interval \(S\) as:

    \[ g(t) = f(\by + t \bz), \Forall t \in S. \]
  6. A simple calculation shows that

    \[ g''(t) = \langle \bz, \nabla^2 f(\bx) \bz \rangle \]

    where \(\bx = \by + t \bz\).

  7. By Theorem 7.99, \(g\) is convex for each \(\by \in C\) and nonzero \(\bz \in \RR^n\) if and only if \(\langle \bz, \nabla^2 f(\bx) \bz \rangle \geq 0\) for every \(\bz \in \RR^n\) and \(\bx \in C\).

  8. Thus, \(f\) is convex if and only if \(\nabla^2 f(\bx) \succeq \ZERO \quad \Forall \bx \in C\).

For real functions, the Hessian is simply the second derivative \(f''\).

Corollary 7.6 (Second order characterization of concavity)

Let \(f : \RR^n \to \RR\) be twice continuously differentiable; i.e., its Hessian or second derivative \(\nabla^2 f\) exists at every point in \(\dom f\) which is open.

Then, \(f\) is concave if and only if \(\dom f\) is convex and its Hessian is negative semidefinite for every \(\bx \in \dom f\):

\[ \nabla^2 f(\bx) \preceq \ZERO \quad \Forall \bx \in \dom f. \]

Example 7.40 (Convexity of a quadratic function)

Let \(\bP \in \SS^n\) be a symmetric matrix. Let \(\bq \in \RR^n\) and \(r \in \RR\). Consider the quadratic functional \(f: \RR^n \to \RR\) given as:

\[ f(\bx) = \frac{1}{2} \bx^T \bP \bx + \bq^T \bx + r. \]

As shown in Example 5.13, the Hessian of \(f\) is:

\[ \nabla^2 f (\bx) = \bP \quad \Forall \bx \in \RR^n. \]

Thus, \(f\) is convex if and only if \(\bP \succeq \ZERO\) (i.e., it is positive semidefinite).

In fact \(f\) is strictly convex if and only if \(P \succ \ZERO\).

Example 7.41 (Identity is convex and concave)

Let \(f : \RR \to \RR\) be:

\[ f(x) = x. \]

We have \(f'(x) = 1\) and \(f''(x) = 0\).

\(f\) is both convex and concave.

Example 7.42 (Exponential is convex)

Let \(f : \RR \to \RR\) be:

\[ f(x) = e^{ax} \]

with \(\dom f = \RR\).

We have \(f'(x) = a e^{ax}\) and \(f''(x) = a^2 e^{ax}\).

For any \(a,x \in \RR\), \(a^2 e^{ax} > 0\). Thus, \(f\) is strictly convex.

Example 7.43 (Powers)

Let \(f : \RR \to \RR\) be:

\[ f(x) = x^a \]

with \(\dom f = \RR_{++}\).

Now, \(f'(x) = a x^{a-1}\) and \(f''(x) = a (a - 1) x^{a-2}\).

  1. We have \(x > 0\).

  2. For \(a \geq 1\), \(f''(x) \geq 0\). \(f\) is convex for \(a \geq 1\).

  3. For \(a \leq 0\), \(a (a -1) \geq 0\). Thus, \(f''(x) \geq 0\). \(f\) is convex for \(a \leq 0\).

  4. For \(0 \leq a \leq 1\), \(a (a-1) \leq 0\). Thus, \(f''(x) \leq 0\). \(f\) is concave on \(0 \leq a \leq 1\).

Example 7.44 (Reciprocal powers)

Let \(f : \RR \to \RR\) be:

\[ f(x) = \frac{1}{x^r} = x^{-r}. \]

with \(\dom f = \RR_{++}\).

Now, \(f'(x) = (-r) x^{-r-1}\) and \(f''(x) = (-r)(-r - 1) x^{-r-2} = r(r+1) x^{-(r+2)}\).

  1. We have \(x > 0\).

  2. For \(r \geq 0\), \(f''(x) \geq 0\). \(f\) is convex for \(r \geq 0\).

Example 7.45 (Logarithm is concave)

Let \(f : \RR \to \RR\) be:

\[ f(x) = \ln x. \]

with \(\dom f = \RR_{++}\).

Now, \(f'(x) = \frac{1}{x}\) and \(f''(x) = \frac{-1}{x^2}\).

  1. \(f''(x) < 0\) for all \(x > 0\).

  2. Thus, \(f\) is concave for all \(x > 0\).

Example 7.46 (Negative entropy is convex)

Let \(f : \RR \to \RR\) be:

\[ f(x) = x \ln x. \]

with \(\dom f = \RR_{++}\).

Now, \(f'(x) = \ln x + 1\) and \(f''(x) = \frac{1}{x}\).

  1. \(f''(x) > 0\) for all \(x > 0\).

  2. Thus, \(f\) is convex for all \(x > 0\).

Example 7.47 (Quadratic over linear form is convex)

Let \(f : \RR \times \RR \to \RR\) be given by:

\[ f(x, y) = \frac{x^2}{y} \]

with \(\dom f = \{ (x, y) \ST y > 0\}\).

From Example 5.16, the Hessian is:

\[\begin{split} \nabla^2 f(x, y) = \frac{2}{y^3} \begin{bmatrix} y^2 & - x y\\ - x y & x^2 \end{bmatrix} = \frac{2}{y^3} \begin{bmatrix} y\\ - x \end{bmatrix} \begin{bmatrix} y\\ - x \end{bmatrix}^T . \end{split}\]

Recall that for any \(\bx \in \RR^n\), the matrix \(\bx \bx^T\) is positive semi-definite. Hence,

\[\begin{split} \begin{bmatrix} y\\ - x \end{bmatrix} \begin{bmatrix} y\\ - x \end{bmatrix}^T \end{split}\]

is positive semi-definite.

For \(y > 0\), \(\frac{2}{y^3} > 0\). Combining:

\[ \nabla^2 f(x, y) \succeq \ZERO. \]

Thus, \(f\) is convex.

Example 7.48 (Log sum exponential is convex)

Let \(f : \RR^n \to \RR\) be given by:

\[ f(\bx) = \ln \left ( \sum_{i=1}^n e^{x_i} \right ) \]

with \(\dom f = \RR^n\).

From Example 5.14, we have

\[ \nabla^2 f(\bx) = \frac{1}{(\bone^T \bz)^2} \left ((\bone^T \bz) \Diag (\bz) - \bz \bz^T \right ) \]

where

\[\begin{split} \bz = \begin{bmatrix} e^{x_1} \\ \vdots \\ e^{x_n} \end{bmatrix}. \end{split}\]

To show that \(\nabla^2 f(\bx)\) is p.s.d., it suffices to show that \((\bone^T \bz) \Diag (\bz) - \bz \bz^T\) is p.s.d..

Now for any \(\bv \in \RR^n\).

\[\begin{split} &\bv^T \left ( (\bone^T \bz) \Diag (\bz) - \bz \bz^T \right ) \bv\\ &= (\bone^T \bz) (\bv^T \Diag (\bz) \bv) - \bv^T \bz \bz^T \bv \\ &= (\bone^T \bz) (\bv^T \Diag (\bz) \bv) - (\bv^T \bz)^2 \\ &= \left (\sum_{i=1}^n z_i \right ) \left (\sum_{i=1}^n v_i^2 z_i \right) - \left (\sum_{i=1}^n v_i z_i \right )^2. \end{split}\]

If we define vectors \(\ba\) and \(\bb\) with \(a_i = v_i \sqrt{z_i}\) and \(b_i = \sqrt{z_i}\), then by Cauchy-Schwartz inequality , we have:

\[ (\ba^T \ba)(\bb^T \bb) \geq (\ba^T \bb)^2 \iff (\ba^T \ba)(\bb^T \bb) - (\ba^T \bb)^2 \geq 0. \]

But this is exactly the expression above. Thus, \(\nabla^2 f(\bx) \succeq \ZERO\).

Hence, \(f\) is convex.

Example 7.49 (Log determinant function is concave)

Let \(f : \SS^n \to \RR\) be:

\[ f(\bX) = \log \det X. \]

with \(\dom f = \SS^n_{++}\) (the set of symmetric positive definite matrices).

Let any line in \(\SS^n\) be given by:

\[ \bX = \bZ + t \bV \]

where \(\bZ, \bV \in \SS^n\).

Consider the restriction of \(f\) on a line:

\[ g(t) = \log \det (\bZ + t \bV) \]

to the interval of values where \(\bZ + t \bV \succ \ZERO\) (since \(\dom f = \SS^n_{++}\) ). In other words,

\[ \dom g = \{t \in \RR \ST \bZ + t \bV \succ \ZERO \}. \]

Without any loss of generality, we can assume that \(t=0 \in \dom g\); i.e. \(\bZ \succ \ZERO\).

Recall that:

  1. \(\det (AB) = \det(A) \det(B)\) for square matrices.

  2. \( \det (A) = \prod_{i=1}^n \lambda_i \) for symmetric matrices with \(\lambda_i\) being their eigen values.

  3. If \(\lambda_i\) are eigen values of \(A\), then the eigen values of \(I + t A\) are \(1 + t \lambda_i\).

Now

\[\begin{split} g(t) &= \log \det (\bZ + t \bV) \\ &= \log \det (\bZ^{\frac{1}{2}} (\bZ^{\frac{1}{2}} + t \bZ^{-\frac{1}{2}} \bV) )\\ &= \log \det (\bZ^{\frac{1}{2}} (I + t \bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}) \bZ^{\frac{1}{2}})\\ &= \log \det(\bZ^{\frac{1}{2}}) + \log \det (I + t \bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}) + \log \det(\bZ^{\frac{1}{2}})\\ &= \log \det(\bZ) + \log \det (I + t \bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}). \end{split}\]
  1. Let \(\lambda_i\) be the eigen values of \(\bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}\).

  2. Then, \(1 + t \lambda_i\) are eigen values of \(I + t\bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}\).

  3. Thus, \(\log \det (I + t\bZ^{-\frac{1}{2}} \bV \bZ^{-\frac{1}{2}}) = \sum_{i=1}^n \log \det (1 + t\lambda_i)\).

Thus,

\[ g(t) = \sum_{i=1}^n \log \det (1 + t\lambda_i) + \log \det(\bZ). \]

Note that \(\log \det(\bZ)\) doesn’t depend on \(t\). Similarly, \(\lambda_i\) only depend on \(\bZ\) and \(\bV\), hence they don’t depend on \(t\).

Differentiating \(g\) w.r.t. \(t\), we get:

\[ g'(t) = \sum_{i=1}^n \frac{\lambda_i}{1 + t \lambda_i}. \]

Differentiating again, we get:

\[ g''(t) = -\sum_{i=1}^n \frac{\lambda_i^2}{(1 + t \lambda_i)^2}. \]

Since \(g''(t) \leq 0\), hence \(f\) is concave.