7.17. Conjugate Functions#

Throughout this section, we assume that \(\VV, \WW\) are real vector spaces. Wherever necessary, they are equipped with a norm \(\| \cdot \|\) or an real inner product \(\langle \cdot, \cdot \rangle\). They are also equipped with a metric \(d(\bx, \by) = \| \bx - \by \|\) as needed. \(\VV^*\) denotes the dual vector space (to \(\VV\)).

7.17.1. Definition and Properties#

Definition 7.78 (Conjugate function)

Let \(f : \VV \to \ERL\) be an extended real valued function. Its conjugate function \(f^*: \VV^* \to \ERL\) is given by

\[ f^*(\by) = \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - f(\bx)\} \Forall \by \in \VV^*. \]

Note that the conjugate function is a mapping from the dual vector space to extended real line.

Recall from Definition 7.51 that the support function of a set \(C\) is given by

\[ \sigma_C (\bx) = \sup_{\bz \in C} \langle \bz, \bx \rangle. \]

7.17.1.1. Indicator Functions#

Theorem 7.237 (Conjugate of an indicator function)

Let \(C \subseteq \VV\) be a nonempty set. Let \(I_C : \VV \to \RERL\) be the indicator function for the set \(C\). Then,

\[ I_C^*(\by) = \sigma_C (\by) = \underset{\bx \in C}{\sup} \langle \bx, \by \rangle. \]

In other words, the conjugate of the indicator function of a set is the support function of the same set.

Proof. Let \(\by \in \VV*\) be arbitrary.

  1. At any \(\bx \in C\), we have

    \[ \langle \bx, \by \rangle - I_C(\bx) = \langle \bx, \by \rangle. \]
  2. At any \(\bx \notin C\), we have

    \[ \langle \bx, \by \rangle - I_C(\bx) = -\infty. \]
  3. Since \(C\) is nonempty, hence

    \[ \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - f(\bx)\} = \underset{\bx \in C}{\sup} \langle \bx, \by \rangle. \]

    The result follows.

7.17.1.2. Fenchel’s Inequality#

Theorem 7.238 (Fenchel’s inequality)

Let \(f: \VV \to \RERL\) be a proper function and let \(f^*\) be its conjugate function. Then, for any \(\bx \in \VV\) and \(\by \in \VV^*\):

\[ f(\bx) + f^*(\by) \geq \langle \bx, \by \rangle. \]

Proof. We proceed as follows.

  1. By definition

    \[ f^*(\by) = \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - f(\bx)\}. \]
  2. Thus, for any \(\bx \in \VV\) and any \(\by \in \VV^*\),

    \[ f^*(\by) \geq \langle \bx, \by \rangle - f(\bx). \]
  3. Since \(f\) is proper, hence \(f(\bx) > -\infty\) for ever \(\bx\).

  4. Since \(f\) is proper, hence there exists \(\ba \in \VV\) such that \(f(\ba) < \infty\).

  5. Then,

    \[ f^*(\by) \geq \langle \ba, \by \rangle - f(\ba). \]
  6. The R.H.S. is a finite quantity.

  7. Thus, \(f^*(\by) > -\infty\) for every \(\by\).

  8. If either \(f(\bx)\) or \(f^*(\by)\) are \(\infty\), then the Fenchel inequality is valid trivially.

  9. When, both of them are finite, then the inequality \(f^*(\by) \geq \langle \bx, \by \rangle - f(\bx)\) simplifies to

    \[ f(\bx) + f^*(\by) \geq \langle \bx, \by \rangle . \]

7.17.1.3. Convexity and Closedness#

Theorem 7.239 (Convexity and closedness)

Let \(f : \VV \to \RERL\) be an extended real valued function. Then, the conjugate function \(f^*\) is closed and convex.

Proof. We note that for a fixed \(\bx\), the function

\[ g_x(\by) = \langle \bx, \by \rangle - f(\bx) \]

is an affine function.

  1. Due to Theorem 4.173, affine functions are closed.

  2. Due to Theorem 7.68, affine functions are convex.

  3. Thus, \(g_x\) is indeed convex and closed for every \(\bx\).

  4. Thus, \(f\) is a pointwise supremum of convex and closed functions.

  5. By Theorem 3.93, \(f\) is closed.

  6. By Theorem 7.114, \(f\) is convex.

  7. Thus, \(f\) is closed and convex.

The beauty of this result is the fact that The conjugate function is always closed and convex even if the original function is not convex or not closed.

7.17.1.4. Properness#

Theorem 7.240 (Properness of conjugates of proper convex functions)

Let \(f : \VV \to \RERL\) be a proper convex function. Then, its conjugate \(f^*\) is proper.

Proof. We are given that \(f\) is a proper convex function. We shall first show that \(f*\) never attains \(-\infty\). We shall then show that \(f^*\) is not \(\infty\) everywhere.

  1. Since \(f\) is proper, there exists \(\ba \in \VV\) such that \(f(\ba) < \infty\).

  2. By definition, for any \(\by \in \VV^*\)

    \[ f^*(\by) \geq \langle \ba, \by \rangle - f(\ba). \]
  3. The R.H.S. in this inequality is finite for every \(\by \in \VV^*\).

  4. Thus, \(f^*(\by) > -\infty\) for every \(\by\).

  5. We also need to show that there exists \(\by \in \VV^*\) such that \(f^*(\by) < \infty\).

  6. Since \(f\) is a proper convex function, hence by Corollary 7.20, there exists \(\bx \in \dom f\) at which the subdifferential \(\partial f(\bx)\) is nonempty.

  7. Consider any \(\by \in \partial f(\bx)\).

  8. By subgradient inequality (7.6),

    \[ f(\bz) \geq f(\bx) + \langle \bz - \bx, \by \rangle \]

    for every \(\bz \in \VV\).

  9. Thus,

    \[ \langle \bz, \by \rangle - f(\bz) \leq \langle \bx, \by \rangle - f(\bx) \Forall \bz \in \VV. \]
  10. The quantity \(\langle \bx, \by \rangle - f(\bx)\) on the R.H.S. is finite and fixed.

  11. Taking supremum on the L.H.S., we obtain

    \[ f^*(\by) = \sup_{\bz \in \VV} (\langle \bz, \by \rangle - f(\bz)) \leq \langle \bx, \by \rangle - f(\bx). \]
  12. Thus, \(f^*(\by) < \infty\).

  13. Thus, \(f^*\) is a proper function.

7.17.2. Biconjugate#

The conjugate of the conjugate is called the biconjugate. We recall that when \(\VV\) is finite dimensional, then the dual of the dual \(\VV^**\) is isomorphic to \(\VV\).

Definition 7.79 (Biconjugate)

Let \(f : \VV \to \ERL\) be an extended real valued function. Its biconjugate function \(f^{**}: \VV \to \ERL\) is given by

\[ f^{**} (\bx) = \underset{\by \in \VV^*}{\sup} \{ \langle \bx, \by \rangle - f^*(\by) \}, \quad \bx \in \VV. \]

7.17.2.1. As Underestimators#

Theorem 7.241 (Biconjugate is an underestimator)

The biconjugate is an underestimator of the original function.

\[ f(\bx) \geq f^{**} (\bx) \Forall \bx \in \VV. \]

Proof. We proceed as follows.

  1. From the definition of the conjugate

    \[ f^*(\by) \geq \langle \bx, \by \rangle - f(\bx) \]

    holds true for any \(\bx \in \VV\) and \(\by \in \VV^*\).

  2. Rewriting, we get

    \[ f(\bx) \geq \langle \bx, \by \rangle - f^*(\by). \]
  3. Taking supremum over \(\by \in \VV^*\) on the R.H.S.,

    \[ f(\bx) \geq \sup_{\by \in \VV^*} (\langle \bx, \by \rangle - f^*(\by)) = f^{**}(\bx). \]

Thus, the biconjugate of \(f\) is always a lower bound for \(f\). Naturally, one is interested in conditions under which biconjugate of \(f\) equals \(f\).

7.17.2.2. Proper Closed and Convex Functions#

Theorem 7.242 (Biconjugate for proper closed and convex functions)

Let \(f : \VV \to \RERL\) be a proper, closed and convex function. Then,

\[ f(\bx) = f^{**} (\bx) \Forall x \in \VV. \]

In other words, \(f^{**} = f\). Or the biconjugate of a proper closed and convex function is the function itself.

Proof. In Theorem 7.241, we have already shown that \(f^{**} \preceq f\). We shall now show that \(f \preceq f^{**}\) also holds true when \(f\) is proper, closed and convex.

  1. For contradiction, assume that there exists \(\bx \in \VV\) such that \(f^{**}(\bx) < f(\bx)\).

  2. Thus, \((\bx, f^{**}(\bx)) \notin \epi f\).

  3. Recall the definition of inner product for the direct sum space \(\VV \oplus \RR\) from Definition 7.4 given by

    \[ \langle (\bx, s), (\by, t) \rangle \triangleq \langle \bx, \by \rangle + st. \]
  4. Since \(f\) is proper, hence \(\epi f\) is nonempty.

  5. Since \(f\) is closed, hence \(\epi f\) is closed.

  6. Since \(f\) is convex, hence \(\epi f\) is convex.

  7. Thus, \(\epi f\) is nonempty, closed and convex.

  8. By strict separation theorem (Theorem 7.169), the set \(\epi f\) and the point \((\bx, f^{**}(\bx))\) can be strictly separated.

  9. There exists a point \((\ba, b)\) with \(\ba \in \VV^*\), \(b \in \RR\) and a scalar \(\alpha \in \RR\) such that

    \[ \langle \bx, \ba \rangle + f^{**}(\bx) b > \alpha \text{ and } \langle \bz, \ba \rangle + s b \leq \alpha \Forall (\bz, s) \in \epi f. \]
  10. It is then easy to identify constants \(c_1, c_2 \in \RR\) such that

    \[ \langle \bz, \ba \rangle + s b \leq c_1 < c_2 \leq \langle \bx, \ba \rangle + f^{**}(\bx) b \Forall (\bz, s) \in \epi f. \]

    Pick \(c_1 = \alpha\) and \(c_2 = \langle \bx, \ba \rangle + f^{**}(\bx) b\) for example.

  11. By simple arithmetic, we can conclude that

    \[ \langle \bz - \bx, \ba \rangle + (s - f^{**}(\bx)) b \leq c_1 - c_2 \triangleq c < 0 \Forall (\bz, s) \in \epi f. \]
  12. The scalar \(b\) must be nonpositive. If for contradiction, \(b > 0\), then fixing a \(\bz\) and increasing \(s\) sufficiently, the inequality can be violated.

  13. We now consider different cases for \(b \leq 0\).

  14. Consider the case where \(b < 0\).

    1. Dividing the inequality by \(-b\) and letting \(\by = \frac{\ba}{-b}\), we obtain

      \[ \langle \bz - \bx, \by \rangle - s + f^{**}(\bx) \leq \frac{c}{-b} < 0 \Forall (\bz, s) \in \epi f. \]
    2. In particular, for \(s = f(\bz)\), we have

      \[ \langle \bz, \by \rangle - \langle \bx, \by \rangle - f(\bz) + f^{**}(\bx) \leq \frac{c}{-b} < 0 \Forall \bz \in \VV. \]

      This is valid for \(\bz \notin \dom f\) also as in that case \(s = f(\bz) = \infty\).

    3. Taking the supremum over \(\bz\) on the L.H.S., we obtain

      \[ f^*(\by) - \langle \bx, \by \rangle + f^{**}(\bx) \leq \frac{c}{-b} < 0. \]
    4. This implies \(f^*(\by) + f^{**}(\bx) < \langle \bx, \by \rangle\).

    5. This contradicts the Fenchel inequality.

    6. Thus, \(b < 0\) is not possible.

  15. Now consider the case where \(b=0\).

    1. Since \(f^*\) is proper, hence \(\dom f^* \neq \EmptySet\).

    2. Take any \(\widehat{\by} \in \dom f^*\).

    3. Pick some \(\epsilon > 0\).

    4. Let \(\widehat{\ba} = \ba + \epsilon \widehat{\by}\).

    5. Let \(\widehat{b} = - \epsilon\).

    6. Then, for any \(\bz \in \dom f\) with \(s=f(\bz)\),

      \[\begin{split} \langle \bz - \bx, \widehat{\ba} \rangle + (f(\bz) - f^{**}(\bx)) \widehat{b} &= \langle \bz - \bx, \ba \rangle + \langle \bz - \bx, \epsilon \widehat{\by} \rangle - \epsilon (f(\bz) - f^{**}(\bx)) \\ &= \langle \bz - \bx, \ba \rangle + \epsilon [\langle \bz - \bx, \widehat{\by} \rangle -f(\bz) + f^{**}(\bx)] \\ &\leq c + \epsilon[\langle \bz, \widehat{\by} \rangle - f(\bz) + f^{**}(\bx) - \langle \bx, \widehat{\by} \rangle]\\ &\leq c + \epsilon [f^*(\widehat{\by}) + f^{**}(\bx) - \langle \bx, \widehat{\by} \rangle]. \end{split}\]
    7. Let \(\widehat{c} = c + \epsilon [f^*(\widehat{\by}) + f^{**}(\bx) - \langle \bx, \widehat{\by} \rangle]\).

    8. Since \(c < 0\), we can pick \(\epsilon > 0\) small enough such that \(\widehat{c} < 0\).

    9. Then, we have the inequality

      \[ \langle \bz - \bx, \widehat{\ba} \rangle + (f(\bz) - f^{**}(\bx)) \widehat{b} \leq \widehat{c} < 0. \]
    10. We now have a situation similar to the case of \(b < 0\). Here we have \(\widehat{b} < 0\).

    11. Dividing both sides by \(-\widehat{b}\) and letting \(\tilde{\by} = \frac{\widehat{\ba}}{-\widehat{b}}\), we obtain

      \[ \langle \bz, \tilde{\by} \rangle - f(\bz) - \langle \bx, \tilde{\by} \rangle + f^{**}(\bx) \leq - \frac{\widehat{c}}{\widehat{b}} < 0 \Forall \bz \in \VV. \]
    12. Taking the supremum over \(\bz\) on the L.H.S., we get

      \[ f^*(\tilde{\by}) + f^{**}(\bx) < \langle \bx, \tilde{\by} \rangle. \]
    13. This again is a contradiction of Fenchel inequality.

  16. Thus, \(f^{**}(\bx) \geq f(\bx)\) must be true for every \(\bx \in \dom f\).

7.17.2.3. Indicator and Support Functions#

Example 7.80 (Indicator and support functions)

Let \(C\) be a nonempty set.

  1. By Theorem 7.237,

    \[ I^*_C = \sigma_C. \]
  2. The set \(\closure \ConvexHull C\) is nonempty, closed and convex.

  3. We note that since \(\closure \convex C\) is a nonempty, closed and convex set, hence \(I_{\closure \convex C}\) is a closed and convex function.

  4. By Theorem 7.237, we have

    \[ I^*_{\closure \convex C} = \sigma_{\closure \convex C} \]
  5. Then, by Theorem 7.242,

    \[ \sigma^*_{\closure \convex C} = (I^*_{\closure \convex C})^* = I^{**}_{\closure \convex C} = I_{\closure \convex C}. \]
  6. By Theorem 7.91,

    \[ \sigma_C = \sigma_{\closure C} \text{ and } \sigma_C = \sigma_{\convex C}. \]
  7. Combining

    \[ \sigma_C = \sigma_{\closure \convex C}. \]
  8. Thus, we have

    \[ \sigma^*_C = \sigma^*_{\closure \convex C} = I_{\closure \convex C}. \]
  9. If \(C\) is nonempty, closed and convex, then

    \[ \closure \convex C = C. \]
  10. In this case,

    \[ \sigma^*_C = I_C. \]

Summary:

For a nonempty, closed and convex set \(C \subseteq \VV\),

\[ \sigma^*_C = I_C. \]

For an arbitrary nonempty set \(C \subseteq \VV\),

\[ \sigma^*_C = I_{\closure \ConvexHull C}. \]

7.17.2.4. Max Function#

Example 7.81 (Conjugate for max function)

Let \(f : \RR^n \to \RR\) be given by

\[ f(\bx) \triangleq \max \{x_1, x_2, \dots, x_n \}. \]

We can rewrite it as

\[ f(\bx) = \underset{\by \in \Delta_n}{\sup} \langle \by, \bx \rangle = \sigma_{\Delta_n} (x) \]

where \(\Delta_n\) is the unit simplex in \(\RR^n\).

  1. Recall from Definition 7.17 that

    \[ \Delta_n \triangleq \{\bx \in \RR^n \ST \langle \bx, \bone \rangle = 1, \bx \succeq \bzero \}. \]
  2. In other words, for every \(\by \in \Delta_n\), \(y_i \geq 0\) and \(\sum y_i = 1\).

  3. \(\langle \by, \bx \rangle = \sum_{i=1}^n y_i x_i\).

  4. For \(\by \in \Delta_n\), this represents the weighted average of the components of \(\bx\).

  5. The supremum value of the weighted average occurs when \(y_i\) corresponding to the largest component of \(\bx\) is 1 and all remaining \(y_j\) are 0.

  6. This provides the justification of the formula above.

We recall that \(\Delta_n\) is a nonempty, closed and convex set. Then, following Example 7.80, the conjugate of max function \(f\) is

\[ f^* = \delta_{\Delta_n}. \]

7.17.3. Conjugate Calculus#

7.17.3.1. Separable functions#

Theorem 7.243 (Conjugate for separable functions)

Let \(g: \VV_1 \oplus \VV_2 \oplus \dots \oplus \VV_p \to \RERL\) be given by

\[ g(\bx_1, \bx_2, \dots, \bx_p) = \sum_{i=1}^p f_i (\bx_i) \]

where \(f_i : \VV_i \to \RERL\) are proper functions. Then:

\[ g^*(\by_1, \by_2, \dots, \by_p) = \sum_{i=1}^p f_i^*(\by_i) \Forall \by_i \in \VV_i^*, 1 \leq i \leq p. \]

Proof. Let \(\by = (\by_1, \dots, \by_p) \in \VV^*_1 \oplus \dots \oplus \VV^*_p\).

Then,

\[\begin{split} g^*(\by) &= g^*(\by_1, \dots, \by_p) \\ &= \sup_{\bx_1, \dots, \bx_p} \{ \langle (\bx_1, \dots, \bx_p), (\by_1, \dots, \by_p) \rangle - g(\bx_1, \dots, \bx_p)\} \\ &= \sup_{\bx_1, \dots, \bx_p} \left \{ \sum_{i=1}^p \langle \bx_i, \by_i \rangle - \sum_{i=1}^p f_i(\bx_i) \right \} \\ &= \sum_{i=1}^p \sup_{\bx_i} \{ \langle \bx_i, \by_i \rangle - f_i(\bx_i) \} \\ &= \sum_{i=1}^p f^*_i(\bx_i). \end{split}\]

7.17.3.2. Affine Transformations#

Theorem 7.244 (Invertible affine transformation)

Let \(f : \VV \to \RERL\) be an extended real valued function. Let \(\bAAA : \WW \to \VV\) be an invertible linear transformation. Let \(\ba \in \WW\), \(\bb \in \WW^*\) and \(c \in \RR\). Consider the function \(g: \WW \to \RERL\) given by:

\[ g(\bx) \triangleq f\left (\bAAA (\bx - \ba) \right ) + \langle \bx, \bb \rangle + c \Forall \bx \in \WW. \]

Then the convex conjugate of \(g\) is given by:

\[ g^*(\by) = f^*\left ((\bAAA^T)^{-1} (\by - \bb) \right ) + \langle \by, \ba \rangle - \langle \bb, \ba \rangle - c \Forall \by \in \WW^*. \]

Proof. We introduce a variable \(\bz = \bAAA (\bx - \ba)\).

  1. We can see that \(\bz = \bAAA \bx - \bAAA \ba\).

  2. Thus, \(\bx \mapsto \bz\) is an affine transformation.

  3. Since \(\bAAA\) is invertible, hence this affine transformation is also invertible.

  4. Also, by invertibility of \(\bAAA\), \(\bx = \bAAA^{-1}(\bz) + \ba\).

Now, for any \(\by \in \WW^*\),

\[\begin{split} g^*(\by) &= \sup_{\bx \in \WW} \{ \langle \bx, \by \rangle - g(\bx) \} \\ &= \sup_{\bx } \{ \langle \bx, \by \rangle - f\left (\bAAA (\bx - \ba) \right ) - \langle \bx, \bb \rangle - c \} \\ &= \sup_{\bz} \{ \langle \bAAA^{-1}(\bz) + \ba, \by \rangle - f (\bz) - \langle \bAAA^{-1}(\bz) + \ba, \bb \rangle - c \} \\ &= \sup_{\bz} \{ \langle \bAAA^{-1}(\bz), \by - \bb \rangle - f (\bz) + \langle \ba, \by \rangle - \langle \ba, \bb \rangle - c \} \\ &= \sup_{\bz} \{ \langle \bz, (\bAAA^{-1})^T(\by - \bb) \rangle - f (\bz) + \langle \ba, \by \rangle - \langle \ba, \bb \rangle - c \} \\ &= f^* \left ( (\bAAA^{-1})^T(\by - \bb) \right ) + \langle \ba, \by \rangle - \langle \ba, \bb \rangle - c\\ &= f^* \left ( (\bAAA^T)^{-1}(\by - \bb) \right ) + \langle \by, \ba \rangle - \langle \bb, \ba \rangle - c\\ \end{split}\]

In this derivation, we have used following facts

  1. \((\bAAA^T)^{-1} = (\bAAA^{-1})^T\).

  2. \(\langle \bx, \by \rangle = \langle \by, \bx \rangle\).

  3. \(\langle \bA (\bx), \by \rangle = \langle \bx, \bA^T (\by) \rangle\).

7.17.3.3. Scaling#

Theorem 7.245 (Scaling)

Let \(f : \VV \to \RERL\) be an extended real valued function. Let \(\alpha > 0\).

  1. The conjugate of the function \(g(\bx) = \alpha f(\bx)\) is given by:

    \[ g^*(\by) = \alpha f^*\left (\frac{\by}{\alpha} \right ) \Forall \by \in \VV^*. \]
  2. The conjugate of the function \(h(\bx) = \alpha f(\frac{\bx}{\alpha})\) is given by:

    \[ h^*(\by) = \alpha f^*(\by) \Forall \by \in \VV^*. \]

Proof. (1) For any \(\by \in \VV^*\)

\[\begin{split} g^*(\by) &= \sup_{\bx \in \VV} \{ \langle \bx, \by \rangle - g(\bx) \} \\ &= \sup_{\bx} \{ \langle \bx, \by \rangle - \alpha f(\bx) \} \\ &= \alpha \sup_{\bx} \{ \langle \bx, \frac{\by}{\alpha} \rangle - \alpha f(\bx) \} \\ &= \alpha f^*\left ( \frac{\by}{\alpha} \right ). \end{split}\]

(2) Similarly, for any \(\by \in \VV^*\)

\[\begin{split} h^*(\by) &= \sup_{\bx \in \VV} \{ \langle \bx, \by \rangle - h(\bx) \} & \\ &= \sup_{\bx \in \VV} \{ \langle \bx, \by \rangle - \alpha f(\frac{\bx}{\alpha}) \} & s\\ &= \alpha \sup_{\bx \in \VV} \left \{ \left \langle \frac{\bx}{\alpha}, \by \right \rangle - f\left (\frac{\bx}{\alpha} \right ) \right \} & \\ &= \alpha \sup_{\bz \in \VV} \{ \langle \bz, \by \rangle - f(\bz) \} & \bz \triangleq \frac{\bx}{\alpha} \\ &= \alpha f^*(\by). \end{split}\]

7.17.4. Subgradients#

Theorem 7.246 (Conjugate subgradient theorem)

Let \(f : \VV \to \RERL\) be proper and convex. The following statements are equivalent for any \(\bx \in \VV\) and \(\by \in \VV^*\):

  1. \(\langle \by, \bx \rangle = f(\bx) + f^*(\by)\).

  2. \(\by \in \partial f(\bx)\).

If \(f\) is closed, then 1 and 2 are equivalent to:

3.. \(\bx \in \partial f^*(\by)\).

Proof. \(\by \in \partial f(\bx)\) is true if and only if

\[ f(\bz) \geq f(\bx) + \langle \bz - \bx , \by \rangle \Forall \bz \in \VV. \]

This is equivalent to

\[ \langle \bx , \by \rangle - f(\bx) \geq \langle \bz , \by \rangle - f(\bz) \Forall \bz \in \VV. \]

Taking this supremum over \(\bz\) on the R.H.S., we see that this is equivalent to

\[ \langle \bx , \by \rangle - f(\bx) \geq f^*(\by). \]

By Fenchel’s inequality Theorem 7.238,

\[ f(\bx) + f^*(\by) \geq \langle \bx, \by \rangle. \]

Thus, the previous inequality must be an equality, giving us

\[ f(\bx) + f^*(\by) = \langle \bx, \by \rangle. \]

This establishes the equivalence between (1) and (2).

We now assume that \(f\) is closed. Thus \(f\) is proper, closed and convex. Due to Theorem 7.242

\[ f^{**} = f. \]

Then (1) is equivalent to

\[ g^*(\bx) + g(\by) = \langle \bx, \by \rangle. \]

where \(g = f^*\).

Then applying the equivalence between (1) and (2) for the function \(g\), we see that

\[ \bx \in \partial g(\by) = \partial f^*(\by). \]

Corollary 7.26 (Conjugate subgradient theorem - second formulation)

Let \(f : \VV \to \RERL\) be a proper, closed and convex function. Then for any \(\bx \in \VV\) and \(\by \in \VV^*\):

\[ \partial f(\bx) = \argmax_{\bu \in \VV^*} \{ \langle \bx, \bu \rangle - f^*(\bu) \} \]

and

\[ \partial f^*(\by) = \argmax_{\bv \in \VV} \{ \langle \bv, \by \rangle - f(\bv) \}. \]

Proof. We proceed as follows.

  1. Let \(\by \in \dom \partial f^*\) and \(\bx \in \partial f^*(\by)\).

  2. Then by Theorem 7.246, it is equivalent to

    \[ f^*(\by) = \langle \bx, \by \rangle - f(\bx). \]
  3. By definition of the conjugate function

    \[ f^*(\by) = \sup_{\bv \in \VV} \{ \langle \bv, \by \rangle - f(\bv) \}. \]
  4. since the supremum is attained at \(\bx\), hence this is equivalent to

    \[ \bx \in \argmax_{\bv \in \VV} \{ \langle \bv, \by \rangle - f(\bv) \}. \]
  5. Hence

    \[ \partial f^*(\by) = \argmax_{\bv \in \VV} \{ \langle \bv, \by \rangle - f(\bv) \}. \]
  6. Similarly, let \(\bx \in \partial f\) and \(\by \in \partial f(\bx)\).

  7. Following similar argument, this is equivalent to

    \[ \by \in \argmax_{\bu \in \VV^*} \{ \langle \bx, \bu \rangle - f^*(\bu) \}. \]
  8. Hence

    \[ \partial f(\bx) = \argmax_{\bu \in \VV^*} \{ \langle \bx, \bu \rangle - f^*(\bu) \}. \]

7.17.4.1. Lipschitz Continuity#

Recall that Theorem 7.232 shows the equivalence between the boundedness of the subgradients of a function \(f\) over a set \(X\) and the Lipschitz continuity of \(f\) over \(X\). A similar result is available in terms of the boundedness of the domain of the conjugate.

Theorem 7.247 (Lipschitz continuity and boundedness of the domain of the conjugate)

Let \(f: \VV \to \RR\) be convex. Then the following statements are equivalent for a constant \(L > 0\).

  1. \(| f(\bx) - f(\by) | \leq L \| \bx - \by \|\) for any \(\bx, \by \in \VV\).

  2. \( \| \bg \|_* \leq L\) for any \(\bg \in \partial f(\bx)\) where \(\bx \in \VV\).

  3. \(\dom f^* \subseteq B_{\| \cdot \|_*} [\bzero, L]\).

Proof. The equivalence between (1) and (2) follows from Theorem 7.232.

We first show that (3) \(\implies\) (2)

  1. Assume that (3) holds true.

  2. By conjugate subgradient theorem (Corollary 7.26), for any \(\bx \in \VV\),

    \[ \partial f(\bx) = \argmax_{\by \in \VV^*} \{ \langle \bx, \by \rangle - f^*(\by) \}. \]
  3. The maximum on the R.H.S. can be attained only at points where \(f^*(\by) < \infty\).

  4. Thus if \(\by \in \partial f(\bx)\) then \(f^*(\by) < \infty\) must hold true.

  5. Thus \(\partial f(\bx) \subseteq \dom f^*\) for every \(\bx \in \VV\).

  6. By hypothesis (3) \(\partial f(\bx) \subseteq B_{\| \cdot \|_*} [\bzero, L]\) for every \(\bx \in \VV\).

  7. Hence \(\bg \in B_{\| \cdot \|_*} [\bzero, L]\) for every \(\bg \in \partial f(\bx)\) for every \(\bx \in \VV\).

  8. Hence \(\| \bg \|_* \leq L\) for every \(\bg \in \partial f(\bx)\) for every \(\bx \in \VV\)..

In the reverse direction, we will show that (1) \(\implies\) (3)

  1. Towards this, we shall show that if (1) is true then for any \(\bz \in \VV^*\), if \(\| \bz \|_* > L\) then \(\bz \notin \dom f^*\).

  2. By hypothesis (1), for every \(\bx \in \VV\),

    \[ f(\bx) - f(\bzero) \leq |f(\bx) - f(\bzero) | \leq L \| \bx \|. \]
  3. Rearranging

    \[ -f(\bx) \geq - f(\bzero) - L \| \bx \| \]

    holds true for every \(\bx \in \VV\).

  4. Then for any \(\by \in \VV^*\)

    \[\begin{split} f^*(\by) &= \sup_{\bx \in \VV} \{\langle \bx, \by \rangle - f(\bx) \} \\ &\geq \sup_{\bx \in \VV} \{\langle \bx, \by \rangle - f(\bzero) - L \| \bx \| \}. \end{split}\]
  5. Pick any \(\bz \in \VV^*\) satisfying \(\| \bz \|_* > L\).

  6. Pick a vector \(\bu \in \VV\) such that \(\| \bu \| = 1\) and \(\langle \bu, \bz \rangle = \| \bz \|_*\). Such a vector exists by definition of dual norm.

  7. Consider the ray \(C = \{ t \bu \ST t \geq 0 \} \subseteq \VV\).

  8. Continuing with the inequality for \(f^*\) for \(\by = \bz\)

    \[\begin{split} f^*(\bz) &\geq \sup_{\bx \in \VV} \{\langle \bx, \bz \rangle - f(\bzero) - L \| \bx \| \}\\ &\geq \sup_{\bx \in C} \{\langle \bx, \bz \rangle - f(\bzero) - L \| \bx \| \}\\ &= \sup_{t \geq 0} \{\langle t \bu, \bz \rangle - f(\bzero) - L t \| \bu \| \}\\ &= \sup_{t \geq 0} \{t \| \bz \|_* - f(\bzero) - L t \}\\ &= \sup_{t \geq 0} \{t (\| \bz \|_* - L) - f(\bzero) \}\\ &= \infty \end{split}\]

    since \(\| \bz \|_* > L\).

  9. Hence \(\bz \notin \dom f^*\).

  10. Hence for any \(\by \in \dom f^*\) we have \(\| \by \|_* \leq L\).

  11. Hence \(\dom f^* \subseteq B_{\| \cdot \|_*} [\bzero, L]\) as desired.

7.17.5. 1-dim Functions#

Theorem 7.248 (Exponent)

Let \(f : \RR \to \RR\), be given by \(f(x) = e^x\). Then, the conjugate is given by:

\[\begin{split} f^*(y) = \begin{cases} y \ln y - y, & y \geq 0, \\ \infty, & \text{ otherwise }. \end{cases} \end{split}\]

Proof. To see this, we proceed as follows:

\[\begin{split} f^*(y) &= \sup_{x \in \RR} \{ x y - f(x) \} \\ &= \sup_{x \in \RR} \{ x y - e^x \}. \end{split}\]
  1. If \(y < 0\), then the supremum value is \(\infty\) as \(x \to -\infty\).

  2. If \(y = 0\), then the supremum value is \(0\) as \(x \to -\infty\).

  3. If \(y > 0\), then the unique maximizer is obtained by differentiating \(x y - e^x\) w.r.t. \(x\) giving us \(y = e^{\tilde{x}}\) or \(\tilde{x} = \ln y\).

  4. Then, the supremum value for \(y > 0\) is \(y \ln y - y\).

  5. Under the convention that \(y \ln y = 0\) when \(y = 0\), we see that \(y \ln y - y = 0\) for \(y = 0\).

  6. Thus, \(f^*(y) = y \ln y - y\) for \(y \geq 0\).

  7. \(f^*(y) = \infty\) otherwise.

Theorem 7.249 (Negative log)

Let \(f : \RR \to (-\infty,\infty]\) be given by:

\[\begin{split} f(x) = \begin{cases} - \ln (x) & x > 0 \\ \infty & x \leq 0 \end{cases}. \end{split}\]

Then, the conjugate is:

\[\begin{split} f^*(y) = \begin{cases} -1 - \ln (-y) & y < 0 \\ \infty & y \geq 0 \end{cases}. \end{split}\]

Proof. To show this, we proceed as follows:

\[\begin{split} f^*(y) &= \sup_{x > 0} \{ xy - f(x) \} \\ &= \sup_{x > 0} \{ xy + \ln (x) \}. \end{split}\]
  1. If \(y \geq 0\), then the supremum value is \(\infty\).

  2. If \(y < 0\), then we can differentiate the expression \(xy + \ln (x)\) and set it to 0 to obtain the optimal solution.

    \[ \frac{d}{d x} (x y + \ln(x)) = y + \frac{1}{x}. \]
  3. Setting it to zero, we get \(\tilde{x} = - \frac{1}{y}\).

  4. The supremum value is thus \(-1 - \ln(-y)\).

Theorem 7.250 (Hinge loss)

Let \(f : \RR \to \RR\) be given by:

\[ f(x) = \max \{1 - x, 0 \} \]

Then, the conjugate is:

\[ f^*(y) = y + I_{[-1,0]} (y) \Forall y \in \RR \]

where \(I\) is the indicator function.

Proof. To see this, note that

\[\begin{split} f^*(y) &= \sup_{x} \{ xy - f(x) \} \\ &= \sup_{x} [ xy - \max \{1 - x, 0 \} ] \\ &= \sup_{x} [ \min \{xy - (1-x), xy \} ] \\ &= \sup_{x} [ \min \{(1+y)x - 1, yx \} ]. \end{split}\]
  1. The terms \((1+y)x - 1\) and \(yx\) are affine in \(x\) representing straight lines.

  2. The two lines intersect at

    \[\begin{split} & (1+y)x - 1 = y x \\ & \implies x - 1 = 0 \\ & \implies x = 1. \end{split}\]
  3. For \(x < 1\), \((1+y)x - 1 < y x\).

  4. For \(x > 1\), \((1+y)x - 1 > y x\).

  5. Thus, the function \( \min \{(1+y)x - 1, y x \}\) is a piece-wise linear function of \(x\).

    \[\begin{split} \min \{(1+y)x - 1, y x \} = \begin{cases} (1+y)x - 1, & x < 1 \\ y x, & x \geq 1. \end{cases} \end{split}\]
  6. The first piece is a line \((1+y)x - 1\) with slope \(1+y\) over \(x \in (-\infty, 1]\).

  7. The second piece is a line \(y x\) with slope \(y\) over \(x \in [1, \infty)\).

  8. A finite supremum of this piecewise linear function exists if the slope of the left piece is nonnegative and the slope of the right piece is nonpositive.

  9. In other words, \(1 + y \geq 0\) and \(y \leq 0\).

  10. Thus, a finite supremum exists only for \(y \in [-1, 0]\).

  11. The supremum value for this range of \(y\) values is attained at \(x=1\) (where the two pieces intersect) and the supremum value equals \(y\).

  12. For all \(y \notin [-1, 0]\) the supremum is infinite.

  13. Thus, \(\dom f^* = [-1,0]\) and \(f^*(y) = y \Forall y \in \dom f^*\).

  14. \(f^*(y) = \infty \Forall y \notin [-1,0]\).

  15. This is succinctly represented in the expression for \(f^*\) with the help of an indicator function above.

Theorem 7.251 (p-th power of absolute value for \(p > 1\).)

Let for some \(p > 1\), the function \(f : \RR \to \RR\) be given by:

\[ f(x) = \frac{1}{p} | x |^p \]

Then, its conjugate is:

\[ f^*(y) = \frac{1}{q} | y |^q \quad \Forall y \in \RR \]

where \(q > 1\) is the conjugate exponent satisfying:

\[ \frac{1}{p} + \frac{1}{q} = 1. \]

Proof. To see this, we proceed as follows:

  1. The conjugate is given by

    \[ f^*(y) = \sup_{x} \{ x y - f (x) \} = \sup_x \{x y - \frac{1}{p} | x |^p \}. \]
  2. The concave function \(xy - \frac{1}{p} | x |^p\) is differentiable.

    \[ \frac{d}{d x} (xy - \frac{1}{p} | x |^p) = y - \sgn (x) | x |^{p - 1}. \]
  3. Setting the derivative to \(0\), the points at which the derivative vanishes are given by

    \[ y - \sgn (\tilde{x}) | \tilde{x} |^{p - 1} = 0. \]
  4. We can rewrite this as

    \[ y = \sgn(y) | y | = \sgn (\tilde{x}) | \tilde{x} |^{p - 1}. \]
  5. Therefore, \(\sgn (y) = \sgn (\tilde{x})\) and \(| \tilde{x} |^{p - 1} = | y|\).

  6. Thus,

    \[ \tilde{x} = \sgn (\tilde{x}) |\tilde{x} | = \sgn (y)| y|^{\frac{1}{p - 1}}. \]
  7. Accordingly, the supremum value at \(\tilde{x}\) is

    \[\begin{split} f^*(y) &= \tilde{x} y - \frac{1}{p} | \tilde{x} |^p \\ &= |y|^{1 + \frac{1}{p - 1}} - \frac{1}{p}| y|^{\frac{p}{p - 1}} \\ &= |y|^{\frac{p}{p - 1}} - \frac{1}{p}| y|^{\frac{p}{p - 1}} \\ &= \left (1 - \frac{1}{p} \right ) | y|^{\frac{p}{p - 1}} \\ &= \frac{1}{q} | y|^q \end{split}\]

    where \(q = \frac{p}{p -1}\) is the conjugate exponent of \(p\).

Theorem 7.252 (p-th power of absolute value for \(0 < p < 1\))

Let for some \(0 < p < 1\), \(f : \RR \to \RR\) be given by:

\[\begin{split} f(x) = \begin{cases} - \frac{1}{p} x^p & x \geq 0 \\ \infty & x < 0 \end{cases}. \end{split}\]

Then, its conjugate is:

\[\begin{split} f^*(y) = \begin{cases} - \frac{(-y)^q}{q} & y < 0 \\ \infty & \text{ otherwise } \end{cases} \end{split}\]

where \(q < 0\) is a negative number satisfying:

\[ \frac{1}{p} + \frac{1}{q} = 1. \]

Proof. To see this, we proceed as follows:

  1. The conjugate is given by

    \[ f^*(y) = \sup_{x} \{ x y - f (x) \} = \sup_{x \geq 0} \{x y + \frac{1}{p} x^p \}. \]
  2. Define \( g : \RR \to \RR\) with \(\dom g = \RR_+\)

    \[ g(x) \triangleq x y + \frac{1}{p} x ^p. \]
  3. When \(y \geq 0\), then \(g(x) \to \infty\) as \(x \to \infty\).

  4. When \(y < 0\), then \(g'(x) = 0\) at \(x = \tilde{x} = (-y)^{\frac{1}{p -1}}\).

  5. We note that \(g\) is a concave function for \(y < 0\).

  6. Thus, \(g\) achieves its supremum at \(\tilde{x}\).

  7. The supremum value is given by

    \[\begin{split} f^*(y) &= g(\tilde{x}) \\ &= \tilde{x} y + \frac{1}{p} \tilde{x}^p \\ &= (-y)^{\frac{1}{p -1}} y + \frac{1}{p} (-y)^{\frac{p}{p -1}} \\ &= - (-y)^{\frac{p}{p -1}} + \frac{1}{p} (-y)^{\frac{p}{p -1}} \\ &= - \left (1 - \frac{1}{p} \right ) (-y)^{\frac{p}{p -1}} \\ &= - \frac{1}{q} (-y)^{q} \end{split}\]

    where \(q = \frac{p}{p -1}\).

  8. The domain of \(f^*\) is \(y < 0\).

7.17.6. n-dim functions#

7.17.6.1. Quadratics#

Theorem 7.253 (Strictly convex quadratic)

Let \(f : \RR^n \to \RR\) be given by:

\[ f(\bx) \triangleq \frac{1}{2} \bx^T \bA \bx + \bb^T \bx + c \]

where \(A \in \SS^n_{++}\), \(\bb \in \RR^n\) and \(c \in \RR\).

Then, the conjugate is:

\[ f^*(\by) = \frac{1}{2}(\by - \bb)^T \bA^{-1}(\by -\bb) - c. \]

Proof. We proceed as follows. For any \(\by \in \RR^n\),

\[\begin{split} f^*(\by) &= \sup_{\bx} \{ \langle \bx, \by \rangle - f(\bx) \} \\ &= \sup_{\bx} \{ \by^T \bx - \frac{1}{2} \bx^T \bA \bx - \bb^T \bx - c \} \\ &= \sup_{\bx} \{ - \frac{1}{2} \bx^T \bA \bx - (\bb - \by)^T \bx - c \}. \end{split}\]
  1. The supremum of the quadratic above is achieved at \(\bx = \bA^{-1}(\by - \bb)\).

  2. The supremum value is \(\frac{1}{2}(\by - \bb)^T \bA^{-1}(\by -\bb) - c\).

Theorem 7.254 (Convex quadratic)

Let \(f : \RR^n \to \RR\) be given by:

\[ f(\bx) \triangleq \frac{1}{2} \bx^T \bA \bx + \bb^T \bx + c \]

where \(A \in \SS^n_{+}\), \(\bb \in \RR^n\) and \(c \in \RR\).

Then, the conjugate is:

\[\begin{split} f^*(\by) = \begin{cases} \frac{1}{2}(\by - \bb)^T \bA^{\dag}(\by -\bb) - c & \by \in \bb + \range \bA,\\ \infty & \text{ otherwise }. \end{cases} \end{split}\]

Proof. In this case, \(\bA\) is not positive definite but only semidefinite.

For any \(\by \in \RR^n\),

\[\begin{split} f^*(\by) &= \sup_{\bx} \{ \langle \bx, \by \rangle - f(\bx) \} \\ &= \sup_{\bx} \{ \by^T \bx - \frac{1}{2} \bx^T \bA \bx - \bb^T \bx - c \} \\ &= \sup_{\bx} \{ - \frac{1}{2} \bx^T \bA \bx + (\by - \bb)^T \bx - c \}. \end{split}\]
  1. Define \(g : \RR^n \to \RR\) as

    \[ g(\bx) = - \frac{1}{2} \bx^T \bA \bx + (\by - \bb)^T \bx - c. \]
  2. Then,

    \[ f^*(\by) = \sup_{\bx} g(\bx). \]
  3. The gradient of \(g\) is given by \(\nabla g(\bx) = - \bA \bx + (\by - \bb)\).

  4. The maximizers of \(g\) are at points where the gradient vanishes, given by \(\bA \bx = \by - \bb\).

  5. This system has a solution if and only if \(\by \in \range \bA + \bb\).

  6. If \(\by \in \range \bA + \bb\), we can choose one of the solutions to the zero gradient equation above.

  7. One possible solution is given by the Moore-Penrose pseudo inverse of \(\bA\), namely \(\tilde{\bx} = \bA^{\dag} (\by - \bb)\).

  8. For this solution,

    \[\begin{split} f^*(\by) &= - \frac{1}{2} \tilde{\bx}^T \bA \tilde{\bx} + (\by - \bb)^T \tilde{\bx} - c \\ &= - \frac{1}{2} (\by - \bb)^T \bA^{\dag} \bA \bA^{\dag} (\by - \bb) + (\by - \bb)^T \bA^{\dag} (\by - \bb) - c\\ &= \frac{1}{2} (\by - \bb)^T \bA^{\dag} (\by - \bb) - c. \end{split}\]

    We used the fact that \((\bA^{\dag})^T = \bA^{\dag}\) since \(\bA\) is symmetric. Also, \(\bA^{\dag} \bA \bA^{\dag} = \bA^{\dag}\).

  9. We now consider the case where \(\by \notin \range \bA + \bb\).

  10. This is same as \(\by - \bb \notin \range \bA\).

  11. Recall that \(\range \bA = (\nullspace \bA)^{\perp}\).

  12. Thus, \(\by - \bb \notin (\nullspace \bA)^{\perp}\).

  13. Thus, there exists a vector \(\bv \in \nullspace \bA\) such that \((\by - \bb)^T \bv > 0\).

  14. Now, for any \(t \in \RR\),

    \[\begin{split} g(t \bv) &= - \frac{1}{2} \bx^T \bA \bx + (\by - \bb)^T \bx - c \\ &= - \frac{t^2}{2} \bv^T \bA \bv + t (\by - \bb)^T \bv - c \\ &= t (\by - \bb)^T \bv - c \end{split}\]

    since \(\bA \bv = \bzero\).

  15. Since \((\by - \bb)^T \bv > 0\), hence \(g(t \bv) \to \infty\) as \(t \to \infty\).

  16. Thus, \(f^*(\by) = \sup_{\bx} g(\bx) = \infty\) if \(\by \notin \range \bA + \bb\).

  17. Thus, \(\dom f^* = \range \bA + \bb\).

7.17.6.2. Entropy#

Theorem 7.255 (Negative entropy)

Let \(f : \RR^n \to \RERL\) be given by:

\[\begin{split} f(\bx) \triangleq \begin{cases} \sum_{i=1}^n x_i \ln (x_i) & \bx \succeq \bzero,\\ \infty & \text{ otherwise }. \end{cases} \end{split}\]

Then, the conjugate is:

\[ f^*(\by) = \sum_{i=1}^n e^{y_i - 1}. \]

Proof. We note that the function \(f(\bx)\) is separable over the components of \(\bx\). Thus, it is enough to compute the conjugate of the scalar function \(g : \RR \to \RR\) defined as

\[\begin{split} g(t) = \begin{cases} t \ln t & \Forall t \geq 0, \\ \infty & \text{ otherwise }. \end{cases} \end{split}\]

We can then apply Theorem 7.243 to compute the conjugate of \(f\).

Now,

\[\begin{split} g^*(s) &= \sup_{t} \{ st - g(t) \} \\ &= \sup_{t \geq 0} \{ st - t \ln t \}. \end{split}\]

It is easy to see that the supremum of the expression \(st - t \ln t\) is achieved at \(t = e^{s - 1}\). Thus,

\[ g^*(s) = s e^{s - 1} - (s - 1) e^{s - 1} = e^{s - 1}. \]

Finally,

\[ f(\bx) = \sum_{i=1}^n g(x_i). \]

Thus, due to Theorem 7.243,

\[ f^*(\by) = \sum_{i=1}^n g^* (y_i) = \sum_{i=1}^n e^{y_i - 1}. \]

Theorem 7.256 (Negative sum of logs)

Let \(f : \RR^n \to \RERL\) be given by:

\[\begin{split} f(\bx) \triangleq \begin{cases} - \sum_{i=1}^n \ln (x_i) & \bx \succ \bzero,\\ \infty & \text{ otherwise }. \end{cases} \end{split}\]

Then, the conjugate is:

\[\begin{split} f^*(\by) = \begin{cases} - n - \sum_{i=1}^n \ln(-y_i) & \by \prec \bzero, \\ \infty & \text{ otherwise}. \end{cases} \end{split}\]

Proof. \(f\) is separable. Define \(g : \RR \to \RR\) as

\[\begin{split} g(t) = \begin{cases} - \ln t & t > 0, \\ \infty & t \leq 0. \end{cases} \end{split}\]

Then,

\[ f(\bx) = \sum_{i=1}^n g(x_i). \]

By Theorem 7.249,

\[\begin{split} g^*(y) = \begin{cases} -1 - \ln (-y) & y < 0 \\ \infty & y \geq 0 \end{cases}. \end{split}\]

Due to Theorem 7.243,

\[ f^*(\by) = \sum_{i=1}^n g^*(y_i). \]

For \(\by \prec \bzero\)

\[ f^*(\by) = \sum_{i=1}^n (- 1 - \ln (-y_i)) = -n - \sum_{i=1}^n \ln (-y_i). \]

For \(\by \succeq \bzero\), one of the \(g^*(y_i) = \infty\). Hence, \(f^*(\by) = \infty\).

Theorem 7.257 (Negative entropy over unit simplex)

Let \(f : \RR^n \to \RERL\) be given by:

\[\begin{split} f(\bx) \triangleq \begin{cases} \sum_{i=1}^n x_i \ln x_i & \bx \in \Delta_n\\ \infty & \text{ otherwise } \end{cases}. \end{split}\]

The conjugate is:

\[ f^*(\by) = \ln \left ( \sum_{j=1}^n e^{y_j} \right ) \]

Recall from Definition 7.17 that a unit simplex in \(\RR^n\) is the set of nonnegative vectors with elements summing up to \(1\).

\[ \Delta_n = \{\bx \in \RR^n \ST \langle \bx, \bone \rangle = 1, \bx \succeq \bzero \}. \]

Proof. For any \(\by \in \RR^n\),

\[\begin{split} f^*(\by) &= \sup_{\bx} \{ \langle \bx, \by \rangle - f(\bx)\}\\ &= \sup_{\bx} \left \{ \sum_{i=1}^n y_i x_i - \sum_{i=1}^n x_i \ln x_i \ST \sum_{i=1}^n x_i = 1, x_1, \dots, x_n \geq 0 \right \}. \end{split}\]

This maximization problem is equivalent to the minimization problem discussed later in Example 8.14. The optimal solution is given by

\[ x_i^* = \frac{e^{y_i}}{\sum_{j=1}^n e^{y_j}} \Forall i = 1,\dots,n. \]

Accordingly, the optimal (supremum) value (following Example 8.14) is

\[ f^*(\by) = \ln \left ( \sum_{i=1}^n e^{y_i} \right ). \]

In other words, the conjugate of the negative entropy function is the log-sum-exp function.

Theorem 7.258 (Log sum exp)

Let \(f : \RR^n \to \RR\) be given by:

\[ f(\bx) \triangleq \ln \left ( \sum_{j=1}^n e^{x_j} \right ). \]

The conjugate is:

\[\begin{split} f^*(\by) = \begin{cases} \sum_{i=1}^n y_i \ln y_i & \by \in \Delta_n\\ \infty & \text{ otherwise } \end{cases}. \end{split}\]

Proof. Following Theorem 7.257, \(f = g^*\) where \(g\) is the negative entropy over the unit simplex function. Since \(g\) is proper, closed and convex, hence due to Theorem 7.242,

\[ f^* = g^{**} = g. \]

Log-sum-exp and negative entropy over simplex and conjugate of each other.

7.17.6.3. Norms#

Theorem 7.259 (Norm)

Let \(f : \VV \to \RR\) be given by

\[ f(\bx) = \| \bx \| \]

Then, the conjugate \(f^* : \VV^* \to \RERL\) for any \(\by \in \VV^*\) is given by:

\[\begin{split} f^*(\by) = I_{B_{\| \cdot \|_*}[\bzero, 1]} = \begin{cases} 0 & \| \by \|_* \leq 1 \\ \infty & \text{ otherwise }. \end{cases} \end{split}\]

In other words, it is the indicator function for the unit ball w.r.t. the dual norm \(\| \cdot \|_*\).

Proof. We proceed as follows.

  1. By Theorem 7.95, the support function for a closed unit ball of a norm is its dual norm.

  2. Thus,

    \[ \sigma_{B_{\| \cdot \|_*}[\bzero, 1]} = \| \cdot \| = f. \]

    Here, we have used the fact that the dual norm of the dual norm is the norm itself.

  3. By Example 7.80, for a nonempty, closed and convex set \(C \subseteq \VV\),

    \[ \sigma^*_C = I_C. \]
  4. The set \(B_{\| \cdot \|_*}[\bzero, 1]\) is nonempty, closed and convex.

  5. Hence,

    \[ f^* = \sigma^*_{B_{\| \cdot \|_*}[\bzero, 1]} = I_{B_{\| \cdot \|_*}[\bzero, 1]}. \]

Theorem 7.260 (Squared norm)

Let \(f : \VV \to \RR\) be given by

\[ f(\bx) = \frac{1}{2}\| \bx \|^2 \]

Then, the conjugate \(f^* : \VV^* \to \RERL\) for any \(\by \in \VV^*\) is given by:

\[ f^*(\by) = \frac{1}{2} \| \by \|_*^2. \]

Proof. We proceed as follows

  1. By definition of the conjugate

    \[ f^*(\by) = \sup_{\bx \in \VV}\left \{ \langle \bx, \by \rangle - \frac{1}{2}\| \bx \|^2 \right \}. \]
  2. Consider the set \(S_t = \{ \bx \in \RR \ST \| \bx \| = t \}\).

  3. Then, we can write \(\VV = \bigcup_{t \geq 0} S_t\).

  4. Define

\[ g_t(\by) = \sup_{\bx \in S_t}\left \{ \langle \bx, \by \rangle - \frac{1}{2}\| \bx \|^2 \right \} = \sup_{\bx \ST \| \bx \| = t} \left \{ \langle \bx, \by \rangle - \frac{1}{2}t^2 \right \}. \]
  1. Then it is easy to see that

    \[ f^*(\by) = \sup_{t \geq 0}g_t(\by). \]
  2. In other words, we transform the maximization (for conjugate) into a double maximization as

    \[ f^*(\by) = \sup_{t \geq 0} \sup_{\bx \ST \| \bx \| = t} \left \{ \langle \bx, \by \rangle - \frac{1}{2}t^2 \right \}. \]
  3. Now

    \[\begin{split} g_t(\by) &= \sup_{\bx \ST \| \bx \| = t} \left \{ \langle \bx, \by \rangle - \frac{1}{2}t^2 \right \}\\ &= \sup_{\bx \ST \| \bx \| = t} \{ \langle \bx, \by \rangle \} - \frac{1}{2}t^2 \\ &= t \| \by \|_* - \frac{1}{2}t^2. \end{split}\]
  4. Hence

    \[ f^*(\by) = \sup_{t \geq 0}g_t(\by) = \sup_{t \geq 0} \left ( t \| \by \|_* - \frac{1}{2}t^2 \right ). \]
  5. Differentiating \(g_t(\by)\) w.r.t. \(t\) and setting it to zero, we obtain

    \[ \| \by \|_* - t^* = 0 \]
  6. Evaluating \(g_t(\by)\) at \(t^* = \| \by \|_*\), we get

    \[ f^*(\by) = \| \by \|_*^2 - \frac{1}{2} \| \by \|_*^2 = \frac{1}{2} \| \by \|_*^2. \]

Theorem 7.261 (Ball-pen)

Let \(f : \VV \to \RERL\) be given by

\[\begin{split} f(\bx) \triangleq \begin{cases} - \sqrt{1 - \| x \|^2} & \| x \| \leq 1\\ \infty & \text{ otherwise }. \end{cases} \end{split}\]

Then, the conjugate \(f^* : \VV^* \to \RERL\) for any \(\by \in \VV^*\) is given by:

\[ f^*(\by) = \sqrt{\| y \|_*^2 + 1}. \]

Generalizing further, let \(f_{\alpha}\) for some \(\alpha > 0\) be defined as

\[\begin{split} f_{\alpha}(\bx) \triangleq \begin{cases} - \sqrt{\alpha^2 - \| x \|^2} & \| x \| \leq \alpha\\ \infty & \text{ otherwise }. \end{cases} \end{split}\]

The conjugate is given by:

\[ f_{\alpha}^*(\by) = \alpha \sqrt{\| y \|_*^2 + 1}. \]

Proof. We shall use the double maximization approach here.

\[\begin{split} f^*(\by) &= \sup \left \{ \langle \bx, \by \rangle + \sqrt{1 - \| x \|^2} \ST \| \bx \| \leq 1 \right \} \\ &= \sup_{t \in [0,1]} \sup_{\bx \ST \| \bx \| = t} \left \{ \langle \bx, \by \rangle + \sqrt{1 - t^2} \right \} \\ &= \sup_{t \in [0,1]} \left \{ t \| \by \|_* + \sqrt{1 - t^2} \right \}. \end{split}\]

Introduce

\[ g(t) = t \| \by \|_* + \sqrt{1 - t^2}. \]

Differentiating w.r.t. \(t\) and setting the derivative to 0, we get

\[\begin{split} & \| \by \|_* - \frac{ t }{\sqrt{1-t^2}} = 0 \\ \implies & \| \by \|_* = \frac{ t }{\sqrt{1-t^2}} \\ \implies & \| \by \|_*^2 = \frac{ t^2 }{1-t^2} \\ \implies & 1 + \| \by \|_*^2 = \frac{1 }{1-t^2} \\ \implies & \frac{1}{1 + \| \by \|_*^2} = 1-t^2 \\ \implies & 1 - \frac{1}{1 + \| \by \|_*^2} = t^2 \\ \implies & \frac{\| \by \|_*^2}{1 + \| \by \|_*^2} = t^2 \\ \implies & t^* = \frac{\| \by \|_*}{\sqrt{1 + \| \by \|_*^2}}. \end{split}\]

Since \(t \in [0,1]\) hence only the positive square root is considered. It is easy to see that \(t^* \in [0,1]\).

Putting back the value of \(t^*\) into \(f^*(\by)\),

\[\begin{split} f^*(\by) &= \frac{\| \by \|_*^2}{\sqrt{1 + \| \by \|_*^2}} + \sqrt{1 - \frac{\| \by \|_*^2}{1 + \| \by \|_*^2}} \\ &= \frac{\| \by \|_*^2}{\sqrt{1 + \| \by \|_*^2}} + \frac{1}{\sqrt{1 + \| \by \|_*^2}} \\ &= \frac{1 + \| \by \|_*^2}{\sqrt{1 + \| \by \|_*^2}} \\ &= \sqrt{1 + \| \by \|_*^2}. \end{split}\]

Next, we note that

\[ f_{\alpha} (\bx) = \alpha f\left ( \frac{\bx}{\alpha} \right ). \]

Applying Theorem 7.245,

\[ f^*_{\alpha} (\bx) = \alpha f^*(\by) = \alpha \sqrt{1 + \| \by \|_*^2}. \]

Theorem 7.262 (\(\sqrt{t^2 + | \cdot |^2}\))

Let \(g_t : \VV \to \RR\) for some \(\alpha > 0\) be given by:

\[ g_t (\bx) = \sqrt{t^2 + \| x \|^2}. \]

Then the conjugate is given by:

\[\begin{split} g_{t}^*(\by) = \begin{cases} -t \sqrt{1 - \| y \|_*^2} & \| y \|_* \leq 1\\ \infty & \text{ otherwise } \end{cases}. \end{split}\]

Proof. We proceed as follows.

  1. Define \(g(\bx) = \sqrt{1 + \|\bx \|^2}\).

  2. We can see that \(g_t(\bx) = t g \left ( \frac{\bx}{t} \right)\).

  3. By Theorem 7.261, \(g = f^*\) where \(f\) is the ball-pen function given by

    \[\begin{split} f(\by) = \begin{cases} - \sqrt{1 - \| \by \|_*^2} & \| \by \|_* \leq 1\\ \infty & \text{ otherwise }. \end{cases} \end{split}\]
  4. We note that \(f\) is proper, closed and convex function.

  5. By Theorem 7.242

    \[ g^* = f^{**} = f. \]
  6. Thus,

    \[\begin{split} g^* (\by) = \begin{cases} - \sqrt{1 - \| \by \|_*^2} & \| \by \|_* \leq 1\\ \infty & \text{ otherwise }. \end{cases} \end{split}\]
  7. Applying Theorem 7.245,

    \[\begin{split} g_t^*(\by) = t g^*(\by) = \begin{cases} - t \sqrt{1 - \| \by \|_*^2} & \| \by \|_* \leq 1\\ \infty & \text{ otherwise }. \end{cases} \end{split}\]