Conjugate Functions
Contents
9.17. Conjugate Functions#
Throughout this section, we assume that
9.17.1. Definition and Properties#
(Conjugate function)
Let
Note that the conjugate function is a mapping from the dual vector space to extended real line.
Recall from Definition 9.51 that
the support function of a set
9.17.1.1. Indicator Functions#
(Conjugate of an indicator function)
Let
In other words, the conjugate of the indicator function of a set is the support function of the same set.
Proof. Let
At any
, we haveAt any
, we haveSince
is nonempty, henceThe result follows.
9.17.1.2. Fenchel’s Inequality#
(Fenchel’s inequality)
Let
Proof. We proceed as follows.
By definition
Thus, for any
and any ,Since
is proper, hence for ever .Since
is proper, hence there exists such that .Then,
The R.H.S. is a finite quantity.
Thus,
for every .If either
or are , then the Fenchel inequality is valid trivially.When, both of them are finite, then the inequality
simplifies to
9.17.1.3. Convexity and Closedness#
(Convexity and closedness)
Let
Proof. We note that for a fixed
is an affine function.
Due to Theorem 4.205, affine functions are closed.
Due to Theorem 9.68, affine functions are convex.
Thus,
is indeed convex and closed for every .Thus,
is a pointwise supremum of convex and closed functions.By Theorem 3.93,
is closed.By Theorem 9.114,
is convex.Thus,
is closed and convex.
The beauty of this result is the fact that The conjugate function is always closed and convex even if the original function is not convex or not closed.
9.17.1.4. Properness#
(Properness of conjugates of proper convex functions)
Let
Proof. We are given that
Since
is proper, there exists such that .By definition, for any
The R.H.S. in this inequality is finite for every
.Thus,
for every .We also need to show that there exists
such that .Since
is a proper convex function, hence by Corollary 9.20, there exists at which the subdifferential is nonempty.Consider any
.By subgradient inequality (9.6),
for every
.Thus,
The quantity
on the R.H.S. is finite and fixed.Taking supremum on the L.H.S., we obtain
Thus,
.Thus,
is a proper function.
9.17.2. Biconjugate#
The conjugate of the conjugate is called the biconjugate.
We recall that when
(Biconjugate)
Let
9.17.2.1. As Underestimators#
(Biconjugate is an underestimator)
The biconjugate is an underestimator of the original function.
Proof. We proceed as follows.
From the definition of the conjugate
holds true for any
and .Rewriting, we get
Taking supremum over
on the R.H.S.,
Thus, the biconjugate of
9.17.2.2. Proper Closed and Convex Functions#
(Biconjugate for proper closed and convex functions)
Let
In other words,
Proof. In Theorem 9.241,
we have already shown that
For contradiction, assume that there exists
such that .Thus,
.Recall the definition of inner product for the direct sum space
from Definition 9.4 given bySince
is proper, hence is nonempty.Since
is closed, hence is closed.Since
is convex, hence is convex.Thus,
is nonempty, closed and convex.By strict separation theorem (Theorem 9.169), the set
and the point can be strictly separated.There exists a point
with , and a scalar such thatIt is then easy to identify constants
such thatPick
and for example.By simple arithmetic, we can conclude that
The scalar
must be nonpositive. If for contradiction, , then fixing a and increasing sufficiently, the inequality can be violated.We now consider different cases for
.Consider the case where
.Dividing the inequality by
and letting , we obtainIn particular, for
, we haveThis is valid for
also as in that case .Taking the supremum over
on the L.H.S., we obtainThis implies
.This contradicts the Fenchel inequality.
Thus,
is not possible.
Now consider the case where
.Since
is proper, hence .Take any
.Pick some
.Let
.Let
.Then, for any
with ,Let
.Since
, we can pick small enough such that .Then, we have the inequality
We now have a situation similar to the case of
. Here we have .Dividing both sides by
and letting , we obtainTaking the supremum over
on the L.H.S., we getThis again is a contradiction of Fenchel inequality.
Thus,
must be true for every .
9.17.2.3. Indicator and Support Functions#
(Indicator and support functions)
Let
By Theorem 9.237,
The set
is nonempty, closed and convex.We note that since
is a nonempty, closed and convex set, hence is a closed and convex function.By Theorem 9.237, we have
Then, by Theorem 9.242,
By Theorem 9.91,
Combining
Thus, we have
If
is nonempty, closed and convex, thenIn this case,
Summary:
For a nonempty, closed and convex set
For an arbitrary nonempty set
9.17.2.4. Max Function#
(Conjugate for max function)
Let
We can rewrite it as
where
Recall from Definition 9.17 that
In other words, for every
, and . .For
, this represents the weighted average of the components of .The supremum value of the weighted average occurs when
corresponding to the largest component of is 1 and all remaining are 0.This provides the justification of the formula above.
We recall that
9.17.3. Conjugate Calculus#
9.17.3.1. Separable functions#
(Conjugate for separable functions)
Let
where
Proof. Let
Then,
9.17.3.2. Affine Transformations#
(Invertible affine transformation)
Let
Then the convex conjugate of
Proof. We introduce a variable
We can see that
.Thus,
is an affine transformation.Since
is invertible, hence this affine transformation is also invertible.Also, by invertibility of
, .
Now, for any
In this derivation, we have used following facts
. . .
9.17.3.3. Scaling#
(Scaling)
Let
The conjugate of the function
is given by:The conjugate of the function
is given by:
Proof. (1) For any
(2) Similarly, for any
9.17.4. Subgradients#
(Conjugate subgradient theorem)
Let
. .
If
3..
Proof.
This is equivalent to
Taking this supremum over
By Fenchel’s inequality Theorem 9.238,
Thus, the previous inequality must be an equality, giving us
This establishes the equivalence between (1) and (2).
We now assume that
Then (1) is equivalent to
where
Then applying the equivalence between (1)
and (2) for the function
(Conjugate subgradient theorem - second formulation)
Let
and
Proof. We proceed as follows.
Let
and .Then by Theorem 9.246, it is equivalent to
By definition of the conjugate function
since the supremum is attained at
, hence this is equivalent toHence
Similarly, let
and .Following similar argument, this is equivalent to
Hence
9.17.4.1. Lipschitz Continuity#
Recall that Theorem 9.232
shows the equivalence between the boundedness of the subgradients
of a function
(Lipschitz continuity and boundedness of the domain of the conjugate)
Let
for any . for any where . .
Proof. The equivalence between (1) and (2) follows from Theorem 9.232.
We first show that (3)
Assume that (3) holds true.
By conjugate subgradient theorem (Corollary 9.26), for any
,The maximum on the R.H.S. can be attained only at points where
.Thus if
then must hold true.Thus
for every .By hypothesis (3)
for every .Hence
for every for every .Hence
for every for every ..
In the reverse direction, we will show that (1)
Towards this, we shall show that if (1) is true then for any
, if then .By hypothesis (1), for every
,Rearranging
holds true for every
.Then for any
Pick any
satisfying .Pick a vector
such that and . Such a vector exists by definition of dual norm.Consider the ray
.Continuing with the inequality for
forsince
.Hence
.Hence for any
we have .Hence
as desired.
9.17.5. 1-dim Functions#
(Exponent)
Let
Proof. To see this, we proceed as follows:
If
, then the supremum value is as .If
, then the supremum value is as .If
, then the unique maximizer is obtained by differentiating w.r.t. giving us or .Then, the supremum value for
is .Under the convention that
when , we see that for .Thus,
for . otherwise.
(Negative log)
Let
Then, the conjugate is:
Proof. To show this, we proceed as follows:
If
, then the supremum value is .If
, then we can differentiate the expression and set it to 0 to obtain the optimal solution.Setting it to zero, we get
.The supremum value is thus
.
(Hinge loss)
Let
Then, the conjugate is:
where
Proof. To see this, note that
The terms
and are affine in representing straight lines.The two lines intersect at
For
, .For
, .Thus, the function
is a piece-wise linear function of .The first piece is a line
with slope over .The second piece is a line
with slope over .A finite supremum of this piecewise linear function exists if the slope of the left piece is nonnegative and the slope of the right piece is nonpositive.
In other words,
and .Thus, a finite supremum exists only for
.The supremum value for this range of
values is attained at (where the two pieces intersect) and the supremum value equals .For all
the supremum is infinite.Thus,
and . .This is succinctly represented in the expression for
with the help of an indicator function above.
Let for some
Then, its conjugate is:
where
Proof. To see this, we proceed as follows:
The conjugate is given by
The concave function
is differentiable.Setting the derivative to
, the points at which the derivative vanishes are given byWe can rewrite this as
Therefore,
and .Thus,
Accordingly, the supremum value at
iswhere
is the conjugate exponent of .
Let for some
Then, its conjugate is:
where
Proof. To see this, we proceed as follows:
The conjugate is given by
Define
withWhen
, then as .When
, then at .We note that
is a concave function for .Thus,
achieves its supremum at .The supremum value is given by
where
.The domain of
is .
9.17.6. n-dim functions#
9.17.6.1. Quadratics#
(Strictly convex quadratic)
Let
where
Then, the conjugate is:
Proof. We proceed as follows. For any
The supremum of the quadratic above is achieved at
.The supremum value is
.
(Convex quadratic)
Let
where
Then, the conjugate is:
Proof. In this case,
For any
Define
asThen,
The gradient of
is given by .The maximizers of
are at points where the gradient vanishes, given by .This system has a solution if and only if
.If
, we can choose one of the solutions to the zero gradient equation above.One possible solution is given by the Moore-Penrose pseudo inverse of
, namely .For this solution,
We used the fact that
since is symmetric. Also, .We now consider the case where
.This is same as
.Recall that
.Thus,
.Thus, there exists a vector
such that .Now, for any
,since
.Since
, hence as .Thus,
if .Thus,
.
9.17.6.2. Entropy#
(Negative entropy)
Let
Then, the conjugate is:
Proof. We note that the function
We can then apply Theorem 9.243
to compute the conjugate of
Now,
It is easy to see that the supremum of the expression
Finally,
Thus, due to Theorem 9.243,
(Negative sum of logs)
Let
Then, the conjugate is:
Proof.
Then,
By Theorem 9.249,
Due to Theorem 9.243,
For
For
(Negative entropy over unit simplex)
Let
The conjugate is:
Recall from Definition 9.17 that a
unit simplex in
Proof. For any
This maximization problem is equivalent to the minimization problem discussed later in Example 10.14. The optimal solution is given by
Accordingly, the optimal (supremum) value (following Example 10.14) is
In other words, the conjugate of the negative entropy function is the log-sum-exp function.
(Log sum exp)
Let
The conjugate is:
Proof. Following Theorem 9.257,
Log-sum-exp and negative entropy over simplex and conjugate of each other.
9.17.6.3. Norms#
(Norm)
Let
Then, the conjugate
In other words, it is the indicator function for the unit ball
w.r.t. the dual norm
Proof. We proceed as follows.
By Theorem 9.95, the support function for a closed unit ball of a norm is its dual norm.
Thus,
Here, we have used the fact that the dual norm of the dual norm is the norm itself.
By Example 9.80, for a nonempty, closed and convex set
,The set
is nonempty, closed and convex.Hence,
(Squared norm)
Let
Then, the conjugate
Proof. We proceed as follows
By definition of the conjugate
Consider the set
.Then, we can write
.Define
Then it is easy to see that
In other words, we transform the maximization (for conjugate) into a double maximization as
Now
Hence
Differentiating
w.r.t. and setting it to zero, we obtainEvaluating
at , we get
(Ball-pen)
Let
Then, the conjugate
Generalizing further, let
The conjugate is given by:
Proof. We shall use the double maximization approach here.
Introduce
Differentiating w.r.t.
Since
Putting back the value of
Next, we note that
Applying Theorem 9.245,
Let
Then the conjugate is given by:
Proof. We proceed as follows.
Define
.We can see that
.By Theorem 9.261,
where is the ball-pen function given byWe note that
is proper, closed and convex function.-
Thus,
Applying Theorem 9.245,