9.17. Conjugate Functions#

Throughout this section, we assume that $\mathbb{V}, \mathbb{W}$ are real vector spaces. Wherever necessary, they are equipped with a norm $\| \cdot \|$ or a real inner product $\langle \cdot, \cdot \rangle$. They are also equipped with a metric $d(x, y) = \| x - y \|$ as needed. $\mathbb{V}^*$ denotes the dual vector space (to $\mathbb{V}$).

9.17.1. Definition and Properties#

Definition 9.78 (Conjugate function)

Let $f : \mathbb{V} \to \overline{\mathbb{R}}$ be an extended real valued function. Its conjugate function $f^* : \mathbb{V}^* \to \overline{\mathbb{R}}$ is given by

$$f^*(y) = \sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - f(x) \} \quad \forall y \in \mathbb{V}^*.$$

Note that the conjugate function is a mapping from the dual vector space to the extended real line.

Recall from Definition 9.51 that the support function of a set $C$ is given by

$$\sigma_C(x) = \sup_{z \in C} \langle z, x \rangle.$$

9.17.1.1. Indicator Functions#

Theorem 9.237 (Conjugate of an indicator function)

Let $C \subseteq \mathbb{V}$ be a nonempty set. Let $I_C : \mathbb{V} \to (-\infty, \infty]$ be the indicator function for the set $C$. Then,

$$I_C^*(y) = \sigma_C(y) = \sup_{x \in C} \langle x, y \rangle.$$

In other words, the conjugate of the indicator function of a set is the support function of the same set.

Proof. Let $y \in \mathbb{V}^*$ be arbitrary.

  1. At any $x \in C$, we have

    $$\langle x, y \rangle - I_C(x) = \langle x, y \rangle.$$
  2. At any $x \notin C$, we have

    $$\langle x, y \rangle - I_C(x) = -\infty.$$
  3. Since $C$ is nonempty,

    $$\sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - I_C(x) \} = \sup_{x \in C} \langle x, y \rangle.$$

    The result follows.

9.17.1.2. Fenchel’s Inequality#

Theorem 9.238 (Fenchel’s inequality)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper function and let $f^*$ be its conjugate function. Then, for any $x \in \mathbb{V}$ and $y \in \mathbb{V}^*$:

$$f(x) + f^*(y) \geq \langle x, y \rangle.$$

Proof. We proceed as follows.

  1. By definition,

    $$f^*(y) = \sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - f(x) \}.$$
  2. Thus, for any $x \in \mathbb{V}$ and any $y \in \mathbb{V}^*$,

    $$f^*(y) \geq \langle x, y \rangle - f(x).$$
  3. Since $f$ is proper, $f(x) > -\infty$ for every $x$.

  4. Since $f$ is proper, there exists $a \in \mathbb{V}$ such that $f(a) < \infty$.

  5. Then,

    $$f^*(y) \geq \langle a, y \rangle - f(a).$$
  6. The R.H.S. is a finite quantity.

  7. Thus, $f^*(y) > -\infty$ for every $y$.

  8. If either $f(x)$ or $f^*(y)$ is $\infty$, then the Fenchel inequality holds trivially.

  9. When both of them are finite, the inequality $f^*(y) \geq \langle x, y \rangle - f(x)$ rearranges to

    $$f(x) + f^*(y) \geq \langle x, y \rangle.$$
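The inequality is easy to test numerically. Below is a minimal sketch (assuming NumPy is available; the choice $f(x) = e^x$ and the sampled points are arbitrary) that approximates $f^*$ by a grid search and checks $f(x) + f^*(y) \geq x y$ on random pairs:

```python
import numpy as np

# Conjugate of f(x) = exp(x), approximated by a grid search standing in for the sup.
f = np.exp
xs = np.linspace(-10.0, 10.0, 200001)

def f_conj(y):
    # f*(y) ~= max over the grid of x*y - f(x); a slight underestimate of the true sup.
    return np.max(xs * y - f(xs))

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.uniform(-3.0, 3.0)
    y = rng.uniform(0.1, 3.0)                  # f*(y) is finite only for y >= 0
    assert f(x) + f_conj(y) >= x * y - 1e-6    # Fenchel's inequality, up to grid error
print("Fenchel's inequality verified on the sampled pairs")
```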

9.17.1.3. Convexity and Closedness#

Theorem 9.239 (Convexity and closedness)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be an extended real valued function. Then, the conjugate function $f^*$ is closed and convex.

Proof. We note that for a fixed $x$, the function

$$g_x(y) = \langle x, y \rangle - f(x)$$

is an affine function.

  1. Due to Theorem 4.205, affine functions are closed.

  2. Due to Theorem 9.68, affine functions are convex.

  3. Thus, $g_x$ is indeed convex and closed for every $x$.

  4. Thus, $f^*$ is a pointwise supremum of convex and closed functions.

  5. By Theorem 3.93, $f^*$ is closed.

  6. By Theorem 9.114, $f^*$ is convex.

  7. Thus, $f^*$ is closed and convex.

The beauty of this result is the fact that the conjugate function is always closed and convex even if the original function is not convex or not closed.

9.17.1.4. Properness#

Theorem 9.240 (Properness of conjugates of proper convex functions)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper convex function. Then, its conjugate $f^*$ is proper.

Proof. We are given that $f$ is a proper convex function. We shall first show that $f^*$ never attains $-\infty$. We shall then show that $f^*$ is not $\infty$ everywhere.

  1. Since $f$ is proper, there exists $a \in \mathbb{V}$ such that $f(a) < \infty$.

  2. By definition, for any $y \in \mathbb{V}^*$,

    $$f^*(y) \geq \langle a, y \rangle - f(a).$$
  3. The R.H.S. in this inequality is finite for every $y \in \mathbb{V}^*$.

  4. Thus, $f^*(y) > -\infty$ for every $y$.

  5. We also need to show that there exists $y \in \mathbb{V}^*$ such that $f^*(y) < \infty$.

  6. Since $f$ is a proper convex function, hence by Corollary 9.20, there exists $x \in \operatorname{dom} f$ at which the subdifferential $\partial f(x)$ is nonempty.

  7. Consider any $y \in \partial f(x)$.

  8. By the subgradient inequality (9.6),

    $$f(z) \geq f(x) + \langle z - x, y \rangle$$

    for every $z \in \mathbb{V}$.

  9. Thus,

    $$\langle z, y \rangle - f(z) \leq \langle x, y \rangle - f(x) \quad \forall z \in \mathbb{V}.$$
  10. The quantity $\langle x, y \rangle - f(x)$ on the R.H.S. is finite and fixed.

  11. Taking the supremum on the L.H.S., we obtain

    $$f^*(y) = \sup_{z \in \mathbb{V}} \{ \langle z, y \rangle - f(z) \} \leq \langle x, y \rangle - f(x).$$
  12. Thus, $f^*(y) < \infty$.

  13. Thus, $f^*$ is a proper function.

9.17.2. Biconjugate#

The conjugate of the conjugate is called the biconjugate. We recall that when $\mathbb{V}$ is finite dimensional, the dual of the dual space, $\mathbb{V}^{**}$, is isomorphic to $\mathbb{V}$.

Definition 9.79 (Biconjugate)

Let $f : \mathbb{V} \to \overline{\mathbb{R}}$ be an extended real valued function. Its biconjugate function $f^{**} : \mathbb{V} \to \overline{\mathbb{R}}$ is given by

$$f^{**}(x) = \sup_{y \in \mathbb{V}^*} \{ \langle x, y \rangle - f^*(y) \} \quad \forall x \in \mathbb{V}.$$

9.17.2.1. As Underestimators#

Theorem 9.241 (Biconjugate is an underestimator)

The biconjugate is an underestimator of the original function:

$$f^{**}(x) \leq f(x) \quad \forall x \in \mathbb{V}.$$

Proof. We proceed as follows.

  1. From the definition of the conjugate,

    $$f^*(y) \geq \langle x, y \rangle - f(x)$$

    holds true for any $x \in \mathbb{V}$ and $y \in \mathbb{V}^*$.

  2. Rewriting, we get

    $$f(x) \geq \langle x, y \rangle - f^*(y).$$
  3. Taking the supremum over $y \in \mathbb{V}^*$ on the R.H.S.,

    $$f(x) \geq \sup_{y \in \mathbb{V}^*} \{ \langle x, y \rangle - f^*(y) \} = f^{**}(x).$$

Thus, the biconjugate of $f$ is always a lower bound for $f$. Naturally, one is interested in conditions under which the biconjugate of $f$ equals $f$.
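A brute-force computation makes the underestimation visible. The following sketch (assuming NumPy; the nonconvex double-well $f(x) = \min\{(x-1)^2, (x+1)^2\}$ is an arbitrary choice) approximates $f^*$ and then $f^{**}$ on grids and confirms $f^{**}(x) \leq f(x)$ at every sampled point; the gap is largest near $x = 0$, where $f$ fails to be convex:

```python
import numpy as np

xs = np.linspace(-3.0, 3.0, 2001)
ys = np.linspace(-6.0, 6.0, 2001)

def f(x):
    # A nonconvex double-well; its biconjugate is its closed convex envelope.
    return np.minimum((x - 1.0) ** 2, (x + 1.0) ** 2)

# f*(y) = sup_x { x*y - f(x) }, approximated on the x-grid.
f_star = np.array([np.max(xs * y - f(xs)) for y in ys])

# f**(x) = sup_y { x*y - f*(y) }, approximated on the y-grid.
f_bistar = np.array([np.max(x * ys - f_star) for x in xs])

assert np.all(f_bistar <= f(xs) + 1e-6)                # biconjugate never exceeds f
print("max gap f - f**:", np.max(f(xs) - f_bistar))    # roughly 1, attained near x = 0
```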

9.17.2.2. Proper Closed and Convex Functions#

Theorem 9.242 (Biconjugate for proper closed and convex functions)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper, closed and convex function. Then,

$$f^{**}(x) = f(x) \quad \forall x \in \mathbb{V}.$$

In other words, $f^{**} = f$; the biconjugate of a proper, closed and convex function is the function itself.

Proof. In Theorem 9.241, we have already shown that $f^{**} \leq f$. We shall now show that $f^{**} \geq f$ also holds true when $f$ is proper, closed and convex.

  1. For contradiction, assume that there exists $x \in \mathbb{V}$ such that $f^{**}(x) < f(x)$.

  2. Thus, $(x, f^{**}(x)) \notin \operatorname{epi} f$.

  3. Recall the definition of the inner product for the direct sum space $\mathbb{V} \oplus \mathbb{R}$ from Definition 9.4, given by

    $$\langle (x, s), (y, t) \rangle \triangleq \langle x, y \rangle + s t.$$
  4. Since $f$ is proper, hence $\operatorname{epi} f$ is nonempty.

  5. Since $f$ is closed, hence $\operatorname{epi} f$ is closed.

  6. Since $f$ is convex, hence $\operatorname{epi} f$ is convex.

  7. Thus, $\operatorname{epi} f$ is nonempty, closed and convex.

  8. By the strict separation theorem (Theorem 9.169), the set $\operatorname{epi} f$ and the point $(x, f^{**}(x))$ can be strictly separated.

  9. There exists a point $(a, b)$ with $a \in \mathbb{V}^*$, $b \in \mathbb{R}$ and a scalar $\alpha \in \mathbb{R}$ such that

    $$\langle x, a \rangle + f^{**}(x) b > \alpha \text{ and } \langle z, a \rangle + s b \leq \alpha \quad \forall (z, s) \in \operatorname{epi} f.$$
  10. It is then easy to identify constants $c_1, c_2 \in \mathbb{R}$ such that

    $$\langle z, a \rangle + s b \leq c_1 < c_2 \leq \langle x, a \rangle + f^{**}(x) b \quad \forall (z, s) \in \operatorname{epi} f.$$

    Pick $c_1 = \alpha$ and $c_2 = \langle x, a \rangle + f^{**}(x) b$ for example.

  11. By simple arithmetic, we can conclude that

    $$\langle z - x, a \rangle + (s - f^{**}(x)) b \leq c_1 - c_2 \triangleq c < 0 \quad \forall (z, s) \in \operatorname{epi} f.$$
  12. The scalar b must be nonpositive. If for contradiction, b>0, then fixing a z and increasing s sufficiently, the inequality can be violated.

  13. We now consider different cases for b0.

  14. Consider the case where $b < 0$.

    1. Dividing the inequality by $-b$ and letting $y = -\frac{a}{b}$, we obtain

      $$\langle z - x, y \rangle - s + f^{**}(x) \leq \frac{c}{-b} < 0 \quad \forall (z, s) \in \operatorname{epi} f.$$
    2. In particular, for $s = f(z)$, we have

      $$\langle z, y \rangle - \langle x, y \rangle - f(z) + f^{**}(x) \leq \frac{c}{-b} < 0 \quad \forall z \in \mathbb{V}.$$

      This is valid for $z \notin \operatorname{dom} f$ also, since in that case $f(z) = \infty$ and the L.H.S. equals $-\infty$.

    3. Taking the supremum over $z$ on the L.H.S., we obtain

      $$f^*(y) - \langle x, y \rangle + f^{**}(x) \leq \frac{c}{-b} < 0.$$
    4. This implies $f^*(y) + f^{**}(x) < \langle x, y \rangle$.

    5. This contradicts the Fenchel inequality.

    6. Thus, $b < 0$ is not possible.

  15. Now consider the case where $b = 0$.

    1. Since $f$ is a proper convex function, its conjugate $f^*$ is proper (Theorem 9.240), hence $\operatorname{dom} f^* \neq \varnothing$.

    2. Take any $\hat{y} \in \operatorname{dom} f^*$.

    3. Pick some $\epsilon > 0$.

    4. Let $\hat{a} = a + \epsilon \hat{y}$.

    5. Let $\hat{b} = -\epsilon$.

    6. Then, for any $z \in \operatorname{dom} f$ with $s = f(z)$,

      $$\begin{aligned}
      \langle z - x, \hat{a} \rangle + (f(z) - f^{**}(x)) \hat{b}
      &= \langle z - x, a \rangle + \epsilon \langle z - x, \hat{y} \rangle - \epsilon (f(z) - f^{**}(x)) \\
      &= \langle z - x, a \rangle + \epsilon [\langle z - x, \hat{y} \rangle - f(z) + f^{**}(x)] \\
      &\leq c + \epsilon [\langle z, \hat{y} \rangle - f(z) + f^{**}(x) - \langle x, \hat{y} \rangle] \\
      &\leq c + \epsilon [f^*(\hat{y}) + f^{**}(x) - \langle x, \hat{y} \rangle].
      \end{aligned}$$
    7. Let $\hat{c} = c + \epsilon [f^*(\hat{y}) + f^{**}(x) - \langle x, \hat{y} \rangle]$.

    8. Since $c < 0$, we can pick $\epsilon > 0$ small enough such that $\hat{c} < 0$.

    9. Then, we have the inequality

      $$\langle z - x, \hat{a} \rangle + (f(z) - f^{**}(x)) \hat{b} \leq \hat{c} < 0.$$
    10. We now have a situation similar to the case of $b < 0$; here $\hat{b} = -\epsilon < 0$.

    11. Dividing both sides by $-\hat{b}$ and letting $\tilde{y} = -\frac{\hat{a}}{\hat{b}}$, we obtain

      $$\langle z, \tilde{y} \rangle - f(z) - \langle x, \tilde{y} \rangle + f^{**}(x) \leq \frac{\hat{c}}{-\hat{b}} < 0 \quad \forall z \in \mathbb{V},$$

      where the inequality extends to $z \notin \operatorname{dom} f$ since the L.H.S. is then $-\infty$.

    12. Taking the supremum over $z$ on the L.H.S., we get

      $$f^*(\tilde{y}) + f^{**}(x) < \langle x, \tilde{y} \rangle.$$
    13. This again contradicts the Fenchel inequality.

  16. Thus, $f^{**}(x) \geq f(x)$ must hold for every $x \in \mathbb{V}$, which completes the proof.

9.17.2.3. Indicator and Support Functions#

Example 9.80 (Indicator and support functions)

Let C be a nonempty set.

  1. By Theorem 9.237,

    $$I_C^* = \sigma_C.$$
  2. The set $\operatorname{cl} \operatorname{conv} C$ is nonempty, closed and convex.

  3. We note that since $\operatorname{cl} \operatorname{conv} C$ is a nonempty, closed and convex set, hence $I_{\operatorname{cl} \operatorname{conv} C}$ is a proper, closed and convex function.

  4. By Theorem 9.237, we have

    $$I_{\operatorname{cl} \operatorname{conv} C}^* = \sigma_{\operatorname{cl} \operatorname{conv} C}.$$
  5. Then, by Theorem 9.242,

    $$\sigma_{\operatorname{cl} \operatorname{conv} C}^* = (I_{\operatorname{cl} \operatorname{conv} C}^*)^* = I_{\operatorname{cl} \operatorname{conv} C}^{**} = I_{\operatorname{cl} \operatorname{conv} C}.$$
  6. By Theorem 9.91,

    $$\sigma_C = \sigma_{\operatorname{cl} C} \text{ and } \sigma_C = \sigma_{\operatorname{conv} C}.$$
  7. Combining,

    $$\sigma_C = \sigma_{\operatorname{cl} \operatorname{conv} C}.$$
  8. Thus, we have

    $$\sigma_C^* = \sigma_{\operatorname{cl} \operatorname{conv} C}^* = I_{\operatorname{cl} \operatorname{conv} C}.$$
  9. If $C$ is nonempty, closed and convex, then

    $$\operatorname{cl} \operatorname{conv} C = C.$$
  10. In this case,

    $$\sigma_C^* = I_C.$$

Summary:

For a nonempty, closed and convex set $C \subseteq \mathbb{V}$,

$$\sigma_C^* = I_C.$$

For an arbitrary nonempty set $C \subseteq \mathbb{V}$,

$$\sigma_C^* = I_{\operatorname{cl} \operatorname{conv} C}.$$
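As a numerical illustration of the second identity (a minimal sketch assuming NumPy; the finite set $C$ and the test points are arbitrary choices), the conjugate of the support function of a finite point set behaves like the indicator of its convex hull: $0$ at points inside $\operatorname{conv} C$ and unbounded above outside it:

```python
import numpy as np

# A finite (nonconvex) set C in R^2; conv C is the triangle with these vertices.
C = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])

def sigma_conj(x, radius=50.0, n=401):
    # (sigma_C)*(x) ~= sup over a y-grid of { <x, y> - sigma_C(y) },
    # where sigma_C(y) = max_{z in C} <z, y> is the support function of C.
    g = np.linspace(-radius, radius, n)
    Y = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
    return np.max(Y @ x - np.max(Y @ C.T, axis=1))

print(sigma_conj(np.array([0.5, 0.5])))   # inside conv C: 0 (indicator value)
print(sigma_conj(np.array([3.0, 3.0])))   # outside conv C: large, grows with the grid radius
```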

9.17.2.4. Max Function#

Example 9.81 (Conjugate for max function)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by

$$f(x) \triangleq \max \{ x_1, x_2, \dots, x_n \}.$$

We can rewrite it as

$$f(x) = \sup_{y \in \Delta_n} \langle y, x \rangle = \sigma_{\Delta_n}(x)$$

where $\Delta_n$ is the unit simplex in $\mathbb{R}^n$.

  1. Recall from Definition 9.17 that

    $$\Delta_n \triangleq \{ x \in \mathbb{R}^n \mid \langle x, \mathbf{1} \rangle = 1, \; x \geq 0 \}.$$
  2. In other words, for every $y \in \Delta_n$, $y_i \geq 0$ and $\sum_i y_i = 1$.

  3. $\langle y, x \rangle = \sum_{i=1}^n y_i x_i$.

  4. For $y \in \Delta_n$, this represents a weighted average of the components of $x$.

  5. The supremum of this weighted average occurs when the $y_i$ corresponding to the largest component of $x$ is $1$ and all the remaining $y_j$ are $0$.

  6. This provides the justification for the formula above.

We recall that $\Delta_n$ is a nonempty, closed and convex set. Then, following Example 9.80, the conjugate of the max function $f$ is

$$f^* = I_{\Delta_n}.$$
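A quick sampling experiment (a sketch assuming NumPy; the dimension, random seed and use of Dirichlet samples are arbitrary choices) supports the rewriting $f(x) = \sigma_{\Delta_n}(x)$: the supremum of $\langle y, x \rangle$ over points of the simplex is attained at a vertex and equals $\max(x)$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)

# Points of the unit simplex: its vertices (standard basis vectors) plus random
# interior points (Dirichlet samples are nonnegative and sum to one).
vertices = np.eye(5)
interior = rng.dirichlet(np.ones(5), size=100000)
Y = np.vstack([vertices, interior])

# The sup of <y, x> over the simplex is attained at the vertex of the largest component.
print(np.max(Y @ x), "==", np.max(x))
```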

9.17.3. Conjugate Calculus#

9.17.3.1. Separable functions#

Theorem 9.243 (Conjugate for separable functions)

Let $g : \mathbb{V}_1 \oplus \mathbb{V}_2 \oplus \dots \oplus \mathbb{V}_p \to (-\infty, \infty]$ be given by

$$g(x_1, x_2, \dots, x_p) = \sum_{i=1}^p f_i(x_i)$$

where $f_i : \mathbb{V}_i \to (-\infty, \infty]$ are proper functions. Then,

$$g^*(y_1, y_2, \dots, y_p) = \sum_{i=1}^p f_i^*(y_i) \quad \forall y_i \in \mathbb{V}_i^*, \; 1 \leq i \leq p.$$

Proof. Let $y = (y_1, \dots, y_p) \in \mathbb{V}_1^* \oplus \dots \oplus \mathbb{V}_p^*$.

Then,

$$\begin{aligned}
g^*(y) &= g^*(y_1, \dots, y_p) \\
&= \sup_{x_1, \dots, x_p} \{ \langle (x_1, \dots, x_p), (y_1, \dots, y_p) \rangle - g(x_1, \dots, x_p) \} \\
&= \sup_{x_1, \dots, x_p} \left\{ \sum_{i=1}^p \langle x_i, y_i \rangle - \sum_{i=1}^p f_i(x_i) \right\} \\
&= \sum_{i=1}^p \sup_{x_i} \{ \langle x_i, y_i \rangle - f_i(x_i) \} \\
&= \sum_{i=1}^p f_i^*(y_i).
\end{aligned}$$

9.17.3.2. Affine Transformations#

Theorem 9.244 (Invertible affine transformation)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be an extended real valued function. Let $\mathcal{A} : \mathbb{W} \to \mathbb{V}$ be an invertible linear transformation. Let $a \in \mathbb{W}$, $b \in \mathbb{W}^*$ and $c \in \mathbb{R}$. Consider the function $g : \mathbb{W} \to (-\infty, \infty]$ given by

$$g(x) \triangleq f(\mathcal{A}(x - a)) + \langle x, b \rangle + c \quad \forall x \in \mathbb{W}.$$

Then the convex conjugate of $g$ is given by

$$g^*(y) = f^*\left((\mathcal{A}^T)^{-1}(y - b)\right) + \langle y, a \rangle - \langle b, a \rangle - c \quad \forall y \in \mathbb{W}^*.$$

Proof. We introduce a variable $z = \mathcal{A}(x - a)$.

  1. We can see that $z = \mathcal{A} x - \mathcal{A} a$.

  2. Thus, $x \mapsto z$ is an affine transformation.

  3. Since $\mathcal{A}$ is invertible, hence this affine transformation is also invertible.

  4. Also, by invertibility of $\mathcal{A}$, $x = \mathcal{A}^{-1}(z) + a$.

Now, for any $y \in \mathbb{W}^*$,

$$\begin{aligned}
g^*(y) &= \sup_{x \in \mathbb{W}} \{ \langle x, y \rangle - g(x) \} \\
&= \sup_x \{ \langle x, y \rangle - f(\mathcal{A}(x - a)) - \langle x, b \rangle - c \} \\
&= \sup_z \{ \langle \mathcal{A}^{-1}(z) + a, y \rangle - f(z) - \langle \mathcal{A}^{-1}(z) + a, b \rangle - c \} \\
&= \sup_z \{ \langle \mathcal{A}^{-1}(z), y - b \rangle - f(z) + \langle a, y \rangle - \langle a, b \rangle - c \} \\
&= \sup_z \{ \langle z, (\mathcal{A}^{-1})^T (y - b) \rangle - f(z) \} + \langle a, y \rangle - \langle a, b \rangle - c \\
&= f^*\left((\mathcal{A}^{-1})^T (y - b)\right) + \langle a, y \rangle - \langle a, b \rangle - c \\
&= f^*\left((\mathcal{A}^T)^{-1} (y - b)\right) + \langle y, a \rangle - \langle b, a \rangle - c.
\end{aligned}$$

In this derivation, we have used the following facts:

  1. $(\mathcal{A}^T)^{-1} = (\mathcal{A}^{-1})^T$.

  2. $\langle x, y \rangle = \langle y, x \rangle$.

  3. $\langle \mathcal{A}(x), y \rangle = \langle x, \mathcal{A}^T(y) \rangle$.

9.17.3.3. Scaling#

Theorem 9.245 (Scaling)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be an extended real valued function. Let $\alpha > 0$.

  1. The conjugate of the function $g(x) = \alpha f(x)$ is given by

    $$g^*(y) = \alpha f^*\left(\frac{y}{\alpha}\right) \quad \forall y \in \mathbb{V}^*.$$
  2. The conjugate of the function $h(x) = \alpha f\left(\frac{x}{\alpha}\right)$ is given by

    $$h^*(y) = \alpha f^*(y) \quad \forall y \in \mathbb{V}^*.$$

Proof. (1) For any $y \in \mathbb{V}^*$,

$$g^*(y) = \sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - g(x) \} = \sup_x \{ \langle x, y \rangle - \alpha f(x) \} = \alpha \sup_x \left\{ \left\langle x, \frac{y}{\alpha} \right\rangle - f(x) \right\} = \alpha f^*\left(\frac{y}{\alpha}\right).$$

(2) Similarly, for any $y \in \mathbb{V}^*$,

$$h^*(y) = \sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - h(x) \} = \sup_{x \in \mathbb{V}} \left\{ \langle x, y \rangle - \alpha f\left(\frac{x}{\alpha}\right) \right\} = \alpha \sup_{x \in \mathbb{V}} \left\{ \left\langle \frac{x}{\alpha}, y \right\rangle - f\left(\frac{x}{\alpha}\right) \right\} = \alpha \sup_{z \in \mathbb{V}} \{ \langle z, y \rangle - f(z) \} = \alpha f^*(y),$$

where we substituted $z \triangleq \frac{x}{\alpha}$.

9.17.4. Subgradients#

Theorem 9.246 (Conjugate subgradient theorem)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be proper and convex. The following statements are equivalent for any $x \in \mathbb{V}$ and $y \in \mathbb{V}^*$:

  1. $\langle y, x \rangle = f(x) + f^*(y)$.

  2. $y \in \partial f(x)$.

If $f$ is closed, then 1 and 2 are equivalent to:

  3. $x \in \partial f^*(y)$.

Proof. $y \in \partial f(x)$ holds if and only if

$$f(z) \geq f(x) + \langle z - x, y \rangle \quad \forall z \in \mathbb{V}.$$

This is equivalent to

$$\langle x, y \rangle - f(x) \geq \langle z, y \rangle - f(z) \quad \forall z \in \mathbb{V}.$$

Taking the supremum over $z$ on the R.H.S., we see that this is equivalent to

$$\langle x, y \rangle - f(x) \geq f^*(y).$$

By Fenchel's inequality (Theorem 9.238),

$$f(x) + f^*(y) \geq \langle x, y \rangle.$$

Thus, the previous inequality must be an equality, giving us

$$f(x) + f^*(y) = \langle x, y \rangle.$$

This establishes the equivalence between (1) and (2).

We now assume that $f$ is closed. Thus, $f$ is proper, closed and convex. Due to Theorem 9.242,

$$f^{**} = f.$$

Let $g = f^*$, so that $g$ is proper, closed and convex and $g^* = f^{**} = f$. Then (1) is equivalent to

$$g(y) + g^*(x) = \langle x, y \rangle.$$

Applying the equivalence between (1) and (2) to the function $g$, this holds if and only if

$$x \in \partial g(y) = \partial f^*(y).$$

Corollary 9.26 (Conjugate subgradient theorem - second formulation)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper, closed and convex function. Then for any $x \in \mathbb{V}$ and $y \in \mathbb{V}^*$:

$$\partial f(x) = \operatorname{argmax}_{u \in \mathbb{V}^*} \{ \langle x, u \rangle - f^*(u) \}$$

and

$$\partial f^*(y) = \operatorname{argmax}_{v \in \mathbb{V}} \{ \langle v, y \rangle - f(v) \}.$$

Proof. We proceed as follows.

  1. Let $y \in \mathbb{V}^*$ and $x \in \partial f^*(y)$.

  2. Then by Theorem 9.246, this is equivalent to

    $$f^*(y) = \langle x, y \rangle - f(x).$$
  3. By the definition of the conjugate function,

    $$f^*(y) = \sup_{v \in \mathbb{V}} \{ \langle v, y \rangle - f(v) \}.$$
  4. Since the supremum is attained at $x$, this is equivalent to

    $$x \in \operatorname{argmax}_{v \in \mathbb{V}} \{ \langle v, y \rangle - f(v) \}.$$
  5. Hence

    $$\partial f^*(y) = \operatorname{argmax}_{v \in \mathbb{V}} \{ \langle v, y \rangle - f(v) \}.$$
  6. Similarly, let $x \in \mathbb{V}$ and $y \in \partial f(x)$.

  7. Following a similar argument, this is equivalent to

    $$y \in \operatorname{argmax}_{u \in \mathbb{V}^*} \{ \langle x, u \rangle - f^*(u) \}.$$
  8. Hence

    $$\partial f(x) = \operatorname{argmax}_{u \in \mathbb{V}^*} \{ \langle x, u \rangle - f^*(u) \}.$$

9.17.4.1. Lipschitz Continuity#

Recall that Theorem 9.232 shows the equivalence between the boundedness of the subgradients of a function f over a set X and the Lipschitz continuity of f over X. A similar result is available in terms of the boundedness of the domain of the conjugate.

Theorem 9.247 (Lipschitz continuity and boundedness of the domain of the conjugate)

Let $f : \mathbb{V} \to \mathbb{R}$ be convex. Then the following statements are equivalent for a constant $L > 0$.

  1. $|f(x) - f(y)| \leq L \| x - y \|$ for any $x, y \in \mathbb{V}$.

  2. $\| g \|_* \leq L$ for any $g \in \partial f(x)$ where $x \in \mathbb{V}$.

  3. $\operatorname{dom} f^* \subseteq B_{\| \cdot \|_*}[0, L]$.

Proof. The equivalence between (1) and (2) follows from Theorem 9.232.

We first show that (3) $\Rightarrow$ (2).

  1. Assume that (3) holds true.

  2. By the conjugate subgradient theorem (Corollary 9.26), for any $x \in \mathbb{V}$,

    $$\partial f(x) = \operatorname{argmax}_{y \in \mathbb{V}^*} \{ \langle x, y \rangle - f^*(y) \}.$$
  3. The maximum on the R.H.S. can be attained only at points where $f^*(y) < \infty$.

  4. Thus, if $y \in \partial f(x)$ then $f^*(y) < \infty$ must hold true.

  5. Thus, $\partial f(x) \subseteq \operatorname{dom} f^*$ for every $x \in \mathbb{V}$.

  6. By hypothesis (3), $\partial f(x) \subseteq B_{\| \cdot \|_*}[0, L]$ for every $x \in \mathbb{V}$.

  7. Hence $g \in B_{\| \cdot \|_*}[0, L]$ for every $g \in \partial f(x)$ for every $x \in \mathbb{V}$.

  8. Hence $\| g \|_* \leq L$ for every $g \in \partial f(x)$ for every $x \in \mathbb{V}$.

In the reverse direction, we show that (1) $\Rightarrow$ (3).

  1. Towards this, we shall show that if (1) is true then for any $z \in \mathbb{V}^*$, if $\| z \|_* > L$ then $z \notin \operatorname{dom} f^*$.

  2. By hypothesis (1), for every $x \in \mathbb{V}$,

    $$f(x) - f(0) \leq |f(x) - f(0)| \leq L \| x \|.$$
  3. Rearranging,

    $$-f(x) \geq -f(0) - L \| x \|$$

    holds true for every $x \in \mathbb{V}$.

  4. Then for any $y \in \mathbb{V}^*$,

    $$f^*(y) = \sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - f(x) \} \geq \sup_{x \in \mathbb{V}} \{ \langle x, y \rangle - f(0) - L \| x \| \}.$$
  5. Pick any $z \in \mathbb{V}^*$ satisfying $\| z \|_* > L$.

  6. Pick a vector $u \in \mathbb{V}$ such that $\| u \| = 1$ and $\langle u, z \rangle = \| z \|_*$. Such a vector exists by the definition of the dual norm.

  7. Consider the ray $C = \{ t u \mid t \geq 0 \} \subseteq \mathbb{V}$.

  8. Continuing with the inequality for $f^*$ at $y = z$,

    $$\begin{aligned}
    f^*(z) &\geq \sup_{x \in \mathbb{V}} \{ \langle x, z \rangle - f(0) - L \| x \| \} \\
    &\geq \sup_{x \in C} \{ \langle x, z \rangle - f(0) - L \| x \| \} \\
    &= \sup_{t \geq 0} \{ t \langle u, z \rangle - f(0) - L t \| u \| \} \\
    &= \sup_{t \geq 0} \{ t \| z \|_* - f(0) - L t \} \\
    &= \sup_{t \geq 0} \{ t (\| z \|_* - L) - f(0) \} = \infty
    \end{aligned}$$

    since $\| z \|_* > L$.

  9. Hence $z \notin \operatorname{dom} f^*$.

  10. Hence for any $y \in \operatorname{dom} f^*$ we have $\| y \|_* \leq L$.

  11. Hence $\operatorname{dom} f^* \subseteq B_{\| \cdot \|_*}[0, L]$ as desired.

9.17.5. 1-dim Functions#

Theorem 9.248 (Exponent)

Let $f : \mathbb{R} \to \mathbb{R}$ be given by $f(x) = e^x$. Then, the conjugate is given by

$$f^*(y) = \begin{cases} y \ln y - y, & y \geq 0, \\ \infty, & \text{ otherwise}. \end{cases}$$

Proof. To see this, we proceed as follows:

$$f^*(y) = \sup_{x \in \mathbb{R}} \{ x y - f(x) \} = \sup_{x \in \mathbb{R}} \{ x y - e^x \}.$$

  1. If $y < 0$, then the supremum value is $\infty$ (attained in the limit as $x \to -\infty$).

  2. If $y = 0$, then the supremum value is $0$ (attained in the limit as $x \to -\infty$).

  3. If $y > 0$, then the unique maximizer is obtained by differentiating $x y - e^x$ w.r.t. $x$, giving us $y = e^{\tilde{x}}$, i.e., $\tilde{x} = \ln y$.

  4. Then, the supremum value for $y > 0$ is $y \ln y - y$.

  5. Under the convention that $y \ln y = 0$ when $y = 0$, we see that $y \ln y - y = 0$ for $y = 0$.

  6. Thus, $f^*(y) = y \ln y - y$ for $y \geq 0$.

  7. $f^*(y) = \infty$ otherwise.
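A grid search makes this concrete (a minimal sketch assuming NumPy; the grid range and test values are arbitrary). For several $y > 0$ it compares the numerical supremum of $x y - e^x$ with the closed form $y \ln y - y$:

```python
import numpy as np

xs = np.linspace(-20.0, 10.0, 300001)   # dense grid standing in for the sup over x

def conj_numeric(y):
    return np.max(xs * y - np.exp(xs))

for y in [0.5, 1.0, 2.0, 5.0]:
    print(y, conj_numeric(y), y * np.log(y) - y)   # numerical sup vs y*ln(y) - y
```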

Theorem 9.249 (Negative log)

Let $f : \mathbb{R} \to (-\infty, \infty]$ be given by

$$f(x) = \begin{cases} -\ln(x), & x > 0, \\ \infty, & x \leq 0. \end{cases}$$

Then, the conjugate is

$$f^*(y) = \begin{cases} -1 - \ln(-y), & y < 0, \\ \infty, & y \geq 0. \end{cases}$$

Proof. To show this, we proceed as follows:

$$f^*(y) = \sup_{x > 0} \{ x y - f(x) \} = \sup_{x > 0} \{ x y + \ln(x) \}.$$

  1. If $y \geq 0$, then the supremum value is $\infty$.

  2. If $y < 0$, then we can differentiate the expression $x y + \ln(x)$ and set the derivative to $0$ to obtain the maximizer:

    $$\frac{d}{dx} (x y + \ln(x)) = y + \frac{1}{x}.$$
  3. Setting it to zero, we get $\tilde{x} = -\frac{1}{y}$.

  4. The supremum value is thus $-1 - \ln(-y)$.

Theorem 9.250 (Hinge loss)

Let $f : \mathbb{R} \to \mathbb{R}$ be given by

$$f(x) = \max \{ 1 - x, 0 \}.$$

Then, the conjugate is

$$f^*(y) = y + I_{[-1, 0]}(y) \quad \forall y \in \mathbb{R}$$

where $I$ is the indicator function.

Proof. To see this, note that

$$f^*(y) = \sup_x \{ x y - f(x) \} = \sup_x \left[ x y - \max \{ 1 - x, 0 \} \right] = \sup_x \left[ \min \{ x y - (1 - x), x y \} \right] = \sup_x \left[ \min \{ (1 + y) x - 1, y x \} \right].$$

  1. The terms $(1 + y) x - 1$ and $y x$ are affine in $x$, representing straight lines.

  2. The two lines intersect at

    $$(1 + y) x - 1 = y x \iff x - 1 = 0 \iff x = 1.$$
  3. For $x < 1$, $(1 + y) x - 1 < y x$.

  4. For $x > 1$, $(1 + y) x - 1 > y x$.

  5. Thus, the function $\min \{ (1 + y) x - 1, y x \}$ is a piecewise linear function of $x$:

    $$\min \{ (1 + y) x - 1, y x \} = \begin{cases} (1 + y) x - 1, & x < 1, \\ y x, & x \geq 1. \end{cases}$$
  6. The first piece is a line $(1 + y) x - 1$ with slope $1 + y$ over $x \in (-\infty, 1]$.

  7. The second piece is a line $y x$ with slope $y$ over $x \in [1, \infty)$.

  8. A finite supremum of this piecewise linear function exists if and only if the slope of the left piece is nonnegative and the slope of the right piece is nonpositive.

  9. In other words, $1 + y \geq 0$ and $y \leq 0$.

  10. Thus, a finite supremum exists only for $y \in [-1, 0]$.

  11. The supremum for this range of $y$ values is attained at $x = 1$ (where the two pieces intersect) and the supremum value equals $y$.

  12. For all $y \notin [-1, 0]$ the supremum is infinite.

  13. Thus, $\operatorname{dom} f^* = [-1, 0]$ and $f^*(y) = y$ for all $y \in \operatorname{dom} f^*$.

  14. $f^*(y) = \infty$ for $y \notin [-1, 0]$.

  15. This is succinctly represented in the expression for $f^*$ above with the help of an indicator function.
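The three regimes can be checked numerically (a sketch assuming NumPy; the grid and test values are arbitrary): the supremum is finite and equal to $y$ only for $y \in [-1, 0]$, and it keeps growing with the grid otherwise:

```python
import numpy as np

xs = np.linspace(-100.0, 100.0, 200001)

def conj(y):
    # sup over the grid of x*y - max(1 - x, 0)
    return np.max(xs * y - np.maximum(1.0 - xs, 0.0))

print(conj(-0.5))   # y in [-1, 0]: finite, approximately y = -0.5 (attained near x = 1)
print(conj(0.5))    # y > 0: unbounded above (grows as the grid widens)
print(conj(-1.5))   # y < -1: unbounded above as well
```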

Theorem 9.251 (p-th power of absolute value for p>1.)

Let, for some $p > 1$, the function $f : \mathbb{R} \to \mathbb{R}$ be given by

$$f(x) = \frac{1}{p} |x|^p.$$

Then, its conjugate is

$$f^*(y) = \frac{1}{q} |y|^q \quad \forall y \in \mathbb{R}$$

where $q > 1$ is the conjugate exponent satisfying

$$\frac{1}{p} + \frac{1}{q} = 1.$$

Proof. To see this, we proceed as follows:

  1. The conjugate is given by

    $$f^*(y) = \sup_x \{ x y - f(x) \} = \sup_x \left\{ x y - \frac{1}{p} |x|^p \right\}.$$
  2. The concave function $x y - \frac{1}{p} |x|^p$ is differentiable:

    $$\frac{d}{dx} \left( x y - \frac{1}{p} |x|^p \right) = y - \operatorname{sgn}(x) |x|^{p - 1}.$$
  3. Setting the derivative to $0$, the points at which the derivative vanishes are given by

    $$y - \operatorname{sgn}(\tilde{x}) |\tilde{x}|^{p - 1} = 0.$$
  4. We can rewrite this as

    $$y = \operatorname{sgn}(y) |y| = \operatorname{sgn}(\tilde{x}) |\tilde{x}|^{p - 1}.$$
  5. Therefore, $\operatorname{sgn}(y) = \operatorname{sgn}(\tilde{x})$ and $|\tilde{x}|^{p - 1} = |y|$.

  6. Thus,

    $$\tilde{x} = \operatorname{sgn}(\tilde{x}) |\tilde{x}| = \operatorname{sgn}(y) |y|^{\frac{1}{p - 1}}.$$
  7. Accordingly, the supremum value at $\tilde{x}$ is

    $$f^*(y) = \tilde{x} y - \frac{1}{p} |\tilde{x}|^p = |y|^{1 + \frac{1}{p - 1}} - \frac{1}{p} |y|^{\frac{p}{p - 1}} = |y|^{\frac{p}{p - 1}} - \frac{1}{p} |y|^{\frac{p}{p - 1}} = \left(1 - \frac{1}{p}\right) |y|^{\frac{p}{p - 1}} = \frac{1}{q} |y|^q$$

    where $q = \frac{p}{p - 1}$ is the conjugate exponent of $p$.
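A quick numerical check (a sketch assuming NumPy; $p = 3$ and the test values are arbitrary choices) compares the grid supremum with $\frac{1}{q}|y|^q$:

```python
import numpy as np

p = 3.0
q = p / (p - 1.0)                       # conjugate exponent: 1/p + 1/q = 1
xs = np.linspace(-50.0, 50.0, 1000001)

def conj(y):
    return np.max(xs * y - np.abs(xs) ** p / p)

for y in [-2.0, 0.5, 3.0]:
    print(conj(y), np.abs(y) ** q / q)  # numerical sup vs |y|^q / q
```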

Theorem 9.252 (p-th power of absolute value for 0<p<1)

Let, for some $0 < p < 1$, $f : \mathbb{R} \to (-\infty, \infty]$ be given by

$$f(x) = \begin{cases} -\frac{1}{p} x^p, & x \geq 0, \\ \infty, & x < 0. \end{cases}$$

Then, its conjugate is

$$f^*(y) = \begin{cases} -\frac{(-y)^q}{q}, & y < 0, \\ \infty, & \text{ otherwise} \end{cases}$$

where $q < 0$ is a negative number satisfying

$$\frac{1}{p} + \frac{1}{q} = 1.$$

Proof. To see this, we proceed as follows:

  1. The conjugate is given by

    $$f^*(y) = \sup_x \{ x y - f(x) \} = \sup_{x \geq 0} \left\{ x y + \frac{1}{p} x^p \right\}.$$
  2. Define $g : \mathbb{R} \to \mathbb{R}$ with $\operatorname{dom} g = \mathbb{R}_+$ as

    $$g(x) \triangleq x y + \frac{1}{p} x^p.$$
  3. When $y \geq 0$, then $g(x) \to \infty$ as $x \to \infty$.

  4. When $y < 0$, then $g'(x) = 0$ at $x = \tilde{x} = (-y)^{\frac{1}{p - 1}}$.

  5. We note that $g$ is a concave function for $y < 0$.

  6. Thus, $g$ achieves its supremum at $\tilde{x}$.

  7. The supremum value is given by

    $$f^*(y) = g(\tilde{x}) = \tilde{x} y + \frac{1}{p} \tilde{x}^p = (-y)^{\frac{1}{p - 1}} y + \frac{1}{p} (-y)^{\frac{p}{p - 1}} = -(-y)^{\frac{p}{p - 1}} + \frac{1}{p} (-y)^{\frac{p}{p - 1}} = -\left(1 - \frac{1}{p}\right) (-y)^{\frac{p}{p - 1}} = -\frac{1}{q} (-y)^q$$

    where $q = \frac{p}{p - 1}$.

  8. The domain of $f^*$ is $y < 0$.

9.17.6. n-dim functions#

9.17.6.1. Quadratics#

Theorem 9.253 (Strictly convex quadratic)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by

$$f(x) \triangleq \frac{1}{2} x^T A x + b^T x + c$$

where $A \in \mathbb{S}^n_{++}$, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$.

Then, the conjugate is

$$f^*(y) = \frac{1}{2} (y - b)^T A^{-1} (y - b) - c.$$

Proof. We proceed as follows. For any $y \in \mathbb{R}^n$,

$$f^*(y) = \sup_x \{ \langle x, y \rangle - f(x) \} = \sup_x \left\{ y^T x - \frac{1}{2} x^T A x - b^T x - c \right\} = \sup_x \left\{ -\frac{1}{2} x^T A x + (y - b)^T x - c \right\}.$$

  1. The supremum of the concave quadratic above is attained at $x = A^{-1} (y - b)$.

  2. The supremum value is $\frac{1}{2} (y - b)^T A^{-1} (y - b) - c$.
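The closed form can be verified against a generic numerical maximization (a sketch assuming NumPy and SciPy; the random matrix, vectors and scalars are arbitrary test data):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4.0 * np.eye(4)          # a positive definite matrix
b = rng.normal(size=4)
c = 0.7
y = rng.normal(size=4)

f = lambda x: 0.5 * x @ A @ x + b @ x + c

# Closed form: f*(y) = 1/2 (y - b)^T A^{-1} (y - b) - c.
closed = 0.5 * (y - b) @ np.linalg.solve(A, y - b) - c

# Numerical supremum of <x, y> - f(x): minimize the negative with a generic solver.
num = -minimize(lambda x: f(x) - x @ y, np.zeros(4)).fun
print(closed, num)   # the two values agree up to solver tolerance
```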

Theorem 9.254 (Convex quadratic)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by

$$f(x) \triangleq \frac{1}{2} x^T A x + b^T x + c$$

where $A \in \mathbb{S}^n_{+}$, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$.

Then, the conjugate is

$$f^*(y) = \begin{cases} \frac{1}{2} (y - b)^T A^{\dagger} (y - b) - c, & y \in b + \operatorname{range} A, \\ \infty, & \text{ otherwise} \end{cases}$$

where $A^{\dagger}$ is the Moore-Penrose pseudoinverse of $A$.

Proof. In this case, $A$ is not positive definite but only positive semidefinite.

For any $y \in \mathbb{R}^n$,

$$f^*(y) = \sup_x \{ \langle x, y \rangle - f(x) \} = \sup_x \left\{ y^T x - \frac{1}{2} x^T A x - b^T x - c \right\} = \sup_x \left\{ -\frac{1}{2} x^T A x + (y - b)^T x - c \right\}.$$

  1. Define $g : \mathbb{R}^n \to \mathbb{R}$ as

    $$g(x) = -\frac{1}{2} x^T A x + (y - b)^T x - c.$$
  2. Then,

    $$f^*(y) = \sup_x g(x).$$
  3. The gradient of $g$ is given by $\nabla g(x) = -A x + (y - b)$.

  4. The maximizers of $g$ are the points where the gradient vanishes, i.e., where $A x = y - b$.

  5. This system has a solution if and only if $y \in b + \operatorname{range} A$.

  6. If $y \in b + \operatorname{range} A$, we can choose one of the solutions of the zero gradient equation above.

  7. One possible solution is given by the Moore-Penrose pseudoinverse of $A$, namely $\tilde{x} = A^{\dagger} (y - b)$.

  8. For this solution,

    $$f^*(y) = -\frac{1}{2} \tilde{x}^T A \tilde{x} + (y - b)^T \tilde{x} - c = -\frac{1}{2} (y - b)^T A^{\dagger} A A^{\dagger} (y - b) + (y - b)^T A^{\dagger} (y - b) - c = \frac{1}{2} (y - b)^T A^{\dagger} (y - b) - c.$$

    We used the fact that $(A^{\dagger})^T = A^{\dagger}$ since $A$ is symmetric. Also, $A^{\dagger} A A^{\dagger} = A^{\dagger}$.

  9. We now consider the case where $y \notin b + \operatorname{range} A$.

  10. This is the same as $y - b \notin \operatorname{range} A$.

  11. Recall that $\operatorname{range} A = (\operatorname{null} A)^{\perp}$.

  12. Thus, $y - b \notin (\operatorname{null} A)^{\perp}$.

  13. Thus, there exists a vector $v \in \operatorname{null} A$ such that $(y - b)^T v > 0$ (replacing $v$ by $-v$ if necessary).

  14. Now, for any $t \in \mathbb{R}$,

    $$g(t v) = -\frac{t^2}{2} v^T A v + t (y - b)^T v - c = t (y - b)^T v - c$$

    since $A v = 0$.

  15. Since $(y - b)^T v > 0$, hence $g(t v) \to \infty$ as $t \to \infty$.

  16. Thus, $f^*(y) = \sup_x g(x) = \infty$ if $y \notin b + \operatorname{range} A$.

  17. Thus, $\operatorname{dom} f^* = b + \operatorname{range} A$.

9.17.6.2. Entropy#

Theorem 9.255 (Negative entropy)

Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be given by

$$f(x) \triangleq \begin{cases} \sum_{i=1}^n x_i \ln(x_i), & x \geq 0, \\ \infty, & \text{ otherwise}. \end{cases}$$

Then, the conjugate is

$$f^*(y) = \sum_{i=1}^n e^{y_i - 1}.$$

Proof. We note that the function $f(x)$ is separable over the components of $x$. Thus, it is enough to compute the conjugate of the scalar function $g : \mathbb{R} \to (-\infty, \infty]$ defined as

$$g(t) = \begin{cases} t \ln t, & t \geq 0, \\ \infty, & \text{ otherwise}. \end{cases}$$

We can then apply Theorem 9.243 to compute the conjugate of $f$.

Now,

$$g^*(s) = \sup_t \{ s t - g(t) \} = \sup_{t \geq 0} \{ s t - t \ln t \}.$$

It is easy to see that the supremum of the expression $s t - t \ln t$ is attained at $t = e^{s - 1}$. Thus,

$$g^*(s) = s e^{s - 1} - (s - 1) e^{s - 1} = e^{s - 1}.$$

Finally,

$$f(x) = \sum_{i=1}^n g(x_i).$$

Thus, due to Theorem 9.243,

$$f^*(y) = \sum_{i=1}^n g^*(y_i) = \sum_{i=1}^n e^{y_i - 1}.$$
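The separability argument can be mirrored numerically (a sketch assuming NumPy; the grid and the test vector $y$ are arbitrary): the conjugate is computed one coordinate at a time and the sum is compared with $\sum_i e^{y_i - 1}$:

```python
import numpy as np

ts = np.linspace(1e-8, 60.0, 600001)   # grid over t > 0 standing in for the sup

def g_conj(s):
    # Conjugate of the scalar piece g(t) = t*ln(t) on t >= 0.
    return np.max(s * ts - ts * np.log(ts))

y = np.array([0.3, -1.0, 2.0])
# Separability: f*(y) is the sum of the scalar conjugates; compare with sum(exp(y_i - 1)).
print(sum(g_conj(s) for s in y), np.sum(np.exp(y - 1.0)))
```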

Theorem 9.256 (Negative sum of logs)

Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be given by

$$f(x) \triangleq \begin{cases} -\sum_{i=1}^n \ln(x_i), & x \succ 0, \\ \infty, & \text{ otherwise}. \end{cases}$$

Then, the conjugate is

$$f^*(y) = \begin{cases} -n - \sum_{i=1}^n \ln(-y_i), & y \prec 0, \\ \infty, & \text{ otherwise}. \end{cases}$$

Proof. $f$ is separable. Define $g : \mathbb{R} \to (-\infty, \infty]$ as

$$g(t) = \begin{cases} -\ln t, & t > 0, \\ \infty, & t \leq 0. \end{cases}$$

Then,

$$f(x) = \sum_{i=1}^n g(x_i).$$

By Theorem 9.249,

$$g^*(y) = \begin{cases} -1 - \ln(-y), & y < 0, \\ \infty, & y \geq 0. \end{cases}$$

Due to Theorem 9.243,

$$f^*(y) = \sum_{i=1}^n g^*(y_i).$$

For $y \prec 0$,

$$f^*(y) = \sum_{i=1}^n (-1 - \ln(-y_i)) = -n - \sum_{i=1}^n \ln(-y_i).$$

For $y \not\prec 0$, one of the $g^*(y_i) = \infty$. Hence, $f^*(y) = \infty$.

Theorem 9.257 (Negative entropy over unit simplex)

Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be given by

$$f(x) \triangleq \begin{cases} \sum_{i=1}^n x_i \ln x_i, & x \in \Delta_n, \\ \infty, & \text{ otherwise}. \end{cases}$$

The conjugate is

$$f^*(y) = \ln \left( \sum_{j=1}^n e^{y_j} \right).$$

Recall from Definition 9.17 that the unit simplex in $\mathbb{R}^n$ is the set of nonnegative vectors whose elements sum to $1$:

$$\Delta_n = \{ x \in \mathbb{R}^n \mid \langle x, \mathbf{1} \rangle = 1, \; x \geq 0 \}.$$

Proof. For any $y \in \mathbb{R}^n$,

$$f^*(y) = \sup_x \{ \langle x, y \rangle - f(x) \} = \sup_x \left\{ \sum_{i=1}^n y_i x_i - \sum_{i=1}^n x_i \ln x_i \;\middle|\; \sum_{i=1}^n x_i = 1, \; x_1, \dots, x_n \geq 0 \right\}.$$

This maximization problem is equivalent to the minimization problem discussed later in Example 10.14. The optimal solution is given by

$$x_i = \frac{e^{y_i}}{\sum_{j=1}^n e^{y_j}}, \quad i = 1, \dots, n.$$

Accordingly, the optimal (supremum) value (following Example 10.14) is

$$f^*(y) = \ln \left( \sum_{i=1}^n e^{y_i} \right).$$

In other words, the conjugate of the negative entropy function (restricted to the unit simplex) is the log-sum-exp function.

Theorem 9.258 (Log sum exp)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by

$$f(x) \triangleq \ln \left( \sum_{j=1}^n e^{x_j} \right).$$

The conjugate is

$$f^*(y) = \begin{cases} \sum_{i=1}^n y_i \ln y_i, & y \in \Delta_n, \\ \infty, & \text{ otherwise}. \end{cases}$$

Proof. Following Theorem 9.257, $f = g^*$ where $g$ is the negative entropy over the unit simplex. Since $g$ is proper, closed and convex, hence due to Theorem 9.242,

$$f^* = g^{**} = g.$$

Log-sum-exp and negative entropy over the unit simplex are thus conjugates of each other.

9.17.6.3. Norms#

Theorem 9.259 (Norm)

Let $f : \mathbb{V} \to \mathbb{R}$ be given by

$$f(x) = \| x \|.$$

Then, the conjugate $f^* : \mathbb{V}^* \to (-\infty, \infty]$ for any $y \in \mathbb{V}^*$ is given by

$$f^*(y) = I_{B_{\| \cdot \|_*}[0, 1]}(y) = \begin{cases} 0, & \| y \|_* \leq 1, \\ \infty, & \text{ otherwise}. \end{cases}$$

In other words, it is the indicator function of the closed unit ball w.r.t. the dual norm $\| \cdot \|_*$.

Proof. We proceed as follows.

  1. By Theorem 9.95, the support function of the closed unit ball of a norm is its dual norm.

  2. Applying this to the closed unit ball of the dual norm,

    $$\sigma_{B_{\| \cdot \|_*}[0, 1]} = \| \cdot \|_{**} = \| \cdot \| = f.$$

    Here, we have used the fact that the dual norm of the dual norm is the norm itself.

  3. By Example 9.80, for a nonempty, closed and convex set $C$,

    $$\sigma_C^* = I_C.$$
  4. The set $B_{\| \cdot \|_*}[0, 1]$ is nonempty, closed and convex.

  5. Hence,

    $$f^* = \sigma_{B_{\| \cdot \|_*}[0, 1]}^* = I_{B_{\| \cdot \|_*}[0, 1]}.$$

Theorem 9.260 (Squared norm)

Let $f : \mathbb{V} \to \mathbb{R}$ be given by

$$f(x) = \frac{1}{2} \| x \|^2.$$

Then, the conjugate $f^* : \mathbb{V}^* \to (-\infty, \infty]$ for any $y \in \mathbb{V}^*$ is given by

$$f^*(y) = \frac{1}{2} \| y \|_*^2.$$

Proof. We proceed as follows.

  1. By the definition of the conjugate,

    $$f^*(y) = \sup_{x \in \mathbb{V}} \left\{ \langle x, y \rangle - \frac{1}{2} \| x \|^2 \right\}.$$
  2. Consider the sets $S_t = \{ x \in \mathbb{V} \mid \| x \| = t \}$.

  3. Then, we can write $\mathbb{V} = \bigcup_{t \geq 0} S_t$.

  4. Define

    $$g_t(y) = \sup_{x \in S_t} \left\{ \langle x, y \rangle - \frac{1}{2} \| x \|^2 \right\} = \sup_{\| x \| = t} \left\{ \langle x, y \rangle - \frac{1}{2} t^2 \right\}.$$
  5. Then it is easy to see that

    $$f^*(y) = \sup_{t \geq 0} g_t(y).$$
  6. In other words, we transform the maximization (for the conjugate) into a double maximization:

    $$f^*(y) = \sup_{t \geq 0} \sup_{\| x \| = t} \left\{ \langle x, y \rangle - \frac{1}{2} t^2 \right\}.$$
  7. Now,

    $$g_t(y) = \sup_{\| x \| = t} \left\{ \langle x, y \rangle - \frac{1}{2} t^2 \right\} = \sup_{\| x \| = t} \langle x, y \rangle - \frac{1}{2} t^2 = t \| y \|_* - \frac{1}{2} t^2.$$
  8. Hence,

    $$f^*(y) = \sup_{t \geq 0} g_t(y) = \sup_{t \geq 0} \left( t \| y \|_* - \frac{1}{2} t^2 \right).$$
  9. Differentiating $g_t(y)$ w.r.t. $t$ and setting the derivative to zero, we obtain

    $$\| y \|_* - t = 0.$$
  10. Evaluating $g_t(y)$ at $t = \| y \|_*$, we get

    $$f^*(y) = \| y \|_*^2 - \frac{1}{2} \| y \|_*^2 = \frac{1}{2} \| y \|_*^2.$$
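For a concrete norm the identity is easy to check on a grid (a sketch assuming NumPy; the choice of the $\ell_1$ norm on $\mathbb{R}^2$, whose dual is the $\ell_\infty$ norm, and the test point are arbitrary):

```python
import numpy as np

# Work in R^2 with the l1 norm, whose dual norm is the l-infinity norm.
g = np.linspace(-20.0, 20.0, 801)
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)
half_sq_l1 = 0.5 * np.abs(X).sum(axis=1) ** 2

def conj(y):
    # (1/2 ||x||_1^2)*(y) ~= max over the grid of { <x, y> - 1/2 ||x||_1^2 }
    return np.max(X @ y - half_sq_l1)

y = np.array([1.5, -0.25])
print(conj(y), 0.5 * np.max(np.abs(y)) ** 2)   # both should equal 1.125
```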

Theorem 9.261 (Ball-pen)

Let $f : \mathbb{V} \to (-\infty, \infty]$ be given by

$$f(x) \triangleq \begin{cases} -\sqrt{1 - \| x \|^2}, & \| x \| \leq 1, \\ \infty, & \text{ otherwise}. \end{cases}$$

Then, the conjugate $f^* : \mathbb{V}^* \to (-\infty, \infty]$ for any $y \in \mathbb{V}^*$ is given by

$$f^*(y) = \sqrt{\| y \|_*^2 + 1}.$$

Generalizing further, let $f_\alpha$ for some $\alpha > 0$ be defined as

$$f_\alpha(x) \triangleq \begin{cases} -\sqrt{\alpha^2 - \| x \|^2}, & \| x \| \leq \alpha, \\ \infty, & \text{ otherwise}. \end{cases}$$

The conjugate is given by

$$f_\alpha^*(y) = \alpha \sqrt{\| y \|_*^2 + 1}.$$

Proof. We shall use the double maximization approach here.

$$f^*(y) = \sup \left\{ \langle x, y \rangle + \sqrt{1 - \| x \|^2} \;\middle|\; \| x \| \leq 1 \right\} = \sup_{t \in [0, 1]} \sup_{\| x \| = t} \left\{ \langle x, y \rangle + \sqrt{1 - t^2} \right\} = \sup_{t \in [0, 1]} \left\{ t \| y \|_* + \sqrt{1 - t^2} \right\}.$$

Introduce

$$g(t) = t \| y \|_* + \sqrt{1 - t^2}.$$

Differentiating w.r.t. $t$ and setting the derivative to $0$, we get

$$\| y \|_* - \frac{t}{\sqrt{1 - t^2}} = 0
\iff \| y \|_* = \frac{t}{\sqrt{1 - t^2}}
\iff \| y \|_*^2 = \frac{t^2}{1 - t^2}
\iff 1 + \| y \|_*^2 = \frac{1}{1 - t^2}
\iff 1 - t^2 = \frac{1}{1 + \| y \|_*^2}
\iff t^2 = \frac{\| y \|_*^2}{1 + \| y \|_*^2}
\iff t = \frac{\| y \|_*}{\sqrt{1 + \| y \|_*^2}}.$$

Since $t \in [0, 1]$, only the positive square root is considered. It is easy to see that this value of $t$ lies in $[0, 1]$.

Putting back the value of $t$ into the expression for $f^*(y)$,

$$f^*(y) = \frac{\| y \|_*^2}{\sqrt{1 + \| y \|_*^2}} + \sqrt{1 - \frac{\| y \|_*^2}{1 + \| y \|_*^2}} = \frac{\| y \|_*^2}{\sqrt{1 + \| y \|_*^2}} + \frac{1}{\sqrt{1 + \| y \|_*^2}} = \frac{1 + \| y \|_*^2}{\sqrt{1 + \| y \|_*^2}} = \sqrt{1 + \| y \|_*^2}.$$

Next, we note that

$$f_\alpha(x) = \alpha f\left(\frac{x}{\alpha}\right).$$

Applying Theorem 9.245,

$$f_\alpha^*(y) = \alpha f^*(y) = \alpha \sqrt{1 + \| y \|_*^2}.$$

Theorem 9.262 (Conjugate of $\sqrt{t^2 + \| \cdot \|^2}$)

Let $g_t : \mathbb{V} \to \mathbb{R}$ for some $t > 0$ be given by

$$g_t(x) = \sqrt{t^2 + \| x \|^2}.$$

Then the conjugate is given by

$$g_t^*(y) = \begin{cases} -t \sqrt{1 - \| y \|_*^2}, & \| y \|_* \leq 1, \\ \infty, & \text{ otherwise}. \end{cases}$$

Proof. We proceed as follows.

  1. Define $g(x) = \sqrt{1 + \| x \|^2}$.

  2. We can see that $g_t(x) = t\, g\left(\frac{x}{t}\right)$.

  3. By Theorem 9.261, $g = f^*$ where $f$ is the ball-pen function (for the dual norm) given by

    $$f(y) = \begin{cases} -\sqrt{1 - \| y \|_*^2}, & \| y \|_* \leq 1, \\ \infty, & \text{ otherwise}. \end{cases}$$
  4. We note that $f$ is a proper, closed and convex function.

  5. By Theorem 9.242,

    $$g^* = f^{**} = f.$$
  6. Thus,

    $$g^*(y) = \begin{cases} -\sqrt{1 - \| y \|_*^2}, & \| y \|_* \leq 1, \\ \infty, & \text{ otherwise}. \end{cases}$$
  7. Applying Theorem 9.245,

    $$g_t^*(y) = t\, g^*(y) = \begin{cases} -t \sqrt{1 - \| y \|_*^2}, & \| y \|_* \leq 1, \\ \infty, & \text{ otherwise}. \end{cases}$$