10.7. Constrained Optimization II#

In this section, we present several results for the general problem of optimizing a cost function over a constraint set.

Throughout this section, we assume that $V$ is an $n$-dimensional real vector space endowed with an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ and a norm $\| \cdot \| : V \to \mathbb{R}$.

The main reference for this section is [9].

10.7.1. General Constrained Optimization#

Let $f : V \to (-\infty, \infty]$ be a proper function with $S = \operatorname{dom} f$. Let $C \subseteq V$ be a nonempty set.

We are concerned with the optimization problems of the form

$$
\tag{10.15}
\begin{array}{ll}
\text{minimize} & f(x) \\
\text{subject to} & x \in C.
\end{array}
$$

We provide optimality conditions for three specific types of problems.

  1. A continuously differentiable function f over an arbitrary constraint set C.

  2. A convex, possibly nonsmooth, cost function f and a convex constraint set C.

  3. A cost function consisting of the sum of a smooth (continuously differentiable) function and a convex function, over an arbitrary constraint set.

For the development of the results in this section, we will need the supporting machinery of the following notions:

  • Feasible directions and the cone of feasible directions

  • Tangent directions and tangent cone

  • Normal directions and normal cone

For a nonconvex optimization problem, the optimality conditions provide necessary conditions for a feasible point to be a locally optimal solution.

For convex problems, the optimality conditions provide necessary and sufficient conditions for a feasible point to be a globally optimal solution.

10.7.2. Feasible Directions#

At any point in the constraint set $C$, a feasible direction is a direction along which we can move a small distance while staying inside $C$.

Definition 10.23 (Feasible direction)

Consider the optimization problem (10.15). Given a vector $x \in C$, a direction $d \in V$ is said to be a feasible direction of $C$ at $x$ if there exists a $\overline{t} > 0$ such that

$$x + t d \in C \quad \text{ for every } t \in [0, \overline{t}].$$

The set of all feasible directions of $C$ at $x$ is denoted by $F_C(x)$.
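The definition can be checked numerically. Below is a minimal sketch (not from the text): we take $C$ to be the nonnegative orthant in $\mathbb{R}^2$, pick a boundary point $x$, and test whether $x + t d$ stays in $C$ for all small $t > 0$. The set, the point, and the helper `is_feasible_direction` are illustrative choices.

```python
import numpy as np

def is_feasible_direction(x, d, contains, t_bar=1e-3, samples=50):
    """Check numerically whether d is a feasible direction of C at x,
    i.e. whether x + t*d stays in C for all t in (0, t_bar]."""
    ts = np.linspace(t_bar / samples, t_bar, samples)
    return all(contains(x + t * d) for t in ts)

# C = nonnegative orthant in R^2; x sits on its boundary.
contains = lambda p: bool(np.all(p >= 0))
x = np.array([1.0, 0.0])

assert is_feasible_direction(x, np.array([0.0, 1.0]), contains)       # points into C
assert is_feasible_direction(x, np.array([-1.0, 2.0]), contains)      # still feasible
assert not is_feasible_direction(x, np.array([0.0, -1.0]), contains)  # leaves C
```

Note that only a small interval $(0, \overline{t}]$ is sampled: feasibility is a local property at $x$.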

Observation 10.8 (The cone of feasible directions)

The set of feasible directions at $x$ is a cone.

  1. We can see that $0 \in F_C(x)$.

  2. Also, if $d \in F_C(x)$, then for any $r > 0$, $r d \in F_C(x)$ also holds true.

10.7.2.1. Convexity#

Theorem 10.60 (Convexity of the cone of feasible directions)

Let $C$ be a nonempty subset of $V$. Let $x \in C$. If $C$ is convex, then $F_C(x)$ is convex.

Proof. We are given that $C$ is convex.

  1. For any point $y \in C$, the line segment $[x, y] \subseteq C$.

  2. Hence the set $F_C(x)$ consists of all the vectors of the form $t (y - x)$ where $t > 0$ and $y \in C$.

  3. Let $d_1, d_2 \in F_C(x)$ and $r \in (0, 1)$.

  4. Then there exist $y_1, y_2 \in C$ and $t_1, t_2 > 0$ such that

     $$d_1 = t_1 (y_1 - x), \quad d_2 = t_2 (y_2 - x).$$

  5. Then

     $$
     \begin{aligned}
     d &= r d_1 + (1 - r) d_2 = r t_1 (y_1 - x) + (1 - r) t_2 (y_2 - x) \\
       &= (r t_1 + (1 - r) t_2) \left( \frac{r t_1}{r t_1 + (1 - r) t_2} y_1 + \frac{(1 - r) t_2}{r t_1 + (1 - r) t_2} y_2 - x \right).
     \end{aligned}
     $$

  6. Let $s = \frac{r t_1}{r t_1 + (1 - r) t_2}$.

  7. Then $s \in (0, 1)$ and $1 - s = \frac{(1 - r) t_2}{r t_1 + (1 - r) t_2}$.

  8. Hence $y = s y_1 + (1 - s) y_2 \in C$ due to convexity of $C$.

  9. Hence $d = (r t_1 + (1 - r) t_2) (y - x)$.

  10. Hence $d$ is a feasible direction.

  11. Hence $F_C(x)$ is convex.

10.7.3. Tangent Cones#

10.7.3.1. Tangent Direction#

Definition 10.24 (Tangent direction)

Let $C$ be a nonempty subset of $V$. Let $x \in C$. A vector $t \in V$ is called a tangent of $C$ at $x$ if either $t = 0$ or there exists a sequence $\{ x_k \}$ of $C$ such that $x_k \neq x$ for every $k$, and

$$x_k \to x, \quad \frac{x_k - x}{\| x_k - x \|} \to \frac{t}{\| t \|}.$$

  1. For a nonzero direction $t$, the term $\frac{t}{\| t \|}$ is a normalized direction.

  2. Since $x_k \neq x$ for every $k$, hence $\| x_k - x \| \neq 0$ for every $k$.

  3. The term $\frac{x_k - x}{\| x_k - x \|}$ is also a normalized direction.

  4. Hence the sequence $\left\{ \frac{x_k - x}{\| x_k - x \|} \right\}$ is a sequence of normalized directions.

  5. Thus a nonzero direction $t$ is a tangent at $x$ if it is possible to approach $x$ with a feasible sequence $\{ x_k \}$ such that the normalized direction sequence $\left\{ \frac{x_k - x}{\| x_k - x \|} \right\}$ converges to $\frac{t}{\| t \|}$.

10.7.3.2. Tangent Cone#

Theorem 10.61 (The cone of tangent directions)

Let $C$ be a nonempty subset of $V$. Let $x \in C$. The set of tangent directions of $C$ at $x$ is a cone.

Proof. We can see that by definition $0$ is a tangent direction.

  1. Let $t \neq 0$ be a tangent direction.

  2. Let $\alpha > 0$.

  3. Then

     $$\frac{\alpha t}{\| \alpha t \|} = \frac{t}{\| t \|} = \lim_{k \to \infty} \frac{x_k - x}{\| x_k - x \|}$$

     for some sequence $\{ x_k \}$ of $C$ converging to $x$ with $x_k \neq x$ for every $k$.

  4. Hence $\alpha t$ is also a tangent direction.

  5. Hence the set of tangent directions is a cone.

Definition 10.25 (Tangent cone)

Let $C$ be a nonempty subset of $V$. Let $x \in C$. The set of all tangent directions of $C$ at $x$ is called the tangent cone of $C$ at $x$ and is denoted by $T_C(x)$.

10.7.3.3. Characterization#

Theorem 10.62 (Characterization of tangent directions)

Let $C$ be a nonempty subset of $V$. Let $x \in C$. A vector $t \in V$ is a tangent of $C$ at $x$ if and only if there exist a sequence $\{ x_k \}$ of $C$ and a positive scalar sequence $\{ r_k \}$ such that $r_k \to 0$ and $\frac{x_k - x}{r_k} \to t$.

Proof. Let $t$ be a tangent of $C$ at $x$.

  1. If $t = 0$, then we can take $x_k = x$ for every $k$ and $r_k = \frac{1}{k}$ for every $k$.

  2. Then $r_k \to 0$ and $\frac{x_k - x}{r_k} \to 0$.

  3. Now consider the case where $t \neq 0$.

  4. By definition of tangent, there exists a sequence $\{ x_k \}$ of $C$ with $x_k \neq x$, $x_k \to x$ and $\frac{x_k - x}{\| x_k - x \|} \to \frac{t}{\| t \|}$.

  5. Let $r_k = \frac{\| x_k - x \|}{\| t \|}$.

  6. Clearly $r_k > 0$ and $r_k \to 0$ since $x_k \to x$.

  7. Also

     $$\frac{x_k - x}{r_k} = \| t \| \frac{x_k - x}{\| x_k - x \|}.$$

  8. Since by definition of tangent

     $$\frac{x_k - x}{\| x_k - x \|} \to \frac{t}{\| t \|},$$

     hence

     $$\frac{x_k - x}{r_k} \to t$$

     as desired.

Conversely, suppose $t$ is such that there exist sequences $\{ x_k \}$ and $\{ r_k \}$ with the given properties.

  1. If $t = 0$ then it is a tangent.

  2. Now consider the case where $t \neq 0$.

  3. Since $r_k \to 0$ and $(x_k - x)/r_k \to t$, hence we must have $x_k \to x$.

  4. It is also possible to choose a subsequence of $\{ x_k \}$ such that $x_{k_l} \neq x$ for every $k_l$. Otherwise, $(x_k - x)/r_k \to 0$, but we are given that $t \neq 0$.

  5. WLOG assume that $x_k \neq x$ for every $k$.

  6. Then

     $$\frac{x_k - x}{\| x_k - x \|} = \frac{(x_k - x)/r_k}{\| x_k - x \|/r_k} \to \frac{t}{\| t \|}.$$

  7. Hence $t$ must be a tangent direction.

10.7.3.4. Closedness#

Theorem 10.63 (Closedness of tangent cone)

Let $C$ be a nonempty subset of $V$. Let $x \in C$. Then $T_C(x)$ is closed.

Proof. We show that every convergent sequence of $T_C(x)$ has its limit in $T_C(x)$.

  1. Let $\{ t_k \}$ be a sequence of $T_C(x)$ such that $\lim t_k = t \in V$.

  2. If $t = 0$, then $t \in T_C(x)$ and we are done.

  3. Hence consider the case where $t \neq 0$.

  4. WLOG, assume that $t_k \neq 0$ for every $k$.

  5. By the definition of the tangent, for every $k$, there is a sequence $\{ x_k^i \}$ of $C$ with $x_k^i \neq x$ such that

     $$\lim_{i \to \infty} x_k^i = x, \quad \lim_{i \to \infty} \frac{x_k^i - x}{\| x_k^i - x \|} = \frac{t_k}{\| t_k \|}.$$

  6. For each $k$, choose an index $i_k$ with $i_1 < i_2 < \dots < i_k$ such that

     $$\| x_k^{i_k} - x \| \leq \frac{1}{k}, \quad \left\| \frac{x_k^{i_k} - x}{\| x_k^{i_k} - x \|} - \frac{t_k}{\| t_k \|} \right\| \leq \frac{1}{k}.$$

  7. For all $k$, we have, by the triangle inequality,

     $$\left\| \frac{x_k^{i_k} - x}{\| x_k^{i_k} - x \|} - \frac{t}{\| t \|} \right\| \leq \left\| \frac{x_k^{i_k} - x}{\| x_k^{i_k} - x \|} - \frac{t_k}{\| t_k \|} \right\| + \left\| \frac{t_k}{\| t_k \|} - \frac{t}{\| t \|} \right\|.$$

  8. Taking the limit $k \to \infty$, both terms on the right hand side go to zero, and we have

     $$\lim_{k \to \infty} \left\| \frac{x_k^{i_k} - x}{\| x_k^{i_k} - x \|} - \frac{t}{\| t \|} \right\| = 0.$$

  9. Since also $x_k^{i_k} \to x$, hence $t \in T_C(x)$.

10.7.3.5. Feasible Directions and Tangent Cone#

Theorem 10.64 (Feasible cone and tangent cone)

Let $C$ be a nonempty subset of $V$. Let $x \in C$. The following hold true regarding the cone of feasible directions $F_C(x)$ and the tangent cone $T_C(x)$.

  1. $F_C(x) \subseteq T_C(x)$.

  2. $\operatorname{cl} F_C(x) \subseteq T_C(x)$.

  3. If $C$ is convex, then

     $$\operatorname{cl} F_C(x) = T_C(x).$$

Proof. (1) Every feasible direction is a tangent direction.

  1. Let $d$ be a feasible direction.

  2. Then there exists a $\overline{t} > 0$ such that

     $$x + t d \in C \quad \forall t \in [0, \overline{t}].$$

  3. Let $r_k = \frac{\overline{t}}{k}$ for every $k \in \mathbb{N}$.

  4. Then $r_k > 0$ for every $k$ and $r_k \to 0$.

  5. Let $x_k = x + r_k d$.

  6. Then $x_k \in C$ for every $k$ and $x_k \to x$.

  7. Also $\frac{x_k - x}{r_k} = d$ for every $k$.

  8. Hence $\frac{x_k - x}{r_k} \to d$.

  9. Hence $d$ is a tangent direction due to Theorem 10.62.

  10. Hence $F_C(x) \subseteq T_C(x)$.

(2) Closure

  1. By Theorem 10.63, $T_C(x)$ is closed.

  2. We have shown that $F_C(x) \subseteq T_C(x)$.

  3. The closure of a set is the smallest closed set containing it.

  4. Hence $\operatorname{cl} F_C(x) \subseteq T_C(x)$.

(3) Convexity and Closure

  1. We are given that $C$ is convex.

  2. By claim (2), we have $\operatorname{cl} F_C(x) \subseteq T_C(x)$.

  3. It will suffice to show the reverse inclusion $T_C(x) \subseteq \operatorname{cl} F_C(x)$.

  4. Let $d \in T_C(x)$.

  5. By Theorem 10.62, there exist a sequence $\{ x_k \}$ of $C$ with $x_k \to x$ and a positive scalar sequence $\{ r_k \}$ with $r_k \to 0$ such that $\frac{x_k - x}{r_k} \to d$.

  6. Since $C$ is convex, hence the direction $\frac{x_k - x}{r_k}$ is a feasible direction of $C$ at $x$ for every $k$.

  7. Hence $\left\{ \frac{x_k - x}{r_k} \right\}$ is a convergent sequence of feasible directions.

  8. Hence $d \in \operatorname{cl} F_C(x)$.

  9. Hence $T_C(x) \subseteq \operatorname{cl} F_C(x)$.

10.7.3.6. Convexity#

Theorem 10.65 (Convexity of the tangent cone)

Let $C$ be a nonempty subset of $V$. Let $x \in C$. If $C$ is convex, then $T_C(x)$ is convex.

Proof. We proceed as follows.

  1. By Theorem 10.60, $F_C(x)$ is convex.

  2. By Theorem 10.64,

     $$T_C(x) = \operatorname{cl} F_C(x)$$

     since $C$ is convex.

  3. By Theorem 9.123, the closure of a convex set is convex.

  4. Hence $T_C(x)$ is convex.

10.7.3.7. Polar Cone of Tangent Cone#

Recall from Definition 9.36 that the normal cone to a set $C$ at a point $a \in C$ is given by

$$N_C(a) = \{ v \in V \mid \langle x - a, v \rangle \leq 0 \quad \forall x \in C \}.$$

Theorem 10.66 (Characterization of the polar cone of the tangent cone for a convex set)

Let $C$ be a nonempty convex set. Then for every $x \in C$, we have $z \in T_C(x)^{\circ}$ if and only if

$$\langle y - x, z \rangle \leq 0 \quad \forall y \in C.$$

In particular, we have

$$T_C(x)^{\circ} = N_C(x), \quad N_C(x)^{\circ} = T_C(x)$$

where $N_C(x)$ is the normal cone of $C$ at $x$.

Proof. First consider that $z \in T_C(x)^{\circ}$.

  1. We note that $F_C(x) \subseteq T_C(x)$.

  2. Since $C$ is convex, hence $y - x \in F_C(x)$ for every $y \in C$.

  3. Hence $y - x \in T_C(x)$ for every $y \in C$.

  4. Hence $\langle y - x, z \rangle \leq 0$ for every $y \in C$ from the definition of the polar cone of $T_C(x)$.

Now assume that some $z \in V$ satisfies

$$\langle y - x, z \rangle \leq 0 \quad \forall y \in C.$$

  1. For contradiction, assume that $z \notin T_C(x)^{\circ}$.

  2. Then there exists some $y \in T_C(x)$ such that $\langle y, z \rangle > 0$.

  3. Since $\operatorname{cl} F_C(x) = T_C(x)$, hence there exists a sequence $\{ y_k \}$ of $F_C(x)$ such that $y_k \to y$.

  4. $y_k$ is a feasible direction at $x$ for every $k$.

  5. Due to convexity of $C$, $y_k = r_k (x_k - x)$ for every $k$ where $x_k \in C$ and $r_k > 0$.

  6. Since $\langle y, z \rangle > 0$, hence for sufficiently large $k$, we should have $\langle y_k, z \rangle > 0$.

  7. In other words, for sufficiently large $k$, we have $r_k \langle x_k - x, z \rangle > 0$.

  8. Since $r_k > 0$, hence it reduces to $\langle x_k - x, z \rangle > 0$.

  9. But this is a contradiction to the hypothesis.

  10. Hence we must have $z \in T_C(x)^{\circ}$.

This characterization of the polar cone of the tangent cone implies that

$$T_C(x)^{\circ} = N_C(x).$$

Since $T_C(x)$ is a closed and convex cone for a convex $C$, hence due to Theorem 9.62, we have

$$N_C(x)^{\circ} = (T_C(x)^{\circ})^{\circ} = T_C(x).$$
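The identity $T_C(x)^{\circ} = N_C(x)$ can be probed numerically for a convex set. A minimal sketch (the set and point are illustrative, not from the text): for the closed unit ball in $\mathbb{R}^2$ at a boundary point $x$, the normal cone is the ray of outward radial directions, and membership of $z$ in $N_C(x)$ is tested via $\langle y - x, z \rangle \leq 0$ over sampled $y \in C$.

```python
import numpy as np

rng = np.random.default_rng(0)

# C = closed unit ball in R^2; x is a boundary point, ||x|| = 1.
x = np.array([1.0, 0.0])

def in_normal_cone(z, n_samples=2000):
    """Sampled check of the convex-set criterion: z in N_C(x) iff
    <y - x, z> <= 0 for all y in C."""
    ys = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    ys = ys[np.linalg.norm(ys, axis=1) <= 1.0]  # keep samples inside the ball
    return bool(np.all(ys @ z - x @ z <= 1e-12))

# At a boundary point of the ball, N_C(x) = {lambda * x : lambda >= 0}.
assert in_normal_cone(2.0 * x)                    # outward radial: normal
assert not in_normal_cone(np.array([0.0, 1.0]))   # tangential: not normal
```

The sampled test can only refute membership up to the sample resolution; here the analytic answer is known, so the assertions are safe.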

10.7.3.8. Examples#

Example 10.10 (Tangent cone for a linear system of equations)

Let $C = \{ x \in \mathbb{R}^n \mid A x = b \}$ where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.

Since $C$ is convex, the strategy for computation of the tangent cone is as follows.

  1. Identify the set of all feasible directions at the point.

  2. Compute the closure of the set of feasible directions at the point.

We note that $C$ is an affine subspace.

  1. Note that the subspace $L$ parallel to $C$ is given by

     $$L = \{ x \in \mathbb{R}^n \mid A x = 0 \}.$$

  2. Let $x \in C$.

  3. Every direction in the subspace $L$ is a feasible direction.

    1. Let $d \in L$.

    2. Then $A (x + t d) = A x + t A d = b$ for every $t \geq 0$.

    3. Hence $d$ is a feasible direction.

  4. There are no other feasible directions: if $A d \neq 0$, then $A (x + t d) = b + t A d \neq b$ for every $t > 0$.

  5. Hence $F_C(x) = L$.

  6. Since $\mathbb{R}^n$ is finite dimensional, hence $L$ is closed.

  7. Since $T_C(x) = \operatorname{cl} F_C(x)$, hence $T_C(x) = L$.
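The computation above can be sketched numerically. Assuming NumPy, with an illustrative system $A x = b$ in $\mathbb{R}^3$: the tangent cone at any feasible point is the null space of $A$, and an orthonormal basis for it can be read off an SVD.

```python
import numpy as np

# Illustrative data: C = {x in R^3 : A x = b} is an affine set.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
x = np.array([1.0, 0.0, 0.0])  # a point of C: A @ x == b

# T_C(x) = L = null space of A; compute an orthonormal basis via SVD.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
L_basis = Vt[rank:].T  # columns span {d : A d = 0}

# Every direction in the null space is feasible: A(x + t d) = b for all t.
d = L_basis @ np.array([0.3, -0.7])
assert np.allclose(A @ (x + 5.0 * d), b)

# A direction outside the null space immediately leaves C.
assert not np.allclose(A @ (x + np.array([1.0, 0.0, 0.0])), b)
```

Note that for an affine set the feasible cone is already closed, so no separate closure step is needed in the computation.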

Example 10.11 (Tangent cone for an affine subspace)

Let $A \subseteq V$ be an affine subspace.

  1. We note that $A$ is closed and convex.

  2. Let $L$ be the linear subspace parallel to $A$.

  3. Pick some $x \in A$.

  4. Then for any $t \in L$, we have $x + t \in A$.

  5. Hence every direction in $L$ is a feasible direction.

  6. It is also easy to see that no other direction is a feasible direction.

  7. Hence $F_A(x) = L$.

  8. Since $T_A(x) = \operatorname{cl} F_A(x)$, hence $T_A(x) = L$.

The tangent cone at any point in an affine subspace is the subspace parallel to the affine subspace.

Example 10.12 (Tangent cone of a closed unit ball)

Consider the set $C = B[0, 1] \subseteq V$ given by

$$B[0, 1] = \{ x \mid \| x \| \leq 1 \}.$$

Since C is convex, the strategy for computation of tangent cone is as follows.

  1. Pick a point.

  2. Identify the set of all feasible directions at the point.

  3. Compute the closure of the set of feasible directions at the point.

There are two possible cases for the points in $C$ that require separate treatment.

  1. $x \in \operatorname{int} C$.

  2. $x \in \operatorname{bd} C$.

(1) Interior points

  1. Let $x \in \operatorname{int} C$.

  2. Then it is easy to see that every direction is a feasible direction along which we can find another point in $C$.

  3. Hence $F_C(x) = V$.

  4. Then $T_C(x) = \operatorname{cl} F_C(x) = V$.

(2) Boundary points

  1. Let $x \in \operatorname{bd} C$.

  2. We can see that only those directions are feasible that point towards some point in $C$. Any direction pointing towards the exterior of $C$ is not a feasible direction.

    1. The directions which make an acute angle with $x$ point away from the ball.

    2. The directions which make an obtuse angle with $x$ point inside the ball.

  3. We shall now formally prove this.

  4. Since $x \in \operatorname{bd} C$, hence $\| x \| = 1$.

  5. Let $y \in F_C(x)$.

  6. Then we must have $x + t y \in C$ for sufficiently small $t > 0$.

  7. Hence $y \in F_C(x)$ if and only if there exists a $\overline{t} > 0$ such that for every $t \in (0, \overline{t}]$ we have $\| x + t y \| \leq 1$.

  8. Expanding, we have

     $$\| x \|^2 + 2 t \langle x, y \rangle + t^2 \| y \|^2 \leq 1 \quad \forall t \in (0, \overline{t}].$$

  9. Since $\| x \| = 1$, hence dividing by $t > 0$, this reduces to

     $$2 \langle x, y \rangle + t \| y \|^2 \leq 0 \quad \forall t \in (0, \overline{t}].$$

  10. There are only two possibilities for this relation to hold true.

    1. Either $y = 0$.

    2. Or $\langle x, y \rangle < 0$ and $\overline{t} \leq -\frac{2 \langle x, y \rangle}{\| y \|^2}$.

    3. In the second case, we can pick

       $$\overline{t} = -\frac{2 \langle x, y \rangle}{\| y \|^2}.$$

  11. Hence $F_C(x) = \{ y \mid \langle x, y \rangle < 0 \} \cup \{ 0 \}$.

  12. From $T_C(x) = \operatorname{cl} F_C(x)$, we have

     $$T_C(x) = \{ y \mid \langle x, y \rangle \leq 0 \}.$$
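The three regimes above can be checked numerically. A small sketch (illustrative point and directions, assuming NumPy): at a boundary point $x$ of the unit ball, strictly inward directions ($\langle x, y \rangle < 0$) are feasible, while tangential directions ($\langle x, y \rangle = 0$) are not feasible yet still belong to $T_C(x)$ as limits of feasible directions.

```python
import numpy as np

x = np.array([0.0, 1.0])  # boundary point of the closed unit ball, ||x|| = 1

def feasible(y, t_bar=1e-4, samples=100):
    """y is a feasible direction at x iff ||x + t y|| <= 1 for all small t > 0."""
    ts = np.linspace(t_bar / samples, t_bar, samples)
    return all(np.linalg.norm(x + t * y) <= 1.0 for t in ts)

inward = np.array([0.5, -1.0])      # <x, inward> = -1 < 0
tangential = np.array([1.0, 0.0])   # <x, tangential> = 0
outward = np.array([0.0, 1.0])      # <x, outward> = 1 > 0

assert feasible(inward)          # strictly inward: feasible
assert not feasible(tangential)  # tangential: not feasible, yet in T_C(x)
assert not feasible(outward)     # outward: neither feasible nor tangent
```

The failure of the tangential direction illustrates why the closure in $T_C(x) = \operatorname{cl} F_C(x)$ is genuinely needed for the ball.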

Example 10.13 (Tangent cone of a closed half space)

Consider the set $C \subseteq V$ given by

$$C = \{ x \mid \langle x, a \rangle \leq b \}.$$

There are two possible cases for the points in $C$:

  1. $x \in \operatorname{int} C$.

  2. $x \in \operatorname{bd} C$.

(1) Interior points

  1. Let $x \in \operatorname{int} C$.

  2. Then it is easy to see that every direction is a feasible direction along which we can find another point in $C$.

  3. Hence $F_C(x) = V$.

  4. Then $T_C(x) = \operatorname{cl} F_C(x) = V$.

(2) Boundary points

  1. Let $x \in \operatorname{bd} C$.

  2. Then we have $\langle x, a \rangle = b$.

  3. Let $d \in F_C(x)$.

  4. This is equivalent to

     $$\langle x + t d, a \rangle \leq b \quad \forall t \in (0, \overline{t}]$$

     for some $\overline{t} > 0$.

  5. This is equivalent to $\langle d, a \rangle \leq 0$.

  6. Hence

     $$F_C(x) = \{ d \mid \langle d, a \rangle \leq 0 \}.$$

  7. This is also a closed half-space.

  8. Hence $T_C(x) = \operatorname{cl} F_C(x) = F_C(x)$.

  9. We can see that $T_C(x)$ is the closed half-space corresponding to the linear subspace parallel to the hyperplane

     $$H = \{ x \mid \langle x, a \rangle = b \}$$

     given by

     $$L = \{ x \mid \langle x, a \rangle = 0 \}.$$

  10. In other words,

      $$T_C(x) = C - z$$

      where $z \in H$.
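Unlike the ball, the half-space needs no closure step: $F_C(x)$ is already closed at a boundary point. A small numerical sketch (illustrative $a$, $b$, and $x$, assuming NumPy):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = 3.0
x = np.array([1.0, 1.0])  # boundary point: <x, a> = 3 = b

def feasible(d, t_bar=1e-4, samples=100):
    """d is a feasible direction at x iff <x + t d, a> <= b for all small t > 0."""
    ts = np.linspace(t_bar / samples, t_bar, samples)
    return all((x + t * d) @ a <= b for t in ts)

# At a boundary point, d is feasible iff <d, a> <= 0 -- and here
# F_C(x) is closed, so it already coincides with T_C(x).
assert feasible(np.array([-1.0, 0.0]))     # <d, a> = -1 < 0
assert feasible(np.array([2.0, -1.0]))     # <d, a> = 0: moves along the hyperplane
assert not feasible(np.array([1.0, 0.0]))  # <d, a> = 1 > 0
```

The middle case shows that directions along the boundary hyperplane are themselves feasible, which is exactly why no limit points need to be added.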

10.7.4. Normal Cones#

[9] introduces a more general notion of normal directions. For our purposes, we shall call it BNO normal directions.

Definition 10.26 (BNO normal direction)

Given a nonempty subset $C$ of $V$ and a vector $x \in C$, a vector $z \in V$ is said to be a normal of $C$ at $x$ (in the sense of [9]) if there exist sequences $\{ x_k \}$ of $C$ and $\{ z_k \}$ such that

$$x_k \to x, \quad z_k \to z, \quad z_k \in T_C(x_k)^{\circ} \quad \forall k.$$

This normal direction shall be called a BNO normal direction in these notes to distinguish it from Definition 9.35.

Each $z_k$ is chosen from the polar cone of the tangent cone at $x_k$.

Definition 10.27 (BNO normal cone)

Given a nonempty subset $C$ of $V$ and a vector $x \in C$, the set of all normal directions at $x$ (in the sense of [9]) is called the normal cone of $C$ at $x$ and is denoted by $\tilde{N}_C(x)$.

This normal cone shall be called a BNO normal cone in these notes to distinguish it from Definition 9.36.

The BNO normal cone reduces to the definition of normal cones in Definition 9.36 for convex sets. The BNO normal cone is useful in the analysis of optimality conditions for minimization of smooth functions with smooth equality and inequality constraints.

10.7.4.1. Conic structure#

Theorem 10.67 (Conic structure)

Let $C$ be a nonempty set and $x \in C$. Then the BNO normal cone $\tilde{N}_C(x)$ is a cone.

Proof. First we show that $0$ is a BNO normal direction.

  1. Pick any sequence $\{ x_k \}$ such that $x_k = x$.

  2. Then $0 \in T_C(x_k)^{\circ}$ since $T_C(x_k)^{\circ}$ is a cone.

  3. Hence we can pick $z_k = 0$ to form a sequence $\{ z_k \}$ such that $z_k \to 0$ and $z_k \in T_C(x_k)^{\circ}$ for every $k$.

  4. Hence $0$ is a BNO normal direction.

Next we show the conic property.

  1. Let $z$ be a BNO normal direction and $t > 0$.

  2. There exist a sequence $\{ x_k \}$ of $C$ and a sequence $\{ z_k \}$ with $z_k \in T_C(x_k)^{\circ}$ for every $k$ such that $x_k \to x$ and $z_k \to z$.

  3. Then $t z_k \in T_C(x_k)^{\circ}$ for every $k$ since $T_C(x_k)^{\circ}$ is a cone.

  4. Also $t z_k \to t z$.

  5. Hence $t z$ is a BNO normal direction.

Hence $\tilde{N}_C(x)$ is a cone.

10.7.4.2. Polar Cone of Tangent Cone#

Theorem 10.68 (Polar cone of tangent cone and BNO normal cone)

Let $C$ be a nonempty set and $x \in C$. Then

$$T_C(x)^{\circ} \subseteq \tilde{N}_C(x).$$

Proof. This is straightforward from the definition of BNO normal directions.

  1. Let $z \in T_C(x)^{\circ}$.

  2. Consider the sequence $\{ x_k \}$ where $x_k = x$.

  3. Hence $x_k \to x$.

  4. Then $z \in T_C(x_k)^{\circ}$ for every $k$.

  5. Consider the sequence $\{ z_k \}$ where $z_k = z$.

  6. Then $z_k \to z$.

  7. Hence $z \in \tilde{N}_C(x)$.

  8. Hence $T_C(x)^{\circ} \subseteq \tilde{N}_C(x)$.

10.7.4.3. Closedness#

Theorem 10.69 (Closedness of BNO normal cone)

Let $C$ be a nonempty set and $x \in C$. Then the BNO normal cone at $x$, $\tilde{N}_C(x)$, is closed.

Proof. Consider a convergent sequence $\{ z_l \}$ of $\tilde{N}_C(x)$.

  1. Let $z_l \to z$. We need to show that $z \in \tilde{N}_C(x)$.

  2. For each $z_l$, there exist sequences $\{ x_l^m \}$ of $C$ and $\{ z_l^m \}$ where $z_l^m \in T_C(x_l^m)^{\circ}$ such that $x_l^m \to x$ and $z_l^m \to z_l$.

  3. For each $k$:

    1. Let $r = \frac{1}{k}$.

    2. Pick a $z_{l_0}$ from $\{ z_l \}$ such that $\| z_{l_0} - z \| < r$.

    3. Identify the index $m_0$ such that for all $m > m_0$, we have $\| x_{l_0}^m - x \| < r$.

    4. Pick a $z_{l_0}^{m_1}$ from $\{ z_{l_0}^m \}$ such that $m_1 > m_0$ and $\| z_{l_0}^{m_1} - z_{l_0} \| < r$.

    5. Let $x_k = x_{l_0}^{m_1}$.

    6. Let $z_k = z_{l_0}^{m_1}$.

  4. Then we have $\| x_k - x \| < r = \frac{1}{k}$ for every $k$.

  5. Also, by the triangle inequality, $\| z_k - z \| < 2 r = \frac{2}{k}$ for every $k$.

  6. $z_k \in T_C(x_k)^{\circ}$ for every $k$.

  7. $x_k \to x$.

  8. $z_k \to z$.

  9. Hence $z \in \tilde{N}_C(x)$.

Hence, every convergent sequence of $\tilde{N}_C(x)$ has its limit in $\tilde{N}_C(x)$. Hence $\tilde{N}_C(x)$ is closed.

10.7.4.4. Regularity#

Theorem 10.70 (Regular set)

Let $C$ be a nonempty set and $x \in C$. Then $C$ is said to be regular at $x$ (in the sense of [9]) if

$$T_C(x)^{\circ} = \tilde{N}_C(x).$$

In other words, a set is regular at a point if the (BNO) normal cone equals the polar cone of the tangent cone.

Theorem 10.71 (Regularity of convex sets)

Let $C$ be a nonempty convex set. Then $C$ is regular at every $x \in C$. Moreover,

$$T_C(x)^{\circ} = \tilde{N}_C(x) = N_C(x).$$

Proof. We show the regularity of $C$ at $x$ by showing that $\tilde{N}_C(x) \subseteq T_C(x)^{\circ}$.

  1. Let $z \in \tilde{N}_C(x)$.

  2. Then there exist sequences $\{ x_k \}$ of $C$ and $\{ z_k \}$ with $z_k \in T_C(x_k)^{\circ}$ for every $k$ such that $x_k \to x$ and $z_k \to z$.

  3. Since $C$ is convex, hence, due to Theorem 10.66, we have, for every $k$,

     $$\langle y - x_k, z_k \rangle \leq 0 \quad \forall y \in C.$$

  4. By taking the limit $k \to \infty$, we obtain

     $$\langle y - x, z \rangle \leq 0 \quad \forall y \in C.$$

  5. Hence, again due to Theorem 10.66, $z \in T_C(x)^{\circ}$.

  6. Hence $\tilde{N}_C(x) \subseteq T_C(x)^{\circ}$.

Combining with Theorem 10.68, we have

$$T_C(x)^{\circ} = \tilde{N}_C(x).$$

We also saw in Theorem 10.66 that $T_C(x)^{\circ} = N_C(x)$.

This result shows that the definition of normal cone in Definition 9.36 agrees with the definition in Definition 10.27 for convex sets.

10.7.5. Optimality Conditions#

10.7.5.1. Minimization on an Arbitrary Constraint Set#

When the constraint set is nonconvex, then the tangent cone can be used as a suitable approximation to the constraint set at the point of local optimality. The tangent cone provides additional structure of being a cone and being closed.

Theorem 10.72 (Tangent cone and local optimality)

Let $f : V \to (-\infty, \infty]$ be a function with $S = \operatorname{dom} f$. Let $a$ be a local minimum of $f$ over a subset $C$ of $S$. Assume that $f$ is continuously differentiable over an open set $O \subseteq S$ containing the local minimum $a$. Then

$$\langle y, \nabla f(a) \rangle \geq 0 \quad \forall y \in T_C(a).$$

Equivalently, we must have

$$-\nabla f(a) \in T_C(a)^{\circ};$$

i.e., the negative of the gradient at the local minimum belongs to the polar cone of the tangent cone, or the gradient at the local minimum belongs to the dual cone of the tangent cone.

This necessary condition means that the function does not decrease along any tangent direction in a local neighborhood of $a$.

For a smooth $f : V \to \mathbb{R}$, the condition simplifies as follows. Let $a$ be a local minimum of $f$ over a subset $C$ of $V$. Then

$$\langle y, \nabla f(a) \rangle \geq 0 \quad \forall y \in T_C(a).$$

Proof. For the zero tangent direction at $a$, the condition is met trivially. Let $t$ be a nonzero tangent direction at $a$.

  1. Then there exist a sequence $\{ x_k \}$ of $C$ and a sequence $\{ r_k \}$ of $V$ such that $x_k \neq a$ for every $k$, $r_k \to 0$, $x_k \to a$ and

     $$\frac{x_k - a}{\| x_k - a \|} = \frac{t}{\| t \|} + r_k.$$

     The term $r_k$ is the residual between the $k$-th normalized direction and the normalized tangent direction.

  2. WLOG assume that $x_k \in O$ for every $k$. Otherwise drop finitely many terms from $\{ x_k \}$ to achieve this.

  3. For every $k$, we have

     $$x_k - a = \frac{\| x_k - a \|}{\| t \|} (t + \| t \| r_k).$$

  4. Since $x_k \in O$ for every $k$, hence $f$ is continuously differentiable at every $x_k$ and at every point on the line segment $[a, x_k]$.

  5. By the mean value theorem, we have

     $$f(x_k) = f(a) + \langle x_k - a, \nabla f(\tilde{x}_k) \rangle$$

     where $\tilde{x}_k$ lies on the line segment $[a, x_k]$.

  6. Since $x_k \to a$, hence $\tilde{x}_k \to a$ holds too.

  7. Substituting for $x_k - a$, we get

     $$f(x_k) = f(a) + \frac{\| x_k - a \|}{\| t \|} \langle t_k, \nabla f(\tilde{x}_k) \rangle$$

     where $t_k = t + \| t \| r_k$.

  8. Note that $t_k \to t$ since $r_k \to 0$.

  9. Suppose for contradiction that $\langle t, \nabla f(a) \rangle < 0$.

  10. Since $\tilde{x}_k \to a$ and $t_k \to t$, hence it follows that for all sufficiently large $k$, we have $\langle t_k, \nabla f(\tilde{x}_k) \rangle < 0$.

  11. Hence, from the previous relation, we have $f(x_k) < f(a)$ for all sufficiently large $k$.

  12. This contradicts the local optimality of $a$.

  13. Hence we must have $\langle t, \nabla f(a) \rangle \geq 0$.

An equivalent statement is that

$$\langle y, -\nabla f(a) \rangle \leq 0 \quad \forall y \in T_C(a).$$

This means that $-\nabla f(a) \in T_C(a)^{\circ}$. Since the polar cone is the negative of the dual cone, hence $\nabla f(a) \in T_C(a)^*$.

Observation 10.9 (Local minimum and descent directions)

Recall that a direction $d$ satisfying $\langle d, \nabla f(a) \rangle < 0$ is called a descent direction at $a$. Recall from Theorem 5.1 that

$$f(x) = f(a) + \langle x - a, \nabla f(a) \rangle + o(\| x - a \|).$$

Thus for $x = a + t d$ where $t > 0$, we have

$$f(x) = f(a) + t \langle d, \nabla f(a) \rangle + o(t \| d \|).$$

Hence for sufficiently small $t > 0$, we have $f(x) < f(a)$.

Thus, Theorem 10.72 says that if $a$ is a local minimum of $f$ over $C$, then there is no descent direction within the tangent cone $T_C(a)$.
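This "no descent direction in the tangent cone" statement can be sampled numerically. A minimal sketch with an illustrative instance (not from the text): minimize $f(x) = \| x - p \|^2$ over the unit ball with $p$ outside the ball; the minimizer is $a = p / \| p \|$, and at this boundary point $T_C(a) = \{ d \mid \langle a, d \rangle \leq 0 \}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative instance: minimize f(x) = ||x - p||^2 over the unit ball,
# with p outside the ball.  The minimizer is a = p / ||p||.
p = np.array([2.0, 1.0])
a = p / np.linalg.norm(p)
grad_a = 2.0 * (a - p)  # gradient of f at a

# T_C(a) = {d : <a, d> <= 0} for the unit ball at the boundary point a.
# Theorem 10.72: <d, grad f(a)> >= 0 for every tangent direction d,
# i.e. no tangent direction is a descent direction at a.
ds = rng.normal(size=(5000, 2))
tangent = ds[ds @ a <= 0.0]  # keep only sampled tangent directions
assert np.all(tangent @ grad_a >= -1e-12)
```

Here $\nabla f(a)$ is a negative multiple of $a$, so the inequality holds exactly for every direction with $\langle a, d \rangle \leq 0$; the sampling merely illustrates it.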

10.7.5.2. Minimization on a Convex Constraint Set#

Theorem 10.73 (Local optimality with a convex constraint set)

Let $f : V \to (-\infty, \infty]$ be a function with $S = \operatorname{dom} f$. Let $a$ be a local minimum of $f$ over a convex subset $C$ of $S$. Assume that $f$ is continuously differentiable over an open set $O \subseteq S$ containing the local minimum $a$. Then

$$\langle x - a, \nabla f(a) \rangle \geq 0 \quad \forall x \in C.$$

If $f$ is real valued (with $\operatorname{dom} f = V$) and $C = V$, this reduces to $\nabla f(a) = 0$.

This result is similar to Theorem 10.47; however, it is applicable to all functions that are continuously differentiable in a neighborhood of their local minimum.

Proof. Since $C$ is convex, hence by Theorem 10.64, $\operatorname{cl} F_C(x) = T_C(x)$ for every $x \in C$.

  1. By Theorem 10.72, we have

     $$\langle y, \nabla f(a) \rangle \geq 0 \quad \forall y \in T_C(a).$$

  2. In other words,

     $$\langle y, \nabla f(a) \rangle \geq 0 \quad \forall y \in \operatorname{cl} F_C(a).$$

  3. For any $x \in C$, we have $x - a \in F_C(a)$ since $C$ is convex.

  4. Hence

     $$\langle x - a, \nabla f(a) \rangle \geq 0 \quad \forall x \in C.$$

The special case of $C = V$ is as follows.

  1. Choose an orthogonal basis $e_1, \dots, e_n$ for $V$.

  2. By picking $x = a + e_i$ and $x = a - e_i$, we see that

     $$\langle e_i, \nabla f(a) \rangle = 0 \quad \forall i.$$

  3. Hence $\nabla f(a) = 0$.

10.7.5.3. Convex Functions on a Convex Set#

Recall from Fermat’s optimality condition (Theorem 9.233) that for the unconstrained minimization problem, a is a minimizer of f if and only if 0 is a subgradient of f at a.

A similar result can be obtained for the constrained optimization problem in terms of the subgradients of f and the normal cone of C.

Theorem 10.74 (Optimality conditions for convex constrained optimization)

Let f:V(,] be a proper and convex function. Let CV be a convex set for which ridomfriC. Then, xC is an optimal solution of (10.15) if and only if

 there exists gf(x) for which gNC(x).

Equivalently, x minimizes f over C if and only if

0f(x)+TC(x).

Proof. The problem in (10.15) can be rephrased as

$$\min_{x \in V} f(x) + I_C(x)$$

where $I_C$ is the indicator function for the set $C$.

  1. Since $\operatorname{ri} \operatorname{dom} f \cap \operatorname{ri} C \neq \emptyset$, hence due to the sum rule of subdifferential calculus (Corollary 9.23):

     $$\partial (f + I_C)(x) = \partial f(x) + \partial I_C(x).$$

  2. Recall from Theorem 9.235 that

     $$\partial I_C(x) = N_C(x).$$

  3. Hence for any $x \in V$

     $$\partial (f + I_C)(x) = \partial f(x) + N_C(x).$$

  4. Invoking Fermat's optimality condition (Theorem 9.233), $x^* \in C$ is an optimal solution of (10.15) if and only if

     $$0 \in \partial f(x^*) + N_C(x^*).$$

  5. In other words, there exist $g \in \partial f(x^*)$ and $h \in N_C(x^*)$ such that $g + h = 0$.

  6. This is equivalent to saying that there exists a vector $g \in \partial f(x^*)$ such that $-g \in N_C(x^*)$.

  7. This is the same as the condition

     $$(-\partial f(x^*)) \cap N_C(x^*) \neq \emptyset.$$

For the second part:

  1. By Theorem 10.66, we have $N_C(x^*) = T_C(x^*)^{\circ}$ since $C$ is convex.

  2. By the earlier argument, we have $g \in \partial f(x^*)$ and $h \in N_C(x^*)$ such that

     $$g + h = 0.$$

  3. This is equivalent to saying that

     $$0 \in \partial f(x^*) + T_C(x^*)^{\circ}.$$

By using the definition of the normal cone, we can provide an alternative specification of the necessary and sufficient optimality condition in a more explicit manner.

Corollary 10.4 (Optimality conditions for convex constrained optimization second version)

Let $f : V \to (-\infty, \infty]$ be a proper and convex function. Let $C \subseteq V$ be a convex set for which $\operatorname{ri} \operatorname{dom} f \cap \operatorname{ri} C \neq \emptyset$. Then, $x^* \in C$ is an optimal solution of (10.15) if and only if

$$\text{there exists } g \in \partial f(x^*) \text{ for which } \langle x - x^*, g \rangle \geq 0 \quad \forall x \in C.$$

Following is the refinement of Theorem 10.73 for functions that are continuously differentiable at the global minimum.

Corollary 10.5 (Minimization of a convex and smooth function on a convex set)

Let $f : V \to (-\infty, \infty]$ be a proper and convex function. Let $C \subseteq V$ be a convex set for which $\operatorname{ri} \operatorname{dom} f \cap \operatorname{ri} C \neq \emptyset$. Assume that $f$ is continuously differentiable at a point $x^* \in C$.

Then, $x^*$ is an optimal solution of (10.15) if and only if

$$\langle x - x^*, \nabla f(x^*) \rangle \geq 0 \quad \forall x \in C.$$
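This variational inequality is easy to probe numerically. A minimal sketch with an illustrative instance (not from the text): minimize $f(x) = (x_1 - 2)^2 + (x_2 + 1)^2$ over the box $C = [0, 1]^2$; the minimizer is the projection of $(2, -1)$ onto the box, namely $x^* = (1, 0)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative instance: minimize the smooth convex function
#   f(x) = (x1 - 2)^2 + (x2 + 1)^2
# over the box C = [0, 1]^2.  The minimizer is x_star = (1, 0).
def grad_f(v):
    return np.array([2.0 * (v[0] - 2.0), 2.0 * (v[1] + 1.0)])

x_star = np.array([1.0, 0.0])

# Corollary 10.5: <x - x_star, grad f(x_star)> >= 0 for every x in C.
xs = rng.uniform(0.0, 1.0, size=(5000, 2))
assert np.all((xs - x_star) @ grad_f(x_star) >= -1e-12)

# The condition fails at a non-optimal feasible point, e.g. y = (0, 1):
y = np.array([0.0, 1.0])
assert np.any((xs - y) @ grad_f(y) < 0)
```

At $x^*$ the inequality holds exactly: $\nabla f(x^*) = (-2, 2)$, and $x_1 \leq 1$, $x_2 \geq 0$ force both terms of the inner product to be nonnegative.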

10.7.5.4. Sum of Smooth and Convex Functions on an Arbitrary Constraint Set#

Theorem 10.75 (Minimization of sum of smooth and convex function)

Let $a$ be a local minimum of a function $f : V \to \mathbb{R}$ over a set $C$. Assume that the tangent cone $T_C(a)$ is convex. Assume that $f$ has the form

$$f(x) = f_1(x) + f_2(x)$$

where $f_1 : V \to \mathbb{R}$ is convex and $f_2 : V \to \mathbb{R}$ is smooth. Then

$$-\nabla f_2(a) \in \partial f_1(a) + T_C(a)^{\circ}.$$

10.7.5.5. Optimization Over Unit Simplex#

In this section, we consider the problem of minimizing a function over the unit simplex $\Delta_n$.

Recall from Definition 9.17 that the unit simplex is given by

$$\Delta_n = \{ x \in \mathbb{R}^n \mid \langle x, \mathbf{1} \rangle = 1, x \geq 0 \}.$$

In other words, for every $x \in \Delta_n$, every component is nonnegative and the components sum to $1$.

Theorem 10.76 (Optimality conditions over unit simplex)

Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be a proper and convex function.

Consider the optimization problem

$$
\tag{10.16}
\begin{array}{ll}
\text{minimize} & f(x) \\
\text{subject to} & x \in \Delta_n.
\end{array}
$$

Assume that $\operatorname{ri} \operatorname{dom} f \cap \operatorname{ri} \Delta_n \neq \emptyset$. Then, $x^* \in \Delta_n$ is an optimal solution if and only if there exist $g \in \partial f(x^*)$ and $\mu \in \mathbb{R}$ such that

$$
\tag{10.17}
g_i \begin{cases}
= \mu, & x_i^* > 0; \\
\geq \mu, & x_i^* = 0.
\end{cases}
$$

Proof. Assume that for some $x^* \in \Delta_n$, there exist $g \in \partial f(x^*)$ and $\mu$ satisfying (10.17).

  1. Note that $\sum_{i=1}^n x_i^* = 1$ simplifies to $\sum_{i \mid x_i^* > 0} x_i^* = 1$.

  2. Now, for any $x \in \Delta_n$,

     $$
     \begin{aligned}
     g^T (x - x^*) &= \sum_{i=1}^n g_i (x_i - x_i^*) = \sum_{i \mid x_i^* > 0} g_i (x_i - x_i^*) + \sum_{i \mid x_i^* = 0} g_i x_i \\
     &\geq \sum_{i \mid x_i^* > 0} \mu (x_i - x_i^*) + \sum_{i \mid x_i^* = 0} \mu x_i \\
     &= \mu \sum_{i=1}^n x_i - \mu \sum_{i \mid x_i^* > 0} x_i^* = \mu \cdot 1 - \mu \cdot 1 = 0.
     \end{aligned}
     $$

  3. Thus,

     $$g^T (x - x^*) \geq 0 \quad \forall x \in \Delta_n.$$

  4. Thus, due to Corollary 10.4, $x^*$ is indeed an optimal point.

For the converse, assume that $x^*$ is an optimal point.

  1. Then, due to Corollary 10.4, there exists $g \in \partial f(x^*)$ such that $g^T (x - x^*) \geq 0$ for every $x \in \Delta_n$.

  2. We need to find a value of $\mu$ such that $g$ satisfies (10.17).

  3. We consider the following three cases.

    1. $x^* = 0$.

    2. Only one entry in $x^*$ is nonzero and the others are zero.

    3. Two or more entries in $x^*$ are nonzero.

  4. If $x^* = 0$, then we can choose $\mu = \min \{ g_1, \dots, g_n \}$ satisfying (10.17).

  5. Suppose that only the $i$-th entry of $x^*$ is nonzero and the remaining entries are zero.

    1. Since $x^* \in \Delta_n$, hence $\sum_{j=1}^n x_j^* = 1$ implies that $x_i^* = 1$.

    2. Let $\mu = g_i$.

    3. Pick any $j \in \{ 1, \dots, n \}$ such that $j \neq i$.

    4. Let $x = e_j$; i.e., the unit vector whose $j$-th component is $1$ and other components are zero.

    5. Then,

       $$g^T (x - x^*) \geq 0 \implies g^T x \geq g^T x^* \implies g_j \geq g_i = \mu.$$

    6. Thus, with $\mu = g_i$, (10.17) is satisfied.

  6. For the third case, let $i$ and $j$ denote two different indices for which $x_i^* > 0$ and $x_j^* > 0$.

    1. Pick a vector $x \in \Delta_n$ satisfying

       $$
       x_k = \begin{cases}
       x_k^*, & k \notin \{ i, j \}; \\
       x_i^* - \frac{x_i^*}{2}, & k = i; \\
       x_j^* + \frac{x_i^*}{2}, & k = j.
       \end{cases}
       $$

       We can verify that $x \in \Delta_n$.

    2. Then, the inequality $g^T (x - x^*) \geq 0$ simplifies to

       $$-\frac{x_i^*}{2} g_i + \frac{x_i^*}{2} g_j \geq 0.$$

    3. Since $x_i^* > 0$, hence this simplifies to $g_i \leq g_j$.

    4. Switching the roles of $i$ and $j$ and following an identical argument, we get $g_i \geq g_j$.

    5. This means that $g_i = g_j$ must hold true.

    6. The argument is independent of the choice of $i, j$ for which $x_i^*, x_j^* > 0$.

    7. Thus $g_i = g_j$ holds true for every $i, j$ such that $x_i^*, x_j^* > 0$.

    8. This implies that all the components of $g$ corresponding to the positive components of $x^*$ must have the same value.

    9. Let this common value be $\mu$.

    10. In other words, $g_i = \mu$ for all $i = 1, \dots, n$ such that $x_i^* > 0$.

    11. Now consider some $k \in \{ 1, \dots, n \}$ such that $x_k^* = 0$.

    12. Let $x = e_k$.

    13. Note that $\sum_{i=1}^n x_i^* = 1$ simplifies to $\sum_{i \mid x_i^* > 0} x_i^* = 1$.

    14. Then,

       $$
       g^T (x - x^*) \geq 0
       \iff g_k \geq \sum_{i \mid x_i^* > 0} g_i x_i^*
       \iff g_k \geq \mu \sum_{i \mid x_i^* > 0} x_i^*
       \iff g_k \geq \mu.
       $$

    15. Thus, $g_i \geq \mu$ for all $i = 1, \dots, n$ such that $x_i^* = 0$.

    16. Thus, we have established that $g$ indeed satisfies (10.17).

Example 10.14 (Optimization over the unit simplex)

Consider the function $f : \mathbb{R}^n \to (-\infty, \infty]$ given by

$$
f(x) = \begin{cases}
\sum_{i=1}^n x_i \ln x_i - \sum_{i=1}^n y_i x_i, & x \geq 0; \\
\infty, & \text{otherwise}.
\end{cases}
$$

The vector $y = (y_1, \dots, y_n) \in \mathbb{R}^n$ is fixed. Consider the optimization problem

$$\min \{ f(x) \mid x \in \Delta_n \}.$$

  1. $f$ is a proper convex function.

  2. We can see that $\operatorname{dom} f = \mathbb{R}^n_+$.

  3. $\operatorname{int} \operatorname{dom} f = \operatorname{ri} \operatorname{dom} f = \mathbb{R}^n_{++}$.

  4. It is easy to see that $\operatorname{ri} \operatorname{dom} f \cap \operatorname{ri} \Delta_n \neq \emptyset$.

  5. $f$ is differentiable at every $x \in \operatorname{int} \operatorname{dom} f$.

  6. The partial derivatives of $f$ are given by

     $$\frac{\partial f}{\partial x_i}(x) = 1 + \ln x_i - y_i.$$

  7. Thus, $\partial f(x) = \{ \nabla f(x) \}$ at every $x \succ 0$.

  8. Assume that there is an optimal solution satisfying $x^* \succ 0$.

  9. By Theorem 10.76, there exist $g \in \partial f(x^*)$ and $\mu \in \mathbb{R}$ satisfying (10.17).

  10. Since $x^* \succ 0$, (10.17) simplifies to $g_i = \mu$ for all $i = 1, \dots, n$.

  11. Since $f$ is differentiable, hence the only subgradient is $\nabla f(x^*)$.

  12. Thus, there exists $\mu \in \mathbb{R}$ such that

     $$\frac{\partial f}{\partial x_i}(x^*) = \mu \quad \forall i.$$

  13. This implies

     $$1 + \ln x_i^* - y_i = \mu.$$

  14. Therefore, for every $i$,

     $$x_i^* = e^{\mu - 1 + y_i} = \alpha e^{y_i}$$

     where $\alpha = e^{\mu - 1}$.

  15. Since $x^* \in \Delta_n$, hence $\sum_{i=1}^n x_i^* = 1$.

  16. Thus,

     $$\sum_{i=1}^n \alpha e^{y_i} = 1 \implies \alpha = \frac{1}{\sum_{i=1}^n e^{y_i}}.$$

  17. Therefore

     $$x_i^* = \frac{e^{y_i}}{\sum_{j=1}^n e^{y_j}} \quad \forall i = 1, \dots, n.$$

  18. We note that $x^*$ obtained above satisfies all the conditions in Theorem 10.76.

  19. Since these conditions are also sufficient, hence $x^*$ is indeed the optimal solution for this minimization problem.

  20. Let us compute the optimal value of the minimization problem.

  21. We note that

     $$\ln x_i^* = \mu - 1 + y_i.$$

  22. Hence

     $$
     \begin{aligned}
     f(x^*) &= \sum_{i=1}^n x_i^* \ln x_i^* - \sum_{i=1}^n y_i x_i^*
     = \sum_{i=1}^n x_i^* (\mu - 1 + y_i) - \sum_{i=1}^n y_i x_i^* \\
     &= \sum_{i=1}^n x_i^* (\mu - 1) = (\mu - 1) \sum_{i=1}^n x_i^* = \mu - 1.
     \end{aligned}
     $$

  23. In terms of $y$,

     $$f(x^*) = \mu - 1 = \ln \alpha = -\ln \left( \sum_{i=1}^n e^{y_i} \right).$$

  24. The optimal value is the negative of the log-sum-exp of $y$.

This optimization problem is used in the computation of the conjugate function for the negative entropy function. See Theorem 9.257.
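The closed-form solution derived above (the softmax of $y$ attains the negative log-sum-exp value) can be verified numerically. A minimal sketch, with an illustrative vector $y$ of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(3)

y = np.array([0.5, -1.0, 2.0, 0.0])  # illustrative fixed vector

def f(x):
    # Objective of Example 10.14 (negative entropy minus linear term), for x > 0.
    return np.sum(x * np.log(x)) - y @ x

# Closed-form minimizer derived above: the softmax of y.
x_star = np.exp(y) / np.sum(np.exp(y))
opt_val = -np.log(np.sum(np.exp(y)))  # negative log-sum-exp of y

assert np.isclose(f(x_star), opt_val)

# Random points of the unit simplex never beat the softmax point.
for _ in range(1000):
    z = rng.exponential(size=y.size)
    z /= z.sum()  # z lies in the (relative interior of the) unit simplex
    assert f(z) >= opt_val - 1e-12
```

The random probes only spot-check optimality; the theorem's conditions are what certify it.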