Constrained Optimization II
Contents
10.7. Constrained Optimization II#
In this section, we present several results for the general problem of optimizing a cost function over a constraint set.
Throughout this section, we assume that
Main references for this section are [9].
10.7.1. General Constrained Optimization#
Let
We are concerned with the optimization problems of the form
We provide optimality conditions for three specific type of problems.
A continuously differentiable function
over an arbitrary constraint set .A convex, possibly nonsmooth, cost function
and a convex constraint set .A cost function consisting of the sum of a smooth (continuously differentiable) function and a convex function and an arbitrary constraint set.
For the development of the results in this section, we will need the supporting machinery of the following notions:
Feasible directions and the cone of feasible directions
Tangent directions and tangent cone
Normal directions and normal cone
For a nonconvex optimization problem, the optimality conditions provide the necessary conditions for a feasible point to be a local optimal solution.
For convex problems, the optimality conditions provide the necessary and sufficient conditions for a feasible point to be a global optimal solution.
10.7.2. Feasible Directions#
At any point in the constraint set
Definition 10.23 (Feasible direction)
Consider the optimization problem (10.15).
Given a vector
The set of all feasible directions of
Observation 10.8 (The cone of feasible directions)
The set of feasible directions at
We can see that
.Also, if
then for any , also holds true.
10.7.2.1. Convexity#
Theorem 10.60 (Convexity of the cone of feasible directions)
Let
Proof. We are given that
For any point
, the line segment .Hence the set
consists of all the vectors of the form where and .Let
and .Then there exist
and such thatThen
Let
.Then
and .Hence
due to convexity of .Hence
.Hence
is a feasible direction.Hence
is convex.
10.7.3. Tangent Cones#
10.7.3.1. Tangent Direction#
Definition 10.24 (Tangent direction)
Let
For a nonzero direction
, the term is a normalized direction.Since
for every , hence for every .The term
is also a normalized direction.Hence the sequence
is a sequence of normalized directions.Thus a nonzero direction
is a tangent at if it is possible to approach with a feasible sequence such that the normalized direction sequence converges to .
10.7.3.2. Tangent Cone#
Theorem 10.61 (The cone of tangent directions)
Let
Proof. We can see that by definition
Let
be a tangent direction.Let
.Then
for some sequence
converging to with for every .Hence
is also a tangent direction.Hence the set of tangent directions is a cone.
Definition 10.25 (Tangent cone)
Let
10.7.3.3. Characterization#
Theorem 10.62 (Characterization of tangent directions)
Let
Proof. Let
If
, then we can take for every and for every .Then
and .Now consider the case where
.By definition of tangent, there exists a sequence
of with , and .Let
.Clearly
and since .Also
Since by definition of tangent
hence
as desired.
Conversely, suppose
If
then it is a tangent.Now consider the case where
Since
and hence we must have .It is also possible to choose a subsequence of
such that for every . Otherwise, but we are given that .WLOG assume that
for every .Then
Hence
must be a tangent direction.
10.7.3.4. Closedness#
Theorem 10.63 (Closedness of tangent cone)
Let
Proof. .
Let
be a sequence of such that .If
, then and we are done.Hence consider the case where
.WLOG, assume that
for every .By the definition of the tangent, for every
, there is a sequence of with such thatFor each
, choose and such that andFor all
, we haveTaking the limit of
, we haveHence
.
10.7.3.5. Feasible Directions and Tangent Cone#
Theorem 10.64 (Feasible cone and tangent cone)
Let
. .If
is convex, then
Proof. (1) Every feasible direction is a tangent direction.
Let
be a feasible direction.Then there exists a
such thatLet
for every .Then
for every and .Let
.Then
, for every and .Also
for every .Hence
.Hence
is a tangent direction due to Theorem 10.62.Hence
.
(2) Closure
By Theorem 10.63,
is closed.We have shown that
.The closure of a set is the smallest closed set containing it.
Hence
.
(3) Convexity and Closure
We are given that
is convex.By claim (2), we have
.It will suffice to show the reverse inclusion
.Let
.By Theorem 10.62, there exists a sequence
of with and a positive scalar sequence with such that .Since
is convex, hence the direction is a feasible direction of for every .Hence
is a converging sequence of feasible directions.Hence
.Hence
.
10.7.3.6. Convexity#
Theorem 10.65 (Convexity of the tangent cone)
Let
Proof. .
By Theorem 10.60,
is convex.By Theorem 10.64,
since
is convex.By Theorem 9.123, the closure of a convex set is convex.
Hence
is convex.
10.7.3.7. Polar Cone of Tangent Cone#
Recall from Definition 9.36
that the normal cone to a set
Theorem 10.66 (Characterization of the polar cone of the tangent cone for a convex set)
Let
In particular, we have
where
Proof. First consider that
We note that
.Since
is convex, hence for every .Hence
for every .Hence
for every from the definition of the polar cone of .
Now assume that some
For contradiction, assume that
.Then there exists some
such that .Since
, hence there exists a sequence of such that . is a feasible direction at for every .Due to convexity of
, for every where and .Since
, hence for sufficiently large , we should have .In other words, for sufficiently large
, we have .Since
, hence it reduces to .But this is a contradiction to the hypothesis.
Hence we must have
.
This characterization of polar cone of tangent cone implies that
Since
10.7.3.8. Examples#
Example 10.10 (Tangent cone for a linear system of equations)
Let
Since
Identify the set of all feasible directions at the point.
Compute the closure of the set of feasible directions at the point.
We note that
Note that the subspace
parallel to is given byLet
.Every direction in the subspace
is a feasible direction.Let
.Then
.Hence
is a feasible direction.
It is easy to see that there are no other feasible directions.
Hence
.Since
is finite dimensional, hence is closed.Since
, hence .
Example 10.11 (Tangent cone for an affine subspace)
Let
We note that
is closed and convex.Let
be the linear subspace parallel to .Pick some
.Then for any
, we have .Hence every direction in
is a feasible direction.It is also easy to see that no other direction is a feasible direction.
Hence
.Since
, hence .
The tangent cone at any point in an affine subspace is the subspace parallel to the affine subspace.
Example 10.12 (Tangent cone of a closed unit ball)
Consider the set
Since
Pick a point.
Identify the set of all feasible directions at the point.
Compute the closure of the set of feasible directions at the point.
There are two possible cases for the points in
. .
(1) Interior points
Let
.Then it is easy to see that every direction is a feasible direction along which we can find another point in
.Hence
.Then
.
(2) Boundary points
Let
.We can see that only those directions are feasible that point towards some point in
. Any direction pointing towards the exterior of is not a feasible direction.The directions which make an acute angle with
point away from the ball.The directions which make an obtuse angle with
point inside the ball.
We shall now formally prove this.
Since
, hence .Let
.Then we must have
for sufficiently small .Hence
if and only if there exists a such that for every we have .Expanding, we have
Since
, hence this reduces toThere are only two possibilities for this relation to hold true since
.Either
.Or
and .From the second case, we have
Hence
.From
, we have
Example 10.13 (Tangent cone of a closed half space)
Consider the set
There are two possible cases for the points in
. .
(1) Interior points
Let
.Then it is easy to see that every direction is a feasible direction along which we can find another point in
.Hence
.Then
.
(2) Boundary points
Let
.Then we have
.Let
.This is equivalent to
for some
.This is equivalent to
.Hence
This is also a closed half-space.
Hence
.We can see that
is the closed half-space corresponding to the linear subspace parallel to the hyperplanegiven by
In other words,
where
.
10.7.4. Normal Cones#
[9] introduces a more general notion of normal directions. For our purposes, we shall call it BNO normal directions.
Definition 10.26 (BNO normal direction)
Given a nonempty subset
This normal direction shall be called a BNO normal direction in these notes to distinguish it from Definition 9.35.
Each
Definition 10.27 (BNO normal cone)
Given a nonempty subset
This normal cone shall be called a BNO normal cone in these notes to distinguish it from Definition 9.36.
The BNO normal cone reduces to the definition of normal cones in Definition 9.36 for convex sets. The BNO normal cone is useful in the analysis of optimality conditions for minimization of smooth functions with smooth equality and inequality constraints.
10.7.4.1. Conic structure#
Theorem 10.67 (Conic structure)
Let
Proof.
Pick any sequence
such that .Then
since is a cone.Hence we can pick
to form a sequence such that and for every .Hence
is a BNO normal direction.
Conic property
Let
be a BNO normal direction and .There exists a sequence
of and a sequence such that for every with .Then
for every since is a cone.Also
.Hence
is a BNO normal direction.
Hence
10.7.4.2. Polar Cone of Tangent Cone#
Theorem 10.68 (Polar cone of tangent cone and BNO normal cone)
Let
Proof. This is straightforward from the definition of BNO normal directions.
Let
.Consider the sequence
where .Hence
.Then
for every .Consider the sequence
where .Then
.Hence
.Hence
.
10.7.4.3. Closedness#
Theorem 10.69 (Closedness of BNO normal cone)
Let
Proof. Consider a converging sequence
Let
. We need to show that .For each
, there exist sequences of and where such that and .For each
Let
.Pick a
from such that .Identify the index
such that for all , we have .Pick a
from such that and .Let
.Let
.
Then we have
for every .Also
for every . . . .Hence
.
Hence, every convergent sequence of
10.7.4.4. Regularity#
Theorem 10.70 (Regular set)
Let
In other words, a set is regular at a point if the (BNO) normal cone is equal to the polar cone of the tangent cone.
Theorem 10.71 (Regularity of convex sets)
Let
Proof. We show the regularity of
Let
.Then there exist sequences
of and with for every such that and .Since
is convex, hence, due to Theorem 10.66, we have, for every ,By taking limit
, we obtainHence, again due to Theorem 10.66,
.Hence
.
Combining with Theorem 10.68, we have
We also saw in Theorem 10.66
that
This result shows that the definition of normal cone in Definition 9.36 agrees with the definition in Definition 10.27 for convex sets.
10.7.5. Optimality Conditions#
10.7.5.1. Minimization on an Arbitrary Constraint Set#
When the constraint set is nonconvex, then the tangent cone can be used as a suitable approximation to the constraint set at the point of local optimality. The tangent cone provides additional structure of being a cone and being closed.
Theorem 10.72 (Tangent cone and local optimality)
Let
Equivalently, we must have
i.e., the negative of the gradient at the local minimum belongs to the polar cone of the tangent cone or the gradient at the local minimum belongs to the dual cone of the tangent cone.
This necessary condition means that the the value of function increases along every tangent direction in its local neighborhood.
For a smooth
Proof. For the zero tangent direction at
Then there exists a sequence
of and a sequence of such that for every , , andThe term
is the residual between the -th normalized direction and the normalized tangent direction.WLOG assume that
. Otherwise drop finitely many terms from to achieve this.For every
, we haveSince
for every , hence is continuously differentiable at every and at every point on the line segment .By mean value theorem, we have
where
lies on the line segment .Since
, hence holds too.Substituting for
, we getwhere
.Note that
since .Suppose for contradiction that
.Since
and , hence it follows that for all sufficiently large , we have .Hence, from the previous relation, we have
for all sufficiently large .Thus contradictions the local optimality of
.Hence we must have
.
An equivalent statement is that
This means that
Observation 10.9 (Local minimum and descent directions)
Recall that a direction
Thus for some
Hence for sufficiently small
Thus, Theorem 10.72 says that
if
10.7.5.2. Minimization on a Convex Constraint Set#
Theorem 10.73 (Local optimality with a convex constraint set)
Let
If
This result is similar to Theorem 10.47 however applicable to all functions that are continuously differentiable in the neighborhood of their local minimum.
Proof. Since
By Theorem 10.72, we have
In other words
For any
, we have since is convex.Hence
The special case of
Choose an orthogonal basis
for .By picking
and , we see thatHence
.
10.7.5.3. Convex Functions on a Convex Set#
Recall from Fermat’s optimality condition
(Theorem 9.233)
that for the unconstrained minimization problem,
A similar result can be obtained for the
constrained optimization problem in terms
of the subgradients of
Theorem 10.74 (Optimality conditions for convex constrained optimization)
Let
Equivalently,
Proof. The problem in (10.15) can be rephrased as
where
Since
, hence due to the sum rule of subdifferential calculus (Corollary 9.23):Recall from Theorem 9.235 that
Hence for any
Invoking Fermat’s optimality condition (Theorem 9.233),
is an optimal solution of (10.15) if and only ifIn other words, there exists
and such that .This is equivalent to saying that there exists a vector
such that .This is same as the condition
For the second part:
By Theorem 10.66, we have
since is convex.By the earlier argument, we have
and such thatThis is equivalent to saying that
By using the definition of the normal cone, we can provide an alternative specification of the necessary and sufficient optimality condition in a more explicit manner.
Corollary 10.4 (Optimality conditions for convex constrained optimization second version)
Let
Following is the refinement of Theorem 10.73 for continuously differentiable functions at the global minimum.
Corollary 10.5 (Minimization of convex and smooth function on a convex set)
Let
Then,
10.7.5.4. Sum of Smooth and Convex Functions on an Arbitrary Constraint Set#
Theorem 10.75 (Minimization of sum of smooth and convex function)
Let
where
10.7.5.5. Optimization Over Unit Simplex#
In this section, we consider the problem of minimizing a function
over the unit simplex
Recall from Definition 9.17 that the unit simplex is given by
In other words, for every
Theorem 10.76 (Optimality conditions over unit simplex)
Let
Consider the optimization problem
Assume that
Proof. Assume that for some
Note that
simplifies to .Now, for any
Thus,
Thus, due to Corollary 10.4,
is indeed an optimal point.
For the converse, assume that
Then, due to Corollary 10.4, there exists
such that .We need to find a value of
such that satisfies (10.17).We consider the following three cases.
.Only one entry in
is nonzero and others are zero.Two or more entries in
are nonzero.
If
then we can choose satisfying (10.17).Suppose that only the
-th entry of is nonzero and remaining entries are zero.Since
hence implies that .Let
.Pick any
such that .Let
; i.e., the unit vector whose -th component is 1 and other components are zero.Then,
Thus, with
, (10.17) is satisfied.
For the third case, let
and denote two different indices for which and .Pick a vector
satisfyingWe can verify that
.Then, the inequality
simplifies toSince
, hence this simplifies to .Switching the role of
and and following an identical argument, we get .This means that
must hold true.The argument is independent of the choice of
for which .Thus
holds true for every such that ..This implies that all the components of
corresponding to the positive components of must have the same value.Let this common value be
.In other words,
for all such that .Now consider some
such that .Let
.Note that
simplifies to .Then,
Thus,
for all such that .Thus, we have established that
indeed satisfies (10.17).
Example 10.14 (Optimization over the unit simplex)
Consider the function
The vector
is a proper convex function.We can see that
. .It is easy to see that
. is differentiable at .The partial derivatives of
are given byThus,
at every .Assume that there is an optimal solution satisfying
.By Theorem 10.76, there exists
and satisfying (10.17).Since
, (10.17) simplifies to for all .Since
is differentiable, hence the only subgradient is .Thus, there exists
such thatThis implies
Therefore, for every
where
.Since
, hence .Thus,
Therefore
We note that
obtained above satisfies all the conditions in Theorem 10.76.Since these conditions are also sufficient, hence
is indeed the optimal solution for this minimization problem.Let us compute the optimal value of the minimization problem.
We note that
Hence
In terms of
,The optimal value is the negative of the log-sum-exp of
.
This optimization problem is used in the computation of the conjugate function for the negative entropy function. See Theorem 9.257.