# Constrained Optimization II

## Contents

# 10.7. Constrained Optimization II#

In this section, we present several results for the general problem of optimizing a cost function over a constraint set.

Throughout this section, we assume that \(\VV\) is an \(n\)-dimensional real vector space endowed with an inner product \(\langle \cdot, \cdot \rangle : \VV \times \VV \to \RR\) and a norm \(\| \cdot \| : \VV \to \RR\).

Main references for this section are [9].

## 10.7.1. General Constrained Optimization#

Let \(f : \VV \to \RERL\) be a proper function with \(S = \dom f\). Let \(C \subseteq \VV\) be a nonempty set.

We are concerned with the optimization problems of the form

We provide optimality conditions for three specific type of problems.

A continuously differentiable function \(f\) over an arbitrary constraint set \(C\).

A convex, possibly nonsmooth, cost function \(f\) and a convex constraint set \(C\).

A cost function consisting of the sum of a smooth (continuously differentiable) function and a convex function and an arbitrary constraint set.

For the development of the results in this section, we will need the supporting machinery of the following notions:

Feasible directions and the cone of feasible directions

Tangent directions and tangent cone

Normal directions and normal cone

For a nonconvex optimization problem, the optimality conditions provide the necessary conditions for a feasible point to be a local optimal solution.

For convex problems, the optimality conditions provide the necessary and sufficient conditions for a feasible point to be a global optimal solution.

## 10.7.2. Feasible Directions#

At any point in the constraint set \(C\), a feasible direction is a direction along which if we move slightly, then we can stay within \(C\).

(Feasible direction)

Consider the optimization problem (10.15).
Given a vector \(\bx \in C\), a direction \(\bd \in \VV\)
is said to be a *feasible direction* of \(C\) at \(\bx\) if
there exists a \(\overline{t} > 0\) such that

The set of all feasible directions of \(C\) at \(\bx\) is denoted by \(F_C(\bx)\).

(The cone of feasible directions)

The set of feasible directions at \(\bx\) is a cone.

We can see that \(\bzero \in F_C(\bx)\).

Also, if \(\bd \in F_C(\bx)\) then for any \(r > 0\), \(r \bd \in F_C(\bx)\) also holds true.

### 10.7.2.1. Convexity#

(Convexity of the cone of feasible directions)

Let \(C\) be a nonempty subset of \(\VV\). Let \(\bx \in C\). If \(C\) is convex then \(F_C(\bx)\) is convex.

Proof. We are given that \(C\) is convex.

For any point \(\by \in C\), the line segment \([\bx, \by] \in C\).

Hence the set \(F_C(\bx)\) consists of all the vectors of the form \(t(\by - \bx)\) where \(t > 0\) and \(\by \in C\).

Let \(\bd_1, \bd_2 \in F_C(\bx)\) and \(r \in (0,1)\).

Then there exist \(\by_1, \by_2 \in C\) and \(t_1, t_2 > 0\) such that

\[ \bd_1 = t_1 (\by_1 - \bx), \bd_2 = t_2 (\by_2 - \bx). \]Then

\[\begin{split} \bd = r \bd_1 + (1-r) \bd_2 &= r t_1 (\by_1 - \bx) + (1 - r) t_2 (\by_2 - \bx) \\ &= (r t_1 + (1-r) t_2) \left (\frac{r t_1}{r t_1 + (1-r) t_2} \by_1 + \frac{ (1- r) t_2}{r t_1 + (1-r) t_2} \by_2 - \bx \right ). \end{split}\]Let \(s = \frac{r t_1}{r t_1 + (1-r) t_2}\).

Then \(s \in (0, 1)\) and \(1-s = \frac{ (1- r) t_2}{r t_1 + (1-r) t_2}\).

Hence \(\by = s \by_1 + (1-s) \by_2 \in C\) due to convexity of \(C\).

Hence \(\bd = (r t_1 + (1-r) t_2) (\by - \bx)\).

Hence \(\bd\) is a feasible direction.

Hence \(F_C(\bx)\) is convex.

## 10.7.3. Tangent Cones#

### 10.7.3.1. Tangent Direction#

(Tangent direction)

Let \(C\) be a nonempty subset of \(\VV\).
Let \(\bx \in C\). A vector \(\bt \in \VV\)
is called a *tangent* of \(C\) at \(\bx\)
if either \(\bt = \bzero\) or
there exists a sequence \(\{ \bx_k \}\)
of \(C\) such that \(\bx_k \neq \bx\)
for every \(k\), and

For a nonzero direction \(\bt\), the term \(\frac{\bt}{\| \bt \|}\) is a

*normalized direction*.Since \(\bx_k \neq \bx\) for every \(k\), hence \(\bx_k - \bx \neq \bzero\) for every \(k\).

The term \(\frac{\bx_k - \bx}{ \| \bx_k - \bx\|}\) is also a normalized direction.

Hence the sequence \(\{ \frac{\bx_k - \bx}{ \| \bx_k - \bx\|}\}\) is a sequence of normalized directions.

Thus a nonzero direction \(\bt\) is a tangent at \(\bx\) if it is possible to approach \(\bx\) with a feasible sequence \(\{ \bx_k \}\) such that the normalized direction sequence \(\{ \frac{\bx_k - \bx}{ \| \bx_k - \bx\|} \}\) converges to \(\frac{\bt}{\| \bt \|}\).

### 10.7.3.2. Tangent Cone#

(The cone of tangent directions)

Let \(C\) be a nonempty subset of \(\VV\). Let \(\bx \in C\). The set of tangent directions of \(C\) at \(\bx\) is a cone.

Proof. We can see that by definition \(\bzero\) is a tangent direction.

Let \(\bt \neq \bzero\) be a tangent direction.

Let \(\alpha > 0\).

Then

\[ \frac{\alpha \bt }{ \| \alpha \bt \|} = \frac{\bt}{ \| \bt \|} = \frac{\bx_k - \bx}{ \| \bx_k - \bx\|} \]for some sequence \(\{ \bx_k \}\) converging to \(\bx\) with \(\bx_k \neq \bx\) for every \(k\).

Hence \(\alpha \bt\) is also a tangent direction.

Hence the set of tangent directions is a cone.

(Tangent cone)

Let \(C\) be a nonempty subset of \(\VV\).
Let \(\bx \in C\).
The set of all tangent directions of \(C\) at \(\bx\)
is called the *tangent cone* of \(C\) at \(\bx\)
and is denoted by \(T_C(\bx)\).

### 10.7.3.3. Characterization#

(Characterization of tangent directions)

Let \(C\) be a nonempty subset of \(\VV\). Let \(\bx \in C\). A vector \(\bt \in \VV\) is a tangent of \(C\) at \(\bx\) if and only if there exists a sequence \(\{ \bx_k \}\) of \(C\) and a positive scalar sequence \(\{ r_k \}\) such that \(r_k \to 0\) and \(\frac{\bx_k - \bx}{r_k} \to \bt\).

Proof. Let \(\bt\) be a tangent of \(C\) at \(\bx\).

If \(\bt = \bzero\), then we can take \(\bx_k = \bx\) for every \(k\) and \(r_k = \frac{1}{k}\) for every \(k\).

Then \(r_k \to 0\) and \(\frac{\bx_k - \bx}{r_k} \to \bzero\).

Now consider the case where \(\bt \neq \bzero\).

By definition of tangent, there exists a sequence \(\{ \bx_k \}\) of \(C\) with \(\bx_k \neq \bx\), \(\bx_k \to \bx\) and \(\frac{\bx_k - \bx}{ \| \bx_k - \bx\|} \to \frac{\bt}{\| \bt \|}\).

Let \(r_k = \frac{\| \bx_k - \bx \|}{\| \bt \|}\).

Clearly \(r_k > 0\) and \(r_k \to 0\) since \(\bx_k \to \bx\).

Also

\[ \frac{\bx_k - \bx}{r_k} = \| \bt \| \frac{\bx_k - \bx}{\| \bx_k - \bx \|}. \]Since by definition of tangent

\[ \frac{\bx_k - \bx}{ \| \bx_k - \bx\|} \to \frac{\bt}{\| \bt \|}, \]hence

\[ \frac{\bx_k - \bx}{r_k} \to \bt \]as desired.

Conversely, suppose \(\bt\) is such that there exist sequences \(\{ \bx_k \}\) and \(\{ r_k \}\) with the given properties.

If \(\bt = \bzero\) then it is a tangent.

Now consider the case where \(\bt \neq \bzero\)

Since \(r_k \to 0\) and \((\bx_k - \bx) / r_k \to \bt\) hence we must have \(\bx_k \to \bx\).

It is also possible to choose a subsequence of \(\{ \bx_k \}\) such that \(\bx_{k_l} \neq \bx\) for every \(k_l\). Otherwise, \((\bx_k - \bx) / r_k \to \bzero\) but we are given that \(\bt \neq \bzero\).

WLOG assume that \(\bx_k \neq \bx\) for every \(k\).

Then

\[ \frac{\bx_k - \bx}{ \| \bx_k - \bx\|} = \frac{(\bx_k - \bx) / r_k}{ \| \bx_k - \bx\| / r_k} \to \frac{\bt}{ \| \bt \|}. \]Hence \(\bt\) must be a tangent direction.

### 10.7.3.4. Closedness#

(Closedness of tangent cone)

Let \(C\) be a nonempty subset of \(\VV\). Let \(\bx \in C\). Then \(T_C(\bx)\) is closed.

Proof. .

Let \(\{ \bt_k \}\) be a sequence of \(T_C(\bx)\) such that \(\lim \bt_k = \bt \in \VV\).

If \(\bt = \bzero\), then \(\bt \in T_C(\bx)\) and we are done.

Hence consider the case where \(\bt \neq \bzero\).

WLOG, assume that \(\bt_k \neq \bzero\) for every \(k\).

By the definition of the tangent, for every \(k\), there is a sequence \(\{ \bx_k^i \}\) of \(C\) with \(\bx_k^i \neq \bx\) such that

\[ \lim_{i \to \infty} \bx_k^i = \bx, \lim_{i \to \infty} \frac{\bx_k^i - \bx}{\| \bx_k^i - \bx \|} = \frac{\bt_k}{\| \bt_k \|}. \]For each \(k\), choose and \(i_k\) such that \(i_1 < i_2 < \dots < i_k\) and

\[ \lim_{k \to \infty} \bx_k^{i_k} = \bx, \lim_{k \to \infty} \left \| \frac{\bx_k^{i_k} - \bx}{\| \bx_k^{i_k} - \bx \|} - \frac{\bt_k}{\| \bt_k \|} \right \|. \]For all \(k\), we have

\[ \left \| \frac{\bx_k^{i_k} - \bx}{\| \bx_k^{i_k} - \bx \|} - \frac{\bt}{\| \bt \|} \right \| \leq \left \| \frac{\bx_k^{i_k} - \bx}{\| \bx_k^{i_k} - \bx \|} - \frac{\bt_k}{\| \bt_k \|} \right \| + \left \| \frac{\bt_k}{\| \bt_k \|} - \frac{\bt}{\| \bt \|} \right \|. \]Taking the limit of \(k \to \infty\), we have

\[ \left \| \frac{\bx_k^{i_k} - \bx}{\| \bx_k^{i_k} - \bx \|} - \frac{\bt}{\| \bt \|} \right \| = 0. \]Hence \(\by \in T_C(\bx)\).

### 10.7.3.5. Feasible Directions and Tangent Cone#

(Feasible cone and tangent cone)

Let \(C\) be a nonempty subset of \(\VV\). Let \(\bx \in C\). The following hold true regarding the cone of feasible directions \(F_C(\bx)\) and the tangent cone \(T_C(\bx)\).

\(F_C(\bx) \subseteq T_C(\bx)\).

\(\closure F_C(\bx) \subseteq T_C(\bx)\).

If \(C\) is convex, then

\[ \closure F_C(\bx) = T_C(\bx). \]

Proof. (1) Every feasible direction is a tangent direction.

Let \(\bd\) be a feasible direction.

Then there exists a \(\overline{t} > 0\) such that

\[ \bx + t \bd \in C \Forall t \in [0, \overline{t}]. \]Let \(r_k = \frac{\overline{t}}{k}\) for every \(k \in \Nat\).

Then \(r_k > 0\) for every \(k\) and \(r_k \to 0\).

Let \(\bx_k = \bx + r_k \bd\).

Then \(\bx_k \in C\), \(\bx_k \neq \bx\) for every \(k\) and \(\bx_k \to \bx\).

Also \(\frac{\bx_k - \bx}{r_k} = \bd\) for every \(k\).

Hence \(\frac{\bx_k - \bx}{r_k} \to \bd\).

Hence \(\bd\) is a tangent direction due to Theorem 10.62.

Hence \(F_C(\bx) \subseteq T_C(\bx)\).

(2) Closure

By Theorem 10.63, \(T_C(\bx)\) is closed.

We have shown that \(F_C(\bx) \subseteq T_C(\bx)\).

The closure of a set is the smallest closed set containing it.

Hence \(\closure F_C(\bx) \subseteq T_C(\bx)\).

(3) Convexity and Closure

We are given that \(C\) is convex.

By claim (2), we have \(\closure F_C(\bx) \subseteq T_C(\bx)\).

It will suffice to show the reverse inclusion \(T_C(\bx) \subseteq \closure F_C(\bx)\).

Let \(\bd \in T_C(\bx)\).

By Theorem 10.62, there exists a sequence \(\{ \bx_k \}\) of \(C\) with \(\bx_k \to \bx\) and a positive scalar sequence \(\{ r_k \}\) with \(r_k \to 0\) such that \(\frac{\bx_k - \bx}{r_k} \to \bd\).

Since \(C\) is convex, hence the direction \(\frac{\bx_k - \bx}{r_k}\) is a feasible direction of \(C\) for every \(k\).

Hence \(\{\frac{\bx_k - \bx}{r_k} \}\) is a converging sequence of feasible directions.

Hence \(\bd \in \closure F_C(\bx)\).

Hence \(T_C(\bx) \subseteq \closure F_C(\bx)\).

### 10.7.3.6. Convexity#

(Convexity of the tangent cone)

Let \(C\) be a nonempty subset of \(\VV\). Let \(\bx \in C\). If \(C\) is convex then \(T_C(\bx)\) is convex.

Proof. .

By Theorem 10.60, \(F_C(\bx)\) is convex.

By Theorem 10.64,

\[ T_C(\bx) = \closure F_C(\bx) \]since \(C\) is convex.

By Theorem 9.123, the closure of a convex set is convex.

Hence \(T_C(\bx)\) is convex.

### 10.7.3.7. Polar Cone of Tangent Cone#

Recall from Definition 9.36 that the normal cone to a set \(C\) at a point \(\ba \in C\) is given by

(Characterization of the polar cone of the tangent cone for a convex set)

Let \(C\) be a nonempty convex set. Then for every \(\bx \in C\), we have \(\bz \in T_C(\bx)^{\circ}\) if and only if

In particular, we have

where \(N_C(\bx)\) is the normal cone of \(C\) at \(\bx\).

Proof. First consider that \(\bz \in T_C(\bx)^{\circ}\).

We note that \(F_C(\bx) \subseteq T_C(\bx)\).

Since \(C\) is convex, hence \(\by - \bx \in F_C(\bx)\) for every \(\by \in C\).

Hence \(\by - \bx \in T_C(\bx)\) for every \(\by \in C\).

Hence \(\langle \by - \bx, \bz \rangle \leq 0\) for every \(\by \in C\) from the definition of the polar cone of \(T_C(\bx)\).

Now assume that some \(\bz \in \VV^*\) satisfies

For contradiction, assume that \(\bz \notin T_C(\bx)^{\circ}\).

Then there exists some \(\by \in T_C(\bx)\) such that \(\langle \by, \bz \rangle > 0\).

Since \(\closure F_C(\bx) = T_C(\bx)\), hence there exists a sequence \(\{ \by_k \}\) of \(F_C(\bx)\) such that \(\by_k \to \by\).

\(\by_k\) is a feasible direction at \(\bx\) for every \(k\).

Due to convexity of \(C\), \(\by_k = r_k (\bx_k - \bx)\) for every \(k\) where \(\bx_k \in C\) and \(r_k > 0\).

Since \(\langle \by, \bz \rangle > 0\), hence for sufficiently large \(k\), we should have \(\langle \by_k, \bz \rangle > 0\).

In other words, for sufficiently large \(k\), we have \(r_k \langle \bx_k - \bx, \bz \rangle > 0\).

Since \(r_k > 0\), hence it reduces to \(\langle \bx_k - \bx, \bz \rangle > 0\).

But this is a contradiction to the hypothesis.

Hence we must have \(\bz \in T_C(\bx)^{\circ}\).

This characterization of polar cone of tangent cone implies that

Since \(T_C(\bx)\) is a closed and convex cone for a convex \(C\), hence due to Theorem 9.62, we have

### 10.7.3.8. Examples#

(Tangent cone for a linear system of equations)

Let \(C = \{ \bx \in \RR^n \ST \bA \bx = \bb \}\) where \(\bA \in \RR^{m \times n}\) and \(\bb \in \RR^m\).

Since \(C\) is convex, the strategy for computation of tangent cone is as follows.

Identify the set of all feasible directions at the point.

Compute the closure of the set of feasible directions at the point.

We note that \(C\) is an affine subspace.

Note that the subspace \(L\) parallel to \(C\) is given by

\[ L = \{ \bx \in \RR^n \ST \bA \bx = \bzero \}. \]Let \(\bx \in C\).

Every direction in the subspace \(L\) is a feasible direction.

Let \(\bd \in L\).

Then \(\bA (\bx + \bd) = \bb\).

Hence \(\bd\) is a feasible direction.

It is easy to see that there are no other feasible directions.

Hence \(F_C(\bx) = L\).

Since \(\RR^n\) is finite dimensional, hence \(L\) is closed.

Since \(T_C(\bx) = \closure F_C(\bx)\), hence \(T_C(\bx) = L\).

(Tangent cone for an affine subspace)

Let \(A \subseteq \VV\) be an affine subspace.

We note that \(A\) is closed and convex.

Let \(L\) be the linear subspace parallel to \(A\).

Pick some \(\bx \in A\).

Then for any \(\bt \in L\), we have \(\bx + \bt \in A\).

Hence every direction in \(L\) is a feasible direction.

It is also easy to see that no other direction is a feasible direction.

Hence \(F_C(\bx) = L\).

Since \(T_C(\bx) = \closure F_C(\bx)\), hence \(T_C(\bx) = L\).

The tangent cone at any point in an affine subspace is the subspace parallel to the affine subspace.

(Tangent cone of a closed unit ball)

Consider the set \(C = B[\bzero, 1] \subseteq \VV\) given by

Since \(C\) is convex, the strategy for computation of tangent cone is as follows.

Pick a point.

Identify the set of all feasible directions at the point.

Compute the closure of the set of feasible directions at the point.

There are two possible cases for the points in \(C\) that require separate treatment.

\(\bx \in \interior C\).

\(\bx \in \boundary C\).

(1) Interior points

Let \(\bx \in \interior C\).

Then it is easy to see that every direction is a feasible direction along which we can find another point in \(C\).

Hence \(F_C(\bx) = \VV\).

Then \(T_C(\bx) = \closure F_C(\bx) = \VV\).

(2) Boundary points

Let \(\bx \in \boundary C\).

We can see that only those directions are feasible that point towards some point in \(C\). Any direction pointing towards the exterior of \(C\) is not a feasible direction.

The directions which make an acute angle with \(\bx\) point away from the ball.

The directions which make an obtuse angle with \(\bx\) point inside the ball.

We shall now formally prove this.

Since \(\bx \in \boundary C\), hence \(\| \bx \| = 1\).

Let \(\by \in F_C(\bx)\).

Then we must have \(\bx + t \by \in C\) for sufficiently small \(t > 0\).

Hence \(\by \in F_C(\bx)\) if and only if there exists a \(\overline{t} > 0\) such that for every \(t \in (0, \overline{t}]\) we have \(\| \bx + t \by \| \leq 1\).

Expanding, we have

\[ \| \bx \|^2 + 2 t \langle \bx, \by \rangle + t^2 \| \by \|^2 \leq 1 \Forall t \in (0, \overline{t}]. \]Since \(\| \bx \| =1\), hence this reduces to

\[ 2 t \langle \bx, \by \rangle + t^2 \| \by \|^2 \leq 0 \Forall t \in (0, \overline{t}]. \]There are only two possibilities for this relation to hold true since \(\| \by \| \geq 0\).

Either \(\by = \bzero\).

Or \(\langle \bx, \by \rangle < 0\) and \(t \leq \frac{-2 \langle \bx, \by \rangle}{ \| \by \|^2}\).

From the second case, we have

\[ \overline{t} = \frac{-2 \langle \bx, \by \rangle}{ \| \by \|^2}. \]

Hence \(F_C(\bx) = \{ \by \ST \langle \bx, \by \rangle < 0 \} \cup \{ \bzero \}\).

From \(T_C(\bx) = \closure F_C(\bx) = \VV\), we have

\[ T_C(\bx) = \{ \by \ST \langle \bx, \by \rangle \leq 0 \}. \]

(Tangent cone of a closed half space)

Consider the set \(C \subseteq \VV\) given by

There are two possible cases for the points in \(C\):

\(\bx \in \interior C\).

\(\bx \in \boundary C\).

(1) Interior points

Let \(\bx \in \interior C\).

Then it is easy to see that every direction is a feasible direction along which we can find another point in \(C\).

Hence \(F_C(\bx) = \VV\).

Then \(T_C(\bx) = \closure F_C(\bx) = \VV\).

(2) Boundary points

Let \(\bx \in \boundary C\).

Then we have \( \langle \bx, \ba \rangle = b\).

Let \(\bd \in F_C(\bx)\).

This is equivalent to

\[ \langle \bx + t \bd, \ba \rangle \leq b \]for some \(t > 0\).

This is equivalent to \(\langle \bd, \ba \rangle \leq 0\).

Hence

\[ F_C(\bx) = \{ \bd \ST \langle \bd, \ba \rangle \leq 0 \}. \]This is also a closed half-space.

Hence \(T_C(\bx) = \closure F_C(\bx) = F_C(\bx)\).

We can see that \(T_C(\bx)\) is the closed half-space corresponding to the linear subspace parallel to the hyperplane

\[ H = \{ \bx \ST \langle \bx, \ba \rangle = b \} \]given by

\[ L = \{ \bx \ST \langle \bx, \ba \rangle = 0 \}. \]In other words,

\[ T_C(\bx) = C - \bz \]where \(\bz \in H\).

## 10.7.4. Normal Cones#

[9] introduces a more general notion of normal directions. For our purposes, we shall call it BNO normal directions.

(BNO normal direction)

Given a nonempty subset \(C\) of \(\VV\) and a vector \(\bx \in C\),
a vector \(\bz \in \VV\) is said to be a *normal* of \(C\)
at \(\bx\) (in the sense of [9])
if there exist sequences
\(\{ \bx_k \}\) of \(C\) and \(\{ \bz_k \}\) such that

This normal direction shall be called a *BNO normal direction*
in these notes to distinguish it from
Definition 9.35.

Each \(\bz_k\) is chosen from the polar cone of the tangent cone at \(\bx_k\).

(BNO normal cone)

Given a nonempty subset \(C\) of \(\VV\) and a vector \(\bx \in C\),
the set of all normal directions at \(\bx\)
(in the sense of [9])
is called the *normal cone* of \(C\) at \(\bx\)
and is denoted by \(\tilde{N}_C(\bx)\).

This normal cone shall be called a *BNO normal cone*
in these notes to distinguish it from
Definition 9.36.

The BNO normal cone reduces to the definition of normal cones in Definition 9.36 for convex sets. The BNO normal cone is useful in the analysis of optimality conditions for minimization of smooth functions with smooth equality and inequality constraints.

### 10.7.4.1. Conic structure#

(Conic structure)

Let \(C\) be a nonempty set and \(\bx \in C\). Then the BNO normal cone \(\tilde{N}_C(\bx)\) is a cone.

Proof. \(\bzero\) being a normal direction.

Pick any sequence \(\{ \bx_k \}\) such that \(\bx_k = \bx\).

Then \(\bzero \in T_C(\bx_k)^{\circ}\) since \(T_C(\bx_k)^{\circ}\) is a cone.

Hence we can pick \(\bz_k = \bzero\) to form a sequence \(\{ \bz_k \}\) such that \(\bz_k \to \bzero\) and \(\bz_k \in T_C(\bx_k)^{\circ}\) for every \(k\).

Hence \(\bzero\) is a BNO normal direction.

Conic property

Let \(\bz\) be a BNO normal direction and \(t > 0\).

There exists a sequence \(\{\bx_k \}\) of \(C\) and a sequence \(\{ \bz_k \}\) such that \(\bz_k \in T_C(\bx_k)^{\circ}\) for every \(k\) with \(\bz_k \to \bz\).

Then \(t \bz_k \in T_C(\bx_k)^{\circ}\) for every \(k\) since \(T_C(\bx_k)^{\circ}\) is a cone.

Also \(t \bz_k \to t \bz\).

Hence \(t \bz\) is a BNO normal direction.

Hence \(\tilde{N}_C(\bx)\) is a cone.

### 10.7.4.2. Polar Cone of Tangent Cone#

(Polar cone of tangent cone and BNO normal cone)

Let \(C\) be a nonempty set and \(\bx \in C\). Then

Proof. This is straightforward from the definition of BNO normal directions.

Let \(\bz \in T_C(\bx)^{\circ}\).

Consider the sequence \(\{ \bx_k \}\) where \(\bx_k = \bx\).

Hence \(\bx_k \to \bx\).

Then \(\bz \in T_C(\bx_k)^{\circ}\) for every \(k\).

Consider the sequence \(\{ \bz_k \}\) where \(\bz_k = \bz\).

Then \(\bz_k \to \bz\).

Hence \(\bz \in \tilde{N}_C(\bx)\).

Hence \(T_C(\bx)^{\circ} \subseteq \tilde{N}_C(\bx)\).

### 10.7.4.3. Closedness#

(Closedness of BNO normal cone)

Let \(C\) be a nonempty set and \(\bx \in C\). Then the BNO normal cone at \(\bx\), \(\tilde{N}_C(\bx)\) is closed.

Proof. Consider a converging sequence \(\{\bz_l \}\) of \(\tilde{N}_C(\bx)\).

Let \(\bz_l \to \bz\). We need to show that \(\bz \in \tilde{N}_C(\bx)\).

For each \(\bz_l\), there exist sequences \(\{ \bx_l^m \}\) of \(C\) and \(\{ \bz_l^m \}\) where \(\bz_l^m \in T_X(\bx_l^m)^{\circ}\) such that \(\bx_l^m \to \bx\) and \(\bz_l^m \to \bz_l\).

For each \(k\)

Let \(r = \frac{1}{k}\).

Pick a \(\bz_{l_0}\) from \(\{ \bz_l \}\) such that \(\| \bz_{l_0} - \bz \| < r\).

Identify the index \(m_0\) such that for all \(m > m_0\), we have \(\| \bx_{l_0}^m - \bx \| < r\).

Pick a \(\bz_{l_0}^{m_1}\) from \(\{ \bz_{l_0}^m \}\) such that \(m_1 > m_0\) and \(\| \bz_{l_0}^{m_1} - \bz_{l_0} \| < r\).

Let \(\bx_k = \bx_{l_0}^{m_1}\).

Let \(\bz_k = \bz_{l_0}^{m_1}\).

Then we have \(\| \bx_k - \bx \| < r = \frac{1}{k}\) for every \(k\).

Also \(\| \bz_k - \bz \| < 2 r = \frac{2}{k}\) for every \(k\).

\(\bz_k \in T_X(\bx_k)^{\circ}\).

\(\bx_k \to \bx\).

\(\bz_k \to \bz\).

Hence \(\bz \in \tilde{N}_C(\bx)\).

Hence, every convergent sequence of \(\tilde{N}_C(\bx)\) in \(\tilde{N}_C(\bx)\). Hence \(\tilde{N}_C(\bx)\) is closed.

### 10.7.4.4. Regularity#

(Regular set)

Let \(C\) be a nonempty set and \(\bx \in C\).
Then \(C\) is said to be *regular* at \(\bx\)
(in the sense of [9])
if

In other words, a set is regular at a point if the (BNO) normal cone is equal to the polar cone of the tangent cone.

(Regularity of convex sets)

Let \(C\) be a nonempty convex set and \(\bx \in C\). Then \(C\) is regular at every \(\bx \in C\). Moreover,

Proof. We show the regularity of \(C\) at \(\bx\) by showing that \(\tilde{N}_C(\bx) \subseteq T_C(\bx)^{\circ}\).

Let \(\bz \in \tilde{N}_C(\bx)\).

Then there exist sequences \(\{ \bx_k \}\) of \(C\) and \(\{ \bz_k \}\) with \(\bz_k \in T_C(\bx_k)^{\circ}\) for every \(k\) such that \(\bx_k \to \bx\) and \(\bz_k \to \bz\).

Since \(C\) is convex, hence, due to Theorem 10.66, we have, for every \(k\),

\[ \langle \by - \bx_k, \bz_k \rangle \leq 0 \Forall \by \in C. \]By taking limit \(k \to \infty\), we obtain

\[ \langle \by - \bx, \bz \rangle \leq 0 \Forall \by \in C. \]Hence, again due to Theorem 10.66, \(\bz \in T_C(\bx)^{\circ}\).

Hence \(\tilde{N}_C(\bx) \subseteq T_C(\bx)^{\circ}\).

Combining with Theorem 10.68, we have

We also saw in Theorem 10.66 that \(T_C(\bx)^{\circ} = N_C(\bx)\).

This result shows that the definition of normal cone in Definition 9.36 agrees with the definition in Definition 10.27 for convex sets.

## 10.7.5. Optimality Conditions#

### 10.7.5.1. Minimization on an Arbitrary Constraint Set#

When the constraint set is nonconvex, then the tangent cone can be used as a suitable approximation to the constraint set at the point of local optimality. The tangent cone provides additional structure of being a cone and being closed.

(Tangent cone and local optimality)

Let \(f: \VV \to \RERL\) be a function with \(S = \dom f\). Let \(\ba\) be a local minimum of \(f\) over a subset \(C\) of \(S\). Assume that \(f\) is continuously differentiable over an open set \(O\) containing the local minimum \(\ba\). Assume that \(O \subseteq C\). Then

Equivalently, we must have

i.e., the negative of the gradient at the local minimum belongs to the polar cone of the tangent cone or the gradient at the local minimum belongs to the dual cone of the tangent cone.

This necessary condition means that the the value of function increases along every tangent direction in its local neighborhood.

For a smooth \(f: \VV \to \RR\) the condition simplifies as follows. Let \(\ba\) be a local minimum of \(f\) over a subset \(C\) of \(\VV\). Then

Proof. For the zero tangent direction at \(\ba\), the condition is met trivially. Let \(\bt\) be a nonzero tangent direction at \(\ba\).

Then there exists a sequence \(\{ \bx_k \}\) of \(C\) and a sequence \(\{\br_k \}\) of \(\VV\) such that \(\bx_k \neq \ba\) for every \(k\), \(\br_k \to \bzero\), \(\bx_k \to \ba\) and

\[ \frac{\bx_k - \ba}{\| \bx_k - \ba \|} = \frac{\bt}{\| \bt \|} + \br_k. \]The term \(\br_k\) is the residual between the \(k\)-th normalized direction and the normalized tangent direction.

WLOG assume that \(\bx_k \in O \Forall k\). Otherwise drop finitely many terms from \(\{ \bx_k \}\) to achieve this.

For every \(k\), we have

\[ \bx_k - \ba = \frac{\| \bx_k - \ba \|}{\| \bt \|}(\bt + \| \bt \| \br_k). \]Since \(\bx_k \in O\) for every \(k\), hence \(f\) is continuously differentiable at every \(\bx_k\) and at every point on the line segment \([\ba, \bx_k]\).

By mean value theorem, we have

\[ f(\bx_k) = f(\ba) + \langle \bx_k - \ba, \nabla f(\tilde{\bx}_k) \rangle \]where \(\tilde{\bx}_k\) lies on the line segment \([\ba, \bx_k]\).

Since \(\bx_k \to \ba\), hence \(\tilde{\bx}_k \to \ba\) holds too.

Substituting for \(\bx_k - \ba\), we get

\[\begin{split} f(\bx_k) &= f(\ba) + \left \langle \frac{\| \bx_k - \ba \|}{\| \bt \|} \bt_k, \nabla f(\tilde{\bx}_k) \right \rangle \\ &= f(\ba) + \frac{\| \bx_k - \ba \|}{\| \bt \|} \langle \bt_k , \nabla f(\tilde{\bx}_k) \rangle \end{split}\]where \(\bt_k = \bt + \| \bt \| \br_k\).

Note that \(\bt_k \to \bt\) since \(\br_k \to \bzero\).

Suppose for contradiction that \(\langle \bt, \nabla f(\ba) \rangle < 0\).

Since \(\tilde{\bx}_k \to \ba\) and \(\bt_k \to \bt\), hence it follows that for all sufficiently large \(k\), we have \(\langle \bt_k, \nabla f(\tilde{\bx}_k) \rangle < 0\).

Hence, from the previous relation, we have \(f(\bx_k) < f(\ba)\) for all sufficiently large \(k\).

Thus contradictions the local optimality of \(\ba\).

Hence we must have \(\langle \bt, \nabla f(\ba) \rangle \geq 0\).

An equivalent statement is that

This means that \(-\nabla f(\ba) \in T_C(\ba)^{\circ}\). Since the polar cone is the negative of the dual cone, hence \(\nabla f(\ba) \in T_C(\ba)^*\).

(Local minimum and descent directions)

Recall that a direction \(\bd\) that satisfies \(\langle \bd, \nabla f(\ba) \rangle < 0\) is called a descent direction at \(\ba\). Recall from Theorem 5.1 that

Thus for some \(\bx = \ba + t \bd\) where \(t > 0\), we have

Hence for sufficiently small \(t > 0\), we have \(f(\bx) < f(\ba)\).

Thus, Theorem 10.72 says that if \(\ba\) is a local minimum of \(f\) over \(C\), then there is no descent direction within the tangent cone \(T_C(\ba)\).

### 10.7.5.2. Minimization on a Convex Constraint Set#

(Local optimality with a convex constraint set)

Let \(f: \VV \to \RERL\) be a function with \(S = \dom f\). Let \(\ba\) be a local minimum of \(f\) over a convex subset \(C\) of \(S\). Assume that \(f\) is continuously differentiable over an open set \(O\) containing the local minimum \(\ba\). Assume that \(O \subseteq C\). Then

If \(f\) is real valued (with \(\dom f = \VV\)) and \(C = \VV\), this reduces to \(\nabla f(\ba) = \bzero\).

This result is similar to Theorem 10.47 however applicable to all functions that are continuously differentiable in the neighborhood of their local minimum.

Proof. Since \(C\) is convex, hence by Theorem 10.64, \(\closure F_C(\bx) = T_C(\bx)\) for every \(\bx \in C\).

By Theorem 10.72, we have

\[ \langle \by, \nabla f(\ba) \rangle \geq 0, \Forall \by \in T_C(\ba). \]In other words

\[ \langle \by, \nabla f(\ba) \rangle \geq 0, \Forall \by \in \closure F_C(\ba). \]For any \(\bx \in C\), we have \(\bx - \ba \in F_C(\ba)\) since \(C\) is convex.

Hence

\[ \langle \bx - \ba, \nabla f(\ba) \rangle \geq 0, \Forall \bx \in C. \]

The special case of \(C = \VV\) is as follows.

Choose an orthogonal basis \(\be_1, \dots, \be_n\) for \(\VV\).

By picking \(\bx = \ba + \be_i\) and \(\bx = \ba - \be_i\), we see that

\[ \langle \be_i \nabla f(\ba) \rangle = 0 \Forall i. \]Hence \(\nabla f(\ba) = \bzero\).

### 10.7.5.3. Convex Functions on a Convex Set#

Recall from Fermatâ€™s optimality condition (Theorem 9.233) that for the unconstrained minimization problem, \(\ba\) is a minimizer of \(f\) if and only if \(\bzero\) is a subgradient of \(f\) at \(\ba\).

A similar result can be obtained for the constrained optimization problem in terms of the subgradients of \(f\) and the normal cone of \(C\).

(Optimality conditions for convex constrained optimization)

Let \(f : \VV \to \RERL\) be a proper and convex function. Let \(C \subseteq \VV\) be a convex set for which \(\relint \dom f \cap \relint C \neq \EmptySet\). Then, \(\bx^* \in C\) is an optimal solution of (10.15) if and only if

Equivalently, \(\bx^*\) minimizes \(f\) over \(C\) if and only if

Proof. The problem in (10.15) can be rephrased as

where \(I_C\) is the indicator function for the set \(C\).

Since \(\relint \dom f \cap \relint C \neq \EmptySet\), hence due to the sum rule of subdifferential calculus (Corollary 9.23):

\[ \partial (f + I_C) (\bx) = \partial f(\bx) + \partial I_C(\bx). \]Recall from Theorem 9.235 that

\[ \partial I_C(\bx) = N_C(\bx). \]Hence for any \(\bx \in \VV\)

\[ \partial (f + \delta_C) (\bx) = \partial f(\bx) + N_C(\bx). \]Invoking Fermatâ€™s optimality condition (Theorem 9.233), \(\bx^* \in C\) is an optimal solution of (10.15) if and only if

\[ \bzero \in \partial f(\bx^*) + N_C(\bx^*). \]In other words, there exists \(\bg \in \partial f(\bx^*)\) and \(\bh \in N_C(\bx^*)\) such that \(\bg + \bh = \bzero\).

This is equivalent to saying that there exists a vector \(\bg \in \partial f(\bx^*)\) such that \(-\bg \in N_C(\bx^*)\).

This is same as the condition

\[ (- \partial f(\bx^*)) \cap N_C(\bx^*) \neq \EmptySet. \]

For the second part:

By Theorem 10.66, we have \(N_C(\bx^*) = T_C(\bx^*)^{\circ}\) since \(C\) is convex.

By the earlier argument, we have \(\bg \in \partial f(\bx^*)\) and \(\bh \in N_C(\bx^*)\) such that

\[ \bg + \bh = \bzero. \]This is equivalent to saying that

\[ \bzero \in \partial f(\bx^*) + T_C(\bx^*)^{\circ}. \]

By using the definition of the normal cone, we can provide an alternative specification of the necessary and sufficient optimality condition in a more explicit manner.

(Optimality conditions for convex constrained optimization second version)

Let \(f : \VV \to \RERL\) be a proper and convex function. Let \(C \subseteq \VV\) be a convex set for which \(\relint \dom f \cap \relint C \neq \EmptySet\). Then, \(\bx^* \in C\) is an optimal solution of (10.15) if and only if

Following is the refinement of Theorem 10.73 for continuously differentiable functions at the global minimum.

(Minimization of convex and smooth function on a convex set)

Let \(f : \VV \to \RERL\) be a proper and convex function. Let \(C \subseteq \VV\) be a convex set for which \(\relint \dom f \cap \relint C \neq \EmptySet\). Assume that \(f\) is continuously differentiable at a point \(\bx^* \in C\).

Then, \(\bx^* \in C\) is an optimal solution of (10.15) if and only if

### 10.7.5.4. Sum of Smooth and Convex Functions on an Arbitrary Constraint Set#

(Minimization of sum of smooth and convex function)

Let \(\ba\) be a local minimum of a function \(f: \VV \to \RR\) over a set \(C\). Assume that the tangent cone \(T_C(\ba)\) is convex. Assume that \(f\) has the form

where \(f_1 : \VV \to \RR\) is convex and \(f_2 : \VV \to \RR\) is smooth. Then

### 10.7.5.5. Optimization Over Unit Simplex#

In this section, we consider the problem of minimizing a function over the unit simplex \(\Delta_n\).

Recall from Definition 9.17 that the unit simplex is given by

In other words, for every \(\bx \in \Delta_n\), every component is nonnegative and their sum is 1.

(Optimality conditions over unit simplex)

Let \(f : \RR^n \to \RERL\) be a proper and convex function.

Consider the optimization problem

Assume that \(\relint \dom f \cap \relint \Delta_n \neq \EmptySet\). Then, \(\bx^* \in \Delta_n\) is an optimal solution if and only if there exists \(\bg \in \partial f(\bx^*)\) and \(\mu \in \RR\) such that

Proof. Assume that for some \(\bx^* \in \Delta_n\), there exists a \(\bg \in \partial f(\bx^*)\) and \(\mu\) satisfying (10.17).

Note that \(\sum_{i=1}^n \bx_i^* = 1\) simplifies to \(\sum_{i \ST x_i > 0} x_i^* = 1\).

Now, for any \(\bx \in \Delta_n\)

\[\begin{split} \bg^T(\bx - \bx^*) &= \sum_{i=1}^n g_i (x_i - x_i^*) \\ &= \sum_{i \ST x_i^* > 0} g_i (x_i - x_i^*) + \sum_{i \ST x_i^* = 0} g_i x_i \\ &\geq \sum_{i \ST x_i^* > 0} \mu (x_i - x_i^*) + \sum_{i \ST x_i^* = 0} \mu x_i \\ &= \mu \sum_{i=1}^n x_i - \mu \sum_{i \ST x_i^* > 0} x_i^* \\ &= \mu 1 - \mu 1 = \mu - \mu = 0. \end{split}\]Thus,

\[ \bg^T(\bx - \bx^*)\geq 0 \Forall \bx \in \Delta_n. \]Thus, due to Corollary 10.4, \(\bx^*\) is indeed an optimal point.

For the converse, assume that \(\bx^*\) is an optimal point.

Then, due to Corollary 10.4, there exists \(\bg \in \partial f(\bx^*)\) such that \(\bg^T(\bx - \bx^*)\geq 0 \Forall \bx \in \Delta_n\).

We need to find a value of \(\mu\) such that \(\bg\) satisfies (10.17).

We consider the following three cases.

\(\bx^* = \bzero\).

Only one entry in \(\bx^*\) is nonzero and others are zero.

Two or more entries in \(\bx^*\) are nonzero.

If \(\bx^* = \bzero\) then we can choose \(\mu = \min \{ g_1, \dots, g_n \}\) satisfying (10.17).

Suppose that only the \(i\)-th entry of \(\bx^*\) is nonzero and remaining entries are zero.

Since \(\bx^* \in \Delta_n\) hence \(\sum_{j=1}^n x_j^* = 1\) implies that \(x_i^* = 1\).

Let \(\mu = g_i\).

Pick any \(j \in \{1,\dots,n \}\) such that \(j \neq i\).

Let \(\bx = \be_j\); i.e., the unit vector whose \(j\)-th component is 1 and other components are zero.

Then,

\[\begin{split} & \bg^T (\bx - \bx^*) \geq 0 \\ \implies & \bg^T \bx - \bg^T \bx^* \geq 0 \\ \implies & g_j - g_i \geq 0 \\ \implies & g_j \geq g_i = \mu. \end{split}\]Thus, with \(\mu = g_i\), (10.17) is satisfied.

For the third case, let \(i\) and \(j\) denote two different indices for which \(x_i^* > 0\) and \(x_j^* > 0\).

Pick a vector \(\bx \in \Delta_n\) satisfying

\[\begin{split} x_k = \begin{cases} x_k^*, & k \notin \{i, j \}; \\ x_i^* - \frac{x_i^*}{2}, & k = i ; \\ x_j^* + \frac{x_i^*}{2}, & k = j . \end{cases} \end{split}\]We can verify that \(\bx \in \Delta_n\).

Then, the inequality \(\bg^T(\bx - \bx^*) \geq 0\) simplifies to

\[ - \frac{x_i^*}{2} g_i + \frac{x_i^*}{2} g_j \geq 0. \]Since \(x_i^* > 0\), hence this simplifies to \(g_i \leq g_j\).

Switching the role of \(i\) and \(j\) and following an identical argument, we get \(g_i \geq g_j\).

This means that \(g_i = g_j\) must hold true.

The argument is independent of the choice of \(i,j\) for which \(x_i^*, x_j^* > 0\).

Thus \(g_i = g_j\) holds true for every \(i,j\) such that \(x_i^*, x_j^* > 0\)..

This implies that all the components of \(\bg\) corresponding to the positive components of \(\bx^*\) must have the same value.

Let this common value be \(\mu\).

In other words, \(g_i = \mu\) for all \(i=1,\dots,n\) such that \(x_i^* > 0\).

Now consider some \(k \in \{1,\dots,n\}\) such that \(x_k = 0\).

Let \(\bx = \be_k\).

Note that \(\sum_{i=1}^n \bx_i^* = 1\) simplifies to \(\sum_{i \ST x_i > 0} x_i^* = 1\).

Then,

\[\begin{split} & \bg^T (\bx - \bx^* ) \geq 0 \\ \implies & \bg^T \bx - \bg^T \bx^* \geq 0 \\ \implies & g_k - \sum_{i \ST x_i^* > 0} g_i x_i^* \geq 0\\ \implies & g_k - \mu \sum_{i \ST x_i^* > 0} x_i^* \geq 0\\ \implies & g_k - \mu \geq 0\\ \implies & g_k \geq \mu. \end{split}\]Thus, \(g_i \geq \mu\) for all \(i=1,\dots,n\) such that \(x^*_i = 0\).

Thus, we have established that \(\bg\) indeed satisfies (10.17).

(Optimization over the unit simplex)

Consider the function \(f : \RR^n \to \RERL\) given by

The vector \(\by = (y_1, \dots, y_n) \in \RR^n\) is fixed. Consider the optimization problem

\(f\) is a proper convex function.

We can see that \(\dom f = \RR^n_+\).

\(\interior \dom f = \relint \dom f = \RR^n_{++}\).

It is easy to see that \(\relint \dom f \cap \relint \Delta_n \neq \EmptySet\).

\(f\) is differentiable at \(\bx \in \interior \dom f\).

The partial derivatives of \(f\) are given by

\[ \frac{\partial f}{\partial x_i} (\bx) = 1 + \ln x_i - y_i. \]Thus, \(\partial f(\bx) = \{ \nabla f(\bx) \}\) at every \(\bx \succ \bzero\).

Assume that there is an optimal solution satisfying \(\bx^* \succ \bzero\).

By Theorem 10.76, there exists \(\bg \in \partial f(\bx^*)\) and \(\mu \in \RR\) satisfying (10.17).

Since \(\bx \succ \bzero\), (10.17) simplifies to \(g_i = \mu\) for all \(i=1,\dots,n\).

Since \(f\) is differentiable, hence the only subgradient is \(\nabla f(\bx^*)\).

Thus, there exists \(\mu \in \RR\) such that

\[ \frac{\partial f}{\partial x_i} (\bx^*) = \mu. \]This implies

\[ 1 + \ln x_i^* - y_i = \mu. \]Therefore, for every \(i\)

\[ x_i^* = e^{\mu - 1 + y_i} = \alpha e^{y_i} \]where \(\alpha = e^{\mu - 1}\).

Since \(\bx^* \in \Delta_n\), hence \(\sum_{i=1}^n x_i^* = 1\).

Thus,

\[\begin{split} & \sum_{i=1}^n \alpha e^{y_i} = 1 \\ \iff & \alpha = \frac{1}{\sum_{i=1}^n e^{y_i}}. \end{split}\]Therefore

\[ x_i^* = \frac{e^{y_i}}{\sum_{j=1}^n e^{y_j}} \Forall i = 1,\dots,n. \]We note that \(\bx^*\) obtained above satisfies all the conditions in Theorem 10.76.

Since these conditions are also sufficient, hence \(\bx^*\) is indeed the optimal solution for this minimization problem.

Let us compute the optimal value of the minimization problem.

We note that

\[ \ln x_i^* = \mu - 1 + y_i. \]Hence

\[\begin{split} & f(\bx^*) = \sum_{i=1}^n x_i^* \ln x_i^* - \sum_{i=1}^n y_i x_i^* \\ &= \sum_{i=1}^n x_i^* ( \mu - 1 + y_i) - \sum_{i=1}^n y_i x_i^* \\ &= \sum_{i=1}^n x_i^* ( \mu - 1) \\ &= ( \mu - 1) \sum_{i=1}^n x_i^* \\ & = \mu -1. \end{split}\]In terms of \(\by\),

\[ f(\bx^*) = \mu -1 = \ln \alpha = - \ln \left ( \sum_{i=1}^n e^{y_i} \right ). \]The optimal value is the negative of the log-sum-exp of \(\by\).

This optimization problem is used in the computation of the conjugate function for the negative entropy function. See Theorem 9.257.