8.2. Projection on Convex Sets#

We will assume \(\VV\) to be real \(n\)-dimensional vector space space with an inner product \(\langle \cdot, \cdot \rangle\) , the induced norm \(\| \cdot \|\) and corresponding dual norm \(\| \cdot \|_*\) for the dual space \(\VV^*\).

We are interested in mapping a point \(\bx\) to the nearest point in a set \(C\). In general, this problem may have zero, one or multiple solutions. However, in the special case where \(C\) is a nonempty, closed and convex set, then there is exactly one such point in \(C\) which is nearest to a given point \(\bx\). This nearest point is called the projection of \(\bx\) on to \(C\).

Main references for this section are [3, 5].

8.2.1. Projection Theorem#

Theorem 8.6 (Projection theorem)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). For every \(\bx \in \VV\), there exists a unique vector that minimizes \(\| \bz - \bx \|\) over all \(\bz \in C\). This vector is called the projection of \(\bx\) on \(C\).

Proof. We fix some \(\bx \in \VV\) and choose an element \(\bw \in C\).

  1. Consider the function

    \[ g(\bz) = \frac{1}{2} \| \bz - \bx \|^2. \]
  2. Minimizing \(\| \bz - \bx \|\) over all \(C\) is equivalent to minimizing \(g(\bz)\) over the set

    \[ D = \{ \bz \in C \ST \| \bz - \bx \| \leq \| \bw - \bx \| \}. \]
  3. We note that \(D\) is a compact set and \(g\) is a l.s.c., closed and coercive function.

  4. By Weierstrass’ theorem Corollary 6.2, the set of minimizers for \(g\) is nonempty and compact.

We now show that the minimizer is unique.

  1. \(g\) is a strictly convex function because its Hessian matrix is identity matrix which is positive definite.

  2. Hence, the minimizer is unique due to Theorem 8.2.

8.2.2. Orthogonal Projection#

Definition 8.6 (Orthogonal Projection Mapping)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The orthogonal projection mapping \(P_C : \VV \to \VV\) is defined by:

\[ P_C(\bx) \triangleq \underset{\by \in C}{\argmin} \| \by - \bx \| \Forall \bx \in \VV. \]

This mapping is well defined since the projection is unique for a nonempty, closed and convex set \(C\) due to Theorem 8.6.

The vector \(P_C(\bx)\) is called the projection of \(\bx\) on the set \(C\).

8.2.3. Characterization#

Theorem 8.7 (Orthogonal projection characterization)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). For every vector \(\bx \in \VV\), a vector \(\bz \in C\) is its projection if and only if

\[ \langle \by - \bz, \bx - \bz \rangle \leq 0 \Forall \by \in C. \]

This result is also known as the second projection theorem [3].

Proof. Assume that for some \(\bz \in C\), \(\langle \by - \bz, \bx - \bz \rangle \leq 0 \Forall \by \in C\) holds true.

  1. For any \(\by \in C\)

    \[\begin{split} \| \by - \bx \|^2 &= \| (\by - \bz) - (\bx - \bz) \|^2 \\ &= \| \by - \bz \|^2 + \| \bz - \bx \|^2 - 2 \langle \by - \bz, \bx - \bz \rangle \\ &\geq \| \bz - \bx \|^2 - 2 \langle \by - \bz, \bx - \bz \rangle. \end{split}\]
  2. Thus,

    \[ \| \by - \bx \|^2 \geq \| \bz - \bx \|^2 \Forall \by \in C. \]
  3. Thus, \(\bz\) is indeed the projection of \(\bx\) on \(C\).

Conversely, assume that \(\bz\) is the projection of \(\bx\) on \(C\).

  1. Let \(\by \in C\) be arbitrary.

  2. For any \(t \geq 0\), define \(\by_t = t \by + (1 -t ) \bz\).

  3. Then, we have

    \[\begin{split} \| \bx - \by_t \|^2 &= \| t \bx + (1-t) \bx + - t \by - (1 -t ) \bz \|^2 \\ &= \| t (\bx - \by) + (1- t) (\bx - \bz) \|^2\\ &= t^2 \| \bx - \by \|^2 + (1-t)^2 \| \bx - \bz \|^2 + 2 t (1-t) \langle \bx - \by, \bx - \bz \rangle. \end{split}\]
  4. Viewing \(\| \bx - \by_t \|^2\) as a function of \(t\) and differentiating w.r.t. \(t\), we have

    \[\begin{split} \left . \frac{d}{d t} \| \bx - \by_t \|^2 \right |_{t = 0} &= -2 \| \bx - \bz \|^2 + 2 \langle \bx - \by, \bx - \bz \rangle\\ &= -2 \langle \bx - \bz, \bx - \bz \rangle + 2 \langle \bx - \by, \bx - \bz \rangle\\ &= -2 \langle \by - \bz, \bx - \bz \rangle. \end{split}\]
  5. Since \(t=0\) minimizes \(\| \bx - \by_t \|^2\) over \(t \in [0,1]\), we must have

    \[ \left . \frac{d}{d t} \| \bx - \by_t \|^2 \right |_{t = 0} \geq 0. \]
  6. Thus, we require that

    \[ \langle \by - \bz, \bx - \bz \rangle \leq 0 \]

    must hold true for every \(\by \in C\).

Following is an alternative proof based on results from Constrained Optimization I. This proof is specific to the case where \(\VV = \RR^n\).

Proof. Define a function \(f: \RR^n \to \RR\) as

\[ f(\by) = \| \by - \bx \|^2. \]

Then, the projection problem can be cast as an optimization problem

\[\begin{split} & \text{minimize } & & f(\by) \\ & \text{subject to } & & \by \in C. \end{split}\]

Note that the gradient of \(f\) is given by

\[ \nabla f (\by) = \nabla \langle \by - \bx, \by - \bx \rangle = \nabla (\langle \by, \by \rangle - 2 \langle \by, \bx \rangle + \langle \bx, \bx \rangle) = 2 (\by - \bx). \]

By Theorem 8.47, \(\bz\) is an optimal solution if and only if

\[ f(\bz)^T (\by - \bz) \geq 0 \Forall \by \in C. \]

In other words

\[ 2 (\bz - \bx)^T (\by - \bz) \geq 0 \Forall \by \in C. \]

We can simplify this as

\[ \langle \bx - \bz, \by - \bz \rangle \leq 0 \Forall \by \in C. \]

Theorem 8.8 (Orthogonal projection on an affine subspace)

Let \(C\) be an affine subspace of \(\VV\). Let \(S\) be the linear subspace parallel to \(C\). For every vector \(\bx \in \VV\), a vector \(\bz \in C\) is its projection if and only if

\[ \bx - \bz \in S^{\perp}. \]

Proof. Since \(C\) is an affine subspace of \(\VV\), hence \(C\) is nonempty, convex and closed (as \(\VV\) is finite dimensional).

  1. By Theorem 8.7, \(\bz\) is the projection of \(\bx\) on \(C\) if and only if for every \(\by \in C\), we have

    \[ \langle \by - \bz, \bx - \bz \rangle \leq 0. \]
  2. But \(\by \in C\) if and only if \(\by - \bz \in S\).

  3. Hence the condition is equivalent to

    \[ \langle \bw, \bx - \bz \rangle \leq 0 \Forall \bw \in S. \]
  4. But then, it must be an equality since \(\bw\) and \(-\bw\) both belong to \(S\). Thus, we have

    \[ \langle \bw, \bx - \bz \rangle = 0 \Forall \bw \in S. \]
  5. In other words, \(\bx - \bz \in S^{\perp}\).

8.2.4. Distance Function#

Recall that the distance of a point \(\bx \in \VV\) from a set \(C\) is defined as

\[ d_C(\bx) \triangleq \underset{\by \in C}{\inf} \| \bx - \by \|. \]

Theorem 8.9 (Distance function for nonempty, closed and convex set)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Then the function \(d_C : \VV \to \RR\) defining the distance of a point \(\bx \in \VV\) from the set \(C\) satisfies

\[ d_C(\bx) = \| \bx - P_C(\bx) \|. \]

Proof. By Theorem 8.6, there exists a unique point \(P_C(\bx)\) which minimizes the distance between \(\bx\) and \(C\). Hence

\[ d_C(\bx) = \| \bx - P_C(\bx) \| \]

must hold true.

Theorem 8.10 (Distance function for nonempty, closed and convex set is convex)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Let \(d_C : \VV \to \RR\) be the distance to the set \(C\) function as defined in Theorem 8.9. Then, \(d_C\) is convex.

Proof. Assume, for contradiction, that \(d_C\) is not convex.

  1. Then, there exist \(\bx, \by \in \VV\) and \(t \in (0,1)\) such that

    \[ d_C(t \bx + (1-t) \by) > t d_C(\bx) + (1-t)d_C(\by). \]
  2. Let \(\bu = P_C(\bx)\) and \(\bv = P_C(\by)\). By definition, \(\bu, \bv \in C\).

  3. Then,

    \[ t d_C(\bx) + (1-t)d_C(\by) = t \| \bu - \bx \| + (1-t) \| \bv - \by \|. \]
  4. Since \(C\) is convex, hence, \(t \bu + (1-t) \bv \in C\).

  5. Since, \(d_C(t \bx + (1-t) \by)\) minimizes the distance of \(C\) from the point \(t \bx + (1-t) \by\), hence

    \[ \| t \bu + (1-t) \bv - t \bx - (1-t) \by \| \geq d_C(t \bx + (1-t) \by). \]
  6. Rewriting,

    \[ d_C(t \bx + (1-t) \by) \leq \| t (\bu - \bx) + (1-t) (\bv - \by) \| \leq t \| \bu - \bx \| + (1-t) \| \bv - \by \| \]

    due to triangle inequality.

  7. But, this leads to the contradiction

    \[ t \| \bu - \bx \| + (1-t) \| \bv - \by \| < d_C(t \bx + (1-t) \by) \leq t \| \bu - \bx \| + (1-t) \| \bv - \by \|. \]
  8. Hence, \(d_C\) must be convex.

8.2.5. Nonexpansiveness#

Definition 8.7 (Nonexpansiveness property)

Let \(\VV\) be a normed linear space. An operator \(T : \VV \to \VV\) is called nonexpansive if

\[ \| T (\bx) - T (\by) \| \leq \| \by - \bx \| \Forall \bx, \by \in \VV. \]

In other words, the distance between mapped points in \(\VV\) is always less than or equal to the distance between original points in \(\VV\).

Theorem 8.11 (Nonexpansive operators are Lipschitz continuous)

Let \(\VV\) a be normed linear space. A nonexpansive operator \(T : \VV \to \VV\) is Lipschitz continuous. Hence, it is uniformly continuous.

Proof. Recall from Definition 3.45 that if \(T\) is a Lipschitz map, then there exists \(L > 0\) such that

\[ \| T (\bx) - T (\by) \| \leq L \| \bx - \by \| \]

for every \(\bx, \by \in \VV\).

For a nonexpansive operator, such \(L = 1\). This \(T\) is indeed Lipschitz continuous. By Theorem 3.57, every Lipschitz continuous function is uniformly continuous.

Definition 8.8 (Firm nonexpansiveness property)

Let \(\VV\) be a real inner product space. An operator \(T : \VV \to \VV\) is called firmly nonexpansive if

\[ \langle T(\bx) - T(\by), \bx- \by \rangle \geq \| T(\bx) - T (\by) \|^2 \]

holds true for every \(\bx, \by \in \VV\).

Theorem 8.12 (A firmly nonexpansive operator is nonexpansive)

Let \(\VV\) be a real inner product space. Let \(T : \VV \to \VV\) be a firmly nonexpansive operator. Then, \(T\) is nonexpansive.

Proof. For every \(\bx, \by \in \VV\), we have

\[ \| T(\bx) - T (\by) \|^2 \leq \langle T(\bx) - T(\by), \bx- \by \rangle. \]

Applying Cauchy Schwartz inequality on R.H.S., we get

\[ \| T(\bx) - T (\by) \|^2 \leq \| T(\bx) - T(\by) \| \| \bx- \by \|. \]

Canceling terms, we get:

\[ \| T(\bx) - T (\by) \| \leq \| \bx- \by \| \]

which is the nonexpansive property.

Theorem 8.13 (Orthogonal projection is nonexpansive)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Let \(P_C : \VV \to \VV\) be the orthogonal projection operator as defined in Definition 8.6 is nonexpansive (and therefore continuous).

In other words,

\[ \| P_C (\bx) - P_C (\by) \| \leq \| \bx - \by \| \Forall \bx, \by \in \VV. \]

Proof. Let \(\bx, \by \in \VV\).

  1. By Theorem 8.7,

    \[ \langle \bw - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0 \Forall \bw \in C. \]
  2. In particular \(P_C(\by) \in C\). Hence,

    \[ \langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0. \]
  3. Similarly, starting with \(P_C(\bx)\), we obtain

    \[ \langle P_C(\bx) - P_C(\by), \by - P_C(\by) \rangle \leq 0. \]
  4. Adding these two inequalities, we obtain

    \[ \langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) - \by + P_C(\by) \rangle \leq 0. \]
  5. By rearranging the terms, we get

\[ \langle P_C(\by) - P_C(\bx), P_C(\by) - P_C(\bx) \rangle \leq \langle P_C(\by) - P_C(\bx), \by - \bx \rangle. \]
  1. Applying the Cauchy Schwartz inequality on the R.H.S., we obtain

\[ \| P_C(\by) - P_C(\bx) \|^2 \leq \| P_C(\by) - P_C(\bx) \| \| \by - \bx \|. \]
  1. Thus, \(P_C\) is nonexpansive.

  2. Since \(P_C\) is nonexpansive, hence \(P_C\) is continuous also.

Theorem 8.14 (Orthogonal projection is firmly nonexpansive)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Let \(P_C : \VV \to \VV\) be the orthogonal projection operator as defined in Definition 8.6. Then \(P_C\) is a firmly nonexpansive operator.

In other words,

\[ \langle P_C(\bx) - P_C(\by), \bx- \by \rangle \geq \| P_C(\bx) - P_C (\by) \|^2 \]

holds true for every \(\bx, \by \in \VV\).

Proof. Recall from Theorem 8.7 that for any \(\bu \in \VV\) and \(\bv \in C\)

\[ \langle \bv - P_C(\bu), \bu - P_C(\bu) \rangle \leq 0. \]
  1. Substituting \(\bu = \bx\) and \(\bv = P_C(\by)\), we obtain

    \[ \langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0. \]
  2. Substituting \(\bu = \by\) and \(\bv = P_C(\bx)\), we obtain

    \[ \langle P_C(\bx) - P_C(\by), \by - P_C(\by) \rangle \leq 0. \]
  3. Adding the two inequalities gives us

    \[\begin{split} & \langle P_C(\bx) - P_C(\by), \by - P_C(\by) - \bx + P_C(\bx) \rangle \leq 0 \\ & \iff \langle P_C(\bx) - P_C(\by), (\by - \bx) + (P_C(\bx) - P_C(\by)) \rangle \leq 0\\ &\iff \| P_C(\bx) - P_C(\by) \|^2 \leq \langle P_C(\bx) - P_C(\by), \bx - \by \rangle \end{split}\]

    as desired.

8.2.6. Squared Distance Function#

Definition 8.9 (Squared distance function to a nonempty set)

Let \(C\) be a nonempty subset of \(\VV\). The squared distance to set \(C\) function denoted as \(\varphi_C : \VV \to \RR\) is defined as:

\[ \varphi_C(\bx) \triangleq \frac{1}{2} d_C^2(\bx). \]

We also define \(\psi_C : \VV \to \RR\) as:

\[ \psi_C(\bx) \triangleq \frac{1}{2} \left (\| \bx \|^2 - d_C^2(\bx) \right). \]

Theorem 8.15 (Expression for \(\psi_C\))

Let \(C\) be a nonempty subset of \(\VV\). Then, the function \(\psi_C\) as defined in Definition 8.9 is given by

\[ \psi_C(\bx) = \underset{\by \in C}{\sup} \left [ \langle \by, \bx \rangle - \frac{1}{2} \| \by \|^2 \right ]. \]

Proof. We proceed as follows.

  1. Expanding on the definition of \(d_C^2\)

    \[\begin{split} d_C^2(\bx) &= \inf_{\by \in C} \| \bx - \by \|^2 \\ &= \inf_{\by \in C} \langle \bx - \by, \bx - \by \rangle \\ &= \inf_{\by \in C} (\| \bx \|^2 - 2 \langle \bx, \by \rangle + \| \by \|^2) \\ &= \inf_{\by \in C} (\| \bx \|^2 - (2 \langle \bx, \by \rangle - \| \by \|^2)) \\ &= \| \bx \|^2 - \sup_{\by \in C} (2 \langle \bx, \by \rangle - \| \by \|^2). \end{split}\]
  2. Thus,

    \[ \| \bx \|^2 - d_C^2 (\bx) = \sup_{\by \in C} ( 2 \langle \bx, \by \rangle - \| \by \|^2 ). \]
  3. Thus,

    \[ \psi_C(\bx) = \frac{1}{2} \left (\| \bx \|^2 - d_C^2(\bx) \right) = \sup_{\by \in C} \left [\langle \bx, \by \rangle - \frac{1}{2} \| \by \|^2 \right ]. \]

Theorem 8.16 (\(\psi_C\) is convex)

Let \(C\) be a nonempty subset of \(\VV\). Then, the function \(\psi_C\) as defined in Definition 8.9 is convex.

Beauty of this result is the fact that \(\psi_C\) is convex irrespective of whether \(C\) is convex or not.

Proof. We proceed as follows.

  1. For every \(\by \in C\), the function \(g_y : \VV \to \RR\), given by

    \[ g_y (\bx) = \langle \by, \bx \rangle - \frac{1}{2} \| \by \|^2 \]

    is an affine function.

  2. \(g_y\) is convex for every \(\by \in C\) due to Theorem 7.68.

  3. Now,

    \[ \psi_C (\by) = \sup{\by \in C} g_y. \]
  4. Thus, \(\psi_C\) is a pointwise supremum of convex functions.

  5. Thus, by Theorem 7.114, \(\psi_C\) is convex.

Theorem 8.17 (Squared distance function for nonempty, closed and convex sets)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Then, the squared distance function \(\varphi_C\) is given by

\[ \varphi_C(\bx) = \frac{1}{2}\| \bx - P_C(\bx) \|^2. \]

This follows directly from Theorem 8.10.

8.2.7. Gradients and Subgradients#

Theorem 8.18 (Gradient of the squared distance function)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The gradient of the squared distance function \(\varphi_C\) as defined in Definition 8.9 at \(\bx \in \VV\) is given by:

\[ \nabla \varphi_C(\bx) = \bx - P_C(\bx) \Forall \bx \in \VV. \]

Proof. We proceed as follows.

  1. Let \(\bx \in \VV\).

  2. Let \(\bz_x = \bx - P_C(\bx)\).

  3. Consider the function

    \[ g_x(\bd) = \varphi_C(\bx + \bd) - \varphi_C(\bx) - \langle \bd, \bz_x \rangle. \]
  4. If

    \[ \lim_{\bd \to \bzero} \frac{g_x(\bd)}{ \| \bd \|} = 0 \]

    then \(\bz_x\) is indeed the gradient of \(\varphi_C\) at \(\bx\).

  5. By definition of orthogonal projection, for any \(\bd \in \VV\),

    \[ \| \bx + \bd - P_C(\bx + \bd) \|^2 \leq \| \bx + \bd - P_C(\bx) \|^2 \]

    as \(P_C(\bx + \bd)\) is the nearest point to \(\bx + \bd\) in \(C\). \(P_C(\bx)\) is just another point in \(C\).

  6. Thus, for any \(\bd \in \VV\)

    \[\begin{split} g_x(\bd) &= \varphi_C(\bx + \bd) - \varphi_C(\bx) - \langle \bd, \bz_x \rangle \\ &= \frac{1}{2} \| \bx + \bd - P_C(\bx + \bd) \|^2 - \frac{1}{2} \| \bx - P_C(\bx) \|^2 - \langle \bd, \bz_x \rangle \\ &\leq \frac{1}{2} \| \bx + \bd - P_C(\bx) \|^2 - \frac{1}{2} \| \bx - P_C(\bx) \|^2 - \langle \bd, \bz_x \rangle. \end{split}\]
  7. Recall that for a norm induced by the inner product

    \[ \| \ba + \bb \|^2 = \langle \ba + \bb, \ba + \bb \rangle = \| \ba \|^2 + 2 \langle \ba, \bb \rangle + \| \bb \|^2. \]
  8. Thus,

    \[\begin{split} \| \bx + \bd - P_C(\bx)\|^2 &= \| \bd + (\bx - P_C(\bx)) \|^2\\ &= \| \bd \|^2 + \| \bx - P_C(\bx)\|^2 + 2 \langle \bd, \bx - P_C(\bx) \rangle. \end{split}\]
  9. Putting it back and simplifying, we obtain

    \[ g_x(\bd) \leq \frac{1}{2}\| \bd \|^2 + \langle \bd, \bx - P_C(\bx)\rangle - \langle \bd, \bz_x \rangle = \frac{1}{2}\| \bd \|^2. \]
  10. Proceeding similarly, we also have

    \[ g_x(-\bd) \leq \frac{1}{2}\| \bd \|^2. \]
  11. Since \(\varphi_C\) is convex, hence \(g_x\) is also convex.

  12. Thus,

    \[ 0 = g_x(\bzero) = g_x \left (\frac{1}{2} \bd + \frac{1}{2} (-\bd) \right ) \leq \frac{1}{2} g_x(\bd) + \frac{1}{2} g_x(-\bd). \]
  13. Thus,

    \[ g_x(\bd) \geq - g_x(-\bd) \geq - \frac{1}{2}\| \bd \|^2. \]
  14. Combining, we have

    \[ - \frac{1}{2}\| \bd \|^2 \leq g_x(\bd) \leq \frac{1}{2}\| \bd \|^2. \]
  15. Or, in terms of absolute values.

    \[ |g_x(\bd)| \leq \frac{1}{2}\| \bd \|^2. \]
  16. Then,

    \[ \frac{|g_x(\bd)|}{\| \bd \|} \leq \frac{1}{2}\| \bd \|. \]
  17. Thus,

    \[ \lim_{\bd \to \bzero} \frac{g_x(\bd)}{ \| \bd \|} = 0 \]
  18. Thus, \(\bz_x = \bx - P_C(\bx)\) is indeed the gradient of \(\varphi_C\) at \(\bx\).

Theorem 8.19 (Gradient of \(\psi_C\).)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The gradient of the function \(\psi_C\) as defined in Definition 8.9 at \(\bx \in \VV\) is given by:

\[ \nabla \psi_C(\bx) = P_C(\bx) \Forall \bx \in \VV. \]

Proof. We have

\[ \psi_C(\bx) = \frac{1}{2} \left (\| \bx \|^2 - d_C^2(\bx) \right) = \frac{1}{2} (\| \bx \|^2 - \varphi_C(\bx). \]

Hence,

\[ \nabla \psi_C(\bx) = \bx - \nabla \varphi_C(\bx) = \bx - (\bx - P_C(\bx)) = P_C(\bx). \]

Remark 8.5 (Distance function and square distance function relation)

We note that \(\varphi_C = g \circ d_C\) where \(g(t) = \frac{1}{2}[t]_+^2\).

\(g\) is a nonincreasing real-valued convex differentiable function. We also note that

\[ g'(t) = 2 [t]_+. \]

Theorem 8.20 (Subdifferential of the distance function)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The subdifferential of the distance function \(d_C\) is given by

\[\begin{split} \partial d_C (\bx) = \begin{cases} \left \{ \frac{\bx - P_C(\bx)}{d_C(\bx)}\right \}, & \bx \notin C\\ N_C(\bx) \cap B[\bzero, 1], & \bx \in C \end{cases}. \end{split}\]

\(N_C(\bx)\) denotes the normal cone of all vectors normal to the set \(C\) at a point \(\bx \in C\).

Since \(\partial d_C\) is a singleton for \(\bx \notin C\), hence \(d_C\) is differentiable at \(\bx \notin C\).

Proof. We can get the subdifferentials for \(d_C\) by applying the chain rule.

  1. Recall that \(\varphi_C = g \circ d_C\) where \(g(t) = \frac{1}{2}[t]_+^2\).

  2. Thus, by subdifferential chain rule (Theorem 7.229):

    \[ \partial \varphi_C (\bx) = g'(d_C(\bx)) \partial d_C(\bx) = [d_C(\bx)]_+ \partial d_C(\bx) = d_C(\bx) \partial d_C(\bx). \]

    We used the fact that \(d_C\) is nonnegative.

  3. Since \(\varphi_C\) is differentiable, hence \(\partial \varphi_C(\bx) = \{\bx - P_C(\bx) \}\).

  4. If \(\bx \notin C\), then, \(d_C(\bx) > 0\).

  5. Thus, for \(\bx \notin C\)

    \[ \partial d_C(\bx) = \left \{ \frac{\bx - P_C(\bx)}{d_C(\bx)} \right \}. \]
  6. For \(\bx \in C\), \(d_C(\bx) = 0\).

  7. We need to show that \(\partial d_C(\bx) = N_C(\bx) \cap B[\bzero, 1]\) in this case.

  8. Consider any \(\bd \in \partial d_C(\bx)\).

  9. Then, by subgradient inequality

    \[ d_C(\by) \geq d_C(\bx) + \langle \by - \bx, \bd \rangle = \langle \by - \bx, \bd \rangle \Forall \by \in \VV \]

    since \(d_C(\bx) = 0\).

  10. Then, in particular, for any \(\by \in C\)

    \[ d_C(\by) = 0 \geq \langle \by - \bx, \bd \rangle. \]
  11. Thus, \(\bd \in N_C(\bx)\).

8.2.8. Conjugates#

Theorem 8.21 (Conjugate of norm squared plus indicator)

Let \(C\) be a nonempty subset of \(\VV\). Let \(f : \VV \to \RERL\) be given by:

\[ f(\bx) = \frac{1}{2} \| \bx \|^2 + \delta_C(\bx). \]

Then, its conjugate is given by:

\[ f^*(\by) = \frac{1}{2}\| \by \|^2 - \frac{1}{2} d_C^2 (\by) = \psi_C(\by). \]

Further, if \(C\) is nonempty, closed and convex, then \(f^{**} = f\). In other words, \(\psi_C^* = f\).

Proof. Recall from Definition 7.78 that

\[ f^*(\by) = \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - f(\bx)\} \Forall \by \in \VV^*. \]

Expanding, we obtain

\[\begin{split} \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - f(\bx)\} &= \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - \frac{1}{2} \| \bx \|^2 - \delta_C(\bx)\} \\ &= \underset{\bx \in C}{\sup} \{ \langle \bx, \by \rangle - \frac{1}{2} \| \bx \|^2 \} \\ &= \psi_C(\by). \end{split}\]

The last result is due to Theorem 8.15.

If \(C\) is nonempty, closed and convex, then, \(f\) is proper closed and convex. Then, due to Theorem 7.242, the biconjugate of \(f\) is \(f\) itself.

8.2.9. Smoothness#

Recall from Definition 7.80 that a function \(f : \VV \to \RR\) is \(L\)-smooth if

\[ \| \nabla f(\bx) - \nabla f(\by)\|_* \leq L \| \bx - \by \| \Forall \bx, \by \in \VV. \]

Since the norm in this section is induced by the inner product, hence it is self dual.

Thus, the smoothness criteria becomes:

\[ \| \nabla f(\bx) - \nabla f(\by)\|_* \leq L \| \bx - \by \| \Forall \bx, \by \in \VV. \]

Theorem 8.22 (Smoothness of the squared distance function)

The squared distance function \(\varphi_C\) is 1-smooth.

Proof. Recall the definition of \(\varphi_C\) from Definition 8.9.

  1. By Theorem 8.18,

    \[ \nabla \varphi_C(\bx) = \bx - P_C(\bx). \]
  2. Hence,

    \[\begin{split} & \| \nabla \varphi_C(\bx) - \nabla \varphi_C(\by) \|^2 \\ &= \| \bx - P_C(\bx) - \by + P_C(\by) \|^2 \\ &= \| (\bx - \by) - (P_C(\bx) - P_C(\by)) \|^2 \\ &= \| \bx - \by \|^2 - 2 \langle P_C(\bx) - P_C(\by), \bx - \by \rangle + \| P_C(\bx) - P_C(\by) \|^2\\ &\leq \| \bx - \by \|^2 - 2 \| P_C(\bx) - P_C(\by) \|^2 + \| P_C(\bx) - P_C(\by) \|^2 & \text{ firm nonexpansiveness } \\ &= \| \bx - \by \|^2 - \| P_C(\bx) - P_C(\by) \|^2 \\ &\leq \| \bx - \by \|^2. \end{split}\]
  3. Thus,

    \[ \| \nabla \varphi_C(\bx) - \nabla \varphi_C(\by) \| \leq \| \bx - \by \|. \]
  4. Thus, \(\varphi_C\) is 1-smooth.

Theorem 8.23 (Smoothness of the \(\psi_C\) function)

The function \(\psi_C\) is 1-smooth.

Proof. We recall from Theorem 8.19 that \(\nabla \psi_C(\bx) = P_C(\bx)\).

Hence,

\[ \| \nabla \psi_C(\bx) - \nabla \psi_C(\by) \| = \| P_C(\bx) - P_C(\by) \| \leq \| \bx - \by \| \]

due to the nonexpansiveness property (Theorem 8.13).

8.2.10. POCS Problems#

In this section, we present some example optimization problems which can be converted into an equivalent projection on a convex set problem.

8.2.10.1. Equality Constrained Quadratic Programming#

Quadratic programming problems are discussed extensively in Quadratic Programming. Here we discuss a specific form of minimizing a quadratic function subject to linear equality constraints.

Example 8.1 (Equality constrained quadratic programming)

We consider the quadratic programming problem

\[\begin{split} & \text{minimize } & & \frac{1}{2} \| \bx \|^2 + \bc^T \bx \\ & \text{subject to } & & \bA \bx = \bzero. \end{split}\]

where

  1. \(\bx \in \RR^n\) is the optimization variable.

  2. \(\bc \in \RR^n\) is a given vector.

  3. \(\bA \in \RR^{m \times n}\) is an \(m \times n\) matrix of rank \(m\). Assume that \(m < n\).

We proceed towards converting this problem into a projection on a convex set problem as follows.

  1. By adding a constant term \(\frac{1}{2} \| \bc \|^2\) to the objective function, we obtain an equivalent problem.

    \[\begin{split} & \text{minimize } & & \frac{1}{2} \| \bc + \bx \|^2 \\ & \text{subject to } & & \bA \bx = \bzero. \end{split}\]
  2. The set \(C = \{ \bx \ST \bA \bx = \bzero \}\) is the null space of the matrix \(\bA\) which is linear subspace, hence a nonempty, closed and convex set.

  3. Minimizing \(\frac{1}{2} \| \bc + \bx \|^2\) is equivalent to minimizing \(\| (-\bc) - \bx \|\) subject to \(\bx \in C\).

  4. \(\| (-\bc) - \bx \|\) is \(d(-\bc, \bx)\), the distance between \(-\bc\) and a point \(\bx\).

  5. Thus, we are minimizing the distance of the point \(-\bc\) among \(\bx \in C\).

  6. This is nothing but the distance of \(-\bc\) from the set \(C\).

  7. Since, \(C\) is nonempty, close and convex, hence, there is a unique \(\bx^*\) which minimizes the distance due to the projection theorem.

  8. Thus, the solution is the projection of the vector \(-\bc\) on the subspace \(C\).

  9. By Theorem 8.8, \(\bx^*\) is the unique projection of \(-\bc\) on \(C\) if and only if \(-\bc - \bx^* \in C^{\perp}\).

  10. In other words, $\( \langle \bc + \bx^*, \bx \rangle = 0 \Forall \bx \in C. \)$

  11. A closed form solution to this problem does exist given by

    \[ \bx^* = - (\bI - \bA^T (\bA \bA^T)^{-1} bA) \bc. \]
  12. It is indeed the unique solution to this quadratic programming problem.