10.2. Projection on Convex Sets#

We will assume \(\VV\) to be a real \(n\)-dimensional vector space with an inner product \(\langle \cdot, \cdot \rangle\), the induced norm \(\| \cdot \|\), and the corresponding dual norm \(\| \cdot \|_*\) for the dual space \(\VV^*\).

We are interested in mapping a point \(\bx\) to the nearest point in a set \(C\). In general, this problem may have zero, one or multiple solutions. However, in the special case where \(C\) is a nonempty, closed and convex set, there is exactly one point in \(C\) which is nearest to a given point \(\bx\). This nearest point is called the projection of \(\bx\) onto \(C\).

Main references for this section are [6, 9].
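
As a quick numerical illustration of why convexity matters, the following minimal sketch (assuming NumPy; the helper `proj_ball` is ours, not from the references) contrasts the closed unit ball, a nonempty, closed and convex set with a unique nearest point, with the unit circle, a nonconvex set for which the origin has infinitely many nearest points.

```python
import numpy as np

def proj_ball(x):
    """Projection onto the closed unit ball: the unique nearest point."""
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

print(proj_ball(np.array([3.0, 4.0])))   # [0.6 0.8], unique

# For the nonconvex unit circle, every point of the circle is nearest
# to the origin: a nearest point exists but is not unique.
thetas = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
circle = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
print(np.linalg.norm(circle, axis=1))    # all distances equal 1.0
```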

10.2.1. Projection Theorem#

Theorem 10.6 (Projection theorem)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). For every \(\bx \in \VV\), there exists a unique vector that minimizes \(\| \bz - \bx \|\) over all \(\bz \in C\). This vector is called the projection of \(\bx\) on \(C\).

Proof. We fix some \(\bx \in \VV\) and choose an element \(\bw \in C\).

  1. Consider the function

    \[ g(\bz) = \frac{1}{2} \| \bz - \bx \|^2. \]
  2. Minimizing \(\| \bz - \bx \|\) over all \(\bz \in C\) is equivalent to minimizing \(g(\bz)\) over the set

    \[ D = \{ \bz \in C \ST \| \bz - \bx \| \leq \| \bw - \bx \| \}. \]
  3. We note that \(D\) is a nonempty compact set and \(g\) is a closed (equivalently, lower semicontinuous) and coercive function.

  4. By Weierstrass’ theorem Corollary 8.2, the set of minimizers for \(g\) is nonempty and compact.

We now show that the minimizer is unique.

  1. \(g\) is a strictly convex function because its Hessian is the identity matrix, which is positive definite.

  2. Hence, the minimizer is unique due to Theorem 10.2.

10.2.2. Orthogonal Projection#

Definition 10.6 (Orthogonal Projection Mapping)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The orthogonal projection mapping \(P_C : \VV \to \VV\) is defined by:

\[ P_C(\bx) \triangleq \underset{\by \in C}{\argmin} \| \by - \bx \| \Forall \bx \in \VV. \]

This mapping is well defined since the projection is unique for a nonempty, closed and convex set \(C\) due to Theorem 10.6.

The vector \(P_C(\bx)\) is called the projection of \(\bx\) on the set \(C\).
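
For many concrete convex sets, \(P_C\) admits a simple closed form. The following minimal sketch (assuming NumPy; the helper names are illustrative) implements two standard cases: the box \(C = [\bl, \bu]\), where the projection clips each coordinate, and the halfspace \(C = \{\bx \ST \ba^T \bx \leq b\}\).

```python
import numpy as np

def proj_box(x, l, u):
    """P_C for the box C = [l, u]: clip each coordinate."""
    return np.clip(x, l, u)

def proj_halfspace(x, a, b):
    """P_C for the halfspace C = {x : a^T x <= b}."""
    viol = a @ x - b
    if viol <= 0:
        return x                        # x is already in C
    return x - (viol / (a @ a)) * a     # step along the normal vector a

x = np.array([2.0, -3.0, 0.5])
print(proj_box(x, -np.ones(3), np.ones(3)))               # [ 1.  -1.   0.5]
print(proj_halfspace(x, np.array([1.0, 0.0, 0.0]), 1.0))  # [ 1.  -3.   0.5]
```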

10.2.3. Characterization#

Theorem 10.7 (Orthogonal projection characterization)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). For every vector \(\bx \in \VV\), a vector \(\bz \in C\) is its projection if and only if

\[ \langle \by - \bz, \bx - \bz \rangle \leq 0 \Forall \by \in C. \]

This result is also known as the second projection theorem [6].

Proof. Assume that for some \(\bz \in C\), the inequality \(\langle \by - \bz, \bx - \bz \rangle \leq 0\) holds for every \(\by \in C\).

  1. For any \(\by \in C\)

    \[\begin{split} \| \by - \bx \|^2 &= \| (\by - \bz) - (\bx - \bz) \|^2 \\ &= \| \by - \bz \|^2 + \| \bz - \bx \|^2 - 2 \langle \by - \bz, \bx - \bz \rangle \\ &\geq \| \bz - \bx \|^2 - 2 \langle \by - \bz, \bx - \bz \rangle. \end{split}\]
  2. Thus,

    \[ \| \by - \bx \|^2 \geq \| \bz - \bx \|^2 \Forall \by \in C. \]
  3. Thus, \(\bz\) is indeed the projection of \(\bx\) on \(C\).

Conversely, assume that \(\bz\) is the projection of \(\bx\) on \(C\).

  1. Let \(\by \in C\) be arbitrary.

  2. For any \(t \geq 0\), define \(\by_t = t \by + (1 -t ) \bz\).

  3. Then, we have

    \[\begin{split} \| \bx - \by_t \|^2 &= \| t \bx + (1-t) \bx - t \by - (1 -t ) \bz \|^2 \\ &= \| t (\bx - \by) + (1- t) (\bx - \bz) \|^2\\ &= t^2 \| \bx - \by \|^2 + (1-t)^2 \| \bx - \bz \|^2 + 2 t (1-t) \langle \bx - \by, \bx - \bz \rangle. \end{split}\]
  4. Viewing \(\| \bx - \by_t \|^2\) as a function of \(t\) and differentiating w.r.t. \(t\), we have

    \[\begin{split} \left . \frac{d}{d t} \| \bx - \by_t \|^2 \right |_{t = 0} &= -2 \| \bx - \bz \|^2 + 2 \langle \bx - \by, \bx - \bz \rangle\\ &= -2 \langle \bx - \bz, \bx - \bz \rangle + 2 \langle \bx - \by, \bx - \bz \rangle\\ &= -2 \langle \by - \bz, \bx - \bz \rangle. \end{split}\]
  5. Since \(\by_t \in C\) for every \(t \in [0,1]\) (as \(C\) is convex) and \(\by_0 = \bz\) is the projection, \(t=0\) minimizes \(\| \bx - \by_t \|^2\) over \(t \in [0,1]\). Hence we must have

    \[ \left . \frac{d}{d t} \| \bx - \by_t \|^2 \right |_{t = 0} \geq 0. \]
  6. Thus, we require that

    \[ \langle \by - \bz, \bx - \bz \rangle \leq 0 \]

    must hold true for every \(\by \in C\).

The following is an alternative proof based on results from Constrained Optimization I. This proof is specific to the case where \(\VV = \RR^n\).

Proof. Define a function \(f: \RR^n \to \RR\) as

\[ f(\by) = \| \by - \bx \|^2. \]

Then, the projection problem can be cast as an optimization problem

\[\begin{split} & \text{minimize } & & f(\by) \\ & \text{subject to } & & \by \in C. \end{split}\]

Note that the gradient of \(f\) is given by

\[ \nabla f (\by) = \nabla \langle \by - \bx, \by - \bx \rangle = \nabla (\langle \by, \by \rangle - 2 \langle \by, \bx \rangle + \langle \bx, \bx \rangle) = 2 (\by - \bx). \]

By Theorem 10.47, \(\bz\) is an optimal solution if and only if

\[ \nabla f(\bz)^T (\by - \bz) \geq 0 \Forall \by \in C. \]

In other words

\[ 2 (\bz - \bx)^T (\by - \bz) \geq 0 \Forall \by \in C. \]

We can simplify this as

\[ \langle \bx - \bz, \by - \bz \rangle \leq 0 \Forall \by \in C. \]
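
The characterization is easy to test numerically. The sketch below (assuming NumPy; the box set and the closed-form clipping projection are illustrative) samples points \(\by\) of a box \(C\) and checks that \(\langle \by - \bz, \bx - \bz \rangle \leq 0\) holds with \(\bz = P_C(\bx)\).

```python
import numpy as np

rng = np.random.default_rng(0)
l, u = -np.ones(4), np.ones(4)
x = rng.normal(size=4) * 3.0
z = np.clip(x, l, u)                    # z = P_C(x) for the box C = [l, u]

# <y - z, x - z> <= 0 must hold for every y in C.
ys = rng.uniform(l, u, size=(1000, 4))  # random points of C
vals = (ys - z) @ (x - z)
print(vals.max() <= 1e-12)              # True
```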

Theorem 10.8 (Orthogonal projection on an affine subspace)

Let \(C\) be an affine subspace of \(\VV\). Let \(S\) be the linear subspace parallel to \(C\). For every vector \(\bx \in \VV\), a vector \(\bz \in C\) is its projection if and only if

\[ \bx - \bz \in S^{\perp}. \]

Proof. Since \(C\) is an affine subspace of \(\VV\), it is nonempty, convex and closed (as \(\VV\) is finite dimensional).

  1. By Theorem 10.7, \(\bz\) is the projection of \(\bx\) on \(C\) if and only if for every \(\by \in C\), we have

    \[ \langle \by - \bz, \bx - \bz \rangle \leq 0. \]
  2. But \(\by \in C\) if and only if \(\by - \bz \in S\).

  3. Hence the condition is equivalent to

    \[ \langle \bw, \bx - \bz \rangle \leq 0 \Forall \bw \in S. \]
  4. But then, it must be an equality since \(\bw\) and \(-\bw\) both belong to \(S\). Thus, we have

    \[ \langle \bw, \bx - \bz \rangle = 0 \Forall \bw \in S. \]
  5. In other words, \(\bx - \bz \in S^{\perp}\).
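
As a hedged numerical sketch of this condition: for \(C = \{\bx \ST \bA \bx = \bb\}\) with \(\bA\) of full row rank, the projection has the closed form \(P_C(\bx) = \bx - \bA^T (\bA \bA^T)^{-1} (\bA \bx - \bb)\) (a standard formula, stated here as an assumption), and \(\bx - P_C(\bx)\) is orthogonal to the parallel subspace \(S\), the null space of \(\bA\).

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 5))                 # full row rank (generically)
b = rng.normal(size=2)
x = rng.normal(size=5)

# Closed-form projection onto C = {x : A x = b}.
z = x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)
print(np.allclose(A @ z, b))                # z lies in C

# x - z must be orthogonal to S = null(A); get a basis of S via SVD.
_, _, Vt = np.linalg.svd(A)
S_basis = Vt[2:]                            # rows spanning null(A)
print(np.allclose(S_basis @ (x - z), 0.0))  # True
```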

10.2.4. Distance Function#

Recall that the distance of a point \(\bx \in \VV\) from a set \(C\) is defined as

\[ d_C(\bx) \triangleq \underset{\by \in C}{\inf} \| \bx - \by \|. \]

Theorem 10.9 (Distance function for nonempty, closed and convex set)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Then the distance function \(d_C : \VV \to \RR\), giving the distance of a point \(\bx \in \VV\) from the set \(C\), satisfies

\[ d_C(\bx) = \| \bx - P_C(\bx) \|. \]

Proof. By Theorem 10.6, there exists a unique point \(P_C(\bx)\) which minimizes the distance between \(\bx\) and \(C\). Hence

\[ d_C(\bx) = \| \bx - P_C(\bx) \| \]

must hold true.

Theorem 10.10 (Distance function for nonempty, closed and convex set is convex)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Let \(d_C : \VV \to \RR\) be the distance function to the set \(C\) as in Theorem 10.9. Then, \(d_C\) is convex.

Proof. Assume, for contradiction, that \(d_C\) is not convex.

  1. Then, there exist \(\bx, \by \in \VV\) and \(t \in (0,1)\) such that

    \[ d_C(t \bx + (1-t) \by) > t d_C(\bx) + (1-t)d_C(\by). \]
  2. Let \(\bu = P_C(\bx)\) and \(\bv = P_C(\by)\). By definition, \(\bu, \bv \in C\).

  3. Then,

    \[ t d_C(\bx) + (1-t)d_C(\by) = t \| \bu - \bx \| + (1-t) \| \bv - \by \|. \]
  4. Since \(C\) is convex, hence, \(t \bu + (1-t) \bv \in C\).

  5. Since \(d_C(t \bx + (1-t) \by)\) is the minimum distance between the point \(t \bx + (1-t) \by\) and points of \(C\), we have

    \[ \| t \bu + (1-t) \bv - t \bx - (1-t) \by \| \geq d_C(t \bx + (1-t) \by). \]
  6. Rewriting,

    \[ d_C(t \bx + (1-t) \by) \leq \| t (\bu - \bx) + (1-t) (\bv - \by) \| \leq t \| \bu - \bx \| + (1-t) \| \bv - \by \| \]

    due to triangle inequality.

  7. But, this leads to the contradiction

    \[ t \| \bu - \bx \| + (1-t) \| \bv - \by \| < d_C(t \bx + (1-t) \by) \leq t \| \bu - \bx \| + (1-t) \| \bv - \by \|. \]
  8. Hence, \(d_C\) must be convex.
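
The convexity of \(d_C\) can be probed numerically along a segment; a minimal sketch (assuming NumPy; the box set is illustrative):

```python
import numpy as np

def d_box(x, l, u):
    """Distance to the box [l, u], via d_C(x) = ||x - P_C(x)||."""
    return np.linalg.norm(x - np.clip(x, l, u))

rng = np.random.default_rng(2)
l, u = -np.ones(3), np.ones(3)
x, y = rng.normal(size=3) * 4, rng.normal(size=3) * 4
for t in np.linspace(0.0, 1.0, 11):
    lhs = d_box(t * x + (1 - t) * y, l, u)
    rhs = t * d_box(x, l, u) + (1 - t) * d_box(y, l, u)
    assert lhs <= rhs + 1e-12   # the convexity inequality holds
print("d_C is convex along this segment")
```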

10.2.5. Nonexpansiveness#

Definition 10.7 (Nonexpansiveness property)

Let \(\VV\) be a normed linear space. An operator \(T : \VV \to \VV\) is called nonexpansive if

\[ \| T (\bx) - T (\by) \| \leq \| \by - \bx \| \Forall \bx, \by \in \VV. \]

In other words, the distance between mapped points in \(\VV\) is always less than or equal to the distance between original points in \(\VV\).

Theorem 10.11 (Nonexpansive operators are Lipschitz continuous)

Let \(\VV\) be a normed linear space. A nonexpansive operator \(T : \VV \to \VV\) is Lipschitz continuous. Hence, it is uniformly continuous.

Proof. Recall from Definition 3.45 that if \(T\) is a Lipschitz map, then there exists \(L > 0\) such that

\[ \| T (\bx) - T (\by) \| \leq L \| \bx - \by \| \]

for every \(\bx, \by \in \VV\).

For a nonexpansive operator, \(L = 1\) works. Thus, \(T\) is indeed Lipschitz continuous. By Theorem 3.57, every Lipschitz continuous function is uniformly continuous.

Definition 10.8 (Firm nonexpansiveness property)

Let \(\VV\) be a real inner product space. An operator \(T : \VV \to \VV\) is called firmly nonexpansive if

\[ \langle T(\bx) - T(\by), \bx- \by \rangle \geq \| T(\bx) - T (\by) \|^2 \]

holds true for every \(\bx, \by \in \VV\).

Theorem 10.12 (A firmly nonexpansive operator is nonexpansive)

Let \(\VV\) be a real inner product space. Let \(T : \VV \to \VV\) be a firmly nonexpansive operator. Then, \(T\) is nonexpansive.

Proof. For every \(\bx, \by \in \VV\), we have

\[ \| T(\bx) - T (\by) \|^2 \leq \langle T(\bx) - T(\by), \bx- \by \rangle. \]

Applying the Cauchy-Schwarz inequality on the R.H.S., we get

\[ \| T(\bx) - T (\by) \|^2 \leq \| T(\bx) - T(\by) \| \| \bx- \by \|. \]

If \(T(\bx) = T(\by)\), the inequality holds trivially. Otherwise, dividing both sides by \(\| T(\bx) - T(\by) \|\), we get:

\[ \| T(\bx) - T (\by) \| \leq \| \bx- \by \| \]

which is the nonexpansive property.

Theorem 10.13 (Orthogonal projection is nonexpansive)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Then the orthogonal projection operator \(P_C : \VV \to \VV\) as defined in Definition 10.6 is nonexpansive (and therefore continuous).

In other words,

\[ \| P_C (\bx) - P_C (\by) \| \leq \| \bx - \by \| \Forall \bx, \by \in \VV. \]

Proof. Let \(\bx, \by \in \VV\).

  1. By Theorem 10.7,

    \[ \langle \bw - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0 \Forall \bw \in C. \]
  2. In particular \(P_C(\by) \in C\). Hence,

    \[ \langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0. \]
  3. Similarly, starting with \(P_C(\bx)\), we obtain

    \[ \langle P_C(\bx) - P_C(\by), \by - P_C(\by) \rangle \leq 0. \]
  4. Adding these two inequalities, we obtain

    \[ \langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) - \by + P_C(\by) \rangle \leq 0. \]
  5. By rearranging the terms, we get

    \[ \langle P_C(\by) - P_C(\bx), P_C(\by) - P_C(\bx) \rangle \leq \langle P_C(\by) - P_C(\bx), \by - \bx \rangle. \]
  6. Applying the Cauchy-Schwarz inequality on the R.H.S., we obtain

    \[ \| P_C(\by) - P_C(\bx) \|^2 \leq \| P_C(\by) - P_C(\bx) \| \| \by - \bx \|. \]
  7. Thus, \(P_C\) is nonexpansive.

  8. Since \(P_C\) is nonexpansive, it is also continuous.

Theorem 10.14 (Orthogonal projection is firmly nonexpansive)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Let \(P_C : \VV \to \VV\) be the orthogonal projection operator as defined in Definition 10.6. Then \(P_C\) is a firmly nonexpansive operator.

In other words,

\[ \langle P_C(\bx) - P_C(\by), \bx- \by \rangle \geq \| P_C(\bx) - P_C (\by) \|^2 \]

holds true for every \(\bx, \by \in \VV\).

Proof. Recall from Theorem 10.7 that for any \(\bu \in \VV\) and \(\bv \in C\)

\[ \langle \bv - P_C(\bu), \bu - P_C(\bu) \rangle \leq 0. \]
  1. Substituting \(\bu = \bx\) and \(\bv = P_C(\by)\), we obtain

    \[ \langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0. \]
  2. Substituting \(\bu = \by\) and \(\bv = P_C(\bx)\), we obtain

    \[ \langle P_C(\bx) - P_C(\by), \by - P_C(\by) \rangle \leq 0. \]
  3. Adding the two inequalities gives us

    \[\begin{split} & \langle P_C(\bx) - P_C(\by), \by - P_C(\by) - \bx + P_C(\bx) \rangle \leq 0 \\ & \iff \langle P_C(\bx) - P_C(\by), (\by - \bx) + (P_C(\bx) - P_C(\by)) \rangle \leq 0\\ &\iff \| P_C(\bx) - P_C(\by) \|^2 \leq \langle P_C(\bx) - P_C(\by), \bx - \by \rangle \end{split}\]

    as desired.
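
Both nonexpansiveness and firm nonexpansiveness can be sanity-checked numerically. A minimal sketch (assuming NumPy; the box projection is illustrative) tests the two inequalities on random pairs:

```python
import numpy as np

rng = np.random.default_rng(3)
l, u = -np.ones(5), np.ones(5)
P = lambda v: np.clip(v, l, u)          # P_C for the box C = [l, u]

for _ in range(1000):
    x, y = rng.normal(size=5) * 3, rng.normal(size=5) * 3
    Px, Py = P(x), P(y)
    lhs = np.dot(Px - Py, x - y)        # <P(x) - P(y), x - y>
    assert lhs >= np.dot(Px - Py, Px - Py) - 1e-12  # firm nonexpansiveness
    assert np.linalg.norm(Px - Py) <= np.linalg.norm(x - y) + 1e-12
print("firmly nonexpansive on all sampled pairs")
```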

10.2.6. Squared Distance Function#

Definition 10.9 (Squared distance function to a nonempty set)

Let \(C\) be a nonempty subset of \(\VV\). The squared distance function to the set \(C\), denoted \(\varphi_C : \VV \to \RR\), is defined as:

\[ \varphi_C(\bx) \triangleq \frac{1}{2} d_C^2(\bx). \]

We also define \(\psi_C : \VV \to \RR\) as:

\[ \psi_C(\bx) \triangleq \frac{1}{2} \left (\| \bx \|^2 - d_C^2(\bx) \right). \]

Theorem 10.15 (Expression for \(\psi_C\))

Let \(C\) be a nonempty subset of \(\VV\). Then, the function \(\psi_C\) as defined in Definition 10.9 is given by

\[ \psi_C(\bx) = \underset{\by \in C}{\sup} \left [ \langle \by, \bx \rangle - \frac{1}{2} \| \by \|^2 \right ]. \]

Proof. We proceed as follows.

  1. Expanding on the definition of \(d_C^2\)

    \[\begin{split} d_C^2(\bx) &= \inf_{\by \in C} \| \bx - \by \|^2 \\ &= \inf_{\by \in C} \langle \bx - \by, \bx - \by \rangle \\ &= \inf_{\by \in C} (\| \bx \|^2 - 2 \langle \bx, \by \rangle + \| \by \|^2) \\ &= \inf_{\by \in C} (\| \bx \|^2 - (2 \langle \bx, \by \rangle - \| \by \|^2)) \\ &= \| \bx \|^2 - \sup_{\by \in C} (2 \langle \bx, \by \rangle - \| \by \|^2). \end{split}\]
  2. Thus,

    \[ \| \bx \|^2 - d_C^2 (\bx) = \sup_{\by \in C} ( 2 \langle \bx, \by \rangle - \| \by \|^2 ). \]
  3. Thus,

    \[ \psi_C(\bx) = \frac{1}{2} \left (\| \bx \|^2 - d_C^2(\bx) \right) = \sup_{\by \in C} \left [\langle \bx, \by \rangle - \frac{1}{2} \| \by \|^2 \right ]. \]

Theorem 10.16 (\(\psi_C\) is convex)

Let \(C\) be a nonempty subset of \(\VV\). Then, the function \(\psi_C\) as defined in Definition 10.9 is convex.

The beauty of this result is that \(\psi_C\) is convex irrespective of whether \(C\) is convex or not.

Proof. We proceed as follows.

  1. For every \(\by \in C\), the function \(g_y : \VV \to \RR\), given by

    \[ g_y (\bx) = \langle \by, \bx \rangle - \frac{1}{2} \| \by \|^2 \]

    is an affine function.

  2. \(g_y\) is convex for every \(\by \in C\) due to Theorem 9.68.

  3. Now,

    \[ \psi_C (\bx) = \underset{\by \in C}{\sup} g_y(\bx). \]
  4. Thus, \(\psi_C\) is a pointwise supremum of convex functions.

  5. Thus, by Theorem 9.114, \(\psi_C\) is convex.
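
Since these two results hold for an arbitrary nonempty \(C\), they can be illustrated exactly with a finite (hence nonconvex) set, where the infimum and supremum are computed by enumeration. A minimal sketch (assuming NumPy; the finite set is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
C = rng.normal(size=(20, 2))           # a finite, nonconvex set

def psi(x):
    """psi_C(x) = (||x||^2 - d_C(x)^2) / 2, with d_C by enumeration."""
    d2 = np.min(np.sum((C - x) ** 2, axis=1))
    return 0.5 * (x @ x - d2)

def psi_sup(x):
    """The sup expression of Theorem 10.15, computed by enumeration."""
    return np.max(C @ x - 0.5 * np.sum(C ** 2, axis=1))

x, y = rng.normal(size=2), rng.normal(size=2)
print(np.isclose(psi(x), psi_sup(x)))  # the two expressions agree

# Midpoint convexity of psi_C despite C being nonconvex:
print(psi(0.5 * x + 0.5 * y) <= 0.5 * psi(x) + 0.5 * psi(y) + 1e-12)
```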

Theorem 10.17 (Squared distance function for nonempty, closed and convex sets)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Then, the squared distance function \(\varphi_C\) is given by

\[ \varphi_C(\bx) = \frac{1}{2}\| \bx - P_C(\bx) \|^2. \]

This follows directly from Theorem 10.9.

10.2.7. Gradients and Subgradients#

Theorem 10.18 (Gradient of the squared distance function)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The gradient of the squared distance function \(\varphi_C\) as defined in Definition 10.9 at \(\bx \in \VV\) is given by:

\[ \nabla \varphi_C(\bx) = \bx - P_C(\bx) \Forall \bx \in \VV. \]

Proof. We proceed as follows.

  1. Let \(\bx \in \VV\).

  2. Let \(\bz_x = \bx - P_C(\bx)\).

  3. Consider the function

    \[ g_x(\bd) = \varphi_C(\bx + \bd) - \varphi_C(\bx) - \langle \bd, \bz_x \rangle. \]
  4. If

    \[ \lim_{\bd \to \bzero} \frac{g_x(\bd)}{ \| \bd \|} = 0 \]

    then \(\bz_x\) is indeed the gradient of \(\varphi_C\) at \(\bx\).

  5. By definition of orthogonal projection, for any \(\bd \in \VV\),

    \[ \| \bx + \bd - P_C(\bx + \bd) \|^2 \leq \| \bx + \bd - P_C(\bx) \|^2 \]

    as \(P_C(\bx + \bd)\) is the nearest point to \(\bx + \bd\) in \(C\). \(P_C(\bx)\) is just another point in \(C\).

  6. Thus, for any \(\bd \in \VV\)

    \[\begin{split} g_x(\bd) &= \varphi_C(\bx + \bd) - \varphi_C(\bx) - \langle \bd, \bz_x \rangle \\ &= \frac{1}{2} \| \bx + \bd - P_C(\bx + \bd) \|^2 - \frac{1}{2} \| \bx - P_C(\bx) \|^2 - \langle \bd, \bz_x \rangle \\ &\leq \frac{1}{2} \| \bx + \bd - P_C(\bx) \|^2 - \frac{1}{2} \| \bx - P_C(\bx) \|^2 - \langle \bd, \bz_x \rangle. \end{split}\]
  7. Recall that for a norm induced by the inner product

    \[ \| \ba + \bb \|^2 = \langle \ba + \bb, \ba + \bb \rangle = \| \ba \|^2 + 2 \langle \ba, \bb \rangle + \| \bb \|^2. \]
  8. Thus,

    \[\begin{split} \| \bx + \bd - P_C(\bx)\|^2 &= \| \bd + (\bx - P_C(\bx)) \|^2\\ &= \| \bd \|^2 + \| \bx - P_C(\bx)\|^2 + 2 \langle \bd, \bx - P_C(\bx) \rangle. \end{split}\]
  9. Putting it back and simplifying, we obtain

    \[ g_x(\bd) \leq \frac{1}{2}\| \bd \|^2 + \langle \bd, \bx - P_C(\bx)\rangle - \langle \bd, \bz_x \rangle = \frac{1}{2}\| \bd \|^2. \]
  10. Proceeding similarly, we also have

    \[ g_x(-\bd) \leq \frac{1}{2}\| \bd \|^2. \]
  11. Since \(\varphi_C\) is convex, \(g_x\) is also convex (it is the sum of \(\bd \mapsto \varphi_C(\bx + \bd)\) and an affine function of \(\bd\)).

  12. Thus,

    \[ 0 = g_x(\bzero) = g_x \left (\frac{1}{2} \bd + \frac{1}{2} (-\bd) \right ) \leq \frac{1}{2} g_x(\bd) + \frac{1}{2} g_x(-\bd). \]
  13. Thus,

    \[ g_x(\bd) \geq - g_x(-\bd) \geq - \frac{1}{2}\| \bd \|^2. \]
  14. Combining, we have

    \[ - \frac{1}{2}\| \bd \|^2 \leq g_x(\bd) \leq \frac{1}{2}\| \bd \|^2. \]
  15. Or, in terms of absolute values.

    \[ |g_x(\bd)| \leq \frac{1}{2}\| \bd \|^2. \]
  16. Then, for \(\bd \neq \bzero\),

    \[ \frac{|g_x(\bd)|}{\| \bd \|} \leq \frac{1}{2}\| \bd \|. \]
  17. Thus,

    \[ \lim_{\bd \to \bzero} \frac{g_x(\bd)}{ \| \bd \|} = 0 \]
  18. Thus, \(\bz_x = \bx - P_C(\bx)\) is indeed the gradient of \(\varphi_C\) at \(\bx\).
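
A finite-difference check of this gradient formula; a minimal sketch (assuming NumPy, with the illustrative box projection):

```python
import numpy as np

l, u = -np.ones(3), np.ones(3)
phi = lambda v: 0.5 * np.linalg.norm(v - np.clip(v, l, u)) ** 2

rng = np.random.default_rng(5)
x = rng.normal(size=3) * 4
grad = x - np.clip(x, l, u)               # claimed gradient x - P_C(x)

eps = 1e-6
num = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])      # central differences
print(np.allclose(num, grad, atol=1e-5))  # True
```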

Theorem 10.19 (Gradient of \(\psi_C\))

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The gradient of the function \(\psi_C\) as defined in Definition 10.9 at \(\bx \in \VV\) is given by:

\[ \nabla \psi_C(\bx) = P_C(\bx) \Forall \bx \in \VV. \]

Proof. We have

\[ \psi_C(\bx) = \frac{1}{2} \left (\| \bx \|^2 - d_C^2(\bx) \right) = \frac{1}{2} \| \bx \|^2 - \varphi_C(\bx). \]

Hence,

\[ \nabla \psi_C(\bx) = \bx - \nabla \varphi_C(\bx) = \bx - (\bx - P_C(\bx)) = P_C(\bx). \]

Remark 10.5 (Distance function and squared distance function relation)

We note that \(\varphi_C = g \circ d_C\) where \(g(t) = \frac{1}{2}[t]_+^2\).

\(g\) is a nondecreasing real-valued convex differentiable function. We also note that

\[ g'(t) = [t]_+. \]

Theorem 10.20 (Subdifferential of the distance function)

Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The subdifferential of the distance function \(d_C\) is given by

\[\begin{split} \partial d_C (\bx) = \begin{cases} \left \{ \frac{\bx - P_C(\bx)}{d_C(\bx)}\right \}, & \bx \notin C\\ N_C(\bx) \cap B[\bzero, 1], & \bx \in C \end{cases}. \end{split}\]

\(N_C(\bx)\) denotes the normal cone of all vectors normal to the set \(C\) at a point \(\bx \in C\).

Since \(\partial d_C(\bx)\) is a singleton for every \(\bx \notin C\), \(d_C\) is differentiable at every \(\bx \notin C\).

Proof. We can get the subdifferentials for \(d_C\) by applying the chain rule.

  1. Recall that \(\varphi_C = g \circ d_C\) where \(g(t) = \frac{1}{2}[t]_+^2\).

  2. Thus, by subdifferential chain rule (Theorem 9.229):

    \[ \partial \varphi_C (\bx) = g'(d_C(\bx)) \partial d_C(\bx) = [d_C(\bx)]_+ \partial d_C(\bx) = d_C(\bx) \partial d_C(\bx). \]

    We used the fact that \(d_C\) is nonnegative.

  3. Since \(\varphi_C\) is differentiable, \(\partial \varphi_C(\bx) = \{\bx - P_C(\bx) \}\).

  4. If \(\bx \notin C\), then, \(d_C(\bx) > 0\).

  5. Thus, for \(\bx \notin C\)

    \[ \partial d_C(\bx) = \left \{ \frac{\bx - P_C(\bx)}{d_C(\bx)} \right \}. \]
  6. For \(\bx \in C\), \(d_C(\bx) = 0\).

  7. We need to show that \(\partial d_C(\bx) = N_C(\bx) \cap B[\bzero, 1]\) in this case.

  8. Consider any \(\bd \in \partial d_C(\bx)\).

  9. Then, by subgradient inequality

    \[ d_C(\by) \geq d_C(\bx) + \langle \by - \bx, \bd \rangle = \langle \by - \bx, \bd \rangle \Forall \by \in \VV \]

    since \(d_C(\bx) = 0\).

  10. Then, in particular, for any \(\by \in C\)

    \[ d_C(\by) = 0 \geq \langle \by - \bx, \bd \rangle. \]
  11. Thus, \(\bd \in N_C(\bx)\).

  12. Moreover, since \(\bx \in C\), we have \(d_C(\by) \leq \| \by - \bx \|\) for every \(\by \in \VV\). Combined with the subgradient inequality, this gives \(\langle \by - \bx, \bd \rangle \leq \| \by - \bx \|\) for every \(\by \in \VV\).

  13. Choosing \(\by = \bx + \bd\) yields \(\| \bd \|^2 \leq \| \bd \|\). Hence \(\| \bd \| \leq 1\), i.e., \(\bd \in B[\bzero, 1]\).

  14. Conversely, let \(\bd \in N_C(\bx) \cap B[\bzero, 1]\). For any \(\by \in \VV\), the Cauchy-Schwarz inequality and \(\| \bd \| \leq 1\) give \(\langle \by - P_C(\by), \bd \rangle \leq \| \by - P_C(\by) \| = d_C(\by)\), while \(\langle P_C(\by) - \bx, \bd \rangle \leq 0\) since \(P_C(\by) \in C\) and \(\bd \in N_C(\bx)\).

  15. Adding the two inequalities, \(d_C(\by) \geq \langle \by - \bx, \bd \rangle = d_C(\bx) + \langle \by - \bx, \bd \rangle\) for every \(\by \in \VV\). Thus, \(\bd \in \partial d_C(\bx)\).
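
For \(\bx \notin C\), the theorem says \(d_C\) is differentiable with the unit-norm gradient \((\bx - P_C(\bx))/d_C(\bx)\). A minimal finite-difference sketch (assuming NumPy; the box set is illustrative):

```python
import numpy as np

l, u = -np.ones(3), np.ones(3)
d = lambda v: np.linalg.norm(v - np.clip(v, l, u))   # d_C for the box

x = np.array([2.5, -3.0, 0.2])                       # a point outside C
grad = (x - np.clip(x, l, u)) / d(x)                 # claimed gradient
print(np.isclose(np.linalg.norm(grad), 1.0))         # unit norm

eps = 1e-6
num = np.array([(d(x + eps * e) - d(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])                 # central differences
print(np.allclose(num, grad, atol=1e-5))             # True
```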

10.2.8. Conjugates#

Theorem 10.21 (Conjugate of norm squared plus indicator)

Let \(C\) be a nonempty subset of \(\VV\). Let \(f : \VV \to \RERL\) be given by:

\[ f(\bx) = \frac{1}{2} \| \bx \|^2 + \delta_C(\bx). \]

Then, its conjugate is given by:

\[ f^*(\by) = \frac{1}{2}\| \by \|^2 - \frac{1}{2} d_C^2 (\by) = \psi_C(\by). \]

Further, if \(C\) is nonempty, closed and convex, then \(f^{**} = f\). In other words, \(\psi_C^* = f\).

Proof. Recall from Definition 9.78 that

\[ f^*(\by) = \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - f(\bx)\} \Forall \by \in \VV^*. \]

Expanding, we obtain

\[\begin{split} \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - f(\bx)\} &= \underset{\bx \in \VV}{\sup} \{ \langle \bx, \by \rangle - \frac{1}{2} \| \bx \|^2 - \delta_C(\bx)\} \\ &= \underset{\bx \in C}{\sup} \{ \langle \bx, \by \rangle - \frac{1}{2} \| \bx \|^2 \} \\ &= \psi_C(\by). \end{split}\]

The last result is due to Theorem 10.15.

If \(C\) is nonempty, closed and convex, then \(f\) is proper, closed and convex. Then, due to Theorem 9.242, the biconjugate of \(f\) is \(f\) itself.
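
For a finite set \(C\), both sides of this conjugacy identity can be evaluated exactly by enumeration; a minimal sketch (assuming NumPy; the finite set is illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
C = rng.normal(size=(15, 3))            # a finite set C

def f_conj(y):
    """f*(y) = sup_{x in C} (<x, y> - ||x||^2 / 2), by enumeration."""
    return np.max(C @ y - 0.5 * np.sum(C ** 2, axis=1))

def psi(y):
    """psi_C(y) = (||y||^2 - d_C(y)^2) / 2, with d_C by enumeration."""
    d2 = np.min(np.sum((C - y) ** 2, axis=1))
    return 0.5 * (y @ y - d2)

y = rng.normal(size=3)
print(np.isclose(f_conj(y), psi(y)))    # True
```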

10.2.9. Smoothness#

Recall from Definition 9.80 that a function \(f : \VV \to \RR\) is \(L\)-smooth if

\[ \| \nabla f(\bx) - \nabla f(\by)\|_* \leq L \| \bx - \by \| \Forall \bx, \by \in \VV. \]

Since the norm in this section is induced by the inner product, it is self-dual.

Thus, the smoothness criterion becomes:

\[ \| \nabla f(\bx) - \nabla f(\by)\| \leq L \| \bx - \by \| \Forall \bx, \by \in \VV. \]

Theorem 10.22 (Smoothness of the squared distance function)

The squared distance function \(\varphi_C\) is 1-smooth.

Proof. Recall the definition of \(\varphi_C\) from Definition 10.9.

  1. By Theorem 10.18,

    \[ \nabla \varphi_C(\bx) = \bx - P_C(\bx). \]
  2. Hence,

    \[\begin{split} & \| \nabla \varphi_C(\bx) - \nabla \varphi_C(\by) \|^2 \\ &= \| \bx - P_C(\bx) - \by + P_C(\by) \|^2 \\ &= \| (\bx - \by) - (P_C(\bx) - P_C(\by)) \|^2 \\ &= \| \bx - \by \|^2 - 2 \langle P_C(\bx) - P_C(\by), \bx - \by \rangle + \| P_C(\bx) - P_C(\by) \|^2\\ &\leq \| \bx - \by \|^2 - 2 \| P_C(\bx) - P_C(\by) \|^2 + \| P_C(\bx) - P_C(\by) \|^2 & \text{ firm nonexpansiveness } \\ &= \| \bx - \by \|^2 - \| P_C(\bx) - P_C(\by) \|^2 \\ &\leq \| \bx - \by \|^2. \end{split}\]
  3. Thus,

    \[ \| \nabla \varphi_C(\bx) - \nabla \varphi_C(\by) \| \leq \| \bx - \by \|. \]
  4. Thus, \(\varphi_C\) is 1-smooth.

Theorem 10.23 (Smoothness of the \(\psi_C\) function)

The function \(\psi_C\) is 1-smooth.

Proof. We recall from Theorem 10.19 that \(\nabla \psi_C(\bx) = P_C(\bx)\).

Hence,

\[ \| \nabla \psi_C(\bx) - \nabla \psi_C(\by) \| = \| P_C(\bx) - P_C(\by) \| \leq \| \bx - \by \| \]

due to the nonexpansiveness property (Theorem 10.13).

10.2.10. POCS Problems#

In this section, we present some example optimization problems which can be converted into equivalent projection-on-a-convex-set problems.

10.2.10.1. Equality Constrained Quadratic Programming#

Quadratic programming problems are discussed extensively in Quadratic Programming. Here we discuss a specific form of minimizing a quadratic function subject to linear equality constraints.

Example 10.1 (Equality constrained quadratic programming)

We consider the quadratic programming problem

\[\begin{split} & \text{minimize } & & \frac{1}{2} \| \bx \|^2 + \bc^T \bx \\ & \text{subject to } & & \bA \bx = \bzero. \end{split}\]

where

  1. \(\bx \in \RR^n\) is the optimization variable.

  2. \(\bc \in \RR^n\) is a given vector.

  3. \(\bA \in \RR^{m \times n}\) is an \(m \times n\) matrix of rank \(m\). Assume that \(m < n\).

We proceed towards converting this problem into a projection on a convex set problem as follows.

  1. By adding a constant term \(\frac{1}{2} \| \bc \|^2\) to the objective function, we obtain an equivalent problem.

    \[\begin{split} & \text{minimize } & & \frac{1}{2} \| \bc + \bx \|^2 \\ & \text{subject to } & & \bA \bx = \bzero. \end{split}\]
  2. The set \(C = \{ \bx \ST \bA \bx = \bzero \}\) is the null space of the matrix \(\bA\), which is a linear subspace, hence a nonempty, closed and convex set.

  3. Minimizing \(\frac{1}{2} \| \bc + \bx \|^2\) is equivalent to minimizing \(\| (-\bc) - \bx \|\) subject to \(\bx \in C\).

  4. \(\| (-\bc) - \bx \|\) is \(d(-\bc, \bx)\), the distance between \(-\bc\) and a point \(\bx\).

  5. Thus, we are minimizing the distance between the point \(-\bc\) and points \(\bx \in C\).

  6. This is nothing but the distance of \(-\bc\) from the set \(C\).

  7. Since \(C\) is nonempty, closed and convex, there is a unique \(\bx^*\) which minimizes the distance, due to the projection theorem.

  8. Thus, the solution is the projection of the vector \(-\bc\) on the subspace \(C\).

  9. By Theorem 10.8, \(\bx^*\) is the unique projection of \(-\bc\) on \(C\) if and only if \(-\bc - \bx^* \in C^{\perp}\).

  10. In other words,

    \[ \langle \bc + \bx^*, \bx \rangle = 0 \Forall \bx \in C. \]

  11. A closed form solution to this problem exists, given by

    \[ \bx^* = - (\bI - \bA^T (\bA \bA^T)^{-1} \bA) \bc. \]
  12. It is indeed the unique solution to this quadratic programming problem.
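
A minimal numerical sketch of this closed-form solution (assuming NumPy; the problem sizes are illustrative), checking feasibility and the orthogonality condition above:

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 3, 6
A = rng.normal(size=(m, n))               # rank m (generically), m < n
c = rng.normal(size=n)

# x* = -(I - A^T (A A^T)^{-1} A) c, the projection of -c onto null(A).
x_star = -(c - A.T @ np.linalg.solve(A @ A.T, A @ c))
print(np.allclose(A @ x_star, 0.0))       # feasible: A x* = 0

# Orthogonality: <c + x*, x> = 0 for all x in null(A); test on a basis.
_, _, Vt = np.linalg.svd(A)
N = Vt[m:]                                # rows spanning null(A)
print(np.allclose(N @ (c + x_star), 0.0)) # True
```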