10.2. Projection on Convex Sets#
We will assume \(\VV\) to be a real \(n\)-dimensional vector space with an inner product \(\langle \cdot, \cdot \rangle\), the induced norm \(\| \cdot \|\), and the corresponding dual norm \(\| \cdot \|_*\) for the dual space \(\VV^*\).
We are interested in mapping a point \(\bx\) to the nearest point in a set \(C\). In general, this problem may have zero, one or multiple solutions. However, in the special case where \(C\) is a nonempty, closed and convex set, there is exactly one point in \(C\) which is nearest to a given point \(\bx\). This nearest point is called the projection of \(\bx\) onto \(C\).
Main references for this section are [6, 9].
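For a concrete illustration, consider the closed unit Euclidean ball, for which the projection has the well-known closed form \(P_C(\bx) = \bx / \max\{\| \bx \|, 1\}\). The following minimal NumPy sketch (illustrative; the helper name `proj_unit_ball` is ours) computes this projection for a point outside the ball.

```python
import numpy as np

def proj_unit_ball(x):
    """Project x onto the closed unit Euclidean ball {z : ||z|| <= 1}."""
    norm_x = np.linalg.norm(x)
    return x if norm_x <= 1 else x / norm_x

x = np.array([3.0, 4.0])           # ||x|| = 5, so x lies outside the ball
z = proj_unit_ball(x)              # the unique nearest point: (0.6, 0.8)
print(z, np.linalg.norm(x - z))    # distance from x to the ball is 4.0
```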
10.2.1. Projection Theorem#
(Projection theorem)
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). For every \(\bx \in \VV\), there exists a unique vector that minimizes \(\| \bz - \bx \|\) over all \(\bz \in C\). This vector is called the projection of \(\bx\) on \(C\).
Proof. We fix some \(\bx \in \VV\) and choose an element \(\bw \in C\).
Consider the function
\[ g(\bz) = \frac{1}{2} \| \bz - \bx \|^2. \]Minimizing \(\| \bz - \bx \|\) over all \(\bz \in C\) is equivalent to minimizing \(g(\bz)\) over the set
\[ D = \{ \bz \in C \ST \| \bz - \bx \| \leq \| \bw - \bx \| \}. \]We note that \(D\) is a compact set (the intersection of the closed set \(C\) with a closed ball) and \(g\) is a closed (l.s.c.) and coercive function.
By Weierstrass’ theorem Corollary 8.2, the set of minimizers for \(g\) is nonempty and compact.
We now show that the minimizer is unique.
\(g\) is a strictly convex function because its Hessian is the identity matrix, which is positive definite.
Hence, the minimizer is unique due to Theorem 10.2.
10.2.2. Orthogonal Projection#
(Orthogonal Projection Mapping)
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The orthogonal projection mapping \(P_C : \VV \to \VV\) is defined by:
\[ P_C(\bx) \triangleq \operatorname{argmin}_{\bz \in C} \| \bz - \bx \| \Forall \bx \in \VV. \]
This mapping is well defined since the projection is unique for a nonempty, closed and convex set \(C\) due to Theorem 10.6.
The vector \(P_C(\bx)\) is called the projection of \(\bx\) on the set \(C\).
10.2.3. Characterization#
(Orthogonal projection characterization)
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). For every vector \(\bx \in \VV\), a vector \(\bz \in C\) is its projection if and only if
\[ \langle \by - \bz, \bx - \bz \rangle \leq 0 \Forall \by \in C. \]
This result is also known as the second projection theorem [6].
Proof. Assume that for some \(\bz \in C\), \(\langle \by - \bz, \bx - \bz \rangle \leq 0 \Forall \by \in C\) holds true.
For any \(\by \in C\)
\[\begin{split} \| \by - \bx \|^2 &= \| (\by - \bz) - (\bx - \bz) \|^2 \\ &= \| \by - \bz \|^2 + \| \bz - \bx \|^2 - 2 \langle \by - \bz, \bx - \bz \rangle \\ &\geq \| \bz - \bx \|^2 - 2 \langle \by - \bz, \bx - \bz \rangle. \end{split}\]Thus,
\[ \| \by - \bx \|^2 \geq \| \bz - \bx \|^2 \Forall \by \in C. \]Thus, \(\bz\) is indeed the projection of \(\bx\) on \(C\).
Conversely, assume that \(\bz\) is the projection of \(\bx\) on \(C\).
Let \(\by \in C\) be arbitrary.
For any \(t \geq 0\), define \(\by_t = t \by + (1 -t ) \bz\).
Then, we have
\[\begin{split} \| \bx - \by_t \|^2 &= \| t \bx + (1-t) \bx - t \by - (1 -t ) \bz \|^2 \\ &= \| t (\bx - \by) + (1- t) (\bx - \bz) \|^2\\ &= t^2 \| \bx - \by \|^2 + (1-t)^2 \| \bx - \bz \|^2 + 2 t (1-t) \langle \bx - \by, \bx - \bz \rangle. \end{split}\]Viewing \(\| \bx - \by_t \|^2\) as a function of \(t\) and differentiating w.r.t. \(t\), we have
\[\begin{split} \left . \frac{d}{d t} \| \bx - \by_t \|^2 \right |_{t = 0} &= -2 \| \bx - \bz \|^2 + 2 \langle \bx - \by, \bx - \bz \rangle\\ &= -2 \langle \bx - \bz, \bx - \bz \rangle + 2 \langle \bx - \by, \bx - \bz \rangle\\ &= -2 \langle \by - \bz, \bx - \bz \rangle. \end{split}\]Since \(t=0\) minimizes \(\| \bx - \by_t \|^2\) over \(t \in [0,1]\), we must have
\[ \left . \frac{d}{d t} \| \bx - \by_t \|^2 \right |_{t = 0} \geq 0. \]Thus, we require that
\[ \langle \by - \bz, \bx - \bz \rangle \leq 0 \]must hold true for every \(\by \in C\).
The following is an alternative proof based on results from Constrained Optimization I. It is specific to the case where \(\VV = \RR^n\).
Proof. Define a function \(f: \RR^n \to \RR\) as
\[ f(\bz) = \frac{1}{2} \| \bz - \bx \|^2. \]Then, the projection problem can be cast as an optimization problem
\[\begin{split} & \text{minimize } & & f(\bz) \\ & \text{subject to } & & \bz \in C. \end{split}\]Note that the gradient of \(f\) is given by
\[ \nabla f(\bz) = \bz - \bx. \]By Theorem 10.47, \(\bz\) is an optimal solution if and only if
\[ \langle \nabla f(\bz), \by - \bz \rangle \geq 0 \Forall \by \in C. \]In other words
\[ \langle \bz - \bx, \by - \bz \rangle \geq 0 \Forall \by \in C. \]We can simplify this as
\[ \langle \by - \bz, \bx - \bz \rangle \leq 0 \Forall \by \in C. \]
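This characterization is easy to check numerically. The sketch below (illustrative; it uses the box \([0,1]^n\), whose projection is coordinate-wise clipping, a standard fact) verifies the inequality \(\langle \by - \bz, \bx - \bz \rangle \leq 0\) on randomly sampled points \(\by \in C\).

```python
import numpy as np

rng = np.random.default_rng(0)

def proj_box(x):
    # Projection onto the box [0, 1]^n is coordinate-wise clipping.
    return np.clip(x, 0.0, 1.0)

n = 5
x = 3.0 * rng.normal(size=n)
z = proj_box(x)

# Check <y - z, x - z> <= 0 for many random points y in C.
for _ in range(1000):
    y = rng.uniform(0.0, 1.0, size=n)       # y in C
    assert np.dot(y - z, x - z) <= 1e-12
print("characterization verified on all sampled y")
```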
(Orthogonal projection on an affine subspace)
Let \(C\) be an affine subspace of \(\VV\). Let \(S\) be the linear subspace parallel to \(C\). For every vector \(\bx \in \VV\), a vector \(\bz \in C\) is its projection if and only if
\[ \bx - \bz \in S^{\perp}. \]
Proof. Since \(C\) is an affine subspace of \(\VV\), hence \(C\) is nonempty, convex and closed (as \(\VV\) is finite dimensional).
By Theorem 10.7, \(\bz\) is the projection of \(\bx\) on \(C\) if and only if for every \(\by \in C\), we have
\[ \langle \by - \bz, \bx - \bz \rangle \leq 0. \]But \(\by \in C\) if and only if \(\by - \bz \in S\).
Hence the condition is equivalent to
\[ \langle \bw, \bx - \bz \rangle \leq 0 \Forall \bw \in S. \]But then, it must be an equality since \(\bw\) and \(-\bw\) both belong to \(S\). Thus, we have
\[ \langle \bw, \bx - \bz \rangle = 0 \Forall \bw \in S. \]In other words, \(\bx - \bz \in S^{\perp}\).
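For an affine set \(C = \{ \bz \ST \bA \bz = \bb \}\) with \(\bA\) of full row rank, the projection of \(\bx\) admits the standard closed form \(\bz = \bx - \bA^T (\bA \bA^T)^{-1} (\bA \bx - \bb)\). The following NumPy sketch (illustrative; the data are random and full row rank is assumed) verifies both feasibility and the orthogonality condition \(\bx - \bz \in S^{\perp}\).

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 4
A = rng.normal(size=(m, n))        # assumed to have full row rank
b = rng.normal(size=m)
x = rng.normal(size=n)

# Projection of x onto C = {z : A z = b}.
u = np.linalg.solve(A @ A.T, A @ x - b)
z = x - A.T @ u

print(np.allclose(A @ z, b))       # z lies in C
# x - z lies in the row space of A, hence is orthogonal to S = null(A).
print(np.allclose(x - z, A.T @ u))
```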
10.2.4. Distance Function#
Recall that the distance of a point \(\bx \in \VV\) from a set \(C\) is defined as
\[ d_C(\bx) \triangleq \inf_{\by \in C} \| \bx - \by \|. \]
(Distance function for nonempty, closed and convex set)
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Then the function \(d_C : \VV \to \RR\) defining the distance of a point \(\bx \in \VV\) from the set \(C\) satisfies
\[ d_C(\bx) = \| \bx - P_C(\bx) \|. \]
Proof. By Theorem 10.6, there exists a unique point \(P_C(\bx)\) which minimizes the distance between \(\bx\) and \(C\). Hence
\[ d_C(\bx) = \| \bx - P_C(\bx) \| \]must hold true.
(Distance function for nonempty, closed and convex set is convex)
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Let \(d_C : \VV \to \RR\) be the distance to the set \(C\) function as defined in Theorem 10.9. Then, \(d_C\) is convex.
Proof. Assume, for contradiction, that \(d_C\) is not convex.
Then, there exist \(\bx, \by \in \VV\) and \(t \in (0,1)\) such that
\[ d_C(t \bx + (1-t) \by) > t d_C(\bx) + (1-t)d_C(\by). \]Let \(\bu = P_C(\bx)\) and \(\bv = P_C(\by)\). By definition, \(\bu, \bv \in C\).
Then,
\[ t d_C(\bx) + (1-t)d_C(\by) = t \| \bu - \bx \| + (1-t) \| \bv - \by \|. \]Since \(C\) is convex, hence, \(t \bu + (1-t) \bv \in C\).
Since \(d_C(t \bx + (1-t) \by)\) is the minimum distance of the point \(t \bx + (1-t) \by\) from \(C\), hence
\[ \| t \bu + (1-t) \bv - t \bx - (1-t) \by \| \geq d_C(t \bx + (1-t) \by). \]Rewriting,
\[ d_C(t \bx + (1-t) \by) \leq \| t (\bu - \bx) + (1-t) (\bv - \by) \| \leq t \| \bu - \bx \| + (1-t) \| \bv - \by \| \]due to triangle inequality.
But, this leads to the contradiction
\[ t \| \bu - \bx \| + (1-t) \| \bv - \by \| < d_C(t \bx + (1-t) \by) \leq t \| \bu - \bx \| + (1-t) \| \bv - \by \|. \]Hence, \(d_C\) must be convex.
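The convexity of \(d_C\) can also be spot-checked numerically. The sketch below (illustrative; it uses the closed unit ball, for which \(d_C(\bx) = \max\{\| \bx \| - 1, 0\}\)) tests the convexity inequality along random segments.

```python
import numpy as np

rng = np.random.default_rng(5)

def d_ball(x):
    # Distance to the closed unit ball: max(||x|| - 1, 0).
    return max(np.linalg.norm(x) - 1.0, 0.0)

for _ in range(1000):
    x, y = 3.0 * rng.normal(size=3), 3.0 * rng.normal(size=3)
    t = rng.uniform()
    lhs = d_ball(t * x + (1 - t) * y)
    rhs = t * d_ball(x) + (1 - t) * d_ball(y)
    assert lhs <= rhs + 1e-12
print("convexity inequality holds on all sampled segments")
```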
10.2.5. Nonexpansiveness#
(Nonexpansiveness property)
Let \(\VV\) be a normed linear space. An operator \(T : \VV \to \VV\) is called nonexpansive if
\[ \| T(\bx) - T(\by) \| \leq \| \bx - \by \| \Forall \bx, \by \in \VV. \]
In other words, the distance between mapped points in \(\VV\) is always less than or equal to the distance between original points in \(\VV\).
(Nonexpansive operators are Lipschitz continuous)
Let \(\VV\) be a normed linear space. A nonexpansive operator \(T : \VV \to \VV\) is Lipschitz continuous. Hence, it is uniformly continuous.
Proof. Recall from Definition 3.45 that if \(T\) is a Lipschitz map, then there exists \(L > 0\) such that
\[ \| T(\bx) - T(\by) \| \leq L \| \bx - \by \| \]for every \(\bx, \by \in \VV\).
For a nonexpansive operator, \(L = 1\) works. Thus, \(T\) is indeed Lipschitz continuous. By Theorem 3.57, every Lipschitz continuous function is uniformly continuous.
(Firm nonexpansiveness property)
Let \(\VV\) be a real inner product space. An operator \(T : \VV \to \VV\) is called firmly nonexpansive if
\[ \| T(\bx) - T(\by) \|^2 \leq \langle T(\bx) - T(\by), \bx - \by \rangle \]
holds true for every \(\bx, \by \in \VV\).
(A firmly nonexpansive operator is nonexpansive)
Let \(\VV\) be a real inner product space. Let \(T : \VV \to \VV\) be a firmly nonexpansive operator. Then, \(T\) is nonexpansive.
Proof. For every \(\bx, \by \in \VV\), we have
\[ \| T(\bx) - T(\by) \|^2 \leq \langle T(\bx) - T(\by), \bx - \by \rangle. \]Applying the Cauchy-Schwarz inequality on the R.H.S., we get
\[ \| T(\bx) - T(\by) \|^2 \leq \| T(\bx) - T(\by) \| \| \bx - \by \|. \]Canceling the common factor \(\| T(\bx) - T(\by) \|\) (when it is nonzero; the claim is trivial otherwise), we get:
\[ \| T(\bx) - T(\by) \| \leq \| \bx - \by \| \]which is the nonexpansive property.
(Orthogonal projection is nonexpansive)
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Then the orthogonal projection operator \(P_C : \VV \to \VV\) as defined in Definition 10.6 is nonexpansive (and therefore continuous).
In other words,
\[ \| P_C(\bx) - P_C(\by) \| \leq \| \bx - \by \| \Forall \bx, \by \in \VV. \]
Proof. Let \(\bx, \by \in \VV\).
By Theorem 10.7,
\[ \langle \bw - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0 \Forall \bw \in C. \]In particular \(P_C(\by) \in C\). Hence,
\[ \langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0. \]Similarly, starting with \(P_C(\bx)\), we obtain
\[ \langle P_C(\bx) - P_C(\by), \by - P_C(\by) \rangle \leq 0. \]Adding these two inequalities, we obtain
\[ \langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) - \by + P_C(\by) \rangle \leq 0. \]By rearranging the terms, we get
\[ \| P_C(\bx) - P_C(\by) \|^2 \leq \langle P_C(\bx) - P_C(\by), \bx - \by \rangle. \]Applying the Cauchy-Schwarz inequality on the R.H.S., we obtain
\[ \| P_C(\bx) - P_C(\by) \| \leq \| \bx - \by \|. \]Thus, \(P_C\) is nonexpansive.
Since \(P_C\) is nonexpansive, it is also continuous.
(Orthogonal projection is firmly nonexpansive)
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Let \(P_C : \VV \to \VV\) be the orthogonal projection operator as defined in Definition 10.6. Then \(P_C\) is a firmly nonexpansive operator.
In other words,
\[ \| P_C(\bx) - P_C(\by) \|^2 \leq \langle P_C(\bx) - P_C(\by), \bx - \by \rangle \]
holds true for every \(\bx, \by \in \VV\).
Proof. Recall from Theorem 10.7 that for any \(\bu \in \VV\) and \(\bv \in C\)
\[ \langle \bv - P_C(\bu), \bu - P_C(\bu) \rangle \leq 0. \]
Substituting \(\bu = \bx\) and \(\bv = P_C(\by)\), we obtain
\[ \langle P_C(\by) - P_C(\bx), \bx - P_C(\bx) \rangle \leq 0. \]Substituting \(\bu = \by\) and \(\bv = P_C(\bx)\), we obtain
\[ \langle P_C(\bx) - P_C(\by), \by - P_C(\by) \rangle \leq 0. \]Adding the two inequalities gives us
\[\begin{split} & \langle P_C(\bx) - P_C(\by), \by - P_C(\by) - \bx + P_C(\bx) \rangle \leq 0 \\ & \iff \langle P_C(\bx) - P_C(\by), (\by - \bx) + (P_C(\bx) - P_C(\by)) \rangle \leq 0\\ &\iff \| P_C(\bx) - P_C(\by) \|^2 \leq \langle P_C(\bx) - P_C(\by), \bx - \by \rangle \end{split}\]as desired.
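Both nonexpansiveness and firm nonexpansiveness are straightforward to test numerically. A minimal sketch (illustrative; again using the box \([0,1]^n\) with its clipping projection):

```python
import numpy as np

rng = np.random.default_rng(2)

def proj_box(x):
    # Projection onto the box [0, 1]^n is coordinate-wise clipping.
    return np.clip(x, 0.0, 1.0)

for _ in range(1000):
    x, y = 2.0 * rng.normal(size=4), 2.0 * rng.normal(size=4)
    px, py = proj_box(x), proj_box(y)
    lhs = np.dot(px - py, px - py)       # ||P(x) - P(y)||^2
    rhs = np.dot(px - py, x - y)         # <P(x) - P(y), x - y>
    assert lhs <= rhs + 1e-12            # firm nonexpansiveness
    assert np.linalg.norm(px - py) <= np.linalg.norm(x - y) + 1e-12
print("firm nonexpansiveness verified on all samples")
```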
10.2.6. Squared Distance Function#
(Squared distance function to a nonempty set)
Let \(C\) be a nonempty subset of \(\VV\). The squared distance to set \(C\) function, denoted as \(\varphi_C : \VV \to \RR\), is defined as:
\[ \varphi_C(\bx) \triangleq \frac{1}{2} d_C^2(\bx). \]
We also define \(\psi_C : \VV \to \RR\) as:
\[ \psi_C(\bx) \triangleq \frac{1}{2} \left ( \| \bx \|^2 - d_C^2(\bx) \right ). \]
(Expression for \(\psi_C\))
Let \(C\) be a nonempty subset of \(\VV\). Then, the function \(\psi_C\) as defined in Definition 10.9 is given by
\[ \psi_C(\bx) = \sup_{\by \in C} \left [ \langle \bx, \by \rangle - \frac{1}{2} \| \by \|^2 \right ]. \]
Proof. We proceed as follows.
Expanding on the definition of \(d_C^2\)
\[\begin{split} d_C^2(\bx) &= \inf_{\by \in C} \| \bx - \by \|^2 \\ &= \inf_{\by \in C} \langle \bx - \by, \bx - \by \rangle \\ &= \inf_{\by \in C} (\| \bx \|^2 - 2 \langle \bx, \by \rangle + \| \by \|^2) \\ &= \inf_{\by \in C} (\| \bx \|^2 - (2 \langle \bx, \by \rangle - \| \by \|^2)) \\ &= \| \bx \|^2 - \sup_{\by \in C} (2 \langle \bx, \by \rangle - \| \by \|^2). \end{split}\]Thus,
\[ \| \bx \|^2 - d_C^2 (\bx) = \sup_{\by \in C} ( 2 \langle \bx, \by \rangle - \| \by \|^2 ). \]Thus,
\[ \psi_C(\bx) = \frac{1}{2} \left (\| \bx \|^2 - d_C^2(\bx) \right) = \sup_{\by \in C} \left [\langle \bx, \by \rangle - \frac{1}{2} \| \by \|^2 \right ]. \]
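Since the theorem only requires \(C\) to be nonempty, it can be checked exactly on a finite point set, where both the infimum and the supremum reduce to a minimum and a maximum over finitely many points. A minimal sketch (illustrative; \(C\) is a random finite set):

```python
import numpy as np

rng = np.random.default_rng(3)
C = rng.normal(size=(10, 3))    # a finite (nonconvex) set of 10 points in R^3

def psi_via_distance(x):
    d2 = np.min(np.sum((C - x) ** 2, axis=1))      # d_C(x)^2
    return 0.5 * (np.dot(x, x) - d2)

def psi_via_sup(x):
    # sup over y in C of <x, y> - ||y||^2 / 2
    return np.max(C @ x - 0.5 * np.sum(C ** 2, axis=1))

x = rng.normal(size=3)
print(np.isclose(psi_via_distance(x), psi_via_sup(x)))   # True
```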
(\(\psi_C\) is convex)
Let \(C\) be a nonempty subset of \(\VV\). Then, the function \(\psi_C\) as defined in Definition 10.9 is convex.
The beauty of this result is that \(\psi_C\) is convex irrespective of whether \(C\) is convex or not.
Proof. We proceed as follows.
For every \(\by \in C\), the function \(g_{\by} : \VV \to \RR\), given by
\[ g_{\by} (\bx) = \langle \by, \bx \rangle - \frac{1}{2} \| \by \|^2 \]is an affine function.
\(g_{\by}\) is convex for every \(\by \in C\) due to Theorem 9.68.
Now,
\[ \psi_C (\bx) = \sup_{\by \in C} g_{\by}(\bx). \]Thus, \(\psi_C\) is a pointwise supremum of convex functions.
Thus, by Theorem 9.114, \(\psi_C\) is convex.
(Squared distance function for nonempty, closed and convex sets)
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). Then, the squared distance function \(\varphi_C\) is given by
\[ \varphi_C(\bx) = \frac{1}{2} \| \bx - P_C(\bx) \|^2. \]
This follows directly from Theorem 10.10.
10.2.7. Gradients and Subgradients#
(Gradient of the squared distance function)
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The gradient of the squared distance function \(\varphi_C\) as defined in Definition 10.9 at \(\bx \in \VV\) is given by:
\[ \nabla \varphi_C(\bx) = \bx - P_C(\bx). \]
Proof. We proceed as follows.
Let \(\bx \in \VV\).
Let \(\bz_x = \bx - P_C(\bx)\).
Consider the function
\[ g_x(\bd) = \varphi_C(\bx + \bd) - \varphi_C(\bx) - \langle \bd, \bz_x \rangle. \]If
\[ \lim_{\bd \to \bzero} \frac{g_x(\bd)}{ \| \bd \|} = 0 \]then \(\bz_x\) is indeed the gradient of \(\varphi_C\) at \(\bx\).
By definition of orthogonal projection, for any \(\bd \in \VV\),
\[ \| \bx + \bd - P_C(\bx + \bd) \|^2 \leq \| \bx + \bd - P_C(\bx) \|^2 \]as \(P_C(\bx + \bd)\) is the nearest point to \(\bx + \bd\) in \(C\). \(P_C(\bx)\) is just another point in \(C\).
Thus, for any \(\bd \in \VV\)
\[\begin{split} g_x(\bd) &= \varphi_C(\bx + \bd) - \varphi_C(\bx) - \langle \bd, \bz_x \rangle \\ &= \frac{1}{2} \| \bx + \bd - P_C(\bx + \bd) \|^2 - \frac{1}{2} \| \bx - P_C(\bx) \|^2 - \langle \bd, \bz_x \rangle \\ &\leq \frac{1}{2} \| \bx + \bd - P_C(\bx) \|^2 - \frac{1}{2} \| \bx - P_C(\bx) \|^2 - \langle \bd, \bz_x \rangle. \end{split}\]Recall that for a norm induced by the inner product
\[ \| \ba + \bb \|^2 = \langle \ba + \bb, \ba + \bb \rangle = \| \ba \|^2 + 2 \langle \ba, \bb \rangle + \| \bb \|^2. \]Thus,
\[\begin{split} \| \bx + \bd - P_C(\bx)\|^2 &= \| \bd + (\bx - P_C(\bx)) \|^2\\ &= \| \bd \|^2 + \| \bx - P_C(\bx)\|^2 + 2 \langle \bd, \bx - P_C(\bx) \rangle. \end{split}\]Putting it back and simplifying, we obtain
\[ g_x(\bd) \leq \frac{1}{2}\| \bd \|^2 + \langle \bd, \bx - P_C(\bx)\rangle - \langle \bd, \bz_x \rangle = \frac{1}{2}\| \bd \|^2. \]Proceeding similarly, we also have
\[ g_x(-\bd) \leq \frac{1}{2}\| \bd \|^2. \]Since \(\varphi_C\) is convex, hence \(g_x\) is also convex.
Thus,
\[ 0 = g_x(\bzero) = g_x \left (\frac{1}{2} \bd + \frac{1}{2} (-\bd) \right ) \leq \frac{1}{2} g_x(\bd) + \frac{1}{2} g_x(-\bd). \]Thus,
\[ g_x(\bd) \geq - g_x(-\bd) \geq - \frac{1}{2}\| \bd \|^2. \]Combining, we have
\[ - \frac{1}{2}\| \bd \|^2 \leq g_x(\bd) \leq \frac{1}{2}\| \bd \|^2. \]Or, in terms of absolute values,
\[ |g_x(\bd)| \leq \frac{1}{2}\| \bd \|^2. \]Then,
\[ \frac{|g_x(\bd)|}{\| \bd \|} \leq \frac{1}{2}\| \bd \|. \]Thus,
\[ \lim_{\bd \to \bzero} \frac{g_x(\bd)}{ \| \bd \|} = 0. \]Thus, \(\bz_x = \bx - P_C(\bx)\) is indeed the gradient of \(\varphi_C\) at \(\bx\).
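The gradient formula is easy to sanity-check with finite differences. A minimal sketch (illustrative; it uses the closed unit ball and its closed-form projection):

```python
import numpy as np

def proj_unit_ball(x):
    nx = np.linalg.norm(x)
    return x if nx <= 1 else x / nx

def phi(x):
    # phi_C(x) = (1/2) d_C(x)^2 with C the closed unit ball.
    return 0.5 * np.linalg.norm(x - proj_unit_ball(x)) ** 2

x = np.array([1.5, -2.0, 0.5])
grad_analytic = x - proj_unit_ball(x)

eps = 1e-6
grad_fd = np.array([
    (phi(x + eps * e) - phi(x - eps * e)) / (2 * eps) for e in np.eye(3)
])
print(np.allclose(grad_analytic, grad_fd, atol=1e-5))    # True
```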
(Gradient of \(\psi_C\))
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The gradient of the function \(\psi_C\) as defined in Definition 10.9 at \(\bx \in \VV\) is given by:
\[ \nabla \psi_C(\bx) = P_C(\bx). \]
Proof. We have
\[ \psi_C(\bx) = \frac{1}{2} \left ( \| \bx \|^2 - d_C^2(\bx) \right ) = \frac{1}{2} \| \bx \|^2 - \varphi_C(\bx). \]Hence,
\[ \nabla \psi_C(\bx) = \bx - \nabla \varphi_C(\bx) = \bx - (\bx - P_C(\bx)) = P_C(\bx). \]
(Distance function and square distance function relation)
We note that \(\varphi_C = g \circ d_C\) where \(g(t) = \frac{1}{2}[t]_+^2\).
\(g\) is a nondecreasing real-valued convex differentiable function. We also note that
\[ g'(t) = [t]_+. \]
(Subdifferential of the distance function)
Let \(C\) be a nonempty, closed and convex subset of \(\VV\). The subdifferential of the distance function \(d_C\) is given by
\[\begin{split} \partial d_C(\bx) = \begin{cases} \left \{ \dfrac{\bx - P_C(\bx)}{d_C(\bx)} \right \}, & \bx \notin C; \\ N_C(\bx) \cap B[\bzero, 1], & \bx \in C. \end{cases} \end{split}\]Here, \(N_C(\bx)\) denotes the normal cone of all vectors normal to the set \(C\) at a point \(\bx \in C\).
Since \(\partial d_C(\bx)\) is a singleton for every \(\bx \notin C\), \(d_C\) is differentiable at every \(\bx \notin C\).
Proof. We can get the subdifferentials for \(d_C\) by applying the chain rule.
Recall that \(\varphi_C = g \circ d_C\) where \(g(t) = \frac{1}{2}[t]_+^2\).
Thus, by subdifferential chain rule (Theorem 9.229):
\[ \partial \varphi_C (\bx) = g'(d_C(\bx)) \partial d_C(\bx) = [d_C(\bx)]_+ \partial d_C(\bx) = d_C(\bx) \partial d_C(\bx). \]We used the fact that \(d_C\) is nonnegative.
Since \(\varphi_C\) is differentiable, hence \(\partial \varphi_C(\bx) = \{\bx - P_C(\bx) \}\).
If \(\bx \notin C\), then, \(d_C(\bx) > 0\).
Thus, for \(\bx \notin C\)
\[ \partial d_C(\bx) = \left \{ \frac{\bx - P_C(\bx)}{d_C(\bx)} \right \}. \]For \(\bx \in C\), \(d_C(\bx) = 0\).
We need to show that \(\partial d_C(\bx) = N_C(\bx) \cap B[\bzero, 1]\) in this case.
Consider any \(\bd \in \partial d_C(\bx)\).
Then, by subgradient inequality
\[ d_C(\by) \geq d_C(\bx) + \langle \by - \bx, \bd \rangle = \langle \by - \bx, \bd \rangle \Forall \by \in \VV \]since \(d_C(\bx) = 0\).
Then, in particular, for any \(\by \in C\)
\[ d_C(\by) = 0 \geq \langle \by - \bx, \bd \rangle. \]Thus, \(\bd \in N_C(\bx)\). Further, choosing \(\by = \bx + \bd\) in the subgradient inequality gives \(\| \bd \|^2 \leq d_C(\bx + \bd) \leq \| \bd \|\), where the last step uses \(\bx \in C\). Hence \(\| \bd \| \leq 1\), and \(\bd \in N_C(\bx) \cap B[\bzero, 1]\) as required.
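Outside \(C\), the formula says \(d_C\) is differentiable with a unit-norm gradient. A finite-difference spot check (illustrative; closed unit ball again):

```python
import numpy as np

def proj_unit_ball(x):
    nx = np.linalg.norm(x)
    return x if nx <= 1 else x / nx

def d(x):
    return np.linalg.norm(x - proj_unit_ball(x))

x = np.array([2.0, -1.0, 2.0])      # ||x|| = 3, so x is outside the ball
grad_analytic = (x - proj_unit_ball(x)) / d(x)

eps = 1e-6
grad_fd = np.array([
    (d(x + eps * e) - d(x - eps * e)) / (2 * eps) for e in np.eye(3)
])
print(np.allclose(grad_analytic, grad_fd, atol=1e-5))    # True
print(np.isclose(np.linalg.norm(grad_analytic), 1.0))    # unit norm
```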
10.2.8. Conjugates#
(Conjugate of norm squared plus indicator)
Let \(C\) be a nonempty subset of \(\VV\). Let \(f : \VV \to \RERL\) be given by:
\[ f(\bx) = \frac{1}{2} \| \bx \|^2 + \delta_C(\bx) \]where \(\delta_C\) is the indicator function of \(C\).
Then, its conjugate is given by:
\[ f^*(\by) = \psi_C(\by) = \frac{1}{2} \left ( \| \by \|^2 - d_C^2(\by) \right ). \]
Further, if \(C\) is nonempty, closed and convex, then \(f^{**} = f\). In other words, \(\psi_C^* = f\).
Proof. Recall from Definition 9.78 that
\[ f^*(\by) = \sup_{\bx \in \VV} \left [ \langle \bx, \by \rangle - f(\bx) \right ]. \]Expanding, we obtain
\[ f^*(\by) = \sup_{\bx \in C} \left [ \langle \bx, \by \rangle - \frac{1}{2} \| \bx \|^2 \right ] = \psi_C(\by). \]The last result is due to Theorem 10.15.
If \(C\) is nonempty, closed and convex, then \(f\) is proper, closed and convex. Then, due to Theorem 9.242, the biconjugate of \(f\) is \(f\) itself.
10.2.9. Smoothness#
Recall from Definition 9.80 that a function \(f : \VV \to \RR\) is \(L\)-smooth if
\[ \| \nabla f(\bx) - \nabla f(\by) \|_* \leq L \| \bx - \by \| \Forall \bx, \by \in \VV. \]Since the norm in this section is induced by the inner product, it is self-dual.
Thus, the smoothness criterion becomes:
\[ \| \nabla f(\bx) - \nabla f(\by) \| \leq L \| \bx - \by \| \Forall \bx, \by \in \VV. \]
(Smoothness of the squared distance function)
The squared distance function \(\varphi_C\) is 1-smooth.
Proof. Recall the definition of \(\varphi_C\) from Definition 10.9.
By Theorem 10.18,
\[ \nabla \varphi_C(\bx) = \bx - P_C(\bx). \]Hence,
\[\begin{split} & \| \nabla \varphi_C(\bx) - \nabla \varphi_C(\by) \|^2 \\ &= \| \bx - P_C(\bx) - \by + P_C(\by) \|^2 \\ &= \| (\bx - \by) - (P_C(\bx) - P_C(\by)) \|^2 \\ &= \| \bx - \by \|^2 - 2 \langle P_C(\bx) - P_C(\by), \bx - \by \rangle + \| P_C(\bx) - P_C(\by) \|^2\\ &\leq \| \bx - \by \|^2 - 2 \| P_C(\bx) - P_C(\by) \|^2 + \| P_C(\bx) - P_C(\by) \|^2 & \text{ firm nonexpansiveness } \\ &= \| \bx - \by \|^2 - \| P_C(\bx) - P_C(\by) \|^2 \\ &\leq \| \bx - \by \|^2. \end{split}\]Thus,
\[ \| \nabla \varphi_C(\bx) - \nabla \varphi_C(\by) \| \leq \| \bx - \by \|. \]Thus, \(\varphi_C\) is 1-smooth.
(Smoothness of the \(\psi_C\) function)
The function \(\psi_C\) is 1-smooth.
Proof. We recall from Theorem 10.19 that \(\nabla \psi_C(\bx) = P_C(\bx)\).
Hence,
\[ \| \nabla \psi_C(\bx) - \nabla \psi_C(\by) \| = \| P_C(\bx) - P_C(\by) \| \leq \| \bx - \by \| \]due to the nonexpansiveness property (Theorem 10.13).
10.2.10. POCS Problems#
In this section, we present some example optimization problems which can be converted into an equivalent projection on a convex set problem.
10.2.10.1. Equality Constrained Quadratic Programming#
Quadratic programming problems are discussed extensively in Quadratic Programming. Here we discuss a specific form of minimizing a quadratic function subject to linear equality constraints.
(Equality constrained quadratic programming)
We consider the quadratic programming problem
\[\begin{split} & \text{minimize } & & \frac{1}{2} \| \bx \|^2 + \bc^T \bx \\ & \text{subject to } & & \bA \bx = \bzero \end{split}\]
where
\(\bx \in \RR^n\) is the optimization variable.
\(\bc \in \RR^n\) is a given vector.
\(\bA \in \RR^{m \times n}\) is an \(m \times n\) matrix of rank \(m\). Assume that \(m < n\).
We proceed towards converting this problem into a projection on a convex set problem as follows.
By adding a constant term \(\frac{1}{2} \| \bc \|^2\) to the objective function, we obtain an equivalent problem.
\[\begin{split} & \text{minimize } & & \frac{1}{2} \| \bc + \bx \|^2 \\ & \text{subject to } & & \bA \bx = \bzero. \end{split}\]The set \(C = \{ \bx \ST \bA \bx = \bzero \}\) is the null space of the matrix \(\bA\), which is a linear subspace, hence a nonempty, closed and convex set.
Minimizing \(\frac{1}{2} \| \bc + \bx \|^2\) is equivalent to minimizing \(\| (-\bc) - \bx \|\) subject to \(\bx \in C\).
The quantity \(\| (-\bc) - \bx \|\) is \(d(-\bc, \bx)\), the distance between \(-\bc\) and a point \(\bx\).
Thus, we are minimizing the distance between the point \(-\bc\) and points \(\bx \in C\).
This is nothing but the distance of \(-\bc\) from the set \(C\).
Since \(C\) is nonempty, closed and convex, there is a unique \(\bx^*\) which minimizes the distance, due to the projection theorem.
Thus, the solution is the projection of the vector \(-\bc\) on the subspace \(C\).
By Theorem 10.8, \(\bx^*\) is the unique projection of \(-\bc\) on \(C\) if and only if \(-\bc - \bx^* \in C^{\perp}\).
In other words,
\[ \langle \bc + \bx^*, \bx \rangle = 0 \Forall \bx \in C. \]
A closed form solution to this problem does exist, given by
\[ \bx^* = - (\bI - \bA^T (\bA \bA^T)^{-1} \bA) \bc. \]It is indeed the unique solution to this quadratic programming problem.
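The closed-form solution can be verified numerically: \(\bx^*\) must be feasible, and \(\bc + \bx^*\) must be orthogonal to the null space of \(\bA\). A minimal NumPy sketch (illustrative; random data, with \(\bA\) assumed to have rank \(m\)):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 2, 5
A = rng.normal(size=(m, n))          # assumed rank m
c = rng.normal(size=n)

# Closed-form solution: projection of -c onto null(A).
x_star = -(np.eye(n) - A.T @ np.linalg.solve(A @ A.T, A)) @ c

print(np.allclose(A @ x_star, 0.0))  # feasibility: x* in null(A)

# Optimality: c + x* lies in the row space of A, hence <c + x*, x> = 0
# for every x in null(A). Build a null-space basis from the SVD.
_, _, Vt = np.linalg.svd(A)
N = Vt[m:].T                          # columns span null(A)
print(np.allclose(N.T @ (x_star + c), 0.0))
```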