10.12. Quadratic Programming#

10.12.1. Quadratic Functions#

Definition 10.41 (Quadratic function)

A function \(f : \RR^n \to \RR\) of the form

\[ f(\bx) = \frac{1}{2} \bx^T \bA \bx + \bb^T \bx + c \]

where \(\bA \in \SS^n\), \(\bb \in \RR^n\) and \(c \in \RR\), is known as a quadratic function.

The matrix \(\bA\) is known as the matrix associated with the quadratic function.

Remark 10.8 (Gradient and Hessian of a quadratic function)

Let

\[ f(\bx) = \frac{1}{2} \bx^T \bA \bx + \bb^T \bx + c \]

be a quadratic function. Then, the gradient is given by:

\[ \nabla f(\bx) = \bA \bx + \bb. \]

And, the Hessian is given by:

\[ \nabla^2 f(\bx) = \bA. \]

See Example 5.8 and Example 5.13 for reference.
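The closed-form gradient above can be checked numerically. The following is a minimal sketch (assuming NumPy; the matrix \(\bA\), vector \(\bb\), and scalar \(c\) below are an arbitrary hypothetical instance) that compares \(\nabla f(\bx) = \bA \bx + \bb\) against a central finite-difference approximation.

```python
import numpy as np

# Hypothetical instance: a random symmetric A, random b, arbitrary c.
rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2          # symmetrize so that A is in S^n
b = rng.standard_normal(3)
c = 1.5

f = lambda x: 0.5 * x @ A @ x + b @ x + c
grad = lambda x: A @ x + b  # closed-form gradient from the remark

# Central finite differences along each coordinate direction.
x0 = rng.standard_normal(3)
eps = 1e-6
fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
               for e in np.eye(3)])
assert np.allclose(fd, grad(x0), atol=1e-6)
```

Since \(f\) is quadratic, the central difference is exact up to floating-point rounding, so the two gradients agree to high precision.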

10.12.1.1. Stationary Points#

Theorem 10.86 (Stationary points of quadratic functions)

Let a quadratic function \(f : \RR^n \to \RR\) be given by

\[ f(\bx) = \frac{1}{2} \bx^T \bA \bx + \bb^T \bx + c \]

where \(\bA \in \SS^n\), \(\bb \in \RR^n\) and \(c \in \RR\).

  1. \(\bx \in \RR^n\) is a stationary point if and only if \(\bA \bx = - \bb\).

  2. If \(\bA \succeq \ZERO\), then \(\bx\) is a global minimum point of \(f\) if and only if \(\bA \bx = -\bb\).

  3. If \(\bA \succ \ZERO\), then \(\bx = - \bA^{-1} \bb\) is a strict global minimum point of \(f\).

  4. If \(\bA \succ \ZERO\), then the minimum value of \(f\) is \(c - \frac{1}{2} \bb^T \bA^{-1} \bb\).

Proof. (1) is a direct implication of the fact that \(\nabla f(\bx) = \bzero\) if and only if \(\bA \bx + \bb = \bzero\).

(2) We are given that \(\nabla^2 f(\bx) = \bA \succeq \ZERO\).

  1. Thus, \(\nabla^2 f(\bx) \succeq \ZERO\) for every \(\bx \in \RR^n\).

  2. By Theorem 8.4, if \(\bx\) is a stationary point of \(f\), then it is a global minimum point.

  3. By the first part, \(\bx\) is a stationary point if and only if \(\bA \bx = - \bb\).

(3) We are given that \(\bA \succ \ZERO\).

  1. Then, \(\bA\) is invertible.

  2. Hence, \(\bx = - \bA^{-1} \bb\) is the unique solution to the equation \(\bA \bx = - \bb\).

  3. By parts (1) and (2), it is the unique (hence strict) global minimizer of \(f\).

(4) By part (3), the strict global minimum point of \(f\) is \(\ba = - \bA^{-1} \bb\), which satisfies \(\bA \ba = - \bb\). Therefore,

\[\begin{split} f(\ba) &= \frac{1}{2} \ba^T \bA \ba + \bb^T \ba + c \\ &= - \frac{1}{2} \ba^T \bb + \bb^T \ba + c \\ &= c + \frac{1}{2} \bb^T \ba \\ &= c - \frac{1}{2} \bb^T \bA^{-1} \bb. \end{split}\]
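Parts (3) and (4) of the theorem can be verified on a small instance. The sketch below (assuming NumPy; the diagonal \(\bA\) and the vectors are hypothetical choices made for transparency) computes \(\bx^* = -\bA^{-1}\bb\) and the minimum value \(c - \frac{1}{2}\bb^T \bA^{-1}\bb\), and checks that perturbations only increase \(f\).

```python
import numpy as np

# Hypothetical positive definite instance (diagonal for easy hand-checking).
A = np.diag([2.0, 3.0])
b = np.array([-4.0, 6.0])
c = 1.0

f = lambda x: 0.5 * x @ A @ x + b @ x + c

x_star = -np.linalg.solve(A, b)              # x* = -A^{-1} b = (2, -2)
f_min = c - 0.5 * b @ np.linalg.solve(A, b)  # c - (1/2) b^T A^{-1} b = -9

assert np.allclose(f(x_star), f_min)
# Since A is positive definite, x* is a strict global minimum:
rng = np.random.default_rng(1)
for _ in range(100):
    assert f(x_star + 0.1 * rng.standard_normal(2)) > f_min
```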

10.12.1.2. Coerciveness#

Theorem 10.87 (Coerciveness of quadratic functions)

Let a quadratic function \(f : \RR^n \to \RR\) be given by

\[ f(\bx) = \frac{1}{2} \bx^T \bA \bx + \bb^T \bx + c \]

where \(\bA \in \SS^n\), \(\bb \in \RR^n\) and \(c \in \RR\).

\(f\) is coercive if and only if \(\bA \succ \ZERO\); i.e., \(\bA\) is positive definite.

Proof. Assume that \(\bA\) is positive definite.

  1. Then, all eigenvalues of \(\bA\) are positive.

  2. Let \(\lambda\) be the smallest eigenvalue of \(\bA\).

  3. Then, \(\bx^T \bA \bx \geq \lambda \| \bx \|^2\) for every \(\bx \in \RR^n\).

  4. Thus,

    \[\begin{split} f(\bx) &\geq \frac{ \lambda}{2} \| \bx \|^2 + \bb^T \bx + c \\ &\geq \frac{ \lambda}{2} \| \bx \|^2 - \|\bb \| \| \bx \| + c & \text{ Cauchy–Schwarz inequality }\\ &= \frac{ \lambda}{2} \| \bx \| \left (\| \bx \| - \frac{2}{\lambda} \| \bb \| \right ) + c. \end{split}\]
  5. We can see that \(f(\bx) \to \infty\) as \(\| \bx \| \to \infty\).

  6. Thus, \(f\) is coercive.

Now, assume that \(f\) is coercive.

  1. We need to show that \(\bA\) must be positive definite.

  2. Equivalently, we must show that all eigenvalues of \(\bA\) are positive.

  3. For contradiction, assume that an eigenvalue of \(\bA\) is negative.

  4. Let \(\lambda < 0\) be such an eigenvalue with the corresponding normalized eigenvector \(\bv\) such that \(\bA \bv = \lambda \bv\).

  5. Then, for any \(t \in \RR\),

    \[\begin{split} f(t \bv) &= \frac{t^2}{2} \bv^T \bA \bv + t \bb^T \bv + c \\ &= \frac{\lambda t^2}{2} + t \bb^T \bv + c. \end{split}\]
  6. Clearly, \(f(t \bv) \to -\infty\) as \(t \to \infty\) since \(\lambda\) is negative.

  7. Thus, it contradicts the hypothesis that \(f\) is coercive.

  8. We now consider the possibility that \(\bA\) has a zero eigenvalue.

  9. Then, there exists a normalized eigenvector \(\bv\) such that \(\bA \bv = \bzero\).

  10. Then, for any \(t \in \RR\),

    \[ f(t \bv) = t \bb^T \bv + c. \]
  11. If \(\bb^T \bv = 0\), then \(f(t\bv) = c\) for every \(t \in \RR\).

  12. If \(\bb^T \bv > 0\), then \(f(t \bv) \to -\infty\) as \(t \to -\infty\).

  13. If \(\bb^T \bv < 0\), then \(f(t \bv) \to -\infty\) as \(t \to \infty\).

  14. In all three cases, \(f(t \bv)\) does not go to \(\infty\) as \(\| t \bv \| \to \infty\).

  15. Thus, \(f\) is not coercive, contradicting the hypothesis.

  16. Hence, the eigenvalues of \(\bA\) must be positive.

  17. Hence, \(\bA\) must be positive definite.
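Both directions of the proof can be illustrated numerically. The sketch below (assuming NumPy; both matrices and the direction \(\bv\) are hypothetical choices) compares a positive definite \(\bA\) with a merely positive semidefinite one along the zero-eigenvector, where \(f(t\bv) = t\, \bb^T \bv + c\) stays affine in \(t\) and the function fails to be coercive.

```python
import numpy as np

b = np.array([1.0, -2.0])
c = 0.0
f = lambda A, x: 0.5 * x @ A @ x + b @ x + c

A_pd  = np.array([[2.0, 0.0], [0.0, 1.0]])   # positive definite: coercive
A_psd = np.array([[1.0, 0.0], [0.0, 0.0]])   # eigenvalue 0: not coercive

# v is the eigenvector of A_psd with eigenvalue 0; b^T v = -2.
v = np.array([0.0, 1.0])
ts = np.array([1e2, 1e4, 1e6])
vals_pd  = np.array([f(A_pd,  t * v) for t in ts])
vals_psd = np.array([f(A_psd, t * v) for t in ts])

assert vals_pd[-1] > vals_pd[0] > 0          # grows without bound along v
assert np.all(vals_psd == -2.0 * ts)         # affine in t: f(t v) = -2t, not coercive
```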

10.12.1.3. Nonnegative Quadratics#

It is useful to work with quadratic functions which are nonnegative on the entire \(\RR^n\).

The basic quadratic form \(f(\bx) = \frac{1}{2} \bx^T \bA \bx\) is nonnegative on the entire \(\RR^n\) if and only if \(\bA\) is positive semidefinite.

For the general quadratic function, we must also account for the contributions of the \(\bb\) and \(c\) terms.

Theorem 10.88 (Nonnegativity of quadratic function)

Let a quadratic function \(f : \RR^n \to \RR\) be given by

\[ f(\bx) = \frac{1}{2} \bx^T \bA \bx + \bb^T \bx + c \]

where \(\bA \in \SS^n\), \(\bb \in \RR^n\) and \(c \in \RR\).

The following statements are equivalent.

  1. \(f(\bx) \geq 0\) for every \(\bx \in \RR^n\).

  2. \(\begin{bmatrix} \bA & \bb \\ \bb^T & 2 c \end{bmatrix} \succeq \ZERO\); i.e., this \((n+1) \times (n+1)\) symmetric matrix is positive semidefinite.

Proof. Assume that (2) is true.

  1. Then, for every \(\bx \in \RR^n\)

    \[\begin{split} \begin{bmatrix} \bx \\ 1 \end{bmatrix}^T \begin{bmatrix} \bA & \bb \\ \bb^T & 2 c \end{bmatrix} \begin{bmatrix} \bx \\ 1 \end{bmatrix} \geq 0 \end{split}\]

    due to positive semidefiniteness.

  2. But

    \[\begin{split} \begin{bmatrix} \bx \\ 1 \end{bmatrix}^T \begin{bmatrix} \bA & \bb \\ \bb^T & 2 c \end{bmatrix} \begin{bmatrix} \bx \\ 1 \end{bmatrix} &= \begin{bmatrix} \bx^T & 1 \end{bmatrix} \begin{bmatrix} \bA \bx + \bb \\ \bb^T \bx + 2c \end{bmatrix} \\ & = \bx^T \bA \bx + 2 \bx^T \bb + 2 c \\ &= 2 \left ( \frac{1}{2} \bx^T \bA \bx + \bb^T \bx + c \right ) \\ &= 2 f(\bx). \end{split}\]
  3. Thus, \(f(\bx) \geq 0\) for every \(\bx \in \RR^n\).

For the converse, assume (1) is true.

  1. We need to show that \(\begin{bmatrix} \bA & \bb \\ \bb^T & 2 c \end{bmatrix}\) is positive semidefinite.

  2. We shall first show that \(\bA\) is positive semidefinite.

  3. For contradiction, assume that \(\bA\) is not positive semidefinite.

  4. Then, there exists a negative eigenvalue \(\lambda < 0\) and corresponding normalized eigenvector \(\bv\) for \(\bA\) such that \(\bA \bv = \lambda \bv\).

  5. Then, for any \(t \in \RR\)

    \[ f(t \bv) = \frac{\lambda t^2}{2} + t \bb^T \bv + c. \]
  6. Then, \(f(t \bv) \to -\infty\) as \(t \to -\infty\).

  7. This contradicts the hypothesis that \(f\) is nonnegative everywhere.

  8. Thus, \(\bA\) must be positive semidefinite.

  9. We now need to show that for any \(\by \in \RR^n\) and any \(t \in \RR\),

    \[\begin{split} \begin{bmatrix} \by \\ t \end{bmatrix}^T \begin{bmatrix} \bA & \bb \\ \bb^T & 2 c \end{bmatrix} \begin{bmatrix} \by \\ t \end{bmatrix} \geq 0. \end{split}\]
  10. This condition is equivalent to

    \[ \frac{1}{2} \by^T \bA \by + t \bb^T \by + c t^2 \geq 0 \]

    for every \(\by \in \RR^n\) and \(t \in \RR\).

  11. If \(t = 0\), then this condition reduces to

    \[ \by^T \bA \by \geq 0 \Forall \by \in \RR^n. \]
  12. This is valid for every \(\by \in \RR^n\) since \(\bA\) is p.s.d., as established earlier.

  13. For \(t \neq 0\), we have

    \[ t^2 f \left ( \frac{\by}{ t} \right ) = t^2 \left ( \frac{1}{2 t^2} \by^T \bA \by + \frac{1}{t} \bb^T \by + c \right ) = \frac{1}{2 } \by^T \bA \by + t \bb^T \by + c t^2. \]
  14. By hypothesis, \(t^2 f \left ( \frac{\by}{ t} \right ) \geq 0\) for every \(\by \in \RR^n\) and \(t \neq 0\).

  15. Thus, \(\frac{1}{2 } \by^T \bA \by + t \bb^T \by + c t^2 \geq 0\) for every \(\by \in \RR^n\) and \(t \in \RR\).

  16. Thus, \(\begin{bmatrix} \bA & \bb \\ \bb^T & 2 c \end{bmatrix}\) is indeed p.s.d.
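The equivalence in Theorem 10.88 gives a concrete test for nonnegativity. The following sketch (assuming NumPy; the instance is a hypothetical choice with \(f(\bx) = \frac{1}{2}(x_1 - 1)^2 + \frac{1}{2}x_2^2 \geq 0\)) assembles the \((n+1) \times (n+1)\) block matrix and checks its eigenvalues.

```python
import numpy as np

# Hypothetical instance: f(x) = (1/2)||x||^2 - x1 + 1/2 = (1/2)(x1 - 1)^2 + (1/2)x2^2 >= 0.
A = np.eye(2)
b = np.array([-1.0, 0.0])
c = 0.5

# Assemble the (n+1) x (n+1) block matrix [[A, b], [b^T, 2c]].
M = np.block([[A, b[:, None]], [b[None, :], np.array([[2 * c]])]])

# For a symmetric matrix, p.s.d. is equivalent to all eigenvalues >= 0.
eigs = np.linalg.eigvalsh(M)
assert np.all(eigs >= -1e-12)

# Cross-check: f is nonnegative at random points.
f = lambda x: 0.5 * x @ A @ x + b @ x + c
rng = np.random.default_rng(2)
assert all(f(10 * rng.standard_normal(2)) >= -1e-12 for _ in range(1000))
```

Here the block matrix has eigenvalues \(\{0, 1, 2\}\), consistent with \(f\) being nonnegative but attaining the value \(0\) at \(\bx = (1, 0)\).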

10.12.2. Quadratic Optimization Problems#

We consider the following possibilities:

  1. The objective function is a quadratic function.

  2. Both the objective function as well as the inequality constraints function are quadratic.

10.12.2.1. Quadratic Program#

Definition 10.42 (Quadratic program)

A convex optimization problem is known as a quadratic program (QP) if the objective function is a (convex) quadratic and the constraint functions are affine.

A general quadratic program has the following form:

(10.30)#\[\begin{split}& \text{minimize } & & \frac{1}{2} \bx^T \bP \bx + \bq^T \bx + r \\ & \text{subject to } & & \bG \bx \preceq \bh \\ & & & \bA \bx = \bb\end{split}\]

where

  1. \(\bx \in \RR^n\) is the optimization variable.

  2. \(\bP \in \SS^n_+\) is a symmetric positive semidefinite matrix.

  3. \(\bq \in \RR^n\) and \(r \in \RR\).

  4. \(f(\bx) = \frac{1}{2} \bx^T \bP \bx + \bq^T \bx + r\) is a (convex) quadratic objective function.

  5. \(\bG \in \RR^{m \times n}\) and \(\bh \in \RR^m\) describe the \(m\) affine inequality constraints.

  6. \(\bA \in \RR^{p \times n}\) and \(\bb \in \RR^p\) describe the \(p\) affine equality constraints.
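When the inequality constraints are absent, the QP (10.30) reduces to an equality-constrained QP, whose KKT conditions form the linear system \(\begin{bmatrix} \bP & \bA^T \\ \bA & \ZERO \end{bmatrix} \begin{bmatrix} \bx \\ \bnu \end{bmatrix} = \begin{bmatrix} -\bq \\ \bb \end{bmatrix}\). The following is a minimal numerical sketch (assuming NumPy; the instance — projecting the origin onto the line \(x_1 + x_2 = 1\) — is a hypothetical example), not a general-purpose QP solver.

```python
import numpy as np

# Hypothetical equality-constrained QP:
#   minimize (1/2) x^T P x + q^T x  subject to A x = b.
P = np.array([[2.0, 0.0], [0.0, 2.0]])   # positive definite
q = np.array([0.0, 0.0])
A = np.array([[1.0, 1.0]])               # single constraint: x1 + x2 = 1
b = np.array([1.0])

# KKT system: [[P, A^T], [A, 0]] [x; nu] = [-q; b].
n, p = P.shape[0], A.shape[0]
K = np.block([[P, A.T], [A, np.zeros((p, p))]])
rhs = np.concatenate([-q, b])
sol = np.linalg.solve(K, rhs)
x_star, nu = sol[:n], sol[n:]

assert np.allclose(A @ x_star, b)                 # primal feasibility
assert np.allclose(P @ x_star + q + A.T @ nu, 0)  # stationarity
assert np.allclose(x_star, [0.5, 0.5])            # projection of 0 onto the line
```

With inequality constraints present, one would instead use an interior-point or active-set method (or a modeling tool such as CVXPY).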

10.12.2.2. Quadratically Constrained Quadratic Program#

Definition 10.43 (Quadratically constrained quadratic program)

A convex optimization problem is known as a quadratically constrained quadratic program (QCQP) if the objective function and the inequality constraint functions are (convex) quadratic while the equality constraint functions are affine.

A general quadratically constrained quadratic program has the following form:

(10.31)#\[\begin{split}& \text{minimize } & & \frac{1}{2} \bx^T \bP_0 \bx + \bq_0^T \bx + r_0 \\ & \text{subject to } & & \frac{1}{2} \bx^T \bP_i \bx + \bq_i^T \bx + r_i \leq 0 & \quad i=1,\dots, m\\ & & & \bA \bx = \bb\end{split}\]

where

  1. \(\bx \in \RR^n\) is the optimization variable.

  2. \(\bP_i \in \SS^n_+\) are symmetric positive semidefinite matrices for \(i=0,\dots,m\).

  3. \(\bq_i \in \RR^n\) and \(r_i \in \RR\) for \(i=0,\dots,m\).

  4. \(f_0(\bx) = \frac{1}{2} \bx^T \bP_0 \bx + \bq_0^T \bx + r_0\) is a (convex) quadratic objective function.

  5. \(f_i(\bx) = \frac{1}{2} \bx^T \bP_i \bx + \bq_i^T \bx + r_i\) are (convex) quadratic inequality constraint functions for \(i=1,\dots,m\).

  6. \(\bA \in \RR^{p \times n}\) and \(\bb \in \RR^p\) describe the \(p\) affine equality constraints.
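A small QCQP instance makes the template concrete. The sketch below (assuming NumPy; the instance is a hypothetical choice with \(\bP_0 = \ZERO\), \(\bq_0 = \bq\), \(r_0 = 0\) and a single ball constraint \(\bP_1 = \bI\), \(\bq_1 = \bzero\), \(r_1 = -\tfrac{1}{2}\)) minimizes a linear objective over the unit ball, whose closed-form solution is the boundary point opposite \(\bq\); it then checks optimality against random feasible points.

```python
import numpy as np

# Hypothetical QCQP: minimize q^T x subject to (1/2) x^T x - 1/2 <= 0 (unit ball).
# Template data: P0 = 0, q0 = q, r0 = 0; P1 = I, q1 = 0, r1 = -1/2.
q = np.array([3.0, -4.0])
x_star = -q / np.linalg.norm(q)   # closed-form minimizer: x* = -q / ||q||

feasible = lambda x: 0.5 * x @ x - 0.5 <= 1e-12
assert feasible(x_star)

# No random feasible point does better than q^T x* = -||q|| = -5.
rng = np.random.default_rng(3)
for _ in range(1000):
    z = rng.standard_normal(2)
    z = z / max(1.0, np.linalg.norm(z))   # project into the unit ball
    assert q @ z >= q @ x_star - 1e-9
```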