10.6. Linear Constraints#

This section focuses on optimization problems where the constraints are linear (either equality or inequality constraints). We present the optimality conditions in the form of special cases of KKT conditions.

In the sequel, we present optimality conditions for the following optimization problems

  1. Necessary optimality conditions for the minimization of a smooth cost function with linear inequality constraints.

  2. Necessary and sufficient optimality conditions for the minimization of a convex and smooth cost function with linear inequality constraints.

Before proceeding with the optimality conditions, we present a few results which belong to a class of results called “theorems of the alternative”.

10.6.1. Farkas’ Lemma#

Farkas’ lemma is a solvability theorem for a finite system of linear inequalities. Standard versions of Farkas’ lemma consist of two different systems of linear equations and inequalities and exactly one of them is feasible.

Theorem 10.53 (Farkas’ lemma)

Let \(\bA \in \RR^{m \times n}\) and \(\bb \in \RR^m\). Then, exactly one of the following two statements is true.

  1. There exists \(\bx \in \RR^n\) such that \(\bA \bx = \bb, \bx \succeq \bzero\).

  2. There exists \(\by \in \RR^m\) such that \(\bA^T \by \succeq \bzero, \by^T \bb < 0\).

(1) and (2) are called strong alternatives as exactly one of them must be true. In contrast weak alternatives are statements in which at most one of them can be true.

Proof. We first show that both (1) and (2) cannot be true simultaneously.

  1. Assume that both (1) and (2) are true.

  2. Note that

    \[ \by^T \bA \bx = \by^T (\bA \bx) = \by^T \bb < 0 \]

    since \(\bA \bx = \bb\) by (1) and \(\by^T \bb < 0\) by (2).

  3. At the same time

    \[ \by^T \bA \bx = (\by^T \bA) \bx = (\bA^T \by)^T \bx \geq 0 \]

    since by (2) \(\bA^T \by \succeq \bzero\) and by (1) \(\bx \succeq \bzero\).

  4. Thus, we have a contradiction.

  5. Hence (1) and (2) cannot be true simultaneously.

We now show that if (1) doesn’t hold then (2) must be true. In other words, not (1) \(\implies\) (2).

  1. Let \(\ba_1, \dots, \ba_n\) be the columns of \(\bA\).

  2. Note that the columns of \(\bA \in \RR^m\).

  3. Then, (1) can be interpreted as saying that \(\bb\) is a conic combination of columns of \(\bA\).

  4. Let \(Q\) denote the cone generated by the columns of \(\bA\)

    \[ Q = \cone \{\ba_1, \dots, \ba_n \}. \]
  5. \(Q\) is nonempty, closed and convex cone. The closedness of \(Q\) is due to Theorem 9.131 since \(Q\) is a conic hull of a finite set of points.

  6. Since (1) doesn’t hold true, hence \(\bb \notin Q\).

  7. By the strict separation theorem Theorem 9.169, the set \(Q\) and the point \(\bb\) can be strictly separated by a hyperplane.

  8. Specifically, there exists \(\bp \in \RR^m\) and \(\beta \in \RR\) such that

    \[ \bp^T \bb > \beta \text{ and } \bp^T \bx \leq \beta \Forall \bx \in Q. \]
  9. Since \(\bzero \in Q\), hence \(\beta \geq 0\).

  10. Pick any \(i \in \{1,\dots, n\}\).

  11. Then, \(t \ba_i \in Q\) for all \(t > 0\) since \(Q\) is a cone containing \(\ba_i\).

  12. Thus, \(\bp^T t \ba_i \leq \beta\) for all \(t > 0\).

  13. Thus, \(\bp^T \ba_i \leq \frac{\beta}{t}\) for all \(t > 0\).

  14. Since \(\beta \geq 0\), hence taking the limit \(t \to \infty\), we get \(\bp^T \ba_i \leq 0\).

  15. This holds true for every \(i=1,\dots,n\).

  16. Choose \(\by = -\bp\).

  17. Then, \(\by^T \bb = - (\bp^T \bb) < -\beta \leq 0\).

  18. Thus, \(\by^T \bb < 0\).

  19. Also, \(\by^T \ba_i \geq 0\) for all \(i=1,\dots,n\).

  20. Thus, \(\by^T \bA \succeq \bzero\) and \(\by^T \bb < 0\) as desired in (2).

Showing that if (2) doesn’t hold true, then (1) must be true is straightforward.

  1. Assume that (2) is false.

  2. For contradiction, assume that (1) is false.

  3. Then (2) must be true by the previous argument.

  4. We have a contradiction.

  5. Hence, (1) must be true.

Theorem 10.54 (Farkas’ lemma version 2)

Let \(\bA \in \RR^{m \times n}\) and \(\bb \in \RR^m\). Then, exactly one of the following two statements is true.

  1. There exists \(\bx \in \RR^n\) such that \(\bA \bx \preceq \bb\).

  2. There exists \(\by \in \RR^n\) such that \(\bA^T \by = \bzero, \by^T \bb < 0, \by \succeq \bzero\).

(2) is also equivalent to the following statement:

3 There exists \(\by \in \RR^m\) such that \(\bA^T \by = \bzero, \by^T \bb = -1, \by \succeq \bzero\).

Proof. We first show that (2) and (3) are equivalent.

  1. Clearly (3) \(\implies\) (2).

  2. Assume (2) is true.

  3. Let \(\by^T \bb = s\). Let \(r = \frac{-1}{s}\). Then, \(r > 0\).

  4. Let \(\tilde{\by} = \frac{\by}{r}\).

  5. Then, \(\tilde{\by} \succeq \bzero\) since \(\by \succeq \bzero\) and \(r > 0\).

  6. Also \(\tilde{\by}^ \bb = r \by^T \bb = \frac{-1}{s} s = -1\).

  7. Finally \(\bA^T \tilde{\by} = r \bA^T \by = \bzero\).

  8. Thus, \(\tilde{\by}\) satisfies (3).

Next, we show that (1) and (2) cannot be both true simultaneously.

  1. For contradiction, assume that both (1) and (2) are true.

  2. Then

    \[ \by^T \bA \bx = \by^T (\bA \bx) \leq \by^T \bb < 0 \]

    since \(\bA \bx \preceq \bb\) from (1), \(\by \succeq \bzero\) from (2) and \(\by^T \bb < 0\) from (2).

  3. Also

    \[ \by^T \bA \bx = (\by^T \bA) \bx = (\bA^T \by)^T \bx = \bzero^T \bx = 0 \]

    since \(\bA^T \by= \bzero\) from (2).

  4. We have a contradiction.

  5. Thus, both (1) and (2) cannot be true simultaneously.

We next prove that if (2) is false then (1) must be true.

  1. Since (2) and (3) are equivalent hence (3) is false too.

  2. Combine the system of equations \(\bA^T \by = \bzero\) and \(\by^T \bb = -1\) as \(\tilde{\bA} \by = \tilde{\bb}\) where

    \[\begin{split} \tilde{\bA} = \begin{bmatrix} \bA^T \\ \bb^T \end{bmatrix} \text{ and } \tilde{\bb} = \begin{bmatrix} 0\\ \vdots\\ 0\\ -1 \end{bmatrix}. \end{split}\]
  3. Since (3) if false, hence there doesn’t exist \(\by \in \RR^m\) such that \(\tilde{\bA} \by = \tilde{\bb}\).

  4. This is identical to statement (1) of the original Farkas’ lemma in Theorem 10.53.

  5. Hence statement (2) of Theorem 10.53 must hold true.

  6. Thus, there exists \(\bv \in \RR^{n + 1}\) such that \(\tilde{\bA}^T \bv \succeq \bzero\) and \(\tilde{\bb}^T \bv < 0\).

  7. Set

    \[\begin{split} \bv = \begin{bmatrix} \bx \\ t \end{bmatrix} \end{split}\]

    where \(\bx \in \RR^n\) and \(t \in \RR\).

  8. Then \(\tilde{\bb}^T \bv < 0\) implies that \(t > 0\).

  9. \(\tilde{\bA}^T \bv \succeq \bzero\) simplifies to

    \[ \bA \bx + t \bb \succeq \bzero. \]
  10. This further simplifies to

    \[\begin{split} & \bA \bx + t \bb \succeq \bzero\\ \iff & \bA \bx \succeq -t \bb \\ \iff & \bA \left ( \frac{-1}{t} \bx \right ) \preceq \bb. \end{split}\]

    The inequality sign changes since \(t > 0\).

  11. Clearly, \(\frac{-1}{t} \bx\) satisfies (1) as desired. Thus, (1) is indeed true.

Theorem 10.55 (Farkas’ lemma version 3)

Let \(\bA \in \RR^{m \times n}\) and \(\bc \in \RR^n\).

Then the following statements are equivalent.

  1. The implication \(\bA \bx \preceq \bzero \implies \bc^T \bx \leq 0\) holds true.

  2. There exists \(\by \succeq \bzero\) such that \(\bA^T \by = \bc\).

Proof. (2) \(\implies\) (1)

  1. (2) is same as statement (1) of Theorem 10.53.

  2. Thus, by Theorem 10.53, there is no \(\bx \in \RR^n\) such that \(\bA \bx \succeq \bzero\) and \(\bx^T \bc < 0\).

  3. Hence for every \(\bx \in \RR^n\),
    \(\bA \bx \succeq \bzero \implies \bx^T \bc \geq 0\).

  4. Replacing \(\bx\) by \(-\bx\), we see that for every \(\bx \in \RR^n\),
    \(\bA \bx \preceq \bzero \implies \bx^T \bc \leq 0\).

(1) \(\implies\) (2)

  1. For every \(\bx\), \(\bA \bx \preceq \bzero \implies \bc^T \bx \leq 0\).

  2. Thus, For every \(\bx\), \(\bA \bx \succeq \bzero \implies \bc^T \bx \geq 0\).

  3. Thus, there doesn’t exist any \(\bx\) with \(\bA \bx \succeq \bzero\) and \(\bc^T \bx < 0\).

  4. Thus, by Theorem 10.53, there exists \(\by \succeq \bzero\) such that \(\bA^T \by = \bc\).

10.6.2. Gordan’s Theorem#

Theorem 10.56 (Gordan’s theorem)

Let \(\bA \in \RR^{m \times n}\). Then exactly one of the following two systems has a solution.

  1. \(\bA \bx \prec \bzero\).

  2. \(\bp \neq \bzero, \bA^T \bp = \bzero, \bp \succeq \bzero\).

Proof. (1) \(\implies\) not (2)

  1. Assume that (1) has a solution given by \(\bx\).

  2. For contradiction, assume that (2) also has a solution.

  3. Hence, there exists a nonzero \(\bp\) such that \(\bA^T \bp = \bzero\) and \(\bp \succeq \bzero\).

  4. Multiplying the equality \(\bA^T \bp = \bzero\) from left by \(\bx\), we get

    \[ \bx^T \bA^T \bp = (\bA \bx)^T \bp = 0. \]
  5. Since \(\bA \bx \prec \bzero\) and \(\bp \succeq \bzero\), hence every term in \(\bA \bx\) is negative and every term in \(\bp\) is nonnegative.

  6. Hence \((\bA \bx)^T \bp = 0\) is possible only if \(\bp = \bzero\). A contradiction.

  7. Hence (2) doesn’t have a solution.

not (1) \(\implies\) (2)

  1. Assume that (1) doesn’t have a solution.

  2. The system (1) is equivalent to the system

    \[\begin{split} & \bA \bx + s \bone \preceq \bzero;\\ & s > 0. \end{split}\]
  3. Define \(\tilde{\bA} = \begin{bmatrix} \bA & \bone \end{bmatrix}\).

  4. Let \(\bc = \be_{n + 1}\) (i.e., \((0, 0, \dots, 0, 1)\).

  5. Then the above system is equivalent to

    \[\begin{split} \tilde{\bA} \begin{bmatrix} \bx \\ s \end{bmatrix} \preceq \bzero, \quad \bc^T \begin{bmatrix} \bx \\ s \end{bmatrix} > 0. \end{split}\]
  6. The infeasibility of (1) is thus equivalent to the infeasibility of the system

    \[ \tilde{\bA} \bw \preceq \bzero, \quad \bc^T \bw > 0 \]

    where \(\bw \in \RR^{n+1}\).

  7. Since this system is infeasible, hence by Farkas' lemma, the system

    \[ \tilde{\bA}^T \bz = \bc, \quad \bz \succeq \bzero \]

    must have a solution.

  8. In other words, there exists \(\bz \succeq \bzero\) such that

    \[ \bA^T \bz = \bzero, \quad \bone^T \bz = 1. \]
  9. Since \(\bone^T \bz = 1\), hence \(\bz \neq \bzero\).

  10. Thus, there exists \(\bz \neq \bzero\), with \(\bz \succeq \bzero\) such that \(\bA^T \bz = \bzero\).

  11. Hence, the system (2) has a solution \(\bz\).

10.6.3. Smooth Cost Function, Linear Inequality Constraints#

Theorem 10.57 (KKT necessary optimality conditions for linearly constrained problems)

Consider an optimization problem of the form

(10.9)#\[\begin{split}& \text{minimize } & & f(\bx) & \\ & \text{subject to } & & \langle \bx, \ba_i \rangle \leq b_i, i=1,\dots,m &.\end{split}\]

where \(f\) is continuously differentiable over \(\RR^n\), \(\ba_1, \dots, \ba_m \in \RR^n\) and \(b_1, \dots, b_m \in \RR\). Assume that \(\bx^*\) is a local minimum of this optimization problem. Then there exist nonnegative scalars \(t_1, \dots, t_m \geq 0\) such that

(10.10)#\[\nabla f(\bx^*) + \sum_{i=1}^m t_i \ba_i = \bzero\]

and

(10.11)#\[t_i (\langle \bx^*, \ba_i \rangle - b_i ) = 0, \quad i=1,\dots,m.\]

Proof. We are given that \(\bx^*\) is a local minimum point of (10.9).

  1. Let \(C\) denote the constraint set

    \[ C = \{\bx \in \RR^n \ST \langle \bx, \ba_i \rangle \leq b_i, i=1,\dots,m \}. \]
  2. By Theorem 10.45, \(\bx^*\) must be a stationary point.

  3. Hence for every \(\bx \in C\), we have

    \[ \langle \bx - \bx^* , \nabla f(\bx^*) \rangle \geq 0. \]
  4. Introduce a change of variable \(\by = \bx - \bx^*\) in the stationary point criterion.

  5. We must have \(\langle \by, \nabla f(\bx^*) \rangle \geq 0\) for every \(\by\) satisfying \(\langle \by + \bx^* , \ba_i \rangle \leq b_i\) for every \(i=1,\dots,m\).

  6. Equivalently, we must have \(\langle \by, \nabla f(\bx^*) \rangle \geq 0\) for every \(\by\) satisfying

    \[ \langle \by, \ba_i \rangle \leq b_i - \langle \bx^*, \ba_i \rangle, \quad \Forall i=1,\dots,m. \]
  7. The \(i\)-th linear inequality constraint is called active at \(\bx^*\) if \(\langle \bx, \ba_i \rangle = b_i\).

  8. The \(i\)-the constraint is called inactive if \(\langle \bx, \ba_i \rangle < b_i\).

  9. Let the set of active constraints be denoted by

    \[ I(\bx^*) = \{i \ST \langle \bx^*, \ba_i \rangle = b_i \}. \]
  10. If the \(i\)-th constraint is active, then \(\langle \by + \bx^* , \ba_i \rangle \leq b_i\) is equivalent to \(\langle \by, \ba_i \rangle \leq 0\).

  11. If the \(i\)-th constraint is inactive, then \(\langle \by + \bx^* , \ba_i \rangle \leq b_i\) is equivalent to \(\langle \by, \ba_i \rangle \leq b_i - \langle \bx^*, \ba_i \rangle\) a positive quantity.

  12. Thus, we must have \(\langle \by, \nabla f(\bx^*) \rangle \geq 0\) whenever \(\by\) satisfies

    \[\begin{split} & \langle \by, \ba_i \rangle \leq 0, & i \in I(\bx^*), \\ & \langle \by, \ba_i \rangle \leq b_i - \langle \bx^*, \ba_i \rangle, & i \notin I(\bx^*). \end{split}\]
  13. We claim that the second set of inequalities are redundant.

    1. Suppose that we pick a \(\by\) such that \(\langle \by, \ba_i \rangle \leq 0, \Forall i \in I(\bx^*)\).

    2. At \(\bx^*\), we have \(b_i - \langle \bx^*, \ba_i \rangle > 0\) for every \(i \notin I(\bx^*)\).

    3. Then it is possible to choose a small enough \(\alpha > 0\) such that \(\langle \alpha \by, \ba_i \rangle \leq b_i - \langle \bx^*, \ba_i \rangle\) for every \(i\).

      1. We can choose an \(\alpha_i\) for every \(i\) as follows.

      2. If \(\langle \by, \ba_i \rangle \leq 0\), then let \(\alpha_i = 1\).

      3. If \(\langle \by, \ba_i \rangle > 0\), then let

        \[ \alpha_i = \frac{b_i - \langle \bx^*, \ba_i \rangle}{\langle \by, \ba_i \rangle}. \]
      4. Let \(\alpha = \min\{ \alpha_1, \dots, \alpha_m \}\).

    4. We can now see that

      \[\begin{split} & \langle \by, \ba_i \rangle \leq 0 \Forall i \in I(\bx^*)\\ \implies & \langle \alpha \by, \ba_i \rangle \leq b_i - \langle \bx^*, \ba_i \rangle \Forall i=1,\dots,m. \end{split}\]
    5. But then by stationarity of \(\bx^*\), we must have \(\langle \alpha \by, \nabla f(\bx^*) \rangle \geq 0\).

    6. Since \(\alpha > 0\), hence we must have \(\langle \by, \nabla f(\bx^*) \rangle \geq 0\).

  14. Hence, we must have \(\langle \by, \nabla f(\bx^*) \rangle \geq 0\) whenever \(\by\) satisfies

    \[ \langle \by, \ba_i \rangle \leq 0, \quad \Forall i \in I(\bx^*). \]
  15. We are ready to apply the Farkas’ lemma Theorem 10.53.

    1. Form \(\bA\) by combining \(-\ba_i\) for every \(i \in I(\bx^*)\) as columns.

    2. Let \(\bv = \nabla f(\bx^*)\).

    3. We have \(\bA^T \by \succeq 0 \implies \by^T \bv\geq 0\).

    4. Hence the system \(\bA^T \by \succeq 0, \by^T \bv < 0\) is infeasible.

    5. Hence there exists \(\bt \succeq \bzero\) such that \(\bA \bt = \bv\).

  16. Hence there exists \(t_i \geq 0\) for every \(i \in I(\bx^*)\) such that

    \[ - \nabla f(\bx^*) = \sum_{i \in I(\bx^*)} t_i \ba_i. \]
  17. By defining \(t_i = 0\) for every \(i \notin I(\bx^*)\), we have

    \[ t_i (\langle \bx^*, \ba_i \rangle - b_i) = 0 \quad \Forall i=1,\dots,m \]

    and

    \[ \nabla f(\bx^*) + \sum_{i=1}^m t_i \ba_i = \bzero \]

    as desired.

10.6.4. Convex and Smooth Cost Function, Linear Inequality Constraints#

Theorem 10.58 (KKT necessary and sufficient optimality conditions for linearly constrained problems)

Consider an optimization problem of the form

\[\begin{split} & \text{minimize } & & f(\bx) & \\ & \text{subject to } & & \langle \bx, \ba_i \rangle \leq b_i, i=1,\dots,m &. \end{split}\]

where \(f\) is a convex continuously differentiable over \(\RR^n\), \(\ba_1, \dots, \ba_m \in \RR^n\) and \(b_1, \dots, b_m \in \RR\). Let \(\bx^*\) be a feasible solution of this optimization problem. Then \(\bx^*\) is an optimal solution if and only if there exist nonnegative scalars \(t_1, \dots, t_m \geq 0\) such that

\[ \nabla f(\bx^*) + \sum_{i=1}^m t_i \ba_i = \bzero \]

and

\[ t_i (\langle \bx^*, \ba_i \rangle - b_i ) = 0, \quad i=1,\dots,m. \]

Proof. Assume that \(\bx^*\) is an optimal solution. Then it is also a local minimizer. Then the necessary conditions follow from Theorem 10.57.

For the converse, assume that \(\bx^*\) is a feasible solution and the conditions for the scalars \(t_1, \dots, t_m\) are satisfied.

  1. Consider any feasible solution \(\bx\).

  2. Define a function

    \[ h(\bx) = f(\bx) + \sum_{i=1}^m t_i (\langle \bx, \ba_i \rangle - b_i). \]
  3. By differentiating w.r.t. \(\bx\), we have

    \[ \nabla h(\bx) = \nabla f(\bx) + \sum_{i=1}^m t_i \ba_i. \]
  4. By hypothesis we have, \(\nabla h(\bx^*) = 0\).

  5. Since \(h\) is convex, hence by Theorem 10.48, \(\bx^*\) is a minimizer of \(h\) over \(\RR^n\).

  6. By hypothesis, we have \(t_i (\langle \bx^*, \ba_i \rangle - b_i ) = 0, \quad i=1,\dots,m\).

  7. Hence

    \[ \sum_{i=1}^m t_i (\langle \bx^*, \ba_i \rangle - b_i ) = 0. \]
  8. For any feasible point, we have \(\langle \bx, \ba_i \rangle \leq b_i\) for every \(i\).

  9. Since by hypothesis \(t_i \geq 0\), hence for any feasible point,

    \[ \sum_{i=1}^m t_i (\langle \bx, \ba_i \rangle - b_i ) \leq 0. \]
  10. Hence

    \[\begin{split} f(\bx^*) &= f(\bx^*) + \sum_{i=1}^m t_i (\langle \bx^*, \ba_i \rangle - b_i )\\ &= h(\bx^*) \\ &\leq h(\bx)\\ &= f(\bx) + \sum_{i=1}^m t_i (\langle \bx, \ba_i \rangle - b_i )\\ &\leq f(\bx). \end{split}\]
  11. Hence for every feasible solution \(\bx\), we have \(f(\bx^*) \leq f(\bx)\).

  12. This proves that \(\bx^*\) is a global optimal solution.

The scalars \(t_1, \dots, t_m\) that appear in the previous two results are known as the Lagrange multipliers.

  1. Each multiplier \(t_i\) is associated with the corresponding inequality constraint \(\langle \bx, \ba_i \rangle \leq b_i\).

  2. Each multiplier is nonnegative.

  3. The conditions in (10.11) are known as complementary slackness conditions.

  4. At the local minimizer, either the constraint is active with \(\langle \bx, \ba_i \rangle = b_i\) or \(t_i = 0\).

  5. The results can be extended to support linear equality constraints too.

10.6.5. Linear Constraints (Equality and Inequality)#

Theorem 10.59 (KKT optimality conditions for linearly constrained problems)

Consider an optimization problem of the form

(10.12)#\[\begin{split}& \text{minimize } & & f(\bx) & \\ & \text{subject to } & & \langle \bx, \ba_i \rangle \leq b_i, i=1,\dots,m &\\ & \text{and } & & \langle \bx, \bc_j \rangle = d_j, j=1,\dots,p &\end{split}\]

where \(f\) is continuously differentiable over \(\RR^n\), \(\ba_1, \dots, \ba_m, \bc_1, \dots, \bc_p \in \RR^n\) and \(b_1, \dots, b_m, d_1, \dots, d_p \in \RR\). Then we have the following

  1. (Necessity of KKT conditions) If \(\bx^*\) is a local minimum of this optimization problem, then there exist nonnegative scalars \(t_1, \dots, t_m \geq 0\) and real scalars \(r_1, \dots, r_p \in \RR\)

    such that

    (10.13)#\[\nabla f(\bx^*) + \sum_{i=1}^m t_i \ba_i + \sum_{j=1}^p r_j \bc_j = \bzero\]

    and

    (10.14)#\[t_i (\langle \bx^*, \ba_i \rangle - b_i ) = 0, \quad i=1,\dots,m.\]
  2. (Sufficiency for convex cost functions) If \(f\) is also convex over \(\RR^n\) and \(\bx^*\) is a feasible point of the problem (10.12) for which there exist nonnegative scalars \(t_1, \dots, t_m \geq 0\) and real scalars \(r_1, \dots, r_p \in \RR\) such that (10.13) and (10.14) are satisfied, then \(\bx^*\) is an optimal solution.

The key idea for the proof is convert each linear equality constraint to two linear inequality constraints. Then we can leverage the previous results.