10.8. Lagrange Multipliers#

Main references for this section are [6, 9, 17].

The material in this section builds up on the material from previous sections. While the material in Constrained Optimization I and Constrained Optimization II doesn’t make specific assumptions on the structure of the constraint set (beyond say convexity and closedness), the material in Linear Constraints deals with a specific structure where the constraint set consists of a system of linear inequalities and equalities. This section focuses on the case where the constraint set consists of a system of smooth inequalities and equalities.

The necessary and/or sufficient conditions for the optimization problems presented in this section admit the existence of a set of nonnegative (for inequality constraints) and real (for equality constraints) scalars known as Lagrange multipliers satisfying a specific system of equations.

  1. We generalize the linear inequality and equality constraints in Linear Constraints to allow for smooth inequality and equality constraints.

  2. We first consider problems involving minimization of a smooth function over a set of smooth inequalities.

    1. We describe the notion of feasible descent directions.

    2. We show that at local minimizers, there are no feasible descent directions.

    3. We then develop the necessary Fritz-John conditions for the existence of a local minimizer.

    4. We add further constraint qualifications to develop the necessary KKT conditions for the existence of a local minimizer.

  3. We then consider the problems involving minimization of a smooth function over a set of sooth inequalities and equalities. We present the KKT conditions for the existence of a local minimizer.

  4. We then add convexity in the mix for the cost function and constraint functions.

10.8.1. Inequality Constrained Problems#

We start by developing the KKT conditions for the problem of minimizing a smooth function over a set of inequality constraints.

The problem is given as

(10.18)#minimize f(x)subject to gi(x)0,i=1,,m

where f,g1,,gm:VR are continuously differentiable functions over V.

The constraint set is given by

C={xV|gi(x)0,i=1,,m}.

10.8.1.1. Feasible Descent Directions#

Recall from Definition 10.22 that a descent direction is a direction along which the directional derivative of the cost function is negative; i.e., f(x;d)<0. If f is continuously differentiable, this translates to

f(x;d)=d,f(x)<0.

Recall from Definition 10.23 that given a vector xC, a direction dV is said to be a feasible direction of C at x if there exists a t>0 such that

x+tdC for every t[0,t].

We now introduce the notion of a feasible descent direction.

Definition 10.28 (Feasible descent direction)

Consider the problem of minimizing a cost function f:V(,] over a constraint set Cdomf. A nonzero vector d is called a feasible descent direction at xC if f(x;d)<0 and there exists t>0 such that

x+tdC for every t[0,t].

In other words, a feasible descent direction at xC is a feasible direction and a decent direction.

  1. If f is continuously differentiable then we must have d,f(x)<0.

  2. If C is convex, then we just need to have x+tdC. By virtual of convexity of C, every y[x,x+td]C.

Lemma 10.7 (Local minimum and feasible descent directions)

Consider the problem of minimizing a cost function f:V(,] over a constraint set Cdomf. If x is a local minimizer then there are no feasible descent directions at x.

Proof. We prove this by contradiction.

  1. Let x be a local minimizer.

  2. Assume that d0 is a feasible descent direction.

  3. Then there is an ϵ1>0 such that x+tdC for every t[0,ϵ1].

  4. Also, f(x;d)<0.

  5. By Definition 9.70, there exists ϵ2>0 such that

    f(x+td)f(x)t<00<t<ϵ2.
  6. Equivalently, f(x+td)<f(x) for all 0<t<ϵ2.

  7. Let ϵ=min(ϵ1,ϵ2).

  8. Then for every t(0,ϵ), we have x+tdC and f(x+td)<f(x).

  9. This contradicts the hypothesis that x is a local minimizer.

  10. Hence, there are no feasible descent directions at x.

10.8.1.2. Necessary Conditions for Local Optimality#

Revising the problem (10.18):

  1. A constraint gi is called active at x if gi(x)=0.

  2. A constraint gi is called inactive at x if gi(x)<0.

  3. The set of active constraints at a point x is denoted by

    A(x)={i1,,m|gi(x)=0}.

We first restate the Lemma 10.7 for the minimization with inequality constraints problem (10.18). The key idea is the lack of feasible descent directions at a local minimizer.

  1. f(x;d)<0 will indicate that d is a feasible direction.

  2. If a constraint is inactive, then it remains valid in the neighborhood of the local minimizer due to continuity of gi.

  3. If a constraint is active, then moving in some directions will lead to invalidation of the constraint while moving in some directions will keep the constraint valid.

  4. In particular, if gi(x;d)<0, then moving along d keeps the i-th active constraint valid.

  5. Hence, along a feasible descend direction, the directional derivatives of the cost function and the active constraint functions must be negative.

Lemma 10.8 (Local minimum and feasible descent directions for inequality constraints)

Let x be a local minimizer of the optimization problem (10.18):

minimize f(x)subject to gi(x)0,i=1,,m

where f,g1,,gm:VR are continuously differentiable functions over V. Let A(x) denote the set of active constraints:

A(x)={i|gi(x)=0}.

Then there doesn’t exist a vector dV such that

d,f(x)=f(x;d)<0,d,gi(x)=gi(x;d)<0,iA(x).

This result states that local optimality is equivalent to the infeasibility of a certain system of strict inequalities.

Proof. We prove this by contradiction.

  1. Let x be a local minimizer.

  2. Assume that d0 be a direction satisfying the constraints above.

  3. Then there exists an ϵ0>0 such that f(x+td)<f(x) for every t(0,ϵ0).

  4. Similarly, there exist ϵi>0 such that gi(x+td)<gi(x)=0 for every t(0,ϵi) for every iA(x).

  5. Let ϵ=min{ϵ0,,ϵm}.

  6. Then for every t(0,ϵ), we have f(x+td)<f(x) and gi(x+td)<0 for every iA(x).

  7. By the continuity of gi for all i, and the fact that gi(x)<0 for every iA(x), there exists a δ>0 such that for every t(0,δ), gi(x+td)<0 for every iA(x).

  8. Hence, we conclude that for every t(0,min(ϵ,δ)), we have f(x+td)<f(x) and gi(x+td)<0 for every i1,,m.

  9. But this contradicts the local optimality of x.

10.8.1.3. Fritz-John Conditions#

Recall that Farkas’ and Gordan’s theorems of the alternative present different pairs of systems where if one system is infeasible then the other must be feasible and vice versa. We can apply Gordan’s theorem to the infeasible system of strict inequalities in Lemma 10.8 to develop the so-called Fritz-John conditions.

Theorem 10.77 (Fritz-John conditions)

Let x be a local minimizer of the optimization problem

minimize f(x)subject to gi(x)0,i=1,,m

where f,g1,,gm:VR are continuously differentiable functions over V. Then there exist nonnegative scalar multipliers t0,t1,,tm0 which are not all zero such that

t0f(x)+i=1mtigi(x)=0,tigi(x)=0,i=1,,m.

Proof. From Lemma 10.8, we have that the following system is infeasible.

d,f(x)<0,d,gi(x)<0,iA(x).
  1. Let n=dimV.

  2. Let there be an isomorphic mapping between V and Rn.

  3. For every xV, let x also denote the corresponding vector in Rn.

  4. Assume that there are k active constraints at x.

  5. Construct a k+1×n matrix A as follows:

    A=[f(x)Tfi1(x)Tfik(x)T]

    where i1,,ik denote the indices of active constraints.

  6. Then the above system of strict inequalities can be stated as Ad0.

  7. This system of equations is infeasible.

  8. Then by Gordan's theorem, the system

    t0,ATt=0,t0

    where tRk+1 is feasible.

  9. We write t=(t0,ti1,,tik).

  10. The equation ATt=0 expands to

    t0f(x)+iA(x)tigi(x)=0.
  11. t0 means that at least one of t0,ti1,,tik0.

  12. t0 means that t0,ti1,,tik0.

  13. Now, let ti=0 for all remaining iA(x).

  14. Then for active constraints, we have gi(x)=0 and for inactive constraints, we have ti=0.

  15. Hence for all constraints, we have tigi(x)=0.

  16. Hence there exist nonnegative scalar multipliers t0,t1,,tm0 which are not all zero such that

    t0f(x)+i=1mtigi(x)=0,tigi(x)=0,i=1,,m.

The key issue with Fritz-John conditions is that it allows t0=0. The case t0=0 is not particularly useful since it leads to

iA(x)tigi(x)=0.

with ti0 and not all ti being zero. This means that the gradients of the active constraints are linearly dependent.

  1. The case of linearly dependent gradients has nothing to do with the objective function.

  2. A number of points might satisfy the Fritz-John conditions and yet not be local minimum points.

  3. We can modify the Fritz-John conditions and insist that the gradients of the active constraints be linearly independent.

  4. This leads to what are called the KKIT conditions.

10.8.1.4. KKT Conditions#

Theorem 10.78 (KKT conditions)

Let x be a local minimizer of the optimization problem

minimize f(x)subject to gi(x)0,i=1,,m

where f,g1,,gm:VR are continuously differentiable functions over V. Let A(x) denote the set of active constraints:

A(x)={i|gi(x)=0}.

Assume that the gradients of the active constraints {gi(x)}iA(x) are linearly independent.

Then there exist nonnegative scalar multipliers t1,,tm0 which are not all zero such that

f(x)+i=1mtigi(x)=0,tigi(x)=0,i=1,,m.

Proof. This is a simple extension of Theorem 10.77.

By Fritz-John conditions, there exist nonnegative scalar multipliers r0,r1,,rm0 which are not all zero such that

r0f(x)+i=1mrigi(x)=0,rigi(x)=0,i=1,,m.
  1. If r0=0, then the set of gradients of active constraints will become linearly dependent.

  2. Hence, we must have r0>0.

  3. Let ti=rir0 for every i=1,,m.

  4. The result follows.

10.8.2. Inequality and Equality Constrained Problems#

We now generalize the KKT conditions to include problems of the form which include both inequality constraints and equality constraints

(10.19)#minimize f(x)subject to gi(x)0,i=1,,mhj(x)=0,j=1,,p

where f,g1,,gm,h1,,hp:VR are continuously differentiable functions over V.

The constraint set is given by

C={xV|gi(x)0,i=1,,m and hj(x)=0,j=1,,p}.
  1. An equality constraint must always be met at a feasible point. Hence there is no need to distinguish between active and inactive equality constraints. All inequality constraints are active.

  2. A constraint of the form hj(x)=0 can be converted into two inequality constraints hj(x)0 and hj(x)0.

Theorem 10.79 (KKT conditions for problems with smooth inequality and equality constraints)

Let x be a local minimizer of the optimization problem (10.19)

minimize f(x)subject to gi(x)0,i=1,,mhj(x)=0,j=1,,p

where f,g1,,gm,h1,,hp:VR are continuously differentiable functions over V. Let A(x) denote the set of active inequality constraints:

A(x)={i|gi(x)=0}.

Assume that the gradients of the active inequality constraints {gi(x)}iA(x) and all the equality constraints {hj(x)}j=1,,p are linearly independent.

Then there exist nonnegative scalar multipliers t1,,tm0 and real scalar multipliers r1,,rpR which are not all zero such that

f(x)+i=1mtigi(x)+j=1prjhj(x)=0,tigi(x)=0,i=1,,m.

10.8.2.1. KKT Points and Regular Points#

All the results up to this point define a set of necessary conditions in the form of a system of equations on the constraint functions and their gradients which must be satisfied by every local minimizer of the optimization problem. Besides the local minimizers, other points may also satisfy this system of equations. We now introduce the notion of KKT points which satisfy these equations.

Definition 10.29 (KKT point)

Consider the optimization problem (10.19) where f,g1,,gm,h1,,hp:VR are continuously differentiable functions over V.

A feasible point x is called a KKT point if there exist nonnegative scalar multipliers t1,,tm0 and real scalar multipliers r1,,rpR which are not all zero such that

f(x)+i=1mtigi(x)+j=1prjhj(x)=0,tigi(x)=0,i=1,,m.

All the necessary KKT conditions so far can be simply restarted as a local minimizer must be a KKT point if the gradients of active inequality constraints and all equality constraints at the point are linearly independent. We introduce the notion of regularity to capture the linear independence aspect.

Definition 10.30 (Regularity)

Consider the optimization problem (10.19) where f,g1,,gm,h1,,hp:VR are continuously differentiable functions over V.

A feasible point x is called regular if the gradients of the active inequality constraints {gi(x)}iA(x) and all the equality constraints {hj(x)}j=1,,p are linearly independent.

With the terminology of these definitions, Theorem 10.79 reduces to: if a regular point is a local minimizer then it must be a KKT point.

The notion of regularity is a kind of constraint qualification.

Example 10.15 (Regular points, KKT points, Local minimizers)

Consider the problem:

minimize x1+x2subject to x12+x22=1.

The problem structure

  1. The ambient space is R2.

  2. We have the cost function: f(x)=x1+x2.

  3. The cost function is smooth and convex, in fact linear.

  4. We don’t have any inequality constraints.

  5. We have one equality constraint.

  6. The equality constraint function is given by h(x)=x12+x221.

  7. The equality constraint is h(x)=0.

  8. h is a smooth function but it is not a convex function.

  9. The constraint set is a contour of h.

  10. The set of feasible points is given by C={x|h(x)=0}.

  11. The constraint set is not convex.

  12. Hence, it is not a convex optimization problem.

  13. However, the constraint set is compact.

  14. Hence, due to Theorem 3.85, the function f indeed attains a minimum as well as a maximum value on the C.

Gradients

  1. We have f(x)=(1,1).

  2. We have h(x)=(2x1,2x2).

Irregular points

  1. The KKT conditions are applicable only on regular points.

  2. We first identify the points which are irregular.

  3. The irregular points are points at which the gradients of all active inequality constraints and equality constraints are linearly dependent.

  4. Since, we have a single equality constraint, hence the irregular points are those points at which h(x)=0.

  5. This is given by a single point (x1,x2)=(0,0).

  6. But (0,0)C since h((0,0))=10.

  7. Hence, the constraint set C doesn’t contain any irregular points.

  8. In other words, every feasible point is a regular point.

  9. Hence the KKT conditions are necessary for local optimality.

  10. In other words, if a point is a local minimizer then it must be a KKT point.

KKT points

  1. To identify the KKT points, we form the Lagrangian

    L(x,r)=f(x)+rh(x)=x1+x2+r(x12+x221).
  2. The KKT conditions are:

    xL(x,r)=f(x)+rh(x)=0,h(x)=0.
  3. They expand to

    1+2rx1=0,1+2rx2=0,x12+x221=0.
  4. From the first two equations, we have r0 and x1=x2=12r.

  5. Plugging it into the third equation, we get

    (12r)2+(12r)2=1.
  6. This simplifies to r2=12.

  7. Hence, we have r=±12.

  8. This gives us two different KKT points (12,12) and (12,12).

The optimal solution

  1. By compactness, we know that the minimizer does exist.

  2. By regularity, we know that the minimizer must be a KKT point.

  3. We have two candidates available.

  4. We have f((12,12))=2 and f((12,12))=2.

  5. Hence the minimizer is given by

    x=(12,12)

    as it has the smaller value of f.

10.8.3. The Convex Case#

We now restrict our attention to the case where the cost and constraint functions are convex. In this case, the KKT conditions are also sufficient.

Theorem 10.80 (Sufficient KKT conditions for convex problems (smooth and convex cost and inequality constraints, affine equality constraints))

Let x be a feasible solution of the optimization problem

minimize f(x)subject to gi(x)0,i=1,,mhj(x)=0,j=1,,p

where f,g1,,gm:VR are continuously differentiable convex functions over V and h1,,hp:VR are affine functions.

Suppose that there exist nonnegative scalar multipliers t1,,tm0 and real scalar multipliers r1,,rpR which are not all zero such that

f(x)+i=1mtigi(x)+j=1prjhj(x)=0,tigi(x)=0,i=1,,m.

Then x is an optimal solution of the minimization problem above.

Proof. We are given that x is a feasible point satisfying the KKT conditions.

  1. Define the function

    s(x)=f(x)+i=1mtigi(x)+j=1prjhj(x).
  2. Since f and gi are convex and hj are affine, hence s is convex.

  3. Since all of them are continuously differentiable, hence s is also continuously differentiable.

  4. We have

    s(x)=f(x)+i=1mtigi(x)+j=1prjhj(x).
  5. We are given that s(x)=0.

  6. By Theorem 10.48, x is a minimizer of s over V.

  7. Hence s(x)s(x) for every xV.

  8. By hypothesis tigi(x)=0 for every i=1,,m.

  9. Hence i=1mtigi(x)=0.

  10. Since x is a feasible point, hence hj(x)=0 for every j=1,,p.

  11. Hence j=1phj(x)=0.

  12. Hence

    f(x)=f(x)+i=1mtigi(x)+j=1phj(x)=s(x)s(x)=f(x)+i=1mtigi(x)+j=1prjhj(x)f(x).

    The last inequality comes from the fact that ti0, gi(x)0 and hj(x)=0 for every feasible x.

  13. Hence for every feasible x, we have f(x)f(x).

  14. Hence x is an optimal point.

10.8.3.1. Slater’s Conditions#

In Theorem 10.79, we saw that KKT conditions become necessary for the local optimality of a feasible point only if the feasible point is regular. The regularity was a constraint qualification for the nonconvex smooth optimization problem.

In the convex case, a different condition than regularity can guarantee the necessity of KKT conditions. They are known as Slater’s conditions.

Definition 10.31 (Slater’s conditions)

Let g1,,gm:VR be convex. We say that the Slater’s condition is satisfied for a set of convex inequalities

gi(x)0,i=1,,m

if there exists a point x^V such that

gi(x^)<0,i=1,,m.

In other words, the Slater’s condition requires the existence of a point which strictly satisfies all the convex inequality constraints.

Slater’s condition is much easier to check since it requires the existence of a single point which strictly satisfies all the convex inequalities.

Theorem 10.81 (Necessity of KKT conditions under Slater’s condition)

Let x be an optimal solution of the optimization problem

minimize f(x)subject to gi(x)0,i=1,,m

where f,g1,,gm:VR are continuously differentiable functions over V. In addition, assume that g1,,gm are convex. Suppose that there exists a point x^V such that

gi(x^)<0,i=1,,m.

Then there exist nonnegative scalar multipliers t1,,tm0 which are not all zero such that

f(x)+i=1mtigi(x)=0,tigi(x)=0,i=1,,m.

Proof. This is also derived from Fritz-John conditions Theorem 10.77.

By Fritz-John conditions, there exist nonnegative scalar multipliers r0,r1,,rm0 which are not all zero such that

r0f(x)+i=1mrigi(x)=0,rigi(x)=0,i=1,,m.

We need to show that r0>0. After that we can pick ti=rir0 for every i=1,,m to get the desired result.

  1. For contradiction, assume that r0=0.

  2. Then we have

    i=1mrigi(x)=0.
  3. By the gradient inequality, we have

    gi(x)gi(x)+xx,gi(x),i=1,,m.
  4. Specifically, for the point x^, we have

    0>gi(x^)gi(x)+x^x,gi(x),i=1,,m.
  5. Multiplying the i-th inequality by ri0 and summing over i=1,,m, we get

    0>i=1mrigi(x)+x^x,i=1mrigi(x).

    This inequality is strict since not all ri are 0 and r0=0.

  6. Since i=1mrigi(x)=0, it reduces to

    0>i=1mrigi(x).
  7. But rigi(x)=0 for every i=1,,m. Hence we must have

    i=1mrigi(x)=0.
  8. A contradiction. Hence r0>0 must be true.

10.8.4. A Tangent Cones Perspective#

Consider an optimization problem of the form

minimize f(x)subject to hi(x)=0,i=1,,m.

Assume that f:VR and hi:VR for i=1,,m are smooth functions.

The constraint set can be written as

C={x|hi(x)=0,i=1,,m}.
  1. Assume that x is a minimizer of this problem.

  2. By Theorem 10.72, we must have

    f(x)TC(x).

Let us now motivate the Lagrange multipliers using a simple problem of linear inequalities.

  1. Consider the specific case where

    hi(x)=x,aibi.
  2. Consider a matrix A which consists of a1T,,amT as rows.

  3. Put together b1,,bm as a vector b.

  4. Then the constraint set can be expressed as

    C={x|Ax=b}.
  5. Assume that xC is the local minimizer of f.

  6. By Example 10.10,

    TC(x)={x|Ax=0}

    which is the nullspace of A.

  7. By Example 9.18,

    TC(x)=TC(x)=rangeAT.
  8. Hence, by the optimality condition, we have

    f(x)rangeAT.
  9. Hence, there exists tRm such that

    f(x)+ATt=0.
  10. This is equivalent to

    f(x)+i=1mtiai=0.
  11. Since hi(x)=ai, hence this is equivalent to

    f(x)+i=1mtihi(x)=0.

This motivates us to define the Lagrangian of f as

L(x,t)=f(x)+i=1mtihi(x).

The basic Lagrangian theorem states that under suitable conditions, if x is a local minimum of f under the constraint set C then there exist scalars t1,,tm called Lagrangian multipliers such that

f(x)+i=1mtihi(x)=0.
  1. There are n unknowns in x and m unknowns in t.

  2. Thus, we have a total of m+n unknowns.

  3. The relation above gives us a system of n equations.

  4. Together with the m equalities hi(x)=0, we have a system of m+n equations with m+n unknowns.

  5. Thus, the problem of solving a constrained optimization problem is transformed into a problem of solving a system of nonlinear equations.

  6. Now, suppose that the tangent cone at x can be written as

    TC(x)={x|x,hi(x)=0,i=1,,m}.
  7. Letting ai=hi(x) and following the argument above, we must have

    f(x)+i=1mtihi(x)=0.
  8. Thus, if the tangent cone can be represented as above, then if x is a local minimizer, then the Lagrangian multipliers t1,,tm must exist.

  9. The admittance of Lagrange multipliers at a given xC is the property of the constraint set C. It is not a property of the optimization problem itself. If C admits Lagrange multipliers at x, then there exists a Lagrange multiplier vector at x for every smooth cost function f if x is a local minimizer of f.

10.8.5. Enhanced Fritz-John Conditions#

We now introduce a more difficult optimization problem

(10.20)#minimize f(x)subject to xC

where the constraint set C consists of equality and inequality constraints as well as an additional abstract set constraint X:

(10.21)#C=X{x|gi(x)0,i=1,,m}{x|hj(x)=0,j=1,,p}.

We assume that f, gi and hj are smooth functions from V to R and X is a nonempty closed set.

We also introduce g:VRm as

g(x)=(g1(x),,gm(x))

and h:VRp as

h(x)=(h1(x),,hp(x)).

For every i=1,,m, we define

gi+(x)=max{0,gi(x)}.

Correspondingly, we define

g+(x)=(g1+(x),,gm+(x)).

Definition 10.32 (Lagrangian function)

For the optimization problem (10.20), the Lagrangian function is defined as

L(x,t,r)=f(x)+i=1mtigi(x)+j=1prjhj(x)

where tRm and rRp.

10.8.5.1. Lagrange Multiplier Vectors#

Definition 10.33 (Lagrange multiplier vectors)

We say that a constraint set C as defined in (10.21) admits Lagrange multipliers at a point xC if for every smooth cost function f for which x is a local minimum of the problem (10.20), there exist vectors t=(t1,,tm) and r=(r1,,rp) that satisfy the following conditions:

(10.22)#y,(f(x)+i=1mtigi(x)+j=1prjhj(x))0,yTX(x),
(10.23)#ti0,i=1,,m,
(10.24)#ti=0,i|gi(x)<0.

A pair (t,r) satisfying these conditions is called a Lagrange multiplier vector corresponding to f and x.

  1. We also call the Lagrange multiplier vector as simply Lagrange multipliers.

  2. The condition (10.23) is the nonnegativity condition of the Lagrangian multipliers for the inequality constraints.

  3. The condition (10.24) is the complementary slackness condition.

  4. From (10.22), we can see that for each yTX(x), the set of Lagrange multiplier vectors corresponding to a given f and x is a closed half-space.

  5. Hence, the set of Lagrange multiplier vectors corresponding to a given f and x is an intersection of closed half spaces.

  6. Hence the set of Lagrange multiplier vectors is closed and convex. Although it may possibly be empty.

  7. The condition (10.22) is referred to as the Lagrangian stationarity condition. It states that the gradient of the Lagrangian function is nonnegative along all tangent directions at the local minimizer.

  8. It can be viewed as the necessary condition for x to be a local minimizer of the function L(x,t,r). See Theorem 10.72.

  9. When X=V, then TX(x)=V, and this condition reduces to

    f(x)+i=1mtigi(x)+j=1prjhj(x)=0.
  10. When X is convex, then (10.22) reduces to

    xx,(f(x)+i=1mtigi(x)+j=1prjhj(x))0,xX.
    1. X is convex. Hence α(xx) for α>0 is a feasible direction for every xX.

    2. Hence, if this inequality holds, then (10.22) holds for every dFC(x).

    3. Since X is convex, hence clFC(x)=TC(x).

    4. If the inequality (10.22) holds for every dFC(x), then it will also hold for a closure point of FC(x).

    5. In other words, if x,a0 for every xA, then for any convergent sequence {xk} of A, we have limkxk,a0. Hence the inequality holds for every closure point also.

  11. The Lagrangian stationary condition can also be equivalently written as

    (f(x)+i=1mtigi(x)+j=1prjhj(x))TX(x).

    In other words, the negative gradient of the Lagrangian function must lie in the polar cone of the tangent cone of X at x.

  12. Recall from Theorem 10.68 that

    TX(x)N~X(x)

    where N~X(x) is the normal cone of X at x (in the sense of [9]).

  13. Hence the negative gradient of the Lagrangian function must be a normal direction (Definition 10.26) at x.

  14. If X is regular at x (Theorem 10.70), then we also have

    TX(x)=N~X(x).

10.8.5.2. Enhanced Fritz-John Conditions#

We are now ready to present the Enhanced Fritz-John conditions as the necessary conditions for the existence of the local minimizer of the problem (10.20).

Theorem 10.82 (Enhanced Fritz-John conditions)

Let x be a local minimizer of the problem (10.20)-(10.21). Then there exist scalars t0,t1,,tm,r1,,rp satisfying the following conditions.

  1. The gradients satisfy the relation:

    (t0f(x)+i=1mtigi(x)+j=1prjhj(x))N~X(x).
  2. Nonnegativity: ti0 for every i=0,,m.

  3. The scalars t0,t1,,tm,r1,,rp are not equal to 0.

  4. Complementary violation condition: If the index set IJ is not empty, where

    I={i>0|ti>0},J={j|rj0},

    there exists a sequence {xk} of X that converges to x and is such that for all k

    f(xk)<f(x),tigi(xk)>0iI,rjhj(xk)>0jJ,gi+(xk)=o(w(xk))iI,|hj(xk)|=o(w(xk))jJ,

    where gi+(x)=max{0,gi(x)} and

    w(x)=min{miniIgi+(x),minjJ|hj(x)|}.

Proof. The proof is based on a quadratic penalty function approach. For each kN, we define a penalty function as

Fk(x)=f(x)+k2i=1m(gi+(x))2+k2j=1p(hj(x))2+12xx2.
  1. At a feasible point gi+(x)=0 for every i=1,,m since gi(x)0.

  2. At a feasible point hj(x)=0 for every j=1,,p.

  3. Hence Fk(x)=f(x)+12xx2 at every feasible point xC.

  4. The term 12xx2 is a quadratic penalty term penalizing how far we are from the local minimum x.

  5. The term 12(gi+(x))2 is a penalty term denoting how strongly the i-th inequality constraint is violated.

  6. The term 12(hj(x))2 is a penalty term denoting how strongly the j-th equality constraint is violated.

  7. We have Fk(x)f(x) for every xV.

  8. At the local minimizer, we have

    Fk(x)=f(x).
  9. Fk is a continuously differentiable function.

  10. We note that

    (gi+(x))2=2gi+(x)gi(x);(hj(x))2=2hj(x)hj(x).
  11. Hence

    Fk(x)=f(x)+ki=1mgi+(x)gi(x)+kj=1phj(x)hj(x)+(xx).

We now introduce the penalized problem

minimize Fk(x)subject to xXS

where

S={x|xxϵ}

and ϵ>0 is a positive scalar such that f(x)f(x) for all feasible xC with xS. Such a positive scalar exists since x is a local minimizer of f.

  1. The set S is compact and the set X is closed.

  2. Hence XS is compact.

  3. Hence there exists an optimal minimizer of the above problem for every k.

  4. Let xk be a minimizer of Fk(x) over XS.

  5. Then we have Fk(xk)Fk(x) for every k since xXS.

  6. This is equivalent to

    Fk(xk)=f(xk)+k2i=1m(gi+(xk))2+k2j=1p(hj(xk))2+12xkx2f(x)

    for every k.

  7. Since f is continuous and XS is compact, hence f(xk) is bounded for every k.

  8. It follows that

    limkgi+(xk)=0,i=1,,m,limk|hj(xk)|=0,j=1,,p,

    otherwise the term on the L.H.S. of the previous inequality will become unbounded and tend to as k.

  9. Hence every limit point x~ of the sequence {xk} is feasible; i.e., x~C.

  10. Also, since XS is compact, hence every limit point x~ of the sequence {xk} belongs to XS.

  11. From the inequality Fk(xk)f(x), we can also see that

    f(xk)+12xkx2f(x)k.
  12. Taking the limit as k, we obtain

    f(x~)+12x~x2f(x)

    for every limit point x~.

  13. Since x~S (near local minimizer) and x~C (feasible), we have

    f(x)f(x~).
  14. Combining with the previous inequality, it gives us

    12x~x2=0.
  15. Hence, we must have x=x~. Hence, the sequence {xk} has a only one limit point.

  16. Thus, the sequence {xk} converges to x.

  17. By the definition of the closed ball S, x is an interior point of S.

  18. Since limxk=x it follows that xk is an interior point of S for every k greater than some k0.

  19. Hence, due to Theorem 10.72,

    Fk(xk)TC(xk)

    holds true for every k>k0.

  20. We can write Fk(xk) as

    Fk(xk)=f(xk)+i=1mχikgi(xk)+j=1pξjkhj(xk)+(xkx)

    where χik=kgi+(xk) and ξjk=khj(xk).

  21. Note that by definition χik0 for every i=1,,m.

  22. Accordingly, we have

    (f(xk)+i=1mχikgi(xk)+j=1pξjkhj(xk)+(xkx))TC(xk)

    for every k>k0.

  23. We define

    δk=1+i=1m(χik)2+j=1p(ξjk)2.
  24. By definition δk1.

  25. We now introduce

    t0k=1δk,tik=χikδk,i=1,,m,rjk=ξjkδk,j=1,,p.
  26. By dividing by δk in the previous relation, we obtain

    zk=(t0kf(xk)+i=1mtikgi(xk)+j=1prjkhj(xk)+1δk(xkx))TC(xk)

    for every k>k0 since TC(xk) is a cone.

  27. Note that by construction, we have

    (t0k)2+i=1m(tik)2+j=1p(rjk)2=1.
  28. Hence the sequence {(t0k,t1k,,tmk,r1k,,rpk)} is a bounded sequence of R1+m+p.

  29. Hence, it must have a subsequence that converges to some limit {(t0,t1,,tm,r1,,rp)}.

  30. Let

    z=(t0f(x)+i=1mtigi(x)+j=1prjhj(x)).
  31. Along this subsequence, we have xkx, zkz and zkTX(xk) for every k>k0.

  32. Hence, following the definition of the normal cone (Definition 10.27), after disregarding the first k0 terms of the sequences,

    zN~X(x).

(2) Nonnegativity

  1. Since δk1, hence 0<t0k=1δk1.

  2. Hence t00.

  3. Since χik0 for every i=1,,m, hence tik0 for every i=1,,m.

  4. Hence ti0 for every i=1,,m.

(3) Not all zero

  1. We have established that for every k,

    (t0k)2+i=1m(tik)2+j=1p(rjk)2=1.
  2. Taking the limit over the subsequence, we have

    (t0)2+i=1m(ti)2+j=1p(rj)2=1.
  3. Hence, all of them cannot be zero.

(4) Complementary violation conditions

  1. Assume that IJ is not empty.

  2. Let K={k1,k2,,} denote the index set of the convergent subsequence of {(t0k,t1k,,tmk,r1k,,rpk)}.

  3. Let iI.

    1. We have ti>0.

    2. Then for all sufficient large k in K, we have tikti>0 as tik should be sufficiently close to ti.

    3. From the definitions of tik and χik, we must have tikgi+(xk)>0 for all sufficient large k in K.

    4. By definition, gi+(xk)0 and gi+(xk)>0 when gi(xk)>0.

    5. Hence we must have tikgi(xk)>0 for all sufficient large k in K.

  4. Let jJ.

    1. We have rj0.

    2. Then for all sufficiently large k in K, we have rjkrj. Hence rjk0 and rjk has the same sign as rj.

    3. Hence for all sufficiently large k in K, we have rjkrj>0.

    4. from the definitions of rjk and ξjk, we see that for all sufficiently large k in K, we must have rjkhj(xk)>0.

  5. Hence for all sufficiently large k in K, we have

    tikgi(xk)>0iI and rjkhj(xk)>0jJ.
  6. This means that gi+(xk)>0 for every iI and hj(xk)0 for every jJ for all sufficiently large k in K.

  7. This establishes that for all sufficiently large k in K, we have xkx. Otherwise the inequality and equality constraints cannot be violated.

  8. Recall that we established that

    Fk(xk)=f(xk)+k2i=1m(gi+(xk))2+k2j=1p(hj(xk))2+12xkx2f(x)

    for every k.

  9. Hence for all sufficiently large k in K, we must have

    f(xk)<f(x)

    since at least one of (gi+(xk))2 and (hj(xk))2 is positive for every sufficiently large k.

  10. We form the desired sequence {xl} satisfying all the necessary criteria by picking up all the entries corresponding to sufficiently large k in K from {xk}.

  11. It remains to show the magnitude property of terms hj(xk) and gi+(xk).

  12. For the remaining argument, without loss of generality, we shall assume that {xk} is the subsequence chosen above.

TBD the argument below is not complete.

  1. We see that

    (δk)2=1+i=1m(χik)2+j=1p(ξjk)2=1+i=1m(kgi+(xk))2+j=1p(khj(xk))21+kw(xk)2.
  2. For every iI, we have ti=0.

  3. Hence limktik=0.

  4. Hence limkχikδk=0.

  5. Hence

    limk(χik)21+i=1m(χik)2+j=1p(ξjk)2=0.
  6. Hence

    limk(kgi+(xk))21+i=1m(kgi+(xk))2+j=1p(hj(xk))2=0.
  7. We have

    0(kgi+(xk))21+i=1m(kgi+(xk))2+j=1p(hj(xk))2(kgi+(xk))21+kw(xk)2.

The complementary violation condition (CV) in (4) is stronger than the traditional complementary slackness (CS) condition.

  1. If ti>0 for some inequality constraint gi, then the inequality gi(x)0 must be violated arbitrarily close to x.

  2. In contrast, the CS condition only states that gi(x)=0. It doesn’t say anything about the violation of the constraint near the local optimal.

  3. CV states that there exists a sequence of X converging to x such that the equality and inequality constraints are violated at every point in the sequence corresponding to the nonzero Lagrangian multipliers.

  4. This sequence also has the property that the amount of violation in the constraints corresponding to zero Lagrange multipliers is negligible via the following conditions.

    gi+(xk)=o(w(xk))iI,|hj(xk)|=o(w(xk))jJ.

Corollary 10.6 (Enhanced Fritz-John conditions regular case)

In Theorem 10.82, if X is regular at x, then

(t0f(x)+i=1mtigi(x)+j=1prjhj(x))TX(x).

Equivalently

y,(t0f(x)+i=1mtigi(x)+j=1prjhj(x))0yTX(x).

10.8.5.3. Lagrangian Stationarity#

Observation 10.10 (Lagrangian stationarity)

In Theorem 10.82, assume that X is regular at x. Further assume that t0>0 is positive.

By normalization of the scalars, the gradient inequality reduces to

(f(x)+i=1mtigi(x)+j=1prjhj(x))TX(x).

Equivalently

y,(f(x)+i=1mtigi(x)+j=1prjhj(x))0yTX(x).

This is identical to the Lagrangian stationarity condition (1) in Definition 10.33.

Hence if X is regular at x and t0=1, then the vector (t,r)={t1,,tm,r1,,rm} is a Lagrangian multiplier vector that satisfies the CV condition.

10.8.5.4. LICQ#

Observation 10.11 (Linear independence constraint qualification)

In Theorem 10.82, assume that X=V. Then X is normal and TX(x)=V.

  1. Condition (1) reduces to

    t0f(x)+i=1mtigi(x)+j=1prjhj(x)=0.
  2. Further, add the linear independence constraint qualification (LICQ) assumption at x; i.e., the gradients gi(x),iA(x) and hj(x),j=1,,p are linearly independent;

  3. Then t00 must be true.

  4. Otherwise, the linear independence condition would be violated.

  5. It follows that Lagrangian stationarity is satisfied and there exists a Lagrangian multiplier vector.

  6. We thus obtain the classical Lagrangian multiplier theorem based on KKT conditions Theorem 10.79.

10.8.6. Informative Lagrange Multipliers#

Lagrange multiplier vectors may provide a significant amount of sensitivity information by indicating which constraints to violate in order to effect a reduction in objective function value. In general, if the multipliers are informative, then the larger the coefficient for a constraint, the more sensitive is the objective function to the cost.

Definition 10.34 (Informative Lagrange multiplier vector)

A Lagrange multiplier vector (as defined in Definition 10.33 for the problem (10.20)) is called an informative multiplier vector if it satisfies the complementary violation condition of enhanced Fritz-John conditions Theorem 10.82.

Definition 10.35 (Minimal Lagrange multiplier vector)

Let x be a local minimum of the problem (10.20). Let (t,r) be a Lagrange multiplier vector (as defined in Definition 10.33). Let I={i|ti>0} and J={j|rj0}. Then (t,r) is called a minimal Lagrange multiplier vector if there is no other Lagrange multiplier vector with a support that is strictly contained in IJ.

Definition 10.36 (Strong Lagrange multiplier vector)

Let x be a local minimum of the problem (10.20). Let (t,r) be a Lagrange multiplier vector (as defined in Definition 10.33). Let I={i|ti>0} and J={j|rj0}. Then (t,r) is called a strong Lagrange multiplier vector if it satisfies the following condition:

If the set IJ is nonempty, then there exists a sequence {xk} of X such that xkx and for all k we have:

f(xk)<f(x);tigi(xk)>0,iI;rjhj(xk)>0,jJ.

This condition is slightly weaker than the complementary violation condition. It makes no provisions for ensuring that the constraints corresponding to zero multipliers have negligibly small violation at each xk.

  1. Every informative multiplier vector is a strong multiplier vector.

  2. Every minimal multiplier vector is a strong multiplier vector.

  3. Minimal multipliers are not necessarily informative.

  4. Some multiplier vectors are both informative and minimal.

Theorem 10.83 (Existence of informative multipliers)

Let x be a local minimum of the problem (10.20). Assume that the tangent cone TX(x) is convex and that the set of Lagrange multiplier vectors is nonempty. Then

  1. The set of informative Lagrange multiplier vectors is nonempty.

  2. The Lagrange multiplier vector which has the minimum norm is informative.

  3. Every minimal Lagrange multiplier vector is strong.

10.8.7. Pseudonormality#

The key issue with enhanced Fritz-John conditions is that it allows for t0 to be zero.

  1. We now introduce the notion of pseudonormal points in the constraint set C.

  2. We show that if a local minimizer x is a pseudonormal point, then t00.

  3. Hence, at pseudonormal points, the constraint set C admits Lagrange multiplier vectors.

Definition 10.37 (Pseudonormal vector)

We say that a feasible vector x for the problem (10.20) is pseudonormal if one cannot find scalars t1,,tm,r1,,rp and a sequence {xk} of X such that

  1. The gradients of constraint functions at x satisfy:

    (i=1mtigi(x)+j=1prjhj(x))N~X(x).
  2. Inequality multipliers are nonnegative: ti0 for every i=1,,m.

  3. Multipliers for inactive inequality constraints are zero. ti=0 for every iA(x) where A(x) denotes the set of active inequality constraints at x;

    A(x)={i1,,m|gi(x)=0}.
  4. The sequence {xk} converges to x (i.e., xkx) and

    i=1mtigi(xk)+j=1prjhj(xk)>0,k.

Theorem 10.84 (Pseudonormality and EFJ scalars)

Let x be a local minimizer for the problem (10.20). Let t0,t1,,tm,r1,,rp be the enhanced Fritz-John scalars satisfying the conditions in Theorem 10.82.

If x is a pseudonormal vector of the constraint set C defined in (10.21), then t00.

Proof. We shall show this by contradiction.

  1. Assume that x is a local minimizer for the problem (10.20).

  2. Then there exist EFJ scalars t0,t1,,tm,r1,,rp satisfying the conditions in Theorem 10.82.

  3. For contradiction assume that x is a pseudonormal vector and t0=0.

  4. Since t0,t1,,tm,r1,,rp are EFJ scalars, hence

    (i=1mtigi(x)+j=1prjhj(x))N~X(x).
  5. ti0 for every i=1,,m.

  6. The scalars t1,,tm,r1,,rp are not equal to 0.

  7. Hence the index set IJ is not empty, where

    I={i>0|ti>0},J={j|rj0}.
  8. Hence there exists a sequence {xk} of X that converges to x and is such that for all k

    f(xk)<f(x),tigi(xk)>0iI,rjhj(xk)>0jJ,gi+(xk)=o(w(xk))iI,|hj(xk)|=o(w(xk))jJ,

    where gi+(x)=max{0,gi(x)} and

    w(x)=min{miniIgi+(x),minjJ|hj(x)|}.
  9. Hence the sequence {xk} with xkx satisfies

    i=1mtigi(xk)+j=1prjhj(xk)>0k

    since at least one of these multipliers is nonzero.

  10. Multiplier for every inactive inequality constraint must be zero. Otherwise, the complementary violation condition will be violated.

  11. Hence t1,,tm,r1,,rp are the scalars and {xk} is a sequence of X satisfying all the conditions in Definition 10.37.

  12. Hence x cannot be a pseudonormal vector of C.

10.8.8. Constraint Qualifications#

We now introduce different constraint qualifications which ensure that a point x is pseudonormal.

10.8.8.1. Linearly Independent Constraint Qualifications#

Criterion 10.1 (CQ1: linearly independent constraint qualifications)

X=V and x satisfies LICQ; i.e., the active inequality constraint gradients gi(x) for all iinA(x) and the equality constraint gradients hj(x) for all j=1,,p are linearly independent.

10.8.8.2. AHU/MF Constraint Qualifications#

10.8.8.3. Affine Equality and Concave Inequality Constraints#