10.8. Lagrange Multipliers#
Main references for this section are [6, 9, 17].
The material in this section builds on the material from previous sections. While the material in Constrained Optimization I and Constrained Optimization II doesn’t make specific assumptions on the structure of the constraint set (beyond, say, convexity and closedness), the material in Linear Constraints deals with a specific structure where the constraint set consists of a system of linear inequalities and equalities. This section focuses on the case where the constraint set consists of a system of smooth inequalities and equalities.
The necessary and/or sufficient conditions for the optimization problems presented in this section assert the existence of a set of nonnegative (for inequality constraints) and real (for equality constraints) scalars, known as Lagrange multipliers, satisfying a specific system of equations.
We generalize the linear inequality and equality constraints in Linear Constraints to allow for smooth inequality and equality constraints.
We first consider problems involving minimization of a smooth function over a set of smooth inequalities.
We describe the notion of feasible descent directions.
We show that at local minimizers, there are no feasible descent directions.
We then develop the Fritz-John necessary conditions for local optimality.
We add further constraint qualifications to develop the KKT necessary conditions for local optimality.
We then consider problems involving minimization of a smooth function over a set of smooth inequalities and equalities. We present the corresponding KKT conditions for local optimality.
We then add convexity in the mix for the cost function and constraint functions.
10.8.1. Inequality Constrained Problems#
We start by developing the KKT conditions for the problem of minimizing a smooth function over a set of inequality constraints.
The problem is given as
where
The constraint set is given by
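For concreteness, a standard form for such a problem (the symbols used here are illustrative) is

$$
\min_{x \in \mathbb{R}^n} \; f(x) \quad \text{subject to} \quad g_i(x) \leq 0, \quad i = 1, \dots, m,
$$

where $f, g_1, \dots, g_m : \mathbb{R}^n \to \mathbb{R}$ are continuously differentiable, with constraint set

$$
C = \{ x \in \mathbb{R}^n \mid g_i(x) \leq 0, \; i = 1, \dots, m \}.
$$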
10.8.1.1. Feasible Descent Directions#
Recall from Definition 10.22
that a descent direction is a direction along
which the directional derivative of the cost
function is negative; i.e.,
Recall from Definition 10.23
that given a vector
We now introduce the notion of a feasible descent direction.
(Feasible descent direction)
Consider the problem of minimizing a cost function
In other words, a feasible descent direction
at
If
is continuously differentiable then we must have . If
is convex, then we just need to have . By virtue of convexity of , every .
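In symbols (with the illustrative notation above), a nonzero direction $d$ is a feasible descent direction of $f$ over $C$ at $x \in C$ if $d$ is a feasible direction of $C$ at $x$ and

$$
\nabla f(x)^T d < 0.
$$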
(Local minimum and feasible descent directions)
Consider the problem of minimizing a cost function
Proof. We prove this by contradiction.
Let
be a local minimizer.Assume that
is a feasible descent direction.Then there is an
such that for every .Also,
.By Definition 9.70, there exists
such thatEquivalently,
for all .Let
.Then for every
, we have and .This contradicts the hypothesis that
is a local minimizer.Hence, there are no feasible descent directions at
.
10.8.1.2. Necessary Conditions for Local Optimality#
Revisiting the problem (10.18):
A constraint
is called active at if .A constraint
is called inactive at if .The set of active constraints at a point
is denoted by
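With the illustrative notation $g_i(x) \le 0$ for the inequality constraints, the set of active constraints at a feasible point $x^*$ can be written as

$$
I(x^*) = \{ i \in \{1, \dots, m\} \mid g_i(x^*) = 0 \}.
$$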
We first restate Lemma 10.7 for the inequality constrained minimization problem (10.18). The key idea is the lack of feasible descent directions at a local minimizer.
will indicate that is a feasible direction. If a constraint is inactive, then it remains valid in a neighborhood of the local minimizer due to continuity of
. If a constraint is active, then moving in some directions will violate the constraint while moving in others will keep the constraint valid.
In particular, if
, then moving along keeps the -th active constraint valid. Hence, along a feasible descent direction, the directional derivatives of the cost function and the active constraint functions must be negative.
(Local minimum and feasible descent directions for inequality constraints)
Let
where
Then there doesn’t exist a vector
This result states that local optimality implies the infeasibility of a certain system of strict inequalities.
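Written out in the illustrative notation, the system in question is: there is no $d \in \mathbb{R}^n$ such that

$$
\nabla f(x^*)^T d < 0 \quad \text{and} \quad \nabla g_i(x^*)^T d < 0 \quad \text{for every } i \in I(x^*).
$$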
Proof. We prove this by contradiction.
Let
be a local minimizer.Assume that
be a direction satisfying the constraints above.Then there exists an
such that for every .Similarly, there exist
such that for every for every .Let
.Then for every
, we have and for every .By the continuity of
for all , and the fact that for every , there exists a such that for every , for every .Hence, we conclude that for every
, we have and for every .But this contradicts the local optimality of
.
10.8.1.3. Fritz-John Conditions#
Recall that Farkas’ and Gordan’s theorems of the alternative present different pairs of systems where if one system is infeasible then the other must be feasible and vice versa. We can apply Gordan’s theorem to the infeasible system of strict inequalities in Lemma 10.8 to develop the so-called Fritz-John conditions.
(Fritz-John conditions)
Let
where
Proof. From Lemma 10.8, we have that the following system is infeasible.
Let
.Let there be an isomorphic mapping between
and .For every
, let also denote the corresponding vector in .Assume that there are
active constraints at .Construct a
matrix as follows:where
denote the indices of active constraints.Then the above system of strict inequalities can be stated as
This system of inequalities is infeasible.
Then by Gordan's theorem, the system
where
is feasible.We write
.The equation
expands to means that at least one of . means that .Now, let
for all remaining .Then for active constraints, we have
and for inactive constraints, we have .Hence for all constraints, we have
.Hence there exist nonnegative scalar multipliers
which are not all zero such that
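In summary, with the illustrative notation, the Fritz-John conditions assert the existence of scalars $\lambda_0, \lambda_1, \dots, \lambda_m \geq 0$, not all zero, such that

$$
\lambda_0 \nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^*) = 0
\quad \text{and} \quad
\lambda_i g_i(x^*) = 0, \quad i = 1, \dots, m.
$$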
The key issue with the Fritz-John conditions is that
they allow
with
The case of linearly dependent gradients has nothing to do with the objective function.
A number of points might satisfy the Fritz-John conditions and yet not be local minimum points.
We can modify the Fritz-John conditions and insist that the gradients of the active constraints be linearly independent.
This leads to what are called the KKT conditions.
10.8.1.4. KKT Conditions#
(KKT conditions)
Let
where
Assume that the gradients of the active constraints
Then there exist nonnegative scalar multipliers
Proof. This is a simple extension of Theorem 10.77.
By Fritz-John conditions, there exist nonnegative scalar multipliers
If
, then the set of gradients of active constraints will become linearly dependent.Hence, we must have
.Let
for every .The result follows.
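In summary, with the illustrative notation, if the gradients of the active constraints at a local minimizer $x^*$ are linearly independent, then there exist scalars $\lambda_1, \dots, \lambda_m \geq 0$ such that

$$
\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^*) = 0
\quad \text{and} \quad
\lambda_i g_i(x^*) = 0, \quad i = 1, \dots, m.
$$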
10.8.2. Inequality and Equality Constrained Problems#
We now generalize the KKT conditions to problems which include both inequality constraints and equality constraints
where
The constraint set is given by
An equality constraint must always be met at a feasible point. Hence there is no need to distinguish between active and inactive equality constraints. All equality constraints are active.
A constraint of the form
can be converted into two inequality constraints and .
(KKT conditions for problems with smooth inequality and equality constraints)
Let
where
Assume that the gradients of the active inequality constraints
Then there exist nonnegative scalar multipliers
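In summary, with illustrative notation ($g_i(x) \le 0$ for the inequality constraints and $h_j(x) = 0$ for the equality constraints), these KKT conditions assert the existence of scalars $\lambda_i \geq 0$ and $\mu_j \in \mathbb{R}$ such that

$$
\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x^*) + \sum_{j=1}^{p} \mu_j \nabla h_j(x^*) = 0
\quad \text{and} \quad
\lambda_i g_i(x^*) = 0, \quad i = 1, \dots, m.
$$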
10.8.2.1. KKT Points and Regular Points#
All the results up to this point define a set of necessary conditions in the form of a system of equations on the constraint functions and their gradients which must be satisfied by every local minimizer of the optimization problem. Besides the local minimizers, other points may also satisfy this system of equations. We now introduce the notion of KKT points which satisfy these equations.
(KKT point)
Consider the optimization problem (10.19)
where
A feasible point
All the necessary KKT conditions so far can be simply restated as follows: a local minimizer must be a KKT point if the gradients of the active inequality constraints and all equality constraints at the point are linearly independent. We introduce the notion of regularity to capture the linear independence aspect.
(Regularity)
Consider the optimization problem (10.19)
where
A feasible point
With the terminology of these definitions, Theorem 10.79 reduces to: if a regular point is a local minimizer then it must be a KKT point.
The notion of regularity is a kind of constraint qualification.
(Regular points, KKT points, Local minimizers)
Consider the problem:
The problem structure
The ambient space is
.We have the cost function:
.The cost function is smooth and convex, in fact linear.
We don’t have any inequality constraints.
We have one equality constraint.
The equality constraint function is given by
.The equality constraint is
. is a smooth function but it is not a convex function.The constraint set is a contour of
.The set of feasible points is given by
.The constraint set is not convex.
Hence, it is not a convex optimization problem.
However, the constraint set is compact.
Hence, due to Theorem 3.85, the function
indeed attains a minimum as well as a maximum value on the constraint set.
Gradients
We have
.We have
.
Irregular points
The KKT conditions are applicable only at regular points.
We first identify the points which are irregular.
The irregular points are points at which the gradients of all active inequality constraints and equality constraints are linearly dependent.
Since we have a single equality constraint, the irregular points are those points at which
.This is given by a single point
.But
since .Hence, the constraint set
doesn’t contain any irregular points.In other words, every feasible point is a regular point.
Hence the KKT conditions are necessary for local optimality.
In other words, if a point is a local minimizer then it must be a KKT point.
KKT points
To identify the KKT points, we form the Lagrangian
The KKT conditions are:
They expand to
From the first two equations, we have
and .Plugging it into the third equation, we get
This simplifies to
.Hence, we have
.This gives us two different KKT points
and .
The optimal solution
By compactness, we know that the minimizer does exist.
By regularity, we know that the minimizer must be a KKT point.
We have two candidates available.
We have
and .Hence the minimizer is given by
as it has the smaller value of
.
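As a sketch of how such a KKT system can be solved mechanically, the snippet below assumes an illustrative problem with the same structure as this example (a linear cost over a circle): minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 = 1$. The actual functions of the example above may differ.

```python
# Hedged sketch: assumed illustrative problem
#   minimize   f(x) = x1 + x2              (smooth, linear cost)
#   subject to h(x) = x1^2 + x2^2 - 1 = 0  (smooth, nonconvex constraint set)
# Solve the KKT system  grad_x L(x, mu) = 0,  h(x) = 0  symbolically.
import sympy as sp

x1, x2, mu = sp.symbols('x1 x2 mu', real=True)
f = x1 + x2
h = x1**2 + x2**2 - 1

L = f + mu * h  # Lagrangian
kkt_system = [sp.diff(L, x1), sp.diff(L, x2), h]

for sol in sp.solve(kkt_system, [x1, x2, mu], dict=True):
    print(sol, '  f =', f.subs(sol))
# Two KKT points are found; since the constraint set is compact,
# the one with the smaller objective value is the global minimizer.
```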
10.8.3. The Convex Case#
We now restrict our attention to the case where the cost and constraint functions are convex. In this case, the KKT conditions are also sufficient.
(Sufficient KKT conditions for convex problems (smooth and convex cost and inequality constraints, affine equality constraints))
Let
where
Suppose that there exist nonnegative scalar multipliers
Then
Proof. We are given that
Define the function
Since
and are convex and are affine, hence is convex.Since all of them are continuously differentiable, hence
is also continuously differentiable.We have
We are given that
.By Theorem 10.48,
is a minimizer of over .Hence
for every .By hypothesis
for every .Hence
.Since
is a feasible point, hence for every .Hence
.Hence
The last inequality comes from the fact that
, and for every feasible .Hence for every feasible
, we have .Hence
is an optimal point.
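The chain of inequalities in the proof can be summarized as follows (illustrative notation, with $\lambda_i$ the multipliers of the inequality constraints $g_i(x) \le 0$ and $\mu_j$ the multipliers of the affine equality constraints $h_j(x) = 0$): for every feasible $x$,

$$
f(x^*) = f(x^*) + \sum_{i} \lambda_i g_i(x^*) + \sum_{j} \mu_j h_j(x^*)
\;\leq\; f(x) + \sum_{i} \lambda_i g_i(x) + \sum_{j} \mu_j h_j(x)
\;\leq\; f(x),
$$

where the first inequality holds because $x^*$ is an unconstrained minimizer of the convex auxiliary function defined in the proof, and the second because $\lambda_i \geq 0$, $g_i(x) \leq 0$ and $h_j(x) = 0$ at feasible points.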
10.8.3.1. Slater’s Conditions#
In Theorem 10.79, we saw that KKT conditions become necessary for the local optimality of a feasible point only if the feasible point is regular. The regularity was a constraint qualification for the nonconvex smooth optimization problem.
In the convex case, a condition different from regularity can guarantee the necessity of the KKT conditions. It is known as Slater’s condition.
(Slater’s conditions)
Let
if there exists a point
In other words, Slater’s condition requires the existence of a point which strictly satisfies all the convex inequality constraints.
Slater’s condition is much easier to check since it requires the existence of a single point which strictly satisfies all the convex inequalities.
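In symbols (illustrative notation), Slater’s condition requires a point $\hat{x}$ with

$$
g_i(\hat{x}) < 0 \quad \text{for every } i = 1, \dots, m.
$$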
(Necessity of KKT conditions under Slater’s condition)
Let
where
Then there exist nonnegative scalar multipliers
Proof. This is also derived from the Fritz-John conditions (Theorem 10.77).
By Fritz-John conditions, there exist nonnegative scalar multipliers
We need to show that
For contradiction, assume that
.Then we have
By the gradient inequality, we have
Specifically, for the point
, we haveMultiplying the
-th inequality by and summing over , we getThis inequality is strict since not all
are and .Since
, it reduces toBut
for every . Hence we must haveA contradiction. Hence
must be true.
10.8.4. A Tangent Cones Perspective#
Consider an optimization problem of the form
Assume that
The constraint set can be written as
Assume that
is a minimizer of this problem.By Theorem 10.72, we must have
Let us now motivate the Lagrange multipliers using a simple problem with linear equality constraints.
Consider the specific case where
Consider a matrix
which consists of as rows.Put together
as a vector .Then the constraint set can be expressed as
Assume that
is the local minimizer of .By Example 10.10,
which is the nullspace of
.By Example 9.18,
Hence, by the optimality condition, we have
Hence, there exists
such thatThis is equivalent to
Since
, hence this is equivalent to
This motivates us to define the Lagrangian of
The basic Lagrangian theorem states that under
suitable conditions, if
There are
unknowns in and unknowns in .Thus, we have a total of
unknowns.The relation above gives us a system of
equations.Together with the
equalities , we have a system of equations with unknowns. Thus, solving the constrained optimization problem is transformed into solving a system of nonlinear equations.
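As a minimal sketch of this reduction to nonlinear equations, the snippet below assumes the same illustrative problem as before (minimize $x_1 + x_2$ subject to the single equality $x_1^2 + x_2^2 = 1$; it is not the linear-equality problem discussed above) and hands the Lagrangian stationarity equations together with the constraint to a root finder.

```python
# Hedged sketch: the Lagrange system for an assumed illustrative problem
#   minimize  x1 + x2   subject to  x1^2 + x2^2 = 1
# consists of n + m = 3 equations in the 3 unknowns (x1, x2, mu).
from scipy.optimize import fsolve

def lagrange_system(z):
    x1, x2, mu = z
    return [
        1.0 + 2.0 * mu * x1,     # dL/dx1 = 0
        1.0 + 2.0 * mu * x2,     # dL/dx2 = 0
        x1**2 + x2**2 - 1.0,     # equality constraint h(x) = 0
    ]

root = fsolve(lagrange_system, x0=[-1.0, -1.0, 1.0])
print(root)  # approximately [-0.7071, -0.7071, 0.7071]
```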
Now, suppose that the tangent cone at
can be written as
Letting
and following the argument above, we must have
Thus, if the tangent cone can be represented as above and
is a local minimizer, then the Lagrange multipliers must exist. The admission of Lagrange multipliers at a given
is a property of the constraint set . It is not a property of the optimization problem itself. If admits Lagrange multipliers at , then there exists a Lagrange multiplier vector at for every smooth cost function for which is a local minimizer.
10.8.5. Enhanced Fritz-John Conditions#
We now introduce a more difficult optimization problem
where the constraint set
We assume that
We also introduce
and
For every
Correspondingly, we define
(Lagrangian function)
For the optimization problem (10.20), the Lagrangian function is defined as
where
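A common form of such a Lagrangian (illustrative notation, with $\lambda_i$ attached to the inequality constraints $g_i(x) \le 0$ and $\mu_j$ to the equality constraints $h_j(x) = 0$) is

$$
L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{m} \lambda_i g_i(x) + \sum_{j=1}^{p} \mu_j h_j(x).
$$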
10.8.5.1. Lagrange Multiplier Vectors#
(Lagrange multiplier vectors)
We say that a constraint set
A pair
We also refer to a Lagrange multiplier vector simply as Lagrange multipliers.
The condition (10.23) is the nonnegativity condition of the Lagrangian multipliers for the inequality constraints.
The condition (10.24) is the complementary slackness condition.
From (10.22), we can see that for each
, the set of Lagrange multiplier vectors corresponding to a given and is a closed half-space.Hence, the set of Lagrange multiplier vectors corresponding to a given
and is an intersection of closed half-spaces. Hence the set of Lagrange multiplier vectors is closed and convex, although it may possibly be empty.
The condition (10.22) is referred to as the Lagrangian stationarity condition. It states that the directional derivative of the Lagrangian function with respect to x is nonnegative along all tangent directions at the local minimizer.
It can be viewed as the necessary condition for
to be a local minimizer of the function . See Theorem 10.72.When
, then , and this condition reduces toWhen
is convex, then (10.22) reduces to is convex. Hence for is a feasible direction for every .Hence, if this inequality holds, then (10.22) holds for every
.Since
is convex, hence .If the inequality (10.22) holds for every
, then it will also hold for a closure point of .In other words, if
for every , then for any convergent sequence of , we have . Hence the inequality holds for every closure point also.
The Lagrangian stationarity condition can also be equivalently written as
In other words, the negative gradient of the Lagrangian function must lie in the polar cone of the tangent cone of
at .Recall from Theorem 10.68 that
where
is the normal cone of at (in the sense of [9]).Hence the negative gradient of the Lagrangian function must be a normal direction (Definition 10.26) at
.If
is regular at (Theorem 10.70), then we also have
10.8.5.2. Enhanced Fritz-John Conditions#
We are now ready to present the Enhanced Fritz-John conditions as the necessary conditions for the existence of the local minimizer of the problem (10.20).
(Enhanced Fritz-John conditions)
Let
The gradients satisfy the relation:
Nonnegativity:
for every .The scalars
are not equal to .Complementary violation condition: If the index set
is not empty, wherethere exists a sequence
of that converges to and is such that for allwhere
and
Proof. The proof is based on a quadratic penalty function approach.
For each
At a feasible point
for every since .At a feasible point
for every .Hence
at every feasible point .The term
is a quadratic penalty term penalizing how far we are from the local minimum .The term
is a penalty term denoting how strongly the -th inequality constraint is violated.The term
is a penalty term denoting how strongly the -th equality constraint is violated.We have
for every .At the local minimizer, we have
is a continuously differentiable function.We note that
Hence
We now introduce the penalized problem
where
and
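A standard construction of this type, consistent with the description above (the exact form is an assumption), is

$$
F_k(x) = f(x) + \frac{k}{2} \sum_{i=1}^{m} \big( \max\{0, g_i(x)\} \big)^2 + \frac{k}{2} \sum_{j=1}^{p} h_j(x)^2 + \frac{1}{2} \lVert x - x^* \rVert^2,
$$

minimized over the intersection of the abstract part of the constraint set with a small closed ball centered at the local minimizer $x^*$.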
The set
is compact and the set is closed.Hence
is compact. Hence there exists a minimizer of the above problem for every
.Let
be a minimizer of over .Then we have
for every since .This is equivalent to
for every
.Since
is continuous and is compact, hence is bounded for every .It follows that
otherwise the term on the L.H.S. of the previous inequality will become unbounded and tend to
as .Hence every limit point
of the sequence is feasible; i.e., .Also, since
is compact, hence every limit point of the sequence belongs to .From the inequality
, we can also see thatTaking the limit as
, we obtainfor every limit point
.Since
(near local minimizer) and (feasible), we haveCombining with the previous inequality, it gives us
Hence, we must have
. Hence, the sequence has only one limit point. Thus, the sequence
converges to .By the definition of the closed ball
, is an interior point of .Since
it follows that is an interior point of for every greater than some .Hence, due to Theorem 10.72,
holds true for every
.We can write
aswhere
and .Note that by definition
for every .Accordingly, we have
for every
.We define
By definition
.We now introduce
By dividing by
in the previous relation, we obtainfor every
since is a cone.Note that by construction, we have
Hence the sequence
is a bounded sequence of .Hence, it must have a subsequence that converges to some limit
.Let
Along this subsequence, we have
, and for every .Hence, following the definition of the normal cone (Definition 10.27), after disregarding the first
terms of the sequences,
(2) Nonnegativity
Since
, hence .Hence
.Since
for every , hence for every .Hence
for every .
(3) Not all zero
We have established that for every
,Taking the limit over the subsequence, we have
Hence, they cannot all be zero.
(4) Complementary violation conditions
Assume that
is not empty.Let
denote the index set of the convergent subsequence of .Let
.We have
. Then for all sufficiently large
in , we have as should be sufficiently close to . From the definitions of
and , we must have for all sufficiently large in . By definition,
and when . Hence we must have
for all sufficiently large in .
Let
.We have
.Then for all sufficiently large
in , we have . Hence and has the same sign as .Hence for all sufficiently large
in , we have . From the definitions of
and , we see that for all sufficiently large in , we must have .
Hence for all sufficiently large
in , we haveThis means that
for every and for every for all sufficiently large in .This establishes that for all sufficiently large
in , we have . Otherwise the inequality and equality constraints cannot be violated.Recall that we established that
for every
.Hence for all sufficiently large
in , we must havesince at least one of
and is positive for every sufficiently large .We form the desired sequence
satisfying all the necessary criteria by picking up all the entries corresponding to sufficiently large in from .It remains to show the magnitude property of terms
and .For the remaining argument, without loss of generality, we shall assume that
is the subsequence chosen above.
TBD the argument below is not complete.
We see that
For every
, we have .Hence
.Hence
.Hence
Hence
We have
The complementary violation condition (CV) in (4) is stronger than the traditional complementary slackness (CS) condition.
If
for some inequality constraint , then the inequality must be violated arbitrarily close to .In contrast, the CS condition only states that
. It doesn’t say anything about the violation of the constraint near the local optimum. CV states that there exists a sequence of
converging to such that the equality and inequality constraints are violated at every point in the sequence corresponding to the nonzero Lagrangian multipliers.This sequence also has the property that the amount of violation in the constraints corresponding to zero Lagrange multipliers is negligible via the following conditions.
(Enhanced Fritz-John conditions regular case)
In Theorem 10.82,
if
Equivalently
10.8.5.3. Lagrangian Stationarity#
(Lagrangian stationarity)
In Theorem 10.82,
assume that
By normalization of the scalars, the gradient inequality reduces to
Equivalently
This is identical to the Lagrangian stationarity condition (1) in Definition 10.33.
Hence if
10.8.5.4. LICQ#
(Linear independence constraint qualification)
In Theorem 10.82,
assume that
Condition (1) reduces to
Further, add the linear independence constraint qualification (LICQ) assumption at
; i.e., the gradients and are linearly independent;Then
must be true.Otherwise, the linear independence condition would be violated.
It follows that Lagrangian stationarity is satisfied and there exists a Lagrange multiplier vector.
We thus obtain the classical Lagrange multiplier theorem based on the KKT conditions (Theorem 10.79).
10.8.6. Informative Lagrange Multipliers#
Lagrange multiplier vectors may provide a significant amount of sensitivity information by indicating which constraints to violate in order to effect a reduction in the objective function value. In general, if the multipliers are informative, then the larger the multiplier of a constraint, the more sensitive the objective function is to that constraint.
(Informative Lagrange multiplier vector)
A Lagrange multiplier vector (as defined in Definition 10.33 for the problem (10.20)) is called an informative multiplier vector if it satisfies the complementary violation condition of enhanced Fritz-John conditions Theorem 10.82.
(Minimal Lagrange multiplier vector)
Let
(Strong Lagrange multiplier vector)
Let
If the set
This condition is slightly weaker than the complementary violation
condition. It makes no provisions for ensuring that the constraints
corresponding to zero multipliers have negligibly small violation
at each
Every informative multiplier vector is a strong multiplier vector.
Every minimal multiplier vector is a strong multiplier vector.
Minimal multipliers are not necessarily informative.
Some multiplier vectors are both informative and minimal.
(Existence of informative multipliers)
Let
The set of informative Lagrange multiplier vectors is nonempty.
The Lagrange multiplier vector which has the minimum norm is informative.
Every minimal Lagrange multiplier vector is strong.
10.8.7. Pseudonormality#
The key issue with the enhanced Fritz-John conditions is that they allow
for
We now introduce the notion of pseudonormal points in the constraint set
.We show that if a local minimizer
is a pseudonormal point, then .Hence, at pseudonormal points, the constraint set
admits Lagrange multiplier vectors.
(Pseudonormal vector)
We say that a feasible vector
The gradients of constraint functions at
satisfy:Inequality multipliers are nonnegative:
for every .Multipliers for inactive inequality constraints are zero.
for every where denotes the set of active inequality constraints at ;The sequence
converges to (i.e., ) and
(Pseudonormality and EFJ scalars)
Let
If
Proof. We shall show this by contradiction.
Assume that
is a local minimizer for the problem (10.20).Then there exist EFJ scalars
satisfying the conditions in Theorem 10.82.For contradiction assume that
is a pseudonormal vector and .Since
are EFJ scalars, hence for every .The scalars
are not equal to .Hence the index set
is not empty, whereHence there exists a sequence
of that converges to and is such that for allwhere
andHence the sequence
with satisfiessince at least one of these multipliers is nonzero.
The multiplier for every inactive inequality constraint must be zero. Otherwise, the complementary violation condition would be violated.
Hence
are the scalars and is a sequence of satisfying all the conditions in Definition 10.37.Hence
cannot be a pseudonormal vector of .
10.8.8. Constraint Qualifications#
We now introduce different constraint qualifications which ensure
that a point
10.8.8.1. Linearly Independent Constraint Qualifications#
(CQ1: linearly independent constraint qualifications)