8.1. Mathematical Optimization#
Main references for this section are [9, 17].
This section provides a general overview of optimization problems. It introduces the notions of objective (cost) functions and constraint functions, and presents the standard form for optimization problems, which focuses on minimizing the objective function over a set of equality and inequality constraints. Several examples of equivalent forms of optimization problems are discussed. Maximization problems can be converted into minimization problems by changing the sign of the objective function.
We introduce the notions of the domain of an optimization problem, feasible points or solutions, the feasible set of solutions, the optimal value of the minimization problem, optimal points or solutions, the optimal set, and so on. We discuss when the problem may be infeasible or feasible but unbounded. We also discuss situations where the problem is feasible and has a finite optimal value, and yet an optimal solution may not exist.
We then discuss general requirements for the feasibility of the optimization problem and the existence of an optimal solution. We introduce the notion of coercive functions, which provide Weierstrass-type results for the existence of optimal solutions.
8.1.1. Optimization Problems#
Let \(\VV\) be a real \(n\)-dimensional inner product space.
In the most general form, an optimization problem can be considered as minimizing or maximizing the value of a real valued function \(f : \VV \to \RR\) over its domain \(S = \dom f\).
We call \(f\) the objective function; it is the function being minimized or maximized.
The variable \(\bx \in \VV\) is called the optimization variable.
\(S = \dom f\) is called the feasible set of solutions for the optimization problem.
A point \(\bx \in S\) is called a feasible point or feasible solution.
Our goal is to find an \(\bx^* \in S\) which maximizes or minimizes the objective function \(f\).
(Globally optimal value and points)
Let \(f : \VV \to \RR\) be a real valued function with \(S = \dom f\).
The maximum value of \(f\) is given by
\[ \sup \{ f(\bx) \ST \bx \in S \}. \]
The minimum value of \(f\) is given by
\[ \inf \{ f(\bx) \ST \bx \in S \}. \]
We allow the maximum and minimum values to take \(\infty\) and \(-\infty\) values.
The function \(f\) may or may not attain its maximum / minimum value at some point in its domain \(S\).
\(\bx^* \in S\) is called a global minimum point if \(f(\bx) \geq f(\bx^*)\) for every \(\bx \in S\).
\(\bx^* \in S\) is called a global maximum point if \(f(\bx) \leq f(\bx^*)\) for every \(\bx \in S\).
\(\bx^* \in S\) is called a strict global minimum point if \(f(\bx) > f(\bx^*)\) for every \(\bx \in S\) with \(\bx \neq \bx^*\).
\(\bx^* \in S\) is called a strict global maximum point if \(f(\bx) < f(\bx^*)\) for every \(\bx \in S\) with \(\bx \neq \bx^*\).
A point \(\bx^* \in S\) is called a global optimal point if it is either a global maximum or minimum point.
The maximum and minimum values of \(f\) are always unique as opposed to global optimal points which may not be unique.
The set of global minimizers is given by
\[ \argmin \{ f(\bx) \ST \bx \in S \}. \]
The set of global maximizers is given by
\[ \argmax \{ f(\bx) \ST \bx \in S \}. \]
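The following minimal sketch (assuming NumPy; the function and the grid are hypothetical illustrations) approximates the minimum value and the set of global minimizers over a discretized domain. It illustrates that the minimum value is unique while the set of global minimizers need not be a single point.

```python
import numpy as np

# A hypothetical objective with two global minimizers: f(x) = (x^2 - 1)^2.
f = lambda x: (x**2 - 1.0)**2

# Discretize the domain S = [-2, 2].
xs = np.linspace(-2.0, 2.0, 100001)
vals = f(xs)

p_min = vals.min()                        # approximate minimum value (unique)
minimizers = xs[np.isclose(vals, p_min)]  # approximate set of global minimizers

print(p_min)        # ~ 0.0
print(minimizers)   # grid points clustered at -1.0 and 1.0: two global minimum points
```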
In a more restricted setting, an optimization problem can be considered as minimizing or maximizing the value of a real valued function \(f : \VV \to \RR\) with \(S = \dom f\) over a set \(A\) such that \(A \subseteq S\).
In this case, the feasible set is restricted to \(A\).
Minimizers and maximizers are searched within this subset \(A\).
This problem can be converted into the earlier general form by considering the restriction \(\tilde{f} : \VV \to \RR\) of \(f\) such that \(A = \dom \tilde{f}\) and
\[ \tilde{f}(\bx) = f(\bx) \Forall \bx \in A. \]
In several problems, it may not be possible to establish whether a point is globally optimal. However, it is easier to establish if a point is optimal in a neighborhood around it. Such points are called locally optimal points or extreme values.
(Local optimal points)
Let \(f : \VV \to \RR\) be a real valued function with \(S = \dom f\). We say that \(f(\ba)\) is a local extreme value or local optimal value of \(f\) at \(\ba \in \dom f\) if there exists \(\delta > 0\) such that \(f(\bx) - f(\ba)\) doesn’t change sign on \(B(\ba, \delta) \cap S\).
More specifically,
\(f(\ba)\) is a local maximum value of \(f\) if for some \(\delta > 0\):
\[ f(\bx) \leq f(\ba) \Forall \bx \in B(\ba, \delta) \cap S. \]
\(f(\ba)\) is a local minimum value of \(f\) if for some \(\delta > 0\):
\[ f(\bx) \geq f(\ba) \Forall \bx \in B(\ba, \delta) \cap S. \]
The point \(\bx=\ba\) is called a local extreme point or local optimal point of \(f\) or more specifically, a local maximum or a local minimum point of \(f\).
\(\ba\) is a strict local maximum point of \(f\) if for some \(\delta > 0\):
\[ f(\bx) < f(\ba) \Forall \bx \in B_d(\ba, \delta) \cap S. \]
\(\ba\) is a strict local minimum point of \(f\) if for some \(\delta > 0\):
\[ f(\bx) > f(\ba) \Forall \bx \in B_d(\ba, \delta) \cap S. \]
Here \(B_d(\ba, \delta)\) denotes the deleted neighborhood (an open ball of radius \(\delta\) excluding \(\ba\) itself).
(Local optimal points for open domain)
Let \(f : \VV \to \RR\) be a real valued function with \(S = \dom f\). Assume that \(S\) is open.
If \(\ba \in S\) is a local maximum point, then there exists \(r_1 > 0\) such that
\[ f(\bx) \leq f(\ba) \Forall \bx \in B(\ba, r_1) \cap S. \]
Since \(S\) is open, \(\ba \in \interior S\). Thus, there exists an open ball \(B(\ba, r_2) \subseteq S\).
Let \(r = \min(r_1, r_2)\).
Then, \(B(\ba, r) \subseteq S\) and \(B(\ba, r) \subseteq B(\ba, r_1)\).
Thus, \(B(\ba, r) \subseteq B(\ba, r_1) \cap S\).
Thus, \(f(\bx) \leq f(\ba) \Forall \bx \in B(\ba, r) \subseteq S\).
Thus, the local optimality condition simplifies to looking for an open ball of radius \(r\) totally contained inside the open domain \(S\).
Based on this discussion, we can adjust the conditions for local optimality.
\(\ba\) is a local maximum point of \(f\) if for some \(r > 0\):
\[ f(\bx) \leq f(\ba) \Forall \bx \in B(\ba, r) \subseteq S. \]
\(\ba\) is a local minimum point of \(f\) if for some \(r > 0\):
\[ f(\bx) \geq f(\ba) \Forall \bx \in B(\ba, r) \subseteq S. \]
\(\ba\) is a strict local maximum point of \(f\) if for some \(r > 0\):
\[ f(\bx) < f(\ba) \Forall \bx \in B_d(\ba, r) \subseteq S. \]
\(\ba\) is a strict local minimum point of \(f\) if for some \(r > 0\):
\[ f(\bx) > f(\ba) \Forall \bx \in B_d(\ba, r) \subseteq S. \]
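As a quick numerical illustration (a sketch assuming NumPy; the function, the center points, and the radius are hypothetical), one can test the strict local minimum condition by sampling a deleted neighborhood \(B_d(\ba, r)\):

```python
import numpy as np

# Hypothetical example: f(x) = x^4 - 2 x^2 has strict local minima at x = -1
# and x = 1 (where f = -1), and a strict local maximum at x = 0.
f = lambda x: x**4 - 2.0*x**2

def looks_like_strict_local_min(f, a, r=1e-2, n=10001):
    """Sample the deleted neighborhood B_d(a, r) and test f(x) > f(a)."""
    xs = np.linspace(a - r, a + r, n)
    xs = xs[np.abs(xs - a) > 1e-12]   # remove the center point
    return bool(np.all(f(xs) > f(a)))

print(looks_like_strict_local_min(f, 1.0))  # True: strict local minimum
print(looks_like_strict_local_min(f, 0.0))  # False: x = 0 is a local maximum
```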
8.1.1.1. Standard Form for Mathematical Optimization Problems#
(Optimization problem standard form)
Let \(\VV\) be an \(n\)-dimensional real vector space. A mathematical optimization problem of the form

\[\begin{split} & \text{minimize }\quad f_0(\bx) \\ & \text{subject to }\quad f_i(\bx) \leq 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} h_j(\bx) = 0, \; j=1,\dots,p \end{split}\tag{8.1}\]

is known as an optimization problem in its standard form.
\(\bx \in \VV\) is called the optimization variable.
\(f_0 : \VV \to \RR\) is called the objective function or cost function.
The functions \(f_i : \VV \to \RR\) are called the inequality constraint functions.
The corresponding inequalities \(f_i(\bx) \leq 0\) are called the inequality constraints.
The functions \(h_j : \VV \to \RR\) are called the equality constraint functions.
The corresponding equations \(h_j(\bx) = 0\) are called the equality constraints.
If there are no constraints (\(m=0, p=0\)), the problem is called unconstrained.
The set
\[ \DDD \triangleq \dom f_0 \cap \bigcap_{i=1}^m \dom f_i \cap \bigcap_{j=1}^p \dom h_j \]
is called the domain of the optimization problem.
A point \(\bx \in \DDD\) is called feasible if it satisfies all the inequality constraints \(f_i(\bx) \leq 0\) and all the equality constraints \(h_j(\bx) = 0\).
The optimization problem is called feasible if there exists at least one feasible point.
If there are no feasible points in the domain \(\DDD\), then the optimization problem is called infeasible. Naturally, if \(\DDD = \EmptySet\), then the problem is infeasible.
The set of feasible points for an optimization problem is called the feasible set or the constraint set. We shall denote the feasible set by \(C\).
\[ C = \dom f_0 \cap \bigcap_{i=1}^m f_i^{-1}(-\infty, 0] \cap \bigcap_{j=1}^p h_j^{-1}(0). \]
It is the intersection of the domain of \(f_0\), the \(0\)-sublevel sets of \(f_i\) for \(i=1,\dots,m\), and the \(0\)-level sets of \(h_j\) for \(j=1,\dots,p\).
Thus, if the problem is infeasible, then \(C = \EmptySet\).
The optimum value \(p^* \in \ERL\) of the optimization problem is defined as
\[ p^* = \inf \{ f_0(\bx) \ST f_i(\bx) \leq 0, i=1,\dots,m, \; h_j(\bx) = 0, j= 1, \dots, p\}. \]
In other words,
\[ p^* = \inf \{ f_0(\bx) \ST \bx \in C \}. \]
We allow \(p^*\) to take the extended values \(\infty\) and \(-\infty\).
If the problem is infeasible, then \(p^* = \infty\). This is consistent with the convention that the infimum of an empty set is \(\infty\).
If \(p^* = -\infty\), then the problem is called unbounded below. In this case, there exists a sequence \(\{ \bx_k \}\) of points in \(C\) such that \(\lim f_0(\bx_k) = -\infty\).
We say that \(\bx^*\) is an optimal point if it solves (8.1). In other words, \(\bx^* \in C\) and \(f_0(\bx^*) = p^*\).
The set of all optimal points is known as the optimal set denoted by \(X_{\text{opt}}\).
\[ X_{\text{opt}} \triangleq \{ \bx \ST f_i(\bx) \leq 0, i=1,\dots,m, \; h_j(\bx) = 0, j=1,\dots,p, \; f_0(\bx) = p^* \}. \]
In other words,
\[ X_{\text{opt}} = \{ \bx \in C \ST f_0(\bx) = p^* \}. \]
If an optimal point exists in \(C\), then we say that the optimal value is attained or achieved.
If \(X_{\text{opt}}\) is empty, then we say that the optimal value is not attained or not achieved.
In particular, if the problem is unbounded below, then \(X_{\text{opt}}\) is indeed empty.
If the feasible set \(C\) is not closed, then it is quite possible that the optimal value \(p^*\) is finite and yet it is not attained at any feasible point. In that case, there exists a sequence \(\{ \bx_k \}\) of feasible points such that \(\lim f_0(\bx_k) = p^*\); however, there is no \(\bx \in C\) such that \(f_0(\bx) = p^*\).
A feasible point \(\bx \in C\) with \(f_0(\bx) \leq p^* + \epsilon\) is called an \(\epsilon\)-suboptimal point.
The set of all \(\epsilon\)-suboptimal points is called the \(\epsilon\)-suboptimal set for the problem (8.1).
We say that a feasible point \(\bx\) is locally optimal if there exists \(r > 0\) such that
\[ f_0(\bx) = \inf \{f_0(\bz) \ST \bz \in B[\bx, r] \cap C \}. \]
In other words, \(\bx\) minimizes \(f_0\) over a local neighborhood of feasible points.
If \(\bx\) is feasible and \(f_i(\bx) = 0\), we say that \(i\)-th inequality constraint is active at \(\bx\). Otherwise, \(f_i(\bx) < 0\) and we say that the \(i\)-th inequality constraint is inactive.
We say that a constraint is redundant if removing it does not change the feasible set.
If we choose an orthonormal basis \(\BBB = \{\be_1, \dots, \be_n \}\) for \(\VV\), then \(\VV\) is isomorphic to \(\RR^n\) under the bracket operator \([\cdot]_{\BBB}\). The optimization variable \(\bx\) has a representation \((x_1, \dots, x_n)\) in \(\RR^n\) given by
\[ \bx = \sum_{i=1}^n x_i \be_i. \]
Thus, determining \(\bx\) is the same as determining its components \((x_1, \dots, x_n)\).
Since there are \(n\) scalar components in the representation of \(\bx\), hence we can say that the optimization problem has \(n\) (scalar) variables.
In the sequel, we will be presenting a variety of optimization problems. We shall show how those problems can be converted to the standard form described above.
(Box constraints standard form)
Let \(f_0 : \RR^n \to \RR\) be a real valued function. Consider the optimization problem

\[\begin{split} & \text{minimize }\quad f_0(\bx) \\ & \text{subject to }\quad l_i \leq x_i \leq u_i, \; i=1,\dots,n \end{split}\]

where \(\bx \in \RR^n\) is the optimization variable. There are \(n\) constraints, one on each component of \(\bx\). Each constraint gives a lower and an upper bound on one component. Such constraints are known as variable bounds or box constraints.

This problem can be transformed into an equivalent problem:

\[\begin{split} & \text{minimize }\quad f_0(\bx) \\ & \text{subject to }\quad l_i - x_i \leq 0, \; i=1,\dots,n \\ & \phantom{\text{subject to }\quad} x_i - u_i \leq 0, \; i=1,\dots,n. \end{split}\]

This form has \(2n\) inequality constraints. We introduce the functions

\[\begin{split} f_i(\bx) &= l_i - x_i, \quad i=1,\dots,n, \\ f_{n+i}(\bx) &= x_i - u_i, \quad i=1,\dots,n. \end{split}\]

Then, the problem becomes

\[\begin{split} & \text{minimize }\quad f_0(\bx) \\ & \text{subject to }\quad f_i(\bx) \leq 0, \; i=1,\dots,2n. \end{split}\]

This is the optimization problem in standard form with \(2n\) inequality constraints and 0 equality constraints.
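Here is a minimal sketch of this conversion (assuming NumPy; the bounds are hypothetical): the \(2n\) standard-form constraint functions are stacked into one vector, and feasibility becomes a single componentwise check.

```python
import numpy as np

# Hypothetical box constraints l <= x <= u in R^3.
l = np.array([0.0, -1.0, 2.0])
u = np.array([1.0,  1.0, 5.0])

def standard_form_constraints(x):
    """The 2n standard-form inequality constraint values.

    f_i(x)     = l_i - x_i  for i = 1, ..., n   (lower bounds)
    f_{n+i}(x) = x_i - u_i  for i = 1, ..., n   (upper bounds)
    """
    return np.concatenate([l - x, x - u])

def is_feasible(x):
    return bool(np.all(standard_form_constraints(x) <= 0))

print(is_feasible(np.array([0.5, 0.0, 3.0])))  # True
print(is_feasible(np.array([0.5, 0.0, 6.0])))  # False: violates x_3 <= 5
```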
(Maximization problem in standard form)
Consider the problem

\[\begin{split} & \text{maximize }\quad f_0(\bx) \\ & \text{subject to }\quad f_i(\bx) \leq 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} h_j(\bx) = 0, \; j=1,\dots,p. \end{split}\]

If we replace \(f_0\) by \(-f_0\), then maximizing \(f_0\) is the same as minimizing \(-f_0\). Thus, this problem has the equivalent standard form

\[\begin{split} & \text{minimize }\quad -f_0(\bx) \\ & \text{subject to }\quad f_i(\bx) \leq 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} h_j(\bx) = 0, \; j=1,\dots,p. \end{split}\]
(Feasibility problem)
Let \(\VV\) be an \(n\)-dimensional real vector space. A mathematical optimization problem of the form

\[\begin{split} & \text{find }\quad \bx \in \VV \\ & \text{subject to }\quad f_i(\bx) \leq 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} h_j(\bx) = 0, \; j=1,\dots,p \end{split}\]

is called a feasibility problem.

We can convert this problem into the standard form by introducing a cost function which is identically 0:

\[\begin{split} & \text{minimize }\quad 0 \\ & \text{subject to }\quad f_i(\bx) \leq 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} h_j(\bx) = 0, \; j=1,\dots,p. \end{split}\]
The domain reduces to
\[ \DDD = \bigcap_{i=1}^m \dom f_i \cap \bigcap_{j=1}^p \dom h_j. \]
The optimal value \(p^*=\infty\) if the problem is not feasible.
Otherwise, the optimal value is \(p^* = 0\).
Every feasible point is also an optimal point.
Every optimization problem in standard form can be converted into a feasibility problem by replacing the objective function with the function which is identically 0 everywhere.
In that case, the problem reduces to checking whether the inequality and equality constraints are consistent or not. In other words, whether there are some points which satisfy all the inequality and equality constraints.
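As a sketch of how a feasibility problem can be handed to a generic solver (assuming SciPy; the constraints and the starting point are hypothetical, and a dedicated feasibility or phase-one method may be more robust in practice):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical feasibility problem in R^2:
#   f_1(x) = x_1^2 + x_2^2 - 1 <= 0   (inside the unit disc)
#   h_1(x) = x_1 - x_2         = 0    (on the diagonal)
# SciPy encodes inequality constraints as g(x) >= 0, hence the sign flip.
constraints = [
    {"type": "ineq", "fun": lambda x: 1.0 - x[0]**2 - x[1]**2},
    {"type": "eq",   "fun": lambda x: x[0] - x[1]},
]

# Identically zero objective: every feasible point is optimal with p* = 0.
result = minimize(lambda x: 0.0, x0=np.array([2.0, 0.0]),
                  method="SLSQP", constraints=constraints)

print(result.success)  # True if a feasible point was found
print(result.x)        # a point on the diagonal inside the unit disc
```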
8.1.2. Equivalent Problems#
(Equivalent optimization problems)
We provide an informal definition of equivalent optimization problems.
Two optimization problems are called equivalent if it is possible to find the solution of one problem from the other problem and vice versa.
The primary reason for transforming an optimization problem to an equivalent one is that it may be easier to solve an equivalent problem. If we can transform a problem into a form which has a well known closed form solution or has a well known algorithm to solve the problem, then we can solve the original problem by first solving the equivalent problem and then transforming the solution of the equivalent problem to the solution of the original problem.
Transform the problem to an equivalent problem with a known algorithm for solution.
Solve the equivalent problem.
Transform the solution of the equivalent problem to the solution of the original problem.
In this sense, the computational complexity of solving an optimization problem cannot be greater than the complexity of solving the equivalent problem plus the complexity of transforming the solution of the equivalent problem to the solution of the original problem.
There are several ways to obtain an equivalent optimization problem from an existing optimization problem.
8.1.2.1. Scaling#
(Equivalent problems by scaling of objective or constraint functions)
Consider the problem given by

\[\begin{split} & \text{minimize }\quad f'_0(\bx) = \alpha_0 f_0(\bx) \\ & \text{subject to }\quad f'_i(\bx) = \alpha_i f_i(\bx) \leq 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} h'_j(\bx) = \beta_j h_j(\bx) = 0, \; j=1,\dots,p \end{split}\]

where \(\alpha_i > 0\) for \(i=0,\dots,m\) and \(\beta_j \neq 0\) for \(j=1,\dots,p\).

This problem is obtained from the optimization problem in standard form in (8.1) by scaling the objective function and the inequality constraint functions by positive constants and the equality constraint functions by nonzero constants.
\(\alpha_0\) cannot be negative otherwise it will convert a minimization problem to a maximization problem.
\(\alpha_0\) cannot be zero as that will convert the optimization problem into a feasibility problem.
\(\alpha_i\) for \(i=1,\dots,m\) cannot be negative as it would turn the inequality from \(f_i(\bx) \leq 0\) to \(f'_i(\bx) \geq 0\).
\(\alpha_i\) for \(i=1,\dots,m\) cannot be zero as it will remove the corresponding constraint altogether.
\(\beta_j\) cannot be zero. A zero value will discard the corresponding equality constraint.
The feasible sets for both problems are identical.
The optimal sets for both problems are identical.
However, the optimal values for the two problems are not identical (unless \(\alpha_0 = 1\)). Since \(f'_0\) is a scaled version of \(f_0\), the optimal value is scaled accordingly.
8.1.2.2. Change of Variables#
(Change of variables)
Let \(\phi : \VV \to \VV\) be an injective function with \(\DDD \subseteq \range \phi\).
For the problem (8.1), introduce the following functions

\[\begin{split} \tilde{f}_i(\bz) &= f_i(\phi(\bz)), \quad i=0,\dots,m, \\ \tilde{h}_j(\bz) &= h_j(\phi(\bz)), \quad j=1,\dots,p. \end{split}\]

Now, consider the problem

\[\begin{split} & \text{minimize }\quad \tilde{f}_0(\bz) \\ & \text{subject to }\quad \tilde{f}_i(\bz) \leq 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} \tilde{h}_j(\bz) = 0, \; j=1,\dots,p \end{split}\]
with the variable \(\bz \in \VV\).
We say that the two problems are related by the change of variable or substitution of the variable \(\bx = \phi (\bz)\). The two problems are equivalent: if \(\bz^*\) solves the transformed problem, then \(\bx^* = \phi(\bz^*)\) solves (8.1); conversely, since \(\phi\) is injective, if \(\bx^*\) solves (8.1), then \(\bz^* = \phi^{-1}(\bx^*)\) solves the transformed problem.
8.1.2.3. Transformation with Monotone Functions#
(Transformation of objective and constraint functions)
For the problem (8.1), we introduce the following:
Let \(\psi_0: \RR \to \RR\) be a strictly increasing function over \(\range f_0\).
Let \(\psi_i : \RR \to \RR\) for \(i=1,\dots,m\) be strictly increasing functions over \(\range f_i\) with \(\psi_i(u) \leq 0 \iff u \leq 0\).
Let \(\eta_j : \RR \to \RR\) for \(j=1,\dots,p\) be real functions that satisfy \(\eta_j(u) = 0 \iff u = 0\).
Let \(\tilde{f}_i (\bx) = \psi_i (f_i(\bx))\) for \(i=0,\dots,m\).
Let \(\tilde{h}_j (\bx) = \eta_j (h_j(\bx))\) for \(j=1,\dots,p\).
Now, consider the problem

\[\begin{split} & \text{minimize }\quad \tilde{f}_0(\bx) \\ & \text{subject to }\quad \tilde{f}_i(\bx) \leq 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} \tilde{h}_j(\bx) = 0, \; j=1,\dots,p \end{split}\]
with the variable \(\bx \in \VV\). This problem is equivalent to (8.1): the conditions on \(\psi_i\) and \(\eta_j\) ensure that the two problems have identical feasible sets and identical optimal points.
(Least norms and least norms squared)
Consider the unconstrained least norm problem

\[ \text{minimize }\quad \| \bA \bx - \bb \|_2 \]

with \(\bx \in \RR^n\), \(\bA \in \RR^{m \times n}\) and \(\bb \in \RR^m\). We have \(f_0(\bx) = \| \bA \bx - \bb \|_2\). Then, \(\range f_0 = \RR_+\). Consider \(\psi_0 : \RR \to \RR\) given by

\[ \psi_0(u) = u^2. \]

Then, \(\psi_0\) is strictly increasing over \(\RR_+\). Let

\[ \tilde{f}_0(\bx) = \psi_0(f_0(\bx)) = \| \bA \bx - \bb \|_2^2. \]

Then, the least norm minimization problem is equivalent to the problem

\[ \text{minimize }\quad \| \bA \bx - \bb \|_2^2 \]

which is the least norm squared problem.
We note that while \(\tilde{f}_0\) is differentiable everywhere, \(f_0\) is not differentiable at points where \(\bA \bx = \bb\). This is a key difference between the two problems. The least norm squared problem can be solved using gradient methods since its objective function is differentiable.
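The following sketch (assuming NumPy; the data are random placeholders) contrasts the two equivalent problems: the minimizer is computed from the differentiable squared objective via the normal equations, and the same point also minimizes the non-squared norm objective.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # hypothetical data
b = rng.standard_normal(20)

f0 = lambda x: np.linalg.norm(A @ x - b)       # least norm objective
g0 = lambda x: np.linalg.norm(A @ x - b)**2    # least norm squared objective

# The squared problem has the closed-form normal equations A^T A x = A^T b;
# lstsq solves them in a numerically stable way.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

# The gradient of g0, namely 2 A^T (A x - b), vanishes at the minimizer.
grad = 2.0 * A.T @ (A @ x_star - b)
print(np.allclose(grad, 0.0))                  # True

# Both problems are minimized at the same point; the optimal values differ
# only by the monotone transformation: f0(x*)^2 == g0(x*).
print(np.isclose(f0(x_star)**2, g0(x_star)))   # True
```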
8.1.2.4. Slack Variables#
(Slack variables)
For the inequality constraints in the problem (8.1), we can introduce the variables \(s_i \geq 0\) so that \(f_i(\bx) + s_i = 0\). This way, we can convert an inequality constraint to an equality constraint and introduce simpler inequality constraints for the variables \(s_i \geq 0\). The resultant problem

\[\begin{split} & \text{minimize }\quad f_0(\bx) \\ & \text{subject to }\quad f_i(\bx) + s_i = 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} s_i \geq 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} h_j(\bx) = 0, \; j=1,\dots,p \end{split}\tag{8.5}\]

where the optimization variables are \(\bx \in \VV\) and \(\bs \in \RR^m\), is equivalent to (8.1).
The variables \(s_i\) are known as slack variables. \(\bs = (s_1, \dots, s_m)\) is a vector that collects all the slack variables.
This form has \(m\) inequality constraints and \(m+p\) equality constraints.
It has \(n+m\) optimization variables \(x_1, \dots, x_n\) and \(s_1, \dots, s_m\).
If \((\bx, \bs)\) is a feasible point for (8.5), then \(\bx\) is a feasible point for (8.1).
If \(\bx\) is a feasible point for (8.1), then we can pick \(s_i = -f_i(\bx)\) to form \(\bs\) making \((\bx, \bs)\) a feasible point for (8.5).
\(\bx\) is an optimal point for (8.1) if and only if \((\bx, \bs)\) is an optimal point for (8.5) where \(s_i = - f_i(\bx)\).
The slack variable \(s_i\) measures how much slack the \(i\)-th inequality constraint has at a feasible point.
If \(s_i = 0\), then \(f_i\) is active and has no slack.
If \(s_i > 0\), then \(f_i\) is inactive and has some slack.
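A small numerical sketch (assuming NumPy; the constraint functions and the point are hypothetical) of the correspondence \(s_i = -f_i(\bx)\) between feasible points of (8.1) and (8.5):

```python
import numpy as np

# Hypothetical inequality constraint functions f_i(x) <= 0 on R^2.
f_ineq = lambda x: np.array([x[0] + x[1] - 1.0,   # f_1(x) = x_1 + x_2 - 1
                             -x[0]])              # f_2(x) = -x_1

x = np.array([0.25, 0.25])   # feasible: f_1(x) = -0.5, f_2(x) = -0.25
s = -f_ineq(x)               # slack variables s_i = -f_i(x)

print(np.all(s >= 0))                   # True: slacks are nonnegative
print(np.allclose(f_ineq(x) + s, 0.0))  # True: f_i(x) + s_i = 0 holds
# A zero slack s_i = 0 would indicate an active constraint.
```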
8.1.2.5. Epigraph Form#
(Epigraph form)
The problem (8.1) is equivalent to the optimization problem below:

\[\begin{split} & \text{minimize }\quad t \\ & \text{subject to }\quad f_0(\bx) - t \leq 0 \\ & \phantom{\text{subject to }\quad} f_i(\bx) \leq 0, \; i=1,\dots,m \\ & \phantom{\text{subject to }\quad} h_j(\bx) = 0, \; j=1,\dots,p \end{split}\tag{8.6}\]
with the optimization variable \((\bx, t) \in \VV \oplus \RR\).
Note that \(t\) appears only in a single inequality in (8.6).
At the optimal point, \(f_0(\bx) = t\) must hold true. If \(f_0(\bx) < t\), then we can always choose a lower value of \(t\) as the solution.
If \(\bx\) is an optimal point for (8.1), then \((\bx, f_0(\bx))\) must be an optimal point for (8.6).
Similarly, if \((\bx, t)\) is an optimal point for (8.6), then \(\bx\) must be an optimal point for (8.1) and \(t=f_0(\bx)\) must hold true.
The two problems are indeed equivalent.
This problem has \(m+1\) inequality constraints and \(p\) equality constraints.
The objective function for the epigraph form is a linear function of \((\bx, t)\).
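The sketch below (assuming SciPy; the objective, the constraint, and the starting points are hypothetical) solves a small problem in both its original and epigraph forms and checks that the optimal values agree:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical problem: minimize f0(x) = (x1 - 1)^2 + (x2 + 2)^2 s.t. x1 >= 0.
f0 = lambda x: (x[0] - 1.0)**2 + (x[1] + 2.0)**2

# Original form.
orig = minimize(f0, x0=np.zeros(2), method="SLSQP",
                constraints=[{"type": "ineq", "fun": lambda x: x[0]}])

# Epigraph form: variables z = (x1, x2, t); minimize t subject to
# f0(x) - t <= 0 and the original constraint x1 >= 0.
epi = minimize(lambda z: z[2], x0=np.array([0.0, 0.0, 10.0]), method="SLSQP",
               constraints=[{"type": "ineq", "fun": lambda z: z[2] - f0(z[:2])},
                            {"type": "ineq", "fun": lambda z: z[0]}])

print(np.isclose(orig.fun, epi.fun, atol=1e-4))  # True: same optimal value
```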
8.1.3. First Order Conditions#
In this subsection, we focus on objective functions of type \(f : \RR^n \to \RR\) which are differentiable.
(First order optimality criterion for local optimal points)
Let \(f : \RR^n \to \RR\) be a real valued function with \(S = \dom f\).
Suppose that \(\bx^* \in \interior S\) is a local optimal point.
Assume that all the partial derivatives of \(f\) exist at \(\bx^*\).
Then, \(\nabla f(\bx^*) = \bzero\).
Fix some \(i \in \{1,\dots,n\}\).
Consider the one-dimensional function \(g_i : \RR \to \RR\) given by
\[ g_i (t) = f(\bx^* + t \be_i) \]where \(\be_i\) is the \(i\)-th unit vector of \(\RR^n\).
We note that \(g_i\) is differentiable at \(t=0\) and
\[ g_i'(0) = \frac{\partial f}{\partial x_i} (\bx^*). \]Since \(\bx^*\) is a local optimal point of \(f\), hence \(t=0\) is a local optimal point of \(g_i\).
Thus, \(g_i'(0) = 0\).
Thus, \(\frac{\partial f}{\partial x_i} (\bx^*) = 0\).
Since, this is true for every \(i=1,\dots,n\), hence \(\nabla f(\bx^*) = \bzero\).
We mention that a zero gradient is a necessary condition; i.e., if a point is a local optimal point, then the gradient of \(f\) at that point must be zero. It is not a sufficient condition. The gradient may be zero at a point which is not a local optimal point.
(Stationary point)
Let \(f : \RR^n \to \RR\) be a real valued function with \(S = \dom f\). Let \(\bx^* \in \interior S\) and assume that \(f\) is differentiable in some neighborhood of \(\bx^*\). Then, \(\bx^*\) is called a stationary point if \(\nabla f(\bx^*) = \bzero\).
Thus, the locally optimal points are necessarily stationary points.
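A one-line counterexample (a minimal Python sketch) makes the gap between necessity and sufficiency concrete: \(f(x) = x^3\) has a stationary point at \(x = 0\) that is neither a local minimum nor a local maximum.

```python
# f(x) = x^3 satisfies f'(0) = 0, yet x = 0 is not a local optimal point.
f = lambda x: x**3
df = lambda x: 3.0 * x**2   # derivative (the gradient in one dimension)

print(df(0.0))              # 0.0: x = 0 is a stationary point
print(f(-1e-3) < f(0.0))    # True: strictly smaller values arbitrarily close by
print(f(+1e-3) > f(0.0))    # True: strictly larger values too
```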
8.1.4. Second Order Conditions#
Recall the discussion on semidefinite and definite matrices in Eigen Values.
For a twice continuously differentiable function \(f\), the Hessian is symmetric. The semidefiniteness or definiteness of the Hessian provides necessary and sufficient conditions for local optimality at stationary points of \(f\); i.e., the points \(\ba \in \dom f\) where \(\nabla f(\ba) = \bzero\).
We have the following necessary conditions for local optimality.
If \(\ba\) is a local minimum point then \(\nabla^2 f(\ba)\) must be positive semidefinite.
If \(\ba\) is a local maximum point then \(\nabla^2 f(\ba)\) must be negative semidefinite.
We have the following sufficient conditions for local optimality at a stationary point \(\ba\).
If \(\nabla^2 f(\ba)\) is positive definite, then \(\ba\) is a strict local minimum.
If \(\nabla^2 f(\ba)\) is negative definite, then \(\ba\) is a strict local maximum.
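Before stating these conditions formally, here is a minimal sketch (assuming NumPy; the example Hessians are hypothetical) that classifies a stationary point from the eigenvalues of its symmetric Hessian:

```python
import numpy as np

def classify_stationary_point(hessian):
    """Classify a stationary point via the eigenvalues of the symmetric Hessian."""
    eig = np.linalg.eigvalsh(hessian)
    if np.all(eig > 0):
        return "strict local minimum"    # positive definite
    if np.all(eig < 0):
        return "strict local maximum"    # negative definite
    if eig.min() < 0 < eig.max():
        return "saddle point"            # indefinite
    return "inconclusive"                # semidefinite with a zero eigenvalue

# f(x, y) = x^2 + y^2 at the stationary point (0, 0):
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, 2.0]])))   # strict local minimum
# f(x, y) = x^2 - y^2 at the stationary point (0, 0):
print(classify_stationary_point(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point
```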
(Necessary second order optimality conditions)
Let \(f : \RR^n \to \RR\) be a real valued function with \(S = \dom f\). Assume that \(S\) is open. Further, assume that \(f\) is twice continuously differentiable over \(S\) and that \(\ba \in S\) is a stationary point.
Then, the following hold.
If \(\ba\) is a local minimum point of \(f\) over \(S\), then \(\nabla^2 f(\ba) \succeq \ZERO\).
If \(\ba\) is a local maximum point of \(f\) over \(S\), then \(\nabla^2 f(\ba) \preceq \ZERO\).
Proof. Assume \(\ba\) to be a local minimum point.
Then, there exists an open ball \(B(\ba, r) \subseteq S\) such that \(f(\bx) \geq f(\ba)\) for all \(\bx \in B(\ba, r)\).
Let \(\bd \in \RR^n\) be a nonzero vector.
For \(0 < t < \frac{r}{\| \bd \|}\), we define
\[ \ba_t = \ba + t \bd. \]By definition, \(\ba_t \in B(\ba, r)\).
Hence, for any \(t \in (0, \frac{r}{\| \bd \|})\), \(f(\ba_t) \geq f(\ba)\).
By linear approximation theorem (Theorem 5.5), there exists a vector \(\bz_t \in [\ba, \ba_t]\) such that
\[ f(\ba_t) - f(\ba) = \nabla f(\ba)^T (\ba_t - \ba) + \frac{1}{2} (\ba_t - \ba)^T \nabla^2 f(\bz_t) (\ba_t - \ba). \]Since \(\ba\) is a stationary point, hence \(\nabla f(\ba) = \bzero\).
Also, by definition of \(\ba_t\), \(\ba_t - \ba = t \bd\).
Thus,
\[ f(\ba_t) - f(\ba) = \frac{t^2}{2} \bd^T \nabla^2 f(\bz_t) \bd. \]
Since \(\ba\) is a local minimum point, \(f(\ba_t) - f(\ba) \geq 0\).
Thus, for any \(\bd \in \RR^n\) and any \(0 < t < \frac{r}{\| \bd \|}\), we have
\[ \frac{t^2}{2} \bd^T \nabla^2 f(\bz_t) \bd \geq 0. \]
Dividing by \(\frac{t^2}{2} > 0\) gives \(\bd^T \nabla^2 f(\bz_t) \bd \geq 0\). By the continuity of the Hessian, and the fact that \(\bz_t \to \ba\) as \(t \to 0^+\), we obtain
\[ \bd^T \nabla^2 f(\ba) \bd \geq 0 \Forall \bd \in \RR^n. \]
Thus, \(\nabla^2 f(\ba) \succeq \ZERO\).
The argument for the second statement is similar. We apply the same argument to \(-f\) and recognize that if \(\ba\) is a local maximum point of \(f\), then it is a local minimum point of \(-f\).
(Sufficient second order optimality conditions)
Let \(f : \RR^n \to \RR\) be a real valued function with \(S = \dom f\). Assume that \(S\) is open. Further, assume that \(f\) is twice continuously differentiable over \(S\) and that \(\ba \in S\) is a stationary point.
Then, the following hold.
If \(\nabla^2 f(\ba) \succ \ZERO\), then \(\ba\) is a strict local minimum point of \(f\) over \(S\).
If \(\nabla^2 f(\ba) \prec \ZERO\), then \(\ba\) is a strict local maximum point of \(f\) over \(S\).
Proof. Let \(\ba \in S\) be a stationary point satisfying \(\nabla^2 f(\ba) \succ \ZERO\).
Since the Hessian is continuous (\(f\) is twice continuously differentiable), hence there exists an open ball \(B(\ba, r)\) such that \(\nabla^2 f(\bx) \succ \ZERO\) for every \(\bx \in B(\ba, r)\).
By linear approximation theorem (Theorem 5.5), for any \(\bx \in B(\ba, r)\), there exists a vector \(\bz \in [\ba, \bx]\) such that
\[ f(\bx) - f(\ba) = \frac{1}{2} (\bx - \ba)^T \nabla^2 f(\bz) (\bx - \ba) \]
where the first order term vanishes since \(\nabla f(\ba) = \bzero\).
We note that \(\bz \in B(\ba, r)\). Hence, \(\nabla^2 f(\bz) \succ \ZERO\).
Thus, \(\frac{1}{2} (\bx - \ba)^T \nabla^2 f(\bz) (\bx - \ba) > 0\) whenever \(\bx \neq \ba\).
Thus, \(f(\bx) > f(\ba)\) holds true for every \(\bx \in B(\ba, r)\) with \(\bx \neq \ba\).
Thus, \(\ba\) is a strict local minimum point for \(f\) over \(S\).
The proof for the second statement is similar.
(Hessian based sufficient global optimality condition)
Let \(f : \RR^n \to \RR\) be a real valued function with \(\dom f = \RR^n\). Further, assume that \(f\) is twice continuously differentiable over \(\RR^n\). Suppose that \(\nabla^2 f(\bx) \succeq \ZERO\) for every \(\bx \in \RR^n\). Let \(\ba \in \RR^n\) be a stationary point of \(f\). Then, \(\ba\) is a global minimum point of \(f\).
In other words, if the Hessian of a function is continuous and always positive semidefinite, then all its stationary points are also global minimum points.
Proof. We are given that \(\ba\) is a stationary point; i.e., \(\nabla f(\ba) = \bzero\).
Let \(\bx \in \RR^n\).
By linear approximation theorem (Theorem 5.5), there exists a vector \(\bz \in [\ba, \bx]\) such that
\[ f(\bx) - f(\ba) = \frac{1}{2} (\bx - \ba)^T \nabla^2 f(\bz) (\bx - \ba) \]
where the first order term vanishes since \(\nabla f(\ba) = \bzero\). By hypothesis, \(\nabla^2 f(\bz) \succeq \ZERO\).
Thus, \(f(\bx) - f(\ba) \geq 0\) for every \(\bx \in \RR^n\).
Thus, \(\ba\) is a global minimum point of \(f\).
This result is a special case of the global optimality of convex optimization problems. If a convex function is twice differentiable, then its Hessian is positive semidefinite (Theorem 9.100).
8.1.4.1. Saddle Points#
(Saddle point)
Let \(f : \RR^n \to \RR\) be a real valued function with \(S = \dom f\). Assume that \(S\) is open.
A stationary point \(\ba \in S\) is called a saddle point of \(f\) over \(S\) if it is neither a local minimum point nor a local maximum point of \(f\) over \(S\).
(Sufficient condition for saddle points)
Let \(f : \RR^n \to \RR\) be a real valued function with \(S = \dom f\). Assume that \(S\) is open. Further, assume that \(f\) is twice continuously differentiable over \(S\) and that \(\ba \in S\) is a stationary point.
If \(\nabla^2 f(\ba)\) is an indefinite matrix, then \(\ba\) is a saddle point of \(f\) over \(S\).
Proof. We are given that \(\ba\) is a stationary point and \(\nabla^2 f(\ba)\) is indefinite.
Then, \(\nabla f(\ba) = \bzero\).
Recall from Theorem 4.133 that a matrix is indefinite if and only if at least one eigenvalue is positive and at least one eigenvalue is negative.
Let \(\lambda > 0\) be a positive eigenvalue of \(\nabla^2 f(\ba)\) and \(\bv\) be the corresponding normalized eigenvector.
Since \(S\) is open, hence there exists \(r > 0\) such that \(B(\ba, r) \subseteq S\).
Accordingly, \(\ba + t \bv \in B(\ba, r) \subseteq S\) for every \(t \in [0, r)\).
By quadratic approximation theorem (Theorem 5.6),
\[ f(\ba + t \bv) = f(\ba) + \frac{t^2}{2} \bv^T \nabla^2 f(\ba) \bv + o(t^2 \| \bv \|^2). \]Putting \(\nabla^2 f(\ba) \bv = \lambda \bv\) and using \(\| \bv \| = 1\),
\[ f(\ba + t \bv) = f(\ba) + \frac{\lambda t^2}{2} + o(t^2). \]Here \(o: \RR_{++} \to \RR\) is a function satisfying \(\frac{o(x)}{x} \to 0\) as \(x \to 0^+\).
Thus, \(\frac{o(t^2)}{t^2} \to 0\) as \(t \to 0^+\).
For every \(\epsilon > 0\), there exists \(\delta > 0\) such that
\[ \left | \frac{o(t^2)}{t^2} \right | < \epsilon, \text{ whenever } t < \delta. \]
Choose \(\epsilon = \frac{\lambda}{2}\).
Then, \(- \frac{\lambda t^2}{2} < o(t^2) < \frac{\lambda t^2}{2}\) for every \(t < \delta\).
Thus, \(\frac{\lambda t^2}{2} + o(t^2) > 0\) for every \(t < \delta\).
Thus, with \(r_1 = \min(\delta, r)\), for every \(0 < t < r_1\), \(f(\ba + t \bv) > f(\ba)\).
Thus, \(\ba\) is not a local maximum point.
A similar argument with a negative eigenvalue and corresponding normalized eigenvector shows that \(\ba\) cannot be a local minimum point.
Thus, \(\ba\) must be a saddle point.
8.1.5. Minimization of Proper Functions#
Let \(\VV\) be a real \(n\)-dimensional normed linear space. Let \(f : \VV \to \RERL\) be a proper function with \(S = \dom f\). We consider the problem of minimizing \(f\) over a set \(A \subseteq S\).
\(f\) is the objective function or cost function (being minimized).
The set \(A\) is the feasible set or constraint set.
Any \(\bx \in A\) is a feasible solution or feasible point.
If there is at least one feasible point (i.e., \(A\) is nonempty), the problem is feasible.
Otherwise, the problem is infeasible.
Let \(p^* = \inf_{\bx \in A} f(\bx)\) be the optimal value of the minimization problem.
We allow \(p^*\) to take values over the extended real line \(\ERL\).
If the minimization problem is infeasible, then \(p^* = \infty\).
If \(p^* = -\infty\), then the problem is unbounded below.
If there is some \(\bx^* \in A\) such that \(f(\bx^*) = p^*\), then \(\bx^*\) is an optimal solution or optimal point or minimizing point or minimizer or global minimum over \(A\).
Alternatively, \(f\) attains a minimum over \(A\) at \(\bx^*\). We write this as
\[ \bx^* \in \underset{\bx \in A}{\argmin} f(\bx). \]
If \(\bx^*\) is a unique minimizer of \(f\) over \(A\), then we abuse the notation and write
\[ \bx^* = \underset{\bx \in A}{\argmin} f(\bx). \]
It is possible that \(p^*\) is finite and yet there is no optimal point in \(A\).
In other words, the set \(\underset{\bx \in A}{\argmin} f(\bx)\) may be empty.
A basic question of optimization is whether an optimal solution exists.
(The set of optimal points)
The set of optimal points for the problem of minimization of a proper function \(f\) over a feasible set \(A\) is given by
\[ A \cap f^{-1}(p^*) \]
where \(p^* = \inf_{\bx \in A} f(\bx)\) and \(f^{-1}(y)\) denotes the level set of \(f\) given by \(\{\bx \in S \ST f(\bx) = y \}\).
This comes directly from the fact that
\(\bx^*\) must be feasible. Hence \(\bx^* \in A\).
\(f\) must attain the optimal value at \(\bx^*\). Thus, \(p^* = f(\bx^*)\). Hence, \(\bx^* \in f^{-1}(p^*)\).
In other words, the optimal set is the intersection of the feasible set \(A\) with the level set \(f^{-1}(p^*)\).
Thus, for an optimal solution to exist
\(p^*\) must be finite.
The level set \(f^{-1}(p^*)\) must be nonempty.
The feasible set \(A\) must be nonempty.
The intersection of \(A\) with \(f^{-1}(p^*)\) must be nonempty.
8.1.5.1. Coercive Functions#
(Coercive function)
A proper function \(f: \VV \to \RERL\) is called coercive over a set \(A\) if for every sequence \(\{ \bx_k \}\) of \(A\) such that \( \lim_{k \to \infty} \| \bx_k \| = \infty\), we have \(\lim_{k \to \infty} f(\bx_k) = \infty\).
If \(f\) is coercive over the entire vector space \(\VV\), we say that \(f\) is coercive.
One way to think about coercive functions is that they grow rapidly at the extremes of the domain on which they are defined.
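A quick numerical sketch (assuming NumPy; the functions and the sequence are hypothetical) of the defining property: along a sequence with \(\| \bx_k \| \to \infty\), a coercive function blows up while a non-coercive one need not.

```python
import numpy as np

coercive     = lambda x: np.linalg.norm(x)**2   # ||x||^2 -> infinity with ||x||
not_coercive = lambda x: np.exp(x[0])           # bounded as x_1 -> -infinity

for k in [1, 10, 100, 1000]:
    xk = np.array([-float(k), 0.0])             # ||x_k|| = k -> infinity
    print(k, coercive(xk), not_coercive(xk))
# coercive(x_k) grows without bound, while not_coercive(x_k) -> 0;
# hence exp(x_1) is not coercive even though ||x_k|| -> infinity.
```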
(Level sets of coercive functions)
Let \(f : \VV \to \RERL\) be a coercive proper function. Let \(a \in \RR\). If the level set \(f^{-1}(a)\) is nonempty, then it is bounded.
In other words, all nonempty level sets of a coercive function are bounded.
Proof. For contradiction, assume that there exists \(a \in \RR\) such that \(A = f^{-1}(a)\) is unbounded.
Then, there exists a sequence \(\{ \bx_k \}\) of \(A\) such that \( \lim_{k \to \infty} \| \bx_k \| = \infty\).
Since \(f\) is coercive, hence \(\lim_{k \to \infty} f(\bx_k) = \infty\).
But, by definition, \(f(\bx_k) = a\).
Hence \(\lim_{k \to \infty} f(\bx_k) = a\).
We have a contradiction.
Hence, \(f^{-1}(a)\) must be bounded.
(Sublevel sets of coercive functions)
Let \(f : \VV \to \RERL\) be a coercive proper function with \(S = \dom f\). For every \(r \in \RR\), define the sublevel set \(S_r\) as \(\{ \bx \in S \ST f(\bx) \leq r \}\). If \(S_r\) is nonempty, then it is bounded.
In other words, all nonempty sublevel sets of a coercive function are bounded.
Proof. For contradiction, assume that there exists \(r \in \RR\) such that \(S_r\) is unbounded.
Then, there exists a sequence \(\{ \bx_k \}\) of \(S_r\) such that \( \lim_{k \to \infty} \| \bx_k \| = \infty\).
Since \(f\) is coercive, hence \(\lim_{k \to \infty} f(\bx_k) = \infty\).
But, by definition, \(f(\bx_k) \leq r\) for every \(k\).
Hence, \(f(\bx_k)\) cannot diverge to \(\infty\).
We have a contradiction.
Hence, \(S_r\) must be bounded.
8.1.5.2. Weierstrass’ Theorem#
In this subsection, we examine the problem of unconstrained minimization of a proper closed function. Recall from Definition 3.64 that a function is called closed if all its sublevel sets are closed.
(Unconstrained minimization of a proper closed function)
Let \(\VV\) be a real \(n\)-dim normed linear space. Let \(f : \VV \to \RERL\) be a proper closed function with \(S = \dom f\).
Let \(p^* = \inf_{\bx \in \VV} f(\bx)\). Let \(X = f^{-1}(p^*)\) be the set of minimizers of \(f\) (over all of \(\VV\)). For every \(p \in \RR\), let \(S_p\) denote the sublevel set \(\{ \bx \ST f(\bx) \leq p \}\). Then,
\(p^* < \infty\).
\(X = \bigcap_{p > p^*} S_p\).
\(X\) is closed.
In words, the minimal value \(p^*\) is either finite or \(-\infty\). The set of minimizers is the intersection of all the (closed) sublevel sets for \(p > p^*\). The set of minimizers is a closed set.
Proof. We show that \(p^* < \infty\).
Since \(f\) is proper, hence \(S\) is nonempty.
Thus, there exists \(\bx \in S\) such that \(f(\bx) < \infty\).
Hence, \(p^* = \inf_{\bx \in \VV} f(\bx) < \infty\).
Let \(Y = \bigcap_{p > p^*} S_p\). We have to show that \(X = Y\).
We first show that \(X \subseteq Y\).
Let \(\bx \in X\).
Then, \(f(\bx) = p^*\).
Then, \(f(\bx) < p\) for every \(p > p^*\).
Hence, \(\bx \in S_p\) for every \(p > p^*\).
Thus, \(\bx \in Y\).
Thus, \(X \subseteq Y\).
We now show that \(Y \subseteq X\).
Let \(\bx \in Y\).
We claim that \(f(\bx) = p^*\) must hold true.
\(f(\bx)\) cannot be smaller than \(p^*\) since by definition \(p^* = \inf_{\bx \in \VV} f(\bx)\).
Thus, \(f(\bx) \geq p^*\) must hold true.
Now, for contradiction, assume that \(f(\bx) > p^*\).
Let \(q = f(\bx)\).
Let \(p = \frac{p^* + q}{2}\).
Then, \(p^* < p < q\).
Hence, \(\bx \notin S_p\).
But then, \(\bx \notin Y\) since \(Y \subseteq S_p\) as \(p > p^*\).
A contradiction.
Thus, the only allowed value for \(f(\bx)\) is \(p^*\).
Thus, \(\bx \in X\).
Thus, \(Y \subseteq X\).
Thus,
\[ X = f^{-1}(p^*) = Y = \bigcap_{p > p^*} S_p. \]
We show that \(X\) is closed.
If \(X\) is empty, then it is closed by definition.
Otherwise, \(X\) is an intersection of sublevel sets.
Since \(f\) is closed, its sublevel sets are closed.
Hence \(S_p\) is closed for every \(p\).
Thus, \(X\) is an intersection of closed sets.
Thus, \(X\) is closed.
(Weierstrass’ theorem)
Let \(\VV\) be a real \(n\)-dim normed linear space. Let \(f : \VV \to \RERL\) be a proper closed function with \(S = \dom f\).
Let \(p^* = \inf_{\bx \in \VV} f(\bx)\). Let \(X = f^{-1}(p^*)\) be the set of minimizers of \(f\) (over all of \(\VV\)). For every \(p \in \RR\), let \(S_p\) denote the sublevel set \(\{ \bx \ST f(\bx) \leq p \}\).
Assume that one of the following conditions is true.
\(S = \dom f\) is closed and bounded.
There exists a scalar \(r \in \RR\) such that the sublevel set \( S_r = \{ \bx \ST f(\bx) \leq r \}\) is nonempty and bounded.
\(f\) is coercive.
Then, \(X\) (the set of minimizers of \(f\)) is nonempty and compact.
Here is this result in simplified language. The theorem addresses the problem of unconstrained minimization of a function.
If \(f\) is proper and closed and its domain is closed and bounded, then its set of minimizers is nonempty and compact.
If \(f\) is proper and closed and some nonempty sublevel set of \(f\) is bounded, then its set of minimizers is nonempty and compact.
If \(f\) is proper, closed and coercive, then its set of minimizers is nonempty and compact.
Since the set of minimizers is nonempty and compact, it is nonempty, closed and bounded.
Proof. If there exists \(\bx \in S\) such that \(f(\bx) = p^*\), then \(X\) is nonempty. To show that \(X\) is compact, we just need to show that it is closed and bounded. By Theorem 4.66, every closed and bounded subset of \(\VV\) which is a real \(n\)-dimensional normed linear space is compact. By Theorem 8.8, the optimal value \(p^* < \infty\) and \(X\), the set of minimizers of a proper and closed function, is closed. Thus, we just need to show that \(X\) is nonempty and bounded.
Assume that condition (1) holds.
Since \(f\) is proper, hence \(S\) is nonempty.
Consider a sequence \(\{ \bx_k \}\) of \(S\) such that \(\lim_{k \to \infty} f(\bx_k) = p^*\).
Such a sequence exists since \(p^* = \inf_{\bx \in \VV} f(\bx)\): for every \(k \in \Nat\), we can pick \(\bx_k \in S\) such that \(f(\bx_k) < p^* + \frac{1}{k}\) if \(p^*\) is finite, and \(f(\bx_k) < -k\) if \(p^* = -\infty\).
Since \(S\) is bounded, hence \(\{ \bx_k \}\) has a convergent subsequence by the Bolzano-Weierstrass theorem (Theorem 4.69).
Let \(\{ \bx_l \}\) be a convergent subsequence of \(\{ \bx_k \}\) with \(\lim \bx_l = \bx^*\).
Since \(S\) is closed, hence \(\bx^* \in S\).
Since \(\{ f(\bx_k) \}\) is convergent and \(\{ f(\bx_l) \}\) is a subsequence of \(\{ f(\bx_k) \}\), hence
\[ \lim_{l \to \infty} f(\bx_l) = \lim_{k \to \infty} f(\bx_k) = p^*. \]Since \(f\) is closed, hence it is lower semicontinuous (l.s.c.) at \(\bx^* \in S\) (see Theorem 3.119).
Also, note that since \(\{ f(\bx_k) \}\) is convergent, hence
\[ \liminf_{l \to \infty} f(\bx_l) = \lim_{l \to \infty} f(\bx_l). \]Since \(f\) is l.s.c. at \(\bx^* \in S\), hence by Theorem 3.108,
\[ f(\bx^*) \leq \liminf_{l \to \infty} f(\bx_l) = \lim_{l \to \infty} f(\bx_l) = \liminf_{k \to \infty} f(\bx_k) = p^*. \]Since \(p^*\) is the infimum value of \(f\), hence \(f(\bx^*)\) cannot be smaller than \(p^*\).
Thus, \(f(\bx^*) = p^*\) must be true.
Since \(f\) is proper and \(\bx^* \in S\), hence \(p^* > -\infty\). Thus the optimal value is finite.
Thus, \(f(\bx^*) = p^*\) means that \(\bx^*\) is an optimal solution for the minimization of \(f\).
Thus, the set \(X = f^{-1}(p^*)\) is nonempty.
\(X\) is bounded since \(S\) is bounded by hypothesis.
Since \(X\) is closed and bounded, it is compact.
Assume that condition (2) holds.
The sublevel set \(S_r\) is nonempty and bounded.
Since \(f\) is closed, hence \(S_r\) is also closed.
Consider the restriction of \(f\) on \(S_r\) given by
\[\begin{split} \tilde{f} = \begin{cases} f(\bx), & f(\bx) \leq r \\ \infty, & \text{ otherwise }. \end{cases} \end{split}\]Then, \(\dom \tilde{f} = S_r\). Thus, \(\dom \tilde{f}\) is nonempty, closed and bounded.
Then, \(\tilde{f}\) never takes the value \(-\infty\) and is not identically \(\infty\). Hence, \(\tilde{f}\) is a proper function.
For any \(p > r\), the sublevel set of \(\tilde{f}\) is identical to \(S_r\).
For any \(p \leq r\), the sublevel set of \(\tilde{f}\) is identical to \(S_p\), the corresponding sublevel set of \(f\).
Since \(f\) is closed, all these sublevel sets are closed.
Thus, \(\tilde{f}\) is closed too.
Applying the theorem under condition (1) to \(\tilde{f}\), the set of minimizers of \(\tilde{f}\) is nonempty and compact.
We also note that the minimizers of \(f\) are identical to the minimizers of \(\tilde{f}\).
Thus, the set of minimizers of \(f\) is nonempty and compact.
Assume that condition (3) holds.
Since \(f\) is proper, hence it has some nonempty sublevel sets.
Let \(r \in \RR\) be such that the sublevel set \(S_r = \{ \bx \in S \ST f(\bx) \leq r \}\) is nonempty.
By Theorem 8.7, the nonempty sublevel sets of \(f\) are bounded. Hence, \(S_r\) is bounded.
Then, by applying condition (2), the set of minimizers of \(f\) is nonempty and compact.
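As an illustration of Weierstrass' theorem under condition (3) (a sketch assuming SciPy; the function and the starting point are hypothetical): the objective below is proper, closed (being continuous) and coercive, so its set of minimizers must be nonempty, and a numerical solver finds one of them.

```python
import numpy as np
from scipy.optimize import minimize

# f(x) = x1^4 + x2^4 - 2 x1 x2 is continuous (hence closed) and coercive,
# so by Weierstrass' theorem its set of minimizers is nonempty and compact.
f = lambda x: x[0]**4 + x[1]**4 - 2.0 * x[0] * x[1]

result = minimize(f, x0=np.array([3.0, -2.0]))
print(result.x)    # one of the two minimizers, approximately +/- (0.707, 0.707)
print(result.fun)  # the attained optimal value, approximately p* = -0.5
```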
(Minimizing a proper closed function over a closed set)
Let \(f : \VV \to \RERL\) be a proper closed function with \(S = \dom f\). Let \(A \subseteq S\) be a nonempty closed set. Consider the problem of minimizing \(f\) over \(A\).
Further, assume that one of the following conditions is true.
\(A\) is bounded.
There exists a scalar \(r \in \RR\) such that the set \(\{ \bx \in A \ST f(\bx) \leq r \}\) is nonempty and bounded.
\(f\) is coercive over \(A\).
Then, the set of minimizers of \(f\) over \(A\) is nonempty and compact.
Proof. We define a restriction \(g : \VV \to \RERL\) of \(f\) over the set \(A\) as follows:
\[\begin{split} g(\bx) = \begin{cases} f(\bx), & \bx \in A \\ \infty, & \text{ otherwise}. \end{cases} \end{split}\]
\(\dom g = A\). Thus, \(\dom g\) is closed.
Since \(f\) never takes the value \(-\infty\), hence \(g\) also never takes the value \(-\infty\).
Since \(A\) is nonempty, hence there exists \(\bx \in \VV\) such that \(g(\bx) < \infty\).
Hence, \(g\) is a proper function.
Also, note that the set \(\{ \bx \in A \ST f(\bx) \leq r \}\) is nothing but the sublevel set of \(g\) for the scalar \(r\).
\[ \{ \bx \ST g(\bx) \leq r \} = A \cap \{ \bx \ST f(\bx) \leq r \}. \]Since \(f\) is closed, all its sublevel sets are closed.
\(A\) is closed, hence \( A \cap \{ \bx \ST f(\bx) \leq r \}\) is also closed.
Hence, all the sublevel sets of \(g\) are also closed. Hence, \(g\) is also closed.
The set of minimizers of \(f\) over \(A\) is nothing but the set of minimizers of \(g\).
If \(f\) is coercive over \(A\), then \(g\) is coercive (over its entire domain).
Thus, applying Theorem 8.9, the set of minimizers of \(g\) is nonempty and compact. So is the set of minimizers of \(f\) over \(A\).
(Minimizing a real valued l.s.c. function over a closed set)
Let \(f : \VV \to \RR\) be a real valued function with \(\dom f = \VV\). Let \(A \subseteq \VV\) be a nonempty closed set. Assume that \(f\) is lower semicontinuous over \(A\). Consider the problem of minimizing \(f\) over \(A\).
Further, assume that one of the following conditions is true.
\(A\) is bounded.
There exists a scalar \(r \in \RR\) such that the set \(\{ \bx \in A \ST f(\bx) \leq r \}\) is nonempty and bounded.
\(f\) is coercive over \(A\).
Then, the set of minimizers of \(f\) over \(A\) is nonempty and compact.
Proof. We define a restriction \(g : \VV \to \RERL\) of \(f\) over the set \(A\) as follows:
\[\begin{split} g(\bx) = \begin{cases} f(\bx), & \bx \in A \\ \infty, & \text{ otherwise}. \end{cases} \end{split}\]
\(\dom g = A\). Thus, \(\dom g\) is closed.
Since \(f\) is real valued hence \(g\) also never takes the value \(-\infty\).
Since \(A\) is nonempty, hence there exists \(\bx \in \VV\) such that \(g(\bx) < \infty\).
Hence, \(g\) is a proper function.
Since \(f\) is l.s.c. over \(A\) and \(A\) is closed, \(g\) is l.s.c. over \(\VV\); hence \(g\) is closed.
Also, note that the set \(\{ \bx \in A \ST f(\bx) \leq r \}\) is nothing but the sublevel set of \(g\) for the scalar \(r\).
The set of minimizers of \(f\) over \(A\) is nothing but the set of minimizers of \(g\).
If \(f\) is coercive over \(A\), then \(g\) is coercive (over its entire domain).
Thus, applying Theorem 8.9, the set of minimizers of \(g\) is nonempty and compact. So is the set of minimizers of \(f\) over \(A\).
8.1.6. Some Simple Problems and Technical Results#
In this subsection, we provide some technical results on some simple optimization problems. These results are useful building blocks for bigger problems.
(Nonnegativity of affine functional on nonnegative orthant)
Consider the affine function \(f: \RR^n \to \RR\) given by
\[ f(\bx) = \ba^T \bx + b \]
where \(\ba \in \RR^n\) and \(b \in \RR\).
Then, \(f(\bx) \geq 0\) for every \(\bx \succeq \bzero\) if and only if \(\ba \succeq \bzero\) and \(b \geq 0\).
Proof. Assume that \(\ba \succeq \bzero\) and \(b \geq 0\). Then, for every \(\bx \succeq \bzero\), \(f(\bx)\) is a sum of nonnegative terms. Hence, \(f(\bx) \geq 0\).
For the converse, assume that \(f(\bx) \geq 0\) for every \(\bx \succeq \bzero\).
If \(b < 0\), then for \(\bx = \bzero\), \(f(\bx) = b < 0\). Hence, \(b\) cannot be smaller than 0.
Now, for contradiction, assume that \(a_i < 0\) for some \(i \in \{1,\dots,n\}\).
Let \(\bx\) be such that \(x_j = 0\) for all \(j \neq i\) and \(x_i = t\) for some \(t > 0\).
Then, \(f(\bx) = t a_i + b\).
By increasing \(t\) suitably, we can get \(f(\bx) < 0\); any \(t > \frac{b}{-a_i}\) works.
Thus, \(\ba \succeq \bzero\) is also necessary to ensure that \(f(\bx) \geq 0\) for all \(\bx \succeq \bzero\).
(Minimization of linear functional over unit ball)
For any \(\ba \in \RR^n\), the optimal value of the problem

\[\begin{split} & \text{minimize }\quad \ba^T \bx \\ & \text{subject to }\quad \| \bx \| \leq 1 \end{split}\]

is \(- \| \ba\|\).
Proof. If \(\ba = \bzero\), then \(\ba^T \bx = 0\). The minimum value is 0 which is equal to \(-\| \ba \|\).
Now consider the case where \(\ba \neq \bzero\).
By the Cauchy-Schwarz inequality,
\[ \ba^T \bx \geq - \| \ba \| \| \bx \| \geq - \| \ba \| \]for any \(\| \bx \| \leq 1\).
Thus, we have established a lower bound:
\[ \underset{ \| \bx \| \leq 1 }{\inf} \ba^T \bx \geq - \| \ba \|. \]We now show that the lower bound is indeed attained.
Let \(\bx = \frac{-\ba}{ \| \ba \|}\).
Then \(\| \bx \| = 1\). Thus, \(\bx\) belongs to the unit ball.
Now \(\ba^T \bx = - \| \ba \|\). Thus, the lower bound is attained.
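A numerical sanity check of this result (a sketch assuming NumPy; the vector and the sampling scheme are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(5)         # a hypothetical nonzero vector

x_star = -a / np.linalg.norm(a)    # the minimizer constructed in the proof
print(np.isclose(a @ x_star, -np.linalg.norm(a)))  # True: the bound is attained

# No point of the unit ball does better than -||a||.
for _ in range(1000):
    x = rng.standard_normal(5)
    x /= max(1.0, np.linalg.norm(x))   # map the sample into the unit ball
    assert a @ x >= -np.linalg.norm(a) - 1e-12
print("all sampled points respect the lower bound -||a||")
```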