9.18. Smoothness#
The primary reference for this section is [7].
9.18.1. L-Smooth Functions#
Definition 9.80 (L-smooth functions)
Let $f : \mathbb{V} \to \mathbb{R}$ be a function which is differentiable over an open set $D \subseteq \mathbb{V}$. For some $L \geq 0$, $f$ is called $L$-smooth over $D$ if
$$\| \nabla f(x) - \nabla f(y) \|_* \leq L \| x - y \| \quad \text{for every } x, y \in D.$$
The constant $L$ is known as the smoothness parameter of $f$.
- Since the definition involves the dual norm of the gradient difference, smoothness depends on the choice of the norm on $\mathbb{V}$.
- If $f$ is $L$-smooth, then $\nabla f$ is Lipschitz continuous over $D$.
- The class of functions which are $L$-smooth over $D$ is denoted by $C_L^{1,1}(D)$.
- When $D = \mathbb{V}$, we simply write $C_L^{1,1}$.
- The class of functions which are $L$-smooth for some $L \geq 0$ is denoted by $C^{1,1}(D)$.
- By definition, if a function is $L$-smooth, then it is also $L'$-smooth for every $L' \geq L$.
- Thus, it is often useful to identify the smallest possible value of $L$ for which $f$ is $L$-smooth.
Example 9.82 (Zero smoothness of affine functions)
Let $f : \mathbb{V} \to \mathbb{R}$ be an affine function given by $f(x) = \langle a, x \rangle + b$, where $a \in \mathbb{V}^*$ and $b \in \mathbb{R}$.
Then, $f$ is $0$-smooth.
To see this, we note that $\nabla f(x) = a$ for every $x$; hence $\| \nabla f(x) - \nabla f(y) \|_* = 0 \leq 0 \cdot \| x - y \|$ for every $x, y \in \mathbb{V}$.
Theorem 9.263 (Smoothness of quadratic functions)
Let $A \in \mathbb{S}^n$, $b \in \mathbb{R}^n$, $c \in \mathbb{R}$, and let $\| \cdot \|$ be a norm on $\mathbb{R}^n$. Consider the quadratic function $f : \mathbb{R}^n \to \mathbb{R}$ given by
$$f(x) = \frac{1}{2} x^T A x + b^T x + c.$$
Then, $f$ is $\| A \|$-smooth, where
$$\| A \| = \max \{ \| A x \|_* \mid \| x \| \leq 1 \}$$
is the induced norm of $A$, and $\| A \|$ is the smallest possible smoothness parameter of $f$.
Proof. We note that $\nabla f(x) = A x + b$. Thus, for any $x, y \in \mathbb{R}^n$,
$$\| \nabla f(x) - \nabla f(y) \|_* = \| A (x - y) \|_* \leq \| A \| \| x - y \|.$$
Thus, $f$ is $\| A \|$-smooth.
We next show that $\| A \|$ is the smallest smoothness parameter.
- Assume that $f$ is $L$-smooth for some $L \geq 0$.
- By definition of the induced norm, there exists a vector $\tilde{x}$ with $\| \tilde{x} \| = 1$ such that $\| A \tilde{x} \|_* = \| A \|$.
- Then,
$$\| A \| = \| A \tilde{x} \|_* = \| \nabla f(\tilde{x}) - \nabla f(0) \|_* \leq L \| \tilde{x} - 0 \| = L.$$
Thus, $L \geq \| A \|$; hence $\| A \|$ is the smallest possible smoothness parameter.
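For the Euclidean norm, the induced norm of a symmetric matrix is its spectral norm, so the theorem can be probed numerically. A minimal sketch with NumPy (the matrix, vectors, and helper names below are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Quadratic f(x) = 0.5 x^T A x + b^T x + c with symmetric A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad_f(x):
    return A @ x + b

# For the Euclidean norm, the induced norm of A is its spectral norm.
L = np.linalg.norm(A, 2)

def lipschitz_holds(x, y):
    return np.linalg.norm(grad_f(x) - grad_f(y)) <= L * np.linalg.norm(x - y) + 1e-12

# The bound holds for random pairs of points ...
ok = all(lipschitz_holds(rng.normal(size=2), rng.normal(size=2))
         for _ in range(1000))

# ... and is attained along the top eigenvector of A, confirming that
# no smaller smoothness constant works.
w, V = np.linalg.eigh(A)
u = V[:, np.argmax(np.abs(w))]
ratio = np.linalg.norm(grad_f(u) - grad_f(np.zeros(2))) / np.linalg.norm(u)
```

The ratio computed on the last line matches $\| A \|_2$ up to floating-point error, illustrating that the induced norm is the tightest constant.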
9.18.1.1. Descent Lemma#
Theorem 9.264 (Descent lemma)
Let $f : \mathbb{V} \to \mathbb{R}$ be $L$-smooth over a convex set $D$. Then for every $x, y \in D$,
$$f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \| y - x \|^2.$$
Proof. We proceed as follows:
- By the fundamental theorem of calculus,
$$f(y) - f(x) = \int_0^1 \langle \nabla f(x + t(y - x)), y - x \rangle \, dt.$$
- By adding and subtracting $\langle \nabla f(x), y - x \rangle$, we get
$$f(y) - f(x) = \langle \nabla f(x), y - x \rangle + \int_0^1 \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x \rangle \, dt.$$
- This gives us
$$\begin{aligned}
| f(y) - f(x) - \langle \nabla f(x), y - x \rangle |
&\leq \int_0^1 | \langle \nabla f(x + t(y - x)) - \nabla f(x), y - x \rangle | \, dt \\
&\overset{(a)}{\leq} \int_0^1 \| \nabla f(x + t(y - x)) - \nabla f(x) \|_* \| y - x \| \, dt \\
&\overset{(b)}{\leq} \int_0^1 t L \| y - x \|^2 \, dt = \frac{L}{2} \| y - x \|^2.
\end{aligned}$$
- (a) is an application of the generalized Cauchy-Schwarz inequality (Theorem 4.108).
- (b) is the application of the $L$-smoothness of $f$.
- Thus,
$$f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \| y - x \|^2.$$
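The descent lemma is easy to probe numerically. A small sketch (NumPy; it assumes the 1-smoothness of log-sum-exp w.r.t. the Euclidean norm, which is established later in Example 9.85):

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # log-sum-exp, computed stably; it is 1-smooth w.r.t. the l2 norm
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def grad_f(x):
    e = np.exp(x - x.max())
    return e / e.sum()

L = 1.0

def descent_bound_holds(x, y):
    # f(y) <= f(x) + <grad f(x), y - x> + (L/2) ||y - x||^2
    lhs = f(y)
    rhs = f(x) + grad_f(x) @ (y - x) + 0.5 * L * np.linalg.norm(y - x) ** 2
    return lhs <= rhs + 1e-12

ok = all(descent_bound_holds(rng.normal(size=5), rng.normal(size=5))
         for _ in range(1000))
```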
9.18.1.2. Characterization of L-Smooth Functions#
Theorem 9.265 (Characterization of L-smooth functions)
Let $f : \mathbb{V} \to \mathbb{R}$ be a convex function which is differentiable over $\mathbb{V}$. Let $L > 0$. The following claims are equivalent:
1. $f$ is $L$-smooth.
2. $f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \| x - y \|^2$ for every $x, y \in \mathbb{V}$.
3. $f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2L} \| \nabla f(x) - \nabla f(y) \|_*^2$ for every $x, y \in \mathbb{V}$.
4. $\langle \nabla f(x) - \nabla f(y), x - y \rangle \geq \frac{1}{L} \| \nabla f(x) - \nabla f(y) \|_*^2$ for every $x, y \in \mathbb{V}$.
5. $f(\lambda x + (1 - \lambda) y) \geq \lambda f(x) + (1 - \lambda) f(y) - \frac{L}{2} \lambda (1 - \lambda) \| x - y \|^2$ for every $x, y \in \mathbb{V}$ and $\lambda \in [0, 1]$.
Proof. (1) $\implies$ (2): This follows directly from the descent lemma (Theorem 9.264).
(2) $\implies$ (3):
- We are given that (2) is satisfied.
- If $\nabla f(x) = \nabla f(y)$, then (3) reduces to the gradient inequality for convex functions and holds trivially.
- Fix a vector $x \in \mathbb{V}$.
- Consider a function $g_x : \mathbb{V} \to \mathbb{R}$ given by
$$g_x(y) = f(y) - f(x) - \langle \nabla f(x), y - x \rangle.$$
- Then, $g_x$ is convex and differentiable with $\nabla g_x(y) = \nabla f(y) - \nabla f(x)$.
- By hypothesis in property (2), for any $y, z \in \mathbb{V}$,
$$g_x(z) \leq g_x(y) + \langle \nabla g_x(y), z - y \rangle + \frac{L}{2} \| z - y \|^2.$$
- Now, $\nabla g_x(x) = 0$.
- Thus, $x$ is a minimizer of the convex function $g_x$. We note in particular that $g_x(x) = 0$.
- Since $x$ is a minimizer, $g_x(z) \geq 0$ for every $z \in \mathbb{V}$.
- In other words, $0 = g_x(x) = \min_z g_x(z)$.
- We can also see that $g_x$ satisfies the hypothesis (2) with the same constant $L$.
- Let $g = \nabla g_x(y) = \nabla f(y) - \nabla f(x)$.
- Let $v \in \mathbb{V}$ be a unit-norm vector satisfying $\langle g, v \rangle = \| g \|_*$; such a vector exists by the definition of the dual norm.
- Choose $z = y - \frac{\| g \|_*}{L} v$.
- Then, $\langle g, z - y \rangle = - \frac{\| g \|_*^2}{L}$ and $\frac{L}{2} \| z - y \|^2 = \frac{\| g \|_*^2}{2L}$.
- Using property (2) on the pair $(y, z)$,
$$0 = g_x(x) \leq g_x(z) \leq g_x(y) - \frac{\| g \|_*^2}{L} + \frac{\| g \|_*^2}{2L} = g_x(y) - \frac{\| g \|_*^2}{2L}.$$
- Simplifying this, we get
$$g_x(y) \geq \frac{1}{2L} \| \nabla f(x) - \nabla f(y) \|_*^2,$$
which is precisely property (3), as desired.
(3) $\implies$ (4):
- For the pair $(x, y)$, property (3) gives
$$f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2L} \| \nabla f(x) - \nabla f(y) \|_*^2.$$
- For the pair $(y, x)$, property (3) gives
$$f(x) \geq f(y) + \langle \nabla f(y), x - y \rangle + \frac{1}{2L} \| \nabla f(x) - \nabla f(y) \|_*^2.$$
- Adding the two inequalities and canceling the term $f(x) + f(y)$ from both sides, we get
$$0 \geq \langle \nabla f(x) - \nabla f(y), y - x \rangle + \frac{1}{L} \| \nabla f(x) - \nabla f(y) \|_*^2.$$
- Rearranging, we get
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \geq \frac{1}{L} \| \nabla f(x) - \nabla f(y) \|_*^2,$$
as desired.
(4) $\implies$ (1):
- When $\nabla f(x) = \nabla f(y)$, the smoothness inequality $\| \nabla f(x) - \nabla f(y) \|_* \leq L \| x - y \|$ holds trivially. Hence, assume $\nabla f(x) \neq \nabla f(y)$.
- By generalized Cauchy-Schwarz inequality (Theorem 4.108),
$$\langle \nabla f(x) - \nabla f(y), x - y \rangle \leq \| \nabla f(x) - \nabla f(y) \|_* \| x - y \|.$$
- Thus, combining with hypothesis (4), we obtain
$$\frac{1}{L} \| \nabla f(x) - \nabla f(y) \|_*^2 \leq \| \nabla f(x) - \nabla f(y) \|_* \| x - y \|.$$
- Since $\nabla f(x) \neq \nabla f(y)$, the quantity $\| \nabla f(x) - \nabla f(y) \|_*$ is positive.
- Canceling it from both sides, we get $\| \nabla f(x) - \nabla f(y) \|_* \leq L \| x - y \|$, as desired.
We have shown so far that (1), (2), (3) and (4) are equivalent statements. We are left with showing that (5) is equivalent to the other statements.
(2) $\implies$ (5):
- Pick any $x, y \in \mathbb{V}$ and $\lambda \in [0, 1]$.
- Let $z = \lambda x + (1 - \lambda) y$. By hypothesis (2),
$$f(x) \leq f(z) + \langle \nabla f(z), x - z \rangle + \frac{L}{2} \| x - z \|^2,$$
$$f(y) \leq f(z) + \langle \nabla f(z), y - z \rangle + \frac{L}{2} \| y - z \|^2.$$
- Note that $x - z = (1 - \lambda)(x - y)$ and $y - z = -\lambda (x - y)$. Thus, the previous two inequalities are same as
$$f(x) \leq f(z) + (1 - \lambda) \langle \nabla f(z), x - y \rangle + \frac{L (1 - \lambda)^2}{2} \| x - y \|^2,$$
$$f(y) \leq f(z) - \lambda \langle \nabla f(z), x - y \rangle + \frac{L \lambda^2}{2} \| x - y \|^2.$$
- Multiplying the first inequality by $\lambda$, the second by $(1 - \lambda)$, and adding, the inner product terms cancel, giving
$$\lambda f(x) + (1 - \lambda) f(y) \leq f(z) + \frac{L}{2} \lambda (1 - \lambda) \| x - y \|^2.$$
- Rearranging, we get property (5).
(5) $\implies$ (2):
- Pick any $x, y \in \mathbb{V}$ and $t \in (0, 1)$.
- By hypothesis in inequality (5), applied with $\lambda = 1 - t$,
$$f(x + t(y - x)) \geq (1 - t) f(x) + t f(y) - \frac{L}{2} t (1 - t) \| x - y \|^2.$$
- Rearranging the terms, we obtain
$$t f(y) \leq t f(x) + \left[ f(x + t(y - x)) - f(x) \right] + \frac{L}{2} t (1 - t) \| x - y \|^2.$$
- Division by $t$ gives
$$f(y) \leq f(x) + \frac{f(x + t(y - x)) - f(x)}{t} + \frac{L}{2} (1 - t) \| x - y \|^2.$$
- Recalling the definition of directional derivative (Definition 9.70):
$$f'(x; y - x) = \lim_{t \to 0^+} \frac{f(x + t(y - x)) - f(x)}{t}.$$
- Since the previous inequality is valid for every $t \in (0, 1)$, we can take the limit $t \to 0^+$ on the right hand side.
- Recall from Theorem 9.203 that $f'(x; y - x) = \langle \nabla f(x), y - x \rangle$ for a differentiable $f$.
- Thus, we get:
$$f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2} \| x - y \|^2,$$
as desired.
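Property (4), often called co-coercivity of the gradient, can be checked numerically for a concrete function. A minimal sketch with NumPy (the matrix and helper names are our own illustrative choices; a convex quadratic with the Euclidean norm has smallest smoothness constant $L = \lambda_{\max}(A)$):

```python
import numpy as np

rng = np.random.default_rng(2)

# Convex quadratic f(x) = 0.5 x^T A x with A symmetric positive definite.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
L = np.linalg.eigvalsh(A).max()  # smallest smoothness constant w.r.t. l2

def grad_f(x):
    return A @ x

def cocoercivity_holds(x, y):
    # <grad f(x) - grad f(y), x - y> >= (1/L) ||grad f(x) - grad f(y)||^2
    g = grad_f(x) - grad_f(y)
    return g @ (x - y) >= (g @ g) / L - 1e-12

ok = all(cocoercivity_holds(rng.normal(size=2), rng.normal(size=2))
         for _ in range(1000))
```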
9.18.1.3. Second Order Characterization#
We now restrict our attention to the vector space $\mathbb{R}^n$ and to functions which are twice continuously differentiable.
Theorem 9.266 (L-smoothness and boundedness of the Hessian)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable function and let $L \geq 0$. The following claims are equivalent:
1. $f$ is $L$-smooth w.r.t. a norm $\| \cdot \|$.
2. $\| \nabla^2 f(x) \| \leq L$ for every $x \in \mathbb{R}^n$, where $\| \nabla^2 f(x) \| = \max \{ \| \nabla^2 f(x) d \|_* \mid \| d \| \leq 1 \}$ is the induced norm of the Hessian.
Proof. (2) $\implies$ (1):
- We are given that $\| \nabla^2 f(x) \| \leq L$ for every $x \in \mathbb{R}^n$.
- By the fundamental theorem of calculus, for any $x, y \in \mathbb{R}^n$,
$$\nabla f(y) - \nabla f(x) = \int_0^1 \nabla^2 f(x + t(y - x)) (y - x) \, dt.$$
- Taking the (dual)-norm on both sides,
$$\| \nabla f(y) - \nabla f(x) \|_* \leq \int_0^1 \| \nabla^2 f(x + t(y - x)) (y - x) \|_* \, dt \leq \int_0^1 L \| y - x \| \, dt = L \| y - x \|.$$
- Thus, $f$ is $L$-smooth.
(1) $\implies$ (2):
- We are given that $f$ is $L$-smooth.
- By fundamental theorem of calculus, for any $x, d \in \mathbb{R}^n$ and $\alpha > 0$,
$$\nabla f(x + \alpha d) - \nabla f(x) = \int_0^\alpha \nabla^2 f(x + t d) \, d \, dt.$$
- Taking the dual norm on both sides and using $L$-smoothness,
$$\left\| \int_0^\alpha \nabla^2 f(x + t d) \, d \, dt \right\|_* = \| \nabla f(x + \alpha d) - \nabla f(x) \|_* \leq L \alpha \| d \|.$$
- Dividing by $\alpha$ and taking the limit $\alpha \to 0^+$, the continuity of $\nabla^2 f$ gives $\| \nabla^2 f(x) d \|_* \leq L \| d \|$.
- Since this is valid for every $d \in \mathbb{R}^n$, we conclude that $\| \nabla^2 f(x) \| \leq L$.
Corollary 9.27 (L-smoothness w.r.t. the $\ell_2$ norm)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a twice continuously differentiable convex function. Then $f$ is $L$-smooth w.r.t. the $\ell_2$ norm if and only if
$$\nabla^2 f(x) \preceq L I \quad \text{for every } x \in \mathbb{R}^n.$$
Proof. Since $f$ is convex and twice continuously differentiable, its Hessian is symmetric positive semidefinite and its induced $\ell_2$ norm equals its largest eigenvalue:
$$\| \nabla^2 f(x) \|_2 = \lambda_{\max}(\nabla^2 f(x)).$$
From Theorem 9.266, $f$ is $L$-smooth if and only if $\lambda_{\max}(\nabla^2 f(x)) \leq L$ for every $x$, which is equivalent to $\nabla^2 f(x) \preceq L I$.
9.18.2. Strong Convexity#
Definition 9.81 (Strong convexity)
A function $f : \mathbb{V} \to (-\infty, \infty]$ is called $\sigma$-strongly convex for $\sigma > 0$ if $\operatorname{dom} f$ is convex and for every $x, y \in \operatorname{dom} f$ and $\lambda \in [0, 1]$,
$$f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y) - \frac{\sigma}{2} \lambda (1 - \lambda) \| x - y \|^2.$$
9.18.2.1. Strong Convexity and Convexity#
Strongly convex functions are convex. In fact, we have a stronger result available.
Theorem 9.267 (Strong convexity and convexity)
Assume that the ambient space $\mathbb{V}$ is Euclidean, i.e., the norm is induced by an inner product.
A function $f : \mathbb{V} \to (-\infty, \infty]$ is $\sigma$-strongly convex if and only if the function $f(\cdot) - \frac{\sigma}{2} \| \cdot \|^2$ is convex.
Proof. Let us define a function
$$g(x) = f(x) - \frac{\sigma}{2} \| x \|^2.$$
We need to show that $g$ is convex if and only if $f$ is $\sigma$-strongly convex.
- We first note that $\operatorname{dom} g = \operatorname{dom} f$.
- Thus, $\operatorname{dom} g$ is convex if and only if $\operatorname{dom} f$ is convex.
- Now, pick any $x, y \in \operatorname{dom} f$ and $\lambda \in [0, 1]$.
- Now, the convexity inequality for $g$ reads
$$f(\lambda x + (1-\lambda) y) - \frac{\sigma}{2} \| \lambda x + (1-\lambda) y \|^2 \leq \lambda f(x) + (1-\lambda) f(y) - \frac{\sigma \lambda}{2} \| x \|^2 - \frac{\sigma (1-\lambda)}{2} \| y \|^2.$$
- Since the norm is Euclidean, hence
$$\| \lambda x + (1-\lambda) y \|^2 = \lambda \| x \|^2 + (1-\lambda) \| y \|^2 - \lambda (1-\lambda) \| x - y \|^2.$$
- Thus, the convexity inequality for $g$ reduces to
$$f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y) - \frac{\sigma}{2} \lambda (1-\lambda) \| x - y \|^2,$$
which is nothing but the $\sigma$-strong convexity inequality for $f$.
9.18.2.2. Quadratic Functions#
Theorem 9.268 (Strong convexity of quadratic functions)
Let $A \in \mathbb{S}^n$, $b \in \mathbb{R}^n$ and $c \in \mathbb{R}$. Consider the quadratic function $f : \mathbb{R}^n \to \mathbb{R}$ given by
$$f(x) = \frac{1}{2} x^T A x + b^T x + c.$$
Then $f$ is $\sigma$-strongly convex w.r.t. the $\ell_2$ norm if and only if $A \succeq \sigma I$. In particular, $f$ is strongly convex if and only if $A$ is positive definite, and the largest strong convexity parameter is $\lambda_{\min}(A)$.
Proof. Due to Theorem 9.267, $f$ is $\sigma$-strongly convex if and only if the function
$$g(x) = f(x) - \frac{\sigma}{2} \| x \|_2^2 = \frac{1}{2} x^T (A - \sigma I) x + b^T x + c$$
is convex.
- We note that $g$ is also a quadratic function.
- As shown in Example 9.40, a quadratic function is convex if and only if its matrix is positive semidefinite; i.e., $A - \sigma I \succeq 0$.
- This is equivalent to $A \succeq \sigma I$, i.e., $\lambda_{\min}(A) \geq \sigma$.
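The claim can be verified numerically for a concrete positive definite matrix. A minimal sketch with NumPy (the matrix and helper names are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

A = np.array([[2.0, 0.5], [0.5, 1.0]])  # symmetric positive definite
sigma = np.linalg.eigvalsh(A).min()      # largest strong convexity parameter

def f(x):
    return 0.5 * x @ A @ x

def strong_convexity_holds(x, y, lam):
    # f(lam x + (1-lam) y) <= lam f(x) + (1-lam) f(y)
    #                         - (sigma/2) lam (1-lam) ||x - y||^2
    lhs = f(lam * x + (1 - lam) * y)
    rhs = (lam * f(x) + (1 - lam) * f(y)
           - 0.5 * sigma * lam * (1 - lam) * np.linalg.norm(x - y) ** 2)
    return lhs <= rhs + 1e-12

ok = all(strong_convexity_holds(rng.normal(size=2), rng.normal(size=2),
                                rng.uniform())
         for _ in range(1000))

# A - sigma I is positive semidefinite but singular: sigma = lambda_min(A)
# is the largest parameter for which f is sigma-strongly convex.
min_eig_shifted = np.linalg.eigvalsh(A - sigma * np.eye(2)).min()
```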
9.18.2.3. Coerciveness#
Theorem 9.269 (Strong convexity and coerciveness)
Assume that the ambient space $\mathbb{V}$ is Euclidean. Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper, closed and $\sigma$-strongly convex function. Then $f$ is coercive.
Proof. We proceed as follows.
- Define $g(x) = f(x) - \frac{\sigma}{2} \| x \|^2$.
- By Theorem 9.267, $g$ is convex.
- Since $f$ is proper, so is $g$; hence $g$ is subdifferentiable at some point.
- Specifically, fix some $y \in \operatorname{dom} f$ and $v \in \partial g(y)$.
- Then $f(x) = g(x) + \frac{\sigma}{2} \| x \|^2$ for every $x \in \mathbb{V}$.
- By subgradient inequality, for any $x \in \mathbb{V}$,
$$g(x) \geq g(y) + \langle v, x - y \rangle.$$
- Expanding $f$ in terms of $g$,
$$f(x) \geq g(y) + \langle v, x - y \rangle + \frac{\sigma}{2} \| x \|^2.$$
- Let $c = g(y) - \langle v, y \rangle$.
- Rearranging terms,
$$f(x) \geq \frac{\sigma}{2} \| x \|^2 + \langle v, x \rangle + c,$$
where $c$ is a constant independent of $x$.
- We note that the term $\frac{\sigma}{2} \| x \|^2$ grows faster than the linear term $\langle v, x \rangle$.
- By Cauchy-Schwarz inequality, $\langle v, x \rangle \geq - \| v \| \| x \|$.
- Hence
$$f(x) \geq \frac{\sigma}{2} \| x \|^2 - \| v \| \| x \| + c.$$
- It is easy to see that, the R.H.S. goes to $\infty$ as $\| x \| \to \infty$.
- Hence $f$ is coercive.
9.18.2.4. Sum Rule#
Theorem 9.270 (Sum of strongly convex and convex functions)
Let $f : \mathbb{V} \to (-\infty, \infty]$ be $\sigma$-strongly convex and let $g : \mathbb{V} \to (-\infty, \infty]$ be convex. Then $f + g$ is $\sigma$-strongly convex.
Proof. Since both $f$ and $g$ have convex domains, the domain $\operatorname{dom}(f + g) = \operatorname{dom} f \cap \operatorname{dom} g$ is convex.
We further need to show that $f + g$ satisfies the strong convexity inequality.
- Let $x, y \in \operatorname{dom}(f + g)$ and $\lambda \in [0, 1]$.
- Since $f$ is $\sigma$-strongly convex,
$$f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1-\lambda) f(y) - \frac{\sigma}{2} \lambda (1-\lambda) \| x - y \|^2.$$
- Since $g$ is convex,
$$g(\lambda x + (1-\lambda) y) \leq \lambda g(x) + (1-\lambda) g(y).$$
- Then, adding the two inequalities,
$$(f+g)(\lambda x + (1-\lambda) y) \leq \lambda (f+g)(x) + (1-\lambda) (f+g)(y) - \frac{\sigma}{2} \lambda (1-\lambda) \| x - y \|^2.$$
- Thus, $f + g$ is $\sigma$-strongly convex.
Example 9.83 (Strong convexity of $\frac{1}{2} \| \cdot \|^2 + \delta_C$)
Let $C$ be a nonempty convex subset of a Euclidean space $\mathbb{V}$.
- The function $\frac{1}{2} \| x \|^2$ is 1-strongly convex: by Theorem 9.267, subtracting $\frac{1}{2} \| x \|^2$ from it leaves the zero function, which is convex.
- Let $\delta_C$ be the indicator function of $C$.
- Then, the indicator function $\delta_C$ is convex since $C$ is convex.
- Due to Theorem 9.270, the function
$$f(x) = \frac{1}{2} \| x \|^2 + \delta_C(x)$$
is also 1-strongly convex.
9.18.2.5. First Order Characterization#
Recall that $\operatorname{dom}(\partial f) = \{ x \in \mathbb{V} \mid \partial f(x) \neq \emptyset \}$ denotes the set of points at which $f$ is subdifferentiable.
Theorem 9.271 (First order characterization of strong convexity)
Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper, closed and convex function. Let $\sigma > 0$. The following claims are equivalent:
1. $f$ is $\sigma$-strongly convex.
2. For every $x \in \operatorname{dom}(\partial f)$, $g \in \partial f(x)$ and $y \in \operatorname{dom} f$,
$$f(y) \geq f(x) + \langle g, y - x \rangle + \frac{\sigma}{2} \| y - x \|^2. \tag{9.10}$$
3. For any $x, y \in \operatorname{dom}(\partial f)$, $g_x \in \partial f(x)$ and $g_y \in \partial f(y)$,
$$\langle g_x - g_y, x - y \rangle \geq \sigma \| x - y \|^2.$$
Proof. We shall prove the equivalence of these statements in the following order.
(2) $\implies$ (1):
- We assume that (2) is true.
- Let $x, y \in \operatorname{dom} f$ and $\lambda \in (0, 1)$. We need to show that the strong convexity inequality holds for the point $x_\lambda = \lambda x + (1 - \lambda) y$.
- Since $\operatorname{dom} f$ is convex, $x_\lambda \in \operatorname{dom} f$.
- Choose some $\tilde{x} \in \operatorname{ri}(\operatorname{dom} f)$.
- Let $x_\alpha = (1 - \alpha) x + \alpha \tilde{x}$ and $y_\alpha = (1 - \alpha) y + \alpha \tilde{x}$ for $\alpha \in (0, 1)$.
- By the line segment property (Theorem 9.143), $x_\alpha, y_\alpha \in \operatorname{ri}(\operatorname{dom} f)$.
- Let $z_\alpha = \lambda x_\alpha + (1 - \lambda) y_\alpha$. Again, by the line segment property, $z_\alpha \in \operatorname{ri}(\operatorname{dom} f)$.
- Since $z_\alpha \in \operatorname{ri}(\operatorname{dom} f)$, the subdifferential $\partial f(z_\alpha)$ is nonempty.
- Take some $g \in \partial f(z_\alpha)$. By hypothesis (2),
$$f(x_\alpha) \geq f(z_\alpha) + \langle g, x_\alpha - z_\alpha \rangle + \frac{\sigma}{2} \| x_\alpha - z_\alpha \|^2.$$
- Substituting $x_\alpha - z_\alpha = (1 - \lambda)(x_\alpha - y_\alpha)$, we get
$$f(x_\alpha) \geq f(z_\alpha) + (1 - \lambda) \langle g, x_\alpha - y_\alpha \rangle + \frac{\sigma}{2} (1 - \lambda)^2 \| x_\alpha - y_\alpha \|^2.$$
- Similarly, by hypothesis (2), using $y_\alpha - z_\alpha = -\lambda (x_\alpha - y_\alpha)$,
$$f(y_\alpha) \geq f(z_\alpha) - \lambda \langle g, x_\alpha - y_\alpha \rangle + \frac{\sigma}{2} \lambda^2 \| x_\alpha - y_\alpha \|^2.$$
- Multiplying the first inequality by $\lambda$, the second by $(1 - \lambda)$, and adding, the inner product terms cancel. Thus,
$$\lambda f(x_\alpha) + (1 - \lambda) f(y_\alpha) \geq f(z_\alpha) + \frac{\sigma}{2} \lambda (1 - \lambda) \| x_\alpha - y_\alpha \|^2.$$
- The functions $\alpha \mapsto f(x_\alpha)$, $\alpha \mapsto f(y_\alpha)$ and $\alpha \mapsto f(z_\alpha)$ are one dimensional convex functions of $\alpha$. By Theorem 9.173 and the closedness of $f$, both endpoints behave well: $\lim_{\alpha \to 0^+} f(x_\alpha) = f(x)$ and $\lim_{\alpha \to 0^+} f(y_\alpha) = f(y)$, while $\liminf_{\alpha \to 0^+} f(z_\alpha) \geq f(x_\lambda)$.
- Therefore, taking the limit $\alpha \to 0^+$ in the previous inequality,
$$\lambda f(x) + (1 - \lambda) f(y) \geq f(x_\lambda) + \frac{\sigma}{2} \lambda (1 - \lambda) \| x - y \|^2.$$
- This establishes that $f$ is $\sigma$-strongly convex.
(1) $\implies$ (3):
- We are given that $f$ is $\sigma$-strongly convex.
- Let $x, y \in \operatorname{dom}(\partial f)$.
- Pick any $g_x \in \partial f(x)$ and $g_y \in \partial f(y)$.
- Let $\lambda \in (0, 1)$ and $x_\lambda = (1 - \lambda) x + \lambda y$. By the hypothesis,
$$f(x_\lambda) \leq (1 - \lambda) f(x) + \lambda f(y) - \frac{\sigma}{2} \lambda (1 - \lambda) \| x - y \|^2.$$
- This is same as
$$f(x_\lambda) - f(x) \leq \lambda \left( f(y) - f(x) \right) - \frac{\sigma}{2} \lambda (1 - \lambda) \| x - y \|^2.$$
- We can see that, by the subgradient inequality, $f(x_\lambda) - f(x) \geq \lambda \langle g_x, y - x \rangle$.
- Dividing both sides of inequality by $\lambda$ (which is positive), we get
$$\langle g_x, y - x \rangle \leq f(y) - f(x) - \frac{\sigma}{2} (1 - \lambda) \| x - y \|^2.$$
- This inequality holds for every $\lambda \in (0, 1)$.
- Taking the limit $\lambda \to 0^+$, we obtain
$$f(y) \geq f(x) + \langle g_x, y - x \rangle + \frac{\sigma}{2} \| x - y \|^2.$$
- An identical reasoning by switching the roles of $x$ and $y$ gives
$$f(x) \geq f(y) + \langle g_y, x - y \rangle + \frac{\sigma}{2} \| x - y \|^2.$$
- Adding these two inequalities gives us
$$0 \geq \langle g_x - g_y, y - x \rangle + \sigma \| x - y \|^2.$$
- Multiplying both sides by $-1$ and rearranging, we obtain $\langle g_x - g_y, x - y \rangle \geq \sigma \| x - y \|^2$, as desired.
(3) $\implies$ (2):
- We are given that (3) is satisfied.
- Let $x \in \operatorname{dom}(\partial f)$ and $g \in \partial f(x)$.
- Pick any $y \in \operatorname{dom} f$.
- Define $x_t = x + t (y - x)$ for $t \in [0, 1]$.
- By line segment property, $x_t \in \operatorname{dom} f$ for every $t \in [0, 1]$.
- Consider the 1D function $\varphi(t) = f(x_t)$, which is convex on $[0, 1]$.
- Pick any $t \in (0, 1)$. Due to (Theorem 9.217), $f$ is subdifferentiable at $x_t$.
- Take some $g_t \in \partial f(x_t)$. By subgradient inequality and property (3) applied to the pair $(x_t, x)$,
$$\langle g_t - g, x_t - x \rangle \geq \sigma \| x_t - x \|^2 = \sigma t^2 \| y - x \|^2.$$
- Since $x_t - x = t (y - x)$, this simplifies to, after canceling $t > 0$,
$$\langle g_t, y - x \rangle \geq \langle g, y - x \rangle + \sigma t \| y - x \|^2.$$
- Applying the mean value theorem (Theorem 9.234) to the 1D convex function $\varphi$,
$$f(y) - f(x) = \varphi(1) - \varphi(0) = \int_0^1 \varphi'(t) \, dt,$$
where $\varphi'(t) = \langle g_t, y - x \rangle$ for suitable subgradients $g_t \in \partial f(x_t)$ at almost every $t$.
- Applying the inequality to the integral above,
$$f(y) - f(x) \geq \int_0^1 \left( \langle g, y - x \rangle + \sigma t \| y - x \|^2 \right) dt.$$
- Integrating, we get
$$f(y) - f(x) \geq \langle g, y - x \rangle + \frac{\sigma}{2} \| y - x \|^2,$$
which is the desired result.
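Property (3), the strong monotonicity of the subgradient map, can be probed numerically even for a nonsmooth function. A small 1-D sketch with NumPy (`subgrad` is our own helper; the value 0.3 used at the kink is one arbitrary valid subgradient of $|x|$ at $0$):

```python
import numpy as np

rng = np.random.default_rng(4)

# f(x) = 0.5 x^2 + |x| is 1-strongly convex but not differentiable at 0.
def subgrad(x):
    # a subgradient of f at x: x + sign(x); any value in [-1, 1] works at 0
    return x + (np.sign(x) if x != 0 else 0.3)

def monotone_holds(x, y, sigma=1.0):
    # (g_x - g_y)(x - y) >= sigma (x - y)^2
    return (subgrad(x) - subgrad(y)) * (x - y) >= sigma * (x - y) ** 2 - 1e-12

ok = all(monotone_holds(rng.normal(), rng.normal()) for _ in range(1000))
```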
9.18.2.6. Minimization#
Theorem 9.272 (Existence and uniqueness of a minimizer of a closed strongly convex function)
Let $f : \mathbb{V} \to (-\infty, \infty]$ be a proper, closed and $\sigma$-strongly convex function. Then:
1. $f$ has a unique minimizer $a \in \operatorname{dom} f$.
2. The increase in the value of $f$ away from its minimizer satisfies
$$f(x) - f(a) \geq \frac{\sigma}{2} \| x - a \|^2 \quad \text{for every } x \in \operatorname{dom} f,$$
where $a$ is the unique minimizer of $f$.
Proof. (1) Existence of the minimizer
- Since $f$ is proper and convex, $\operatorname{ri}(\operatorname{dom} f)$ is nonempty.
- Pick some $\tilde{x} \in \operatorname{ri}(\operatorname{dom} f)$. By Theorem 9.214, $\partial f(\tilde{x})$ is nonempty.
- Pick some $g \in \partial f(\tilde{x})$.
- Then, by property 2 of Theorem 9.271,
$$f(x) \geq f(\tilde{x}) + \langle g, x - \tilde{x} \rangle + \frac{\sigma}{2} \| x - \tilde{x} \|^2$$
holds true for every $x \in \operatorname{dom} f$.
- By the generalized Cauchy-Schwarz inequality, $\langle g, x - \tilde{x} \rangle \geq - \| g \|_* \| x - \tilde{x} \|$.
- Therefore,
$$f(x) \geq f(\tilde{x}) - \| g \|_* \| x - \tilde{x} \| + \frac{\sigma}{2} \| x - \tilde{x} \|^2.$$
- This in turn is same as
$$f(x) \geq f(\tilde{x}) + \| x - \tilde{x} \| \left( \frac{\sigma}{2} \| x - \tilde{x} \| - \| g \|_* \right).$$
- Consider the sublevel set $S = \{ x \in \mathbb{V} \mid f(x) \leq f(\tilde{x}) \}$.
- Let $x \in S$. Then,
$$f(\tilde{x}) \geq f(x) \geq f(\tilde{x}) + \| x - \tilde{x} \| \left( \frac{\sigma}{2} \| x - \tilde{x} \| - \| g \|_* \right).$$
- This simplifies to
$$\| x - \tilde{x} \| \left( \frac{\sigma}{2} \| x - \tilde{x} \| - \| g \|_* \right) \leq 0.$$
- Thus, we require that $\frac{\sigma}{2} \| x - \tilde{x} \| \leq \| g \|_*$.
- In other words, $\| x - \tilde{x} \| \leq \frac{2 \| g \|_*}{\sigma}$ for every $x \in S$.
- Since this is valid for every $x \in S$, the sublevel set $S$ is bounded.
- Since $f$ is closed, its sublevel sets are closed. Thus, $S$ is compact.
- Since $\tilde{x} \in S$, the set $S$ is nonempty.
- Thus, the problem of minimizing $f$ over $\mathbb{V}$ reduces to minimizing $f$ over the nonempty compact set $S$.
- Since $f$ is closed, it is lower semicontinuous.
- By Theorem 3.121, a lower semicontinuous function attains its minimum over a nonempty compact set.
- Thus, we have established the existence of a minimizer of $f$.
(1) Uniqueness of the minimizer
- To show the uniqueness, for contradiction, assume that $a$ and $b$ are two distinct minimizers of $f$ with $f(a) = f(b) = p^*$.
- Let $c = \frac{a + b}{2}$. We must have $f(c) \geq p^*$.
- By strong convexity of $f$,
$$f(c) \leq \frac{1}{2} f(a) + \frac{1}{2} f(b) - \frac{\sigma}{8} \| a - b \|^2 = p^* - \frac{\sigma}{8} \| a - b \|^2.$$
- If $a \neq b$, then $f(c) < p^*$, a contradiction.
- Hence, the minimizer must be unique.
(2) Increase in the value of $f$ away from the minimizer
- Let $a$ be the unique minimizer of $f$.
- By Fermat’s optimality condition, $0 \in \partial f(a)$.
- Since $f$ is $\sigma$-strongly convex, property 2 of Theorem 9.271 applied with $x = a$ and $g = 0$ shows that
$$f(x) \geq f(a) + \frac{\sigma}{2} \| x - a \|^2$$
holds true for any $x \in \operatorname{dom} f$.
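The growth inequality in part (2) is easy to check for a strongly convex quadratic, whose minimizer is available in closed form. A minimal sketch with NumPy (the matrix, vector, and helper names are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)

A = np.array([[2.0, 0.5], [0.5, 1.0]])  # symmetric positive definite
b = np.array([1.0, -2.0])
sigma = np.linalg.eigvalsh(A).min()      # strong convexity parameter

def f(x):
    return 0.5 * x @ A @ x + b @ x

# The unique minimizer solves grad f(a) = A a + b = 0.
a = np.linalg.solve(A, -b)

def growth_holds(x):
    # f(x) - f(a) >= (sigma/2) ||x - a||^2
    return f(x) - f(a) >= 0.5 * sigma * np.linalg.norm(x - a) ** 2 - 1e-12

ok = all(growth_holds(rng.normal(size=2)) for _ in range(1000))
```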
9.18.3. Smoothness and Strong Convexity#
9.18.3.1. The Conjugate Correspondence Theorem#
The notions of smoothness and strong convexity are connected. Roughly speaking, a function is strongly convex if and only if its conjugate is smooth.
Theorem 9.273 (Conjugate correspondence theorem)
Let $\sigma > 0$. Then:
- If $f : \mathbb{V} \to \mathbb{R}$ is a $\frac{1}{\sigma}$-smooth convex function, then its conjugate $f^*$ is $\sigma$-strongly convex w.r.t. the dual norm $\| \cdot \|_*$.
- If $f : \mathbb{V} \to (-\infty, \infty]$ is a proper, closed and $\sigma$-strongly convex function, then its conjugate $f^* : \mathbb{V}^* \to \mathbb{R}$ is $\frac{1}{\sigma}$-smooth w.r.t. the dual norm $\| \cdot \|_*$.
Proof. (1) Smooth convex to strongly convex conjugate
- We are given that $f$ is convex and $\frac{1}{\sigma}$-smooth.
- Due to Theorem 9.239, $f^*$ is proper, closed and convex.
- Pick any $y_1, y_2 \in \operatorname{dom}(\partial f^*)$.
- Let $v_1 \in \partial f^*(y_1)$ and $v_2 \in \partial f^*(y_2)$.
- Since $f$ is proper, closed and convex, by the conjugate subgradient theorem, $v_i \in \partial f^*(y_i)$ holds if and only if $y_i \in \partial f(v_i)$.
- Since $f$ is differentiable, $\partial f(v_i) = \{ \nabla f(v_i) \}$; thus $y_i = \nabla f(v_i)$ for $i = 1, 2$.
- Following characterization of smoothness (Theorem 9.265), by its property 4, with $L = \frac{1}{\sigma}$,
$$\langle y_1 - y_2, v_1 - v_2 \rangle = \langle \nabla f(v_1) - \nabla f(v_2), v_1 - v_2 \rangle \geq \sigma \| y_1 - y_2 \|_*^2.$$
- Since the last inequality holds for any $y_1, y_2 \in \operatorname{dom}(\partial f^*)$ and any choice of subgradients $v_1, v_2$, property 3 of Theorem 9.271 shows that $f^*$ is $\sigma$-strongly convex w.r.t. $\| \cdot \|_*$.
(2) Strongly convex to smooth conjugate
- We are given that $f$ is proper, closed and $\sigma$-strongly convex.
- Pick any $y \in \mathbb{V}^*$. The conjugate is given by
$$f^*(y) = \sup_{x \in \mathbb{V}} \{ \langle y, x \rangle - f(x) \}.$$
- Define $h(x) = f(x) - \langle y, x \rangle$. We can see that $f^*(y) = - \inf_{x \in \mathbb{V}} h(x)$.
- Due to the sum rule (Theorem 9.270), $h$ is $\sigma$-strongly convex, being the sum of the $\sigma$-strongly convex $f$ and the linear (hence convex) function $x \mapsto - \langle y, x \rangle$. It is also proper and closed.
- Due to Theorem 9.272, $h$ has a unique minimizer. Hence the supremum defining $f^*(y)$ is attained at a unique point and $f^*(y) < \infty$.
- Since this is valid for any $y \in \mathbb{V}^*$, we have $\operatorname{dom} f^* = \mathbb{V}^*$. This justifies the signature $f^* : \mathbb{V}^* \to \mathbb{R}$.
- Let’s continue with any $y \in \mathbb{V}^*$ and let $v$ denote the unique maximizer of $\langle y, x \rangle - f(x)$.
- Now, by the second formulation of conjugate subgradient theorem (Corollary 9.26), $v \in \partial f^*(y)$.
- We can see that the maximizer being unique forces $\partial f^*(y)$ to be a singleton.
- Since $\partial f^*(y)$ is a singleton, due to Theorem 9.220, $f^*$ is differentiable at $y$ with $\nabla f^*(y) = v$.
- We now pick up two points $y_1, y_2 \in \mathbb{V}^*$ and let $v_1 = \nabla f^*(y_1)$, $v_2 = \nabla f^*(y_2)$.
- By conjugate subgradient theorem (Theorem 9.246), this is equivalent to $y_1 \in \partial f(v_1)$ and $y_2 \in \partial f(v_2)$.
- Following the first order characterization of strong convexity in Theorem 9.271, property 3,
$$\langle y_1 - y_2, v_1 - v_2 \rangle \geq \sigma \| v_1 - v_2 \|^2.$$
- In other words, the gradients of $f^*$ satisfy a strong monotonicity property.
- By generalized Cauchy-Schwarz inequality (Theorem 4.108),
$$\langle y_1 - y_2, v_1 - v_2 \rangle \leq \| y_1 - y_2 \|_* \| v_1 - v_2 \|.$$
- Thus the previous inequality simplifies to
$$\| \nabla f^*(y_1) - \nabla f^*(y_2) \| = \| v_1 - v_2 \| \leq \frac{1}{\sigma} \| y_1 - y_2 \|_*.$$
- This establishes that $f^*$ is $\frac{1}{\sigma}$-smooth w.r.t. the dual norm $\| \cdot \|_*$.
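The correspondence can be illustrated numerically with a 1-D strongly convex quadratic, whose conjugate is available in closed form. A brute-force sketch with NumPy (the parameter value and helper names are our own illustrative choices): $f(x) = \frac{\sigma}{2} x^2$ is $\sigma$-strongly convex, and its conjugate works out to $f^*(y) = \frac{y^2}{2\sigma}$, whose derivative $y/\sigma$ is Lipschitz with constant $\frac{1}{\sigma}$, i.e., $f^*$ is $\frac{1}{\sigma}$-smooth, as the theorem predicts.

```python
import numpy as np

sigma = 4.0
xs = np.linspace(-10.0, 10.0, 200001)  # fine grid for the brute-force sup

def conjugate(y):
    # f*(y) = sup_x { x y - (sigma/2) x^2 }, approximated over the grid
    return np.max(xs * y - 0.5 * sigma * xs ** 2)

ys = np.linspace(-3.0, 3.0, 25)
numeric = np.array([conjugate(y) for y in ys])
closed_form = ys ** 2 / (2 * sigma)  # y^2 / (2 sigma), a (1/sigma)-smooth function
max_err = np.abs(numeric - closed_form).max()
```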
9.18.4. Examples#
Example 9.84 (Smoothness of $\sqrt{1 + \| x \|_2^2}$)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by $f(x) = \sqrt{1 + \| x \|_2^2}$.
- Note that for any $x$, the gradient is $\nabla f(x) = \frac{x}{\sqrt{1 + \| x \|_2^2}}$.
- The Hessian is given by
$$\nabla^2 f(x) = \frac{I}{\sqrt{1 + \| x \|_2^2}} - \frac{x x^T}{(1 + \| x \|_2^2)^{3/2}}.$$
- Therefore, $\nabla^2 f(x) \preceq \frac{I}{\sqrt{1 + \| x \|_2^2}} \preceq I$.
- Hence, by Corollary 9.27, $f$ is 1-smooth w.r.t. the $\ell_2$ norm.
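The Hessian bound can be verified numerically at random points. A minimal sketch with NumPy (the helper name is ours):

```python
import numpy as np

rng = np.random.default_rng(6)

def hessian(x):
    # Hessian of f(x) = sqrt(1 + ||x||^2)
    s = np.sqrt(1.0 + x @ x)
    return np.eye(x.size) / s - np.outer(x, x) / s ** 3

# The spectral norm of the Hessian never exceeds 1, consistent with
# f being 1-smooth w.r.t. the l2 norm (Corollary 9.27).
ok = all(np.linalg.norm(hessian(rng.normal(size=3)), 2) <= 1.0 + 1e-12
         for _ in range(1000))
```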
9.18.4.1. Log-Sum-Exp#
Example 9.85 (Smoothness of log-sum-exp)
Consider the log-sum-exp function $f : \mathbb{R}^n \to \mathbb{R}$ given by
$$f(x) = \ln \left( \sum_{i=1}^n e^{x_i} \right).$$
Smoothness w.r.t. the $\ell_2$ norm
- The partial derivatives of $f$ are
$$\frac{\partial f}{\partial x_i}(x) = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}.$$
- The second order partial derivatives are
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x) =
\begin{cases}
- \dfrac{e^{x_i} e^{x_j}}{\left( \sum_{k=1}^n e^{x_k} \right)^2}, & i \neq j, \\[2ex]
\dfrac{e^{x_i}}{\sum_{k=1}^n e^{x_k}} - \dfrac{e^{2 x_i}}{\left( \sum_{k=1}^n e^{x_k} \right)^2}, & i = j.
\end{cases}$$
- The Hessian can be written as
$$\nabla^2 f(x) = \operatorname{diag}(p) - p p^T,$$
where $p_i = \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}$ is a probability vector.
- We can now see that for any $d \in \mathbb{R}^n$,
$$d^T \nabla^2 f(x) d = \sum_{i=1}^n p_i d_i^2 - \left( \sum_{i=1}^n p_i d_i \right)^2 \leq \sum_{i=1}^n p_i d_i^2 \leq \| d \|_2^2.$$
- Hence $\nabla^2 f(x) \preceq I$.
- Hence, by Corollary 9.27, $f$ is 1-smooth w.r.t. the $\ell_2$ norm.
Smoothness w.r.t. the $\ell_\infty$ norm
- We first show that for any $x, d \in \mathbb{R}^n$,
$$d^T \nabla^2 f(x) d \leq \| d \|_\infty^2.$$
- To see this, we expand the L.H.S. as
$$d^T \nabla^2 f(x) d = \sum_{i=1}^n p_i d_i^2 - \left( \sum_{i=1}^n p_i d_i \right)^2 \leq \sum_{i=1}^n p_i d_i^2.$$
- Since $p$ is a probability vector, $\sum_{i=1}^n p_i d_i^2 \leq \| d \|_\infty^2 \sum_{i=1}^n p_i = \| d \|_\infty^2$.
- Let $x, y \in \mathbb{R}^n$. Then from above, by the quadratic approximation theorem, there exists a point $z$ on the line segment $[x, y]$ such that
$$f(y) = f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} (y - x)^T \nabla^2 f(z) (y - x).$$
- Putting the bound back in the approximation, we have
$$f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2} \| y - x \|_\infty^2.$$
- Following characterization of smoothness (Theorem 9.265), by its property 2, $f$ is 1-smooth w.r.t. the $\ell_\infty$ norm.
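The Hessian structure $\operatorname{diag}(p) - p p^T$ and its spectral bound can be checked numerically. A minimal sketch with NumPy (the helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(7)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def lse_hessian(x):
    # Hessian of log-sum-exp: diag(p) - p p^T with p = softmax(x)
    p = softmax(x)
    return np.diag(p) - np.outer(p, p)

# The spectral norm of diag(p) - p p^T is at most 1, consistent with
# 1-smoothness of log-sum-exp w.r.t. the l2 norm.
ok = all(np.linalg.norm(lse_hessian(rng.normal(size=4)), 2) <= 1.0 + 1e-12
         for _ in range(1000))
```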
9.18.4.2. Negative Entropy#
Example 9.86 (Strong convexity of negative entropy over the unit simplex)
Let $f : \mathbb{R}^n \to (-\infty, \infty]$ be the restriction of the negative entropy function to the unit simplex $\Delta_n$:
$$f(x) = \begin{cases} \sum_{i=1}^n x_i \ln x_i, & x \in \Delta_n, \\ \infty, & \text{otherwise}. \end{cases}$$
- By Theorem 9.257, its conjugate is given by
$$f^*(y) = \ln \left( \sum_{i=1}^n e^{y_i} \right),$$
which is the log sum exp function.
- By Example 9.85, the log-sum-exp function is 1-smooth w.r.t. both the $\ell_2$ and $\ell_\infty$ norms.
- Hence by conjugate correspondence theorem Theorem 9.273, $f$ is 1-strongly convex w.r.t. both the $\ell_2$ norm and the $\ell_1$ norm (the dual of the $\ell_\infty$ norm).
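The 1-strong convexity of negative entropy w.r.t. the $\ell_1$ norm can be probed numerically on random simplex points. A minimal sketch with NumPy (the helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(8)

def neg_entropy(x):
    return np.sum(x * np.log(x))

def random_simplex_point(n=4):
    # strictly positive point on the unit simplex
    z = rng.exponential(size=n) + 1e-6
    return z / z.sum()

def strong_convexity_holds(x, y, lam):
    # f(lam x + (1-lam) y) <= lam f(x) + (1-lam) f(y)
    #                         - (1/2) lam (1-lam) ||x - y||_1^2
    lhs = neg_entropy(lam * x + (1 - lam) * y)
    rhs = (lam * neg_entropy(x) + (1 - lam) * neg_entropy(y)
           - 0.5 * lam * (1 - lam) * np.linalg.norm(x - y, 1) ** 2)
    return lhs <= rhs + 1e-12

ok = all(strong_convexity_holds(random_simplex_point(),
                                random_simplex_point(),
                                rng.uniform())
         for _ in range(1000))
```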