5.1. Differentiation

We consider functions from $\mathbb{R}^n$ to $\mathbb{R}^m$.
5.1.1. Differentiability and Jacobian
(Differentiability at a point)

Let $f : \mathbb{R}^n \to \mathbb{R}^m$. We say that $f$ is differentiable at a point $x \in \operatorname{int} \operatorname{dom} f$ if there exists a matrix $Df(x) \in \mathbb{R}^{m \times n}$ that satisfies

$$
\lim_{\substack{z \to x \\ z \neq x}}
\frac{\| f(z) - f(x) - Df(x)(z - x) \|_2}{\| z - x \|_2} = 0.
$$

Such a matrix $Df(x)$ is called the derivative (or Jacobian) of $f$ at $x$.

There can be at most one matrix satisfying this limit, so the derivative, when it exists, is unique.

If we write $f$ in terms of its components as $f(x) = (f_1(x), \dots, f_m(x))$, then the entries of the derivative are the partial derivatives

$$
Df(x)_{ij} = \frac{\partial f_i(x)}{\partial x_j},
\quad i = 1, \dots, m, \; j = 1, \dots, n.
$$

The matrix $Df(x)$ is also known as the Jacobian matrix of $f$ at $x$.

The Jacobian $Df(x)$

- is an $m \times n$ real matrix;
- partial derivatives of each component of $f$ (i.e., $f_i$) line up on the $i$-th row;
- partial derivatives for one coordinate $x_j$ line up on the $j$-th column;
- if $f$ is single valued (i.e., $m = 1$), then the Jacobian is a row vector.
(Jacobian of identity function)

Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be the identity function $f(x) = x$. Then, for every component,

$$
\frac{\partial f_i(x)}{\partial x_j}
= \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise}. \end{cases}
$$

Thus,

$$
Df(x) = I_n,
$$

the $n \times n$ identity matrix.
(Jacobian of linear transformation)

Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be the linear transformation $f(x) = A x$, where $A \in \mathbb{R}^{m \times n}$. The components are $f_i(x) = \sum_{j=1}^n a_{ij} x_j$. Then,

$$
\frac{\partial f_i(x)}{\partial x_j} = a_{ij}.
$$

Thus,

$$
Df(x) = A.
$$
(Jacobian of affine transformation)

Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be the affine transformation $f(x) = A x + b$, where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then,

$$
\frac{\partial f_i(x)}{\partial x_j} = a_{ij}.
$$

Thus,

$$
Df(x) = A.
$$

The constant vector $b$ makes no contribution to the derivative.
(Differentiable function)
A function $f$ is differentiable if $\operatorname{dom} f$ is open and $f$ is differentiable at every point of its domain.
(First order approximation)
The affine function given by:

$$
\hat{f}(z) = f(x) + Df(x)(z - x)
$$

is called the first order approximation of $f$ at $x$ (provided $f$ is differentiable at $x$).
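As a quick numerical sanity check, the following minimal sketch (assuming NumPy is available; the helper `jacobian_fd` and the random test data `A`, `b` are ours for illustration, not from the text) estimates a Jacobian by central differences and confirms that the Jacobian of an affine map $f(x) = Ax + b$ is $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))   # test data for f(x) = A x + b
b = rng.standard_normal(3)

def f(x):
    return A @ x + b

def jacobian_fd(fun, x, h=1e-6):
    """Estimate Df(x) column by column with central differences."""
    fx = fun(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = h
        J[:, j] = (fun(x + e) - fun(x - e)) / (2 * h)
    return J

x = rng.standard_normal(4)
print(np.allclose(jacobian_fd(f, x), A))  # True: Df(x) = A everywhere
```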
5.1.2. Real Valued Functions

The rest of this section focuses mostly on real valued functions of type $f : \mathbb{R}^n \to \mathbb{R}$.

The first order derivative of a real valued function is called the gradient.

The second order derivative of a real valued function is called the Hessian.

We consider first order and second order approximations of a real valued function.
5.1.3. Gradient
(Gradient)
When $f : \mathbb{R}^n \to \mathbb{R}$ is a real valued function, the derivative $Df(x)$ is a $1 \times n$ matrix. The gradient of $f$ at $x$ is its transpose:

$$
\nabla f(x) = Df(x)^T,
$$

provided $f$ is differentiable at $x$.

For real valued functions, the derivative is a row vector but the gradient is a column vector.

The components of the gradient are given by the partial derivatives as:

$$
\nabla f(x)_i = \frac{\partial f(x)}{\partial x_i}, \quad i = 1, \dots, n.
$$
(Gradient of linear functional)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the linear functional $f(x) = a^T x$, where $a \in \mathbb{R}^n$. We can expand it as:

$$
f(x) = \sum_{i=1}^n a_i x_i.
$$

Computing the partial derivative with respect to $x_i$, we get:

$$
\frac{\partial f(x)}{\partial x_i} = a_i.
$$

Putting the partial derivatives together, we get:

$$
\nabla f(x) = a.
$$
(Gradient of affine functional)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the affine functional $f(x) = a^T x + b$, where $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$. We can expand it as:

$$
f(x) = \sum_{i=1}^n a_i x_i + b.
$$

Computing the partial derivative with respect to $x_i$, we get:

$$
\frac{\partial f(x)}{\partial x_i} = a_i.
$$

Putting the partial derivatives together, we get:

$$
\nabla f(x) = a.
$$

The intercept $b$ is a constant and makes no contribution to the gradient.
(Gradient of quadratic form)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the quadratic form $f(x) = x^T A x$, where $A \in \mathbb{R}^{n \times n}$. We can expand it as:

$$
f(x) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j.
$$

Note that the diagonal elements $a_{kk}$ contribute terms $a_{kk} x_k^2$ that are quadratic in a single coordinate, while the off-diagonal elements contribute cross terms $a_{ij} x_i x_j$ with $i \neq j$ that are linear in each coordinate.

There are $n$ quadratic terms and $n^2 - n$ cross terms.

Taking the partial derivative w.r.t. $x_k$:

$$
\frac{\partial f(x)}{\partial x_k}
= 2 a_{kk} x_k + \sum_{i \neq k} a_{ik} x_i + \sum_{j \neq k} a_{kj} x_j.
$$

- The first term comes from the $a_{kk} x_k^2$ term that is quadratic in $x_k$.
- The first sum comes from the cross terms where $j = k$ and $i$ ranges over all indices except $i = k$.
- The second sum comes from the cross terms where $i = k$ and $j$ ranges over all indices except $j = k$.
- There are $n - 1$ terms in each of the two sums.
- We can split the $2 a_{kk} x_k$ term and move one $a_{kk} x_k$ into each sum to simplify the partial derivative as:

$$
\frac{\partial f(x)}{\partial x_k}
= \sum_{i=1}^n a_{ik} x_i + \sum_{j=1}^n a_{kj} x_j.
$$

Note that the first sum is the inner product of the $k$-th column of $A$ with $x$; i.e., the $k$-th component of $A^T x$.

Similarly, the second sum is the inner product of the $k$-th row of $A$ with $x$; i.e., the $k$-th component of $A x$.

Thus,

$$
\frac{\partial f(x)}{\partial x_k} = (A^T x)_k + (A x)_k.
$$

Putting together the partial derivatives, we obtain:

$$
\nabla f(x) = (A + A^T) x.
$$

If $A$ is symmetric, this simplifies to $\nabla f(x) = 2 A x$.
(Gradient of squared $\ell_2$ norm)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by $f(x) = \| x \|_2^2$. We can write this as

$$
f(x) = x^T I x,
$$

where $I$ is the $n \times n$ identity matrix, which is symmetric. Following Example 5.6,

$$
\nabla f(x) = 2 I x = 2 x.
$$
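Both gradients are easy to verify numerically. A minimal sketch (assuming NumPy; the helper `grad_fd` and the random test data are ours for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))   # a general, non-symmetric matrix
x = rng.standard_normal(5)

def grad_fd(fun, x, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (fun(x + e) - fun(x - e)) / (2 * h)
    return g

quad = lambda v: v @ A @ v                           # f(x) = x^T A x
print(np.allclose(grad_fd(quad, x), (A + A.T) @ x))  # gradient is (A + A^T) x

sqnorm = lambda v: v @ v                             # f(x) = ||x||_2^2
print(np.allclose(grad_fd(sqnorm, x), 2 * x))        # gradient is 2x
```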
(Gradient of quadratic functional)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the quadratic functional

$$
f(x) = \frac{1}{2} x^T P x + q^T x + r,
$$

where $P \in \mathbb{S}^n$ is a symmetric matrix, $q \in \mathbb{R}^n$ and $r \in \mathbb{R}$.

We can compute the gradient as follows:

$$
\nabla f(x)
= \nabla \left( \frac{1}{2} x^T P x \right) + \nabla (q^T x) + \nabla r
= \frac{1}{2} (P + P^T) x + q
= P x + q.
$$

- We took advantage of the fact that the gradient operation commutes with scalar multiplication and distributes over vector addition.
- Since $r$ is a constant, it has no contribution to the derivative.
- We reused the results from the previous examples for the quadratic and linear terms.
- We utilized the fact that $\frac{1}{2}(P + P^T) = P$ since $P$ is symmetric.

In summary:

$$
\nabla f(x) = P x + q.
$$

The derivative of $f$ is the row vector

$$
Df(x) = \nabla f(x)^T = x^T P + q^T.
$$
(Gradient mapping)
If a real valued function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable, the gradient mapping of $f$ is the function $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$ with $\operatorname{dom} \nabla f = \operatorname{dom} f$, given by $x \mapsto \nabla f(x)$.
5.1.4. Continuous Differentiability

(Continuously differentiable real valued function)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$.

If all the partial derivatives of $f$ exist and are continuous at every point of an open set $U \subseteq S$, then $f$ is said to be continuously differentiable over $U$.

If $S$ itself is open and $f$ is continuously differentiable over $S$, then $f$ is simply said to be continuously differentiable.
5.1.5. First Order Approximation

(First order approximation of real valued functions)

The affine function given by:

$$
\hat{f}(z) = f(x) + \nabla f(x)^T (z - x)
$$

is the first order approximation of a real valued function $f$ at $x$ (provided $f$ is differentiable at $x$).
(First order approximation accuracy)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be continuously differentiable over an open set $U \subseteq \operatorname{dom} f$. Then, for every $x \in U$,

$$
\lim_{d \to 0} \frac{f(x + d) - f(x) - \nabla f(x)^T d}{\| d \|} = 0.
$$

Another way to write this result is:

$$
f(x + d) = f(x) + \nabla f(x)^T d + o(\| d \|),
$$

where $o(\cdot) : \mathbb{R}_{+} \to \mathbb{R}$ is a one dimensional function satisfying $\frac{o(t)}{t} \to 0$ as $t \to 0^+$.
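The $o(\| d \|)$ behaviour can be observed numerically. In this sketch (assuming NumPy; the smooth test function and its gradient were chosen by us for illustration), the error of the first order approximation shrinks faster than $\| d \|$:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(3)
d = rng.standard_normal(3)

f = lambda v: v[0] ** 2 * v[1] + np.sin(v[2])   # smooth test function
grad = lambda v: np.array([2 * v[0] * v[1], v[0] ** 2, np.cos(v[2])])

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    err = abs(f(x + t * d) - f(x) - grad(x) @ (t * d))
    print(f"t = {t:.0e}  error / ||d|| = {err / np.linalg.norm(t * d):.2e}")
# The printed ratio decays roughly linearly in t, i.e. the error is o(||t d||).
```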
5.1.6. Chain Rule

(Chain rule)

Suppose $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x \in \operatorname{int} \operatorname{dom} f$ and $g : \mathbb{R}^m \to \mathbb{R}^p$ is differentiable at $f(x) \in \operatorname{int} \operatorname{dom} g$. Define the composition $h : \mathbb{R}^n \to \mathbb{R}^p$ as $h(x) = g(f(x))$. Then, $h$ is differentiable at $x$, with

$$
Dh(x) = Dg(f(x)) \, Df(x).
$$
Notice how the derivative lines up as a simple matrix multiplication.
(Chain rule for real valued functions)
Suppose $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x$ and $g : \mathbb{R}^m \to \mathbb{R}$ is differentiable at $f(x)$. Let $h = g \circ f$. Then,

$$
\nabla h(x) = Dh(x)^T = \left( Dg(f(x)) \, Df(x) \right)^T = Df(x)^T \, \nabla g(f(x)).
$$
(Gradient of log-sum-exp)
Let $g : \mathbb{R}^n \to \mathbb{R}$ be given by

$$
g(x) = \ln \left( \sum_{i=1}^n e^{x_i} \right)
$$

with $\operatorname{dom} g = \mathbb{R}^n$.

Let $f(x) = \sum_{i=1}^n e^{x_i}$, so that $g(x) = \ln f(x)$. Then, we can see that

$$
\frac{\partial g(x)}{\partial x_i}
= \frac{1}{f(x)} \frac{\partial f(x)}{\partial x_i}
= \frac{e^{x_i}}{\sum_{j=1}^n e^{x_j}}.
$$

Thus,

$$
\nabla g(x) = \frac{1}{\sum_{j=1}^n e^{x_j}}
\begin{pmatrix} e^{x_1} \\ \vdots \\ e^{x_n} \end{pmatrix}.
$$

Now, if we define $z = (e^{x_1}, \dots, e^{x_n})$, then, we see that:

$$
\sum_{j=1}^n e^{x_j} = \mathbf{1}^T z.
$$

Using this notation:

$$
\nabla g(x) = \frac{1}{\mathbf{1}^T z} z.
$$
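In code, $\nabla g$ is the familiar softmax function. A minimal sketch (assuming NumPy; shifting by $\max(x)$ is a standard overflow guard and cancels in the ratio, so it does not change the result):

```python
import numpy as np

def logsumexp_grad(x):
    """Gradient of g(x) = ln(sum_i exp(x_i)), i.e. z / (1^T z)."""
    z = np.exp(x - np.max(x))   # the shift cancels in the ratio
    return z / z.sum()

g = lambda x: np.log(np.sum(np.exp(x)))
x = np.array([1.0, 2.0, 3.0])

# Compare against central differences.
h = 1e-6
fd = np.array([(g(x + h * e) - g(x - h * e)) / (2 * h) for e in np.eye(x.size)])
print(np.allclose(fd, logsumexp_grad(x)))   # True
print(logsumexp_grad(x).sum())              # components sum to 1
```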
Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be the affine function $f(x) = A x + b$ with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Let $g : \mathbb{R}^m \to \mathbb{R}$ be given by $g(y) = \| y \|_2^2$. Let $h = g \circ f$; i.e., $h(x) = \| A x + b \|_2^2$. Then, we can see that $Df(x) = A$, and (from Example 5.7)

$$
\nabla g(y) = 2 y.
$$

Thus, for every $x \in \mathbb{R}^n$:

$$
\nabla h(x) = Df(x)^T \nabla g(f(x)) = 2 A^T (A x + b).
$$

The gradient of $h$ is an affine function of $x$.
(Chain rule for composition with affine function)
Suppose $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable. Let $A \in \mathbb{R}^{n \times p}$ and $b \in \mathbb{R}^n$. Define $g : \mathbb{R}^p \to \mathbb{R}^m$ as

$$
g(x) = f(A x + b)
$$

with $\operatorname{dom} g = \{ x \mid A x + b \in \operatorname{dom} f \}$. The derivative of $g$ at $x \in \operatorname{int} \operatorname{dom} g$ is

$$
Dg(x) = Df(A x + b) \, A.
$$

If $f$ is real valued (i.e., $m = 1$), then the gradient of $g$ is

$$
\nabla g(x) = A^T \nabla f(A x + b).
$$
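A numerical check of this gradient formula (a sketch assuming NumPy; we take $f$ to be the log-sum-exp function from the example above, with random test data $A$, $b$ of our choosing):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
b = rng.standard_normal(4)

f = lambda y: np.log(np.sum(np.exp(y)))             # f : R^4 -> R
grad_f = lambda y: np.exp(y) / np.sum(np.exp(y))    # gradient of log-sum-exp

g = lambda x: f(A @ x + b)                          # g(x) = f(Ax + b)

x = rng.standard_normal(3)
h = 1e-6
fd = np.array([(g(x + h * e) - g(x - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(fd, A.T @ grad_f(A @ x + b)))     # gradient = A^T grad f(Ax + b)
```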
(Chain rule for restriction on a line)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a differentiable real valued function. Consider its restriction on a line, $g : \mathbb{R} \to \mathbb{R}$, given by

$$
g(t) = f(x + t v),
$$

where $x, v \in \mathbb{R}^n$. If we define

$$
h(t) = x + t v,
$$

we can see that:

$$
g = f \circ h \quad \text{and} \quad Dh(t) = v.
$$

By chain rule:

$$
g'(t) = Df(h(t)) \, Dh(t) = \nabla f(x + t v)^T v.
$$

In particular, if $v = y - x$ for some $y \in \mathbb{R}^n$, then

$$
g'(t) = \nabla f(x + t (y - x))^T (y - x).
$$
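A quick check of $g'(t) = \nabla f(x + t v)^T v$ on a quadratic functional (a sketch assuming NumPy; the data $P$, $q$, $x$, $v$ are random illustrations of ours):

```python
import numpy as np

rng = np.random.default_rng(4)
P = rng.standard_normal((3, 3))
P = P + P.T                                    # make P symmetric
q = rng.standard_normal(3)

f = lambda v: 0.5 * v @ P @ v + q @ v          # quadratic functional
grad_f = lambda v: P @ v + q                   # its gradient P x + q

x = rng.standard_normal(3)
v = rng.standard_normal(3)
g = lambda t: f(x + t * v)                     # restriction of f on a line

t, h = 0.7, 1e-6
lhs = (g(t + h) - g(t - h)) / (2 * h)          # finite-difference g'(t)
rhs = grad_f(x + t * v) @ v                    # chain-rule formula
print(np.isclose(lhs, rhs))                    # True
```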
5.1.7. Hessian

In this section, we review the second derivative of a real valued function $f : \mathbb{R}^n \to \mathbb{R}$.
(Hessian)
The second derivative or Hessian matrix of $f$ at $x \in \operatorname{int} \operatorname{dom} f$, denoted by $\nabla^2 f(x)$, is given by

$$
\nabla^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j},
\quad i = 1, \dots, n, \; j = 1, \dots, n,
$$

provided $f$ is twice differentiable at $x$.
(Hessian of linear functional)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the linear functional $f(x) = a^T x$, where $a \in \mathbb{R}^n$. We can expand it as:

$$
f(x) = \sum_{i=1}^n a_i x_i.
$$

Computing the partial derivative with respect to $x_i$, we get:

$$
\frac{\partial f(x)}{\partial x_i} = a_i.
$$

If we further compute the partial derivative w.r.t. $x_j$, we get:

$$
\frac{\partial^2 f(x)}{\partial x_j \partial x_i} = 0.
$$

Thus, the Hessian is the $n \times n$ zero matrix:

$$
\nabla^2 f(x) = O.
$$
The Hessian is the derivative of the gradient mapping: $\nabla^2 f(x) = D \nabla f(x)$.
(Hessian of quadratic form)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the quadratic form $f(x) = x^T A x$, where $A \in \mathbb{R}^{n \times n}$.

Recall from Example 5.6 that:

$$
\nabla f(x) = (A + A^T) x.
$$

Also recall from Example 5.2 that the derivative of a linear map $x \mapsto C x$ is $C$ for all $x$.

Thus, using Theorem 5.3,

$$
\nabla^2 f(x) = D \nabla f(x) = A + A^T.
$$

If $A$ is symmetric, then $\nabla^2 f(x) = 2 A$.
(Hessian of log-sum-exp)
Let $g : \mathbb{R}^n \to \mathbb{R}$ be given by

$$
g(x) = \ln \left( \sum_{i=1}^n e^{x_i} \right)
$$

with $\operatorname{dom} g = \mathbb{R}^n$.

Define $z = (e^{x_1}, \dots, e^{x_n})$; then, we see that:

$$
\nabla g(x) = \frac{1}{\mathbf{1}^T z} z.
$$

Using this notation, we have:

$$
\frac{\partial g(x)}{\partial x_i} = \frac{z_i}{\mathbf{1}^T z}.
$$

Proceeding to compute the second derivatives:

$$
\frac{\partial^2 g(x)}{\partial x_j \partial x_i}
= \begin{cases}
\dfrac{z_i}{\mathbf{1}^T z} - \dfrac{z_i^2}{(\mathbf{1}^T z)^2} & \text{if } i = j, \\[2ex]
- \dfrac{z_i z_j}{(\mathbf{1}^T z)^2} & \text{if } i \neq j.
\end{cases}
$$

Now, note that both cases share the rank one term $-\frac{z_i z_j}{(\mathbf{1}^T z)^2}$, while the $i = j$ case adds the diagonal term $\frac{z_i}{\mathbf{1}^T z}$.

Thus,

$$
\nabla^2 g(x) = \frac{1}{\mathbf{1}^T z} \operatorname{diag}(z) - \frac{1}{(\mathbf{1}^T z)^2} z z^T.
$$

Alternatively,

$$
\nabla^2 g(x) = \frac{1}{(\mathbf{1}^T z)^2}
\left( (\mathbf{1}^T z) \operatorname{diag}(z) - z z^T \right).
$$
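The Hessian formula is easy to validate against a central-difference estimate; a minimal sketch (assuming NumPy; `lse_hessian` is our helper, with the usual $\max(x)$ shift as an overflow guard):

```python
import numpy as np

def lse_hessian(x):
    """Hessian of g(x) = ln(sum exp x_i): diag(z)/(1^T z) - z z^T/(1^T z)^2."""
    z = np.exp(x - np.max(x))    # overflow guard; cancels in both terms
    s = z.sum()
    return np.diag(z) / s - np.outer(z, z) / s ** 2

g = lambda x: np.log(np.sum(np.exp(x)))
x = np.array([0.5, -1.0, 2.0])
n, h = x.size, 1e-4

# Central-difference Hessian for comparison.
H = np.zeros((n, n))
I = np.eye(n)
for i in range(n):
    for j in range(n):
        H[i, j] = (g(x + h * I[i] + h * I[j]) - g(x + h * I[i] - h * I[j])
                   - g(x - h * I[i] + h * I[j]) + g(x - h * I[i] - h * I[j])) / (4 * h * h)

print(np.allclose(H, lse_hessian(x), atol=1e-6))   # True
```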
(Derivatives for least squares cost function)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be the least squares cost function

$$
f(x) = \frac{1}{2} \| A x - b \|_2^2,
$$

where $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Expanding it, we get:

$$
f(x) = \frac{1}{2} x^T A^T A x - b^T A x + \frac{1}{2} b^T b.
$$

Note that $A^T A$ is symmetric. Using the previous examples, the gradient is:

$$
\nabla f(x) = A^T A x - A^T b = A^T (A x - b).
$$

And the Hessian is:

$$
\nabla^2 f(x) = A^T A.
$$
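Both formulas can be confirmed by finite differences; a minimal sketch (assuming NumPy; the random problem data are ours for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x = rng.standard_normal(3)

f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)   # least squares cost
grad = lambda x: A.T @ (A @ x - b)             # claimed gradient

h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(fd, grad(x)))                # gradient matches A^T (Ax - b)

# The Hessian is the (constant) derivative of the gradient mapping.
Hfd = np.column_stack([(grad(x + h * e) - grad(x - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(Hfd, A.T @ A))               # Hessian matches A^T A
```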
(Derivatives for quadratic over linear function)
Let $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be the quadratic over linear function

$$
f(x, y) = \frac{x^2}{y}
$$

with $\operatorname{dom} f = \{ (x, y) \mid y > 0 \}$.

The gradient is obtained by computing the partial derivatives w.r.t. $x$ and $y$:

$$
\nabla f(x, y) = \begin{pmatrix} \dfrac{2x}{y} \\[2ex] -\dfrac{x^2}{y^2} \end{pmatrix}.
$$

The Hessian is obtained by computing the second order partial derivatives:

$$
\nabla^2 f(x, y) = \frac{2}{y^3}
\begin{pmatrix} y^2 & -xy \\ -xy & x^2 \end{pmatrix}.
$$
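A short numerical verification of both formulas (a sketch assuming NumPy; the evaluation point is ours):

```python
import numpy as np

f = lambda x, y: x ** 2 / y
grad = lambda x, y: np.array([2 * x / y, -x ** 2 / y ** 2])
hess = lambda x, y: (2.0 / y ** 3) * np.array([[y * y, -x * y],
                                               [-x * y, x * x]])

x, y, h = 1.5, 2.0, 1e-6

fd_grad = np.array([(f(x + h, y) - f(x - h, y)) / (2 * h),
                    (f(x, y + h) - f(x, y - h)) / (2 * h)])
print(np.allclose(fd_grad, grad(x, y)))        # True

# Differentiate the gradient once more to estimate the Hessian.
fd_hess = np.column_stack([(grad(x + h, y) - grad(x - h, y)) / (2 * h),
                           (grad(x, y + h) - grad(x, y - h)) / (2 * h)])
print(np.allclose(fd_hess, hess(x, y)))        # True
```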
5.1.8. Twice Continuous Differentiability

(Twice continuously differentiable real valued function)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$.

If all the second order partial derivatives of $f$ exist and are continuous at every point of an open set $U \subseteq S$, then $f$ is said to be twice continuously differentiable over $U$.

If $S$ itself is open and $f$ is twice continuously differentiable over $S$, then $f$ is simply said to be twice continuously differentiable.
(Symmetry of Hessian)
If $f : \mathbb{R}^n \to \mathbb{R}$ is twice continuously differentiable at a point $x$, then its Hessian at $x$ is a symmetric matrix; i.e.,

$$
\frac{\partial^2 f(x)}{\partial x_i \partial x_j} = \frac{\partial^2 f(x)}{\partial x_j \partial x_i}
\quad \text{for all } i, j.
$$
5.1.9. Second Order Approximation

(Linear approximation theorem)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable over an open set $U \subseteq \operatorname{dom} f$, and let $x \in U$ and $r > 0$ be such that the ball $B(x, r) \subseteq U$. Then, for any $y \in B(x, r)$, there exists a point $\xi$ on the line segment $[x, y]$ such that

$$
f(y) = f(x) + \nabla f(x)^T (y - x) + \frac{1}{2} (y - x)^T \nabla^2 f(\xi) (y - x).
$$
(Quadratic approximation theorem)
Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable over an open set $U \subseteq \operatorname{dom} f$, and let $x \in U$. Then, as $y \to x$,

$$
f(y) = f(x) + \nabla f(x)^T (y - x) + \frac{1}{2} (y - x)^T \nabla^2 f(x) (y - x) + o(\| y - x \|^2).
$$
(Second order approximation)
The second order approximation of $f$ at or near $x$ is the quadratic function

$$
\hat{f}(y) = f(x) + \nabla f(x)^T (y - x) + \frac{1}{2} (y - x)^T \nabla^2 f(x) (y - x).
$$
5.1.10. Smoothness

5.1.10.1. Real Functions
(Class of continuous functions)
The class of continuous real functions, denoted by $C$ or $C^0$, consists of all continuous functions $f : \mathbb{R} \to \mathbb{R}$.

Let $n \in \mathbb{N}$. Then, we say that $f \in C^n$ if $f$ is $n$ times differentiable and its $n$-th derivative $f^{(n)}$ is continuous.

In other words, the superscript indicates the order of continuous differentiability:

- $C^0$ consists of the class of continuous real functions.
- $C^1$ consists of the class of continuously differentiable functions.
- $C^{\infty}$ consists of the class of smooth functions, which are infinitely differentiable.
5.1.10.2. Real Valued Functions on Euclidean Space

A function $f : \mathbb{R}^n \to \mathbb{R}$ is said to belong to the class $C^k$ if all of its partial derivatives up to order $k$ exist and are continuous for every $x \in \operatorname{dom} f$. In particular:

- If $f$ is continuous, it is said to belong to $C$ or $C^0$.
- If $f$ is continuously differentiable, it is said to belong to $C^1$.
- If $f$ is twice continuously differentiable, it is said to belong to $C^2$.
- If $f$ is infinitely differentiable, it is said to belong to $C^{\infty}$.