5.1. Differentiation

We consider functions from $\mathbb{R}^n$ to $\mathbb{R}^m$.

5.1.1. Differentiability and Jacobian

Definition 5.1 (Differentiability at a point)

Let $f : \mathbb{R}^n \to \mathbb{R}^m$. Let $x \in \operatorname{int} \operatorname{dom} f$. The function $f$ is differentiable at $x$ if there exists a matrix $Df(x) \in \mathbb{R}^{m \times n}$ that satisfies

$$\lim_{\substack{z \in \operatorname{dom} f \\ z \neq x,\; z \to x}} \frac{\| f(z) - f(x) - Df(x)(z - x) \|_2}{\| z - x \|_2} = 0. \tag{5.1}$$

Such a matrix $Df(x)$ is called the derivative (or Jacobian) of $f$ at $x$.

There can be at most one $Df(x)$ satisfying the limit in (5.1).

Observation 5.1

If we write $z = x + h$, then an alternative form of (5.1) is given by:

$$\lim_{\substack{x + h \in \operatorname{dom} f \\ h \neq 0,\; h \to 0}} \frac{\| f(x + h) - f(x) - Df(x) h \|_2}{\| h \|_2} = 0.$$

The matrix $Df(x)$ can be obtained from the partial derivatives:

$$Df(x)_{ij} = \frac{\partial f_i(x)}{\partial x_j}, \quad i = 1, \dots, m, \; j = 1, \dots, n.$$

$$Df(x) = \begin{bmatrix}
\frac{\partial f_1(x)}{\partial x_1} & \frac{\partial f_1(x)}{\partial x_2} & \dots & \frac{\partial f_1(x)}{\partial x_n} \\
\frac{\partial f_2(x)}{\partial x_1} & \frac{\partial f_2(x)}{\partial x_2} & \dots & \frac{\partial f_2(x)}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m(x)}{\partial x_1} & \frac{\partial f_m(x)}{\partial x_2} & \dots & \frac{\partial f_m(x)}{\partial x_n}
\end{bmatrix}.$$

  1. The Jacobian $Df(x)$ is an $m \times n$ real matrix.

  2. The partial derivatives of the $i$-th component of $f$ (i.e., $f_i$) line up on the $i$-th row.

  3. The partial derivatives for one coordinate $x_j$ line up on the $j$-th column.

  4. If $f$ is real valued (i.e., $m = 1$), then the Jacobian $Df(x)$ is a row vector.
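The defining limit suggests a direct numerical sanity check: each column of the Jacobian can be approximated by a forward difference quotient. A minimal sketch, in which the test function, evaluation point, and step size are illustrative choices, not part of the text:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the m x n Jacobian Df(x) by forward differences.

    Column j is approximated by (f(x + eps*e_j) - f(x)) / eps.
    """
    x = np.asarray(x, dtype=float)
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        J[:, j] = (np.atleast_1d(f(x + step)) - fx) / eps
    return J

# Example map f(x) = (x1*x2, x1^2); its Jacobian is [[x2, x1], [2*x1, 0]].
f = lambda x: np.array([x[0] * x[1], x[0] ** 2])
x0 = np.array([2.0, 3.0])
J = numerical_jacobian(f, x0)
```

A smaller step reduces truncation error but amplifies rounding error; `1e-6` is a common compromise in double precision.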

Example 5.1 (Jacobian of identity function)

Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be defined as:

$$f(x) = x.$$

Then, $f_i(x) = x_i$. Hence,

$$\frac{\partial f_i(x)}{\partial x_j} = \delta_{ij},$$

where $\delta_{ij}$ is the Kronecker delta. Thus,

$$Df(x) = I_n,$$

the $n \times n$ identity matrix.

Example 5.2 (Jacobian of linear transformation)

Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be defined as:

$$f(x) = A x$$

where $A = (a_{ij})$ is an $m \times n$ real matrix.

Then, $f_i(x) = \sum_{j=1}^n a_{ij} x_j$. Hence,

$$\frac{\partial f_i(x)}{\partial x_j} = a_{ij}.$$

Thus,

$$Df(x) = A.$$

Example 5.3 (Jacobian of affine transformation)

Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be defined as:

$$f(x) = A x + b$$

where $A = (a_{ij}) \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.

Then, $f_i(x) = \sum_{j=1}^n a_{ij} x_j + b_i$. Hence,

$$\frac{\partial f_i(x)}{\partial x_j} = a_{ij}.$$

Thus,

$$Df(x) = A.$$

The vector $b$ is a constant offset. It has no impact on the derivative.
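The conclusion of Examples 5.2 and 5.3 is easy to confirm numerically: for an affine map, a difference quotient recovers $A$ (up to rounding) regardless of $b$ and of the base point. A small sketch; the particular $A$, $b$, and points below are arbitrary illustrations:

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])   # a 2x3 matrix
b = np.array([5.0, -7.0])          # constant offset; no effect on Df

f = lambda x: A @ x + b

def jacobian_fd(f, x, eps=1e-6):
    # Forward-difference approximation of Df(x), one column per coordinate.
    # For an affine map the difference quotient is exact in exact arithmetic.
    fx = f(x)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        cols.append((f(x + e) - fx) / eps)
    return np.column_stack(cols)

x0 = np.array([0.5, -1.5, 2.0])
Df = jacobian_fd(f, x0)            # should match A for any x0
```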

Definition 5.2 (Differentiable function)

A function $f$ is called differentiable if its domain $\operatorname{dom} f$ is open and it is differentiable at every point of $\operatorname{dom} f$.

Definition 5.3 (First order approximation)

The affine function given by:

$$\hat{f}(x) = f(a) + Df(a)(x - a) \tag{5.2}$$

is called the first order approximation of $f$ at $x = a \in \operatorname{int} \operatorname{dom} f$.

5.1.2. Real Valued Functions

The rest of this section focuses mostly on real valued functions of type $f : \mathbb{R}^n \to \mathbb{R}$.

  1. The first order derivative of a real valued function is called the gradient.

  2. The second order derivative of a real valued function is called the Hessian.

  3. We consider first order and second order approximations of a real valued function.

5.1.3. Gradient

Definition 5.4 (Gradient)

When $f : \mathbb{R}^n \to \mathbb{R}$ is a real valued function, the derivative $Df(x)$ is a $1 \times n$ matrix. The gradient of a real valued function is defined as:

$$\nabla f(x) = Df(x)^T$$

at $x \in \operatorname{int} \operatorname{dom} f$ if $f$ is differentiable at $x$.

For real valued functions, the derivative is a row vector but the gradient is a column vector.

The components of the gradient are given by the partial derivatives:

$$\nabla f(x)_i = \frac{\partial f(x)}{\partial x_i}, \quad i = 1, \dots, n.$$

Example 5.4 (Gradient of linear functional)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a linear functional given by:

$$f(x) = \langle x, a \rangle = a^T x.$$

We can expand it as:

$$f(x) = \sum_{j=1}^n a_j x_j.$$

Computing the partial derivative with respect to $x_i$, we get:

$$\frac{\partial f(x)}{\partial x_i} = \frac{\partial}{\partial x_i} \left( \sum_{j=1}^n a_j x_j \right) = a_i.$$

Putting the partial derivatives together, we get:

$$\nabla f(x) = a.$$

Example 5.5 (Gradient of affine functional)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be an affine functional given by:

$$f(x) = a^T x + b$$

where $a \in \mathbb{R}^n$ and $b \in \mathbb{R}$.

We can expand it as:

$$f(x) = \sum_{j=1}^n a_j x_j + b.$$

Computing the partial derivative with respect to $x_i$, we get:

$$\frac{\partial f(x)}{\partial x_i} = \frac{\partial}{\partial x_i} \left( \sum_{j=1}^n a_j x_j + b \right) = a_i.$$

Putting the partial derivatives together, we get:

$$\nabla f(x) = a.$$

The intercept $b$ is a constant term which doesn't affect the gradient.

Example 5.6 (Gradient of quadratic form)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a quadratic form given by:

$$f(x) = x^T A x$$

where $A \in \mathbb{R}^{n \times n}$.

We can expand it as:

$$f(x) = \sum_{i=1}^n \sum_{j=1}^n x_i a_{ij} x_j.$$

Note that the diagonal elements $a_{ii}$ give us terms of the form $a_{ii} x_i^2$. Let us split the expression into diagonal and non-diagonal terms:

$$f(x) = \sum_{i=1}^n a_{ii} x_i^2 + \sum_{\substack{i, j \\ i \neq j}} x_i a_{ij} x_j.$$

There are $n$ terms in the first sum (the diagonal entries of $A$) and $n^2 - n$ terms in the second sum (the non-diagonal entries of $A$).

Taking the partial derivative w.r.t. $x_k$, we obtain:

$$\frac{\partial f(x)}{\partial x_k} = 2 a_{kk} x_k + \sum_{\substack{i \\ i \neq k}} x_i a_{ik} + \sum_{\substack{j \\ j \neq k}} a_{kj} x_j.$$

  • The first term comes from the $a_{kk}$ term, which is quadratic in $x_k$.

  • The first sum comes from the linear terms where $j = k$ and $i = 1, \dots, n$ with $i \neq k$.

  • The second sum comes from the linear terms where $i = k$ and $j = 1, \dots, n$ with $j \neq k$.

  • Together the two sums have $2n - 2$ terms, and the term $2 a_{kk} x_k$ splits into two copies of $a_{kk} x_k$.

  • We can move one $a_{kk} x_k$ into each sum to simplify the partial derivative as:

$$\frac{\partial f(x)}{\partial x_k} = \sum_{i=1}^n x_i a_{ik} + \sum_{j=1}^n a_{kj} x_j.$$

Note that the $k$-th component of the vector $u = A x$ is $\sum_{j=1}^n a_{kj} x_j$.

Similarly, the $k$-th component of the vector $v = A^T x$ is $\sum_{i=1}^n a_{ik} x_i$.

Thus,

$$\frac{\partial f(x)}{\partial x_k} = v_k + u_k.$$

Putting together the partial derivatives, we obtain:

$$\nabla f(x) = v + u = A^T x + A x = (A^T + A) x = (A + A^T) x.$$

If $A$ is symmetric, then

$$\nabla f(x) = 2 A x.$$
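The identity $\nabla f(x) = (A + A^T) x$ can be checked against central differences, which are exact for quadratics up to rounding. A short sketch; the randomly generated (non-symmetric) $A$ and the point $x$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))    # generic, not necessarily symmetric
x = rng.standard_normal(4)

f = lambda v: v @ A @ v            # quadratic form x^T A x

grad = (A + A.T) @ x               # closed form from the example

# Central-difference approximation of the gradient for comparison.
eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(4)])
```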

Example 5.7 (Gradient of squared $\ell_2$ norm)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a quadratic form given by:

$$f(x) = \| x \|_2^2 = x^T x.$$

We can write this as

$$f(x) = x^T I x$$

where $I$ is the identity matrix.

Following Example 5.6,

$$\nabla f(x) = 2 I x = 2 x.$$

Example 5.8 (Gradient of quadratic functional)

Let $P \in \mathbb{S}^n$ be a symmetric matrix. Let $q \in \mathbb{R}^n$ and $r \in \mathbb{R}$. Consider the quadratic functional $f : \mathbb{R}^n \to \mathbb{R}$ given as:

$$f(x) = \frac{1}{2} x^T P x + q^T x + r.$$

We can compute the gradient as follows:

$$\begin{aligned}
\nabla f(x) &= \nabla \left( \frac{1}{2} x^T P x + q^T x + r \right) \\
&= \frac{1}{2} \nabla (x^T P x) + \nabla (q^T x) + \nabla r \\
&= \frac{1}{2} (P + P^T) x + q \\
&= \frac{1}{2} (P + P) x + q \\
&= P x + q.
\end{aligned}$$

  • We took advantage of the fact that the gradient operation commutes with scalar multiplication and distributes over vector addition.

  • Since $r$ is a constant, it has no contribution to the derivative.

  • We reused results from previous examples.

  • We utilized the fact that $P = P^T$ since $P$ is symmetric.

In summary:

$$\nabla f(x) = P x + q.$$

The derivative of $f$ is then obtained by taking the transpose of the gradient:

$$Df(x) = x^T P + q^T.$$
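A quick numerical check of $\nabla f(x) = P x + q$; the symmetric $P$, the vector $q$, and the offset $r$ below are randomly generated illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
P = M + M.T                         # symmetrize to get P in S^n
q = rng.standard_normal(3)
r = 2.5

f = lambda v: 0.5 * v @ P @ v + q @ v + r

x = rng.standard_normal(3)
grad = P @ x + q                    # closed form: grad f(x) = P x + q

# Central-difference gradient; the constant r cancels in the differences.
eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])
```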

Definition 5.5 (Gradient mapping)

If a real valued function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable, the gradient mapping of $f$ is the function $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$ with $\operatorname{dom} \nabla f = \operatorname{dom} f$, taking the value $\nabla f(x)$ at every $x \in \operatorname{dom} f$.

5.1.4. Continuous Differentiability

Definition 5.6 (Continuously differentiable real valued function)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$. Let $U \subseteq S$ be an open set. If all the partial derivatives of $f$ exist and are continuous at every $x \in U$, then $f$ is called continuously differentiable over $U$.

If $f$ is continuously differentiable over an open set $U \subseteq S$, then it is continuously differentiable over every subset $C \subseteq U$.

If $S$ is open itself and $f$ is continuously differentiable over $S$, then $f$ is called continuously differentiable.

5.1.5. First Order Approximation

Definition 5.7 (First order approximation of real valued functions)

The affine function given by:

$$\hat{f}(x) = f(a) + \nabla f(a)^T (x - a) \tag{5.3}$$

is the first order approximation of a real valued function $f$ at $x = a \in \operatorname{int} \operatorname{dom} f$.

Theorem 5.1 (First order approximation accuracy)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be defined on an open set $S = \operatorname{dom} f$. Assume that $f$ is continuously differentiable on $S$. Then,

$$\lim_{d \to 0} \frac{f(x + d) - f(x) - \nabla f(x)^T d}{\| d \|} = 0 \quad \forall x \in S.$$

Another way to write this result is:

$$f(x) = f(a) + \nabla f(a)^T (x - a) + o(\| x - a \|)$$

where $a \in S$ and $o(\cdot) : \mathbb{R}_+ \to \mathbb{R}$ is a one dimensional function satisfying $\frac{o(t)}{t} \to 0$ as $t \to 0^+$.

5.1.6. Chain Rule

Theorem 5.2 (Chain rule)

Suppose $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at $x \in \operatorname{int} \operatorname{dom} f$ and $g : \mathbb{R}^m \to \mathbb{R}^p$ is differentiable at $f(x) \in \operatorname{int} \operatorname{dom} g$. Define the composition $h : \mathbb{R}^n \to \mathbb{R}^p$ as:

$$h(x) = g(f(x)).$$

Then, $h$ is differentiable at $x$ with the derivative given by:

$$Dh(x) = Dg(f(x)) Df(x).$$

Notice how the derivative lines up as a simple matrix multiplication.
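The matrix form of the chain rule can be verified on a small example. The maps $f : \mathbb{R}^2 \to \mathbb{R}^3$ and $g : \mathbb{R}^3 \to \mathbb{R}^2$ below are hypothetical choices with hand-computed Jacobians; the product $Dg(f(x))\,Df(x)$ is compared against a finite-difference Jacobian of the composition:

```python
import numpy as np

f = lambda x: np.array([x[0] ** 2, x[0] * x[1], x[1]])
Df = lambda x: np.array([[2 * x[0], 0.0],
                         [x[1], x[0]],
                         [0.0, 1.0]])

g = lambda y: np.array([y[0] + y[1], y[1] * y[2]])
Dg = lambda y: np.array([[1.0, 1.0, 0.0],
                         [0.0, y[2], y[1]]])

h = lambda x: g(f(x))              # composition h = g o f

x0 = np.array([1.5, -0.5])
Dh_chain = Dg(f(x0)) @ Df(x0)      # chain rule: Dh(x) = Dg(f(x)) Df(x)

# Central-difference Jacobian of h for comparison.
eps = 1e-6
Dh_num = np.column_stack([(h(x0 + eps * e) - h(x0 - eps * e)) / (2 * eps)
                          for e in np.eye(2)])
```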

Corollary 5.1 (Chain rule for real valued functions)

Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $x \in \operatorname{int} \operatorname{dom} f$ and $g : \mathbb{R} \to \mathbb{R}$ is differentiable at $f(x) \in \operatorname{int} \operatorname{dom} g$. Define the composition $h : \mathbb{R}^n \to \mathbb{R}$ as:

$$h(x) = g(f(x)).$$

Then, $h$ is differentiable at $x$ with the gradient given by:

$$\nabla h(x) = g'(f(x)) \nabla f(x).$$

Example 5.9 (Gradient of log-sum-exp)

Let $h : \mathbb{R}^n \to \mathbb{R}$ be given by:

$$h(x) = \ln \left( \sum_{i=1}^n \exp x_i \right)$$

with $\operatorname{dom} h = \mathbb{R}^n$.

Let $g(y) = \ln y$ and

$$f(x) = \sum_{i=1}^n \exp x_i.$$

Then, we can see that $h(x) = g(f(x))$. Now $g'(y) = \frac{1}{y}$ and

$$\nabla f(x) = \begin{bmatrix} \exp x_1 \\ \vdots \\ \exp x_n \end{bmatrix}.$$

Thus,

$$\nabla h(x) = \frac{1}{\sum_{i=1}^n \exp x_i} \begin{bmatrix} \exp x_1 \\ \vdots \\ \exp x_n \end{bmatrix}.$$

Now, if we define

$$z = \begin{bmatrix} \exp x_1 \\ \vdots \\ \exp x_n \end{bmatrix},$$

then we see that:

$$\mathbf{1}^T z = \sum_{i=1}^n \exp x_i.$$

Using this notation:

$$\nabla h(x) = \frac{1}{\mathbf{1}^T z} z.$$
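In this form the gradient of log-sum-exp is the softmax of $x$: its entries are positive and sum to one. A minimal sketch with an arbitrary test point, checked against central differences:

```python
import numpy as np

def grad_logsumexp(x):
    # grad h(x) = z / (1^T z) with z_i = exp(x_i): the softmax of x.
    z = np.exp(x)
    return z / z.sum()

x = np.array([0.1, -1.2, 2.0, 0.5])
g = grad_logsumexp(x)

# Central-difference check of the gradient.
h = lambda v: np.log(np.exp(v).sum())
eps = 1e-6
num = np.array([(h(x + eps * e) - h(x - eps * e)) / (2 * eps)
                for e in np.eye(x.size)])
```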

Example 5.10 (Gradient of $\ell_2$ norm at nonzero vectors)

Let $h : \mathbb{R}^n \to \mathbb{R}$ be given by:

$$h(x) = \| x \|_2 = \sqrt{\langle x, x \rangle}$$

with $\operatorname{dom} h = \mathbb{R}^n$.

Let $g : \mathbb{R} \to \mathbb{R}$ with $\operatorname{dom} g = \mathbb{R}_+$ be given by $g(y) = \sqrt{y}$.

Let $f : \mathbb{R}^n \to \mathbb{R}$ with $\operatorname{dom} f = \mathbb{R}^n$ be given by

$$f(x) = \langle x, x \rangle = \sum_{i=1}^n x_i^2 = \| x \|_2^2.$$

Then, we can see that $h(x) = g(f(x))$, or $h = g \circ f$.

$g$ is differentiable on the open set $\mathbb{R}_{++}$. For every $y \in \mathbb{R}_{++}$,

$$g'(y) = \frac{1}{2 \sqrt{y}}$$

and (from Example 5.7)

$$\nabla f(x) = 2 x.$$

Thus, for every $x \neq 0$, following Corollary 5.1,

$$\nabla h(x) = g'(f(x)) \nabla f(x) = \frac{1}{2 \sqrt{\| x \|_2^2}} 2 x = \frac{x}{\| x \|_2}.$$

The gradient of the $\ell_2$ norm at $0$ doesn't exist. However, subgradients can be computed. See Example 9.71 and Example 9.72.
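For nonzero $x$, the formula $\nabla h(x) = x / \| x \|_2$ gives a unit vector in the direction of $x$. A small check at an illustrative point:

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0])        # nonzero vector with ||x||_2 = 5

grad = x / np.linalg.norm(x)          # closed form: x / ||x||_2

# Central-difference check of the gradient of the norm.
norm = lambda v: np.linalg.norm(v)
eps = 1e-6
num = np.array([(norm(x + eps * e) - norm(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])
```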

Corollary 5.2 (Chain rule for composition with affine function)

Suppose $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable. Let $A \in \mathbb{R}^{n \times p}$ and $b \in \mathbb{R}^n$. Define $g : \mathbb{R}^p \to \mathbb{R}^m$ as:

$$g(x) = f(A x + b)$$

with $\operatorname{dom} g = \{ x \mid A x + b \in \operatorname{dom} f \}$.

The derivative of $g$ at $x \in \operatorname{int} \operatorname{dom} g$ is given by:

$$Dg(x) = Df(A x + b) A.$$

If $f$ is real valued (i.e., $m = 1$), then the gradient of a composition of a function with an affine function is given by:

$$\nabla g(x) = A^T \nabla f(A x + b).$$

Example 5.11 (Chain rule for restriction on a line)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued differentiable function. Consider the restriction of $f$ to a line in its domain

$$g(t) = f(x + t v)$$

where $x \in \operatorname{dom} f$ and $v \in \mathbb{R}^n$, with the domain

$$\operatorname{dom} g = \{ t \mid x + t v \in \operatorname{dom} f \}.$$

If we define $h : \mathbb{R} \to \mathbb{R}^n$ as:

$$h(t) = x + t v,$$

we can see that:

$$g(t) = f(h(t)).$$

By the chain rule:

$$g'(t) = Df(h(t)) Dh(t) = \nabla f(h(t))^T v = \nabla f(x + t v)^T v.$$

In particular, if $v = y - x$ with $y \in \operatorname{dom} f$,

$$g'(t) = \nabla f(x + t(y - x))^T (y - x) = \nabla f(t y + (1 - t) x)^T (y - x).$$
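A sketch of the line-restriction formula $g'(t) = \nabla f(x + t v)^T v$, using the hypothetical choice $f(v) = v^T v$, whose gradient $2v$ is known from Example 5.7; the point, direction, and parameter value are arbitrary:

```python
import numpy as np

# A differentiable f with a known gradient: f(v) = v^T v, grad f(v) = 2v.
f = lambda v: v @ v
grad_f = lambda v: 2 * v

x = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])

g = lambda t: f(x + t * v)            # restriction of f to a line

t0 = 0.7
g_prime = grad_f(x + t0 * v) @ v      # g'(t) = grad f(x + t v)^T v

# Central-difference check of the scalar derivative.
eps = 1e-6
num = (g(t0 + eps) - g(t0 - eps)) / (2 * eps)
```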

5.1.7. Hessian

In this section, we review the second derivative of a real valued function $f : \mathbb{R}^n \to \mathbb{R}$.

Definition 5.8 (Hessian)

The second derivative or Hessian matrix of $f$ at $x \in \operatorname{int} \operatorname{dom} f$, denoted by $\nabla^2 f(x)$, is given by:

$$\nabla^2 f(x)_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}, \quad i = 1, \dots, n, \; j = 1, \dots, n,$$

provided $f$ is twice differentiable at $x$.

Example 5.12 (Hessian of linear functional)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a linear functional given by:

$$f(x) = \langle x, a \rangle = a^T x.$$

We can expand it as:

$$f(x) = \sum_{j=1}^n a_j x_j.$$

Computing the partial derivative with respect to $x_i$, we get:

$$\frac{\partial f(x)}{\partial x_i} = \frac{\partial}{\partial x_i} \left( \sum_{j=1}^n a_j x_j \right) = a_i.$$

If we further compute the partial derivative w.r.t. $x_j$, we get:

$$\frac{\partial^2 f(x)}{\partial x_j \partial x_i} = \frac{\partial a_i}{\partial x_j} = 0.$$

Thus, the Hessian is the $n \times n$ zero matrix:

$$\nabla^2 f(x) = O_n.$$

Theorem 5.3

The Hessian is the derivative of the gradient mapping:

$$D \nabla f(x) = \nabla^2 f(x).$$

Example 5.13 (Hessian of quadratic form)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a quadratic form given by:

$$f(x) = x^T A x$$

where $A \in \mathbb{R}^{n \times n}$.

Recall from Example 5.6 that:

$$\nabla f(x) = (A^T + A) x.$$

Also recall from Example 5.2 that

$$D(C x) = C$$

for all $C \in \mathbb{R}^{m \times n}$.

Thus, using Theorem 5.3,

$$\nabla^2 f(x) = D \nabla f(x) = D((A^T + A) x) = A^T + A.$$

If $A$ is symmetric, then

$$\nabla^2 f(x) = 2 A.$$
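The constant Hessian $A^T + A$ can be recovered from second-order finite differences of $f(x) = x^T A x$, which are exact for quadratics up to rounding. A sketch with an illustrative random $A$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))

f = lambda v: v @ A @ v

H = A.T + A                           # closed-form Hessian, constant in x

# Second-order finite-difference Hessian at an arbitrary point.
x = rng.standard_normal(3)
eps = 1e-4
n = x.size
H_num = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        ei = np.zeros(n); ei[i] = eps
        ej = np.zeros(n); ej[j] = eps
        # Mixed second difference approximates the (i, j) entry.
        H_num[i, j] = (f(x + ei + ej) - f(x + ei)
                       - f(x + ej) + f(x)) / eps**2
```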

Example 5.14 (Hessian of log-sum-exp)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be given by:

$$f(x) = \ln \left( \sum_{i=1}^n e^{x_i} \right)$$

with $\operatorname{dom} f = \mathbb{R}^n$.

Define

$$z = \begin{bmatrix} e^{x_1} \\ \vdots \\ e^{x_n} \end{bmatrix};$$

then we see that:

$$\mathbf{1}^T z = \sum_{i=1}^n e^{x_i}.$$

Using this notation:

$$f(x) = \ln(\mathbf{1}^T z).$$

We have:

$$\frac{\partial z_i}{\partial x_i} = \frac{\partial e^{x_i}}{\partial x_i} = e^{x_i} = z_i,$$

and $\frac{\partial z_j}{\partial x_i} = 0$ for $i \neq j$. Now,

$$\frac{\partial f(x)}{\partial x_i} = \frac{\partial \ln(\mathbf{1}^T z)}{\partial z_i} \frac{\partial z_i}{\partial x_i} = \frac{1}{\mathbf{1}^T z} z_i.$$

Proceeding to compute the second derivatives:

$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i} \left( \frac{z_j}{\mathbf{1}^T z} \right) = \frac{\partial}{\partial z_i} \left( \frac{z_j}{\mathbf{1}^T z} \right) \frac{\partial z_i}{\partial x_i} = \frac{(\mathbf{1}^T z) \delta_{ij} - z_j}{(\mathbf{1}^T z)^2} z_i = \frac{(\mathbf{1}^T z) \delta_{ij} z_i - z_i z_j}{(\mathbf{1}^T z)^2} = \frac{\delta_{ij} z_i}{\mathbf{1}^T z} - \frac{z_i z_j}{(\mathbf{1}^T z)^2}.$$

Now, note that $(z z^T)_{ij} = z_i z_j$ and $(\operatorname{diag}(z))_{ij} = \delta_{ij} z_i$.

Thus,

$$\nabla^2 f(x) = \frac{1}{\mathbf{1}^T z} \operatorname{diag}(z) - \frac{1}{(\mathbf{1}^T z)^2} z z^T.$$

Alternatively,

$$\nabla^2 f(x) = \frac{1}{(\mathbf{1}^T z)^2} \left( (\mathbf{1}^T z) \operatorname{diag}(z) - z z^T \right).$$
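A sketch of the Hessian formula, with two sanity checks that follow from it: the matrix is symmetric, and its rows sum to zero, since $\nabla^2 f(x)\,\mathbf{1} = \frac{z}{\mathbf{1}^T z} - \frac{z (\mathbf{1}^T z)}{(\mathbf{1}^T z)^2} = 0$. A positive-semidefiniteness check is added as well, reflecting the known convexity of log-sum-exp. The test point is illustrative:

```python
import numpy as np

def hessian_logsumexp(x):
    # H = diag(z)/(1^T z) - z z^T / (1^T z)^2 with z_i = exp(x_i).
    z = np.exp(x)
    s = z.sum()
    return np.diag(z) / s - np.outer(z, z) / s**2

x = np.array([0.2, -0.4, 1.1])
H = hessian_logsumexp(x)
```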

Example 5.15 (Derivatives for least squares cost function)

Let $A \in \mathbb{R}^{m \times n}$. Let $b \in \mathbb{R}^m$. Consider the least squares cost function:

$$f(x) = \frac{1}{2} \| A x - b \|_2^2.$$

Expanding it, we get:

$$f(x) = \frac{1}{2} x^T A^T A x - b^T A x + \frac{1}{2} b^T b.$$

Note that $A^T A$ is symmetric. Using previous results, we obtain the gradient:

$$\nabla f(x) = A^T A x - A^T b.$$

And the Hessian is:

$$\nabla^2 f(x) = D \nabla f(x) = A^T A.$$
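The least squares gradient and Hessian can be validated numerically on randomly generated data; the sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)

f = lambda v: 0.5 * np.linalg.norm(A @ v - b) ** 2

x = rng.standard_normal(3)
grad = A.T @ (A @ x - b)             # grad f(x) = A^T A x - A^T b
H = A.T @ A                          # Hessian, constant in x

# Central-difference check of the gradient.
eps = 1e-6
num = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                for e in np.eye(3)])
```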

Example 5.16 (Derivatives for quadratic over linear function)

Let $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be given by:

$$f(x, y) = \frac{x^2}{y}$$

with $\operatorname{dom} f = \{ (x, y) \mid y > 0 \}$.

The gradient is obtained by computing the partial derivatives w.r.t. $x$ and $y$:

$$\nabla f(x, y) = \begin{bmatrix} \frac{2x}{y} \\ -\frac{x^2}{y^2} \end{bmatrix}.$$

The Hessian is obtained by computing the second order partial derivatives:

$$\nabla^2 f(x, y) = \begin{bmatrix} \frac{2}{y} & -\frac{2x}{y^2} \\ -\frac{2x}{y^2} & \frac{2 x^2}{y^3} \end{bmatrix} = \frac{2}{y^3} \begin{bmatrix} y^2 & -x y \\ -x y & x^2 \end{bmatrix}.$$
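A sketch evaluating the gradient and Hessian at an illustrative point of the domain ($y > 0$), with a central-difference check of the gradient. One remark worth testing: the bracketed matrix in the factored form is the outer product of $(y, -x)$ with itself, so the Hessian is positive semidefinite wherever $y > 0$:

```python
import numpy as np

f = lambda x, y: x**2 / y             # dom f = {(x, y) : y > 0}

def grad_f(x, y):
    return np.array([2 * x / y, -x**2 / y**2])

def hess_f(x, y):
    # Factored form: (2 / y^3) * outer((y, -x), (y, -x)).
    return (2 / y**3) * np.array([[y**2, -x * y],
                                  [-x * y, x**2]])

x0, y0 = 1.5, 2.0
g = grad_f(x0, y0)
H = hess_f(x0, y0)

# Central-difference check of the gradient.
eps = 1e-6
num = np.array([
    (f(x0 + eps, y0) - f(x0 - eps, y0)) / (2 * eps),
    (f(x0, y0 + eps) - f(x0, y0 - eps)) / (2 * eps),
])
```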

5.1.8. Twice Continuous Differentiability

Definition 5.9 (Twice continuously differentiable real valued function)

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real valued function with $S = \operatorname{dom} f$. Let $U \subseteq S$ be an open set. If all the second order partial derivatives of $f$ exist and are continuous at every $x \in U$, then $f$ is called twice continuously differentiable over $U$.

If $f$ is twice continuously differentiable over an open set $U \subseteq S$, then it is twice continuously differentiable over every subset $C \subseteq U$.

If $S$ is open itself and $f$ is twice continuously differentiable over $S$, then $f$ is called twice continuously differentiable.

Theorem 5.4 (Symmetry of Hessian)

If $f : \mathbb{R}^n \to \mathbb{R}$ with $S = \operatorname{dom} f$ is twice continuously differentiable over a set $U \subseteq S$, then its Hessian matrix $\nabla^2 f(x)$ is symmetric at every $x \in U$.

5.1.9. Second Order Approximation

Theorem 5.5 (Linear approximation theorem)

Let $f : \mathbb{R}^n \to \mathbb{R}$ with $S = \operatorname{dom} f$ be twice continuously differentiable over an open set $U \subseteq S$. Let $x \in U$. Let $r > 0$ be such that $B(x, r) \subseteq U$. Then, for any $y \in B(x, r)$, there exists $z \in [x, y]$ such that

$$f(y) - f(x) = \nabla f(x)^T (y - x) + \frac{1}{2} (y - x)^T \nabla^2 f(z) (y - x).$$

Theorem 5.6 (Quadratic approximation theorem)

Let $f : \mathbb{R}^n \to \mathbb{R}$ with $S = \operatorname{dom} f$ be twice continuously differentiable over an open set $U \subseteq S$. Let $x \in U$. Let $r > 0$ be such that $B(x, r) \subseteq U$. Then, for any $y \in B(x, r)$,

$$f(y) = f(x) + \nabla f(x)^T (y - x) + \frac{1}{2} (y - x)^T \nabla^2 f(x) (y - x) + o(\| y - x \|^2).$$

Definition 5.10 (Second order approximation)

The second order approximation of $f$ at or near $x = a$ is the quadratic function defined by:

$$\hat{f}(x) = f(a) + \nabla f(a)^T (x - a) + \frac{1}{2} (x - a)^T \nabla^2 f(a) (x - a).$$
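The $o(\| x - a \|^2)$ accuracy of this quadratic model can be observed numerically: the model error divided by $\| x - a \|^2$ should shrink as $x \to a$. A sketch with a hypothetical smooth function whose gradient and Hessian are computed by hand:

```python
import numpy as np

# Hypothetical smooth test function: f(x) = exp(x1) + x1 * x2^2.
f = lambda v: np.exp(v[0]) + v[0] * v[1] ** 2
grad = lambda v: np.array([np.exp(v[0]) + v[1] ** 2, 2 * v[0] * v[1]])
hess = lambda v: np.array([[np.exp(v[0]), 2 * v[1]],
                           [2 * v[1], 2 * v[0]]])

a = np.array([0.3, -0.7])

def fhat(x):
    # Second order approximation of f at a.
    d = x - a
    return f(a) + grad(a) @ d + 0.5 * d @ hess(a) @ d

# The scaled error |f(x) - fhat(x)| / t^2 should decay as t -> 0.
errs = []
for t in [1e-1, 1e-2, 1e-3]:
    x = a + t * np.array([1.0, 1.0])
    errs.append(abs(f(x) - fhat(x)) / t**2)
```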

5.1.10. Smoothness

5.1.10.1. Real Functions

Definition 5.11 (Class of continuous functions)

The class of continuous real functions, denoted by $C$, is the set of functions of type $f : \mathbb{R} \to \mathbb{R}$ which are continuous over their domain $\operatorname{dom} f$.

Definition 5.12 (Differentiability class Ck)

Let $f : \mathbb{R} \to \mathbb{R}$ be a real function with $S = \operatorname{dom} f$.

Then, we say that $f$ belongs to the differentiability class $C^k$ on $S$ if and only if

$$\frac{d^k}{d x^k} f(x) \in C.$$

In other words, the $k$-th derivative of $f$ exists and is continuous.

  1. $C^0$ consists of the class of continuous real functions.

  2. $C^1$ consists of the class of continuously differentiable functions.

  3. $C^\infty$ consists of the class of smooth functions, which are infinitely differentiable.

5.1.10.2. Real Valued Functions on Euclidean Space

Definition 5.13 (Differentiability class Ck)

A function $f : \mathbb{R}^n \to \mathbb{R}$ with $S = \operatorname{dom} f$, where $S$ is an open subset of $\mathbb{R}^n$, is said to be of class $C^k$ on $S$, for a positive integer $k$, if all the partial derivatives of $f$

$$\frac{\partial^m f}{\partial x_1^{m_1} \partial x_2^{m_2} \cdots \partial x_n^{m_n}}(x)$$

exist and are continuous for every $m_1, m_2, \dots, m_n \geq 0$ with $m = m_1 + m_2 + \dots + m_n \leq k$.

  1. If $f$ is continuous, it is said to belong to $C$ or $C^0$.

  2. If $f$ is continuously differentiable, it is said to belong to $C^1$.

  3. If $f$ is twice continuously differentiable, it is said to belong to $C^2$.

  4. If $f$ is infinitely differentiable, it is said to belong to $C^\infty$.