5.2. Differentiation in Banach Spaces

We introduce the concept of differentiation in Banach spaces. Recall that Banach spaces are normed linear spaces that are complete.

5.2.1. Gateaux Differential

Definition 5.14 (Directional derivative)

Let \(X\) and \(Y\) be Banach spaces. Let \(f: X \to Y\) be a function with \(S = \dom f\). The directional derivative of \(f\) at \(\bx \in \interior S\) in the direction \(\bh \in X\), where \(\bh \neq \bzero\), is denoted by \(f'(\bx; \bh)\) and is given by

\[ f'(\bx; \bh) \triangleq \lim_{t \to 0^+} \frac{f (\bx + t \bh) - f(\bx)}{t} \]

whenever the limit exists. This is also known as the Gateaux differential. By convention, \(f'(\bx; \bzero_X) = \bzero_Y\). This is consistent with the definition above.

  • There is no single directional derivative at a point \(\bx\).

  • The directional derivative depends on the direction \(\bh\).

  • In one dimension, there are only two directions up to positive scaling (\(h > 0\) and \(h < 0\)), hence essentially two directional derivatives at each \(\bx\).

  • In two or more dimensions, there are infinitely many directional derivatives.

  • The directional derivative is a one dimensional calculation along the direction \(\bh\).

  • It is usually easy to compute the directional derivative even when the space \(X\) is infinite dimensional.
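The one dimensional nature of the calculation can be illustrated numerically. The following is a minimal sketch, assuming \(X = \RR^2\) and \(Y = \RR\); the helper `directional_derivative`, the test function `f_demo` and the step size `t` are illustrative choices, and a fixed small `t` only approximates the limit \(t \to 0^+\).

```python
def directional_derivative(f, x, h, t=1e-6):
    """One-sided difference quotient (f(x + t h) - f(x)) / t for a small t > 0.

    Assumption: f maps a list of floats to a float, and the fixed step t
    stands in for the limit t -> 0+.
    """
    shifted = [xi + t * hi for xi, hi in zip(x, h)]
    return (f(shifted) - f(x)) / t


def f_demo(v):
    """Hypothetical test function f(v) = v_0^2 + 3 v_1."""
    return v[0] ** 2 + 3.0 * v[1]


x = [1.0, 2.0]
h = [0.5, -1.0]

approx = directional_derivative(f_demo, x, h)
# Expanding f(x + t h) - f(x) by hand gives the limit 2 x_0 h_0 + 3 h_1.
exact = 2.0 * x[0] * h[0] + 3.0 * h[1]
print(approx, exact)
```

Only values of \(f\) along the ray \(\bx + t \bh\) are used, regardless of the dimension of \(X\).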

Definition 5.15 (Gateaux differentiability)

Let \(X\) and \(Y\) be Banach spaces. Let \(f: X \to Y\) be a function with \(S = \dom f\). Let \(U \subseteq S\) be an open set. We say that \(f\) is Gateaux differentiable at \(\bx \in U\) if the Gateaux differential \(f'(\bx; \bh)\) exists for every direction \(\bh \in X\).

Accordingly, we can define a bounded operator \(T_x : X \to Y\) given by

\[ T_x(\bh) \triangleq \lim_{t \to 0^+} \frac{f (\bx + t \bh) - f(\bx)}{t} \Forall \bh \in X. \]

The operator \(T_x\) is called the Gateaux derivative of \(f\) at \(\bx\).

Example 5.17 (Gateaux differential of exponential function)

Let \(f(x) = e^x\). Then,

\[\begin{split} f'(x; h) &= \lim_{t \to 0^+}\frac{e^{x + th} - e^x}{ t} \\ &= e^x \lim_{t \to 0^+} \frac{e^{t h} - 1}{t} \\ &= e^x \lim_{t \to 0^+} \frac{t h }{t} = h e^x. \end{split}\]

We note that the Gateaux derivative depends linearly on \(h\).

Theorem 5.7 (Gateaux differential nonnegative homogeneity)

The Gateaux differential of a function \(f : X \to Y\) is nonnegative homogeneous in the sense that

\[ f'(\bx; \alpha \bh) = \alpha f'(\bx; \bh) \]

for every \(\alpha \in \RR_+\) and every \(\bh \in X\).

However, the Gateaux differential may not be additive. Thus, the Gateaux differential may fail to be linear.

Example 5.18 (Gateaux differential of absolute value function)

Let \(f(x) = |x|\). Then, the Gateaux differentials are given by

\[\begin{split} f'(x; h) = \begin{cases} h \frac{x}{|x|} & x \neq 0;\\ |h | & x = 0. \end{cases} \end{split}\]

We note that the Gateaux differential of \(f\) exists everywhere. However, the Gateaux differential depends on \(h\) in a nonlinear way at \(x=0\). At \(x \neq 0\), the Gateaux differential depends linearly on \(h\).
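A short numerical sketch of the absolute value example, showing nonnegative homogeneity together with the failure of additivity at \(x = 0\). The helper `one_sided_quotient` and the step size are assumptions made for illustration.

```python
def one_sided_quotient(f, x, h, t=1e-6):
    """Approximate f'(x; h) by (f(x + t h) - f(x)) / t with a small t > 0."""
    return (f(x + t * h) - f(x)) / t


# At x = 0 the differential of |x| is |h|.
d_plus = one_sided_quotient(abs, 0.0, 1.0)     # direction h = +1
d_minus = one_sided_quotient(abs, 0.0, -1.0)   # direction h = -1
d_scaled = one_sided_quotient(abs, 0.0, 3.0)   # direction h = 3 = 3 * (+1)

# Nonnegative homogeneity: f'(0; 3 h) = 3 f'(0; h).
print(d_scaled, 3.0 * d_plus)

# Additivity would need f'(0; 1) + f'(0; -1) = f'(0; 0) = 0, but the sum is 2.
additivity_gap = d_plus + d_minus
print(additivity_gap)

# Away from zero the dependence on h is linear: f'(x; h) = h x / |x|.
d_left = one_sided_quotient(abs, -2.0, 1.0)
print(d_left)
```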

Example 5.19 (Gateaux differential of square function)

Let \(f(x) = x^2\). Then, the Gateaux differential is given by

\[\begin{split} f'(x; h) &= \lim_{t \to 0^+} \frac{f(x + t h) - f(x)}{t} \\ &= \lim_{t \to 0^+} \frac{x^2 + t^2 h^2 + 2 x t h - x^2 }{t} \\ &= 2 x h. \end{split}\]

We note that the Gateaux differential is linear w.r.t. \(h\).

Example 5.20 (Gateaux differential of linear functional)

Let \(f(\bx) = \ba^T \bx\) where \(\ba \in \RR^n\) is a given fixed vector. Then,

\[ f'(\bx; \bh) = \lim_{t \to 0^+}\frac{\ba^T \bx + t \ba^T \bh - \ba^T \bx}{t} = \ba^T \bh. \]

We note that the Gateaux differential is linear w.r.t. \(\bh\).

Example 5.21 (Gateaux differential of simple quadratic)

Let \(f(\bx) = \bx^T \bA \bx\) where \(\bA \in \SS^n\) is a given symmetric matrix. Then,

\[\begin{split} f'(\bx; \bh) &= \lim_{t \to 0^+}\frac{(\bx + t \bh) ^T \bA (\bx + t \bh) - \bx^T \bA \bx}{t} \\ &= \lim_{t \to 0^+}\frac{t^2 \bh ^T \bA \bh + 2 t \bh^T \bA \bx}{t} \\ &= 2 \bh^T \bA \bx = 2 \bx^T \bA \bh. \end{split}\]

We note that the Gateaux differential is linear w.r.t. \(\bh\).

In particular, if \(f(\bx) = \bx^T \bx\), then \(f'(\bx; \bh) = 2 \bh^T \bx = 2 \bx^T \bh\).
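The closed form \(2 \bx^T \bA \bh\) can be compared against the difference quotient. This is a sketch under assumptions: the matrix `A`, the point `x`, the direction `h` and the small helpers `dot` and `matvec` are all chosen here for illustration.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product with A stored as a list of rows."""
    return [dot(row, v) for row in A]


def quadratic(A, x):
    """f(x) = x^T A x."""
    return dot(x, matvec(A, x))


A = [[2.0, 1.0],
     [1.0, 3.0]]          # symmetric, as the example requires
x = [1.0, -1.0]
h = [0.5, 2.0]
t = 1e-6                  # small positive step standing in for t -> 0+

x_shift = [xi + t * hi for xi, hi in zip(x, h)]
approx = (quadratic(A, x_shift) - quadratic(A, x)) / t
exact = 2.0 * dot(x, matvec(A, h))   # 2 x^T A h
print(approx, exact)
```

The two values differ by \(t \, \bh^T \bA \bh\), which is exactly the term that vanishes in the limit.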

Theorem 5.8 (Gateaux differential of a constant function)

The Gateaux differential of a constant function is zero.

Theorem 5.9 (Gateaux differential sum rule)

The Gateaux differential distributes over sums.

Let \(f, g: X \to Y\) both have Gateaux differentials at \(\bx\) in the direction \(\bh\). Then,

\[ (f + g)'(\bx; \bh) = f'(\bx; \bh) + g'(\bx; \bh). \]

Also,

\[ (f - g)'(\bx; \bh) = f'(\bx; \bh) - g'(\bx; \bh). \]

Theorem 5.10 (Gateaux differential product rule)

Let \(f, g: X \to \RR\) both be Gateaux differentiable at \(\bx \in \interior (\dom f \cap \dom g)\). Let \(h\) be their (pointwise) product function given by

\[ h(\bx) = f(\bx) g(\bx) \]

with \(\dom h = \dom f \cap \dom g\). Then,

\[ h'(\bx; \bh) = (fg)'(\bx; \bh) = f'(\bx; \bh) g(\bx) + g'(\bx; \bh) f(\bx). \]
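As a sanity check of the product rule, the sketch below takes the real valued functions \(f(\bx) = \ba^T \bx\) and \(g(\bx) = \bx^T \bx\) on \(\RR^2\), whose differentials were computed in the examples above. The vectors `a`, `x`, `d` and the step `t` are assumed values; the direction is named `d` in the code to avoid a clash with the product function.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


a = [1.0, 2.0]
x = [1.0, -1.0]
d = [0.5, 2.0]            # the direction (written h in the text)
t = 1e-6


def f(v):
    return dot(a, v)      # f(x) = a^T x


def g(v):
    return dot(v, v)      # g(x) = x^T x


def product(v):
    return f(v) * g(v)    # pointwise product of f and g


x_shift = [xi + t * di for xi, di in zip(x, d)]
numeric = (product(x_shift) - product(x)) / t

# Product rule: f'(x; d) g(x) + g'(x; d) f(x), with
# f'(x; d) = a^T d and g'(x; d) = 2 d^T x.
by_rule = dot(a, d) * g(x) + 2.0 * dot(d, x) * f(x)
print(numeric, by_rule)
```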

Theorem 5.11 (Gateaux differential chain rule)

Let \(f : X \to Y\) and \(g : Y \to Z\) be functions. Let \(h : X \to Z\) be the composition of \(f\) and \(g\) given by \(h = g \circ f\). Let \(U \subseteq \dom h\) be an open set. Let \(\bx \in U\). Assume that \(f\) is Gateaux differentiable at \(\bx\) and \(g\) is Gateaux differentiable at \(f(\bx)\). Then,

\[ h'(\bx; \bh) = g'(f(\bx); f'(\bx; \bh)) \Forall \bh \in X. \]

We recall the little-\(o\) notation. We say that a quantity \(q\) is \(o(t)\) if

\[ \lim_{t \to 0^+} \frac{q}{t} = 0. \]

For vector valued functions, a quantity \(\bq\) is \(o(t)\) if

\[ \lim_{t \to 0^+} \frac{ \| \bq \| }{t} = 0 \]

or, equivalently,

\[ \lim_{t \to 0^+} \frac{ \bq }{t} = \bzero. \]
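A small numerical illustration of the definition, with two assumed sample quantities: \(q(t) = t^2\), which is \(o(t)\), and \(q(t) = 5t\), which is not.

```python
def ratio(q, t):
    """Return q(t) / t, the quantity whose limit decides whether q is o(t)."""
    return q(t) / t


ts = [10.0 ** (-k) for k in range(1, 7)]   # t = 1e-1, ..., 1e-6

quadratic_ratios = [ratio(lambda s: s * s, t) for t in ts]   # q(t) = t^2
linear_ratios = [ratio(lambda s: 5.0 * s, t) for t in ts]    # q(t) = 5 t

print(quadratic_ratios)   # shrinks towards 0 along with t
print(linear_ratios)      # stays near 5, so 5 t is not o(t)
```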

Proof. If \(f\) is Gateaux differentiable at \(\bx\), then

\[ f'(\bx; \bh) = \lim_{t \to 0^+} \frac{f (\bx + t \bh) - f(\bx)}{t} \Forall \bh \in X. \]

In terms of little-o notation,

\[ f(\bx + t \bh) = f(\bx) + t f'(\bx; \bh) + o(t). \]

Similarly, if \(g\) is Gateaux differentiable at \(\by\), then

\[ g(\by + s \bu) = g (\by) + s g'(\by; \bu) + o(s). \]

Now,

\[\begin{split} h'(\bx; \bh) = (g \circ f)' (\bx; \bh) &= \lim_{t \to 0^+}\frac{g(f(\bx + t \bh)) - g(f(\bx)) }{t} \\ &= \lim_{t \to 0^+}\frac{g(f(\bx) + t f'(\bx; \bh) + o(t) ) - g(f(\bx)) }{t} \\ &= \lim_{t \to 0^+}\frac{g(f(\bx) + t (f'(\bx; \bh) + t^{-1}o(t)) ) - g(f(\bx)) }{t} \\ &= \lim_{t \to 0^+}\frac{g(f(\bx)) + t g'(f(\bx); f'(\bx; \bh) + t^{-1}o(t)) + o(t) - g(f(\bx))}{t} \\ &= \lim_{t \to 0^+}\frac{t g'(f(\bx); f'(\bx; \bh) + t^{-1}o(t) ) + o(t) } {t} \\ &= \lim_{t \to 0^+} [g'(f(\bx); f'(\bx; \bh) + t^{-1}o(t)) + t^{-1} o(t) ] \\ &= g'(f(\bx); f'(\bx; \bh) ). \end{split}\]

The last steps implicitly assume that the expansion of \(g\) remains valid when the direction is perturbed by the vanishing term \(t^{-1}o(t)\) and that \(g'(f(\bx); \cdot)\) is continuous in the direction. This holds, for example, when \(g\) is Fréchet differentiable at \(f(\bx)\).

Example 5.22 (Chain rule for square of inner product)

Consider the function \(h(\bx) = (\bx^T \bx)^2\).

  1. Define \(g(t) = t^2\).

  2. Define \(f(\bx) = \bx^T \bx\).

  3. Then \(h = g \circ f\).

  4. We have \(f'(\bx; \bh) = 2 \bh^T \bx\).

  5. We have \(g'(y; u) = 2 y u\).

  6. Thus,

    \[\begin{split} g'(f(\bx); f'(\bx; \bh) ) &= 2 f(\bx) f'(\bx; \bh) \\ &= 2 (\bx^T \bx) (2 \bh^T \bx) \\ &= 4 (\bh^T \bx) (\bx^T \bx). \end{split}\]

We can compute the same thing using the product rule.

  1. We note that \(h (\bx) = f(\bx) f(\bx)\).

  2. Applying the product rule:

    \[\begin{split} h'(\bx; \bh) &= f'(\bx; \bh) f(\bx) + f'(\bx; \bh) f(\bx)\\ &= 2 f'(\bx; \bh) f(\bx) \\ &= 2 (2 \bh^T \bx) (\bx^T \bx) \\ &= 4 (\bh^T \bx) (\bx^T \bx). \end{split}\]
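Both derivations give \(4 (\bh^T \bx)(\bx^T \bx)\), which can be compared with a difference quotient. In this sketch the point `x`, the direction `d` (the \(\bh\) of the text, renamed to avoid a clash with the function \(h\)) and the step `t` are assumed values.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def h_fun(v):
    """h(x) = (x^T x)^2, the composition g(f(x)) with g(s) = s^2."""
    return dot(v, v) ** 2


x = [1.0, 2.0]
d = [0.5, -1.0]           # the direction
t = 1e-6

x_shift = [xi + t * di for xi, di in zip(x, d)]
numeric = (h_fun(x_shift) - h_fun(x)) / t

# Closed form from the chain rule (and from the product rule).
closed_form = 4.0 * dot(d, x) * dot(x, x)
print(numeric, closed_form)
```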

5.2.2. Fréchet Derivative

Definition 5.16 (Fréchet differentiability)

Let \(X\) and \(Y\) be Banach spaces. Let \(f: X \to Y\) be a function with \(S = \dom f\). Let \(U \subseteq S\) be an open set. We say that \(f\) is Fréchet differentiable at \(\bx \in U\) if there is a bounded and linear operator \(T_x : X \to Y\) given by

\[ T_x(\bh) = \lim_{t \to 0^+} \frac{f (\bx + t \bh) - f(\bx)}{t} \Forall \bh \in X. \]

The operator \(T_x\) is called the Fréchet derivative of \(f\) at \(\bx\).

We note that \(T_x\) depends on \(\bx\).

Remark 5.1 (Fréchet differentiability alternate forms)

By definition, if \(f\) is Fréchet differentiable at \(\bx\), then it is Gateaux differentiable at \(\bx\). Since \(T_x\) is linear, we can write it as

\[ T_x(\bh) = \bA \bh \]

emphasizing the fact that the essential part of \(T_x\) doesn’t depend on \(\bh\). \(\bA\) may still depend on \(\bx\).

Using the little-\(o\) notation, we can write

\[ f(\bx + t \bh) = f(\bx) + t T_x(\bh) + o(t) = f(\bx) + t \bA \bh + o(t). \]

If we set \(t\bh = \by\), then \(t \to 0\) if and only if \(\by \to \bzero\). In particular, \(\| \by \|_X = t \| \bh \|_X\), so for a fixed \(\bh \neq \bzero\) a quantity is \(o(t)\) if and only if it is \(o(\| \by \|_X)\). Now,

\[\begin{split} & f(\bx + \by) = f(\bx) + \bA \by + o(t) \\ &\iff f(\bx + \by) - f(\bx) - \bA \by = o (t) = o( \| \by \|_X) \\ &\iff \lim_{\| \by \|_X \to 0 } \frac{ \| f(\bx + \by) - f(\bx) - \bA \by \|_Y}{\| \by \|_X} = 0 \\ &\iff \lim_{ \by \to \bzero } \frac{\| f(\bx + \by) - f(\bx) - \bA \by \|_Y}{\| \by \|_X} = 0 \\ &\iff \lim_{ \by \to \bzero } \frac{\| f(\bx + \by) - f(\bx) - T_x (\by) \|_Y }{\| \by \|_X} = 0. \end{split}\]

Therefore \(f : X \to Y\) is Fréchet differentiable at \(\bx \in U\) if and only if there is a bounded linear operator \(T_x : X \to Y\) such that

\[ \lim_{\by \to \bzero} \frac{f(\bx + \by) - f(\bx) - T_x(\by)}{\| \by \|_X} = \bzero. \]

It is worthwhile to compare this definition to the definition of differentiability of \( f: \RR^n \to \RR^m\) in Definition 5.1. If we put \(\bz = \bx + \by\), we can rewrite the condition as

\[ \lim_{\bz \to \bx} \frac{ \| f(\bz) - f(\bx) - T_x(\bz - \bx) \|_Y}{\| \bz - \bx \|_X} = 0. \]

Thus, \(T_x\) plays the same role as the Jacobian matrix \(Df(\bx)\) in (5.1).
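The limit condition can be observed numerically for \(f(\bx) = \bx^T \bA \bx\) with the candidate derivative \(T_x(\by) = 2 \bx^T \bA \by\). This is a sketch under assumptions: `A`, `x`, the unit direction `u` and the scales `s` are illustrative, and shrinking \(\by\) along a single direction only samples the limit rather than proving it.

```python
import math


def dot(u, v):
    """Standard inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Matrix-vector product with A stored as a list of rows."""
    return [dot(row, v) for row in A]


A = [[2.0, 1.0],
     [1.0, 3.0]]          # symmetric
x = [1.0, -1.0]
u = [0.6, 0.8]            # unit vector giving the direction of y


def f(v):
    return dot(v, matvec(A, v))          # f(x) = x^T A x


def T_x(y):
    return 2.0 * dot(x, matvec(A, y))    # candidate derivative at x


ratios = []
for s in (1e-1, 1e-2, 1e-3):
    y = [s * ui for ui in u]
    x_plus_y = [xi + yi for xi, yi in zip(x, y)]
    remainder = f(x_plus_y) - f(x) - T_x(y)   # equals y^T A y here
    ratios.append(abs(remainder) / math.sqrt(dot(y, y)))

print(ratios)   # shrinks roughly in proportion to the norm of y
```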

Theorem 5.12 (Existence of Fréchet derivative)

The Fréchet derivative of a function \(f\) exists at a point \(\bx = \ba\) if the Gateaux derivative of \(f\) exists as a bounded linear operator at every point in a neighborhood of \(\ba\) and is a continuous function of \(\bx\) (in the operator norm) at \(\bx=\ba\).

Theorem 5.13 (Uniqueness of Fréchet derivative)

If the Fréchet derivative of a function \(f\) exists at a point \(\bx = \ba\) then it is unique.

5.2.3. Gradient

Definition 5.17 (Gradient)

Let \(\VV\) be a Hilbert space. Let \(f : \VV \to \RR\) be a real valued function. Let \(S = \dom f\) and \(U \subseteq S\) be an open set. Assume that \(f\) is Fréchet differentiable at \(\bx \in U\). Then, the Fréchet derivative \(T_x : \VV \to \RR\) is a bounded linear functional.

The gradient of \(f\) at \(\bx\), denoted by \(\nabla f(\bx)\), is the element \(\nabla f(\bx) \in \VV^*\) that satisfies, for every \(\bh \in \VV\),

\[ \langle \bh, \nabla f(\bx) \rangle = T_x(\bh). \]
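In \(\RR^n\) with the standard inner product, the defining identity suggests a way to recover the gradient: evaluate \(T_x\) on the standard basis vectors. The sketch below does this approximately with one-sided difference quotients; the test function `f_demo`, the point `x`, the direction `h` and the helper names are assumptions made for illustration.

```python
def dot(u, v):
    """Standard inner product on R^n."""
    return sum(ui * vi for ui, vi in zip(u, v))


def f_demo(v):
    """Hypothetical smooth function f(v) = v_0^2 + 3 v_0 v_1."""
    return v[0] ** 2 + 3.0 * v[0] * v[1]


def T_approx(f, x, h, t=1e-6):
    """Approximate T_x(h) by the difference quotient (f(x + t h) - f(x)) / t."""
    shifted = [xi + t * hi for xi, hi in zip(x, h)]
    return (f(shifted) - f(x)) / t


def gradient_approx(f, x):
    """Component i of the gradient is T_x(e_i) for the i-th basis vector e_i."""
    n = len(x)
    basis = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    return [T_approx(f, x, e) for e in basis]


x = [1.0, 2.0]
h = [0.5, -1.0]

grad = gradient_approx(f_demo, x)      # analytically (2 x_0 + 3 x_1, 3 x_0)
lhs = dot(h, grad)                     # <h, grad f(x)>
rhs = T_approx(f_demo, x, h)           # T_x(h)
print(grad, lhs, rhs)
```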