# 7.9. Subgaussian Distributions#

In this section we review subgaussian distributions and matrices drawn from subgaussian distributions.

Examples of subgaussian distributions include

• Gaussian distribution

• Rademacher distribution taking values $$\pm \frac{1}{\sqrt{M}}$$

• Any zero mean distribution with a bounded support

Definition 7.33

A random variable $$X$$ is called subgaussian if there exists a constant $$c > 0$$ such that

(7.2)#$M_X(t) = \EE [\exp(X t) ] \leq \exp \left (\frac{c^2 t^2}{2} \right )$

holds for all $$t \in \RR$$. We use the notation $$X \sim \Sub (c^2)$$ to denote that $$X$$ satisfies the constraint (7.2). We also say that $$X$$ is $$c$$-subgaussian.

$$\EE [\exp(X t) ]$$ is moment generating function of $$X$$.

$$\exp \left (\frac{c^2 t^2}{2} \right )$$ is moment generating function of a Gaussian random variable with variance $$c^2$$.

The definition means that for a subgaussian variable $$X$$, its M.G.F. is bounded by the M.G.F. of a Gaussian random variable $$\sim \NNN(0, c^2)$$.

Example 7.16 (Gaussian r.v. as subgaussian r.v.)

Consider zero-mean Gaussian random variable $$X \sim \NNN(0, \sigma^2)$$ with variance $$\sigma^2$$. Then

$\EE [\exp(X t) ] = \exp\left ( \frac{\sigma^2 t^2}{2} \right ).$

Putting $$c = \sigma$$ we see that (7.2) is satisfied. Hence $$X\sim \Sub(\sigma^2)$$ is a subgaussian r.v. or $$X$$ is $$\sigma$$-subgaussian.

Example 7.17 (Rademacher distribution)

Consider $$X$$ with

$\PP_X(x) = \frac{1}{2}\delta(x-1) + \frac{1}{2}\delta(x + 1)$

i.e. $$X$$ takes a value $$1$$ with probability $$0.5$$ and value $$-1$$ with probability $$0.5$$.

Then

$\EE [\exp(X t) ] = \frac{1}{2} \exp(-t) + \frac{1}{2} \exp(t) = \cosh t \leq \exp \left ( \frac{t^2}{2} \right).$

Thus $$X \sim \Sub(1)$$ or $$X$$ is 1-subgaussian.

Example 7.18 (Uniform distribution)

Consider $$X$$ as uniformly distributed over the interval $$[-a, a]$$ for some $$a > 0$$. i.e.

$\begin{split} f_X(x) = \begin{cases} \frac{1}{2 a} & -a \leq x \leq a\\ 0 & \text{otherwise} \end{cases} \end{split}$

Then

$\EE [\exp(X t) ] = \frac{1}{2 a} \int_{-a}^{a} \exp(x t)d x = \frac{1}{2 a t} [e^{at} - e^{-at}] = \sum_{n = 0}^{\infty}\frac{(at)^{2 n}}{(2 n + 1)!}$

But $$(2n+1)! \geq n! 2^n$$. Hence we have

$\sum_{n = 0}^{\infty}\frac{(at)^{2 n}}{(2 n + 1)!} \leq \sum_{n = 0}^{\infty}\frac{(at)^{2 n}}{( n! 2^n)} = \sum_{n = 0}^{\infty}\frac{(a^2 t^2 / 2)^{n}}{( n!)} = \exp \left (\frac{a^2 t^2}{2} \right ).$

Thus

$\EE [\exp(X t ] \leq \exp \left ( \frac{a^2 t^2}{2} \right ).$

Hence $$X \sim \Sub(a^2)$$ or $$X$$ is $$a$$-subgaussian.

Example 7.19 (Random variable with bounded support)

Consider $$X$$ as a zero mean, bounded random variable i.e.

$\PP(|X| \leq B) = 1$

for some $$B \in \RR^+$$ and

$\EE(X) = 0.$

Then, the following upper bound holds:

$\EE [ \exp(X t) ] = \int_{-B}^{B} \exp(x t) f_X(x) d x \leq \exp\left (\frac{B^2 t^2}{2} \right ).$

This result can be proven with some advanced calculus. $$X \sim \Sub(B^2)$$ or $$X$$ is $$B$$-subgaussian.

There are some useful properties of subgaussian random variables.

Lemma 7.7 (Mean and variance of subgaussian random variables)

If $$X \sim \Sub(c^2)$$ then

$\EE (X) = 0$

and

$\EE(X^2) \leq c^2.$

Thus subgaussian random variables are always zero-mean. Their variance is always bounded by the variance of the bounding Gaussian distribution.

Proof. We proceed as follows:

1. Note that

$\sum_{n = 0}^{\infty} \frac{t^n}{n!} \EE (X^n) = \EE \left( \sum_{n = 0}^{\infty} \frac{(X t)^n}{n!} \right ) = \EE \left ( \exp(X t) \right ).$
2. But since $$X \sim \Sub(c^2)$$ hence

$\sum_{n = 0}^{\infty} \frac{t^n}{n!} \EE (X^n) \leq \exp \left ( \frac{c^2 t^2}{2} \right) = \sum_{n = 0}^{\infty} \frac{c^{2 n} t^{2 n}}{2^n n!}.$
3. Restating

$\EE (X) t + \EE (X^2) \frac{t^2}{2!} \leq \frac{c^2 t^2}{2} + \smallO (t^2) \text{ as } t \to 0.$
4. Dividing throughout by $$t > 0$$ and letting $$t \to 0$$ we get $$\EE (X) \leq 0$$.

5. Dividing throughout by $$t < 0$$ and letting $$t \to 0$$ we get $$\EE (X) \geq 0$$.

6. Thus $$\EE (X) = 0$$. So $$\Var(X) = \EE (X^2)$$.

7. Now we are left with

$\EE (X^2) \frac{t^2}{2!} \leq \frac{c^2 t^2}{2} + \smallO (t^2) \text{ as } t \to 0.$
8. Dividing throughout by $$t^2$$ and letting $$t \to 0$$ we get $$\Var(X) \leq c^2$$.

Subgaussian variables have a linear structure.

Theorem 7.24 (Linearity of subgaussian variables)

If $$X \sim \Sub(c^2)$$ i.e. $$X$$ is $$c$$-subgaussian, then for any $$\alpha \in \RR$$, the r.v. $$\alpha X$$ is $$|\alpha| c$$-subgaussian.

If $$X_1, X_2$$ are r.v. such that $$X_i$$ is $$c_i$$-subgaussian, then $$X_1 + X_2$$ is $$c_1 + c_2$$-subgaussian.

Proof. Scalar multiplication:

1. Let $$X$$ be $$c$$-subgaussian.

2. Then

$\EE [\exp(X t) ] \leq \exp \left (\frac{c^2 t^2}{2} \right ).$
3. Now for $$\alpha \neq 0$$, we have

$\EE [\exp(\alpha X t) ] \leq \exp \left (\frac{\alpha^2 c^2 t^2}{2} \right ) \leq \exp \left (\frac{(|\alpha | c)^2 t^2}{2} \right ).$
4. Hence $$\alpha X$$ is $$|\alpha| c$$-subgaussian.

1. Consider $$X_1$$ as $$c_1$$-subgaussian and $$X_2$$ as $$c_2$$-subgaussian.

2. Thus

$\EE (\exp(X_i t) ) \leq \exp \left (\frac{c_i^2 t^2}{2} \right ).$
3. Let $$p, q >1$$ be two numbers s.t. $$\frac{1}{p} + \frac{1}{q} = 1$$.

4. Using H”older’s inequality, we have

$\begin{split} \EE (\exp((X_1 + X_2)t) ) &\leq \left [ \EE (\exp(X_1 t) )^p\right ]^{\frac{1}{p}} \left [ \EE (\exp(X_2 t) )^q\right ]^{\frac{1}{q}}\\ &= \left [ \EE (\exp( p X_1 t) )\right ]^{\frac{1}{p}} \left [ \EE (\exp(q X_2 t) )\right ]^{\frac{1}{q}}\\ &\leq \left [ \exp \left (\frac{(p c_1)^2 t^2}{2} \right ) \right ]^{\frac{1}{p}} \left [ \exp \left (\frac{(q c_2)^2 t^2}{2} \right ) \right ]^{\frac{1}{q}}\\ &= \exp \left ( \frac{t^2}{2} ( p c_1^2 + q c_2^2) \right ) \\ &= \exp \left ( \frac{t^2}{2} ( p c_1^2 + \frac{p}{p - 1} c_2^2) \right ). \end{split}$
5. Since this is valid for any $$p > 1$$, we can minimize the r.h.s. over $$p > 1$$.

6. If suffices to minimize the term

$r = p c_1^2 + \frac{p}{p - 1} c_2^2.$
7. We have

$\frac{\partial r}{\partial p} = c_1^2 - \frac{1}{(p-1)^2}c_2^2.$
8. Equating it to 0 gives us

$p - 1 = \frac{c_2}{c_1} \implies p = \frac{c_1 + c_2}{c_1} \implies \frac{p}{p -1} = \frac{c_1 + c_2}{c_2}.$
9. Taking second derivative, we can verify that this is indeed a minimum value.

10. Thus

$r_{\min} = (c_1 + c_2)^2.$
11. Hence we have the result

$\EE (\exp((X_1 + X_2)t) ) \leq \exp \left (\frac{(c_1+ c_2)^2 t^2}{2} \right ).$
12. Thus $$X_1 + X_2$$ is $$(c_1 + c_2)$$-subgaussian.

If $$X_1$$ and $$X_2$$ are independent, then $$X_1 + X_2$$ is $$\sqrt{c_1^2 + c_2^2}$$-subgaussian.

If $$X$$ is $$c$$-subgaussian then naturally, $$X$$ is $$d$$-subgaussian for any $$d \geq c$$. A question arises as to what is the minimum value of $$c$$ such that $$X$$ is $$c$$-subgaussian.

Definition 7.34 (Subgaussian moment)

For a centered random variable $$X$$, the subgaussian moment of $$X$$, denoted by $$\sigma(X)$$, is defined as

$\sigma(X) = \inf \left \{ c \geq 0 \; | \; \EE (\exp(X t) ) \leq \exp \left (\frac{c^2 t^2}{2} \right ), \Forall t \in \RR. \right \}$

$$X$$ is subgaussian if and only if $$\sigma(X)$$ is finite.

We can also show that $$\sigma(\cdot)$$ is a norm on the space of subgaussian random variables. And this normed space is complete.

For centered Gaussian r.v. $$X \sim \NNN(0, \sigma^2)$$, the subgaussian moment coincides with the standard deviation. $$\sigma(X) = \sigma$$.

Sometimes it is useful to consider more restrictive class of subgaussian random variables.

Definition 7.35 (Strictly subgaussian distribution)

A random variable $$X$$ is called strictly subgaussian if $$X \sim \Sub(\sigma^2)$$ where $$\sigma^2 = \EE(X^2)$$, i.e. the inequality

$\EE (\exp(X t) ) \leq \exp \left (\frac{\sigma^2 t^2}{2} \right )$

holds true for all $$t \in \RR$$.

We will denote strictly subgaussian variables by $$X \sim \SSub (\sigma^2)$$.

Example 7.20 (Gaussian distribution)

If $$X \sim \NNN (0, \sigma^2)$$ then $$X \sim \SSub(\sigma^2)$$.

## 7.9.1. Characterization#

We quickly review Markov’s inequality which will help us establish the results in this subsection.

Let $$X$$ be a non-negative random variable. And let $$t > 0$$. Then

$\PP (X \geq t ) \leq \frac{\EE (X)}{t}.$

Theorem 7.25

For a centered random variable $$X$$, the following statements are equivalent:

1. moment generating function condition:

$\EE [\exp(X t) ] \leq \exp \left (\frac{c^2 t^2}{2} \right ) \Forall t \in \RR.$
2. Subgaussian tail estimate: There exists $$a > 0$$ such that

$\PP(|X| \geq \lambda) \leq 2 \exp (- a \lambda^2) \Forall \lambda > 0.$
3. $$\psi_2$$-condition: There exists some $$b > 0$$ such that

$\EE [\exp (b X^2) ] \leq 2.$

Proof. $$(1) \implies (2)$$

1. Using Markov’s inequality, for any $$t > 0$$ we have

$\begin{split} \PP(X \geq \lambda) &= \PP (t X \geq t \lambda) = \PP \left(e^{t X} \geq e^{t \lambda} \right )\\ &\leq \frac{\EE \left ( e^{t X} \right ) }{e^{t \lambda}} \leq \exp \left ( - t \lambda + \frac{c^2 t^2}{2}\right ) \Forall t \in \RR. \end{split}$
2. Since this is valid for all $$t \in \RR$$, hence it should be valid for the minimum value of r.h.s.

3. The minimum value is obtained for $$t = \frac{\lambda}{c^2}$$.

4. Thus we get

$\PP(X \geq \lambda) \leq \exp \left ( - \frac{\lambda^2}{2 c^2}\right ).$
5. Since $$X$$ is $$c$$-subgaussian, hence $$-X$$ is also $$c$$-subgaussian.

6. Hence

$\PP (X \leq - \lambda) = \PP (-X \geq \lambda) \leq \exp \left ( - \frac{\lambda^2}{2 c^2}\right ).$
7. Thus

$\PP(|X| \geq \lambda) = \PP (X \leq - \lambda) + \PP(X \geq \lambda) \leq 2 \exp \left ( - \frac{\lambda^2}{2 c^2}\right ).$
8. Thus we can choose $$a = \frac{1}{2 c^2}$$ to complete the proof.

$$(2)\implies (3)$$

TODO PROVE THIS

$\EE (\exp (b X^2)) \leq 1 + \int_0^{\infty} 2 b t \exp (b t^2) \PP (|X| > t)d t$

$$(3)\implies (1)$$

TODO PROVE THIS

## 7.9.2. More Properties#

We also have the following result on the exponential moment of a subgaussian random variable.

Lemma 7.8

Suppose $$X \sim \Sub(c^2)$$. Then

$\EE \left [\exp \left ( \frac{\lambda X^2}{2 c^2} \right ) \right ] \leq \frac{1}{\sqrt{1 - \lambda}}$

for any $$\lambda \in [0,1)$$.

Proof. We are given that

$\begin{split} &\EE (\exp(X t) ) \leq \exp \left (\frac{c^2 t^2}{2} \right )\\ &\implies \int_{-\infty}^{\infty} \exp(t x) f_X(x) d x \leq \exp \left (\frac{c^2 t^2}{2} \right ) \Forall t \in \RR\\ \end{split}$

Multiplying on both sides with $$\exp \left ( -\frac{c^2 t^2}{2 \lambda} \right )$$:

$\int_{-\infty}^{\infty} \exp \left (t x - \frac{c^2 t^2}{2 \lambda}\right ) f_X(x) d x \leq \exp \left (\frac{c^2 t^2}{2}\frac{\lambda-1}{\lambda} \right ) = \exp \left (-\frac{t^2}{2}\frac{c^2 (1 - \lambda)}{\lambda} \right )$

Integrating on both sides w.r.t. $$t$$ we get:

$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp \left (t x - \frac{c^2 t^2}{2 \lambda}\right ) f_X(x) d x d t \leq \int_{-\infty}^{\infty} \exp \left (-\frac{t^2}{2}\frac{c^2 (1 - \lambda)}{\lambda} \right ) d t$

which reduces to:

$\begin{split} &\frac{1}{c} \sqrt{2 \pi \lambda} \int_{-\infty}^{\infty} \exp \left ( \frac{\lambda x^2}{2 c^2} \right ) f_X(x) d x \leq \frac{1}{c} \sqrt {\frac{2 \pi \lambda}{1 - \lambda}}\\ \implies & \EE \left (\exp \left ( \frac{\lambda X^2}{2 c^2} \right ) \right ) \leq \frac{1}{\sqrt{1 - \lambda}} \end{split}$

which completes the proof.

## 7.9.3. Subgaussian Random Vectors#

The linearity property of subgaussian r.v.s can be extended to random vectors also. This is stated more formally in following result.

Theorem 7.26

Suppose that $$X = [X_1, X_2,\dots, X_N]$$, where each $$X_i$$ is i.i.d. with $$X_i \sim \Sub(c^2)$$. Then for any $$\alpha \in \RR^N$$, $$\langle X, \alpha \rangle \sim \Sub(c^2 \| \alpha \|^2_2)$$. Similarly if each $$X_i \sim \SSub(\sigma^2)$$, then for any $$\alpha \in \RR^N$$, $$\langle X, \alpha \rangle \sim \SSub(\sigma^2 \| \alpha \|^2_2)$$.

Norm of a subgaussian random vector

1. Let $$X$$ be a random vector where each $$X_i$$ is i.i.d. with $$X_i \sim \Sub (c^2)$$.

2. Consider the $$l_2$$ norm $$\| X \|_2$$. It is a random variable in its own right.

3. It would be useful to understand the average behavior of the norm.

4. Suppose $$N=1$$. Then $$\| X \|_2 = |X_1|$$.

5. Also $$\| X \|^2_2 = X_1^2$$. Thus $$\EE (\| X \|^2_2) = \sigma^2$$.

6. It looks like $$\EE (\| X \|^2_2)$$ should be connected with $$\sigma^2$$.

7. Norm can increase or decrease compared to the average value.

8. A ratio based measure between actual value and average value would be useful.

9. What is the probability that the norm increases beyond a given factor?

10. What is the probability that the norm reduces beyond a given factor?

These bounds are stated formally in the following theorem.

Theorem 7.27

Suppose that $$X = [X_1, X_2,\dots, X_N]$$, where each $$X_i$$ is i.i.d. with $$X_i \sim \Sub(c^2)$$. Then

(7.3)#$\EE (\| X \|_2^2 ) = N \sigma^2.$

Moreover, for any $$\alpha \in (0,1)$$ and for any $$\beta \in [\frac{c^2}{\sigma^2}, \beta_{\max}]$$, there exists a constant $$\kappa^* \geq 4$$ depending only on $$\beta_{\max}$$ and the ratio $$\frac{\sigma^2}{c^2}$$ such that

(7.4)#$\PP (\| X \|_2^2 \leq \alpha N \sigma^2) \leq \exp \left ( - \frac{ N (1 - \alpha)^2}{\kappa^*} \right )$

and

(7.5)#$\PP (\| X \|_2^2 \geq \beta N \sigma^2) \leq \exp \left ( - \frac{ N (\beta - 1)^2}{\kappa^*} \right )$
• First equation gives the average value of the square of the norm.

• Second inequality states the upper bound on the probability that norm could reduce beyond a factor given by $$\alpha < 1$$.

• Third inequality states the upper bound on the probability that norm could increase beyond a factor given by $$\beta > 1$$.

• Note that if $$X_i$$ are strictly subgaussian, then $$c=\sigma$$. Hence $$\beta \in (1, \beta_{\max})$$.

Proof. Since $$X_i$$ are independent hence

$\EE \left [ \| X \|_2^2 \right ] = \EE \left [ \sum_{i=1}^N X_i^2 \right ] = \sum_{i=1}^N \EE \left [ X_i^2 \right ] = N \sigma^2.$

This proves the first part.

Now let us look at (7.5).

By applying Markov’s inequality for any $$\lambda > 0$$ we have:

$\begin{split} \PP (\| X \|_2^2 \geq \beta N \sigma^2) &= \PP \left ( \exp (\lambda \| X \|_2^2 ) \geq \exp (\lambda \beta N \sigma^2) \right) \\ & \leq \frac{\EE (\exp (\lambda \| X \|_2^2 )) }{\exp (\lambda \beta N \sigma^2)} = \frac{\prod_{i=1}^{N}\EE (\exp ( \lambda X_i^2 )) }{\exp (\lambda \beta N \sigma^2)} \end{split}$

Since $$X_i$$ is $$c$$-subgaussian, hence from \cref {lem:subgaussian_exp_square_moment} we have

$\EE (\exp ( \lambda X_i^2 )) = \EE \left (\exp \left ( \frac{2 c^2\lambda X_i^2}{2 c^2} \right ) \right) \leq \frac{1}{\sqrt{1 - 2 c^2 \lambda}}.$

Thus:

$\prod_{i=1}^{N}\EE (\exp ( \lambda X_i^2 )) \leq \left ( \frac{1}{\sqrt{1 - 2 c^2 \lambda}} \right )^{\frac{N}{2}}.$

Putting it back we get:

$\PP (\| X \|_2^2 \geq \beta N \sigma^2) \leq \left (\frac{\exp (- 2\lambda \beta \sigma^2)}{\sqrt{1 - 2 c^2 \lambda}}\right )^{\frac{N}{2}}.$

Since above is valid for all $$\lambda > 0$$, we can minimize the R.H.S. over $$\lambda$$ by setting the derivative w.r.t. $$\lambda$$ to $$0$$.

Thus we get optimum $$\lambda$$ as:

$\lambda = \frac{\beta \sigma^2 - c^2 }{2 c^2 \sigma^2 (1 + \beta)}.$

Plugging this back we get:

$\PP (\| X \|_2^2 \geq \beta N \sigma^2) \leq \left ( \beta \frac{\sigma^2}{c^2} \exp \left ( 1 - \beta \frac{\sigma^2}{c^2} \right ) \right ) ^{\frac{N}{2}}.$

Similarly proceeding for (7.4) we get

$\PP (\| X \|_2^2 \leq \alpha N \sigma^2) \leq \left ( \alpha \frac{\sigma^2}{c^2} \exp \left ( 1 - \alpha \frac{\sigma^2}{c^2} \right ) \right ) ^{\frac{N}{2}}.$

We need to simplify these equations. We will do some jugglery now. Consider the function

$f(\gamma) = \frac{2 (\gamma - 1)^2}{(\gamma-1) - \ln \gamma} \Forall \gamma > 0.$

By differentiating twice, we can show that this is a strictly increasing function. Let us have $$\gamma \in (0, \gamma_{\max}]$$. Define

$\kappa^* = \max \left ( 4, \frac{2 (\gamma_{\max} - 1)^2}{(\gamma_{\max}-1) - \ln \gamma_{\max}} \right )$

Clearly

$\kappa^* \geq \frac{2 (\gamma - 1)^2}{(\gamma-1) - \ln \gamma} \Forall \gamma \in (0, \gamma_{\max}].$

Which gives us:

$\ln (\gamma) \leq (\gamma - 1) - \frac{2 (\gamma - 1)^2}{\kappa^*}.$

Hence by exponentiating on both sides we get:

$\gamma \leq \exp \left [ (\gamma - 1) - \frac{2 (\gamma - 1)^2}{\kappa^*} \right ].$

By slight manipulation we get:

$\gamma \exp ( 1 - \gamma) \leq \exp \left [ \frac{2 (1 - \gamma )^2}{\kappa^*} \right ].$

We now choose

$\gamma = \alpha \frac{\sigma^2}{c^2}$

Substituting we get:

$\PP (\| X \|_2^2 \leq \alpha N \sigma^2) \leq \left ( \gamma \exp \left ( 1 - \gamma \right ) \right ) ^{\frac{N}{2}} \leq \exp \left [ \frac{N (1 - \gamma )^2}{\kappa^*} \right ] .$

Finally

$c \geq \sigma \implies \frac{\sigma^2}{c^2}\leq 1 \implies \gamma \leq \alpha \implies 1 - \gamma \geq 1 - \alpha$

Thus we get

$\PP (\| X \|_2^2 \leq \alpha N \sigma^2) \leq \exp \left [ \frac{N (1 - \alpha )^2}{\kappa^*} \right ] .$

Similarly by choosing $$\gamma = \beta \frac{\sigma^2}{c^2}$$ proves the other bound.

We can now map $$\gamma_{\max}$$ to some $$\beta_{\max}$$ by:

$\gamma_{\max} = \frac {\beta_{\max} \sigma^2 }{c^2}.$

This result tells us that given a vector with entries drawn from a subgaussian distribution, we can expect the norm of the vector to concentrate around its expected value $$N\sigma^2$$.