7.9. Subgaussian Distributions#
In this section, we review subgaussian distributions and random vectors and matrices whose entries are drawn from subgaussian distributions.
Examples of subgaussian distributions include
Gaussian distribution
Rademacher distribution taking values \(\pm \frac{1}{\sqrt{M}}\)
Any zero mean distribution with a bounded support
A random variable \(X\) is called subgaussian if there exists a constant \(c > 0\) such that

\[ \EE [\exp(X t) ] \leq \exp \left (\frac{c^2 t^2}{2} \right ) \]

holds for all \(t \in \RR\). We use the notation \(X \sim \Sub (c^2)\) to denote that \(X\) satisfies the constraint (7.2). We also say that \(X\) is \(c\)-subgaussian.
\(\EE [\exp(X t) ]\) is the moment generating function of \(X\).
\(\exp \left (\frac{c^2 t^2}{2} \right )\) is the moment generating function of a Gaussian random variable with variance \(c^2\).
The definition means that for a subgaussian variable \(X\), its M.G.F. is bounded by the M.G.F. of a Gaussian random variable distributed as \(\NNN(0, c^2)\).
(Gaussian r.v. as subgaussian r.v.)
Consider a zero-mean Gaussian random variable \(X \sim \NNN(0, \sigma^2)\) with variance \(\sigma^2\). Then

\[ \EE [\exp(X t) ] = \exp \left (\frac{\sigma^2 t^2}{2} \right ). \]

Putting \(c = \sigma\), we see that (7.2) is satisfied (with equality). Hence \(X \sim \Sub(\sigma^2)\); i.e., \(X\) is \(\sigma\)-subgaussian.
(Rademacher distribution)
Consider \(X\) with

\[ \PP(X = 1) = \PP(X = -1) = \frac{1}{2}; \]

i.e., \(X\) takes the value \(1\) with probability \(0.5\) and the value \(-1\) with probability \(0.5\).
Then

\[ \EE [\exp(X t) ] = \frac{e^{t} + e^{-t}}{2} = \cosh t = \sum_{n = 0}^{\infty} \frac{t^{2 n}}{(2 n)!} \leq \sum_{n = 0}^{\infty} \frac{t^{2 n}}{2^n n!} = \exp \left (\frac{t^2}{2} \right ). \]
Thus \(X \sim \Sub(1)\) or \(X\) is 1-subgaussian.
(Uniform distribution)
Consider \(X\) uniformly distributed over the interval \([-a, a]\) for some \(a > 0\); i.e.,

\[ f_X(x) = \begin{cases} \frac{1}{2 a} & x \in [-a, a]\\ 0 & \text{otherwise}. \end{cases} \]

Then

\[ \EE [\exp(X t) ] = \frac{1}{2 a} \int_{-a}^{a} e^{x t} d x = \frac{e^{a t} - e^{-a t}}{2 a t} = \sum_{n = 0}^{\infty} \frac{(a t)^{2 n}}{(2 n + 1)!}. \]

But \((2n+1)! \geq n! \, 2^n\). Hence we have

\[ \sum_{n = 0}^{\infty} \frac{(a t)^{2 n}}{(2 n + 1)!} \leq \sum_{n = 0}^{\infty} \frac{1}{n!} \left ( \frac{a^2 t^2}{2} \right )^{n} = \exp \left (\frac{a^2 t^2}{2} \right ). \]

Thus

\[ \EE [\exp(X t) ] \leq \exp \left (\frac{a^2 t^2}{2} \right ). \]
Hence \(X \sim \Sub(a^2)\) or \(X\) is \(a\)-subgaussian.
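As a quick numerical sanity check of the two examples above, the following Python sketch (with an illustrative choice of \(a = 2\)) compares the exact MGFs of the Rademacher and uniform variables with the Gaussian bound \(\exp(c^2 t^2 / 2)\) on a grid of \(t\) values.

```python
# Numerical check of the subgaussian MGF bounds for the examples above.
import numpy as np

t = np.linspace(-5, 5, 201)

# Rademacher: E[exp(Xt)] = cosh(t); claimed bound with c = 1.
rademacher_mgf = np.cosh(t)
rademacher_bound = np.exp(t**2 / 2)
assert np.all(rademacher_mgf <= rademacher_bound + 1e-12)

# Uniform on [-a, a]: E[exp(Xt)] = sinh(a t) / (a t); claimed bound with c = a.
a = 2.0  # illustrative choice
with np.errstate(divide="ignore", invalid="ignore"):
    uniform_mgf = np.where(t == 0, 1.0, np.sinh(a * t) / (a * t))
uniform_bound = np.exp(a**2 * t**2 / 2)
assert np.all(uniform_mgf <= uniform_bound + 1e-12)

print("MGF bounds hold on the grid for the Rademacher and uniform examples.")
```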
(Random variable with bounded support)
Consider \(X\) as a zero-mean, bounded random variable; i.e.,

\[ \PP(|X| \leq B) = 1 \]

for some \(B \in \RR^+\) and

\[ \EE (X) = 0. \]

Then the following upper bound holds:

\[ \EE [\exp(X t) ] \leq \exp \left (\frac{B^2 t^2}{2} \right ) \Forall t \in \RR. \]

This result can be proven with some advanced calculus. Thus \(X \sim \Sub(B^2)\); i.e., \(X\) is \(B\)-subgaussian.
There are some useful properties of subgaussian random variables.
(Mean and variance of subgaussian random variables)
If \(X \sim \Sub(c^2)\), then

\[ \EE (X) = 0 \]

and

\[ \EE (X^2) \leq c^2. \]

Thus subgaussian random variables are always zero mean, and their variance is always bounded by the variance of the bounding Gaussian distribution.
Proof. We proceed as follows:
Note that
\[ \sum_{n = 0}^{\infty} \frac{t^n}{n!} \EE (X^n) = \EE \left( \sum_{n = 0}^{\infty} \frac{(X t)^n}{n!} \right ) = \EE \left ( \exp(X t) \right ). \]Since \(X \sim \Sub(c^2)\), we have
\[ \sum_{n = 0}^{\infty} \frac{t^n}{n!} \EE (X^n) \leq \exp \left ( \frac{c^2 t^2}{2} \right) = \sum_{n = 0}^{\infty} \frac{c^{2 n} t^{2 n}}{2^n n!}. \]Restating
\[ \EE (X) t + \EE (X^2) \frac{t^2}{2!} \leq \frac{c^2 t^2}{2} + \smallO (t^2) \text{ as } t \to 0. \]Dividing throughout by \(t > 0\) and letting \(t \to 0\) we get \(\EE (X) \leq 0\).
Dividing throughout by \(t < 0\) and letting \(t \to 0\) we get \(\EE (X) \geq 0\).
Thus \(\EE (X) = 0\). So \(\Var(X) = \EE (X^2)\).
Now we are left with
\[ \EE (X^2) \frac{t^2}{2!} \leq \frac{c^2 t^2}{2} + \smallO (t^2) \text{ as } t \to 0. \]Dividing throughout by \(t^2\) and letting \(t \to 0\) we get \(\Var(X) \leq c^2\).
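The following small Monte Carlo sketch illustrates the theorem for the uniform distribution on \([-a, a]\) from the earlier example, which is \(a\)-subgaussian; the sample size and the choice \(a = 2\) are arbitrary. The sample mean is close to \(0\) and the sample variance is close to \(a^2/3\), well below the bound \(c^2 = a^2\).

```python
# Monte Carlo illustration: zero mean and variance bounded by c^2.
import numpy as np

rng = np.random.default_rng(0)
a = 2.0                                       # X ~ Uniform[-a, a] is a-subgaussian
samples = rng.uniform(-a, a, size=1_000_000)

print("sample mean     :", samples.mean())    # close to 0
print("sample variance :", samples.var())     # close to a^2 / 3
print("bound c^2 = a^2 :", a**2)              # variance stays below this
```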
Subgaussian variables have a linear structure.
(Linearity of subgaussian variables)
If \(X \sim \Sub(c^2)\) i.e. \(X\) is \(c\)-subgaussian, then for any \(\alpha \in \RR\), the r.v. \(\alpha X\) is \(|\alpha| c\)-subgaussian.
If \(X_1, X_2\) are r.v. such that \(X_i\) is \(c_i\)-subgaussian, then \(X_1 + X_2\) is \(c_1 + c_2\)-subgaussian.
Proof. Scalar multiplication:
Let \(X\) be \(c\)-subgaussian.
Then
\[ \EE [\exp(X t) ] \leq \exp \left (\frac{c^2 t^2}{2} \right ). \]Now for \(\alpha \neq 0\), we have
\[ \EE [\exp(\alpha X t) ] = \EE [\exp( X (\alpha t)) ] \leq \exp \left (\frac{c^2 (\alpha t)^2}{2} \right ) = \exp \left (\frac{(|\alpha | c)^2 t^2}{2} \right ). \]Hence \(\alpha X\) is \(|\alpha| c\)-subgaussian.
Addition:
Consider \(X_1\) as \(c_1\)-subgaussian and \(X_2\) as \(c_2\)-subgaussian.
Thus
\[ \EE (\exp(X_i t) ) \leq \exp \left (\frac{c_i^2 t^2}{2} \right ). \]Let \(p, q >1 \) be two numbers s.t. \(\frac{1}{p} + \frac{1}{q} = 1\).
Using Hölder’s inequality, we have
\[\begin{split} \EE (\exp((X_1 + X_2)t) ) &\leq \left [ \EE (\exp(X_1 t) )^p\right ]^{\frac{1}{p}} \left [ \EE (\exp(X_2 t) )^q\right ]^{\frac{1}{q}}\\ &= \left [ \EE (\exp( p X_1 t) )\right ]^{\frac{1}{p}} \left [ \EE (\exp(q X_2 t) )\right ]^{\frac{1}{q}}\\ &\leq \left [ \exp \left (\frac{(p c_1)^2 t^2}{2} \right ) \right ]^{\frac{1}{p}} \left [ \exp \left (\frac{(q c_2)^2 t^2}{2} \right ) \right ]^{\frac{1}{q}}\\ &= \exp \left ( \frac{t^2}{2} ( p c_1^2 + q c_2^2) \right ) \\ &= \exp \left ( \frac{t^2}{2} ( p c_1^2 + \frac{p}{p - 1} c_2^2) \right ). \end{split}\]Since this is valid for any \(p > 1\), we can minimize the r.h.s. over \(p > 1\).
It suffices to minimize the term
\[ r = p c_1^2 + \frac{p}{p - 1} c_2^2. \]We have
\[ \frac{\partial r}{\partial p} = c_1^2 - \frac{1}{(p-1)^2}c_2^2. \]Equating it to 0 gives us
\[ p - 1 = \frac{c_2}{c_1} \implies p = \frac{c_1 + c_2}{c_1} \implies \frac{p}{p -1} = \frac{c_1 + c_2}{c_2}. \]Taking the second derivative, we can verify that this is indeed a minimum.
Thus
\[ r_{\min} = (c_1 + c_2)^2. \]Hence we have the result
\[ \EE (\exp((X_1 + X_2)t) ) \leq \exp \left (\frac{(c_1+ c_2)^2 t^2}{2} \right ). \]Thus \(X_1 + X_2\) is \((c_1 + c_2)\)-subgaussian.
If \(X_1\) and \(X_2\) are independent, then \(X_1 + X_2\) is \(\sqrt{c_1^2 + c_2^2}\)-subgaussian, since in that case the MGF of the sum factors into the product of the individual MGFs.
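The following hedged Monte Carlo sketch illustrates both bounds for \(X_1 \sim \NNN(0, 1)\) and an independent Rademacher \(X_2\) (both \(1\)-subgaussian); the sample size and the \(t\) values are arbitrary. The estimated MGF of the sum stays below the independent-case bound, which in turn lies below the general \((c_1 + c_2)\) bound.

```python
# Monte Carlo check of the additivity results for subgaussian variables.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x1 = rng.standard_normal(n)                       # c1 = 1
x2 = rng.choice([-1.0, 1.0], size=n)              # Rademacher, c2 = 1
s = x1 + x2

for t in (0.5, 1.0, 2.0):
    mgf = np.mean(np.exp(s * t))                  # estimate of E[exp((X1 + X2) t)]
    dep_bound = np.exp((1 + 1) ** 2 * t**2 / 2)   # general (c1 + c2) bound
    ind_bound = np.exp((1**2 + 1**2) * t**2 / 2)  # sqrt(c1^2 + c2^2) bound
    print(f"t={t}: MGF ~ {mgf:.3f}  <=  {ind_bound:.3f}  <=  {dep_bound:.3f}")
```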
If \(X\) is \(c\)-subgaussian then, naturally, \(X\) is \(d\)-subgaussian for any \(d \geq c\). A natural question is: what is the minimum value of \(c\) such that \(X\) is \(c\)-subgaussian?
(Subgaussian moment)
For a centered random variable \(X\), the subgaussian moment of \(X\), denoted by \(\sigma(X)\), is defined as

\[ \sigma(X) = \inf \left \{ c \geq 0 \, : \, \EE [\exp(X t) ] \leq \exp \left (\frac{c^2 t^2}{2} \right ) \Forall t \in \RR \right \}. \]
\(X\) is subgaussian if and only if \(\sigma(X)\) is finite.
We can also show that \(\sigma(\cdot)\) is a norm on the space of subgaussian random variables. And this normed space is complete.
For a centered Gaussian r.v. \(X \sim \NNN(0, \sigma^2)\), the subgaussian moment coincides with the standard deviation: \(\sigma(X) = \sigma\).
Sometimes it is useful to consider a more restrictive class of subgaussian random variables.
(Strictly subgaussian distribution)
A random variable \(X\) is called strictly subgaussian if \(X \sim \Sub(\sigma^2)\) where \(\sigma^2 = \EE(X^2)\); i.e., the inequality

\[ \EE [\exp(X t) ] \leq \exp \left (\frac{\sigma^2 t^2}{2} \right ) \]

holds true for all \(t \in \RR\).
We will denote strictly subgaussian variables by \(X \sim \SSub (\sigma^2)\).
(Gaussian distribution)
If \(X \sim \NNN (0, \sigma^2)\) then \(X \sim \SSub(\sigma^2)\).
7.9.1. Characterization#
We quickly review Markov’s inequality which will help us establish the results in this subsection.
Let \(X\) be a non-negative random variable and let \(t > 0\). Then

\[ \PP (X \geq t) \leq \frac{\EE (X)}{t}. \]
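As a small illustration (the exponential distribution and the \(t\) values here are arbitrary choices), the following sketch compares the empirical tail of a mean-one exponential variable with the Markov bound \(\EE(X)/t\).

```python
# Numerical illustration of Markov's inequality for a nonnegative variable.
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=1_000_000)    # nonnegative, E(X) = 1

for t in (0.5, 1.0, 2.0, 4.0):
    empirical_tail = np.mean(x >= t)
    markov_bound = x.mean() / t
    print(f"t={t}: P(X >= t) ~ {empirical_tail:.4f}  <=  Markov bound {markov_bound:.4f}")
```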
For a centered random variable \(X\), the following statements are equivalent:

1. Moment generating function condition: there exists \(c > 0\) such that
\[ \EE [\exp(X t) ] \leq \exp \left (\frac{c^2 t^2}{2} \right ) \Forall t \in \RR. \]
2. Subgaussian tail estimate: there exists \(a > 0\) such that
\[ \PP(|X| \geq \lambda) \leq 2 \exp (- a \lambda^2) \Forall \lambda > 0. \]
3. \(\psi_2\)-condition: there exists some \(b > 0\) such that
\[ \EE [\exp (b X^2) ] \leq 2. \]
Proof. \((1) \implies (2)\)
Using Markov’s inequality, for any \(t > 0\) we have
\[\begin{split} \PP(X \geq \lambda) &= \PP (t X \geq t \lambda) = \PP \left(e^{t X} \geq e^{t \lambda} \right )\\ &\leq \frac{\EE \left ( e^{t X} \right ) }{e^{t \lambda}} \leq \exp \left ( - t \lambda + \frac{c^2 t^2}{2}\right ) \Forall t > 0. \end{split}\]Since this is valid for every \(t > 0\), it holds in particular for the value of \(t\) that minimizes the r.h.s.
The minimum value is obtained for \(t = \frac{\lambda}{c^2}\).
Thus we get
\[ \PP(X \geq \lambda) \leq \exp \left ( - \frac{\lambda^2}{2 c^2}\right ). \]Since \(X\) is \(c\)-subgaussian, \(-X\) is also \(c\)-subgaussian.
Hence
\[ \PP (X \leq - \lambda) = \PP (-X \geq \lambda) \leq \exp \left ( - \frac{\lambda^2}{2 c^2}\right ). \]Thus
\[ \PP(|X| \geq \lambda) = \PP (X \leq - \lambda) + \PP(X \geq \lambda) \leq 2 \exp \left ( - \frac{\lambda^2}{2 c^2}\right ). \]Thus we can choose \(a = \frac{1}{2 c^2}\) to complete the proof.
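The following Monte Carlo sketch illustrates the tail estimate just derived for \(X \sim \NNN(0, 1)\), which is \(1\)-subgaussian; the sample size and the \(\lambda\) values are arbitrary.

```python
# Monte Carlo check of the two-sided subgaussian tail bound.
import numpy as np

rng = np.random.default_rng(3)
c = 1.0
x = rng.normal(0.0, c, size=2_000_000)

for lam in (0.5, 1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(x) >= lam)
    bound = 2 * np.exp(-lam**2 / (2 * c**2))
    print(f"lambda={lam}: P(|X| >= lambda) ~ {empirical:.5f}  <=  {bound:.5f}")
```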
\((2)\implies (3)\)
TODO PROVE THIS
\((3)\implies (1)\)
TODO PROVE THIS
7.9.2. More Properties#
We also have the following result on the exponential moment of a subgaussian random variable.
Suppose \(X \sim \Sub(c^2)\). Then

\[ \EE \left [\exp \left ( \frac{\lambda X^2}{2 c^2} \right ) \right ] \leq \frac{1}{\sqrt{1 - \lambda}} \]

for any \(\lambda \in [0,1)\).
Proof. We are given that

\[ \EE [\exp(X t) ] \leq \exp \left (\frac{c^2 t^2}{2} \right ) \Forall t \in \RR. \]

Fix \(\lambda \in (0, 1)\); the case \(\lambda = 0\) is trivial. Multiplying on both sides with \(\exp \left ( -\frac{c^2 t^2}{2 \lambda} \right )\):

\[ \EE \left [\exp \left (X t - \frac{c^2 t^2}{2 \lambda} \right ) \right ] \leq \exp \left (- \frac{c^2 (1 - \lambda) t^2}{2 \lambda} \right ). \]

Integrating on both sides w.r.t. \(t\), and using the identity \(X t - \frac{c^2 t^2}{2 \lambda} = \frac{\lambda X^2}{2 c^2} - \frac{c^2}{2 \lambda} \left ( t - \frac{\lambda X}{c^2} \right )^2\) together with the Gaussian integral, we get:

\[ \frac{\sqrt{2 \pi \lambda}}{c} \EE \left [\exp \left ( \frac{\lambda X^2}{2 c^2} \right ) \right ] \leq \int_{-\infty}^{\infty} \exp \left (- \frac{c^2 (1 - \lambda) t^2}{2 \lambda} \right ) d t = \frac{\sqrt{2 \pi \lambda}}{c \sqrt{1 - \lambda}}, \]

which reduces to:

\[ \EE \left [\exp \left ( \frac{\lambda X^2}{2 c^2} \right ) \right ] \leq \frac{1}{\sqrt{1 - \lambda}}, \]

which completes the proof.
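As a numerical illustration of this bound, consider the Rademacher variable (for which \(X^2 = 1\) almost surely and \(c = 1\)), so the left-hand side equals \(\exp(\lambda/2)\) exactly; the \(\lambda\) values below are arbitrary.

```python
# Exact check of the exponential square moment bound for a Rademacher variable.
import numpy as np

c = 1.0
for lam in (0.1, 0.5, 0.9, 0.99):
    lhs = np.exp(lam * 1.0 / (2 * c**2))      # X^2 = 1 almost surely
    rhs = 1.0 / np.sqrt(1.0 - lam)
    print(f"lambda={lam}: E[exp(lam X^2 / (2 c^2))] = {lhs:.4f}  <=  {rhs:.4f}")
```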
7.9.3. Subgaussian Random Vectors#
The linearity property of subgaussian r.v.s can be extended to random vectors as well. This is stated more formally in the following result.
Suppose that \(X = [X_1, X_2,\dots, X_N]\), where each \(X_i\) is i.i.d. with \(X_i \sim \Sub(c^2)\). Then for any \(\alpha \in \RR^N\), \( \langle X, \alpha \rangle \sim \Sub(c^2 \| \alpha \|^2_2)\). Similarly if each \(X_i \sim \SSub(\sigma^2)\), then for any \(\alpha \in \RR^N\), \(\langle X, \alpha \rangle \sim \SSub(\sigma^2 \| \alpha \|^2_2)\).
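The following sketch checks the first claim numerically for i.i.d. Rademacher entries (each \(1\)-subgaussian) and an arbitrary vector \(\alpha\): the exact MGF of \(\langle X, \alpha \rangle\) is \(\prod_i \cosh(\alpha_i t)\), which stays below \(\exp(\|\alpha\|_2^2 t^2 / 2)\).

```python
# Numerical check of the subgaussian bound for <X, alpha> with Rademacher entries.
import numpy as np

alpha = np.array([0.5, -1.0, 2.0, 0.25])             # arbitrary weights
t = np.linspace(-3, 3, 121)

mgf = np.prod(np.cosh(np.outer(t, alpha)), axis=1)   # exact MGF of <X, alpha>
bound = np.exp(np.sum(alpha**2) * t**2 / 2)          # Sub(c^2 ||alpha||^2) bound, c = 1
assert np.all(mgf <= bound + 1e-12)
print("MGF of <X, alpha> stays below the subgaussian bound on the grid.")
```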
Norm of a subgaussian random vector
Let \(X\) be a random vector where each \(X_i\) is i.i.d. with \(X_i \sim \Sub (c^2)\).
Consider the \(l_2\) norm \(\| X \|_2\). It is a random variable in its own right.
It would be useful to understand the average behavior of the norm.
Suppose \(N=1\). Then \(\| X \|_2 = |X_1|\).
Also \(\| X \|^2_2 = X_1^2\). Thus \(\EE (\| X \|^2_2) = \EE (X_1^2) = \sigma^2\), where \(\sigma^2\) denotes the common second moment of the entries \(X_i\).
This suggests that, in general, \(\EE (\| X \|^2_2)\) should be connected with \(\sigma^2\).
The norm can increase or decrease compared to this average value, so a ratio-based comparison between the actual value and the average value is useful.
What is the probability that the norm increases beyond a given factor?
What is the probability that the norm decreases beyond a given factor?
These bounds are stated formally in the following theorem.
Suppose that \(X = [X_1, X_2,\dots, X_N]\), where each \(X_i\) is i.i.d. with \(X_i \sim \Sub(c^2)\) and \(\EE (X_i^2) = \sigma^2\). Then

\[ \EE (\| X \|^2_2) = N \sigma^2. \]
Moreover, for any \(\alpha \in (0,1)\) and for any \(\beta \in [\frac{c^2}{\sigma^2}, \beta_{\max}]\), there exists a constant \(\kappa^* \geq 4\) depending only on \(\beta_{\max}\) and the ratio \(\frac{\sigma^2}{c^2}\) such that
and
The first equation gives the average value of the squared norm.
The second inequality gives an upper bound on the probability that the squared norm falls below the average by a factor \(\alpha < 1\).
The third inequality gives an upper bound on the probability that the squared norm exceeds the average by a factor \(\beta > 1\).
Note that if \(X_i\) are strictly subgaussian, then \(c=\sigma\). Hence \(\beta \in (1, \beta_{\max})\).
Proof. Since the \(X_i\) are independent and identically distributed with \(\EE (X_i^2) = \sigma^2\), we have

\[ \EE (\| X \|^2_2) = \sum_{i = 1}^{N} \EE (X_i^2) = N \sigma^2. \]
This proves the first part.
Now let us look at (7.5).
By applying Markov’s inequality for any \(\lambda > 0\) we have:
Since \(X_i\) is \(c\)-subgaussian, the exponential square moment bound established above gives us
Thus:
Putting it back we get:
Since the above is valid for all \(\lambda > 0\), we can minimize the r.h.s. over \(\lambda\) by setting its derivative w.r.t. \(\lambda\) to \(0\).
Thus we get optimum \(\lambda\) as:
Plugging this back we get:
Similarly proceeding for (7.4) we get
We need to simplify these expressions. We will now do some algebraic manipulation. Consider the function
By differentiating twice, we can show that this is a strictly increasing function. Let us have \(\gamma \in (0, \gamma_{\max}]\). Define
Clearly
Which gives us:
Hence by exponentiating on both sides we get:
By slight manipulation we get:
We now choose
Substituting we get:
Finally
Thus we get
Similarly, choosing \(\gamma = \beta \frac{\sigma^2}{c^2}\) proves the other bound.
We can now map \(\gamma_{\max}\) to some \(\beta_{\max}\) by:
This result tells us that, given a vector with entries drawn i.i.d. from a subgaussian distribution, we can expect the squared norm \(\| X \|^2_2\) to concentrate around its expected value \(N\sigma^2\).
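The following hedged simulation sketch illustrates this concentration for entries uniform on \([-a, a]\) (so \(\sigma^2 = a^2/3\)); the parameters are arbitrary and the exact constants of the tail bounds above are not reproduced, only the qualitative behavior as \(N\) grows.

```python
# Simulation of the concentration of ||X||_2^2 around N * sigma^2.
import numpy as np

rng = np.random.default_rng(4)
a = 1.0
sigma2 = a**2 / 3
trials = 5_000

for N in (10, 100, 1000):
    x = rng.uniform(-a, a, size=(trials, N))
    sq_norms = np.sum(x**2, axis=1)
    p_low = np.mean(sq_norms <= 0.8 * N * sigma2)    # squared norm shrinks by factor 0.8
    p_high = np.mean(sq_norms >= 1.2 * N * sigma2)   # squared norm grows by factor 1.2
    print(f"N={N:5d}: mean ||X||^2 / (N sigma^2) = {sq_norms.mean() / (N * sigma2):.3f}, "
          f"P(low) ~ {p_low:.4f}, P(high) ~ {p_high:.4f}")
```

Both deviation probabilities shrink rapidly as \(N\) grows, which is the qualitative content of the theorem above.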