7.4. Basic Inequalities#

Probability theory is full of inequalities, and many results are derived by applying them. This section collects some of the basic ones.

A good reference is the Wikipedia list of inequalities; in particular, see the section on probability inequalities.

7.4.1. Markov’s inequality#

Theorem 7.18

Let \(X\) be a non-negative random variable and \(a > 0\). Then

\[ \PP (X \geq a) \leq \frac{\EE (X)}{a}. \]
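As a quick sanity check, the bound can be compared with an empirical tail probability. The sketch below is an illustration only, assuming NumPy; the choice of an Exponential(1) distribution (a non-negative r.v. with \(\EE(X) = 1\)) and the thresholds \(a\) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
# Exponential(1) is non-negative with E(X) = 1.
x = rng.exponential(scale=1.0, size=100_000)

for a in [1.0, 2.0, 5.0]:
    empirical = np.mean(x >= a)   # estimate of P(X >= a)
    bound = 1.0 / a               # Markov bound E(X) / a with E(X) = 1
    print(f"a = {a}: P(X >= a) ~ {empirical:.4f} <= {bound:.4f}")
```

The Markov bound is typically loose; its value lies in requiring nothing beyond non-negativity and a finite mean.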

7.4.2. Chebyshev’s inequality#

Theorem 7.19

Let \(X\) be a random variable with finite mean \(\mu\) and finite non-zero variance \(\sigma^2\). Then for any real number \(k > 0\), the following holds

\[ \PP (| X - \mu | \geq k \sigma) \leq \frac{1}{k^2}. \]

Proof. Apply Markov's inequality to the non-negative random variable \((X - \mu)^2\) with \(a = k^2 \sigma^2\):

\[ \PP (| X - \mu | \geq k \sigma) = \PP \left ( (X - \mu)^2 \geq k^2 \sigma^2 \right ) \leq \frac{\EE \left [ (X - \mu)^2 \right ]}{k^2 \sigma^2} = \frac{\sigma^2}{k^2 \sigma^2} = \frac{1}{k^2}. \]

Choosing \(k = \sqrt{2}\), we see that at least half of the values lie in the interval \((\mu - \sqrt{2} \sigma, \mu + \sqrt{2} \sigma)\).
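A small numerical check, a sketch assuming NumPy and an arbitrary choice of a standard normal sample, compares the empirical deviation probability with the \(1/k^2\) bound:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)      # mu = 0, sigma = 1
mu, sigma = x.mean(), x.std()

for k in [1.5, 2.0, 3.0]:
    empirical = np.mean(np.abs(x - mu) >= k * sigma)   # P(|X - mu| >= k sigma)
    print(f"k = {k}: {empirical:.4f} <= {1 / k**2:.4f}")
```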

7.4.3. Boole’s inequality#

This is also known as the union bound. It states that for any finite or countable collection of events \(A_1, A_2, \dots\),

\[ \PP \left ( \bigcup_i A_i \right ) \leq \sum_i \PP (A_i). \]

7.4.4. Fano’s inequality#

7.4.5. Cramér–Rao inequality#

7.4.6. Hoeffding’s inequality#

This inequality provides an upper bound on the probability that the sum of bounded independent random variables deviates from its expected value.

We start with a version of the inequality for i.i.d. Bernoulli random variables.

Theorem 7.20

Let \(X_1, \dots, X_n\) be i.i.d. Bernoulli random variables with probability of success \(p\), so that \(\EE \left [\sum_i X_i \right] = p n \). The probability that the sum deviates from its mean
by \(\epsilon n\), for any \(\epsilon > 0\), is bounded as

\[ \PP \left (\sum_i X_i \leq (p - \epsilon) n \right ) \leq \exp ( -2 \epsilon^2 n) \]

and

\[ \PP \left (\sum_i X_i \geq (p + \epsilon) n \right ) \leq \exp ( -2 \epsilon^2 n). \]

The two inequalities can be summarized as

\[ \PP \left [ (p - \epsilon) n \leq \sum_i X_i \leq (p + \epsilon) n \right ] \geq 1 - 2\exp ( -2 \epsilon^2 n). \]

The inequality states that the number of successes is concentrated around its mean, with tails that decay exponentially in \(n\).
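The following simulation is a sketch (assuming NumPy; the values of \(p\), \(n\), \(\epsilon\) and the number of runs are arbitrary) comparing the empirical frequency of the lower-tail event with the bound \(\exp(-2 \epsilon^2 n)\):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, eps, runs = 0.5, 200, 0.05, 50_000

# Each entry is the sum of n Bernoulli(p) variables, i.e. a Binomial(n, p) draw.
sums = rng.binomial(n, p, size=runs)

empirical = np.mean(sums <= (p - eps) * n)   # estimate of P(sum <= (p - eps) n)
bound = np.exp(-2 * eps**2 * n)              # Hoeffding bound
print(f"empirical {empirical:.4f} <= bound {bound:.4f}")
```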

We now state the inequality for the general case of independent, (almost surely) bounded random variables.

Theorem 7.21

Let \(X_1, \dots, X_n\) be independent r.v.s. Assume that \(X_i\) are almost surely bounded; i.e.:

\[ \PP \left ( X_i \in [ a_i, b_i] \right ) = 1, \quad 1 \leq i \leq n. \]

Define the empirical mean of the variables as

\[ \overline{X} \triangleq \frac{1}{n} \left ( X_1 + \dots + X_n \right). \]

Then the probability that \(\overline{X}\) deviates from its mean \(\EE(\overline{X})\) by an amount \(t > 0\) is bounded by the following inequalities:

\[ \PP \left ( \overline{X} - \EE(\overline{X}) \geq t \right ) \leq \exp \left ( - \frac{2 n^2 t^2}{\sum_{i = 1}^n (b_i - a_i)^2} \right) \]

and

\[ \PP \left ( \overline{X} - \EE(\overline{X}) \leq -t \right ) \leq \exp \left ( - \frac{2 n^2 t^2}{\sum_{i = 1}^n (b_i - a_i)^2} \right). \]

Together, we have

\[ \PP \left ( \left | \overline{X} - \EE(\overline{X}) \right | \geq t \right ) \leq 2\exp \left ( - \frac{2 n^2 t^2}{\sum_{i = 1}^n (b_i - a_i)^2} \right). \]

Note that we don’t require \(X_i\) to be identically distributed in this formulation. In the special case where the \(X_i\) are i.i.d. uniform r.v.s over \([0, 1]\), we have \(\EE(\overline{X}) = \EE(X_i) = \frac{1}{2}\) and

\[ \PP \left ( \left | \overline{X} - \frac{1}{2}\right | \geq t \right ) \leq 2\exp \left ( - 2 n t^2 \right). \]

Clearly, \(\overline{X}\) concentrates around its mean as \(n\) increases, and the tails fall exponentially in \(n\).
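This special case is easy to check numerically. The sketch below (assuming NumPy; \(n\), \(t\) and the number of runs are arbitrary choices) compares the empirical frequency of \(|\overline{X} - 1/2| \geq t\) with the bound \(2 \exp(-2 n t^2)\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, runs = 100, 0.1, 50_000

# Each row is one run of n i.i.d. Uniform[0, 1] variables; take the row means.
xbar = rng.uniform(0.0, 1.0, size=(runs, n)).mean(axis=1)

empirical = np.mean(np.abs(xbar - 0.5) >= t)   # estimate of P(|X_bar - 1/2| >= t)
bound = 2 * np.exp(-2 * n * t**2)              # two-sided Hoeffding bound
print(f"empirical {empirical:.5f} <= bound {bound:.5f}")
```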

The proof of this result depends on what is known as Hoeffding’s Lemma.

Lemma 7.6

Let \(X\) be a zero mean r.v. with \(\PP (X \in [a, b]) = 1\). Then

\[ \EE \left [ \exp (t X) \right] \leq \exp \left ( \frac{1}{8} t^2 (b - a)^2 \right ). \]
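As an illustration, the lemma can be verified exactly for a Rademacher variable (values \(\pm 1\) with probability \(1/2\) each, so \(a = -1\), \(b = 1\) and \(\EE[\exp(tX)] = \cosh t\)). This is only a spot check of the stated bound:

```python
import numpy as np

# Rademacher variable: X = +/-1 with probability 1/2, so E(X) = 0 and
# E[exp(t X)] = cosh(t); the lemma bound is exp(t^2 (b - a)^2 / 8) = exp(t^2 / 2).
for t in [0.5, 1.0, 2.0]:
    mgf = np.cosh(t)
    bound = np.exp(t**2 * 2**2 / 8)
    print(f"t = {t}: E[exp(tX)] = {mgf:.4f} <= {bound:.4f}")
```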

7.4.7. Jensen’s inequality#

Jensen’s inequality relates the value of a convex function of an integral to the integral of the convex function. In the context of probability theory, the inequality takes the following form.

Theorem 7.22

Let \(f : \RR \to \RR\) be a convex function and let \(X\) be a random variable with finite expectation. Then

\[ f \left ( \EE [X] \right ) \leq \EE \left [ f ( X ) \right ]. \]

Equality holds if and only if \(X\) is (almost surely) constant or \(f\) coincides with an affine function on the support of \(X\).
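A minimal numerical sketch (assuming NumPy; the choice of \(f(x) = x^2\) and of an exponential sample is arbitrary) illustrates the inequality. For \(f(x) = x^2\) it reduces to \((\EE[X])^2 \leq \EE[X^2]\), which is just the statement that the variance of \(X\) is non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)   # any r.v. with finite mean works

f = np.square                                  # f(x) = x^2 is convex
lhs = f(x.mean())                              # f(E[X])
rhs = f(x).mean()                              # E[f(X)]
print(f"f(E[X]) = {lhs:.3f} <= E[f(X)] = {rhs:.3f}")
```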

7.4.8. Bernstein inequalities#

7.4.9. Chernoff’s inequality#

This is also known as the Chernoff bound.

7.4.10. Fréchet inequalities#