7.1. Probability Spaces
Probability measures the amount of uncertainty in an event.
Typically, the probability of an event is expressed
as a nonnegative real number ranging between $0$ and $1$.

- When an event has a probability $1$, we are certain that the event will occur.
- When an event has a probability $0$, we are certain that the event will not occur.
- When we toss a coin, we are not certain whether it will turn heads (H) or tails (T). We normally assign a probability of $\frac{1}{2}$ to both H and T outcomes as we believe that both outcomes are equally likely.
- When we throw a die, the possible outcomes are $1, 2, 3, 4, 5, 6$. If we believe that each outcome is equally likely, then we can assign a probability of $\frac{1}{6}$ to each outcome.
- If we have a fairly reliable digital communication channel, then the probability of error may be as low as $10^{-6}$. In other words, there is a one in a million chance of a transmitted bit flipping during the transmission.
Given a sample space of outcomes, there are two main activities involved:

- Assigning probabilities to different outcomes or events in a manner that the probabilities are sensible.
- Using the laws of probability theory to infer the probabilities of other outcomes or events.
Our notes will be based on the axiomatic treatment of probability.
We describe the rules for assigning sensible probabilities
to different events. We then develop the theory for computing with
the probabilities. We leave out the task of estimating the
probabilities of individual events, which is covered extensively
in statistics.
The foundations of modern probability theory are rooted in
measure theory, which is the study of measures.
A measure is a generalization of geometric notions like
length, area and volume. The probability of a set is
also a measure.
Although we don't provide extensive coverage of
measure theory in these notes,
we cover some fundamental concepts like $\sigma$-algebras, measurable spaces and Borel sets.
For an introduction to probability theory, see [8, 64, 68, 69]. For engineering applications, see [72]. For a comprehensive measure theoretic treatment, see [10].
Definition (Random experiment)

A random experiment is an experiment in which the outcomes are nondeterministic. In other words, different outcomes can occur each time the experiment is run.
7.1.1. Sample Space

Definition (Sample space)

The sample space associated with a random experiment is the set of all possible outcomes of the experiment.
We shall often denote the sample space by the symbol $\Omega$.
7.1.2. Sigma Algebras and Fields

Definition (Field, Algebra)

Consider a sample space $\Omega$. A family $\mathcal{F}$ of subsets of $\Omega$ is called a field (or an algebra) if

1. $\emptyset \in \mathcal{F}$;
2. if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$; and
3. if $A, B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$.

In other words, $\mathcal{F}$ contains the empty set and is closed under complements and finite unions.
Note

The term algebra is used here in the sense of Boolean
algebra from set theory, whose operations include
union, intersection and complement.
The notion of field here is different from the notion
of fields (e.g., the fields of real or complex numbers) in abstract algebra.
Example 7.1 (trivial algebra)

Let $\mathcal{F} = \{\emptyset, \Omega\}$. It is easy to verify that $\mathcal{F}$ satisfies all the requirements of an algebra. It is known as the trivial algebra.
Example 7.2 (algebras over a set of 4 elements)

Let $\Omega = \{1, 2, 3, 4\}$.

- $\{\emptyset, \Omega\}$ is an algebra.
- $\{\emptyset, \{1\}, \{2, 3, 4\}, \Omega\}$ is an algebra.
- $\{\emptyset, \{1, 2\}, \{3, 4\}, \Omega\}$ is an algebra.
- The power set consisting of all $16$ subsets of $\Omega$ is an algebra.
Theorem 7.1 (Properties of an algebra)

Let $\mathcal{F}$ be an algebra over a sample space $\Omega$. Then:

1. $\Omega \in \mathcal{F}$.
2. If $A_1, A_2, \dots, A_n \in \mathcal{F}$, then $A_1 \cup A_2 \cup \dots \cup A_n \in \mathcal{F}$ (closure under finite unions).
3. If $A_1, A_2, \dots, A_n \in \mathcal{F}$, then $A_1 \cap A_2 \cap \dots \cap A_n \in \mathcal{F}$ (closure under finite intersections).
4. If $A, B \in \mathcal{F}$, then $A \setminus B \in \mathcal{F}$ (closure under set difference).
Proof. (1) Sample space

- By definition, $\emptyset \in \mathcal{F}$.
- $\emptyset^c = \Omega$.
- Hence $\Omega \in \mathcal{F}$ by closure under complements.
(2) Closure under finite union by mathematical induction

- Base case: for $n = 1$, the statement is trivial; for $n = 2$, it holds by definition.
- Assume that the statement is true for some $n \geq 2$.
- Let $A_1, \dots, A_{n+1} \in \mathcal{F}$.
- By inductive hypothesis, $B = A_1 \cup \dots \cup A_n \in \mathcal{F}$.
- Then
$$A_1 \cup \dots \cup A_{n+1} = B \cup A_{n+1}.$$
- Since both $B$ and $A_{n+1}$ are in $\mathcal{F}$, hence their union is also in $\mathcal{F}$.
- Hence by mathematical induction, $\mathcal{F}$ is closed under all finite unions.
(3) Closure under finite intersection

- We note that, by De Morgan's law,
$$A_1 \cap A_2 \cap \dots \cap A_n = (A_1^c \cup A_2^c \cup \dots \cup A_n^c)^c.$$
- Since $A_i \in \mathcal{F}$, hence $A_i^c \in \mathcal{F}$ for $i = 1, \dots, n$.
- Then $A_1^c \cup \dots \cup A_n^c \in \mathcal{F}$ due to (2).
- Then $(A_1^c \cup \dots \cup A_n^c)^c = A_1 \cap \dots \cap A_n \in \mathcal{F}$.
(4) Set difference

- We note that $A \setminus B = A \cap B^c$.
- Since $B \in \mathcal{F}$, hence $B^c \in \mathcal{F}$.
- Since $A, B^c \in \mathcal{F}$, hence $A \cap B^c \in \mathcal{F}$ due to (3).
Example 7.3 (Algebra from a partition)

Let $\Omega = A_1 \cup A_2 \cup \dots \cup A_n$ where the nonempty sets $\mathcal{A} = \{A_1, \dots, A_n\}$ are pairwise disjoint; i.e., they form a partition of $\Omega$. Let $\mathcal{F}$ be the collection of all possible unions of members of $\mathcal{A}$, together with the empty set (the union of zero members). Then $\mathcal{F}$ is an algebra.

- Since there are $n$ sets in the partition $\mathcal{A}$, hence the number of elements of $\mathcal{F}$ is $2^n$.
- By definition, $\emptyset$ and $\Omega = A_1 \cup \dots \cup A_n$ are in $\mathcal{F}$.
- Let $A, B \in \mathcal{F}$. Then both $A$ and $B$ are unions of some members of $\mathcal{A}$.
- Hence $A \cup B$ is also a union of some members of $\mathcal{A}$.
- Hence $\mathcal{F}$ is closed under union.
- If $A$ and $B$ are disjoint, then $A \cap B = \emptyset \in \mathcal{F}$.
- Otherwise, $A \cap B$ is the union of those members of $\mathcal{A}$ which are common in both $A$ and $B$. Hence $A \cap B \in \mathcal{F}$.
- Hence $\mathcal{F}$ is closed under intersection.
- Let $A \in \mathcal{F}$. Then $A$ is a union of some members of $\mathcal{A}$.
- Then $A^c$ is the union of the remaining members of $\mathcal{A}$.
- Hence $A^c \in \mathcal{F}$.
- Hence $\mathcal{F}$ is closed under complement.
- Hence $\mathcal{F}$ is an algebra.
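The construction in this example is easy to check programmatically. Below is a minimal Python sketch (the sample space and partition are hypothetical choices for illustration) that forms all unions of partition members and verifies the algebra properties by brute force:

```python
from itertools import combinations

# A hypothetical sample space partitioned into n = 3 disjoint blocks.
omega = frozenset(range(6))
partition = [frozenset({0, 1}), frozenset({2, 3}), frozenset({4, 5})]

# The algebra consists of all unions of blocks (the empty union gives the empty set).
algebra = {frozenset().union(*blocks)
           for k in range(len(partition) + 1)
           for blocks in combinations(partition, k)}

print(len(algebra))  # 8 = 2^3 sets

# Brute-force check of the algebra axioms and derived closure properties.
assert frozenset() in algebra and omega in algebra
assert all(omega - A in algebra for A in algebra)               # complements
assert all(A | B in algebra for A in algebra for B in algebra)  # unions
assert all(A & B in algebra for A in algebra for B in algebra)  # intersections
```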
Often, the sample space $\Omega$ is an infinite set, and we need to work with countable unions of events. An algebra which is also closed under countable unions is known as a $\sigma$-algebra.

Definition ($\sigma$-algebra)

Consider a (possibly infinite) sample space $\Omega$. A family $\mathcal{F}$ of subsets of $\Omega$ is called a $\sigma$-algebra (or a $\sigma$-field) if

1. $\emptyset \in \mathcal{F}$;
2. if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$; and
3. if $A_1, A_2, \dots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$ (closure under countable unions).

The pair $(\Omega, \mathcal{F})$, where $\mathcal{F}$ is a $\sigma$-algebra over $\Omega$, is known as a measurable space.

When the sample space is obvious from the context, we will
often call $\mathcal{F}$ itself a $\sigma$-algebra.
Example 7.4 (Power set)

Let $\Omega$ be an arbitrary sample space. Its power set $2^{\Omega}$, the collection of all subsets of $\Omega$, is a $\sigma$-algebra.
Theorem 7.2 (Countable intersection)

Let $\mathcal{F}$ be a $\sigma$-algebra. Then $\mathcal{F}$ is closed under countable intersections.

Proof. We use the complement property.

- Let $A_1, A_2, \dots$ be subsets in $\mathcal{F}$.
- Then $A_1^c, A_2^c, \dots$ are also in $\mathcal{F}$.
- Then their countable union $\bigcup_{i=1}^{\infty} A_i^c$ is in $\mathcal{F}$.
- Taking complement, we get
$$\left(\bigcup_{i=1}^{\infty} A_i^c\right)^c = \bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$$
as desired.
Any $\sigma$-algebra is also an algebra: a finite union $A \cup B$ can be written as the countable union $A \cup B \cup \emptyset \cup \emptyset \cup \dots$.
7.1.2.1. Generated Algebra

Definition (atom)

An atom of an algebra $\mathcal{F}$ is a nonempty set $A \in \mathcal{F}$ such that the only subsets of $A$ which are members of $\mathcal{F}$ are $\emptyset$ and $A$ itself.
Theorem 7.3

Let $\{\mathcal{F}_i\}_{i \in I}$ be a (possibly infinite) family of $\sigma$-algebras over a sample space $\Omega$. Then their intersection $\mathcal{F} = \bigcap_{i \in I} \mathcal{F}_i$ is also a $\sigma$-algebra.

Proof. Denote $\mathcal{F} = \bigcap_{i \in I} \mathcal{F}_i$.

Empty set

- Since $\emptyset \in \mathcal{F}_i$ for every $i$, hence $\emptyset \in \mathcal{F}$.

Complement

- Let $A \in \mathcal{F}$. Then $A \in \mathcal{F}_i$ for every $i$.
- Hence $A^c \in \mathcal{F}_i$ for every $i$.
- Hence $A^c \in \mathcal{F}$.

Union

- Let $A, B \in \mathcal{F}$.
- Then $A, B \in \mathcal{F}_i$ for every $i$.
- Hence $A \cup B \in \mathcal{F}_i$ for every $i$.
- Hence $A \cup B \in \mathcal{F}$.

Intersection

- Let $A, B \in \mathcal{F}$.
- Then $A, B \in \mathcal{F}_i$ for every $i$.
- Hence $A \cap B \in \mathcal{F}_i$ for every $i$.
- Hence $A \cap B \in \mathcal{F}$.

Countable union

- Let $A_1, A_2, \dots$ be a countable collection of subsets in $\mathcal{F}$.
- Then $A_j \in \mathcal{F}_i$ for every $i$ and every $j$.
- Since each $\mathcal{F}_i$ is a $\sigma$-algebra, hence $\bigcup_{j=1}^{\infty} A_j \in \mathcal{F}_i$ for every $i$.
- Hence $\bigcup_{j=1}^{\infty} A_j \in \mathcal{F}$.
Definition (generated $\sigma$-algebra)

Let $\mathcal{C}$ be an arbitrary collection of subsets of $\Omega$. The smallest $\sigma$-algebra containing every set of $\mathcal{C}$ is called the $\sigma$-algebra generated by $\mathcal{C}$ and is denoted by $\sigma(\mathcal{C})$.

- Here by smallest, we mean that if there is any other
$\sigma$-algebra $\mathcal{G}$ such that $C \in \mathcal{G}$ for every $C \in \mathcal{C}$, then $\sigma(\mathcal{C}) \subseteq \mathcal{G}$.
- Since the power set $2^{\Omega}$ is a $\sigma$-algebra and it contains all subsets of $\Omega$ (including the sets of $\mathcal{C}$), hence there is always a $\sigma$-algebra containing a given collection of sets.
- It may not be possible to visualize every member of $\sigma(\mathcal{C})$ easily from the description of $\mathcal{C}$.
We next show that a smallest $\sigma$-algebra containing a given collection of sets always exists.

Theorem 7.4

Let $\mathcal{C}$ be an arbitrary collection of subsets of $\Omega$. Then the smallest $\sigma$-algebra containing $\mathcal{C}$ exists and is unique; i.e., $\sigma(\mathcal{C})$ is well defined.

Proof. The main issue is to verify that if there is any other $\sigma$-algebra containing $\mathcal{C}$, then it contains the $\sigma$-algebra we construct below.

- Let $\Gamma$ be the collection of all $\sigma$-algebras containing all sets of $\mathcal{C}$.
- We can see that $\Gamma$ is nonempty since $2^{\Omega} \in \Gamma$.
- By Theorem 7.3, $\mathcal{F} = \bigcap_{\mathcal{G} \in \Gamma} \mathcal{G}$ is a $\sigma$-algebra.
- We claim that $\sigma(\mathcal{C}) = \mathcal{F}$.
- Since $\mathcal{C} \subseteq \mathcal{G}$ for every $\mathcal{G} \in \Gamma$, hence $\mathcal{C} \subseteq \mathcal{F}$.
- By construction, $\mathcal{F} \subseteq \mathcal{G}$ for every $\mathcal{G} \in \Gamma$. In other words, $\mathcal{F}$ is contained in every $\sigma$-algebra containing $\mathcal{C}$.
- Hence $\mathcal{F}$ is indeed the $\sigma$-algebra generated by $\mathcal{C}$.
We note that if $\mathcal{C}$ is itself a $\sigma$-algebra, then $\sigma(\mathcal{C}) = \mathcal{C}$.
Example (Algebra generated by 2 sets)

Let $A$ and $B$ be two subsets of a sample space $\Omega$ which overlap partially (so that the four intersections below are all nonempty). Let us construct the smallest algebra $\mathcal{F}$ containing $A$ and $B$.

- Since $A \in \mathcal{F}$, hence $A^c \in \mathcal{F}$.
- Similarly, $B^c \in \mathcal{F}$.
- Since $A$ and $B$ and their complements are in $\mathcal{F}$, hence the intersections $A \cap B$, $A \cap B^c$, $A^c \cap B$ and $A^c \cap B^c$ are in $\mathcal{F}$ as $\mathcal{F}$ is closed under intersection.
- Let us name them $E_1 = A \cap B$, $E_2 = A \cap B^c$, $E_3 = A^c \cap B$, and $E_4 = A^c \cap B^c$.
- We can see that these four sets are disjoint and
$$E_1 \cup E_2 \cup E_3 \cup E_4 = \Omega.$$
- We now have a partition of $\Omega$ into $4$ disjoint sets. We can follow the lead from Example 7.3 to construct an algebra by constructing all the unions of $1$ or more sets from the collection $\{E_1, E_2, E_3, E_4\}$.
- The empty set $\emptyset$ doesn't contain any of these sets; it is the union of zero of them.
- There are $4$ of these disjoint subsets.
- There are $\binom{4}{2} = 6$ pair-wise unions of these sets.
- There are $\binom{4}{3} = 4$ unions of $3$ of these subsets.
- There is $1$ union of all the $4$ subsets, which is $\Omega$ itself.
- A total of $1 + 4 + 6 + 4 + 1 = 16$ possible subsets are formed.
- In particular, note that $A = E_1 \cup E_2$, $B = E_1 \cup E_3$, $A^c = E_3 \cup E_4$, and $B^c = E_2 \cup E_4$.
- We can see that this collection of $16$ subsets of $\Omega$ is an algebra following Example 7.3.
- This is indeed the smallest algebra containing $A$ and $B$, as any other algebra containing $A$ and $B$ must contain the sets $E_1, E_2, E_3, E_4$ and hence all of their unions.
- Hence it is the algebra generated by $A$ and $B$.
- We can see that $E_1, E_2, E_3, E_4$ are the atoms of the algebra $\mathcal{F}$.
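The same computation can be scripted. The following sketch (with a hypothetical sample space and hypothetical sets $A$ and $B$) forms the four atoms and generates the $16$-element algebra:

```python
from itertools import combinations

# Hypothetical sample space with two overlapping sets A and B.
omega = frozenset(range(1, 9))
A = frozenset({1, 2, 3, 4})
B = frozenset({3, 4, 5, 6})

# The four atoms E1..E4: A n B, A n B^c, A^c n B, A^c n B^c.
atoms = [A & B, A - B, B - A, omega - (A | B)]

# The generated algebra: all unions of zero or more atoms.
algebra = {frozenset().union(*blocks)
           for k in range(len(atoms) + 1)
           for blocks in combinations(atoms, k)}

print(len(algebra))                # 16
print(A in algebra, B in algebra)  # True True
assert all(omega - E in algebra for E in algebra)               # complements
assert all(E | F in algebra for E in algebra for F in algebra)  # unions
```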
7.1.2.2. Dynkin $\pi$-$\lambda$ Theorem

When constructing probability measures
(see Probability Measure and Space),
it is generally impossible to assign
a probability to each subset in a
$\sigma$-algebra individually. One rather assigns probabilities to the members of a simpler collection of sets and extends the assignment to the $\sigma$-algebra generated by that collection. The Dynkin $\pi$-$\lambda$ theorem is a basic tool in such constructions.

Definition ($\pi$-system)

Let $\mathcal{P}$ be a collection of subsets of $\Omega$. We say that $\mathcal{P}$ is a $\pi$-system if it is closed under finite intersections: if $A, B \in \mathcal{P}$, then $A \cap B \in \mathcal{P}$.

Definition ($\lambda$-system)

Let $\mathcal{L}$ be a collection of subsets of $\Omega$. We say that $\mathcal{L}$ is a $\lambda$-system (or a Dynkin system) if

1. $\mathcal{L}$ contains the empty set: $\emptyset \in \mathcal{L}$;
2. $\mathcal{L}$ is closed under complements: if $A \in \mathcal{L}$, then $A^c \in \mathcal{L}$;
3. $\mathcal{L}$ is closed under countable disjoint unions: if $A_i \in \mathcal{L}$ and $A_i \cap A_j = \emptyset$ for every $i \neq j$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{L}$.

Theorem 7.5 (Dynkin $\pi$-$\lambda$ theorem)

If $\mathcal{P}$ is a $\pi$-system and $\mathcal{L}$ is a $\lambda$-system with $\mathcal{P} \subseteq \mathcal{L}$, then $\sigma(\mathcal{P}) \subseteq \mathcal{L}$.

In other words, the $\sigma$-algebra generated by a $\pi$-system is contained in every $\lambda$-system containing the $\pi$-system.
The proof of this result is involved. The overall strategy works as follows:

- We shall construct a $\sigma$-algebra that lies between $\mathcal{P}$ and $\mathcal{L}$.
- In particular, we shall construct the set $\ell(\mathcal{P})$, which is the intersection of all $\lambda$-systems containing $\mathcal{P}$.
- We then show that $\ell(\mathcal{P})$ is a $\lambda$-system.
- We then show that $\ell(\mathcal{P})$ is also a $\pi$-system.
- We show that a collection which is both a $\pi$-system and a $\lambda$-system is also a $\sigma$-algebra.
- Thus, we claim that $\ell(\mathcal{P})$ is a $\sigma$-algebra.
- We finally show that $\sigma(\mathcal{P}) \subseteq \ell(\mathcal{P}) \subseteq \mathcal{L}$.
In order to prove this result, we first prove some of the intermediate steps individually.
Lemma 7.1 (Closedness under proper differences)

A $\lambda$-system $\mathcal{L}$ is closed under proper differences: if $A, B \in \mathcal{L}$ and $A \subseteq B$, then $B \setminus A \in \mathcal{L}$.

Proof. We note that
$$B \setminus A = B \cap A^c = (B^c \cup A)^c.$$

- Since $B \in \mathcal{L}$, hence $B^c \in \mathcal{L}$.
- Since $A \subseteq B$, hence $B^c$ and $A$ are disjoint.
- Hence $B^c \cup A \in \mathcal{L}$ since it is a disjoint union of members of $\mathcal{L}$.
- Hence $(B^c \cup A)^c \in \mathcal{L}$.
- But $(B^c \cup A)^c = B \cap A^c = B \setminus A$.
- Hence $B \setminus A \in \mathcal{L}$.
Lemma 7.2

A family of subsets of $\Omega$ which is both a $\pi$-system and a $\lambda$-system is a $\sigma$-algebra.

Proof. Let $\mathcal{F}$ be both a $\pi$-system and a $\lambda$-system.

- $\emptyset \in \mathcal{F}$ and $\mathcal{F}$ is closed under complements since it is a $\lambda$-system.
- $\mathcal{F}$ is closed under finite intersections since it is a $\pi$-system.

We just need to show that it is closed under countable unions of (not necessarily disjoint) sets.

- Let $A_1, A_2, \dots \in \mathcal{F}$.
- We shall write $\bigcup_{i=1}^{\infty} A_i$ as a countable union of disjoint sets.
- Let $B_1 = A_1$.
- For $n \geq 2$, let
$$B_n = A_n \setminus (A_1 \cup \dots \cup A_{n-1}) = A_n \cap A_1^c \cap \dots \cap A_{n-1}^c.$$
- In other words, $B_n$ consists of all elements of $A_n$ which don't appear in any of the sets $A_1, \dots, A_{n-1}$.
- Then $\{B_n\}$ is a sequence of disjoint sets.
- We can also see that $B_1 \cup \dots \cup B_n = A_1 \cup \dots \cup A_n$ for every $n$.
- Hence $\bigcup_{n=1}^{\infty} B_n = \bigcup_{n=1}^{\infty} A_n$.
- Since $\mathcal{F}$ is a $\lambda$-system, hence $A_i^c \in \mathcal{F}$ for every $i$.
- Since $\mathcal{F}$ is a $\pi$-system, hence $B_n = A_n \cap A_1^c \cap \dots \cap A_{n-1}^c \in \mathcal{F}$ for every $n$.
- Hence $\bigcup_{n=1}^{\infty} A_n = \bigcup_{n=1}^{\infty} B_n \in \mathcal{F}$ as it is a countable union of disjoint sets.
Lemma 7.3

Suppose $\mathcal{L}$ is a $\lambda$-system over $\Omega$ and let $B \in \mathcal{L}$ be a fixed set. Let $\mathcal{G}$ be the collection of all sets $A \subseteq \Omega$ such that $A \cap B \in \mathcal{L}$. Then $\mathcal{G}$ is a $\lambda$-system.

Proof. Empty set

- Since $\emptyset \cap B = \emptyset \in \mathcal{L}$, hence $\emptyset \in \mathcal{G}$.

Countable union of disjoint sets

- Let $A_1, A_2, \dots$ be a sequence of disjoint subsets in $\mathcal{G}$.
- Then $A_i \cap B \in \mathcal{L}$ for every $i$.
- Then $\bigcup_i (A_i \cap B) \in \mathcal{L}$ since the sets $A_i \cap B$ are also disjoint.
- Also, $\left(\bigcup_i A_i\right) \cap B = \bigcup_i (A_i \cap B)$.
- Since $\bigcup_i (A_i \cap B) \in \mathcal{L}$, hence $\left(\bigcup_i A_i\right) \cap B \in \mathcal{L}$ and thus $\bigcup_i A_i \in \mathcal{G}$.
- Hence $\mathcal{G}$ is closed under countable unions of disjoint sets.

Complements

- Let $A \in \mathcal{G}$. Then $A \cap B \in \mathcal{L}$.
- Now $A^c \cap B = B \setminus (A \cap B)$.
- Since $A \cap B \subseteq B$ and $A \cap B \in \mathcal{L}$ and $B \in \mathcal{L}$, hence due to Lemma 7.1, $B \setminus (A \cap B) \in \mathcal{L}$.
- Since $A^c \cap B = B \setminus (A \cap B) \in \mathcal{L}$, hence $A^c \in \mathcal{G}$.
- Hence $\mathcal{G}$ is closed under complements.

Thus, $\mathcal{G}$ satisfies all the requirements of a $\lambda$-system.
Lemma 7.4

Let $\mathcal{P}$ be a collection of subsets of $\Omega$, and let $\ell(\mathcal{P})$ denote the intersection of all $\lambda$-systems containing $\mathcal{P}$. Then $\ell(\mathcal{P})$ is a $\lambda$-system containing $\mathcal{P}$.

Proof. We can easily verify that $\mathcal{P} \subseteq \ell(\mathcal{P})$ since every $\lambda$-system in the intersection contains $\mathcal{P}$. (The intersection is over a nonempty family since the power set $2^{\Omega}$ is a $\lambda$-system containing $\mathcal{P}$.)

- Let $\Gamma$ be the collection of all $\lambda$-systems containing $\mathcal{P}$.
- Then $\ell(\mathcal{P}) = \bigcap_{\mathcal{L} \in \Gamma} \mathcal{L}$.
- Since $\emptyset \in \mathcal{L}$ for every $\mathcal{L} \in \Gamma$, hence $\emptyset \in \ell(\mathcal{P})$.
- Let $A \in \ell(\mathcal{P})$.
- Then $A \in \mathcal{L}$ for every $\mathcal{L} \in \Gamma$.
- Hence $A^c \in \mathcal{L}$ for every $\mathcal{L} \in \Gamma$.
- Hence $A^c \in \ell(\mathcal{P})$.
- Let $A_1, A_2, \dots \in \ell(\mathcal{P})$ be a collection of pairwise disjoint sets.
- Then $A_i \in \mathcal{L}$ for every $\mathcal{L} \in \Gamma$ and every $i$.
- Hence $\bigcup_i A_i \in \mathcal{L}$ for every $\mathcal{L} \in \Gamma$.
- Hence $\bigcup_i A_i \in \ell(\mathcal{P})$.
Lemma 7.5

Let $\mathcal{P}$ be a $\pi$-system. Then $\ell(\mathcal{P})$ is a $\pi$-system.

Proof. The proof uses a bootstrap argument often used in measure theory.

We first show that for any $A \in \mathcal{P}$ and any $B \in \ell(\mathcal{P})$, the intersection $A \cap B \in \ell(\mathcal{P})$.

- Let $A \in \mathcal{P}$.
- Then $A \in \ell(\mathcal{P})$.
- Let $\mathcal{G}_A$ be the set of all sets $B$ such that $A \cap B \in \ell(\mathcal{P})$.
- By Lemma 7.3, $\mathcal{G}_A$ is a $\lambda$-system.
- Let $C \in \mathcal{P}$. Then $A \cap C \in \mathcal{P}$ since $\mathcal{P}$ is a $\pi$-system.
- But $\mathcal{P} \subseteq \ell(\mathcal{P})$.
- Hence $A \cap C \in \ell(\mathcal{P})$.
- Hence $C \in \mathcal{G}_A$.
- Hence $\mathcal{P} \subseteq \mathcal{G}_A$.
- Thus, $\mathcal{G}_A$ is a $\lambda$-system containing $\mathcal{P}$.
- Hence $\ell(\mathcal{P}) \subseteq \mathcal{G}_A$ as $\ell(\mathcal{P})$ is the intersection of all $\lambda$-systems containing $\mathcal{P}$.
- Thus, for any $A \in \mathcal{P}$ and for any $B \in \ell(\mathcal{P})$, the intersection $A \cap B \in \ell(\mathcal{P})$.

We now show that for any $A, B \in \ell(\mathcal{P})$, the intersection $A \cap B \in \ell(\mathcal{P})$.

- Consider any $B \in \ell(\mathcal{P})$.
- Let $\mathcal{G}_B$ be the set of all sets $A$ such that $A \cap B \in \ell(\mathcal{P})$.
- By the preceding argument, every $A \in \mathcal{P}$ satisfies $A \cap B \in \ell(\mathcal{P})$.
- Let $A \in \mathcal{P}$.
- Since $A \in \mathcal{P}$ and $B \in \ell(\mathcal{P})$, hence $A \cap B \in \ell(\mathcal{P})$.
- Hence $A \in \mathcal{G}_B$.
- Hence $\mathcal{P} \subseteq \mathcal{G}_B$.
- By Lemma 7.3, $\mathcal{G}_B$ is a $\lambda$-system.
- Therefore $\ell(\mathcal{P}) \subseteq \mathcal{G}_B$.
- This means that for any $A \in \ell(\mathcal{P})$, the intersection $A \cap B \in \ell(\mathcal{P})$.

Thus, $\ell(\mathcal{P})$ is closed under pairwise (and hence finite) intersections; i.e., it is a $\pi$-system.
We are now ready to prove the Dynkin $\pi$-$\lambda$ theorem (Theorem 7.5).

Proof. Let $\mathcal{P}$ be a $\pi$-system contained in a $\lambda$-system $\mathcal{L}$.

- By Lemma 7.4, $\ell(\mathcal{P})$ is a $\lambda$-system containing $\mathcal{P}$.
- By Lemma 7.5, $\ell(\mathcal{P})$ is also a $\pi$-system.
- By Lemma 7.2, $\ell(\mathcal{P})$ is therefore a $\sigma$-algebra.
- Since $\ell(\mathcal{P})$ is a $\sigma$-algebra containing $\mathcal{P}$, hence $\sigma(\mathcal{P}) \subseteq \ell(\mathcal{P})$.
- Since $\mathcal{L}$ is a $\lambda$-system containing $\mathcal{P}$ and $\ell(\mathcal{P})$ is the intersection of all such $\lambda$-systems, hence $\ell(\mathcal{P}) \subseteq \mathcal{L}$.
- Thus $\sigma(\mathcal{P}) \subseteq \ell(\mathcal{P}) \subseteq \mathcal{L}$ as desired.
7.1.2.3. Borel $\sigma$-Algebra

Recall the notions of topology (open sets, closed sets, etc.)
on the real line from Topology of Real Line.

If we consider the collection of open sets
of $\mathbb{R}$, then the $\sigma$-algebra generated by the open sets is known as the Borel $\sigma$-algebra of $\mathbb{R}$, denoted by $\mathcal{B}$.

The members of $\mathcal{B}$ are known as Borel sets.

The pair $(\mathbb{R}, \mathcal{B})$ is a measurable space.

Since a $\sigma$-algebra is closed under complements and countable unions and intersections, all closed sets and all countable unions and intersections of open or closed sets are Borel sets.
Example (Examples of Borel sets)

Following are some examples of Borel sets:

- $\{x\}$ for any $x \in \mathbb{R}$.
- The set of rational numbers $\mathbb{Q}$.
- Any countable subset of $\mathbb{R}$.
- The intervals $(a, b)$, $[a, b]$, $(a, b]$, $[a, b)$ for any $a < b$.
A countable intersection of open sets is known as
a $G_\delta$ set.
A countable union of closed sets is known as
an $F_\sigma$ set.
We recall that a countable intersection of open
sets need not be open and a countable union
of closed sets need not be closed.
However, both $G_\delta$ sets and $F_\sigma$ sets are Borel sets.
We haven't yet shown that there exist subsets of $\mathbb{R}$ which are not Borel sets. Such sets do exist, but constructing them requires tools beyond the scope of these notes.
Theorem 7.6 (Generation from one-sided closed intervals)

The Borel $\sigma$-algebra $\mathcal{B}$ is generated by the collection of one-sided closed intervals:
$$\mathcal{B} = \sigma(\{(-\infty, a] : a \in \mathbb{R}\}).$$

Proof. Denote the collection of one-sided closed intervals by $\mathcal{D} = \{(-\infty, a] : a \in \mathbb{R}\}$.

- Let $\mathcal{I}$ denote the collection of all open intervals.
- By Theorem 2.1, every open set in $\mathbb{R}$ is an at most countable union of open intervals.
- Hence $\mathcal{B} = \sigma(\mathcal{I})$.
- Consider an open interval $(a, b) \in \mathcal{I}$ with $a < b$.
- Choose a sequence of rational numbers $a_n \in (a, b)$ with $a_n \to a$ as $n \to \infty$.
- Choose a sequence of rational numbers $b_n \in (a, b)$ with $a_n < b_n$ and $b_n \to b$ as $n \to \infty$.
- Thus,
$$(a, b) = \bigcup_{n=1}^{\infty} (a_n, b_n] = \bigcup_{n=1}^{\infty} \left( (-\infty, b_n] \setminus (-\infty, a_n] \right).$$
- Hence $(a, b) \in \sigma(\mathcal{D})$.
- Hence $\mathcal{I} \subseteq \sigma(\mathcal{D})$.
- Hence $\mathcal{B} = \sigma(\mathcal{I}) \subseteq \sigma(\mathcal{D})$.
- However, every element of $\mathcal{D}$ is a closed set.
- Hence $\mathcal{D} \subseteq \mathcal{B}$.
- We have
$$\sigma(\mathcal{D}) \subseteq \sigma(\mathcal{B}) = \mathcal{B}.$$
- Hence $\sigma(\mathcal{D}) = \mathcal{B}$.
7.1.3. Events

Definition (Event)

An event is a subset of the sample space of
a random experiment that belongs to some chosen
$\sigma$-algebra of subsets of the sample space.
Note

Events are the subsets of the
sample space to which probabilities can
be assigned.
For finite sample spaces, any subset can
be an event. For infinite sample spaces,
it may not be possible to meaningfully
assign probabilities to every possible
subset. We need the notion of a $\sigma$-algebra to delineate the family of subsets which qualify as events.
We can translate the set-theoretic language
to the language of events as follows.
Let $A$ and $B$ be two events.

- $A$ doesn't occur is denoted by $A^c$.
- Either $A$ or $B$ occurs is denoted by $A \cup B$.
- Both $A$ and $B$ occur is denoted by $A \cap B$.
- $A$ occurs and $B$ doesn't occur is denoted by $A \cap B^c$. This can also be denoted as $A \setminus B$.
- The events $A$ and $B$ are exhaustive if $A \cup B = \Omega$. In particular, $A$ and $A^c$ are always exhaustive.
- $A$ and $B$ are exclusive events if $A \cap B = \emptyset$.
Definition (Elementary event)

An event consisting of only one outcome is called a singleton event or an elementary event.

Definition (Certain event)

The sample space $\Omega$ is known as the certain event.
Since every outcome of the experiment belongs to $\Omega$, the certain event always occurs.

Definition (Null event)

The empty set $\emptyset$ is known as the null event.
The null event never occurs.

Definition (Mutually exclusive events)

Let $A$ and $B$ be two events such that $A \cap B = \emptyset$. Then $A$ and $B$ are known as mutually exclusive events. Mutually exclusive events cannot occur together.
7.1.4. Probability Measure and Space

We next provide an axiomatic definition of a probability measure.
Note that we will often write a joint event (intersection of two
events) $A \cap B$ as $A, B$; e.g., $P(A, B)$ means $P(A \cap B)$.

Definition (Probability measure)

Let $(\Omega, \mathcal{F})$ be a measurable space consisting of a sample space $\Omega$ and a $\sigma$-algebra $\mathcal{F}$ of events. A function $P : \mathcal{F} \to \mathbb{R}$ is called a probability measure if it satisfies the following properties:

1. Nonnegativity: $P(A) \geq 0$ for every event $A \in \mathcal{F}$.
2. Unit measure or normalization: $P(\Omega) = 1$.
3. Additivity: $P(A \cup B) = P(A) + P(B)$ if $A \cap B = \emptyset$.
We can write it in the form of axioms (first introduced by Andrey Kolmogorov).
Axiom (First axiom: nonnegativity)

The probability of an event is a nonnegative real number: $P(A) \geq 0$ for every $A \in \mathcal{F}$.
This axiom implies that the probability is always finite.

Axiom (Second axiom: unit measure)

The probability of the certain event is one: $P(\Omega) = 1$.

Axiom (Third axiom: additivity)

Let $A$ and $B$ be two disjoint events ($A \cap B = \emptyset$). Then
$$P(A \cup B) = P(A) + P(B).$$
7.1.4.1. Probability Space

Definition (Probability space)

A sample space endowed with a $\sigma$-algebra of events and a probability measure defined over it is known as a probability space. In other words, the triplet $(\Omega, \mathcal{F}, P)$ is a probability space.
We next establish some basic facts about a probability measure.
7.1.4.2. Basic Properties

Theorem 7.7 (Properties of a probability measure)

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $A, B \in \mathcal{F}$ be events. Then:

1. Probability of null event: $P(\emptyset) = 0$.
2. Complement rule: $P(A^c) = 1 - P(A)$.
3. $P(A \cap B^c) = P(A) - P(A \cap B)$.
4. Sum rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
5. Monotonicity: If $A \subseteq B$, then $P(A) \leq P(B)$.
6. Numeric bound: $P(A) \leq 1$.
7. Finite additivity: For any positive integer $n$, we have
$$P(A_1 \cup A_2 \cup \dots \cup A_n) = \sum_{i=1}^{n} P(A_i)$$
if $A_1, \dots, A_n$ are pairwise disjoint events.
Proof. (1)

- $\Omega$ and $\emptyset$ are disjoint.
- Hence
$$P(\Omega \cup \emptyset) = P(\Omega) + P(\emptyset).$$
- This simplifies to
$$P(\Omega) = P(\Omega) + P(\emptyset).$$
- Hence $P(\emptyset) = 0$.

(2)

- Recall that $\Omega = A \cup A^c$.
- $A$ and $A^c$ are disjoint sets with $A \cup A^c = \Omega$.
- Hence
$$1 = P(\Omega) = P(A) + P(A^c).$$
- Hence $P(A^c) = 1 - P(A)$.

(3)

- $A \cap B$ and $A \cap B^c$ are disjoint with $(A \cap B) \cup (A \cap B^c) = A$.
- Hence
$$P(A) = P(A \cap B) + P(A \cap B^c).$$
- Hence $P(A \cap B^c) = P(A) - P(A \cap B)$.

(4)

- Recall that we can split $A \cup B$ into disjoint sets:
$$A \cup B = (A \cap B^c) \cup (A \cap B) \cup (A^c \cap B).$$
- By additivity, we have
$$P(A \cup B) = P(A \cap B^c) + P(A \cap B) + P(A^c \cap B) = [P(A) - P(A \cap B)] + P(A \cap B) + [P(B) - P(A \cap B)] = P(A) + P(B) - P(A \cap B).$$

(5)

- We have $B = A \cup (B \cap A^c)$.
- $A$ and $B \cap A^c$ are disjoint.
- Then
$$P(B) = P(A) + P(B \cap A^c).$$
- By nonnegativity, $P(B \cap A^c) \geq 0$.
- Hence $P(B) \geq P(A)$.

(6)

- We have $A \subseteq \Omega$.
- But $P(\Omega) = 1$.
- Hence, by monotonicity, $P(A) \leq 1$.

(7)

- The statement is trivially true for $n = 1$.
- The statement is true for $n = 2$ by the additivity rule.
- Assume that the statement is true for some $n \geq 2$.
- In other words, for every collection of events $A_1, \dots, A_n$ such that the events are pairwise disjoint, we have
$$P(A_1 \cup \dots \cup A_n) = \sum_{i=1}^{n} P(A_i).$$
- Let $A_1, \dots, A_{n+1}$ be a collection of $n + 1$ pairwise disjoint events. Define $B = A_1 \cup \dots \cup A_n$.
- We have $B \cap A_{n+1} = \emptyset$.
- Then
$$P(A_1 \cup \dots \cup A_{n+1}) = P(B \cup A_{n+1}) = P(B) + P(A_{n+1}) = \sum_{i=1}^{n+1} P(A_i).$$
- By principle of mathematical induction, the statement is true for every $n$.
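For a finite sample space with equally likely outcomes, these properties can be verified numerically. Here is a minimal sketch; the sample space and the events $A$ and $B$ are arbitrary illustrative choices:

```python
from fractions import Fraction

omega = set(range(1, 7))                    # one roll of a die
P = lambda E: Fraction(len(E), len(omega))  # uniform probability measure

A, B = {1, 2, 3}, {2, 3, 4, 5}

assert P(set()) == 0                               # null event
assert P(omega - A) == 1 - P(A)                    # complement rule
assert P(A - B) == P(A) - P(A & B)                 # difference rule (3)
assert P(A | B) == P(A) + P(B) - P(A & B)          # sum rule
assert P({1, 2}) <= P(A)                           # monotonicity ({1,2} is a subset of A)
assert all(P(E) <= 1 for E in ({1}, A, B, omega))  # numeric bound
```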
Note
We assign probabilities to events and not to individual outcomes of a random experiment. If the sample space is finite or countable, it is often convenient to assign probabilities to individual outcomes. One should treat this as the assignment of probability to the event consisting of a single outcome: a singleton event.
In the following, we shall assume that a
probability space $(\Omega, \mathcal{F}, P)$ has been fixed and that all events under discussion are members of $\mathcal{F}$.
Theorem 7.8 (Union of three events)

Let $A$, $B$ and $C$ be three events. Then
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C).$$

Proof. Define $D = B \cup C$.

Then $A \cup B \cup C = A \cup D$.

Further, by the sum rule,
$$P(A \cup D) = P(A) + P(D) - P(A \cap D).$$

- Note that $P(D) = P(B) + P(C) - P(B \cap C)$.
- Also, $A \cap D = (A \cap B) \cup (A \cap C)$. Hence, by the sum rule again,
$$P(A \cap D) = P(A \cap B) + P(A \cap C) - P(A \cap B \cap C)$$
since $(A \cap B) \cap (A \cap C) = A \cap B \cap C$.

Putting these back, we get the desired result.
7.1.4.3. Inclusion-Exclusion Principle

Theorem 7.8 can be extended
to the union of $n$ events.

Theorem (Inclusion-exclusion principle)

Let $A_1, A_2, \dots, A_n$ be $n$ events. Then
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{k=1}^{n} (-1)^{k+1} S_k$$
where $S_k$ denotes the sum of the probabilities of all $k$-fold intersections:
$$S_k = \sum_{1 \leq i_1 < i_2 < \dots < i_k \leq n} P(A_{i_1} \cap A_{i_2} \cap \dots \cap A_{i_k}).$$

In general, for every $n$, the formula expands as
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i} P(A_i) - \sum_{i < j} P(A_i \cap A_j) + \sum_{i < j < k} P(A_i \cap A_j \cap A_k) - \dots + (-1)^{n+1} P(A_1 \cap \dots \cap A_n).$$

It is known as inclusion-exclusion principle since the probabilities of intersections of an odd number of events are included (added) while those of an even number of events are excluded (subtracted), in an alternating fashion.

Proof. The proof is based on mathematical induction, using the sum rule for two events.
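The identity is easy to sanity-check by enumeration. The sketch below compares both sides of the inclusion-exclusion formula for a few randomly chosen events on a small uniform space (all specific values are hypothetical):

```python
import random
from itertools import combinations
from fractions import Fraction

random.seed(0)
omega = set(range(20))
P = lambda E: Fraction(len(E), len(omega))

# n = 4 randomly chosen events of 8 outcomes each (hypothetical choices).
events = [set(random.sample(sorted(omega), 8)) for _ in range(4)]

lhs = P(set().union(*events))
rhs = Fraction(0)
for k in range(1, len(events) + 1):
    for subset in combinations(events, k):
        rhs += (-1) ** (k + 1) * P(set.intersection(*subset))

assert lhs == rhs  # inclusion-exclusion holds exactly
print(lhs)
```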
7.1.4.4. Boole's Inequality

Theorem (Boole's inequality)

Let $A_1, A_2, \dots, A_n$ be $n$ events. Then
$$P\left(\bigcup_{i=1}^{n} A_i\right) \leq \sum_{i=1}^{n} P(A_i).$$

Proof. We prove it using induction.

- For $n = 1$, obviously
$$P(A_1) \leq P(A_1).$$
- Assume the inequality is true for a set of $n$ events for some $n$. In other words,
$$P\left(\bigcup_{i=1}^{n} A_i\right) \leq \sum_{i=1}^{n} P(A_i).$$
- Since, by the sum rule,
$$P\left(\bigcup_{i=1}^{n+1} A_i\right) = P\left(\bigcup_{i=1}^{n} A_i\right) + P(A_{n+1}) - P\left(\left(\bigcup_{i=1}^{n} A_i\right) \cap A_{n+1}\right),$$
hence
$$P\left(\bigcup_{i=1}^{n+1} A_i\right) \leq P\left(\bigcup_{i=1}^{n} A_i\right) + P(A_{n+1}).$$
- Since $P\left(\bigcup_{i=1}^{n} A_i\right) \leq \sum_{i=1}^{n} P(A_i)$,
hence
$$P\left(\bigcup_{i=1}^{n+1} A_i\right) \leq \sum_{i=1}^{n+1} P(A_i).$$
7.1.4.5. Countable Additivity

Often, we need to work with problems where we need to estimate the probability of a countable union of events. The basic axioms of a probability measure are unable to handle this. We need one more axiom that a probability measure must satisfy.

Axiom (Fourth axiom: countable additivity)

Let $A_1, A_2, A_3, \dots$ be a countable collection of pairwise disjoint events. Then
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$
7.1.5. Joint Probability

Definition (Joint probability)

Let $A$ and $B$ be two events. The probability of the event $A \cap B$ is known as the joint probability of the events $A$ and $B$. It is denoted by $P(A, B)$.

Similarly, let $A_1, A_2, \dots, A_n$ be a collection of events. Then their joint probability is
$$P(A_1, A_2, \dots, A_n) = P(A_1 \cap A_2 \cap \dots \cap A_n).$$

In other words, it is the probability of every event happening together.
7.1.6. Conditional Probability

Conditional probability provides us the means to reason about experiments with partial information. If an event is known to have happened, then the probabilities of other events associated with an experiment change. Here are some examples:

- If a cancer test is 90% reliable and it turns positive for a person, then the probability of the person having cancer increases dramatically.
- One can analyze a corpus of English literature to estimate the probabilities of a letter coming after another. Given that the letter $q$ has appeared as the first letter of a word, the probability that the next letter will be $u$ is much higher than the general probability of $u$ being the second letter of a word.
- If a die is rolled twice successively and we are told that the sum of the two rolls is $3$, then the probability that the first roll is $1$ is $\frac{1}{2}$ (rather than $\frac{1}{6}$), since only the outcomes $(1, 2)$ and $(2, 1)$ give this sum.
One point to note is that conditional probability doesn't establish any chronological order between events. It merely describes how the probabilities change based on partial information about an experiment.
Definition (Conditional probability)

Let $A$ and $B$ be two events with $P(B) > 0$. The conditional probability of $A$ given $B$ is denoted by $P(A | B)$ and is defined as
$$P(A | B) = \frac{P(A \cap B)}{P(B)}.$$

Note that the conditional probability is not
defined if $P(B) = 0$.

By definition, we can see that
$$P(A \cap B) = P(A | B) \, P(B).$$
Consider an experiment of tossing a coin 3 times.

- The sample space is
$$\Omega = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}.$$
- Assume that all the outcomes are equally likely. Each outcome has the probability $\frac{1}{8}$.
- Let $A$ denote the event that the first toss is a head. We have
$$A = \{HHH, HHT, HTH, HTT\}.$$
- We can see that $P(A) = \frac{4}{8} = \frac{1}{2}$.
- Let $B$ be the event that more heads than tails come up in the three tosses. We have
$$B = \{HHH, HHT, HTH, THH\}.$$
- We can see that $P(B) = \frac{4}{8} = \frac{1}{2}$.
- If the first outcome is a head, then the probability that more heads than tails come up will increase.
- Let us first check the event $A \cap B$. We have
$$A \cap B = \{HHH, HHT, HTH\}.$$
- Hence $P(A \cap B) = \frac{3}{8}$.
- Then the probability that more heads than tails come up given that the first toss is a head is given by
$$P(B | A) = \frac{P(A \cap B)}{P(A)} = \frac{3/8}{1/2} = \frac{3}{4}.$$
- We can also compute the probability that the first toss is a head given that more heads than tails come up as
$$P(A | B) = \frac{P(A \cap B)}{P(B)} = \frac{3/8}{1/2} = \frac{3}{4}.$$
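We can confirm these numbers by enumerating the sample space in a few lines of Python:

```python
from itertools import product
from fractions import Fraction

omega = list(product("HT", repeat=3))        # 8 equally likely outcomes
P = lambda E: Fraction(len(E), len(omega))

A = [w for w in omega if w[0] == "H"]                  # first toss is a head
B = [w for w in omega if w.count("H") > w.count("T")]  # more heads than tails
AB = [w for w in A if w in B]

print(P(A), P(B), P(AB))           # 1/2 1/2 3/8
print(P(AB) / P(A), P(AB) / P(B))  # P(B|A) = 3/4 and P(A|B) = 3/4
```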
We should verify that the conditional probability as defined above satisfies the axioms of probability.

Theorem

The conditional probability is a probability measure. In other words, for a fixed event $B$ with $P(B) > 0$, the function $P(\cdot | B)$ satisfies the axioms of probability.

Proof. (Nonnegativity) By definition, it is a ratio of nonnegative quantities. Hence, it is nonnegative.

(Normalization) We can see that
$$P(\Omega | B) = \frac{P(\Omega \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1.$$

(Additivity)

- Let $A_1$ and $A_2$ be disjoint events.
- Then $A_1 \cap B$ and $A_2 \cap B$ are also disjoint events.
- Hence
$$P(A_1 \cup A_2 | B) = \frac{P((A_1 \cup A_2) \cap B)}{P(B)} = \frac{P(A_1 \cap B) + P(A_2 \cap B)}{P(B)} = P(A_1 | B) + P(A_2 | B).$$

The argument for countable additivity is similar.

We note that $P(B | B) = 1$: given that $B$ has occurred, the entire conditional probability mass is concentrated on the outcomes in $B$.
7.1.6.1. Properties

Since the conditional probability $P(\cdot | B)$ is a probability measure in its own right, it inherits all the properties of a probability measure.

Theorem (Properties of a conditional probability measure)

Let $B$ be an event with $P(B) > 0$, and let $A$, $A_1$ and $A_2$ be events. Then:

- $P(\emptyset | B) = 0$.
- $P(A^c | B) = 1 - P(A | B)$.
- $P(A_1 \cap A_2^c | B) = P(A_1 | B) - P(A_1 \cap A_2 | B)$.
- $P(A_1 \cup A_2 | B) = P(A_1 | B) + P(A_2 | B) - P(A_1 \cap A_2 | B)$.
- $P(A | B) \leq 1$.
- If $A_1 \subseteq A_2$, then
$$P(A_1 | B) \leq P(A_2 | B).$$
- For any positive integer $n$, we have
$$P(A_1 \cup \dots \cup A_n | B) = \sum_{i=1}^{n} P(A_i | B)$$
if $A_1, \dots, A_n$ are pairwise disjoint events.
- Union of three events:
$$P(A_1 \cup A_2 \cup A_3 | B) = P(A_1 | B) + P(A_2 | B) + P(A_3 | B) - P(A_1 \cap A_2 | B) - P(A_1 \cap A_3 | B) - P(A_2 \cap A_3 | B) + P(A_1 \cap A_2 \cap A_3 | B).$$
- Let $A_1, \dots, A_n$ be a finite collection of events. Then we have (Boole's inequality)
$$P\left(\bigcup_{i=1}^{n} A_i \,\middle|\, B\right) \leq \sum_{i=1}^{n} P(A_i | B).$$

The proofs are similar to Theorem 7.7 and the other results above.
Note

Since the conditional probability measure is a probability measure in its own right, all results derived for probability measures apply to conditional probability measures as well.
7.1.6.2. Multiplication Rule

Theorem (Multiplication rule)

Let $A_1, A_2, \dots, A_n$ be a finite collection of events such that $P(A_1 \cap A_2 \cap \dots \cap A_{n-1}) > 0$.

Then
$$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1) \, P(A_2 | A_1) \, P(A_3 | A_1 \cap A_2) \cdots P(A_n | A_1 \cap \dots \cap A_{n-1}).$$

Proof. We can see that
$$P(A_1) \cdot \frac{P(A_1 \cap A_2)}{P(A_1)} \cdot \frac{P(A_1 \cap A_2 \cap A_3)}{P(A_1 \cap A_2)} \cdots \frac{P(A_1 \cap \dots \cap A_n)}{P(A_1 \cap \dots \cap A_{n-1})} = P(A_1 \cap \dots \cap A_n)$$
since the product telescopes, and each factor equals the corresponding conditional probability by definition.
Draw 4 cards from a deck of 52 cards. What is the probability that none of them is a spade?

- Define $A_i$ as the event that the $i$-th card is not a spade.
- Our event of interest is $A = A_1 \cap A_2 \cap A_3 \cap A_4$.
- There are 13 cards of the suit spade. There are 39 other cards. Hence $P(A_1) = \frac{39}{52}$.
- Given that $A_1$ has happened (i.e. the first card is not a spade), we are left with 51 cards out of which 38 are not spades. Hence $P(A_2 | A_1) = \frac{38}{51}$.
- The probability that the third card is not a spade is $P(A_3 | A_1 \cap A_2) = \frac{37}{50}$.
- The probability that the fourth card is not a spade is $P(A_4 | A_1 \cap A_2 \cap A_3) = \frac{36}{49}$.
- Applying the multiplication rule,
$$P(A) = \frac{39}{52} \cdot \frac{38}{51} \cdot \frac{37}{50} \cdot \frac{36}{49} \approx 0.3038.$$

Another way to calculate this is through counting the number of ways four cards can be chosen.

- Total number of ways four cards can be chosen is $\binom{52}{4}$.
- Total number of ways four cards can be chosen which are not spades is $\binom{39}{4}$.
- Hence the probability that none of the four cards is a spade is
$$\frac{\binom{39}{4}}{\binom{52}{4}} = \frac{39 \cdot 38 \cdot 37 \cdot 36}{52 \cdot 51 \cdot 50 \cdot 49} \approx 0.3038.$$
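Both computations are easy to reproduce with exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

# Multiplication rule: chain the conditional probabilities.
p_chain = Fraction(39, 52) * Fraction(38, 51) * Fraction(37, 50) * Fraction(36, 49)

# Counting argument: 4 non-spades out of 39, over all 4-card hands.
p_count = Fraction(comb(39, 4), comb(52, 4))

assert p_chain == p_count
print(float(p_chain))  # approximately 0.3038
```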
7.1.6.3. Marginal Probability

Definition (Marginal probability)

Let $A$ and $B$ be two events. The unconditional probability $P(A)$ of the event $A$, irrespective of whether $B$ occurs or not, is known as the marginal probability of $A$. The term distinguishes it from the joint probability $P(A, B)$ and the conditional probability $P(A | B)$.
7.1.6.4. Total Probability Theorem

Theorem (Total probability theorem)

Let $A_1, A_2, \dots, A_n$ be a collection of mutually exclusive and exhaustive events (i.e., a partition of $\Omega$) such that $P(A_i) > 0$ for every $i$. Then for any event $B$,
$$P(B) = \sum_{i=1}^{n} P(A_i \cap B) = \sum_{i=1}^{n} P(A_i) \, P(B | A_i).$$

Proof.

- Since $A_1, \dots, A_n$ are disjoint, hence so are $A_1 \cap B, \dots, A_n \cap B$.
- We have
$$B = B \cap \Omega = B \cap (A_1 \cup \dots \cup A_n) = (A_1 \cap B) \cup \dots \cup (A_n \cap B).$$
- Applying additivity axiom,
$$P(B) = \sum_{i=1}^{n} P(A_i \cap B).$$
- Applying conditional probability definition, we have $P(A_i \cap B) = P(A_i) \, P(B | A_i)$.
7.1.7. Bayes Rule

The most famous application of total probability theorem
is the Bayes rule. It relates the conditional probabilities
of the form $P(A | B)$ with the conditional probabilities of the form $P(B | A)$, in which the order of conditioning is reversed.

Theorem (Bayes rule)

Let $A_1, A_2, \dots, A_n$ be a collection of mutually exclusive and exhaustive events such that $P(A_i) > 0$ for every $i$. Then for any event $B$ with $P(B) > 0$,
$$P(A_i | B) = \frac{P(A_i) \, P(B | A_i)}{P(B)} = \frac{P(A_i) \, P(B | A_i)}{\sum_{j=1}^{n} P(A_j) \, P(B | A_j)}.$$

Proof.

- By definition of conditional probability, we have
$$P(A_i \cap B) = P(A_i | B) \, P(B)$$
and
$$P(A_i \cap B) = P(B | A_i) \, P(A_i).$$
- Hence
$$P(A_i | B) \, P(B) = P(B | A_i) \, P(A_i).$$
- Dividing both sides with $P(B)$, we get
$$P(A_i | B) = \frac{P(B | A_i) \, P(A_i)}{P(B)}.$$
- Expanding $P(B)$ via total probability theorem, we get the desired result.
Remark (Statistical inference)

Bayes' rule is a key tool in the field of statistical inference.
We conduct an experiment where we can observe an effect which may be due to a number of causes.

- The events $A_1, \dots, A_n$ denote the causes, which may not be observable directly.
- The event $B$ is the effect which can be observed.
- The probability $P(B | A_i)$ models the relationship between the cause $A_i$ and the effect $B$. It represents the likelihood of $B$ happening given that $A_i$ has happened.
- We are often interested in knowing the probability of $A_i$ given that $B$ has been observed.
- This is an inference since the events $A_i$ cannot be observed directly.
- The probability $P(A_i | B)$ is known as the posterior probability of the event $A_i$ given that the event $B$ has happened.
- This is distinguished from the unconditional/marginal probability $P(A_i)$, which is the probability of the event $A_i$ without any information about the event $B$. $P(A_i)$ is known as the prior probability of $A_i$.
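As a concrete sketch of this inference pattern, consider the 90% reliable cancer test mentioned at the beginning of this section. The prevalence and false positive rate below are hypothetical assumptions made only for illustration:

```python
from fractions import Fraction

# Causes: A1 = person has cancer, A2 = person doesn't (exclusive, exhaustive).
p_cancer = Fraction(1, 100)            # hypothetical prior (prevalence)
p_pos_given_cancer = Fraction(9, 10)   # sensitivity: P(B | A1)
p_pos_given_healthy = Fraction(1, 10)  # assumed false positive rate: P(B | A2)

# Total probability theorem: P(B) = sum_i P(Ai) P(B | Ai).
p_pos = p_cancer * p_pos_given_cancer + (1 - p_cancer) * p_pos_given_healthy

# Bayes rule: posterior P(A1 | B).
posterior = p_cancer * p_pos_given_cancer / p_pos
print(posterior, float(posterior))  # 1/12, about 0.083: the small prior dominates
```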
7.1.8. Independence

The conditional probability $P(A | B)$ captures the partial information that the occurrence of the event $B$ provides about the event $A$. An important special case arises when the occurrence of $B$ provides no such information and doesn't alter the probability of $A$; i.e., $P(A | B) = P(A)$.

Definition (Independence of two events)

Let $A$ and $B$ be two events. They are called independent if
$$P(A \cap B) = P(A) \, P(B).$$

Independence is a symmetric property. We can
see that if $A$ is independent of $B$, then $B$ is independent of $A$.

It follows that for independent events:

- If $P(B) > 0$, then
$$P(A | B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \, P(B)}{P(B)} = P(A).$$
- And if $P(A) > 0$, then $P(B | A) = P(B)$.
Consider the problem of rolling a fair 4-sided (tetrahedral) die twice, with all outcomes equally likely.

- Let $A$ be the event that the first roll is $1$.
- Let $B$ be the event that the sum of the two rolls is $5$.
- We have $A = \{(1,1), (1,2), (1,3), (1,4)\}$.
- We have $B = \{(1,4), (2,3), (3,2), (4,1)\}$.
- Number of outcomes is $4 \times 4 = 16$. $P(A) = \frac{4}{16} = \frac{1}{4}$. $P(B) = \frac{4}{16} = \frac{1}{4}$.
- We can see that $A \cap B = \{(1,4)\}$.
- Then $P(A \cap B) = \frac{1}{16}$.
- We can see that $P(A \cap B) = P(A) \, P(B)$.
- The two events $A$ and $B$ are independent.
- On the surface, it may seem that there should be dependence between the events $A$ and $B$, but it is not so.
- Since the outcomes in the event $B$ are equally likely, we can carefully examine the list of outcomes in the event $B$ to see that observing the event $B$ gives us no additional information about whether the first roll will give us $1$.
Now consider the problem of rolling a regular 6-sided die twice, with the same events: $A$ is the event that the first roll is $1$ and $B$ is the event that the sum of the two rolls is $5$.

- We have $A = \{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)\}$.
- We have $B = \{(1,4), (2,3), (3,2), (4,1)\}$.
- Number of outcomes is $6 \times 6 = 36$. $P(A) = \frac{6}{36} = \frac{1}{6}$. $P(B) = \frac{4}{36} = \frac{1}{9}$.
- We can see that $A \cap B = \{(1,4)\}$.
- Then $P(A \cap B) = \frac{1}{36}$.
- Clearly, $P(A \cap B) \neq P(A) \, P(B) = \frac{1}{54}$.
- The two events are dependent.
- In fact,
$$P(A | B) = \frac{P(A \cap B)}{P(B)} = \frac{1/36}{1/9} = \frac{1}{4} > \frac{1}{6} = P(A).$$
In other words, the probability of $A$ increases given that $B$ has been observed.
- Looking at the outcomes in $B$, we can see that observance of $B$ eliminates the possibility of $5$ and $6$ from the first roll.
- This leads to the increase in the probability of $1$ in the first roll.
- We can also see that $P(B | A) = \frac{1}{6} > \frac{1}{9} = P(B)$.
- The probability of $B$ given $A$ has also increased.
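Both verdicts can be replicated by enumeration; changing the number of sides of the die from $4$ to $6$ flips the answer:

```python
from itertools import product
from fractions import Fraction

def independent(sides):
    omega = list(product(range(1, sides + 1), repeat=2))
    P = lambda E: Fraction(len(E), len(omega))
    A = [w for w in omega if w[0] == 1]    # first roll is 1
    B = [w for w in omega if sum(w) == 5]  # sum of rolls is 5
    AB = [w for w in A if w in B]
    return P(AB) == P(A) * P(B)

print(independent(4))  # True: independent on a 4-sided die
print(independent(6))  # False: dependent on a 6-sided die
```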
7.1.8.1. Conditional Independence
Recall that the conditional probability is a probability measure in itself. This enables us to introduce the notion of conditional independence.
Definition (Conditional independence)

Given an event $C$ with $P(C) > 0$, the events $A$ and $B$ are called conditionally independent given $C$ if
$$P(A \cap B | C) = P(A | C) \, P(B | C).$$

Theorem

Let $A$ and $B$ be conditionally independent given $C$, where $P(B \cap C) > 0$. Then
$$P(A | B \cap C) = P(A | C).$$

Proof. Note that $P(C) > 0$ and $P(B | C) > 0$ since $P(B \cap C) > 0$.

By conditional probability definition and multiplication rule, we have
$$P(A \cap B | C) = \frac{P(A \cap B \cap C)}{P(C)} = \frac{P(C) \, P(B | C) \, P(A | B \cap C)}{P(C)} = P(B | C) \, P(A | B \cap C).$$

Comparing this with the expression of conditional independence, we have
$$P(B | C) \, P(A | B \cap C) = P(A | C) \, P(B | C).$$

Eliminating the common term, we have
$$P(A | B \cap C) = P(A | C).$$

It is important to note that:

- Unconditional independence of $A$ and $B$ doesn't imply conditional independence given $C$.
- Conditional independence of $A$ and $B$ given $C$ doesn't imply unconditional independence.
Consider tossing a fair coin twice. Assume that all outcomes are equally likely: $\Omega = \{HH, HT, TH, TT\}$.

- Let $A$ be the event that the first toss is a head: $A = \{HH, HT\}$.
- Let $B$ be the event that the second toss is a head: $B = \{HH, TH\}$.
- It is easy to see that $A$ and $B$ are independent: $P(A \cap B) = \frac{1}{4} = P(A) \, P(B)$.
- Let $C$ be the event that the two tosses have different results: $C = \{HT, TH\}$.
- $P(A | C) = \frac{1}{2}$. $P(B | C) = \frac{1}{2}$. $P(A \cap B | C) = 0 \neq \frac{1}{4} = P(A | C) \, P(B | C)$.
- Clearly, the two events are not conditionally independent.
Let there be two coins. The first coin
is fair while the second coin is unfair
with the probability of head being some $p > \frac{1}{2}$. We pick one of the coins at random (with probability $\frac{1}{2}$ each) and toss it twice.

- Let $C$ be the event that the second (unfair) coin was chosen.
- Let $H_i$ be the event that the $i$-th toss resulted in heads.
- Given the choice of the coin, the two events $H_1$ and $H_2$ are independent: $P(H_1 \cap H_2 | C) = P(H_1 | C) \, P(H_2 | C) = p^2$ and $P(H_1 \cap H_2 | C^c) = P(H_1 | C^c) \, P(H_2 | C^c) = \frac{1}{4}$.
- However, the two events $H_1$ and $H_2$ are not independent unconditionally.
- If the first toss is a head, then the probability that the coin is unfair increases. Consequently, the probability that the second toss will also be a head increases.
- From total probability theorem, we have
$$P(H_1) = P(C) \, P(H_1 | C) + P(C^c) \, P(H_1 | C^c) = \frac{1}{2} p + \frac{1}{2} \cdot \frac{1}{2} = \frac{2p + 1}{4}.$$
- Similarly, $P(H_2) = \frac{2p + 1}{4}$.
- However,
$$P(H_1 \cap H_2) = \frac{1}{2} p^2 + \frac{1}{2} \cdot \frac{1}{4} = P(H_1) \, P(H_2) + \frac{(2p - 1)^2}{16} > P(H_1) \, P(H_2).$$
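A quick numeric check of the last identity for an assumed bias (here $p = \frac{3}{4}$, a hypothetical value):

```python
from fractions import Fraction

p = Fraction(3, 4)   # hypothetical bias of the unfair coin
half = Fraction(1, 2)

# Condition on the coin: the tosses are conditionally independent.
p_h1 = half * p + half * half            # total probability theorem
p_h1h2 = half * p**2 + half * half**2    # conditionally independent given coin

print(p_h1, p_h1 * p_h1, p_h1h2)
# Gap between joint and product is (2p-1)^2 / 16 > 0 for p != 1/2.
assert p_h1h2 - p_h1 * p_h1 == (2 * p - 1) ** 2 / 16
```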
7.1.8.2. Three Events
Naturally, we are interested in extending the notion of independence for multiple events. However this is a bit tricky. We start with the simpler case of three events which can explain the issues involved. Independence means that occurrence or non-occurrence of any event or any pair of events should not provide any information about the occurrence of any other event or pair of events.
Definition (Independence of three events)

Let $A_1$, $A_2$ and $A_3$ be three events. They are said to be independent if

$$P(A_1 \cap A_2) = P(A_1) \, P(A_2),$$
$$P(A_1 \cap A_3) = P(A_1) \, P(A_3),$$
$$P(A_2 \cap A_3) = P(A_2) \, P(A_3),$$
$$P(A_1 \cap A_2 \cap A_3) = P(A_1) \, P(A_2) \, P(A_3).$$

The first three conditions establish the pairwise independence of the three events. However, the fourth condition is necessary and it doesn't follow from the first three conditions. Conversely, the fourth condition doesn't imply the first three conditions.
Let us look at some counterexamples.
Consider the experiment of tossing a fair coin twice with equally likely outcomes: $\Omega = \{HH, HT, TH, TT\}$.

- Let $A_i$ be the event that the $i$-th toss is a head ($i = 1, 2$).
- Let $A_3$ be the event that the two tosses have different results.
- $A_1 = \{HH, HT\}$, $A_2 = \{HH, TH\}$, $A_3 = \{HT, TH\}$.
- $P(A_1) = P(A_2) = P(A_3) = \frac{1}{2}$.
- $P(A_1 \cap A_2) = P(\{HH\}) = \frac{1}{4}$, $P(A_1 \cap A_3) = P(\{HT\}) = \frac{1}{4}$, $P(A_2 \cap A_3) = P(\{TH\}) = \frac{1}{4}$.
- We can see that $A_1, A_2, A_3$ are pairwise independent.
- However, $A_1 \cap A_2 \cap A_3 = \emptyset$ since occurrence of $A_3$ makes $A_1$ and $A_2$ mutually exclusive.
- Hence
$$P(A_1 \cap A_2 \cap A_3) = 0 \neq \frac{1}{8} = P(A_1) \, P(A_2) \, P(A_3).$$
- Hence $A_1, A_2, A_3$ are not independent.
Consider the experiment of tossing a fair die twice with equally likely outcomes.

- The sample space has $36$ possible outcomes.
- Let $A$ be the event that the first roll has the outcomes $1$, $2$ or $3$.
- Let $B$ be the event that the first roll has the outcomes $3$, $4$ or $5$.
- Let $C$ be the event that the sum of two rolls is $9$.
- We can see that
$$P(A) = \frac{18}{36} = \frac{1}{2}, \quad P(B) = \frac{18}{36} = \frac{1}{2}.$$
- We have
$$C = \{(3,6), (4,5), (5,4), (6,3)\}, \quad P(C) = \frac{4}{36} = \frac{1}{9}.$$
- We can see that
$$A \cap B \cap C = \{(3,6)\}.$$
- Hence $P(A \cap B \cap C) = \frac{1}{36}$.
- Clearly,
$$P(A \cap B \cap C) = \frac{1}{36} = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{9} = P(A) \, P(B) \, P(C).$$
- Now $A \cap B$ is the event that the first roll is $3$. $P(A \cap B) = \frac{1}{6}$. $P(A) \, P(B) = \frac{1}{4}$.
- Hence, we have
$$P(A \cap B) \neq P(A) \, P(B), \quad P(A \cap C) = \frac{1}{36} \neq \frac{1}{18} = P(A) \, P(C), \quad P(B \cap C) = \frac{1}{12} \neq \frac{1}{18} = P(B) \, P(C).$$
- None of the pairs $(A, B)$, $(A, C)$, $(B, C)$ are pairwise independent.
- Hence $A, B, C$ are not independent, even though the fourth condition holds.
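Enumerating the $36$ outcomes confirms that the product condition holds for the triple while failing for every pair:

```python
from itertools import product
from fractions import Fraction

omega = list(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), len(omega))

A = [w for w in omega if w[0] in (1, 2, 3)]
B = [w for w in omega if w[0] in (3, 4, 5)]
C = [w for w in omega if sum(w) == 9]

inter = lambda X, Y: [w for w in X if w in Y]

print(P(inter(inter(A, B), C)) == P(A) * P(B) * P(C))  # True
print(P(inter(A, B)) == P(A) * P(B))                   # False
print(P(inter(A, C)) == P(A) * P(C))                   # False
print(P(inter(B, C)) == P(B) * P(C))                   # False
```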
7.1.8.3. $n$ Events

We are now ready to generalize the notion of independence to any finite number of events. The key idea is that information about occurrence or non-occurrence of any subset of events should not provide any information about the occurrence or non-occurrence of the remaining events.

Definition (Independence of $n$ events)

Let $A_1, A_2, \dots, A_n$ be $n$ events. They are said to be independent if
$$P(A_{i_1} \cap A_{i_2} \cap \dots \cap A_{i_k}) = P(A_{i_1}) \, P(A_{i_2}) \cdots P(A_{i_k})$$
for all combinations of indices such that
$$1 \leq i_1 < i_2 < \dots < i_k \leq n.$$

In other words,
$$P\left(\bigcap_{i \in S} A_i\right) = \prod_{i \in S} P(A_i)$$
for every (nonempty) subset $S$ of $\{1, 2, \dots, n\}$.
7.1.8.4. Independent Trials

Definition (Independent trials)

If an experiment involves a sequence of independent but identical stages, we say that we have a sequence of independent trials.

Definition (Bernoulli trials)

If every stage in a sequence of independent trials has only two possible outcomes, we say that we have a sequence of Bernoulli trials.

It is customary to denote the two outcomes
of a Bernoulli trial by $H$ (heads) and $T$ (tails), as in a coin toss, or as success and failure.
Example (Repeated coin tosses)

Consider tossing a coin $n$ times, where the tosses are independent trials.

- The outcomes of each trial are $H$ (head) and $T$ (tail).
- Let $P(H) = p$ and $P(T) = 1 - p$.
- The outcome of $n$ tosses can be described as a string of $n$ letters, each of which is $H$ or $T$.
- There are $2^n$ possible strings. They form the sample space of the compound experiment.
- Let a particular string have $k$ heads and $n - k$ tails.
- Then, by independence of the trials, the probability of this string is given by
$$p^k (1-p)^{n-k}.$$
- There are $\binom{n}{k}$ strings which consist of $k$ heads and $n - k$ tails.
- Each of these strings (singleton events) is mutually exclusive of the others.
- Hence, by the additivity axiom, the probability of having $k$ heads and $n - k$ tails in $n$ trials is given by
$$\binom{n}{k} p^k (1-p)^{n-k}.$$
Definition (Binomial probability)

The probability of $k$ heads in $n$ Bernoulli trials with head probability $p$ is denoted by $p(k)$:
$$p(k) = \binom{n}{k} p^k (1-p)^{n-k},$$
where $\binom{n}{k}$ denotes the binomial coefficient. These values are known as binomial probabilities.
See Enumerative Combinatorics for a quick review of binomial coefficients.
Theorem (Sum of binomial probabilities)

$$\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1.$$

This follows from the normalization axiom; it can also be seen directly from the binomial theorem, since the left hand side equals $(p + (1-p))^n = 1$.
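A short sketch computing binomial probabilities for hypothetical values $n = 10$ and $p = \frac{3}{10}$, and verifying that they sum to $1$:

```python
from fractions import Fraction
from math import comb

n, p = 10, Fraction(3, 10)  # hypothetical trial count and head probability

probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

assert sum(probs) == 1  # binomial probabilities always sum to one
print(max(range(n + 1), key=lambda k: probs[k]))  # most likely head count: 3
```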
7.1.9. Compound Experiments

Often we need to examine the outcomes of different experiments together. Here are some examples:

- Tossing a coin and throwing a die.
- Tossing a coin twice in succession (first and second tosses are separate experiments).

Two or more experiments together form a compound experiment. Repeated trials are an example of a compound experiment.

Definition (Compound experiment)

Let two experiments have sample spaces $\Omega_1$ and $\Omega_2$ respectively. The experiment consisting of performing both of them is known as a compound experiment. Its sample space is the Cartesian product $\Omega = \Omega_1 \times \Omega_2$; each outcome is a pair $(\omega_1, \omega_2)$ with $\omega_1 \in \Omega_1$ and $\omega_2 \in \Omega_2$.
7.1.9.1. Product $\sigma$-Algebra

If $\mathcal{F}_1$ and $\mathcal{F}_2$ are $\sigma$-algebras for the two experiments, we can form subsets of $\Omega_1 \times \Omega_2$ of the type $A \times B$ where $A \in \mathcal{F}_1$ and $B \in \mathcal{F}_2$. Such sets are known as product events.

Let $\mathcal{R} = \{A \times B : A \in \mathcal{F}_1, B \in \mathcal{F}_2\}$ denote the collection of all product events.

However, we can see that the set $\mathcal{R}$
is not a $\sigma$-algebra; e.g., the union of two product events need not be a product event.

Definition (Product $\sigma$-algebra)

Let $(\Omega_1, \mathcal{F}_1)$ and $(\Omega_2, \mathcal{F}_2)$ be two measurable spaces. The product $\sigma$-algebra is the $\sigma$-algebra on $\Omega_1 \times \Omega_2$ generated by the product events:
$$\mathcal{F}_1 \otimes \mathcal{F}_2 = \sigma(\mathcal{R}).$$

The members of a product $\sigma$-algebra are known as compound events.
7.1.9.2. Compound Events

A compound event can be written as a finite (or countable) disjoint union of product events from the two experiments. A finite union looks like
$$E = (A_1 \times B_1) \cup (A_2 \times B_2) \cup \dots \cup (A_n \times B_n)$$
where $A_i \in \mathcal{F}_1$ and $B_i \in \mathcal{F}_2$, and the product events $A_i \times B_i$ are pairwise disjoint.
Example (Compound events as union of product events)

Consider two experiments each of which consists of throwing a die.

- The sample space for both experiments is $\Omega_1 = \Omega_2 = \{1, 2, 3, 4, 5, 6\}$.
- There are $6 \times 6 = 36$ possible outcomes in the compound experiment.
- The compound sample space is given by
$$\Omega = \Omega_1 \times \Omega_2 = \{(i, j) : 1 \leq i \leq 6, \; 1 \leq j \leq 6\}.$$
- As an illustration, the compound event that the sum of the two throws is $7$ can be written as the disjoint union of product events
$$(\{1\} \times \{6\}) \cup (\{2\} \times \{5\}) \cup (\{3\} \times \{4\}) \cup (\{4\} \times \{3\}) \cup (\{5\} \times \{2\}) \cup (\{6\} \times \{1\}).$$
7.1.9.3. Independent Experiments

Definition (Independent experiments)

Two experiments are called independent
if the outcome of one experiment doesn't
depend on the (past, present or future) outcomes
of the other experiment.

Let $P_1$ and $P_2$ be the probability measures associated with the two experiments. For independent experiments, the probability of every product event factorizes:
$$P(A \times B) = P_1(A) \, P_2(B)$$
for every $A \in \mathcal{F}_1$ and $B \in \mathcal{F}_2$.
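For finite sample spaces, the product measure of two independent experiments can be tabulated directly. A minimal sketch with a fair coin and a fair die (the specific events are hypothetical illustrations):

```python
from fractions import Fraction
from itertools import product

# Experiment 1: a fair coin; Experiment 2: a fair die.
P1 = {w: Fraction(1, 2) for w in "HT"}
P2 = {w: Fraction(1, 6) for w in range(1, 7)}

# Product measure on the compound sample space Omega1 x Omega2.
P = {(w1, w2): P1[w1] * P2[w2] for w1, w2 in product(P1, P2)}

assert sum(P.values()) == 1

# Probability of the product event A x B with A = {H}, B = {even rolls}.
A, B = {"H"}, {2, 4, 6}
p_AxB = sum(P[(w1, w2)] for w1, w2 in P if w1 in A and w2 in B)
assert p_AxB == sum(P1[w] for w in A) * sum(P2[w] for w in B)  # = 1/4
print(p_AxB)
```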