7.1. Probability Spaces#

Probability measures the amount of uncertainty in an event. Typically, the probability of an event is expressed as a nonnegative real number ranging between 0 and 1.

  1. When an event has a probability of 1, we are certain that the event will occur.

  2. When an event has a probability of 0, we are certain that the event will not occur.

  3. When we toss a coin, we are not certain whether it will turn heads (H) or tails (T). We normally assign a probability of 0.5 to both H and T outcomes as we believe that both outcomes are equally likely.

  4. When we throw a die, the possible outcomes are {1,2,3,4,5,6}. If we believe that each outcome is equally likely, then we can assign a probability of 1/6 to each outcome.

  5. If we have a fairly reliable digital communication channel, then the probability of error may be as low as 10⁻⁶. In other words, there is a one in a million chance of a transmitted bit flipping during the transmission.

Given a sample space of outcomes, there are two main activities involved:

  1. Assigning a probability to different outcomes or events in a manner that the probabilities are sensible.

  2. Using the laws of probability theory to infer the probabilities of other outcomes or events.

Our notes will be based on the axiomatic treatment of probability. We describe the rules for assigning sensible probabilities to different events. We then develop the theory for computing with the probabilities. We leave out the task of estimating the probabilities of individual events, which is covered extensively in statistics. The foundations of modern probability theory are rooted in measure theory, which is the study of measures. A measure is a generalization of geometric notions like length, area and volume. The probability of a set is also a measure. Although we don't provide extensive coverage of measure theory in these notes, we cover some fundamental concepts like the σ-algebra in our introduction to probability theory.

For an introduction to probability theory, see [8, 64, 68, 69]. For engineering applications, see [72]. For a comprehensive measure theoretic treatment, see [10].

Definition 7.1 (Random experiment)

A random experiment is an experiment in which the outcomes are nondeterministic. In other words, different outcomes can occur each time the experiment is run.

7.1.1. Sample Space#

Definition 7.2 (Sample space)

The sample space associated with a random experiment is the set of all possible outcomes of the experiment.

We shall often denote the sample space by the symbol Ω. Individual outcomes shall often be denoted by ζ.

7.1.2. Sigma Algebras and Fields#

Definition 7.3 (Field, Algebra)

Consider a sample space Ω and a certain collection of subsets of Ω denoted by F. We say that F forms an algebra (in the set theoretic sense) over Ω if it meets the following rules:

  1. ∅ ∈ F.

  2. If A,B ∈ F, then A∪B ∈ F and A∩B ∈ F.

  3. If E ∈ F, then Ω∖E = Eᶜ ∈ F.

In other words, F contains the empty set, and F is closed under the union, intersection and complement operations. The pair (Ω,F) is known as a field.

Note

The term algebra is used here in the sense of Boolean algebra from set theory whose operations include union, intersection and complement. The notion of field here is different from the notion of fields (e.g. R and C) in ring theory. Similarly the term algebra over Ω should not be confused with algebras over fields or rings in ring theory.

Example 7.1 (trivial algebra)

Let Ω be an arbitrary sample space. Then {∅,Ω} is a trivial algebra over Ω.

Example 7.2 (algebras over a set of 4 elements)

Let Ω={1,2,3,4}.

  1. {∅,{1,2,3,4}} is an algebra.

  2. {∅,{1,2},{3,4},{1,2,3,4}} is an algebra.

  3. {∅,{1},{2,3,4},{1,2,3,4}} is an algebra.

  4. The power set consisting of all subsets of Ω is an algebra.

Theorem 7.1 (Properties of an algebra)

Let F be an algebra over Ω. Then

  1. Ω ∈ F.

  2. If A₁,…,Aₙ ∈ F, then A₁∪⋯∪Aₙ ∈ F.

  3. If A₁,…,Aₙ ∈ F, then A₁∩⋯∩Aₙ ∈ F.

  4. If A,B ∈ F, then A∖B ∈ F.

F includes the sample space. F is closed under finite unions and finite intersections. F is closed under set difference.

Proof. (1) Sample space

  1. ∅ ∈ F.

  2. Ω = ∅ᶜ.

  3. Hence Ω ∈ F.

(2) Closure under finite union by mathematical induction

  1. Base case: the statement for n=2 is trivial.

  2. Assume that the statement is true for some n ≥ 2.

  3. Let A₁,…,Aₙ,Aₙ₊₁ ∈ F.

  4. By the inductive hypothesis, A = A₁∪⋯∪Aₙ ∈ F.

  5. Then

    A₁∪⋯∪Aₙ∪Aₙ₊₁ = (A₁∪⋯∪Aₙ)∪Aₙ₊₁ = A∪Aₙ₊₁.
  6. Since both A, Aₙ₊₁ ∈ F, their union is also in F.

  7. Hence, by mathematical induction, F is closed under all finite unions.

(3) Closure under finite intersection

  1. We note that

    A₁∩⋯∩Aₙ = (A₁ᶜ∪⋯∪Aₙᶜ)ᶜ.
  2. Since Aᵢ ∈ F, we have Aᵢᶜ ∈ F for i=1,…,n.

  3. Then A = A₁ᶜ∪⋯∪Aₙᶜ ∈ F due to (2).

  4. Then A₁∩⋯∩Aₙ = Aᶜ ∈ F.

(4) Set difference

  1. We note that A∖B = A∩Bᶜ.

  2. Since B ∈ F, we have Bᶜ ∈ F.

  3. Since A, Bᶜ ∈ F, we have A∩Bᶜ ∈ F.

Example 7.3 (Algebra from a partition)

Let Ω be a sample space. Let A = {A₁,…,Aₙ} be a (finite) partition of Ω. In other words, the sets Aᵢ are pairwise disjoint and their union is Ω. Then the collection F consisting of all unions of the sets Aᵢ (including the empty set, which is the union of zero sets) forms an algebra.

  1. Since there are n sets in the partition A, the number of elements of F is 2ⁿ.

  2. By definition, ∅ and Ω are in F.

  3. Let X,Y ∈ F. Then both X and Y are unions of some members of A.

  4. Hence X∪Y is also a union of some members of A.

  5. Hence F is closed under union.

  6. If X and Y are disjoint, then X∩Y = ∅ ∈ F.

  7. Otherwise, X∩Y is the union of the members of A which are common to both X and Y. Hence X∩Y ∈ F.

  8. Hence F is closed under intersection.

  9. Let X ∈ F. Then X is a union of some members of A.

  10. Then Ω∖X is the union of the remaining members of A.

  11. Hence Ω∖X ∈ F.

  12. Hence F is closed under complement.

  13. Hence F is an algebra.
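
The construction in Example 7.3 is easy to check by brute force on a small finite sample space. The following Python sketch (the function name is ours, purely for illustration) forms all unions of zero or more blocks of a partition and confirms that the resulting collection has 2ⁿ members.

```python
from itertools import combinations

def algebra_from_partition(blocks):
    """Return all unions of zero or more blocks of a finite partition."""
    blocks = [frozenset(b) for b in blocks]
    members = set()
    for k in range(len(blocks) + 1):
        for chosen in combinations(blocks, k):
            members.add(frozenset().union(*chosen))
    return members

# Partition of Omega = {1, 2, 3, 4} into the blocks {1, 2} and {3, 4}.
F = algebra_from_partition([{1, 2}, {3, 4}])
print(sorted(sorted(s) for s in F))  # [[], [1, 2], [1, 2, 3, 4], [3, 4]]
print(len(F) == 2 ** 2)              # True: 2^n members for n blocks
```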

Often, the sample space Ω is an infinite set (e.g., R) and the field of subsets of Ω is also infinite. For example, the Borel field defined over R contains all the open and closed intervals in R. We have to deal with situations which involve countable unions and intersections of sets in the field. Mathematical induction shows that a field is closed under finite unions and intersections but it is not enough to prove that it is also closed under countable unions. To handle such cases, we need to extend the definition of a field.

Definition 7.4 (σ-field, σ-algebra)

Consider an infinite sample space Ω and a certain collection of subsets of Ω denoted by F. We say that F is a σ-algebra over Ω if it is an algebra over Ω and it is closed under countable unions. In other words, if A₁,A₂,… is a countable collection of sets in F, then their union

⋃_{i=1}^{∞} Aᵢ ∈ F.

The pair (Ω,F) is known as a σ-field.

When the sample space is obvious from the context, we will often refer to F itself as a field or a σ-field, as appropriate.

Example 7.4 (Power set)

Let Ω be an arbitrary sample space. Then its power set P(Ω) is a σ-algebra.

Theorem 7.2 (Countable intersection)

Let F be a σ-algebra over Ω. Then F is closed under countable intersections. In other words, if A₁,A₂,… is a countable collection of sets in F, then their intersection

⋂_{i=1}^{∞} Aᵢ ∈ F.

Proof. We use the complement property.

  1. Let A₁,A₂,… be subsets in F.

  2. Then A₁ᶜ,A₂ᶜ,… are also in F.

  3. Then their countable union

    ⋃_{i=1}^{∞} Aᵢᶜ ∈ F.
  4. Taking the complement, we get

    ⋂_{i=1}^{∞} Aᵢ ∈ F

    as desired.

Remark 7.1

Any σ-algebra F of subsets of a sample space Ω lies between the two extremes:

{∅,Ω} ⊆ F ⊆ P(Ω).

7.1.2.1. Generated σ-Algebra#

Definition 7.5 (atom)

An atom of F is a set A ∈ F such that the only subsets of A which are also in F are the empty set ∅ and A itself.

Theorem 7.3 (Intersection of σ-algebras)

Let G = {Gᵢ}_{i∈I} be a nonempty collection of σ-algebras over Ω, where I is some index set. Then their intersection ⋂_{i∈I} Gᵢ is also a σ-algebra.

Proof. Denote F = ⋂_{i∈I} Gᵢ. We shall verify all the properties of a σ-algebra.

Empty Set

  1. Since ∅ ∈ Gᵢ for every i, hence ∅ ∈ F.

Union

  1. Let A,B ∈ F.

  2. Then A,B ∈ Gᵢ for every i.

  3. Hence A∪B ∈ Gᵢ for every i.

  4. Hence A∪B ∈ F.

Intersection

  1. Let A,B ∈ F.

  2. Then A,B ∈ Gᵢ for every i.

  3. Hence A∩B ∈ Gᵢ for every i.

  4. Hence A∩B ∈ F.

Countable union

  1. Let A₁,A₂,… be a countable collection of subsets in F.

  2. Then A₁,A₂,… ∈ Gᵢ for every i.

  3. Since each Gᵢ is a σ-algebra, ⋃_{j=1}^{∞} Aⱼ ∈ Gᵢ for every i.

  4. Hence ⋃_{j=1}^{∞} Aⱼ ∈ F.

Definition 7.6 (σ-algebra generated by a collection of sets)

Let A = {Aᵢ}_{i∈I} be a collection of subsets of Ω, where I is an index set. Let F be the smallest σ-algebra such that Aᵢ ∈ F for every i ∈ I. Then F is called the σ-algebra generated by A and is denoted by σ(A).

  1. Here, by smallest we mean that if there is any other σ-algebra G such that Aᵢ ∈ G for every i ∈ I, then F ⊆ G.

  2. Since the power set P(Ω) is a σ-algebra and it contains all subsets of Ω (including the sets in A), there is always a σ-algebra containing a given collection of sets.

  3. It may not be possible to easily visualize every member of F from the descriptions of the Aᵢ.

We next show that a smallest σ-algebra exists for every collection A.

Theorem 7.4 (Existence of the generated σ-algebra)

Let A be a collection of subsets of Ω. Then there exists a smallest σ-algebra containing A. In other words, there is a σ-algebra generated by A.

Proof. The main issue is to verify that if there is any other σ-algebra, it contains the smallest one. We provide a constructive proof.

  1. Let G = {Gⱼ}_{j∈J} be the collection of all σ-algebras containing all sets of A.

  2. We can see that G is nonempty since P(Ω) ∈ G.

  3. By Theorem 7.3, F = ⋂_{j∈J} Gⱼ is a σ-algebra.

  4. We claim that σ(A) = F.

  5. Since A ⊆ Gⱼ for every j ∈ J, hence A ⊆ F.

  6. By construction, F ⊆ Gⱼ for every j ∈ J. In other words, F is contained in every σ-algebra containing A.

  7. Hence F is indeed the σ-algebra generated by A.

We note that if A itself is a σ-algebra, then of course σ(A)=A.

Example 7.5 (Algebra generated by 2 sets)

Let Ω be an arbitrary sample space. Let A and B be two subsets of Ω which are not necessarily disjoint. We shall construct the algebra F generated by A and B.

  1. Since A ∈ F, hence Aᶜ ∈ F.

  2. Similarly, Bᶜ ∈ F.

  3. Since A and B and their complements are in F, hence A∩B, A∩Bᶜ, Aᶜ∩B, Aᶜ∩Bᶜ ∈ F as F is closed under intersection.

  4. Let us name them E = A∩B, F = A∩Bᶜ, G = Aᶜ∩B and H = Aᶜ∩Bᶜ.

  5. We can see that these four sets E,F,G,H are disjoint and

    Ω = E∪F∪G∪H.
  6. We now have a partition of Ω into 4 disjoint sets. We can follow the lead from Example 7.3 to construct an algebra by forming all the unions of 0 or more sets from the collection {E,F,G,H}.

  7. The empty set is the union of zero of these sets: there is C(4,0) = 1 such union.

  8. There are C(4,1) = 4 of these disjoint subsets themselves.

  9. There are C(4,2) = 6 pairwise unions of these 4 sets.

  10. There are C(4,3) = 4 unions of 3 of these 4 subsets.

  11. There is C(4,4) = 1 union of all the 4 subsets, which is Ω.

  12. A total of 1+4+6+4+1 = 16 = 2⁴ possible subsets are formed.

  13. In particular, note that A = E∪F, B = E∪G, Aᶜ = G∪H and Bᶜ = F∪H.

  14. We can see that this collection of 16 subsets of Ω is an algebra following Example 7.3.

  15. This is indeed the smallest algebra containing A and B, since any algebra containing A and B must contain the sets E,F,G,H and hence all their unions.

  16. Hence it is the algebra generated by A and B.

  17. We can see that E,F,G,H are the atoms of the algebra F.
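
For a finite sample space the generated algebra can also be computed by brute force: repeatedly close a seed collection under complement, union and intersection until a fixed point is reached. The sketch below (function name ours, illustrative only) applies this to two overlapping sets and recovers the 16 members found in Example 7.5.

```python
def generated_algebra(omega, seeds):
    """Close a collection of subsets of a finite omega under
    complement, union and intersection (enough for a finite algebra)."""
    omega = frozenset(omega)
    current = {frozenset(), omega} | {frozenset(s) for s in seeds}
    while True:
        new = set(current)
        for x in current:
            new.add(omega - x)          # complement
            for y in current:
                new.add(x | y)          # union
                new.add(x & y)          # intersection
        if new == current:
            return current
        current = new

# Two overlapping sets in Omega = {1, ..., 6}: A = {1, 2, 3}, B = {3, 4}.
F = generated_algebra(range(1, 7), [{1, 2, 3}, {3, 4}])
print(len(F))  # 16, matching Example 7.5
```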

7.1.2.2. Dynkin π-λ Theorem#

When constructing probability measures (see Probability Measure and Space), it is generally impossible to assign a probability to each subset in a σ-algebra. The Carathéodory extension theorem allows us to define a measure explicitly for only a small collection of simple sets, which may or may not form a σ-algebra (e.g. the atoms inside a σ-algebra), and automatically extends the measure to all other sets in the σ-algebra. The uniqueness claim in the extension theorem makes use of the Dynkin π-λ theorem (below). Readers may skip this subsection in the first reading.

Definition 7.7 (π system)

Let Ω be a sample space. A collection P of subsets of Ω is a π-system if P is closed under finite intersections. In other words, if A,B ∈ P, then A∩B ∈ P.

Definition 7.8 (λ system)

Let Ω be a sample space. A collection L of subsets of Ω is a λ-system if

  1. L contains the empty set: ∅ ∈ L.

  2. L is closed under complements: if A ∈ L, then Aᶜ ∈ L.

  3. L is closed under countable disjoint unions: if A₁,A₂,… ∈ L and Aᵢ∩Aⱼ = ∅ for every i ≠ j, then ⋃_{i=1}^{∞} Aᵢ ∈ L.

Theorem 7.5 (Dynkin πλ theorem)

If P is a π-system and L a λ-system of subsets of Ω with P ⊆ L, then

σ(P) ⊆ L.

In other words, the σ-algebra generated by P is contained in L.

The proof of this result is involved. The overall strategy works as follows:

  1. We shall construct a σ-algebra that lies between P and L.

  2. In particular, we shall construct the set l(P) which is the intersection of all λ-systems containing P.

  3. We then show that l(P) is a λ-system.

  4. We then show that l(P) is also a π-system.

  5. We show that a collection which is both a λ-system and a π-system is also a σ-algebra.

  6. Thus, we claim that l(P) is a σ-algebra.

  7. We finally show that σ(P) ⊆ l(P) ⊆ L.

In order to prove this result, we first prove some of the intermediate steps individually.

Lemma 7.1 (Closedness under proper differences)

A λ-system is closed under proper differences. In other words, if A,B ∈ L where L is a λ-system and A ⊆ B, then the difference B∖A is also in L.

Proof. We note that B∖A = B∩Aᶜ. We shall show this as the complement of a disjoint union of sets.

  1. Since B ∈ L, hence Bᶜ ∈ L.

  2. Since A ⊆ B, hence A and Bᶜ are disjoint.

  3. Hence D = A∪Bᶜ ∈ L since it is a disjoint union.

  4. Hence Dᶜ = Aᶜ∩B ∈ L.

  5. But B∖A = B∩Aᶜ = Dᶜ.

  6. Hence B∖A ∈ L.

Lemma 7.2 (π + λ ⇒ σ)

A family of subsets of Ω which is both a π-system and a λ-system is a σ-algebra.

Proof. Let S be a family of subsets of Ω which is both a π-system and a λ-system.

  1. ∅ ∈ S since it is a λ-system.

  2. S is closed under finite intersections since it is a π-system.

We just need to show that it is closed under countable unions of (not necessarily disjoint) sets.

  1. Let A₁,A₂,… ∈ S.

  2. We shall write ⋃_{i=1}^{∞} Aᵢ as a countable union of disjoint sets.

  3. Let B₁ = A₁.

  4. For n ≥ 2, let

    Bₙ = Aₙ∖(A₁∪A₂∪⋯∪Aₙ₋₁) = Aₙ∩A₁ᶜ∩A₂ᶜ∩⋯∩Aₙ₋₁ᶜ.

    In other words, Bₙ consists of all elements of Aₙ which don't appear in any of the sets A₁,…,Aₙ₋₁.

  5. Then B₁,B₂,… is a sequence of disjoint sets.

  6. We can also see that A₁∪⋯∪Aₙ = B₁∪⋯∪Bₙ for every n.

  7. Hence ⋃_{i=1}^{∞} Aᵢ = ⋃_{i=1}^{∞} Bᵢ.

  8. Since S is a λ-system, hence Aᵢᶜ ∈ S for every i.

  9. Since S is a π-system, hence Bᵢ ∈ S for every i.

  10. Hence ⋃_{i=1}^{∞} Bᵢ ∈ S as it is a countable union of disjoint sets.

Lemma 7.3 (λ-co-systems)

Suppose L is a λ-system of subsets of Ω. For any set A ∈ L, let S_A be the set of all B ⊆ Ω for which A∩B ∈ L. Then S_A is a λ-system.

Proof. Empty set

  1. Since ∅ = A∩∅ ∈ L, hence ∅ ∈ S_A.

Countable union of disjoint sets

  1. Let B₁,B₂,… ∈ S_A be a sequence of disjoint subsets in S_A.

  2. Then A∩Bᵢ ∈ L for every i.

  3. Then ⋃(A∩Bᵢ) ∈ L since the sets A∩Bᵢ are also disjoint.

  4. Also ⋃(A∩Bᵢ) = A∩(⋃Bᵢ).

  5. Since A∩(⋃Bᵢ) ∈ L, hence ⋃Bᵢ ∈ S_A.

  6. Hence S_A is closed under countable unions of disjoint sets.

Complements

  1. Let B ∈ S_A. Then A∩B ∈ L.

  2. Now A∩Bᶜ = A∖B = A∖(A∩B).

  3. Since A ∈ L and A∩B ∈ L and A∩B ⊆ A, hence due to Lemma 7.1, A∖(A∩B) ∈ L.

  4. Since A∩Bᶜ ∈ L, hence Bᶜ ∈ S_A.

  5. Hence S_A is closed under complements.

Thus, S_A is indeed a λ-system.

Lemma 7.4

Let l(P) be the intersection of all λ-systems containing P. Then l(P) is a λ-system.

Proof. We can easily verify that l(P) satisfies all the properties of a λ-system.

  1. Let L = {Lᵢ} be the collection of all λ-systems containing P.

  2. Then l(P) = ⋂Lᵢ.

  3. Since ∅ ∈ Lᵢ for every i, hence ∅ ∈ l(P).

  4. Let A ∈ l(P).

  5. Then A ∈ Lᵢ for every i.

  6. Hence Aᶜ ∈ Lᵢ for every i.

  7. Hence Aᶜ ∈ l(P).

  8. Let A₁,A₂,… ∈ l(P) be a collection of pairwise disjoint sets.

  9. Then A₁,A₂,… ∈ Lᵢ for every i.

  10. Hence ⋃_{j=1}^{∞} Aⱼ ∈ Lᵢ for every i.

  11. Hence ⋃_{j=1}^{∞} Aⱼ ∈ l(P).

Lemma 7.5

Let l(P) be the intersection of all λ-systems containing P. Then l(P) is a π-system.

Proof. The proof uses a bootstrap argument often used in measure theory.

We first show that for any A ∈ P and B ∈ l(P), A∩B ∈ l(P).

  1. Let A ∈ P.

  2. Then A ∈ l(P).

  3. Let S_A be the set of all sets B ⊆ Ω such that A∩B ∈ l(P).

  4. By Lemma 7.3, S_A is a λ-system.

  5. Let B ∈ P. Then A∩B ∈ P since P is a π-system.

  6. But P ⊆ l(P).

  7. Hence A∩B ∈ l(P).

  8. Hence B ∈ S_A.

  9. Hence P ⊆ S_A.

  10. Thus, S_A is a λ-system containing P.

  11. Hence l(P) ⊆ S_A as l(P) is the intersection of all λ-systems containing P.

  12. Thus, for any A ∈ P and for any B ∈ l(P), the intersection A∩B ∈ l(P).

    1. B ∈ l(P) ⟹ B ∈ S_A.

    2. B ∈ S_A ⟹ A∩B ∈ l(P).

We now show that for any A,B ∈ l(P), A∩B ∈ l(P).

  1. Consider any B ∈ l(P).

  2. Let S_B be the set of all sets C ⊆ Ω such that B∩C ∈ l(P).

  3. By the preceding argument, P ⊆ S_B.

    1. Let A ∈ P.

    2. Since A ∈ P and B ∈ l(P), hence A∩B ∈ l(P).

    3. Hence A ∈ S_B.

    4. Hence P ⊆ S_B.

  4. By Lemma 7.3, S_B is a λ-system.

  5. Therefore l(P) ⊆ S_B.

  6. This means that for any A ∈ l(P), the intersection A∩B ∈ l(P).

    1. A ∈ l(P) ⟹ A ∈ S_B.

    2. A ∈ S_B ⟹ B∩A ∈ l(P).

Thus, l(P) is closed under intersections and is indeed a π-system.

We are now ready to prove the Dynkin π-λ theorem (Theorem 7.5).

Proof. Let l(P) be the intersection of all λ-systems containing P.

  1. By hypothesis L is a λ-system containing P.

  2. By definition l(P)L.

  3. By Lemma 7.4, l(P) is a λ-system.

  4. By Lemma 7.5, l(P) is a π-system.

  5. By Lemma 7.2, l(P) is a σ-algebra.

  6. By definition σ(P) is the smallest σ-algebra containing P.

  7. Hence P ⊆ σ(P) ⊆ l(P) ⊆ L.

7.1.2.3. Borel σ-Algebra#

Recall the notions of topology (open sets, closed sets, etc.) on the real line from Topology of Real Line. If we consider the collection of open sets of R, denoted by O, then it is clear that it is not a σ-algebra as it is not closed under complements. We are interested in the σ-algebra generated by O. By Theorem 7.4 we know that such a σ-algebra exists.

Definition 7.9 (Borel σ-algebra)

The σ-algebra generated by the open sets of the real line R is known as the Borel σ-algebra of R and is denoted by B. In other words, if O is the collection of all open subsets of R, then

B=σ(O).

The pair (R,B) is known as the Borel field. The members of the Borel σ-algebra are known as Borel sets.

Since B is a σ-algebra, it contains all open sets, all closed sets, all countable unions of closed sets, and all countable intersections of open sets. There are other subsets of the real line which are not included in the Borel σ-algebra. However, they don't happen to be of much engineering and scientific interest.

Example 7.6 (Examples of Borel sets)

Following are some examples of Borel sets:

  1. {x} for any x ∈ R.

  2. The set of rational numbers.

  3. Any countable subset of R.

  4. The intervals (0,1), [0,1], (0,1], [−1,0).

  5. [1,2]∪[4,5].

Definition 7.10 (Gδ-set)

A countable intersection of open sets is known as a Gδ-set.

Definition 7.11 (Fσ-set)

A countable union of closed sets is known as an Fσ-set.

We recall that a countable intersection of open sets need not be open and a countable union of closed sets need not be closed. However B contains all the Gδ and Fσ sets since it is a σ-algebra.

We haven't yet shown that B ≠ 2^R; in other words, that there exist non-Borel subsets of R. There are other characterizations of the Borel σ-algebra which provide us with a better understanding of its structure.

Theorem 7.6 (Generation from one-sided closed intervals)

The Borel σ-algebra B is generated by intervals of the form (−∞,a], where a ∈ Q is a rational number.

Proof.

  1. Let O₀ denote the collection of all open intervals.

  2. By Theorem 2.1, every open set in R is an at most countable union of open intervals.

  3. Hence σ(O₀) = B.

  4. Let D denote the collection of all intervals of the form (−∞,a], a ∈ Q.

  5. Let (a,b) ∈ O₀ for some a < b.

  6. Let aₙ be a rational number in (a, a + 1/n). We can see that aₙ → a as n → ∞.

  7. Let bₙ be a rational number in (b − 1/n, b). We can see that bₙ → b as n → ∞.

  8. Thus,

    (a,b) = ⋃_{n=1}^{∞} (aₙ,bₙ] = ⋃_{n=1}^{∞} {(−∞,bₙ] ∩ (−∞,aₙ]ᶜ}.
  9. Hence (a,b) ∈ σ(D).

  10. Hence O₀ ⊆ σ(D).

  11. Hence σ(O₀) ⊆ σ(D).

  12. However, every element of D is a closed set.

  13. Hence σ(D) ⊆ B.

  14. We have

    B = σ(O₀) ⊆ σ(D) ⊆ B.
  15. Hence B = σ(D).

7.1.3. Events#

Definition 7.12 (Event)

An event is a subset of the sample space of a random experiment that belongs to some σ-algebra defined on it. Let Ω be the sample space of a random experiment. Let F be a σ-algebra of subsets of Ω. Then every member of F is an event.

Note

Events are the subsets of the sample space to which probabilities can be assigned. For finite sample spaces, any subset can be an event. For infinite sample spaces, it may not be possible to meaningfully assign probabilities to every possible subset. We need the notion of a σ-algebra which is a collection of subsets of the sample space satisfying closure under countable unions, intersections and complements. The subsets belonging to a σ-algebra can be assigned probabilities and are events.

We can translate the set-theoretic language to the language of events as follows. Let A,B be two different events.

  1. A doesn't occur is denoted by Aᶜ.

  2. Either A or B occurs is denoted by A∪B.

  3. Both A and B occur is denoted by A∩B.

  4. A occurs and B doesn't occur is denoted by A∖B. This can also be denoted as A∩Bᶜ.

  5. The events A and B are exhaustive if Ω = A∪B. In particular, A∪Aᶜ = Ω.

  6. The events A and B are exclusive if A∩B = ∅.

Definition 7.13 (Elementary event)

An event consisting of only one outcome is called a singleton event or an elementary event.

Definition 7.14 (Certain event)

The sample space Ω is known as the certain event.

Since Ω contains all possible outcomes of an experiment, hence this event always occurs whenever the experiment is run.

Definition 7.15 (Null event)

The empty set ∅ is known as the null event.

The null event never occurs.

Definition 7.16 (Mutually exclusive events)

Let E and F be two events. If E and F are disjoint sets then we say that the two events are mutually exclusive.

7.1.4. Probability Measure and Space#

We next provide an axiomatic definition of a probability measure. Note that we will often write a joint event (the intersection of two events) as AB rather than A∩B.

Definition 7.17 (Probability measure)

Let Ω be a sample space and let F be a σ-algebra of subsets of Ω. A probability measure is a set function P : F → R that assigns to every event E ∈ F a real number P(E), called the probability of the event E, satisfying the following rules:

  1. Nonnegativity: P(E) ≥ 0.

  2. Unit measure or normalization: P(Ω) = 1.

  3. Additivity: P(E∪F) = P(E) + P(F) if E∩F = ∅.

We can write it in the form of axioms (first introduced by Andrey Kolmogorov).

Axiom 7.1 (First axiom: nonnegativity)

The probability of an event is a nonnegative real number.

P(E) ∈ R, P(E) ≥ 0 for every E ∈ F.

This axiom implies that the probability is always finite.

Axiom 7.2 (Second axiom: unit measure)

P(Ω)=1.

Axiom 7.3 (Third axiom: additivity)

Let E,F ∈ F be two disjoint sets. Then

P(E∪F) = P(E) + P(F).
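
For a finite sample space with the power set as the σ-algebra, these axioms are straightforward to realize concretely. Below is a minimal Python sketch (our own illustration) that assigns a probability to each outcome of a fair die and computes the probability of any event by summing over its outcomes.

```python
from fractions import Fraction

# A fair die: probability 1/6 for each outcome; the power set is the sigma-algebra.
omega = {1, 2, 3, 4, 5, 6}
p_outcome = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """P(E) = sum of the probabilities of the outcomes in E (finite additivity)."""
    return sum(p_outcome[w] for w in event)

print(prob(omega))        # 1       (unit measure)
print(prob({2, 4, 6}))    # 1/2     (event: an even number)
print(prob(set()) == 0)   # True    (null event)
# Additivity for disjoint events:
print(prob({1, 2}) + prob({5}) == prob({1, 2, 5}))  # True
```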

7.1.4.1. Probability Space#

Definition 7.18 (Probability space)

A sample space endowed with a σ-algebra and a probability measure is known as a probability space. In other words, let Ω be the sample space of a random experiment, let F be a σ algebra of subsets of Ω and let P be a probability measure defined on F. Then the triplet (Ω,F,P) is known as a probability space.

We next establish some basic facts about a probability measure.

7.1.4.2. Basic Properties#

Theorem 7.7 (Properties of a probability measure)

Let (Ω,F,P) be a probability space. Then the events contained in F satisfy the following properties.

  1. Probability of the null event: P(∅) = 0.

  2. P(E∩Fᶜ) = P(E) − P(E∩F).

  3. Complement rule: P(E) = 1 − P(Eᶜ).

  4. Sum rule: P(E∪F) = P(E) + P(F) − P(E∩F).

  5. Monotonicity: If E ⊆ F, then

    P(E) ≤ P(F).
  6. Numeric bound:

    0 ≤ P(E) ≤ 1.
  7. Finite additivity: For any positive integer n, we have

    P(⋃_{i=1}^{n} Eᵢ) = ∑_{i=1}^{n} P(Eᵢ)

    if E₁,E₂,…,Eₙ are pairwise disjoint events.

Proof. (1)

  1. ∅ and Ω are disjoint.

  2. Hence

    P(Ω) = P(∅∪Ω) = P(∅) + P(Ω).
  3. This simplifies to

    1 = P(∅) + 1.
  4. Hence P(∅) = 0.

(2)

  1. Recall that E∩Fᶜ and E∩F are disjoint sets with (E∩Fᶜ)∪(E∩F) = E.

  2. Hence

    P(E) = P(E∩Fᶜ) + P(E∩F).
  3. Hence

    P(E∩Fᶜ) = P(E) − P(E∩F).

(3)

  1. E and Eᶜ are disjoint with E∪Eᶜ = Ω.

  2. Hence

    1 = P(Ω) = P(E∪Eᶜ) = P(E) + P(Eᶜ).
  3. Hence P(E) = 1 − P(Eᶜ).

(4)

  1. Recall that we can split E∪F into disjoint sets

    E∪F = (E∩Fᶜ) ∪ (E∩F) ∪ (Eᶜ∩F).
  2. By additivity and property (2), we have

    P(E∪F) = P(E∩Fᶜ) + P(E∩F) + P(Eᶜ∩F) = [P(E) − P(E∩F)] + P(E∩F) + [P(F) − P(E∩F)] = P(E) + P(F) − P(E∩F).

(5)

  1. Since E ⊆ F, we have F = (F∩E) ∪ (F∩Eᶜ) = E ∪ (F∩Eᶜ).

  2. E and F∩Eᶜ are disjoint.

  3. Then

    P(F) = P(E ∪ (F∩Eᶜ)) = P(E) + P(F∩Eᶜ).
  4. By nonnegativity, P(F∩Eᶜ) ≥ 0.

  5. Hence P(E) ≤ P(F).

(6)

  1. We have P(E) = 1 − P(Eᶜ).

  2. But P(Eᶜ) ≥ 0.

  3. Hence P(E) ≤ 1.

(7)

  1. The statement is trivially true for n=1.

  2. The statement is true for n=2 by the additivity rule.

  3. Assume that the statement is true for some k ≥ 2.

  4. In other words, for every collection of pairwise disjoint events E₁,…,Eₖ, we have

    P(⋃_{i=1}^{k} Eᵢ) = ∑_{i=1}^{k} P(Eᵢ).
  5. Let E₁,E₂,…,Eₖ,Eₖ₊₁ be a collection of k+1 pairwise disjoint events. Define E = ⋃_{i=1}^{k} Eᵢ.

  6. We have E∩Eₖ₊₁ = ∅.

  7. Then

    P(⋃_{i=1}^{k+1} Eᵢ) = P(E∪Eₖ₊₁) = P(E) + P(Eₖ₊₁) = P(⋃_{i=1}^{k} Eᵢ) + P(Eₖ₊₁) = ∑_{i=1}^{k} P(Eᵢ) + P(Eₖ₊₁) = ∑_{i=1}^{k+1} P(Eᵢ).
  8. By the principle of mathematical induction, the statement is true for every n.

Note

We assign probabilities to events and not to individual outcomes of a random experiment. If the sample space is finite or countable, often it is convenient to assign probabilities to individual outcomes. One should treat this as assignment of probability to the event consisting of a single outcome; a singleton event.

In the following, we shall assume that a probability space (Ω,F,P) has been given and all events are contained in F.

Theorem 7.8 (Union of three events)

Let A,B,C be three events. Then

P(A∪B∪C) = P(A) + P(B) + P(C) − P(A∩B) − P(B∩C) − P(A∩C) + P(A∩B∩C).

Proof. Define D = B∪C.

  1. Then

    P(A∪B∪C) = P(A∪D) = P(A) + P(D) − P(A∩D).
  2. Further,

    P(D) = P(B∪C) = P(B) + P(C) − P(B∩C).
  3. Note that A∩D = (A∩B)∪(A∩C).

  4. Also (A∩B)∩(A∩C) = A∩B∩C.

  5. Hence

    P(A∩D) = P((A∩B)∪(A∩C)) = P(A∩B) + P(A∩C) − P(A∩B∩C).
  6. Putting these back, we get

    P(A∪B∪C) = P(A) + [P(B) + P(C) − P(B∩C)] − [P(A∩B) + P(A∩C) − P(A∩B∩C)] = P(A) + P(B) + P(C) − P(A∩B) − P(B∩C) − P(A∩C) + P(A∩B∩C).

7.1.4.3. Inclusion-Exclusion Principle#

Theorem 7.8 can be extended to the union of n events. This is known as the inclusion-exclusion principle.

Theorem 7.9 (Inclusion-exclusion principle)

Let A₁,A₂,…,Aₙ be n events in a probability space (Ω,F,P). Then

P(⋃_{i=1}^{n} Aᵢ) = S₁ − S₂ + S₃ − ⋯ + (−1)ⁿ⁺¹ Sₙ = ∑_{k=1}^{n} (−1)ᵏ⁺¹ Sₖ

where Sₖ is the sum of the probabilities of all k-fold intersections among the sets A₁,…,Aₙ. In particular,

S₁ = ∑_{i=1}^{n} P(Aᵢ), S₂ = ∑_{1≤i<j≤n} P(Aᵢ∩Aⱼ), S₃ = ∑_{1≤i<j<k≤n} P(Aᵢ∩Aⱼ∩Aₖ), …, Sₙ = P(A₁∩A₂∩⋯∩Aₙ).

In general, for every k ∈ {1,…,n}, we can write

Sₖ = ∑_{1≤i1<i2<⋯<ik≤n} P(A_{i1}∩A_{i2}∩⋯∩A_{ik}).

It is known as the inclusion-exclusion principle since S₁ is included, then S₂ is excluded, then S₃ is included, and so on.

Proof. The proof is based on mathematical induction.
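
Since the induction is not spelled out here, a quick numerical check may be reassuring. The following sketch (illustrative; the events are arbitrary) works in a finite equally likely sample space and confirms that the alternating sum of the Sₖ terms reproduces the probability of the union for three events.

```python
from itertools import combinations
from fractions import Fraction
from functools import reduce

omega = set(range(1, 21))                  # 20 equally likely outcomes

def prob(E):
    return Fraction(len(E), len(omega))

A = [set(range(1, 11)), set(range(5, 16)), {2, 4, 6, 8, 10, 12}]   # three events

# Left-hand side: P(A1 U A2 U A3) computed directly.
lhs = prob(set().union(*A))

# Right-hand side: sum_k (-1)^(k+1) S_k, where S_k sums P over k-fold intersections.
rhs = Fraction(0)
for k in range(1, len(A) + 1):
    S_k = sum(prob(reduce(set.intersection, sub)) for sub in combinations(A, k))
    rhs += (-1) ** (k + 1) * S_k

print(lhs, rhs, lhs == rhs)   # 3/4 3/4 True
```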

7.1.4.4. Boole’s Inequality#

Theorem 7.10 (Boole’s inequality)

Let A₁,A₂,…,Aₙ be a finite collection of events. Then we have

P(⋃_{i=1}^{n} Aᵢ) ≤ ∑_{i=1}^{n} P(Aᵢ).

Proof. We prove it using induction.

  1. For n=1, obviously

    P(A₁) ≤ P(A₁).
  2. Assume the inequality is true for a collection of n events for some n ≥ 1. In other words,

    P(⋃_{i=1}^{n} Aᵢ) ≤ ∑_{i=1}^{n} P(Aᵢ).
  3. Since

    P(A∪B) = P(A) + P(B) − P(A∩B),

    hence

    P(⋃_{i=1}^{n+1} Aᵢ) = P(⋃_{i=1}^{n} Aᵢ) + P(Aₙ₊₁) − P((⋃_{i=1}^{n} Aᵢ) ∩ Aₙ₊₁).
  4. Since

    P((⋃_{i=1}^{n} Aᵢ) ∩ Aₙ₊₁) ≥ 0,

    hence

    P(⋃_{i=1}^{n+1} Aᵢ) ≤ P(⋃_{i=1}^{n} Aᵢ) + P(Aₙ₊₁) ≤ ∑_{i=1}^{n+1} P(Aᵢ).

7.1.4.5. Countable Additivity#

Often, we need to work with problems where we need to estimate the probability of a countable union of events. The basic axioms of a probability measure are unable to handle this. We need one more axiom that a probability measure must satisfy.

Axiom 7.4 (Fourth axiom: countable additivity)

Let E₁,E₂,… be a (countable) sequence of mutually exclusive events (disjoint sets). Then

P(⋃_{i=1}^{∞} Eᵢ) = ∑_{i=1}^{∞} P(Eᵢ).

7.1.5. Joint Probability#

Definition 7.19 (Joint probability)

Let A and B be two different events. Then the joint probability of the events A and B is the probability that the two events occur together and is given by P(A∩B).

Similarly, let {Aᵢ}_{i∈I} be a collection of events indexed by the set I. Then their joint probability is given by

P(⋂_{i∈I} Aᵢ).

In other words, it is the probability of every event happening together.

7.1.6. Conditional Probability#

Conditional probability provides us the means to reason about experiments with partial information. If an event is known to have happened, then the probabilities of other events associated with an experiment change. Here are some examples:

  1. If a cancer test is 90% reliable and it turns positive for a person then the probability of the person having cancer increases dramatically.

  2. One can analyze a corpus of English literature to estimate the probabilities of a letter coming after another. Given that a letter t has appeared as the first letter of the word, the probability that the next letter will be h is higher than the general probability of h being the second letter of a word.

  3. If a die is rolled twice successively and we are told that the sum of the two rolls is 3, then the probability that the first roll is 1 is 0.5.

One point to note is that conditional probability doesn't establish any chronological order between events. It merely describes how the probabilities change based on partial information about an experiment.

Definition 7.20 (Conditional probability)

Let A and B be two events. Assume that P(A) > 0. The conditional probability of the event B given that the event A has happened is denoted by P(B|A). It is defined as

P(B|A) ≜ P(A∩B) / P(A).

Note that the conditional probability is not defined if P(A) = 0.

By definition, we can see that

P(A∩B) = P(B|A) P(A) = P(A|B) P(B).

Example 7.7

Consider an experiment of tossing a coin 3 times.

  1. The sample space is

    Ω={HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}.
  2. Assume that all the outcomes are equally likely. Each outcome has probability 1/8.

  3. Let A denote the event that the first toss is a head. We have

    A = {HHH,HHT,HTH,HTT}.
  4. We can see that P(A) = 1/2.

  5. Let B be the event that more heads than tails come up in the three tosses. We have

    B = {HHH,HHT,HTH,THH}.
  6. We can see that P(B) = 1/2.

  7. If the first outcome is a head, then the probability that more heads than tails come up will increase.

  8. Let us first check the event A∩B. We have

    A∩B = {HHH,HHT,HTH}.
  9. Hence P(A∩B) = 3/8.

  10. Then the probability that more heads than tails come up given that the first toss is a head is given by

    P(B|A) = (3/8)/(1/2) = 3/4.
  11. We can also compute the probability that the first toss is a head given that more heads than tails come up as

    P(A|B) = (3/8)/(1/2) = 3/4.
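
The numbers in Example 7.7 can be reproduced by enumerating the sample space directly. The short Python sketch below (names ours) lists the 8 outcomes, builds the events A and B and recovers P(B|A) = P(A|B) = 3/4.

```python
from fractions import Fraction
from itertools import product

omega = [''.join(t) for t in product('HT', repeat=3)]   # 8 equally likely outcomes

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 'H'}                  # first toss is a head
B = {w for w in omega if w.count('H') > w.count('T')}  # more heads than tails

print(prob(A), prob(B), prob(A & B))   # 1/2 1/2 3/8
print(prob(A & B) / prob(A))           # 3/4 = P(B|A)
print(prob(A & B) / prob(B))           # 3/4 = P(A|B)
```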

We should verify that the conditional probability as defined above satisfies the axioms of probability.

Theorem 7.11

The conditional probability is a probability measure.

Proof. (Nonnegativity) By definition, it is a ratio of nonnegative quantities. Hence, it is nonnegative.

(Normalization) We can see that

P(Ω|A) = P(A∩Ω)/P(A) = P(A)/P(A) = 1.

(Additivity)

  1. Let B₁ and B₂ be disjoint events.

  2. Then A∩B₁ and A∩B₂ are also disjoint events.

  3. Hence

    P(B₁∪B₂|A) = P(A∩(B₁∪B₂))/P(A) = P((A∩B₁)∪(A∩B₂))/P(A) = [P(A∩B₁) + P(A∩B₂)]/P(A) = P(A∩B₁)/P(A) + P(A∩B₂)/P(A) = P(B₁|A) + P(B₂|A).

The argument for countable additivity is similar.

We note that

P(A|A) = P(A∩A)/P(A) = P(A)/P(A) = 1.

7.1.6.1. Properties#

Since P(B|A) is a valid probability measure, all the properties of a probability measure are applicable for the conditional probability also.

Theorem 7.12 (Properties of a conditional probability measure)

Let (Ω,F,P) be a probability space. Let all probabilities be conditioned on an event A. Then the following properties hold:

  1. P(∅|A) = 0.

  2. P(A|A) = 1.

  3. P(E∩Fᶜ|A) = P(E|A) − P(E∩F|A).

  4. P(E|A) = 1 − P(Eᶜ|A).

  5. P(E∪F|A) = P(E|A) + P(F|A) − P(E∩F|A).

  6. If E ⊆ F, then

    P(E|A) ≤ P(F|A).
  7. For any positive integer n, we have

    P(⋃_{i=1}^{n} Eᵢ|A) = ∑_{i=1}^{n} P(Eᵢ|A)

    if E₁,E₂,…,Eₙ are pairwise disjoint events.

  8. Union of three events:

    P(B∪C∪D|A) = P(B|A) + P(C|A) + P(D|A) − P(B∩C|A) − P(C∩D|A) − P(B∩D|A) + P(B∩C∩D|A).
  9. Let B₁,B₂,…,Bₙ be a finite collection of events. Then we have

    P(⋃_{i=1}^{n} Bᵢ|A) ≤ ∑_{i=1}^{n} P(Bᵢ|A).

The proofs are similar to those of Theorem 7.7 and the related results above.

Note

Since P(A|A) = 1, one can see that all of the conditional probability is concentrated on the outcomes in A. Thus, we might as well discard the outcomes in Aᶜ and treat the conditional probabilities as a probability measure on the new sample space A.

7.1.6.2. Multiplication Rule#

Theorem 7.13 (Multiplication rule)

Let A₁,A₂,…,Aₙ be a finite collection of events with P(A₁∩A₂∩⋯∩Aₙ₋₁) > 0, and let A be the event which occurs if and only if each of these events occurs. In other words,

A = A₁∩A₂∩⋯∩Aₙ.

Then

P(A) = P(A₁) P(A₂|A₁) P(A₃|A₁∩A₂) ⋯ P(Aₙ|A₁∩A₂∩⋯∩Aₙ₋₁).

Proof. We can see that

P(A) = P(A₁∩A₂∩⋯∩Aₙ) = P(A₁) · [P(A₁∩A₂∩⋯∩Aₙ)/P(A₁)] = P(A₁) · [P(A₁∩A₂)/P(A₁)] · [P(A₁∩A₂∩⋯∩Aₙ)/P(A₁∩A₂)] = ⋯ = P(A₁) · [P(A₁∩A₂)/P(A₁)] · [P(A₁∩A₂∩A₃)/P(A₁∩A₂)] ⋯ [P(A₁∩A₂∩⋯∩Aₙ)/P(A₁∩A₂∩⋯∩Aₙ₋₁)] = P(A₁) P(A₂|A₁) P(A₃|A₁∩A₂) ⋯ P(Aₙ|A₁∩A₂∩⋯∩Aₙ₋₁).

Example 7.8

Draw 4 cards from a deck of 52 cards. What is the probability that none of them is a spade?

  1. Define Aᵢ as the event that the i-th card is not a spade.

  2. Our event of interest is A = A₁∩A₂∩A₃∩A₄.

  3. There are 13 cards of the suit spade. There are 39 other cards.

  4. P(A₁) = 39/52.

  5. Given that A₁ has happened (i.e., the first card is not a spade), we are left with 51 cards out of which 38 are not spades.

  6. Hence P(A₂|A₁) = 38/51.

  7. The probability that the third card is not a spade is P(A₃|A₁∩A₂) = 37/50.

  8. The probability that the fourth card is not a spade is P(A₄|A₁∩A₂∩A₃) = 36/49.

  9. Applying the multiplication rule,

    P(A) = P(A₁∩A₂∩A₃∩A₄) = P(A₁) P(A₂|A₁) P(A₃|A₁∩A₂) P(A₄|A₁∩A₂∩A₃) = (39/52)·(38/51)·(37/50)·(36/49).

Another way to calculate this is by counting the number of ways four cards can be chosen.

  1. The total number of ways four cards can be chosen is C(52,4).

  2. The total number of ways four cards can be chosen which are not spades is C(39,4).

  3. Hence the probability that none of the four cards is a spade is

    C(39,4)/C(52,4) = [39!/(4!·35!)]·[4!·48!/52!] = (39·38·37·36)/(52·51·50·49).
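
Both routes to the answer are easy to verify numerically. The sketch below (illustrative) evaluates the product given by the multiplication rule and the counting ratio C(39,4)/C(52,4) and confirms that they agree.

```python
from fractions import Fraction
from math import comb

# Multiplication rule: P(A1) P(A2|A1) P(A3|A1 A2) P(A4|A1 A2 A3)
chain = Fraction(39, 52) * Fraction(38, 51) * Fraction(37, 50) * Fraction(36, 49)

# Counting argument: choose 4 cards out of the 39 non-spades.
counting = Fraction(comb(39, 4), comb(52, 4))

print(chain, counting, chain == counting)   # both equal 6327/20825 (about 0.304)
```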

7.1.6.3. Marginal Probability#

Definition 7.21 (Marginal probability)

Let A and B be two events. The marginal probability of the event A is the probability P(A) which is not conditioned on the event B.

7.1.6.4. Total Probability Theorem#

Theorem 7.14 (Total probability theorem)

Let A₁,…,Aₙ be disjoint events that form a partition of the sample space Ω. Assume that P(Aᵢ) > 0 for every i. Then for any event B we have

P(B) = P(A₁∩B) + ⋯ + P(Aₙ∩B) = P(A₁) P(B|A₁) + ⋯ + P(Aₙ) P(B|Aₙ).

Proof.

  1. Since A₁,…,Aₙ are disjoint, so are A₁∩B,…,Aₙ∩B.

  2. We have

    B = (A₁∩B) ∪ ⋯ ∪ (Aₙ∩B).
  3. Applying the additivity axiom,

    P(B) = P(A₁∩B) + ⋯ + P(Aₙ∩B).
  4. Applying the definition of conditional probability, we have

    P(B) = P(A₁) P(B|A₁) + ⋯ + P(Aₙ) P(B|Aₙ).

7.1.7. Bayes Rule#

The most famous application of the total probability theorem is Bayes' rule. It relates conditional probabilities of the form P(A|B) to conditional probabilities of the form P(B|A), in which the order of conditioning is reversed.

Theorem 7.15 (Bayes rule)

Let A₁,…,Aₙ be disjoint events that form a partition of the sample space Ω. Assume that P(Aᵢ) > 0 for every i. Let B be another event such that P(B) > 0. Then we have

P(Aᵢ|B) = P(Aᵢ) P(B|Aᵢ) / P(B) = P(Aᵢ) P(B|Aᵢ) / [P(A₁) P(B|A₁) + ⋯ + P(Aₙ) P(B|Aₙ)].

Proof.

  1. By the definition of conditional probability, we have

    P(Aᵢ∩B) = P(Aᵢ|B) P(B)

    and

    P(Aᵢ∩B) = P(B|Aᵢ) P(Aᵢ).
  2. Hence

    P(Aᵢ|B) P(B) = P(B|Aᵢ) P(Aᵢ).
  3. Dividing both sides by P(B), we get

    P(Aᵢ|B) = P(Aᵢ) P(B|Aᵢ) / P(B).
  4. Expanding P(B) via the total probability theorem, we get the desired result.

Remark 7.2 (Statistical inference)

Bayes’ rule is a key tool in the field of statistical inference.

  1. We conduct an experiment where we can observe an effect which may be due to a number of causes.

  2. The events A₁,…,Aₙ denote the causes, which may not be observable directly.

  3. The event B is the effect which can be observed.

  4. The probability P(B|Aᵢ) models the relationship between the cause Aᵢ and the effect B. It represents the likelihood of B happening given that Aᵢ has happened.

  5. We are often interested in knowing the probability of Aᵢ given that B has been observed.

  6. This is an inference since the events Aᵢ cannot be observed directly.

  7. The probability P(Aᵢ|B) is known as the posterior probability of the event Aᵢ given that B has happened.

  8. This is distinguished from the unconditional/marginal probability P(Aᵢ), which is the probability of the event Aᵢ without any information about the event B.

  9. P(Aᵢ) is known as the prior probability of Aᵢ.
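
This inference pattern is a direct application of Theorem 7.15. The sketch below (the numbers are hypothetical and chosen only for illustration) treats "disease"/"no disease" as the unobservable causes, a positive test as the observed effect, and computes the posterior from the priors and the likelihoods, with P(B) obtained from the total probability theorem.

```python
def bayes_posterior(priors, likelihoods):
    """Posterior P(A_i | B) for a partition A_1..A_n with priors P(A_i)
    and likelihoods P(B | A_i); P(B) comes from the total probability theorem."""
    p_b = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / p_b for p, l in zip(priors, likelihoods)]

# Hypothetical numbers: 1% of the population has the disease (A1);
# the test is positive for 90% of sick people and for 5% of healthy people.
priors = [0.01, 0.99]        # P(A1), P(A2)
likelihoods = [0.90, 0.05]   # P(B|A1), P(B|A2), where B = "test is positive"

posterior = bayes_posterior(priors, likelihoods)
print(posterior[0])   # about 0.154: a positive test is far from a sure diagnosis
```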

7.1.8. Independence#

The conditional probability P(A|B) captures the information that occurrence of the event B provides about the event A. In many cases, it may happen that observing B provides us no information about the event A. Consequently, the probability P(A) is not altered when conditioning on the event B. Then, we say that the events A and B are independent.

Definition 7.22 (Independence of two events)

Let A and B be two events. We say that A and B are independent if

P(A∩B) = P(A) P(B).

Independence is a symmetric property. We can see that if A is independent of B then B is independent of A.

It follows that for independent events

  1. If P(A) > 0, then

    P(B|A) = P(B).
  2. And if P(B) > 0, then

    P(A|B) = P(A).

Example 7.9

Consider the problem of rolling a 4-sided die twice. Assume that all outcomes are equally likely.

  1. Let A be the event that the first roll is 1.

  2. Let B be the event that the sum of two rolls is 5.

  3. We have A={(1,1),(1,2),(1,3),(1,4)}.

  4. We have B={(1,4),(2,3),(3,2),(4,1)}.

  5. Number of outcomes is 16.

  6. P(A) = 1/4.

  7. P(B) = 1/4.

  8. We can see that A∩B = {(1,4)}.

  9. Then P(A∩B) = 1/16.

  10. We can see that P(A∩B) = P(A) P(B).

  11. The two events A and B are independent.

  12. On the surface, it may seem that there should be dependence between the events A and B but it is not so.

  13. Since the outcomes in the event B are equally likely, we can carefully examine the list of outcomes in the event B to see that observing the event B gives us no additional information about whether the first roll will give us 1.

Now consider the problem of rolling a 6-sided die twice with the same events as before.

  1. We have A={(1,1),(1,2),(1,3),(1,4),(1,5),(1,6)}.

  2. We have B={(1,4),(2,3),(3,2),(4,1)}.

  3. Number of outcomes is 36.

  4. P(A) = 1/6.

  5. P(B) = 1/9.

  6. We can see that A∩B = {(1,4)}.

  7. Then P(A∩B) = 1/36.

  8. Clearly, P(A∩B) ≠ P(A) P(B).

  9. The two events are dependent.

  10. In fact, P(A|B) = 1/4. In other words, the probability of A increases given that B has been observed.

  11. Looking at the outcomes in B, we can see that observance of B eliminates the possibility of 5 and 6 from the first roll.

  12. This leads to the increase in the probability of 1 in the first roll.

  13. We can also see that P(B|A) = 1/6.

  14. The probability of B given A has also increased.
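
Both parts of Example 7.9 reduce to comparing P(A∩B) with P(A)P(B). The small Python sketch below (illustrative) runs this check for the 4-sided and the 6-sided die.

```python
from fractions import Fraction
from itertools import product

def check(sides):
    omega = list(product(range(1, sides + 1), repeat=2))   # two rolls of the die
    def prob(E):
        return Fraction(len(E), len(omega))
    A = {w for w in omega if w[0] == 1}      # first roll is 1
    B = {w for w in omega if sum(w) == 5}    # the two rolls sum to 5
    return prob(A & B), prob(A) * prob(B)

pab, papb = check(4)
print(pab, papb, pab == papb)   # 1/16 1/16 True  -> independent
pab, papb = check(6)
print(pab, papb, pab == papb)   # 1/36 1/54 False -> dependent
```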

7.1.8.1. Conditional Independence#

Recall that the conditional probability is a probability measure in itself. This enables us to introduce the notion of conditional independence.

Definition 7.23 (Conditional independence)

Given an event C, two events A and B are called conditionally independent if

P(A∩B|C) = P(A|C) P(B|C).

Theorem 7.16

Let A,B,C be three different events. Assume that P(B∩C) > 0. Given that C has occurred, A and B are conditionally independent if and only if

P(A|B∩C) = P(A|C).

Proof. Note that P(B∩C) > 0 implies that P(B) > 0, P(C) > 0, P(B|C) > 0 and P(C|B) > 0.

  1. By the definition of conditional probability and the multiplication rule, we have

    P(A∩B|C) = P(A∩B∩C)/P(C) = [P(C) P(B|C) P(A|B∩C)]/P(C) = P(B|C) P(A|B∩C).
  2. Comparing this with the expression for conditional independence, we have

    P(B|C) P(A|B∩C) = P(A|C) P(B|C).
  3. Eliminating the common term, we have

    P(A|B∩C) = P(A|C).

It is important to note that

  1. Unconditional independence of A and B doesn’t imply conditional independence given C.

  2. Conditional independence of A and B given C doesn’t imply unconditional independence.

Example 7.10

Consider tossing a fair coin twice. Assume that all outcomes are equally likely.

  1. Ω={HH,HT,TH,TT}.

  2. Let A be the event that the first toss is a head. A={HH,HT}.

  3. Let B be the event that the second toss is a head. B={TH,HH}.

  4. It is easy to see that A and B are independent.

  5. Let C be the event that the two tosses have different results. C={TH,HT}.

  6. P(A|C) = 1/2.

  7. P(B|C) = 1/2.

  8. P(A∩B|C) = 0.

  9. Clearly, the two events are not conditionally independent.

Example 7.11

Let there be two coins. The first coin is fair while the second coin is unfair with the probability of head being 0.9. Assume that a coin can be chosen at random with equal probability and then it is tossed twice (the two tosses being independent).

  1. Let C be the event that the second (unfair) coin was chosen.

  2. Let Hi be the event that the i-th toss resulted in heads.

  3. Given the choice of the coin, the two events H1 and H2 are independent.

  4. P(H₁|C) = 0.9.

  5. P(H₂|C) = 0.9.

  6. P(H₁∩H₂|C) = 0.81.

  7. However, the two events H₁ and H₂ are not independent unconditionally.

  8. If the first toss is a head, then the probability that the coin is unfair increases. Consequently, the probability that the second toss will also be a head increases.

  9. From the total probability theorem, we have

    P(H₁) = P(C) P(H₁|C) + P(Cᶜ) P(H₁|Cᶜ) = 0.5·0.9 + 0.5·0.5 = 0.7.
  10. Similarly, P(H₂) = 0.7.

  11. However,

    P(H₁∩H₂) = P(C) P(H₁∩H₂|C) + P(Cᶜ) P(H₁∩H₂|Cᶜ) = 0.5·0.81 + 0.5·0.25 = 0.53.
  12. P(H₁) P(H₂) = 0.7·0.7 = 0.49 ≠ P(H₁∩H₂).
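
The numbers in Example 7.11 follow mechanically from the total probability theorem once the conditional probabilities are tabulated. The sketch below (illustrative) recomputes P(H₁), P(H₁∩H₂) and P(H₁)P(H₂) with exact fractions.

```python
from fractions import Fraction as F

p_head = {'fair': F(1, 2), 'biased': F(9, 10)}   # P(H | coin)
p_coin = {'fair': F(1, 2), 'biased': F(1, 2)}    # coin chosen uniformly at random

# Total probability theorem, using conditional independence of the tosses given the coin.
p_h1 = sum(p_coin[c] * p_head[c] for c in p_coin)          # P(H1)
p_h1h2 = sum(p_coin[c] * p_head[c] ** 2 for c in p_coin)   # P(H1 and H2)

print(p_h1, p_h1h2, p_h1 * p_h1)   # 7/10 53/100 49/100 -> unconditionally dependent
```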

7.1.8.2. Three Events#

Naturally, we are interested in extending the notion of independence for multiple events. However this is a bit tricky. We start with the simpler case of three events which can explain the issues involved. Independence means that occurrence or non-occurrence of any event or any pair of events should not provide any information about the occurrence of any other event or pair of events.

Definition 7.24 (Independence of three events)

Let A,B and C be three events. We say that A,B and C are jointly independent if

P(A∩B) = P(A) P(B)
P(B∩C) = P(B) P(C)
P(A∩C) = P(A) P(C)
P(A∩B∩C) = P(A) P(B) P(C).

The first three conditions establish the pairwise independence of the three sets. However, the fourth condition is necessary and it doesn’t follow from the first three conditions. Conversely, the fourth condition doesn’t imply the first three conditions.

Let us look at some counter examples.

Example 7.12

Consider the experiment of tossing a fair coin twice with equally likely outcomes.

  1. Ω={HH,HT,TH,TT}.

  2. Let Hᵢ be the event that the i-th toss is a head.

  3. Let A be the event that the two tosses have different results.

  4. H₁ = {HT,HH}.

  5. H₂ = {TH,HH}.

  6. A = {HT,TH}.

  7. A∩H₁ = {HT}.

  8. A∩H₂ = {TH}.

  9. H₁∩H₂ = {HH}.

  10. P(H₁) = P(H₂) = P(A) = 1/2.

  11. P(A∩H₁) = P(A∩H₂) = P(H₁∩H₂) = 1/4.

  12. We can see that A,H₁,H₂ are pairwise independent.

  13. However A∩H₁∩H₂ = ∅ since the occurrence of A makes H₁ and H₂ mutually exclusive.

  14. Hence P(A∩H₁∩H₂) = 0 ≠ P(A) P(H₁) P(H₂).

  15. Hence A,H₁,H₂ are not independent.

Example 7.13

Consider the experiment of rolling a fair die twice with equally likely outcomes.

  1. The sample space has 36 possible outcomes.

  2. Let A be the event that the first roll has the outcomes 1, 2, or 3.

  3. Let B be the event that the first roll has the outcomes 3, 4, or 5.

  4. Let C be the event that the sum of two rolls is 9.

  5. We can see that

    C = {(3,6),(4,5),(5,4),(6,3)}.
  6. We have

    P(A) = 1/2, P(B) = 1/2, P(C) = 1/9.
  7. We can see that

    A∩B∩C = {(3,6)}.
  8. Hence P(A∩B∩C) = 1/36.

  9. Clearly,

    P(A∩B∩C) = P(A) P(B) P(C).
  10. Now A∩B is the event that the first roll is 3.

  11. B∩C = {(3,6),(4,5),(5,4)}.

  12. A∩C = {(3,6)}.

  13. Hence, we have

    P(A∩B) = 1/6, P(B∩C) = 1/12, P(A∩C) = 1/36.
  14. No pair among A,B,C is independent.

  15. Hence A,B,C are not independent.

7.1.8.3. n Events#

We are now ready to generalize the notion of independence to any finite number of events. The key idea is that information about occurrence or non-occurrence of any subset of events should not provide any information about the occurrence or non-occurrence of the remaining events.

Definition 7.25 (Independence of n events)

Let A₁,A₂,…,Aₙ be n events contained in F. We say that A₁,A₂,…,Aₙ are jointly independent if

P(A_{i1}∩A_{i2}) = P(A_{i1}) P(A_{i2})
P(A_{i1}∩A_{i2}∩A_{i3}) = P(A_{i1}) P(A_{i2}) P(A_{i3})
⋮
P(A_{i1}∩A_{i2}∩⋯∩A_{ik}) = P(A_{i1}) P(A_{i2}) ⋯ P(A_{ik})
⋮
P(A₁∩A₂∩⋯∩Aₙ) = P(A₁) P(A₂) ⋯ P(Aₙ)

for all combinations of indices such that 1 ≤ i1 < i2 < ⋯ < ik ≤ n.

In other words, A₁,A₂,…,Aₙ are independent if

P(⋂_{i∈S} Aᵢ) = ∏_{i∈S} P(Aᵢ)

for every (nonempty) subset S of {1,2,…,n}.

7.1.8.4. Independent Trials#

Definition 7.26 (Independent trials)

If an experiment involves a sequence of independent but identical stages, we say that we have a sequence of independent trials.

Definition 7.27 (Bernoulli trials)

If every stage in a sequence of independent trials has only two possible outcomes, we say that we have a sequence of Bernoulli trials.

It is customary to denote the two outcomes of a Bernoulli trial by H and T with probabilities p and q = 1 − p respectively.

Example 7.14 (Repeated coin tosses)

Consider tossing a coin n times as a sequence of Bernoulli trials.

  1. The outcomes of each experiment are H (head) and T (tail).

  2. Let P(H) = p and P(T) = q = 1 − p.

  3. The outcome of n tosses can be described as a string of n letters, each of which is H or T.

  4. There are 2ⁿ possible strings. They form the sample space of the compound experiment.

  5. Let a particular string have k heads and n − k tails.

  6. Then the probability of this string is given by

    P(ζ₁,…,ζₙ) = ∏_{i=1}^{n} P({ζᵢ}) = pᵏ qⁿ⁻ᵏ.
  7. There are C(n,k) strings which consist of k heads and n − k tails.

  8. Each of these strings (singleton events) is mutually exclusive of the others.

  9. Hence, by the additivity axiom, the probability of having k heads and n − k tails in n trials is given by

    C(n,k) pᵏ qⁿ⁻ᵏ.

Definition 7.28 (Binomial probability)

The probability of k heads in a sequence of n Bernoulli trials, denoted by p(k), is known as the binomial probability and is given by

p(k) = C(n,k) pᵏ (1−p)ⁿ⁻ᵏ

where C(n,k) is a binomial coefficient and p is the probability of a head.

See Enumerative Combinatorics for a quick review of binomial coefficients.

Theorem 7.17 (Sum of binomial probabilities)

∑_{k=0}^{n} p(k) = 1.

Proof. Recall from Theorem 1.43 that

(a+b)ⁿ = ∑_{k=0}^{n} C(n,k) aᵏ bⁿ⁻ᵏ.

Putting a = p and b = 1 − p, we get

1 = ∑_{k=0}^{n} C(n,k) pᵏ (1−p)ⁿ⁻ᵏ = ∑_{k=0}^{n} p(k).
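
Theorem 7.17 is also easy to confirm numerically. The sketch below (illustrative) computes the binomial probabilities p(k) with math.comb and checks that they sum to 1 for a particular n and p.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """p(k) = C(n, k) p^k (1 - p)^(n - k) for k = 0, ..., n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

pmf = binomial_pmf(10, Fraction(3, 10))   # 10 Bernoulli trials with P(H) = 0.3
print(pmf[3])                             # probability of exactly 3 heads
print(sum(pmf) == 1)                      # True, as Theorem 7.17 asserts
```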

7.1.9. Compound Experiments#

Often we need to examine the outcomes of different experiments together. Here are some examples:

  • Tossing a coin and throwing a die

  • Tossing a coin twice in succession (first and second tosses are separate experiments)

Two or more experiments together form a compound experiment. Repeated trials are an example of a compound experiment.

Definition 7.29 (Compound experiment)

Let A and B be two different experiments. Let Ω1 be the sample space of A and Ω2 be the sample space of B. Then the sample space of the compound experiment is given by the Cartesian product Ω=Ω1×Ω2.

7.1.9.1. Product σ-Algebra#

If F₁ and F₂ are σ-algebras over Ω₁ and Ω₂ respectively, it is natural to ask how we can construct a σ-algebra for the compound sample space.

Let E₁ be an event associated with experiment A and E₂ be an event associated with experiment B. Then E = E₁×E₂ is a product event associated with the compound experiment. We have

E₁×E₂ = {ζ = (ζ₁,ζ₂) | ζ₁ ∈ E₁, ζ₂ ∈ E₂}.

However, we can see that the set

{E₁×E₂ | E₁ ∈ F₁, E₂ ∈ F₂}

is not a σ-algebra. Naturally, the closest σ-algebra is the σ-algebra generated by this set.

Definition 7.30 (Product σ algebra)

Let (Ω₁,F₁) and (Ω₂,F₂) be two different σ-fields. Then the product σ-algebra F₁⊗F₂ is the σ-algebra on Ω₁×Ω₂ generated by the collection of all product events:

F₁⊗F₂ ≜ σ({E₁×E₂ | E₁ ∈ F₁, E₂ ∈ F₂}).

The members of a product σ-algebra are known as compound events.

7.1.9.2. Compound Events#

A compound event can be written as a finite (or countable) disjoint union of product events from the two experiments. A finite union looks like

E = ⋃_{i=1}^{k} E_{1,i} × E_{2,i}

where E_{1,i} and E_{2,i} are events in F₁ and F₂ respectively.

Example 7.15 (Compound events as union of product events)

  1. Consider two experiments, each of which consists of throwing a die.

  2. The sample space for both experiments is

    Ω₁ = Ω₂ = {1,2,3,4,5,6}.
  3. There are 36 possible outcomes in the compound experiment.

  4. The compound sample space is given by

    Ω = Ω₁×Ω₂ = {(1,1),(1,2),…,(1,6),(2,1),…,(2,6),…,(6,1),…,(6,6)}.

7.1.9.3. Independent Experiments#

Definition 7.31 (Independent experiments)

Two experiments are called independent if the outcome of one experiment doesn’t depend on the (past, present or future) outcomes of the other experiment. In that case, for every product event E=E1×E2, we can write

P(E)=P(E1)P(E2).

Let E be a compound event of two independent experiments given as a disjoint union of product events. Then the probability of the compound event is given by

(7.1)  P(E) = ∑_{i=1}^{k} P(E_{1,i}) P(E_{2,i}).