These notes are organized into four sections: (A) Probability of Events, (B) Conditional Probability, (C) Independent Events, and (D) Bayes' Theorem.
\begin{equation}\label{a}\tag{A}\mbox{}\end{equation}
Probability of Events
In the study of statistics we consider experiments for which the outcome cannot be predicted with certainty; such experiments are called random experiments. Although the outcome of a random experiment cannot be determined before the experiment is performed, the collection of every possible outcome can be described and perhaps listed. This collection of all outcomes is called the outcome space or sample space.
Example. Some examples are provided below.
- Two dice are cast and the number of spots on the sides that are “up” is counted. The outcome space is \(S=\{2,3,4,5,6,7,8,9,10,11,12\}\).
- Each of six students selects an integer at random from the first 52 positive integers. We are interested in whether at least two of these six integers match (M) or whether all are different (D). Thus, the outcome space is \(S=\{M,D\}\).
- A fair coin is flipped successively at random until the first head is observed. If we let \(x\) denote the number of flips of the coin that are required, then the outcome space is \(S=\{x:x=1,2,3,4,\cdots\}\), which consists of an infinite, but countable, number of outcomes.
- To determine the percentage of body fat for a person, one measurement that is made is the person’s weight under water. If \(w\) denotes this weight in kilograms, then the outcome space is \(S=\{w:0<w<7\}\), as we know from past experience that this weight does not exceed \(7\) kilograms. \(\sharp\)
Given an outcome space \(S\), let \(A\) be a part of the collection of outcomes in \(S\), i.e., \(A\subset S\). Then \(A\) is called an event. When the experiment is performed and the outcome of the experiment is in \(A\), we say that event \(A\) has occurred. In studying probability, the words set and event are interchangeable, so we use the following set notation.
- \(\emptyset\) denotes the empty set.
- \(A\subseteq B\) means that \(A\) is a subset of \(B\).
- \(A\cup B\) is the union of \(A\) and \(B\).
- \(A\cap B\) is the intersection of \(A\) and \(B\).
- \(A^{c}\) is the complement of \(A\), i.e., all elements in \(S\) that are not in \(A\).
- \(A_{1},A_{2},\cdots ,A_{k}\) are mutually exclusive events means that \(A_{i}\cap A_{j}=\emptyset\) for \(i\neq j\), i.e., \(A_{1},A_{2},\cdots ,A_{k}\) are disjoint sets.
- \(A_{1},A_{2},\cdots ,A_{k}\) are exhaustive events means \(A_{1}\cup A_{2}\cup\cdots\cup A_{k}=S\). If \(A_{1},A_{2},\cdots ,A_{k}\) are both mutually exclusive and exhaustive, they partition \(S\): \(A_{i}\cap A_{j}=\emptyset\) for \(i\neq j\) and \(A_{1}\cup A_{2}\cup\cdots\cup A_{k}=S\).
The following useful relationships are known as De Morgan’s laws:
\[\left (\bigcup_{i=1}^{n}A_{i}\right )^{c}=\bigcap_{i=1}^{n}A_{i}^{c}\]
and
\[\left (\bigcap_{i=1}^{n}A_{i}\right )^{c}=\bigcup_{i=1}^{n}A_{i}^{c}.\]
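As a quick numerical check (not part of the original notes), the following Python sketch verifies both laws on a small, arbitrarily chosen sample space using built-in set operations.

```python
# Verify De Morgan's laws on a small sample space (arbitrary illustration).
S = set(range(1, 11))                       # sample space {1, ..., 10}
events = [{1, 2, 3}, {3, 4, 5}, {5, 6, 7}]  # events A_1, A_2, A_3

union = set().union(*events)
intersection = set.intersection(*events)

# (union of the A_i)^c == intersection of the A_i^c
assert S - union == set.intersection(*[S - A for A in events])
# (intersection of the A_i)^c == union of the A_i^c
assert S - intersection == set.union(*[S - A for A in events])
print("Both De Morgan identities hold on this example.")
```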
The probability of an event \(A\), denoted by \(\mathbb{P}(A)\), is often called the chance of \(A\) occurring. \(\mathbb{P}\) is a set function: it is evaluated on sets \(A\) rather than on individual points. Formally, probability is a set function \(\mathbb{P}\) that assigns to each event \(A\) in the sample space \(S\) a number \(\mathbb{P}(A)\), called the probability of the event \(A\), such that the following properties are satisfied.
- We have \(\mathbb{P}(A)\geq 0\) for any subset \(A\) of \(S\).
- We have \(\mathbb{P}(S)=1\).
- Given events \(A_{1},A_{2},\cdots\) satisfying \(A_{i}\cap A_{j}=\emptyset\) for \(i\neq j\), we have \[\mathbb{P}(A_{1}\cup\cdots\cup A_{n})=\mathbb{P}(A_{1})+\cdots +\mathbb{P}(A_{n})\] for each positive integer \(n\), and \[\mathbb{P}(A_{1}\cup A_{2}\cup\cdots )=\mathbb{P}(A_{1})+\mathbb{P}(A_{2})+\cdots\] for a countably infinite number of events.
\begin{equation}\label{stalect1}\tag{1}\mbox{}\end{equation}
Theorem \ref{stalect1}. For each event \(A\), we have \(\mathbb{P}(A)=1-\mathbb{P}(A^{c})\).
Proof. We have \(S=A\cup A^{c}\) and \(A\cap A^{c}=\emptyset\). Thus, from the above properties, it follows that \(1=\mathbb{P}(S)=\mathbb{P}(A)+\mathbb{P}(A^{c})\). This completes the proof. \(\blacksquare\)
Example. A fair coin is flipped successively until the same face is observed on successive flips. Let \(A=\{x:x=3,4,5,\cdots\}\); that is, \(A\) is the event that it will take three or more flips of the coin to observe that same face on consecutive flips. To find \(\mathbb{P}(A)\), we first find the probability of \(A^{c}=\{x:x=2\}\), the complement of \(A\). In two flips of a coin, the possible outcomes are \(\{HH,HT,TH,TT\}\), and we assume that each of these four points has the same chance of being observed, which says
\[\mathbb{P}(A^{c})=\mathbb{P}(\{HH,TT\})=\frac{2}{4}.\]
Using Theorem \ref{stalect1}, we have
\begin{align*} \mathbb{P}(A) & =1-\mathbb{P}(A^{c})\\ & =1-\frac{2}{4}\\ & =\frac{2}{4}=\frac{1}{2}.\end{align*}
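A Monte Carlo sketch of this example (not from the notes); the event follows the text, while the seed and trial count are arbitrary choices.

```python
import random

def flips_until_repeat(rng):
    """Flip a fair coin until the same face appears on two successive flips;
    return the total number of flips required."""
    prev = rng.choice("HT")
    count = 1
    while True:
        cur = rng.choice("HT")
        count += 1
        if cur == prev:
            return count
        prev = cur

rng = random.Random(0)
trials = 100_000
# A = {three or more flips are needed}; the relative frequency should be near 1/2.
print(sum(flips_until_repeat(rng) >= 3 for _ in range(trials)) / trials)
```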
Theorem. We have \(\mathbb{P}(\emptyset )=0\).
Proof. In Theorem \ref{stalect1}, we take \(A=\emptyset\), i.e., \(A^{c}=S\). Then, we have
\begin{align*} \mathbb{P}(\emptyset ) & =1-\mathbb{P}(S)\\ & =1-1=0.\end{align*}
This completes the proof. \(\blacksquare\)
Theorem. We have the following properties.
(i) If events \(A\) and \(B\) satisfy \(A\subseteq B\), then \(\mathbb{P}(A)\leq \mathbb{P}(B)\).
(ii) For each event \(A\), we have \(\mathbb{P}(A)\leq 1\).
Proof. To prove part (i), note that \(B=A\cup (B\cap A^{c})\) and \(A\cap (B\cap A^{c})=\emptyset\). Therefore, we have
\begin{align*} \mathbb{P}(B) & =\mathbb{P}(A)+\mathbb{P}(B\cap A^{c})\\ & \geq \mathbb{P}(A)\end{align*}
since \(\mathbb{P}(B\cap A^{c})\geq 0\).
To prove part (ii), since \(A\subset S\), using part (i), we have \(\mathbb{P}(A)\leq \mathbb{P}(S)=1\). This completes the proof. \(\blacksquare\)
Theorem. Given any two events \(A\) and \(B\), we have
\[\mathbb{P}(A\cup B)=\mathbb{P}(A)+\mathbb{P}(B)-\mathbb{P}(A\cap B).\]
Proof. Since \(A\cup B=A\cup (A^{c}\cap B)\) is the union of mutually exclusive events, we have \(\mathbb{P}(A\cup B)=\mathbb{P}(A)+\mathbb{P}(A^{c}\cap B)\). Since \(B=(A\cap B)\cup (A^{c}\cap B)\) is a union of mutually exclusive events, we also have \(\mathbb{P}(B)=\mathbb{P}(A\cap B)+\mathbb{P}(A^{c}\cap B)\), i.e., \(\mathbb{P}(A^{c}\cap B)=\mathbb{P}(B)-\mathbb{P}(A\cap B)\). Therefore, we obtain
\begin{align*} \mathbb{P}(A\cup B) & =\mathbb{P}(A)+\mathbb{P}(A^{c}\cap B)\\ & =\mathbb{P}(A)+\mathbb{P}(B)-\mathbb{P}(A\cap B).\end{align*}
This completes the proof. \(\blacksquare\)
Example. A faculty leader was meeting two students in Paris, one arriving by train from Amsterdam and the other arriving by train from Brussels at approximately the same time. Let \(A\) and \(B\) be the events that the Amsterdam and Brussels trains, respectively, are on time. Suppose from past experience we know that \(\mathbb{P}(A)=0.93\), \(\mathbb{P}(B)=0.89\), and \(\mathbb{P}(A\cap B)=0.87\). Then, we have
\begin{align*} \mathbb{P}(A\cup B) & =\mathbb{P}(A)+\mathbb{P}(B)-\mathbb{P}(A\cap B)\\ & =0.93+0.89-0.87=0.95\end{align*}
is the probability that at least one train is on time. \(\sharp\)
Given three events \(A\), \(B\), and \(C\), we can similarly obtain
\begin{align*} \mathbb{P}(A\cup B\cup C) & =\mathbb{P}(A)+\mathbb{P}(B)+\mathbb{P}(C)\\ & \quad -\mathbb{P}(A\cap B)-\mathbb{P}(A\cap C)-\mathbb{P}(B\cap C)\\ & \quad +\mathbb{P}(A\cap B\cap C)\end{align*}
by considering \(A\cup B\cup C=A\cup (B\cup C)\).
Example. Continuing the previous example, suppose a third student is arriving by train from Cologne. Let \(C\) be the event that this train is on time with \(\mathbb{P}(C)=0.91\), \(\mathbb{P}(B\cap C)=0.85\), \(\mathbb{P}(A\cap C)=0.86\), and \(\mathbb{P}(A\cap B\cap C)=0.81\). Then, we have
\begin{align*} \mathbb{P}(A\cup B\cup C) & =0.93+0.89+0.91-0.87-0.85-0.86+0.81\\ & =0.96\end{align*}
is the probability that at least one of the three trains is on time. \(\sharp\)
Theorem (Additive Theorem). For any finite number of events, we have
\begin{align*} \mathbb{P}\left (\bigcup_{j=1}^{n} A_{j}\right ) & =\sum_{j=1}^{n} \mathbb{P}(A_{j})-\sum_{1\leq j_{1}<j_{2}\leq n} \mathbb{P}(A_{j_{1}}\cap A_{j_{2}})\\ & \quad +\sum_{1\leq j_{1}<j_{2}<j_{3}\leq n} \mathbb{P}(A_{j_{1}}\cap A_{j_{2}}\cap A_{j_{3}})\\ & \quad -\cdots +(-1)^{n+1}\mathbb{P}(A_{1}\cap A_{2}\cap\cdots\cap A_{n}).\end{align*}
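When the events are subsets of a finite, equally likely sample space, the additive theorem can be checked directly. The sketch below is an illustration added to these notes; the sample space and events are invented for the test.

```python
from itertools import combinations

def prob(event, space):
    """Probability of an event under equally likely outcomes."""
    return len(event) / len(space)

def inclusion_exclusion(events, space):
    """P(union of events) via the additive (inclusion-exclusion) theorem."""
    total = 0.0
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)
        for combo in combinations(events, k):
            total += sign * prob(set.intersection(*combo), space)
    return total

S = set(range(1, 13))
events = [{1, 2, 3, 4}, {3, 4, 5, 6}, {6, 7, 8}]
# Must agree with the probability of the union computed directly.
assert abs(inclusion_exclusion(events, S) - prob(set().union(*events), S)) < 1e-12
```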
Multiplication Principle. Suppose that an experiment \(E_{1}\) has \(n_{1}\) outcomes and for each of these possible outcomes an experiment \(E_{2}\) has \(n_{2}\) possible outcomes. The composite experiment \(E_{1}E_{2}\) that consists of performing first \(E_{1}\) and then \(E_{2}\) has \(n_{1}n_{2}\) possible outcomes.
Clearly, the multiplication principle can be extended to a sequence of more than two experiments. For \(i=1,\cdots ,m\), suppose that the experiment \(E_{i}\) has \(n_{i}\) possible outcomes after previous experiments have been performed. The composite experiment \(E_{1}E_{2}\cdots E_{m}\), which consists of performing \(E_{1}\), then \(E_{2}\), \(\cdots\) and finally \(E_{m}\), has \(n_{1}n_{2}\cdots n_{m}\) possible outcomes.
Example. A certain food service gives the following choices for dinner: \(E_{1}\), soup or tomato juice; \(E_{2}\), steak or shrimp; \(E_{3}\), french fries, mashed potatoes, or a baked potato; \(E_{4}\), corn or peas; \(E_{5}\), Jello, tossed salad, cottage cheese, or cole slaw; \(E_{6}\), cake, cookies, pudding, brownie, vanilla ice cream, chocolate ice cream, or orange sherbet; \(E_{7}\), coffee, tea, milk, or punch. How many different dinner selections are possible if one of the listed choices is made for each of \(E_{1},E_{2}, \cdots ,E_{7}\)? By the multiplication principle there are \(2\cdot 2\cdot 3\cdot 2\cdot 4\cdot 7\cdot 4=2688\) different combinations. \(\sharp\)
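The count can be reproduced with itertools.product (a check added here, not in the original notes); the menu lists below simply mirror the seven choices in the example.

```python
from itertools import product

menus = [
    ["soup", "tomato juice"],                                   # E1
    ["steak", "shrimp"],                                        # E2
    ["french fries", "mashed potatoes", "baked potato"],        # E3
    ["corn", "peas"],                                           # E4
    ["jello", "tossed salad", "cottage cheese", "cole slaw"],   # E5
    ["cake", "cookies", "pudding", "brownie",
     "vanilla ice cream", "chocolate ice cream", "orange sherbet"],  # E6
    ["coffee", "tea", "milk", "punch"],                         # E7
]
# Every dinner is one choice from each menu; product enumerates them all.
assert len(list(product(*menus))) == 2 * 2 * 3 * 2 * 4 * 7 * 4 == 2688
```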
Suppose that \(n\) positions are to be filled with \(n\) different objects. There are \(n\) choices for filling the first position, \(n-1\) for the second, \(\cdots\), 1 choice for the last position. Using the multiplication principle, there are \(n(n-1)\cdots 2\cdot 1 =n!\) possible arrangements. The symbol \(n!\) is read \(n\) factorial. For convenience, we take \(0!=1\); that is, we say that zero positions can be filled with zero objects in one way. Each of the \(n!\) arrangements (in a row) of \(n\) different objects is called a permutation of the \(n\) objects.
Example. The number of permutations of the four letters a, b, c, and d is clearly \(4!=24\). However, the number of possible four-letter code words using the four letters a, b, c, and d, if letters may be repeated, is \(4^{4}=256\), because each of the four positions can then be filled in four ways. \(\sharp\)
Suppose that \(r\) positions are to be filled with objects selected from \(n\) different objects with \(r\leq n\). Then, the number of possible ordered arrangements is given by
\[P^{n}_{r}=n(n-1)(n-2)\cdots (n-r+1).\]
In terms of factorials, we have
\begin{align*} P^{n}_{r} & =\frac{n(n-1)\cdots (n-r+1)(n-r)\cdots 3\cdot 2\cdot 1}{(n-r)\cdots 3\cdot 2\cdot 1}\\ & =\frac{n!}{(n-r)!}.\end{align*}
Each of the \(P^{n}_{r}\) arrangements is called a permutation of \(n\) objects taken \(r\) at a time.
Example. The number of possible four-letter code words, selecting from the 26 letters in the alphabet, in which all four letters are different is
\begin{align*} P^{26}_{4} & =26\cdot 25\cdot 24\cdot 23\\ & =\frac{26!}{24!}\\ & =358800.\end{align*}
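In Python, \(P^{n}_{r}\) is available as math.perm (Python 3.8+); the following quick check of the counts above is an addition to these notes.

```python
import math

assert math.perm(4, 4) == math.factorial(4) == 24         # permutations of a, b, c, d
assert 4 ** 4 == 256                                      # code words, repetition allowed
assert math.perm(26, 4) == 26 * 25 * 24 * 23 == 358_800   # distinct four-letter codes
```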
If \(r\) objects are selected from a set of \(n\) objects, and if the order of selection is noted, the set of \(r\) objects is called an ordered sample of size \(r\).
- Sampling with replacement occurs when an object is selected and then replaced before the next object is selected. By the multiplication principle, the number of possible ordered samples of size \(r\) taken from a set of \(n\) objects is \(n^{r}\) when sampling with replacement.
- Sampling without replacement occurs when an object is not replaced after it has been selected. By the multiplication principle, the number of possible ordered samples of size \(r\) taken from a set of \(n\) objects, sampling without replacement, is \[n(n-1)\cdots (n-r+1)=\frac{n!}{(n-r)!},\] which is equivalent to \(P^{n}_{r}\), the number of permutations of \(n\) objects taken \(r\) at a time.
Often the order of selection is not important; in other words, we are interested in the number of subsets of size \(r\) that can be selected from a set of \(n\) different objects. Let \(C\) denote the number of (unordered) subsets of size \(r\) that can be selected from \(n\) different objects. We can obtain each of the \(P^{n}_{r}\) ordered subsets by first selecting one of the \(C\) unordered subsets of \(r\) objects and then ordering these \(r\) objects. Since the latter can be carried out in \(r!\) ways, the multiplication principle yields \(C\cdot r!\) ordered subsets; that is, \(C\cdot r!\) must equal \(P^{n}_{r}\). Therefore, we have
\[C\cdot r!=\frac{n!}{(n-r)!},\]
which implies
\[C=\frac{n!}{r!(n-r)!}.\]
We denote this answer by
\[C^{n}_{r}=\frac{n!}{r!(n-r)!}.\]
We also say that the number of ways in which \(r\) objects can be selected without replacement from \(n\) objects, when the order of selection is disregarded, is \(C^{n}_{r}\), which can be read as “\(n\) choose \(r\)”. Each of the \(C^{n}_{r}\) unordered subsets is called a combination of \(n\) objects taken \(r\) at a time.
Example. The number of possible five-card hands (hands in five-card poker) drawn from a deck of 52 playing cards is
\begin{align*} C^{52}_{5} & =\frac{52!}{5!\,47!}\\ & =2598960.\end{align*}
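Likewise, \(C^{n}_{r}\) is math.comb in Python; a check of the poker-hand count and of the symmetry \(C^{n}_{r}=C^{n}_{n-r}\) (again, an added sketch, not part of the notes):

```python
import math

assert math.comb(52, 5) == 2_598_960          # five-card poker hands
assert math.comb(52, 5) == math.comb(52, 47)  # choosing 5 cards = leaving 47 behind
```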
The numbers \(C^{n}_{r}\) are frequently called binomial coefficients, since they arise in the expansion of a binomial:
\[(a+b)^{n}=\sum_{r=0}^{n} C^{n}_{r}b^{r}a^{n-r}.\]
Now, suppose that a set contains \(n\) objects of two types, \(r\) of one type and \(n-r\) of the other type. The number of permutations of \(n\) different objects is \(n!\). However, in this case, objects of the same type are not distinguishable. To count the number of distinguishable arrangements, first select \(r\) of the \(n\) positions for the objects of the first type, which can be done in \(C^{n}_{r}\) ways, and then fill the remaining positions with the objects of the second type. Therefore, the number of distinguishable arrangements is \(C^{n}_{r}\). Each of the \(C^{n}_{r}\) permutations of \(n\) objects, \(r\) of one type and \(n-r\) of another type, is called a distinguishable permutation.
Example. A coin is flipped 10 times and the sequence of heads and tails is observed. The number of possible 10-tuples that result in four heads and six tails is
\[C^{10}_{4}=C^{10}_{6}=\frac{10!}{4!\,6!}=210.\]
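A numeric check of the binomial theorem and of the coin-flip count above (added here; the test values a = 2, b = 3, n = 5 are arbitrary):

```python
import math

# Binomial theorem: (a + b)^n = sum over r of C(n, r) * b^r * a^(n - r).
a, b, n = 2, 3, 5
assert (a + b) ** n == sum(math.comb(n, r) * b**r * a**(n - r) for r in range(n + 1))

# Distinguishable arrangements of four heads and six tails in ten flips.
assert math.comb(10, 4) == math.comb(10, 6) == 210
```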
\begin{equation}\label{b}\tag{B}\mbox{}\end{equation}
Conditional Probability.
The conditional probability of an event \(A\) given that event \(B\) has occurred is defined by
\[\mathbb{P}(A|B)=\frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)},\]
provided that \(\mathbb{P}(B)>0\). We can think of the “given event \(B\)” as specifying a new sample space: to determine \(\mathbb{P}(A|B)\), we calculate the probability of that part of \(A\) that is contained in \(B\), relative to \(\mathbb{P}(B)\).
Example. A pair of four-sided dice is rolled and the sum is determined. Let \(A\) be the event that a sum of \(3\) is rolled, and let \(B\) be the event that a sum of \(3\) or a sum of \(5\) is rolled. In a sequence of rolls, the probability that a sum of \(3\) is rolled before a sum of \(5\) is rolled can be thought of as the conditional probability of a sum \(3\) given that a sum of \(3\) or \(5\) has occurred. In other words, the conditional probability of \(A\) given \(B\) is
\begin{align*} \mathbb{P}(A|B) & =\frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)}\\ & =\frac{\mathbb{P}(A)}{\mathbb{P}(B)}\\ & =\frac{2/16}{6/16}\\ & =\frac{2}{6}.\end{align*}
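The value \(2/6\) can be confirmed by enumerating the 16 equally likely outcomes of the two four-sided dice (a check written for these notes, not part of them):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 5), repeat=2))   # 16 equally likely rolls
A = {o for o in outcomes if sum(o) == 3}          # sum of 3
B = {o for o in outcomes if sum(o) in (3, 5)}     # sum of 3 or of 5
print(Fraction(len(A & B), len(B)))               # 1/3, i.e. 2/6
```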
Theorem. Conditional probability satisfies the axioms for a probability function when \(\mathbb{P}(B)>0\).
(i) \(\mathbb{P}(A|B)\geq 0\).
(ii) \(\mathbb{P}(B|B)=1\).
(iii) If \(A_{1},A_{2},\cdots\) are mutually exclusive events, then
\[\mathbb{P}(A_{1}\cup A_{2}\cup\cdots\cup A_{n}|B)=\mathbb{P}(A_{1}|B)+\mathbb{P}(A_{2}|B)+\cdots +\mathbb{P}(A_{n}|B),\]
for each positive integer \(n\), and
\[\mathbb{P}(A_{1}\cup A_{2}\cup\cdots |B)=\mathbb{P}(A_{1}|B)+\mathbb{P}(A_{2}|B)+\cdots\]
for a countably infinite number of events.
Proof. Parts (i) and (ii) are evident since
\[\mathbb{P}(A|B)=\frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)}\geq 0\]
and
\[\mathbb{P}(B|B)=\frac{\mathbb{P}(B\cap B)}{\mathbb{P}(B)}=1.\]
To prove part (iii), we have
\begin{align*} \mathbb{P}(A_{1}\cup A_{2}\cup\cdots |B) & =\frac{\mathbb{P}[(A_{1}\cup A_{2}\cup\cdots )\cap B]}{\mathbb{P}(B)}\\ & =\frac{\mathbb{P}[(A_{1}\cap B)\cup (A_{2}\cap B)\cup\cdots ]}{\mathbb{P}(B)}.\end{align*}
Since \((A_{1}\cap B)\), \((A_{2}\cap B)\), \(\cdots\) are mutually exclusive events, we also have
\begin{align*} \mathbb{P}(A_{1}\cup A_{2}\cup\cdots |B) & =\frac{\mathbb{P}(A_{1}\cap B)+\mathbb{P}(A_{2}\cap B)+\cdots}{\mathbb{P}(B)}\\ & =\frac{\mathbb{P}(A_{1}\cap B)}{\mathbb{P}(B)}+\frac{\mathbb{P}(A_{2}\cap B)}{\mathbb{P}(B)}+\cdots\\ & =\mathbb{P}(A_{1}|B)+\mathbb{P}(A_{2}|B)+\cdots.\end{align*}
This completes the proof. \(\blacksquare\)
The probability that two events, \(A\) and \(B\), both occur is given by the multiplication rule
\[\mathbb{P}(A\cap B)=\mathbb{P}(A)\mathbb{P}(B|A)\]
or by
\[\mathbb{P}(A\cap B)=\mathbb{P}(B)\mathbb{P}(A|B).\]
Example. A bowl contains seven blue chips and three red chips. Two chips are to be drawn successively at random and without replacement. We want to compute the probability that the first draw results in a red chip (A) and the second draw results in a blue chip (B). It is reasonable to assign the following probabilities
\[\mathbb{P}(A)=3/10\mbox{ and }\mathbb{P}(B|A)=7/9.\]
The probability of red on the first draw and blue on the second draw is
\[\mathbb{P}(A\cap B)=\frac{3}{10}\cdot\frac{7}{9}=\frac{7}{30}.\]
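An exact enumeration of the ordered draws reproduces \(7/30\) (an added sketch; the chip labels are just for bookkeeping):

```python
from fractions import Fraction
from itertools import permutations

chips = ["B"] * 7 + ["R"] * 3                 # seven blue, three red
draws = list(permutations(range(10), 2))      # ordered pairs of distinct chips
favorable = [d for d in draws if chips[d[0]] == "R" and chips[d[1]] == "B"]
print(Fraction(len(favorable), len(draws)))   # 7/30
```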
The multiplication rule can be extended to three or more events. In the case of three events we have, by using the multiplication rule for two events,
\begin{align*} \mathbb{P}(A\cap B\cap C) & =\mathbb{P}[(A\cap B)\cap C]\\ & =\mathbb{P}(A\cap B)\mathbb{P}(C|A\cap B).\end{align*}
Since \(\mathbb{P}(A\cap B)=\mathbb{P}(A)\mathbb{P}(B|A)\), we obtain
\[\mathbb{P}(A\cap B\cap C)=\mathbb{P}(A)\mathbb{P}(B|A)\mathbb{P}(C|A\cap B).\]
Example. A grade-school boy has five blue and four white marbles in his left pocket and four blue and five white marbles in his right pocket. If he transfers one marble at random from his left pocket to his right pocket, what is the probability of his then drawing a blue marble from his right pocket? For notation let \(BL\), \(BR\), and \(WL\) denote drawing blue from left pocket, blue from right pocket, and white from left pocket, respectively. Then, we have
\begin{align*} \mathbb{P}(BR) & =\mathbb{P}(BL\cap BR)+\mathbb{P}(WL\cap BR)\\ & =\mathbb{P}(BL)\mathbb{P}(BR|BL)+\mathbb{P}(WL)\mathbb{P}(BR|WL)\\ & =\frac{5}{9}\cdot\frac{5}{10}+\frac{4}{9}\cdot\frac{4}{10}\\ & =\frac{41}{90}.\end{align*}
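The same computation with exact fractions (a sketch added to the notes; the variable names are invented):

```python
from fractions import Fraction

p_BL = Fraction(5, 9)              # blue marble transferred from the left pocket
p_WL = Fraction(4, 9)              # white marble transferred
p_BR_given_BL = Fraction(5, 10)    # right pocket then holds 5 blue, 5 white
p_BR_given_WL = Fraction(4, 10)    # right pocket then holds 4 blue, 6 white
print(p_BL * p_BR_given_BL + p_WL * p_BR_given_WL)   # 41/90
```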
\begin{equation}\label{c}\tag{C}\mbox{}\end{equation}
Independent Events.
For certain pairs of events, the occurrence of one of them may or may not change the probability of the occurrence of the other. In the latter case, they are said to be independent events. The formal definition is as follows. Events \(A\) and \(B\) are independent if and only if \(\mathbb{P}(A\cap B)=\mathbb{P}(A)\mathbb{P}(B)\). Otherwise, \(A\) and \(B\) are called dependent events.
Example. A red die and a white die are rolled. Let event \(A=\{4\mbox{ on the red die}\}\) and event \(B=\{\mbox{sum of dice is odd}\}\). Of the 36 equally likely outcomes, \(6\) are favorable to \(A\), \(18\) are favorable to \(B\), and \(3\) are favorable to \(A\cap B\). Thus
\begin{align*} \mathbb{P}(A)\mathbb{P}(B) & =\frac{6}{36}\cdot\frac{18}{36}\\ & =\frac{3}{36}\\ & =\mathbb{P}(A\cap B).\end{align*}
Therefore, \(A\) and \(B\) are independent. \(\sharp\)
Example. A red die and a white die are rolled. Let event \(A=\{5\mbox{ on the red die}\}\) and event \(B=\{\mbox{sum of dice is }11\}\). Of the 36 equally likely outcomes, \(6\) are favorable to \(A\), \(2\) are favorable to \(B\), and \(1\) is favorable to \(A\cap B\). Then, we have
\begin{align*} \mathbb{P}(A)\mathbb{P}(B) & =\frac{6}{36}\cdot\frac{2}{36}\\ & =\frac{1}{108}\neq\frac{1}{36}\\ & =\mathbb{P}(A\cap B).\end{align*}
Therefore, \(A\) and \(B\) are dependent events. \(\sharp\)
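Both verdicts can be verified by enumerating the 36 equally likely outcomes; the sketch below is an addition, and independent is a small helper written for this check.

```python
from fractions import Fraction
from itertools import product

def independent(A, B, space):
    """True iff P(A intersect B) = P(A) P(B) under equally likely outcomes."""
    P = lambda E: Fraction(len(E), len(space))
    return P(A & B) == P(A) * P(B)

space = set(product(range(1, 7), repeat=2))        # (red, white) outcomes
four_on_red = {o for o in space if o[0] == 4}
odd_sum = {o for o in space if sum(o) % 2 == 1}
five_on_red = {o for o in space if o[0] == 5}
sum_eleven = {o for o in space if sum(o) == 11}

print(independent(four_on_red, odd_sum, space))    # True
print(independent(five_on_red, sum_eleven, space)) # False
```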
Theorem. Suppose that \(A\) and \(B\) are independent events. Then, the following pairs of events are also independent:
(i) \(A\) and \(B^{c}\).
(ii) \(A^{c}\) and \(B\).
(iii) \(A^{c}\) and \(B^{c}\).
Proof. Since the conditional probability satisfies the axioms for a probability function, for \(\mathbb{P}(A)>0\), we have \(\mathbb{P}(B^{c}|A)=1-\mathbb{P}(B|A)\). Therefore, we obtain
\begin{align*} \mathbb{P}(A\cap B^{c}) & =\mathbb{P}(A)\mathbb{P}(B^{c}|A)\\ & =\mathbb{P}(A)[1-\mathbb{P}(B|A)]\\ & =\mathbb{P}(A)[1-\mathbb{P}(B)]\\ & \mbox{ (since \(\mathbb{P}(B|A)=\mathbb{P}(B)\) by independence)}\\ & =\mathbb{P}(A)\mathbb{P}(B^{c}),\end{align*}
which says that \(A\) and \(B^{c}\) are independent events. \(\blacksquare\)
Events \(A\), \(B\), and \(C\) are mutually independent if and only if the following two conditions hold true.
- They are pairwise independent; that is, \(\mathbb{P}(A\cap B)=\mathbb{P}(A)\mathbb{P}(B)\), \(\mathbb{P}(A\cap C)=\mathbb{P}(A)\mathbb{P}(C)\) and \(\mathbb{P}(B\cap C)=\mathbb{P}(B)\mathbb{P}(C)\).
- We have \(\mathbb{P}(A\cap B\cap C)=\mathbb{P}(A)\mathbb{P}(B)\mathbb{P}(C)\).
Example. An urn contains four balls numbered 1,2,3, and 4. One ball is to be drawn at random from the urn. Let the events \(A\), \(B\), and \(C\) be defined by \(A=\{1,2\}\), \(B=\{1,3\}\), \(C=\{1,4\}\). Then \(\mathbb{P}(A)=\mathbb{P}(B)=\mathbb{P}(C)=1/2\). Furthermore, we have
\begin{align*} \mathbb{P}(A\cap B) & =\frac{1}{4}=\mathbb{P}(A)\mathbb{P}(B),\\ \mathbb{P}(A\cap C) & =\frac{1}{4}=\mathbb{P}(A)\mathbb{P}(C),\\ \mathbb{P}(B\cap C) & =\frac{1}{4}=\mathbb{P}(B)\mathbb{P}(C),\end{align*}
which implies that \(A\), \(B\), and \(C\) are pairwise independent. Since \(A\cap B\cap C=\{1\}\), we have
\begin{align*} \mathbb{P}(A\cap B\cap C) & =\frac{1}{4}\neq\frac{1}{8}\\ & =\mathbb{P}(A)\mathbb{P}(B)\mathbb{P}(C).\end{align*}
This shows that pairwise independence alone is not enough for the mutual independence of \(A\), \(B\), and \(C\). \(\sharp\)
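A compact check of this urn example (added here, not in the original notes):

```python
from fractions import Fraction

S = {1, 2, 3, 4}
A, B, C = {1, 2}, {1, 3}, {1, 4}
P = lambda E: Fraction(len(E), len(S))   # equally likely draws

# Pairwise independence holds ...
assert P(A & B) == P(A) * P(B)
assert P(A & C) == P(A) * P(C)
assert P(B & C) == P(B) * P(C)
# ... but the triple-product condition fails: 1/4 versus 1/8.
assert P(A & B & C) != P(A) * P(B) * P(C)
```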
The above definition can be extended to mutual independence of four or more events.
Theorem. Suppose that \(A\), \(B\), and \(C\) are mutually independent events. Then, the following pairs of events are also independent:
(i) \(A\) and \((B\cap C)\).
(ii) \(A\) and \((B\cup C)\).
(iii) \(A^{c}\) and \((B\cap C^{c})\).
In addition, \(A^{c}\), \(B^{c}\), and \(C^{c}\) are mutually independent.
Example. A fair six-sided die is rolled \(6\) independent times. Let \(A_{i}\) be the event that side \(i\) is observed on the \(i\)th roll, called a match on the \(i\)th trial, for \(i=1,\cdots ,6\). Therefore, we have \(\mathbb{P}(A_{i})=1/6\) and \(\mathbb{P}(A_{i}^{c})=1-1/6=5/6\). Let \(B\) denote the event that at least one match occurs. Then \(B^{c}\) is the event that no matches occur. We have
\begin{align*} \mathbb{P}(B) & =1-\mathbb{P}(B^{c})\\ & =1-\mathbb{P}(A_{1}^{c}\cap A_{2}^{c}\cap\cdots\cap A_{6}^{c})\\ & =1-\frac{5}{6}\cdot\frac{5}{6}\cdot\frac{5}{6}\cdot\frac{5}{6}\cdot\frac{5}{6}\cdot\frac{5}{6}\\ & =1-\left (\frac{5}{6}\right )^{6}\end{align*}
is the probability of event \(B\). \(\sharp\)
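The exact value alongside a short simulation (an added sketch; seed and trial count are arbitrary). Note that any() short-circuits, but this does not bias the estimate, since a trial stops only after a match has already occurred.

```python
import random

exact = 1 - (5 / 6) ** 6                    # probability of at least one match
rng = random.Random(1)
trials = 100_000
hits = sum(
    any(rng.randint(1, 6) == i for i in range(1, 7))   # match on some roll i
    for _ in range(trials)
)
print(exact, hits / trials)                 # both near 0.665
```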
Example. Urn I contains two red balls and three white balls; urn II contains two red balls and one white ball; urn III contains one red ball and three white balls. At the \(i\)th trial, a ball is drawn from urn \(i\) for \(i=I,II,III\). Assume that the trials are independent. Under reasonable assumptions, the probabilities of some of the possible outcomes are
\begin{align*} \mathbb{P}(\{R,R,R\}) & =\frac{2}{5}\cdot\frac{2}{3}\cdot\frac{1}{4},\\ \mathbb{P}(\{W,R,W\}) & =\frac{3}{5}\cdot\frac{2}{3}\cdot\frac{3}{4},\\ \mathbb{P}(\{R,W,W\}) & =\frac{2}{5}\cdot\frac{1}{3}\cdot\frac{3}{4}.\end{align*}
Example. An urn contains three red, two white, and four yellow balls. An ordered sample of size \(3\) is drawn from the urn. If the balls are drawn with replacement so that one outcome does not change the probabilities of the others, the trials are independent. Under reasonable assumptions, the probabilities of the two given outcomes are
\begin{align*} \mathbb{P}(\{R,W,Y\}) & =\frac{3}{9}\cdot\frac{2}{9}\cdot\frac{4}{9}\\ & =\frac{8}{243}\end{align*}
and
\begin{align*} & \mathbb{P}(\{Y,Y,R\}\mbox{ or }\{R,W,W\})\\ & \quad=\frac{4}{9}\cdot\frac{4}{9}\cdot\frac{3}{9}+\frac{3}{9}\cdot\frac{2}{9}\cdot\frac{2}{9}\\ & \quad=\frac{20}{243}.\end{align*}
If the balls are drawn without replacement, the trials are dependent. The probabilities, again under reasonable assumptions, of the two outcomes are
\begin{align*} \mathbb{P}(\{R,W,Y\}) & =\frac{3}{9}\cdot\frac{2}{8}\cdot\frac{4}{7}\\ & =\frac{1}{21}\end{align*}
and
\begin{align*} & \mathbb{P}(\{Y,Y,R\}\mbox{ or }\{R,W,W\})\\ & \quad =\frac{4}{9}\cdot\frac{4}{8}\cdot\frac{3}{7}+\frac{3}{9}\cdot\frac{2}{8}\cdot\frac{2}{7}\\ & \quad =\frac{60}{504}=\frac{5}{42}.\end{align*}
Example. Suppose that on five consecutive days an “instant winner” lottery ticket is purchased and the probability of winning is \(1/5\) on each day. Assuming independent trials, we have
\[\mathbb{P}(\{W,W,L,L,L\})=\left (\frac{1}{5}\right )^{2}\left (\frac{4}{5}\right )^{3}\]
and
\begin{align*} \mathbb{P}(\{L,W,L,W,L\}) & =\frac{4}{5}\cdot\frac{1}{5}\cdot\frac{4}{5}\cdot\frac{1}{5}\cdot\frac{4}{5}\\ & =\left (\frac{1}{5}\right )^{2}\left (\frac{4}{5}\right )^{3}.\end{align*}
In general, the probability of purchasing two winning tickets and three losing tickets is
\[C^{5}_{2}\left (\frac{1}{5}\right )^{2}\left (\frac{4}{5}\right )^{3}=0.2048,\]
since there are \(C^{5}_{2}\) ways to select the positions (or the days) for the winning tickets, and each of these \(C^{5}_{2}\) ways has probability \((1/5)^{2}(4/5)^{3}\). \(\sharp\)
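The value 0.2048 can be reproduced directly (an added one-line check):

```python
import math

# C(5, 2) positions for the winning days, each with probability (1/5)^2 (4/5)^3.
print(math.comb(5, 2) * (1 / 5) ** 2 * (4 / 5) ** 3)   # 0.2048
```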
\begin{equation}\label{d}\tag{D}\mbox{}\end{equation}
Bayes’ Theorem.
Let \(B_{1},B_{2},\cdots ,B_{m}\) constitute a partition of the sample space \(S\). That is, \(S=B_{1}\cup B_{2}\cup\cdots\cup B_{m}\) and \(B_{i}\cap B_{j}=\emptyset\) for \(i\neq j\). Of course, the events \(B_{1},B_{2},\cdots ,B_{m}\) are mutually exclusive and exhaustive. Furthermore, suppose that the prior probability \(\mathbb{P}(B_{i})\) of each event \(B_{i}\) is positive. Let \(A\) be an event. Then \(A\) is the union of \(m\) mutually exclusive events, namely,
\[A=(B_{1}\cap A)\cup (B_{2}\cap A)\cup\cdots\cup (B_{m}\cap A).\]
Therefore, we have
\begin{align*} \mathbb{P}(A) & =\sum_{i=1}^{m} \mathbb{P}(B_{i}\cap A)\\ & =\sum_{i=1}^{m} \mathbb{P}(B_{i})\mathbb{P}(A|B_{i}).\end{align*}
For \(\mathbb{P}(A)>0\), we have
\[\mathbb{P}(B_{k}|A)=\frac{\mathbb{P}(B_{k}\cap A)}{\mathbb{P}(A)}\]
for \(k=1,\cdots ,m\). Then, we have Bayes’ Theorem, given by
\[\mathbb{P}(B_{k}|A)=\frac{\mathbb{P}(B_{k})\mathbb{P}(A|B_{k})}{\sum_{i=1}^{m} \mathbb{P}(B_{i})\mathbb{P}(A|B_{i})}\]
for \(k=1,\cdots ,m\). The conditional probability \(\mathbb{P}(B_{k}|A)\) is often called the posterior probability of \(B_{k}\).
Example. In a certain factory, machines I, II, and III are all producing springs of the same length. Of their production, machines I, II, and III produce 2%, 1%, and 3% defective springs, respectively. Of the total production of springs in the factory, machine I produces 35%, machine II produces 25%, and machine III produces 40%. If one spring is selected at random from the total springs produced in a day, the probability that it is defective, in an obvious notation, equals
\begin{align*} \mathbb{P}(D) & =\mathbb{P}(I)\mathbb{P}(D|I)+\mathbb{P}(II)\mathbb{P}(D|II)+\mathbb{P}(III)\mathbb{P}(D|III)\\ & =\left (\frac{35}{100}\right )\left (\frac{2}{100}\right )+\left (\frac{25}{100}\right )\left (\frac{1}{100}\right )+\left (\frac{40}{100}\right )\left (\frac{3}{100}\right )\\ & =\frac{215}{10000}.\end{align*}
When the selected spring is defective, the conditional probability that it was produced by machine III is, by Bayes’ formula,
\begin{align*} \mathbb{P}(III|D) & =\frac{\mathbb{P}(III)\mathbb{P}(D|III)}{\mathbb{P}(D)}\\ & =\frac{(40/100)(3/100)}{215/10000}\\ & =\frac{120}{215}.\end{align*}
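As a closing sketch (not from the notes), Bayes' theorem written as a small function and applied to the factory numbers; posteriors is a hypothetical helper name introduced for this illustration.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Posteriors P(B_k | A) from priors P(B_k) and likelihoods P(A | B_k)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(B_k) P(A | B_k)
    total = sum(joint)                                     # P(A), by total probability
    return [j / total for j in joint]

priors = [Fraction(35, 100), Fraction(25, 100), Fraction(40, 100)]    # machines I-III
likelihoods = [Fraction(2, 100), Fraction(1, 100), Fraction(3, 100)]  # defect rates
print(posteriors(priors, likelihoods)[2])   # 24/43, i.e. 120/215
```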


