The notes are divided into the following sections.
\begin{equation}{\label{a}}\tag{A}\mbox{}\end{equation}
Random Variables.
Consider the probability space \((S,{\cal F},\mathbb{P})\). Let \(D\) be a space, and let \(X\) be a function from \(S\) into \(D\), i.e., \(X:S\rightarrow D\). For \(T\subseteq D\), the inverse image of \(T\) under \(X\), denoted by \(X^{-1}(T)\), is defined by
\[X^{-1}(T)=\left\{s\in S:X(s)\in T\right\}.\]
Then, we have
\[X^{-1}\left (\bigcup_{i=1}^{\infty}T_{i}\right )=\bigcup_{i=1}^{\infty}X^{-1}(T_{i}).\]
For \(T_{1}\cap T_{2}=\emptyset\), we also have
\[X^{-1}(T_{1})\cap X^{-1}(T_{2})=\emptyset.\]
Using the above properties, we can obtain
\begin{align*} & X^{-1}\left (\bigcap_{i=1}^{\infty}T_{i}\right )=\bigcap_{i=1}^{\infty}X^{-1}(T_{i})\\ & X^{-1}\left (T^{c}\right )=\left (X^{-1}(T)\right )^{c}\\ & X^{-1}(D)=S\\ & X^{-1}(\emptyset )=\emptyset.\end{align*}
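These set identities can be checked by brute force on a small finite example. The following sketch (a hypothetical six-point sample space with \(X(s)\) the parity of \(s\); none of it comes from the text) verifies each one.

```python
# Finite sanity check of the preimage identities:
# X^{-1}(T1 ∪ T2) = X^{-1}(T1) ∪ X^{-1}(T2),  X^{-1}(T^c) = (X^{-1}(T))^c,
# X^{-1}(D) = S,  X^{-1}(∅) = ∅.

S = {1, 2, 3, 4, 5, 6}            # sample space (a die)
D = {0, 1}                        # target space
X = {s: s % 2 for s in S}         # X(s) = parity of s (hypothetical choice)

def preimage(T):
    """Inverse image X^{-1}(T) = {s in S : X(s) in T}."""
    return {s for s in S if X[s] in T}

T1, T2 = {0}, {1}

assert preimage(T1 | T2) == preimage(T1) | preimage(T2)   # unions commute with preimage
assert preimage(T1 & T2) == preimage(T1) & preimage(T2)   # intersections too
assert preimage(D - T1) == S - preimage(T1)               # complements too
assert preimage(D) == S and preimage(set()) == set()
```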
Let \({\cal D}\) be a \(\sigma\)-field of subsets of \(D\). We define the class \(X^{-1}({\cal D})\) of subsets of \(S\) by
\[X^{-1}({\cal D})=\left\{A\subseteq S:A=X^{-1}(T)\mbox{ for some }T\in{\cal D}\right\}.\]
Using the above properties, we also have the following interesting result.
Theorem. The class \(X^{-1}({\cal D})\) is a \(\sigma\)-field of subsets of \(S\).
Definition. Under the probability space \((S,{\cal F},\mathbb{P})\), we say that \(X\) is \(({\cal F},{\cal D})\)-measurable when \(X^{-1}({\cal D})\subseteq {\cal F}\). \(\sharp\)
Definition. Given a function \(X:(S,{\cal F},\mathbb{P})\rightarrow (\mathbb{R},{\cal B})\), where \({\cal B}\) is the Borel \(\sigma\)-field on \(\mathbb{R}\), we say that \(X\) is a
random variable when \(X\) is \(({\cal F},{\cal B})\)-measurable. \(\sharp\)
Definition. Given a function \({\bf X}:(S,{\cal F},\mathbb{P})\rightarrow (\mathbb{R}^{n},{\cal B}^{n})\), where \({\cal B}^{n}\) is the Borel \(\sigma\)-field on \(\mathbb{R}^{n}\), we say
that \({\bf X}\) is a random vector when \({\bf X}\) is \(({\cal F},{\cal B}^{n})\)-measurable. \(\sharp\)
Proposition. Let \({\bf X}=(X_{1},\cdots ,X_{n}):(S,{\cal F})\rightarrow (\mathbb{R}^{n},{\cal B}^{n})\). Then \({\bf X}\) is a random vector if and only if \(X_{i}\) for \(i=1,\cdots ,n\) are random variables.
Using the measurability of functions, we obtain the following closure properties for operations on random variables.
Proposition. Let \(X\) and \(Y\) be random variables. Then, we have the following properties.
(i) \(aX+bY\) is a random variable for all \(a,b\in \mathbb{R}\).
(ii) \(\max\{X,Y\}\) and \(\min\{X,Y\}\) are random variables.
(iii) \(XY\) is a random variable.
(iv) Provided that \(Y(s)\neq 0\) for each \(s\), \(X/Y\) is a random variable. \(\sharp\)
Let \(\{a_{n}\}\) be a sequence of real numbers. Recall the concepts of limit superior and limit inferior:
\[\limsup_{n\rightarrow\infty}a_{n}=\inf_{n\geq 1}\sup_{k\geq n}a_{k}\quad\mbox{and}\quad\liminf_{n\rightarrow\infty}a_{n}=\sup_{n\geq 1}\inf_{k\geq n}a_{k}.\]
Proposition. Let \(X_{1},X_{2},\cdots\) be a sequence of random variables. Then
\[\sup_{n}X_{n},\quad\inf_{n}X_{n},\quad\limsup_{n\rightarrow\infty}X_{n}\mbox{ and }\liminf_{n\rightarrow\infty}X_{n}\]
are random variables. Consequently, if
\[X(s)=\lim_{n\rightarrow\infty}X_{n}(s)\]
exists for every \(s\in S\), then \(X\) is a random variable. Furthermore, if
\[X(s)=\sum_{n=1}^{\infty}X_{n}(s)\]
converges for each \(s\in S\), then \(X\) is a random variable. \(\sharp\)
Proposition. Let \(X_{1},\cdots ,X_{n}\) be random variables, and let \({\bf g}:\mathbb{R}^{n}\rightarrow \mathbb{R}^{m}\) be Borel measurable. Then \({\bf g}(X_{1},\cdots ,X_{n})\) is a random vector. \(\sharp\)
Let \({\bf X}:(S,{\cal F},\mathbb{P})\rightarrow (\mathbb{R}^{n},{\cal B}^{n})\) be a random vector. We define a set function \(\mathbb{P}_{\bf X}\) on \(\mathbb{R}^{n}\) by
\begin{align*} \mathbb{P}_{\bf X}(B) & \equiv \mathbb{P}({\bf X}\in B)\\ & =\mathbb{P}\left (\{s\in S:{\bf X}(s)\in B\}\right )\\ & =\mathbb{P}({\bf X}^{-1}(B)).\end{align*}
Then \(\mathbb{P}_{\bf X}\) is a probability measure on \(\mathbb{R}^{n}\).
Proposition. The set function \(\mathbb{P}_{\bf X}\) defined on \(\mathbb{R}^{n}\) is a probability measure.
Proof. We are going to show that \(\mathbb{P}_{\bf X}\) satisfies the axioms. Since \(\mathbb{P}\) is a probability measure, we have \(\mathbb{P}_{\bf X}(B)\geq 0\) for \(B\in {\cal B}^{n}\). We also have
\begin{align*} \mathbb{P}_{\bf X}(\mathbb{R}^{n}) & =\mathbb{P}({\bf X}^{-1}(\mathbb{R}^{n}))\\ & =\mathbb{P}(S)=1.\end{align*}
For pairwise disjoint \(B_{i}\in {\cal B}^{n}\), i.e., \(B_{i}\cap B_{j}=\emptyset\) for \(i\neq j\), we have
\begin{align*} \mathbb{P}_{\bf X}\left (\bigcup_{i=1}^{\infty}B_{i}\right ) & =\mathbb{P}\left ({\bf X}^{-1}\left (\bigcup_{i=1}^{\infty}B_{i}\right )\right )\\ & =\mathbb{P}\left (\bigcup_{i=1}^{\infty}{\bf X}^{-1}(B_{i})\right )\\ & =\sum_{i=1}^{\infty}\mathbb{P}\left ({\bf X}^{-1}(B_{i})\right )\\ & =\sum_{i=1}^{\infty}\mathbb{P}_{\bf X}(B_{i}).\end{align*}
This completes the proof. \(\blacksquare\)
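The pushforward construction can be checked on a finite example. The sketch below (a hypothetical fair die with \(X(s)=s\bmod 3\); the choices are not from the text) verifies the three axioms for \(\mathbb{P}_{X}\) directly.

```python
from fractions import Fraction

# P_X(B) = P(X^{-1}(B)) for a fair die and X(s) = s mod 3 (hypothetical example);
# check nonnegativity, total mass one, and additivity on disjoint sets.

S = [1, 2, 3, 4, 5, 6]
P = {s: Fraction(1, 6) for s in S}     # uniform probability on S
X = lambda s: s % 3                    # range D = {0, 1, 2}

def P_X(B):
    """P_X(B) = P({s in S : X(s) in B})."""
    return sum(P[s] for s in S if X(s) in B)

D = {X(s) for s in S}
assert all(P_X({d}) >= 0 for d in D)                # nonnegativity
assert P_X(D) == 1                                  # total mass one
assert P_X({0}) + P_X({1, 2}) == P_X({0, 1, 2})     # additivity on disjoint sets
```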
\begin{equation}{\label{b}}\tag{B}\mbox{}\end{equation}
Probability Distribution Functions.
The probability measure \(\mathbb{P}_{\bf X}\) is called the probability distribution function of \({\bf X}\). The random vector \({\bf X}\) is said to be of the discrete type when there exist countably many points \({\bf z}_{i}\in\mathbb{R}^{n}\) for \(i=1,2,\cdots\) satisfying
\[\sum_{i=1}^{\infty}\mathbb{P}({\bf X}={\bf z}_{i})=1.\]
In this case, we set
\[\mathbb{P}({\bf X}={\bf z}_{i})=f_{\bf X}({\bf z}_{i})\]
and call \(f_{\bf X}\) the probability density function (or probability mass function) of \({\bf X}\). Clearly, \(f_{\bf X}({\bf z}_{i})\geq 0\) for all \(i=1,2,\cdots\) and \(\sum_{i=1}^{\infty}f_{\bf X}({\bf z}_{i})=1\). Furthermore, we have
\begin{align*} \mathbb{P}_{\bf X}(B) & =\mathbb{P}({\bf X}\in B)\\ & =\sum_{{\bf z}_{i}\in B}f_{\bf X}({\bf z}_{i}).\end{align*}
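As a one-dimensional illustration of the discrete type (a Binomial(4, 1/2) variable, chosen here for illustration and not taken from the text), the mass function sums to one and \(\mathbb{P}({\bf X}\in B)\) is the sum of the masses over the points in \(B\).

```python
from math import comb

# Binomial(4, 1/2): f_X(k) = C(4, k) / 2^4; a discrete-type random variable.
n = 4
f_X = {k: comb(n, k) / 2**n for k in range(n + 1)}

assert abs(sum(f_X.values()) - 1) < 1e-12           # total mass one

B = {0, 1, 2}                                       # event "at most two successes"
prob_B = sum(f_X[z] for z in f_X if z in B)         # P(X in B) = sum of masses in B
assert abs(prob_B - 11 / 16) < 1e-12                # 1/16 + 4/16 + 6/16
```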
On the other hand, if \({\bf X}\) satisfies \(\mathbb{P}({\bf X}={\bf z})=0\) for all \({\bf z}\in \mathbb{R}^{n}\), we say that \({\bf X}\) is of the continuous type. If there exists a nonnegative measurable function \(f_{\bf X}:(\mathbb{R}^{n},{\cal B}^{n})\rightarrow (\mathbb{R},{\cal B})\) (but not necessarily continuous) satisfying
\[\mathbb{P}_{\bf X}(B)=\mathbb{P}({\bf X}\in B)=\int_{B}f_{\bf X}({\bf z})d{\bf z}\]
for \(B\in {\cal B}^{n}\), we say that \(f_{\bf X}\) is the probability density function of \({\bf X}\). Clearly, we have
\[\int_{\mathbb{R}^{n}}f_{\bf X}({\bf z})d{\bf z}=1.\]
Proposition. Given a random variable \(X\), the family
\[\sigma (X)=\left\{X^{-1}(B):B\in {\cal B}\right\}\]
is a \(\sigma\)-field on \(S\).
Proof. It is clear that \(S=X^{-1}(\mathbb{R})\) belongs to \(\sigma (X)\). If \(X^{-1}(B)\in\sigma (X)\) for some \(B\in {\cal B}\), then \(\left (X^{-1}(B)\right )^{c}=X^{-1}(B^{c})\in\sigma (X)\), since \(B^{c}\in {\cal B}\). Finally, if each of the sets \(X^{-1}(B_{i})\) belongs to \(\sigma (X)\), then, since \({\cal B}\) is a \(\sigma\)-field, it follows that \(\bigcup_{i=1}^{\infty}B_{i}\in {\cal B}\). Since
\[\bigcup_{i=1}^{\infty}X^{-1}(B_{i})=X^{-1}\left (\bigcup_{i=1}^{\infty}B_{i}\right ),\]
we have \(\bigcup_{i=1}^{\infty}X^{-1}(B_{i})\in\sigma (X)\). This completes the proof. \(\blacksquare\)
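On a finite sample space every subset of the range plays the role of a Borel set, so \(\sigma(X)\) can be enumerated outright. The sketch below (same hypothetical parity example as before) builds \(\{X^{-1}(B):B\subseteq\mbox{range}(X)\}\) and checks the \(\sigma\)-field properties.

```python
from itertools import chain, combinations

# Enumerate sigma(X) = {X^{-1}(B) : B a subset of range(X)} on a finite space
# and verify closure under complements and (finite) unions.

S = frozenset({1, 2, 3, 4, 5, 6})
X = {s: s % 2 for s in S}                      # X(s) = parity (hypothetical)

range_X = frozenset(X.values())
subsets = lambda A: chain.from_iterable(combinations(A, r) for r in range(len(A) + 1))

sigma_X = {frozenset(s for s in S if X[s] in set(B)) for B in subsets(range_X)}

assert S in sigma_X and frozenset() in sigma_X
assert all(S - A in sigma_X for A in sigma_X)                      # complements
assert all(A | B in sigma_X for A in sigma_X for B in sigma_X)     # unions
```

Here \(\sigma(X)\) comes out as \(\{\emptyset,\mbox{evens},\mbox{odds},S\}\): the coarsest \(\sigma\)-field making the parity map measurable.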
We also say that \(\sigma (X)\) is the \(\sigma\)-field generated by \(X\).
\begin{equation}{\label{c}}\tag{C}\mbox{}\end{equation}
Distribution Functions.
Let \(X\) be a random variable with probability distribution function \(\mathbb{P}_{X}\), i.e., \(\mathbb{P}_{X}(B)=\mathbb{P}(X\in B)\) for \(B\in {\cal B}\). In particular, for \(B=(-\infty ,x]\), \(\mathbb{P}_{X}((-\infty ,x])\) is denoted by \(F(x)\) and is called the distribution function of \(X\). We also see
\[F(x)=\mathbb{P}(X\leq x).\]
Let \({\bf X}=(X_{1},\cdots ,X_{n})\) be a random vector. The distribution function of \({\bf X}\), also known as the joint distribution function of \(X_{1},\cdots ,X_{n}\), is defined by
\[F(x_{1},\cdots ,x_{n})=\mathbb{P}(X_{1}\leq x_{1},\cdots ,X_{n}\leq x_{n}).\]
Proposition. The distribution function \(F\) of a random variable \(X\) satisfies the following properties.
(i) We have \(0\leq F(x)\leq 1\).
(ii) The distribution function \(F\) is nondecreasing.
(iii) The distribution function \(F\) is right-continuous.
(iv) We have
\[\lim_{x\rightarrow -\infty}F(x)=0\]
and
\[\lim_{x\rightarrow +\infty}F(x)=1.\]
If \(X\) is discrete, its distribution function is a step function and we have
\[F(x)=\sum_{x_{i}\leq x}f(x_{i})\]
and
\[f(x_{i})=F(x_{i})-F(x_{i-1}),\]
where we assume \(x_{1}<x_{2}<\cdots\).
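The step-function relation between \(F\) and \(f\) can be illustrated with a small hypothetical three-point distribution (the masses below are made up for the illustration): \(F\) is the running sum of the masses, and the masses are recovered from the jumps of \(F\).

```python
# Discrete d.f. as a running sum: F(x) = sum of f(x_i) over x_i <= x,
# and conversely f(x_i) = F(x_i) - F(x_{i-1}) for consecutive support points.

xs = [1, 2, 3]                      # ordered support x_1 < x_2 < x_3 (hypothetical)
f = {1: 0.2, 2: 0.5, 3: 0.3}        # mass function

def F(x):
    return sum(f[xi] for xi in xs if xi <= x)

assert F(0.5) == 0.0 and abs(F(3) - 1.0) < 1e-12
assert abs(F(2) - 0.7) < 1e-12                          # 0.2 + 0.5
# recover the masses from the jumps of F
assert all(abs((F(xi) - F(xi - 1)) - f[xi]) < 1e-12 for xi in xs)
```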
The distribution function of a random variable \(X\) of the continuous type, defined in terms of the p.d.f. of \(X\), is given by
\[F(x)=\mathbb{P}(X\leq x)=\int_{-\infty}^{x} f(t)dt.\]
The distribution function \(F(x)\) cumulates all of the probability less than or equal to \(x\). Using the fundamental theorem of calculus, we have \(F'(x)=f(x)\) when the derivative \(F'(x)\) exists.
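For a concrete continuous-type check, take the Exponential(1) density \(f(t)=e^{-t}\) for \(t>0\) (a standard example, not one worked in the text): its d.f. is \(F(x)=1-e^{-x}\), the limit properties hold, and a central difference confirms \(F'(x)=f(x)\) numerically.

```python
from math import exp

# Exponential(1): f(t) = e^{-t} for t > 0, F(x) = 1 - e^{-x} for x > 0.
f = lambda t: exp(-t) if t > 0 else 0.0
F = lambda x: 1 - exp(-x) if x > 0 else 0.0

# F is nondecreasing with the right tail limits
assert F(-1) == 0.0 and abs(F(50) - 1.0) < 1e-12
assert all(F(a) <= F(a + 0.1) for a in [i * 0.1 for i in range(100)])

# F'(x) = f(x) where the derivative exists (central-difference check)
h, x = 1e-6, 1.3
assert abs((F(x + h) - F(x - h)) / (2 * h) - f(x)) < 1e-6
```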
Let \({\bf X}=(X_{1},\cdots ,X_{n})\) be a random vector. Then, for each \(i\), the distribution function of \(X_{i}\) given by
\[F_{X_{i}}(x)=\lim_{x_{j}\rightarrow\infty ,j\neq i}F(x_{1},\cdots ,x_{i-1},x,x_{i+1},\cdots ,x_{n})\]
is called the marginal distribution function of \(X_{i}\). If \(k\) \(x_{j}\)’s are taken to approach \(\infty\), then the resulting function is the joint distribution function of the random variables corresponding to the remaining \((n-k)\) \(X_{j}\)’s. All these distribution functions are also called the marginal distribution functions.
The above Proposition says that \({\bf X}=(X_{1},\cdots ,X_{n})\) is a random vector if and only if \(X_{1},\cdots ,X_{n}\) are random variables. Then, the p.d.f. \(f({\bf x})=f(x_{1},\cdots ,x_{n})\) of \({\bf X}\) is also called the joint probability density function of \(X_{1},\cdots ,X_{n}\).
Let us consider the case \(n=2\). Then, the p.d.f.’s of \(X_{1}\) and \(X_{2}\) are given by
\[f_{1}(x_{1})=\int_{-\infty}^{\infty}f(x_{1},x_{2})dx_{2}\]
and
\[f_{2}(x_{2})=\int_{-\infty}^{\infty}f(x_{1},x_{2})dx_{1}.\]
They are also called the marginal probability density functions. We also have
\begin{align*} \mathbb{P}(X_{1}\in B) & =\mathbb{P}(X_{1}\in B,X_{2}\in \mathbb{R})\\ & =\int_{B}\int_{\mathbb{R}}f(x_{1},x_{2})dx_{2}dx_{1}\\ & =\int_{B}\left [\int_{\mathbb{R}}f(x_{1},x_{2})dx_{2}\right ]dx_{1}=\int_{B}f_{1}(x_{1})dx_{1}.\end{align*}
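The marginalization \(f_{1}(x_{1})=\int_{\mathbb{R}}f(x_{1},x_{2})dx_{2}\) can be checked numerically for the joint density \(f(x_{1},x_{2})=e^{-x_{1}-x_{2}}\) on \((0,\infty)^{2}\) (the same density used in a later example), whose marginal is \(e^{-x_{1}}\). The quadrature parameters below are arbitrary choices for the sketch.

```python
from math import exp

# Integrating out x2 from f(x1, x2) = e^{-x1 - x2} should recover e^{-x1}.
f = lambda x1, x2: exp(-x1 - x2)

def marginal_f1(x1, upper=40.0, n=20_000):
    """Midpoint-rule approximation of the integral of f(x1, .) over (0, upper)."""
    h = upper / n
    return sum(f(x1, (k + 0.5) * h) for k in range(n)) * h

x1 = 0.7
assert abs(marginal_f1(x1) - exp(-x1)) < 1e-4
```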
Let \(x_{1}\) be a fixed value with \(f_{1}(x_{1})>0\). We define \(f(x_{2}|x_{1})\) as
\[f(x_{2}|x_{1})=\frac{f(x_{1},x_{2})}{f_{1}(x_{1})},\]
which is considered as a function of \(x_{2}\). Then \(f(\cdot |x_{1})\) is a p.d.f. In fact, \(f(x_{2}|x_{1})\geq 0\) and
\begin{align*} \int_{\mathbb{R}}f(x_{2}|x_{1})dx_{2} & =\frac{1}{f_{1}(x_{1})}\int_{\mathbb{R}}f(x_{1},x_{2})dx_{2}\\ & =\frac{f_{1}(x_{1})}{f_{1}(x_{1})}=1.\end{align*}
In a similar fashion, for \(f_{2}(x_{2})>0\), we define \(f(x_{1}|x_{2})\) by
\[f(x_{1}|x_{2})=\frac{f(x_{1},x_{2})}{f_{2}(x_{2})}\]
and show that \(f(\cdot |x_{2})\) is a p.d.f. We call \(f(\cdot |x_{1})\) the conditional probability density function of \(X_{2}\) given \(X_{1}=x_{1}\), and call \(f(\cdot |x_{2})\) the conditional probability density function of \(X_{1}\) given \(X_{2}=x_{2}\). We also have
\[\mathbb{P}(X_{2}\in B|X_{1}=x_{1})=\int_{B}f(x_{2}|x_{1})dx_{2}\]
and
\[\mathbb{P}(X_{1}\in B|X_{2}=x_{2})=\int_{B}f(x_{1}|x_{2})dx_{1}.\]
If \(X_{1}\) and \(X_{2}\) are both discrete, then \(f(x_{2}|x_{1})\) has the following interpretation:
\begin{align*} f(x_{2}|x_{1}) & =\frac{f(x_{1},x_{2})}{f_{1}(x_{1})}\\ & =\frac{\mathbb{P}(X_{1}=x_{1},X_{2}=x_{2})}{\mathbb{P}(X_{1}=x_{1})}\\ & =\mathbb{P}(X_{2}=x_{2}|X_{1}=x_{1}).\end{align*}
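This interpretation is easy to verify on a small discrete table. The joint pmf below is a hypothetical example (not from the text): the ratio \(f(x_{1},x_{2})/f_{1}(x_{1})\) agrees with the elementary conditional probability, and \(f(\cdot |x_{1})\) sums to one.

```python
from fractions import Fraction

# Hypothetical joint pmf of (X1, X2) on {0,1} x {0,1}.
f = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
     (1, 0): Fraction(2, 8), (1, 1): Fraction(2, 8)}
f1 = {x1: f[(x1, 0)] + f[(x1, 1)] for x1 in (0, 1)}    # marginal of X1

def cond(x2, x1):
    """f(x2 | x1) = f(x1, x2) / f1(x1) = P(X2 = x2 | X1 = x1)."""
    return f[(x1, x2)] / f1[x1]

assert f1[0] == Fraction(1, 2) and f1[1] == Fraction(1, 2)
assert cond(1, 0) == Fraction(3, 4)            # P(X2=1 | X1=0) = (3/8)/(1/2)
assert cond(0, 1) + cond(1, 1) == 1            # f(. | x1) is a pmf
```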
We can also define the conditional distribution function of \(X_{2}\) given \(X_{1}=x_{1}\) by
\[F(x_{2}|x_{1})=\int_{-\infty}^{x_{2}}f(x|x_{1})dx,\]
and similarly for \(F(x_{1}|x_{2})\).
Example. Let \(X\) and \(Y\) have the joint p.d.f.
\[f(x,y)=\frac{x+y}{21}\]
for \(x=1,2,3\) and \(y=1,2\). Then, the marginal probability functions are given by
\[f_{X}(x)=\frac{2x+3}{21}\mbox{ for }x=1,2,3\]
and
\[f_{Y}(y)=\frac{3y+6}{21}\mbox{ for }y=1,2.\]
Therefore, the conditional p.d.f. of \(X\) given \(Y=y\) is equal to
\[g(x|y)=\frac{(x+y)/21}{(3y+6)/21}=\frac{x+y}{3y+6}\]
for \(x=1,2,3\) when \(y=1\) or \(2\). For example, we have
\begin{align*} \mathbb{P}(X=2|Y=2) & =g(2|2)\\ & =4/12=1/3. \end{align*}
Similarly, the conditional p.d.f. of \(Y\), given \(X=x\), is equal to
\[h(y|x)=\frac{x+y}{2x+3}\]
for \(y=1,2\) when \(x=1,2\) or \(3\). \(\sharp\)
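The computations in this example can be confirmed with exact rational arithmetic; everything below is taken directly from the stated table \(f(x,y)=(x+y)/21\).

```python
from fractions import Fraction

# Joint table f(x, y) = (x + y)/21 for x in {1,2,3}, y in {1,2}.
f = {(x, y): Fraction(x + y, 21) for x in (1, 2, 3) for y in (1, 2)}

f_X = {x: sum(f[(x, y)] for y in (1, 2)) for x in (1, 2, 3)}
f_Y = {y: sum(f[(x, y)] for x in (1, 2, 3)) for y in (1, 2)}

assert all(f_X[x] == Fraction(2 * x + 3, 21) for x in (1, 2, 3))
assert all(f_Y[y] == Fraction(3 * y + 6, 21) for y in (1, 2))

g = lambda x, y: f[(x, y)] / f_Y[y]            # conditional p.d.f. of X given Y=y
h = lambda y, x: f[(x, y)] / f_X[x]            # conditional p.d.f. of Y given X=x

assert g(2, 2) == Fraction(1, 3)               # P(X=2 | Y=2) = 1/3
assert all(h(y, x) == Fraction(x + y, 2 * x + 3) for x in (1, 2, 3) for y in (1, 2))
```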
In general, the (joint) marginal p.d.f. of \(X_{i_{1}},\cdots ,X_{i_{m}}\) is given by
\[f_{i_{1},\cdots ,i_{m}}(x_{i_{1}},\cdots ,x_{i_{m}})=\int_{\mathbb{R}}\cdots\int_{\mathbb{R}}f(x_{1},\cdots ,x_{n})dx_{j_{1}}\cdots dx_{j_{k}},\]
where \(\{j_{1},\cdots ,j_{k}\}\) is the complementary set of indices and \(m+k=n\). Also, the joint conditional p.d.f. and the conditional d.f. are given by
\[f(x_{j_{1}},\cdots ,x_{j_{k}}|x_{i_{1}},\cdots ,x_{i_{m}})=\frac{f(x_{1},\cdots ,x_{n})}{f_{i_{1},\cdots ,i_{m}}(x_{i_{1}},\cdots ,x_{i_{m}})}\]
and
\[F(x_{j_{1}},\cdots ,x_{j_{k}}|x_{i_{1}},\cdots ,x_{i_{m}})=\int_{-\infty}^{x_{j_{1}}}\cdots\int_{-\infty}^{x_{j_{k}}}f(y_{1},\cdots ,y_{k}|x_{i_{1}},\cdots ,x_{i_{m}})dy_{1}\cdots dy_{k},\]
respectively.
Example. Let the joint p.d.f. of \(X\) and \(Y\) be defined by
\[f(x,y)=\frac{x+y}{21}\]
for \(x=1,2,3\) and \(y=1,2\).
Then, we have
\begin{align*} f_{X}(x) & =\sum_{y\in R_{Y}}f(x,y)\\ & =\sum_{y=1}^{2}\frac{x+y}{21}\\ & =\frac{2x+3}{21}\end{align*}
for \(x=1,2,3\) and
\begin{align*} f_{Y}(y) & =\sum_{x\in R_{X}}f(x,y)\\ & =\sum_{x=1}^{3}\frac{x+y}{21}\\ & =\frac{6+3y}{21}\end{align*}
for \(y=1,2\). Both \(f_{X}(x)\) and \(f_{Y}(y)\) satisfy the properties of probability density function. \(\sharp\)
Example. Let the joint p.d.f. of \(X\) and \(Y\) be given by
\[f(x,y)=\frac{xy^{2}}{30}\]
for \(x=1,2,3\) and \(y=1,2\).
The marginal probability density functions are given by
\[f_{X}(x)=\sum_{y=1}^{2}\frac{xy^{2}}{30}=\frac{x}{6}\]
for \(x=1,2,3\) and
\[f_{Y}(y)=\sum_{x=1}^{3}\frac{xy^{2}}{30}=\frac{y^{2}}{5}\]
for \(y=1,2\). Then, we have \(f(x,y)=f_{X}(x)f_{Y}(y)\) for \(x=1,2,3\) and \(y=1,2\). \(\sharp\)
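The factorization \(f(x,y)=f_{X}(x)f_{Y}(y)\) asserted in this example can be verified directly from the stated table.

```python
from fractions import Fraction

# Joint table f(x, y) = x y^2 / 30 for x in {1,2,3}, y in {1,2}.
f = {(x, y): Fraction(x * y * y, 30) for x in (1, 2, 3) for y in (1, 2)}
f_X = {x: sum(f[(x, y)] for y in (1, 2)) for x in (1, 2, 3)}
f_Y = {y: sum(f[(x, y)] for x in (1, 2, 3)) for y in (1, 2)}

assert all(f_X[x] == Fraction(x, 6) for x in (1, 2, 3))
assert all(f_Y[y] == Fraction(y * y, 5) for y in (1, 2))
assert all(f[(x, y)] == f_X[x] * f_Y[y] for (x, y) in f)   # joint = product of marginals
```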
Example. Let the joint p.d.f. of \(X\) and \(Y\) be
\[f(x,y)=\frac{xy^{2}}{13}\]
for \((x,y)=(1,1),(1,2),(2,2)\). Then, the p.d.f. of \(X\) is given by
\[f_{X}(x)=\left\{\begin{array}{ll}
\frac{5}{13}, & x=1\\
\frac{8}{13}, & x=2,
\end{array}\right .\]
and the p.d.f. of \(Y\) is given by
\[f_{Y}(y)=\left\{\begin{array}{ll}
\frac{1}{13}, & y=1,\\
\frac{12}{13}, & y=2.
\end{array}\right .\] \(\sharp\)
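Here the support is not a product set, so the marginals must be summed over the three listed points; the sketch below confirms the stated values and notes that, unlike the previous example, the joint p.d.f. does not factor.

```python
from fractions import Fraction

# Support {(1,1), (1,2), (2,2)} with f(x, y) = x y^2 / 13.
f = {(1, 1): Fraction(1, 13), (1, 2): Fraction(4, 13), (2, 2): Fraction(8, 13)}
assert sum(f.values()) == 1

f_X = {x: sum(p for (a, b), p in f.items() if a == x) for x in (1, 2)}
f_Y = {y: sum(p for (a, b), p in f.items() if b == y) for y in (1, 2)}

assert f_X == {1: Fraction(5, 13), 2: Fraction(8, 13)}
assert f_Y == {1: Fraction(1, 13), 2: Fraction(12, 13)}
# the joint p.d.f. does not factor: f_X(1) f_Y(1) = 5/169, not f(1,1) = 1/13
assert f_X[1] * f_Y[1] != f[(1, 1)]
```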
Example. Let \(X\) and \(Y\) have the joint p.d.f. \(f(x,y)=e^{-x-y}\) for \(0<x<\infty\) and \(0<y<\infty\). Let
\[A=\{(x,y):0<x<\infty\mbox{ and }0<y<x/3\}.\]
The probability that \((X,Y)\) falls in \(A\) is given by
\begin{align*} P[(X,Y)\in A] & =\int_{0}^{\infty}\int_{0}^{x/3} e^{-x-y}dydx\\ & =\int_{0}^{\infty}(e^{-x}-e^{-4x/3})dx\\ & =\frac{1}{4}.\end{align*}
\(\sharp\)
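Since \(f(x,y)=e^{-x-y}=e^{-x}e^{-y}\), the example is consistent with sampling \(X\) and \(Y\) as independent Exponential(1) variables, so the answer \(1/4\) can be checked by Monte Carlo (sample size and seed below are arbitrary choices for the sketch).

```python
import random

# Monte Carlo estimate of P(0 < Y < X/3) with X, Y ~ iid Exponential(1);
# the exact value computed above is 1/4.
random.seed(0)
n = 200_000
hits = 0
for _ in range(n):
    x, y = random.expovariate(1.0), random.expovariate(1.0)
    if 0 < y < x / 3:
        hits += 1
estimate = hits / n

assert abs(estimate - 0.25) < 0.01     # well within Monte Carlo error of 1/4
```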


