The distribution function \(F\) of a random variable \(X\) satisfies the following properties.
- \(0\leq F(x)\leq 1\) for \(x\in \mathbb{R}\).
- \(F\) is nondecreasing.
- \(F\) is continuous from the right.
- \(F(x)\rightarrow 0\) as \(x\rightarrow -\infty\) and \(F(x)\rightarrow 1\) as \(x\rightarrow +\infty\).
- \(\mathbb{P}(a<X\leq b)=F(b)-F(a)\).
- If \(X\) is discrete, its distribution function is a step function whose value at \(x\) is given by \[F(x)=\sum_{x_{j}\leq x} f(x_{j})\] and \[f(x_{j})\equiv\mathbb{P}(X=x_{j})=F(x_{j})-F(x_{j-1}),\] where it is assumed that \(x_{1}<x_{2}<\cdots\). We also note that \(\mathbb{P}(X=x_{j})=0\) for a continuous random variable \(X\).
- If \(X\) is of continuous type, its distribution function \(F\) is continuous. Furthermore, we have \(\frac{dF(x)}{dx}=f(x)\) at continuity points of \(f\), since \[F(x)=\int_{-\infty}^{x} f(t)dt\mbox{ and }\frac{dF(x)}{dx}=f(x)\] by the fundamental theorem of calculus.
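The following is a minimal numerical sketch, not part of the text, using SciPy's standard normal (chosen only for concreteness) to illustrate the last property: a finite-difference derivative of \(F\) agrees with \(f\) at continuity points.

```python
# Sketch: check dF/dx = f(x) numerically for an assumed N(0,1) distribution.
import numpy as np
from scipy.stats import norm

h = 1e-6
for x in np.linspace(-3.0, 3.0, 7):
    dF = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)  # central difference of F
    print(f"x={x:+.1f}  dF/dx={dF:.6f}  f(x)={norm.pdf(x):.6f}")
```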
We now consider the random vector \({\bf X}=(X_{1},X_{2})\) and the distribution function \(F(x_{1},x_{2})\) of \({\bf X}\), or the joint distribution function of \(X_{1},X_{2}\). We have
\[F(x_{1},x_{2})=\mathbb{P}(X_{1}\leq x_{1},X_{2}\leq x_{2}).\]
Let \({\bf X}\) have a joint p.d.f. \(f(x_{1},x_{2})\). Then, we have
\[\mathbb{P}(a_{1}\leq X_{1}\leq b_{1},a_{2}\leq X_{2}\leq b_{2})=\int_{a_{1}}^{b_{1}}\int_{a_{2}}^{b_{2}} f(x_{1},x_{2})dx_{2}dx_{1}.\]
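As a hedged illustration of this rectangle probability (the joint density \(f(x_{1},x_{2})=\phi(x_{1})\phi(x_{2})\) of two independent standard normals is assumed only for concreteness), the sketch below compares numerical double integration with the closed form obtained from the normal c.d.f.

```python
# Sketch: P(a1<=X1<=b1, a2<=X2<=b2) for an assumed joint density of two
# independent standard normals, by numerical double integration.
import numpy as np
from scipy import integrate
from scipy.stats import norm

f = lambda x1, x2: norm.pdf(x1) * norm.pdf(x2)      # assumed joint p.d.f.
a1, b1, a2, b2 = -1.0, 1.0, 0.0, 2.0

# dblquad integrates the first argument (x1) innermost, the second (x2) outermost.
p, _ = integrate.dblquad(f, a2, b2, lambda x2: a1, lambda x2: b1)
exact = (norm.cdf(b1) - norm.cdf(a1)) * (norm.cdf(b2) - norm.cdf(a2))
print(p, exact)                                     # the two values should agree
```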
We have the following properties.
- \(0\leq F(x_{1},x_{2})\leq 1\) for \(x_{1},x_{2}\in\mathbb{R}\).
- \(F\) is continuous from the right with respect to each of the coordinates \(x_{1},x_{2}\).
- If both \(x_{1},x_{2}\rightarrow\infty\), then \(F(x_{1},x_{2})\rightarrow 1\). If at least one of the \(x_{1},x_{2}\rightarrow-\infty\), then \(F(x_{1},x_{2})\rightarrow 0\). We express this by writing \(F(\infty ,\infty )=1\), \(F(-\infty ,x_{2})=F(x_{1},-\infty)= F(-\infty ,-\infty )=0\), where \(-\infty <x_{1},x_{2}<\infty\).
- We have \[\frac{\partial^{2}}{\partial x_{1}\partial x_{2}}F(x_{1},x_{2})=f(x_{1},x_{2})\] at continuity points of \(f\).
Let \({\bf X}=(X_{1},\cdots ,X_{k})\) be a random vector, and let \(F(x_{1},\cdots ,x_{k})\) be the distribution function of \({\bf X}\), or the joint distribution function of \(X_{1},\cdots ,X_{k}\). Then, we have
\[\frac{\partial^{k}}{\partial x_{1}\cdots\partial x_{k}}F(x_{1},\cdots ,x_{k})=f(x_{1},\cdots ,x_{k})\]
at the continuity points of \(f\).
\[F(\infty ,\cdots ,\infty ,x_{j},\infty ,\cdots ,\infty)=F_{j}(x_{j})\]
is the distribution function of the random variable \(X_{j}\). If \(m\) \(x_{j}\)’s are replaced by \(\infty\) for \(1<m<k\), then the resulting function is the joint distribution function of the random variables corresponding to the remaining \((k-m)\) \(X_{j}\)’s. All these distribution functions are called marginal distribution functions.
The p.d.f of \({\bf X}\), \(f(x_{1},\cdots ,x_{k})\) is also called the joint p.d.f. of the random variables \(X_{1},\cdots ,X_{k}\). Consider first the case \(k=2\). We set
\[f_{1}(x_{1})=\left\{\begin{array}{l}
{\displaystyle \sum_{x_{2}} f(x_{1},x_{2})}\\
{\displaystyle \int_{-\infty}^{\infty} f(x_{1},x_{2})dx_{2}}
\end{array}\right .\]
and
\[f_{2}(x_{2})=\left\{\begin{array}{l}
{\displaystyle \sum_{x_{1}} f(x_{1},x_{2})}\\
{\displaystyle \int_{-\infty}^{\infty} f(x_{1},x_{2})dx_{1}}
\end{array}\right .\]
Then \(f_{1}\) is the p.d.f. of \(X_{1}\), and \(f_{2}\) is the p.d.f. of \(X_{2}\). We call \(f_{1},f_{2}\) the marginal p.d.f.’s. In fact, we have
\[\mathbb{P}(X_{1}\in B)=\left\{\begin{array}{l}{\displaystyle \sum_{x_{1}\in B,x_{2}\in\mathbb{R}} f(x_{1},x_{2})=
\sum_{x_{1}\in B}\sum_{x_{2}\in\mathbb{R}} f(x_{1},x_{2})=\sum_{x_{1}\in B} f_{1}(x_{1})}\\
{\displaystyle \int_{B}\int_{\mathbb{R}} f(x_{1},x_{2})dx_{2}dx_{1}=\int_{B}\left [\int_{\mathbb{R}} f(x_{1},x_{2})dx_{2}\right ]dx_{1}=\int_{B} f_{1}(x_{1})dx_{1}}.
\end{array}\right .\]
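A small sketch (the joint p.m.f. below is an assumed toy example, not from the text) showing the marginal p.m.f.’s obtained by summing over the other coordinate.

```python
# Sketch: marginal p.m.f.'s of a discrete random vector (X1, X2).
import numpy as np

# assumed joint p.m.f.; rows index x1 in {0,1,2}, columns index x2 in {0,1}
joint = np.array([[0.10, 0.15],
                  [0.20, 0.25],
                  [0.05, 0.25]])
assert np.isclose(joint.sum(), 1.0)

f1 = joint.sum(axis=1)    # marginal p.m.f. of X1: sum over x2
f2 = joint.sum(axis=0)    # marginal p.m.f. of X2: sum over x1
print("f1 =", f1, " f2 =", f2)
```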
Suppose that \(f_{1}(x_{1})>0\). Then, we define \(f(x_{2}|x_{1})\) by
\[f(x_{2}|x_{1})=\frac{f(x_{1},x_{2})}{f_{1}(x_{1})}.\]
This is considered as a function of \(x_{2}\), where \(x_{1}\) is an arbitrary and fixed value of \(X_{1}\). Then \(f(\cdot |x_{1})\) is a p.d.f. Similarly, if \(f_{2}(x_{2})>0\), we define \(f(x_{1}|x_{2})\) by
\[f(x_{1}|x_{2})=\frac{f(x_{1},x_{2})}{f_{2}(x_{2})}.\]
Then \(f(\cdot |x_{2})\) is a p.d.f. We call \(f(\cdot |x_{1})\) the conditional p.d.f. of \(X_{2}\), given that \(X_{1}=x_{1}\), when \(f_{1}(x_{1})>0\). Similarly, we call \(f(\cdot |x_{2})\) the conditional p.d.f. of \(X_{1}\), given that \(X_{2}=x_{2}\), when \(f_{2}(x_{2})>0\). Furthermore, if \(X_{1}\), \(X_{2}\) are both discrete, then \(f(x_{2}|x_{1})\) has the following interpretation:
\begin{align*} f(x_{2}|x_{1}) & =\frac{f(x_{1},x_{2})}{f_{1}(x_{1})}\\ & =\frac{\mathbb{P}(X_{1}=x_{1},X_{2}=x_{2})}{\mathbb{P}(X_{1}=x_{1})}\\ & =\mathbb{P}(X_{2}=x_{2}|X_{1}=x_{1}).\end{align*}
Therefore, we obtain
\[\mathbb{P}(X_{2}\in B|X_{1}=x_{1})=\sum_{x_{2}\in B} f(x_{2}|x_{1}).\]
We can also define the conditional distribution function of \(X_{2}\) given \(X_{1}=x_{1}\) by means of
\[F(x_{2}|x_{1})=\left\{\begin{array}{l}
{\displaystyle \sum_{x’_{2}\leq x_{2}} f(x’_{2}|x_{1})}\\
{\displaystyle \int_{-\infty}^{x_{2}} f(x’_{2}|x_{1})dx’_{2}}
\end{array}\right .\]
and similarly for \(F(x_{1}|x_{2})\).
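Continuing the assumed discrete example above, the sketch below forms the conditional p.m.f. \(f(\cdot |x_{1})\) and the conditional distribution function \(F(\cdot |x_{1})\) by normalization and cumulative summation.

```python
# Sketch: conditional p.m.f. f(x2|x1) = f(x1,x2)/f1(x1) and its distribution function.
import numpy as np

joint = np.array([[0.10, 0.15],     # assumed joint p.m.f. as before
                  [0.20, 0.25],
                  [0.05, 0.25]])
f1 = joint.sum(axis=1)

x1 = 1                              # condition on X1 = 1 (an assumed value with f1 > 0)
f_cond = joint[x1] / f1[x1]         # f(. | x1), a p.m.f. in x2
F_cond = np.cumsum(f_cond)          # F(. | x1)
print("f(x2|x1=1) =", f_cond, " F(x2|x1=1) =", F_cond)
```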
Let \({\bf X}=(X_{1},\cdots ,X_{k})\) be a random vector with p.d.f. \(f(x_{1},\cdots ,x_{k})\); we have called \(f(x_{1},\cdots,x_{k})\) the joint p.d.f. of the random variables \(X_{1},\cdots,X_{k}\). If we integrate or sum over \(n\) of the variables \(x_{1},\cdots ,x_{k}\), keeping the remaining \(m\) fixed \((n+m=k)\), the resulting function is the joint p.d.f. of the random variables corresponding to the remaining \(m\) variables; that is,
\[f_{i_{1},\cdots ,i_{m}}(x_{i_{1}},\cdots ,x_{i_{m}})=\left\{\begin{array}{l}
{\displaystyle \sum_{x_{j_{1}},\cdots ,x_{j_{n}}} f(x_{1},\cdots ,x_{k})}\\
{\displaystyle \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}f(x_{1},\cdots ,x_{k})dx_{j_{1}}\cdots dx_{j_{n}}}.
\end{array}\right .\]
There are
\[\left (\begin{array}{c} k\\ 1\end{array}\right )+\left (\begin{array}{c} k\\ 2\end{array}\right )+\cdots +\left (\begin{array}{c} k\\ k-1\end{array}\right )=2^{k}-2\]
such p.d.f.’s, which are also called marginal p.d.f.’s. Also, if \(x_{i_{1}}, \cdots ,x_{i_{m}}\) are such that \(f_{i_{1},\cdots,i_{m}}(x_{i_{1}},\cdots , x_{i_{m}})>0\), then the function of \(x_{j_{1}},\cdots ,x_{j_{n}}\) defined by
\[f(x_{j_{1}},\cdots ,x_{j_{n}}|x_{i_{1}},\cdots ,x_{i_{m}})=\frac{f(x_{1},\cdots ,x_{k})}{f_{i_{1},\cdots ,i_{m}}(x_{i_{1}},\cdots ,x_{i_{m}})}\]
is a p.d.f. called the joint conditional p.d.f. of the random variables \(X_{j_{1}},\cdots ,X_{j_{n}}\), given \(X_{i_{1}}=x_{i_{1}},\cdots , X_{i_{m}}=x_{i_{m}}\). Conditional distribution functions are defined by
\[F(x_{j_{1}},\cdots ,x_{j_{n}}|x_{i_{1}},\cdots ,x_{i_{m}})=\left\{\begin{array}{l}
{\displaystyle \sum_{(x’_{j_{1}},\cdots ,x’_{j_{n}})\leq (x_{j_{1}},\cdots ,x_{j_{n}})} f(x’_{j_{1}},\cdots ,x’_{j_{n}}|x_{i_{1}},\cdots ,x_{i_{m}})}\\
{\displaystyle \int_{-\infty}^{x_{j_{1}}}\cdots\int_{-\infty}^{x_{j_{n}}}f(x’_{j_{1}},\cdots ,x’_{j_{n}}|x_{i_{1}},\cdots ,x_{i_{m}})dx’_{j_{1}}\cdots dx’_{j_{n}}}.
\end{array}\right .\]
Let \({\bf X}=(X_{1},\cdots ,X_{k})\) be a random vector with p.d.f. \(f\), and let \(g:(\mathbb{R}^{k},{\cal B}^{k})\rightarrow (\mathbb{R},{\cal B})\) be a measurable function such that \(g({\bf X})=g(X_{1},\cdots ,X_{k})\) is a random variable. The \(n\)th moment of \(g({\bf X})\) is denoted by \(\mathbb{E}[g({\bf X})]^{n}\) and is defined by
\[\mathbb{E}[g({\bf X})]^{n}=\left\{\begin{array}{l}
{\displaystyle \sum_{{\bf x}} [g({\bf x})]^{n}f({\bf x})}\\
{\displaystyle \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}[g(x_{1},\cdots ,x_{k})]^{n}f(x_{1},\cdots ,x_{k})dx_{1}\cdots dx_{k}.}
\end{array}\right .\]
For \(n=1\), we get
\[\mathbb{E}[g({\bf X})]=\left\{\begin{array}{l}
{\displaystyle \sum_{{\bf x}} g({\bf x})f({\bf x})}\\
{\displaystyle \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}g(x_{1},\cdots ,x_{k})f(x_{1},\cdots ,x_{k})dx_{1}\cdots dx_{k}.}
\end{array}\right .\]
and call it the mathematical expectation, or mean value, or just mean, of \(g({\bf X})\). For \(r>0\), the \(r\)th absolute moment of \(g({\bf X})\) is denoted by \(\mathbb{E}|g({\bf X})|^{r}\) and is defined by
\[\mathbb{E}|g({\bf X})|^{r}=\left\{\begin{array}{l}
{\displaystyle \sum_{{\bf x}} |g({\bf x})|^{r}f({\bf x})}\\
{\displaystyle \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}
|g(x_{1},\cdots ,x_{k})|^{r}f(x_{1},\cdots ,x_{k})dx_{1}\cdots dx_{k}.}
\end{array}\right .\]
For an arbitrary constant \(c\), and \(n\) and \(r\) as above, the \(n\)th moment and \(r\)th absolute moment of \(g({\bf X})\) about \(c\) are denoted by \(\mathbb{E}[g({\bf X})-c]^{n}\) and \(\mathbb{E}|g({\bf X})-c|^{r}\), respectively, and are defined by
\[\mathbb{E}[g({\bf X})-c]^{n}=\left\{\begin{array}{l}
{\displaystyle \sum_{{\bf x}} [g({\bf x})-c]^{n}f({\bf x})}\\
{\displaystyle \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}[g(x_{1},\cdots ,x_{k})-c]^{n}f(x_{1},\cdots ,x_{k})dx_{1}\cdots dx_{k}.}
\end{array}\right .\]
and
\[\mathbb{E}|g({\bf X})-c|^{r}=\left\{\begin{array}{l}
{\displaystyle \sum_{{\bf x}} |g({\bf x})-c|^{r}f({\bf x})}\\
{\displaystyle \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}|g(x_{1},\cdots ,x_{k})-c|^{r}f(x_{1},\cdots ,x_{k})dx_{1}\cdots dx_{k}.}
\end{array}\right .\]
For \(c=\mathbb{E}[g({\bf X})]\), the moments are called central moments. The second central moment of \(g({\bf X})\), that is,
\[\mathbb{E}[g({\bf X})-\mathbb{E}g({\bf X})]^{2}=\left\{\begin{array}{l}
{\displaystyle \sum_{{\bf x}} [g({\bf x})-\mathbb{E}g({\bf X})]^{2}f({\bf x})}\\
{\displaystyle \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}[g(x_{1},\cdots ,x_{k})-\mathbb{E}g({\bf X})]^{2}f(x_{1},\cdots ,x_{k})dx_{1}\cdots dx_{k}.}
\end{array}\right .\]
is called the variance of \(g({\bf X})\), and is denoted by \(\mbox{Var}[g({\bf X})]\) or \(\sigma^{2}[g({\bf X})]\).
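As a brief sketch under an assumed discrete p.m.f. (not from the text), the mean and variance of \(g(X)\) are computed directly from the defining sums.

```python
# Sketch: E[g(X)] and Var[g(X)] for an assumed discrete distribution.
import numpy as np

x = np.array([0, 1, 2, 3])              # assumed support of X
f = np.array([0.1, 0.2, 0.3, 0.4])      # assumed p.m.f. of X
g = lambda t: t ** 2                    # an arbitrary measurable g

mean_g = np.sum(g(x) * f)                        # E[g(X)]
var_g = np.sum((g(x) - mean_g) ** 2 * f)         # Var[g(X)], the second central moment
print(mean_g, var_g)
```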
We have the following properties.
- Given any constant \(c\), we have \(\mathbb{E}(c)=c\).
- We have \(\mathbb{E}[cg({\bf X})]=c\mathbb{E}[g({\bf X})]\). In particular, we also have \(\mathbb{E}(cX)=c\mathbb{E}(X)\).
- We have \(\mathbb{E}[g({\bf X})+d]=\mathbb{E}[g({\bf X})]+d\). In particular, we also have \(\mathbb{E}(X+d)=\mathbb{E}(X)+d\).
- We have \[\mathbb{E}\left [\sum_{j=1}^{n} c_{j}g_{j}({\bf X})\right]= \sum_{j=1}^{n} c_{j}\mathbb{E}[g_{j}({\bf X})].\] In particular, we also have \[\mathbb{E}\left (\sum_{j=1}^{n} c_{j}X_{j}\right)=\sum_{j=1}^{n} c_{j}\mathbb{E}(X_{j}).\]
- If \(X\geq Y\), then \(\mathbb{E}(X)\geq \mathbb{E}(Y)\). In particular, if \(X\geq 0\), then \(\mathbb{E}(X)\geq 0\).
- We have \(|\mathbb{E}[g({\bf X})]|\leq \mathbb{E}|g({\bf X})|\).
- Given any constant \(c\), we have \(\mbox{Var}(c)=0\).
- We have \(\mbox{Var}[cg({\bf X})]=c^{2}\mbox{Var}[g({\bf X})]\). In particular, we also have \(\mbox{Var}(cX)=c^{2}\mbox{Var}(X)\).
- We have \(\mbox{Var}[g({\bf X})+d]=\mbox{Var}[g({\bf X})]\). In particular, we also have \(\mbox{Var}(X+d)=\mbox{Var}(X)\). Therefore, we obtain \(\mbox{Var}[cg({\bf X})+d]=c^{2}\mbox{Var}[g({\bf X})]\), and in particular, we have \(\mbox{Var}(cX+d)=c^{2}\mbox{Var}(X)\).
- We have \(\mbox{Var}[g({\bf X})]=\mathbb{E}[g({\bf X})]^{2}-[\mathbb{E}g({\bf X})]^{2}\). In particular, we also have \(\mbox{Var}(X)=\mathbb{E}(X^{2})-(\mathbb{E}X)^{2}\).
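A quick numerical check, reusing the assumed p.m.f. from the sketch above, of the last two properties: \(\mbox{Var}(X)=\mathbb{E}(X^{2})-(\mathbb{E}X)^{2}\) and \(\mbox{Var}(cX+d)=c^{2}\mbox{Var}(X)\).

```python
# Sketch: verify Var(X) = E(X^2) - (EX)^2 and Var(cX + d) = c^2 Var(X).
import numpy as np

x = np.array([0, 1, 2, 3])              # assumed support and p.m.f. as before
f = np.array([0.1, 0.2, 0.3, 0.4])

EX  = np.sum(x * f)
EX2 = np.sum(x ** 2 * f)
var = np.sum((x - EX) ** 2 * f)
print(np.isclose(var, EX2 - EX ** 2))

c, d = 3.0, 5.0
y = c * x + d
var_y = np.sum((y - np.sum(y * f)) ** 2 * f)
print(np.isclose(var_y, c ** 2 * var))
```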
The conditional moments of random variables are defined by
\[\mathbb{E}(X_{2}|X_{1}=x_{1})=\left\{\begin{array}{l}
{\displaystyle \sum_{x_{2}} x_{2}f(x_{2}|x_{1})}\\
{\displaystyle \int_{-\infty}^{\infty} x_{2}f(x_{2}|x_{1})dx_{2}}
\end{array}\right .\]
and
\[\mbox{Var}(X_{2}|X_{1}=x_{1})=\left\{\begin{array}{l}
{\displaystyle \sum_{x_{2}} [x_{2}-\mathbb{E}(X_{2}|X_{1}=x_{1})]^{2}f(x_{2}|x_{1})}\\
{\displaystyle \int_{-\infty}^{\infty} [x_{2}-\mathbb{E}(X_{2}|X_{1}=x_{1})]^{2}f(x_{2}|x_{1})dx_{2}}
\end{array}\right .\]
We see that \(\mathbb{E}(X_{2}|X_{1}=x_{1})\) is a function of \(x_{1}\). Replacing \(x_{1}\) by \(X_{1}\) and writing \(\mathbb{E}(X_{2}|X_{1})\) instead of \(\mathbb{E}(X_{2}|X_{1}=x_{1})\), we see that \(\mathbb{E}(X_{2}|X_{1})\) is itself a random variable and a function of \(X_{1}\). Then, we may talk about the expectation of \(\mathbb{E}(X_{2}|X_{1})\); that is, \(\mathbb{E}[\mathbb{E}(X_{2}|X_{1})]\). We can show that if \(\mathbb{E}(X_{2})\) and \(\mathbb{E}(X_{2}|X_{1})\) exist, then
\[\mathbb{E}[\mathbb{E}(X_{2}|X_{1})]=\mathbb{E}(X_{2}).\]
Note that the expectation of the Cauchy distribution does not exist. We also have the following properties.
- Let \(X\) and \(Y\) be two random variables, let \(g(X)\) be a measurable function of \(X\), and let \(\mathbb{E}(Y)\) exist. Then, for all \(x\) for which the conditional expectations below exist, we have \[\mathbb{E}[Yg(X)|X=x]=g(x)\mathbb{E}(Y|X=x)\] or \[\mathbb{E}[Yg(X)|X]=g(X)\mathbb{E}(Y|X)\] as random variables.
- Let \(X\) and \(Y\) be two random variables. Then, we have \[\mbox{Var}(Y)=\mathbb{E}[\mbox{Var}(Y|X)]+\mbox{Var}[\mathbb{E}(Y|X)].\]
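A Monte Carlo sketch under an assumed model (\(X_{2}=X_{1}+Z\) with \(X_{1},Z\) independent \(N(0,1)\), so \(\mathbb{E}(X_{2}|X_{1})=X_{1}\) and \(\mbox{Var}(X_{2}|X_{1})=1\)), illustrating \(\mathbb{E}[\mathbb{E}(X_{2}|X_{1})]=\mathbb{E}(X_{2})\) and the variance decomposition in the last item.

```python
# Sketch: tower property and Var(Y) = E[Var(Y|X)] + Var[E(Y|X)] for an assumed model.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)        # X2 = X1 + Z

cond_mean = x1                          # E(X2|X1) = X1 for this model
cond_var = 1.0                          # Var(X2|X1) = 1 for this model

print(np.mean(cond_mean), np.mean(x2))              # both approximately 0
print(cond_var + np.var(cond_mean), np.var(x2))     # both approximately 2
```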
Theorem. Let \({\bf X}\) be a \(k\)-dimensional random vector and \(g\geq 0\) be a real-valued measurable function defined on \(\mathbb{R}^{k}\) such that \(g({\bf X})\) is a random variable, and let \(c>0\). Then, we have
\[\mathbb{P}(g({\bf X})\geq c)\leq\frac{\mathbb{E}[g({\bf X})]}{c}.\]
Let \(X\) be a random variable and take \(g(X)=|X-\mu |^{r}\) and \(\mu=\mathbb{E}(X)\) for \(r>0\). Then, we have
\[\mathbb{P}(|X-\mu |\geq c)=\mathbb{P}(|X-\mu |^{r}\geq c^{r})\leq\frac{\mathbb{E}|X-\mu |^{r}}{c^{r}}.\]
This is known as Markov’s inequality. For \(r=2\), we have
\[\mathbb{P}(|X-\mu |\geq c)\leq\frac{\mathbb{E}(X-\mu )^{2}}{c^{2}}=\frac{\sigma^{2}(X)}{c^{2}}=\frac{\sigma^{2}}{c^{2}}.\]
This is known as Tchebyshev’s inequality. In particular, if \(c=k\sigma\), then
\[\mathbb{P}(|X-\mu |\geq k\sigma)\leq\frac{1}{k^{2}}.\]
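A hedged numerical check of the inequality for an assumed standard exponential distribution (so \(\mu=\sigma=1\)); the exact tail probability stays below the bound \(1/k^{2}\).

```python
# Sketch: P(|X - mu| >= k*sigma) <= 1/k^2 for an assumed Exp(1) distribution.
import numpy as np
from scipy.stats import expon

mu, sigma = expon.mean(), expon.std()       # both equal 1 for the standard exponential
for k in (1.5, 2.0, 3.0):
    p = expon.cdf(mu - k * sigma) + expon.sf(mu + k * sigma)   # exact two-sided tail
    print(f"k={k}:  P={p:.4f}  bound={1 / k**2:.4f}")
```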
Theorem. (Schwarz inequality). Let \(X\) and \(Y\) be two random variables with means \(\mu_{1},\mu_{2}\) and variances \(\sigma_{1}^{2},\sigma_{2}^{2}\), respectively. Then, we have
\[\mathbb{E}^{2}[(X-\mu_{1})(Y-\mu_{2})]\leq\sigma_{1}^{2}\sigma_{2}^{2}\]
or equivalently
\[-\sigma_{1}\sigma_{2}\leq \mathbb{E}[(X-\mu_{1})(Y-\mu_{2})]\leq\sigma_{1}\sigma_{2},\]
and
\[\mathbb{E}[(X-\mu_{1})(Y-\mu_{2})]=\sigma_{1}\sigma_{2}\]
if and only if
\[\mathbb{P}\left(Y=\mu_{2}+\frac{\sigma_{2}}{\sigma_{1}}(X-\mu_{1})\right )=1,\]
and
\[\mathbb{E}[(X-\mu_{1})(Y-\mu_{2})]=-\sigma_{1}\sigma_{2}\]
if and only if
\[\mathbb{P}\left(Y=\mu_{2}-\frac{\sigma_{2}}{\sigma_{1}}(X-\mu_{1})\right)=1.\] \(\sharp\)
\(\mathbb{E}[(X-\mu_{1})(Y-\mu_{2})]\) is called the covariance of \(X,Y\) and is denoted by \(\mbox{Cov}(X,Y)\). If \(\sigma_{1}\), \(\sigma_{2}\) are the standard deviations of \(X\) and \(Y\), which are assumed to be positive, then the covariance of \((X-\mu_{1})/\sigma_{1}\) and \((Y-\mu_{2})/\sigma_{2}\) is called the correlation of \(X,Y\) and is denoted by \(\rho (X,Y)\); that is,
\begin{align*} \rho & =\mathbb{E}\left [\left (\frac{X-\mu_{1}}{\sigma_{1}}\right )\left (\frac{Y-\mu_{2}}{\sigma_{2}}\right )\right ]\\ & =\frac{\mbox{Cov}(X,Y)}{\sigma_{1}
\sigma_{2}}\\ & =\frac{\mathbb{E}(XY)-\mu_{1}\mu_{2}}{\sigma_{1}\sigma_{2}}.\end{align*}
From the Schwarz inequality, we have \(\rho^{2}\leq 1\); that is, \(-1\leq\rho\leq 1\), and \(\rho=1\) if and only if
\[Y=\mu_{2}+\frac{\sigma_{2}}{\sigma_{1}}(X-\mu_{1})\]
with probability \(1\), and \(\rho =-1\) if and only if
\[Y=\mu_{2}-\frac{\sigma_{2}}{\sigma_{1}}(X-\mu_{1})\]
with probability \(1\). So \(\rho =\pm 1\) means that \(X\) and \(Y\) are linearly related; therefore, \(\rho\) is a measure of linear dependence between \(X\) and \(Y\). If \(\rho =0\), we say that \(X\) and \(Y\) are uncorrelated, while if \(\rho =\pm 1\), we say that \(X\) and \(Y\) are completely correlated (positively if \(\rho =1\), negatively if \(\rho =-1\)). For \(-1<\rho <1\) and \(\rho\neq 0\), we say that \(X\) and \(Y\) are correlated (positively if \(\rho >0\), negatively if \(\rho <0\)). Positive values of \(\rho\) may indicate a tendency of large values of \(Y\) to correspond to large values of \(X\) and of small values of \(Y\) to correspond to small values of \(X\). Negative values of \(\rho\) may indicate that small values of \(Y\) correspond to large values of \(X\) and large values of \(Y\) to small values of \(X\). Values of \(\rho\) close to zero may indicate that these tendencies are weak, while values of \(\rho\) close to \(\pm 1\) may indicate that the tendencies are strong.
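A short simulation sketch (the samples below are assumed illustrative data): the sample correlation is close to \(+1\) for an increasing linear relation, close to \(-1\) for a decreasing one, and close to \(0\) for independent variables.

```python
# Sketch: sample correlations for linearly related and independent variables.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(10_000)
print(np.corrcoef(x,  2.0 * x + 3.0)[0, 1])                 # approximately +1
print(np.corrcoef(x, -0.5 * x + 1.0)[0, 1])                 # approximately -1
print(np.corrcoef(x, rng.standard_normal(10_000))[0, 1])    # approximately 0
```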
Let \(X\) be a random variable with p.d.f. \(f\). The characteristic function of \(X\) denoted by \(\phi_{X}\) is a function defined on \(\mathbb{R}\) by
\begin{align*} \phi_{X}(t) & =\mathbb{E}[e^{itX}]\\ & =\left\{\begin{array}{l}
{\displaystyle \sum_{x} e^{itx}f(x)=\sum_{x} [\cos (tx)f(x)+i\sin (tx)f(x)]}\\
{\displaystyle \int_{-\infty}^{\infty} e^{itx}f(x)dx=\int_{-\infty}^{\infty}[\cos (tx)f(x)+i\sin (tx)f(x)]dx}
\end{array}\right .\end{align*}
The characteristic function \(\phi_{X}\) exists for all \(t\in\mathbb{R}\), and is also called the Fourier transform of \(f\). Some properties of the characteristic function are given below.
- We have \(\phi_{X}(0)=1\).
- We have \(|\phi_{X}(t)|\leq 1\).
- \(\phi_{X}\) is uniformly continuous.
- Given any constant \(d\), we have \(\phi_{X+d}(t)=e^{itd}\phi_{X}(t)\).
- Given any constant \(c\), we have \(\phi_{cX}(t)=\phi_{X}(ct)\).
- We have \(\phi_{cX+d}(t)=e^{itd}\phi_{X}(ct)\).
- If \(\mathbb{E}|X^{n}|<\infty\) for any integer \(n\), we have \[\left .\frac{d^{n}}{dt^{n}}\phi_{X}(t)\right|_{t=0} =i^{n}\mathbb{E}(X^{n}).\]
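A numerical sketch of the last property for an assumed \(P(2)\) (Poisson) random variable: a finite-difference derivative of \(\phi_{X}\) at \(0\) is compared with \(i\,\mathbb{E}(X)=2i\).

```python
# Sketch: d/dt phi_X(t) at t = 0 equals i*E(X), checked for an assumed Poisson(2).
import numpy as np
from scipy.stats import poisson

lam = 2.0
k = np.arange(0, 200)                               # truncated support (tail is negligible)
pmf = poisson.pmf(k, lam)
phi = lambda t: np.sum(np.exp(1j * t * k) * pmf)    # phi_X(t) = sum_k e^{itk} f(k)

h = 1e-5
dphi0 = (phi(h) - phi(-h)) / (2 * h)                # central difference at t = 0
print(dphi0, 1j * lam)                              # both approximately 2i
```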
Theorem. (Uniqueness Theorem) There is a one-to-one correspondence between the characteristic function and the p.d.f. of a random variable. \(\sharp\)
The moment-generating function (m.g.f.) \(M_{X}\) of a random variable \(X\), which is also called the Laplace transform of \(f\), is defined by
\[M_{X}(t)=\mathbb{E}(e^{tX})\mbox{ for }t\in\mathbb{R}\]
when this expectation exists. For \(t=0\), we have \(M_{X}(0)=1\). However, it may fail to exist for \(t\neq 0\). If \(M_{X}(t)\) exists, then we have \(\phi_{X}(t)=M_{X}(it)\).
Let \({\bf X}=(X_{1},\cdots ,X_{n})\) be a random vector. Then, the characteristic function of the random vector \({\bf X}\), or the joint characteristic function of the random variables \(X_{1},\cdots,X_{n}\), denoted by \(\phi_{{\bf X}}\) or \(\phi_{X_{1},\cdots,X_{n}}\), is defined by
\[\phi_{X_{1},\cdots ,X_{n}}(t_{1},\cdots ,t_{n})=\mathbb{E}[e^{it_{1}X_{1}+it_{2}X_{2}+\cdots +it_{n}X_{n}}]\]
for \(t_{j}\in\mathbb{R}\) and \(j=1,\cdots ,n\). The characteristic function \(\phi_{X_{1},\cdots ,X_{n}}\) always exists and satisfies the following properties.
- We have \(\phi_{X_{1},\cdots ,X_{n}}(0,\cdots ,0)=1\).
- We have \(|\phi_{X_{1},\cdots ,X_{n}}(t_{1},\cdots ,t_{n})|\leq 1\).
- \(\phi_{X_{1},\cdots ,X_{n}}\) is uniformly continuous.
- We have \[\phi_{X_{1}+d_{1},\cdots ,X_{n}+d_{n}}(t_{1},\cdots ,t_{n})=e^{it_{1}d_{1}+\cdots +it_{n}d_{n}}\phi_{X_{1},\cdots,X_{n}}(t_{1},\cdots , t_{n}).\]
- We have \[\phi_{c_{1}X_{1}+d_{1},\cdots ,c_{n}X_{n}+d_{n}}(t_{1},\cdots,t_{n})= e^{it_{1}d_{1}+\cdots +it_{n}d_{n}}\phi_{X_{1},\cdots,X_{n}} (c_{1}t_{1},\cdots ,c_{n}t_{n}).\]
- Suppose that the absolute \((n_{1},\cdots ,n_{k})\)-joint moment as well as all lower order joint moments of \(X_{1},\cdots ,X_{k}\) are finite. Then, we have \[\left.\frac{\partial^{n_{1}+\cdots +n_{k}}}{\partial t_{1}^{n_{1}}\cdots\partial t_{k}^{n_{k}}}\phi_{X_{1},\cdots ,X_{k}}(t_{1},\cdots ,t_{k})\right |_{t_{1}=\cdots =t_{k}=0}=i^{\sum_{j=1}^{k} n_{j}}\mathbb{E}(X_{1}^{n_{1}}\cdots X_{k}^{n_{k}}).\] In particular, we also have \[\left .\frac{\partial^{n}}{\partial t_{j}^{n}}\phi_{X_{1},\cdots ,X_{k}}(t_{1},\cdots ,t_{k})\right |_{t_{1}=\cdots =t_{k}=0}=i^{n}\mathbb{E}(X_{j}^{n})\] for \(j=1,\cdots ,k\).
- In \(\phi_{X_{1},\cdots ,X_{k}}(t_{1},\cdots ,t_{k})\), we set \(t_{j_{1}}=\cdots =t_{j_{n}}=0\). Then, the resulting expression is the joint characteristic function of the random variables \(X_{i_{1}},\cdots , X_{i_{m}}\), where the \(j\)’s and \(i\)’s are different and \(m+n=k\).
Theorem. (Uniqueness). There is a one-to-one correspondence between the characteristic function and the p.d.f. of a random vector. \(\sharp\)
The m.g.f. of the random vector \({\bf X}\) or the joint m.g.f. of the random variables \(X_{1},\cdots ,X_{n}\), denoted by \(M_{{\bf X}}\) or \(M_{X_{1},\cdots ,X_{n}}\), is defined by
\[M_{X_{1},\cdots ,X_{n}}(t_{1},\cdots ,t_{n})=\mathbb{E}(e^{t_{1}X_{1}+\cdots +t_{n}X_{n}})\]
for \(t_{j}\in\mathbb{R}\) and \(j=1,\cdots ,n\), when this expectation exists. If \(M_{X_{1},\cdots ,X_{n}}(t_{1},\cdots,t_{n})\) exists, then we have \(\phi_{X_{1},\cdots ,X_{n}}(t_{1},\cdots,t_{n})= M_{X_{1},\cdots ,X_{n}}(it_{1},\cdots ,it_{n})\).
(i) Let \(X\) have a binomial distribution \(B(n,p)\). Then, we have
\[\phi_{X}(t)=(pe^{it}+q)^{n}\]
and
\[M_{X}(t)=(pe^{t}+q)^{n}.\]
Therefore, we obtain
\[\left .\frac{d}{dt}\phi_{X}(t)\right |_{t=0}=\left .n(pe^{it}+q)^{n-1} ipe^{it}\right |_{t=0}=inp\]
such that \(\mathbb{E}(X)=np\).
(ii) Let \(X\) have a Poisson distribution \(P(\lambda )\). Then, we have
\[\phi_{X}(t)=e^{\lambda e^{it}-\lambda }\]
and
\[M_{X}(t)=e^{\lambda e^{t}-\lambda}.\]
Therefore, we obtain
\[\left .\frac{d}{dt}\phi_{X}(t)\right |_{t=0}=\left . e^{\lambda e^{it}-\lambda}\lambda e^{it}\right |_{t=0}=i\lambda\]
such that \(\mathbb{E}(X)=\lambda\).
(iii) Let \(X\) have a normal distribution \(N(\mu ,\sigma^{2})\). Then, we have
\[\phi_{X}(t)=e^{it\mu -(\sigma^{2}t^{2}/2)}\]
and
\[M_{X}(t)=e^{t\mu +(\sigma^{2}t^{2}/2)}.\]
In particular, if \(X\) is \(N(0,1)\), then we have
\[\phi_{X}(t)=e^{-t^{2}/2}\]
and
\[M_{X}(t)=e^{t^{2}/2}.\]
Therefore, we obtain
\[\left .\frac{d}{dt}\phi_{X}(t)\right |_{t=0}=\left .\exp\left (it\mu -\frac {\sigma^{2}t^{2}}{2}\right )(i\mu -\sigma^{2}t)\right |_{t=0}=i\mu\]
such that \(\mathbb{E}(X)=\mu\).
(iv) Let \(X\) have a Gamma distribution with parameters \(\alpha\) and \(\beta\). Then, we have
\[\phi_{X}(t)=(1-i\beta t)^{-\alpha}\]
and
\[M_{X}(t)=(1-\beta t)^{-\alpha}\]
for \(t<1/\beta\). Therefore, we obtain
\[\left .\frac{d}{dt}\phi_{X}(t)\right |_{t=0}=\left .\frac {i\alpha\beta}{(1-i\beta t)^{\alpha +1}}\right |_{t=0}=i\alpha\beta\]
such that \(\mathbb{E}(X)=\alpha\beta\). For \(\alpha =r/2\) and \(\beta =2\), we have the corresponding quantities for \(\chi_{r}^{2}\). For \(\alpha =1\) and \(\beta =1/\lambda\), we obtain the corresponding quantities for the Exponential distribution. Therefore, we have
\[\phi_{X}(t)=(1-2it)^{-r/2}\]
and
\[\phi_{X}(t)=\left (1-\frac{it}{\lambda }\right )^{-1}=\frac{\lambda}{\lambda -it},\]
respectively.
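As a hedged check of example (i) (parameters \(n=10\), \(p=0.3\) are assumed only for illustration), the characteristic function computed directly from the binomial p.m.f. agrees with the closed form \((pe^{it}+q)^{n}\).

```python
# Sketch: E[e^{itX}] from the B(n,p) p.m.f. versus the closed form (p e^{it} + q)^n.
import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
q = 1 - p
k = np.arange(0, n + 1)
pmf = binom.pmf(k, n, p)

for t in (0.0, 0.5, 1.0):
    lhs = np.sum(np.exp(1j * t * k) * pmf)      # characteristic function from the p.m.f.
    rhs = (p * np.exp(1j * t) + q) ** n         # closed form from example (i)
    print(np.isclose(lhs, rhs))
```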
Independence.
We say that the random variables \(X_{j}\) for \(j=1,\cdots ,n\) are independent when
\[\mathbb{P}(X_{j}\in B_{j},j=1,\cdots ,n)=\prod_{j=1}^{n}\mathbb{P}(X_{j}\in B_{j})\]
for any \(B_{j}\in {\cal B}_{j}\) and \(j=1,\cdots ,n\).
Theorem. For \(j=1,\cdots ,n\), let the random variables \(X_{j}\) be independent, and let \(g_{j}:(\mathbb{R},{\cal B})\rightarrow (\mathbb{R},{\cal B})\) be measurable functions such that \(g_{j}(X_{j})\) for \(j=1,\cdots ,n\) are random variables. Then, the random variables \(g_{j}(X_{j})\) for \(j=1,\cdots ,n\) are also independent. \(\sharp\)
Theorem. (Factorization Theorem) The random variables \(X_{j}\) for \(j=1,\cdots,n\) are independent if and only if any one of the following (equivalent) conditions hold:
(a) \({\displaystyle F_{X_{1},\cdots ,X_{n}}(x_{1},\cdots ,x_{n})= \prod_{j=1}^{n} F_{X_{j}}(x_{j})}\) for all \(x_{j}\in\mathbb{R}\) and \(j=1,\cdots ,n\).
(b) \({\displaystyle f_{X_{1},\cdots ,X_{n}}(x_{1},\cdots ,x_{n})= \prod_{j=1}^{n} f_{X_{j}}(x_{j})}\) for all \(x_{j}\in\mathbb{R}\) and \(j=1,\cdots ,n\).
(c) \({\displaystyle \phi_{X_{1},\cdots ,X_{n}}(t_{1},\cdots,t_{n})= \prod_{j=1}^{n} \phi_{X_{j}}(t_{j})}\) for all \(t_{j}\in\mathbb{R}\) and \(j=1,\cdots ,n\). \(\sharp\)
Theorem. Consider the random variables \(X_{j}\) for \(j=1,\cdots ,n\), and let \(g_{j}:(\mathbb{R},{\cal B})\rightarrow (\mathbb{R},{\cal B})\) be measurable functions such that \(g_{j}(X_{j})\) for \(j=1,\cdots ,n\) are random variables. Suppose that the random variables \(X_{j}\) for \(j=1,\cdots ,n\) are independent. Then, we have
\[\mathbb{E}\left [\prod_{j=1}^{n} g_{j}(X_{j})\right ]=\prod_{j=1}^{n}\mathbb{E}[g_{j}(X_{j})]\]
when the expectations exist. \(\sharp\)
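A Monte Carlo sketch (the choices \(X_{1}\sim N(0,1)\), \(X_{2}\sim\) Exp(1), \(g_{1}=\cos\), \(g_{2}=\sqrt{\cdot}\) are assumptions for illustration) of the product rule for independent random variables.

```python
# Sketch: E[g1(X1) g2(X2)] = E[g1(X1)] E[g2(X2)] for independent X1, X2.
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x1 = rng.standard_normal(n)                      # assumed N(0,1)
x2 = rng.exponential(1.0, n)                     # assumed Exp(1), independent of x1

g1, g2 = np.cos, np.sqrt
print(np.mean(g1(x1) * g2(x2)))                  # E[g1(X1) g2(X2)]
print(np.mean(g1(x1)) * np.mean(g2(x2)))         # E[g1(X1)] E[g2(X2)], approximately equal
```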
Suppose that \(X_{1}\) and \(X_{2}\) are independent. Then, we have
\[\mbox{Cov}(X_{1},X_{2})=\mathbb{E}(X_{1}X_{2})-\mathbb{E}(X_{1})\mathbb{E}(X_{2})=0,\]
which implies
\[\rho =\frac{\mbox{Cov}(X_{1},X_{2})}{\sigma (X_{1})\sigma (X_{2})}=0.\]
This says that \(X_{1}\) and \(X_{2}\) are uncorrelated. However, the converse need not be true; that is, uncorrelated random variables are, in general, not independent.
Theorem. Consider the random variables \(X_{j}\) for \(j=1,\cdots ,n\) with \(\sigma^{2}(X_{j})=\sigma_{j}^{2}>0\) for \(j=1,\cdots ,n\) and \(\rho(X_{i},X_{j})= \rho_{ij}\) for \(i\neq j\) and \(i,j=1,\cdots ,n\). Then, we have
\[\sigma^{2}\left (\sum_{j=1}^{n} c_{j}X_{j}\right )=\sum_{j=1}^{n} c_{j}^{2}\sigma_{j}^{2}+\sum_{i\neq j} c_{i}c_{j}\rho_{ij}\sigma_{i}\sigma_{j}.\]
In particular, when the random variables \(X_{j}\) for \(j=1,\cdots ,n\) are independent, or only (pairwise) uncorrelated, we have
\[\sigma^{2}\left (\sum_{j=1}^{n} c_{j}X_{j}\right )=\sum_{j=1}^{n}c_{j}^{2}\sigma_{j}^{2}.\]
Theorem. We have the following properties.
(i) Let \(X_{j}\) be independent \(B(n_{j},p)\) random variables for \(j=1,\cdots ,n\). Then \(X=\sum_{j=1}^{n}X_{j}\) is \(B(n,p)\), where \(n=\sum_{j=1}^{n} n_{j}\).
(ii) Let \(X_{j}\) be independent \(P(\lambda_{j})\) random variables for \(j=1,\cdots ,n\). Then \(X=\sum_{j=1}^{n} X_{j}\) is \(P(\lambda )\), where \(\lambda =\sum_{j=1}^{n} \lambda_{j}\).
(iii) Let \(X_{j}\) be independent \(N(\mu_{j},\sigma_{j}^{2})\) random variables for \(j=1,\cdots,n\). Then \(X=\sum_{j=1}^{n} c_{j}X_{j}\) is \(N(\mu ,\sigma^{2})\), where \(\mu =\sum_{j=1}^{n} c_{j}\mu_{j}\) and \(\sigma^{2}=\sum_{j=1}^{n} c_{j}^{2}\sigma_{j}^{2}\).
(iv) Let \(X_{j}\) be independent \(\chi_{r_{j}}^{2}\) random variables for \(j=1,\cdots ,n\). Then \(X=\sum_{j=1}^{n} X_{j}\) is \(\chi_{r}^{2}\), where \(r=\sum_{j=1}^{n}r_{j}\).
Proof. To prove part (i), we have
\begin{align*} \phi_{X}(t) & =\phi_{\sum_{j=1}^{n} X_{j}}(t)\\ & =\prod_{j=1}^{n} \phi_{X_{j}}(t)\\ & =\prod_{j=1}^{n} (pe^{it}+q)^{n_{j}}\\ & =(pe^{it}+q)^{n},\end{align*}
which is the characteristic function of a \(B(n,p)\) random variable.
To prove part (ii), we have
\begin{align*} \phi_{X}(t) & =\phi_{\sum_{j=1}^{n} X_{j}} (t)\\ & =\prod_{j=1}^{n} \phi_{X_{j}}(t)\\ & =\prod_{j=1}^{n} \exp (\lambda_{j}e^{it}-\lambda_{j})\\ & =\exp (\lambda e^{it}-\lambda ),\end{align*}
which is the characteristic function of a \(P(\lambda )\) random variable.
To prove part (iii), we have
\begin{align*} \phi_{X}(t) & =\phi_{\sum_{j=1}^{n} c_{j}X_{j}}(t)\\ & =\prod_{j=1}^{n}\phi_{X_{j}}(c_{j}t)\\ & =\prod_{j=1}^{n} \left [\exp\left (ic_{j}t\mu_{j}-\frac{\sigma_{j}^{2}c_{j}^{2}t^{2}}{2}\right )\right ]\\ & =\exp\left (it\mu -\frac{\sigma^{2}t^{2}}{2}\right ),\end{align*}
which is the characteristic function of a normal random variable.
To prove part (iv), we have
\begin{align*} \phi_{X}(t) & =\phi_{\sum_{j=1}^{n} X_{j}}(t)\\ & =\prod_{j=1}^{n} \phi_{X_{j}}(t)\\ & =\prod_{j=1}^{n} (1-2it)^{-r_{j}/2}\\ & =(1-2it)^{-r/2},\end{align*}
which is the characteristic function of a \(\chi_{r}^{2}\) random variable. \(\blacksquare\)
Theorem. Let \(X\) be \(N(\mu ,\sigma^{2})\). Then \((X-\mu )/\sigma\) is \(N(0,1)\), and
\[Y=\left (\frac{X-\mu}{\sigma}\right )^{2}\]
is \(\chi_{1}^{2}\).
Theorem. Let \(X_{j}\) be independent \(N(\mu ,\sigma^{2})\) for \(j=1,\cdots ,k\), and let \(\bar{X}=\frac{1}{k}\sum_{j=1}^{k} X_{j}\). Then \(\bar{X}\) is \(N(\mu ,\sigma^{2}/k)\), or equivalently, \([\sqrt{k}(\bar{X}-\mu )]/\sigma\) is \(N(0,1)\).
Theorem. Let \(X_{j}\) be independent \(N(\mu_{j},\sigma_{j}^{2})\) for \(j=1,\cdots ,n\). Then
\[X=\sum_{j=1}^{n} \left (\frac{X_{j}-\mu_{j}}{\sigma_{j}}\right )^{2}\]
is \(\chi_{n}^{2}\).
Proof. We first note that \((\frac{X_{j}-\mu_{j}}{\sigma_{j}})^{2}\) for \(j=1,\cdots,n\) are independent. We also see that \((\frac{X_{j}-\mu_{j}}{\sigma_{j}})^{2}\) is \(\chi_{1}^{2}\) for each \(j\), so by part (iv) of the theorem above, \(X\) is \(\chi_{n}^{2}\). \(\blacksquare\)
Theorem. Let \(X_{j}\) be independent \(N(\mu ,\sigma^{2})\) for \(j=1,\cdots ,n\), and let
\[S^{2}=\frac{1}{n}\sum_{j=1}^{n} (X_{j}-\bar{X})^{2}.\]
Then \(nS^{2}/\sigma^{2}\) is \(\chi_{n-1}^{2}\).
Proof. We have
\[\sum_{j=1}^{n} \left (\frac{X_{j}-\mu}{\sigma}\right )^{2}=\left [\frac{\sqrt{n}(\bar{X}-\mu )}{\sigma}\right ]^{2}+\frac{nS^{2}}{\sigma^{2}}.\]
Since
\[\sum_{j=1}^{n} \left (\frac{X_{j}-\mu}{\sigma}\right )^{2}\mbox{ is }\chi_{n}^{2}\]
and
\[\left [\frac{\sqrt{n}(\bar{X}-\mu )}{\sigma}\right ]^{2}\mbox{ is }\chi_{1}^{2},\]
and since \(\bar{X}\) and \(S^{2}\) are independent for normal samples, taking characteristic functions of both sides gives
\[(1-2it)^{-n/2}=(1-2it)^{-1/2}\phi_{nS^{2}/\sigma^{2}}(t).\]
Therefore, we obtain
\[\phi_{nS^{2}/\sigma^{2}}(t)=(1-2it)^{-(n-1)/2},\]
which is the characteristic function of a \(\chi_{n-1}^{2}\) random variable. This completes the proof. \(\blacksquare\)
We also have
\[\mathbb{E}\left (\frac{nS^{2}}{\sigma^{2}}\right )=n-1\mbox{ and }\sigma^{2}\left (\frac{nS^{2}}{\sigma^{2}}\right )=2(n-1).\]
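A simulation sketch (standard normal samples with \(n=10\) are assumed) of these two moments of \(nS^{2}/\sigma^{2}\), with \(S^{2}\) the \(1/n\) sample variance defined above.

```python
# Sketch: E(n S^2 / sigma^2) = n - 1 and Var(n S^2 / sigma^2) = 2(n - 1).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 200_000
x = rng.standard_normal((reps, n))        # assumed N(0,1), so sigma^2 = 1
stat = n * x.var(axis=1)                  # np.var uses the 1/n definition by default

print(stat.mean(), n - 1)                 # approximately n - 1 = 9
print(stat.var(), 2 * (n - 1))            # approximately 2(n - 1) = 18
```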
Definition. Three convergences are defined below.
(i) We say that the sequence \(\{X_{n}\}_{n=1}^{\infty}\) converges almost surely (a.s.), or with probability one, to \(X\) as \(n\rightarrow\infty\) when \(X_{n}(\omega ) \rightarrow X(\omega )\) as \(n\rightarrow\infty\) for all \(\omega\in \Omega\), except possibly for a subset \(N\) of \(\Omega\) satisfying \(\mathbb{P}(N)=0\). This type of convergence is also known as strong convergence.
(ii) We say that the sequence \(\{X_{n}\}_{n=1}^{\infty}\) converges in probability to \(X\) as \(n\rightarrow\infty\) when, for every \(\varepsilon >0\), we have
\(\mathbb{P}(|X_{n}-X|>\varepsilon)\rightarrow 0\) as \(n\rightarrow\infty\).
(iii) We say that the sequence \(\{X_{n}\}_{n=1}^{\infty}\) converges in distribution to \(X\) as \(n\rightarrow\infty\) when \(F_{n}(x)\rightarrow F(x)\) as \(n\rightarrow\infty\) for all \(x\in\mathbb{R}\) for which \(F\) is continuous. This type of convergence is also known as weak convergence. \(\sharp\)
Suppose that the \(F_{n}\) have p.d.f.’s \(f_{n}\). Then convergence in distribution of the sequence \(\{X_{n}\}_{n=1}^{\infty}\) to \(X\) does not necessarily imply the convergence of \(f_{n}(x)\) to a p.d.f. \(f\).
Theorem. We have the following properties.
(i) Convergence with probability one implies convergence in probability.
(ii) Convergence in probability implies convergence in distribution. \(\sharp\)
Theorem. (Central Limit Theorem) Let \(X_{1},\cdots ,X_{n}\) be i.i.d. random variables with finite mean \(\mu\) and variance \(\sigma^{2}\). Let
\[G_{n}(x)=\mathbb{P}\left(\frac{\sqrt{n}(\bar{X}-\mu )}{\sigma}\leq x\right)\]
and
\[\Phi (x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^{2}/2}dt.\]
Then \(G_{n}(x)\rightarrow\Phi (x)\) uniformly in \(x\in\mathbb{R}\). \(\sharp\)
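A simulation sketch of the theorem for an assumed skewed population (Exp(1), so \(\mu=\sigma=1\)): the empirical distribution function \(G_{n}\) of the standardized sample mean is compared with \(\Phi\) at a few points.

```python
# Sketch: G_n(x) approaches Phi(x) for i.i.d. Exp(1) samples (mu = sigma = 1 assumed).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, reps = 50, 100_000
xbar = rng.exponential(1.0, (reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - 1.0) / 1.0       # sqrt(n)(Xbar - mu)/sigma

for x in (-1.0, 0.0, 1.0):
    print(f"x={x:+.1f}  G_n(x)={np.mean(z <= x):.4f}  Phi(x)={norm.cdf(x):.4f}")
```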
Theorem. (Lindeberg-Feller Central Limit Theorem) Let \(\{X_{n}\}_{n=1}^{\infty}\) be a sequence of independent but not necessarily identically distributed random variables with \(\mbox{Var}(X_{n})=\sigma_{n}^{2}< \infty\) for \(n=1,2,\cdots\). Let \(\mathbb{E}[X_{n}]=\alpha_{n}\) and \(S_{n}=\sum_{j=1}^{n} X_{j}\). Let \(\mbox{Var}(S_{n})=B_{n}^{2}\), and let \(F_{n}\) be the distribution function of \(X_{n}\). Then the following two conditions:
\[\lim_{n\rightarrow\infty}\max_{1\leq k\leq n}\frac{\sigma_{k}^{2}}{B_{n}^{2}}=0\]
and
\[\lim_{n\rightarrow\infty}\mathbb{P}\left(\frac{S_{n}-\mathbb{E}[S_{n}]} {B_{n}}\leq x\right)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^{2}/2}du\mbox{ (the convergence being uniform in \(x\))}\]
hold if and only if, for every \(\epsilon >0\), the condition
\begin{equation}{\label{e6}}\tag{1}
\lim_{n\rightarrow\infty}\frac{1}{B_{n}^{2}}\sum_{k=1}^{n}\int_{|x-\alpha_{k}|\geq\epsilon B_{n}} (x-\alpha_{k})^{2}dF_{k}(x)=0
\end{equation}
is satisfied. Condition (\ref{e6}) is called the Lindeberg condition. \(\sharp\)
Theorem. (Strong Law of Large Numbers) Let \(X_{j}\) for \(j=1,\cdots ,n\) be i.i.d. with finite mean \(\mu\). Then
\[\bar{X}_{n}=\frac{X_{1}+\cdots +X_{n}}{n}\]
converges with probability one to \(\mu\). \(\sharp\)
Theorem. (Weak Law of Large Numbers) Let \(X_{j}\) for \(j=1,\cdots ,n\) be i.i.d. with finite mean \(\mu\). Then
\[\bar{X}_{n}=\frac{X_{1}+\cdots +X_{n}}{n}\]
converges in probability to \(\mu\). \(\sharp\)
The sample variance is
\[S_{n}^{2}=\frac{1}{n}\sum_{j=1}^{n} (X_{j}-\bar{X}_{n})^{2}=\frac{1}{n} \sum_{j=1}^{n} X_{j}^{2}-\bar{X}_{n}^{2}.\]
Theorem. Let \(X_{j}\) for \(j=1,\cdots ,n\) be i.i.d. random variables with \(\mathbb{E}(X_{j})= \mu\) and \(\mbox{Var}(X_{j})=\sigma^{2}\) for \(j=1,\cdots ,n\). Then \(S_{n}^{2}\) converges with probability one and in probability to \(\sigma^{2}\).
Proof. Since \(\mathbb{E}(X_{j}^{2})=\sigma^{2}+\mu^{2}\), the SLLN and the WLLN imply that
\[\frac{1}{n}\sum_{j=1}^{n} X_{j}^{2}\]
converges with probability one and in probability to \(\sigma^{2}+\mu^{2}\). On the other hand, since \(\bar{X}_{n}\) converges with probability one and in probability to \(\mu\), it follows that \(\bar{X}_{n}^{2}\) converges with probability one and in probability to \(\mu^{2}\). Therefore, we see that
\[\frac{1}{n}\sum_{j=1}^{n} X_{j}^{2}-\bar{X}_{n}^{2}\]
converges with probability one and in probability to \(\sigma^{2}+\mu^{2}- \mu^{2}=\sigma^{2}\). This completes the proof. \(\blacksquare\)
We see that the convergence in probability of \(S_{n}^{2}\) to \(\sigma^{2}\) implies that
\[\frac{n}{n-1}\frac{S_{n}^{2}}{\sigma^{2}}\]
converges in probability to \(1\) since \(n/(n-1)\rightarrow 1\) as \(n\rightarrow\infty\).
Theorem. Let \(X_{1},\cdots ,X_{n}\) be i.i.d. random variables with mean \(\mu\) and positive variance \(\sigma^{2}\). Then
\[\frac{\sqrt{n-1}(\bar{X}_{n}-\mu )}{S_{n}}\mbox{ and }\frac{\sqrt{n}(\bar{X}_{n}-\mu )}{S_{n}}\]
converge in distribution to \(N(0,1)\).
Proof. The CLT says
\[\frac{\sqrt{n}(\bar{X}_{n}-\mu )}{\sigma}\]
converges in distribution to \(N(0,1)\). We also see that
\[\sqrt{\frac{n}{n-1}}\frac{S_{n}}{\sigma}\]
converges in probability to \(1\) by the previous comment. This completes the proof. \(\blacksquare\)
- Let \(X\) be \(N(0,1)\) and \(Y\) be \(\chi_{r}^{2}\) such that they are also independent. Set \({\displaystyle T=\frac{X}{\sqrt{Y/r}}}\). The random variable \(T\) is said to have the \(t\) distribution with \(r\) degrees of freedom, and is often denoted by \(t_{r}\).
- Let \(X\) be \(\chi_{r_{1}}^{2}\) and \(Y\) be \(\chi_{r_{2}}^{2}\) such that they are independent. Set \({\displaystyle F=\frac{X/r_{1}}{Y/r_{2}}}\). The random variable \(F\) is said to have the \(F\) distribution with \(r_{1},r_{2}\) degrees of freedom, and is often denoted by \(F_{r_{1},r_{2}}\).
If \(F\) is distributed as \(F_{r_{1},r_{2}}\), then \(1/F\) is distributed as \(F_{r_{2},r_{1}}\). If \(T\) is distributed as \(t_{r}\), then \(T^{2}\) is distributed as \(F_{1,r}\), since \(X^{2}\) is \(\chi_{1}^{2}\).
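A hedged check of the last remark (the degrees of freedom \(r=7\) are assumed): if \(T\) is \(t_{r}\), then \(\mathbb{P}(T^{2}\leq c)=\mathbb{P}(|T|\leq\sqrt{c})\), so quantiles of \(T^{2}\) and of \(F_{1,r}\) coincide.

```python
# Sketch: the q-quantile of T^2 (T ~ t_r) equals the q-quantile of F_{1,r}.
import numpy as np
from scipy.stats import t, f

r = 7
for q in (0.5, 0.9, 0.99):
    # q-quantile of T^2 is the square of the (1+q)/2-quantile of T
    print(np.isclose(t.ppf((1 + q) / 2, r) ** 2, f.ppf(q, 1, r)))
```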


