Limit Theorems


This page consists of the following sections: (A) Modes of Convergence, (B) Central Limit Theorem, (C) Laws of Large Numbers, and (D) Some Other Limit Theorems.

\begin{equation}{\label{a}}\tag{A}\mbox{}\end{equation}

Modes of Convergence.

Let \(\{X_{n}\}_{n=1}^{\infty}\) be a sequence of random variables and \(X\) be a random variable defined on the probability space \((S,{\cal F},\mathbb{P})\).

  • We say that \(\{X_{n}\}_{n=1}^{\infty}\) converges almost surely, or with probability one, to \(X\) as \(n\rightarrow\infty\), and we write \(X_{n} \stackrel{a.s.}{\longrightarrow}X\), \(X_{n}\rightarrow X\) with probability one, or \(\mathbb{P}(X_{n}\rightarrow X)=1\), when \(X_{n}(s)\rightarrow X(s)\) for all \(s\in S\) except possibly for a subset \(N\) of \(S\) such that \(\mathbb{P}(N)=0\). In other words, for every \(\epsilon >0\) and for every \(s\in N^{c}\), there exists an integer \(N(\epsilon ,s)\) satisfying \(|X_{n}(s)-X(s)|<\epsilon\) for all \(n\geq N(\epsilon ,s)\). This type of convergence is also known as strong convergence.
  • We say that \(\{X_{n}\}_{n=1}^{\infty}\) converges in probability to \(X\) as \(n\rightarrow\infty\), and we write \(X_{n}\stackrel{\mathbb{P}}{\longrightarrow}X\) when, for every \(\epsilon >0\), we have \(\mathbb{P}(|X_{n}-X|>\epsilon )\rightarrow 0\). In other words, for every \(\epsilon >0\), there exists an integer \(N(\epsilon )>0\) satisfying \(\mathbb{P}(|X_{n}-X|>\epsilon )<\epsilon\) for all \(n\geq N(\epsilon )\).
  • Let \(F_{n}=F_{X_{n}}\) and \(F=F_{X}\). We say that \(\{X_{n}\}_{n=1}^{\infty}\) converges in distribution to \(X\) as \(n\rightarrow\infty\), and we write \(X_{n}\stackrel{d}{\longrightarrow}X\) when \(F_{n}(x)\rightarrow F(x)\) for all \(x\in \mathbb{R}\) for which \(F\) is continuous. In other words, for every \(\epsilon >0\) and every \(x\) for which \(F\) is continuous, there exists an integer \(N(\epsilon ,x)\) satisfying \(|F_{n}(x)-F(x)|<\epsilon\) for all \(n\geq N(\epsilon ,x)\). This type of convergence is also known as weak convergence.

Since

\[\mathbb{P}(|X_{n}-X|>\epsilon )+\mathbb{P}(|X_{n}-X|\leq\epsilon )=1,\]

we see that \(X_{n}\stackrel{\mathbb{P}}{\longrightarrow}X\) is equivalent to \(\mathbb{P}(|X_{n}-X|\leq\epsilon )\rightarrow 1\). Also, if \(\mathbb{P}(|X_{n}-X|>\epsilon )\rightarrow 0\) for every \(\epsilon >0\), then clearly \(\mathbb{P}(|X_{n}-X|\geq\epsilon )\rightarrow 0\). On the other hand, if each \(F_{n}\) has p.d.f. \(f_{n}\), then \(X_{n}\stackrel{d}{\longrightarrow}X\) does not necessarily imply the convergence of \(f_{n}\) to a p.d.f. \(f\).
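To make the definition of convergence in probability concrete, here is a minimal Monte Carlo sketch in Python (standard library only; the construction \(X_{n}=X+Z_{n}/n\) with \(Z_{n}\) standard normal is our own illustration, not from the text). Here \(|X_{n}-X|=|Z_{n}|/n\), so \(\mathbb{P}(|X_{n}-X|>\epsilon )=2(1-\Phi (n\epsilon ))\rightarrow 0\).

```python
import math
import random

# Illustration (our own construction): X_n = X + Z_n/n with Z_n ~ N(0,1),
# so |X_n - X| = |Z_n|/n and P(|X_n - X| > eps) = 2*(1 - Phi(n*eps)) -> 0.
random.seed(0)
eps, trials = 0.1, 100_000
for n in (1, 5, 25, 125):
    exceed = sum(abs(random.gauss(0.0, 1.0)) / n > eps for _ in range(trials))
    exact = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(n * eps / math.sqrt(2.0))))
    print(f"n={n:3d}  simulated={exceed / trials:.4f}  exact={exact:.4f}")
```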

Proposition. Let \(F\) and \(\{F_{n}\}_{n=1}^{\infty}\) be distribution functions such that \(F_{n}(x)\rightarrow F(x)\) for \(x\in \mathbb{R}\), and let \(F\) be continuous. Then, the convergence is uniform in \(x\in \mathbb{R}\). In other words, for every \(\epsilon >0\), there exists an integer \(N(\epsilon )>0\) such that \(n\geq N(\epsilon )\) implies \(|F_{n}(x)-F(x)|<\epsilon\) for every \(x\in \mathbb{R}\).

Theorem. We have that \(X_{n}\stackrel{a.s.}{\longrightarrow}X\) implies \(X_{n}\stackrel{\mathbb{P}}{\longrightarrow}X\), and that \(X_{n}\stackrel{\mathbb{P}} {\longrightarrow}X\) implies \(X_{n}\stackrel{d}{\longrightarrow}X\).

Theorem. Let \(\{F_{n}\}_{n=1}^{\infty}\) be a sequence of distribution functions, and let \(F\) be a distribution function. Let \(\phi_{n}\) be the characteristic function corresponding to \(F_{n}\), and let \(\phi\) be the characteristic function corresponding to \(F\).

(i) If \(F_{n}(x)\rightarrow F(x)\) for all continuity points \(x\) of \(F\), then \(\phi_{n}(t)\rightarrow\phi (t)\) for every \(t\in \mathbb{R}\).

(ii) If \(\phi_{n}(t)\) converges to a function \(g(t)\) which is continuous at \(t=0\), then \(g\) is a characteristic function, and if \(F\) is the corresponding distribution function, then \(F_{n}(x) \rightarrow F(x)\) for all continuity points \(x\) of \(F\). \(\sharp\)
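As a numerical illustration of part (i) (our own example, not from the text), take \(W_{n}=(\sum_{i=1}^{n}U_{i}-n/2)/\sqrt{n/12}\) with \(U_{i}\) i.i.d. \(U(0,1)\). Its characteristic function has a closed form and converges pointwise to \(e^{-t^{2}/2}\), the characteristic function of \(N(0,1)\):

```python
import cmath
import math

# Our own example: U(0,1) has characteristic function phi_U(u) = (e^{iu}-1)/(iu).
# For W_n = (U_1 + ... + U_n - n/2)/sd with sd = sqrt(n/12),
#   phi_{W_n}(t) = exp(-i*t*(n/2)/sd) * phi_U(t/sd)**n,
# which should approach exp(-t^2/2), the N(0,1) characteristic function.
t = 1.5
for n in (1, 5, 25, 125):
    sd = math.sqrt(n / 12)
    u = t / sd
    phi = cmath.exp(-1j * u * n / 2) * ((cmath.exp(1j * u) - 1) / (1j * u)) ** n
    print(f"n={n:3d}  phi_Wn(t)={phi.real:+.6f}{phi.imag:+.6f}j  "
          f"limit={math.exp(-t * t / 2):.6f}")
```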

Proposition. We have \(X_{n}\stackrel{a.s.}{\longrightarrow}X\) if and only if, for every \(\epsilon >0\),

\[\lim_{n\rightarrow\infty}\mathbb{P}\left (\sup_{k\geq n}|X_{k}-X|>\epsilon\right )=0.\]

Proposition. If \(\sum_{n=1}^{\infty}\mathbb{P}(|X_{n}-X|>\epsilon )<\infty\) for every \(\epsilon >0\), then \(X_{n}\stackrel{a.s.}{\longrightarrow}X\).

Proposition. We have \(X_{n}\stackrel{d}{\longrightarrow}X\) if and only if \(\mathbb{E}[f(X_{n})]\rightarrow \mathbb{E}[f(X)]\) for every bounded continuous function \(f\).
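For instance (our own illustration, not from the text), if \(X_{n}\) is uniform on \(\{1/n,2/n,\cdots ,n/n\}\), then \(X_{n}\stackrel{d}{\longrightarrow}X\) with \(X\) uniform on \((0,1)\), and for the bounded continuous function \(f(x)=\cos x\) we have \(\mathbb{E}[f(X)]=\sin 1\). A quick check in Python:

```python
import math

# Our own illustration: X_n uniform on {1/n, ..., n/n} converges in
# distribution to X ~ U(0,1); for f(x) = cos(x), E[f(X)] = sin(1).
for n in (10, 100, 1000, 10_000):
    efxn = sum(math.cos(k / n) for k in range(1, n + 1)) / n
    print(f"n={n:5d}  E[f(X_n)]={efxn:.6f}  E[f(X)]={math.sin(1):.6f}")
```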

Proposition. Let \(g:\mathbb{R}\rightarrow \mathbb{R}\) be a continuous function. We have the following properties.

(i) If \(X_{n}\stackrel{a.s.}{\longrightarrow}X\), then \(g(X_{n})\stackrel{a.s.}{\longrightarrow}g(X)\).

(ii) If \(X_{n}\stackrel{\mathbb{P}}{\longrightarrow}X\), then \(g(X_{n})\stackrel{\mathbb{P}}{\longrightarrow}g(X)\).

(iii) If \(X_{n}\stackrel{d}{\longrightarrow}X\), then \(g(X_{n})\stackrel{d}{\longrightarrow}g(X)\).

\begin{equation}{\label{b}}\tag{B}\mbox{}\end{equation}

Central Limit Theorem.

We have shown that the distribution of \(\bar{X}\) is \(N(\mu ,\sigma^{2}/n)\) when sampling from the normal distribution \(N(\mu ,\sigma^{2})\) by referring to the page Distributions of Functions of Random Variables. Let

\[W=\frac{\sqrt{n}(\bar{X}-\mu )}{\sigma}=\frac{\bar{X}-\mu}{\sigma /\sqrt{n}}.\]

Then \(W\) is \(N(0,1)\). In general, let \(W\) be defined by the same formula, where \(\bar{X}\) is the mean of a random sample of size \(n\) from some distribution with mean \(\mu\) and variance \(\sigma^{2}\). Then, for each positive integer \(n\), we have

\begin{align*} \mathbb{E}(W) & =\mathbb{E}\left [\frac{\bar{X}-\mu}{\sigma /\sqrt{n}}\right ]\\ & =\frac{\mathbb{E}(\bar{X}-\mu )}{\sigma /\sqrt{n}}\\ & =\frac{\mu -\mu}{\sigma /\sqrt{n}}=0\end{align*}

and

\begin{align*} \mbox{Var}(W) & =\mathbb{E}(W^{2})\\ & =\mathbb{E}\left [\frac{(\bar{X}-\mu )^{2}}{\sigma^{2}/n}\right ]\\ & =\frac{\mathbb{E}[(\bar{X}-\mu )^{2}]}{\sigma^{2}/n}\\ & =\frac{\sigma^{2}/n}{\sigma^{2}/n}=1.\end{align*}

However, \(W\) is not necessarily normally distributed.
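For example (a simulation sketch of our own, assuming \(\mathrm{Exp}(1)\) data so that \(\mu =\sigma =1\)): for small \(n\) the standardized mean \(W\) still has mean \(0\) and variance \(1\), but it is visibly skewed.

```python
import math
import random

# Our own sketch: sample from Exp(1), so mu = sigma = 1.  For each n, simulate
# W = sqrt(n)*(Xbar - mu)/sigma and report its mean, variance, and skewness.
# The mean is ~0 and the variance is ~1 for every n, but the skewness
# (theoretically 2/sqrt(n) here) only dies out as n grows.
random.seed(1)
trials = 50_000
for n in (2, 10, 50):
    ws = []
    for _ in range(trials):
        xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
        ws.append(math.sqrt(n) * (xbar - 1.0))
    m = sum(ws) / trials
    v = sum((w - m) ** 2 for w in ws) / trials
    skew = sum((w - m) ** 3 for w in ws) / trials / v ** 1.5
    print(f"n={n:3d}  mean={m:+.3f}  var={v:.3f}  skewness={skew:+.3f}")
```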

Theorem. (Central Limit Theorem). Let \(X_{1},\cdots ,X_{n}\) be i.i.d. random variables with finite mean \(\mu\) and finite variance \(\sigma^{2}>0\), and let

\[F_{n}(x)=\mathbb{P}\left [\frac{\sqrt{n}(\bar{X}-\mu )}{\sigma}\leq x\right ]\]

and

\[\Phi (x)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}e^{-t^{2}/2}dt.\]

Then \(F_{n}(x)\rightarrow\Phi (x)\) uniformly in \(x\in \mathbb{R}\).

Theorem. (Central Limit Theorem). If \(\bar{X}\) is the mean of a random sample \(X_{1},X_{2},\cdots ,X_{n}\) of size \(n\) from a distribution with a finite mean \(\mu\) and a finite
positive variance \(\sigma^{2}\), then the distribution of

\begin{align*} W & =\frac{\sqrt{n}(\bar{X}-\mu )}{\sigma}\\ & =\frac{\sum_{i=1}^{n}X_{i}-n\mu}{\sqrt{n}\sigma}\end{align*}

is \(N(0,1)\) in the limit as \(n\rightarrow\infty\).

We can also say that \(\bar{X}\) is approximately \(N(\mu ,\sigma^{2}/n)\), and \(Y=\sum_{i=1}^{n} X_{i}\) is approximately \(N(n\mu ,n\sigma^{2})\), when \(n\) is sufficiently large. Generally, if \(n\) is greater than \(30\), these approximations will be good. However, if the underlying distribution is symmetric, unimodal, and of the continuous type, a value of \(n\) as small as 4 or 5 can yield a very adequate approximation. Moreover, if the original distribution is approximately normal, \(\bar{X}\) has a distribution very close to normal even when \(n\) equals 2 or 3. In fact, we know that if the sample is taken from \(N(\mu ,\sigma^{2})\), the sample mean \(\bar{X}\) is exactly \(N(\mu ,\sigma^{2}/n)\) for every \(n=1,2,3,\cdots\).
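One way to watch the approximation improve (our own simulation sketch, again assuming \(\mathrm{Exp}(1)\) data with \(\mu =\sigma =1\)) is to estimate the Kolmogorov distance \(\sup_{x}|F_{n}(x)-\Phi (x)|\) from the empirical distribution of simulated values of \(W\):

```python
import math
import random

# Our own sketch: estimate sup_x |F_n(x) - Phi(x)| from the empirical CDF of
# simulated W = sqrt(n)*(Xbar - mu)/sigma, with Exp(1) data (mu = sigma = 1).
random.seed(2)
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
trials = 20_000
for n in (5, 30, 100):
    ws = sorted(
        math.sqrt(n) * (sum(random.expovariate(1.0) for _ in range(n)) / n - 1.0)
        for _ in range(trials)
    )
    dist = max(
        max(abs(i / trials - Phi(w)), abs((i + 1) / trials - Phi(w)))
        for i, w in enumerate(ws)
    )
    print(f"n={n:3d}  estimated sup|F_n - Phi| = {dist:.4f}")
```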

Example. Let \(\bar{X}\) denote the mean of a random sample of size \(15\) from the distribution whose p.d.f. is \(f(x)=(3/2)x^{2}\) for \(-1<x<1\). Then, we have \(\mu =0\) and \(\sigma^{2}=3/5\). Therefore, we obtain

\begin{align*} & \mathbb{P}(0.03\leq\bar{X}\leq 0.15)\\ & \quad =\mathbb{P}\left (\frac{\sqrt{15}(0.03-0)}{\sqrt{3/5}}
\leq\frac{\sqrt{15}(\bar{X}-0)}{\sqrt{3/5}}\leq\frac{\sqrt{15}(0.15-0)}{\sqrt{3/5}}\right )\\
& \quad =\mathbb{P}(0.15\leq W\leq 0.75)\\
& \quad\approx\Phi (0.75)-\Phi (0.15)=0.2138.\end{align*}
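The arithmetic above can be checked directly (a Python sketch using the error-function form of \(\Phi\)):

```python
import math

# Checking the example: mu = 0, sigma^2 = 3/5, n = 15, so the standardized
# endpoints are sqrt(15)*0.03/sqrt(3/5) = 0.15 and sqrt(15)*0.15/sqrt(3/5) = 0.75.
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
n, sigma = 15, math.sqrt(3 / 5)
lo = math.sqrt(n) * 0.03 / sigma
hi = math.sqrt(n) * 0.15 / sigma
print(lo, hi, Phi(hi) - Phi(lo))   # 0.15, 0.75, ~0.2138
```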

Example. Let \(X_{1},X_{2},\cdots ,X_{20}\) be a random sample of size \(20\) from the uniform distribution \(U(0,1)\). Then, we have \(\mathbb{E}(X_{i})=1/2\) and \(\mbox{Var}(X_{i})=1/12\) for \(i=1,\cdots ,20\). If \(Y=X_{1}+X_{2}+\cdots +X_{20}\), then we have

\begin{align*} & \mathbb{P}(Y\leq 9.1)\\ & \quad =\mathbb{P}\left (\frac{Y-20(1/2)}{\sqrt{20/12}}\leq\frac{9.1-10}{\sqrt{20/12}}\right )\\
& \quad =\mathbb{P}(W\leq -0.697)\\ & \quad\approx\Phi (-0.697)=0.2423.\end{align*}

We also have

\begin{align*} & \mathbb{P}(8.5\leq Y\leq 11.7)\\ & \quad =\mathbb{P}\left (\frac{8.5-10}{\sqrt{5/3}}\leq\frac{Y-10}{\sqrt{5/3}}\leq\frac{11.7-10}{\sqrt{5/3}}\right )\\
& \quad =\mathbb{P}(-1.162\leq W\leq 1.317 )\\ & \quad \approx\Phi (1.317)-\Phi (-1.162)=0.7835.\end{align*}
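Both probabilities can be reproduced numerically (a sketch; the slight differences from the values above come from rounding in normal tables):

```python
import math

# Checking the example: Y = X_1 + ... + X_20 with X_i ~ U(0,1), so
# E(Y) = 10 and Var(Y) = 20/12 = 5/3.
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
sd = math.sqrt(20 / 12)
print(Phi((9.1 - 10) / sd))                          # ~0.2429
print(Phi((11.7 - 10) / sd) - Phi((8.5 - 10) / sd))  # ~0.7833
```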

Recall that a binomial random variable can be described as the sum of Bernoulli random variables. That is, let \(X_{1},X_{2},\cdots ,X_{n}\) be a random sample from a Bernoulli distribution with a mean \(\mu =p\) and a variance \(\sigma^{2}=p(1-p)\) for \(0<p<1\). Then \(Y=\sum_{i=1}^{n} X_{i}\) is \(B(n,p)\). The central limit theorem states that the distribution of

\[W=\frac{Y-np}{\sqrt{np(1-p)}}=\frac{\bar{X}-p}{\sqrt{p(1-p)/n}}\]

is \(N(0,1)\) in the limit \(n\rightarrow\infty\). Thus, if \(n\) is sufficiently large, the distribution of \(Y\) is approximately \(N(np,np(1-p))\), and probabilities for the binomial distribution \(B(n,p)\) can be approximated using this normal distribution. A rule often stated is that \(n\) is “sufficiently large” when \(np\geq 5\) and \(n(1-p)\geq 5\). This can be used as a rough guide, although as \(p\) deviates more and more from 0.5, larger and larger sample sizes are needed. Note that we are approximating probabilities for a discrete distribution by probabilities for a continuous distribution, so we need the so-called continuity correction. To find probabilities of the form \(\mathbb{P}(a\leq Y\leq b)\), we use

\[\mathbb{P}(a\leq Y\leq b)\approx \mathbb{P}(a-0.5\leq Y\leq b+0.5),\]

where \(Y\) is approximately \(N(np,np(1-p))\). Therefore, we have

\[\mathbb{P}(a\leq Y\leq b)\approx\Phi\left (\frac{b+0.5-np}{\sqrt{np(1-p)}}\right )-\Phi\left (\frac{a-0.5-np}{\sqrt{np(1-p)}}\right ).\]
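A short Python sketch comparing this approximation with the exact binomial probability (the function names are our own); it reproduces the first example below, \(\mathbb{P}(3\leq Y\leq 5)\) for \(Y\sim B(10,1/2)\):

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_approx(a, b, n, p):
    """Continuity-corrected normal approximation to P(a <= Y <= b), Y ~ B(n, p)."""
    mu, sd = n * p, math.sqrt(n * p * (1 - p))
    return Phi((b + 0.5 - mu) / sd) - Phi((a - 0.5 - mu) / sd)

def binom_exact(a, b, n, p):
    """Exact P(a <= Y <= b) for Y ~ B(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, b + 1))

print(normal_approx(3, 5, 10, 0.5), binom_exact(3, 5, 10, 0.5))  # ~0.5672 vs ~0.5684
```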

Example. Let \(Y\) have the binomial distribution \(B(10,1/2)\). Then, we have

\begin{align*} \mathbb{P}(3\leq Y<6) & =\mathbb{P}(3\leq Y\leq 5)\\ & \approx \mathbb{P}(2.5\leq Y\leq 5.5).\end{align*}

Therefore, we obtain

\begin{align*} & \mathbb{P}\left (\frac{2.5-5}{\sqrt{10/4}}\leq\frac{Y-5}{\sqrt{10/4}}\leq
\frac{5.5-5}{\sqrt{10/4}}\right )\\ & \quad\approx\Phi (0.316)-\Phi (-1.581)=0.5670.\end{align*}

Example. Let \(Y\) be \(B(36,1/2)\). Since \(36\cdot (1/2)=18\) and \(36\cdot(1/2)(1/2)=9\), we have

\begin{align*} & \mathbb{P}(12<Y\leq 18)\\ & \quad =\mathbb{P}(13\leq Y\leq 18)\\ & \quad\approx \mathbb{P}(12.5\leq Y\leq 18.5)\\
& \quad=\mathbb{P}\left (\frac{12.5-18}{\sqrt{9}}\leq\frac{Y-18}{\sqrt{9}}\leq
\frac{18.5-18}{\sqrt{9}}\right )\\ & \quad\approx\Phi (0.167)-\Phi (-1.833)=0.5329.\end{align*}

A random variable having a Poisson distribution with mean \(\lambda\), where \(\lambda\) is assumed to be an integer, can be thought of as the sum \(Y\) of the observations of a random sample of size \(\lambda\) from a Poisson distribution with mean \(1\). Therefore,

\[W=\frac{Y-\lambda}{\sqrt{\lambda}}\]

has a distribution that is approximately \(N(0,1)\), and the distribution of \(Y\) is approximately \(N(\lambda ,\lambda )\). In general, if \(Y\) has a Poisson distribution with mean \(\lambda\), then the distribution of

\[W=\frac{Y-\lambda}{\sqrt{\lambda}}\]

is approximately \(N(0,1)\) when \(\lambda\) is sufficiently large. \(\sharp\)

Example. Let \(Y\) be a random variable having a Poisson distribution with mean 20. Then, we have

\begin{align*} & \mathbb{P}(16<Y\leq 21)\\ & \quad =\mathbb{P}(17\leq Y\leq 21)\\ & \quad\approx \mathbb{P}(16.5\leq Y\leq 21.5)\\
& \quad=\mathbb{P}\left (\frac{16.5-20}{\sqrt{20}}\leq\frac{Y-20}{\sqrt{20}}\leq
\frac{21.5-20}{\sqrt{20}}\right )\\ & \quad\approx\Phi (0.335)-\Phi (-0.783)=0.4142.\end{align*}
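A quick numerical check of this example (a sketch; the exact Poisson probability is included for comparison):

```python
import math

# Checking the example: Y ~ Poisson(20); exact P(17 <= Y <= 21) versus the
# continuity-corrected N(20, 20) approximation.
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
lam, a, b = 20.0, 17, 21
exact = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(a, b + 1))
approx = Phi((b + 0.5 - lam) / math.sqrt(lam)) - Phi((a - 0.5 - lam) / math.sqrt(lam))
print(exact, approx)   # ~0.4226 vs ~0.4144
```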

\begin{equation}{\label{c}}\tag{C}\mbox{}\end{equation}

Laws of Large Numbers.

We distinguish two categories of laws of large numbers: the strong laws of large numbers, in which the convergence involved is convergence with probability one, and the weak laws of large numbers, in which the convergence involved is convergence in probability.

Theorem. (Strong Law of Large Numbers). Let \(X_{1},\cdots ,X_{n}\) be i.i.d. random variables with finite mean \(\mu\). Then, we have

\[\bar{X}_{n}=\frac{X_{1}+\cdots +X_{n}}{n}\stackrel{a.s.}{\longrightarrow}\mu .\]

The converse is also true; i.e., if \(\bar{X}_{n}\) converges almost surely to some finite constant \(\mu\), then \(\mathbb{E}[X_{i}]\) is finite and equals \(\mu\).

Theorem. (Weak Law of Large Numbers). Let \(X_{1},\cdots ,X_{n}\) be i.i.d. random variables with finite mean \(\mu\). Then, we have

\[\bar{X}_{n}=\frac{X_{1}+\cdots +X_{n}}{n}\stackrel{\mathbb{P}}{\longrightarrow}\mu .\]
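A one-path illustration (our own sketch, with \(U(0,1)\) data so \(\mu =1/2\)):

```python
import random

# Our own sketch: the running sample mean of U(0,1) draws settles near
# mu = 1/2 as n grows, illustrating the law of large numbers on one path.
random.seed(3)
s = 0.0
for n in range(1, 100_001):
    s += random.random()
    if n in (10, 100, 1000, 10_000, 100_000):
        print(f"n={n:6d}  Xbar_n={s / n:.5f}")
```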

\begin{equation}{\label{d}}\tag{D}\mbox{}\end{equation}

Some Other Limit Theorems.

Theorem. Suppose that \(X_{n}\stackrel{\mathbb{P}}{\longrightarrow}X\) and \(Y_{n}\stackrel{\mathbb{P}}{\longrightarrow}Y\). Then, we have the following properties.

(i) Given any constants \(a\) and \(b\), we have \(aX_{n}+bY_{n}\stackrel{\mathbb{P}}{\longrightarrow}aX+bY\).

(ii) We have \(X_{n}Y_{n}\stackrel{\mathbb{P}}{\longrightarrow}XY\).

(iii) If \(\mathbb{P}(Y_{n}\neq 0)=\mathbb{P}(Y\neq 0)=1\), then \(X_{n}/Y_{n}\stackrel{\mathbb{P}}{\longrightarrow}X/Y\).

Theorem. If \(X_{n}\stackrel{d}{\longrightarrow}X\) and \(Y_{n}\stackrel{d}{\longrightarrow}c\), where \(c\) is a constant, then we have the following properties.

(i) We have \(X_{n}+Y_{n}\stackrel{d}{\longrightarrow}X+c\).

(ii) We have \(X_{n}Y_{n}\stackrel{d}{\longrightarrow}cX\).

(iii) If \(c\neq 0\) and \(\mathbb{P}(Y_{n}\neq 0)=1\), then \(X_{n}/Y_{n}\stackrel{d}{\longrightarrow}X/c\). \(\sharp\)

Let \(X_{1},\cdots ,X_{n}\) be i.i.d. random variables. We have the sample variance
\begin{align*} S_{n}^{2} & =\frac{1}{n}\sum_{i=1}^{n}(X_{i}-\bar{X}_{n})^{2}\\ & =\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}_{n}^{2}.\end{align*}

Theorem. Let \(X_{1},\cdots ,X_{n}\) be i.i.d. random variables with \(\mathbb{E}[X_{i}]=\mu\) and \(\mbox{Var}[X_{i}]=\sigma^{2}\) for \(i=1,\cdots ,n\). Then \(S_{n}^{2}\stackrel{a.s.}{\longrightarrow}\sigma^{2}\) and also in probability.

Proof. Since the \(X_{i}\)’s are i.i.d. with

\[\mathbb{E}[X_{i}^{2}]=\mbox{Var}[X_{i}]+(\mathbb{E}[X_{i}])^{2}=\sigma^{2}+\mu^{2},\]

the random variables \(X_{1}^{2},\cdots ,X_{n}^{2}\) are also i.i.d. Therefore, the strong and weak laws of large numbers give

\[\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}\stackrel{a.s.}{\longrightarrow}\sigma^{2}+\mu^{2}\]

and also in probability. On the other hand, we have \(\bar{X}_{n}\stackrel{a.s.} {\longrightarrow}\mu\) and also in probability, which implies \(\bar{X}_{n}^{2} \stackrel{a.s.}{\longrightarrow}\mu^{2}\) and also in probability. Therefore, we have

\[\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\bar{X}_{n}^{2}\stackrel{a.s.}{\longrightarrow}\sigma^{2}+\mu^{2}-\mu^{2}=\sigma^{2}\]

and also in probability. This completes the proof. \(\blacksquare\)
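A single-path illustration of the theorem (our own sketch, with \(U(0,1)\) data, for which \(\sigma^{2}=1/12\approx 0.0833\)):

```python
import random

# Our own sketch: S_n^2 = (1/n) sum X_i^2 - Xbar_n^2 for U(0,1) data
# converges to sigma^2 = 1/12 ~ 0.08333 along a single sample path.
random.seed(4)
s1 = s2 = 0.0
for n in range(1, 100_001):
    x = random.random()
    s1, s2 = s1 + x, s2 + x * x
    if n in (100, 10_000, 100_000):
        print(f"n={n:6d}  S_n^2={s2 / n - (s1 / n) ** 2:.5f}")
```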

Theorem. Let \(X_{1},\cdots ,X_{n}\) be i.i.d. random variables with mean \(\mu\) and variance \(\sigma^{2}\). Then, we have

\[\frac{\sqrt{n-1}\cdot (\bar{X}_{n}-\mu )}{S_{n}}\stackrel{d}{\longrightarrow}N(0,1)\]

and also

\[\frac{\sqrt{n}\cdot (\bar{X}_{n}-\mu )}{S_{n}}\stackrel{d}{\longrightarrow}N(0,1).\sharp\]
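A Monte Carlo sketch of the second statement (our own construction, with \(\mathrm{Exp}(1)\) data so \(\mu =1\)): the probability \(\mathbb{P}(W\leq 1.645)\) for the studentized mean should approach \(\Phi (1.645)\approx 0.95\).

```python
import math
import random

# Our own sketch: for Exp(1) data (mu = 1), the studentized mean
# W = sqrt(n)*(Xbar - mu)/S_n should be approximately N(0,1); we check
# P(W <= 1.645) against Phi(1.645) ~ 0.95 by Monte Carlo.
random.seed(5)
n, trials, hits = 200, 20_000, 0
for _ in range(trials):
    xs = [random.expovariate(1.0) for _ in range(n)]
    xbar = sum(xs) / n
    s = math.sqrt(sum(x * x for x in xs) / n - xbar * xbar)  # S_n as defined above
    hits += math.sqrt(n) * (xbar - 1.0) / s <= 1.645
print(hits / trials)   # should be close to 0.95
```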

We have seen that \(X_{n}\stackrel{\mathbb{P}}{\longrightarrow}X\) does not necessarily imply \(X_{n}\stackrel{a.s.}{\longrightarrow}X\). However, we have the following result.

Theorem. If \(X_{n}\stackrel{\mathbb{P}}{\longrightarrow}X\), then there is a subsequence \(\{X_{n_{k}}\}_{k=1}^{\infty}\) satisfying \(X_{n_{k}}\stackrel{a.s.}{\longrightarrow}X\).

 
