Nonparametric Statistics


The material is organized into the following sections.

\begin{equation}{\label{a}}\tag{A}\mbox{}\end{equation}

Order Statistics.

The order statistics are the observations of the random sample arranged, or ordered, in magnitude from the smallest to the largest.

Example. The values \(x_{1}=0.62\), \(x_{2}=0.98\), \(x_{3}=0.31\), \(x_{4}=0.81\), and \(x_{5}=0.53\) are the \(n=5\) observed values of five independent trials of an experiment with p.d.f. \(f(x)=2x\) for \(0<x<1\). The observed order statistics are

\[y_{1}=0.31<y_{2}=0.53<y_{3}=0.62<y_{4}=0.81<y_{5}=0.98.\]

The difference between the largest and smallest observations, \(y_{5}-y_{1}=0.98-0.31=0.67\), is called the sample range. \(\sharp\)

Let \(X_{1},X_{2},\cdots ,X_{n}\) be observations of a random sample of size \(n\) from a continuous-type distribution, and let the random variables \(Y_{1}<Y_{2}<\cdots <Y_{n}\) denote the order statistics of that sample. That is,

\begin{align*}
&\mbox{$Y_{1}=$ smallest of \(X_{1},X_{2},\cdots ,X_{n}\),}\\
&\mbox{$Y_{2}=$ second smallest of \(X_{1},X_{2},\cdots ,X_{n}\),}\\
&\vdots\\
&\mbox{$Y_{n}=$ largest of \(X_{1},X_{2},\cdots ,X_{n}\).}
\end{align*}

Example. Let \(Y_{1}<Y_{2}<Y_{3}<Y_{4}<Y_{5}\) be the order statistics of a random sample \(X_{1},X_{2},X_{3},X_{4},X_{5}\) of size \(n=5\) from the distribution with p.d.f. \(f(x)=2x\) for \(0<x<1\). Consider \(\mathbb{P}(Y_{4}<1/2)\). For the event \(\{Y_{4}<1/2\}\) to occur, at least four of the random variables \(X_{1},\cdots ,X_{5}\) must be less than \(1/2\), since \(Y_{4}\) is the fourth smallest among the five observations. Therefore, if the event \(\{X_{i}<1/2\}\) for \(i=1,2,3,4,5\) is called “success”, we must have at least four successes in the five mutually independent trials, each of which has probability of success

\[\mathbb{P}\left (X_{i}\leq\frac{1}{2}\right )=\int_{0}^{1/2} 2xdx=\frac{1}{4}.\]

Therefore, we have

\[\mathbb{P}\left (Y_{4}\leq\frac{1}{2}\right )=C^{5}_{4}\left (\frac{1}{4}\right )^{4}\left (\frac{3}{4}\right )+\left (\frac{1}{4}\right )^{5}=0.0156.\]

In general, if \(0<y<1\), then the distribution function of \(Y_{4}\) is

\[G(y)=\mathbb{P}(Y_{4}\leq y)=C^{5}_{4}(y^{2})^{4}(1-y^{2})+(y^{2})^{5},\]

since this represents the probability of at least four “successes” in five independent trials, each of which has probability of success

\[\mathbb{P}(X_{i}<y)=\int_{0}^{y} 2xdx=y^{2}.\]

For \(0<y<1\), the p.d.f. of \(Y_{4}\) is

\begin{align*}
g(y) & =G'(y)=C^{5}_{4}4(y^{2})^{3}(2y)(1-y^{2})+C^{5}_{4}(y^{2})^{4}(-2y)+5(y^{2})^{4}(2y)\\
& =\frac{5!}{3!1!}(y^{2})^{3}(1-y^{2})(2y).
\end{align*}

Note that in this example, the distribution function of each \(X\) is \(F(x)=x^{2}\) when \(0<x<1\). Thus, we have

\[g(y)=\frac{5!}{3!1!}[F(y)]^{3}[1-F(y)]f(y)\]

for \(0<y<1\). \(\sharp\)
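As a numerical check of this calculation, one can simulate the experiment. The following is a minimal sketch, assuming Python with NumPy is available; since \(F(x)=x^{2}\), observations of \(X\) may be generated as \(\sqrt{U}\) with \(U\) uniform on \((0,1)\). All names are illustrative.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n_rep = 200_000

# Sample from f(x) = 2x on (0, 1) via the inverse c.d.f.:
# F(x) = x^2, so X = sqrt(U) with U uniform on (0, 1).
x = np.sqrt(rng.uniform(size=(n_rep, 5)))

# Y4 is the fourth smallest of the five observations.
y4 = np.sort(x, axis=1)[:, 3]

print(np.mean(y4 <= 0.5))  # close to the exact value 1/64 = 0.015625
\end{verbatim}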

The preceding example motivates the following generalization. Let \(Y_{1}<Y_{2}<\cdots <Y_{n}\) be the order statistics of a random sample of size \(n\) from a distribution of the continuous type with distribution function \(F(x)\) and p.d.f. \(F'(x)=f(x)\), where \(0<F(x)<1\) for \(a<x<b\) and \(F(a)=0\), \(F(b)=1\). (It is possible that \(a=-\infty\) and/or \(b=+\infty\).) The event that the \(r\)th order statistic \(Y_{r}\) is at most \(y\), \(\{Y_{r}\leq y\}\), can occur if and only if at least \(r\) of the \(n\) observations are less than or equal to \(y\). That is, the probability of “success” on each trial is \(F(y)\), and we must have at least \(r\) successes. Therefore, we have

\[G_{r}(y)=\mathbb{P}(Y_{r}\leq y)=\sum_{k=r}^{n} C^{n}_{k}[F(y)]^{k}[1-F(y)]^{n-k}.\]

That is, rewriting this slightly, we have

\[G_{r}(y)=\sum_{k=r}^{n-1} C^{n}_{k}[F(y)]^{k}[1-F(y)]^{n-k}+[F(y)]^{n}.\]

Therefore, the p.d.f. of \(Y_{r}\) is given by

\begin{align*}
g_{r}(y) & =G'_{r}(y)=\sum_{k=r}^{n-1}C^{n}_{k}k[F(y)]^{k-1}f(y)[1-F(y)]^{n-k}\\
& \quad +\sum_{k=r}^{n-1}C^{n}_{k}[F(y)]^{k}(n-k)[1-F(y)]^{n-k-1}[-f(y)]+n[F(y)]^{n-1}f(y).
\end{align*}

Since
\[C^{n}_{k}k=\frac{n!}{(k-1)!(n-k)!}\mbox{ and }C^{n}_{k}(n-k)=\frac{n!}{k!(n-k-1)!},\]

the p.d.f. of \(Y_{r}\) is given by

\[g_{r}(y)=\frac{n!}{(r-1)!(n-r)!}[F(y)]^{r-1}[1-F(y)]^{n-r}f(y)\mbox{ for }a<y<b.\]

It is worth noting that the p.d.f. of the smallest order statistic is

\[g_{1}(y)=n[1-F(y)]^{n-1}f(y)\mbox{ for }a<y<b,\]

and the p.d.f. of the largest statistic is

\[g_{n}(y)=n[F(y)]^{n-1}f(y)\mbox{ for }a<y<b.\]
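These formulas are easy to implement directly. The sketch below (assuming Python with SciPy; the function name is illustrative) codes the general p.d.f. \(g_{r}(y)\) and checks numerically, for the earlier example with \(f(x)=2x\), that it integrates to one and reproduces \(\mathbb{P}(Y_{4}\leq 1/2)\).

\begin{verbatim}
from math import factorial
from scipy import integrate

def order_stat_pdf(y, r, n, F, f):
    """p.d.f. of the r-th order statistic of a sample of size n."""
    c = factorial(n) / (factorial(r - 1) * factorial(n - r))
    return c * F(y) ** (r - 1) * (1 - F(y)) ** (n - r) * f(y)

# Example: f(x) = 2x, F(x) = x^2 on (0, 1), with r = 4, n = 5.
g4 = lambda y: order_stat_pdf(y, 4, 5, lambda x: x**2, lambda x: 2*x)

print(integrate.quad(g4, 0.0, 1.0)[0])  # approximately 1
print(integrate.quad(g4, 0.0, 0.5)[0])  # approximately 0.0156
\end{verbatim}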

Example. Let \(Y_{1}<Y_{2}<Y_{3}<Y_{4}\) be the order statistics of a random sample of size \(n=4\) from the uniform distribution with p.d.f. \(f(x)=1\) for \(0<x<1\). Since \(F(x)=x\) for \(0\leq x\leq 1\), the p.d.f. of \(Y_{3}\) is

\[g_{3}(y)=\frac{4!}{2!1!}y^{2}(1-y)\mbox{ for }0<y<1.\]
Thus, for illustration, we have

\begin{align*} \mathbb{P}\left (\frac{1}{3}<Y_{3}<\frac{2}{3}\right ) & =\int_{1/3}^{2/3}12y^{2}(1-y)dy\\ & =\frac{13}{27}.\end{align*}

Example. Let \(Y_{1}<Y_{2}<\cdots <Y_{7}\) be the order statistics of a random sample of size \(n=7\) from a distribution with p.d.f. \(f(x)=3(1-x)^{2}\) for \(0<x<1\). Compute the probability \(\mathbb{P}(Y_{4}<1-\sqrt[3]{0.6})\). We could find the p.d.f. of \(Y_{4}\). However, note that the probability of a single observation being less than \(1-\sqrt[3]{0.6}\) is

\[\int_{0}^{1-\sqrt[3]{0.6}} 3(1-x)^{2}dx=0.4.\]

Thus, from the binomial table,

\begin{align*} \mathbb{P}(Y_{4}<1-\sqrt[3]{0.6}) & =\sum_{k=4}^{7}C^{7}_{k}0.4^{k}0.6^{7-k}\\ & =1-0.7102=0.2898.\end{align*}
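Sums of binomial probabilities like this one can, of course, also be evaluated by software rather than a table; for instance, a one-line sketch assuming Python with SciPy:

\begin{verbatim}
from scipy.stats import binom

# P(at least 4 of the 7 observations fall below 1 - 0.6**(1/3)),
# each doing so with probability 0.4.
print(1 - binom.cdf(3, 7, 0.4))  # 0.2898
\end{verbatim}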

Example. Let \(X_{1},X_{2},\cdots ,X_{n}\) denote a random sample from a distribution with p.d.f.

\[f(x;\theta )=e^{-(x-\theta )}\mbox{ for }\theta\leq x<\infty ,\]

where \(0<\theta <\infty\). The likelihood function is

\[L(\theta )=e^{-n\bar{x}+n\theta}, \theta\leq x_{i}\]

for \(i=1,2,\cdots ,n\), and zero otherwise. Therefore, we have

\[\ln L(\theta )=-n\bar{x}+n\theta ,\theta\leq x_{i}\]

for \(i=1,2,\cdots ,n\), and the derivative

\[\frac{\partial\ln L(\theta )}{\partial\theta}=n\]

is positive for all \(\theta\leq x_{i}\) for \(i=1,2,\cdots ,n\). That is, \(\ln L(\theta )\) and hence \(L(\theta )\) is an increasing function of \(\theta\). To achieve the maximum value of \(L(\theta )\), we accordingly make \(\theta\) as large as possible. Since \(\theta\leq x_{i}\) for \(i=1,2,\cdots ,n\), we have

\[\theta\leq\min\{x_{1},\cdots ,x_{n}\}=y_{1}.\]

That is, the maximum likelihood estimator is \(\widehat{\theta}=Y_{1}\), the first order statistic.

Now, we provide the notion of percentiles. Let \(X\) be a continuous-type random variable with p.d.f. \(f(x)\) and distribution function \(F(x)\). The \((100p)\)th distribution percentile, where \(0<p<1\), is a number \(\pi_{p}\) satisfying

\[p=\int_{-\infty}^{\pi_{p}} f(x)dx=F(\pi_{p}).\]

The \(50\)th percentile is called the median. Let \(m=\pi_{0.5}\). Let \(x_{1},x_{2},\cdots ,x_{n}\) be the sample observations. The \((100p)\)th sample percentile has approximately \(np\) sample observations less than it and approximately \(n(1-p)\) sample observations greater than it. One way of achieving this is to take the \((100p)\)th sample percentile as the \((n+1)p\)th order statistic, provided that \((n+1)p\) is an integer. If \((n+1)p\) is not an integer but is equal to \(r\) plus some proper fraction, say \(a/b\), use a weighted average of the \(r\)th and the \((r+1)\)th order statistics. That is, define the \((100p)\)th sample percentile as

\begin{align*} \widetilde{\pi}_{p} & =y_{r}+(a/b)(y_{r+1}-y_{r})\\ & =(1-a/b)y_{r}+(a/b)y_{r+1}.\end{align*}

For illustration, we consider the following \(50\) ordered test scores

\[\begin{array}{cccccccccc}
34 & 38 & 42 & 42 & 45 & 47 & 51 & 52 & 54 & 57\\
58 & 58 & 59 & 60 & 61 & 63 & 65 & 65 & 66 & 67\\
68 & 69 & 69 & 70 & 71 & 71 & 72 & 73 & 73 & 74\\
75 & 75 & 76 & 76 & 77 & 79 & 81 & 81 & 82 & 83\\
83 & 84 & 84 & 85 & 87 & 90 & 91 & 93 & 93 & 97
\end{array}\]

With \(p=1/2\), we find the \(50\)th percentile by averaging the \(25\)th and \(26\)th order statistics, since \((n+1)p=51\cdot (1/2)=25.5\). Thus, the \(50\)th percentile is

\begin{align*} \widetilde{\pi}_{0.5} & =(1/2)y_{25}+(1/2)y_{26}\\ & =(71+71)/2=71.\end{align*}

With \(p=1/4\), we have \((n+1)p=51\cdot (1/4)=12.75\). Therefore, the \(25\)th percentile is

\begin{align*} \widetilde{\pi}_{0.25} & =(1-0.75)y_{12}+0.75\cdot y_{13}\\ & =0.25\cdot 58+0.75\cdot 59=58.75.\end{align*}

For \(p=3/4\), we have \((n+1)p=51\cdot (3/4)=38.25\). The \(75\)th percentile is

\begin{align*} \widetilde{\pi}_{0.75} & =(1-0.25)y_{38}+0.25\cdot y_{39}\\ & =0.75\cdot 81+0.25\cdot 82=81.25.\end{align*}

Note that approximately \(50\%\), \(25\%\), and \(75\%\) of the sample observations are less than \(71\), \(58.75\), and \(81.25\), respectively.
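The \((n+1)p\) rule above is mechanical enough to automate. Below is a minimal sketch in Python (the function name is illustrative) that reproduces the three percentiles just computed from the \(50\) test scores.

\begin{verbatim}
def sample_percentile(ordered, p):
    """(100p)-th sample percentile via the (n + 1)p rule."""
    n = len(ordered)
    pos = (n + 1) * p          # r plus a possible fraction a/b
    r = int(pos)
    frac = pos - r
    if frac == 0:
        return ordered[r - 1]  # exactly the r-th order statistic
    return (1 - frac) * ordered[r - 1] + frac * ordered[r]

scores = [34, 38, 42, 42, 45, 47, 51, 52, 54, 57,
          58, 58, 59, 60, 61, 63, 65, 65, 66, 67,
          68, 69, 69, 70, 71, 71, 72, 73, 73, 74,
          75, 75, 76, 76, 77, 79, 81, 81, 82, 83,
          83, 84, 84, 85, 87, 90, 91, 93, 93, 97]

print(sample_percentile(scores, 0.50))  # 71.0
print(sample_percentile(scores, 0.25))  # 58.75
print(sample_percentile(scores, 0.75))  # 81.25
\end{verbatim}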

Recall that  if \(X\) has a distribution function \(F(x)\) of the continuous type, then \(F(X)\) has a uniform distribution on the interval zero to one. If \(Y_{1} <Y_{2}<\cdots <Y_{n}\) are the order statistics of a random sample \(X_{1},X_{2},\cdots ,X_{n}\) of size \(n\), then

\[F(Y_{1})<F(Y_{2})<\cdots <F(Y_{n})\]

since \(F\) is a non-decreasing function and the probability of an equality is again zero. Note that this last display could be looked upon as an ordering of the mutually independent random variables \(F(X_{1}),F(X_{2}),\cdots ,F(X_{n})\), each of which is \(U(0,1)\). That is,

\[W_{1}=F(Y_{1})<W_{2}=F(Y_{2})<\cdots <W_{n}=F(Y_{n})\]

can be thought of as the order statistics of a random sample of size \(n\) from that uniform distribution. Since the distribution function of \(U(0,1)\) is \(G(w)=w\) for \(0<w<1\), the p.d.f. of the \(r\)th order statistic \(W_{r}=F(Y_{r})\) is given by

\[h_{r}(w)=\frac{n!}{(r-1)!(n-r)!}w^{r-1}(1-w)^{n-r}\mbox{ for }0<w<1.\]

The mean \(\mathbb{E}(W_{r})=\mathbb{E}[F(Y_{r})]\) of \(W_{r}=F(Y_{r})\) is given by the integral

\[\mathbb{E}(W_{r})=\int_{0}^{1} w\frac{n!}{(r-1)!(n-r)!}w^{r-1}(1-w)^{n-r}dw.\]

This can be evaluated by integrating by parts several times, but it is easier to obtain the answer if we rewrite it as follows
\[\mathbb{E}(W_{r})=\left (\frac{r}{n+1}\right )\int_{0}^{1}\frac{(n+1)!}{r!(n-r)!}w^{r}(1-w)^{n-r}dw.\]

The integrand in this last expression can be thought of as the p.d.f. of the \((r+1)\)th order statistic of a random sample of size \(n+1\) from a \(U(0,1)\) distribution. Hence,  the integral must equal \(1\) and

\[\mathbb{E}(W_{r})=\frac{r}{n+1}\mbox{ for }r=1,2,\cdots ,n.\]

There is an extremely interesting interpretation of \(W_{r}=F(Y_{r})\). Note that \(F(Y_{r})\) is the cumulative probability up to and including \(Y_{r}\) or, equivalently, the area under \(f(x)=F'(x)\) to the left of \(Y_{r}\). Hence \(F(Y_{r})\) can be treated as a random area. Since \(F(Y_{r-1})\) is also a random area, \(F(Y_{r})-F(Y_{r-1})\) is the random area under \(f(x)\) between \(Y_{r-1}\) and \(Y_{r}\). The expected value of the random area between any two adjacent order statistics is

\[\mathbb{E}[F(Y_{r})-F(Y_{r-1})]=\frac{r}{n+1}-\frac{r-1}{n+1}=\frac{1}{n+1}.\]

Also it is easy to show

\[\mathbb{E}[F(Y_{1})]=\frac{1}{n+1}\mbox{ and }\mathbb{E}[1-F(Y_{n})]=\frac{1}{n+1}.\]

That is, the order statistics \(Y_{1}<Y_{2}<\cdots <Y_{n}\) partition the support of \(X\) into \(n+1\) parts and thus create \(n+1\) areas under \(f(x)\) and above the \(x\)-axis. “On the average”, each of the \(n+1\) areas equals \(1/(n+1)\). If we recall that the \((100p)\)th percentile \(\pi_{p}\) is such that the area under \(f(x)\) to the left of \(\pi_{p}\) is \(p\), the preceding discussion suggests that we let \(Y_{r}\) be an estimator of \(\pi_{p}\), where \(p=r/(n+1)\). For this reason, we define the \((100p)\)th percentile of the sample as \(Y_{r}\), where \(r=(n+1)p\), provided that \((n+1)p\) is an integer. In case \((n+1)p\) is not an integer, we use a weighted average of the two adjacent order statistics \(Y_{r}\) and \(Y_{r+1}\), where \(r=[(n+1)p]\) is the greatest integer in \((n+1)p\). In particular, the sample median is

\[\widetilde{m}=\left\{\begin{array}{ll}
Y_{(n+1)/2} & \mbox{when \(n\) is odd}\\
{\displaystyle \frac{Y_{n/2}+Y_{(n/2)+1}}{2}} & \mbox{when \(n\) is even}
\end{array}\right .\]

Example. Let \(X\) equal the weight of soap in a “$1000$-gram” bottle. A random sample of \(n=12\) observations of \(X\) yielded the following weights that have been ordered

\[\begin{array}{cccccc}
1013 & 1019 & 1021 & 1024 & 1026 & 1028\\
1033 & 1035 & 1039 & 1040 & 1043 & 1047
\end{array}\]

Since \(n=12\) is even, the sample median is

\begin{align*} \widetilde{m} & =\frac{y_{6}+y_{7}}{2}\\ & =\frac{1028+1033}{2}=1030.5.\end{align*}

The location of the \(25\)th percentile is

\[(n+1)\cdot 0.25=(12+1)\cdot 0.25=3.25.\]

Thus the \(25\)th percentile, using a weighted average, is

\begin{align*} \widetilde{\pi}_{0.25} & =y_{3}+0.25(y_{4}-y_{3})=0.75y_{3}+0.25y_{4}\\ & =0.75\cdot 1021+0.25\cdot 1024=1021.75.\end{align*}

Similarly, since \((12+1)\cdot 0.75=9.75\), the \(75\)th percentile is

\begin{align*} \widetilde{\pi}_{0.75} & =y_{9}+0.75(y_{10}-y_{9})=0.25y_{9}+0.75y_{10}\\ & =0.25\cdot 1039+0.75\cdot 1040=1039.75.\end{align*}

Since \((12+1)\cdot 0.6=7.8\), the \(60\)th percentile is

\begin{align*} \widetilde{\pi}_{0.6} & =0.2y_{7}+0.8y_{8}\\ & =0.2\cdot 1033+0.8\cdot 1035=1034.6.\end{align*}

\begin{equation}{\label{b}}\tag{B}\mbox{}\end{equation}

Confidence Intervals for Percentiles.

In the previous discussion, we defined the sample percentiles in terms of the order statistics and noted that the sample percentiles can be used to estimate the corresponding distribution percentiles. Now, we use the order statistics to construct confidence intervals for the unknown distribution percentiles. Since little is assumed about the underlying distribution in the construction of these confidence intervals, they are often called distribution-free confidence intervals. If \(Y_{1}<Y_{2}<Y_{3}<Y_{4}<Y_{5}\) are the order statistics of a random sample of size \(n=5\) from a continuous-type distribution, then the sample median \(Y_{3}\) could be thought of as an estimator of the distribution median \(\pi_{0.5}\). Let \(m=\pi_{0.5}\). We could simply use the sample median \(Y_{3}\) as an estimator of the distribution median \(m\). However, we surely recognize that, with a sample of size only \(5\), we would be quite lucky if the observed \(Y_{3}=y_{3}\) were very close to \(m\). Therefore, we describe how a confidence interval can be constructed for \(m\). Instead of using simply \(Y_{3}\) as an estimator of \(m\), let us also compute the probability that the random interval \((Y_{1},Y_{5})\) includes \(m\). That is, let us determine \(\mathbb{P}(Y_{1}<m<Y_{5})\). Again, we say that we have success if an individual item, say \(X\), is less than \(m\). Therefore, the probability of success on each of the independent trials is \(\mathbb{P}(X<m)=0.5\). In order for the first order statistic \(Y_{1}\) to be less than \(m\) and the last order statistic \(Y_{5}\) to be greater than \(m\), we must have at least one success but not five successes. That is, we have

\begin{align*} \mathbb{P}(Y_{1}<m<Y_{5}) & =\sum_{k=1}^{4} C^{5}_{k}\left (\frac{1}{2}\right )^{k}\left (\frac{1}{2}\right )^{5-k}\\ & =1-\left (\frac{1}{2}\right )^{5}-\left (\frac{1}{2}\right )^{5}=\frac{15}{16}.\end{align*}

The probability that the random interval \((Y_{1},Y_{5})\) includes \(m\) is \(15/16\approx 0.94\). Suppose that this random sample is actually taken and
the order statistics are observed to equal \(y_{1}<y_{2}<y_{3}<y_{4}<y_{5}\), respectively. Then \((y_{1},y_{5})\) is a \(94\%\) confidence interval for \(m\).

It is interesting to note what happens as the sample size increases. Let \(Y_{1}<Y_{2}<\cdots <Y_{n}\) be the order statistics of a random sample of size \(n\) from a distribution of the continuous type. Thus \(\mathbb{P}(Y_{1}<m<Y_{n})\) is the probability that there is at least one “success” but not \(n\) successes, where the probability of success on each trial is \(\mathbb{P}(X<m)=0.5\). Consequently, we have

\begin{align*} \mathbb{P}(Y_{1}<m<Y_{n}) & =\sum_{k=1}^{n-1} C^{n}_{k}\left (\frac{1}{2}\right )^{k}\left (\frac{1}{2}\right )^{n-k}\\ & =1-\left (\frac{1}{2}\right )^{n}-\left (\frac{1}{2}\right )^{n}=1-\left (\frac{1}{2}\right )^{n-1}.\end{align*}

This probability increases as \(n\) increases, so the corresponding confidence interval \((y_{1},y_{n})\) would have a very large confidence coefficient \(1-(1/2)^{n-1}\). However, the interval \((y_{1},y_{n})\) tends to get wider as \(n\) increases. If we used the interval \((y_{2},y_{n-1})\) or \((y_{3},y_{n-2})\), we would obtain shorter intervals but also smaller confidence coefficients. With the order statistics \(Y_{1}<Y_{2}<\cdots < Y_{n}\) associated with a random sample of size \(n\) from a continuous-type distribution, we consider \(\mathbb{P}(Y_{i}<m<Y_{j})\) for \(i<j\). For example, we might want

\[\mathbb{P}(Y_{2}<m<Y_{n-1})\mbox{ or }\mathbb{P}(Y_{3}<m<Y_{n-2}).\]

On each of the \(n\) independent trials, we say that we have success if \(X\) is less than \(m\). Therefore, the probability of success on each trial is \(\mathbb{P}(X<m)=0.5\). Consequently, to have the \(i\)th order statistic \(Y_{i}\) less than \(m\) and the \(j\)th order statistic \(Y_{j}\) greater than \(m\), we must have at least \(i\) successes but fewer than \(j\) successes (or else \(Y_{j}<m\)). That is, we have

\[\mathbb{P}(Y_{i}<m<Y_{j})=\sum_{k=i}^{j-1} C^{n}_{k}\left (\frac{1}{2}\right )^{k}
\left (\frac{1}{2}\right )^{n-k}=1-\alpha .\]

For particular values of \(n\), \(i\), and \(j\), this probability \(1-\alpha\), which is the sum of probabilities from a binomial distribution, can be calculated directly or approximated by an area under the normal p.d.f., provided that \(n\) is large enough. The observed interval \((y_{i},y_{j})\) could serve as a \(100(1-\alpha )\%\) confidence interval for the unknown distribution median \(m\).
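Since the confidence coefficient is just a sum of binomial probabilities, it is convenient to compute it by software. A minimal sketch, assuming Python with SciPy (the function name is illustrative):

\begin{verbatim}
from scipy.stats import binom

def median_ci_coefficient(n, i, j):
    """Confidence coefficient of (y_i, y_j) as a CI for the median:
    P(at least i but fewer than j successes), success prob. 1/2."""
    return binom.cdf(j - 1, n, 0.5) - binom.cdf(i - 1, n, 0.5)

print(median_ci_coefficient(5, 1, 5))  # 15/16 = 0.9375
\end{verbatim}

Varying \(i\) and \(j\) in this function makes the trade-off between interval length and confidence coefficient easy to explore.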

Example. The lengths in centimeters of \(n=9\) fish of a particular species (nezumia) captured off the New England coast were \(32.5\), \(27.6\), \(29.3\), \(30.1\), \(15.5\), \(21.7\), \(22.8\), \(21.2\), \(19.0\). Therefore, the observed order statistics are

\[15.5<19.0<21.2<21.7<22.8<27.6<29.3<30.1<32.5.\]

Before the sample is drawn, we know

\begin{align*} \mathbb{P}(Y_{2}<m<Y_{8}) & =\sum_{k=2}^{7} C^{9}_{k}\left (\frac{1}{2}\right )^{k}\left (\frac{1}{2}\right )^{9-k}\\ & =0.9805-0.0195=0.9610,\end{align*}

from the binomial table. Therefore, the confidence interval \((y_{2},y_{8})=(19.0,30.1)\) for \(m\), the median of the lengths of all fish of this species, has a \(96.1\%\) confidence coefficient. \(\sharp\)

For sample sizes larger than \(20\), we approximate those binomial probabilities with areas under the normal curve. To illustrate how good these approximations are, we compute the probability with \(n=16\). Using the binomial table, we have

\begin{align*}
1-\alpha & =\mathbb{P}(Y_{5}<m<Y_{12})\\ & =\sum_{k=5}^{11} C^{16}_{k}\left (\frac{1}{2}\right )^{k}\left (\frac{1}{2}\right )^{16-k}\\
& =\mathbb{P}(W=5,6,\cdots ,11)\\ & =0.9616-0.0384=0.9232,
\end{align*}

where \(W\) is \(B(16,1/2)\). The normal approximation gives

\begin{align*} 1-\alpha & =\mathbb{P}(4.5<W<11.5)\\ & =\mathbb{P}\left (\frac{4.5-8}{2}<\frac{W-8}{2}<\frac{11.5-8}{2}\right )\end{align*}

since \(W\) has mean \(np=8\) and variance \(np(1-p)=4\). The standardized variable \(Z=(W-8)/2\) has an approximate normal distribution by the central limit theorem. Therefore, we have

\begin{align*} 1-\alpha & \approx\Phi\left (\frac{3.5}{2}\right )-\Phi\left (\frac{-3.5}{2}\right )\\ & =\Phi (1.75)-\Phi (-1.75)=0.9599-0.0401=0.9198.\end{align*}

This compares very favorably with the probability \(0.9232\) computed above.
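The comparison can be reproduced in a few lines; a sketch assuming Python with SciPy (the table-based value \(0.9198\) above differs from the value below only in rounding):

\begin{verbatim}
from scipy.stats import binom, norm

exact = binom.cdf(11, 16, 0.5) - binom.cdf(4, 16, 0.5)
approx = norm.cdf(3.5 / 2) - norm.cdf(-3.5 / 2)  # continuity-corrected
print(exact, approx)  # about 0.9232 versus 0.9199
\end{verbatim}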

The argument used to find a confidence interval for the median \(m\) of a distribution of the continuous type can be applied to any percentile \(\pi_{p}\). In this case, we say that we have success on a single trial if \(X\) is less than \(\pi_{p}\). Therefore, the probability of success on each of the independent trials is \(\mathbb{P}(X<\pi_{p})=p\). Accordingly, for \(i<j\), \(1-\alpha =\mathbb{P}(Y_{i}<\pi_{p}<Y_{j})\) is the probability that we have at least \(i\) successes but fewer than \(j\) successes. Therefore, we have

\[1-\alpha =\mathbb{P}(Y_{i}<\pi_{p}<Y_{j})=\sum_{k=i}^{j-1}C^{n}_{k}p^{k}(1-p)^{n-k}.\]

Once the sample is observed and the order statistics are determined, the observed interval \((y_{i},y_{j})\) could serve as a \(100(1-\alpha )\%\) confidence interval for the unknown distribution percentile \(\pi_{p}\).

Example. Let the following numbers represent the order statistics of the \(n=27\) observations obtained in a random sample from a certain population of
incomes (measured in hundreds of dollars)

\[\begin{array}{cccccc}
161 & 180 & 192 & 205 & 229 & 264\\
169 & 183 & 193 & 213 & 241 & 291\\
171 & 184 & 196 & 221 & 243 & 317\\
174 & 186 & 200 & 222 & 256 & 376\\
179 & 187 & 204
\end{array}\]

We are interested in estimating the \(25\)th percentile \(\pi_{0.25}\) of the population. Since \((n+1)p=28\cdot (1/4)=7\), the \(7\)th order statistic, namely \(y_{7}=183\), would be a point estimate of \(\pi_{0.25}\). To find a confidence interval for \(\pi_{0.25}\), let us move down and up a few order statistics from \(y_{7}\), say to \(y_{4}\) and \(y_{10}\). Before the sample was drawn, we had

\begin{align*} 1-\alpha & =\mathbb{P}(Y_{4}<\pi_{0.25}<Y_{10})\\ & =\sum_{k=4}^{9} C^{27}_{k}0.25^{k}\cdot 0.75^{27-k}=\mathbb{P}(3.5<W<9.5),\end{align*}

where \(W\) is \(B(27,1/4)\) with mean \(27/4=6.75\) and variance \(81/16\). Therefore, we have

\begin{align*} 1-\alpha & \approx\Phi\left (\frac{9.5-6.75}{9/4}\right )-\Phi\left (\frac{3.5-6.75}{9/4}\right )\\ & =\Phi\left (\frac{11}{9}\right )-\Phi\left (-\frac{13}{9}\right )=0.8149.\end{align*}

Thus \((y_{4},y_{10})=(174,187)\) serves as an \(81.49\%\) confidence interval for \(\pi_{0.25}\). It should be noted that we could choose other intervals, such as \((y_{3},y_{11})=(171,192)\), and these would have different confidence coefficients. The person involved in the study must select the desired confidence coefficient, and then the appropriate order statistics are taken, usually quite symmetrically about the \((n+1)p\)th order statistic. \(\sharp\)

\begin{equation}{\label{c}}\tag{C}\mbox{}\end{equation}

Binomial Tests for Percentiles.

We now take up some tests of statistical hypotheses that are distribution free, provided that the null hypotheses are true. Now, we find a distribution-free test of the hypothesis \(H_{0}:\pi_{p}=\pi_{0}\), where \(\pi_{0}\) is given. It should be noted that the powers of these tests are not distribution free when the null hypotheses are false. Moreover, the resulting non-null distribution theory is frequently quite complicated, so little will be said about the power functions. We begin by considering a hypothesis about the median \(m=\pi_{0.5}\) of a distribution of the continuous type. Further, we assume that the median \(m\) is unique. Suppose it is known from past experience that the median length of sunfish in a particular polluted lake was \(m=3.7\) inches.

During the past two years the lake was “cleaned up”, and the conjecture is made that now \(m>3.7\) inches. We consider a procedure for testing this conjecture. In particular, we describe a procedure for testing the null hypothesis \(H_{0}:m=3.7\) against the alternative hypothesis \(H_{1}:m>3.7\). Let \(X\) denote the length of a sunfish selected at random from the lake. If the null hypothesis \(H_{0}:m=3.7\) is true, we have \(\mathbb{P}(X\leq 3.7;H_{0})=0.5\). However, if the alternative hypothesis \(H_{1}:m>3.7\) is true, we have \(\mathbb{P}(X\leq 3.7;H_{1})<0.5\). If we take a random sample of \(n=10\) fish, we would expect about half the fish to be shorter than \(3.7\) inches if \(H_{0}\) is true. However, if \(H_{1}\) is true, we would expect less than half the fish to be shorter than \(3.7\) inches. We shall decide on the basis of the number, say \(Y\), of fish shorter than \(3.7\) inches whether to accept \(H_{0}\) or \(H_{1}\). Therefore, \(Y\) can be thought of as the number of “successes” in \(10\) Bernoulli trials with probability of success given by \(p=\mathbb{P}(X\leq 3.7)\). If \(H_{0}\) is true, \(p=1/2\) and \(Y\) is \(B(10,1/2)\); whereas if \(H_{1}\) is true, \(p<1/2\) and \(Y\) is \(B(10,p)\). We reject \(H_{0}\) and accept \(H_{1}\) if and only if the observed value \(y\) of \(Y\) is sufficiently small, say \(y\leq c\). From the binomial table we find \(\mathbb{P}(Y\leq 2;H_{0})=0.0547\). Therefore, let the critical region be defined by \(C=\{y:y\leq 2\}\). Then, we have \(\alpha =0.0547\). Suppose that the lengths of \(10\) sunfish selected at random from this lake were \(5.0\), \(3.9\), \(5.2\), \(5.5\), \(2.8\), \(6.1\), \(6.4\), \(2.6\), \(1.7\), and \(4.3\) inches. Since \(y=3\) of these lengths are less than \(3.7\), we would not reject \(H_{0}:m=3.7\) at the \(\alpha =0.0547\) significance level by considering only those few data.
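A sketch of this sign test in Python with SciPy (the data and cutoff are those of the discussion above; all names are illustrative):

\begin{verbatim}
from scipy.stats import binom

lengths = [5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3]
m0 = 3.7

y = sum(x < m0 for x in lengths)  # number of "successes"
alpha = binom.cdf(2, 10, 0.5)     # size of the region {y <= 2}
print(y, alpha)                   # y = 3, alpha = 0.0547; H0 stands
\end{verbatim}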

Example. Let \(X\) denote the length of time in seconds between two calls entering a college switchboard. Let \(m\) be the unique median of this continuous-type distribution. We test the null hypothesis \(H_{0}:m=6.2\) against the alternative hypothesis \(H_{1}:m<6.2\). If \(Y\) is the number of lengths of time in a random sample of size \(20\) that are less than \(6.2\), the critical region \(C=\{y:y\geq 14\}\) has a significance level of \(\alpha =0.0577\), using the binomial table. A random sample of size \(20\) yielded the following data

\[\begin{array}{llllllll}
6.8 & 5.7 & 6.9 & 5.3 & 4.1 & 9.8 & 1.7 & 7.0\\
2.1 & 19.0 & 18.9 & 16.9 & 10.4 & 44.1 & 2.9 & 2.4\\
4.8 & 18.9 & 4.8 & 7.9 &&&&
\end{array}\]

Since \(y=9\), the null hypothesis is not rejected. \(\sharp\)

In many places in the literature the test we have just described is called the sign test. The reason for this terminology is that the test is based on a statistic \(Y\) that is equal to the number of negative signs among (here \(m_{0}\) is the hypothesized median)
\[X_{1}-m_{0},X_{2}-m_{0},\cdots ,X_{n}-m_{0}.\]

The sign test can also be used to test the hypothesis that two continuous-type random variables \(X\) and \(Y\) satisfy \(p=\mathbb{P}(X>Y)=1/2\). To test the hypothesis \(H_{0}:p=1/2\) against an appropriate alternative hypothesis, we consider the independent pairs \((X_{1},Y_{1}),(X_{2},Y_{2}), \cdots ,(X_{n},Y_{n})\). Let \(W\) denote the number of pairs for which \(X_{i}-Y_{i}>0\). When \(H_{0}\) is true, \(W\) is \(B(n,1/2)\), and the test can be based on the statistic \(W\). It is important to recognize that \(X_{i}\) and \(Y_{i}\) do not need to be independent. Then, there are many ways in which \(X_{1},X_{2},\cdots ,X_{n}\) and \(Y_{1},Y_{2},\cdots ,Y_{n}\) can be paired up. If \(X\) is the length of the right foot of a person and \(Y\) is the length of the corresponding left foot, there is a natural pairing. Here \(H_{0}:p=P(X>Y)=1/2\) suggests that either foot of a particular person is equally likely to be longer.

\begin{equation}{\label{ch4ex3}}\tag{1}\mbox{}\end{equation}

Example \ref{ch4ex3}. Freshmen in a health dynamics course have their percentage of body fat measured at the beginning \((x)\) and at the end \((y)\) of the semester. These measurements are given for \(26\) students in the following table. Also recorded is a plus sign when \(x_{i}>y_{i}\) and a minus sign when \(x_{i}<y_{i}\).

\[\begin{array}{ccc|ccc}
\hline x & y & \mbox{Sign} & x & y & \mbox{Sign}\\
\hline 35.4 & 33.6 & + & 22.4 & 21.0 & +\\
28.8 & 31.9 & - & 23.5 & 24.5 & -\\
10.6 & 10.5 & + & 24.1 & 21.9 & +\\
16.7 & 15.6 & + & 22.5 & 21.7 & +\\
14.6 & 14.0 & + & 17.5 & 17.9 & -\\
8.8 & 13.9 & – & 16.9 & 14.9 & +\\
17.9 & 8.7 & + & 11.7 & 17.5 & -\\
17.8 & 17.6 & + & 8.3 & 11.7 & -\\
9.3 & 8.9 & + & 7.9 & 10.2 & -\\
23.6 & 23.6 & 0 & 20.7 & 17.7 & +\\
15.6 & 13.7 & + & 26.8 & 24.1 & +\\
24.3 & 24.7 & – & 20.6 & 20.4 & +\\
23.8 & 25.3 & – & 25.1 & 21.9 & +\\
\hline\end{array}\]

In theory, the case \(x_{i}=y_{i}\) should not happen. Since measurements are rounded off, ties do occur in applications. When a tie does occur, the subject is dropped and the test is run with a reduced sample size. We test the hypothesis \(H_{0}:p=\mathbb{P}(X>Y)=1/2\) against the one-sided alternative \(H_{1}:p>1/2\). That is, the alternative hypothesis is that the students’ percentage of body fat decreases during the semester. Let the critical region be defined by \(C=\{w:w\geq 17\}\), where \(w\) is the number of “plus signs”. The distribution of \(W\) is \(B(25,1/2)\) when \(H_{0}\) is true. Therefore, we have

\[\alpha =\mathbb{P}(W\geq 17;p=1/2)=0.0539\]

from the binomial table. Since \(w=16\), the null hypothesis is not rejected with these data, although they suggest that, with more data of the same type, we might be able to reject \(H_{0}\). The \(p\)-value of this test is
\[p\mbox{-value}=\mathbb{P}(W\geq 16;p=1/2)=0.1148.\]
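Both the significance level and the \(p\)-value here are binomial tail probabilities, so they can be checked directly; a sketch assuming Python with SciPy:

\begin{verbatim}
from scipy.stats import binom

# 16 plus signs among the n = 25 nonzero differences (tie dropped)
alpha = 1 - binom.cdf(16, 25, 0.5)    # P(W >= 17) = 0.0539
p_value = 1 - binom.cdf(15, 25, 0.5)  # P(W >= 16) = 0.1148
print(alpha, p_value)
\end{verbatim}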

How do we decide when to use the sign test? In the situation in which we need a test about a location parameter of a distribution, would we use the sign test to test \(m=m_{0}\) or a test based on \(\bar{X}\) to test \(\mu =\mu_{0}\)? In particular, if the distribution is symmetric, then \(m=\mu\), and hence each procedure would be testing the hypothesis that the common value of \(m\) and \(\mu\) is equal to a known constant. In more advanced theory and practice, we discover that if there are outliers (i.e. extreme \(x\) values that deviate greatly from most of the other values), the sign test is usually better than that based on \(\bar{X}\). This certainly appeals to our intuition, since a few extreme values can influence the average \(\bar{x}\) a great deal; but in the sign test, each extreme value is associated with only one sign no matter how far it is from \(m\). Hence, in cases with highly skewed data or data with outliers, the distribution-free sign test would probably be preferred to that based on \(\bar{X}\).

Another advantage of the sign test is the fact that it can be easily generalized to percentiles other than the median. To test the hypothesis that the \((100p)\)th percentile \(\pi_{p}\) of a continuous-type distribution is equal to a specified value \(\pi_{0}\), let \(Y\) be the number of the items of the random sample of size \(n\) that are less than \(\pi_{0}\). If \(H_{0}:\pi_{p}=\pi_{0}\) is true, then \(Y\) is \(B(n,p)\), where \(p\) is a known probability. If, for example, the alternative hypothesis is \(H_{1}:\pi_{p}>\pi_{0}\), then fewer than \(np\) observations would be expected to fall below \(\pi_{0}\), and the critical region would be of the form \(C=\{y:y\leq c\}\).

Example. Suppose, from past testing, we know that the \(25\)th percentile of ninth-grade general mathematics students taking a standardized examination is \(62.4\). Statisticians believe that by introducing more statistics involving real problems into such a course, the students will see the usefulness of mathematics. It is hoped that this approach will motivate the poorer students in these classes and increase some of the lower percentiles. In particular, they believe that if these courses were changed as suggested, then \(H_{1}:\pi_{0.25}>62.4\). To test \(H_{0}:\pi_{0.25}=62.4\) against \(H_{1}\), \(192\) general mathematics students were selected at random and given this new type of course using more statistics. Let \(Y\) be the number of students scoring less than \(62.4\). If \(H_{0}\) is true, then \(Y\) is \(B(192,1/4)\). If \(H_{1}\) is true, we would expect smaller values of \(Y\), and thus the critical region is of the form \(y\leq c\). To determine \(c\), we can use the fact that \(Y\), under \(H_{0}\), has an approximate normal distribution with mean \(192\cdot (1/4)=48\) and variance \(192\cdot (1/4)(3/4)=36\). Thus, for the significance level to be about \(0.05\), we want \(c\approx 48-1.645\cdot 6=38.13\). Accordingly, if we take \(c=38\), then the significance level is

\begin{align*} \mathbb{P}(Y\leq 38) & =\mathbb{P}(Y<38.5)\\ & =\mathbb{P}\left (\frac{Y-48}{6}<\frac{-9.5}{6}\right )\\ & \approx\Phi (-1.583)=0.0567.\end{align*}

Suppose that after the course is over, these \(192\) students take the standardized examination and only \(31\) have grades less than \(62.4\). We would reject \(H_{0}:\pi_{0.25}=62.4\) and accept \(H_{1}:\pi_{0.25}> 62.4\); that is, it seems as if the \(25\)th percentile is greater in the new type of general mathematics course than that associated with the old one. \(\sharp\)

\begin{equation}{\label{d}}\tag{D}\mbox{}\end{equation}

Wilcoxon Test.

Let \(X\) be a continuous-type random variable. Let \(m\) denote the median of \(X\). To test the hypothesis \(H_{0}:m=m_{0}\) against an appropriate alternative hypothesis, the sign test can be used. That is, if \(X_{1},X_{2},\cdots ,X_{n}\) denote the items of a random sample from this distribution and \(Y\) denotes the number of negative differences among \(X_{1}-m_{0}, X_{2}-m_{0},\cdots , X_{n}-m_{0}\), then \(Y\) has the binomial distribution \(B(n,1/2)\) under \(H_{0}\) and is the test statistic for the sign test. One major objection to this test is that it does not take into account the magnitude of the differences \(X_{1}-m_{0},\cdots ,X_{n}-m_{0}\). Here we discuss a test that does take into account the magnitude of the differences \(|X_{i}-m_{0}|\) for \(i=1,\cdots ,n\). However, in addition to the assumption that the random variable \(X\) is of continuous type, we must also assume that the p.d.f. of \(X\) is symmetric about the median in order to find the distribution of this new statistic. Because of the continuity assumption, we assume that no two observations are equal and that no observation is equal to \(m_{0}\). We are interested in testing the hypothesis \(H_{0}:m=m_{0}\), where \(m_{0}\) is some given constant. With our random sample \(X_{1},X_{2},\cdots ,X_{n}\), we rank the differences \(X_{1}-m_{0},X_{2}-m_{0},\cdots ,X_{n}-m_{0}\) in ascending order according to magnitude. That is, for \(i=1,2,\cdots ,n\), let \(R_{i}\) denote the rank of \(|X_{i}-m_{0}|\) among \(|X_{1}-m_{0}|,|X_{2}-m_{0}|,\cdots , |X_{n}-m_{0}|\). Note that \(R_{1},R_{2},\cdots ,R_{n}\) is a permutation of the first \(n\) positive integers, \(1,2,\cdots ,n\). Now with each \(R_{i}\) we associate the sign of the difference \(X_{i}-m_{0}\); that is, if \(X_{i}-m_{0}>0\), we use \(R_{i}\), but if \(X_{i}-m_{0}<0\), we use \(-R_{i}\). The Wilcoxon statistic \(W\) is the sum of these \(n\) signed ranks.

Example. Consider the sunfish example in the above discussion. There we considered testing \(H_{0}:m=3.7\) against the alternative hypothesis \(H_{1}:m>3.7\). The observed lengths of the \(n=10\) fish were

\[x_{i}:5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3\]

Thus we have

\[\begin{array}{lcccccccccc}
x_{i}-m_{0}: & 1.3 & 0.2 & 1.5 & 1.8 & -0.9 & 2.4 & 2.7 & -1.1 & -2.0 & 0.6\\
|x_{i}-m_{0}|: & 1.3 & 0.2 & 1.5 & 1.8 & 0.9 & 2.4 & 2.7 & 1.1 & 2.0 & 0.6\\
\mbox{Rank}: & 5 & 1 & 6 & 7 & 3 & 9 & 10 & 4 & 8 & 2\\
\mbox{Signed Rank}: & 5 & 1 & 6 & 7 & -3 & 9 & 10 & -4 & -8 & 2
\end{array}\]

Therefore, the Wilcoxon statistic is equal to

\[W=5+1+6+7-3+9+10-4-8+2=25.\]

Incidentally, the positive answer seems reasonable because the number of the \(10\) lengths that are less than \(3.7\) is \(3\), which is the statistic used in the sign test. \(\sharp\)
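The ranking and signing can be automated; a minimal sketch with NumPy (the double argsort is one standard way to turn a sort order into ranks when there are no ties):

\begin{verbatim}
import numpy as np

x = np.array([5.0, 3.9, 5.2, 5.5, 2.8, 6.1, 6.4, 2.6, 1.7, 4.3])
d = x - 3.7  # differences from m0

# Ranks of |d| (values are distinct here, so no ties to resolve).
ranks = np.argsort(np.argsort(np.abs(d))) + 1
w = int(np.sum(np.sign(d) * ranks))  # Wilcoxon signed-rank statistic
print(w)  # 25
\end{verbatim}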

If the hypothesis \(H_{0}:m=m_{0}\) is true, about one-half of the differences would be negative and thus about one-half of the signs would be negative. Thus it seems that the hypothesis \(H_{0}:m=m_{0}\) is supported if the observed value \(W\) is close to zero. If the alternative hypothesis is \(H_{1}:m>m_{0}\), we would reject \(H_{0}\) if the observed \(W=w\) is too large, since, in this case, the larger deviations \(|X_{i}-m_{0}|\) would usually be associated with observations for which \(x_{i}-m_{0}>0\). That is, the critical region would be of the form \(\{w:w\geq c_{1}\}\). If the alternative hypothesis is \(H_{1}:m<m_{0}\), the critical region would be of the form \(\{w:w\leq c_{2}\}\). Also, the critical region would be of the form \(\{w:w\leq c_{3}\) or \(w\geq c_{4}\}\) for a two-sided alternative hypothesis \(H_{1}:m\neq m_{0}\). In order to find the values of \(c_{1},c_{2},c_{3},c_{4}\) that yield desired significance levels, it is necessary to determine the distribution of \(W\) under \(H_{0}\). When \(H_{0}:m=m_{0}\) is true, we have

\begin{align*} \mathbb{P}(X_{i}<m_{0}) & =\mathbb{P}(X_{i}>m_{0})\\ & =\frac{1}{2}\end{align*}

for \(i=1,2,\cdots ,n\). Therefore, the probability is \(1/2\) that a negative sign is associated with the rank \(R_{i}\) of \(|X_{i}-m_{0}|\). Moreover, the assignments of these \(n\) signs are independent, since \(X_{1},X_{2},\cdots ,X_{n}\) are mutually independent. In addition, \(W\) is a sum that contains the integers \(1,2,\cdots ,n\), each integer with a positive or negative sign. Since the underlying distribution is symmetric, it seems intuitively obvious that \(W\) has the same distribution as the random variable \(V=\sum_{i=1}^{n} V_{i}\), where \(V_{1},V_{2},\cdots ,V_{n}\) are independent and

\[\mathbb{P}(V_{i}=i)=\mathbb{P}(V_{i}=-i)=\frac{1}{2}\]

for \(i=1,2,\cdots ,n\). That is, \(V\) is a sum that contains the integers \(1,2,\cdots ,n\), and these integers receive their algebraic signs by independent assignments. Since \(W\) and \(V\) have the same distribution, their means and variances are equal, and we can easily find those of \(V\). Note that the mean of \(V_{i}\) is

\[\mathbb{E}(V_{i})=-i\cdot\frac{1}{2}+i\cdot\frac{1}{2}=0\]

and thus

\[\mathbb{E}(W)=\mathbb{E}(V)=\sum_{i=1}^{n} \mathbb{E}(V_{i})=0.\]

The variance of \(V_{i}\) is

\begin{align*} \mbox{Var}(V_{i}) & =\mathbb{E}(V_{i}^{2})\\ & =(-i)^{2}\cdot\frac{1}{2}+i^{2}\cdot\frac{1}{2}=i^{2}.\end{align*}

Thus

\begin{align*} \mbox{Var}(W) & =\mbox{Var}(V)=\sum_{i=1}^{n} \mbox{Var}(V_{i})\\ & =\sum_{i=1}^{n} i^{2}=\frac{n(n+1)(2n+1)}{6}.\end{align*}

We shall not try to find the distribution of \(W\) in general, since the p.d.f. does not have a convenient expression. However, we demonstrate how we could find the distribution of \(W\) (or \(V\)) with enough patience and computer support. Recall that the moment-generating function of \(V_{1}+V_{2}\) is

\[M(t)=\mathbb{E}[e^{t(V_{1}+V_{2})}].\]

From the independence of \(V_{1}\) and \(V_{2}\), we obtain

\begin{align*} M(t) & =\mathbb{E}(e^{tV_{1}})\mathbb{E}(e^{tV_{2}})\\ & =\left (\frac{e^{-t}+e^{t}}{2}\right )
\left (\frac{e^{-2t}+e^{2t}}{2}\right )\\ & =\frac{e^{-3t}+e^{-t}+e^{t}+e^{3t}}{4}.\end{align*}

This means that each of the points \(-3,-1,1,3\) in the support of \(V_{1}+V_{2}\) has probability \(1/4\). Next, let \(n=3\). The moment-generating function of \(V_{1}+V_{2}+V_{3}\) is

\begin{align*}
M(t) & =\mathbb{E}[e^{t(V_{1}+V_{2}+V_{3})}]=\mathbb{E}[e^{t(V_{1}+V_{2})}]\mathbb{E}(e^{tV_{3}})\\
& =\left (\frac{e^{-3t}+e^{-t}+e^{t}+e^{3t}}{4}\right )\left (\frac{e^{-3t}+e^{3t}}{2}\right )\\
& =\frac{e^{-6t}+e^{-4t}+e^{-2t}+2e^{0}+e^{2t}+e^{4t}+e^{6t}}{8}.
\end{align*}

Thus, the points \(-6,-4,-2,0,2,4,6\) in the support of \(V_{1}+V_{2}+V_{3}\) have the respective probabilities \(1/8,1/8,1/8,2/8,1/8,1/8,1/8\). Obviously, this procedure can be continued for \(n=4,5,\cdots\), but it is rather tedious. However, even though \(V_{1},V_{2},\cdots ,V_{n}\) are not identically distributed random variables, the sum \(V\) of them still has an approximate normal distribution. To obtain this normal approximation for \(V\) (or \(W\)), a more general form of the central limit theorem can be used that allows us to say that the standardized random variable

\[Z=\frac{W-0}{\sqrt{n(n+1)(2n+1)/6}}\]

is approximately \(N(0,1)\) when \(H_{0}\) is true. We accept this without proof. Therefore, we can approximate probabilities such as

\[\mathbb{P}(W\geq c;H_{0})\approx\mathbb{P}\left (Z\geq\frac{c}{\sqrt{n(n+1)(2n+1)/6}}\right )\]

when the sample size \(n\) is sufficiently large using this normal distribution.

Example. The moment-generating function of \(W\) or \(V\) is given by

\[M(t)=\prod_{i=1}^{n} \frac{e^{-it}+e^{it}}{2}.\]

Using computer software, it is possible to expand \(M(t)\) and find the coefficient of \(e^{kt}\), which equals \(\mathbb{P}(W=k)\). The distribution of \(W\) can be approximated by the normal distribution \(N(0,n(n+1)(2n+1)/6)\). \(\sharp\)
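The expansion of \(M(t)\) amounts to repeated convolution, which is simple to program. A minimal sketch in Python (the function name is illustrative); it reproduces the \(n=3\) distribution found above and extends to any \(n\):

\begin{verbatim}
from collections import defaultdict

def wilcoxon_null_dist(n):
    """Exact null distribution of W, the sum of signed ranks.

    Convolves the distributions of V_i = +i or -i (each sign with
    probability 1/2), mirroring the m.g.f. product above."""
    dist = {0: 1.0}
    for i in range(1, n + 1):
        new = defaultdict(float)
        for v, p in dist.items():
            new[v + i] += p / 2
            new[v - i] += p / 2
        dist = dict(new)
    return dist

print(sorted(wilcoxon_null_dist(3).items()))
# P = 1/8 at -6, -4, -2, 2, 4, 6 and P = 2/8 at 0
\end{verbatim}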

Example. Let \(m\) be the median of a symmetric distribution of the continuous type. To test the hypothesis \(H_{0}:m=160\) against the alternative hypothesis \(H_{1}:m>160\), we take a random sample of size \(n=16\). For an approximate significance level of \(\alpha =0.05\), \(H_{0}\) is rejected if the computed \(W=w\) satisfies

\[z=\frac{w}{\sqrt{16\cdot 17\cdot 33/6}}\geq 1.645\]

or

\[w\geq 1.645\cdot\sqrt{\frac{16\cdot17\cdot33}{6}}=63.626.\]

Suppose that the observed values of the random sample are

\[\begin{array}{l}
176.9, 158.3, 152.1, 158.8, 172.4, 169.8, 159.7, 162.7\\
156.6, 174.5, 184.4, 165.2, 147.8, 177.8, 160.1, 160.5
\end{array}\]

In the following table, the magnitudes of the differences \(|x_{i}-160|\) have been ordered and ranked

\[\begin{array}{cccccccc}
\hline 0.1(1) & 0.3(-2) & 0.5(3) & 1.2(-4) & 1.7(-5) & 2.7(6) & 3.4(-7) & 5.2(8)\\
7.9(-9) & 9.8(10) & 12.2(-11) & 12.4(12) & 14.5(13) & 16.9(14) & 17.8(15) & 24.4(16)\\
\hline\end{array}\]

For this set of data

\[w=1-2+3-4-5+6+\cdots +16=60.\]

Since \(60<63.626\), \(H_{0}\) is not rejected at the \(0.05\) significance level. It is interesting to note that \(H_{0}\) would have been rejected at \(\alpha =0.1\), since the approximate \(p\)-value is, making a unit correction for continuity,

\[p\mbox{-value}=\mathbb{P}(W\geq 60)=\mathbb{P}\left (\frac{W-0}{\sqrt{16\cdot 17\cdot 33/6}}\geq\frac{59-0}{\sqrt{16\cdot 17\cdot 33/6}}\right )\approx\mathbb{P}(Z\geq 1.525)=0.0636.\]

This would indicate to us that the data are too few to reject \(H_{0}\), but if the pattern continues, we shall most certainly reject with a larger sample size. \(\sharp\)
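The computations of this example can be reproduced as follows; a sketch assuming Python with NumPy and SciPy (the data are those listed above, which contain no ties):

\begin{verbatim}
import numpy as np
from scipy.stats import norm

x = np.array([176.9, 158.3, 152.1, 158.8, 172.4, 169.8, 159.7, 162.7,
              156.6, 174.5, 184.4, 165.2, 147.8, 177.8, 160.1, 160.5])
d = x - 160.0

ranks = np.argsort(np.argsort(np.abs(d))) + 1
w = int(np.sum(np.sign(d) * ranks))
sd = np.sqrt(16 * 17 * 33 / 6)     # standard deviation of W under H0
print(w)                           # 60
print(1 - norm.cdf((w - 1) / sd))  # p-value, about 0.0636
\end{verbatim}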

Although, theoretically, we could ignore the possibilities that \(x_{i}=m_{0}\) for some \(i\) and that \(|x_{i}-m_{0}|=|x_{j}-m_{0}|\) for some \(i\neq j\), these situations do occur in applications. Usually, in practice, if \(x_{i}=m_{0}\) for some \(i\), that observation is deleted and the test is performed with a reduced sample size. If the absolute values of the differences from \(m_{0}\) of two or more observations are equal, each observation is assigned the average of the corresponding ranks. The change this causes in the distribution of \(W\) is not very great, and thus we continue using the same normal approximation.

Example. In Example \ref{ch4ex3}, we gave some paired data for percentage of body fat measured at the beginning and the end of a semester. Let \(m\) equal the median of the differences, \(x-y\). We shall use the Wilcoxon statistic to test the null hypothesis \(H_{0}:m=0\) against the alternative hypothesis \(H_{1}:m>0\). Since there are \(n=25\) nonzero differences, we reject \(H_{0}\) if

\[z=\frac{w-0}{\sqrt{25\cdot 26\cdot 51/6}}\geq 1.645\]

or equivalently, if

\[w\geq 1.645\cdot\sqrt{\frac{25\cdot 26\cdot 51}{6}}=122.27\]

at an approximate \(\alpha =0.05\) significance level. We calculate the differences, yielding the ordered absolute values in the following table. Note that in the case of ties, the average of the ranks of the tied measurements is given

\[\begin{array}{cccccc}
\hline 0.1(1) & 0.2(2.5) & 0.2(2.5) & 0.4(5) & 0.4(-5) & 0.4(-5)\\
0.6(7) & 0.8(8) & 1.0(-9) & 1.1(10) & 1.4(11) & 1.5(-12)\\
1.8(13) & 1.9(14) & 2.0(15) & 2.2(16) & 2.3(-17) & 2.7(18)\\
3.0(19) & 3.1(-20) & 3.2(21) & 3.4(-22) & 5.1(-23) & 5.8(-24)\\
9.2(25) &&&&&\\
\hline\end{array}\]

The value of the Wilcoxon statistic is

\[w=1+2.5+2.5+5-5-5+\cdots +25=51.\]

Since \(51<122.27\), we fail to reject the null hypothesis. The approximate \(p\)-value of this test, making a unit correction for continuity, is

\begin{align*} p\mbox{-value} & =\mathbb{P}(W\geq 51)\\ & \approx\mathbb{P}\left (Z\geq\frac{50-0}{\sqrt{25\cdot 26\cdot 51/6}}\right )\\ & =\mathbb{P}(Z\geq 0.673)=0.2505.\end{align*}

Two-Sample Distribution-Free Tests.

Now, we consider corresponding tests associated with characteristics of two distributions. The first test corresponds to the sign test and is called the median test. Let \(m_{X}\) and \(m_{Y}\) be the respective medians of two distributions of the continuous type. By taking independent random samples \(X_{1},X_{2},\cdots ,X_{n_{1}}\) and \(Y_{1},Y_{2},\cdots ,Y_{n_{2}}\) from these two distributions, respectively, we wish to test the hypothesis \(H_{0}:m_{X}=m_{Y}\). To do this, we combine the two samples and count the number \(V\) of \(X\) values in the lower half of this combined sample. If \(H_{0}:m_{X}=m_{Y}\) is true, then we would expect \(V\) to equal some number around \(n_{1}/2\). If, as an alternative, \(m_{X}<m_{Y}\), we would expect \(V\) to be somewhat larger, and the alternative \(m_{X}>m_{Y}\) would suggest a smaller value of \(V\). Let us see what this means in terms of the distribution functions \(F(x)\) and \(G(x)\) of the respective distributions. If \(F(z)=G(z)\), then \(H_{0}:m_{X}=m_{Y}\) is true. Since we cannot find the distribution of \(V\) knowing only that \(m_{X}=m_{Y}\), we shall find its distribution assuming that \(F(z)=G(z)\). If \(F(z)\geq G(z)\), then \(m_{X}\leq m_{Y}\). If the observed value of \(V\) is quite large, that is, if the number of values of \(X\) falling below the median of the combined sample is large, we would suspect that \(m_{X}<m_{Y}\). Therefore, the critical region for testing \(H_{0}:m_{X}=m_{Y}\) against \(H_{1}:m_{X}<m_{Y}\) is of the form \(v\geq c\), where \(c\) is to be determined to yield the desired significance level (when \(F(z)=G(z)\)). Similarly, the critical region for testing \(H_{0}:m_{X}=m_{Y}\) against \(H_{1}:m_{X}>m_{Y}\) is of the form \(v\leq c\). When \(F(z)=G(z)\) is true and still assuming continuous-type distributions, we shall argue that \(V\) has a hypergeometric distribution. To simplify the discussion, assume that \(n_{1}+n_{2}=2k\), where \(k\) is a positive integer. To compute \(\mathbb{P}(V=v)\), we need the probability that exactly \(v\) of \(X_{1},X_{2},\cdots ,X_{n_{1}}\) are in the lower half of the ordered combined sample. (Under our assumptions, the probability is zero that any two of the \(2k\) random variables are equal.) The smallest \(k\) of the \(n_{1}+n_{2}=2k\) items can be selected in any one of \(C^{2k}_{k}=C^{n_{1}+n_{2}}_{k}\) ways, each having the same probability, provided that \(F(z)=G(z)\). Of these \(C^{2k}_{k}\) ways, the number in which exactly \(v\) of the \(n_{1}\) values of \(X\) and \(k-v\) of the \(n_{2}\) values of \(Y\) appear in the lower \(k\) items is \(C^{n_{1}}_{v}\cdot C^{n_{2}}_{k-v}\). Therefore, we have

\[h(v)=\mathbb{P}(V=v)=\frac{C^{n_{1}}_{v}\cdot C^{n_{2}}_{k-v}}{C^{n_{1}+n_{2}}_{k}}\]

for \(v=0,1,\cdots ,n_{1}\), where it is understood that \(C^{j}_{i}=0\) if \(i>j\).

\begin{equation}{\label{ch4ex4}}\tag{2}\mbox{}\end{equation}

Example \ref{ch4ex4}. Let \(X\) and \(Y\) denote the weights of ground cinnamon in “$115$-gram” tins packaged by companies \(A\) and \(B\), respectively. We shall test the hypothesis \(H_{0}:m_{X}=m_{Y}\) against the one-sided alternative hypothesis \(H_{1}:m_{X}<m_{Y}\). The weights of \(n_{1}=8\) and \(n_{2}=8\) tins of cinnamon packaged by companies \(A\) and \(B\), respectively, selected at random, yielded the following observations of \(X\)

\[\begin{array}{cccccccc}
117.1 & 121.3 & 127.8 & 121.9 & 117.4 & 124.5 & 119.5 & 115.1
\end{array}\]

and the following observations of \(Y\)

\[\begin{array}{cccccccc}
123.5 & 125.3 & 126.5 & 127.9 & 122.1 & 125.6 & 129.8 & 117.2
\end{array}\]

The critical region is of the form \(v\geq c\). To determine the value of \(c\) when \(F(z)=G(z)\), we compute \(\mathbb{P}(V=v)\) for \(v=6,7,8\). Therefore, we have

\begin{align*}
h(8) & =\frac{C^{8}_{8}\cdot C^{8}_{0}}{C^{8+8}_{8}}=\frac{1\cdot 1}{12870}=\frac{1}{12870};\\
h(7) & =\frac{C^{8}_{7}\cdot C^{8}_{1}}{C^{8+8}_{8}}=\frac{8\cdot 8}{12870}=\frac{64}{12870};\\
h(6) & =\frac{C^{8}_{6}\cdot C^{8}_{2}}{C^{8+8}_{8}}=\frac{28\cdot 28}{12870}=\frac{784}{12870}.
\end{align*}

Since

\[\mathbb{P}(V\geq 6)=h(8)+h(7)+h(6)=\frac{849}{12870}=0.066,\]

we shall reject \(H_{0}\) if \(v\geq 6\) at an \(\alpha =0.066\) significance level. The combined ordered sample of weights is listed below

\[\begin{array}{cccccccc}
115.1(x) & 117.1(x) & 117.2(y) & 117.4(x) & 119.5(x) & 121.3(x) & 121.9(x) & 122.1(y)\\
123.5(y) & 124.5(x) & 125.3(y) & 125.6(y) & 126.5(y) & 127.8(x) & 127.9(y) & 129.8(y)
\end{array}\]

We see that \(v=6\), and thus \(H_{0}\) is rejected with \(\alpha =0.066\). Note that the \(p\)-value of this test is \(p\mbox{-value}=\mathbb{P}(V\geq 6)=0.066\). \(\sharp\)
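The hypergeometric probabilities in this example can be checked with software; a sketch assuming Python with SciPy:

\begin{verbatim}
from scipy.stats import hypergeom

# V = number of X values among the lower k = 8 of the 16 combined
# observations; under F = G, V is hypergeometric.
n1, n2, k = 8, 8, 8
rv = hypergeom(M=n1 + n2, n=n1, N=k)  # population 16, 8 X's, draw 8

print(rv.pmf(8), rv.pmf(7), rv.pmf(6))
print(rv.pmf(6) + rv.pmf(7) + rv.pmf(8))  # P(V >= 6) = 0.066
\end{verbatim}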

The median test can be generalized as easily as the sign test. Instead of letting \(V\) be the number of \(X\) values in the lower half of the combined sample of size \(n_{1}+n_{2}\), let \(V\) be the number of \(X\) values in the lower \(i\) values of the combined sample. Thus, if \(i/(n_{1}+n_{2})=p\), we would use \(V\) to test the equality of the \((100p)\)th percentiles of the two distributions. If \(V\) is much larger than \(n_{1}p\), we would suspect that the \((100p)\)th percentile of the \(X\) distribution is smaller than that of the \(Y\) distribution. If \(V\) is much smaller than \(n_{1}p\), we would guess that it would be the other way around. Since, under \(F(z)=G(z)\), all orderings of the \(n_{1}\) values of \(X\) and \(n_{2}\) values of \(Y\) have the same probability, the p.d.f. of \(V\) is

\[h(v)=\mathbb{P}(V=v)=\frac{C^{n_{1}}_{v}\cdot C^{n_{2}}_{i-v}}{C^{n_{1}+n_{2}}_{i}}\]

for \(v=0,1,2,\cdots ,n_{1}\).

Example. Using the data in Example \ref{ch4ex4}, we shall test the null hypothesis that the \(25\)th percentiles \(\pi_{0.25}\) of \(X\) and \(Y\) are equal against the alternative hypothesis that the \(25\)th percentile of \(X\) is less than the \(25\)th percentile of \(Y\). Let \(V\) equal the number of \(X\) values in the lower four values of the combined sample because \(i/(n_{1}+n_{2})=4/(8+8)=0.25\). The critical region is of the form \(v\geq c\). To determine the value of \(c\), we use the p.d.f. of \(V\) given by

\[h(v)=\mathbb{P}(V=v)=\frac{C^{8}_{v}\cdot C^{8}_{4-v}}{C^{16}_{4}}\]

for \(v=0,1,2,3,4\). Now, we have

\[h(4)=\frac{70}{1820}=0.038\mbox{ and }h(3)=\frac{448}{1820}=0.246.\]

Therefore, we take for our critical region \(C=\{v:v\geq 4\}\), which gives a significance level of \(\alpha =0.038\). From Example \ref{ch4ex4} we see that three of the first four observations in the ordered arrangement are values of \(X\) and thus \(v=3\), so we fail to reject the null hypothesis. Furthermore, the \(p\)-value of this test is

\[p\mbox{-value}=\mathbb{P}(V\geq 3)=0.246+0.038=0.284.\]

There is another method for testing the equality of the medians of two distributions of the continuous type that uses the magnitudes of the observations. For this test, it is assumed that the populations have similar shapes. Order the combined sample of \(X_{1},X_{2},\cdots ,X_{n_{1}}\) and \(Y_{1},Y_{2},\cdots ,Y_{n_{2}}\) in increasing order of magnitude. Assign to the ordered values the ranks \(1,2,\cdots ,n_{1}+n_{2}\). In the case of ties, assign the average of the
ranks associated with the tied values. Let \(W\) equal the sum of the ranks of \(Y_{1},Y_{2},\cdots ,Y_{n_{2}}\). If the distribution of \(Y\) is shifted
to the right of that of \(X\), the values of \(Y\) would tend to be larger than the values of \(X\) and \(W\) would usually be larger than expected when \(F(z)=G(z)\). Thus, the critical region for testing the hypothesis \(H_{0}:m_{X}=m_{Y}\) against \(H_{1}:m_{X}<m_{Y}\) would be of the form \(w\geq c\). Similarly, if the alternative hypothesis is \(H_{1}:m_{X}>m_{Y}\), the critical region would be of the form \(w\leq c\). We shall not derive the distribution of \(W\). However, if \(n_{1}\) and \(n_{2}\) are both larger than \(7\), a normal approximation can be used. With \(F(z)=G(z)\), the mean and variance of \(W\) are

\[\mathbb{E}(W)=\frac{n_{2}(n_{1}+n_{2}+1)}{2}\]

and

\[\mbox{Var}(W)=\frac{n_{1}n_{2}(n_{1}+n_{2}+1)}{12}.\]

Therefore, the statistic

\[Z=\frac{W-n_{2}(n_{1}+n_{2}+1)/2}{\sqrt{n_{1}n_{2}(n_{1}+n_{2}+1)/12}}\]

is approximately \(N(0,1)\).

Example. We illustrate the two-sample Wilcoxon test using the data given in Example \ref{ch4ex4}. The critical region for testing \(H_{0}:m_{X}=m_{Y}\)
against \(H_{1}:m_{X}<m_{Y}\) is of the form \(w\geq c\). Since \(n_{1}=n_{2}=8\), at an approximate \(\alpha =0.05\) significance level, \(H_{0}\) is rejected if

\[z=\frac{w-8(8+8+1)/2}{\sqrt{8\cdot 8\cdot (8+8+1)/12}}>1.645,\]

that is, if
\[w>1.645\cdot\sqrt{\frac{8\cdot 8\cdot (8+8+1)}{12}}+4\cdot 17=83.66.\]

From the data we see that the computed \(W\) is

\[w=3+8+9+11+12+13+15+16=87>83.66.\]

Thus \(H_{0}\) is rejected, an action consistent with that of the median test. The \(p\)-value of this test is, making a half-unit correction for continuity,

\begin{align*} p\mbox{-value} & =\mathbb{P}(W\geq 87)\\ & =\mathbb{P}\left (\frac{W-68}{\sqrt{90.667}}\geq
\frac{86.5-68}{\sqrt{90.667}}\right )\\ & \approx\mathbb{P}(Z\geq 1.943)=0.026.\end{align*}
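The rank-sum computation and its normal approximation may be sketched as follows in Python with NumPy and SciPy (the cinnamon data above contain no ties, so simple ranking suffices):

\begin{verbatim}
import numpy as np
from scipy.stats import norm

x = [117.1, 121.3, 127.8, 121.9, 117.4, 124.5, 119.5, 115.1]
y = [123.5, 125.3, 126.5, 127.9, 122.1, 125.6, 129.8, 117.2]

combined = np.array(x + y)
ranks = np.argsort(np.argsort(combined)) + 1  # ranks 1..16
w = int(ranks[len(x):].sum())                 # sum of the Y ranks

n1, n2 = len(x), len(y)
mean = n2 * (n1 + n2 + 1) / 2                 # 68
sd = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
print(w)                                      # 87
print(1 - norm.cdf((w - 0.5 - mean) / sd))    # p-value, about 0.026
\end{verbatim}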

It should be noted that the median and Wilcoxon tests are much less sensitive to extreme values than is the \(t\) test based on \(\bar{X}-\bar{Y}\). Therefore, if there is much skewness or contamination, these proposed distribution-free tests are much safer. In particular, that of Wilcoxon is quite good and does not lose too much in case the distributions are close to normal ones. It is important to note that the one-sample Wilcoxon requires symmetry of the underlying distribution, but the two-sample Wilcoxon does not and thus can be used for skewed distributions.

Example. Let \(X\) have a Cauchy distribution, and let \(Y\) have a standard normal distribution \(N(0,1)\). The assumptions of the Wilcoxon test are not satisfied, but the distributions of \(X\) and \(Y\) have “similar shapes”. We shall simulate on the computer random samples of size \(n_{1}=n_{2}=12\) from these distributions and use the Wilcoxon statistic to test the null hypothesis \(H_{0}:m_{X}=m_{Y}\) against a two-sided alternative hypothesis \(H_{1}:m_{X}\neq m_{Y}\). The \(n_{1}=12\) observations of \(X\), which have been ordered, are

\[\begin{array}{cccccc}
-9.7465 & -1.9458 & -1.5203 & -1.3990 & -0.7215 & -0.1802\\
0.1219 & 0.2104 & 0.9365 & 2.3121 & 7.5518 & 13.7523
\end{array}\]

The \(n_{2}=12\) observations of \(Y\), which have been ordered, are

\[\begin{array}{cccccc}
-1.5124 & -1.3661 & -1.0712 & -0.6178 & -0.3631 & -0.2465\\
0.1484 & 0.2688 & 0.4528 & 1.0480 & 1.2914 & 1.7841
\end{array}\]

The combined ordered samples are listed below

\[\begin{array}{cccccc}
-9.7465(1x) & -1.9458(2x) & -1.5203(3x) & -1.5124(4y) & -1.3990(5x) & -1.3661(6y)\\
-1.0712(7y) & -0.7215(8x) & -0.6178(9y) & -0.3631(10y) & -0.2465(11y) & -0.1802(12x)\\
0.1219(13x) & 0.1484(14y) & 0.2104(15x) & 0.2688(16y) & 0.4528(17y) & 0.9365(18x)\\
1.0480(19y) & 1.2914(20y) & 1.7841(21y) & 2.3121(22x) & 7.5518(23x) & 13.7523(24x)
\end{array}\]

The value of the Wilcoxon statistic, the sum of the \(Y\) ranks, is

\[w=4+6+7+9+10+11+14+16+17+19+20+21=154.\]

The sum of the \(X\) ranks is \(300-154=146\). Now, we have

\[z=\frac{154-(12\cdot 25/2)}{\sqrt{12\cdot 12\cdot 25/12}}=\frac{154-150}{\sqrt{300}}=0.23.\]

Therefore \(H_{0}\) is clearly accepted. That is, we accept the null hypothesis that the medians of \(X\) and \(Y\) are equal, as we expected.

\begin{equation}{\label{e}}\tag{E}\mbox{}\end{equation}

Run Test and Test for Randomness.

Under the assumption that the random variables \(X\) and \(Y\) are of the continuous type and have distribution functions \(F(x)\) and \(G(y)\), respectively, we describe another test of the hypothesis \(H_{0}:F(z)=G(z)\). This new test can also be used to test for randomness. Suppose that we have \(n_{1}\) observations of the random variable \(X\) and \(n_{2}\) observations of the random variable \(Y\). The combination of the two sets of independent observations into one collection of \(n_{1}+n_{2}\) observations, placed in ascending order of magnitude, might yield an arrangement such as

\[\underline{yyy}\underline{xx}\underline{y}\underline{x}\underline{y}\underline{xx}\underline{yy}\]

where \(x\) denotes an observation of \(X\) and \(y\) an observation of \(Y\) in the ordered arrangement. We have underlined groups of successive values of \(X\) and \(Y\). Each underlined group is called a run. Therefore, we have a run of three values of \(Y\), followed by a run of two values of \(X\), followed by a run of one value of \(Y\), and so on. In this example there are seven runs. We give two more examples to show what might be indicated by the number of runs. If the five \(x\)’s and seven \(y\)’s had the ordering

\[\underline{xxxx}\underline{y}\underline{x}\underline{yyyyyy}\]

we might suspect that \(F(z)\geq G(z)\). Note that there are four runs in this ordering. The ordered arrangement

\[\underline{yyy}\underline{xx}\underline{y}\underline{xxx}\underline{yyy}\]

might suggest that the medians of the two distributions are equal but that the spread of the \(Y\) distribution is greater than the spread of the \(X\) distribution, for example, that \(\sigma_{Y}>\sigma_{X}\). These examples suggest that the hypothesis \(F(z)=G(z)\) should be rejected if the number of runs is too small, where a small number of runs could be caused by differences in the location or in the spread of the two distributions. Let the random variable \(R\) equal the number of runs in the combined ordered sample of \(n_{1}\) observations of \(X\) and \(n_{2}\) observations of \(Y\). We shall find the distribution of \(R\) when \(F(z)=G(z)\) and then describe a test of the hypothesis \(H_{0}:F(z)=G(z)\). Under \(H_{0}\), all permutations of the \(n_{1}\) observations of \(X\) and \(n_{2}\) observations of \(Y\) have equal probabilities. We can select the \(n_{1}\) positions for the \(n_{1}\) values of \(X\) in \(C^{n_{1}+n_{2}}_{n_{1}}\) ways, the probability of each arrangement being \(1/C^{n_{1}+n_{2}}_{n_{1}}\). To find \(\mathbb{P}(R=r)\), we must determine the number of arrangements that yield \(r\) runs. First suppose that \(r=2k\), where \(k\) is a positive integer. In this case the \(n_{1}\) ordered values of \(X\) and the \(n_{2}\) ordered values of \(Y\) must each be separated into \(k\) runs. We can form \(k\) runs of the \(n_{1}\) values of \(X\) by inserting \(k-1\) dividers into the \(n_{1}-1\) spaces between the values of \(X\), with no more than one divider per space. This can be done in \(C^{n_{1}-1}_{k-1}\) ways. Similarly, \(k\) runs of the \(n_{2}\) values of \(Y\) can be formed in \(C^{n_{2}-1}_{k-1}\) ways. These two sets of runs can be placed together to form \(r=2k\) runs, of which \(C^{n_{1}-1}_{k-1}\cdot C^{n_{2}-1}_{k-1}\) begin with a run of \(x\)’s and \(C^{n_{2}-1}_{k-1}\cdot C^{n_{1}-1}_{k-1}\) begin with a run of \(y\)’s. Thus

\[\mathbb{P}(R=2k)=\frac{2C^{n_{1}-1}_{k-1}\cdot C^{n_{2}-1}_{k-1}}{C^{n_{1}+n_{2}}_{n_{2}}},\]

where \(2k\) is an element of the space of \(R\). When \(r=2k+1\), it is possible to have \(k+1\) runs of the ordered values of \(X\) and \(k\) runs of the ordered values of \(Y\), or \(k\) runs of \(X\)’s and \(k+1\) runs of \(Y\)’s. We can form \(k+1\) runs of the \(n_{1}\) values of \(X\) by inserting \(k\) dividers into the \(n_{1}-1\) spaces between the values of \(X\), with no more than one divider per space, in \(C^{n_{1}-1}_{k}\) ways. Similarly, \(k\) runs of the \(n_{2}\) values of \(Y\) can be formed in \(C^{n_{2}-1}_{k-1}\) ways. These two sets of runs can be placed together to form \(2k+1\) runs in \(C^{n_{1}-1}_{k}\cdot C^{n_{2}-1}_{k-1}\) ways. In addition, \(k+1\) runs of the \(n_{2}\) values of \(Y\) and \(k\) runs of the \(n_{1}\) values of \(X\) can be placed together to form \(C^{n_{2}-1}_{k}\cdot C^{n_{1}-1}_{k-1}\) sets of \(2k+1\) runs. Hence

\[\mathbb{P}(R=2k+1)=\frac{C^{n_{1}-1}_{k}\cdot C^{n_{2}-1}_{k-1}+C^{n_{1}-1}_{k-1}\cdot C^{n_{2}-1}_{k}}{C^{n_{1}+n_{2}}_{n_{2}}},\]

for \(2k+1\) in the space of \(R\). A test based on the number of runs can be used for testing the hypothesis \(H_{0}:F(z)=G(z)\). The hypothesis is rejected if the observed number of runs \(r\) is too small. That is, the critical region is of the form \(r\leq c\), where the constant \(c\) is determined by using the p.d.f. of \(R\) to yield the desired significance level. The run test is sensitive to both differences in location and differences in spread of the two distributions.
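
The two formulas above are straightforward to implement. The following is a minimal Python sketch, using only the standard library; the function name runs_pmf is ours, chosen for illustration.

\begin{verbatim}
# Exact p.m.f. of the number of runs R under H0: F(z) = G(z).
from math import comb

def runs_pmf(n1, n2):
    total = comb(n1 + n2, n1)
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k = r // 2
        if r % 2 == 0:    # r = 2k
            num = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        else:             # r = 2k + 1
            num = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                   + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        if num:
            pmf[r] = num / total
    return pmf

pmf = runs_pmf(10, 10)
print(pmf[2], pmf[7])        # 2/184756 and 6048/184756
print(sum(pmf.values()))     # 1.0, a useful sanity check
\end{verbatim}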

\begin{equation}{\label{ch4ex5}}\tag{3}\mbox{}\end{equation}

Example \ref{ch4ex5}. Let \(X\) and \(Y\) equal the percentages of body fat for freshman women and men, respectively, with distribution functions \(F(x)\) and \(G(y)\). We shall use the run test to test the hypothesis \(H_{0}:F(z)=G(z)\) against the alternative hypothesis \(H_{1}:F(z)<G(z)\). (That is, the alternative hypothesis is that the \(X\) distribution is to the right of the \(Y\) distribution.) Ten observations of both \(X\) and \(Y\) that have been ordered are

\[\begin{array}{ccccccccccc}
X: & 16.6 & 16.7 & 18.5 & 19.2 & 21.5 & 22.4 & 22.6 & 23.2 & 24.2 & 26.3\\
Y: & 9.4 & 9.7 & 11.3 & 11.8 & 13.3 & 15.6 & 16.1 & 16.5 & 18.2 & 21.7
\end{array}\]

The critical region is of the form \(r\leq c\). Using the statistic table, we have

\begin{align*}
\mathbb{P}(R=2)=\frac{2}{184756} & \mathbb{P}(R=3)=\frac{18}{184756}\\
\mathbb{P}(R=4)=\frac{162}{184756} & \mathbb{P}(R=5)=\frac{648}{184756}\\
\mathbb{P}(R=6)=\frac{2592}{184756} & \mathbb{P}(R=7)=\frac{6048}{184756}.
\end{align*}

The sum of these probabilities is \(9470/184756=0.051\). Therefore, we can take for our critical region \(C=\{r:r\leq 7\}\) with a significance level of \(\alpha =0.051\). To determine the number of runs, we order the combined sample and underline each run of adjacent \(x\) values or \(y\) values.

\[\begin{array}{l}
\underline{\begin{array}{cccccccc}
9.4 & 9.7 & 11.3 & 11.8 & 13.3 & 15.6 & 16.1 & 16.5
\end{array}}
\underline{\begin{array}{cc}
16.6 & 16.7
\end{array}}\\
\underline{18.2}\underline{\begin{array}{ccc}
18.5 & 19.2 & 21.5
\end{array}}\underline{21.7}\underline{\begin{array}{ccccc}
22.4 & 22.6 & 23.2 & 24.2 & 26.3
\end{array}}
\end{array}\]

We see that the number of runs is \(r=6\). Therefore, we reject the null hypothesis. Note that the \(p\)-value of this test is

\[p\mbox{-value}=\mathbb{P}(R\leq 6)=\frac{3422}{184756}=0.0185.\]
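
Both the run count and the exact \(p\)-value can be verified directly; the following minimal Python sketch applies the run-distribution formulas above to the body-fat data.

\begin{verbatim}
# Run count and exact p-value P(R <= r) for the body-fat samples.
from math import comb

x = [16.6, 16.7, 18.5, 19.2, 21.5, 22.4, 22.6, 23.2, 24.2, 26.3]
y = [9.4, 9.7, 11.3, 11.8, 13.3, 15.6, 16.1, 16.5, 18.2, 21.7]

labels = [lab for _, lab in sorted([(v, 'x') for v in x] +
                                   [(v, 'y') for v in y])]
r = 1 + sum(a != b for a, b in zip(labels, labels[1:]))   # 6 runs

n1, n2 = len(x), len(y)
def num(s):                       # number of arrangements with s runs
    k = s // 2
    if s % 2 == 0:
        return 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    return (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
            + comb(n1 - 1, k - 1) * comb(n2 - 1, k))

p_value = sum(num(s) for s in range(2, r + 1)) / comb(n1 + n2, n1)
print(r, p_value)                 # 6, 3422/184756 = 0.0185...
\end{verbatim}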

When \(n_{1}\) and \(n_{2}\) are large, say each is at least \(10\), \(R\) can be approximated by a normally distributed random variable. That is, it can be shown that

\[\mathbb{E}(R)=\frac{2n_{1}n_{2}}{n_{1}+n_{2}}+1\]

and

\[\mbox{Var}(R)=\frac{[\mathbb{E}(R)-1][\mathbb{E}(R)-2]}{n_{1}+n_{2}-1}=\frac{2n_{1}n_{2}(2n_{1}n_{2}-n_{1}-n_{2})}{(n_{1}+n_{2})^{2}(n_{1}+n_{2}-1)}\]

and

\[Z=\frac{R-\mathbb{E}(R)}{\sqrt{\mbox{Var}(R)}}\]

is approximately \(N(0,1)\). The critical region for testing the null hypothesis \(H_{0}:F(z)=G(z)\) is of the form \(z\leq -z_{\alpha}\), where \(\alpha\) is the desired significance level.

Example. We use the normal approximation to calculate the significance level and the \(p\)-value for Example \ref{ch4ex5}. With \(n_{1}=n_{2}=10\),

\[\mathbb{E}(R)=\frac{2\cdot 10\cdot 10}{10+10}+1=11\]

and

\[\mbox{Var}(R)=\frac{(11-1)(11-2)}{19}=\frac{90}{19}.\]

With the critical region \(C=\{r:r\leq 7\}\), the approximate significance level, using a half unit correction for continuity, is

\begin{align*} \alpha & =\mathbb{P}(R\leq 7)\\ & =\mathbb{P}\left (\frac{R-11}{\sqrt{90/19}}\leq\frac{7.5-11}{\sqrt{90/19}}\right )\\ & \approx\mathbb{P}(Z\leq -1.608)=0.0539.\end{align*}

Note that this value compares very favorably with \(\alpha =0.051\) given in Example \ref{ch4ex5}. Since \(r=6\), the approximate \(p\)-value, using a
normal approximation, is

\begin{align*} p\mbox{-value} & =\mathbb{P}(R\leq 6)\approx\mathbb{P}\left (Z\leq\frac{6.5-11}{\sqrt{90/19}}\right )\\ & =\mathbb{P}(Z\leq -2.068)=0.0193,\end{align*}

which is close to the \(p\)-value given in Example \ref{ch4ex5}. \(\sharp\)
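
The approximation is a one-line computation once \(\mathbb{E}(R)\) and \(\mbox{Var}(R)\) are in hand. A minimal Python sketch, assuming SciPy for the standard normal distribution function, reproduces both numbers.

\begin{verbatim}
# Normal approximation to the run test with continuity correction.
from math import sqrt
from scipy.stats import norm

n1 = n2 = 10
mean_r = 2 * n1 * n2 / (n1 + n2) + 1                 # E(R) = 11
var_r = (mean_r - 1) * (mean_r - 2) / (n1 + n2 - 1)  # 90/19

alpha = norm.cdf((7.5 - mean_r) / sqrt(var_r))   # ~ 0.0539
p_val = norm.cdf((6.5 - mean_r) / sqrt(var_r))   # ~ 0.0193
print(alpha, p_val)
\end{verbatim}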

Applications of the run test include tests for randomness. Let \(x_{1},x_{2},\cdots ,x_{k}\) be the observed values of a random variable \(X\), where the subscripts now designate the order in which the outcomes were observed and the observations are not arranged in the order of magnitude. Assume that \(k\) is even. The median divides the \(k\) numbers into a lower and upper half. Replace each observation by \(L\) if it falls below the median and by \(U\) if it falls above the median. Then, for example, a sequence such as

\[\begin{array}{cccccccc}
U & U & U & L & U & L & L & L
\end{array}\]

might suggest a trend toward decreasing values of \(X\). If a trend is the alternative hypothesis to randomness, the critical region would be of the form \(r\leq c\). On the other hand, if we have a sequence such as

\[\begin{array}{cccccccc}
U & L & U & L & U & L & U & L
\end{array}\]

we would suspect a cyclic effect and would reject the hypothesis of randomness if \(r\) were too large. To test for both trend and cyclic effect, the critical region for testing the hypothesis of randomness is of the form \(r\leq c_{1}\) or \(r\geq c_{2}\). If the sample size \(k\) is odd, the numbers of observations in the “upper half” and the “lower half” will differ by one. In this case we always put the extra observation in the upper group, so that \(n_{2}=n_{1}+1\). If the median is equal to a value that is tied with other values, we again put the tied values in the upper group and then perform the test with \(n_{1}\) and \(n_{2}\) not equal to each other.

Example. We shall use a sample of size \(k=14\) to test for both trend and cyclic effect. To determine the critical region for rejecting the hypothesis of randomness, we use the p.d.f. of \(R\) with \(n_{1}=n_{2}=7\). Since

\begin{align*}
\mathbb{P}(R=2) & =\mathbb{P}(R=14)=\frac{2}{3432}\\
\mathbb{P}(R=3) & =\mathbb{P}(R=13)=\frac{12}{3432}\\
\mathbb{P}(R=4) & =\mathbb{P}(R=12)=\frac{72}{3432},
\end{align*}

the critical region \(\{r:r\leq 4\mbox{ or }r\geq 12\}\) would yield a test at a significance level of \(\alpha =172/3432=0.05\). The \(14\) observations are

\[\begin{array}{ccccccc}
81.4 & 76.3 & 85.6 & 76.4 & 88.4 & 80.2 & 85.6\\
84.6 & 78.3 & 82.8 & 88.1 & 85.4 & 87.7 & 86.6
\end{array}\]

The median of these outcomes is \((84.6+85.4)/2=85.0\). Replacing each outcome with \(L\) if it falls below \(85\) and \(U\) if it falls above \(85\) yields the sequence

\[\underline{LL}\underline{U}\underline{L}\underline{U}\underline{L}\underline{U}\underline{LLL}\underline{UUUU}.\]

Since \(r=8\), the hypothesis of randomness is not rejected. \(\sharp\)
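
The whole procedure reduces to a median split followed by a run count; a minimal Python sketch for the \(k=14\) observations above is given below.

\begin{verbatim}
# Run test for randomness: median split, then count runs of L and U.
import statistics

obs = [81.4, 76.3, 85.6, 76.4, 88.4, 80.2, 85.6,
       84.6, 78.3, 82.8, 88.1, 85.4, 87.7, 86.6]
med = statistics.median(obs)                     # 85.0

seq = ['U' if v > med else 'L' for v in obs]     # no ties with 85.0 here
r = 1 + sum(a != b for a, b in zip(seq, seq[1:]))
print(r)     # 8, which is inside (4, 12): randomness is not rejected
\end{verbatim}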

\begin{equation}{\label{f}}\tag{F}\mbox{}\end{equation}

Kolmogorov-Smirnov Goodness-of-Fit Test.

Now, we discuss a test that considers the goodness of fit between a hypothesized distribution function and the empirical distribution function. The empirical distribution function is defined in terms of the order statistics. Let \(y_{1}<y_{2}<\cdots <y_{n}\) be the observed values of the order statistics of a random sample \(x_{1},x_{2},\cdots ,x_{n}\) of size \(n\). When no two observations are equal, the empirical distribution function is defined by

\[F_{n}(x)=\left\{\begin{array}{ll}
0 & x<y_{1}\\
k/n & y_{k}\leq x<y_{k+1},\quad k=1,2,\cdots ,n-1\\
1 & y_{n}\leq x.
\end{array}\right .\]
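
Computationally, \(F_{n}(x)\) is just the fraction of observations that are at most \(x\). A minimal Python sketch of the definition, applied to the \(n=10\) sample of the uniform example further below, is:

\begin{verbatim}
# Empirical distribution function built directly from the definition.
def ecdf(sample):
    n = len(sample)
    return lambda x: sum(v <= x for v in sample) / n

F10 = ecdf([0.62, 0.36, 0.23, 0.76, 0.65, 0.09, 0.55, 0.26, 0.38, 0.24])
print(F10(0.65))    # 0.9: nine of the ten observations are <= 0.65
\end{verbatim}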

In this case, the empirical distribution function has a jump of magnitude \(1/n\) at each observation. If \(n_{k}\) observations are equal to \(x_{k}\), a jump of magnitude \(n_{k}/n\) occurs at \(x_{k}\). Let \(X_{1},X_{2},\cdots ,X_{n}\) denote a random sample of size \(n\) from a distribution of the continuous type with distribution function \(F(x)\). Consider a fixed value of \(x\). Then \(W=F_{n}(x)\), the value of the empirical distribution function at \(x\), can be thought of as a random variable that takes on the values \(0,1/n,2/n,\cdots ,1\). Now, we have \(nW=k\) if and only if exactly \(k\) observations are less than or equal to \(x\) (call this a “success”) and \(n-k\) observations are greater than \(x\). The probability that an observation is less than or equal to \(x\) is \(F(x)\); that is, the probability of success is \(F(x)\). Because of the independence of the random variables \(X_{1},X_{2},\cdots ,X_{n}\), the probability of \(k\) successes is given by the binomial distribution, namely

\[\mathbb{P}(nW=k)=\mathbb{P}(W=k/n)=C^{n}_{k}[F(x)]^{k}[1-F(x)]^{n-k}\]

for \(k=0,1,2,\cdots ,n\). Since \(nW\) has a binomial distribution with \(p=F(x)\), the mean and variance of \(nW\) are given by \(\mathbb{E}(nW)=nF(x)\) and \(\mbox{Var}(nW)=nF(x)[1-F(x)]\). Hence the mean and variance of \(W=F_{n}(x)\) are

\[\mathbb{E}[F_{n}(x)]=\mathbb{E}(W)=F(x)\]

and

\[\mbox{Var}[F_{n}(x)]=\mbox{Var}(W)=\frac{F(x)[1-F(x)]}{n}.\]

Since the variance of \(F_{n}(x)\) tends to zero as \(n\) becomes large, \(F_{n}(x)\) and its mean \(F(x)\) tend to be close to each other for large \(n\). As a matter of fact, there is a theorem of Glivenko which states that, with probability \(1\), \(F_{n}(x)\) converges to \(F(x)\) uniformly in \(x\) as \(n\rightarrow\infty\). Because of this convergence of the empirical distribution function to the theoretical distribution function, it makes sense to construct a goodness-of-fit test based on the closeness of the empirical and a hypothesized distribution function, say \(F_{n}(x)\) and \(F_{0}(x)\), respectively. We shall use the Kolmogorov-Smirnov statistic defined by

\[D_{n}=\sup_{x}|F_{n}(x)-F_{0}(x)|.\]

That is, \(D_{n}\) is the least upper bound of all pointwise differences \(|F_{n}(x)-F_{0}(x)|\). We would like to point out that the distribution of \(D_{n}\) does not depend on the particular continuous distribution function \(F_{0}(x)\). (This is essentially due to the fact that \(Y=F_{0}(X)\) has a uniform distribution \(U(0,1)\).) Thus \(D_{n}\) can be thought of as a distribution-free statistic. We are interested in using the Kolmogorov-Smirnov statistic \(D_{n}\) to test the null hypothesis \(H_{0}:F(x)=F_{0}(x)\) against all alternatives \(H_{1}:F(x)\neq F_{0}(x)\), where \(F_{0}\) is some specified distribution function. Intuitively, we accept \(H_{0}\) if the empirical distribution function \(F_{n}(x)\) is sufficiently close to \(F_{0}(x)\), that is, if the value of \(D_{n}\) is sufficiently small. The hypothesis \(H_{0}\) is rejected if the observed value of \(D_{n}\) is greater than the critical value selected from statistic Table VIII, where this critical value depends on the desired significance level and the sample size.

Example. We shall test the null hypothesis \(H_{0}:F(x)=F_{0}(x)\) against \(H_{1}:F(x)\neq F_{0}(x)\), where

\[F_{0}(x)=\left\{\begin{array}{ll}
0 & x<0\\
x & 0\leq x<1\\
1 & 1\leq x.
\end{array}\right .\]

That is, the null hypothesis is that \(X\) is \(U(0,1)\). If the test is based on a sample of size \(n=10\) and if \(\alpha =0.1\), the critical region is \(C=\{d_{10}:d_{10}\geq 0.37\}\), where \(d_{10}\) is the observed value of the Kolmogorov-Smirnov statistic \(D_{10}\). Suppose that the observed values of the random sample are

\[\begin{array}{cccccccccc}
0.62 & 0.36 & 0.23 & 0.76 & 0.65 & 0.09 & 0.55 & 0.26 & 0.38 & 0.24
\end{array}\]

We see that \(d_{10}=F_{10}(0.65)-F_{0}(0.65)=0.9-0.65=0.25<0.37\), and hence \(H_{0}\) is not rejected. \(\sharp\)
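
Because \(F_{0}\) is nondecreasing and \(F_{10}\) is a step function, the supremum need only be checked at the order statistics: \(D_{n}=\max_{i}\max\{i/n-F_{0}(y_{i}),\,F_{0}(y_{i})-(i-1)/n\}\). The following minimal Python sketch uses this standard reduction to confirm \(d_{10}=0.25\).

\begin{verbatim}
# Kolmogorov-Smirnov statistic for the uniform example, via the
# order-statistic reduction of the supremum.
data = sorted([0.62, 0.36, 0.23, 0.76, 0.65,
               0.09, 0.55, 0.26, 0.38, 0.24])
n = len(data)
F0 = lambda t: t          # U(0,1) distribution function on [0, 1]

d10 = max(max((i + 1) / n - F0(y), F0(y) - i / n)
          for i, y in enumerate(data))
print(d10)                # 0.25 (up to rounding), attained at y = 0.65
\end{verbatim}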

Example. When observing a Poisson process with a mean rate of arrivals \(\lambda =1/\theta\), the random variable \(W\), which denotes the waiting time until the \(\alpha\)th arrival, has a gamma distribution. The p.d.f. of \(W\) is

\[f(w)=\frac{w^{\alpha -1}e^{-w/\theta}}{\Gamma (\alpha )\theta^{\alpha}}\]

for \(0\leq w<\infty\). A Geiger counter was set up to record the waiting time \(W\) in seconds to observe \(\alpha =100\) alpha-particle emissions of barium \(133\). It is claimed that the number of counts per second has a Poisson distribution with \(\lambda =14.7\), and hence \(\theta =1/14.7\approx 0.068\). We shall test the hypothesis \(H_{0}:F(w)=\int_{-\infty}^{w} f(t)dt\), where \(f(t)\) is the gamma p.d.f. with \(\theta =0.068\) and \(\alpha =100\). Based on \(25\) observations, \(H_{0}\) is rejected at significance level \(0.1\) if \(d_{25}\geq 0.24\). For the observed data, \(d_{25}=0.117\), and hence \(H_{0}\) is not rejected. \(\sharp\)
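
Since the \(25\) observed waiting times are not listed here, the following Python sketch, assuming SciPy, runs the same test on simulated stand-in data; the call pattern matches the example, but the printed statistic will not.

\begin{verbatim}
# Kolmogorov-Smirnov test of a gamma null hypothesis via SciPy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
w = rng.gamma(shape=100.0, scale=0.068, size=25)  # stand-in data

# kstest's args for scipy's gamma are (shape, loc, scale).
d25, p = stats.kstest(w, 'gamma', args=(100, 0, 0.068))
print(d25, p)     # H0 is rejected at level 0.1 only if d25 >= 0.24
\end{verbatim}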

Note that we have been assuming that \(F(x)\) is a continuous function; that is, we have only considered random variables of the continuous type. The procedure may also be applied in the discrete case; however, the true significance level will then be at most \(\alpha\), so the resulting test is conservative. Another application of the Kolmogorov-Smirnov statistic is in forming a confidence band for an unknown distribution function \(F(x)\). To form a confidence band based on a sample of size \(n\), select a number \(d\) satisfying \(\mathbb{P}(D_{n}\geq d)=\alpha\). Then, we have

\begin{align*}
1-\alpha & =\mathbb{P}(\sup_{x}|F_{n}(x)-F(x)|\leq d)\\
& =\mathbb{P}(|F_{n}(x)-F(x)|\leq d\mbox{ for all }x)\\
& =\mathbb{P}(F_{n}(x)-d\leq F(x)\leq F_{n}(x)+d\mbox{ for all }x).
\end{align*}

Let

\[F_{L}(x)=\left\{\begin{array}{ll}
0 & F_{n}(x)-d\leq 0\\
F_{n}(x)-d & F_{n}(x)-d>0
\end{array}\right .\]

and

\[F_{U}(x)=\left\{\begin{array}{ll}
F_{n}(x)+d & F_{n}(x)+d<1\\
1 & F_{n}(x)+d\geq 1.
\end{array}\right .\]

These two step functions \(F_{L}(x)\) and \(F_{U}(x)\) yield a \(100(1-\alpha )\%\) confidence band for the unknown distribution function \(F(x)\).

Example. A random sample of size \(n=15\) from an unknown distribution yielded the sample values

\[3.88, 3.97, 4.03, 2.49, 3.18, 3.08, 2.91, 3.43, 2.41, 1.57, 3.78, 3.25, 1.29, 2.57, 3.40.\]

Now, we have \(\mathbb{P}(D_{15}\geq 0.3)=0.1\). Thus, taking \(d=0.3\), the step functions \(F_{L}(x)=\max\{F_{15}(x)-0.3,0\}\) and \(F_{U}(x)=\min\{F_{15}(x)+0.3,1\}\) form a \(90\%\) confidence band for the unknown distribution function \(F(x)\). \(\sharp\)
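
A minimal Python sketch, assuming NumPy, that tabulates this band at the order statistics with \(d=0.3\) is given below.

\begin{verbatim}
# 90% confidence band for F based on the n = 15 sample, with d = 0.3.
import numpy as np

sample = np.sort([3.88, 3.97, 4.03, 2.49, 3.18, 3.08, 2.91, 3.43,
                  2.41, 1.57, 3.78, 3.25, 1.29, 2.57, 3.40])
n, d = len(sample), 0.3

Fn = np.arange(1, n + 1) / n            # F_n just to the right of y_(i)
lower = np.clip(Fn - d, 0.0, 1.0)       # F_L at each order statistic
upper = np.clip(Fn + d, 0.0, 1.0)       # F_U at each order statistic
for y, lo, hi in zip(sample, lower, upper):
    print(f"{y:5.2f}  [{lo:.3f}, {hi:.3f}]")
\end{verbatim}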

 

 

Hsien-Chung Wu