We have the following sections:
- Power and Sample Size
- Test about One Mean and One Variance
- Tests of the Equality of two Normal Distributions
- Likelihood Ratio Tests
\begin{equation}{\label{a}}\tag{A}\mbox{}\end{equation}
Power and Sample Size.
Let \(X\) equal the breaking strength of a steel bar. Assume that the steel bar is manufactured by process I. Then \(X\) is \(N(50,36)\). When process II (a new process) is used, we assume that \(X\) is \(N(55,36)\). Given a large number of steel bars manufactured by process II, how can we test whether the increase in the mean breaking strength was realized? That is, we assume that \(X\) is \(N(\mu ,36)\) and that \(\mu\) is equal to \(50\) or \(55\). The purpose is to test the null hypothesis \(H_{0}:\mu=50\) against the alternative hypothesis \(H_{1}:\mu =55\). Note that each of these hypotheses completely specifies the distribution of \(X\). In other words, \(H_{0}\) states that \(X\) is \(N(50,36)\), and \(H_{1}\) states that \(X\) is \(N(55,36)\). A hypothesis that completely specifies the distribution of \(X\) is called a simple hypothesis; otherwise, it is called a composite hypothesis (composed of at least two simple hypotheses). For example, the hypothesis \(H_{1}:\mu >50\) is a composite hypothesis, since it is composed of all normal distributions with \(\sigma^{2}=36\) and means greater than \(50\).
In order to test which of the two hypotheses \(H_{0}\) or \(H_{1}\) is true, we shall set up a rule based on the breaking strengths \(x_{1},x_{2},\cdots ,x_{n}\) of \(n\) bars (the observed values of a random sample of size \(n\) from this new normal distribution). The rule leads to a decision to accept or reject the null hypothesis \(H_{0}\). Therefore, it is necessary to partition the sample space into two parts, say \(C\) and \(C'\), such that \(H_{0}\) is rejected when \((x_{1},x_{2},\cdots ,x_{n})\in C\), and \(H_{0}\) is accepted (not rejected) when \((x_{1},x_{2},\cdots ,x_{n})\in C'\). The rejection region \(C\) for \(H_{0}\) is called the critical region for the test. The partition of the sample space is specified in terms of the values of a statistic called the test statistic. We can let \(\bar{X}\) be the test statistic. In this situation, we can take
\[C=\{(x_{1},\cdots ,x_{n}):\bar{x}\geq 53\}.\]
Then, we can define the critical region as those values of the test statistic for which \(H_{0}\) is rejected. In other words, the given critical region is equivalent to defining
\[C=\{\bar{x}:\bar{x}\geq 53\}\]
in the space of \(\bar{x}\). If \((x_{1},x_{2},\cdots ,x_{n})\in C\) when \(H_{0}\) is true, it means that \(H_{0}\) will be rejected when it is true. Such an error is called a type I error. If \((x_{1},x_{2},\cdots ,x_{n})\in C'\) when \(H_{1}\) is true, it means that \(H_{0}\) will be accepted when \(H_{1}\) is true. Such an error is called a type II error. The probability of a type I error is called the significance level of the test and is denoted by \(\alpha\). That is,
\[\alpha =\mathbb{P}[(X_{1},X_{2},\cdots ,X_{n})\in C;H_{0}]\]
is the probability that \((X_{1},X_{2},\cdots ,X_{n})\) falls in \(C\) when \(H_{0}\) is true. The probability of a type II error is denoted by \(\beta\); that is,
\[\beta =\mathbb{P}[(X_{1},X_{2},\cdots ,X_{n})\in C';H_{1}]\]
is the probability of accepting (failing to reject) \(H_{0}\) when it is false. For illustration, suppose that \(n=16\) bars were tested and
\[C=\{\bar{x}:\bar{x}\geq 53\}.\]
Then \(\bar{X}\) is \(N(50,36/16)\) when \(H_{0}\) is true and is \(N(55,36/16)\) when \(H_{1}\) is true. Therefore, we obtain
\begin{align*} \alpha & =\mathbb{P}(\bar{X}\geq 53;H_{0})\\ & =\mathbb{P}\left (\frac{\bar{X}-50}{6/4}\geq\frac{53-50}{6/4};H_{0}\right )\\ & =1-\Phi (2)=0.0228\end{align*}
and
\begin{align*} \beta & =\mathbb{P}(\bar{X}<53;H_{1})\\ & =\mathbb{P}\left (\frac{\bar{X}-55}{6/4}<\frac{53-55}{6/4};H_{1}\right )\\ & =\Phi (-4/3)=1-0.9087=0.0913.\end{align*}
We remark that a decrease in the size of \(\alpha\) leads to an increase in the size of \(\beta\), and vice versa. Both \(\alpha\) and \(\beta\) can be decreased if the sample size \(n\) is increased.
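The two error probabilities above are easy to reproduce numerically. A minimal sketch in Python (our illustration, not part of the text), building the standard normal CDF \(\Phi\) from `math.erf`:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, Phi(x), via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

se = 6 / sqrt(16)                    # sigma/sqrt(n) = 6/4 = 1.5
alpha = 1.0 - phi((53 - 50) / se)    # P(Xbar >= 53; mu = 50), approx 0.0228
beta = phi((53 - 55) / se)           # P(Xbar < 53; mu = 55), approx 0.0913
```

The results agree with the tabled values \(\alpha =0.0228\) and \(\beta =0.0913\) to table precision.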
Assume that, when given a name tag, a person puts it on either the right or left side. Let \(p\) equal the probability that the name tag is placed on the right side. We shall test the null hypothesis \(H_{0}:p=1/2\) against the composite alternative hypothesis \(H_{1}:p<1/2\). (Included with the null hypothesis are those values of \(p\) that are greater than \(1/2\). In other words, we can think of \(H_{0}\) as \(H_{0}:p\geq 1/2\).) We shall give a name tag to each person in a random sample of \(n=20\) people, denoting the placements of their name tags with Bernoulli random variables \(X_{1},X_{2},\cdots ,X_{20}\), where \(X_{i}=1\) means that a person places the name tag on the right side, and \(X_{i}=0\) means that a person places it on the left. Then, we can use as our statistic \(Y=\sum_{i=1}^{20} X_{i}\), which has a binomial distribution \(B(20,p)\). The critical region is defined by
$C=\{y:y\leq 6\}$ or, equivalently, by
\[\left\{(x_{1},\cdots ,x_{20}):\sum_{i=1}^{20} x_{i}\leq 6\right\}.\]
Since \(Y\) is \(B(20,1/2)\) for \(p=1/2\), the significance level of the corresponding test is
\begin{align*} \alpha & =\mathbb{P}\left (Y\leq 6;p=\frac{1}{2}\right )\\ & =\sum_{y=0}^{6}C^{20}_{y}\left (\frac{1}{2}\right )^{20}\\ & =0.0577.\end{align*}
The probability \(\beta\) of type II error has different values with different values of \(p\) selected from the composite alternative hypothesis \(H_{1}:p<1/2\). For illustration, for \(p=1/4\), we have
\begin{align*} \beta & =\mathbb{P}\left (7\leq Y\leq 20;p=\frac{1}{4}\right )\\ & =\sum_{y=7}^{20}C^{20}_{y}\left (\frac{1}{4}\right )^{y}\left (\frac{3}{4}\right )^{20-y}\\ & =0.2142,\end{align*}
and, for \(p=1/10\), we have
\begin{align*} \beta & =\mathbb{P}\left (7\leq Y\leq 20;p=\frac{1}{10}\right )\\ & =\sum_{y=7}^{20}C^{20}_{y}\left (\frac{1}{10}\right )^{y}\left (\frac{9}{10}\right )^{20-y}\\ & =0.0024.\end{align*}
Instead of considering the probability of accepting \(H_{0}\) when \(H_{1}\) is true, we can compute the probability \(K\) of rejecting \(H_{0}\) when \(H_{1}\) is true. After all, \(\beta\) and \(K=1-\beta\) provide the same information. Since \(K\) is a function of \(p\), we denote this explicitly by writing \(K(p)\). The probability
\[K(p)=\sum_{y=0}^{6} C^{20}_{y}p^{y}(1-p)^{20-y}\]
for \(0<p\leq\frac{1}{2}\) is called the power function of the test. Therefore, we have \(K(1/2)=0.0577=\alpha\), \(1-K(1/4)=0.2142\) and \(1-K(1/10)=0.0024\). The value
of the power function at a specified \(p\) is called the power of the test at that point. For illustration, the values \(K(1/4)=0.7858\) and \(K(1/10)=0.9976\) are the powers at \(p=1/4\) and \(p=1/10\), respectively. An acceptable power function assumes small values when \(H_{0}\) is true and large values when \(p\) differs much from \(p=1/2\).
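Since \(Y\) is binomial, the power function can be evaluated exactly. A short Python check (ours; `math.comb` supplies the binomial coefficients):

```python
from math import comb

def K(p, n=20, c=6):
    # Power function K(p) = P(Y <= c) for Y ~ B(n, p).
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(c + 1))
```

Here `K(0.5)` is approximately \(0.0577=\alpha\), `K(0.25)` approximately \(0.7858\), and `K(0.10)` approximately \(0.9976\), matching the values above.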
Let \(X_{1},X_{2},\cdots ,X_{n}\) be a random sample of size \(n\) from the normal distribution \(N(\mu ,100)\), which we can suppose is a possible distribution of scores of students in a statistics course that uses a new method of teaching. We wish to decide between \(H_{0}:\mu =60\) (the no-change hypothesis, since \(60\) was the mean under the previous method of teaching) and the research worker's hypothesis \(H_{1}:\mu >60\). Let us consider a sample of size \(n=25\). The sample mean \(\bar{X}\) is a maximum likelihood estimator of \(\mu\). Therefore, it seems reasonable to base our decision on this statistic. Initially, we use the rule to reject \(H_{0}\) and accept \(H_{1}\) if and only if \(\bar{x}\geq 62\). We first find the probability of rejecting \(H_{0}:\mu =60\) for various values of \(\mu\geq 60\). The probability of rejecting \(H_{0}\) is given by
\[K(\mu )=\mathbb{P}(\bar{X}\geq 62;\mu )\]
since this test calls for the rejection of \(H_{0}:\mu =60\) when \(\bar{x}\geq 62\). When the new method yields the general mean \(\mu\), we know that \(\bar{X}\) has the normal distribution \(N(\mu ,100/25=4)\). Now,
\begin{align*} K(\mu ) & =\mathbb{P}\left (\frac{\bar{X}-\mu}{2}\geq\frac{62-\mu}{2};\mu\right )\\ & =1-\Phi\left (\frac{62-\mu}{2}\right )\end{align*}
for \(\mu\geq 60\), is the probability of rejecting \(H_{0}:\mu =60\) using this particular test. The probability \(K(\mu )\) of rejecting \(H_{0}:\mu =60\) is called the power function of the test. At the value \(\mu_{1}\) of the parameter, \(K(\mu_{1})\) is the power at \(\mu_{1}\). The power at \(\mu =60\) is \(K(60)=0.1587\), and this is the probability of rejecting \(H_{0}:\mu =60\) when \(H_{0}\) is true. That is, \(K(60)=0.1587=\alpha\) is the probability of a type I error and is called the significance level of the test. The power at \(\mu =65\) is \(K(65)=0.9332\), and this is the probability of making the correct decision (namely, rejecting \(H_{0}:\mu =60\) when \(\mu =65\)). Therefore, we are pleased that it is large here. When \(\mu =65\), \(1-K(65)=0.0668\) is the probability of not rejecting \(H_{0}:\mu =60\) when \(\mu =65\); that is, it is the probability of a type II error, denoted by \(\beta =0.0668\). Clearly, the probability \(\beta =1-K(\mu_{1})\) of a type II error depends on which value \(\mu_{1}\) is taken in the alternative hypothesis \(H_{1}:\mu >60\). Therefore, we have \(\beta =0.0668\) when \(\mu =65\), and \(\beta =1-K(63)=0.3085\) when \(\mu =63\). Frequently, statisticians like to have the significance level \(\alpha\) smaller than \(0.1587\), say around \(0.05\) or less, since it is the probability of a type I error. Therefore, if we would like \(\alpha =0.05\), then, with \(n=25\), we can no longer use the critical region \(\bar{x}\geq 62\); instead we use \(\bar{x}\geq c\), where \(c\) is selected so that
\[K(60)=\mathbb{P}(\bar{X}\geq c;\mu =60)=0.05.\]
However, when \(\mu =60\), \(\bar{X}\) is \(N(60,4)\) and
\begin{align*} K(60) & =\mathbb{P}\left (\frac{\bar{X}-60}{2}\geq\frac{c-60}{2};\mu =60\right )\\ & =1-\Phi\left (\frac{c-60}{2}\right )=0.05.\end{align*}
From the table, we have
\[\frac{c-60}{2}=1.645=z_{0.05}\]
and
\[c=60+3.29=63.29.\]
Although this change reduces \(\alpha\) from \(0.1587\) to \(0.05\), it increases \(\beta\) from \(0.0668\) at \(\mu =65\) to
\begin{align*} \beta & =1-\mathbb{P}(\bar{X}\geq 63.29;\mu =65)\\
& =1-\mathbb{P}\left (\frac{\bar{X}-65}{2}\geq\frac{63.29-65}{2};\mu =65\right )\\
& =\Phi (-0.855)=0.1963.\end{align*}
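The trade-off can be verified directly. A quick Python sketch (our illustration), again using \(\Phi\) via `math.erf`:

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

se = 10 / sqrt(25)           # sigma/sqrt(n) = 2
c = 60 + 1.645 * se          # critical value giving alpha = 0.05
beta = phi((c - 65) / se)    # P(Xbar < c; mu = 65)
```

This yields \(c=63.29\) and \(\beta\) approximately \(0.1963\), as above.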
In general, without changing the sample size or the type of test of the hypothesis, a decrease in \(\alpha\) causes an increase in \(\beta\), and a decrease in \(\beta\) causes an increase in \(\alpha\). Both probabilities \(\alpha\) and \(\beta\) of the two types of errors can be decreased only by increasing the sample size or, in some way, constructing a better test of the hypothesis. For example, if \(n=100\), then \(\bar{X}\) is \(N(\mu ,100/100=1)\), and if we desire a test with significance level \(\alpha =0.05\), then
\[\alpha =\mathbb{P}(\bar{X}\geq c;\mu =60)=0.05.\]
means
\[\mathbb{P}\left (\frac{\bar{X}-60}{1}\geq\frac{c-60}{1};\mu =60\right )=0.05.\]
and \(c-60=1.645\), i.e., \(c=61.645\). The power function is
\begin{align*} K(\mu ) & =\mathbb{P}(\bar{X}\geq 61.645;\mu )\\ & =\mathbb{P}\left (\frac{\bar{X}-\mu}{1}\geq
\frac{61.645-\mu}{1};\mu\right )\\ & =1-\Phi (61.645-\mu ).\end{align*}
In particular, this means that \(\beta\) at \(\mu =65\) is
\begin{align*} \beta & =1-K(65)\\ & =\Phi (61.645-65)\\ & =\Phi (-3.355)\approx 0.\end{align*}
Therefore, for \(n=100\), both \(\alpha\) and \(\beta\) have decreased from their respective original values of \(0.1587\) and \(0.0668\) when \(n=25\). Rather than guess at a value of \(n\), we can determine the sample size from a desired power function. Let us use a critical region of the form \(\bar{x}\geq c\). Further, suppose that we want \(\alpha =0.025\) and, when \(\mu =65\), \(\beta =0.05\). Since \(\bar{X}\) is \(N(\mu ,100/n)\), we have
\begin{align*} 0.025 & =\mathbb{P}(\bar{X}\geq c;\mu =60)\\ & =1-\Phi\left (\frac{c-60}{10/\sqrt{n}}\right )\end{align*}
and
\begin{align*} 0.05 & =1-\mathbb{P}(\bar{X}\geq c;\mu =65)\\ & =\Phi\left (\frac{c-65}{10/\sqrt{n}}\right ).\end{align*}
That is, we have
\[\frac{c-60}{10/\sqrt{n}}=1.96\mbox{ and }\frac{c-65}{10/\sqrt{n}}=-1.645.\]
Solving these equations simultaneously for \(c\) and \(10/\sqrt{n}\), we obtain
\[\frac{10}{\sqrt{n}}=\frac{5}{3.605}\]
and
\[c=60+1.96\cdot\frac{5}{3.605}=62.718,\]
which implies \(\sqrt{n}=7.21\) and \(n=51.98\). Since \(n\) must be an integer, we can take \(n=52\) and obtain \(\alpha =0.025\) and \(\beta =0.05\), approximately.
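The pair of equations can be solved mechanically. A Python sketch (ours, not the text's):

```python
from math import sqrt

# (c - 60)/(10/sqrt(n)) = 1.96 and (c - 65)/(10/sqrt(n)) = -1.645.
# Subtracting the second from the first gives 5 = (1.96 + 1.645) * (10/sqrt(n)).
se = 5 / (1.96 + 1.645)      # 10/sqrt(n) = 5/3.605
c = 60 + 1.96 * se           # approx 62.718
n = (10 / se) ** 2           # approx 51.98; round up to n = 52
```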
Now, we consider the notion of probability value, or \(p\)-value. The \(p\)-value associated with a test is the probability that, when \(H_{0}\) is true, we obtain the observed value of the test statistic or a value that is more extreme in the direction of the alternative hypothesis. Rather than select the critical region ahead of time, the \(p\)-value of a test can be reported so that the reader can make a decision. Assume that we are testing \(H_{0}:\mu =60\) against \(H_{1}:\mu >60\) with a sample mean \(\bar{X}\) based on \(n=52\) observations. Suppose that we obtain the observed sample mean \(\bar{x}=62.75\). If we compute the probability of obtaining a value of \(\bar{x}\) equal to \(62.75\) or greater when \(\mu =60\), then we obtain the \(p\)-value associated with \(\bar{x}=62.75\). That is,
\begin{align*} p\mbox{-value} & =\mathbb{P}(\bar{X}\geq 62.75;\mu =60)\\
& =\mathbb{P}\left (\frac{\bar{X}-60}{10/\sqrt{52}}\geq\frac{62.75-60}{10/\sqrt{52}};\mu =60\right )\\
& =1-\Phi\left (\frac{62.75-60}{10/\sqrt{52}}\right )\\ & =1-\Phi (1.983)=0.0237.\end{align*}
If this \(p\)-value is small, we tend to reject the hypothesis \(H_{0}:\mu =60\). For example, rejection of \(H_{0}:\mu =60\) if the \(p\)-value is less than or equal to \(0.025\) is exactly the same as rejection if \(\bar{x}\geq 62.718\). In other words, \(\bar{x}=62.718\) has a \(p\)-value of \(0.025\).
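This \(p\)-value takes one line to compute; a Python sketch (ours):

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# P(Xbar >= 62.75; mu = 60) with sigma = 10, n = 52; approx 0.0237
p_value = 1.0 - phi((62.75 - 60) / (10 / sqrt(52)))
```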
Example. Suppose that, in the past, a golfer's scores have been (approximately) normally distributed with mean \(\mu =90\) and \(\sigma^{2}=9\). After taking some lessons, the golfer has reason to believe that the mean \(\mu\) has decreased (we assume that \(\sigma^{2}\) is still about \(9\)). In order to test the null hypothesis \(H_{0}:\mu =90\) against the alternative hypothesis \(H_{1}:\mu <90\), the golfer plays \(16\) games and computes the sample mean \(\bar{x}\). If \(\bar{x}\leq c\), then \(H_{0}\) is rejected and \(H_{1}\) is accepted. For \(c=88.5\), the power function of the test is given by
\begin{align*} K(\mu ) & =\mathbb{P}(\bar{X}\leq 88.5;\mu )\\ & =\mathbb{P}\left (\frac{\bar{X}-\mu}{3/4}\leq\frac{88.5-\mu}{3/4};\mu\right )\\ & =\Phi\left (\frac{88.5-\mu}{3/4}\right )\end{align*}
since \(9/16\) is the variance of \(\bar{X}\). In particular, we have
\begin{align*} \alpha & =K(90)=\Phi (-2)\\ & =1-0.9772=0.0228.\end{align*}
If the true mean is equal to \(\mu =88\) after the lessons, the power is \(K(88)=\Phi (2/3)=0.7475\). If \(\mu =87\), then \(K(87)=\Phi (2)=0.9772\). An observed sample mean of \(\bar{x}=88.25\) has a
\begin{align*} p\mbox{-value} & =\mathbb{P}(\bar{X}\leq 88.25;\mu =90)\\ & =\Phi\left (\frac{88.25-90}{3/4}\right )\\ & =\Phi\left (-\frac{7}{3}\right )=0.0098,\end{align*}
which would lead to rejection at \(\alpha =0.0228\). \(\sharp\)
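The quantities in this example can be checked the same way; a Python sketch (ours):

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

se = 3 / sqrt(16)                  # sigma/sqrt(n) = 3/4
alpha = phi((88.5 - 90) / se)      # K(90), approx 0.0228
power_88 = phi((88.5 - 88) / se)   # K(88), approx 0.7475
p_value = phi((88.25 - 90) / se)   # for xbar = 88.25, approx 0.0098
```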
Example. In order to test \(H_{0}:p=1/2\) against \(H_{1}:p<1/2\), we take a random sample of Bernoulli trials \(X_{1},X_{2},\cdots ,X_{n}\). The test statistic is taken to be \(Y=\sum_{i=1}^{n} X_{i}\) which has a binomial distribution \(B(n,p)\). Let the critical region be defined by \(C=\{y:y\leq c\}\). The power function for this test is defined by
\[K(p)=\mathbb{P}(Y\leq c;p).\]
We shall find the values of \(n\) and \(c\) satisfying \(K(1/2)=0.05\) and \(K(1/4)=0.9\) approximately. In other words, we would like the significance level to be \(\alpha =K(1/2)=0.05\) and the power at \(p=1/4\) to equal \(0.9\). We have
\begin{align*} 0.05 & =\mathbb{P}\left (Y\leq c;p=\frac{1}{2}\right )\\ & =\mathbb{P}\left (\frac{Y-n/2}{\sqrt{n(1/2)(1/2)}}\leq\frac{c-n/2}{\sqrt{n(1/2)(1/2)}}\right )\end{align*}
implies
\[\frac{c-n/2}{\sqrt{n/4}}\approx -1.645,\]
and that
\begin{align*} 0.9 & =\mathbb{P}\left (Y\leq c;p=\frac{1}{4}\right )\\ & =\mathbb{P}\left (\frac{Y-n/4}{\sqrt{n(1/4)(3/4)}}\leq\frac{c-n/4}{\sqrt{n(1/4)(3/4)}}\right )\end{align*}
implies
\[\frac{c-n/4}{\sqrt{3n/16}}\approx 1.282.\]
Therefore, we obtain
\[\frac{n}{4}\approx 1.645\sqrt{\frac{n}{4}}+1.282\sqrt{\frac{3n}{16}}\]
and
\[\sqrt{n}\approx 4\cdot 1.378=5.512.\]
Therefore \(n\approx 30.4\), and we take \(n=31\). From either of the first two approximate equalities, we find that \(c\) is about \(10.9\). With \(n=31\) and \(c=10.9\), the equalities \(K(1/2)=0.05\) and \(K(1/4)=0.9\) hold only approximately. In fact, since \(Y\) must be an integer, we could let \(c=10.5\). Then, with \(n=31\), we have
\begin{align*} \alpha & =K(1/2)\\ & =\mathbb{P}\left (Y\leq 10.5;p=\frac{1}{2}\right )\\ & \approx 0.0362\end{align*}
and
\begin{align*} K(1/4) & =\mathbb{P}\left (Y\leq 10.5;p=\frac{1}{4}\right )\\ & \approx 0.8729.\end{align*}
We can also let \(c=11.5\) and \(n=32\). Then, we have
\begin{align*} \alpha & =K(1/2)\\ & =\mathbb{P}\left (Y\leq 11.5;p=\frac{1}{2}\right )\\ & \approx 0.0558\end{align*}
and
\begin{align*} K(1/4) & =\mathbb{P}\left (Y\leq 11.5;p=\frac{1}{4}\right )\\ & \approx 0.9235.\end{align*}
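The approximate values above come from the normal approximation to the binomial; a Python sketch (ours) of that approximation for the \(n=31\), \(c=10.5\) case:

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def K_approx(p, n, c):
    # Normal approximation to P(Y <= c) for Y ~ B(n, p).
    return phi((c - n * p) / sqrt(n * p * (1 - p)))

alpha = K_approx(0.5, 31, 10.5)   # significance level, approx 0.0362
power = K_approx(0.25, 31, 10.5)  # power at p = 1/4, approx 0.8729
```

The exact binomial probabilities differ slightly (by less than \(0.01\)) from these approximations.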
\begin{equation}{\label{b}}\tag{B}\mbox{}\end{equation}
Test about One Mean and One Variance.
The null hypothesis is generally of the form \(H_{0}:\mu =\mu_{0}\). There are essentially three possibilities for the alternative hypothesis:
- \(H_{1}:\mu >\mu_{0}\); that is, \(\mu\) has increased.
- \(H_{1}:\mu <\mu_{0}\); that is, \(\mu\) has decreased.
- \(H_{1}:\mu\neq\mu_{0}\); that is, \(\mu\) has changed, but it is not known if it has increased or decreased, which leads to a two-sided alternative hypothesis.
To test \(H_{0}:\mu =\mu_{0}\) against one of these alternative hypotheses, a random sample is taken from the distribution, and an observed sample mean \(\bar{x}\) that is close to \(\mu_{0}\) supports \(H_{0}\). The closeness of \(\bar{x}\) to \(\mu_{0}\) is measured in terms of the standard deviation \(\sigma /\sqrt{n}\) of \(\bar{X}\), which is sometimes called the standard error of the mean. Therefore, the test statistic can be defined by
\begin{align*} Z & =\frac{\bar{X}-\mu_{0}}{\sqrt{\sigma^{2}/n}}\\ & =\frac{\bar{X}-\mu_{0}}{\sigma /\sqrt{n}}.\end{align*}
The critical regions, at a significance level \(\alpha\), are summarized as follows. The underlying assumption is that the distribution is \(N(\mu ,\sigma^{2})\) and \(\sigma^{2}\) is known.
\[\begin{array}{ccc}
\mbox{Table A} &&\\
\hline H_{0} & H_{1} & \mbox{Critical Region (variance known)}\\
\hline \mu =\mu_{0} & \mu >\mu_{0} & z\geq z_{\alpha}\mbox{ or }\bar{x}\geq\mu_{0}+z_{\alpha}\cdot\sigma/\sqrt{n}\\
\mu =\mu_{0} & \mu <\mu_{0} & z\leq -z_{\alpha}\mbox{ or }\bar{x}\leq\mu_{0}-z_{\alpha}\cdot\sigma/\sqrt{n}\\
\mu =\mu_{0} & \mu\neq\mu_{0} & |z|\geq z_{\alpha /2}\mbox{ or }|\bar{x}-\mu_{0}|\geq z_{\alpha /2}\cdot\sigma/\sqrt{n}\\
\hline\end{array}\]
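Carrying out one of the tests in Table A is a short computation. A Python sketch (ours), with illustrative numbers echoing the teaching example above and \(z_{0.05}=1.645\) from the normal table:

```python
from math import sqrt

# Illustrative values: mu0 = 60, sigma = 10, n = 25, observed xbar = 62.
mu0, sigma, n, xbar = 60.0, 10.0, 25, 62.0
z = (xbar - mu0) / (sigma / sqrt(n))   # test statistic
reject = z >= 1.645                    # one-sided test of mu > mu0 at alpha = 0.05
```

Here \(z=1.0<1.645\), so \(H_{0}\) is not rejected at the \(\alpha =0.05\) level.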
We now take a more realistic position, and assume that the variance is unknown. Suppose that our null hypothesis is \(H_{0}:\mu =\mu_{0}\) and the two-sided alternative hypothesis is \(H_{1}:\mu\neq\mu_{0}\). If a random sample \(X_{1},X_{2},\cdots ,X_{n}\) is taken from a normal distribution \(N(\mu ,\sigma^{2})\), we recall that a confidence interval for \(\mu\) is based on
\begin{align*} T & =\frac{\bar{X}-\mu}{\sqrt{S^{2}/n}}\\ & =\frac{\bar{X}-\mu}{S/\sqrt{n}}.\end{align*}
This suggests that \(T\) might be a good statistic to use for the test of \(H_{0}:\mu =\mu_{0}\), with \(\mu\) replaced by \(\mu_{0}\). If \(\mu =\mu_{0}\), we know that \(T\) has a \(t\)-distribution with \(n-1\) degrees of freedom. Therefore, for \(\mu =\mu_{0}\), we have
\[\mathbb{P}(|T|\geq t_{\alpha /2}(n-1))=\mathbb{P}\left (\frac{|\bar{X}-\mu_{0}|}{S/\sqrt{n}}\geq t_{\alpha /2}(n-1)\right )=\alpha .\]
If \(\bar{x}\) and \(s\) are the sample mean and sample standard deviation, the rule that rejects \(H_{0}:\mu =\mu_{0}\) and accepts \(H_{1}:\mu\neq\mu_{0}\) if and only if
\[|t|=\frac{|\bar{x}-\mu_{0}|}{s/\sqrt{n}}\geq t_{\alpha /2}(n-1)\]
provides a test of this hypothesis with significance level \(\alpha\). It should be noted that this rule is equivalent to rejecting \(H_{0}:\mu =\mu_{0}\) if \(\mu_{0}\) is not in the \(100(1-\alpha )\%\) confidence interval with end points \(\bar{x}-t_{\alpha /2}(n-1)\cdot (s/\sqrt{n})\) and \(\bar{x}+t_{\alpha /2}(n-1)\cdot (s/\sqrt{n})\). The following table summarizes tests of hypotheses for a single mean, along with the three possible alternative hypotheses, when the underlying distribution is \(N(\mu ,\sigma^{2})\), \(\sigma^{2}\) is unknown,
$t=(\bar{x}-\mu )/(s/\sqrt{n})$, and \(n\leq 30\). If \(n>30\), use Table A for approximate tests with \(\sigma\) replaced by \(s\).
\[\begin{array}{ccc}
\mbox{Table B} &&\\
\hline H_{0} & H_{1} & \mbox{Critical Region (variance unknown)}\\
\hline \mu =\mu_{0} & \mu >\mu_{0} & t\geq t_{\alpha}(n-1)\mbox{ or }\bar{x}\geq\mu_{0}+t_{\alpha}(n-1)\cdot s/\sqrt{n}\\
\mu =\mu_{0} & \mu <\mu_{0} & t\leq -t_{\alpha}(n-1)\mbox{ or }\bar{x}\leq\mu_{0}-t_{\alpha}(n-1)\cdot s/\sqrt{n}\\
\mu =\mu_{0} & \mu\neq\mu_{0} & |t|\geq t_{\alpha /2}(n-1)\mbox{ or }|\bar{x}-\mu_{0}|\geq t_{\alpha /2}(n-1)\cdot s/\sqrt{n}\\
\hline\end{array}\]
Example. Let \(X\) (in millimeters) equal the growth in \(15\) days of a tumor induced in a mouse. Assume that the distribution of \(X\) is \(N(\mu ,\sigma^{2})\). We shall test the null hypothesis \(H_{0}:\mu =\mu_{0}=4\) millimeters against the two-sided alternative hypothesis \(H_{1}:\mu\neq 4\). If we use \(n=9\) observations and a significance level of \(\alpha =0.1\), the critical region is given by
\[|t|=\frac{|\bar{x}-4|}{s/\sqrt{9}}\geq t_{\alpha /2}(8)=1.86.\]
If we are given that \(n=9\), \(\bar{x}=4.3\), and \(s=1.2\), we see that
\[t=\frac{4.3-4}{1.2/\sqrt{9}}=\frac{0.3}{0.4}=0.75.\]
Therefore, we have \(|t|=|0.75|<1.86\), and we accept (do not reject) \(H_{0}:\mu =4\) at the \(\alpha =0.1\) significance level. \(\sharp\)
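The arithmetic of this example is easy to check. A Python sketch (ours), with the critical value \(t_{0.05}(8)=1.86\) taken from the \(t\)-table as in the text:

```python
from math import sqrt

n, xbar, s, mu0 = 9, 4.3, 1.2, 4.0
t = (xbar - mu0) / (s / sqrt(n))   # = 0.3/0.4 = 0.75
reject = abs(t) >= 1.86            # two-sided test at alpha = 0.1
```

Since `reject` is false, \(H_{0}\) is not rejected, as stated above.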
Example. In attempting to control the strength of the wastes discharged into a nearby river, a paper firm has taken a number of measures. Members of the firm believe that these measures have reduced the oxygen-consuming power of their waste from a previous mean \(\mu\) of \(500\). They plan to test \(H_{0}:\mu =500\) against \(H_{1}:\mu <500\) by using readings taken on \(n=25\) consecutive days. If these \(25\) values can be treated as a random sample, then the critical region, for a significance level of \(\alpha =0.01\), is given by
\[t=\frac{\bar{x}-500}{s/\sqrt{25}}\leq -t_{0.01}(24)=-2.492.\]
The observed values of the sample mean and sample standard deviation are \(\bar{x}=308.8\) and \(s=115.15\). Since
\[t=\frac{308.8-500}{115.15/\sqrt{25}}=-8.3<-2.492,\]
we clearly reject the null hypothesis and accept \(H_{1}:\mu <500\). \(\sharp\)
There is often interest in comparing the means of two different distributions or populations. If \(X\) and \(Y\) are dependent (for example, two measurements taken on the same subject), let \(W=X-Y\), and the hypothesis that \(\mu_{X}=\mu_{Y}\) is replaced with the hypothesis \(H_{0}:\mu_{W}=0\). If we can assume that the distribution of \(W\) is (approximately) \(N(\mu_{W},\sigma_{W}^{2})\), then the \(t\)-test for a single mean can be used by referring to Table B. This is often called a paired \(t\)-test.
Example. Twenty-four girls in the \(9\)th and \(10\)th grades were put on an ultra-heavy rope-jumping program. Someone thought that such a program would increase their speed in running the \(40\)-yard dash. Let \(W\) equal the difference in time to run the \(40\)-yard dash: “before-program time” minus “after-program time”. Assume that the distribution of \(W\) is (approximately) \(N(\mu_{W},\sigma_{W}^{2})\). We shall test the null hypothesis \(H_{0}:\mu_{W}=0\) against the alternative hypothesis \(H_{1}:\mu_{W}>0\). The test statistic and the critical region that has an \(\alpha =0.05\) significance level are given by
\[t=\frac{\bar{w}-0}{s_{W}/\sqrt{24}}\geq t_{0.05}(23)=1.714.\]
The observed data give \(\bar{w}=0.079\) and \(s_{W}=0.255\). Therefore, the observed value of the test statistic is
\[t=\frac{0.079-0}{0.255/\sqrt{24}}=1.518.\]
Since \(1.518<1.714\), the null hypothesis is not rejected. We also note that \(t_{0.1}(23)=1.319\) and \(t=1.518>1.319\). Therefore, the null hypothesis would be rejected at an \(\alpha =0.1\) significance level. Another way of saying this is that \(0.05<p\mbox{-value}<0.1\). \(\sharp\)
A psychology professor claims that the variance of IQ scores for college students is \(\sigma^{2}=100\). To test this claim, it is decided to test the hypothesis \(H_{0}:\sigma^{2}=100\) against the alternative hypothesis \(H_{1}:\sigma^{2}\neq 100\). A random sample of \(n\) students will be selected, and the test will be based on the observed unbiased estimate \(s^{2}\) of the variance \(\sigma^{2}\) of their IQ scores. The hypothesis \(H_{0}\) will be rejected if \(s^{2}\) differs “too much” from \(\sigma^{2}=100\). That is, \(H_{0}\) will be rejected if \(s^{2}\leq c_{1}\) or \(s^{2}\geq c_{2}\), where the constants satisfy \(c_{1}<100<c_{2}\). Generally, \(c_{1}\) and \(c_{2}\) are selected so that \(\mathbb{P}(S^{2}\leq c_{1})=\alpha /2\) and \(\mathbb{P}(S^{2}\geq c_{2})=\alpha /2\) when \(H_{0}\) is true.
Assume that IQ scores are normally distributed. Then, the distribution of \((n-1)S^{2}/100\) is \(\chi^{2}(n-1)\) when \(H_{0}\) is true. Therefore, the chi-square probability table can be used to give
\[\mathbb{P}\left (\frac{(n-1)S^{2}}{100}\leq\frac{(n-1)c_{1}}{100}\right )=\alpha /2\]
and
\[\mathbb{P}\left (\frac{(n-1)S^{2}}{100}\geq\frac{(n-1)c_{2}}{100}\right )=\alpha /2.\]
We also have
\[\frac{(n-1)c_{1}}{100}=\chi^{2}_{1-\alpha /2}(n-1)\]
and
\[\frac{(n-1)c_{2}}{100}=\chi^{2}_{\alpha /2}(n-1).\]
Therefore, we obtain
\[c_{1}=\frac{100}{n-1}\cdot\chi^{2}_{1-\alpha /2}(n-1)\]
and
\[c_{2}=\frac{100}{n-1}\cdot\chi^{2}_{\alpha /2}(n-1).\]
This test can also be based on the value of the chi-square statistic
\[\chi^{2}=\frac{(n-1)S^{2}}{100}.\]
We would reject the null hypothesis when \(\chi^{2}\leq\chi_{1-\alpha /2}^{2}(n-1)\) or \(\chi^{2}\geq\chi^{2}_{\alpha /2}(n-1)\).
Example. Suppose that, in the previous discussion, we take \(n=23\) and \(\alpha =0.05\). In order to test \(H_{0}:\sigma^{2}=100\) against \(H_{1}:\sigma^{2}\neq 100\), let
\[\chi^{2}=(n-1)S^{2}/\sigma_{0}^{2}=22S^{2}/100.\]
Since \(\chi_{0.975}^{2}(22)=10.98\) and \(\chi_{0.025}^{2}(22)=36.78\), \(H_{0}\) will be rejected when
\[\chi^{2}=\frac{22S^{2}}{100}\leq 10.98\]
or
\[\chi^{2}=\frac{22S^{2}}{100}\geq 36.78\]
or, equivalently, when
\[s^{2}\leq c_{1}=\frac{100\cdot 10.98}{22}=49.91\]
or
\[s^{2}\geq c_{2}=\frac{100\cdot 36.78}{22}=167.18.\]
Given that the observed value of the sample variance is \(s^{2}=147.82\), the hypothesis \(H_{0}:\sigma^{2}=100\) is not rejected. Also, the observed value of the chi-square test statistic is given by
\[\chi^{2}=\frac{22\cdot 147.82}{100}=32.52.\]
Since \(10.98<32.52<36.78\), we would again accept \(H_{0}\), as expected. If \(H_{1}:\sigma^{2}>100\) had been the alternative hypothesis, \(H_{0}:\sigma^{2}=100\) would have been rejected when
\[\chi^{2}=\frac{22S^{2}}{100}\geq\chi_{0.05}^{2}(22)=33.92\]
or equivalently
\[s^{2}\geq\frac{100\cdot\chi_{0.05}^{2}(22)}{22}=\frac{100\cdot 33.92}{22}=154.18.\]
Since \(\chi^{2}=32.52<33.92\) and \(s^{2}=147.82<154.18\), \(H_{0}\) would not be rejected in favor of this one-sided alternative hypothesis for \(\alpha =0.05\), although we suspect a slightly larger data set might lead to rejection. For the one-sided alternative hypothesis \(H_{1}:\sigma^{2}>100\), the \(p\)-value is \(\mathbb{P}(W\geq 32.52)\), where \(W=22S^{2}/100\) has a chi-square distribution with \(22\) degrees of freedom when \(H_{0}\) is true. Using the table, we have \(\mathbb{P}(W\geq 30.81)=0.1\) and \(\mathbb{P}(W\geq 33.92)=0.05\). Therefore, we obtain
\[0.05<p\mbox{-value}=\mathbb{P}(W\geq 32.52)<0.1.\]
The null hypothesis would not be rejected at an \(\alpha =0.05\) significance level. To find the \(p\)-value for a two-sided alternative in a test about the variance, double the tail probability beyond the observed chi-square statistic: the upper tail when the sample variance exceeds \(\sigma_{0}^{2}\), and the lower tail when it is below \(\sigma_{0}^{2}\). Since \(s^{2}=147.82>100\), the \(p\)-value equals \(2\mathbb{P}(W\geq 32.52)\), i.e., \(0.1<p\mbox{-value}<0.2\). \(\sharp\)
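The constants \(c_{1}\), \(c_{2}\) and the test statistic in this example follow directly from the tabled chi-square quantiles. A Python sketch (ours), with the quantiles \(10.98\) and \(36.78\) taken from the table as in the text:

```python
n, s2, sigma2_0 = 23, 147.82, 100.0
chi2_lower, chi2_upper = 10.98, 36.78    # chi^2_{0.975}(22) and chi^2_{0.025}(22)

c1 = sigma2_0 * chi2_lower / (n - 1)     # approx 49.91
c2 = sigma2_0 * chi2_upper / (n - 1)     # approx 167.18
chi2 = (n - 1) * s2 / sigma2_0           # approx 32.52
reject = s2 <= c1 or s2 >= c2            # two-sided test at alpha = 0.05
```

Since `reject` is false, \(H_{0}:\sigma^{2}=100\) is not rejected, in agreement with the example.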
The following table summarizes tests of hypotheses for a single variance. The critical region is given in terms of the sample variance. The critical region is also given in terms of the chi-square statistic shown below
\[\chi^{2}=\frac{(n-1)s^{2}}{\sigma_{0}^{2}}.\]
\[\begin{array}{ccc}
\mbox{Table C} &&\\
\hline H_{0} & H_{1} & \mbox{Critical Region}\\
\hline \sigma^{2} =\sigma^{2}_{0} & \sigma^{2}>\sigma^{2}_{0} &
s^{2}\geq {\displaystyle \frac{\sigma_{0}^{2}\cdot\chi^{2}_{\alpha}(n-1)}{n-1}}\mbox{ or }\chi^{2}\geq\chi_{\alpha}^{2}(n-1)\\
\sigma^{2} =\sigma^{2}_{0} & \sigma^{2}<\sigma^{2}_{0} &
s^{2}\leq {\displaystyle \frac{\sigma_{0}^{2}\cdot \chi^{2}_{1-\alpha} (n-1)}{n-1}}\mbox{ or }\chi^{2}\leq\chi_{1-\alpha}^{2}(n-1)\\
\sigma^{2} =\sigma^{2}_{0} & \sigma^{2}\neq\sigma^{2}_{0} &
s^{2}\leq {\displaystyle \frac{\sigma_{0}^{2}\cdot \chi^{2}_{1-\alpha /2} (n-1)}{n-1}}\mbox{ or }s^{2}\geq {\displaystyle \frac{\sigma_{0}^{2}\cdot
\chi^{2}_{\alpha /2}(n-1)}{n-1}}\mbox{ or }\chi^{2}\leq\chi_{1-\alpha /2}^{2}(n-1)\mbox{ or }\chi^{2}\geq\chi_{\alpha /2}^{2}(n-1)\\
\hline\end{array}\]
\begin{equation}{\label{c}}\tag{C}\mbox{}\end{equation}
Tests of the Equality of two Normal Distributions.
Assume that the independent random variables \(X\) and \(Y\) have normal distributions \(N(\mu_{X},\sigma_{X}^{2})\) and \(N(\mu_{Y},\sigma_{Y}^{2})\), respectively. We are interested in testing whether the distributions of \(X\) and \(Y\) are the same. When the assumption of normality is valid, we would be interested in testing whether the two variances are equal and whether the two means are equal. When \(X\) and \(Y\) are independent and normally distributed, we can test hypotheses about their means using the same \(t\)-statistic that is also used for constructing a confidence interval for \(\mu_{X}-\mu_{Y}\). Recall that the \(t\)-statistic used for constructing the confidence interval assumed that the variances of \(X\) and \(Y\) are equal.
A botanist is interested in comparing the growth response of dwarf pea stems to two different levels of the hormone indoleacetic acid (IAA). Using \(16\)-day-old pea plants, the botanist obtains \(5\)-millimeter sections and floats these sections on solutions with different hormone concentrations to observe the effect of the hormone on the growth of the pea stem. Let
$X$ and \(Y\) denote, respectively, the independent growths that can be attributed to the hormone during the first \(26\) hours after sectioning for \(0.5\cdot 10^{-4}\) and \(10^{-4}\) levels of concentration of IAA. The botanist would like to test the null hypothesis \(H_{0}:\mu_{X}-\mu_{Y}=0\) against the alternative hypothesis \(H_{1}:\mu_{X}-\mu_{Y}<0\). If we can assume \(X\) and \(Y\) are independent and normally distributed with common variance, the corresponding random samples of sizes \(n\) and \(m\) give a test based on the statistic
\begin{align*} T & =\frac{\bar{X}-\bar{Y}}{\sqrt{\left (\frac{(n-1)S_{X}^{2}+(m-1)S_{Y}^{2}}{n+m-2}\right )\left (\frac{1}{n}+\frac{1}{m}\right )}}\\ & =\frac{\bar{X}-\bar{Y}}{S_{p}\sqrt{\frac{1}{n}+\frac{1}{m}}},\end{align*}
where
\[S_{p}=\sqrt{\frac{(n-1)S_{X}^{2}+(m-1)S_{Y}^{2}}{n+m-2}}.\]
\(T\) has a \(t\)-distribution with \(n+m-2\) degrees of freedom when \(H_{0}\) is true and the variances are (approximately) equal. The hypothesis \(H_{0}\) will be rejected in favor of \(H_{1}\) when the observed value of \(T\) is less than \(-t_{\alpha}(n+m-2)\).
\begin{equation}{\label{ex1}}\tag{1}\mbox{}\end{equation}
Example \ref{ex1}. In the preceding discussion, the botanist measured the growths of pea stem segments in millimeters for \(n=11\) observations of \(X\) and \(m=13\) observations of \(Y\). For these data, we have \(\bar{x}=1.03\), \(s_{X}^{2}=0.24\), \(\bar{y}=1.66\), and \(s_{Y}^{2}=0.35\). The critical region for testing \(H_{0}:\mu_{X}-\mu_{Y}=0\) against \(H_{1}:\mu_{X}-\mu_{Y}<0\) is \(t\leq -t_{0.05}(22)=-1.717\). Since
\begin{align*} t & =\frac{1.03-1.66}{\sqrt{\left (\frac{10\cdot 0.24+12\cdot 0.35}{11+13-2}\right )\left (\frac{1}{11}+\frac{1}{13}\right )}}\\ & =-2.81<-1.717,\end{align*}
the hypothesis \(H_{0}\) is clearly rejected at the \(\alpha =0.05\) significance level. Notice that the approximate \(p\)-value of this test is \(0.005\), since \(-t_{0.005}(22)=-2.819\). Also, the sample variances do not differ much, so most statisticians would use this two-sample \(t\)-test. \(\sharp\)
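The arithmetic in this example is easy to check numerically. The following sketch (our addition, not part of the original discussion) recomputes the pooled \(t\)-statistic and its one-sided \(p\)-value with Python's scipy.stats in place of the \(t\) table:

```python
from math import sqrt
from scipy.stats import t as t_dist

# Summary statistics from the pea-stem example
n, xbar, sx2 = 11, 1.03, 0.24
m, ybar, sy2 = 13, 1.66, 0.35
df = n + m - 2

# Pooled estimate of the common variance and the two-sample t-statistic
sp2 = ((n - 1) * sx2 + (m - 1) * sy2) / df
t = (xbar - ybar) / sqrt(sp2 * (1 / n + 1 / m))

# One-sided p-value under the t(22) distribution
p_value = t_dist.cdf(t, df)

print(round(t, 2))        # -2.81, below the critical value -1.717
print(round(p_value, 3))  # approximately 0.005
```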
Based on independent random samples of sizes \(n\) and \(m\), let \(\bar{x}\), \(\bar{y}\), and \(s_{p}^{2}\) represent the observed unbiased estimates of the respective parameters \(\mu_{X}\), \(\mu_{Y}\), and \(\sigma_{X}^{2}=\sigma_{Y}^{2}\) of two normal distributions with common variance. Then \(\alpha\)-level tests of certain hypotheses are given in the following table, when \(\sigma_{X}^{2}=\sigma_{Y}^{2}\). If the common-variance assumption is violated, but not too badly, the test is still satisfactory, although the significance levels are only approximate.
\[\begin{array}{ccc}
\mbox{Table D} &&\\
\hline H_{0} & H_{1} & \mbox{Critical Region}\\
\hline \mu_{X}=\mu_{Y} & \mu_{X}>\mu_{Y} & t\geq t_{\alpha}(n+m-2)\mbox{ or }\bar{x}-\bar{y}\geq t_{\alpha}(n+m-2)\cdot s_{p}\cdot\sqrt{1/n+1/m}\\
\mu_{X}=\mu_{Y} & \mu_{X}<\mu_{Y} & t\leq -t_{\alpha}(n+m-2)\mbox{ or }\bar{x}-\bar{y}\leq -t_{\alpha}(n+m-2)\cdot s_{p}\cdot\sqrt{1/n+1/m}\\
\mu_{X}=\mu_{Y} & \mu_{X}\neq\mu_{Y} & |t|\geq t_{\alpha /2}(n+m-2)\mbox{ or }|\bar{x}-\bar{y}|\geq t_{\alpha /2}(n+m-2)\cdot s_{p}\cdot\sqrt{1/n+1/m}\\
\hline\end{array}\]
\begin{equation}{\label{ex2}}\tag{2}\mbox{}\end{equation}
Example \ref{ex2}. A product is packaged using a machine with \(24\) filler heads numbered \(1\) to \(24\), with the odd-numbered heads on one side of the machine and the even-numbered heads on the other side. Let \(X\) and \(Y\) equal the fill weights in grams when a package is filled by an odd-numbered head and an even-numbered head, respectively. Assume that the distributions of \(X\) and \(Y\) are \(N(\mu_{X},\sigma_{X}^{2})\) and \(N(\mu_{Y},\sigma_{Y}^{2})\), respectively, and that \(X\) and \(Y\) are independent. We would like to test the null hypothesis \(H_{0}:\mu_{X}-\mu_{Y}=0\) against the alternative hypothesis \(H_{1}:\mu_{X}-\mu_{Y}\neq 0\). To perform the test, after the machine has been set up and is running, we shall select one package at random from each filler head and weigh it. At an \(\alpha =0.1\) significance level, the critical region is \(|t|\geq t_{0.05}(22)=1.717\). For the \(n=12\) observations of \(X\), we have \(\bar{x}=1076.75\) and \(s_{X}^{2}=29.3\). For the \(m=12\) observations of \(Y\), we have \(\bar{y}=1072.33\) and \(s_{Y}^{2}=26.24\). The calculated value of the test statistic is given by
\[t=\frac{1076.75-1072.33}{\sqrt{\left (\frac{11\cdot 29.3+11\cdot 26.24}{22}\right )\left (\frac{1}{12}+\frac{1}{12}\right )}}=2.05.\]
Since \(|t|=|2.05|>1.717\), the null hypothesis is rejected at the \(\alpha =0.1\) significance level. Note that \(|t|=2.05<2.074=t_{0.025}(22)\), so the null hypothesis would not be rejected at an \(\alpha =0.05\) significance level. That is, the \(p\)-value is between \(0.05\) and \(0.1\). \(\sharp\)
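The bracketing of the \(p\)-value claimed here can be verified directly; a minimal sketch using scipy.stats (our choice of tool, since the text works from tables):

```python
from math import sqrt
from scipy.stats import t as t_dist

# Summary statistics from the filler-head example
n, xbar, sx2 = 12, 1076.75, 29.3
m, ybar, sy2 = 12, 1072.33, 26.24
df = n + m - 2

sp2 = ((n - 1) * sx2 + (m - 1) * sy2) / df
t = (xbar - ybar) / sqrt(sp2 * (1 / n + 1 / m))

# Two-sided p-value; it should fall between 0.05 and 0.1
p_value = 2 * t_dist.sf(abs(t), df)
print(round(t, 2), 0.05 < p_value < 0.1)   # 2.05 True
```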
We would like to give two modifications of tests about two means. Assume that we know the variances of \(X\) and \(Y\). The appropriate test statistic to use for testing \(H_{0}:\mu_{X}=\mu_{Y}\) is
\begin{equation}{\label {ch2eq1}}\tag{3}
Z=\frac{\bar{X}-\bar{Y}}{\sqrt{\frac{\sigma_{X}^{2}}{n}+\frac{\sigma_{Y}^{2}}{m}}}
\end{equation}
which has a standard normal distribution when the null hypothesis is true, and the populations are normally distributed. When the variances are unknown and the sample sizes are large, we just replace \(\sigma_{X}^{2}\) with \(S_{X}^{2}\) and \(\sigma_{Y}^{2}\) with \(S_{Y}^{2}\) in equation (\ref{ch2eq1}). The resulting statistic will have an approximate \(N(0,1)\) distribution.
Example. The target thickness for fruit-flavored gum and for fruit-flavored bubble gum is \(6.7\) hundredths of an inch. Let the independent random variables \(X\) and \(Y\) equal the respective thicknesses of these gums in hundredths of an inch. Assume that their distributions are \(N(\mu_{X},\sigma_{X}^{2})\) and \(N(\mu_{Y},\sigma_{Y}^{2})\), respectively. Since bubble gum has more elasticity than regular gum, it seems as if it would be harder to roll it out to the correct thickness. Therefore, we shall test the null hypothesis \(H_{0}:\mu_{X}=\mu_{Y}\) against the alternative hypothesis \(H_{1}:\mu_{X}<\mu_{Y}\) using samples of sizes \(n=50\) and \(m=40\). Since the variances are unknown and the sample sizes are large, the test statistic is
\[Z=\frac{\bar{X}-\bar{Y}}{\sqrt{\frac{S_{X}^{2}}{50}+\frac{S_{Y}^{2}}{40}}}.\]
At an approximate significance level of \(\alpha =0.01\), the critical region is \(z\leq -z_{0.01}=-2.326\). The observed sample mean and sample standard deviation of \(X\) are \(\bar{x}=6.701\) and \(s_{X}=0.108\), respectively, and the observed sample mean and sample standard deviation of \(Y\) are \(\bar{y}=6.841\) and \(s_{Y}=0.155\), respectively. Since the calculated value of the test statistic is
\begin{align*} z & =\frac{6.701-6.841}{\sqrt{\frac{0.108^{2}}{50}+\frac{0.155^{2}}{40}}}\\ & =-4.848<-2.326,\end{align*}
the null hypothesis is clearly rejected. \(\sharp\)
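A quick numerical check of this large-sample \(Z\)-test, treating \(0.108\) and \(0.155\) as the observed standard deviations (as the displayed calculation, which squares them, indicates); scipy.stats supplies the normal quantile:

```python
from math import sqrt
from scipy.stats import norm

# Gum-thickness data: sample standard deviations, not variances
n, xbar, sx = 50, 6.701, 0.108
m, ybar, sy = 40, 6.841, 0.155

z = (xbar - ybar) / sqrt(sx**2 / n + sy**2 / m)
z_crit = -norm.ppf(0.99)   # -z_{0.01} = -2.326

print(round(z, 3))      # -4.848
print(z <= z_crit)      # True: reject H0 at the approximate 0.01 level
```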
We now give a test for the equality of two variances when sampling from normal populations. Let the independent random variables \(X\) and \(Y\) have respective distributions that are \(N(\mu_{X},\sigma_{X}^{2})\) and \(N(\mu_{Y},\sigma_{Y}^{2})\). To test the null hypothesis \(H_{0}:\sigma_{X}^{2}/\sigma_{Y}^{2}=1\) (or, equivalently, \(\sigma_{X}^{2}=\sigma_{Y}^{2}\)), we take random samples of \(n\) observations of \(X\) and \(m\) observations of \(Y\). Recall that \((n-1)S_{X}^{2}/\sigma_{X}^{2}\) and \((m-1)S_{Y}^{2}/\sigma_{Y}^{2}\) have independent chi-square distributions \(\chi^{2}(n-1)\) and \(\chi^{2}(m-1)\), respectively. Therefore, when \(H_{0}\) is true, we have
\begin{align*} F & =\frac{(n-1)S_{X}^{2}/(n-1)\sigma_{X}^{2}}{(m-1)S_{Y}^{2}/(m-1)\sigma_{Y}^{2}}\\ & =\frac{S_{X}^{2}}{S_{Y}^{2}},\end{align*}
which has an \(F\)-distribution with \(r_{1}=n-1\) and \(r_{2}=m-1\) degrees of freedom. This \(F\) statistic is our test statistic. When \(H_{0}\) is true, we would expect the observed value of \(F\) to be close to \(1\). Three possible alternative hypotheses, along with critical regions of size \(\alpha\), are summarized in the following table. Recall that \(1/F\), the reciprocal of \(F\), has an \(F\)-distribution with \(m-1\) and \(n-1\) degrees of freedom, so all critical regions may be written in terms of right-tail rejection regions, allowing the critical values to be selected easily from standard \(F\) tables.
\[\begin{array}{ccc}
\mbox{Table E} &&\\
\hline H_{0} & H_{1} & \mbox{Critical Region}\\
\hline \sigma_{X}^{2}=\sigma_{Y}^{2} & \sigma_{X}^{2}>\sigma_{Y}^{2} & s_{X}^{2}/s_{Y}^{2}\geq F_{\alpha}(n-1,m-1)\\
\sigma_{X}^{2}=\sigma_{Y}^{2} & \sigma_{X}^{2}<\sigma_{Y}^{2} & s_{Y}^{2}/s_{X}^{2}\geq F_{\alpha}(m-1,n-1)\\
\sigma_{X}^{2}=\sigma_{Y}^{2} & \sigma_{X}^{2}\neq\sigma_{Y}^{2} & s_{X}^{2}/s_{Y}^{2}\geq F_{\alpha /2}(n-1,m-1)\mbox{ or } s_{Y}^{2}/s_{X}^{2}\geq F_{\alpha /2}(m-1,n-1)\\
\hline\end{array}\]
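Table E can be packaged as a small helper; a sketch assuming scipy.stats is available (the function name and signature are ours):

```python
from scipy.stats import f

def equal_variance_f_test(sx2, sy2, n, m, alpha, alternative):
    """F-test of H0: sigma_X^2 = sigma_Y^2 as in Table E.

    alternative is 'greater', 'less', or 'two-sided' for the three rows.
    Returns True when H0 is rejected. Note that f.ppf(1 - a, r1, r2) is
    the upper-tail critical value written F_a(r1, r2) in the text.
    """
    if alternative == "greater":
        return sx2 / sy2 >= f.ppf(1 - alpha, n - 1, m - 1)
    if alternative == "less":
        return sy2 / sx2 >= f.ppf(1 - alpha, m - 1, n - 1)
    return (sx2 / sy2 >= f.ppf(1 - alpha / 2, n - 1, m - 1)
            or sy2 / sx2 >= f.ppf(1 - alpha / 2, m - 1, n - 1))

# Spider-length data from the example below: reject at alpha = 0.01
print(equal_variance_f_test(0.4399, 1.41, 30, 30, 0.01, "less"))      # True
# Pea-stem data: the two-sided test does not reject equal variances
print(equal_variance_f_test(0.24, 0.35, 11, 13, 0.05, "two-sided"))   # False
```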
Example. A biologist who studies spiders believes that not only do female green lynx spiders tend to be longer than their male counterparts but also that the lengths of the female spiders seem to vary more than those of the male spiders. We shall test whether the latter belief is true. Suppose that the distribution of the length \(X\) of male spiders is \(N(\mu_{X},\sigma_{X}^{2})\) and the length \(Y\) of female spiders is \(N(\mu_{Y},\sigma_{Y}^{2})\), and \(X\) and \(Y\) are independent. We shall test \(H_{0}:\sigma_{X}^{2}/\sigma_{Y}^{2}=1\) (i.e. \(\sigma_{X}^{2}=\sigma_{Y}^{2}\)) against the alternative hypothesis \(H_{1}:\sigma_{X}^{2}/\sigma_{Y}^{2}<1\) (i.e. \(\sigma_{X}^{2}<\sigma_{Y}^{2}\)). If we use \(n=30\) and \(m=30\) observations of \(X\) and \(Y\), respectively, a critical region that has a significance level of \(\alpha =0.01\) is
\[\frac{s_{Y}^{2}}{s_{X}^{2}}\geq F_{0.01}(29,29)=2.42,\]
approximately. The observations of \(X\) yield \(\bar{x}=5.917\) and \(s_{X}^{2}=0.4399\), while the observations of \(Y\) yield \(\bar{y}=8.153\) and \(s_{Y}^{2}=1.41\). Since
\[\frac{s_{Y}^{2}}{s_{X}^{2}}=\frac{1.41}{0.4399}=3.2053>2.42,\]
the null hypothesis is rejected in favor of the biologist’s belief. \(\sharp\)
Example. For Example \ref{ex1}, given \(n=11\) observations of \(X\) and \(m=13\) observations of \(Y\), where \(X\) is \(N(\mu_{X},\sigma_{X}^{2})\) and \(Y\) is \(N(\mu_{Y},\sigma_{Y}^{2})\), we shall test the null hypothesis \(H_{0}:\sigma_{X}^{2}/\sigma_{Y}^{2}=1\) against a two-sided alternative hypothesis. At an \(\alpha =0.05\) significance level, \(H_{0}\) is rejected if
\[\frac{s_{X}^{2}}{s_{Y}^{2}}\geq F_{0.025}(10,12)=3.37\]
or
\[\frac{s_{Y}^{2}}{s_{X}^{2}}\geq F_{0.025}(12,10)=3.62.\]
Using the data in Example \ref{ex1}, we obtain
\[\frac{s_{X}^{2}}{s_{Y}^{2}}=\frac{0.24}{0.35}=0.686\mbox{ and }\frac{s_{Y}^{2}}{s_{X}^{2}}=1.458,\]
which says that we do not reject \(H_{0}\). Therefore, the assumption of equal variances for the \(t\)-statistic that was used in Example \ref{ex1} seems to be valid. We also have
\[F_{0.975}(10,12)=1/F_{0.025}(12,10)=1/3.62=0.276.\]
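The reciprocal relationship used here holds for every \(F\)-quantile and is easy to confirm numerically; a sketch with scipy.stats, where the lower-tail `ppf` at \(0.025\) corresponds to the text's upper-tail \(F_{0.975}\):

```python
from scipy.stats import f

# F_{0.975}(10, 12) = 1 / F_{0.025}(12, 10): the lower-tail quantile with
# (10, 12) degrees of freedom equals the reciprocal of the upper-tail
# quantile with the degrees of freedom reversed
lower = f.ppf(0.025, 10, 12)
upper = f.ppf(0.975, 12, 10)

print(round(upper, 2))          # 3.62, the tabled F_{0.025}(12, 10)
print(round(lower, 3))          # 0.276, matching 1/3.62
print(abs(lower - 1 / upper))   # essentially zero
```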
Example. For Example \ref{ex2}, we shall test the null hypothesis \(H_{0}: \sigma_{X}^{2}/\sigma_{Y}^{2}=1\) against a two-sided alternative hypothesis. Since
\begin{align*} \frac{s_{X}^{2}}{s_{Y}^{2}} & =\frac{29.3}{26.24}\\ & =1.12<F_{0.025}(11,11),\end{align*}
where \(F_{0.025}(11,11)\) must lie between \(F_{0.025}(12,12)=3.28\) and \(F_{0.025}(10,10)=3.72\) from the statistical tables, the null hypothesis \(H_{0}\) is not rejected. Hence, the assumption of equal variances seems reasonable. \(\sharp\)
The Best Critical Region.
Let us consider a test of the simple hypothesis \(H_{0}:\theta =\theta_{0}\) against a simple alternative hypothesis \(H_{1}:\theta =\theta_{1}\). In this discussion, we assume that the random variables \(X_{1},X_{2},\cdots ,X_{n}\) under consideration have a joint p.d.f. of the discrete type, which is denoted by \(L(\theta ;x_{1},\cdots ,x_{n})\). That is,
\[\mathbb{P}(X_{1}=x_{1},X_{2}=x_{2},\cdots ,X_{n}=x_{n})=L(\theta ;x_{1},\cdots ,x_{n}).\]
A critical region of size \(\alpha\) is a set of points \((x_{1},x_{2},\cdots ,x_{n})\) with probability of \(\alpha\) when \(\theta =\theta_{0}\). For a good test, this set \(C\) of points should have a large probability when \(\theta =\theta_{1}\), since, under \(H_{1}:\theta =\theta_{1}\), we wish to reject \(H_{0}:\theta =\theta_{0}\). Therefore, the first point we would place in the critical region \(C\) is the one with the smallest ratio
\[\frac{L(\theta_{0};x_{1},x_{2},\cdots ,x_{n})}{L(\theta_{1};x_{1},x_{2},\cdots ,x_{n})}.\]
That is, the “cost” in terms of probability under \(H_{0}:\theta =\theta_{0}\) is small compared to the probability that we can “buy” if \(\theta =\theta_{1}\). The next point to add to \(C\) would be the one with the next smallest ratio. We would continue to add points to \(C\) in this manner until the probability of \(C\), under \(H_{0}:\theta =\theta_{0}\), equals \(\alpha\). In this way, we have achieved, for the significance level \(\alpha\), the region \(C\) with the largest probability when \(H_{1}:\theta =\theta_{1}\) is true.
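This greedy construction can be carried out explicitly for a small discrete model. The following sketch uses a single observation \(Y\sim B(5,p)\) with \(H_{0}:p=0.5\), \(H_{1}:p=0.8\), and a target size of \(0.2\) (all illustrative numbers of our choosing):

```python
from math import comb

n = 5  # a single Binomial(5, p) observation

def L(p, y):
    """Likelihood of observing y successes."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

# Order the sample points by the ratio L(p0)/L(p1), smallest first,
# then add points to C until the size under H0 would exceed the target.
points = sorted(range(n + 1), key=lambda y: L(0.5, y) / L(0.8, y))
C, size = [], 0.0
for y in points:
    if size + L(0.5, y) > 0.2:
        break
    C.append(y)
    size += L(0.5, y)

# The power is the probability of C under H1
power = sum(L(0.8, y) for y in C)
print(sorted(C), round(size, 4), round(power, 4))   # [4, 5] 0.1875 0.7373
```

The region that emerges is a right tail in \(y\), anticipating the monotone critical regions derived later for the binomial.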
Definition. Consider the test of the simple null hypothesis \(H_{0}:\theta =\theta_{0}\) against the simple alternative hypothesis \(H_{1}:\theta =\theta_{1}\). Let \(C\) be a critical region of size \(\alpha\); that is, \(\alpha =\mathbb{P}(C;\theta_{0})\). Then \(C\) is a best critical region of size \(\alpha\) if, for every other critical region \(D\) of size \(\alpha =\mathbb{P}(D;\theta_{0})\), we have
\[\mathbb{P}(C;\theta_{1})\geq\mathbb{P}(D;\theta_{1}).\]
In other words, when \(H_{1}:\theta =\theta_{1}\) is true, the probability of rejecting \(H_{0}:\theta =\theta_{0}\) using the critical region \(C\) is at least as great as the corresponding probability using any other critical region \(D\) of size \(\alpha\). \(\sharp\)
Therefore, a best critical region of size \(\alpha\) is the critical region that has the greatest power among all critical regions of size \(\alpha\).
Theorem. (Neyman-Pearson Lemma). Let \(X_{1},X_{2},\cdots ,X_{n}\) be a random sample of size \(n\) from a distribution with p.d.f. \(f(x;\theta )\), where \(\theta_{0}\) and \(\theta_{1}\) are two possible values of \(\theta\). Denote the joint p.d.f. of \(X_{1},X_{2},\cdots ,X_{n}\) by the likelihood function
\[L(\theta )=L(\theta ;x_{1},\cdots ,x_{n})=f(x_{1};\theta )\cdots f(x_{n};\theta ).\]
If there exists a constant \(k\) and a subset \(C\) of the sample space satisfying
- \(\mathbb{P}((X_{1},X_{2},\cdots ,X_{n})\in C;\theta_{0})=\alpha\);
- \({\displaystyle \frac{L(\theta_{0})}{L(\theta_{1})}\leq k}\) for \((x_{1},x_{2},\cdots ,x_{n})\in C\);
- \({\displaystyle \frac{L(\theta_{0})}{L(\theta_{1})}\geq k}\) for \((x_{1},x_{2},\cdots ,x_{n})\in C’\),
then \(C\) is a best critical region of size \(\alpha\) for testing the simple hypothesis \(H_{0}:\theta =\theta_{0}\) against the simple alternative hypothesis \(H_{1}:\theta =\theta_{1}\). \(\sharp\)
Example. Let \(X_{1},X_{2},\cdots ,X_{n}\) be a random sample from the normal distribution \(N(\mu ,36)\). We shall find the best critical region for testing the simple hypothesis \(H_{0}:\mu =50\) against the simple alternative hypothesis \(H_{1}:\mu =55\). Using the ratio of likelihood functions, namely \(L(50)/L(55)\), we shall find those points in the sample space for which this ratio is less than or equal to a constant \(k\). That is, we shall solve the following inequality:
\begin{align*} \frac{L(50)}{L(55)} & =\frac{(72\pi )^{-n/2}\cdot\exp\left [-\left (\frac{1}{72}\right )\sum_{i=1}^{n} (x_{i}-50)^{2}\right ]}{(72\pi )^{-n/2}\cdot\exp\left [-\left (\frac{1}{72}\right )\sum_{i=1}^{n} (x_{i}-55)^{2}\right ]}\\ & =\exp\left [-\left (\frac{1}{72}\right )\left (10\sum_{i=1}^{n}
x_{i}+n\cdot 50^{2}-n\cdot 55^{2}\right )\right ]\leq k.\end{align*}
If we take the natural logarithm of each member of the inequality, we find
\[-10\sum_{i=1}^{n} x_{i}-n\cdot 50^{2}+n\cdot 55^{2}\leq 72\ln k.\]
Therefore, we obtain
\[\frac{1}{n}\sum_{i=1}^{n} x_{i}\geq -\frac{1}{10n}(n\cdot 50^{2}-n\cdot 55^{2}+72\ln k).\]
Equivalently, we have \(\bar{x}\geq c\), where
\[c=-(1/10n)(n\cdot 50^{2}-n\cdot 55^{2}+72\ln k).\]
Therefore, \(L(50)/L(55)\leq k\) is equivalent to \(\bar{x}\geq c\). According to the Neyman-Pearson lemma, the best critical region is
\[C=\{(x_{1},x_{2},\cdots ,x_{n}):\bar{x}\geq c\},\]
where \(c\) is selected such that the size of the critical region is \(\alpha\). We take \(n=16\) and \(c=53\). Since \(\bar{X}\) is \(N(50,36/16)\) under \(H_{0}\), we have
\begin{align*} \alpha & =\mathbb{P}(\bar{X}\geq 53;\mu =50)\\ & =\mathbb{P}\left (\frac{\bar{X}-50}{6/4}\geq\frac{3}{6/4};\mu =50\right )\\ & =1-\Phi (2)=0.0228.\end{align*}
\(\sharp\)
The above example illustrates what is often true: the inequality
\[L(\theta_{0})/L(\theta_{1})\leq k\]
can be expressed in terms of a statistic \(u(X_{1},\cdots ,X_{n})\) as \(u(x_{1},\cdots ,x_{n})\leq c_{1}\) or \(u(x_{1},\cdots ,x_{n})\geq c_{2}\), where \(c_{1}\) or \(c_{2}\) is selected such that the size of the critical region is \(\alpha\). Therefore, the test can be based on the statistic \(u(X_{1},\cdots ,X_{n})\). For illustration, if we want \(\alpha\) to be a given value, say \(0.05\), we choose \(c_{1}\) or \(c_{2}\) accordingly. In the example, with \(\alpha =0.05\), we want
\begin{align*} 0.05 & =\mathbb{P}(\bar{X}\geq c;\mu =50)\\ & =\mathbb{P}\left (\frac{\bar{X}-50}{6/4}\geq\frac{c-50}{6/4};\mu =50\right )\\ & =1-\Phi\left (\frac{c-50}{6/4}\right ).\end{align*}
Hence, we must have \((c-50)/(3/2)=1.645\), or equivalently,
\[c=50+\frac{3}{2}\cdot 1.645\approx 52.47.\]
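Both numerical results in this example are reproduced by the sketch below (scipy.stats supplies \(\Phi\) and its inverse):

```python
from scipy.stats import norm

# Size of the test with c = 53: P(Xbar >= 53; mu = 50), Xbar ~ N(50, 36/16)
alpha_at_53 = norm.sf((53 - 50) / (6 / 4))
print(round(alpha_at_53, 4))   # 0.0228

# Critical value giving size 0.05: c = 50 + (3/2) * z_{0.05}
c = 50 + (6 / 4) * norm.ppf(0.95)
print(round(c, 2))             # 52.47
```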
Example. Let \(X_{1},X_{2},\cdots ,X_{n}\) denote a random sample of size \(n\) from a Poisson distribution with mean \(\lambda\). A best critical region for testing \(H_{0}:\lambda =2\) against \(H_{1}:\lambda =5\) is given by
\[\frac{L(2)}{L(5)}=\frac{2^{\sum x_{i}}\cdot e^{-2n}}{x_{1}!x_{2}!\cdots
x_{n}!}\cdot\frac{x_{1}!x_{2}!\cdots x_{n}!}{5^{\sum x_{i}}\cdot e^{-5n}}\leq k.\]
This inequality is equivalent to
\[\left (\frac{2}{5}\right )^{\sum x_{i}}\cdot e^{3n}\leq k\]
and
\[\sum_{i=1}^{n} x_{i}\cdot\ln\left (\frac{2}{5}\right )+3n\leq \ln k.\]
Since \(\ln (2/5)<0\), this is the same as
\[\sum_{i=1}^{n} x_{i}\geq\frac{\ln k-3n}{\ln (2/5)}=c.\]
For \(n=4\) and \(c=13\), we have
\begin{align*} \alpha & =\mathbb{P}\left (\sum_{i=1}^{4} X_{i}\geq 13;\lambda =2\right )\\ & =1-0.936=0.064,\end{align*}
from the statistical tables, since \(\sum_{i=1}^{4} X_{i}\) has a Poisson distribution with mean \(8\) when \(\lambda =2\). \(\sharp\)
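The tabled value can be reproduced with scipy.stats; computing the power at \(\lambda =5\) the same way is a check we add, not part of the original example:

```python
from scipy.stats import poisson

# Under H0 (lambda = 2), the sum of the 4 observations is Poisson(8)
alpha = poisson.sf(12, 8)   # P(sum >= 13) = 1 - P(sum <= 12)
print(round(alpha, 3))      # 0.064

# Under H1 (lambda = 5), the sum is Poisson(20); this is the power
power = poisson.sf(12, 20)
print(round(power, 3))      # roughly 0.96
```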
When \(H_{0}:\theta =\theta_{0}\) and \(H_{1}:\theta =\theta_{1}\) are both simple hypotheses, a critical region of size \(\alpha\) is a best critical region if the probability of rejecting \(H_{0}\) when \(H_{1}\) is true is a maximum when compared with all other critical regions of size \(\alpha\). The test using the best critical region is called a most powerful
test, since it has the greatest value of the power function at \(\theta =\theta_{1}\) when compared with that of other tests of significance level \(\alpha\). If \(H_{1}\) is a composite hypothesis, the power of a test depends on each simple alternative in \(H_{1}\).
Definition. A test, defined by a critical region \(C\) of size \(\alpha\), is a uniformly most powerful test when it is a most powerful test against each simple alternative in \(H_{1}\). The critical region \(C\) is called a uniformly most powerful critical region of size \(\alpha\). \(\sharp\)
Example. Let \(X_{1},X_{2},\cdots ,X_{n}\) be a random sample from \(N(\mu ,36)\). We have seen that when testing \(H_{0}:\mu =50\) against \(H_{1}:\mu =55\), a best critical region \(C\) is defined by
\[C=\{(x_{1},\cdots ,x_{n}):\bar{x}\geq c\},\]
where \(c\) is selected such that the significance level is \(\alpha\). Now, we consider testing \(H_{0}:\mu =50\) against the one-sided composite alternative hypothesis \(H_{1}:\mu >50\). For each simple hypothesis in \(H_{1}\), say \(\mu =\mu_{1}\), the quotient of the likelihood functions is
\begin{align*} \frac{L(50)}{L(\mu_{1})} & =\frac{(72\pi )^{-n/2}\cdot\exp\left [-\left (\frac{1}{72}\right )\sum_{i=1}^{n} (x_{i}-50)^{2}\right ]}{(72\pi )^{-n/2}\cdot\exp\left [-\left (
\frac{1}{72}\right )\sum_{i=1}^{n} (x_{i}-\mu_{1})^{2}\right ]}\\ & =\exp\left [-\left (\frac{1}{72}\right )\left (2(\mu_{1}-50)\sum_{i=1}^{n}
x_{i}+n(50^{2}-\mu_{1}^{2})\right )\right ].\end{align*}
Now \(L(50)/L(\mu_{1})\leq k\) if and only if
\[\bar{x}\geq\frac{(-72)\ln k}{2n(\mu_{1}-50)}+\frac{50+\mu_{1}}{2}=c.\]
Therefore, the best critical region of size \(\alpha\) for testing \(H_{0}:\mu =50\) against \(H_{1}:\mu =\mu_{1}\), where \(\mu_{1}>50\), is given by
\[C=\{(x_{1},\cdots ,x_{n}):\bar{x}\geq c\},\]
where \(c\) is selected such that \(\mathbb{P}(\bar{X}\geq c;H_{0}:\mu =50)=\alpha\). Note that the same value of \(c\) can be used for each \(\mu_{1}>50\), but \(k\) does not remain the same. Since the critical region \(C\) defines a test that is most powerful against each simple alternative \(\mu_{1}>50\), this is a uniformly most powerful test, and \(C\) is a uniformly most powerful critical region of size \(\alpha\). Again, if \(\alpha =0.05\), then \(c\approx 52.47\). \(\sharp\)
Example. Let \(Y\) have the binomial distribution \(B(n,p)\). To find a uniformly most powerful test of the simple null hypothesis \(H_{0}:p=p_{0}\) against the one-sided alternative hypothesis \(H_{1}:p>p_{0}\), we consider, with \(p_{1}>p_{0}\),
\[\frac{L(p_{0})}{L(p_{1})}=\frac{C^{n}_{y}p_{0}^{y}(1-p_{0})^{n-y}}{C^{n}_{y}p_{1}^{y}(1-p_{1})^{n-y}}\leq k.\]
This is equivalent to
\[\left (\frac{p_{0}(1-p_{1})}{p_{1}(1-p_{0})}\right )^{y}\left (\frac{1-p_{0}}{1-p_{1}}\right )^{n}\leq k\]
and
\[y\ln\left (\frac{p_{0}(1-p_{1})}{p_{1}(1-p_{0})}\right )\leq\ln k-n\ln\left (\frac{1-p_{0}}{1-p_{1}}\right ).\]
Since \(p_{0}<p_{1}\) and \(p_{0}(1-p_{1})<p_{1}(1-p_{0})\), it follows that \(\ln [p_{0}(1-p_{1})/p_{1}(1-p_{0})]<0\). Therefore, we have
\[\frac{y}{n}\geq\frac{\ln k-n\ln [(1-p_{0})/(1-p_{1})]}{n\ln [p_{0}(1-p_{1})/p_{1}(1-p_{0})]}=c\]
for each \(p_{1}>p_{0}\). It is interesting to note that if the alternative hypothesis is the one-sided \(H_{1}:p<p_{0}\), then a uniformly most powerful test is of the form \((y/n)\leq c\). \(\sharp\)
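Because the critical region \((y/n)\geq c\) does not depend on the particular \(p_{1}>p_{0}\), the same region works uniformly. A sketch with illustrative numbers of our choosing (\(n=20\), \(p_{0}=0.3\), target \(\alpha =0.05\)) finds the smallest right-tail cutoff:

```python
from scipy.stats import binom

n, p0, alpha = 20, 0.3, 0.05

# binom.sf(y - 1, n, p0) = P(Y >= y): find the smallest cutoff whose
# right-tail probability under H0 does not exceed alpha
y_star = min(y for y in range(n + 1) if binom.sf(y - 1, n, p0) <= alpha)
size = binom.sf(y_star - 1, n, p0)

print(y_star, round(size, 4))   # the exact size is a bit below 0.05
```

The discreteness of \(Y\) means the exact size usually falls short of the nominal \(\alpha\), which is why the cutoff is chosen as the smallest value keeping the tail at or below \(\alpha\).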
If a sufficient statistic \(Y=u(X_{1},X_{2},\cdots ,X_{n})\) exists for \(\theta\), then, by the factorization theorem,
\begin{align*} \frac{L(\theta_{0})}{L(\theta_{1})} & =\frac{\phi (u(x_{1},\cdots ,x_{n});\theta_{0})h(x_{1},\cdots ,x_{n})}{\phi (u(x_{1},\cdots ,x_{n});\theta_{1})h(x_{1},\cdots ,x_{n})}\\ & =
\frac{\phi (u(x_{1},\cdots ,x_{n});\theta_{0})}{\phi (u(x_{1},\cdots ,x_{n});\theta_{1})}.\end{align*}
Therefore, \(L(\theta_{0})/L(\theta_{1})\leq k\) provides a critical region that is a function of the observations \(x_{1},x_{2},\cdots ,x_{n}\) only through the observed value of the sufficient statistic \(y=u(x_{1},\cdots , x_{n})\). Hence, best critical regions and uniformly most powerful critical regions are based upon the sufficient statistic when one exists.
\begin{equation}{\label{d}}\tag{D}\mbox{}\end{equation}
Likelihood Ratio Tests.
Let \(\Omega\) denote the total parameter space, i.e., the set of all possible values of the parameter \(\theta\) given by either \(H_{0}\) or \(H_{1}\). The hypotheses will be stated as follows
\[H_{0}:\theta\in\omega\mbox{ against }H_{1}:\theta\in\omega^{\prime},\]
where \(\omega\) is a subset of \(\Omega\) and \(\omega^{\prime}\) is the complement of \(\omega\) with respect to \(\Omega\). The test will be constructed by using the ratio of likelihood functions that have been maximized in \(\omega\) and \(\Omega\), respectively. In a sense, this is a natural generalization of the ratio appearing in the Neyman-Pearson
lemma when the two hypotheses are simple.
Definition. The likelihood ratio is the quotient
\[\lambda =\frac{L(\widehat{\omega})}{L(\widehat{\Omega})},\]
where \(L(\widehat{\omega})\) is the maximum of the likelihood function with respect to \(\theta\) when \(\theta\in\omega\), and \(L(\widehat{\Omega})\) is the maximum of the likelihood function with respect to \(\theta\) when \(\theta\in\Omega\). \(\sharp\)
Since \(\lambda\) is the quotient of nonnegative functions, we have \(\lambda\geq 0\). Since \(\omega\subset\Omega\), we have \(L(\widehat{\omega})\leq L(\widehat{\Omega})\), which says \(\lambda\leq 1\). Therefore, we obtain \(0\leq\lambda\leq 1\). If the maximum of \(L\) in \(\omega\) is much smaller than that in \(\Omega\), it would seem that the data \(x_{1},x_{2},\cdots ,x_{n}\) do not support the hypothesis \(H_{0}:\theta\in\omega\). That is, a small value of the ratio \(\lambda =L(\widehat{\omega})/L(\widehat{\Omega})\) would lead to the rejection of \(H_{0}\). On the other hand, a value of the ratio \(\lambda\) that is close to one would support the null hypothesis \(H_{0}\). This leads us to the following definition.
Definition. To test \(H_{0}:\theta\in\omega\) against \(H_{1}:\theta\in\omega^{\prime}\), the critical region for the likelihood ratio test is the set of points in the sample space for which
\[\lambda =\frac{L(\widehat{\omega})}{L(\widehat{\Omega})}\leq k,\]
where \(0<k<1\) and \(k\) is selected such that the test has a desired significance level \(\alpha\). \(\sharp\)
Example. Assume that the weight \(X\) in ounces of a “$10$-pound” bag of sugar is \(N(\mu ,5)\). We shall test the hypothesis \(H_{0}:\mu =162\) against the alternative hypothesis \(H_{1}:\mu\neq 162\). Therefore, we have
\[\Omega =\{\mu :-\infty <\mu <\infty\}\]
and \(\omega =\{162\}\). To find the likelihood ratio, we need \(L(\widehat{\omega})\) and \(L(\widehat{\Omega})\). When \(H_{0}\) is true, \(\mu\) can take only one value, namely \(\mu =162\). Therefore, we obtain \(L(\widehat{\omega})=L(162)\). To find \(L(\widehat{\Omega})\), we must find the value of \(\mu\) that maximizes \(L(\mu )\). We recall that \(\widehat{\mu}=\bar{x}\) is the maximum likelihood estimate of \(\mu\). Thus \(L(\widehat{\Omega})=L(\bar{x})\) and the likelihood ratio \(\lambda =L(\widehat{\omega})/L(\widehat{\Omega})\) is given by
\begin{align*} \lambda & =\frac{(10\pi )^{-n/2}\exp\left [-\left (\frac{1}{10}\right )\sum_{i=1}^{n} (x_{i}-162)^{2}\right ]}{(10\pi )^{-n/2}\exp\left [-\left (\frac{1}{10}\right )
\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}\right ]}\\
& =\frac{\exp\left [-\left (\frac{1}{10}\right )\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}-\left (\frac{n}{10}\right )(\bar{x}-162)^{2}\right ]}
{\exp\left [-\left (\frac{1}{10}\right )\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}\right ]}\\ & =\exp\left [-\frac{n}{10}(\bar{x}-162)^{2}\right ].\end{align*}
A value of \(\bar{x}\) close to \(162\) would tend to support \(H_{0}\) and in that case \(\lambda\) is close to \(1\). On the other hand, an \(\bar{x}\) that differs from \(162\) by too much would tend to support \(H_{1}\). A likelihood ratio critical region is given by \(\lambda\leq k\), where \(k\) is selected such that the significance level of the test is \(\alpha\). Using this criterion and simplifying the inequality as we do using the Neyman-Pearson lemma, we have that \(\lambda\leq k\) is equivalent to each of the following inequalities
\begin{align*} & -\left (\frac{n}{10}\right )(\bar{x}-162)^{2}\leq\ln k,\\
& (\bar{x}-162)^{2}\geq -\left (\frac{10}{n}\right )\ln k,\\
& \frac{|\bar{x}-162|}{\sigma /\sqrt{n}}\geq\frac{\sqrt{-(10/n)\ln k}}{\sigma /\sqrt{n}}=c.\end{align*}
Since \(Z=(\bar{X}-162)/(\sigma /\sqrt{n})\) is \(N(0,1)\) when \(H_{0}:\mu =162\) is true, let \(c=z_{\alpha /2}\). Therefore, the critical region is
\[C=\left\{\bar{x}:\frac{|\bar{x}-162|}{\sigma /\sqrt{n}}\geq z_{\alpha /2}\right\}.\]
For illustration, if \(\alpha =0.05\), we have \(z_{0.025}=1.96\). \(\sharp\)
Suppose that the random sample \(X_{1},X_{2},\cdots ,X_{n}\) arises from the normal distribution \(N(\mu ,\sigma^{2})\), where both \(\mu\) and \(\sigma^{2}\) are unknown. Let us consider the likelihood ratio test of the null hypothesis \(H_{0}:\mu =\mu_{0}\) against the two-sided alternative hypothesis \(H_{1}:\mu\neq\mu_{0}\). For this test
\[\omega =\{(\mu ,\sigma^{2}):\mu =\mu_{0}, 0<\sigma^{2}<\infty\}\]
and
\[\Omega =\{(\mu ,\sigma^{2}):-\infty <\mu <\infty , 0<\sigma^{2}<\infty\}.\]
If \((\mu ,\sigma^{2})\in\Omega\), the observed maximum likelihood estimates are \(\widehat{\mu}=\bar{x}\) and \(\widehat{\sigma^{2}}=(1/n)\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}\). Therefore, we have
\begin{align*} L(\widehat{\Omega}) & =\left [\frac{1}{2\pi\left (\frac{1}{n}\right )\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}}\right ]^{n/2}\exp\left [-\frac
{\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}}{\left (\frac{2}{n}\right )\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}}\right ]\\ & =\left [\frac{ne^{-1}}
{2\pi\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}}\right ]^{n/2}.\end{align*}
Similarly, if \((\mu ,\sigma^{2})\in\omega\), the observed maximum likelihood estimates are \(\widehat{\mu}=\mu_{0}\) and \(\widehat{\sigma^{2}}=(1/n)\sum_{i=1}^{n} (x_{i}-\mu_{0})^{2}\). Therefore, we have
\begin{align*} L(\widehat{\omega}) & =\left [\frac{1}{2\pi\left (\frac{1}{n}\right )\sum_{i=1}^{n} (x_{i}-\mu_{0})^{2}}\right ]^{n/2}\exp\left [-\frac
{\sum_{i=1}^{n} (x_{i}-\mu_{0})^{2}}{\left (\frac{2}{n}\right )\sum_{i=1}^{n} (x_{i}-\mu_{0})^{2}}\right ]\\ & =\left [\frac{ne^{-1}}
{2\pi\sum_{i=1}^{n} (x_{i}-\mu_{0})^{2}}\right ]^{n/2}.\end{align*}
The likelihood ratio \(\lambda =L(\widehat{\omega})/L(\widehat{\Omega})\) for this test is
\begin{align*} \lambda & =\frac{\left [\frac{ne^{-1}}{2\pi\sum_{i=1}^{n} (x_{i}-\mu_{0})^{2}}\right ]^{n/2}}{\left [\frac{ne^{-1}}
{2\pi\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}}\right ]^{n/2}}\\ & =\left [\frac{\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}}{\sum_{i=1}^{n} (x_{i}-\mu_{0})^{2}}\right ]^{n/2}.\end{align*}
Note that
\begin{align*} \sum_{i=1}^{n} (x_{i}-\mu_{0})^{2} & =\sum_{i=1}^{n} (x_{i}-\bar{x}+\bar{x}-\mu_{0})^{2}\\ & =\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}+n(\bar{x}-\mu_{0})^{2}.\end{align*}
If this substitution is made in the denominator of \(\lambda\), we have
\begin{align*} \lambda & =\left [\frac{\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}}{\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}+n(\bar{x}-\mu_{0})^{2}}\right ]^{n/2}\\
& =\left [\frac{1}{1+\frac{n(\bar{x}-\mu_{0})^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\right ]^{n/2}.\end{align*}
Note that \(\lambda\) is close to \(1\) when \(\bar{x}\) is close to \(\mu_{0}\), and it is small when \(\bar{x}\) and \(\mu_{0}\) differ by a great deal. The likelihood ratio test, given by the inequality \(\lambda\leq k\), is the same as
\[\frac{1}{1+\frac{n(\bar{x}-\mu_{0})^{2}}{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}}\leq k^{2/n}\]
or equivalently
\[\frac{n(\bar{x}-\mu_{0})^{2}}{\frac{\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}}{n-1}}\geq (n-1)(k^{-2/n}-1).\]
When \(H_{0}\) is true, we see that \(\sqrt{n}(\bar{X}-\mu_{0})/\sigma\) is \(N(0,1)\), and \(\sum_{i=1}^{n} (X_{i}-\bar{X})^{2}/\sigma^{2}\) has an independent chi-square distribution \(\chi^{2}(n-1)\). Hence, under \(H_{0}\),
\begin{align*} T & =\frac{\sqrt{n}(\bar{X}-\mu_{0})/\sigma}{\sqrt{\sum_{i=1}^{n}(X_{i}-\bar{X})^{2}/(\sigma^{2}(n-1))}}\\ & =\frac{\sqrt{n}(\bar{X}-\mu_{0})}
{\sqrt{\sum_{i=1}^{n} (X_{i}-\bar{X})^{2}/(n-1)}}\\ & =\frac{\bar{X}-\mu_{0}}{S/\sqrt{n}}\end{align*}
has a \(t\)-distribution with \(n-1\) degrees of freedom. In accordance with the likelihood ratio test criterion, \(H_{0}\) is rejected if the observed \(T^{2}\geq (n-1)(k^{-2/n}-1)\).
That is, we reject \(H_{0}:\mu =\mu_{0}\) and accept \(H_{1}:\mu\neq\mu_{0}\) if the observed \(|T|\geq t_{\alpha /2}(n-1)\).
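The algebraic identity behind this test, \(\lambda =[1+T^{2}/(n-1)]^{-n/2}\), can be checked numerically on simulated data; a sketch using numpy (the sample, its parameters, and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(5.0, 2.0, size=15)   # arbitrary simulated sample
n, mu0 = len(x), 4.5

def log_lik(mu, var):
    """Log-likelihood of the normal sample at mean mu and variance var."""
    return -0.5 * n * np.log(2 * np.pi * var) - np.sum((x - mu) ** 2) / (2 * var)

# lambda computed directly from the two maximized likelihoods:
# restricted MLE (mu0, mean squared deviation about mu0) versus
# unrestricted MLE (xbar, mean squared deviation about xbar)
xbar = x.mean()
lam_direct = np.exp(log_lik(mu0, np.mean((x - mu0) ** 2))
                    - log_lik(xbar, np.mean((x - xbar) ** 2)))

# lambda recovered from the t-statistic
s2 = np.sum((x - xbar) ** 2) / (n - 1)
T = (xbar - mu0) / np.sqrt(s2 / n)
lam_from_T = (1 + T**2 / (n - 1)) ** (-n / 2)

print(np.isclose(lam_direct, lam_from_T))   # True
```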
We observe that this test is exactly the same as the one listed in Table B for testing \(H_{0}:\mu =\mu_{0}\) against \(H_{1}:\mu\neq\mu_{0}\). That is, the test listed there is a likelihood ratio test. As a matter of fact, all six of the tests given in Tables B and D are likelihood ratio tests. Therefore, the examples associated with those tables are illustrations of the use of likelihood ratio tests.
Now, we consider a test about the variance of a normal population. Let \(X_{1},X_{2},\cdots ,X_{n}\) be a random sample from \(N(\mu ,\sigma^{2})\), where \(\mu\) and \(\sigma^{2}\) are unknown. We wish to test \(H_{0}:\sigma^{2}=\sigma^{2}_{0}\) against \(H_{1}:\sigma^{2}\neq\sigma_{0}^{2}\). For this, we have
\[\omega =\{(\mu ,\sigma^{2}):-\infty <\mu <\infty ,\sigma^{2}=\sigma_{0}^{2}\}\]
and
\[\Omega =\{(\mu ,\sigma^{2}):-\infty <\mu <\infty ,0<\sigma^{2}<\infty\}.\]
As in the test concerning the mean, we obtain
\[L(\widehat{\Omega})=\left [\frac{ne^{-1}}{2\pi\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}}\right ]^{n/2}.\]
If \((\mu ,\sigma^{2})\in\omega\), then \(\widehat{\mu}=\bar{x}\) and \(\widehat{\sigma^{2}}=\sigma_{0}^{2}\). Therefore, we have
\[L(\widehat{\omega})=\left (\frac{1}{2\pi\sigma_{0}^{2}}\right )^{n/2}\exp
\left [-\frac{\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}}{2\sigma_{0}^{2}}\right ].\]
Accordingly, the likelihood ratio test \(\lambda =L(\widehat{\omega})/L(\widehat{\Omega})\) is
\[\lambda =\left (\frac{w}{n}\right )^{n/2}\exp\left (-\frac{w}{2}+\frac{n}{2}\right )\leq k,\]
where
\[w=\sum_{i=1}^{n} (x_{i}-\bar{x})^{2}/\sigma_{0}^{2}.\]
Solving this inequality for \(w\), we obtain a solution of the form \(w\leq c_{1}\) or \(w\geq c_{2}\), where the constants \(c_{1}\) and \(c_{2}\) are appropriate functions of the constants \(k\) and \(n\) so as to achieve the desired significance level \(\alpha\). Since
\[W=\sum_{i=1}^{n} (X_{i}-\bar{X})^{2}/\sigma_{0}^{2}\]
is \(\chi^{2}(n-1)\) if \(H_{0}:\sigma^{2}=\sigma_{0}^{2}\) is true, most statisticians modify this test slightly by taking
\[c_{1}=\chi_{1-\alpha /2}^{2}(n-1)\]
and
\[c_{2}=\chi_{\alpha /2}^{2}(n-1).\]
That is, this test and the other tests listed in Table C are either likelihood ratio tests or slight modifications of likelihood ratio tests. As a matter of fact, most tests involving normal assumptions, including those arising in regression and analysis of variance, are likelihood ratio tests or modifications of them. Note that likelihood ratio tests are based on sufficient statistics when they exist, as was also true of best critical regions and uniformly most powerful critical regions.
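As a quick numerical sketch of the modified variance test above, the cutoffs \(c_{1}\) and \(c_{2}\) are chi-square quantiles with \(n-1\) degrees of freedom. The function name `variance_test` and the use of `scipy.stats` are illustrative assumptions, not part of the text:

```python
# Hypothetical sketch of the modified chi-square variance test:
# reject H0: sigma^2 = sigma0^2 when W <= c1 or W >= c2, where
# W = sum (x_i - xbar)^2 / sigma0^2 and c1, c2 are chi-square
# quantiles with n - 1 degrees of freedom.
from scipy.stats import chi2

def variance_test(x, sigma0_sq, alpha=0.05):
    n = len(x)
    xbar = sum(x) / n
    w = sum((xi - xbar) ** 2 for xi in x) / sigma0_sq
    c1 = chi2.ppf(alpha / 2, n - 1)      # lower cutoff: chi^2_{1-alpha/2}(n-1) in the text's notation
    c2 = chi2.ppf(1 - alpha / 2, n - 1)  # upper cutoff: chi^2_{alpha/2}(n-1)
    return w, bool(w <= c1 or w >= c2)
```

Note that `chi2.ppf` is a lower-tail quantile, while the text writes chi-square points in upper-tail notation; the comments record the correspondence.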
Definition. A statement regarding the parameter \(\theta\), such as \(\theta\in\omega\subseteq\Omega\), is called a statistical hypothesis (about \(\theta\)) and is usually denoted by \(H_{0}\). The statement that \(\theta\in\omega^{c}\) (the complement of \(\omega\) with respect to \(\Omega\)) is also a statistical hypothesis about \(\theta\); it is called the alternative to \(H_{0}\) and is usually denoted by \(H_{a}\) or \(H_{1}\). Therefore, we have
\begin{align*} H_{0}: & \theta\in\omega\\ H_{a}: & \theta\in\omega^{c}\end{align*}
Definition. Let \(\beta (\theta)=\mathbb{P}_{\theta}(\mbox{rejecting }H_{0})\), so that \(1-\beta (\theta)=\mathbb{P}_{\theta}(\mbox{accepting }H_{0})\), for \(\theta\in\Omega\). For \(\theta\in \omega\), \(\beta (\theta)\) is the probability of rejecting \(H_{0}\), calculated under the assumption that \(H_{0}\) is true; therefore, it is the probability of an error, namely, the probability of a type-I error. For \(\theta\in\omega^{c}\), \(1-\beta (\theta)\) is the probability of accepting \(H_{0}\), calculated under the assumption that \(H_{0}\) is false; therefore, it is the probability of a type-II error. The function \(\beta\) restricted to \(\omega^{c}\) is called the power function of the test, and \(\beta (\theta)\) is called the power of the test at \(\theta\in\omega^{c}\). The value \(\sup\{\beta (\theta):\theta\in\omega\}\) is denoted by \(\alpha\) and is called the level of significance. \(\sharp\)
We can see that \(\alpha\) is the smallest upper bound of the type-I error probability. Plainly, one would like to make \(\alpha\) as small as possible and, at the same time, to make the power as large as possible. Maximizing the power is equivalent to minimizing the type-II error probability. Unfortunately, with a fixed sample size, both goals cannot be achieved simultaneously in general. What the theory of testing hypotheses does is fix the size \(\alpha\) at a desirable level (usually taken to be \(0.005\), \(0.01\), \(0.05\), or \(0.1\)) and then derive tests which maximize the power.
Definition. When the experimental sample values have been obtained, a test of statistical hypothesis is a rule which leads to a decision to accept or to reject the hypothesis under consideration. In accordance with a prescribed test, let \(C\) be the subset of the sample space which leads to the rejection of the hypothesis under consideration. Then \(C\) is called the critical region of the test. \(\sharp\)
Let \(f(x;\theta )\) denote the p.d.f. of a random variable \(X\), and let \(X_{1},\cdots ,X_{n}\) denote a random sample from this distribution. We consider the two simple hypotheses \(H_{0}:\theta =\theta^{\prime}\) and \(H_{a}:\theta =\theta^{\prime\prime}\). Therefore, we have \(\Omega =\{\theta^{\prime},\theta^{\prime\prime}\}\). The values
\[\mathbb{P}((X_{1},\cdots ,X_{n})\in C\,|\, H_{0})\]
and
\[\mathbb{P}((X_{1},\cdots ,X_{n})\in C\,|\,H_{a})\]
mean the probability \(\mathbb{P}((X_{1},\cdots,X_{n})\in C)\) when, respectively, \(H_{0}\) and \(H_{a}\) are true.
Definition. Let \(C\) denote a subset of the sample space. Then \(C\) is called a best critical region of size \(\alpha\) for testing the simple hypothesis \(H_{0}:\theta =\theta^{\prime}\) against the alternative simple hypothesis \(H_{a}:\theta =\theta^{\prime\prime}\) when, for every subset \(A\) of the sample space for which \(\mathbb{P}((X_{1},\cdots ,X_{n})\in A|H_{0})=\alpha\):
- \(\mathbb{P}((X_{1},\cdots ,X_{n})\in C|H_{0})=\alpha\);
- \(\mathbb{P}((X_{1},\cdots ,X_{n})\in C|H_{a})\geq\mathbb{P}((X_{1},\cdots,X_{n})\in A|H_{a})\).
Theorem. (Neyman-Pearson Theorem). Let \(X_{1},\cdots ,X_{n}\) denote a random sample from a distribution that has p.d.f. \(f(x;\theta )\). Then, the joint p.d.f. of \(X_{1},\cdots ,X_{n}\) is given by
\[L(\theta ;x_{1},\cdots ,x_{n})=f(x_{1};\theta )f(x_{2};\theta )\cdots f(x_{n};\theta ).\]
Let \(\theta^{\prime}\) and \(\theta^{\prime\prime}\) be distinct values of \(\theta\) such that \(\Omega =\{\theta^{\prime},\theta^{\prime\prime}\}\), and let \(k\) be a positive number. Let \(C\) be a subset of the sample space such that:
- \({\displaystyle \frac{L(\theta^{\prime};x_{1},\cdots,x_{n})} {L(\theta^{\prime\prime};x_{1},\cdots ,x_{n})}\leq k}\) for each point \((x_{1},\cdots ,x_{n})\in C\).
- \({\displaystyle \frac{L(\theta^{\prime};x_{1},\cdots,x_{n})} {L(\theta^{\prime\prime};x_{1},\cdots ,x_{n})}\geq k}\) for each point \((x_{1},\cdots ,x_{n})\in C^{*}\), where \(C^{*}\) is the complement set of \(C\).
Then \(C\) is a best critical region of size \(\alpha\) for testing the simple hypothesis \(H_{0}:\theta =\theta^{\prime}\) against the alternative simple hypothesis \(H_{a}:\theta =\theta^{\prime\prime}\). \(\sharp\)
Example. Let \(X_{1},X_{2},\cdots ,X_{n}\) denote a random sample from the distribution that has the p.d.f.
\[f(x;\theta )=\frac{1}{\sqrt{2\pi}}\exp\left [-\frac{(x-\theta )^{2}}{2}\right ]\]
for \(-\infty <x<\infty\). It is desired to test the simple hypothesis \(H_{0}:\theta=\theta^{\prime} =0\) against \(H_{a}:\theta=\theta^{\prime\prime}=1\). Now, we have
\begin{align*} \frac{L(\theta^{\prime};x_{1},\cdots ,x_{n})}{L(\theta^{\prime\prime};x_{1},\cdots ,x_{n})} & =\frac{(1/\sqrt{2\pi})^{n}\exp\left [-\left (\sum_{i=1}^{n} x_{i}^{2}\right )/2\right ]}{(1/\sqrt{2\pi})^{n}\exp\left [-\left (\sum_{i=1}^{n} (x_{i}-1)^{2}\right )/2\right ]}\\
& =\exp\left (-\sum_{i=1}^{n} x_{i}+\frac{n}{2}\right ).\end{align*}
For \(k>0\), the set of all points \((x_{1},\cdots ,x_{n})\) satisfying
\[\exp\left (-\sum_{i=1}^{n} x_{i}+\frac{n}{2}\right )\leq k\]
is a best critical region. This inequality holds if and only if
\[-\sum_{i=1}^{n} x_{i}+\frac{n}{2}\leq\ln k\]
or equivalently
\[\sum_{i=1}^{n} x_{i}\geq\frac{n}{2}-\ln k\equiv c.\]
In this case, a best critical region is the set
\[C=\{(x_{1},\cdots,x_{n}): \sum_{i=1}^{n} x_{i}\geq c\},\]
where \(c\) is a constant that can be determined so that the size of the critical region is a desired number \(\alpha\). The event \(\sum_{i=1}^{n} X_{i}\geq c\) is equivalent to the event \(\bar{X}\geq c/n\equiv c_{1}\). Therefore, the test may be based on the statistic \(\bar{X}\). If \(H_{0}\) is true, i.e., \(\theta =\theta^{\prime}=0\), then \(\bar{X}\) has the distribution \(N(0,1/n)\). For a given significance level \(\alpha\), the number \(c_{1}\) can be found from a normal table so that \(\mathbb{P}(\bar{X} \geq c_{1}|H_{0})=\alpha\). Therefore, for the experimental values \(x_{1},\cdots ,x_{n}\), we would compute \(\bar{x}=\sum_{i=1}^{n}x_{i}/n\). If \(\bar{x}\geq c_{1}\), the simple hypothesis \(H_{0}:\theta =\theta^{\prime}=0\) would be rejected at the significance level \(\alpha\); if \(\bar{x}<c_{1}\), the hypothesis \(H_{0}\) would be accepted. The probability of rejecting \(H_{0}\) when \(H_{0}\) is true is \(\alpha\); the probability of rejecting \(H_{0}\) when \(H_{0}\) is false is the value of the power of the test at \(\theta =\theta^{\prime\prime}=1\). That is,
\[\mathbb{P}\left (\bar{X}\geq c_{1}|H_{a}\right )=\int_{c_{1}}^{\infty}\frac{1}{\sqrt{2\pi}\sqrt{1/n}}\exp\left [-\frac{(\bar{x}-1)^{2}}{2(1/n)}\right ]d\bar{x}.\]
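The cutoff \(c_{1}\) and the power integral above can be evaluated numerically. The sketch below uses the standard library's `statistics.NormalDist`; the function name and the illustrative choices \(n=25\), \(\alpha =0.05\) in the usage are my own assumptions:

```python
# Numerical check of the example: Xbar ~ N(theta, 1/n); choose c1 so that
# P(Xbar >= c1 | theta = 0) = alpha, then evaluate the power at theta = 1.
from math import sqrt
from statistics import NormalDist

def cutoff_and_power(n, alpha):
    se = sqrt(1 / n)                           # standard deviation of Xbar
    c1 = NormalDist(0, se).inv_cdf(1 - alpha)  # size-alpha cutoff under H0
    power = 1 - NormalDist(1, se).cdf(c1)      # power at theta'' = 1
    return c1, power
```

For example, with \(n=25\) and \(\alpha =0.05\), `cutoff_and_power(25, 0.05)` gives \(c_{1}\approx 0.33\) and a power close to \(1\), since the alternative \(\theta =1\) is far from \(0\) relative to \(\sqrt{1/n}=0.2\).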
Definition. The critical region \(C\) is a uniformly most powerful critical region of size \(\alpha\) for testing the simple hypothesis \(H_{0}\) against an alternative composite hypothesis \(H_{a}\) when the set \(C\) is a best critical region of size \(\alpha\) for testing \(H_{0}\) against each simple hypothesis in \(H_{a}\). A test defined by this critical region \(C\) is called a uniformly most powerful test, with significance level \(\alpha\), for testing the simple hypothesis \(H_{0}\) against the alternative composite hypothesis \(H_{a}\). \(\sharp\)
Example. Let \(X_{1},\cdots ,X_{n}\) denote a random sample from a distribution that is \(N(0,\theta )\), where the variance \(\theta\) is an unknown positive number. We are going to find a uniformly most powerful test with significance level \(\alpha\) for testing the simple hypothesis \(H_{0}:\theta =\theta^{\prime}\) against the alternative composite hypothesis \(H_{a}:\theta >\theta^{\prime}\). Therefore, we have \(\Omega =\{\theta :\theta\geq \theta^{\prime}\}\). The joint p.d.f. of \(X_{1},\cdots ,X_{n}\) is
\[L(\theta ;x_{1},\cdots ,x_{n})=\left (\frac{1}{2\pi\theta}\right )^{n/2}\exp\left (-\frac{\sum_{i=1}^{n} x_{i}^{2}}{2\theta}\right ).\]
Let \(\theta^{\prime\prime}\) represent a number greater than \(\theta^{\prime}\), and let \(k\) denote a positive number. Let \(C\) be the set of points, where
\[\frac{L(\theta^{\prime};x_{1},\cdots ,x_{n})}{L(\theta^{\prime\prime};x_{1},\cdots ,x_{n})}\leq k,\]
i.e., the set of points, where
\[\left (\frac{\theta^{\prime\prime}}{\theta^{\prime}}\right )^{n/2}\exp\left [-\left (\frac{\theta^{\prime\prime}-\theta^{\prime}}{2\theta^{\prime}\theta^{\prime\prime}}\right )\sum_{i=1}^{n} x_{i}^{2}\right ]\leq k\]
or equivalently
\[\sum_{i=1}^{n} x_{i}^{2}\geq\frac{2\theta^{\prime}\theta^{\prime\prime}}{\theta^{\prime\prime}-\theta^{\prime}}\left [\frac{n}{2}\ln\left (\frac{\theta^{\prime\prime}}{\theta^{\prime}}\right )-\ln k\right ]=c.\]
The set
\[C=\left\{(x_{1},\cdots ,x_{n}):\sum_{i=1}^{n} x_{i}^{2}\geq c\right\}\]
is then a best critical region for testing the simple hypothesis \(H_{0}:\theta =\theta^{\prime}\) against the simple hypothesis \(H_{a}:\theta = \theta^{\prime\prime}\). It remains to determine \(c\) such that this critical region has the desired size \(\alpha\). If \(H_{0}\) is true, the random variable \(\sum_{i=1}^{n} X_{i}^{2}/\theta^{\prime}\) has a chi-square distribution with \(n\) degrees of freedom. Since
\[\alpha =\mathbb{P}\left (\sum_{i=1}^{n} X_{i}^{2}/\theta^{\prime}\geq c/\theta^{\prime}\,\Big|\,H_{0} \right),\]
the value \(c/\theta^{\prime}\) may be obtained from a chi-square table, and this determines \(c\). Moreover, for each number \(\theta^{\prime\prime}\) greater than \(\theta^{\prime}\), the foregoing argument holds. That is, if \(\theta^{\prime\prime\prime}\) is another number greater than \(\theta^{\prime}\), then
\[C=\left\{(x_{1},\cdots ,x_{n}):\sum_{i=1}^{n} x_{i}^{2} \geq c\right\}\]
is a best critical region of size \(\alpha\) for testing \(H_{0}:\theta =\theta^{\prime}\) against \(H_{a}:\theta = \theta^{\prime\prime\prime}\). Accordingly,
\[C=\left\{(x_{1},\cdots ,x_{n}): \sum_{i=1}^{n} x_{i}^{2}\geq c\right\}\]
is a uniformly most powerful critical region of size \(\alpha\) for testing \(H_{0}:\theta =\theta^{\prime}\) against \(H_{a}:\theta >\theta^{\prime}\). If \(x_{1},\cdots ,x_{n}\) denote the experimental values of \(X_{1},\cdots ,X_{n}\), then \(H_{0}:\theta = \theta^{\prime}\) is rejected at the significance level \(\alpha\), and \(H_{a}:\theta >\theta^{\prime}\) is accepted, if \(\sum_{i=1}^{n} x_{i}^{2} \geq c\); otherwise, \(H_{0}:\theta =\theta^{\prime}\) is accepted. \(\sharp\)
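The example above can be sketched numerically: the constant \(c\) satisfies \(c/\theta^{\prime}=\chi_{\alpha}^{2}(n)\), the upper-\(\alpha\) chi-square point with \(n\) degrees of freedom. The function name and the use of `scipy.stats` are my own illustrative choices:

```python
# Sketch of the uniformly most powerful test for N(0, theta):
# reject H0: theta = theta' in favor of Ha: theta > theta' when
# sum x_i^2 >= c, with c/theta' the upper-alpha chi-square point, n d.f.
from scipy.stats import chi2

def ump_variance_test(x, theta0, alpha=0.05):
    n = len(x)
    stat = sum(xi ** 2 for xi in x)       # sum of squares (mean known to be 0)
    c = theta0 * chi2.ppf(1 - alpha, n)   # c = theta' * chi^2_alpha(n)
    return stat, bool(stat >= c)
```

Note that the same \(c\) works for every alternative \(\theta^{\prime\prime}>\theta^{\prime}\), which is exactly why the test is uniformly most powerful.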
The tests of hypotheses are summarized as follows.
(i) Tests of hypotheses about one mean, variance known:
\[\begin{array}{lll}
\hline H_{0} & H_{a} & \mbox{Critical Region}\\
\hline \mu=\mu_{0} & \mu >\mu_{0} & \bar{x}\geq\mu_{0}+z_{\alpha}\sigma /\sqrt{n}\\
\mu =\mu_{0} & \mu <\mu_{0} & \bar{x}\leq\mu_{0}-z_{\alpha}\sigma /\sqrt{n}\\
\mu =\mu_{0} & \mu\neq\mu_{0} & |\bar{x}-\mu_{0}|\geq z_{\alpha/2}\sigma /\sqrt{n}\\
\hline\end{array}\]
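The first row of table (i) can be sketched in code; the function name and the use of `statistics.NormalDist` are illustrative assumptions:

```python
# Sketch of the upper-tailed z-test from table (i): with sigma known,
# reject H0: mu = mu0 in favor of Ha: mu > mu0 when
# xbar >= mu0 + z_alpha * sigma / sqrt(n).
from math import sqrt
from statistics import NormalDist

def z_test_upper(x, mu0, sigma, alpha=0.05):
    n = len(x)
    xbar = sum(x) / n
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-alpha standard normal point
    return xbar >= mu0 + z_alpha * sigma / sqrt(n)
```

With the steel-bar numbers from the introduction (\(\mu_{0}=50\), \(\sigma =6\), \(n=36\), \(\alpha =0.05\)), the cutoff is \(50+1.645\cdot 6/6\approx 51.645\), so a sample mean of \(53\) rejects \(H_{0}\) while \(50.5\) does not.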
(ii) Tests of hypotheses about one mean, variance unknown:
\[\begin{array}{lll}
\hline H_{0} & H_{a} & \mbox{Critical Region}\\
\hline \mu=\mu_{0} & \mu >\mu_{0} & \bar{x}\geq\mu_{0}+t_{n-1;\alpha}s /\sqrt{n}\\
\mu =\mu_{0} & \mu <\mu_{0} & \bar{x}\leq\mu_{0}-t_{n-1;\alpha}s /\sqrt{n}\\
\mu =\mu_{0} & \mu\neq\mu_{0} & |\bar{x}-\mu_{0}|\geq t_{n-1;\alpha/2}s /\sqrt{n}\\
\hline\end{array}\]
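The two-sided row of table (ii) can be sketched as follows; since the variance is unknown, the sample standard deviation \(s\) replaces \(\sigma\) and \(t_{n-1}\) quantiles replace \(z\). The function name and the use of `scipy.stats` are illustrative assumptions:

```python
# Sketch of the two-sided t-test from table (ii): reject H0: mu = mu0
# when |xbar - mu0| >= t_{n-1;alpha/2} * s / sqrt(n).
from math import sqrt
from scipy.stats import t

def t_test_two_sided(x, mu0, alpha=0.05):
    n = len(x)
    xbar = sum(x) / n
    s = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))  # sample std dev
    return bool(abs(xbar - mu0) >= t.ppf(1 - alpha / 2, n - 1) * s / sqrt(n))
```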
(iii) Tests about the variance: We have known that \((n-1)S_{n-1}^{2}/\sigma^{2}\) is distributed as \(\chi_{n-1}^{2}\).
\[\begin{array}{lll}
\hline H_{0} & H_{a} & \mbox{Critical Region}\\
\hline \sigma^{2}=\sigma_{0}^{2} & \sigma^{2}>\sigma_{0}^{2} & {\displaystyle s^{2}\geq\sigma_{0}^{2}\cdot\frac{\chi_{n-1;\alpha}^{2}}{n-1}}\\
\sigma^{2}=\sigma_{0}^{2} & \sigma^{2}<\sigma_{0}^{2} & {\displaystyle s^{2}\leq\sigma_{0}^{2}\cdot\frac{\chi_{n-1;1-\alpha}^{2}}{n-1}}\\
\sigma^{2}=\sigma_{0}^{2} & \sigma^{2}\neq\sigma_{0}^{2} & {\displaystyle s^{2}\leq\sigma_{0}^{2}\cdot\frac{\chi_{n-1;1-\alpha/2}^{2}}{n-1}}\mbox{ or }{\displaystyle s^{2}\geq \sigma_{0}^{2}\cdot\frac{\chi_{n-1;\alpha/2}^{2}}{n-1}}\\
\hline\end{array}\]
(iv) Tests comparing two means when \(\sigma_{1}\) and \(\sigma_{2}\) are known: Let us assume that \(X_{1}\) and \(X_{2}\) are normally distributed and possess known standard deviations \(\sigma_{1}\) and \(\sigma_{2}\). If we define \(Y=X_{1}-X_{2}\), then \(Y\) is distributed as \(N(\mu_{1}-\mu_{2}, \sigma_{1}^{2}+\sigma_{2}^{2})\equiv N(\mu_{Y},\sigma_{Y}^{2})\). We see that \(\bar{Y}=\bar{X}_{1}-\bar{X}_{2}\) is distributed as
\[N\left (\mu_{1}-\mu_{2},\frac{\sigma_{1}^{2}}{n_{1}}+\frac{\sigma_{2}^{2}}{n_{2}}\right )\equiv N(\mu_{\bar{Y}},\sigma_{\bar{Y}}^{2}),\]
where \(n_{1}\) and \(n_{2}\) are the sizes of the samples from each of the populations. Then, we can use the procedures presented in (i). The null hypothesis is \(H_{0}:\mu_{Y}=\mu_{1}-\mu_{2}\equiv\mu_{Y_{0}}\).
\[\begin{array}{lll}
\hline H_{0} & H_{a} & \mbox{Critical Region}\\
\hline \mu_{Y}=\mu_{Y_{0}} & \mu_{Y}>\mu_{Y_{0}} & \bar{y}\geq\mu_{Y_{0}}+z_{\alpha}\sigma_{\bar{Y}}\\
\mu_{Y}=\mu_{Y_{0}} & \mu_{Y}<\mu_{Y_{0}} & \bar{y}\leq\mu_{Y_{0}}-z_{\alpha}\sigma_{\bar{Y}}\\
\mu_{Y}=\mu_{Y_{0}} & \mu_{Y}\neq\mu_{Y_{0}} & |\bar{y}-\mu_{Y_{0}}|\geq z_{\alpha/2}\sigma_{\bar{Y}}\\
\hline\end{array}\]
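The two-sided case of (iv) can be sketched as follows; the function name and the use of `statistics.NormalDist` are illustrative assumptions:

```python
# Sketch of the two-sample z-test in (iv): with sigma1 and sigma2 known,
# Ybar = Xbar1 - Xbar2 has standard deviation sqrt(sigma1^2/n1 + sigma2^2/n2),
# so the one-mean z procedure applies to ybar.
from math import sqrt
from statistics import NormalDist

def two_mean_z_two_sided(x1, x2, sigma1, sigma2, mu_y0=0.0, alpha=0.05):
    n1, n2 = len(x1), len(x2)
    ybar = sum(x1) / n1 - sum(x2) / n2                 # observed xbar1 - xbar2
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)     # sigma_{Ybar}
    return abs(ybar - mu_y0) >= NormalDist().inv_cdf(1 - alpha / 2) * se
```

The default `mu_y0=0.0` corresponds to the common null hypothesis of equal means.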
(v) Tests comparing two means when \(\sigma_{1}\) and \(\sigma_{2}\) are unknown: Suppose that \(n_{1}\geq 30\) and \(n_{2}\geq 30\), so that the samples are large. Then we can replace \(\sigma_{1}\) and \(\sigma_{2}\) with the sample standard deviations \(s_{1}\) and \(s_{2}\) and proceed as discussed in (iv). Because of the CLT, we need not even assume that the distributions of \(X_{1}\) and \(X_{2}\) are normal. If the sample sizes are not large, we must use the \(t\)-distribution. In this situation, we
- assume normality for both populations, and
- make the added assumption that \(\sigma = \sigma_{1}=\sigma_{2}\).
As above,
\[\sigma_{Y}^{2}=(\sigma_{1}^{2}/n_{1})+ (\sigma_{2}^{2}/n_{2}).\]
Since we assume that \(\sigma_{1}=\sigma_{2}= \sigma\), we have \(\sigma_{\bar{Y}}=\sigma\sqrt{(1/n_{1})+(1/n_{2})}\), and the estimate of the common value \(\sigma^{2}\) is
\[s_{P}^{2}=\frac{\sum (x_{1i}-\bar{x}_{1})^{2}+\sum (x_{2i}-\bar{x}_{2})^{2}}{n_{1}+n_{2}-2},\]
which is equivalent to
\[s_{P}^{2}=\frac{(n_{1}-1)s_{1}^{2}+(n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2},\]
where \(s_{1}^{2}\) and \(s_{2}^{2}\) are the sample variances.
\[\begin{array}{lll}
\hline H_{0} & H_{a} & \mbox{Critical Region}\\
\hline \mu_{X_{1}}=\mu_{X_{2}} & \mu_{X_{1}}>\mu_{X_{2}} & \bar{x}_{1}-\bar{x}_{2}\geq t_{n_{1}+n_{2}-2;\alpha}\cdot s_{P}\cdot\sqrt{1/n_{1}+1/n_{2}}\\
\mu_{X_{1}}=\mu_{X_{2}} & \mu_{X_{1}}<\mu_{X_{2}} & \bar{x}_{1}-\bar{x}_{2}\leq -t_{n_{1}+n_{2}-2;\alpha}\cdot s_{P}\cdot\sqrt{1/n_{1}+1/n_{2}}\\
\mu_{X_{1}}=\mu_{X_{2}} & \mu_{X_{1}}\neq\mu_{X_{2}} & |\bar{x}_{1}-\bar{x}_{2}|\geq t_{n_{1}+n_{2}-2;\alpha /2}\cdot s_{P}\cdot\sqrt{1/n_{1}+1/n_{2}}\\
\hline\end{array}\]
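The two-sided pooled test in (v) can be sketched as follows; the function name and the use of `scipy.stats` are illustrative assumptions:

```python
# Sketch of the pooled two-sample t-test in (v): assuming normality and a
# common sigma, pool the sums of squares and use n1 + n2 - 2 d.f.
from math import sqrt
from scipy.stats import t

def pooled_t_two_sided(x1, x2, alpha=0.05):
    n1, n2 = len(x1), len(x2)
    xbar1, xbar2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - xbar1) ** 2 for v in x1)
    ss2 = sum((v - xbar2) ** 2 for v in x2)
    sp = sqrt((ss1 + ss2) / (n1 + n2 - 2))   # pooled standard deviation s_P
    cutoff = t.ppf(1 - alpha / 2, n1 + n2 - 2) * sp * sqrt(1 / n1 + 1 / n2)
    return bool(abs(xbar1 - xbar2) >= cutoff)
```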
(vi) Tests concerning two variances: Let both sampled populations be governed by a normal distribution. We take the sample sizes from the two populations to be \(n\) and \(m\). Then, we have that
\begin{align*} F & =\frac{\chi_{n-1}^{2}/(n-1)}{\chi_{m-1}^{2}/(m-1)}\\
& =\frac{[(n-1)S_{1}^{2}]/\sigma_{1}^{2}}{(n-1)}/\frac{[(m-1)S_{2}^{2}]/\sigma_{2}^{2}}{(m-1)}\\
& =\frac{S_{1}^{2}/\sigma_{1}^{2}}{S_{2}^{2}/\sigma_{2}^{2}}\end{align*}
is distributed as an \(F\) random variable with \(n-1\) and \(m-1\) degrees of freedom. If \(\sigma_{1}=\sigma_{2}\), i.e., if the null hypothesis of equal variances is true, then
\[F=\frac{S_{1}^{2}}{S_{2}^{2}}\]
is also distributed as an \(F\) random variable with \(n-1\) and \(m-1\) degrees of freedom. Let \(f=s_{1}^{2}/s_{2}^{2}\).
\[\begin{array}{lll}
\hline H_{0} & H_{a} & \mbox{Critical Region}\\
\hline \sigma_{1}^{2}=\sigma_{2}^{2} & \sigma_{1}^{2}>\sigma_{2}^{2} & f\geq F_{n-1,m-1;\alpha}\\
\sigma_{1}^{2}=\sigma_{2}^{2} & \sigma_{1}^{2}<\sigma_{2}^{2} & 1/f\geq F_{m-1,n-1;\alpha}\\
\sigma_{1}^{2}=\sigma_{2}^{2} & \sigma_{1}^{2}\neq\sigma_{2}^{2} & f\geq F_{n-1,m-1;\alpha/2}\mbox{ or }1/f\geq F_{m-1,n-1;\alpha/2}\\
\hline\end{array}\]
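The two-sided F-test in (vi) can be sketched as follows; the function name and the use of `scipy.stats` are illustrative assumptions. Using lower-tail quantiles of \(F_{n-1,m-1}\) for the lower cutoff is equivalent to the condition \(1/f\geq F_{m-1,n-1;\alpha/2}\) in the table:

```python
# Sketch of the two-sided F-test in (vi): under H0: sigma1^2 = sigma2^2,
# f = s1^2 / s2^2 follows an F distribution with n-1 and m-1 d.f.
from scipy.stats import f as f_dist

def f_test_two_sided(x1, x2, alpha=0.05):
    n, m = len(x1), len(x2)
    xbar1, xbar2 = sum(x1) / n, sum(x2) / m
    s1_sq = sum((v - xbar1) ** 2 for v in x1) / (n - 1)  # sample variance 1
    s2_sq = sum((v - xbar2) ** 2 for v in x2) / (m - 1)  # sample variance 2
    f = s1_sq / s2_sq
    reject = (f >= f_dist.ppf(1 - alpha / 2, n - 1, m - 1)
              or f <= f_dist.ppf(alpha / 2, n - 1, m - 1))
    return f, bool(reject)
```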


