We have the following sections:
- The Inverse Transform Method
- Composition
- Convolution
- The Acceptance-Rejection Method
- Generating Multivariate Distributions
- Generating a Poisson Process
- Generating a Non-homogeneous Poisson Process
\begin{equation}{\label{a}}\tag{A}\mbox{}\end{equation}
The Inverse Transform Method.
Consider a continuous random variable having distribution function \(F\). A general method for generating such a random variable, called the inverse transformation method, is based on the following theorem.
Theorem. Let \(U\) be a uniform \((0,1)\) random variable. For any continuous distribution function \(F\), the random variable \(X\) defined by \(X=F^{-1}(U)\) has distribution function \(F\), where \(F^{-1}(u)\) is defined to be the value of \(x\) satisfying \(F(x)=u\).
Proof. Let \(F_{X}\) denote the distribution function of \(X=F^{-1}(U)\). Then, we have
\begin{align*}
F_{X}(x) & =\mathbb{P}\{X\leq x\}=\mathbb{P}\{F^{-1}(U)\leq x\}\\
& =\mathbb{P}\{F(F^{-1}(U))\leq F(x)\}\mbox{ (since \(F(x)\) is an increasing function)}\\
& =\mathbb{P}\{U\leq F(x)\}=F(x)\mbox{ (since \(U\) is uniform \((0,1)\))}
\end{align*}
The proof is complete. \(\blacksquare\)
The above theorem shows that we can generate a random variable \(X\) from the continuous distribution function \(F\) by generating a random number \(u\) and then setting \(X=x=F^{-1}(u)\).
Example. Suppose that we want to generate a random variable \(X\) having distribution function \(F(x)=x^{n}\) for \(0<x<1\). Let \(x=F^{-1}(u)\). Then, we have \(u=F(x)=x^{n}\) or equivalently \(x=u^{1/n}\). Therefore, we can generate such a random variable by generating a random number \(u\) and then setting \(X=x=u^{1/n}\). \(\sharp\)
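The recipe of this example can be sketched in Python (the function name is illustrative, not from the text):

```python
import random

def sample_power(n, u=None):
    """Inverse transform draw from F(x) = x**n on (0, 1): set x = u**(1/n)."""
    if u is None:
        u = random.random()  # a random number u in (0, 1)
    return u ** (1.0 / n)
```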
Example. The distribution function of a \(U(a,b)\) random variable is easily inverted by solving \(u=F(x)\) for \(x\) to obtain, for \(0\leq u\leq 1\),
\[x=F^{-1}(u)=a+(b-a)u\]
Therefore, we can use the inverse transform method to generate \(X\)
\[\begin{array}{l}
\mbox{Step 1. Generate a random number \(u\)}\\
\mbox{Step 2. Return \(x=a+(b-a)u\)}
\end{array}\]
If many \(X\) values are to be generated, the constant \(b-a\) should be computed beforehand and stored for use in the algorithm. \(\sharp\)
\begin{equation}{\label{sch3ex3}}\tag{1}\mbox{}\end{equation}
Example \ref{sch3ex3}. A continuous random variable having probability density function \(f(x)=\lambda e^{-\lambda x}\) for \(0<x<\infty\) and for some \(\lambda >0\) is said to be an exponential random variable with parameter \(\lambda\). Its cumulative distribution function is given by
\[F(x)=\int_{0}^{x}\lambda e^{-\lambda t}dt=1-e^{-\lambda x}\]
for \(0<x<\infty\). We take \(\lambda =1\). Then, we have \(F(x)=1-e^{-x}\). Let \(x=F^{-1}(u)\). Then, we have \(u=F(x)=1-e^{-x}\), which implies \(1-u=e^{-x}\). By taking logarithms, we obtain \(x=-\ln (1-u)\). Therefore, we can generate an exponential random variable with parameter \(1\) by generating a random number \(u\) and then setting
\[X=x=F^{-1}(u)=-\ln (1-u).\]
Since \(1-U\) is also uniform on \((0,1)\), it follows that \(-\ln (1-U)\) has the same distribution as \(-\ln U\). That is, \(X=-\ln U\) is exponentially distributed with parameter \(1\). In addition, if \(X\) is an exponential random variable with parameter \(1\), i.e. with mean \(1\), then, for any positive \(\lambda\), \(X/\lambda\) is an exponential random variable with parameter \(\lambda\), i.e. with mean \(1/\lambda\). Therefore, an exponential random variable \(X\) with parameter \(\lambda\) (mean \(1/\lambda\)) can be generated by generating a random number \(u\) and setting
\[X=x=-\frac{1}{\lambda}\ln u.\]
\(\sharp\)
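The exponential recipe can be written as a short Python sketch (the function name is ours):

```python
import math
import random

def sample_exponential(lam, u=None):
    """Inverse transform draw from an exponential with rate lam (mean 1/lam),
    using that U and 1 - U are both uniform on (0, 1): x = -ln(u) / lam."""
    if u is None:
        u = random.random()
    return -math.log(u) / lam
```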
Example. The density function of Weibull distribution \(W(\alpha ,\beta )\) is given by
\[f(x)=\frac{\alpha}{\beta^{\alpha}}x^{\alpha -1}\exp\left [-\left (\frac{x}{\beta}\right )^{\alpha}\right ]\mbox{ for }x>0.\]
The distribution function is
\[F(x)=1-e^{-(x/\beta )^{\alpha}}\mbox{ for }x>0\]
The Weibull distribution is easily inverted to obtain
\[F^{-1}(u)=\beta (-\ln (1-u))^{1/\alpha}\]
which leads to the following inverse transform algorithm
\[\begin{array}{l}
\mbox{Step 1. Generate \(u\)}\\
\mbox{Step 2. Return \(x=\beta (-\ln u)^{1/\alpha}\)}
\end{array}\]
Again we are exploiting the fact that \(U\) and \(1-U\) have the same \(U(0,1)\) distribution. This algorithm can also be justified by noting that if \(Y\) has an exponential distribution with mean \(\beta^{\alpha}\), then \(Y^{1/\alpha}\) is Weibull \(W(\alpha ,\beta )\). \(\sharp\)
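The two-step Weibull algorithm above can be sketched as follows (the function name is illustrative):

```python
import math
import random

def sample_weibull(alpha, beta, u=None):
    """Inverse transform draw from W(alpha, beta): x = beta * (-ln u)**(1/alpha),
    exploiting that U and 1 - U have the same U(0, 1) distribution."""
    if u is None:
        u = random.random()
    return beta * (-math.log(u)) ** (1.0 / alpha)
```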
Example. The density function of \(tri(a,b,c)\) is given by
\[f(x)=\left\{\begin{array}{ll}
{\displaystyle \frac{2(x-a)}{(b-a)(c-a)}} & \mbox{if \(a\leq x\leq c\)}\\
{\displaystyle \frac{2(b-x)}{(b-a)(b-c)}} & \mbox{if \(c<x\leq b\)}\\
0 & \mbox{otherwise}
\end{array}\right .\]
where \(a\), \(b\), and \(c\) are real numbers with \(a<c<b\). The distribution function is
\[F(x)=\left\{\begin{array}{ll}
0 & \mbox{if \(x<a\)}\\
{\displaystyle \frac{(x-a)^{2}}{(b-a)(c-a)}} & \mbox{if \(a\leq x\leq c\)}\\
{\displaystyle 1-\frac{(b-x)^{2}}{(b-a)(b-c)}} & \mbox{if \(c<x\leq b\)}\\
1 & \mbox{if \(b<x\)}
\end{array}\right .\]
If \(X\) is a \(\mbox{tri}(0,1,(c-a)/(b-a))\) random variable, then \(X'=a+(b-a)X\) is a \(\mbox{tri}(a,b,c)\) random variable. Therefore, we can restrict attention to \(\mbox{tri}(0,1,c)\) random variables, where \(0<c<1\). The distribution function is easily inverted to obtain, for \(0\leq u\leq 1\),
\[F^{-1}(u)=\left\{\begin{array}{ll}
\sqrt{cu} & \mbox{if \(0\leq u\leq c\)}\\
1-\sqrt{(1-c)(1-u)} & \mbox{if \(c<u\leq 1\)}
\end{array}\right .\]
Therefore, we can state the following inverse transform algorithm for generating \(X\) with \(\mbox{tri}(0,1,c)\)
\[\begin{array}{l}
\mbox{Step 1. Generate \(u\)}.\\
\mbox{Step 2. If \(u\leq c\), return \(x=\sqrt{cu}\). Otherwise, return \(x=1-\sqrt{(1-c)(1-u)}\).}
\end{array}\]
If \(u>c\) in step 2, we cannot replace the \(1-u\) in the formula for \(x\) by \(u\). \(\sharp\)
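The triangular algorithm, together with the rescaling \(X'=a+(b-a)X\), can be sketched in Python (names are ours):

```python
import math
import random

def sample_tri01(c, u=None):
    """Inverse transform draw from tri(0, 1, c) with 0 < c < 1."""
    if u is None:
        u = random.random()
    if u <= c:
        return math.sqrt(c * u)
    return 1.0 - math.sqrt((1.0 - c) * (1.0 - u))

def sample_tri(a, b, c):
    """tri(a, b, c) by rescaling a tri(0, 1, (c-a)/(b-a)) variate."""
    return a + (b - a) * sample_tri01((c - a) / (b - a))
```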
Example. A random variable having probability density function
\[f(t)=\lambda e^{-\lambda t}\frac{(\lambda t)^{n-1}}{(n-1)!}\mbox{ for }t>0\]
is said to be a gamma random variable with parameters \((n,\lambda )\). We can also show that the sum of \(n\) independent exponential random variables, each having parameter \(\lambda\) (mean \(1/\lambda\)), is a gamma random variable with parameters \((n,\lambda )\). Suppose that we want to generate the value of a gamma random variable with parameters \((n,\lambda )\). Since the distribution function of such a random variable is given by
\[F(x)=\int_{0}^{x}\frac{\lambda e^{-\lambda t}(\lambda t)^{n-1}}{(n-1)!}dt\]
it is not possible to give a closed form expression for its inverse. However, by using the result that a gamma random variable \(X\) with parameters \((n,\lambda )\) can be regarded as being the sum of \(n\) independent exponential random variables, each having parameter \(\lambda\), we can make use of Example \ref{sch3ex3} to generate \(X\). Specifically, we can generate a gamma random variable with parameters \((n,\lambda )\) by generating \(n\) random numbers \(u_{1},u_{2},\cdots ,u_{n}\) and then setting
\[X=x=-\frac{1}{\lambda}\ln u_{1}-\cdots -\frac{1}{\lambda}\ln u_{n}=-\frac{1}{\lambda}\ln (u_{1}u_{2}\cdots u_{n}),\]
where the use of the identity \(\sum_{i=1}^{n} \ln x_{i}=\ln (x_{1}x_{2}\cdots x_{n})\) is computationally time saving in that it requires only one rather than \(n\) logarithm computations. \(\sharp\)
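The single-logarithm trick of this example can be sketched as follows (the function name is ours; \(n\) must be a positive integer):

```python
import math
import random

def sample_gamma_int(n, lam):
    """Gamma(n, lam) for integer n as a sum of n exponentials with rate lam,
    computed with one logarithm of the product of n random numbers."""
    prod = 1.0
    for _ in range(n):
        prod *= random.random()
    return -math.log(prod) / lam
```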
\begin{equation}{\label{b}}\tag{B}\mbox{}\end{equation}
Composition.
The composition technique applies when the distribution function \(F\) from which we wish to generate can be expressed as a convex combination of other distribution functions \(F_{1},F_{2},\cdots\). Specifically, we assume that for all \(x\), \(F(x)\) can be written as
\[F(x)=\sum_{j=1}^{\infty} p_{j}F_{j}(x)\]
where \(p_{j}\geq 0\), \(\sum_{j=1}^{\infty} p_{j}=1\), and each \(F_{j}\) is a distribution function. The general composition algorithm is as follows
\[\begin{array}{l}
\mbox{Step 1. Generate a positive random integer \(J=j\) such that \(P\{J=j\}=p_{j}\) for \(j=1,2,\cdots\)}\\
\mbox{Step 2. Return \(x\) with distribution function \(F_{J}\).}
\end{array}\]
Step 1 can be thought of as choosing the distribution function \(F_{j}\) with probability \(p_{j}\) and could be accomplished, for example, by the discrete inverse transform method. Given that \(J=j\), generating \(x\) in step 2 should be done, of course, independently of \(J\). By conditioning on the value of \(J\) generated in step 1, we can easily see that the \(x\) returned by the algorithm will have distribution function \(F\)
\begin{align*} \mathbb{P}\{X\leq x\} & =\sum_{j=1}^{\infty}\mathbb{P}\{X\leq x|J=j\}\mathbb{P}\{J=j\}\\ & =\sum_{j=1}^{\infty} F_{j}(x)p_{j}=F(x).\end{align*}
Sometimes, we can give a geometric interpretation to the composition method. For a continuous random variable \(X\) with density \(f\), for example, we might be able to divide the area under \(f\) into its convex combination representation. Then we can think of step 1 as choosing a region and step 2 as generating from the distribution corresponding to the chosen region.
Example. The double-exponential (or Laplace) distribution has density \(f(x)=0.5e^{-|x|}\) for all real \(x\). We can express the density as
\[f(x)=0.5e^{x}I_{(-\infty ,0)}(x)+0.5e^{-x}I_{[0,\infty )}(x)\]
where \(I_{A}\) denotes the indicator function of the set \(A\), defined by
\[I_{A}(x)=\left\{\begin{array}{ll}
1 & \mbox{if \(x\in A\)}\\
0 & \mbox{otherwise}
\end{array}\right .\]
Thus, \(f(x)\) is a convex combination of \(f_{1}(x)=e^{x}I_{(-\infty ,0)}(x)\) and \(f_{2}(x)=e^{-x}I_{[0,\infty )}(x)\), both of which are densities, and \(p_{1}=p_{2}=0.5\). We first generate \(u_{1}\) and \(u_{2}\) as i.i.d. \(U(0,1)\). If \(u_{1}\leq 0.5\), return \(x=\ln u_{2}\). On the other hand, if \(u_{1}>0.5\), return \(x=-\ln u_{2}\). Note that we are essentially generating an exponential random variate with mean \(1\) and then changing its sign with probability \(0.5\). \(\sharp\)
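The composition step for the Laplace density can be sketched in Python (the function name is ours):

```python
import math
import random

def sample_laplace():
    """Composition draw from the double-exponential density 0.5 * exp(-|x|):
    pick the negative or positive half with probability 0.5 each."""
    u1, u2 = random.random(), random.random()
    if u1 <= 0.5:
        return math.log(u2)   # density e^x on (-inf, 0)
    return -math.log(u2)      # density e^{-x} on [0, inf)
```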
Example. For \(0<a<1\), the right-trapezoidal distribution has density
\[f(x)=\left\{\begin{array}{ll}
a+2(1-a)x & \mbox{if \(0\leq x\leq 1\)}\\
0 & \mbox{otherwise}
\end{array}\right .\]
We can think of dividing the area under \(f\) into a rectangle having area \(a\) and a right triangle with area \(1-a\). Now \(f(x)\) can be decomposed as
\[f(x)=aI_{[0,1]}(x)+(1-a)2xI_{[0,1]}(x)\]
such that \(f_{1}=I_{[0,1]}(x)\) is simply the \(U(0,1)\) density, and \(f_{2}(x)=2xI_{[0,1]}(x)\) is a right-triangular density. Clearly, \(p_{1}=a\) and \(p_{2}=1-a\). The composition method thus calls for generating \(u_{1}\) and checking whether \(u_{1}\leq a\). If so, generate an independent \(u_{2}\), and return \(x=u_{2}\). If \(u_{1}>a\), however, we must generate from the right-triangular distribution. This can be accomplished either by generating \(u_{2}\) and returning \(x=\sqrt{u_{2}}\), or by generating \(u_{2}\) and \(u_{3}\) distributed as i.i.d. \(U(0,1)\) and returning \(x=\max\{u_{2},u_{3}\}\). Since the time to take a square root is probably greater than that required to generate an extra \(U(0,1)\) random variate and perform a comparison, the latter method would appear to be a faster way of generating an \(x\) with density \(f_{2}\). \(\sharp\)
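The right-trapezoidal composition, using the max-of-two-uniforms variant for the triangular part, can be sketched as (the function name is ours):

```python
import random

def sample_trapezoid(a):
    """Composition draw from f(x) = a + 2(1-a)x on [0, 1]: with probability a
    return a U(0,1) variate; otherwise return the max of two uniforms,
    which has the right-triangular density 2x."""
    if random.random() <= a:
        return random.random()
    return max(random.random(), random.random())
```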
\begin{equation}{\label{c}}\tag{C}\mbox{}\end{equation}
Convolution.
For several important distributions, the desired random variable \(X\) can be expressed as a sum of other i.i.d. random variables that can be generated more readily than \(X\) itself. We assume that there are i.i.d. random variables \(Y_{1},Y_{2},\cdots ,Y_{m}\) such that \(Y_{1}+Y_{2}+\cdots +Y_{m}\) has the same distribution as \(X\); therefore, we write \(X=Y_{1}+Y_{2}+\cdots +Y_{m}\). Here we assume that the random variable itself can be represented as a sum of other random variables, whereas the assumption behind the method of composition is that the distribution function of \(X\) is a weighted sum of other distribution functions; the two situations are fundamentally different. Let \(F\) be the distribution function of \(X\) and \(G\) be the common distribution function of the \(Y_{j}\). The algorithm is as follows
\[\begin{array}{l}
\mbox{Step 1. Generate \(y_{1},y_{2},\cdots ,y_{m}\) i.i.d. each with distribution function \(G\)}\\
\mbox{Step 2. Return \(x=y_{1}+y_{2}+\cdots +y_{m}\)}
\end{array}\]
To demonstrate the validity of this algorithm, recall that we assumed that \(X\) and \(Y_{1}+Y_{2}+\cdots +Y_{m}\) have the same distribution function \(F\). Thus, we have
\[\mathbb{P}\{X\leq x\}=\mathbb{P}\{Y_{1}+Y_{2}+\cdots +Y_{m}\leq x\}=F(x).\]
Example. The \(m\)-Erlang random variable \(X\) with mean \(\beta\) can be defined as the sum of \(m\) i.i.d. exponential random variables with common mean \(\beta /m\). Thus, to generate \(X\), we can first generate \(y_{1},y_{2}, \cdots ,y_{m}\) as i.i.d. exponential with mean \(\beta /m\), then return \(x=y_{1}+y_{2}+\cdots +y_{m}\). \(\sharp\)
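The \(m\)-Erlang convolution can be sketched in Python (the function name is ours):

```python
import math
import random

def sample_erlang(m, beta):
    """m-Erlang with mean beta: the sum of m i.i.d. exponentials,
    each with mean beta / m."""
    return sum(-(beta / m) * math.log(random.random()) for _ in range(m))
```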
\begin{equation}{\label{d}}\tag{D}\mbox{}\end{equation}
The Acceptance-Rejection Method.
Version I.
The acceptance-rejection method requires that we specify a function \(t\) that majorizes the density; that is, \(t(x)\geq f(x)\) for all \(x\). Now \(t\) will not, in general, be a density since
\[c=\int_{-\infty}^{\infty} t(x)dx\geq\int_{-\infty}^{\infty} f(x)dx=1\]
but the function \(r(x)=t(x)/c\) is clearly a density. We must be able to generate a random variate \(Y\) having density \(r(x)\). The general algorithm follows
\[\begin{array}{l}
\mbox{Step 1. Generate \(Y=y\) having density \(r(x)\)}\\
\mbox{Step 2. Generate \(u\), independent of \(y\)}\\
\mbox{Step 3. If \(u\leq f(y)/t(y)\), return \(X=x=y\). Otherwise, go to Step 1}
\end{array}\]
Theorem. The above acceptance-rejection method is valid.
Proof. We demonstrate that the acceptance-rejection method for continuous random variables is valid by showing that for any \(x\),
\[\mathbb{P}\{X\leq x\}=\int_{-\infty}^{x} f(y)dy.\]
Let \(A\) denote the event that acceptance occurs in step 3 of the algorithm. Now \(X\) is defined only on the event \(A\), which is a subset of the entire space on which \(Y\) and \(U\) of steps 1 and 2 are defined. Thus, unconditional probability statements about \(X\) alone are really conditional probability statements (conditioned on \(A\)) about \(Y\) and \(U\). Since, given that \(A\) occurs we have \(X=Y\), we can write
\begin{align*} \mathbb{P}\{X\leq x\} & =\mathbb{P}\{Y\leq x|A\}\\ & =\frac{\mathbb{P}\{A,Y\leq x\}}{\mathbb{P}(A)}\end{align*}
We also have
\begin{align*} \mathbb{P}\{A|Y=y\} & =\mathbb{P}\left\{\left .U\leq\frac{f(y)}{t(y)}\right |Y=y\right\}\\ & =\mathbb{P}\left\{U\leq\frac{f(y)}{t(y)}\right\}=\frac{f(y)}{t(y)}\end{align*}
where the second equality follows since \(U\) is independent of \(Y\) and the third equality follows since \(U\) is \(U(0,1)\) and \(f(y)\leq t(y)\). Then, we have
\begin{align*} \mathbb{P}\{A,Y\leq x\} & =\int_{-\infty}^{\infty} \mathbb{P}\{A,Y\leq x|Y=y\}r(y)dy\\ & =\int_{-\infty}^{x} \mathbb{P}\{A|Y=y\}\frac{t(y)}{c}dy\\ & =\frac{1}{c}\int_{-\infty}^{x} f(y)dy\end{align*}
We also have
\[\mathbb{P}(A)=\int_{-\infty}^{\infty} \mathbb{P}\{A|Y=y\}r(y)dy=\frac{1}{c}\]
This yields the desired result. \(\blacksquare\)
Example. The \(\mbox{Beta}(3,4)\) distribution on the unit interval has density
\[f(x)=\left\{\begin{array}{ll}
60x^{3}(1-x)^{2} & \mbox{if \(0\leq x\leq 1\)}\\
0 & \mbox{otherwise}
\end{array}\right .\]
Since the distribution function \(F(x)\) is a sixth-degree polynomial, the inverse transform approach would not be simple, as it would involve numerical root finding. By standard differential calculus, i.e., setting \(df/dx=0\), we see that the maximum value of \(f(x)\) occurs at \(x=0.6\), where \(f(0.6)=2.0736\). If we define
\[t(x)=\left\{\begin{array}{ll}
2.0736 & \mbox{if \(0\leq x\leq 1\)}\\
0 & \mbox{otherwise}
\end{array}\right .\]
then \(t\) majorizes \(f\). Since \(c=\int_{0}^{1} 2.0736\,dx=2.0736\), it follows that \(r(x)\) is just the \(U(0,1)\) density. The algorithm first generates \(y\) and \(u\) in steps 1 and 2; then in step 3 we check whether
\[u\leq\frac{60y^{3}(1-y)^{2}}{2.0736}\]
If so, we return \(X=x=y\); otherwise, we reject \(y\) and go to step 1. \(\sharp\)
Example. The density function of the gamma distribution \(G(\alpha ,\beta )\) is given by
\[f(x)=\frac{1}{\beta^{\alpha}\Gamma (\alpha)}x^{\alpha -1}e^{-x/\beta}\mbox{ for }x>0\]
If \(X\) is \(G(\alpha ,1)\), then for any \(\beta >0\), \(X'=\beta X\) is a gamma \(G(\alpha ,\beta )\) random variable, so it suffices to restrict attention to generating from the gamma \(G(\alpha ,1)\) distribution. Furthermore, recall that the gamma \(G(1,1)\) distribution is just the exponential distribution with mean \(1\), so that we need consider only \(0<\alpha <1\) and \(\alpha >1\). We first consider the case \(0<\alpha <1\) by using the acceptance-rejection method with majorizing function
\[t(x)=\left\{\begin{array}{ll}
0 & \mbox{if \(x\leq 0\)}\\
{\displaystyle \frac{x^{\alpha -1}}{\Gamma (\alpha )}} & \mbox{if \(0<x\leq 1\)}\\
{\displaystyle \frac{e^{-x}}{\Gamma (\alpha )}} & \mbox{if \(1<x\)}
\end{array}\right .\]
Therefore, we obtain
\[c=\int_{0}^{\infty} t(x)dx=\frac{b}{\alpha\Gamma (\alpha )},\]
where \(b=(e+\alpha )/e>1\), which yields the density \(r(x)=t(x)/c\) as
\[r(x)=\left\{\begin{array}{ll}
0 & \mbox{if \(x\leq 0\)}\\
{\displaystyle \frac{\alpha x^{\alpha -1}}{b}} & \mbox{if \(0<x\leq 1\)}\\
{\displaystyle \frac{\alpha e^{-x}}{b}} & \mbox{if \(1<x\)}
\end{array}\right .\]
Generating a random variable \(Y\) with density \(r(x)\) can be done by the inverse transform method; the distribution function corresponding to \(r(x)\) is
\[R(x)=\int_{0}^{x} r(y)dy=\left\{\begin{array}{ll}
{\displaystyle \frac{x^{\alpha}}{b}} & \mbox{if \(0\leq x\leq 1\)}\\
{\displaystyle 1-\frac{\alpha e^{-x}}{b}} & \mbox{if \(1<x\)}
\end{array}\right .\]
which can be inverted to obtain
\[R^{-1}(u)=\left\{\begin{array}{ll}
(bu)^{1/\alpha} & \mbox{if \(u\leq 1/b\)}\\
{\displaystyle -\ln\frac{b(1-u)}{\alpha}} & \mbox{otherwise}
\end{array}\right .\]
In order to generate \(Y\) with density \(r(x)\), we first generate \(u_{1}\). If \(u_{1}\leq 1/b\), we set \(y=(bu_{1})^{1/\alpha}\); in this case, \(y\leq 1\). Otherwise, if \(u_{1}>1/b\), set \(y=-\ln (b(1-u_{1})/\alpha )\), which will be greater than \(1\). Since
\[\frac{f(y)}{t(y)}=\left\{\begin{array}{ll}
e^{-y} & \mbox{if \(0\leq y\leq 1\)}\\
y^{\alpha -1} & \mbox{if \(1<y\)}
\end{array}\right .\]
we obtain the following algorithm
\[\begin{array}{l}
\mbox{Step 1. Generate \(u_{1}\), and let \(p=bu_{1}\). If \(p>1\), go to Step 3. Otherwise, proceed to Step 2.}\\
\mbox{Step 2. Let \(y=p^{1/\alpha}\), and generate \(u_{2}\). If \(u_{2}\leq e^{-y}\), return \(x=y\). Otherwise, go to Step 1.}\\
\mbox{Step 3. Let \(y=-\ln ((b-p)/\alpha )\) and generate \(u_{2}\). If \(u_{2}\leq y^{\alpha -1}\), return \(x=y\). Otherwise go to Step 1.}
\end{array}\]
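The three-step algorithm for \(0<\alpha <1\) can be sketched in Python (the function name is ours):

```python
import math
import random

def sample_gamma_small(alpha):
    """Acceptance-rejection draw from G(alpha, 1) for 0 < alpha < 1,
    following the three-step algorithm with b = (e + alpha) / e."""
    b = (math.e + alpha) / math.e
    while True:
        p = b * random.random()                    # Step 1
        if p <= 1.0:
            y = p ** (1.0 / alpha)                 # Step 2: y <= 1
            if random.random() <= math.exp(-y):
                return y
        else:
            y = -math.log((b - p) / alpha)         # Step 3: y > 1
            if random.random() <= y ** (alpha - 1.0):
                return y
```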
We now consider the case \(\alpha >1\). We will present a modified acceptance-rejection method. Its execution time is bounded as \(\alpha \rightarrow\infty\) and in fact appears to become faster as \(\alpha\) grows. The modification of the acceptance-rejection method consists of adding a faster pretest for acceptance. To obtain a majorizing function \(t(x)\), let \(\lambda =\sqrt{2\alpha -1}\), \(\mu =\alpha^{\lambda}\), and
\[c=\frac{4\alpha^{\alpha}e^{-\alpha}}{(\lambda\Gamma (\alpha ))}.\]
Then, we define \(t(x)=cr(x)\), where
\[r(x)=\frac{\lambda\mu x^{\lambda -1}}{(\mu +x^{\lambda})^{2}}\mbox{ for } x>0.\]
The distribution function corresponding to the density \(r(x)\) is
\[R(x)=\frac{x^{\lambda}}{\mu +x^{\lambda}}\mbox{ for }x\geq 0\]
which is easily inverted to obtain
\[R^{-1}(u)=\left (\frac{\mu u}{1-u}\right )^{1/\lambda}\mbox{ for }0<u<1.\]
To verify that \(t(x)\) indeed majorizes \(f(x)\), see Cheng (1977). This \(R(x)\) is known as the log-logistic distribution. We use the inverse transform method to generate \(Y\) with density \(r(x)\). After adding an advantageous pretest for acceptance and streamlining for computational efficiency, Cheng recommends the following algorithm, where the pre-specified constants are
\[a=1/\sqrt{2\alpha -1},\quad b=\alpha -\ln 4, \quad q=\alpha +1/a, \quad \theta =4.5\mbox{ and }d=1+\ln\theta.\]
The algorithm is
\[\begin{array}{l}
\mbox{Step 1. Generate \(u_{1}\) and \(u_{2}\) independently.}\\
\mbox{Step 2. Let \(v=a\ln (u_{1}/(1-u_{1}))\), \(y=\alpha e^{v}\), \(z=u_{1}^{2}u_{2}\), and \(w=b+qv-y\).}\\
\mbox{Step 3. If \(w+d-\theta z\geq 0\), return \(x=y\). Otherwise proceed to Step 4.}\\
\mbox{Step 4. If \(w\geq\ln z\), return \(x=y\). Otherwise, go to Step 1.}
\end{array}\]
Step 3 is the added pretest, which (if passed) avoids computing the logarithm in the regular acceptance-rejection test in Step 4. If Step 3 were removed, the algorithm would still be valid and would just be the literal acceptance-rejection method. There are several other good algorithms that could be used when \(\alpha >1\). We can also consider the direct use of the inverse transform method to generate gamma random variables; since neither the gamma distribution function nor its inverse has a simple closed form, we must resort to numerical methods. \(\sharp\)
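Cheng's four-step algorithm for \(\alpha >1\) can be sketched as (the function name is ours):

```python
import math
import random

def sample_gamma_cheng(alpha):
    """Cheng's acceptance-rejection draw from G(alpha, 1) for alpha > 1,
    with the log-logistic envelope and the Step 3 pretest."""
    a = 1.0 / math.sqrt(2.0 * alpha - 1.0)
    b = alpha - math.log(4.0)
    q = alpha + 1.0 / a
    theta = 4.5
    d = 1.0 + math.log(theta)
    while True:
        u1, u2 = random.random(), random.random()   # Step 1
        v = a * math.log(u1 / (1.0 - u1))           # Step 2
        y = alpha * math.exp(v)
        z = u1 * u1 * u2
        w = b + q * v - y
        if w + d - theta * z >= 0.0:                # Step 3: pretest
            return y
        if w >= math.log(z):                        # Step 4: full test
            return y
```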
Example. The density function of the beta distribution \(Beta(\alpha_{1},\alpha_{2})\) is given by
\[f(x)=\frac{1}{B(\alpha_{1},\alpha_{2})}x^{\alpha_{1}-1}(1-x)^{\alpha_{2}-1}\mbox{ for }0<x<1,\]
where \(B(\alpha_{1},\alpha_{2})\) is the beta function defined by
\begin{align*} B(\alpha_{1},\alpha_{2}) & =\int_{0}^{1} t^{\alpha_{1}-1}(1-t)^{\alpha_{2}-1}dt\\ & =\frac{\Gamma (\alpha_{1})\Gamma (\alpha_{2})}
{\Gamma (\alpha_{1}+\alpha_{2})}\end{align*}
If \(X\) is a \(Beta(\alpha_{1},\alpha_{2})\) random variable, then \(1-X\) is a \(Beta(\alpha_{2},\alpha_{1})\) random variable, so that we can readily generate a \(Beta(\alpha_{2},\alpha_{1})\) random variable if we can obtain a \(Beta(\alpha_{1},\alpha_{2})\) random variable easily. One such situation occurs when either \(\alpha_{1}\) or \(\alpha_{2}\) is equal to \(1\). If \(\alpha_{2}=1\), for example, then for \(0\leq x\leq 1\) we have \(f(x)=\alpha_{1}x^{\alpha_{1}-1}\), so the distribution function is \(F(x)=x^{\alpha_{1}}\), and we can easily generate \(X\) with \(Beta(\alpha_{1},1)\) by the inverse transform method, i.e., by returning \(x=u^{1/\alpha_{1}}\). Finally, the \(Beta(1,1)\) distribution is simply \(U(0,1)\). A general method for generating a \(Beta(\alpha_{1},\alpha_{2})\) random variable for any \(\alpha_{1}>0\) and \(\alpha_{2}>0\) follows from the fact that if \(Y_{1}\) is a \(G(\alpha_{1},1)\) random variable, \(Y_{2}\) is a \(G(\alpha_{2},1)\) random variable, and \(Y_{1}\) and \(Y_{2}\) are independent, then \(Y_{1}/(Y_{1}+Y_{2})\) is a \(Beta(\alpha_{1},\alpha_{2})\) random variable. This leads to the following algorithm
\[\begin{array}{l}
\mbox{Step 1. Generate \(y_{1}\) from \(G(\alpha_{1},1)\) and \(y_{2}\) from \(G(\alpha_{2},1)\) independently.}\\
\mbox{Step 2. Return \(x=y_{1}/(y_{1}+y_{2})\).}
\end{array}\]
The inverse transform method for generating a beta random variable must rely on numerical methods, as was the case for the gamma distribution. \(\sharp\)
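The two-gamma construction can be sketched in Python; here we use the standard-library generator `random.gammavariate` as the \(G(\alpha ,1)\) source rather than the gamma algorithms above (the function name is ours):

```python
import random

def sample_beta(alpha1, alpha2):
    """Beta(alpha1, alpha2) as Y1 / (Y1 + Y2) with independent
    Y1 ~ G(alpha1, 1) and Y2 ~ G(alpha2, 1)."""
    y1 = random.gammavariate(alpha1, 1.0)
    y2 = random.gammavariate(alpha2, 1.0)
    return y1 / (y1 + y2)
```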
One popular method of finding a suitable \(t(x)\) is first to specify \(r(x)\) to be some common density, then find the smallest \(c\) satisfying \(t(x)=cr(x)\geq f(x)\) for all \(x\).
Version II.
Suppose that we have a method for generating a random variable having density function \(g(x)\). We can use this as the basis for generating from the continuous distribution having density function of \(f(x)\) by generating \(Y\) from \(g\) and then accepting this generated value with a probability proportional to \(f(y)/g(y)\). Specifically, let \(c\) be a constant such that \(f(y)/g(y)\leq c\) for all \(y\). We then have the following technique for generating a random variable having density function \(f\).
\[\begin{array}{l}
\mbox{Step 1. Generate \(y\) having density function \(g(y)\)}\\
\mbox{Step 2. Generate a random number \(u\)}\\
\mbox{Step 3. If \(u\leq f(y)/cg(y)\), set \(x=y\). Otherwise, go to Step 1.}
\end{array}\]
As in the discrete case, note that the way in which one accepts the value \(y\) with probability \(f(y)/cg(y)\) is by generating a random number \(u\) and then accepting \(y\) if \(u\leq f(y)/cg(y)\). In exactly the same way as in the discrete case, we can prove the following theorem
Theorem. The random variable generated by the acceptance-rejection method has density function \(f\), and the number of iterations of the algorithm that are needed is a geometric random variable with mean \(c\). \(\sharp\)
Example. Let us use the acceptance-rejection method to generate a random variable having density function \(f(x)=20x(1-x)^{3}\) for \(0<x<1\). Since this random variable is concentrated in the interval \((0,1)\), let us consider the acceptance-rejection method with \(g(x)=1\) for \(0<x<1\). To determine the constant \(c\) satisfying \(f(x)/g(x)\leq c\), we need to determine the maximum value of \(f(x)/g(x)=20x(1-x)^{3}\). Differentiation of this quantity yields
\[\frac{d}{dx}\left (\frac{f(x)}{g(x)}\right )=20((1-x)^{3}-3x(1-x)^{2}).\]
Setting this equal to \(0\), it follows that the maximal value is attained when \(x=1/4\). Therefore, we obtain
\[\frac{f(x)}{g(x)}\leq 20\left (\frac{1}{4}\right )\left (\frac{3}{4}\right )^{3}=\frac{135}{64}=c,\]
which implies
\[\frac{f(x)}{cg(x)}=\frac{256}{27}x(1-x)^{3}\]
Therefore, the acceptance-rejection procedure is as follows
\[\begin{array}{l}
\mbox{Step 1. Generate random numbers \(u_{1}\) and \(u_{2}\).}\\
\mbox{Step 2. If \(u_{2}\leq (256/27)u_{1}(1-u_{1})^{3}\), stop and set \(x=u_{1}\). Otherwise, go to Step 1.}
\end{array}\]
The average number of times that Step 1 will be performed is \(c=135/64\approx 2.11\). \(\sharp\)
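This Version II procedure can be sketched in Python (the function name is ours):

```python
import random

def sample_f20():
    """Version II acceptance-rejection draw from f(x) = 20 x (1-x)^3 on (0, 1),
    with g = U(0,1) and c = 135/64, so f/(cg) = (256/27) x (1-x)^3."""
    while True:
        u1, u2 = random.random(), random.random()        # Step 1
        if u2 <= (256.0 / 27.0) * u1 * (1.0 - u1) ** 3:
            return u1                                    # Step 2: accept
```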
Example. Suppose that we want to generate a random variable having the gamma distribution \(G(3/2,1)\) with density
\[f(x)=\frac{1}{\Gamma (3/2)}x^{1/2}e^{-x}=\frac{2}{\sqrt{\pi}}x^{1/2}e^{-x}\equiv Kx^{1/2}e^{-x},\mbox{ for }x>0,\]
where \(K=1/\Gamma (3/2)=2/\sqrt{\pi}\). Since such a random variable is concentrated on the positive axis and has mean \(3/2\), it is natural to try the acceptance-rejection method with an exponential random variable with the same mean. Let
\[g(x)=\frac{2}{3}e^{-2x/3}\mbox{ for }x>0.\]
Then, we have
\[\frac{f(x)}{g(x)}=\frac{3K}{2}x^{1/2}e^{-x/3}.\]
By differentiating and setting the resulting derivative equal to \(0\), we find that the maximal value of this ratio is attained when
\[\frac{1}{2}x^{-1/2}e^{-x/3}=\frac{1}{3}x^{1/2}e^{-x/3}\]
that is, when \(x=3/2\). Therefore, we obtain
\begin{align*} c & =\max\frac{f(x)}{g(x)}=\frac{3K}{2}\left (\frac{3}{2}\right )^{1/2}e^{-1/2}\\ & =\frac{3^{3/2}}{(2\pi e)^{1/2}}.\end{align*}
Since
\[\frac{f(x)}{cg(x)}=\sqrt{\frac{2e}{3}}\cdot\sqrt{x}\cdot e^{-x/3}\]
we see that a gamma random variable with parameters \((3/2,1)\) can be generated as follows
\[\begin{array}{l}
\mbox{Step 1. Generate a random number \(u_{1}\) and set \(y=-\frac{3}{2}\ln u_{1}\) according to Example \ref{sch3ex3}.}\\
\mbox{Step 2. Generate a random number \(u_{2}\).}\\
\mbox{Step 3. If \(u_{2}\leq (2ey/3)^{1/2}e^{-y/3}\), set \(x=y\) and stop. Otherwise, go to Step 1.}
\end{array}\]
The average number of iterations that will be needed is
\[c=3\left (\frac{3}{2\pi e}\right )^{1/2}\approx 1.257.\]
Next, we show why we chose the exponential random variable with parameter \(2/3\). Let \(g(x)=\lambda e^{-\lambda x}\). Then, we have
\[\frac{f(x)}{g(x)}=\frac{Kx^{1/2}e^{-(1-\lambda )x}}{\lambda}.\]
The maximal value of this ratio is obtained when
\[\frac{1}{2}x^{-1/2}=(1-\lambda )x^{1/2}\]
or when \(x=1/(2(1-\lambda ))\) provided \(\lambda <1\) (if \(\lambda\geq 1\), it is easy to see that the ratio \(f(x)/g(x)\) assumes arbitrarily large values). Therefore, if we use the exponential random variable with mean \(1/\lambda\), then the average number of iterations of the algorithm that will be needed is
\[c=\max\frac{f(x)}{g(x)}=\frac{K}{\lambda}(2(1-\lambda ))^{-1/2}e^{-1/2}.\]
Therefore, the best choice of \(\lambda\) is the one that minimizes the above, or equivalently, that maximizes \(\lambda (1-\lambda )^{1/2}\). Calculus
now shows that this value satisfies
\[(1-\lambda )^{1/2}=\frac{\lambda (1-\lambda )^{-1/2}}{2}\]
or equivalently, \(1-\lambda =\lambda /2\), or \(\lambda =2/3\). Therefore, the best exponential random variable to use in the acceptance-rejection method to generate a gamma random variable \(G(3/2,1)\) is indeed the exponential random variable with mean \(3/2\). \(\sharp\)
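The \(G(3/2,1)\) algorithm with the optimal exponential envelope can be sketched as (the function name is ours):

```python
import math
import random

def sample_gamma_3_2():
    """Acceptance-rejection draw from G(3/2, 1) using an exponential envelope
    with mean 3/2; the acceptance test uses f/(cg) = sqrt(2e y / 3) e^{-y/3}."""
    while True:
        y = -1.5 * math.log(random.random())    # Step 1: exponential, mean 3/2
        u2 = random.random()                    # Step 2
        if u2 <= math.sqrt(2.0 * math.e * y / 3.0) * math.exp(-y / 3.0):
            return y                            # Step 3: accept
```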
Example. (Generating a Normal Random Variable). A random variable \(X\) is said to be normally distributed with mean \(\mu\) and variance \(\sigma^{2}\) if its probability density function is given by
\[f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left [-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right ]\mbox{ for }-\infty <x<\infty .\]
If \(X\) is a normal random variable with mean \(\mu\) and variance \(\sigma^{2}\), then \(Z=\frac{X-\mu}{\sigma}\) is normal with mean \(0\) and variance \(1\). Such a random variable \(Z\) is said to be a standard normal random variable. To generate a standard normal random variable \(Z\), we note first that the absolute value of \(Z\) has probability density function
\begin{equation}{\label{sch3eq3}}\tag{2}
f(x)=\frac{2}{\sqrt{2\pi}}e^{-x^{2}/2}\mbox{ for }0<x<\infty .
\end{equation}
We start by generating from the preceding density function by using the acceptance-rejection method with \(g\) being the exponential density function with mean \(1\); that is, \(g(x)=e^{-x}\) for \(0<x<\infty\). Now, we have
\[\frac{f(x)}{g(x)}=\sqrt{\frac{2}{\pi}}e^{x-x^{2}/2}.\]
The maximum value of \(f(x)/g(x)\) occurs at the value of \(x\) that maximizes \(x-x^{2}/2\). Calculus shows that this occurs when \(x=1\). Therefore, we can take
\[c=\max\frac{f(x)}{g(x)}=\frac{f(1)}{g(1)}=\sqrt{\frac{2e}{\pi}}.\]
Since
\[\frac{f(x)}{cg(x)}=\exp\left [x-\frac{x^{2}}{2}-\frac{1}{2}\right ]=\exp\left [-\frac{(x-1)^{2}}{2}\right ]\]
it follows that we can generate the absolute value of a standard normal random variable as follows
\[\begin{array}{l}
\mbox{Step 1. Generate \(y\), an exponential random variable with parameter \(1\).}\\
\mbox{Step 2. Generate a random number \(u\).}\\
\mbox{Step 3. If \(u\leq\exp [-(y-1)^{2}/2]\), set \(x=y\). Otherwise, go to Step 1.}
\end{array}\]
Once we have simulated a random variable \(X\) having density as in (\ref{sch3eq3}), we can then obtain a standard normal random variable \(Z\) by letting \(Z\) be equally likely to be either \(X\) or \(-X\). In Step 3, the value \(y\) is accepted if \(u\leq\exp [-(y-1)^{2}/2]\), which is equivalent to \(-\ln u\geq (y-1)^{2}/2\). However, in Example \ref{sch3ex3}, it was shown that \(-\ln U\) is an exponential random variable with mean \(1\). Therefore, the above discussion is equivalent to the following
\[\begin{array}{l}
\mbox{Step 1. Generate independent exponentials with mean \(1\), \(y_{1}\) and \(y_{2}\).}\\
\mbox{Step 2. If \(y_{2}\geq (y_{1}-1)^{2}/2\), set \(x=y_{1}\). Otherwise, go to Step 1.}
\end{array}\]
Therefore, we have the following algorithm that generates a standard normal random variable.
\[\begin{array}{l}
\mbox{Step 1. Generate independent exponentials with mean \(1\), \(y_{1}\) and \(y_{2}\).}\\
\mbox{Step 2. If \(y_{2}\geq (y_{1}-1)^{2}/2\), then go to Step 3. Otherwise, go to Step 1.}\\
\mbox{Step 3. Generate a random number \(u\) and set }
z=\left\{\begin{array}{ll}
y_{1} & \mbox{if \(u\leq 1/2\)}\\
-y_{1} & \mbox{if \(u>1/2\)}
\end{array}\right .
\end{array}\]
If we want the normal random variable to have mean \(\mu\) and variance \(\sigma^{2}\), we can just take \(\mu +\sigma Z\). \(\sharp\)
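The full standard-normal algorithm can be sketched in Python (the function name is ours):

```python
import math
import random

def sample_normal():
    """Standard normal via acceptance-rejection for |Z| with an Exp(1)
    envelope, followed by a random sign."""
    while True:
        y1 = -math.log(random.random())     # Step 1: two independent Exp(1)
        y2 = -math.log(random.random())
        if y2 >= (y1 - 1.0) ** 2 / 2.0:     # Step 2: accept |Z| = y1
            break
    u = random.random()                     # Step 3: attach a random sign
    return y1 if u <= 0.5 else -y1
```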
Although most methods for generating random variables can be classified into one of the approaches discussed so far, some techniques simply rely on some special property of the desired distribution function \(F\) or the random variables \(X\).
Example. If \(Y\) is \(N(0,1)\), then \(Y^{2}\) has a chi-square distribution with \(1\) degree of freedom. Thus to generate \(X\) distributed as \(\chi^{2}_{1}\), generate \(Y\) distributed as \(N(0,1)\), and return \(X=Y^{2}\). \(\sharp\)
Example. If \(Z_{1},Z_{2},\cdots ,Z_{n}\) are i.i.d. \(\chi_{1}^{2}\) random variables, then \(X=Z_{1}+Z_{2}+\cdots +Z_{n}\) is a \(\chi_{n}^{2}\) random variable. In order to generate \(X\), we first generate \(Y_{1},Y_{2},\cdots ,Y_{n}\) as i.i.d. \(N(0,1)\) random variables, and then return \(X=Y_{1}^{2}+Y_{2}^{2}+\cdots +Y_{n}^{2}\). Since for large \(n\) this may be quite slow, we might instead generate \(X\) from the gamma distribution \(G(n/2,2)\) discussed above. \(\sharp\)
Example. If \(Y\) is an \(N(0,1)\) random variable, \(Z\) is a \(\chi_{n}^{2}\) random variable, and \(Y\) and \(Z\) are independent, then \(X=Y/\sqrt{Z/n}\) has a \(t\)-distribution with \(n\) degrees of freedom. Thus to generate \(X\), we generate \(Y\) and \(Z\) independently, and return \(X=Y/\sqrt{Z/n}\). \(\sharp\)
Example. If \(Z_{1}\) is \(\chi_{k_{1}}^{2}\) random variable, \(Z_{2}\) is \(\chi_{k_{2}}^{2}\) random variable, and \(Z_{1}\) and \(Z_{2}\) are independent, then \(X=(Z_{1}/k_{1})/(Z_{2}/k_{2})\) has a \(F_{k_{1},k_{2}}\) distribution. We thus generate \(Z_{1}\) and \(Z_{2}\) independently, and return \(X=(Z_{1}/k_{1})/(Z_{2}/k_{2})\). \(\sharp\)
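The transformations in the four examples above can be sketched in Python; the Box-Muller formula is used here as an assumed helper for generating standard normals:

```python
import math
import random

def std_normal():
    # Helper (an assumption for this sketch): one N(0,1) via Box-Muller
    u1, u2 = random.random(), random.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def chi_square(n):
    """Chi-square with n degrees of freedom: sum of n squared N(0,1)'s."""
    return sum(std_normal() ** 2 for _ in range(n))

def t_variate(n):
    """t-distribution with n d.f.: Y / sqrt(Z/n), Y ~ N(0,1), Z ~ chi^2_n."""
    return std_normal() / math.sqrt(chi_square(n) / n)

def f_variate(k1, k2):
    """F(k1,k2): (Z1/k1) / (Z2/k2) with independent chi-squares."""
    return (chi_square(k1) / k1) / (chi_square(k2) / k2)

# E[chi^2_n] = n, so the sample mean should be near 5 for n = 5
mean_chi5 = sum(chi_square(5) for _ in range(20_000)) / 20_000
```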
\begin{equation}{\label{e}}\tag{E}\mbox{}\end{equation}
Generating Multivariate Distributions.
Suppose that we have a fully specified joint distribution function \(F_{X_{1},X_{2},\cdots ,X_{n}}(x_{1},x_{2},\cdots ,x_{n})\) from which we would like to generate a random vector \({\bf X}=(X_{1},X_{2},\cdots ,X_{n})\). Also assume that, for \(k=2,3,\cdots ,n\) we can obtain the conditional distribution of \(X_{k}\) given that \(X_{i}=x_{i}\) for \(i=1,2,\cdots ,k-1\). We denote the conditional distribution by \(F_{X_{k}}(x_{k}|x_{1},x_{2},\cdots ,x_{k-1})\). In addition, let \(F_{X_{i}}(x_{i})\) be the marginal distribution function of \(X_{i}\) for \(i=1,2,\cdots ,n\). Then, a general algorithm for generating a random vector \({\bf X}\) with joint distribution \(F_{X_{1},X_{2},\cdots ,X_{n}}\) is as follows
\[\begin{array}{l}
\mbox{Step 1. Generate \(x_{1}\) with distribution function \(F_{X_{1}}\).}\\
\mbox{Step 2. Generate \(x_{2}\) with distribution function \(F_{X_{2}}(x_{2}|x_{1})\)}\\
\mbox{Step 3. Generate \(x_{3}\) with distribution function \(F_{X_{3}}(x_{3}|x_{1},x_{2})\)}\\
\vdots\\
\mbox{Step n. Generate \(x_{n}\) with distribution function \(F_{X_{n}}(x_{n}|x_{1},x_{2},\cdots ,x_{n-1})\)}\\
\mbox{Step n+1. Return \({\bf x}=(x_{1},x_{2},\cdots ,x_{n})\)}
\end{array}\]
In Steps 2 through \(n\), the conditional distributions used are those evaluated at the previously generated \(x_{i}\)'s. For example, if \(x_{1}\) is the
value generated for \(X_{1}\) in Step 1, the conditional distribution function used in Step 2 is \(F_{X_{2}}(x_{2}|x_{1})\), and so on.
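As an illustration, the sequential algorithm can be sketched for a hypothetical bivariate case (our own example, not from the text): suppose \(X_{1}\) is exponential with mean \(1\) and, given \(X_{1}=x_{1}\), \(X_{2}\) is exponential with rate \(x_{1}\). Both steps apply the inverse transform method to the appropriate marginal or conditional distribution.

```python
import math
import random

def generate_pair():
    # Step 1: generate x1 from the marginal of X1 (Exp with mean 1)
    x1 = -math.log(random.random())
    # Step 2: generate x2 from the conditional of X2 given x1 (Exp with rate x1)
    x2 = -math.log(random.random()) / x1
    return x1, x2

pairs = [generate_pair() for _ in range(50_000)]
# Sanity check on the marginal: E[X1] = 1
mean_x1 = sum(p[0] for p in pairs) / len(pairs)
```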
Example. (The Polar Method for Generating a Pair of Independent Standard Normal Random Variables). Let \(X\) and \(Y\) be independent standard normal random variables, and let \(R\) and \(\theta\) denote the polar coordinates of the vector \((X,Y)\). That is,
\[R^{2}=X^{2}+Y^{2}\mbox{ and }\tan\theta =\frac{Y}{X}.\]
Since \(X\) and \(Y\) are independent, their joint density is the product of their individual densities and is thus given by
\[f(x,y)=\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}\cdot\frac{1}{\sqrt{2\pi}}e^{-y^{2}/2}=\frac{1}{2\pi}e^{-(x^{2}+y^{2})/2}.\]
To determine the joint density of \(R^{2}\) and \(\Theta\), call it \(f(d,\theta )\), we make the change of variables
\[d=x^{2}+y^{2}\mbox{ and }\theta =\tan^{-1}\left (\frac{y}{x}\right ).\]
As the Jacobian of this transformation is easily shown to equal \(2\), it follows that the joint density function of \(R^{2}\) and \(\Theta\) is given by
\[f(d,\theta )=\frac{1}{2}\frac{1}{2\pi}e^{-d/2}\]
for \(0<d<\infty\) and \(0<\theta <2\pi\). However, as this is equal to the product of an exponential density having mean \(2\) (namely, \(\frac{1}{2}e^{-d/2}\)) and the uniform density on \((0,2\pi )\) (namely, \(1/2\pi\)), it follows that
\begin{equation}{\label{sch3eq21}}\tag{3}
\begin{array}{l}
\mbox{$R^{2}$ and \(\Theta\) are independent, with \(R^{2}\) being exponential with mean \(2\)}\\
\mbox{and \(\Theta\) being uniformly distributed over \((0,2\pi )\).}
\end{array}
\end{equation}
We can now generate a pair of independent standard normal random variables \(X\) and \(Y\) by using (\ref{sch3eq21}) to first generate their polar coordinates and then transforming back to rectangular coordinates.
\[\begin{array}{l}
\mbox{Step 1. Generate random numbers \(u_{1}\) and \(u_{2}\)}\\
\mbox{Step 2. Set \(r^{2}=-2\ln u_{1}\) (so that \(R^{2}\) is exponential with mean \(2\)) and \(\theta =2\pi u_{2}\) (so that \(\Theta\) is uniform on \((0,2\pi )\)).}\\
\mbox{Step 3. Now let \(x=r\cos\theta =\sqrt{-2\ln u_{1}}\cos (2\pi u_{2})\)
and \(y=r\sin\theta =\sqrt{-2\ln u_{1}}\sin (2\pi u_{2})\).}
\end{array}\]
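The three steps above translate directly into Python (the function name is our own):

```python
import math
import random

def box_muller():
    """Return a pair of independent N(0,1) variates via Box-Muller."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(u1))   # r^2 = -2 ln u1 is Exp with mean 2
    theta = 2.0 * math.pi * u2           # theta is uniform on (0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)

xs, ys = zip(*(box_muller() for _ in range(50_000)))
# X and Y are independent, so the sample mean of X*Y should be near 0
corr_num = sum(x * y for x, y in zip(xs, ys)) / len(xs)
```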
This transformation is known as the Box-Muller transformation. However, using the Box-Muller transformation to generate a pair of independent standard normals is not computationally efficient, because of the need to compute the sine and cosine trigonometric functions. Fortunately, there is a way to get around this time-consuming difficulty by computing the sine and cosine of a random angle indirectly. To begin, note that if \(U\) is uniform on \((0,1)\), then \(2U\) is uniform on \((0,2)\), and so \(2U-1\) is uniform on \((-1,1)\). If we generate random numbers \(u_{1}\) and \(u_{2}\) and set
\[v_{1}=2u_{1}-1\mbox{ and }v_{2}=2u_{2}-1\]
then \((V_{1},V_{2})\) is uniformly distributed in the square of area \(4\) centered at \((0,0)\). Suppose now that we continually generate such pairs \((v_{1},v_{2})\) until we obtain one that is contained in the circle of radius \(1\) centered at \((0,0)\); that is, until \((v_{1},v_{2})\) satisfies \(v_{1}^{2}+v_{2}^{2}\leq 1\). It now follows that such a pair \((v_{1},v_{2})\) is uniformly distributed in the circle. Let \(r\) and
$\theta$ denote the polar coordinates of this pair. Then, it is not difficult to verify that \(R\) and \(\Theta\) are independent, with \(R^{2}\) being
uniformly distributed on \((0,1)\) and with \(\Theta\) being uniformly distributed over \((0,2\pi )\). Since \(\Theta\) is a random angle, it
follows that we can generate the sine and cosine of a random angle \(\Theta\) by generating a random point \((v_{1},v_{2})\) in the circle and then setting
\[\sin\theta =\frac{v_{2}}{r}=\frac{v_{2}}{\sqrt{v_{1}^{2}+v_{2}^{2}}}\]
and
\[\cos\theta =\frac{v_{1}}{r}=\frac{v_{1}}{\sqrt{v_{1}^{2}+v_{2}^{2}}}.\]
It follows from the Box-Muller transformation that we can generate a pair of independent standard normals by generating a random number \(u\) and setting
\begin{equation}{\label{sch3eq22}}\tag{4}
x=\sqrt{-2\ln u}\cdot\frac{v_{1}}{\sqrt{v_{1}^{2}+v_{2}^{2}}}\mbox{ and }
y=\sqrt{-2\ln u}\cdot\frac{v_{2}}{\sqrt{v_{1}^{2}+v_{2}^{2}}}.
\end{equation}
In fact, since \(R^{2}=V_{1}^{2}+V_{2}^{2}\) is itself uniformly distributed over \((0,1)\) and is independent of the random angle \(\Theta\), we can use it as the random number \(u\) needed in (\ref{sch3eq22}). Let \(s=r^{2}\). Then, we obtain
\[x=\sqrt{-2\ln s}\cdot\frac{v_{1}}{\sqrt{s}}=v_{1}\cdot\sqrt{\frac{-2\ln s}{s}}\]
and
\[y=\sqrt{-2\ln s}\cdot\frac{v_{2}}{\sqrt{s}}=v_{2}\cdot\sqrt{\frac{-2\ln s}{s}}\]
are independent standard normals when \((v_{1},v_{2})\) is a randomly chosen point in the circle of radius \(1\) centered at the origin, and
$s=v_{1}^{2}+v_{2}^{2}$. Therefore, we have the following approach to generating a pair of independent standard normals
\[\begin{array}{ll}
\mbox{Step 1.} & \mbox{Generate random numbers \(u_{1}\) and \(u_{2}\)}\\
\mbox{Step 2.} & \mbox{Set \(v_{1}=2u_{1}-1\), \(v_{2}=2u_{2}-1\), and \(s=v_{1}^{2}+v_{2}^{2}\).}\\
\mbox{Step 3.} & \mbox{If \(s>1\) go to Step 1.}\\
\mbox{Step 4.} & \mbox{Return the independent standard normals \({\displaystyle x=\sqrt{-2\ln s}\cdot\frac{v_{1}}{\sqrt{s}}=v_{1}\cdot \sqrt{\frac{-2\ln s}{s}}}\)}\\
& \mbox{and \({\displaystyle y=\sqrt{-2\ln s}\cdot\frac{v_{2}}{\sqrt{s}}=v_{2}\cdot\sqrt{\frac{-2\ln s}{s}}}\).}
\end{array}\]
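A Python sketch of the polar method, with the rejection loop of Steps 1 through 3 (the function name is our own):

```python
import math
import random

def polar_method():
    """Pair of independent N(0,1) variates without sine/cosine calls."""
    while True:
        # Steps 1-2: a random point (v1, v2) in the square [-1, 1] x [-1, 1]
        v1 = 2.0 * random.random() - 1.0
        v2 = 2.0 * random.random() - 1.0
        s = v1 * v1 + v2 * v2
        # Step 3: accept only points inside the unit circle (s > 0 guards log(0))
        if 0.0 < s <= 1.0:
            # Step 4: s plays the role of the random number u
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return v1 * factor, v2 * factor

zs = [z for _ in range(25_000) for z in polar_method()]
mean_z = sum(zs) / len(zs)
```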
The above method is called the polar method. Since the probability that a random point in the square will fall within the circle is equal to \(\pi /4\) (the area of the circle divided by the area of the square), it follows that, on average, the polar method will require \(4/\pi\approx 1.273\) iterations of Step 1. Hence it will, on average, require approximately \(2.546\) random numbers, \(1\) logarithm, \(1\) square root, \(1\) division, and \(4.546\) multiplications to generate two standard normals. \(\sharp\)
The \(n\)-dimensional multivariate normal distribution with mean vector \(\mu=(\mu_{1},\mu_{2},\cdots ,\mu_{n})\) and covariance matrix \(\Sigma\), where the \((i,j)\)th entry is \(\sigma_{ij}\), has joint density function
\[f({\bf x})=(2\pi )^{-n/2}|\Sigma |^{-1/2}\exp\left [\frac{-({\bf x}-\mu)^{T}\Sigma^{-1}({\bf x}-\mu)}{2}\right ]\]
where \({\bf x}=(x_{1},x_{2},\cdots ,x_{n})\) and \(|\Sigma |\) is the determinant of \(\Sigma\). We denote this joint distribution by \(N(\mu,\Sigma )\). If \({\bf X}=(X_{1},X_{2},\cdots ,X_{n})\) is an \(N(\mu,\Sigma )\) random vector, then \(\mathbb{E}(X_{i})=\mu_{i}\) and \(Cov(X_{i},X_{j})=\sigma_{ij}=\sigma_{ji}\); that is, \(\Sigma\) is symmetric, and we assume it is positive definite. Since \(\Sigma\) is symmetric and positive definite, we can factor it uniquely as \(\Sigma =CC^{T}\), where \(C\) is an \(n\times n\) lower triangular matrix (the Cholesky factorization). If \(c_{ij}\) is the \((i,j)\)th element of \(C\), an algorithm for generating the desired multivariate normal vector \({\bf X}\) is as follows
\[\begin{array}{l}
\mbox{Step 1. Generate \(z_{1},z_{2},\cdots ,z_{n}\) as i.i.d. \(N(0,1)\)}\\
\mbox{Step 2. For \(i=1,2,\cdots ,n\), let \(x_{i}=\mu_{i}+\sum_{j=1}^{i}c_{ij}z_{j}\) and return \({\bf x}=(x_{1},x_{2},\cdots ,x_{n})\).}
\end{array}\]
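The two steps can be sketched in plain Python; the Cholesky routine and the Box-Muller helper below are our own sketch, not part of the algorithm statement:

```python
import math
import random

def std_normal():
    # Helper (an assumption for this sketch): one N(0,1) via Box-Muller
    u1, u2 = random.random(), random.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

def cholesky(sigma):
    """Lower-triangular C with sigma = C C^T (plain-Python Cholesky)."""
    n = len(sigma)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                c[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                c[i][j] = (sigma[i][j] - s) / c[j][j]
    return c

def multivariate_normal(mu, sigma):
    """x_i = mu_i + sum_{j<=i} c_ij z_j with z_j i.i.d. N(0,1)."""
    c = cholesky(sigma)
    z = [std_normal() for _ in mu]                     # Step 1
    return [mu[i] + sum(c[i][j] * z[j] for j in range(i + 1))
            for i in range(len(mu))]                   # Step 2

# Example: mean (1, 2), covariance [[4, 1], [1, 2]]
x = multivariate_normal([1.0, 2.0], [[4.0, 1.0], [1.0, 2.0]])
```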
\begin{equation}{\label{f}}\tag{F}\mbox{}\end{equation}
Generating Poisson Process.
We show how to generate the arrival times \(t_{1},t_{2},\cdots\) of an arrival process. Suppose that “events” are occurring at random time points. Let \(N(t)\) denote the number of events that occur in the time interval \([0,t]\). These events are said to constitute a Poisson process with rate \(\lambda\) when the following conditions are satisfied.
(i) We have \(N(0)=0\). This condition states that the process begins at time \(0\).
(ii) The numbers of events occurring in disjoint time intervals are independent. This condition is the independent increment assumption, which states that the number of events by time \(t\), i.e. \(N(t)\), is independent of the number of events that occur between \(t\) and \(t+s\), i.e. \(N(t+s)-N(t)\).
(iii) The distribution of the number of events that occur in a given interval depends only on the length of the interval and not on its location. This condition is the {\bf stationary increment} assumption, which states that the probability distribution of \(N(t+s)-N(t)\) is the same for all values of \(t\).
(iv) We have
\[\lim_{h\rightarrow 0}\frac{\mathbb{P}\{N(h)=1\}}{h}=\lambda .\]
This condition states that in a small interval of length \(h\), the probability of one event occurring is approximately \(\lambda h\).
(v) We have
\[\lim_{h\rightarrow 0}\frac{\mathbb{P}\{N(h)\geq 2\}}{h}=0.\]
This condition states that in a small interval of length \(h\), the probability of two or more events occurring is approximately \(0\).
\begin{equation}{\label{sch3r2}}\tag{5}\mbox{}\end{equation}
Remark \ref{sch3r2}. We can also argue that these assumptions imply that the number of events occurring in an interval of length \(t\) is a Poisson random variable with mean \(\lambda t\). \(\sharp\)
For a Poisson process, let us denote by \(X_{1}\) the time of the first event. Furthermore, for \(n>1\), let \(X_{n}\) denote the elapsed time between the \((n-1)\)st and the \(n\)th event. The sequence \(\{X_{n}:n=1,2, \cdots\}\) is called the sequence of interarrival times. For instance, if \(X_{1}=5\) and \(X_{2}=10\), then the first event of the Poisson process will occur at time \(5\) and the second at time \(15\). We now determine the distribution of the \(X_{n}\). To do so, we first note that the event \(\{X_{1}>t\}\) takes place if and only if no events of the Poisson process occur in the interval \([0,t]\). Therefore, we have
\[\mathbb{P}\{X_{1}>t\}=\mathbb{P}\{N(t)=0\}=e^{-\lambda t}\]
by Remark \ref{sch3r2}. Hence \(X_{1}\) has an exponential distribution with mean \(1/\lambda\). To obtain the distribution of \(X_{2}\), we have
\begin{align*}
\mathbb{P}\{X_{2}>t|X_{1}=s\} & =\mathbb{P}\{\mbox{$0$ events in \((s,s+t]\)}|X_{1}=s\}\\
& =\mathbb{P}\{\mbox{$0$ events in \((s,s+t]\)}\}\mbox{ (by the independent increment assumption)}\\
& =\mathbb{P}\{\mbox{$0$ events in \((0,t]\)}\}=e^{-\lambda t}\mbox{ (by the stationary increment assumption)}
\end{align*}
Therefore, we conclude that \(X_{2}\) is also an exponential random variable with mean \(1/\lambda\) and furthermore, that \(X_{2}\) is independent of \(X_{1}\). Repeating the same argument yields the following proposition.
\begin{equation}{\label{sch3p17}}\tag{6}\mbox{}\end{equation}
Proposition \ref{sch3p17}. The interarrival times \(X_{1},X_{2},\cdots ,X_{n}\) are independent and identically distributed exponential random variables with parameter \(\lambda\). \(\sharp\)
Suppose we want to generate the first \(n\) event times of a Poisson process with rate \(\lambda\). To do so we make use of the result that the times between successive events for such a process are independent exponential random variables each with parameter \(\lambda\). Thus, one way to generate the process is to generate these interarrival times. If we generate \(n\) random numbers \(u_{1},u_{2},\cdots ,u_{n}\) and set \(x_{i}=(-1/\lambda )\ln u_{i}\), then \(x_{i}\) can be regarded as the time between the \((i-1)\)st and the \(i\)th event of the Poisson process. Since the actual time of the \(j\)th event will equal the sum of the first \(j\) interarrival times, it follows that the generated values of the first \(n\) event times are \(\sum_{i=1}^{j} x_{i}\), \(j=1,2,\cdots ,n\). If we want to generate the first \(T\) time units of the Poisson process, we can follow the above procedure of successively generating the interarrival times, stopping when their sum exceeds \(T\). That is, the following algorithm can be used to generate all the event times occurring in \((0,T)\) of the Poisson process with rate \(\lambda\). In the algorithm \(t\) refers to time, \(i\) is the number of events that have occurred by time \(t\), and \(S(i)\) is the most recent event time.
\[\begin{array}{l}
\mbox{Step 1. \(t=0\), \(i=0\)}.\\
\mbox{Step 2. Generate a random number \(u\)}.\\
\mbox{Step 3. \(t\leftarrow t-\frac{1}{\lambda}\ln u\). If \(t>T\), stop.}\\
\mbox{Step 4. \(i\leftarrow i+1\), \(S(i)=t\)}.\\
\mbox{Step 5. Go to Step 2.}
\end{array}\]
The final value of \(i\) in the above algorithm will represent the number of events that occur by time \(T\), and the values \(S(1),S(2),\cdots ,S(i)\) will be the \(i\) event times in increasing order.
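The five-step algorithm can be sketched in Python as a single loop (the function name is our own):

```python
import math
import random

def poisson_process_times(lam, T):
    """Event times in (0, T) of a rate-lam Poisson process.

    Follows the algorithm above: accumulate Exp(lam) interarrival times,
    each generated by the inverse transform x = -(1/lam) ln u.
    """
    t, times = 0.0, []
    while True:
        t -= math.log(random.random()) / lam  # Steps 2-3
        if t > T:
            return times                      # stop once t exceeds T
        times.append(t)                       # Step 4: record event time S(i)

# E[N(T)] = lam * T, so counts should average near 2 * 10 = 20 here
counts = [len(poisson_process_times(2.0, 10.0)) for _ in range(5_000)]
mean_count = sum(counts) / len(counts)
```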
\begin{equation}{\label{g}}\tag{G}\mbox{}\end{equation}
Generating Nonhomogeneous Poisson Process.
If “events” are occurring randomly in time, and \(N(t)\) denotes the number of events that occur by time \(t\), then we say that \(\{N(t)\}\)
constitutes a Nonhomogeneous Poisson Process with intensity function \(\lambda (t)\) when the following conditions are satisfied.
(i) \(N(0)=0\).
(ii) The numbers of events that occur in disjoint time intervals are independent.
(iii) We have
\[\lim_{h\rightarrow 0}\frac{P\{\mbox{exactly \(1\) event between \(t\) and \(t+h\)}\}}{h}=\lambda (t).\]
(iv) We have
\[\lim_{h\rightarrow 0}\frac{P\{\mbox{$2$ or more events between \(t\) and \(t+h\)}\}}{h}=0.\]
The function \(m(t)\) defined by
\[m(t)=\int_{0}^{t} \lambda (s)ds\]
is called the mean-value function.
Proposition. \(N(t+s)-N(t)\) is a Poisson random variable with mean \(m(t+s)-m(t)\). \(\sharp\)
The quantity \(\lambda (t)\), called the intensity at time \(t\), indicates how likely it is that an event will occur around time \(t\). Note that when \(\lambda (t)=\lambda\) the nonhomogeneous reverts to the usual Poisson process. The following proposition gives a useful way of interpreting a nonhomogeneous Poisson process.
Proposition. Suppose that events are occurring according to a Poisson process having rate \(\lambda\), and suppose that, independently of anything that came before, an event that occurs at time \(t\) is counted with probability \(p(t)\). Then, the process of counted events constitutes a nonhomogeneous Poisson process with intensity function \(\lambda (t)=\lambda p(t)\). \(\sharp\)
Suppose that we want to simulate the first \(T\) time units of a nonhomogeneous Poisson process with intensity function \(\lambda (t)\). The first method we present, called the thinning or random sampling approach (because it “thins” the homogeneous Poisson points), starts by choosing a value \(\lambda\) which satisfies
\[\lambda (t)\leq\lambda\mbox{ for all }t\leq T.\]
Such a nonhomogeneous Poisson process can be generated by a random selection of the event times of a Poisson process having rate \(\lambda\). That is, if an event of a Poisson process with rate \(\lambda\) that occurs at time \(t\) is counted, independently of what has transpired previously, with probability \(\lambda (t)/\lambda\), then the process of counted events is a nonhomogeneous Poisson process with intensity function \(\lambda (t)\) for \(0\leq t\leq T\), by the preceding proposition. Therefore, by simulating a Poisson process and then randomly counting the events, we can generate the desired nonhomogeneous Poisson process. This can be written algorithmically as follows
\[\begin{array}{l}
\mbox{Step 1. \(t=0\), \(i=0\)}.\\
\mbox{Step 2. Generate a random number \(u\)}.\\
\mbox{Step 3. \(t\leftarrow t-\frac{1}{\lambda}\ln u\). If \(t>T\), stop.}\\
\mbox{Step 4. Generate a random number \(u\)}.\\
\mbox{Step 5. If \(u\leq\lambda (t)/\lambda\), set \(i\leftarrow i+1\), \(S(i)=t\).}\\
\mbox{Step 6. Go to Step 2.}
\end{array}\]
$\lambda (t)$ is the intensity function and \(\lambda\) satisfies \(\lambda (t)\leq\lambda\). The final value of \(i\) represents the number of events by time \(T\), and \(S(1),S(2),\cdots ,S(i)\) are the event times.
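A Python sketch of the thinning algorithm; the intensity function used in the example at the bottom is our own illustration, not from the text:

```python
import math
import random

def nonhomogeneous_poisson(lam_t, lam_max, T):
    """Event times in (0, T) of a nonhomogeneous Poisson process via thinning.

    lam_t is the intensity function; lam_max must satisfy
    lam_t(t) <= lam_max for all t <= T.
    """
    t, times = 0.0, []
    while True:
        t -= math.log(random.random()) / lam_max   # candidate event, rate lam_max
        if t > T:
            return times
        if random.random() <= lam_t(t) / lam_max:  # count with prob lam_t(t)/lam_max
            times.append(t)

# Hypothetical intensity for illustration: lam(t) = 1 + 0.5 t on (0, 4),
# bounded by lam_max = 3.  The mean count is m(4) = 4 + 0.25 * 16 = 8.
counts = [len(nonhomogeneous_poisson(lambda t: 1.0 + 0.5 * t, 3.0, 4.0))
          for _ in range(5_000)]
mean_count = sum(counts) / len(counts)
```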

