A sample space \(S\) may be difficult to describe when the elements of \(S\) are not modeled using numbers. We shall discuss how we can use a rule by which each outcome of a random experiment, that is, each element \(s\) of \(S\), may be associated with a real number \(x\).
Example. A rat is selected at random from a cage and its sex is determined. The set of possible outcomes is female and male. Therefore, the sample space is \(S=\{ \mbox{female},\mbox{male}\}=\{F,M\}\). Let \(X\) be a function defined on \(S\) such that \(X(F)=0\) and \(X(M)=1\). Thus \(X\) is a real-valued function that has the sample space \(S\) as its domain and the set of real numbers \(\{x:x=0,1\}\) as its range. We call \(X\) a random variable. The space associated with \(X\) is \(\{x:x=0,1\}\). \(\sharp\)
We now formulate the definition of a random variable. Given a random experiment with a sample space \(S\), a function \(X\) that assigns to each element \(s\) in \(S\) one and only one real number \(X(s)=x\) is called a random variable. The range of \(X\) is the set of real numbers \(\{x:X(s)=x,s\in S\}\). Suppose that the range contains a countable number of points; that is, the range contains either a finite number of points or points that can be put into one-to-one correspondence with the positive integers. Such a range is called a set of discrete points. Furthermore, the random variable \(X\) is called a random variable of the discrete type. For a random variable \(X\) of the discrete type, the probability \(P(X=x)\) defined as \(P(X=x)=P(A)\), where \(A=\{s:X(s)=x\}\), is frequently denoted by \(f(x)\). This function \(f(x)\) is called the probability density function, and it is abbreviated as p.d.f. The p.d.f. \(f(x)\) of a discrete random variable \(X\) is a function that satisfies the following properties. Let \(R\) be the range of \(X\).
(i) We have \(f(x)>0\) for \(x\in R\).
(ii) We have \(\sum_{x\in R} f(x)=1\).
(iii) For \(A\subset R\), we have \begin{align*} P(X\in A) & =\sum_{x\in A}P(X=x)\\ & =\sum_{x\in A}f(x).\end{align*}
Example. Roll a four-sided die twice, and let \(X\) equal the larger of the two outcomes if they are different and the common value if they are the same. The sample space for this experiment is
\[S=\{(d_{1},d_{2}):d_{1}=1,2,3,4;d_{2}=1,2,3,4\},\]
where we assume that each of these 16 points has probability \(1/16\). Then, we have
\begin{align*} P(X=1) & =P[\{(1,1)\}]=1/16,\\ P(X=2) & =P[\{(1,2),(2,1),(2,2)\}]\\ &=3/16,\\ P(X=3) & =5/16,\\ P(X=4) & =7/16.\end{align*}
That is, the p.d.f. of \(X\) can be written simply as
\begin{align*} f(x) & =P(X=x)\\ & =\frac{2x-1}{16}\\ & \mbox{ for } x=1,2,3,4.\end{align*}
We can add \(f(x)=0\) elsewhere. \(\sharp\)
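The counting in this example can be checked by brute-force enumeration. The following Python sketch (an illustration, not part of the text) tabulates the 16 equally likely outcomes with exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

# Roll a four-sided die twice; X is the larger of the two outcomes
# (the common value when they are equal), i.e. X = max(d1, d2).
counts = {}
for d1, d2 in product(range(1, 5), repeat=2):
    x = max(d1, d2)
    counts[x] = counts.get(x, 0) + 1

# Each of the 16 outcomes has probability 1/16.
pdf = {x: Fraction(c, 16) for x, c in counts.items()}
```

Enumerating the sample space directly confirms \(f(x)=(2x-1)/16\) for \(x=1,2,3,4\).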
\begin{equation}{\label{a}}\tag{A}\mbox{}\end{equation}
Mathematical Expectation (Mean) and Variance.
Let \(f(x)\) be the p.d.f. of the discrete random variable \(X\) with range \(R\). Suppose that the summation
\[\sum_{x\in R} u(x)f(x)\]
exists. Then, the sum is called the mathematical expectation or the mean of the function \(u(X)\), and it is denoted by \(E[u(X)]\). In other words, we have
\[\mathbb{E}[u(X)]=\sum_{x\in R}u(x)f(x).\]
We can think of the expected value \(\mathbb{E}[u(X)]\) as a weighted mean of \(u(x)\) for \(x\in R\), where the weights are the probabilities \(f(x)=P(X=x)\) for \(x\in R\).
Example. Let the discrete random variable \(X\) have the p.d.f. \(f(x)=\frac{1}{3}\) for \(x\in R=\{-1,0,1\}\). Let \(u(X)=X^{2}\). Then, we have
\begin{align*} \mathbb{E}[X^{2}] & =\sum_{x\in R}x^{2}f(x)\\ & =(-1)^{2}\cdot\frac{1}{3}+(0)^{2}\cdot\frac{1}{3}+(1)^{2}\cdot\frac{1}{3}\\ & =\frac{2}{3}.\end{align*}
Example. Let \(X\) have the p.d.f.
\[f(x)=\left\{\begin{array}{ll}
\frac{1}{8}, & x=0,3\\
\frac{3}{8}, & x=1,2.
\end{array}\right .\]
The mean of \(X\) is
\begin{align*} \mathbb{E}(X) & =0\cdot\frac{1}{8}+1\cdot\frac{3}{8}+2\cdot\frac{3}{8}+3\cdot\frac{1}{8}\\ & =\frac{3}{2}.\end{align*}
Theorem. The mathematical expectation satisfies the following properties when it exists.
(i) Let \(c\) be a constant. Then, we have \(\mathbb{E}(c)=c\).
(ii) Let \(c\) be a constant, and let \(u\) be a function. Then, we have \(\mathbb{E}[cu(X)]=c\mathbb{E}[u(X)]\).
(iii) Let \(c_{1}\) and \(c_{2}\) be constants, and let \(u_{1}\) and \(u_{2}\) be functions. Then, we have
\[\mathbb{E}[c_{1}u_{1}(X)+c_{2}u_{2}(X)]=c_{1}\mathbb{E}[u_{1}(X)]+c_{2}\mathbb{E}[u_{2}(X)].\]
Proof. To prove part (i), we have
\begin{align*} \mathbb{E}(c) & =\sum_{x\in R}cf(x)\\ & =c\sum_{x\in R}f(x)\\ & =c.\end{align*}
To prove part (ii), we have
\begin{align*} \mathbb{E}[cu(X)] & =\sum_{x\in R}cu(x)f(x)\\ & =c\sum_{x\in R}u(x)f(x)\\ & =c\mathbb{E}[u(X)].\end{align*}
To prove part (iii), we have
\begin{align*} & \mathbb{E}[c_{1}u_{1}(X)+c_{2}u_{2}(X)]\\ & \quad=\sum_{x\in R}[c_{1}u_{1}(x)+c_{2}u_{2}(x)]f(x)\\ & \quad =c_{1}\sum_{x\in R}u_{1}(x)f(x)+c_{2}\sum_{x\in R}u_{2}(x)f(x)\\ & \quad =c_{1}\mathbb{E}[u_{1}(X)]+c_{2}\mathbb{E}[u_{2}(X)].\end{align*}
This completes the proof. \(\blacksquare\)
The third property can be extended to more than two terms by mathematical induction; that is, we have
\[\mathbb{E}\left [\sum_{i=1}^{k} c_{i}u_{i}(X)\right ]=\sum_{i=1}^{k} c_{i}\mathbb{E}[u_{i}(X)].\]
Example. Let \(X\) have the p.d.f. \(f(x)=\frac{x}{10}\) for \(x=1,2,3,4\). Then, we have
\begin{align*} \mathbb{E}(X) & =\sum_{x=1}^{4} x\left (\frac{x}{10}\right )\\ & =3,\\ \mathbb{E}(X^{2}) & =\sum_{x=1}^{4} x^{2}\left (\frac{x}{10}\right )\\ & =10,\\ \mathbb{E}[X(5-X)] & =5\mathbb{E}(X)-\mathbb{E}(X^{2})\\ & =5\cdot 3-10=5.\end{align*}
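The three expectations in this example can be verified exactly with a short Python sketch (illustrative only) using rational arithmetic:

```python
from fractions import Fraction

# p.d.f. f(x) = x/10 for x = 1, 2, 3, 4.
R = range(1, 5)
f = {x: Fraction(x, 10) for x in R}

E_X = sum(x * f[x] for x in R)
E_X2 = sum(x**2 * f[x] for x in R)
# Direct computation of E[X(5 - X)]; by linearity it equals 5E(X) - E(X^2).
E_u = sum(x * (5 - x) * f[x] for x in R)
```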
Example. Let \(u(x)=(x-c)^{2}\), where \(c\) is a constant. Suppose that \(\mathbb{E}[(X-c)^{2}]\) exists. Find the value of \(c\) such that \(\mathbb{E}[(X-c)^{2}]\) is a minimum. We write
\begin{align*} g(c) & =\mathbb{E}[(X-c)^{2}]\\ & =\mathbb{E}[X^{2}-2cX+c^{2}]\\ & =\mathbb{E}(X^{2})-2c\mathbb{E}(X)+c^{2}.\end{align*}
We set \(g'(c)=0\) and solve for \(c\). Then, we obtain \(g'(c)=-2\mathbb{E}(X)+2c=0\), i.e., \(c=\mathbb{E}(X)\). Since \(g''(c)=2>0\), it follows that \(\mathbb{E}(X)\) is the value of \(c\) that minimizes \(\mathbb{E}[(X-c)^{2}]\). \(\sharp\)
We see that the mean \(\mu =\mathbb{E}(X)\) is the centroid of a system of weights or a measure of the central location of the probability distribution of \(X\). A measure of the dispersion or spread of a distribution is defined as follows. Suppose that \(\mathbb{E}[(X-\mu )^{2}]\) is finite. The variance of a discrete random variable \(X\) is defined by
\begin{align*} \sigma^{2} & =Var(X)\\ &=\mathbb{E}[(X-\mu )^{2}]\\ & =\sum_{x\in R}(x-\mu )^{2}f(x).\end{align*}
The positive square root of the variance is called the standard deviation of \(X\) and is denoted by
\[\sigma =\sqrt{\mathbb{E}[(X-\mu )^{2}]}.\]
It is worthwhile to note that the variance can be computed in another manner. We have
\begin{align*} \sigma^{2} & =\mathbb{E}[(X-\mu )^{2}]\\ & =\mathbb{E}(X^{2}-2\mu X+\mu^{2})\\ & =\mathbb{E}(X^{2})-2\mu \mathbb{E}(X)+\mu^{2}\\ & =\mathbb{E}(X^{2})-2\mu^{2}+\mu^{2}\\ & =\mathbb{E}(X^{2})-\mu^{2}.\end{align*}
Example. Let the p.d.f. of \(X\) be given by \(f(x)=x/6\), \(x=1,2,3\). The mean of \(X\) is
\begin{align*} \mu & =\mathbb{E}(X)\\ &=1\cdot\frac{1}{6}+2\cdot\frac{2}{6}+3\cdot\frac{3}{6}\\ & =\frac{7}{3}.\end{align*}
To find the variance and standard deviation of \(X\), we first find
\begin{align*} \mathbb{E}(X^{2}) & =1^{2}\cdot\frac{1}{6}+2^{2}\cdot\frac{2}{6}+3^{2}\cdot\frac{3}{6}\\ & =\frac{36}{6}\\ & =6.\end{align*}
Therefore, the variance of \(X\) is given by
\begin{align*} \sigma^{2} & =\mathbb{E}(X^{2})-\mu^{2}\\ & =6-\left (\frac{7}{3}\right )^{2}=\frac{5}{9},\end{align*}
and the standard deviation of \(X\) is \(\sigma =\sqrt{5}/3\). \(\sharp\)
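A quick numerical check of this example, computing the mean, variance, and standard deviation with exact fractions:

```python
from fractions import Fraction
import math

# p.d.f. f(x) = x/6 for x = 1, 2, 3.
f = {x: Fraction(x, 6) for x in (1, 2, 3)}

mu = sum(x * p for x, p in f.items())
# Var(X) = E(X^2) - mu^2.
var = sum(x**2 * p for x, p in f.items()) - mu**2
sigma = math.sqrt(var)
```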
Let \(X\) be a random variable with mean \(\mu_{X}\) and variance \(\sigma_{X}^{2}\). Given any constants \(a\) and \(b\), we see that \(Y=aX+b\) is also a random variable. The mean of \(Y\) is given by
\begin{align*} \mu_{Y} & =\mathbb{E}(Y)\\ &=\mathbb{E}(aX+b)\\ & =a\mathbb{E}(X)+b\\ & =a\mu_{X}+b,\end{align*}
and the variance of \(Y\) is given by
\begin{align*} \sigma_{Y}^{2} & =\mathbb{E}[(Y-\mu_{Y})^{2}]\\ & =\mathbb{E}[(aX+b-a\mu_{X}-b)^{2}]\\ & =\mathbb{E}[a^{2}(X-\mu_{X})^{2}]\\ & =a^{2}\sigma_{X}^{2},\end{align*}
which says \(\sigma_{Y}=|a|\sigma_{X}\).
Let \(X\) be an integer that is selected randomly from the first \(m\) positive integers. We say that \(X\) has a discrete uniform distribution on the integers \(1,2,\cdots ,m\). The p.d.f. of \(X\) is defined by \(f(x)=\frac{1}{m}\) for \(x=1,2,\cdots ,m\). The mean of \(X\) is given by
\begin{align*} \mu & =\mathbb{E}(X)\\ & =\sum_{x=1}^{m}x\left (\frac{1}{m}\right )\\ & =\left (\frac{1}{m}\right )\frac{m(m+1)}{2}\\ & =\frac{m+1}{2}.\end{align*}
To find the variance of \(X\), we first find
\begin{align*} \mathbb{E}(X^{2}) & =\sum_{x=1}^{m}x^{2}\left (\frac{1}{m}\right )\\ & =\left (\frac{1}{m}\right )\frac{m(m+1)(2m+1)}{6}\\ & =\frac{(m+1)(2m+1)}{6}.\end{align*}
Therefore, the variance of \(X\) is given by
\begin{align*} \sigma^{2} & =\mathbb{E}(X^{2})-\mu^{2}\\ & =\frac{m^{2}-1}{12}.\end{align*}
Example. Let \(X\) equal the outcome when rolling a fair six-sided die. Then, the p.d.f. of \(X\) is given by \(f(x)=\frac{1}{6}\) for \(x=1,2,3,4,5,6\). The respective mean and variance of \(X\) are given by
\[\mu =\frac{1+6}{2}=3.5\]
and
\[\sigma^{2}=\frac{6^{2}-1}{12}=\frac{35}{12}.\]
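The closed forms \(\mu =(m+1)/2\) and \(\sigma^{2}=(m^{2}-1)/12\) can be checked directly from the definition for small \(m\); the following Python sketch does so with exact fractions:

```python
from fractions import Fraction

def uniform_mean_var(m):
    """Mean and variance of the discrete uniform distribution on 1, ..., m."""
    f = Fraction(1, m)  # each point has probability 1/m
    mu = sum(x * f for x in range(1, m + 1))
    var = sum(x**2 * f for x in range(1, m + 1)) - mu**2
    return mu, var

# Check a few values of m, including the six-sided die (m = 6).
checks = {m: uniform_mean_var(m) for m in (2, 6, 10)}
```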
Consider a collection of \(N=N_{1}+N_{2}\) similar objects, \(N_{1}\) of them belonging to one of two dichotomous classes (red chips) and \(N_{2}\) of them belonging to the second class (blue chips). A collection of \(n\) objects is selected from these \(N\) objects at random and without replacement. We wish to find the probability that exactly \(x\) of these \(n\) objects are red, where the integer \(x\) satisfies \(x\leq n\), \(x\leq N_{1}\), and \(n-x\leq N_{2}\); that is, \(x\) of the selected objects belong to the first class and \(n-x\) belong to the second class. We can select \(x\) red chips in any one of \(C^{N_{1}}_{x}\) ways and \(n-x\) blue chips in any one of \(C^{N_{2}}_{n-x}\) ways. By the multiplication principle, the product \(C^{N_{1}}_{x}C^{N_{2}}_{n-x}\) equals the number of ways the joint operation can be performed. When we assume that each of the \(C^{N}_{n}\) ways of selecting \(n\) objects from \(N\) objects has the same probability, the probability of selecting exactly \(x\) red chips is given by
\begin{align*} P(X=x) & =\frac{C^{N_{1}}_{x}C^{N_{2}}_{n-x}}{C^{N}_{n}},\end{align*}
where \(x\leq n\), \(x\leq N_{1}\), and \(n-x\leq N_{2}\). We say that this random variable \(X\) has a hypergeometric distribution. It can be shown that its mean and variance are given by
\[\mu =n\left (\frac{N_{1}}{N}\right )\]
and
\[\sigma^{2}=n\left (\frac{N_{1}}{N}\right )\left (\frac{N_{2}}{N}\right )\left (\frac{N-n}{N-1}\right ).\]
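A sketch of the hypergeometric p.d.f. in Python. The values \(N_{1}=5\), \(N_{2}=7\), and \(n=4\) below are illustrative choices, not values from the text; the check confirms that the probabilities sum to one and that the mean and variance match the stated formulas:

```python
from fractions import Fraction
from math import comb

def hypergeom_pdf(x, n, N1, N2):
    """P(X = x) when drawing n objects without replacement
    from N1 red and N2 blue chips."""
    return Fraction(comb(N1, x) * comb(N2, n - x), comb(N1 + N2, n))

# Illustrative parameters (assumed, not from the text).
N1, N2, n = 5, 7, 4
N = N1 + N2

# The support enforces x <= n, x <= N1, and n - x <= N2.
support = range(max(0, n - N2), min(n, N1) + 1)
pdf = {x: hypergeom_pdf(x, n, N1, N2) for x in support}
mu = sum(x * p for x, p in pdf.items())
var = sum(x**2 * p for x, p in pdf.items()) - mu**2
```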
\begin{equation}{\label{b}}\tag{B}\mbox{}\end{equation}
Discrete Random Variables.
Bernoulli Distributions.
A Bernoulli experiment is a random experiment whose outcome can be classified in one of two mutually exclusive and exhaustive ways: for example, success or failure, female or male, life or death, or non-defective or defective. A sequence of Bernoulli trials occurs when a Bernoulli experiment is performed several independent times such that the probability of success remains the same from trial to trial. In such a sequence, let \(p\) denote the probability of success on each trial. In addition, we shall frequently let \(q=1-p\) denote the probability of failure.
Let \(X\) be a random variable associated with a Bernoulli trial by defining it as \(X(\mbox{success})=1\) and \(X(\mbox{failure})=0\). In other words, two outcomes, success and failure, are denoted by one and zero, respectively. The p.d.f. of \(X\) is given by
\begin{align*} f(x) & =P(X=x)\\ & =p^{x}(1-p)^{1-x}\mbox{ for }x=0,1,\end{align*}
and we say that \(X\) has a Bernoulli distribution. The expected value of \(X\) is given by
\begin{align*} \mu & =\mathbb{E}(X)\\ & =\sum_{x=0}^{1} xp^{x}(1-p)^{1-x}\\ & =p,\end{align*}
and the variance of \(X\) is given by
\begin{align*} \sigma^{2} & =Var(X)\\ & =\sum_{x=0}^{1} (x-p)^{2}p^{x}(1-p)^{1-x}\\ & =p(1-p).\end{align*}
It follows that the standard deviation of \(X\) is \(\sigma =\sqrt{pq}\).
Binomial Distribution.
In a sequence of Bernoulli trials, we are often interested in the total number of successes and not in the order of their occurrence. Let the random variable \(X\) equal the number of observed successes in \(n\) Bernoulli trials. The possible values of \(X\) are \(0,1,2,\cdots ,n\). If \(x\) successes occur, where \(x=0,1,2,\cdots ,n\), then \(n-x\) failures occur. The number of
ways of selecting \(x\) positions for the \(x\) successes in the \(n\) trials is \(C^{n}_{x}\). Since the trials are independent and since the probabilities of success and failure on each trial are, respectively, \(p\) and \(1-p\), the probability of each of these ways is \(p^{x}(1-p)^{n-x}\). Therefore, the p.d.f. of \(X\) is the sum of the probabilities of these \(C^{n}_{x}\) mutually exclusive events; that is, we have
\[f(x)=C^{n}_{x}p^{x}(1-p)^{n-x}\]
for \(x=0,1,2,\cdots ,n\). These probabilities are called binomial probabilities, and the random variable \(X\) is said to have a binomial distribution. A binomial distribution will be denoted by the symbol \(B(n,p)\), and we say that the distribution of \(X\) is \(B(n,p)\). The constants \(n\) and \(p\) are the parameters of the binomial distribution. It can be shown that the mean and variance of the binomial distribution \(B(n,p)\) are
\[\mu =np\mbox{ and }\sigma^{2}=np(1-p).\]
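The binomial p.d.f. and its moments can be computed exactly from the definition. In this sketch the parameters \(n=10\) and \(p=3/10\) are illustrative choices, not values from the text:

```python
from fractions import Fraction
from math import comb

def binomial_pdf(n, p):
    """p.d.f. of B(n, p): f(x) = C(n, x) p^x (1-p)^(n-x), as exact fractions."""
    return {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

# Illustrative parameters (assumed, not from the text).
n, p = 10, Fraction(3, 10)
f = binomial_pdf(n, p)
mu = sum(x * f[x] for x in f)
var = sum(x**2 * f[x] for x in f) - mu**2
```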
Geometric Distribution.
To obtain a binomial random variable, we observed a sequence of \(n\) Bernoulli trials and counted the number of successes. Suppose now that we do not fix the number of Bernoulli trials in advance but instead continue to observe the sequence of Bernoulli trials until \(r\) successes occur. The random variable of interest is the number of trials needed to observe the \(r\)th success. We first discuss this problem when \(r=1\). That is, consider a sequence of Bernoulli trials with probability \(p\) of success. This sequence is observed until the first success occurs. Let \(X\) denote the trial number on which this first success occurs. The p.d.f. of \(X\) is given by
\[f(x)=P(X=x)=p(1-p)^{x-1}\]
for \(x=1,2,3,\cdots\) because there must be \(x-1\) failures before the first success that occurs on trial \(x\). In this way, we say that \(X\) has a geometric distribution. It can be shown that the mean and variance of the geometric distribution are
\[\mu =\frac{1}{p}\mbox{ and }\sigma^{2}=\frac{1-p}{p^{2}}.\]
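Since the geometric probabilities decay exponentially, truncating the defining sums far enough out gives the mean and variance to within floating-point tolerance. A sketch with the illustrative choice \(p=0.25\) (assumed, not from the text):

```python
# Geometric distribution: f(x) = p(1-p)^(x-1) for x = 1, 2, 3, ...
p = 0.25
terms = range(1, 2000)  # tail beyond x = 2000 is negligible for p = 0.25
f = {x: p * (1 - p) ** (x - 1) for x in terms}

mean = sum(x * fx for x, fx in f.items())
var = sum(x**2 * fx for x, fx in f.items()) - mean**2
```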
Negative Binomial Distribution.
Now, we turn to the more general problem of observing a sequence of Bernoulli trials until exactly \(r\) successes occur, where \(r\) is a fixed positive integer. Let the random variable \(X\) denote the number of trials needed to observe the \(r\)th success. In other words, the random variable \(X\) is the trial number on which the \(r\)th success is observed. By the multiplication rule of probabilities, the p.d.f. of \(X\) equals the product of the probability \(C^{x-1}_{r-1}p^{r-1}(1-p)^{x-r}\) of obtaining exactly \(r-1\) successes in the first \(x-1\) trials and the probability \(p\) of a success on the \(r\)th trial. Therefore, the p.d.f. of \(X\) is given by
\begin{align*} f(x) & =P(X=x)\\ & =C^{x-1}_{r-1}p^{r}(1-p)^{x-r}\end{align*}
for \(x=r,r+1,\cdots\) .
The reason for calling this negative binomial distribution is the following. Consider \(h(w)=(1-w)^{-r}\), the binomial \((1-w)\) with the negative exponent \(-r\). Using Maclaurin series expansion, we have
\begin{align*} (1-w)^{-r} & =\sum_{k=0}^{\infty}\frac{h^{(k)}(0)}{k!}w^{k}\\ & =\sum_{k=0}^{\infty}C^{r+k-1}_{r-1}w^{k}.\end{align*}
Let \(x=k+r\) in the summation. Then, we have \(k=x-r\) and
\begin{align*} (1-w)^{-r} & =\sum_{x=r}^{\infty}C^{r+x-r-1}_{r-1}w^{x-r}\\ & = \sum_{x=r}^{\infty}C^{x-1}_{r-1}w^{x-r},\end{align*}
the summand of which is, except for the factor \(p^{r}\), the negative binomial probability when \(w=q\). It can be shown that the mean and variance
of the negative binomial distribution are
\[\mu =\frac{r}{p}\mbox{ and }\sigma^{2}=\frac{r(1-p)}{p^{2}}.\]
Poisson Distribution.
We say that the random variable \(X\) has a Poisson distribution if its p.d.f. has the form
\[f(x)=P(X=x)=\frac{\lambda^{x}e^{-\lambda}}{x!}\]
for \(x=0,1,2,\cdots\) and \(\lambda >0\). It can be shown that the mean and variance of the Poisson distribution are \(\mu =\lambda\) and \(\sigma^{2}=\lambda\). When \(n\) is large, we have
\[P(X=x)\approx C^{n}_{x}\left (\frac{\lambda}{n}\right )^{x}\left (1-\frac{\lambda}{n}\right )^{n-x}.\]
It also means that, when \(X\) has the binomial distribution \(B(n,p)\) with large \(n\) and small \(p\), we have
\[\frac{(np)^{x}e^{-np}}{x!}\approx C^{n}_{x}p^{x}(1-p)^{n-x}\]
by taking \(\lambda =np\). This approximation is reasonably good when \(n\) is large. In this case, since \(\lambda =np\) is held fixed, the parameter \(p\) is necessarily small.
Example. A manufacturer of Christmas tree light bulbs knows that 2% of its bulbs are defective. Assuming independence, we have a binomial distribution with parameters \(p=0.02\) and \(n=100\). To approximate the probability that a box of 100 of these bulbs contains at most three defective bulbs, we use the Poisson distribution with \(\lambda =100\cdot 0.02=2\), which gives
\[\sum_{x=0}^{3}\frac{2^{x}e^{-2}}{x!}=0.857.\]
Using the binomial distribution, we obtain
\[\sum_{x=0}^{3}C^{100}_{x}(0.02)^{x}(0.98)^{100-x}=0.859.\]
In this case, the Poisson approximation is extremely close to the true value.
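Both sums in this example are easy to reproduce directly:

```python
from math import comb, exp, factorial

# P(at most 3 defective bulbs) in a box of n = 100 with defect rate p = 0.02.
n, p = 100, 0.02
lam = n * p  # lambda = np = 2

# Poisson approximation: sum of 2^x e^{-2} / x! for x = 0, 1, 2, 3.
poisson = sum(lam**x * exp(-lam) / factorial(x) for x in range(4))

# Exact binomial probability for the same event.
binomial = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(4))
```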
\begin{equation}{\label{c}}\tag{C}\mbox{}\end{equation}
Generating Functions.
Let \(X\) be a discrete random variable with p.d.f. \(f(x)\) and range \(R\). Suppose that
\[\eta (t)=\mathbb{E}(t^{X})=\sum_{x\in R}t^{x}f(x)\]
exists and is finite for \(t\) values in some interval including \(t=0\) and \(t=1\). Then, the function \(\eta (t)\) is called the probability-generating function of \(X\).
Example. Consider a random variable \(X\) that has the geometric p.d.f. \(f(x)=p(1-p)^{x-1}\) for \(x=1,2,3,\cdots\), where \(0<p<1\). Then, we have
\begin{align*} \eta (t) & =\sum_{x=1}^{\infty} t^{x}p(1-p)^{x-1}\\ & =pt\sum_{x=1}^{\infty}[(1-p)t]^{x-1}\\ & =\frac{pt}{1-(1-p)t}\end{align*}
provided that \(-1<(1-p)t<1\). In other words, if \(-1/(1-p)<t<1/(1-p)\), the generating function \(\eta (t)\) exists. \(\sharp\)
To see why \(\eta (t)\) is called the probability-generating function, suppose that there are only a finite number of points in \(R\); the binomial distribution serves as an illustration. Then,
\begin{align*} \eta (t) & =\sum_{x=0}^{n} t^{x}f(x)\\ & =f(0)+f(1)t+f(2)t^{2}+\cdots +f(n)t^{n}\end{align*}
is simply a polynomial of degree \(n\). In particular, we have \(\eta (0)=f(0)=P(X=0)\). We take the first derivative
\[\eta^{\prime}(t)=f(1)+2f(2)t+3f(3)t^{2}+\cdots +nf(n)t^{n-1}\]
and set \(t=0\). Then, we obtain
\[\eta^{\prime}(0)=f(1)=P(X=1).\]
By taking the second derivative
\[\eta^{\prime\prime}(t)=2f(2)+3\cdot 2f(3)t+\cdots +n\cdot (n-1)f(n)t^{n-2}\]
and setting \(t=0\), we also obtain
\[P(X=2)=\frac{\eta^{\prime\prime}(0)}{2!}.\]
In general, we have
\[P(X=r)=\frac{\eta^{(r)}(0)}{r!}\]
for \(r=0,1,2,\cdots ,n\). Therefore, we can generate the probabilities \(P(X=x)\) for \(x\in R\) from \(\eta (t)\) and its derivatives.
Now, we consider
\[\eta (t)=\sum_{x\in R} t^{x}f(x).\]
If we interchange the order of differentiation and summation, which we can do provided that the resulting summations exist, we obtain, for each positive integer \(r\),
\[\eta^{(r)}(t)=\sum_{x\in R}x(x-1)\cdots (x-r+1)t^{x-r}f(x).\]
By setting \(t=1\), we also obtain
\begin{align*} \eta^{(r)}(1) & =\sum_{x\in R}x(x-1)\cdots (x-r+1)f(x)\\ & =\mathbb{E}[X(X-1)\cdots (X-r+1)].\end{align*}
In particular, we have
\[\eta^{\prime}(1)=E(X)=\mu\]
and
\begin{align*} & \eta^{\prime\prime}(1)+\eta^{\prime}(1)-[\eta^{\prime}(1)]^{2}\\ & \quad =\mathbb{E}[X(X-1)]+\mathbb{E}(X)-[\mathbb{E}(X)]^{2}\\ & \quad =\mathbb{E}(X^{2})-[\mathbb{E}(X)]^{2}\\ & \quad=\sigma^{2}.\end{align*}
Example. For the geometric distribution, the probability-generating function is given by
\[\eta (t)=\frac{pt}{[1-(1-p)t]}\]
for \(-\frac{1}{1-p}<t<\frac{1}{1-p}\). Now, we have
\[\eta^{\prime}(t)=\frac{p}{[1-(1-p)t]^{2}}\]
and
\[\eta^{\prime\prime}(t)=\frac{2p(1-p)}{[1-(1-p)t]^{3}}.\]
The mean of the geometric distribution is given by
\begin{align*} \mu & =E(X)\\ & =\eta^{\prime}(1)\\ & =\frac{p}{[1-(1-p)]^{2}}\\ & =\frac{1}{p}.\end{align*}
The variance of the geometric distribution is given by
\begin{align*} \sigma^{2} & =\eta^{\prime\prime}(1)+\eta^{\prime}(1)-[\eta^{\prime}(1)]^{2}\\ & =\frac{2p(1-p)}{[1-(1-p)]^{3}}+\frac{1}{p}-\frac{1}{p^{2}}\\ & =\frac{1-p}{p^{2}}.\end{align*}
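These computations can be checked numerically: the closed form of \(\eta (t)\) agrees with its defining series, and \(\eta^{\prime}(1)\) and \(\eta^{\prime\prime}(1)\) recover the mean and variance. The value \(p=0.4\) below is an illustrative choice, not from the text:

```python
# Geometric distribution with illustrative parameter p (assumed).
p = 0.4
q = 1 - p

def eta_series(t, terms=500):
    """Probability-generating function as a truncated series sum."""
    return sum(t**x * p * q ** (x - 1) for x in range(1, terms + 1))

def eta_closed(t):
    """Closed form eta(t) = pt / (1 - (1-p)t)."""
    return p * t / (1 - q * t)

# Derivatives of the closed form evaluated at t = 1 (note 1 - q = p).
eta1 = p / (1 - q) ** 2          # eta'(1)
eta2 = 2 * p * q / (1 - q) ** 3  # eta''(1)
mu = eta1
var = eta2 + eta1 - eta1**2
```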
Let \(X\) be a discrete random variable with p.d.f. \(f(x)\) and range \(R\). If there is a positive number \(h\) such that
\[\mathbb{E}(e^{tX})=\sum_{x\in R}e^{tx}f(x)\]
exists and is finite for \(-h<t<h\), then the function of \(t\) defined by
\[M(t)=\mathbb{E}(e^{tX})\]
is called the moment-generating function of \(X\).
Example. Consider the random variable \(X\) that has the geometric p.d.f. \(f(x)=p(1-p)^{x-1}\) for \(x=1,2,3,\cdots\). Then, we have
\begin{align*} M(t) & =\sum_{x=1}^{\infty} e^{tx}p(1-p)^{x-1}\\ & =pe^{t}\sum_{x=1}^{\infty}[(1-p)e^{t}]^{x-1}.\end{align*}
The summation is a geometric series, which converges provided that \((1-p)e^{t}<1\) or, equivalently, \(t<-\ln (1-p)\). That is, we have
\[M(t)=\frac{pe^{t}}{1-(1-p)e^{t}}\]
for \(t<-\ln (1-p)\). It can be shown that the existence of \(M(t)\), for \(-h<t<h\), implies that the derivatives of \(M(t)\) of all orders exist at \(t=0\). Moreover, it is permissible to interchange differentiation and summation. Therefore, we obtain
\[M^{(r)}(t)=\sum_{x\in R}x^{r}e^{tx}f(x).\]
By taking \(t=0\), it follows
\[M^{(r)}(0)=\sum_{x\in R} x^{r}f(x)=\mathbb{E}(X^{r}).\]
In particular, if the moment-generating function exists, we have \(\mu =M^{\prime}(0)\) and
\[\sigma^{2}=M^{\prime\prime}(0)-[M^{\prime}(0)]^{2}.\]
Example. Let \(X\) have a binomial distribution \(B(n,p)\) with p.d.f.
\[f(x)=C^{n}_{x}p^{x}(1-p)^{n-x}\]
for \(x=0,1,2,\cdots ,n\). The moment-generating function of \(X\) is given by
\begin{align*} M(t) & =E(e^{tX})\\ & =\sum_{x=0}^{n}e^{tx}C^{n}_{x}p^{x}(1-p)^{n-x}\\ & =\sum_{x=0}^{n}C^{n}_{x}(pe^{t})^{x}(1-p)^{n-x}.\end{align*}
Using the formula for the binomial expansion with \(a=1-p\) and \(b=pe^{t}\), we have
\[M(t)=[(1-p)+pe^{t}]^{n}.\]
The first two derivatives of \(M(t)\) are given by
\[M'(t)=n[(1-p)+pe^{t}]^{n-1}(pe^{t})\]
and
\[M''(t)=n(n-1)[(1-p)+pe^{t}]^{n-2}(pe^{t})^{2}+n[(1-p)+pe^{t}]^{n-1}(pe^{t}).\]
Therefore, we obtain
\[\mu =\mathbb{E}(X)=M'(0)=np\]
and
\begin{align*} \sigma^{2} & =M''(0)-[M'(0)]^{2}\\ & =n(n-1)p^{2}+np-(np)^{2}\\ & =np(1-p).\end{align*}
Example. Let \(X\) have a Poisson distribution with p.d.f.
\[f(x)=\frac{\lambda^{x}e^{-\lambda}}{x!}\]
for \(x=0,1,2,\cdots\). The moment-generating function of \(X\) is given by
\begin{align*} M(t) & =E(e^{tX})\\ & =\sum_{x=0}^{\infty} e^{tx}\frac{\lambda^{x}e^{-\lambda}}{x!}\\ & =e^{-\lambda}\sum_{x=0}^{\infty}\frac{(\lambda e^{t})^{x}}{x!}.\end{align*}
Using the series representation of the exponential function, we have
\begin{align*} M(t) & =e^{-\lambda}e^{\lambda e^{t}}\\ & =e^{\lambda (e^{t}-1)}\end{align*}
for all values of \(t\). Therefore, we obtain
\[M'(t)=\lambda e^{t}e^{\lambda (e^{t}-1)}\]
and
\[M''(t)=(\lambda e^{t})^{2}e^{\lambda (e^{t}-1)}+\lambda e^{t}e^{\lambda (e^{t}-1)}.\]
The mean and variance of \(X\) are given by
\[\mu =M'(0)=\lambda\]
and
\begin{align*} \sigma^{2} & =M''(0)-[M'(0)]^{2}\\ & =(\lambda^{2}+\lambda )-\lambda^{2}=\lambda .\end{align*}
The relationship between \(\eta (t)\) and \(M(t)\) is given by
\[M(t)=\eta (e^{t})\mbox{ and }\eta (t)=M(\ln t).\]
Example. Let \(X\) have a negative binomial distribution with p.d.f.
\[f(x)=C^{x-1}_{r-1}p^{r}(1-p)^{x-r}\]
for \(x=r,r+1,\cdots\). The probability-generating function of \(X\) is given by
\begin{align*} \eta (t) & =\sum_{x=r}^{\infty} t^{x}C^{x-1}_{r-1}p^{r}(1-p)^{x-r}\\ & = (pt)^{r}\sum_{x=r}^{\infty}C^{x-1}_{r-1}[(1-p)t]^{x-r}\\ & =\frac{(pt)^{r}}{[1-(1-p)t]^{r}}\end{align*}
where \(|t|<1/(1-p)\). The moment-generating function of \(X\) is
\begin{align*} M(t) & =\eta (e^{t})\\ & =\frac{(pe^{t})^{r}}{[1-(1-p)e^{t}]^{r}}\end{align*}
for \(t<-\ln (1-p)\). These generating functions can be used to show
\[\mu =\frac{r}{p}\mbox{ and }\sigma^{2}=\frac{r(1-p)}{p^{2}}.\]
Let \(X\) have a p.d.f. \(f(x)\) with support \(\{b_{1},b_{2},\cdots\}\). Then, we have
\begin{align*} M(t) & =\sum_{x\in R}e^{tx}f(x)\\ & =f(b_{1})e^{tb_{1}}+f(b_{2})e^{tb_{2}}+\cdots .\end{align*}
Therefore, the coefficient of \(e^{tb_{i}}\) is \(f(b_{i})=P(X=b_{i})\). That is, if we write the moment-generating function of a discrete random variable \(X\) in the form above, the probability of any value of \(X\), say \(b_{i}\), is the coefficient of \(e^{tb_{i}}\).
Example. Let the moment-generating function of \(X\) be defined by
\[M(t)=\frac{1}{15}e^{t}+\frac{2}{15}e^{2t}+\frac{3}{15}e^{3t}+\frac{4}{15}e^{4t}+\frac{5}{15}e^{5t}.\]
Then, the coefficient of \(e^{2t}\) is \(2/15\). Therefore, we have \(f(2)=P(X=2)=2/15\). In general, we see that the p.d.f. of \(X\) is given by
\[f(x)=\frac{x}{15}\mbox{ for }x=1,2,3,4,5.\]
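This example is easy to verify by reconstructing \(M(t)\) from the recovered p.d.f. (an illustrative sketch):

```python
from fractions import Fraction
import math

# p.d.f. read off as the coefficients of e^{tx} in M(t): f(x) = x/15, x = 1..5.
f = {x: Fraction(x, 15) for x in range(1, 6)}

def M(t):
    """Moment-generating function reconstructed from the p.d.f."""
    return sum(float(fx) * math.exp(t * x) for x, fx in f.items())

# The mean follows directly from the p.d.f.
mean = sum(x * fx for x, fx in f.items())
```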


