Expectations and Variances


This article consists of the following sections.

\begin{equation}{\label{a}}\tag{A}\mbox{}\end{equation}

Moments of Random Variables.

Let \(X\) be a random variable with p.d.f. \(f\), and let \(g:(\mathbb{R},{\cal B})\rightarrow (\mathbb{R},{\cal B})\) be measurable so that \(g(X)\) is a random variable. Then, we give the following definition.

Definition. The \(n\)th moment of \(g(X)\) is denoted by \(\mathbb{E}[g(X)]^{n}\) and is defined by

\[\mathbb{E}[g(X)]^{n}=\left\{\begin{array}{ll}
{\displaystyle \sum_{x}(g(x))^{n}\cdot f(x)} & \mbox{if }X\mbox{ is discrete}\\
{\displaystyle \int_{-\infty}^{\infty}(g(x))^{n}\cdot f(x)dx} & \mbox{if }X\mbox{ is continuous}.
\end{array}\right .\]

For \(n=1\), \(\mathbb{E}[g(X)]\) is called the mathematical expectation or mean value of \(g(X)\). For an arbitrary constant \(c\) and a positive integer \(n\), the \(n\)th moment of \(g(X)\) about \(c\) is denoted by \(\mathbb{E}[g(X)-c]^{n}\) and is defined by

\[\mathbb{E}[g(X)-c]^{n}=\left\{\begin{array}{ll}
{\displaystyle \sum_{x}(g(x)-c)^{n}\cdot f(x)} & \mbox{if }X\mbox{ is discrete}\\
{\displaystyle \int_{-\infty}^{\infty}(g(x)-c)^{n}\cdot f(x)dx} & \mbox{if }X\mbox{ is continuous}.
\end{array}\right .\]

For \(c=\mathbb{E}[g(X)]\), the moments are called central moments. The 2nd central moment of \(g(X)\) is called the variance of \(g(X)\) and is denoted by \(\sigma^{2}(g(X))\) or \(Var(g(X))\). The quantity \(\sqrt{\sigma^{2}(g(X))}\equiv\sigma (g(X))\) is called the standard deviation of \(g(X)\). \(\sharp\)
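As a quick numerical illustration of this definition, the following sketch computes moments of \(g(X)\) for a small discrete distribution. The support, probabilities, and the choice of \(g\) below are arbitrary and only for illustration; Python with numpy is assumed.

```python
import numpy as np

# A discrete random variable X with p.d.f. f on the support {0, 1, 2, 3}.
x_support = np.array([0.0, 1.0, 2.0, 3.0])
f = np.array([0.1, 0.2, 0.3, 0.4])           # probabilities summing to 1

def moment(g, n, c=0.0):
    """n-th moment of g(X) about c: sum_x (g(x) - c)^n * f(x)."""
    return np.sum((g(x_support) - c) ** n * f)

g = lambda t: t ** 2                          # any measurable g will do
mean = moment(g, 1)                           # E[g(X)]
var = moment(g, 2, c=mean)                    # 2nd central moment = Var(g(X))
print(mean, var)
```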

Some interesting properties are shown below.

  • We have \(\mathbb{E}[c]=c\) and \(\sigma^{2}(c)=0\), where \(c\) is a constant.
  • We have \[\mathbb{E}[cg(X)+d]=c\mathbb{E}[g(X)]+d\] and, in general, we have \[\mathbb{E}\left [\sum_{i=1}^{n}c_{i}g_{i}(X)\right ]=\sum_{i=1}^{n}c_{i}\mathbb{E}[g_{i}(X)].\]
  • For \(X\geq 0\), we have \(\mathbb{E}[X]\geq 0\). Consequently, if \(X\geq Y\), then \(\mathbb{E}[X]\geq \mathbb{E}[Y]\).
  • We have \[\left |\mathbb{E}[g(X)]\right |\leq \mathbb{E}[\left |g(X)\right |].\]
  • We have \[\sigma^{2}[cg(X)+d]=c^{2}\sigma^{2}[g(X)].\]
  • We have \[\sigma^{2}[g(X)]=\mathbb{E}[g(X)]^{2}-(\mathbb{E}[g(X)])^{2}.\]

Theorem. For \(X\geq 0\), we have
\[\mathbb{E}[X]=\int_{0}^{\infty}\mathbb{P}(X>y)dy.\]
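A small numerical check of this theorem, using an exponential random variable purely as an illustrative choice (scipy is assumed to be available):

```python
import numpy as np
from scipy import integrate, stats

X = stats.expon(scale=0.5)      # an exponential with mean 1/2, purely illustrative

lhs = X.mean()                  # E[X]
# Right-hand side: integrate the survival function P(X > y) over [0, infinity).
rhs, _ = integrate.quad(X.sf, 0, np.inf)
print(lhs, rhs)                 # both equal 0.5 up to quadrature error
```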

\begin{equation}{\label{ex1}}\tag{1}\mbox{}\end{equation}

Example \ref{ex1}. Let \(X\) and \(Y\) have the joint p.d.f. \(f(x,y)=2\) for \(0\leq x\leq y\leq 1\). Then, the support is given by

\[R=\{(x,y):0\leq x\leq y\leq 1\}.\]

Now, we have

\begin{align*} & \mathbb{P}(0\leq X\leq 1/2,0\leq Y\leq 1/2)\\ & \quad=\mathbb{P}(0\leq X\leq Y,0\leq Y\leq 1/2)\\ & \quad=\int_{0}^{1/2}\int_{0}^{y} 2dxdy\\ & \quad=\frac{1}{4}.\end{align*}

The marginal p.d.f.’s are given by

\[f_{X}(x)=\int_{x}^{1} 2dy=2(1-x)\mbox{ for }0\leq x\leq 1\]

and

\[f_{Y}(y)=\int_{0}^{y} 2dx=2y\mbox{ for }0\leq y\leq 1.\]

Three expected values are given by

\begin{align*} \mathbb{E}[X] & =\int_{0}^{1}\int_{x}^{1} 2xdydx=\int_{0}^{1} 2x(1-x)dx=\frac{1}{3}\\
\mathbb{E}[Y] & =\int_{0}^{1}\int_{0}^{y} 2ydxdy=\int_{0}^{1} 2y^{2}dy=\frac{2}{3}\\
\mathbb{E}[Y^{2}] & =\int_{0}^{1}\int_{0}^{y} 2y^{2}dxdy=\int_{0}^{1}2y^{3}dy=\frac{1}{2}.\end{align*}
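These three values can be double-checked numerically by integrating over the triangular support. This is a sketch using scipy's dblquad; note that dblquad treats the first argument of the integrand as the inner variable, here \(x\).

```python
from scipy import integrate

# Joint p.d.f. f(x, y) = 2 on the triangle 0 <= x <= y <= 1.
# In dblquad the outer variable is y in [0, 1] and the inner variable is
# x in [0, y]; the integrand receives (inner, outer) = (x, y).
EX  = integrate.dblquad(lambda x, y: 2.0 * x,     0, 1, lambda y: 0.0, lambda y: y)[0]
EY  = integrate.dblquad(lambda x, y: 2.0 * y,     0, 1, lambda y: 0.0, lambda y: y)[0]
EY2 = integrate.dblquad(lambda x, y: 2.0 * y**2,  0, 1, lambda y: 0.0, lambda y: y)[0]
print(EX, EY, EY2)   # approximately 1/3, 2/3, 1/2
```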

\begin{equation}{\label{b}}\tag{B}\mbox{}\end{equation}

Conditional Expectations.

For discrete random variables \(X\) and \(Y\), the conditional expectation of \(Y\), given \(X=x\), is defined by

\begin{align*} \mu_{Y|X=x} & =\mathbb{E}[Y|X=x]\\ & =\sum_{y}y\cdot f(y|x),\end{align*}

and the conditional variance of \(Y\), given \(X=x\), is defined by

\begin{align*} \sigma_{Y|X=x}^{2} & =\mathbb{E}\left .\left\{(Y-\mathbb{E}[Y|X=x])^{2}\right |X=x\right\}\\ & =\sum_{y}(y-\mathbb{E}[Y|X=x])^{2}f(y|x),\end{align*}

which can be computed using

\[\sigma_{Y|X=x}^{2}=\mathbb{E}[Y^{2}|X=x]-(\mathbb{E}[Y|X=x])^{2}.\]

The conditional mean \(\mu_{X|Y=y}\) and the conditional variance \(\sigma_{X|Y=y}^{2}\) are given by similar expressions.

Example. Let \(X\) and \(Y\) have the joint p.d.f.

\[f(x,y)=\frac{x+y}{21}\]

for \(x=1,2,3\) and \(y=1,2\). Then, the marginal probability functions are given by

\[f_{X}(x)=\frac{2x+3}{21}\mbox{ for }x=1,2,3\]

and

\[f_{Y}(y)=\frac{3y+6}{21}\mbox{ for }y=1,2.\]

Therefore, the conditional p.d.f. of \(X\) given \(Y=y\) is equal to

\[g(x|y)=\frac{(x+y)/21}{(3y+6)/21}=\frac{x+y}{3y+6}\]

for \(x=1,2,3\) when \(y=1\) or \(2\). For example, we have

\begin{align*} \mathbb{P}(X=2|Y=2) & =g(2|2)\\ & =4/12=1/3. \end{align*}

Similarly, the conditional p.d.f. of \(Y\), given \(X=x\), is equal to

\[f(y|x)=\frac{x+y}{2x+3}\]

for \(y=1,2\) when \(x=1,2\) or \(3\). We now compute \(\mu_{Y|x}\) and \(\sigma_{Y|x}^{2}\) for \(x=3\).

\begin{align*} \mu_{Y|3} & =\mathbb{E}[Y|X=3]\\ & =\sum_{y=1}^{2}yf(y|3)\\ & =\sum_{y=1}^{2}y\cdot\frac{3+y}{9}\\ & =\frac{14}{9}\end{align*}

and

\begin{align*} \sigma_{Y|3}^{2} & =\mathbb{E}\left [\left .\left (Y-\frac{14}{9}\right )^{2}\right |X=3\right ]\\ & =\sum_{y=1}^{2}\left (y-\frac{14}{9}\right )^{2}\left (\frac{3+y}{9}\right )\\ & =\frac{20}{81}.\sharp\end{align*}
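The two conditional quantities above can be verified with a few lines of code (a minimal sketch using numpy):

```python
import numpy as np

# Conditional p.d.f. f(y | x) = (x + y) / (2x + 3) for y = 1, 2, evaluated at x = 3.
x = 3
y = np.array([1.0, 2.0])
f_cond = (x + y) / (2 * x + 3)

mu = np.sum(y * f_cond)                   # E[Y | X = 3] = 14/9
var = np.sum((y - mu) ** 2 * f_cond)      # Var(Y | X = 3) = 20/81
print(mu, 14 / 9, var, 20 / 81)
```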

Let \(X_{1}\) and \(X_{2}\) be two continuous random variables with joint p.d.f. \(f(x_{1},x_{2})\). The conditional expectation and variance of \(X_{2}\) given that \(X_{1}=x_{1}\) are defined by

\[\mathbb{E}[X_{2}|X_{1}=x_{1}]=\int_{\mathbb{R}}x_{2}f(x_{2}|x_{1})dx_{2}\]

and

\begin{align*} & \mbox{Var}[X_{2}|X_{1}=x_{1}]\\ & \quad =\mathbb{E}\left .\left\{(X_{2}-\mathbb{E}[X_{2}|X_{1}=x_{1}])^{2}\right |X_{1}=x_{1}\right\}\\
& \quad =\int_{\mathbb{R}}\left (x_{2}-\mathbb{E}[X_{2}|X_{1}=x_{1}]\right )^{2}\cdot f(x_{2}|x_{1})dx_{2}\\
& \quad =\mathbb{E}[X_{2}^{2}|X_{1}=x_{1}]-(\mathbb{E}[X_{2}|X_{1}=x_{1}])^{2}.\end{align*}

Example. Continuing Example \ref{ex1}, we have

\[\begin{array}{l}
f(x,y)=2\mbox{ for }0\leq x\leq y\leq 1,\\
f_{X}(x)=2(1-x)\mbox{ for }0\leq x\leq 1,\\
f_{Y}(y)=2y\mbox{ for }0\leq y\leq 1.
\end{array}\]

We also have

\begin{align*} f(y|x) & =\frac{f(x,y)}{f_{X}(x)}\\ & =\frac{2}{2(1-x)}\\ & =\frac{1}{1-x}\end{align*}

for \(0\leq x<1\) and \(x\leq y\leq 1\). The conditional mean of \(Y\), given \(X=x\), is given by

\begin{align*} \mathbb{E}[Y|X=x] & =\int_{x}^{1} y\cdot\frac{1}{1-x}dy\\ & =\frac{1+x}{2}\end{align*}

for \(0\leq x<1\). Similarly, it can be shown that

\[\mathbb{E}[X|Y=y]=\frac{y}{2}\]

for \(0\leq y\leq 1\). The conditional variance of \(Y\), given \(X=x\), is given by

\begin{align*} & \mathbb{E}\left .\left\{(Y-\mathbb{E}[Y|X=x])^{2}\right |X=x\right\}\\ & \quad =\int_{x}^{1}\left (y-\frac{1+x}{2}\right )^{2}\cdot\frac{1}{1-x}dy\\ & \quad =\frac{(1-x)^{2}}{12}.\end{align*}

As an illustration, a conditional probability can be computed as

\begin{align*} & \mathbb{P}\left (\frac{3}{4}\leq Y\leq\frac{7}{8}\left |X=\frac{1}{4}\right .\right)\\ & \quad=\int_{3/4}^{7/8} f(y|1/4)dy\\ & \quad=\int_{3/4}^{7/8}\frac{1}{3/4}dy=\frac{1}{6}.\end{align*}
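The conditional mean, conditional variance, and conditional probability computed above can be checked numerically (a sketch assuming scipy; the value \(x=1/4\) matches the last computation):

```python
from scipy import integrate

x = 0.25                                  # the conditioning value used above
f_cond = lambda y: 1.0 / (1.0 - x)        # f(y | x) = 1/(1 - x) on x <= y <= 1

cond_mean = integrate.quad(lambda y: y * f_cond(y), x, 1)[0]
cond_var = integrate.quad(lambda y: (y - cond_mean) ** 2 * f_cond(y), x, 1)[0]
prob = integrate.quad(f_cond, 3 / 4, 7 / 8)[0]

print(cond_mean, (1 + x) / 2)             # 0.625
print(cond_var, (1 - x) ** 2 / 12)        # 0.046875
print(prob, 1 / 6)                        # 0.1666...
```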

We see that \(\mathbb{E}[X_{2}|X_{1}=x_{1}]\) is a function of \(x_{1}\). Replacing \(x_{1}\) by the random variable \(X_{1}\) and writing \(\mathbb{E}[X_{2}|X_{1}]\) instead of \(\mathbb{E}[X_{2}|X_{1}=x_{1}]\), we have that \(\mathbb{E}[X_{2}|X_{1}]\) is itself a random variable and a function of \(X_{1}\). Then we may talk about \(\mathbb{E}[\mathbb{E}[X_{2}|X_{1}]]\).

Proposition. Suppose that \(\mathbb{E}[X_{2}]\) and \(\mathbb{E}[X_{2}|X_{1}]\) exist. Then, we have \(\mathbb{E}[\mathbb{E}[X_{2}|X_{1}]]=\mathbb{E}[X_{2}]\).

Proof. We have

\begin{align*} \mathbb{E}[\mathbb{E}[X_{2}|X_{1}]] & =\int_{\mathbb{R}}\mathbb{E}[X_{2}|X_{1}=x_{1}]f_{1}(x_{1})dx_{1}\\
& =\int_{\mathbb{R}}\left [\int_{\mathbb{R}}x_{2}f(x_{2}|x_{1})dx_{2}\right ]f_{1}(x_{1})dx_{1}\\
& =\int_{\mathbb{R}}\int_{\mathbb{R}}x_{2}f(x_{2}|x_{1})f_{1}(x_{1})dx_{2}dx_{1}\\
& =\int_{\mathbb{R}}\int_{\mathbb{R}}x_{2}f(x_{1},x_{2})dx_{2}dx_{1}\\ & =\int_{\mathbb{R}}\int_{\mathbb{R}}x_{2}f(x_{1},x_{2})dx_{1}dx_{2}\\
& =\int_{\mathbb{R}}x_{2}\left (\int_{\mathbb{R}}f(x_{1},x_{2})dx_{1}\right )dx_{2}\\ & =\int_{\mathbb{R}}x_{2}f_{2}(x_{2})dx_{2}\\ & =\mathbb{E}[X_{2}].\end{align*}

This completes the proof. \(\blacksquare\)
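The proposition can be illustrated with a Monte Carlo check on the joint p.d.f. of Example \ref{ex1}, where \(\mathbb{E}[Y|X=x]=(1+x)/2\) and \(\mathbb{E}[Y]=2/3\). This is only a sketch; the sampling scheme below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample (X, Y) from the joint p.d.f. f(x, y) = 2 on 0 <= x <= y <= 1 by
# sorting two independent Uniform(0, 1) draws (the order statistics of two
# uniforms have exactly this joint density).
u = rng.uniform(size=(100_000, 2))
x, y = np.sort(u, axis=1).T

inner = (1 + x) / 2                 # E[Y | X] = (1 + X)/2 from the example above
print(inner.mean(), y.mean())       # both close to E[Y] = 2/3
```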

Proposition. Let \(X_{1}\) and \(X_{2}\) be random variables. Suppose that \(g(x)\) is a measurable function of \(x\) and \(\mathbb{E}[X_{2}]\) exists. Then, we have

\[\mathbb{E}[X_{2}\cdot g(X_{1})|X_{1}=x_{1}]=g(x_{1})\cdot\mathbb{E}[X_{2}|X_{1}=x_{1}]\]

or

\[\mathbb{E}[X_{2}\cdot g(X_{1})|X_{1}]=g(X_{1})\cdot \mathbb{E}[X_{2}|X_{1}].\]

In particular, by taking \(X_{2}=1\), we have

\[\mathbb{E}[g(X_{1})|X_{1}=x_{1}]=g(x_{1})\]

or

\[\mathbb{E}[g(X_{1})|X_{1}]=g(X_{1}).\]

Proof. We have

\begin{align*} \mathbb{E}[X_{2}\cdot g(X_{1})|X_{1}=x_{1}] & =\int_{\mathbb{R}}x_{2}g(x_{1})f(x_{2}|x_{1})dx_{2}\\ & =g(x_{1})\int_{\mathbb{R}}x_{2}f(x_{2}|x_{1})dx_{2}\\ & =g(x_{1})\cdot
\mathbb{E}[X_{2}|X_{1}=x_{1}].\end{align*}

This completes the proof. \(\blacksquare\)

Proposition. Let \(X\) and \(Y\) be two random variables. Then, we have

\[\mbox{Var}(Y)=\mathbb{E}(\mbox{Var}(Y|X))+\mbox{Var}(\mathbb{E}(Y|X)).\]
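This identity (the law of total variance) can again be checked on Example \ref{ex1}, where \(\mbox{Var}(Y|X)=(1-X)^{2}/12\) and \(\mathbb{E}[Y|X]=(1+X)/2\); the sketch below compares both sides by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.uniform(size=(200_000, 2))
x, y = np.sort(u, axis=1).T          # (X, Y) with joint density 2 on the triangle

cond_mean = (1 + x) / 2              # E[Y | X]
cond_var = (1 - x) ** 2 / 12         # Var(Y | X)

lhs = y.var()                        # Var(Y)
rhs = cond_var.mean() + cond_mean.var()
print(lhs, rhs, 1 / 18)              # all approximately 1/18
```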

\begin{equation}{\label{p1}}\tag{2}\mbox{}\end{equation}

Proposition \ref{p1}. Let \(X\) be a random variable, and let \(g\geq 0\) be a real-valued measurable function defined on \(\mathbb{R}\) such that \(g(X)\) is a random variable, and let \(c>0\). Then, we have

\[\mathbb{P}(g(X)\geq c)\leq\frac{\mathbb{E}[g(X)]}{c}.\]

Proof. Assume that \(X\) is a continuous random variable with p.d.f. \(f\). Then, we have

\begin{align*} \mathbb{E}[g(X)] & =\int_{\mathbb{R}}g(x)\cdot f(x)dx\\ & =\int_{A}g(x)\cdot f(x)dx+\int_{A^{c}}g(x)\cdot f(x)dx,\end{align*}

where

\[A=\{x\in \mathbb{R}:g(x)\geq c\}.\]

We also have

\begin{align*} \mathbb{E}[g(X)] & \geq\int_{A}g(x)\cdot f(x)dx\\ & \geq c\int_{A}f(x)dx\\ & =c\cdot \mathbb{P}(X\in A)\\ & =c\cdot \mathbb{P}(g(X)\geq c).\end{align*}

The proof is analogous when \(X\) is a discrete random variable. This completes the proof. \(\blacksquare\)
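A quick empirical illustration of Proposition \ref{p1} (the distribution of \(X\), the function \(g\), and the constant \(c\) below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=200_000)   # an illustrative choice of X
g = lambda t: t ** 2                           # a nonnegative measurable g
c = 4.0

lhs = np.mean(g(x) >= c)          # P(g(X) >= c), approximately exp(-2) here
rhs = np.mean(g(x)) / c           # E[g(X)] / c, approximately 0.5 here
print(lhs, rhs, lhs <= rhs)
```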

\begin{equation}{\label{c}}\tag{C}\mbox{}\end{equation}

Inequalities.

Let us take \(g(X)=|X-\mu |^{r}\), \(\mu =\mathbb{E}[X]\) and \(r>0\). Then, Proposition \ref{p1} gives the following inequality

\begin{align*} \mathbb{P}(|X-\mu |\geq c) & =\mathbb{P}(|X-\mu |^{r}\geq c^{r})\\ & \leq\frac{\mathbb{E}[|X-\mu |^{r}]}{c^{r}},\end{align*}

which is known as Markov’s inequality. When we take \(r=2\), we also have

\begin{align*} \mathbb{P}(|X-\mu |\geq c) & \leq\frac{\mathbb{E}[X-\mu ]^{2}}{c^{2}}\\ & =\frac{\sigma^{2}}{c^{2}},\end{align*}

which is known as Chebyshev’s inequality. In particular, for \(c=k\sigma\), we have

\[\mathbb{P}(|X-\mu |\geq k\sigma )\leq\frac{1}{k^{2}}.\]
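The following sketch compares the empirical tail probability with the Chebyshev bound \(1/k^{2}\) for a few values of \(k\) (the exponential distribution is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=200_000)   # again an illustrative X
mu, sigma = x.mean(), x.std()

for k in (1.5, 2.0, 3.0):
    tail = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, tail, 1 / k ** 2)                 # empirical tail is below 1/k^2
```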

There is another form for Chebyshev’s inequality.

Theorem. (Chebyshev’s inequality). Let \(X\) be a positive random variable, and let \(g\) be a positive and increasing function on \(\mathbb{R}_{+}\). Then, for each \(c>0\), we have

\[\mathbb{P}(X\geq c)\leq\frac{\mathbb{E}[g(X)]}{g(c)}.\]

Proof. Since \(g\) is an increasing function, we have

\begin{align*} g(X) & \geq g(X)\cdot 1_{\{X\geq c\}}\\ & \geq g(c)\cdot 1_{\{X\geq c\}}.\end{align*}

Taking expectations in the inequalities, we obtain

\begin{align*} \mathbb{E}[g(X)] & \geq g(c)\cdot \mathbb{E}[1_{\{X\geq c\}}]\\ & =g(c)\cdot\int_{\mathbb{R}}f(x)\cdot 1_{\{x\geq c\}}dx\\
& =g(c)\cdot\int_{\{x\geq c\}}f(x)dx\\ & =g(c)\cdot\int_{c}^{\infty}f(x)dx\\ & =g(c)\cdot \mathbb{P}(X\geq c).\end{align*}

This completes the proof. \(\blacksquare\)

Proposition. (Cauchy-Schwarz inequality). Let \(X\) and \(Y\) be two random variables with means \(\mu_{1},\mu_{2}\) and variances \(\sigma_{1}^{2},\sigma_{2}^{2}\), respectively. Then, we have

\[\left (\mathbb{E}[(X-\mu_{1})(Y-\mu_{2})]\right )^{2}\leq\sigma_{1}^{2}\sigma_{2}^{2},\]

or equivalently

\[-\sigma_{1}\sigma_{2}\leq \mathbb{E}[(X-\mu_{1})(Y-\mu_{2})]\leq\sigma_{1}\sigma_{2}.\]

Moreover, we have

\[\mathbb{E}[(X-\mu_{1})(Y-\mu_{2})]=\sigma_{1}\sigma_{2}\]

if and only if

\[\mathbb{P}\left [Y=\mu_{2}+\frac{\sigma_{2}}{\sigma_{1}}(X-\mu_{1})\right ]=1\]

and

\[\mathbb{E}[(X-\mu_{1})(Y-\mu_{2})]=-\sigma_{1}\sigma_{2}\]

if and only if

\[\mathbb{P}\left [Y=\mu_{2}-\frac{\sigma_{2}}{\sigma_{1}}(X-\mu_{1})\right ]=1.\]

The proof is omitted. \(\blacksquare\)

The expectation

\[\mathbb{E}[(X-\mu_{1})(Y-\mu_{2})]\]

is called the covariance of \(X\) and \(Y\) and is denoted by \(\mbox{Cov}(X,Y)\). On the other hand, the covariance of \((X-\mu_{1})/\sigma_{1}\) and \((Y-\mu_{2})/\sigma_{2}\) is called the correlation coefficient of \(X\) and \(Y\) and is denoted by \(\rho_{XY}\) or just \(\rho\) if no confusion is possible; that is

\begin{align*} \rho & =\mathbb{E}\left [\left (\frac{X-\mu_{1}}{\sigma_{1}}\right )\left (\frac{Y-\mu_{2}}{\sigma_{2}}\right )\right ]\\ & =\frac{\mathbb{E}[(X-\mu_{1})(Y-\mu_{2})]}{\sigma_{1}\sigma_{2}}\\ & =\frac{\mbox{Cov}(X,Y)}{\sigma_{1}\sigma_{2}}\\ & =\frac{\mathbb{E}[XY]-\mu_{1}\mu_{2}}{\sigma_{1}\sigma_{2}}.\end{align*}

From the Cauchy-Schwarz inequality, we have \(\rho^{2}\leq 1\), i.e., \(-1\leq\rho\leq 1\). Moreover, we have

\[\rho=1\mbox{ if and only if }Y=\mu_{2}+\frac{\sigma_{2}}{\sigma_{1}}(X-\mu_{1})\mbox{ with probability }1\]

and

\[\rho=-1\mbox{ if and only if }Y=\mu_{2}-\frac{\sigma_{2}}{\sigma_{1}}(X-\mu_{1})\mbox{ with probability }1.\]

So \(\rho =\pm 1\) means that \(X\) and \(Y\) are linearly related, which explains the significance of \(\rho\) as a measure of linear dependence between \(X\) and \(Y\). If \(\rho =0\), we say that \(X\) and \(Y\) are uncorrelated, while if \(\rho =\pm 1\), we say that \(X\) and \(Y\) are completely correlated (positively if \(\rho =1\), and negatively if \(\rho =-1\)). For \(-1<\rho <1\) with \(\rho\neq 0\), we say that \(X\) and \(Y\) are correlated (positively if \(\rho >0\), and negatively if \(\rho <0\)). Positive values of \(\rho\) indicate a tendency of large values of \(Y\) to correspond to large values of \(X\), and of small values of \(Y\) to correspond to small values of \(X\). Negative values of \(\rho\) indicate a tendency of small values of \(Y\) to correspond to large values of \(X\), and of large values of \(Y\) to correspond to small values of \(X\). Values of \(\rho\) close to zero indicate that these tendencies are weak, while values of \(\rho\) close to \(\pm 1\) indicate that the tendencies are strong.
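The following sketch illustrates these statements empirically: an exact increasing (decreasing) linear relation yields \(\rho=1\) (\(\rho=-1\)), while a noisy positive relation yields an intermediate positive value. The distributions and coefficients are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100_000)

y_pos = 2.0 + 3.0 * x                    # exact increasing linear relation
y_neg = 2.0 - 3.0 * x                    # exact decreasing linear relation
y_noisy = x + rng.normal(size=x.size)    # positively correlated, but not perfectly

for y in (y_pos, y_neg, y_noisy):
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    rho = cov / (x.std() * y.std())
    print(rho)                           # 1.0, -1.0, about 0.71
```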

Example. Let \(X_{1}\) and \(X_{2}\) have the joint p.d.f.

\[f(x_{1},x_{2})=\frac{x_{1}+2x_{2}}{18}\]

for \(x_{1}=1,2\) and \(x_{2}=1,2\). The marginal probability functions are, respectively, given by

\begin{align*} f_{X_{1}}(x_{1}) & =\sum_{x_{2}=1}^{2}\frac{x_{1}+2x_{2}}{18}\\ & =\frac{2x_{1}+6}{18}\end{align*}

for \(x_{1}=1,2\), and

\begin{align*} f_{X_{2}}(x_{2}) & =\sum_{x_{1}=1}^{2}\frac{x_{1}+2x_{2}}{18}\\ & =\frac{4x_{2}+3}{18}\end{align*}

for \(x_{2}=1,2\). Since \(f(x_{1},x_{2})\neq f_{X_{1}}(x_{1})f_{X_{2}}(x_{2})\), \(X_{1}\) and \(X_{2}\) are dependent. The mean and variance of \(X_{1}\) are given by

\begin{align*} \mu_{1} & =\sum_{x_{1}=1}^{2}x_{1}\cdot\frac{2x_{1}+6}{18}\\ & =\frac{14}{9}\end{align*}

and

\begin{align*} \sigma_{1}^{2} & =\sum_{x_{1}=1}^{2}x_{1}^{2}\cdot\frac{2x_{1}+6}{18}-\left (\frac{14}{9}\right )^{2}\\ & =\frac{20}{81}.\end{align*}

The mean and variance of \(X_{2}\) are given by

\begin{align*} \mu_{2} & =\sum_{x_{2}=1}^{2}x_{2}\cdot\frac{4x_{2}+3}{18}\\ & =\frac{29}{18}\end{align*}

and

\begin{align*} \sigma_{2}^{2} & =\sum_{x_{2}=1}^{2}x_{2}^{2}\cdot\frac{4x_{2}+3}{18}-\left (\frac{29}{18}\right )^{2}\\ & =\frac{77}{324}.\end{align*}

The covariance of \(X_{1}\) and \(X_{2}\) is given by

\begin{align*} \mbox{Cov}(X_{1},X_{2}) & =\sum_{x_{1}=1}^{2}\sum_{x_{2}=1}^{2}x_{1}x_{2}\cdot\frac{x_{1}+2x_{2}}{18}-\left (\frac{14}{9}\right )\left (\frac{29}{18}\right )\\ & =\frac{45}{18}-\frac{406}{162}\\ & =-\frac{1}{162}.\end{align*}

Therefore, the correlation coefficient is given by

\begin{align*} \rho & =\frac{-1/162}{\sqrt{(20/81)(77/324)}}\\ & =\frac{-1}{\sqrt{1540}}\\ & \approx -0.0255.\end{align*}
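The computations of this example can be verified directly (a short numpy sketch enumerating the four support points):

```python
import numpy as np
from itertools import product

# Joint p.d.f. f(x1, x2) = (x1 + 2*x2)/18 on {1, 2} x {1, 2}.
pts = list(product([1, 2], repeat=2))
p = np.array([(x1 + 2 * x2) / 18 for x1, x2 in pts])
x1 = np.array([a for a, _ in pts], dtype=float)
x2 = np.array([b for _, b in pts], dtype=float)

mu1, mu2 = np.sum(x1 * p), np.sum(x2 * p)
var1 = np.sum(x1 ** 2 * p) - mu1 ** 2
var2 = np.sum(x2 ** 2 * p) - mu2 ** 2
cov = np.sum(x1 * x2 * p) - mu1 * mu2
rho = cov / np.sqrt(var1 * var2)
print(mu1, var1)     # 14/9, 20/81
print(mu2, var2)     # 29/18, 77/324
print(cov, rho)      # -1/162, about -0.0255
```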
