We have the following sections.
- Vector-Valued Functions of Single Variable \ref{a}
- Real-Valued Functions of Several Variables
- Vector-Valued Functions of Several Variables
- Jacobian Determinant
- Optimum Problems
We are already familiar with differentiation for real-valued functions of a single variable. Now, we are going to study differentiation for vector-valued functions of a single variable, real-valued functions of several variables, and vector-valued functions of several variables.
\begin{equation}{\label{a}}\tag{A}\mbox{}\end{equation}
Vector-Valued Functions of Single Variable.
Let \({\bf f}:(a,b)\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on an open interval \((a,b)\). We say that \({\bf f}\) is differentiable at \(t\in (a,b)\) when the following limit
\[\lim_{h\rightarrow 0}\frac{{\bf f}(t+h)-{\bf f}(t)}{h}\mbox{ exists.}\]
This limit is called the derivative of \({\bf f}\) at \(t\), and is denoted by \({\bf f}'(t)\). The vector-valued function \({\bf f}\) is differentiable when it is differentiable at each point of its domain.
\begin{equation}{\label{map454}}\tag{1}\mbox{}\end{equation}
Lemma \ref{map454}. Let \({\bf f}:A\rightarrow\mathbb{R}^{n}\) be a vector-valued function defined on a subset \(A\) of \(\mathbb{R}\), and let \(p\) be an accumulation point of \(A\). Given \({\bf a}=(a_{1},\cdots ,a_{n})\), we have
\[\lim_{x\rightarrow p}{\bf f}(x)={\bf a}\]
if and only if
\[\lim_{x\rightarrow p}f_{k}(x)=a_{k}\]
for each \(k=1,\cdots ,n\).
For a proof of the above Lemma \ref{map454}, refer to the page Limits of functions.
\begin{equation}{\label{map456}}\tag{2}\mbox{}\end{equation}
Lemma \ref{map456}. Let \(f_{1},\cdots ,f_{n}\) be real-valued functions defined on a subset \(S\) of a metric space \((M,d)\). Then, the vector-valued function \({\bf f}=(f_{1},\cdots ,f_{n})\) is continuous at \(p\in S\) if and only if each one of the functions \(f_{1},\cdots ,f_{n}\) is continuous at \(p\).
For a proof of the above Lemma \ref{map456}, refer to the page Continuity of Functions.
Proposition. Let \({\bf f}:(a,b)\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on an open interval \((a,b)\). Then, we have the following properties.
(i) Suppose that \({\bf f}=(f_{1},\cdots ,f_{m})\) is differentiable at \(t\). Then \[{\bf f}'(t)=\left (f'_{1}(t),\cdots ,f'_{m}(t)\right ).\]
(ii) Suppose that \({\bf f}\) is differentiable at \(t\). Then \({\bf f}\) is continuous at \(t\).
Proof. Part (i) follows from Lemma \ref{map454}, and part (ii) follows from Lemma \ref{map456} immediately. \(\blacksquare\)
Example. We consider the vector-valued function \({\bf f}(t)=(t\sin t,e^{-t},t)\). Then, we have
\[{\bf f}'(t)=\left (\sin t+t\cos t,-e^{-t},1\right )\mbox{ and }{\bf f}''(t)=\left (2\cos t-t\sin t,e^{-t},0\right ).\]
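The componentwise derivative can be checked numerically. The following Python sketch (the test point \(t=1.3\) and step size are arbitrary choices) compares the hand-computed derivative against central difference quotients applied to each component:

```python
import math

def f(t):
    # f(t) = (t sin t, e^{-t}, t)
    return (t * math.sin(t), math.exp(-t), t)

def fprime(t):
    # hand-computed derivative: (sin t + t cos t, -e^{-t}, 1)
    return (math.sin(t) + t * math.cos(t), -math.exp(-t), 1.0)

def central_diff(func, t, h=1e-6):
    # componentwise symmetric difference quotient
    return tuple((p - m) / (2 * h) for p, m in zip(func(t + h), func(t - h)))

t0 = 1.3
err = max(abs(a - e) for a, e in zip(central_diff(f, t0), fprime(t0)))
```

The maximum componentwise error is on the order of the truncation error \(O(h^{2})\) of the central difference.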
Let \({\bf f}:[a,b]\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on a closed interval \([a,b]\) such that each component function \(f_{i}\) is Riemann-integrable on \([a,b]\). The Riemann integral of \({\bf f}\) on \([a,b]\) is defined by
\[\int_{a}^{b} {\bf f}(t)dt=\left (\int_{a}^{b} f_{1}(t)dt,\cdots ,\int_{a}^{b} f_{m}(t)dt\right ).\]
Example. We consider the vector-valued function \({\bf f}(t)=(t,\sqrt{1+t},-e^{t})\) defined on \([0,1]\). Then, we have
\begin{align*} \int_{0}^{1} {\bf f}(t)dt & =\left (\int_{0}^{1} tdt,\int_{0}^{1} \sqrt{1+t}dt,\int_{0}^{1} (-e^{t})dt\right )\\ & =\left (\frac{1}{2},\frac{2}{3}(2\sqrt{2}-1),1-e\right ).\end{align*}
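The componentwise definition of the integral lends itself to a quick numerical check. A minimal sketch using a midpoint Riemann sum on each component (the number of subintervals is an arbitrary choice):

```python
import math

def f(t):
    # f(t) = (t, sqrt(1+t), -e^t) on [0, 1]
    return (t, math.sqrt(1 + t), -math.exp(t))

def integrate(func, a, b, n=100_000):
    # midpoint Riemann sum applied to each component
    h = (b - a) / n
    total = [0.0, 0.0, 0.0]
    for i in range(n):
        val = func(a + (i + 0.5) * h)
        for k in range(3):
            total[k] += val[k] * h
    return tuple(total)

approx = integrate(f, 0.0, 1.0)
exact = (0.5, (2.0 / 3.0) * (2.0 * math.sqrt(2.0) - 1.0), 1.0 - math.e)
err = max(abs(a - e) for a, e in zip(approx, exact))
```

All three components agree with the closed-form values above to within the \(O(h^{2})\) accuracy of the midpoint rule.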
Example. Suppose that \({\bf f}'(t)=(2\cos t,-t\sin t^{2},2t)\) with \({\bf f}(0)=(1,0,3)\). We are going to find the vector-valued function \({\bf f}\). By integrating \({\bf f}'(t)\), we find
\[{\bf f}(t)=\left (2\sin t+C_{1},\frac{1}{2}\cos t^{2}+C_{2},t^{2}+C_{3}\right ).\]
Since
\[(1,0,3)={\bf f}(0)=\left (C_{1},\frac{1}{2}+C_{2},C_{3}\right ),\]
we have \(C_{1}=1\), \(C_{2}=-\frac{1}{2}\), \(C_{3}=3\). Therefore, we obtain
\[{\bf f}(t)=\left (2\sin t+1,\frac{1}{2}\cos t^{2}-\frac{1}{2},t^{2}+3\right ).\]
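The recovered \({\bf f}\) can be verified by differentiating it back (here the second component of \({\bf f}'\) is read as \(-t\sin t^{2}\), which is what the antiderivative \(\frac{1}{2}\cos t^{2}\) requires) and by evaluating the initial condition; a Python sketch with an arbitrary test point:

```python
import math

def f(t):
    # the recovered function: (2 sin t + 1, (1/2) cos t^2 - 1/2, t^2 + 3)
    return (2 * math.sin(t) + 1, 0.5 * math.cos(t ** 2) - 0.5, t ** 2 + 3)

def fprime(t):
    # the prescribed derivative, second component read as -t sin(t^2)
    return (2 * math.cos(t), -t * math.sin(t ** 2), 2 * t)

initial = f(0.0)  # should equal (1, 0, 3)
t0, h = 0.7, 1e-6
numeric = tuple((p - m) / (2 * h) for p, m in zip(f(t0 + h), f(t0 - h)))
err = max(abs(a - e) for a, e in zip(numeric, fprime(t0)))
```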
Let \({\bf a}=(a_{1},a_{2},\cdots ,a_{n})\) and \({\bf b}=(b_{1},b_{2},\cdots ,b_{n})\) be two vectors in \(\mathbb{R}^{n}\). The inner product of \({\bf a}\) and \({\bf b}\) is denoted and defined by
\[{\bf a}\bullet{\bf b}=a_{1}b_{1}+a_{2}b_{2}+\cdots +a_{n}b_{n}.\]
Given any \({\bf x}\in\mathbb{R}^{n}\), the norm of \({\bf x}\) is defined by
\[\parallel{\bf x}\parallel=\sqrt{x_{1}^{2}+\cdots+x_{n}^{2}}.\]
Theorem. Let \({\bf f}\) and \({\bf g}\) be two vector-valued functions defined on the same closed interval \([a,b]\). Then, we have the following results.
(i) We have \[\int_{a}^{b} [{\bf f}(t)+{\bf g}(t)]dt=\int_{a}^{b}{\bf f}(t)dt+\int_{a}^{b} {\bf g}(t)dt.\]
(ii) For every scalar \(\alpha\), we have \[\int_{a}^{b} [\alpha {\bf f}(t)]dt=\alpha\int_{a}^{b} {\bf f}(t)dt.\]
(iii) For every constant vector \({\bf c}\), we have \[\int_{a}^{b} [{\bf c}\bullet {\bf f}(t)]dt={\bf c}\bullet\left (\int_{a}^{b} {\bf f}(t)dt\right ).\]
(iv) We have \[\left |\!\left |\int_{a}^{b} {\bf f}(t)dt\right |\!\right |\leq\int_{a}^{b}\parallel {\bf f}(t)\parallel dt.\]
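Inequality (iv) can be illustrated numerically. The sketch below (the sample integrand \({\bf f}(t)=(\cos t,\sin t,t)\) on \([0,2]\) is an arbitrary choice of mine) compares the norm of the componentwise integral against the integral of the norm:

```python
import math

def f(t):
    # sample integrand f(t) = (cos t, sin t, t) on [0, 2]
    return (math.cos(t), math.sin(t), t)

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def midpoint(func, a, b, n=20_000):
    # midpoint rule for a scalar- or tuple-valued integrand
    h = (b - a) / n
    vals = [func(a + (i + 0.5) * h) for i in range(n)]
    if isinstance(vals[0], tuple):
        return tuple(sum(v[k] for v in vals) * h for k in range(3))
    return sum(vals) * h

lhs = norm(midpoint(f, 0.0, 2.0))               # || integral of f ||
rhs = midpoint(lambda t: norm(f(t)), 0.0, 2.0)  # integral of ||f||
```

As the theorem predicts, `lhs` does not exceed `rhs`.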
Let \(u:[a,b]\rightarrow\mathbb{R}\) be a real-valued function defined on the closed interval \([a,b]\), and let \({\bf f}:[a,b]\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on the same closed interval. We can define the scalar product \((u{\bf f})(t)=u(t){\bf f}(t)\). On the other hand, let \({\bf f}:[c,d]\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on the closed interval \([c,d]\). If \(u(t)\) is in the domain of \({\bf f}\) for each \(t\in [a,b]\), i.e., \(u(t)\in [c,d]\), then we can form the composition \(({\bf f}\circ u)(t)={\bf f}(u(t))\).
Proposition. We have the following rules of differentiation.
(i) We have \[({\bf f}+{\bf g})'(t)={\bf f}'(t)+{\bf g}'(t).\]
(ii) For any constant \(\alpha\), we have \[(\alpha {\bf f})'(t)=\alpha {\bf f}'(t).\]
(iii) We have \[(u{\bf f})'(t)=u(t){\bf f}'(t)+u'(t){\bf f}(t).\]
(iv) We have \[({\bf f}\bullet {\bf g})'(t)={\bf f}(t)\bullet {\bf g}'(t)+{\bf f}'(t)\bullet {\bf g}(t).\]
(v) The chain rule is given by \[({\bf f}\circ u)'(t)={\bf f}'(u(t))u'(t).\]
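Rule (iv), the product rule for the inner product, can be checked numerically; in the sketch below the two sample functions and the test point are arbitrary choices of mine:

```python
import math

def f(t):
    return (t, t ** 2, math.sin(t))

def g(t):
    return (math.cos(t), 1.0, t)

def fprime(t):
    return (1.0, 2 * t, math.cos(t))

def gprime(t):
    return (-math.sin(t), 0.0, 1.0)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

t0, h = 0.9, 1e-6
# left side: symmetric difference quotient of t -> f(t) . g(t)
numeric = (dot(f(t0 + h), g(t0 + h)) - dot(f(t0 - h), g(t0 - h))) / (2 * h)
# right side: rule (iv), f . g' + f' . g
exact = dot(f(t0), gprime(t0)) + dot(fprime(t0), g(t0))
```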
\begin{equation}{\label{b}}\tag{B}\mbox{}\end{equation}
Real-Valued Functions of Several Variables.
Now, we consider the real-valued function of several variables. Two different concepts of differentiation will be introduced, and their relation will also be established.
Definition. Let \(f:\mathbb{R}^{n}\rightarrow\mathbb{R}\) be a real-valued function defined on \(\mathbb{R}^{n}\). Given any vector \({\bf u}\in\mathbb{R}^{n}\), when the following limit
\[f'({\bf x};{\bf u})=\lim_{h\rightarrow 0}\frac{f({\bf x}+h{\bf u})-f({\bf x})}{h}\]
exists, it is called the directional derivative of \(f\) at \({\bf x}\) in the direction \({\bf u}\). \(\sharp\)
Suppose that we consider the function \(f:\mathbb{R}^{3}\rightarrow\mathbb{R}\). Then, we have the following observations.
- For \({\bf u}={\bf i}=(1,0,0)\), we have \begin{align*} f'({\bf x};{\bf i}) & =\lim_{h\rightarrow 0}\frac{f({\bf x}+h{\bf i})-f({\bf x})}{h}\\ & =\lim_{h\rightarrow 0}\frac{f(x+h,y,z)-f(x,y,z)}{h}= \frac{\partial f}{\partial x}({\bf x}).\end{align*}
- For \({\bf u}={\bf j}=(0,1,0)\), we have \begin{align*} f'({\bf x};{\bf j}) & =\lim_{h\rightarrow 0}\frac{f({\bf x}+h{\bf j})-f({\bf x})}{h}\\ & =\lim_{h\rightarrow 0}\frac{f(x,y+h,z)-f(x,y,z)}{h}= \frac{\partial f}{\partial y}({\bf x}).\end{align*}
- For \({\bf u}={\bf k}=(0,0,1)\), we have \begin{align*} f'({\bf x};{\bf k}) & =\lim_{h\rightarrow 0}\frac{f({\bf x}+h{\bf k})-f({\bf x})}{h}\\ & =\lim_{h\rightarrow 0}\frac{f(x,y,z+h)-f(x,y,z)}{h}= \frac{\partial f}{\partial z}({\bf x}).\end{align*}
In general, for the function \(f:\mathbb{R}^{n}\rightarrow\mathbb{R}\), if \({\bf e}_{i}\) is the \(i\)th standard unit vector, then we have
\[f'({\bf x};{\bf e}_{i})=\frac{\partial f}{\partial x_{i}}({\bf x})\mbox{ for }i=1,\cdots ,n.\]
In other words, the partial derivatives are the special kinds of directional derivatives.
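This identification of partial derivatives with directional derivatives along the standard unit vectors is easy to see numerically; the sample function \(f(x,y,z)=xy+\sin z\) and the test point below are arbitrary choices of mine:

```python
import math

def f(x, y, z):
    # sample function: f(x, y, z) = x*y + sin z
    return x * y + math.sin(z)

def directional(fn, p, u, h=1e-6):
    # symmetric difference quotient for f'(p; u)
    fp = fn(p[0] + h * u[0], p[1] + h * u[1], p[2] + h * u[2])
    fm = fn(p[0] - h * u[0], p[1] - h * u[1], p[2] - h * u[2])
    return (fp - fm) / (2 * h)

p = (1.0, 2.0, 0.5)
d1 = directional(f, p, (1, 0, 0))  # expect df/dx = y = 2
d2 = directional(f, p, (0, 1, 0))  # expect df/dy = x = 1
d3 = directional(f, p, (0, 0, 1))  # expect df/dz = cos z
```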
Definition. Let \(f:S\rightarrow\mathbb{R}\) be a real-valued function defined on an open subset \(S\) of \(\mathbb{R}^{n}\). We say that \(f\) is differentiable at \({\bf x}\in S\) when there exists a vector \({\bf y}\) satisfying
\[\lim_{{\bf u}\rightarrow {\bf 0}}\frac{f({\bf x}+{\bf u})-f({\bf x})-{\bf y}\bullet {\bf u}}{\parallel {\bf u}\parallel}=0.\]
It is not difficult to show that, if such a vector \({\bf y}\) exists, it is unique. We call this unique vector the gradient of \(f\) at \({\bf x}\). Therefore, when \(f\) is differentiable at \({\bf x}\), the gradient of \(f\) at \({\bf x}\) is the unique vector \(\nabla f({\bf x})\) satisfying
\[\lim_{{\bf u}\rightarrow {\bf 0}}\frac{f({\bf x}+{\bf u})-f({\bf x})-\nabla f({\bf x})\bullet {\bf u}}{\parallel {\bf u}\parallel}=0.\]
Theorem. Let \(f:S\rightarrow\mathbb{R}\) be a real-valued function defined on an open subset \(S\) of \(\mathbb{R}^{n}\). Suppose that the first-order partial derivatives of \(f\) are continuous on a neighborhood of \({\bf x}\). Then \(f\) is differentiable at \({\bf x}\) and the gradient is given by
\[\nabla f({\bf x})=\left (\frac{\partial f}{\partial x_{1}}({\bf x}),\cdots ,\frac{\partial f}{\partial x_{n}}({\bf x})\right ).\]
Example. We consider the real-valued function \(f(x,y,z)=\sin (xy^{2}z^{3})\). Then, we have
\begin{align*} \nabla f & =\left (\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}\right )\\ & = \left (y^{2}z^{3}\cos (xy^{2}z^{3}),2xyz^{3}\cos (xy^{2}z^{3}),3xy^{2}z^{2}\cos (xy^{2}z^{3})\right ).\end{align*}
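The gradient computed above can be validated with finite differences; a minimal sketch (the test point is an arbitrary choice of mine):

```python
import math

def f(x, y, z):
    return math.sin(x * y ** 2 * z ** 3)

def grad(x, y, z):
    # the gradient computed above
    c = math.cos(x * y ** 2 * z ** 3)
    return (y ** 2 * z ** 3 * c, 2 * x * y * z ** 3 * c, 3 * x * y ** 2 * z ** 2 * c)

def numeric_grad(fn, p, h=1e-6):
    # central differences in each coordinate direction
    out = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        out.append((fn(*plus) - fn(*minus)) / (2 * h))
    return tuple(out)

p = (0.8, 1.1, 0.9)
err = max(abs(a - e) for a, e in zip(numeric_grad(f, p), grad(*p)))
```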
Proposition. Let \(f:S\rightarrow\mathbb{R}\) be a real-valued function defined on an open subset \(S\) of \(\mathbb{R}^{n}\). Suppose that \(f\) is differentiable at \({\bf x}\in S\). Then \(f\) is continuous at \({\bf x}\).
Proposition. Let \(f,g:S\rightarrow\mathbb{R}\) be two real-valued functions defined on the same open subset \(S\) of \(\mathbb{R}^{n}\) such that the gradients \(\nabla f({\bf x})\) and \(\nabla g({\bf x})\) exist. Then, we have the following rules
\begin{align*} \nabla [f({\bf x})+g({\bf x})] & =\nabla f({\bf x})+\nabla g({\bf x})\\ \nabla [\alpha f({\bf x})] & =\alpha \nabla f({\bf x})\\ \nabla [f({\bf x})g({\bf x})] & =f({\bf x})\nabla g({\bf x})+g({\bf x})\nabla f({\bf x}). \end{align*}
\begin{equation}{\label{ma15}}\tag{3}\mbox{}\end{equation}
Theorem \ref{ma15}. Let \(f:\mathbb{R}^{n}\rightarrow\mathbb{R}\) be a real-valued function defined on \(\mathbb{R}^{n}\). Suppose that \(f\) is differentiable at \({\bf x}\). Then \(f\) has a directional derivative at \({\bf x}\) in every direction \({\bf u}\) with \(\parallel{\bf u}\parallel\neq 0\) and
\[f'({\bf x};{\bf u})=\nabla f({\bf x})\bullet {\bf u}.\]
Proof. The differentiability at \({\bf x}\) says
\[\lim_{h\rightarrow 0}\frac{f({\bf x}+h{\bf u})-f({\bf x})-\nabla f({\bf x})\bullet h{\bf u}}{\parallel h{\bf u}\parallel}=0,\]
which implies
\[\lim_{h\rightarrow 0}\left [\frac{f({\bf x}+h{\bf u})-f({\bf x})}{\parallel h{\bf u}\parallel}-\frac{\nabla f({\bf x})\bullet h{\bf u}}{\parallel h{\bf u}\parallel}\right ]=0.\]
We also have
\[0=\lim_{h\rightarrow 0-}\left [\frac{f({\bf x}+h{\bf u})-f({\bf x})}{-h\parallel {\bf u}\parallel} -\frac{\nabla f({\bf x})\bullet h{\bf u}}{-h\parallel {\bf u}\parallel}\right ],\]
which implies
\[0=\lim_{h\rightarrow 0-}\left [\frac{f({\bf x}+h{\bf u})-f({\bf x})}{h\parallel {\bf u}\parallel} -\frac{\nabla f({\bf x})\bullet h{\bf u}}{h\parallel {\bf u}\parallel}\right ].\]
Similarly, we have
\[0=\lim_{h\rightarrow 0+}\left [\frac{f({\bf x}+h{\bf u})-f({\bf x})}{h\parallel {\bf u}\parallel} -\frac{\nabla f({\bf x})\bullet h{\bf u}}{h\parallel {\bf u}\parallel}\right ].\]
Since \(\parallel{\bf u}\parallel\neq 0\), we obtain
\begin{align*} 0 & =\lim_{h\rightarrow 0}\left [\frac{f({\bf x}+h{\bf u})-f({\bf x})}{h\parallel {\bf u}\parallel} -\frac{\nabla f({\bf x})\bullet h{\bf u}}{h\parallel {\bf u}\parallel}\right ]\\ & =\lim_{h\rightarrow 0}\left [\frac{f({\bf x}+h{\bf u})-f({\bf x})}{h}-\nabla f({\bf x})\bullet {\bf u}\right ], \end{align*}
which implies
\[f'({\bf x};{\bf u})=\lim_{h\rightarrow 0}\frac{f({\bf x}+h{\bf u})-f({\bf x})}{h}=\nabla f({\bf x})\bullet {\bf u}.\]
This completes the proof. \(\blacksquare\)
Example. Find the directional derivative of function \(f(x,y,z)=x\cos y\sin z\) at the point \((1,\pi,\frac{1}{4}\pi )\) in the direction of vector \({\bf u}=(2,-1,4)\). Now, we have
\begin{align*} \frac{\partial f}{\partial x}(x,y,z) & =\cos y\sin z\\
\frac{\partial f}{\partial y}(x,y,z) & =-x\sin y\sin z\\
\frac{\partial f}{\partial z}(x,y,z) & =x\cos y\cos z.\end{align*}
Then, we have
\begin{align*} \frac{\partial f}{\partial x}(1,\pi,\pi /4) & =-\sqrt{2}/2\\
\frac{\partial f}{\partial y}(1,\pi,\pi /4) & =0\\
\frac{\partial f}{\partial z}(1,\pi,\pi /4) & =-\sqrt{2}/2.\end{align*}
Finally, we obtain \[\nabla f(1,\pi,\pi /4)=(-\sqrt{2}/2,0,-\sqrt{2}/2)\]
and
\begin{align*} f'((1,\pi ,\pi /4);{\bf u}) & =\nabla f(1,\pi ,\pi /4)\bullet {\bf u}\\ & =(-\sqrt{2}/2,0,-\sqrt{2}/2)\bullet (2,-1,4)=-3\sqrt{2}.\end{align*}
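The value \(-3\sqrt{2}\) can be reproduced directly from the limit definition of the directional derivative, without computing the gradient at all:

```python
import math

def f(x, y, z):
    return x * math.cos(y) * math.sin(z)

p = (1.0, math.pi, math.pi / 4)
u = (2.0, -1.0, 4.0)
h = 1e-6
fp = f(p[0] + h * u[0], p[1] + h * u[1], p[2] + h * u[2])
fm = f(p[0] - h * u[0], p[1] - h * u[1], p[2] - h * u[2])
deriv = (fp - fm) / (2 * h)  # should approximate -3*sqrt(2)
```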
Theorem. (The Mean-Value Theorem for Several Variables). Let \(S\) be an open subset of \(\mathbb{R}^{n}\), and let \(f:S\rightarrow\mathbb{R}\) be a real-valued function defined on \(S\). Given any points \({\bf a}\) and \({\bf b}\) in \(S\), suppose that \(f\) is differentiable at each point of the line segment joining \({\bf a}\) and \({\bf b}\). Then, there exists a point \({\bf c}\) in this line segment satisfying \[f({\bf b})-f({\bf a})=\nabla f({\bf c})\bullet ({\bf b}-{\bf a}).\]
Proof. For the proof, refer to Remark~\ref{ma132}, which will be given in the subsequent discussion.
Proposition. We have the following properties. (i) Let \(U\) be an open connected set in \(\mathbb{R}^{n}\), and let \(f\) be a differentiable function on \(U\). Suppose that \(\nabla f({\bf x})={\bf 0}\) for all \({\bf x}\in U\). Then \(f\) is constant on \(U\). (ii) Let \(U\) be an open connected set in \(\mathbb{R}^{n}\), and let \(f\) and \(g\) be differentiable functions on \(U\). Suppose that \(\nabla f({\bf x})=\nabla g({\bf x})\) for all \({\bf x}\in U\). Then \(f\) and \(g\) differ by a constant on \(U\).
Proof. For the proof, refer to the proof of Proposition \ref{ma133}, which will be given in the subsequent discussion.
\begin{equation}{\label{c}}\tag{C}\mbox{}\end{equation}
Vector-Valued Functions of Several Variables.
Now, we are going to discuss the derivatives of vector-valued functions from \(\mathbb{R}^{n}\) to \(\mathbb{R}^{m}\). Let \(S\) be a subset of \(\mathbb{R}^{n}\), and let \({\bf f}:S\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on \(S\). We want to investigate how \({\bf f}\) changes when we move from a point \({\bf c}\in S\) along a line segment to a nearby point \({\bf c}+{\bf u}\) with \({\bf u}\neq {\bf 0}\). Each point on the segment can be expressed as \({\bf c}+h{\bf u}\) for \(h\in\mathbb{R}\). In this case, the vector \({\bf u}\) describes the direction of the line segment. Assume that \({\bf c}\) is an interior point of \(S\). Then, there exists an \(n\)-dimensional open ball \(B({\bf c};r)\subset S\) such that, for sufficiently small \(h\), the line segment joining \({\bf c}\) to \({\bf c}+h{\bf u}\) lies in \(B({\bf c};r)\subset S\).
Definition. Let \(S\) be a subset of \(\mathbb{R}^{n}\), and let \({\bf f}:S\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on \(S\). The directional derivative of \({\bf f}\) at \({\bf c}\) in the direction \({\bf u}\) is denoted and defined by
\begin{equation}{\label{maeq245}}\tag{4} {\bf f}'({\bf c};{\bf u})=\lim_{h\rightarrow 0}\frac{{\bf f}({\bf c}+h{\bf u})-{\bf f}({\bf c})}{h} \end{equation}
whenever the limit exists. For the \(k\)th unit vector \({\bf u}={\bf e}_{k}\), \({\bf f}'({\bf c};{\bf e}_{k})\) is called a partial derivative and is denoted by \(D_{k}{\bf f}({\bf c})\). \(\sharp\)
Example. The vector-valued function \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}\) is called linear when \[{\bf f}(a{\bf x}+b{\bf y})=a{\bf f}({\bf x})+b{\bf f}({\bf y})\] for every \({\bf x},{\bf y}\in\mathbb{R}^{n}\) and every \(a,b\in\mathbb{R}\). In this case, the limit in (\ref{maeq245}) is simplified to \({\bf f}({\bf u})\), i.e., \({\bf f}'({\bf c};{\bf u})={\bf f}({\bf u})\) for every \({\bf c}\) and \({\bf u}\). \(\sharp\)
Suppose that the directional derivative \({\bf f}'({\bf c};{\bf u})\) exists in every direction \({\bf u}\). Then, all the partial derivatives \(D_{1}{\bf f}({\bf c}),\cdots ,D_{n}{\bf f}({\bf c})\) exist. However, the converse is not true. A counterexample is given below.
Example. We consider the real-valued function \(f:\mathbb{R}^{2}\rightarrow\mathbb{R}\) by
\[f(x,y)=\left\{\begin{array}{ll} x+y & \mbox{if \(x=0\) or \(y=0\)}\\ 1 & \mbox{otherwise}. \end{array}\right .\] Then, we have \[D_{1}f(0,0)=1=D_{2}f(0,0).\]
Now, we consider any other direction \({\bf u}=(u_{1},u_{2})\) with \(u_{1}\neq 0\) and \(u_{2}\neq 0\). Then, the limit
\[f'({\bf 0};{\bf u})=\lim_{h\rightarrow 0}\frac{f({\bf 0}+h{\bf u})-f({\bf 0})}{h}=\lim_{h\rightarrow 0}\frac{1}{h}\]
does not exist. \(\sharp\)
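The blow-up of the difference quotients in this counterexample can be seen directly; a small sketch with one such direction:

```python
def f(x, y):
    # x + y on the coordinate axes, 1 everywhere else
    if x == 0 or y == 0:
        return x + y
    return 1.0

u = (1.0, 2.0)  # a direction with both components nonzero
quotients = [(f(h * u[0], h * u[1]) - f(0.0, 0.0)) / h
             for h in (1e-2, 1e-4, 1e-6)]
# each quotient equals 1/h, so the difference quotients blow up as h -> 0
```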
We have the following interesting observations.
- The limit in (\ref{maeq245}) is meaningful for \({\bf u}={\bf 0}\). In this case, we see that \({\bf f}'({\bf c};{\bf 0})\) exists and equals \({\bf 0}\) for every \({\bf c}\in S\).
- When \(f\) is a real-valued function, we have \[D_{k}f({\bf c})=\frac{\partial f}{\partial x_{k}}({\bf c}).\]
- Using Lemma \ref{map454}, for the vector-valued function \({\bf f}=(f_{1},\cdots ,f_{m})\), the directional derivative \({\bf f}'({\bf c};{\bf u})\) exists if and only if \(f_{k}^{\prime}({\bf c};{\bf u})\) exists for each \(k=1,\cdots ,m\) and we have \begin{equation}{\label{ma16}}\tag{5} {\bf f}'({\bf c};{\bf u})=(f_{1}^{\prime}({\bf c};{\bf u}),\cdots ,f_{m}^{\prime}({\bf c};{\bf u})). \end{equation} In particular, when \({\bf u}={\bf e}_{k}\), we have \[D_{k}{\bf f}({\bf c})=\left (\frac{\partial f_{1}}{\partial x_{k}}({\bf c}),\cdots ,\frac{\partial f_{m}}{\partial x_{k}}({\bf c})\right ).\]
Example. Let \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}\) be a vector-valued function. We define a new function \(F(t)={\bf f}({\bf c}+t{\bf u})\). Then, we want to show
\begin{equation}{\label{maeq464}}\tag{6} F'(t)={\bf f}'({\bf c}+t{\bf u};{\bf u}). \end{equation}
Now, we have
\begin{align*} F'(t) & =\lim_{h\rightarrow 0}\frac{F(t+h)-F(t)}{h}\\ & =\lim_{h\rightarrow 0}\frac{{\bf f}({\bf c}+(t+h){\bf u})-{\bf f}({\bf c}+t{\bf u})}{h}\\ & =\lim_{h\rightarrow 0}\frac{{\bf f}(({\bf c}+t{\bf u})+h{\bf u})-{\bf f}({\bf c}+t{\bf u})}{h}\\ & ={\bf f}'({\bf c}+t{\bf u};{\bf u}). \end{align*}
Example. We consider the real-valued function given by
\[f({\bf x})=\parallel {\bf x}\parallel^{2}=x_{1}^{2}+\cdots +x_{n}^{2}={\bf x}\bullet {\bf x}.\]
Then, we have
\begin{align*} F(t) & =f({\bf c}+t{\bf u})=({\bf c}+t{\bf u})\bullet({\bf c}+t{\bf u})\\ & =\parallel {\bf c} \parallel^{2}+2t({\bf c}\bullet {\bf u})+t^{2}\parallel {\bf u}\parallel^{2}.\end{align*}
Therefore, we obtain
\[F'(t)=2({\bf c}\bullet{\bf u})+2t\parallel {\bf u}\parallel^{2},\]
which also says that \(f'({\bf c};{\bf u})=F'(0)=2({\bf c}\bullet{\bf u})\) by referring to (\ref{maeq464}). \(\sharp\)
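The identity \(f'({\bf c};{\bf u})=2({\bf c}\bullet{\bf u})\) can be confirmed numerically; the vectors \({\bf c}\) and \({\bf u}\) below are arbitrary choices of mine:

```python
def f(x):
    # f(x) = ||x||^2 = x . x
    return sum(t * t for t in x)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

c = (1.0, -2.0, 0.5)
u = (0.3, 1.0, 2.0)
h = 1e-6
fp = f(tuple(ci + h * ui for ci, ui in zip(c, u)))
fm = f(tuple(ci - h * ui for ci, ui in zip(c, u)))
numeric = (fp - fm) / (2 * h)  # approximates f'(c; u)
exact = 2 * dot(c, u)
```

Since \(f\) is quadratic, the symmetric difference quotient agrees with the exact value up to rounding error.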
In what follows, we shall introduce the concept of the total derivative. In the one-dimensional case, a function \(f\) with a derivative at \(c\) can be approximated by a linear polynomial. More precisely, suppose that \(f'(c)\) exists. Then, we can define
\begin{equation}{\label{maeq246}}\tag{7} E_{c}(h)=\left\{\begin{array}{ll} {\displaystyle \frac{f(c+h)-f(c)}{h}-f'(c)} & \mbox{if \(h\neq 0\)}\\ 0 & \mbox{if \(h=0\)}. \end{array}\right . \end{equation}
In other words, we have
\begin{equation}{\label{maeq247}}\tag{8} f(c+h)=f(c)+f'(c)h+hE_{c}(h), \end{equation}
which is also called the first-order Taylor formula for approximating \(f(c+h)-f(c)\) by \(f'(c)h\) with error \(hE_{c}(h)\). From (\ref{maeq247}), we have the following properties.
- The quantity \(f'(c)h\) is a linear function of \(h\). That is, if we write \(T_{c}(h)=f'(c)h\), then we have \[T_{c}(ah_{1}+bh_{2})=aT_{c}(h_{1})+bT_{c}(h_{2}).\]
- We have the error \(E_{c}(h)\rightarrow 0\) as \(h\rightarrow 0\) by (\ref{maeq246}).
The total derivative of a vector-valued function \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}\) will be defined in such a way that it preserves these two properties.
Definition. We consider the vector-valued function \({\bf f}:S\rightarrow\mathbb{R}^{m}\) defined on a subset \(S\) of \(\mathbb{R}^{n}\). Let \({\bf c}\) be an interior point of \(S\). The function \({\bf f}\) is said to be differentiable at \({\bf c}\) when there exists a linear function \(T_{\bf c}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}\) satisfying
\begin{equation}{\label{maeq248}}\tag{9} {\bf f}({\bf c}+{\bf v})={\bf f}({\bf c})+T_{\bf c}({\bf v})+\parallel {\bf v}\parallel E_{\bf c}({\bf v}), \end{equation}
where \(E_{\bf c}({\bf v})\rightarrow {\bf 0}\) as \({\bf v}\rightarrow {\bf 0}\). The linear function \(T_{\bf c}\) is called the total derivative of \({\bf f}\) at \({\bf c}\). \(\sharp\)
Let \({\bf f}\) be differentiable at \({\bf c}\) with total derivative \(T_{\bf c}\). For convenience, the total derivative is now written as \({\bf f}'({\bf c})=T_{\bf c}\), which resembles the notation used in the one-dimensional case. However, we need to note that \({\bf f}'({\bf c})\) is a linear function, not a number. It is defined everywhere on \(\mathbb{R}^{n}\) with \({\bf f}'({\bf c})({\bf u})=T_{\bf c}({\bf u})\). In this case, the first-order Taylor formula is written as
\[{\bf f}({\bf c}+{\bf u})={\bf f}({\bf c})+{\bf f}'({\bf c})({\bf u})+\parallel {\bf u}\parallel E_{\bf c}({\bf u}).\]
The following theorem says that, when the total derivative exists, it is unique and relates to the directional derivatives.
\begin{equation}{\label{mat249}}\tag{10}\mbox{}\end{equation}
Proposition \ref{mat249}. Suppose that the vector-valued function \({\bf f}\) is differentiable at \({\bf c}\) with total derivative \(T_{\bf c}\). Then, the directional derivative \({\bf f}'({\bf c};{\bf u})\) exists for every direction \({\bf u}\in\mathbb{R}^{n}\) and we have
\[{\bf f}'({\bf c})({\bf u})=T_{\bf c}({\bf u})={\bf f}'({\bf c};{\bf u}).\]
Proof. It is clear that \({\bf f}'({\bf c};{\bf 0})={\bf 0}\). By referring to (\ref{maeq248}), if \({\bf v}={\bf 0}\), then \(T_{\bf c}({\bf 0})={\bf 0}\). Therefore, we obtain
\[{\bf f}'({\bf c};{\bf 0})={\bf 0}=T_{\bf c}({\bf 0}).\]
Now, we assume \({\bf v}\neq {\bf 0}\) and take \({\bf v}=h{\bf u}\) with \({\bf u}\neq {\bf 0}\). Then, we have
\begin{align*} {\bf f}({\bf c}+h{\bf u})-{\bf f}({\bf c}) & =T_{\bf c}(h{\bf u})+\parallel h{\bf u}\parallel E_{\bf c}(h{\bf u})\\ & =hT_{\bf c}({\bf u})+ |h|\parallel {\bf u}\parallel E_{\bf c}(h{\bf u}).\end{align*}
By dividing \(h\) on both sides, we obtain
\begin{equation}{\label{maeq460}}\tag{11} \frac{{\bf f}({\bf c}+h{\bf u})-{\bf f}({\bf c})}{h}=T_{\bf c}({\bf u})+\frac{|h|}{h}\cdot\parallel {\bf u}\parallel E_{\bf c}(h{\bf u}). \end{equation}
Since \({\bf v}=h{\bf u}\), if \(h\rightarrow 0\), then \({\bf v}\rightarrow {\bf 0}\), i.e., \(E_{\bf c}(h{\bf u})\rightarrow {\bf 0}\). Therefore, from (\ref{maeq460}), we obtain
\begin{align*} {\bf f}'({\bf c};{\bf u}) & =\lim_{h\rightarrow 0}\frac{{\bf f}({\bf c}+h{\bf u})-{\bf f}({\bf c})}{h}\\ & =\lim_{h\rightarrow 0}\left [T_{\bf c}({\bf u})+\frac{|h|}{h}\cdot\parallel {\bf u}\parallel E_{\bf c}(h{\bf u})\right ]\\ & =T_{\bf c}({\bf u})+\parallel {\bf u}\parallel\left (\lim_{h\rightarrow 0}\frac{|h|}{h}E_{\bf c}(h{\bf u})\right )=T_{\bf c}({\bf u}), \end{align*}
where
\begin{align*} \lim_{h\rightarrow 0+}\frac{h}{h}E_{\bf c}(h{\bf u}) & =\lim_{h\rightarrow 0+}E_{\bf c}(h{\bf u})\\ & =\lim_{{\bf v}\rightarrow {\bf 0}}E_{\bf c}({\bf v})={\bf 0}\end{align*}
and
\begin{align*} \lim_{h\rightarrow 0-}\frac{-h}{h}E_{\bf c}(h{\bf u}) & =\lim_{h\rightarrow 0-}-E_{\bf c}(h{\bf u})\\ & =-\lim_{{\bf v}\rightarrow {\bf 0}}E_{\bf c}({\bf v})={\bf 0}.\end{align*}
This completes the proof. \(\blacksquare\)
Let \(T_{\bf c}\) be the total derivative of \({\bf f}\) at \({\bf c}\). By Proposition \ref{mat249}, we have
\[T_{\bf c}({\bf e}_{k})={\bf f}'({\bf c};{\bf e}_{k})=D_{k}{\bf f}({\bf c}).\]
\begin{equation}{\label{map462}}\tag{12}\mbox{}\end{equation}
Proposition \ref{map462}. Suppose that the vector-valued function \({\bf f}\) is differentiable at \({\bf c}\). Then \({\bf f}\) is continuous at \({\bf c}\).
Proof. Let \({\bf e}_{1},\cdots ,{\bf e}_{n}\) be the unit coordinate vectors in \(\mathbb{R}^{n}\). Given \({\bf v}=(v_{1},\cdots ,v_{n})\), we have
\[{\bf v}=v_{1}{\bf e}_{1}+\cdots +v_{n}{\bf e}_{n}.\]
Using the linearity of \(T_{\bf c}\), we obtain
\[T_{\bf c}({\bf v})=v_{1}T_{\bf c}({\bf e}_{1})+\cdots +v_{n}T_{\bf c}({\bf e}_{n}).\]
Then, we have
\[\lim_{{\bf v}\rightarrow {\bf 0}}T_{\bf c}({\bf v})={\bf 0}\mbox{ and } \lim_{{\bf v}\rightarrow {\bf 0}}\parallel {\bf v}\parallel E_{\bf c}({\bf v})={\bf 0}.\]
From (\ref{maeq248}), we have
\[\lim_{{\bf v}\rightarrow {\bf 0}}{\bf f}({\bf c}+{\bf v})={\bf f}({\bf c}).\]
This completes the proof. \(\blacksquare\)
Example. Suppose that the vector-valued function \({\bf f}\) is linear. Then, we have \({\bf f}({\bf c}+{\bf u})={\bf f}({\bf c})+{\bf f}({\bf u})\). Therefore, the total derivative \(T_{\bf c}\) exists for every \({\bf c}\) and equals \({\bf f}\). \(\sharp\)
Proposition. Suppose that the vector-valued function \({\bf f}:S\rightarrow\mathbb{R}^{m}\) defined on a subset \(S\) of \(\mathbb{R}^{n}\) is differentiable at an interior point \({\bf c}\in S\). Given
\[{\bf v}=v_{1}{\bf e}_{1}+\cdots +v_{n}{\bf e}_{n},\]
where \({\bf e}_{1},\cdots ,{\bf e}_{n}\) are the unit coordinate vectors in \(\mathbb{R}^{n}\), we have
\begin{equation}{\label{maeq467}}\tag{13} T_{\bf c}({\bf v})=\sum_{k=1}^{n}v_{k}D_{k}{\bf f}({\bf c}). \end{equation}
In particular, when \(f\) is a real-valued function, i.e., \(m=1\), we have
\begin{equation}{\label{ma21}}\tag{14} T_{\bf c}({\bf v})=\nabla f({\bf c})\bullet {\bf v}, \end{equation}
where \(\nabla f({\bf c})\) is the gradient of \(f\) at \({\bf c}\) given by
\[\nabla f({\bf c})=\left (\frac{\partial f}{\partial x_{1}}({\bf c}),\cdots ,\frac{\partial f}{\partial x_{n}}({\bf c})\right ).\]
Proof. Using the linearity of \(T_{\bf c}\) and Proposition~\ref{mat249}, we have
\begin{align*} T_{\bf c}({\bf v}) & =\sum_{k=1}^{n}T_{\bf c}(v_{k}{\bf e}_{k}) \\ & =\sum_{k=1}^{n}v_{k}T_{\bf c}({\bf e}_{k})\\ & =\sum_{k=1}^{n}v_{k}{\bf f}'({\bf c};{\bf e}_{k}) \\ & =\sum_{k=1}^{n}v_{k}D_{k}{\bf f}({\bf c}).\end{align*}
When \(f\) is a real-valued function, we have
\[f'({\bf c};{\bf e}_{k})=\frac{\partial f}{\partial x_{k}}({\bf c})\mbox{ for }k=1,\cdots ,n.\]
This completes the proof. \(\blacksquare\)
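For a real-valued \(f\), the proposition says the total derivative acts as \(T_{\bf c}({\bf v})=\nabla f({\bf c})\bullet {\bf v}\), and the defining property of (\ref{maeq248}) is that the error term \(E_{\bf c}({\bf v})\) vanishes as \({\bf v}\rightarrow {\bf 0}\). The sketch below checks this decay numerically for a sample function of my choosing, shrinking \({\bf v}\) along a fixed unit direction:

```python
import math

def f(x, y):
    return math.exp(x) * math.sin(y)

def grad(x, y):
    return (math.exp(x) * math.sin(y), math.exp(x) * math.cos(y))

c = (0.3, 1.2)
g = grad(*c)
direction = (0.6, 0.8)  # unit vector, so ||v|| = lam below

ratios = []
for lam in (1e-1, 1e-2, 1e-3):
    v = (lam * direction[0], lam * direction[1])
    linear = g[0] * v[0] + g[1] * v[1]               # T_c(v) = grad f(c) . v
    remainder = f(c[0] + v[0], c[1] + v[1]) - f(*c) - linear
    ratios.append(abs(remainder) / lam)              # |E_c(v)|, since ||v|| = lam
```

The ratios shrink roughly linearly in \(\lambda\), consistent with a first-order Taylor remainder.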
Let \(f:\mathbb{R}\rightarrow\mathbb{R}\) be a differentiable real-valued function defined on \(\mathbb{R}\). We recall that the mean-value theorem says \(f(y)-f(x)=f'(z)(y-x)\) for some \(z\) between \(x\) and \(y\). Now, we are going to present the mean-value theorem for vector-valued functions. Given any \({\bf x},{\bf y}\in\mathbb{R}^{n}\), the line segment joining \({\bf x}\) and \({\bf y}\) is denoted and given by
\[L({\bf x},{\bf y})=\left\{t{\bf x}+(1-t){\bf y}:0\leq t\leq 1\right\}.\]
For convenience, we also write \({\bf a}\bullet{\bf b}=\langle{\bf a},{\bf b}\rangle\).
\begin{equation}{\label{mat369}}\tag{15}\mbox{}\end{equation}
Theorem \ref{mat369}. (Mean-Value Theorem for Vector-Valued Function). Let \(S\) be an open subset of \(\mathbb{R}^{n}\), and let \({\bf x},{\bf y}\in S\) satisfy \(L({\bf x},{\bf y})\subset S\). Assume that the vector-valued function \({\bf f}:S\rightarrow\mathbb{R}^{m}\) is differentiable on \(S\). Then, for every \({\bf a}\in\mathbb{R}^{m}\), there exists a point \({\bf z}\in L({\bf x},{\bf y})\) satisfying
\begin{equation}{\label{maeq252}}\tag{16} \langle {\bf a},{\bf f}({\bf y})-{\bf f}({\bf x})\rangle =\langle {\bf a},{\bf f}'({\bf z})({\bf y}-{\bf x})\rangle . \end{equation}
Proof. Let \({\bf u}={\bf y}-{\bf x}\). Since \(S\) is open and \(L({\bf x},{\bf y})\subset S\), there exists \(\delta >0\) satisfying
\[L({\bf x}-\delta{\bf u},{\bf y}+\delta{\bf u})=L((1+\delta ){\bf x}-\delta {\bf y},(1+\delta ){\bf y}-\delta{\bf x})\subset S.\]
In other words, we can prolong the line \(L({\bf x},{\bf y})\) a little bit such that the endpoints are
\[(1+\delta ){\bf x}-\delta {\bf y}={\bf x}-\delta{\bf u}\mbox{ and }(1+\delta ){\bf y}-\delta{\bf x}={\bf x}+(1+\delta){\bf u},\]
which also says that there exists \(\delta >0\) satisfying \({\bf x}+t{\bf u}\in S\) for all \(t\in (-\delta ,1+\delta )\). Suppose that \({\bf a}\in\mathbb{R}^{m}\) is fixed. We define a real-valued function \(g\) on \((-\delta ,1+\delta )\) by
\[g(t)=\langle {\bf a},{\bf f}({\bf x}+t{\bf u})\rangle .\]
Then, using (\ref{maeq464}) and Proposition \ref{mat249} (in order), \(g\) is differentiable on \((-\delta ,1+\delta )\) and its derivative is given by
\begin{align} g'(t) & =\langle {\bf a},{\bf f}'({\bf x}+t{\bf u};{\bf u})\rangle\label{maeq368}\tag{17}\\ & =\langle {\bf a},{\bf f}'({\bf x}+t{\bf u})({\bf u})\rangle .\nonumber \end{align}
By the one-dimensional mean-value theorem, we have
\[g(1)-g(0)=g'(t)\mbox{ for some }t\in (0,1).\]
From (\ref{maeq368}), we also have
\[g'(t)=\langle {\bf a},{\bf f}'({\bf x}+t{\bf u})({\bf u})\rangle=\langle {\bf a},{\bf f}'({\bf z})({\bf y}-{\bf x})\rangle ,\]
where \({\bf z}={\bf x}+t{\bf u}\in L({\bf x},{\bf y})\). Since
\[g(1)-g(0)=\langle {\bf a},{\bf f}({\bf y})-{\bf f}({\bf x})\rangle ,\]
we obtain the desired equality (\ref{maeq252}). We also see that the point \({\bf z}\) depends on \(g\), which also depends on \({\bf a}\). This completes the proof. \(\blacksquare\)
\begin{equation}{\label{ma132}}\tag{18}\mbox{}\end{equation}
Remark \ref{ma132}. We have the following observations.
- Let \(f\) be a real-valued function, i.e., \(m=1\). Then, we can take \(a=1\) in (\ref{maeq252}). In this case, we obtain \[f({\bf y})-f({\bf x})=f'({\bf z})({\bf y}-{\bf x})=\langle\nabla f({\bf z}),{\bf y}-{\bf x}\rangle .\]
- Let \(S\) be convex, and let the vector-valued function \({\bf f}:S\rightarrow\mathbb{R}^{m}\) be defined on \(S\) such that all the partial derivatives \(\partial f_{k}/\partial x_{j}\) are bounded on \(S\). Then, there exists a constant \(A>0\) satisfying \[\parallel {\bf f}({\bf y})-{\bf f}({\bf x})\parallel\leq A\parallel {\bf y}-{\bf x}\parallel .\] This says that the vector-valued function \({\bf f}\) satisfies the Lipschitz condition on \(S\).
\begin{equation}{\label{ma137}}\tag{19}\mbox{}\end{equation}
Lemma \ref{ma137}. Every open connected set \(S\) in \(\mathbb{R}^{n}\) is polygonally connected. In other words, every pair of points in \(S\) can be joined by a polygonal arc lying in \(S\).
For a proof of the above Lemma \ref{ma137}, refer to the page Point set topology in Metric Space.
\begin{equation}{\label{ma133}}\tag{20}\mbox{}\end{equation}
Proposition \ref{ma133}. Let \(S\) be an open connected subset of \(\mathbb{R}^{n}\), and let the vector-valued function \({\bf f}:S\rightarrow\mathbb{R}^{m}\) be differentiable on \(S\). Suppose that \({\bf f}'({\bf c})={\bf 0}\) for each \({\bf c}\in S\). Then \({\bf f}\) is constant on \(S\).
Proof. Lemma \ref{ma137} says that \(S\) is polygonally connected. It means that each pair of points \({\bf x}\) and \({\bf y}\) in \(S\) can be joined by a polygonal arc lying in \(S\). We denote the vertices of this arc by \({\bf p}_{1},\cdots ,{\bf p}_{r}\) with \({\bf p}_{1}={\bf x}\) and \({\bf p}_{r}={\bf y}\). Since each segment \(L({\bf p}_{i+1},{\bf p}_{i})\subset S\), the mean-value Theorem \ref{mat369} says
\[\langle {\bf a},{\bf f}({\bf p}_{i+1})-{\bf f}({\bf p}_{i})\rangle=\langle {\bf a},{\bf f}'({\bf c})({\bf p}_{i+1}-{\bf p}_{i})\rangle=0\]
for every vector \({\bf a}\). By adding these equations for \(i=1,\cdots ,r-1\), we obtain
\[\langle {\bf a},{\bf f}({\bf y})-{\bf f}({\bf x})\rangle =0\mbox{ for every vector }{\bf a}.\]
By taking \({\bf a}={\bf f}({\bf y})-{\bf f}({\bf x})\), we also obtain
\[0=\langle {\bf f}({\bf y})-{\bf f}({\bf x}),{\bf f}({\bf y})-{\bf f}({\bf x})\rangle=\parallel {\bf f}({\bf y})-{\bf f}({\bf x})\parallel^{2},\]
which implies \({\bf f}({\bf y})={\bf f}({\bf x})\). This shows that \({\bf f}\) is constant on \(S\), and the proof is complete. \(\blacksquare\)
\begin{equation}{\label{mat376}}\tag{21}\mbox{}\end{equation}
Theorem \ref{mat376}. Let \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on \(\mathbb{R}^{n}\). Suppose that one of the partial derivatives \(D_{1}{\bf f},\cdots ,D_{n}{\bf f}\) exists at \({\bf c}\), and that the remaining \(n-1\) partial derivatives exist on an \(n\)-dimensional open ball \(B({\bf c};r)\) for some \(r>0\) and are continuous at \({\bf c}\). Then \({\bf f}\) is differentiable at \({\bf c}\).
Proof. We first note that a vector-valued function \({\bf f}=(f_{1},\cdots ,f_{m})\) is differentiable at \({\bf c}\) if and only if each component \(f_{k}\) is differentiable at \({\bf c}\). (The proof is left as an exercise). Therefore, it suffices to consider a real-valued function \(f\). Without loss of generality, we assume that \(D_{1}f({\bf c})\) exists, and that \(D_{2}f,\cdots ,D_{n}f\) exist on an \(n\)-dimensional open ball \(B({\bf c};r)\) for some \(r>0\) and are continuous at \({\bf c}\). Now, we write \({\bf v}=\lambda {\bf y}\) with \(\parallel {\bf y}\parallel =1\) and \(\lambda >0\). Then, we have \(\lambda =\parallel {\bf v}\parallel\). We keep \(\lambda\) small enough that \({\bf c}+{\bf v}\) lies in the open ball \(B({\bf c};r)\) in which the partial derivatives \(D_{2}f,\cdots ,D_{n}f\) exist. We express \({\bf y}\) in terms of its components as follows
\[{\bf y}=y_{1}{\bf e}_{1}+\cdots +y_{n}{\bf e}_{n},\]
where \({\bf e}_{1},\cdots ,{\bf e}_{n}\) are the unit coordinate vectors. Then, we can write the difference \(f({\bf c}+{\bf v})-f({\bf c})\) as
\begin{align} f({\bf c}+{\bf v})-f({\bf c}) & =f({\bf c}+\lambda {\bf y})-f({\bf c})\nonumber\\ & =\sum_{k=1}^{n}\left [ f({\bf c}+\lambda {\bf v}_{k})-f({\bf c}+\lambda {\bf v}_{k-1})\right ],\label{maeq370}\tag{22} \end{align}
where
\[\begin{array}{ccccc} {\bf v}_{0}={\bf 0}, & {\bf v}_{1}=y_{1}{\bf e}_{1}, & {\bf v}_{2}=y_{1}{\bf e}_{1} +y_{2}{\bf e}_{2}, & \cdots , & {\bf v}_{n}=y_{1}{\bf e}_{1}+\cdots +y_{n}{\bf e}_{n}={\bf y}, \end{array}\]
that is,
\begin{equation}{\label{maeq468}}\tag{23} {\bf v}_{k}={\bf v}_{k-1}+y_{k}{\bf e}_{k}\mbox{ for }k=1,\cdots,n. \end{equation}
The first term in (\ref{maeq370}) is \(f({\bf c}+\lambda y_{1}{\bf e}_{1})-f({\bf c})\). Since the points \({\bf c}\) and \({\bf c}+\lambda y_{1}{\bf e}_{1}\) differ only in their first component, we define a function \(g\) by
\[g(x)=f(x,c_{2},\cdots ,c_{n}).\]
Since \(D_{1}f({\bf c})\) exists, we see that \(g'(c_{1})\) exists and
\begin{align*} D_{1}f({\bf c}) & =g'(c_{1})=\lim_{\lambda\rightarrow 0}\frac{g(c_{1}+\lambda y_{1})-g(c_{1})}{\lambda y_{1}}\\ & =\lim_{\lambda\rightarrow 0}\frac{f({\bf c}+\lambda y_{1}{\bf e}_{1})-f({\bf c})}{\lambda y_{1}},\end{align*}
which can be written as
\begin{align} f({\bf c}+\lambda y_{1}{\bf e}_{1})-f({\bf c}) & =\lambda y_{1}D_{1}f({\bf c})+ \lambda y_{1}E_{1}(\lambda )\nonumber\\ & =v_{1}D_{1}f({\bf c})+\lambda y_{1}E_{1}(\lambda ), \label{maeq469}\tag{24}\end{align}
where \(E_{1}(\lambda )\rightarrow 0\) as \(\lambda\rightarrow 0\). For \(k\geq 2\), using (\ref{maeq468}), the \(k\)th term in (\ref{maeq370}) is
\[f({\bf c}+\lambda {\bf v}_{k-1}+\lambda y_{k}{\bf e}_{k})-f({\bf c}+\lambda {\bf v}_{k-1}) \equiv f({\bf b}_{k}+\lambda y_{k}{\bf e}_{k})-f({\bf b}_{k}),\]
where \({\bf b}_{k}={\bf c}+\lambda {\bf v}_{k-1}\). The points \({\bf b}_{k}\) and \({\bf b}_{k}+\lambda y_{k}{\bf e}_{k}\) differ only in their \(k\)th component. We write \({\bf b}_{k}=(b_{k}^{(1)},\cdots ,b_{k}^{(n)})\). Then, we can define
\[g_{k}(x)=f\left (b_{k}^{(1)},\cdots ,b_{k}^{(k-1)},x,b_{k}^{(k+1)},\cdots ,b_{k}^{(n)}\right ).\]
In order to apply the one-dimensional mean-value theorem for the derivatives of \(g_{k}\), we need to claim that the line segment joining the points \({\bf b}_{k}\) and \({\bf b}_{k}+\lambda y_{k}{\bf e}_{k}\) lies in the open ball \(B({\bf c};r)\). Since \({\bf c}+{\bf v}\in B({\bf c};r)\), it follows
\[0<\lambda=\parallel {\bf v}\parallel <r.\]
Now, we have
\begin{align*} \parallel {\bf b}_{k}-{\bf c}\parallel & =\parallel\lambda {\bf v}_{k-1}\parallel =\lambda\cdot\parallel {\bf v}_{k-1}\parallel\\ & \leq\lambda\cdot\parallel {\bf y}\parallel =\lambda <r,\end{align*}
which says that \({\bf b}_{k}\in B({\bf c};r)\). On the other hand, we also have
\begin{align*} \parallel {\bf b}_{k}+\lambda y_{k}{\bf e}_{k}-{\bf c}\parallel & =\parallel\lambda {\bf v}_{k-1}+\lambda y_{k}{\bf e}_{k}\parallel =\lambda\cdot\parallel {\bf v}_{k-1}+y_{k}{\bf e}_{k}\parallel \\ & =\lambda\cdot\parallel {\bf v}_{k}\parallel \\ & \leq\lambda\cdot\parallel {\bf y}\parallel =\lambda <r,\end{align*}
which says that \({\bf b}_{k}+\lambda y_{k}{\bf e}_{k}\in B({\bf c};r)\). Therefore, we conclude that the line segment joining the points \({\bf b}_{k}\) and \({\bf b}_{k}+\lambda y_{k}{\bf e}_{k}\) indeed lies in the open ball \(B({\bf c};r)\). Now, we apply the one-dimensional mean-value theorem for the derivatives of \(g_{k}\) to obtain
\begin{align} f({\bf c}+\lambda {\bf v}_{k})-f({\bf c}+\lambda {\bf v}_{k-1}) & =f({\bf b}_{k}+\lambda y_{k}{\bf e}_{k})-f({\bf b}_{k})\label{maeq371}\tag{25}\\ & =g_{k}\left (b_{k}^{(k)}+\lambda y_{k}\right )-g_{k}\left (b_{k}^{(k)}\right )\nonumber\\ & =g_{k}'(d_{k})\cdot\lambda y_{k}\nonumber\\ & =\lambda y_{k}\cdot D_{k}f\left (b_{k}^{(1)},\cdots ,b_{k}^{(k-1)},d_{k},b_{k}^{(k+1)},\cdots ,b_{k}^{(n)}\right )\nonumber\\ & \equiv\lambda y_{k}\cdot D_{k}f({\bf a}_{k})=v_{k}\cdot D_{k}f({\bf a}_{k}),\nonumber \end{align}
where \(d_{k}\) is between \(b_{k}^{(k)}+\lambda y_{k}\) and \(b_{k}^{(k)}\), and
\[{\bf a}_{k}\equiv\left (b_{k}^{(1)},\cdots ,b_{k}^{(k-1)},d_{k},b_{k}^{(k+1)},\cdots ,b_{k}^{(n)}\right )\]
lies on the line segment joining the points \({\bf b}_{k}\) and \({\bf b}_{k}+\lambda y_{k}{\bf e}_{k}\), i.e., \({\bf a}_{k}\in B({\bf c};r)\). Since \({\bf b}_{k}\rightarrow {\bf c}\) as \(\lambda\rightarrow 0\), it follows that \({\bf a}_{k}\rightarrow {\bf c}\) as \(\lambda\rightarrow 0\). Since each \(D_{k}f\) is continuous at \({\bf c}\) for \(k\geq 2\), we also have
\[D_{k}f({\bf a}_{k})\rightarrow D_{k}f({\bf c})\mbox{ as }\lambda\rightarrow 0,\]
which says
\begin{equation}{\label{maeq372}}\tag{26} D_{k}f({\bf a}_{k})=D_{k}f({\bf c})+E_{k}(\lambda ), \end{equation}
where \(E_{k}(\lambda )\rightarrow 0\) as \(\lambda\rightarrow 0\). Using (\ref{maeq469}), (\ref{maeq371}), (\ref{maeq372}) and (\ref{ma21}), the equality (\ref{maeq370}) becomes
\begin{align*} f({\bf c}+{\bf v})-f({\bf c}) & =\sum_{k=1}^{n}v_{k}\cdot D_{k}f({\bf c})+\lambda\sum_{k=1}^{n}y_{k}E_{k}(\lambda )\\ & =\sum_{k=1}^{n}v_{k}\cdot \frac{\partial f}{\partial x_{k}}({\bf c})+\lambda\sum_{k=1}^{n}y_{k}E_{k}(\lambda )\\ & =\langle\nabla f({\bf c}),{\bf v}\rangle+\parallel {\bf v}\parallel E(\lambda )\\ & =T_{\bf c}({\bf v})+\parallel {\bf v}\parallel E(\lambda ), \end{align*}
where
\[E(\lambda )=\sum_{k=1}^{n}y_{k}E_{k}(\lambda )\rightarrow 0\mbox{ as }\lambda\rightarrow 0.\]
This completes the proof. \(\blacksquare\)
Proof. (Alternative proof without using the one-dimensional mean-value theorem). Now, we consider the \(k\)th term in (\ref{maeq370}) for \(k\geq 2\). Since \({\bf b}_{k}\in B({\bf c};r)\), arguing as in (\ref{maeq469}), we have
\begin{align} f({\bf c}+\lambda {\bf v}_{k})-f({\bf c}+\lambda {\bf v}_{k-1}) & = f({\bf b}_{k}+\lambda y_{k}{\bf e}_{k})-f({\bf b}_{k})\label{maeq470}\tag{27}\\ & =\lambda y_{k}D_{k}f({\bf b}_{k})+\lambda y_{k}\bar{E}_{k}(\lambda )\nonumber\\ & =v_{k}D_{k}f({\bf b}_{k})+\lambda y_{k}\bar{E}_{k}(\lambda ),\nonumber \end{align}
where \(\bar{E}_{k}(\lambda )\rightarrow 0\) as \(\lambda\rightarrow 0\). On the other hand, since each \(D_{k}f\) is continuous at \({\bf c}\) for \(k\geq 2\) and \({\bf b}_{k}\rightarrow {\bf c}\) as \(\lambda\rightarrow 0\), we have
\begin{equation}{\label{maeq471}}\tag{28} D_{k}f({\bf b}_{k})=D_{k}f({\bf c})+\widehat{E}_{k}(\lambda ), \end{equation}
where \(\widehat{E}_{k}(\lambda )\rightarrow 0\) as \(\lambda\rightarrow 0\). Using (\ref{maeq469}), (\ref{maeq470}) and (\ref{maeq471}), the equality (\ref{maeq370}) becomes
\begin{align*} f({\bf c}+{\bf v})-f({\bf c}) & =\sum_{k=1}^{n}v_{k}\cdot D_{k}f({\bf c})+\lambda\cdot y_{1}\cdot E_{1}(\lambda )+\lambda\sum_{k=2}^{n}y_{k}\cdot\left (\bar{E}_{k}(\lambda )+\widehat{E}_{k}(\lambda )\right )\\ & =T_{\bf c}({\bf v})+\parallel {\bf v}\parallel E(\lambda ), \end{align*}
where
\[E(\lambda )=y_{1}\cdot E_{1}(\lambda )+\sum_{k=2}^{n}y_{k}\cdot\left (\bar{E}_{k}(\lambda ) +\widehat{E}_{k}(\lambda )\right )\rightarrow 0\mbox{ as }\lambda\rightarrow 0.\]
This completes the proof. \(\blacksquare\)
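The conclusion of Theorem \ref{mat376} can be checked numerically: the error term \(\parallel {\bf v}\parallel E(\lambda )\) should vanish faster than \(\parallel {\bf v}\parallel\). The following Python sketch uses the sample function \(f(x,y)=x^{2}+3xy\), the point \({\bf c}=(1,2)\) and the direction \({\bf y}=(0.6,0.8)\), all chosen here for illustration.

```python
def f(x, y):
    return x * x + 3 * x * y

c = (1.0, 2.0)
grad_c = (2 * c[0] + 3 * c[1], 3 * c[0])   # gradient of f at c is (8, 3)

y_dir = (0.6, 0.8)                         # a unit vector y, so v = lambda * y
for lam in (1e-1, 1e-2, 1e-3):
    v = (lam * y_dir[0], lam * y_dir[1])
    diff = f(c[0] + v[0], c[1] + v[1]) - f(c[0], c[1])
    taylor = grad_c[0] * v[0] + grad_c[1] * v[1]
    E = (diff - taylor) / lam              # E(lambda); here it equals 1.8 * lambda
    assert abs(E) < 3 * lam                # the error quotient shrinks with lambda
```

As \(\lambda\) decreases by a factor of \(10\), the error quotient \(E(\lambda )\) decreases by the same factor, exhibiting \(E(\lambda )\rightarrow 0\).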
We remark that the partial derivatives \(D_{1}{\bf f},\cdots ,D_{n}{\bf f}\) of a vector-valued function \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}\) are also vector-valued functions from \(\mathbb{R}^{n}\) to \(\mathbb{R}^{m}\). Therefore, these partial derivatives can themselves have partial derivatives, which are called second-order partial derivatives and are denoted by \(D_{rk}{\bf f}=D_{r}(D_{k}{\bf f})\). In general, the second-order partial derivatives \(D_{rk}{\bf f}\) and \(D_{kr}{\bf f}\) are not equal to each other. A counterexample is given below.
Example. We consider the following real-valued function
\[f(x,y)=\left\{\begin{array}{ll} {\displaystyle \frac{xy(x^{2}-y^{2})}{x^{2}+y^{2}}} & \mbox{if }(x,y)\neq (0,0)\\ 0 & \mbox{if }(x,y)=(0,0). \end{array}\right .\]
Then, we have
\begin{align*} D_{1}f(x,y) & =\frac{\partial f}{\partial x}(x,y)\\ & =\frac{y(x^{4}+4x^{2}y^{2}-y^{4})}{(x^{2}+y^{2})^{2}}\mbox{ for }(x,y)\neq (0,0)\end{align*}
and
\[D_{1}f(0,0)=\lim_{h\rightarrow 0}\frac{f(h,0)-f(0,0)}{h}=0.\]
We also have \(D_{1}f(0,y)=-y\) for all \(y\neq 0\). This says that \(D_{21}f(0,y)=-1\) for all \(y\neq 0\). Let \(g(y)=D_{1}f(0,y)\). Then \(g(y)=-y\) for all \(y\neq 0\) and \(g(0)=D_{1}f(0,0)=0\). Therefore, we obtain
\[D_{21}f(0,0)=g'(0)=\lim_{h\rightarrow 0}\frac{g(h)-g(0)}{h} =\lim_{h\rightarrow 0}\frac{-h}{h}=-1.\]
On the other hand, we have
\begin{align*} D_{2}f(x,y) & =\frac{\partial f}{\partial y}(x,y)\\ & =\frac{x(x^{4}-4x^{2}y^{2}-y^{4})}{(x^{2}+y^{2})^{2}}\mbox{ for }(x,y)\neq (0,0)\end{align*}
and
\[D_{2}f(0,0)=\lim_{h\rightarrow 0}\frac{f(0,h)-f(0,0)}{h}=0.\]
We also have \(D_{2}f(x,0)=x\) for all \(x\neq 0\). This says that \(D_{12}f(x,0)=1\) for all \(x\neq 0\). We can similarly obtain \(D_{12}f(0,0)=1\). Now, we have \(D_{21}f(0,0)\neq D_{12}f(0,0)\). \(\sharp\)
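The computations in this example can be reproduced with finite differences. The following Python sketch (the step sizes are choices made here) approximates \(D_{21}f(0,0)\) and \(D_{12}f(0,0)\) and recovers the two different values \(-1\) and \(1\).

```python
def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def D1(x, y, eps=1e-6):
    # central-difference approximation of the partial derivative in x
    return (f(x + eps, y) - f(x - eps, y)) / (2 * eps)

def D2(x, y, eps=1e-6):
    # central-difference approximation of the partial derivative in y
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

h = 1e-3
D21 = (D1(0.0, h) - D1(0.0, 0.0)) / h   # differentiate D1 f in the y-direction
D12 = (D2(h, 0.0) - D2(0.0, 0.0)) / h   # differentiate D2 f in the x-direction
assert abs(D21 - (-1.0)) < 1e-3
assert abs(D12 - 1.0) < 1e-3
```

The two mixed-difference quotients converge to different limits, matching the example.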
We are going to provide sufficient conditions to guarantee that the mixed partial derivatives \(D_{rk}{\bf f}\) and \(D_{kr}{\bf f}\) are identical.
\begin{equation}{\label{mat377}}\tag{29}\mbox{}\end{equation}
Theorem \ref{mat377}. Let \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on \(\mathbb{R}^{n}\). Suppose that both partial derivatives \(D_{r}{\bf f}\) and \(D_{k}{\bf f}\) exist on some \(n\)-dimensional open ball \(B({\bf c};\delta )\) for some \(\delta >0\), and are differentiable at \({\bf c}\). Then, we have
\begin{equation}{\label{maeq373}}\tag{30} D_{rk}{\bf f}({\bf c})=D_{kr}{\bf f}({\bf c}). \end{equation}
Proof. For the vector-valued function \({\bf f}=(f_{1},\cdots ,f_{m})\), we have
\[D_{k}{\bf f}=\left (D_{k}f_{1},\cdots ,D_{k}f_{m}\right ).\]
Therefore, it suffices to prove the theorem for real-valued functions. Since only two components are considered in (\ref{maeq373}), it suffices to consider the case \(n=2\). For simplicity, we also assume \({\bf c}=(0,0)\). Therefore, we shall prove \(D_{12}f(0,0)=D_{21}f(0,0)\). We can pick \(h\neq 0\) such that the square with vertices \((0,0)\), \((h,0)\), \((0,h)\) and \((h,h)\) lies in the 2-dimensional open ball \(B({\bf 0};\delta )\). We consider the following quantity
\[\Delta h=f(h,h)-f(h,0)-f(0,h)+f(0,0).\]
The purpose is to show that \(\Delta h/h^{2}\) tends to both \(D_{21}f(0,0)\) and \(D_{12}f(0,0)\) as \(h\rightarrow 0\). We define \(G(x)=f(x,h)-f(x,0)\). Then, we have
\begin{equation}{\label{maeq374}}\tag{31} \Delta h=G(h)-G(0). \end{equation}
By the one-dimensional mean-value theorem, we have
\begin{equation}{\label{maeq375}}\tag{32} G(h)-G(0)=hG'(x_{1})=h\left [D_{1}f(x_{1},h)-D_{1}f(x_{1},0)\right ], \end{equation}
where \(x_{1}\) lies between \(0\) and \(h\). Let \({\bf v}_{1}=(x_{1},h)\). Since \(D_{1}f\) is differentiable at \({\bf c}=(0,0)\), using (\ref{maeq467}), we have the first-order Taylor formula
\begin{align*} D_{1}f(x_{1},h) & =D_{1}f((0,0)+(x_{1},h))=D_{1}f({\bf c}+{\bf v}_{1})\\ & =D_{1}f({\bf c})+T_{\bf c}({\bf v}_{1})+\parallel {\bf v}_{1}\parallel\cdot E_{\bf c}({\bf v}_{1})\\ & =D_{1}f({\bf c})+x_{1}\cdot D_{11}f({\bf c})+h\cdot D_{21}f({\bf c})+\parallel {\bf v}_{1}\parallel\cdot E_{\bf c}({\bf v}_{1})\\ & =D_{1}f(0,0)+x_{1}\cdot D_{11}f(0,0)+h\cdot D_{21}f(0,0)+\sqrt{x_{1}^{2}+h^{2}}\cdot E_{\bf c}({\bf v}_{1}), \end{align*}
where \(E_{\bf c}({\bf v}_{1})\rightarrow 0\) as \({\bf v}_{1}\rightarrow {\bf 0}\). Since \(x_{1}\) lies between \(0\) and \(h\), it follows that if \(h\rightarrow 0\), then \(x_{1}\rightarrow 0\), i.e., \({\bf v}_{1}\rightarrow {\bf 0}\). Equivalently, if \(h\rightarrow 0\), then \(E_{\bf c}({\bf v}_{1})\rightarrow 0\). Let \({\bf v}_{2}=(x_{1},0)\). We can similarly obtain
\[D_{1}f(x_{1},0)=D_{1}f(0,0)+D_{11}f(0,0)x_{1}+|x_{1}|\cdot E_{\bf c}({\bf v}_{2}),\]
where \(E_{\bf c}({\bf v}_{2})\rightarrow 0\) as \(h\rightarrow 0\). Since \(E_{\bf c}({\bf v}_{1})\) and \(E_{\bf c}({\bf v}_{2})\) can be regarded as functions of \(h\), we write \(E_{\bf c}({\bf v}_{1})=E_{1}(h)\) and \(E_{\bf c}({\bf v}_{2})=E_{2}(h)\). Then, we have
\[D_{1}f(x_{1},h)=D_{1}f(0,0)+x_{1}\cdot D_{11}f(0,0)+h\cdot D_{21}f(0,0)+\sqrt{x_{1}^{2}+h^{2}}\cdot E_{1}(h)\]
and \[D_{1}f(x_{1},0)=D_{1}f(0,0)+D_{11}f(0,0)x_{1}+|x_{1}|\cdot E_{2}(h),\]
where \(E_{1}(h)\rightarrow 0\) and \(E_{2}(h)\rightarrow 0\) as \(h\rightarrow 0\). Applying these facts to (\ref{maeq374}) and (\ref{maeq375}), we obtain \begin{equation}{\label{maeq465}}\tag{33} \Delta h=D_{21}f(0,0)h^{2}+E(h), \end{equation} where
\[E(h)=h\cdot\sqrt{x_{1}^{2}+h^{2}}\cdot E_{1}(h)-h|x_{1}|E_{2}(h).\]
Since \(x_{1}\) lies between \(0\) and \(h\), i.e., \(|x_{1}|\leq |h|\), we also have
\[0\leq |E(h)|\leq\sqrt{2}\cdot h^{2}|E_{1}(h)|+h^{2}|E_{2}(h)|,\]
which implies \begin{equation}{\label{maeq466}}\tag{34} \lim_{h\rightarrow 0}\left |\frac{E(h)}{h^{2}}\right |=0. \end{equation} From (\ref{maeq465}) and (\ref{maeq466}), we obtain
\[\lim_{h\rightarrow 0}\frac{\Delta h}{h^{2}}=D_{21}f(0,0).\]
Applying the same argument to the function \(H(y)=f(h,y)-f(0,y)\), we can also obtain
\[\lim_{h\rightarrow 0}\frac{\Delta h}{h^{2}}=D_{12}f(0,0).\]
This completes the proof. \(\blacksquare\)
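The key quantity of the proof, \(\Delta h/h^{2}\), can be observed numerically. The following Python sketch uses the smooth sample function \(f(x,y)=\sin (xy)\), chosen here for illustration; both mixed partials at the origin equal \(1\), so the double-difference quotient should tend to \(1\).

```python
import math

def f(x, y):
    return math.sin(x * y)   # smooth, and D12 f(0,0) = D21 f(0,0) = 1

for h in (1e-1, 1e-2, 1e-3):
    # the quantity Delta h = f(h,h) - f(h,0) - f(0,h) + f(0,0) from the proof
    delta = f(h, h) - f(h, 0.0) - f(0.0, h) + f(0.0, 0.0)
    # here delta = sin(h^2), so delta / h^2 = 1 - h^4/6 + ...
    assert abs(delta / (h * h) - 1.0) < h * h
```

As \(h\rightarrow 0\), the quotient \(\Delta h/h^{2}\) converges to the common value of the mixed partials.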
Theorem. Let \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on \(\mathbb{R}^{n}\). Suppose that both partial derivatives \(D_{r}{\bf f}\) and \(D_{k}{\bf f}\) exist on some \(n\)-dimensional open ball \(B({\bf c};r)\), and that both \(D_{rk}{\bf f}\) and \(D_{kr}{\bf f}\) are continuous at \({\bf c}\). Then, we have
\[D_{rk}{\bf f}({\bf c})=D_{kr}{\bf f}({\bf c}).\]
Proof. From Theorem \ref{mat376}, we see that \(D_{r}{\bf f}\) and \(D_{k}{\bf f}\) are differentiable at \({\bf c}\). Therefore, the result follows immediately from Theorem \ref{mat377}. \(\blacksquare\)
Theorem. Let \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}\) be a vector-valued function defined on \(\mathbb{R}^{n}\). Suppose that the partial derivatives \(D_{r}{\bf f}\), \(D_{k}{\bf f}\) and \(D_{kr}{\bf f}\) are continuous on some \(n\)-dimensional open ball \(B({\bf c};r)\). Then \(D_{rk}{\bf f}({\bf c})\) exists and
\[D_{rk}{\bf f}({\bf c})=D_{kr}{\bf f}({\bf c}).\]
Next, we are going to introduce Taylor's formula for a real-valued function \(f:\mathbb{R}^{n}\rightarrow\mathbb{R}\) defined on \(\mathbb{R}^{n}\). Suppose that \(f\) is differentiable at \({\bf c}\). Then, using Theorem \ref{ma15}, the directional derivative in the direction \({\bf t}\in\mathbb{R}^{n}\) is given by
\begin{equation}{\label{ma164}}\tag{35} f'({\bf c};{\bf t})=\nabla f({\bf c})\bullet {\bf t}=\sum_{i=1}^{n}\frac{\partial f}{\partial x_{i}}({\bf c})t_{i}. \end{equation}
Now, we introduce some notations. Suppose that all second-order partial derivatives of \(f\) exist at \({\bf c}\in\mathbb{R}^{n}\). For any \({\bf t}\in\mathbb{R}^{n}\), we write
\begin{equation}{\label{ma165}}\tag{36} f''({\bf c};{\bf t})=\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}({\bf c})t_{i}t_{j}. \end{equation}
Similarly, if all third-order partial derivatives of \(f\) exist at \({\bf c}\), then we also write
\[f'''({\bf c};{\bf t})=\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\frac{\partial^{3}f} {\partial x_{i}\partial x_{j}\partial x_{k}}({\bf c})t_{i}t_{j}t_{k}.\]
In general, if all \(m\)th-order partial derivatives exist at \({\bf c}\), then we can similarly define \(f^{(m)}({\bf c};{\bf t})\).
\begin{equation}{\label{mat380}}\tag{37}\mbox{}\end{equation}
Theorem \ref{mat380}. (Taylor's Formula). Let \(f:\mathbb{R}^{n}\rightarrow\mathbb{R}\) be a real-valued function whose partial derivatives of order less than \(m\) are all differentiable on an open subset \(S\) of \(\mathbb{R}^{n}\), and let \({\bf a},{\bf b}\in S\) satisfy \(L({\bf a},{\bf b})\subset S\). Then, there exists a point \({\bf z}\in L({\bf a},{\bf b})\) satisfying
\[f({\bf b})-f({\bf a})=\sum_{k=1}^{m-1}\frac{1}{k!}f^{(k)}({\bf a};{\bf b}-{\bf a})+\frac{1}{m!}f^{(m)}({\bf z};{\bf b}-{\bf a}).\]
Proof. Since \(S\) is open, by referring to the proof of Theorem~\ref{mat369}, there exists \(\delta >0\) satisfying \({\bf a}+t({\bf b}-{\bf a})\in S\) for each \(t\in (-\delta ,1+\delta )\). We define a real-valued function \(g\) on \((-\delta ,1+\delta )\) by
\[g(t)=f({\bf a}+t({\bf b}-{\bf a})).\]
Then, we have
\[f({\bf b})-f({\bf a})=g(1)-g(0).\]
We shall prove the theorem by applying the one-dimensional Taylor formula to \(g\). Now, we have
\begin{equation}{\label{maeq378}}\tag{38} g(1)-g(0)=\sum_{k=1}^{m-1}\frac{1}{k!}g^{(k)}(0)+\frac{1}{m!}g^{(m)}(\theta ), \end{equation}
where \(\theta\in (0,1)\). Let \({\bf p}(t)={\bf a}+t({\bf b}-{\bf a})\). Then, we have \(g(t)=f({\bf p}(t))\) and the \(k\)th component of \({\bf p}\) has derivative \(p'_{k}(t)=b_{k}-a_{k}\). Applying the chain rule and using (\ref{ma164}), we see that \(g'(t)\) exists in the interval \((-\delta ,1+\delta )\), and is given by the formula
\begin{align*} g'(t) & =\sum_{j=1}^{n}\frac{\partial f}{\partial x_{j}}({\bf p}(t))(b_{j}-a_{j})\\ & =f'({\bf p}(t);{\bf b}-{\bf a}).\end{align*}
Applying the chain rule again and using (\ref{ma165}), we also obtain
\begin{align*} g''(t) & =\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}} ({\bf p}(t))(b_{j}-a_{j})(b_{i}-a_{i})\\ & =f''({\bf p}(t);{\bf b}-{\bf a}).\end{align*}
Therefore, we can inductively obtain
\begin{equation}{\label{maeq379}}\tag{39} g^{(m)}(t)=f^{(m)}({\bf p}(t);{\bf b}-{\bf a}). \end{equation}
Since the point \({\bf z}={\bf a}+\theta ({\bf b}-{\bf a})\in L({\bf a},{\bf b})\), from (\ref{maeq378}) and (\ref{maeq379}), we obtain the desired formula. This completes the proof. \(\blacksquare\)
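Taylor's formula can be verified exactly on a polynomial. The following Python sketch uses \(f(x,y)=x^{2}y\) with \(m=3\) (a choice made here for illustration): all third-order partial derivatives are constant, so \(f^{(3)}({\bf z};{\bf t})\) is the same for every \({\bf z}\in L({\bf a},{\bf b})\) and the formula is exact.

```python
# f(x, y) = x^2 * y; its third-order partials are constant, so Taylor's
# formula with m = 3 holds exactly for any z on the segment L(a, b)
a, b = (1.0, 1.0), (2.0, 3.0)
t = (b[0] - a[0], b[1] - a[1])        # t = b - a = (1, 2)

def f(x, y):
    return x * x * y

def f1(p, t):
    # f'(p; t) = fx*t1 + fy*t2, with fx = 2xy and fy = x^2
    x, y = p
    return 2 * x * y * t[0] + x * x * t[1]

def f2(p, t):
    # f''(p; t) = fxx*t1^2 + 2*fxy*t1*t2 + fyy*t2^2, with fxx = 2y, fxy = 2x, fyy = 0
    x, y = p
    return 2 * y * t[0] ** 2 + 2 * (2 * x) * t[0] * t[1]

def f3(p, t):
    # the only nonzero third-order partial is fxxy = 2, appearing in 3 index triples
    return 3 * 2 * t[0] ** 2 * t[1]

lhs = f(*b) - f(*a)                   # 12 - 1 = 11
z = (1.5, 2.0)                        # any point of L(a, b) works for this f
rhs = f1(a, t) + f2(a, t) / 2 + f3(z, t) / 6
assert abs(lhs - rhs) < 1e-12         # 4 + 5 + 2 = 11
```

The three terms contribute \(4\), \(5\) and \(2\), which sum to \(f({\bf b})-f({\bf a})=11\).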
The Chain Rule.
Let \({\bf f}\) and \({\bf g}\) be two vector-valued functions such that the composition \({\bf h}={\bf f}\circ {\bf g}\) is defined in a neighborhood of a point \({\bf a}\). The chain rule tells us how to compute the total derivative of \({\bf h}\) from those of \({\bf f}\) and \({\bf g}\). For a vector-valued function \({\bf f}=(f_{1},\cdots ,f_{m})\), by Proposition \ref{mat249} and (\ref{ma16}), we have
\begin{align} {\bf f}'({\bf c})({\bf v}) & ={\bf f}'({\bf c};{\bf v})=\sum_{k=1}^{m}f'_{k}({\bf c};{\bf v}){\bf e}_{k}\nonumber\\ & =\sum_{k=1}^{m} (\nabla f_{k}({\bf c})\bullet{\bf v}){\bf e}_{k}, \label{maeq461}\tag{40}\end{align}
which implies
\begin{align*} \parallel {\bf f}'({\bf c})({\bf v})\parallel & =\left |\!\left |\sum_{k=1}^{m}(\nabla f_{k}({\bf c})\bullet{\bf v}){\bf e}_{k}\right |\!\right |\\ & \leq\sum_{k=1}^{m}\left |\nabla f_{k}({\bf c})\bullet{\bf v}\right |\parallel{\bf e}_{k}\parallel\\ & \quad\mbox{ (using the triangle inequality)}\\ & =\sum_{k=1}^{m}\left |\nabla f_{k}({\bf c})\bullet{\bf v}\right |\leq\sum_{k=1}^{m}\parallel {\bf v}\parallel\cdot\parallel\nabla f_{k}({\bf c})\parallel \\ & \quad\mbox{ (using the Cauchy-Schwarz inequality)}\\ & =\parallel {\bf v}\parallel\cdot\sum_{k=1}^{m}\parallel\nabla f_{k}({\bf c})\parallel. \end{align*}
Therefore, we obtain
\begin{equation}{\label{maeq363}}\tag{41} \parallel {\bf f}'({\bf c})({\bf v})\parallel\leq M\cdot\parallel {\bf v}\parallel , \end{equation}
where
\[M=\sum_{k=1}^{m}\parallel\nabla f_{k}({\bf c})\parallel .\]
Let \(\mathfrak{B}\) be the standard basis for \(\mathbb{R}^{n}\) and \(\mathfrak{B}'\) be the standard basis for \(\mathbb{R}^{m}\). According to (\ref{maeq461}), the matrix representing the linear function \({\bf f}'({\bf c})\) is given by
\begin{align*} D{\bf f}({\bf c}) & \equiv [{\bf f}'({\bf c})]_{\mathfrak{B}}^{\mathfrak{B}'}\\ & =\left [\begin{array}{cccc} {\displaystyle \frac{\partial f_{1}}{\partial x_{1}}({\bf c})} & {\displaystyle \frac{\partial f_{1}}{\partial x_{2}}({\bf c})} & \cdots & {\displaystyle \frac{\partial f_{1}}{\partial x_{n}}({\bf c})}\\ {\displaystyle \frac{\partial f_{2}}{\partial x_{1}}({\bf c})} & {\displaystyle \frac{\partial f_{2}}{\partial x_{2}}({\bf c})} & \cdots & {\displaystyle \frac{\partial f_{2}}{\partial x_{n}}({\bf c})}\\ \vdots & \vdots && \vdots\\ {\displaystyle \frac{\partial f_{m}}{\partial x_{1}}({\bf c})} & {\displaystyle \frac{\partial f_{m}}{\partial x_{2}}({\bf c})} & \cdots & {\displaystyle \frac{\partial f_{m}}{\partial x_{n}}({\bf c})} \end{array}\right ],\end{align*}
which is called the Jacobian matrix of \({\bf f}\) at \({\bf c}\).
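The Jacobian matrix can be approximated column by column with difference quotients, which also serves as a numerical check of its entries. A Python sketch, using the sample map \({\bf f}(x,y)=(x^{2}y,\,5x+\sin y)\) chosen here for illustration:

```python
import math

def f(x, y):
    return (x * x * y, 5 * x + math.sin(y))

def jacobian_analytic(x, y):
    # the Jacobian matrix: row k holds the partials of f_k
    return [[2 * x * y, x * x],
            [5.0, math.cos(y)]]

def jacobian_fd(x, y, eps=1e-6):
    # column j holds central-difference partials with respect to the j-th variable
    fx1, fx0 = f(x + eps, y), f(x - eps, y)
    fy1, fy0 = f(x, y + eps), f(x, y - eps)
    return [[(fx1[k] - fx0[k]) / (2 * eps), (fy1[k] - fy0[k]) / (2 * eps)]
            for k in range(2)]

c = (1.2, 0.7)
J, J_fd = jacobian_analytic(*c), jacobian_fd(*c)
for k in range(2):
    for j in range(2):
        assert abs(J[k][j] - J_fd[k][j]) < 1e-5
```

Each finite-difference entry agrees with the corresponding partial derivative \(\partial f_{k}/\partial x_{j}({\bf c})\).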
\begin{equation}{\label{mat413}}\tag{42}\mbox{}\end{equation}
Theorem \ref{mat413}. (Chain Rule). Suppose that \({\bf g}\) is differentiable at \({\bf a}\) with total derivative \({\bf g}'({\bf a})\). Let \({\bf b}={\bf g}({\bf a})\) and assume that \({\bf f}\) is differentiable at \({\bf b}\) with total derivative \({\bf f}'({\bf b})\). Then, the composition function \({\bf h}={\bf f}\circ {\bf g}\) is differentiable at \({\bf a}\) with the total derivative \({\bf h}'({\bf a})\) given by
\begin{equation}{\label{maeq250}}\tag{43} {\bf h}'({\bf a})={\bf f}'({\bf b})\circ {\bf g}'({\bf a}), \end{equation}
which is the composition of two linear functions \({\bf f}'({\bf b})\) and \({\bf g}'({\bf a})\).
Proof. We have
\begin{align} {\bf h}({\bf a}+{\bf y})-{\bf h}({\bf a}) & ={\bf f}\left ({\bf g}({\bf a}+{\bf y})\right )-{\bf f}({\bf g}({\bf a}))\nonumber\\ & ={\bf f}({\bf b}+{\bf v})-{\bf f}({\bf b}), \label{maeq365}\tag{44}\end{align}
where \({\bf b}={\bf g}({\bf a})\) and \({\bf v}={\bf g}({\bf a}+{\bf y})-{\bf b}\). By referring to (\ref{maeq248}), we have
\begin{equation}{\label{maeq361}}\tag{45} {\bf v}={\bf g}({\bf a}+{\bf y})-{\bf g}({\bf a})={\bf g}'({\bf a})({\bf y})+\parallel {\bf y}\parallel E_{\bf a}({\bf y}), \end{equation}
where \(E_{\bf a}({\bf y})\rightarrow {\bf 0}\) as \({\bf y}\rightarrow {\bf 0}\). We also have
\begin{equation}{\label{maeq362}}\tag{46} {\bf f}({\bf b}+{\bf v})-{\bf f}({\bf b})={\bf f}'({\bf b})({\bf v})+\parallel {\bf v}\parallel E_{\bf b}({\bf v}), \end{equation}
where \(E_{\bf b}({\bf v})\rightarrow {\bf 0}\) as \({\bf v}\rightarrow {\bf 0}\). By substituting (\ref{maeq361}) into (\ref{maeq362}) and using the linearity of \({\bf f}'({\bf b})\), we obtain
\begin{align} {\bf f}({\bf b}+{\bf v})-{\bf f}({\bf b}) & ={\bf f}'({\bf b})({\bf g}'({\bf a})({\bf y}))+ {\bf f}'({\bf b})\left (\parallel {\bf y}\parallel E_{\bf a}({\bf y})\right )+\parallel {\bf v}\parallel E_{\bf b}({\bf v})\label{maeq366}\tag{47}\\ & \equiv {\bf f}'({\bf b})({\bf g}'({\bf a})({\bf y}))+\parallel {\bf y}\parallel E({\bf y}),\nonumber \end{align}
where \(E({\bf 0})={\bf 0}\) and
\begin{equation}{\label{maeq364}}\tag{48} E({\bf y})={\bf f}'({\bf b})\left (E_{\bf a}({\bf y})\right )+\frac{\parallel {\bf v}\parallel}{\parallel {\bf y}\parallel}E_{\bf b}({\bf v}) \mbox{ for }{\bf y}\neq {\bf 0}. \end{equation}
Next, we want to claim \(E({\bf y})\rightarrow {\bf 0}\) as \({\bf y}\rightarrow {\bf 0}\). Using (\ref{maeq363}) and (\ref{maeq361}), we have
\begin{align*} \parallel {\bf v}\parallel & \leq\parallel {\bf g}'({\bf a})({\bf y})\parallel +\parallel {\bf y}\parallel\cdot\parallel E_{\bf a}({\bf y})\parallel\\ & \leq \parallel {\bf y}\parallel\left [M+\parallel E_{\bf a}({\bf y})\parallel\right ],\end{align*}
where
\[M=\sum_{k=1}^{m}\parallel\nabla g_{k}({\bf a})\parallel .\]
Therefore, we obtain
\[\frac{\parallel {\bf v}\parallel}{\parallel {\bf y}\parallel}\leq M+\parallel E_{\bf a}({\bf y})\parallel ,\]
which shows that the quotient \(\parallel {\bf v}\parallel /\parallel {\bf y}\parallel\) remains bounded as \({\bf y}\rightarrow {\bf 0}\): since \(E_{\bf a}({\bf y})\rightarrow {\bf 0}\) as \({\bf y}\rightarrow {\bf 0}\), we have
\begin{equation}{\label{maeq463}}\tag{49} \frac{\parallel {\bf v}\parallel}{\parallel {\bf y}\parallel}\leq M+1 \end{equation}
whenever \(\parallel {\bf y}\parallel\) is sufficiently small. Moreover, using Proposition \ref{map462}, the continuity of the linear function \({\bf f}'({\bf b})\) says
\begin{equation}{\label{ma131}}\tag{50} \lim_{{\bf y}\rightarrow {\bf 0}}{\bf f}'({\bf b})(E_{\bf a}({\bf y}))={\bf f}'({\bf b})({\bf 0})={\bf 0}. \end{equation}
Since \({\bf v}={\bf g}({\bf a}+{\bf y})-{\bf g}({\bf a})\) and \({\bf g}\) is differentiable at \({\bf a}\), i.e., \({\bf g}\) is continuous at \({\bf a}\), we see that \({\bf y}\rightarrow {\bf 0}\) implies \({\bf v}\rightarrow {\bf 0}\). This says that \(E_{\bf b}({\bf v})\rightarrow {\bf 0}\) as \({\bf y}\rightarrow {\bf 0}\). From (\ref{maeq364}), (\ref{maeq463}) and (\ref{ma131}), we have \(E({\bf y})\rightarrow {\bf 0}\) as \({\bf y}\rightarrow {\bf 0}\). Using (\ref{maeq365}) and (\ref{maeq366}), we obtain the Taylor formula
\begin{align*} {\bf h}({\bf a}+{\bf y})-{\bf h}({\bf a}) & ={\bf f}'({\bf b})({\bf g}'({\bf a})({\bf y}))+\parallel {\bf y}\parallel E({\bf y}) \\ & =({\bf f}'({\bf b})\circ {\bf g}'({\bf a}))({\bf y})+\parallel {\bf y}\parallel E({\bf y}),\end{align*}
where \(E({\bf y})\rightarrow {\bf 0}\) as \({\bf y}\rightarrow {\bf 0}\). This proves that \({\bf h}\) is differentiable at \({\bf a}\) and its total derivative at \({\bf a}\) is the composition \({\bf f}'({\bf b})\circ {\bf g}'({\bf a})\). This completes the proof. \(\blacksquare\)
The matrices representing the linear functions \({\bf f}'({\bf b})\), \({\bf g}'({\bf a})\) and \({\bf h}'({\bf a})\) are \(D{\bf f}({\bf b})\), \(D{\bf g}({\bf a})\) and \(D{\bf h}({\bf a})\), respectively. From linear algebra, the matrix representing the composition of two linear functions is the product of the matrices representing the individual functions. Therefore, the chain rule presented in (\ref{maeq250}) can be written as the matrix product of Jacobian matrices as follows:
\begin{equation}{\label{maeq250*}}\tag{51} D{\bf h}({\bf a})=D{\bf f}({\bf b})D{\bf g}({\bf a}), \end{equation}
which is called the matrix form of chain rule. Suppose that \({\bf a}\in\mathbb{R}^{p}\), \({\bf b}={\bf g}({\bf a})\in\mathbb{R}^{n}\) and \({\bf f}({\bf b})\in\mathbb{R}^{m}\). Then \({\bf h}({\bf a})\in\mathbb{R}^{m}\) and we can write \[\begin{array}{lcl} {\bf g}=(g_{1},\cdots ,g_{n}), & {\bf f}=(f_{1},\cdots ,f_{m}), & {\bf h}=(h_{1},\cdots ,h_{m}), \end{array}\] where \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}\), \({\bf g}:\mathbb{R}^{p}\rightarrow\mathbb{R}^{n}\) and \({\bf h}:\mathbb{R}^{p}\rightarrow\mathbb{R}^{m}\). Therefore \(D{\bf h}({\bf a})\) is an \(m\times p\) matrix, \(D{\bf f}({\bf b})\) is an \(m\times n\) matrix and \(D{\bf g}({\bf a})\) is an \(n\times p\) matrix. The chain rule given in (\ref{maeq250*}) says \begin{equation}{\label{maeq251}}\tag{52} \frac{\partial h_{i}}{\partial x_{j}}({\bf a})=\sum_{k=1}^{n}\frac{\partial f_{i}}{\partial x_{k}}({\bf b})\cdot\frac{\partial g_{k}}{\partial x_{j}}({\bf a}) \end{equation} for \(i=1,\cdots ,m\) and \(j=1,\cdots ,p\). Suppose that we write \({\bf y}={\bf f}({\bf x})\) and \({\bf x}={\bf g}({\bf t})\). Then \({\bf y}={\bf f}({\bf g}({\bf t}))={\bf h}({\bf t})\). In this case, the equation (\ref{maeq251}) becomes
\[\frac{\partial y_{i}}{\partial t_{j}}=\sum_{k=1}^{n}\frac{\partial y_{i}}{\partial x_{k}}\cdot\frac{\partial x_{k}}{\partial t_{j}}.\]
In particular, we have the following forms.
- Let \(u(x,y)\) be a function of \(x\) and \(y\), and let \(x=x(s,t)\) and \(y=y(s,t)\) be functions of \(s\) and \(t\). Then, we have \[\frac{\partial u}{\partial s}=\frac{\partial u}{\partial x}\frac{\partial x}{\partial s}+\frac{\partial u}{\partial y}\frac{\partial y}{\partial s}\mbox{ and } \frac{\partial u}{\partial t}=\frac{\partial u}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial u}{\partial y}\frac{\partial y}{\partial t}.\]
- Let \(u(x,y,z)\) be a function of \(x\), \(y\) and \(z\), and let \(x=x(s,t)\), \(y=y(s,t)\) and \(z=z(s,t)\) be functions of \(s\) and \(t\). Then, we have \[\frac{\partial u}{\partial s}=\frac{\partial u}{\partial x}\frac{\partial x}{\partial s}+\frac{\partial u}{\partial y}\frac{\partial y} {\partial s}+\frac{\partial u}{\partial z}\frac{\partial z}{\partial s}\mbox{ and } \frac{\partial u}{\partial t}=\frac{\partial u}{\partial x}\frac{\partial x}{\partial t}+\frac{\partial u}{\partial y}\frac{\partial y} {\partial t}+\frac{\partial u}{\partial z}\frac{\partial z}{\partial t}.\]
Example. Let \(u(x,y)=x^{2}-2xy+2y^{3}\), where
\[x(s,t)=s^{2}\ln t\mbox{ and }y(s,t)=2st^{3}.\]
Since
\[\frac{\partial u}{\partial x}=(2x-2y),\quad\frac{\partial u}{\partial y}=(-2x+6y^{2})\]
and
\[\frac{\partial x}{\partial s}=2s\ln t,\quad\frac{\partial y}{\partial s}=2t^{3},\quad\frac{\partial x}{\partial t}=\frac{s^{2}}{t},\quad\frac{\partial y}{\partial t}=6st^{2},\] we have \[\frac{\partial u}{\partial s}=(2x-2y)(2s\ln t)+(-2x+6y^{2})(2t^{3})\]
and
\[\frac{\partial u}{\partial t}=(2x-2y)\left (\frac{s^{2}}{t}\right )+(-2x+6y^{2})(6st^{2}).\]
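The two formulas of this example can be checked against a direct difference quotient of the composite function \(u(x(s,t),y(s,t))\). A Python sketch (the evaluation point \((s,t)=(1.5,2)\) and the step size are choices made here):

```python
import math

def u(x, y):
    return x * x - 2 * x * y + 2 * y ** 3

def xy(s, t):
    return (s * s * math.log(t), 2 * s * t ** 3)

def du_ds(s, t):
    # the chain-rule formula for du/ds from the example
    x, y = xy(s, t)
    return (2 * x - 2 * y) * (2 * s * math.log(t)) + (-2 * x + 6 * y * y) * (2 * t ** 3)

def du_dt(s, t):
    # the chain-rule formula for du/dt from the example
    x, y = xy(s, t)
    return (2 * x - 2 * y) * (s * s / t) + (-2 * x + 6 * y * y) * (6 * s * t * t)

def composite(s, t):
    return u(*xy(s, t))

s, t, h = 1.5, 2.0, 1e-5
fd_s = (composite(s + h, t) - composite(s - h, t)) / (2 * h)
fd_t = (composite(s, t + h) - composite(s, t - h)) / (2 * h)
assert abs(du_ds(s, t) - fd_s) < 1e-5 * abs(fd_s)
assert abs(du_dt(s, t) - fd_t) < 1e-5 * abs(fd_t)
```

The chain-rule formulas and the central differences of the composite agree to high relative accuracy.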
Example. Let \(u=x^{2}y^{3}e^{xz}\), where
\[x=s^{2}+t^{2},\quad y=2st\mbox{ and }z=s\ln t.\]
Since
\[\frac{\partial u}{\partial x}=2xy^{3}e^{xz}+x^{2}y^{3}ze^{xz},\quad\frac{\partial u}{\partial y} =3x^{2}y^{2}e^{xz},\quad\frac{\partial u}{\partial z}=x^{3}y^{3}e^{xz}\]
and
\[\frac{\partial x}{\partial s}=2s,\quad\frac{\partial y}{\partial s}=2t,\quad\frac{\partial z}{\partial s}=\ln t,\]
we have
\[\frac{\partial u}{\partial s}=(2xy^{3}e^{xz}+x^{2}y^{3}ze^{xz})(2s)+(3x^{2}y^{2}e^{xz})(2t)+(x^{3}y^{3}e^{xz})(\ln t).\]
Theorem. (Implicit Differentiation). The implicit differentiation formulas are given below.
(i) Let the function \(u(x,y)\) of two variables be continuously differentiable. Suppose that the equation \(u(x,y)=0\) defines \(y\) implicitly as a differentiable function of \(x\). If \(\partial u/\partial y\neq 0\), then
\[\frac{dy}{dx}=-\frac{\partial u/\partial x}{\partial u/\partial y}.\]
(ii) Let the function \(u(x,y,z)\) of three variables be continuously differentiable. Suppose that the equation \(u(x,y,z)=0\) defines \(z\) implicitly as a differentiable function of \(x\) and \(y\). If \(\partial u/\partial z\neq 0\), then
\[\frac{\partial z}{\partial x}=-\frac{\partial u/\partial x}{\partial u/\partial z}\mbox{ and }\frac{\partial z}{\partial y} =-\frac{\partial u/\partial y}{\partial u/\partial z}.\]
Proof. To prove part (i), we introduce a variable \(t\) by setting \(x=t\). Then, we have \(u=u(x,y)\) with \(x=t\) and \(y=y(t)\). By the chain rule, we have
\[\frac{du}{dt}=\frac{\partial u}{\partial x}\frac{dx}{dt}+\frac{\partial u}{\partial y}\frac{dy}{dt}.\]
Since \(u(t,y(t))=0\) for all \(t\), it follows that \(du/dt=0\). Also, since \(x=t\), we have \(dx/dt=1\) and \(dy/dt=dy/dx\). Therefore, we obtain
\[0=\frac{\partial u}{\partial x}+\frac{\partial u}{\partial y}\frac{dy}{dx},\]
which proves part (i). To prove part (ii), we write \(u=u(x,y,z)\) with \(x=s\), \(y=t\) and \(z=z(s,t)\). Since \(u(s,t,z(s,t))=0\) for all \(s\) and \(t\), it follows that \(\partial u/\partial s=0\). Since \(\partial x/\partial s=1\) and \(\partial y/\partial s=0\), we obtain
\begin{align*} 0 & =\frac{\partial u}{\partial s}=\frac{\partial u}{\partial x} \frac{\partial x}{\partial s}+\frac{\partial u}{\partial y}\frac{\partial y} {\partial s}+\frac{\partial u}{\partial z}\frac{\partial z}{\partial s}\\ & = \frac{\partial u}{\partial x}\cdot 1+\frac{\partial u}{\partial y}\cdot 0+ \frac{\partial u}{\partial z}\frac{\partial z}{\partial s}\\ & =\frac{\partial u}{\partial x}+ \frac{\partial u}{\partial z}\frac{\partial z}{\partial x},\end{align*}
which implies
\[\frac{\partial z}{\partial x}=-\frac{\partial u/\partial x}{\partial u/\partial z}.\]
The formula for \(\partial z/\partial y\) can be similarly obtained. This completes the proof. \(\blacksquare\)
Example. Consider the following equation
\[u(x,y)=2xy-y^{3}+1-x-2y=0.\]
Then, we have
\[\frac{dy}{dx}=-\frac{\partial u/\partial x}{\partial u/\partial y}=-\frac{2y-1}{2x-3y^{2}-2}=\frac{1-2y}{2x-3y^{2}-2}.\]
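This implicit derivative can be checked numerically. The sketch below assumes Python with sympy; it traces a solution branch of \(u(x,y)=0\) through a point found with `nsolve` and compares a finite-difference slope with the formula \(-u_{x}/u_{y}\).

```python
import sympy as sp

x, y = sp.symbols('x y')
u = 2*x*y - y**3 + 1 - x - 2*y

# dy/dx from the implicit-differentiation formula -u_x/u_y
dydx = -sp.diff(u, x)/sp.diff(u, y)

# Numerical check: trace the solution branch through a point on the curve
x0 = sp.Float(2)
y0 = sp.nsolve(u.subs(x, x0), y, 0.5)      # a point with u(x0, y0) = 0
h = sp.Float('1e-6')
y1 = sp.nsolve(u.subs(x, x0 + h), y, y0)   # nearby point on the same branch
fd = (y1 - y0)/h                            # finite-difference slope

assert abs(fd - dydx.subs({x: x0, y: y0})) < 1e-4
```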
Theorem. Let \(f=f(x,y)\) be a continuous real-valued function defined on the rectangle \([a,b]\times [c,d]\). Suppose that \(\partial f/\partial y\) is continuous on \([a,b]\times [c,d]\). We also assume that the real-valued functions \(p:[c,d]\rightarrow [a,b]\) and \(q:[c,d]\rightarrow [a,b]\) are differentiable on \([c,d]\). Define
\[F(y)=\int_{p(y)}^{q(y)}f(x,y)dx\mbox{ for }y\in [c,d].\]
Then \(F'(y)\) exists for each \(y\in (c,d)\) and is given by
\[F'(y)=\int_{p(y)}^{q(y)}\left [\frac{\partial f}{\partial y}(x,y)\right ]dx+f(q(y),y)q'(y)-f(p(y),y)p'(y).\]
Proof. For \(x_{1},x_{2}\in [a,b]\) and \(x_{3}\in [c,d]\), we define
\[G(x_{1},x_{2},x_{3})=\int_{x_{1}}^{x_{2}}f(t,x_{3})dt.\]
Then \(F(y)=G(p(y),q(y),y)\). The chain rule says
\[F'(y)=\frac{\partial G}{\partial x_{1}}(p(y),q(y),y)p'(y)+\frac{\partial G}{\partial x_{2}}(p(y),q(y),y)q'(y)+\frac{\partial G}{\partial x_{3}}(p(y),q(y),y).\]
Using Theorem \ref{mat110}, we have
\[\frac{\partial G}{\partial x_{1}}(x_{1},x_{2},x_{3})=-f(x_{1},x_{3})\mbox{ and } \frac{\partial G}{\partial x_{2}}(x_{1},x_{2},x_{3})=f(x_{2},x_{3}).\]
Using Theorem \ref{mat367}, we also have
\[\frac{\partial G}{\partial x_{3}}(x_{1},x_{2},x_{3}) =\int_{x_{1}}^{x_{2}}\left [\frac{\partial f}{\partial x_{3}}(x,x_{3})\right ]dx,\] which, upon setting \(x_{3}=y\), becomes \[\int_{x_{1}}^{x_{2}}\left [\frac{\partial f}{\partial y}(x,y)\right ]dx.\]
This completes the proof. \(\blacksquare\)
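The differentiation formula of this theorem (often called the Leibniz rule) can be checked symbolically for a concrete choice of \(f\), \(p\) and \(q\). The sketch below assumes Python with sympy; the integrand and limits are illustrative choices only.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x*y)
p, q = y**2, y + 1        # illustrative differentiable limits of integration

F = sp.integrate(f, (x, p, q))
direct = sp.diff(F, y)    # differentiate after integrating

# Right-hand side of the theorem (the Leibniz rule)
leibniz = (sp.integrate(sp.diff(f, y), (x, p, q))
           + f.subs(x, q)*sp.diff(q, y)
           - f.subs(x, p)*sp.diff(p, y))

# Compare numerically at an arbitrary interior point
val = (direct - leibniz).subs(y, sp.Rational(13, 10)).evalf()
assert abs(val) < 1e-9
```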
\begin{equation}{\label{d}}\tag{D}\mbox{}\end{equation}
Jacobian Determinant.
We consider the vector-valued function \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}\). The Jacobian matrix \(D{\bf f}({\bf x})\) is an \(n\times n\) matrix. Its determinant is called the Jacobian determinant and is denoted by \(J_{{\bf f}}({\bf x})\).
Theorem. (Multiplicative Rule for Jacobian Matrix and Determinant). Suppose that \({\bf g}\) is differentiable on an open subset \(T\) of \(\mathbb{R}^{n}\), and that \({\bf h}\) is differentiable on the image \({\bf g}(T)\). Then, the composition \({\bf k}={\bf h}\circ {\bf g}\) is differentiable on \(T\). Moreover, for each \({\bf t}\in T\), we have
\begin{equation}{\label{maeq411}}\tag{53} D{\bf k}({\bf t})=D{\bf h}({\bf g}({\bf t}))D{\bf g}({\bf t}) \end{equation} and \begin{equation}{\label{maeq412}}\tag{54} J_{\bf k}({\bf t})=J_{\bf h}({\bf g}({\bf t}))J_{\bf g}({\bf t}). \end{equation}
Proof. The chain rule in Theorem \ref{mat413} says that the composition \({\bf k}\) is differentiable on \(T\). The matrix form of the chain rule also says that the corresponding Jacobian matrices are related as shown in (\ref{maeq411}). From the theory of determinants in linear algebra, we know that
\[\mbox{det}(AB)=\mbox{det}(A)\cdot\mbox{det}(B).\]
Therefore, (\ref{maeq411}) implies (\ref{maeq412}), and the proof is complete. \(\blacksquare\)
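The multiplicative rule can be illustrated with two concrete smooth maps. The sketch below assumes Python with sympy; the maps \({\bf g}\) and \({\bf h}\) are arbitrary illustrative choices.

```python
import sympy as sp

s, t, u, v = sp.symbols('s t u v')

# Two arbitrary smooth maps g, h : R^2 -> R^2 (illustrative choices only)
g = sp.Matrix([s**2 - t, s*t])
h = sp.Matrix([sp.exp(u) + v, u*v**2])

k = h.subs({u: g[0], v: g[1]})                 # composition k = h o g

Dg = g.jacobian([s, t])
Dh = h.jacobian([u, v]).subs({u: g[0], v: g[1]})
Dk = k.jacobian([s, t])

# Chain rule in matrix form: D k = D h(g) * D g
assert (Dk - Dh*Dg).applyfunc(sp.simplify) == sp.zeros(2, 2)
# Taking determinants gives the multiplicative rule J_k = J_h(g) * J_g
assert sp.simplify(Dk.det() - Dh.det()*Dg.det()) == 0
```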
\begin{equation}{\label{ma140}}\tag{55}\mbox{}\end{equation}
Proposition \ref{ma140}. Let \(B=B({\bf a};r)\) be an \(n\)-dimensional open ball in \(\mathbb{R}^{n}\), and let \(\partial B\) denote its boundary, i.e.,
\[\partial B=\left\{{\bf x}:\parallel {\bf x}-{\bf a}\parallel =r\right\}.\]
Let \(\bar{B}=B\cup\partial B\) denote its closure. Suppose that the vector-valued function \({\bf f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}\) is continuous on \(\bar{B}\), and that all the partial derivatives \(\partial f_{i}/\partial x_{j}({\bf x})\) exist for \({\bf x}\in B\). We also assume that \({\bf f}({\bf x})\neq {\bf f}({\bf a})\) for \({\bf x}\in\partial B\), and that the Jacobian determinant \(J_{{\bf f}}({\bf x})\neq 0\) for each \({\bf x}\in B\). Then, the image \({\bf f}(B)\) contains an \(n\)-dimensional open ball with center at \({\bf f}({\bf a})\).
Proof. We define a real-valued function \(g\) on \(\partial B\) by
\[g({\bf x})=\parallel {\bf f}({\bf x})-{\bf f}({\bf a})\parallel\mbox{ for }{\bf x}\in\partial B.\]
Since \({\bf f}({\bf x})\neq {\bf f}({\bf a})\) for \({\bf x}\in\partial B\), it follows that \(g({\bf x})>0\) for all \({\bf x}\in\partial B\). Since \({\bf f}\) is continuous on \(\bar{B}\), we also see that \(g\) is continuous on \(\partial B\). Since \(\partial B\) is compact, the function \(g\) attains its minimum on \(\partial B\). In this case, let
\[m=\inf_{{\bf x}\in\partial B}g({\bf x})=\min_{{\bf x}\in\partial B}g({\bf x})=g({\bf x}^{*})\]
for some \({\bf x}^{*}\), which also says that \(m>0\). We shall prove
\begin{equation}{\label{ma139}}\tag{56} B\left ({\bf f}({\bf a});\frac{m}{2}\right )\subseteq {\bf f}(B). \end{equation}
Given any fixed \({\bf y}\in B({\bf f}({\bf a});m/2)\), we define a new function \(h\) on \(\bar{B}\) by
\[h({\bf x})=\parallel {\bf f}({\bf x})-{\bf y}\parallel\mbox{ for }{\bf x}\in\bar{B}.\]
Then \(h\) is continuous on the compact set \(\bar{B}\). This says that \(h\) attains its minimum on \(\bar{B}\). We claim that \(h\) attains its minimum on the open ball \(B\). Since \({\bf y}\in B({\bf f}({\bf a});m/2)\), we have
\[h({\bf a})=\parallel {\bf f}({\bf a})-{\bf y}\parallel<\frac{m}{2}.\]
This says that the minimum of \(h\) on \(\bar{B}\) must also be less than \(m/2\). For each \({\bf x}\in\partial B\), we have
\begin{align*} h({\bf x}) & =\parallel {\bf f}({\bf x})-{\bf y}\parallel=\parallel {\bf f}({\bf x})-{\bf f}({\bf a})-({\bf y}-{\bf f}({\bf a}))\parallel\\ & \geq\parallel {\bf f}({\bf x})-{\bf f}({\bf a})\parallel-\parallel{\bf y}-{\bf f}({\bf a})\parallel\\ & =g({\bf x})-\parallel{\bf y}-{\bf f}({\bf a})\parallel>g({\bf x})-\frac{m}{2}\geq\frac{m}{2}, \end{align*}
which says that the minimum of \(h\) cannot occur on the boundary \(\partial B\). Therefore, there exists \({\bf x}^{*}\in B\) satisfying
\[\inf_{{\bf x}\in\bar{B}}h({\bf x})=\min_{{\bf x}\in\bar{B}}h({\bf x})=h({\bf x}^{*}).\]
This also means that \(h^{2}\) attains its minimum at \({\bf x}^{*}\), which says \(\nabla h^{2}({\bf x}^{*})=0\), i.e.,
\[\frac{\partial h^{2}}{\partial x_{j}}({\bf x}^{*})=0\mbox{ for all }j=1,\cdots,n.\]
Since
\[h^{2}({\bf x})=\parallel {\bf f}({\bf x})-{\bf y}\parallel^{2}=\sum_{i=1}^{n}\left [f_{i}({\bf x})-y_{i}\right ]^{2},\]
we have \[0=\frac{\partial h^{2}}{\partial x_{j}}({\bf x}^{*})=2\sum_{i=1}^{n}\left [f_{i}({\bf x}^{*})-y_{i}\right ]\frac{\partial f_{i}}{\partial x_{j}}({\bf x}^{*}),\]
which is a system of linear equations whose coefficient matrix has determinant \(J_{{\bf f}}({\bf x}^{*})\neq 0\). Therefore, we obtain \(f_{i}({\bf x}^{*})=y_{i}\) for each \(i\), i.e., \({\bf f}({\bf x}^{*})={\bf y}\). This says \({\bf y}\in {\bf f}(B)\), which proves the inclusion (\ref{ma139}). This completes the proof. \(\blacksquare\)
\begin{equation}{\label{ma141}}\tag{57}\mbox{}\end{equation}
Proposition \ref{ma141}. Let \(A\) be an open subset of \(\mathbb{R}^{n}\), and let the vector-valued function \({\bf f}:A\rightarrow\mathbb{R}^{n}\) be continuous and have finite partial derivatives \(\partial f_{i}/\partial x_{j}\) on \(A\). Suppose that \({\bf f}\) is one-to-one on \(A\), and that the Jacobian determinant \(J_{{\bf f}}({\bf x})\neq 0\) for each \({\bf x}\in A\). Then \({\bf f}(A)\) is an open set in \(\mathbb{R}^{n}\).
Proof. Given \({\bf b}\in {\bf f}(A)\), there exists \({\bf a}\in A\) satisfying \({\bf b}={\bf f}({\bf a})\). Since \(A\) is open, there exists an open ball \(B({\bf a};r)\) satisfying \(B\equiv B({\bf a};r)\subset A\). Since \({\bf f}\) is one-to-one on \(A\) and \({\bf a}\not\in\partial B\), it follows that \({\bf f}({\bf x}) \neq {\bf f}({\bf a})\) for \({\bf x}\in\partial B\). Proposition \ref{ma140} says that \({\bf f}(B)\) contains an open ball with center \({\bf f}({\bf a})={\bf b}\). In other words, there exists an open ball \(B({\bf b};\bar{r})\) satisfying \(B({\bf b};\bar{r})\subset {\bf f}(B)\). Since \({\bf f}(B)\subseteq{\bf f}(A)\), it follows that \(B({\bf b};\bar{r})\subset {\bf f}(A)\). This shows that \({\bf f}(A)\) is open, and the proof is complete. \(\blacksquare\)
We recall Cramer's rule for solving a system of linear equations. Consider a system of linear equations of the form
\[\sum_{j=1}^{n}a_{ij}x_{j}=t_{i}\mbox{ for }i=1,\cdots,n,\]
where \(x_{1},\cdots,x_{n}\) represent the unknowns. This system has a unique solution if and only if the determinant of the coefficient matrix \(A=[a_{ij}]\), denoted by \(\mbox{det}[a_{ij}]\), is nonzero. Therefore, when \(\mbox{det}[a_{ij}]\neq 0\), Cramer's rule says that the unique solution is given by
\[x_{k}=\frac{\mbox{det}A_{k}}{\mbox{det}[a_{ij}]},\]
where \(A_{k}\) is the matrix obtained from \(A=[a_{ij}]\) by replacing the \(k\)th column of \(A=[a_{ij}]\) by \(t_{1},\cdots,t_{n}\). In particular, when each \(t_{i}=0\), we obtain each \(x_{k}=0\).
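Cramer's rule translates directly into code. The following minimal sketch (assuming Python with numpy; `cramer_solve` is a hypothetical helper name) builds each matrix \(A_{k}\) by column replacement and compares the result against a library solver.

```python
import numpy as np

def cramer_solve(A, t):
    """Solve A x = t by Cramer's rule (assumes det(A) != 0)."""
    A = np.asarray(A, dtype=float)
    t = np.asarray(t, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(t))
    for k in range(len(t)):
        Ak = A.copy()
        Ak[:, k] = t          # replace the k-th column by the right-hand side
        x[k] = np.linalg.det(Ak) / d
    return x

A = [[2.0, 1.0], [1.0, 3.0]]
t = [5.0, 10.0]
assert np.allclose(cramer_solve(A, t), np.linalg.solve(A, t))
```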
\begin{equation}{\label{map253}}\tag{58}\mbox{}\end{equation}
Proposition \ref{map253}. Let the vector-valued function \({\bf f}:S\rightarrow\mathbb{R}^{n}\) be defined on an open subset \(S\) of \(\mathbb{R}^{n}\) and have continuous partial derivatives \(\partial f_{i}/\partial x_{j}\) on \(S\). Suppose that the Jacobian determinant \(J_{{\bf f}}({\bf a})\neq 0\) for some \({\bf a}\in S\). Then, there exists an \(n\)-dimensional open ball \(B({\bf a};r)\) such that \({\bf f}\) is one-to-one on \(B({\bf a};r)\).
Proof. Let \({\bf z}_{i}\in S\) for \(i=1,\cdots,n\), and let
\[{\bf z}=\left ({\bf z}_{1},\cdots,{\bf z}_{n}\right )\in\mathbb{R}^{n^{2}}.\]
We define a real-valued function \(h\) by
\[h({\bf z})=\mbox{det}\left [\frac{\partial f_{i}}{\partial x_{j}}({\bf z}_{i})\right ].\]
Since each \(\partial f_{i}/\partial x_{j}\) is continuous on \(S\), it follows that \(h\) is continuous at each \({\bf z}\) formed in the above way. We take \({\bf z}_{i}={\bf a}\) for all \(i=1,\cdots,n\). Then, we have \(h({\bf z})=J_{{\bf f}}({\bf a})\neq 0\). Using the continuity, there exists an open ball \(B({\bf a};r)\) satisfying \(\mbox{det}[\partial f_{i}/\partial x_{j}({\bf z}_{i})]\neq 0\) for each \({\bf z}_{i}\in B({\bf a};r)\). We are going to prove that \({\bf f}\) is one-to-one on this open ball \(B({\bf a};r)\). Assume that \({\bf f}({\bf x})={\bf f}({\bf y})\) for some pair of points \({\bf x}\neq {\bf y}\) in \(B({\bf a};r)\). Since \(B({\bf a};r)\) is convex, the line segment \(L({\bf x},{\bf y})\) is contained in \(B({\bf a};r)\). Using the mean-value theorem for differentiation, we have
\[0=f_{i}({\bf y})-f_{i}({\bf x})=\nabla f_{i}({\bf z}_{i})\bullet ({\bf y}-{\bf x})\mbox{ for }i=1,\cdots,n\]
for some
\[{\bf z}_{i}\in L({\bf x},{\bf y})\subset B({\bf a};r)\mbox{ for }i=1,\cdots,n.\]
Therefore, we obtain a system of linear equations given by
\[\sum_{j=1}^{n}\left (y_{j}-x_{j}\right )\frac{\partial f_{i}}{\partial x_{j}}({\bf z}_{i})=0\mbox{ for }i=1,\cdots,n.\]
Since \(\mbox{det}[\partial f_{i}/\partial x_{j}({\bf z}_{i})]\neq 0\) for each \({\bf z}_{i}\in B({\bf a};r)\), it follows that \(y_{j}=x_{j}\) for all \(j\), which contradicts \({\bf x}\neq {\bf y}\). Therefore, \({\bf x}\neq {\bf y}\) implies \({\bf f}({\bf x})\neq {\bf f}({\bf y})\) on \(B({\bf a};r)\). This completes the proof. \(\blacksquare\)
We remark that Proposition \ref{map253} is local rather than global. The condition \(J_{{\bf f}}({\bf a})\neq 0\) only guarantees that \({\bf f}\) is one-to-one on some open ball centered at \({\bf a}\); it does not say that \({\bf f}\) is one-to-one on \(S\), even when \(J_{{\bf f}}({\bf a})\neq 0\) for every \({\bf a}\in S\). A function \(f:(S,d_{S})\rightarrow (T,d_{T})\) from one metric space \((S,d_{S})\) to another metric space \((T,d_{T})\) is called an open mapping when, for every open set \(A\) in \(S\), the image \(f(A)\) is open in \(T\).
\begin{equation}{\label{mat263}}\tag{59}\mbox{}\end{equation}
Theorem \ref{mat263}. Let the vector-valued function \({\bf f}:A\rightarrow\mathbb{R}^{n}\) be defined on an open subset \(A\) of \(\mathbb{R}^{n}\) and have continuous partial derivatives \(\partial f_{i}/\partial x_{j}\) on \(A\). Suppose that the Jacobian determinant \(J_{{\bf f}}({\bf a})\neq 0\) for each \({\bf a}\in A\). Then \({\bf f}\) is an open mapping.
Proof. Given any open subset \(S\) of \(A\) and any \({\bf a}\in S\), Proposition \ref{map253} says that there exists an open ball \(B({\bf a};r_{\bf a})\subseteq S\) such that \({\bf f}\) is one-to-one on \(B({\bf a};r_{\bf a})\). Proposition \ref{ma141} also says that \({\bf f}(B({\bf a};r_{\bf a}))\) is open in \(\mathbb{R}^{n}\). Since \(S=\bigcup_{{\bf a}\in S}B({\bf a};r_{\bf a})\), it follows that
\[{\bf f}(S)=\bigcup_{{\bf a}\in S}{\bf f}(B({\bf a};r_{\bf a})),\]
which says that \({\bf f}(S)\) is open, and the proof is complete. \(\blacksquare\)
If a vector-valued function \({\bf f}:S\rightarrow\mathbb{R}^{m}\) has continuous partial derivatives on \(S\), we say that \({\bf f}\) is continuously differentiable on \(S\). Proposition \ref{map253} says that a continuously differentiable function with a non-vanishing Jacobian determinant at a point \({\bf a}\) has a local inverse on some open ball centered at \({\bf a}\). The next theorem gives some local differentiability properties of this local inverse function.
\begin{equation}{\label{mat264}}\tag{60}\mbox{}\end{equation}
Theorem \ref{mat264}. (Inverse Function Theorem for Vector-Valued Function). Suppose that the vector-valued function \({\bf f}:S\rightarrow\mathbb{R}^{n}\) defined on an open subset \(S\) of \(\mathbb{R}^{n}\) is continuously differentiable on \(S\). Let \(T={\bf f}(S)\). We also assume that the Jacobian determinant \(J_{{\bf f}}({\bf a})\neq 0\) for some point \({\bf a}\in S\). Then there exist two open sets \(X\subseteq S\) and \(Y\subseteq T\) and a uniquely determined vector-valued function \({\bf g}\) such that the following properties are satisfied.
(i) We have \({\bf a}\in X\) and \({\bf f}({\bf a})\in Y\).
(ii) We have \(Y={\bf f}(X)\).
(iii) \({\bf f}\) is one-to-one on \(X\).
(iv) \({\bf g}\) is defined on \(Y\) satisfying \({\bf g}(Y)=X\) and \({\bf g}({\bf f}({\bf x}))={\bf x}\) for every \({\bf x}\in X\).
(v) \({\bf g}\) is continuously differentiable on \(Y\).
Proof. Since each \(\partial f_{i}/\partial x_{j}\) is continuous on \(S\), it follows that the function \(J_{{\bf f}}\) is continuous on \(S\). Since \(J_{{\bf f}}({\bf a})\neq 0\), there exists an open ball \(B({\bf a};r)\) such that \(J_{{\bf f}}({\bf x})\neq 0\) for all \({\bf x}\in B({\bf a};r)\). Proposition \ref{map253} says that there exists an open ball \(B({\bf a};\bar{r})\) such that \(B({\bf a};\bar{r})\subseteq B({\bf a};r)\) and \({\bf f}\) is one-to-one on \(B({\bf a};\bar{r})\). Let \(B({\bf a};\hat{r})\) be an open ball with \(\hat{r}<\bar{r}\). Proposition \ref{ma140} also says that \({\bf f}(B({\bf a};\hat{r}))\) contains an open ball \(B({\bf f}({\bf a});r^{*})\). We write \(Y=B({\bf f}({\bf a});r^{*})\), and let \(X={\bf f}^{-1}(Y)\cap B({\bf a};\hat{r})\). Since \({\bf f}\) is continuous on \(S\), the inverse image \({\bf f}^{-1}(Y)\) is open, and hence \(X\) is open. Since the closure \(\bar{B}({\bf a};\hat{r})\) of \(B({\bf a};\hat{r})\) is compact and \({\bf f}\) is one-to-one and continuous on \(\bar{B}({\bf a};\hat{r})\), Proposition \ref{map453} says that there exists a function \({\bf g}\) (i.e., the inverse function \({\bf f}^{-1}\)) defined on \({\bf f}(\bar{B}({\bf a};\hat{r}))\) satisfying \({\bf g}({\bf f}({\bf x}))={\bf x}\) for all \({\bf x}\in\bar{B}({\bf a};\hat{r})\). Since \(X\subseteq\bar{B}({\bf a};\hat{r})\) and \(Y\subseteq {\bf f}(\bar{B}({\bf a};\hat{r}))\), this proves parts (i)-(iv). The uniqueness follows from part (iv). To prove part (v), we define a real-valued function \(h\) by
\[h({\bf z})=\mbox{det}\left [\frac{\partial f_{i}}{\partial x_{j}}({\bf z}_{i})\right ],
\mbox{ where }{\bf z}=\left ({\bf z}_{1},\cdots,{\bf z}_{n}\right )\in\mathbb{R}^{n^{2}}.\] Using the argument in the proof of Proposition \ref{map253}, there exists an open ball \(B({\bf a};\epsilon)\) such that \(h({\bf z})\neq 0\) whenever \({\bf z}_{i}\in B({\bf a};\epsilon)\) for \(i=1,\cdots,n\). We can assume that the open ball \(B({\bf a};\bar{r})\) described above was taken to satisfy \(B({\bf a};\bar{r})\subseteq B({\bf a};\epsilon)\). Then, we have \(\bar{B}({\bf a};\hat{r})\subset B({\bf a};\epsilon)\) and \(h({\bf z})\neq 0\) whenever \({\bf z}_{i}\in\bar{B}({\bf a};\hat{r})\) for \(i=1,\cdots,n\). We write \({\bf g}=(g_{1},\cdots,g_{n})\) and shall show that each \(g_{k}\) is continuously differentiable on \(Y\) for \(k=1,\cdots,n\). To prove the existence of \(\partial g_{k}/\partial x_{r}\) on \(Y\), fix any \({\bf y}\in Y\). Since \(Y\) is open, we have \({\bf y}+t{\bf e}_{r}\in Y\) for sufficiently small \(t\), where \({\bf e}_{r}\) denotes the \(r\)th unit coordinate vector. Therefore, we consider
\[\frac{g_{k}({\bf y}+t{\bf e}_{r})-g_{k}({\bf y})}{t}.\]
Let \({\bf x}={\bf g}({\bf y})\), and let \(\bar{\bf x}={\bf g}({\bf y}+t{\bf e}_{r})\). Then \({\bf x},\bar{\bf x}\in X\) satisfying \({\bf f}(\bar{\bf x})-{\bf f}({\bf x})=t{\bf e}_{r}\), which also says
\[f_{i}(\bar{\bf x})-f_{i}({\bf x})=\left\{\begin{array}{ll} 0 & \mbox{if \(i\neq r\)}\\ t & \mbox{if \(i=r\)}. \end{array}\right .\]
Using the mean-value theorem for differentiation, we have
\begin{align} \nabla f_{i}({\bf z}_{i})\bullet\frac{\bar{\bf x}-{\bf x}}{t} & =\frac{f_{i}(\bar{\bf x})-f_{i}({\bf x})}{t}\nonumber\\ & =\left\{\begin{array}{ll} 0 & \mbox{if \(i\neq r\)}\\ 1 & \mbox{if \(i=r\)}. \end{array}\right .\mbox{ for }i=1,\cdots,n, \label{ma142}\tag{61}\end{align}
where each \({\bf z}_{i}\) lies in the line segment joining \({\bf x}\) and \(\bar{\bf x}\), i.e., \({\bf z}_{i}\in B({\bf a};\hat{r})\). Since
\[\mbox{det}\left [\frac{\partial f_{i}}{\partial x_{j}}({\bf z}_{i})\right ]=h({\bf z})\neq 0,\]
from (\ref{ma142}), we obtain a system of \(n\) linear equations in the \(n\) unknowns \(p_{k}\equiv (\bar{x}_{k}-x_{k})/t\), which has a unique solution. Using Cramer's rule, we have
\begin{align*} \frac{g_{k}({\bf y}+t{\bf e}_{r})-g_{k}({\bf y})}{t} & =\frac{\bar{x}_{k}-x_{k}}{t}\\ & =p_{k} =\frac{\mbox{det}A_{k}}{\mbox{det}\left [\frac{\partial f_{i}}{\partial x_{j}}({\bf z}_{i})\right ]},\end{align*}
where \(A_{k}\) is the matrix obtained from the matrix \([\partial f_{i}/\partial x_{j}({\bf z}_{i})]\) by replacing the \(k\)th column of \([\partial f_{i}/\partial x_{j}({\bf z}_{i})]\) by \({\bf e}_{r}\). Since \({\bf g}\) is continuous, we have
\[\bar{\bf x}={\bf g}({\bf y}+t{\bf e}_{r})\rightarrow {\bf g}({\bf y})={\bf x}\mbox{ as }t\rightarrow 0.\]
Since \({\bf z}_{i}\) lies in the line segment joining \({\bf x}\) and \(\bar{\bf x}\), we also have
\[{\bf z}_{i}\rightarrow {\bf x}\mbox{ as }t\rightarrow 0.\]
Since \({\bf x}\in X\subset B({\bf a};\hat{r})\), it follows
\[\mbox{det}\left [\frac{\partial f_{i}}{\partial x_{j}}({\bf x})\right ]\neq 0.\]
Let \(\bar{A}_{k}\) be the matrix obtained from the matrix \([\partial f_{i}/\partial x_{j}({\bf x})]\) by replacing the \(k\)th column of \([\partial f_{i}/\partial x_{j}({\bf x})]\) by \({\bf e}_{r}\). Therefore, we obtain
\begin{align*} \frac{\partial g_{k}}{\partial x_{r}}({\bf y}) & =\lim_{t\rightarrow 0} \frac{g_{k}({\bf y}+t{\bf e}_{r})-g_{k}({\bf y})}{t}\\ & =\lim_{t\rightarrow 0}\frac{\mbox{det}A_{k}} {\mbox{det}\left [\frac{\partial f_{i}}{\partial x_{j}}({\bf z}_{i})\right ]}\\ & =\frac{\mbox{det}\bar{A}_{k}} {\mbox{det}\left [\frac{\partial f_{i}}{\partial x_{j}}({\bf x})\right ]},\end{align*}
which shows the existence of \(\partial g_{k}/\partial x_{r}({\bf y})\) for each \({\bf y}\in Y\). Since the entries of \(\bar{A}_{k}\) are the partial derivatives \(\partial f_{i}/\partial x_{j}({\bf x})\), and since each \(\partial f_{i}/\partial x_{j}\) and \({\bf g}\) are continuous, it follows that the partial derivatives \(\partial g_{k}/\partial x_{r}\) are also continuous. This completes the proof. \(\blacksquare\)
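The conclusion of the theorem can be checked on the polar-coordinate map, whose local inverse is known explicitly: the Jacobian matrix of the local inverse is the matrix inverse of the Jacobian matrix. A sketch assuming Python with sympy follows; the restriction to \(x,y>0\) keeps the inverse single-valued.

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x, y = sp.symbols('x y', positive=True)

# f(r, theta) = (r cos theta, r sin theta) with local inverse g on x, y > 0
f = sp.Matrix([r*sp.cos(th), r*sp.sin(th)])
g = sp.Matrix([sp.sqrt(x**2 + y**2), sp.atan2(y, x)])

Df = f.jacobian([r, th])
Dg = g.jacobian([x, y]).subs({x: f[0], y: f[1]})

# D g(f(t)) should be the inverse of the matrix D f(t)
delta = (Dg - Df.inv()).applyfunc(sp.simplify)
assert delta == sp.zeros(2, 2)
```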
The proof of Theorem \ref{mat264} also provides a method to compute the derivatives \(\partial g_{k}/\partial x_{r}({\bf y})\). From (\ref{ma142}), we have
\[\lim_{t\rightarrow 0}\nabla f_{i}({\bf z}_{i})=\nabla f_{i}({\bf x})\mbox{ and } \lim_{t\rightarrow 0}\frac{\bar{x}_{k}-x_{k}}{t}=\frac{\partial g_{k}}{\partial x_{r}}({\bf y}).\]
Therefore, for each fixed \(r\), we obtain the following system of linear equations
\[\nabla f_{i}({\bf x})\bullet\left (\frac{\partial g_{1}}{\partial x_{r}}({\bf y}),\cdots,\frac{\partial g_{n}}{\partial x_{r}}({\bf y})\right )=\left\{\begin{array}{ll} 0 & \mbox{if \(i\neq r\)}\\ 1 & \mbox{if \(i=r\)}. \end{array}\right .\mbox{ for }i=1,\cdots,n.\]
Therefore, we can use Cramer’s rule to obtain the partial derivatives
\[\frac{\partial g_{1}}{\partial x_{r}}({\bf y}),\cdots,\frac{\partial g_{n}}{\partial x_{r}}({\bf y})\]
for \(r=1,\cdots,n\). Suppose that we have an equation of the form \(f(x,t)=0\). The problem is to decide whether this equation determines \(x\) as a function of \(t\). When this is true, we have \(x=g(t)\) for some function \(g\), and we say that \(g\) is defined implicitly by \(f(x,t)=0\). In general, when we have a system of several equations involving several variables, we ask whether we can solve these equations for some of the variables in terms of the remaining variables. The implicit function theorem gives conditions under which this is possible. The equation of a curve in the \(xy\)-plane can be expressed either in an explicit form \(y=f(x)\) or in an implicit form \(F(x,y)=0\). When we adopt the form \(F(x,y)=0\), this does not necessarily represent a function. For example, the equation \(F(x,y)=x^{2}+y^{2}-5=0\) does not define \(y\) as a function of \(x\). The equation \(F(x,y)=0\) simply represents a relation: the set of all pairs \((x,y)\) satisfying \(F(x,y)=0\). The question is when the equation \(F(x,y)=0\) also defines a function. In other words, when can the equation \(F(x,y)=0\) be solved explicitly for \(y\) in terms of \(x\), yielding a unique solution? The implicit function theorem deals with this question locally. It means that, given a point \((x_{0},y_{0})\) satisfying \(F(x_{0},y_{0})=0\), under some suitable conditions, there will be a neighborhood of \((x_{0},y_{0})\) such that, in this neighborhood, the relation defined by \(F(x,y)=0\) is indeed a function. In general, the theorem treats a system of \(n\) equations in \(n+k\) variables as given below:
\[f_{r}(x_{1},\cdots ,x_{n};t_{1},\cdots ,t_{k})=0\mbox{ for }r=1,\cdots ,n.\]
This system can be solved for \(x_{1},\cdots ,x_{n}\) in terms of \(t_{1},\cdots ,t_{k}\) under some suitable conditions. For convenience, we write \(({\bf x},{\bf t})\in\mathbb{R}^{n+k}\), where \({\bf x}=(x_{1},\cdots ,x_{n})\in\mathbb{R}^{n}\) and \({\bf t}=(t_{1},\cdots ,t_{k})\in\mathbb{R}^{k}\). The implicit function theorem is presented below.
\begin{equation}{\label{ma138}}\tag{62}\mbox{}\end{equation}
Theorem \ref{ma138}. (Implicit Function Theorem for Vector-Valued Function). Let \({\bf f}:S\rightarrow\mathbb{R}^{n}\) be a vector-valued function defined on an open subset \(S\) of \(\mathbb{R}^{n+k}\) such that \({\bf f}\) is continuously differentiable on \(S\). Suppose that \({\bf f}({\bf x}_{0};{\bf t}_{0})={\bf 0}\) and the \(n\times n\) determinant
\begin{equation}{\label{ma143}}\tag{63} \mbox{det}\left [\frac{\partial f_{i}}{\partial x_{j}}({\bf x}_{0};{\bf t}_{0})\right ]\neq 0. \end{equation}
Then, there exists a \(k\)-dimensional open set \(T_{0}\) containing \({\bf t}_{0}\) and exactly one vector-valued function \({\bf g}:T_{0}\rightarrow\mathbb{R}^{n}\) defined on \(T_{0}\) such that the following properties are satisfied.
- \({\bf g}\) is continuously differentiable on \(T_{0}\).
- We have \({\bf g}({\bf t}_{0})={\bf x}_{0}\).
- We have \({\bf f}({\bf g}({\bf t});{\bf t})={\bf 0}\) for each \({\bf t}\in T_{0}\).
Proof. We define a vector-valued function
\[{\bf F}=\left (F_{1},\cdots,F_{n};F_{n+1},\cdots,F_{n+k}\right )\]
on \(S\) into \(\mathbb{R}^{n+k}\) given by
\[F_{m}({\bf x};{\bf t})=f_{m}({\bf x};{\bf t})\mbox{ for }1\leq m\leq n\]
and
\[F_{n+m}({\bf x};{\bf t})=t_{m}\mbox{ for }1\leq m\leq k.\]
In this case, we can write \({\bf F}=({\bf f};{\bf I})\), where \({\bf f}=(f_{1},\cdots,f_{n})\) and \({\bf I}\) is an identity function given by \({\bf I}({\bf t})={\bf t}\) for each \({\bf t}\in\mathbb{R}^{k}\). Since
\[\frac{\partial F_{n+j}}{\partial x_{i}}=0\mbox{ for }1\leq i\leq n\mbox{ and }1\leq j\leq k,\]
it is not difficult to see that the Jacobian determinant \(J_{\bf F}({\bf x};{\bf t})\) has the same value as the \(n\times n\) determinant \(\mbox{det}[\partial f_{i}/\partial x_{j}({\bf x};{\bf t})]\), which also says \(J_{\bf F}({\bf x}_{0};{\bf t}_{0})\neq 0\) by referring to (\ref{ma143}). We also have \[{\bf F}({\bf x}_{0};{\bf t}_{0})=({\bf f}({\bf x}_{0};{\bf t}_{0});{\bf I}({\bf t}_{0}))=({\bf 0};{\bf t}_{0}).\] Therefore, using the inverse function theorem (Theorem \ref{mat264}), there exist open sets \(X\) and \(Y\) containing \(({\bf x}_{0};{\bf t}_{0})\) and \(({\bf 0};{\bf t}_{0})\), respectively, such that \({\bf F}\) is one-to-one on \(X\) satisfying \(Y={\bf F}(X)\). There also exists a local inverse function \({\bf G}:Y\rightarrow X\) such that \({\bf G}\) is continuously differentiable on \(Y\) satisfying \[{\bf G}({\bf F}({\bf x};{\bf t}))=({\bf x};{\bf t}).\] More precisely, we have \({\bf G}=({\bf v};{\bf w})\), where \({\bf v}=(v_{1},\cdots,v_{n})\) is a vector-valued function defined on \(Y\) and \({\bf w}=(w_{1},\cdots,w_{k})\) is also a vector-valued function defined on \(Y\). Since \({\bf G}({\bf F}({\bf x};{\bf t}))=({\bf x};{\bf t})\), it follows that
\[{\bf v}({\bf F}({\bf x};{\bf t}))={\bf x}\mbox{ and }{\bf w}({\bf F}({\bf x};{\bf t}))={\bf t}.\]
Since \({\bf F}\) is one-to-one on \(X\) and the inverse image \({\bf F}^{-1}(Y)\) contains \(X\), given any point \(({\bf x};{\bf t})\in Y\), we have \(({\bf x};{\bf t})={\bf F}(\bar{\bf x};\bar{\bf t})\) for some \((\bar{\bf x};\bar{\bf t})\in X\). Since \({\bf F}=({\bf f};{\bf I})\), we also have \({\bf t}=\bar{\bf t}\). Therefore, we obtain
\[{\bf v}({\bf x};{\bf t})={\bf v}({\bf F}(\bar{\bf x};{\bf t}))=\bar{\bf x}\mbox{ and } {\bf w}({\bf x};{\bf t})={\bf w}({\bf F}(\bar{\bf x};{\bf t}))={\bf t}.\]
Therefore, given any \(({\bf x};{\bf t})\in Y\), we have \({\bf G}({\bf x};{\bf t})=(\bar{\bf x};{\bf t})\), where \(\bar{\bf x}\) satisfies \(({\bf x};{\bf t}) ={\bf F}(\bar{\bf x};{\bf t})\), which implies \begin{equation}{\label{ma144}}\tag{64} {\bf F}({\bf v}({\bf x};{\bf t});{\bf t})=({\bf x};{\bf t})\mbox{ for each } ({\bf x};{\bf t})\in Y. \end{equation} Now, we are going to define the set \(T_{0}\) and the function \({\bf g}\). Let
\[T_{0}=\left\{{\bf t}\in\mathbb{R}^{k}:({\bf 0};{\bf t})\in Y\right\}\]
and define \({\bf g}({\bf t})={\bf v}({\bf 0};{\bf t})\) for each \({\bf t}\in T_{0}\). We see that the set \(T_{0}\) is open in \(\mathbb{R}^{k}\). Since \({\bf G}\) is continuously differentiable on \(Y\) and the components of \({\bf g}\) are taken from the components of \({\bf G}\), we also see that \({\bf g}\) is continuously differentiable on \(T_{0}\). Since \(({\bf 0};{\bf t}_{0})={\bf F}({\bf x}_{0};{\bf t}_{0})\), it follows that
\[{\bf g}({\bf t}_{0})={\bf v}({\bf 0};{\bf t}_{0})={\bf x}_{0}.\]
The equation (\ref{ma144}) says that \({\bf f}({\bf v}({\bf x};{\bf t});{\bf t})={\bf x}\) for each \(({\bf x};{\bf t})\in Y\). By taking \({\bf x}={\bf 0}\), we obtain \({\bf f}({\bf g}({\bf t});{\bf t})={\bf 0}\). This proves the desired three properties. It remains to show that the function \({\bf g}\) is unique. Suppose that there exists another function \({\bf h}\) satisfying \({\bf f}({\bf h}({\bf t});{\bf t})={\bf 0}\) for each \({\bf t}\in T_{0}\). Then, we have
\[{\bf F}({\bf g}({\bf t});{\bf t})=({\bf 0};{\bf t})={\bf F}({\bf h}({\bf t});{\bf t}).\]
Since \({\bf F}\) is one-to-one on \(X\), it follows that \(({\bf g}({\bf t});{\bf t})=({\bf h}({\bf t});{\bf t})\), which also implies \({\bf g}({\bf t})={\bf h}({\bf t})\) for each \({\bf t}\in T_{0}\). This completes the proof. \(\blacksquare\)
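The circle relation \(F(x,y)=x^{2}+y^{2}-5=0\) discussed earlier illustrates the local nature of the theorem. Near the point \((1,2)\) we have \(\partial F/\partial y\neq 0\), so the relation locally defines \(y=g(x)\), and the implicit derivative \(-F_{x}/F_{y}\) agrees with differentiating the explicit branch. A sketch assuming Python with sympy:

```python
import sympy as sp

x, y = sp.symbols('x y')
F = x**2 + y**2 - 5

# Near (1, 2), F_y = 2y = 4 != 0, so F = 0 locally defines y = g(x)
assert F.subs({x: 1, y: 2}) == 0
assert sp.diff(F, y).subs({x: 1, y: 2}) != 0

# The local solution branch and its derivative via -F_x/F_y
g = sp.sqrt(5 - x**2)          # branch with g(1) = 2
dydx = (-sp.diff(F, x)/sp.diff(F, y)).subs(y, g)
assert sp.simplify(dydx - sp.diff(g, x)) == 0
```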
An important result for multiple integrals is the change-of-variables formula. The one-dimensional case for the Riemann integral is shown in Theorem \ref{mat112}, where the formula is given by
\[\int_{a}^{b}f(g(t))g'(t)dt=\int_{g(a)}^{g(b)}f(x)dx.\]
We are going to generalize the above formula to multiple integrals by means of coordinate transformations.
Definition. Let \(T\) be an open subset of \(\mathbb{R}^{n}\). A vector-valued function \({\bf g}:T\rightarrow\mathbb{R}^{n}\) defined on \(T\) is called a coordinate transformation on \(T\) when the following conditions are satisfied.
- \({\bf g}\) is continuously differentiable on \(T\).
- \({\bf g}\) is one-to-one on \(T\).
- The Jacobian determinant \(J_{\bf g}({\bf t})\neq 0\) for all \({\bf t}\in T\). \(\sharp\)
By Proposition \ref{map253}, the first and third conditions imply that \({\bf g}\) is locally one-to-one near each point at which its Jacobian determinant does not vanish. The second condition assumes that \({\bf g}\) is globally one-to-one on \(T\). This guarantees the existence of a global inverse \({\bf g}^{-1}\) that is defined and one-to-one on the image \({\bf g}(T)\). The first and third conditions together imply that \({\bf g}\) is an open mapping by Theorem \ref{mat263}. We also see that \({\bf g}^{-1}\) is continuously differentiable on \({\bf g}(T)\) by Theorem \ref{mat264}. The coordinate transformation \({\bf g}\) and its inverse \({\bf g}^{-1}\) set up a one-to-one correspondence between the open subsets of \(T\) and the open subsets of \({\bf g}(T)\), and also between the compact subsets of \(T\) and the compact subsets of \({\bf g}(T)\). We are going to provide some examples of commonly used coordinate transformations.
Example. (Polar coordinates in \(\mathbb{R}^{2}\)). We take
\[T=\left\{(t_{1},t_{2}):t_{1}>0\mbox{ and }0<t_{2}<2\pi\right\}\]
and consider the function \({\bf g}=(g_{1},g_{2}):T\rightarrow\mathbb{R}^{2}\) defined by
\[g_{1}(t_{1},t_{2})=t_{1}\cos t_{2}\mbox{ and }g_{2}(t_{1},t_{2})=t_{1}\sin t_{2}.\]
We usually write \((t_{1},t_{2})\) as \((r,\theta )\). The coordinate transformation \({\bf g}\) maps each point \((r,\theta )\in T\) onto the point \((x,y)\) in \({\bf g}(T)\) given by the following formulas
\[x=r\cos\theta\mbox{ and }y=r\sin\theta .\]
The Jacobian determinant is given by \[J_{\bf g}(r,\theta )=\left |\begin{array}{cc} \cos\theta & \sin\theta\\ -r\sin\theta & r\cos\theta \end{array}\right |=r.\]
Example. (Cylindrical coordinates in \(\mathbb{R}^{3}\)). We write \((t_{1},t_{2},t_{3})=(r,\theta ,z)\) and take
\[T=\left\{(r,\theta ,z):r>0,0<\theta <2\pi\mbox{ and }z\in\mathbb{R}\right\}.\]
The coordinate transformation \({\bf g}\) maps each point \((r,\theta ,z)\) onto the point \((x,y,z)\) in the image \({\bf g}(T)\) given by the following formulas
\[x=r\cos\theta,\quad y=r\sin\theta\mbox{ and }z=z.\]
The Jacobian determinant is given by
\[J_{\bf g}(r,\theta ,z)=\left |\begin{array}{ccc} \cos\theta & \sin\theta & 0\\ -r\sin\theta & r\cos\theta & 0\\ 0 & 0 & 1 \end{array}\right |=r.\]
Example. (Spherical coordinates in \(\mathbb{R}^{3}\)). We write \((t_{1},t_{2},t_{3})=(\rho ,\theta ,\phi )\) and take
\[T=\left\{(\rho ,\theta ,\phi ):\rho >0,0<\theta <2\pi\mbox{ and }0<\phi <\pi\right\}.\]
The coordinate transformation \({\bf g}\) maps each point \((\rho ,\theta ,\phi )\) onto the point \((x,y,z)\) in the image \({\bf g}(T)\) given by the following formulas
\[x=\rho\cos\theta\sin\phi,\quad y=\rho\sin\theta\sin\phi\mbox{ and }z=\rho\cos\phi.\]
The Jacobian determinant is given by
\[J_{\bf g}(\rho ,\theta ,\phi )=\left |\begin{array}{ccc} \cos\theta\sin\phi & \sin\theta\sin\phi & \cos\phi\\ -\rho\sin\theta\sin\phi & \rho\cos\theta\sin\phi & 0\\ \rho\cos\theta\cos\phi & \rho\sin\theta\cos\phi & -\rho\sin\phi \end{array}\right |=-\rho^{2}\sin\phi .\]
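As a quick numerical sanity check (our addition, not part of the original text), the following Python sketch approximates the Jacobian matrix of the spherical transformation by central finite differences, with row \(i\) holding the partial derivatives with respect to \(t_{i}\) as above, and compares its determinant with \(-\rho^{2}\sin\phi\). The helper names are ours.

```python
import math

def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def spherical(t):
    # g(rho, theta, phi) = (x, y, z) as in the example above.
    rho, theta, phi = t
    return (rho * math.cos(theta) * math.sin(phi),
            rho * math.sin(theta) * math.sin(phi),
            rho * math.cos(phi))

def jacobian_det(g, t, h=1e-6):
    # Row i holds the central-difference partials of g with respect to t_i.
    rows = []
    for i in range(3):
        tp, tm = list(t), list(t)
        tp[i] += h
        tm[i] -= h
        gp, gm = g(tp), g(tm)
        rows.append([(gp[k] - gm[k]) / (2 * h) for k in range(3)])
    return det3(rows)

rho, theta, phi = 1.3, 0.7, 1.1
numeric = jacobian_det(spherical, (rho, theta, phi))
exact = -rho**2 * math.sin(phi)
print(numeric, exact)
```

The same finite-difference helper works for the polar and cylindrical examples as well.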
\begin{equation}{\label{ma145}}\tag{65}\mbox{}\end{equation}
Theorem \ref{ma145}. Let \(T\) be an open subset of \(\mathbb{R}^{n}\), and let \({\bf g}\) be a coordinate transformation on \(T\). Let \(f\) be a real-valued function defined on the image \({\bf g}(T)\). Suppose that the Lebesgue integral
\[\int_{{\bf g}(T)}f({\bf x})d{\bf x}\]
exists. Then, the Lebesgue integral
\[\int_{T}f({\bf g}({\bf t}))|J_{\bf g}({\bf t})|d{\bf t}\] also exists and we have the formula \[\int_{{\bf g}(T)}f({\bf x})d{\bf x}=\int_{T}f({\bf g}({\bf t}))|J_{\bf g}({\bf t})|d{\bf t}.\]
The concept of the Lebesgue integral extends that of the Riemann integral, so Theorem \ref{ma145} remains valid for multiple Riemann integrals. Some examples involving multiple Riemann integrals are provided below.
Example. Suppose that we wish to evaluate
\[\int\!\!\!\!\!\int_{\Omega}(x^{2}+y^{2})d(x,y),\]
where \(\Omega\) is the unit disc given by
\[\Omega =\left\{(x,y):x^{2}+y^{2}\leq 1\right\}.\]
We change the variables \((x,y)\) into the polar coordinates \((r,\theta )\) by \(x=r\cos\theta\) and \(y=r\sin\theta\). Then, the Jacobian determinant \(J_{\bf g}\) is \(r\) and the region \(\Omega\) is transformed into the following region
\[\Gamma =\left\{(r,\theta ):0\leq r\leq 1\mbox{ and }0\leq\theta\leq 2\pi\right\}.\]
Therefore, we obtain
\begin{align*} \int\!\!\!\!\!\int_{\Omega}(x^{2}+y^{2})d(x,y) & =\int\!\!\!\!\!\int_{\Gamma}r^{2}\cdot rd(r,\theta )\\ & =\int_{0}^{1}\left [\int_{0}^{2\pi}r^{3}d\theta\right ]dr =\int_{0}^{1}2\pi r^{3}dr =\frac{\pi}{2}. \end{align*}
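As a numerical cross-check (our addition), the transformed integrand \(r^{2}\cdot r=r^{3}\) does not depend on \(\theta\), so the \(\theta\)-integral contributes a factor \(2\pi\) and a one-dimensional midpoint rule in \(r\) suffices:

```python
import math

# Midpoint rule in r for the transformed integral over Gamma:
# integrand r^3 on 0 <= r <= 1, times the factor 2*pi from theta.
n = 4000
dr = 1.0 / n
approx = 2 * math.pi * sum(((i + 0.5) * dr) ** 3 for i in range(n)) * dr
print(approx, math.pi / 2)  # both close to 1.5707963...
```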
Example. The function \(f(x)=e^{-x^{2}}\) has no elementary anti-derivative. Nevertheless, using polar coordinates, we can obtain
\[\int_{-\infty}^{\infty} e^{-x^{2}}dx=\sqrt{\pi}.\]
We consider the following integrals
\begin{align*} \left (\int_{-\infty}^{\infty} e^{-x^{2}}dx\right )^{2} & =\left (\int_{-\infty}^{\infty} e^{-x^{2}}dx\right )\left (\int_{-\infty}^{\infty} e^{-x^{2}}dx\right )\\ & =\left (\int_{-\infty}^{\infty} e^{-x^{2}}dx\right )\left (\int_{-\infty}^{\infty} e^{-y^{2}}dy\right )\\ & =\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}e^{-(x^{2}+y^{2})}dxdy. \end{align*}
Let \(x=r\cos\theta\) and \(y=r\sin\theta\). We have
\begin{align*} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^{2}+y^{2})}dxdy & =\int_{0}^{2\pi}\int_{0}^{\infty} re^{-r^{2}}drd\theta\\ & =\int_{0}^{2\pi}\left [-\frac{1}{2}e^{-r^{2}}\right ]_{0}^{\infty}d\theta =\frac{1}{2}\int_{0}^{2\pi}d\theta=\pi . \end{align*}
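The computation can also be verified numerically (our addition): truncating the Gaussian integral to \([-6,6]\) changes its value by less than \(e^{-36}\), and a midpoint rule then recovers \(\pi\) for the square:

```python
import math

# Midpoint rule for the Gaussian integral, truncated to [-6, 6];
# the neglected tails are smaller than e^{-36}.
n = 20000
a, b = -6.0, 6.0
h = (b - a) / n
one_d = sum(math.exp(-((a + (i + 0.5) * h) ** 2)) for i in range(n)) * h
print(one_d ** 2, math.pi)
```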
Example. We wish to evaluate
\[\int\!\!\!\!\!\int_{\Omega} (x+y)^{2}d(x,y),\]
where \(\Omega\) is the parallelogram bounded by the lines
\[x+y=0,\quad x+y=1,\quad 2x-y=0\mbox{ and }2x-y=3.\]
The boundaries suggest that we set \(u=x+y\) and \(v=2x-y\). Then, we can solve them to obtain
\[x=x(u,v)=\frac{u+v}{3}\mbox{ and }y=y(u,v)=\frac{2u-v}{3}.\]
The Jacobian determinant is given by
\[J_{\bf g}(u,v)=\left |\begin{array}{cc} {\displaystyle \frac{\partial}{\partial u}\left (\frac{u+v}{3}\right )} & {\displaystyle \frac{\partial}{\partial u}\left (\frac{2u-v}{3}\right )}\\ {\displaystyle \frac{\partial}{\partial v}\left (\frac{u+v}{3}\right )} & {\displaystyle \frac{\partial}{\partial v}\left (\frac{2u-v}{3}\right )} \end{array}\right | =\left |\begin{array}{cc} {\displaystyle \frac{1}{3}} & {\displaystyle \frac{2}{3}}\\ {\displaystyle \frac{1}{3}} & -{\displaystyle \frac{1}{3}} \end{array}\right | =-\frac{1}{3}.\]
The region \(\Omega\) is transformed into the following region
\[\Gamma =\left\{(u,v):0\leq u\leq 1\mbox{ and }0\leq v\leq 3\right\}.\]
Therefore, we obtain
\[\int\!\!\!\!\!\int_{\Omega} (x+y)^{2}d(x,y)=\int\!\!\!\!\!\int_{\Gamma}u^{2}|J_{\bf g}(u,v)|d(u,v)= \frac{1}{3}\int_{0}^{3}\left [\int_{0}^{1} u^{2}du\right ]dv=\frac{1}{3}.\]
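To see that the change of variables is doing the right thing, here is a brute-force check (our addition) that integrates \((x+y)^{2}\) directly over the parallelogram, using an indicator function on a bounding box of the region; the loose tolerance reflects the slow convergence near the boundary.

```python
import math

# Midpoint-rule integration of (x+y)^2 over the parallelogram
# 0 <= x+y <= 1, 0 <= 2x-y <= 3.  The vertices are (0,0), (1/3,2/3),
# (4/3,-1/3), (1,-1), so [0, 4/3] x [-1, 2/3] is a bounding box.
n = 1200
x0, x1, y0, y1 = 0.0, 4.0 / 3.0, -1.0, 2.0 / 3.0
hx, hy = (x1 - x0) / n, (y1 - y0) / n
total = 0.0
for i in range(n):
    x = x0 + (i + 0.5) * hx
    for j in range(n):
        y = y0 + (j + 0.5) * hy
        if 0 <= x + y <= 1 and 0 <= 2 * x - y <= 3:
            total += (x + y) ** 2 * hx * hy
print(total, 1.0 / 3.0)
```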
Example. We wish to evaluate
\[\int\!\!\!\!\!\int_{\Omega} xyd(x,y)\]
where \(\Omega\) is the first-quadrant region bounded by the curves
\[x^{2}+y^{2}=4,\quad x^{2}+y^{2}=9,\quad x^{2}-y^{2}=1\mbox{ and }x^{2}-y^{2}=4.\]
The boundaries suggest that we set \(u=x^{2}+y^{2}\) and \(v=x^{2}-y^{2}\). We can solve them to obtain
\[x=\sqrt{\frac{u+v}{2}}\mbox{ and }y=\sqrt{\frac{u-v}{2}}.\]
The Jacobian determinant is given by
\[J_{\bf g}(u,v)=\left |\begin{array}{ll} {\displaystyle \frac{\partial}{\partial u}\left (\sqrt{\frac{u+v}{2}}\right )} & {\displaystyle \frac{\partial}{\partial u}\left (\sqrt{\frac{u-v}{2}}\right )}\\ {\displaystyle \frac{\partial}{\partial v}\left (\sqrt{\frac{u+v}{2}}\right )} & {\displaystyle \frac{\partial}{\partial v}\left (\sqrt{\frac{u-v}{2}}\right )} \end{array}\right |=-\frac{1}{4\sqrt{u^{2}-v^{2}}}.\]
The region \(\Omega\) is transformed into the following region
\[\Gamma =\left\{(u,v):4\leq u\leq 9\mbox{ and }1\leq v\leq 4\right\}.\]
Therefore, we obtain
\begin{align*} \int\!\!\!\!\!\int_{\Omega} xyd(x,y) & =\int\!\!\!\!\!\int_{\Gamma}\left (\sqrt{\frac{u+v}{2}} \right )\left (\sqrt{\frac{u-v}{2}}\right )\left (\frac{1}{4\sqrt{u^{2}-v^{2}}}\right )d(u,v)\\ & =\frac{1}{8}\int_{1}^{4}\left [\int_{4}^{9}du\right ]dv=\frac{15}{8}. \end{align*}
Example. We wish to evaluate
\[\int\!\!\!\!\!\int\!\!\!\!\!\int_{\Omega}(x^{2}+y^{2})d(x,y,z),\]
where the region \(\Omega\) is given by
\[\Omega =\left\{(x,y,z):-2\leq x\leq 2, -\sqrt{4-x^{2}}\leq y\leq\sqrt{4-x^{2}}\mbox{ and }0\leq z\leq 4-x^{2}-y^{2}\right\}.\]
This region \(\Omega\) is bounded above by the paraboloid of revolution \(z=4-x^{2}-y^{2}\) and below by the \(xy\)-plane. Since \(\Omega\) is symmetric about the \(z\)-axis, the triple integral has a simple representation in cylindrical coordinates as given by
\[\Gamma=\left\{(r,\theta ,z):0\leq r\leq 2,0\leq\theta\leq 2\pi\mbox{ and }0\leq z\leq 4-r^{2}\right\}.\]
Therefore, we obtain
\begin{align*} \int\!\!\!\!\!\int\!\!\!\!\!\int_{\Omega} (x^{2}+y^{2})d(x,y,z) & =\int\!\!\!\!\!\int\!\!\!\!\!\int_{\Gamma} r^{2}\cdot rd(r,\theta ,z)\\ & =\int_{0}^{2\pi}\int_{0}^{2}\int_{0}^{4-r^{2}}r^{3}dzdrd\theta\\ & =\int_{0}^{2\pi}\int_{0}^{2} (4r^{3}-r^{5})drd\theta\\ & =\int_{0}^{2\pi}\left [r^{4}-\frac{1}{6}r^{6}\right ]_{0}^{2}d\theta=\frac{32\pi}{3}. \end{align*}
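As a numerical check (our addition), integrating out \(z\) (height \(4-r^{2}\)) and \(\theta\) (factor \(2\pi\)) reduces the triple integral to a one-dimensional integral in \(r\):

```python
import math

# 2*pi times the midpoint-rule approximation of the integral of
# r^3 * (4 - r^2) over 0 <= r <= 2.
n = 4000
h = 2.0 / n
total = sum(((i + 0.5) * h) ** 3 * (4 - ((i + 0.5) * h) ** 2)
            for i in range(n)) * h
approx = 2 * math.pi * total
print(approx, 32 * math.pi / 3)
```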
Example. Find the volume of the solid \(\Omega\) enclosed by the surface
\[(x^{2}+y^{2}+z^{2})^{2}=2z(x^{2}+y^{2}).\]
Using the spherical coordinates, we obtain
\[\rho^{4}=2(\rho\cos\phi )(\rho^{2}\sin^{2}\phi ),\]
which implies \(\rho =2\sin^{2}\phi\cos\phi\). This equation places no restriction on \(\theta\), i.e., \(\theta\) can range from \(0\) to \(2\pi\). Since \(\rho\) remains nonnegative, we see that \(\phi\) can range only from \(0\) to \(\pi/2\). Thus, the solid \(\Omega\) is transformed into the following solid
\[\Gamma=\left\{(\rho ,\theta ,\phi ):0\leq\theta\leq 2\pi,0\leq\phi\leq\frac{1}{2}\pi\mbox{ and }0\leq\rho\leq 2\sin^{2}\phi\cos\phi\right\}.\]
Therefore, we obtain
\begin{align*} \int\!\!\!\!\!\int\!\!\!\!\!\int_{\Omega}d(x,y,z)& =\int\!\!\!\!\!\int\!\!\!\!\!\int_{\Gamma}\rho^{2}\sin\phi d(\rho ,\theta ,\phi )\\ & =\int_{0}^{2\pi}\int_{0}^{\pi /2}\int_{0}^{2\sin^{2}\phi\cos\phi}\rho^{2}\sin\phi d\rho d\phi d\theta\\ & =\int_{0}^{2\pi}\int_{0}^{\pi/2}\frac{8}{3}\sin^{7}\phi\cos^{3}\phi d\phi d\theta\\ & =\frac{8}{3}\left (\int_{0}^{2\pi}d\theta\right )\left (\int_{0}^{\pi /2} (\sin^{7}\phi\cos\phi -\sin^{9}\phi\cos\phi )d\phi\right )\\ & =\frac{8}{3}\cdot 2\pi\cdot\frac{1}{40}=\frac{2}{15}\pi. \end{align*}
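A numerical check of the volume (our addition), using the exact \(\rho\)-integration from the derivation above and a midpoint rule in \(\phi\):

```python
import math

# After the exact rho-integration, the volume equals
# 2*pi * (8/3) * integral of sin^7(phi) cos^3(phi) over [0, pi/2].
n = 4000
h = (math.pi / 2) / n
s = 0.0
for i in range(n):
    phi = (i + 0.5) * h
    s += math.sin(phi) ** 7 * math.cos(phi) ** 3 * h
volume = 2 * math.pi * (8.0 / 3.0) * s
print(volume, 2 * math.pi / 15)
```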
\begin{equation}{\label{e}}\tag{E}\mbox{}\end{equation}
Optimum Problems.
We are going to study optimum problems for real-valued functions. We first consider real-valued functions of one variable. Let \(f\) be a real-valued function defined on an open interval \(I\). Suppose that \(f\) has a local extremum at an interior point \(c\) of \(I\). It is well known that if \(f\) has a derivative (finite or infinite) at \(c\), then \(f'(c)=0\). However, the converse is not true in general. For example, the function \(f(x)=x^{3}\) satisfies \(f'(0)=0\), but it is increasing in every neighborhood of \(0\). Now, we are going to derive sufficient conditions.
Proposition. Given some integer \(n\geq 1\), we assume that \(f\) has a continuous \(n\)th derivative in the open interval \((a,b)\). We also assume that, for some interior point \(c\in (a,b)\), we have
\begin{equation}{\label{maeq472}}\tag{66} f'(c)=f''(c)=\cdots =f^{(n-1)}(c)=0\mbox{ and }f^{(n)}(c)\neq 0. \end{equation}
Then, we have the following results.
- Suppose that \(n\) is even. Then \(f\) has a local minimum at \(c\) when \(f^{(n)}(c)>0\), and a local maximum at \(c\) when \(f^{(n)}(c)<0\).
- Suppose that \(n\) is odd. Then, there is neither a local maximum nor a local minimum at \(c\).
Proof. Since \(f^{(n)}(c)\neq 0\) and \(f^{(n)}\) is continuous on the open interval \((a,b)\), there exists an open interval \(I(c)\subseteq (a,b)\) containing \(c\) such that, for every \(x\in I(c)\), the derivative \(f^{(n)}(x)\) has the same sign as \(f^{(n)}(c)\). Applying the one-dimensional Taylor's formula at \(c\) and using (\ref{maeq472}), for every \(x\in I(c)\), we have
\[f(x)-f(c)=\frac{f^{(n)}(x_{1})}{n!}(x-c)^{n}\mbox{ for some }x_{1}\in I(c)\mbox{ between }c\mbox{ and }x,\]
where \(f^{(n)}(x_{1})\) and \(f^{(n)}(c)\) have the same sign.
- Suppose that \(n\) is even. Then \(f^{(n)}(c)>0\) implies \(f(x)\geq f(c)\), and \(f^{(n)}(c)<0\) implies \(f(x)\leq f(c)\).
- Suppose that \(n\) is odd and \(f^{(n)}(c)>0\). Then \(x>c\) implies \(f(x)>f(c)\), and \(x<c\) implies \(f(x)<f(c)\). This says that there is no extremum at \(c\).
- Suppose that \(n\) is odd and \(f^{(n)}(c)<0\). Then, a similar statement holds with the inequalities reversed.
This completes the proof. \(\blacksquare\)
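A minimal empirical illustration of the proposition (our addition): for \(f(x)=x^{4}\) the first nonvanishing derivative at \(0\) is \(f^{(4)}(0)=24>0\) with \(n=4\) even, so \(0\) is a local minimum, while \(g(x)=x^{3}\) has \(n=3\) odd and no extremum at \(0\).

```python
# Sample a punctured neighborhood of 0 and test the sign behavior
# predicted by the proposition.
xs = [k / 1000.0 for k in range(-100, 101) if k != 0]
has_min = all(x ** 4 > 0 for x in xs)        # x^4: local minimum at 0
takes_pos = any(x ** 3 > 0 for x in xs)
takes_neg = any(x ** 3 < 0 for x in xs)
no_extremum = takes_pos and takes_neg        # x^3: neither max nor min
print(has_min, no_extremum)
```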
Now, we turn to consider the real-valued functions of several variables.
Definition. Let \(f\) be a real-valued function defined on a subset \(S\) of an \(n\)-dimensional Euclidean space \(\mathbb{R}^{n}\). Given \({\bf a}\in S\), we say that \(f\) has a local maximum at \({\bf a}\) when there is an \(n\)-dimensional ball \(B({\bf a})\) satisfying \(f({\bf x})\leq f({\bf a})\) for all \({\bf x}\in B({\bf a})\cap S\). When \(f({\bf x})\geq f({\bf a})\) for all \({\bf x}\in B({\bf a})\cap S\), \(f\) is said to have a {\bf local minimum} at \({\bf a}\). The local maxima and local minima together comprise the local extreme values. \(\sharp\)
Theorem. Let \(f\) be a real-valued function defined on \(S\subseteq\mathbb{R}^{n}\), and let \({\bf x}_{0}\) be an interior point of \(S\). Suppose that \(f\) has a local extreme value at \({\bf x}_{0}\). Then either \(\nabla f({\bf x}_{0})={\bf 0}\) or \(\nabla f({\bf x}_{0})\) does not exist. \(\sharp\)
Definition. Let \(f:\mathbb{R}^{n}\rightarrow\mathbb{R}\) be a real-valued function defined on \(\mathbb{R}^{n}\). The point \({\bf a}\) is called a stationary point of \(f\) when \(f\) is differentiable at \({\bf a}\) with \(\nabla f({\bf a})={\bf 0}\). A stationary point is called a saddle point when every \(n\)-dimensional open ball \(B({\bf a})\) contains some points \({\bf x}\) with \(f({\bf x})>f({\bf a})\) and other points \({\bf x}\) with \(f({\bf x})<f({\bf a})\). \(\sharp\)
Example. Given a function \(f(x,y)=2x^{2}+y^{2}-xy-7y\), we have \(\nabla f(x,y)=(4x-y,2y-x-7)\). Setting \(\nabla f(x,y)={\bf 0}\), we find that \((1,4)\) is the only stationary point. We shall compare the value of \(f\) at \((1,4)\) with the value of \(f\) at nearby points \((1+h,4+k)\). Now, we have \(f(1,4)=-14\) and
\[f(1+h,4+k)=2h^{2}+k^{2}-hk-14.\]
Since the difference
\begin{align*} f(1+h,4+k)-f(1,4) & =h^{2}+(h^{2}-hk+k^{2})\\ & \geq h^{2}+(h^{2}-2|h||k|+k^{2})\\ & =h^{2}+(|h|-|k|)^{2}\geq 0, \end{align*}
it follows that \(f(1+h,4+k)\geq f(1,4)\) for all \(h\) and \(k\), which shows that \(f\) has a local minimum at \((1,4)\), with minimum value \(-14\). \(\sharp\)
Example. Given a function \(f(x,y)=y^{2}-xy+2x+y+1\), we have \(\nabla f(x,y)=(2-y,2y-x+1)\). The gradient is \({\bf 0}\) when \(x=5\) and \(y=2\). The point \((5,2)\) is the only stationary point. We shall compare the value of \(f\) at \((5,2)\) with the value of \(f\) at nearby points \((5+h,2+k)\). Now, we have \(f(5,2)=7\) and
\[f(5+h,2+k)=k^{2}-hk+7.\]
The difference
\[d=f(5+h,2+k)-f(5,2)=k(k-h)\]
does not keep a constant sign in any neighborhood of \((h,k)=(0,0)\). This says that \((5,2)\) is a saddle point. \(\sharp\)
\begin{equation}{\label{mat384}}\tag{67}\mbox{}\end{equation}
Theorem \ref{mat384}. (Second-Order Derivative Test for Extrema). Let \(f:\mathbb{R}^{n}\rightarrow\mathbb{R}\) be a real-valued function defined on \(\mathbb{R}^{n}\), and let \({\bf a}\) be a stationary point of \(f\). Suppose that the second-order partial derivatives \(\partial^{2}f/\partial x_{i}\partial x_{j}\) exist in an \(n\)-dimensional open ball \(B({\bf a})\) and are continuous at \({\bf a}\). We define
\begin{equation}{\label{maeq383}}\tag{68} Q({\bf t})=\frac{1}{2}f''({\bf a};{\bf t})=\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}({\bf a})t_{i}t_{j}. \end{equation}
Then, we have the following results.
(i) Suppose that \(Q({\bf t})>0\) for all \({\bf t}\neq {\bf 0}\). Then \(f\) has a local minimum at \({\bf a}\).
(ii) Suppose that \(Q({\bf t})<0\) for all \({\bf t}\neq {\bf 0}\). Then \(f\) has a local maximum at \({\bf a}\).
(iii) Suppose that \(Q({\bf t})\) takes both positive and negative values. Then \({\bf a}\) is a saddle point of \(f\).
Proof. For \(m=2\) and \({\bf y}={\bf a}+{\bf t}\) in Theorem \ref{mat380}, we have
\begin{equation}{\label{maeq381}}\tag{69} f({\bf a}+{\bf t})-f({\bf a})=\langle\nabla f({\bf a}),{\bf t}\rangle+\frac{1}{2}f''({\bf z};{\bf t}), \end{equation}
where \({\bf z}\) lies on the line segment joining the points \({\bf a}\) and \({\bf a}+{\bf t}\), and
\[f''({\bf z};{\bf t})=\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}({\bf z})t_{i}t_{j}.\]
At a stationary point, we have \(\nabla f({\bf a})={\bf 0}\). Therefore, the expression (\ref{maeq381}) becomes
\[f({\bf a}+{\bf t})-f({\bf a})=\frac{1}{2}f''({\bf z};{\bf t}),\]
which says that, as \({\bf a}+{\bf t}\) ranges over the ball \(B({\bf a})\), the algebraic sign of \(f({\bf a}+{\bf t})-f({\bf a})\) is determined by that of \(f''({\bf z};{\bf t})\). In this case, we can rewrite (\ref{maeq381}) as
\begin{equation}{\label{maeq382}}\tag{70} f({\bf a}+{\bf t})-f({\bf a})=\frac{1}{2}f”({\bf a};{\bf t})+\parallel {\bf t}\parallel^{2}E({\bf t}), \end{equation}
where
\[\parallel {\bf t}\parallel^{2}E({\bf t})=\frac{1}{2}f''({\bf z};{\bf t})-\frac{1}{2}f''({\bf a};{\bf t}).\]
Since \((|t_{i}|-|t_{j}|)^{2}\geq 0\), it follows that \[2|t_{i}t_{j}|\leq t_{i}^{2}+t_{j}^{2}\leq 2\parallel {\bf t}\parallel^{2},\mbox{ i.e., }|t_{i}t_{j}|\leq\parallel {\bf t}\parallel^{2}.\] Then, we have
\begin{align*} \parallel {\bf t}\parallel^{2}|E({\bf t})| & \leq\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left |\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}({\bf z})- \frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}({\bf a})\right |\cdot|t_{i}t_{j}|\\ & \leq\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left |\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}({\bf z})- \frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}({\bf a})\right |\cdot\parallel {\bf t}\parallel^{2}. \end{align*}
Therefore, we obtain
\[|E({\bf t})|\leq\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\left |\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}({\bf z})- \frac{\partial^{2}f}{\partial x_{i}\partial x_{j}}({\bf a})\right |,\]
which says \(E({\bf t})\rightarrow 0\) as \({\bf t}\rightarrow {\bf 0}\), since \({\bf z}\) lies on the line segment joining the points \({\bf a}\) and \({\bf a}+{\bf t}\), and the second-order partial derivatives of \(f\) are continuous at \({\bf a}\). The function \(Q\) is continuous at each point \({\bf t}\in\mathbb{R}^{n}\). Let
\[S=\{{\bf t}:\parallel {\bf t}\parallel =1\}\]
denote the boundary of the \(n\)-dimensional ball \(B({\bf 0};1)\). Suppose that \(Q({\bf t})>0\) for all \({\bf t}\neq {\bf 0}\). Then, the function \(Q\) is positive on \(S\). Since \(S\) is compact, the function \(Q\) attains a minimum on \(S\), which we denote by \(m\). Then, we have \(m>0\). Since \(Q(c{\bf t})=c^{2}Q({\bf t})\) for each \(c\in\mathbb{R}\), by taking \(c=1/\parallel {\bf t}\parallel\) with \({\bf t}\neq {\bf 0}\), we have \(c{\bf t}\in S\), so \(Q(c{\bf t})=c^{2}Q({\bf t})\geq m\), which says that \(Q({\bf t})\geq m\parallel {\bf t}\parallel^{2}\). Applying this to (\ref{maeq382}), we obtain \begin{align*} f({\bf a}+{\bf t})-f({\bf a}) & =Q({\bf t})+\parallel {\bf t}\parallel^{2}E({\bf t})\\ & \geq m\parallel {\bf t}\parallel^{2}+\parallel {\bf t}\parallel^{2}E({\bf t}).\end{align*} Since \(E({\bf t})\rightarrow 0\) as \({\bf t}\rightarrow {\bf 0}\), there exists a positive number \(r\) such that
\[0<\parallel {\bf t}\parallel <r\mbox{ implies }|E({\bf t})|<\frac{m}{2}.\]
For such \({\bf t}\), we have
\[0\leq\parallel {\bf t}\parallel^{2}|E({\bf t})|<\frac{1}{2}m\parallel{\bf t}\parallel^{2},\]
which implies
\[f({\bf a}+{\bf t})-f({\bf a})>m\parallel {\bf t}\parallel^{2}-\frac{1}{2}m\parallel {\bf t}\parallel^{2}=\frac{1}{2}m\parallel {\bf t}\parallel^{2}>0.\]
This shows that \(f\) has a local minimum at \({\bf a}\), which proves part (i). Using similar arguments, or simply applying part (i) to \(-f\), we prove part (ii). To prove part (iii), given any \(\lambda >0\), using (\ref{maeq382}), we obtain
\begin{align*} f({\bf a}+\lambda {\bf t})-f({\bf a}) & =Q(\lambda {\bf t})+\lambda^{2}\parallel {\bf t}\parallel^{2}E(\lambda {\bf t})\\ & =\lambda^{2}\left [Q({\bf t})+\parallel {\bf t}\parallel^{2}E(\lambda {\bf t})\right ].\end{align*}
Suppose that \(Q({\bf t})\neq 0\) for some \({\bf t}\). Since \(E({\bf t})\rightarrow 0\) as \({\bf t}\rightarrow {\bf 0}\), there exists a positive number \(r\) such that
\[0<\lambda <r\mbox{ implies }\parallel {\bf t}\parallel^{2}|E(\lambda {\bf t})|<\frac{1}{2}\left |Q({\bf t})\right |.\]
Therefore, the quantity \(\lambda^{2} [Q({\bf t})+\parallel {\bf t}\parallel^{2}E(\lambda {\bf t})]\) has the same sign as \(Q({\bf t})\) for \(0<\lambda <r\). This also says that the difference \(f({\bf a}+\lambda {\bf t})-f({\bf a})\) has the same sign as \(Q({\bf t})\) for \(0<\lambda <r\). Therefore, when \(Q({\bf t})\) takes both positive and negative values, we see that \({\bf a}\) is a saddle point of \(f\). This completes the proof. \(\blacksquare\)
Corollary. Let \(f:\mathbb{R}^{2}\rightarrow\mathbb{R}\) be a real-valued function defined on \(\mathbb{R}^{2}\) with continuous second-order partial derivatives at \({\bf a}\in\mathbb{R}^{2}\). We define
\begin{align*} A=\frac{\partial^{2}f}{\partial x^{2}}({\bf a}),\quad B=\frac{\partial^{2}f}{\partial x\partial y}({\bf a}),\quad C=\frac{\partial^{2}f}{\partial y^{2}}({\bf a}) \end{align*}
and \(\Delta =AC-B^{2}\). Then, we have the following results.
(i) Suppose that \(\Delta >0\) and \(A>0\). Then \(f\) has a local minimum at \({\bf a}\).
(ii) Suppose that \(\Delta >0\) and \(A<0\). Then \(f\) has a local maximum at \({\bf a}\).
(iii) Suppose that \(\Delta <0\). Then \({\bf a}\) is a saddle point of \(f\).
Proof. In the two-dimensional case, we can write the quadratic form in (\ref{maeq383}) as follows:
\[Q(x,y)=\frac{1}{2}\left (Ax^{2}+2Bxy+Cy^{2}\right ).\]
Suppose that \(A\neq 0\). Then \(Q\) can also be written as
\[Q(x,y)=\frac{1}{2A}\left [(Ax+By)^{2}+\Delta y^{2}\right ].\]
We consider the following cases.
- Suppose that \(\Delta >0\). Then, the expression in brackets is the sum of two squares, which says that \(Q(x,y)\) has the same sign as \(A\). Therefore, parts (i) and (ii) follow from parts (i) and (ii) of Theorem \ref{mat384}.
- Suppose that \(\Delta <0\). Then, the expression in brackets is a difference of two squares, so the quadratic form is the product of two linear factors. Therefore, the set of points \((x,y)\) such that \(Q(x,y)=0\) consists of two lines in the \(xy\)-plane intersecting at \((0,0)\). These lines divide the plane into four regions such that \(Q(x,y)\) is positive in two of these regions and negative in the other two regions. This says that \({\bf a}\) is a saddle point of \(f\).
This completes the proof. \(\blacksquare\)
Example. Given a function \(f(x,y)=-xye^{-(x^{2}+y^{2})/2}\), we have
\[\frac{\partial f}{\partial x}=y(x^{2}-1)e^{-(x^{2}+y^{2})/2}\mbox{ and } \frac{\partial f}{\partial y}=x(y^{2}-1)e^{-(x^{2}+y^{2})/2}.\]
Therefore, we obtain \[\nabla f(x,y)={\bf 0}\mbox{ if and only if }y(x^{2}-1)=0\mbox{ and }x(y^{2}-1)=0,\]
which says that \((0,0),(1,1),(1,-1),(-1,1)\) and \((-1,-1)\) are the stationary points. The second partial derivatives are given by
\begin{align*} \frac{\partial^{2}f}{\partial x^{2}} & =xy(3-x^{2})e^{-(x^{2}+y^{2})/2}\\ \frac{\partial^{2}f}{\partial y^{2}} & =xy(3-y^{2})e^{-(x^{2}+y^{2})/2}\\ \frac{\partial^{2}f}{\partial y\partial x} & =(x^{2}-1)(1-y^{2})e^{-(x^{2}+y^{2})/2}. \end{align*}
The values of the second partial derivatives at the stationary points are recorded in the following table.
\[\begin{tabular}{cccccc}\hline Point & \(A\) & \(B\) & \(C\) & \(\Delta\) & Result\\ \hline \((0,0)\) & \(0\) & \(-1\) & \(0\) & \(-1\) & Saddle point\\ \((1,1)\) & \(2e^{-1}\) & \(0\) & \(2e^{-1}\) & \(4e^{-2}\) & Local minimum\\ \((1,-1)\) & \(-2e^{-1}\) & \(0\) & \(-2e^{-1}\) & \(4e^{-2}\) & Local maximum\\ \((-1,1)\) & \(-2e^{-1}\) & \(0\) & \(-2e^{-1}\) & \(4e^{-2}\) & Local maximum\\ \((-1,-1)\) & \(2e^{-1}\) & \(0\) & \(2e^{-1}\) & \(4e^{-2}\) & Local minimum\\ \hline \end{tabular}\]
We remark that when \(\Delta =0\), there may be a local maximum, a local minimum or a saddle point at \({\bf a}\). \(\sharp\)
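The table can be reproduced without hand computation (our addition): the sketch below estimates \(A\), \(B\), \(C\) by central finite differences and applies the corollary. The helper name `second_derivative_test` is our own.

```python
import math

def f(x, y):
    return -x * y * math.exp(-(x * x + y * y) / 2)

def second_derivative_test(x, y, h=1e-4):
    # Central finite differences for A, B, C in the corollary.
    A = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h ** 2
    C = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2
    B = (f(x + h, y + h) - f(x + h, y - h)
         - f(x - h, y + h) + f(x - h, y - h)) / (4 * h ** 2)
    D = A * C - B * B          # the discriminant Delta
    if D < 0:
        return "saddle point"
    return "local minimum" if A > 0 else "local maximum"

results = {p: second_derivative_test(*p)
           for p in [(0, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]}
print(results)
```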
Next, we are going to add some side conditions to the optimum problems. These side conditions may be called the constraints.
\begin{equation}{\label{mat255}}\tag{71}\mbox{}\end{equation}
Theorem \ref{mat255}. Let \(f:S\rightarrow\mathbb{R}\) be a real-valued function defined on an open subset \(S\) of \(\mathbb{R}^{n}\) such that \(f\) is continuously differentiable on \(S\), and let \(g_{1},\cdots ,g_{m}\) be real-valued functions defined on \(S\) such that the vector-valued function \({\bf g}=(g_{1},\cdots ,g_{m})\) is also continuously differentiable on \(S\), where \(m<n\). We define
\[X_{0}=\left\{{\bf x}\in S:{\bf g}({\bf x})={\bf 0}\right\}.\]
Given any \({\bf x}_{0}\in X_{0}\), suppose that \(f\) has a local maximum or a local minimum at \({\bf x}_{0}\) on \(X_{0}\), and that the \(m\times m\) determinant
\[\mbox{det}\left [\frac{\partial g_{i}}{\partial x_{j}}({\bf x}_{0})\right ]\quad (i,j=1,\cdots ,m)\]
does not vanish.
Then, there exist real numbers \(\lambda_{1},\cdots ,\lambda_{m}\) such that the following \(n\) equations are satisfied
\begin{equation}{\label{maeq385}}\tag{72} \frac{\partial f}{\partial x_{r}}({\bf x}_{0})+\sum_{k=1}^{m}\lambda_{k} \frac{\partial g_{k}}{\partial x_{r}}({\bf x}_{0})=0\mbox{ for }r=1,\cdots ,n, \end{equation}
which can be written as the following vector form
\[\nabla f({\bf x}_{0})+\sum_{k=1}^{m}\lambda_{k}\nabla g_{k}({\bf x}_{0})={\bf 0}.\]
The real numbers \(\lambda_{1},\cdots ,\lambda_{m}\) are called the Lagrange multipliers.
Proof. We consider the following system of \(m\) linear equations in \(m\) unknowns \(\lambda_{1},\cdots ,\lambda_{m}\)
\[\sum_{k=1}^{m}\lambda_{k}\frac{\partial g_{k}}{\partial x_{r}}({\bf x}_{0})=-\frac{\partial f}{\partial x_{r}}({\bf x}_{0})\]
for \(r=1,2,\cdots ,m\). Since the determinant of the system is not zero by the assumption, this system has a unique solution. Therefore, the first \(m\) equations in (\ref{maeq385}) are satisfied. We need to verify that the remaining \(n-m\) equations in (\ref{maeq385}) are also satisfied for this choice of \(\lambda_{1},\cdots ,\lambda_{m}\). To do this, we apply the implicit function Theorem \ref{ma138}. Since \(m<n\), every point \({\bf x}\in S\) can be written as \({\bf x}=(\bar{\bf x},{\bf t})\), where \(\bar{\bf x}\in\mathbb{R}^{m}\) and \({\bf t}\in\mathbb{R}^{n-m}\). We also write
\[\bar{\bf x}=\left (x_{1},\cdots ,x_{m}\right )\mbox{ and }{\bf t}=\left (t_{1},\cdots ,t_{n-m}\right )\equiv \left (x_{m+1},\cdots ,x_{n}\right ),\]
i.e., \(t_{k}=x_{m+k}\). For the vector-valued function \({\bf g}=(g_{1},\cdots ,g_{m})\), we can write \({\bf g}(\bar{\bf x}_{0},{\bf t}_{0})={\bf 0}\) if \({\bf x}_{0}=(\bar{\bf x}_{0},{\bf t}_{0})\). Since \({\bf g}\) is continuously differentiable on \(S\), and since the determinant
\[\mbox{det}\left [\frac{\partial g_{i}}{\partial x_{j}}({\bf x}_{0})\right ]\neq 0,\]
all the conditions of the implicit function Theorem \ref{ma138} are satisfied. Therefore, there exists an \((n-m)\)-dimensional open set \(T_{0}\) containing \({\bf t}_{0}\) and a unique vector-valued function \({\bf h}=(h_{1},\cdots ,h_{m})\) defined on \(T_{0}\) such that \({\bf h}\) is continuously differentiable on \(T_{0}\) satisfying \({\bf h}({\bf t}_{0})=\bar{\bf x}_{0}\), and for every \({\bf t}\in T_{0}\), we have \({\bf g}({\bf h}({\bf t}),{\bf t})={\bf 0}\). This means that the system of the following \(m\) equations
\[g_{1}(x_{1},\cdots ,x_{n})=0,\cdots ,g_{m}(x_{1},\cdots ,x_{n})=0\]
can be solved for \(x_{1},\cdots ,x_{m}\) in terms of \(x_{m+1},\cdots ,x_{n}\), and the solutions are given by
\[x_{r}=h_{r}(x_{m+1},\cdots ,x_{n})\mbox{ for }r=1,\cdots ,m.\]
Now, we shall substitute these expressions for \(x_{1},\cdots ,x_{m}\) into \(f(x_{1},\cdots ,x_{n})\) and \(g_{p}(x_{1},\cdots ,x_{n})\). In other words, we define a new function \(F\) as
\[F(x_{m+1},\cdots ,x_{n})=f\left (h_{1}(x_{m+1},\cdots ,x_{n}),\cdots ,h_{m}(x_{m+1},\cdots ,x_{n}),x_{m+1},\cdots ,x_{n}\right )\]
and \(m\) new functions \(G_{1},\cdots ,G_{m}\) as, for \(p=1,\cdots,m\),
\[G_{p}(x_{m+1},\cdots ,x_{n})=g_{p}\left (h_{1}(x_{m+1},\cdots ,x_{n}),\cdots ,h_{m}(x_{m+1},\cdots ,x_{n}),x_{m+1},\cdots ,x_{n}\right ).\]
More precisely, we can write
\[F({\bf t})=f({\bf H}({\bf t}))\mbox{ and }G_{p}({\bf t})=g_{p}({\bf H}({\bf t})),\mbox{ where }{\bf H}({\bf t})=({\bf h}({\bf t}),{\bf t}).\]
Then, we see that each function \(G_{p}\) is equal to zero on the set \(T_{0}\) by the implicit function Theorem \ref{ma138}. Therefore, each derivative \(\partial G_{p}/\partial x_{r}\) is also equal to zero on \(T_{0}\). In particular, we have \(\partial G_{p}/\partial x_{r}({\bf t}_{0})=0\). By the chain rule (\ref{maeq251}), we obtain
\begin{equation}{\label{maeq386}}\tag{73} 0=\frac{\partial G_{p}}{\partial x_{r}}({\bf t}_{0})=\sum_{k=1}^{n}\frac{\partial g_{p}}{\partial x_{k}}({\bf x}_{0})\frac{\partial H_{k}}{\partial x_{r}}({\bf t}_{0})\mbox{ for }p=1,\cdots ,m\mbox{ and }r=1,\cdots ,n-m. \end{equation}
Since
\[H_{k}({\bf t})=\left\{\begin{array}{ll} h_{k}({\bf t}) & \mbox{if \(1\leq k\leq m\)}\\ x_{k} & \mbox{if \(m+1\leq k\leq n\)}, \end{array}\right .\]
for every \({\bf t}\), we have
\[\frac{\partial H_{m+r}}{\partial x_{r}}({\bf t})=1\mbox{ and }\frac{\partial H_{k}}{\partial x_{r}}({\bf t})=0\mbox{ for }m+1\leq k\leq n\mbox{ with }k\neq m+r.\]
Therefore, from (\ref{maeq386}), we obtain
\begin{equation}{\label{maeq387}}\tag{74} 0=\sum_{k=1}^{m}\frac{\partial g_{p}}{\partial x_{k}}({\bf x}_{0})\frac{\partial h_{k}}{\partial x_{r}}({\bf t}_{0})+ \frac{\partial g_{p}}{\partial x_{m+r}}({\bf x}_{0})\mbox{ for }p=1,\cdots ,m\mbox{ and } r=1,\cdots ,n-m. \end{equation}
By the continuity of \({\bf h}\), there exists an \((n-m)\)-dimensional ball \(B({\bf t}_{0})\subseteq T_{0}\) such that
\[{\bf t}\in B({\bf t}_{0})\mbox{ implies }{\bf H}({\bf t})=({\bf h}({\bf t}),{\bf t})\in B({\bf x}_{0}),\]
where \(B({\bf x}_{0})\) is the \(n\)-dimensional ball appearing in the definition of the local extremum of \(f\) at \({\bf x}_{0}\). This says that
\[{\bf t}\in B({\bf t}_{0})\mbox{ implies }{\bf H}({\bf t})\in B({\bf x}_{0})\cap X_{0}.\]
Therefore, by the assumption, we have either \(F({\bf t})\leq F({\bf t}_{0})\) for all \({\bf t}\in B({\bf t}_{0})\) or \(F({\bf t})\geq F({\bf t}_{0})\) for all \({\bf t}\in B({\bf t}_{0})\). This says that \(F\) has a local maximum or a local minimum at the interior point \({\bf t}_{0}\), which implies that \(\partial F/\partial x_{r}({\bf t}_{0})=0\). Using the chain rule, we obtain
\[0=\frac{\partial F}{\partial x_{r}}({\bf t}_{0})=\sum_{k=1}^{n}\frac{\partial f}{\partial x_{k}}({\bf x}_{0})\frac{\partial H_{k}}{\partial x_{r}}({\bf t}_{0})\]
for \(r=1,\cdots ,n-m\). Therefore, we also have
\begin{equation}{\label{maeq388}}\tag{75} \sum_{k=1}^{m}\frac{\partial f}{\partial x_{k}}({\bf x}_{0})\frac{\partial h_{k}}{\partial x_{r}}({\bf t}_{0})+\frac{\partial f}{\partial x_{m+r}}({\bf x}_{0})=0 \end{equation}
for \(r=1,\cdots ,n-m\). Multiplying (\ref{maeq387}) by \(\lambda_{p}\) and summing them up, and adding the result to (\ref{maeq388}), we obtain
\[\sum_{k=1}^{m}\left [\frac{\partial f}{\partial x_{k}}({\bf x}_{0})+\sum_{p=1}^{m}\lambda_{p} \frac{\partial g_{p}}{\partial x_{k}}({\bf x}_{0})\right ]\frac{\partial h_{k}}{\partial x_{r}}({\bf t}_{0})+\frac{\partial f}{\partial x_{m+r}}({\bf x}_{0}) +\sum_{p=1}^{m}\lambda_{p}\frac{\partial g_{p}}{\partial x_{m+r}}({\bf x}_{0})=0\]
for \(r=1,\cdots ,n-m\). Since the expression in square brackets vanishes because of the way \(\lambda_{1},\cdots ,\lambda_{m}\) were defined, we also have
\[\frac{\partial f}{\partial x_{m+r}}({\bf x}_{0})+\sum_{p=1}^{m}\lambda_{p}\frac{\partial g_{p}}{\partial x_{m+r}}({\bf x}_{0})=0\]
for \(r=1,\cdots ,n-m\). This completes the proof. \(\blacksquare\)
Corollary. Let \(f,g:S\rightarrow\mathbb{R}\) be real-valued functions defined on an open subset \(S\) of \(\mathbb{R}^{n}\) such that \(f\) and \(g\) are continuously differentiable on \(S\). Suppose that \({\bf x}_{0}\) maximizes or minimizes \(f({\bf x})\) subject to the side condition \(g({\bf x})=0\) satisfying \(\nabla g({\bf x}_{0})\neq {\bf 0}\). Then, there exists a scalar \(\lambda\) satisfying \(\nabla f({\bf x}_{0})=\lambda\nabla g({\bf x}_{0})\). The scalar \(\lambda\) is called a Lagrange multiplier.
Proof. The desired result follows from Theorem \ref{mat255} by taking \(m=1\). \(\blacksquare\)
Example. Maximize or minimize \(f(x,y)=xy\) on the unit circle \(x^{2}+y^{2}=1\). We set \(g(x,y)=x^{2}+y^{2}-1\). The gradients are
\[\nabla f(x,y)=(y,x)\mbox{ and }\nabla g(x,y)=(2x,2y).\]
Setting \(\nabla f(x,y)=\lambda\nabla g(x,y)\), we obtain \(y=2\lambda x\) and \(x=2\lambda y\). Multiplying the first equation by \(y\) and the second equation by \(x\), we find \(y^{2}=2\lambda xy=x^{2}\). The side condition \(x^{2}+y^{2}=1\) then implies \(2x^{2}=1\), which says \(x=\pm\frac{1}{2}\sqrt{2}\) and \(y=\pm\frac{1}{2}\sqrt{2}\). The only points that can give rise to an extreme value are
\[\left (\frac{1}{2}\sqrt{2},\frac{1}{2}\sqrt{2}\right ),\quad\left (\frac{1}{2}\sqrt{2}, -\frac{1}{2}\sqrt{2}\right ),\quad\left (-\frac{1}{2}\sqrt{2},\frac{1}{2}\sqrt{2}\right ) \mbox{ and }\left (-\frac{1}{2}\sqrt{2},-\frac{1}{2}\sqrt{2}\right ).\]
At the first and fourth points \(f\) takes the value \(1/2\). At the second and the third points \(f\) takes the value \(-1/2\). It is clear that \(1/2\) is the maximum value and \(-1/2\) is the minimum value. \(\sharp\)
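A quick check (our addition): on the constraint we may parametrize \(x=\cos t\), \(y=\sin t\), so that \(f=\frac{1}{2}\sin 2t\); scanning the parametrization confirms the extreme values \(\pm 1/2\).

```python
import math

# Scan the unit circle x = cos(t), y = sin(t); on the constraint,
# f = xy = sin(2t)/2, so the extrema should be +/- 1/2.
n = 100000
values = [math.cos(2 * math.pi * k / n) * math.sin(2 * math.pi * k / n)
          for k in range(n)]
fmax, fmin = max(values), min(values)
print(fmax, fmin)
```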
Example. Find the minimum value of the function \(f(x,y)=x^{2}+(y-2)^{2}\) taken on the hyperbola \(x^{2}-y^{2}=1\). We set \(g(x,y)=x^{2}-y^{2}-1\). Then, we have
\[\nabla f(x,y)=(2x,2(y-2))\mbox{ and }\nabla g(x,y)=(2x,-2y).\]
The condition \(\nabla f(x,y)=\lambda\nabla g(x,y)\) gives
\[2x=2\lambda x\mbox{ and }2(y-2)=-2\lambda y.\]
The side condition \(x^{2}-y^{2}=1\) shows that \(x\) cannot be zero, so the first equation gives \(\lambda =1\). The second equation then becomes \(y-2=-y\), i.e., \(y=1\). With \(y=1\), the side condition gives \(x=\pm\sqrt{2}\). Therefore, we need to check the points \((-\sqrt{2},1)\) and \((\sqrt{2},1)\). At each of these points \(f\) takes the value \(3\). \(\sharp\)
Example. Maximize \(f(x,y,z)=xyz\) subject to the side condition \(x^{3}+y^{3}+z^{3}=1\) with \(x\geq 0,y\geq 0\) and \(z\geq 0\). We set \(g(x,y,z)=x^{3}+y^{3}+z^{3}-1\). The gradients are given by
\[\nabla f(x,y,z)=(yz,xz,xy)\mbox{ and }\nabla g(x,y,z)=(3x^{2},3y^{2},3z^{2}).\]
The condition \(\nabla f(x,y,z)=\lambda\nabla g(x,y,z)\) gives
\[yz=3\lambda x^{2},\quad xz=3\lambda y^{2}\mbox{ and }xy=3\lambda z^{2}.\]
Multiplying the first equation by \(x\), the second equation by \(y\), and the third equation by \(z\), we get
\[xyz=3\lambda x^{3}=3\lambda y^{3}=3\lambda z^{3}.\]
If \(\lambda =0\), then at least one of \(x,y,z\) must be zero, so \(xyz=0\); this is obviously not the maximum. Having excluded \(\lambda =0\), we can divide by \(3\lambda\) to get \(x^{3}=y^{3}=z^{3}\), which says that \(x=y=z\). The side condition \(x^{3}+y^{3}+z^{3}=1\) then gives \(x=y=z=1/\sqrt[3]{3}\). Therefore, the desired maximum is \(1/3\). \(\sharp\)
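A brute-force check of this maximum (an illustration, not part of the proof): substituting \(u=x^{3}\), \(v=y^{3}\), \(w=z^{3}\) turns the side condition into \(u+v+w=1\) with \(u,v,w\geq 0\) and the objective \(xyz\) into \((uvw)^{1/3}\), which we can scan over a grid on this triangle.

```python
# Sanity check of the claimed maximum 1/3 of xyz on x^3 + y^3 + z^3 = 1
# with x, y, z >= 0, via the substitution u = x^3, v = y^3, w = z^3.
n = 400
best = 0.0
for i in range(n + 1):
    for j in range(n + 1 - i):
        u, v = i / n, j / n
        w = max(0.0, 1.0 - u - v)  # clamp to guard against float round-off
        best = max(best, (u * v * w) ** (1.0 / 3.0))
print(best)  # close to 1/3
```

The grid maximum is attained near \(u=v=w=1/3\), matching \(x=y=z=1/\sqrt[3]{3}\) and the maximum value \(1/3\).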
Example. Let
\[h(x_{1},x_{2},x_{3})=\sum_{i=1}^{3}\sum_{j=1}^{3}a_{ij}x_{i}x_{j}.\]
We are going to find the extreme values of the following function
\[f(x_{1},x_{2},x_{3})=x_{1}^{2}+x_{2}^{2}+x_{3}^{2}\]
subject to the following side condition
\[g(x_{1},x_{2},x_{3})=h(x_{1},x_{2},x_{3})-1=0.\]
Using Theorem \ref{mat255}, we introduce a Lagrange multiplier \(\lambda\) and obtain the vector equation
\begin{equation}{\label{maeq256}}\tag{76} \nabla f({\bf x})+\lambda\nabla g({\bf x})=\nabla f({\bf x})+\lambda\nabla h({\bf x})={\bf 0}. \end{equation}
Taking the inner product with \({\bf x}\) on both sides, we obtain
\[0=\langle {\bf x},\nabla f({\bf x})\rangle+\langle {\bf x},\lambda\nabla h({\bf x})\rangle =2f({\bf x})+2\lambda h({\bf x}).\]
Since \(h({\bf x})=1\), we obtain \(\lambda =-f({\bf x})\). Any \({\bf x}\) satisfying \(h({\bf x})=1\) is nonzero, so \(f({\bf x})>0\) and hence \(\lambda\neq 0\). Let \(t=-1/\lambda\). Then from (\ref{maeq256}), we obtain
\[t\nabla f({\bf x})-\nabla h({\bf x})={\bf 0},\]
which, assuming the coefficients are symmetric (\(a_{ij}=a_{ji}\)) so that \(\partial h/\partial x_{k}=2\sum_{j}a_{kj}x_{j}\), can be expanded (after dividing by \(2\)) as follows:
\begin{align*} (a_{11}-t)x_{1}+a_{12}x_{2}+a_{13}x_{3} & =0\\ a_{21}x_{1}+(a_{22}-t)x_{2}+a_{23}x_{3} & =0\\ a_{31}x_{1}+a_{32}x_{2}+(a_{33}-t)x_{3} & =0. \end{align*}
Since \({\bf x}={\bf 0}\) does not satisfy the side condition, this homogeneous system must have a nontrivial solution, so its determinant must vanish, i.e.,
\[\left |\begin{array}{ccc} a_{11}-t & a_{12} & a_{13}\\ a_{21} & a_{22}-t & a_{23}\\ a_{31} & a_{32} & a_{33}-t \end{array}\right |=0.\]
This is a cubic equation in \(t\); its roots are the eigenvalues of the symmetric matrix \((a_{ij})\). For each root \(t\) we obtain \(\lambda =-1/t\), a candidate \((x_{1},x_{2},x_{3})\) from the linear system, and the corresponding extreme value \(f({\bf x})=-\lambda =1/t\). \(\sharp\)
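To make this concrete, here is a random-search sanity check on a hypothetical choice of coefficients: the diagonal form \(h({\bf x})=x_{1}^{2}+2x_{2}^{2}+4x_{3}^{2}\). The determinant condition reads \((1-t)(2-t)(4-t)=0\), so \(t=1,2,4\), and by the derivation above the extreme values of \(f\) on \(h({\bf x})=1\) are \(1/t\), i.e. maximum \(1\) and minimum \(1/4\).

```python
import random

# Random-search estimate of the extremes of f(x) = |x|^2 on the surface
# h(x) = x1^2 + 2*x2^2 + 4*x3^2 = 1 (hypothetical coefficients a_ii = 1, 2, 4).
# Theoretical extreme values from the determinant condition: 1 and 1/4.
random.seed(0)
best_max, best_min = 0.0, float("inf")
for _ in range(200000):
    g = [random.gauss(0, 1) for _ in range(3)]
    h = g[0] ** 2 + 2 * g[1] ** 2 + 4 * g[2] ** 2
    x = [c / h ** 0.5 for c in g]      # project the sample onto h(x) = 1
    f = sum(c * c for c in x)
    best_max, best_min = max(best_max, f), min(best_min, f)
print(best_max, best_min)  # estimates of the extremes 1 and 1/4
```

The estimates approach \(1\) and \(1/4\), the reciprocals of the smallest and largest roots \(t\), as the derivation predicts.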


