25 Partial change and the gradient vector

We have two ways by which we represent functions:

As a computational algorithm for generating the output from one or more inputs, typically involving arithmetic and the like.
As a geometrical entity, specifically the graph of a function which can be a curve or, for functions of two inputs, a surface.

These two modes are sometimes intertwined, as when we use the name “line” to refer to a computational object: $\line(x) \equiv a x + b$.

Unfortunately, for functions of two inputs, a surface is hard to present in the formats that are most easily at hand: a piece of paper, a printed page, a computer screen. That is because a curved surface is naturally a three-dimensional object, while paper and screens provide two-dimensional images. Consequently, the graphics mode we prefer for presenting functions of two inputs is the contour plot, which is not a single geometrical object but a set of many objects: contours, labels, colored tiles.

We’ve been doing calculus on functions with one input because it is so easy to exploit both the computational mode and the graphical mode. And it might fairly be taken as a basic organizing theme of calculus that

a line segment approximates a curve in a small region around a point.

When figuring out the derivative function $\partial_x f(x)$ from a graph of $f(x)$, we find the tangent to the graph at each of many input values, record the slope of the line (and throw away the intercept) and then write down the series of slopes as a function of the input, typically by representing the slope by position along the vertical axis and the corresponding input by position along the horizontal axis. Figure 25.1 shows the process.

Figure 25.1: (A) The graph of a smooth function annotated with small line segments that approximate the function locally. The color of each labeled segment corresponds to the value of $x$ for that segment. The slope of each segment is written numerically below the segment. (B) The labeled dots show the slope of each segment from (A). The slope is encoded using vertical position (as usual) and carries over the numerical label from (A). Connecting the dots sketches out the derivative of the function in (A).

Panel (A) in Figure 25.1 shows a smooth function $f(x)$ (thin black curve). To find the function $\partial_x f(x)$, we take the slope of $f(x)$ at many closely spaced inputs. In Panel (A), we’ve highlighted short, tangent line segments at the closely spaced points labeled A through V. The slope of each tangent line segment can be calculated by the usual rise-over-run method; the numerical value of the slope is written underneath the segment. To plot the derivative $\partial_x f(x)$, we have taken the slope information from (A) and plotted it as a function of $x$.

To restate what you already know, in the neighborhood of any input value $x$, the slope of any local straight-line approximation to $f(x)$ is given by the value of $\partial_x f(x)$.
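The rise-over-run procedure described above can be sketched numerically. This is a minimal base-R illustration, not the code used to draw Figure 25.1; the function `f`, the helper `segment_slope()`, and the spacing of the inputs are all assumptions chosen for the example.

```r
# Hypothetical smooth function standing in for f(x) in Figure 25.1
f <- function(x) sin(x) + 0.1 * x^2

# Slope of a short line segment centered on x, computed as rise over run
segment_slope <- function(f, x, run = 1e-4) {
  (f(x + run / 2) - f(x - run / 2)) / run
}

# Closely spaced inputs, one short segment per input
x_points <- seq(-3, 3, by = 0.5)
slopes   <- sapply(x_points, function(x) segment_slope(f, x))

# The slopes, tabulated against x, sketch out the derivative function
data.frame(x = x_points, slope = round(slopes, 3))
```

Plotting the `slope` column against `x` plays the role of panel (B): each slope becomes a vertical position, and connecting the dots traces the derivative.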

25.1 Calculus on two inputs

Although we use contour plots for good practical reasons, the graph of a function $g(x,y)$ with two inputs is a surface, as described in Section @ref(surface-plot). The derivative of $g(x,y)$ should encode the information needed to approximate the surface at any input $(x,y)$. In particular, we want the derivative of $g(x,y)$ to tell us the orientation of the tangent plane to the surface.

A tangent plane is infinite in extent. Let’s use the word facet to refer to a little patch of the tangent plane centered at the point of contact. Each facet is flat. (It is part of a plane!) Figure 25.2 shows some facets tangent to a familiar curved surface. No two of the facets are oriented the same way.

Better than a picture of a summer melon, pick up a hardcover book and place it on a curved surface such as a basketball. The book cover is a flat surface: a facet. The orientation of the cover will match the orientation of the surface at the point of tangency. Change the orientation of the cover and you will find that the point of tangency will change correspondingly.

If melons and basketballs are not your style, you can play the same game on an interactive graph of a function with two inputs. The snapshot below is a link to an applet that shows the graph of a function as a blue surface. You can specify a point on the surface by setting the value of the (x, y) input using the sliders. Display the tangent plane (which will be green) at that point by check-marking the “Tangent plane” input. (Acknowledgments to Alfredo Sánchez Alberca who wrote the applet using the GeoGebra math visualization system.)

For the purposes of computation by eye, a contour graph of a surface can be easier to deal with. Figure 25.3 shows the contour graph of a smoothly varying function. Three points have been labeled A, B, and C.

Figure 25.3: A function of 2 inputs with 3 specific inputs marked A, B, and C

Zooming in on each of the marked points presents a simpler picture for each of them, although one that is different for each point. Each zoomed-in plot contains almost parallel, almost evenly spaced contours. If the surface had been exactly planar over the entire zoomed-in domain, the contours would be exactly parallel and exactly evenly spaced. We can approach such exact parallelness by zooming in more closely around the labeled point.

Figure 25.4: Zooming in on the neighborhoods of A, B, and C in Figure 25.3 shows a simple, almost planar, local landscape. The bottom row shows the contours of the tangent plane near each of the neighborhoods in the top row.

Just as the function $\line(x) \equiv a x + b$ describes a straight line, the function $\text{plane}(x, y) \equiv a + b x + c y$ describes a plane whose orientation is specified by the value of the parameters $b$ and $c$. (Parameter $a$ is about the vertical location of the plane, not its orientation.)

In the bottom row of Figure 25.4, the facets tangent to the original surface at A, B, and C are displayed. Comparing the top and bottom rows of Figure 25.4, you can see that each facet has the same orientation as the surface; the contours face in the same way.

Remember that the point of constructing such facets is to generalize the idea of a derivative from a function of one input $f(x)$ to functions of two or more inputs such as $g(x,y)$. Just as the derivative $\partial_x f(x_0)$ reflects the slope of the line tangent to the graph of $f(x)$ at $x=x_0$, our plan for the “derivative” of $g(x_0,y_0)$ is to represent the orientation of the facet tangent to the graph of $g(x,y)$ at $(x=x_0, y=y_0)$. The question for us now is what information is needed to specify an orientation.

One clue comes from the formula for a function whose graph is a plane oriented in a particular direction:

\[\text{plane}(x,y) \equiv a + b x + cy\]

R/mosaic: Orientation of a plane

To explore the roles of the parameters $b$ and $c$ in setting the orientation of the plane, open an R/mosaic session. The R/mosaic code below generates a particular instance of $\text{plane}(x,y)$ and plots it in two ways: a contour plot and a surface plot. Change the numerical values of $b$ and $c$ and observe how the orientation of the planar surface changes in the graphs. You can also see that the value of $a$ is irrelevant to the orientation of the plane, just as the intercept of a straight-line graph is irrelevant to the slope of that line.

(Note: The gf_refine(coord_fixed()) part of the contour-plot command makes numerical intervals on the horizontal and vertical axes have the same length.)

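library(mosaicCalc)  # assumed here to be the package behind the R/mosaic session

# A plane with adjustable parameters; b and c set the orientation
plane <- makeFun(a + b*x + c*y ~ x + y, a = 1, b = -2.5, c = 1.6)

# Contour plot, with equal scaling on the horizontal and vertical axes
contour_plot(plane(x, y) ~ x + y,
             bounds(x=c(-2, 2), y=c(-2, 2))) %>%
  gf_refine(coord_fixed())

# Interactive surface plot of the same function
interactive_plot(plane(x, y) ~ x + y,
                 bounds(x=c(-2, 2), y=c(-2, 2)))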

As always, it can be difficult to extract quantitative information from a surface plot. For the example here, you can see that the high point on the surface is where $x$ is most negative and $y$ is most positive. Compare that to the contour plot to verify that the two modes are displaying the same surface.

An instructive experience is to pick up a rigid, flat object, for instance a smartphone or hardcover book. Hold the object level with pinched fingers at the mid-point of each of the short ends, as shown in Figure 25.5 (left).

You can tip the object in one direction by raising or lowering one hand (middle picture). And you can tip the object in the other coordinate direction by rotating the object around the line joining the points grasped by the left and right hands (right picture). By combining these two motions, you can orient the surface of the object in a wide range of directions.¹

The purpose of this lesson is to show that two numbers are sufficient to dictate the orientation of a plane. In terms of Figure 25.5 these are 1) the amount that one hand is raised relative to the other and 2) the angle of rotation around the hand-to-hand axis.

Similarly, in the formula for a plane, the orientation is set by two numbers, $b$ and $c$ in $\text{plane}(x, y) \equiv a + b x + c y$.

How do we find the right $b$ and $c$ for the tangent facet to a function $g(x,y)$ at a specific input $(x_0, y_0)$? Taking slices of $g(x,y)$ provides the answer. In particular, these two slices:

\[\text{slice}_1(x) \equiv g(x, y_0) = a + b\, x + c\, y_0 \\ \text{slice}_2(y) \equiv g(x_0, y) = a + b x_0 + c\, y\]

Look carefully at the formulas for the slices. In $\text{slice}_1(x)$, the value of $y$ is being held constant at $y=y_0$. Similarly, in $\text{slice}_2(y)$ the value of $x$ is held constant at $x=x_0$.

The parameters $b$ and $c$ can be read out from the derivatives of the respective slices: $b$ is equal to the derivative of the slice$_1$ function with respect to $x$ evaluated at $x=x_0$, while $c$ is the derivative of the slice$_2$ function with respect to $y$ evaluated at $y=y_0$. Or, in the more compact mathematical notation:

\[b = \partial_x \text{slice}_1(x)\left.\strut\right|_{x=x_0} \ \ \text{and}\ \ c=\partial_y \text{slice}_2(y)\left.\strut\right|_{y=y_0}\]

These derivatives of slice functions are called partial derivatives. The word “partial” refers to examining just one input at a time. In the above formulas, the ${\large |}_{x=x_0}$ means to evaluate the derivative at $x=x_0$ and ${\large |}_{y=y_0}$ means something similar.

You don’t need to create the slices explicitly to calculate the partial derivatives. Simply differentiate $g(x, y)$ with respect to $x$ to get parameter $b$ and differentiate $g(x, y)$ with respect to $y$ to get parameter $c$. To demonstrate, we will make use of the sum rule:

\[\partial_x g(x, y) = \underbrace{\partial_x a}_{=0} + \underbrace{\partial_x b x}_{=b} + \underbrace{\partial_x cy}_{=0} = b\]

Similarly, \[\partial_y g(x, y) = \underbrace{\partial_y a}_{=0} + \underbrace{\partial_y b x}_{=0} + \underbrace{\partial_y cy}_{=c} = c\]

Get in the habit of noticing the subscript on the differentiation symbol $\partial$. When taking, for instance, $\partial_y f(x,y,z, \ldots)$, all inputs other than $y$ are to be held constant. Some examples:

\[\partial_y 3 x^2 = 0\ \ \text{but}\ \ \ \partial_x 3 x^2 = 6x\\ \ \\ \partial_y 2 x^2 y = 2x^2\ \ \text{but}\ \ \ \partial_x 2 x^2 y = 4 x y \]
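The partial derivatives above can be checked symbolically. The sketch below uses base R's `stats::D()`, which differentiates an expression with respect to one named variable and treats every other symbol as a constant. The `stats::` prefix is written out so the call stays unambiguous if another loaded package also defines a function named `D()`. The results are printed unsimplified, so the last line evaluates one of them at an arbitrarily chosen point to make the comparison concrete.

```r
# The plane from the text: differentiating recovers the parameters b and c
plane_expr <- quote(a + b * x + c * y)
stats::D(plane_expr, "x")   # with respect to x, holding y constant
stats::D(plane_expr, "y")   # with respect to y, holding x constant

# The examples from the text
stats::D(quote(3 * x^2), "y")       # no y in the expression
stats::D(quote(3 * x^2), "x")
stats::D(quote(2 * x^2 * y), "y")
stats::D(quote(2 * x^2 * y), "x")

# Evaluate the last derivative at x = 3, y = 5 and compare with 4 * x * y
dxy_value <- eval(stats::D(quote(2 * x^2 * y), "x"), list(x = 3, y = 5))
c(from_D = dxy_value, by_hand = 4 * 3 * 5)
```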

25.2 All other things being equal …

Recall that the derivative of a function with one input, say, $\partial_x f(x)$, tells you, at each possible value of the input $x$, how much the output will change in proportion to a small change in the value of the input.

Now that we are in the domain of multiple inputs, writing $h$ to stand for “a small change” is not entirely adequate. Instead, we will write $dx$ for a small change in the $x$ input and $dy$ for a small change in the $y$ input.

With this notation, we write the first-order polynomial approximation to a function of a single input $x$ as \[f(x+dx) = f(x) + \partial_x f(x) \times dx\] Applying this notation to functions of two inputs, we have:

\[g(x + \color{magenta}{dx}, y) = g(x,y) + \color{magenta}{\partial_x} g(x,y) \times \color{magenta}{dx}\] and \[g(x, y+\color{brown}{dy}) = g(x,y) + \color{brown}{\partial_y} g(x,y) \times \color{brown}{dy}\]

Each of these statements is about changing one input while holding the other input(s) constant. Or, as the more familiar expression goes, “the effect of changing one input, all other things being equal,” or “all other things held constant.”²
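To see the two one-input-at-a-time statements in action, here is a small numerical sketch in base R. The function `g`, the reference point, and the step sizes are hypothetical choices made for illustration; the partial derivatives `dg_dx` and `dg_dy` were worked out by hand with the usual differentiation rules.

```r
# Hypothetical function of two inputs
g <- function(x, y) x^2 * y + sin(y)

# Its partial derivatives, found by hand
dg_dx <- function(x, y) 2 * x * y       # y held constant
dg_dy <- function(x, y) x^2 + cos(y)    # x held constant

x0 <- 1;    y0 <- 2      # reference input
dx <- 0.01; dy <- 0.01   # small changes

# Change x only, all other things being equal
exact_x  <- g(x0 + dx, y0)
approx_x <- g(x0, y0) + dg_dx(x0, y0) * dx

# Change y only, all other things being equal
exact_y  <- g(x0, y0 + dy)
approx_y <- g(x0, y0) + dg_dy(x0, y0) * dy

c(exact_x = exact_x, approx_x = approx_x,
  exact_y = exact_y, approx_y = approx_y)
```

Making `dx` and `dy` smaller should bring each approximation closer to the corresponding exact value.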

Everything we’ve said about differentiation rules applies not just to functions of one input, $f(x)$, but to functions with two or more inputs, $g(x,y)$, $h(x,y,z)$ and so on.

25.3 Gradient vector

For functions of two inputs, there are two partial derivatives. For functions of three inputs, there are three partial derivatives. We can, of course, collect the partial derivatives into Cartesian coordinate form. This collection is called the gradient vector.

Just as our notation for differences ($\cal D$) and derivatives ($\partial$) involves unusual typography on the letter “D,” the notation for the gradient also involves unusual typography, this time on $\Delta$, the Greek version of “D.” For the gradient symbol, turn $\Delta$ on its head: $\nabla$. That is,

\[\nabla g(x,y) \equiv \left(\partial_x g(x,y), \ \ \partial_y g(x,y)\right)\]

Note that $\nabla g(x,y)$ is a function of both $x$ and $y$, so in general the gradient vector differs from place to place in the function’s domain.

The graphics convention for drawing a gradient vector for a particular input, that is, $\nabla g(x_0, y_0)$, puts an arrow with its root at $(x_0, y_0)$, pointing in direction $\nabla g(x_0, y_0)$, as in Figure 25.6.

A gradient field (see Figure 25.7) is the value of the gradient vector at each point in the function’s domain. Graphically, to prevent over-crowding, the vectors are drawn at discrete points. The lengths of the drawn vectors are set proportional to the numerical length of $\nabla g(x, y)$, so a short vector means the surface is relatively level and a long vector means the surface is relatively steep.

Figure 25.7: A plot of the gradient field $\nabla g(x,y)$.
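A gradient vector and a coarse gradient field can also be tabulated numerically. The following base-R sketch is not the code behind Figures 25.6 and 25.7; the function `g`, the helper `grad()`, and the grid of points are assumptions made for the example, and the partial derivatives are estimated by finite differences rather than computed symbolically.

```r
# Numerical gradient: one finite-difference slope per input,
# each computed while holding the other input constant
grad <- function(g, x, y, h = 1e-5) {
  c(dx = (g(x + h, y) - g(x - h, y)) / (2 * h),
    dy = (g(x, y + h) - g(x, y - h)) / (2 * h))
}

# Hypothetical function of two inputs
g <- function(x, y) x^2 * y + sin(y)

# Gradient vector at a single input (x0 = 1, y0 = 2)
grad(g, 1, 2)

# Gradient field: the gradient evaluated at discrete points of the domain
grid  <- expand.grid(x = c(-1, 0, 1), y = c(-1, 0, 1))
field <- t(mapply(function(x, y) grad(g, x, y), grid$x, grid$y))

# The length of each vector indicates how steep the surface is there
cbind(grid, field, length = sqrt(rowSums(field^2)))
```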

25.4 Total derivative (optional)

The name “partial derivative” suggests the existence of some kind of derivative that is not just a part, but the whole thing. The total derivative is such a whole and gratifyingly made up of its parts, that is, the partial derivatives.

Suppose you are modeling the temperature of some volume of the atmosphere, given as $T(t, x, y, z)$. This merely says that the temperature depends on both time and location, something that is familiar from everyday life.

The partial derivatives have an easy interpretation: $\partial_t T()$ tells how the temperature is changing over time at a given location, perhaps because of the evaporation or condensation of water vapor. $\partial_x T()$ tells how the temperature changes in the $x$ direction, and so on.

The total derivative gives an overall picture of the changes in a parcel of air, which you can think of as a tiny balloon-like structure but without the balloon membrane. The temperature inside the “balloon” may change with time (e.g. condensation or evaporation of water), but as the balloon drifts along with the motion of the air (that is, the wind), the evolving location can change the temperature as well. Think of a balloon caught in an updraft: the temperature goes down as the balloon ascends.

For an imaginary observer located in the balloon, the temperature is changing with time. Part of this change is the intrinsic change measured by $\partial_t T$, but we need to add to that the changes induced by the evolving location of the balloon. The partial change in temperature due to a change in altitude is $\partial_z T$, but it is important to realize that the coordinates of the location are themselves functions of time: $x(t), y(t), z(t)$. Seeing the function $T()$ for the observer in the balloon as a function of $t$, we have $T(t, x(t), y(t), z(t))$. This is a function composition: $T()$ composed with each of $x()$, $y()$, and $z()$. Recall in the chain rule $\partial_v f(g(v)) = \left[\partial_v f\right](g(v))\ \partial_v g(v)$ that the derivative of the composed quantity is the product of two derivatives.

Likewise, the total derivative of temperature with respect to the observer riding in the balloon will add together the parts due to changes in time (holding position constant), x-coordinate (holding time and the other space coordinates constant), and the like. Signifying the total differentiation with a capital $D$, we have

\[D\, T(t) = \partial_t T() + \partial_x T() \cdot\partial_t x + \partial_y T()\cdot \partial_t y + \partial_z T() \cdot\partial_t z\]

Note that $\partial_t x$ is the velocity of the balloon in the x-direction, and similarly for the other coordinate directions. Writing these velocities as $v_x, v_y, v_z$, the total derivative for temperature of a parcel of air embedded in a moving atmosphere is

\[D\ T(t) = \partial_t T + v_x\, \partial_x T + v_y\, \partial_y T + v_z\, \partial_z T\]

Formulations like this, which put the parts of change together into a whole, are often seen in the mathematics of fluid flow as applied in meteorology and oceanology.
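The total-derivative formula can be checked numerically by comparing the sum of its parts against a direct derivative of the temperature seen along the path. In this base-R sketch the temperature field `T_field`, the balloon path, and the helper `cdiff()` are all hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Finite-difference derivative of a one-input function at a point
cdiff <- function(f, at, h = 1e-5) (f(at + h) - f(at - h)) / (2 * h)

# Hypothetical temperature field T(t, x, y, z)
T_field <- function(t, x, y, z) 20 + 2 * sin(t) + 0.1 * x * y - 6.5 * z

# Hypothetical balloon path: the coordinates are functions of time
path_x <- function(t) 3 * t
path_y <- function(t) t^2
path_z <- function(t) 0.5 * t

t0 <- 1
x0 <- path_x(t0); y0 <- path_y(t0); z0 <- path_z(t0)

# Partial derivatives of T: vary one input, hold the others constant
dT_dt <- cdiff(function(t) T_field(t, x0, y0, z0), t0)
dT_dx <- cdiff(function(x) T_field(t0, x, y0, z0), x0)
dT_dy <- cdiff(function(y) T_field(t0, x0, y, z0), y0)
dT_dz <- cdiff(function(z) T_field(t0, x0, y0, z), z0)

# Velocities of the balloon in each coordinate direction
v_x <- cdiff(path_x, t0); v_y <- cdiff(path_y, t0); v_z <- cdiff(path_z, t0)

# Total derivative assembled from its parts
total_from_parts <- dT_dt + v_x * dT_dx + v_y * dT_dy + v_z * dT_dz

# Direct derivative of the temperature experienced by the observer
along_path   <- function(t) T_field(t, path_x(t), path_y(t), path_z(t))
total_direct <- cdiff(along_path, t0)

c(from_parts = total_from_parts, direct = total_direct)
```

The two numbers should agree up to finite-difference error, which is the content of the total-derivative formula.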

25.5 Differentials

A little bit of this, a little bit of that. — Stevie Wonder, “The Game of Love”

We have framed calculus in terms of functions: transformations that take one (or more!) quantities as input and return a quantity as output. This was not the original formulation. In this section, we will use the original style to demonstrate how you can sometimes skip the step of constructing a function before differentiating to answer a question of the sort: “If this quantity changes by a little bit, how much will another, related quantity change?”

As an example, consider the textbook-style problem of a water skier being pulled along the water by a rope pulled in from the top of a tower of height $H$. The skier is distance $x$ from the tower. As the rope is winched in at a constant rate, does the skier go faster or slower as she approaches the tower?

In the function style of approach, we can write the position function $x(t)$ with input the length of the rope $L(t)$. Using the diagram, you can see that

\[x(t) = \sqrt{\strut L(t)^2 - H^2}\ .\]

Differentiate both sides with respect to $t$ to get the velocity of the skier: $\partial_t x(t)$ through the chain rule:

\[\underbrace{\partial_t x(t)}_{\partial_t f(g(t))} = \underbrace{\frac{1}{2\sqrt{\strut L(t)^2 - H^2}}}_{\left[ \partial_t f \right](g(t)) } \times \underbrace{\left[2 L(t)\, \partial_t L(t)\right]}_{\partial_t g(t)} = \frac{L(t)\, \partial_t L(t)}{\strut\sqrt{L(t)^2 - H^2}}\]

Now to reformulate the problem without defining a function.

Newton referred to “flowing quantities” or “fluents” and to what today is universally called derivatives as “fluxions.” Newton did not have a notion of inputs and output.³

At about the same time as Newton’s inventions, very similar ideas were being given very different names by mathematicians on the European continent. There, an infinitely small change in a quantity was called a “differential” and the differential of $x$ was denoted $dx$.

The first calculus textbook was subtitled, Of the Calculus of Differentials, in other words, how to calculate differentials. (See Figure 25.8.) Section I of this 1696 text is entitled, “Where we give the rules of this calculation,” those rules being recognizably the same as presented in Chapter 23 of this book.

Figure 25.8: From the start of the first calculus textbook, by le marquis de l’Hôpital, 1696.

Definition I of Section I states,

“We call quantities variable that grow or decrease continuously; and to the contrary constant quantities are those that remain the same while the others change. … The infinitely small amount by which a continuous quantity increases or decreases is called the differential.”

The differential is not a derivative. The differential is an infinitely small change in a quantity and a derivative is a rate of change. The differential of a quantity $x$ is written $dx$ in the textbook.⁴

The point of Section I of de l’Hôpital’s textbook is to present the rules by which the differentials of complex quantities can be calculated. You will recognize the product rule in de l’Hôpital’s notation:

The differential of $x\,y$ is $y\,dx + x\,dy$

The Pythagorean theorem relates the various quantities this way:

\[L^2 = x^2 + H^2\]

The differential of each side of the equation refers to “a little bit” of increase in the quantity on that side of the equation:

\[d(L^2) = d(x^2)\ \ \ \implies\ \ \ 2 L\, dL = 2 x\, dx\] where we’ve used one of the “rules” for calculating differentials, along with the fact that the tower height $H$ is constant, so that $d(H^2) = 0$. This gives us

\[dx = \frac{L}{x} dL\]

Think of this as a recipe for calculating $dx$. If you tell me $L$, $x$, and $dL$, then I can calculate the value of $dx$. For instance, suppose the tower is 52 feet tall and that there is $L=173$ feet of tow-rope extending to the skier. The Pythagorean theorem tells us the skier is $x=165$ feet from the base of the tower. The rope is, let us suppose, being pulled in at the top of the tower at $dL = 10$ feet per second. How fast is $x$ changing?

\[dx = \frac{173\ \text{ft}}{165\ \text{ft}} \times 10\ \text{ft s}^{-1} \approx 10.48\ \text{ft s}^{-1}\]
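The recipe $dx = \frac{L}{x}\, dL$ is easy to carry out by machine. The base-R sketch below uses the tower height, rope length, and winching rate given in the text; the shorter rope lengths in the second half are hypothetical values, added only to show how the ratio $L/x$ behaves as the skier nears the tower.

```r
H  <- 52    # tower height, feet
L  <- 173   # rope length, feet
dL <- 10    # rate at which the rope is pulled in, feet per second

x  <- sqrt(L^2 - H^2)   # Pythagorean theorem: distance from the tower base
dx <- (L / x) * dL      # the differential recipe dx = (L / x) dL
c(x = x, dx = dx)

# The ratio L / x at some shorter (hypothetical) rope lengths
rope <- c(173, 120, 80, 60)
data.frame(L = rope,
           x = sqrt(rope^2 - H^2),
           ratio = rope / sqrt(rope^2 - H^2))
```

Because $L/x$ grows as the rope shortens, the same winching rate produces a larger $dx$: the skier speeds up on approaching the tower.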

We will return to “a little bit of this” when we explore how to add up little bits to get the whole in Chapter 38.

25.6 Drill

Part 1 What is $\partial_x x$?

$0$ $1$ $x$ $y$

Part 2 What is $\partial_x y$?

$0$ $1$ $x$ $y$

Part 3 What is $\partial_x a\, x$?

$0$ $a$ $x$ $y$

Part 4 What is $\partial_x x\, y$?

$0$ $1$ $x$ $y$

Part 5 What is $\partial_y x\, y$?

$0$ $1$ $x$ $y$

Part 6 What is $\partial_x A e^{kt}$?

$0$ $A k e^{kx}$ $t$

Part 7 What is $\partial_t A e^{kt}$?

$0$ $k A e^{kt}$ $k A e^{kx}$ $t A e^{kt}$

Part 8 What is $\partial_x A x e^{kt}$?

$A e^{kt}$ $A x e^{kt}$ $0$ $A k x e^{kt}$

Part 9 What is $\partial_t A x e^{kt}$?

$A e^{kt}$ $A k e^{kt}$ $0$ $A k x e^{kt}$

Part 10 What is $\partial_x \left[\strut a_0 + a_1 x + a_2 x^2 \right]$?

0 $a_1 + 2 a_2 x$ $a_1 + a_2 x$ $a_0 + a_1 x$

Part 11 What is $\partial_y \left[\strut a_0 + a_1 x + a_2 x^2 \right]$?

0 $a_1 + 2 a_2 x$ $a_1 + a_2 x$ $a_1 + 2 a_2 y$

Part 12 What is $\partial_x \left[\strut a_0 + a_1 y + a_2 y^2 \right]$?

0 $a_1 + 2 a_2 x$ $a_1 + a_2 x$ $a_1 + 2 a_2 y$

Part 13 What is $\partial_x \left[\strut a_0 + a_1 x + b_1 y + c x y \right]$?

$a_1 + c$ $a_1$ $a_1 + cy$ $a_1 + b_1 + c$

Part 14 What is $\partial_y \left[\strut a_0 + a_1 x + b_1 y + c x y \right]$?

$b_1 + c$ $b_1$ $b_1 + cx$ $a_1 + b_1 + c$

Part 15 What is $\partial_x \partial_y \left[\strut a_0 + a_1 x + b_1 y + c x y \right]$? (Usually we would write $\partial_{xy}$ instead of $\partial_x \partial_y$, but they amount to the same thing.)

$0$ $a_1$ $c$ $b_1$

Part 16 What is $\partial_x \partial_x \left[\strut a_0 + a_1 x + b_1 y + c x y \right]$? (Usually we would write $\partial_{xx}$ instead of $\partial_x \partial_x$, but they amount to the same thing.)

$0$ $a_1$ $c$ $b_1$

Part 17 What is $\partial_x \partial_x \left[\strut a_0 + a_1 x + b_1 y + c x y + a_2 x^2 + b_2 y^2 \right]$? (Usually we would write $\partial_{xx}$ instead of $\partial_x \partial_x$, but they amount to the same thing.)

$0$ $a_2$ $2 a_2$ $c + a_2$

Part 18 What is $\partial_y \partial_x \left[\strut a_0 + a_1 x + b_1 y + c x y + a_2 x^2 + b_2 y^2 \right]$? (Usually we would write $\partial_{yx}$ instead of $\partial_y \partial_x$, but they amount to the same thing.)

$0$ $2 a_2$ $c$ $2 b_2$

Part 19 What is $\partial_x \left[\strut A x^n y^m \right]$?

$A y^m$
$A n m x^{n-1} y^{m-1}$
$A n x^{n-1} y^m$
$A m x^{n} y^{m-1}$

Part 20 What is $\partial_y \left[\strut A x^n y^m \right]$?

$A m y^{m-1}$
$A n m x^{n-1} y^{m-1}$
$A n x^{n-1} y^m$
$A m x^{n} y^{m-1}$

Part 21 What is $\partial_{xy} \left[\strut A x^n y^m \right]$?

$A m x^{n-1} y^{m-1}$
$A n m x^{n-1} y^{m-1}$
$A n x^{n-1} y^{m-1}$
$A m x^{n} y^{m-1}$

Part 22 What is $\partial_x \left[\strut f(x) + y\right]$?

$0$
$\partial_x f(x) + 1$
$\partial_x f(x)$
$\partial_x f(x) + y$

Part 23 What is $\partial_x \left[\strut f(x) + g(y)\right]$?

$0$
$\partial_x f(x) + \partial_x g(y)$
$\partial_x f(x)$
$\partial_x f(x) + \partial_y g(y)$

Part 24 What is $\partial_y \left[\strut f(x) + g(y)\right]$?

0 $\partial_x g(y)$ $\partial_x f(x)$ $\partial_y g(y)$

Part 25 What is $\partial_x \partial_y \left[\strut f(x) + g(y)\right]$?

0
$\partial_x \partial_y g(y)$
$\partial_x f(x)$
$\partial_y g(y)$

Part 26 What is $\partial_y \partial_y \left[\strut f(x) + g(y)\right]$?

0 1 $\partial_y g(y)$ $\partial_{yy} g(y)$

Part 27 What is $\partial_y f(x) g(y)$?

$g(y)\ \partial_y f(x) + f(x) \ \partial_y g(y)$
$f(x)\ \partial_{y} g(y)$
$\partial_y g(y)$
0

Part 28 What is $\partial_y h(x,y) g(y)$?

$g(y)\ \partial_y h(x,y) + h(x,y)\ \partial_y g(y)$
$g(y) \partial_y h(x, y)$
$\partial_y g(y)$
0

Part 29 What is $\partial_x h(x,y) g(y)$?

$g(y) \partial_y h(x, y)$
$g(y)\ \partial_x h(x,y) + h(x,y)\ \partial_x g(y)$
$\partial_x h(x, y)$
$g(y) \partial_x h(x, y)$

Part 30 What is $\partial_{yx} h(x,y) g(y)$?

$(\partial_x g(y))\ (\partial_x h(x, y)) + g(y) (\partial_{xx} h(x, y) )$
$g(y) \partial_{yx} h(x,y) + h(x,y)\ \partial_y g(y)$
$\partial_{yx} h(x, y)$
$(\partial_y g(y)) \ (\partial_x h(x, y)) + g(y)\ (\partial_{yx} h(x, y))$

Part 31 What is the “with-respect-to” input in $\partial_y xy$?

$y$ $x$ $1$

Part 32 What is the “with-respect-to” input in $\partial_x y$?

$y$ $x$ $1$

Part 33 What is the “with-respect-to” input in $\partial_t y$?

$y$ $t$ $1$

Part 34
At which of these inputs is the function steepest in the x-direction?

$(x=0, y=1)$ $(x=1, y=5)$ $(x=0, y=6)$ $(x=-2, y=6)$

Part 35
At which of these inputs is the function practically flat?

$(x=0, y=1)$ $(x=1, y=2)$ $(x=0, y=6)$ $(x=-2, y=3)$

Part 36
You are standing on the input point $(x=-1,y=4)$. In terms of the compass points (where north would be up and east to the right), which direction points most steeply uphill from where you are standing?

NE SE SW NW

Part 37
You are standing on the input point $(x=2,y=1)$. In terms of the compass points (where north would be up and east to the right), which direction points most steeply uphill from where you are standing?

NE SE SW NW

Part 38
You have been hiking all day and have reached map coordinate (x=2, y=2). You are completely exhausted. Time for a break. You want to walk along the hill, without any change of elevation. Which compass direction should you head in to get started?

NE or SW SE but not NW NW or SE NW but not SE

Part 39 Symbolically compute $\partial_TWC(T,V)$ given $WC(T,V)\equiv35.74+.06215\cdot T-35.75\cdot V^{1.6}+0.4275\cdot T\cdot V^{1.6}$

$35.74+.06215-35.75\cdot V^{1.6}+0.4275\cdot V^{1.6}$
$.06215-35.75\cdot V^{1.6}+0.4275\cdot V^{1.6}$
$0.06215 + 0.4275\cdot V^{1.6}$
$.06215-35.75\cdot1.6\cdot V^{0.6}+0.4275\cdot1.6\cdot V^{0.6}$

Part 40 Symbolically compute \[\partial_yf(x,y,z,t)\] given \[f(x,y,z,t)\equiv xyz^2\sin(yt)\]

\[xz^2\cos(yt)\]
\[xz^2t\cos(yt)\]
\[xz^2\sin(yt)+xyz^2\cos(yt)\]
\[xz^2\sin(yt)+xyz^2t\cos(yt)\]

Part 41 Symbolically compute $\partial_{ab}\, g(a,b)$ given $g(a,b)\equiv 3ab^4+a^3b^2$

$6ab^2$ $12b^3+6a^2b$ $36ab^2+2a^3$ $3b^4+2a^3b$

25.7 Exercises

Exercise 25.01

It is relatively easy to assess partial derivatives when you know the gradient. After all, the gradient is the vector of $(\partial_x\,f(x,y), \partial_y f(x,y))$. To train your eye, here is a contour plot and a corresponding gradient plot.

Part A What is the rule for determining $\partial_x f(x,y)$ from the direction of the gradient vector?

If the vector has a component pointing right, $\partial_x f$ is positive.
If the vector has a component pointing left, $\partial_x f$ is positive
If the vector has a vertical component pointing up, $\partial_x f$ is positive.
If the vector has a component pointing downward, the partial derivative $\partial_x f$ is positive.

Part B What is the rule for determining $\partial_y f(x,y)$ from the direction of the gradient vector?

If the vector has a component pointing right, $\partial_y f$ is positive.
If the vector has a component pointing left, $\partial_y f$ is positive.
If the vector has a vertical component pointing up, $\partial_y f$ is positive.
If the vector has a component pointing downward, the partial derivative $\partial_y f$ is positive.

Exercise 25.02

Open a sandbox and use the following commands to make a contour plot of the function $g(x,y)$ centered on the reference point $(x_0\!=\!0,\, y_0\!=\!0)$.

g <- rfun( ~ x + y, seed = 802, n = 15)
x0 <-  0
y0 <-  0
size <- 5
contour_plot(g(x, y) ~ x + y,
             bounds(x = x0 + size*c(-1, 1),
                    y = y0 + size*c(-1, 1)))

By making size smaller, you can zoom in around the reference point. Zoom in gradually (say, size = 1.0, 0.5, 0.1, 0.05, 0.01) until you reach a point where the surface is (practically) a simple inclined plane.

From the contour plot, zoomed in so that the graph shows an inclined plane, figure out the sign of $\partial_x g(0,0)$ and $\partial_y g(0,0)$.

Part A Which answer best describes the signs of the partial derivatives of $g(x,y)$ at the reference point $(x_0=0, y_0=0)$?

$\partial_x g(0,0)$ is pos, $\partial_y g(0,0)$ is pos
$\partial_x g(0,0)$ is pos, $\partial_y g(0,0)$ is neg
$\partial_x g(0,0)$ is neg, $\partial_y g(0,0)$ is neg
$\partial_x g(0,0)$ is neg, $\partial_y g(0,0)$ is pos.
$\partial_x g(0,0)$ is 0, $\partial_y g(0,0)$ is pos

Exercise 25.04

Consider this close up of a function around a reference point at the center of the graph.

By eye, estimate these derivatives of the function at the reference point $(x_0=-2, y_0=-5)$.

Part A What is the numerical value of $\partial_x g(x,y)$ at the reference point?

-1 -0.50 -0.25 0 0.25 0.50 1

Part B What is the numerical value of $\partial_y g(x,y)$ at the reference point?

-1 -0.50 -0.25 0 0.25 0.50 1

The next questions ask about second-order partial derivatives. As you know, the second derivative is about how the first derivative changes with x or y. Insofar as the function is a simple inclined plane, where the contours would be straight, parallel, and evenly spaced, the second derivatives would all be zero. But you can see that it is not such a plane: the contours curve a bit.

In determining the second derivatives by eye from the graph, you are encouraged to compare first derivatives at the opposing edges of the graph, as opposed to at very nearby points.

Part C What is the sign of $\partial_{xx} g(x,y)$ at the reference point?

negative positive

Part D What is the sign of $\partial_{yy} g(x,y)$ at the reference point?

negative positive

Part E What is the sign of $\partial_{xy} g(x,y)$ at the reference point?

negative positive

Part F What is the sign of $\partial_{yx} g(x,y)$ at the reference point?

negative positive

Exercise 25.06

On numerous occasions in your professional life, you will be in one or both of these positions:

You are a decision-maker being presented with the results of analysis conducted by a team of unknown reliability, and you need to figure out whether what they are telling you is credible.
You are a member of the analysis team needing to demonstrate to the decision-maker that your work should be believed.

As an example, consider one of the functions presented in a comedy book, Geek Logic: 50 Foolproof Equations for Everyday Life (2006), by Garth Sundem. The particular function we will consider here is Dr(), intended to help answer the question, “Should you go to the doctor?”

\[\text{Dr}(d, c, p, e, n, s) = \frac{\frac{s^2}{2} + e(n-e)}{100 - 3(d + \frac{p^3}{70} - c)}\] where

$d$ = How many days in the past month have you been incapacitated? $d_0 \equiv 3$
$c$ = Does the issue seem to be getting better or worse? (-10 to 10 with -10 being “circling the drain” and 10 being “dramatic improvement”) $c_0 \equiv -2$
$p$ = How much pain or discomfort are you currently experiencing? (1-10 with 10 being “currently holding detached toe in Ziploc bag”) $p_0 = 3$
$e$ = How embarrassing is this issue? (1-10 with 10 being “slipped on ice and fell on 1972 Mercedes-Benz hood ornament, which is now part of my body”) $e_0 = 4$
$n$ = How noticeable is the issue? (1-10 with 10 being “fell asleep on waffle iron”) $n_0 = 5$
$s$ = How serious does the issue seem? (1-10 with 10 being “may well have nail embedded in frontal lobe [of brain]”) $s_0 = 3$

Although the function is offered tongue-in-cheek, let’s examine it to see if it even roughly matches common sense. The tool we will use relates to low-order polynomial approximation around a reference point and examining appropriate partial derivatives. To save time, we stipulate a reference point for you, noted in the description of quantities above.

The following code creates an R implementation of the function, set up so that the default values of the inputs are those at the given reference point. You can use this in a sandbox to try different changes in each of the input quantities.


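# Defaults are set to the reference-point values, so Dr() with no
# arguments evaluates the function at the reference point.
Dr <- makeFun(
  ((s^2)/2 + e*(n-e)) /
    (100 - 3*(d + (p^3/70) - c)) ~
    d+p+e+c+n+s, s=3, n=5, e=4, p=3, d=3, c=-2)
Dr()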

According to the instructions in the book, if Dr()$> 1$, you should go to the doctor.

Essay 1: The value of Dr() at the reference point is 0.10, indicating that you shouldn’t go to the doctor. But we don’t yet know whether 0.10 is very close to the decision threshold of 1 or very far away. Describe a reasonable way to figure this out. Report your description and the results here.

Essay 2: There are six inputs to the function. Go through the list of all six and (without thinking too hard about it) write down for all of them your intuitive sense of whether an increase of one point in that input should raise or lower the output of Dr() at the reference point. Also write down whether you think the input should be a large or small determinant of whether to go to the doctor. (You don’t need to refer to the Dr() function itself, just to your own intuitive sense of what should be the effect of each of the inputs.)

The operator D() can calculate partial derivatives. You can calculate the value of a partial derivative very easily at the reference point, using an expression like this, which gives the value of the partial of Dr() with respect to input $s$ at the reference point:

D(Dr(s = s) ~ s)(s=3)
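
As a cross-check on D(), the same partial derivative can be approximated with a centered finite difference. The following is only a sketch in base R: the names Dr_base, h, and slope_s are hypothetical (not from the book), and the step size is an assumed value.

```r
# Base-R copy of the Dr() formula; the defaults are the stated reference point.
# Dr_base is a hypothetical name, chosen to avoid masking the makeFun() version.
Dr_base <- function(d = 3, c = -2, p = 3, e = 4, n = 5, s = 3) {
  ((s^2) / 2 + e * (n - e)) / (100 - 3 * (d + (p^3 / 70) - c))
}

# Centered finite difference in s, holding the other five inputs
# at their reference values.
h <- 1e-4  # assumed step size
slope_s <- (Dr_base(s = 3 + h) - Dr_base(s = 3 - h)) / (2 * h)
slope_s
```

Replacing s with another input name (and 3 with that input's reference value) gives the corresponding estimate for the other partial derivatives.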

We are now going to use these partial derivatives to compare your intuition about going to the doctor to what the function has to say. Of course, we don’t know yet whether the function is reasonable, so don’t be disappointed if your intuition conflicts with the function.

Essay 3: Calculate the numerical value of each of the partial derivatives at the reference point. List them here and say, for each one, whether it accords with your intuition.

Exercise 25.07

Each of the figures below shows a contour plot and a gradient field. For some of the figures, the contour plot and the gradient field show the same function; for others they do not. Your task is to identify whether the contour plot and the gradient field are of the same or different functions.

Part A For Figure A, do the contour plot and the gradient field show the same function?

Yes No

Part B For Figure B, do the contour plot and the gradient field show the same function?

Yes No

Part C For Figure C, do the contour plot and the gradient field show the same function?

Yes No

Part D For Figure D, do the contour plot and the gradient field show the same function?

Yes No

Part E For Figure E, do the contour plot and the gradient field show the same function?

Yes No

Part F For Figure F, do the contour plot and the gradient field show the same function?

Yes No

Exercise 25.08

Here is a contour plot of a function $g(x,y)$. You will be presented with several gradient fields. Your task is to determine whether the gradient field corresponds to the contour plot and, if not, say why not.

Part A What’s wrong with gradient field 1?

arrows point down the hill instead of up it
magnitudes of the arrows are wrong, but the direction is right
arrows don’t point in the right direction
nothing is wrong

Part B What’s wrong with gradient field 2?

arrows point down the hill instead of up it
magnitudes of the arrows are wrong, but the direction is right
arrows don’t point in the right direction
nothing is wrong

Part C What’s wrong with gradient field 3?

arrows point down the hill instead of up it
magnitudes of the arrows are wrong, but the direction is right
arrows don’t point in the right direction
nothing is wrong

Part D What’s wrong with gradient field 4?

arrows point down the hill instead of up it
magnitudes of the arrows are wrong, but the direction is right
arrows don’t point in the right direction
nothing is wrong

Exercise 25.10

Here are contour maps and gradient fields of several functions with inputs $x$ and $y$. Any row of graphs may show two different functions. Your job is to match each contour plot with its gradient field, which may be in another row.

Part A Which contour plot matches gradient field 1?

A B C D E F

Part B Which contour plot matches gradient field 2?

A B C D E F

Part C Which contour plot matches gradient field 3?

A B C D E F

Part D Which contour plot matches gradient field 4?

A B C D E F

Part E Which contour plot matches gradient field 5?

A B C D E F

Part F Which contour plot matches gradient field 6?

A B C D E F

Exercise 25.12

Using the gradient field depicted below, figure out the sign of the partial derivatives at the labeled points. We will use “neg” to refer to negative partial derivatives, “pos” to refer to positive partial derivatives, and “zero” to refer to partials that are so small that you cannot visually distinguish them from zero.

Part A Which is $\partial_y f$ at point A?

neg zero pos

Part B Which is $\partial_x f$ at point A?

neg zero pos

Part C Which is $\partial_x f$ at point B?

neg zero pos

Part D Which is $\partial_x f$ at point C?

neg zero pos

Part E Which is $\partial_y f$ at point E?

neg zero pos

Part F Which is $\partial_x f$ at point E?

neg zero pos

Part G At which letter are both the partial with respect to $x$ and the partial with respect to $y$ negative?

A B C D E F none of them

Exercise 25.13

The graph depicts the effect of a drug on heart rate. Each of the lines shows heart rate as a function of dose for a given age. The drug’s effect depends both on the dose of the drug and on the age of the person taking the drug.

Using the information shown in the graph, estimate numerically each of the following partial derivatives, giving proper units for each. (Heart rate is measured in “bpm” (beats per minute) and dose in “mg”.)

The derivative $\partial_\text{dose} HR$
1. at dose$=275$, age$=20$
2. at dose$=275$, age$=30$
3. at dose$=275$, age$=40$
$\partial_\text{age} HR$ at dose$=275$, age$=30$.
$\partial_\text{dose} \partial_\text{dose} HR$ at dose$=275$, age$=30$.
$\partial_\text{age} \partial_\text{age} HR$ at dose$=275$, age$=30$.
$\partial_\text{age} \partial_\text{dose} HR$ at dose$=275$, age$=30$.

The graphic above shows three slices through the function HR(dose, age): one for age 20, one for age 30, and one for age 40. This is a standard graphical format in the technical literature, sometimes called an interaction plot since it emphasizes the interaction between age and dose.
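
Because the slices are drawn only at ages 20, 30, and 40, the partial derivatives involving age have to be estimated from differences between slices. One possible approach, sketched here under the assumption that centered differences across the three plotted ages are accurate enough, is

\[\partial_\text{age} HR(275, 30) \approx \frac{HR(275, 40) - HR(275, 20)}{40 - 20}\]

\[\partial_\text{age} \partial_\text{age} HR(275, 30) \approx \frac{HR(275, 40) - 2\, HR(275, 30) + HR(275, 20)}{10^2}\]

\[\partial_\text{age} \partial_\text{dose} HR(275, 30) \approx \frac{\partial_\text{dose} HR(275, 40) - \partial_\text{dose} HR(275, 20)}{40 - 20}\]

The values of $HR$ and of the slopes $\partial_\text{dose} HR$ on the right-hand sides are read from the graph.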

Of course, we can plot the same function HR(dose, age) as a contour plot.

Which contour plot matches the interaction graph?
On the correct contour plot, draw the paths that correspond to the graph of HR versus dose for each of ages 20, 30, and 40 years.

Exercise 25.14

For almost everyone, a house is too expensive to buy with cash, so people need to borrow money. The usual form of the loan is called a “mortgage”. Mortgages extend over many years and involve paying a fixed amount each month. That amount is calculated so that, by paying it each month for the duration of the mortgage, the last payment will completely repay the amount borrowed plus the accumulated interest.

The monthly mortgage payment in dollars, $P$, for a house is a function of three quantities, \[P(A, r, N)\] where $A$ is the amount borrowed in dollars, $r$ is the interest rate (percentage points per year), and $N$ is the number of years before the mortgage is paid off.

A studio apartment is selling for $220,000. You will need to borrow $184,000 to make the purchase.

Part A Suppose $P(184000,4,10) = 2180.16$. What does this tell you in financial terms?

The monthly cost of borrowing $184,000 for 10 years at 4% interest per year.
The monthly cost of borrowing $184,000 for 4 years at 10% interest per year.
The annual cost of the mortgage at 4% interest for 10 years.
The annual cost of the mortgage at 10% interest for 4 years.

The next two questions involve what happens to the monthly mortgage payments if you change either the amount or duration of the mortgage. (Hint: Common sense works wonders!)

Part B What would you expect about the quantity $\partial P / \partial A$, the partial derivative of the monthly mortgage payment with respect to the amount of money borrowed?

It is positive It is zero It is negative

Part C What would you expect about the quantity $\partial P / \partial N$, the partial derivative of the monthly mortgage payment with respect to the number of years the mortgage lasts?

It is positive It is zero It is negative

Part D Suppose $\partial_r P (184000,4,30) =$ $145.65. What is the financial significance of the number $145.65?

If the interest rate $r$ went up from 4 to 5, the monthly payment would increase by $145.65.
If the interest rate $r$ went up from 4 to 4.001, the monthly payment would increase by $145.65.
If the interest rate $r$ went up from 4 to 4.001, the monthly payment would increase by $0.001 \times$ $145.65.

Exercise 25.16

In economic theory, the quantity demanded of any good is a decreasing function of the price of that good and an increasing function of the price of a competing good.

The classical example is that apple juice competes with orange juice. The demand for orange juice is in units of thousands of liters of orange juice. The price is in units of dollars per liter.

Here’s a graph with the input quantities unlabeled. The contour labels indicate the demand for orange juice.

The concept of partial derivatives makes it much easier to think about the situation. There are two partial derivative functions relevant to the function in the graph. We’ll denote the inputs apple and orange, but remember that these are the prices of those commodities in dollars per liter.

$\partial_\text{apple} \text{demand}()$ – how the demand changes when apple-juice price goes up, holding orange-juice price constant. (Another notation, more verbose but perhaps easier to read, is $\frac{\partial\, \text{demand}}{\partial\,\text{apple}}$.)
$\partial_\text{orange} \text{demand}()$ – how the demand changes when orange-juice price goes up, holding apple-juice price constant. (Another notation: $\frac{\partial\, \text{demand}}{\partial\,\text{orange}}$)

Notice that the notation names both the output and the single input that is to be changed; the other inputs are held constant.

The first paragraph of this problem gives the economic theory, which amounts to saying that one of the partial derivatives is positive and the other negative.

Part A What is the proper translation of the notation $\partial_\text{apple}\text{demand}()$?

The partial derivative of orange-juice demand with respect to apple-juice price
The partial derivative of apple-juice price with respect to demand for orange juice
The partial derivative of apple-juice demand with respect to price of apple juice
The partial derivative of orange-juice price with respect to apple-juice price.

Part B According to the economic theory described above, one of the partial derivatives will be positive and the other negative. Which will be positive?

$\partial_\text{apple} \text{demand}()$
$\partial_\text{orange} \text{demand}()$

Part C What does the vertical axis measure?

Price of orange juice
Quantity of apple juice
Quantity of orange juice
Price of apple juice

Part D Consider the magnitude (absolute value) of the partial derivative of demand with respect to orange-juice price. Is this magnitude greater toward the top of the graph or the bottom?

top bottom neither

Exercise 25.18

The contour plot of function $g(y, z)$ is overlaid with vectors. The black vector is a correct representation of the gradient (at the root of the vector). The other vectors are also supposed to represent the gradient, but might have something wrong with them (or might not). Your job is to say what’s wrong with each of those vectors.

Part A What’s wrong with the red vector?

nothing
too long
too short
points downhill
points uphill
wrong direction entirely

Part B What’s wrong with the green vector?

nothing
too long
too short
points downhill
points uphill
wrong direction entirely

Part C What’s wrong with the blue vector?

nothing
too long
too short
points downhill
points uphill
wrong direction entirely

Part D What’s wrong with the orange vector?

nothing
too long
too short
points downhill
points uphill
wrong direction entirely

Part E What’s wrong with the gray vector?

nothing
too long
too short
points downhill
points uphill

Exercise 25.20

Let’s return to the water skier in Section 25.5. When we left her, the rope was being pulled in at 10 feet per second and her corresponding speed on the water was 10.05 feet per second. The relationship between the rope speed and the skier’s speed was \[dx = \frac{L}{x} dL\] and, due to the right-angle configuration of the tow system, $x^2 = L^2 - H^2$.

What happens to $dx$ as $L$ gets smaller with $dL$ being the same? We need to keep in mind that $dx$ depends on three things: $dL$, $L$, and $x$. But we can substitute in the Pythagorean relationship between $x$, $L$, and (fixed) $H$ to get

\[dx = \frac{L}{\strut\sqrt{L^2 - H^2}}\ dL\ .\]

This is bad news for the skier who holds on too long! As $L$ approaches $H$, the rope becomes more and more vertical and the skier’s water speed becomes greater and greater, approaching $\infty$ as $L \rightarrow H$.
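
The relationship between $dx$ and $dL$ can be retraced in a few steps. This is a sketch that assumes $H$ is constant (so $dH = 0$) and $x > 0$. Differentiating the Pythagorean relationship gives

\[x^2 = L^2 - H^2 \ \implies\ 2x\, dx = 2L\, dL \ \implies\ dx = \frac{L}{x}\, dL = \frac{L}{\sqrt{L^2 - H^2}}\ dL\ .\]

As $L \rightarrow H$, the denominator $\sqrt{L^2 - H^2}$ shrinks to zero while the numerator stays near $H$, so the ratio $dx/dL$ grows without bound.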

You, an engineer brought in to address this danger, have proposed to have the winch slow down as the rope is reeled in. How should the speed of the rope be set so that the skier’s water speed remains safely constant?

Part A What formula for $dL$ will allow $dx$ to stay constant at a value $v$?

$dL = dx$
$dL = v \frac{\sqrt{L^2 - H^2}}{L}$
$dL = v \sqrt{L^2 - H^2}$
There is no such formula.

In describing the orientation of aircraft and ships, three parameters are used: pitch, roll, and yaw. For a geometrical plane (as opposed to an aircraft or ship, which have distinct front and back ends), yaw isn’t applicable.
The Latin phrase for this is ceteris paribus, often used in economics.
The meaning of “output” as “to produce” dates from more than 100 years after Newton’s death.
A “warning” is given in the textbook that the symbol $d$ will always be used to mark the differential of a variable quantity and that $d$ will never be used to indicate a parameter.