Making the grade: Part II

Chris Sangwin
January 2004

In Making the grade: Part I we considered what the gradient of a curve might mean, and how to find it by appealing directly to the definition. In particular, we used direct arguments - which were really quite involved - to calculate the gradients of the curves $x^2$ and $\sin(x)$. Performing this kind of calculation every time we need a gradient would be a nightmare - especially if we had a complicated function. In this article we think about the process of manipulating the algebraic expressions with which we usually describe functions in order to perform this calculation. This is differentiation as we know and love it!

The other kind of gradient. Image: DHD Photo Gallery

The whole point of having a set of formal rules is to allow us to temporarily forget the exact meaning and to concentrate on calculation. After all, we can only concentrate on a few things at a time. Of course, it is vital to keep the meaning in the back of our minds, as a check that the answer is sensible. Furthermore, by having a set of rules disjoint from a particular context, we can apply the rules in many different settings.

The rules allow us to differentiate just about any algebraic expression we care to write down. Of course we have to decide which formal rules to apply in a given situation, and in what order. Sometimes it is not clear which rule we should apply - there are a number of things we could do correctly. What should we do then? How should we decide? As we shall see, the answer to these questions is surprising and illustrates the intimate way in which calculus and algebra interact.

Calculating gradients using the calculus

In the previous article we calculated the gradient by considering the change in f divided by the change in x, that is,
\[ \frac{f(x+h)-f(x)}{h} \quad \mbox{for all } h \neq 0. \qquad (1) \]
When we carried out this calculation for the function $f(x)=x^2$ we obtained
\[ \frac{f(x+h)-f(x)}{h} = \frac{(x+h)^2 - x^2}{h} = \frac{x^2 + 2xh + h^2 - x^2}{h} = 2x + h. \]
Taking the limit as h tends to zero, from either the positive or the negative side, gave $2x$ as the derived function, or derivative, of $x^2$.
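If you want to see this limiting process in action, here is a minimal Python sketch (my own illustration, not part of the original argument) that evaluates the quotient (1) for $f(x)=x^2$ at $x=3$ and lets h shrink; the values close in on $2x=6$.

    # Evaluate the difference quotient (f(x+h) - f(x))/h for f(x) = x^2
    def f(x):
        return x**2

    x = 3.0
    for h in [0.1, 0.01, 0.001, 0.0001]:
        print(h, (f(x + h) - f(x)) / h)   # tends towards 2*x = 6 as h shrinks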

Let's generalize this and consider $f(x)=x^n$ where n is any natural number, that is, $n=1,2,3,\ldots$. Of course, we have already considered the cases when $n=1$ (the straight line) and when $n=2$ (the quadratic).

In order to calculate (1) when $f(x)=x^n$ we need to consider
\[ \frac{f(x+h)-f(x)}{h} = \frac{(x+h)^n - x^n}{h}. \]
To simplify this we need to expand out the term $(x+h)^n$. When we do this for small values of n we get
\[ \begin{array}{rcl}
(x+h)^0 &=& 1,\\
(x+h)^1 &=& x+h,\\
(x+h)^2 &=& x^2+2xh+h^2,\\
(x+h)^3 &=& x^3+3x^2h+3xh^2+h^3,\\
(x+h)^4 &=& x^4+4x^3h+6x^2h^2+4xh^3+h^4,\\
(x+h)^5 &=& x^5+5x^4h+10x^3h^2+10x^2h^3+5xh^4+h^5.
\end{array} \]
We could do this by hand for other values of n by multiplying out the brackets, but this is tricky, time-consuming and it is all too easy to slip up. In fact, there is a very regular pattern to the coefficients of the terms in the expansions above. If we ignore the x's and h's we obtain the pattern known as Pascal's Triangle, part of which is shown below.

\[ \begin{array}{c}
1 \\
1 \ 1 \\
1 \ 2 \ 1 \\
1 \ 3 \ {\color{blue} 3} \ {\color{blue} 1} \\
1 \ {\color{red} 4} \ {\color{red} 6} \ {\color{blue} 4} \ 1 \\
1 \ 5 \ {\color{red} 10} \ 10 \ 5 \ 1 \\
\end{array} \]
This pattern, which may be continued forever, is obtained by adding together two adjacent numbers in one row to generate the number below. For example, the blue 4 is generated by adding the 3 and 1 above it. Similarly, the red 10 is obtained by adding the 4 and 6 above it.
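If you would like to experiment, the following short Python sketch (an illustration of mine, not from the article) builds each row of the triangle by adding adjacent numbers in the row above, exactly as described.

    # Build rows of Pascal's Triangle by adding adjacent entries of the previous row
    def pascal_rows(num_rows):
        rows = [[1]]
        for _ in range(num_rows - 1):
            prev = rows[-1]
            # each interior entry is the sum of the two entries above it
            rows.append([1] + [prev[i] + prev[i + 1] for i in range(len(prev) - 1)] + [1])
        return rows

    for row in pascal_rows(6):
        print(row)   # the last row printed is [1, 5, 10, 10, 5, 1]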

In general, the number in position $k+1$ from the left on the nth row (counting the single 1 at the top as the 0th row) is given by the formula
\[ {}^nC_k = \frac{n!}{(n-k)!\,k!}. \]
These numbers are known as the binomial coefficients, because ${}^nC_k$ is the coefficient of $x^{n-k}h^k$ when we expand $(x+h)^n$. This result is known as the binomial theorem and it allows us to exploit this pattern to write $(x+h)^n$ as
\[ (x+h)^n = x^n + {}^nC_1 x^{n-1}h + {}^nC_2 x^{n-2}h^2 + {}^nC_3 x^{n-3}h^3 + \cdots + {}^nC_{n-2} x^2 h^{n-2} + {}^nC_{n-1} x h^{n-1} + h^n. \]

We are currently interested in calculating the quantity (1) when $f(x)=x^n$. To do this we note that, for all values of n,
\[ {}^nC_1 = \frac{n!}{(n-1)!\,1!} = \frac{n\times(n-1)\times(n-2)\times\cdots\times 2\times 1}{(n-1)\times(n-2)\times\cdots\times 2\times 1} = n. \]
Using this we have
\[ \frac{(x+h)^n - x^n}{h} = \frac{\left(x^n + {}^nC_1 x^{n-1}h + {}^nC_2 x^{n-2}h^2 + \cdots + h^n\right) - x^n}{h} = nx^{n-1} + h\times(\mbox{stuff}). \]
We don't really need to know exactly what the "stuff" is in the above expression. This is because it is multiplied by h, and letting h tend to zero wipes out all these terms. What we are left with is the function $nx^{n-1}$. So we express this result as
\[ \frac{d}{dx} x^n = nx^{n-1} \qquad (2) \]
whenever n is a natural number (i.e. $n=1,2,3,\ldots$). If $n=1$ in this argument we have $\frac{d}{dx} x = 1$, which confirms that the gradient of the straight line $f(x)=x$ is constant.
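For the sceptical, a tiny Python check of the binomial theorem (my own illustration; math.comb computes ${}^nC_k$) confirms that the expansion really does reproduce $(x+h)^n$.

    # Check that the sum of C(n,k) * x^(n-k) * h^k over k equals (x+h)^n
    from math import comb, isclose

    x, h, n = 2.0, 0.3, 5
    expansion = sum(comb(n, k) * x**(n - k) * h**k for k in range(n + 1))
    print(expansion, (x + h)**n, isclose(expansion, (x + h)**n))   # prints True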

This result may be extended so that (2) holds whenever n is a real number, although this takes a little more work.

To take another example, let $n=0$ (remember that $x^0=1$ for all x). So we have the formula $f(x)=x^0=1$. This is a horizontal straight line, which has gradient zero. Does our formula (2) agree?
\[ \frac{d}{dx} x^0 = 0\cdot x^{-1} = 0. \]
So the mechanical formula (2) again agrees with a simple case we can easily imagine. This is a useful check.
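We can also let the computer do the checking. The sketch below (again only an illustration) compares the difference quotient for $x^n$ with the formula $nx^{n-1}$ for several values of n, including the case $n=0$ just discussed.

    # Compare the difference quotient for x^n with the power rule n * x^(n-1)
    x, h = 1.5, 1e-6
    for n in [0, 1, 2, 3, 4]:
        numeric = ((x + h)**n - x**n) / h
        formula = n * x**(n - 1)
        print(n, round(numeric, 4), round(formula, 4))   # the two columns agree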

Constructing functions

Functions are fundamental to modern mathematics and you simply can't avoid using them. The idea of a function is to take two sets of objects, known as the inputs and the outputs. To every input the function assigns a unique output:
\[ \mbox{input} \longrightarrow \mbox{Function} \longrightarrow \mbox{output}. \]
Most often the inputs and outputs are sets of numbers, such as the real line. The function is also most often described using a formula, in the form of an algebraic expression. This is exactly the idea of a function we have considered so far, although we haven't been explicit about it! This is also the way we will continue to think about functions.

The reason we pause now to think of functions in a more abstract way is simply to acknowledge that a function is much more general than a formula. In fact, the function $|x|$ introduced in the previous article was built from two formulae bolted together. Recall that
\[ |x| := \left\{ \begin{array}{rl} x, & x \geq 0, \\ -x, & x < 0. \end{array} \right. \]
The trigonometric functions $\sin(x)$, $\cos(x)$, etc. are constructed with reference to a geometrical shape - in this case a circle of radius 1. Other ways of building functions involve an infinite series (that is, a sum) or a sequence of formulae. We won't consider these in this article, but will just concentrate on how we can build up functions from simple operations.

Let us assume our input is a number x. The simplest operations we could perform on our variable are the arithmetic ones, that is, addition, multiplication and the two inverse operations of subtraction and division. Because we can manipulate the formulae using algebra we can often write one formula in different ways.

For example, consider
\[ f(x) = x^2(3-x^2) = 3x^2 - x^4, \qquad (3) \]
which is shown in Figure 1 below.

Figure 1: The function (3)

We can think of f(x) as $x^2$ multiplied by $3-x^2$. Both these functions are shown in Figure 2. Try to imagine what happens when you multiply the values on each graph together.

Figure 2: The functions $x^2$ and $3-x^2$

Alternatively, to calculate f(x) we might subtract $x^4$ from $3x^2$. The graphs of these functions are shown in Figure 3. Since $f(x)=3x^2-x^4$ we can recreate Figure 1 by subtracting one from the other. No doubt there are other ways of constructing the same function f.


Figure 3: The functions $x^4$ and $3x^2$

Functions can also be applied in order, one after the other, as in
\[ \mbox{input} \longrightarrow \mbox{Function 1} \longrightarrow \mbox{Function 2} \longrightarrow \mbox{output}. \]
So, our function (3) can be thought of as applying the function $g(u)=u(3-u)$ to the result of the function $u(x)=x^2$.

Figure 4: The function $\sin(x^2)$

Of course, we have to ensure that any output from Function 1 is a legitimate input for Function 2. In this case we say the two functions have been composed. For example, a function such as $\sin(x^2)$ can be thought of as the function that maps x to $\sin(x)$ applied to the result of the function that maps x to $x^2$. Note that the order really does matter here, and $\sin(x^2)$ and $\sin(x)^2$ are very different functions: see Figures 4 and 5.

Figure 5: The function $\sin(x)^2$
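Translating this comparison into a few lines of Python (purely my own illustration) makes the point about order concrete: "square, then take sin" is not the same function as "take sin, then square".

    # Order of composition matters: sin(x^2) is not the same as sin(x)^2
    from math import sin

    def square(x):
        return x**2

    x = 1.2
    print(sin(square(x)))   # sin(x^2): square first, then apply sin
    print(square(sin(x)))   # sin(x)^2: apply sin first, then square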

Given the numerous ways we could express a function such as (3), how should we go about differentiating it? This is the question we address in the rest of this article.

Linearity of the differential calculus

You don't always get the same result if you do things in a different order!

The first general rule allows us to calculate the derivative of two functions which have been added together. If we want to find the gradient of f(x)+g(x) we simply find the gradients of f(x) and g(x) separately and then add the results. In a more condensed (and easier to read) form this may be expressed as:

\[ \frac{d}{dx}\left( f(x) + g(x) \right) = \frac{d}{dx} f(x) + \frac{d}{dx} g(x). \qquad (4) \]
Similarly, if f(x) is multiplied by a constant a then
\[ \frac{d}{dx}\, a f(x) = a\, \frac{d}{dx} f(x). \qquad (5) \]
Together the two rules above are known as linearity, and they allow us to easily calculate the derivative of any polynomial p(x), by breaking it down into constant multiples of $x^n$ for various n and then applying (2). This is powerful indeed.

Example

For example, to calculate the gradient of the function f defined in (3), we write it in the unfactored form $3x^2 - x^4$ and can then apply the rules as follows:
\[ \begin{array}{rcll}
\frac{d}{dx}\left( 3x^2 - x^4 \right) &=& \frac{d}{dx}\, 3x^2 + \frac{d}{dx}(-x^4) & \mbox{using (4),}\\
&=& 3\,\frac{d}{dx} x^2 - \frac{d}{dx} x^4 & \mbox{using (5),}\\
&=& 6x - 4x^3 & \mbox{using (2) twice.}
\end{array} \]
Any book on calculus will contain many similar examples and exercises for you to practise.

Before we go any further, we need a word of warning about notation. In particular, there are many ways of writing the derivative of a function f at the point x, and different authors have different preferences. So far we have used the notation $\frac{d}{dx}f(x)$, which was promoted by Leibniz. Other common notations are $f'(x)$, due to Lagrange, and Newton's $\dot{f}(x)$. Although neater in some circumstances, it is very easy to misread a dot or a dash, and so care is needed. We will use both kinds of notation.
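Returning to the worked example, linearity and the power rule together make differentiating a polynomial entirely mechanical: store the coefficients in a list and multiply each one by its power. The Python sketch below is my own illustration of this, applied to $3x^2 - x^4$.

    # Differentiate a polynomial using linearity and the power rule.
    # coeffs[k] is the coefficient of x^k, so 3x^2 - x^4 is [0, 0, 3, 0, -1].
    def differentiate(coeffs):
        return [k * coeffs[k] for k in range(1, len(coeffs))]

    print(differentiate([0, 0, 3, 0, -1]))   # [0, 6, 0, -4], i.e. 6x - 4x^3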

General rules

Linearity, which is expressed in the formulae (4) and (5), together with our result (2), allows us to calculate the derivative of any polynomial by breaking it into separate parts. In fact (4) and (5) involve two general functions. What would be really useful would be two rules which allow us to calculate gradients when general functions are multiplied or composed together, that is to say, rules which allow us to find
\[ \frac{d}{dx}\left( f(x) \times g(x) \right) \qquad (6) \]
and
\[ \frac{d}{dx} f(g(x)), \qquad (7) \]
where f and g are any differentiable functions. We make a huge assumption in believing that such general rules really exist. However, if they do, then the rules applied to $x^n$, written in various different ways, must respect the result (2). For example, $x^4$ may be written as $x^2 \times x^2$, or as $x \times x^3$. The rule for (6), if it exists, must give $4x^3$ when applied to each of these ways of writing $x^4$. Otherwise we could obtain different answers for the derivative. So, we look at different ways of writing $x^n$ as a product, and try to find a rule which is consistent, at least for these.

Let's start by defining $F(x)=x^n$ and splitting this up into
\[ f(x) = x^m \quad \mbox{and} \quad g(x) = x^{n-m}, \]
so that $F(x)=f(x)\times g(x)$. We know using (2) that
\[ F'(x) = nx^{n-1}, \quad f'(x) = mx^{m-1} \quad \mbox{and} \quad g'(x) = (n-m)x^{n-m-1}. \]
Our task is to write $F'(x)$ in terms of $f'(x)$ and $g'(x)$, as an attempt to gain some insight into what the general rule (6) might be. That is, we write
\[ F'(x) = A(x)\,f'(x) + B(x)\,g'(x), \]
where A and B are unknown functions of x. Now, using algebra we can confirm that
\[ nx^{n-1} = x^{n-m}\times mx^{m-1} + x^m\times(n-m)x^{n-m-1}. \]
Thus if we take $A(x)=x^{n-m}=g(x)$ and $B(x)=x^m=f(x)$ we have a correct general rule whenever we split $x^n$ into the product of $x^m$ and $x^{n-m}$. This rule may be written as
\[ F'(x) = f'(x)g(x) + f(x)g'(x). \qquad (8) \]
Immediately, by linearity, it follows that (8) holds for any polynomial.

Can we find a rule for general functions, like $\sin(x)$, which are not polynomials? Certainly, if a general rule exists, when we apply it to $x^n = x^m \times x^{n-m}$ the rule should reduce to (8). In fact, the rule for general functions turns out to be precisely (8), and is better known as the Product Rule.

Similarly, if we write $F(x) = x^{nm} = (x^n)^m$ and
\[ f(x) = x^n \quad \mbox{and} \quad g(x) = x^m, \]
we have $F(x) = g(f(x))$. Again, we know using (2) that
\[ F'(x) = nm\,x^{nm-1}, \quad f'(x) = nx^{n-1} \quad \mbox{and} \quad g'(x) = mx^{m-1}. \]
Our task is to write $F'(x)$ in terms of $f'(x)$ and $g'(x)$, as an attempt to gain some insight into what the general rule (7) might be. But
\[ F'(x) = nm\,x^{nm-1} = nx^{n-1}\times m\left(x^n\right)^{m-1} = f'(x)\,g'(f(x)), \]
which suggests a general rule
\[ F'(x) = f'(x)\,g'(f(x)). \qquad (9) \]

There's a rule for everything - even chains! Image: DHD Photo Gallery

In fact, the rule for general functions turns out to be precisely (9) again, and is better known as the Chain Rule.

These rules are really very comprehensive, and proofs may be found in any textbook on advanced calculus. A huge range of functions can be built up, and conversely decomposed, using these rules. When trying to differentiate a complicated function, the method is to decompose it into simpler components and work with these separately. The derivatives of these simple parts can then be recombined using the general rules to find the derivative of the original function. For completeness we state these rules again below, in two common forms of notation, and for each we give a worked example.

The chain rule

The following rule allows us to differentiate functions built up by composition. Let's assume that we have some function which we can choose to write as a composition $g(f(x))$. Let $u=f(x)$; then
\[ \frac{d}{dx}\left( g(f(x)) \right) = \left( \frac{d}{dx} f(x) \right)\left( \frac{d}{du} g(u) \right), \qquad (10) \]
or, using the alternative notation,
\[ \left( g(f(x)) \right)' = f'(x)\,g'(f(x)). \]

Example:

Let us return to the function $\sin(x^2)$. Writing $g(u)=\sin(u)$ and $u=x^2$, we already know that
\[ \frac{d}{du}\sin(u) = \cos(u) \quad \mbox{and} \quad \frac{d}{dx} x^2 = 2x. \]
Substituting these into (10) gives
\[ \frac{d}{dx}\sin(x^2) = 2x\cos(x^2). \]
Note: we have $\cos(x^2)$ here, not $\cos(x)$, because $u=x^2$.
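A quick numerical spot-check of this result (my own addition, not from the article): compare $2x\cos(x^2)$ with a difference quotient for $\sin(x^2)$ at a sample point.

    # Check d/dx sin(x^2) = 2x * cos(x^2) against a difference quotient
    from math import sin, cos

    x, h = 0.7, 1e-6
    numeric = (sin((x + h)**2) - sin(x**2)) / h
    print(numeric, 2*x*cos(x**2))   # the two values agree to several decimal places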

The product rule

Functions can also be built up from components which are multiplied together, for example $f(x)\times g(x)$. To differentiate this we use the formula
\[ \frac{d}{dx}\left( f(x)\times g(x) \right) = \left( \frac{d}{dx} f(x) \right)\times g(x) + f(x)\times\left( \frac{d}{dx} g(x) \right), \qquad (11) \]
or, using the alternative notation,
\[ \left( f(x)\times g(x) \right)' = f'(x)g(x) + f(x)g'(x). \]

Example:

This time consider the function $x^2\sin(x)$. This can be decomposed into the product of the two functions $x^2$ and $\sin(x)$, each of which we know how to differentiate. Therefore we can use the rule (11) immediately to write
\[ \frac{d}{dx}\, x^2\sin(x) = 2x\sin(x) + x^2\cos(x). \]
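Because rule (11) only needs the two factors and their derivatives, it is easy to package as a small helper. The sketch below (an illustration of mine, with made-up names) builds the derivative of a product and checks the $x^2\sin(x)$ example at a sample point.

    # Generic product rule: the derivative of f*g built from f, f', g and g'
    from math import sin, cos

    def product_rule(f, fprime, g, gprime):
        return lambda x: fprime(x)*g(x) + f(x)*gprime(x)

    d = product_rule(lambda x: x**2, lambda x: 2*x, sin, cos)
    x = 0.7
    print(d(x), 2*x*sin(x) + x**2*cos(x))   # both print the same value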

The quotient rule

Functions can also be built up from components which are divided one by the other. Actually, since dividing by a is identical to multiplying by $\frac{1}{a}$, to work out the derivative of $\frac{f(x)}{g(x)}$ we could just apply the product rule to $f(x)\times\frac{1}{g(x)}$, and the chain rule to $g(x)$ composed with $\frac{1}{x}=x^{-1}$. But it is convenient to have a separate rule for this, which is
\[ \frac{d}{dx}\left[ \frac{f(x)}{g(x)} \right] = \frac{g(x)\left[ \frac{d}{dx} f(x) \right] - \left[ \frac{d}{dx} g(x) \right] f(x)}{g(x)^2}, \qquad (12) \]
or, using the alternative notation,
\[ \left[ \frac{f(x)}{g(x)} \right]' = \frac{g(x)f'(x) - g'(x)f(x)}{g(x)^2}. \]

Example:

This time we differentiate $\frac{\sin(x^2)}{x^2}$. We know how to differentiate $\sin(x^2)$, even though it is itself a composition: we applied (10) to it above and found the derivative $2x\cos(x^2)$. Using (12) gives
\[ \frac{d}{dx}\left[ \frac{\sin(x^2)}{x^2} \right] = \frac{x^2\left[ \frac{d}{dx}\sin(x^2) \right] - \left[ \frac{d}{dx} x^2 \right]\sin(x^2)}{(x^2)^2} = \frac{x^2\times 2x\cos(x^2) - 2x\sin(x^2)}{x^4} = \frac{2x^2\cos(x^2) - 2\sin(x^2)}{x^3}. \]
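As a check that the algebra has been carried out correctly, the short sketch below (again just my own illustration) compares the simplified expression with a difference quotient for $\frac{\sin(x^2)}{x^2}$.

    # Check d/dx [sin(x^2)/x^2] = (2x^2*cos(x^2) - 2*sin(x^2)) / x^3 numerically
    from math import sin, cos

    def f(x):
        return sin(x**2) / x**2

    x, h = 0.9, 1e-6
    numeric = (f(x + h) - f(x)) / h
    formula = (2*x**2*cos(x**2) - 2*sin(x**2)) / x**3
    print(numeric, formula)   # the values agree closely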

Applying the rules

To finish, instead of giving lots of different examples (which can be found in any calculus text), we take the reverse approach and think about one example in more detail. In particular, we return to the function $x^4$. We know from the previous result (2) that the derivative of $x^4$ is $4x^3$. We write this as
\[ \frac{d}{dx} x^4 = 4x^3. \]
Using the rules of algebra we could write this function in a number of ways. We can think of $x^4 = x^2\times x^2$. Alternatively, $x^4 = x\times x^3$, or even $x^4 = x\times x\times x\times x$.

Taking the first of these, since the derivative of $x^2$ is $2x$, we may apply the product rule to give
\[ \frac{d}{dx}\left( x^4 \right) = \frac{d}{dx}\left( x^2\times x^2 \right) = 2x\times x^2 + x^2\times 2x = 2x^3 + 2x^3 = 4x^3. \]
We could also apply the product rule to one of the other representations of this function. In particular, we could calculate the derivative as
\[ \frac{d}{dx}\left( x^4 \right) = \frac{d}{dx}\left( x\times x^3 \right) = 1\times x^3 + x\times 3x^2 = x^3 + 3x^3 = 4x^3. \]
Similarly, we could think of $x^4$ as a composition of two functions: "x-squared, all squared", that is to say, $x^4 = (x^2)^2$. In this case we may apply the chain rule to see that
\[ \frac{d}{dx}\left( x^4 \right) = \frac{d}{dx}\left( (x^2)^2 \right) = 2(x^2)\times 2x = 4x^3. \]
Notice that in each case we get the same correct answer.

We need not stop there. For example, we could write $x^4 = \frac{x^6}{x^2}$ and apply the quotient rule. If we do this we have
\[ \frac{d}{dx}\left( \frac{x^6}{x^2} \right) = \frac{x^2\times\frac{d}{dx}(x^6) - \frac{d}{dx}(x^2)\times x^6}{(x^2)^2} = \frac{x^2\times 6x^5 - 2x\times x^6}{x^4} = \frac{6x^7 - 2x^7}{x^4} = 4x^3. \]
The point is that in order to find the derivative of $x^4$ we may do anything algebraically legitimate and apply any of the rules for differentiation correctly. The way we choose to find the derivative, as long of course as it is applied correctly, does not matter. We could ask: how many ways are there of differentiating $x^4$? Can you think of others?
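For the curious, here is one last Python sketch (my own) that evaluates each of the routes above at a sample point; every decomposition gives the same value, $4x^3$.

    # Four routes to the derivative of x^4, all agreeing with 4*x^3
    x = 1.3

    power_rule = 4 * x**3
    product_a  = 2*x * x**2 + x**2 * 2*x                 # x^4 = x^2 * x^2
    product_b  = 1 * x**3 + x * 3*x**2                   # x^4 = x * x^3
    chain      = 2*(x**2) * 2*x                          # x^4 = (x^2)^2
    quotient   = (x**2 * 6*x**5 - 2*x * x**6) / x**4     # x^4 = x^6 / x^2

    print(power_rule, product_a, product_b, chain, quotient)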

About the author


Chris in the Volcanoes National Park, Hawaii, Summer 2003

Chris Sangwin is a member of staff in the School of Mathematics and Statistics at the University of Birmingham. He is a Research Fellow in the Learning and Teaching Support Network centre for Mathematics, Statistics, and Operational Research. His interests lie in mathematical Control Theory.

Chris would like to thank Mr Martin Brown, of Thomas Telford School, for his helpful advice and encouragement during the writing of this article.