In Making the grade: Part I in Issue 27 of Plus we considered what the gradient of a curve might mean, and how to find it by appealing directly to the definition. In particular, we used direct arguments - which were really quite involved - to calculate the gradients of the curves $x^2$ and $\sin(x)$. To perform this kind of calculation every time we need to calculate such a gradient would be a nightmare - especially if we had a complicated function. In this article we think about the process of manipulating the algebraic expressions with which we usually describe functions in order to perform this calculation. This is differentiation as we know and love it!
The other kind of gradient.
Image DHD Photo Gallery
The whole point of having a set of formal rules is to allow us to temporarily forget the exact meaning and to concentrate on calculation. After all, we can only concentrate on a few things at a time. Of course, it is vital to keep the meaning in the back of our minds, as a check that the answer is sensible. Furthermore, by having a set of rules disjoint from a particular context, we can apply the rules in many different settings.
The rules allow us to differentiate just about any algebraic expression we care to write down. Of course we have to decide which formal rules to apply in a given situation, and in what order. Sometimes it is not clear which rule we should apply - there are a number of things we could do correctly. What then to do? How should we decide? As we shall see, the answer to these questions is surprising and illustrates the intimate way in which calculus and algebra interact.
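Calculating gradients using the calculus
In the previous article we calculated the gradient by considering the change in $y$ divided by the change in $x$, that is,
$$\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}. \qquad (1)$$
Let’s generalize this and consider $f(x) = x^n$ where $n$ is any natural number, that is, $n = 1, 2, 3, \ldots$. Of course, we have already considered the cases when $n = 1$ (the straight line) and when $n = 2$ (the quadratic).
In order to calculate (1) when $f(x) = x^n$ we need to consider
$$\frac{(x+h)^n - x^n}{h}.$$
To simplify this we need to expand out the term $(x+h)^n$.
When we do this for small values of $n$ we get
$$\begin{aligned}
(x+h)^1 &= x + h, \\
(x+h)^2 &= x^2 + 2xh + h^2, \\
(x+h)^3 &= x^3 + 3x^2h + 3xh^2 + h^3, \\
(x+h)^4 &= x^4 + 4x^3h + 6x^2h^2 + 4xh^3 + h^4.
\end{aligned}$$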
We could do this by hand for other values of $n$ by multiplying out the brackets, but this is tricky, time-consuming and it is all too easy to slip up. In fact, there is a very regular pattern to the coefficients of the terms in the expansions above. If we ignore the $x$’s and $h$’s we obtain the pattern known as Pascal’s Triangle, part of which is shown below.
$$\begin{array}{c}
1 \\
1 \quad 1 \\
1 \quad 2 \quad 1 \\
1 \quad 3 \quad 3 \quad 1 \\
1 \quad 4 \quad 6 \quad 4 \quad 1 \\
1 \quad 5 \quad 10 \quad 10 \quad 5 \quad 1
\end{array}$$
This pattern, which may be continued forever, is obtained by adding together two adjacent numbers in one row to generate the number below. For example, each $3$ is generated by adding the $1$ and $2$ above it. Similarly, each $10$ is obtained by adding the $4$ and $6$ above it.
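In general, the number in the position $k$ in from the left on the $n$th row (counting both rows and positions from $0$) is given by the formula
$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$
These numbers are known as the binomial coefficients, because $\binom{n}{k}$ is the coefficient of $x^{n-k}h^k$ when we expand $(x+h)^n$. This result is known as the binomial theorem and it allows us to exploit this pattern to write $(x+h)^n$ as
$$(x+h)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} h^k.$$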
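The two descriptions of Pascal's Triangle, adding adjacent entries and the factorial formula, can be compared in a short Python sketch. The function name `pascal_rows` is an illustrative choice, not something from the article, and rows are counted from 0.

```python
from math import comb


def pascal_rows(count):
    """Return the first `count` rows of Pascal's Triangle (rows counted from 0)."""
    rows = [[1]]
    while len(rows) < count:
        prev = rows[-1]
        # Each interior entry is the sum of the two adjacent entries above it.
        middle = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + middle + [1])
    return rows[:count]


for n, row in enumerate(pascal_rows(6)):
    # math.comb(n, k) evaluates n! / (k! (n-k)!), the binomial coefficient.
    assert row == [comb(n, k) for k in range(n + 1)]
    print(row)
```

The assertion inside the loop checks that the row built by addition agrees with the formula for every position in the first six rows.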
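We are currently interested in calculating the quantity (1) when $f(x) = x^n$. To do this we note that, for all values of $n$,
$$\binom{n}{0} = 1 \quad\text{and}\quad \binom{n}{1} = n.$$
Using this we have
$$\frac{(x+h)^n - x^n}{h} = \frac{x^n + n x^{n-1} h + h^2(\text{stuff}) - x^n}{h} = n x^{n-1} + h(\text{stuff}).$$
We don’t really need to know exactly what the "stuff" is in the above expression in this case. This is because it is multiplied by $h$, and letting $h$ tend to zero wipes out all these terms. What we are left with is the function $n x^{n-1}$. So we express this result as
$$\frac{d}{dx} x^n = n x^{n-1} \qquad (2)$$
whenever $n$ is a natural number (i.e., $n = 1, 2, 3, \ldots$). If $n = 1$ in this argument we have
$$\frac{d}{dx} x = 1,$$
which confirms that the gradient of the straight line $y = x$ is constant.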
This result may be extended so that (2) holds whenever $n$ is a real number, although this takes a little more work.
To take another example, let $n = 0$ (remember that $x^0 = 1$ for all $x \neq 0$). So we have the formula
$$\frac{d}{dx} 1 = 0.$$
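To see the limiting argument at work numerically, the following Python sketch evaluates the difference quotient for $x^n$ with shrinking $h$ and compares it with $n x^{n-1}$. The values $n = 5$ and $x = 2$ and the helper name `difference_quotient` are illustrative assumptions.

```python
def difference_quotient(f, x, h):
    """Change in f divided by change in x over the interval [x, x + h]."""
    return (f(x + h) - f(x)) / h


n, x = 5, 2.0
exact = n * x ** (n - 1)  # result (2): the gradient of x^n is n * x^(n-1)

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(lambda t: t ** n, x, h)
    # The gap shrinks roughly in proportion to h, as the "h (stuff)" term predicts.
    print(f"h={h:g}  quotient={approx:.6f}  error={abs(approx - exact):.6f}")
```

Very small values of $h$ eventually run into floating-point rounding, so this is a check on the formula rather than a substitute for the algebra.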
Functions are fundamental to modern mathematics and you simply can’t avoid using them. The idea of a function is to take two sets of objects known as the inputs and outputs. To every input the function assigns a unique output:
Most often the inputs and outputs are sets of numbers, such as the real line. The function is also most often described using a formula, in the form of an algebraic expression. This is exactly the idea of a function we have considered so far, although we haven’t been explicit about it! This is also the way we will continue to think about functions.
The reason we pause now to think of functions in a more abstract way is simply to acknowledge that a function is much more general than a formula. In fact, the function $|x|$ introduced in the previous article was built from two formulae bolted together. Recall these were
$$|x| = \begin{cases} x & \text{if } x \geq 0, \\ -x & \text{if } x < 0. \end{cases}$$
The trigonometric functions $\sin(x)$, $\cos(x)$, etc. are constructed with reference to a geometrical shape - in this case a circle of radius $1$. Other ways of building functions involve an infinite series (that is, a sum) or a sequence of formulae. We won’t consider these in this article, but just concentrate on how we can build up functions from simple operations.
Let us assume our input is a number $x$. The simplest operations we could perform on our variable $x$ are the arithmetic ones, that is, addition, multiplication and the two inverse operations of subtraction and division. Because we can manipulate the formulae using algebra we can often write one formula in different ways.
For example, consider
$$f(x) = x^2(3 - x^2) = 3x^2 - x^4. \qquad (3)$$
Figure 1: The function (3)
We can think of $f(x)$ as $x^2$ multiplied by $3 - x^2$. Both these functions are shown in Figure 2. Try to imagine what happens when you multiply the values on each graph together.
Figure 2: The functions $x^2$ and $3 - x^2$
Alternatively, to calculate $f(x)$ we might subtract $x^4$ from $3x^2$. The graphs of these functions are shown in Figure 3. Since $f(x) = 3x^2 - x^4$ we can recreate Figure 1 by subtracting one from the other. No doubt there are other ways of constructing the same function $f$.
Figure 3: The functions $x^4$ and $3x^2$
Functions can also be applied in order, one after the other, as in
Figure 4: The function $\sin(x^2)$
Of course, we have to ensure that any output from Function 1 is a legitimate input for Function 2. In this case we say the two functions have been composed. For example, a function such as $\sin(x^2)$ can be thought of as the function that maps $x$ to $\sin(x)$ applied to the result of the function that maps $x$ to $x^2$. Note that the order really does matter here and $\sin(x^2)$ and $\sin(x)^2$ are very different functions: see Figures 4 and 5.
Figure 5: The function $\sin(x)^2$
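A small Python sketch makes the point about order concrete. The helper `compose` and the names `sin_of_square` and `square_of_sin` are hypothetical, chosen here for illustration.

```python
import math


def compose(outer, inner):
    """Return the function x -> outer(inner(x)): apply `inner` first, then `outer`."""
    return lambda x: outer(inner(x))


def square(x):
    return x * x


sin_of_square = compose(math.sin, square)   # x -> sin(x^2)
square_of_sin = compose(square, math.sin)   # x -> sin(x)^2

for x in (0.5, 1.0, 2.0):
    print(x, sin_of_square(x), square_of_sin(x))
```

The same two building blocks give different outputs once the order of application is swapped, and $\sin(x)^2$ can never be negative whereas $\sin(x^2)$ can.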
Given the numerous ways we could express a function such as (3), how should we go about differentiating it? This is the question we address in the rest of this article.
Linearity of the differential calculus
You don't always get the same result if you do things in a different order!
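The first general rule allows us to calculate the derivative of two functions which have been added together. If we want to find the gradient of $f(x)+g(x)$ we simply find the gradients of $f(x)$ and $g(x)$ separately and then add the results. Similarly, multiplying a function by a constant $c$ multiplies its gradient by $c$. In a more condensed (and easier to read) form this may be expressed as:
$$\frac{d}{dx}\big(f(x) + g(x)\big) = \frac{df}{dx} + \frac{dg}{dx}, \qquad (4)$$
$$\frac{d}{dx}\big(c\,f(x)\big) = c\,\frac{df}{dx}. \qquad (5)$$
For example, to calculate the gradient of the function defined in (3), we write this as the unfactored form $3x^2 - x^4$ and can then apply the rules as follows:
$$\frac{d}{dx}\left(3x^2 - x^4\right) = 3\,\frac{d}{dx}x^2 - \frac{d}{dx}x^4 = 6x - 4x^3.$$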
Any book on calculus will contain many similar examples and exercises for you to practice.
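Linearity together with the power rule is enough to differentiate any polynomial mechanically. As a minimal sketch, assume a polynomial is stored as a Python list of coefficients `[a0, a1, a2, ...]`, where `a_k` multiplies $x^k$; this representation and the name `differentiate` are illustrative choices.

```python
def differentiate(coeffs):
    """Differentiate a polynomial given as [a0, a1, a2, ...], where a_k multiplies x**k.

    Each term a_k * x**k becomes k * a_k * x**(k-1) (the power rule), and
    linearity lets us treat the terms one at a time and add the results.
    """
    return [k * a for k, a in enumerate(coeffs)][1:]


f = [0, 0, 3, 0, -1]       # 3x^2 - x^4
print(differentiate(f))    # [0, 6, 0, -4], that is, 6x - 4x^3
```

A constant polynomial such as `[7]` returns the empty list, which stands for the zero polynomial in this representation.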
Before we go any further, we need a word of warning about notation. In particular, there are many ways of writing the derivative of a function $f$ at the point $x$. Different authors have different preferences. So far we have used the notation
$$\frac{df}{dx},$$
which was promoted by Leibniz. Another notation, used by Newton, has two forms:
$$f'(x) \quad\text{and}\quad \dot{f}(x).$$
Although neater in some circumstances, it is very easy to misread a dot or apostrophe and so care is needed. We will use both kinds of notation.
Linearity, which is expressed in the formulae (4) and (5), together with our result (2) allows us to calculate the derivative of any polynomial by breaking it into separate parts. In fact (4) and (5) involve general functions, not just powers of $x$. What would be really useful would be two rules which allow us to calculate gradients when general functions are multiplied or composed together, that is to say, rules which allow us to find
$$\frac{d}{dx}\big(f(x)\,g(x)\big) \qquad (6)$$
and
$$\frac{d}{dx}\,f(g(x)), \qquad (7)$$
where $f$ and $g$ are any differentiable functions. We make a huge assumption in believing that such general rules really exist. However, if they do then the rules applied to $x^n$ in various different ways must respect the result (2). For example, $x^n$ may be written as $x \cdot x^{n-1}$, or as $x^2 \cdot x^{n-2}$. The rule for (6), if it exists, must give $n x^{n-1}$ when applied to each of these ways of writing $x^n$. Otherwise we could obtain different answers for the derivative. So, we look at different ways of writing $x^n$ as a product, and try to find a rule which is consistent, at least for these.
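Let’s start by defining $y = x^n$ and split this up, for some natural number $m < n$, into
$$y = f(x)\,g(x), \quad\text{where}\quad f(x) = x^m \quad\text{and}\quad g(x) = x^{n-m}.$$
We know using (2) that
$$\frac{df}{dx} = m x^{m-1}, \qquad \frac{dg}{dx} = (n-m)\,x^{n-m-1}, \qquad \frac{dy}{dx} = n x^{n-1}.$$
Our task is to write $\frac{dy}{dx}$ in terms of $\frac{df}{dx}$ and $\frac{dg}{dx}$ as an attempt to gain some insight into what the general rule (6) might be. That is, we write
$$\frac{dy}{dx} = A\,\frac{df}{dx} + B\,\frac{dg}{dx},$$
where $A$ and $B$ are unknown functions of $x$. Now, using algebra we can confirm that
$$n x^{n-1} = x^{n-m} \cdot m x^{m-1} + x^{m} \cdot (n-m)\,x^{n-m-1}.$$
Thus if we take $A = g(x)$ and $B = f(x)$ we have a correct general rule whenever we split $x^n$ into a product of two powers in this way. This rule may be written as
$$\frac{d}{dx}\big(f(x)\,g(x)\big) = \frac{df}{dx}\,g(x) + f(x)\,\frac{dg}{dx}. \qquad (8)$$
Immediately, by linearity, it follows that (8) holds for any polynomial.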
Can we find a rule for general functions, like $\sin(x)$, which are not polynomials? Certainly, if a general rule exists, when we apply this to $x^n$, the rule should reduce to (8). In fact, the rule for general functions turns out to be precisely (8), and is better known as the Product Rule.
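Similarly, if we write
$$f(x) = x^n \quad\text{and}\quad g(x) = x^m,$$
we have $f(g(x)) = (x^m)^n = x^{mn}$. Again, we know using (2) that
$$\frac{df}{dx} = n x^{n-1}, \qquad \frac{dg}{dx} = m x^{m-1}, \qquad \frac{d}{dx} f(g(x)) = mn\,x^{mn-1}.$$
Our task is to write $\frac{d}{dx} f(g(x))$ in terms of $\frac{df}{dx}$ and $\frac{dg}{dx}$ as an attempt to gain some insight into what the general rule (7) might be. But
$$mn\,x^{mn-1} = n\,(x^m)^{n-1} \cdot m x^{m-1} = f'(g(x))\,g'(x),$$
which suggests a general rule
$$\frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x). \qquad (9)$$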
There's a rule for everything - even chains!
Image DHD Photo Gallery
In fact, the rule for general functions turns out to be precisely (9) again, and is better known as the Chain Rule.
These rules are really very comprehensive, and proofs may be found in any textbook on advanced calculus. A huge range of functions can be built up, and conversely decomposed, using these rules. When trying to differentiate a complicated function, the method is to decompose it into simpler components, and work with these separately. The derivatives of these simple parts can be recombined using the general rules to find the derivative of the original function. For completeness we state these rules again below, in two common forms of notation, and for each give a worked example.
The chain rule
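The following rule allows us to differentiate functions built up by compositions. Let’s assume that we have some function $f(x)$ which we can choose to write as a composition $f(x) = g(h(x))$. Let $u = h(x)$, then
$$\frac{df}{dx} = \frac{dg}{du} \cdot \frac{du}{dx} \qquad (10)$$
or, using alternative notation,
$$f'(x) = g'(h(x))\,h'(x).$$
Let us return and consider the function $\sin(x^2)$. Writing $g(u) = \sin(u)$ and $u = h(x) = x^2$, we already know that
$$\frac{dg}{du} = \cos(u) \quad\text{and}\quad \frac{du}{dx} = 2x.$$
Substituting these into (10) gives
$$\frac{d}{dx}\sin(x^2) = \cos(u) \cdot 2x = 2x\cos(x^2).$$
Note: we have $\cos(x^2)$ here, not $\cos(x)$, as we have $u = x^2$.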
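The warning in the note can be checked numerically. The sketch below estimates the gradient of $\sin(x^2)$ with a central difference (a standard numerical approximation, not the method used in the article) and compares it with both $2x\cos(x^2)$ and the tempting but wrong $2x\cos(x)$. The sample point $x = 1.3$ and the helper name `central_difference` are arbitrary choices.

```python
import math


def central_difference(f, x, h=1e-6):
    """Approximate the gradient of f at x using points either side of x."""
    return (f(x + h) - f(x - h)) / (2 * h)


x = 1.3
numeric = central_difference(lambda t: math.sin(t * t), x)
chain_rule = 2 * x * math.cos(x * x)   # cos evaluated at u = x^2
wrong = 2 * x * math.cos(x)            # cos evaluated at x by mistake

print("numerical estimate:", numeric)
print("2x cos(x^2):       ", chain_rule)
print("2x cos(x):         ", wrong)
```

The first two numbers agree to many decimal places, while the third does not.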
The product rule
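Functions can be built up by components which are multiplied together. For example, $f(x) = g(x)\,h(x)$. To differentiate this we use the formula
$$\frac{df}{dx} = \frac{dg}{dx}\,h(x) + g(x)\,\frac{dh}{dx} \qquad (11)$$
or, using alternative notation,
$$f'(x) = g'(x)\,h(x) + g(x)\,h'(x).$$
This time consider a function such as $x^2\sin(x)$. This can be decomposed into the two functions $g(x) = x^2$ and $h(x) = \sin(x)$, each of which we know how to differentiate. Therefore we can use the rule (11) immediately to write
$$\frac{d}{dx}\big(x^2\sin(x)\big) = 2x\sin(x) + x^2\cos(x).$$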
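As a sanity check on the product rule, the following sketch compares it with a numerical estimate of the gradient. The factors $g(x) = x^2$ and $h(x) = \sin(x)$ are assumed here purely for illustration, and `central_difference` is a hypothetical helper implementing a standard numerical approximation.

```python
import math


def central_difference(f, x, step=1e-6):
    """Approximate the gradient of f at x using points either side of x."""
    return (f(x + step) - f(x - step)) / (2 * step)


# Illustrative factors and their known derivatives (an assumption of this sketch).
def g(x):
    return x * x


def dg(x):
    return 2 * x


h, dh = math.sin, math.cos


def product_rule(x):
    """Derivative of g(x) * h(x) according to the product rule."""
    return dg(x) * h(x) + g(x) * dh(x)


for x in (0.5, 1.0, 2.0):
    numeric = central_difference(lambda t: g(t) * h(t), x)
    print(x, numeric, product_rule(x))
```

Swapping in other pairs of functions with known derivatives is a quick way to test one's own hand calculations.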
The quotient rule
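Functions can be built up by components which are divided one by the other. Actually, since dividing by $h(x)$ is identical to multiplying by $1/h(x)$, to work out the derivative of $g(x)/h(x)$ we could just apply the product rule to $g(x) \cdot \frac{1}{h(x)}$, and the chain rule to $1/x$ composed with $h(x)$. But it is convenient to have a separate rule for this, which is
$$\frac{d}{dx}\left(\frac{g(x)}{h(x)}\right) = \frac{h(x)\,\frac{dg}{dx} - g(x)\,\frac{dh}{dx}}{h(x)^2} \qquad (12)$$
or, using alternative notation,
$$\left(\frac{g}{h}\right)'(x) = \frac{g'(x)\,h(x) - g(x)\,h'(x)}{h(x)^2}.$$
This time we differentiate a quotient such as $\sin(x^2)/x$. We know how to differentiate $\sin(x^2)$, even though it is itself a composition. We applied (10) to this and it has derivative $2x\cos(x^2)$. Using (12) gives
$$\frac{d}{dx}\left(\frac{\sin(x^2)}{x}\right) = \frac{x \cdot 2x\cos(x^2) - \sin(x^2) \cdot 1}{x^2} = \frac{2x^2\cos(x^2) - \sin(x^2)}{x^2}.$$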
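The remark that the quotient rule follows from the product and chain rules can be written out in a few lines. In this sketch the numerator is called $g$ and the denominator $h$ (names assumed here), with $h(x) \neq 0$.

```latex
% Chain rule applied to 1/u composed with u = h(x):
\frac{d}{dx}\left(\frac{1}{h(x)}\right) = -\frac{1}{h(x)^2}\,h'(x).

% Product rule applied to g(x) multiplied by 1/h(x):
\frac{d}{dx}\left(g(x)\cdot\frac{1}{h(x)}\right)
  = g'(x)\,\frac{1}{h(x)} + g(x)\left(-\frac{h'(x)}{h(x)^2}\right)
  = \frac{g'(x)\,h(x) - g(x)\,h'(x)}{h(x)^2}.
```

So the quotient rule is a convenience rather than an independent fact: it packages one use of the chain rule and one use of the product rule.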
Applying the rules
To finish, instead of giving lots of different examples (which can be found in any calculus text), we take the reverse approach and think about one example in more detail. In particular, we return to the function $x^4$. We know from the previous result (2) that the derivative of $x^4$ is $4x^3$. We write this as
$$\frac{d}{dx}x^4 = 4x^3.$$
Using the rules of algebra we could write this function in a number of ways. We can also think of $x^4 = x \cdot x^3$. Alternatively, $x^4 = x^2 \cdot x^2$ or even $x^4 = x \cdot x \cdot x \cdot x$.
Taking the first of these, since the derivative of $x^3$ is $3x^2$, we may apply the product rule to give
$$\frac{d}{dx}\left(x \cdot x^3\right) = 1 \cdot x^3 + x \cdot 3x^2 = 4x^3.$$
We could also apply the product rule to one of the other representations of this function. In particular, we could calculate the derivative as
$$\frac{d}{dx}\left(x^2 \cdot x^2\right) = 2x \cdot x^2 + x^2 \cdot 2x = 4x^3.$$
Similarly we could think of $x^4$ being a composition of two functions: as "$x$-squared, all squared", that is to say, $x^4 = (x^2)^2$. In this case we may apply the chain rule to see that
$$\frac{d}{dx}\left(x^2\right)^2 = 2\left(x^2\right) \cdot 2x = 4x^3.$$
Notice that in each case we get the same correct answer.
We need not stop there. For example, we could write $x^4 = x^5/x$ (for $x \neq 0$) and apply the quotient rule. If we do this we have
$$\frac{d}{dx}\left(\frac{x^5}{x}\right) = \frac{x \cdot 5x^4 - x^5 \cdot 1}{x^2} = \frac{4x^5}{x^2} = 4x^3.$$
The point is that in order to find the derivative of $x^4$ we may do anything algebraically legitimate and apply any of the rules for differentiation correctly. The way we choose to find the derivative, as long of course as it is applied correctly, does not matter. We could ask: How many ways are there of differentiating $x^4$? Can you think of others?
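The claim that every legitimate route gives the same answer can be tested exactly for polynomials. The sketch below assumes a polynomial is stored as a list of coefficients `[a0, a1, a2, ...]` and implements the product and chain rules on top of the power rule. All function names are illustrative, and the decompositions of $x^4$ tried at the end are examples chosen for this sketch.

```python
def differentiate(p):
    """Power rule plus linearity on a coefficient list [a0, a1, ...]."""
    return [k * a for k, a in enumerate(p)][1:]


def add(p, q):
    """Add two polynomials, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def multiply(p, q):
    """Multiply two polynomials term by term."""
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def compose(p, q):
    """Return the polynomial p(q(x)), built up using Horner's scheme."""
    out = []
    for a in reversed(p):
        out = add(multiply(out, q), [a])
    return out


def product_rule(p, q):
    """Derivative of p(x) q(x) as p' q + p q'."""
    return add(multiply(differentiate(p), q), multiply(p, differentiate(q)))


def chain_rule(p, q):
    """Derivative of p(q(x)) as p'(q(x)) q'(x)."""
    return multiply(compose(differentiate(p), q), differentiate(q))


x1, x2, x3, x4 = [0, 1], [0, 0, 1], [0, 0, 0, 1], [0, 0, 0, 0, 1]

direct = differentiate(x4)           # power rule on x^4
print(direct)                        # [0, 0, 0, 4], that is, 4x^3
print(product_rule(x1, x3))          # x * x^3
print(product_rule(x2, x2))          # x^2 * x^2
print(chain_rule(x2, x2))            # (x^2)^2
```

Each printed list is the same, because every route is a correct application of the rules to a different way of writing the same function.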
About the author
Chris in the Volcanoes National Park, Hawaii, Summer 2003
Chris Sangwin is a member of staff in the School of Mathematics and Statistics at the University of Birmingham. He is a Research Fellow in the Learning and Teaching Support Network centre for Mathematics, Statistics, and Operational Research. His interests lie in mathematical Control Theory.
Chris would like to thank Mr Martin Brown, of Thomas Telford School, for his helpful advice and encouragement during the writing of this article.