WEBVTT mathematics/multivariable-calculus/hovasapian
00:00:00.000 --> 00:00:05.000
Hello and welcome back to educator.com and multi-variable calculus.
00:00:05.000 --> 00:00:13.000
Today we are going to talk about the chain rule, and you remember from single-variable calculus the chain rule just allowed you to differentiate functions that were composite functions.
00:00:13.000 --> 00:00:27.000
Composite functions were something like sin(x³): you took the derivative of the sine, then you took the derivative of what was inside the argument, the x³, which became 3x².
00:00:27.000 --> 00:00:45.000
Now that we are dealing with these vector-valued functions, these functions of several variables, and we have introduced the gradient, we can actually bring those tools to bear on differentiating a composite function that involves functions of several variables.
00:00:45.000 --> 00:00:52.000
So let us just jump right on in, and let me give a quick description of what it is that is going on.
00:00:52.000 --> 00:01:00.000
Again, what we want to do is not just jump right into the mathematics, we do not just want to write symbols on a page, we want to be able to understand what is happening.
00:01:00.000 --> 00:01:11.000
When there is understanding, when you can see what is going on, you can use the intuition that you have already developed to decide what goes next and where to go. That is the whole idea.
00:01:11.000 --> 00:01:16.000
We want you to understand what is happening mathematically before you actually do the mathematics.
00:01:16.000 --> 00:01:22.000
That is the easy part, the mathematics will come, that is just symbolic manipulation but we want to see what is going on.
00:01:22.000 --> 00:01:30.000
Let us start in R2, let us start in the plane, again we are using our geometric intuition to help us guide our mathematics.
00:01:30.000 --> 00:01:44.000
So, let us say we have something like this. Now let us say we have some region in... you know, here... so now, let us suppose the following things.
00:01:44.000 --> 00:02:05.000
So suppose f is a function from R2 to R, so a function of 2 variables, defined on the open set u. This is u by the way. Open set u.
00:02:05.000 --> 00:02:43.000
Now, also suppose that C, which is a function from R to R2 -- we are dealing with R2, okay -- R to R2, is a curve in 2-space, that passes through u.
00:02:43.000 --> 00:02:52.000
Let us say this is just something like that. So passes through u. Now, here is what is great.
00:02:52.000 --> 00:02:57.000
Think about the points along the curve. The function f itself is defined on this open set.
00:02:57.000 --> 00:03:02.000
In other words the points in this set can be used for this function f.
00:03:02.000 --> 00:03:11.000
Well, the curve happens to pass through u, so the points on the curve that lie in u can be used in f.
00:03:11.000 --> 00:03:20.000
That is what we are doing. We can actually form the composite function: we have some curve that has nothing to do with u, and yet it happens to pass through u.
00:03:20.000 --> 00:03:31.000
We also have a function that is defined on u. Because there is this overlap, we can use the points along the curve in the function f.
00:03:31.000 --> 00:03:35.000
That is what is really, really great here.
00:03:35.000 --> 00:04:00.000
Let us go ahead and write this down, and then the points along c(t) can be used by the function f.
00:04:00.000 --> 00:04:30.000
In other words, we can form the composite function of f and c, which is f(c).
00:04:30.000 --> 00:04:38.000
Let me write it with the actual t... oh this is a capital F by the way, sorry about that... small f's big F's, hm.
00:04:38.000 --> 00:04:49.000
F(c(t)), that is it, it is a composite function. Except now, instead of a composite of functions of a single variable, it is a composite that involves a function of several variables.
00:04:49.000 --> 00:04:58.000
In this case, we are dealing with 2. Let us just do a quick example, and then we will discuss it just a little bit more. Just to make sure we understand this concept.
00:04:58.000 --> 00:05:07.000
This is a profoundly important concept, so like the gradient we definitely want to have a good sense of what is going on here.
00:05:07.000 --> 00:05:11.000
We will take the time to make sure that that is the case.
00:05:11.000 --> 00:05:19.000
Let us just do a quick example. Just so that we can see... so example 1.
00:05:19.000 --> 00:05:32.000
Now, let the curve c(t) equal the following: let us go with t³, then t², so c(t) = (t³, t²).
00:05:32.000 --> 00:05:45.000
Now we will let our f, capital F of (x,y), of two variables, be ln(y) × cos(xy). How is that?
00:05:45.000 --> 00:06:06.000
Just a... you know... nice random function. Now, f(c(t)). Notice. F is a function of two variables x and y, so its argument contains 2 things.
00:06:06.000 --> 00:06:20.000
c(t) has two things, for x we put in the t³, for y we put in the t², into here and we form the function F as a function of t.
00:06:20.000 --> 00:06:32.000
Watch what happens. F(c(t)) = F(t³, t²), right?
00:06:32.000 --> 00:06:40.000
f(x,y) that just means these 2 things, so wherever I see a y I put in whatever is in here, wherever I see an x I put in what is in here.
00:06:40.000 --> 00:06:49.000
Well, c(t) is this. It actually spits out 2 values, and of those 2 values, this is x, this is y.
00:06:49.000 --> 00:07:12.000
We end up getting the natural logarithm of t² × the cosine of t³ × t², that is ln(t²) × cos(t³ × t²), which equals ln(t²) × cos(t⁵).
00:07:12.000 --> 00:07:30.000
That is it, that is all I have done here. I have formed this composite function now, with 2 different functions that mapped to different spaces, but the space where one maps to is exactly the space that the next function needs as its domain in order to take the next step.
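As a quick sanity check on Example 1, here is a minimal Python sketch of forming the composite. The helper names `c`, `F`, `composite`, and `closed_form` are just illustrative choices, not anything from the lesson itself. It assumes t is not 0, since ln(y) with y = t² needs y > 0.

```python
import math

def c(t):
    """Curve from Example 1: c(t) = (t^3, t^2), a map from R to R^2."""
    return (t**3, t**2)

def F(x, y):
    """Function from Example 1: F(x, y) = ln(y) * cos(x*y), a map from R^2 to R."""
    return math.log(y) * math.cos(x * y)

def composite(t):
    """F(c(t)): feed the point on the curve into F, giving a map from R to R."""
    x, y = c(t)
    return F(x, y)

def closed_form(t):
    """The simplified expression from the example: ln(t^2) * cos(t^5)."""
    return math.log(t**2) * math.cos(t**5)

# Compare the two at a few sample values of t (t = 0 is excluded: ln(0) is undefined).
for t in (0.5, 1.3, 2.0):
    print(t, composite(t), closed_form(t))
```

The two columns should agree up to floating-point rounding, which is just the statement that plugging the curve into F gives ln(t²) × cos(t⁵).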
00:07:30.000 --> 00:07:46.000
Let me actually write down this whole idea of the functions again, so C(t) is a map from R to R2.
00:07:46.000 --> 00:07:54.000
In other words, it takes numbers, real numbers, and it spits out a 2-vector, a point in 2-space.
00:07:54.000 --> 00:08:07.000
F(x,y) it takes a vector, a point in 2-space, and it spits out a number.
00:08:07.000 --> 00:08:19.000
So what I have really done here, the composite function actually ends up being a map from R to R. That is what is happening here.
00:08:19.000 --> 00:08:33.000
That is what is important to see. You can jump around from space to space, that is what makes multi-variable calculus so unbelievably powerful. That you can actually jump around from space to space like this, with well defined functions.
00:08:33.000 --> 00:08:40.000
This is going to be the x,y plane, this is R2. I will write you another copy of the real number line.
00:08:40.000 --> 00:08:47.000
So, this is the real number line, this is R2, 2-space, and this is the real number line.
00:08:47.000 --> 00:08:55.000
C(t) maps from here to here.
00:08:55.000 --> 00:09:00.000
A point to a vector... a vector in 2-space.
00:09:00.000 --> 00:09:09.000
F goes from here, takes a point in 2-space, and maps to here, so this is c(t) and this is f.
00:09:09.000 --> 00:09:26.000
When we find the composite function, f(c(t)), what I have now is a map from R to R. That is what this example is.
00:09:26.000 --> 00:09:32.000
I have this that goes to a point in 2-space. F takes a point in 2-space and spits out some number.
00:09:32.000 --> 00:09:43.000
Notice I ended up with some function of t, ln(t²) × cos(t⁵). At a specific value of t, this is just a single number.
00:09:43.000 --> 00:09:53.000
This is what is going on here. You are just forming a composite function with curves and functions of several variables.
00:09:53.000 --> 00:09:56.000
Hopefully this is reasonably clear. This is what we want to understand. This is what is happening mathematically.
00:09:56.000 --> 00:10:07.000
You are mapping from one space to another, and then you are moving from that space to another space. In this case the space you end up with happens to be the space you started off with, which is the real number line.
00:10:07.000 --> 00:10:18.000
So now the chain rule allows us to differentiate something like this. So now, let us go ahead and explicitly write down what the chain rule is.
00:10:18.000 --> 00:10:30.000
I want you to see this simply because I want you also to start becoming accustomed to the expression of theorems, formal things, but again, it has to be based on understanding.
00:10:30.000 --> 00:11:06.000
It is a little long, but there is nothing here that is strange, so let f be a function defined and differentiable on an open set u.
00:11:06.000 --> 00:11:53.000
Let c be a differentiable curve (all that means is a nice smooth curve with no wacky bumps or corners) such that the values of c(t) lie in the open set u.
00:11:53.000 --> 00:12:05.000
What we were doing in the beginning of the lesson, we have an open set, we have a curve that happens to pass through that open set, therefore the points along the curve can be used for our function.
00:12:05.000 --> 00:12:18.000
Then, the composite function, f(c(t)) is differentiable.
00:12:18.000 --> 00:12:57.000
It is differentiable itself as a function of t, and the derivative of f(c(t)) with respect to t is equal to the gradient of f evaluated at c(t), dotted with the vector c'(t): the dot product of those two vectors. Now, you remember the gradient is a vector.
00:12:57.000 --> 00:13:14.000
If I have some function, like f(x,y), the first component of the gradient is df/dx, and the second component is df/dy. I just differentiate with respect to each variable in turn, and that is my gradient vector.
00:13:14.000 --> 00:13:32.000
So let us stop and think about what this says. If I have some function that is defined and differentiable on some open set, and c happens to be a differentiable curve that passes through that open set, in other words it takes its values in that open set, then the composite function f(c(t)) is also differentiable.
00:13:32.000 --> 00:13:45.000
It is differentiable as a function of t, as a single variable t and the derivative of that composite function is equal to the gradient of f at c(t) ⋅ c'(t).
00:13:45.000 --> 00:13:51.000
This is very, very, very important.
00:13:51.000 --> 00:14:01.000
Now, for computations, when we actually do specific problems, we of course are going to be working with components, which is always the case.
00:14:01.000 --> 00:14:12.000
With vectors we can go ahead and write out the definitions and the theorems using a shorter, more elegant notation, but when we actually do the computations with vectors we have to work with components.
00:14:12.000 --> 00:14:20.000
x,y,z, whatever it is that we happen to be working with. So, let us go ahead and just sort of write out the component form of this so you see what is happening.
00:14:20.000 --> 00:14:28.000
Again, the dot product is the same dot product that you know. There is nothing new here, trust what you know.
00:14:28.000 --> 00:14:37.000
This is a vector, this is a vector, when you take the dot product of 2 vectors you get a number. That is what this says. You are getting a derivative.
00:14:37.000 --> 00:14:46.000
It is a function of t, and if you evaluate it at a specific value of t, it is actually just a number. You are still just doing a derivative. The same thing you have been doing for years.
00:14:46.000 --> 00:15:20.000
Let us just see here. So, if c(t) = (c₁(t), c₂(t)), so c(t) is a curve whose component functions are functions of t, just like the first example, and we have f(x₁,x₂).
00:15:20.000 --> 00:15:24.000
This time I did not write it as x and y, I wrote it as x₁ and x₂, these are variables.
00:15:24.000 --> 00:15:49.000
The first variable, the second variable. Then the derivative with respect to t of f(c(t)) equals, well, we said it equals the gradient of f evaluated at c(t) ⋅ c'(t).
00:15:49.000 --> 00:16:08.000
Okay, the gradient -- I should probably write this out -- the gradient of this function is going to be... tell you what, let me go ahead and before I write that, let me write out the gradient because I know it has been a couple of lessons since we did that.
00:16:08.000 --> 00:16:22.000
So, let me write grad f = (df/dx₁, df/dx₂).
00:16:22.000 --> 00:16:28.000
This is a vector, the first component of which is the derivative with respect to the first variable.
00:16:28.000 --> 00:16:31.000
The second component is the derivative of the function with respect to the second variable.
00:16:31.000 --> 00:16:46.000
Now, c'(t) is also a vector. It is the derivative of this, c₁'(t) and it is the derivative of this, c₂'(t).
00:16:46.000 --> 00:16:58.000
That is it, these are just functions, so now what we have is the derivative with respect to t of f(c(t)), in other words this thing right here.
00:16:58.000 --> 00:17:05.000
We said it is the gradient of f ⋅ c', this is the gradient of f, this is c'.
00:17:05.000 --> 00:17:12.000
So let us see what this looks like in component form. It is... oh, you know what, I have a capital F, don't I?
00:17:12.000 --> 00:17:19.000
I keep forgetting that, that small f is just so ubiquitous in most scientific literature.
00:17:19.000 --> 00:17:42.000
So, we have df/dx₁ × dc₁/dt, that is all this is. The first component of c' is just dc₁/dt.
00:17:42.000 --> 00:18:10.000
It is just notation, and the second component of c' is dc₂/dt. The dot product is this × that + this × that, so all together it is df/dx₁ × dc₁/dt + df/dx₂ × dc₂/dt.
00:18:10.000 --> 00:18:16.000
So that is it, that is all we are doing here. We are just doing it in component form.
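For reference, here is the same statement written out in symbols, as a LaTeX restatement of what was just said for a curve c(t) = (c₁(t), c₂(t)) and a function f(x₁, x₂). The only thing made explicit beyond the spoken version is that each partial derivative is evaluated at the point c(t).

```latex
\frac{d}{dt}\, f\bigl(c(t)\bigr)
  = \nabla f\bigl(c(t)\bigr) \cdot c'(t)
  = \frac{\partial f}{\partial x_1}\bigl(c(t)\bigr)\,\frac{dc_1}{dt}
  + \frac{\partial f}{\partial x_2}\bigl(c(t)\bigr)\,\frac{dc_2}{dt}
```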
00:18:16.000 --> 00:18:25.000
Now personally, I think that what I have just written here is actually a little bit more confusing than just the statement of the theorem.
00:18:25.000 --> 00:18:34.000
If you look at it as just the statement of the theorem, the gradient of f dotted with c', and if you know what the gradient is, you know what c' is, you know how to take derivatives.
00:18:34.000 --> 00:18:41.000
You just do the dot product. This is sort of the component representation of it.
00:18:41.000 --> 00:18:45.000
I personally do not like seeing all of these things because again it is notationally intensive.
00:18:45.000 --> 00:18:50.000
The idea is to understand what this is, and then you can do the rest.
00:18:50.000 --> 00:19:05.000
So, personally, my favorite, I still think it is great to learn it this way. Gradient ⋅ c'. Gradient of f ⋅ c'. Just keep telling yourself that about 5 or 6 times, and you will know what to do.
00:19:05.000 --> 00:19:11.000
So, let us go ahead and just do an example, that is the best way to make sense of this.
00:19:11.000 --> 00:19:17.000
So, let me go back to my black ink here. Actually you know what, let me go ahead and go to blue.
00:19:17.000 --> 00:19:53.000
So example 2. Now, we will let our curve c(t) be (t, e^t, t²), and we will let our function f(x,y,z), so we are definitely talking about a curve in 3-space and a function of 3 variables, equal xy²z.
00:19:53.000 --> 00:20:01.000
First of all, let us talk about what is going on here. We are going to form the composite function. We are going to be forming f(c(t)).
00:20:01.000 --> 00:20:20.000
That is what we are going to be doing. Well, for f(c(t)), with f(x,y,z), in this case x is this thing, t, y is this thing, e^t, and z is this, t².
00:20:20.000 --> 00:20:31.000
We want to write everything out. Now the gradient of f, that is it, we are just going to build this step by step by step, that is all we are doing here.
00:20:31.000 --> 00:20:44.000
The gradient of f is equal to, well it is the first component is the first partial, the second component is the second partial, the third component is the third partial.
00:20:44.000 --> 00:20:54.000
If you like the other notation it is going to be df/dx, df/dy, and it is going to be df/dz.
00:20:54.000 --> 00:21:04.000
Now, let us go ahead and actually compute that. The first partial, the derivative of this function with respect to x is y²z.
00:21:04.000 --> 00:21:11.000
The derivative with respect to y is 2xyz.
00:21:11.000 --> 00:21:24.000
And, the derivative with respect to z is going to be xy², so this is my gradient of f.
00:21:24.000 --> 00:21:40.000
Now, my gradient of f, evaluated at c(t). So now we will take the next step, now we will do the grad of f evaluated at c(t), which is the actual expression that is in the statement of the chain rule.
00:21:40.000 --> 00:21:48.000
All that says is: take my gradient of f, this thing, and just put the values of c(t) in here.
00:21:48.000 --> 00:21:52.000
Well, x is t, y is e^t, and z is t².
00:21:52.000 --> 00:22:03.000
So when I put these things into here, here is what I get.
00:22:03.000 --> 00:22:09.000
The first component, y²z, is going to end up being t²e^(2t), right?
00:22:09.000 --> 00:22:16.000
y² is just e^(2t), z is t², so that is t²e^(2t).
00:22:16.000 --> 00:22:26.000
2 × x, which is t, y which is e^t, z which is t², I end up with 2t³e^t.
00:22:26.000 --> 00:22:42.000
xy² is t × (e^t)², so I get t × e of... wait, e^(2t), yes, there we go: te^(2t).
00:22:42.000 --> 00:22:49.000
Okay, so that takes care of this one. That is the grad of f evaluated at c(t), now I just have to find c'.
00:22:49.000 --> 00:22:58.000
That is really, really simple. C'(t). Well here is my c right here, I will just take the derivative of each one, that is it.
00:22:58.000 --> 00:23:10.000
The derivative of t is 1, the derivative of e^t is e^t, and the derivative of t² is 2t.
00:23:10.000 --> 00:23:31.000
Well, now I just form my dot product, so the gradient of f evaluated at c(t) dotted with c'(t), it equals this vector dotted with this vector.
00:23:31.000 --> 00:24:06.000
Well, the dot product is a number, not a vector. It is this × that, so t²e^(2t), + this × that, 2t³e^(2t), + this × that, 2t²e^(2t), and that is it.
00:24:06.000 --> 00:24:13.000
Let us see if there is anything that I can combine here, t²e^(2t), 2t²e^(2t), yes, there is.
00:24:13.000 --> 00:24:29.000
So it is going to be 3t²e^(2t) + 2t³e^(2t), and that is my final answer. That is it. Let me go back.
00:24:29.000 --> 00:24:40.000
I was given some, you know, a curve, and I was given a function, and I just hammered it out.
00:24:40.000 --> 00:24:50.000
I took the gradient as a function, as a vector in x, y, z, I evaluated it at c(t), in other words I put these values in for x, y, z, and I got this.
00:24:50.000 --> 00:24:58.000
Now it is the gradient vector expressed in t. I took the derivative of c, which is c', that is easy enough to do.
00:24:58.000 --> 00:25:01.000
Then I just took the dot product of those vectors. That is it, that is all that is going on here.
00:25:01.000 --> 00:25:23.000
So this happens to be the derivative of f(c(t)). That is what this is equal to. This is equal to f(c(t))... no, we definitely want the derivative, d/dt of f(c(t)), that is it. That is all that is going on here.
00:25:23.000 --> 00:25:26.000
It is just a way of differentiating a composite function.
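To double-check Example 2, here is a minimal Python sketch that carries out the same steps numerically: the gradient evaluated at c(t), dotted with c'(t). It then compares that against the simplified answer 3t²e^(2t) + 2t³e^(2t) and against a central finite difference. The function names and the step size `h` are illustrative assumptions, not part of the lesson.

```python
import math

def c(t):
    """Curve from Example 2: c(t) = (t, e^t, t^2)."""
    return (t, math.exp(t), t**2)

def c_prime(t):
    """Component-wise derivative of the curve: c'(t) = (1, e^t, 2t)."""
    return (1.0, math.exp(t), 2.0 * t)

def f(x, y, z):
    """Function from Example 2: f(x, y, z) = x * y^2 * z."""
    return x * y**2 * z

def grad_f(x, y, z):
    """Gradient of f: (y^2 z, 2xyz, x y^2)."""
    return (y**2 * z, 2.0 * x * y * z, x * y**2)

def chain_rule(t):
    """d/dt f(c(t)) computed as grad f(c(t)) dotted with c'(t)."""
    g = grad_f(*c(t))
    v = c_prime(t)
    return sum(gi * vi for gi, vi in zip(g, v))

def simplified_answer(t):
    """The simplified result from the example: 3 t^2 e^(2t) + 2 t^3 e^(2t)."""
    return 3.0 * t**2 * math.exp(2.0 * t) + 2.0 * t**3 * math.exp(2.0 * t)

def finite_difference(t, h=1e-6):
    """Central-difference estimate of d/dt f(c(t)); h is an arbitrary small step."""
    return (f(*c(t + h)) - f(*c(t - h))) / (2.0 * h)

for t in (0.0, 0.7, 1.5):
    print(t, chain_rule(t), simplified_answer(t), finite_difference(t))
```

The first two columns should match to rounding error, and the finite-difference column should agree to several digits, which is a reasonable independent check that the dot product was assembled correctly.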
00:25:26.000 --> 00:25:42.000
Now, you are probably asking yourself, Okay, well if I start with a curve c that goes from R to R3, and then I take the function f from R3 to R, so that the composite is a function from R to R, what I actually have is a function of t, right?
00:25:42.000 --> 00:25:52.000
Yes, it is just a function of t that you are differentiating. You are saying, well wait a minute, if I found f(c(t)) as a function of t up here, couldn't I just do this directly? Do I have to use the chain rule?
00:25:52.000 --> 00:25:56.000
The answer is no, you do not have to use the chain rule, you can do it directly. Which one is better?
00:25:56.000 --> 00:26:02.000
Well, actually it depends on the situation. It depends on the function, it depends on what you are doing, that is all it is.
00:26:02.000 --> 00:26:06.000
So, let us go ahead and actually do it directly just to confirm that you can do it directly.
00:26:06.000 --> 00:26:14.000
I think it will shed a little bit more light on this relationship between the curve and the function.
00:26:14.000 --> 00:26:29.000
Let us do this in blue again... so we said that f(x,y,z) is equal to xy²z.
00:26:29.000 --> 00:26:35.000
We said that c(t), let us just rewrite them over again.
00:26:35.000 --> 00:26:46.000
Let us see, what did we say c(t) was... t, e^t, and t², so now let us just form f(c(t)).
00:26:46.000 --> 00:27:13.000
So f(c(t)) equals, well, xy²z: x is t, so that is t, y² is e^(2t), and z is going to be t². I am just plugging those in, and I end up with t³e^(2t).
00:27:13.000 --> 00:27:36.000
Well, now... this is just a function of t, so if I just take d/dt of f(c(t)), by the product rule it is going to be this × the derivative of that, which is going to be 2t³e^(2t), + that × the derivative of this, 3t²e^(2t).
00:27:36.000 --> 00:27:40.000
What do you know, you end up with the same exact answer.
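Written out in symbols, the direct computation is just substitution followed by the product rule. This is a LaTeX restatement of the steps above, with nothing new assumed.

```latex
\begin{aligned}
f\bigl(c(t)\bigr) &= t \cdot \bigl(e^{t}\bigr)^{2} \cdot t^{2} = t^{3} e^{2t} \\
\frac{d}{dt}\bigl[t^{3} e^{2t}\bigr] &= 3t^{2} e^{2t} + t^{3} \cdot 2e^{2t}
  = 3t^{2} e^{2t} + 2t^{3} e^{2t}
\end{aligned}
```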
00:27:40.000 --> 00:27:47.000
Which is better? Again it just depends on your particular situation.
00:27:47.000 --> 00:27:52.000
Sometimes you want to do it directly if it makes more sense, sometimes you want to use the chain rule if it makes more sense.
00:27:52.000 --> 00:27:56.000
The problem at hand will actually decide which one is better.
00:27:56.000 --> 00:28:03.000
Ok, so that is the chain rule, thank you very much for joining us here at educator.com. We will see you next time. Take care, bye-bye.