WEBVTT mathematics/statistics/son
00:00:00.000 --> 00:00:02.500
Hi and welcome to www.educator.com.
00:00:02.500 --> 00:00:08.100
Today we are going to be talking about normal distributions again but this time breaking it down into the PDF
00:00:08.100 --> 00:00:13.800
or probability density function and CDF or the cumulative distribution function.
00:00:13.800 --> 00:00:18.100
Here is what we have for today.
00:00:18.100 --> 00:00:22.300
We are going to be talking about frequency charts which we have been doing before
00:00:22.300 --> 00:00:31.400
and then contrast it to this new idea, which is still fairly elementary: cumulative frequency charts.
00:00:31.400 --> 00:00:34.200
We are going to do a very brief review of calculus.
00:00:34.200 --> 00:00:42.000
It is going to be a very elementary review, no actual calculations just conceptual.
00:00:42.000 --> 00:00:48.000
Then we are going to talk more deeply about the probability density function and the cumulative distribution function.
00:00:48.000 --> 00:00:56.300
You have to reprogram yourself to see PDF and not think of it as Portable Document Format.
00:00:56.300 --> 00:01:01.200
Let us talk about frequency versus cumulative frequency.
00:01:01.200 --> 00:01:07.300
So far what we have been doing is we have some sort of variables such as score on SAT verbal.
00:01:07.300 --> 00:01:11.900
We have all these values that the variable can potentially hold.
00:01:11.900 --> 00:01:21.200
We have been talking about what percentage of our sample or population have achieved that score.
00:01:21.200 --> 00:01:26.100
1% scored 800, 3% scored 750.
00:01:26.100 --> 00:01:29.200
That is what we have been talking about so far.
00:01:29.200 --> 00:01:40.400
Here when I write percentage I am talking about relative frequency but it is largely the same thing as frequency.
00:01:40.400 --> 00:01:42.100
Not a big deal.
00:01:42.100 --> 00:01:52.100
When we talk about cumulative percentile, what we are really talking about is not just the people
00:01:52.100 --> 00:01:58.100
who have achieved that value but an accumulation of everybody who has come before it.
00:01:58.100 --> 00:02:00.100
Let us start off at the bottom.
00:02:00.100 --> 00:02:06.000
Here only 1% of the population has achieved 250 points.
00:02:06.000 --> 00:02:09.200
I think that is the minimum or something.
00:02:09.200 --> 00:02:16.300
But 3% have achieved 300 or below.
00:02:16.300 --> 00:02:24.200
If you are in the third percentile, you have outperformed 3% of everyone else who has taken the test.
00:02:24.200 --> 00:02:27.600
Not something to write home about yet.
00:02:27.600 --> 00:02:43.600
This 8% actually accounts for this 3% as well plus a little bit extra.
00:02:43.600 --> 00:02:48.700
The 16% encapsulates everybody who has come before it.
00:02:48.700 --> 00:02:55.000
Cumulative percentile is helpful if you want to know your ranking in a performance.
00:02:55.000 --> 00:02:59.600
For instance you want to know what percentile of the population you are in.
00:02:59.600 --> 00:03:05.600
You do not just want to know what percent of everybody around you has also achieved that score.
00:03:05.600 --> 00:03:08.300
You want to know how many people you have outperformed.
00:03:08.300 --> 00:03:14.700
Cumulative percentile is continuously adding up all the people that have come before you.
00:03:14.700 --> 00:03:17.900
It gives you that ranking.
00:03:17.900 --> 00:03:29.700
If you are in the 95th percentile, you know you have outperformed or equaled 95% of that sample.
00:03:29.700 --> 00:03:35.500
That is why cumulative frequency is really helpful to us.
00:03:35.500 --> 00:03:41.900
One of the things that you want to notice is that cumulative frequency, when you just look at the number right away,
00:03:41.900 --> 00:03:45.700
you do not know how common that score is.
00:03:45.700 --> 00:03:54.700
When you look at 98% you do not know how common 750 as a score is but you could easily find that out
00:03:54.700 --> 00:04:00.300
just by looking at the difference between 98 and whatever cumulative frequency came before it.
00:04:00.300 --> 00:04:06.700
That difference is 3, so 3% of people have been in that bracket.
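That differencing step can be sketched in a few lines of Python; the score-to-percentile table below is made up for illustration, not the exact chart from the lecture:

```python
# Recover per-bracket (relative) frequency from cumulative percentiles by
# taking differences between consecutive entries.
cumulative = {250: 1, 300: 3, 350: 8, 400: 16, 750: 98}  # score -> cumulative %

frequency = {}
previous = 0
for score in sorted(cumulative):
    # % of people falling in exactly this bracket
    frequency[score] = cumulative[score] - previous
    previous = cumulative[score]

print(frequency)
```

So the 3% at a score of 300 really contains the 1% below it plus 2% new, just as described above.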
00:04:06.700 --> 00:04:14.300
Another thing about cumulative frequency I want you to notice is that it is a monotonic increase.
00:04:14.300 --> 00:04:19.800
It means that there is no going up and then going back down.
00:04:19.800 --> 00:04:22.200
There are no changes in direction.
00:04:22.200 --> 00:04:25.100
It is continuously going up, up, and up.
00:04:25.100 --> 00:04:29.600
That makes sense because you have to add up everybody who has come before you.
00:04:29.600 --> 00:04:31.800
That is cumulative frequency.
00:04:31.800 --> 00:04:40.300
When you look at it on a visualization, you could see what I mean by monotonic increase.
00:04:40.300 --> 00:04:46.600
Here we have an example of monotonic increase.
00:04:46.600 --> 00:04:53.600
This curve goes up, and up, and up because it is adding up everybody who has come before you.
00:04:53.600 --> 00:04:58.300
You have to be at least the score before you or higher.
00:04:58.300 --> 00:05:03.800
Every score is improving on the previous score.
00:05:03.800 --> 00:05:14.900
Whereas in frequency we can have non-monotonic curves, because here you could go up, down; we could have the uniform distribution.
00:05:14.900 --> 00:05:21.000
We could have all kinds of things but in cumulative frequency distribution you cannot have a uniform distribution.
00:05:21.000 --> 00:05:29.300
Only if everybody had gotten the bottom score would you see a flat line, because you would be adding 0 every time.
00:05:29.300 --> 00:05:36.600
Otherwise the most frequent shape that you will see is a monotonic increase that looks like this.
00:05:36.600 --> 00:05:45.100
Here we see just like this normal looking distribution and what ends up happening when you have this normal-ish distribution
00:05:45.100 --> 00:05:51.700
you have the s shaped curve when you transform it into the cumulative frequency.
00:05:51.700 --> 00:06:01.900
This part in the middle, that part corresponds from about 400 to 600.
00:06:01.900 --> 00:06:17.300
That part corresponds to the biggest jumps and as you see there are big jumps here too but the jumps are just going in one direction.
00:06:17.300 --> 00:06:22.500
That is how it looks.
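A minimal sketch of that transformation, with made-up bell-shaped frequencies, shows why the cumulative version must be the monotonic S-shaped curve described above:

```python
# Made-up, roughly normal-looking frequencies (percentages per bracket).
freqs = [1, 2, 5, 10, 16, 16, 10, 5, 2, 1]

# Running total turns the bell shape into an S-shaped cumulative curve.
cumulative = []
total = 0
for f in freqs:
    total += f
    cumulative.append(total)

# Monotonic increase: every entry is at least as large as the one before.
assert all(b >= a for a, b in zip(cumulative, cumulative[1:]))
print(cumulative)
```

The biggest jumps in the cumulative list line up with the tallest bars in the middle of the frequency list, matching the steep middle of the S-curve.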
00:06:22.500 --> 00:06:37.500
Let us put that on hold for a second and let us talk about calculus in brief.
00:06:37.500 --> 00:06:38.000
A lot of people, when they think about calculus, immediately think derivatives and integrals, like integrating stuff and getting derivatives of equations.
00:06:38.000 --> 00:06:39.600
It is what they think about.
00:06:39.600 --> 00:06:43.100
Let us unpack that and think about it conceptually.
00:06:43.100 --> 00:06:49.300
In calculus, in some ways, you could think of an equation as being on a continuum.
00:06:49.300 --> 00:06:58.200
Let us just say that there is some equation that we are thinking of, y = x² or something like that.
00:06:58.200 --> 00:07:01.400
We all know what that function looks like.
00:07:01.400 --> 00:07:13.900
It looks something like this, a parabola, and when we think about this we are saying this is just plotting exactly what y is, given x.
00:07:13.900 --> 00:07:15.300
That is all that graph is.
00:07:15.300 --> 00:07:17.400
When x is 0 y is 0.
00:07:17.400 --> 00:07:19.000
When x is 1 y is 1.
00:07:19.000 --> 00:07:22.400
When x is 2 y is 4.
00:07:22.400 --> 00:07:27.100
We are just plotting those precise points.
00:07:27.100 --> 00:07:39.000
When you go and take the integral, I’m going to put the integral on this side versus you go and take the derivative,
00:07:39.000 --> 00:07:46.800
you are describing 2 different aspects of the same graph.
00:07:46.800 --> 00:08:00.300
When you talk about the integral, what you are doing is you are no longer plotting these particular points but now you are plotting these areas.
00:08:00.300 --> 00:08:05.100
When you think about integral, think about area.
00:08:05.100 --> 00:08:14.700
Whenever you have some curve or shape or line, when you take the integral you are pointing towards the area.
00:08:14.700 --> 00:08:24.300
You could get the areas of these weird, bizarre shapes, and at the same time, if you go on the other side of the continuum
00:08:24.300 --> 00:08:35.800
and you go towards the derivative, what you will end up getting is not the area but instead just the slope.
00:08:35.800 --> 00:08:46.000
Here instead of being interested in the points themselves, now you are interested in these little slopes.
00:08:46.000 --> 00:08:51.400
All these little slopes, at every single, tiny point.
00:08:51.400 --> 00:08:55.900
You are really plotting those slopes.
00:08:55.900 --> 00:09:00.300
You are interested in these changes.
00:09:00.300 --> 00:09:11.900
Here I want you to think slope and obviously for any equation so even when you get a graph of slopes, you could get the slopes of slopes.
00:09:11.900 --> 00:09:16.500
Even when you have the slopes of slopes, you could get a slope of slopes of slopes.
00:09:16.500 --> 00:09:26.400
You could go more and more towards that derivative side, or on the other hand, if you get a graph of a whole bunch of areas, you could get the areas of that curve.
00:09:26.400 --> 00:09:35.300
You could go more and more towards the integral side as well.
00:09:35.300 --> 00:09:45.000
This is going to be important to us because we are going to be interested particularly in a curve that looks like this, the normal curve PDF.
00:09:45.000 --> 00:09:52.600
What we want to do is get the cumulative areas of that one.
00:09:52.600 --> 00:09:54.900
Which direction should we go?
00:09:54.900 --> 00:10:04.800
We should probably go towards the integral direction because what we want is the cumulative areas of this function.
00:10:04.800 --> 00:10:10.000
That is going to be an important concept.
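Both directions of that continuum can be checked numerically; this sketch uses the y = x² example from earlier, with a finite difference standing in for the slope and a Riemann sum standing in for the area (step size and interval are my own choices):

```python
# y = x**2: its slope at x is 2*x, and its area from 0 to x is x**3/3.
h = 1e-4
def f(x):
    return x ** 2

# Derivative side: central finite difference for the slope at x = 2
# (should be about 2 * 2 = 4).
slope = (f(2 + h) - f(2 - h)) / (2 * h)

# Integral side: left Riemann sum for the area from 0 to 2
# (should be about 2**3 / 3).
area = sum(f(i * h) * h for i in range(int(2 / h)))

print(round(slope, 3), round(area, 3))
```

Slopes of slopes and areas of areas work the same way: just feed the output curve back through the same step.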
00:10:10.000 --> 00:10:17.400
Now let us talk about the PDF or what we call the probability density function.
00:10:17.400 --> 00:10:22.100
We have talked about how the standard normal distribution is a little bit different than just the normal distribution.
00:10:22.100 --> 00:10:38.000
The mean is 0 and the stdev is always 1; because of that, it is a special case that is very helpful to us.
00:10:38.000 --> 00:10:44.400
There is a special sign we use just for the PDF of the standard normal distribution.
00:10:44.400 --> 00:10:48.200
That is what we call little phi (φ) and it looks like this.
00:10:48.200 --> 00:10:54.600
When it is written out sometimes it looks like a lower case y.
00:10:54.600 --> 00:10:56.800
That is how people sometimes write it.
00:10:56.800 --> 00:10:59.000
That is what it looks like.
00:10:59.000 --> 00:11:07.000
That is the symbol we use to denote that we are going to be talking about standard normal distribution.
00:11:07.000 --> 00:11:13.800
The function actually looks something like this.
00:11:13.800 --> 00:11:23.500
x just represents some value of your variable of interest, and there are different ways you could write this, but the heart of it is e^(-½x²).
00:11:23.500 --> 00:11:33.800
That is the heart of this, and even this ½ is like a constant; I just remember it as e^(-x²).
00:11:33.800 --> 00:11:55.400
That is the heart of the shape, all divided by √(2π).
00:11:55.400 --> 00:12:14.900
Obviously this can be written in slightly different ways; you might also see it as 1/√(2π) multiplied by the exponential function to the power -½x², or -x²/2.
00:12:14.900 --> 00:12:18.100
There are couple of different ways you could write that.
00:12:18.100 --> 00:12:23.400
The function might seem a little bit crazy to us, but let us break it down.
00:12:23.400 --> 00:12:30.600
You might want to use www.wolframalpha.com or if you have a graphing calculator feel free to use that.
00:12:30.600 --> 00:12:34.200
That will work largely in the same way.
00:12:34.200 --> 00:12:46.700
If you go to www.wolframalpha.com, it is like a combination of a fancy graphing calculator and Wikipedia for math and science stuff.
00:12:46.700 --> 00:12:50.900
It is helpful.
00:12:50.900 --> 00:12:58.000
Here we could write any function we want.
00:12:58.000 --> 00:13:21.800
Let us start off with just y = e^(-.5x²).
00:13:21.800 --> 00:13:26.700
Let us start with that part first and let us see what we got.
00:13:26.700 --> 00:13:31.300
What www.wolframalpha.com will do is that it will actually draw the equation for us.
00:13:31.300 --> 00:13:54.500
Notice that we have something that looks like a normal distribution, but one of the issues is we just want to divide it by √(2π).
00:13:54.500 --> 00:13:59.700
That will just change the shape of it very slightly.
00:13:59.700 --> 00:14:04.100
Notice that largely it is the same basic fundamental shape.
00:14:04.100 --> 00:14:11.100
You divide by that constant to give you a couple of properties of a normal distribution that are going to be important to us later.
00:14:11.100 --> 00:14:25.900
The heart of this function is the exponential function, in particular the exponential function raised to the -x²/2 power.
00:14:25.900 --> 00:14:31.100
That is what gives us that nice curve.
00:14:31.100 --> 00:14:37.400
That is what we think of as little phi; that is this equation.
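As a sketch, little phi is easy to write out directly in code; the function name `phi` is just my own convenient label for it:

```python
import math

def phi(x):
    """Standard normal PDF: e**(-x**2 / 2) divided by sqrt(2*pi)."""
    return math.exp(-0.5 * x ** 2) / math.sqrt(2 * math.pi)

# The peak at the mean (x = 0) is 1/sqrt(2*pi), about 0.3989,
# and the curve is symmetric: phi(1) equals phi(-1).
print(round(phi(0), 4), round(phi(1), 4))
```

Dividing by √(2π) does not change the bell shape; it only rescales it so the total area works out to 1.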
00:14:37.400 --> 00:14:47.900
Now, the PDF for a normal distribution, as we know, can have any kind of mean and any kind of standard deviation.
00:14:47.900 --> 00:15:12.900
One thing that I forgot to point out is that when you look at this, you will notice that the mean, or point of symmetry, is 0, and you can go 1 stdev out.
00:15:12.900 --> 00:15:23.500
If you go 1 stdev out that looks about like 68% of that curve.
00:15:23.500 --> 00:15:26.100
It seems like more than half.
00:15:26.100 --> 00:15:34.600
This depicts the mean being 0 and stdev being 1.
00:15:34.600 --> 00:15:47.700
If we wanted a regular normal distribution that did not have a mean of 0 and did not have a stdev of 1, what would the formula for that be?
00:15:47.700 --> 00:15:51.400
What would the function for that be?
00:15:51.400 --> 00:16:06.500
The general equation for the PDF, we do not have a special symbol, we just use the regular symbol that we use for all function f(x).
00:16:06.500 --> 00:16:14.800
Here it is still this function at the heart of it but all we are doing is we are going to be adding in mean
00:16:14.800 --> 00:16:21.500
and stdev as variables so that you could put in whatever mean and stdev that you want.
00:16:21.500 --> 00:16:25.400
Let us start with the heart part.
00:16:25.400 --> 00:16:45.300
It is e raised to a negative power, and instead of just x², it is going to be (x - μ)², because we are going to put in that μ.
00:16:45.300 --> 00:16:50.100
If μ is 50 we want that point of symmetry to be over 50.
00:16:50.100 --> 00:16:55.000
If μ is -5 we want that point of symmetry to be over -5.
00:16:55.000 --> 00:16:58.000
We want to square that.
00:16:58.000 --> 00:17:00.600
We are squaring that distance.
00:17:00.600 --> 00:17:10.500
Here we have done that actually, except that it is (x - 0)².
00:17:10.500 --> 00:17:18.400
That is why we could just write it as x²/(2σ²).
00:17:18.400 --> 00:17:25.900
That is where that ½ comes from, and when σ is 1, σ² is just 1, right?
00:17:25.900 --> 00:17:30.900
That is why we do not see that crazy part in here.
00:17:30.900 --> 00:17:39.600
There is just one more thing: all divided by √(2πσ²).
00:17:39.600 --> 00:18:00.200
Another way you could think about it is having (1/σ) × φ, where instead of just putting in x we are putting in (x - μ)/σ.
00:18:00.200 --> 00:18:13.200
If you put this function in here, substitute x with all of that, and multiply by 1/σ, then you would get this equation.
00:18:13.200 --> 00:18:16.600
This is the simplified version.
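A quick numeric check, using the lecture's μ of 550 and σ of 100, that the written-out form and the (1/σ) × φ((x − μ)/σ) form agree; the function names here are my own:

```python
import math

def phi(z):
    # standard normal PDF
    return math.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)

def normal_pdf(x, mu, sigma):
    # the simplified version written out in full
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

mu, sigma = 550, 100
for x in (350, 550, 650):
    # the two ways of writing it give the same number
    assert abs(normal_pdf(x, mu, sigma) - phi((x - mu) / sigma) / sigma) < 1e-15

print(normal_pdf(550, mu, sigma))
```

Substituting μ = 0 and σ = 1 collapses this right back to little phi, which is why the standard normal version looks so much simpler.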
00:18:16.600 --> 00:18:21.700
Now let us put this in www.wolframalpha.com and let us see what kind of function we have.
00:18:21.700 --> 00:18:34.600
We could put in a μ of 550 and a σ of 100.
00:18:34.600 --> 00:18:39.700
Let us see what that normal distribution looks like.
00:18:39.700 --> 00:18:47.400
I will put this up here.
00:18:47.400 --> 00:18:54.700
We want to substitute in 550 right here and 100 right here.
00:18:54.700 --> 00:19:20.100
Let us put in e^(-(x - 550)² ÷ (2 × σ²)).
00:19:20.100 --> 00:19:25.800
I’m just going to make that 100², that is 10,000.
00:19:25.800 --> 00:19:38.800
I’m just going to make sure that it is all in this parentheses here.
00:19:38.800 --> 00:19:50.900
All divided by √(2 × π × σ²).
00:19:50.900 --> 00:19:55.200
I’m just going to put my 10,000 again.
00:19:55.200 --> 00:20:04.100
What we should see from this is the mean being at 550 and the standard deviation being at 100.
00:20:04.100 --> 00:20:20.500
Let us see and the nice thing about www.wolframalpha.com is that just in case I missed a parentheses
00:20:20.500 --> 00:20:29.200
it will rewrite the input for you in a more standard form instead of this linear way.
00:20:29.200 --> 00:20:34.000
So that you could check and see that you are missing a parentheses or something.
00:20:34.000 --> 00:20:49.400
Here is where 550 is, and that looks like the center of that curve; not only that, from 550, if you go out to 450 or 650,
00:20:49.400 --> 00:20:52.800
that does looks like about 68% of that curve.
00:20:52.800 --> 00:21:09.600
What we see here is that this function, if you substitute in any mean and any stdev it will effectively draw or represent this normal distribution for you.
00:21:09.600 --> 00:21:21.500
That is the PDF, but what this gives you is, at every point, the probability density for that particular value of x.
00:21:21.500 --> 00:21:27.600
Let us move on to cumulative distribution function.
00:21:27.600 --> 00:21:32.300
We want the cumulative distribution function.
00:21:32.300 --> 00:21:37.700
We do not just want that curve.
00:21:37.700 --> 00:21:46.000
Instead what we want is a cumulative adding up of all the areas that came before.
00:21:46.000 --> 00:21:54.600
What we are looking for is that curve and at any point I can tell you the percentage we are at.
00:21:54.600 --> 00:22:03.100
Here, it encapsulates all the space that came before it.
00:22:03.100 --> 00:22:11.300
As we talked about before, because I want that cumulative area, all we have to do now is take that integral of φ.
00:22:11.300 --> 00:22:28.100
We represent the cumulative distribution function as upper case phi (Φ) rather than lower case φ, and we put in x.
00:22:28.100 --> 00:22:47.500
All we do is we take the integral from -infinity up to x, whatever x is, and that will give you the area so far of φ.
00:22:47.500 --> 00:23:00.300
Obviously you could actually work out the integral, but I'm just going to leave it as it is because I just want you to know what the function actually means.
00:23:00.300 --> 00:23:07.300
The meaning is just the integral of little φ which we talked about.
00:23:07.300 --> 00:23:15.200
What that gives you is this idea of the area up to this point of x, whatever x is.
00:23:15.200 --> 00:23:27.400
Because we are talking about for a standard normal distribution that is why I’m using that φ or else I would use f(x) for the normal distribution function.
00:23:27.400 --> 00:23:29.300
That is PDF.
00:23:29.300 --> 00:23:32.500
That is a little bit easier.
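To make the "integral of little φ" idea concrete, here is a sketch that integrates φ numerically and cross-checks it against the closed form that Python's standard library provides through the error function; the cutoff at -10 and the number of slices are my own choices:

```python
import math

def phi(z):
    # standard normal PDF
    return math.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)

def big_phi(x, lo=-10.0, n=100_000):
    """CDF: area under phi from (effectively) -infinity up to x,
    approximated with a midpoint Riemann sum."""
    h = (x - lo) / n
    return sum(phi(lo + (i + 0.5) * h) for i in range(n)) * h

# Closed form via the error function: P(z < 1), about 0.8413.
exact = 0.5 * (1 + math.erf(1 / math.sqrt(2)))
print(round(big_phi(1.0), 4), round(exact, 4))
```

The 0.84 at z = 1 is exactly the "50% plus 34%" picture from the empirical rule: everything below the mean, plus one stdev above it.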
00:23:32.500 --> 00:23:34.600
Let us go into some examples.
00:23:34.600 --> 00:23:38.800
Here is example 1 and it is just talking about frequency graphs.
00:23:38.800 --> 00:23:44.500
It is not actually talking about CDF or the function.
00:23:44.500 --> 00:23:50.100
Here it says to estimate the score that falls at the 48th percentile.
00:23:50.100 --> 00:23:58.600
Percentile is a word that we say just for cumulative frequency or cumulative relative frequency.
00:23:58.600 --> 00:24:14.500
One thing that makes this chart easy to use is that we could just go to 48, I will use a different color and I will go all the way across and go down.
00:24:14.500 --> 00:24:18.700
We will get a rough approximation of that score.
00:24:18.700 --> 00:24:25.200
That score is about a little bit less than 500, let us say 480.
00:24:25.200 --> 00:24:29.900
That is the score that sits at the 48th percentile.
00:24:29.900 --> 00:24:34.400
If you wanted it a little bit more than 48, you could just round up to 500.
00:24:34.400 --> 00:24:45.100
If you want it less than the 48th percentile, you could round down to 450.
00:24:45.100 --> 00:24:46.400
Here is example 2.
00:24:46.400 --> 00:24:59.400
In problems with a normal distribution, the mean, stdev, x and the probability of x, these things are involved.
00:24:59.400 --> 00:25:04.700
Like a puzzle, if you have 3 you could figure out 4, right?
00:25:04.700 --> 00:25:14.600
Here we have couple of things that are missing but it gives you some of the other pieces and we have to figure out the other pieces that are missing.
00:25:14.600 --> 00:25:19.200
I’m just going to take a look at my first line.
00:25:19.200 --> 00:25:28.500
It says the mean is 3, stdev is 1, I have an x value, what is the probability of that x?
00:25:28.500 --> 00:25:46.800
I’m just going to pull up a regular Excel sheet and I have just labeled it with mean, stdev x, and probability of that x.
00:25:46.800 --> 00:25:55.800
I’m also going to write down z because often we need to find z in order to use the tables at the back of the book.
00:25:55.800 --> 00:26:00.700
I’m going to write in z.
00:26:00.700 --> 00:26:26.500
The mean here is 3, stdev is 1, x is 2; let us think about that in a normal distribution picture.
00:26:26.500 --> 00:26:27.600
The mean will be 3, stdev 1, this will be 2.
00:26:27.600 --> 00:26:32.900
Since x is that 2, it is asking for this.
00:26:32.900 --> 00:26:43.000
It should be about 16%, just using that empirical rule.
00:26:43.000 --> 00:26:47.000
Let us find out exactly how much that is.
00:26:47.000 --> 00:26:58.000
If you wanted to use the special Excel function, you could actually just use normdist because you have everything you need.
00:26:58.000 --> 00:27:14.300
You have your x, mean, stdev, and if you want the cumulative probability you would just write TRUE, which is what we want.
00:27:14.300 --> 00:27:20.600
We will get about 16% or .159.
00:27:20.600 --> 00:27:28.300
Another way you could do it is by finding the z score and then looking in up at the back of your book.
00:27:28.300 --> 00:27:43.500
You could use STANDARDIZE, or you could find the distance between the x and the mean and divide that by the stdev to get how many stdev away.
00:27:43.500 --> 00:27:47.500
It is 1 stdev away on the negative side.
00:27:47.500 --> 00:28:03.000
If I did not want to use this function, I could just use NORMSDIST, and that is where I will put in my z score, and I will get the same thing.
00:28:03.000 --> 00:28:05.300
Those are 2 different ways that you could do it.
00:28:05.300 --> 00:28:09.700
You could also look at that z score at the back of your book.
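Outside of Excel, the same cumulative probability can be computed with the error function from Python's standard library; `normdist_cumulative` is my own hypothetical stand-in for NORMDIST(..., TRUE):

```python
import math

def normdist_cumulative(x, mean, stdev):
    """Cumulative probability P(X < x), like Excel's NORMDIST with TRUE,
    using the closed form with the error function."""
    z = (x - mean) / stdev
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# First line of the table: mean 3, stdev 1, x = 2.
# That is one stdev below the mean, so about 16% (.159).
p = normdist_cumulative(2, 3, 1)
print(round(p, 3))
```

Standardizing first and looking up z = -1 in the back of the book is the same calculation done by hand.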
00:28:09.700 --> 00:28:11.600
Let us do the second line.
00:28:11.600 --> 00:28:33.300
Here we have the mean but we do not have the stdev; we do have x, which is .1, and P(x < .1) is at .18.
00:28:33.300 --> 00:28:43.600
Let us sketch this out to help ourselves and I will draw it in a different color.
00:28:43.600 --> 00:28:50.300
Here the mean is 10 but I do not know what to write here.
00:28:50.300 --> 00:29:04.100
What I do know is that around 18%, which is a little more than 16%, falls below that x of .1.
00:29:04.100 --> 00:29:11.400
My question is what is my stdev?
00:29:11.400 --> 00:29:16.500
What is this jump such that 18% falls below .1?
00:29:16.500 --> 00:29:27.700
We have all the pieces that we need, one thing that might be helpful to do is find the z score because the z score formula has the stdev.
00:29:27.700 --> 00:29:31.000
We have all the other pieces like the mean and x.
00:29:31.000 --> 00:29:37.400
Once we know the z score then we could easily find out the stdev.
00:29:37.400 --> 00:29:47.500
I'm just going to use my NORMSINV, which, if I put in the probability, will give me the z score,
00:29:47.500 --> 00:29:58.300
or you could also look up in the table at the back of your book and look for 18% and find the z score there.
00:29:58.300 --> 00:30:04.400
Here I find my z score is -.91.
00:30:04.400 --> 00:30:13.100
It is a little bit closer to the mean than -1 but it is almost -1.
00:30:13.100 --> 00:30:38.300
Once I have that, then I could easily find my stdev just by using my z score formula, because the z score formula is just (x - μ)/stdev, or (x - μ)/σ.
00:30:38.300 --> 00:30:57.300
What I want to solve for is σ; here I could just multiply both sides by σ and divide both sides by z, and I get σ = (x - μ)/z.
00:30:57.300 --> 00:31:06.200
Then I could easily do in Excel or in your calculator or in your head.
00:31:06.200 --> 00:31:21.500
I will just take (x - mean)/z score and I will get 10.81.
00:31:21.500 --> 00:31:33.600
That makes sense because if I went out about 10.81, that would be -.81 right here at the first stdev.
00:31:33.600 --> 00:31:38.300
This is much smaller than that.
00:31:38.300 --> 00:31:42.000
Here that answer makes sense.
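The same two steps, look up the z score for the probability and then solve σ = (x − μ)/z, can be sketched without Excel; `norm_inv` here is my own hypothetical NORMSINV stand-in, built by bisecting the CDF:

```python
import math

def norm_cdf(z):
    # standard normal cumulative probability
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def norm_inv(p, lo=-10.0, hi=10.0):
    """z score for a cumulative probability p (like NORMSINV), by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if norm_cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2

# Second line of the table: mean 10, x = .1, P(x < .1) = .18.
z = norm_inv(0.18)       # a little less than -1
sigma = (0.1 - 10) / z   # about 10.8
print(z, sigma)
```

A negative z divided into a negative distance gives a positive stdev, which is a good sanity check on the algebra.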
00:31:42.000 --> 00:31:44.800
Let us go to the third problem.
00:31:44.800 --> 00:32:05.900
Now I'm missing my mean, but I have my stdev, I have my x which is -.6, and I know that the probability where x < -.6 is 35%.
00:32:05.900 --> 00:32:09.600
I will put in that.
00:32:09.600 --> 00:32:17.300
Let us sketch that out just so we can check whether the answer that we get is reasonable.
00:32:17.300 --> 00:32:26.600
We do not have an idea what this middle value is, but we do know that each jump is 3 away.
00:32:26.600 --> 00:32:38.100
We cannot tell exactly where to write that -.6, but we could tell by the percentage that it cannot be up here.
00:32:38.100 --> 00:32:55.200
It must be somewhere here such that this is about 35%.
00:32:55.200 --> 00:33:05.200
It is not quite half of that half but it is a little bit more than half of this half.
00:33:05.200 --> 00:33:08.600
The question is what is this mean?
00:33:08.600 --> 00:33:14.100
We know at that point it is -.6.
00:33:14.100 --> 00:33:28.600
Whatever my mean is, it has got to be bigger than -.6, just because we know that this is not quite half yet, and the middle is the mean.
00:33:28.600 --> 00:33:40.500
Once again, it is helpful to find the z score, because the z score formula has the mean in it, and so it is easy to find the mean once we have the z score.
00:33:40.500 --> 00:33:55.800
Once again I'm going to use NORMSINV and put in my probability, and I will get a z score of -.38.
00:33:55.800 --> 00:33:57.200
That makes sense.
00:33:57.200 --> 00:34:03.100
That is in between 0 and -1.
00:34:03.100 --> 00:34:06.600
Let us use that in order to find our mean.
00:34:06.600 --> 00:34:20.400
If you want, you could just derive it from this again; instead of solving for σ, now we want to isolate μ.
00:34:20.400 --> 00:34:39.000
It would be zσ = x - μ, and I'm going to move the μ over to this side and move this over to that side.
00:34:39.000 --> 00:34:48.400
That is what I need to do in order to get my mean, μ.
00:34:48.400 --> 00:35:06.400
I will take my x and subtract the z score × stdev, and I will get a mean of .56.
00:35:06.400 --> 00:35:36.900
That makes sense because .56 is bigger than -.6, and since each stdev is a distance of 3, it makes sense for -.6 to fall in between .56 and -2.44.
00:35:36.900 --> 00:35:45.700
This makes sense to us so once you write the mean, the answer, you are good to go.
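The third row works the same way, just solved for μ = x − zσ; again, `norm_inv` is a hypothetical NORMSINV-style helper of my own, built by bisection:

```python
import math

def norm_inv(p, lo=-10.0, hi=10.0):
    """z score for a cumulative probability p (like NORMSINV), by bisection."""
    def cdf(z):
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if cdf(mid) < p else (lo, mid)
    return (lo + hi) / 2

# Third line of the table: stdev 3, x = -.6, P(x < -.6) = .35.
z = norm_inv(0.35)     # between 0 and -1, about -.39
mean = -0.6 - z * 3    # mean = x - z * stdev, about .56
print(round(mean, 2))
```

Because z is negative, subtracting zσ pushes the mean above -.6, matching the sketch where the mean sits to the right of x.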
00:35:45.700 --> 00:35:52.700
Now let us move on to example 3.
00:35:52.700 --> 00:36:01.400
It is another puzzle-like problem, but once you know the mean and the stdev you could find Q1 and Q3.
00:36:01.400 --> 00:36:07.200
If you have Q1 and Q3, that could help you figure out the mean and stdev.
00:36:07.200 --> 00:36:10.900
Here we are missing Q1 and Q3.
00:36:10.900 --> 00:36:14.100
Here we are missing the mean and stdev.
00:36:14.100 --> 00:36:17.400
And here I am missing a little bit of both.
00:36:17.400 --> 00:36:19.800
Let us get started.
00:36:19.800 --> 00:36:26.000
Here let us start just by drawing what we are talking about here.
00:36:26.000 --> 00:36:36.800
Here we have the mean and the stdev which is 5, 5 away on this side and 5 away on this side.
00:36:36.800 --> 00:36:46.900
We know that at about this little score right here we know that is 34% of the curve.
00:36:46.900 --> 00:36:56.100
Q1 wants to split it out into quartiles; 16 and 34 are not quite the numbers we need.
00:36:56.100 --> 00:37:04.700
We know that Q1 has to be somewhere in between 1 stdev away and 0 stdev away.
00:37:04.700 --> 00:37:09.300
We know that it needs to be somewhere in there.
00:37:09.300 --> 00:37:25.900
Not quite 1 stdev away, because we want this area to be 25%, making that 25%, and the same with Q3 up here.
00:37:25.900 --> 00:37:29.800
That is what we are looking for here.
00:37:29.800 --> 00:37:36.300
I will show you 2 ways of doing it just with Excel.
00:37:36.300 --> 00:37:40.300
Excel makes it a lot easier for us to do these things.
00:37:40.300 --> 00:37:51.200
Here I have mean, stdev, Q1, and Q3.
00:37:51.200 --> 00:37:55.800
We have the mean of 10 and stdev of 5.
00:37:55.800 --> 00:38:07.600
One thing that is helpful is just to know what the z score is at Q1, and that is never going to change because the z score is just how many stdev away.
00:38:07.600 --> 00:38:14.600
You could think about z scores as just being a reflection of the standard normal distribution and that never changes.
00:38:14.600 --> 00:38:18.600
The z at Q1, what is that?
00:38:18.600 --> 00:38:28.300
That is easy to find by using the NORMSINV function, where you would just put in the cumulative probability you want.
00:38:28.300 --> 00:38:34.000
That would be this area right here which is an easy 25%.
00:38:34.000 --> 00:38:41.600
I know my z score there, it is not quite -1 it is -.67.
00:38:41.600 --> 00:38:42.800
That makes sense.
00:38:42.800 --> 00:38:51.600
Once we know that, then we have all the things we need in order to find the raw score at Q1 or what we call x.
00:38:51.600 --> 00:39:11.500
We could put in the mean + z score × stdev, and because Excel follows order of operations, it will do the multiplication before it does the addition.
00:39:11.500 --> 00:39:16.800
That is going to be 6.63 and does that make sense?
00:39:16.800 --> 00:39:18.800
Yes it does.
00:39:18.800 --> 00:39:26.400
It is in between 5 and 10 and it is pretty close to 5 but not quite all the way to 5.
00:39:26.400 --> 00:39:35.800
That makes sense that would be 6.63 and we could do the same thing in order to find Q3.
00:39:35.800 --> 00:39:40.100
Let us find the z at Q3.
00:39:40.100 --> 00:39:54.300
If you wanted to do this without Excel, you could also easily do that, because you know that the z at Q3 covers 75% of that curve.
00:39:54.300 --> 00:40:02.100
This is also .25, if you add it all up that is 75% of that curve.
00:40:02.100 --> 00:40:06.400
You could just look that up in the table at the back of your book.
00:40:06.400 --> 00:40:19.300
Look for .75 and then look for the z score that corresponds to that point, or you could find that in Excel by using NORMSINV.
00:40:19.300 --> 00:40:26.900
It is important to have that S because you are just looking for the z score, putting in the probability of 75%.
00:40:26.900 --> 00:40:37.800
.75 gives us .67 as the z score and that makes sense that these z scores for Q1 and Q3 precisely mirror each other.
00:40:37.800 --> 00:40:42.000
They are just the negative and positive versions of each other.
00:40:42.000 --> 00:40:53.300
For Q3, we could just use the mean and add how many stdev away you want to go.
00:40:53.300 --> 00:41:00.400
That is z × stdev and that gives us 13.37.
00:41:00.400 --> 00:41:10.700
If we look on this side, 13.37 is not quite 15, but it is closer to 15 than it is to 10.
00:41:10.700 --> 00:41:12.200
That makes sense.
00:41:12.200 --> 00:41:25.500
I will write that in here 13.37 and this is something like 6.63.
00:41:25.500 --> 00:41:29.300
That is finding Q1 and Q3.
00:41:29.300 --> 00:41:35.900
There is yet another way that you can do it in Excel and this is going to be a super short cut.
00:41:35.900 --> 00:41:45.500
You could use the norm inv function because you have the probability .25.
00:41:45.500 --> 00:41:50.300
You have the mean and the stdev.
00:41:50.300 --> 00:41:53.500
That will give you the raw score directly.
00:41:53.500 --> 00:42:02.600
You could also use that for finding Q3 by using norm inv, where you put in the probability and it will spit out the raw score.
00:42:02.600 --> 00:42:15.000
The probability is .75, mean is 10, stdev is 5, and once again we get the same thing.
00:42:15.000 --> 00:42:18.100
For a lot less work you do not have to go through the z score method.
00:42:18.100 --> 00:42:26.900
The z score method is helpful just because you could also use the back of your book, and maybe on your test you will not have Excel.
00:42:26.900 --> 00:42:31.300
That is a good and helpful thing to know.
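The norm inv super shortcut corresponds to a single inverse-CDF call on a distribution built with the mean and stdev; again a hedged Python sketch of the same step (NormalDist is my substitution, not the lecture's tool):

```python
from statistics import NormalDist

# One-step shortcut: Excel's NORMINV(p, mean, stdev) corresponds to the
# inverse CDF of a normal distribution with that mean and stdev.
dist = NormalDist(mu=10, sigma=5)

q1 = dist.inv_cdf(0.25)   # raw score at Q1, no separate z-score step
q3 = dist.inv_cdf(0.75)   # raw score at Q3

print(round(q1, 2), round(q3, 2))  # 6.63 13.37
```

Same answers as the z score method, for a lot less work.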
00:42:31.300 --> 00:42:34.700
Now let us move on to the second problem.
00:42:34.700 --> 00:42:40.100
In the second row, the problem has changed a little bit.
00:42:40.100 --> 00:42:54.900
We have the same curve, but we do not know the mean or the stdev, that jump; we do know this, though.
00:42:54.900 --> 00:43:05.200
We know here the score is 120 and we know here the score is 180.
00:43:05.200 --> 00:43:09.600
I’m just going to show you a quick shortcut here, but it is a very reasonable shortcut.
00:43:09.600 --> 00:43:14.600
We know that the mean has to be in the middle of these 2 numbers.
00:43:14.600 --> 00:43:16.900
Those 2 numbers are mirrors of each other.
00:43:16.900 --> 00:43:23.500
They are exactly 25% away on this side and 25% away on this side and the normal distribution is perfectly symmetrical.
00:43:23.500 --> 00:43:27.800
We know that the mean has to be the point in the middle.
00:43:27.800 --> 00:43:38.900
There is only one point that is precisely in the middle of those 2 numbers, and we could easily find that by taking the average of 120 and 180.
00:43:38.900 --> 00:43:42.000
Right between 120 and 180.
00:43:42.000 --> 00:43:58.300
I could just say take the average of these 2 numbers or you could alternatively add 120 to 180 and divide it by 2 and we would get 150.
00:43:58.300 --> 00:44:01.700
150 looks about right.
00:44:01.700 --> 00:44:07.500
That looks like where it should fall.
00:44:07.500 --> 00:44:13.300
The question now is what is the stdev?
00:44:13.300 --> 00:44:26.700
One way you could easily do this is we know the z score and we could use it to figure out the stdev.
00:44:26.700 --> 00:44:29.400
I do not have to find the z score again.
00:44:29.400 --> 00:44:32.400
I’m just going to use these.
00:44:32.400 --> 00:44:35.100
Now let us think about the formula for stdev.
00:44:35.100 --> 00:44:44.400
If you remember from the previous problem, it is just going to be whatever x is.
00:44:44.400 --> 00:44:51.400
I’m just going to use Q1 as my x: (x – mean) / z score.
00:44:51.400 --> 00:44:55.200
I will get 44.48.
00:44:55.200 --> 00:44:59.700
Let us see if that makes sense to us.
00:44:59.700 --> 00:45:26.100
If we are at 150 and we go out 44.5, then that should give us about 105 or 106.
00:45:26.100 --> 00:45:41.200
If we go out that far that makes sense because 120 falls in between that and 150 but it is a little bit closer to the 105 than it is to the 150.
00:45:41.200 --> 00:45:48.200
If we go out on the other side it will be 194.5.
00:45:48.200 --> 00:45:54.000
Once again it makes sense because 180 is pretty close to it but not all the way up there.
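The whole second problem (Q1 = 120, Q3 = 180, mean and stdev both unknown) fits in a few lines; this Python sketch is my own restatement of the steps above, not part of the lecture:

```python
from statistics import NormalDist

q1, q3 = 120, 180

# The normal curve is perfectly symmetric, so the mean sits exactly
# halfway between Q1 and Q3.
mean = (q1 + q3) / 2          # 150

# Rearranging z = (x - mean) / stdev gives stdev = (x - mean) / z,
# using Q1 as x and the z score at the 25th percentile.
z_q1 = NormalDist(0, 1).inv_cdf(0.25)   # about -0.6745
stdev = (q1 - mean) / z_q1
print(mean, round(stdev, 2))  # 150.0 44.48
```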
00:45:54.000 --> 00:46:00.100
Last problem in this set.
00:46:00.100 --> 00:46:09.700
Here we do not know the mean but we do know the stdev, the jump.
00:46:09.700 --> 00:46:12.500
This is a jump of 10.
00:46:12.500 --> 00:46:15.900
We know Q1.
00:46:15.900 --> 00:46:23.800
Here is Q1 and that is 100.
00:46:23.800 --> 00:46:30.300
We know that the mean has to be greater than 100.
00:46:30.300 --> 00:46:36.900
We do not know exactly how much greater but we know it is greater than 100.
00:46:36.900 --> 00:46:46.300
It cannot be 110 because you are not going 1 stdev out, you are going less than 1 stdev out.
00:46:46.300 --> 00:46:50.000
Let us see if we could figure out this strategy.
00:46:50.000 --> 00:46:54.400
We could find the z score very easily but we already know it.
00:46:54.400 --> 00:46:57.900
Using the z score we could find the mean.
00:46:57.900 --> 00:47:01.300
Once we know that we could find Q3.
00:47:01.300 --> 00:47:09.200
I will move this up here.
00:47:09.200 --> 00:47:13.000
Here we do not know the mean, but we do know the stdev.
00:47:13.000 --> 00:47:18.200
We know Q1 and we know the z score at Q1.
00:47:18.200 --> 00:47:24.000
Using that z score in Q1 I’m just going to go ahead and find my mean.
00:47:24.000 --> 00:47:43.200
In order to find the mean, if you remember from the previous problem it is just x – stdev × z score.
00:47:43.200 --> 00:47:53.500
The mean is at 106.75.
00:47:53.500 --> 00:47:56.000
That makes sense.
00:47:56.000 --> 00:48:29.100
It is not quite 110 but it is in between there, and once we know that mean of 106.7, then we could easily use that in order to find Q3.
00:48:29.100 --> 00:48:52.300
Q3 we could use norm inv and put in probability, mean, and stdev, or you could use the z score in order to find Q3.
00:48:52.300 --> 00:49:12.500
That 113 makes sense, because if the mean is 106 or 107, then going 10 out would be 116.7, and that is too far out.
00:49:12.500 --> 00:49:16.400
113 is perfect for Q3.
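The chain of reasoning in this third problem (Q1 = 100, stdev = 10, mean unknown) can also be sketched; as before, the NormalDist calls are my own stand-in for the lecture's Excel functions:

```python
from statistics import NormalDist

q1, stdev = 100, 10

# mean = x - z * stdev, rearranged from z = (x - mean) / stdev
z_q1 = NormalDist(0, 1).inv_cdf(0.25)   # about -0.6745
mean = q1 - z_q1 * stdev                # about 106.7

# With the mean recovered, Q3 is one inverse-CDF call away
# (the analogue of NORMINV(0.75, mean, stdev)).
q3 = NormalDist(mean, stdev).inv_cdf(0.75)
print(round(mean, 1), round(q3, 1))  # 106.7 113.5
```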
00:49:16.400 --> 00:49:32.100
That is example 3 and notice that it just takes a little bit of reasoning to get around some of these things.
00:49:32.100 --> 00:49:35.500
Let us go to example 4.
00:49:35.500 --> 00:49:43.900
The mean age of the cars in an old town is 12 years and the stdev is 8 years; what percentage of cars are more than 4 years old?
00:49:43.900 --> 00:49:51.900
One thing that helps me is if I draw a little distribution to help me out.
00:49:51.900 --> 00:50:24.400
The mean is 12, stdev is 8, what percentage of cars are more than 4 years old?
00:50:24.400 --> 00:50:31.200
There is an implicit issue here.
00:50:31.200 --> 00:50:33.700
Let us say we go another 8 out that would mean we are at -4.
00:50:33.700 --> 00:50:37.300
Can a car be -4 years old?
00:50:37.300 --> 00:50:40.100
Let us think about this.
00:50:40.100 --> 00:50:47.400
It looks like a car that exists only in somebody’s head or in plans.
00:50:47.400 --> 00:50:51.800
-4 years old is hard to think about.
00:50:51.800 --> 00:51:07.800
When we think about this distribution we want to cut it off at 0 because cars just are not -1 years old.
00:51:07.800 --> 00:51:10.300
It starts at 0.
00:51:10.300 --> 00:51:16.000
If we think about where 0 is, that is right in between there.
00:51:16.000 --> 00:51:27.600
When we think about it in terms of z score, the z score is 0, this is -1 and this is about -1.5.
00:51:27.600 --> 00:51:39.200
We are thinking about we do not want to count these cars because they do not exist.
00:51:39.200 --> 00:51:47.100
Thankfully our question is what percentage of the cars are more than 4 years old.
00:51:47.100 --> 00:51:57.000
Our question is a trick question to help us think about this issue.
00:51:57.000 --> 00:52:05.600
We are asking for this but remember percentage is always what is that compared to the whole.
00:52:05.600 --> 00:52:08.500
The whole is a little bit different here.
00:52:08.500 --> 00:52:15.500
The whole is not this whole curve because this part does not count.
00:52:15.500 --> 00:52:24.200
The whole is actually this part.
00:52:24.200 --> 00:52:30.000
It is asking what is the blue part in proportion to the red part?
00:52:30.000 --> 00:52:32.300
Tricky question.
00:52:32.300 --> 00:52:35.000
This takes a little bit of thinking.
00:52:35.000 --> 00:53:01.500
One thing we probably want to do is figure out the proportion of cars where age is greater than 4 years old and divide that by the proportion of cars where age > 0.
00:53:01.500 --> 00:53:04.600
That we could easily do by using z scores.
00:53:04.600 --> 00:53:21.800
We could take p(z score > -1) divided by p(z score > -1.5).
00:53:21.800 --> 00:53:35.700
You could look these probabilities up at the back of your book, or we will find these probabilities in Excel.
00:53:35.700 --> 00:53:55.300
Here is p where z > -1 and remember because we want the greater side, we have to do 1 – the functions here
00:53:55.300 --> 00:53:59.700
because Excel will give us the part on the negative side.
00:53:59.700 --> 00:54:02.800
Instead of the greater than side it will give us the less than side.
00:54:02.800 --> 00:54:17.800
I could use normsdist, where we put in the z, -1, and it will spit out the probability, but it will spit out the less-than probability.
00:54:17.800 --> 00:54:25.700
Here I want to put in the 1 – normsdist and this should be greater than 50%.
00:54:25.700 --> 00:54:30.900
It should be 80 or something percent.
00:54:30.900 --> 00:54:41.000
Let us find the probability where the z is greater than – 1.5.
00:54:41.000 --> 00:54:49.400
I could use that same function normsdist of -1.5.
00:54:49.400 --> 00:54:55.700
Once again because I want the greater than part, I want to use my 1-.
00:54:55.700 --> 00:55:06.300
Once we have that, then we could get the proportion: what percentage of cars is here over here.
00:55:06.300 --> 00:55:13.900
That would be this over that.
00:55:13.900 --> 00:55:18.400
That is 90% of cars.
00:55:18.400 --> 00:55:22.400
This will be 90% of cars.
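The whole car-age calculation, with the cutoff at 0 and the blue part over the red part, can be sketched as follows; normsdist corresponds to the standard normal CDF, and the Python stand-in is my own, not the lecture's:

```python
from statistics import NormalDist

mean, stdev = 12, 8
z = NormalDist(0, 1)

# normsdist gives the less-than area, so subtract from 1
# to get the greater-than side.
p_over_4 = 1 - z.cdf((4 - mean) / stdev)   # P(z > -1),   about .84
p_over_0 = 1 - z.cdf((0 - mean) / stdev)   # P(z > -1.5), about .93

# Percentage of *existing* cars (age > 0) that are more than 4 years old
share = p_over_4 / p_over_0
print(round(share, 2))  # 0.9
```

The whole is only the part of the curve above 0, which is why the answer is a ratio of two greater-than probabilities rather than a single lookup.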
00:55:22.400 --> 00:55:26.100
This seems to be a little bit of a tricky problem.
00:55:26.100 --> 00:55:33.100
It does not look tricky at first, but watch out for things like this, where there is a cutoff at 0.
00:55:33.100 --> 00:55:39.200
You cannot have a negative age in this case.
00:55:39.200 --> 00:55:41.700
Watch out for these problems.
00:55:41.700 --> 00:55:44.000
That is it for www.educator.com.