WEBVTT mathematics/statistics/son
00:00:00.000 --> 00:00:02.700
Welcome to www.educator.com.
00:00:02.700 --> 00:00:08.600
Now let us talk about standard normal distributions and z scores.
00:00:08.600 --> 00:00:15.500
First we are going to contrast the normal distribution against standard normal distribution.
00:00:15.500 --> 00:00:21.500
It is pretty easy, because just by knowing the normal distribution you already really know the standard normal distribution.
00:00:21.500 --> 00:00:31.100
Then we are going to talk about some normal distribution problems and contrast that with standard normal distribution problems.
00:00:31.100 --> 00:00:37.600
Before, we talked about how the normal distribution really is a family of distributions; it is not just one shape,
00:00:37.600 --> 00:00:42.500
but it could be stretched or shrunk on that x axis.
00:00:42.500 --> 00:00:50.900
And because of that it is actually an infinite set of distributions with all the different means and different standard deviations.
00:00:50.900 --> 00:00:53.500
You could have a mean of 10 and a standard deviation of 10.
00:00:53.500 --> 00:00:54.900
That is one normal distribution.
00:00:54.900 --> 00:00:59.500
Another one could have a mean of 1 and a stdev of 2.
00:00:59.500 --> 00:01:01.900
There is like an infinite number of these.
00:01:01.900 --> 00:01:05.800
They all fit the empirical rule.
00:01:05.800 --> 00:01:12.300
Because it is problematic to work with every single one of these different normal distributions,
00:01:12.300 --> 00:01:20.800
they have thought of this transformation system where you transform all normal distributions into what is called the standard normal distribution.
00:01:20.800 --> 00:01:29.300
And there, what we are doing is making the mean 0 and the stdev 1.
00:01:29.300 --> 00:01:30.500
This way we do not have to worry about the actual values.
00:01:30.500 --> 00:01:35.000
We do not have to worry that the mean is actually 50.
00:01:35.000 --> 00:01:42.300
We just have to know the mean is in the middle.
00:01:42.300 --> 00:01:51.500
And the standard deviation by being 1 makes it really easy for us to label this stuff.
00:01:51.500 --> 00:02:13.300
In a standard normal distribution, what we have really done is transform the values into z scores, or what we call standard deviation units.
00:02:13.300 --> 00:02:22.100
We have normalized everything; we do not care that the standard deviation is actually 10.
00:02:22.100 --> 00:02:25.800
We just care how many stdevs away the values are.
00:02:25.800 --> 00:02:31.800
Are you 1 standard deviation away, 2 standard deviations away? Because of that we have changed this x axis:
00:02:31.800 --> 00:02:35.900
instead of actually putting the values we are now putting the z scores.
00:02:35.900 --> 00:02:40.400
That is the only difference between the normal distribution and the standard normal distribution.
00:02:40.400 --> 00:02:47.900
In the standard normal distribution we basically ignore the values and we only use the z scores.
00:02:47.900 --> 00:02:51.200
We have everything in terms of standard deviations.
00:02:51.200 --> 00:02:54.800
Are you 1 standard deviation away, 1/2 standard deviation away?
00:02:54.800 --> 00:02:59.700
That is what we care about.
00:02:59.700 --> 00:03:12.700
In a normal distribution, which I will draw in blue, a regular normal distribution, what we usually have is probability, which is the area.
00:03:12.700 --> 00:03:16.900
This is represented by the area.
00:03:16.900 --> 00:03:26.800
The raw score, with the mean and standard deviations and actual values.
00:03:26.800 --> 00:03:32.200
Things like 150, 450.
00:03:32.200 --> 00:03:39.900
We have the z score, 0, -1, 1.
00:03:39.900 --> 00:03:44.300
That is what we think about a normal distribution, a regular normal distribution.
00:03:44.300 --> 00:03:49.800
What we do in a standard normal distribution is we ignore that part.
00:03:49.800 --> 00:03:51.000
It is the same thing.
00:03:51.000 --> 00:03:54.700
You still have probability and we still have the z scores, right?
00:03:54.700 --> 00:04:00.400
Now all the scores are z scores because we do not have any other scores other than that.
00:04:00.400 --> 00:04:10.400
That is the only thing that is different about a standard normal distribution.
00:04:10.400 --> 00:04:17.600
Now let us talk about the z score, the raw score, the mean, and the stdev, and the relationship that they have to each other.
00:04:17.600 --> 00:04:22.200
Before we really did it in an intuitive way.
00:04:22.200 --> 00:04:35.700
For instance, if our mean was 10, the mean is 10 and our standard deviation is 5.
00:04:35.700 --> 00:04:40.200
How do we know where to draw these notches?
00:04:40.200 --> 00:04:49.400
Well, if it is a normal distribution, we know that this area should be about 68%, more than half.
00:04:49.400 --> 00:04:56.500
What you could do is look for the point of inflection, and often in my drawings the point of inflection is hard to see,
00:04:56.500 --> 00:04:59.800
but in a nicer drawing you could see the point of inflection.
00:04:59.800 --> 00:05:08.900
What we could see is 1 standard deviation away means that it is a distance of 5.
00:05:08.900 --> 00:05:11.800
What would this value be right here?
00:05:11.800 --> 00:05:17.800
Since it is 5 on the negative side, all we have to do is subtract 5 and it is pretty intuitive.
00:05:17.800 --> 00:05:23.100
If we wanted to go another 5 steps to the left, that would be 0.
00:05:23.100 --> 00:05:27.700
If we wanted to go yet another 5 steps to the left that would be -5.
00:05:27.700 --> 00:05:35.400
If we wanted to go yet another 5 steps to the left that would be -10 and so on and so forth.
00:05:35.400 --> 00:05:47.900
Same thing on the positive side, if we wanted to go 5 steps to the right and that would be 15, 20, 25, 30 and so forth.
00:05:47.900 --> 00:06:03.000
On to infinity and negative infinity. Now, before, we just did it in a very intuitive way, and these notches just correspond with z scores.
00:06:03.000 --> 00:06:13.000
This is 1 standard deviation out, 2 standard deviations out, 3 standard deviations out,
00:06:13.000 --> 00:06:23.700
but there is a systematic relationship that we could exploit here and turn into a formula between the z score, raw score, mean, and standard deviation.
00:06:23.700 --> 00:06:42.600
If we wanted to go 2 standard deviations out and find what this value was, in order to find the raw score what we do is take the mean
00:06:42.600 --> 00:06:47.900
and then add the z score multiplied by the stdev.
00:06:47.900 --> 00:07:01.900
This actually works on the negative side too, because when we have a z score of -1 you would subtract the stdev from the mean instead of adding it.
00:07:01.900 --> 00:07:04.700
This formula ends up being good for us.
00:07:04.700 --> 00:07:09.600
Instead of raw score, we usually call the raw score x.
00:07:09.600 --> 00:07:16.000
Instead of writing out mean we would write mu.
00:07:16.000 --> 00:07:22.400
Then we could keep z and write sigma as our stdev.
00:07:22.400 --> 00:07:31.900
Now we see our formula here that allows us to go from the mean, z score and standard deviation into a raw score.
00:07:31.900 --> 00:07:37.300
We could use this formula to find anything, any of those four things.
00:07:37.300 --> 00:07:45.000
If we wanted to find the z score, for instance, and we did some algebra on this,
00:07:45.000 --> 00:07:57.400
we would see that it is actually the distance between the raw score and the mean, divided by the standard deviation.
00:07:57.400 --> 00:08:00.700
Let us think for a second about what that means.
00:08:00.700 --> 00:08:07.000
That means the z score is: take this distance and cut it up into these chunks.
00:08:07.000 --> 00:08:09.300
How many of those chunks do you have?
00:08:09.300 --> 00:08:10.200
The z score really
00:08:10.200 --> 00:08:15.600
is telling you how many standard deviations away from the mean you are.
00:08:15.600 --> 00:08:20.700
It is totally relative to the mean and standard deviation.
00:08:20.700 --> 00:08:29.500
Also you can use this very same formula to solve for the mean and standard deviation.
00:08:29.500 --> 00:08:33.600
We could do that over here right.
00:08:33.600 --> 00:08:39.500
Let us say we wanted to solve for the mean, what would that look like?
00:08:39.500 --> 00:08:48.600
All we have to do is move this over, so the mean would be x minus the z score multiplied by the standard deviation.
00:08:48.600 --> 00:08:54.900
Let us say we wanted to find standard deviation, what would we do then?
00:08:54.900 --> 00:08:57.900
That is pretty easy as well.
00:08:57.900 --> 00:09:12.600
All you have to do is take this formula and swap these guys, and that would be stdev = (x – mean) ÷ z.
00:09:12.600 --> 00:09:19.300
Here we see that with this very simple formula, you could just remember one of these
00:09:19.300 --> 00:09:23.000
because having one of them, you can derive the other ones from it.
00:09:23.000 --> 00:09:31.200
Just by having one of these formulas if you had any 3 of these you could solve for the fourth one.
00:09:31.200 --> 00:09:34.000
You do not even have to do the algebraic transformations.
00:09:34.000 --> 00:09:43.200
You could just plug them in and find out what is missing.
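The rearrangements above can be sketched in a few lines of Python. This is only an illustration: Python is not used in this lesson, and the function names are made up for the sketch. Each function is the same relationship, raw score = mean + z score × stdev, solved for a different piece.

```python
# Hypothetical helper names; all four are rearrangements of
# x = mu + z * sigma.

def raw_score(mu, z, sigma):
    """Raw score x from the mean, z score, and stdev."""
    return mu + z * sigma

def z_score(x, mu, sigma):
    """How many stdevs x is away from the mean."""
    return (x - mu) / sigma

def mean_from(x, z, sigma):
    """Mean, given a raw score, its z score, and the stdev."""
    return x - z * sigma

def stdev_from(x, mu, z):
    """Stdev, given a raw score, the mean, and a nonzero z score."""
    return (x - mu) / z

# The drawing from earlier: mean 10, stdev 5.
print(raw_score(10, 2, 5))    # 2 stdevs to the right: 20
print(raw_score(10, -1, 5))   # 1 stdev to the left: 5
print(z_score(20, 10, 5))     # 2.0
print(mean_from(20, 2, 5))    # 10
print(stdev_from(20, 10, 2))  # 5.0
```

Note that the last rearrangement cannot be used when the z score is 0, because a raw score sitting exactly on the mean says nothing about the stdev.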
00:09:43.200 --> 00:09:52.200
So far we have only talked about z scores that are nice and even, like 1, -1, 2, -2.
00:09:52.200 --> 00:09:54.600
We have not talked about weird z scores.
00:09:54.600 --> 00:10:08.000
For instance, what about a z score of something like .5?
00:10:08.000 --> 00:10:15.200
What would be the area over here?
00:10:15.200 --> 00:10:25.500
Some of you may be tempted to take .34 and divide it in half, but as you will see from this picture, that would not work.
00:10:25.500 --> 00:10:34.500
You are not really dividing this area in half; this area still ends up being slightly bigger because it is taller than this area.
00:10:34.500 --> 00:10:39.900
And this area has a chunk taken out of it.
00:10:39.900 --> 00:10:42.500
You cannot just divide it in half.
00:10:42.500 --> 00:10:46.700
That is not going to be a good strategy.
00:10:46.700 --> 00:10:49.700
That would not give you the right area.
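To see numerically why halving .34 fails, here is a small Python sketch. It assumes the standard library's `statistics.NormalDist` (Python 3.8 or later), which is not part of this lesson; the variable names are only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, stdev 1

# Area between z = 0 and z = 1 (the familiar 34%).
area_0_to_1 = std_normal.cdf(1.0) - std_normal.cdf(0.0)

# Area between z = 0 and z = .5, the piece closer to the mean.
area_0_to_half = std_normal.cdf(0.5) - std_normal.cdf(0.0)

print(round(area_0_to_1, 4))      # about 0.3413
print(round(area_0_to_1 / 2, 4))  # naive halving: about 0.1707
print(round(area_0_to_half, 4))   # actual area: about 0.1915
```

The piece next to the mean is taller, so it holds more than half of the 34%.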
00:10:49.700 --> 00:10:51.100
How do we deal with this?
00:10:51.100 --> 00:10:59.900
Well, there are two ways of dealing with these weird z scores, and of going from these weird z scores to probabilities that we do not know yet,
00:10:59.900 --> 00:11:05.000
ones that are not nice like 34 or 13½.
00:11:05.000 --> 00:11:06.500
There are two ways of doing it.
00:11:06.500 --> 00:11:10.000
One way is by looking it up in a table.
00:11:10.000 --> 00:11:15.300
Often there are tables in the back of your textbook or even in the back of an AP Statistics
00:11:15.300 --> 00:11:30.100
book, like a Princeton Review book or something, that show you the transformation from weird z scores like .5, .67, or 1.9 into probabilities.
00:11:30.100 --> 00:11:33.900
That is one way of doing it using z tables.
00:11:33.900 --> 00:11:42.900
The second way of doing it is by using Excel or your calculator.
00:11:42.900 --> 00:11:49.800
If you have a fancy TI-something calculator, it should also come with functions similar to Excel's.
00:11:49.800 --> 00:12:00.100
I’m going to show you those two methods of how to go from weird z scores to probabilities and vice versa.
00:12:00.100 --> 00:12:05.200
How to go from probabilities, weird probabilities like 51%.
00:12:05.200 --> 00:12:13.300
We know where 50% is, but we do not know what this value would be for 51%.
00:12:13.300 --> 00:12:18.700
How to go from those weird probabilities into weird z scores.
00:12:18.700 --> 00:12:32.500
First, let us talk about the method using the tables in the back of your book. Usually it is the first table you will see back there, table A or something like that, and let us break it down.
00:12:32.500 --> 00:12:39.900
A lot of tables look somewhat like this one; yours might look slightly different but is probably roughly similar.
00:12:39.900 --> 00:12:45.000
And what it shows you up here is exactly what probability is plotted down here.
00:12:45.000 --> 00:12:51.600
What it shows you is the probability that is shaded on the negative side,
00:12:51.600 --> 00:13:03.200
everything below the z score, and this is what we call the cumulative probability because you are accumulating it as you go.
00:13:03.200 --> 00:13:06.100
It is adding up all the probabilities on this side.
00:13:06.100 --> 00:13:12.800
This is showing you the cumulative probability at the z score.
00:13:12.800 --> 00:13:21.100
The table entry for z is the probability lying below z.
00:13:21.100 --> 00:13:24.600
Here are the z scores and these are the probabilities.
00:13:24.600 --> 00:13:46.800
Now for the weird z scores, what would really be helpful is if we had z scores of -3.4, -3.41, -3.42, -3.43.
00:13:46.800 --> 00:13:49.300
We had all of these decimal places.
00:13:49.300 --> 00:13:53.100
It would be really nice if we had all these different z scores.
00:13:53.100 --> 00:14:03.400
That would probably be a skinny list of a whole bunch of z scores and a skinny list of a whole bunch of different probabilities.
00:14:03.400 --> 00:14:07.600
That would be a very inefficient use of space.
00:14:07.600 --> 00:14:21.000
What a lot of tables do is put the z score up to the tenths place on this side, and then put the hundredths place along this dimension.
00:14:21.000 --> 00:14:36.100
In order to find the probability for the z score -3.45, you have to find -3.4 and then go to .05.
00:14:36.100 --> 00:14:47.500
It is like you take this and stick it on the end; if you actually added it, it would not work because it is negative.
00:14:47.500 --> 00:15:01.900
Here that would be -3.45 and at -3.45 the cumulative probability is .0003.
00:15:01.900 --> 00:15:07.600
Notice that it is a very small probability, it is not 0 but it is very, very small.
00:15:07.600 --> 00:15:09.000
And let us do another example.
00:15:09.000 --> 00:15:19.200
Let us say we wanted to know the probability at the z score -2.48.
00:15:19.200 --> 00:15:33.500
We go to -2.4 and then also go to .08, and that would be our probability, .0066, less than 1%.
00:15:33.500 --> 00:15:38.600
What if we wanted to find out the upper side?
00:15:38.600 --> 00:15:43.400
You might say, you are only giving me the cumulative probability on the lower side.
00:15:43.400 --> 00:15:55.200
What if we wanted to find out this probability at 2.48?
00:15:55.200 --> 00:15:56.100
What would we do then?
00:15:56.100 --> 00:16:04.700
Because the normal distribution and the standard normal distribution are perfectly symmetrical,
00:16:04.700 --> 00:16:10.000
what we would see is that if we know it for the negative side, we can just flip it over onto the positive side.
00:16:10.000 --> 00:16:25.800
If the negative side over here at -2.48 has a probability of .0066, then this area above 2.48 is also .0066, because it is perfectly symmetrical.
00:16:25.800 --> 00:16:31.100
You do not need to have both the positive and negative sides.
00:16:31.100 --> 00:16:36.900
Oftentimes tables might just give you the positive side or the negative side.
00:16:36.900 --> 00:16:46.900
Sometimes, they will give you both, but you do not actually need both, you could just figure it out from there.
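If you want to check the table lookups and the symmetry argument without a table, here is a minimal Python sketch. It assumes the standard library's `statistics.NormalDist`, which plays the role of the z table here; it is not something the lesson itself uses.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, stdev 1

# Cumulative probability below a z score, like a z-table entry.
p_below_345 = std_normal.cdf(-3.45)
p_below_248 = std_normal.cdf(-2.48)

# Upper side: the area above +2.48 is 1 minus the cumulative area.
p_above_248 = 1 - std_normal.cdf(2.48)

print(round(p_below_345, 4))  # about 0.0003
print(round(p_below_248, 4))  # about 0.0066
print(round(p_above_248, 4))  # about 0.0066, same by symmetry
```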
00:16:46.900 --> 00:16:49.500
Let us talk about how to do that with Excel.
00:16:49.500 --> 00:16:58.300
In Excel there are nice functions prewritten for us so that we do not have to actually look it up in a table.
00:16:58.300 --> 00:17:15.600
For normal distributions, when you know the mean and the standard deviation, there are ways to go from the raw score to the area underneath the curve.
00:17:15.600 --> 00:17:19.700
So basically it is very similar to that picture we had for the z score.
00:17:19.700 --> 00:17:24.900
What it will give you is this cumulative probability.
00:17:24.900 --> 00:17:29.900
That entire area below that value.
00:17:29.900 --> 00:17:37.000
For the normal distribution, you need to enter the mean and the standard deviation.
00:17:37.000 --> 00:17:46.300
In order to go from the score to the probability you would use the NORMDIST function, and
00:17:46.300 --> 00:17:52.300
you would put in the score, but you would also put in the mean and the standard deviation.
00:17:52.300 --> 00:18:05.700
And there is another thing: you also have to put in TRUE in order to get the cumulative probability, because that last argument is asking whether you want the cumulative probability.
00:18:05.700 --> 00:18:10.600
That will give you the probability of that score,
00:18:10.600 --> 00:18:12.600
the cumulative probability of that score.
00:18:12.600 --> 00:18:16.700
NORMINV is just the inverse of that.
00:18:16.700 --> 00:18:21.300
Here we give it the probability and it spits out the score.
00:18:21.300 --> 00:18:31.900
Here you give it the probability, the mean, and the standard deviation, and it spits back out to you the score.
00:18:31.900 --> 00:18:47.100
Notice that these two are inverses, which is why one of them is called NORMINV: in one you get the probability, and in the other you get the score.
00:18:47.100 --> 00:18:51.800
These 2 just flip the values around.
00:18:51.800 --> 00:19:04.700
For standard normal distribution, you do not need to enter in the mean and standard deviation because all standard normal distribution means are 0.
00:19:04.700 --> 00:19:08.100
The mean is 0 and the stdev is 1.
00:19:08.100 --> 00:19:12.000
It makes it a lot simpler for us.
00:19:12.000 --> 00:19:18.200
The only difference between a normal distribution and standard normal distribution is this one little letter.
00:19:18.200 --> 00:19:32.000
All you have to do is remember to enter NORMSDIST, and here you could just put in the z score, forget everything else, and it will spit out the probability.
00:19:32.000 --> 00:19:48.400
For NORMSINV, it is exactly the opposite: you put in the probability and it spits back out the z score.
00:19:48.400 --> 00:19:52.200
Another handy little function is STANDARDIZE.
00:19:52.200 --> 00:20:04.600
Here you put in the raw score, the mean, and the standard deviation, and it will give you the z score.
00:20:04.600 --> 00:20:13.500
The STANDARDIZE function simply uses the formula that we talked about earlier, where
00:20:13.500 --> 00:20:25.700
in order to give you the z score, it takes (the raw score, or x, – mu) ÷ stdev.
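For anyone working outside Excel, the five functions just described can be mimicked in Python. This is a rough sketch, assuming the standard library's `statistics.NormalDist`; the wrapper names below are made up to echo the Excel names and are not real Python or Excel APIs. The wrappers cover only the cumulative (TRUE) case.

```python
from statistics import NormalDist

def norm_dist(x, mean, stdev):
    """Like NORMDIST(x, mean, stdev, TRUE): raw score -> cumulative probability."""
    return NormalDist(mean, stdev).cdf(x)

def norm_inv(p, mean, stdev):
    """Like NORMINV(p, mean, stdev): cumulative probability -> raw score."""
    return NormalDist(mean, stdev).inv_cdf(p)

def norm_s_dist(z):
    """Like NORMSDIST(z): z score -> cumulative probability."""
    return NormalDist().cdf(z)

def norm_s_inv(p):
    """Like NORMSINV(p): cumulative probability -> z score."""
    return NormalDist().inv_cdf(p)

def standardize(x, mean, stdev):
    """Like STANDARDIZE(x, mean, stdev): raw score -> z score."""
    return (x - mean) / stdev

# Mean 10, stdev 5: a raw score of 15 is 1 stdev above the mean.
print(standardize(15, 10, 5))           # 1.0
print(round(norm_dist(15, 10, 5), 4))   # about 0.8413
print(round(norm_s_dist(1.0), 4))       # same area, about 0.8413
print(round(norm_inv(norm_dist(15, 10, 5), 10, 5), 4))  # back to about 15.0
```

The last line shows the inverse relationship: feeding the probability back in returns the score you started with.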
00:20:25.700 --> 00:20:30.500
Given that, let us look at a few examples and do them in Excel.
00:20:30.500 --> 00:20:37.500
The distribution of SAT scores for Math for incoming students to the University was approximately normal.
00:20:37.500 --> 00:20:40.700
That is again important, it has to say approximately normal.
00:20:40.700 --> 00:20:41.900
Watch out for that.
00:20:41.900 --> 00:20:47.200
I do not know if you have tricky instructors or tests but sometimes they might give you a problem that looks like a normal distribution problem,
00:20:47.200 --> 00:20:51.300
but it does not say it is normally distributed.
00:20:51.300 --> 00:21:00.300
With a mu of 550 and a standard deviation of 100, what percentage of scores were 400 or below?
00:21:00.300 --> 00:21:15.700
Note that 400, if you transform it into a z score, would not be a nice even z score like -1 or -2; it is actually -1.5.
00:21:15.700 --> 00:21:19.300
That is going to be a little bit difficult for us to use just the empirical rule.
00:21:19.300 --> 00:21:26.500
That is where we have to use something like either the table in the back of your book or Excel.
00:21:26.500 --> 00:21:40.500
Here I just have an empty Excel sheet and one thing I’m going to do is I’m going to use my columns
00:21:40.500 --> 00:21:48.500
to denote probability, raw score, and z score, just because that can help me keep track of what I’m doing.
00:21:48.500 --> 00:21:56.000
I’m also going to write down my mean and standard deviation just to help make things a little easier.
00:21:56.000 --> 00:22:12.600
550, 100, and my probability where x is less than 400; that is what I’m going for.
00:22:12.600 --> 00:22:23.900
My raw score is 400 and my z score.
00:22:23.900 --> 00:22:34.500
I can just do this in my head; it is just the distance, 150 ÷ 100.
00:22:34.500 --> 00:22:37.300
It is on the negative side so it is -1.5.
00:22:37.300 --> 00:22:46.800
But just to practice using Excel, let us use our STANDARDIZE function.
00:22:46.800 --> 00:22:49.700
Excel will guide you into what you exactly need.
00:22:49.700 --> 00:22:53.700
You need your x value for raw score.
00:22:53.700 --> 00:22:58.800
The mean and the standard deviation.
00:22:58.800 --> 00:23:04.200
I’m going to close my parentheses and I will get -1.5.
00:23:04.200 --> 00:23:20.100
The other way you could do this in Excel is to type in the formula, because you know that you need (raw score – mean) / stdev.
00:23:20.100 --> 00:23:26.400
You will get the same thing, but I just wanted to illustrate for you that this is what the STANDARDIZE function is doing.
00:23:26.400 --> 00:23:38.100
Now that I need the probability, I could find the probability in 2 ways.
00:23:38.100 --> 00:23:43.200
One, I could use the standard normal distribution, or I could use the regular normal distribution.
00:23:43.200 --> 00:23:45.400
Let us do both.
00:23:45.400 --> 00:23:53.900
First, I know that I want NORMDIST; I do not need the inverse one because I have the raw score and I need the probability.
00:23:53.900 --> 00:23:56.700
Let us start with NORMDIST.
00:23:56.700 --> 00:24:05.500
Here I know I need my x value and I know I need the mean and stdev.
00:24:05.500 --> 00:24:08.000
It is going to ask me, is it cumulative?
00:24:08.000 --> 00:24:11.500
Let me just write TRUE.
00:24:11.500 --> 00:24:27.900
That gives us .0668 and that makes sense if we think about where -1.5 was, it is in between -2 and -1.
00:24:27.900 --> 00:24:32.400
If it was at -1, it would be like 16%.
00:24:32.400 --> 00:24:36.500
If it is all the way at -2, it will be like 2%.
00:24:36.500 --> 00:24:42.700
6% sounds like it is in between 2% and 16%.
00:24:42.700 --> 00:24:53.400
We could also do this by using NORMSDIST.
00:24:53.400 --> 00:24:56.000
For that, all you need is the z score.
00:24:56.000 --> 00:25:05.500
I will put in my z score and I should get the exact same thing because the z score corresponds with the raw score.
00:25:05.500 --> 00:25:07.900
They are right above each other.
00:25:07.900 --> 00:25:18.800
Probabilities at that point should be exactly the same.
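The same SAT calculation can be reproduced in Python as a cross-check. This is a sketch assuming the standard library's `statistics.NormalDist` in place of the Excel functions; the variable names are only illustrative.

```python
from statistics import NormalDist

mean, stdev = 550, 100   # SAT Math, approximately normal
x = 400                  # raw score of interest

# z score: (raw score - mean) / stdev
z = (x - mean) / stdev

# Way 1: regular normal distribution, using the raw score.
p_from_raw = NormalDist(mean, stdev).cdf(x)

# Way 2: standard normal distribution, using the z score.
p_from_z = NormalDist().cdf(z)

print(z)                     # -1.5
print(round(p_from_raw, 4))  # about 0.0668
print(round(p_from_z, 4))    # about 0.0668, the same area
```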
00:25:18.800 --> 00:25:21.800
Now let us talk about the different types of problems.
00:25:21.800 --> 00:25:28.800
Now that you know z scores and the standard normal distribution, what kinds of problems might come your way?
00:25:28.800 --> 00:25:35.500
The first thing to know is that you are not always given all of these numbers. The first set of information
00:25:35.500 --> 00:25:46.300
is the mean, stdev, probability, raw score or x, the z score or z.
00:25:46.300 --> 00:25:47.700
That is the first set.
00:25:47.700 --> 00:25:52.600
Any one of these things can be missing and you could find it.
00:25:52.600 --> 00:25:54.400
What is missing here?
00:25:54.400 --> 00:26:01.600
Here we have that same prompt; it says, what percentage of scores were 400 or below?
00:26:01.600 --> 00:26:03.600
That is percentage of scores.
00:26:03.600 --> 00:26:15.100
Or it might ask what percent of scores fall below the z score of -1.5.
00:26:15.100 --> 00:26:19.400
Here we see that they are both about percentage of scores.
00:26:19.400 --> 00:26:22.000
What is missing is this, the probability.
00:26:22.000 --> 00:26:28.100
If you have either one of these, and these 2, you could find it no problem.
00:26:28.100 --> 00:26:35.200
Here is another set of problems, same prompt, but what is missing here?
00:26:35.200 --> 00:26:45.300
Here it says, what math score separates the lowest 10% from the rest, or what z score separates the lowest 25% from the rest?
00:26:45.300 --> 00:26:52.800
Here you have the probabilities and you have the mean and stdev, but one of these 2 things might be missing.
00:26:52.800 --> 00:26:55.800
I think of this as the score missing problems.
00:26:55.800 --> 00:27:01.200
We have probability missing problems and we have score missing problems.
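A score-missing problem runs the lookup backwards, from probability to score. Here is a minimal Python sketch of the two prompts above, assuming the same SAT distribution (mean 550, stdev 100) and the standard library's `statistics.NormalDist`, whose `inv_cdf` plays the role of NORMINV and NORMSINV.

```python
from statistics import NormalDist

sat = NormalDist(550, 100)   # same prompt: mean 550, stdev 100
std_normal = NormalDist()    # mean 0, stdev 1

# What math score separates the lowest 10% from the rest?
cutoff_lowest_10 = sat.inv_cdf(0.10)

# What z score separates the lowest 25% from the rest?
z_lowest_25 = std_normal.inv_cdf(0.25)

print(round(cutoff_lowest_10, 1))  # about 421.8
print(round(z_lowest_25, 2))       # about -0.67
```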
00:27:01.200 --> 00:27:05.600
The last set looks like this.
00:27:05.600 --> 00:27:17.500
Note that this prompt does not give you the stdev but it does give you the z score and the raw score so that you could find the stdev here.
00:27:17.500 --> 00:27:25.000
Or it gives you the percentages and the raw score and then you find the stdev.
00:27:25.000 --> 00:27:28.200
Here something like a stdev is missing.
00:27:28.200 --> 00:27:34.500
You could also find the mean to be missing as well and they would look similar to this problem.
00:27:34.500 --> 00:27:46.000
As long as you have some combination of the other values in place, you could actually figure out what is missing.
00:27:46.000 --> 00:27:50.900
Now let us add in the shape analogy that we did in the previous lesson.
00:27:50.900 --> 00:27:54.400
I’m going to skip over a lot of this because it all stays the same.
00:27:54.400 --> 00:27:57.600
The only part I want to draw your attention to is right here.
00:27:57.600 --> 00:28:04.800
Before, we could only look at our probabilities and raw scores, but now we have extended our repertoire.
00:28:04.800 --> 00:28:13.300
We could go from probabilities to raw scores to z scores and even find a missing mean or stdev.
00:28:13.300 --> 00:28:25.300
That is all done through a combination of the z score formula as well as either the tables or Excel.
00:28:25.300 --> 00:28:27.300
Let us do some problems.
00:28:27.300 --> 00:28:28.900
Here is example 1.
00:28:28.900 --> 00:28:48.000
In the US, the distributions of death rates due to heart disease (a mean of 289 deaths per 100,000, stdev of 54) and cancer (a mean of 200, stdev of 31) are roughly normal.
00:28:48.000 --> 00:28:50.500
We know that we could use our normal distribution stuff.
00:28:50.500 --> 00:29:00.400
In California, 240 deaths are from heart disease and 166 deaths are from cancer per 100,000 residents.
00:29:00.400 --> 00:29:08.400
Which rate is more extreme compared to the rest of the states, the average for the US?
00:29:08.400 --> 00:29:16.100
We have California’s death rate from heart disease and from cancer.
00:29:16.100 --> 00:29:19.500
We want to know which of these is more extreme.
00:29:19.500 --> 00:29:26.200
One way that we could find out is by finding out the z scores.
00:29:26.200 --> 00:29:36.700
I am going to get my Excel out.
00:29:36.700 --> 00:29:40.500
Let us first start with the US.
00:29:40.500 --> 00:29:51.400
The US mean for heart disease. I will use this row for heart disease and this row for cancer.
00:29:51.400 --> 00:30:08.800
The US mean is 289 and the stdev is 54.
00:30:08.800 --> 00:30:12.900
For cancer, it is 200 and 31.
00:30:12.900 --> 00:30:17.400
We want to know how extreme the California deaths are.
00:30:17.400 --> 00:30:33.100
It is hard to compare with just the numbers because even though 240 is less than 289, and 166 deaths is less than 200,
00:30:33.100 --> 00:30:38.200
we are wondering how far away from the mean are you?
00:30:38.200 --> 00:30:45.400
One way that we could do that is by using z scores because z scores will give you the distance in terms of the stdev.
00:30:45.400 --> 00:30:52.200
Because these populations have very different standard deviations, that is worth knowing.
00:30:52.200 --> 00:31:05.100
Here is California, heart disease and cancer.
00:31:05.100 --> 00:31:15.600
In California the rates are 240 and 166.
00:31:15.600 --> 00:31:19.300
What we might want to know is the z score.
00:31:19.300 --> 00:31:22.100
What is the z score?
00:31:22.100 --> 00:31:37.600
I might just put STANDARDIZE and put in my x, put in the theoretical mean that I want to compare it to, and my stdev.
00:31:37.600 --> 00:31:47.800
Obviously, I could also do my (x – mean) ÷ stdev.
00:31:47.800 --> 00:31:58.300
Here we see that the z score for heart disease is almost 1 stdev below the mean, about -.9.
00:31:58.300 --> 00:32:01.200
How far down is cancer?
00:32:01.200 --> 00:32:04.900
How much less is cancer?
00:32:04.900 --> 00:32:08.700
How much healthier are Californians in terms of cancer?
00:32:08.700 --> 00:32:14.900
Here I will put in my regular formula, just so that we practice that too.
00:32:14.900 --> 00:32:24.100
(My cancer x – theoretical population mean) ÷ stdev.
00:32:24.100 --> 00:32:27.600
I want it in terms of standard deviation steps.
00:32:27.600 --> 00:32:42.100
We find out that cancer, probably because its stdev is smaller, is actually farther out than heart disease, at about -1.1.
00:32:42.100 --> 00:32:47.700
The cancer rate is more extreme in a positive way.
00:32:47.700 --> 00:32:57.200
It is more extremely low than heart disease, although they are very close.
00:32:57.200 --> 00:33:15.400
Here the trick is to find the rates in terms of stdevs, or z scores.
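The comparison in example 1 comes down to two z scores, which can be checked in a few lines of Python. This sketch uses the California figures entered in the spreadsheet (240 and 166); the dictionary layout and names are just one illustrative way to organize it.

```python
# US mean and stdev, and the California rate, per 100,000 residents.
rates = {
    "heart disease": {"us_mean": 289, "us_stdev": 54, "california": 240},
    "cancer":        {"us_mean": 200, "us_stdev": 31, "california": 166},
}

# z score = (x - mean) / stdev for each cause of death.
z_scores = {
    cause: (r["california"] - r["us_mean"]) / r["us_stdev"]
    for cause, r in rates.items()
}

for cause, z in z_scores.items():
    print(cause, round(z, 2))   # heart disease about -0.91, cancer about -1.1

# The more extreme rate is the one farther from 0 in stdev units.
most_extreme = max(z_scores, key=lambda cause: abs(z_scores[cause]))
print(most_extreme)             # cancer
```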
00:33:15.400 --> 00:33:31.800
Example 2, heights of male college students in the US are approximately normal, estimate the percentage of these males that are at least 6ft tall.
00:33:31.800 --> 00:33:36.000
It will help if we just sketch this out briefly.
00:33:36.000 --> 00:33:47.400
Here my mean is 70.1 and my stdev is 2.7; what percentage of these males are at least 6 ft tall?
00:33:47.400 --> 00:34:02.800
What helps is if you transform this 6ft into inches, that is 72 inches.
00:34:02.800 --> 00:34:05.400
Where is 72 inches?
00:34:05.400 --> 00:34:08.000
That might be something like this.
00:34:08.000 --> 00:34:12.500
What percentage of males are at least 6 ft tall?
00:34:12.500 --> 00:34:18.100
I want all of these people because all of these people are at least 6 ft tall.
00:34:18.100 --> 00:34:21.000
They might be 7 ft tall but they are at least 6 ft tall.
00:34:21.000 --> 00:34:22.800
That is what I really want.
00:34:22.800 --> 00:34:32.200
What might be helpful is instead of my raw score, if I could find my z score and then I can look it up on the table in the book.
00:34:32.200 --> 00:34:35.200
Or I could use Excel.
00:34:35.200 --> 00:34:46.600
I’m going to use Excel to help me.
00:34:46.600 --> 00:34:54.700
Let us find the z score for this. In order to find the z score, let me write the formula here.
00:34:54.700 --> 00:35:14.800
In order to find my z score for 72 inches, that would be (72, which is my x, – the mean, 70.1) ÷ the stdev, 2.7.
00:35:14.800 --> 00:35:27.300
Because I want to know how many jumps away it is, and my jumps are 2.7.
00:35:27.300 --> 00:35:43.800
Now I will pull up my Excel and I could just put in (72 – 70.1) ÷ 2.7.
00:35:43.800 --> 00:35:50.700
I find that my z score is .70.
00:35:50.700 --> 00:35:58.800
In order to find the area, the cumulative area, remember cumulative area means this side.
00:35:58.800 --> 00:36:03.500
Excel is going to give me this side, but that is not what I want.
00:36:03.500 --> 00:36:05.600
I want this side.
00:36:05.600 --> 00:36:11.900
I might put in 1 – this area in order to get that area.
00:36:11.900 --> 00:36:29.800
1 –, and I will just put in my NORMSDIST.
00:36:29.800 --> 00:36:35.600
Thankfully Excel gives you a little hint if you are a little off.
00:36:35.600 --> 00:36:56.300
What I get is .24, about 24% of these males are at least 6 ft tall.
00:36:56.300 --> 00:37:02.700
This area is about 24%.
00:37:02.700 --> 00:37:14.700
If I want to write it all out I would write my probability where x is greater than 72 inches is .24.
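Example 2 can be double-checked in Python. This is a sketch assuming the standard library's `statistics.NormalDist` in place of NORMSDIST; the key step is subtracting the cumulative (lower) area from 1 to get the upper side.

```python
from statistics import NormalDist

mean, stdev = 70.1, 2.7   # heights in inches
x = 6 * 12                # 6 ft = 72 inches

# z score for 72 inches.
z = (x - mean) / stdev

# The cumulative area is everything BELOW z; "at least 6 ft" is the rest.
p_below = NormalDist().cdf(z)
p_at_least = 1 - p_below

print(round(z, 2))           # about 0.7
print(round(p_at_least, 2))  # about 0.24
```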
00:37:14.700 --> 00:37:29.100
This is example 3: in a standard normal distribution where P(z < .41) = .659, what is the mean and stdev?
00:37:29.100 --> 00:37:37.000
Actually this is a tricky question; before you go off trying to find the z score and all that stuff, note that it says standard normal distribution.
00:37:37.000 --> 00:37:46.600
In every standard normal distribution, the mean is 0 and the stdev is 1.
00:37:46.600 --> 00:38:02.900
Example 4, find the percentage of values in a standard normal distribution that fall between -1.46 and 1.46.
00:38:02.900 --> 00:38:08.000
It is nice if we sketch this out so that we know what to expect.
00:38:08.000 --> 00:38:15.400
Here is 0, and this is a standard normal distribution that is why I know that it is 0.
00:38:15.400 --> 00:38:27.900
Here is 1, here is where 1.46 is.
00:38:27.900 --> 00:38:34.300
A little bit less than 1½; I probably drew it a little bit too much less than half.
00:38:34.300 --> 00:38:48.600
I want to find this area right in between.
00:38:48.600 --> 00:38:59.900
Usually we look at the area between -1 and 1, and we know that is 68%.
00:38:59.900 --> 00:39:02.500
We know that it is a little bit more than 68%.
00:39:02.500 --> 00:39:11.900
It is not quite like 95%, it is not quite that high but it is somewhere in between 68 and 95%.
00:39:11.900 --> 00:39:13.700
Let us try to figure this out.
00:39:13.700 --> 00:39:22.700
One way we could do it is by using our Excel or by looking it up in our book.
00:39:22.700 --> 00:39:29.200
In our book and in Excel, our problem is that they will give you the cumulative distribution.
00:39:29.200 --> 00:39:32.100
They will give you this area.
00:39:32.100 --> 00:39:35.800
What we really want is this area.
00:39:35.800 --> 00:39:37.900
What do we do?
00:39:37.900 --> 00:39:48.600
One thing we might want to do is just use our ability to reason: this half is 50% of the curve.
00:39:48.600 --> 00:40:05.100
If we take 50% and take away this area, we could use NORMSDIST and put in -1.46, and that should give us this area right here.
00:40:05.100 --> 00:40:14.000
This whole area is this.
00:40:14.000 --> 00:40:17.800
This area is in blue.
00:40:17.800 --> 00:40:21.500
That should give us the red area.
00:40:21.500 --> 00:40:42.200
Here I’m going to put in 50% – NORMSDIST(-1.46).
00:40:42.200 --> 00:40:51.900
I want to take that area and multiply it by 2 because I want the other side too and it is perfectly symmetrical.
00:40:51.900 --> 00:40:55.600
We do not have to do any more work; just multiply by 2.
00:40:55.600 --> 00:41:05.100
Then I would get .8557, so close to 86%.
00:41:05.100 --> 00:41:06.100
That makes sense.
00:41:06.100 --> 00:41:10.000
It is more than 68% and less than 95%, right?
00:41:10.000 --> 00:41:15.200
About 86%.
00:41:15.200 --> 00:41:36.300
My answer is that the probability that my z falls in between -1.46 and 1.46 is .86.
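Example 4 can be verified two ways in Python: with the doubling argument used above, and by directly subtracting one cumulative area from another. This sketch assumes the standard library's `statistics.NormalDist` in place of NORMSDIST.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, stdev 1
z = 1.46

# Method from the lesson: (50% minus the lower tail), then doubled by symmetry.
half_area = 0.5 - std_normal.cdf(-z)
p_doubled = 2 * half_area

# Alternative: area below +1.46 minus area below -1.46.
p_direct = std_normal.cdf(z) - std_normal.cdf(-z)

print(round(p_doubled, 4))  # about 0.8557
print(round(p_direct, 4))   # about 0.8557
```

Both routes give the same area, which sits between 68% and 95% as the sketch predicted.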
00:41:36.300 --> 00:41:42.200
That is example 4 and that is it for the standard normal distribution.
00:41:42.200 --> 00:41:44.000
Thanks for watching www.educator.com.