WEBVTT mathematics/probability/murray
00:00:00.000 --> 00:00:06.100
Hello, welcome back to the probability lectures here on www.educator.com, my name is Will Murray.
00:00:06.100 --> 00:00:12.700
Today, we are going to talk about sampling from a normal distribution, which is really starting to get into statistics.
00:00:12.700 --> 00:00:17.100
Sometimes it is still considered as a topic in probability.
00:00:17.100 --> 00:00:19.700
We are going to go ahead and talk about it.
00:00:19.700 --> 00:00:21.700
We are not using the Central Limit Theorem,
00:00:21.700 --> 00:00:26.100
I have another lecture on the Central Limit Theorem that is going to come after this one.
00:00:26.100 --> 00:00:30.500
If you are looking for Central Limit Theorem, just skip ahead and look for that video.
00:00:30.500 --> 00:00:35.000
In the meantime, we are going to learn how to take samples from a normal distribution
00:00:35.000 --> 00:00:37.600
without using the Central Limit Theorem.
00:00:37.600 --> 00:00:39.700
Let us see what this is all about.
00:00:39.700 --> 00:00:42.500
First of all, I have to tell you what it means to take samples.
00:00:42.500 --> 00:00:48.500
The idea is that, we have some kind of population of stuff and that can be almost anything.
00:00:48.500 --> 00:00:54.500
The example I have here is, we have, maybe students in the university and the students are all different heights.
00:00:54.500 --> 00:01:00.000
We are going to select some students at random and measure their heights.
00:01:00.000 --> 00:01:04.400
That is taking a bunch of samples, it could be that you are testing,
00:01:04.400 --> 00:01:10.100
for example, a soda machine and you are testing whether it dispenses the right amount of soda.
00:01:10.100 --> 00:01:14.700
You fill up 10 cups of soda and you measure how much they put in each cup.
00:01:14.700 --> 00:01:19.900
That is the same kind of idea, you are taking a bunch of samples of the population.
00:01:19.900 --> 00:01:25.600
The population that we are studying has a mean μ and a variance σ².
00:01:25.600 --> 00:01:31.400
We do not always know those, in this lecture we are going to need to know the variance but we will not always know the mean.
00:01:31.400 --> 00:01:37.800
A lot of examples that we will be studying, we would not know the mean of the population.
00:01:37.800 --> 00:01:40.200
As I mentioned, we are going to take samples.
00:01:40.200 --> 00:01:46.000
If we are talking about students and different heights, then our Y1 would represent the first student that we sample.
00:01:46.000 --> 00:01:50.000
We meet a student at random and then we measure how tall that person is.
00:01:50.000 --> 00:01:55.000
We meet another student at random and measure how tall that person is, and that is Y2.
00:01:55.000 --> 00:02:03.300
We meet another student at random, and so on, until we meet our last student which is the YN.
00:02:03.300 --> 00:02:06.600
We measure the height of that last sample student.
00:02:06.600 --> 00:02:09.400
That is what it means to take samples, you have a population,
00:02:09.400 --> 00:02:14.800
you select some of them at random and you measure whatever quantity you are interested in
00:02:14.800 --> 00:02:17.900
for each one of those random selections.
00:02:17.900 --> 00:02:22.400
Let me show you some assumptions that we need to get started.
00:02:22.400 --> 00:02:29.200
The assumption that we are going to make for this lecture and for the next one, is that our samples are independent.
00:02:29.200 --> 00:02:33.900
If we happen to find a couple of tall students, it does not make it more or less likely
00:02:33.900 --> 00:02:37.200
that the student after that will be extra tall or extra short.
00:02:37.200 --> 00:02:43.300
The samples are totally independent of each other, and that lets us use some of our theorems about random variables.
00:02:43.300 --> 00:02:51.800
There is a buzz phrase that people use in probability and statistics, it is independent identically distributed.
00:02:51.800 --> 00:02:57.300
That is often shortened to IID because that phrase is used so often.
00:02:57.300 --> 00:03:02.500
IID means independent identically distributed random variables.
00:03:02.500 --> 00:03:05.100
The independent part is the assumption we just made.
00:03:05.100 --> 00:03:09.300
Identically distributed means that they are coming from the same population.
00:03:09.300 --> 00:03:17.100
Taking students from the same university or we are measuring how much a soda machine dispenses
00:03:17.100 --> 00:03:21.400
and we are taking samples from the same soda machine, that kind of thing.
00:03:21.400 --> 00:03:30.800
The assumption for this lecture only, for this lecture on normal population, is that our population has a normal distribution.
00:03:30.800 --> 00:03:33.800
That is the key difference between this lecture and the next lecture.
00:03:33.800 --> 00:03:36.400
The next lecture is on the central limit theorem.
00:03:36.400 --> 00:03:38.500
Up till now, everything else is the same.
00:03:38.500 --> 00:03:43.700
In the next lecture, we will not need to assume the population is normally distributed.
00:03:43.700 --> 00:03:49.800
In this lecture, we are assuming normal distribution because we are not using the central limit theorem.
00:03:49.800 --> 00:03:53.800
The notation that we use for that normal distribution is that,
00:03:53.800 --> 00:04:03.500
we say each of our samples has a normal distribution and it has mean μ and variance σ².
00:04:03.500 --> 00:04:08.700
That is the notation that we use for a normal distribution.
00:04:08.700 --> 00:04:16.900
The first variable is always the mean and the second variable is always the variance there.
00:04:16.900 --> 00:04:22.800
What we are going to do is take all of our samples, we will measure the heights of each student.
00:04:22.800 --> 00:04:27.700
And then, we will take the average of the heights that we measure, that is called the sample mean.
00:04:27.700 --> 00:04:31.200
We often use Y ̅ as a notation for the sample mean.
00:04:31.200 --> 00:04:35.500
It just means the average, it just means we take all the quantities that you measure
00:04:35.500 --> 00:04:39.700
and you add them up, and you divide by N.
00:04:39.700 --> 00:04:49.400
The idea is, we are going to use the sample mean and ask questions about how it relates to the actual mean of the population.
00:04:49.400 --> 00:04:58.100
It is not necessarily true that the average of the students that I study is equal to the average of the entire student population.
00:04:58.100 --> 00:05:05.600
The examples in this lecture all have to do with questions about how close those 2 means are to each other.
00:05:05.600 --> 00:05:08.900
What is the mean of the students we study, that is the sample mean.
00:05:08.900 --> 00:05:12.200
What is the mean of the entire population?
00:05:12.200 --> 00:05:19.000
That was the population mean which is not the same as the sample mean.
00:05:19.000 --> 00:05:24.200
The population mean, remember we were saying was μ.
00:05:24.200 --> 00:05:31.000
That is what we mean by the population mean, it is the average of all the students at the entire university.
00:05:31.000 --> 00:05:41.400
The sample mean is just the mean of the few students that we select to study and actually measure 1 by 1.
00:05:41.400 --> 00:05:47.000
The theorem that we are going to be using here is, the assumption we already made is that
00:05:47.000 --> 00:05:55.000
each one of the variables Yi has a normal distribution with mean μ and variance σ².
00:05:55.000 --> 00:05:59.500
Then, the sample mean also has a normal distribution.
00:05:59.500 --> 00:06:06.500
It also has mean μ but its variance is σ²/N.
00:06:06.500 --> 00:06:16.900
It actually has a smaller variance than the individual random variables, representing the individual measurements.
00:06:16.900 --> 00:06:22.800
Just to summarize that in words, Y ̅ the sample mean is a normally distributed random variable
00:06:22.800 --> 00:06:27.000
with mean μ and variance σ²/N.
00:06:27.000 --> 00:06:35.700
This is the key to the whole idea of sampling which is that, as you take more samples that means N grows.
00:06:35.700 --> 00:06:37.600
N is the number of samples that you take.
00:06:37.600 --> 00:06:43.100
If you take more samples, it shrinks the variance of your mean,
00:06:43.100 --> 00:06:49.200
which means that your sample mean is more likely to be accurate.
00:06:49.200 --> 00:06:55.100
It is less variable, that is why it is more accurate to take a survey with a large number of people
00:06:55.100 --> 00:06:59.000
than with a small number because it shrinks the variance.
00:06:59.000 --> 00:07:05.800
This is the mathematical principle that guarantees that.
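If you want to see that shrinking variance for yourself, here is a minimal simulation sketch. It is only an illustration, not part of the lecture: the function name sample_mean_variance, the population mean of 68 inches, the number of trials, and the seed are all assumptions chosen for the demo. It draws N independent normal values many times, averages each batch, and compares the variance of those averages with σ²/N.

```python
import random
import statistics

def sample_mean_variance(mu, sigma, n, trials=20000, seed=0):
    """Estimate the variance of the sample mean of n IID normal draws.

    mu and sigma are the population mean and standard deviation.
    trials and seed are illustration-only choices, not from the lecture.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        draws = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(draws) / n)  # this is Y-bar for one batch of samples
    return statistics.pvariance(means)

if __name__ == "__main__":
    sigma = 4.0  # standard deviation in inches
    for n in (1, 9, 36):
        simulated = sample_mean_variance(mu=68.0, sigma=sigma, n=n)
        predicted = sigma ** 2 / n  # the theorem's variance for Y-bar
        print(n, round(simulated, 3), round(predicted, 3))
```

The simulated and predicted columns should land close to each other, and both shrink as N grows.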
00:07:05.800 --> 00:07:10.300
What we are going to do is, we are going to use the fact that
00:07:10.300 --> 00:07:20.200
we have a normal distribution on this sample mean to ask questions about probabilities.
00:07:20.200 --> 00:07:24.900
The whole point of asking questions about probabilities is that,
00:07:24.900 --> 00:07:29.200
we can convert normal distributions to standard normal distributions.
00:07:29.200 --> 00:07:32.500
I will show you the equation for that on the next slide.
00:07:32.500 --> 00:07:39.000
Let me just mention that we can answer questions about standard normal distributions using charts.
00:07:39.000 --> 00:07:43.500
That is what I got on this slide, right here, a standard normal distribution.
00:07:43.500 --> 00:07:47.500
We often use the variable Z for a standard normal distribution.
00:07:47.500 --> 00:07:57.600
What that means is that, it is a normal distribution with mean 0 and variance 1 or standard deviation 1.
00:07:57.600 --> 00:08:01.200
That is what the standard normal distribution means.
00:08:01.200 --> 00:08:06.100
We got mean 0 and variance 1.
00:08:06.100 --> 00:08:12.000
The whole point of standard normal distributions is that, there are standard charts that you can look up probabilities on.
00:08:12.000 --> 00:08:16.200
We will be using this to solve the problems later on in the lecture.
00:08:16.200 --> 00:08:19.300
What all these little numbers represent on the chart is
00:08:19.300 --> 00:08:25.900
the probability of being above a certain cut off for a standard normal distribution.
00:08:25.900 --> 00:08:40.000
For example, if you want to say what is the probability that Z is greater than 0.93, for example.
00:08:40.000 --> 00:08:47.500
I would look at 0.9 here and I see the second decimal place is 0.03.
00:08:47.500 --> 00:08:54.400
I see that, that number on my chart there is 0.1762.
00:08:54.400 --> 00:08:59.300
My answer here would be 0.1762.
00:08:59.300 --> 00:09:05.900
That is how we look up to a probability for a standard normal distribution.
00:09:05.900 --> 00:09:15.100
It is also frequently asked, what is the probability that Z is less than something or between two bounds.
00:09:15.100 --> 00:09:20.500
What you have to do is always keep this picture in mind.
00:09:20.500 --> 00:09:24.300
If you want to find the probability that Z is less than something,
00:09:24.300 --> 00:09:33.400
what you do is you use this chart to find the probability that Z is greater than that cutoff, then do 1 - that probability.
00:09:33.400 --> 00:09:40.500
Let me write that down, the probability that Z is less than some cutoff is equal to 1 –
00:09:40.500 --> 00:09:43.800
the probability that Z is greater than some cutoff.
00:09:43.800 --> 00:09:48.000
We will see some examples of that in the exercises.
00:09:48.000 --> 00:09:52.500
You can also talk about the probability of Z being between 2 cutoffs.
00:09:52.500 --> 00:09:59.100
You can also flip these numbers around to get probabilities for negative values of Z.
00:09:59.100 --> 00:10:01.700
We will practice all of that in the examples.
00:10:01.700 --> 00:10:08.700
Remember, this is just for a standard normal distribution, when we have mean 0 and standard deviation 1.
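If you do not have a chart handy, the same tail probabilities can be computed directly. This is a small sketch using the complementary error function from Python's standard math module; the helper names upper_tail, lower_tail, and between are my own labels, not anything from the lecture.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z, the quantity the chart lists."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def lower_tail(z):
    """P(Z < z), found as 1 minus the upper tail."""
    return 1.0 - upper_tail(z)

def between(a, b):
    """P(a < Z < b) for a <= b, as a difference of two upper tails."""
    return upper_tail(a) - upper_tail(b)

if __name__ == "__main__":
    print(round(upper_tail(0.93), 4))  # 0.1762, matching the chart lookup
    print(round(lower_tail(0.93), 4))
    print(round(upper_tail(-0.93), 4))  # negative cutoffs work by symmetry
```

Because the normal curve is symmetric, upper_tail(-z) equals 1 - upper_tail(z), which is the "flip the numbers around" trick for negative values of Z.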
00:10:08.700 --> 00:10:14.200
Let me show you now what you do for a nonstandard normal distribution.
00:10:14.200 --> 00:10:22.900
If you have a nonstandard normal distribution, what you do is you take any normal distribution.
00:10:22.900 --> 00:10:30.100
To make it standard normal, you subtract off its mean and you divide by its standard deviation.
00:10:30.100 --> 00:10:38.200
And then, what you get, which we are going to call Z, is a standard normal distribution.
00:10:38.200 --> 00:10:42.200
The mean is 0 and the standard deviation is 1.
00:10:42.200 --> 00:10:44.500
Let me remind you of that theorem we had.
00:10:44.500 --> 00:10:49.500
We had that theorem that said that Y ̅ is a normal distribution.
00:10:49.500 --> 00:10:57.000
Its mean is μ and its variance is σ²/N.
00:10:57.000 --> 00:11:03.100
That means its mean is μ and the standard deviation, remember is always the square root of the variance.
00:11:03.100 --> 00:11:16.300
Standard deviation is the square root of σ²/N, which is σ/√N.
00:11:16.300 --> 00:11:23.600
What we do to convert to Y ̅ to a standard normal is, we do Y ̅ - its mean.
00:11:23.600 --> 00:11:32.500
Y ̅ - μ divided by its standard deviation, we do σ divided by √N.
00:11:32.500 --> 00:11:38.500
Since that is a fraction in the denominator, I’m going to do a little flip.
00:11:38.500 --> 00:11:47.700
I will get √N × (Y ̅ - μ) divided by σ.
00:11:47.700 --> 00:11:53.300
That, according to my theorem is a standard normal distribution.
00:11:53.300 --> 00:11:58.500
If you call that Z then that is a standard normal distribution.
00:11:58.500 --> 00:12:04.800
It means I can use those charts, the chart that I showed you on the previous slide, to look up probabilities for Z.
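Written out in symbols, the conversion just described looks like this, using the same μ, σ, and N as in the theorem:

```latex
\bar{Y} \sim N\!\left(\mu,\ \frac{\sigma^{2}}{N}\right)
\quad\Longrightarrow\quad
Z \;=\; \frac{\bar{Y}-\mu}{\sigma/\sqrt{N}}
  \;=\; \frac{\sqrt{N}\,\bigl(\bar{Y}-\mu\bigr)}{\sigma}
  \;\sim\; N(0,\,1)
```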
00:12:04.800 --> 00:12:07.400
That is the way you play this game.
00:12:07.400 --> 00:12:13.500
Often, the way the examples pan out is you are asked something about Y ̅.
00:12:13.500 --> 00:12:17.100
Often, you are asked about the relationship between Y ̅ and μ.
00:12:17.100 --> 00:12:23.400
How likely is it that Y ̅ and μ are within 1 unit of each other, something like this.
00:12:23.400 --> 00:12:27.000
You are trying to solve something about Y ̅ – μ.
00:12:27.000 --> 00:12:39.900
The trick here is to take Y ̅ – μ, and you are asked how likely it is that Y ̅ and μ are within ½ unit of each other, something like that.
00:12:39.900 --> 00:12:43.900
The trick to solving these things is to convert it to a standard normal.
00:12:43.900 --> 00:12:53.100
The way you convert to standard normal is you multiply by √N/σ.
00:12:53.100 --> 00:13:00.600
That is a standard normal variable and you can look up probabilities for that on the charts.
00:13:00.600 --> 00:13:07.000
We will get a lot of practice of that in the exercises, but I want you to see the general idea first,
00:13:07.000 --> 00:13:16.400
which is that you start with Y ̅ - μ and then you multiply by √N/σ to convert it into a standard normal variable.
00:13:16.400 --> 00:13:19.900
I will go ahead and practice some exercises with that.
00:13:19.900 --> 00:13:25.300
The first example, we are going to measure the heights of students at a particular university.
00:13:25.300 --> 00:13:29.500
We are given that they are normally distributed, that means we can use the theorems
00:13:29.500 --> 00:13:32.600
that we have learned from this lecture.
00:13:32.600 --> 00:13:39.200
There is a standard deviation of 4 inches, that is going to be the σ that we are going to use later on.
00:13:39.200 --> 00:13:41.700
We are going to measure the heights of 9 students.
00:13:41.700 --> 00:13:46.400
We are going to sample 9 students, we are just going to go out in the crowd at the university
00:13:46.400 --> 00:13:49.300
and grab the first 9 students that we see.
00:13:49.300 --> 00:13:51.600
We are going to measure how tall they are.
00:13:51.600 --> 00:13:54.900
The question is, what is the probability that their mean,
00:13:54.900 --> 00:14:01.300
that means the mean of the students that we study, is within 2 inches of the global mean.
00:14:01.300 --> 00:14:05.700
That is the mean of all the students at the university.
00:14:05.700 --> 00:14:09.000
I will just clarify the difference between those 2 means here.
00:14:09.000 --> 00:14:15.600
When I say the global mean, that is all students at the university,
00:14:15.600 --> 00:14:24.100
all the students at enormous state university that you attend, or maybe you do not.
00:14:24.100 --> 00:14:31.200
Their mean, when I say their mean that means the sample mean of the students that we include in our sample.
00:14:31.200 --> 00:14:34.700
That is those 9 students that we are going to study.
00:14:34.700 --> 00:14:44.300
The global mean, the notation we have been using for that is μ, the sample mean is Y ̅.
00:14:44.300 --> 00:14:50.900
The question is, what is the probability that those 2 means are within 2 inches of each other?
00:14:50.900 --> 00:14:56.300
Within 2 inches of each other, that means + or -2 inches in either direction.
00:14:56.300 --> 00:15:06.200
We are trying to solve the question of Y ̅ – μ, the sample mean - the global mean.
00:15:06.200 --> 00:15:10.000
We are going to put absolute values to get myself within 2 inches.
00:15:10.000 --> 00:15:14.700
The absolute value of A - B means the distance from A to B.
00:15:14.700 --> 00:15:23.200
What is the probability that the distance from the sample mean to the global mean is less than or equal to 2?
00:15:23.200 --> 00:15:27.400
That is what we are trying to solve, we are trying to find the probability of that.
00:15:27.400 --> 00:15:33.000
But remember, the trick here is to convert to a standard normal distribution.
00:15:33.000 --> 00:15:42.300
The way you convert to a standard normal distribution is, you multiply both sides Y ̅ – μ.
00:15:42.300 --> 00:15:53.600
We multiply both sides by √N/σ.
00:15:53.600 --> 00:16:00.600
I will do the same thing on the right, 2 √N/σ.
00:16:00.600 --> 00:16:12.600
Let me fill in now what I know, I know that √N, N = 9, that is the number of students that we surveyed here.
00:16:12.600 --> 00:16:22.500
√N would then be 3, that is 2 × 3.
00:16:22.500 --> 00:16:32.400
The σ is the standard deviation, that is 4 inches.
00:16:32.400 --> 00:16:41.100
2 × 3/4, that is a very easy thing to simplify, that is just 3/2, that is 1.5.
00:16:41.100 --> 00:16:45.500
The point of that is that, that was a standard normal variable.
00:16:45.500 --> 00:16:51.900
We are asking about the absolute value of Z being less than or equal to 1.5.
00:16:51.900 --> 00:16:59.400
We are trying to find the probability that a standard normal variable will be less than or equal to 1.5 in absolute value.
00:16:59.400 --> 00:17:03.700
Let me draw a little graph here and show you what kind of area we are looking for.
00:17:03.700 --> 00:17:07.400
We will actually look it up on the next slide.
00:17:07.400 --> 00:17:14.400
What we are looking for here, I will draw my standard normal variable centered at 0 because it has mean 0.
00:17:14.400 --> 00:17:16.500
That is supposed to be symmetric.
00:17:16.500 --> 00:17:28.200
Here are -1.5 and 1.5, and I'm looking for the probability in between those two bounds right there.
00:17:28.200 --> 00:17:32.000
That is not exactly what the chart will tell me directly.
00:17:32.000 --> 00:17:39.800
What the chart will tell me is, if I have a particular Z value in mind, remember what the chart tells me.
00:17:39.800 --> 00:17:44.700
It will tell me the probability that Z is bigger than that value.
00:17:44.700 --> 00:17:53.400
I have to figure out from that, what my probability is that Z is between -1.5 and 1.5.
00:17:53.400 --> 00:17:59.800
The probability that the absolute value of Z is less than or equal 1.5.
00:17:59.800 --> 00:18:08.900
I see I can get it by taking the total probability, which is 1, and subtracting off two copies of that outside probability, because there is a bottom tail and a top tail there.
00:18:08.900 --> 00:18:16.500
It is 1 -2 × the probability that Z is greater than 1.5.
00:18:16.500 --> 00:18:19.700
That is something that I will be able to look up on my chart, on the next slide.
00:18:19.700 --> 00:18:23.900
I just wanted to make sure that you understand where it is going to come from.
00:18:23.900 --> 00:18:29.100
I will look that up and that will give me my answer to this example.
00:18:29.100 --> 00:18:33.300
Let me recap the steps here, before I turn the page.
00:18:33.300 --> 00:18:40.400
I'm setting up a sample mean Y ̅ and a global mean μ.
00:18:40.400 --> 00:18:43.700
I want them to be within 2 inches of each other.
00:18:43.700 --> 00:18:48.400
That means, I want their difference to be less than 2 in absolute value.
00:18:48.400 --> 00:18:55.200
And then, I'm kind of building up my standard normal variable by multiplying that by √N/σ.
00:18:55.200 --> 00:19:10.300
Remember, the whole point was that √N/σ × (Y ̅ - μ) is a standard normal variable.
00:19:10.300 --> 00:19:15.000
I’m building up that expression here, I multiply by √N/σ.
00:19:15.000 --> 00:19:23.000
My N was 9, √N was 3, my σ was 4 because that was the standard deviation given to me.
00:19:23.000 --> 00:19:28.200
I can simplify that down to 1.5, I want the probability that a standard normal variable
00:19:28.200 --> 00:19:35.700
will be less than 1.5 in absolute value, which really means between -1.5 and 1.5.
00:19:35.700 --> 00:19:43.000
Which means to calculate that area, I am going to have to look at the tail and subtract off two copies of the tail,
00:19:43.000 --> 00:19:46.100
because there is a top tail and bottom tail there.
00:19:46.100 --> 00:19:53.100
That is why I'm going to multiply this probability by 2, the probability that I will figure out on the next slide.
00:19:53.100 --> 00:19:59.200
Let me go ahead and figure out that probability using the chart.
00:19:59.200 --> 00:20:08.200
What we figured out is our probability that the absolute value of Z is less than or equal to 1.5.
00:20:08.200 --> 00:20:12.400
This is coming from the previous slide, you can scroll back and watch it if you like.
00:20:12.400 --> 00:20:19.900
That is 1 -2 × the probability that Z is greater than or equal to 1.5.
00:20:19.900 --> 00:20:27.700
I need to find 1.5 on my chart, there it is right there 1.5, it is 1.50.
00:20:27.700 --> 00:20:34.000
I’m going to take this number right here, 0.0668.
00:20:34.000 --> 00:20:57.300
1 - 2 × 0.0668, and 0.0668 × 2 is 0.1336.
00:20:57.300 --> 00:21:10.900
It is 1 -0.1336 and 1 -0.1336 is 0.8664.
00:21:10.900 --> 00:21:17.500
If you wanted to estimate that as a percentage then that is about 87%.
00:21:17.500 --> 00:21:20.800
That is the answer to my problem.
00:21:20.800 --> 00:21:25.300
If I survey these 9 students and measure their heights,
00:21:25.300 --> 00:21:34.400
there is an 87% chance that my sample mean will be within 2 inches of the global mean of the population.
00:21:34.400 --> 00:21:41.700
That is how likely I am to get an accurate estimate, when I survey 9 students.
00:21:41.700 --> 00:21:50.400
If I want to make it more accurate, I will survey more students because that would increase the value of N in my calculations.
00:21:50.400 --> 00:21:59.900
Just to recap what we did on this slide, we figured out on the previous slide that we want Z to be less than 1.5 in absolute value.
00:21:59.900 --> 00:22:07.600
We figured out that, to get Z less than 1.5, what we can do is look at these two tails
00:22:07.600 --> 00:22:15.400
and cut off the two tails that represent the probability of Z being bigger than 1.5.
00:22:15.400 --> 00:22:20.500
That is why I have 1 -2 × the probability of Z being bigger than 1.5.
00:22:20.500 --> 00:22:29.300
Then I found 1.50 on my chart, 1.50 and there it is 0.0668.
00:22:29.300 --> 00:22:38.400
I plugged in that number, did a little calculation, and got my answer of 87%.
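The whole calculation for this example fits in a few lines of code. This is a sketch under the same assumptions as the example (normal population, known σ); the function names upper_tail and prob_within are mine, and the tail probability is computed with math.erfc instead of being read off the chart.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def prob_within(distance, sigma, n):
    """P(|Y-bar - mu| <= distance) when Y-bar ~ N(mu, sigma^2 / n)."""
    z = distance * math.sqrt(n) / sigma  # multiply the bound by sqrt(N)/sigma
    return 1.0 - 2.0 * upper_tail(z)     # cut off the top and bottom tails

if __name__ == "__main__":
    # Example 1: within 2 inches, sigma = 4 inches, N = 9 students.
    print(round(prob_within(2, 4, 9), 4))  # 0.8664, about 87%
```

Notice that μ never appears as an input: the answer depends only on the distance, σ, and N, which is why we can solve the problem without knowing the population mean.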
00:22:38.400 --> 00:22:42.700
In our second example here, we are given that Y1 through YN
00:22:42.700 --> 00:22:50.700
are independent identically distributed random variables, that is what that phrase IID means.
00:22:50.700 --> 00:22:53.500
It means independent identically distributed.
00:22:53.500 --> 00:22:58.000
It comes up so often in probability that people just use that abbreviation for it.
00:22:58.000 --> 00:23:06.200
Each Y, it has a normal distribution with mean μ and variance σ².
00:23:06.200 --> 00:23:10.200
We are told that σ² is 64 and N is 36.
00:23:10.200 --> 00:23:13.300
N is the number of samples that we are going to be taking.
00:23:13.300 --> 00:23:21.300
We want to find the probability that the sample mean Y ̅ is within 1 unit of the global mean μ.
00:23:21.300 --> 00:23:28.100
We are also asked, what happens to this probability as N goes to infinity?
00:23:28.100 --> 00:23:32.100
Let us think about that, remember Y ̅ is the sample mean.
00:23:32.100 --> 00:23:41.300
It is the average of the samples and we want that to be close to μ which means we want Y ̅ - μ to be small.
00:23:41.300 --> 00:23:48.600
We want it to be less than 1.
00:23:48.600 --> 00:23:56.600
What I would like to do is build up a standard normal variables so I can use my theorem here.
00:23:56.600 --> 00:24:03.900
I'm going to multiply just like before, by √N/σ.
00:24:03.900 --> 00:24:10.700
I will put √N/σ on the right hand side, I have to do the same thing to the left and right hand side.
00:24:10.700 --> 00:24:16.100
The point of that was that, that will give me a standard normal variable.
00:24:16.100 --> 00:24:20.600
I’m putting absolute values on the Z because I have the absolute values on Y – μ there.
00:24:20.600 --> 00:24:29.300
In this case, we are given that N is 36, √N = 6.
00:24:29.300 --> 00:24:42.700
Let me say that Z is less than or equal to 1 × √N is 6, σ² is 64 and that tells me that σ would be 8.
00:24:42.700 --> 00:24:47.400
We are given the variance, instead of the standard deviation that time.
00:24:47.400 --> 00:24:51.500
It is a quick maneuver to go from the variance to the standard deviation.
00:24:51.500 --> 00:24:53.000
And the standard deviation is always the square root of the variance.
00:24:53.000 --> 00:24:58.500
In this case, we got 6/8 and that is ¾.
00:24:58.500 --> 00:25:10.500
I need to find the probability that a standard normal variable is going to be less than or equal to ¾.
00:25:10.500 --> 00:25:18.000
Let me draw a little picture here.
00:25:18.000 --> 00:25:23.400
There is ¾, we want it to be between ¾ and - ¾.
00:25:23.400 --> 00:25:26.200
We are looking for that area right there.
00:25:26.200 --> 00:25:33.500
Just like in example 1, the way we can calculate that area is by finding the area outside there, the tail area.
00:25:33.500 --> 00:25:38.900
And then, subtracting off two copies of the tail.
00:25:38.900 --> 00:26:01.300
The probability that the absolute value of Z is less than or equal to ¾ is equal to 1 - 2 × the tail area, 2 × the probability that Z is bigger than ¾.
00:26:01.300 --> 00:26:06.700
I got a standard normal distribution setup on the next slide.
00:26:06.700 --> 00:26:11.400
But I think what I'm going to do is just tell you what the answer is right now,
00:26:11.400 --> 00:26:16.100
I can go ahead and use that space on this slide to show you the rest of the calculations.
00:26:16.100 --> 00:26:24.300
I will justify this number on the next slide, but what we are going to find is we will look up 0.75 on the next slide.
00:26:24.300 --> 00:26:35.600
We will figure out that the probability, the tail area of being bigger than that is 0.2266.
00:26:35.600 --> 00:26:38.800
That is what I looked up on the next slide.
00:26:38.800 --> 00:26:53.400
This is 1 -2 × 0.2266, and in turn that is 1 -, 2 × that is 0.4532.
00:26:53.400 --> 00:27:03.000
1 - 0.4532 is 0.5468 or approximately 55% there.
00:27:03.000 --> 00:27:15.200
That is the probability that you are going to be within 1 unit of the global mean μ, when you take these samples.
00:27:15.200 --> 00:27:21.300
The second part of this question here says, what happens to this probability as N goes to infinity.
00:27:21.300 --> 00:27:27.200
Let us think about it, as N goes to infinity that means your N is getting bigger and bigger.
00:27:27.200 --> 00:27:35.200
Which, if we trace through these calculations, means that √N would get very big.
00:27:35.200 --> 00:27:40.200
That in turn, would make that number very big, get bigger and bigger.
00:27:40.200 --> 00:27:54.500
That number is big, which means what you are doing is you are kind of moving these goal posts farther and farther out.
00:27:54.500 --> 00:28:00.400
This ¾ would get replaced by a bigger number.
00:28:00.400 --> 00:28:05.800
You would be looking at a wider range of your normal distribution.
00:28:05.800 --> 00:28:10.300
That probability would get bigger and bigger.
00:28:10.300 --> 00:28:16.400
Another way to think about it is that, the probability of being in the tail would get smaller and smaller.
00:28:16.400 --> 00:28:22.500
The probability of being in the tail would get smaller.
00:28:22.500 --> 00:28:29.300
The overall probability would get bigger and bigger.
00:28:29.300 --> 00:28:33.800
In fact, as N goes to infinity that probability would go to 1
00:28:33.800 --> 00:28:40.200
because you encompass more and more of the area, showing that you have more and more probability.
00:28:40.200 --> 00:28:44.600
The probability goes to 1, as N goes to infinity.
00:28:44.600 --> 00:28:52.900
That should make sense to you, it is kind of the precise version of saying that as you take more samples,
00:28:52.900 --> 00:28:57.800
your probability of being accurate is higher and higher and in fact, it approaches 1.
00:28:57.800 --> 00:29:05.600
If you take infinitely many samples then you are guaranteed to get an accurate average.
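To watch that probability climb toward 1, here is a short sketch that repeats the Example 2 calculation for larger and larger N. The helper name prob_within_one_unit and the particular list of N values are assumptions for illustration; only σ² = 64 and N = 36 come from the example itself.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def prob_within_one_unit(n, variance=64.0):
    """P(|Y-bar - mu| <= 1) for n IID normal samples with the given variance."""
    sigma = math.sqrt(variance)        # sigma = 8 when the variance is 64
    z = 1.0 * math.sqrt(n) / sigma     # the cutoff grows like sqrt(N)
    return 1.0 - 2.0 * upper_tail(z)

if __name__ == "__main__":
    # N = 36 is the example; the larger values are hypothetical.
    for n in (36, 144, 576, 2304):
        print(n, round(prob_within_one_unit(n), 4))
```

For N = 36 this gives roughly 0.547, in line with the 55% found from the chart, and each larger N pushes the cutoff further out and the probability closer to 1.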
00:29:05.600 --> 00:29:10.700
Let me recap the steps here, before I jump onto the next slide and show you where that one number comes from.
00:29:10.700 --> 00:29:15.500
I still owe you that 0.2266, that part is still mysterious.
00:29:15.500 --> 00:29:20.400
I started out with the absolute value of Y ̅ - μ being less than or equal to 1.
00:29:20.400 --> 00:29:29.400
That came from this phrase here, Y is within 1 unit or Y ̅ is within 1 unit of μ.
00:29:29.400 --> 00:29:32.400
I want to build up a standard normal variable.
00:29:32.400 --> 00:29:35.000
I multiply by the √N/σ.
00:29:35.000 --> 00:29:38.100
I multiply both sides there by √N/σ.
00:29:38.100 --> 00:29:45.800
The N was 36 which means the √N is 6, that is √N right there.
00:29:45.800 --> 00:29:56.200
I was told σ² was 64, that tells me σ is 8, that is where that 8 comes from.
00:29:56.200 --> 00:30:04.900
6/8 collapses to 3/4 which means I'm looking at the region between - ¾ and ¾.
00:30:04.900 --> 00:30:11.200
The clever way to calculate that using the chart is, to find the tail region bigger than ¾
00:30:11.200 --> 00:30:14.500
because that is what our chart will do for us.
00:30:14.500 --> 00:30:22.100
The probability of being in that tail region, this is the part I’m going to look up on the chart on the next slide, is 0.2266.
00:30:22.100 --> 00:30:25.800
That is the only part that should not make sense yet, until you see the next slide.
00:30:25.800 --> 00:30:31.500
And then, when I plugged in 0.2266 and I simplified the calculations,
00:30:31.500 --> 00:30:38.100
it just reduced down to 0.5468, or just about 55% is my probability.
00:30:38.100 --> 00:30:41.800
I went through and traced the role of N in those calculations.
00:30:41.800 --> 00:30:49.100
I noticed that, if you put in a bigger and bigger N, we get a bigger cutoff for the bounds of the region here.
00:30:49.100 --> 00:30:55.500
When you extend the bounds outwards, it means your tails are getting smaller.
00:30:55.500 --> 00:31:05.100
The tails are what we subtracted, so the total probability gets bigger and bigger, and goes to 1 as N goes to infinity.
00:31:05.100 --> 00:31:12.700
That should conform with your intuition because it means that, as you take more and more samples,
00:31:12.700 --> 00:31:17.800
you are more and more likely to get accurate estimations of the global mean.
00:31:17.800 --> 00:31:20.900
That is sort of reassuring that it worked out that way.
00:31:20.900 --> 00:31:27.500
Let me show you where this number comes from, this 0.2266, that is the one part that I have not shown you yet.
00:31:27.500 --> 00:31:38.800
That comes from this chart right here of the normal table, we are trying to find the probability that Z was greater than ¾.
00:31:38.800 --> 00:31:44.400
Of course, ¾ in decimal is 0.75.
00:31:44.400 --> 00:31:55.200
I just have to find 0.75 on this chart, here is 0.7 over here and here is the 0.05.
00:31:55.200 --> 00:32:10.300
I find their intersection, there it is 0.2266, that is the answer that I plugged into my calculations on the previous slide.
00:32:10.300 --> 00:32:14.500
That was the only missing piece of the puzzle on the previous slide.
00:32:14.500 --> 00:32:26.300
Everything else is supposed to make sense and you are supposed to understand the answer to example 2 now.
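If you do not have the chart handy, the same lookup can be sketched in Python. This assumes the chart lists upper-tail areas of the standard normal distribution, as described above; the helper name `upper_tail` is hypothetical.

```python
from statistics import NormalDist


def upper_tail(z):
    """Area to the right of z under the standard normal curve."""
    return 1 - NormalDist().cdf(z)


row, column = 0.7, 0.05            # how the chart splits z = 0.75
tail = upper_tail(row + column)    # the chart entry at that row and column
middle = 1 - 2 * tail              # area between -0.75 and 0.75

print(round(tail, 4), round(middle, 4))
```

The rounded tail value should agree with the chart entry, and the middle area is the probability for example 2.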
00:32:26.300 --> 00:32:33.800
In example 3, we have students at a university and we are going to keep track of the number of units that they have taken.
00:32:33.800 --> 00:32:40.000
It turns out that they have taken an average of 70 units, but their standard deviation is 20 units.
00:32:40.000 --> 00:32:46.300
Most students are probably somewhere between 50 units and 90 units, averaging around 70 units.
00:32:46.300 --> 00:32:49.800
We are going to assume that this is a normal distribution.
00:32:49.800 --> 00:32:52.000
We are going to sample 9 students.
00:32:52.000 --> 00:32:56.700
We are just going to go out and meet 9 students in the quad.
00:32:56.700 --> 00:33:01.200
We are going to say, how many units have you taken?
00:33:01.200 --> 00:33:09.000
We will do that 9 times and then we will calculate the average, just of those 9 students that we have met.
00:33:09.000 --> 00:33:11.800
We do not have the time to meet every student in the university,
00:33:11.800 --> 00:33:17.200
we will just sample 9 of them and calculate the average unit load of those 9 students.
00:33:17.200 --> 00:33:25.500
The question is, are we likely to get an average between 67 and 73 units?
00:33:25.500 --> 00:33:28.300
Let me show you how you think about that.
00:33:28.300 --> 00:33:35.100
67 is 3 units down from the mean, that is 70 -3.
00:33:35.100 --> 00:33:38.700
73 is 3 units up from the mean.
00:33:38.700 --> 00:33:46.900
What we are really asking is, what is our chance that we will be within 3 units of the global mean.
00:33:46.900 --> 00:33:54.000
The global mean is 70 units, the mean of the students that we are surveying is Ȳ.
00:33:54.000 --> 00:34:03.500
We want the absolute value of Ȳ - μ here to be less than or equal to 3 units.
00:34:03.500 --> 00:34:13.800
That is to get Ȳ between 67 and 73, that is what we want to study there.
00:34:13.800 --> 00:34:19.700
Remember, the whole point is that we want to convert this into a standard normal variable.
00:34:19.700 --> 00:34:28.600
Our standard normal variable is always √N/σ × (Ȳ – μ).
00:34:28.600 --> 00:34:35.000
I'm going to multiply on some factors of √N/σ.
00:34:35.000 --> 00:34:46.500
√N/σ × |Ȳ - μ| is less than or equal to 3 × √N/σ.
00:34:46.500 --> 00:34:51.900
The whole point of that was that gives me a standard normal variable.
00:34:51.900 --> 00:34:58.500
I want to find now the probability that a standard normal variable will be within this range.
00:34:58.500 --> 00:35:03.500
√N, what is my N, that is the number of students that I'm sampling, that is 9.
00:35:03.500 --> 00:35:13.300
√N is going to be 3, that is coming from the √9 there, that is not the 3 that I found up above.
00:35:13.300 --> 00:35:23.200
The σ is the standard deviation of the population which is given to me to be 20 units.
00:35:23.200 --> 00:35:33.800
That is going to be 20 and my Z should be less than or equal to 3 × 3/20.
00:35:33.800 --> 00:35:41.200
9/20, I can convert that into a decimal, I think it is going to be useful. 1/20 is 0.05,
00:35:41.200 --> 00:35:47.100
9 of those is 0.45.
00:35:47.100 --> 00:35:55.300
We really want to find the probability that Z in absolute value is less than 0.45.
00:35:55.300 --> 00:35:58.100
Let me draw a picture of what I'm trying to calculate here.
00:35:58.100 --> 00:36:01.900
It is always very useful to draw pictures of these normal distributions.
00:36:01.900 --> 00:36:04.900
It helps you keep track of what you are looking up on the table.
00:36:04.900 --> 00:36:12.200
In this case, I want Z to be between -0.45 and 0.45.
00:36:12.200 --> 00:36:22.100
Those were not very symmetric there, that should be symmetric because 0.45 is the same distance on either side, there is 0.45.
00:36:22.100 --> 00:36:26.100
I'm looking for that area in between there.
00:36:26.100 --> 00:36:36.500
The probability that the absolute value of Z is less than 0.45.
00:36:36.500 --> 00:36:45.900
Another way to find that would be the probability that -0.45 is less than Z, is less than 0.45.
00:36:45.900 --> 00:36:48.300
That is not something that the table will tell me directly.
00:36:48.300 --> 00:36:55.100
Remember, the table will tell me how big, how much area I have in the tail of the distribution.
00:36:55.100 --> 00:37:00.500
What I will do is, I will find the area in the tail from the table.
00:37:00.500 --> 00:37:03.600
It looks like that I have to subtract 2 tails there.
00:37:03.600 --> 00:37:13.700
1 - 2 × the probability of Z being bigger than 0.45, that is what I'm going to look up on the next slide
00:37:13.700 --> 00:37:17.800
and actually convert that into an answer.
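Written out in one chain, the steps on this slide look like the following, where Z stands for the standard normal variable √N/σ × (Ȳ – μ) with N = 9, σ = 20, and μ = 70:

```latex
\begin{align*}
P(67 \le \bar{Y} \le 73)
  &= P\left(|\bar{Y} - \mu| \le 3\right) \\
  &= P\left(\frac{\sqrt{N}}{\sigma}\,|\bar{Y} - \mu| \le \frac{3\sqrt{9}}{20}\right) \\
  &= P\left(|Z| \le 0.45\right) \\
  &= 1 - 2\,P(Z > 0.45).
\end{align*}
```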
00:37:17.800 --> 00:37:21.000
Let me just go over the steps again quickly for this slide.
00:37:21.000 --> 00:37:26.400
I was given that I'm looking for unit total between 67 and 73.
00:37:26.400 --> 00:37:34.700
If I want Ȳ to be between 67 and 73, that is + or - 3 from the mean of 70.
00:37:34.700 --> 00:37:39.600
It is the same as saying the absolute value of Ȳ - μ is less than 3.
00:37:39.600 --> 00:37:42.900
I wanted to convert that into a standard normal variable.
00:37:42.900 --> 00:37:50.900
I multiplied both sides by √N/σ, in order to build up my standard normal formula.
00:37:50.900 --> 00:37:57.000
My Z is now less than or equal to 3 × √N/σ.
00:37:57.000 --> 00:37:58.800
I was given that N = 9, where did that come from?
00:37:58.800 --> 00:38:02.800
That is number of students that you sample.
00:38:02.800 --> 00:38:12.500
My standard deviation is 20, that is my σ right there and the 9 was the N.
00:38:12.500 --> 00:38:17.600
I plug those values in and I get that the absolute value of Z is less than 0.45.
00:38:17.600 --> 00:38:23.000
And then, I did a quick little picture to see what kind of area I’m measuring.
00:38:23.000 --> 00:38:30.100
I see that the way to measure that area is really to measure the tails, and then subtract the 2 tails from 1.
00:38:30.100 --> 00:38:36.900
That is what I'm going to carry over onto my next slide and solve it out using a normal chart.
00:38:36.900 --> 00:38:40.900
This is the rest of example 3, where we are going to use the normal chart.
00:38:40.900 --> 00:38:46.300
We figured out that, we are looking for the probability that Z is less than 0.45.
00:38:46.300 --> 00:39:00.200
The absolute value of Z is less than 0.45 which is 1 - 2 × the probability that Z is greater than 0.45.
00:39:00.200 --> 00:39:02.400
Remember, that is what we are finding with this normal chart.
00:39:02.400 --> 00:39:07.600
It will tell you the amount of area in the tail there, the probability in the tail.
00:39:07.600 --> 00:39:10.200
I need to find 0.45 on this chart.
00:39:10.200 --> 00:39:19.100
Here is 0.4, here is 0.05, and I see 0.3264 at the intersection of that row and column.
00:39:19.100 --> 00:39:48.300
1 - 2 × 0.3264, and 2 × 0.3264 is 0.6528, since 64 × 2 is 128.
00:39:48.300 --> 00:40:04.100
That is 0.3472 or approximately 35%, that is the probability that the 9 students
00:40:04.100 --> 00:40:12.300
that I survey will have their average unit load somewhere between 67 and 73.
00:40:12.300 --> 00:40:25.700
What I really calculated there, in other words the steps in the middle, was the probability that Ȳ is between 67 and 73.
00:40:25.700 --> 00:40:27.300
Most of this was done on the previous page.
00:40:27.300 --> 00:40:29.500
Most of the dirty work was done on the previous page.
00:40:29.500 --> 00:40:37.900
All that I did on this page was use the chart to find the probability that the absolute value of Z was less than 0.45.
00:40:37.900 --> 00:40:42.600
In order to figure that out, I subtracted off 2 tails here.
00:40:42.600 --> 00:40:47.400
I looked up the value of the area in that tail, and then I just did
00:40:47.400 --> 00:40:55.400
a little simplification with the numbers and reduced it down to 35%.
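As a cross-check on example 3, here is a small Python sketch of the whole calculation. The variable names are my own, and `statistics.NormalDist` is used in place of the printed chart, so the last digit may differ slightly from the table-based answer.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 70, 20, 9      # population mean, population std dev, sample size
low, high = 67, 73            # target range for the sample mean

d = (high - low) / 2          # distance from the mean on either side: 3 units
z = d * sqrt(n) / sigma       # 3 * 3 / 20 = 0.45
tail = 1 - NormalDist().cdf(z)  # area in one tail beyond z
prob = 1 - 2 * tail             # area between -z and z

print(f"z = {z:.2f}, tail = {tail:.4f}, probability = {prob:.4f}")
```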
00:40:55.400 --> 00:41:02.700
In example 4, we are going to take 6 samples from a normally distributed population with variance 0.67.
00:41:02.700 --> 00:41:05.300
We want to find the probability that the sample mean,
00:41:05.300 --> 00:41:11.400
the average of our samples will be within 0.5 units of the population mean.
00:41:11.400 --> 00:41:14.000
Let us calculate that out.
00:41:14.000 --> 00:41:20.800
The sample mean is Ȳ, that is the average of the samples that you have taken.
00:41:20.800 --> 00:41:23.900
The population mean is always μ.
00:41:23.900 --> 00:41:30.500
Even though, you do not know exactly what the value of μ is, we always call the population mean μ.
00:41:30.500 --> 00:41:35.200
The probability that they will be within 0.5 units of each other.
00:41:35.200 --> 00:41:43.400
We want Ȳ - μ to be less than 0.5 in absolute value.
00:41:43.400 --> 00:41:47.900
Let me build up my standard normal variable, as usual.
00:41:47.900 --> 00:41:57.100
I will multiply this by √N/σ, which gives √N/σ × |Ȳ – μ|.
00:41:57.100 --> 00:42:02.300
That should be less than or equal to 0.5 × √N/σ.
00:42:02.300 --> 00:42:05.800
I actually rigged the numbers for this one, it is supposed to work out fairly well.
00:42:05.800 --> 00:42:07.900
Let me show you how it works out.
00:42:07.900 --> 00:42:12.400
My N was 6, there is N = 6 there.
00:42:12.400 --> 00:42:19.900
I rigged this 0.67 to be equal or very close to 2/3.
00:42:19.900 --> 00:42:23.200
I hope it actually turns out to work.
00:42:23.200 --> 00:42:28.900
That was the variance, that was σ².
00:42:28.900 --> 00:42:41.600
What we have here is 0.5 × √6 divided by σ, and σ by itself will be √(2/3).
00:42:41.600 --> 00:42:47.000
The whole point of this was that, this was Z that is supposed to be a standard normal variable.
00:42:47.000 --> 00:43:01.600
We want Z in absolute value to be less than or equal to 0.5 × √6 divided by √(2/3), which I can simplify into 0.5 × √(6 ÷ 2/3).
00:43:01.600 --> 00:43:10.800
If I do a little flip on the denominator, I will get 0.5 × √(6 × 3/2).
00:43:10.800 --> 00:43:19.500
That is 0.5 × √9 and that is 3/2 or 1.5.
00:43:19.500 --> 00:43:28.400
What I'm really looking for is the probability that the absolute value of Z will be less than 1.5.
00:43:28.400 --> 00:43:35.000
Since I know I have a chart that will tell me the tail, the area in the tail of the distribution.
00:43:35.000 --> 00:43:37.900
That is what the chart will tell me, the area in the tail.
00:43:37.900 --> 00:43:48.400
What I want is the probability that Z is less than 1.5, that means the absolute value of Z is less than 1.5.
00:43:48.400 --> 00:43:53.800
Z is between -1.5 and 1.5.
00:43:53.800 --> 00:43:59.000
The way I can figure that out is I can subtract off 2 tails.
00:43:59.000 --> 00:44:05.800
1 -2 × the probability that Z is bigger than 1.5.
00:44:05.800 --> 00:44:11.200
That is really all I need to do for now, I’m going to look at a standard normal chart on the next slide.
00:44:11.200 --> 00:44:13.800
And, I will go ahead and finish that calculation.
00:44:13.800 --> 00:44:15.700
Let me recap the steps here.
00:44:15.700 --> 00:44:16.900
We want Ȳ – μ.
00:44:16.900 --> 00:44:21.900
Ȳ is the sample mean and μ is the population mean.
00:44:21.900 --> 00:44:23.700
Those mean different things.
00:44:23.700 --> 00:44:26.600
The population mean means the entire population.
00:44:26.600 --> 00:44:31.500
Sample mean means just the samples that we are looking at.
00:44:31.500 --> 00:44:36.000
We want them to be within 0.5 units of each other.
00:44:36.000 --> 00:44:42.100
I said it is less than 0.5 and I cannot do much with that until I convert it to a standard normal variable.
00:44:42.100 --> 00:44:47.800
That is what I'm doing here, multiplying both sides by √N/σ.
00:44:47.800 --> 00:44:50.300
I know what √N is, it is √6.
00:44:50.300 --> 00:44:58.800
I know what σ is, it is √(2/3), it came from there.
00:44:58.800 --> 00:45:05.300
I plug those in, it worked out fairly nicely and I got 3/2 or 1.5.
00:45:05.300 --> 00:45:10.300
We are going to find the probability of being between -1.5 and 1.5.
00:45:10.300 --> 00:45:19.800
The way that I’m going to do that is by finding the probability of being in the tail and then subtracting off 2 of those tails.
00:45:19.800 --> 00:45:22.100
I will work that out on the next slide.
00:45:22.100 --> 00:45:30.800
Let me just mention right now, before I bury this slide, that we are going to use the same setup in example 5.
00:45:30.800 --> 00:45:36.700
The only difference is we are going to change the number of samples, in order to get a better probability.
00:45:36.700 --> 00:45:41.400
Make sure you understand this, before we move on to example 5,
00:45:41.400 --> 00:45:48.200
because example 5 will not make sense unless you understand example 4 here.
00:45:48.200 --> 00:45:53.600
Let me just flip over to our chart of the normal distribution and we will finish this problem.
00:45:53.600 --> 00:46:08.900
On the previous page, I had solved it down to the probability is 1 -2 × the probability that Z is bigger than 1.5.
00:46:08.900 --> 00:46:16.000
I solved it out into a matter of finding the probability of the tail of the distribution.
00:46:16.000 --> 00:46:28.500
I need to find 1.5, there it is 1.50, the probability is 0.0668.
00:46:28.500 --> 00:46:38.800
This is 1 -2 × 0.0668 and just a little computation is all we have to do here.
00:46:38.800 --> 00:46:52.700
That is 1 – 0.1336, since 0.0668 × 2 is 0.1336.
00:46:52.700 --> 00:47:05.900
If I simplify that, 1 -0.1336 is 0.8664 and that is approximately 87%.
00:47:05.900 --> 00:47:27.800
If you take 6 samples and you want to find the probability that the absolute value of Ȳ - μ is less than 0.5, that is what we just calculated here.
00:47:27.800 --> 00:47:35.300
We found the probability that our sample mean will be within 0.5 units of the global mean, it worked out to 87%.
00:47:35.300 --> 00:47:38.100
Most of the work there was on the previous page.
00:47:38.100 --> 00:47:42.800
I just kind of brought it down to looking at one number in the chart, and then just plug it in.
00:47:42.800 --> 00:47:48.900
That one number was the probability that Z is bigger than 1.5.
00:47:48.900 --> 00:47:57.900
We found that probability from the chart here, dropped it into the computation, and reduced it down to 0.8664 or 87%.
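Here is a hedged Python sketch of example 4. The helper name `cutoff_and_prob` is made up for illustration. It runs the calculation twice, once with the "rigged" variance 2/3 and once with the stated 0.67, to show that the rounding barely moves the answer.

```python
from math import sqrt
from statistics import NormalDist


def cutoff_and_prob(d, variance, n):
    """Return the z cutoff and P(|Ybar - mu| <= d) for a normal population."""
    z = d * sqrt(n / variance)                  # same as d * sqrt(n) / sigma
    prob = 1 - 2 * (1 - NormalDist().cdf(z))    # 1 minus the two tails
    return z, prob


for variance in (2 / 3, 0.67):
    z, prob = cutoff_and_prob(0.5, variance, 6)
    print(f"variance = {variance:.4f}: z = {z:.4f}, probability = {prob:.4f}")
```

With variance exactly 2/3 the cutoff is 1.5, as on the slide; with 0.67 it comes out a hair below 1.5.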
00:47:57.900 --> 00:48:01.900
We are going to reuse this scenario in example 5.
00:48:01.900 --> 00:48:08.200
I really want to make sure that example 4 makes sense to you before you go on to example 5.
00:48:08.200 --> 00:48:19.300
In particular, what we are going to do in example 5 is try to raise that probability to 95%.
00:48:19.300 --> 00:48:26.800
Which means we are going to have to change the number of samples that we take.
00:48:26.800 --> 00:48:31.800
In order to get a higher probability and a more accurate answer, we will need to take more samples.
00:48:31.800 --> 00:48:35.800
Let us go ahead and take a look at example 5, and see how that works out.
00:48:35.800 --> 00:48:40.900
In example 5, we are going to reuse this scenario from example 4.
00:48:40.900 --> 00:48:45.200
If you have not just watched example 4, you really want to go back and watch example 4.
00:48:45.200 --> 00:48:52.000
Make sure that it makes sense to you before you start working through example 5.
00:48:52.000 --> 00:48:54.800
It is the same scenario as in example 4.
00:48:54.800 --> 00:48:59.800
We have a normally distributed population with variance 0.67.
00:48:59.800 --> 00:49:03.100
I want to ensure that our sample mean, our sample mean
00:49:03.100 --> 00:49:11.800
means the average of the samples that we take, will be within 0.5 units of the population mean.
00:49:11.800 --> 00:49:18.100
The population mean is the average of the entire population, that is what we have been calling μ.
00:49:18.100 --> 00:49:21.900
We want the probability to come out to be 95%.
00:49:21.900 --> 00:49:25.400
Since we are dictating the probability, we cannot also dictate the number of samples.
00:49:25.400 --> 00:49:29.200
We are asking, how many samples should we take?
00:49:29.200 --> 00:49:35.100
Let me work this out and show you how to think about this.
00:49:35.100 --> 00:49:39.300
First of all, we want our probability to be 95%.
00:49:39.300 --> 00:49:44.000
Let me think about that, in terms of the picture.
00:49:44.000 --> 00:49:45.500
I will draw a picture there.
00:49:45.500 --> 00:49:52.700
We want some cutoffs where we get 95% of the area in between those cutoffs.
00:49:52.700 --> 00:49:59.600
We want those cutoffs to surround 95% of the area.
00:49:59.600 --> 00:50:11.300
If I work backwards, that means that the 2 tail areas collectively give me 5% of the area.
00:50:11.300 --> 00:50:23.400
Each of those 2 tail areas is (1 - 0.95)/2, which is 0.05/2, which is 0.025.
00:50:23.400 --> 00:50:31.000
I'm going to want the probability in the 2 tail areas to be 0.025 each.
00:50:31.000 --> 00:50:36.800
I want to figure out what cutoff value of Z would correspond to that.
00:50:36.800 --> 00:50:41.100
I want to save myself space on this slide, I’m not going to show you the chart right away.
00:50:41.100 --> 00:50:42.700
We will see that on the next slide.
00:50:42.700 --> 00:50:46.900
You will see that that corresponds to Z = 1.96.
00:50:46.900 --> 00:50:52.100
We will look that up on the next slide and you will see that that is the Z value we are looking for.
00:50:52.100 --> 00:51:00.100
Let us put that on hold for now and let me go back and set up our standard normal variable.
00:51:00.100 --> 00:51:07.400
We want Ȳ and μ to be within 0.5 units of each other.
00:51:07.400 --> 00:51:17.400
I’m going to set up my standard normal variable just like before, where I multiply both sides by √N/σ.
00:51:17.400 --> 00:51:21.600
I get √N/σ here.
00:51:21.600 --> 00:51:26.500
I will fill in what I can, the problem is I do not know N right now.
00:51:26.500 --> 00:51:31.800
That is going to be a little tricky, my Z is going to be the standard normal variable on the left.
00:51:31.800 --> 00:51:34.100
But, I do not know what N is.
00:51:34.100 --> 00:51:44.000
I do know what I want my Z value to be, or at least I know that I want my Z value to be between -1.96 and 1.96.
00:51:44.000 --> 00:51:49.200
I’m going to put 1.96 in for my Z value, my absolute value of Z.
00:51:49.200 --> 00:51:54.600
And then, I'm going to solve for the other quantities in this picture.
00:51:54.600 --> 00:51:59.100
0.5 √N, I do not know that, that is what I'm going to have to solve for.
00:51:59.100 --> 00:52:01.800
σ, I think I do know.
00:52:01.800 --> 00:52:16.600
I'm given that the variance is 0.67, which tells me that σ² is 0.67, so σ is √0.67.
00:52:16.600 --> 00:52:22.900
What I’m going to do is solve this equation for N.
00:52:22.900 --> 00:52:27.900
It is going to work out pretty well, it is a calculator exercise really.
00:52:27.900 --> 00:52:42.700
If I multiply over to the other side, I get 1.96 × √0.67 divided by 0.5 is less than or equal to √N.
00:52:42.700 --> 00:52:48.300
You know what, dividing by 0.5 is the same as multiplying by 2.
00:52:48.300 --> 00:52:55.900
Let me go ahead and multiply 1.96 by 2, that will give me 3.92.
00:52:55.900 --> 00:53:02.200
I still have √0.67, and that is supposed to be less than √N.
00:53:02.200 --> 00:53:04.100
Let me square both sides now.
00:53:04.100 --> 00:53:06.500
I think I’m going to flip the N over to the other side.
00:53:06.500 --> 00:53:12.400
N is bigger than or equal to 3.92² × (√0.67)².
00:53:12.400 --> 00:53:18.700
If I square the square root of 0.67, I will just get 0.67 again.
00:53:18.700 --> 00:53:22.500
Now, that is just a matter of dropping the numbers into a calculator.
00:53:22.500 --> 00:53:35.900
I did that, when I drop that into a calculator I get 10.296.
00:53:35.900 --> 00:53:42.700
I just solved for N, and remember that N is the number of samples we are going to take.
00:53:42.700 --> 00:53:51.900
You cannot take a fraction of a sample, you take a whole number of samples.
00:53:51.900 --> 00:54:00.000
This 10.296 does not make sense as a number of samples, so I'm going to round it up to be on the safe side.
00:54:00.000 --> 00:54:10.400
I will take N = 11 samples and that should be enough to get my probability where I want it to be.
00:54:10.400 --> 00:54:15.100
That is my answer right there, N = 11 samples.
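The algebra in example 5 can be sketched in Python as follows. The function names `samples_needed` and `prob_within` are hypothetical, and the cutoff 1.96 is taken as given from the chart. The second function is only there to check that 11 samples clear 95% while 10 samples fall short.

```python
from math import ceil, sqrt
from statistics import NormalDist


def samples_needed(d, variance, z):
    """Smallest whole n with z <= d * sqrt(n) / sigma, i.e. n >= (z / d)**2 * variance."""
    return ceil((z / d) ** 2 * variance)


def prob_within(d, variance, n):
    """P(|Ybar - mu| <= d) for n samples from a normal population."""
    z = d * sqrt(n / variance)
    return 1 - 2 * (1 - NormalDist().cdf(z))


n = samples_needed(d=0.5, variance=0.67, z=1.96)
print(n, round(prob_within(0.5, 0.67, n), 4), round(prob_within(0.5, 0.67, n - 1), 4))
```

Rounding up with `ceil` mirrors the "round up to be on the safe side" step: rounding down would leave the probability just under the target.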
00:54:15.100 --> 00:54:23.800
That is the end of the problem, except to recap it and to show you on a normal table where that 1.96 came from.
00:54:23.800 --> 00:54:25.800
Let me recap the steps there.
00:54:25.800 --> 00:54:32.400
First, I was thinking that I wanted to have 95% in between whatever boundaries I found,
00:54:32.400 --> 00:54:37.600
which means on the outside of boundaries I’m going to have 5%, 0.05.
00:54:37.600 --> 00:54:43.000
Since there are two tails, I divided that by 2 and I got 0.025.
00:54:43.000 --> 00:54:49.300
I’m looking for a cutoff that cuts off 0.025 of the area.
00:54:49.300 --> 00:54:59.900
I will show you on the next slide that, when we look at the normal table, Z = 1.96 will give us that cut off.
00:54:59.900 --> 00:55:03.000
That is the only part that I need to fill in on the next slide.
00:55:03.000 --> 00:55:07.300
Meanwhile, over here I was setting up my standard normal variable.
00:55:07.300 --> 00:55:13.900
I wanted Ȳ and μ to be within 0.5 units of each other.
00:55:13.900 --> 00:55:18.800
That is why I set the absolute value of their difference to be less than 0.5.
00:55:18.800 --> 00:55:27.900
And then, to set up my standard normal variable, I multiplied both sides by √N/σ
00:55:27.900 --> 00:55:28.900
and that gives me my absolute value of Z.
00:55:28.900 --> 00:55:33.600
Now, I plugged in that Z = 1.96 here.
00:55:33.600 --> 00:55:38.300
I do not have a value for √N because I do not know what N is.
00:55:38.300 --> 00:55:40.200
That is what I'm asking, how many samples should I take?
00:55:40.200 --> 00:55:48.200
I have to leave that, but I can plug in σ, which I figured out here to be √0.67.
00:55:48.200 --> 00:55:52.200
Now, it is an algebra problem, I have manipulated the algebra a little bit.
00:55:52.200 --> 00:55:56.900
Dividing 1.96 by ½ is the same as multiplying it by 2.
00:55:56.900 --> 00:56:00.200
That is where that 3.92 came from.
00:56:00.200 --> 00:56:07.900
I’m solving for N, I square both sides and I get N bigger than 3.92² × 0.67.
00:56:07.900 --> 00:56:14.000
I just threw those numbers in my calculator, I would not want to do something like that by hand.
00:56:14.000 --> 00:56:19.000
What I got was 10.296, but since we are talking about numbers of samples,
00:56:19.000 --> 00:56:21.400
we have to take a whole number of samples.
00:56:21.400 --> 00:56:27.000
I rounded that up to be safe and took N = 11 samples.
00:56:27.000 --> 00:56:33.400
That pretty much wraps up example 5, except I have to show you where that 1.96 came from.
00:56:33.400 --> 00:56:40.900
It really came from looking for 0.025 in the chart on the next slide.
00:56:40.900 --> 00:56:44.400
What we are doing here is, I just want to justify to you,
00:56:44.400 --> 00:56:55.500
we wanted the probability that Z is bigger than some little cutoff value of Z to be 0.025.
00:56:55.500 --> 00:56:57.900
That is what we figured out on the previous slide.
00:56:57.900 --> 00:57:00.900
I’m looking for 0.025 in the chart here.
00:57:00.900 --> 00:57:08.700
It looks like these numbers are getting smaller and smaller.
00:57:08.700 --> 00:57:10.100
I’m going to keep looking through these numbers.
00:57:10.100 --> 00:57:20.700
Here, I’m getting close: 0.0287, 0.0281, 0.0274, 0.0268, 0.0262, 0.0256, 0.0250.
00:57:20.700 --> 00:57:24.200
I found it, there is my answer right there.
00:57:24.200 --> 00:57:27.700
I’m going to read off what row and column those came from.
00:57:27.700 --> 00:57:40.400
It came from 1.9 and 0.06, that means that my Z value is 1.96.
00:57:40.400 --> 00:57:45.500
That is where that number came from.
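Reading the chart backwards like this has a direct counterpart in Python's `statistics.NormalDist`, whose `inv_cdf` method returns the value with a given area to its left. This is a sketch of that reverse lookup; the variable names are my own.

```python
from statistics import NormalDist

tail = (1 - 0.95) / 2                       # 0.025 in each tail
z_cutoff = NormalDist().inv_cdf(1 - tail)   # value with 97.5% of the area to its left

print(round(z_cutoff, 2))
```

Rounded to two decimals, this should agree with the row and column read off the chart.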
00:57:45.500 --> 00:57:54.400
I will say, we used that number on the previous slide,
00:57:54.400 --> 00:58:03.400
and we did some calculations with that to derive that we want N to be 11, which was the answer that we got.
00:58:03.400 --> 00:58:06.400
N = 11 samples.
00:58:06.400 --> 00:58:12.300
You can go back and watch the previous slide, if you do not remember where that came from.
00:58:12.300 --> 00:58:15.400
I will not go over that again now, you can just watch it again if you like.
00:58:15.400 --> 00:58:20.300
What we did on this slide was look for the cutoff that gave us the right tail probability.
00:58:20.300 --> 00:58:28.600
Remember, this tail probability is what we are looking for, that was supposed to be 0.025.
00:58:28.600 --> 00:58:34.400
The real reason for that was, that would make the other tail probability 0.025.
00:58:34.400 --> 00:58:43.100
When you take those 2 probabilities away from 1, you get a probability of 0.95 in the middle, which is what we are looking for.
00:58:43.100 --> 00:58:45.700
That is where we got the 0.025 from.
00:58:45.700 --> 00:58:48.800
But, we need to figure out which cutoff gave us that probability.
00:58:48.800 --> 00:58:55.400
I found 0.025 in the table, read off its numbers 1.9 and 0.06.
00:58:55.400 --> 00:59:05.000
I got Z = 1.96, and then I did some more calculations with that on the previous slide, to get down to N =11 samples.
00:59:05.000 --> 00:59:10.000
That wraps up this lecture on sampling from a normal distribution.
00:59:10.000 --> 00:59:15.600
The next lecture is going to look very similar to this, but we are going to be using the central limit theorem.
00:59:15.600 --> 00:59:20.100
All the examples will have a very similar flavor, we are sort of converting to
00:59:20.100 --> 00:59:23.700
a standard normal variable then looking things up in the charts.
00:59:23.700 --> 00:59:27.000
But, the difference is we are going to be using the central limit theorem
00:59:27.000 --> 00:59:31.200
which means we will not have to start with a normal population anymore.
00:59:31.200 --> 00:59:35.200
When we use the central limit theorem, you can start with any population in the world,
00:59:35.200 --> 00:59:39.700
and then answer the same kinds of questions about whether your sample mean
00:59:39.700 --> 00:59:43.300
is going to be close to your population mean.
00:59:43.300 --> 00:59:46.100
I hope you will stick around and learn the central limit theorem.
00:59:46.100 --> 00:59:51.300
It is probably one of the most important results in probability, that is in the next lecture.
00:59:51.300 --> 00:59:54.600
That is also going to be our last lecture in the series, we are getting near the end.
00:59:54.600 --> 00:59:59.300
I really appreciate your sticking around with me to enjoy these probability lectures.
00:59:59.300 --> 01:00:03.700
This is the probability lecture series here on www.educator.com.
01:00:03.700 --> 01:00:07.000
I am your host, my name is Will Murray, thank you for joining me today, bye.