WEBVTT mathematics/probability/murray
00:00:00.000 --> 00:00:05.800
Hi, welcome back to the probability lectures here on www.educator.com, my name is Will Murray.
00:00:05.800 --> 00:00:09.900
This is our very last probability lecture, and I want to say a special thank you
00:00:09.900 --> 00:00:13.700
to those of you who stuck with me through all the videos.
00:00:13.700 --> 00:00:19.500
Today, we are going to talk about the central limit theorem which is one of the crown jewels of probability.
00:00:19.500 --> 00:00:26.500
I'm very excited to talk about the central limit theorem and show you how it plays a role in sampling.
00:00:26.500 --> 00:00:31.000
We will be doing a lot of problems, solving questions about sampling.
00:00:31.000 --> 00:00:33.500
I need to give you the background here.
00:00:33.500 --> 00:00:36.200
It starts out just like the previous video.
00:00:36.200 --> 00:00:44.500
If you watched the previous video, sampling from a normal distribution, then the first slide is going to be exactly the same.
00:00:44.500 --> 00:00:51.000
You can safely skip that and then, I will show you what the difference is when we are using the central limit theorem.
00:00:51.000 --> 00:01:00.700
Let us jump into that, the setting here, like I said, this is exactly the same as in the previous video, at least for the first slide.
00:01:00.700 --> 00:01:03.200
The idea is that we have a population of stuff.
00:01:03.200 --> 00:01:10.800
For example, we could have a whole bunch of students at a university and each student is a different height.
00:01:10.800 --> 00:01:14.400
We have some distribution of heights at the university.
00:01:14.400 --> 00:01:20.300
There is some population mean μ, which is the average over all the students at the university.
00:01:20.300 --> 00:01:24.800
We might or might not know that, that μ might be known or it might not be known.
00:01:24.800 --> 00:01:27.700
There is some variance σ².
00:01:27.700 --> 00:01:31.900
In the problems that we are going to solve today, we will need to know what the variance is.
00:01:31.900 --> 00:01:34.600
That should be given to you in the problems.
00:01:34.600 --> 00:01:41.800
And then, we are going to take some samples which means we are going to go out in the quad of the university.
00:01:41.800 --> 00:01:47.000
We will stop some students randomly and survey them on how tall they are.
00:01:47.000 --> 00:01:53.300
Or if we do not like measuring how tall people are, you can ask them how many units they are carrying
00:01:53.300 --> 00:01:59.300
or how much student debt they have, or what their bank balance is, or their GPA.
00:01:59.300 --> 00:02:04.300
It does not really matter for the purposes of probability, what quantity we are keeping track of,
00:02:04.300 --> 00:02:09.900
the important thing is that we are taking random samples.
00:02:09.900 --> 00:02:17.100
The way we are going to keep track of them is, each student that we talked to counts as 1 random variable.
00:02:17.100 --> 00:02:22.300
For example, if we are talking about the heights of the students then Y1 is the height of the first student,
00:02:22.300 --> 00:02:27.700
Y2 is the height of the second student, and so on, until the nth student.
00:02:27.700 --> 00:02:33.800
If we want to survey N students then YN is the height of the last student.
00:02:33.800 --> 00:02:41.000
Each one of those counts as a random variable and we will calculate the average of those samples.
00:02:41.000 --> 00:02:46.600
We will talk about probability questions related to whether the average of our sample
00:02:46.600 --> 00:02:51.100
is really close to the average of the entire population.
00:02:51.100 --> 00:02:54.500
That is all the same as in the previous lecture.
00:02:54.500 --> 00:02:59.200
What else is the same as the previous lecture is that our samples are independent.
00:02:59.200 --> 00:03:04.200
There is this catch phrase that you hear in probability a lot and in statistics,
00:03:04.200 --> 00:03:08.500
independent identically distributed random variables.
00:03:08.500 --> 00:03:13.900
They are independent meaning that, if we meet 1 student and that student is very tall,
00:03:13.900 --> 00:03:20.400
it does not really tell us that the next student is going to be tall or short because they are independent.
00:03:20.400 --> 00:03:26.500
Identically distributed means they are all coming from the same population.
00:03:26.500 --> 00:03:31.900
That is a buzz phrase to say independent identically distributed random variables.
00:03:31.900 --> 00:03:36.200
Here is where this lecture is different from the previous lecture.
00:03:36.200 --> 00:03:41.600
In the previous lecture, we had to assume that our population was normally distributed,
00:03:41.600 --> 00:03:45.200
which is not really valid when you are talking about heights of students.
00:03:45.200 --> 00:03:51.700
For one thing, a student can never have a negative height, so it is not really normally distributed.
00:03:51.700 --> 00:03:56.200
In this lecture, using the central limit theorem, which I have not gotten to yet,
00:03:56.200 --> 00:04:00.700
we do not have to assume that the population is normally distributed.
00:04:00.700 --> 00:04:07.200
The beautiful thing about the central limit theorem is that the population could have any distribution at all.
00:04:07.200 --> 00:04:13.800
The central limit theorem is very broad and it applies to any distribution at all.
00:04:13.800 --> 00:04:20.700
In particular, when you are doing sampling, you do not have to know the general parameters of the population at all.
00:04:20.700 --> 00:04:25.200
You do not need to know that your population is a normal distribution.
00:04:25.200 --> 00:04:29.300
The key feature of the central limit theorem is that
00:04:29.300 --> 00:04:34.900
you do not need to know ahead of time what kind of distribution you are working with.
00:04:34.900 --> 00:04:38.700
Let me actually tell you, what the central limit theorem is.
00:04:38.700 --> 00:04:46.800
It says that, if you have independent samples from any population with mean μ and variance σ²,
00:04:46.800 --> 00:04:52.000
these are IID (independent, identically distributed) samples.
00:04:52.000 --> 00:05:00.600
The conclusion of the central limit theorem is about Y ̅, the sample mean.
00:05:00.600 --> 00:05:08.900
That means you take the samples that you collect and you take their average,
00:05:08.900 --> 00:05:12.700
just of those samples, and that is a new random variable.
00:05:12.700 --> 00:05:16.300
That is a function of the random variables you had before.
00:05:16.300 --> 00:05:24.400
It says that, the distribution of that random variable approaches as N goes to infinity,
00:05:24.400 --> 00:05:35.200
it gets closer and closer to a normal distribution with mean μ and variance σ²/N.
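Written out in symbols, the statement above looks roughly like this. This is a sketch of the classical form of the theorem; the convergence-in-distribution arrow and the requirement that σ² be finite are standard assumptions that the lecture does not spell out.

```latex
\[
\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i,
\qquad
\frac{\sqrt{N}\,(\bar{Y}-\mu)}{\sigma}
\;\xrightarrow{\;d\;}\; \mathcal{N}(0,1)
\quad \text{as } N \to \infty,
\]
\[
\text{so for large } N:\qquad
\bar{Y} \;\approx\; \mathcal{N}\!\left(\mu,\ \frac{\sigma^{2}}{N}\right).
\]
```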
00:05:35.200 --> 00:05:40.600
This is really one of the most extraordinary facts in all of mathematics,
00:05:40.600 --> 00:05:47.400
which is that we did not assume that the original population was normally distributed.
00:05:47.400 --> 00:05:55.900
But, even without that assumption, the sample mean approaches a normal distribution.
00:05:55.900 --> 00:06:05.400
This is sort of why the bell curve, the normal distribution is considered the most important distribution in all probability and statistics.
00:06:05.400 --> 00:06:11.300
It is because even if you do not start with the normal distribution, you could start with any distribution at all,
00:06:11.300 --> 00:06:18.700
you always end up with a normal distribution, as you take samples and you look at the sample mean.
00:06:18.700 --> 00:06:25.700
That is really quite extraordinary but the math is very powerful and it does work out that way.
00:06:25.700 --> 00:06:38.500
Let me mention the rule of thumb that people use in practice
00:06:38.500 --> 00:06:48.600
when applying the central limit theorem. Remember, the theorem itself applies as N goes to infinity.
00:06:48.600 --> 00:06:56.300
It really starts to become useful when N is bigger than about 30.
00:06:56.300 --> 00:06:58.300
N is the number of samples that you take.
00:06:58.300 --> 00:07:09.800
If you take more than 30 samples, you can usually treat your sample mean as approximately normal.
00:07:09.800 --> 00:07:25.800
We can invoke the central limit theorem and say that the sample mean will have a normal distribution,
00:07:25.800 --> 00:07:31.200
a normal distribution with mean μ and variance σ²/N.
00:07:31.200 --> 00:07:36.100
That is kind of how the central limit theorem is used in practice.
00:07:36.100 --> 00:07:45.800
As long as you take at least 30 samples, you can say that your sample mean is going to have approximately a normal distribution.
00:07:45.800 --> 00:07:51.400
It does not even matter, what the distribution of your original population was.
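If you want to see this numerically, here is a small simulation sketch. It is not from the lecture: the exponential population (mean 1, variance 1), the sample size of 36, the 5000 repetitions, and the function name `sample_means` are all choices made just for illustration. The population is clearly not normal, yet the sample means should cluster around μ with variance close to σ²/N.

```python
import random
import statistics

def sample_means(n, trials, rng):
    """Return `trials` sample means, each computed from n exponential(1) draws."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 36                  # sample size, above the rule-of-thumb cutoff of 30
means = sample_means(n, trials=5000, rng=rng)

mu, sigma2 = 1.0, 1.0   # mean and variance of the exponential(1) population
print("average of the sample means: ", statistics.fmean(means), "theory:", mu)
print("variance of the sample means:", statistics.pvariance(means), "theory:", sigma2 / n)
```

Swapping in a different population (uniform, coin flips, and so on) only requires changing the draw inside `sample_means` and the two theoretical values.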
00:07:51.400 --> 00:07:56.600
What do we actually do with that, once we know that the sample mean has a normal distribution,
00:07:56.600 --> 00:07:58.400
what are we supposed to do with that.
00:07:58.400 --> 00:08:02.900
From then on, it is pretty much the same as in the previous lecture.
00:08:02.900 --> 00:08:08.100
We can walk you through that, in case you did not just watch the previous video.
00:08:08.100 --> 00:08:14.300
What you do with a normal distribution is you convert it to a standard normal distribution.
00:08:14.300 --> 00:08:26.700
A standard normal distribution, I will remind you is a normal distribution with mean 0 and variance 1.
00:08:26.700 --> 00:08:33.200
That is what it means to be a standard normal distribution: mean 0 and variance 1.
00:08:33.200 --> 00:08:41.300
The point of the standard normal distribution is that, you can look up probabilities for a standard normal distribution using charts.
00:08:41.300 --> 00:08:45.600
Of course, there are also lots of online applets that you can use,
00:08:45.600 --> 00:08:51.400
and a lot of computer programs will tell you probabilities for a standard normal distribution.
00:08:51.400 --> 00:08:56.000
For the videos in this lecture, I'm going to use charts.
00:08:56.000 --> 00:09:01.600
If you are lucky enough to have access to some kind of online tool or computer program,
00:09:01.600 --> 00:09:10.400
that will tell you probabilities for a standard normal distribution then by all means, have at it and use that.
00:09:10.400 --> 00:09:13.100
This is kind of an archaic method that I'm showing you here.
00:09:13.100 --> 00:09:18.000
But still, in a lot of classroom settings people still use charts, which is why I’m showing it to you.
00:09:18.000 --> 00:09:23.600
That is how you can look up probabilities for a standard normal distribution.
00:09:23.600 --> 00:09:29.900
This picture is one that is really useful to keep in mind.
00:09:29.900 --> 00:09:36.700
This chart, the way it works is, it tells you the probability of being above a certain cutoff.
00:09:36.700 --> 00:09:43.200
If you want to find the probability of being less than that cutoff, then you have to do something like subtracting from 1.
00:09:43.200 --> 00:09:49.500
If you want to be between 2 cutoffs, then you have to figure out the probability of being in the tails
00:09:49.500 --> 00:09:51.600
and then subtract those from 1.
00:09:51.600 --> 00:09:56.300
Those are the kinds of computations that you have to do to use these charts.
00:09:56.300 --> 00:10:02.100
That is all based on a standard normal distribution.
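For anyone using a computer instead of a chart, the same three computations can be sketched in Python. The function names `upper_tail`, `lower_tail`, and `between` are made up for this illustration; the only library call is `math.erfc`, the complementary error function, which gives the upper-tail probability of a standard normal variable as 0.5 × erfc(z/√2).

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z: the quantity the chart lists."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def lower_tail(z):
    """P(Z < z): subtract the upper tail from 1."""
    return 1.0 - upper_tail(z)

def between(a, b):
    """P(a < Z < b): subtract both tails from 1."""
    return 1.0 - lower_tail(a) - upper_tail(b)

print("P(Z > 1.0)      =", upper_tail(1.0))
print("P(Z < 1.0)      =", lower_tail(1.0))
print("P(-1 < Z < 1)   =", between(-1.0, 1.0))
```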
00:10:02.100 --> 00:10:05.900
In practice, you usually do not get a standard normal distribution.
00:10:05.900 --> 00:10:09.500
You usually get some other normal distribution, with its own mean and variance.
00:10:09.500 --> 00:10:15.200
Let me show you how you convert it to a standard normal distribution.
00:10:15.200 --> 00:10:21.300
I say recall because I did a whole lecture on this, earlier on in these probability lectures.
00:10:21.300 --> 00:10:24.400
If this is totally new to you, what you might want to do is go back
00:10:24.400 --> 00:10:30.700
and work through the lecture on the normal distribution, that we studied earlier on in this lecture series.
00:10:30.700 --> 00:10:35.200
But if you already worked through that lecture, maybe you just need a quick refresher, here you go.
00:10:35.200 --> 00:10:41.900
What we learned back then was that, if Y is any normal distribution then
00:10:41.900 --> 00:10:45.100
what you can do is convert it to a standard normal distribution,
00:10:45.100 --> 00:10:50.900
by subtracting off the mean and dividing by its standard deviation.
00:10:50.900 --> 00:10:59.700
We call that new variable Z, and what we learned is that Z is a standard normal distribution.
00:10:59.700 --> 00:11:02.000
The point of getting a standard normal distribution is then,
00:11:02.000 --> 00:11:10.300
you can look up probabilities in terms of Z and convert them back to find probabilities in terms of Y.
00:11:10.300 --> 00:11:19.500
What we learned in our central limit theorem was that, Y ̅ is essentially,
00:11:19.500 --> 00:11:27.600
it approaches a normal distribution with mean μ and variance σ²/N.
00:11:27.600 --> 00:11:37.300
What that means is that, if we do Y ̅ and we subtract its mean, that is Y ̅ - μ and divide by,
00:11:37.300 --> 00:11:42.000
its standard deviation is always the square root of its variance.
00:11:42.000 --> 00:11:50.400
I have to divide by √(σ²/N), and I’m going to call that Z.
00:11:50.400 --> 00:11:59.600
Now, you notice if I take the denominator and flip it upside down because of the fraction in the denominator,
00:11:59.600 --> 00:12:06.600
then I will get σ², √σ² is just σ.
00:12:06.600 --> 00:12:14.000
√N is going to flip up to the numerator, that is why I get that √N/σ × (Y ̅ – μ).
00:12:14.000 --> 00:12:20.700
That is where I’m getting this expression right here, that is where that comes from.
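The algebra just described, written as one chain of equalities (same symbols as in the lecture, with N the number of samples):

```latex
\[
Z \;=\; \frac{\bar{Y}-\mu}{\sqrt{\sigma^{2}/N}}
  \;=\; \frac{\bar{Y}-\mu}{\sigma/\sqrt{N}}
  \;=\; \frac{\sqrt{N}}{\sigma}\,\bigl(\bar{Y}-\mu\bigr).
\]
```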
00:12:20.700 --> 00:12:25.500
The variable that we just created is a standard normal variable.
00:12:25.500 --> 00:12:31.300
I can use the charts to look up probabilities for that standard normal variable.
00:12:31.300 --> 00:12:35.100
That is how I’m going to be solving the examples.
00:12:35.100 --> 00:12:39.900
I’m going to ask you some kind of question about Y ̅,
00:12:39.900 --> 00:12:47.800
and then what we will do is we will build up the standard normal variable, translate it into a question about Z.
00:12:47.800 --> 00:12:54.700
And then, we will use the charts to look up probabilities on Z, that is how that is going to work.
00:12:54.700 --> 00:12:58.300
Let us jump into the exercises and practice that.
00:12:58.300 --> 00:13:02.400
In example 1, this is a very realistic problem.
00:13:02.400 --> 00:13:07.300
Homework problems take you an average of 12 minutes each, but there is a lot of variation there,
00:13:07.300 --> 00:13:10.000
there is a standard deviation of 10 minutes.
00:13:10.000 --> 00:13:16.800
Maybe, if you get a really quick problem, you can quickly dispose of it in 2 minutes.
00:13:16.800 --> 00:13:22.000
Or if you get a really tough one, it could take you 22 minutes or possibly even longer.
00:13:22.000 --> 00:13:26.500
Your assignment is to solve 36 problems, this is going to take a while.
00:13:26.500 --> 00:13:31.600
What is the probability that it will take you more than 9 hours?
00:13:31.600 --> 00:13:39.100
Let us think about that, first of all I have a conversion to solve here.
00:13:39.100 --> 00:13:58.400
9 hours is 9 hours × 60 minutes per hour, that is 540 minutes.
00:13:58.400 --> 00:14:05.000
If I'm going to take 540 minutes, I want to convert that into an average.
00:14:05.000 --> 00:14:18.000
Remember, what I want to use is that Z is √N/σ × (Y ̅ – μ).
00:14:18.000 --> 00:14:21.600
I know that, that will be a standard normal variable.
00:14:21.600 --> 00:14:24.500
Somehow, I have to build up those quantities.
00:14:24.500 --> 00:14:31.500
If my total time spent on the homework is going to be 9 hours, that is 540 minutes.
00:14:31.500 --> 00:14:35.300
How much time will that be per problem on average?
00:14:35.300 --> 00:14:47.900
My Y ̅, which is the average time per problem, if I spent 540 minutes total then
00:14:47.900 --> 00:14:54.400
that would be 540 minutes divided by 36 problems.
00:14:54.400 --> 00:14:58.100
I rigged up those numbers to work fairly nicely.
00:14:58.100 --> 00:15:09.100
That is 15 minutes per problem.
00:15:09.100 --> 00:15:15.600
I want to know the likelihood that I'm going to end up spending more than 15 minutes per problem,
00:15:15.600 --> 00:15:22.100
over a 36 problem assignment.
00:15:22.100 --> 00:15:34.900
I want to find the probability that my Y ̅ is bigger than 15.
00:15:34.900 --> 00:15:42.300
Somehow, I want to build up this standard normal variable so I can use my normal distribution charts to solve this.
00:15:42.300 --> 00:15:51.500
If Y ̅ is bigger than 15, that means Y ̅ - μ is bigger than, what was my μ?
00:15:51.500 --> 00:15:58.500
My μ is the average of all homework problems, 12 minutes on average,
00:15:58.500 --> 00:16:02.400
is what it takes me to solve a homework problem.
00:16:02.400 --> 00:16:13.400
15 - 12. I will divide by σ, and what is my σ?
00:16:13.400 --> 00:16:20.600
My σ is standard deviation, that is 10.
00:16:20.600 --> 00:16:30.700
The last ingredient here is √N, √N I'm going to include that.
00:16:30.700 --> 00:16:35.300
What is my √N, N is the number of problems that I have.
00:16:35.300 --> 00:16:47.300
N was 36, the √N is 6, this is 3 × 6/10, 18/10 is 1.8.
00:16:47.300 --> 00:17:00.800
That was my variable Z, I want to find the probability that Z, my standard normal variable is going to be bigger than 1.8.
00:17:00.800 --> 00:17:10.500
I'm going to look that up on the next page because I have a standard normal chart all set to go, on the next page.
00:17:10.500 --> 00:17:15.100
Let me just go ahead and tell you the answer, so we can wrap it up on this page.
00:17:15.100 --> 00:17:20.600
From the chart on the next page, and I will show you in a moment where that comes from.
00:17:20.600 --> 00:17:31.800
The probability what was it, it was 0.0359 is what we are going to find on the next page.
00:17:31.800 --> 00:17:40.300
If I want to think about in terms of percentages, that is just about 3.6%.
00:17:40.300 --> 00:17:47.700
That is my probability that I'm going to spend more than 9 hours on this homework assignment.
00:17:47.700 --> 00:17:53.900
There is about a 3.6% chance that I'm going to spend more than 9 hours on this homework assignment.
00:17:53.900 --> 00:17:56.600
Maybe, I’m worried because I have something I need to do in 9 hours.
00:17:56.600 --> 00:18:00.000
I’m worried I would not get finished in time.
00:18:00.000 --> 00:18:08.100
It actually looks pretty good, it looks like there is more than a 96% chance that I will finish on time, that is kind of reassuring.
00:18:08.100 --> 00:18:14.700
That is the answer to the problem, there is 3.6% chance that we will spend more than 9 hours on this homework assignment.
00:18:14.700 --> 00:18:22.600
Let me just recap the steps there, before I show you that one missing step of looking it up on the chart.
00:18:22.600 --> 00:18:27.600
We want to, first of all, convert into a standard unit here.
00:18:27.600 --> 00:18:34.100
I got 9 hours and I got 12 minutes, I decided to convert the hours into minutes.
00:18:34.100 --> 00:18:38.700
You can also convert the other way, if you wanted, but I think it is a little easier this way.
00:18:38.700 --> 00:18:48.300
9 hours is 540 minutes, that was the total time that I would spend doing all the homework problems.
00:18:48.300 --> 00:18:57.800
Since, I know that I have a result about averages here, I wanted to convert that into an average amount of time.
00:18:57.800 --> 00:19:08.100
The average time is 540 minutes divided by 36 problems and that is just 15 minutes per problem.
00:19:08.100 --> 00:19:14.500
The question is really, how likely am I to spend an average of more than 15 minutes per problem?
00:19:14.500 --> 00:19:19.100
I want to find the probability that Y ̅ is bigger than 15.
00:19:19.100 --> 00:19:28.300
For Y ̅ to be bigger than 15, I want to build up this expression (Y ̅ - μ)/σ × √N.
00:19:28.300 --> 00:19:36.400
I filled in my μ is 12, μ right there is the average.
00:19:36.400 --> 00:19:44.900
My σ was 10, there is σ and I got that from the problem here.
00:19:44.900 --> 00:19:52.700
My √N is 6, that comes from N = 36, the number of problems that we have to solve here.
00:19:52.700 --> 00:19:59.600
Then, I just simplified the numbers 15 -12 is 3, 3 × 6 is 18, 18 divided by 10 is 1.8.
00:19:59.600 --> 00:20:05.100
I want to find the probability that my standard normal variable is bigger than 1.8.
00:20:05.100 --> 00:20:07.200
That is what I’m going to confirm on the next page.
00:20:07.200 --> 00:20:13.900
I will show you where that comes from, but we will see that it comes out to 0.0359.
00:20:13.900 --> 00:20:24.900
If you think about that as a percentage, that is just about 3.6%.
00:20:24.900 --> 00:20:31.100
I just want to confirm the result that we used on the previous page.
00:20:31.100 --> 00:20:36.600
What we use on the previous page was, we calculated the probability that a standard normal variable,
00:20:36.600 --> 00:20:44.900
because we had converted a Y ̅ to a standard normal variable, was bigger than 1.8.
00:20:44.900 --> 00:20:49.000
I’m going to look up 1.8 on this chart, 1.80.
00:20:49.000 --> 00:21:00.500
There is 1.8, there is 1.80, it is 0.0359 which was the number that I gave you back on the previous slide.
00:21:00.500 --> 00:21:03.500
This shows you where that comes from, it just comes from this chart.
00:21:03.500 --> 00:21:14.100
.0359, that is where I got that answer of 3.6% that we used on the previous slide.
00:21:14.100 --> 00:21:18.400
That just completes that little gap that we had on the previous slide.
00:21:18.400 --> 00:21:28.300
That totally answers our probability of having to spend more than 9 hours on this horrible homework assignment.
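If you have Python handy, the whole of example 1 can be checked without the chart. This is only a sketch: the variable names are invented here, and it uses `statistics.NormalDist` from the standard library (Python 3.8 or later) in place of the table lookup.

```python
import math
from statistics import NormalDist

# Numbers from the problem statement
mu = 12.0      # average minutes per problem
sigma = 10.0   # standard deviation in minutes
n = 36         # number of problems

total_minutes = 9 * 60          # 9 hours converted to minutes
y_bar = total_minutes / n       # average minutes per problem at the cutoff

# Standardize: Z = sqrt(N)/sigma * (Ybar - mu)
z = math.sqrt(n) / sigma * (y_bar - mu)

# P(Z > z) is 1 minus the cumulative probability up to z
p_over_nine_hours = 1.0 - NormalDist().cdf(z)

print("cutoff average:", y_bar, "minutes per problem")
print("z =", round(z, 2))
print("P(more than 9 hours) =", round(p_over_nine_hours, 4))
```

The printed probability should agree with the chart value to the four decimal places the chart shows.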
00:21:28.300 --> 00:21:35.000
In example 2, we have a bakery that is charting how many muffins they sell per day.
00:21:35.000 --> 00:21:40.300
They have figured out a long-term average of 30 muffins per day but there is a lot of variation in there.
00:21:40.300 --> 00:21:45.300
Maybe, they sell more muffins on the weekend and fewer on a weekday.
00:21:45.300 --> 00:21:49.200
They figure out that, there is a standard deviation of 8 muffins.
00:21:49.200 --> 00:21:54.800
What they are doing is, they are planning out the next month or so, actually 36 days.
00:21:54.800 --> 00:22:00.000
They are worried about, what is the chance that they will sell more than 1000 muffins?
00:22:00.000 --> 00:22:03.100
Maybe, they are worried about whether they are going to have to order some more supplies,
00:22:03.100 --> 00:22:07.000
some more flour, some more eggs, or something like that.
00:22:07.000 --> 00:22:10.500
Or maybe, they are worried about whether they are going to make enough money.
00:22:10.500 --> 00:22:14.100
They know they need to sell 1000 muffins in the next 36 days.
00:22:14.100 --> 00:22:18.500
These are the kinds of calculations that a business person would make.
00:22:18.500 --> 00:22:22.000
We are going to answer them using the central limit theorem.
00:22:22.000 --> 00:22:37.200
Let me remind you what we have, our main theorem is that, if we start out with Y ̅ – μ, very important distinction there.
00:22:37.200 --> 00:22:51.200
(Y ̅ - μ) × √N/σ is a standard normal variable, that is kind of our main result for this lecture.
00:22:51.200 --> 00:22:54.700
We want to figure out how we can use that here.
00:22:54.700 --> 00:23:04.600
I see a Y ̅ there, that Y ̅ is the average of the number of muffins we are going to sell each day.
00:23:04.600 --> 00:23:32.300
If the total muffins is going to be 1000, then that means the daily average is Y ̅ which will be 1000/36,
00:23:32.300 --> 00:23:33.500
because there are 36 days.
00:23:33.500 --> 00:23:41.300
It does simplify a bit, I can take a 4 out of top and bottom there, simplify that down to 250/9,
00:23:41.300 --> 00:23:45.200
still not the nicest fraction in the world.
00:23:45.200 --> 00:23:54.400
I'm going to try to build up the standard normal variable and get an answer that I can look up easily on the normal charts.
00:23:54.400 --> 00:24:10.900
Y ̅ - μ is 250/9 - μ is the overall average which we figure out is 30 muffins per day, that is -30.
00:24:10.900 --> 00:24:16.700
That is a little awkward, let me go ahead and try to combine those fractions.
00:24:16.700 --> 00:24:21.700
I did do this one in fractions because I rigged it up so the fraction work fairly nicely,
00:24:21.700 --> 00:24:24.300
something we can work out in our heads.
00:24:24.300 --> 00:24:28.600
If the fractions did not work nicely, I would probably just be going to a decimal right now.
00:24:28.600 --> 00:24:30.900
But, this one works nicely.
00:24:30.900 --> 00:24:46.300
250/9 - 30, and 30 is 270/9, so we get -20/9, you can convert that into a decimal, if you like.
00:24:46.300 --> 00:24:50.200
Let me continue to build up the standard normal variable.
00:24:50.200 --> 00:25:07.500
√N × (Y ̅ - μ)/σ, I want to be bigger than the values that we have, because we want to sell more than 1000 muffins.
00:25:07.500 --> 00:25:12.100
This should have been a greater than or equal to.
00:25:12.100 --> 00:25:17.600
This should be greater than or equal to -20/9.
00:25:17.600 --> 00:25:22.200
I’m multiplying by √N, where N is 36.
00:25:22.200 --> 00:25:31.200
√36 is 6, and now I have a σ.
00:25:31.200 --> 00:25:37.400
σ is our standard deviation, that is 8.
00:25:37.400 --> 00:25:42.000
That is what we are told in the problem here.
00:25:42.000 --> 00:25:57.200
I think this does simplify fairly well: the 20/8 simplifies to 5/2, then the 6/2 simplifies to 3/1.
00:25:57.200 --> 00:26:03.500
3/9 could simplify to 1/3, we just get -5/3.
00:26:03.500 --> 00:26:06.500
Now, I'm going to convert it into a decimal.
00:26:06.500 --> 00:26:12.700
The point of this was that, this was a standard normal variable.
00:26:12.700 --> 00:26:20.500
I want that Z to be bigger than or equal to -5/3, which as a decimal is -1.67.
00:26:20.500 --> 00:26:27.400
2/3 is about 0.67; technically that is an approximation.
00:26:27.400 --> 00:26:31.200
I do not want any pure mathematicians to complain about that.
00:26:31.200 --> 00:26:41.700
It is -1.67, and now I want to figure out the probability that Z will be bigger than -1.67.
00:26:41.700 --> 00:26:50.200
Let me draw what I'm going to be looking for, then we will use a chart on the next page to actually calculate that.
00:26:50.200 --> 00:26:56.700
-1.67 was down here somewhere.
00:26:56.700 --> 00:26:59.000
I’m looking for the probability that Z is bigger than that.
00:26:59.000 --> 00:27:02.000
I’m looking for all that probability.
00:27:02.000 --> 00:27:09.800
The way the chart works is it will tell me probabilities of being bigger than a certain cutoff.
00:27:09.800 --> 00:27:19.200
What I'm going to do is find the probability that Z is bigger than 1.67 and then subtract that from 1.
00:27:19.200 --> 00:27:40.100
The probability that Z is bigger than -1.67 is going to be 1 - the probability that Z is bigger than +1.67.
00:27:40.100 --> 00:27:50.400
This is what we are looking for, this probability that Z is bigger than -1.67.
00:27:50.400 --> 00:28:01.000
But I can figure it out as 1- this probability, the probability that Z is bigger than 1.67.
00:28:01.000 --> 00:28:04.000
That is how I'm going to calculate that out.
00:28:04.000 --> 00:28:10.400
The rest of it is simply a matter of looking it up on the chart, because that is the form that I can look things up on the chart.
00:28:10.400 --> 00:28:16.200
I have already done this, if I look it up on the chart on the next page.
00:28:16.200 --> 00:28:18.700
Let me tell you what the answer comes out to be.
00:28:18.700 --> 00:28:28.100
It is 1- 0.0475 and that comes from the chart on the next page.
00:28:28.100 --> 00:28:41.200
And then, 1- 0.0475 is 0.9525, and that is approximately 95%.
00:28:41.200 --> 00:28:49.700
It is in fact very likely that, this bakery is going to sell more than 1000 muffins in the next 36 days.
00:28:49.700 --> 00:28:55.500
If they are planning on buying supplies, buying flour, eggs for their muffins,
00:28:55.500 --> 00:29:01.900
then they better go out and buy more supplies because it is very likely that they will sell more than 1000 muffins.
00:29:01.900 --> 00:29:05.400
If they are worried about revenue then things are looking pretty good,
00:29:05.400 --> 00:29:10.200
because there is a good chance that they will sell more than 1000 muffins.
00:29:10.200 --> 00:29:11.300
Let me recap the steps here.
00:29:11.300 --> 00:29:16.900
There is one missing step which is the chart, which I will fill in on the next slide.
00:29:16.900 --> 00:29:21.700
In the meantime, the total muffins, we want that to be bigger than 1000,
00:29:21.700 --> 00:29:30.600
which means the daily average should be bigger than 1000/36, which is 250/9.
00:29:30.600 --> 00:29:32.300
I was just reducing the fractions there.
00:29:32.300 --> 00:29:35.000
I rigged this one up to give us nice fractions.
00:29:35.000 --> 00:29:38.700
And then, I kind of built up this standard normal variable.
00:29:38.700 --> 00:29:47.200
I subtract a μ, the μ was the average of 30 muffins per day, that comes from there.
00:29:47.200 --> 00:29:52.200
That is where that 30 comes from. I subtracted it and I got a negative number.
00:29:52.200 --> 00:29:55.200
It is significant that it is negative there.
00:29:55.200 --> 00:29:58.000
We do want to keep track of the negative sign.
00:29:58.000 --> 00:30:04.300
And then, I multiply by √N which was, there is my N is 36.
00:30:04.300 --> 00:30:12.800
√N is my 6 right there, divided by σ which is the standard deviation, 8 muffins right there, there is my 8.
00:30:12.800 --> 00:30:20.400
And then, I just did some simplifying fractions there, got down to -5/3.
00:30:20.400 --> 00:30:27.300
And I convert that back into a decimal which is -1.67.
00:30:27.300 --> 00:30:39.900
In order to find the probability of Z being bigger than -1.67, I flipped it around and I calculated the probability of being in this tail.
00:30:39.900 --> 00:30:46.700
That is the probability of being in the tail, the probability that Z is bigger than 1.67.
00:30:46.700 --> 00:30:49.500
For that, I’m going to use the chart on the next page.
00:30:49.500 --> 00:30:51.800
I hope I have been reading the chart correctly.
00:30:51.800 --> 00:30:56.400
When we look on the next page, it really will be 0.0475.
00:30:56.400 --> 00:31:07.000
It will work out then to be 1 - 0.0475, which is about a 95% chance that this bakery will sell more than 1000 muffins.
00:31:07.000 --> 00:31:11.300
That is just filling that one missing step from the chart.
00:31:11.300 --> 00:31:18.000
This is the normal distribution chart, this will tell you the probability that a standard normal variable
00:31:18.000 --> 00:31:21.000
will end up being bigger than a particular cutoff.
00:31:21.000 --> 00:31:27.600
In this case, our cutoff is 1.67, we are using that to solve the problem on the previous slide.
00:31:27.600 --> 00:31:39.400
The probability that Z is bigger than 1.67, here is 1.6 and the second decimal place is 7, it is right there.
00:31:39.400 --> 00:31:47.400
I hope this works out, 1.6 and 0.07 gives 0.0475, that is what we used before.
00:31:47.400 --> 00:31:57.800
It is 0.0475 and that was the answer that we plugged into our calculations on the previous slide.
00:31:57.800 --> 00:32:03.000
Just take this answer, drop it into the calculations on the previous slide.
00:32:03.000 --> 00:32:12.500
And then, we did some more work and we got our answer to be 95%.
00:32:12.500 --> 00:32:18.000
That fills in the one missing step from the previous slide.
00:32:18.000 --> 00:32:27.900
It is just a matter of taking this 1.67 and matching up 1.6 and 0.07, and finding the right probability.
00:32:27.900 --> 00:32:32.900
Of course, if you are using electronic tools, you probably do not need to use this chart.
00:32:32.900 --> 00:32:38.700
You can just ask what is the probability that a standard normal variable will be bigger than 1.67,
00:32:38.700 --> 00:32:43.200
and it should just spit out the answer for you.
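For those using electronic tools, here is a minimal sketch in Python of that same lookup. It assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are just labels chosen for this illustration, not anything from the chart.

```python
from statistics import NormalDist

# Standard normal variable: mean 0, standard deviation 1.
std_normal = NormalDist()

# Upper-tail probability P(Z > 1.67), the quantity read off the chart
# at row 1.6, column 0.07.
tail = 1 - std_normal.cdf(1.67)

# By symmetry, P(Z > -1.67) = 1 - P(Z > 1.67).
answer = 1 - tail

print(round(tail, 4), round(answer, 4))
```

The tail value should agree with the chart entry of 0.0475 to four decimal places, and the final answer is about 95%.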
00:32:43.200 --> 00:32:48.800
In example 3, we have a technician fixing a soda machine.
00:32:48.800 --> 00:32:56.300
It is supposed to give a certain amount of soda and she wants to check out whether it is dispensing the right amount of soda.
00:32:56.300 --> 00:33:06.500
She takes 100 samples and the standard deviation in this machine is 2.5 ml.
00:33:06.500 --> 00:33:10.400
We want to find the chance that her sample mean, that is the Y ̅,
00:33:10.400 --> 00:33:16.800
that is the mean of her 100 samples is within 0.5 ml of the true average amount.
00:33:16.800 --> 00:33:22.500
That is the true population mean, that is the μ of soda dispensed.
00:33:22.500 --> 00:33:25.500
Let me set that one up for you.
00:33:25.500 --> 00:33:36.300
The whole point of this lecture is that, we look at Y ̅ - μ and we convert that to a standard normal variable.
00:33:36.300 --> 00:33:41.900
The way we do that is by multiplying by √N/σ.
00:33:41.900 --> 00:33:52.000
That turned out to be a standard normal variable, meaning it has mean 0 and standard deviation 1.
00:33:52.000 --> 00:33:57.700
What we want to do is figure out Y ̅ – μ.
00:33:57.700 --> 00:34:04.600
In this case, we want the sample mean to be within 0.5 ml of the true mean.
00:34:04.600 --> 00:34:11.300
Within 0.5 ml means it could go 0.5 ml either way.
00:34:11.300 --> 00:34:21.400
I’m going to put absolute values here and set it less than or equal to 0.5 ml.
00:34:21.400 --> 00:34:25.300
I’m going to try to build up my standard normal variable.
00:34:25.300 --> 00:34:31.300
I'm going to build up by putting √N here and dividing by σ.
00:34:31.300 --> 00:34:38.000
Of course, I got to do that on the other side, as well, √N and divided by σ.
00:34:38.000 --> 00:34:43.900
The point of that is, that gives me a standard normal variable, or actually, since there are absolute values there,
00:34:43.900 --> 00:34:47.200
I put the absolute values on Z as well.
00:34:47.200 --> 00:34:53.600
That means, I want the absolute value of Z less than or equal to, let me fill in what I can here.
00:34:53.600 --> 00:35:01.200
0.5 √N, N is 100 because we are taking 100 samples here.
00:35:01.200 --> 00:35:08.100
That is × 10, √100 is 10, I rigged that up to make the numbers easy.
00:35:08.100 --> 00:35:16.800
What is my σ, σ is 2.5 here.
00:35:16.800 --> 00:35:25.300
10 divided by 2.5 is 4, this is 0.5 × 4 which is 2.
00:35:25.300 --> 00:35:27.600
That worked out really nicely, did it not?
00:35:27.600 --> 00:35:34.500
I want the probability that the absolute value of Z will be less than 2.
00:35:34.500 --> 00:35:46.000
It is the same as the probability that Z is between -2 and 2.
00:35:46.000 --> 00:35:49.700
Let me draw a little picture, it is always useful when you are working with these normal distributions
00:35:49.700 --> 00:35:54.800
to draw a picture and figure out what it is you are actually calculating.
00:35:54.800 --> 00:36:02.100
I want Z to be between -2 and 2, there is -2 and 2.
00:36:02.100 --> 00:36:04.600
I want to be in between there.
00:36:04.600 --> 00:36:09.900
The way my chart is set up, your chart might be different, but the way my chart works is,
00:36:09.900 --> 00:36:15.200
it will tell you the probability of Z being in the positive tail.
00:36:15.200 --> 00:36:18.900
It will tell you that area right there.
00:36:18.900 --> 00:36:24.200
In order to find the probability in the middle, what I'm going to do is
00:36:24.200 --> 00:36:32.500
calculate that probability in the tail and then subtract off 2 tails, that will give me the probability in the middle.
00:36:32.500 --> 00:36:39.300
I will do 1 - 2 × the probability that Z is bigger than 2.
00:36:39.300 --> 00:36:45.300
That should give me the probability of being between -2 and 2.
00:36:45.300 --> 00:36:48.800
That is something now that I can look up on my chart,
00:36:48.800 --> 00:36:56.600
or if you do not like charts and you got access to some kind of electronic tool, you can look that up more quickly.
00:36:56.600 --> 00:36:58.900
Just drop the number 2 into your tool.
00:36:58.900 --> 00:37:04.400
I’m going to use the chart and it is on the next page where I set up the charts.
00:37:04.400 --> 00:37:09.500
Let me go ahead and tell you the numerical answer now, and then let me just confirm that on the next page.
00:37:09.500 --> 00:37:13.800
I found the probability of Z being bigger than 2 from the chart.
00:37:13.800 --> 00:37:30.100
It was 0.0228, so that is 1 - 2 × 0.0228, and 2 × 0.0228 is 0.0456.
00:37:30.100 --> 00:37:51.500
That is 0.9544, and that is approximately 95%, just slightly over 95%.
00:37:51.500 --> 00:38:06.000
That is my answer, that is the probability that this technician’s sample mean will be within 0.5 ml of the true mean.
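If you have Python available, the calculation for this example can be sketched as follows. This is only an illustration under the central limit theorem approximation; the function name `prob_mean_within` and its parameter names are made up for this sketch, and it assumes Python 3.8 or later for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_within(tolerance, sigma, n):
    """Approximate P(|Ybar - mu| <= tolerance) via the central limit theorem.

    tolerance: allowed distance between sample mean and true mean
    sigma:     population standard deviation
    n:         number of samples
    """
    # Convert the tolerance into a cutoff for the standard normal variable:
    # |Z| <= tolerance * sqrt(n) / sigma
    z = tolerance * sqrt(n) / sigma
    std_normal = NormalDist()
    # Probability in the middle region between -z and z.
    return std_normal.cdf(z) - std_normal.cdf(-z)


# Soda machine: 100 samples, sigma = 2.5 ml, tolerance = 0.5 ml, so z = 2.
p_within = prob_mean_within(tolerance=0.5, sigma=2.5, n=100)
print(round(p_within, 4))
```

The result may differ from the chart-based 0.9544 in the last decimal place, because the chart rounds the tail probability to four places before it is doubled.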
00:38:06.000 --> 00:38:08.300
Let me recap the steps there.
00:38:08.300 --> 00:38:15.400
There is one missing step, which is the step from the chart, and we will confirm that on the next slide.
00:38:15.400 --> 00:38:18.800
Just while we have the slide in front of us, let me recap the steps.
00:38:18.800 --> 00:38:30.800
I set up my standard normal variable, I know that (Y ̅ - μ) × √N/σ is always a standard normal.
00:38:30.800 --> 00:38:36.800
I wanted Y ̅ and μ to be within 0.5 of each other.
00:38:36.800 --> 00:38:39.800
Y ̅ and μ should be within 0.5 of each other.
00:38:39.800 --> 00:38:48.700
When we say two things are within a certain distance from each other, that really means that you want to bound their absolute value.
00:38:48.700 --> 00:38:53.100
The distance from A to B is the absolute value of A – B.
00:38:53.100 --> 00:38:57.900
We want the absolute value of Y ̅ - μ to be less than 0.5.
00:38:57.900 --> 00:39:03.200
I just tacked on these other quantities, √N/σ.
00:39:03.200 --> 00:39:09.900
The point of that was that gave me a standard normal variable, I can call that Z.
00:39:09.900 --> 00:39:14.500
And then, I wanted to fill in what my √N and σ were.
00:39:14.500 --> 00:39:20.200
N was 100, √N gave me 10, that is where that came from.
00:39:20.200 --> 00:39:27.300
The σ was the standard deviation given there, that is my σ 2.5.
00:39:27.300 --> 00:39:34.200
The numbers simplified nicely, that was me being clever, setting up nice numbers there.
00:39:34.200 --> 00:39:43.200
It simplified down to 2, but now you have to think a bit more because you want the probability that Z is less than 2
00:39:43.200 --> 00:39:51.200
in absolute value, which means Z is between -2 and 2, that is the middle region here.
00:39:51.200 --> 00:39:57.000
The way my chart works, it will tell you the tail region, it will not tell you the middle region directly.
00:39:57.000 --> 00:40:07.100
What I did was, I said you could solve this by doing 1 minus 2 tails, because there is a lower tail and there is an upper tail there.
00:40:07.100 --> 00:40:15.800
I'm going to look it up on the chart on the next page and we will see that the probability of Z being bigger than 2 is 0.0228.
00:40:15.800 --> 00:40:21.100
It is 1 - 2 × that, and 2 × that is 0.0456.
00:40:21.100 --> 00:40:30.300
We finally get 0.9544, about a 95% chance that we will be within 0.5 ml.
00:40:30.300 --> 00:40:34.700
By the way, example 4 is a follow up to example 3.
00:40:34.700 --> 00:40:37.400
I want to make sure that you understand example 3.
00:40:37.400 --> 00:40:42.200
If there are any steps in here that you are a little fuzzy on, just watch the video again and
00:40:42.200 --> 00:40:45.500
just make sure that you are very clear on all the steps here,
00:40:45.500 --> 00:40:49.100
because in example 4, we are going to tweak the numbers a little bit.
00:40:49.100 --> 00:40:53.300
It really will help if example 3 is already very solid for you.
00:40:53.300 --> 00:40:57.200
One missing step here is where that number comes from.
00:40:57.200 --> 00:41:03.900
Let us fill in that step, this is still example 3 and we had one missing step
00:41:03.900 --> 00:41:11.400
which was the probability that Z was bigger than 2, that was from the previous slide.
00:41:11.400 --> 00:41:17.600
In order to solve that, I see that there is 2.0 right there.
00:41:17.600 --> 00:41:32.500
2.00 gives me 0.0228, which is what I used on the previous slide.
00:41:32.500 --> 00:41:40.800
I used that number on the previous slide, and you can catch up with all the rest of the arithmetic there
00:41:40.800 --> 00:41:45.400
if you do not remember how that worked out, but we did some calculations with that.
00:41:45.400 --> 00:41:54.100
We came up with an answer of 95% for this soda dispensing machine.
00:41:54.100 --> 00:41:59.000
That wraps up example 3, we are going to use the same scenario for example 4.
00:41:59.000 --> 00:42:08.000
I do want to make sure that you understand example 3, before you go ahead and try example 4.
00:42:08.000 --> 00:42:12.300
In example 4, we have the same technician from example 3.
00:42:12.300 --> 00:42:17.600
Remember, this technician is taking samples from a soda dispensing machine.
00:42:17.600 --> 00:42:26.800
She wants to guarantee with probability 95% that her sample mean will be within 0.4 ml of the true average.
00:42:26.800 --> 00:42:35.500
This looks a lot like example 3, the difference, I will just remind you, was that with example 3,
00:42:35.500 --> 00:42:41.700
we had not 0.4 but a 0.5 ml tolerance.
00:42:41.700 --> 00:42:45.700
Here we are restricting that to 0.4 ml tolerance.
00:42:45.700 --> 00:42:52.300
We are deciding that 0.5 is not close enough, I want to get a 0.4 ml tolerance.
00:42:52.300 --> 00:42:56.500
I still want to keep the probability at 95%.
00:42:56.500 --> 00:43:00.400
That means, I have to change something else.
00:43:00.400 --> 00:43:07.300
Since, I want a more accurate answer, that means I'm going to need to take more samples than I did before.
00:43:07.300 --> 00:43:10.900
We are going to try and solve that together.
00:43:10.900 --> 00:43:16.900
First thing is to figure out, with this probability 95%, what does that mean?
00:43:16.900 --> 00:43:22.400
Let me try to illustrate that graphically.
00:43:22.400 --> 00:43:31.400
I want to find some cutoff that gives me 95% of the probability in the middle.
00:43:31.400 --> 00:43:33.900
I will put a cutoff value z there, and over there is -z.
00:43:33.900 --> 00:43:41.300
I want to get 95% of the probability in between these cutoffs, for the standard normal variable.
00:43:41.300 --> 00:43:51.200
This is supposed to be 95% here and I want to figure out what value of Z will give me that.
00:43:51.200 --> 00:44:03.800
The way to find that, since I have a chart that will tell me how much probability is in the tail of the normal distribution,
00:44:03.800 --> 00:44:09.300
what I can do is say that these tails, I got two tails here,
00:44:09.300 --> 00:44:28.400
the area in 1 tail should be (1 - 95%)/2, that is (1 - 0.95)/2, which is 0.05/2, which is 0.025.
00:44:28.400 --> 00:44:39.900
I want to find a z value, such that the probability of Z being bigger than that cutoff z is 0.025.
00:44:39.900 --> 00:44:44.500
I will check this on the next slide, you will see, we will look it up together.
00:44:44.500 --> 00:44:53.700
I do not want to break my flow for this slide, I will tell you right now that it comes out to be Z = 1.96.
00:44:53.700 --> 00:44:56.300
I want to make sure that was the right value that I used.
00:44:56.300 --> 00:45:02.100
Yes 1.96, that is the cutoff Z value that we are looking for.
00:45:02.100 --> 00:45:06.500
Let me go back and show you how that factors in with all the other numbers in the problem.
00:45:06.500 --> 00:45:11.300
I will just remind you that Z is our standard normal variable.
00:45:11.300 --> 00:45:20.100
The way we get it is we do √N/σ × Y ̅ – μ.
00:45:20.100 --> 00:45:26.500
Let us work that out, I wanted to have Y ̅ – μ.
00:45:26.500 --> 00:45:33.600
I wanted those two quantities, Y ̅ is the sample mean, that is Y ̅ right there.
00:45:33.600 --> 00:45:39.400
The true average of this machine is μ, I do not know what that is, by the way.
00:45:39.400 --> 00:45:41.400
I want these to be within 0.4 of each other.
00:45:41.400 --> 00:45:54.800
I want the difference between those two in absolute value to be less than 0.4.
00:45:54.800 --> 00:46:00.300
What I want to do is build up this standard normal variable.
00:46:00.300 --> 00:46:04.400
I’m going to multiply on √N/σ here.
00:46:04.400 --> 00:46:11.500
I will multiply on √N/σ, as well here.
00:46:11.500 --> 00:46:19.600
The point of that is this gives me my Z value, my standard normal variable.
00:46:19.600 --> 00:46:26.100
My Z is less than or equal to √N × 0.4.
00:46:26.100 --> 00:46:33.000
I know what σ is, my σ was given to me in example 3, that was 2.5.
00:46:33.000 --> 00:46:39.000
That is 2.5 from example 3 was where that was given to us.
00:46:39.000 --> 00:46:45.400
Let me fill that in, example 3 was where that information came from.
00:46:45.400 --> 00:46:50.500
The N in example 3, N was 100 because we took 100 samples.
00:46:50.500 --> 00:46:56.900
We cannot use that anymore because now we are trying to go for more accurate estimation.
00:46:56.900 --> 00:46:59.900
We are going to have to increase the number of samples.
00:46:59.900 --> 00:47:03.400
I know it is going to be increased because we have to get a more accurate answer.
00:47:03.400 --> 00:47:11.200
What we are really going to do is solve this for N.
00:47:11.200 --> 00:47:21.000
We are going to use this value of Z that we figured out over here, 1.96, and then we will solve for N.
00:47:21.000 --> 00:47:32.500
1.96 is less than or equal to √N × 0.4/2.5.
00:47:32.500 --> 00:47:39.900
Now, I’m just going to manipulate the algebra a little bit until I can get a value for N.
00:47:39.900 --> 00:47:45.000
I do not think this one is going to work out particularly nicely, I did not rig the numbers for this one very well.
00:47:45.000 --> 00:47:55.800
I will multiply both sides by 2.5, I will divide both sides by 0.4, and that should still be less than or equal to √N.
00:47:55.800 --> 00:48:08.600
N should be greater than or equal to 1.96 × 2.5 divided by 0.4.
00:48:08.600 --> 00:48:13.000
Since, I changed from √N to N, I'm squaring both sides.
00:48:13.000 --> 00:48:17.100
We should square that expression.
00:48:17.100 --> 00:48:21.300
That is not a number you will want to work out in your head.
00:48:21.300 --> 00:48:25.400
I did that on a calculator and I will show you what I got there.
00:48:25.400 --> 00:48:36.400
I got 150.063, just slightly over 150 there.
00:48:36.400 --> 00:48:39.200
N is the number of samples that we are going to take.
00:48:39.200 --> 00:48:43.100
It has got to be a whole number, because you cannot take half of a sample.
00:48:43.100 --> 00:48:51.200
In order to make this work, I need a whole number bigger than 150.063.
00:48:51.200 --> 00:48:56.400
N = 151, I will round that up.
00:48:56.400 --> 00:49:00.100
You always round up, if you are talking about the number of samples.
00:49:00.100 --> 00:49:12.600
151 samples is enough, I solved that except for one detail of showing you on the chart where that Z value came from.
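The algebra for the number of samples can be sketched in a few lines of Python. This is a hedged illustration only: the helper name `samples_needed` is hypothetical, and the cutoff 1.96 is simply typed in as the value read from the chart.

```python
from math import ceil


def samples_needed(z_cutoff, sigma, tolerance):
    """Smallest whole n satisfying z_cutoff <= sqrt(n) * tolerance / sigma.

    Rearranged: n >= (z_cutoff * sigma / tolerance) ** 2, then rounded up,
    because the number of samples has to be a whole number.
    """
    return ceil((z_cutoff * sigma / tolerance) ** 2)


# Soda machine: 95% probability (z = 1.96), sigma = 2.5 ml, tolerance 0.4 ml.
raw = (1.96 * 2.5 / 0.4) ** 2   # the unrounded value, about 150.06
n = samples_needed(z_cutoff=1.96, sigma=2.5, tolerance=0.4)
print(raw, n)
```

Using `ceil` rather than ordinary rounding matches the rule of always rounding up for a sample count, so a value just over 150 becomes 151.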
00:49:12.600 --> 00:49:17.200
I will go back over the steps here and then we will jump forward to the chart.
00:49:17.200 --> 00:49:20.600
I will show you where that Z value came from.
00:49:20.600 --> 00:49:24.100
To go back to the beginning here.
00:49:24.100 --> 00:49:27.000
I start out with this probability of 95%.
00:49:27.000 --> 00:49:32.100
I’m going for a 95% probability here.
00:49:32.100 --> 00:49:42.000
In order to get 95% in the middle, that means my values on the tail, there is two tails, they are going to split the left over.
00:49:42.000 --> 00:49:48.400
The left over probability is 1 - 0.95/2.
00:49:48.400 --> 00:49:53.900
1-0.95 is 0.05, .05/2 is 0.025.
00:49:53.900 --> 00:50:02.900
I’m looking for a cutoff value of z, such that when I find the probability of being bigger than that, it is 0.025.
00:50:02.900 --> 00:50:07.000
It will come from the chart on the next slide.
00:50:07.000 --> 00:50:14.900
I will fill that part in, if you are willing to be a little patient with me, or you can skip ahead and see the chart on the next slide.
00:50:14.900 --> 00:50:18.100
Then, I was just going to hang onto that Z for a little while.
00:50:18.100 --> 00:50:23.400
I'm going to go back and I'm going to build up my standard normal variable here, that comes from this formula here.
00:50:23.400 --> 00:50:29.400
That is supposed to be a standard normal distribution.
00:50:29.400 --> 00:50:35.100
I started off with Y ̅ - μ in absolute value, that comes from the word within here.
00:50:35.100 --> 00:50:39.000
I want two things to be within 0.4 ml.
00:50:39.000 --> 00:50:42.500
I want the absolute value of the difference to be less than 0.4.
00:50:42.500 --> 00:50:45.900
And then, I built up my √N/σ.
00:50:45.900 --> 00:50:51.400
I do not know what the √N is because I have not been told how many samples I want at this point,
00:50:51.400 --> 00:50:53.700
that is what I'm solving for.
00:50:53.700 --> 00:50:59.500
The N is the variable but my σ, I was given the standard deviation in example 3.
00:50:59.500 --> 00:51:02.500
I dropped that in, it is 2.5.
00:51:02.500 --> 00:51:06.400
I fill in my Z value, that comes from over here.
00:51:06.400 --> 00:51:11.000
That is where that Z value came from.
00:51:11.000 --> 00:51:14.700
I just take this equation and I solve it for N.
00:51:14.700 --> 00:51:20.400
That is just a matter of manipulating the algebra around, squaring both sides because I had a √N.
00:51:20.400 --> 00:51:28.700
I get N = 150.063, and I need it to be a whole number.
00:51:28.700 --> 00:51:31.900
To be safe, I round it up.
00:51:31.900 --> 00:51:38.100
Because 150 samples would not be quite enough, I have to go for 151 samples.
00:51:38.100 --> 00:51:45.900
And then, I know with probability 95% that my sample mean will be within 0.4 ml of the true mean.
00:51:45.900 --> 00:51:54.000
One missing step here is where that 1.96 came from and it comes from this 0.025.
00:51:54.000 --> 00:51:58.600
But, I'm using the chart that we will see on the next slide.
00:51:58.600 --> 00:52:02.500
I want to fill in that one missing step from example 4.
00:52:02.500 --> 00:52:10.500
We want to find a Z for which the probability of Z being bigger than that Z is 0.025.
00:52:10.500 --> 00:52:18.800
We worked that out on the previous slide, that was what we were looking for.
00:52:18.800 --> 00:52:23.200
Let me draw a little picture of what we are dealing with.
00:52:23.200 --> 00:52:38.300
We want to find a z such that the tail probability there is 0.025, which means I have to look for 0.025 in my chart.
00:52:38.300 --> 00:52:52.100
I’m going to start here, they are getting smaller: 0.07, 0.06, 0.05, 0.04, 0.03, 0.02.
00:52:52.100 --> 00:53:02.000
It is getting close, 0.027, 0.026, 0.0256, 0.0250, there it is right there.
00:53:02.000 --> 00:53:17.400
I look at which row and column that happened in, 1.9 and 0.06, and that tells me that my Z value is 1.96.
00:53:17.400 --> 00:53:24.800
You might also have electronic applets, you do not have to do this kind of old fashioned method of looking up on charts.
00:53:24.800 --> 00:53:31.200
That is totally fine with me, if you are okay using electronic tools in your class.
00:53:31.200 --> 00:53:36.900
You can jump from 0.025 to 1.96, that is totally fine with me.
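As one example of such a jump, here is a minimal Python sketch of the reverse lookup, going from a tail area to a cutoff. It assumes Python 3.8 or later, where `statistics.NormalDist` offers an `inv_cdf` method; the variable names are only labels for this illustration.

```python
from statistics import NormalDist

# 95% in the middle leaves (1 - 0.95) / 2 in each tail.
tail_area = (1 - 0.95) / 2

# inv_cdf takes the area to the LEFT of the cutoff, so an upper tail
# of 0.025 corresponds to a left area of 0.975.
z_cutoff = NormalDist().inv_cdf(1 - tail_area)

print(round(z_cutoff, 2))
```

Note the direction: my chart gives the area to the right of a cutoff, while `inv_cdf` works with the area to the left, which is why the sketch passes in `1 - tail_area`.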
00:53:36.900 --> 00:53:47.700
This 1.96, we went on and used that in our calculations on the previous slide.
00:53:47.700 --> 00:53:56.900
Somehow, that worked out on the previous slide to tell us that we need N = 151 samples,
00:53:56.900 --> 00:54:05.200
in order to guarantee a particular accuracy of our sample mean.
00:54:05.200 --> 00:54:08.900
Just to recap there, most of the work was done on the previous slide.
00:54:08.900 --> 00:54:13.000
I figured out on the previous slide that I was looking for a cutoff value of Z,
00:54:13.000 --> 00:54:17.100
such that the probability of being bigger than that was 0.025.
00:54:17.100 --> 00:54:24.900
I just looked through this chart until I found 0.025, and found that in the 1.9 row and the 0.06 column.
00:54:24.900 --> 00:54:27.700
I put those together and I get 1.96.
00:54:27.700 --> 00:54:35.100
The rest of it goes back to the previous slide where we threw that 1.96 into a bunch of calculations
00:54:35.100 --> 00:54:42.200
and came back to N = 151 samples.
00:54:42.200 --> 00:54:53.700
In our final example here, we have a restaurant that is worried about how much money it is going to make tonight.
00:54:53.700 --> 00:55:00.800
It has done some studies and it has found that its customers spend an average of $30.00 per customer.
00:55:00.800 --> 00:55:03.500
But, they have a standard deviation of $10.00 which means,
00:55:03.500 --> 00:55:08.500
maybe if somebody just has an appetizer and a drink, maybe they will spend $20.00.
00:55:08.500 --> 00:55:16.800
Maybe, if they really go for the full menu and have drinks and desserts, and a few different extras,
00:55:16.800 --> 00:55:20.200
then they are going to end up spending $40.00 or even more.
00:55:20.200 --> 00:55:26.000
The restaurant, their average is $30.00 and they have 25 reservations tonight.
00:55:26.000 --> 00:55:31.000
I guess it is a reservation-only restaurant, you cannot just walk in here, you have to have a reservation.
00:55:31.000 --> 00:55:34.000
They are expecting 25 customers tonight.
00:55:34.000 --> 00:55:37.900
They want to know the chance that their total revenue tonight, we are not worried about profit,
00:55:37.900 --> 00:55:48.500
we are not worried about what we are spending on supplies, total revenue will be between $725 and $800.
00:55:48.500 --> 00:55:52.800
Let me show you how this turns into a central limit theorem problem.
00:55:52.800 --> 00:55:56.100
Because it is not totally obvious right now, we are talking about total revenue.
00:55:56.100 --> 00:56:01.100
Let me show you here, let me remind you of what we are given at the beginning of this lecture.
00:56:01.100 --> 00:56:18.400
We were given that Y ̅, the sample mean, minus μ, the global mean, divided by σ, the standard deviation, and multiplied by √N, is a standard normal variable.
00:56:18.400 --> 00:56:21.600
N is the size of the sample.
00:56:21.600 --> 00:56:25.900
The point of that was that would give you a standard normal variable.
00:56:25.900 --> 00:56:31.900
We are going to call it Z and that is a standard normal variable.
00:56:31.900 --> 00:56:37.000
In turn, the point of a standard normal variable is it is very easy to calculate probabilities.
00:56:37.000 --> 00:56:40.000
The way I’m doing it is I'm using charts.
00:56:40.000 --> 00:56:47.100
You might use charts for your class or you might have more sophisticated electronic tools, and that is okay with me.
00:56:47.100 --> 00:56:49.400
What does this have to do with this restaurant?
00:56:49.400 --> 00:56:56.300
They want to make between $725 and $800 total.
00:56:56.300 --> 00:57:05.000
The total is going to be between 725 and 800, I said they want to make between that.
00:57:05.000 --> 00:57:08.000
Of course, they would be happy if they made more.
00:57:08.000 --> 00:57:14.300
What they are worried about is, they want to calculate how likely it is that they will make between 725 and 800 total.
00:57:14.300 --> 00:57:20.300
What we have here is a result that has to do with the mean, the sample mean.
00:57:20.300 --> 00:57:22.400
How do we convert that into a mean?
00:57:22.400 --> 00:57:28.600
We just divide by the customers, the number of customers,
00:57:28.600 --> 00:57:33.200
and convert that into an average amount that each customer would spend.
00:57:33.200 --> 00:57:48.300
The mean, the average, Y ̅ would have to be between 725 divided by 25 and 800/25
00:57:48.300 --> 00:57:56.200
because that is how much the average customer would have to spend, in order to get the total between 725 and 800.
00:57:56.200 --> 00:58:01.900
I just did a little arithmetic here, I rigged this up so that the numbers came out fairly nicely.
00:58:01.900 --> 00:58:14.800
800/25 is 32 and 725/25 is 29.
00:58:14.800 --> 00:58:20.700
Y ̅ would have to be between 29 and 32.
00:58:20.700 --> 00:58:24.200
What that means is, all the customers that come in tonight,
00:58:24.200 --> 00:58:29.700
they would have to spend an average of between $29.00 and $32.00.
00:58:29.700 --> 00:58:32.000
It does not mean they will have to spend between that,
00:58:32.000 --> 00:58:37.300
but it means you can still have some big spenders to come in and drop 50 bucks on a meal.
00:58:37.300 --> 00:58:43.500
You can still have some cheapskates who just buy an appetizer and then slide out of there after spending $10.00.
00:58:43.500 --> 00:58:53.800
But on average, it has to come out between $29 and $32.00 per customer for tonight's customers.
00:58:53.800 --> 00:58:59.800
What I would like to do is kind of buildup that standard normal variable.
00:58:59.800 --> 00:59:06.900
Y ̅ - μ would be between, my μ is my global average.
00:59:06.900 --> 00:59:14.500
That is right there, that is the 30, that is how much a customer spends on average in the long term.
00:59:14.500 --> 00:59:29.500
That is 29 -30 and 32 -30, let me go ahead and divide by σ.
00:59:29.500 --> 00:59:37.500
My σ is the standard deviation, there it is $10.00 right there.
00:59:37.500 --> 00:59:43.000
I also need to multiply by √N, I did not really leave myself enough space to do that.
00:59:43.000 --> 00:59:44.600
I will give myself another line there.
00:59:44.600 --> 00:59:54.700
(Y ̅ - μ)/σ × √N, this is not an absolute value, this is not one of those within problems like the previous one.
00:59:54.700 --> 00:59:58.900
We have to be careful about what is positive, what is negative, no absolute value here.
00:59:58.900 --> 01:00:04.600
√N, N is 25, that is the number of customers we are going to be working with tonight.
01:00:04.600 --> 01:00:19.100
√N is 5, so that is 5 × (29 - 30)/10 and 5 × (32 - 30)/10.
01:00:19.100 --> 01:00:28.300
That simplifies fairly nicely, 5/10 is ½, ½ × -1 is - ½.
01:00:28.300 --> 01:00:31.500
I will write that as -0.5.
01:00:31.500 --> 01:00:35.200
And then, the point of this was we are building up a standard normal variable.
01:00:35.200 --> 01:00:51.300
That is my Z right there, this is between -0.5 and 5/10 is still ½, 32 -30 is 2, 2 × ½ is just 1.0.
01:00:51.300 --> 01:00:58.900
I have a standard normal variable and I want to find the probability that it is between -1/2 and +1.
01:00:58.900 --> 01:01:03.500
Let me draw a graph of what I’m looking for.
01:01:03.500 --> 01:01:08.100
Possibly, if you have the right electronic tool, you can jump to the answer at this point.
01:01:08.100 --> 01:01:15.000
Just drop these numbers in your electronic tool, but let me show you how you can figure out using your charts.
01:01:15.000 --> 01:01:20.000
There is -0.5 and there is 1.0.
01:01:20.000 --> 01:01:25.500
I want to find the probability of being in between those two.
01:01:25.500 --> 01:01:30.400
What my chart will do is, it will tell me the probability of being in a tail.
01:01:30.400 --> 01:01:33.400
It will tell me that probability right there.
01:01:33.400 --> 01:01:40.900
It will also tell me that probability right there, but those are not the same because 0.5 and 1.0 are not symmetric.
01:01:40.900 --> 01:01:47.400
This is the probability that Z is greater than 1.0.
01:01:47.400 --> 01:01:56.100
This is the probability that Z is less than -0.5, but it is also the same as Z being bigger than 0.5.
01:01:56.100 --> 01:02:10.400
What I really want here is my probability that Z is between -0.5 and 1.0.
01:02:10.400 --> 01:02:20.700
Let me write that a little more clearly, - 0.5 and 1.0
01:02:20.700 --> 01:02:26.200
What I can do is, I can subtract off the two tails to get the probability.
01:02:26.200 --> 01:02:37.700
That is 1- the probability that Z is bigger than 0.5 - the probability that Z is bigger than 1.0.
01:02:37.700 --> 01:02:41.600
We had some other problems like this where we subtracted off the two tails.
01:02:41.600 --> 01:02:45.100
Those are symmetric ones, they started out with absolute values.
01:02:45.100 --> 01:02:48.700
We can just find one tail and then multiply it by 2.
01:02:48.700 --> 01:02:54.700
But these tails are not symmetric, I would have to do two separate calculations and
01:02:54.700 --> 01:03:00.200
look up two separate numbers on a chart there.
01:03:00.200 --> 01:03:05.300
Let me say that I will do the steps on the chart on the next page.
01:03:05.300 --> 01:03:12.900
I will just tell you what the answers are for now, and then I will prove to you by showing you the chart on the next page.
01:03:12.900 --> 01:03:23.700
The probability that Z is bigger than 0.5, I look that up on my chart, I got 0.3085.
01:03:23.700 --> 01:03:31.700
The probability that Z is bigger than 1.0 was 0.1587.
01:03:31.700 --> 01:03:37.200
Now, it is just 1 - 0.3085 - 0.1587.
01:03:37.200 --> 01:03:47.300
I was lazy, I threw that into a calculator and what I got was a 0.5328,
01:03:47.300 --> 01:03:51.700
could have done that by hand, that would not have been that bad.
01:03:51.700 --> 01:04:00.000
If you want to estimate that, that would be just 53%.
01:04:00.000 --> 01:04:13.100
That is the probability that this restaurant's total revenue for tonight is between $725 and $800.
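For anyone checking this with an electronic tool instead of the chart, here is a small Python sketch of the whole calculation. It is only an illustration under the central limit theorem approximation; the function name `prob_total_between` is hypothetical, and it assumes Python 3.8 or later for `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def prob_total_between(low, high, mu, sigma, n):
    """Approximate P(low <= total of n values <= high) via the CLT.

    Returns the two standard normal cutoffs and the probability between them.
    """
    # Convert the totals into sample means, then into standard normal cutoffs:
    # Z = (Ybar - mu) * sqrt(n) / sigma
    z_low = (low / n - mu) * sqrt(n) / sigma
    z_high = (high / n - mu) * sqrt(n) / sigma
    std_normal = NormalDist()
    # The tails are not symmetric, so take a difference of two cdf values
    # instead of doubling one tail.
    return z_low, z_high, std_normal.cdf(z_high) - std_normal.cdf(z_low)


# Restaurant: 25 customers, mean $30, standard deviation $10.
z_low, z_high, p_revenue = prob_total_between(725, 800, mu=30, sigma=10, n=25)
print(z_low, z_high, round(p_revenue, 4))
```

Taking `cdf(z_high) - cdf(z_low)` is the same as 1 minus the two tails; it just avoids having to flip the negative cutoff around by hand.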
01:04:13.100 --> 01:04:15.800
There is that one missing step from the chart.
01:04:15.800 --> 01:04:21.800
We will confirm that on the next slide but before I turn the page from the slide, let me show you the steps here.
01:04:21.800 --> 01:04:27.500
I want to find the probability that my total was between 725 and 800.
01:04:27.500 --> 01:04:34.700
What I really know is a result about the average that each customer tonight is going to spend.
01:04:34.700 --> 01:04:38.300
I want to convert that total into an average.
01:04:38.300 --> 01:04:42.900
I just divided it by the number of customers.
01:04:42.900 --> 01:04:47.300
The total divided by the number of customers gives me the average.
01:04:47.300 --> 01:04:54.000
725/25 gives me 29, 800/25 gives me 32.
01:04:54.000 --> 01:04:59.300
And then, I start to build up this formula for a standard normal variable.
01:04:59.300 --> 01:05:08.500
I subtracted μ from both sides, that μ was the average that all the customers in the world spend at this restaurant.
01:05:08.500 --> 01:05:19.200
I subtracted 30 from both sides and then I divided by σ, where is my σ, there is my σ right there, the standard deviation.
01:05:19.200 --> 01:05:25.900
I divided by σ and then I multiply by √N on the next line.
01:05:25.900 --> 01:05:34.000
N was the number of customers, there is 25 of them, I’m going to multiply both sides by 5.
01:05:34.000 --> 01:05:36.500
5 is the √25.
01:05:36.500 --> 01:05:39.900
I simplified the numbers here, they have worked out pretty nicely.
01:05:39.900 --> 01:05:47.300
They are rigged to work nicely, simplified down to -0.5 and +1.0.
01:05:47.300 --> 01:05:58.800
We are really looking for the probability between -0.5 and 1.0 for a standard normal variable.
01:05:58.800 --> 01:05:59.300
The way my chart works, and some people's charts work a little differently,
01:05:59.300 --> 01:06:04.300
but the way my chart works is it will tell you these tail probabilities.
01:06:04.300 --> 01:06:09.100
It tells you the positive tails but you can work out the negative tails the same way.
01:06:09.100 --> 01:06:16.500
What I will do is, I will look up the two different tails there and subtract them off from 1.
01:06:16.500 --> 01:06:22.000
That will give me this probability in the middle, that I'm really looking for.
01:06:22.000 --> 01:06:29.100
I will confirm those on the next page with the chart, it is 0.3085 and 0.1587.
01:06:29.100 --> 01:06:38.800
Once I look those up, I can drop them back into the calculation and just reduce it down to 0.5328 which is about 53%.
01:06:38.800 --> 01:06:49.100
That is my probability that my restaurant is going to make between $725 and $800 tonight.
01:06:49.100 --> 01:06:52.400
The one missing piece of the puzzle from the previous slide,
01:06:52.400 --> 01:07:00.400
we are still answering example 5 now, is to find those two probabilities.
01:07:00.400 --> 01:07:08.900
We were finding the probabilities of being less than -0.5 or being bigger than 1.0.
01:07:08.900 --> 01:07:13.500
We actually want to find the probability of being between those cutoffs.
01:07:13.500 --> 01:07:19.000
This chart will tell you the probability of being in the tails.
01:07:19.000 --> 01:07:28.200
The probability of Z being less than -0.5 is the same as being bigger than 0.5.
01:07:28.200 --> 01:07:39.700
It should be in this chart somewhere, here it is: 0.5 and 0.00 gives 0.3085.
01:07:39.700 --> 01:07:58.100
The probability of Z being bigger than 1.0, here is 1.0, it is 0.1587.
01:07:58.100 --> 01:08:04.100
I took those numbers and I do not think I'm going to rehash all the calculations that I did on the previous slide.
01:08:04.100 --> 01:08:09.600
I will just say, you plug those numbers into the appropriate place on the previous slide.
01:08:09.600 --> 01:08:14.400
You can go back and watch it, if you do not remember how it works out.
01:08:14.400 --> 01:08:20.900
It would be good if I can spell previous though.
01:08:20.900 --> 01:08:27.400
We worked through some calculations and we came up with a 53% probability
01:08:27.400 --> 01:08:37.700
that this restaurant is going to make between $725 and $800 in their nightly revenue.
01:08:37.700 --> 01:08:45.100
That wraps up example 5, most of the work was done on the previous slide.
01:08:45.100 --> 01:08:49.500
You can go back and check it out, the only missing step on the previous slide was
01:08:49.500 --> 01:08:56.100
where these two numbers came from, the 0.3085 and 0.1587.
01:08:56.100 --> 01:09:06.500
Where they came from was by looking up 0.5 and 1.0, and then getting these two numbers from my standard normal chart.
01:09:06.500 --> 01:09:11.100
If you do not like using charts, if you have an electronic way of getting probabilities
01:09:11.100 --> 01:09:16.100
for a standard normal distribution, then by all means use that.
01:09:16.100 --> 01:09:20.000
It is definitely quicker than this slightly archaic method.
01:09:20.000 --> 01:09:26.200
In some probability classes, they are still using charts so I want to show you that way.
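As one example of an electronic way, here is a small Python sketch that uses only the standard library. It gets the upper tail of a standard normal variable from the complementary error function, math.erfc. The helper name upper_tail is just an illustration, and this is one possible approach rather than the method used in the lecture.

```python
import math


def upper_tail(z: float) -> float:
    """Return P(Z > z) for a standard normal Z, using the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


left_tail = upper_tail(0.5)   # P(Z < -0.5), equal to P(Z > 0.5) by symmetry
right_tail = upper_tail(1.0)  # P(Z > 1.0)
middle = 1.0 - left_tail - right_tail  # P(-0.5 < Z < 1.0)

print(round(left_tail, 4), round(right_tail, 4), round(middle, 4))
# prints 0.3085 0.1587 0.5328
```

Those are the same two numbers the chart gives for 0.5 and 1.0, and the same 53% answer.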
01:09:26.200 --> 01:09:29.000
That wraps up this lecture on the central limit theorem and
01:09:29.000 --> 01:09:34.500
that is the last lecture in the probability series here on www.educator.com.
01:09:34.500 --> 01:09:37.700
My name is Will Murray, I have really enjoyed making these lectures.
01:09:37.700 --> 01:09:40.500
I hope you have learned something about probability.
01:09:40.500 --> 01:09:45.800
I hope you have been working through the examples and learning something along with me.
01:09:45.800 --> 01:09:49.900
I thank you very much for sticking with me through these probability lectures.
01:09:49.900 --> 01:09:55.000
I hope you are enjoying your probability class and all your math classes, bye now.