WEBVTT mathematics/statistics/son
00:00:00.000 --> 00:00:01.200
Welcome to educator.com.
00:00:01.200 --> 00:00:06.800
We are going to be introducing the concept of sampling distributions.
00:00:06.800 --> 00:00:22.000
Here is the roadmap. We have been talking about probability distributions, and that is what we call distributions of probabilities of discrete outcomes.
00:00:22.000 --> 00:00:29.800
We are going on to sampling distributions, and that is what we call distributions of outcomes that are continuous.
00:00:29.800 --> 00:00:38.200
For now we can apply the same fundamental logic from probability distributions directly to sampling distributions.
00:00:38.200 --> 00:00:49.000
We could treat them roughly similarly, but I do want to connect the sampling and research methods topics we have covered before to sampling distributions.
00:00:49.000 --> 00:00:53.600
We are going to finally talk about actually generating sampling distributions.
00:00:53.600 --> 00:01:00.200
First let us talk about the difference between probability and sampling distributions.
00:01:00.200 --> 00:01:08.300
In probability distributions we are always looking at discrete outcomes, a finite or countable number of outcomes.
00:01:08.300 --> 00:01:19.700
For example, in binomial distributions, if you have 10 trials, you have 0 to 10 successes as possible outcomes, and that is 11 outcomes.
00:01:19.700 --> 00:01:24.700
They are discrete, finite, countable, no problem.
00:01:24.700 --> 00:01:37.100
In probability distributions, what we are looking for is the probabilities of those discrete outcomes.
00:01:37.100 --> 00:01:45.100
We get a list of all these probabilities and we can actually make a list because it is a finite number.
00:01:45.100 --> 00:01:48.900
Let us talk about sampling distribution.
00:01:48.900 --> 00:01:51.600
Sampling distributions are roughly the same idea.
00:01:51.600 --> 00:01:57.800
You have the sample space and you want to know how likely each outcome is,
00:01:57.800 --> 00:02:04.600
the probability of each outcome and the set of all those probabilities that is called the sampling distribution.
00:02:04.600 --> 00:02:16.000
Here is the big difference: instead of discrete outcomes, we are now talking about continuous outcomes.
00:02:16.000 --> 00:02:26.500
Before, we asked: what is the probability that 2 out of 4 random people that you pick from the United States have a bachelor's degree?
00:02:26.500 --> 00:02:37.200
Now we might pick 4 college students at random, and we are not looking for a count like the 2 anymore; that was discrete.
00:02:37.200 --> 00:02:44.000
We are looking for things like what is the average GPA.
00:02:44.000 --> 00:02:48.800
There is an infinite number of average GPAs that you could potentially have.
00:02:48.800 --> 00:03:05.800
In that way it is not finite anymore; it is an infinite and uncountable number of outcomes.
00:03:05.800 --> 00:03:16.200
Now that is problematic because here we had like a list of all the different probabilities.
00:03:16.200 --> 00:03:19.000
Can you list all these outcomes?
00:03:19.000 --> 00:03:22.400
It is impossible, they are infinite.
00:03:22.400 --> 00:03:27.800
By definition they are not listable; we cannot put them all in a table.
00:03:27.800 --> 00:03:36.600
That is an issue with sampling distribution but sampling distributions get their power from other sources.
00:03:36.600 --> 00:03:44.400
We do not have to worry as much about that, but I do want to note that this is a big difference between probability distributions and sampling distributions.
00:03:44.400 --> 00:03:50.900
But still we are going to be trying to find things like: how do we find the expected value of these distributions?
00:03:50.900 --> 00:03:53.500
How do we find the probability of some outcome?
00:03:53.500 --> 00:04:01.300
The same overall logic is going to apply for now.
00:04:01.300 --> 00:04:09.300
Let me just go over the logic of the original probability distributions.
00:04:09.300 --> 00:04:26.300
Basically, we use probability starting from a known population; we had known populations like a fair coin, a roll of two fair dice, something like that.
00:04:26.300 --> 00:04:31.300
And from that we generate a probability distribution.
00:04:31.300 --> 00:04:43.900
A whole bunch of different values for this random variable x, and then the probabilities of those x values.
00:04:43.900 --> 00:05:04.000
Okay, once we have that, then we have a sample, and it is actually from an unknown population.
00:05:04.000 --> 00:05:12.400
We get the sample but is it from that kind of population or from that population?
00:05:12.400 --> 00:05:14.000
We do not know.
00:05:14.000 --> 00:05:24.800
What we do is take a sample, compare it to the probability distribution, and look at whether the sample is very likely or very unlikely.
00:05:24.800 --> 00:05:33.200
From that we judge whether the known and unknown populations are similar to each other or not.
00:05:33.200 --> 00:05:37.400
Is it likely, unlikely?
00:05:37.400 --> 00:05:41.800
That is roughly the idea.
00:05:41.800 --> 00:05:51.700
We went over a couple of specific examples, one of the ones that we went over in great detail
00:05:51.700 --> 00:06:00.200
was the one with 2 dice where the random variable is the sum of the two dice.
00:06:00.200 --> 00:06:10.400
The logic of sampling distribution is roughly similar, so we have some known population.
00:06:10.400 --> 00:06:18.000
We generate a sampling distribution this time instead of a probability distribution; we will talk about how to generate those later.
00:06:18.000 --> 00:06:25.800
We get samples from unknown population and we compare it and we say is it likely or unlikely?
00:06:25.800 --> 00:06:28.000
Same underlying logic.
00:06:28.000 --> 00:06:34.800
The differences are going to be in the steps: how we generate it is going to be different.
00:06:34.800 --> 00:06:46.700
How we judge whether it is likely or unlikely is also going to be a little bit different in terms of the nitty-gritty of how we actually do it,
00:06:46.700 --> 00:06:48.600
but the concept is the same.
00:06:48.600 --> 00:06:55.800
We had this known stuff, we have this unknown stuff, and we compare the unknown stuff to the known stuff.
00:06:55.800 --> 00:07:01.100
Let us go over that example that we knew really well.
00:07:01.100 --> 00:07:20.900
The known population here was 2 fair dice, and from that we generated a probability distribution.
00:07:20.900 --> 00:07:34.300
Here we have the probability distribution, where the random variable x is the sum of the 2 dice.
00:07:34.300 --> 00:07:39.300
We also generated all these probabilities for each sum.
00:07:39.300 --> 00:07:44.900
We have the sum of the two dice, we have all the probabilities for each of those sums and each of these sums is discrete and countable.
00:07:44.900 --> 00:07:54.300
There are 11 of them (2 through 12), right.
00:07:54.300 --> 00:08:10.900
What we would do is, let us say we rolled 2 dice and we do not know if they are fair dice or not; some shady guy gave them to us.
00:08:10.900 --> 00:08:29.700
If we roll something such as 1-1, that is what our sample is, and then we can say: okay, what is the probability that x = 2?
00:08:29.700 --> 00:08:35.500
That is a pretty small probability; it is about .028 (1/36).
00:08:35.500 --> 00:08:42.700
Because it is a small probability, we will say this is an unlikely sample.
00:08:42.700 --> 00:08:53.100
Let us say we got a sample that was something like 3-4.
00:08:53.100 --> 00:09:11.400
We might compare that to the probability where x = 7, and that is pretty likely; there is about a 16% chance (6/36).
00:09:11.400 --> 00:09:16.200
We would say this is likely.
00:09:16.200 --> 00:09:25.000
If we got this sample, we would probably say it is likely that it came from fair dice.
00:09:25.000 --> 00:09:28.600
If we got the other sample, we might say it is less likely.
00:09:28.600 --> 00:09:40.400
It is not that we stop here and say these dice are unfair, because there is still a chance that you could get this; one is just less likely and the other more likely.
00:09:40.400 --> 00:09:43.200
We are judging them relative to each other.
00:09:43.200 --> 00:09:52.600
We are going to do something similar but some of these steps will differ, namely this one and this one.
00:09:52.600 --> 00:10:01.200
How do we get a sampling distribution?
00:10:01.200 --> 00:10:03.600
We know how to get a probability distribution.
00:10:03.600 --> 00:10:10.800
Probability distributions are really straightforward because we use those fundamental rules of probability.
00:10:10.800 --> 00:10:17.200
Sampling distributions are different because we cannot use probability rules necessarily.
00:10:17.200 --> 00:10:29.500
Before, what we did in probability distributions is we used things like the law of large numbers, and we could sample many times.
00:10:29.500 --> 00:10:40.100
This is the case where we do not use the rules of probability.
00:10:40.100 --> 00:10:41.700
We do not use those regularities.
00:10:41.700 --> 00:10:43.900
We just use the law of large numbers.
00:10:43.900 --> 00:10:50.300
We sample many, many times and then we generate the probability distribution that way.
00:10:50.300 --> 00:10:51.000
We could do that.
00:10:51.000 --> 00:10:59.800
You could flip a coin hundreds and hundreds of times or you can use the probability principles to come up with a probability distribution.
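For instance, the law-of-large-numbers approach can be sketched in a few lines of code. This is a hypothetical illustration, not part of the lecture: it simulates many rolls of two fair dice and tallies the relative frequency of each sum, which approximates the probability distribution.

```python
import random
from collections import Counter

def simulate_dice_sum_distribution(num_rolls=100_000, seed=1):
    """Approximate the probability distribution of the sum of two
    fair dice by rolling many times (law of large numbers)."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6)
                     for _ in range(num_rolls))
    return {total: counts[total] / num_rolls for total in range(2, 13)}

dist = simulate_dice_sum_distribution()
# dist[7] should land near 6/36 ≈ 0.167 and dist[2] near 1/36 ≈ 0.028
```

With enough rolls the simulated frequencies settle close to the exact probabilities you would get from the rules of probability; that is why the two methods agree.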
00:10:59.800 --> 00:11:07.800
This one will take a lot longer than this one, but you have to know more stuff in order to implement this.
00:11:07.800 --> 00:11:12.600
But these are also shortcuts for the next one.
00:11:12.600 --> 00:11:18.400
In sampling distributions you also have two different methods to come up with the sampling distribution.
00:11:18.400 --> 00:11:24.200
One of these actually applies here and that is this one.
00:11:24.200 --> 00:11:41.000
We can use law of large numbers and sample many times.
00:11:41.000 --> 00:11:45.600
So that is one way we can do it.
00:11:45.600 --> 00:11:48.800
Unfortunately we cannot use this one.
00:11:48.800 --> 00:12:00.400
There is no way we can use that one, but we can use something that we are going to learn about later called the central limit theorem.
00:12:00.400 --> 00:12:09.400
But for today, we are going to focus in on this one using the law of large numbers to sample many many times.
00:12:09.400 --> 00:12:19.100
Before we go on to the nitty-gritty of actually generating the sampling distribution, here is how sampling distributions connect
00:12:19.100 --> 00:12:27.300
with the concepts of the sampling, unbiased sampling and research methods.
00:12:27.300 --> 00:12:30.900
Remember experimental methodology versus those other methodologies.
00:12:30.900 --> 00:12:39.300
The promise before was this: if you use random sampling, and by random sampling I mean unbiased sampling,
00:12:39.300 --> 00:12:55.700
and if you use experimental design, the promise is that you can draw a conclusion about causation.
00:12:55.700 --> 00:13:12.200
In order to do this mathematically, the sampling distribution is sort of the engine that allows
00:13:12.200 --> 00:13:16.600
this promise to come true because here is what the sampling distribution does.
00:13:16.600 --> 00:13:28.400
Imagine repeating your unbiased sampling and great experimental methodology over and over and over again.
00:13:28.400 --> 00:13:32.000
What kind of distribution would you see?
00:13:32.000 --> 00:13:44.400
Let us say you have this great experiment that you really wanted to do, to see if x changes y.
00:13:44.400 --> 00:13:50.000
What would happen if we did that experiment over and over and over again?
00:13:50.000 --> 00:13:53.200
Over time, you would see the truth.
00:13:53.200 --> 00:13:57.800
You would see what might really emerge from that experiment.
00:13:57.800 --> 00:14:07.700
And that is how sampling distributions are going to help us reap the fruits of this promise.
00:14:07.700 --> 00:14:12.900
Without sampling distributions this promise cannot fully come true.
00:14:12.900 --> 00:14:27.100
We are going to talk about actually simulating, generating a sampling distribution.
00:14:27.100 --> 00:14:43.700
This is the idea of how we go from that known population and generate, create, a sampling distribution.
00:14:43.700 --> 00:14:47.700
One way you could do it is to use what is called simulation.
00:14:47.700 --> 00:15:01.700
You can use computer programs typically to literally take random samples over and over and over again and create the sampling distribution.
00:15:01.700 --> 00:15:08.300
An example of a data set that you might do this with is something like this.
00:15:08.300 --> 00:15:20.900
Here we have an experimental design; we want to know whether hamsters who get a normal amount of sleep or hamsters who get less sleep will experience more stress.
00:15:20.900 --> 00:15:28.700
Maybe we have the hypothesis that having less sleep leads to greater stress.
00:15:28.700 --> 00:15:35.600
We are going to look at the independent variable of sleep: regular sleep or less sleep.
00:15:35.600 --> 00:15:43.400
We will wake the hamsters up and the dependent variable that we might look at is their stress hormone levels.
00:15:43.400 --> 00:15:58.800
Let us say we tested these hamsters: here are 10 hamsters in our lab, and 7 of them were in the regular sleep group
00:15:58.800 --> 00:16:05.200
and 3 of them were randomly chosen for the less sleep group.
00:16:05.200 --> 00:16:10.000
These are the resulting levels of stress hormone, and we want to know: are these less-sleep hamsters,
00:16:10.000 --> 00:16:24.600
these insomniac hamsters, more stressed than you would expect by random chance?
00:16:24.600 --> 00:16:26.800
That is what we expect.
00:16:26.800 --> 00:16:40.800
The known population that we can think about is something like this: these are randomly selected stress hormone levels.
00:16:40.800 --> 00:16:56.000
Randomly selected hamsters: it is not that these 3 are more stressed; it is just that by chance you might get these numbers together.
00:16:56.000 --> 00:17:10.800
We might randomly select, through a computer program, 3 out of the entire set of hamsters, and generate the sampling distribution.
00:17:10.800 --> 00:17:17.000
And maybe one thing we might want to do is get the mean of 3 hamsters at a time.
00:17:17.000 --> 00:17:21.400
Pick 3 random hamsters, get their mean, and put it in my sampling distribution.
00:17:21.400 --> 00:17:27.400
Get 3 random hamsters, get the mean and put it in my sampling distribution over and over again until
00:17:27.400 --> 00:17:39.200
we get a whole bunch of means like a dot plot of a whole bunch of means.
00:17:39.200 --> 00:17:44.000
The means might range from 25 all the way to 64.
00:17:44.000 --> 00:17:50.800
The mean cannot be greater than 64 or less than 25.
00:17:50.800 --> 00:17:53.400
It has to be somewhere between.
00:17:53.400 --> 00:18:37.800
We have these less-sleep hamsters, sleep-deprived hamsters, and so maybe we will calculate their mean: (55 + 55 + 64) / 3 = 58.
00:18:37.800 --> 00:18:49.300
Here we have x bar = 58, and we want to know: is this likely or unlikely given the sampling distribution?
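The hamster simulation described here can be sketched in code. The three less-sleep values (55, 55, 64) come from the lecture; the seven regular-sleep values below are made up for illustration, since the full table is not reproduced in this transcript.

```python
import random
import statistics

# The three less-sleep values (55, 55, 64) come from the lecture;
# the other seven stress-hormone levels are hypothetical stand-ins.
hormone_levels = [25, 30, 33, 38, 42, 47, 50, 55, 55, 64]

def simulate_sampling_distribution(population, n=3, reps=10_000, seed=1):
    """Repeatedly draw n values at random (without replacement within
    a single draw) and record each sample mean."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, n)) for _ in range(reps)]

means = simulate_sampling_distribution(hormone_levels)
observed = statistics.mean([55, 55, 64])  # x bar = 58
# fraction of random draws whose mean is at least as large as 58
prop_at_least = sum(m >= observed for m in means) / len(means)
```

If prop_at_least comes out tiny, a mean of 58 is unlikely under purely random selection, which is exactly the likely-or-unlikely judgment the lecture describes.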
00:18:49.300 --> 00:18:57.700
We want to ask likely or unlikely?
00:18:57.700 --> 00:19:08.900
If it is likely, then we cannot separate whether this was due to less sleep or due to chance, but if it is unlikely, we might say we do not think it was randomly selected.
00:19:08.900 --> 00:19:16.400
It is not that you randomly selected these hamsters.
00:19:16.400 --> 00:19:21.800
It is unlikely that you randomly selected these hamsters.
00:19:21.800 --> 00:19:28.400
It is more likely that something special has been done to them, and we know what that special thing is: they have been deprived of sleep.
00:19:28.400 --> 00:19:31.500
In that way you can make some conclusions.
00:19:31.500 --> 00:19:41.500
You cannot necessarily say for certain whether sleep deprivation causes hormone levels to rise.
00:19:41.500 --> 00:19:47.300
It is not that you looked at the mechanism and saw the sleep causing hormones.
00:19:47.300 --> 00:19:57.100
It is not necessarily that but you can say whether this is the likely sample given random processes or you can say it is unlikely given random processes.
00:19:57.100 --> 00:20:03.700
That is really all we can know from this kind of logic.
00:20:03.700 --> 00:20:06.900
This kind of logic will take us far.
00:20:06.900 --> 00:20:20.300
Here we go, here are all our hamsters and here is our known sample of less sleep hamsters.
00:20:20.300 --> 00:20:32.900
It is those guys but you know maybe it is just that they are randomly picked.
00:20:32.900 --> 00:20:42.700
In order to generate this sampling distribution of the mean, here is what we did.
00:20:42.700 --> 00:20:49.300
I am just summarizing in steps what we talked about before.
00:20:49.300 --> 00:21:05.100
First what we did was take a random sample of 3 hamsters, and here we will call that n = 3, because n is the size of the sample.
00:21:05.100 --> 00:21:21.200
Then what we did was compute a summary statistic; we computed the mean, but we could have computed the standard deviation.
00:21:21.200 --> 00:21:31.400
We could have computed the median, some summary statistic.
00:21:31.400 --> 00:21:43.200
Number 3 is the important step, repeat steps 1 and 2.
00:21:43.200 --> 00:22:03.600
You do this over and over again; that is what it means to simulate a sampling distribution. And the 4th step is to examine and plot the resulting sample statistics.
00:22:03.600 --> 00:22:19.400
Here these are all means: that is a mean, that is a mean, that is a mean; each is the mean of 3 that have been selected.
00:22:19.400 --> 00:22:29.200
If we repeat these steps over and over again and we plot it, then we will see the distribution of sample statistics.
00:22:29.200 --> 00:22:37.200
That is why it is called a sampling distribution.
00:22:37.200 --> 00:22:47.800
Once we have a sampling distribution, then you can see it has the expected value.
00:22:47.800 --> 00:22:49.600
We can find the mean of this thing.
00:22:49.600 --> 00:22:51.200
It is like a middle mean.
00:22:51.200 --> 00:22:52.800
It is a mean of means.
00:22:52.800 --> 00:22:56.200
We can find the standard deviation of this thing.
00:22:56.200 --> 00:23:00.800
That is what we mean by expected values.
00:23:00.800 --> 00:23:12.600
Okay, so now let us compare these two things.
00:23:12.600 --> 00:23:20.600
We know the logic of probability distributions, now let us apply the same logic to sampling distribution.
00:23:20.600 --> 00:23:28.500
Here we go: known population, 2 fair dice; here is the probability distribution, the same one as before, just smaller.
00:23:28.500 --> 00:23:38.500
Here is a likely sample, here is an unlikely sample, and we know it is likely by looking at this probability distribution.
00:23:38.500 --> 00:23:42.900
We know this is unlikely by looking it up in this probability distribution.
00:23:42.900 --> 00:23:44.900
Pretty straightforward.
00:23:44.900 --> 00:23:49.500
What about in terms of the hamster stuff?
00:23:49.500 --> 00:24:03.900
The known population is all the hamsters in our study. Not only that, but we have a mechanism: just as here we think it is 2 fair dice,
00:24:03.900 --> 00:24:17.100
here we think it is all hamsters, with 3 chosen randomly, and we can simulate that process.
00:24:17.100 --> 00:24:24.900
We could choose 3 randomly and find a sampling distribution and these are all means now.
00:24:24.900 --> 00:24:34.500
Not only that but we could find a likely sample, which might be something like 47.
00:24:34.500 --> 00:24:46.500
If we found a mean of 3 hamsters, so our sleep-deprived hamsters had a mean of 47
00:24:46.500 --> 00:24:53.300
in their stress hormone levels we might say this is very similar to chance.
00:24:53.300 --> 00:25:00.100
It seems like they are not that different from just picking hamsters at random
00:25:00.100 --> 00:25:09.900
but we might have an unlikely sample, and here in the example we did, we had 58, and 58 is over here.
00:25:09.900 --> 00:25:21.900
And that might show us that it is really unlikely that we would choose 3 hamsters at random and get such a high mean.
00:25:21.900 --> 00:25:31.300
And so in that sense, we can start thinking this is likely to have come from random generation.
00:25:31.300 --> 00:25:34.900
This is less likely to have come from random generation.
00:25:34.900 --> 00:25:51.300
We talked about a very specific method of simulating sampling distributions: take 3 hamsters at a time, compute their means.
00:25:51.300 --> 00:25:55.900
I am just going to state the same 4 steps, but in more general terms.
00:25:55.900 --> 00:26:04.700
Take a random sample of size n from the population, whatever your population is, whatever your n size is, your sample size.
00:26:04.700 --> 00:26:20.500
Then you compute a summary statistic, and this could be the mean, the median, the mode, the variance; it could be a whole bunch of different things.
00:26:20.500 --> 00:26:26.200
All the summary statistics that we have talked about earlier and it could be any one of those things.
00:26:26.200 --> 00:26:34.100
Then you repeat 1 and 2 many times, and that is the simulation part, where we pretend to do this many, many times.
00:26:34.100 --> 00:26:39.300
And that is why it is really helpful that we have computer programs that can help us do this many,
00:26:39.300 --> 00:26:43.300
many times then we do not have to actually draw beads or patterns.
00:26:43.300 --> 00:26:49.500
Finally we want to display and examine the distribution of sample statistics.
00:26:49.500 --> 00:27:04.300
We will have a whole bunch of means, or a whole bunch of variances, or a whole bunch of standard deviations, or interquartile ranges, whatever you want.
00:27:04.300 --> 00:27:10.500
In that way, this is the general method of simulating the sample statistics.
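The 4 general steps above can be written as one small function. This is a sketch, not part of the lecture: the population here is a hypothetical stand-in, and any summary statistic can be plugged in.

```python
import random
import statistics

def sampling_distribution(population, n, statistic, reps=10_000, seed=1):
    """Steps 1-3: draw a random sample of size n, compute a summary
    statistic, and repeat many times. Step 4 (display and examine)
    is done on the returned list of statistics."""
    rng = random.Random(seed)
    return [statistic(rng.sample(population, n)) for _ in range(reps)]

pop = list(range(1, 101))  # a hypothetical population
# any summary statistic works: statistics.mean, median, pstdev, ...
medians = sampling_distribution(pop, n=5, statistic=statistics.median)
```

Swapping `statistics.median` for `statistics.mean` or `statistics.pstdev` gives the sampling distribution of that statistic instead, which is the generality the 4 steps promise.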
00:27:10.500 --> 00:27:24.700
Once you have that simulation, this part is going from the known population, and usually the known population is the random process,
00:27:24.700 --> 00:27:32.700
the default, because it is hard to generate nonrandom processes but really easy to generate random ones.
00:27:32.700 --> 00:27:38.500
And then we get our sampling distribution.
00:27:38.500 --> 00:27:44.100
This part is this area here simulating.
00:27:44.100 --> 00:27:54.600
That is that part.
00:27:54.600 --> 00:28:04.000
Now there is another part and that is the part where we now compare 2 samples.
00:28:04.000 --> 00:28:20.800
Rather, we compare the sample to the sampling distribution, and we decide: likely or unlikely?
00:28:20.800 --> 00:28:26.200
That is another part that we haven't talked about here.
00:28:26.200 --> 00:28:32.600
In order to do this, what you need to do is compute the summary statistics for your sample.
00:28:32.600 --> 00:28:41.600
If you have a whole bunch of variances, you compute the variance for your sample and then you compare it to your sampling distribution.
00:28:41.600 --> 00:28:44.800
And we make a call: likely or unlikely.
00:28:44.800 --> 00:28:52.100
Even though we know all this stuff, there are still some nagging questions that remain.
00:28:52.100 --> 00:29:01.100
In these cases we know what the population looks like, like we have a group of hamsters
00:29:01.100 --> 00:29:07.300
and we know what it looks like and we are generating these random processes.
00:29:07.300 --> 00:29:12.900
What happens when we have no idea what the population looks like?
00:29:12.900 --> 00:29:14.900
We do not have a nice list of 10 hamsters.
00:29:14.900 --> 00:29:20.900
What if we want to know something like: what if you randomly draw from all high school students in the US?
00:29:20.900 --> 00:29:23.600
We do not have all those numbers.
00:29:23.600 --> 00:29:37.500
In order to do the sampling process you would have to have a giant list of the GPAs of all high school students in the US and pull randomly from them.
00:29:37.500 --> 00:29:39.400
What if you do not have that population?
00:29:39.400 --> 00:29:53.400
Also, what do sampling distributions for summary statistics other than the mean look like? We have only looked at the one for the mean.
00:29:53.400 --> 00:30:02.600
We have a sort of a guess at an answer, but we want to know: which summary statistic should we use? Can we just pick one?
00:30:02.600 --> 00:30:11.600
The 3rd unanswered question is how to know whether a sample is sufficiently unlikely?
00:30:11.600 --> 00:30:21.200
So far we have been just eyeballing it: we look at it and say that seems unlikely. 2%? That seems unlikely. 5%? That seems unlikely.
00:30:21.200 --> 00:30:24.600
10%? That seems more likely.
00:30:24.600 --> 00:30:33.200
It seems like we are just making a judgment call but how do we know whether it is truly unlikely or just our opinion?
00:30:33.200 --> 00:30:46.000
Do we always have to simulate a large number of samples in order to get a sampling distribution? Because that seems like it can be really hard to do.
00:30:46.000 --> 00:30:59.200
These are the questions that remain, but we just went through the intro so these will be answered later on.
00:30:59.200 --> 00:31:11.800
Let us get into example 1: consider the sampling distribution of the mean, so the summary statistic is the mean, just like the hamster example.
00:31:11.800 --> 00:31:27.200
Sampling distribution of the mean of a random sample of size n taken from a population of size N with a mean of μ and standard deviation σ.
00:31:27.200 --> 00:31:34.600
So what it is saying is: just use these as constants; pretend they have been given to you.
00:31:34.600 --> 00:31:42.800
If N=n what are the mean and standard error of the sampling distribution?
00:31:42.800 --> 00:31:50.600
Here I should show you that there is actually a new little notation that you should know.
00:31:50.600 --> 00:31:55.000
The mean of the sampling distribution of the mean looks like this.
00:31:55.000 --> 00:32:03.400
It is a μ because it is an expected value, and expected values belong to theoretical populations.
00:32:03.400 --> 00:32:10.400
It is an expected value of means, so here we would put a little x bar as the subscript.
00:32:10.400 --> 00:32:18.700
It is a bunch of little sample means and same thing with standard error.
00:32:18.700 --> 00:32:23.200
This should also say standard deviation.
00:32:23.200 --> 00:32:31.000
There is a special name for the standard deviation of the sampling distribution, because that phrase tends to be long and we use the concept over and over again.
00:32:31.000 --> 00:32:39.000
We just call it standard error but it is really just the standard deviation of the sampling distribution.
00:32:39.000 --> 00:32:46.400
Here it is for the sampling distribution of means, so we call it sigma sub x bar.
00:32:46.400 --> 00:33:01.600
If our entire population is size N and we take samples of size n, where basically N = n,
00:33:01.600 --> 00:33:04.000
What would be the mean and standard error?
00:33:04.000 --> 00:33:14.800
Well, the mean of my sampling distribution should be the same exactly as the mean
00:33:14.800 --> 00:33:30.600
of my population because basically we are sampling the entire population.
00:33:30.600 --> 00:33:37.400
Think about the standard deviation remember we are calculating standard deviation.
00:33:37.400 --> 00:33:41.000
We are getting the average deviations.
00:33:41.000 --> 00:33:56.600
When we get the average deviation, what ends up happening is: when you take the mean of the entire population over and over again, that mean is going to be the same every single time.
00:33:56.600 --> 00:34:07.100
Before, your population had whatever standard deviation; now it is going to be super tiny, because you are going to have no spread at all.
00:34:07.100 --> 00:34:13.200
You are just going to get the same mean every single time.
00:34:13.200 --> 00:34:24.800
This is going to be super small, maybe zero, because there is no spread around the mean.
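A quick check of this reasoning, on a made-up population: when n = N (sampling without replacement), every sample is the whole population, so every sample mean equals μ and the standard error is zero; with a small n, individual means vary, but their overall average still lands near μ.

```python
import random
import statistics

rng0 = random.Random(0)
population = [rng0.gauss(30, 5) for _ in range(200)]  # hypothetical population
mu = statistics.mean(population)

def sample_means(pop, n, reps=2000, seed=1):
    """Draw samples of size n without replacement; return their means."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(pop, n)) for _ in range(reps)]

# n = N: each sample is the entire population, so every mean equals mu
full = sample_means(population, n=len(population), reps=100)
# small n: single sample means vary, but their average stays close to mu
small = sample_means(population, n=5)
```

The `full` list has zero spread (every entry is μ), while `small` is spread out, matching the two cases discussed here and below.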
00:34:24.800 --> 00:34:36.200
What if n is very small and N is very large so our population is enormous?
00:34:36.200 --> 00:34:42.800
Our sample is small.
00:34:42.800 --> 00:34:46.800
What about in that case?
00:34:46.800 --> 00:34:51.800
What will our mean of the sampling distribution look like?
00:34:51.800 --> 00:35:00.800
All we know is that we do not know for sure that it is going to be the same.
00:35:00.800 --> 00:35:02.600
That is all we can say.
00:35:02.600 --> 00:35:13.000
It might be close because if you take a whole bunch of these means it might be close but maybe we are going to be less sure.
00:35:13.000 --> 00:35:30.400
What about if the N is very large and n is very small?
00:35:30.400 --> 00:35:34.200
What would the spread of our sampling distribution looks like?
00:35:34.200 --> 00:35:42.400
In this case it might be smaller, but it would not necessarily be super small.
00:35:42.400 --> 00:35:55.800
Maybe smaller but not necessarily super small.
00:35:55.800 --> 00:36:00.800
What about μ sub x bar?
00:36:00.800 --> 00:36:03.600
Here we are not sure.
00:36:03.600 --> 00:36:11.400
Is it going to be similar to the mean of the population?
00:36:11.400 --> 00:36:13.000
Is it going to be different?
00:36:13.000 --> 00:36:15.800
Let us think about taking one out.
00:36:15.800 --> 00:36:29.600
Any one single mean might be very different from the population mean, but think about taking a whole bunch of those means and getting the mean of that.
00:36:29.600 --> 00:36:32.400
When you get the mean, you sort of get the middle.
00:36:32.400 --> 00:36:35.500
Here we have this giant population.
00:36:35.500 --> 00:36:35.900
We take all these samples out.
00:36:35.900 --> 00:36:40.400
We have all these samples and then we take the middle of that.
00:36:40.400 --> 00:36:44.800
It should be the middle of the population.
00:36:44.800 --> 00:36:51.400
Maybe we will take an educated guess and say that might be equal to μ too.
00:36:51.400 --> 00:37:06.100
We are not saying that any single one of them is going to be equal to the μ but the average of the whole bunch of those means
00:37:06.100 --> 00:37:10.300
may be equal to μ, because we are always moving toward the center.
00:37:10.300 --> 00:37:18.800
Example 2, what might be the best way to describe sampling distributions?
00:37:18.800 --> 00:37:26.100
Let us think about how to describe regular old distributions.
00:37:26.100 --> 00:37:31.700
Remember shape, center, and spread?
00:37:31.700 --> 00:37:42.300
Maybe that will apply to sampling distributions, shape, center, and spread.
00:37:42.300 --> 00:38:00.400
We could actually find the shape of it, the center, for example μ sub x bar or if you use something else like sampling distribution of standard deviations.
00:38:00.400 --> 00:38:15.800
It might be μ sub sigma; also spread, the standard deviation of x bar, sigma sub x bar.
00:38:15.800 --> 00:38:20.600
Maybe that is the way we could describe sampling distributions as well.
00:38:20.600 --> 00:38:31.400
Example 3, 3 very small populations are given each with a μ of 30.
00:38:31.400 --> 00:38:41.800
These are truly small populations: there are only 2 items in this one, only 5 in this one, and only 3 items in this one.
00:38:41.800 --> 00:38:48.200
Match these to the corresponding sampling distributions of the sample mean with n = 2.
00:38:48.200 --> 00:38:58.400
Assume replacement: if you draw items out, you put them back after each draw.
00:38:58.400 --> 00:39:01.000
Let us look at this.
00:39:01.000 --> 00:39:08.800
This one seems to go from 10 all the way to 50, so you could have a mean of 10.
00:39:08.800 --> 00:39:12.400
You could also have a mean of 50.
00:39:12.400 --> 00:39:14.600
That is possible here.
00:39:14.600 --> 00:39:24.300
Think about it: because every time you draw a 10 you will replace it, you could draw a 10 twice.
00:39:24.300 --> 00:39:27.900
You could get a mean of 10.
00:39:27.900 --> 00:39:34.600
Here you could also get a mean of 10 and here you cannot get a mean of 10.
00:39:34.600 --> 00:39:36.900
There is no mean of 10 here.
00:39:36.900 --> 00:39:43.900
Population A cannot possibly go with distribution C, but it could possibly go with A or B.
00:39:43.900 --> 00:39:50.500
I should probably call the distributions 1, 2, and 3, just so that we do not get confused.
00:39:50.500 --> 00:40:01.300
Can A give a mean of 15 or 20?
00:40:01.300 --> 00:40:13.700
Can you put 10 and 50 together in any way and divide by 2 to get 15 or 20, or even 25 or 35?
00:40:13.700 --> 00:40:23.500
No, there is no possible way 10 and 50 can be combined and divided by 2 to give you 15.
00:40:23.500 --> 00:40:37.300
But here we can have a mean of 10 or 50, and we can have a mean of 30, which is (10 + 50) ÷ 2 = 30.
00:40:37.300 --> 00:40:44.900
Those are the only 3 means you could have, so I would say A goes with that one.
00:40:44.900 --> 00:40:56.600
I would say B goes with this one, because in B you could have a mean of 10 or 50, and you could have all of these means in between.
00:40:56.600 --> 00:41:01.400
If you got 10 and 20 and average them together that would be 15.
00:41:01.400 --> 00:41:13.800
This one has those possibilities, these different possibilities that this one does not have.
00:41:13.800 --> 00:41:17.800
Let us move onto this one.
00:41:17.800 --> 00:41:27.300
Here we know that you cannot have a mean of 10 or 50, but you can have a mean of 20, 25, or 30, and because of that we know this one goes with this one.
00:41:27.300 --> 00:41:34.100
We have also used the other ones.
00:41:34.100 --> 00:41:46.100
Here we see that the values you have in your population limit the kinds of means that you will see in your sampling distribution of the mean.
00:41:46.100 --> 00:41:51.300
Here it is the sampling distribution of the sample mean, but it is the same idea.
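Because sampling is with replacement and n = 2, we can enumerate every possible sample and list exactly which means each population can produce. A short sketch; population C's values (20, 30, and 40) are my reading of the transcript, chosen so each population has μ = 30:

```python
from itertools import product

# The three small populations from the example; C's values (20, 30, 40)
# are an assumption reconstructed from the transcript.
populations = {
    "A": [10, 50],
    "B": [10, 20, 30, 40, 50],
    "C": [20, 30, 40],
}

# With replacement and n = 2, every ordered pair of values is a possible
# sample, so the possible sample means are just the pairwise averages.
possible_means = {}
for label, pop in populations.items():
    possible_means[label] = sorted({(x + y) / 2 for x, y in product(pop, repeat=2)})
    print(label, possible_means[label])
```

A can only produce means of 10, 30, or 50; B can produce every multiple of 5 from 10 to 50; C can only produce 20 through 40, which is why each population matches a different sampling distribution.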
00:41:51.300 --> 00:41:57.500
Example 4: here are very small populations again, A, B, and C.
00:41:57.500 --> 00:42:02.500
Estimate the sampling distributions' means and compare them to the population means.
00:42:02.500 --> 00:42:07.900
Which standard deviation is smaller, the population's or the corresponding sampling distribution's?
00:42:07.900 --> 00:42:14.500
I am just going to renumber these again and let us estimate these means.
00:42:14.500 --> 00:42:38.000
Here the mean looks like 30 and here the mean might be a little bit greater than 30.
00:42:38.000 --> 00:42:47.200
Here we know the mean because it is 30, 30, and 30.
00:42:47.200 --> 00:43:13.400
What we see is that even though n is small, because it is only 2, the sampling distribution of the mean
00:43:13.400 --> 00:43:21.000
has an expected value very similar to the actual population mean.
00:43:21.000 --> 00:43:28.500
Estimate them, compare them, very similar.
00:43:28.500 --> 00:43:39.100
Which standard deviation is smaller, the population standard deviation or the corresponding sampling distribution?
00:43:39.100 --> 00:43:48.900
It might be helpful if we find out what the population standard deviations look like.
00:43:48.900 --> 00:43:55.300
Here we have something like A, B, and C.
00:43:55.300 --> 00:44:08.300
A is 10 and 50; B is 10, 20, 30, 40, and 50; and C is just 20, 30, and 40.
00:44:08.300 --> 00:44:17.900
Let us find the standard deviations of these populations.
00:44:17.900 --> 00:44:28.300
The reason we use the population standard deviation is that we want to divide by n rather than n - 1.
00:44:28.300 --> 00:44:32.300
We could just put it all in blue.
00:44:32.300 --> 00:44:49.500
I want to test them to make sure I can use these blank cells, so that I can just copy and paste the process.
00:44:49.500 --> 00:44:53.500
The one that has the greatest spread is A.
00:44:53.500 --> 00:45:05.700
This middle population has the middle spread, and this one has the least spread.
00:45:05.700 --> 00:45:35.200
Just to give you an idea, this population's standard deviation is 20; but if you estimate this one, is its standard deviation less than 20 or greater?
00:45:35.200 --> 00:45:56.700
If you think about a standard deviation of 20, another 20 would be that, and usually within 3 standard deviations you have almost 99% of the data.
00:45:56.700 --> 00:46:06.500
Here what we see is that within just 1 standard deviation you already have almost everybody in there.
00:46:06.500 --> 00:46:11.100
I would say this standard deviation is smaller.
00:46:11.100 --> 00:46:22.700
Here μ sub x bar is the same, but sigma sub x bar is smaller than sigma.
00:46:22.700 --> 00:46:25.700
What about here?
00:46:25.700 --> 00:46:48.100
Here the sigma would be something like 14, and if I go out about 14, that would be like that.
00:46:48.100 --> 00:47:00.900
14 would be that: 1, 2, 3.
00:47:00.900 --> 00:47:07.500
Even when I go out only about 1 standard deviation, I basically cover the entire space.
00:47:07.500 --> 00:47:23.100
Here, although μ sub x bar is similar, our standard error is smaller than the standard deviation of the population.
00:47:23.100 --> 00:47:33.300
Let us use that same logic for the last one, which has a standard deviation of approximately 8; here let us go out about 8.
00:47:33.300 --> 00:47:38.500
8 would be like that.
00:47:38.500 --> 00:47:57.500
Again, we see that although the μ is similar, our standard deviation of the sampling distribution, or standard error, is less than sigma.
00:47:57.500 --> 00:48:05.300
One thing we find is that typically the population standard deviation is larger than that of the corresponding sampling distribution, or what we call the standard error.
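For population A this can be checked exactly rather than by eye. A sketch, using the A = {10, 50} population from the example: enumerating all samples of size 2 drawn with replacement gives a sampling distribution whose standard deviation works out to σ/√n, smaller than σ:

```python
from itertools import product
from math import sqrt

def pop_sd(values):
    """Population standard deviation (divide by n, not n - 1)."""
    m = sum(values) / len(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

# Population A from the example: mean 30, sigma = 20.
population = [10, 50]
sigma = pop_sd(population)

# Enumerate every sample of size 2 drawn with replacement, take each mean,
# and compute the standard deviation of that full sampling distribution.
sample_means = [(x + y) / 2 for x, y in product(population, repeat=2)]
standard_error = pop_sd(sample_means)

print(sigma, standard_error)  # the standard error is the smaller one
```

The four equally likely sample means are 10, 30, 30, and 50, so the standard error comes out to 20/√2 ≈ 14.1, matching the σ/√n relationship.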
00:48:05.300 --> 00:48:15.100
That is your introduction to sampling distributions.
00:48:15.100 --> 00:48:17.000
Thanks for using www.educator.com.