Sampling Distribution of the Mean


  • Intro 0:00
  • Roadmap 0:05
    • Roadmap
  • Special Case of General Method for Simulating a Sampling Distribution 1:53
    • Special Case of General Method for Simulating a Sampling Distribution
    • Computer Simulation
  • Using Simulations to See Principles behind Shape of SDoM 15:50
    • Using Simulations to See Principles behind Shape of SDoM
    • Conditions
  • Using Simulations to See Principles behind Center (Mean) of SDoM 20:15
    • Using Simulations to See Principles behind Center (Mean) of SDoM
    • Conditions: Does n Matter?
    • Conditions: Does Number of Simulations Matter?
  • Using Simulations to See Principles behind Standard Deviation of SDoM 27:13
    • Using Simulations to See Principles behind Standard Deviation of SDoM
    • Conditions: Does n Matter?
    • Conditions: Does Number of Simulations Matter?
  • Central Limit Theorem 37:13
    • SHAPE
    • CENTER
    • SPREAD
  • Comparing Population, Sample, and SDoM 43:10
    • Comparing Population, Sample, and SDoM
  • Answering the 'Questions that Remain' 48:24
    • What Happens When We Don't Know What the Population Looks Like?
    • Can We Have Sampling Distributions for Summary Statistics Other than the Mean?
    • How Do We Know whether a Sample is Sufficiently Unlikely?
    • Do We Always Have to Simulate a Large Number of Samples in Order to get a Sampling Distribution?
  • Example 1: Mean Batting Average 55:25
  • Example 2: Mean Sampling Distribution and Standard Error 59:07
  • Example 3: Sampling Distribution of the Mean 1:01:04

Transcription: Sampling Distribution of the Mean

Hi, and welcome.0000

Today we are going to talk about the sampling distribution of the mean.0002

We have been learning about sampling distributions in general but the sampling distribution 0004

of the mean sometimes called the sampling distribution of the sample mean is a special case of it.0010

It has a lot of really interesting properties that will come in handy over and over again.0019

It is worth looking at in detail.0025

This is just me, it is not like all of statistics, but I am going to call it SDOM for short sampling distribution of the mean.0027

Just so that you will know what I'm talking about without me having to say sampling distribution of the mean every single time.0038

We are going to use simulations online to see some principles 0046

and regularities that arise about the shape, mean, and standard deviation of the SDOM.0055

Basically, the idea here is that shape, center, and spread really summarize the sampling distribution of the mean.0064

We are also going to talk about how the principles of the SDOM that 0072

we have looked at through the simulations have also been proven in the central limit theorem.0076

We are not going to go over the big formal proofs, although you can find those online but I want you to see how these two connect to each other.0083

Finally we are going to compare the population distributions that we have looked at before, 0092

the sample distributions, as well as the sampling distribution of the mean.0097

This is a new kind of distribution.0102

Finally, we are going to recap and see if we have answered some of these questions that remain from last time.0104

Remember this is a special case of the general method for simulating the sampling distribution.0112

Let us go over that and fill in how the SDOM or the sampling distribution of the mean is a special case of regular old sampling distribution. 0123

First up, take a random sample of size n from the population; that step does not change with the SDOM, it is the same for all sampling distributions.0147

That is always the same stuff.0159

The real difference is really in step number 2.0161

Here we are computing the summary statistics.0166

The thing that makes sampling distribution of the mean special is that the particular summary statistic that you compute here is the mean.0170

You will plot that mean.0181

We actually looked at SDOMs before, but we just called them sampling distributions, 0186

and now we are going to look at all the different properties of sampling distributions of the mean.0192

Then you repeat 1 and 2 many, many times, that step does not change.0198

Finally, display and examine the distribution of summary statistic.0202

That does not change either.0206

The only thing that we really have nailed down in the SDOM is this one and it is just that because 0208

it is the sampling distribution of the mean all you do is find the means then you plot those.0214

It is a distribution entirely made up of tiny little means.0219
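
If you want to try the same recipe outside the applet, here is a minimal sketch in Python; the skewed (exponential) parent population, the sample size of 16, and the 10,000 repetitions are just illustrative assumptions, not the applet's own settings.

import numpy as np

rng = np.random.default_rng(0)

def simulate_sdom(draw_sample, n, num_simulations):
    # Steps 1-3, repeated: draw a random sample of size n, compute its mean, collect it.
    return np.array([draw_sample(n).mean() for _ in range(num_simulations)])

# Illustrative parent population: a skewed (exponential) distribution.
means = simulate_sdom(lambda size: rng.exponential(scale=5.0, size=size), n=16, num_simulations=10_000)

# Step 4: display and examine the distribution of the summary statistic (the means).
print("center of the SDOM:", means.mean())
print("spread of the SDOM:", means.std())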

We are going to go on to looking at some simulations on the computer, so if you want to type this into your browser, go ahead and do that.

They actually have a lot of cool statistics simulations; you might want to also explore some of those.0243

This is what it should look like if you go to the web site here and if you hit begin it should start a java applet that looks something like this.0251

Let us just go over what these different things mean.0268

First things first, up on top it says parent population.0273

What this computer simulation shows you is a potential parent population, but if you have a mouse, you can draw anything you want.0293

It does not have to look like this.0305

It can look as different as you want it to look.0308

You can make any parent population.0312

This is a 5 mode distribution and also if you click on this little bar you could see different ones that are already preprogrammed for you. 0314

There is the normal distribution, which looks like that, the uniform distribution, and a skewed distribution.0330

Custom just gives you a blank, and then you could draw whatever you want.0338

Here you could paint any parent distribution you want and this is the distribution from which our sampling distribution will pull out a random sample.0344

And whenever you draw a distribution here, it will show you what the mean is.0359

The mean is shown in blue, 17.58, the median is shown in pink, and the standard deviation, 0365

this shows you 1 standard deviation to the negative side and to the positive side of the mean.0377

It also shows how much skew there is, and the kurtosis as well, something we learned to calculate back when we were talking about the normal distribution.0386

We could also go ahead and let us use normal for now.0396

Here we could see that the skew and kurtosis are perfectly 0.0404

The mean and median are the same value, they are right on top of each other and the standard deviation is 5.0410

5 on each side, and you can sort of see that if you go out 5, 5, and 5, you have gotten about 99% of the distribution.0415

Just like a normal distribution should.0425

Let us go on to the steps outlined in how to simulate a sampling distribution.0427

The first step was to pull out a random sample of size n from the parent population.0436

What we can do is tell this little Java applet what our sample size n should be.0443

We could say give me an n of 16; it is going to pull out 16, and here we could actually ask for a variety of different summary statistics, 0452

but because we are talking about the SDOM, or the sampling distribution of the mean, let us choose mean.0467

If you hit animate it, it will show you what it looks like to pull out one sample of size 16 and find the mean.0473

Here hit animate it, it is pulling out 16 randomly selected data points from the population. 0484

It finds the mean of that which is the little blue notch there and then drops that mean here.0494

We keep track of just the means and so let us do that again. 0502

It is going to pull out 16 data points, find the mean, and then drop it back down.0506

We are keeping track of all those means.0513

Let us do it one more time, pulling out 16, finds the mean, drop that down. 0515

Here our sampling distribution is really small but the nice thing about this little simulation is that it will do 5 different samples for you and just drop down 5 means.0523

Let me show you that.0537

It pulled out 5 samples of 16 and did all that stuff, but without showing you pulling them out; it just dropped down the 5 means.0539

In fact, you could do 1,000 of those, so this time it pulled out 16 data points 1,000 times and put the means down.0550

It could even do 10,000 and so we could keep hitting 10,000.0562

It might seem as though this is not changing.0569

What this shows is frequency: 0572

how frequent a particular mean is.0574

Even though the frequency is going up, the shape is really not changing very much.0576

Let us look at this distribution of means. 0581

It has a skew that is very close to 0 and a kurtosis very close to 0.0584

In fact, this looks very much like a normal distribution. 0590

If you can imagine 3 of these little steps going out that is about 99% of the entire distribution.0594

Here we see that the mean is very similar to the mean here.0602

Here the mean is 16.01 and here the mean is 16.0608

We are already starting to see some of the things we have talked about earlier.0614

We see them in this simulation.0619

That is good.0624

The question is this: maybe it makes sense that your sampling distribution of means would be normal if your parent population is normal.0625

But what if your parent population was not normal?0637

For instance, what if it was uniform then what would our distribution of means look like?0639

Let us pull out 16 and find the mean, and let us do that 10,000 times and another 10,000.0647

Here we have 20,001 different simulated little experiments and we have all these means and what do we see?0661

We see the means are still very similar 16 and 15.98.0671

Not only that but we see that it is approximately normal. 0678

Imagine this little space going out 3 times that is about 99% of this entire distribution. 0682

The skew is very close to 0.0689

The kurtosis is also very close to 0.0692

So far for normal parent populations as well as the uniform parent populations, we see that the sampling distribution of the mean is very close to normal.0694

That fits our notion of what normal distributions are, and we know a lot about normal distributions.0706

What about skew?0712

Would we expect this sampling distribution of means also to be normal, or maybe a little bit skewed?0716

Let us animate 1 just to see; here we pull out 16 and drop down the mean.0727

The mean is sort of close to where the mean is up here, but will it be normal or skewed looking?0735

Let us do it 5 times, another 5 times, another 5 times, let us do it 10,000, another 10,000 and what do we see?0747

Although the skew is a little bit greater than it used to be in previous ones, this really looks more normal than anything else.0760

If we get another 10,000, another 10,000, it does not seem to change very much and it looks pretty normal. 0770

If you take this little space the standard deviation and go out about 3 times that is about 99% of the distribution.0779

Not only that, but we see that this mean, 8.07, is very similar to the mean of the parent population, 8.08.0787

We are seeing that even though the original parent population is not normal, the sampling distribution of the mean, 0797

or the SDOM, tends to look very normal, especially if you have a lot of means.0806

You simulated it a lot.0813

A large number of simulations can be involved. Now let us do a custom one.0815

We can do some crazy ones here.0820

What about something like this?0822

Do we expect this to have a normal sampling distribution of the mean?0825

I am going to animate 1: here are 16 data points, it finds the mean, drops it down, 0830

and you do that 10,000 times and another 10,000, and we have 16.01.0839

16 is a very similar mean, very close to 0 for skew, very close to 0 for kurtosis.0848

This is looking very, very normal.0859

Take this standard deviation, go out 3, about 99%; that is very interesting.0863

You could try a whole bunch of different crazy things and I dare you to try to draw one 0870

that would not give you a normally distributed sampling distribution of the mean.0877

What about this one?0883

It is like a shadow of the parabola or something.0884

What about that?0892

Let us get directly to the 10,000.0893

Let us clear this.0897

Let us get directly to 10,000 and even for this crazy distribution we see that the sampling distribution of the mean looks fairly normal. 0902

The mean is almost always right on top of the mean of the parent population, and another thing 0917

that I want you to notice is that while the standard deviation up here is 11.66, down here the standard deviation is much smaller.0926

There are a couple of things we have already learned from using these simulations.0937

Let us think about using these simulations to see the principle behind the shape of the SDOM.0946

One of the things we saw is that no matter what the shape of the parent population, the SDOM tends to be normal.0958

Wait a second, we have only tested that for n(16).0989

Maybe 16 is somehow special.0995

What about for n(2), is this also going to be normal?0998

Let us try that.1008

Look at what we see: when n is very small, this does not tend to be normal.1009

What about n(5), that looks a little more normal but not really that nice normal distribution we saw with 16.1018

What about 10?1029

Now that is starting to look a little bit more normal.1030

What we see is that as long as n is reasonably large, and I do not know if there is a magic number, 1036

but as long as it is pretty large, you tend to get a normal sampling distribution of the mean.1044

That is definitely one thing, so there are a couple of conditions.1052

These conditions have to be fulfilled before this is true.1060

The sample size n must be reasonably large.1067

2 is too small, 5 even is a little too small.1081

It starts looking better but whatever reasonably large is it should be reasonably large.1087

A lot of times people use a rule like n(40) being reasonably large, but you know that is just a rule of thumb.1093

What else?1104

We looked at sample size that might be an important thing, but what else might matter?1106

Does it matter how many times we sample?1117

Let us see, but let us clear the bottom 3 and let us do it for 5.1126

If we did the simulation only 5 times, then we would not get a normal distribution; only when we start doing it 1133

like 1,000 times, or rather 10,000 times or 20,000 times, does it start looking more and more normal.1142

It also seems to be that the more simulations we have the better.1151

If you are using simulations, you must have a large number of simulations.1158

This is true only if n is reasonably large and only if you have a very large number of simulations if you are using simulations.1179
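
To see the shape conditions for yourself in code, here is a small Python sketch (the skewed exponential parent population, the sample sizes, and the 10,000 repetitions are assumptions chosen for illustration): the skew and kurtosis of the simulated means drift toward 0, that is, toward normality, as n grows.

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
for n in (2, 5, 10, 16, 40):
    # 10,000 simulated samples of size n from a skewed parent; collect the means.
    means = rng.exponential(scale=5.0, size=(10_000, n)).mean(axis=1)
    print(n, round(float(skew(means)), 2), round(float(kurtosis(means)), 2))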

Although this is really helpful, there are some conditions that you have to meet before you can invoke this. We learned some things from using simulations about the shape of the SDOM.1195

What about the principles behind the center or mean of the SDOM?1214

One of the things we found was generally the mean of the SDOM and I am going to show you a new notation for this, 1222

the mean of the SDOM is shown as mu sub x bar because it is the mu of a bunch of means.1234

The mean of the SDOM tends to equal the mean of the population.1243

Sometimes I will say parent population, but that means the same thing as the population and that is symbolized by mu. 1257

Just plain mu.1266

Notice that this is using mu because remember SDOM and all sampling distributions are theoretical distributions.1267

Theoretical distributions tend to be notated just like populations.1279

Does it have to be that way, or are there any conditions?1285

I will put conditions: does n matter?1292

First we will test this.1303

We want to know if the size of our sample matters.1306

Let me clear the lower 3; let us do a skewed one.1310

Let us see whether the size of our sample matters.1321

Let us try for n=2 and let us do that 5 times or 10,000 times.1325

n=2, and this does not look very normal, but the mean is still very close to that mean.1334

That is interesting to note.1344

What about for uniform?1350

n is 2; is the mean very similar to this mean?1352

It tends to be very similar.1359

For small n the means tend to be equal to each other. 1364

What about for n(5) which is also still pretty small.1370

Let us try 10,000, 16 and 16 still pretty good. 1375

What about for skewed?1381

Let us try 10,000.1382

We are seeing something.1385

What about for very large n?1388

Let us see about that 8.07, 8.08.1391

What about for some crazy custom distributions?1398

15.06, 15.05 pretty good. 1403

For both small and large n the means tend to be similar. 1412

Let us try with this crazy distribution in n(2), 15.04, 15.05.1419

We are seeing something here.1430

We are seeing that, sort of no matter the size of n, the means tend to be equal.1432

Does n matter?1436

For both smaller and larger n, mu sub x bar = mu.1441

That is nice.1469

It is like we do not have to make sure that we have a large n in order to invoke this principle. 1470

What about: does the number of simulations matter?1477

Let us go see.1489

Let us clear the bottom 3, let us say we only did 5.1492

If we only did 5 simulations we get 15.61 which is not that far off from 15.05.1500

Let us clear that again and do another 5. 1510

Here we get 11.22 and here we get 15.05.1514

Here we see that maybe the number of actual simulations does matter.1522

Here we get 15.79 which is not so bad.1528

Let us clear that and do it again 16.47.1532

We are seeing that if you have a small number of simulations, then you are not really sure if it is close or not.1537

It might be off a little, usually sort of in the right range, but having more seems to give you that assurance.1547

Let us try with a larger n.1556

With a larger n we see 15.05, 15.03 and clear that again.1561

I am going to do that simulation 5 times: 15.03, 15.05, and clear it again.1571

16.51 that is off. 1578

One more time 16.1583

We do see that the number of simulations matters.1587

Having a large number of simulations gives more accuracy in mu sub x bar, the mean of the sample means.1591

By accuracy I mean more of a match between the mean of your sampling distribution as well as your population mean.1619

n does not matter but the number of simulations does.1627
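
Here is the same point as a quick Python sketch (the normal parent with mu = 16 and sigma = 5, the n of 2, and the repetition counts are assumptions for illustration): with only 5 simulated samples the mean of the means bounces around, while with 10,000 it lands very close to mu.

import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n = 16, 5, 2  # even a tiny n is fine for the center
for num_simulations in (5, 10_000):
    means = rng.normal(mu, sigma, size=(num_simulations, n)).mean(axis=1)
    print(num_simulations, "simulations, mean of means:", round(float(means.mean()), 2))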

What about the standard deviation of the SDOM?1632

We have talked about it very briefly in one of the problems before but saying standard deviation of the sampling distribution of the mean is long.1639

We actually end up using this standard deviation a lot.1652

This is going to be something that comes up over and over again and because of that 1656

we give it a special name so that we could shortcut saying standard deviation of the sampling distribution of the mean.1660

The special name is this, it is standard error that is what it is called.1666

Deviation is the word for how far off you are, the distance between the mean and some data point; that is the deviation.1677

Error is often used interchangeably with that term: how off are you from some target?1688

The target here being the mean.1699

It does not mean that we made an error like an actual mistake; it just means how off these data points are.1701

That is why it is called standard error and whenever we say standard error what we really mean is this whole thing. 1709

The standard deviation of the sampling distribution of the mean.1716

We do not use the word standard error for any other standard deviations.1719

For instance, if we calculate the standard deviation of the sample or population we would not call it standard error. 1724

This term is only reserved for this concept.1731

What did we learn about the standard error when we did some of our simulations?1734

Let us look again with a special focus on standard deviation. 1742

Let us go to the normal distribution and let us start with the n(5).1751

Let us take 10,000 simulations, and we see that it is fairly normal; it has a very similar mean and a skew and kurtosis close to 0.1759

This is looking very normal to us but what is one thing that you notice between this and this?1775

This one seems a lot pointier, or sharper.1782

The standard error seems to be smaller than the population standard deviation.1787

Let us see if that is true for different values of n.1797

Let us try n=10, and I will just go straight to 10,000 simulations and another 10,000.1801

We see once again, it is nice and normal, skew and kurtosis are close to 0, similar mean, 1810

but we see that the standard deviation up here is still bigger than the standard deviation down here.1818

Not only that, but we see that this is even pointier than it was before.1826

Even sharper than it was before.1831

Let us try for n(16).1832

I think it is hard to notice, but it is even pointier, like narrower in terms of its standard deviation.1838

If you notice the numbers, these numbers keep going lower and lower.1851

Before where n=5, the standard deviation was 2.23.1856

For n=10, the standard deviation is 1.61.1863

Obviously the population standard deviation has not changed at all.1871

Let us look at standard deviation of n=25.1876

Here the standard error is 1 so it is really small.1878

One thing we see is that basically the standard error is always smaller than the standard deviation of the parent population, 1888

but not only that, how small it is seems to be related to n.1900

Let us see if that is true with other distributions.1906

Maybe we will draw a custom one. Here is the tri-modal distribution.1912

Maybe this time we will start off with the biggest n, the n(25) and let us see. 1920

The standard deviation is 1.71 and this is looking pretty normal but what about n=10?1929

Is the standard error going to be smaller than 1.7?1940

It is not, it is bigger, and that is what we saw before: as n gets smaller and smaller, the standard error gets bigger and bigger.1944

There is a bigger spread.1956

Let us try with n=5.1957

We should predict that it should be even bigger, and that is what we see; with n=2 the standard error is really big.1960

Maybe we will try with one other distribution, the uniform distribution.1972

Let us go from small to big n, so as n starts small and gets bigger and bigger and bigger, what should happen to the standard error?1980

Standard error should get smaller and smaller.1989

They have an inverse relationship.1993

Here we start off with a pretty wide looking standard deviation; you want to look at the red part right here, it is pretty wide, 6.79.1996

Remember 6.79 is still smaller than 9.52.2006

It is always smaller than the standard deviation of the parent population, but also n matters.2013

What about n=5?2020

Now it is 4-something; for n=10, now it is 3; for n=25, now it is 1.91.2024

We see a couple of things here.2039

The standard error, and to write standard error we would write sigma, because the sampling distribution is a theoretical distribution, sub x bar 2041

to indicate it is the standard deviation of a bunch of means, is always smaller than the standard deviation of the population, which we call just sigma.2057

This seems to be true.2083

Does n matter?2089

Yes it does matter.2096

How does it matter?2100

The larger the n the smaller the standard error. 2102

Obviously you could also write this as the smaller the n the larger the standard error. 2115

It is the same idea.2123

Do the numbers of simulations matter?2125

Let us see.2130

Let us say we only had 5 what is the standard deviation?2138

Well it is still smaller.2146

Standard deviation is almost always smaller, that is true. 2148

Standard deviation is almost always smaller but that is rough.2162

We do not know exactly how small.2172

Does the number of simulations matter?2174

This is a pretty big idea.2191

Not really for the big idea that the standard error is smaller than the population standard deviation; it just tends to be smaller either way.2196

The simulations do not really matter but we want to get more precise than that. 2217

It is pretty general to just say that the standard error is smaller than the standard deviation of the population.2222

It is nice to know exactly how much smaller.2229

That is where the central limit theorem comes in.2231

Although we have looked at it through simulation that is the empirical method of looking at 2237

what the properties of the sampling distribution of the mean look like.2244

What the central limit theorem did was formalize those things people observed to be true: that 2250

the shape tends to be normal as n gets large, and that the center tends to be similar between the SDOM and the population.2257

We also saw that the spread tends to be smaller in the SDOM than in the population, particularly 2267

as n goes up; the central limit theorem is the formal proof of those ideas.2275

I am not going to go through the proof, but I will just go over what it ends with.2281

The central limit theorem ends with this.2286

As sample size increases the shape of SDOM becomes more normal.2289

I should say approximates normality.2311

It is not that the SDOM becomes normal or transforms into it; it approximates normality.2317

As n goes up the shape becomes more normal.2331

This is just a formal way of saying it, and it has actually been proven mathematically.2338

Although it is not part of the central limit theorem, one thing to note is that the population is not necessarily the same shape as the SDOM.2343

The SDOM is approximately normal as long as the sample size is large, but that does not mean that the populations are always normal.2356

That is helpful to us because we know the shape of the SDOM even when we do not know anything about the shape of the population.2365

What about center?2373

The principle of the center is that the mean of the SDOM is equal to the mean of the population and does n matter?2376

No, it does not. 2387

What about the spread?2390

Here the standard error is equal to the standard deviation of the population divided by the square root of n.2393

An easier way to remember this is that the variance of the SDOM is actually equal to the variance of the population divided by n.2425

When we take the square root of both sides, the standard error is going to be the standard deviation of the population divided by the square root of n.2438

That is the only piece to remember, but this will give us that nice inverse relationship: as n goes up, as n becomes bigger and bigger and bigger, 2450

you are dividing the population standard deviation by a bigger and bigger number. 2462

Therefore resulting in a smaller and smaller and smaller standard error.2469
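
To gather those pieces into symbols (this only restates what was just said):

Shape: the SDOM approximates a normal distribution as n gets large.
Center: \mu_{\bar{x}} = \mu
Spread: \sigma^{2}_{\bar{x}} = \sigma^{2}/n, and taking square roots, \sigma_{\bar{x}} = \sigma/\sqrt{n}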

As well as the opposite, as n gets smaller and smaller; let us consider n(1).2475

When n is 1 we know that the standard error precisely equals the standard deviation of the population.2482

Let us just think about that in our heads, if we are actually getting just an n(1) from the population then we end up sampling the population exactly as it was.2491

It would be like dropping down the same thing and because of that the standard deviation would be the same, the mean would be the same.2504

Not only that, but the shape would be the same as the shape of the population; it would not necessarily be normal.2512

It is helpful to think about the special case when n=1.2520

In statistics you would never sample with an n of just 1; that is not helpful to us.2525

We really get more out of the central limit theorem when n is larger and larger.2533

This is the central limit theorem in a nutshell, and it is going to be helpful to us because now we do not need a large number of simulations, 2541

because in a lot of the simulations we were looking at, the number of simulations mattered, like how many simulations you do.2555

What is nice about the fact that the central limit theorem, which we may also call the CLT, has been proven 2564

is that now we can get directly to these principles without having to actually do computer-based simulations, 2570

but the computer-based simulations are helpful just to be able to see and know where the central limit theorem comes from empirically.2579

Let us compare the population, the sample, and the SDOM.2587

These are 2 things that we looked at a lot before, but I want you to see how these all fit together.2596

When we talked about the population before we called it the truth, and this is the thing we really want to know.2603

We do not actually want to know about samples.2610

We really want to know about the population. 2612

The problem is it is really hard to get empirical data.2615

You actually go out and get data from the world. 2620

The population is often impossible to get data on.2623

Largely it is either theoretical or we just do not know what the population is like, or when we do have known populations, 2626

they might be small populations or very well studied ones.2637

The summary values for the population are called parameters and in the same way the samples are not the truth.2640

We are not really interested in samples but we are using the sample as a window to the truth.2651

What is nice about the sample is that unlike the population it is empirical.2657

We could actually go out there and get data on it.2661

The summary values are called statistics, we call them statistics. 2663

For the population we symbolize the mean with mu; here we call it x bar; here the size is N, but the sample size is n.2668

Variance here is called sigma squared, but here it is s squared, the n-1 version.2687

For standard deviation it is just sigma versus s.2700

Also you might see S but when you use S you are not trying to approximate the population standard deviation.2706

You are just interested in the actual standard deviation of the sample.2718

Now that we know this how does the SDOM fit in all of this?2722

The SDOM is it the truth?2729

Is it the thing we want to know?2732

No not really nor is it just a window to the truth.2734

In essence, what it really helps us do is get from this to this.2738

It is sort of a middleman, because what the SDOM is like is a whole distribution of windows to the truth.2744

Although it is not the truth itself, it helps us interpret the sample, because it is a whole bunch of windows to the truth, 2760

and you can see where the sample fits into the sampling distribution of the mean.2769

We do not get data on it.2776

It is not empirical.2780

It is theoretical.2781

It is still a theoretical distribution but we can easily generate it because of the CLT.2782

Those principles help us know what the SDOM looks like and instead of calling them parameters 2793

or statistics what we call them is expected values.2801

Just like probability distributions they are both theoretical distributions of samples.2804

Instead of calling it mu or x bar, since it is the distribution of windows to the truth, it is the mu of a whole bunch of x bars.2813

Here instead of N and n, it is N sub x bar and this is only in the case where we use simulations.2828

Now we just derive it through the central limit theorem, and the variance we will call sigma squared, 2848

but with the x bar, and for standard deviation, instead of calling it sigma, it is sigma sub x bar.2861

Notice that there are these little x bars here and that is what really sets this sampling distribution of the mean apart 2871

because it is about the mean of means, the n of means, the variance of means, and the standard deviation of means.2880

It is always about means of sample means, and so because of that you always see the sub x bar here for all these expected values.2890
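
To keep the three sets of symbols straight, here is the comparison just described, written compactly (it only restates the notation above):

Population (parameters): mean \mu, size N, variance \sigma^{2}, standard deviation \sigma
Sample (statistics): mean \bar{x}, size n, variance s^{2} (the n-1 version), standard deviation s (or S)
SDOM (expected values): mean \mu_{\bar{x}}, size N_{\bar{x}} (only when simulating), variance \sigma^{2}_{\bar{x}}, standard deviation (the standard error) \sigma_{\bar{x}}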

That is how these things fit in together and now let us answer some of these questions that remain. 2900

This was from the previous lesson.2910

We even went over sampling distributions in general. 2913

We had some questions that remained.2918

Perhaps the SDOM can help us answer some of these questions. 2920

What happens when we do not know what the population looks like?2925

In the case of the SDOM, we do not have to know what the population looks like.2929

If you use the SDOM, or the sampling distribution of sample means, then you do not need to know what the population looks like.2937

We do not have to know whether it is uniform or skewed or anything like that if we use the SDOM 2951

because we know the shape, mean, and spread of the SDOM.2970

We could use that instead of having to rely on the middleman.2978

Can we have sampling distributions for summary statistics other than the mean?2983

Why yes you can, because if you want you can play with the simulation further and I am just going to clear this and go with normal again. 2988

We do not have to use the mean; we can actually use the median, and so we can look at what the median looks like, 3000

or we can look at the standard deviation and look at what the standard deviation looks like.3006

We could also look at the range or the interquartile range.3015

We can look at variance and we can look at that for a whole bunch of different kinds of n as well as different kinds of summary statistics. 3024

If we look at that, we could say for the standard deviation, that is what it looks like.3050

What about the median?3056

We can look at that but when we look at medians and standard deviation or variance, the CLT does not necessarily apply full force. 3058

For instance, the median here is not necessarily going to be the median here.3074

For instance, let me show you a custom example; notice that the median here is similar to the median here, 3082

and even if we look at small sample sizes, here the median is pretty similar.3102

Okay, so there at least the median works for this one.3114

The median works, but I think there are cases where the median does not necessarily work.3118

Like this one, here the median is 17 but here the median is 16.3129

The median of the population does not always necessarily equal the median of the sampling distribution of medians.3141

For these kinds of other sample statistics, such as medians and variances and standard deviations, 3147

you do not necessarily have all of the properties of the CLT.3157

There are some exceptions.3161

For instance, with the standard deviation you very often do get roughly normally distributed distributions.3163

But it is not quite as regular as the sampling distribution of the mean, and there are some exceptions here and there, 3174

but it is sort of hodgepodge; only the sampling distribution of the mean really fits all 3 properties of the CLT.3187

Here we can say yes but CLT does not necessarily apply.3194

How do we know whether a sample is sufficiently unlikely?3214

We do not really know for sure, but one of the things that we talked about when we talked about normal distributions in the past 3222

is that once we have this normal distribution, we can tell where 99% of the means might fall, or 95%, or 90% of the means might fall.3230

We can actually make these cut off points once we know that it is a normal distribution. 3245

We can decide that as long as it is more extreme than 99% of what is expected, we can say it is sufficiently unlikely.3259

We can set these sort of arbitrary marks.3267

Although it does not say exactly how different it has to be, we could set marks given that it is a normal distribution.3270

Do we always have to simulate a large number of samples in order to get a sampling distribution?3280

No, not necessarily because of the CLT.3286

The CLT does not rely on simulations.3293

Simulations are really a way of empirically doing it, sort of pretending to do it many, many times, but because of the CLT we can actually go directly to it.3306

We do not actually have to do the simulations.3322

Let us go into some examples.3324

Example 1: from 1910 to 1919 the batting averages of major league baseball players were approximately normal.3328

Here are the mean and standard deviation.3337

Can you construct a sampling distribution of the mean batting average for random samples of 15 players?3339

What would be the resulting shape, center, and spread?3351

To me it helps, just like in that simulation, to have the population up here and the sampling distribution down here.3354

What I like to do is draw the population and the SDOM just to give myself a sense of what is going on.3365

The batting averages are approximately normal with a mean of .266 and a standard deviation of .037.3373

The mu is .266 and standard deviation is .037.3388

What is the resulting shape, center, and spread of the sampling distribution of the mean batting average for random samples of 15 players?3396

We know that whatever this is, this should probably be skinnier but it should be approximately normal.3405

n is 15; we are using n because it is the size of our sample.3425

We will use N sub x bar if we are talking about the number of simulations that we had done.3433

But we are not doing any simulations.3441

What would be the resulting shape?3444

I would say approximately normal.3448

What would be the center?3455

That is mu sub x bar and we know that mu sub x bar = mu and so that equals .266.3462

There we have center and what about spread?3472

Spread would be the standard deviation of the SDOM so sigma sub x bar.3481

We know that sigma sub x bar is equal to sigma ÷ √n, which is .037 / √15.3491

I am just going to pull out an Excel file in order to do my calculations.3503

Feel free to use a calculator.3510

Divided by √15, and we get .00966.3512

We see that this is much smaller than this, which means a skinnier shape, so a skinnier spread.3523

Not as much spread.3536

There you have it.3538

We have the shape, center, and spread.3543

Example 2: consider the sampling distribution of the mean, which I like to think of as the SDOM, of a random sample of size n 3546

taken from a population of size N, with a mean of mu and a standard deviation of sigma.3557

For a fixed N, how does the mean of the sampling distribution change as n increases?3566

For a fixed N, how does the standard error change as n increases?3573

Whether n increases or not, mu sub x bar = mu.3579

n does not matter very much, and that is one of the things we saw.3589

For a fixed N, how does the standard error change as n increases?3599

We know that sigma sub x bar, the standard error, is the standard deviation of the population divided by √n, so it is sort of inversely proportional to √n.3606

How does the standard error change as n increases?3626

The standard error becomes smaller; that is the inverse relationship, as one goes up, the other comes down.3630

Here whether n increases or decreases this relationship stays the same.3650

Here as n increases the standard error decreases. 3657

Example 3: for most farms in the Midwest, each one-thousandth of an acre produces on average 15,000 kernels of corn with a standard deviation of 2,000.3662

Suppose 25 of these mini plots are randomly chosen on a typical Midwestern farm; 3677

what is the probability that the mean number of kernels per plot will exceed 15,000, 16,000, or 17,000?3686

Let us think about what this is asking us.3695

Imagine an acre, split up into 1,000 tiny, tiny, tiny little plots and pretend that 1 is 1,000.3697

I know it does not look quite right but pretend that it is.3710

In that little plot, this is the mean and this is the standard deviation for most farms.3713

Suppose 25 of these mini plots are randomly chosen on a typical Midwestern farm.3721

What it is telling us is that we do not know what this population, 3727

what that distribution, looks like.3731

We only know that on average the mu is 15,000 and the standard deviation is 2,000.3733

We want to know the probability that, if we take plots and look at how many kernels there are, 3746

the mean number will exceed 15,000.3759

You cannot actually just look at the population, because we do not know if the population is normal, but we do know about the sampling distribution of the mean.3766

That distribution is approximately normal.3779

We will use the SDOM, although we cannot draw the population because we do not know what it looks like.3783

But we do know what this looks like.3797

This looks like this.3799

We know what its mu, mu sub x bar, is.3801

We know that is going to be 15,000 because we know it is the same as this.3807

We also know the standard error of this.3812

The standard error is standard deviation ÷ √n .3817

n is going to be 25.3825

That is 2,000 ÷ √25 = 2,000 ÷ 5 = 400.3829

We know that this is 400 and so we can say what is the probability that the mean number of kernels per plot will exceed 15,000?3856

We take 25 of these mini plots and we find the mean.3876

What is the likelihood that the mean will exceed 15,000?3884

Let me use a different color.3888

15,000 is right here; what is that probability?3893

The probability that that mean exceeds 15,000?3900

We know that is 50%, because we know that normal distributions are symmetrical, and so that is 50%.3905

What about for 16,000?3928

One thing that might be helpful to know is how many standard deviations away it is.3932

If you remember, you have to sort of think back to all that normal distributions stuff.3939

We need to know where 16,000 is.3946

Each jump is 400, and we need to know how many 400s away 16,000 is.3949

In order to do that we could actually just use Z score.3965

The Z score for 16,000 is (16,000 – 15,000) / 400.3969

That is 2.5, and we know that 16,000 is at a Z score of 2.5.3984

I am running out of rows.4016

I am just going to delete that row.4018

You can look up in the back of your book how much this area is by using the Z score of 2.5, given that the mean is at a Z score of 0.4021

You can look at that little area; just as we found the probability that x bar exceeds 15,000, that is going to give us the probability that x bar exceeds 16,000.4041

The way that I am going to do it is, instead of looking it up in my book, 4060

I am going to use the normal distribution function in Excel.4064

NORMSDIST gives me the area underneath the curve for a particular Z score, 2.5.4068

This gives me this area.4085

I need to do 1 – this, and that is .0062.4091

The probability that x bar will exceed 15,000 is 50%, but the probability that x bar will exceed 16,000 is only .0062.4099

Most of our x bars will be below 16,000.4121
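
If you would rather check Example 3 in code than with NORMSDIST, here is a small Python sketch under the stated assumptions (mu = 15,000, sigma = 2,000, n = 25); it reproduces the standard error of 400, the Z score of 2.5, and the upper-tail probability of about .0062.

from scipy.stats import norm

mu, sigma, n = 15_000, 2_000, 25
standard_error = sigma / n ** 0.5              # 2,000 / 5 = 400
z = (16_000 - mu) / standard_error             # 2.5
print("standard error:", standard_error, "z:", z)
print("P(x bar > 15,000):", 1 - norm.cdf((15_000 - mu) / standard_error))  # 0.5
print("P(x bar > 16,000):", round(float(1 - norm.cdf(z)), 4))              # about 0.0062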