F-distributions

  • Intro 0:00
  • Roadmap 0:04
    • Roadmap
  • Z- & T-statistic and Their Distribution 0:34
    • Z- & T-statistic and Their Distribution
  • F-statistic 4:55
    • The F Ratio (the Variance Ratio)
  • F-distribution 12:29
    • F-distribution
  • α and p-value 15:00
    • α and p-value
  • Example 1: Why Does F-distribution Stop At 0 But Go On Until Infinity? 18:33
  • Example 2: F-distributions 19:29
  • Example 3: F-distributions and Heights 21:29

Transcription: F-distributions

Hi, welcome to educator.com.

We are going to talk about F distributions today.

First we are going to review the other distributions we have covered besides F, namely Z and T.

Then we are going to introduce the F statistic, also called the variance ratio.

Then we are going to talk about the distribution of all these F's, the distribution of all these ratios, and finally what alpha and the p-value mean in an F distribution.

Because eventually we are going to do hypothesis testing with the F statistic.

Okay, first, these other distributions. We know how to calculate the Z statistic, and we also know how to find the probability of such a Z value in a normal distribution.

But what is a Z distribution?

Well, imagine this.

Take a data set, let us just call it a population.

I will just draw a circle, and we take some sort of sample from it, of size n.

And we actually calculate the Z statistic for this sample: we take the mean of this little sample minus mu, divided by the standard error.

So you do that and then you plot the Z.

Then imagine you put that sample back, so you are sampling with replacement, and you draw another sample, calculate Z again, plot that one, dump it back in, draw another sample, and calculate Z.

If you do that over and over again, many times, what you end up getting is a normal distribution over time.

So if you plot Z many times you get a normal distribution, and because of that we also call this a Z distribution: it is a distribution made up of a whole bunch of Z's, and it has the shape of a normal distribution.
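
The draw, calculate, plot, and put back loop can be tried out in a few lines of Python. This is only a sketch under assumptions that are not in the lecture: the population is made up (10,000 roughly normal values with mean 100 and standard deviation 15), each sample has size 25, and the names `population` and `sample_z` are hypothetical.

```python
import random
from statistics import mean, pstdev

random.seed(0)

# Made-up population: 10,000 roughly normal values (mean 100, SD 15).
population = [random.gauss(100, 15) for _ in range(10_000)]
mu = mean(population)
sigma = pstdev(population)

def sample_z(n):
    """Draw one sample of size n (with replacement) and return its Z statistic."""
    sample = random.choices(population, k=n)
    standard_error = sigma / n ** 0.5
    return (mean(sample) - mu) / standard_error

# Repeat the draw-and-calculate step many times to build up the Z distribution.
z_values = [sample_z(25) for _ in range(5_000)]

print(round(mean(z_values), 2))    # should land close to 0
print(round(pstdev(z_values), 2))  # should land close to 1
share_within_2 = sum(abs(z) < 1.96 for z in z_values) / len(z_values)
print(round(share_within_2, 3))    # should land close to 0.95
```

A histogram of `z_values` would show the familiar bell shape centered at zero.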

Now, take that same idea: you get a sample, and instead of calculating Z for that sample you calculate T. You plot that, and if you do that over and over and over again, you get a t-distribution.

And this resulting t-distribution follows the rules of the t-distribution: how wide it is depends on the degrees of freedom. The lower your degrees of freedom, the more variable it is, and the bigger your degrees of freedom, the less variable and the more normal it looks.

And so that is what we call the t-distribution.
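
The same experiment with T in place of Z shows the effect of the degrees of freedom. This is a sketch with invented numbers (a normal population with mean 50 and standard deviation 10, and sample sizes 5 and 60); `sample_t` and `share_beyond` are hypothetical helper names. It counts how often T lands farther than 1.96 from zero, which would happen about 5% of the time for a normal distribution.

```python
import random
from statistics import mean, stdev

random.seed(1)

MU = 50.0  # made-up population mean

def sample_t(n):
    """Draw one sample of size n from a normal population and return its T statistic."""
    sample = [random.gauss(MU, 10.0) for _ in range(n)]
    standard_error = stdev(sample) / n ** 0.5  # uses the sample SD, not sigma
    return (mean(sample) - MU) / standard_error

def share_beyond(n, cutoff=1.96, draws=5_000):
    """Fraction of simulated T statistics farther than `cutoff` from zero."""
    t_values = [sample_t(n) for _ in range(draws)]
    return sum(abs(t) > cutoff for t in t_values) / draws

small_df = share_beyond(n=5)   # df = 4: wide, heavy tails
large_df = share_beyond(n=60)  # df = 59: close to normal
print(small_df, large_df)
```

With few degrees of freedom a noticeably larger share of T values falls in the tails; with many, the share moves toward the normal 5%.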

So that is how the Z statistic and the Z distribution go together.

And this is how the T statistic and the t-distribution go together.

You just have to imagine taking a whole bunch of these samples, calculating whatever statistic, plotting that statistic, and then looking at the shape of all those statistics.

So really, what this is is a sampling distribution of Z.

And this is a sampling distribution of T: instead of using means or Z-scores to make your plot, you use the T statistic.

And you could do that for anything. You could do that for the standard deviation, you could do it for the interquartile range, you can make the sampling distribution of anything you want.

That is important to keep in mind as we go into the F distribution.

Okay, so first thing: what is the F statistic?

We know how to calculate the T statistic and the Z statistic, so what is the F statistic?

Well, later on in these lessons we are going to come across what we call the ANOVA, the analysis of variance.

Analyze means to break down, and variance, well, you know what variance is: the spread of your data set, usually around the mean. So when we analyze variance, we are going to be breaking variance down into its multiple components, and the F ratio happens to be a ratio of those component variances.

And so I just want you to get the big idea behind the F ratio, not exactly how to calculate it. We will get into the details of that later on, but for now the general concept.

So the F statistic usually starts with this idea: we have, let us say, two samples, x1, x2, x3 and y1, y2, y3.

Now there is always some variation within a sample. Within the X's there is some variation.

And within the Y's there is some variation.

So there is definitely some variation, but there is another variation here that we are really interested in.

We are really interested in the difference between these two things.

Between these two samples. So the F statistic really takes those ideas and turns them into a ratio, and here is what the ratio looks like.

It is really the between-sample variance all over the within-sample variance.

And remember, variance is always squared: it is the average squared distance away from the mean. Because of that, this is a squared number and this is a squared number, they are both positive, so this number is always going to be greater than zero.

There is no way that this number could be less than zero, so the F statistic is always going to be greater than zero.
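
The lecture leaves the calculation details for the ANOVA lessons, so the following is only a sketch that assumes the usual one-way ANOVA definition of the two variances (sum of squares divided by degrees of freedom). The function name `f_ratio` and the six data values are made up for illustration.

```python
from statistics import mean

def f_ratio(*groups):
    """Between-sample variance divided by within-sample variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Squared distances of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Squared distances of each value from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

x = [4.0, 5.0, 6.0]  # x1, x2, x3
y = [7.0, 8.0, 9.0]  # y1, y2, y3
print(f_ratio(x, y))  # 13.5
```

Both the numerator and the denominator are built from squared distances, which is why the result cannot come out negative.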

Now another way to think about between-sample variance and within-sample variance is this.

Whenever we do these kinds of tests, we are really interested in the differences between the samples. That is really important to us.

But part of that difference is going to be just inherent variation.

So sometimes there might be a difference between, let us say, men and women, or people who got a tutorial versus people who did not, right?

People who studied for the test versus people who did not, people who went to private school versus people who went to public school.

There might be some difference between them.

But that difference is also going to have variation.

So this between-sample variance is really inherent variation, variance you cannot do anything about, plus the real difference, the effect size, between the samples.

And notice that we keep using this word between, and that is to indicate the part that we are really interested in.

That goes over the within-sample variance. Here there is inherent variation within the X's and within the Y's, and that is not something we are interested in, but it is good to know how variable our little samples are.

Is everyone very similar to each other, or very different? We need to compare the difference between the samples to the differences within the samples.

So the within-sample variation is just inherent variation.

These are all different ways of saying the same thing, and the reason I want to say it this way as well is that later on we are not just going to be talking about between-sample and within-sample differences; we are going to add onto those ideas.

The final way I want you to think about the F statistic is basically this.

Ultimately, in hypothesis testing, we are going to want to know about differences between samples. That is the thing that we are really interested in.

So the numerator is going to be the variation that we want to explain, because that is the reason we did our research in the first place.

All over the variation we cannot explain, not with this design at least.

So in our experimental design we will have these two groups, and hopefully they will be similar within each group but different between the groups.

And that is why in an F statistic we want the variation that we want to explain to be quite large, and the variation that we cannot explain or do anything about, the one that just comes along for the ride, we want to be relatively small.

Okay, so let us do a little bit of thinking about the F ratio.

Now if we had a very big difference between the groups, what kind of F ratio would we have?

Would it be greater than one, or less than one?

Well, if our variation between the groups is bigger than the variation within the groups, then we should have a large F. That should be an F that is at least greater than one, but maybe a lot greater than one. It could be 2 over 1, or 2 over .5.

Any of those values would show that the between-sample variance is a lot larger than the within-sample variance.

And if there is a lot of within-sample variance, then that competes with the between-sample variance. Let us say there is a big between-sample difference but there are also a lot of differences within the samples themselves. That sort of evens out, and you might see an F that is smaller, or even less than one if this one is bigger than this one.

So that is how you could think about the F statistic.
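
To see both cases with numbers, here is a small sketch with invented data. It assumes two samples of equal size, so the between-sample variance is the variance of the two sample means scaled by the sample size, and the within-sample variance is the average of the two sample variances. The helper `f_two_samples` is a hypothetical name.

```python
from statistics import mean, variance

def f_two_samples(xs, ys):
    """Between-sample over within-sample variance for two equal-sized samples."""
    assert len(xs) == len(ys), "this shortcut assumes equal sample sizes"
    n = len(xs)
    between = n * variance([mean(xs), mean(ys)])  # spread of the sample means
    within = mean([variance(xs), variance(ys)])   # average spread inside a sample
    return between / within

# Same gap between the sample means (11 versus 15) in both cases.
tight_x, tight_y = [10.0, 11.0, 12.0], [14.0, 15.0, 16.0]
spread_x, spread_y = [2.0, 11.0, 20.0], [6.0, 15.0, 24.0]

f_tight = f_two_samples(tight_x, tight_y)     # little within-sample variation
f_spread = f_two_samples(spread_x, spread_y)  # a lot of within-sample variation
print(f_tight)             # 24.0
print(round(f_spread, 2))  # about 0.3
```

The numerator is identical in the two cases; only the within-sample variation changes, and that alone moves F from well above one to below one.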

Now imagine getting that F statistic over and over and over again from the population and plotting a sampling distribution of F statistics.

What would you get?

Well, remember that F cannot go below zero because both numbers are going to be positive, so the F distribution really stops at zero.

And this is what the F distribution ends up looking like.

It is a skewed distribution and it has a positive tail.

That means it goes on for a really long time on the positive side.

It is one-sided, so it is not symmetrical. There is only a positive side, and that is because it is a ratio of variances, and variances are positive.

And like T, F is a family of distributions. You are going to be able to find the particular F distribution you are working with by looking at the degrees of freedom in the numerator, the one about between-sample differences, and the degrees of freedom in the denominator, the leftover or within-sample variation.

So you are going to need both of those numbers in order to find out which F distribution you are working with, and Excel will actually ask you for the degrees of freedom for the numerator and the denominator.
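
A rough simulation of this sampling distribution, again as a sketch under assumptions: two samples of size 10 are drawn from the same made-up standard normal population, so any between-sample difference is pure chance. With two groups of 10, the usual ANOVA bookkeeping gives 1 degree of freedom in the numerator and 18 in the denominator. The name `null_f` is hypothetical.

```python
import random
from statistics import mean, median, variance

random.seed(2)

def null_f(n):
    """F ratio for two samples of size n drawn from the same normal population."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [random.gauss(0.0, 1.0) for _ in range(n)]
    between = n * variance([mean(xs), mean(ys)])
    within = mean([variance(xs), variance(ys)])
    return between / within

# Numerator df = 2 - 1 = 1, denominator df = 20 - 2 = 18.
f_values = [null_f(10) for _ in range(5_000)]

print(min(f_values) >= 0)                 # the distribution stops at zero
print(median(f_values) < mean(f_values))  # the long positive tail pulls the mean up
print(round(max(f_values), 1))            # a few values sit far out in the tail
```

A histogram of `f_values` would pile up near zero and trail off to the right.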

Now let us talk a little bit about what alpha means here.

Here we will still need a cutoff point, so a critical F instead of a critical T or Z.

You will still need a critical F, and alpha will still be our probability of making a false alarm given that the null hypothesis is true.

This is the null F distribution, I am just saying.

And alpha would be the same thing, the probability of a false alarm.

So now that you know how to picture that alpha, let us talk about what that alpha actually means.

If you go back to the original idea for alpha, the original idea is that cutoff level.

So it is our level of tolerance for false alarms.

It is the false alarm probability that we will tolerate.

We want alpha to be very low.

Now our alpha will be low, that is, a smaller alpha than this one, if our critical F is very big.

And what does it mean for F to be large?

This means our between-sample variability is greater than our within-sample variability.

As long as this is much larger than this, we have a large F, and that is going to mean a smaller chance of a false alarm.

So alpha is the cutoff level that we are going to set as the significance level, the false alarm level that we will tolerate.
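
The link between alpha and the critical F can be sketched by simulation. The assumptions here are not from the lecture: the null distribution is approximated with 10,000 simulated F ratios from two samples of size 10 drawn from one made-up population, and the critical F is read off as a percentile of those simulated values rather than from a table or from Excel. `null_f` and `critical_f` are hypothetical names.

```python
import random
from statistics import mean, variance

random.seed(3)

def null_f(n):
    """F ratio for two samples of size n drawn from the same normal population."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [random.gauss(0.0, 1.0) for _ in range(n)]
    between = n * variance([mean(xs), mean(ys)])
    within = mean([variance(xs), variance(ys)])
    return between / within

# Simulated null F distribution, sorted so percentiles are easy to read off.
f_sorted = sorted(null_f(10) for _ in range(10_000))

def critical_f(alpha):
    """F value that cuts off the upper `alpha` share of the simulated null distribution."""
    index = int((1 - alpha) * len(f_sorted))
    return f_sorted[min(index, len(f_sorted) - 1)]

print(round(critical_f(0.05), 2))  # cutoff when we tolerate 5% false alarms
print(round(critical_f(0.01), 2))  # a stricter alpha pushes the cutoff farther out
```

Tolerating fewer false alarms means moving the cutoff farther into the positive tail, so a smaller alpha goes with a bigger critical F.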

So what is the p-value?

The p-value is this: given our sample's F, it is the probability that we would get this F or higher by chance in this null distribution.

So the idea is that it is the false alarm probability for F statistics that are equal to or more extreme than the F from our sample.

It is the probability that we would get an F at least as large as the one that we got from the sample.

So the p-value is what we have once we have our sample statistic, and alpha is the probability of a false alarm that we are willing to tolerate.

It is the same idea as with T statistics. The alpha and the p-value work as they did for T statistics; we are just now applying them to a slightly different looking distribution.
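
Here is a sketch of the p-value as a share of the null distribution. Everything specific in it is an assumption for illustration: the null distribution is simulated from two samples of size 10 from one made-up population, the two observed F values (1.0 and 5.3) are invented, and alpha is set to 0.05.

```python
import random
from statistics import mean, variance

random.seed(4)

def f_two_samples(xs, ys):
    """Between-sample over within-sample variance for two equal-sized samples."""
    n = len(xs)
    between = n * variance([mean(xs), mean(ys)])
    within = mean([variance(xs), variance(ys)])
    return between / within

def null_f(n):
    """One F ratio from two samples of size n drawn from the same population."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    ys = [random.gauss(0.0, 1.0) for _ in range(n)]
    return f_two_samples(xs, ys)

null_fs = [null_f(10) for _ in range(10_000)]

def p_value(observed_f):
    """Share of null F ratios that are at least as large as the observed one."""
    return sum(f >= observed_f for f in null_fs) / len(null_fs)

ALPHA = 0.05  # the false alarm rate we decided to tolerate beforehand
for observed in (1.0, 5.3):  # two invented sample F values
    p = p_value(observed)
    print(observed, round(p, 3), "reject null" if p < ALPHA else "do not reject")
```

Alpha is fixed before looking at the data, while the p-value is computed from the sample's F; the decision comes from comparing the two.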

Now, examples.

Why does the F distribution stop at zero but go on in the positive direction until infinity?

Well, we know why it stops at zero.

The F statistic is a ratio of two positive numbers, and we know that they are positive because variance is squared, thus making it always positive.

But it goes on until infinity because there is no rule that says the numerator can only be this much bigger than the denominator. The numerator can be infinitely many times as big as the denominator, so F could go on forever and ever.

Example 2. In an F test, also called the one-way ANOVA, which we are going to talk about in a little bit, you did an F test and the p-value is .034. What is the best interpretation of this result?

First option: it is plausible that all the samples are roughly equal.

So here we are thinking about, let us say, two samples, and we need to compare this versus this.

The F value is the between variation over the within variation, and if we have a big enough sample F, then we can have a small p-value like .034.

So is it plausible that all the samples are roughly equal?

No, because we seem to have a large enough between-sample variance, so I would say no to that one.

Next: it is plausible that all the sample variances are roughly equal.

Well, that is not necessarily what this means. It could be that these within variations are very similar to each other, but that is not what this p-value is talking about.

Next: the within-sample variation is much larger than the between-sample variation.

Well, if that were true we would have a small F. Instead it is this one.

The between-sample variation is much larger than the within.

So D is our answer.

Example 3. Consider the heights of the following pairs of samples.

Which will have the largest F?

Which will have the smallest F?

Okay, let us think about this.

First, players from an NBA team, the Lakers, versus adults in LA.

Well, let us draw those two populations, Lakers versus LA.

The LA adults probably have a lot of variance, since that is a lot of people. The Lakers probably have a very small variance. But there is probably a pretty sizable difference between those two groups of people, right? The average adult versus the Lakers, who are probably all amazingly tall.

So that is the picture here.

Will this have a larger F, will this have a smaller F?

Well, what about adults in San Francisco versus adults in LA?

These two probably both have a lot of within-sample variation. There are lots of adults in San Francisco and lots of adults in LA, and they are all different from each other, but their averages should probably be similar. It is not like San Francisco is known for tall people or LA is known for tall people. So the difference between the groups will probably be very small, but the within-group variability will be very large, so I would guess this would actually have a pretty small F. And what about this one?

This one is players from an NBA team, the Lakers, versus players from another team, so here we might think Lakers versus Clippers. There is probably pretty small variation here; probably everybody is like about 6 feet tall, they are all super tall, so there is not a lot of variation, but they are also probably similar across the teams too.

The average height on the Lakers is probably similar to the average height on the Clippers, because they are both tall groups of people. So which one of these will probably have the largest F?

I think the biggest difference between the groups might actually be this one, Lakers versus adults in LA.

So I would go with this one, given that I am not really sure about the variance here.

The variance is smaller for the Lakers, but I am not sure how to compare these so far.

So this is the largest F, and I am just going by it having the largest numerator for sure.

Well, which will have the smallest F?

For the smallest F I would probably go with this one, San Francisco versus LA, because not only does it have a small numerator but it has an extremely large denominator. So I would say this one would definitely have the smallest F.
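
The three comparisons can be played out with invented numbers. Nothing below is real height data: the means and standard deviations (in inches) in `GROUPS` are made up to match the verbal picture, each sample has 15 people, and the F ratio is averaged over many repeated draws to smooth out chance. The ordering of the two pairs with similar averages depends heavily on those made-up values, so treat it as an illustration of the reasoning rather than a result.

```python
import random
from statistics import mean, variance

random.seed(5)

# Made-up (mean, SD) of height in inches for each group; not real data.
GROUPS = {
    "Lakers": (79.0, 3.0),
    "Clippers": (78.0, 3.0),
    "LA adults": (67.0, 4.0),
    "SF adults": (67.2, 4.0),
}

def f_two_samples(xs, ys):
    """Between-sample over within-sample variance for two equal-sized samples."""
    n = len(xs)
    between = n * variance([mean(xs), mean(ys)])
    within = mean([variance(xs), variance(ys)])
    return between / within

def average_f(name_a, name_b, n=15, repeats=2_000):
    """Average F ratio over many simulated pairs of samples from two groups."""
    mean_a, sd_a = GROUPS[name_a]
    mean_b, sd_b = GROUPS[name_b]
    total = 0.0
    for _ in range(repeats):
        xs = [random.gauss(mean_a, sd_a) for _ in range(n)]
        ys = [random.gauss(mean_b, sd_b) for _ in range(n)]
        total += f_two_samples(xs, ys)
    return total / repeats

pairs = [("Lakers", "LA adults"), ("SF adults", "LA adults"), ("Lakers", "Clippers")]
results = {pair: average_f(*pair) for pair in pairs}
for pair, avg in results.items():
    print(pair, round(avg, 2))
```

With these particular made-up values, Lakers versus LA adults comes out with by far the largest average F, and San Francisco versus LA comes out smallest.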

So that is the end of F distributions.

See you next time for ANOVAs on educator.com.