For more information, please see full course syllabus of Statistics

### F-distributions

- Intro 0:00
- Roadmap 0:04
- Roadmap
- Z- & T-statistic and Their Distribution 0:34
- Z- & T-statistic and Their Distribution
- F-statistic 4:55
- The F Ratio (the Variance Ratio)
- F-distribution 12:29
- F-distribution
- Alpha and p-value 15:00
- Alpha and p-value
- Example 1: Why Does F-distribution Stop At 0 But Go On Until Infinity? 18:33
- Example 2: F-distributions 19:29
- Example 3: F-distributions and Heights 21:29

### Transcription: F-distributions

*Hi, welcome to educator.com.*0000

*We are going to talk about F distributions today.*0002

*So first we are going to review the other distributions we have covered besides F, namely the Z and the T.*0004

*Then we are going to introduce the F statistic also called the variance ratio.*0011

*Then we are going to talk about the distribution of all these Fs, the distribution of all these ratios, and finally what Alpha and P value mean in an F distribution.*0017

*Because eventually we are going to do hypothesis testing with the F statistic.*0029

*Okay, first, these other distributions. We know how to calculate the Z statistic and we also know how to*0031

*find the probability of such a Z value in a normal distribution.*0044

*But what is the Z distribution?*0050

*Well, imagine this.*0053

*Take a data set, let us just call it a population.*0056

*We take a data set, I will just draw a circle, and we take some sort of sample from it, of size n.*0059

*And we actually calculate the Z statistic for this sample, so we get the mean of this little*0068

*sample minus mu, divided by the standard error.*0087

*So you do that and then you plot the Z.*0093

*So imagine you put that sample back, sampling with replacement, and you draw another sample and*0098

*you do this again and then you plot that guy and you dump it back in, you draw another sample, you calculate Z.*0115

*If you do that over and over again many times, what you end up getting is a normal distribution over time.*0123

*So if you plot Z many times you get a normal distribution, and because of that we also call this a Z*0136

*distribution, because the distribution is made up of a whole bunch of Zs and it has the shape of a normal*0150

*distribution, so that is what we call a Z distribution.*0159
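
The repeated-sampling idea above can be sketched as a small simulation. This is only an illustration under assumed settings: the population (10,000 uniform values), the sample size of 30, and the function name `simulate_z` are all made up for the example and are not from the lecture.

```python
import math
import random
import statistics

def simulate_z(population, n, reps, seed=0):
    """Draw `reps` samples of size n (with replacement) and return each sample's Z."""
    rng = random.Random(seed)
    mu = statistics.fmean(population)
    sigma = statistics.pstdev(population)      # population standard deviation
    standard_error = sigma / math.sqrt(n)
    z_values = []
    for _ in range(reps):
        sample = rng.choices(population, k=n)  # sampling with replacement
        z_values.append((statistics.fmean(sample) - mu) / standard_error)
    return z_values

# Hypothetical population: deliberately not normal (uniform between 0 and 100).
pop_rng = random.Random(42)
population = [pop_rng.uniform(0, 100) for _ in range(10_000)]

z_values = simulate_z(population, n=30, reps=5_000)

# The pile of Z values should be centered near 0 with a spread near 1.
print(round(statistics.fmean(z_values), 2), round(statistics.stdev(z_values), 2))
```

Plotting a histogram of `z_values` would give the bell shape that the lecture calls the Z distribution.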

*Now, if you take that same idea and you get a sample, and instead of calculating Z for that sample*0162

*you calculate T, and then you plot that, and you do that over and over and over again, you get a T distribution.*0175

*And this resulting t-distribution follows the rules of the t-distribution, where how wide it is depends on the degrees of*0195

*freedom: the lower your degrees of freedom, the more variable it is, but the bigger*0209

*your degrees of freedom, the less variable and more normal it looks.*0217

*And so that is what we call the t-distribution.*0222

*So that is how Z statistic and the Z distribution sort of go together.*0225

*And this is how the T statistic and the t-distribution sort of go together.*0232
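
The same simulation idea works for T. The sketch below assumes a normal population with a made-up mean of 50 and standard deviation of 10, and it uses the usual one-sample degrees of freedom, n minus 1. The helper names `simulate_t` and `tail_share` are hypothetical.

```python
import math
import random
import statistics

def simulate_t(n, reps, mu=50.0, sigma=10.0, seed=0):
    """Draw `reps` samples of size n and return each sample's T statistic."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        # T uses the standard error estimated from the sample itself.
        estimated_se = statistics.stdev(sample) / math.sqrt(n)
        t_values.append((statistics.fmean(sample) - mu) / estimated_se)
    return t_values

def tail_share(values, cutoff=2.0):
    """Share of values farther than `cutoff` from zero."""
    return sum(abs(v) > cutoff for v in values) / len(values)

small_df = tail_share(simulate_t(n=5, reps=5_000))   # df = 4
large_df = tail_share(simulate_t(n=50, reps=5_000))  # df = 49

# With fewer degrees of freedom, more T values land far out in the tails.
print(small_df, large_df)
```

The larger tail share for the small sample is the "more variable" shape described for low degrees of freedom.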

*And you just have to imagine taking a whole bunch of these samples, calculating whatever statistic,*0237

*plotting that statistic, and then looking at the shape of those statistics.*0245

*So really what this is, is a sampling distribution of Z.*0250

*And this is a sampling distribution of T: instead of using means or Z scores to make your plot, you use the T statistic.*0260

*And you could do that for anything: you could do that for the standard deviation, you can do it for the*0281

*interquartile range, you can make the sampling distribution of anything you want.*0286

*That is important to keep in mind as we go into the F distribution.*0290

*Okay so first thing is what is the F statistic?*0293

*We know how to calculate the T statistic and the Z statistic, what is the F statistic?*0301

*Well, later on in these lessons we are going to come across what we call the ANOVA, the analysis of variance.*0307

*Analyze means to break down, and variance is, well, you know what variance is: the spread, usually*0316

*around the mean, of your data set. So when we analyze variance, we are going to be breaking down*0325

*variance into its multiple components, and the F ratio happens to be a ratio of those component variances.*0332

*And so I just want you to get sort of the big idea behind the F ratio, not exactly how to calculate it; we will get*0343

*into the details of that later on, but the general concept.*0352

*So the F statistic usually is this idea that we have, let us say, two samples: x1, x2, x3 and y1, y2, y3.*0356

*Now there is always some variation within the sample; within the Xs there is some variation.*0370

*And within the Ys there is some variation.*0379

*So there is definitely some variation but there is another variation here that we are really interested in.*0385

*We are really interested in the difference between these two things.*0392

*Between these two samples. So the F statistic really is taking those ideas and turning them into a ratio, and here is what that ratio looks like.*0397

*It is really the between sample variance all over the within sample variance.*0408

*And remember, variance is always squared, the average squared distance away from the mean, and so because*0425

*of that this is a squared number, this is a squared number, they are both positive, so this number is always going to be greater than zero.*0433

*There is no way that this number could be less than zero so the statistic is always going to be greater than zero.*0442

*Now another way to think about between sample variance and within sample variance is this.*0449

*Whenever we do these kinds of tests, we are really interested in the differences between the samples; that is really important to us.*0454

*But part of that difference is also going to be just inherent variation.*0464

*So sometimes there might be a difference between, let us say, men and women, or people who got a*0478

*tutorial versus people who did not, right?*0486

*People who studied for the test versus people who did not, people who went to private school versus people who went to public school.*0488

*There might be some difference between them.*0495

*But that difference is also going to have variation.*0497

*So this between sample variance is made up of inherent variation, just variance you cannot do anything about:*0500

*inherent variation plus the real difference, the effect size, between samples.*0508

*And notice that we keep using this word between, and that is to indicate that this part, the between part, is the part that we are really interested in.*0520

*That goes over the within sample variance, and here there is inherent variation within the Xs and within the Ys, and that*0534

*is not something we are interested in, but it is good to know how variable our little samples are.*0557

*Is everyone very similar to each other, or very different? We need to compare the difference between the samples to the difference within the samples.*0565

*So the within sample variation is just inherent variation.*0574
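
Written out as a conceptual formula, the two descriptions of the ratio given so far look roughly like this. It is a picture of the idea only, not the computational formula, which the lecture leaves for the ANOVA lessons.

```latex
F \;=\; \frac{\text{between-sample variance}}{\text{within-sample variance}}
  \;=\; \frac{\text{inherent variation} + \text{real difference (effect) between samples}}
             {\text{inherent variation}}
```

Read this way, if there is no real difference between the samples, the top and bottom both reflect only inherent variation and F should land somewhere around one.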

*So these are all different ways of seeing the same thing, and the reason why I want to say it like this is*0583

*because later on we are not just going to be talking about between sample and within sample differences, we are going to add onto those ideas.*0593

*The final way I want you to sort of think about the F statistic is basically this.*0601

*Ultimately in hypothesis testing, we are going to want to know about differences between samples; that is the thing that we are really interested in.*0608

*So it is going to be the variation that we want to explain because that is the reason that we did our research in the first place.*0616

*All over the variation we cannot explain, not with this design at least.*0631

*So in our experimental design we will have these two groups and hopefully these groups will be similar to*0646

*each other but different, similar within the group but different between the groups.*0653

*And that is why in an F statistic we want this variation that we want to explain to be quite large, and this*0660

*variation that we cannot explain or do anything about, that just comes along for the ride, we want to be relatively small.*0667

*Okay, so let us do a little bit of thinking about the F ratio.*0676

*Now if we had a very big difference between the groups what kind of F ratio would we have?*0679

*Would it be greater than one or less than one?*0688

*Well, if our variation between the groups is bigger than the variation within the groups, then we should have*0690

*a very large F, so that should be an F that is greater than one, at least greater than one but maybe a lot greater than one; it could be 2 over 1 or 2 over .5.*0697

*Any of those values would show that the between sample variance is a lot larger than the within sample variance.*0708

*And if there is a lot of within sample variance, then that competes with the between sample variance, so*0715

*let us say there is a big between sample difference but there are also a lot of differences within the*0728

*samples themselves; then it sort of evens out and you might see an F that is smaller, or even less than one if this one is bigger than this one.*0734

*So that is how you could sort of think about the F statistic.*0745
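
To make the greater-than-one versus less-than-one thinking concrete, here is a minimal sketch of the ratio for two samples. The lecture defers the actual calculation to the ANOVA lessons, so this assumes the standard one-way ANOVA mean squares. The function name `f_ratio` and the tiny data sets are invented for illustration.

```python
import statistics

def f_ratio(x, y):
    """Between-sample variance over within-sample variance for two samples.

    Assumes the standard one-way ANOVA mean squares (not defined in the lecture).
    """
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    grand_mean = statistics.fmean(x + y)
    # Between: how far each sample mean sits from the grand mean.
    ss_between = len(x) * (mean_x - grand_mean) ** 2 + len(y) * (mean_y - grand_mean) ** 2
    # Within: how far each value sits from its own sample mean.
    ss_within = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    df_between = 2 - 1                 # number of samples minus 1
    df_within = len(x) + len(y) - 2    # total values minus number of samples
    return (ss_between / df_between) / (ss_within / df_within)

# Same gap between the sample means (5 versus 8) in both cases.
tight = f_ratio([4, 5, 6], [7, 8, 9])     # little variation within each sample
spread = f_ratio([1, 5, 9], [4, 8, 12])   # a lot of variation within each sample

print(tight)   # 13.5
print(spread)  # 0.84375
```

The gap between the groups is identical in both cases. Only the within-sample variation changes, and that alone moves F from well above one to below one.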

*Now imagine getting that F statistic over and over and over again from the population and plotting a sampling distribution of F statistics.*0748

*What would you get?*0761

*Well, remember that F cannot go below zero because both numbers are going to be positive so the F really stops at zero.*0763

*But this is what the distribution of F statistics ends up looking like.*0774

*This is a skewed distribution and it has a positive tail.*0778

*That means it goes for a really long time on the positive side.*0786

*It is one-sided, so it is not symmetrical; it is actually asymmetrical, there is only a positive side, and it is*0792

*because it is a ratio of variances, and variances are positive.*0803

*And like T, F is a family of distributions, and you are going to be able to find the particular F distribution you are*0810

*working with by looking at the degrees of freedom in the numerator, the one about between sample*0819

*differences, and by looking at the degrees of freedom in the denominator, the sort of leftover or within sample variation.*0829

*So you are going to need both of those numbers in order to find out which F distribution you are working with*0847

*and in Excel, it will actually ask you for the degrees of freedom for the numerator and denominator.*0854
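
The shape described here can be checked with a simulation of the null situation, where both samples come from the same population. This is a sketch under assumptions: a standard normal population, 10 values per sample, and the standard one-way ANOVA mean squares and degrees of freedom (1 in the numerator and 18 in the denominator for this setup). The function names are hypothetical.

```python
import random
import statistics

def f_ratio(x, y):
    """Between-sample variance over within-sample variance (standard ANOVA mean squares)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    grand_mean = statistics.fmean(x + y)
    ss_between = len(x) * (mean_x - grand_mean) ** 2 + len(y) * (mean_y - grand_mean) ** 2
    ss_within = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    return (ss_between / 1) / (ss_within / (len(x) + len(y) - 2))

def simulate_null_f(n_per_group, reps, seed=0):
    """F statistics for pairs of samples drawn from one and the same population."""
    rng = random.Random(seed)
    f_values = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n_per_group)]
        y = [rng.gauss(0, 1) for _ in range(n_per_group)]
        f_values.append(f_ratio(x, y))
    return f_values

# Two samples of 10: numerator df = 2 - 1 = 1, denominator df = 20 - 2 = 18.
f_values = simulate_null_f(n_per_group=10, reps=5_000)

lowest = min(f_values)                    # never below zero
middle = statistics.median(f_values)
average = statistics.fmean(f_values)      # pulled above the median by the long right tail
highest = max(f_values)
print(round(lowest, 4), round(middle, 2), round(average, 2), round(highest, 2))
```

A mean that sits above the median, with a maximum far out to the right, is the positive skew described above.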

*Now let us talk a little bit about what Alpha means here.*0861

*Here we will still need a cutoff point, so a critical F instead of a critical T or Z.*0866

*You will still need a critical F, and the Alpha will still be our probability of making a false alarm given that the null hypothesis is true.*0877

*This is the null F distribution just saying.*0890

*And the Alpha would be the same thing the probability of false alarm.*0894

*So now that you know how to sort of picture that Alpha, let us talk about what that Alpha actually means.*0899

*If you go back to the original idea for that Alpha, the original idea is that cutoff level.*0910

*So it is our level of tolerance for false alarms.*0924

*It is the false alarm probability that we will tolerate, and that is what we want.*0930

*We want Alpha to be very low.*0945

*Now our Alpha will be low, that is, a smaller Alpha than this one, if our critical F is very big.*0948

*And what does it mean for F to be large?*0962

*This means our between sample variability is greater than our within sample variability.*0964

*And that is what it means and so as long as this is much larger than this, we have a large F and that is going*0984

*to mean a smaller chance of false alarm.*0992

*Now the Alpha is the cutoff level that we are going to set as the significance, the level that we will tolerate.*0998
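
One way to picture the link between Alpha and the critical F is to read the cutoff off a simulated null distribution. This is a sketch only, assuming a standard normal population, 10 values per sample, and the standard ANOVA mean squares. In practice the critical F comes from a table or software rather than a simulation, and the function names here are hypothetical.

```python
import random
import statistics

def f_ratio(x, y):
    """Between-sample variance over within-sample variance (standard ANOVA mean squares)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    grand_mean = statistics.fmean(x + y)
    ss_between = len(x) * (mean_x - grand_mean) ** 2 + len(y) * (mean_y - grand_mean) ** 2
    ss_within = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    return (ss_between / 1) / (ss_within / (len(x) + len(y) - 2))

def simulate_null_f(n_per_group, reps, seed=0):
    """F statistics for pairs of samples drawn from one and the same population."""
    rng = random.Random(seed)
    return [
        f_ratio([rng.gauss(0, 1) for _ in range(n_per_group)],
                [rng.gauss(0, 1) for _ in range(n_per_group)])
        for _ in range(reps)
    ]

def critical_f(null_f_values, alpha):
    """Cutoff that only about a share `alpha` of the null F statistics exceed."""
    ordered = sorted(null_f_values)
    index = min(int((1 - alpha) * len(ordered)), len(ordered) - 1)
    return ordered[index]

null_f = simulate_null_f(n_per_group=10, reps=10_000)
crit_05 = critical_f(null_f, alpha=0.05)
crit_01 = critical_f(null_f, alpha=0.01)

# Share of null F values beyond the cutoff: the false alarm rate, close to alpha.
false_alarm_rate = sum(f > crit_05 for f in null_f) / len(null_f)
print(round(crit_05, 2), round(crit_01, 2), false_alarm_rate)
```

The smaller Alpha gives the larger cutoff, which is the sense in which a very big critical F goes with a low false alarm probability.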

*So what is the P value?*1007

*So the P value will be, given our sample's F, the probability that we would get this F or higher by chance in this distribution.*1009

*So given our sample's F, the idea is the probability, the false alarm probability, for F*1030

*values, F statistics, that are equal to or more extreme than the F from our sample.*1058

*So the probability that we would get an F greater than or equal to the one that we got, the F from the sample.*1080

*So this is the P value once we have our sample statistic, and this, the Alpha, is the probability of false alarm that we are willing to tolerate.*1087

*So it is the same idea as T statistics, the alpha, the P value and T statistics, we are just now applying it to a slightly different looking distribution.*1101
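
The P value idea can be sketched the same way: count how often the null distribution produces an F at least as large as the one from the sample. The data, the normal null population, and the function names below are assumptions made for illustration, and real software computes this probability from the exact F distribution instead of simulating it.

```python
import random
import statistics

def f_ratio(x, y):
    """Between-sample variance over within-sample variance (standard ANOVA mean squares)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    grand_mean = statistics.fmean(x + y)
    ss_between = len(x) * (mean_x - grand_mean) ** 2 + len(y) * (mean_y - grand_mean) ** 2
    ss_within = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    return (ss_between / 1) / (ss_within / (len(x) + len(y) - 2))

def simulate_null_f(n_per_group, reps, seed=0):
    """F statistics for pairs of samples drawn from one and the same population."""
    rng = random.Random(seed)
    return [
        f_ratio([rng.gauss(0, 1) for _ in range(n_per_group)],
                [rng.gauss(0, 1) for _ in range(n_per_group)])
        for _ in range(reps)
    ]

# Hypothetical sample data: two samples of three values each.
observed_f = f_ratio([4, 5, 6], [7, 8, 9])

# Null distribution for the same design (3 values per sample).
null_f = simulate_null_f(n_per_group=3, reps=10_000)

# P value: share of null F statistics equal to or more extreme than ours.
p_value = sum(f >= observed_f for f in null_f) / len(null_f)

alpha = 0.05  # the false alarm probability we decided to tolerate beforehand
print(observed_f, p_value, p_value < alpha)
```

Alpha is set before looking at the data, while the P value comes from the sample's F. Comparing the two is what the hypothesis test does.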

*Now examples.*1112

*Why does the F distribution stop at zero but go on in the positive direction until infinity?*1117

*Well, we know why it stops at zero.*1122

*The F distribution is a ratio of two positive numbers and we know that they are positive because variance is squared, thus making it always positive.*1125

*But it goes on until infinity because there is no rule that says you can only be this much bigger in the*1148

*numerator than in the denominator, so the numerator can be infinitely bigger than the denominator; it could go on forever and ever.*1159

*Example 2: in an F test, also called the one-way ANOVA, which we are going to talk about in a little bit,*1168

*you did an F test and the P value is .034. What is the best interpretation of this result?*1177

*It is plausible that all the samples are roughly equal.*1186

*So here we are thinking about, let us say, two samples, and we need this versus this.*1191

*So the F value is between variation over within variation and if we have a big F value, if we have a big*1203

*enough F value, so sample F then we can have a small P value .034.*1229

*So is it plausible that all the samples are roughly equal?*1239

*No because we seem to have a large enough between sample variance so I would say no to that one.*1247

*It is possible that all the sample variances are roughly equal.*1256

*Well, that also is not necessarily what this means it could be that these within variations are very similar to*1261

*each other but that is not what this P value is talking about.*1269

*The within sample variation is much larger than the between sample variation.*1272

*Well, if that were true we would have a small F. Instead, it is this one.*1278

*The between sample variation is much larger than the within sample variation.*1283

*So D is our answer.*1286

*Example 3, consider the height of the following pairs of samples.*1288

*Which will have the largest F?*1295

*Which will have the smallest F?*1297

*Okay let us think about this.*1299

*So players from the NBA team the Lakers versus adults in LA.*1301

*Well, if we draw those two populations, Lakers versus LA.*1306

*This probably has a lot of variance, a lot of variance here, that is a lot of people; this probably has a very*1313

*small variance, but there is probably a pretty sizable difference between those two groups of people, right, like the*1321

*average adult versus the Lakers, who are probably all amazingly tall.*1330

*Well so that is the picture here.*1335

*Will this have a larger F, or will this have a smaller F?*1338

*Well, what about adults in San Francisco versus adults in LA.*1341

*Well, these two probably both have a lot of within sample variation; there are lots of adults in San Francisco, lots of*1348

*adults in LA, they are all different from each other, but their averages should probably be similar. It is*1355

*not like San Francisco is known for its tall people or LA is known for its tall people, so this difference*1362

*between the groups will probably be very small but the within group variability will be very large so I would*1368

*guess this would have actually a pretty small F, and what about this one.*1375

*This one is players from the NBA team the Lakers versus players from another team, and so here we might think*1381

*Lakers, Clippers, and there is probably pretty small variation here; probably everybody is like about 6 feet*1393

*tall, and so they are probably all super tall, so there is not a lot of variation, but they are also probably similar across the teams too.*1401

*So the average height on the Lakers is probably similar to the average height on the*1416

*Clippers just that they are both tall groups of people so which one of these will probably have the largest F?*1423

*I think the biggest difference between the groups might actually be this one.*1430

*So I would guess I would go with this one, given that I am not really sure about the variance here.*1436

*The variance is smaller but I am not sure how to compare these so far.*1447

*So this is the largest F and I am just going to go by having the largest numerator for sure.*1452

*Well, which will have the smallest F?*1460

*For the smallest F, I would probably go with this one, because not only does it have a small numerator but it*1464

*has an extremely large denominator, so I would say this one would definitely have the smallest F.*1472

*So that is the end of F distributions.*1478

*See you next time for ANOVAs on educator.com.*1483
