
Repeated Measures ANOVA

  • Intro 0:00
  • Roadmap 0:05
    • Roadmap
  • The Limitations of t-tests 0:36
    • Who Uploads more Pictures and Which Photo-Type is Most Frequently Used on Facebook?
  • ANOVA (F-test) to the Rescue! 5:49
    • Omnibus Hypothesis
    • Analyze Variance
  • Independent Samples vs. Repeated Measures 9:12
    • Same Start
    • Independent Samples ANOVA
    • Repeated Measures ANOVA
  • Independent Samples ANOVA 16:00
    • Same Start: All the Variance Around Grand Mean
    • Independent Samples
  • Repeated Measures ANOVA 18:18
    • Same Start: All the Variance Around Grand Mean
    • Repeated Measures
  • Repeated Measures F-statistic 21:22
    • The F Ratio (The Variance Ratio)
  • S²bet = SSbet / dfbet 23:07
    • What is This?
    • How Many Means?
    • So What is the dfbet?
    • So What is SSbet?
  • S² resid = SS resid / df resid 25:46
    • What is This?
    • So What is SS resid?
    • So What is the df resid?
  • SS subj and df subj 28:11
    • What is This?
    • How Many Subject Means?
    • So What is df subj?
    • So What is SS subj?
  • SS total and df total 31:42
    • What is This?
    • What is the Total Number of Data Points?
    • So What is df total?
    • So What is SS total?
  • Chart of Repeated Measures ANOVA 33:19
    • Chart of Repeated Measures ANOVA: F and Between-samples Variability
    • Chart of Repeated Measures ANOVA: Total Variability, Within-subject (case) Variability, Residual Variability
  • Example 1: Which is More Prevalent on Facebook: Tagged, Uploaded, Mobile, or Profile Photos? 40:25
    • Hypotheses
    • Significance Level
    • Decision Stage
    • Calculate Samples' Statistic and p-Value
    • Reject or Fail to Reject H0
  • Example 2: Repeated Measures ANOVA 58:57
  • Example 3: What's the Problem with a Bunch of Tiny t-tests? 1:13:59

Transcription: Repeated Measures ANOVA

Hi, welcome to

Today we are going to talk about repeated measures ANOVA.

The repeated measures ANOVA is a lot like the regular one-way independent samples ANOVA that we have been talking about, but it is also a lot like the paired samples t-test. So we are going to talk about why we need the repeated measures ANOVA, contrast the independent samples ANOVA with the repeated measures ANOVA, and finally break the repeated measures F statistic down into its component variance parts.

Previously, when we talked about one-way ANOVA, we started with why we needed it in the first place: the t-test is limited.

We looked at this example: who uploads more pictures, Latino, white, Asian, or black Facebook users? When we saw that problem and thought about using independent samples t-tests, we realized we would have to do a whole bunch of little t-tests.

Now let us look at this problem. It is similar in some ways, but it is also a little bit different. Here is the question: which photo type is most frequently used on Facebook? Tagged photos, uploaded photos, mobile uploads, or profile pictures?

Just as the first problem has many groups, this problem also has many groups, and one thing you can see immediately is that if we try to use t-tests here, we would also have to use a bunch of little t-tests.

But here is another thing: these variables are actually linked to one another. People who have a lot of tagged photos often also have a lot of uploaded photos, and people who have a lot of mobile uploads often also have a lot of profile pictures.

In the first problem, the data were made up of four separate groups of users, and a user in one group is not linked to any of the users in the other groups (Latino, white, Asian, black). Here we also have four sets of data (tagged, uploaded, mobile, and profile pictures), but a person's number of tagged photos is linked to their number of uploaded photos, because both scores come from the same person. Maybe this person owns a digital camera that they really love and carry around everywhere.

So the scores in these different groups are linked to each other. These are what we previously called dependent samples, or paired samples, because at that time there were only two groups. Now we have four groups, but the same linking principle still holds: we have multiple samples, more than two, and those samples are linked to each other in some way.

Because of that, these are called repeated measures: we are measuring something over and over again, measuring photos here, here, here, and here. It is very similar to the idea of paired samples, except that now we are talking about more than two samples: three, four, five, and so on. So we have the same problem here as we did in the first example.

If our solution is a bunch of t-tests, we have two problems, whether they are paired t-tests or independent samples t-tests. In this case they would be paired, and even with paired t-tests we run into the same problems as before.

The first problem is that with so many t-tests, the probability of false alarms goes up. Every time we test a null hypothesis at α = .05, we accept a 5% chance of rejecting it by mistake when it is actually true (a Type I error), so every additional test compounds that risk.
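To put a rough number on that compounding, the short Python sketch below counts the pairwise t-tests needed for four photo types and computes the chance of at least one false alarm when every null hypothesis is true. It assumes, as a simplification, that the six tests are independent of each other; paired tests on the same users are not, so treat the result as an approximation rather than an exact error rate.

```python
from math import comb

alpha = 0.05
k = 4                                # four photo types
n_tests = comb(k, 2)                 # one t-test per pair of groups: 6 tests

# Probability of at least one false alarm across all tests, assuming
# (as a simplification) that the tests are independent.
familywise = 1 - (1 - alpha) ** n_tests
print(n_tests, round(familywise, 3))  # 6 0.265
```

So instead of a 5% chance of a false alarm, the whole family of tests carries roughly a 26% chance.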

The second thing that is wrong with doing a whole bunch of little t-tests instead of one big test is that we ignore some of the data when we estimate the population standard deviation. When we estimate the population standard deviation, the more data the better, but if we only look at two of the samples at a time, we ignore the other two perfectly good sets of data instead of using them to estimate the population standard deviation more accurately. So we get a poorer estimate of the standard deviation, because we are not using all the data at our disposal.

That is the problem, and we need to find a way around it. Thankfully, Ronald Fisher comes to the rescue with his F-test.

So ANOVA is our general solution to the problem of too many tiny t-tests, but so far we have only talked about ANOVAs for independent samples. Now we need an ANOVA for repeated measures.

An ANOVA always starts the same way, with the omnibus hypothesis: one hypothesis to rule them all. The omnibus null hypothesis says that all the samples come from the same population: the mean (μ) of the first group of photos equals the μ of the second group, which equals the μ of the third group, which equals the μ of the fourth group.

The alternative hypothesis is not that all the means are different from each other, but that at least one is different, an outlier. We write it as: the μ_P, the means for the different photo types, are not all the same. Keep that logic in mind: "the μ's are not all the same" is not the same as "all of the μ's are different from each other." If even one of them is different, the alternative hypothesis is true.

So the repeated measures ANOVA starts off much the same way as the independent samples ANOVA, and from there we go on to analyze variance. Here we are going to use the F statistic again.

Ronald Fisher's big idea was that F is a ratio of variances. One way of thinking about it is the ratio of the between-sample (between-group) variability over the within-sample variability. Another way of thinking about it is the variability we are interested in over the variability we cannot explain. By "interested in" I do not just mean that we are passionate about it or find it very curious; I mean the variability we are making a hypothesis about. We do not know where the other variability comes from; it just exists, and we have to deal with it.

The repeated measures version of F rests on the same concept, although there are going to be some subtle differences.

So let us compare the independent samples ANOVA with the repeated measures ANOVA. Both have the same start: they have the same hypotheses, and they share the idea of taking all the variance in our sample and breaking it down into component parts.

When we talk about all the variance in our sample, we really mean the sum of squares total: the total amount of variability away from the grand mean in our entire data set. From that sentence alone we can work out the formula. For every single data point, take its distance from the grand mean (written X̿, with two bars), square that distance, and add up all the squares with a Σ. The Σ runs over N total, so it covers every single data point in every sample, not just the data points in one sample:

SS total = Σ (X − X̿)², summed over all N total data points.
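The SS total formula above can be sketched in a few lines of Python. The function name `ss_total` and the 4-subject by 3-condition table are hypothetical, made up for illustration; they are not the Facebook data from the lecture.

```python
def ss_total(data):
    """Sum of squared deviations of every score from the grand mean.

    `data` is a list of rows: one row per subject, one column per condition.
    """
    scores = [x for row in data for x in row]   # flatten: all N total scores
    grand_mean = sum(scores) / len(scores)
    return sum((x - grand_mean) ** 2 for x in scores)


# Hypothetical data: 4 subjects (rows) measured under 3 conditions (columns).
data = [
    [1, 4, 7],
    [3, 5, 7],
    [2, 5, 5],
    [6, 6, 9],
]
print(ss_total(data))  # 56.0
```

The grand mean of these twelve scores is 5, so each term is simply (score − 5)².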

So that is the same idea to begin with. Now we take SS total and break it down into its component parts.

In independent samples, all of the variability we are unable to explain lies within the groups, and all of the variability we are interested in lies between the groups. So for independent samples the story is that SS total is a conglomeration: when you split it into its parts, it is made up of the sum of squares within the groups plus the sum of squares between the groups:

SS total = SS within + SS between.

Because of that, the F statistic here becomes the variance between over the variance within, F = S² between / S² within, and each of these variances comes from its own sum of squares.
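A minimal sketch of the independent samples partition, using made-up numbers: SS between weights each group mean's squared distance from the grand mean by group size, SS within collects each score's squared distance from its own group mean, and F divides each by its degrees of freedom (K − 1 and N total − K). The helper `independent_anova` is illustrative, not a library API.

```python
def independent_anova(groups):
    """One-way independent samples ANOVA: returns (SS between, SS within, F)."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    k, n_total = len(groups), len(scores)

    # Between: each group mean's squared distance from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within: each score's squared distance from its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    f = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    return ss_between, ss_within, f


# Hypothetical data: three unlinked groups of 4 scores each.
groups = [[1, 3, 2, 6], [4, 5, 5, 6], [7, 7, 5, 9]]
ss_b, ss_w, f = independent_anova(groups)
print(ss_b, ss_w, round(f, 3))  # 32.0 24.0 6.0
```

Here SS between (32) plus SS within (24) equals the SS total of 56 for these twelve scores.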

In the repeated measures ANOVA we are talking about something slightly different, because now we have linked data. In independent samples, the samples are not linked to each other in any way. Here, the samples are linked, whether because the scores come from the same subject, the same class, or something else that ties them together.

So there is still variability across the groups, just like before (between-group variability), and variability within the groups. But now we have a new kind of variability: the variability caused by these linkages. The subjects are all different from each other, but each one may be fairly consistent across the measures. A person who owns a digital camera might have an enormous number of photos across the board; a person who does not have a digital camera, or even a smartphone, might have a low number of photos across the board. These are often called individual differences. We can quantify them mathematically and explain where they come from, but they are not what this study is about; we are interested in the between-group differences.

But this is not all. Once you have taken out this individual variability, there is still some residual within-group variability left over. That is the stuff we really cannot explain: it is not caused by individual differences, and it is not between-group variability; it is just leftover within-group difference.

So in repeated measures, the sum of squares total breaks down slightly differently, even though it is still the same idea of breaking down SS total. Now it splits into three parts: the sum of squares for subjects (SS subj, the individual linkages, shown in yellow); the within-group sum of squares, which we now call residual because we have taken out the variability that comes from individual differences and only the leftover remains ("residual" literally means left over); and the sum of squares between, which is what we are actually interested in:

SS total = SS subj + SS resid + SS between.

To recap: SS subj is variability we can explain but are not interested in, SS resid is variability we cannot explain, and SS between is variability we are very interested in. So our F statistic becomes the between variability divided by the residual variability, F = S² between / S² resid. We deliberately take SS subj out of the equation for F: F does not count the variability from the subjects' individual differences, because we are not interested in it.
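To check this partition numerically, the sketch below uses a hypothetical table in which each row is one subject measured under three conditions. Unlike the lecture, which finds SS resid by subtraction, this sketch computes it directly from each score's leftover (score minus subject mean minus condition mean plus grand mean), so SS total = SS subj + SS resid + SS between is verified rather than assumed. `rm_partition` is a hypothetical helper name.

```python
def rm_partition(data):
    """Split SS total of a subjects-by-conditions table into between, subjects, residual."""
    n, k = len(data), len(data[0])             # n subjects, k repeated measures
    grand = sum(map(sum, data)) / (n * k)
    subj_means = [sum(row) / k for row in data]
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_between = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    # Residual computed directly: what is left of each score after removing
    # its subject effect and its condition effect.
    ss_resid = sum(
        (data[i][j] - subj_means[i] - cond_means[j] + grand) ** 2
        for i in range(n) for j in range(k)
    )
    return ss_total, ss_between, ss_subj, ss_resid


data = [[1, 4, 7], [3, 5, 7], [2, 5, 5], [6, 6, 9]]   # hypothetical: 4 subjects x 3 conditions
total, between, subj, resid = rm_partition(data)
print(total, between, subj, resid)          # 56.0 32.0 18.0 6.0
print(total == between + subj + resid)      # True

n, k = len(data), len(data[0])
f = (between / (k - 1)) / (resid / ((k - 1) * (n - 1)))
print(f)                                    # 16.0
```

If the same twelve scores were treated as independent groups, SS subj would stay in the error term (SS within = 18 + 6 = 24 on 9 degrees of freedom), and F would drop from 16 to 6. Taking individual differences out of the denominator is what makes the repeated measures F more sensitive here.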

I also want to show you this as a picture. Here is what we mean by independent samples. The independent samples ANOVA always starts off with the same idea: take the difference between each data point and the grand mean, square it, and add all of those up. That is the total sum of squares.

In independent samples, we take all of this variability, SS total, and break it up into the between-group variability, the differences of the group means from the grand mean (SS between), plus the within-group variability, the variability we have no explanation for (SS within).

So it only makes sense that the variability between divided by the variability within is what we use to get the ratio of the variability we are hypothesizing about to the variability we cannot account for. That ratio is the F statistic we are so interested in.

Now for the repeated measures ANOVA. Once again we start off the same way: for every single data point we take its squared distance from the grand mean and add them all up. As a picture, that is the distance of all of these points from the grand mean, SS total.

Then we break it up into its component parts. Just like before, we have the differences between the groups, SS between, and that is the stuff we are really interested in, so it is going to be a factor in F. But then we break up the rest of the variance into one part that we can explain, or account for, and the rest, the residuals, that we cannot explain.

Even though we are not interested in it, we can account for some of the variability. You can think of it across the rows: notice that person one has fewer photos across the board, and person three just has more photos across the board. Those are individual differences, subject-level differences, that we do not want in our F statistic. It is variability whose source we know, but we are not interested in it for our hypothesis test. That is SS subj, highlighted in yellow: variability we can explain but that is not part of our hypothesis test.

So what variability are we left with? Any leftover variability, which we call residual variability, SS resid. When we form the ratio of the variability we are interested in to the variability we cannot explain, we do not include SS subj; we only use SS resid. So F is the variability between groups divided by the residual variability.

Now let us break it down even further. You already know basically what the repeated measures F statistic is: the variability between groups divided by the residual (leftover within-group) variability. Each of these breaks into its component parts: F is the sum of squares between over the degrees of freedom between, all divided by the sum of squares residual over the degrees of freedom residual:

F = (SS between / df between) / (SS resid / df resid).

So far this looks like what we have always done, with each variance being a sum of squares over its degrees of freedom, but now we have to figure out how to actually find these things. In fact, you already know the numerator, because it is exactly the same as in the independent samples ANOVA. The only thing that is really different is the denominator.

So let us start with the numerator. When we double-click on the between variability, what we find inside is SS between / df between, and then we double-click on each of those to see what is inside.

Conceptually, the variability between groups is the difference of the sample means from the grand mean, because we want to know how each sample differs from the grand mean.

Now let us think about how many means we have, because that determines our degrees of freedom. The number of means is K, the number of samples, so the degrees of freedom between is K − 1. One way to think about it: there are as many means as there are groups, so four groups give four means and three groups give three means. With three groups, if we knew two of the means and the grand mean, we could figure out the third (the groups are all the same size in repeated measures, since every subject appears in every group). So our degrees of freedom is K − 1, the number of means minus 1.

So what is the sum of squares between? It captures the difference of the sample means from the grand mean. We have a whole bunch of sample means, so we put an index i on them, take each sample mean's distance from the grand mean, and square it, because this is a sum of squares. Each mean's distance should count more if that sample has a lot of members (it should get more votes), so we multiply by n_i, the number of scores in that sample. The index i stands for each group, so the Σ runs from i = 1 through K, cycling through group 1, group 2, group 3, and group 4:

SS between = Σ n_i (X̄_i − X̿)², with i running from 1 to K.

This is exactly the same as in the independent samples ANOVA.
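Here is a sketch of SS between and df between in Python, with hypothetical samples of equal size (one list per condition). The last lines illustrate the K − 1 argument: with equal sample sizes the grand mean is the average of the sample means, so knowing two of three means plus the grand mean determines the third. The function name is illustrative only.

```python
def ss_between(samples):
    """SS between = sum over the K samples of n_i * (sample mean - grand mean)^2."""
    scores = [x for s in samples for x in s]
    grand = sum(scores) / len(scores)
    return sum(len(s) * (sum(s) / len(s) - grand) ** 2 for s in samples)


# Hypothetical samples: one list per condition (K = 3), 4 scores each.
samples = [[1, 3, 2, 6], [4, 5, 5, 6], [7, 7, 5, 9]]
k = len(samples)
df_between = k - 1
print(ss_between(samples), df_between)   # 32.0 2

# Why K - 1: with equal sample sizes the grand mean is the average of the
# sample means, so two means plus the grand mean pin down the third.
grand, m1, m2 = 5.0, 3.0, 5.0
print(k * grand - m1 - m2)               # 7.0
```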

Now we have to figure out the new piece: the sum of squares residual and the degrees of freedom residual, which we need in order to find the residual variability, the leftover variability. It is whatever spread within the groups is not accounted for by within-subject variation.

"Within subject" might mean within each person, but it might also mean within each hamster or each company being measured repeatedly. Whatever your case is, whether an animal, a human, or some other entity, its variation is considered within-subject variability. Those subjects are all slightly different from each other, but that is not something we are interested in, so we take it out and keep the leftover variability.

Because it is a leftover, we cannot find this sum of squares directly; we have to find what is left. We do that by taking the total sum of squares and subtracting out what we do not need, namely the sum of squares between and the sum of squares for subjects (the within-subject variability):

SS resid = SS total − SS between − SS subj.

So besides SS between, which we already know how to find, we need the sum of squares for everybody (the total) and the sum of squares for subjects. SS resid is literally whatever is left over.

In the same way, to find the degrees of freedom residual we need the other degrees of freedom. We multiply the degrees of freedom between by the degrees of freedom for subjects, and that gives us all the degrees of freedom that are left over:

df resid = df between × df subj.
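The leftover logic translates directly into code. This sketch takes the three sums of squares from a hypothetical 4-subject by 3-condition table as given (the values are made up for illustration) and returns SS resid by subtraction and df resid as the product df between × df subj. `residual_terms` is a hypothetical helper.

```python
def residual_terms(ss_total, ss_between, ss_subj, k, n):
    """Residual SS by subtraction and residual df as a product (repeated measures).

    k = number of conditions (samples), n = number of subjects.
    """
    ss_resid = ss_total - ss_between - ss_subj   # whatever is left over
    df_between = k - 1
    df_subj = n - 1
    df_resid = df_between * df_subj
    return ss_resid, df_resid


# Sums of squares from a hypothetical table with 4 subjects and 3 conditions.
ss_resid, df_resid = residual_terms(ss_total=56.0, ss_between=32.0, ss_subj=18.0, k=3, n=4)
print(ss_resid, df_resid)   # 6.0 6
```

With floating-point data, a subtraction like this can leave a tiny nonzero (or even slightly negative) residual when the true leftover is zero, which is worth keeping in mind before dividing by it.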

To find the sum of squares residual, we have to find these other sums of squares, so here is the sum of squares for subjects. The notion here is subject-level, or case-level, variation. Each case differs a little from the other cases, for who knows what reason, but we can account for it. It is not totally unexplained: we may not know why it exists, but we know it exists because the subjects are all slightly different from each other, we know what it is, and we can calculate it.

Conceptually, think of it as how far each subject's mean is from the grand mean. Remember that in repeated measures we measure each subject or case multiple times, so if I am a Facebook user, I contribute four different scores to this problem. You could compute a little mean just for me, the mean of my four scores, and that is my subject mean. Each subject has their own little mean, and we want the distance of those little means from the grand mean.

So how many subject means do we have? We have N subjects, where N is the number of scores in each sample, our sample size. So the degrees of freedom for subjects is N − 1.

What is the sum of squares for subjects? We need a way to write the subject-level mean, so I write a mean with an index i, and the Σ tells us that i goes from 1 up to N, the sample size. For each subject mean we take its squared distance from the grand mean. We should also take into account how many times each subject is measured, which is K times: the number of samples, that is, the number of times the measure is repeated. The more times a subject participates, the more this variation counts:

SS subj = Σ K (X̄_i − X̿)², with i running from 1 to N, where X̄_i is the mean of subject i.
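The subject-level sum of squares can be sketched the same way: compute one little mean per subject (one per row), take each one's squared distance from the grand mean, and weight it by K, the number of times each subject was measured. The table and the name `ss_subjects` are hypothetical.

```python
def ss_subjects(data):
    """Return (SS subj, df subj) for a table with rows = subjects, columns = measures.

    SS subj = K * sum over the N subjects of (subject mean - grand mean)^2.
    """
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    subject_means = [sum(row) / k for row in data]   # one little mean per subject
    return k * sum((m - grand) ** 2 for m in subject_means), n - 1


data = [[1, 4, 7], [3, 5, 7], [2, 5, 5], [6, 6, 9]]  # hypothetical: rows = subjects
print(ss_subjects(data))   # (18.0, 3)
```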

So there we have subject-level variation. We are really only finding it so that we can find SS resid, and to do that we also have to find the sum of squares total and the degrees of freedom total. We have gone over these already, but just to drive it home: the only reason we want them here is to find the sum of squares residual.

Conceptually, this is just the total variation of all the data points away from the grand mean. What is the total number of data points? It is N total, every single data point counted up, which is the sample size N times the number of samples K. If we had 30 people participating in four different measures, that is 30 × 4 = 120, so N total = NK.

So what is the degrees of freedom total? It is N total − 1, or, with exactly the same numerical value, NK − 1.

And what is the sum of squares total? We have already been through it; this is what we always start off with, at least conceptually. For every single data point (notice there is no bar on X, because these are not means but literally every data point), take the distance from the grand mean and square it. We can put NK on the Σ to say: do this for every single data point, and do not leave one behind:

SS total = Σ (X − X̿)², over all NK data points.
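For the lecture's 30 people measured four times, the counts work out as below. The last lines are an extra consistency check that goes slightly beyond the lecture: the between, subject, and residual degrees of freedom should add up to df total, just as their sums of squares add up to SS total.

```python
n_subjects, k_measures = 30, 4        # 30 people, each measured under 4 conditions
n_total = n_subjects * k_measures     # N total = NK: every single data point
df_total = n_total - 1                # same value as NK - 1
print(n_total, df_total)              # 120 119

# Consistency check: (K - 1) + (N - 1) + (K - 1)(N - 1) should equal NK - 1.
df_between = k_measures - 1
df_subj = n_subjects - 1
df_resid = df_between * df_subj
df_parts = df_between + df_subj + df_resid
print(df_parts == df_total)           # True
```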

So we have all of our components; now let us put them together in this chart so you can see how they fit.

Remember, F is the variation we are interested in over the variation we cannot explain, cannot account for, and do not know the source of; it is a mystery. For repeated measures, F equals the between-sample variability over the residual variability: F = S² between / S² resid.

To find it, we need the between-sample variability. The idea is always the sample means' difference from the grand mean: the center of each sample and its distance from the grand mean. The formula is S² between = SS between / df between, and we can find each component. SS between is a Σ (the zigzag) of each sample mean X̄ minus the grand mean, squared, taking the samples one at a time as individuals. The distance counts more if the sample has more data points, so we weight it by n_i. Here i does not go from 1 to N; it goes from 1 to K, because we do this for each sample, and K is the number of samples, or groups. The degrees of freedom between is K − 1, the number of groups minus 1.

Next, the residual variability. This is the leftover within-group, within-sample variability, and its formula is S² resid = SS resid / df resid: the residual sum of squares divided by the residual degrees of freedom. The residual sum of squares is literally the leftover,

SS resid = SS total − (SS subj + SS between),

and the residual degrees of freedom is built from the other degrees of freedom: df resid = df subj × df between.

To find these, let us start with the total variability, which we know pretty well: all the data points in all of our samples, away from the grand mean. We do not actually need the total variance here, and we do not need the subject-level variance either. What we really need is the sum of squares total: for each data point, with no X̄ bar, the squared distance from the grand mean. We do not really need the total degrees of freedom, but we can find it anyway: df total = NK − 1, the total number of data points minus 1.

Now the within-subject variability: the spread of each case away from the grand mean. Each case can be represented by a point estimate, its own little mean, so we use the subject means to calculate the distance. SS subj is the squared distance of each subject-level mean from the grand mean. To say "subject level," we put N on the Σ, so it knows to do this for each subject rather than for each data point (if we put K there, it would do this for each group). We want each subject to count more if they participate in more measures, that is, if the measure is repeated more times, so we multiply by K. That gives us the sum of squares for subjects, and once we have SS total, SS subj, and SS between, we can find SS resid.

We also need the degrees of freedom for subjects, but only because we need it to find the degrees of freedom residual; that is the only reason we jump through all these hoops. The degrees of freedom for subject-level variance is N − 1, the number of subjects minus 1.
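Putting the chart together, here is one possible end-to-end sketch of a one-way repeated measures ANOVA in plain Python. It assumes a complete table (every subject has a score under every condition, with no missing data) and follows the chart's order, finding SS resid and df resid as leftovers. The `RMAnova` record, the function name, and the sample table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RMAnova:
    ss_between: float
    ss_subj: float
    ss_resid: float
    ss_total: float
    df_between: int
    df_subj: int
    df_resid: int
    df_total: int
    f: float


def repeated_measures_anova(data):
    """One-way repeated measures ANOVA on a table with rows = subjects, columns = measures."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)   # every data point
    ss_between = sum(n * (m - grand) ** 2 for m in cond_means)      # i = 1..K, weight n
    ss_subj = sum(k * (m - grand) ** 2 for m in subj_means)         # i = 1..N, weight K
    ss_resid = ss_total - (ss_subj + ss_between)                    # the leftover

    df_between, df_subj = k - 1, n - 1
    df_resid = df_between * df_subj
    f = (ss_between / df_between) / (ss_resid / df_resid)
    return RMAnova(ss_between, ss_subj, ss_resid, ss_total,
                   df_between, df_subj, df_resid, n * k - 1, f)


result = repeated_measures_anova([[1, 4, 7], [3, 5, 7], [2, 5, 5], [6, 6, 9]])
print(result.f, result.df_between, result.df_resid)   # 16.0 2 6
```

For this made-up table, F = 16 on 2 and 6 degrees of freedom; the S² between of 16 is compared against an S² resid of 1.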

Okay so here is example 1 which is more prevalent uploaded mobile profile photos and so these2423

are all different kinds of photos but one person or one Facebook user presumably 1 person, they2434

are sort of the linking factor of all four of those measures.2441

So what is the null hypothesis?2448

Well it is that all of these groups really come from the same population.2451

The reason I use this P notation is for just different types of photos and I will call this one 2, 3, and 4.2457

Also it makes it easier for me to write my alternative hypothesis, it has a practical significance so all use of P’s are not equal so they are not all equal.2472

For the significance level, we can just set alpha = .05 by convention. Because we are going to be using an F value, we do not have to decide between one-tailed and two-tailed.

It is always one-tailed. The F distribution is cut off at zero on one side and skewed to the right, so the rejection region is always just on the positive side. So let us draw our decision stage with the F distribution.

We know that it starts at zero and that alpha = .05. What is the critical F here?

Well, remember, in order to find it we need to know the denominator df as well as the numerator df.

Here we know that the df in the numerator is the degrees of freedom between groups, which is k - 1.

There are 4 groups, so it is going to be 3. The degrees of freedom residual is the degrees of freedom between times the degrees of freedom subject.

So we are going to need to find the degrees of freedom subject, which is n - 1.

Now let us look at our data set in order to figure out how many cases we have in our sample.

I have made it nice and pretty here: tagged photos, mobile uploads, uploaded photos, and profile photos. Each row has all of the data from one subject. So this person has zero photos of any kind, whereas let us look at this person.

This person has zero mobile uploads and zero profile photos, but they have 79 uploaded photos and 37 tagged photos. So within each subject we can see that there is some variation, but across the different samples we also see some variation.

So down here I put step 1: the means are all equal versus not all equal. Step 2 is alpha = .05. Here is the decision stage. Our k is 4 groups (4 samples), so our degrees of freedom between is 4 - 1 = 3. We have already done that, but I might as well fill it in. The degrees of freedom for subject is there so that we can find the degrees of freedom residual, and once we have that we can find our critical F.

For the degrees of freedom for subjects, we should count how many subjects we actually have. We can just count the rows (I just counted down the profile photos column) and subtract 1. We actually have 29 cases, so our degrees of freedom for subject is 28.

Now the degrees of freedom residual is those two degrees of freedom multiplied together, 3 × 28, which is 84, and that is our denominator degrees of freedom.
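The degrees-of-freedom bookkeeping above reduces to a few lines of arithmetic. Here is a small sketch of it. The function name `rm_anova_dfs` is hypothetical.

```python
def rm_anova_dfs(k, n):
    """Degrees of freedom for a one-way repeated measures ANOVA.

    k: number of conditions (samples); n: number of subjects (cases).
    Returns (df_between, df_subject, df_residual).
    """
    df_between = k - 1
    df_subject = n - 1
    df_residual = df_between * df_subject  # (k - 1)(n - 1)
    return df_between, df_subject, df_residual

# Example 1: 4 photo types, 29 Facebook users.
print(rm_anova_dfs(k=4, n=29))  # (3, 28, 84)
```

The same call with k=3 and n=12 gives (2, 11, 22), which are the values used in Example 2.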

So now we can find our critical F.

In order to do that we use the inverse F function (FINV in Excel). The probability is .05, our first degrees of freedom is the numerator one (3), and our second degrees of freedom is the denominator one (84). Our critical F is 2.71.
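Outside Excel, the same critical value can be computed from the upper-tail probability of the F distribution. The sketch below is a standard-library-only stand-in for Excel's FINV and FDIST, assuming no statistics package is available. It evaluates the regularized incomplete beta function with the continued fraction from Numerical Recipes and then solves for the critical F by bisection. The function names are hypothetical. In practice a library routine such as SciPy's `scipy.stats.f.ppf` and `scipy.stats.f.sf` would normally be used instead.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges quickly;
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_upper_tail(f, df1, df2):
    """P(F > f) for an F(df1, df2) variable (the role FDIST plays in Excel)."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def f_critical(alpha, df1, df2, tol=1e-10):
    """F value with upper-tail area alpha (the role FINV plays in Excel)."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:  # grow the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol:                       # bisection: the tail area falls as f grows
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(f_critical(0.05, 3, 84), 2))  # 2.71
```

The same `f_upper_tail` function gives the p-value once the sample F has been computed.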

So once we have that, let us go on and calculate our sample test statistic.

Step 4 is to find the sample F statistic. I described this step generically before because you might have to find a t statistic or a z statistic. In this case, though, we know we need the F statistic because we have an omnibus hypothesis, and we have to find the p-value afterwards. So let us find the F statistic.

Go to Example 1 again, and let us put in all the different things we need.

We need the variance between over the residual variance, so let us start off with the variance between. It is something we already know how to find: it is split up into sum of squares between and degrees of freedom between.

We actually have degrees of freedom between already, so let us just fill that in. In order to find the sum of squares between, we have to find the means for each of these groups.

We are also going to need to find out what n is.

That is actually quite simple, because we know it is 29 for each of these groups, so that makes life a little bit simpler.

Now let us find the average for each of these samples. For the first sample, tagged photos, the mean is 9.93. Mobile uploads average 12.45, uploaded photos average 68, and finally profile photos average 1.5.

Okay, so now we have to calculate the grand mean.

The grand mean is quite easy to find in Excel because you just take every single data point and calculate the average.

The average is 23.
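Every photo type has the same n (29 users), so the grand mean can also be checked against the group means. With equal group sizes, the mean of all data points equals the plain average of the group means. Here is a quick check using the rounded means reported above. The small difference from 23 comes from that rounding.

```python
from statistics import mean

# Group means reported for Example 1 (rounded as read off the spreadsheet):
# tagged photos, mobile uploads, uploaded photos, profile photos.
group_means = [9.93, 12.45, 68, 1.5]

# With equal n in every group, the grand mean of all data points
# equals the average of the group means.
print(round(mean(group_means), 2))  # 22.97, i.e. about 23
```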

I am just going to copy and paste that here. What I did was put a pointer here so that it always points to that top value for the grand mean. The grand mean should not change; it is always the same.

Now that we have all of these values, we can find n times (X-bar minus the grand mean) squared.

We find that for each group, and when we add those up, we get our sum of squares between, which is this giant number, 82,700.

So I am just going to put a pointer (=) to that cell, and then I am going to find my variance between.

My variance between is still quite large, about 27,600.
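The spreadsheet recipe for the between-groups piece can be sketched in code: take n times the squared distance of each group mean from a fixed grand mean, add those up, then divide by k - 1. The function name and the 3 × 3 toy table are hypothetical. Rows are subjects and columns are conditions.

```python
from statistics import mean

def between_variance(rows):
    """Return (SS_between, MS_between) for a subjects-by-conditions table."""
    n = len(rows)     # data points per condition
    k = len(rows[0])  # number of conditions
    grand_mean = mean(x for row in rows for x in row)  # computed once, like a locked cell
    column_means = [mean(row[j] for row in rows) for j in range(k)]
    ss_between = sum(n * (m - grand_mean) ** 2 for m in column_means)
    return ss_between, ss_between / (k - 1)

toy = [[2, 4, 6],
       [3, 5, 10],
       [4, 9, 11]]
ss_b, ms_b = between_variance(toy)  # SS_between = 54, MS_between = 27
```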

Okay, now that we have that, we need to find the residual variance.

In order to find the residual variance, I know I am going to need to find some other things that I did not necessarily plan on.

One of the things I need is SS total, as well as SS subject.

I am going to start with SS total. Although the idea is simple, in Excel it looks a little crazy just because it takes up a lot of space: we need the squared distance away from the grand mean for every single data point.

So here are all my data points.

Now I am going to need to find the squared distance of each one away from the grand mean, and then add them all up.

What is helpful in Excel is to create separate columns and then add them up, so I am going to save these for later. I have already put in the formulas here. This one is for the tagged photos; it is my partial way to find SS total just for the tagged photos. I am going to do the same for mobile uploads, uploaded photos, and profile photos, and then add them all together.

So each one is sort of a subtotal.

What I need to find is each data point minus the grand mean, and I will just use the grand mean that I found down here.

But I need to lock that cell down, to say always use this grand mean and not any other one.

I put that in parentheses so that I can square it.

I am going to do that all the way down for tagged photos and then just copy it across for mobile uploads, uploaded photos, and profile photos. That is the nice thing about Excel: it gives you all of these values very easily.

I am just going to shrink this for a second, just to show you what each of these is doing.

If I click on this cell, it gives me this value minus my grand mean (which is locked down), squared. So now I have every single data point's squared distance away from the grand mean.

Now I need to add them all up.

So I use SUM, and I am not just going to add up this column; I am going to add up all of these columns.

So our total sum of squares is 257,000.

So I am going to go down to my sum of squares total and just put a pointer here.
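In Excel, "locking down" the grand mean means using an absolute reference (with $ signs). In code, the equivalent is to compute the grand mean once and reuse it for every data point. Here is a sketch using a hypothetical toy table. Note that the squaring has to happen before the adding, because the raw deviations from the grand mean always sum to zero.

```python
from statistics import mean

def ss_total(rows):
    """Sum of squared distances of every data point from the grand mean."""
    grand_mean = mean(x for row in rows for x in row)  # fixed once, like a $-locked cell
    # Square each distance first, then add: sum((x - GM)^2), not (sum(x - GM))^2.
    return sum((x - grand_mean) ** 2 for row in rows for x in row)

toy = [[2, 4, 6],
       [3, 5, 10],
       [4, 9, 11]]
print(ss_total(toy))  # 84
```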

So how do I find my sum of squares for the subject-level variation?

Well, I know I need to find the mean for every subject. Then I need to find the distance between that mean and the grand mean, square it, and multiply it by how many groups I have.

The nice thing is that the number of groups is constant: it is always 4 for everybody. So let us go ahead and find the subject-level means.

The subject means are found by averaging one person's measures across all 4 samples. This person's average is zero. I just copy and paste that down, and if I check this one, it takes the average of these four measures.

So this is subject-level variation, and it shows you that this person has a lot fewer photos, period, than this person.

That person just has a lot more photos on average than this one.

And this person is sort of in the middle of those two.

Once we have these subject-level means, we can find k times the squared difference for each subject. My k is 4, so it is 4 times (subject-level mean minus the grand mean) squared. I will use the grand mean I have already calculated down here, and I need to lock it down because the grand mean is never going to change.

Once I have that, I want to add them all up to get my sum of squares for subjects.

I will put a little sum sign here so that I know this is not another data point; it is a totally different thing. The sum is 56,600, and that is my sum of squares for subjects.

Now that I know all those things, I can finally calculate the sum of squares residual, because I have my ingredients.

I take my sum of squares total minus (sum of squares for subjects plus sum of squares between). I could obviously distribute that negative sign, but I will just use the parentheses.

So here is my leftover sum of squares, whatever is left unaccounted for. I already figured out my DF residual, so here I put my sum of squares residual divided by degrees of freedom residual, and I get about 1400.

So now we can finally calculate our F by dividing the variance between by the residual variance.

I get 19.69, which is quite a bit above the critical F of 2.71.
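As an arithmetic check of the steps above, here is the same calculation using the rounded totals read off the spreadsheet. Because those inputs are rounded, F comes out near 19.67 rather than exactly 19.69.

```python
# Rounded totals reported for Example 1.
ss_total, ss_subject, ss_between = 257_000, 56_600, 82_700
df_between, df_residual = 3, 84

ss_residual = ss_total - (ss_subject + ss_between)  # whatever is left unaccounted for
ms_between = ss_between / df_between                 # variance between
ms_residual = ss_residual / df_residual              # residual variance
f_stat = ms_between / ms_residual

print(ss_residual, round(ms_residual), round(f_stat, 2))  # 117700 1401 19.67
```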

Now that I have that, I can find my p-value.

For my p-value I use FDIST. I put in my F value, my numerator degrees of freedom, and my denominator degrees of freedom, and I get 9.3 × 10^-10. That means there are a lot of decimal places before you get to that 9, so it is a very, very small p-value.

So what do we do?

We reject the null.

Also remember that in an F test, all we do is reject the omnibus null hypothesis. That does not mean we know which groups are actually different from each other. So when you reject the null after an F test, you want to follow up with post hoc tests.

There are lots of different post hoc tests you might learn, such as Tukey post hoc tests or Bonferroni corrections. Those all help us make the pairwise comparisons to figure out which means are actually different from which other means. You probably also want to find the effect size. For an F test, the effect size is not d or g; instead it is eta squared (η²).

So we would reject the null.
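Two of the follow-up steps mentioned above reduce to simple arithmetic. A Bonferroni correction divides alpha by the number of pairwise comparisons, k(k - 1)/2. Eta squared is commonly computed as SS between / SS total. The lecture does not say which version of eta squared to use, so the sketch also shows partial eta squared, SS between / (SS between + SS residual). That version is often reported for repeated measures designs. The function names are hypothetical. The inputs are the rounded Example 1 totals, with SS residual = 257,000 - (56,600 + 82,700) = 117,700.

```python
from math import comb

def bonferroni_alpha(alpha, k):
    """Per-comparison alpha when all k*(k-1)/2 pairwise comparisons are tested."""
    return alpha / comb(k, 2)

def eta_squared(ss_between, ss_total):
    """Proportion of total variability attributable to the conditions."""
    return ss_between / ss_total

def partial_eta_squared(ss_between, ss_residual):
    """Variant that leaves subject-level variability out of the denominator."""
    return ss_between / (ss_between + ss_residual)

# Example 1 (rounded totals): 4 photo types -> 6 pairwise comparisons.
print(round(bonferroni_alpha(0.05, 4), 4))             # 0.0083
print(round(eta_squared(82_700, 257_000), 2))          # 0.32
print(round(partial_eta_squared(82_700, 117_700), 2))  # 0.41
```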

Example 2: a weight-loss boot camp is trying out three different exercise programs to help their clients shed some extra pounds.

All participants are assigned to teams of 4 people, and each week the entire team is weighed together to see how many pounds they were able to take off.

The data show their weekly weight loss as a team.

Are the exercise programs all equally effective in helping them lose weight? Note that all teams tried all three exercise regimes, but they received the treatments in random order.

So this is definitely a case where we have three different treatments.

Treatments 1, 2, and 3, and our data points are pounds lost.

The data are how many pounds each team was able to take off per week, but these are not independent samples.

They are actually linked to each other.

What's the link?

It is the team of four that lost that weight. This team lost this much under this exercise regime, this much under this one, and this much under this one.

Now each team got these three exercise regimes in a different order.

Some teams did them in the order 3, 2, 1, so the orders have been balanced that way. If you pull up the examples and go to Example 2, you will see this data set.

So here are the different teams, or squads.

Here are the three different types of exercise program, in the different orders in which the squads did them, and each exercise program was done for a week.

So let us think about this.

To begin with we need a hypothesis, so step 1 is the null hypothesis: all are equal.

So all the μ's (exercise 1, exercise 2, exercise 3) are equal.

The alternative hypothesis is that not all are equal.

Step 2 is our significance level. We can just set alpha = .05 once again. Because it is an omnibus hypothesis, we know we are going to do an F test, so it does not need to be two-tailed.

Step 3 is the decision stage. If you imagine the F distribution and color in the rejection region, what is that critical F?

Well, in order to find the critical F, we need the DF between as well as the DF residual, because those are the numerator and denominator degrees of freedom.

In order to find DF residual we also need DF subject. Remember, here "subject" does not mean each individual person; "subject" really means case.

And each case here is a squad.

So it is how many squads there are, minus 1.

So count how many squads there are (12) and subtract 1.

So there are 11 degrees of freedom for subject.

For degrees of freedom between, we need the number of different samples, which is three, minus 1, so 3 - 1 = 2.

And my DF residual is the DF between times the DF subject, 2 × 11 = 22. So let us find the critical F.

We need FINV. The probability we need is .05, the degrees of freedom for the numerator is 2, the degrees of freedom for the denominator is 22, and our critical F is 3.44.
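When the numerator df is exactly 2, as here, the upper tail of the F distribution has a simple closed form: P(F > f) = (1 + 2f/df2)^(-df2/2). That makes it easy to check the critical value, and later the p-value, by hand. The sketch below uses hypothetical function names, and the shortcut only holds when df1 = 2.

```python
def f_crit_df1_2(alpha, df2):
    """Critical F for F(2, df2): solves (1 + 2F/df2) ** (-df2/2) = alpha for F."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

def f_p_df1_2(f, df2):
    """Upper-tail p-value P(F > f) for F(2, df2)."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

print(round(f_crit_df1_2(0.05, 22), 2))  # 3.44
print(round(f_p_df1_2(5.219, 22), 3))    # 0.014
```

For the F of 5.219 found later in this example, the p-value comes out around 0.014. That is consistent with the "about .01" read from FDIST.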

Step 4: here we are going to need the F statistic. In order to find F, we need the variance between divided by the residual variance.

In order to find the variance between, we need SS between divided by DF between. Thankfully we already have DF between, so we just need SS between.

The concept of SS between is each sample's X-bar, its distance away from the grand mean, squared. Then, depending on how many data points you had in your sample, each sample gets weighted more or less.

Now the nice thing is that all of these have the same number of subjects.

But let us go ahead and try to do this.

So first we need the different samples: exercise 1, exercise 2, exercise 3. We need their n, and their n is 12; there are 12 data points in each sample.

We also need each exercise regime's average weight loss, so we need X-bar. We also need the grand mean, because ultimately we are going to compute n times (X-bar minus the grand mean) squared and add all of those up.

So let us find the X-bar for exercise regime number 1.

Excel makes it nice and easy for us to find all those averages very quickly. Once we have those, we can find the grand mean.

The grand mean is also very easy to find here.

We just want to select all the data points.

I think I selected one of them twice, so be careful about that.

Make sure every data point is selected just one time. So this is the average weight loss per week regardless of which team you were on and regardless of which exercise you did.

And now let us find n times (X-bar minus the grand mean) squared, and let us do that for each exercise regime.

Once we have that done we can find the sum, and the sum is 23.63.

So in SS between I would put that number.

Once we have that, we can actually find the variance between, because we have already calculated the DF between; that was not too hard.

Now we have to work on the residual variance. Let me just add a couple of rows here to give me a little more space. In order to find the residual variance, I am going to need SS residual divided by DF residual.

We already have DF residual, so we just need SS residual. In order to find that I need SS total minus (SS between + SS subject).

I already have my SS between, so I need to find SS total and SS subject.

SS total covers every single exercise regime. For every single one of these data points, I need to find its distance away from the grand mean and square it, then add them all up, and that is going to be my SS total.

So for E1 here is my subtotal for SS total, and likewise for E2 and E3.

So that is X minus the grand mean (lock that grand mean down), squared. Make sure you do that for every single data point in E1. Once the first cell is right, I can just copy and paste it down. Let us check this one: it is taking this value, subtracting the grand mean from it, and then squaring that distance.

Once I have all of these, I can sum them all up and get my SS total, my total sum of squared distances.

I am just going to put a pointer here so that I do not have to retype the number.

Once I have that, all I have left to find is SS subject.

Now remember, for SS subject, each subject has its own mean because we repeatedly measure it. So we have to find each subject's mean, get the distance between that mean and the grand mean, square it, and multiply it by the number of measures, k.

So let us do that here. First we need the subjects' X-bars, which is each squad's average weight loss; some squads probably lost more weight than others. This is the average weight loss for each squad, and it looks like this squad lost a bit, so there is a little bit of variation in the squads' success. Now we are going to compute k times (subject X-bar minus the grand mean) squared. We already know k is 3, so it is 3 times (subject X-bar minus the grand mean), squared, using the grand mean we have already calculated down here, locked down.

I copy and paste that all the way down, find the sum here, and this is my sum of squares for subjects.

It is the sum of a bunch of squares.

So that is 34 point something.

I am just going to put a pointer there so I do not have to retype it, and I can see it nice and clearly right here.

So now I have everything I need to find SS residual: SS total minus (sum of squares between plus sum of squares subject).

Once I have that, I can find my residual variance: SS residual divided by its degrees of freedom. It looks like my residual variance is much smaller than my between-sample variance, so I can predict that my F value will be fairly big. 11 point something divided by 2 point something gives me 5.219, which is a little bit bigger than my critical F of 3.44.

So if I find my p-value with FDIST, putting in my F, my numerator degrees of freedom, and my denominator degrees of freedom, I get about .01. That is pretty small, smaller than .05, so I am going to reject my null.

So step 5, down here: reject the null.

And we know that once you reject the null, you also need to do post hoc tests as well as find eta squared.

So that brings us to Example 3: what is the problem with running a bunch of t-tests?

"With so many t-tests, the probability of Type I error increases, increasing the cutoff alpha." Actually, we are not increasing the cutoff; we are keeping it at .05. But the Type I error rate does increase, because we have the possibility of rejecting the null multiple times.

"With so many t-tests, the probability of Type I error increases because we may be rejecting more null hypotheses."

This is actually a correct answer, but we might not be done yet.
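This inflation can be quantified. Suppose each of m tests is run at alpha = .05, and, as a simplifying assumption, the tests are treated as independent. Pairwise tests on the same data are not independent, so this is only a rough guide. Under that assumption, the chance of at least one false rejection is 1 - (1 - alpha)^m. The function name is hypothetical.

```python
from math import comb

def familywise_error(alpha, m):
    """P(at least one Type I error) across m independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** m

# Four conditions (as in Example 1) give comb(4, 2) = 6 pairwise t-tests.
m = comb(4, 2)
print(round(familywise_error(0.05, m), 3))  # 0.265
```

So with four conditions, the overall chance of a false rejection is roughly five times the nominal .05.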

"With so many paired-samples t-tests, we have a better estimate of s because we have been estimating s several times."

"With so many paired-samples t-tests, we have a poor estimate of s because we are not using all of the data to estimate one s; in fact, we are just using subsets of the data to estimate s several times." That is a good answer.

So that is it for repeated measures ANOVA, thanks for using