WEBVTT mathematics/statistics/son
00:00:00.000 --> 00:00:01.800
Hi, welcome to educator.com.
00:00:01.800 --> 00:00:04.100
Today we are going to talk about repeated measures ANOVA.
00:00:04.100 --> 00:00:14.500
So the repeated measures ANOVA is a lot like the regular one-way independent samples ANOVA that we have been talking about.
00:00:14.500 --> 00:00:22.500
But it is also a lot like the paired samples t-test, and so we are going to talk about why we need the repeated measures ANOVA.
00:00:22.500 --> 00:00:27.600
And we are going to contrast the independent samples ANOVA with the repeated measures
00:00:27.600 --> 00:00:34.700
ANOVA, and finally we are going to break down that repeated measures F statistic into its component variance parts.
00:00:34.700 --> 00:00:44.000
Okay, so previously, when we talked about one-way ANOVA, we talked initially about why we
00:00:44.000 --> 00:00:49.600
needed it, and the reason why we need ANOVA is that the t-test is limited.
00:00:49.600 --> 00:00:57.900
So previously we talked about this example: who uploads more pictures, Latino, white, Asian, or black Facebook users?
00:00:57.900 --> 00:01:06.500
When we saw this problem and thought about maybe doing independent samples t-tests, we realized we would have to do a whole bunch of little t-tests.
00:01:06.500 --> 00:01:09.700
Well, let us look at this problem.
00:01:09.700 --> 00:01:16.000
It is similar in some ways but it is also a little bit different so here is the question.
00:01:16.000 --> 00:01:19.000
Which photo type is most frequently used on Facebook?
00:01:19.000 --> 00:01:23.000
Tagged photos, uploaded photos, mobile uploads, or profile pictures?
00:01:23.000 --> 00:01:31.400
Now in the same way that that problem has many groups, this photo problem also has many groups,
00:01:31.400 --> 00:01:39.400
and one thing you could immediately tell is that if we tried to use t-tests, we would also have to use a bunch of little t-tests here.
00:01:39.400 --> 00:01:41.700
But here is another thing.
00:01:41.700 --> 00:01:44.200
These variables are actually linked to one another.
00:01:44.200 --> 00:01:50.200
Often people who have a number of tagged photos also have a number of uploaded photos, and people who have a number of
00:01:50.200 --> 00:01:54.300
mobile uploads will also have a number of profile pictures.
00:01:54.300 --> 00:02:00.900
So in this sense, although over there we had four separate groups of users, where a user
00:02:00.900 --> 00:02:14.900
in one group is not linked to any of the users in the other Latino, white, Asian, and black groups, here we have these four sets of data.
00:02:14.900 --> 00:02:26.100
Tagged, uploaded, mobile, or profile pictures; but the number of tagged photos is linked to some
00:02:26.100 --> 00:02:33.100
number of uploaded photos, probably because they come from the same person, and maybe this
00:02:33.100 --> 00:02:38.300
person owns a digital camera that they really love to carry around everywhere.
00:02:38.300 --> 00:02:46.800
So these scores in these different groups are actually linked to each other, and these are what we
00:02:46.800 --> 00:02:53.600
have previously called dependent samples, or paired samples, because
00:02:53.600 --> 00:03:01.600
there were only two groups of them at that time; but now we have four groups, and we can still see that the linked principle holds.
00:03:01.600 --> 00:03:08.600
So here we are talking about different samples, multiple samples,
00:03:08.600 --> 00:03:13.700
more than two, but these samples are also linked to each other in some way.
00:03:13.700 --> 00:03:23.800
And because those samples are linked, these are called repeated measures: we are repeatedly measuring something over and over again.
00:03:23.800 --> 00:03:31.500
Measuring photos here, measuring photos here, measuring photos here, measuring photos here; and because of that, it is called repeated measures.
00:03:31.500 --> 00:03:37.500
It is very similar to the idea of paired samples, except we are now talking about more than two.
00:03:37.500 --> 00:03:45.000
So with 3, 4, or 5 samples we call those repeated measures, and we have the same problem here as we did before.
00:03:45.000 --> 00:04:03.100
If our solution is a bunch of t-tests, we have two problems, whether they are paired t-tests or independent samples t-tests.
00:04:03.100 --> 00:04:06.700
So in this case they would be paired.
00:04:06.700 --> 00:04:13.800
Even in the case of paired t-tests we have the same problems as before; the first problem
00:04:13.800 --> 00:04:18.500
is that with so many t-tests the probability of false alarms goes up.
00:04:18.500 --> 00:04:24.500
So this is going to be a problem.
00:04:24.500 --> 00:04:33.500
And it is because we reject more null hypotheses; every time you reject a null hypothesis you have a .05 chance of error.
00:04:33.500 --> 00:04:36.900
So we are compounding that problem.
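As a rough sketch of that compounding (a hypothetical side calculation, not from the lecture): if each test has a .05 false alarm rate and the tests were independent, the chance of at least one false alarm across m tests would be 1 - (1 - .05)^m.

```python
# Sketch of how the false alarm probability compounds across many t-tests,
# assuming (hypothetically) independent tests each run at alpha = .05.

def familywise_error(m, alpha=0.05):
    """Chance of at least one false alarm across m independent tests."""
    return 1 - (1 - alpha) ** m

# 4 groups would need 6 pairwise comparisons (4 choose 2)
print(round(familywise_error(1), 3))  # 0.05
print(round(familywise_error(6), 3))  # 0.265
```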
00:04:36.900 --> 00:04:44.600
The second thing that is wrong when we do a whole bunch of little t-tests instead of one giant test is
00:04:44.600 --> 00:04:51.400
that we are ignoring some of the data when we are calculating the population standard deviation.
00:04:51.400 --> 00:04:57.100
So when we estimate that population standard deviation, the more data the better, but if we
00:04:57.100 --> 00:05:03.200
only look at two of the samples at a time, then we are ignoring the other two perfectly good sets
00:05:03.200 --> 00:05:11.400
of data, and we are not using them to help us estimate the population standard deviation more accurately.
00:05:11.400 --> 00:05:37.700
So we get a poorer estimate of s because we are not using all the data at our disposal.
00:05:37.700 --> 00:05:48.700
So that is the problem, and we need to find a way around it; thankfully, Ronald Fisher comes to the rescue with his F-test.
00:05:48.700 --> 00:05:56.100
Okay, so the ANOVA, that is our general solution to the problem of too many tiny t-tests.
00:05:56.100 --> 00:06:00.100
But so far we have only talked about ANOVAs for independent samples.
00:06:00.100 --> 00:06:10.200
Now we need an ANOVA for repeated measures; the ANOVA is always going to start the same way, with the omnibus hypothesis.
00:06:10.200 --> 00:06:18.100
One hypothesis to rule them all; and the omnibus hypothesis always says all the samples come from the same population.
00:06:18.100 --> 00:06:29.000
So the μ of the first group of photos equals the μ of the second group of photos equals the μ of the
00:06:29.000 --> 00:06:32.400
third group of photos equals the μ of the fourth group.
00:06:32.400 --> 00:06:43.800
And the alternative hypothesis is not that they are all unequal to each other, but that at least one is different, an outlier.
00:06:43.800 --> 00:07:01.400
And so the way we say that is that all the μ's of P, all the μ's of the different photo types, are not the same.
00:07:01.400 --> 00:07:09.800
Now we have to keep in mind the logic: saying that all of these μ's are not the same is not the same as
00:07:09.800 --> 00:07:12.900
saying that all of the μ's are different from each other.
00:07:12.900 --> 00:07:22.300
And when we say all of them are not the same if even one of them is not the same then this alternative hypothesis is true.
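Written out (with P indexing the photo types), the omnibus hypotheses described here look like:

```latex
H_0:\ \mu_1 = \mu_2 = \mu_3 = \mu_4
\qquad
H_a:\ \text{not all } \mu_P \text{ are the same (at least one differs)}
```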
00:07:22.300 --> 00:07:29.800
So this starts off much the same way as independent samples; from there we go on to analyze variance.
00:07:29.800 --> 00:07:37.700
And here we are going to use that F statistic again.
00:07:37.700 --> 00:07:49.200
And Ronald Fisher's big idea, the one he hit upon, is that the F is a
00:07:49.200 --> 00:08:15.300
ratio of variances, and really one way of thinking about it is the ratio of between-sample (or between-group) variability over the within-sample variability.
00:08:15.300 --> 00:08:30.000
And another way of thinking about this is that it is the variability we are interested in, and I do not just
00:08:30.000 --> 00:08:35.400
mean that we are passionate about it or find it very curious, but really the variability
00:08:35.400 --> 00:08:42.700
that we are making a hypothesis about, over the variability that we cannot explain.
00:08:42.700 --> 00:08:56.500
We do not know where that other variability comes from; it just exists and we have to deal with it.
00:08:56.500 --> 00:09:04.600
Okay, and so this F statistic is going to be the same concept; the same concept will come
00:09:04.600 --> 00:09:08.500
up again when we talk about the repeated measures version of F.
00:09:08.500 --> 00:09:11.500
There are going to be some subtle differences though.
00:09:11.500 --> 00:09:20.800
Okay so let us talk about the independent samples ANOVA versus the repeated measures ANOVA.
00:09:20.800 --> 00:09:27.000
They both have the same start, the same hypotheses; not only that, but they both have the
00:09:27.000 --> 00:09:32.400
same idea of taking all the variance in our sample and breaking it down into component parts.
00:09:32.400 --> 00:09:41.500
Now when we talk about all the variance in our sample, we really mean: what is our sum of squares total?
00:09:41.500 --> 00:09:50.100
What is the total amount of variability away from the grand mean in our entire data set?
00:09:50.100 --> 00:09:56.800
And just from that sentence we can figure out what the formula for this would be.
00:09:56.800 --> 00:10:05.800
This should be something like: every single one of our data points minus the
00:10:05.800 --> 00:10:15.700
grand mean, which we signify with two bars, the double bar, and square that; and the sigma has to do this
00:10:15.700 --> 00:10:25.500
for every single data point, not just the data points in one sample, and the way it knows to do that is because this should say N total.
00:10:25.500 --> 00:10:34.000
So this is going to go through every single data point in every single sample, get the
00:10:34.000 --> 00:10:42.900
distance from the grand mean, square that distance, and add all those squared distances up.
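As a formula, that total sum of squares, with the grand mean written with the double bar, is:

```latex
SS_{\text{total}} \;=\; \sum_{i=1}^{N_{\text{total}}} \left( x_i - \bar{\bar{x}} \right)^2
```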
00:10:42.900 --> 00:10:46.800
Okay so that is the same idea to begin with.
00:10:46.800 --> 00:10:51.400
Now we will take this SS total and break it down into its component parts.
00:10:51.400 --> 00:11:03.700
Now in independent samples, what we see is that all of the variability that we are unable to
00:11:03.700 --> 00:11:11.100
explain lies within the groups, and all of the variability that we are very interested in is between
00:11:11.100 --> 00:11:21.300
the groups; and so in independent samples the story becomes: this SS total is a conglomeration, and
00:11:21.300 --> 00:11:29.800
when you split it up into its parts you see that it is made up of the sum of squares within the
00:11:29.800 --> 00:11:35.900
groups, inside of the groups, and the sum of squares between the groups, added up.
00:11:35.900 --> 00:11:51.400
And because of that, the F statistic here becomes the variance between over the variance within,
00:11:51.400 --> 00:12:00.000
and obviously each of these variances corresponds to its own sum of squares.
00:12:00.000 --> 00:12:10.300
Now in the repeated measures ANOVA we are going to be talking about something slightly different, because now we have these linked data.
00:12:10.300 --> 00:12:18.500
Over here the data are independent; these samples are independent, not linked to each other in any way.
00:12:18.500 --> 00:12:23.700
Here, these samples are actually linked to each other.
00:12:23.700 --> 00:12:33.500
Either by virtue of coming from the same subject, or the same case that produced them, or something else about these scores that links them to each other.
00:12:33.500 --> 00:12:50.100
So not only do we have variability across the groups just like before, sort of between the groups, and variability within the groups.
00:12:50.100 --> 00:12:53.700
But now we have a new kind of variability.
00:12:53.700 --> 00:13:05.000
We have the variability caused by these different linkages: these subjects are all different from each other, but each one's scores may be similar across the board.
00:13:05.000 --> 00:13:12.100
So the person who owns a digital camera they might just have an enormous number of photos all across the board.
00:13:12.100 --> 00:13:19.400
The person who does not have a digital camera or not even a smartphone might have a low number of photos across-the-board.
00:13:19.400 --> 00:13:25.700
So there are those things that are often called individual differences.
00:13:25.700 --> 00:13:33.000
Those are differences that we can actually mathematically quantify; we can actually explain where
00:13:33.000 --> 00:13:42.100
that variability comes from, but we are not actually interested in it in this study; we are really interested in the between-group differences.
00:13:42.100 --> 00:13:45.000
But this is not all.
00:13:45.000 --> 00:13:53.900
Once you have taken out this individual variability, there is still some residual within-group variability left over.
00:13:53.900 --> 00:14:02.300
And that is really stuff we cannot explain; it is not caused by the individual differences,
00:14:02.300 --> 00:14:07.600
it is not the between-group differences, it is just within-group differences.
00:14:07.600 --> 00:14:15.700
So in repeated measures, the sum of squares total actually breaks down slightly differently; even
00:14:15.700 --> 00:14:25.200
though it is still this idea of breaking down the sum of squares total, now it actually splits
00:14:25.200 --> 00:14:37.800
up into the sum of squares subjects, those individual linkages, the yellow part, plus the sum of squares
00:14:37.800 --> 00:14:49.000
within, just like before, except now we call it residual, because we have taken out the
00:14:49.000 --> 00:14:56.500
variability that comes from the individual differences, and so only the leftover remains;
00:14:56.500 --> 00:15:01.700
and because of that we call it residual, which is just like the word leftover.
00:15:01.700 --> 00:15:07.500
And of course the sum of squares between, which is what we are actually very interested in.
00:15:07.500 --> 00:15:15.000
So just to recap: this is something that we can explain but are not interested in, this is
00:15:15.000 --> 00:15:20.000
something we cannot explain, and this is something we are very interested in.
00:15:20.000 --> 00:15:31.500
So, our F statistic will actually become our variability between divided by our residual variability,
00:15:31.500 --> 00:15:40.300
and in fact we just want to take this guy, the subject variability, out of the equation for F.
00:15:40.300 --> 00:15:52.800
So F does not count the variability from the subjects, or individual differences; we are not interested in that.
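Putting the two decompositions side by side, the breakdowns and the two F ratios described here look like:

```latex
\text{Independent samples:}\quad
SS_{\text{total}} = SS_{\text{between}} + SS_{\text{within}},
\qquad
F = \frac{s^2_{\text{between}}}{s^2_{\text{within}}}

\text{Repeated measures:}\quad
SS_{\text{total}} = SS_{\text{between}} + SS_{\text{subjects}} + SS_{\text{residual}},
\qquad
F = \frac{s^2_{\text{between}}}{s^2_{\text{residual}}}
```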
00:15:52.800 --> 00:16:00.200
Okay, so I wanted to show you this in a picture; here is what I will show you.
00:16:00.200 --> 00:16:05.700
Here is what we mean by independent samples; remember, the independent samples ANOVA
00:16:05.700 --> 00:16:15.200
always starts off with that same idea, the SS total: the difference of each data point
00:16:15.200 --> 00:16:22.800
from the grand mean, squared, and then add all of those up; that is the total sum of squares.
00:16:22.800 --> 00:16:33.900
In independent samples, what we are going to do is take all of this variability, that SS total.
00:16:33.900 --> 00:16:52.200
That is the SS total, the total variance, and we are going to break it up into between-group variance.
00:16:52.200 --> 00:17:00.600
So think of this; this is just to signify the difference of all of these guys from the grand mean.
00:17:00.600 --> 00:17:18.300
So the between-group differences, the SS between; and add to that the within-group variability.
00:17:18.300 --> 00:17:21.300
The variability that we have no explanation for.
00:17:21.300 --> 00:17:26.400
So that is the within group variability.
00:17:26.400 --> 00:17:47.200
So it only makes sense that the variability between divided by the variability within is what
00:17:47.200 --> 00:17:56.700
we would use in order to figure out the ratio of the variability we are interested in, or
00:17:56.700 --> 00:18:09.700
hypothesizing about, divided by the variability we cannot account for.
00:18:09.700 --> 00:18:16.600
So this becomes the F statistic that we are very much interested in.
00:18:16.600 --> 00:18:26.000
Now when we talk about the repeated measures ANOVA, once again we start off similarly: for
00:18:26.000 --> 00:18:33.500
every single data point we want the squared distance away from the grand mean, and we add them all up.
00:18:33.500 --> 00:18:43.200
In order to see this as a picture, you want to see this whole idea here: the
00:18:43.200 --> 00:18:48.900
distance of all of these away from the grand mean, that is SS total.
00:18:48.900 --> 00:18:57.800
However, what we want to do is then break it up into its component parts, and just like before
00:18:57.800 --> 00:19:07.300
we have these differences between the groups, so that is SS between.
00:19:07.300 --> 00:19:16.000
And that SS between is the stuff we are really interested in, so that is also going to be a factor here.
00:19:16.000 --> 00:19:25.100
We take the variability between, but then we want to break up the rest of the variance
00:19:25.100 --> 00:19:34.400
into one part that we can actually explain and account for, and into the rest, the residuals, that we cannot explain.
00:19:34.400 --> 00:19:42.400
So even though we are not interested in it, we can actually account for that variability.
00:19:42.400 --> 00:19:54.500
You could think of it across these rows, because notice that person one has fewer photos
00:19:54.500 --> 00:20:03.800
across the board, while person three just has more photos across the board, and those are the kinds
00:20:03.800 --> 00:20:10.900
of individual differences, subject-level differences, that we do not actually want in our F statistic.
00:20:10.900 --> 00:20:17.100
It is variability we know where it comes from, but we are not interested in it in terms of our hypothesis testing.
00:20:17.100 --> 00:20:31.900
So we have this SS subject, with a little yellow highlight here so that you know what it stands for,
00:20:31.900 --> 00:20:40.600
and that is the variability that we can explain but that is not part of our hypothesis testing.
00:20:40.600 --> 00:20:48.000
And so what variability are we left with? We are left with any leftover variability; there is some
00:20:48.000 --> 00:20:56.700
leftover variability, and we call that residual variability, and that is going to be SS residual.
00:20:56.700 --> 00:21:04.000
And if we want to look at the variability we are interested in over the variability we cannot
00:21:04.000 --> 00:21:09.400
explain, we are not going to include this variability; we are only going to use this one.
00:21:09.400 --> 00:21:21.200
So the variability between groups divided by the residual variability.
00:21:21.200 --> 00:21:29.600
Once we have that, let us break it down even further.
00:21:29.600 --> 00:21:33.700
So the repeated measures F statistic, now you sort of know basically what it is.
00:21:33.700 --> 00:21:47.200
It is the variability between groups divided by the residual variability within groups.
00:21:47.200 --> 00:21:56.600
Now we can break these up into their component parts, so it is going to be the SS between,
00:21:56.600 --> 00:22:07.500
the sum of squares between, divided by the degrees of freedom between, all divided by the sum of
00:22:07.500 --> 00:22:18.300
squares residual over the degrees of freedom residual.
00:22:18.300 --> 00:22:28.000
So far it just looks like what we have always been doing with variability: sum of squares over
00:22:28.000 --> 00:22:34.900
degrees of freedom; but now we have to figure out how we actually find these things.
00:22:34.900 --> 00:22:51.500
And in fact, this is something you already know, because this part is actually exactly the same as in the independent samples ANOVA.
00:22:51.500 --> 00:23:07.000
The only thing that will really be different is this one.
00:23:07.000 --> 00:23:10.600
Okay so let us start off here.
00:23:10.600 --> 00:23:18.000
So this is what we are really looking for; when we sort of double-click on this guy, we double-click
00:23:18.000 --> 00:23:25.100
on that variability, what we find inside is something like this; and then we double-click on each of these things and figure out what is inside.
00:23:25.100 --> 00:23:33.200
So conceptually, the whole idea of the variability between groups is the difference of each
00:23:33.200 --> 00:23:39.600
sample mean from the grand mean, because we want to know how each sample differs from that grand mean.
00:23:39.600 --> 00:23:46.100
Now let us think about how many means we have because that is going to determine our degrees of freedom.
00:23:46.100 --> 00:23:49.700
The number of means we have is usually K.
00:23:49.700 --> 00:24:01.100
That is how many samples we have, and so the degrees of freedom between is going to be K - 1.
00:24:01.100 --> 00:24:08.200
And the way you can think about this is how many means we have: the number of means
00:24:08.200 --> 00:24:16.200
is the number of groups, so if we have four groups it would be four means, and if we had three groups it would be three means.
00:24:16.200 --> 00:24:25.500
And if we had three means and knew two of them, along with the grand mean, we could actually figure out the third, and
00:24:25.500 --> 00:24:31.500
so because of that, our degrees of freedom is K - 1, the number of means minus 1.
00:24:31.500 --> 00:24:38.900
Okay so what is sum of squares between?
00:24:38.900 --> 00:24:43.700
Well, it is this whole idea of the difference of sample means from the grand mean, and we could
00:24:43.700 --> 00:24:49.600
write that as the sample mean minus the grand mean; we have a whole bunch of sample means.
00:24:49.600 --> 00:24:51.200
So I am going to put my index there.
00:24:51.200 --> 00:24:57.900
And we are going to square that, because this is a sum of squares; and each mean's distance
00:24:57.900 --> 00:25:04.300
should count more if that sample has a lot of members, it should get more votes, so we are
00:25:04.300 --> 00:25:10.700
going to multiply that by n sub i, how many are in that sample.
00:25:10.700 --> 00:25:18.000
And in order to figure out what I mean by i, let us think about it: i is going to stand
00:25:18.000 --> 00:25:27.100
for each group, so this sigma is going to go from i = 1 through K, the number of groups.
00:25:27.100 --> 00:25:36.200
And then it is going to cycle through group 1, group 2, group 3, group 4, and this is the sum of squares between.
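So the sum of squares between, cycling through the K groups and weighting each mean's distance by its sample size, is:

```latex
SS_{\text{between}} \;=\; \sum_{i=1}^{K} n_i \left( \bar{x}_i - \bar{\bar{x}} \right)^2,
\qquad
df_{\text{between}} = K - 1
```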
00:25:36.200 --> 00:25:44.000
And this should look very familiar, because it is actually the same thing as in the independent samples ANOVA.
00:25:44.000 --> 00:25:51.500
So now we have to figure out how to find the other sum of squares, the new one.
00:25:51.500 --> 00:25:55.500
Sum of squares residual and degrees of freedom residual; and the whole reason we want to do
00:25:55.500 --> 00:26:02.000
that is because we want to find the residual variability, the leftover variability.
00:26:02.000 --> 00:26:10.500
It is any leftover spread within the groups that is not accounted for by within-subject variation.
00:26:10.500 --> 00:26:18.300
Now, within subject might mean within each person, but it might mean within each hamster
00:26:18.300 --> 00:26:25.000
or each company that is being measured repeatedly; so whatever your case is,
00:26:25.000 --> 00:26:34.800
whether animal or human or entity of some sort, that is considered your within-subject variability.
00:26:34.800 --> 00:26:38.100
And those subjects are all slightly different from each other.
00:26:38.100 --> 00:26:44.400
But that is not something we are actually interested in, so we want to take that out and keep the leftover variability.
00:26:44.400 --> 00:26:53.700
And because it is the idea of leftover, we actually cannot find this sum of
00:26:53.700 --> 00:26:57.100
squares directly; we have to find the leftover.
00:26:57.100 --> 00:27:04.200
And so the way we do that is take the total sum of squares and then subtract out the stuff we do
00:27:04.200 --> 00:27:16.800
not need, namely the sum of squares between as well as the sum of squares within subject, the within-subject variability.
00:27:16.800 --> 00:27:22.000
And so here we see that we are going to have to find sums of squares for everything; we are
00:27:22.000 --> 00:27:29.800
going to have to find the total as well as the within subject, and we already knew we had to find this one,
00:27:29.800 --> 00:27:36.200
and that is how we can find the sum of squares residual, literally whatever is left over.
00:27:36.200 --> 00:27:41.000
In the same way, to find degrees of freedom residual, we have to know something about
00:27:41.000 --> 00:27:47.400
the other degrees of freedom in order to find this sort of whatever-is-left quantity.
00:27:47.400 --> 00:27:53.900
And so in order to find degrees of freedom residual, what we do is multiply together the
00:27:53.900 --> 00:28:03.300
degrees of freedom between times the degrees of freedom within subject, and when we do this,
00:28:03.300 --> 00:28:08.800
we are going to be able to find all the degrees of freedom that are left over.
00:28:08.800 --> 00:28:22.500
Okay, so we realized that in order to find sum of squares residual we have to find all these other sums of squares, so here is the sum of squares within subject.
00:28:22.500 --> 00:28:36.600
So the way to think about this notion is that we are really talking about subject-level, or case-level, variation.
00:28:36.600 --> 00:28:43.400
So each case differs a little bit from the other cases, for who knows what reason, but we
00:28:43.400 --> 00:28:48.800
can actually account for it here; it is not totally unexplained. We do not know why it exists, but we
00:28:48.800 --> 00:28:54.000
know it exists, because the subjects are all slightly different from each other; and even though we do not know
00:28:54.000 --> 00:28:58.600
why it exists, we know what it is and we can calculate it.
00:28:58.600 --> 00:29:07.700
Okay, so conceptually you want to think about this as how far each subject's mean is away from the grand mean.
00:29:07.700 --> 00:29:14.300
Remember, in repeated measures we are repeatedly measuring each subject or case; we are
00:29:14.300 --> 00:29:23.200
measuring them multiple times, so if I am a Facebook user I will be contributing four different scores to this problem.
00:29:23.200 --> 00:29:29.700
Now what you could do is get a little mean just for me, right?
00:29:29.700 --> 00:29:34.400
The little mean of my four scores and that is my subject mean.
00:29:34.400 --> 00:29:43.300
So each subject has her own little mean and we want to find the distance of those little means away from the grand mean.
00:29:43.300 --> 00:29:47.500
So let us think how many subject means do we have?
00:29:47.500 --> 00:30:00.400
We have n of them, where n is the number of subjects, the number of cases contributing a measure to each sample.
00:30:00.400 --> 00:30:02.700
So that is our sample size.
00:30:02.700 --> 00:30:06.000
So what is degrees of freedom for within subjects?
00:30:06.000 --> 00:30:08.100
Well, that is going to be N-1.
00:30:08.100 --> 00:30:14.100
So what is the sum of squares for each subject?
00:30:14.100 --> 00:30:20.600
Well, one of the things you have to do is figure out a way to talk about the subject-level mean.
00:30:20.600 --> 00:30:35.000
So here, I am just going to write the mean and put an index on it for now, and here my little
00:30:35.000 --> 00:30:43.200
sigma will tell it what i is: i will go from one up to n, the sample size.
00:30:43.200 --> 00:30:53.200
These are really the subject means, and I want to get the squared distance from each
00:30:53.200 --> 00:31:00.500
subject mean to the grand mean, so square that distance; and we should also take into
00:31:00.500 --> 00:31:07.000
account how many times each subject is being measured, and that is going to be K number of times.
00:31:07.000 --> 00:31:17.900
How many samples are taken, how many measures are taken; in repeated measures, it is how many times the measure is repeated.
00:31:17.900 --> 00:31:29.000
And the more times a subject participates the more this variation will count.
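So the subject-level sum of squares, with each subject's little mean weighted by the K times they were measured, is:

```latex
SS_{\text{subjects}} \;=\; K \sum_{i=1}^{n} \left( \bar{x}_{\text{subject }i} - \bar{\bar{x}} \right)^2,
\qquad
df_{\text{subjects}} = n - 1
```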
00:31:29.000 --> 00:31:39.200
So there we have subject-level variation; we are really only finding it so that we can find SS
00:31:39.200 --> 00:31:47.700
residual, so we had better do it; and we also have to find sum of squares total and degrees of freedom total.
00:31:47.700 --> 00:31:53.000
These are things we have gone over, but just to drive it home, remember the reason we
00:31:53.000 --> 00:31:56.400
want to find this is just so we can find sum of squares residual.
00:31:56.400 --> 00:32:02.000
So conceptually this is just the total variation of all the data points away from the grand mean.
00:32:02.000 --> 00:32:05.500
What is the total number of data points?
00:32:05.500 --> 00:32:08.100
That is going to be N total.
00:32:08.100 --> 00:32:17.500
So every single data point counted up; and the way we find that is sample size n times the
00:32:17.500 --> 00:32:23.700
number of samples we have, so if we had 30 people participating in four different measures, it is
00:32:23.700 --> 00:32:33.700
30 times 4; and the number of samples is called K, so n times K is N total.
00:32:33.700 --> 00:32:36.800
So what is the degrees of freedom total?
00:32:36.800 --> 00:32:47.300
Well, it is either going to be N total minus 1, or the same exact numerical value, nK - 1, either way.
00:32:47.300 --> 00:32:50.500
And so what is the sum of squares total?
00:32:50.500 --> 00:32:55.400
Well, we have already been through it; this is what we always start off with: at least conceptually,
00:32:55.400 --> 00:33:03.800
for every single data point, notice that there are no bars on it, not any means, literally every single data point.
00:33:03.800 --> 00:33:13.800
We take the distance from the grand mean, squared, and we could put nK here just to say go and do this
00:33:13.800 --> 00:33:16.700
for every single data point, do not leave one behind.
00:33:16.700 --> 00:33:29.100
So we have all of our different components; now let us put them together in this chart so that you will know how they fit together.
00:33:29.100 --> 00:33:47.000
Remember, the idea of the F is the variation we are interested in over the variation we cannot
00:33:47.000 --> 00:33:53.500
explain, cannot account for, do not know where it comes from; it is a mystery.
00:33:53.500 --> 00:34:02.900
The formula for this is going to be F equals, and remember this is for repeated measures, the
00:34:02.900 --> 00:34:10.700
between-sample variability over the residual variability.
00:34:10.700 --> 00:34:16.900
And in order to find that we are going to need between sample variability.
00:34:16.900 --> 00:34:31.700
The idea is always going to be the sample means' difference from the grand mean.
00:34:31.700 --> 00:34:43.900
So basically the centers of each sample, the distance away from the grand mean; and so the
00:34:43.900 --> 00:34:54.000
formula for that is going to be s squared between equals SS between over the df between, and
00:34:54.000 --> 00:35:03.400
we can find each of those component parts: SS between is going to be the sum, my zigzag sigma, the
00:35:03.400 --> 00:35:18.000
sum of all of my X bars minus the grand mean, the distance; and when I say all, I mean one at a time.
00:35:18.000 --> 00:35:25.000
Each as an individual, one at a time; and this distance should count more if you have more people
00:35:25.000 --> 00:35:34.000
or more data points in your sample; and i does not go from one to n, it goes from one to K; I am
00:35:34.000 --> 00:35:39.600
going to do this for each sample, and K is my number of samples or number of groups.
00:35:39.600 --> 00:35:48.300
So my degrees of freedom between is really going to be K - 1, the number of groups minus 1.
00:35:48.300 --> 00:35:54.800
Okay so now let us try to get residual variability.
00:35:54.800 --> 00:36:09.600
Now residual variability is that leftover within groups within sample variability, now in order to
00:36:09.600 --> 00:36:21.400
get that leftover, the formula for this is going to be the residual variability.
00:36:21.400 --> 00:36:30.000
Now to get that you get the residual sum of squares and divide by the residual degrees of
00:36:30.000 --> 00:36:37.400
freedom, the residual sum of squares is literally going to be the left over.
00:36:37.400 --> 00:36:48.900
SS total minus (SS subject plus SS between).
00:36:48.900 --> 00:37:02.000
And my degrees of freedom residual is going to be a conglomeration of other degrees of
00:37:02.000 --> 00:37:09.800
freedom: the degrees of freedom for subjects times the degrees of freedom between, okay.
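The breakdown described so far can be sketched numerically. This is a minimal illustration, not part of the lecture; the function name and the tiny data set are made up, with rows as subjects (cases) and columns as the K repeated measures.

```python
# Sketch of the repeated measures ANOVA component breakdown.
# Rows = subjects (cases), columns = the K repeated measures.
# The function name and the example data are hypothetical.

def rm_anova_components(data):
    n = len(data)          # number of subjects
    k = len(data[0])       # number of measures (groups)
    all_points = [x for row in data for x in row]
    grand_mean = sum(all_points) / (n * k)

    # SS between: n * (sample mean - grand mean)^2, summed over the k samples
    sample_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_between = sum(n * (m - grand_mean) ** 2 for m in sample_means)

    # SS subject: k * (subject mean - grand mean)^2, summed over the n subjects
    subject_means = [sum(row) / k for row in data]
    ss_subject = sum(k * (m - grand_mean) ** 2 for m in subject_means)

    # SS total: every data point's squared distance from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_points)

    # SS residual is the leftover: SS total - (SS subject + SS between)
    ss_residual = ss_total - (ss_subject + ss_between)

    df_between = k - 1
    df_subject = n - 1
    df_residual = df_between * df_subject

    f = (ss_between / df_between) / (ss_residual / df_residual)
    return ss_between, ss_subject, ss_residual, ss_total, f
```

With this partition, SS total always equals SS between plus SS subject plus SS residual, which is a handy check on the arithmetic.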
00:37:09.800 --> 00:37:21.700
So we know that in order to find these, so the total variability let us start there, we know this one
00:37:21.700 --> 00:37:49.000
pretty well, all the data points in all of our samples away from the grand mean, and so we actually do
00:37:49.000 --> 00:37:53.800
not need the variability here and we do not need this variability either.
00:37:53.800 --> 00:38:02.900
What we really need is the sum of squares total and that is going to be for each data point no X
00:38:02.900 --> 00:38:08.500
bar or anything, we get the squared distance from the grand mean.
00:38:08.500 --> 00:38:19.200
So now that we have that we do not really need that but we can find it anyway so the degrees of
00:38:19.200 --> 00:38:27.900
freedom total is going to be NK -1 so the total number of data points minus 1.
00:38:27.900 --> 00:38:44.600
Now let us talk about within subject variability, this is the spread of each case away from the grand
00:38:44.600 --> 00:38:51.500
mean and when you talk about each case, each case can sort of be represented the point
00:38:51.500 --> 00:38:57.600
estimate of it can be its own mean, each case's mean, so that is how I want you to think of it.
00:38:57.600 --> 00:39:05.100
Each case is represented by its own little mean, and so that is why we are using those means to calculate the distance.
00:39:05.100 --> 00:39:20.000
So that SS subject is going to be the distance of each subject level mean away from the grand
00:39:20.000 --> 00:39:28.000
mean, squared, and in order to say subject level, you have got to put that N here so that it knows to do
00:39:28.000 --> 00:39:46.000
this for each subject, not do this for each data point; if we put a K there it would
00:39:46.000 --> 00:39:47.500
be do this for each group, and we want it to count more if they participate in more measures, so if the measures are repeated over and over again.
00:39:47.500 --> 00:39:54.500
So we want to put in the number K, and that gives us our sum of squares for each
00:39:54.500 --> 00:40:00.600
subject, and once we have those two we can find this as well as the sum of squares between, and
00:40:00.600 --> 00:40:11.100
then we also need the degrees of freedom for within subjects just because we need that to find the degrees of freedom residual.
00:40:11.100 --> 00:40:14.600
For this guy we do have to jump through all these hoops.
00:40:14.600 --> 00:40:22.900
So the degrees of freedom for each subject, for subject level variance, is going to be N - 1, the number of subjects minus 1.
00:40:22.900 --> 00:40:34.000
Okay so here is example 1: which is more prevalent, tagged, uploaded, mobile, or profile photos? And so these
00:40:34.000 --> 00:40:41.400
are all different kinds of photos but one person or one Facebook user presumably 1 person, they
00:40:41.400 --> 00:40:48.200
are sort of the linking factor of all four of those measures.
00:40:48.200 --> 00:40:51.300
So what is the null hypothesis?
00:40:51.300 --> 00:40:56.900
Well it is that all of these groups really come from the same population.
00:40:56.900 --> 00:41:12.600
The reason I use this P notation is just for the different types of photos, and I will call these 1, 2, 3, and 4.
00:41:12.600 --> 00:41:33.800
Also it makes it easier for me to write my alternative hypothesis: all these mu sub P's are not equal, so they are not all equal.
00:41:33.800 --> 00:41:52.700
So the significance level we could just set it as α equals .05 just by convention because we
00:41:52.700 --> 00:41:57.200
are going to be using an F value, we do not have to determine whether it is one-tailed or two-tailed.
00:41:57.200 --> 00:42:07.800
It is always one-tailed, cut off on one side, and since the F distribution is skewed to the right it is always going to be just on
00:42:07.800 --> 00:42:14.700
the positive side, and so let us draw our decision stage with our F distribution.
00:42:14.700 --> 00:42:27.800
We know that it starts at zero, and α equals .05; what is the critical F here?
00:42:27.800 --> 00:42:36.700
Well remember, in order to find this we need to know the denominator's DF as well as the numerator's DF.
00:42:36.700 --> 00:42:50.600
And so here we know that the DF in the numerator is going to be the degrees of freedom between groups, and that is K - 1.
00:42:50.600 --> 00:43:06.800
There are 4 groups so it is going to be 3, and the degrees of freedom residual is going to be degrees of freedom between times degrees of freedom subject.
00:43:06.800 --> 00:43:17.200
So we are going to need to find degrees of freedom subject and degrees of freedom subject is going to be N-1.
00:43:17.200 --> 00:43:22.600
Now let us look at our data set in order to figure out how many we have in our sample.
00:43:22.600 --> 00:43:29.000
So I have made it nice and pretty here: first tagged photos, mobile uploads, uploaded photos and
00:43:29.000 --> 00:43:38.700
profile photos, as you look at this row it has all of the data from one subject so this
00:43:38.700 --> 00:43:44.900
person has zero photos of any kind whereas let us look at this person.
00:43:44.900 --> 00:43:51.100
This person has zero mobile uploads and zero profile photos, but they have 79 uploaded photos
00:43:51.100 --> 00:43:59.100
and 37 tagged photos, and so for each subject we can see that there is some variation there, but
00:43:59.100 --> 00:44:02.700
across the different samples we also see some variation.
00:44:02.700 --> 00:44:15.000
So down here I put step one, they are all equal and they are not all equal, α equals .05, here is the
00:44:15.000 --> 00:44:23.100
decision stage, our K is 4 groups, 4 samples, our degrees of freedom between is 4 - 1, we already
00:44:23.100 --> 00:44:30.200
did that but might as well fill this in; on degrees of freedom for subject, the reason why this is there
00:44:30.200 --> 00:44:52.000
is so that we can find the degrees of freedom residual, and once we have that then we can find our critical F.
00:44:52.000 --> 00:45:07.500
So the degrees of freedom for each subject we should count how many subjects we actually have
00:45:07.500 --> 00:45:19.300
here, we could just count the rows, so I just took the count of profile photos minus 1, so we actually have 29, 29
00:45:19.300 --> 00:45:23.800
cases but our degrees of freedom for subject is 28.
00:45:23.800 --> 00:45:35.000
Now degrees of freedom residual is those two degrees of freedom multiplied by each other, so 3 × 28, and
00:45:35.000 --> 00:45:40.000
that is going to be 84 and that is our denominator degrees of freedom.
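The degrees of freedom bookkeeping for this example can be checked with a couple of lines; this is just a sketch of the arithmetic, assuming 4 photo-type groups and 29 subjects as read off the data set.

```python
# Degrees of freedom for the repeated measures ANOVA in example 1:
# K = 4 photo types (groups), N = 29 Facebook users (subjects/cases).
K = 4
N = 29
df_between = K - 1                      # numerator degrees of freedom
df_subject = N - 1
df_residual = df_between * df_subject   # denominator degrees of freedom
print(df_between, df_subject, df_residual)  # 3 28 84
```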
00:45:40.000 --> 00:45:42.300
So now we can find our critical F.
00:45:42.300 --> 00:45:50.000
In order to do that we use F inverse the probability is .05 and our first degrees of freedom is the
00:45:50.000 --> 00:46:07.200
numerator one and our second degrees of freedom is the denominator and our critical F is 2.71, that is our critical F.
00:46:07.200 --> 00:46:17.600
So once we have that we can now go on to sort of figure out, okay from there let us go on and calculate our sample tests.
00:46:17.600 --> 00:46:26.900
So we will have to find the sample F statistic; before, I stated it generically because you
00:46:26.900 --> 00:46:34.200
might have to find a t statistic or other statistics, but in this case we know, because we have an
00:46:34.200 --> 00:46:43.100
omnibus hypothesis, that we need the F statistic, and we have to find the P value afterwards, so let us find the F statistic.
00:46:43.100 --> 00:46:54.000
Go to your example again, this is example 1, let us put in all the different things you need.
00:46:54.000 --> 00:47:02.000
So you need the variance between over the residual variance, so let us start off
00:47:02.000 --> 00:47:07.000
with variance between, it is something we already know, we know it has been split up into sum of
00:47:07.000 --> 00:47:09.800
squares between and degrees of freedom between.
00:47:09.800 --> 00:47:14.500
We actually have degrees of freedom between already so let us just fill that in, in order to find
00:47:14.500 --> 00:47:21.000
the sum of squares between, you have to find the means for each of these groups.
00:47:21.000 --> 00:47:25.500
We are also going to need to find out what is the N.
00:47:25.500 --> 00:47:40.700
That is actually quite simple because we know that it is 29 for each of these groups so that makes life a little bit simpler.
00:47:40.700 --> 00:47:54.300
Now let us find the averages for each of these samples, so for the first sample, I believe this is tagged photos, the mean is 9.93,
00:47:54.300 --> 00:48:25.500
I believe this is mobile uploads, that is 12.45; for uploaded photos, the average is 68; and finally for profile photos the average is 1.5.
00:48:25.500 --> 00:48:29.500
Okay so now we going to have to calculate the grand mean.
00:48:29.500 --> 00:48:39.500
The grand mean is quite easy to do in Excel because you just take all your data points, every single one, and you calculate that average.
00:48:39.500 --> 00:48:41.400
The average is 23.
00:48:41.400 --> 00:48:49.000
I am just going to copy and paste that here; what I did was put a pointer here so that it would
00:48:49.000 --> 00:48:55.500
just point to that top value, for the grand mean should not change, the grand mean is always the same.
00:48:55.500 --> 00:49:08.000
Now that we have all of these values we can find N times X bar minus the grand mean, squared.
00:49:08.000 --> 00:49:28.000
We could find that for each group, and then when we add that up, we end up getting our sum of
00:49:28.000 --> 00:49:33.200
squares between, and we get this giant number, 82,700.
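Using the rounded sample means quoted in the lecture (9.93, 12.45, 68, 1.5) and the rounded grand mean of 23, the same N × (X bar − grand mean)² computation can be sketched in Python. Because these inputs are rounded, the total lands near, but not exactly on, the 82,700 Excel gives from the raw data.

```python
# SS between for example 1, recomputed from the rounded means quoted above.
# Excel, working from the raw data, gets about 82,700; these rounded
# inputs land in the same ballpark.
n = 29                                   # subjects per group
sample_means = [9.93, 12.45, 68, 1.5]    # tagged, mobile, uploaded, profile
grand_mean = 23                          # rounded grand mean
ss_between = sum(n * (m - grand_mean) ** 2 for m in sample_means)
print(round(ss_between))  # 80312 with these rounded inputs
```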
00:49:33.200 --> 00:49:46.300
And so I am just going to put a pointer, an equals sign pointing to that guy, and then I am going to find my variance between.
00:49:46.300 --> 00:49:52.800
So my variance is still quite large about 27,600.
00:49:52.800 --> 00:50:00.600
Okay so now that we have that, we need to find my residual variance.
00:50:00.600 --> 00:50:07.200
In order to find residual variance, I know I am going to need to find all this other stuff that I did not necessarily plan on.
00:50:07.200 --> 00:50:13.900
So one of the things I do need to find is my SS total as well as my SS subject.
00:50:13.900 --> 00:50:22.800
I am going to start with SS total because although the idea is simple, in Excel it looks a little crazy
00:50:22.800 --> 00:50:28.300
just because it takes up a lot of space because we going to need to find this square distance
00:50:28.300 --> 00:50:33.500
away from the grand mean for every single data point.
00:50:33.500 --> 00:50:39.700
So here, all my data points are here.
00:50:39.700 --> 00:50:55.100
Now I am going to need to find the square distance of this guy away from the grand mean, and then add them all up.
00:50:55.100 --> 00:51:07.300
What is helpful in Excel is to create separate rows and then to sort of add them up, and so I am just
00:51:07.300 --> 00:51:15.200
going to save these for later, and so this is, I have already put in the formulas here, this one is
00:51:15.200 --> 00:51:25.000
for the tagged photos; it is sort of my partial way to find SS total just for the tagged photos and I am
00:51:25.000 --> 00:51:31.000
going to do it for the mobile photos and for the uploaded photos then for profile photos and then add them altogether .
00:51:31.000 --> 00:51:33.700
So these are sort of subtotals.
00:51:33.700 --> 00:51:45.000
So what I need to find is that data point minus the grand mean, and I will just use this grand mean that I found down here.
00:51:45.000 --> 00:51:53.300
But what I need to do is I need to lock that down I need to say always use this grand mean do not use any other one.
00:51:53.300 --> 00:51:58.000
I put that in parentheses so that I could square it.
00:51:58.000 --> 00:52:13.000
So here I am going to do that all the way down for tagged photos and just take this across for mobile,
00:52:13.000 --> 00:52:23.900
uploaded and profile photos, and that is the nice thing about Excel, it will give you all of these values very easily.
00:52:23.900 --> 00:52:31.900
I am just going to shorten this for a second, just to show you what each of these is talking about.
00:52:31.900 --> 00:52:40.300
So click on this one: this cell gives me this value minus my grand mean, which is locked down,
00:52:40.300 --> 00:52:51.600
squared, so I now have every single data point's squared distance away from the grand mean, and these are all those squared distances.
00:52:51.600 --> 00:52:54.100
Now I need to add them all up.
00:52:54.100 --> 00:53:03.700
So put sum, and I am not just going to add up this column I am literally going to add all this up.
00:53:03.700 --> 00:53:12.400
So our total sum of squares is 257,000.
00:53:12.400 --> 00:53:21.500
So I am going to go down to my sum of squares total and just put a pointer here and say that is it.
00:53:21.500 --> 00:53:31.700
So how do I find my sum of squares for the subject level variation?
00:53:31.700 --> 00:53:39.000
Well, this I know I need to find the mean for every subject then I need to find the distance
00:53:39.000 --> 00:53:46.600
between that mean and the grand mean square that and multiply it by how many groups I have.
00:53:46.600 --> 00:53:55.000
The nice thing is the number of groups is constant, it is always four for everybody, so let us go ahead and find the subject level means.
00:53:55.000 --> 00:54:07.800
So subject means are going to be found by averaging one person's measures across all 4 samples, and
00:54:07.800 --> 00:54:18.700
so that guy's average is zero; just copy and paste that down, and if you wanted to check, this one takes the average of these four measures.
00:54:18.700 --> 00:54:28.400
So this is subject level variation and this shows you that this guy has a lot fewer photos period than this guy.
00:54:28.400 --> 00:54:33.600
He just has, on average, a lot more photos than this guy.
00:54:33.600 --> 00:54:36.500
And this guy is sort of in the middle of those two.
00:54:36.500 --> 00:54:46.500
Once we have these subject level means now we could find this idea K times the difference
00:54:46.500 --> 00:55:01.800
squared for each subject so I know my K is going to be 4 times my subject level mean minus the
00:55:01.800 --> 00:55:09.100
grand mean and I will just use my already calculated grand mean down here and I need to lock
00:55:09.100 --> 00:55:16.600
that grand mean down because that grand mean is never going to change squared.
00:55:16.600 --> 00:55:35.600
Once I have that then I probably want to add them all up in order to get my sum of squares for within subject variation.
00:55:35.600 --> 00:55:45.400
I will just put this little sum sign so that I know that this is not another data point, it is a
00:55:45.400 --> 00:56:00.300
totally different thing, sum, and once I have that it is 56,600 and I know my sum of squares within subject.
00:56:00.300 --> 00:56:09.300
Once I know all those things I can finally calculate sum of squares residual because I know my ingredients.
00:56:09.300 --> 00:56:19.800
I have my sum of squares total minus the quantity sum of squares per subject plus the sum of squares
00:56:19.800 --> 00:56:30.200
between and I could obviously distribute out that negative sign but I will just use the parentheses.
00:56:30.200 --> 00:56:39.400
So here is my leftover sum of squares, whatever is left over, unaccounted for, and I already
00:56:39.400 --> 00:56:50.100
figured out my DF residual and so here I am going to put my sum of squares residual divided by
00:56:50.100 --> 00:56:56.300
degrees of freedom residual, and there I get 1400.
00:56:56.300 --> 00:57:06.900
So now we can finally calculate our F by taking the variance between and dividing that by the residual variance.
00:57:06.900 --> 00:57:14.900
And there I get 19.69, which is quite a bit above the critical F of 2.71.
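Putting the rounded sums of squares quoted in this example back together (SS total ≈ 257,000, SS subject ≈ 56,600, SS between ≈ 82,700), the F statistic can be re-derived as a sketch; rounding in the quoted values makes it land close to, not exactly on, the 19.69 computed in Excel from the raw data.

```python
# Recomputing F for example 1 from the rounded sums of squares quoted above.
ss_total = 257_000
ss_subject = 56_600
ss_between = 82_700
df_between, df_residual = 3, 84

ss_residual = ss_total - (ss_subject + ss_between)   # leftover variability
var_between = ss_between / df_between
var_residual = ss_residual / df_residual
f = var_between / var_residual
print(round(f, 2))  # 19.67 with these rounded inputs; Excel's raw-data value is 19.69
```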
00:57:14.900 --> 00:57:21.200
Now once I have that now I could find my P value.
00:57:21.200 --> 00:57:36.000
So to find my P value I would put in my F value, my numerator degrees of freedom as
00:57:36.000 --> 00:57:48.000
well as my denominator degrees of freedom, and I get 9.3 × 10 to the negative 10th power, so that
00:57:48.000 --> 00:57:54.700
means there are a lot of decimal places before you get to that 9, so it is a very very small P value.
00:57:54.700 --> 00:57:58.300
So what do we do?
00:57:58.300 --> 00:58:00.600
We reject the null.
00:58:00.600 --> 00:58:10.000
Also remember that in an F test, all we do is reject the omnibus null hypothesis; that does not
00:58:10.000 --> 00:58:15.400
mean we know which groups are actually different from each other so when you do reject the
00:58:15.400 --> 00:58:20.400
null after doing F test, you want to follow up and do post hoc test.
00:58:20.400 --> 00:58:26.800
There are lots of different post hoc tests; you might learn the Tukey post hoc test or Bonferroni corrections,
00:58:26.800 --> 00:58:35.100
so those all help us with the pairwise comparisons to figure out which means are actually
00:58:35.100 --> 00:58:44.000
different from which other means, and you probably also want to find effect size, and in an F test the
00:58:44.000 --> 00:58:47.300
effect size is not d or g, instead it is η squared.
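As one concrete sketch of these corrections (an illustration, not worked in the lecture): a Bonferroni correction just divides α by the number of pairwise comparisons, which for K groups is K choose 2.

```python
import math

# Bonferroni correction sketch: with K = 4 photo types there are
# C(4, 2) = 6 pairwise comparisons, so each follow-up test is run
# at alpha / 6 instead of alpha.
alpha = 0.05
k = 4
n_comparisons = math.comb(k, 2)                 # 6 pairwise tests
alpha_per_test = alpha / n_comparisons
print(n_comparisons, round(alpha_per_test, 4))  # 6 0.0083
```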
00:58:47.300 --> 00:58:56.800
So we would reject the null.
00:58:56.800 --> 00:59:06.600
Example 2: a weight loss boot camp is trying out three different exercise programs to help their clients shed some extra pounds.
00:59:06.600 --> 00:59:11.700
All participants are assigned to teams of 4 people, and each week their entire team is weighed
00:59:11.700 --> 00:59:14.400
together to see how many pounds they were able to take off.
00:59:14.400 --> 00:59:18.000
The data shows their weekly weight loss as a team.
00:59:18.000 --> 00:59:24.500
Were the exercise programs all equally effective in helping them lose weight? Note that all teams
00:59:24.500 --> 00:59:29.600
tried all three exercise regimes, but they each received the treatments in random order.
00:59:29.600 --> 00:59:34.100
So this is definitely a case where we have three different treatments.
00:59:34.100 --> 00:59:41.200
Treatment 1, 2 and 3 and we have data points which are going to be pounds lost.
00:59:41.200 --> 00:59:50.000
How many pounds they were able to take off per week, pounds lost per week, but these are not independent samples.
00:59:50.000 --> 00:59:51.900
They are actually linked to each other.
00:59:51.900 --> 00:59:54.000
What's the link?
00:59:54.000 --> 01:00:02.600
It is the team of four that lost that weight right so this team lost that much under this exercise
01:00:02.600 --> 01:00:08.400
regime, lost that much under this exercise regime, lost that much under this exercise regime.
01:00:08.400 --> 01:00:12.500
Now each team got these three exercise regimes in a different order.
01:00:12.500 --> 01:00:27.100
Some squads are 3, 2, 1, so they have all been balanced in that way, so if you pull up your examples and go to example 2, you will see this data set.
01:00:27.100 --> 01:00:30.000
So here are the different teams or squads.
01:00:30.000 --> 01:00:35.000
Here are the three different types of exercise program and in the different orders that they
01:00:35.000 --> 01:00:41.800
were, that they did these exercises and each exercise was done for a week.
01:00:41.800 --> 01:00:44.500
So let us think about this.
01:00:44.500 --> 01:01:00.200
So to begin with we need a hypothesis so step one is the null hypothesis and all are equal.
01:01:00.200 --> 01:01:06.900
So all the mu's, exercise 1, exercise 2, exercise 3, they are all equal.
01:01:06.900 --> 01:01:13.900
The alternative hypothesis is that not all are equal.
01:01:13.900 --> 01:01:25.900
So step 2 is our significance level, we could just set α equal to .05; once again, because with an
01:01:25.900 --> 01:01:32.400
omnibus hypothesis we know we are going to do an F test, it does not need to be two-tailed.
01:01:32.400 --> 01:01:43.500
So step three, this is the decision stage; if you imagine that F distribution and color in that part,
01:01:43.500 --> 01:01:46.600
what is that critical F?
01:01:46.600 --> 01:01:55.000
Well, in order to find the critical F, we are going to need to find the DF between as well as the DF
01:01:55.000 --> 01:02:01.700
residual because that is the numerator and the denominator degrees of freedom.
01:02:01.700 --> 01:02:10.400
In order to find DF residual we also need to find DF subject and remember here subject does not
01:02:10.400 --> 01:02:13.500
mean each individual person, subject really means case.
01:02:13.500 --> 01:02:16.100
And each case here is a squad.
01:02:16.100 --> 01:02:21.500
So how many squads are there -1.
01:02:21.500 --> 01:02:26.600
So count how many squads there are -1.
01:02:26.600 --> 01:02:40.600
So there are 11 degrees of freedom for subject.
01:02:40.600 --> 01:02:53.800
For degrees of freedom between, what we are going to need is the number of different samples, which is three, minus 1, so 3 - 1 is 2.
01:02:53.800 --> 01:03:04.300
And so my DF residual is the DF between times the DF subject and that is 22 so let us find the critical F.
01:03:04.300 --> 01:03:12.600
We need F inverse, the probability that we need is .05, the degrees of freedom for the
01:03:12.600 --> 01:03:22.400
numerator is 2, the degrees of freedom for the denominator is 22 and our critical F is 3.44.
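The same degrees of freedom bookkeeping works for example 2; this sketch assumes the 3 exercise regimes and 12 squads described in the data set.

```python
# Degrees of freedom for example 2: K = 3 exercise regimes, N = 12 squads.
K = 3
N = 12
df_between = K - 1                      # 2, numerator degrees of freedom
df_subject = N - 1                      # 11
df_residual = df_between * df_subject   # 22, denominator degrees of freedom
print(df_between, df_subject, df_residual)  # 2 11 22
```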
01:03:22.400 --> 01:03:42.900
Step 4, here we are going to need the F statistic and in order to find F, we need the variance between divided by the variance residual.
01:03:42.900 --> 01:03:53.600
In order to find variance between we are going to need the SS between divided by DF between,
01:03:53.600 --> 01:03:58.000
we already have DF between thankfully, so we do need SS between.
01:03:58.000 --> 01:04:09.100
And the concept of SS between is the whole idea of each sample's X bar, their distance away from
01:04:09.100 --> 01:04:16.500
the grand mean squared and then depending on how many subjects you had in your sample how
01:04:16.500 --> 01:04:20.300
many data points you had in your sample, it gets weighted more or less.
01:04:20.300 --> 01:04:23.900
Now the nice thing is all of these have the same number of subjects.
01:04:23.900 --> 01:04:29.500
But let us go ahead and try to do this.
01:04:29.500 --> 01:04:39.000
So first we need the different samples so this exercise 1, exercise 2, exercise 3, we need their N,
01:04:39.000 --> 01:04:47.700
their N is going to be 12, there are 12 data points in each sample.
01:04:47.700 --> 01:05:01.000
We also need each exercise regime's average weight loss, so we need X bar, and we also
01:05:01.000 --> 01:05:08.000
need the grand mean because ultimately we are going to look for N times X bar minus the grand
01:05:08.000 --> 01:05:12.600
mean squared in order to add all of those up.
01:05:12.600 --> 01:05:18.200
So let us find X bars for exercise regime number 1.
01:05:18.200 --> 01:05:31.400
So Excel makes it nice and easy for us to just find all those averages very quickly and
01:05:31.400 --> 01:05:38.100
then once we have that, now we can find the grand mean.
01:05:38.100 --> 01:05:40.800
The grand mean is also very easy to find here.
01:05:40.800 --> 01:05:44.000
We just want to select all the data points.
01:05:44.000 --> 01:05:50.000
I think I selected one of them twice, be careful about that.
01:05:50.000 --> 01:05:59.800
So make sure everybody is selected just one time so this is the average weight loss per week
01:05:59.800 --> 01:06:05.500
regardless of which team you were on regardless of which exercise you did.
01:06:05.500 --> 01:06:23.400
And now let us find N times the X bar minus the grand mean squared and let us do that for each for each exercise regime.
01:06:23.400 --> 01:06:37.200
Once we have that done we could find the sum, and the sum is 23.63.
01:06:37.200 --> 01:06:46.500
So here in SS between I would put that number there.
01:06:46.500 --> 01:06:58.000
So once we have that now we could actually find this because we already have calculated the DF between, was not too hard.
01:06:58.000 --> 01:07:10.900
Now we have to work on variance residual, now in order to find variance residual, let me just add
01:07:10.900 --> 01:07:29.000
in a couple of rows here just to give me a little more space, variance residual, in order to find
01:07:29.000 --> 01:07:34.600
variance residual I am going to need to find SS residual divided by DF residual.
01:07:34.600 --> 01:07:42.600
We already have DF residual so we just need to find SS residual, in order to find that I need SS
01:07:42.600 --> 01:07:50.800
total minus the quantity SS between plus SS subject level.
01:07:50.800 --> 01:08:00.400
So I already have my SS between so I need to find SS total and SS for each subject.
01:08:00.400 --> 01:08:12.600
So SS total is going to be for every single exercise regime, for every single one of these data
01:08:12.600 --> 01:08:23.700
points I need to find that distance away from the grand mean, square it, and add them all up, and that is going to be my SS total.
01:08:23.700 --> 01:08:40.500
So for E1 here is my subtotal for SS total, for E2, my subtotal for SS total, for E3, my subtotal for SS total.
01:08:40.500 --> 01:09:01.500
So that is X minus the grand mean, lock that grand mean down, squared and make sure you do
01:09:01.500 --> 01:09:10.800
that for every single data point in E1 so if I check on that last data point and just go ahead and
01:09:10.800 --> 01:09:17.200
copy and paste that all the way to here; let us just check on this one, this is taking this
01:09:17.200 --> 01:09:23.900
value, subtracting the grand mean from it and then squaring that distance.
01:09:23.900 --> 01:09:40.500
So once I have this, I could sum them all up and get my SS total, my total sum of squared distances.
01:09:40.500 --> 01:09:46.700
So I am just going to put a pointer here so that I do not have to rewrite the number.
01:09:46.700 --> 01:09:50.000
Once I have that, all I have left to find is the SS subject.
01:09:50.000 --> 01:09:55.000
Now remember, for SS subject each subject has its own little mean because we repeatedly take
01:09:55.000 --> 01:10:01.400
the measure, right, so we have to find the subject's mean and then we have to get the distance
01:10:01.400 --> 01:10:11.300
between their mean and the grand mean, square that and multiply it by the number of measures, K.
01:10:11.300 --> 01:10:26.300
So let us do that here, first we need to find the subjects' X bars, so that is going to be each squad's
01:10:26.300 --> 01:10:42.000
average weight loss so some squads probably lost more weight than others, so this is the average
01:10:42.000 --> 01:10:59.200
weight loss for each squad, so it looks like this squad lost a bit, so a little bit of variation
01:10:59.200 --> 01:11:12.300
in subjects' success, and here we are going to look at K times the subject's X bar minus the grand
01:11:12.300 --> 01:11:24.100
mean squared so we already know K, K is going to be 3 times the subjects X bar minus the grand
01:11:24.100 --> 01:11:38.400
mean, I am just going to use the one we have already calculated down here and of course lock that down so copy and paste this, squared.
01:11:38.400 --> 01:11:51.900
So copy and paste that all the way down and I could find the summary here and this is going to be my sum of squares for subject.
01:11:51.900 --> 01:11:57.700
That is the sum of a bunch of squares.
01:11:57.700 --> 01:12:01.100
So that is 34 something.
01:12:01.100 --> 01:12:11.800
I am just going to put a pointer there so I do not have to retype that but I could just see it nice and clearly right here.
01:12:11.800 --> 01:12:24.100
So now I have everything I need in order to find SS residual: SS total minus the quantity sum of squares between plus sum of squares subject.
01:12:24.100 --> 01:12:35.800
Once I have that, now I can find my residual variance, SS residual divided by degrees of freedom residual, okay
01:12:35.800 --> 01:12:52.000
so here it looks like my residual variance is much smaller than my between sample variance and
01:12:52.000 --> 01:13:01.000
so I could predict my F value will be pretty big so 11 point something divided by two point
01:13:01.000 --> 01:13:11.200
something and that gives me 5.219 and that is a little bit bigger than my critical F.
01:13:11.200 --> 01:13:23.300
So if I find my P value with F dist and put in my F, my numerator degrees of freedom, my
01:13:23.300 --> 01:13:34.200
denominator degrees of freedom, I would find .01 so that seems like a pretty small, smaller than
01:13:34.200 --> 01:13:38.000
.05 so I am going to be rejecting my null.
01:13:38.000 --> 01:13:44.900
So step five down here, reject the null.
01:13:44.900 --> 01:13:52.000
And we know that once you reject the null you are going to need to also do post hoc tests as well as find eta squared.
01:13:52.000 --> 01:14:05.300
So that brings us to example 3 what is the problem with a bunch of tiny t-test?
01:14:05.300 --> 01:14:18.000
Well, with so many t-tests the probability of type 1 error increases, increasing the cutoff α; actually
01:14:18.000 --> 01:14:25.900
we are not increasing the cutoff, we are keeping it at .05, but the type 1 error increases because
01:14:25.900 --> 01:14:29.800
we have the possibility of rejecting the null multiple times.
01:14:29.800 --> 01:14:38.200
With so many t-tests the probability of type 1 error increases; here it is because we may be rejecting more null hypotheses.
01:14:38.200 --> 01:14:43.900
This is actually a correct answer so we might not be done yet.
01:14:43.900 --> 01:14:50.800
With so many paired samples t-test we have a better estimate of S because we have been estimating S several times.
01:14:50.800 --> 01:15:00.100
With so many paired samples t-tests we have a poorer estimate of S because we are not using all of
01:15:00.100 --> 01:15:08.200
the data to estimate one S; in fact we are just using subsets of the data to estimate S several times, that is a good answer.
01:15:08.200 --> 01:15:13.000
So that is it for repeated measures ANOVA, thanks for using educator.com.