WEBVTT
00:00:00.000 --> 00:00:02.600
Hi, welcome to educator.com.
00:00:02.600 --> 00:00:05.600
We are going to talk about effect size and power.
00:00:05.600 --> 00:00:11.200
So effect size and power, 2 things you need to think about whenever you do hypothesis testing.
00:00:11.200 --> 00:00:13.100
So first effect size.
00:00:13.100 --> 00:00:19.000
We are going to talk about what effect size is by contrasting it to the T statistic.
00:00:19.000 --> 00:00:26.400
They actually have a lot in common but there is just one subtle difference that makes a huge difference.
00:00:26.400 --> 00:00:32.400
Then we are going to talk about the roles of effect size and why we need effect size.
00:00:32.400 --> 00:00:35.600
Then we are going to talk about power.
00:00:35.600 --> 00:00:42.600
What is it, why do we need it, and how do all these different things affect power for instance sample size,
00:00:42.600 --> 00:00:47.700
effect size, variability, and α, the significance level.
00:00:47.700 --> 00:00:56.000
So first things first, just a review of what the sample T really means.
00:00:56.000 --> 00:01:10.400
So a lot of times people just memorize the T formula, it is you know the X bar minus μ over standard error but think about what this actually means.
00:01:10.400 --> 00:01:16.600
So T equals X bar minus μ over the standard error.
00:01:16.600 --> 00:01:19.000
And I will write that as S sub X bar.
00:01:19.000 --> 00:01:27.100
What this will end up giving you is this distance so the distance between your sample and your hypothesized μ.
00:01:27.100 --> 00:01:37.100
And when you divide it by standard error you get how many standard errors you need in order to get from X bar to your μ.
00:01:37.100 --> 00:01:42.100
So you get distance in terms of standard error.
00:01:42.100 --> 00:01:51.100
So distance in terms of standard error.
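To make that concrete, here is a minimal Python sketch of the one-sample T described above; the sample values and the hypothesized μ are hypothetical numbers, not from the lecture.

```python
import math

# One-sample t: distance from x-bar to mu, measured in standard errors.
def t_statistic(sample, mu):
    n = len(sample)
    x_bar = sum(sample) / n
    # sample standard deviation (n - 1 in the denominator)
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    se = s / math.sqrt(n)      # standard error of the mean
    return (x_bar - mu) / se   # distance in units of standard error

sample = [12.1, 9.8, 11.4, 10.9, 12.6, 10.2]  # hypothetical data
print(t_statistic(sample, mu=10))
```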
00:01:51.100 --> 00:02:02.700
And you want to think of it in these terms: instead of using feet or inches or number of friends, we get distance in units of standard error.
00:02:02.700 --> 00:02:12.000
So whatever your standard error is for instance here that looks about right, because this is the normal
00:02:12.000 --> 00:02:19.000
distribution that should be about 68% so that is the standard error.
00:02:19.000 --> 00:02:26.200
Your T is how many of these you need in order to get to X bar.
00:02:26.200 --> 00:02:44.000
So this might be like a T of 3 1/2, 3 1/2 standard errors away gets you from μ to your sample difference and so this is the case of the two sample t-test.
00:02:44.000 --> 00:02:48.700
So independent samples or paired samples, where we know the μ is zero.
00:02:48.700 --> 00:02:55.000
So this is sort of the concept behind the T statistic.
00:02:55.000 --> 00:03:01.500
Now here is the problem with this T statistic.
00:03:01.500 --> 00:03:05.000
It is actually pretty sensitive to N.
00:03:05.000 --> 00:03:19.600
So let us say you have a difference that is going to stay the same, a difference between, let us say, 10 and 0.
00:03:19.600 --> 00:03:22.300
So we hold that difference constant.
00:03:22.300 --> 00:03:32.300
If you have a very very large N then your SDOM becomes a lot skinnier.
00:03:32.300 --> 00:03:45.700
And because of that your standard error is also going to shrink, so the standard error shrinks as N grows.
00:03:45.700 --> 00:03:57.500
And because of that, even though we have not changed anything about this mean, about the X bar or μ,
00:03:57.500 --> 00:04:03.700
by shrinking our standard error we made our T quite large.
00:04:03.700 --> 00:04:11.700
So all of a sudden we are 6 standard errors away, but we really have not changed the picture.
00:04:11.700 --> 00:04:17.900
So that is actually a problem that T is so highly affected by N.
00:04:17.900 --> 00:04:28.000
The problem with that is that you could artificially make a difference between means look statistically significant by having a very large N.
00:04:28.000 --> 00:04:38.100
So we need something that tells us this distance that is less affected by N, and that is where effect size comes in.
00:04:38.100 --> 00:04:47.900
So with effect size, what we are doing is we want to know the distance in terms of something that is not so affected by N.
00:04:47.900 --> 00:04:55.600
And in fact we are going to use the population standard deviation because let us think about T.
00:04:55.600 --> 00:05:02.800
So that is X bar minus μ over standard error.
00:05:02.800 --> 00:05:07.200
So contrast that to looking at the distance in terms of the standard deviation of the population; what would that look like?
00:05:07.200 --> 00:05:20.900
Well, we could actually derive the formula ourselves.
00:05:20.900 --> 00:05:28.600
We want that distance, in terms of number of inches or number of problems correct or whatever the
00:05:28.600 --> 00:05:41.300
raw score is, but instead of the standard error we would just divide by S, or if you had it, you would use Sigma.
00:05:41.300 --> 00:05:48.800
So you could think of this as the estimated Sigma and this is like the real deal Sigma.
00:05:48.800 --> 00:05:59.700
And that is what effect size is and effect size is often symbolized by the letters D and G.
00:05:59.700 --> 00:06:07.400
D is reserved for when you have Sigma; G is used for when you use S.
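As a rough sketch of those two measures (the function names follow the convention just described; all the numbers are hypothetical):

```python
import math

def cohens_d(x_bar, mu, sigma):
    # d: distance in units of the known population standard deviation
    return (x_bar - mu) / sigma

def hedges_g(sample, mu):
    # g: the same distance, but using the sample estimate s in place of sigma
    n = len(sample)
    x_bar = sum(sample) / n
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
    return (x_bar - mu) / s

print(cohens_d(x_bar=105, mu=100, sigma=15))   # hypothetical numbers
print(hedges_g([11, 9, 12, 10, 13], mu=10))    # hypothetical sample
```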
00:06:07.400 --> 00:06:13.400
Now let us talk about the roles of effect size.
00:06:13.400 --> 00:06:23.700
The nice thing about effect size is that the N does not matter as much: whether you have a small sample or a large sample, the effect size stays similar.
00:06:23.700 --> 00:06:36.500
In test statistics such as T or Z, the N matters quite a bit, and let us think again about why.
00:06:36.500 --> 00:06:46.100
So the T statistic I have been writing so far as this distance over standard error, but let us think about what standard error is.
00:06:46.100 --> 00:06:57.100
Standard error is S divided by the square root of N; now let us think about N getting bigger and bigger.
00:06:57.100 --> 00:07:10.300
This whole thing in the denominator becomes smaller and smaller.
00:07:10.300 --> 00:07:21.400
And whether it is positive or negative, if you divide some distance by a small number then
00:07:21.400 --> 00:07:27.900
you end up getting a more extreme value.
00:07:27.900 --> 00:07:43.300
So by more extreme I mean way more positive or way more negative.
00:07:43.300 --> 00:07:54.100
So the T statistic is very very sensitive to N, and so is the Z, because with Z the only difference is that instead of S we use Sigma.
00:07:54.100 --> 00:08:07.900
And so the same logic applies, but for effect sizes D and G we do not divide by the square root of N, so in that way N does not really have as much of an effect.
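A quick numerical sketch of that point, with made-up numbers: hold the raw distance and the spread fixed, vary only N, and watch T grow while the effect size stays put.

```python
import math

x_bar, mu, s = 10.0, 0.0, 20.0   # hypothetical: same difference, same spread
for n in (4, 100, 10000):
    se = s / math.sqrt(n)        # standard error shrinks as n grows
    t = (x_bar - mu) / se        # t is sensitive to n through se
    g = (x_bar - mu) / s         # effect size: n does not appear at all
    print(n, round(t, 2), round(g, 2))
```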
00:08:07.900 --> 00:08:20.200
Okay, so one thing to remember is: if you know Sigma, use Cohen's D; if you need to estimate the standard deviation from the sample, S, you want to use Hedges' G.
00:08:20.200 --> 00:08:30.500
Okay so now you know what effect size is and it is nice that it is not as affected by N but why do we need it?
00:08:30.500 --> 00:08:47.500
Well, effect size is the statistic we use to interpret practical significance, so for instance we might have some sort of very small difference between group 1 and group 2, say males and females
00:08:47.500 --> 00:09:00.500
on some game or task, there is a very tiny difference like you know let us just say males are ahead by .0001 points.
00:09:00.500 --> 00:09:09.000
And practically, it sort of does not matter, but if you have a large
00:09:09.000 --> 00:09:16.200
enough N you could get a small enough standard error that you can make that tiny difference seem like a
00:09:16.200 --> 00:09:25.300
big deal, and you can imagine that would be a sort of odd situation.
00:09:25.300 --> 00:09:32.800
We take a difference that sort of does not matter but then make a big deal out of it because of some fancy statistics we did.
00:09:32.800 --> 00:09:40.300
Well, effect size is not going to be affected by N, and so that is going to give you a more straightforward
00:09:40.300 --> 00:09:44.400
measure of whether this difference is big enough to care about.
00:09:44.400 --> 00:09:52.500
It is not going to tell you whether it was significant or not based on hypothesis testing but it can give you
00:09:52.500 --> 00:10:01.200
the idea of practical significance, and here we are using the everyday meaning of significant, as in important.
00:10:01.200 --> 00:10:14.100
It will tell you a bit about practical importance, not statistical outlier-ness; it is talking
00:10:14.100 --> 00:10:22.800
about just regular old practical importance, and the way you can think about this is: is this difference worth noticing?
00:10:22.800 --> 00:10:25.200
Is that worth even doing statistics on?
00:10:25.200 --> 00:10:32.600
The thing about hypothesis testing is that it could be deceiving, a very large sample size can lead to a
00:10:32.600 --> 00:10:41.100
statistically significant outlier difference that we really do not care about, that just has no practical significance.
00:10:41.100 --> 00:10:48.500
So here although we have been trying to talk about this again and again trying to sort of clarify that
00:10:48.500 --> 00:10:56.100
statistically significant does not mean important; it just means it lies outside of our expectation.
00:10:56.100 --> 00:11:03.000
It is important to realize once again that statistical significance does not equal practical significance.
00:11:03.000 --> 00:11:12.200
This is sort of talking about how important something is and this is just sort of saying, does it stand out?
00:11:12.200 --> 00:11:18.700
Does our X bar our sample actually stand out?
00:11:18.700 --> 00:11:23.900
Okay now let us move on to power.
00:11:23.900 --> 00:11:25.300
What is power?
00:11:25.300 --> 00:11:31.200
Well, to answer that we really need to go back to our understanding of the two types of errors.
00:11:31.200 --> 00:11:35.400
Remember in hypothesis testing we can make an error in two different ways.
00:11:35.400 --> 00:11:44.300
One is the false alarm error and we set that false alarm error rate by α and the other kind of error is
00:11:44.300 --> 00:11:48.200
this incorrect decision that we can make called the miss.
00:11:48.200 --> 00:11:56.800
A miss is when we fail to reject the null hypothesis when we really should.
00:11:56.800 --> 00:12:04.800
And that is signified by β, by the term β.
00:12:04.800 --> 00:12:18.900
Now, when the null hypothesis is true, if we have already set our probability of making an
00:12:18.900 --> 00:12:36.000
incorrect decision, then just by subtraction we can figure out our probability of making a correct decision; so if our probability of making an incorrect decision is .05, then the other possibility is that we make a correct decision 95% of the time, 1 - .05.
00:12:36.000 --> 00:12:44.000
In the same way when the null hypothesis is actually false we could figure out our probability of actually
00:12:44.000 --> 00:12:52.300
making a correct decision by just subtracting our probability of making an incorrect decision from one.
00:12:52.300 --> 00:12:55.200
So this would be one minus β.
00:12:55.200 --> 00:13:06.800
In that way, these two decisions that we can make add up to a probability of one, and these two decisions add up to a probability of one as well.
00:13:06.800 --> 00:13:14.300
But in reality only one of these worlds is true; that is why each world's decisions sum to a probability of 1.
00:13:14.300 --> 00:13:22.600
We just have no idea whether this one is true or this one is true, and no one can ever really say, but that is a philosophical question.
00:13:22.600 --> 00:13:31.300
So given this picture, power resides here, and this quadrant is what we think of as power.
00:13:31.300 --> 00:13:44.000
Now power is just this idea: given that the null hypothesis is actually false, that this is the world we
00:13:44.000 --> 00:13:54.700
live in, ignore this part, just ignore this entire world; given that the null hypothesis is false, what
00:13:54.700 --> 00:14:03.300
is our probability of actually rejecting the null hypothesis and that is what we call power.
00:14:03.300 --> 00:14:14.900
So think of this as the probability of rejecting null when the null is false.
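One way to see that definition in action is a rough Monte Carlo sketch (not anything from the lecture; the cutoff, means, and trial count here are all hypothetical): draw many samples from a world where the null really is false and count how often we reject.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

def estimate_power(true_mean, sigma, n, z_crit=1.645, trials=2000):
    # Proportion of simulated experiments, drawn from a world where the
    # null (mu = 0) is false, in which the test rejects the null.
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(true_mean, sigma) for _ in range(n)]
        se = statistics.stdev(sample) / math.sqrt(n)
        if statistics.mean(sample) / se > z_crit:  # one-tailed test of mu = 0
            rejections += 1
    return rejections / trials

print(estimate_power(true_mean=0.5, sigma=1.0, n=25))
```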
00:14:14.900 --> 00:14:24.400
So why do we need power, why do we need 1 – β?
00:14:24.400 --> 00:14:32.700
Well, here it is going to come back, those concepts come right back.
00:14:32.700 --> 00:14:39.000
Remember the idea that you know sometimes we wanted to detect some sort of disease right and we
00:14:39.000 --> 00:14:48.200
might give a test like for instance we want to know whether someone has HIV and so we give them a blood test to figure out, do they have HIV.
00:14:48.200 --> 00:14:57.400
Now, these tests are not perfect, and so there is some chance that they will detect the disease and some chance that they will make a mistake.
00:14:57.400 --> 00:15:03.600
There are two ways of thinking about this prediction.
00:15:03.600 --> 00:15:16.500
One is what we call positive predictive value: what is the probability that someone has the disease, for instance HIV, given that they test positive?
00:15:16.500 --> 00:15:26.200
Well this will help us know what is the chance that they actually have the disease once we know their test score.
00:15:26.200 --> 00:15:32.500
In this world we know their test scores and we want to know what is the probability that they have the disease.
00:15:32.500 --> 00:15:35.900
On the other hand we have what is called sensitivity.
00:15:35.900 --> 00:15:39.800
Sensitivity thinks about the world in a slightly different way.
00:15:39.800 --> 00:15:50.500
Given that this person has the disease, whatever disease such as HIV, what is the probability that they will actually test positive?
00:15:50.500 --> 00:15:54.200
And notice that these two actually give us very different worlds.
00:15:54.200 --> 00:16:07.400
In one world the given is that they have a positive test, and what is the probability that they have the disease versus no disease?
00:16:07.400 --> 00:16:13.400
In this scenario the given is very different.
00:16:13.400 --> 00:16:16.600
The given is that they actually have the disease.
00:16:16.600 --> 00:16:23.300
Given that what is the probability that they will test positive versus negative?
00:16:23.300 --> 00:16:28.000
And so they are looking at this or they are looking at this.
00:16:28.000 --> 00:16:44.200
Now power is basically the probability of getting a hit, the probability of rejecting that null hypothesis given that the null hypothesis is actually false, so it is actually wrong.
00:16:44.200 --> 00:16:49.700
Is this more like PPV, positive predictive value?
00:16:49.700 --> 00:16:51.800
Or is it more like sensitivity?
00:16:51.800 --> 00:16:53.700
Well let us think about this.
00:16:53.700 --> 00:17:02.500
In this world, the given reality is that the null is false.
00:17:02.500 --> 00:17:07.500
We need to reject it.
00:17:07.500 --> 00:17:12.100
What is the probability that it will actually be rejected?
00:17:12.100 --> 00:17:20.400
So reject or fail to reject.
00:17:20.400 --> 00:17:32.000
Well, one way of thinking about this comparison is to consider: what is the thing that we
00:17:32.000 --> 00:17:34.800
do not know in these two scenarios?
00:17:34.800 --> 00:17:39.300
We do not really know if they actually have HIV.
00:17:39.300 --> 00:17:45.500
We know their test, we know that their test is either positive or negative, and the test is uncertain, but
00:17:45.500 --> 00:17:54.000
whether they actually have HIV or not, that does not have uncertainty, it is just that we do not know what it is.
00:17:54.000 --> 00:17:58.000
This is sort of like HIV in that way.
00:17:58.000 --> 00:18:08.500
This is the reality so HIV is the reality and this, this is the test results.
00:18:08.500 --> 00:18:21.400
This is also the reality and these are the results of hypothesis testing.
00:18:21.400 --> 00:18:27.600
And so in that way this picture is much more like sensitivity.
00:18:27.600 --> 00:18:36.600
And really when we apply the word sensitivity we see a whole new way of looking at power.
00:18:36.600 --> 00:18:45.200
Power is the idea how sensitive is your hypothesis test when there really is something to detect, can it detect it?
00:18:45.200 --> 00:18:49.300
When there really is HIV, can your test detect it?
00:18:49.300 --> 00:18:56.000
When the null hypothesis really is false, can your test detect it?
00:18:56.000 --> 00:18:59.000
That is the question that power is asking.
00:18:59.000 --> 00:19:05.400
Okay, if you want to calculate power, is there a nice little formula for it?
00:19:05.400 --> 00:19:08.600
Well power is more like the tables in the back of your book.
00:19:08.600 --> 00:19:13.100
You cannot calculate it with one simple straightforward formula.
00:19:13.100 --> 00:19:20.800
There is actually a more complex formula that involves calculus, but we can simulate power for a whole
00:19:20.800 --> 00:19:29.600
bunch of different scenarios, and those scenarios all depend on α, effect size, and also variability and
00:19:29.600 --> 00:19:34.100
sample size and because of that power is often found through simulation.
00:19:34.100 --> 00:19:46.900
So I am not going to focus on calculating power; instead I am going to try to give you a conceptual understanding of power.
00:19:46.900 --> 00:19:55.200
Now, often there is a desired level of power, and sometimes you may be working with computer programs that will calculate power for you.
00:19:55.200 --> 00:20:06.400
A typical level of power that you want to shoot for is .8 or above, but I want you to know how power interacts with all these things.
00:20:06.400 --> 00:20:11.800
All of these things actually go into the calculation of power, but I want you to know them at the conceptual level.
00:20:11.800 --> 00:20:23.800
So how does α, the significance level, affect power; how does effect size, D or G, affect power; how
00:20:23.800 --> 00:20:33.900
does variability, S or S squared, affect power; and how does sample size affect power?
00:20:33.900 --> 00:20:40.900
Okay so first thing is how does α affect power?
00:20:40.900 --> 00:20:47.100
Well, here in this picture I have shown you two distributions.
00:20:47.100 --> 00:20:54.100
You could think of this one as the null distribution and this one as the alternative distribution.
00:20:54.100 --> 00:21:03.500
And notice that both of these distributions up here are exactly the same as down here; I just copied and pasted.
00:21:03.500 --> 00:21:16.200
The only thing that is different is not their means or the actual distribution.
00:21:16.200 --> 00:21:18.800
The only thing that is different is the cut off.
00:21:18.800 --> 00:21:37.000
So here the cutoff score is right here and this is the α, and here the cutoff score has been moved sort of closer towards the population mean.
00:21:37.000 --> 00:21:41.200
And now we have a huge α.
00:21:41.200 --> 00:21:43.800
So let us just assign some numbers here.
00:21:43.800 --> 00:21:57.500
I am just guessing that maybe that looks like α equals .05, something we are more used to seeing, but this looks like maybe α equals, let us say, .15.
00:21:57.500 --> 00:22:02.700
What happens when we increase our α?
00:22:02.700 --> 00:22:06.300
Our α has gotten bigger; what happens to power?
00:22:06.300 --> 00:22:11.000
Well, it might be helpful to think about what power might be.
00:22:11.000 --> 00:22:23.900
In this picture, remember, power is the probability of rejecting the null hypothesis when the null hypothesis
00:22:23.900 --> 00:22:38.600
is actually false and here we often reject when it is more extreme than the cutoff value when your X bar is
00:22:38.600 --> 00:22:42.600
more extreme; so we reject for everything on this side.
00:22:42.600 --> 00:22:49.000
All of this stuff is reject, reject the null.
00:22:49.000 --> 00:23:00.400
And we want to look at the distribution where the null hypothesis is false a.k.a. the alternative hypothesis.
00:23:00.400 --> 00:23:15.800
So really we are looking at this big section right here; this big section, that is power, 1 - β,
00:23:15.800 --> 00:23:19.900
and given that, you could also figure out what β is.
00:23:19.900 --> 00:23:26.200
And β is our error rate for misses.
00:23:26.200 --> 00:23:41.300
When we fail to reject but the alternative hypothesis is true, or, said the other way, the null hypothesis is false.
00:23:41.300 --> 00:23:49.700
So what happens to power when α becomes bigger?
00:23:49.700 --> 00:24:00.500
Well, let us colour in power right here, and it seems like there is more of this distribution that has been colored in than this.
00:24:00.500 --> 00:24:13.300
So this part has been sort of added on; it used to be that just this equals power, but now we have also added on this section.
00:24:13.300 --> 00:24:23.900
So as α increases, power also increases.
00:24:23.900 --> 00:24:27.300
And hopefully you can see that from this picture.
00:24:27.300 --> 00:24:32.500
Now imagine moving α out this way so decreasing α.
00:24:32.500 --> 00:24:42.700
If we decrease α then this power portion of the distribution will become smaller, so
00:24:42.700 --> 00:24:56.000
the counterpoint to this is also true: as α decreases, the power also decreases.
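That tradeoff can be sketched numerically for a one-tailed z-style test; the `shift` value (how many standard errors the alternative mean sits past the null mean) is a hypothetical number, and `norm_ppf` here is just a hand-rolled inverse of the normal CDF.

```python
import math

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    # simple bisection inverse of the normal CDF
    for _ in range(80):
        mid = (lo + hi) / 2
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

shift = 2.0  # hypothetical: alternative mean sits 2 standard errors past null
for alpha in (0.01, 0.05, 0.15):
    z_crit = norm_ppf(1 - alpha)     # bigger alpha moves the cutoff toward mu
    power = 1 - norm_cdf(z_crit - shift)
    print(alpha, round(power, 3))
```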
00:24:56.000 --> 00:25:11.500
But you might be asking yourself, then why can we not just increase α so we could increase power, right?
00:25:11.500 --> 00:25:15.700
Well, remember what α is, α is your false alarm rate.
00:25:15.700 --> 00:25:21.400
So when you increase α, you also increase your false alarm rate.
00:25:21.400 --> 00:25:25.900
So at the same time as you increase power this way, you are increasing your false alarm rate.
00:25:25.900 --> 00:25:33.100
And so this often is not a good way to increase power.
00:25:33.100 --> 00:25:38.600
But you should still know what the relationship is.
00:25:38.600 --> 00:25:43.900
How about effect size, how does effect size affect power?
00:25:43.900 --> 00:25:55.200
Well, remember, you can think of effect size roughly as this distance between the X bar and the μ.
00:25:55.200 --> 00:26:01.200
We are really looking at that distance in terms of standard deviation of the population.
00:26:01.200 --> 00:26:04.000
How does effect size affect power?
00:26:04.000 --> 00:26:14.600
Here I have drawn the same pictures, same cutoff, except I have moved this alternative
00:26:14.600 --> 00:26:26.700
distribution a little bit out to be more extreme, so that we now have a larger distance.
00:26:26.700 --> 00:26:40.000
And so this is a bigger effect size; so what happens when we increase the effect size and
00:26:40.000 --> 00:26:45.500
keep everything else constant: the cutoff, the null hypothesis, everything?
00:26:45.500 --> 00:26:52.200
Well, let us colour in this, and colour in this.
00:26:52.200 --> 00:26:56.200
Which of these two blue areas is larger?
00:26:56.200 --> 00:26:57.800
Obviously this one.
00:26:57.800 --> 00:27:07.200
This power is bigger than this power and it is because we have a larger effect size so another thing we have
00:27:07.200 --> 00:27:26.300
learned is that a larger effect size leads to larger power, so as you increase effect size you increase power, but here is the kicker.
00:27:26.300 --> 00:27:29.200
Can you increase effect size?
00:27:29.200 --> 00:27:31.600
Can you do anything about the effect size?
00:27:31.600 --> 00:27:35.100
Is there anything you could do?
00:27:35.100 --> 00:27:37.000
Not really.
00:27:37.000 --> 00:27:43.000
Effect size is something that is sort of out there in the data, but you cannot actually do anything to make it
00:27:43.000 --> 00:27:54.500
bigger; but you should know that if you happen to have a larger effect size, then you have more power than if your study has a small effect size.
00:27:54.500 --> 00:28:06.000
Okay so how does variability and sample size affect power?
00:28:06.000 --> 00:28:16.500
Now, the reason I put these two things together is that, remember, these distributions are SDOMs, sampling distributions of the mean, right?
00:28:16.500 --> 00:28:30.900
And so the variability in an SDOM is actually standard error, and standard error is S divided by the square root of N.
00:28:30.900 --> 00:28:39.700
So both variability and sample size will matter in power.
00:28:39.700 --> 00:28:43.500
And so here I want to show you how.
00:28:43.500 --> 00:28:53.200
Okay, so here I have drawn the same means for the SDOMs, and remember here we have
00:28:53.200 --> 00:28:58.900
the null hypothesis and the alternative hypothesis distribution.
00:28:58.900 --> 00:29:08.900
I have drawn the same pictures down here and I kept the same α about .05.
00:29:08.900 --> 00:29:20.000
So I had to move the cut off a little just so that I could color in .05 but something has changed and that is this.
00:29:20.000 --> 00:29:32.400
This is a lot skinnier than this is; that is less variability, so this SDOM has decreased in variability.
00:29:32.400 --> 00:29:46.000
So here standard error has decreased, so we have sharper SDOMs.
00:29:46.000 --> 00:29:50.200
Still normally distributed, just sharper.
00:29:50.200 --> 00:29:57.900
And so when we look at these skinnier distributions, let us look at the consequences for power.
00:29:57.900 --> 00:30:10.500
Here let us color in power, and let us color in power right here; it also helps to see what β is.
00:30:10.500 --> 00:30:15.000
So here we have quite a large β and here we have a tiny β.
00:30:15.000 --> 00:30:32.500
And so that makes you realize that the 1 - β up here, the power up here, is smaller than the 1 - β down here.
00:30:32.500 --> 00:30:49.400
This is smaller than the 1 - β down here because, remember, we are talking about proportions.
00:30:49.400 --> 00:30:55.500
This whole thing adds up to 1; now this might look smaller to you, but the whole thing adds up to one.
00:30:55.500 --> 00:31:01.300
This is a really small proportion, so let us put a number on it, something less than .05.
00:31:01.300 --> 00:31:03.500
Let us say .02.
00:31:03.500 --> 00:31:08.900
This looks bigger than .05, so let us say .08.
00:31:08.900 --> 00:31:19.600
Then 1 - β here would be 92% and 1 - β here would be 98% so this is a larger power than this.
00:31:19.600 --> 00:31:39.600
So one thing we have seen is that as our standard error decreases, power increases, so this is what we call a negative relationship.
00:31:39.600 --> 00:31:48.100
As one goes down the other goes up, and vice versa: as standard error increases, as these distributions
00:31:48.100 --> 00:31:53.600
become fatter and fatter, power will decrease.
00:31:53.600 --> 00:32:01.000
Now because we already know this about standard error we could actually say something about sample
00:32:01.000 --> 00:32:09.300
size, because sample size has a negative relationship with standard error:
00:32:09.300 --> 00:32:15.800
as sample size gets bigger and bigger, standard error gets smaller and smaller,
00:32:15.800 --> 00:32:27.000
and so sample size actually has a positive relationship with power; as sample size increases and
00:32:27.000 --> 00:32:33.800
therefore standard error decreases, power increases.
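Here is that reasoning as a small numerical sketch: hold the raw difference, the population spread, and α fixed (all hypothetical numbers) and let N grow.

```python
import math

def norm_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

diff, sigma, z_crit = 5.0, 20.0, 1.645   # hypothetical; one-tailed alpha ~ .05
for n in (10, 40, 80, 160):
    se = sigma / math.sqrt(n)            # standard error shrinks as n grows
    power = 1 - norm_cdf(z_crit - diff / se)
    print(n, round(power, 3))
```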
00:32:33.800 --> 00:32:44.800
And so we could figure that out just by reasoning through what standard error really means.
00:32:44.800 --> 00:32:56.200
Okay, so how do we increase power? Because oftentimes power, or sensitivity, is really a good thing.
00:32:56.200 --> 00:33:06.700
We want to be able to have experiments and studies that have a lot of power; that would be a good hypothesis testing adventure to embark on.
00:33:06.700 --> 00:33:08.900
How do we actually increase it?
00:33:08.900 --> 00:33:14.000
Well can we just do this by changing α?
00:33:14.000 --> 00:33:29.900
Well, the problem with this is that you get some consequences, namely that false alarms increase.
00:33:29.900 --> 00:33:40.500
So if you increase power with this strategy you are also going to increase false alarms; that is very dangerous, so that is not something we want to do.
00:33:40.500 --> 00:33:43.000
That is type 1 error so that is something we do not want to do.
00:33:43.000 --> 00:33:49.600
So you do not want to change power by changing α although that is something under our control.
00:33:49.600 --> 00:33:58.700
Now we could try to change effect size, but effect size is something that is already sort of true in
00:33:58.700 --> 00:34:05.500
the world; we would have to mess with the standard deviation of the
00:34:05.500 --> 00:34:12.100
population, we cannot mess with that so this is actually something that is impossible to do.
00:34:12.100 --> 00:34:26.800
So that is one thing that we wish we could do but cannot do anything about.
00:34:26.800 --> 00:34:32.500
Can we change the variability in our sample, can we change the variability?
00:34:32.500 --> 00:34:35.400
Indirectly, we can.
00:34:35.400 --> 00:34:41.100
There is really one way to be able to change standard error.
00:34:41.100 --> 00:34:45.300
Can we do this by changing the standard deviation of the population?
00:34:45.300 --> 00:34:50.000
No, we cannot do that, that is out of our control.
00:34:50.000 --> 00:34:52.900
But we can change N.
00:34:52.900 --> 00:35:01.900
We can collect more data instead of having 40 subjects or cases in our study, we can have 80.
00:35:01.900 --> 00:35:09.800
And in that way we can increase our power, and so really the one tool that is sort of
00:35:09.800 --> 00:35:15.800
available to us as researchers in order to affect power is really affecting sample size.
00:35:15.800 --> 00:35:19.700
None of these other things are really that appealing to us.
00:35:19.700 --> 00:35:33.100
We cannot change population variability, we cannot change effect size and if we change α then that is a dangerous option.
00:35:33.100 --> 00:35:39.000
And so what we have left here is affecting sample size.
00:35:39.000 --> 00:35:43.700
Now let us go on to some examples.
00:35:43.700 --> 00:35:49.400
A statistical test is designed with a significance level of .05 and a sample size of 100.
00:35:49.400 --> 00:35:59.700
A similar test of the same null hypothesis is designed with a significance level of .1 and a sample size of 100.
00:35:59.700 --> 00:36:05.400
If the null hypothesis is false which test has greater power?
00:36:05.400 --> 00:36:08.600
Okay so let us think about this.
00:36:08.600 --> 00:36:18.300
Here we have a situation with test 1, where α equals .05.
00:36:18.300 --> 00:36:30.400
In test 2, the other test, α = .10, so here α is larger.
00:36:30.400 --> 00:36:45.300
Remember α is moving that critical test statistic so we have taken this and let us have this α right
00:36:45.300 --> 00:36:57.600
here, and what we do is move it over here; well, not that far, but just so you can get the idea.
00:36:57.600 --> 00:37:11.500
And now our α is much bigger, but what we see is that our 1 - β has also gotten a lot bigger.
00:37:11.500 --> 00:37:23.800
So here we see that power increases but we should also note that now we have a higher tolerance for false
00:37:23.800 --> 00:37:30.700
alarms, so we will also have more false alarms; we will have more times when we reject the null, period, so we
00:37:30.700 --> 00:37:37.900
will reject the null lots of times; sometimes we will be right, sometimes we will be wrong; both of these things increase.
00:37:37.900 --> 00:37:41.600
Example 2.
00:37:41.600 --> 00:37:50.300
Suppose a medical researcher wants to test the claim of a pharmaceutical company that the mean number of side effects per patient for a new drug is 6.
00:37:50.300 --> 00:37:57.600
The researcher is pretty sure the true number of side effects is between 8 and 10, so it is like the
00:37:57.600 --> 00:38:00.700
pharmaceutical company is not telling the whole truth.
00:38:00.700 --> 00:38:08.400
The researcher draws a random sample of patients reporting side effects and chooses the 5% level of significance, so α equals .05.
00:38:08.400 --> 00:38:15.400
Is the power of the test larger if the true number of side effects is 8, or if it is 10?
00:38:15.400 --> 00:38:21.700
So let us think about what the question is really asking, and then explain.
00:38:21.700 --> 00:38:28.700
So "if the true number of side effects is 8 or 10" is really talking about your μ.
00:38:28.700 --> 00:38:45.500
And actually, here we are talking about the alternative μ because the null μ is probably going to be six.
00:38:45.500 --> 00:38:49.700
So here is the null hypothesis.
00:38:49.700 --> 00:38:54.400
The null hypothesis is that the pharmaceutical company is telling the truth.
00:38:54.400 --> 00:38:56.800
So the null hypothesis μ is six.
00:38:56.800 --> 00:39:10.100
Now, if the alternative μ is 8, it will be maybe about here, but if the real alternative population mean is actually
00:39:10.100 --> 00:39:19.200
10, so the other alternative, it is way out here.
00:39:19.200 --> 00:39:25.000
In which of these scenarios is the power larger?
00:39:25.000 --> 00:39:42.200
Well, even if we set a very conservative critical test statistic, here is our power if 8 is the true number of
00:39:42.200 --> 00:39:55.000
side effects, but here is the power, almost 100%, if 10 is the true number of side effects. And remember,
00:39:55.000 --> 00:40:02.200
I am just drawing these with some standard error; I do not care what it is, it just has to be the same across all of them.
00:40:02.200 --> 00:40:12.700
And so here we see that, wow, it is way out farther; more of this distribution is going to be covered when we reject the null.
00:40:12.700 --> 00:40:19.700
And so we see that the power is larger if the true number of side effects is 10.
00:40:19.700 --> 00:40:28.000
And the reason for that is because this is really a question about effect size.
00:40:28.000 --> 00:40:40.300
Effect size is the true distance between our null hypothesis distribution and our alternative hypothesis distribution.
00:40:40.300 --> 00:40:55.600
We know that as effect size goes up, power also goes up; a bigger effect is easier to detect. But we cannot do anything about it; we cannot actually make the effect size bigger.
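As a rough check on this reasoning, here is a small sketch (the problem does not give σ or n, so a hypothetical σ of 4 side effects and a sample of 25 patients are assumed; only the comparison between the two alternatives matters) that computes the power against each alternative μ:

```python
from statistics import NormalDist

def power_for_alternative(mu_null, mu_alt, sigma, n, alpha=0.05):
    # Power of a one-sided z test of H0: mu = mu_null against a
    # specific alternative mean mu_alt, with sigma assumed known.
    se = sigma / n ** 0.5
    crit = mu_null + NormalDist().inv_cdf(1 - alpha) * se  # critical sample mean
    # P(sample mean exceeds the critical value when the truth is mu_alt)
    return 1 - NormalDist(mu_alt, se).cdf(crit)

# Null: the company's claim of 6 side effects; sigma = 4, n = 25 are hypothetical.
print(power_for_alternative(6, 8, 4, 25))   # moderate power when mu_alt = 8
print(power_for_alternative(6, 10, 4, 25))  # near 1.0 when mu_alt = 10
```

Pushing the alternative mean from 8 to 10 widens the gap between the two distributions, so far more of the alternative distribution sits past the critical value.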
00:40:55.600 --> 00:40:57.700
Example 3.
00:40:57.700 --> 00:41:09.100
Why are both the Z and T statistics affected by n, while Cohen's d and Hedges' g are not? Then, what do Z,
00:41:09.100 --> 00:41:17.700
T, d, and g all have in common? And finally, what commonality do Z and d share?
00:41:17.700 --> 00:41:21.400
What commonality do T and g share?
00:41:21.400 --> 00:41:24.300
Well, I am going to draw this as sort of a Venn diagram.
00:41:24.300 --> 00:41:51.400
So let me draw Z here, and here I will draw T, and then here I will draw d, and, it is going to get crazy, here I will draw g.
00:41:51.400 --> 00:42:03.000
Now, if it helps, you might want to think about what these statistics mean. For Z, we have the distance
00:42:03.000 --> 00:42:18.400
divided by the standard error derived from the population standard deviation, σ/√n.
00:42:18.400 --> 00:42:27.000
For T, we have the standard error derived from the estimated population standard
00:42:27.000 --> 00:42:49.700
deviation, s/√n, whereas in Cohen's d we have the same distance just divided by σ, and in Hedges' g we have the same distance divided by s.
00:42:49.700 --> 00:42:59.000
Okay, so why are both the Z and T statistics affected by n while Cohen's d and Hedges' g are not?
00:42:59.000 --> 00:43:07.000
Well, the thing that these two have in common is that they are built on standard error, and standard error is
00:43:07.000 --> 00:43:21.700
either σ divided by the square root of n, or s divided by the square root of n, and it is this dividing by the square
00:43:21.700 --> 00:43:26.700
root of n that makes these two so affected by n.
00:43:26.700 --> 00:43:33.900
And so it is really because they are distances in terms of standard error.
00:43:33.900 --> 00:43:47.000
So what do Z, T, d, and g all have in common? That is the little region right here in the middle; what do they all have in common?
00:43:47.000 --> 00:43:49.400
Well they all have this thing in common.
00:43:49.400 --> 00:44:01.500
So they are all about the distance between the sample mean and the hypothesized population mean.
00:44:01.500 --> 00:44:03.900
So it is all about that distance.
00:44:03.900 --> 00:44:11.400
Some of them are in terms of standard error and some of them are in terms of population standard deviation.
00:44:11.400 --> 00:44:16.600
So what commonality do Z and d share?
00:44:16.600 --> 00:44:18.500
Well, that is going to be right in here.
00:44:18.500 --> 00:44:23.000
What do they have in common? They both rely on actually having σ, the population standard deviation.
00:44:23.000 --> 00:44:31.500
T and g both rely only on the sample estimate of the population standard deviation, s.
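To tie the four statistics together, here is a small sketch (the sample values, μ, and σ below are made up for illustration) that computes all four from one sample and shows why only Z and T move with n:

```python
from statistics import mean, stdev

def four_statistics(sample, mu, sigma):
    # Z, T, Cohen's d, and Hedges' g for one sample against a
    # hypothesized mean mu, with known population sd sigma.
    n = len(sample)
    dist = mean(sample) - mu     # the distance all four share
    s = stdev(sample)            # estimated population sd
    return {
        "Z": dist / (sigma / n ** 0.5),  # distance per population-based SE
        "T": dist / (s / n ** 0.5),      # distance per estimated SE
        "d": dist / sigma,               # Cohen's d: distance per sigma
        "g": dist / s,                   # Hedges' g: distance per s
    }

# Doubling the data (same mean, same spread) inflates Z and T by about
# sqrt(2), while d is untouched: it never divides by sqrt(n).
data = [4.8, 5.1, 5.6, 4.9, 5.4, 5.2]
once, twice = four_statistics(data, 5.0, 0.5), four_statistics(data * 2, 5.0, 0.5)
print(once["d"], twice["d"])  # identical effect size
print(once["Z"], twice["Z"])  # Z grows with n
```

The dictionary makes the Venn diagram explicit: all four share the numerator `dist`; Z and T divide it by a standard error (hence the √n), while d and g divide it by a plain standard deviation.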
00:44:31.500 --> 00:44:36.400
So it looks a little messy, but hopefully this makes a little more sense.
00:44:36.400 --> 00:44:41.000
Thanks for using educator.com for effect size and power.