For more information, please see full course syllabus of Statistics

Lecture Comments (2)

1 answer

Last reply by: Professor Son
Wed Nov 12, 2014 4:31 PM

Post by Brijesh Bolar on August 24, 2012

So how different is g and d from z.. I am getting a bit confused here..

Effect Size & Power


  • Intro 0:00
  • Roadmap 0:05
    • Roadmap
  • Distance between Distributions: Sample t 0:49
    • Distance between Distributions: Sample t
  • Problem with Distance in Terms of Standard Error 2:56
    • Problem with Distance in Terms of Standard Error
  • Test Statistic (t) vs. Effect Size (d or g) 4:38
    • Test Statistic (t) vs. Effect Size (d or g)
  • Rules of Effect Size 6:09
    • Rules of Effect Size
  • Why Do We Need Effect Size? 8:21
    • Tells You the Practical Significance
    • HT can be Deceiving…
    • Important Note
  • What is Power? 11:20
    • What is Power?
  • Why Do We Need Power? 14:19
    • Conditional Probability and Power
    • Power is:
  • Can We Calculate Power? 19:00
    • Can We Calculate Power?
  • How Does Alpha Affect Power? 20:36
    • How Does Alpha Affect Power?
  • How Does Effect Size Affect Power? 25:38
    • How Does Effect Size Affect Power?
  • How Does Variability and Sample Size Affect Power? 27:56
    • How Does Variability and Sample Size Affect Power?
  • How Do We Increase Power? 32:47
    • Increasing Power
  • Example 1: Effect Size & Power 35:40
  • Example 2: Effect Size & Power 37:38
  • Example 3: Effect Size & Power 40:55

Transcription: Effect Size & Power

Hi, welcome to

We are going to talk about effect size and power.0002

So effect size and power, 2 things you need to think about whenever you do hypothesis testing.0005

So first effect size.0011

We are going to talk about what effect size is by contrasting it to the T statistic.0013

They actually have a lot in common but there is just one subtle difference that makes a huge difference.0019

Then we are going to talk about the rules of effect size and why we need effect size.0026

Then we are going to talk about power.0032

What is it, why do we need it, and how do all these different things affect power, for instance sample size,0035

effect size, variability, and alpha, the significance level.0042

So first things first, just a review of what the sample T really means.0048

So a lot of times people just memorize the T formula: it is, you know, X bar minus mu over standard error. But think about what this actually means.0056

So T equals X bar minus mu over the standard error.0070

And I will write that as S sub X bar.0076

What this will end up giving you is this distance, so the distance between your sample and your hypothesized mu.0079

And when you divide by standard error you get how many standard errors you need in order to get from X bar to your mu.0087

So you get distance in terms of standard error.0097

So distance in terms of standard error.0102

And you want to think of it sort of like this: instead of using feet or inches or number of friends, we get distance in the unit of standard error.0111

So whatever your standard error is, for instance here that looks about right, because this is the normal0123

distribution, that should be about 68%, so that is the standard error.0132

Your T is how many of these you need in order to get from mu to X bar.0139

So this might be like a T of 3 1/2: 3 1/2 standard errors gets you from mu to your sample difference, and so this is the case of the two-sample t-test.0146

So independent samples or paired samples, where we know the mu is zero.0164

So this is sort of the concept behind the T statistic.0169

Now here is the problem with this T statistic.0175

It is actually pretty sensitive to N.0181

So let us say you have a difference that is going to stay the same, so a difference between, you know, let us say 10 and 0.0185

So we have that difference.0199

If you have a very very large N then your SDoM, the sampling distribution of the mean, becomes a lot skinnier.0202

And because of that your standard error is also going to shrink, so the standard error shrinks as N grows.0212

And because of that, even though we have not changed anything about this mean, about the X bar or mu,0226

by shrinking our standard error we made our T quite large.0237

So we made our T like all of a sudden we are 6 standard errors away, but I really have not changed the picture.0244

So that is actually a problem that T is so highly affected by N.0252

The problem with that is that you could artificially make a difference between means, look statistically significant by having a very large N.0258
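To make that sensitivity concrete, here is a minimal sketch. The numbers (a difference of 10 and a sample standard deviation of 20) and the function name `t_statistic` are made up for illustration; only N changes from row to row.

```python
import math

def t_statistic(x_bar, mu, s, n):
    """Distance between the sample mean and mu, in standard-error units."""
    standard_error = s / math.sqrt(n)  # standard error shrinks as n grows
    return (x_bar - mu) / standard_error

# Hypothetical values: the raw difference (10 - 0) and s = 20 never change.
for n in (16, 64, 256):
    print(n, t_statistic(x_bar=10, mu=0, s=20, n=n))
```

The raw distance is identical in every row, yet T doubles each time N is quadrupled.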

So we need something that tells us this distance that is less affected by N, and that is where effect size comes in.0268

So in effect size what we are doing is we want to know the distance in terms of something that is not so affected by N.0278

And in fact we are going to use the population standard deviation because let us think about T.0288

So that is X bar minus mu over standard error.0295

So let us contrast that to looking at the distance in terms of the standard deviation of the population. What would that look like?0303

Well, we could actually derive the formula ourselves.0307

We want that distance in terms of, you know, number of inches or number of problems correct or whatever the0321

raw score is, over, instead of the standard error, we would just use S or, if you had it, you would use Sigma.0328

So you could think of this as the estimated Sigma and this is like the real deal Sigma.0341

And that is what effect size is, and effect size is often symbolized by the letters D and G.0349

D is reserved for when you have Sigma; G is used for when you use S.0360

Now let us talk about the rules of effect size.0367

The nice thing about effect size is that the N does not matter as much whether you have a small sample or large sample the effect size stays similar.0373

In test statistics such as T or Z, the N matters quite a bit, and let us think again about why.0384

So the T statistic I have been writing so far as over standard error, but let us think about what standard error is.0396

Standard error is S divided by the square root of N, now as N gets bigger and bigger so let us think about N getting bigger.0406

This whole thing in the denominator, this whole thing becomes smaller and smaller.0417

And when you divide a positive or negative, if you divide some distance by a small number then0430

you end up getting a more extreme value, more extreme.0441

So by more extreme I mean way more positive, more on the positive side, or way more negative.0448

So the T statistic is very very sensitive to N so is the Z because Z the only difference is instead of S we use Sigma.0463

And so the same logic applies, but for effect size D and G we do not divide by square root of N, so in that way N does not really have as much influence.0474

Okay so one thing to remember is if you know Sigma use Cohen's d; if you need to estimate the standard deviation from the sample, S, you want to use Hedges' g.0488
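As a rough sketch of that contrast, the snippet below computes effect size the way the lecture defines it (distance divided by a standard deviation, with no square root of N) and then recovers T from it. The numbers and function names are hypothetical. Note that some textbooks add a small-sample correction to Hedges' g, which is not shown here.

```python
import math

def effect_size(x_bar, mu, sd):
    """Distance between the sample mean and mu, in standard-deviation units.

    Pass sigma for Cohen's d, or the sample s for Hedges' g as defined
    in the lecture (no small-sample correction).
    """
    return (x_bar - mu) / sd

def t_from_effect_size(g, n):
    """T is the same distance in standard errors, so t = g * sqrt(n)."""
    return g * math.sqrt(n)

# Hypothetical values: difference of 10, s = 20.
g = effect_size(x_bar=10, mu=0, sd=20)
for n in (16, 64, 256):
    print(n, g, t_from_effect_size(g, n))
```

The effect size stays at the same value for every N, while T keeps growing with the square root of N.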

Okay so now you know what effect size is and it is nice that it is not as affected by N but why do we need it?0500

Well, effect size is what statisticians use to interpret practical significance. So for instance we might have some sort of very small difference between group 1 and group 2, so with the males and females0510

on some game or task, there is a very tiny difference like you know let us just say males are ahead by .0001 points.0527

And practically, it sort of does not matter, but if you have a large0540

enough N you could get a small enough standard error that you can make that tiny difference seem like a0549

big deal, and you can imagine that would be sort of an odd situation.0556

We take a difference that sort of does not matter but then make a big deal out of it because of some fancy statistics we did.0565

Well, effect size is not going to be affected by N like that, and so that is going to give you a more straightforward0573

measure of: is this difference big enough for us to care about?0580

It is not going to tell you whether it was significant or not based on hypothesis testing but it can give you0584

the idea of practical significance, and here we are using the modern term for significant, as in important.0592

It will tell you about practical importance, not statistical outlier-ness. That is what it is telling you: it is talking0601

about just regular old practical importance, and the way you can think about this is just thinking about it as: is this difference worth noticing?0614

Is that worth even doing statistics on?0623

The thing about hypothesis testing is that it could be deceiving, a very large sample size can lead to a0625

statistically significant result, one of these outlier differences, that we really do not care about, that just has no practical significance.0632

So here although we have been trying to talk about this again and again trying to sort of clarify that0641

statistically significant does not mean important; it just means it lies outside of our expectation.0648

It is important to realize once again that statistical significance does not equal practical significance.0656

This is sort of talking about how important something is and this is just sort of saying, does it stand out?0663

Does our X bar our sample actually stand out?0672

Okay now let us move on to power.0679

What is power?0684

Well, power really needs us to go back to our understanding of the two types of errors.0685

Remember in hypothesis testing we can make an error in two different ways.0691

One is the false alarm error and we set that false alarm error rate by Alpha and the other kind of error is0695

this incorrect decision that we can make called the miss.0704

A miss is when we fail to reject the null hypothesis but we should, we really should.0708

And that is signified by beta, by the term beta.0717

Now when the null hypothesis is true, we have already set our probability of making an0725

incorrect decision, so just by subtraction we can figure out our probability of making a correct decision. So if our probability is .05 of making an incorrect decision, the other possibility is that we make the correct decision 95% of the time, 1-.05.0739

In the same way when the null hypothesis is actually false we could figure out our probability of actually0756

making a correct decision by just subtracting our probability of making incorrect decision from one.0764

So this would be one minus beta.0772

In that way these two decisions that we can make add up to a probability of one, and these 2 decisions that we can make add up to a probability of one.0775

But in reality only one of these worlds is true; that is why each one adds up to a probability of 1.0787

We just have no idea whether this one is true or this one is true, and one can never really say, but that is the philosophical question.0794

So given this picture, power resides here and this quadrant is what we think as power.0802

Now power is just the idea: given that the null hypothesis is actually false, that this is the world we live in, pretend we0811

ignore this part, right, so just ignore this entire world. Given that this null hypothesis is false, what0824

is our probability of actually rejecting the null hypothesis and that is what we call power.0835

So think of this as the probability of rejecting null when the null is false.0843

So why do we need power, why do we need 1 – beta?0855

Well, here it is going to come back, those concepts come right back.0864

Remember the idea that you know sometimes we wanted to detect some sort of disease right and we0873

might give a test like for instance we want to know whether someone has HIV and so we give them a blood test to figure out, do they have HIV.0879

Now these tests are not perfect, and so there is some chance that they will be able to detect the disease and some chance that they will make a mistake.0888

There are two ways of thinking about this prediction.0897

One is what we call positive predictive value: we could think about what is the probability that someone has the disease, for instance HIV, given that they test positive?0903

Well this will help us know what is the chance that they actually have the disease once we know their test score.0916

In this world we know their test scores and we want to know what is the probability that they have the disease.0926

On the other hand we have what is called sensitivity.0932

Sensitivity thinks about the world in a slightly different way.0936

Given that this person has the disease, whatever disease, such as HIV, what is the probability that they will actually test positive?0940

And these two actually give us very different worlds.0950

In one world the given is that they have a positive test, and what is the probability that they have the disease versus no disease?0954

In this scenario the given is very different.0967

The given is that they actually have the disease.0973

Given that what is the probability that they will test positive versus negative?0976

And so they are looking at this or they are looking at this.0983
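To see how different those two conditional probabilities can be, here is a small sketch using Bayes' rule. The prevalence, sensitivity, and specificity values are invented for illustration and are not real figures for any HIV test.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test), computed from P(positive | disease)."""
    true_positives = prevalence * sensitivity               # sick and test positive
    false_positives = (1 - prevalence) * (1 - specificity)  # healthy but test positive
    return true_positives / (true_positives + false_positives)

# Hypothetical test: 1% of people have the disease, the test catches 99%
# of true cases (sensitivity) and clears 95% of healthy people (specificity).
sensitivity = 0.99
ppv = positive_predictive_value(prevalence=0.01, sensitivity=sensitivity, specificity=0.95)
print("sensitivity:", sensitivity)
print("positive predictive value:", round(ppv, 3))
```

With these made-up numbers the sensitivity is 99% while the positive predictive value is only about 17%, because the given is different in each case.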

Now power is basically the probability of getting a hit, the probability of rejecting that null hypothesis given that the null hypothesis is actually false, so it is actually wrong.0988

Is this more like PPV, positive predictive value?1004

Or is it more like sensitivity.1010

Well let us think about this.1012

In this world there is this reality that the given reality is that this is false.1014

We need to reject it.1022

What is the probability that it will actually be rejected?1027

So reject or fail to reject.1032

Well, one way of thinking about this comparison is to consider: what is this thing that we1040

do not know in these two scenarios?1052

We do not really know if they actually have HIV.1055

We know their test; we know that their test is either positive or negative, and the test is uncertain, but1059

whether they actually have HIV or not, that does not have uncertainty, it is just that we do not know what it is.1065

This is sort of like HIV in that way.1074

This is the reality so HIV is the reality and this, this is the test results.1078

This is also the reality and these are the results of hypothesis testing.1088

And so in that way this picture is much more like sensitivity.1101

And really when we apply the word sensitivity we see a whole new way of looking at power.1107

Power is the idea of how sensitive your hypothesis test is: when there really is something to detect, can it detect it?1116

When there really is HIV, can your test detect it?1125

When the null hypothesis really is false, can your test detect it?1129

That is the question that power is asking.1136

Okay, can you calculate power? Is there a nice little formula for it?1139

Well power is more like the tables in the back of your book.1145

You cannot calculate it with one simple straightforward formula.1148

There is actually a more complex formula that does involve calculus, but we can simulate power for a whole1153

bunch of different scenarios, and those scenarios all depend on alpha, effect size, and also variability and1161

sample size, and because of that power is often found through simulation.1169
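A minimal sketch of what such a simulation might look like is below. It assumes a one-sided z test with a known population standard deviation and normally distributed data, which is a simplification chosen for illustration; the function name, the default number of trials, and all the numbers are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def simulate_power(mu0, mu1, sigma, n, alpha, trials=5000, seed=0):
    """Estimate power by drawing many samples from the alternative world.

    mu0 is the null mean, mu1 is the (assumed) true mean. Each trial runs
    a one-sided z test and we count how often the null gets rejected.
    """
    rng = random.Random(seed)
    z_cutoff = NormalDist().inv_cdf(1 - alpha)  # critical value under the null
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu1, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / standard_error
        if z > z_cutoff:
            rejections += 1
    return rejections / trials

# Hypothetical scenario: true mean is half a standard deviation above the null.
print(simulate_power(mu0=0, mu1=0.5, sigma=1, n=25, alpha=0.05))
```

The proportion of rejections is the estimated power for that one scenario. Changing alpha, the distance between the means, sigma, or n and rerunning gives the estimate for a different scenario.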

So I am not going to focus on calculating power, instead I am going to try to give you a conceptual understanding of power.1174

Now there is often a desired level of power, and sometimes you may be working with computer programs that might calculate power for you.1187

A decent level of power that you want to shoot for is .8 or above, but I want you to know how power interacts with all these things.1195

All of these things actually go into the calculation of power, but I want you to know it at the conceptual level.1206

So how does Alpha, or the significance level, affect power? How does effect size, D or G, affect power? How1212

does variability, S or, you know, S squared, affect power? And how does sample size affect power?1224

Okay so first thing is how does Alpha affect power?1234

Well here in this picture, I have shown you 2 distributions.1241

You could think of this one as the null distribution and this one as the alternative distribution.1247

And notice that both of these distributions up here are exactly the same down here; I just copied and pasted.1254

The only thing that is different is not their means or the actual distributions.1263

The only thing that is different is the cut off.1276

See, here the cutoff score is right here and this is the alpha, and here the cutoff score has been moved sort of closer towards the population mean.1279

And now we have a huge Alpha.1297

So let us just assign some numbers here.1301

I am just guessing that maybe that looks like alpha equals .05, something we are more used to seeing, but this looks like maybe Alpha equals, let us say, .15.1304

What happens when we increase our Alpha?1317

Our Alpha has gotten bigger, what happened, what happens to power?1323

Well it might be helpful to think about what Power might be?1326

In this picture, remember, power is the probability of rejecting the null hypothesis when the null hypothesis1331

is actually false, and here we reject when it is more extreme than the cutoff value, when your X bar is1344

more extreme, so these are the rejections, everything on this side.1358

All of this stuff is reject, reject the null.1362

And we want to look at the distribution where the null hypothesis is false a.k.a. the alternative hypothesis.1369

So really we are looking at this big section right here, so this big section, that is power, one minus Beta.1380

Given that, you could also figure out what beta is.1396

And beta is our error rate for misses.1400

When we fail to reject, fail to reject but the alternative hypothesis is true or the other way we could say it is the null hypothesis is false.1406

So what happens to power when Alpha becomes bigger?1421

Well, let us colour in power right here and it seems like there is more of this distribution that is been colored in than this.1430

So this part has been sort of added on, it used to be just this equals power but now we also have added on the section.1440

So as Alpha increases, power also increases.1453

And hopefully you can see that from this picture.1464

Now imagine moving Alpha out this way so decreasing Alpha.1467

If we decreased alpha then this power portion of the distribution will become smaller, so1472

the counterpoint to this is also true: as Alpha decreases, the power also decreases.1483

But you might be asking yourself, then why can we not just increase alpha so we could increase power, right?1496

Well, remember what Alpha is, alpha is your false alarm rate.1511

So when you increase Alpha, you also increase your false alarm rate.1516

So at the same time that you are increasing power, you are increasing your false alarm rate.1521

And so this often is not a good way to increase power.1526

But you should still know what the relationship is.1533
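The picture can be turned into numbers with a short sketch. It assumes a one-sided z test and assumes the alternative distribution sits 2 standard errors above the null; both are made-up choices for illustration, as is the function name.

```python
from statistics import NormalDist

def z_test_power(shift_in_se, alpha):
    """Power of a one-sided (upper-tail) z test.

    shift_in_se is the distance between the null and alternative means,
    measured in standard errors (a hypothetical input).
    """
    cutoff = NormalDist().inv_cdf(1 - alpha)        # where the rejection region starts
    beta = NormalDist().cdf(cutoff - shift_in_se)   # miss rate under the alternative
    return 1 - beta

# Same two distributions, only the cutoff moves.
for alpha in (0.05, 0.15):
    print("alpha:", alpha, "power:", round(z_test_power(2.0, alpha), 3))
```

Moving from alpha of .05 to .15 raises power, but alpha itself is the false alarm rate, so the false alarms rise right along with it.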

How about effect size, how does effect size affect power?1538

Well remember, effect size is really sort of a, you can think of it roughly as this distance between the X bar and the mu.1544

We are really looking at that distance in terms of standard deviation of the population.1555

How does effect size affect power?1561

Here I have drawn the same pictures, same cutoff, except I have moved this alternative1564

distribution a little bit out to be more extreme, so that we now have a larger distance.1574

And so this is a bigger effect size. So what happens when we increase the effect size and1587

we keep everything else constant: the cutoff, the null hypothesis, everything?1600

Well, let us colour in this, and colour in this.1605

Which of these two blue areas is larger.1612

Obviously this one.1616

This power is bigger than this power and it is because we have a larger effect size so another thing we have1618

learned is that a larger effect size leads to larger power, so as you increase effect size you could increase power, but here is the kicker.1627

Can you increase effect size?1646

Can you do anything about the effect size?1649

Is there anything you could do?1651

Not really.1655

Effect size is something that is sort of out there in the data, but you cannot actually do anything to make it1657

bigger. But you should know that if you happen to have a larger effect size then you have more power than if your study has a small effect size.1663
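For one special case the relationship can be written down directly. The block below is a sketch for a one-sided z test with known Sigma, which is an assumption made here for illustration and not the general case the lecture has in mind. The symbols $\Phi$ (the standard normal cumulative distribution) and $z_{1-\alpha}$ (the cutoff) are introduced only for this block.

```latex
% One-sided (upper-tail) z test of H_0: \mu = \mu_0 against \mu = \mu_1 > \mu_0,
% with known \sigma, sample size n, and effect size d = (\mu_1 - \mu_0)/\sigma.
\[
\text{power} = 1 - \beta
= P\!\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > z_{1-\alpha} \;\middle|\; \mu = \mu_1\right)
= 1 - \Phi\!\left(z_{1-\alpha} - d\sqrt{n}\right)
\]
```

With alpha and N held fixed, a larger d makes the argument of $\Phi$ smaller, so beta shrinks and power grows.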

Okay so how does variability and sample size affect power?1674

Now the reason I put these two things together is that, remember, these distributions are SDoMs, sampling distributions of the mean, right?1686

And so the variability in an SDoM is actually standard error, and standard error is S divided by the square root of N.1696

So both variability and sample size will matter in power.1711

And so here I want to show you how.1720

Okay so here I have drawn the same means for the SDoMs, and remember here we have1723

the null hypothesis and the alternative hypothesis distribution.1733

I have drawn the same pictures down here and I kept the same Alpha about .05.1739

So I had to move the cut off a little just so that I could color in .05 but something has changed and that is this.1749

This is a lot skinnier than this is; that is less variability, so that SDoM has decreased in variability.1760

So here standard error has decreased, so we have sharper SDoMs.1772

Still normally distributed, just sharper.1786

And so when we look at these skinnier distributions let us look at the consequences for power.1790

Here let us color in power and let us color in power right here and it also helps to see what beta is.1798

So here we have a quite a large beta and here we have a tiny beta.1810

And so that makes you realize that the one minus Beta up here, the power here, is smaller than the one minus Beta down here.1815

This is smaller than the 1 – beta down here because, remember, we are talking about proportions.1832

This whole thing adds up to 1; now this might look smaller to you, but the whole thing adds up to one.1849

This is a really small proportion, so let us put a number on it; that looks less than .05.1855

Let us call it .02.1861

This looks bigger than .05, so let us call it .08.1863

Then 1 - Beta here would be 92% and 1 - Beta here would be 98% so this is a larger power than this.1869

So one thing we have seen is that as our standard error decreases, power increases, so this is what we call a negative relationship.1879

As one goes down the other goes up, and vice versa: as standard error increases, as these distributions1899

become fatter and fatter, then power will decrease, the opposite way.1908

Now because we already know this about standard error we could actually say something about sample1913

size, because sample size also has a negative relationship with standard error:1921

as sample size gets bigger and bigger and bigger, standard error gets smaller and smaller and smaller,1929

and so sample size actually has a positive relationship with power, so as sample size increases and1936

therefore standard error decreases, power increases.1947

And so we could figure that out just by reasoning through what standard error really means.1954
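Here is a small sketch of that chain of reasoning, again assuming a one-sided z test for simplicity. The mean difference of 3 and the standard deviation and sample size values are invented for illustration.

```python
import math
from statistics import NormalDist

def power_from_design(mean_shift, sd, n, alpha=0.05):
    """Power of a one-sided z test, going through the standard error."""
    standard_error = sd / math.sqrt(n)  # skinnier distributions when sd falls or n rises
    cutoff = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(cutoff - mean_shift / standard_error)

# Hypothetical designs: same raw difference between null and alternative means.
for sd, n in [(10, 25), (10, 100), (5, 25)]:
    se = sd / math.sqrt(n)
    print("sd:", sd, "n:", n, "standard error:", se,
          "power:", round(power_from_design(3, sd, n), 3))
```

Quadrupling N and halving the standard deviation both cut the standard error in half, so in this sketch they raise power by the same amount.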

Okay so how do we increase power? Because oftentimes power, or sensitivity, is really a good thing.1965

We want to be able to have experiments and studies that have a lot of power that would be a good hypothesis testing adventure to embark on.1976

How do we actually increase it.1987

Well can we just do this by changing Alpha?1989

Well the problem with this is that you get some consequences, namely that false alarms increase.1994

So if you increase power with this strategy you are also going to increase false alarms; that is very dangerous, so that is not something we want to do.2010

That is type 1 error so that is something we do not want to do.2020

So you do not want to change power by changing Alpha although that is something under our control.2023

Now we could try to change effect size, but effect size is something that is already sort of true in2029

the world, right? What we would have to do is mess with the standard deviation of the2039

population, and we cannot mess with that, so this is actually something that is impossible to do.2045

So that is one thing that we wish we could do but cannot do anything about.2052

Can we change the variability in our sample, can we change the variability?2067

Indirectly, we can.2072

There is really one way to be able to change standard error.2075

Can we do this by changing the standard deviation of the population?2081

No, we cannot do that, that is out of our control.2085

But we can change N.2090

We can collect more data: instead of having 40 subjects or cases in our study, we can have 80.2093

And in that way we can increase our power, and so really the one tool that is sort of2102

available to us as researchers in order to affect power is really affecting sample size.2110

None of these other things are really that appealing to us.2116

We cannot change population variability, we cannot change effect size and if we change Alpha then that is a dangerous option.2120

And so what we have left here is affecting sample size.2133
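Since sample size is the lever we actually control, a common planning question is how many cases are needed to reach a target such as .8. The sketch below answers that for a one-sided z test with known Sigma, which is a simplifying assumption; the function name and the effect sizes tried are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(effect_size, alpha=0.05, target_power=0.8):
    """Smallest n giving at least target_power in a one-sided z test.

    Solves 1 - Phi(z_alpha - effect_size * sqrt(n)) >= target_power for n.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_power = NormalDist().inv_cdf(target_power)
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)

# Hypothetical effect sizes, in standard-deviation units.
for d in (0.2, 0.5, 0.8):
    print("effect size:", d, "required n:", required_sample_size(d))
```

Smaller effects need many more cases to reach the same power, which is why a study planned around a small effect needs a large N.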

Now let us go on to some examples.2139

A statistical test is designed with a significance level of .05 and a sample size of 100.2144

A similar test of the same null hypothesis is designed with a significance level of .1 and a sample size of 100.2149

If the null hypothesis is false which test has greater power?2160

Okay so let us think about this.2165

Here we have a situation, test one, where Alpha equals .05.2168

Test 2, the other test, alpha = .10, so here Alpha is larger.2178

Remember alpha is moving that critical test statistic so we have taken this and let us have this Alpha right2190

here and what we do is we moved it over, moved it over here, well not that far but just so you can get the idea.2205

And now our Alpha is much bigger, but what we see is that our 1 - Beta has also gotten a lot bigger.2217

So here we see that power increases but we should also note that now we have a higher tolerance for false2231

alarms, so we will also have more false alarms; we will have more times when we reject the null, period, so we2244

will reject the null lots of times; sometimes we will be right, sometimes we will be wrong; both of these things increase.2251

Example 2.2258

Suppose a medical researcher wants to test the claim of a pharmaceutical company that the mean number of side effects per patient for a new drug is 6.2261

The researcher is pretty sure the true number of side effects is between 8 and 10, so they are like, the2270

pharmaceutical company is not telling the whole truth.2277

The researcher chooses a random sample of patients reporting side effects and chooses the 5% level of significance, so Alpha equals .05.2281

Is the power of the test larger if the true number of side effects is 8 or 10?2288

So let us sort of think about okay what is the question really asking and then explain.2295

So whether the true number of side effects is 8 or 10 is really talking about your mu.2302

And actually, here we are talking about the alternative mu because the null mu is probably going to be six.2309

So here is the null hypothesis.2325

The null hypothesis is that the pharmaceutical company is telling the truth.2330

So the null hypothesis mu is six.2334

Now, if the alternative mu is 8, it will be, maybe about here but if the real alternative population is actually2337

a 10, so the other alternative, a 10, it is way out here.2350

And in which of these scenarios is the power larger?2359

Well, even if we set a very conservative critical test statistic, here is our power for 8 as the true number of2365

side effects, but here is the power, almost 100%, for 10 being the true number of side effects, and remember2382

I am just drawing these with just some standard error; I do not care what it is, it just has to be the same across all of them.2395

And so here we see that wow, it is way out farther, more of this is going to be covered when we reject the null.2402

And so we see that the power is larger if the true number of side effects is 10.2413

And the reason for that is because this is really a question about effect size.2420

The true distance between our null hypothesis distribution and our alternative hypothesis distribution.2428

We know that as effect size goes up power also goes up, since it is easier to detect, but we cannot actually make effect size bigger.2440
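The drawing can be checked with numbers. The problem gives the null mean of 6 and alpha of .05, but not the standard deviation or the sample size, so the values `sd=4` and `n=16` below are assumptions picked only so that the standard error is the same in both scenarios. A one-sided z test is also assumed.

```python
import math
from statistics import NormalDist

def power_for_true_mean(true_mean, null_mean=6, sd=4, n=16, alpha=0.05):
    """Power of a one-sided z test when the true mean is true_mean.

    sd and n are hypothetical; the example only requires that they stay
    the same across the scenarios being compared.
    """
    standard_error = sd / math.sqrt(n)
    cutoff = NormalDist().inv_cdf(1 - alpha)
    shift = (true_mean - null_mean) / standard_error  # distance in standard errors
    return 1 - NormalDist().cdf(cutoff - shift)

for true_mean in (8, 10):
    print("true mean:", true_mean, "power:", round(power_for_true_mean(true_mean), 3))
```

Whatever standard error is assumed, the alternative centered at 10 sits farther from the null than the one centered at 8, so more of it falls past the cutoff.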

Example 3.2455

Why are both the Z and T statistic affected by N while Cohen's d and Hedges' g are not? Then what do the Z,2458

T, D and G all have in common? And finally, what commonality do Z and D share?2469

What commonality do T and G share?2478

Well, I am going to draw this as sort of a Venn diagram.2481

So let me draw Z here, and here I will draw T, and then here I will draw D, and it is going to get crazy, here I will draw G.2484

Now, if it helps, you might want to think about what these guys mean: in Z we have the distance2511

over the standard error derived from the population standard deviation.2523

And here, in T, we have the standard error derived from the estimated population standard2538

deviation, whereas in D we have the distance, same distance, just divided by Sigma, and in G we have the same distance divided by S.2547

Okay so why are both the Z and T statistic affected by N while Cohen's d and Hedges' g are not?2570

Well, the thing that these two have in common is that these are about standard error and standard error is2579

either Sigma divided by square root of N or S divided by square root of N and it is this dividing by square2587

root of N that makes these two so affected by N.2602

And so it is really because they are distances in terms of standard error.2607

So what do the Z, T, D and G all have in common? That is the little guy right here; what do they all have in common?2614

Well they all have this thing in common.2627

So they are all about the distance between sample and population.2629

So it is all about that distance.2641

Some of them are in terms of standard error and some of them are in terms of population standard deviation.2644

So what commonality do Z and D share?2651

Well, that is going to be right in here.2656

What do they have in common, they both rely on actually having Sigma.2658

T and G both rely only on the sample estimate of the population standard deviation.2663
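The whole diagram can be summarized as a two-by-two layout. This is a sketch using the definitions given in the lecture, with $\bar{X}$ for the sample mean and $n$ for the sample size; Hedges' g appears without the small-sample correction that some textbooks add.

```latex
\[
\begin{array}{l|cc}
 & \text{uses } \sigma & \text{uses } s \\ \hline
\text{distance in standard errors (affected by } n\text{)}
  & z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}
  & t = \frac{\bar{X} - \mu}{s/\sqrt{n}} \\[2ex]
\text{distance in standard deviations}
  & d = \frac{\bar{X} - \mu}{\sigma}
  & g = \frac{\bar{X} - \mu}{s}
\end{array}
\]
```

Every cell has the same numerator, the distance between sample and population. Each row shares its unit of distance, and each column shares its source for the standard deviation.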

So looks a little messy but hopefully this makes a little more sense.2671

Thanks for using for effect size and power.2676