For more information, please see full course syllabus of Statistics

### Confidence Intervals for the Difference of Two Independent Means

- Intro 0:00
- Roadmap 0:14
- One Mean vs. Two Means 1:17
- Notation 2:41
  - A Sample! A Set!
  - Mean of X, Mean of Y, and Difference of Two Means
  - SE of X
  - SE of Y
- Sampling Distribution of the Difference between Two Means (SDoD) 7:48
- Rules of the SDoD (similar to CLT!) 15:00
  - Mean for the SDoD Null Hypothesis
  - Standard Error
- When can We Construct a CI for the Difference between Two Means? 21:28
  - Three Conditions
- Finding CI 23:56
  - One Mean CI
  - Two Means CI
- Finding t 29:16
- Interpreting CI 30:25
- Better Estimate of s (s pool) 34:15
- Example 1: Confidence Intervals 42:32
- Example 2: SE of the Difference 52:36

### Transcription: Confidence Intervals for the Difference of Two Independent Means

*Hi and welcome to www.educator.com.*0000

*Today we are going to talk about confidence intervals for the difference of two independent means.*0002

*It is pretty important that these are independent means because later we are going to go to non-independent, or paired, means.*0007

*We have been talking about how to find confidence intervals and hypothesis testing for one mean.*0013

*We are going to talk about what that means for how we go about doing that for two means.*0023

*We are going to talk about what "two means" means.*0029

*We are going to talk a little bit about new notation and we are going to talk about the sampling distribution of the difference between two means.*0032

*I am going to shorten this to SDOD; this is just me, it is not official or anything,*0041

*because it is long to say sampling distribution of the difference between two means, but that is what I mean.*0048

*We will talk about the rules of the SDOD and those are going to be very similar to the CLT (the central limit theorem) with just a few differences.*0055

*Finally, we will set it all up so that we can find and interpret the confidence interval.*0066

*One mean versus two means.*0075

*So far we have only looked at how to compare one mean against some population, but that is not usually how scientific studies go.*0081

*Most scientific studies involve comparisons.*0091

*Comparisons either between different kinds of water samples or language acquisition for babies versus babies who did not.*0093

*Scores from the control group versus the experimental group.*0102

*In science we are often comparing two different sets, two different samples.*0106

*Two means really means two samples.*0112

*Here in the one mean scenarios we have one sample and we compare that to an idea in hypothesis testing*0120

*or we use that one sample in order to derive the potential population means.*0132

*But now we are going to be using two different means.*0140

*What do we do with those two means?*0143

*Do we just do the one sample thing two times or is there a different way?*0145

*Actually, there is a different and more efficient way to go about this.*0152

*Two means is a different story.*0155

*It is a related but different story.*0159

*In order to talk about two means and two samples, we have to talk about some new notation.*0162

*This is totally arbitrary that we use x and y.*0170

*You could use j and k or m and n, whatever you want.*0176

*X and y are the generic variables that we use.*0182

*Feel free to use your favorite letters.*0189

*One sample will just be called x and all of its members in the sample will be x sub 1, x sub 2, x sub 3.*0191

*When we say x sub i, we are talking about all of these little guys.*0203

*The other sample we do not just call it x as well because we will get confused.*0208

*We cannot call it x2 because x sub 2 has a meaning.*0216

*What we call it is y.*0221

*Y sub i now means all of these guys.*0224

*We could keep them separate.*0229

*In fact this x and y is going to follow us from here on out.*0232

*For instance when we talk about the mean of x we call it the x bar.*0236

*What would be the mean of y?*0241

*Maybe y bar right.*0243

*That makes sense.*0246

*And if you call this b, this will be b bar.*0247

*It just follows you.*0253

*When we are talking about the difference between two means we are always talking about this difference.*0256

*That is going to be x bar - y bar.*0264

*Now you could also do y bar - x bar, it does not matter.*0267

*That is definitely what we mean by the difference between two means.*0271

*We could talk about the standard error of a whole bunch of x bars and of a whole bunch of y bars: the standard error of x and the standard error of y.*0274

*You could also talk about the variance of x and the variance of y.*0285

*You can have all kinds of things; they just need something to denote that they are a little different.*0292

*For the standard error of x, one way you could write it is sigma sub x bar, to show that we are not just talking about any standard deviation.*0298

*When we say standard error, you need to keep in mind if we double-click on it that means the standard deviation of a whole bunch of means.*0312

*Standard deviation of a whole bunch of x bars.*0322

*Sometimes we do not have sigma so we cannot get this value.*0328

*We might have to estimate sigma from s and that would be s sub x bar.*0334

*If we wanted to know how to get this, we would start with s sub x.*0345

*Notice that s sub x is different from s sub x bar: s sub x bar is the standard error, and s sub x is the actual standard deviation of your sample, which we divide by √n.*0353

*Not just any n, but the n of your sample x.*0367

*In this way we could perfectly denote that we are talking about the standard error of the x, the standard deviation of the x, and the n(x).*0372

*You could do the same thing with y.*0387

*The standard error of y, if you had sigma, you can just call it sigma sub y bar because it is the standard deviation of a whole bunch of y bars.*0390

*Or if you do not have sigma you could estimate sigma and use s sub y bar.*0402

*Instead of just getting the standard deviation of x we would get the standard deviation of y and divide that by √n Sub y.*0411

*It makes everything a little more complicated because now I have to write sub x and sub y after everything.*0423

*But it is not hard because the formula if you look remains exactly the same.*0430

*The only thing that is different now is that we just add a little pointer to say we are talking*0438

*about the standard deviation of our x sample or standard deviation of our y sample.*0446

*Even though this looks a little more complicated, deep down at the heart of the structure it is still standard error equals standard deviation of the sample ÷ √n.*0452
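
Written out, the notation described so far might look like the following. This is only a typeset summary of the spoken definitions; the capital-letter subscripts are a presentation choice, not something fixed by the lecture.

```latex
% Standard error when the population standard deviation is known
\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n_X}}, \qquad
\sigma_{\bar{Y}} = \frac{\sigma_Y}{\sqrt{n_Y}}

% Standard error estimated from the sample standard deviation
s_{\bar{X}} = \frac{s_X}{\sqrt{n_X}}, \qquad
s_{\bar{Y}} = \frac{s_Y}{\sqrt{n_Y}}
```

The subscript with the bar marks a standard error (the spread of sample means), while the subscript without the bar marks the standard deviation of the sample itself.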

*Let us talk about what this means, the sampling distribution of the difference between two means.*0466

*Let us first start with the population level.*0477

*When we talk about the population right now we do not know anything about the population.*0480

*We do not know if it is uniform, the mean, standard deviation.*0491

*Let us call this one x and this one y.*0500

*From this x population and this y population we are going to draw out samples and*0507

*create the sampling distribution and that is the SDOM (the sampling distribution of the mean).*0514

*Here is a whole bunch of x bars and here is a whole bunch of y bars.*0522

*Thanks to the central limit theorem if we have big enough n and all that stuff then we know that we could assume normality.*0530

*Here we know a little bit more than we know about the population.*0540

*We know that in the SDOM, the standard error, I will write s from here because*0545

*we are basically going to assume real life examples when we do not have the population standard deviation.*0557

*The only time we get that is like in problems given to you in statistics textbook.*0565

*We will call it s sub x bar and that can be the standard deviation of x/√n sub x.*0570

*We know those things and we also know the standard error of y and that is going to be the standard deviation of y ÷ √n sub y.*0585

*Because of that, you do not write s sub y bar again on the right-hand side, because it would not make sense that*0601

*the standard error would equal the standard error over something else.*0607

*That would not quite make sense.*0612

*You want to make sure that you keep this s special and different because standard error*0614

*is talking about entirely different idea than the standard deviation.*0621

*Now that we have two SDOM if we just decided to do this then we would not need to know anything new about creating a confidence interval of two means.*0625

*You would just create two separate confidence intervals: consider that x bar,*0638

*consider that y bar, construct a 95% confidence interval for both of these guys.*0644

*You are done.*0649

*Actually, what we want is not to take two means and get two sampling distributions.*0650

*We would like one sampling distribution of the difference between two means.*0661

*That is what I am going to call SDOD.*0668

*Here is what you have to imagine, in order to get the SDOM what we had to do is go to the population and draw out samples of size n and plot the means.*0671

*Do that millions and millions of times.*0682

*That is what we had to do here.*0685

*We also had to do that here: we went to the entire population of y, pulled out samples, and plotted the means until we got this distribution of means.*0687

*Imagine pulling out a mean from here and a mean from here randomly, then finding the difference of those means and plotting that difference down here.*0699

*Do that over and over again.*0715

*You would start to get a distribution of the difference of these two means.*0718

*You would get a distribution of a whole bunch of x bar - y bar.*0727

*That is what this distribution looks like and that distribution looks normal.*0734

*This is actually one of the principles of probability distributions that we have covered before.*0742

*I think we have covered it in binomial distributions.*0747

*I know this is not a binomial distribution but the same principles apply here where if you draw from two normally distributed population*0749

*and subtract those from each other you will get a normal distribution down here.*0764
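
The thought experiment above can be tried out with a small simulation. This is a minimal sketch, not part of the lecture: the function name `simulate_sdod` and all of the population values (means of 50 and 45, standard deviations of 10 and 8, sample sizes of 25 and 16) are made up for illustration, and both populations are assumed to be normal.

```python
import random
import statistics

def simulate_sdod(mu_x, sigma_x, n_x, mu_y, sigma_y, n_y, reps=5000, seed=0):
    """Draw one sample from each population, record x_bar - y_bar, and repeat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        x_bar = statistics.fmean(rng.gauss(mu_x, sigma_x) for _ in range(n_x))
        y_bar = statistics.fmean(rng.gauss(mu_y, sigma_y) for _ in range(n_y))
        diffs.append(x_bar - y_bar)
    return diffs

# Hypothetical populations, chosen only for illustration.
diffs = simulate_sdod(mu_x=50, sigma_x=10, n_x=25, mu_y=45, sigma_y=8, n_y=16)

# The center should land close to mu_x - mu_y = 5.
print(statistics.fmean(diffs))
# The variance should land close to 10**2/25 + 8**2/16 = 8.
print(statistics.variance(diffs))
```

A histogram of `diffs` would show the roughly normal shape described in the lecture.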

*We have this thing and what we now want to find is not just the mu sub x bar or mu sub y bar, that is not what we want to find.*0769

*What we want to find is something like the mu of x bar - y bar because this is our x bar - y bar and we want to find the mu of that.*0783

*Not only that but we also want to find the standard error of this thing.*0796

*I think we can figure out what that might be.*0800

*At least the notation for it, that would be the standard error.*0807

*Standard error always has these x bar and y bar things.*0812

*This is how you notate the standard deviation of x bar - y bar and that is called*0817

*the standard error of the difference and that is a shortcut way of saying x bar - y bar.*0829

*We could just say of the difference.*0837

*You can think of this as the sampling distribution of a whole bunch of differences of means.*0839

*In order to find this, again it draws back on probability principles but actually let us go to variance first.*0845

*If we talk about the variance of this distribution that is going to be the variance of x bar + the variance of y bar.*0856

*If you go back to your probability principles you will see why.*0869

*From this we could actually figure out standard error by taking the square root of both sides.*0874
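
In symbols, the step just described might be written as follows, assuming the two samples are independent (a condition the lecture comes back to later):

```latex
% Variances of the two SDOMs add to give the variance of the SDOD
s^2_{\bar{X} - \bar{Y}} = s^2_{\bar{X}} + s^2_{\bar{Y}}

% Taking the square root of both sides gives the standard error of the difference
s_{\bar{X} - \bar{Y}} = \sqrt{s^2_{\bar{X}} + s^2_{\bar{Y}}}
```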

*We are just building on all the things we have learned so far.*0881

*We know population.*0888

*We know how to do the SDOM.*0889

*We are going to use two SDOM in order to create a sampling distribution of differences.*0891

*Let us talk about the rules of the SDOD and these are going to be very, very similar to the CLT.*0898

*The first thing is this, if SDOM for x and SDOM for y are both normal then the SDOD is going to be normal too.*0909

*Think about when these are normal.*0919

*These are normal if your population is normal.*0922

*That is one case where it is normal.*0924

*This is also normal when n is large.*0927

*In certain cases, you can assume that the SDOM is normal, and if both of these have met those conditions,*0929

*then you can assume that the SDOD is normal too.*0939

*We have conditions where we can assume it is normal and they are not crazy.*0942

*There are things we have learned.*0949

*What about the mean?*0951

*It is always shape, center, spread.*0953

*What about the mean for the SDOD?*0956

*That is going to be characterized by mu sub x bar - y bar.*0959

*That is the idea.*0972

*Let us consider the null hypothesis and in the null hypothesis usually the idea is they are not different like nothing stands out.*0975

*Y does not stand out from x and x does not stand out from y.*0987

*That means we are saying they are very similar.*0991

*If that is the case we are saying is that when we take x bar – y bar and do it over and over again, on average, the difference should be 0.*0994

*Sometimes the difference will be positive.*1009

*Sometimes the difference will be negative.*1012

*But if x and y are roughly the same then we should actually get a difference of 0 on average.*1014

*For the null hypothesis that is 0.*1022

*So what would be the alternative hypothesis?*1027

*Something like the mean of the SDOD is not 0.*1031

*This is in the case where x and y are assumed to be the same.*1037

*That is always the case with the null hypothesis.*1051

*They are assumed to be the same.*1055

*They are not significantly different from each other.*1056

*That is the mean of the SDOD.*1058

*What about standard error?*1062

*In order to calculate standard error, you have to know whether these are independent samples or not.*1064

*Remember, going back to sampling, independent samples are where you know that these two*1073

*come from different populations and picking one does not change the probabilities of picking the other.*1079

*As long as these are independent samples, then you can use these ideas of the standard error.*1089

*As we said before, it is easier when I think about the variance of the SDOD first because that rule is quite easy.*1096

*The variance of SDOD, so the variance is going to be just the variance of the SDOM + the variance of the SDOM for the other guy.*1105

*And notice that these are the x bars and the y bars.*1121

*These are for the SDOM they are not for the populations nor the samples.*1131

*From here what you can do is sort of just derive the standard error formula.*1137

*We can just square root both sides.*1149

*If you wanted to just get standard error, then it would just be the square root of adding each of these variances together.*1153

*Let us say you double-click on this guy, what is inside of him?*1168

*He is like a stand-in for the more detailed idea of s sub x squared / n sub x.*1175

*Remember when we talk about standard error we are talking about standard error = s / √n.*1193

*The variance of the SDOM = s^{2} / n.*1205

*If you imagine squaring s / √n you would get s^{2} / n, and that is the variance we need.*1210

*We need to add the variances together before you square root them.*1220

*Here we have the variance of y / n sub y.*1224

*You could write it either like this or like this.*1235

*They mean the same thing.*1240

*They are perfectly equivalent.*1242

*You do have to remember that when you have this all under the square root sign,*1244

*the square root sign acts like a parentheses so you have to do all of this before you square root.*1253

*That is standard error.*1261
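
As a minimal sketch of that calculation, the function below adds the two SDOM variances and only then takes the square root. The function name `se_difference` and the numbers in the usage line are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math

def se_difference(s_x, n_x, s_y, n_y):
    """Standard error of x_bar - y_bar for two independent samples."""
    var_x_bar = s_x ** 2 / n_x  # variance of the SDOM for x
    var_y_bar = s_y ** 2 / n_y  # variance of the SDOM for y
    # Add the variances first, then take the square root of the whole sum.
    return math.sqrt(var_x_bar + var_y_bar)

# Hypothetical values: sqrt(36/4 + 64/4) = sqrt(25)
print(se_difference(s_x=6, n_x=4, s_y=8, n_y=4))  # 5.0
```

Note that taking the square root of each variance separately and then adding would give 3 + 4 = 7 here, which is not the same thing; that is the point of treating the square root sign like parentheses.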

*I know it looks a little complicated, but they are just all the principles we learned before,*1265

*but now we have to remember: does it come from the x or does it come from the y distribution?*1273

*That is one of the few things you have to ask yourself whenever we deal with two samples.*1279

*Now that we know the revised CLT for this sampling distribution of the differences,*1287

*now we need to ask when can we construct a confidence interval for the difference between two means?*1298

*Actually these conditions are very similar to the conditions that must be met when we construct an SDOM.*1306

*There are a couple of differences because we are dealing with two samples.*1314

*Three conditions have to be met.*1318

*All three of these have to be checked.*1321

*One is independence, the notion of independence.*1323

*The first is this: the two samples are randomly and independently selected from two different populations.*1329

*That is the first thing you have to meet before you can construct this confidence interval.*1340

*The second thing is this, this is the assumption for normality.*1348

*How do we know that the SDOD is normal?*1355

*It needs to be reasonable to assume that both populations that the samples come from are normal, or your sample sizes are sufficiently large.*1358

*These are the same ones that apply to the CLT.*1372

*This is the case where we can assume normality for the SDOM but also the SDOD.*1376

*In number 3, in the case of sample surveys the population size should be at least 10 times larger than the sample size for each sample.*1384

*The only reason for this is we talked before about replacement, sampling with replacement versus sampling without replacement.*1397

*Well, whenever you are doing a sample you are technically not having replacement*1409

*but if your population is large enough then this condition actually makes it so that you could assume that it works pretty much like with replacement.*1413

*If you have many people then it does not matter.*1427

*That is the replacement rule.*1430

*Finally, we could get to actually finding the confidence interval.*1433

*Here is the deal, with confidence interval let us just review how we used to do it for one mean.*1444

*One mean confidence interval.*1450

*Back in the day when we did one mean and life was nice, what we would do is often take the SDOM*1455

*and assume that the x bar, the sample mean is at the center of it and then we construct something like 95% confidence interval.*1466

*These are .025 because if this is 95% and symmetrical there is 5% leftover but it needs to be divided on both sides.*1484

*What we did was we found these boundary values by using this idea, this middle + or – how many standard errors you are away.*1496

*We used either t or z × the standard error.*1525

*I am just going to use t from now on because usually we are not given the standard deviation of the population.*1529

*That was the basic idea from before and that would give us this value, as well as this value.*1530

*We could say we have 95% confidence that the population mean falls in between these boundaries.*1537

*That is for one mean.*1545

*What about two means?*1548

*In this case, we are not going to be calculating using the SDOM anymore.*1549

*We are going to use the SDOD.*1560

*If this middle used to be x bar, the sample mean, then you can probably assume that*1562

*now it might be something as simple as the difference between the two means.*1575

*That is what we assume to be the center of the SDOD.*1580

*Just like before, whatever level of confidence you need.*1583

*If it is 99% you have 1% left over on the side.*1593

*You have to divide that 1% in half so .5% for this side and .5% for that side.*1598

*In this case, let us just keep the 95%.*1603

*What we need to do is find these borders.*1611

*What we can do is just use the exact same idea again.*1618

*We could use that exact same idea because we can find the standard error of this distribution.*1624

*We know what that is.*1629

*Let me write this out.*1631

*We will write s sub x bar.*1640

*We can actually just translate these ideas into something like this.*1645

*That would be taking this, adding or subtracting how many jumps away you are, like the distance you are away.*1652

*That would be something like x bar - y bar but instead of just having x in the middle we have this thing in the middle.*1661

*+ or – the t remains the same, t distributions but we have to talk about how to find degrees of freedom for this guy.*1670

*The new SE, but now this is the SE of the difference.*1680

*How do we write that?*1691

*X bar - y bar + or - the t × s sub x bar - y bar.*1694

*If we wanted to we could take all that out into the square root of variance of the SDOM for x and variance of SDOM for y.*1707

*We could unpack all of this if we need to but this is the basic idea of the confidence interval of two means.*1719
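
Put side by side, the one-mean and two-means intervals described here might be written as follows. The typesetting is a summary of the spoken formulas, with $t$ standing for the critical value at the chosen confidence level.

```latex
% One mean
\bar{X} \pm t \cdot s_{\bar{X}}

% Two independent means
(\bar{X} - \bar{Y}) \pm t \cdot s_{\bar{X} - \bar{Y}},
\qquad
s_{\bar{X} - \bar{Y}} = \sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}
```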

*In order to do this I want you to notice something.*1727

*Here we need to find t and because we need to find t we need to find degrees of freedom*1732

*but not just any old degrees of freedom, because right now we have two degrees of freedom.*1740

*Degrees of freedom for x and degrees of freedom for y.*1744

*We need a degrees of freedom for the difference.*1747

*That is what we need.*1751

*Let us figure out how to do that.*1753

*We need to find degrees of freedom.*1756

*We know how to find degrees of freedom for x, that is straightforward.*1760

*That is n sub x -1 and degrees of freedom for y is just going to be n sub y -1.*1764

*Life is good.*1771

*Life is easy.*1772

*How do we find the degrees of freedom for the difference between x and y?*1773

*That is actually going to just be the degrees of freedom for x + degrees of freedom for y.*1778

*We just add them together.*1790

*If we want to unpack this, think about double-clicking on this, and you get that.*1792

*(N sub x - 1) + (n sub y - 1).*1797

*I am just putting in the parentheses so you could see the natural groupings, but obviously you could*1804

*do them in any order because you could just do them straight across this adding and subtracting.*1810

*They all have the same order of operation.*1816

*That is degrees of freedom and once you have that then you can easily find the t.*1820

*Look it up in the back of your book or you can do it in Excel.*1830
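
If neither a table nor Excel is at hand, the lookup can be approximated numerically. The sketch below is one possible way to do it using only the Python standard library: it integrates the t density with Simpson's rule and then searches for the critical value by bisection. The function names (`t_pdf`, `t_cdf`, `t_critical`, `df_difference`) are made up for this illustration, and the approach is a stand-in for a table lookup, not the method used in the lecture.

```python
import math

def df_difference(n_x, n_y):
    """Degrees of freedom for the difference: (n_x - 1) + (n_y - 1)."""
    return (n_x - 1) + (n_y - 1)

def t_pdf(t, df):
    """Density of the t distribution, computed on the log scale for stability."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry around 0."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(confidence, df):
    """Two-tailed critical value, e.g. confidence=0.95 leaves .025 in each tail."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection: halve the search interval each pass
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

df = df_difference(1000, 800)            # 999 + 799 = 1798
print(df, round(t_critical(0.95, df), 3))  # 1798 1.961
```

With large degrees of freedom the result sits very close to the z value of 1.96, which is what one would expect from the t distribution approaching the normal.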

*Let us interpret confidence interval.*1833

*We have the confidence interval let us think about how to say what we have found.*1837

*I am just going to briefly draw that picture again because this picture anchors my thinking.*1844

*Here is our difference of means.*1852

*When you look at this t, think of this as the difference of two means.*1858

*I guess I could write DOTM but that would just be DOM.*1863

*Here what we found, if we find something like a 95% confidence interval that means we have found these boundaries.*1869

*We say something like this.*1887

*The actual difference of the two means of the real population, of the population x and y.*1891

*The real population that they come from should be within this interval 95% of the time or something like*1919

*we have 95% confidence that the actual difference between means of the population of x and population of y should be within this interval.*1939

*That comes from the notion that this is created from the SDOMs.*1950

*Remember the SDOM: the CLT says that their means are the means of the population.*1955

*We are getting the population means drop down to the SDOM and from the SDOM we get this.*1962

*Because of that we could actually make a conclusion that goes back to the population.*1970

*Let us think about if 0 is not in between here.*1980

*Remember the null hypothesis when we think about two means is going to be something like this.*1987

*That the mu sub x bar – y bar is going to be equal to 0.*1993

*This is going to mean that on average when you subtract these two things the average is going to be 0.*1998

*There is going to be no difference on average.*2004

*The alternative hypothesis should then be the mean of these differences should not be 0.*2006

*They are different.*2015

*If 0 is not within this confidence interval then we have very little reason to suspect that this would be true.*2016

*It is a very little reason to think that this null hypothesis is true.*2026

*We could also say that if we do not find 0 in our confidence interval, then in hypothesis testing we might also be able to reject the null hypothesis.*2030

*But we will get to that later.*2040
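
The link between the interval and the null hypothesis can be sketched in a few lines. The helper names `difference_ci` and `excludes_zero` and the input numbers are hypothetical, picked only so the result is easy to check by hand.

```python
def difference_ci(x_bar, y_bar, t, se_diff):
    """Confidence interval for the difference: (x_bar - y_bar) +/- t * SE."""
    center = x_bar - y_bar
    margin = t * se_diff
    return center - margin, center + margin

def excludes_zero(interval):
    """True when 0 lies outside the interval, i.e. 'no difference' is implausible."""
    lower, upper = interval
    return lower > 0 or upper < 0

# Hypothetical inputs, not taken from the lecture's examples.
ci = difference_ci(x_bar=12.0, y_bar=10.0, t=2.0, se_diff=0.5)
print(ci)                 # (1.0, 3.0)
print(excludes_zero(ci))  # True
```

When `excludes_zero` returns `False`, a difference of 0 is still among the plausible values, so the data give little reason to doubt the null hypothesis.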

*I just wanted to show you this because the confidence interval here is very tightly linked to the hypothesis testing part.*2042

*They are like two sides of the same coin.*2050

*That universe is fairly straightforward but I feel like I need to cover one other thing because sometimes this is emphasized in some books.*2052

*Some teachers emphasize this more than other teachers, so I am going to talk to you about s pool because this will come up.*2065

*One of the things I hope you noticed was that in order to find our estimate of SDOM,*2076

*in order to find the SDOD standard error, what we did was we took the variance of one SDOM*2085

*and added that to the variance of the other SDOM and square root the whole thing.*2106

*Let me just write that here.*2110

*The s sub x bar - y bar is the square root of the variance of one SDOM + the variance of the other SDOM.*2111

*Here what we did was let us just treat them separately and then combine them together.*2129

*That is what we did.*2137

*Although this is an okay way of doing it, in doing this we are assuming that they might have different standard deviations.*2138

*The two different populations might have two different standard deviations.*2154

*Normally, that is a reasonable assumption to make.*2159

*Very few populations have the exact same standard deviation.*2162

*For the vast majority of time because we just assumed if you come from two different population you probably have two different standard deviations.*2166

*This is pretty reasonable to do like 98% of the time.*2177

*The vast majority of time.*2182

*But it is actually not as good an estimate of this value as if you had used a pooled version of the standard deviation.*2184

*Here is what I mean.*2198

*Right now we are saying x is going to be what we use to create the standard deviation of x.*2198

*Y is going to be what we use to create the standard deviation of y.*2206

*Just to make that explicit.*2210

*I am going to write this out so that you could actually see the variance of x and the variance of y.*2213

*We use x to create this guy and we use y to create that guy and they remain separate.*2228

*This is going to take a little reasoning.*2235

*Think back if you have more data then your estimate of the population standard deviation is better, more data more accurate.*2239

*Would it not be nice if we took all the guys from the x pool and all the guys from the y pool and put them together?*2253

*Together let us estimate the standard deviation.*2262

*Would not that be nice?*2267

*Then we will have more data and more data should give us a more accurate estimate of the population.*2268

*You can do that but only in the case that you have reason to think that the population of x has a similar standard deviation to the population of y.*2278

*If you have a reason to think they are both normally distributed.*2293

*Let us say something like this.*2299

*If you have reason to believe that the population x and y have similar standard deviation*2303

*then you can pool samples together to estimate standard deviation.*2324

*You can pool them together and that is going to be called s pool.*2347

*There are very few populations that you can do this for.*2351

*One thing is something like the height of males and females; height tends to be normally distributed and we know that.*2357

*Height of Asians and Latinos or something, but there are not a lot of examples that come to mind where you could do this.*2365

*That is why some teachers do not emphasize it but I know that some others do so.*2374

*That is why I want to definitely go over it.*2378

*How do you get s pool and where does it come in?*2380

*Here is the thing: in order to use s pool, what we would do is substitute in s pool for s sub x and s sub y.*2384

*Instead of two separate estimates of standard deviation, use s pool.*2396

*We will be using s pool^{2}.*2408

*How do we find s pool^{2}?*2411

*In order to find s pool^{2}, what you would do is add up all of the sums of squares.*2415

*The sum of squares of x and sum of squares of y, add them together and then divide by the sum of all the degrees of freedom.*2432

*If I double-click on this, this would mean (the sum of squares of x + the sum of squares of y) ÷ (degrees of freedom x + degrees of freedom y).*2442

*This is all you need to do in order to find s pool^{2}, and then what you would do is substitute it in for s sub x^{2} and s sub y^{2}.*2457
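
Here is a minimal sketch of the pooled calculation, assuming the raw sample values are available and that the equal-variance assumption is reasonable. The function names and the two tiny samples are invented for illustration.

```python
import math

def sum_of_squares(values):
    """Sum of squared deviations from the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def pooled_variance(x, y):
    """s_pool^2 = (SS_x + SS_y) / (df_x + df_y)."""
    ss_total = sum_of_squares(x) + sum_of_squares(y)
    df_total = (len(x) - 1) + (len(y) - 1)
    return ss_total / df_total

def pooled_se_difference(x, y):
    """Standard error of x_bar - y_bar with s_pool^2 in place of s_x^2 and s_y^2."""
    s2_pool = pooled_variance(x, y)
    return math.sqrt(s2_pool / len(x) + s2_pool / len(y))

# Hypothetical samples: SS_x = 2, SS_y = 20, df = 2 + 3, so s_pool^2 = 22 / 5.
x = [1, 2, 3]
y = [2, 4, 6, 8]
print(pooled_variance(x, y))       # 4.4
print(pooled_se_difference(x, y))
```

The only change from the unpooled standard error is that the same variance estimate appears over both sample sizes.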

*That is the deal.*2469

*In the examples that are going to follow, I am not going to use s pool because there is usually very little reason to assume that we can use s pool.*2471

*But a lot of times you might hear this phrase: assumption of homogeneity of variance.*2483

*If you could assume that these guys have a similar variance, if you can assume*2490

*they have similar, homogeneous variance, then you can use s pool.*2502

*For the most part, for the vast majority of time you cannot assume homogenous variance.*2508

*Because of that we will often use this one.*2514

*However, I should say that some teachers do want you to be able to calculate both.*2517

*That is the only thing.*2525

*Finally I should just say one thing.*2528

*Usually this works just as well as s pool.*2531

*It is just that there are sometimes we get more of a benefit from using this one.*2536

*If worse comes to worst, and after the statistics class you only remember this one,*2543

*you are pretty good to go.*2548

*Let us go on to some examples.*2551

*A random sample of American college students was collected to examine quantitative literacy.*2556

*How good they are in reasoning about quantitative ideas.*2562

*The survey sampled 1,000 students from four-year institutions, this was the mean and standard deviation.*2565

*800 from two-year institutions, here is the mean and standard deviations.*2571

*Are the conditions for confidence intervals met?*2576

*Also construct a 95% confidence interval and interpret it.*2581

*Let us think about the confidence interval requirements.*2586

*First is independent random samples.*2593

*It does say random sample right and these are independent populations.*2596

*One is four-year institutions, one is two-year institutions.*2603

*There are very few people going to both of them at the same time.*2606

*First one, check.*2609

*Second one, can we assume normality either because of the large n or because we know that both these populations are originally normally distributed?*2612

*Well, they have pretty large n, so I am going to say number 2 check.*2622

*Number 3, is this sample roughly sampling with replacement?*2627

*And although 1000 students seem a lot, there are a lot of college students.*2635

*I am pretty sure that this meets that qualification as well.*2640

*Go ahead and construct the 95% confidence interval.*2643

*Well, it helps to start off with the drawing of the SDOD just to anchor my thinking.*2648

*And this mu sub x bar - y bar we could assume that this is x bar - y bar.*2656

*That is what we do with confidence intervals.*2667

*We use what we have from the samples to figure out what the population might be.*2670

*We want to construct a 95% confidence interval.*2678

*That is going to be .025 and then maybe it will help us to figure out the degrees of freedom so that we will know the t value to use.*2685

*Let us figure out degrees of freedom.*2703

*It is going to be the degrees of freedom for x and I will call x the four-year university guys and the degrees of freedom for y the two-year university guys.*2706

*That is going to be 999 + 799 and so it is going to be 1800 - 2 = 1798.*2718
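The degrees-of-freedom arithmetic above can be sketched in a few lines of Python. The variable names are only illustrative; the sample sizes are the ones given in the example.

```python
# Sample sizes from the example: 1,000 four-year and 800 two-year students.
n_x = 1000  # four-year institutions (x)
n_y = 800   # two-year institutions (y)

# Each sample contributes n - 1 degrees of freedom; the two are added.
df_x = n_x - 1
df_y = n_y - 1
df = df_x + df_y

print(df_x, df_y, df)  # 999 799 1798
```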

*We have quite large degrees of freedom and let us find the t for this place.*2747

*What we need to find is this and this.*2755

*Let us find the t first.*2760

*This is the raw score, this is the t, and let me delete some of the stuff.*2765

*I will just put x bar - y bar in there and we can find that later.*2772

*The t is going to be the boundaries for this guy and the boundaries for this guy.*2782

*What is our t value?*2788

*You can look it up in the back of your book or you could do it in Excel.*2790

*Here we want to use TINV because we have the probability, and remember this one*2799

*wants the two-tailed probability, .05, and the degrees of freedom, which is 1798, and that gives 1.96.*2806

*We will put 1.961 just to distinguish it.*2819
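As a rough cross-check on that critical value, here is a minimal sketch using only Python's standard library. It is an approximation, not the exact t lookup: with 1798 degrees of freedom the t distribution is very close to the normal distribution, so the normal critical value should land close to the 1.961 used here. The variable names are illustrative.

```python
from statistics import NormalDist

confidence = 0.95
tail_area = (1 - confidence) / 2  # about .025 in each tail

# Assumption: with df = 1798 the t distribution is nearly normal, so the
# normal critical value is used as a stand-in for the exact t value.
z_critical = NormalDist().inv_cdf(1 - tail_area)

t_lecture = 1.961  # t value used in the lecture for df = 1798

print(round(z_critical, 3))
print(abs(t_lecture - z_critical) < 0.005)
```

The t value is slightly larger than the normal value, which is the reason for writing 1.961 rather than 1.96.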

*Let us write down our confidence interval formula and see what we can do.*2831

*Confidence interval is going to be x bar - y bar.*2838

*The middle of this guy + or - t × standard error of this guy.*2844

*That is going to be s sub x bar - y bar.*2854

*It would be probably helpful to find this thing.*2858

*X bar - y bar.*2862

*X bar - y bar that is going to be 330 – 310.*2868

*Let us also try to figure out the standard error of SDOD which is s sub x bar - y bar.*2883

*What I'm trying to do is find this guy.*2911

*In order to find that guy let us think about the formula.*2918

*I'm just writing this for myself.*2921

*The square root of the variance of x bar + the variance of y bar.*2925

*We do not have the variance of x bar and y bar.*2937

*Let us think about how to find the variance of x bar.*2943

*The variance of x bar is going to be s sub x ^{2} ÷ n sub x.*2947

*The variance of y bar is going to be s sub y ^{2} ÷ n sub y.*2959

*I wanted to write all these things out just because I need to get to a place where finally I can put in s.*2977

*Finally, I can do that.*2986

*This is s sub x and this is s sub y.*2988

*I can put in 111 ^{2} ÷ n sub x which is 1000 and I could put in the standard deviation of y^{2} ÷ 800.*2990

*I have these two things and what I need to do is go back up here and add these and square root them.*3017

*Square root this + this.*3028
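Written out, the chain of substitutions described above looks like this. The notation follows the lecture (s sub x, n sub x, and so on); only the numbers stated in the transcript are filled in, and the standard deviation of y is left as a symbol.

```latex
s_{\bar{x}-\bar{y}}
  = \sqrt{s_{\bar{x}}^{2} + s_{\bar{y}}^{2}}
  = \sqrt{\frac{s_x^{2}}{n_x} + \frac{s_y^{2}}{n_y}}
  = \sqrt{\frac{111^{2}}{1000} + \frac{s_y^{2}}{800}}
```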

*I know that this equals that.*3034

*We have our standard error, which is 4.49 and this is 20 + or - 1.961.*3038

*Now I could do this.*3064

*I am going to type that into my calculator as well.*3066

*The confidence interval for the high boundary is going to be 20 + 1.961 × 4.49*3069

*and the confidence interval for the low boundary is going to be that same thing.*3085

*I am just going to change that into subtraction.*3097

*11.20.*3101

*Let me move this over.*3105

*It is going to be 28.8.*3110

*Let me get the low end first.*3117

*The confidence interval is from about 11.2 through 28.8.*3121
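The calculator steps for the two boundaries can be sketched in Python as follows. All three inputs (the difference of 330 - 310, the t of 1.961, and the standard error of 4.49) are the values from this example; the variable names are illustrative.

```python
x_bar_minus_y_bar = 330 - 310  # difference of the two sample means
t_critical = 1.961             # t for df = 1798, two-tailed .05
standard_error = 4.49          # standard error of the difference

# Margin of error is t times the standard error of the difference.
margin = t_critical * standard_error
low = x_bar_minus_y_bar - margin
high = x_bar_minus_y_bar + margin

print(round(low, 1), round(high, 1))  # 11.2 28.8
```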

*We have to interpret it.*3127

*This is the hardest part for a lot of people.*3130

*We have to say something like this.*3133

*The true difference between the population means 95% of the time is going to fall in between these two numbers.*3136

*Or we have 95% confidence that the true difference between the two population means falls in between these two numbers.*3146
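One way to see what "95% confidence" refers to is a small simulation. This is a sketch under stated assumptions, not something from the lecture: the two populations are hypothetical normal populations with a known true difference of 20, and the normal critical value stands in for t. The idea is to build the interval many times from fresh samples and count how often it captures the true difference.

```python
import random
from statistics import NormalDist, mean, variance

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical populations (not from the lecture); true difference is 20.
MU_X, MU_Y, SIGMA = 330.0, 310.0, 100.0
N_X, N_Y = 100, 100
TRUE_DIFF = MU_X - MU_Y

# Assumption: normal critical value as a large-sample stand-in for t.
crit = NormalDist().inv_cdf(0.975)

trials = 1000
hits = 0
for _ in range(trials):
    xs = [random.gauss(MU_X, SIGMA) for _ in range(N_X)]
    ys = [random.gauss(MU_Y, SIGMA) for _ in range(N_Y)]
    diff = mean(xs) - mean(ys)
    # Standard error of the difference: sqrt(s_x^2/n_x + s_y^2/n_y).
    se = (variance(xs) / N_X + variance(ys) / N_Y) ** 0.5
    if diff - crit * se <= TRUE_DIFF <= diff + crit * se:
        hits += 1

coverage = hits / trials  # fraction of intervals containing the true difference
print(coverage)
```

The printed fraction should come out near 0.95. That is the sense in which the procedure is "95% confident": about 95% of intervals built this way contain the true difference.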

*Let us go to example 2.*3154

*This will be our last example.*3157

*If the sample sizes of both samples are the same, what would be the simplified formula for standard error of the difference?*3159

*If in addition, the standard deviations of both samples are the same, what would be the simplified formula for standard error of the difference?*3167

*This is just asking, depending on how similar the two samples are, can we simplify the formula for standard error.*3175

*We can.*3183

*Let us write the actual formula out, so that would be s sub x bar – y bar = square root of the variance of x bar + variance of y bar.*3184

*If we double-click on these guys that would give the variance of x / n sub x + the variance of y / n sub y.*3207

*It is asking, what if the sample size for both samples are the same?*3223

*What would be the simplified formula?*3230

*That is saying that if n sub x = n sub y then what would be this?*3231

*We can get the square root of (variance of x + variance of y) / n.*3240

*Because the n for each of them should be the same.*3251

*This would make it a lot simpler.*3254

*If in addition the standard deviations of both samples are the same, then this would mean that*3260

*because the standard deviation is the same then the variances are the same.*3272

*That would be that case.*3276

*If in addition this was the case, then you would just get the square root of 2 × s ^{2}, whatever the equal variance is, / n.*3279

*That would make it a simple formula.*3294

*That would make life a lot easier but that is not always the case.*3298

*If it is, you know that it will be simple for you.*3303
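The two simplifications in Example 2 can be checked numerically with a short sketch. The function names and the test values below are hypothetical, chosen only to illustrate that the simplified formulas agree with the general one when their conditions hold.

```python
import math

def se_general(s_x, n_x, s_y, n_y):
    """General SE of the difference: sqrt(s_x^2/n_x + s_y^2/n_y)."""
    return math.sqrt(s_x**2 / n_x + s_y**2 / n_y)

def se_equal_n(s_x, s_y, n):
    """Simplified form when n_x = n_y = n: sqrt((s_x^2 + s_y^2) / n)."""
    return math.sqrt((s_x**2 + s_y**2) / n)

def se_equal_n_and_s(s, n):
    """Simplified form when also s_x = s_y = s: sqrt(2 * s^2 / n)."""
    return math.sqrt(2 * s**2 / n)

# Hypothetical values, for illustration only.
s_x, s_y, s, n = 12.0, 9.0, 10.0, 50

same_n_matches = math.isclose(se_general(s_x, n, s_y, n), se_equal_n(s_x, s_y, n))
same_n_and_s_matches = math.isclose(se_general(s, n, s, n), se_equal_n_and_s(s, n))

print(same_n_matches, same_n_and_s_matches)  # True True
```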

*That is it for the confidence intervals for the difference between two means.*3307

*Thank you for using www.educator.com.*3312


Post by Terry Kim on October 20, 2015

why are we adding df and variances when we are actually calculating the DIFFERENCE? H(null): mu_(x-y) = 0 here it is 0 because it is the difference

but I don't get why we add the dfs and variances if its S_(x-y) isn't it also should be sqrt(s^2_(x)-s^2(y))??


Post by Professor Son on November 12, 2014

Just for students who happen to have a class with me, I don't emphasize s-pool a lot because typically it's more conservative to assume that they are separate. If you take a more advanced statistics class, you could learn about hypothesis testing that allows us to infer whether we can pool standard deviations together.


Post by Professor Son on November 12, 2014

In the section about s-pool, I accidentally refer to SE as "sample error" but what I meant to say was "standard error."