For more information, please see full course syllabus of Statistics

### Introduction to Confidence Intervals

- Intro 0:00
- Roadmap 0:06
- Roadmap
- Inferential Statistics 0:50
- Inferential Statistics
- Two Problems with This Picture… 3:20
- Two Problems with This Picture…
- Solution: Confidence Intervals (CI)
- Solution: Hypothesis Testing (HT)
- Which Parameters are Known? 6:45
- Which Parameters are Known?
- Confidence Interval - Goal 7:56
- When We Don't Know μ but Know σ
- When We Don't Know 18:27
- When We Don't Know μ nor σ
- Example 1: Confidence Intervals 26:18
- Example 2: Confidence Intervals 29:46
- Example 3: Confidence Intervals 32:18
- Example 4: Confidence Intervals 38:31

### General Statistics Online Course

### Transcription: Introduction to Confidence Intervals

*Hi and welcome to www.educator.com.*0000

*Today we are going to be introduced to confidence intervals.*0002

*Here is the roadmap for today, first we are going to do a brief overview of inferential statistics.*0005

*We have been trying to do some inferential statistics but there have been a couple of problems we keep running into.*0013

*So far I have fudged it.*0022

*We will address some of those problems head on and come up with 2 solutions.*0024

*One of those solutions is the confidence interval, and we are going to talk about confidence intervals*0031

*when sigma, the population standard deviation, is known and when sigma is unknown.*0039

*Those are the two situations we are going to be focused on.*0046

*Let us go over inferential statistics.*0049

*We know the big picture idea there is some population represented by X and we wish we could know the population but we do not.*0055

*But instead what we can know is little samples.*0065

*We could know that but the problem is samples are biased.*0071

*Whenever we have samples and we summarize them using these mathematical summaries we call them statistics.*0074

*Just to give you an example of some statistics, there are things like x bar or s; those are all statistics.*0084

*What we would like to do is use these samples to understand something about the population.*0093

*Statistics, the field, is about using these statistics to estimate parameters, and*0100

*to give you an idea about parameters, they are things like mu or sigma.*0108

*That is our whole goal.*0112

*Here we realize in order to jump from things like x bar and s to mu and sigma we are going to need more than just wishful thinking.*0114

*And that is where the sampling distributions come in.*0132

*When we talk about a sampling distribution, we are talking about the distribution of some sort of statistic.*0135

*When we talk about sampling distribution of the mean we are talking about a whole bunch of x bars.*0142

*Here we have a whole bunch of x.*0148

*Here we have a whole bunch of x bar and that is the distribution.*0150

*When we summarize these statistics in the sampling distribution we call them expected values.*0155

*So it is not just mu, it is mu sub x bar.*0164

*It is not just sigma it is sigma sub x bar.*0168

*What we want to do is go from this to understand this but what we have learned*0172

*so far is how to see the relationship between parameters and expected values.*0177

*We know that these things have a relationship to each other.*0186

*And from doing that we could then make this jump.*0189

*It is like we use this to say something like this.*0195

*There are two problems with this picture. Although it seems rosy, there are still two nagging questions.*0199

*We looked at them a little bit before, but we need to solve them more rigorously than we did before.*0210

*One question is this, what happens when we do not know what the population looks like?*0217

*Of course we could use the central limit theorem when we know mu and sigma from the population.*0222

*What if we do not know mu?*0229

*What if we do not know sigma?*0231

*Then what happens?*0233

*Also, how do we know whether a sample is sufficiently unlikely because remember the whole point*0234

*of the sampling distribution is for us to take sampling distributions from a known population and compare it to an unknown population.*0240

*If the sample does not match the sampling distribution, so that it is very unlikely to have come from that sampling distribution,*0254

*we could say this is probably not the population that the sample came from.*0261

*How do we know when it is sufficiently weird?*0266

*To answer these two questions there are going to be two solutions.*0269

*You can think of it as this one.*0275

*Both questions are actually answered by each of these solutions, but roughly, this first question goes along better with that one.*0281

*This one goes along better with that one.*0287

*The two solutions are these; one is the confidence interval.*0291

*When we talk about a confidence interval, here is what we are doing: we are going to figure out where mu might be from the sample.*0302

*We are going to try to figure out the population mu from the sample and*0306

*that is what we do when we do not know what the population looks like.*0336

*We try to figure it out from the sample.*0342

*Hypothesis testing actually takes another view.*0344

*In hypothesis testing, we come up with a hypothesis for what the population is like.*0349

*Hypothesize a population mu first.*0355

*In this case we are saying we are going to pull from something and figure out and pick a potential population mu.*0363

*And then we are going to test how weird the sample is.*0376

*We are going to come up with a number to tell us this is how weird the sample is.*0387

*We are going to decide is that weirdness weird enough?*0393

*That is going to be hypothesis testing.*0398

*But we are going to focus here on confidence intervals.*0401

*Okay, when we talk about confidence intervals we need to get an inventory of what we know so far.*0404

*Basically that is asking the question, which parameters are known or given to us?*0413

*What happens when we do not know what the population looks like?*0418

*Well, we may not know what*0422

*the population looks like because we do not know anything about the population, or we know*0424

*only a little bit about the population.*0428

*This is the case where we know a little.*0431

*Here we do not know mu but we do know sigma.*0434

*For some reason we have some partial information and that helps us out.*0444

*Here we know nothing.*0450

*Here nothing is helping us: we do not know mu and we are trying to figure it out, but we do not know sigma either.*0454

*It is like nothing is helping us out here.*0464

*We just have to pull ourselves up by our own bootstraps.*0466

*These are the two situations that we are going to talk about.*0471

*Here is the goal of the confidence interval.*0475

*The basic idea of the confidence interval is going to be this.*0480

*We are going to try to figure out where mu might be but we do know x bar.*0484

*We know everything about the sample but we do not know anything about the population.*0497

*But in this case I am going to show you what happens when we already know sigma.*0503

*So we have a leg up.*0508

*We know sigma, so life is a little easier for us today.*0509

*Here is the thing: we do not know what the population looks like, so we cannot draw it as normal or skewed or anything.*0513

*We have no idea what the population looks like and we have no idea what the population mu is.*0524

*But for some reason we know what sigma is.*0531

*Sigma is given to us.*0533

*From there, can we construct an SDOM, the sampling distribution of the mean?*0534

*Given that n is sufficiently large we can assume that it is normal.*0540

*We have no idea what mu is and so we do not know what mu sub x bar is.*0548

*We do not know it at all but we can figure out sigma sub x bar.*0553

*We could figure out the standard error because we have sigma and we could divide that by √n.*0559

*We have a little bit of information about the SDOM.*0566

*Here is what we do in confidence intervals.*0570

*First, assume that x bar is mu sub x bar.*0574

*Whatever your sample x bar is we are going to put back here.*0586

*We are going to assume it.*0591

*Here is why, because we always assume one thing to figure out the other,*0595

*here we are going to assume things about the x bar to figure out mu.*0601

*In hypothesis testing, we assume something about the population to figure out how*0605

*weird x bar is.*0609

*Here, because we know that the SDOM tends to be normal given a sufficiently*0612

*large n, we can find out with reasonable confidence what some*0621

*significant borders are.*0632

*For instance, let us say we are one standard deviation away.*0634

*This is the raw score and this is the z score, so we know that at one standard deviation away on each side,*0642

*this space right here is 68% of the SDOM.*0650

*Let us think about what this might mean.*0660

*When we get these borders what we might end up saying is that these are the borders in which 68% of our values will fall in the SDOM.*0663

*And here is what we could also say: there is a 68% chance that our*0679

*population mu will fall in that zone.*0686

*That is a 68% confidence interval.*0691

*68% is higher than half, but it is not that high.*0697

*But here is the thing: we can have a higher confidence level.*0702

*We can have a 95% confidence interval or we can have a 99% confidence interval.*0707

*That is what we can do.*0713

*Here is my x bar, and here is 0 as its z score, but what we can do is figure out*0716

*these borders such that there is a 95% chance of having our*0730

*population mean fall in this interval.*0744

*We can know that.*0748

*That is called the confidence interval.*0750

*That is pretty high, and you can even go to 99%.*0753

*And we could easily figure out these borders.*0756

*Here is how.*0759

*Because we easily figure out the border we could figure out what the z scores are.*0761

*This is what we call a two-tailed confidence interval because, even though the middle part is 95%, that does not mean that each tail is 5%.*0772

*Then you would have 105%, so that means this part is .025, or 2.5%, and this part is .025.*0785

*And those are the only parts where we are not sure.*0796

*There is a small chance that the population mean will fall somewhere out here but it is a very small chance.*0798

*We are trying to reduce it as much as possible.*0809

*Let us think about how we could find the z score out here.*0812

*We could use our tables in the back of the book, our z tables and we can look up and usually z tables will give you like one side.*0817

*We can look up .025 and look at the z score, or we could do it in Excel.*0830

*We are not using normsdist here; normsdist gives you the proportion of the distribution for a given z score.*0837

*We are going to put in normsinv, the inverse, and here we want to put in the probability.*0848

*Now this is going to be my probability.*0855

*I am going to put in this probability, .025, and we get -1.96.*0870

*This value here is -1.96 and because the normal distribution is symmetric we know that this part is also 1.96*0884

*but now a positive instead of negative.*0892
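As a rough sketch of the same lookup outside Excel, the snippet below uses Python's standard-library `statistics.NormalDist` in place of the inverse-normal spreadsheet function. The variable names are illustrative and not from the lecture.

```python
from statistics import NormalDist

# Two-tailed 95% interval: the remaining 5% is split into 2.5% per tail.
capture_rate = 0.95
tail_area = (1 - capture_rate) / 2  # .025

standard_normal = NormalDist(mu=0, sigma=1)

# inv_cdf takes a cumulative probability and returns the matching z score.
z_low = standard_normal.inv_cdf(tail_area)       # lower border
z_high = standard_normal.inv_cdf(1 - tail_area)  # upper border, by symmetry

print(round(z_low, 2), round(z_high, 2))  # -1.96 1.96
```

Changing `capture_rate` to 0.99 or 0.68 gives the borders for the other intervals mentioned above.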

*We know our z values on the end and if we know the z values what is our raw score here?*0896

*Tell me what this value is and also tell me what that value is.*0908

*Well the z score tells you how many standard errors away you are.*0915

*How many jumps away and each jump is worth that much.*0921

*We are away 1.96 of these jumps.*0926

*We are going to multiply this by this and then*0931

*either subtract it from x bar or add it to x bar.*0934

*Step two in finding a confidence interval: let us say you want a 95% confidence interval; find the z scores.*0938

*This is all in the case where you know sigma.*0953

*Step three is this: now you want to find the actual raw scores, and that is going to be x bar + or - the z score × standard error.*0957

*That is what you are going to do.*0984

*And we know what the standard error is.*0986

*I am going to rewrite this to be x bar + or - z score × sigma / √n.*0989

*When we do that we could find these confidence intervals.*1003

*Once you have this confidence interval, you can say with 95% confidence that*1009

*your population mean will fall in this interval between these two numbers.*1019
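The three steps for the known-sigma case can be sketched as one small function. This is a minimal illustration, not the lecture's own worksheet: the function name and the sample numbers (x bar = 50, sigma = 10, n = 25) are made up for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, capture_rate=0.95):
    """Return (low, high) for x bar +/- z * sigma / sqrt(n), sigma known."""
    # Step 1: treat the sample mean x_bar as the center of the SDOM.
    # Step 2: find the z score that leaves half of the leftover area in each tail.
    tail_area = (1 - capture_rate) / 2
    z = NormalDist().inv_cdf(1 - tail_area)
    # Step 3: convert z back to raw scores using the standard error.
    standard_error = sigma / sqrt(n)
    margin = z * standard_error
    return x_bar - margin, x_bar + margin


# Hypothetical numbers, chosen only to show the arithmetic.
low, high = z_confidence_interval(x_bar=50.0, sigma=10.0, n=25)
print(round(low, 2), round(high, 2))  # 46.08 53.92
```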

*Now the 95% is actually called the capture rate; that could be 95%, 99%, whatever.*1028

*What would the confidence interval be for 100%?*1042

*It would go from -infinity to infinity because that is how far the normal distribution goes.*1047

*The capture rate is the proportion of random samples for which this interval captures mu.*1053

*Let us imagine taking a whole bunch of random samples; it is going to be that 95% of the*1080

*time the intervals from those random samples contain mu.*1091

*They somehow overlap with mu.*1097

*That is what we mean by 95% capture rate.*1099
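One way to see the capture rate is to simulate it. The sketch below assumes a normal population with a made-up mu of 100 and sigma of 15, draws many samples of size 25, and counts how often the known-sigma interval contains mu. The function name, the parameters, and the seed are all assumptions for illustration, and the exact rate printed depends on the seed.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_capture_rate(mu, sigma, n, trials, capture_rate=0.95, seed=0):
    """Fraction of simulated samples whose z interval contains mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - capture_rate) / 2)
    margin = z * sigma / sqrt(n)  # same width every time, since sigma is known
    captured = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin <= mu <= x_bar + margin:
            captured += 1
    return captured / trials


rate = simulate_capture_rate(mu=100.0, sigma=15.0, n=25, trials=4000)
print(rate)  # should land close to 0.95
```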

*That is when you know sigma, but now we do not know sigma.*1103

*We are in trouble: we do not know mu.*1113

*We do not know sigma either.*1115

*Still our goal remains the same, we try to figure out mu from x bar.*1116

*But now we are a little hobbled.*1128

*I do not have a tool that I used to have.*1132

*The beginning part of the story stays the same.*1135

*We have no idea about the population, and from there we want to find the SDOM because*1139

*we are going to figure out how good our sample is.*1146

*We know the shape of our SDOM as long as our n is sufficiently big.*1151

*Can we figure out sigma sub x bar anymore?*1157

*No, we cannot, because we do not have sigma, so how can we figure out sigma sub x bar?*1161

*We cannot figure out that standard error.*1170

*Here is where another idea comes in.*1171

*There is another way we can estimate the standard error of the sampling distribution that is going to be s sub x bar.*1175

*Because we are going to use the sample standard deviation s instead of sigma.*1186

*Remember, s is more variable and not quite right, and because of that we already corrected it a little bit by using n - 1 instead of n.*1200

*Here we are going to divide that by √n.*1214

*If you double click on s you would see the square root of the sum of squares ÷ (n - 1).*1218

*You would see this inside of that.*1231
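To make the "double click" concrete, here is a small sketch that computes s from the sum of squares with n - 1 and then divides by √n to get the estimated standard error. The five data values are invented for the example.

```python
from math import sqrt
from statistics import stdev

# Hypothetical sample, chosen so the arithmetic is easy to follow.
sample = [4.0, 7.0, 6.0, 5.0, 8.0]
n = len(sample)
x_bar = sum(sample) / n  # 6.0

# Sum of squared deviations from the sample mean: 4 + 1 + 0 + 1 + 4 = 10.
sum_of_squares = sum((x - x_bar) ** 2 for x in sample)

# Sample standard deviation uses n - 1, not n.
s = sqrt(sum_of_squares / (n - 1))

# Estimated standard error of the mean: s sub x bar = s / sqrt(n).
estimated_standard_error = s / sqrt(n)

# The standard library's stdev uses the same n - 1 definition.
assert abs(s - stdev(sample)) < 1e-9

print(round(s, 4), round(estimated_standard_error, 4))  # 1.5811 0.7071
```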

*We already tried to correct it a little bit, but s is still variable.*1234

*It is not quite as good as having sigma.*1242

*And there can be other problems that we run into.*1245

*This is a pretty good estimate, but you always have*1249

*to keep in mind that our standard error is not as good as it used to be.*1254

*We have to account for that.*1262

*But the steps remain the same.*1265

*First assume x bar for mu sub x bar.*1267

*Two, find z for your capture rate.*1275

*If your capture rate for example 95% then you would find the z scores.*1287

*It is helpful to memorize that for this capture rate the z scores are going to be + or -1.96.*1297

*It is going to come up a lot.*1305

*Find the z scores for your capture rate.*1306

*Here we run into a problem.*1310

*I wish we could use z scores, but here is an issue: we actually cannot, because s is too variable for us to assume perfect normality.*1314

*And because of that we cannot use the z and instead we have to use the t which is very similar to z.*1330

*Find the t score for your capture rate.*1348

*Instead of having raw score and z score we are going to find t score.*1352

*For now you just need to know that you can find your t score in the back of the book, but in*1366

*the next lesson we are going to go over why you use t and why you cannot use z.*1372

*That is a big story.*1377

*You are going to find t.*1380

*Once you find the t for your capture rate, and that will also be + or -, t is going to be very similar to the z score.*1383

*We are going to use this formula.*1390

*You are going to use a very similar idea to the z score confidence interval, where you want to know x bar + or -.*1396

*A t score is also going to tell you how many standard errors away you are.*1407

*t × standard error.*1411

*But remember, you use t when you estimate the standard error from the sample.*1417

*If we unpack this, this is what it can look like x bar + or - t × this is that estimated standard error s/√n.*1426

*It is still the same idea.*1443

*It is how many jumps away, figuring that out and then multiplying that to the length of the jump*1446

*and adding that to x bar for the high-value and then subtracting that from the x bar for the low value.*1451
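Written out side by side, the two interval formulas described so far look like this. The notation ($z$, $t$, $s_{\bar{x}}$) is a reconstruction from the spoken description, not copied from the lecture slides.

```latex
% Sigma known: use z and the true standard error
\bar{x} \pm z \cdot \sigma_{\bar{x}}, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

% Sigma unknown: use t and the estimated standard error
\bar{x} \pm t \cdot s_{\bar{x}}, \qquad s_{\bar{x}} = \frac{s}{\sqrt{n}}, \qquad s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n - 1}}
```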

*In order to find t here is what you need to know for now.*1458

*You need to know whether it is a 1 or 2 tailed distribution.*1465

*If your confidence interval is two-tailed then remember these are .025*1470

*because you would split the remaining 5% on both sides.*1478

*But sometimes t tables only give you one side.*1482

*They might give you a one-sided 5% or a one-sided 2.5%.*1487

*You have to just keep in mind whether it is one tailed or two tailed and also the t distributions are a whole bunch of different distributions.*1493

*They are a whole bunch of different tables basically.*1502

*You also have to know the degrees of freedom.*1508

*For now you could remember degrees of freedom as n -1.*1514

*There are reasons for all of these things why we use t, why we use degrees of freedom all that stuff.*1521

*That will be covered in the next lesson.*1528

*For now, here is what you need to know.*1529

*You need to know whether it is one tailed or two tailed.*1532

*You also need to know degrees of freedom.*1534

*Once you have that you could actually look it up in t table usually found in the back of your book.*1536

*It might also be called the Student's t distribution because - invented it, but he was actually contracted to work for Guinness.*1542

*That is why he could not publish it under his actual name.*1553

*He published it under the pseudonym Student, and that is why it is called Student's t.*1556

*You can look up your degrees of freedom and then look for the area that you need and then go down and find the t score.*1560

*Very similar to z score.*1573

*Let us go on to some examples.*1574

*Example 1, consider two extreme situations n=10 and n=1,000.*1582

*If you use s in the formula for CI given sigma, here is the actual formula for when you have sigma.*1591

*We use 1.96 because we use the z score.*1609

*Which of these situations would you expect to give a capture rate closer to 95%?*1614

*Here is what this question is really asking.*1621

*When you know sigma, the 95% confidence interval is x bar + or - 1.96, that is my z, × sigma / √n.*1624

*What it is asking you is what if you substituted in s?*1649

*Here we do not know sigma, but we are going to just take this formula and use the z value × s/√n.*1656

*In order to answer this question you really only need to keep in mind one thing, when is s more like sigma.*1676

*S is more like sigma when n is very large.*1687

*This situation would give you a very close capture rate of 95%.*1708

*This would be very, very similar.*1721

*However, when n is 10 you have more uncertainty, and because of that the t distribution is not as tight.*1724

*It is actually more spread out, and because of that, when n=10 you do not capture 95% just by going about 2 standard errors out each way.*1733

*That would not capture 95% of those samples.*1748

*In fact you have to go out further to capture 95%.*1753

*The n=1,000 situation is going to be much closer to a 95% capture rate.*1758

*The n=10 situation is going to give you a smaller capture rate.*1763

*That is because your s is going to be more variable and because of that your t distribution*1766

*is going to be more dispersed, because more variable means sort of wider.*1778
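The answer to Example 1 can be checked with a quick simulation. The sketch below assumes a standard normal population (the example itself does not specify one), plugs s into the z formula with 1.96, and compares the capture rate for n=10 and n=1,000. The function name, trial count, and seed are illustrative, and the exact rates vary with the seed.

```python
import random
from math import sqrt


def capture_rate_using_s(n, trials, mu=0.0, sigma=1.0, z=1.96, seed=1):
    """Capture rate of x bar +/- z * s / sqrt(n), with s estimated per sample."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Sample standard deviation with the n - 1 correction.
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        margin = z * s / sqrt(n)
        if x_bar - margin <= mu <= x_bar + margin:
            captured += 1
    return captured / trials


rate_small = capture_rate_using_s(n=10, trials=2000)
rate_large = capture_rate_using_s(n=1000, trials=2000)
print(rate_small, rate_large)  # the n=10 rate should fall short of the n=1000 rate
```

With the small sample, the interval built from 1.96 is too narrow on average, which is the gap the t score is meant to close.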

*Example 2, a 95% CI for a population mean is calculated for a random sample of weights and the resulting CI is from 42 to 48 pounds.*1785

*For each statement indicate whether it is a true or false interpretation of the CI.*1798

*This question is asking you, do you understand what the confidence interval means?*1807

*Do you understand what it is for?*1811

*Let us see, 95% of the weights in the population are between 42 and 48.*1813

*Does the confidence interval tell us about the actual population numbers?*1821

*No, it only tells us about the population mean.*1830

*This is actually not true.*1833

*We do not know anything about the actual numbers of the population.*1836

*We do not know whether it is skewed, whether it is uniform distribution.*1840

*We do not know any of those things.*1847

*The 95% thing would only be reasonable if the population was normal and its mu was exactly equal to x bar.*1848

*That would be the case.*1862

*That is not true.*1864

*What about number 2?*1866

*95% of weights in the sample are between 42 and 48, does the CI tell us anything about this sample?*1868

*No, we are using the sample to estimate the population mean.*1878

*We are using the SDOM.*1882

*We do not know anything about the sample itself.*1884

*That is also not true.*1888

*What about number 3?*1890

*The probability that the interval includes the population mean is 95%.*1893

*This is actually true.*1899

*There is only a 5% chance that this interval does not contain the population mean.*1902

*What about number 4?*1916

*The sample mean might not be in the confidence interval.*1919

*That does not make sense if you look at the picture, because we use the sample mean in order to construct the confidence interval.*1924

*Of course it is in the confidence interval, so this statement is just ridiculous.*1932

*Example 3, a random sample of 22 men had a mean body temperature of 98.1°, with a standard deviation of .73.*1936

*Construct a 95% confidence interval for the mean of the population that the sample was drawn from.*1950

*Interpret the CI; is 98.6° included in it?*1956

*This is the average human body temperature.*1963

*We have body temperatures in the world and we do not know what that population looks like.*1965

*We are asking, can we construct a 95% confidence interval such that whatever*1975

*the population mean is there is a 95% chance that we have covered it.*1989

*We start by assuming that the mean of the sample x bar is the mean of the sampling distribution of the mean.*1994

*We have done step one.*2004

*Step two is we have to construct the CI, and so here they give us x bar, but do we have sigma?*2008

*No.*2023

*We know that we cannot use the z score.*2025

*We have to use the t score.*2029

*Let us find the t for this.*2031

*There is a .025 chance that we would not find it on this side, and a .025 chance that we would not find it on that side.*2033

*What are the t scores?*2043

*This is the raw score or the temperature.*2046

*What is the t score for .025 when the degrees of freedom are n - 1? There are 22 men, so 22 - 1 = 21 degrees of freedom.*2049

*If you look in your book at the Student's t distribution, I am going to go down to where df=21.*2065

*I am going to go across to where it says .025.*2074

*My table actually gives me this tail area, so I am going to look at .025 on one side.*2080

*And it says 2.08 is my t score.*2086

*That makes sense.*2093

*That is around 1.96.*2095

*You will see that as degrees of freedom get greater and greater this value becomes more and more close to 1.96.*2098

*On this side we know that it is symmetrical so I know it is -2.08.*2108

*From here I can construct my CI.*2114

*The CI is going to be the x bar + or – the t value × my standard error.*2118

*My estimated standard error here is s sub x bar because we do not have sigma.*2129

*That is going to be s ÷ √n.*2137

*Let us put in numbers here, so that is 98.1 that is our sample mean ± t value 2.08 × s .73 ÷ √22.*2141

*I am just going to calculate this on a calculator so that is going to be 98.1 and I will do the + side first. +2.08.*2167

*Excel does order of operations.*2182

*It needs to do the multiplication before the addition, and that is × .73 ÷ √22.*2185

*So the high end of my confidence interval is 98.4 and the low end is going to be 97.8.*2195

*98.4 and 97.8 are my CI.*2217
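For anyone working without Excel, the same arithmetic can be sketched in a few lines of Python. The t score of 2.08 is taken from the table lookup described above (df=21, .025 in each tail) rather than computed, and the variable names are illustrative.

```python
from math import sqrt

x_bar = 98.1       # sample mean body temperature
s = 0.73           # sample standard deviation
n = 22             # sample size, so df = n - 1 = 21
t_critical = 2.08  # from a t table: df = 21, .025 in each tail

# Estimated standard error, s sub x bar = s / sqrt(n).
estimated_standard_error = s / sqrt(n)
margin = t_critical * estimated_standard_error

low = x_bar - margin
high = x_bar + margin
print(round(low, 1), round(high, 1))  # 97.8 98.4

# Is the commonly cited 98.6 inside the interval?
contains_98_6 = low <= 98.6 <= high
print(contains_98_6)  # False
```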

*When we interpret the confidence interval we want to say something like*2229

*there is a 95% chance that the mean of the population lies between these two values.*2239

*Or another way we could say it is that if we draw samples at random, 95% of those samples will produce intervals that include the population mean.*2250

*95% of the intervals constructed this way will include the population mean.*2264

*Let us think about this confidence interval, is it reasonable?*2271

*Is 98.6° included? That is supposed to be the mean for everybody.*2280

*We see that it is not actually.*2286

*Maybe this sample is odd, because our confidence interval does not actually include the mean*2288

*that we secretly know for the body temperature of people.*2297

*That is when confidence intervals are helpful.*2307

*Here is example 4, in a random sample of 1000 community college students, their mean score on a quantitative literacy test was 310.*2310

*The standard deviation on this test for all the community college students who have taken it is 360.*2324

*Construct a 95% confidence interval for the mean of all community college students who have ever taken this test.*2331

*Here is our random sample and their mean or x bar is 310 but the standard deviation*2338

*of all the students who have taken this test that is the sigma is 360.*2351

*Construct a 95% confidence interval.*2358

*Well, first, we do not know the population, but we are given the population standard deviation.*2361

*And from that, let us construct the SDOM.*2374

*Well given that this n is quite large let us assume normality.*2377

*Here we could find out the standard error by putting 360 ÷ √ 1000.*2382

*Now going to the steps of our confidence interval, first we assume that x bar is the mean of our sampling distribution of the mean.*2395

*Here we could use the z instead of t because we have sigma and because of that we know that this is normal.*2412

*That is going to be +1.96 and -1.96 in order to construct a 95% confidence interval.*2425

*Our CI is going to look something like this x bar + or – z × standard error.*2436

*If you sort of double click on standard error what you will find is sigma / √n.*2446

*Let us put in numbers here.*2464

*310 is our x bar.*2467

*Our z score is 1.96.*2471

*Our sigma is 360.*2475

*Our n is 1,000.*2479

*Let us put these in our calculators.*2483

*I will do the high end first 310 + 1.96 × 360 ÷√1,000.*2487

*Order of operations says it does not matter which order you multiply or divide in.*2508

*That gives 332.3 as the high end.*2516

*The low scoring end, the lower bound of my 95% CI is 287.7.*2524

*That is going to be 287.7 as well as 332.3.*2537
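The Example 4 numbers can be checked the same way. This is only a sketch of the calculator steps, with 1.96 used as the z score for a 95% interval as in the lecture; the variable names are illustrative.

```python
from math import sqrt

x_bar = 310.0  # sample mean on the quantitative literacy test
sigma = 360.0  # population standard deviation (given)
n = 1000       # sample size
z = 1.96       # z score for a two-tailed 95% interval

# Standard error of the mean, sigma sub x bar = sigma / sqrt(n).
standard_error = sigma / sqrt(n)
margin = z * standard_error

low = x_bar - margin
high = x_bar + margin
print(round(low, 1), round(high, 1))  # 287.7 332.3
```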

*With 95% confidence, the mean of the population should fall within this interval.*2547

*That is the end for our confidence intervals.*2558

*That is part one of confidence intervals.*2561

*Hope you join me for t distributions to find out why we use t instead of z sometimes.*2566

*Thank you for using www.educator.com.*2571

0 answers

Post by Michelle Greene on October 15, 2013

Again, you are using Excel when we cannot use Excel on exams. Please show us with at scientific calculator... excel is not helpful but the calculator is very helpful.

0 answers

Post by Brijesh Bolar on August 20, 2012

What book are you referring to for these sessions.. Or what book do we refer.