For more information, please see full course syllabus of Statistics

### Sample Spaces

- Intro 0:00
- Roadmap 0:07
- Why is Probability Involved in Statistics 0:48
  - Probability
  - Can People Tell the Difference between Cheap and Gourmet Coffee?
- Taste Test with Coffee Drinkers 3:37
  - If No One can Actually Taste the Difference
  - If Everyone can Actually Taste the Difference
- Creating a Probability Model 7:09
- D'Alembert vs. Necker 9:41
- Problem with D'Alembert's Model 13:29
- Covering Entire Sample Space 15:08
  - Fundamental Principle of Counting
- Where Do Probabilities Come From? 22:54
  - Observed Data, Symmetry, and Subjective Estimates
- Checking whether Model Matches Real World 24:27
  - Law of Large Numbers
- Example 1: Law of Large Numbers 27:46
- Example 2: Possible Outcomes 30:43
- Example 3: Brands of Coffee and Taste 33:25
- Example 4: How Many Different Treatments are there? 35:33

### General Statistics Online Course

### Transcription: Sample Spaces

*Hi and welcome to www.educator.com.*0000

*We are going to be talking about sample spaces and start to talk about probability and statistics today.*0001

*First question is why is probability involved in statistics?*0007

*We will talk a little bit about some probability fundamentals and then talk about what is a sample space.*0013

*From there we are going to talk about how to make sure we cover the entire sample space.*0020

*We do not want to have any gaps.*0025

*We are going to introduce the fundamental principle of counting.*0026

*We are going to talk about where probabilities come from.*0030

*How can we check whether our model matches the real world?*0035

*We might have a probability model but how do we know whether that model is any good?*0038

*That is going to involve the law of large numbers.*0044

*Okay. First, why the heck are we talking about probability in statistics?*0047

*Statistics is all about samples from the population and we never really know what that population is like.*0053

*We have an idea of what that population is like and we call it a model.*0061

*We call it the model of the population.*0065

*We want to know how likely is a particular sample, the empirical data that we have, stuff we actually collect,*0068

*how likely is this given a particular model of the world, a theoretical model or a theoretical population.*0075

*Whenever you compute probability you are looking at some subset over the total number of whatever you have.*0085

*In this case, in probability and statistics, we are looking at how likely a particular experimental outcome is over all the different kinds of outcomes that we could potentially have had.*0096

*We need to have a model of the world to figure out this total.*0110

*The total is generated by the model.*0116

*And that will give us the probability or the likelihood of our experimental outcome.*0120

*Let us start off with a little example.*0128

*Just a little case that might be helpful for us to wrap our minds around.*0131

*I like to drink cheap coffee because I just add sugar and cream, but I want to know: can people tell the difference between el cheapo and expensive coffee?*0136

*Can we tell between cheap and gourmet?*0151

*Like all of these expensive brands where they have coffee where apparently monkeys eat the coffee bean and poop it out.*0154

*Something in the stomach makes the coffee bean better.*0165

*It is like the most expensive coffee you can buy.*0169

*It sounds gross but it is supposed to be awesome.*0172

*Is it really awesome?*0176

*Can people tell?*0179

*I wonder the same thing about expensive wine as well.*0180

*Can people tell that it is expensive or not?*0184

*One model of the world you might have is that these people are just guessing.*0186

*They cannot tell the difference.*0193

*They are only guessing between gourmet and cheap coffee.*0195

*When you go to Starbucks and pay more, it is nice because they have music and stuff, and that coffee seems to taste better than McDonald’s coffee.*0198

*Maybe when people actually taste the coffee they are just guessing.*0211

*Let us say we want to do a taste test with 100 coffee drinkers and we just give them little cups that look like the same thing*0216

*and one has expensive coffee in it and the other has cheap coffee in it.*0229

*Let us say we have an n of 100 people.*0235

*100 taste testers.*0238

*Let us say no one can actually tell the difference, does that mean everybody will get it wrong?*0239

*Will we have 0 out of 100 people correct? Probably not.*0246

*Even if they are just guessing, it might be reasonable to expect something like about 50% of people getting it correct.*0251

*We might say 50 out of 100 correct.*0262

*That would be pretty reasonable; it is still reasonable to assume the guessing model.*0271

*Let us say 90 out of 100 people got it correct.*0282

*That is probably hard to do if everyone is just guessing.*0292

*If it is 90 out of 100 correct then it is difficult to keep believing the model; the model may be wrong.*0297

*In that way we can look at the data that we have and see whether our model is likely or not.*0322

*Let us say everyone can tell the difference, does it have to be 90 out of 100?*0335

*Could it be 89 out of 100?*0345

*Could it be 88 out of 100?*0347

*Could it be 70 out of 100?*0356

*Could it be 60 out of 100?*0358

*Where do we draw the line for saying people can tell the difference?*0360

*If we say that earlier 50 out of 100 got it correct, do we say that this model is likely?*0365

*Here we might say the model is more likely than in this scenario.*0381

*If everyone can tell the difference this might be more reasonable data to see.*0395

*We do not expect that.*0405

*In that way it is important to know the probabilities of these different outcomes.*0410

*How likely is 50 out of 100?*0416

*How likely is 89 out of 100?*0421

*Is one more likely than the other given that particular model of how the world works?*0423
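
The questions above can be made concrete with a small sketch. It assumes something the lecture has not stated at this point: that the 100 tasters guess independently, each with a 50% chance of being correct, so the number of correct answers follows a binomial distribution. The function names `binomial_pmf` and `prob_at_least` are illustrative, not from the lecture.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(j, n, p) for j in range(k, n + 1))


n_tasters = 100
p_guess = 0.5  # assumption: a pure guesser is correct half of the time

p_50 = binomial_pmf(50, n_tasters, p_guess)
p_89 = binomial_pmf(89, n_tasters, p_guess)
p_90_plus = prob_at_least(90, n_tasters, p_guess)

print(f"P(exactly 50 correct) = {p_50:.4f}")
print(f"P(exactly 89 correct) = {p_89:.3e}")
print(f"P(90 or more correct) = {p_90_plus:.3e}")
```

Under the guessing model, a result near 50 out of 100 is far more probable than a result near 90 out of 100, which is why 90 correct would cast doubt on that model.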

*In order to create a probability model we want to talk about a couple of different things to help us get better.*0428

*To help us get on the same page about probability.*0439

*As we have talked about before, when we talk about the probability of an outcome we usually write it as P(z).*0444

*This might be the probability of being correct.*0453

*We might start with one taster to make it easy.*0457

*The probability that one taster is correct, given that they might be only guessing, is let us say 50%.*0463

*There is only 50% chance that they are correct.*0472

*The probability of being incorrect might be the other half.*0476

*Correct and incorrect are what we think of as distinct events.*0485

*You cannot be correct and incorrect at the same time.*0491

*It can only be one or the other.*0494

*We call that mutual exclusivity.*0496

*Because we have these disjoint events, these probabilities should add up to 100%.*0498

*It can only be correct or incorrect.*0506

*Those are the only two zones of this space.*0507

*We have covered that whole space and it adds up to total probability of 1.*0511

*That is how probabilities always work.*0518

*If you have covered the entire sample space, the probability of the entire sample space is 1.*0521

*How about 2 tasters?*0528

*What is the probability that both of them guess correctly?*0532

*It is helpful to think about what is the entire sample space?*0538

*What are the different scenarios we can have?*0545

*Taster 1 is correct and taster 2 is correct; that is the outcome that we are interested in.*0548

*There is a chance that 1 might get it correct but 2 does not.*0556

*There is also a chance that 2 gets it correct and 1 does not.*0562

*These are all different outcomes.*0566

*It would be helpful if we could figure out the entire sample space and assign probabilities to each zone of that sample space.*0569

*How do we do that is the question.*0576

*That is actually a very old question.*0581

*In the 1700s there were two mathematicians, D'Alembert and Necker; one guy is a French guy.*0584

*They had this argument about what is the probability of having two heads in a row?*0593

*That is similar to this idea: what is the probability of getting 2 correct guesses from our tasters?*0607

*It is the same problem.*0621

*It is what we call isomorphic; they have the same structure.*0625

*D'Alembert was saying there are 3 different situations that you could have.*0630

*One situation is that the first flip is tails; that is not what we are interested in.*0637

*We are interested in 2 heads.*0647

*There is another possibility that the first flip and second flip is heads.*0648

*This is what we are interested in.*0654

*There is another possibility that the first flip is heads and the second flip is tails.*0657

*That is not what we are looking for.*0664

*In this model we have these 3 situations.*0668

*The first has the number of heads as 0.*0674

*The second has the number of heads as 2.*0677

*The third has the number of heads as 1.*0683

*Each situation has a probability of 1/3, and if you add them all up you will get 1.*0687

*He was pleased with himself, but Necker came along and said, "I think you left out something."*0696

*The situation where the first flip is tails should be symmetrical; it should be as likely as all the situations where the first flip is heads.*0706

*I do not know why you have the first flip being heads as more likely.*0717

*If you add these 2 together, 2 heads and 1 head, why is that more likely than the first flip being tails?*0723

*That does not make sense.*0731

*Necker points out that he thinks the sample space is like this.*0732

*First flip tails and second flip heads.*0736

*First flip tails and second flip is also tails.*0741

*First flip heads and second flip heads.*0744

*That is what we are interested in.*0747

*First flip heads and second flip tails.*0749

*If you look at this, the number of heads being 0, that is this one, has a ¼ probability.*0753

*1 out of 4.*0767

*Having 2 heads, that is what we are interested in, that is this one and that also has a ¼ probability.*0768

*There are 2 different ways where you could have 1 head and the other one being tails.*0778

*Here are 2 of them and that would be ½.*0788

*If you add all of these up you will get a total of 1.*0792

*This is the right probability model not this.*0795
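
Necker's list can be reproduced with a short enumeration. This is a minimal sketch, assuming the four ordered outcomes are equally likely by symmetry; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordered result of two flips; assumed equally likely by symmetry.
outcomes = list(product("HT", repeat=2))

# Group the ordered outcomes by how many heads each one contains.
heads_counts = Counter(flips.count("H") for flips in outcomes)
necker = {k: Fraction(c, len(outcomes)) for k, c in heads_counts.items()}

for k in sorted(necker):
    print(f"P({k} heads) = {necker[k]}")
print("total =", sum(necker.values()))
```

Listing ordered outcomes is what separates the two ways of getting exactly one head (heads then tails, tails then heads), which gives that event twice the probability of either of the others.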

*I hope you can see there is a problem with D'Alembert's model.*0803

*One issue with this model is that he did not make a complete list of all the different outcomes that he could have.*0816

*All possible outcomes that is what we mean by the entire sample space.*0824

*If you have all the possible outcomes in all these different zones.*0830

*Then we would cover the entire sample space and that is equal to 1.*0835

*This guy is missing some of the possible outcomes.*0839

*The other one got it right because he listed all of the possible outcomes that could have happened.*0845

*The sample space is the complete list of all outcomes.*0850

*Remember what disjoint means; another way of saying it is mutually exclusive, which means that no two disjoint events can happen at the same time.*0858

*You can only have one or the other.*0873

*All of the outcomes in the sample space must have a total probability equal to 1.*0875

*Each of these outcomes must have a probability between 0 and 1.*0881

*If some event, like event A, has a probability of 0, this means that there is no chance that this is happening.*0890

*If we have another event that has a probability of 1 that means it is going to happen 100%.*0898
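
The two rules just stated (each probability between 0 and 1, total equal to 1) can be checked mechanically. The sketch below is illustrative only; `is_valid_model` and the dictionary labels are hypothetical names, and the `broken` model is a made-up counterexample.

```python
from fractions import Fraction


def is_valid_model(model: dict[str, Fraction]) -> bool:
    """True if every probability lies in [0, 1] and they total exactly 1."""
    in_range = all(0 <= p <= 1 for p in model.values())
    return in_range and sum(model.values()) == 1


dalembert = {
    "0 heads": Fraction(1, 3),
    "1 head": Fraction(1, 3),
    "2 heads": Fraction(1, 3),
}
necker = {
    "TT": Fraction(1, 4),
    "TH": Fraction(1, 4),
    "HT": Fraction(1, 4),
    "HH": Fraction(1, 4),
}
# Made-up example that breaks the total-probability rule.
broken = {"correct": Fraction(3, 5), "incorrect": Fraction(3, 5)}

print(is_valid_model(dalembert), is_valid_model(necker), is_valid_model(broken))
```

Both historical models pass this check, so the rules alone cannot say which model is right. They only rule out assignments that could not be probabilities at all; deciding between valid models takes the kind of comparison with data discussed later in the lecture.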

*How do we avoid D'Alembert's problem?*0907

*How do we become like Necker?*0914

*How do we make sure that we cover the entire sample space?*0915

*This is where we are going to involve what we call the fundamental principle of counting.*0919

*Before I tell you what that is, I’m just going to show you using what we call an event tree.*0923

*Let us think about taster 1, he could be correct or incorrect.*0930

*We think that we have a 50 – 50 probability.*0936

*This one could be correct or incorrect.*0941

*Based on that, if taster 1 is correct, taster 2 could be correct or incorrect.*0947

*But when taster 1 is incorrect, those same events can happen.*0955

*Taster 2 could be correct or incorrect.*0962

*There are 4 different outcomes: both correct, taster 1 is correct and 2 is incorrect, 1 is incorrect and 2 is correct, or both incorrect.*0966

*This is our entire sample space.*0981

*Presumably, in our model where everyone is guessing, each of these has a probability that is equal to the others, ¼.*0983

*What is the probability that one person gets it right but the other one gets it wrong, where we do not care which?*0997

*That would be these 2 added together, ½.*1002

*Just like the heads and tails case.*1006

*That is just 2 people. When we have 2 people and 2 different choices, you can think of each taster as a slot.*1009

*A slot where something could happen.*1033

*Here 2 things could potentially happen.*1036

*Here another 2 things could potentially happen.*1039

*If you multiply them together, you will get 4 outcomes.*1042

*This might remind you of combinations.*1047

*It is the same principle because we are looking at how many different kinds of outcomes we can have.*1053

*That is just for 2 tasters and it already gets a little bit complicated.*1062

*What about if we have more tasters, for instance 3 tasters?*1068

*Taster 1 could be correct or incorrect, 2 can be correct or incorrect, 3 can be correct or incorrect.*1072

*If we sum all these up we have 1 branch here, another branch here, another branch here, another branch here.*1099

*We have 8 different outcomes.*1114

*We have C, C, C, we have C, I, I.*1117

*We know that we have 8 different outcomes. The way that I do it is I write my first taster: half of those have to be correct and half incorrect.*1125

*Half of 8 is 4, so that is going to be 4 where taster 1 is correct and 4 where taster 1 is incorrect.*1153

*Out of these 4, in half of them taster 2 has to be correct, and in half taster 2 has to be incorrect.*1162

*That is the same case for this guy: in half of them taster 2 has to be correct, in half of them taster 2 has to be incorrect.*1171

*Taster 3, we know that for each of these cases, because they are identical here, taster 3 has to be correct half of the time and incorrect half of the time.*1181

*This is systematic, and we make sure that each line is different from the others.*1202

*We have CCC, CCI, CIC, CII.*1209

*The way you could look at this is you have taster 1, 2, 3, each have 2 possible events and 8 different outcomes.*1213
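
The systematic listing described above (first taster split in halves, then the second, then the third) is the order that a Cartesian product generates. A minimal sketch, where `sample_space` is a hypothetical helper name:

```python
from itertools import product


def sample_space(n_tasters: int) -> list[str]:
    """All correct (C) / incorrect (I) patterns, in event-tree order."""
    return ["".join(pattern) for pattern in product("CI", repeat=n_tasters)]


three = sample_space(3)
for outcome in three:
    print(outcome)
print(len(three), "outcomes")
```

The first four lines printed are CCC, CCI, CIC, CII, matching the order read off in the lecture, and the remaining four are the same patterns with taster 1 incorrect.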

*For 4 tasters it would be complicated to draw a tree.*1227

*Instead I am going to just find how many outcomes we have.*1231

*Here I have 4 tasters, each has 2 possible events being correct or incorrect.*1238

*That is 16 possible outcomes.*1248

*I can use this method where I might have half of 16 is 8, CCCC.*1255

*I do not have space for this.*1266

*Maybe I will try to draw it a little bit smaller.*1269

*CCCC, here is 8 I.*1276

*I will draw the next one with blue, the other half of these taster 2 has to be correct and half of them taster 2 has to be incorrect.*1286

*That is going to be 4.*1305

*Taster 3 half of the time has to be correct and half of the time has to be incorrect.*1314

*Finally, I will go back to red and we just alternate.*1333

*I remember having to do this for logic classes.*1341

*Hopefully your instructors would not ask you to do more than 4.*1348

*It can be done, you just have to keep track that half of them have to be correct and half incorrect.*1356

*This is our sample space of 16 outcomes and each of them has a probability of 1 out of 16.*1365
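
With 16 equally likely outcomes, the probability of any event is the number of outcomes in it divided by 16. The sketch below, under the lecture's guessing model, groups the outcomes by how many tasters were correct; the grouping step is an illustration that goes slightly beyond what the lecture computes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = ["".join(p) for p in product("CI", repeat=4)]
p_each = Fraction(1, len(outcomes))  # 1/16 under the guessing model

# Count how many outcomes have 0, 1, 2, 3 or 4 correct tasters.
correct_counts = Counter(outcome.count("C") for outcome in outcomes)
dist = {k: correct_counts[k] * p_each for k in sorted(correct_counts)}

for k, p in dist.items():
    print(f"P({k} correct) = {p}")
```

The five probabilities add up to 1 because the 16 outcomes are disjoint and cover the whole sample space.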

*Where do probabilities come from?*1373

*One is that it comes from observed data, we look at actual data in the world in order to figure out the probability.*1379

*In fact you might think that it is a 50-50 chance of having a boy or a girl but actually it is a 51% chance of having a boy versus a girl.*1386

*Those probabilities might be affected by other things, like in other countries.*1398

*The second thing is symmetry.*1405

*Heads and tails are a good example of symmetry.*1407

*There is no reason to think that flipping heads is more likely than flipping tails.*1414

*Whenever you have somebody who is guessing, like guessing on a multiple choice test, that involves symmetry.*1419

*What we mean by symmetry is that the probabilities are the same for each option, given that there is no reason one is better than the other.*1426

*The final thing is subjective estimates.*1436

*This one is like how likely are you to get a good grade in this class?*1439

*No one can actually tell you for sure, you just have a feeling maybe this percent or this percent.*1449

*Those are subjective not based on hard data.*1459

*Now that we know where probabilities come from, the question that arises is: if we have a probability model of the world, how do we know that our model or theory of the world matches the real world?*1467

*It would be useless to have a model that is inaccurate, that does not match the real world.*1487

*Here is where we involve the law of large numbers.*1494

*We assume a model is a reasonable fit to the real situation when we compare the probabilities derived from the model with the probabilities observed from the data.*1499

*If we have a lot of observed data and that matches with our model, then we would assume it is a reasonable fit.*1515

*That is what we mean by the law of large numbers.*1534

*The more data we have the more we trust in that match.*1537

*If we have a match but we have a really small data set then we would not trust it.*1545

*The larger and larger our data set becomes then if it matches it is pretty good.*1552

*D'Alembert's model and Necker's model predicted different probabilities for the number of heads.*1558

*In D'Alembert's model, flipping 0 heads, 2 heads, and 1 head are 1/3, 1/3, and 1/3, and they all add up to 100%, or a probability of 1.*1569

*In Necker's model, the probability of just one head is ½.*1581

*If we flipped 3,000 coins, or did it in a computer simulation, we might get data that looks something like this.*1592

*782 out of 3,000 came out with 0 heads.*1604

*I should say pairs of coin flips.*1612

*725 came out with 2 heads in a row.*1620

*About 1,500 came out with 1 head and the other being tails.*1626

*When you look at the probabilities, you just take this number and divide by the total.*1635

*You see that when we get these particular values, do these match the Necker model or do these match the other model?*1640

*It is easy to see that these actually match the Necker model.*1650

*Using the law of large numbers we could say the Necker model fits the real world better than the other model.*1655
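
The comparison can be sketched as a small simulation. This is not the lecturer's simulation; it is a stand-in that flips 3,000 pairs of simulated fair coins with a fixed seed, and it also scores the counts quoted in the lecture. The helper names and the gap measure (sum of absolute differences) are assumptions chosen for illustration.

```python
import random


def simulate_pairs(n_pairs: int, seed: int = 0) -> dict[int, float]:
    """Flip n_pairs pairs of fair coins; return proportions with 0, 1, 2 heads."""
    rng = random.Random(seed)
    counts = {0: 0, 1: 0, 2: 0}
    for _ in range(n_pairs):
        heads = rng.randint(0, 1) + rng.randint(0, 1)
        counts[heads] += 1
    return {k: c / n_pairs for k, c in counts.items()}


def total_gap(model: dict[int, float], observed: dict[int, float]) -> float:
    """Sum of absolute differences between model and observed proportions."""
    return sum(abs(model[k] - observed[k]) for k in model)


dalembert = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
necker = {0: 1 / 4, 1: 1 / 2, 2: 1 / 4}

# Counts quoted in the lecture; the 1-head count is taken as the remainder
# of 3,000 (an assumption, since the lecture only says "about 1,500").
lecture = {0: 782 / 3000, 1: (3000 - 782 - 725) / 3000, 2: 725 / 3000}
simulated = simulate_pairs(3000)

for name, data in (("lecture", lecture), ("simulated", simulated)):
    print(
        name,
        "gap to D'Alembert:", round(total_gap(dalembert, data), 3),
        "gap to Necker:", round(total_gap(necker, data), 3),
    )
```

A smaller gap means the observed proportions sit closer to what the model predicts. With thousands of pairs, the observed proportions should settle near one model's predictions and stay visibly away from the other's.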

*Let us go into some examples.*1665

*Which of these statements accurately applies the law of large numbers?*1670

*We are looking at the fit between our data and the real world.*1674

*Does it really predict or look like the real world?*1681

*An opinion pollster says all you need to do to ensure the accuracy of the poll result is to make sure you have a large sample.*1685

*That sounds reasonable because we want to make sure that our poll results, if we say who do you think will win the next election?*1695

*We want to make sure that matches the actual population of voters, if you have a real large sample that is more likely going to match the real world.*1704

*A casino operator says all I need to do to ensure the house will win most of the time is to keep a large number of people coming to my casino.*1713

*Let us think about this one.*1729

*The law of large numbers is about having a lot of data; then whatever your data says, you know that will probably match the real world.*1731

*The house winning, those are probabilities that are set by the games,*1741

*by how the games are set up.*1749

*Does having a large number of people coming in affect those probabilities?*1752

*No, you would have to change those probabilities themselves.*1757

*This one is a no.*1761

*The number of people coming in is not going to change those probabilities to help the house win more.*1763

*That is not going to change the probability.*1771

*The law of large numbers does not say that having more data will change the probabilities,*1773

*it just says that having more data will help you know what the real world probabilities are.*1780

*It just helps you understand.*1787

*It does not help you change the real world.*1789

*A manufacturer says all I need to do to keep my proportion of defective items low is to manufacture a lot of light bulbs.*1791

*This depends on understanding what a proportion is.*1801

*Proportion is percentage and that is relative.*1805

*If you have a crappy factory and 25% of the bulbs are defective, whether you have a small number of bulbs*1811

*or a large number of bulbs, 25% are still defective.*1822

*If you have a lot of bulbs it will not change the proportion.*1826

*Once again it is wrong, because the law of large numbers does not change the real world probability,*1831

*it only helps you understand it or know what it is.*1837
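
The point about the manufacturer can be illustrated with a simulated factory whose true defect rate is fixed at 25%. This is a sketch under that assumption; `observed_defect_rate` is a hypothetical helper, and the batch sizes are arbitrary.

```python
import random


def observed_defect_rate(n_items: int, true_rate: float, seed: int = 0) -> float:
    """Proportion of defective items in a simulated batch of n_items."""
    rng = random.Random(seed)
    defective = sum(rng.random() < true_rate for _ in range(n_items))
    return defective / n_items


true_rate = 0.25  # fixed by the factory, not by how much it produces

for n_items in (20, 2_000, 200_000):
    rate = observed_defect_rate(n_items, true_rate, seed=n_items)
    print(f"{n_items:>7} items: observed defect rate = {rate:.4f}")
```

Producing more items does not push the observed rate toward zero. It only brings the observed rate closer to the true 25%, which is the sense in which more data helps you know the real probability without changing it.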

*Example 2, suppose you flipped a fair coin 7 times, how many possible outcomes are there?*1842

*Thankfully it does not say list all of them, it just says how many possible outcomes.*1852

*Think of each coin flip as a slot where one of two things can happen, heads or tails.*1857

*There are 2 possibilities for each of these.*1864

*2^7, that is our answer.*1871

*Suppose you roll a die 9 times, how many possible outcomes are there?*1876

*I like to think of each roll of the die as a potential event that has 6 different possibilities.*1884

*Each has 6 and so this would be 6^9.*1898

*The other way that you will see the fundamental principle of counting is that it will usually say if you have n possibilities*1910

*and k number of events, the total number of outcomes is n^k.*1948

*Here you could say if you have n possibilities, 6 possibilities for each of k events, then it is 6^9.*1971

*Same thing here, I always forget which is which.*1983

*This is that idea.*1990

*You could see it more readily when you see each event as a slot to be filled with possibility.*1993
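
The n^k rule is easy to check numerically. In this sketch `count_outcomes` is a hypothetical helper, and the brute-force enumeration is included only to confirm the formula on the coin example.

```python
from itertools import product


def count_outcomes(possibilities_per_event: int, num_events: int) -> int:
    """Fundamental principle of counting for identical slots: n ** k."""
    return possibilities_per_event**num_events


coin_outcomes = count_outcomes(2, 7)  # 7 flips, 2 possibilities each
die_outcomes = count_outcomes(6, 9)   # 9 rolls, 6 possibilities each

# Cross-check the coin case by listing every sequence of 7 flips.
listed = sum(1 for _ in product("HT", repeat=7))

print(coin_outcomes, listed)  # 128 128
print(die_outcomes)           # 10077696
```

The base is the number of possibilities per slot and the exponent is the number of slots, which is the part that is easy to mix up.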

*Example 3, suppose 5 taste testers are comparing 3 brands of coffee.*2003

*What is the sample space of all possible outcomes?*2012

*Here maybe they each taste one coffee, then they have to pick whether it is Starbucks, McDonald’s, or Dunkin Donuts.*2016

*This question is actually a bit weird because it is a little bit vague.*2027

*Let us say that is what this question is asking.*2032

*What are the possible outcomes?*2035

*What might these people guess?*2042

*If I have 5 taste testers and each of them can have 1 out of 3 guesses, Starbucks, McDonald’s, or Dunkin Donuts.*2044

*That is 3^5.*2058

*What you want to do is make sure that all of the sample space is covered.*2068

*If 5 taste testers, you want to have the equal probability of the first one picking Starbucks.*2073

*The second and third one picking Starbucks.*2086

*It might be helpful to figure out what this actually is.*2094

*9 × 9 × 3 = 81 × 3; that is a lot of possible outcomes.*2097

*I will just leave it up like that.*2115

*That is a lot of possible outcomes but usually they would not ask you to draw that out.*2121
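
Nobody would draw this sample space by hand, but it is small enough to enumerate. The sketch follows the lecture's reading of the question (each of 5 testers names one of 3 brands); the variable names are illustrative.

```python
from itertools import product

brands = ["Starbucks", "McDonald's", "Dunkin Donuts"]
n_testers = 5

# One outcome = one guess per tester, in tester order.
outcomes = list(product(brands, repeat=n_testers))

print(len(outcomes))  # 243
for outcome in outcomes[:3]:
    print(outcome)
```

The count agrees with 3^5 = 81 × 3 = 243, and the first few outcomes printed show the same systematic pattern as the correct/incorrect trees: the last tester's guess changes fastest.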

*Example 4, a study of different treatments for anxiety randomly assigns each new patient to 1 of 2 levels of exercise and 1 of 5 different types of medication.*2130

*How many different treatments are there?*2148

*Show the sample space in a tree diagram and as a table.*2150

*First thing is how many different treatments are there?*2153

*The first slot will be levels of exercise.*2159

*They get 1 of 2 levels of exercise.*2163

*The second slot is 5 different types of medication.*2165

*I will just call these ABCDE.*2169

*There are 10 different treatments.*2173

*Let us get started.*2178

*First, we will have the exercise and then we will have the medication part of the tree.*2181

*The exercise part of the tree will be mild and moderate.*2188

*Medication will be ABCDE.*2194

*If we look at all the outcomes as a table, we could write it as mild, mild, mild, mild, mild, with A, B, C, D, E.*2207

*Same principle as before.*2230

*Each of these different treatments is equally likely, or we want them to be equally likely in our sample.*2232

*For instance we look at this treatment group, this group of people or group of experimental cases gets mild exercise they also get medication B.*2248
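
The table for Example 4 can be generated the same way. This sketch uses the lecture's labels (mild and moderate exercise, medications A through E); everything else is illustrative.

```python
from itertools import product

exercise_levels = ["mild", "moderate"]
medications = ["A", "B", "C", "D", "E"]

# Slot 1 is the exercise level, slot 2 is the medication: 2 x 5 treatments.
treatments = list(product(exercise_levels, medications))

print(f"{'exercise':<10}medication")
for exercise, medication in treatments:
    print(f"{exercise:<10}{medication}")
print(len(treatments), "treatments")
```

Here the two slots have different numbers of possibilities, so the count is the product 2 × 5 = 10 rather than a single number raised to a power.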

*That is the end of sample spaces.*2268

*Thank you for using www.educator.com.*2271

0 answers

Post by Angel Evan on March 28, 2013

Question about probability. Suppose we have a projected audience of 10,000,000 people. Through survey data, we see that 85% of respondents use email on a daily basis. Would it be statistically correct to say that there is 85% probability that the target audience uses email on a daily basis?