Lecture Comments (4)

3 answers

Last reply by: Dr. William Murray
Tue Sep 2, 2014 7:57 PM

Post by David Llewellyn on August 28, 2014

I don't follow where you get each indicator variable to be equal to r/N.
Y1 is OK but, surely, the probability of picking the gender of the second person depends on your first choice as there is no replacement. If the first choice is a woman then the probability of getting a woman on the second choice is (r-1)/(N-1) but if the first choice was a man then the probability of getting a woman is r/(N-1).
The probability of picking a woman gets even more complex on the third choice as it depends on whether you have picked 2, 1 or 0 women already being (r-2)/(N-2), (r-1)/(N-2) and r/(N-2) respectively.
This trend continues all the way up to the nth choice where depending on how many women have been picked already the probability of picking a woman is (r-n+1)/(N-n+1), ... (r-2)/(N-n+1), (r-1)/(N-n+1), r/(N-n+1).
I can't see how this simplifies to nr/N.
What am I missing?

Hypergeometric Distribution


  • Intro 0:00
  • Hypergeometric Distribution 0:11
    • Hypergeometric Distribution: Definition
    • Random Variable
  • Formula for the Hypergeometric Distribution 1:50
    • Fixed Parameters
    • Formula for the Hypergeometric Distribution
  • Key Properties of Hypergeometric 6:14
    • Mean
    • Variance
    • Standard Deviation
  • Example I: Students Committee 7:30
  • Example II: Expected Number of Women on the Committee in Example I 11:08
  • Example III: Pairs of Shoes 13:49
  • Example IV: What is the Expected Number of Left Shoes in Example III? 20:46
  • Example V: Using Indicator Variables & Linearity of Expectation 25:40

Transcription: Hypergeometric Distribution

Hi and welcome back to the probability lectures here on www.educator.com. My name is Will Murray.

Today, we are going to be discussing the glamorously named hypergeometric distribution.

Let me tell you about the situation where you would use the hypergeometric distribution.

I set it up in terms of picking a committee of women and men.

The idea is that you have a larger group, a big group of $N$ people.

There is a capital $N$ and a lowercase $n$ in the hypergeometric distribution. Make sure you do not get those mixed up.

You have $N$ people total: $r$ women and $N - r$ men.

What you are going to do is form a committee from this larger group.

Your committee is going to have $n$ people. That is the total number of men and women you are going to put on your committee.

We want to emphasize here that this is an unordered choice.

You are just going to grab a group of people, and it does not matter which order you grab them in.

You are not going to have a chair of the committee or any special positions.

You are going to have a group of people. You can think of it as a team, maybe a sports team.

It is without replacement, meaning you cannot pick the same person twice.

You grab this group of people, and then the question is: how many women did you end up with on your committee?

More specifically, what are the chances of getting exactly $y$ women on your committee?

Our random variable $Y$ represents the number of women that you end up with on your committee.

Let us go ahead and look at all the parameters (there are a lot of them) and figure out the formula.

$N$ is the total number of people that you are looking at, that is, the number of people available to be selected for your committee.

$r$ is the number of women available, and everyone else is a man, so $N - r$ is the number of men available.

And then $n$ is the number of people we are going to pick.

Let me draw a little Venn diagram of this large pool.

There are $N$ people available overall in this large pool.

$r$ of them are women, and $N - r$ of them are men.

We are going to create our committee of $n$ people, and we want to find the probability that $y$ of those people are women, which means that $n - y$ of them are men.

The probability distribution formula looks very complicated, but I am going to try to persuade you that it is actually a very easy formula to remember, if you can remember the situation that we are describing.

The probability formula is $p(y) = P(Y = y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}$.

I want to emphasize that these are all binomial coefficients, that is, combinations. You will use the factorial formula $\binom{a}{b} = \frac{a!}{b!\,(a-b)!}$ to simplify them.

That looks like a very difficult formula to remember, but it is not, and here is why.

The denominator $\binom{N}{n}$ is the total number of ways to choose your committee: there are $N$ people total, and you are choosing $n$ of them to be on your committee.

That is what you get if you disregard gender and just choose $n$ people out of the total number of people.

Now suppose you take gender into account, and you want to get exactly $y$ women on your committee.

You have a fixed number of women that you want on your committee.

Then you look at all the women in the room and choose exactly $y$ of them to be on your committee.

There are $r$ women, and you are choosing $y$ of them, so the factor $\binom{r}{y}$ counts the ways to choose $y$ people out of the $r$ women available.

Then, after you have chosen your women, you look around at all the men and choose the number of men you need.

How many men do you need?

If you want to get $y$ women, you need $n - y$ men. And how many men are available? We said there are $N - r$.

So the factor $\binom{N-r}{n-y}$ represents choosing the men for your committee.

You have a certain number of ways to pick the women and a certain number of ways to pick the men.

You multiply those together, and that gives you the total number of ways to pick a committee that has exactly $y$ women.

Then you divide that by the total number of ways to pick your committee if you do not pay any attention to gender at all.
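The counting argument above translates almost line for line into code. Here is a minimal Python sketch, assuming Python 3.8+ for `math.comb`. The function name `hypergeom_pmf` is a hypothetical helper introduced for illustration, not something from the lecture. It returns exact fractions so results can be compared without rounding.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(y: int, N: int, r: int, n: int) -> Fraction:
    """P(Y = y): chance of exactly y women on a committee of n people drawn
    without replacement from N people, r of whom are women."""
    if y < 0 or y > n:
        return Fraction(0)
    favorable = comb(r, y) * comb(N - r, n - y)  # pick y women, then n - y men
    total = comb(N, n)                           # pick any n people, ignoring gender
    return Fraction(favorable, total)

# Tiny check: 2 women and 2 men, committee of 2.
print([hypergeom_pmf(y, 4, 2, 2) for y in range(3)])
# [Fraction(1, 6), Fraction(2, 3), Fraction(1, 6)]
```

`math.comb(a, b)` already returns 0 when `b > a`. That handles asking for more women (or men) than are available, so only a negative `n - y` needs the explicit guard.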

I think that is actually a fairly easy formula to remember, even though it looks very complicated.

It is definitely one of the more complicated probability distribution formulas.

Now for the range of $y$. You could have as few as 0 women on your committee, as long as there are enough men to fill it. In general, $y$ can be no smaller than $n - (N - r)$, because the committee can hold at most $N - r$ men.

The upper end is a bit more complicated. The most women you can have on your committee is $n$, because that is the size of the committee, or $r$, because that is the number of women available.

Whichever one of those is smaller, $\min(n, r)$, is the maximum possible number of women you can have on your committee.
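The range of possible values can be checked directly. The following sketch assumes hypothetical helpers `support` and `hypergeom_pmf`, written here only for illustration. It lists the values $\max(0, n - (N - r)) \le y \le \min(n, r)$ and confirms that the probabilities over that range add up to 1. It uses the shoe setup from Example III later in the lecture: 20 shoes, 10 left, 13 in the box. There, the lower bound is not 0, because the box cannot hold more than 10 right shoes.

```python
from fractions import Fraction
from math import comb

def support(N, r, n):
    """Possible values of Y: from max(0, n - (N - r)) to min(n, r)."""
    low = max(0, n - (N - r))   # the committee holds at most N - r men
    high = min(n, r)            # at most n seats and at most r women
    return range(low, high + 1)

def hypergeom_pmf(y, N, r, n):
    return Fraction(comb(r, y) * comb(N - r, n - y), comb(N, n))

ys = support(20, 10, 13)
print(list(ys))                                        # [3, 4, 5, 6, 7, 8, 9, 10]
print(sum(hypergeom_pmf(y, 20, 10, 13) for y in ys))   # 1
```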

We need to get a couple of properties of the hypergeometric distribution down.

The most useful one is the mean, which, remember, is the same as the expected value.

The expected value of the hypergeometric distribution is $E(Y) = \frac{nr}{N}$.

Here $n$ is the size of your committee, $r$ is the number of women available, and $N$ is the total number of people in the room that you are choosing from.
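One way to gain confidence in $E(Y) = \frac{nr}{N}$ is to compute the mean straight from the definition $E(Y) = \sum_y y\,p(y)$ and compare. The sketch below does this with exact fractions. The helper names are hypothetical. The parameter sets are the lecture's two examples plus one small made-up case.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(y, N, r, n):
    return Fraction(comb(r, y) * comb(N - r, n - y), comb(N, n))

def mean_from_pmf(N, r, n):
    """E(Y) = sum of y * p(y) over every possible y."""
    low, high = max(0, n - (N - r)), min(n, r)
    return sum(y * hypergeom_pmf(y, N, r, n) for y in range(low, high + 1))

for N, r, n in [(33, 12, 7), (20, 10, 13), (5, 2, 3)]:
    assert mean_from_pmf(N, r, n) == Fraction(n * r, N)
```

Agreement on a few cases is a sanity check, not a proof. Example V at the end of the lecture gives the actual derivation.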

The variance is really kind of a nasty formula, and I do not recommend memorizing it: $V(Y) = n \cdot \frac{r}{N} \cdot \frac{N - r}{N} \cdot \frac{N - n}{N - 1}$.

I do not use it very often, but I wanted to record it for posterity, in case you do need it.

Let me emphasize that these are actual fractions, not binomial coefficients.

This is just what it turns out to be.

Like I said, I do not really think there is a lot of intuition to be gained from this variance, so I do not think it is worth memorizing the formula.

The standard deviation, of course, is just the square root of the variance. It always is.

I just took the square root of the variance formula to get the standard deviation: $\sigma = \sqrt{V(Y)}$.
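To check the variance formula, compute $V(Y) = E(Y^2) - E(Y)^2$ directly from $p(y)$ and compare it with the closed form. Here is a sketch with hypothetical helper names, using Example I's parameters (33 students, 12 women, committee of 7). `math.sqrt` accepts a `Fraction` by converting it to a float.

```python
from fractions import Fraction
from math import comb, sqrt

def hypergeom_pmf(y, N, r, n):
    return Fraction(comb(r, y) * comb(N - r, n - y), comb(N, n))

def variance_from_pmf(N, r, n):
    """V(Y) = E(Y^2) - E(Y)^2, computed directly from p(y)."""
    ys = range(max(0, n - (N - r)), min(n, r) + 1)
    mean = sum(y * hypergeom_pmf(y, N, r, n) for y in ys)
    second_moment = sum(y * y * hypergeom_pmf(y, N, r, n) for y in ys)
    return second_moment - mean ** 2

def variance_formula(N, r, n):
    """Closed form: n * (r/N) * ((N - r)/N) * ((N - n)/(N - 1))."""
    return n * Fraction(r, N) * Fraction(N - r, N) * Fraction(N - n, N - 1)

v = variance_formula(33, 12, 7)
assert variance_from_pmf(33, 12, 7) == v
print(float(v), sqrt(v))   # variance and standard deviation for Example I
```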

Let us go ahead and jump into some examples.

In Example 1, we have 33 students in a class: 12 women and 21 men.

We are going to pick a committee. Maybe we are doing a group project, and 7 students are going to work on it.

If we pick 7 students at random, what is the chance that we get exactly 5 women working on that project?

This is a hypergeometric distribution, so let me set up the parameters.

$N$ is the total number of people available, that is 33.

$r$ is the number of women in the room, that is 12.

That means that $N - r$, the number of men available, is 21.

The number of people on our committee is $n = 7$. We are interested in the chance that we end up with 5 women on our committee, so $y = 5$.

That is because we want our committee to have exactly 5 women.

Let me write down the formula for the hypergeometric distribution.

$p(y) = \frac{\binom{r}{y}\binom{N-r}{n-y}}{\binom{N}{n}}$. The factor $\binom{r}{y}$ is where we pick the women. $\binom{N-r}{n-y}$ picks $n - y$ men out of the $N - r$ available. We divide by $\binom{N}{n}$, the total number of ways we could have chosen this committee, or this group of students to do a project.

I will just drop the numbers in.

$r$ is 12, $y$ is 5, $N - r$ is 21, $n - y$ is $7 - 5 = 2$, $N$ is 33, and $n$ is 7, so $p(5) = \frac{\binom{12}{5}\binom{21}{2}}{\binom{33}{7}}$.

I am going to leave that as a fraction like that. I did not bother to work it out to a decimal.

It would be a load of factorials that I did not want to calculate, and I did not think it would be very illuminating. If you do work it out, it comes to $\frac{166320}{4272048} \approx 0.039$, which is pretty small.

That makes sense. If you pick 7 people at random from a class like this, the chance of getting 5 women is very low, because there are more men than women in this class.
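For readers who want the decimal the lecture skips, here is a short sketch that evaluates Example 1's fraction with Python's `math.comb` instead of by hand. Nothing here goes beyond the lecture's own numbers: $N = 33$, $r = 12$, $n = 7$, $y = 5$.

```python
from fractions import Fraction
from math import comb

# Example 1 parameters: 33 students, 12 women, committee of 7, want 5 women.
N, r, n, y = 33, 12, 7, 5

favorable = comb(r, y) * comb(N - r, n - y)   # C(12,5) * C(21,2)
total = comb(N, n)                            # C(33,7)
p = Fraction(favorable, total)

print(favorable, total)    # 166320 4272048
print(round(float(p), 4))  # 0.0389
```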

Let me recap where those numbers came from.

First, I set up all my parameters: $N$, $r$, $N - r$, $n$, and $y$.

Then I just used the probability distribution formula for the hypergeometric distribution.

I know the formula looks difficult to remember, but if you think about what each one of those factors represents, it is really not hard to remember.

I think this formula makes intuitive sense. $\binom{r}{y}$ means you are picking $y$ women from the $r$ available women.

$N - r$ is the number of men available, and $n - y$ is the number of men you want.

We multiply those together, and $\binom{N}{n}$ is the number of ways of choosing your committee in the first place.

We drop the numbers in for each one of those, and we just give that as our answer.

That is our chance that the committee will contain exactly 5 women.

We are going to hang onto these numbers for the next example, so remember the basic setup of this example.

Example 2 refers back to Example 1.

In Example 1, we were picking a committee of 7 students from a class, maybe for a group project.

Let me just remind you of the parameters from Example 1.

$N$ was the number of students in the class, 33.

$r$ was the number of women in the class. There were 12 women.

$n$ was the number of people that we are picking to be on our committee, that is 7.

The expected number of women is the expected value of our random variable $Y$, the number of women on our committee.

We have a formula for the expected value, the mean, of a hypergeometric random variable: $E(Y) = \frac{nr}{N}$.

In this case, $n$ is 7, $r$ is 12, and $N$ is 33, so $E(Y) = \frac{7 \cdot 12}{33}$.

We can simplify that. 12 and 33 both have a factor of 3, so we get $\frac{7 \cdot 4}{11} = \frac{28}{11}$.

Our units here are women. That is the number of women we expect on our committee.

Obviously, you cannot have fractions of women, but if we did this many times, we would expect to see on average $\frac{28}{11} \approx 2.55$ women on the committee.

That is a little less than 3 women on the committee, on average.
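To see how the average of $\frac{28}{11}$ relates to the whole-number outcomes you would actually observe, here is a sketch that tabulates the full distribution for Example 1's committee, $y = 0$ through $7$. It then prints the mean from the formula. The variable names are illustrative only. The table shows 2 women as the single most likely outcome, with 3 close behind.

```python
from fractions import Fraction
from math import comb

N, r, n = 33, 12, 7   # Example 1/2 parameters

def p(y):
    return Fraction(comb(r, y) * comb(N - r, n - y), comb(N, n))

dist = {y: p(y) for y in range(0, min(n, r) + 1)}
for y, prob in dist.items():
    print(f"P(Y = {y}) = {float(prob):.4f}")

mean = Fraction(n * r, N)   # E(Y) = n r / N
print(mean, float(mean))    # 28/11 and its decimal value
```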

To recap, I got these parameters from Example 1.

Example 1 set up how many people there were in the room, how many women, how many men, and how many people we are picking for our committee.

I got the formula for the mean from the third slide at the beginning of the lecture. If you scroll back a couple of slides, you will see it.

I just dropped the numbers in and simplified that down to a certain number of women.

Of course, in real life, we will get a whole number of women: 0, 1, 2, 3, and so on, up to 7.

On average, we will have a bit fewer than 3 women on our committee.

In Example 3, you open up your shoe closet and do a shoe inventory.

It looks like you have 10 pairs of shoes in your closet. That is a lot of shoes.

You are getting ready to move to a new apartment.

You are in a hurry, so you grab the nearest box you see and start throwing your shoes in.

You are not really keeping track of which shoe matches up with which. You are just throwing them all in, and you will unpack them after you move.

You start throwing your shoes in, you get 13 shoes in the box, and it is full.

You seal up the box, and then you start to wonder: how many left shoes are in the box, and how many right shoes?

In particular, what is the probability that there are exactly 5 left shoes and 8 right shoes in the box?

This is a hypergeometric distribution because, if you think about it, it is just like selecting women and men to be on a committee.

You have a certain number of left shoes and a certain number of right shoes in your closet.

You grab some and put them in the box, which is just like selecting women and men for your committee.

Let me set up the parameters for the hypergeometric distribution.

$N$ is the total number of people in the room, or in this case, the total number of shoes in the closet before you start packing, counting both left and right.

We have 10 pairs, so there are 20 shoes.

$r$ is the number of left shoes. "Left-handed shoes" sounds a little strange, so I will just say left shoes.

There are 10 left shoes in your closet, assuming that all your pairs match up.

Let me go ahead and calculate $N - r$, the number of right shoes: $20 - 10$ is also 10.

$n$ is the number of shoes that you chose at random when you threw them in the box, and that is given to us as 13.

$y$ is the number of left shoes that we are interested in.

$y$ is 5, because we are curious about the likelihood that there are exactly 5 left shoes in the box.

Let me go ahead and remind you of the formula for the hypergeometric distribution.

$p(y)$ is not hard to remember if you think about what each piece is measuring.

It is $\binom{r}{y}$, where $r$ is the number of left shoes available and $y$ is the number that you are interested in. You multiply that by $\binom{N-r}{n-y}$, where $N - r$ is the number of right shoes available and $n - y$ is the number of right shoes that should be in the box. Then you divide by all the possible ways of choosing your shoes, $\binom{N}{n}$.

I will just fill in all the numbers.

$r$ is 10, $y$ is 5, $N - r$ is 10, and $n - y$ is $13 - 5 = 8$.

$N$ was 20 and $n$ was 13, so the denominator is $\binom{20}{13}$, the number of ways you could have chosen those 13 shoes. That gives $p(5) = \frac{\binom{10}{5}\binom{10}{8}}{\binom{20}{13}}$.

Again, I did not bother to simplify this, because it would be a lot of factorials, so I will just leave it that way.

If you want to simplify it, you could calculate a bunch of factorials and then do some arithmetic to get a decimal answer.
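Here is that arithmetic done by Python rather than by hand, as a sketch. The helper name `hypergeom_pmf` is hypothetical. The code also checks the lecture's remark that tracking right shoes instead of left shoes gives the same answer: "exactly 5 left" and "exactly 8 right" are the same event when the box holds 13 shoes.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(y, N, r, n):
    return Fraction(comb(r, y) * comb(N - r, n - y), comb(N, n))

# Example 3: 20 shoes, 10 left (r), 13 in the box (n), exactly 5 left (y).
p_left = hypergeom_pmf(5, 20, 10, 13)
# Same event counted from the right-shoe side: exactly 8 of the 10 right shoes.
p_right = hypergeom_pmf(8, 20, 10, 13)

print(p_left, round(float(p_left), 4))   # 189/1292 0.1463
print(p_left == p_right)                 # True
```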

Let me recap and show you where each one of those values came from.

Each of these parameters came from somewhere in the problem.

$N$ is the total number of shoes available in the closet. There were 10 pairs, which means there were 20 shoes available.

$r$ is the number of left shoes.

We set this up by analogy with picking a committee of people from a group of women and men. Instead, we are picking a box of shoes from a group of left and right shoes.

We chose $r$ to be the number of left shoes.

We could have switched the roles of left shoes and right shoes, and it really would not matter. We would end up getting the same answer.

Since there are 10 pairs, there are exactly 10 left shoes, which makes the number of right shoes $20 - 10 = 10$.

That is easy to figure out as well.

The total number of shoes in the box is 13. That is where the 13 came from, and that is $n$.

$y$ is the number of left shoes that we are interested in.

We want to find the probability of getting 5 left shoes. That is where the 5 came from.

Again, you could switch the roles of left shoes and right shoes and keep track of right shoes instead, and that would give you the same answer.

For the probability $p(y)$, I just wrote down the formula for the hypergeometric distribution.

I do remember this one. Even though it is kind of a complicated formula, it is not hard to remember when you think about what each piece is counting and what it represents physically.

I just dropped in all the parameters: $r$, $y$, $N$, and $n$.

We got a number that you could simplify to a fraction or a decimal, but that did not seem to me to be that relevant.

We are going to hang onto this example and keep using it in Example 4. Remember these numbers, and we will look at another aspect of this setup in the next example.

Example 4 refers back to Example 3.

If you have not just watched Example 3, go back and watch it, or at least read the setup, before you look at Example 4. Then this will make sense.

Remember, back then, we had a shoe closet with 10 pairs of shoes.

We started throwing the shoes into a box at random, because we were getting ready to move and were in a hurry.

We did not bother to keep each left shoe with its corresponding right shoe.

We just threw our shoes into the box, and it turned out that there were 13 shoes in the box.

I am curious about how many left shoes there might be in the box.

This is again a hypergeometric distribution, so let me remind you of the parameters that we had in Example 3.

$N$ was the total number of shoes in your closet, 20.

$r$ is the number of left shoes. There are 10 left shoes, which means that there are 10 right shoes.

$n$ is the number of shoes in the box. Back in Example 3, we said the box fills up when you have 13 shoes in there, so $n = 13$.

I want to know the expected number of left shoes in the box.

Remember, we sealed up the box, so we cannot go and count.

Let us find the expected value of our random variable $Y$, the number of left shoes in the box. We want $E(Y)$.

We have a formula for the expected value, which is the same as the mean, of the hypergeometric distribution.

Let me remind you what it was: $E(Y) = \frac{nr}{N}$. In this case, $n$ is 13, $r$ is 10, and $N$ is 20. I am just reading these from above.

The 10 and the 20 simplify, so we get $\frac{13 \cdot 10}{20} = \frac{13}{2}$, which is 6.5 left shoes.

In one sense, that makes perfect sense, and in another sense it is absurd, because you cannot have half a shoe.

You are not cutting your shoes in half.

It does not mean that when we open the box, there will be 6.5 left shoes in there.

You will find some whole number of left shoes. You might find 4 left shoes, or you might find 7, but you will not find 6.5 left shoes.

What it does mean is that if you pack many boxes with 13 shoes in each one, then over the long run you would expect to find an average of 6.5 left shoes per box.

That is the average you get if you add up all the left shoes and divide by the number of boxes.

Of course, that makes sense. Remember that in your shoe closet, half of the shoes are left and half are right.

On average, you expect half of the shoes in the box to be left shoes. If you have 13 shoes total, then on average you expect to see 6.5 left shoes.
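The "many boxes" interpretation can be simulated directly. This is a minimal sketch, assuming a fixed seed and 50,000 simulated boxes (both arbitrary choices, not from the lecture). `random.Random.sample` draws distinct positions from the list, which models packing without replacement. The function name `simulate_left_shoes` is hypothetical.

```python
import random

def simulate_left_shoes(boxes: int, seed: int = 0) -> float:
    """Pack `boxes` boxes of 13 shoes drawn at random, without replacement,
    from a closet of 10 left and 10 right shoes; return the average number
    of left shoes per box."""
    rng = random.Random(seed)
    closet = ["L"] * 10 + ["R"] * 10
    total_left = 0
    for _ in range(boxes):
        box = rng.sample(closet, 13)   # 13 distinct shoes, no replacement
        total_left += box.count("L")
    return total_left / boxes

print(simulate_left_shoes(50_000))   # should land close to 13/2 = 6.5
```

Each individual box holds a whole number of left shoes (always between 3 and 10). Only the long-run average settles near 6.5.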

Let me recap that problem.

We took these parameters from Example 3. If these numbers look strange to you, just go back and read the setup in Example 3.

You will see that we had 20 shoes in the closet: 10 left shoes and 10 right shoes.

We took 13 of them and threw them into a box.

Using the hypergeometric distribution, the mean number of left shoes is $\frac{nr}{N}$.

That formula came from our slide about means and standard deviations earlier in this lecture. I think it was the third slide of this video, and you can scroll back and see where it comes from.

I just dropped in the numbers 13, 10, and 20 and simplified down to 6.5 left shoes. Of course, that does not make sense as an actual count, because you will find a whole number of shoes in the box.

But as an expected value, an average value, it makes perfect sense, because out of 13 shoes, you can expect half of them to be left shoes and half of them to be right shoes.

You would expect, in the long run, an average of 6.5 left shoes per box.

Example 5 is a little more theoretical. It asks us to use indicator variables and linearity of expectation to prove that the expected value of a hypergeometric random variable is $\frac{nr}{N}$.

We are going to prove this value. We cannot just pull it from the earlier slide.

Let me show you how this works out.

Remember the premise of the hypergeometric distribution.

Our random variable represents the number of women on a committee of $n$ people.

We have several parameters here.

$N$ is the total number of people in the room that we are going to pick from.

Among those people, $r$ is the number of women and $N - r$ is the number of men.

We are going to pick a committee of $n$ people, and we want to find the expected number of women.

There is a very clever way to do this, which is to set up indicator variables.

Let me show you what I mean by indicator variables.

Let us imagine that we pick the people for our committee one by one.

We look around the room and say, "I want you, you, you, and you on the committee," picking these people one at a time.

$Y_1$ is an indicator variable that tells us whether the first person picked for the committee is a woman or not.

By definition, $Y_1$ is 1 if we get a woman on the first pick and 0 if we get a man.

It is a little strange, but we can say that $Y_1$ is the number of women we get on the first pick. We either get one woman, or we get a man, which is 0 women.

Similarly, we define $Y_2$ to be 1 if we get a woman on the second pick, that is, the second person we look at.

If that person is a woman, $Y_2 = 1$. If it is a man, $Y_2 = 0$.

We keep going. We are picking $n$ people for this committee, so we define indicator variables all the way up to $Y_n$.

There is one variable for each person on the committee.

That means that $Y$, the total number of women on the committee, is $Y = Y_1 + Y_2 + \cdots + Y_n$.

It is the number of women we got on the first pick, which is either 1 or 0, plus the number of women we got on the second pick, and so on up to $Y_n$.

We can count the total number of women just by counting all the 1s we got from those indicator variables, so $Y$ breaks down into a sum of indicator variables.

To find $E(Y)$, the expected number of women, we need $E(Y_1 + Y_2 + \cdots + Y_n)$.

This is where we use linearity of expectation, which is very important here: $E(Y_1 + \cdots + Y_n) = E(Y_1) + \cdots + E(Y_n)$.

These variables are not independent, but linearity of expectation does not require independence.

They are not independent because if you get a woman on the first pick, you are less likely to get a woman on the second, since there are fewer women left to pick.

Even though they are not independent, you can still use linearity of expectation.
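The dependence between picks described above shows up in the conditional probabilities for the second pick. Averaging over the first pick, though, brings the unconditional probability back to $\frac{r}{N}$. The sketch below computes both with exact fractions using the law of total probability. The function name is hypothetical, and the Example 1 numbers (33 people, 12 women) are used only for illustration.

```python
from fractions import Fraction

def second_pick_probabilities(N: int, r: int):
    """Return P(Y2=1 | Y1=1), P(Y2=1 | Y1=0), and the unconditional P(Y2=1)."""
    p_first_woman = Fraction(r, N)
    p_first_man = Fraction(N - r, N)
    given_woman = Fraction(r - 1, N - 1)   # one woman already removed
    given_man = Fraction(r, N - 1)         # one man already removed
    # Law of total probability over the outcome of the first pick.
    unconditional = p_first_woman * given_woman + p_first_man * given_man
    return given_woman, given_man, unconditional

gw, gm, p2 = second_pick_probabilities(33, 12)
print(gw, gm, p2)   # 11/32 3/8 4/11
```

The two conditional probabilities differ, so $Y_1$ and $Y_2$ are dependent. Still, $P(Y_2 = 1) = \frac{4}{11} = \frac{12}{33}$, exactly the same as $P(Y_1 = 1)$.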

That is the glorious thing about linearity of expectation: it breaks $E(Y)$ up into the expected values of the individual indicator variables.

So what is the expected value of each of these indicator variables?

Let me give you a good way to think about that.

Think about just $Y_1$: we pick one person out of the crowd.

The original definition of expected value is that you look at all the possible values of the variable, multiply each value by the probability of getting that value, and add them up.

I covered this in one of the very early lectures on probability. You can go back and look at those early lectures, and you will see this.

What are the possible values of these indicator variables?

There are only 0 and 1, because we set up each indicator variable to always be 0 or 1.

So this expands out into $E(Y_1) = 0 \cdot P(Y_1 = 0) + 1 \cdot P(Y_1 = 1)$.

The indicator variable comes up 0 when we get a man. $P(Y_1 = 1)$ is the probability that we get a woman when we make our first pick.

The 0 term drops out, so all I care about is the probability of getting a woman.

How many people were there in the room? There were $N$ people in the room, and $r$ of those people are women.

So this is exactly $\frac{r}{N}$. The expected value of that indicator variable is just $\frac{r}{N}$.

All of these indicator variables have the same expected value. The later picks are not independent of the earlier ones. But if you do not look at the earlier picks, the $i$th person picked is equally likely to be any of the $N$ people in the room, so $P(Y_i = 1) = \frac{r}{N}$ for every $i$. For the second pick, for example, $P(Y_2 = 1) = \frac{r}{N} \cdot \frac{r-1}{N-1} + \frac{N-r}{N} \cdot \frac{r}{N-1} = \frac{r(N-1)}{N(N-1)} = \frac{r}{N}$.

Each one of these is $\frac{r}{N}$, and there are $n$ of these variables.

What we get is $E(Y) = n \cdot \frac{r}{N} = \frac{nr}{N}$.

That is the expected value of our random variable.

That is the expected number of women on our committee.
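For a small room, the indicator-variable argument can be checked by brute force. Enumerate every equally likely ordered sequence of picks, then compute each $E(Y_i)$ and $E(Y)$ exactly. This is a sketch assuming a tiny hypothetical room (6 people, 2 women, committee of 3) so that full enumeration with `itertools.permutations` stays cheap. The function name is also hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def check_indicators(N: int, r: int, n: int):
    """Enumerate every ordered way to pick n of N people (people 0..r-1 are
    women) and return the exact E(Y_i) for each pick i, plus E(Y)."""
    picks = list(permutations(range(N), n))   # all equally likely orderings
    women = set(range(r))
    e_indicators = []
    for i in range(n):
        hits = sum(1 for order in picks if order[i] in women)
        e_indicators.append(Fraction(hits, len(picks)))
    e_total = Fraction(sum(sum(p in women for p in order) for order in picks),
                       len(picks))
    return e_indicators, e_total

e_i, e_y = check_indicators(N=6, r=2, n=3)
print(e_i, e_y)   # each E(Y_i) is 2/6 = 1/3, and E(Y) = 3 * 1/3 = 1
```

Every position gets the same expectation $\frac{r}{N}$, even though the picks are dependent. Their sum matches $\frac{nr}{N}$.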

That checks with the value of the mean that I gave you way back on the third slide of this video.

That is really where that formula comes from, and now you have the derivation to back it up. Hopefully, you now understand it yourself.

In case that did not make sense, here is a quick recap.

$N$ was the total number of people in the room.

$r$ is the number of women, which leaves $N - r$ men.

We are going to pick a committee of $n$ people, and $Y$ is the number of women we get on our committee.

One way to break that down is to look at our picks one by one.

We pick this person, then that person, then that person, and then that person to be on our committee.

For each pick, we set up an indicator variable that is 1 if we get a woman and 0 if we get a man.

Each person has their own indicator variable, and that means the total number of women is just the sum of all these indicator variables, that is, the total of the women we got on each of these picks.

To find the expected value of the sum, we use linearity of expectation.

That is kind of a big deal in probability, so let me highlight it: it breaks $E(Y)$ up into the expected value of each of the indicator variables.

We can calculate the expected value of these indicator variables, because the only possible values they can take are 0 and 1.

Using our original definition of expected value, we have $0 \cdot P(Y_i = 0) + 1 \cdot P(Y_i = 1)$.

We really only need to calculate the probability of a 1, which means the probability that we get a woman when we pick a certain person from this room.

There are $r$ women in the room and $N$ total people in the room, so that probability is $\frac{r}{N}$.

We fill that in for each of our expected values. It is the same for every indicator variable.

We are adding up $n$ copies of $\frac{r}{N}$, and we get $\frac{nr}{N}$ as our answer.

That checks with the mean of the hypergeometric random variable that I gave you earlier in this lecture.

That is our last example problem, and that wraps up our lecture on the hypergeometric distribution.

You are watching the probability videos here on www.educator.com.

My name is Will Murray. Thank you for joining us, see you next time, bye.