WEBVTT mathematics/probability/murray
00:00:00.000 --> 00:00:05.200
Hi and welcome back to the probability lectures here on www.educator.com, my name is Will Murray.
00:00:05.200 --> 00:00:09.100
Today, we are going to learn about the geometric distribution.
00:00:09.100 --> 00:00:16.500
It looks a lot like the binomial distribution in the initial setup because it is describing a similar situation.
00:00:16.500 --> 00:00:23.600
I will try to make it clear how the geometric distribution is actually different from the binomial distribution.
00:00:23.600 --> 00:00:28.500
The idea of the geometric distribution is that you have a sequence of trials.
00:00:28.500 --> 00:00:34.400
Each one of these trials can have two outcomes, you think of those as being success or failure.
00:00:34.400 --> 00:00:39.300
Very typically, you think of this as being a sequence of coin flips, you are flipping a coin,
00:00:39.300 --> 00:00:43.300
but it can really be anything where there are 2 possible outcomes.
00:00:43.300 --> 00:00:48.300
For example, each year you want to know if the Yankees are going to win the World Series.
00:00:48.300 --> 00:00:52.700
Each year, either they win the World Series or they do not win the World Series.
00:00:52.700 --> 00:00:57.600
There is a yes or no outcome every single time you run the trial.
00:00:57.600 --> 00:01:09.500
The key point about the geometric distribution is that you continue the trials indefinitely until you get the first success.
00:01:09.500 --> 00:01:16.200
For example, if you are flipping a coin, you would keep flipping a coin over and over again
00:01:16.200 --> 00:01:20.100
until you get the first head and then you would stop.
00:01:20.100 --> 00:01:26.600
Or if you are tracking the Yankees winning the World Series, you wait and wait and wait as many years it takes,
00:01:26.600 --> 00:01:31.600
until the Yankees win the World Series for the first time and then you stop.
00:01:31.600 --> 00:01:37.800
The difference between that and the binomial distribution is that, we do not know the number of trials in advance
00:01:37.800 --> 00:01:40.300
and we stop after we get the first success.
00:01:40.300 --> 00:01:46.100
Remember, the binomial distribution, we have a fixed number of trials that we decide ahead of time.
00:01:46.100 --> 00:01:52.300
We say, I'm going to flip this coin 10 times and I will count the number of heads, that was the binomial distribution.
00:01:52.300 --> 00:02:00.800
This is the geometric distribution when we say, I'm going to flip this coin as long as it takes until I get the first head.
00:02:00.800 --> 00:02:02.400
That is the difference between those two.
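A minimal Python sketch of the two setups side by side may help here. It assumes a simulated coin with head probability p; the function names are made up for illustration and are not from the lecture slides.

```python
import random

def heads_in_fixed_flips(n_flips, p, rng):
    """Binomial setup: flip exactly n_flips times and count the heads."""
    return sum(1 for _ in range(n_flips) if rng.random() < p)

def flips_until_first_head(p, rng):
    """Geometric setup: keep flipping until the first head, then stop."""
    flips = 0
    while True:
        flips += 1
        if rng.random() < p:  # this flip came up heads
            return flips

rng = random.Random(0)  # fixed seed so the run is repeatable
print(heads_in_fixed_flips(10, 0.5, rng))   # a count between 0 and 10
print(flips_until_first_head(0.5, rng))     # a count that is at least 1
```

In the first function the number of flips is fixed and the number of heads is random; in the second the number of flips itself is the random quantity.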
00:02:02.400 --> 00:02:13.700
You want to be really careful when you are studying a new situation to work out whether you are talking about the geometric distribution or the binomial distribution.
00:02:13.700 --> 00:02:17.400
I still have not given you any formulas for the geometric distribution.
00:02:17.400 --> 00:02:24.100
The way it works is you have a fixed parameter P, which is the probability of success on each trial.
00:02:24.100 --> 00:02:27.300
If you are flipping a fair coin then P would just be ½.
00:02:27.300 --> 00:02:32.100
If you are tracking the Yankees winning World Series, and you figure that each year,
00:02:32.100 --> 00:02:37.600
they have a 10% chance of winning the World Series then P would be 1/10.
00:02:37.600 --> 00:02:42.500
Q is going to be the probability of failure that is always 1 – P.
00:02:42.500 --> 00:02:44.800
Q is just dependent on P.
00:02:44.800 --> 00:02:49.800
You do not really need to know Q ahead of time because as long as you know P, you can work out what Q is.
00:02:49.800 --> 00:02:57.900
The random variable you are keeping track of is the number of trials that you have to take, in order to get that first success.
00:02:57.900 --> 00:03:02.200
That is another difference between the geometric distribution and the binomial distribution.
00:03:02.200 --> 00:03:09.600
In the binomial distribution, Y was the number of successes that you get in total.
00:03:09.600 --> 00:03:16.800
In the geometric distribution, Y is the number of trials it takes to get the first success.
00:03:16.800 --> 00:03:20.900
We are ready to actually get the formula for the geometric distribution.
00:03:20.900 --> 00:03:29.400
The probability of getting exactly Y trials is equal to Q^(Y-1) × P.
00:03:29.400 --> 00:03:36.300
Here, Y can be any whole number from 1 on up; it can be arbitrarily large, that is why I put Y less than infinity there.
00:03:36.300 --> 00:03:49.200
This should be fairly easy to remember because what this really represents is: Q^(Y-1) means you have to fail on the first Y - 1 trials.
00:03:49.200 --> 00:03:58.600
If you want to get the first head on your 6th coin flip, that means the first 5 coin flips have to come up tails.
00:03:58.600 --> 00:04:08.900
What you are really doing here is you are failing on the first Y - 1 trials.
00:04:08.900 --> 00:04:21.800
Your first 5 coin flips have to be tails, and then this 1 power of P at the end means you succeed on the Yth trial.
00:04:21.800 --> 00:04:27.600
If you are flipping a coin, that means on the 6th time you flip the coin, you finally get a head.
00:04:27.600 --> 00:04:38.600
There is a P probability of succeeding on the Yth trial.
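A minimal Python sketch of the formula Q^(Y-1) × P, using exact fractions. The function name geometric_pmf is a made-up label for illustration.

```python
from fractions import Fraction

def geometric_pmf(y, p):
    """Probability that the first success lands on trial y (y = 1, 2, 3, ...)."""
    if y < 1:
        raise ValueError("y must be a whole number that is at least 1")
    q = 1 - p                # probability of failure on a single trial
    return q ** (y - 1) * p  # fail y - 1 times, then succeed once

# Fair coin: first head on the 6th flip means 5 tails and then a head.
print(geometric_pmf(6, Fraction(1, 2)))  # 1/64
```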
00:04:38.600 --> 00:04:45.500
Word of warning here, we have the same bad notation here that we had for a lot of our other distributions
00:04:45.500 --> 00:04:54.900
and a lot of our other formulas in probability, which is that we are using P to represent two different things here.
00:04:54.900 --> 00:05:00.900
This P right here and this P right here are 2 different p's.
00:05:00.900 --> 00:05:16.600
That P, remember, is the probability of success on 1 trial, probability of success on any given trial.
00:05:16.600 --> 00:05:20.500
If you are flipping a coin and it is a fair coin, that P is ½.
00:05:20.500 --> 00:05:25.300
If you are rolling a dice and you are trying to get a 6 then that P would be 1/6.
00:05:25.300 --> 00:05:47.800
This P represents the probability of Y, your random variable, having the value of little Y, the probability of Y trials overall.
00:05:47.800 --> 00:05:52.700
That is unfortunate that people use the letter P for many different things.
00:05:52.700 --> 00:05:56.600
It is a curse of the word probability starting with the letter P.
00:05:56.600 --> 00:06:00.300
When you study probability, people tend to overuse letter P.
00:06:00.300 --> 00:06:02.400
Everything is called P.
00:06:02.400 --> 00:06:12.300
Unfortunately, you have to keep track of these, and we even use lowercase P for both of them; these are 2 different uses of the letter P.
00:06:12.300 --> 00:06:14.100
Just keep track of that.
00:06:14.100 --> 00:06:20.300
But having kept track of that, it should be pretty easy to remember this formula for the geometric distribution
00:06:20.300 --> 00:06:25.100
because you just remember that you keep flipping a coin until you get your first head,
00:06:25.100 --> 00:06:33.400
which means you have to fail on all the previous tries and then succeed on try number Y.
00:06:33.400 --> 00:06:40.700
You fail Y - 1 times, that is what the Q^(Y-1) gives you, and you succeed on the very last time.
00:06:40.700 --> 00:06:47.600
That is why there is 1 power of P to represent that final success there.
00:06:47.600 --> 00:06:48.400
Let us keep going with this.
00:06:48.400 --> 00:06:51.800
There are a couple of key properties that we need for any distribution.
00:06:51.800 --> 00:06:53.700
I will just list them out here.
00:06:53.700 --> 00:06:56.400
The mean, remember, that is the same as the expected value.
00:06:56.400 --> 00:07:05.100
Mean and expected value are the same thing.
00:07:05.100 --> 00:07:10.400
The mean or the expected value for the geometric distribution is just 1/P.
00:07:10.400 --> 00:07:15.000
The variance for the geometric distribution is Q/P².
00:07:15.000 --> 00:07:26.100
Remember that Q is 1 – P, so you might see that written as (1 – P)/P²; those mean the same thing.
00:07:26.100 --> 00:07:29.700
Standard deviation is always the square root of the variance.
00:07:29.700 --> 00:07:36.400
It is just the square root of Q/P², which simplifies down to √Q / P, with only the Q under the square root.
00:07:36.400 --> 00:07:48.100
You can also write that as √(1 - P) / P; that would also be a legit way to write the standard deviation.
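A minimal Python sketch of these three formulas, for anyone who wants to plug in numbers. The function name geometric_summary is hypothetical, and the fair-coin value p = 0.5 is just a sample input.

```python
import math

def geometric_summary(p):
    """Return (mean, variance, standard deviation) for success probability p."""
    q = 1 - p
    mean = 1 / p                # expected number of trials
    variance = q / p ** 2       # Q / P^2
    std_dev = math.sqrt(q) / p  # square root of the variance
    return mean, variance, std_dev

mean, variance, std_dev = geometric_summary(0.5)
print(mean, variance, std_dev)  # mean 2.0, variance 2.0, std_dev is sqrt(2)
```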
00:07:48.100 --> 00:07:53.400
There is a very useful fact that I want you to remember from calculus
00:07:53.400 --> 00:07:57.100
because it comes up a lot when you are doing geometric distribution problems.
00:07:57.100 --> 00:08:00.800
That fact is the sum of an infinite geometric series.
00:08:00.800 --> 00:08:05.600
Let me remind you of the formula for the sum of an infinite geometric series.
00:08:05.600 --> 00:08:12.700
We covered this in calculus 2, if you do not remember this, you might want to go back and review this section of calculus 2
00:08:12.700 --> 00:08:17.100
because we use it a lot in probability, the sum of the geometric series.
00:08:17.100 --> 00:08:25.800
What we learned back in calculus 2 is, if you have a series A + AR + AR², and so on, that is a geometric series
00:08:25.800 --> 00:08:34.500
because to get from each term to the next, you are multiplying by R, you are multiplying by R each time.
00:08:34.500 --> 00:08:45.400
I did not write this down on the initial slide but this only works if the common ratio has absolute value less than 1.
00:08:45.400 --> 00:08:54.500
In this case, the sum of the geometric series is given by A/(1 – R).
00:08:54.500 --> 00:08:58.400
I think that formula is not very useful to remember.
00:08:58.400 --> 00:09:03.300
I think it is much more useful to remember the sum of a geometric series in words.
00:09:03.300 --> 00:09:09.500
The way I remember it is: the first term divided by 1 minus the common ratio.
00:09:09.500 --> 00:09:16.800
That first term was A and the common ratio is the amount you are multiplying by to get from each term to the next.
00:09:16.800 --> 00:09:22.600
The reason I like that formula better is because it avoids a lot of special cases.
00:09:22.600 --> 00:09:30.400
If you look in your calculus book, you might see different special cases for the sum of AR^N.
00:09:30.400 --> 00:09:33.600
When N starts at 0, you might see one formula there.
00:09:33.600 --> 00:09:39.800
You might see another formula for the sum of AR^N when N starts at 1, a different formula.
00:09:39.800 --> 00:09:46.900
You end up having to memorize all these different formulas depending on subtle differences in how the sum is presented.
00:09:46.900 --> 00:09:53.700
The formula in words is always the same, no matter how you give the series.
00:09:53.700 --> 00:09:56.400
If you remember that one, you will never go wrong.
00:09:56.400 --> 00:10:04.700
I really like this formula, first term / (1 - common ratio); that is what I tend to remember.
00:10:04.700 --> 00:10:11.400
When I apply that to adding up the sum of the geometric series, it always works.
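A minimal Python sketch that compares the "first term / (1 - common ratio)" rule against a long partial sum. The function names and the sample series 3 + 3/2 + 3/4 + ... are assumptions chosen for illustration.

```python
def geometric_series_sum(first_term, ratio):
    """Closed form: first term divided by 1 minus the common ratio."""
    if abs(ratio) >= 1:
        raise ValueError("the series only converges when |ratio| < 1")
    return first_term / (1 - ratio)

def partial_sum(first_term, ratio, n_terms):
    """Add up the first n_terms terms one at a time."""
    total = 0.0
    term = first_term
    for _ in range(n_terms):
        total += term
        term *= ratio  # multiply by the common ratio to get the next term
    return total

closed_form = geometric_series_sum(3, 0.5)  # 3 / (1 - 0.5) = 6.0
approx = partial_sum(3, 0.5, 60)            # 60 terms of 3 + 1.5 + 0.75 + ...
print(closed_form, approx)
```

The rule only needs the first term and the ratio, so it does not matter whether the sum is indexed from 0 or from 1.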
00:10:11.400 --> 00:10:17.100
Let me give you an example of how that comes up, when we are studying the geometric distribution.
00:10:17.100 --> 00:10:20.800
A lot of times, when you are studying a probability problem,
00:10:20.800 --> 00:10:28.400
you want to say what is the probability that it will take at least a certain number of trials to achieve success?
00:10:28.400 --> 00:10:36.000
If I'm flipping a coin, what is the probability that I will have to flip it at least 3 times before I see my first head?
00:10:36.000 --> 00:10:43.800
If I'm waiting for the Yankees to win the World Series, what is the probability that it will take at least 10 years for them to win the World Series,
00:10:43.800 --> 00:10:49.100
or what is the probability that they will win sometime in the next 10 years?
00:10:49.100 --> 00:10:59.200
The way we want to calculate that is, we want to add up the probabilities of all the values that are bigger than or equal to Y.
00:10:59.200 --> 00:11:04.500
We want to add up if we want to find the probability that it is at least little Y.
00:11:04.500 --> 00:11:12.800
We look at the probability of little Y, plus the probability of little Y + 1, plus the probability of little Y + 2, and so on, and we add that up.
00:11:12.800 --> 00:11:24.100
I’m just going to use my formula for the geometric distribution, P of Y is equal to Q^(Y-1) × P.
00:11:24.100 --> 00:11:34.900
I fill that in for P of Y, and then for P of Y + 1, I just increment the exponent by 1: Q^Y × P, then Q^(Y+1) × P, and so on.
00:11:34.900 --> 00:11:36.700
That is a geometric series.
00:11:36.700 --> 00:11:43.600
What we are doing each time is we are multiplying each term by Q.
00:11:43.600 --> 00:11:45.500
We got a geometric series.
00:11:45.500 --> 00:11:50.300
I use my first term / (1 - common ratio) formula.
00:11:50.300 --> 00:11:56.300
The first term here is Q^(Y-1) × P, fill that in.
00:11:56.300 --> 00:12:02.000
The common ratio is Q, that is why I get 1 - Q in the denominator.
00:12:02.000 --> 00:12:13.100
1-Q is just P, that was because Q itself was 1 – P, 1 – Q is P.
00:12:13.100 --> 00:12:21.600
That cancels with the P in the numerator and we get just Q^(Y-1).
00:12:21.600 --> 00:12:28.200
That is our probability that we will go at least Y trials until we get our success.
00:12:28.200 --> 00:12:34.200
Another way to think about that is that in order for the experiment to last for Y trials,
00:12:34.200 --> 00:12:41.800
say 10 trials, it means that you have to fail for the first 9 times that you run the experiment.
00:12:41.800 --> 00:12:51.000
If you are going to run an experiment for 10 trials or more, that means you have to fail 9 times.
00:12:51.000 --> 00:13:01.300
Q was the probability of failure, you have to fail for the first Y - 1 times, that is why you get Q^(Y-1) there.
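A minimal Python sketch that checks the shortcut Q^(Y-1) against a brute-force sum of P(Y) + P(Y + 1) + and so on. The function names, the 2000-term cutoff, and the sample values (a 10% yearly chance, at least 10 years) are assumptions for illustration.

```python
def prob_at_least(y, p):
    """Shortcut: you must fail on the first y - 1 trials."""
    return (1 - p) ** (y - 1)

def prob_at_least_by_summing(y, p, n_terms=2000):
    """Add up P(y) + P(y + 1) + ... for n_terms terms of the geometric series."""
    q = 1 - p
    return sum(q ** (k - 1) * p for k in range(y, y + n_terms))

shortcut = prob_at_least(10, 0.1)               # 0.9 to the 9th power
summed = prob_at_least_by_summing(10, 0.1)
print(shortcut, summed)  # the two values should agree to many decimal places
```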
00:13:01.300 --> 00:13:04.500
Let us see how this plays out in some examples.
00:13:04.500 --> 00:13:08.700
In the first example, we are going to draw cards from a deck until we get an ace.
00:13:08.700 --> 00:13:14.000
We got a 52 card deck here, we just start pulling out cards until we get an ace.
00:13:14.000 --> 00:13:23.400
A key question you should always ask about a selection like this is, do we replace the cards back in the deck before we draw the next one?
00:13:23.400 --> 00:13:28.700
What I have told you in the problem is that we do replace them, and that really changes the answer.
00:13:28.700 --> 00:13:37.700
You want to be very sure that you understand in probability questions, are you drawing with replacement or without replacement?
00:13:37.700 --> 00:13:39.500
We are going to draw until we get an ace.
00:13:39.500 --> 00:13:46.000
We are being asked, what is the chance that we will draw exactly 3 times here?
00:13:46.000 --> 00:13:47.300
Let us think about that.
00:13:47.300 --> 00:13:52.300
This is the geometric distribution because we are doing a sequence of independent trials.
00:13:52.300 --> 00:13:56.400
We are drawing a card, putting it back, trying another card, or possibly the same card.
00:13:56.400 --> 00:13:59.600
Putting it back, drawing again, putting the card back.
00:13:59.600 --> 00:14:04.000
We want to keep doing this until we get an ace and then we stop.
00:14:04.000 --> 00:14:10.400
We want to find out how many times we are going to draw until we get that first ace?
00:14:10.400 --> 00:14:13.400
Let us first of all figure out what our parameter is.
00:14:13.400 --> 00:14:21.500
Our P is our probability of getting an ace when we draw a card from a 52 card deck.
00:14:21.500 --> 00:14:27.400
There are 4 aces in there and there are 52 cards total, that is 1/13.
00:14:27.400 --> 00:14:31.300
We have a 1/13 chance of success anytime we draw a card.
00:14:31.300 --> 00:14:35.900
Q is always 1 – P, that is 12/13.
00:14:35.900 --> 00:14:40.000
That is the chance we will fail on any particular draw.
00:14:40.000 --> 00:14:45.800
We have a 12/13 chance of getting something other than an ace.
00:14:45.800 --> 00:14:50.400
Let me write down the formula for the geometric distribution.
00:14:50.400 --> 00:14:54.800
P of Y is Q^(Y-1) × P.
00:14:54.800 --> 00:15:04.900
In this case, our Y is equal to 3 because we want to know the chance that we will have to draw exactly 3 times, P of 3.
00:15:04.900 --> 00:15:10.800
In this case, Q is 12/13, I will fill that in, 12/13.
00:15:10.800 --> 00:15:20.000
Y - 1 is 3 - 1, that is 2, and P is 1/13, so we get (12/13)² × 1/13.
00:15:20.000 --> 00:15:25.300
I think I can simplify that a bit, it is 12²/13³.
00:15:25.300 --> 00:15:33.400
I do not think I’m going to go ahead and multiply those out because I do not think the numbers will get any more illuminating there.
00:15:33.400 --> 00:15:35.900
I will leave that as my answer.
00:15:35.900 --> 00:15:38.900
You could multiply that out and get a decimal, if you wanted to.
00:15:38.900 --> 00:15:44.600
Of course, it will be a number between 0 and 1, a fairly low number.
00:15:44.600 --> 00:15:49.500
I do not think it will be very revealing, the decimal that you get.
00:15:49.500 --> 00:15:52.900
That is our answer for example 1 here.
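A minimal Python sketch of the Example 1 arithmetic with exact fractions, in case you do want the decimal. The variable names are made up for illustration.

```python
from fractions import Fraction

p = Fraction(4, 52)  # 4 aces out of 52 cards, which reduces to 1/13
q = 1 - p            # 12/13, the chance of drawing a non-ace
y = 3                # we want the first ace on exactly the 3rd draw

answer = q ** (y - 1) * p
print(answer)         # 144/2197, which is 12 squared over 13 cubed
print(float(answer))  # roughly 0.0655
```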
00:15:52.900 --> 00:15:57.000
Just to recap how we did that, we figured out this is a geometric distribution
00:15:57.000 --> 00:16:00.700
because we are running independent trials until we get a success.
00:16:00.700 --> 00:16:08.300
The probability of success is, there are 4 aces out of 52 cards, it is 1/13.
00:16:08.300 --> 00:16:11.100
Q was 1 – P, that is 12/13.
00:16:11.100 --> 00:16:14.700
I will just drop those into my probability distribution formula.
00:16:14.700 --> 00:16:21.900
The Y here is 3 because we want to draw exactly 3 times until we get an ace.
00:16:21.900 --> 00:16:33.100
I filled in Q, I filled in Y - 1, and I filled in P, and then I simplified that down to get an answer.
00:16:33.100 --> 00:16:42.300
In example 2, we have the Akron Aardvarks and they are competing in the northwestern championship of pin the tail on the donkey.
00:16:42.300 --> 00:16:46.700
It looks like they have a 10% chance of winning the championship each year.
00:16:46.700 --> 00:16:51.000
We want Y to be the number of years until they next win.
00:16:51.000 --> 00:16:57.900
We are sitting there in Akron and we are hoping that the Aardvarks are going to bring home the trophy this year.
00:16:57.900 --> 00:17:02.800
We want to know how many years we might have to wait until we see that trophy.
00:17:02.800 --> 00:17:05.500
We want to find the mean and standard deviation of that.
00:17:05.500 --> 00:17:10.300
This is a geometric distribution once again, because we are waiting for the first success.
00:17:10.300 --> 00:17:16.600
We are waiting for our home team to bring home that first trophy.
00:17:16.600 --> 00:17:21.800
Our P, that is our probability of them winning in any given year, it is 10%.
00:17:21.800 --> 00:17:23.900
I will write that as 1/10.
00:17:23.900 --> 00:17:30.800
Our Q, the probability of failure, of them not bringing home the trophy, is 1 – P.
00:17:30.800 --> 00:17:39.600
Q is always 1 – P, that is 9/10, 1 -1/10.
00:17:39.600 --> 00:17:48.600
I gave you the expected value formula which is the same as the mean in one of the earlier slides of this lecture.
00:17:48.600 --> 00:17:56.800
It is always 1/P, that is 1/(1/10) and that is exactly 10.
00:17:56.800 --> 00:17:59.100
We want to express that in years.
00:17:59.100 --> 00:18:07.600
That is not at all surprising, that is one of the more intuitive results in probability.
00:18:07.600 --> 00:18:17.300
The mean of the geometric distribution, if they have a 10% chance of bringing home the trophy in any given year
00:18:17.300 --> 00:18:24.400
then on average, you expect to wait about 10 years until you bring home a trophy.
00:18:24.400 --> 00:18:29.700
That is not surprising, about once every 10 years, they will bring home a trophy.
00:18:29.700 --> 00:18:34.100
If you sit down to wait, on average it will take you about 10 years.
00:18:34.100 --> 00:18:37.300
You will be waiting about 10 years to see them bring home a trophy.
00:18:37.300 --> 00:18:41.300
The sigma² is the variance.
00:18:41.300 --> 00:18:43.800
The first thing we figured out there was the mean.
00:18:43.800 --> 00:18:46.300
Now, we are going to figure out the variance.
00:18:46.300 --> 00:18:48.100
That is not what the problem is asking.
00:18:48.100 --> 00:18:54.100
The problem is actually asking for standard deviation but the standard deviation is the square root of the variance.
00:18:54.100 --> 00:18:59.000
Once you figure out the variance, it is very easy to find the standard deviation, we just take the square root.
00:18:59.000 --> 00:19:01.300
Let us find the variance first.
00:19:01.300 --> 00:19:13.200
The variance, sigma², is Q/P²; that is 9/10 divided by P², where P is 1/10.
00:19:13.200 --> 00:19:15.800
P² is 1/100.
00:19:15.800 --> 00:19:23.300
If I do the flip on the fraction then I get 100 × 9/10.
00:19:23.300 --> 00:19:28.600
100/10 cancels to 10, I get 10 × 9 which is 90.
00:19:28.600 --> 00:19:36.700
The standard deviation, once you found the variance that is very easy.
00:19:36.700 --> 00:19:43.600
You just take the square root of the variance, the √ 90.
00:19:43.600 --> 00:19:49.100
I can simplify that a little bit, I can pull a 9 out of the square root and it will turn into a 3.
00:19:49.100 --> 00:19:57.800
I get 3 √10 and I just threw that into my calculator before I started this.
00:19:57.800 --> 00:20:04.300
What I came up with was 3 √10 is 9.487.
00:20:04.300 --> 00:20:08.200
Our units there are years.
00:20:08.200 --> 00:20:14.600
The standard deviation in the waiting time is 9.487 years.
00:20:14.600 --> 00:20:20.400
That just means if you are waiting for Akron to bring home a trophy in pin the tail on the donkey,
00:20:20.400 --> 00:20:29.800
you expect to wait about 10 years on average but the standard deviation is almost another 9 1/2 years on either side.
00:20:29.800 --> 00:20:37.200
You could easily be waiting 20 years, for example, before they bring home their first trophy.
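A minimal Python sketch of the Example 2 numbers, assuming the 10% yearly chance stated in the problem. The variable names are made up for illustration.

```python
import math
from fractions import Fraction

p = Fraction(1, 10)  # chance of winning the championship in any given year
q = 1 - p            # 9/10, chance of not winning

mean_years = 1 / p                            # expected wait in years
variance = q / p ** 2                         # units would be years squared
std_dev_years = math.sqrt(float(variance))    # square root of the variance

print(mean_years)               # 10
print(variance)                 # 90
print(round(std_dev_years, 3))  # 9.487
```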
00:20:37.200 --> 00:20:39.700
Let me remind you how we did each step there.
00:20:39.700 --> 00:20:46.400
First, I identified that this was a geometric distribution because we are waiting for them to bring home their first trophy.
00:20:46.400 --> 00:20:50.500
We are waiting for the first success in a sequence of trials.
00:20:50.500 --> 00:20:57.800
Each year they go out, they play the championships, and there is a 10% chance they win everything.
00:20:57.800 --> 00:21:09.100
That 10% is actually the probability of success, that is why I get the P = 1/10, that came from the 10% there.
00:21:09.100 --> 00:21:12.600
The Q is always 1 – P, I got the 9/10.
00:21:12.600 --> 00:21:20.800
I just read off the mean and variance, and standard deviation formulas from one of the earlier slides in this lecture.
00:21:20.800 --> 00:21:24.100
You can scroll back and you will see where those formulas come from.
00:21:24.100 --> 00:21:27.500
The mean is 1/P that is 10 years.
00:21:27.500 --> 00:21:30.900
We expect to wait about 10 years before we are going to bring home a trophy.
00:21:30.900 --> 00:21:33.000
Very intuitive result, by the way.
00:21:33.000 --> 00:21:40.100
The variance is Q/P², I filled in my Q and my P, simplified that down to 90.
00:21:40.100 --> 00:21:44.700
The units on that would be years² which really is not very meaningful.
00:21:44.700 --> 00:21:46.600
I did not write those in.
00:21:46.600 --> 00:21:50.500
In fact, what we are looking for is the standard deviation.
00:21:50.500 --> 00:21:57.600
That is the square root of the variance, always standard deviation you find it by taking the square root of the variance.
00:21:57.600 --> 00:21:59.900
You always can find it that way.
00:21:59.900 --> 00:22:11.400
That simplified down to 3 √10 and I made a decimal of that; 9.487 years is the standard deviation there.
00:22:11.400 --> 00:22:14.600
In example 3 here, we are going to make things a little more complicated.
00:22:14.600 --> 00:22:22.700
We are setting up a game between you and your friend and it is a fairly simple game but it will give us some interesting probability.
00:22:22.700 --> 00:22:30.000
You just take turns rolling a dice and you get to roll first and then your friend rolls, and then you roll, then your friend rolls.
00:22:30.000 --> 00:22:36.700
Whoever rolls a 6 wins and I guess the other person has to pay them some money or something.
00:22:36.700 --> 00:22:38.900
You are both trying to roll a 6.
00:22:38.900 --> 00:22:42.100
If you roll a 6 right away, then you win.
00:22:42.100 --> 00:22:46.500
If you fail to roll a 6 and then your friend rolls a 6, then your friend wins.
00:22:46.500 --> 00:22:52.500
And then, you just keep going back and forth until the first person rolls a 6.
00:22:52.500 --> 00:22:55.900
We want to ask several questions about this.
00:22:55.900 --> 00:22:59.100
What is the chance that you will win on your third roll?
00:22:59.100 --> 00:23:02.600
In exactly your third roll, you win the game.
00:23:02.600 --> 00:23:06.900
What is the chance that your friend will get to roll 3 times or more?
00:23:06.900 --> 00:23:14.300
What is the chance that you will win overall and that includes you winning on the first roll, maybe on your second roll, third roll, and so on.
00:23:14.300 --> 00:23:16.100
Several different questions here.
00:23:16.100 --> 00:23:23.700
This is a geometric distribution because you can think of you and your friend rolling a dice together,
00:23:23.700 --> 00:23:29.500
you are going to keep rolling the dice until the first 6 comes up, whether it is you or your friend rolls the 6,
00:23:29.500 --> 00:23:32.000
you are going to keep rolling the dice until the first 6 comes up.
00:23:32.000 --> 00:23:38.700
And then, the game is over, you make the payoff or somebody has to wash the dishes.
00:23:38.700 --> 00:23:40.000
You do not roll it anymore.
00:23:40.000 --> 00:23:46.300
That is definitely a geometric distribution because you are waiting for the first success.
00:23:46.300 --> 00:23:55.700
This is geometric, and our P is the chance of getting a 6 on any given roll, that is 1/6.
00:23:55.700 --> 00:24:07.900
Our Q is 1 – P, that is 5/6.
00:24:07.900 --> 00:24:10.200
Let us go ahead and answer these 3 questions.
00:24:10.200 --> 00:24:19.400
I did not really leave myself space on the slide, so hang on to this because I’m going to go to the next slide and answer the questions there.
00:24:19.400 --> 00:24:24.200
The first question is that you will win on your 3rd roll.
00:24:24.200 --> 00:24:28.800
Let me remind you that you get to go first and you roll first and then your friend goes, and then you roll and then your friend goes,
00:24:28.800 --> 00:24:35.700
and then you roll and so on, like that.
00:24:35.700 --> 00:24:47.200
If you are going to win on your third roll, that means that we have to get that 6 on the 5th turn of the game.
00:24:47.200 --> 00:24:54.800
That is exactly what would happen, in order for you to win on your 3rd roll of the game.
00:24:54.800 --> 00:25:01.200
What we are really asking here is the probability that Y is equal to 5.
00:25:01.200 --> 00:25:05.600
Let me remind you of the formula for probability distribution.
00:25:05.600 --> 00:25:14.300
For the geometric distribution, P of Y is Q^(Y-1) × P.
00:25:14.300 --> 00:25:20.500
We already said on the previous slide, we already said that our P for this game is 1/6
00:25:20.500 --> 00:25:24.400
because that is probability of getting a 6 on any particular roll.
00:25:24.400 --> 00:25:27.600
Our Q is just 1 – P, that is 5/6.
00:25:27.600 --> 00:25:40.700
In this case, P of 5 is Q^(Y-1) × P: Q is 5/6, Y is 5 here so Y - 1 is 4, and P is 1/6, which gives (5/6)⁴ × 1/6.
00:25:40.700 --> 00:25:49.200
I can simplify that down a little bit into 5⁴/6⁵.
00:25:49.200 --> 00:25:54.100
That is the probability that you will win on exactly your 3rd roll of the game.
00:25:54.100 --> 00:25:58.500
I did not find the decimal for that, it will be a pretty small number because
00:25:58.500 --> 00:26:02.600
it is not very likely that you will win exactly on the 3rd roll of the game.
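A minimal Python sketch of this first part, assuming as above that your 3rd roll is the 5th roll of the game. The variable names are made up for illustration.

```python
from fractions import Fraction

p = Fraction(1, 6)  # chance of rolling a 6 on any single roll
q = 1 - p           # 5/6, chance of not rolling a 6
y = 5               # your 3rd roll is the 5th roll of the game

win_on_third_roll = q ** (y - 1) * p
print(win_on_third_roll)         # 625/7776, which is 5^4 over 6^5
print(float(win_on_third_roll))  # roughly 0.08
```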
00:26:02.600 --> 00:26:08.100
What is the probability that your friend will roll 3 times or more?
00:26:08.100 --> 00:26:19.600
What that is really asking is this: your friend's 3rd roll is the 6th roll of the game.
00:26:19.600 --> 00:26:27.100
In order to get to that turn, you have to have 6 rolls or more.
00:26:27.100 --> 00:26:37.500
What we are really asking here is the probability that Y is greater than or equal to 6.
00:26:37.500 --> 00:26:42.300
We work that out using a geometric series on one of the earlier slides.
00:26:42.300 --> 00:26:49.000
That is the probability, let me just remind you the probability that Y
00:26:49.000 --> 00:26:57.500
is greater than or equal to any particular value of little Y is Q^(Y-1).
00:26:57.500 --> 00:27:06.400
We worked that out using a geometric series; you can also think about that as, you have to fail Y - 1 times,
00:27:06.400 --> 00:27:12.400
in order to get the success on the yth turn or later.
00:27:12.400 --> 00:27:18.600
In this case, our Q is 5/6.
00:27:18.600 --> 00:27:23.900
Our Y - 1 is 6 - 1, that is 5, so we get (5/6)⁵.
00:27:23.900 --> 00:27:28.300
That does not really simplify so I’m just going to leave that in that form.
00:27:28.300 --> 00:27:41.100
That is our chance that the game will run 6 turns or more in total, which will give your friend a chance to roll 3 times or more.
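A minimal Python sketch of this second part, using the Q^(Y-1) shortcut for "at least Y trials". The variable names are made up for illustration.

```python
from fractions import Fraction

q = Fraction(5, 6)  # chance of not rolling a 6 on a single roll
y = 6               # your friend's 3rd roll is the 6th roll of the game

# The game reaches roll 6 only if the first 5 rolls all fail.
friend_rolls_three_times = q ** (y - 1)
print(friend_rolls_three_times)         # 3125/7776
print(float(friend_rolls_three_times))  # roughly 0.40
```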
00:27:41.100 --> 00:27:45.500
What is the probability that you will win this game?
00:27:45.500 --> 00:27:47.300
That means you can win on any turn.
00:27:47.300 --> 00:27:49.600
This is going to be a little more complicated.
00:27:49.600 --> 00:28:00.400
The probability that you will win means that the game ends either on the first turn or that it ends on the third turn
00:28:00.400 --> 00:28:06.400
because if it ends on the third turn that means you rolled last and you win, or it ends on the 5th turn and so on.
00:28:06.400 --> 00:28:14.000
It really means, what is the probability that this game is going to go an odd number of turns.
00:28:14.000 --> 00:28:22.400
The probability of 1 + the probability of 3, we will fill in each of these, + the probability of 5, and so on.
00:28:22.400 --> 00:28:28.900
That is the probability that you will win this game and that the game will be won on either the first turn,
00:28:28.900 --> 00:28:31.900
or the 3rd turn, or the 5th turn, or so on.
00:28:31.900 --> 00:28:33.800
Let me fill that in.
00:28:33.800 --> 00:28:41.100
The probability of 1, for each one of those, I'm going to use my geometric distribution formula.
00:28:41.100 --> 00:28:47.800
Q^(Y-1) × P, I’m just going to leave it in terms of P and Q for now.
00:28:47.800 --> 00:28:51.700
I will fill in what P and Q are in a moment.
00:28:51.700 --> 00:29:06.200
Q^(Y-1) × P when Y is equal to 1 is Q^0 × P, that is P; plus, when Y = 3, that is Q² × P.
00:29:06.200 --> 00:29:19.900
Y = 5 will give us Q⁴ × P, and Y = 7, I will fill in one more, gives Q⁶ × P, and so on.
00:29:19.900 --> 00:29:37.400
What I see here is a geometric series, this is a geometric series with,
00:29:37.400 --> 00:29:40.200
I want to figure out what the common ratio is between each term.
00:29:40.200 --> 00:29:46.400
I see that to get from the first term to the next one, I’m multiplying by Q².
00:29:46.400 --> 00:29:51.000
To get from that term to the next one, I’m multiplying by Q² and so on.
00:29:51.000 --> 00:29:54.200
It is Q² every time, that is why it is a geometric series.
00:29:54.200 --> 00:29:58.100
My common ratio is Q².
00:29:58.100 --> 00:30:03.800
I can use my formula for the sum of the geometric series.
00:30:03.800 --> 00:30:17.900
Remember my formula, the best way to remember it is in words, first term / (1 - common ratio).
00:30:17.900 --> 00:30:25.700
In this case, the first term is P and the common ratio is Q², so I get P/(1 - Q²).
00:30:25.700 --> 00:30:31.000
I think I’m going to fill in the actual numbers for the P and the Q².
00:30:31.000 --> 00:30:37.900
P where was that, it is up here is 1/6.
00:30:37.900 --> 00:30:42.000
For the 1 - Q², Q was 5/6.
00:30:42.000 --> 00:30:46.100
Q² is 25/36.
00:30:46.100 --> 00:30:53.200
I want to simplify that a bit, we will multiply top and bottom by 36.
00:30:53.200 --> 00:30:55.500
Let me separate those.
00:30:55.500 --> 00:31:00.600
Top and bottom by 36 there, so 36 × the top 36 × the bottom.
00:31:00.600 --> 00:31:02.900
I will simplify things on the bottom.
00:31:02.900 --> 00:31:16.900
That will give me a 6 in the top, and in the bottom I will get 36 - 25, which is 11, so this simplifies down to 6/11.
00:31:16.900 --> 00:31:19.200
That is your chance of winning the game.
00:31:19.200 --> 00:31:22.200
Your chance of winning the game is 6/11.
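If you want to check that 6/11 on a computer, here is a minimal Python sketch. The function names are just ones I made up for illustration; it assumes a fair die, so P = 1/6, and it compares the first term / (1 - common ratio) shortcut against adding up the odd-turn terms directly.

```python
from fractions import Fraction


def first_player_win_probability(p: Fraction) -> Fraction:
    """Sum of q^(y-1) * p over odd y, using first term / (1 - common ratio)."""
    q = 1 - p
    return p / (1 - q**2)


def partial_sum(p: Fraction, num_terms: int) -> Fraction:
    """Add the first num_terms odd-turn probabilities: p + q^2 p + q^4 p + ..."""
    q = 1 - p
    return sum(q ** (2 * k) * p for k in range(num_terms))


exact = first_player_win_probability(Fraction(1, 6))
print(exact)                                    # 6/11
print(float(partial_sum(Fraction(1, 6), 50)))   # creeps up toward 6/11
```

Using exact fractions keeps the answer as 6/11 instead of a rounded decimal, and the partial sum shows the series really does settle on that value.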
00:31:22.200 --> 00:31:25.200
Notice, by the way, that is a little bit more than ½.
00:31:25.200 --> 00:31:31.900
6/12 would be ½, 6/11 is slightly more than ½.
00:31:31.900 --> 00:31:37.800
That makes complete sense because you have a slight advantage in this game.
00:31:37.800 --> 00:31:40.600
The advantage is that you got to roll first.
00:31:40.600 --> 00:31:46.600
You are a little more likely to get a 6 before your friend is.
00:31:46.600 --> 00:31:50.500
It is not a big advantage because 6s are not all that likely when you roll the dice.
00:31:50.500 --> 00:31:53.700
In the long run, it would not really make a difference who got to roll first.
00:31:53.700 --> 00:32:02.900
But you get to pick up a slight advantage by rolling first and that is why 6/11 is slightly bigger than ½.
00:32:02.900 --> 00:32:06.800
Let me go through the steps again, to make sure that everybody was able to follow them.
00:32:06.800 --> 00:32:14.600
The key thing here is we are playing this game where you roll and then your friend rolls, and then you roll and then your friend rolls.
00:32:14.600 --> 00:32:21.300
You want to keep track of the turns of the game, which go sort of odd-even, odd-even, between you and your friend.
00:32:21.300 --> 00:32:27.500
If you are going to win on your third roll, your third roll is the 5th roll of the game.
00:32:27.500 --> 00:32:36.500
That is why I put P of 5 here, and then I used my formula for the geometric distribution to get Q^(Y-1) × P.
00:32:36.500 --> 00:32:45.200
The Y was 5, so that is Q⁴ × P, and then I just simplified that down to 5⁴/6⁵.
00:32:45.200 --> 00:32:52.100
If your friend is going to roll 3 times or more, that means your friend has to get to his third roll.
00:32:52.100 --> 00:33:00.800
His third roll is the 6th roll of the game that is why we are looking at the probability that Y is greater than or equal to 6.
00:33:00.800 --> 00:33:06.300
We worked out the probability generically for the geometric distribution on one of the earlier slides.
00:33:06.300 --> 00:33:13.700
You can go back and check it out, if you have not watched it recently but that probability is Q^(Y-1).
00:33:13.700 --> 00:33:20.800
Another way to think about that, we worked it out using geometric series before but you can also think about that as,
00:33:20.800 --> 00:33:28.200
in order for the game to last 6 or more turns, it means we have to fail on the first 5 turns.
00:33:28.200 --> 00:33:33.900
Fail means we have to not roll a 6 on the first 5 turns.
00:33:33.900 --> 00:33:44.000
The chance of not rolling a 6 is 5/6 and to last 6 turns, we have to roll something else 5 times in a row.
00:33:44.000 --> 00:33:46.300
That is where that answer comes from.
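Here is a small Python sketch of those first two answers, assuming a fair die so that P = 1/6. The helper names are hypothetical, not from the lecture. It computes the probability of finishing before the 6th turn and subtracts that from 1, which should match the Q^(Y-1) shortcut.

```python
from fractions import Fraction


def geometric_pmf(y: int, p: Fraction) -> Fraction:
    """P(Y = y): y - 1 failures followed by one success."""
    return (1 - p) ** (y - 1) * p


def tail_by_complement(y: int, p: Fraction) -> Fraction:
    """P(Y >= y), computed as 1 minus the chance of finishing before turn y."""
    return 1 - sum(geometric_pmf(k, p) for k in range(1, y))


p = Fraction(1, 6)   # chance of rolling a 6
q = 1 - p            # chance of rolling anything else

print(geometric_pmf(5, p))        # 625/7776, which is 5^4/6^5
print(tail_by_complement(6, p))   # 3125/7776
print(q**5)                       # 3125/7776, the shortcut Q^(Y-1) with Y = 6
```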
00:33:46.300 --> 00:33:50.700
Finally, we are asking what is the probability that you will win?
00:33:50.700 --> 00:33:55.500
You get to roll on all odd turns, the first, the third, the fifth, and so on.
00:33:55.500 --> 00:34:05.600
We are asking what is the probability that the game is won on the first turn or on the third turn, or on the fifth turn?
00:34:05.600 --> 00:34:12.100
Adding up those probabilities and each one of those, I use my geometric distribution formula.
00:34:12.100 --> 00:34:18.300
The cool thing is that when I wrote all those out, I noticed that I had a geometric series.
00:34:18.300 --> 00:34:24.000
I had a common ratio of Q² that let me use my geometric series formula.
00:34:24.000 --> 00:34:29.400
By the way, I reminded you of that, the geometric series formula earlier on in this lecture.
00:34:29.400 --> 00:34:32.000
You scroll back a couple of slides, you will see that.
00:34:32.000 --> 00:34:38.400
It is the first term / (1 - common ratio): the first term is P, and the common ratio was Q².
00:34:38.400 --> 00:34:43.200
Then I filled in the numbers for P and Q, that came from up here.
00:34:43.200 --> 00:34:50.500
Fill in the numbers for P and Q down here, did a little bit of simplifying with the fractions, and it came down to 6/11.
00:34:50.500 --> 00:34:58.400
I did not convert that into a decimal but one thing I know for sure is that that is a little bit bigger than 50%,
00:34:58.400 --> 00:35:06.200
which makes sense because you get to roll first, you are a little bit more likely than your friend to win this game.
00:35:06.200 --> 00:35:15.800
It is a very plausible answer that it can come out to be a little bit over 50%.
00:35:15.800 --> 00:35:22.300
Let us move on to example 4, here we have a company that is interviewing applicants for jobs.
00:35:22.300 --> 00:35:27.100
They have a job opening and they are interviewing their applicants.
00:35:27.100 --> 00:35:32.100
In the general population, 10% of the applicants actually possess the right skills.
00:35:32.100 --> 00:35:36.700
Maybe they have to have knowledge of a certain computer application, for example.
00:35:36.700 --> 00:35:38.800
Or they have to have studied probability.
00:35:38.800 --> 00:35:44.300
Only 10% of applicants for this job actually are qualified for the job.
00:35:44.300 --> 00:35:47.300
The company is just going to interview people over and over again,
00:35:47.300 --> 00:35:54.000
until they find 1 person who is qualified for the job and then they are going to hire that person.
00:35:54.000 --> 00:35:59.000
We are asking here the probability that they will interview exactly 10 applicants
00:35:59.000 --> 00:36:06.600
which essentially means that the first 9 people will be bombs and the 10th person is that qualified person,
00:36:06.600 --> 00:36:10.800
and they are going to hire the 10th person.
00:36:10.800 --> 00:36:17.000
Part B here, we are going to calculate the probability that they will interview at least 10 applicants.
00:36:17.000 --> 00:36:24.300
Maybe, they will have to interview 50 people before they find the perfect person for that job but it is at least 10.
00:36:24.300 --> 00:36:32.400
This is a geometric distribution because we have a sequence of trials, ultimately ending in 1 success.
00:36:32.400 --> 00:36:40.200
As soon as we get 1 success, as soon as we find 1 person who is qualified, we hire that person and then we send everybody else away.
00:36:40.200 --> 00:36:43.200
We stop the interview process right there.
00:36:43.200 --> 00:36:48.500
This is a geometric distribution, let me fill in my parameters here.
00:36:48.500 --> 00:36:58.300
The probability of any given person being a worthy applicant for the job is 10%, that is 1/10.
00:36:58.300 --> 00:37:04.300
The Q is always 1 – P, in this case that is 9/10.
00:37:04.300 --> 00:37:10.700
Let me write down some of the formulas that we had earlier on in the lecture because those would be useful for this.
00:37:10.700 --> 00:37:27.300
P of Y is Q^(Y-1) × P and the probability that Y is greater than or equal to any particular value is just Q^(Y-1).
00:37:27.300 --> 00:37:30.500
Let us go ahead and work that out.
00:37:30.500 --> 00:37:37.400
In this case, for part A, we want to know what is the probability that we will interview exactly 10 applicants?
00:37:37.400 --> 00:37:43.400
I have 9 failures and the 10th person is the perfect person for the job.
00:37:43.400 --> 00:37:59.500
That is the probability of getting exactly 10 and from my formula, my Q^(Y-1) × P formula, that is Q⁹ × P, where our Q was 9/10.
00:37:59.500 --> 00:38:15.800
Let me raise that to the 9th power and multiply it by a single power of P, 1/10: that is 9⁹ over 10⁹ × another 10, so 9⁹/10¹⁰.
00:38:15.800 --> 00:38:19.500
I did not try to find the decimal for that, it would again be quite small.
00:38:19.500 --> 00:38:24.300
If you plug that into your calculator, it should be a very small decimal
00:38:24.300 --> 00:38:28.800
because it is quite unlikely that we would have to interview exactly 10 applicants.
00:38:28.800 --> 00:38:36.600
Most likely, we will find somebody earlier than that, or possibly later than that.
00:38:36.600 --> 00:38:40.700
Let us find the probability that they will interview at least 10 applicants.
00:38:40.700 --> 00:38:46.800
That is the probability that Y is greater than or equal to 10.
00:38:46.800 --> 00:38:50.300
Y is the number of people that we interview.
00:38:50.300 --> 00:39:04.300
Using our formula up here, the Q^(Y-1), that is Q⁹, which is (9/10)⁹.
00:39:04.300 --> 00:39:19.900
I did not simplify that, but it will be a bit bigger: 10 times bigger than our previous answer.
00:39:19.900 --> 00:39:26.100
That is our probability that we will interview at least 10 applicants.
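Since I left both answers as fractions, here is a quick Python sketch that fills in the decimals. It assumes the same P = 1/10 and Q = 9/10 from this example; the variable names are just illustrative.

```python
from fractions import Fraction

p = Fraction(1, 10)   # chance that a given applicant is qualified
q = 1 - p             # chance that a given applicant is not

exactly_10 = q**9 * p   # part A: 9 failures, then a success
at_least_10 = q**9      # part B: the first 9 applicants are all failures

print(exactly_10, float(exactly_10))     # 9^9/10^10, about 0.0387
print(at_least_10, float(at_least_10))   # 9^9/10^9, about 0.387
print(at_least_10 / exactly_10)          # 10
```

The last line confirms the remark that part B is 10 times part A, because the only difference between the two is the extra factor of P = 1/10.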
00:39:26.100 --> 00:39:28.400
That is our answer for both of these.
00:39:28.400 --> 00:39:33.500
By the way, we are going to hang onto this example for the next problem.
00:39:33.500 --> 00:39:37.400
Do not let these numbers and the whole situation completely slip your mind.
00:39:37.400 --> 00:39:41.800
In the meantime, let me quickly remind you where everything came from here.
00:39:41.800 --> 00:39:47.300
In part A, we want to find the probability that they will interview exactly 10 applicants.
00:39:47.300 --> 00:39:57.400
It is a geometric distribution, so I’m using my geometric distribution formula right here, Q^(Y-1) × P.
00:39:57.400 --> 00:40:04.100
Our Y here is 10 and we have Q⁹ × P.
00:40:04.100 --> 00:40:08.700
The values of P and Q, I got the P from this 10% right here,
00:40:08.700 --> 00:40:15.800
that is the probability that any given applicant will be a success and Q is just 1 – that, that is 9/10.
00:40:15.800 --> 00:40:19.200
Drop those numbers in and simplify it down.
00:40:19.200 --> 00:40:25.000
In part B, we want the probability of interviewing at least 10 applicants.
00:40:25.000 --> 00:40:29.800
At least 10 means the probability that it will be greater than or equal to 10.
00:40:29.800 --> 00:40:43.100
Using the formula we derived back in one of the earlier slides, several slides ago, the beginning of this video, it is Q^(Y-1), that is Q⁹.
00:40:43.100 --> 00:40:48.200
The way to think about that is, in order to interview at least 10 applicants,
00:40:48.200 --> 00:40:51.600
that means the first 9 applicants are failures.
00:40:51.600 --> 00:40:59.400
Each one of those 9 people has a 9/10 chance of being a failure, that means we have to see 9 failures in a row,
00:40:59.400 --> 00:41:03.600
in order to ensure that we end up talking to at least 10 people.
00:41:03.600 --> 00:41:08.400
Like I said, hang onto these numbers for the next example because
00:41:08.400 --> 00:41:15.300
we are going to use the same scenario for the next example, for example 5.
00:41:15.300 --> 00:41:19.200
In example 5, we are going to look back at the company from example 4.
00:41:19.200 --> 00:41:28.900
If you have not just watched example 4, you really need to go back and read that one before example 5 will make sense.
00:41:28.900 --> 00:41:35.300
Check out example 4, there was a company interviewing applicants for a job opening and
00:41:35.300 --> 00:41:38.700
each applicant has a 10% chance of being selected.
00:41:38.700 --> 00:41:47.500
We interview and interview until we find a good one and then we keep that person, and we stop interviewing.
00:41:47.500 --> 00:41:52.300
What we are doing in example 5 is we are keeping track of how long this procedure will take.
00:41:52.300 --> 00:42:10.700
Apparently, it takes 3 hours to interview an unqualified applicant and 5 hours to interview a qualified applicant.
00:42:10.700 --> 00:42:17.900
All of these people that do not meet the qualifications are going to take 3 hours each for us to figure out
00:42:17.900 --> 00:42:23.000
that these people are actually bombs and do not deserve to be here.
00:42:23.000 --> 00:42:26.700
And then, we finally get somebody that we think is qualified, we are going to interview them
00:42:26.700 --> 00:42:31.600
for an extra 2 hours just to make sure that they really are the right person for this job.
00:42:31.600 --> 00:42:37.600
We want to calculate the mean and the standard deviation of the time to conduct all the interviews.
00:42:37.600 --> 00:42:43.300
How long do we expect this interview process to take at this company?
00:42:43.300 --> 00:42:46.700
Let me show you how to set that up, we have not really seen a problem like this before.
00:42:46.700 --> 00:42:51.100
This is our first one.
00:42:51.100 --> 00:42:58.500
Let me set up a variable that represents time here, T is going to be the time.
00:42:58.500 --> 00:43:12.500
Remember, Y is the number of applicants that we speak to.
00:43:12.500 --> 00:43:19.900
Remember, the deal here is we are going to keep interviewing until we find somebody good.
00:43:19.900 --> 00:43:25.900
That means if we find somebody good on the 16th try, that means we interviewed 15 people
00:43:25.900 --> 00:43:30.200
who did not measure up and then number 16 was the good one.
00:43:30.200 --> 00:43:38.500
In general T is, we have all the people who do not measure up, there is Y -1 of them.
00:43:38.500 --> 00:43:42.400
Each one of those people cost us 3 hours each.
00:43:42.400 --> 00:43:47.500
They cost us 3 hours to find out that those people did not actually deserve the job.
00:43:47.500 --> 00:43:54.300
The last person, the person who is good that we actually want to give the job to,
00:43:54.300 --> 00:43:57.100
we have to do some extra scrutiny on that person.
00:43:57.100 --> 00:44:04.500
It took us 5 hours to interview her because we wanted to make extra sure that she was really qualified for the job.
00:44:04.500 --> 00:44:08.500
The total time is 3 × (Y - 1) + 5.
00:44:08.500 --> 00:44:21.300
We can simplify that a bit, that is 3Y - 3 + 5, which is 3Y + 2.
00:44:21.300 --> 00:44:29.100
That is the total time that it takes and we want to find the expected value, the mean of that, and the standard deviation.
00:44:29.100 --> 00:44:40.400
Let me calculate first the expected value and the variance of Y because those are going to be useful intermediate steps.
00:44:40.400 --> 00:44:43.200
Let me remind you here what our P was.
00:44:43.200 --> 00:44:47.100
Our P was 1/10 for this problem.
00:44:47.100 --> 00:44:53.000
That is because 10% of the applicants have the right qualifications.
00:44:53.000 --> 00:45:01.800
Our Q is always 1 – P, it is 9/10 here.
00:45:01.800 --> 00:45:11.900
Our expected value of Y, what we learned at the beginning of this lecture is that it is always 1/P.
00:45:11.900 --> 00:45:18.100
In this case, it is always 1/P.
00:45:18.100 --> 00:45:24.300
1/(1/10) is just 10.
00:45:24.300 --> 00:45:31.900
Let me go ahead and find variance of Y because that is going to be useful as a steppingstone to finding the standard deviation.
00:45:31.900 --> 00:45:36.500
The variance of Y is always Q/P².
00:45:36.500 --> 00:45:40.000
Again, it is coming from one of the first slides in this lecture.
00:45:40.000 --> 00:45:42.800
You can scroll back and you can find that.
00:45:42.800 --> 00:45:49.900
In this case, the Q is 9/10, P² is 1/100.
00:45:49.900 --> 00:45:58.400
If we do a flip on the denominator, flip that up, we get 900/10, that is 90.
00:45:58.400 --> 00:46:01.600
That is the variance of Y.
00:46:01.600 --> 00:46:06.200
What we really want is the mean and standard deviation of T.
00:46:06.200 --> 00:46:09.000
Let me go ahead and figure those out for T.
00:46:09.000 --> 00:46:14.500
We want the expected value of T, and that is the expected value of 3Y + 2,
00:46:14.500 --> 00:46:24.200
since T was 3Y + 2, and it is time to remember some properties of expectation.
00:46:24.200 --> 00:46:26.500
In particular, expectation is linear.
00:46:26.500 --> 00:46:32.300
You can write this as 3 × E of Y + 2.
00:46:32.300 --> 00:46:35.400
We can pull the 2 out because expectation is linear.
00:46:35.400 --> 00:46:45.300
E of Y was 10, so this is 3 × 10 + 2, which is 32.
00:46:45.300 --> 00:46:49.400
That is the expected amount of time to conduct all these interviews.
00:46:49.400 --> 00:46:52.400
Our unit here is hour, let me go ahead and fill that in.
00:46:52.400 --> 00:47:01.300
32 hours is the expected amount of time to conduct all these interviews.
00:47:01.300 --> 00:47:08.000
The variance is more complicated and I want to remind you of an old rule in probability.
00:47:08.000 --> 00:47:20.200
The variance of AY + B is equal to A² × the variance of Y.
00:47:20.200 --> 00:47:22.200
There is no B in the answer.
00:47:22.200 --> 00:47:29.500
That is an old rule in probability that is very useful and we are going to invoke it right here, the variance of AY + B.
00:47:29.500 --> 00:47:33.700
The B does not affect it, that is shifting all the data over.
00:47:33.700 --> 00:47:35.800
It does not affect how much they vary.
00:47:35.800 --> 00:47:38.400
You get A² × V of Y.
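In case you want to see where that rule comes from, here is a short derivation sketch. It assumes only the definition of variance, V(Y) = E[(Y - E(Y))²], and the linearity of expectation; the lowercase a and b are the A and B from the rule.

```latex
\begin{align*}
V(aY + b) &= E\big[\big((aY + b) - E(aY + b)\big)^2\big] \\
          &= E\big[\big(aY + b - aE(Y) - b\big)^2\big] \\
          &= E\big[a^2\,\big(Y - E(Y)\big)^2\big] \\
          &= a^2\,V(Y).
\end{align*}
```

The b cancels in the second line, which is why there is no B in the answer.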
00:47:38.400 --> 00:47:52.100
The variance of T here is the variance of 3 Y + 2 which is now, if I use my A is equal to 3
00:47:52.100 --> 00:47:57.500
and my B is equal to 2 then I get A² × V of Y.
00:47:57.500 --> 00:48:02.300
But A² is 9, so that is 9 × the variance of Y.
00:48:02.300 --> 00:48:05.300
I figured out up here that the variance of Y was 90.
00:48:05.300 --> 00:48:11.900
That is 9 × 90 which is 810, that is the variance.
00:48:11.900 --> 00:48:15.900
It is not the standard deviation, we are trying to find the standard deviation.
00:48:15.900 --> 00:48:23.400
This was variance, this is the mean that we found up above here.
00:48:23.400 --> 00:48:28.700
What I really want is the standard deviation.
00:48:28.700 --> 00:48:40.400
The standard deviation is the square root of the variance of T which is √810.
00:48:40.400 --> 00:48:48.000
I can factor 81 out of that, it is a perfect square so I get 9 × √10.
00:48:48.000 --> 00:48:59.500
I did calculate the decimal for that: 9√10 is about 28.46.
00:48:59.500 --> 00:49:06.000
Since, this is a standard deviation my units there would be hours.
00:49:06.000 --> 00:49:11.000
If you are a company and you are planning this interview process, you know that on average,
00:49:11.000 --> 00:49:14.000
about 1 in 10 applicants is going to have the right skills.
00:49:14.000 --> 00:49:21.800
On average, it will take about 32 hours to find the right employee for the job.
00:49:21.800 --> 00:49:28.700
The standard deviation on that figure is 28.46 hours.
00:49:28.700 --> 00:49:37.000
I guess that was an approximation, I should be clear that that was a calculator approximation right there.
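To double check those numbers, here is a minimal Python sketch of the whole calculation, assuming T = 3Y + 2 with P = 1/10 as in this example. The variable names are mine, and the cutoff at 2000 terms in the sanity check is an arbitrary choice that is far more than enough for the series to settle.

```python
import math
from fractions import Fraction

p = Fraction(1, 10)   # chance that a given applicant is qualified
q = 1 - p

mean_y = 1 / p        # E(Y) = 1/P = 10 applicants
var_y = q / p**2      # V(Y) = Q/P^2 = 90

a, b = 3, 2                       # T = 3Y + 2
mean_t = a * mean_y + b           # linearity of expectation: 32 hours
var_t = a**2 * var_y              # V(aY + b) = a^2 V(Y): 810
sd_t = math.sqrt(float(var_t))    # about 28.46 hours

# Sanity check: add up (3y + 2) * P(Y = y) directly, cutting the series off at y = 2000.
series_mean = sum((a * y + b) * 0.9 ** (y - 1) * 0.1 for y in range(1, 2001))

print(mean_t, var_t, round(sd_t, 2))   # 32 810 28.46
print(round(series_mean, 6))           # 32.0
```

The direct sum agrees with 3 × E(Y) + 2, which is a nice confirmation that linearity gave the right mean.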
00:49:37.000 --> 00:49:38.900
Let me show you how I did that.
00:49:38.900 --> 00:49:40.900
It was one of the trickier problems here.
00:49:40.900 --> 00:49:49.000
What I wanted to do was write a formula for the amount of time it takes to conduct all the interviews.
00:49:49.000 --> 00:49:54.700
Think of Y as being the number of applicants and T is the time we spend on them.
00:49:54.700 --> 00:50:01.800
What we are doing is all the unqualified people that we speak to, we spent 3 hours each on them.
00:50:01.800 --> 00:50:06.500
Remember, the last person we talk to is the person we hire.
00:50:06.500 --> 00:50:10.600
As soon as we find a good person, we hire her.
00:50:10.600 --> 00:50:16.800
That means all the previous people were unqualified, that is Y -1 people.
00:50:16.800 --> 00:50:21.200
We spend 3 hours on each, which is why we have the 3 × (Y - 1) there.
00:50:21.200 --> 00:50:29.000
The last person cost us 5 hours because we want to give that last person some extra scrutiny
00:50:29.000 --> 00:50:32.500
and make sure that that person really is qualified for the job.
00:50:32.500 --> 00:50:38.400
We get 3 × (Y - 1) + 5, and that simplifies down to 3Y + 2.
00:50:38.400 --> 00:50:44.000
We want to find the mean, the variance, and the standard deviation of T, which is 3Y + 2.
00:50:44.000 --> 00:50:50.400
To find that, we need the mean and variance of Y itself.
00:50:50.400 --> 00:51:01.200
Using those formulas that I gave you on one of the first slides in this lecture, the mean and the variance are 1/P and Q/P².
00:51:01.200 --> 00:51:06.700
The P was 1/10, that was coming from example 4, the previous slide.
00:51:06.700 --> 00:51:15.700
Because 10% of the applicants are qualified, your chance of getting a qualified applicant is 1/10.
00:51:15.700 --> 00:51:18.200
The Q is just 1 - P, which is 9/10.
00:51:18.200 --> 00:51:26.200
We drop those numbers in here and you get the mean and variance of Y as being 10 and 90.
00:51:26.200 --> 00:51:32.700
To find the mean and the variance of T, to find the mean, we are going to use linearity.
00:51:32.700 --> 00:51:36.600
The mean is linear, it is 3 × the mean of Y + 2.
00:51:36.600 --> 00:51:39.800
3 × 10 + 2 is 32 hours.
00:51:39.800 --> 00:51:48.100
The variance is not linear, we have this rule right here which tells us what to do with linear expressions in the variance.
00:51:48.100 --> 00:51:57.700
The variance of 3Y + 2, the 2 is the B, but it turns out that that had no effect on the answer
00:51:57.700 --> 00:52:01.200
because there is no B up here, it is just A² × V of Y.
00:52:01.200 --> 00:52:12.000
It is 9 × the variance of Y, 9 × that is our 90 right here, it is coming in here, we get 810.
00:52:12.000 --> 00:52:15.400
And the standard deviation is always the square root of the variance.
00:52:15.400 --> 00:52:24.800
We take √810, and I simplified that down and found the decimal to be about 28.46 hours.
00:52:24.800 --> 00:52:30.800
That is our mean and our standard deviation of the time to conduct all these interviews,
00:52:30.800 --> 00:52:37.700
if you are in this company planning for how long it might take to fill your next job opening.
00:52:37.700 --> 00:52:42.800
That is our last example, that wraps up the geometric distribution.
00:52:42.800 --> 00:52:46.700
You are watching the probability lectures here on www.educator.com.
00:52:46.700 --> 00:52:50.000
My name is Will Murray, thank you very much for watching, bye.