WEBVTT mathematics/statistics/son
00:00:00.000 --> 00:00:01.600
Hi and welcome to www.educator.com.
00:00:01.600 --> 00:00:03.700
Today we are going to talk about the t distribution.
00:00:03.700 --> 00:00:11.100
Previously, we learned that there are different situations where we use z and where we use t.
00:00:11.100 --> 00:00:15.600
Today we are going to talk about when to use z versus t.
00:00:15.600 --> 00:00:21.900
We are going to break things down, sort of reflect, and recognize: what are z and t?
00:00:21.900 --> 00:00:24.600
What do they have in common and what is different about them?
00:00:24.600 --> 00:00:30.700
For certain cases we are going to ask the question: why not z, why t instead?
00:00:30.700 --> 00:00:33.300
What does z not have?
00:00:33.300 --> 00:00:35.600
What is deficient about z?
00:00:35.600 --> 00:00:43.900
We will talk about the rules of t distributions; they follow certain patterns, and t distributions
00:00:43.900 --> 00:00:48.900
are a family of distributions separated by degrees of freedom.
00:00:48.900 --> 00:00:53.200
Different t distributions have different degrees of freedom.
00:00:53.200 --> 00:00:56.100
We are going to talk about what degrees of freedom are.
00:00:56.100 --> 00:01:05.500
We are going to talk about how degrees of freedom relate to that family of t distributions, and then finally summarize how to find t.
00:01:05.500 --> 00:01:12.200
First off, when do we use z versus t?
00:01:12.200 --> 00:01:20.400
We covered this in the previous sections, where we looked at whether we knew the population parameters or not.
00:01:20.400 --> 00:01:30.900
In hypothesis testing, we frequently do not know the μ of the population, but sometimes we are given sigma for some reason or another.
00:01:30.900 --> 00:01:41.700
In this case we use z in order to figure out how many standard errors away from the mean we are in our SDOM.
00:01:41.700 --> 00:01:47.500
But in other situations, we do not know what sigma is.
00:01:47.500 --> 00:01:58.100
In that case we use t in order to figure out how many standard errors away our x bar is from our μ.
00:01:58.100 --> 00:02:05.700
Just to draw that picture for you: remember, we are interested in the SDOM (the sampling distribution of the mean) because the SDOM tends to be normal given certain conditions.
00:02:05.700 --> 00:02:27.100
Although μ sub x bar = μ given the CLT, what we often want to know is: what if we have an x bar that falls here, or an x bar that falls here?
00:02:27.100 --> 00:02:33.200
We want to know how far away it is from the μ sub x bar.
00:02:33.200 --> 00:02:44.700
In order to find that, we would not just use the raw score and get the raw distance; we would want that distance in terms of standard deviations.
00:02:44.700 --> 00:02:47.800
But because this is the SDOM, we call it the standard error.
00:02:47.800 --> 00:03:00.700
We would want either a z or a t, and these numbers tell us how many standard errors away we are from this point right at the μ.
00:03:00.700 --> 00:03:06.500
What are z and t?
00:03:06.500 --> 00:03:28.200
The commonality, as we saw before, is that each tells us the number of standard errors away from μ sub x bar; that is common to both.
00:03:28.200 --> 00:03:33.600
That is what the z score and t score both have in common.
00:03:33.600 --> 00:03:37.500
Because of that, their formulas look very much the same.
00:03:37.500 --> 00:03:50.700
For instance, one way we can write the z formula is like this.
00:03:50.700 --> 00:04:10.800
We have x bar - μ (or μ sub x bar; they are the same) and this gives us the distance in terms of just the raw values.
00:04:10.800 --> 00:04:17.800
Just how many inches away, points away, whatever it is.
00:04:17.800 --> 00:04:24.500
Whatever your raw score means, that distance gets divided by the standard error.
00:04:24.500 --> 00:04:35.600
If we double-click on that standard error and look at what is inside, then the standard error, also written as sigma sub x bar
00:04:35.600 --> 00:04:43.700
because it is the standard deviation of a whole bunch of means, equals sigma ÷ √n.
00:04:43.700 --> 00:04:54.500
If we look at the t score formula then we have almost the same formula.
00:04:54.500 --> 00:05:02.400
We have that distance ÷ how big your little steps are, how big your standard deviations are.
00:05:02.400 --> 00:05:11.500
But when we double-click on the standard error like something on the desktop, you double-click it and open it up what is inside?
00:05:11.500 --> 00:05:22.800
Well, you could also write this one as s sub x bar and that would be s ÷ √n.
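The two formulas can be put side by side in a short sketch. This is only an illustration with made-up numbers (the sample, μ, and sigma are all hypothetical), written in Python rather than the Excel used later in this lesson:

```python
import math
import statistics

# Hypothetical sample of n = 4 measurements (illustration only).
sample = [98, 104, 101, 97]
n = len(sample)
xbar = statistics.mean(sample)   # sample mean, x bar = 100.0
mu = 98                          # hypothesized population mean

# z: the standard error is built from the TRUE population sigma.
sigma = 5                        # assumed known for the z case
z = (xbar - mu) / (sigma / math.sqrt(n))

# t: same distance, but the standard error uses s, the sample's
# estimate of the population standard deviation (n - 1 divisor).
s = statistics.stdev(sample)
t = (xbar - mu) / (s / math.sqrt(n))

print(z, t)  # t > z here because s happens to be smaller than sigma
```

The only difference between the two lines is which standard deviation sits inside the standard error, exactly as the lecture describes.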
00:05:22.800 --> 00:05:25.500
Herein lies the difference, right there.
00:05:25.500 --> 00:05:27.500
That is our difference.
00:05:27.500 --> 00:05:45.500
Here, one standard error is found using sigma, the true population standard deviation.
00:05:45.500 --> 00:06:10.700
Obviously, if you use the real deal, that is better and more accurate than the standard error found using the estimated population standard deviation.
00:06:10.700 --> 00:06:15.400
That is s.
00:06:15.400 --> 00:06:22.900
s is estimated from the sample, and if we double-clicked on s it would look like this.
00:06:22.900 --> 00:06:35.500
It is that basic idea of all the squared deviations away from x bar, away from the mean of the sample.
00:06:35.500 --> 00:06:41.700
(x sub i - x bar)².
00:06:41.700 --> 00:06:54.600
We have all the squared deviations and we add them up, ÷ (n - 1), because this is our estimate of the population standard deviation,
00:06:54.600 --> 00:07:01.400
and all of that under the square root sign in order to just leave us a standard deviation rather than variance.
00:07:01.400 --> 00:07:06.500
This is an estimate of population standard deviation.
00:07:06.500 --> 00:07:09.900
It is not the real deal, so it is not as accurate.
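That formula for s can be checked directly. Here is a small sketch with a hypothetical set of scores, summing the squared deviations, dividing by n - 1, and taking the square root; the standard library's `statistics.stdev` uses the same n - 1 divisor:

```python
import math
import statistics

# Hypothetical sample scores (illustration only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
xbar = sum(scores) / n  # sample mean, x bar = 5.0

# Sum of all squared deviations (x_i - x bar)^2, divided by n - 1,
# all under the square root: s, the estimate of the population
# standard deviation.
s = math.sqrt(sum((x - xbar) ** 2 for x in scores) / (n - 1))

# The stdlib sample standard deviation uses the same n - 1 divisor.
assert math.isclose(s, statistics.stdev(scores))
print(s)
```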
00:07:09.900 --> 00:07:18.200
One thing you should know is that the z score is less variable and the t score is going to be more variable.
00:07:18.200 --> 00:07:22.700
That is going to bear on which one we use when.
00:07:22.700 --> 00:07:28.600
Okay, so why not z?
00:07:28.600 --> 00:07:37.900
When we have situations where we do not have the population standard deviation, why not z?
00:07:37.900 --> 00:07:45.700
Why can we not just use z with s; why can we not do that?
00:07:45.700 --> 00:07:48.600
Why do we use t?
00:07:48.600 --> 00:07:54.400
It is because when we use s, something a little bit weird happens.
00:07:54.400 --> 00:08:01.200
The weirdness comes from the fact that this s is much more variable than sigma.
00:08:01.200 --> 00:08:05.600
Sometimes when we get our estimate, our estimate is spot on.
00:08:05.600 --> 00:08:09.000
Sometimes when we get our estimate it is off.
00:08:09.000 --> 00:08:11.400
That is what we mean when it is more variable.
00:08:11.400 --> 00:08:14.700
It is not going to hit the nail on the head every single time.
00:08:14.700 --> 00:08:16.700
It is going to vary in its accuracy.
00:08:16.700 --> 00:08:21.900
Now z scores are normally distributed when SDOM is normal.
00:08:21.900 --> 00:08:23.500
Here is what this means.
00:08:23.500 --> 00:08:36.000
The way you can think about it is like this: when the SDOM is normal and we pick a bunch of points out
00:08:36.000 --> 00:08:43.200
and find the z scores for those points and plot those, we will get another normal distribution.
00:08:43.200 --> 00:08:50.600
But that is not necessarily the case for s.
00:08:50.600 --> 00:09:07.300
Here we need to know that z scores are nice because a z score will perfectly cut off that normal distribution accurately for you.
00:09:07.300 --> 00:09:16.800
Remember, the normal distribution always has that probability underneath the curve, and it has these little marks.
00:09:16.800 --> 00:09:23.000
These can be set in terms of z scores.
00:09:23.000 --> 00:09:39.200
What is nice about the SDOM when it is normal is that when we have the z score it will perfectly match to the proportion of the curve that it covers.
00:09:39.200 --> 00:09:41.500
This will always match.
00:09:41.500 --> 00:09:47.300
The problem is t scores do not match up in this way.
00:09:47.300 --> 00:10:00.200
We might say: why do we not just call the t score a z score and still use the same areas underneath the curve?
00:10:00.200 --> 00:10:02.900
We cannot do that, because that would be just a superficial change.
00:10:02.900 --> 00:10:10.800
Here is what we mean by the z scores are normally distributed.
00:10:10.800 --> 00:10:19.500
When you get z scores and when we talk about normal distribution, I'm not just talking about that bell shaped curve.
00:10:19.500 --> 00:10:28.000
Yes, overall it should have that general bell shape, but it is a little more specific than that.
00:10:28.000 --> 00:10:35.000
You can have the bell shape without having the perfect normal distribution.
00:10:35.000 --> 00:10:45.600
For instance, 1 standard deviation away, this area will give you 34% of the area underneath the curve.
00:10:45.600 --> 00:10:52.800
This area is about 14% and this area is about 2%.
00:10:52.800 --> 00:10:55.700
That is a true normal distribution.
00:10:55.700 --> 00:11:02.000
This on the other hand, it looks on the surface as if it is normally distributed.
00:11:02.000 --> 00:11:05.300
It looks like that bell shaped curve, but it is not.
00:11:05.300 --> 00:11:06.300
Here is why.
00:11:06.300 --> 00:11:16.900
This area, I should have actually drawn it a little bit differently, but I want to show you: do not go by appearances.
00:11:16.900 --> 00:11:18.000
Appearances can be deceiving.
00:11:18.000 --> 00:11:25.200
This might actually be a little bit less than 34%.
00:11:25.200 --> 00:11:27.700
It might be something like 25%.
00:11:27.700 --> 00:11:40.000
If that was the case, you would see this area and that area is not 34%.
00:11:40.000 --> 00:11:41.600
It is 25%.
00:11:41.600 --> 00:11:50.300
Not only that, but this area is now a little bit more than 13 ½%, around 14%.
00:11:50.300 --> 00:11:55.200
Now this area is not 2% but 11%.
00:11:55.200 --> 00:12:01.700
Although it looks like a bell shaped curve, it is not quite a normal distribution because
00:12:01.700 --> 00:12:05.900
it does not follow that empirical rule that we have talked about before.
00:12:05.900 --> 00:12:10.900
What is nice about z scores is that z scores will always fall in this pattern.
00:12:10.900 --> 00:12:14.900
These z scores will always correspond to these numbers.
00:12:14.900 --> 00:12:19.600
That is why you could always use that z table in the back and rely on it.
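Those fixed areas can be verified with Python's standard library, where `statistics.NormalDist` plays the role of the z table in the back of the book:

```python
from statistics import NormalDist

z = NormalDist()  # the standard normal: mean 0, standard deviation 1

a1 = z.cdf(1) - z.cdf(0)  # area from 0 to 1 SD: about 34%
a2 = z.cdf(2) - z.cdf(1)  # area from 1 to 2 SD: about 13.5%
a3 = z.cdf(3) - z.cdf(2)  # area from 2 to 3 SD: about 2%

print(round(a1, 3), round(a2, 3), round(a3, 3))  # 0.341 0.136 0.021
```

Those are exactly the 34%, 13 ½%, and 2% chunks of the empirical rule: they never change for z, which is why the z table is always reliable.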
00:12:19.600 --> 00:12:25.800
The t scores are not going to do that for you.
00:12:25.800 --> 00:12:32.800
T scores may not give you that perfect 34, 13 ½ and 2% sort of distribution.
00:12:32.800 --> 00:12:42.000
Even though the SDOM might be normal, the t scores are not necessarily normal.
00:12:42.000 --> 00:12:51.900
We have this normal thing and we have t scores; how do we go from t scores to defining this area underneath the curve?
00:12:51.900 --> 00:12:54.100
That is the problem we have here.
00:12:54.100 --> 00:13:01.800
It turns out that if n is big then this does not matter as much.
00:13:01.800 --> 00:13:07.800
If n is really large, if your sample size is large, then the t distribution approximates normal.
00:13:07.800 --> 00:13:15.400
It goes towards normal but when n is small, then you have to worry.
00:13:15.400 --> 00:13:21.000
Also, when n is in the middle, it is neither small nor really large.
00:13:21.000 --> 00:13:29.900
There are all these situations where you have to worry about the t as well as the area underneath the curve.
00:13:29.900 --> 00:13:36.400
If the t scores are not normally distributed then we cannot calculate the area underneath the curve.
00:13:36.400 --> 00:13:51.900
If we have our lovely SDOM and we know that the SDOM is nice and normal and we have our μ sub x bar here then everything is fine and dandy.
00:13:51.900 --> 00:13:57.900
We have x bar here and we want to find that distance, and we find the t score.
00:13:57.900 --> 00:14:04.400
The problem is we cannot translate from this directly into this area.
00:14:04.400 --> 00:14:06.700
That is the problem we ran into.
00:14:06.700 --> 00:14:25.300
Here what we see is something more like a t distribution than a z distribution.
00:14:25.300 --> 00:14:31.400
I am just going to call z distributions what they basically are: the normal distribution.
00:14:31.400 --> 00:14:36.300
The t distribution is often a little bit smooched.
00:14:36.300 --> 00:14:39.900
Think of having that perfect normal bell shape.
00:14:39.900 --> 00:14:42.400
It is squishing the top of it down.
00:14:42.400 --> 00:14:47.800
It makes that shape ball out a little bit.
00:14:47.800 --> 00:14:54.700
It is not as sharply peaked but a little bit more variable.
00:14:54.700 --> 00:15:02.200
We had said the s is more variable than the sigma.
00:15:02.200 --> 00:15:10.900
It makes sense that the t, which comes from s, is more variable than the z, which comes from sigma.
00:15:10.900 --> 00:15:21.600
You might be thinking: what, are we stuck?
00:15:21.600 --> 00:15:23.800
We are not stuck and here is why.
00:15:23.800 --> 00:15:28.500
A statistician named William Sealy Gosset actually worked out the t distributions.
00:15:28.500 --> 00:15:44.200
He manually calculated a lot of the t distributions and made tables of the t distributions that we still use today.
00:15:44.200 --> 00:15:48.900
He published those tables under the pseudonym 'Student'.
00:15:48.900 --> 00:15:57.200
At the time he was working for the Guinness brewery, and he could not publish under his own name because they were sort of like, we do not want anyone to know what we are doing.
00:15:57.200 --> 00:16:00.200
Our secret is our very dark beer.
00:16:00.200 --> 00:16:06.700
He published under the pseudonym, and because of that, some of the t distribution tables
00:16:06.700 --> 00:16:13.200
in the back of your book may be labeled Student's t, referring to Gosset's t.
00:16:13.200 --> 00:16:20.400
Here is what Gosset found: he found that t distributions can be reliable too.
00:16:20.400 --> 00:16:28.100
You can know about them; it is just that you need more information than you need for the z distribution.
00:16:28.100 --> 00:16:30.600
For z distribution you do not need to know anything.
00:16:30.600 --> 00:16:33.200
You just need to know z and it will give you the probability.
00:16:33.200 --> 00:16:35.400
Life is simple.
00:16:35.400 --> 00:16:42.000
T distributions are not that simple, but not that complicated either.
00:16:42.000 --> 00:16:51.700
They have a few more conditions to satisfy, and the biggest condition that you will have to know is about degrees of freedom.
00:16:51.700 --> 00:17:04.000
Because for each degree of freedom there is a slightly different t distribution that goes along with it.
00:17:04.000 --> 00:17:10.800
Let us talk about some of the rules that govern t distributions.
00:17:10.800 --> 00:17:18.900
The first one you already know: the t distribution gets more normal as n gets bigger.
00:17:18.900 --> 00:17:22.100
This makes sense if we step back and think about it for a second.
00:17:22.100 --> 00:17:34.100
Imagine if your sample size n were the size of the entire population; then what would your s be?
00:17:34.100 --> 00:17:51.300
If your sample is like the entire population, then s should be much closer to the actual
00:17:51.300 --> 00:17:56.700
population standard deviation much better than when n is small.
00:17:56.700 --> 00:18:04.700
It is still a little off because of the n-1 thing but it is very close and that is the closest you can get.
00:18:04.700 --> 00:18:31.500
T distributions become more normal as n gets bigger because s is a better estimate of sigma as n gets bigger.
00:18:31.500 --> 00:18:33.500
That makes sense.
00:18:33.500 --> 00:18:36.400
The problem all stems from s.
00:18:36.400 --> 00:18:47.800
It is s's variability: as s gets better, less variable and more accurate to the population, t gets better.
00:18:47.800 --> 00:18:50.300
T is based on s.
00:18:50.300 --> 00:18:54.900
That is why t distributions become more normal as n gets bigger.
00:18:54.900 --> 00:18:58.300
T distributions are a family of distributions.
00:18:58.300 --> 00:19:00.200
It is not just one distribution.
00:19:00.200 --> 00:19:04.700
It is a whole bunch of them that are alike in some way and it depends on n.
00:19:04.700 --> 00:19:14.400
It depends technically on degrees of freedom, but you can say it depends on n, because degrees of freedom is often n - 1.
00:19:14.400 --> 00:19:19.400
There are other kinds of degrees of freedom; this is the one you need to know for now.
00:19:19.400 --> 00:19:23.200
But later on we will distinguish between different kinds of degrees of freedom.
00:19:23.200 --> 00:19:33.900
Degrees of freedom is actually important as a general idea; here it is just the number of data points minus 1.
00:19:33.900 --> 00:19:37.700
We have a family of distributions.
00:19:37.700 --> 00:19:39.300
They all look sort of alike.
00:19:39.300 --> 00:19:49.700
They are all symmetrical, they are unimodal, and they have that bell-like shape, but they are not quite normal.
00:19:49.700 --> 00:19:51.300
Not all of them.
00:19:51.300 --> 00:20:01.000
As n gets bigger, or as degrees of freedom gets bigger the distribution becomes more and more normal.
00:20:01.000 --> 00:20:06.800
Let us step back and talk a little bit about degrees of freedom first.
00:20:06.800 --> 00:20:13.500
Let us assume there are three subjects in one sample, so n = 3.
00:20:13.500 --> 00:20:24.500
We know, just by blindly applying the formula n - 1, that degrees of freedom is 2, but what does this mean?
00:20:24.500 --> 00:20:27.900
Here is the thing.
00:20:27.900 --> 00:20:34.700
Let us assume there are three subjects in one sample and let us say it is some score on a statistics test.
00:20:34.700 --> 00:20:44.100
They can score from 0 to 100, and if I say pick any 3 scores you want, those could be the subjects' scores.
00:20:44.100 --> 00:20:46.500
Your degrees of freedom would be 3.
00:20:46.500 --> 00:20:48.700
You are free to choose any 3 scores.
00:20:48.700 --> 00:20:50.500
You are not limited.
00:20:50.500 --> 00:20:52.900
You are not restricted in any way.
00:20:52.900 --> 00:21:00.800
If you figure out any sample statistic, let us say the mean or variance.
00:21:00.800 --> 00:21:13.700
If you figure out any sample statistic, then once you have randomly picked 2 of those scores, you can no longer just pick the 3rd score freely.
00:21:13.700 --> 00:21:22.700
You have to pick a particular score, because you already used up some of your freedom on the mean.
00:21:22.700 --> 00:21:28.100
The mean will constrain you so that you can only freely pick two scores.
00:21:28.100 --> 00:21:31.700
This logic will become more important later.
00:21:31.700 --> 00:21:33.700
Let us put some numbers in here.
00:21:33.700 --> 00:21:39.300
Let us talk about the case when n= 3 and degrees of freedom = 3.
00:21:39.300 --> 00:21:50.200
It would be like there are three subjects and they could score from 0 to 100.
00:21:50.200 --> 00:21:53.800
I am totally free.
00:21:53.800 --> 00:22:01.200
I can pick 87, 52, my last score I can pick anything I want.
00:22:01.200 --> 00:22:04.800
I can pick 52 again, 100, or 0.
00:22:04.800 --> 00:22:06.100
It does not matter.
00:22:06.100 --> 00:22:08.300
I can just pick any score I want.
00:22:08.300 --> 00:22:13.500
If I erase these other scores I will just put in a different score.
00:22:13.500 --> 00:22:15.400
It does not matter.
00:22:15.400 --> 00:22:17.400
I'm very free to vary.
00:22:17.400 --> 00:22:25.300
But let us talk about the more common situation that we have in statistics, where we figure out summary statistics.
00:22:25.300 --> 00:22:29.800
Here we have n=3 and degrees of freedom =2.
00:22:29.800 --> 00:22:31.200
Here is why.
00:22:31.200 --> 00:22:37.700
The score is the same, it can go from 0 to 100.
00:22:37.700 --> 00:22:43.600
We also found the x bar =50.
00:22:43.600 --> 00:22:54.200
If we found that x bar = 50, then we cannot just pick any score all 3 times.
00:22:54.200 --> 00:22:56.700
Can we pick any score for the first one?
00:22:56.700 --> 00:22:59.600
Yes I can pick 0.
00:22:59.600 --> 00:23:03.200
Can I pick any score for the 2nd one?
00:23:03.200 --> 00:23:06.600
Sure, I can pick 100.
00:23:06.600 --> 00:23:11.800
Now for that third score, I cannot pick just any score.
00:23:11.800 --> 00:23:14.600
If I pick 72 my mean would not be 50.
00:23:14.600 --> 00:23:19.300
If I pick 42 my mean would not be 50.
00:23:19.300 --> 00:23:22.800
If I pick another 0, my mean would not be 50.
00:23:22.800 --> 00:23:30.100
That is the problem, and because of that, if this is my data set, so far I have been free to vary.
00:23:30.100 --> 00:23:35.500
I freely chose this one, but for this last one I am locked in.
00:23:35.500 --> 00:23:37.000
I have to choose 50.
00:23:37.000 --> 00:23:40.400
That is the only way I can get a mean of 50.
00:23:40.400 --> 00:23:42.700
That is what we call degrees of freedom.
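The "locked in" last score can be sketched in a couple of lines, using the same 0, 100, and target mean of 50 from the example:

```python
# n = 3 scores with the mean already fixed at 50: only two scores
# are free to vary, so degrees of freedom = n - 1 = 2.
target_mean = 50
n = 3

first = 0     # freely chosen
second = 100  # freely chosen

# The last score is forced: the three must total n * mean = 150.
third = n * target_mean - first - second
print(third)  # 50 is the only value that keeps the mean at 50

assert (first + second + third) / n == target_mean
```

Change `first` or `second` to anything you like and `third` adjusts; that is the one score you never get to choose.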
00:23:42.700 --> 00:23:49.300
This logic is going to become more important later on, but for now what you can think about is
00:23:49.300 --> 00:23:57.500
because we are deriving other summary statistics from our sample we are not completely free to vary.
00:23:57.500 --> 00:23:59.300
We locked ourselves down.
00:23:59.300 --> 00:24:05.000
We pinned ourselves down and built little gates for us at the borders.
00:24:05.000 --> 00:24:19.800
Now you know degrees of freedom, and we know that as degrees of freedom or n goes up, we see more and more normal-like distributions.
00:24:19.800 --> 00:24:22.400
I have drawn three distributions here for you.
00:24:22.400 --> 00:24:29.200
Here you might notice that I have used basically the same picture of a curve for all three of these.
00:24:29.200 --> 00:24:32.800
You might think they have all the same distribution.
00:24:32.800 --> 00:24:41.700
Not true, because you have to take a look at the way that I have labeled that t down here.
00:24:41.700 --> 00:24:53.000
The way that I have labeled this x axis, or t axis in this case, really changes our interpretation of these curves.
00:24:53.000 --> 00:24:56.100
Remember what the normal distribution says.
00:24:56.100 --> 00:25:02.200
The normal distribution says 1 standard deviation to the right or positive side, 1 standard deviation
00:25:02.200 --> 00:25:07.200
to the negative side that area should be about 68% of your entire curve.
00:25:07.200 --> 00:25:10.100
Is it true here?
00:25:10.100 --> 00:25:21.400
No it is not, this does not look like more than 50% of the curve.
00:25:21.400 --> 00:25:26.100
This looks like maybe 1/3.
00:25:26.100 --> 00:25:28.600
Maybe a little less than 1/3.
00:25:28.600 --> 00:25:39.400
This is starting to look more like 60% of the curve, but still maybe not quite 68% of the curve.
00:25:39.400 --> 00:25:44.500
It is still only looks like may be 50% of the curve or a little more.
00:25:44.500 --> 00:26:00.300
Imagine if this were shifted in toward the middle; that would be more like 68% of the curve.
00:26:00.300 --> 00:26:07.500
Something like this would be more like 60% of the curve.
00:26:07.500 --> 00:26:22.200
That is how you can see that as your degrees of freedom increases it becomes more and more normal.
00:26:22.200 --> 00:26:24.800
Even this is not quite normal.
00:26:24.800 --> 00:26:28.100
This is not quite 68% but a little bit less actually.
00:26:28.100 --> 00:26:38.500
As the DF gets bigger and bigger that area starts to look more and more like the normal distribution.
00:26:38.500 --> 00:26:55.100
Now there is another way I can draw these pictures, and I believe that in this other way you can see more easily how this is the more variable version.
00:26:55.100 --> 00:27:03.700
Remember I am saying that t distribution is like you are stomping down on the peak of it and smooching it out a little bit.
00:27:03.700 --> 00:27:10.300
I believe that if I draw the same picture in a slightly different way you will see why.
00:27:10.300 --> 00:27:13.900
In this case, here is what I have done.
00:27:13.900 --> 00:27:27.400
I have kept the t axis the same and now it is labeled in the same way, but I have drawn these distributions in a slightly different way.
00:27:27.400 --> 00:27:35.700
Now this one is a little wider and this one is less wide and this one is even less wide.
00:27:35.700 --> 00:27:40.700
It becomes more narrow, more like the normal distribution.
00:27:40.700 --> 00:27:55.000
Notice that if I drew the line here, a little bit after 1 standard deviation away, we see there is a little of that curve left on the side.
00:27:55.000 --> 00:28:05.000
You know, if that is 50%, then maybe this is 15%, 10%, something like that.
00:28:05.000 --> 00:28:12.800
This might look more roughly equivalent to this, maybe a little bit less.
00:28:12.800 --> 00:28:15.200
Maybe like 20%.
00:28:15.200 --> 00:28:19.700
This looks like much more than this.
00:28:19.700 --> 00:28:26.000
Maybe this is like 25 or 30% even compared to this.
00:28:26.000 --> 00:28:38.900
In that way you can see using the same concepts the drawing and picture in a slightly different way that this distribution is much more variable.
00:28:38.900 --> 00:28:41.300
Its spread is very wide.
00:28:41.300 --> 00:28:45.200
Whereas this distribution is much less variable.
00:28:45.200 --> 00:28:51.000
Remember t is all because of the variability found in s.
00:28:51.000 --> 00:29:01.400
When n is very small, s is very, very variable, so the t distribution is also quite variable.
00:29:01.400 --> 00:29:12.200
As n gets bigger, s gets more and more accurate, more like the actual standard deviation of the population.
00:29:12.200 --> 00:29:15.500
And because of that, it becomes more and more normal.
00:29:15.500 --> 00:29:20.700
Let us break this one down.
00:29:20.700 --> 00:29:29.400
For degrees of freedom of 60, here is what it might look like.
00:29:29.400 --> 00:29:37.500
It might look something that is very close to our 34, 13 ½ , 2% normal distribution.
00:29:37.500 --> 00:29:52.100
If we drew our little lines there, that would probably look very close to this picture.
00:29:52.100 --> 00:29:56.900
It looks pretty close.
00:29:56.900 --> 00:30:10.500
When we draw something like this, this area might only be 25% of this whole curve.
00:30:10.500 --> 00:30:16.800
These other areas, combined, are also 25%.
00:30:16.800 --> 00:30:26.100
If I split this like this, then this would be something like 14%.
00:30:26.100 --> 00:30:32.200
A little bit less than this but still quite a bit.
00:30:32.200 --> 00:30:39.800
This one might even be more than 14%, maybe like 18%.
00:30:39.800 --> 00:30:48.700
You can see that in this distribution, even though I have drawn it like this, I have just labeled it differently.
00:30:48.700 --> 00:30:55.400
In reality, it will look more like this if you kept this t axis to be constant.
00:30:55.400 --> 00:30:59.000
It will look sort of smooched out.
00:30:59.000 --> 00:31:06.800
How do you find t at the end of the day?
00:31:06.800 --> 00:31:14.400
How do you find the t and not only that how do you find the probability associated with that t?
00:31:14.400 --> 00:31:18.000
For instance, where t is greater than 2?
00:31:18.000 --> 00:31:20.700
How do you find these probabilities?
00:31:20.700 --> 00:31:24.200
We know how to do it for z but how do you do it for t?
00:31:24.200 --> 00:31:31.900
One thing that you could do is you can look at the back of your book usually in the appendix section
00:31:31.900 --> 00:31:37.700
there is something called the t distribution or the students t distributions that you can look at.
00:31:37.700 --> 00:31:54.600
Oftentimes it will have degrees of freedom down one side, like 2, 3, 4, 5, all the way down, and then it will show you either one-tailed or two-tailed areas.
00:31:54.600 --> 00:32:06.600
It might give you .25, .10 and .05, .025.
00:32:06.600 --> 00:32:09.200
It might give you these areas.
00:32:09.200 --> 00:32:15.200
The number right here tells you the t score at that place.
00:32:15.200 --> 00:32:42.100
If you wanted to know where the 25% cutoff is, what the t score is for the degrees of freedom = 2 distribution, you would look right here.
00:32:42.100 --> 00:32:55.600
If you wanted to know it for .025 then you would look here.
00:32:55.600 --> 00:33:04.600
You want to look for degrees of freedom, as well as how much of the curve you're trying to cover.
00:33:04.600 --> 00:33:08.200
That is definitely one way to do it.
00:33:08.200 --> 00:33:14.800
The other way you could do it is by using Excel and just like how Excel will help you find probabilities
00:33:14.800 --> 00:33:23.000
and z scores for the standardized normal distribution you can also find it in Excel for the t distribution.
00:33:23.000 --> 00:33:26.300
It just needs a couple of inputs.
00:33:26.300 --> 00:33:31.700
Let us start off with TDIST.
00:33:31.700 --> 00:33:39.600
TDIST is for the case where you want to find the probability and you have everything else.
00:33:39.600 --> 00:33:52.900
What TDIST will do: you put in the degrees of freedom and you put in the actual x value.
00:33:52.900 --> 00:33:59.200
You can think of the x value as the t value and it will only take positive t values.
00:33:59.200 --> 00:34:17.900
For instance, a t value of 1, and the number of tails: whether you want this entire area or just that one area alone.
00:34:17.900 --> 00:34:26.000
You can put in either one or two, and then it will give you the probability of this area.
00:34:26.000 --> 00:34:29.700
I can show you right here.
00:34:29.700 --> 00:34:58.100
Let us put in TDIST with t = 1 and degrees of freedom 2, and let us look at what it says for two tails.
00:34:58.100 --> 00:35:11.100
It will say 42%, and if you look at this exact same thing but for one tail, it will just divide this area in half.
00:35:11.100 --> 00:35:15.900
21% and 42% makes sense.
00:35:15.900 --> 00:35:21.900
Basically this is giving you this area + this area if you want 2 tails.
00:35:21.900 --> 00:35:25.100
But if you only want one tail it will just give you this area.
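The same numbers can be reproduced without Excel. For df = 2 specifically, the t distribution's CDF has a closed form, so this small Python sketch (a stand-in for TDIST for this one df, not a general routine) recovers the 42% and 21% from above:

```python
import math

def t2_tail(t, tails=2):
    """Tail probability beyond +t for the t distribution with df = 2.

    For df = 2 the CDF has the closed form
        F(t) = 1/2 + t / (2 * sqrt(2 + t**2)),
    so the upper-tail area is 1 - F(t). This mirrors Excel's
    TDIST(t, 2, tails) without needing a stats library.
    """
    upper = 0.5 - t / (2 * math.sqrt(2 + t * t))
    return tails * upper

print(round(t2_tail(1, tails=2), 3))  # 0.423: the ~42% two-tailed area
print(round(t2_tail(1, tails=1), 3))  # 0.211: the ~21% one-tailed area
```

For other degrees of freedom you would reach for a table or a statistics library, since only a few df values have closed forms like this.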
00:35:25.100 --> 00:35:49.400
We know that for a 95% confidence interval we often use a z score of 1.96, and that will give us a tail of .025, or if we count two tails, .05.
00:35:49.400 --> 00:35:58.200
Let us see what this gives for 1.96 when we have a degrees of freedom of only 2.
00:35:58.200 --> 00:36:03.600
Let us put in 1.96.
00:36:03.600 --> 00:36:12.900
If we put that in as a z score with 2 tails, we would only get 5%, but let us see what we get here.
00:36:12.900 --> 00:36:19.000
Degrees of freedom 2 and number of tails let us put in 2.
00:36:19.000 --> 00:36:23.500
Do you think this should be more or less than 5%?
00:36:23.500 --> 00:36:25.900
Let us think about this.
00:36:25.900 --> 00:36:36.100
The t distribution is slightly smooched; it is more spread out, and because of that it is going to have a longer tail.
00:36:36.100 --> 00:36:40.100
It is not going to be nice and all compact in the middle.
00:36:40.100 --> 00:36:41.800
It will be spread out.
00:36:41.800 --> 00:36:44.300
We would imagine that it has a fat tail.
00:36:44.300 --> 00:36:46.700
I would say more than 5%.
00:36:46.700 --> 00:36:55.700
We see that it is almost 20% for a t of 1.96.
00:36:55.700 --> 00:36:58.600
Let us put that same z score in.
00:36:58.600 --> 00:37:07.200
NORMSDIST: this is for whenever we want the probability; put in 1.96.
00:37:07.200 --> 00:37:20.800
Here we get the negative side, so we want 1 minus that, and this gives us just 1 tail.
00:37:20.800 --> 00:37:24.700
I am going to change this to 1 tail, so we could look at it.
00:37:24.700 --> 00:37:33.100
Here, on one of our tails, one side of it, almost 9 ½% is still out there.
00:37:33.100 --> 00:37:38.500
But when we use the z score only 2 1/2% are still out there.
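That comparison, roughly 9 ½% per tail for t with df = 2 versus 2 ½% per tail for z, can be sketched in Python using the same df = 2 closed-form CDF alongside the standard library's normal distribution:

```python
import math
from statistics import NormalDist

t_cut = 1.96

# df = 2 t distribution: closed-form upper-tail area beyond t_cut,
# using F(t) = 1/2 + t / (2 * sqrt(2 + t**2)).
t_upper = 0.5 - t_cut / (2 * math.sqrt(2 + t_cut ** 2))

# Standard normal upper tail: the NORMSDIST route.
z_upper = 1 - NormalDist().cdf(t_cut)

print(round(2 * t_upper, 3))  # 0.189: nearly 19% beyond ±1.96 for df = 2
print(round(2 * z_upper, 3))  # 0.05: only 5% beyond ±1.96 for z
```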
00:37:38.500 --> 00:37:51.200
Let us look at the same t distribution for a very high degrees of freedom.
00:37:51.200 --> 00:37:53.600
Let us try 60.
00:37:53.600 --> 00:38:07.600
Even with something like 60 we are starting to get very close to the z distribution, but still this guy is more variable than the z distribution.
00:38:07.600 --> 00:38:09.000
Let us see if we could go even higher.
00:38:09.000 --> 00:38:13.900
Instead of 60 I am going to put in 120.
00:38:13.900 --> 00:38:22.700
Notice we are getting closer but still these are more variable than these.
00:38:22.700 --> 00:38:24.200
Let us go even higher.
00:38:24.200 --> 00:38:28.700
Let us go like 1000 and see what happens there.
00:38:28.700 --> 00:38:36.100
We are getting close but still slightly more variable.
00:38:36.100 --> 00:38:38.600
That is a good principle for us to know.
00:38:38.600 --> 00:38:44.000
The t distribution, although it approximates the normal, approaches it from one side.
00:38:44.000 --> 00:38:49.300
Here is the standard normal distribution's value, .02499.
00:38:49.300 --> 00:38:56.400
There it is and it is getting closer and closer to it, but it is approaching it from the high-end.
00:38:56.400 --> 00:39:05.100
These numbers are dropping and getting really close to that, but not quite hitting it.
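The one-sided convergence can be checked with a small sketch (again using scipy.stats rather than the lecture's Excel functions):

```python
from scipy import stats

z_tail = stats.norm.sf(1.96)  # one-tailed normal probability, ~0.025

# One-tailed t probabilities at 1.96 for growing degrees of freedom:
# they shrink toward the normal value but stay slightly above it.
for df in (2, 60, 120, 1000):
    t_tail = stats.t.sf(1.96, df=df)
    print(df, round(t_tail, 5))
    assert t_tail > z_tail  # approached from the high end, never crossed
```

Even at 1000 degrees of freedom the t tail is still a hair fatter than the normal tail.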
00:39:05.100 --> 00:39:15.400
Now you know how to get the probabilities, but what if you have the probability and you want to find the t score?
00:39:15.400 --> 00:39:16.300
What would you do?
00:39:16.300 --> 00:39:22.600
In this case, you would use the inverse t, TINV, where inv stands for inverse.
00:39:22.600 --> 00:39:25.800
Here you would put in the two tailed probability.
00:39:25.800 --> 00:39:34.300
Let us say we want to know what is the t boundary for if we wanted only 5% in our tails?
00:39:34.300 --> 00:39:37.700
Here is the situation I am talking about for this one.
00:39:37.700 --> 00:39:53.300
We had this distribution and we know we want these to be .025, just like a z distribution.
00:39:53.300 --> 00:39:58.200
We want it to .025 but we want to know what these numbers are here.
00:39:58.200 --> 00:40:02.700
We want to know what these numbers are.
00:40:02.700 --> 00:40:05.400
It depends on your degrees of freedom.
00:40:05.400 --> 00:40:13.300
Let us try degrees of freedom of 2, 60, 120, and 1000.
00:40:13.300 --> 00:40:24.200
Let me label this.
00:40:24.200 --> 00:40:43.400
Here we get the probabilities from TDIST, and here are the probabilities from the standard normal distribution, or the z distribution.
00:40:43.400 --> 00:41:00.100
We do not want the probabilities we actually want the t boundaries themselves and the z boundaries themselves.
00:41:00.100 --> 00:41:12.400
If we want the z boundary at .025, or at 5% two-tailed, we would use NORMSINV and put in our probability.
00:41:12.400 --> 00:41:14.500
I forget if it is one tailed or two tailed.
00:41:14.500 --> 00:41:17.400
Let us try one-tailed, though we may need two-tailed.
00:41:17.400 --> 00:41:29.300
We get very close to -1.96.
00:41:29.300 --> 00:41:41.200
We just have to memorize that but that is why this is saying at -1.96 you have about 2 1/2% in that little tail.
00:41:41.200 --> 00:41:43.700
Now what about the t?
00:41:43.700 --> 00:41:52.100
In Excel it is inconsistent: for z it gives it to you on the negative side, but for t it only gives it to you on the positive side.
00:41:52.100 --> 00:41:55.200
That is confusing but I often do not memorize that.
00:41:55.200 --> 00:42:00.200
I just try out a couple of things until it spits out the thing I'm looking for.
00:42:00.200 --> 00:42:06.700
You have to understand how these things work so that you could predict what's going on.
00:42:06.700 --> 00:42:17.700
We will use TINV; we put in the probability, and I believe this is going to be two-tailed.
00:42:17.700 --> 00:42:23.400
.05 and degrees of freedom of 2.
00:42:23.400 --> 00:42:43.400
We put in .05 and the degrees of freedom, just to test whether this is one tailed or two tailed.
00:42:43.400 --> 00:42:44.700
Let me put that in.
00:42:44.700 --> 00:42:49.700
I believe you have to give it two tails.
00:42:49.700 --> 00:43:00.200
You have to put in the two-tailed probability here, so that is .05, and the degrees of freedom, 2, and this will give us these boundaries.
00:43:00.200 --> 00:43:09.200
This will only give us the positive boundary, but because it is symmetrical, you automatically know the other side.
00:43:09.200 --> 00:43:12.800
This would give us a boundary of 4.3.
00:43:12.800 --> 00:43:23.700
Remember, for the z score this boundary would be 1.96, but for a t distribution with 2 degrees of freedom, it is 4.3.
00:43:23.700 --> 00:43:29.000
That is quite high because remember it is really spread out.
00:43:29.000 --> 00:43:32.200
You have got to go way out far in order to get just that 2.5%.
00:43:32.200 --> 00:43:41.600
What about this boundary for degrees of freedom of 60?
00:43:41.600 --> 00:43:43.700
What do we get then?
00:43:43.700 --> 00:43:50.400
We get something very close to 1.96 but it is a little bigger than 1.96.
00:43:50.400 --> 00:44:01.400
Remember, because the t distribution is more variable you have to go farther out in order to capture just that small amount of .025.
00:44:01.400 --> 00:44:06.000
That means 2.5%, or .025.
00:44:06.000 --> 00:44:30.800
If we go to 120, we should expect that boundary to come closer and closer to 1.96 from above, but not quite hit 1.96, or more precisely 1.9599.
00:44:30.800 --> 00:44:38.500
We are getting close to that 1.96 number, but still it is a little bit higher.
00:44:38.500 --> 00:44:52.400
Finally we will go buck wild and put in degrees of freedom of 1000, and we get something very close to 1.96 but still a little higher than 1.96.
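The same inverse lookups can be sketched in Python; scipy's `ppf` takes the cumulative probability, so it stands in for both TINV and NORMSINV here:

```python
from scipy import stats

# Two-tailed 5% critical values: the upper boundary leaving 2.5% in
# each tail, like Excel's TINV(0.05, df).
crit = {df: stats.t.ppf(0.975, df=df) for df in (2, 60, 120, 1000)}
print({df: round(c, 4) for df, c in crit.items()})
# df=2 gives ~4.30; larger df creep down toward the z boundary

z_crit = stats.norm.ppf(0.975)  # ~1.96, like NORMSINV(0.975)
```

Note the boundary decreases monotonically with degrees of freedom but never quite reaches the z boundary.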
00:44:52.400 --> 00:45:00.800
Those are two different ways that you can find the t, as well as the probability that t is associated with.
00:45:00.800 --> 00:45:14.000
Remember the degrees of freedom and you have to know whether you want two tailed probability or one tailed probability.
00:45:14.000 --> 00:45:17.000
As well as your degrees of freedom.
00:45:17.000 --> 00:45:22.600
That is what you will have to know in order to look things up on a t distribution.
00:45:22.600 --> 00:45:28.900
Let us go on to some examples.
00:45:28.900 --> 00:45:34.100
In each of these situations which distribution do you use, the z or the t?
00:45:34.100 --> 00:45:42.000
There are 500 million people on Facebook; how many people have fewer friends than Diana, who has 490 friends?
00:45:42.000 --> 00:45:49.600
Assume that the number of friends on Facebook is normally distributed and here they give you the sigma.
00:45:49.600 --> 00:45:53.200
We know that you can use the z distribution here.
00:45:53.200 --> 00:46:03.300
Here the researchers want to compare a given sample of Facebook users' average number of friends, a sample of 25, to the entire population.
00:46:03.300 --> 00:46:11.900
What proportion of sample means will be equal or greater than the mean of this group?
00:46:11.900 --> 00:46:18.800
N = 25, but the mean is 580.
00:46:18.800 --> 00:46:22.700
They have an average of 580 friends.
00:46:22.700 --> 00:46:35.800
Here I would not necessarily use z, but I also am not given the standard deviation.
00:46:35.800 --> 00:46:40.300
Maybe this is connected to the previous problem.
00:46:40.300 --> 00:46:50.800
If so, if I assume that they come from the whole population and they give us the information for the whole population here.
00:46:50.800 --> 00:46:56.000
If sigma = 100 then I will use z.
00:46:56.000 --> 00:47:00.100
For this one I probably left out some information.
00:47:00.100 --> 00:47:01.900
What about this last one?
00:47:01.900 --> 00:47:09.300
Researchers want to know the 95% confidence interval for tagged photos given that a sample of 32 people
00:47:09.300 --> 00:47:14.700
have an average of 185 tagged photos and a standard deviation of 112.
00:47:14.700 --> 00:47:24.400
Here it is very clear, since I know s but I do not know the sigma for tagged photos.
00:47:24.400 --> 00:47:27.900
I only know the sigma for friends, but not for tagged photos.
00:47:27.900 --> 00:47:34.800
In this case, what I would do is use the t distribution because I will probably have to estimate
00:47:34.800 --> 00:47:39.800
the population standard deviation from the sample standard deviation.
00:47:39.800 --> 00:47:48.800
Example 2: we get that same problem from before, and now we just have to solve it.
00:47:48.800 --> 00:47:54.000
There are 500 million people on Facebook but how many people have fewer friends than Diana?
00:47:54.000 --> 00:48:00.500
Here it is good to know that we do not need a sampling distribution of the mean.
00:48:00.500 --> 00:48:02.500
We do not need the SDOM.
00:48:02.500 --> 00:48:06.100
In fact, we are just using the population and Diana.
00:48:06.100 --> 00:48:14.900
We could draw the population and it tells us that the population is normally distributed.
00:48:14.900 --> 00:48:34.200
Number of friends is normally distributed, with μ = 600 and a standard deviation of 100.
00:48:34.200 --> 00:48:40.500
This little space is 100 so this would be 700.
00:48:40.500 --> 00:48:49.500
Diana has 490 friends, so she would be just below this 500 mark.
00:48:49.500 --> 00:48:56.700
It is asking how many people have fewer friends than Diana?
00:48:56.700 --> 00:49:00.300
How many have that?
00:49:00.300 --> 00:49:10.500
It is tricky because this will give us the proportion, but it would not give us how many people.
00:49:10.500 --> 00:49:16.100
What we will have to do is multiply that proportion by the 500 million.
00:49:16.100 --> 00:49:21.300
This is all 500,000,000 and that is 100%.
00:49:21.300 --> 00:49:34.200
We will need to know some proportion of them that have friends fewer than Diana, fewer than 490.
00:49:34.200 --> 00:49:41.100
We will have to figure that out and so we will have to multiply 500 million by the percentage.
00:49:41.100 --> 00:49:43.800
Let us get cracking.
00:49:43.800 --> 00:50:09.900
We can figure out the z score for Diana, and that would be (490 - 600) ÷ 100.
00:50:09.900 --> 00:50:16.400
I would only need the standard error if I were using the SDOM, but here I am using the population standard deviation.
00:50:16.400 --> 00:50:18.200
It is often helpful to draw this.
00:50:18.200 --> 00:50:31.100
Here we have -110 ÷ 100 = -1.1.
00:50:31.100 --> 00:50:46.400
The z score of -1.1 and I want to know the proportion of people who have friends less than Diana.
00:50:46.400 --> 00:51:04.800
You can look this up in the back of your book, just the z score of -1.1, or you could put it into Excel: NORMSDIST(-1.1).
00:51:04.800 --> 00:51:21.000
I should get about .1357 so that would be .1357.
00:51:21.000 --> 00:51:29.400
That is about 13 ½ % of the population have fewer friends than Diana.
00:51:29.400 --> 00:51:43.300
What I want to do is take that 13 1/2% of this entire population, and that would be 500 million × .1357.
00:51:43.300 --> 00:51:57.300
You can do this on a calculator, so that × 500 million = 67.83 million.
00:51:57.300 --> 00:52:04.100
Do not forget to put the million part.
00:52:04.100 --> 00:52:09.500
It is not that only 67 people have fewer friends than Diana.
00:52:09.500 --> 00:52:12.000
That would be our answer right there.
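The whole Diana calculation fits in a short sketch (scipy.stats assumed in place of Excel; the 600 and 100 are the population parameters given in the problem):

```python
from scipy import stats

mu, sigma = 600, 100      # population mean and SD of friend counts
diana = 490

z = (diana - mu) / sigma                 # (490 - 600) / 100 = -1.1
prop_below = stats.norm.cdf(z)           # ~0.1357, like NORMSDIST(-1.1)
count_below = 500_000_000 * prop_below   # ~67.8 million people
print(z, round(count_below / 1e6, 2))
```

Multiplying the proportion by the population size is the step that turns "13 1/2%" into an actual head count.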
00:52:12.000 --> 00:52:26.600
The researchers want to compare a given sample of Facebook users' average number of friends, a sample of 25, to the whole population.
00:52:26.600 --> 00:52:38.800
What proportion of sample means will be equal or greater than the mean of this group?
00:52:38.800 --> 00:52:45.300
Here I am going to make an assumption, because there is no other way to do this problem.
00:52:45.300 --> 00:52:53.000
I am going to assume that we could use the information from example 2 because we are talking about the same thing, the number of friends.
00:52:53.000 --> 00:52:56.100
We actually know the population.
00:52:56.100 --> 00:53:15.600
The population is approximately normally distributed with the μ of 600 and standard deviation of 100.
00:53:15.600 --> 00:53:24.900
μ = 600, standard deviation = 100, and from this I need to generate an SDOM because
00:53:24.900 --> 00:53:31.400
now we are talking about samples of people not just one person at a time.
00:53:31.400 --> 00:53:36.300
Because of that I need to generate SDOM for n = 25.
00:53:36.300 --> 00:53:54.500
The nice thing is we already know that μ sub x bar = μ, which is 600, but we actually also know
00:53:54.500 --> 00:54:00.100
the standard error because standard error is standard deviation ÷√n.
00:54:00.100 --> 00:54:06.400
In this case, it is 100 ÷ √25 =20.
00:54:06.400 --> 00:54:14.700
1 standard error away here is 20.
00:54:14.700 --> 00:54:21.700
This would be 580, 560, and so forth.
00:54:21.700 --> 00:54:30.700
It is asking what proportion of sample means will be equal to or greater than the mean of this group?
00:54:30.700 --> 00:54:41.300
Equal to or greater than means all of these, and since they are just asking for a proportion, we do not have to do anything further once we get the answer.
00:54:41.300 --> 00:54:50.300
Well, it might be nice if we could actually get the z score for this SDOM.
00:54:50.300 --> 00:54:56.600
Here, instead of just putting 580 I would want to find the z score.
00:54:56.600 --> 00:55:02.800
Here are friends but I want to know it in terms of z score.
00:55:02.800 --> 00:55:18.000
It is actually really easy because it is the z score of -1 and we can actually just use the empirical rule to find this out because we know at the mean,
00:55:18.000 --> 00:55:27.500
at the expected value we know that this is 50% and this is 34%.
00:55:27.500 --> 00:55:40.700
If we add that together, the proportion of sample means greater than or equal to the mean
00:55:40.700 --> 00:55:57.300
of this group, equals the proportion where the z score is greater than or equal to -1, and that is .84, or 84%.
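The empirical-rule shortcut above can be checked exactly with a sketch (scipy.stats assumed, not part of the lecture):

```python
import math
from scipy import stats

mu, sigma, n = 600, 100, 25
se = sigma / math.sqrt(n)      # standard error = 100 / 5 = 20
z = (580 - mu) / se            # z = -1

# Proportion of sample means at or above 580, i.e. P(Z >= -1):
prop = stats.norm.sf(z)        # ~0.8413, close to the 50% + 34% estimate
print(se, z, round(prop, 4))
```

The exact value, .8413, is within half a percentage point of the empirical rule's 84%.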
00:55:57.300 --> 00:56:05.900
Final example: researchers want to know the 95% confidence interval for tagged photos given that
00:56:05.900 --> 00:56:14.700
a sample of 32 people have an average of 185 tagged photos and a standard deviation of 112.
00:56:14.700 --> 00:56:16.800
Interpret what the CI means.
00:56:16.800 --> 00:56:26.200
Here we do not know anything about the population, but we do know x bar which is 185
00:56:26.200 --> 00:56:33.200
and we do know the standard deviation of the sample s which is 112.
00:56:33.200 --> 00:56:36.100
We also know n is 32.
00:56:36.100 --> 00:56:48.500
Remember, when we talk about a confidence interval we want to go from the sample to figure out where the population mean might be.
00:56:48.500 --> 00:57:00.100
What we do is pretend there is an SDOM here, and we assume that the
00:57:00.100 --> 00:57:08.200
x bar is going to equal the expected value of this SDOM which is 185.
00:57:08.200 --> 00:57:18.000
From there we could actually estimate the standard error by using s.
00:57:18.000 --> 00:57:40.600
Here μ sub x bar = 185; this is assumed. We do not have sigma, but s sub x bar = s ÷ √n = 112 ÷ √32.
00:57:40.600 --> 00:57:57.200
If you pull up a calculator you could just calculate that out 112 ÷ √32 and get 19.8.
00:57:57.200 --> 00:58:15.700
We know how far the jumps are, and because we used s we cannot just find the z score; we have to find the t score.
00:58:15.700 --> 00:58:24.000
We will have to use the t score in order to create a 95% confidence interval.
00:58:24.000 --> 00:58:34.900
Although, I do not know the t distribution for degrees of freedom of 32 - 1.
00:58:34.900 --> 00:58:40.500
I do not know what a t distribution with 31 degrees of freedom looks like.
00:58:40.500 --> 00:58:44.300
We will have to figure that out.
00:58:44.300 --> 00:58:51.300
What we eventually want is this to be .025.
00:58:51.300 --> 00:59:03.900
These together are a combined two-tailed probability of 5%, and we will have to use TINV because we already know the probability.
00:59:03.900 --> 00:59:07.900
We want to go backwards to find the t.
00:59:07.900 --> 00:59:19.100
TINV: we put in our two-tailed probability, .05, and put in our degrees of freedom, which in this case is 31.
00:59:19.100 --> 00:59:24.500
We ask what is the t and it says it is 2.04.
00:59:24.500 --> 00:59:37.300
The t right here at these borders is 2.04 and because it is symmetrical we also know that this one is -2.04.
00:59:37.300 --> 00:59:47.200
In order to find the confidence interval we are really looking for these raw values right here.
00:59:47.200 --> 01:00:04.700
In order to get those we take the middle point, add 2.04 standard errors to get out here, and subtract 2.04 standard errors to get out here.
01:00:04.700 --> 01:00:15.500
The confidence interval will be x bar plus or minus the t score
01:00:15.500 --> 01:00:27.300
times how big those jumps actually are: the number of jumps is the t score right here, multiplied by s sub x bar.
01:00:27.300 --> 01:00:38.300
If we actually put in our numbers that is going to be 185 + or -2.04 × 19.8.
01:00:38.300 --> 01:00:43.500
If you just pull out a calculator we could get 185.
01:00:43.500 --> 01:01:00.400
Make sure to put the = sign, which even I forget sometimes: = 185 + 2.04 × 19.8, and remember Excel knows order of operations.
01:01:00.400 --> 01:01:04.100
It will do the multiplication part before it does the addition part.
01:01:04.100 --> 01:01:23.600
The upper limit will be 225.39 and the lower limit will be 144.61.
01:01:23.600 --> 01:01:37.600
I just rounded to the nearest tenth and this would be 225.4 and this would be 144.6.
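The full confidence-interval computation can be sketched in one place (scipy's `t.ppf` assumed in place of TINV):

```python
import math
from scipy import stats

x_bar, s, n = 185, 112, 32
se = s / math.sqrt(n)                    # estimated standard error, ~19.8
t_crit = stats.t.ppf(0.975, df=n - 1)    # ~2.04, like TINV(0.05, 31)

lower = x_bar - t_crit * se
upper = x_bar + t_crit * se
print(round(lower, 1), round(upper, 1))  # ~144.6 and ~225.4
```

Using the unrounded t critical value gives essentially the same limits as the hand calculation with 2.04 and 19.8.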
01:01:37.600 --> 01:01:44.800
We need to interpret what the CI means.
01:01:44.800 --> 01:02:01.400
This means we can be 95% confident that the population mean falls between 144.6 and 225.4; that is the interval.
01:02:01.400 --> 01:02:04.200
That is it for t-distributions.
01:02:04.200 --> 01:02:06.000
Thank you for using www.educator.com.